WEBVTT

00:00.000 --> 00:07.000
Yeah, I think it will be great.

00:07.000 --> 00:09.000
Yes, thumbs up.

00:09.000 --> 00:10.000
Okay.

00:10.000 --> 00:11.000
Let's go again.

00:11.000 --> 00:12.000
Yeah.

00:12.000 --> 00:15.000
Can you hear at the back of the room?

00:15.000 --> 00:16.000
Yeah.

00:16.000 --> 00:17.000
Okay.

00:17.000 --> 00:18.000
Great.

00:18.000 --> 00:20.000
Okay.

00:20.000 --> 00:22.000
Next talk.

00:22.000 --> 00:27.000
Next talk will be: git blame for your dependencies.

00:27.000 --> 00:34.000
Please welcome Andrew Nesbitt with his talk.

00:34.000 --> 00:38.000
Well, beforehand, can I get a show of hands for anyone who went to the package manager

00:38.000 --> 00:40.000
Devroom yesterday?

00:40.000 --> 00:41.000
Woo!

00:41.000 --> 00:42.000
Thank you very much for coming.

00:42.000 --> 00:43.000
That was great fun.

00:43.000 --> 00:44.000
Yes.

00:44.000 --> 00:45.000
So, Git packages.

00:45.000 --> 00:51.000
This is a fairly new project, but it is based on about 10 years of work that I've done into

00:51.000 --> 00:56.000
mining dependencies from Git repositories and package managers.

00:56.000 --> 01:02.000
It is the culmination of a lot of different ideas, all kind of dragged into one project that

01:02.000 --> 01:07.000
I have just been hacking on over Christmas because I had a crazy Christmas that didn't involve

01:07.000 --> 01:10.000
much leaving the house.

01:10.000 --> 01:15.000
So, I was going to live demo this and then I realized how much of a dangerous game that

01:15.000 --> 01:19.000
is, and this morning I woke up and was like, I'm going to rewrite my whole talk and do this

01:19.000 --> 01:24.000
as static things instead so that it's a little bit more reliable.

01:24.000 --> 01:27.000
So, I'm basically going to drive you through how this project works.

01:27.000 --> 01:32.000
The idea of the project is, oh, I was going to hit that button.

01:32.000 --> 01:33.000
Sorry.

01:33.000 --> 01:36.000
We'll just imagine that there's a minute missing.

01:36.000 --> 01:42.000
The idea is this project basically wants to make Git more aware of how your

01:42.000 --> 01:48.000
dependencies are actually integrated into Git.

01:48.000 --> 01:52.000
Generally, Git is completely oblivious of how you're using your dependencies.

01:52.000 --> 01:57.000
Maybe if you're using submodules, but also if you're using submodules, you probably

01:57.000 --> 02:00.000
hate yourself because you chose to use submodules.

02:00.000 --> 02:07.000
So, this is instead primarily looking at manifest and lock files that are introduced to

02:07.000 --> 02:11.000
configure your packages and also record the full transitive dependency

02:11.000 --> 02:14.000
tree that was produced once you successfully installed your packages.

02:14.000 --> 02:19.000
Can we make Git more aware of the semantics of what's going on there?

02:19.000 --> 02:22.000
And then what kind of comes out of that once you can do that?

02:22.000 --> 02:27.000
So, it's really easy to install with Homebrew, or it's written in Go.

02:27.000 --> 02:30.000
So, you can just do go install straight from the repository.

02:30.000 --> 02:35.000
And I tried to make it act and look as much like Git as possible.

02:35.000 --> 02:40.000
So, when it installs, it just kind of plugs itself into the Git porcelain.

02:40.000 --> 02:42.000
And it has all these commands.

02:42.000 --> 02:43.000
And we're going to go through some of them.

02:43.000 --> 02:47.000
We're not going to go through every single one because some of them are really uninteresting.

02:47.000 --> 02:53.000
But also it supports loads and loads of different software ecosystems and different package managers.

02:53.000 --> 02:55.000
Many different kinds of file formats.

02:55.000 --> 02:59.000
The JavaScript one I just cut it off because otherwise it would be going over into the next room.

02:59.000 --> 03:04.000
Because of how many different kinds of file formats they have and how many different lock files.

03:04.000 --> 03:06.000
But there are many of them.

03:06.000 --> 03:09.000
And yes, they are generally supported.

03:09.000 --> 03:16.000
These are primarily language package managers that are supported because those are the ones that actually record

03:16.000 --> 03:20.000
kind of what your transitive dependency tree was for your application.

03:20.000 --> 03:25.000
And think about this as your Git repository for your application rather than your system dependencies.

03:25.000 --> 03:28.000
Although it does have little bits and pieces in there, which we will see

03:28.000 --> 03:31.000
start to show up.

03:31.000 --> 03:33.000
Straight up the back.

03:33.000 --> 03:41.000
The way that this works is you have to run an init command for most of this to really work very effectively.

03:41.000 --> 03:44.000
But when you run that it actually becomes very quick.

03:44.000 --> 03:49.000
If you were to not do this, every command would be slow and you would just be like, I hate this project.

03:49.000 --> 03:51.000
I don't want to use it anymore.

03:51.000 --> 04:06.000
So it will basically spider the whole history of your repository, pulling out every time that you changed a dependency file through the history of the project, and make a small SQLite database of all of those dependency changes.

04:07.000 --> 04:15.000
Then we basically just do SQL queries across that database to be able to inspect and see what's going on.
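
NOTE
Editor's sketch, not code from the talk or from the project: one way to turn two dependency snapshots into the change records that a history database could store. The function name, tuple layout and sample data are assumptions made for illustration only.
```python
# Hypothetical sketch: compare two dependency snapshots (name mapped to version)
# and emit one record per change: (kind, name, old_version, new_version).
def diff_dependencies(before, after):
    changes = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old is None:
            changes.append(("added", name, None, new))
        elif new is None:
            changes.append(("removed", name, old, None))
        elif old != new:
            changes.append(("changed", name, old, new))
    return changes
# Illustrative data only, not taken from the talk.
previous = {"rails": "7.0.0", "puma": "5.6.0", "sass": "3.7.4"}
current = {"rails": "7.1.0", "puma": "5.6.0", "nokogiri": "1.15.0"}
changes = diff_dependencies(previous, current)
```
Running this over each pair of consecutive commits that touch a manifest or lock file would give one row per added, removed or changed package.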

04:15.000 --> 04:20.000
And you can see, when you run that init, the size of the database you get out.

04:20.000 --> 04:31.000
This is for a project of mine, a Rails app, which has a good number of dependencies and has been around for 13 years or something.

04:31.000 --> 04:33.000
That's had a lot of dependency updates.

04:33.000 --> 04:38.000
So Dependabot just loves pinging me every single day on this thing.

04:38.000 --> 04:41.000
And then it actually can start to show you some interesting stats.

04:41.000 --> 04:44.000
How many dependencies do you currently have?

04:44.000 --> 04:46.000
Which ecosystems are you using?

04:46.000 --> 04:48.000
How many times have you changed dependencies?

04:48.000 --> 04:52.000
4,700 times I have updated dependencies on this repository.

04:52.000 --> 04:53.000
That's mental.

04:53.000 --> 04:56.000
Also who changed dependencies the most?

04:56.000 --> 05:00.000
It turns out Dependabot has done twice as much activity on this project as I have.

05:01.000 --> 05:06.000
Which is kind of depressing, but also the old version of Dependabot is up there as well.

05:06.000 --> 05:09.000
And Mark come to comes in a fourth place there.

05:09.000 --> 05:14.000
These are very high-level stats, which aren't necessarily particularly useful.

05:14.000 --> 05:21.000
But it is cool to start digging around. It is also quite fun if you just find some random project and you're like,

05:21.000 --> 05:27.000
Oh, let me see what actually is like going on under the hood of this project because it might be a thin layer under some interesting dependencies.

05:27.000 --> 05:33.000
You can also then say, okay, well, list me all the dependencies, and this then looks across all the different files.

05:33.000 --> 05:36.000
You could then break that down and say, actually, only list me the gem dependencies.

05:36.000 --> 05:43.000
I'm not bothered about the JavaScript stuff because there's way too much JavaScript stuff for me to ever really like comprehend because it's insane.

05:43.000 --> 05:48.000
You can also then say, oh, can we break this out and visualize this as a tree?

05:48.000 --> 05:54.000
There are also some ways, which I'm not going to show here, of saying: can you give this as JSON or CSV,

05:54.000 --> 06:07.000
if you want to pipe it into other kinds of tools, as well as breaking it down by where these dependencies are used and which files they're defined in: different ways of slicing and dicing that stuff.

06:07.000 --> 06:22.000
You can also do searches, and this is literally a LIKE query across the database to be like, oh, what kind of Rails-related gems do I have, and when were they added, and when were they last updated? So you're getting a very rich picture of what's going on.

06:22.000 --> 06:34.000
But also, because it's Git, we can just use regular things like: can you show me what happened in this commit? And this commit changed a number of dependencies.

06:34.000 --> 06:39.000
But I can just start poking around in the history of this thing.

06:39.000 --> 06:51.000
I can start to get a real picture, rather than just a mass of text. With RubyGems it's not so bad, but if you have got a 10,000-line JSON file from npm,

06:51.000 --> 07:04.000
you're just not going to bother. And GitHub literally hides those files when you try to review them in a pull request; it's like, don't look at this. And it's like, yes, that's good advice, but I might actually want to look at this at some point. So this gives you a rich,

07:04.000 --> 07:09.000
semantic understanding of what's happened rather than just.

07:09.000 --> 07:13.000
Which you know they're all good projects basically give you that same kind of functionality.

07:13.000 --> 07:19.000
You can also say can you tell me what happened over the life of the project for this particular command.

07:19.000 --> 07:25.000
I'm going to try and go quick because I've got like 25 slides, but we're touching on a load of different things.

07:25.000 --> 07:32.000
You can also just say, okay, well, what's the log of all dependency activity? And this works for every different branch; you can basically just

07:32.000 --> 07:40.000
hop around exactly as you would do with Git. And I tried to basically go: what commands are available in Git, and can we do the same thing,

07:40.000 --> 07:54.000
but just for packages? I mean, eventually it would be really nice if we could just lose the word packages there and be able to say, oh, here's a semantically interesting thing that happened as I was going through the log, rather than just get a big.

07:55.000 --> 08:02.000
You can also say, why is this package here? That shows you the commit where it was added and also the commit message,

08:02.000 --> 08:06.000
so you can go, okay, give me more context around this.

08:06.000 --> 08:12.000
Where is this package? That then finds each different manifest file that this thing is described in.

08:12.000 --> 08:22.000
And then also browse. Now this is cool, because this actually shells out to the package manager under the hood to say: what path has this package been installed in locally?

08:22.000 --> 08:29.000
So you can open it with your editor and just start diving straight into the code, even if that package hasn't been vendored into your Git repo.

08:29.000 --> 08:32.000
Which is where we start to get into some really cool things.

08:32.000 --> 08:46.000
So you can do a blame. You can say, who put this in? Turns out Andrew is responsible for almost all of the dependencies being added to Andrew's project, which is, you know, there's a bus factor problem there that we won't go into. That's a different talk.

08:46.000 --> 08:55.000
You can also say, okay, well, how many of these packages are outdated? And you think, how does Andrew know that, and how does this project know that? Well, this is where

08:55.000 --> 09:03.000
we move beyond the intrinsic data in your Gemfile, or in your manifests and your lock files, and we start to enrich this with

09:03.000 --> 09:07.000
extrinsic data, so we're going out to the registries.

09:07.000 --> 09:14.000
But we only do this on demand, and then we cache that back into the SQLite database, so this is really quick.

09:14.000 --> 09:21.000
We start to be able to build up a history of what is going on but also what future things are available what extra stuff can we collect.

09:21.000 --> 09:26.000
The really easy one is to say, okay, well, how many of these packages

09:26.000 --> 09:30.000
have new versions available that I could update to?

09:31.000 --> 09:43.000
What are the licenses of these packages? And this kind of output can get overwhelming again, especially if you have more and more packages, but there are different ways of slicing and dicing this.

09:43.000 --> 09:56.000
Grouping by license as well, or being able to pass a flag to say, what about only copyleft? Just show me all the packages and the dependencies that I have that are copyleft licensed or have no license as far as we can tell.

09:56.000 --> 10:03.000
Then vulnerabilities now I'm quite happy with this because when I ran this I was like I don't have any vulnerabilities like I'm in a good place.

10:03.000 --> 10:10.000
But you can also then go backwards in time to say, when were these vulnerabilities

10:10.000 --> 10:18.000
here in this project? Which is basically saying: there was a vulnerability between this version and this version, so at what time was

10:18.000 --> 10:27.000
my project depending on those versions, and how long did it take me to resolve it and upgrade to a version that is no longer affected by that particular CVE?

10:27.000 --> 10:34.000
Again, depending on the kind of project that you run this on, that could give you an overwhelming amount of things and maybe a heart attack.

10:34.000 --> 10:41.000
So, you know, you can also say, can you make me an SBOM? Because some people really, really like SBOMs.

10:41.000 --> 10:44.000
It's basically exactly the same data as I had everywhere else.

10:44.000 --> 10:50.000
It's just in a particular format, and there are different flags because of course there are two different standards for SBOMs.

10:50.000 --> 10:59.000
So you can say, I want an SPDX SBOM, or I want a CycloneDX SBOM, or I want an XML SBOM for some reason.

10:59.000 --> 11:02.000
You can also bisect. So this is cool.

11:02.000 --> 11:08.000
You can then say, I know that a dependency has screwed me at some point, but I don't know exactly when.

11:08.000 --> 11:14.000
So can we do a semantic thing to just step through and try and find when a dependency changed.

11:14.000 --> 11:18.000
Here's a bad commit that I know this is bad and here's a commit that is good.

11:18.000 --> 11:26.000
Step through, do a binary search, until we find where it was introduced and what version it changed to.
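
NOTE
Editor's sketch, not the project's implementation: the binary search described here, reduced to a list that holds the version of one dependency at each commit, oldest first. The function name, the monotonic "good then bad" assumption and the sample history are all hypothetical.
```python
# Hypothetical sketch: binary search over an ordered history, where each entry
# is the version of a single dependency recorded at that commit.
def first_bad_index(versions, is_bad):
    # Assumes versions[0] is good, versions[-1] is bad, and once bad it stays bad.
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return hi
# Illustrative data only, not taken from the talk.
history = ["1.0.0", "1.0.0", "1.1.0", "1.1.0", "2.0.0", "2.0.0", "2.0.1"]
idx = first_bad_index(history, lambda v: v.startswith("2."))
introduced = (history[idx - 1], history[idx])
```
Here the search lands on the first commit where the dependency moved from 1.1.0 to 2.0.0, after checking only two of the seven entries.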

11:26.000 --> 11:28.000
Five minutes okay cool.

11:28.000 --> 11:35.000
Now we get into the weird and wild world of how far down the rabbit hole Andrew can go.

11:35.000 --> 11:39.000
So intrinsic data, extrinsic data.

11:39.000 --> 11:43.000
What if we actually just start driving the package managers themselves?

11:43.000 --> 11:50.000
So I then made a set of commands that basically map over all the different package managers'

11:50.000 --> 11:55.000
command lines, to say: when you say add a dependency, it means this in npm,

11:55.000 --> 11:56.000
it means this in RubyGems,

11:56.000 --> 11:58.000
it means this in uv.

11:58.000 --> 12:03.000
For 35, I think, different package managers, to be able to say: can you add a thing?

12:03.000 --> 12:04.000
Can you update a thing?

12:04.000 --> 12:06.000
Can you update everything?

12:06.000 --> 12:08.000
Or can you remove a thing?
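
NOTE
Editor's sketch, not the project's code: a lookup table from a generic action to each tool's own command line, covering three managers and two actions. The table layout and the build_command helper are assumptions; the subcommands shown are the tools' standard add and remove commands, with Bundler standing in for RubyGems projects.
```python
# Hypothetical sketch: map a generic action onto each package manager's CLI.
COMMANDS = {
    "npm": {"add": ["npm", "install"], "remove": ["npm", "uninstall"]},
    "bundler": {"add": ["bundle", "add"], "remove": ["bundle", "remove"]},
    "uv": {"add": ["uv", "add"], "remove": ["uv", "remove"]},
}
def build_command(manager, action, package):
    # Returns an argument list suitable for subprocess.run; nothing is executed here.
    try:
        base = COMMANDS[manager][action]
    except KeyError:
        raise ValueError(f"unsupported: {manager} {action}") from None
    return base + [package]
```
A caller could then ask for build_command("uv", "add", "requests") without knowing which tool sits underneath.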

12:08.000 --> 12:12.000
What this can actually enable you to do is say: goodbye, Dependabot.

12:12.000 --> 12:13.000
I don't care about you anymore.

12:13.000 --> 12:18.000
I'm just going to do a git packages update on a cron every so often.

12:18.000 --> 12:22.000
And all of that stuff just happens by querying the registry.

12:22.000 --> 12:28.000
Getting the updated command and then pumping it back in and running that against each individual dependency.

12:28.000 --> 12:30.000
Which then gets committed back in.

12:30.000 --> 12:33.000
And then it's all, you know, happy.

12:33.000 --> 12:39.000
And we've totally papered over the individual differences between each package manager client.

12:39.000 --> 12:42.000
Which means you don't need to then worry about it.

12:42.000 --> 12:45.000
But also you can have one CI command that just says,

12:45.000 --> 12:46.000
I'll do this for whatever.

12:46.000 --> 12:50.000
I actually don't care about what the underlying pieces are.

12:50.000 --> 12:54.000
And we start to be able to treat individual package managers in the same way.

12:54.000 --> 12:59.000
And build tools that think about things at a higher level rather than just getting stuck here.

12:59.000 --> 13:09.000
You can also say, can you give me a schema for the database so that I can write my own SQL queries?

13:09.000 --> 13:12.000
Because I really like doing that.

13:12.000 --> 13:19.000
Very simple one is just to be like which of these dependencies have been changed the most.

13:19.000 --> 13:25.000
But, you know, the world is your oyster; with that schema you can then write SQL queries.
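
NOTE
Editor's sketch: what a "most changed dependencies" query could look like against an in-memory SQLite database. The table and column names are invented for illustration; the real schema should be taken from the tool's own schema output.
```python
import sqlite3
# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dependency_changes (commit_sha TEXT, name TEXT, old_version TEXT, new_version TEXT)"
)
rows = [
    ("a1", "rails", "7.0.0", "7.0.1"),
    ("b2", "rails", "7.0.1", "7.1.0"),
    ("c3", "puma", "5.6.0", "6.0.0"),
    ("d4", "rails", "7.1.0", "7.1.1"),
]
conn.executemany("INSERT INTO dependency_changes VALUES (?, ?, ?, ?)", rows)
# Count change records per package, most frequently changed first.
most_changed = conn.execute(
    "SELECT name, COUNT(*) AS n FROM dependency_changes GROUP BY name ORDER BY n DESC, name"
).fetchall()
```
With this sample data, rails comes out on top with three recorded changes and puma follows with one.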

13:25.000 --> 13:36.000
Or you could just give it to Claude and say, Claude, go ham on this and tell me, you know, what are the weirdest and most wonderful things you can find in this database.

13:36.000 --> 13:41.000
The cool thing about this is it is made up of a number of small components.

13:41.000 --> 13:44.000
It's not one big honking mess of things.

13:44.000 --> 13:48.000
So there are libraries for parsing the manifest files.

13:48.000 --> 13:51.000
There are libraries for driving package managers.

13:51.000 --> 13:57.000
There's a library for fetching the registry metadata from upstream.

13:57.000 --> 14:04.000
There are libraries for parsing the licensing data and also for dealing with the version range resolution.

14:04.000 --> 14:10.000
Because each one of those package managers has a slightly different syntax for how you describe version ranges.

14:10.000 --> 14:13.000
Which becomes very useful when you're trying to work out:

14:13.000 --> 14:17.000
does a CVE affect me because I'm using this particular kind of package?
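
NOTE
Editor's sketch, not one of the libraries mentioned in the talk: the simplest form of a range check, treating an advisory as a half-open range from the introduced version up to the fixed version. It only handles plain dotted numeric versions; the helper names are hypothetical, and real ecosystems add pre-releases, epochs and operators that this ignores.
```python
# Hypothetical sketch: is the installed version inside [introduced, fixed)?
def parse(version):
    # "1.10.0" becomes (1, 10, 0) so that comparison is numeric, not textual.
    return tuple(int(part) for part in version.split("."))
def is_affected(installed, introduced, fixed):
    return parse(introduced) <= parse(installed) < parse(fixed)
```
Comparing tuples of integers rather than strings matters: as text, "1.10.0" sorts before "1.9.0", which would give the wrong answer.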

14:17.000 --> 14:31.000
Maybe in the future we could also even plug in VEX statements to be able to say, yes, we have a CVE, but actually this VEX statement means that we aren't actually affected by it, because we have the source code of the application there as well as the dependencies.

14:31.000 --> 14:39.000
And we use package URLs everywhere to basically normalize and treat it as an individual dependency.

14:39.000 --> 14:41.000
So, very cool.

14:41.000 --> 14:43.000
I encourage you to go check it out.

14:43.000 --> 14:45.000
We have a website and the Git repo.

14:45.000 --> 14:47.000
It's all MIT licensed.

14:47.000 --> 14:54.000
But there's one cool thing I wanted to add in at the end, which I started hacking on very recently and you think,

14:54.000 --> 14:55.000
but this is cool.

14:55.000 --> 14:56.000
It's all in my command line.

14:56.000 --> 15:02.000
It runs locally, but wouldn't it be great if it was integrated into Forgejo and Codeberg?

15:02.000 --> 15:08.000
So, these are the same libraries, but plugged into my experimental Forgejo.

15:08.000 --> 15:14.000
The same rich diff, the semantic understanding of what happened in this commit.

15:14.000 --> 15:16.000
It's really small.

15:16.000 --> 15:17.000
I apologize.

15:17.000 --> 15:22.000
Basically, green means something was added and orange means something was changed.

15:22.000 --> 15:25.000
You can see what version it went to and from.

15:25.000 --> 15:29.000
And this would be the same UI for whichever package manager you're using.

15:29.000 --> 15:31.000
That's just a diff driver.

15:31.000 --> 15:37.000
There is a diff driver built into Git packages, which can plug straight into Forgejo and just work.

15:37.000 --> 15:40.000
But then you can go farther than that because why not?

15:40.000 --> 15:41.000
Mad science.

15:41.000 --> 15:44.000
And I've got one minute.

15:44.000 --> 15:48.000
We can build a whole bloody dependency graph feature into Forgejo.

15:48.000 --> 15:53.000
So, this means we can then have all the features that GitHub's dependency graph has,

15:53.000 --> 15:55.000
but it's entirely open source.

15:55.000 --> 15:57.000
It fetches data from the registries.

15:57.000 --> 16:03.000
It's agnostic to how that happens because it's all using those low level libraries and those pieces.

16:03.000 --> 16:07.000
The work that I've done was in building those tools as components.

16:07.000 --> 16:11.000
As soon as you have that, then you're like, okay, well actually putting it all together.

16:11.000 --> 16:13.000
It's actually pretty easy.

16:13.000 --> 16:24.000
And we can then have a fully open source entirely like self hosted set of dependency information and kind of like intelligence tools.

16:24.000 --> 16:34.000
that can be used by anyone without needing to be on a centralized platform or to have a big security company selling you those services.

16:34.000 --> 16:35.000
So, that's what I got.

16:35.000 --> 16:36.000
And I got five seconds left.

16:36.000 --> 16:38.000
So, I haven't got time for questions.

16:38.000 --> 16:39.000
But thank you very much.

16:39.000 --> 16:42.000
Thank you.

