WEBVTT

00:00.000 --> 00:18.000
Okay, I'm going to be on this phone this time.

00:18.000 --> 00:20.000
Good morning.

00:20.000 --> 00:21.000
How's it going today?

00:21.000 --> 00:22.000
Great, ready?

00:22.000 --> 00:23.000
Excited.

00:23.000 --> 00:25.000
Okay, so my name is Michael Winser.

00:25.000 --> 00:28.000
I am the co-founder of a project called Alpha Omega.

00:29.000 --> 00:33.000
We started in 2022 with the goal of improving open source security.

00:33.000 --> 00:37.000
We have over the past several years donated around $20 million

00:37.000 --> 00:40.000
towards improving security and open source.

00:40.000 --> 00:43.000
It's a huge problem, obviously.

00:43.000 --> 00:46.000
And, you know, we started with the organization.

00:46.000 --> 00:49.000
Even if we have a lot of money, we can't do all the things.

00:49.000 --> 00:52.000
And so our focus has always been around creating change,

00:52.000 --> 00:56.000
catalyzing improvements that then take off and do things for themselves.

00:56.000 --> 00:59.000
And, pretty excited about the opportunity.

00:59.000 --> 01:01.000
When we think about how to do things,

01:01.000 --> 01:05.000
the Alpha side of it represents our ability to leverage change.

01:05.000 --> 01:09.000
Things where we can work with a specific individual in an entire ecosystem.

01:09.000 --> 01:12.000
So, one of my favorite examples is in the Python ecosystem,

01:12.000 --> 01:15.000
it was nobody's job there to worry about security.

01:15.000 --> 01:16.000
Or it was everybody's job.

01:16.000 --> 01:17.000
Same thing.

01:17.000 --> 01:20.000
And we were able to fund a security engineer in residence,

01:20.000 --> 01:25.000
named Seth Larson, who has continuously just improved the culture of security,

01:25.000 --> 01:30.000
the tooling of security, the standards of security across the entire Python ecosystem.

01:30.000 --> 01:34.000
And, in fact, his work has spread to other language ecosystems as well.

01:34.000 --> 01:37.000
The scale problem is actually really hard.

01:37.000 --> 01:39.000
And, this is one of the things we're talking about today,

01:39.000 --> 01:43.000
which is we want to get solutions that can just be applied to the hundreds of thousands of projects

01:43.000 --> 01:48.000
that are not directly tied to either a language ecosystem or a package repository

01:48.000 --> 01:50.000
or some sort of point of leverage.

01:50.000 --> 01:54.000
How do we solve for the rest of everything?

01:54.000 --> 01:56.000
And, it's not easy.

01:56.000 --> 02:00.000
And, I want to talk about just the trends that we're facing.

02:00.000 --> 02:03.000
So, as a maintainer of an open-source project,

02:03.000 --> 02:06.000
you know, you might build out an SBOM of your stuff

02:06.000 --> 02:08.000
and you have a bunch of stuff upstream.

02:08.000 --> 02:13.000
You're consuming a large amount of open-source packages.

02:13.000 --> 02:17.000
And, in fact, I would say until sort of the xz-utils event,

02:17.000 --> 02:19.000
people just assumed that everything I get from upstream is great.

02:19.000 --> 02:22.000
It came down on the back of a unicorn with rainbows and everything

02:22.000 --> 02:24.000
and I don't have to worry about that problem, right?

02:24.000 --> 02:28.000
Which is ironically, just as bad as all the corporations we complained about

02:28.000 --> 02:31.000
consuming our open-source without giving a shit about how it gets made.

02:31.000 --> 02:34.000
And, these numbers are not good, right?

02:34.000 --> 02:35.000
They only get worse.

02:35.000 --> 02:36.000
They get bigger and bigger and bigger.

02:36.000 --> 02:39.000
We get logjams as well, where people can't upgrade,

02:39.000 --> 02:42.000
so they can't patch the vulnerability and now you have a problem as well.

02:42.000 --> 02:49.000
And, so the force that this represents creates a lot of work, right?

02:49.000 --> 02:54.000
As a maintainer, you now have to look at every CVE in your upstream

02:54.000 --> 02:56.000
or as a consumer, same problem.

02:56.000 --> 02:58.000
And, you have to decide, do I upgrade or not?

02:58.000 --> 02:59.000
Is this a risk to me or not?

02:59.000 --> 03:00.000
Right?

03:00.000 --> 03:02.000
And, the default answer is upgrade all the things.

03:02.000 --> 03:06.000
Which, if you look at how it multiplies across your work,

03:06.000 --> 03:07.000
it's one thing.

03:07.000 --> 03:10.000
But now if you imagine a deep hierarchy of projects,

03:10.000 --> 03:13.000
there's a geometric explosion of work.

03:13.000 --> 03:15.000
Shared across lots of people, great, okay?

03:15.000 --> 03:18.000
It's shared, but it's still a ton of work and it will not all get done.

03:18.000 --> 03:23.000
And so that cascade of work here is really what creates the risk

03:23.000 --> 03:24.000
that we're worried about.

03:24.000 --> 03:26.000
So, is that the last of my slides?

03:26.000 --> 03:27.000
Think so?

03:27.000 --> 03:28.000
No, okay.

03:28.000 --> 03:29.000
Thank you.

03:29.000 --> 03:32.000
So, how can we reduce the geometric pain here?

03:32.000 --> 03:33.000
Right?

03:33.000 --> 03:35.000
Well, you can't change the geometry of it, but you can change the constants

03:35.000 --> 03:37.000
and some of the multipliers here.

03:37.000 --> 03:38.000
Right?

03:38.000 --> 03:41.000
And so, with the span of vulnerabilities, you could look at it and say,

03:41.000 --> 03:46.000
well, not all of these vulnerabilities are actually going to affect my code.

03:46.000 --> 03:48.000
And everybody can do that downstream.

03:48.000 --> 03:50.000
Then we take this very big fan-out,

03:50.000 --> 03:53.000
and we can hopefully narrow the volume down and make it smaller,

03:53.000 --> 03:55.000
and make it manageable at scale.

03:55.000 --> 03:58.000
Right? And that's really what we're here to talk about today.

03:58.000 --> 04:00.000
So, with that, I'll tee it off to Piotr.

04:00.000 --> 04:01.000
Thanks, very much.

04:09.000 --> 04:10.000
All right.

04:10.000 --> 04:12.000
Hi, my name is Piotr Karwasz.

04:12.000 --> 04:15.000
I'm a maintainer of Apache Log4j and Commons,

04:15.000 --> 04:19.000
and a member of the security team of the ASF.

04:19.000 --> 04:23.000
So, this is FOSDEM, so you certainly know what an SBOM is.

04:23.000 --> 04:28.000
How many of you have already heard of VEXes?

04:28.000 --> 04:30.000
Okay.

04:30.000 --> 04:36.000
It's a big number, but for those that never heard about this,

04:36.000 --> 04:41.000
a VEX is a machine-readable format to express the fact

04:41.000 --> 04:46.000
that a vulnerability contained in your application

04:46.000 --> 04:48.000
is actually exploitable or not.

04:48.000 --> 04:53.000
The abbreviation stands for Vulnerability Exploitability eXchange.
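
NOTE
As a concrete illustration, here is a minimal sketch of what such a machine-readable statement can look like, using the CycloneDX VEX encoding (CSAF VEX and OpenVEX are alternative encodings). The package ref is an invented placeholder; the CVE id is the Parquet one discussed later in this talk.

```python
import json

# A minimal CycloneDX-style VEX statement: it asserts that a known CVE
# in a dependency does NOT affect this application, and why.
# "pkg:maven/org.example/app@1.0.0" is a hypothetical component ref.
vex = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "vulnerabilities": [
        {
            "id": "CVE-2025-30065",
            "analysis": {
                "state": "not_affected",
                "justification": "code_not_reachable",
                "detail": "The vulnerable method is never called from "
                          "this application's code paths.",
            },
            "affects": [{"ref": "pkg:maven/org.example/app@1.0.0"}],
        }
    ],
}

document = json.dumps(vex, indent=2)
```

Downstream tools can then filter scanner findings against statements like this one instead of asking a human to re-triage the CVE.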

04:53.000 --> 04:58.000
OpenSSF produced a very nice document,

04:58.000 --> 05:01.000
a couple of weeks ago, I think,

05:02.000 --> 05:04.000
where they analyze where these are used.

05:04.000 --> 05:08.000
So, we know that big companies like Microsoft, Red Hat, OpenSSF,

05:08.000 --> 05:10.000
Cisco, ServiceNow.

05:10.000 --> 05:12.000
They all use VEXes.

05:12.000 --> 05:16.000
Some of those companies impose VEXes as a requirement

05:16.000 --> 05:19.000
for all their contractors.

05:19.000 --> 05:24.000
And yeah, why are VEXes important these days?

05:24.000 --> 05:29.000
Well, this was already in Anthony's slides.

05:29.000 --> 05:33.000
The CRA says, we found no exploitable vulnerabilities,

05:33.000 --> 05:39.000
except you have to assess what is actually exploitable.

05:39.000 --> 05:40.000
Yeah.

05:40.000 --> 05:46.000
And yeah, I knew about VEXes.

05:46.000 --> 05:51.000
I learned about VEXes in 2023 from the source,

05:51.000 --> 05:53.000
from Steve Springett.

05:53.000 --> 05:56.000
And he warned me: yeah, it's very, very expensive.

05:56.000 --> 05:58.000
Of course, I had to verify that.

05:58.000 --> 06:02.000
And so that's what happens if you do it by hand.

06:02.000 --> 06:08.000
Every day, there are around 100 CVEs

06:08.000 --> 06:11.000
that are published.

06:11.000 --> 06:16.000
You need to check if the CVE actually applies to your application.

06:16.000 --> 06:19.000
Nowadays, we have SBOMs, so that's easy.

06:19.000 --> 06:23.000
Fortunately, then you have to understand what's written

06:23.000 --> 06:24.000
in the CVE.

06:24.000 --> 06:28.000
Some CVEs are very well worded, some are very, very bad.

06:28.000 --> 06:34.000
Then you have to dive into all the calls,

06:34.000 --> 06:38.000
tracing from your application to the library.

06:38.000 --> 06:43.000
If it is a library which is seven dependency levels deep,

06:43.000 --> 06:49.000
you have to look at a lot of code, assess if this is

06:49.000 --> 06:52.000
exploitable, and repeat for each version,

06:52.000 --> 06:58.000
because each version changes the code path.

06:58.000 --> 06:59.000
OK.

06:59.000 --> 07:06.000
And so, well, this is very expensive.

07:06.000 --> 07:12.000
Last year, I met Munawar, who is here in the room,

07:12.000 --> 07:16.000
so thank you, Alexis, for organizing the dev room.

07:16.000 --> 07:23.000
And we decided to experiment whether we can produce VEXes

07:23.000 --> 07:25.000
for all open-source projects.

07:25.000 --> 07:29.000
So we know that VEXes for an open-source project

07:29.000 --> 07:35.000
will certainly improve the confidence that commercial vendors

07:35.000 --> 07:42.000
or downstream projects will have in the project, but unfortunately,

07:42.000 --> 07:46.000
at the time, they were very, very expensive.

07:46.000 --> 07:47.000
OK.

07:47.000 --> 07:51.000
So let's talk about the cost that Piotr and Michael mentioned.

07:51.000 --> 07:55.000
So if you are a large organization,

07:55.000 --> 08:01.000
you encounter about 700,000 update decisions every year.

08:01.000 --> 08:05.000
And if for each one of them, when you're generating the VEX documents,

08:05.000 --> 08:09.000
it takes on average 10 hours, that takes about 7 million hours,

08:10.000 --> 08:14.000
which is about 3,365 person-years.

08:14.000 --> 08:19.000
Even if you get cheap labor, that's about 400 million dollars per year.
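
NOTE
The arithmetic behind these figures can be replayed directly. The 10-hours-per-VEX figure is the speaker's estimate; the hourly rate below is an assumption, back-derived so the total lands near the quoted ~$400 million per year.

```python
# Rough cost model for manual VEX generation at a large organization.
DECISIONS_PER_YEAR = 700_000   # update decisions, per the talk
HOURS_PER_VEX = 10             # average analysis time per decision
WORK_HOURS_PER_YEAR = 2080     # 52 weeks * 40 hours

total_hours = DECISIONS_PER_YEAR * HOURS_PER_VEX   # 7,000,000 hours
person_years = total_hours / WORK_HOURS_PER_YEAR   # ~3,365 person-years

HOURLY_RATE_USD = 57           # assumed "cheap labor" blended rate
annual_cost_usd = total_hours * HOURLY_RATE_USD    # ~$400M per year
```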

08:19.000 --> 08:23.000
That's why organizations find that it is very expensive.

08:23.000 --> 08:25.000
It's actually even more than that.

08:25.000 --> 08:29.000
This is, like, assuming the lower side of things,

08:29.000 --> 08:32.000
we have heard from experts in this area,

08:32.000 --> 08:36.000
that for those organizations, generating one VEX document

08:36.000 --> 08:39.000
requires about half a million dollars.

08:39.000 --> 08:45.000
So this is a very expensive problem right now.

08:45.000 --> 08:48.000
And if organizations find it expensive,

08:48.000 --> 08:51.000
it's just impossible for maintainers.

08:51.000 --> 08:55.000
So think of Apache Solr, with whom we're working,

08:55.000 --> 08:58.000
and they have about 460 dependencies.

08:58.000 --> 09:04.000
About 300 CVEs are found on average in these dependencies every year.

09:04.000 --> 09:08.000
Again, with 10 hours per VEX, that's about 3,000 hours,

09:08.000 --> 09:14.000
about 1.5 person-years of dedicated work in order to generate that.

09:14.000 --> 09:17.000
That's also impossible for maintainers.

09:17.000 --> 09:21.000
So that's why, even though this is needed,

09:21.000 --> 09:26.000
the adoption right now is not there,

09:26.000 --> 09:31.000
because we are still trying to figure out how to do it at scale.

09:31.000 --> 09:36.000
If we are, however, able to do that, there are benefits.

09:36.000 --> 09:39.000
There was a study from the same group,

09:39.000 --> 09:42.000
where they did a post-analysis of their SCA results,

09:42.000 --> 09:47.000
and they found out that 97% of the bugs that they find,

09:47.000 --> 09:51.000
even when they do this SCA, are actually false positives,

09:51.000 --> 09:53.000
as in those are not reachable.

09:53.000 --> 09:56.000
So if, with VEX documents or somehow else, you find out

09:56.000 --> 09:59.000
whether a vulnerability reaches you or not,

09:59.000 --> 10:03.000
and you can eliminate the 97% of the toil,

10:03.000 --> 10:08.000
then all of those compounding numbers suddenly become tenable.

10:08.000 --> 10:13.000
So that's the idea where we come in.

10:13.000 --> 10:16.000
So how do we generate VEXes? Here's a simple idea.

10:16.000 --> 10:21.000
So let's say you have an upstream package A,

10:21.000 --> 10:25.000
which is used by a package B, which is used by a package C,

10:25.000 --> 10:27.000
which is used by a package D,

10:27.000 --> 10:32.000
and there is a vulnerability that was reported in package A.

10:32.000 --> 10:37.000
The question is, is that vulnerability reachable from package D or not?

10:37.000 --> 10:39.000
So how do we define that reachability?

10:39.000 --> 10:42.000
The way we do that is we look into,

10:42.000 --> 10:46.000
we create call graphs of A, B, C, and D,

10:46.000 --> 10:49.000
then try to find a path from D,

10:49.000 --> 10:52.000
which is the downstream package of interest,

10:52.000 --> 10:59.000
all the way to the place where the vulnerable code resides in A.

10:59.000 --> 11:05.000
So the first problem that we are solving here

11:05.000 --> 11:10.000
is to find out which part of A is vulnerable.

11:10.000 --> 11:13.000
So that's what we call root cause analysis.

11:13.000 --> 11:15.000
So we have a component for that:

11:15.000 --> 11:18.000
we have written an AI agent

11:18.000 --> 11:21.000
that we call the root cause analysis component of this puzzle.

11:21.000 --> 11:26.000
And what that does is given a particular CVE and a particular package,

11:26.000 --> 11:31.000
it identifies a set of methods or functions

11:31.000 --> 11:35.000
that when executed that vulnerability would be manifested.

11:35.000 --> 11:40.000
Then we also have to calculate the call graphs for each one of those,

11:40.000 --> 11:43.000
each one of those intermediary chains,

11:43.000 --> 11:46.000
and find those call graphs.

11:46.000 --> 11:49.000
And that's going to give you a map of the terrain,

11:49.000 --> 11:52.000
how the methods are calling each other.

11:52.000 --> 11:55.000
And once that is there, once those two components are there,

11:55.000 --> 11:59.000
then you can write a simple stitcher that's just going to hop from one method

11:59.000 --> 12:01.000
to another, and eventually you find out

12:01.000 --> 12:06.000
whether you are reaching to that particular vulnerability or not.
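
NOTE
A minimal sketch of the stitching idea described above, assuming each package ships a call graph as an edge list. All package and method names here are invented for illustration; the real tooling works on Java call graphs.

```python
from collections import deque

# One call graph per package (A, B, C, D), as "caller -> callee" edges.
# Cross-package edges appear wherever one package calls another's API.
call_graphs = {
    "D": [("D.main", "C.handle")],
    "C": [("C.handle", "B.parse")],
    "B": [("B.parse", "A.decode"), ("B.other", "A.safe")],
    "A": [("A.decode", "A.vulnerable")],  # root-cause method of the CVE
}

def stitch(graphs):
    """Merge the per-package edge lists into one adjacency map."""
    adj = {}
    for edges in graphs.values():
        for caller, callee in edges:
            adj.setdefault(caller, []).append(callee)
    return adj

def reachable(adj, start, target):
    """Breadth-first search: hop from method to method until we either
    hit the vulnerable method or run out of edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

adj = stitch(call_graphs)
```

If the vulnerable method is not reachable from the downstream entry point, that is exactly the evidence a "not affected, code not reachable" VEX statement encodes.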

12:06.000 --> 12:08.000
Now remember, this is,

12:08.000 --> 12:10.000
we are using the term reachability here.

12:10.000 --> 12:13.000
There is a stronger term, exploitability,

12:13.000 --> 12:16.000
which is: even though you are calling something, it may or may not be exploitable.

12:16.000 --> 12:18.000
That's an even harder problem.

12:18.000 --> 12:19.000
We are not even going there.

12:19.000 --> 12:22.000
We are just solving or trying to solve the reachability problem.

12:22.000 --> 12:26.000
So, as I explained, there are three components

12:26.000 --> 12:32.000
that we have released as open source, with work that was funded by Alpha-Omega.

12:32.000 --> 12:37.000
These three components find the root cause,

12:37.000 --> 12:39.000
generate the call graphs,

12:39.000 --> 12:43.000
and then also do that stitching in order to,

12:43.000 --> 12:46.000
like, create these VEX documents.

12:46.000 --> 12:50.000
And the link over there that tells the,

12:50.000 --> 12:52.000
that shows the repo org,

12:52.000 --> 12:56.000
that's in GitHub that stores these things,

12:56.000 --> 12:58.000
these components.

12:58.000 --> 13:01.000
Right now this is supporting only the Java ecosystem.

13:01.000 --> 13:04.000
So the call graph part is the one that is language-dependent.

13:04.000 --> 13:06.000
The root cause analysis,

13:06.000 --> 13:08.000
it looks at different kinds of,

13:08.000 --> 13:09.000
it's an AI agent,

13:09.000 --> 13:13.000
so it looks at different kinds of CVEs,

13:13.000 --> 13:18.000
and attempts to find the root cause in that particular language.

13:18.000 --> 13:20.000
But the call graph service is limited to Java.

13:20.000 --> 13:23.000
So that's why this whole thing that we are demonstrating

13:23.000 --> 13:27.000
is specific to the Java language, or Java packages.

13:27.000 --> 13:29.000
So how does this help?

13:29.000 --> 13:33.000
So let me tell you an anecdotal story that happened last year.

13:33.000 --> 13:36.000
So this was April 1, 2025,

13:36.000 --> 13:40.000
when there was a vulnerability found in Apache Parquet Avro.

13:40.000 --> 13:43.000
This was CVE-2025-30065.

13:43.000 --> 13:45.000
And around April 15,

13:45.000 --> 13:49.000
Piotr, who has access to these internal conversations

13:49.000 --> 13:51.000
inside the Apache ecosystem,

13:51.000 --> 13:53.000
brought this to our attention, like,

13:53.000 --> 13:55.000
okay, there's a conversation that's going on

13:55.000 --> 13:58.000
that should we fix that vulnerability or not.

13:58.000 --> 14:01.000
So let's try to see if we can,

14:01.000 --> 14:03.000
if we can fix that.

14:04.000 --> 14:06.000
If that is reachable or not.

14:06.000 --> 14:09.000
So the chain was very simple there.

14:09.000 --> 14:12.000
It was just Apache Hadoop calling Parquet.

14:12.000 --> 14:15.000
However, Parquet Avro is shaded.

14:15.000 --> 14:17.000
So if you are looking at binaries,

14:17.000 --> 14:19.000
it's not able to find that.

14:19.000 --> 14:21.000
However, we are working at source code level.

14:21.000 --> 14:23.000
So we are able to find that dependency chain.

14:23.000 --> 14:26.000
We started working on April 16.

14:26.000 --> 14:30.000
And at that time, the technology was not fully developed.

14:30.000 --> 14:32.000
So it was partially manual, which took some time.

14:32.000 --> 14:35.000
But by April 18, we generated the VEX documents.

14:35.000 --> 14:37.000
The result was that it was not reachable.

14:37.000 --> 14:42.000
So we, like, created the VEX document,

14:42.000 --> 14:45.000
created an explanation of what we found,

14:45.000 --> 14:48.000
and we shared it with that team.

14:48.000 --> 14:52.000
So we got the response that they were going to do the update

14:52.000 --> 14:54.000
anyway, for compatibility reasons,

14:54.000 --> 14:55.000
but they appreciated the fact.

14:55.000 --> 14:57.000
And this is like,

14:57.000 --> 15:01.000
eventually, like, whether this can be done in a CI/CD pipeline

15:01.000 --> 15:04.000
or some other way by those maintainers.

15:04.000 --> 15:06.000
So there's actually, if the tooling is available,

15:06.000 --> 15:10.000
there's active interest in people trying to adopt that.

15:10.000 --> 15:14.000
And there's also another side part of the story,

15:14.000 --> 15:17.000
that there were two other additional vulnerabilities

15:17.000 --> 15:19.000
that they were exploring at that particular point.

15:19.000 --> 15:21.000
Those two were, however,

15:21.000 --> 15:23.000
exploitable, both of them were reachable.

15:23.000 --> 15:24.000
And they did the fix.

15:24.000 --> 15:29.000
But one after 12 months, the other one after 18 months.

15:29.000 --> 15:32.000
So in the absence of VEX-like help,

15:32.000 --> 15:35.000
it's hard for people to prioritize the vulnerabilities,

15:35.000 --> 15:37.000
like, which one to fix, which one to prioritize,

15:37.000 --> 15:40.000
because there's so many of them coming at you,

15:40.000 --> 15:42.000
and which one you do have to fix.

15:42.000 --> 15:44.000
So we created those VEX documents,

15:44.000 --> 15:48.000
both of them were exploitable, and they were eventually fixed.

15:48.000 --> 15:51.000
So that's a specific case study that happened,

15:51.000 --> 15:56.000
that also motivated us to create these VEX documents

15:56.000 --> 16:01.000
and basically see how we can integrate that with the CI/CD pipelines

16:01.000 --> 16:05.000
of the Apache Software Foundation projects.

16:05.000 --> 16:09.000
So I just want to recap a little bit here.

16:09.000 --> 16:12.000
We have a lot of information flowing through the SBOM-driven

16:12.000 --> 16:19.000
CVE pipeline, which creates toil for everybody downstream.

16:19.000 --> 16:21.000
From individual maintainers along the way,

16:21.000 --> 16:23.000
to the end-user applications,

16:23.000 --> 16:26.000
whether it's Airflow or Solr or whatever,

16:26.000 --> 16:29.000
the toil is spectacular.

16:29.000 --> 16:32.000
Now, we already talked about how this is for Java today,

16:32.000 --> 16:34.000
this is actually based on work that was done

16:34.000 --> 16:36.000
for a project called Capslock,

16:36.000 --> 16:38.000
that does call graph analysis,

16:38.000 --> 16:40.000
initially was done for Go,

16:40.000 --> 16:43.000
and now there are versions of this for Java and Rust as well.

16:43.000 --> 16:46.000
It is a strong belief of mine that call graphs

16:46.000 --> 16:48.000
need to become normally available,

16:48.000 --> 16:51.000
because they provide all kinds of interesting opportunities.

16:51.000 --> 16:53.000
This is just one example of things that we can do

16:53.000 --> 16:56.000
when you have call graphs available to do static analysis

16:56.000 --> 16:59.000
and richer analysis on your code to understand

16:59.000 --> 17:01.000
where things are happening, what the risks are,

17:01.000 --> 17:03.000
even to generate more effective unit tests

17:03.000 --> 17:06.000
to validate your Hyrum's Law contracts.

17:06.000 --> 17:09.000
So this is a beginning of an experiment across

17:09.000 --> 17:13.000
a whole bunch of things about reducing toil for maintainers

17:13.000 --> 17:18.000
at every stage of a secure software development pipeline.

17:18.000 --> 17:21.000
So I think we're sort of approaching time here

17:21.000 --> 17:23.000
and we do have some time for questions as well.

17:23.000 --> 17:25.000
I am particularly glad to thank Piotr

17:25.000 --> 17:27.000
and Munawar for their work on this project

17:27.000 --> 17:29.000
to get us to this stage here.

17:29.000 --> 17:32.000
The most important thing you can do is engage.

17:32.000 --> 17:33.000
If this is interesting to you,

17:33.000 --> 17:35.000
if you want to be able to do this kind of thing in your world,

17:35.000 --> 17:37.000
show up,

17:37.000 --> 17:39.000
follow these links, start asking for help,

17:39.000 --> 17:41.000
see if you can use it,

17:41.000 --> 17:43.000
tell us what we're doing wrong, et cetera.

17:43.000 --> 17:45.000
This is a classic early stage open source project

17:45.000 --> 17:47.000
where all of your PRs are very welcome.

17:47.000 --> 17:50.000
So again, thank you both for the work you're doing.

17:50.000 --> 17:51.000
Thank you for the time today.

17:51.000 --> 17:52.000
We'll take some questions now.

17:52.000 --> 17:54.000
We have plenty of time because they've been warning me

17:54.000 --> 17:56.000
about time left and we like questions more than we like talking.

17:56.000 --> 17:58.000
So, thank you.

17:58.000 --> 17:59.000
Thank you.

17:59.000 --> 18:00.000
Thank you.

18:00.000 --> 18:02.000
Thank you.

18:02.000 --> 18:04.000
We're going to repeat the questions,

18:04.000 --> 18:06.000
but you're close, so I'll bring the microphone to you.

18:06.000 --> 18:09.000
So I would guess a vital ingredient

18:09.000 --> 18:12.000
of basically all of this,

18:12.000 --> 18:16.000
what you suggest, would be that the vulnerability report

18:16.000 --> 18:20.000
comes with a call graph, or describes under which conditions

18:20.000 --> 18:21.000
it could be used.

18:21.000 --> 18:24.000
I mean,

18:24.000 --> 18:28.000
I mean, this could be the CI/CD pipeline that we create.

18:28.000 --> 18:30.000
Oh, we're good.

18:30.000 --> 18:32.000
So, the CI/CD pipeline that we created,

18:32.000 --> 18:35.000
this runs as a GitHub action,

18:35.000 --> 18:37.000
and this finds all of this info.

18:37.000 --> 18:41.000
You do have to provide some stuff, but basically right now,

18:41.000 --> 18:44.000
there's this metadata GitHub repository.

18:44.000 --> 18:48.000
That's where we are storing the results that we're calculating, for the future.

18:48.000 --> 18:49.000
Like other people can use it.

18:49.000 --> 18:51.000
Once we calculate a call graph,

18:51.000 --> 18:54.000
there's no reason for somebody else to calculate that call graph again.

18:54.000 --> 18:56.000
So right now we're doing the easiest way,

18:56.000 --> 18:58.000
which is storing this in GitHub,

18:58.000 --> 19:00.000
but we are open to, I mean,

19:00.000 --> 19:04.000
that's actually one of the areas that we mentioned in our future work,

19:04.000 --> 19:09.000
that we should really figure out how to make it reusable

19:09.000 --> 19:11.000
and for others.
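
NOTE
The reuse idea can be sketched as a simple content store keyed by package and version; here a local directory stands in for the metadata GitHub repository the speakers mention, and the package coordinates and edges are illustrative.

```python
import json
import pathlib
import tempfile

# Once a call graph is computed for package@version, store it under a
# stable key so other consumers can fetch it instead of recomputing.
store = pathlib.Path(tempfile.mkdtemp())

def cache_key(package, version):
    return f"{package}@{version}.json"

def put_call_graph(package, version, edges):
    (store / cache_key(package, version)).write_text(json.dumps(edges))

def get_call_graph(package, version):
    path = store / cache_key(package, version)
    return json.loads(path.read_text()) if path.exists() else None

# Illustrative entry: coordinates and edge are placeholders.
put_call_graph("org.apache.parquet:parquet-avro", "1.15.0",
               [["Reader.read", "Schema.parse"]])
```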

19:11.000 --> 19:16.000
Yes, so we can also consider integrating it into the projects.

19:16.000 --> 19:19.000
So we just saw that because

19:19.000 --> 19:23.000
among these dependencies there are many projects that I know people from,

19:23.000 --> 19:28.000
so it's also easy to convince those projects to publish it in their GitHub repo.

19:28.000 --> 19:30.000
Follow me to the back.

19:30.000 --> 19:31.000
Yes.

19:31.000 --> 19:33.000
If you have a call graph,

19:33.000 --> 19:35.000
if you have reachability,

19:35.000 --> 19:39.000
can you update the libraries to eliminate symbols,

19:39.000 --> 19:43.000
functions, libraries that are unused,

19:43.000 --> 19:47.000
such that you don't actually have those things present,

19:47.000 --> 19:49.000
if they're never called.

19:49.000 --> 19:52.000
Okay, I repeat the question.

19:52.000 --> 19:55.000
If you have the call graph,

19:55.000 --> 19:56.000
and you have the reachability,

19:56.000 --> 20:00.000
can you update the libraries to remove the symbols?

20:00.000 --> 20:04.000
That are not used.

20:04.000 --> 20:06.000
Okay.

20:06.000 --> 20:09.000
So this is a problem I've been thinking about for a while.

20:09.000 --> 20:11.000
Sorry, I'm off camera.

20:11.000 --> 20:13.000
There you go, I was hiding.

20:13.000 --> 20:16.000
And there's a lot of ways to think about this problem, right?

20:16.000 --> 20:20.000
So this particular project is about helping you understand the reachability

20:20.000 --> 20:21.000
and exploitability ultimately,

20:21.000 --> 20:23.000
or aiding your evaluation of that,

20:23.000 --> 20:27.000
in a set of projects; we are not modifying your dependencies to suit your needs.

20:27.000 --> 20:31.000
But it is obviously very interesting to think about what you can do.

20:31.000 --> 20:33.000
I understand the idea here.

20:33.000 --> 20:38.000
What you can do to effectively fork and maintain a subset of your dependencies,

20:38.000 --> 20:40.000
so you're reducing your attack surface in a variety of ways,

20:40.000 --> 20:42.000
based on your usage of it.

20:42.000 --> 20:46.000
Similarly, you can also generate unit tests that validate your contract,

20:46.000 --> 20:49.000
or shim layers, that sanitize your calls,

20:49.000 --> 20:52.000
to give you exactly that kind of control at that layer.

20:52.000 --> 20:55.000
So this is why I'm excited about making call graphs more normal,

20:55.000 --> 20:58.000
because this kind of operation used to be a lot of work to think about,

20:58.000 --> 21:03.000
or a heavy maintenance burden, and now becomes a normal thing you could imagine.

21:08.000 --> 21:11.000
[partially inaudible] a thing where it says, go look at this dependency,

21:11.000 --> 21:13.000
evaluate the calling patterns,

21:13.000 --> 21:16.000
generate unit tests for me that validate my contract with this.

21:16.000 --> 21:18.000
I call it Hiram's tests,

21:18.000 --> 21:20.000
and it worked really well.
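
NOTE
The "Hyrum's test" idea can be sketched like this: record how your code actually calls a dependency, then pin that observed behavior so an upgrade that changes it fails loudly. `slugify` is a stand-in for any third-party function, not a real library.

```python
def slugify(title):
    # Imagine this function lives in an upstream package we depend on.
    return title.strip().lower().replace(" ", "-")

# Input/output pairs harvested from our own call sites, e.g. via the
# call graph plus tracing: the behavior our code implicitly relies on.
observed_contract = [
    ("Hello World", "hello-world"),
    ("  Leading space", "leading-space"),
]

def check_contract(fn, contract):
    """Replay the recorded calls and verify the dependency still
    behaves the way we observed it behaving."""
    return all(fn(inp) == out for inp, out in contract)
```

Running `check_contract` against each new dependency version turns Hyrum's Law from a surprise into a test failure.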

21:20.000 --> 21:21.000
It was great.

21:21.000 --> 21:23.000
So we'll take the next question now.

21:24.000 --> 21:26.000
On the way back.

21:45.000 --> 21:49.000
So right now, the question is,

21:49.000 --> 21:51.000
if we have multiple languages,

21:51.000 --> 21:53.000
can we use this,

21:53.000 --> 21:59.000
like basically use it to go through call graphs across different language boundaries, right?

21:59.000 --> 22:00.000
Yes.

22:00.000 --> 22:03.000
The answer is, right now we don't do that,

22:03.000 --> 22:06.000
but these are created in a modular way,

22:06.000 --> 22:08.000
as in each of the call graphs are in separate files,

22:08.000 --> 22:10.000
and so if you're doing stitching,

22:10.000 --> 22:13.000
once you identify the functions,

22:13.000 --> 22:16.000
it's just a matter of just stitching one across another.

22:16.000 --> 22:18.000
So theoretically it can be done.

22:18.000 --> 22:19.000
Back in my academic years,

22:19.000 --> 22:21.000
like 10 years ago,

22:21.000 --> 22:23.000
my students did the original work

22:23.000 --> 22:25.000
on doing multiple languages analysis,

22:25.000 --> 22:28.000
static analysis of stuff.

22:28.000 --> 22:30.000
It excites me,

22:30.000 --> 22:31.000
it's just not done yet,

22:31.000 --> 22:33.000
but theoretically it can be done.

22:33.000 --> 22:34.000
Thank you.

22:34.000 --> 22:35.000
Yes.

22:35.000 --> 22:36.000
I just want to say,

22:36.000 --> 22:38.000
we just finished getting this working for the first time

22:38.000 --> 22:41.000
and you're already asking for cross-platform, cross-language support.

22:41.000 --> 22:42.000
Okay.

22:42.000 --> 22:43.000
So I have a statement for you:

22:43.000 --> 22:44.000
PRs welcome, sir.

22:46.000 --> 22:47.000
Yes.

22:47.000 --> 22:48.000
I'll be back to you.

22:48.000 --> 22:49.000
Yes.

22:49.000 --> 22:53.000
[Audience question, largely inaudible, asking about the call graph tooling and where to find it.]

23:09.000 --> 23:11.000
If you just look for Google Capslock,

23:11.000 --> 23:17.000
you'll find the original project that was built.

23:17.000 --> 23:18.000
Generally, it's a call graph,

23:18.000 --> 23:20.000
but what it's also really doing is analysis

23:20.000 --> 23:22.000
of the underlying capabilities that your

23:22.000 --> 23:23.000
call graph enables.

23:23.000 --> 23:26.000
System calls down to network environment,

23:26.000 --> 23:30.000
file system, whatever access the standard libraries are

23:30.000 --> 23:31.000
providing.

23:31.000 --> 23:34.000
So you can understand what capabilities your dependency graph

23:34.000 --> 23:36.000
is imposing on your process,

23:36.000 --> 23:37.000
and essentially enabling.
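
NOTE
A toy version of the capability propagation just described: leaf functions in the standard library carry intrinsic capabilities, and a function inherits every capability of anything it can reach. The function names and capability labels are invented; Capslock does this properly for Go.

```python
# Call graph edges, and intrinsic capabilities of standard-library leaves.
edges = {
    "app.main": ["lib.fetch", "lib.log"],
    "lib.fetch": ["net.Dial"],
    "lib.log": ["os.WriteFile"],
}
intrinsic = {
    "net.Dial": {"NETWORK"},
    "os.WriteFile": {"FILESYSTEM"},
}

def capabilities(fn, seen=None):
    """Union of intrinsic capabilities over everything reachable from fn."""
    seen = set() if seen is None else seen
    if fn in seen:
        return set()
    seen.add(fn)
    caps = set(intrinsic.get(fn, set()))
    for callee in edges.get(fn, []):
        caps |= capabilities(callee, seen)
    return caps
```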

23:37.000 --> 23:42.000
But that was a go-specific thing.

23:42.000 --> 23:46.000
Alpha-Omega funded ports or implementations of Capslock

23:46.000 --> 23:50.000
and call graph analysis for both Rust and Java.

23:50.000 --> 23:52.000
Those are still ongoing efforts,

23:52.000 --> 23:55.000
but that's sort of the state of affairs right now.

23:55.000 --> 23:59.000
The question is, why can't the new Capslock

23:59.000 --> 24:01.000
be used directly?

24:01.000 --> 24:04.000
The question is, why can't we just use Capslock?

24:04.000 --> 24:06.000
So Capslock is basically finding,

24:06.000 --> 24:08.000
like creating these call graphs to find out

24:08.000 --> 24:10.000
whether capabilities reach you or not.

24:10.000 --> 24:13.000
We're talking about whether vulnerabilities reach you or not.

24:13.000 --> 24:15.000
It's essentially the same problem.

24:15.000 --> 24:18.000
In our case, too, the call graphs that we generate,

24:18.000 --> 24:21.000
they're generated in the Capslock format.

24:21.000 --> 24:23.000
So they're usable.

24:23.000 --> 24:26.000
And that's why this can also go across multiple boundaries.

24:26.000 --> 24:28.000
Once we create these across other ecosystems,

24:28.000 --> 24:31.000
there's a standard format that was originally created by Google

24:31.000 --> 24:33.000
that is now usable by everyone.

24:33.000 --> 24:36.000
And there's a working group if you want to join that too.

24:36.000 --> 24:37.000
Okay, here.

24:37.000 --> 24:48.000
So the question was, I'm an open source maintainer,

24:48.000 --> 24:50.000
and I occasionally, sometimes,

24:50.000 --> 24:52.000
fix vulnerabilities in my own code.

24:52.000 --> 24:54.000
I mean, that seems nice.

24:54.000 --> 24:58.000
And so you're asking, why do I care about this?

24:58.000 --> 25:02.000
I think that, as a maintainer,

25:02.000 --> 25:04.000
fixing vulnerabilities seems like a good idea.

25:04.000 --> 25:09.000
But as a maintainer, you also care about how people downstream

25:09.000 --> 25:12.000
are having to deal with the changes you make.

25:12.000 --> 25:14.000
So when you make a patch to a vulnerability,

25:14.000 --> 25:16.000
you're saying, I fix this problem, right?

25:16.000 --> 25:17.000
Great.

25:17.000 --> 25:21.000
And if everybody could pick up your update immediately

25:21.000 --> 25:24.000
all through the cascading effects of that transitive graph,

25:24.000 --> 25:26.000
you know, you fix all the vulnerabilities.

25:26.000 --> 25:29.000
Everybody just picks up updates, no problem.

25:29.000 --> 25:32.000
But in practice, the cost for everybody to pick up

25:32.000 --> 25:35.000
your updated version, you know, is real.

25:35.000 --> 25:37.000
People will use Dependabot to pick up

25:37.000 --> 25:39.000
the latest version right up until the build breaks,

25:39.000 --> 25:42.000
and then they pin it and don't change it until somebody forces them.

25:42.000 --> 25:45.000
And so your act of doing vulnerability

25:45.000 --> 25:48.000
reachability analysis on your own vulnerabilities,

25:48.000 --> 25:51.000
creates a statement that tells people and helps people

25:51.000 --> 25:54.000
understand whether or not they need to take up your fix right now,

25:54.000 --> 25:56.000
or, to that gentleman's point up there:

25:56.000 --> 25:58.000
We don't use that part of the code or we sanitize our inputs,

25:58.000 --> 25:59.000
so we're okay.

25:59.000 --> 26:01.000
We're not going to pick up his patch this week.

26:01.000 --> 26:05.000
We'll wait until our patch Tuesday next week and do it then.
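
A downstream claim like "we sanitize our inputs, so we're not affected" can be published machine-readably as a VEX statement. Below is a minimal sketch in the OpenVEX format; the CVE number, product identifier, author, and document URL are placeholders, not real values.

```json
{
  "@context": "https://openvex.dev/ns/v0.2.0",
  "@id": "https://example.com/vex/2024-001",
  "author": "Example Downstream Project",
  "timestamp": "2024-01-01T00:00:00Z",
  "version": 1,
  "statements": [
    {
      "vulnerability": { "name": "CVE-XXXX-YYYYY" },
      "products": [ { "@id": "pkg:example/our-app@1.2.3" } ],
      "status": "not_affected",
      "justification": "vulnerable_code_cannot_be_controlled_by_adversary",
      "impact_statement": "Inputs reaching the vulnerable parser are sanitized upstream."
    }
  ]
}
```

Scanners that understand VEX can then suppress the alert automatically, instead of every consumer re-deriving the same "we're okay" conclusion by hand.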

26:05.000 --> 26:10.000
Do I just do that?

26:10.000 --> 26:11.000
Yeah.

26:11.000 --> 26:15.000
Well, the VEX is something

26:15.000 --> 26:21.000
that is produced not by the project that has the vulnerability

26:21.000 --> 26:24.000
but by downstream projects.

26:24.000 --> 26:28.000
So I started being interested in VEX

26:29.000 --> 26:32.000
when Kafka wanted to integrate Log4j,

26:32.000 --> 26:37.000
and they said, oh wait, Log4j depends on many parsers,

26:37.000 --> 26:42.000
and those parsers generate 70 CVEs per minute,

26:42.000 --> 26:51.000
which would generate a lot of work for Kafka.

26:51.000 --> 26:54.000
So my answer to that is, no,

26:54.000 --> 26:56.000
if we generate a VEX file, you know, at the

26:56.000 --> 27:01.000
project, and we say that all those parsers are only used

27:01.000 --> 27:06.000
on trusted code, so all the VEX statements will be "not affected,"

27:06.000 --> 27:10.000
then there is still less work upstream.

27:10.000 --> 27:11.000
Downstream, I mean.

27:11.000 --> 27:12.000
All right.

27:12.000 --> 27:13.000
We are out of time.

27:13.000 --> 27:15.000
We'll continue to take questions outside,

27:15.000 --> 27:17.000
where we can all breathe fresh air.

27:17.000 --> 27:19.000
Thank you very much for your time today.

27:19.000 --> 27:20.000
Thank you.

27:26.000 --> 27:28.000
Thank you.

