WEBVTT

00:00.000 --> 00:08.480
All right, welcome back, everyone.

00:08.480 --> 00:17.000
It's my pleasure to welcome Gertjan Franken to talk about BugHog.

00:17.000 --> 00:18.000
Thank you.

00:18.000 --> 00:20.000
Thank you for the introduction.

00:20.000 --> 00:22.000
So, I'm Gertjan.

00:22.000 --> 00:26.000
I'm a researcher at DistriNet, a research group at KU Leuven.

00:26.000 --> 00:30.000
And I'm going to talk about BugHog, what we call automated browser

00:30.000 --> 00:32.000
bug bisection on steroids.

00:32.000 --> 00:34.000
So, maybe some context.

00:34.000 --> 00:38.000
I think we can all agree that when software becomes more and more complex,

00:38.000 --> 00:41.000
the likelihood of a bug being introduced also increases, right?

00:41.000 --> 00:45.000
And even to the extent that seemingly small changes to the codebase

00:45.000 --> 00:48.000
can lead to severe security bugs.

00:48.000 --> 00:51.000
Let's place that in the context of browsers.

00:51.000 --> 00:55.000
Nowadays these are very complex systems, with tens of millions of lines of code.

00:55.000 --> 01:00.000
And that comes very close to other complex systems that we know, namely operating systems.

01:00.000 --> 01:04.000
And indeed, this often translates into vulnerabilities.

01:04.000 --> 01:09.000
For browsers, I think on a weekly basis, we see new vulnerabilities being discovered,

01:09.000 --> 01:11.000
being fixed, discovered, fixed.

01:11.000 --> 01:13.000
It's like an endless cycle.

01:13.000 --> 01:19.000
And we think we can improve this process by looking at the bug-fixing process.

01:19.000 --> 01:22.000
I think most of these processes look a bit like this.

01:22.000 --> 01:26.000
First, a vulnerability is reported, then it's triaged.

01:26.000 --> 01:30.000
Somebody is assigned to fix it, and then the fix is tested.

01:30.000 --> 01:35.000
Now, we think that with a proof of concept of the vulnerability, we can do a lot more.

01:35.000 --> 01:38.000
We can do a full bug life cycle analysis.

01:38.000 --> 01:44.000
Because, for example, if you know the cause of a problem, it's often much easier to fix it, right?

01:44.000 --> 01:48.000
Also, there might, in the history of the codebase,

01:48.000 --> 01:53.000
there might have already been a solution at some point in time if the bug is a regression.

01:53.000 --> 01:58.000
And in the long run, we think we can improve the development process as a whole.

01:58.000 --> 02:01.000
Now, we don't want to burden the developer with even more work,

02:01.000 --> 02:05.000
so we automated this bug life cycle analysis process.

02:05.000 --> 02:11.000
Conceptually, BugHog works with the version control system, often Git.

02:11.000 --> 02:14.000
Just to recap, let's say you use Git.

02:14.000 --> 02:18.000
If you commit, you save like a snapshot of the codebase.

02:18.000 --> 02:23.000
So here, the green dots stand for snapshots of that codebase.

02:23.000 --> 02:27.000
And let's say we also have a proof of concept of a vulnerability.

02:27.000 --> 02:34.000
So then, in theory, we can take that proof of concept and check it on every snapshot of that codebase, right?

02:34.000 --> 02:40.000
So here, the red squares stand for snapshots that are vulnerable to that bug.

02:40.000 --> 02:45.000
And with this, there's really a lot of information that can help us fix the bug,

02:45.000 --> 02:49.000
but also prevent bugs in the future.

02:49.000 --> 02:55.000
And of course, for a codebase like Chromium, there are nowadays already more than a million commits,

02:55.000 --> 02:58.000
so that's the main reason why we automated this.

02:58.000 --> 03:05.000
So just to give you an idea of the tool, this is the tool's user interface.

03:05.000 --> 03:09.000
So as you can see, we can define what browser we want to use for now.

03:09.000 --> 03:11.000
We support Chromium and Firefox.

03:11.000 --> 03:13.000
The proof of concept of our bug is here.

03:13.000 --> 03:17.000
We select it and then essentially we start BugHog.

03:17.000 --> 03:23.000
This is run on a VPS with eight CPU cores and 16 gigabytes of RAM.

03:23.000 --> 03:26.000
It's not sped up and this was recorded.

03:26.000 --> 03:28.000
I think I did it yesterday.

03:28.000 --> 03:34.000
So in a while, we will see dots appearing here, and each of these dots stands for a commit.

03:34.000 --> 03:38.000
So on this line, we see commits where the bug is reproduced.

03:38.000 --> 03:41.000
On this line, commits where it's not reproduced.

03:41.000 --> 03:47.000
And in a bit, we will see a general idea of the lifecycle of that specific bug we're testing.

03:47.000 --> 03:53.000
As you can see here, I've seen this before, so it's not a surprise for me,

03:53.000 --> 04:00.000
but here we will see a dot, and we know that the bug is introduced between these two dots, right?

04:00.000 --> 04:02.000
Because the reproducibility has shifted.

04:02.000 --> 04:08.000
And if we leave this on for about 10 minutes, we will know the exact commits where the bug was introduced,

04:08.000 --> 04:12.000
which again gives a lot of information.

04:12.000 --> 04:18.000
I'll go ahead. This is more of a high-level view of BugHog.

04:18.000 --> 04:23.000
First of all, we give parameters for the evaluation, being the browser,

04:23.000 --> 04:28.000
the range of versions that we want to consider, and then of course the proof of concept.

04:28.000 --> 04:33.000
A proof of concept can be quite complex sometimes, but to some extent BugHog can handle that.

04:33.000 --> 04:42.000
I don't have time to go into the details here, but I can explain the next steps that BugHog will do in an automated way.

04:42.000 --> 04:49.000
It's essentially a loop, and as the first thing in that loop, BugHog will choose the next commit to evaluate.

04:49.000 --> 04:55.000
Then it will take that commit and fetch the executable that is associated with that commit.

04:55.000 --> 05:00.000
And as a last step in that loop, it will evaluate the proof of concept against the executable.

05:00.000 --> 05:05.000
So this loops until the full analysis is finished.
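
NOTE
Editor's sketch, not BugHog's actual code: the loop described above, assuming its three steps are pluggable functions. The names run_analysis, choose_next, fetch_executable, evaluate_poc and first_untested are hypothetical, and the toy stand-ins at the bottom only exist so the sketch runs on its own.
```python
from typing import Callable, Dict, List, Optional
def run_analysis(
    commits: List[str],
    choose_next: Callable[[List[str], Dict[str, Optional[bool]]], Optional[str]],
    fetch_executable: Callable[[str], Optional[str]],
    evaluate_poc: Callable[[str], bool],
) -> Dict[str, Optional[bool]]:
    """Loop: choose a commit, fetch its executable, evaluate the proof of concept."""
    results: Dict[str, Optional[bool]] = {}
    while True:
        commit = choose_next(commits, results)
        if commit is None:  # the chooser signals that the analysis is finished
            break
        executable = fetch_executable(commit)
        # None marks a commit for which no executable is available.
        results[commit] = None if executable is None else evaluate_poc(executable)
    return results
# Toy stand-in (assumption): evaluate commits in order, one by one.
def first_untested(commits, results):
    return next((c for c in commits if c not in results), None)
demo = run_analysis(
    ["c1", "c2", "c3"],
    first_untested,
    lambda c: None if c == "c2" else "/builds/" + c,  # pretend c2 has no build
    lambda exe: exe.endswith("c3"),  # pretend only c3 reproduces the bug
)
print(demo)
```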

05:05.000 --> 05:12.000
Okay, so I don't think I have time to go really into detail, but in essence we made a sequence algorithm

05:12.000 --> 05:17.000
that chooses the next commit in an intelligent way. It works in two phases.

05:17.000 --> 05:23.000
And in the end, maybe the last phase is the most important, it's called the pinpointing phase.

05:23.000 --> 05:29.000
So as you can see, BugHog will concentrate on these parts because these are the most interesting.

05:29.000 --> 05:36.000
And at the end of that phase, if I can click it, you see that there are a lot more evaluations here.

05:36.000 --> 05:43.000
And if we zoom in, we can see what commit was responsible for introducing or fixing a bug.

05:43.000 --> 05:52.000
Okay, so because we do dynamic experiments, we need executables that are associated with a certain commit.

05:52.000 --> 05:56.000
And for this, we rely heavily on repositories of browsers.

05:56.000 --> 06:03.000
For example, for Firefox we use the nightly builds, and for Chromium the browser snapshots.

06:03.000 --> 06:09.000
In the past, we also built these from source, but it's very resource intensive, so we don't really do that anymore.

06:09.000 --> 06:12.000
And we built a lot of optimizations.

06:12.000 --> 06:20.000
For example, the sequence algorithm in the last slide also takes into account availability of these executables.

06:20.000 --> 06:26.000
Then the most important step is, of course, to evaluate the proof of concept on the executable.

06:26.000 --> 06:34.000
For this, we cannot use browser automation tools like Selenium because they only go back to, let's say, version 50.

06:34.000 --> 06:40.000
I don't know it by heart, but we want to go even further back in history, right?

06:40.000 --> 06:46.000
At this moment, BugHog supports browsers from version 20 up to the latest release.

06:46.000 --> 06:50.000
And we just call them using the command line interface.

06:50.000 --> 07:00.000
So in essence, each browser, each executable, is executed in its own isolated Docker container and all dependencies are taken care of.

07:00.000 --> 07:10.000
And a proof of concept is essentially like a bunch of web pages that interact with the browser and try to elicit a certain vulnerability.

07:10.000 --> 07:22.000
When this is done, we check the logs, and BugHog can then say, okay, in this commit the bug was reproduced, or in the other one it was not.

07:22.000 --> 07:24.000
This is like a summary.

07:24.000 --> 07:27.000
When the analysis is complete, you will see something like this.

07:27.000 --> 07:37.000
We can zoom into that and even click the dots that represent commits and go to the public commit web page, for example for Chromium.

07:37.000 --> 07:45.000
So we showed that BugHog can be very useful by analyzing 75 CSP bugs.

07:45.000 --> 07:55.000
We made a lot of interesting findings here, for example, that some of these vulnerabilities were already publicly disclosed before they were even fixed.

07:55.000 --> 07:58.000
This was the case for three vulnerabilities.

07:58.000 --> 08:03.000
So that shows that there's some room for improvement in these bug-handling processes, right?

08:03.000 --> 08:11.000
Also, at the time of our evaluation, three vulnerabilities were still affecting the latest release of these browsers.

08:11.000 --> 08:19.000
I don't think I can go into this, but what's maybe more important is that we want to extend further.

08:19.000 --> 08:22.000
So before we started with just web browsers, right?

08:22.000 --> 08:30.000
But now we also want to take the V8 engine, for example, and just run proofs of concept on that.

08:30.000 --> 08:34.000
So that's now supported in the latest version of BugHog.

08:34.000 --> 08:38.000
Also WebAssembly runtimes, but this is a bit more experimental.

08:38.000 --> 08:41.000
There's some hurdles compared to browsers.

08:41.000 --> 08:47.000
For example, there are certain build flags for V8, for example for enabling the sandbox.

08:47.000 --> 08:57.000
And also, we're still looking for a repository with SpiderMonkey executables that are associated with single commits.

08:57.000 --> 09:01.000
So if you know this repository, please tell me.

09:01.000 --> 09:08.000
And even beyond that, beyond the browser ecosystem, we think BugHog can also be very useful.

09:08.000 --> 09:12.000
Because there are only three main ingredients that we need for BugHog to be applied.

09:12.000 --> 09:18.000
And that's, first of all, a codebase that is version controlled.

09:18.000 --> 09:24.000
Also, we need an easy way to execute the builds that are associated with commits.

09:24.000 --> 09:30.000
Either the building can be quite efficient or there's like some repository where we can download them.

09:30.000 --> 09:37.000
And then lastly, we should be able to interact with these executables, for example, automating user behavior.

09:37.000 --> 09:46.000
And these are the most important aspects, and if these hold true, I think BugHog can be applied to any system.
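
NOTE
Editor's sketch, not part of BugHog: the three ingredients listed above, written as a small interface that a non-browser target would have to implement. Target, ToyTarget and all method names are hypothetical; the toy class keeps everything in memory so the sketch runs on its own.
```python
from typing import Dict, List, Optional, Protocol, runtime_checkable
@runtime_checkable
class Target(Protocol):
    def list_commits(self) -> List[str]:
        """Ingredient 1: a version-controlled codebase with an ordered history."""
        ...
    def fetch_executable(self, commit: str) -> Optional[str]:
        """Ingredient 2: a cheap way to obtain the build for a commit, or None."""
        ...
    def reproduces(self, executable: str, poc: str) -> bool:
        """Ingredient 3: a way to drive the executable and observe the bug."""
        ...
class ToyTarget:
    """In-memory stand-in: maps commit ids to fake executable paths."""
    def __init__(self, builds: Dict[str, str], vulnerable: List[str]) -> None:
        self.builds = builds
        self.vulnerable = set(vulnerable)
    def list_commits(self) -> List[str]:
        return list(self.builds)
    def fetch_executable(self, commit: str) -> Optional[str]:
        return self.builds.get(commit)
    def reproduces(self, executable: str, poc: str) -> bool:
        return executable in self.vulnerable
toy = ToyTarget({"a1": "/bin/a1", "b2": "/bin/b2"}, vulnerable=["/bin/b2"])
outcome = {c: toy.reproduces(toy.fetch_executable(c), "poc.html") for c in toy.list_commits()}
print(isinstance(toy, Target), outcome)
```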

09:46.000 --> 09:50.000
Okay, so I think I gave you a summary of BugHog.

09:50.000 --> 09:55.000
If you're interested, here are some links, also to previous presentations.

09:55.000 --> 09:59.000
And if you want to discuss more, I'd be happy to talk to you after this session.

09:59.000 --> 10:00.000
Thank you.

10:00.000 --> 10:07.000
We already have a question.

10:07.000 --> 10:11.000
Did you reach out to Google or Mozilla about it?

10:11.000 --> 10:14.000
I emailed some people, but I didn't get a response.

10:14.000 --> 10:17.000
Okay, I'm in charge of that at Mozilla.

10:17.000 --> 10:22.000
So please, we should chat, because it's super exciting what you've been doing.

10:22.000 --> 10:25.000
Okay, right, too.

10:25.000 --> 10:27.000
Thank you, it's incredible.

10:27.000 --> 10:33.000
I have a question, actually: how long does it take to find a regression point, like for one bug?

10:33.000 --> 10:37.000
Okay, it depends on how early a version you're looking for.

10:37.000 --> 10:41.000
I think it's safe to say, on average, 10 minutes, something like that.

10:41.000 --> 10:46.000
And it really depends, like, are there a lot of fluctuations in reproducibility?

10:46.000 --> 10:49.000
But in most cases, I'd say 10 minutes.

10:49.000 --> 10:50.000
Yeah.

10:50.000 --> 10:55.000
So you didn't turn on the whole browser like it's just from home?

10:55.000 --> 10:58.000
We do turn on the whole browser.

10:58.000 --> 11:02.000
So essentially what happens?

11:02.000 --> 11:04.000
Let's say you start with evaluating.

11:04.000 --> 11:07.000
BugHog will choose the commits.

11:07.000 --> 11:12.000
It simply downloads and immediately instructs the browser in RAM to visit the proof

11:12.000 --> 11:14.000
of concept webpage.

11:14.000 --> 11:18.000
And based on the requests sent, it can already decide, okay, this is a commit

11:18.000 --> 11:22.000
where the bug is affecting the browser or not.

11:22.000 --> 11:24.000
So in that sense, and we also run it in parallel.

11:24.000 --> 11:31.000
So I think in the example I showed, there are 12 containers running at all times doing the evaluation.

11:31.000 --> 11:39.000
Cool.

11:39.000 --> 11:45.000
It goes up and down all the time, so it's reproducible, then not reproducible.

11:45.000 --> 11:52.000
Is the proof of concept not deterministic, or why does it fluctuate so much?

11:52.000 --> 11:55.000
Like, I understand that at some point it's fixed, but yeah.

11:55.000 --> 11:57.000
But this is like a time axis, right?

11:57.000 --> 11:58.000
The axis is time.

11:58.000 --> 12:02.000
And so this is, for example, let's say 25, I think.

12:02.000 --> 12:05.000
And here, 26.

12:05.000 --> 12:12.000
So that means that between the releases of 25 and 26, the bug was fixed in this case.

12:12.000 --> 12:20.000
So this is going up and down because you go throughout history, the development history of the browser.

12:20.000 --> 12:25.000
But the bug is deterministic.

12:25.000 --> 12:26.000
Yeah.

12:26.000 --> 12:32.000
So here you see, well, here the bug is certainly reintroduced.

12:32.000 --> 12:36.000
Here it might be because the policy that we're testing is not yet supported.

12:36.000 --> 12:38.000
But here indeed, it was fixed.

12:38.000 --> 12:50.000
It remained fixed for some time, and it was reintroduced a few versions after.

12:50.000 --> 12:59.000
So do you think your tool can generalize to like other programs, you know, like compilers, maybe, or things like that?

12:59.000 --> 13:01.000
I think it's possible.

13:01.000 --> 13:07.000
Like, what do you need from the software being tested in order to do the bisection?

13:07.000 --> 13:09.000
Yes, it's simply these three things.

13:09.000 --> 13:12.000
I think that's all we need.

13:12.000 --> 13:22.000
Git, so BugHog can work per commit, fetch the executable, and then interact with the executable to elicit the bug.

13:22.000 --> 13:24.000
And if that works, we can...

13:24.000 --> 13:27.000
I'm sorry, I didn't realize you had a whole slide, I missed it.

13:27.000 --> 13:32.000
No, no problem.

13:32.000 --> 13:42.000
So the reason you're not doing a simple binary search is because this pattern of being fixed and then broken again is really common or what?

13:42.000 --> 13:51.000
We use an adapted version of binary search, but because, for example, not every executable is available, we have to adapt.

13:51.000 --> 13:59.000
And also we first want to have like a general idea of the life cycle, and only then, because we want to find regressions.

13:59.000 --> 14:07.000
Only then, when we have like a general idea, we want to really, in that sense, do binary search on the actual commit.
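
NOTE
Editor's sketch, not BugHog's real algorithm: one way to combine a coarse outline of the life cycle with a binary search that tolerates missing executables, as described in this answer. All names (outline, pinpoint, find_transitions, evaluate) are hypothetical; evaluate(i) is assumed to return True or False for commit index i, or None when no executable is available. Note that a bug that appears and disappears between two outline samples is missed by this simplified version.
```python
from typing import Callable, Dict, List, Optional, Tuple
Outcome = Optional[bool]
def outline(n: int, samples: int) -> List[int]:
    """Phase 1: evenly spaced commit indices, always including both ends."""
    if n <= samples:
        return list(range(n))
    return sorted({(i * (n - 1)) // (samples - 1) for i in range(samples)})
def pinpoint(lo: int, hi: int, evaluate: Callable[[int], Outcome],
             results: Dict[int, Outcome]) -> Tuple[int, int]:
    """Phase 2: bisect between lo and hi, whose outcomes differ."""
    while True:
        untested = [i for i in range(lo + 1, hi) if i not in results]
        if not untested:
            return lo, hi  # only commits without a build remain in between
        middle = (lo + hi) // 2
        mid = min(untested, key=lambda i: abs(i - middle))
        results[mid] = evaluate(mid)
        if results[mid] is None:
            continue  # no executable: fall back to the next closest commit
        if results[mid] == results[lo]:
            lo = mid
        else:
            hi = mid
def find_transitions(n: int, evaluate: Callable[[int], Outcome],
                     samples: int = 5) -> List[Tuple[int, int]]:
    results: Dict[int, Outcome] = {i: evaluate(i) for i in outline(n, samples)}
    known = sorted(i for i, r in results.items() if r is not None)
    return [pinpoint(lo, hi, evaluate, results)
            for lo, hi in zip(known, known[1:]) if results[lo] != results[hi]]
# Toy history: bug introduced at commit 30, fixed at commit 80, no build for 29.
def toy_evaluate(i: int) -> Outcome:
    return None if i == 29 else 30 <= i < 80
transitions = find_transitions(100, toy_evaluate)
print(transitions)
```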

14:18.000 --> 14:26.000
Yeah, my question is, how difficult is it to extend this to other web browsers or web engines?

14:27.000 --> 14:35.000
Not that difficult. For example, once we had it running for Chromium, it was just looking for how things were done in Firefox.

14:35.000 --> 14:39.000
For example, they used Mercurial instead of Git at the time.

14:39.000 --> 14:48.000
It was just writing an adapter for that, and also different endpoints for fetching the executables, but other than that, it's quite similar.

14:48.000 --> 14:56.000
And my second question is, how hard would it be to revisit the idea of building from source?

14:56.000 --> 14:59.000
Let's say that you have a powerful enough machine.

15:03.000 --> 15:12.000
If you have a robust script for that, and you can build versions of 10 years ago, which is, I think, quite difficult sometimes.

15:12.000 --> 15:22.000
It shouldn't be a problem, but I think the bottleneck is that some things don't work 10 years later, for example, in the build process of Chromium or Firefox.

15:22.000 --> 15:27.000
Yeah, I was thinking more to integrate it with the continuous integration system for instance of WebKit.

15:27.000 --> 15:30.000
What I mean is just that you can find something that might have

15:30.000 --> 15:33.000
started happening in the past five weeks or so.

15:33.000 --> 15:36.000
Yeah, CI/CD is a good example of how this could be applied, indeed.

15:36.000 --> 15:42.000
And I think in the future we will certainly look into that if we find a way to do it more efficiently.

15:42.000 --> 15:44.000
Okay, thank you.

15:44.000 --> 15:46.000
Thank you.

15:46.000 --> 15:48.000
Okay, perfect.

15:48.000 --> 15:49.000
No more questions.

15:49.000 --> 15:51.000
Thank you so much.

15:51.000 --> 15:52.000
Thank you.