WEBVTT

00:00.000 --> 00:19.120
Hello everyone, let's welcome our next speaker, Dimitri and Mikhail who will be talking about finding

00:19.120 --> 00:29.440
beggars using fuzz ink, it's your turn.

00:29.440 --> 00:35.920
Thank you so much, so good afternoon everyone, so I'm Mikhail Makozy, I'm a researcher at

00:35.920 --> 00:40.800
University of Paris-Seclet in France, and so yeah it's my very great pleasure to be here in

00:40.800 --> 00:47.200
Phosem to tell you about our tool, Rosa, which you can actually find on GitHub and which is about

00:47.200 --> 00:54.400
finding bug doors with fuzz ink, so this is a joint work with Emilia, Stefano who's just there

00:54.400 --> 01:00.240
and Dimitri, which was the main developer of Rosa, and so yeah I will actually start to talk

01:00.240 --> 01:05.120
by giving you a general overview about what Rosa is about, and then I will leave the flow to

01:05.120 --> 01:11.520
Dimitri, we will give you more technical details, make a demo of the tool and talk about the

01:11.520 --> 01:20.080
performance of the tool, so let's start by actually presenting you what what we mean by bug doors

01:20.080 --> 01:26.400
and fuzz ink, so what is a bug door when we started this work we saw that it was yeah we had

01:26.400 --> 01:31.040
a simple definition of it, but actually as we started digging we realized that people could

01:31.040 --> 01:37.120
mean really a lot of different things because they can be bug doors in many parts of IT systems,

01:37.200 --> 01:43.200
so for example it can be like a server which has been weekly configured on purpose so that people

01:43.200 --> 01:50.480
are aware of that, can use it to get access to the server, it can be like you have a machine learning

01:50.480 --> 01:55.600
model which is used to authenticate people and some guys if poison the training data so that

01:55.600 --> 02:02.800
they will always be authenticated and yeah it can happen, it can be also like a in cryptography,

02:02.880 --> 02:09.440
some mathematical flows that had been planted in the algorithm itself in the mat of the algorithm

02:09.440 --> 02:14.560
so that if you are aware of this flow you can actually decipher more or less everything that is

02:14.560 --> 02:22.400
encrypted with this algorithm and there are other other kind of bug doors in the IT stack,

02:22.400 --> 02:28.800
but in our work we focus on what we call code level bug doors, so the idea is that you have like

02:28.880 --> 02:36.560
hidden access which is hard coded in the program code, so the idea is that you have to feed the

02:36.560 --> 02:42.960
program with a given input, a secret input value that you have to know and if you give the program

02:42.960 --> 02:49.840
this input it will give you like it will enable you to bypass the authentication mechanism of the program

02:49.840 --> 02:56.400
or also to give you access to like underlying system resources like a shell and yeah examples

02:56.400 --> 03:01.920
to make things a bit more clear for the first first case like let's imagine the program is

03:01.920 --> 03:09.200
Sudu that you use to like to get a root access on your on your new machine, so the the back door

03:09.200 --> 03:14.800
could be like a password, a key password that has been hard coded in the code of Sudu,

03:14.800 --> 03:21.280
so that like if you run Sudu and use this password value you get root access and can run commands

03:21.280 --> 03:28.320
as as root, for the other example the program could be like a web server and if you send an

03:28.320 --> 03:34.240
HTTP request with a given argument value a secret value it will create a reverse shell and you

03:34.240 --> 03:39.440
will have for example root access to the the machine where the server is deployed and probably to

03:39.440 --> 03:46.320
the private network behind behind this machine and so like this is not just like academics like

03:46.320 --> 03:52.160
us dreaming in their ivory tower this is a real really huge so they've been examples real

03:52.160 --> 03:56.800
examples of such back doors injections so you guys probably have heard about the famous

03:57.520 --> 04:04.720
XZ uh back door injections but it's just the tip of the iceberg they've been other documented cases

04:04.720 --> 04:11.120
like back doors and PHP, the STPD, ProFTPD and also yeah like here in fuzz them we are more about

04:11.200 --> 04:15.680
open source software but there are also a lot of back doors that have been found like in

04:15.680 --> 04:22.480
proprietary binary firmware of devices like routers and this kind of stuff and in all

04:22.480 --> 04:28.160
these cases you have what we call the butterfly effect of supply chain attack because yeah these that

04:28.160 --> 04:34.320
if you plant the back door uh like uh at the right place in the supply chain that it will infect

04:34.320 --> 04:39.520
everything above like in a router you plant it in the firmware and all the people who get this

04:39.520 --> 04:46.320
router get the back door if you plant it like in XZ uh all the the software which reuse XZ will

04:46.320 --> 04:51.920
get the back door actually so it's it's it's really really bad and so to solve this this problem

04:51.920 --> 04:58.800
the technique that we we have decided to use is gray box fuzzing so basically it's kind of

05:00.000 --> 05:05.600
brute force testing of programs but with more less randomized inputs that you send to the program

05:05.680 --> 05:09.920
but it's a bit more clever than that because you have some kind of a feedback loop that try to

05:09.920 --> 05:16.880
cover more and more parts of more and more branches in your in your code actually and actually

05:16.880 --> 05:23.920
it's coupled with a simple mechanism to detect uh code failures which is just crash detection

05:23.920 --> 05:29.840
but in this way you can still with some sanitizers for example you can detect a lot of memory

05:29.920 --> 05:36.560
errors like buffer flows and yeah that all these kind of steps and so like it's a line of work that

05:36.560 --> 05:43.360
is getting a lot of traction with some modern fuzzers like AFL++ which we are using in this work

05:43.360 --> 05:49.760
it's been gotten quite popular for a few reasons uh first like they have a good track

05:49.760 --> 05:56.400
recourse of finding CVs in many different kind of codes they are efficient for exploring different

05:56.400 --> 06:02.320
source of program including source or binary only programs and they've mitigated what in the

06:02.320 --> 06:07.200
community we call the magic byte problem so the idea is that if you do some kind of randomized

06:07.200 --> 06:13.680
testing of a program maybe at some point the code you will have an if statement saying if this part

06:13.680 --> 06:20.960
of the input is equal to this long uh stream of bytes then you can enter uh inside the if statement

06:21.040 --> 06:26.480
and obviously if you are trying to guess the right value to enter there by randomized testing

06:26.480 --> 06:31.520
probably it will never happen and so you want to test this brick part of the code inside the

06:31.520 --> 06:37.520
if statement and know like modern fuzzers like AFL++ if mechanism to quickly analyze the code

06:37.520 --> 06:46.000
and recognize such uh hard-coded magic bytes value uh so yeah that was a bit what we meant about

06:46.720 --> 06:52.080
what back doors and fuzzings are you know let's discuss our work to try to detect back doors

06:52.080 --> 06:58.960
with uh with fuzzings so when we started this work our uh use cases the use cases that we

06:58.960 --> 07:04.240
had in mind were more or less like vetting people willing to vet like some third party components

07:04.240 --> 07:10.640
like libraries or uh firmware from some routers that's yeah they wanted to include in their

07:10.640 --> 07:16.320
infrastructures as they want to to put the library in the software or the a router model

07:16.320 --> 07:22.160
in there in their network stack and they want to make sure that the software is free of back doors

07:22.160 --> 07:28.400
and yeah not currently we are working on another use case which is trying to prevent the injection

07:28.400 --> 07:34.720
of back doors like in open source project like at a CI level like having some commit vetting mechanism

07:34.720 --> 07:39.680
that tried to detect that at some point one guy tried to inject a back door and yeah if you want to

07:39.920 --> 07:44.800
this comes with other other challenges and if you want to know more about that just come to

07:44.800 --> 07:51.520
my route to our talk in the CI and testing uh track where the metry will tell you more about

07:51.520 --> 07:58.320
this line uh so yeah so that was our use case and to try to solve this use case we started to

07:58.320 --> 08:04.400
by looking at yeah what more or less was existing and um what we found it was it was mainly like

08:04.480 --> 08:09.920
manual code reverse engineering to try to find the back door in the code but obviously that's

08:09.920 --> 08:15.360
really royal pain and so most of the times people don't do it and they've also been a few

08:16.320 --> 08:22.480
tools that have been developed to do that but it's basically just trying to automate a part of the

08:22.480 --> 08:28.240
manual engineering effort and actually these tools they are more they are actually gathering dust

08:28.240 --> 08:32.480
so most of them are not available anymore or they are available but they are not maintained

08:32.560 --> 08:37.040
and they don't work there was just one we could try but the results were a bit terrible

08:38.400 --> 08:42.800
and yeah and also another problem that we noticed is that there is a limited

08:43.760 --> 08:50.560
back door sample availability problem because the report about back doors are rare and when you get one

08:50.560 --> 08:58.000
it's usually like about artifacts that are lost or that do not work and so this is this is painful

08:58.160 --> 09:02.720
because if you want to create a technique to develop to detect back door at some point you want to

09:02.720 --> 09:07.920
test it to see if it works and detect real back doors and if you don't have samples it's it can

09:07.920 --> 09:15.200
be a bit annoying and so yeah given that there was room to do something better we decided to go

09:15.200 --> 09:20.640
with grey box fencing because it's largely automatic so it's better than just doing some manual

09:20.720 --> 09:26.880
steps it can explore all types of program including if you want to vet like a binary

09:26.880 --> 09:32.320
firmware and yeah it has already been very successful to fight cv's like buffer of the

09:32.320 --> 09:39.040
flow so why not finding back doors? Actually there is a good reason is that actually they just

09:39.040 --> 09:43.520
can't out of the box they can find back doors because what they detect is crash they detect

09:43.520 --> 09:49.120
if the program crashes when you test them but usually if you've done your back door well it won't

09:49.120 --> 09:54.960
crash because otherwise people will notice it quickly so obviously it won't work to detect

09:54.960 --> 09:59.840
back doors in that way and the big challenge in our work was to find a way to teach the

09:59.840 --> 10:05.200
fessor to recognize that at some point during program execution a back door triggered so that was

10:05.200 --> 10:14.000
the big the big deal of our of our work and yeah here comes here come the contributions of our work

10:14.240 --> 10:21.680
so the idea of Rosa is to take out of the box grey box fessor called fl++ and to

10:21.680 --> 10:28.560
couple it with a mechanism called metamorphic oracle that actually enables detecting that

10:28.560 --> 10:33.280
the back door is triggered and yeah I will just give you the intuition and then the metry will go

10:33.280 --> 10:39.280
into the details so the idea is that if you first for example sedu with only one password value you

10:39.280 --> 10:44.960
should observe always the same kind of behavior like seducing the password is wrong but if

10:44.960 --> 10:50.080
suddenly your fessor find out the the hard-coated password value suddenly you will see something

10:50.080 --> 10:55.600
totally divergent divergent by a behavior like a new process being created and other steps

10:55.600 --> 11:01.680
and it's from these divergence that truly is divergent and the fessor can detect the back door in

11:01.680 --> 11:07.280
Rosa and so the other things that we have introduced is Rosa room which is a long overdue

11:07.280 --> 11:14.000
standard has since standardized back door benchmark with 17 programs seven with authentic real

11:14.000 --> 11:19.840
back doors from real attack that we have we we have found and then where we have asked a researcher

11:19.840 --> 11:27.280
to inject synthetic back doors in popular open source programs so yeah now the metry the mic is

11:27.280 --> 11:34.480
yours so that you can give the technical details all right so let's go over a real example to see

11:34.480 --> 11:42.160
how Rosa works right so there's this belkin router model that is fairly popular and people have

11:42.160 --> 11:47.040
figured out that it runs a bullet server so they did this by manually digging into the server

11:47.760 --> 11:52.160
this bullet web server seems to not be maintained anymore but belkin used it at the time

11:52.160 --> 11:56.640
by adding some stuff on top of it and people actually figured out that they've added a back door

11:56.640 --> 12:02.560
in the server so how does Rosa detect this back door well Rosa works in two phases so there's a first

12:02.640 --> 12:08.720
very short phase where Rosa just runs a fessor and discovers what we call the representative inputs

12:08.720 --> 12:13.440
so those are the most common inputs to the program so you know this is an HTTP server so the

12:13.440 --> 12:19.280
fessor generates HTTP requests let's say it first starts with a get request at the root of the server

12:19.280 --> 12:24.800
that yields some control flow graph edges in the server program and also amid some system

12:24.800 --> 12:29.280
goals and that is saved in a database and the fessor goes on maybe you know it manages to

12:29.360 --> 12:34.000
modify the string a little bit and generate a post request that yields slightly different edges so that's

12:34.000 --> 12:39.920
a different input so that gets saved so one and so forth and then very quickly after about a minute

12:39.920 --> 12:45.920
we switch to phase two where the fessor now intensively explores the program and every new behavior

12:45.920 --> 12:52.400
is vetted individually so let's say the fessor again mutates the input a little bit and hits this

12:52.480 --> 13:00.720
other endpoint and that produces some system calls so what we do is we first choose the most similar

13:00.720 --> 13:05.680
representative input from the first phase so let's say that it's b in this case and we compare the

13:05.680 --> 13:11.440
system call vectors and if they're the same then the behaviors are equivalent and since they're

13:11.440 --> 13:18.800
representative inputs are considered to be the nine then we can assume that input c is also safe so that's

13:19.360 --> 13:23.440
but at some point due to what Mikael said due to the guiding the fessor will end up hitting

13:24.400 --> 13:29.280
some hard-coded endpoint that is special in the server and that will produce some different

13:29.280 --> 13:34.560
system calls right so again we'll do the same thing which is the most similar input let's say it's a

13:34.560 --> 13:39.920
from the first phase and the behavior is no longer equivalent so this is suspicious this might be a

13:39.920 --> 13:45.760
background but this alone does not prove that there's indeed a background so a human expert has to

13:45.840 --> 13:52.320
vet these findings and there's a semi-automatic method that allows us to do this fast so what you do

13:52.320 --> 13:58.400
is basically you take the divergent system calls that Rosa gives you in this case you know there's

13:58.400 --> 14:03.760
50 states and 59 if you know your system call numbers you know what those are and you run that through

14:03.760 --> 14:08.960
esterase and then you look at the actual log that esterase gives you and you can basically immediately

14:08.960 --> 14:15.760
see that there's a fork and exec of bin esth so reverse shell on the server without authentication

14:15.760 --> 14:23.520
so obviously this is not right so let's actually do this in practice so huge disclaimer we've

14:23.520 --> 14:27.440
biased the fessor a lot here for the purposes of the demo to go faster so this usually takes

14:27.440 --> 14:32.640
about four hours but most of the overhead is about emulation we can talk about it later if you want

14:32.640 --> 14:43.040
so let's do it so I have here a part of the actually b server that we've decompiled and if we open

14:43.040 --> 14:48.480
it up so this is decompiled code right so it's going to look ugly but basically they've added this

14:49.440 --> 14:54.640
bizarre function on top of the server and if we scroll down a little bit at some point there's a

14:54.640 --> 14:59.840
system call that seems to arbitrarily execute some command right so but we don't know that yet we don't

14:59.920 --> 15:04.960
want to have to reverse engineer the server so how could we find that with rosin well rosin only

15:04.960 --> 15:09.360
really needs a simple configuration file to work fessing is pretty easy if you've done it before

15:09.360 --> 15:15.760
and so there's a wizard basically to generate that config file it's just a simple tomofile

15:15.760 --> 15:21.920
that mostly describes what settings to use in the fessor but that is automatically generated and

15:21.920 --> 15:27.360
if I run it we can see rosin action so it will spin up the fessor in the background and it will

15:27.360 --> 15:33.040
start trying to find the back door in the actual program so the interface looks like a fessor so

15:33.040 --> 15:40.400
we built this with retitude if you're familiar it was really easy to build and the goal here is again

15:40.400 --> 15:45.680
to collect some representative inputs first which it's already done and now we're trying to see if

15:45.680 --> 15:52.720
there's any suspicious behaviors in the program and it should find a back door from roll

15:53.360 --> 16:07.440
if not I'll re-run it but so yeah fessing is not easy to course but yeah basically you have a bunch

16:07.440 --> 16:13.520
of stats and what interests you most is the top right quarter and there we go back or fun so

16:13.520 --> 16:22.400
so I'll stop it here all right so all of that data of course is persisted to disk so we can

16:22.400 --> 16:28.160
look at it at our leisure and so there's this output directory and if I look into it there's a bunch

16:28.160 --> 16:32.800
of stats again if you've used the fessor this should look familiar so what I'm interested in is

16:32.800 --> 16:37.840
the back door sub directory which stores inputs that lead to suspicious behavior so let's look at

16:37.840 --> 16:43.280
what's in there so there's one input that was saved it saved in kind of a weird way because we

16:43.360 --> 16:47.200
did duplicate findings as much as possible so you don't have to look through a ton of them

16:48.000 --> 16:52.160
and how do I know why this is suspicious well there's an if you're a little tool called

16:52.160 --> 16:59.760
Rosa explain where if I give it the name of the input it will explain why this is a back door so

16:59.760 --> 17:05.040
basically it says that it's a back door because it found these system calls with this input but not

17:05.040 --> 17:11.760
with the closest representative one so again I will do what I explained before I will run this through

17:11.760 --> 17:19.040
esterase and I will only feed it I will tell it to only filter these inputs and to use the

17:19.040 --> 17:23.840
back door input that I found before this will run it through esterase and now I can look at the

17:23.840 --> 17:32.320
log and you know the log starts with the QMU fork that AFL uses to run the fessor and if I scroll

17:32.320 --> 17:38.480
down at some point what do you know there's an exact V of SH something something pipes into something

17:39.120 --> 17:44.080
that looks very fishy to me so I'm going to keep looking so actually next thing that I'm

17:44.080 --> 17:48.720
going to do is I'm going to look at the input itself just to get an idea of like what is the input

17:48.720 --> 17:55.120
look like so the input looks like this and again like that's very specific you know I'm hitting

17:55.120 --> 17:59.440
a very specific endpoint and I'm passing a very specific parameter to it so let's slide for real

17:59.440 --> 18:06.640
on the actual server so let's start the server so this is as if we're running on the actual

18:06.640 --> 18:13.200
router I'm going to hit the endpoint right so I'm just going to hit dev oops I'm going to do curl

18:15.600 --> 18:22.960
dev.cgi and then I'm going to hit it with ID whoops I'm roots on the server so I can run

18:22.960 --> 18:26.160
every kind of commands is root on the server and there you go that's how you find the background

18:26.240 --> 18:36.720
all right to wrap up so we went through this effort to build this back door evaluation benchmark

18:36.720 --> 18:42.400
so we can actually see how Rosa phares in a variety of scenarios and we used standard

18:42.400 --> 18:46.880
of fuzzing evaluation practices to do it the questions are can we detect all of these back

18:46.880 --> 18:52.000
doors in different scenarios with enough automation robustness and so on and also how do we

18:52.000 --> 18:59.440
compare to existing tools so as Macau said only one tool is surviving sadly and that tool is

18:59.440 --> 19:05.200
a reverse engineering aid tool basically so we compared with that one so first of all fuzzing is

19:05.200 --> 19:09.120
non-deterministic so you have to run it many times to see if you actually can detect things

19:09.120 --> 19:15.680
reliably and we detect all back doors in the benchmark with an 87% success rate so this is in line

19:15.680 --> 19:21.920
with most fuzzing approaches actually pretty good rate then in terms of time it takes Rosa

19:21.920 --> 19:26.640
an hour and a half on average to detect a back door this is largely due to the emulation overhead

19:26.640 --> 19:32.240
and as you can see the state of the art tool is much much faster because it kind of does

19:32.240 --> 19:37.440
a glorified gripping of strings basically in the binary that allows it to go much faster

19:37.440 --> 19:41.600
but it also cannot detect the vast majority of the back doors because not all backers are built

19:41.600 --> 19:47.840
like that and where Rosa really shines is the automation part where it only reports seven inputs

19:47.840 --> 19:53.120
to vet like I showed you semi-automatically whereas with the state of the art program you have to

19:53.120 --> 19:58.000
look through over 300 strings and what I mean by that is you have to actually reverse the binary

19:58.000 --> 20:04.000
and look through it so yeah in conclusion this is the first generic fuzzer-based back door detector

20:04.000 --> 20:08.800
and also the first back door detection benchmark we detect all back doors in it in an hour

20:08.800 --> 20:14.560
30 minutes on average with seven inputs to vet on average which is 44 times less positives without

20:14.640 --> 20:16.560
reverse engineering without the source code thank you

20:26.240 --> 20:32.640
well you got plenty of time for questions so I see some yeah I have a question about

20:33.680 --> 20:40.240
is that yeah what one kind of techniques do you use to force the code to run through these

20:40.560 --> 20:45.520
pause in general it's not computable probably you kind of even if you have the source code there's

20:45.520 --> 20:52.560
no way to do it in in full generality so what kind of techniques are you using there to force Rosa

20:52.560 --> 20:59.040
to find these you know the right state to enter these these back doors these potential back doors

20:59.040 --> 21:03.840
right so the guidance is fully on the fuzzer side right so we don't actually

21:04.800 --> 21:11.520
change how a fuzzer class in this case goes through the code a fuzzer has like a decades worth

21:11.520 --> 21:18.160
of good heuristics to go through most of the code it possible all of the code so that alone seems to

21:18.160 --> 21:27.920
be reliable enough to trigger all different behaviors have a very basic question you showed the

21:27.920 --> 21:35.760
back in route as an example how did you get the back in route a software web server to run on

21:35.760 --> 21:41.760
another system so that's thanks to you and you right so since we're in a binary only setting

21:42.480 --> 21:46.480
a fuzzer like a couple of plus needs some feedback from the target obviously if you have a

21:46.480 --> 21:51.120
binary you can't get that so you either rewrite it basically or you run it through an emulator

21:51.120 --> 21:55.280
and that's what we use here so the advantage is that you get any piece of information you want

21:55.360 --> 21:59.040
the disadvantage is that it's slow that's why it took four hours for for Belkin

22:07.360 --> 22:15.200
I have two questions the first one is you mentioned it's very well detects like hard-coded credentials

22:15.760 --> 22:22.080
what about like if like it's a little bit smarter like if the input is like an algorithm like a key

22:22.160 --> 22:28.560
generator you know like if the input checks on to a specific thing or this is the first question

22:28.560 --> 22:36.080
and the second one you your example shows a trace and like I assume like Rosa also uses a

22:36.080 --> 22:42.080
trace or somehow to collect the system calls and it's quite common for like hidden back doors to detect

22:42.080 --> 22:47.520
that the program is under a trace and to not reveal behavior so how is that addressed or that

22:47.520 --> 22:55.040
so for the first question so the so we don't only design hardware the factor so and in fact

22:55.040 --> 23:00.240
we have multiple factors in the benchmark that are done on purpose without using hard-coded

23:00.240 --> 23:05.520
credentials the magic there is basically from the fuzzer right so the fuzzer has you risks

23:05.520 --> 23:10.560
to help it solve this magic by problem where yeah you can depend on some lightweight

23:10.560 --> 23:14.400
off-estation let's say something like that if you go much further than that there's actually

23:14.400 --> 23:18.400
like fuzzer blocking techniques for example you can hide something behind a cryptographic check

23:18.400 --> 23:23.680
there's no way for a fuzzer to go through that obviously but that is generally something that's

23:23.680 --> 23:30.080
hard to detect with dynamic analysis and often times it might be also hard to detect by static

23:30.080 --> 23:35.040
analysis for different reasons so that's kind of a general blocking point um for the second question

23:35.040 --> 23:41.520
about estates so Rosa does not actually use estates Rosa relies on to a new and we just literally

23:41.520 --> 23:47.200
like save the register value right when when it's this goes um and yeah sure you have programs that

23:47.200 --> 23:51.520
use estates I mean that's another limitation let's say for the tool you can't maybe you can't

23:51.520 --> 23:56.560
detect those but further ones that don't which honestly all of the real ones that we just said don't

23:57.120 --> 24:04.160
you can do so yeah that's another idea yeah well I was working on a prototype actually with that

24:04.160 --> 24:09.920
but then you're limited to specific systems so you have a trade-off

24:10.160 --> 24:15.600
fuzzer for the presentation are really interesting approach really like it I've got one question

24:15.600 --> 24:26.080
which is why the two repository are archived so um it's because the reason why we couldn't compare

24:26.080 --> 24:30.880
to the state of the art is that none of them were archived so in my opinion if you want to do

24:30.880 --> 24:34.320
signs and do a well you have to archive everything you have to make it available

24:34.400 --> 24:39.360
style archive don't archive and make it out of sense right there are archived and the

24:39.360 --> 24:43.760
proper sense so there are archived for forever there are kids for people to be able to use

24:43.760 --> 24:48.960
far in the future so that you can replicate with the exact commitment hash that I used and so on

24:50.000 --> 24:55.280
so that's the the point behind archiving okay thank you we are out of the time thank you for

24:55.280 --> 25:03.480
the presentation thank you

