WEBVTT

00:00.000 --> 00:12.600
All right, we're going to get back to it, and we've got a talk that is a bit different

00:12.600 --> 00:15.040
from the rest of the agenda.

00:15.040 --> 00:18.840
This time it's about setting up eBPF using Nix.

00:18.840 --> 00:27.440
I don't think I've seen talks about Nix and eBPF before, so that sounds interesting.

00:27.440 --> 00:30.440
So that's EFA presenting that for us.

00:30.440 --> 00:31.440
So yeah.

00:31.440 --> 00:32.440
Go ahead.

00:32.440 --> 00:33.440
Thank you.

00:33.440 --> 00:34.440
Hi, everyone.

00:34.440 --> 00:35.440
My name is EFA.

00:35.440 --> 00:40.840
I'm a PhD student at Inria, and this talk will mostly be on the DevOps side.

00:40.840 --> 00:46.440
So if you're expecting very technical eBPF stuff, then stay here after I finish the talk,

00:46.440 --> 00:47.440
please.

00:47.440 --> 00:49.440
Okay.

00:49.440 --> 00:56.440
So this is what this talk is meant to be: I'll demonstrate how I use Nix to do eBPF

00:56.440 --> 00:57.440
development at work.

00:57.440 --> 01:03.760
And the overall cycle is: I use this specific laptop to do the development and compilation.

01:03.760 --> 01:08.760
And then I use QEMU and vde_switch to do virtualized networking between the virtual

01:08.760 --> 01:09.760
machines.

01:09.760 --> 01:13.800
And then I use a specific deployment tool

01:13.800 --> 01:19.800
my supervisor wrote to deploy onto the HPC system at the cluster where I work

01:19.800 --> 01:20.800
at.

01:20.800 --> 01:26.400
And then I collect live metrics from the system as well.

01:26.400 --> 01:33.400
So I started a project developing a caching system for a network

01:33.400 --> 01:37.080
file system over XDP.

01:37.080 --> 01:40.280
That project is running late.

01:40.280 --> 01:44.000
That's why I'm presenting DevOps stuff right now.

01:44.000 --> 01:49.560
And a little background about Nix: Nix is a declarative package manager.

01:49.560 --> 01:52.480
It's also a functional programming language.

01:52.480 --> 02:00.480
And the concept of it is to take your source code and then call the built-in derivation function

02:00.480 --> 02:04.240
to transform your source code into closures.

02:04.240 --> 02:09.680
Closures are just packages built on your system and put into a Nix store path.

02:09.680 --> 02:15.120
And NixOS is an operating system as a closure.
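As a rough illustration of the pipeline just described (the name and builder script here are illustrative, not from the talk), a minimal call to the built-in derivation function looks something like this:

```nix
# Minimal sketch: `derivation` turns inputs (name, system, builder)
# into a store path; realizing it builds the package.
derivation {
  name = "hello-0.1";
  system = builtins.currentSystem;
  builder = "/bin/sh";
  args = [ "-c" "echo hello > $out" ];
}
```

Building this yields a store path like /nix/store/&lt;hash&gt;-hello-0.1, which together with its dependencies forms the closure the speaker refers to.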

02:15.120 --> 02:19.880
And if you're interested in Nix and NixOS stuff, you can head to the Nix devroom after the

02:19.880 --> 02:23.400
talk.

02:23.400 --> 02:29.440
So what is a testbed? A testbed is like a regular production system, but the code running

02:29.440 --> 02:33.160
on there is ephemeral, so it's short-lived.

02:33.160 --> 02:38.240
And this specific testbed is called Grid'5000.

02:38.240 --> 02:47.400
It's a part of SLICES-FR, a big pan-European HPC project, and most

02:47.400 --> 02:50.520
of their stuff is open source.

02:50.520 --> 02:57.720
And in the little flow here, this graph I showed a little bit earlier, you can take this

02:57.720 --> 03:02.840
part and just swap it with that.

03:02.840 --> 03:12.640
So users SSH into an access node, an SSH jump host, and then you can interact with the

03:12.640 --> 03:18.120
backbone network and then the site-specific front end.

03:18.120 --> 03:26.280
So for example, since I work out of Grenoble, I log into the access node and then go to the

03:26.280 --> 03:28.480
front-end site for Grenoble.

03:28.480 --> 03:32.840
And if I want to submit a job, I use a tool called oarsub.

03:32.840 --> 03:42.360
So OAR is an HPC job management system similar to Slurm, and you use kadeploy to deploy

03:42.360 --> 03:49.120
the actual initrd and kernel to the bare-metal machines in the cluster.

03:49.120 --> 03:55.000
So the problems we're having: if you're doing a pretty big project, you need to

03:55.000 --> 04:01.000
synchronize the headers and the compiler versions, and sometimes LSPs won't even work in

04:01.000 --> 04:02.800
your editor.

04:02.800 --> 04:12.760
And if you have a niche use case with some gated BPF helper functions like

04:12.760 --> 04:19.360
bpf_override_return, you have to tweak the compilation flags for it to work properly.

04:19.360 --> 04:28.200
And during the development stage, most people use QEMU to do local testing,

04:28.200 --> 04:33.120
but what if you have multiple machines involved, like three or four QEMUs? Running those commands

04:33.120 --> 04:35.080
by hand can be pretty hard.

04:35.080 --> 04:38.920
You can do it, though. And what if you want to see the networking between

04:38.920 --> 04:39.920
them?

04:39.920 --> 04:44.600
So it's better to use a wrapper tool for that.

04:44.600 --> 04:51.200
And what if you're booting inside a cluster and you want to collect a larger amount of data?

04:51.200 --> 04:56.520
The automation scripts for those are non-trivial to write.

04:56.520 --> 05:00.440
And also, if you're doing academic work, you want the results to be reproducible,

05:00.440 --> 05:06.400
so the artifact evaluators can grab your stuff and then rerun it.

05:06.400 --> 05:11.800
So hopefully you get the ACM badge, Artifacts Reproduced.

05:11.800 --> 05:18.640
So what worked for me is using NixOS with a specific tool, but you can run this on a non-NixOS

05:18.640 --> 05:19.960
system with Nix installed.

05:19.960 --> 05:25.600
It's called NixOS VMs; it's basically a wrapper around QEMU that generates scripts to

05:25.640 --> 05:32.080
set up the virtual networking and the start scripts for the QEMU VMs.
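A rough sketch of the kind of NixOS configuration such a QEMU wrapper consumes (the hostname and sizes are illustrative assumptions; a plain setup could also be built with nixos-rebuild build-vm):

```nix
# Per-machine NixOS configuration; the vmVariant options only apply
# when the configuration is built as a QEMU VM.
{ pkgs, ... }:
{
  networking.hostName = "node1";
  virtualisation.vmVariant = {
    virtualisation.memorySize = 2048;  # MiB of guest RAM
    virtualisation.graphics = false;   # serial console only
  };
}
```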

05:32.080 --> 05:37.000
So with this, you can have multiple machines running different kernels, with networking

05:37.000 --> 05:42.920
between them, and you can also get the benefits of just using Nix, like for example the binary

05:42.920 --> 05:44.320
cache and everything.

05:44.320 --> 05:50.040
So each store path gets built exactly once and never again.
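On the consumer side, sharing those build results through a binary cache can be wired up with settings like the following sketch (the cache URL and public key are placeholders; the cache itself would be populated with nix copy --to or a CI builder):

```nix
# Illustrative NixOS settings for trusting a shared binary cache.
{
  nix.settings = {
    substituters = [ "https://cache.example.org" ];
    trusted-public-keys = [ "cache.example.org-1:REPLACE_WITH_KEY=" ];
  };
}
```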

05:50.040 --> 05:53.760
The deployment tool is called NixOS Compose.

05:53.760 --> 06:00.400
It's a multi-flavour deployment tool for ephemeral experiments, and it works with

06:00.400 --> 06:06.840
systemd-nspawn, Docker, bare metal, and some other stuff I don't use, and you can substitute

06:06.840 --> 06:16.560
this with your own deployment tool. And for the user-space tooling, you can write

06:16.600 --> 06:26.360
your own package definition in here, and then inputsFrom will grab the nativeBuildInputs,

06:26.360 --> 06:31.600
buildInputs, and the library stuff, directly into your shell, so you don't need to worry about

06:31.600 --> 06:37.840
the versions and stuff, so that's really good; the compilers and tools like llvm-strip will be included there too.
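The dev shell described above can be sketched roughly as follows (the package name my-bpf-prog and the extra tools are illustrative assumptions, not from the talk):

```nix
# mkShell with inputsFrom: the compiler toolchain and libraries of the
# package flow into the shell, keeping versions in sync for the LSP.
{ pkgs ? import <nixpkgs> { } }:
pkgs.mkShell {
  inputsFrom = [ pkgs.my-bpf-prog ];              # assumption: your own package
  packages = with pkgs; [ clang-tools bpftool ];  # extra editor/debug tools
}
```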

06:37.840 --> 06:46.520
Now, what if you want to use a weird

06:46.560 --> 06:53.720
kernel, let's say multikernel, which is still in the middle of development, and the repo is public?

06:53.720 --> 07:00.320
So to do that, you can define a Nix file where you specify the name, the version, and what compiler

07:00.320 --> 07:05.560
toolchain you want to build it with. For the source you can either use a local path, a

07:05.560 --> 07:12.840
fileset, or just fetch from whatever Git forge you use, and then you can add your own kernel

07:12.880 --> 07:18.920
patches; it's a list of files, or a list of patches that can be fetched from a

07:18.920 --> 07:34.400
remote, and then the compilation flags, and then you invoke that with a bit of boilerplate like this.
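Such a custom kernel definition might look roughly like this sketch (the version, repository, and config flags are illustrative assumptions, not the ones used in the talk):

```nix
# Build a custom kernel from a Git forge with extra config flags.
{ pkgs, lib, ... }:
pkgs.buildLinux {
  version = "6.12.0";
  src = pkgs.fetchFromGitHub {
    owner = "torvalds";
    repo = "linux";
    rev = "v6.12";
    hash = lib.fakeHash;  # replace with the real hash on first build
  };
  kernelPatches = [ ];    # list of { name, patch } attribute sets
  structuredExtraConfig = with lib.kernel; {
    FUNCTION_ERROR_INJECTION = yes;  # needed later for bpf_override_return
    BPF_KPROBE_OVERRIDE = yes;
  };
}
```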

07:34.400 --> 07:42.040
You're done; you have the NixOS system with the specific kernel ready to go. And if you want

07:42.040 --> 07:48.280
to do testing on one machine, you need some boilerplate, like the invocation function

07:48.280 --> 07:57.400
runNixOSTest and a name for that, and then the declarative NixOS closure, like with nodes dot

07:57.400 --> 08:02.200
machine-name, or it can be whatever name you want. And in here it's just a regular

08:02.200 --> 08:06.600
NixOS module, so you can write whatever you want. So here in this example you are seeing

08:06.600 --> 08:15.080
scx, the eBPF-based extensible scheduler, and with just this one line you can have that in the

08:15.080 --> 08:22.600
machine. And here is a block of imperative Python statements you can invoke, so this will

08:22.600 --> 08:30.840
be run in the non-interactive mode of the testing, and this can also be invoked manually in the

08:31.320 --> 08:38.840
interactive mode. And what if you want more machines? You just add a new block and then

08:39.640 --> 08:45.880
declare whatever you want. So this becomes really handy for me when I'm trying to deploy

08:46.920 --> 08:53.880
like a benchmark with 15 machines; I just add 15 machines in here, or I can use a map function

08:54.520 --> 09:02.040
to wrap a shared machine config with just different names, and then it will just be there.
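The pattern just described can be sketched like this (the test name, node names, and the one-line scx option are illustrative; a shared config is mapped over a list of names instead of writing one block per machine):

```nix
# runNixOSTest with several identical nodes generated via lib.genAttrs.
{ pkgs, lib, ... }:
pkgs.testers.runNixOSTest {
  name = "ebpf-multi-node";
  nodes = lib.genAttrs [ "node1" "node2" "node3" ] (name: {
    services.scx.enable = true;  # assumption: the one-line scx module
  });
  # Imperative Python test script, run in non-interactive mode.
  testScript = ''
    start_all()
    node1.wait_for_unit("multi-user.target")
  '';
}
```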

09:04.040 --> 09:12.520
But looking inside the runNixOSTest result, if you do a repl, you can see that it's the config

09:12.520 --> 09:19.400
for the test itself: driver, driverInteractive, the name, and nodes. Nodes here contains the

09:19.480 --> 09:32.200
actual evaluated NixOS system from the nixpkgs lib. And combined with what we talked about,

09:32.200 --> 09:40.200
the drivers: the drivers will automatically include a script for starting the QEMU VMs and a

09:40.200 --> 09:47.800
vde_switch to set up virtual networking between them. And I wrote a very simple

09:48.920 --> 10:00.120
BPF program to do statx mocking. So in here I'm using the bad way of string comparison,

10:01.800 --> 10:09.320
as the last talk mentioned, and then replacing the entirety of the statx result

10:09.320 --> 10:16.200
with a zeroed struct, like almost every field is set to zero, and then you call the

10:16.200 --> 10:23.240
bpf_override_return function, which requires the special CONFIG_FUNCTION_ERROR_INJECTION flag

10:23.800 --> 10:32.040
and CONFIG_BPF_KPROBE_OVERRIDE or something; you need those flags for it to work. And then in

10:32.040 --> 10:40.520
here we also have a kretprobe, so you can count how many times it was called and get a histogram of the

10:40.520 --> 10:49.160
data we collected. For user space, we're using Cloudflare's ebpf_exporter, so it will load

10:49.160 --> 10:57.240
the object file for you, and then you have a shared YAML config where you define the BPF maps, so

10:57.240 --> 11:02.760
it will read the map that your BPF program writes into and then export it in the

11:02.760 --> 11:13.400
Prometheus format, and then you can collect and plot the metrics with Grafana. And then

11:14.120 --> 11:20.680
the same thing will work locally and in a large-scale deployment, but here, for simplicity,

11:20.680 --> 11:28.440
I'm only using a two-node setup. So for the program I'll show you guys, we declare here the source

11:28.440 --> 11:34.760
block of it, and then you give a name and a version to it, and then the inputs and phases are where

11:34.760 --> 11:40.280
you declare the dependencies and compilers needed; the phases are how you build it and install it into

11:40.280 --> 11:50.120
your Nix store path. And then you call a wrapper function like

11:50.120 --> 11:55.240
pkgs.stdenv.mkDerivation, which will call the built-in derivation function to build the package, and then

11:55.240 --> 12:02.520
the derivation gets realized into your closure. And in this closure you will see that you have

12:02.600 --> 12:11.240
lib or bin or whatever; it can have dependencies, and this chain of dependencies forms a closure. And

12:11.960 --> 12:19.160
in my specific example, the object files are put into lib/ directly under the store path.
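Such a package definition might look roughly like this sketch (the package name, source file, and compiler flags are illustrative assumptions):

```nix
# mkDerivation for a BPF object file: compile with clang -target bpf
# and install the .o into lib/ under the store path.
{ pkgs }:
pkgs.stdenv.mkDerivation {
  pname = "statx-mock-bpf";  # assumption: illustrative name
  version = "0.1";
  src = ./.;                 # local path; could also be a fetcher
  nativeBuildInputs = with pkgs; [ clang llvm ];
  buildPhase = ''
    clang -O2 -g -target bpf -c statx_mock.bpf.c -o statx_mock.bpf.o
  '';
  installPhase = ''
    mkdir -p $out/lib
    cp statx_mock.bpf.o $out/lib/
  '';
}
```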

12:19.160 --> 12:25.880
Then in this portion you write a NixOS module to define your kernel packages, and in here I'm

12:25.880 --> 12:33.400
overriding the standard kernel with a kernel that has the error-injection flag enabled, and also

12:33.960 --> 12:44.440
configuring a systemd service to start the ebpf_exporter to load the object files and export

12:45.480 --> 12:53.640
the metrics, and an HTTP reverse proxy so I can access it. And then, down here, after the system

12:53.640 --> 12:59.800
configuration is evaluated, you can either use a local VM or a physical machine to test it out.
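A rough sketch of such a module (the package attribute, binary flags, and config directory are illustrative assumptions, not the talk's actual code):

```nix
# NixOS module: enable error injection in the kernel and run ebpf_exporter.
{ config, lib, pkgs, ... }:
{
  boot.kernelPatches = [{
    name = "error-injection";
    patch = null;  # config-only "patch"
    extraStructuredConfig = with lib.kernel; {
      FUNCTION_ERROR_INJECTION = yes;
      BPF_KPROBE_OVERRIDE = yes;
    };
  }];
  systemd.services.ebpf-exporter = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig.ExecStart =
      # assumption: package attribute and flag names are illustrative
      "${pkgs.ebpf_exporter}/bin/ebpf_exporter --config.dir=${./config}";
  };
}
```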

13:01.400 --> 13:08.520
So for local testing, we've already talked about this: you compile it once,

13:08.520 --> 13:16.360
and future compilations will be pretty fast. And then you can push the cache to a remote server,

13:16.360 --> 13:23.800
or have a CI system build it, so if other people want to do the same development as you,

13:23.800 --> 13:29.320
they can just fetch the cache from the server, so their build will be fast as well. And if you

13:29.320 --> 13:35.160
are working at a very big company, your company might be interested in a software bill of materials,

13:35.160 --> 13:45.320
and Nix makes it pretty easy to track the dependencies. And this morning, at another devroom, the creator

13:45.400 --> 13:53.960
of systemd mentioned AF_VSOCK, and the NixOS local test will use this to create an SSH

13:53.960 --> 14:03.480
backdoor with one single line of code, so super easy. So now I will show you guys a demo of running

14:03.480 --> 14:05.480
two machines on a local machine.

14:06.200 --> 14:25.960
So here is the repository. This file is the BPF program I wrote, and in the same

14:25.960 --> 14:36.440
directory there's a YAML configuration for reading the map. Sorry, yeah, is that better?

14:38.040 --> 14:48.840
Okay, so that's the BPF program, that's the config, and then the test is where I set up the module.

14:49.160 --> 14:56.440
So this is for local testing. For the local test in the non-interactive mode, I want to

14:56.440 --> 15:04.120
check that on the exporter side I have these flags enabled, and I need to wait for this to start,

15:04.120 --> 15:08.680
and then on the other node I can curl the endpoint and see everything works. And on the

15:09.640 --> 15:17.000
modified-kernel machine, I will make sure that I get zeros on the timestamps

15:18.840 --> 15:26.600
when the BPF object file is loaded, and then I'll disable the client program, so when I

15:26.600 --> 15:29.000
do the stat again I can see the regular dates again.
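The non-interactive checks just described could be sketched as a test script like this (node names, the unit name, and the port are illustrative assumptions):

```nix
# Fragment of a runNixOSTest invocation: the imperative Python test script.
{
  testScript = ''
    start_all()
    exporter.wait_for_unit("ebpf-exporter.service")
    # On the other node, curl the metrics endpoint to see everything works.
    collector.succeed("curl -sf http://exporter:9435/metrics")
    # On the modified kernel, a fresh file should report zeroed timestamps.
    exporter.succeed("touch /tmp/testfile")
    print(exporter.succeed("stat /tmp/testfile"))
  '';
}
```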

15:29.560 --> 15:44.600
Oops. So this is the interactive test driver. As you can see, the systemd SSH proxy is enabled, and

15:44.600 --> 15:52.920
you can SSH directly into the QEMU environment with vsock. Let's start_all to

15:52.920 --> 15:58.520
start the two machines. So now the two machines are starting; you can see the logs here.

16:05.080 --> 16:12.920
So after it's booted up, I SSH onto the collector node, sorry, not the collector node,

16:12.920 --> 16:17.720
to set up a port forwarding so I can access the Grafana on my local machine,

16:18.680 --> 16:26.680
on the exporter node.

16:34.440 --> 16:46.200
Oh, if we create a file under /tmp and then stat it, you should see that everything is

16:46.200 --> 17:00.200
zero, because the object file is loaded. And if we disable the client side

17:00.200 --> 17:08.520
and unload the object file, you can see the data is back to normal. Let me restart it again

17:08.840 --> 17:28.440
and then run an infinite loop of stat. Let's head to the Prometheus side,

17:28.440 --> 17:42.120
sorry, the Grafana side, and log in with the default password, admin/admin. Skip this,

17:42.200 --> 18:00.200
dashboard, make a new dashboard, a visualization, Prometheus, yes, let's do a time series first

18:00.200 --> 18:11.960
to see the count. The metrics are prefixed with fsd, which stands for filesystem, and let's do the count.

18:13.080 --> 18:20.760
Run the query; you can see here, when we first booted it up, some counts going up,

18:20.760 --> 18:24.200
and then we stopped it, it's empty for a bit, and then we're running the infinite loop, so

18:25.320 --> 18:34.200
it's going up. And to be precise, if we go to the histogram and change it to buckets,

18:34.440 --> 18:44.200
you can actually see the live data coming in. So that's it for the local demo; let me stop it real quick.

18:47.560 --> 18:55.240
Okay. And I think you can see it's really easy to start a two-node setup, and even if you add

18:55.240 --> 19:02.200
more machines, it will just be a couple of lines of code, and that's it. And for production deployment,

19:02.280 --> 19:08.760
it's really nice that you can get bit-perfect reproducibility on some store paths, for example

19:08.760 --> 19:17.640
the kernel and the object file. But for some programs compiled with OCaml or Haskell, you may not

19:17.640 --> 19:23.400
get bit-perfect reproducibility, and I think the maintainers are addressing this problem at the

19:23.400 --> 19:31.640
moment, so let's see where it goes. And since everything is in the closure, the entirety

19:31.640 --> 19:38.120
of the system, like the configs, the kernels, everything is in there, it will be

19:38.120 --> 19:45.960
very easy to write a deployment harness. And now let's do a production deployment demo.

19:46.680 --> 20:04.360
So in here I have an SSH shell into the Grenoble node of Grid'5000, and right before starting this talk

20:04.360 --> 20:13.480
I deployed the exact same environment to a two-node setup, and you can see the dahu nodes.

20:16.040 --> 20:31.320
Let's make the text bigger. And if I start a new tab and do kaconsole, kaconsole is a way to

20:31.400 --> 20:42.600
see what's going on on the machine. Let's copy 19 here. So this is the collector node,

20:42.680 --> 21:02.360
that's the exporter node, and if we check the kernel config,

21:02.520 --> 21:14.840
error injection is indeed enabled. If we go back to the regular node,

21:15.480 --> 21:37.080
the collector, it has the functions enabled, but function error injection is indeed

21:37.080 --> 21:48.200
disabled, so it's not touched. And I'm on the exporter node, sorry, on the collector node; let me get

21:48.200 --> 21:56.280
the IP address of that. And in here we should be able to log into Grafana as well

21:56.280 --> 22:05.000
and see the live metrics coming from the exporter node. Skip this.

22:27.000 --> 22:38.680
And yeah, we do indeed see data here, and that's enough for the production demo.

22:39.320 --> 22:46.200
In conclusion, what you guys just saw is made with less than 250 lines of Nix code, so it's

22:46.200 --> 22:53.000
very easy to implement, and you also get portability with it, so all the code you saw

22:53.800 --> 22:59.720
can be run on your machine if you tweak a few things to fit your program.

23:00.760 --> 23:06.760
Adding another BPF program to the deployment will be as easy as adding the name

23:06.760 --> 23:12.520
of the object file; that's it. And now I'll take questions.

23:18.280 --> 23:18.760
thank you

23:23.800 --> 23:36.280
Google Chrome is not happy, it's blocking something here, sorry. I'm curious, because the things

23:36.280 --> 23:41.560
that you showed here, they're against kernels that have been either packaged in NixOS or are on

23:41.560 --> 23:46.680
GitHub so you can compile them. Have you tried, I don't even know if it's possible, but like,

23:46.680 --> 23:52.120
kernels that for example Ubuntu or Fedora use, so you can actually test against things running there?

23:52.120 --> 23:59.560
For those? You can, yeah. So you can grab whatever source you want, so for example on this page,

24:00.840 --> 24:06.280
this page here, yeah, if you change the owner, the repository, or whatever source you want,

24:06.280 --> 24:10.520
and tweak the configs a little bit, you can get it compiling, and then it should work. Okay.

24:11.880 --> 24:18.280
Follow-up question: do you know if there are people building packages with Nix but for kernels

24:18.360 --> 24:26.200
running on other OSes? I'm not aware, but if you search for a project called

24:27.880 --> 24:33.720
system-manager or something, it allows you to, like, manage

24:35.240 --> 24:40.280
configs and everything on other operating systems. And there is another project, I forget the name,

24:40.280 --> 24:48.680
that allows you to run NixOS tests on not only NixOS VMs but, say, Arch Linux VMs or whatever

24:48.680 --> 24:52.040
you want. So I think... Okay, awesome, yeah. Okay, thank you.

24:56.520 --> 25:04.920
No more questions? So I had a question. You mentioned earlier that some store

25:04.920 --> 25:14.040
paths might not be, like, completely reproducible, byte-perfect. Yeah. Do you know what

25:15.080 --> 25:21.560
the differences are between those for which that works and those for which it doesn't?

25:23.080 --> 25:33.480
Would BPF be affected by that? So, for the non-bit-perfect store paths, for example, I believe

25:33.480 --> 25:38.600
OCaml programs are not perfectly reproduced, because somewhere during the compilation stage,

25:40.440 --> 25:47.240
the type information embedded in the library and header files will change,

25:47.240 --> 25:53.320
because that part is not deterministic in the OCaml compiler. So that's

25:53.320 --> 25:59.080
a portion of why it's not 100% reproducible, but if that gets addressed upstream, they should be.

25:59.960 --> 26:09.000
All right, thank you. Another question: you mentioned that one of the motivations for

26:09.000 --> 26:15.480
working with Nix for this process is to help with sharing your environment with other people,

26:15.480 --> 26:25.640
like, for getting reproducible builds or results. Have you shared your environment

26:25.640 --> 26:32.600
with colleagues working with BPF before, and how did that go? So I've shared it with my supervisor.

26:32.600 --> 26:39.240
I'm not sure if he's actually running my code, but he's happy with it. And I have multiple machines:

26:39.240 --> 26:45.400
I have this laptop and another laptop running in the lab, and a couple more pretty powerful

26:45.400 --> 26:52.280
machines in the cluster, and I only need to clone the repository and run direnv allow, so that

26:52.360 --> 26:58.120
my whole development environment gets onto each machine exactly how I want, including the LSPs and

26:58.120 --> 27:03.320
everything. It's pretty good. So, is your supervisor sold on your tool?

27:04.520 --> 27:11.240
I think so. So, on behalf of the eBPF devroom

27:11.240 --> 27:17.000
organizing team, you should definitely try it so that we can know. Yeah. And I can show you guys one more

27:17.000 --> 27:24.120
thing in this repository. The reason why this file does not have diagnostics is because I didn't

27:24.120 --> 27:32.600
run the build with Bear. So if I clean the build directory and then run it with Bear, it generates

27:33.960 --> 27:39.240
the compile_commands.json, and then I go back in again, and now everything's good, clangd is running

27:40.200 --> 27:47.640
as it should. And concretely, if I want to try your environment, I just need to install

27:47.640 --> 27:52.840
Nix on my machine, and then I can clone the repository and then run nix develop with your

27:52.840 --> 27:57.640
preferred shell, zsh, whatever you want, and then you should get the same environment. Yeah. All right, thank you.

27:59.640 --> 28:04.600
Nix develop, so, does that come when you install Nix, or is it some additional stuff?

28:10.200 --> 28:15.240
So if you use the installer, it sets it up automatically. Okay, thank you. Any other questions?

28:17.880 --> 28:21.720
Going once, going twice. All right, thank you. Thank you.

