WEBVTT

00:00.000 --> 00:08.160
I'm happy about that, very happy to be here.

00:08.160 --> 00:13.320
This is my second FOSDEM, and I'm very, very excited to be in the eBPF devroom for the first

00:13.320 --> 00:14.320
time.

00:14.320 --> 00:18.240
So I'm Donia Chaiehloudj, I'm a software engineer at Isovalent, and I'm community-

00:18.240 --> 00:21.520
oriented, and I'm going to present with...

00:21.520 --> 00:27.640
Hi, I'm Chris Tarazi, a senior staff engineer at Isovalent at Cisco, and a contributor

00:27.640 --> 00:33.560
to the Cilium project for the past six years.

00:33.560 --> 00:38.200
We're going to look into some common, yes, OK.

00:38.200 --> 00:40.320
We're going to get closer to the mic.

00:40.320 --> 00:41.320
OK.

00:41.320 --> 00:42.320
Better?

00:42.320 --> 00:44.320
OK.

00:44.320 --> 00:47.640
We're going to look into some common and interesting hookpoint gotchas.

00:47.640 --> 00:51.120
So as a disclaimer, this won't be an exhaustive list.

00:51.120 --> 00:54.840
So there's definitely other existing gotchas that we're not going to cover in this talk.

00:58.080 --> 01:01.080
So here's a quick overview of what this talk will cover.

01:01.080 --> 01:03.520
We're going to introduce what are tracing hookpoints.

01:03.520 --> 01:07.800
I feel like you're quite familiar with them, but we're going to cover the basics.

01:07.800 --> 01:10.640
And we're going to cover some generic gotchas.

01:10.640 --> 01:13.200
And then we're getting into more niche and interesting ones.

01:13.200 --> 01:16.360
We'll see whether you know about them or not.

01:16.360 --> 01:21.680
All right, so let's do a quick overview of the most popular tracing hookpoints

01:21.680 --> 01:23.880
that we can use with eBPF.

01:23.880 --> 01:27.080
And these are the ones that we'll also cover in the talk.

01:27.080 --> 01:31.840
So starting with kprobes and uprobes, these are roughly the same except kprobes are

01:31.840 --> 01:37.320
for kernel functions and uprobes are for user space functions.

01:37.320 --> 01:43.280
What's unique about them is they can be attached to any offset from the function entry or exit.

01:43.280 --> 01:48.360
And these probes don't have stability guarantees because they're dynamic.

01:48.360 --> 01:50.960
For example, between kernel versions.

01:50.960 --> 01:58.440
And so like if the function you attach to changes, you'll be required to rewrite the hook program.

01:58.440 --> 02:05.040
They've existed for a long time in the kernel, so they're the most ubiquitous and easiest to use.

02:05.040 --> 02:10.800
And fprobes are similar to kprobes, but they differ in a few ways.

02:10.800 --> 02:17.080
First, they use the ftrace infrastructure, which makes them more efficient.

02:17.080 --> 02:23.120
Especially if your kprobes are not running in optimized mode; we'll touch on that later.

02:23.120 --> 02:29.160
And fprobes can only be attached on the function entry or exit and don't support offsets like kprobes.

02:29.160 --> 02:33.680
And lastly, we have tracepoints, which are statically defined points in the kernel.

02:33.680 --> 02:41.640
And these are designed for stability, of course, there's caveats here and there, but for the most part.

02:41.640 --> 02:45.840
So they work by injecting five bytes of NOPs at the tracepoint site,

02:45.840 --> 02:52.840
which are then later rewritten into jumps to your tracepoint program.

02:54.960 --> 03:00.240
So here are some performance benchmarks that were done in various scenarios.

03:00.240 --> 03:06.800
And here it's showcasing the performance difference between these hook point types.

03:06.800 --> 03:13.080
The most relevant number here is the overhead percentage that gives us like the high level picture of the performance cost.

03:13.080 --> 03:18.480
Keep in mind, the amount of overhead is dependent on whether the probed function is in the hot path or not.

03:18.480 --> 03:26.680
So it's not a direct translation to the overall system's load.

03:26.680 --> 03:33.880
Unsurprisingly, the tracepoints here have the highest performance, followed by fprobes and then kprobes last.

03:33.880 --> 03:43.040
Kprobes under the default trap-based implementation rely on breakpoints, which is clearly the least performant implementation.

03:43.040 --> 03:54.040
So the first gotcha we're going to talk about, now that we have this overview, is about kernel versions.

03:54.040 --> 04:00.040
So on most of the infrastructure out there in production, there are different kernel versions running.

04:00.040 --> 04:07.040
I heard that Meta has around 20 kernel versions running in parallel.

04:07.040 --> 04:08.040
That's interesting.

04:08.040 --> 04:13.040
How do you write eBPF programs that run on different kernel versions?

04:13.040 --> 04:16.040
Kprobes and fprobes have no stability guarantee.

04:16.040 --> 04:18.040
Chris just mentioned it.

04:18.040 --> 04:23.040
And kernel internal functions can change with any new release, basically.

04:23.040 --> 04:26.040
Functions can be renamed, removed, or inlined.

04:26.040 --> 04:31.040
And then your probe is going to fail to attach because the symbol is not found.

04:31.040 --> 04:39.040
So if you're not familiar, a probe basically attaches to a symbol corresponding to something in your program, and that is then converted into an offset.

04:39.040 --> 04:42.040
And it actually attaches to that offset.

04:42.040 --> 04:44.040
The function arguments can also change.

04:44.040 --> 04:47.040
I mean, you're refactoring code every day, you know.

04:47.040 --> 04:49.040
So you're familiar with it: you always try to change things.

04:49.040 --> 04:55.040
Like you're adding a new field or maybe you're just moving stuff around because you want to reorganize your code.

04:55.040 --> 05:01.040
And you can read the wrong arguments if you're running that program on another version.

05:01.040 --> 05:05.040
And you will just have garbage data at the end.

05:05.040 --> 05:10.040
So how do you handle that with fprobes and kprobes?

05:10.040 --> 05:14.040
One of the alternatives is to actually use tracepoints.

05:14.040 --> 05:19.040
They are actually more reliable because there is a guarantee of stability.

05:19.040 --> 05:22.040
The arguments are documented and maintained.

05:22.040 --> 05:25.040
It's basically done by Linux kernel developers.

05:25.040 --> 05:27.040
So you can trust that.

05:27.040 --> 05:31.040
And you can see all the available events there.

05:31.040 --> 05:33.040
I have a question for you.

05:33.040 --> 05:36.040
You're actually going to participate during this talk.

05:36.040 --> 05:38.040
It's interactive, if you didn't know.

05:38.040 --> 05:42.040
So what if there is no tracepoint for the function that you want to cover?

05:42.040 --> 05:44.040
What are your options?

05:44.040 --> 05:45.040
Who says A?

05:45.040 --> 05:47.040
Okay, just have a look.

05:47.040 --> 05:48.040
Who says A?

05:48.040 --> 05:49.040
Hardcode

05:49.040 --> 05:51.040
the function name and hope it doesn't change.

05:51.040 --> 05:52.040
No one.

05:52.040 --> 05:53.040
Right.

05:53.040 --> 05:56.040
Separate BPF program for each kernel version.

05:56.040 --> 05:58.040
No one.

05:58.040 --> 06:01.040
Use CO-RE with BTF for automatic adaptation.

06:01.040 --> 06:02.040
Yay.

06:02.040 --> 06:03.040
Some hands.

06:03.040 --> 06:06.040
Abandon eBPF and use a kernel module instead.

06:06.040 --> 06:07.040
That is great people.

06:07.040 --> 06:08.040
Yeah.

06:08.040 --> 06:09.040
Okay.

06:09.040 --> 06:10.040
Great.

06:10.040 --> 06:11.040
Yeah.

06:11.040 --> 06:12.040
That was actually the answer.

06:12.040 --> 06:13.040
C.

06:13.040 --> 06:15.040
I mean, if you want to go for D, that's your choice.

06:15.040 --> 06:17.040
So let's see what CO-RE is.

06:17.040 --> 06:21.040
CO-RE is Compile Once, Run Everywhere, with BTF.

06:21.040 --> 06:25.040
CO-RE means that BPF programs can adapt to the kernel at load time.

06:25.040 --> 06:29.040
And BTF means BPF Type Format.

06:29.040 --> 06:33.040
And it is type information that is actually stored somewhere in the kernel.

06:33.040 --> 06:35.040
No, embedded in the kernel, sorry.

06:35.040 --> 06:39.040
And you will have relocation on the fly.

06:39.040 --> 06:43.040
And the BPF loader will adjust offsets automatically.

06:43.040 --> 06:45.040
So you're attaching to an offset.

06:45.040 --> 06:48.040
And those offsets are stored for each part of your program.

06:48.040 --> 06:51.040
And then you can attach wherever you want on the fly.

06:51.040 --> 06:53.040
And it's going to work.
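
NOTE
A toy model of the relocation idea described above, written in Python rather than BPF C.
The field names and both layouts are hypothetical; a real loader takes this information
from the kernel's BTF, not from a hand-written table. The sketch only shows why resolving
a field by name at load time survives a layout change while a hardcoded offset does not.
```python
import struct
# Hypothetical layouts of the same struct on two kernel builds: field name to byte offset.
LAYOUT_OLD = {"pid": 0, "flags": 4}
LAYOUT_NEW = {"flags": 0, "prio": 4, "pid": 8}  # a field was added and others moved
def relocate(field, layout):
    """Resolve a field name to its offset for the running kernel (the load-time step)."""
    return layout[field]
def read_u32(blob, offset):
    """Read a little-endian 32-bit value at a byte offset."""
    return struct.unpack_from("<I", blob, offset)[0]
# The same logical object, encoded under each layout.
obj_old = struct.pack("<II", 1234, 7)      # pid, flags
obj_new = struct.pack("<III", 7, 5, 1234)  # flags, prio, pid
# Offset resolved per kernel: both reads return the pid.
pid_old = read_u32(obj_old, relocate("pid", LAYOUT_OLD))
pid_new = read_u32(obj_new, relocate("pid", LAYOUT_NEW))
# Offset hardcoded from the old kernel and reused on the new one: this reads flags instead.
pid_wrong = read_u32(obj_new, LAYOUT_OLD["pid"])
```
In this model pid_wrong ends up holding the flags value, which is the "garbage at the end" case from the kernel versions gotcha.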

06:53.040 --> 06:54.040
Amazing.

06:54.040 --> 06:56.040
Let's move to the second gotcha.

06:56.040 --> 06:58.040
The architecture.

06:58.040 --> 07:00.040
So kprobes,

07:00.040 --> 07:02.040
if you don't know, have an internal interface.

07:02.040 --> 07:05.040
The context passed to a kprobe is a structure,

07:05.040 --> 07:07.040
an obscure one,

07:07.040 --> 07:10.040
pt_regs, which is a struct representing the registers,

07:11.040 --> 07:13.040
where the function parameters are stored.

07:13.040 --> 07:16.040
So if you want to run a program.

07:16.040 --> 07:18.040
If you're writing a program.

07:18.040 --> 07:21.040
And you want to run it on a different architecture.

07:21.040 --> 07:23.040
It's not going to work as is.

07:23.040 --> 07:27.040
Because the function calling conventions and the registers are different.

07:27.040 --> 07:31.040
Basically, the registers storing your information differ across architectures:

07:31.040 --> 07:33.040
x86, ARM64.

07:33.040 --> 07:35.040
And the way to access them is different.

07:35.040 --> 07:37.040
So the functions to access them are different.

07:37.040 --> 07:41.040
So if you write a program for one architecture and for another,

07:41.040 --> 07:42.040
it's not going to be the same.

07:42.040 --> 07:45.040
So what are your alternatives?

07:45.040 --> 07:47.040
You can, again, use other hook points.

07:47.040 --> 07:49.040
Maybe fprobes, tracepoints.

07:49.040 --> 07:50.040
Raw tracepoints.

07:50.040 --> 07:52.040
I actually don't want to do that.

07:52.040 --> 07:53.040
But yeah.

07:53.040 --> 07:57.040
And there are also the BPF helper macros that help to access

07:57.040 --> 07:58.040
the registers.

07:58.040 --> 08:02.040
So basically, it's agnostic of the type of registers.

08:02.040 --> 08:04.040
It's going to wrap that for you.

08:04.040 --> 08:09.040
And you're going to specify, when you're compiling, the architecture that you want.

08:09.040 --> 08:13.040
So at the end, what you have is one program for several architectures.

08:13.040 --> 08:17.040
And you're going to choose, when you're compiling, the architecture

08:17.040 --> 08:18.040
that you want to target.

08:18.040 --> 08:20.040
So several binaries.

08:20.040 --> 08:23.040
But one code to maintain at the end.
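
NOTE
A toy Python model of what such an architecture-agnostic accessor does. The register lists
follow the standard calling conventions (first integer arguments in di, si, dx, cx, r8, r9 on
x86-64 and in x0 to x7 on ARM64); the snapshots and the read_arg name are hypothetical.
libbpf's PT_REGS_PARM macros in bpf_tracing.h play a similar role in real BPF C code, with
the architecture picked at compile time; this sketch picks it at run time only to stay short.
```python
# Which saved register holds the Nth integer argument, per architecture.
ARG_REGISTERS = {
    "x86_64": ["di", "si", "dx", "cx", "r8", "r9"],
    "arm64": ["x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7"],
}
def read_arg(regs, arch, n):
    """Return the nth (1-based) function argument from a saved-register snapshot."""
    return regs[ARG_REGISTERS[arch][n - 1]]
# Hypothetical snapshots for a call add(2, 3) on each architecture.
x86_regs = {"di": 2, "si": 3, "dx": 0, "cx": 0, "r8": 0, "r9": 0}
arm_regs = {"x0": 2, "x1": 3}
first_on_x86 = read_arg(x86_regs, "x86_64", 1)
second_on_arm = read_arg(arm_regs, "arm64", 2)
```
The calling code asks for "argument 1" or "argument 2" and never names a register, which is the one-source, several-binaries idea from the talk.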

08:23.040 --> 08:29.040
Okay, let's move on to the third one, dynamic links.

08:29.040 --> 08:32.040
So we have a simple program here.

08:32.040 --> 08:35.040
We have a library, which is doing an addition.

08:35.040 --> 08:38.040
I think that we're all familiar with C programs here.

08:38.040 --> 08:39.040
Is it okay?

08:39.040 --> 08:40.040
Yeah.

08:40.040 --> 08:42.040
Then we have a main.

08:42.040 --> 08:46.040
I just put the important part of the body here.

08:46.040 --> 08:51.040
So the rest of main is loading, attaching the program, and running

08:51.040 --> 08:54.040
the add function, calling the add function.

08:54.040 --> 08:55.040
Okay.

08:55.040 --> 09:00.040
What I'm going to do is use bpftrace to actually attach a probe to the add function.

09:00.040 --> 09:02.040
So I'm going to compile the library.

09:02.040 --> 09:04.040
I'm going to compile the main.

09:04.040 --> 09:09.040
And then attach to the library and to the function add.

09:09.040 --> 09:16.040
And when I check the symbol table, I can see that I have the function add with an offset.

09:16.040 --> 09:18.040
And then I'm going to run this program.

09:18.040 --> 09:21.040
I have a basic addition that is actually done.

09:21.040 --> 09:26.040
And I can see that my probe is actually hitting.

09:26.040 --> 09:27.040
We can see the log here.

09:27.040 --> 09:28.040
Can you see my cursor?

09:28.040 --> 09:29.040
Yeah.

09:29.040 --> 09:30.040
You can see that.

09:30.040 --> 09:31.040
Okay.

09:31.040 --> 09:32.040
We're happy.

09:32.040 --> 09:36.040
We have a trace showing that the add function has been called.

09:36.040 --> 09:39.040
So what I'm going to do now is that I'm going to update the library.

09:39.040 --> 09:41.040
Because I don't know.

09:41.040 --> 09:43.040
I want to update the library.

09:43.040 --> 09:45.040
You do that when you code and add some code.

09:45.040 --> 09:48.040
So I'm going to insert new functions, subtract and multiply, here.

09:48.040 --> 09:53.040
I'm going to add them above my add function.

09:53.040 --> 09:56.040
Then I'm going to run my program again.

09:56.040 --> 09:59.040
I stay attached with my bpftrace,

09:59.040 --> 10:02.040
because I'm running debugging stuff or production stuff.

10:02.040 --> 10:04.040
And I have again a question for you.

10:04.040 --> 10:06.040
So your uprobe is tracing add.

10:06.040 --> 10:08.040
You update the library and add new functions.

10:08.040 --> 10:10.040
You run the program again.

10:10.040 --> 10:12.040
What do you see?

10:12.040 --> 10:14.040
What is the output?

10:14.040 --> 10:15.040
Is it a?

10:15.040 --> 10:16.040
Everything works fine.

10:16.040 --> 10:18.040
You're going to see the add again.

10:18.040 --> 10:19.040
Is it b?

10:19.040 --> 10:21.040
You're going to have the subtract?

10:21.040 --> 10:22.040
Is it c?

10:22.040 --> 10:23.040
Nothing.

10:23.040 --> 10:25.040
It's going to stop working.

10:25.040 --> 10:27.040
I'll be about.

10:27.040 --> 10:29.040
C?

10:29.040 --> 10:31.040
Yay.

10:31.040 --> 10:33.040
Yeah, I think nothing is going to happen.

10:33.040 --> 10:35.040
So why?

10:35.040 --> 10:37.040
Yeah.

10:37.040 --> 10:39.040
That's right.

10:39.040 --> 10:43.040
So you're going to recompile the library.

10:43.040 --> 10:45.040
And the offsets are going to change.

10:45.040 --> 10:49.040
So the add function is going to have a new offset.

10:49.040 --> 10:52.040
But your bpftrace probe is attached to the old symbol and the old offset.

10:52.040 --> 10:57.040
So you actually have to reattach when libraries are updated

10:57.040 --> 11:00.040
to make sure that your probe is actually

11:00.040 --> 11:02.040
going to hit.

11:02.040 --> 11:05.040
Such a gotcha, right?

11:05.040 --> 11:09.040
So we've seen what happens when a library is recompiled and updated.

11:09.040 --> 11:12.040
Then in production you have library updates.

11:12.040 --> 11:13.040
But it's very rare.

11:13.040 --> 11:14.040
Yeah.

11:14.040 --> 11:16.040
It can happen in case of security patches.

11:16.040 --> 11:18.040
But that's not often.

11:19.040 --> 11:22.040
And your monitoring system may break silently,

11:22.040 --> 11:23.040
without warning.

11:23.040 --> 11:25.040
You have different solutions for that.

11:25.040 --> 11:29.040
You can monitor the file using, like, inotifywait.

11:29.040 --> 11:32.040
You can use the provider program to reattach automatically.

11:32.040 --> 11:35.040
If you have any other alternative, please share.
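
NOTE
One possible shape for the "monitor the file and reattach" idea, sketched in Python with
polling instead of inotify. The file name libdemo.so and the function names are hypothetical,
and the actual reattach step (restarting the tracer) is left out. The sketch only shows how to
notice that the file behind a path is no longer the one the probe was attached to.
```python
import os
import tempfile
def fingerprint(path):
    """Identity of the file currently at path: inode, size, and mtime."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)
def needs_reattach(path, last_seen):
    """True when the file at path differs from the one seen at attach time."""
    return fingerprint(path) != last_seen
# Demonstration with a stand-in file instead of a real shared library.
workdir = tempfile.mkdtemp()
lib = os.path.join(workdir, "libdemo.so")
with open(lib, "wb") as f:
    f.write(b"old build")
attached_to = fingerprint(lib)
unchanged = needs_reattach(lib, attached_to)
# An update often writes a new file and renames it over the old path.
new_build = os.path.join(workdir, "libdemo.so.new")
with open(new_build, "wb") as f:
    f.write(b"new build with more functions")
os.replace(new_build, lib)
changed = needs_reattach(lib, attached_to)
```
A supervisor loop could call needs_reattach periodically and restart the tracing session when it returns True.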

11:35.040 --> 11:37.040
Thanks.

11:37.040 --> 11:42.040
Let's move on to the fourth gotcha, about inlining.

11:42.040 --> 11:44.040
So compilers have

11:45.040 --> 11:50.040
advanced forms of inlining depending on the level of optimization you choose.

11:50.040 --> 11:54.040
And we're going to go a little bit more into detail.

11:54.040 --> 11:56.040
But we're going to see only two gotchas about inlining.

11:56.040 --> 12:00.040
And as Daniel mentioned a few days ago, there is a zoo of

12:00.040 --> 12:02.040
compilation gotchas.

12:02.040 --> 12:05.040
There are many of them.

12:05.040 --> 12:09.040
But I found two very cool ones that I want to share with you.

12:09.040 --> 12:11.040
So what is inlining first?

12:11.040 --> 12:15.040
Inlining is an optimization where a function call is replaced by the actual code

12:15.040 --> 12:18.040
of the function itself at the point of call.

12:18.040 --> 12:20.040
We're going to go more into detail.

12:20.040 --> 12:22.040
That's the basic definition.

12:22.040 --> 12:25.040
Okay, let's have a look at code again.

12:25.040 --> 12:26.040
Add function.

12:26.040 --> 12:27.040
I'm not very original.

12:27.040 --> 12:28.040
And main.

12:28.040 --> 12:34.040
This time, if you pass an argument that is under five,

12:34.040 --> 12:36.040
we're just going to return.

12:36.040 --> 12:37.040
So early return.

12:37.040 --> 12:39.040
Otherwise, we're going to call the add.

12:42.040 --> 12:45.040
Then I'm going to compile without any optimization.

12:45.040 --> 12:48.040
So -O0.

12:48.040 --> 12:50.040
And I'm going to see, when I dump the symbols of the program with nm,

12:50.040 --> 12:58.040
which is a tool, that I can see the offset of my symbol add.

12:58.040 --> 13:02.040
And the T just means that it's a global (external) symbol in the text section.

13:02.040 --> 13:07.040
Then I'm going to compile with an optimization level of -O2.

13:07.040 --> 13:09.040
And I'm happy.

13:09.040 --> 13:11.040
I still have my symbol.

13:11.040 --> 13:19.040
So I can normally attach in any case to my add function and see my probes, right?

13:19.040 --> 13:21.040
So let's see.

13:21.040 --> 13:22.040
Okay.

13:22.040 --> 13:26.040
So it's going to be tricky because I don't have my ID here.

13:26.040 --> 13:28.040
So bear with me please.

13:28.040 --> 13:31.040
Mirroring here.

13:31.040 --> 13:33.040
Okay.

13:33.040 --> 13:36.040
So selective inlining, right?

13:36.040 --> 13:39.040
I have the non-optimized.

13:39.040 --> 13:42.040
I'm going to trace the non-optimized one.

13:42.040 --> 13:44.040
Okay.

13:44.040 --> 13:45.040
Attach in here.

13:45.040 --> 13:49.040
And then I'm going to run my program mute.

13:49.040 --> 13:50.040
Okay.

13:50.040 --> 13:53.040
Let's not mess up that.

13:53.040 --> 13:55.040
Okay.

13:55.040 --> 13:57.040
Then I'm going to run it here.

13:57.040 --> 13:58.040
Okay.

13:58.040 --> 13:59.040
You can see my history here.

13:59.040 --> 14:00.040
Let's clean it.

14:00.040 --> 14:01.040
This price.

14:01.040 --> 14:02.040
Okay.

14:02.040 --> 14:03.040
You can see the usage.

14:03.040 --> 14:04.040
I have 0.

14:04.040 --> 14:05.040
It's under five.

14:06.040 --> 14:08.040
Then I'm going to run it with 10.

14:08.040 --> 14:11.040
So still nothing is happening in the probe, right?

14:11.040 --> 14:13.040
Because the function is not called.

14:13.040 --> 14:14.040
Then you can see.

14:14.040 --> 14:15.040
It's actually hitting.

14:15.040 --> 14:16.040
Great.

14:16.040 --> 14:17.040
We're happy.

14:17.040 --> 14:18.040
Okay.

14:18.040 --> 14:21.040
Now I'm going to do it with the optimized one.

14:21.040 --> 14:24.040
So let me trace the optimized one first.

14:24.040 --> 14:25.040
Okay.

14:25.040 --> 14:27.040
Can you all see your book?

14:27.040 --> 14:28.040
Yeah.

14:28.040 --> 14:31.040
Okay.

14:31.040 --> 14:32.040
Okay.

14:32.040 --> 14:34.040
Then I'm going to run again with zero.

14:34.040 --> 14:36.040
And I have no probe.

14:36.040 --> 14:37.040
That's normal.

14:37.040 --> 14:38.040
Right?

14:38.040 --> 14:41.040
Then I'm going to do it with 10.

14:41.040 --> 14:42.040
Oh.

14:42.040 --> 14:44.040
No probe again.

14:44.040 --> 14:45.040
Oops.

14:45.040 --> 14:46.040
What's happening?

14:46.040 --> 14:47.040
Let's see.

14:47.040 --> 14:48.040
Okay.

14:48.040 --> 14:50.040
So if I go back to the slide.

14:50.040 --> 14:52.040
So here's what you're seeing.

14:52.040 --> 14:53.040
Okay.

14:53.040 --> 14:54.040
Don't be afraid.

14:54.040 --> 14:55.040
It's okay.

14:55.040 --> 14:57.040
It's only the disassembled binary.

14:57.040 --> 14:59.040
We have a bunch of instructions here.

14:59.040 --> 15:03.040
What you want to look at is the offset of main

15:03.040 --> 15:05.040
And the offset of the add.

15:05.040 --> 15:06.040
Okay.

15:06.040 --> 15:08.040
I have on one side.

15:08.040 --> 15:09.040
No optimization.

15:09.040 --> 15:10.040
It's branching.

15:10.040 --> 15:12.040
You can see the BL here.

15:12.040 --> 15:13.040
Just here.

15:13.040 --> 15:20.040
So BL means that main's instructions are going to branch to and run the add function.

15:20.040 --> 15:22.040
And the probe is going to attach to the add offset,

15:22.040 --> 15:25.040
the actual program's.

15:25.040 --> 15:31.040
So the offset corresponds to the actual instructions that are going to be executed.

15:31.040 --> 15:36.040
What's happening in the case of optimization is that I still have my main offset.

15:36.040 --> 15:37.040
Right?

15:37.040 --> 15:42.040
Then instead of having this branching, I have all the instructions that were in the function, which

15:42.040 --> 15:45.040
are inlined in the main program, in the caller.

15:45.040 --> 15:48.040
And you can see here that I have the add

15:48.040 --> 15:50.040
at a new offset, inserted.

15:50.040 --> 15:52.040
All the instructions are there.

15:52.040 --> 15:56.040
So when I was attaching to the symbol.

15:56.040 --> 15:58.040
It's not the proper offset.

15:58.040 --> 16:02.040
The actual one that is going to be executed by the CPU.

16:02.040 --> 16:06.040
So how do you fix that?

16:06.040 --> 16:08.040
Do you have any idea?

16:08.040 --> 16:10.040
Okay.

16:10.040 --> 16:11.040
What?

16:11.040 --> 16:12.040
No, it might not go.

16:12.040 --> 16:13.040
Okay.

16:13.040 --> 16:14.040
That's an option.

16:14.040 --> 16:17.040
I actually have another one for you.

16:17.040 --> 16:22.040
I'm going to show you how you can attach to an offset directly.

16:22.040 --> 16:24.040
That's very low level.

16:24.040 --> 16:26.040
I didn't even know that we could do that.

16:26.040 --> 16:27.040
Okay.

16:27.040 --> 16:28.040
That's funny.

16:28.040 --> 16:32.040
So I'm going to use perf, for a funny reason: I found out at like 2 AM yesterday, while doing

16:32.040 --> 16:34.040
my final demos,

16:34.040 --> 16:40.040
that you cannot use two bpftrace versions, like the 20 and the 24, because they're not

16:40.040 --> 16:44.040
supporting the same, like, commands.

16:44.040 --> 16:47.040
So my two examples were not working.

16:47.040 --> 16:49.040
So maybe it's a regression; if you want to file a bug,

16:49.040 --> 16:51.040
I can give you information.

16:51.040 --> 16:54.040
Anyway, I'm going to use Perf for that.

16:54.040 --> 16:57.040
So Perf is going to trace it also.

16:57.040 --> 16:58.040
Let's do it.

16:58.040 --> 17:00.040
Bear with me again.

17:00.040 --> 17:01.040
Mirroring.

17:01.040 --> 17:02.040
Okay.

17:02.040 --> 17:05.040
So I'm going to attach to the offset here.

17:05.040 --> 17:12.040
I didn't tell you, but this offset is just like the instruction of the function.

17:12.040 --> 17:13.040
Sorry.

17:13.040 --> 17:17.040
The offset of the add instruction minus the main one.

17:17.040 --> 17:20.040
So this is a relative offset from main.

17:20.040 --> 17:23.040
I computed it here and I'm going to attach to this offset.

17:23.040 --> 17:26.040
So main plus the relative offset.
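
NOTE
The arithmetic just described, as a small Python sketch. The nm-style text and the address
of the inlined add instruction are made-up values, not the ones from the demo, and the
"main+0x..." string only illustrates the symbol-plus-offset form; check your tool's own
documentation for the exact probe syntax it accepts.
```python
# Hypothetical nm-style output: address, type, name.
NM_OUTPUT = """\
0000000000000740 T main
0000000000000714 T add
"""
def parse_nm(text):
    """Map symbol name to address from nm-style lines."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _kind, name = parts
            symbols[name] = int(addr, 16)
    return symbols
def relative_offset(instr_addr, symbols, base="main"):
    """Offset of an instruction from the start of the base symbol."""
    return instr_addr - symbols[base]
symbols = parse_nm(NM_OUTPUT)
# Hypothetical address of the inlined add instruction, as read from a disassembly.
inlined_add_addr = 0x768
offset = relative_offset(inlined_add_addr, symbols)
probe_spec = f"main+{offset:#x}"
```
The address of the inlined instruction still has to be found by reading the disassembly; only the subtraction is automated here.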

17:26.040 --> 17:27.040
Okay.

17:27.040 --> 17:29.040
Let me.

17:29.040 --> 17:31.040
Okay.

17:31.040 --> 17:34.040
Let me do that.

17:34.040 --> 17:35.040
Okay.

17:35.040 --> 17:37.040
So I'm going to do it on target zero.

17:37.040 --> 17:39.040
So we're going to see what we see.

17:39.040 --> 17:40.040
We got it.

17:40.040 --> 17:41.040
Yeah.

17:41.040 --> 17:42.040
I mean, yeah.

17:42.040 --> 17:43.040
It's done.

17:43.040 --> 17:44.040
Then I just.

17:44.040 --> 17:46.040
We get all the events that have been captured.

17:46.040 --> 17:47.040
And there's nothing.

17:47.040 --> 17:48.040
Right?

17:48.040 --> 17:51.040
Because I'm calling the non-optimized program.

17:51.040 --> 17:56.040
Then I'm going to call the optimized program here.

17:56.040 --> 18:01.040
And.

18:01.040 --> 18:02.040
Yeah.

18:02.040 --> 18:06.040
Let's see what's happening.

18:06.040 --> 18:07.040
Yay.

18:07.040 --> 18:09.040
We actually have a probe.

18:09.040 --> 18:13.040
So you can see that by attaching to the main with the offset.

18:13.040 --> 18:18.040
We are actually capturing at the proper instruction.

18:18.040 --> 18:21.040
That is doing the add.

18:21.040 --> 18:23.040
So yeah, let me know if you do that sometime.

18:23.040 --> 18:24.040
While debugging.

18:24.040 --> 18:26.040
That would be funny.

18:26.040 --> 18:27.040
Okay.

18:27.040 --> 18:28.040
Conclusion.

18:28.040 --> 18:30.040
So we've seen that.

18:30.040 --> 18:35.040
About selective inlining: the compiler can actually keep the symbol in the binary.

18:35.040 --> 18:37.040
It's visible using nm.

18:37.040 --> 18:39.040
But then it's inlined

18:39.040 --> 18:40.040
into the caller.

18:40.040 --> 18:43.040
So your probe attaches on the symbol,

18:43.040 --> 18:46.040
but it's never going to fire,

18:47.040 --> 18:51.040
because the executed instructions are all inlined in the caller.

18:51.040 --> 18:53.040
How can you detect that?

18:53.040 --> 18:54.040
You can use objdump.

18:54.040 --> 18:56.040
You can use llvm-dwarfdump.

18:56.040 --> 18:58.040
They are not very user-friendly,

18:58.040 --> 18:59.040
I must admit.

18:59.040 --> 19:02.040
But when you get used to it, that's okay.

19:02.040 --> 19:07.040
And you can check if the function is inlined by seeing if it's branching or not,

19:07.040 --> 19:11.040
or if you have the instructions directly.

19:11.040 --> 19:15.040
So be aware of one thing.

19:16.040 --> 19:19.040
It's that there are some caveats.

19:19.040 --> 19:22.040
Here we have only one call.

19:22.040 --> 19:26.040
But you may need multiple probes, for all the call sites.

19:26.040 --> 19:29.040
So that can be a problem.

19:29.040 --> 19:30.040
Okay.

19:30.040 --> 19:33.040
Let's move on to the next one.

19:33.040 --> 19:35.040
Kprobe and uprobe partial inlining.

19:35.040 --> 19:38.040
So we have a program again.

19:38.040 --> 19:43.040
And how many symbols does this generate when compiling with the following command, with -O2?

19:43.040 --> 19:45.040
One, two, ten.

19:45.040 --> 19:47.040
But the whole symbols.

19:47.040 --> 19:49.040
You don't know.

19:49.040 --> 19:50.040
It's actually two.

19:50.040 --> 19:53.040
So we have a part.

19:53.040 --> 19:56.040
What's happening with GCC?

19:56.040 --> 19:59.040
We have a fast path and a slow path.

19:59.040 --> 20:02.040
Basically we have before and after.

20:02.040 --> 20:04.040
Before we have one function.

20:04.040 --> 20:05.040
After we have two parts.

20:05.040 --> 20:07.040
We have the fast path.

20:07.040 --> 20:09.040
Which is just the early return of the code.

20:09.040 --> 20:12.040
And then we jump to the processing part.

20:13.040 --> 20:16.040
So instead of having one symbol holding all the code.

20:16.040 --> 20:18.040
We have two symbols.

20:18.040 --> 20:21.040
Why does GCC do that?

20:21.040 --> 20:24.040
It's just that it's better for branch prediction.

20:24.040 --> 20:27.040
It's a smaller footprint in the instruction cache.

20:27.040 --> 20:29.040
It reduces register pressure.

20:29.040 --> 20:32.040
I mean it's doing optimization.

20:32.040 --> 20:35.040
So let's have a look quickly.

20:35.040 --> 20:37.040
Okay.

20:37.040 --> 20:40.040
Let's move here.

20:40.040 --> 20:43.040
And here.

20:43.040 --> 20:46.040
Okay.

20:46.040 --> 20:49.040
Let's do that quickly.

20:49.040 --> 20:54.040
So I'm going to trace the one with no partial inlining.

20:54.040 --> 20:56.040
That means that it's not optimized.

20:56.040 --> 21:00.040
I can see if I'm running that.

21:00.040 --> 21:03.040
That I have calls here.

21:03.040 --> 21:09.040
So it's probing when I'm calling with the values that are actually not early return.

21:10.040 --> 21:12.040
Then I'm going to.

21:12.040 --> 21:13.040
Okay.

21:13.040 --> 21:14.040
Do.

21:14.040 --> 21:16.040
bpftrace

21:16.040 --> 21:18.040
On the one that is optimized.

21:18.040 --> 21:20.040
And I'm going to run it.

21:20.040 --> 21:21.040
Here.

21:21.040 --> 21:23.040
And nothing is probing.

21:23.040 --> 21:24.040
Okay.

21:24.040 --> 21:28.040
I'm actually doing it on the symbol allocate resource.

21:28.040 --> 21:32.040
So let's see if I actually do it on the part.

21:32.040 --> 21:35.040
So you can see that I have the suffix here.

21:35.040 --> 21:39.040
So it means that that's the code that is actually doing the allocation.

21:39.040 --> 21:42.040
And I'm going to run it again.

21:42.040 --> 21:43.040
Yay.

21:43.040 --> 21:46.040
We have our probes.

21:46.040 --> 21:48.040
So.

21:48.040 --> 21:51.040
That's just a summary of it for reference.

21:51.040 --> 21:52.040
But basically.

21:52.040 --> 21:58.040
You just have silence when you're probing the allocate resource symbol,

21:58.040 --> 22:01.040
because it's just jumping to another part symbol.

22:01.040 --> 22:07.040
And the second one is the probe that we just saw, when you attach to the part.

22:07.040 --> 22:09.040
So what are the alternatives again?

22:09.040 --> 22:11.040
Generally.

22:11.040 --> 22:13.040
You can.

22:13.040 --> 22:14.040
Check for trace points.

22:14.040 --> 22:16.040
I know that's the running gag of the talk.

22:16.040 --> 22:18.040
But there is no one-to-one translation

22:18.040 --> 22:21.040
sometimes, depending on the function and your use case.

22:21.040 --> 22:22.040
So.

22:22.040 --> 22:23.040
Yeah.

22:23.040 --> 22:24.040
It depends.

22:24.040 --> 22:26.040
But if you're using uprobes,

22:26.040 --> 22:28.040
you can actually, as you mentioned before,

22:28.040 --> 22:30.040
use

22:30.040 --> 22:32.040
the no-partial-inlining option.

22:32.040 --> 22:35.040
Otherwise, for kprobes,

22:35.040 --> 22:37.040
you're not going to recompile the kernel, right?

22:37.040 --> 22:39.040
So you can check for suffixed symbols.

22:39.040 --> 22:42.040
There are different ones.

22:42.040 --> 22:44.040
And you can verify with DWARF again,

22:44.040 --> 22:47.040
or see the source mapping with objdump.
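
NOTE
A small Python sketch of the "check for suffixed symbols" step. The suffix list covers
clone suffixes GCC is known to emit (.part, .isra, .constprop, .cold) but is not claimed
to be complete, and the symbol names below, including the allocate_resource spelling, are
hypothetical. In practice the names would come from nm output or /proc/kallsyms.
```python
import re
# One or more known GCC clone suffixes, each optionally followed by a number.
SUFFIXES = re.compile(r"(\.(part|isra|constprop|cold)(\.\d+)?)+")
def variants(function, symbol_names):
    """Return symbols that are the function itself or a suffixed clone of it."""
    found = []
    for name in symbol_names:
        if name == function:
            found.append(name)
        elif name.startswith(function + "."):
            rest = name[len(function):]
            if SUFFIXES.fullmatch(rest):
                found.append(name)
    return found
# Hypothetical symbol list.
names = ["main", "allocate_resource", "allocate_resource.part.0", "allocate_resources_all"]
hits = variants("allocate_resource", names)
```
If the result contains more than the plain name, a probe on the plain symbol alone may miss the code that does the real work.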

22:47.040 --> 22:51.040
So that was the gotcha for inlining.

22:51.040 --> 22:52.040
All right.

22:52.040 --> 22:54.040
Now we're going to take a look at the next gotcha.

22:54.040 --> 22:56.040
Missed executions.

22:56.040 --> 22:59.040
So here we have this output from bpftool,

22:59.040 --> 23:02.040
showing an fprobe, a kprobe, and a tracepoint.

23:02.040 --> 23:05.040
And there's something interesting there at the end.

23:05.040 --> 23:10.040
So any ideas what the recursion miss counter represents here?

23:10.040 --> 23:12.040
A.

23:12.040 --> 23:15.040
B.

23:15.040 --> 23:17.040
C.

23:17.040 --> 23:22.040
D.

23:22.040 --> 23:23.040
All right.

23:23.040 --> 23:25.040
Well, I got to move fast so running out of time.

23:25.040 --> 23:26.040
It's B.

23:26.040 --> 23:32.040
How many times recursion prevented the bpf program from running again while already executing.

23:32.040 --> 23:39.040
So as we saw in the bpftool output, your tracing BPF programs can miss executions.

23:39.040 --> 23:44.040
So in the case you're depending on these probes for like security or observability,

23:44.040 --> 23:46.040
this can be like a challenge.

23:46.040 --> 23:49.040
So for example, let's say you're on the security team.

23:49.040 --> 23:52.040
You're interested in monitoring file operations in your infrastructure.

23:53.040 --> 23:57.040
You're using BPF to monitor like file open events, for example.

23:57.040 --> 24:02.040
And you might notice, like, you're expecting a thousand events from your metrics,

24:02.040 --> 24:07.040
and your audit logs say only 900 events or something like that.

24:07.040 --> 24:08.040
You're missing like 100.

24:08.040 --> 24:15.040
Or you're debugging a kernel function or kernel internals and you're tracing a hot function.

24:15.040 --> 24:20.040
The bug that you're looking for may have occurred like when the probe missed an execution.

24:20.040 --> 24:24.040
So you miss your like stack trace dump or whatever.

24:24.040 --> 24:28.040
So yeah, might be surprising to learn that your tracing hook points are not 100% reliable.

24:28.040 --> 24:34.040
So let's dig into the different ways that kprobes, fprobes, and tracepoints can miss:

24:34.040 --> 24:38.040
how they can miss and why, and we'll discuss some workarounds.

24:38.040 --> 24:46.040
So probes generally have two different places in the kernel where they can miss executing.

24:47.040 --> 24:52.040
One is when the probe re-enters, or recurses into, itself before completing the first execution.

24:52.040 --> 24:58.040
The second way is when there's any concurrent execution of any other BPF program.

24:58.040 --> 25:02.040
So crucially these misses only happen if execution occurs on the same CPU.

25:02.040 --> 25:03.040
If it's a different CPU,

25:03.040 --> 25:04.040
it's a new context,

25:04.040 --> 25:07.040
so it won't miss; that's not a problem.

25:07.040 --> 25:09.040
So let's look at what a recursion looks like.

25:09.040 --> 25:13.040
So let's look at example one; bear with the diagram here.

25:13.040 --> 25:16.040
So let's say you have a kprobe on openat2

25:16.040 --> 25:21.040
that calls printk, which takes the printk lock.

25:21.040 --> 25:25.040
At the same time, you have a tracepoint on contention_begin.

25:25.040 --> 25:30.040
Let's say contention occurs on the printk lock.

25:30.040 --> 25:34.040
The tracepoint program fires, which also might call printk.

25:34.040 --> 25:40.040
That leads to a recursive execution and the kernel skips the execution of the second program.

25:41.040 --> 25:44.040
And I don't oops.

25:44.040 --> 25:46.040
I don't like scroll down.

25:46.040 --> 25:49.040
I can scroll down.

25:49.040 --> 25:50.040
Okay, that's fine.

25:50.040 --> 25:51.040
That's fine.

25:51.040 --> 25:52.040
Yeah.

25:52.040 --> 25:53.040
Okay.

25:53.040 --> 25:55.040
So in the second example we have something a little simpler.

25:55.040 --> 25:57.040
We have a kprobe on openat2.

25:57.040 --> 26:02.040
And in the middle of executing, an interrupt can fire.

26:02.040 --> 26:05.040
The handler code for that interrupt then calls openat2 again.

26:05.040 --> 26:07.040
And then boom you have another recursion.

26:07.040 --> 26:10.040
And yeah.

26:10.040 --> 26:12.040
So let's dig into kprobes.

26:12.040 --> 26:16.040
So kprobes can miss at two different layers.

26:16.040 --> 26:19.040
One is at the handler or the attach layer.

26:19.040 --> 26:25.040
This is the layer where your kprobe is actually set up and called.

26:25.040 --> 26:31.040
And at this layer there is kprobe-specific recursion logic that's performed.

26:31.040 --> 26:35.040
It primarily checks if there are other kprobes running on the same CPU.

26:35.040 --> 26:39.040
And then the other layer is the actual BPF program execution layer.

26:39.040 --> 26:42.040
This is like the actual JITed instructions.

26:42.040 --> 26:47.040
And at this layer there is the BPF programs' mutual exclusion,

26:47.040 --> 26:53.040
which in other words means only one BPF program can be executing on the same CPU at a time.

26:53.040 --> 26:56.040
So let's look at the handler and attach layer.

26:56.040 --> 27:01.040
We have three different handlers depending on your kernel config and version.

27:01.040 --> 27:05.040
The first one is the standard breakpoint handler.

27:05.040 --> 27:08.040
Then we have the ftrace handler, which is optimized.

27:08.040 --> 27:16.040
Then we have another handler called opt, which stands for optimized as well.

27:16.040 --> 27:25.040
Yeah, so anyway, nowadays running optimized kprobes, as can be seen here, is the most common.

27:25.040 --> 27:29.040
But the int3 handler still exists for legacy purposes.

27:29.040 --> 27:37.040
Okay, so all three handlers have this same check here, where they're checking if there are other kprobes running on the same CPU.

27:37.040 --> 27:43.040
And that's done by looking at the current_kprobe variable, which points to the currently running kprobe.

27:43.040 --> 27:51.040
And that's very similar to the current task variable that we all know for processes in the kernel.

27:51.040 --> 27:55.040
And so yeah, if this condition is true, we miss the execution of the kprobe.

27:55.040 --> 27:58.040
So why does this even happen anyway?

27:58.040 --> 28:10.040
So from my understanding, the kernel drops the execution to avoid corrupting the state of the kprobe that's currently running, due to the kprobe being like an interrupt-based thing,

28:10.040 --> 28:15.040
especially in the int3 handler type.

28:15.040 --> 28:22.040
And so if in the future that handler is deprecated, I believe it's possible that the kprobe_running restriction could be lifted.

28:22.040 --> 28:27.040
I would love to be corrected here, but that could reduce the number of missed executions for kprobes specifically.

28:27.040 --> 28:32.040
All right, so let's look at the second layer here for kprobes.

28:32.040 --> 28:38.040
We've passed the attach layer, meaning there was no kprobe detected running on the same CPU.

28:38.040 --> 28:41.040
Now we reach the program execution layer.

28:41.040 --> 28:46.040
So at this layer we have a per-CPU variable that's checked, bpf_prog_active.

28:46.040 --> 28:50.040
And this tells you if there's another BPF program already running on the same CPU.

28:50.040 --> 28:56.040
So this check exists to protect the per CPU kernel state that BPF programs might access or modify.

28:56.040 --> 28:59.040
So that's why the kernel drops the execution here.

28:59.040 --> 29:05.040
And without that drop the kernel state could get corrupted from nested executions.
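
NOTE
Editor's sketch, not from the talk and not kernel code: a toy Python model of the
two kprobe guard layers described above. The Cpu class and the run_kprobe and
run_bpf helpers are hypothetical; only the names current_kprobe and
bpf_prog_active come from the talk.
```python
class Cpu:
    """Toy per-CPU guard state; a model for illustration, not the kernel's structures."""
    def __init__(self):
        self.current_kprobe = None  # attach layer: kprobe currently running on this CPU
        self.bpf_prog_active = 0    # execution layer: BPF nesting depth on this CPU
        self.missed = []            # (probe name, layer) for every dropped run
#
def run_bpf(cpu, name, body=None):
    """Execution-layer guard: only one BPF program may run per CPU at a time."""
    cpu.bpf_prog_active += 1
    try:
        if cpu.bpf_prog_active != 1:
            cpu.missed.append((name, "execution"))
            return False
        if body is not None:
            body()  # stands in for whatever happens while the program runs
        return True
    finally:
        cpu.bpf_prog_active -= 1
#
def run_kprobe(cpu, name, body=None):
    """Attach-layer guard first, then the execution-layer guard."""
    if cpu.current_kprobe is not None:
        cpu.missed.append((name, "attach"))
        return False
    cpu.current_kprobe = name
    try:
        return run_bpf(cpu, name, body)
    finally:
        cpu.current_kprobe = None
#
cpu0, cpu1 = Cpu(), Cpu()
# An interrupt on cpu0 re-enters the probed function while its kprobe is running.
run_kprobe(cpu0, "openat2", body=lambda: run_kprobe(cpu0, "openat2"))
# A tracepoint program fires on cpu0 while the kprobe's BPF program is running.
run_kprobe(cpu0, "openat2", body=lambda: run_bpf(cpu0, "contention_begin"))
# The same probe firing on another CPU is not a miss.
run_kprobe(cpu0, "openat2", body=lambda: run_kprobe(cpu1, "openat2"))
print(cpu0.missed, cpu1.missed)
```
In this model the nested kprobe is dropped at the attach layer, the nested
tracepoint program is dropped at the execution layer, and nothing is dropped on
the second CPU.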

29:05.040 --> 29:08.040
All right, let's go to fprobes now.

29:08.040 --> 29:18.040
So fprobes don't use the kprobe_running helper; they use their own trylock function.

29:18.040 --> 29:25.040
And this function actually supports one level of nested execution,

29:25.040 --> 29:30.040
as long as that execution is from the same context.

29:30.040 --> 29:37.040
So for example, if we have an fprobe in IRQ and one in NMI, two fprobes I mean, that's not allowed.

29:37.040 --> 29:46.040
But if you have the same fprobe twice on the same CPU and from the same context, that nesting is allowed at the handler layer for fprobes.

29:46.040 --> 29:51.040
All right, so the fprobes' BPF program execution layer.

29:51.040 --> 30:04.040
So fprobes are BPF trampoline based, not interrupt based, meaning that at the execution layer it checks if the same program is already executing on the same CPU.

30:04.040 --> 30:07.040
Wrap up okay.

30:07.040 --> 30:10.040
So.

30:10.040 --> 30:18.040
Yeah, so because fprobes are based on trampolines, each fprobe program has its own trampoline where it stores its state.

30:18.040 --> 30:27.040
So if there's a nested execution, the nested fprobe could overwrite the BPF trampoline memory.

30:27.040 --> 30:34.040
Okay, so here's a summary of what we just discussed; I'll skip this for time purposes.

30:34.040 --> 30:39.040
So what are your choices? If you can, try to use BPF LSM hooks.

30:39.040 --> 30:46.040
For example, if you have a kprobe or fprobe on openat2, maybe try the file_open BPF LSM hook.

30:46.040 --> 30:58.040
Because LSM hooks are guaranteed to run under the NMI context with concurrent executions and that is unlike tracing BPF programs.

30:58.040 --> 31:07.040
And if you can't use fprobes, sorry, if you can't use BPF LSM hooks, try tracepoints, because they have the highest performance.

31:07.040 --> 31:15.040
So if you spend the least amount of time in your hook, then there's less chance of an interrupt occurring, and therefore

31:15.040 --> 31:18.040
recursion.

31:18.040 --> 31:20.040
There's a summary; moving on.

31:20.040 --> 31:22.040
All right, a few takeaways.

31:22.040 --> 31:23.040
I think.

31:23.040 --> 31:24.040
Got to wrap up.

31:24.040 --> 31:25.040
Okay.

31:25.040 --> 31:28.040
All right thanks everyone.

31:28.040 --> 31:35.040
Thank you.

