WEBVTT

00:00.000 --> 00:10.160
For the new people who joined: welcome to the Go Devroom again.

00:10.160 --> 00:14.720
Our next talk is really interesting, because we have been talking about instrumentation a

00:14.720 --> 00:20.200
lot today, but we never talked about doing it without actually changing a single line

00:20.200 --> 00:21.200
of code.

00:21.200 --> 00:26.000
So I have two amazing speakers on stage at the same time, Hannah and Kemal, who

00:26.000 --> 00:28.760
are both going to talk about instrumentation.

00:28.760 --> 00:29.760
[Applause]

00:29.760 --> 00:41.280
Okay, so yeah, we're going to talk about instrumentation.

00:41.280 --> 00:44.040
We talked a lot about profiling, that was cool.

00:44.040 --> 00:45.320
Parca got a lot of mentions.

00:45.320 --> 00:49.040
I used to maintain Parca, so I'm glad that it's getting traction.

00:49.040 --> 00:52.080
No, okay, it's good.

00:52.080 --> 00:55.000
Okay, should I repeat myself?

00:55.000 --> 00:56.800
Yeah, let's go.

00:56.800 --> 00:57.800
Okay.

00:57.800 --> 00:58.800
Let's take a look at that.

00:58.800 --> 00:59.800
Okay.

00:59.800 --> 01:12.880
So, cool, yesterday when I was walking I was actually thinking about this talk and, like,

01:12.880 --> 01:17.360
zero code changes and whatnot, and I realized that maybe all the things we're going to

01:17.360 --> 01:22.120
mention today, maybe they are not needed, because in the future only AIs are going

01:22.120 --> 01:24.640
to write the code.

01:24.640 --> 01:29.200
We can just tell it, okay, just instrument our Go applications, and we can get

01:29.200 --> 01:30.200
away with it.

01:30.200 --> 01:34.600
But until that time, let's see what we can do today.

01:34.600 --> 01:36.800
So first of all, why do we care?

01:36.800 --> 01:37.800
Right?

01:37.800 --> 01:44.400
So the observability promise: you understand the whole system behavior, debug production

01:44.400 --> 01:49.360
issues, and then prevent outages, or troubleshoot them when they happen.

01:49.360 --> 01:56.320
What you actually get is distributed complexity and partial visibility: you don't see all your

01:56.320 --> 02:01.240
metrics, spans, or any of the signals that you are collecting, and then, okay, "it works

02:01.240 --> 02:04.040
on my machine" but it just happened.

02:04.040 --> 02:07.640
So how do you achieve this?

02:07.640 --> 02:10.320
You need to pay the tax, right?

02:10.320 --> 02:15.920
Today we're going to strictly talk about or like our example is going to be about distributed

02:15.920 --> 02:21.840
tracing, collecting the traces, whatnot, but technically this could apply to any of the

02:21.840 --> 02:24.600
signals that you collect from your processes.

02:24.600 --> 02:31.720
So you import an SDK into your Go application, initialize your tracer, wrap your handlers

02:31.720 --> 02:37.520
if this is an online HTTP based application, propagate context everywhere because if you

02:37.520 --> 02:44.000
are now making an HTTP request from the HTTP client or, like, making a database query, and

02:44.000 --> 02:49.200
you need to propagate all these contexts, and then you need to shut down everything gracefully,

02:49.200 --> 02:53.080
make sure everything flushes to your observability system and whatnot.

02:53.080 --> 02:57.320
And now if you are like working for a big enterprise company and if you have like hundreds

02:57.320 --> 03:03.240
of services, microservices, you need to do this again and again and again and again.

03:03.240 --> 03:13.080
So where does the instrumentation get in the way? It's one of the things that led

03:13.080 --> 03:15.560
OpenTelemetry to rise up.

03:15.560 --> 03:21.760
You are getting something from Datadog, something from New Relic, or other APM

03:21.760 --> 03:22.760
vendors.

03:22.760 --> 03:29.240
You put that in your code base, and now you have this third-party, vendor-specific code, and

03:29.240 --> 03:36.600
it's not your business logic, it belongs to your vendor, and then it's inconsistent.

03:36.600 --> 03:42.160
Maybe one of the teams implemented something in their service and it exposes

03:42.160 --> 03:46.520
different attributes and labels, and the other service doesn't have them, like

03:46.520 --> 03:51.520
how do you ensure consistency, and how do you ensure that you don't clutter

03:51.520 --> 03:58.400
your code base, right? And then there's this anxiety of, I'm collecting all these data

03:58.400 --> 04:04.440
signals and they're like useful and everything but then like what is the cost of it,

04:04.440 --> 04:09.320
what is the overhead of like doing all these sort of instrumentation.

04:09.320 --> 04:16.320
So we can't do anything about the last part or maybe we can but it's like harder to tweak

04:16.320 --> 04:24.040
the performance part, but what about getting the toil out of the way, the toil of instrumentation,

04:24.040 --> 04:28.040
out of the way, and directly providing the value of observability?

04:28.040 --> 04:33.440
So that means like no SDK imports, no wrapping any handlers or functions, no context

04:33.440 --> 04:41.000
propagation, we don't need to deal with it, observability should just work.

04:41.000 --> 04:47.240
So first of all, let's step aside and talk about instrumentation, what do we mean

04:47.240 --> 04:48.240
by instrumentation?

04:48.240 --> 04:54.480
You have your application, let's say then you have your like backend, your application

04:54.480 --> 04:59.200
interacts with your backend or any other services, this could get easily complicated but

04:59.200 --> 05:03.440
then what's happening in between, maybe you need to understand like what's happening

05:03.440 --> 05:08.400
in the ingress point, what's happening if I'm calling a microservice for

05:08.400 --> 05:11.240
auth, whatnot, how do you do that?

05:11.240 --> 05:17.680
So you can use logs, it's the easiest, developers love 'em, right?

05:17.680 --> 05:24.800
You can just put a log line, you see it on your local machine and push that to some service

05:24.800 --> 05:30.240
but the problem is like yes, it's super convenient to add the logs but it's one of the

05:30.240 --> 05:36.880
hardest signals to store, search, and make sense of, it's a challenging task.

05:36.880 --> 05:42.560
Yes, then you have metrics, they're easier to collect, but they're aggregated

05:42.560 --> 05:43.560
data, right?

05:43.560 --> 05:50.200
And they don't tell a story about your individual transactions, so that's where tracing

05:50.200 --> 05:51.200
comes in, right?

05:51.200 --> 05:55.880
It's about transaction, you have like hierarchical data, you have the context propagation,

05:55.880 --> 06:01.880
it's rich, you can derive metrics and logs from the traces, you can build a wide event system

06:01.880 --> 06:08.760
based on it, it gives you more data but again, it's complicated to store, let's not

06:08.760 --> 06:13.880
talk about the storage, but it's complicated to instrument and collect this data.

06:13.880 --> 06:18.160
So that's where the auto instrumentation comes in.

06:18.160 --> 06:19.160
Cool.

06:19.160 --> 06:23.400
So the point of our talk is about auto instrumentation so let's talk a little bit about

06:23.400 --> 06:28.520
that, but before we get into all of that, we need to talk about all of the manual toil

06:28.520 --> 06:32.880
that comes with manual instrumentation that Kemal was talking about.

06:32.880 --> 06:39.360
So let's say that we have this HTTP request handler that's just doing something simple.

06:39.360 --> 06:41.720
Let's say I want to instrument this.

06:41.720 --> 06:46.760
I need to add around 15 lines of code for every single handler that I have, so every

06:46.760 --> 06:49.640
health check that you have, every endpoint that you have.

06:49.640 --> 06:53.080
You've got to start a span, you've got to stop the span, you've got to set the attributes

06:53.080 --> 06:58.360
somehow, it goes on and on, this doesn't include starting the tracer, everything like

06:58.360 --> 06:59.360
that.

06:59.360 --> 07:03.520
So what's the point of doing all this work?

07:03.520 --> 07:09.000
I am lazy, I want to have all this data, but I don't want to do it myself,

07:09.000 --> 07:14.000
and as Kemal said, maybe AI will do this for me eventually, but so far, I haven't heard

07:14.000 --> 07:15.560
of anything.

07:15.560 --> 07:21.080
I need to do something in between having my application and actually getting data to

07:21.080 --> 07:24.080
actually profit from having this application.

07:24.080 --> 07:27.880
So this is exactly where auto instrumentation comes in.

07:27.880 --> 07:32.720
Auto instrumentation as you can probably tell by the name is a way of instrumenting your

07:32.720 --> 07:36.120
code without having to make any code changes.

07:36.120 --> 07:38.520
And there are two different types that we're going to talk about today.

07:38.520 --> 07:44.000
The first one is run time auto instrumentation which you can probably tell happens during

07:44.000 --> 07:46.000
run time.

07:46.000 --> 07:51.600
This often means modifying source code at runtime, and if you know Go, which is a compiled

07:51.600 --> 07:54.720
language, you know we can't change the source code at runtime.

07:54.720 --> 07:59.320
So luckily for us, in Go there are other alternatives that don't require source code changes,

07:59.320 --> 08:01.800
and we're going to talk about that.

08:01.800 --> 08:08.800
The other type is compile time instrumentation which happens at compile time surprisingly.

08:08.800 --> 08:13.200
And this works a lot better for compiled languages like Go, because this will

08:13.200 --> 08:18.200
never require you to make source code changes.

08:18.200 --> 08:23.000
Okay, so let's go more in depth into each of these approaches.

08:23.000 --> 08:28.600
The first one, for runtime: basically all of the approaches in Go are built around eBPF.

08:28.600 --> 08:33.040
And if you were at the eBPF Devroom yesterday, you've heard a lot about it and you probably

08:33.040 --> 08:36.880
know, but as an overview, it uses things called hooks.

08:36.880 --> 08:41.200
Uprobes are in the user space, kprobes are in the kernel space, and then there are also

08:41.200 --> 08:46.920
static hooks called USDT, and all of these hooks give you an opportunity to jump from

08:46.920 --> 08:50.960
your application to some kind of tracing code.

08:50.960 --> 08:55.440
There's another approach that's called library injection that takes advantage of LD_PRELOAD

08:55.440 --> 09:00.720
to basically do some weird magic that I don't understand to shove some code into your application

09:00.720 --> 09:03.280
who knows.

09:03.280 --> 09:07.480
So ignoring LD_PRELOAD, let's talk about eBPF.

09:07.480 --> 09:13.040
So as I mentioned earlier, it gives you a hook but when you have your basic application,

09:13.040 --> 09:16.560
you're most likely going to do some communication with the kernel, whether you're in

09:16.560 --> 09:21.280
user space or kernel space and as you're doing all of this work, that happens behind

09:21.280 --> 09:22.280
the scenes.

09:22.280 --> 09:29.240
eBPF gives you a hook that tells the code, hey, jump to this tracing code, make some spans,

09:29.240 --> 09:31.920
make some traces for me.

09:31.920 --> 09:38.480
Here's an example of eBPF auto instrumentation, this is the OTel auto instrumentation

09:38.480 --> 09:45.200
for Go, and it's very, very simple, all you have to do is create this config file.

09:45.200 --> 09:51.480
There's basically nothing here about tracing, all you have to do is bring in the library

09:51.480 --> 09:55.120
that they have and you don't have to make any source code changes.

09:55.120 --> 09:59.920
This will give you all of your traces, and this one takes advantage of uprobes, we'll talk a

10:00.000 --> 10:06.400
little bit more about this later, but it goes to show how simple this is.

10:06.400 --> 10:10.280
The other eBPF approach we see in Go is called OBI.

10:10.280 --> 10:15.480
This one is a library that was originally created by Grafana as Beyla, but was donated to the

10:15.480 --> 10:22.160
OpenTelemetry community, and this one also uses eBPF to trace your Go code.

10:22.160 --> 10:25.360
So a little bit more about OBI.

10:25.360 --> 10:27.680
It supports many different languages, not just go.

10:27.680 --> 10:33.760
So if you are for some reason not using Go, you can still probably use OBI, and it also has

10:33.760 --> 10:36.320
a lot of different coverage for different protocols that you want to use.

10:36.320 --> 10:40.400
So HTTP, gRPC, etc.

10:40.400 --> 10:44.400
The thing about eBPF that's very important to know is that since you're accessing the

10:44.400 --> 10:50.560
kernel, it's going to require you to give it administrative privileges and root access,

10:50.560 --> 10:55.080
which is something we'll delve into a little bit more later on, but this is very important

10:55.080 --> 10:58.600
to keep in mind.

10:58.600 --> 11:02.640
And the way OBI works is pretty similar to what I mentioned previously.

11:02.640 --> 11:07.480
You have your application and the kernel, there's a hook, and after this hook, there's

11:07.480 --> 11:10.800
an OBI sidecar that does all of the tracing for you.

11:10.800 --> 11:16.360
And again, you don't have to make any source code changes.

11:16.360 --> 11:21.560
Similar to the other eBPF application that we saw, this is also very, very easy to set up.

11:21.560 --> 11:25.760
It doesn't require any source code changes, and you can even do this without having to

11:25.760 --> 11:29.880
stop your application, which is kind of crazy.

11:29.880 --> 11:34.320
All you have to do is tell it what port things are being sent to, do you want traces, do you

11:34.320 --> 11:42.440
want metrics, use Prometheus, etc., and as soon as you run this very fancy command, with

11:42.440 --> 11:45.960
or without your application still running, you get traces.

11:45.960 --> 11:48.680
So pretty magical.

11:48.680 --> 11:51.120
So that's it for now for runtime approaches.

11:51.120 --> 11:55.360
Let's talk about the other side of things, which is compile time instrumentation.

11:55.360 --> 11:58.800
There are two main things that we want to talk about today.

11:58.800 --> 12:02.960
One is Orchestrion, which, I don't know if we mentioned this, but we both work at

12:02.960 --> 12:03.960
Datadog.

12:03.960 --> 12:08.280
So Orchestrion is one of our projects, and it is a compile time approach that takes

12:08.280 --> 12:11.320
advantage of toolexec in Go.

12:11.320 --> 12:16.200
And the OpenTelemetry compile time instrumentation SIG is a special interest group that

12:16.200 --> 12:20.600
is a collaboration between Datadog, Alibaba, and OTel.

12:20.600 --> 12:26.320
This is basically going to be one big library to support all of your compile time instrumentation

12:26.320 --> 12:27.320
needs.

12:27.320 --> 12:31.640
This one is still a work in progress, but we're working hard on it, and hopefully it will

12:31.640 --> 12:35.920
be available more widely soon.

12:35.920 --> 12:39.560
So how does compile time instrumentation work?

12:39.560 --> 12:43.840
As you all probably know, you start off with your code, a bunch of compiling things happen,

12:43.840 --> 12:48.120
and then you end up with an executable at some point that is actually what is being run

12:48.120 --> 12:50.480
when you run your application.

12:50.480 --> 12:54.800
Inside the compiler, there are a bunch of different steps, but we're going to just look

12:54.800 --> 12:56.200
at a few of them.

12:56.200 --> 13:01.360
The first thing that happens is that your code is broken down into abstract syntax trees,

13:01.360 --> 13:06.640
or ASTs, which are then turned into intermediate representations or IRs.

13:06.640 --> 13:12.560
And what an AST is, very briefly, is just a tree that consists of a bunch of nodes that represent

13:12.560 --> 13:16.560
your functions, your packages, your variables, et cetera.

13:16.560 --> 13:21.200
Once you have your IR and your ASTs, that gets broken down into machine code, a bunch

13:21.200 --> 13:27.080
of other steps, including linking, happen, and then you get an executable, very cool.

13:27.080 --> 13:32.600
The thing that we want to focus in on here is the compile step, or the AST stuff, and using

13:32.600 --> 13:34.600
toolexec.

13:34.600 --> 13:42.400
So specifically for Orchestrion, we use the toolexec flag, and this is also what the OTel

13:42.400 --> 13:44.960
compile time approach does.

13:44.960 --> 13:52.240
We use toolexec to get into the compile time steps, trace through the entire AST, and edit

13:52.240 --> 13:54.160
the nodes in the tree.

13:54.160 --> 13:59.480
This is called aspect oriented programming, and you basically give join points, which point

13:59.480 --> 14:05.760
to different nodes in the tree, and then an advice will tell you how to change the node.

14:05.760 --> 14:12.320
This is not just Datadog specific, we support the OpenTelemetry standard, and it supports

14:12.320 --> 14:21.320
a bunch of different packages and other dependencies that can happen automatically.

14:21.320 --> 14:26.080
However, if you are like us, and you don't want to use the Datadog libraries under

14:26.080 --> 14:31.480
the hood, you can do something else, which is to create a config file, and what we did

14:31.480 --> 14:37.720
for the purposes of this presentation is to edit Orchestrion to, instead of using our

14:37.720 --> 14:40.920
Datadog tracers, use OTel tracers.

14:40.920 --> 14:46.720
This is an extremely simplified config file, because I had very limited space on this

14:46.720 --> 14:51.920
slide, so if you try to copy this it's probably not going to work, but basically what

14:51.920 --> 14:57.880
it does is, it tells the Orchestrion library what join points to look at, so for example

14:57.880 --> 15:02.920
the main function of my main package, and then what advice to apply, which is to, at the beginning

15:02.920 --> 15:06.000
of the function, start up a tracer, and then start a span.

15:06.000 --> 15:10.960
So very simple stuff, and that means you're not limited to using Datadog if you don't

15:10.960 --> 15:13.800
want to.

15:13.800 --> 15:18.560
As I mentioned previously, this uses toolexec, but of course if you don't want to do all

15:18.560 --> 15:23.240
these fancy toolexec things, you can also just use the orchestrion command, and also

15:23.240 --> 15:27.640
you can use environment variables, if you so wish, to get things running.

15:27.640 --> 15:33.120
This will edit your AST while your code is compiling, so by the time it turns into an executable

15:33.120 --> 15:39.840
file, all the tracing is injected, and your source code is untouched.

15:39.840 --> 15:45.560
Okay, now that we've done all that, finally the meat of the presentation.

15:45.560 --> 15:52.680
Okay, so, we have a bunch of other things that we are working on

15:52.680 --> 15:56.320
to tackle the auto instrumentation problem, we're going to talk about them, but we really

15:56.320 --> 16:01.160
wanted to see what the effect of these things would be, versus, okay, I have

16:01.160 --> 16:07.920
the baseline application, then I have manual instrumentation, then I use Orchestrion and

16:07.920 --> 16:13.640
inject some instrumentation, and then, okay, I use OBI, like what are the trade-offs?

16:13.640 --> 16:21.320
For that we basically wrote several example applications, and set

16:21.320 --> 16:25.800
it up with all the open source tooling, everything is available in a repo, we will share

16:25.840 --> 16:32.600
the link at the end, we use a Docker-based observability stack, and we came up with some

16:32.600 --> 16:39.440
archetypes to actually generate representative workloads, because, okay,

16:39.440 --> 16:46.240
your app can be idle, it can be IO-bound, CPU-bound, or maybe it's just a mixed bag of that,

16:46.240 --> 16:49.640
and we would like to see all the differences and how it behaves.

16:49.640 --> 16:56.640
We also collected CPU and memory from the host itself, because when you run an eBPF agent

16:56.640 --> 17:01.640
alongside your process, you need to actually consider both of their resources, not just the

17:01.640 --> 17:08.840
process itself, so we tried to collect those things, and then, yeah, as I've mentioned before, these

17:08.840 --> 17:13.840
are just simple HTTP applications that have some traces.

17:13.840 --> 17:19.440
To actually focus on the effect of the approaches, these are not doing a lot of

17:19.440 --> 17:24.040
complicated stuff like making calls to other services, so that you don't introduce

17:24.040 --> 17:31.640
any other noise caused by external services, it's just pure span creation

17:31.640 --> 17:35.160
and trace creation in the end.

17:35.160 --> 17:42.280
On the methodology of the benchmarking and how to do this cleanly, we also

17:42.280 --> 17:48.880
talked a lot about this in the software performance room this year, so if you want

17:48.880 --> 17:53.200
to watch that talk as well, you would learn about the methodology and whatnot, but you

17:53.200 --> 17:57.840
can also check the repo, pull it, and test it yourself, right?

17:57.840 --> 18:03.240
For that we have scenarios that we tested, default baseline, no instrumentation, manual

18:03.240 --> 18:09.120
instrumentation, then OBI itself, the eBPF auto instrumentation, like these projects will

18:09.120 --> 18:14.720
eventually merge, but they are now separate projects, and we used Orchestrion, because it's the

18:14.720 --> 18:19.040
ready one, but eventually it will be what we are calling the OpenTelemetry

18:19.040 --> 18:26.000
compile time instrumentation, and again, we have a Go application container for each of these

18:26.000 --> 18:32.000
scenarios, we have an OTel Collector, you can check everything with Jaeger traces and

18:32.000 --> 18:37.360
Prometheus metrics, and we generate load with k6, but then we also have a simulation layer,

18:37.360 --> 18:43.360
so when you have a request, we simulate a CPU-bound operation, an IO-bound operation,

18:43.360 --> 18:50.720
whatnot, and we use identical bare metal hardware to actually run these things, and then

18:50.720 --> 18:58.120
we sustain the load for eight minutes, so how do they compare?

18:59.000 --> 19:03.000
All right, exciting.

19:03.000 --> 19:09.160
Okay, so just to show off some Datadog dashboards, when we make this public, we'll make

19:09.160 --> 19:13.680
it available using Grafana, it's just that we're more familiar with Datadog.

19:13.680 --> 19:18.280
We did a few measurements, the first one being about latency of the requests, the throughput

19:18.280 --> 19:22.760
of all of the requests coming in, I know this is really hard to see, so I'm just going to

19:22.760 --> 19:30.120
go through this quickly, we have a summary, some CPU and memory metrics, and some host

19:30.120 --> 19:38.200
metrics, again, for memory and CPU, so hopefully this is slightly easier to read if not a

19:38.200 --> 19:43.880
little bit less crowded, though it kind of still is, but we're comparing the baseline, which

19:43.880 --> 19:49.320
again, is the default instrumentation, or no instrumentation, rather, the manual instrumentation,

19:49.400 --> 19:55.640
which is just using the OTel SDK, eBPF, and the toolchain approach, which is still using

19:55.640 --> 20:02.040
OTel under the hood, so it shouldn't introduce too much noise. In the first column, you see, we

20:02.040 --> 20:12.760
have CPU, and as expected for the different approaches, well, actually not as expected, sorry, we updated

20:12.760 --> 20:21.160
these numbers pretty recently, strangely enough, where's my pointer? No pointer. The eBPF

20:21.160 --> 20:27.480
and the toolchain approach seem to be using less CPU, surprisingly, and I'll do a little bit

20:27.480 --> 20:31.400
more talking about this after I go through all of the columns, for memory, as expected, we're

20:31.400 --> 20:36.120
using a lot more memory, or a little bit more memory for each approach compared to the baseline.

20:36.120 --> 20:41.080
For latency, we're seeing that requests are going through quicker, and then as a result, the

20:41.160 --> 20:45.880
throughput is higher than the baseline, so maybe not what you expected to see.

20:47.320 --> 20:51.880
Evidently, I did not pay attention to Kemal and Augusta's talk about benchmarking,

20:53.400 --> 20:57.880
but if you were there at the talk, you know there were a bunch of tips that you could be using,

20:57.880 --> 21:05.320
and of course benchmarks are a little finicky, so this is just the latest results, but

21:05.400 --> 21:12.920
let's, in addition to benchmarking, talk about who the winner is.

21:14.520 --> 21:19.000
So, in addition to just pure numbers, we want to know how easy it is, and how

21:19.800 --> 21:26.200
quote-unquote good each approach is. So, for this, we're just going to be looking at the

21:26.920 --> 21:30.200
eBPF approach and the toolchain approach across four different aspects.

21:31.000 --> 21:35.480
The first one is performance, and as you saw from the previous slides, we're using more

21:35.480 --> 21:40.360
memory, so obviously there's going to be a little bit of overhead when you start instrumenting your

21:40.360 --> 21:46.440
code, so a little bit of a trade-off if you want to get data. For stability, as I mentioned earlier,

21:46.440 --> 21:53.560
eBPF requires you to use probes, which often require you to know the offsets of all of your functions

21:53.560 --> 21:58.280
and your variables in your code, which means that it's a little hard to use, and if you're

21:58.360 --> 22:04.200
rerunning or rebuilding your code, sometimes things can break. On the other hand, for toolchain approaches,

22:04.200 --> 22:10.120
all you really have to know is that your code is compiling, so it's a little bit more stable in

22:10.120 --> 22:17.000
terms of run-to-run. In terms of security, as I mentioned earlier, and I hope everyone

22:17.000 --> 22:22.360
remembers this, because I told you to remember this, the eBPF approach requires you to give

22:22.360 --> 22:30.360
administrative privileges to the library, which is a little scary. It's running as root,

22:30.360 --> 22:37.320
and it's not great if there are potentially dangerous people on your network. The toolchain

22:37.320 --> 22:42.440
approach doesn't do this, so it's a little safer. And then last of all, for portability or like

22:42.440 --> 22:49.000
ease of use, eBPF is great, because for things like OBI, you don't even have to restart your

22:49.080 --> 22:55.320
application, and you can just drop in the sidecar and then things start popping up. However,

22:55.320 --> 23:01.000
it does require you to have a Linux kernel, so if you're using a MacBook, you're kind of out of luck.

23:02.120 --> 23:07.080
But the tool chain approach, though, again, as long as your code is compiling, it works,

23:07.080 --> 23:12.280
but it does require you to restart your application. So they have their pros and cons.

23:13.000 --> 23:21.160
So, the winner. I'm sorry, there's no clear winner. I kind of baited everyone. It really depends.

23:22.280 --> 23:27.960
If you want to use eBPF, that's great if you want flexibility in terms of use, but for

23:29.000 --> 23:34.920
stability and security, protecting your code, compile time instrumentation might be the way to go for you.

23:34.920 --> 23:42.760
And a little bit more about eBPF hooks and probes and things like that. Donia and her

23:42.760 --> 23:49.640
coworker Chris had a fantastic talk yesterday about the gotchas of using hooks and probes.

23:50.360 --> 23:57.560
And our coworker also somehow had another talk yesterday about the performance impacts and

23:58.200 --> 24:03.720
other issues that we've had with eBPF. So if you're interested in learning a little bit more about this,

24:03.720 --> 24:06.120
you can go check out their slides and their recordings.

24:08.200 --> 24:13.960
Do you have time? Yeah. Yeah, we have enough time to talk about it. But again, as we

24:14.680 --> 24:22.360
talked about before, we are still thinking about this. And we are still trying to find ways to push the boundaries

24:22.360 --> 24:27.960
and really find a way to auto instrument the Go application, sorry. So one of the things

24:27.960 --> 24:34.360
that are available already in other languages or other systems is having USDT. This is

24:35.160 --> 24:40.360
statically defined tracing points for the user space. And some of the languages, some of

24:40.360 --> 24:46.280
the binary runtimes, they already have this. You can enable them. This is basically injecting

24:46.280 --> 24:52.120
a couple of empty bytes into each function prologue, and you can use these places to

24:52.120 --> 24:58.200
either inject a library or hook in with the uprobes system. Where it gets problematic is

24:58.200 --> 25:03.960
all the downsides that OBI needs to deal with right now, like calculating the

25:03.960 --> 25:11.560
offsets, where to find the binaries, which memory to read, and doing this for each and every Go version.

25:11.560 --> 25:15.800
And most of these things can easily break, because if you tamper with the Go stack when it's

25:15.800 --> 25:21.240
executing, the Go runtime would just panic and your whole app gets crashed.

25:21.240 --> 25:27.800
But if we can put USDTs into the Go application, actually we can make these things stable and we can

25:27.800 --> 25:33.960
get the best of both worlds. So how do we do this right now? There's a library called

25:33.960 --> 25:40.200
salp, and it's using libstapsdt, which is a native library, and it basically

25:40.200 --> 25:48.120
generates what's required and links it in at runtime. But again, then you have those probes,

25:48.200 --> 25:52.920
there's no runtime cost to them, there's no execution cost, and they are just dormant, and when you hook

25:52.920 --> 26:00.280
in some bpftrace program or any other eBPF program, then you can utilize those events to collect

26:00.280 --> 26:08.040
data. I'm going to go a bit quicker, we are running out of time. And the same strategies

26:08.040 --> 26:14.040
are enabled for the other runtimes as well. But again, this isn't working for the

26:14.200 --> 26:20.680
latest Go versions. salp is out of date, and this is basically also dynamic runtime

26:20.680 --> 26:28.440
library loading, whatnot, it's not exactly secure. The other thing that you can also do is,

26:28.440 --> 26:33.720
again, the dark magic of injecting a third-party native library to hook into the actual calls and

26:33.720 --> 26:41.800
generate these spans. There is a framework for that called Frida, and I also experimented with that.

26:41.800 --> 26:48.200
We also have examples with the libstapsdt library, but since they're so unstable, we didn't

26:48.200 --> 26:54.120
include them in the benchmarks. But there is a way that we can make this work by supporting

26:54.120 --> 27:03.400
these libraries and whatnot. But actually, this is another approach to do this as well.

27:03.400 --> 27:08.680
They are building their own hooks, they are building their own framework based on cgo,

27:08.680 --> 27:16.440
extending the runtime, but it's basically an injection framework. But there's also another idea

27:16.440 --> 27:22.280
that I want to pursue and I've been working on this for a while. So come like one of the things

27:22.280 --> 27:28.200
that the Go runtime recently added is the flight recorder. And the flight recorder is for

27:28.200 --> 27:34.360
getting traces out of the scheduler, the GC, whatnot, it's about the internals of the Go runtime itself.

27:34.360 --> 27:41.560
But I thought, okay, if we can extend it and add some more tracing points in the end,

27:41.560 --> 27:46.760
and then make it aggregate the data and stream it out of the system, then we can actually

27:46.760 --> 27:54.840
use this for our purposes as well. And for that I'm working on a POC to see if it actually

27:54.840 --> 28:00.600
can work, and I got some results, but it needs a lot of performance improvements, that's why we

28:00.600 --> 28:06.840
haven't included it in the benchmarks. We also don't know if the Go runtime team

28:06.840 --> 28:13.400
is going to agree with us. So there's a long way ahead for this proof of concept. And the second one,

28:13.400 --> 28:19.720
another POC that we are working on, is injecting USDT probes directly in the Go toolchain.

28:19.720 --> 28:25.400
This is achievable. USDTs work like this: you just need to add another ELF section to your binary

28:25.400 --> 28:31.000
and then they are discoverable and they are stable. And if we actually implement this

28:31.000 --> 28:36.200
in the Go compiler and the toolchain, we can have these probes, and we can add these probes

28:36.200 --> 28:42.200
to the standard library, and there is no runtime or execution overhead to this, and we can

28:42.200 --> 28:47.000
just enable them for Linux systems, and then an eBPF system or an injection system can

28:47.000 --> 28:53.880
hook into these tracing points that are defined. I came up with some tooling, I actually implemented

28:53.880 --> 29:00.760
this. I have a proof of concept PR and whatnot, to discover these points and maybe generate

29:01.640 --> 29:07.640
bpftrace programs to demo these things and whatnot, and the API roughly looks like this, right:

29:07.640 --> 29:13.400
USDT, add a probe, and then collect the data. Maybe capture some arguments, whatnot. I'm still

29:13.400 --> 29:18.920
working on it. It's not stable. That's why it didn't make the cut for the benchmarks, but I will

29:18.920 --> 29:26.120
keep updating the repo that we're going to share. And yeah if you are interested you can check it

29:26.120 --> 29:31.480
out and see how it goes. So, instrumentation is helpful, observability is helpful.

29:32.120 --> 29:38.920
Auto instrumentation is possible with a lot of trade-offs. You need to pick your battles. You can use

29:38.920 --> 29:45.160
the eBPF-based approach, but it's brittle. You can use the compile time approach, but then you need to change

29:45.240 --> 29:51.240
how you build your application. Even if it's minor, it's a change. You can still contribute to all of these.

29:51.240 --> 29:56.200
The open source SIGs are working. There's an OBI SIG in OpenTelemetry. There's a compile

29:56.200 --> 30:01.720
time SIG that we are working on. There is the auto injector framework. You can extend the auto

30:01.720 --> 30:07.080
injector to include Go, and you can always contribute to Go itself. If you are interested in any

30:07.080 --> 30:13.480
of these subjects, discover and join these SIGs, say hi, and let's start working on that.

30:13.480 --> 30:22.040
With that, thanks for listening. Thank you. Thank you so much.

