WEBVTT

00:00.000 --> 00:12.840
So, now Go for us. We are going to do it using Go, and we have some very nice performance

00:12.840 --> 00:18.640
things to do. Let me introduce Alex. Alex is actually also managing a devroom today,

00:18.640 --> 00:26.120
and he just left his completely busy room to talk about how to use PGO in Go, so an extra

00:26.120 --> 00:37.760
round of applause for him. Thank you. So I guess we can start. Before the talk, I wanted to

00:37.760 --> 00:44.440
ask how many of you actually know what profile-guided optimization is. Great, wonderful.

00:44.440 --> 00:51.040
I'd like to skip through some slides. So, a few words about me: I used to be a C++ engineer;

00:51.040 --> 00:58.880
right now I'm a Rust engineer. Keep in mind, I'm not a Go engineer at all. I wrote several

00:58.880 --> 01:06.160
lines of Go, and I hated that. But I think my point of view will be interesting, because

01:06.160 --> 01:14.720
PGO in Go is kind of special compared to, let's say, the more mature PGO from the C++ and

01:14.720 --> 01:20.640
Rust worlds. It's kind of special. Regarding my interests, I'm interested in compiler

01:20.640 --> 01:27.840
stuff: I was hacking on LLVM, and I prefer LLVM stuff to GCC. I'm also the author of a PGO

01:27.840 --> 01:32.960
resource that's trying to collect as much information as possible about PGO,

01:32.960 --> 01:41.920
mostly from the C++ and Rust worlds, but it covers the Go ecosystem too. And I'm

01:41.920 --> 01:47.520
a Software Performance devroom organizer this year. So let's start. A bit of theory about PGO,

01:47.520 --> 01:54.080
but we can skip it. Great. So, we have the ahead-of-time compilation model. It's pretty simple:

01:54.080 --> 02:02.080
our source is compiled to a binary, and it's executed on the target machine, the regular model. And unfortunately for us,

02:02.080 --> 02:09.680
we have the just-in-time compilation model too. That's the Java, V8 stuff. They have

02:09.760 --> 02:15.120
a possibility to not simply compile to some kind of bytecode, but also to execute that

02:15.120 --> 02:25.360
bytecode on the target machine. So just-in-time has a really huge advantage here, because they can collect

02:25.360 --> 02:32.640
runtime statistics on the target machine: how our code is executed, which parts, which functions

02:32.720 --> 02:38.080
are called frequently, et cetera. And unfortunately, in the ahead-of-time world, we don't have such a privilege.

02:39.840 --> 02:43.920
We need this information, because we have a lot of optimizations which

02:43.920 --> 02:50.960
can be improved by providing runtime statistics to them. The most important one is inlining;

02:50.960 --> 02:57.920
it's important in every compiler, including the Go one. And the solution is profile-guided optimization:

02:58.480 --> 03:02.720
we just collect runtime statistics, the so-called PGO profile,

03:02.720 --> 03:06.800
pass the profile to the compiler, use the profile during the compilation phase.

03:07.440 --> 03:13.760
And that's it, our binary is faster. With an asterisk, because it's not actually true for all cases.

03:13.760 --> 03:20.400
First, we need to understand that we can optimize with PGO, for now, only CPU-intensive stuff

03:20.400 --> 03:28.160
and stuff that is not already optimized. There are some developments,

03:29.680 --> 03:36.560
really fresh developments, less than a week ago, about applying PGO to GPU stuff, device GPU

03:36.560 --> 03:47.440
stuff, but it's highly unstable and available only in LLVM. And okay, our binary will be pretty fast.

03:47.520 --> 03:54.560
So, if you want to know more about PGO internals in the Go compiler, in the main Go compiler,

03:54.560 --> 04:01.280
I highly recommend the talk from GopherCon UK. And I highly recommend asking questions of

04:01.280 --> 04:09.520
this guy on GitHub, in the Go repository, because he was actually the only person from the

04:09.600 --> 04:16.800
Go compiler team who answered my PGO-related questions well, from my point of view.

04:18.320 --> 04:29.920
So, why should we care about PGO? Most of these benchmarks are not Go, unfortunately,

04:29.920 --> 04:36.720
unfortunately for you. But the reason is that it's simply much easier to find

04:36.720 --> 04:46.000
ready benchmarks for the more mature PGO systems. Let's see. And most of these benchmarks, actually

04:46.000 --> 04:54.080
all of them, I did myself, so I'm pretty sure of them. You can expect quite similar results from

04:54.960 --> 05:01.760
applying PGO in the Go ecosystem, but keep in mind one thing: the Go compiler implements

05:02.720 --> 05:09.600
much less aggressive optimizations compared to C, C++, or Rust, to GCC or the LLVM

05:10.400 --> 05:16.240
ecosystem, because if you've seen at least once the number of optimization passes in LLVM,

05:16.240 --> 05:21.040
you understand. There are a lot, really a lot, of optimizations inside LLVM.

05:21.920 --> 05:28.320
So, let's talk about PGO in Go. It was first implemented, as a preview, in 1.20;

05:28.320 --> 05:36.240
since 1.21 it's generally available. Much, much later compared to the C and C++ world.

05:37.600 --> 05:44.560
It was implemented by Google, and Google is also the author of sampling PGO, which is

05:45.760 --> 05:54.000
frequently called AutoFDO, or simply sampling PGO. It's just sampling PGO, nothing special, that's it.

05:54.960 --> 06:00.640
Google implemented this kind of PGO for exactly one reason: they wanted to collect

06:00.640 --> 06:07.280
PGO profiles directly from the production environment. The regular one, the default PGO mode,

06:07.280 --> 06:17.760
instrumentation mode, gives you the ability to collect more precise PGO profiles,

06:17.760 --> 06:21.520
but at the cost of much, much higher instrumentation overhead at runtime.

06:22.480 --> 06:29.280
If you hear that you cannot run instrumented binaries in production, that's not completely

06:29.280 --> 06:35.280
true, because the actual answer is: it depends, it depends on your requirements, etc. But the

06:35.280 --> 06:41.040
Go engineers decided you don't need it, and sampling PGO will be enough.

06:42.800 --> 06:51.120
And the consequence of that is that the Go compiler misses a lot of interesting PGO modes,

06:51.200 --> 06:58.560
instrumentation kinds. For example, there are a lot of instrumentation flavors:

06:58.560 --> 07:06.000
before translating to IR, after the inlining phase, combining sampling and instrumentation at once,

07:06.000 --> 07:14.080
trying to fill gaps in the PGO profile by passing it to some machine learning models.

07:14.800 --> 07:22.720
And one of the fanciest PGO kinds is temporal PGO, which optimizes not the runtime

07:22.720 --> 07:28.560
speed of your application, but the start-up time, the cold start. It was implemented by Meta for

07:28.560 --> 07:39.040
mobile devices. It's not available for Go. So unfortunately, only the main Go compiler supports PGO;

07:39.600 --> 07:46.480
LLGo, the LLVM-based compiler for Go, doesn't support PGO at all, and

07:46.480 --> 07:53.440
there isn't even a feature request for it; I forgot to report it to the upstream. And if you've heard about

07:54.320 --> 08:00.960
TinyGo, the subset of Go for embedded: spoiler, TinyGo also doesn't support PGO.

08:00.960 --> 08:07.600
I left a feature request, I guess, one year ago in their upstream, and the developers said: yeah,

08:07.600 --> 08:16.560
great idea, give it a try. Okay, that's it; I don't care that much, sorry. And another interesting

08:16.560 --> 08:25.280
detail is that the Go compiler reuses the pprof ecosystem for gathering PGO profiles. It's a huge

08:25.280 --> 08:33.920
difference compared to the C++ world, because there they implemented custom profile formats that are not

08:33.920 --> 08:39.440
compatible between compilers. You need special tooling, which is called AutoFDO, from Google;

08:39.440 --> 08:47.280
this tooling is not so good: it's not so easy to build on your machine, it requires a specific

08:47.280 --> 08:53.760
version of LLVM, you cannot build it with the latest LLVM version, and a lot of other stuff. So,

08:54.720 --> 08:58.240
from this point of view, PGO in Go is much better.

08:58.320 --> 09:06.000
Actually, you can try: if you are not happy with the pprof format for some reason, for example,

09:06.000 --> 09:13.040
you have your own proprietary or open-source profiling system which was developed

09:13.040 --> 09:19.200
before pprof, you can try to convert your profiles into a pprof-compatible format,

09:19.200 --> 09:28.240
but there are nuances. Also, the official documentation, by the way, the Go documentation about PGO,

09:28.240 --> 09:34.400
is, I would say, one of the most user-friendly, not only in the Go ecosystem, but actually

09:34.400 --> 09:42.320
in the whole PGO world, because in the C++ world the PGO documentation is terrible.

09:42.400 --> 09:52.160
It was terrible; right now it's simply bad. The Go developers really tried, and they did a good

09:52.160 --> 09:59.680
job, trying not only to describe what PGO is, which is what we usually get, but how to use it

09:59.760 --> 10:08.160
in practice. A huge kudos to them, really. And the second interesting detail

10:09.040 --> 10:14.960
is that Go designed PGO mainly for service-like workloads. For example, you have a binary

10:14.960 --> 10:21.680
with your HTTP handlers, they are running continuously, and you collect profiles for PGO

10:22.080 --> 10:30.080
from time to time, with some frequency. In the C++ world, instrumentation was the first PGO mode,

10:30.640 --> 10:37.680
and it didn't have such limitations: usually you get a PGO profile at the exit of your program.

10:38.240 --> 10:45.440
You can highly customize this behavior, but it's pretty undocumented. You need to read a lot of

10:45.520 --> 10:53.440
LLVM sources, et cetera. Compared to that, once again, in Go you can just

10:55.360 --> 11:03.840
add the net/http/pprof package, and you will get profiles provided over HTTP. Or simply

11:03.840 --> 11:12.160
write it manually: gather and dump it somewhere to a file. So, let's talk about interesting

11:12.240 --> 11:19.200
possible PGO issues that you can meet on your PGO journey in Go. So, at first:

11:19.200 --> 11:26.960
mismatched and outdated PGO profiles: the official documentation says that these cases

11:26.960 --> 11:36.080
are handled very carefully. So, if your profile is kind of outdated, because

11:36.480 --> 11:43.360
new code was added, so it will not be covered by the existing PGO profile, or some code

11:43.360 --> 11:48.080
that is already covered by the PGO profile has changed, then your profile goes stale.

11:48.880 --> 12:04.000
Okay, backup, yep, backup is fine.

12:04.000 --> 12:20.160
Yep, yeah, a couple of lost engineers here. So, the Go implementation of PGO is pretty stable

12:21.120 --> 12:28.320
and resilient to these changes. However, I still recommend, to avoid problems with

12:28.400 --> 12:36.960
stale profiles: just try to regenerate them as frequently as possible, and that's it. You will

12:36.960 --> 12:46.080
eliminate a really big number of potential problems with possible performance regressions,

12:46.080 --> 12:53.200
et cetera. Please don't even try to play with these things, because it's really hard to debug them,

12:53.840 --> 13:03.680
I would say so. And another interesting thing: the C++ ecosystem gives you an ability.

13:04.480 --> 13:10.800
When you provide a really outdated profile to the compiler, you will get, at least from the LLVM

13:10.800 --> 13:15.280
ecosystem, a lot of warnings about mismatched profiles, something like that.

13:15.280 --> 13:20.080
Unfortunately, such functionality is not available in the Go compiler yet; there is a request

13:21.040 --> 13:27.840
on the screen, but it's not implemented and there is no activity. So, be careful.

13:29.680 --> 13:36.320
And regarding performance regressions, there is an interesting note in the

13:36.320 --> 13:41.120
documentation that you should not expect performance regressions from a wrong PGO profile.

13:42.720 --> 13:49.680
I was kind of curious about such a bold statement, I would say. And of course, it was a lie.

13:50.560 --> 13:57.920
Because there is an issue in the upstream, where a person provided a PGO profile for their workload,

13:58.560 --> 14:05.920
and the resulting binary was performing worse than before PGO.

14:06.960 --> 14:13.600
And I would say it's kind of a problem for the whole PGO ecosystem, not only for Go, for one

14:14.400 --> 14:19.440
really annoying reason. If you hit such a thing, what do you do?

14:20.560 --> 14:27.920
You can try to regenerate the PGO profile; okay, let's say you did try it and it didn't help.

14:27.920 --> 14:35.840
What else? You, as a good engineer, go to the upstream and report it, and they will try to fix it.

14:36.000 --> 14:43.280
So they need a reproduction, they need your code, a minimal reproducible example; they will need,

14:44.720 --> 14:51.520
probably, your profile, or better, your benchmarking and profile-gathering scenario. You will need to provide it;

14:52.400 --> 15:00.240
once again, it's a very difficult job. And even with all of this information, highly likely

15:00.400 --> 15:08.560
your issue will not be resolved. Because they already have a lot of work implementing

15:08.560 --> 15:15.920
the Go compiler, and debugging performance regressions from PGO in your specific case

15:16.320 --> 15:24.800
is not a very fun job to do. We have the same problem in LLVM and GCC; I reported such bugs in

15:24.800 --> 15:32.480
these ecosystems, and I got exactly the same answer: silence. So that's it.

15:34.880 --> 15:42.400
And debugging it yourself is a really challenging thing. And I wanted to highlight one thing regarding

15:42.400 --> 15:51.680
one-shot utilities like CLIs, for example a log generator, etc. So, remember the service-

15:51.680 --> 15:59.920
like orientation of PGO. So if you have a one-shot utility, for example: you start it,

15:59.920 --> 16:05.280
it's running, for example, for 10 seconds, and it's finished. How do you collect PGO profiles

16:06.000 --> 16:14.560
from this workload? Your utility doesn't have an HTTP handler, it's not a service, it's like

16:14.560 --> 16:24.080
a one-time utility. I proposed an idea about some kind of automation

16:24.080 --> 16:30.000
regarding dumping a PGO profile, the counters, internal counts, etc., at the end of the program,

16:30.000 --> 16:38.800
the way it's done in the C++ ecosystem. This idea was rejected, and the main motivation

16:38.880 --> 16:43.680
(all the links are public, of course, on GitHub), the main motivation is, as far as I understand,

16:43.680 --> 16:50.240
that they don't actually care as much about such workloads. I cannot blame them, because

16:50.240 --> 16:59.920
a lot of us, a lot of you, are writing Go services, and not so much Go CLI-like utilities. But anyway,

17:00.480 --> 17:07.840
if you want to dump PGO profiles, you will need to implement it by yourself. For example,

17:08.480 --> 17:16.400
once again an example from the documentation: you will just need to use the pprof ecosystem

17:16.400 --> 17:24.720
and write some code manually. Compare it to the Rust ecosystem: for example, in Rust,

17:24.720 --> 17:28.560
there is a really great,

18:25.120 --> 18:34.400
[unintelligible]

18:38.720 --> 18:51.840
To begin with, this recommendation is mandatory; so, for example, check the right

18:51.840 --> 18:55.840
environment variables and possible compiler switches.

18:58.840 --> 19:02.840
Unfortunately, this use case is ugly in every ecosystem,

19:02.840 --> 19:05.840
because it's a multi-language build.

19:05.840 --> 19:08.840
In a multi-language build, you will usually have a build system

19:08.840 --> 19:10.840
which is oriented toward only one language:

19:10.840 --> 19:16.840
Cargo in Rust; whatever in C and C++,

19:16.840 --> 19:18.840
they have a bunch of them.

19:19.840 --> 19:21.840
So whatever, it's always ugly.

19:21.840 --> 19:26.840
Unfortunately, no build system can really automate

19:26.840 --> 19:30.840
this use case well, for, actually,

19:30.840 --> 19:33.840
a lot of interesting reasons.

19:33.840 --> 19:36.840
For example, how to implement it manually:

19:36.840 --> 19:41.840
you will need to manually pass to the C compiler,

19:41.840 --> 19:43.840
for your C dependency, for example,

19:43.840 --> 19:46.840
all the corresponding PGO flags.

19:47.840 --> 19:49.840
And here you go.

19:49.840 --> 19:51.840
Different PGO modes:

19:51.840 --> 19:55.840
you will need to read all the documentation

19:55.840 --> 19:59.840
regarding PGO for your GCC or Clang compiler.

19:59.840 --> 20:02.840
And remember, this documentation is not that good,

20:02.840 --> 20:05.840
especially the GCC part.

20:05.840 --> 20:08.840
You will need to choose the right PGO mode,

20:08.840 --> 20:12.840
because there are a lot of them, especially in LLVM.

20:12.840 --> 20:15.840
And you can have really interesting combinations.

20:15.840 --> 20:18.840
For example, sampling PGO for the Go part

20:18.840 --> 20:21.840
and instrumentation PGO for the C part.

20:21.840 --> 20:25.840
And of course, PGO profiles are not compatible at all

20:25.840 --> 20:27.840
between compilers.

20:27.840 --> 20:30.840
GCC and LLVM use their own formats,

20:30.840 --> 20:32.840
and pprof is not compatible at all.

20:32.840 --> 20:37.840
You will need to get this file from the C compiler,

20:37.840 --> 20:40.840
prepare it in the right way.

20:40.840 --> 20:43.840
You will need to learn the right tools,

20:43.840 --> 20:47.840
especially for GCC; they are not trivial.

20:47.840 --> 20:50.840
And pass it in the right way to the compiler, once again,

20:50.840 --> 20:51.840
to the C compiler.

20:51.840 --> 20:54.840
Of course, all of these things should be done via

20:54.840 --> 20:56.840
CGo environment variables.

20:56.840 --> 20:59.840
I wouldn't say it's impossible to do,

20:59.840 --> 21:01.840
but it's tedious to do.

21:01.840 --> 21:04.840
It's manual labor.

21:04.840 --> 21:07.840
And believe me, in a lot of places

21:07.840 --> 21:09.840
it could be a problem.

21:09.840 --> 21:12.840
And if you want to do it in a reproducible way,

21:12.840 --> 21:15.840
like writing a script, believe me,

21:15.840 --> 21:17.840
such a script will be really ugly.

21:17.840 --> 21:19.840
For example, if you're trying to package

21:19.840 --> 21:22.840
writing a recipe for a Go application

21:22.840 --> 21:23.840
with a native dependency,

21:23.840 --> 21:26.840
and you want to write a routine which

21:26.840 --> 21:29.840
optimizes the whole application with PGO

21:29.840 --> 21:33.840
during the build process of the package from this recipe.

21:33.840 --> 21:36.840
It's, for example, a regular use case for PGO-

21:36.840 --> 21:38.840
optimized applications; for example,

21:38.840 --> 21:42.840
for rustc or for Clang, look at how they are packaged

21:42.840 --> 21:44.840
in distributions, especially

21:44.840 --> 21:46.840
performance-oriented distributions.

21:46.840 --> 21:49.840
Regarding PGO for Go with cgo,

21:49.840 --> 21:53.840
I actually tried to find information.

21:53.840 --> 21:57.840
And I found only one topic on the Rust forum,

21:57.840 --> 22:01.840
where one person actually asked about exactly this use case.

22:01.840 --> 22:05.840
And as you see, not so many people were interested

22:05.840 --> 22:07.840
in the topic, so it was automatically closed;

22:07.840 --> 22:09.840
nobody was interested.

22:09.840 --> 22:11.840
So that's it, that's all the information.

22:11.840 --> 22:16.840
So you will meet a lot of interesting

22:16.840 --> 22:19.840
traps on your journey.

22:19.840 --> 22:22.840
So, reproducibility and PGO,

22:22.840 --> 22:24.840
that's really a huge topic.

22:24.840 --> 22:26.840
For example, when I was talking

22:26.840 --> 22:30.840
to any person like a maintainer from a distribution,

22:30.840 --> 22:32.840
it's their first question.

22:32.840 --> 22:36.840
how do you make builds reproducible when PGO is enabled?

22:36.840 --> 22:40.840
So here there are actually two dedicated cases.

22:40.840 --> 22:44.840
The first one: reproducibility of the saved PGO profile.

22:44.840 --> 22:47.840
You dump a profile and commit it to a VCS.

22:47.840 --> 22:50.840
For example, Go, once again, did a great job.

22:50.840 --> 22:53.840
There is the standard default.pgo.

22:53.840 --> 22:56.840
We don't have such an option in C++,

22:56.840 --> 22:58.840
and we really suffer a bit from that.

22:58.840 --> 23:01.840
And the second: reproducibility of PGO profile generation.

23:01.840 --> 23:04.840
So, the first case: just save the profile,

23:04.840 --> 23:07.840
but please keep in mind, it can be stale.

23:07.840 --> 23:11.840
We need to regenerate it, with some frequency, et cetera, et cetera,

23:11.840 --> 23:13.840
like with outdated profiles.

23:13.840 --> 23:17.840
The second case regarding reproducible pj of profile generation

23:17.840 --> 23:19.840
is impossible to achieve.

23:19.840 --> 23:22.840
Don't even try, because it's required

23:22.840 --> 23:25.840
deterministic execution of your application.

23:25.840 --> 23:30.840
And it's, so if you're talking about huge applications,

23:31.840 --> 23:33.840
it's not a viable option to achieve.

23:33.840 --> 23:36.840
So just don't waste your time.

23:36.840 --> 23:37.840
On it.

23:37.840 --> 23:41.840
Regarding saved pj of profile, there is another thing.

23:41.840 --> 23:45.840
I met, for example, when you find an open source project,

23:45.840 --> 23:48.840
for example, this project is file d, file dot d,

23:48.840 --> 23:53.840
that's like a lock processor written in go or buy a zone.

23:53.840 --> 23:58.840
And I found this project, and I found default pj of,

23:59.840 --> 24:02.840
that means that project is optimized by pj of,

24:02.840 --> 24:04.840
and I asked this question.

24:04.840 --> 24:08.840
How did you collect this pj of profile?

24:08.840 --> 24:10.840
The answer was science.

24:10.840 --> 24:15.840
So when you try to build this application,

24:15.840 --> 24:18.840
this profile will be automatically applied

24:18.840 --> 24:22.840
by the go compiler, because it's default dot pj of,

24:22.840 --> 24:25.840
but you don't know the scenario

24:25.840 --> 24:27.840
where this profile was gathered.

24:27.840 --> 24:31.840
There is no information to get it,

24:31.840 --> 24:33.840
except for asking maintainers.

24:33.840 --> 24:36.840
Actually, I found this guy on telegram chat

24:36.840 --> 24:38.840
in some local group, and asked,

24:38.840 --> 24:40.840
if I should remember, asking directly,

24:40.840 --> 24:43.840
and he said about this scenario, et cetera, et cetera,

24:43.840 --> 24:47.840
but no public available information on the internet yet.

24:47.840 --> 24:50.840
Keep it in mind when you see any open source project

24:50.840 --> 24:54.840
with provided pj of profile.

24:54.840 --> 24:58.840
So it's scale, a bunch of additional issues.

24:58.840 --> 25:02.840
You need to collect profiles from hundreds of services

25:02.840 --> 25:05.840
from thousands of machines.

25:05.840 --> 25:07.840
I need to implement proper gathering,

25:07.840 --> 25:10.840
symbolizing store, cleaning routines,

25:10.840 --> 25:12.840
for all of this pj of profiles,

25:12.840 --> 25:14.840
because you will have thousands of them.

25:14.840 --> 25:17.840
If you are talking, especially about large fleets,

25:17.840 --> 25:20.840
like for large big tech stuff.

25:20.840 --> 25:23.840
And of course, you need to make this professor,

25:23.840 --> 25:27.840
you're a robot, a robot, et cetera, et cetera.

25:27.840 --> 25:29.840
Tracking for all of his profiles,

25:29.840 --> 25:32.840
skew, and how they are outdated,

25:32.840 --> 25:34.840
for sure services, raising arrivals,

25:34.840 --> 25:36.840
but we're talking about actually

25:36.840 --> 25:38.840
a pretty strong enterprise here, et cetera.

25:38.840 --> 25:40.840
You don't want to implement all of this thing

25:40.840 --> 25:41.840
from this scratch.

25:41.840 --> 25:45.840
Luckily, we have several solutions.

25:45.840 --> 25:48.840
Let's not our way using pj of manually

25:48.840 --> 25:49.840
at such a scale.

25:49.840 --> 25:51.840
It's not a viable option, believe me.

25:51.840 --> 25:53.840
There is open source solution.

25:53.840 --> 25:55.840
Go or again, that one is parka, parka.dev.

25:55.840 --> 25:57.840
It's open source one.

25:57.840 --> 26:00.840
And another one is solution from Yandex.

26:00.840 --> 26:04.840
Yandex perforated was an open source

26:04.840 --> 26:07.840
one exactly one year ago,

26:07.840 --> 26:10.840
one day before the previous was them.

26:10.840 --> 26:13.840
And there are actually possibilities

26:13.840 --> 26:16.840
to extend our open source solutions,

26:16.840 --> 26:18.840
like Grafana Pyros, copetanlas,

26:18.840 --> 26:19.840
or Fireing Platform.

26:19.840 --> 26:22.840
I opened request to them,

26:22.840 --> 26:24.840
regarding extending their functionality

26:24.840 --> 26:29.840
for doing pjore silenced in both cases.

26:29.840 --> 26:33.840
So, of course, if you're a large enough like Google,

26:33.840 --> 26:35.840
or whatever big tech you can write,

26:35.840 --> 26:37.840
you own profiler, and that's a viable option.

26:37.840 --> 26:40.840
For example, Google, Google White Profiler,

26:40.840 --> 26:42.840
a zone vision at a zone,

26:42.840 --> 26:44.840
and many, many other systems at profiler

26:44.840 --> 26:46.840
is very, very low data,

26:46.840 --> 26:50.840
also has a linear one, probably ready to do it.

26:50.840 --> 26:55.840
So, however, if you decide to use

26:55.840 --> 26:58.840
these solutions: these solutions are kind of complicated

26:58.840 --> 27:00.840
to administer, because it's large-scale

27:00.840 --> 27:04.840
stuff. You need to deploy several parts,

27:04.840 --> 27:06.840
you need to,

27:06.840 --> 27:08.840
administer different databases,

27:08.840 --> 27:10.840
you need to monitor them, etc.;

27:10.840 --> 27:12.840
for example, object storage,

27:12.840 --> 27:14.840
a free open-source one probably,

27:14.840 --> 27:16.840
or whatever you prefer; it's not a

27:16.840 --> 27:18.840
fun job to admin it.

27:18.840 --> 27:20.840
That's actually the same picture

27:20.840 --> 27:22.840
for Yandex Perforator:

27:22.840 --> 27:24.840
once again, Postgres, ClickHouse,

27:24.840 --> 27:26.840
an S3-compatible object store.

27:26.840 --> 27:28.840
Okay, that's not that fun,

27:28.840 --> 27:30.840
but you need to do it.

27:30.840 --> 27:32.840
If you want to use it.

27:32.840 --> 27:33.840
Let's quickly compare,

27:33.840 --> 27:35.840
because I don't have much time,

27:35.840 --> 27:36.840
four minutes.

27:36.840 --> 27:38.840
So, both projects are alive, great.

27:38.840 --> 27:40.840
Unfortunately,

27:40.840 --> 27:42.840
Perforator is written in C++,

27:42.840 --> 27:44.840
because Yandex uses mainly C++;

27:44.840 --> 27:46.840
Parca is written in Go.

27:46.840 --> 27:48.840
So, if you decide to extend,

27:48.840 --> 27:50.840
Parca is your way to go,

27:50.840 --> 27:52.840
if you don't want to type C++.

27:52.840 --> 27:54.840
Licenses are both fine,

27:54.840 --> 27:56.840
Apache License 2.0.

27:56.840 --> 27:58.840
The documentation, actually,

27:58.840 --> 28:00.840
I would say, both could be improved,

28:00.840 --> 28:06.840
but Parca doesn't cover the sPGO part

28:06.840 --> 28:08.840
of the functionality at all,

28:08.840 --> 28:11.840
only one commit in the git commit history,

28:11.840 --> 28:12.840
where they said,

28:12.840 --> 28:14.840
we added an end-to-end test

28:14.840 --> 28:17.840
to cover the sPGO functionality.

28:17.840 --> 28:18.840
Let's see,

28:18.840 --> 28:21.840
Yandex covers sampling PGO

28:21.840 --> 28:23.840
very well, because they use it.

28:23.840 --> 28:25.840
Paid support,

28:25.840 --> 28:26.840
if you're interested:

28:26.840 --> 28:27.840
Perforator, as far as I know,

28:27.840 --> 28:28.840
they are not interested,

28:28.840 --> 28:29.840
I'm pretty sure;

28:29.840 --> 28:31.840
Parca, yes, they provide it.

28:31.840 --> 28:33.840
PGO support for Go,

28:33.840 --> 28:34.840
it's an important part.

28:34.840 --> 28:36.840
Parca supports it;

28:37.840 --> 28:40.840
it's written in the documentation.

28:40.840 --> 28:42.840
Perforator supports it,

28:42.840 --> 28:43.840
but in theory,

28:43.840 --> 28:46.840
because they didn't try to use it in production yet,

28:46.840 --> 28:49.840
because most of their services are written in C++,

28:49.840 --> 28:52.840
and the C++ is extensively optimized by PGO

28:52.840 --> 28:54.840
from Perforator; Go,

28:54.840 --> 28:56.840
not yet, but they want to achieve it.

28:56.840 --> 28:58.840
PGO support for other languages:

28:58.840 --> 29:00.840
Parca, no;

29:00.840 --> 29:02.840
there is an issue in the upstream,

29:02.840 --> 29:04.840
no activity yet,

29:04.840 --> 29:06.840
I would say no, they don't plan to implement it.

29:06.840 --> 29:08.840
Perforator, yes,

29:08.840 --> 29:10.840
and they are interested in it a lot,

29:10.840 --> 29:12.840
especially for native languages,

29:12.840 --> 29:13.840
because once again,

29:13.840 --> 29:16.840
most of their services are written in C++,

29:16.840 --> 29:18.840
and they will support

29:18.840 --> 29:20.840
it definitely in a good way.

29:20.840 --> 29:22.840
So, the last words,

29:22.840 --> 29:24.840
one minute, about PLO,

29:24.840 --> 29:25.840
what is it?

29:25.840 --> 29:27.840
Perforator supports it,

29:27.840 --> 29:28.840
but with nuances,

29:28.840 --> 29:30.840
and Parca doesn't.

29:30.840 --> 29:33.840
PLO (post-link optimization) is actually PGO on steroids.

29:33.840 --> 29:35.840
It optimizes the binary

29:35.840 --> 29:36.840
even after PGO;

29:36.840 --> 29:37.840
the most important optimization

29:37.840 --> 29:40.840
is reordering functions inside the binary

29:40.840 --> 29:45.840
to make your CPU instruction cache happier,

29:45.840 --> 29:48.840
because there will be fewer

29:48.840 --> 29:51.840
instruction cache misses during execution.

29:51.840 --> 29:53.840
Available open-source tools at the moment:

29:53.840 --> 29:54.840
LLVM BOLT is the main one,

29:54.840 --> 29:56.840
Google Propeller,

29:56.840 --> 29:57.840
is the second one,

29:57.840 --> 29:58.840
and the Intel Thin Layout

29:58.840 --> 29:59.840
Optimizer,

29:59.840 --> 30:01.840
archived, so rest in peace.

30:02.840 --> 30:04.840
So, the performance impact

30:04.840 --> 30:09.840
from using PLO for Go

30:09.840 --> 30:10.840
is really huge.

30:10.840 --> 30:11.840
You can check,

30:11.840 --> 30:14.840
Huawei implemented this functionality,

30:14.840 --> 30:17.840
but unfortunately, I have bad news for you.

30:17.840 --> 30:19.840
The upstream

30:19.840 --> 30:23.840
didn't agree to accept

30:23.840 --> 30:25.840
this change into the Go linker,

30:25.840 --> 30:27.840
because it requires some additional

30:27.840 --> 30:29.840
modifications and changes

30:29.840 --> 30:30.840
in the Go linker,

30:30.840 --> 30:33.840
and the Huawei guys just decided

30:33.840 --> 30:35.840
they are not motivated enough

30:35.840 --> 30:36.840
to push it into the upstream,

30:36.840 --> 30:37.840
and that's it.

30:37.840 --> 30:39.840
So, the Go ecosystem

30:39.840 --> 30:41.840
doesn't have PLO;

30:41.840 --> 30:43.840
C++ has.

30:43.840 --> 30:45.840
So, summary,

30:45.840 --> 30:46.840
PGO is great,

30:46.840 --> 30:48.840
so try PGO optimizations.

30:48.840 --> 30:49.840
That's it. Thank you.


