WEBVTT

00:00.000 --> 00:13.000
All right, so welcome to my talk. This is hot patching ClickHouse in production with XRay.

00:13.000 --> 00:19.000
So this is me, Pablo Marcos Oltra. I'm a software engineer working at ClickHouse, on the core team.

00:19.000 --> 00:25.600
The core team is the team responsible for developing, in C++, the database itself. I consider myself

00:25.600 --> 00:30.600
a systems developer, so I like low level stuff, I'm passionate about video game development,

00:30.600 --> 00:37.600
and I'm interested in compilers, build systems, new programming languages and things like that.

00:37.600 --> 00:42.600
All right, first a few questions. Who knows what code reloading or hot patching is?

00:42.600 --> 00:51.600
Raise your hand. Okay, almost everyone, right? Who knows what LLVM's XRay is?

00:52.600 --> 00:56.600
Quite a few, not bad.

00:56.600 --> 01:01.600
All right, so who has ever had to debug something in production and wished they could add a new log

01:01.600 --> 01:10.600
trace to figure out what the hell's happening? Everyone, right? So it happens every day.

01:10.600 --> 01:17.600
So let's go to what hot patching is, the basics. So hot patching means changing the code

01:17.600 --> 01:23.600
while the process is still running. So in our case, since we are talking about a native application,

01:23.600 --> 01:29.600
we are talking about modifying the machine code while the process is still running.

01:29.600 --> 01:35.600
This allows us to have fast iteration, and while the process is in production

01:35.600 --> 01:42.600
we can patch and uncover different behavior. This is often done in video game development.

01:42.600 --> 01:57.600
It's... yeah, I don't know why. It's often called code reloading, rather than code patching,

01:57.600 --> 02:04.600
because I can use a different laptop, maybe or I don't know what's happening.

02:04.600 --> 02:07.600
I mean, my screen is not flickering.

02:08.600 --> 02:16.600
Okay, okay. So there are a few examples of open source libraries that are using this sort of architecture,

02:16.600 --> 02:21.600
in which they hot reload the code. One of them is Runtime Compiled C++.

02:21.600 --> 02:28.600
The second one is from fungos. It's a very simple, straightforward C library called cr.

02:28.600 --> 02:35.600
And Live++ is the most advanced solution that I've seen out there.

02:35.600 --> 02:40.600
And Unreal Engine is actually using Live++ for code reloading.

02:40.600 --> 02:52.600
So code patching, it's been there in GCC, in the compilers, for a while.

02:52.600 --> 02:56.600
Yeah, okay.

02:56.600 --> 03:01.600
All right.

03:02.600 --> 03:09.600
So hot patching has been out there in the compilers for a while. There are different ways in which you can

03:09.600 --> 03:16.600
do that. One of them is by using the -finstrument-functions parameter, with which the compiler generates instrumentation code

03:16.600 --> 03:22.600
for every single function entry and exit, and just by implementing those two functions

03:22.600 --> 03:26.600
you can hook into all the entries and exits.
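
NOTE
Editor's sketch, not from the slides: the two hook functions mentioned above, under the
names that GCC and Clang call when a file is built with -finstrument-functions. The depth
counter and the log format are illustrative assumptions.
```cpp
#include <cstdio>
// Nesting depth maintained by the hooks; thread_local so threads do not interfere.
static thread_local int g_depth = 0;
extern "C" {
// Called on entry to every function compiled with -finstrument-functions.
// The hook itself must not be instrumented, otherwise it would call itself.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    ++g_depth;
    std::fprintf(stderr, "enter %p (called from %p), depth %d\n", this_fn, call_site, g_depth);
}
// Called just before every such function returns.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    std::fprintf(stderr, "exit  %p (called from %p), depth %d\n", this_fn, call_site, g_depth);
    --g_depth;
}
}
```
Without the compiler flag the hooks are ordinary functions that nothing calls, so the
cost is paid only in instrumented builds.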

03:27.600 --> 03:32.600
I am using QR codes that you can see once every two seconds.

03:32.600 --> 03:51.600
But don't worry about them, because there will be one last slide in which you can take all the links together.

03:52.600 --> 03:58.600
Well, hopefully there will.

04:08.600 --> 04:12.600
Anyone know how to hot patch an HDMI cable?

04:13.600 --> 04:18.600
All right. So there is the other option that's been out there for a while.

04:18.600 --> 04:25.600
It's called patchable function entry, with which you can ask the compiler to introduce NOPs

04:25.600 --> 04:33.600
either before the function or after the function starts, but before the function body starts.

04:33.600 --> 04:41.600
So you save some room filled with NOPs so that you can overwrite those instructions with your own code.
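
NOTE
Editor's sketch: one way to request that padding for a single function, using the
patchable_function_entry attribute that GCC and Clang provide on common targets such as
x86-64 and AArch64. The function name and the counts 8 and 3 are arbitrary examples.
```cpp
// Ask for 8 NOPs in total: 3 placed before the function's entry label and the
// remaining 5 right after it, ahead of the function body.
__attribute__((patchable_function_entry(8, 3)))
int add_one(int x) {
    return x + 1;
}
```
The same padding can be requested for a whole translation unit with the
-fpatchable-function-entry=N,M compiler flag. The function behaves normally until a
patcher overwrites the NOPs.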

04:41.600 --> 04:48.600
Okay. So let's move on to XRay. XRay consists of three main parts.

04:48.600 --> 04:55.600
The first one is compiler-inserted instrumentation points, in a very similar way to what we saw here with the NOPs.

04:55.600 --> 05:01.600
The second part is a runtime library for enabling and disabling tracing at runtime.

05:01.600 --> 05:06.600
And the third part is a suite of tools for analyzing the traces.

05:06.600 --> 05:17.600
We are not using this set of tools ourselves in ClickHouse because we don't want to have to deploy all those tool binaries and everything into production.

05:17.600 --> 05:28.600
So we decided to provide a more seamless experience, and we have developed our own analysis tools within the same ClickHouse binary that we use for everything.

05:29.600 --> 05:36.600
So XRay has existed since 2018 and it's available on Linux on all those architectures.

05:36.600 --> 05:43.600
Luckily for us, at ClickHouse we are using x64 and ARM64, so we are good.

05:43.600 --> 05:51.600
So this would be like the simplest example that you may have for using XRay programmatically.

05:51.600 --> 05:59.600
First you need to initialize the library. Then you patch; this function patches all the functions.

05:59.600 --> 06:05.600
So there's another function that allows you to patch one single function, but this one patches all of them.

06:05.600 --> 06:13.600
You set a handler, and then you call whatever you want to, in this case a dummy foo function, and you do the cleanup.

06:14.600 --> 06:28.600
This is a dummy foo, which prints something. And then the handler needs to be decorated somehow so that you don't call the handler recursively while it is handling something, right?

06:28.600 --> 06:39.600
It takes two different parameters. The first one is the function ID, and the second one is the entry type, which allows you to distinguish whether it's a function entry or exit.
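
NOTE
Editor's sketch of the steps just described, using the runtime API declared in
<xray/xray_interface.h>. It is not the code from the slide. USE_XRAY is a macro invented
here so that the file also compiles without the XRay runtime; the assumed build line is
  clang++ -DUSE_XRAY -fxray-instrument -fxray-instruction-threshold=1 demo.cpp
```cpp
#include <cstdint>
#include <cstdio>
#ifdef USE_XRAY
#include <xray/xray_interface.h>
// Handler invoked at every patched sled. The attribute keeps XRay from instrumenting
// the handler itself, which would otherwise recurse.
[[clang::xray_never_instrument]] static void on_event(int32_t func_id, XRayEntryType type) {
    std::printf("func_id=%d type=%d\n", static_cast<int>(func_id), static_cast<int>(type));
}
#endif
// Small function to observe; noinline so that the call below stays a real call.
[[gnu::noinline]] int foo() {
    std::puts("foo");
    return 42;
}
// Initialise, patch everything, install the handler, call foo(), then clean up.
int run_foo_traced() {
#ifdef USE_XRAY
    __xray_init();                 // set up the runtime
    __xray_patch();                // patch all instrumented functions
    __xray_set_handler(on_event);  // route entry/exit events to on_event
#endif
    const int result = foo();
#ifdef USE_XRAY
    __xray_remove_handler();       // stop delivering events
    __xray_unpatch();              // turn the sleds back into no-ops
#endif
    return result;
}
```
To patch a single function instead of all of them, the runtime also offers
__xray_patch_function, which takes a function ID.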

06:40.600 --> 06:53.600
We can compile that example by using -fxray-instrument, and then by default LLVM will instrument all the functions that are larger than 200 instructions.

06:53.600 --> 07:04.600
Since our friend foo is not larger than that, I set the instruction threshold to one to force the compiler to instrument it.

07:05.600 --> 07:27.600
So if we run this, we can see that foo is here, right? Before that we see the function ID, which is one because in this case there is one function and that's it, and the type, which is zero here, which means entry. And then we see the type one, which means exit, at the end.

07:27.600 --> 07:47.600
Now, in order to do this, LLVM, well, the compiler, is also adding a new section into the ELF binary. This section is called the XRay instrumentation map, which is basically a map of function IDs to function memory addresses, let's say.

07:48.600 --> 08:06.600
So this introduces some overhead; it adds some size to the final binary. In our tests we saw that for x64 it was like 4%, so it seems worth it.

08:06.600 --> 08:23.600
And checking out this foo function with the debugger, LLDB, this is before patching. We can see how it introduces some NOPs at the beginning of the function and then some other NOPs at the exit.

08:23.600 --> 08:38.600
These pieces are usually called sleds, so I will call them entry sleds and exit sleds. There is a jump first instead of all NOPs because you need to do it in such a way that you can patch atomically.

08:38.600 --> 08:44.600
So you actually start by patching the NOPs, and finally you patch the jump to make it atomic.

08:44.600 --> 08:54.600
Then this is after you patch everything. If you see the difference, the NOPs have become a call to __xray_FunctionEntry or similar.

08:54.600 --> 09:10.600
And then you see that there is a move of a literal one, which is the function ID, into a register, which will be used by the function entry, and then it will call the handler that we have set up for that particular function.

09:10.600 --> 09:18.600
Okay, how did we integrate this into ClickHouse? First, what is ClickHouse? It's an OLAP database.

09:18.600 --> 09:32.600
A very fast one, hopefully. In OLAP, the A stands for analytics, as opposed to OLTP, where the T is for transactions, such as Postgres, MySQL and the like.

09:32.600 --> 09:45.600
Those OLTP databases store the data in rows. OLAP databases store the data in columns so that they can run analytics much faster, because of cache coherency and everything.

09:45.600 --> 10:06.600
It was started in 2009 at Yandex.Metrica by Alexey Milovidov, who is also the founder of ClickHouse Inc. and the CTO. It was open-sourced in 2016 under the Apache 2.0 license. It is developed in C++, even though it has some third-party libraries in C and in Rust. It has a super strong focus on performance.

10:06.600 --> 10:21.600
And internally in the company we do a lot of dogfooding, so we use it to store all the CI results, for a lot of observability, and we use it for Pastila, which is a text snippet sharing service.

10:21.600 --> 10:42.600
So how did we integrate XRay into ClickHouse? First, when thinking about how to do this, we wanted to be able to patch, or to hot patch, in a regular release.

10:42.600 --> 10:54.600
So we needed to make sure that whatever we did had a negligible impact in production, and this NOP thing that we saw before actually enables us to do that.

10:54.600 --> 11:15.600
Then XRay already provides a runtime library which allows us to patch whatever function we want at runtime, which is very convenient. And since this is a database, we thought, why don't we use SQL statements as a mechanism to interface with the developer, to know which functions we want to patch?

11:15.600 --> 11:32.600
We ran a POC that was done by our intern Alina Barakova, and I want to say a big thank you, because I tried to mentor her to the best of my abilities, but as you can imagine, for an intern it wasn't an easy task to grasp.

11:32.600 --> 11:52.600
So she did great work, I'm very happy with what she did. And then I took the POC and I reworked it to have a production-ready thing that we could release, and that landed in ClickHouse in 25.12, this last December, so it is quite fresh. There have just been a few minor improvements since then.

11:52.600 --> 11:55.600
All of them have been backported.

11:55.600 --> 12:02.600
Now this is the cheat sheet of how we can make use of this in SQL statements.

12:02.600 --> 12:10.600
ClickHouse already has a system.symbols table that you can use to inspect which symbols are within the ELF binary.

12:10.600 --> 12:18.600
So what I did was add a new column to hold the function ID, the same function ID that the XRay instrumentation map is using.

12:19.600 --> 12:24.600
And by using that we can check out all the functions that we can patch.

12:24.600 --> 12:44.600
Then we have SYSTEM INSTRUMENT ADD or REMOVE, with the symbol as a string. Then we can set one of three different handlers that we will see in a minute, and then whether we want to patch at entry or exit. The parameters actually depend on the handler.

12:45.600 --> 12:55.600
Then once we have added any instrumentation points, we can check which are enabled by taking a look at system.instrumentation. And finally,

12:55.600 --> 13:04.600
checking out trace_log, we can see which of the instrumented points were hit, with which parameters, at what time, etc.

13:04.600 --> 13:07.600
So let's go through a real example of how this works.

13:07.600 --> 13:12.600
Let's say that I want to patch this QueryMetricLog::startQuery.

13:12.600 --> 13:19.600
I am very fond of this method because it's one of the first ones that I introduced into the source code.

13:19.600 --> 13:24.600
So by running this query we will get a result such as this.

13:24.600 --> 13:27.600
There are two functions that match that thing.

13:27.600 --> 13:30.600
The first one is that little method that we care about.

13:31.600 --> 13:40.600
The second one, if you are familiar with C++ mangling, which is crazy, is actually a lambda name, because that function contains a lambda.

13:40.600 --> 13:48.600
So by default ClickHouse is already smart enough to assume that you meant the usual startQuery if you only use startQuery.

13:48.600 --> 13:53.600
Otherwise you can use the full mangled name and patch the lambda.

13:54.600 --> 14:01.600
Right, let's add three instrumentation points, one for each of the three handlers that we have. The first one will be LOG.

14:01.600 --> 14:08.600
We will patch it at entry, and then the only parameter that it takes is a hard-coded string.

14:08.600 --> 14:18.600
This will be logged into our usual logging system along with the stack trace, so that we know where it came from.

14:18.600 --> 14:23.600
This is very useful to know whether your path is taken this way or this other way.

14:23.600 --> 14:41.600
Then we have SLEEP, which in this case we are patching at exit. It accepts either one single parameter, to sleep for a fixed amount of time, or you can pass two different parameters to sleep for a random time uniformly distributed between those two values.

14:41.600 --> 14:47.600
That is to try to apply some stress and see whether something happens, right?

14:47.600 --> 14:53.600
Then PROFILE is my favorite handler because it allows me to profile deterministically.

14:53.600 --> 15:05.600
ClickHouse already has sampling profiling, but you know, for doing development, and even in production, it is very useful to have deterministic profiling to catch all the outliers and everything.

15:05.600 --> 15:12.600
All right so once we have.

15:12.600 --> 15:23.600
Once we have instrumented a few points, we can check them out in system.instrumentation. You see LOG at entry with the parameters, same for SLEEP.

15:23.600 --> 15:33.600
Same for PROFILE. For PROFILE there is no entry or exit, because if you want to profile you need to patch both entry and exit, otherwise it doesn't make any sense.

15:33.600 --> 15:53.600
And after you finish patching and testing everything that you want, you can remove all of the instrumented points, you can remove one of them, or you can remove all of the instrumented points associated with a specific function, as in this case QueryMetricLog blah blah.

15:53.600 --> 16:11.600
Finally, you can take a look at trace_log to see which instrumented points were hit. In this case we see PROFILE at entry. The duration in nanoseconds we cannot see there because of the PDF, but it says NULL, okay?

16:11.600 --> 16:19.600
At exit we have already executed the full function, so we can actually get the duration in nanoseconds.
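
NOTE
Editor's sketch of why the entry row has no duration while the exit row does. This is
not ClickHouse's implementation: TraceRow, EntryType and profile_handler are
hypothetical names, and the per-thread stack is just one simple way to pair each exit
with its entry.
```cpp
#include <chrono>
#include <cstdint>
#include <optional>
#include <vector>
enum class EntryType { Entry, Exit };
// One row as it might appear in a trace log: the duration is unset on entry.
struct TraceRow {
    std::int32_t function_id;
    EntryType type;
    std::optional<std::uint64_t> duration_ns;
};
// Per-thread stack of entry timestamps, so nested and recursive calls pair up correctly.
static thread_local std::vector<std::chrono::steady_clock::time_point> g_starts;
// Record the start time on entry; compute the elapsed time on exit.
TraceRow profile_handler(std::int32_t function_id, EntryType type) {
    const auto now = std::chrono::steady_clock::now();
    if (type == EntryType::Entry) {
        g_starts.push_back(now);
        return {function_id, type, std::nullopt};
    }
    if (g_starts.empty()) {
        return {function_id, type, std::nullopt};  // exit seen without a recorded entry
    }
    const auto start = g_starts.back();
    g_starts.pop_back();
    const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(now - start).count();
    return {function_id, type, static_cast<std::uint64_t>(ns)};
}
```
Because every call is measured rather than sampled, outliers cannot be missed, which is
the point made above about deterministic profiling.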

16:19.600 --> 16:32.600
This is fine, but it is not very human friendly, so when thinking about how we can make this better, we thought about Perfetto, speedscope and these sorts of tools. All of them use

16:32.600 --> 16:48.600
the Chrome Trace Event format, which is a very straightforward JSON file that looks like that, and which we can export using SQL such as this one. It's a biggish SQL, I don't want you to focus too much on that, I just want you to know that it's there.
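
NOTE
Editor's sketch of the shape of such a file. The talk exports it with a SQL query; this
C++ helper only illustrates the format. A "complete" event uses ph = "X" with ts and dur
in microseconds. The helper names are invented here, and event names are assumed to need
no JSON escaping.
```cpp
#include <cstdint>
#include <string>
// Build one complete event (ph = "X") in the Chrome Trace Event format.
// ts_us and dur_us are microseconds, as the format expects.
std::string complete_event(const std::string& name, std::uint64_t ts_us, std::uint64_t dur_us,
                           std::uint64_t pid, std::uint64_t tid) {
    return "{\"name\":\"" + name + "\",\"ph\":\"X\",\"ts\":" + std::to_string(ts_us) +
           ",\"dur\":" + std::to_string(dur_us) + ",\"pid\":" + std::to_string(pid) +
           ",\"tid\":" + std::to_string(tid) + "}";
}
// Wrap comma-separated events in the {"traceEvents": [...]} object form.
std::string trace_document(const std::string& comma_separated_events) {
    return "{\"traceEvents\":[" + comma_separated_events + "]}";
}
```
A file containing trace_document(complete_event("foo", 10, 5, 1, 2)) should load in
viewers that accept this format. Durations recorded in nanoseconds would need dividing
by 1000 first.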

16:48.600 --> 16:58.600
And this is what it looks like in Perfetto. I patched this sleepForNanoseconds

16:58.600 --> 17:14.600
function, and you can see the duration, the thread it ran on, you can see the whole stack from top to bottom, the query ID it belongs to, the CPU ID, the start. Super convenient for deterministic profiling.

17:15.600 --> 17:20.600
In terms of future work, we are thinking about adding more handlers as we think that they are needed.

17:20.600 --> 17:29.600
We discussed whether it makes sense to add some integration to allow scripting with Lua bindings and the like, but this will probably

17:29.600 --> 17:38.600
open a new can of worms, because the moment that you need to keep up the bindings for everything and you allow people to run whatever they want, it's probably risky.

17:38.600 --> 17:50.600
And the most important thing is to educate others that this thing exists, especially people from the support team and everything, so they can debug production issues more.

17:51.600 --> 18:02.600
Collect the feedback to improve this. And some caveats: you cannot really mix XRay and sanitizer builds because there are symbols clashing in libclang_rt.

18:02.600 --> 18:10.600
And this is not an issue for our release builds, because we don't release sanitizer builds, but for the CI and for all the tests that we have,

18:10.600 --> 18:15.600
we cannot have both of them, so we cannot test sanitizer builds along with XRay.

18:15.600 --> 18:19.600
You need to be aware that if you mess something up,

18:20.600 --> 18:29.600
you do it big time, because you are doing it in production and it's hot patching machine code. So nothing has happened, but if anything were to happen, it would be...

18:30.600 --> 18:58.600
You're doing risky things. And even though profiling is very cool, when profiling a function as we saw, there is no easy way to profile all the functions called within that function, because you would need to know beforehand all the functions that you need to patch, and that's difficult to do. So you may patch all the functions and only enable this for those on your same thread, but it's messy and we don't want to affect the rest of the program, right?

18:58.600 --> 19:01.600
So for now we are going to keep it that way.

19:01.600 --> 19:10.600
Okay, finally, there is a community Slack that you can join if you want to. We will also have a ClickHouse community dinner tonight in Brussels.

19:10.600 --> 19:19.600
You are welcome to come. There will be finger food and stuff like that. We also have our technical blogs, which are really good.

19:19.600 --> 19:31.600
And this is the final QR code that you may want to check out because all the links that I gave in all the slides are there.

19:31.600 --> 19:33.600
Two minutes.

19:33.600 --> 19:35.600
Any questions?

19:35.600 --> 19:52.600
I just want to say that I'm Spanish and this is like siesta time, so I hope that you appreciate the commitment to the talk.

19:52.600 --> 20:19.600
Yeah, the question is regarding the performance difference between adding an if statement and changing the machine code directly.

20:19.600 --> 20:33.600
So, again, I haven't tested that personally, but you know that an if statement is going to involve a branch, play with the branch predictor and everything.

20:33.600 --> 20:44.600
It looks like it's not going to be much overhead, but once you add that into every single function that is larger than 200 instructions, it starts to accumulate, right?

20:44.600 --> 20:55.600
So we wanted to have something as low level as we could, and this is the best that we could find. And I think it's really cool; XRay is a cool technology to do these sorts of things.

20:55.600 --> 20:58.600
Question.

20:58.600 --> 21:08.600
Yes, the question is whether we are running this in production, and the answer is yes.

21:08.600 --> 21:10.600
Yeah I hope that's fun.

21:10.600 --> 21:13.600
I wish I could eat the guys not listening to the talk.

21:13.600 --> 21:17.600
This is something that.

21:17.600 --> 21:29.600
This is something that only developers can do in our production environment, and there is a myriad of security checks that you need to pass in order to do this.

21:29.600 --> 21:42.600
There are different users who have different capabilities, and this would be like the admin god mode capability, so I think that we should be good. But of course, if you're...

21:42.600 --> 21:58.600
I mean, everything that you may patch is either going to log, which is nothing, or sleep or profile, which may affect it a little bit, but it's supposed to be meant for developers debugging something in production, not for a regular user.

21:58.600 --> 22:12.600
I don't know, this is why, as I said when talking about the scripting, it would be cool, but you know, if you allow people to do whatever they want, they will do whatever they want.

22:12.600 --> 22:14.600
Another question.

22:14.600 --> 22:21.600
Actually only allow you to patch on a call if you want to jump and then developers to control flow to like patch on.

22:21.600 --> 22:23.600
So could you repeat then?

22:23.600 --> 22:25.600
Can you only the entry to the entry set?

22:25.600 --> 22:26.600
Uh-huh.

22:26.600 --> 22:32.600
I just call in and I also have to jump and then developers to control those who actually want to do that.

22:32.600 --> 22:40.600
Yeah, the question is regarding whether you can only hot patch the function entry or exit, or also in the middle of the code, right?

22:40.600 --> 22:47.600
Right now you can only do it at either entry or exit, because this is where XRay is introducing the NOPs.

22:47.600 --> 22:49.600
I guess that no.

22:49.600 --> 22:58.600
No no, the question is if I always let the start but tonight it's like if function is about tonight jump for another function I would know that.

22:58.600 --> 22:59.600
Ah ah ah.

22:59.600 --> 23:07.600
All right, so the question was whether you can jump into a different function which fixes the behavior, let's say, of your wrong function.

23:07.600 --> 23:14.320
Yes, it could be done. It's not done, because right now there's no handler that does that, but there's nothing

23:14.320 --> 23:20.400
preventing you from doing that. I mean, you could jump into somewhere else and run some other code, which would actually be like

23:20.400 --> 23:25.120
actual hot code patching for something that wants to change behavior, like in video games or whatever.

23:25.680 --> 23:27.680
It could be done. It is not done and

23:28.480 --> 23:30.480
There was another question over there

23:31.680 --> 23:33.680
All right

23:37.600 --> 23:39.600
Thank you

