WEBVTT

00:00.000 --> 00:09.320
All right. Hello, everyone. Thank you for coming. Hope you're ready for this adventure.

00:09.320 --> 00:14.080
Let's get started. I'm Nelson Vides. I'm a senior engineer at DNSimple, and I work

00:14.080 --> 00:18.880
on the DNS infrastructure. I'm also a maintainer and an Erlang and Elixir evangelist,

00:18.880 --> 00:23.680
and have been doing this for most of my career. I also did some C and stuff. I mean,

00:23.680 --> 00:28.680
if you're wondering about the funny outfit, this is DNSimple's Christmas sweater, but

00:28.680 --> 00:32.300
really, if you're wondering about the funny outfit: three weeks ago, I broke my

00:32.300 --> 00:38.840
collarbone snowboarding, and for that, I recommend everyone really to get a split keyboard.

00:38.840 --> 00:45.640
It's very helpful if you want to continue working. Anyway, let's get started. So I'm going

00:45.640 --> 00:50.080
to talk about an Erlang-built DNS, open source, of course. First of all, it dates

00:50.080 --> 00:56.480
from 2012, and it has been in production for about 11 years. We host authoritative records

00:56.480 --> 01:00.880
in the name server: authoritative records for a lot of cool folks, for example, the

01:00.880 --> 01:09.680
Linux Foundation; quite happy about that one. Of course, stability is important. I'm going to talk

01:09.680 --> 01:16.480
about how we went from zero, where zero means downtime, to: no matter what, we don't

01:16.480 --> 01:22.040
have downtime. And just using classical algorithms. First, DNS basics, which I'm sure 99% of

01:22.040 --> 01:27.120
the people here know. Say you want to get the A record of

01:27.120 --> 01:34.120
fosdem.org, so I can connect to the website. I ask the name server, the root server,

01:34.120 --> 01:38.680
and it says: hey, you know, no, I don't know, but I know the server that has .org, okay.

01:38.680 --> 01:42.000
So I ask the one that has .org. No, I don't know it, but I know the one that has

01:42.000 --> 01:46.920
fosdem.org. So I ask the other server, and the other server finally gives

01:46.920 --> 01:55.080
me the address. What do you see in this? A lot of network traffic. And it's distributed.
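The iterative walk just described, root server to .org server to the fosdem.org server, can be sketched as a toy model. The zone data, server names, and the placeholder address here are illustrative assumptions, not real DNS:

```python
# Toy model of iterative DNS resolution. Each "server" either answers or
# refers us to a more specific server. All data here is made up;
# is a documentation placeholder address (RFC 5737).
SERVERS = {
    "root":          {"referral": {"org.": "org-server"}},
    "org-server":    {"referral": {"fosdem.org.": "fosdem-server"}},
    "fosdem-server": {"answer": {"fosdem.org.": ""}},
}

def resolve(name: str, server: str = "root"):
    """Follow referrals from the root until some server answers."""
    hops = [server]
    while True:
        data = SERVERS[server]
        if name in data.get("answer", {}):
            return data["answer"][name], hops
        # Follow the referral whose zone suffix matches the queried name.
        server = next(s for zone, s in data["referral"].items()
                      if name.endswith(zone))
        hops.append(server)

addr, hops = resolve("fosdem.org.")
print(addr, hops)  # three round trips for one address
```

Each hop in the returned list is one network round trip, which is the point of the slide: resolution is chatty and distributed.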

01:55.080 --> 02:02.520
What is the problem space? DNS is under attack. It's a prime target. A quarter of the

02:02.520 --> 02:07.160
catastrophic attacks worldwide go to DNS infrastructure. Another quarter goes to TCP SYN flood

02:07.160 --> 02:14.160
attacks. Those two together are half. Why does so much go to DNS? If you shut down DNS,

02:14.160 --> 02:20.640
you effectively shut down everyone else, because nothing can be found. It also has a terrible

02:20.640 --> 02:27.200
amplification factor. The query for the fosdem.org A record is around 40 bytes, plus or minus. And

02:27.200 --> 02:32.400
the answer is already around 80. And that's only for one A record. If you ask for the

02:32.400 --> 02:38.480
NS records and you have like four IP addresses, and then you ask for TXT records and HTTPS records,

02:38.560 --> 02:45.840
you get huge answers for just 40 bytes. So it sucks. Another problem is UDP: no connection

02:45.840 --> 02:51.760
handshake, so it's very easy to spoof. This is preventable, but it's expensive on routers,

02:51.760 --> 02:58.160
so middleboxes don't want to do it. Unfortunately, it's expensive performance-wise. And

02:58.160 --> 03:03.200
it's also critical: if DNS goes down, everyone else is unreachable, so effectively everything is down.

03:03.280 --> 03:08.880
Three core challenges: congestion, overload, and thundering herd. Thundering herd is

03:08.880 --> 03:12.080
probably the one you wouldn't expect. I discovered it in a benchmark.

03:13.680 --> 03:20.400
Congestion: queues are full. Overload: there are just way too many requests for the available resources. Thundering

03:20.400 --> 03:25.200
herd: you take decisions in sync, and that's not good.

03:26.080 --> 03:35.120
If you want to leave this talk with one takeaway, it is this one: do not ever expose an

03:35.120 --> 03:38.880
unbounded resource to the outside world. Otherwise, you're going to get DDoSed, and it's incredibly

03:38.880 --> 03:45.360
easy. There is no such thing as an unbounded resource. Memory: you can have a lot, but it's finite.

03:45.360 --> 03:50.800
Network: you can have a lot, but it's also finite. Everything is finite. A couple of

03:51.760 --> 03:55.360
theoretical foundations, so that you understand what it is that we're going to optimize.

03:56.400 --> 04:01.680
I studied mathematics, by the way, not computer science. A lot of the knowledge doesn't transfer, but I have some

04:01.680 --> 04:09.760
hobbies. Little's law. We want to get an idea of how long we are going to wait in a queue. I

04:09.760 --> 04:16.320
mean, you go to a shop and you're going to buy some, how do you say, cheese. And you see 10 people in

04:16.400 --> 04:20.080
the queue, and each one of them gets served every two minutes. You can guess that you're going

04:20.080 --> 04:28.480
to get out after 10 people times two minutes: in 20 minutes, you will get served. So the arrival rate,

04:28.480 --> 04:34.080
or serving rate, is static at deployment time. It's your implementation divided by your hardware.

04:34.080 --> 04:39.280
The moment you deploy, this is decided. So if you want low latency, you need short queues.

04:39.680 --> 04:47.200
If you let the queue grow, that is L, while lambda is static, W will just go wherever it wants to go.
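The cheese-shop arithmetic is Little's law, L = lambda * W: queue length equals arrival rate times waiting time. A minimal sketch of the numbers from the example:

```python
# Little's law: L = lambda * W (queue length = arrival rate * wait time).
# In the shop, one customer is served every two minutes, i.e. a service
# rate of 0.5 customers per minute, fixed the moment you "deploy".
service_rate = 0.5   # customers per minute (implementation / hardware)
queue_length = 10    # people already waiting (L)

wait = queue_length / service_rate  # W = L / lambda
print(wait)  # 20.0 minutes, as in the example

# The rate is static at deployment time, so the only lever left for low
# latency (small W) is keeping the queue (L) short.
```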

04:50.240 --> 04:58.000
Another formula, the complicated one with the Cs and squares and so on. That is basically something that you

04:58.000 --> 05:06.400
cannot control. It's the variability of arrivals and of service: you don't know how your clients

05:06.400 --> 05:11.680
are going to make requests. Mu is, again, your implementation divided by your hardware. It's

05:11.680 --> 05:17.920
what you can do in a given amount of time. So what is rho? That's the important one: it's utilization.

05:18.480 --> 05:25.040
It basically means that the fuller the queue is, not only does it take longer to serve the last

05:25.040 --> 05:30.800
one. It takes longer, not linearly; polynomially, sorry, not exponentially.

05:31.360 --> 05:39.360
This means that if utilization is around half, the waiting time equals the service time. But if

05:39.360 --> 05:46.080
utilization is 0.9, let's say CPU utilization, the waiting time is 9 times that. And if utilization

05:46.080 --> 05:56.720
goes to 95%, then the waiting time is 19x. So you want to optimize not for higher nominal CPU usage,

05:56.720 --> 06:02.880
but for lower queue latency. And DNS has very tight timeouts. If you don't answer a request after so long,

06:02.880 --> 06:10.960
then people will retry; too late. So you get the idea of the problem with queues and latency.
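The multipliers quoted here (0.5 gives 1x, 0.9 gives 9x, 0.95 gives 19x) fall out of the rho / (1 - rho) utilization factor that appears in queueing formulas such as Kingman's approximation. A quick check:

```python
# Waiting time scales with the utilization factor rho / (1 - rho),
# which is why queues blow up well before 100% utilization.
def wait_multiplier(rho: float) -> float:
    """How many multiples of the service time you wait at utilization rho."""
    return rho / (1.0 - rho)

for rho in (0.5, 0.9, 0.95):
    print(f"utilization {rho:.0%}: wait ~ {wait_multiplier(rho):.0f}x service time")
# 50% -> 1x, 90% -> 9x, 95% -> 19x: optimize for queue latency,
# not for squeezing out nominal CPU usage.
```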

06:10.960 --> 06:18.400
So let's now solve these problems. Congestion. Typically, you have a queue of packets

06:18.400 --> 06:23.760
arriving over the network, and the queue has a size you can configure in your kernel. The kernel

06:23.760 --> 06:29.920
has a queue, your program has a queue. Let's say that the size is 4, so it fits on the slide.

06:29.920 --> 06:35.120
And packets arrive. There is a packet that has been waiting;

06:37.280 --> 06:42.800
it has a delay. There is a packet that has been waiting for 4 milliseconds, another one for 6,

06:42.800 --> 06:47.360
one for 8, and then a new packet arrives. Where do you put it? The queue is full. You are going to drop it.

06:47.360 --> 06:52.800
Why are you dropping the new one, which is absolutely fresh, and not the last one that has been

06:52.880 --> 06:59.200
waiting for a while? Maybe it's just about to hit the timeout. This is the classic, the universal default:

06:59.200 --> 07:03.360
tail drop. If it doesn't fit, you just throw it away. But keep in mind you're throwing away the

07:03.360 --> 07:09.280
freshest. Or even more complicated, or more annoying. Imagine this. I'm going to remove the mouse.

07:10.320 --> 07:15.600
Imagine that you just finished resolving a packet that got stuck for a bit. So now the whole queue

07:15.600 --> 07:20.320
is delayed a little bit. And then a packet arrives that has been there one millisecond, and a new one arrives.

07:21.120 --> 07:25.600
Where do you put it? You're going to drop it, but you have a bunch of packets here. Maybe one of them

07:25.600 --> 07:33.600
is a better candidate. So by the time we drop the fresh one, there are other packets that

07:33.600 --> 07:39.120
probably were better candidates. This problem is bufferbloat, and it's not a new problem. There

07:39.120 --> 07:46.320
used to be entire working groups in the Linux kernel and in internet networking groups. Large

07:46.400 --> 07:52.400
buffers mean less packet loss, until they get full. And then they mean high latency.
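The tail-drop behavior described above can be sketched in a few lines; head drop is shown next to it as the naive alternative that at least evicts the stalest packet. This is an illustration of the problem, not the talk's solution (that is CoDel):

```python
from collections import deque

CAPACITY = 4  # queue size, as on the slide

def tail_drop(queue: deque, packet):
    """The universal default: if the queue is full, discard the newcomer."""
    if len(queue) >= CAPACITY:
        return None          # the absolutely fresh packet is the one lost
    queue.append(packet)
    return packet

def head_drop(queue: deque, packet):
    """Naive alternative: evict the oldest packet instead; it has waited
    longest and is the most likely to be past its timeout anyway."""
    dropped = queue.popleft() if len(queue) >= CAPACITY else None
    queue.append(packet)
    return dropped

# Packets tagged with how many milliseconds they have already waited.
q = deque([("pkt", 8), ("pkt", 6), ("pkt", 4), ("pkt", 2)])
assert tail_drop(q, ("pkt", 0)) is None        # fresh packet thrown away
assert head_drop(q, ("pkt", 0)) == ("pkt", 8)  # stale packet evicted instead
```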

07:53.920 --> 07:58.640
Back when the internet was not so fast, we used to think that big buffers made the internet

07:58.640 --> 08:07.280
faster, but in reality, it's latency. And now that we have 10G, 100G, this incredibly large

08:07.280 --> 08:14.400
throughput, big buffers mean minutes of latency. So that's not how we do things anymore.

08:15.360 --> 08:19.680
One solution is CoDel. It has been a fantastic default in the Linux kernel for quite a while.

08:21.440 --> 08:27.120
Very good for most mixed traffic. There are some special algorithms if you only have TCP,

08:27.120 --> 08:31.920
but otherwise, if you have every possible thing, CoDel is very nice. What is CoDel doing?

08:33.280 --> 08:38.320
This is an example. I'm doing that long, as I said, about should be so free level anyway.

08:38.320 --> 08:42.960
Imagine a new packet arrives and I have the socket and the time of arrival and then the

08:42.960 --> 08:48.160
bin is the binary: the stuff that I need to decode and compute and so on. And before

08:48.160 --> 08:55.520
I compute, the naive algorithm, which I'm guilty of doing myself: it works, but it's very inefficient.

08:56.240 --> 09:01.120
Let's imagine that my timeout is one second. If this has been in the queue, when I pick it up,

09:01.120 --> 09:06.400
for more than one second, then drop it. But what if it has been there for 999 milliseconds?

09:06.480 --> 09:10.800
By the time I finish computing it, for sure it's already past the timeout.

09:11.920 --> 09:17.920
So do I drop it at 80% of the timeout? At 50% of the timeout? How do I calculate this?

09:19.200 --> 09:23.920
So this is something that CoDel does: taking the current time and the time of arrival,

09:23.920 --> 09:26.320
it calculates a difference, and it self-heals. How?

09:28.480 --> 09:34.960
CoDel, you can see it as a state machine, gets packets and checks if the sojourn

09:34.960 --> 09:41.040
time, that is, the time it has been waiting, is below a target. The target is usually between 5 to 10% of

09:42.160 --> 09:49.520
your configure timeout. So if we say one second, target is going to be from 50 to 100 milliseconds,

09:49.520 --> 09:54.480
configurable. In my case, I configured a hundred, so 10%. The Linux kernel by default configures 5,

09:54.480 --> 10:00.400
a parameter. 5 is usually very fine until you have a specific benchmark that tells you otherwise.

10:00.400 --> 10:03.840
The next packet arrives below the target sojourn, whatever; next one, next one, next one, next one.

10:03.840 --> 10:10.560
The moment you have the first packet above the target sojourn time, it's 10%. It's not expired. It's fine.

10:10.560 --> 10:14.400
You don't drop it yet. But now you start tracking. You go to first-above.

10:15.760 --> 10:20.400
If the next one is also above the target sojourn time, and the next one and the next one,

10:20.400 --> 10:26.880
until the entire interval, that is, your configured timeout, then CoDel says: this queue got stuck.

10:26.880 --> 10:31.680
This is not one quick spike that you just burn through very fast. The queue got stuck.

10:31.680 --> 10:37.200
So now you need to drop one packet. Considering that you still probably have packets around 51

10:37.200 --> 10:42.720
milliseconds, potentially you're dropping this one very early. But effectively, you have a

10:44.480 --> 10:48.240
queue that got stuck. At the end of the queue, you're going to have very late packets.

10:50.080 --> 10:53.840
You drop that one, and when you decide to drop the next one,

10:53.840 --> 11:02.320
it's going to work as a controller. So, don't drop on the first spike. Wait for the congestion.

11:02.320 --> 11:08.000
And now, how do you decide when to drop the next? It's not every time you see a sojourn above target;

11:08.000 --> 11:15.520
remember, the sojourn is very short. If the sojourn continues being above target: not after the next interval,

11:15.520 --> 11:23.520
because that is too slow; not dividing by the count, because that's way too aggressive, it grows way too fast;

11:24.320 --> 11:30.320
but at the square root of the number of packets that you have dropped so far. And this grows relatively

11:30.320 --> 11:35.360
slowly at the beginning, because maybe it's just a suspicion, but eventually it gets very aggressive.
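A compact sketch of the CoDel logic just described: track when the sojourn time first goes above target, declare the queue stuck only after it stays above target for a whole interval, then space further drops by the interval divided by the square root of the drop count. Parameter values follow the talk (100 ms target, 1 s interval); this is a simplification, not a faithful implementation of the real algorithm:

```python
import math

TARGET = 0.100    # seconds; 10% of the 1 s timeout, per the talk
INTERVAL = 1.0    # seconds; the configured timeout

class CoDel:
    """Simplified CoDel dropping decision, one call per dequeued packet."""

    def __init__(self):
        self.first_above = None  # when sojourn first exceeded the target
        self.dropping = False    # are we in the dropping state?
        self.count = 0           # packets dropped in this dropping state
        self.next_drop = 0.0     # when the next drop is allowed

    def on_dequeue(self, now: float, sojourn: float) -> bool:
        """Return True if this packet should be dropped."""
        if sojourn < TARGET:
            # Queue drained below target: the spike is over, reset.
            self.first_above = None
            self.dropping = False
            self.count = 0
            return False
        if self.first_above is None:
            # First packet above target: start tracking, don't drop yet.
            self.first_above = now
            return False
        if not self.dropping:
            if now - self.first_above >= INTERVAL:
                # Above target for a whole interval: the queue got stuck.
                self.dropping = True
                self.count = 1
                self.next_drop = now + INTERVAL / math.sqrt(self.count)
                return True
            return False
        if now >= self.next_drop:
            # Still stuck: drop again; spacing shrinks as sqrt(count) grows.
            self.count += 1
            self.next_drop = now + INTERVAL / math.sqrt(self.count)
            return True
        return False
```

With these numbers, a sojourn time that stays above 100 ms for a full second triggers the first drop; later drops come 1/sqrt(2), then 1/sqrt(3) of a second apart: gentle at first, increasingly aggressive.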

11:37.920 --> 11:42.880
And with that, you control the problem of a congested queue. Second problem: overload.

11:43.600 --> 11:46.400
Too many requests, limited resources. As I said, nothing is infinite.

11:46.800 --> 11:52.960
The solution: CUBIC. That's another default, for TCP, since God knows how long;

11:52.960 --> 12:05.840
I don't remember, 15, 10 years? I have it here: 2006, yes. The idea is that TCP congestion control has the congestion window.

12:06.560 --> 12:11.840
So it keeps track of when the last packet was dropped, and then it self-heals,

12:12.080 --> 12:15.520
calculating how many packets it can really send that the receiver is going to receive.

12:17.200 --> 12:22.960
It controls the sending rate. What we want to control is the acceptance rate: how many requests are

12:22.960 --> 12:31.920
we going to accept in DNS? But we can use the same math. Find the capacity: what tells you that you hit

12:31.920 --> 12:37.920
congestion, where TCP watches the window, in our case is when we hit too much utilization.

12:38.320 --> 12:44.560
Then we know that the queue is going to build up a lot of latency. Back off, as you see in the cubic formula,

12:44.560 --> 12:49.280
and recover very fast, because it's a cubic formula. It's not a square root. A cubic is

12:49.280 --> 12:55.040
aggressive at the beginning, so it comes back to the point where it detected the last failure very fast.

12:55.040 --> 12:59.120
And at that point, where the last failure was, it's very slow.

13:00.640 --> 13:03.680
Why cubic? Fast recovery, a stable plateau, and then careful growth.

13:04.640 --> 13:10.160
We are going to use CUBIC to tell us the rate, and with the rate we are going to decide how long to

13:10.160 --> 13:17.120
sleep, how long to wait until we take the next packet from the network. In the meantime, the CPU

13:17.120 --> 13:23.920
is busy calculating the responses to all the ones that already arrived. The state machine for

13:23.920 --> 13:32.320
CUBIC is something like this. Does the CPU have a lot of utilization? If it says no, just continue

13:32.320 --> 13:36.400
growing, continue growing, let more packets arrive, until basically delay zero. That means:

13:37.600 --> 13:43.760
use all your power to take everything from the network. Until you see that the CPU hits congestion;

13:43.760 --> 13:51.760
then, congestion, and CUBIC says rate multiplied by 0.8, that is, a 20% slowdown. The default

13:51.760 --> 13:59.360
in the Linux kernel is a 30% slowdown; in my benchmarks, 20% was better. And then this is our new

13:59.360 --> 14:07.760
current rate, and we are going to grow from that rate. We use some limitations so this doesn't

14:07.760 --> 14:18.160
escape; you can read the details later. The moment we hit congestion, we say that this point,

14:18.160 --> 14:22.880
the middle of the graph, is where we are going to place the congestion moment. And we automatically

14:22.880 --> 14:28.240
drop 20% below it. And we are going to grow very quickly until we hit the congestion moment,

14:29.120 --> 14:34.000
the moment where it was congested last time. If nothing happens there, we are going to

14:34.000 --> 14:40.000
continue growing very slowly. Nothing happens when we simply have more capacity, like, queries are

14:41.200 --> 14:45.600
simpler this time, they take fewer resources. Then we are going to start growing super fast,

14:45.600 --> 14:50.880
until we hit a new congestion point, and we move the graph around again.

14:51.840 --> 15:02.640
Why does it work? It finds capacity automatically, and it is responsive. Then, the maximum rate, so it

15:02.640 --> 15:09.680
doesn't grow indefinitely. Because at some point an infinite rate means zero delay, and 20% off infinity

15:09.680 --> 15:15.840
also means zero delay. So it would never tell me to delay; I need to put a maximum on the rate.
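The CUBIC-style pacing described here can be sketched as follows. On congestion, cut the rate by 20% and remember the rate at which congestion happened; between congestion events, grow along a cubic curve that plateaus near that remembered rate. The constant C and the rate cap are illustrative assumptions, and the jitter the talk introduces later is already included in the sleep computation:

```python
import random

C = 0.4             # cubic scaling constant (illustrative assumption)
BETA = 0.2          # 20% slowdown on congestion (the kernel uses ~30%)
MAX_RATE = 100_000  # hard cap: an infinite rate would mean "never delay"
JITTER = 0.075      # +/- 7.5% randomization against thundering herd

class CubicPacer:
    """Toy CUBIC-style controller for the packet acceptance rate."""

    def __init__(self, rate: float):
        self.rate = rate    # accepted packets per second
        self.w_max = rate   # rate at the last congestion event
        self.t = 0.0        # seconds since the last congestion event

    def on_congestion(self):
        """CPU utilization too high: back off 20% below the peak."""
        self.w_max = self.rate
        self.rate *= 1 - BETA
        self.t = 0.0

    def on_tick(self, dt: float):
        """No congestion seen: grow along the cubic curve toward w_max."""
        self.t += dt
        k = (self.w_max * BETA / C) ** (1 / 3)  # time to reach w_max again
        target = C * (self.t - k) ** 3 + self.w_max
        # Fast recovery, plateau around w_max, then careful probing beyond.
        self.rate = min(max(target, self.rate), MAX_RATE)

    def sleep_time(self) -> float:
        """How long to wait before taking the next packet, with jitter."""
        base = 1.0 / self.rate
        return base * random.uniform(1 - JITTER, 1 + JITTER)
```

`on_congestion` drops the rate to 80% of the peak; `on_tick` recovers quickly toward the old peak, flattens there, and only then probes for new capacity, which is the cubic shape described in the talk.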

15:16.160 --> 15:23.840
The last challenge is thundering herd. Now, these two algorithms are happening on a bunch of

15:23.840 --> 15:28.400
threads. There is a thread that takes packets from the queue and passes them to another one to resolve.

15:29.680 --> 15:33.760
Like that, there are many threads. I am doing this in Erlang, so these are processes, and I can have millions

15:33.760 --> 15:41.440
of them. They are literally 233 machine words, around one kilobyte. Really, a million of them is

15:41.440 --> 15:48.640
one gig of RAM. So I can have a lot. It allows me to architect it this way: one process

15:48.640 --> 15:52.800
takes packets from the network queue and passes them to another for the resolution. And then

15:52.800 --> 15:59.600
they synchronize if there is, you know, backpressure. The problem is, when I was running

15:59.600 --> 16:06.080
this load test in a multithreaded server, all threads see congestion, because CPU is a global variable. And

16:06.080 --> 16:10.800
suddenly all threads say: oh, congestion, slow down. And then everyone slows down. At the same time.

16:12.400 --> 16:20.000
And then everyone recovers at the same time. Because, you know, everyone got the same amount of sleep.

16:21.680 --> 16:26.960
And the problem was that I could see how we were trying to, like, DDoS our own servers to calculate

16:26.960 --> 16:33.440
capacity. And then I'm watching the metrics and, to my shock, I see that suddenly CPU is at 30%.

16:34.400 --> 16:41.200
What? Why is it so wasted? And it's cyclic: it goes to 30% and immediately back to 100% for a while

16:41.200 --> 16:47.440
before it drops to 30% again. And immediately I'm like: I'm wasting capacity. I could be answering a lot more

16:47.440 --> 16:52.800
queries in time. Basically, I'm just telling the whole world: sorry everyone, I'm not working today,

16:52.800 --> 16:58.800
when I could work for some of them. So this was something that annoyed me a lot when I saw it in the

16:58.800 --> 17:02.400
benchmarks. I worked on this so hard, I was so excited. I was like, yeah, let's do it. Like,

17:02.480 --> 17:06.480
you're going to see. And then I was like: okay, hold on. Let me improve this.

17:07.760 --> 17:16.240
So, two rules. Never delay right after a delay. CUBIC says: the CPU is busy, slow down. And

17:16.240 --> 17:21.680
then you delay. You come back a millisecond later. And don't ask again: is the CPU still busy,

17:21.680 --> 17:26.240
should I delay? Just work. When you come back from a delay, do your work. When you finish the work,

17:26.240 --> 17:30.480
then ask for a delay again. I was guilty of this. I was like, oh, still busy? Okay, keep sleeping.

17:31.440 --> 17:39.360
This was eventually synchronizing everyone and taking my CPU way too low. And another rule:

17:39.360 --> 17:46.160
put in some jitter. It's a classic. It's easy. And it works amazingly well. I benchmarked

17:46.160 --> 17:56.720
this: no synchronization anymore. In my benchmarks, I settled on 7.5% randomization. So if CUBIC

17:56.720 --> 18:03.360
says sleep for 100 milliseconds, then one thread is going to sleep for 93, another one for 107. And randomly,

18:03.360 --> 18:12.400
I have, as I say, millions of threads, so the distribution gets very good. And another thing,

18:12.400 --> 18:20.640
a third rule, is how I am calculating the utilization. This is an important one. That was one

18:20.640 --> 18:24.160
that I actually had in my studies and completely forgot. I was not good at statistics. Only when I

18:24.160 --> 18:32.480
grew up did I realize how important what I learned in school was. If my thread calculates utilization

18:32.480 --> 18:40.800
at the moment when it needs to decide congested-or-not, that's not sampling. I may ask that question

18:40.800 --> 18:45.520
at that precise nanosecond when the CPU got a little bit free, or a little bit more busy.

18:46.480 --> 18:53.360
Whereas all the tools like top and so on, all of them are doing sampling. They're

18:53.360 --> 18:59.280
constantly asking the question, and then smoothing the graph. That's something

18:59.280 --> 19:03.040
that I was not doing. I was just asking the question on the spot. What I should have been doing

19:03.040 --> 19:10.080
is smoothing the graph. So I have a different thread, just another Erlang process, I have

19:10.080 --> 19:18.480
so many, that on a timer calculates the utilization

19:18.480 --> 19:24.880
and runs an exponential moving average. That is: the latest value weighs 30 percent, everything else

19:24.880 --> 19:31.920
70 percent. So every time I calculate a new value, I remember what the last one was, and smooth

19:32.000 --> 19:39.280
it in and out. This 30 percent is another parameter. Some people use 25. Some people use 40.

19:39.280 --> 19:44.240
In my case, I ran a lot of trials. 30 did the job very well. Just one line of code.
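The exponential moving average really is one line: weigh the latest sample at 30% and the accumulated history at 70%, with the 30% being the talk's tuned parameter:

```python
ALPHA = 0.3  # weight of the newest sample; the talk tried 25..40, settled on 30%

def ewma(previous: float, sample: float, alpha: float = ALPHA) -> float:
    """Smooth a noisy utilization signal: the promised one line of code."""
    return alpha * sample + (1 - alpha) * previous

# One instantaneous 100% spike barely moves the smoothed value...
u = ewma(0.30, 1.00)
print(round(u, 2))  # 0.51
# ...but sustained load converges toward the real level.
for _ in range(10):
    u = ewma(u, 1.00)
print(round(u, 2))  # close to 1.0
```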

19:45.520 --> 19:50.560
An important theorem, because I like them very much: Nyquist-Shannon says that you have to sample

19:50.560 --> 19:55.120
twice as often as the signal you're measuring. A funny thing: you remember when you watch a movie

19:55.120 --> 19:58.560
and a wheel is moving. And there is a moment when you have the impression that the wheel is moving

19:59.440 --> 20:04.400
backwards, when the wheel moves very fast? That is because the wheel is moving faster than the frame rate

20:04.400 --> 20:10.400
of the camera. Ideally, if the camera wants to sample it properly, it needs to sample twice as fast as

20:10.400 --> 20:17.040
the wheel. That's why that effect happens. So at the end, to recap, it works more or less like this.

20:17.040 --> 20:23.440
Scheduler utilization gets smoothed. It's the input for both CoDel and CUBIC, which create feedback

20:23.440 --> 20:29.920
that I put back into the scheduler utilization. Results: the node never dies. Most important thing.

20:31.200 --> 20:36.000
Under any load, under any attack. We actually got a DDoS attack last Thursday. I was so excited about it.

20:39.760 --> 20:45.840
Thank you. I was finishing work, we were preparing something, and suddenly I'm like,

20:45.840 --> 20:49.120
wait, a million notifications, what's going on? And it was a DDoS attack, and I'm like,

20:49.120 --> 20:57.680
yes, just before FOSDEM. Awesome. So: the node never dies. Well, before

20:57.680 --> 21:02.320
I did all this work, on my hardware this was the queries per second that I could return, and then

21:02.320 --> 21:08.000
some improvement under normal load. But once you went massively into overload, it used to die.

21:08.000 --> 21:12.240
This means downtime. This means somebody waking up in the middle of the night. That's not me anymore.

21:13.200 --> 21:20.080
This means customer trust: queries are not resolving. This is the worst case scenario.

21:20.080 --> 21:26.160
This is nasty. Downtime also means five minutes, ten minutes, an hour. Now it means

21:27.440 --> 21:33.040
a degraded service, but some people still get their requests served. And nobody needs to run from the toilet

21:33.040 --> 21:38.480
because, oh my god, I have a request, I need to fix this. Fine, you can go in three minutes, it's fine.

21:39.040 --> 21:42.880
It also means it's self-healing. Probably by the time you get from the alarm to the computer,

21:43.440 --> 21:47.760
everything is fine, which is what happened on Thursday. I arrived at the computer, everybody's like,

21:47.760 --> 21:57.120
what's happening? And it's like, nothing, anymore. Cool. To recap: CUBIC for rate limiting, CoDel for

21:57.120 --> 22:01.760
queue management. To be honest, CoDel was my favorite discovery; implementing this was a lot of fun.

22:02.800 --> 22:07.920
Smooth the signal with an exponential moving average, and remember to sample frequently enough.

22:08.480 --> 22:14.320
And it's so simple, well-known stuff. Try it yourself; it's open source. This slide has the two

22:14.320 --> 22:19.920
pull requests that implemented most of these important changes. Read the code, learn something from

22:19.920 --> 22:26.240
it. People say that Erlang's syntax is ugly. I don't mind; it's quite readable and it's functional.

22:28.080 --> 22:29.680
And that's it for me. Thank you very much.

22:38.480 --> 22:40.720
Thank you, Nelson. We have a couple of minutes. Any questions?

22:51.440 --> 22:58.080
Very, very good question, thank you. Yes, of course. The question is if I account for TCP or only for

22:58.080 --> 23:04.000
UDP. 99% of this work is in UDP, because it's the workload we get: 99% of the traffic is on UDP.

23:04.960 --> 23:13.120
For TCP, yes, also, but it's not that smart yet. That's for the future because, you know,

23:13.120 --> 23:18.960
UDP was urgent. It's in my backlog; probably on Monday I'm going to start doing similar things

23:18.960 --> 23:27.680
for TCP. We actually got the last attack, and the alert was for TCP. It's also a question

23:27.680 --> 23:39.280
of the server, because you can do load balancing. You got me there, but yeah. I'm sure that some evil

23:40.240 --> 23:43.680
actor is going to watch this video and say, oh, that's how I'm going to attack Nelson now.

23:45.600 --> 23:47.760
Yeah, on Monday, I will fix it.

23:51.120 --> 23:52.080
Another question, maybe?

23:52.080 --> 23:57.760
Yes, sure. Do you think that this kind of stuff could be implemented for

23:57.760 --> 24:05.040
other types of services, like routing? Yes. Absolutely. So, the question is if this kind of

24:05.040 --> 24:10.640
queue management mechanisms can be implemented for other kinds of services, like routing: yes, totally. Actually,

24:10.640 --> 24:17.680
they already are. The Linux kernel uses CoDel in its network stack the same way; it is the default.

24:18.160 --> 24:21.760
And, as I said at the beginning, there is an alternative for TCP-only

24:23.440 --> 24:28.960
traffic, a slightly better algorithm that you can enable on your production server,

24:28.960 --> 24:33.040
but the default is CoDel already. It's actually fq_codel, fair-queuing CoDel, which is what I already do in

24:33.040 --> 24:39.200
Erlang, because I have millions of processes; but Linux just opens a bunch of queues and

24:39.200 --> 24:44.080
puts the traffic fairly between the queues, and each queue is running its own copy of CoDel.
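On a Linux box you can check and configure this yourself. The interface name below is a placeholder assumption; the 5 ms target and 100 ms interval are the kernel defaults mentioned earlier:

```shell
# Show the current default queueing discipline (fq_codel on modern kernels)
sysctl net.core.default_qdisc

# Attach fq_codel explicitly to an interface (eth0 is a placeholder)
tc qdisc replace dev eth0 root fq_codel target 5ms interval 100ms

# Inspect the qdisc and its drop statistics
tc -s qdisc show dev eth0
```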

24:44.720 --> 24:53.520
Just the same thing I'm doing with millions of threads. And being Linux, most routers should be doing the same.

24:53.520 --> 25:00.080
OpenWrt does it; it's also the default there. There was a big movement in the internet working groups

25:00.080 --> 25:06.000
something like 6 or 7 years ago to push all router manufacturers to make this a default,

25:06.000 --> 25:10.320
because the default used to be big buffers with tail drop, as I explained at the beginning, but

25:10.320 --> 25:16.080
especially at really big capacity. So yes, it could even fit your

25:16.080 --> 25:20.560
HTTP web server. Many of these lessons are not DNS-specific. If you are serving requests

25:20.560 --> 25:25.280
over HTTP, you also don't have infinite resources. And maybe the timeout is

25:25.280 --> 25:30.080
not a few milliseconds, maybe there are requests that are fine to be served ten minutes later,

25:30.080 --> 25:34.640
but you still don't want to queue all of them, because you're going to get DDoSed. So you need to manage

25:34.640 --> 25:45.200
the queue. Do we still have time for one more question? All right. Perfect. Thank you very much.

