WEBVTT

00:00.000 --> 00:29.960
So, thanks for inviting me and for organizing

00:29.960 --> 00:36.600
track on our favorite topic.

00:36.600 --> 00:43.040
I don't think I have ever spoken at a conference where the previous talk among other

00:43.040 --> 00:47.760
things, was about a project that won the Nobel Peace Prize.

00:47.760 --> 00:51.160
No, no, the Nobel Prize in physics, at CERN.

00:51.160 --> 00:55.600
So, it's like a high bar to follow, but the other talk has been good.

00:55.600 --> 00:58.040
I will try to do my best.

00:58.040 --> 01:06.040
I have to talk about what I have been calling continuous performance engineering, something

01:06.040 --> 01:13.640
I have been doing for the past 10 years, in various database companies and now with my own

01:13.640 --> 01:17.720
company trying to productize and evangelize it.

01:17.720 --> 01:23.560
Doing this presentation, I realized that everybody else is talking about continuous benchmarking.

01:23.560 --> 01:28.440
So this is where we are, we haven't even agreed on what we should call this thing that

01:28.440 --> 01:34.440
we know we need to be doing, but I will after this talk maybe adopt this as well because

01:34.440 --> 01:38.000
I realized that majority is gravitating there.

01:38.000 --> 01:43.320
There was a poll earlier, but people have changed a little bit, so this is of course

01:43.320 --> 01:49.560
FOSDEM, and out of FOSDEM, this is the few hundred people who actually care about performance,

01:49.640 --> 01:54.520
but is there somebody in the room where, not just you, but in the whole project, there aren't

01:54.520 --> 01:58.040
really any benchmarks and nobody is doing benchmarking.

01:58.040 --> 02:10.040
Yes, and for some projects, this is actually valid, but then maybe the second option, yes,

02:10.040 --> 02:18.080
we do benchmarks, but not running continuously every day, but once a year, with the release

02:18.160 --> 02:25.040
candidate. Who would admit to that? 10 or so people, maybe.

02:25.040 --> 02:32.480
So I think things are not as bad, we are making progress, so I think the number 3 is probably

02:32.480 --> 02:39.760
going to be popular, so we do have benchmarks running every day in continuous integration,

02:39.760 --> 02:44.080
but we don't actually look at the results or even if you try to look at the results

02:44.160 --> 02:49.600
there, they're not very useful. Who would identify with this one?

02:51.600 --> 03:02.560
Seriously? Okay, so number 4: we have benchmarks in continuous integration, we run them

03:02.560 --> 03:08.400
every day, at least, or maybe every commit in the best case, and we get actionable results,

03:08.480 --> 03:14.480
we assign tickets to the engineer who caused the regression, and they are fixed within weeks,

03:14.480 --> 03:16.720
rather than in December before Christmas.

03:19.680 --> 03:21.920
Now, okay, and most people are just listening, that's fine.

03:22.800 --> 03:28.800
So, fairly even distribution, this is still better, I did this some years ago, and there was like

03:28.800 --> 03:35.840
one person in number 4. But like I said, of course, this is maybe

03:36.800 --> 03:43.600
out of all of Europe, the engineers who care about performance might have been in this room, so

03:44.800 --> 03:50.880
not so bad, we are making progress, but still, you know, if you ask the same question

03:52.320 --> 03:59.120
like about unit testing or testing, like who, who has automated testing,

03:59.920 --> 04:06.560
and not even like flaky tests; we have good results, and then, if something fails,

04:06.560 --> 04:10.080
you know, we probably catch it in the pull request, don't even merge it to main,

04:10.640 --> 04:17.600
this is basically everybody doing this for 10, 15 years at least.

04:18.160 --> 04:27.120
I had in 2006 a project where we had Nokia, so one guy actually had the phone and with his own

04:27.200 --> 04:33.280
fingers, he had a hundred tests that he would like physically press on the different buttons

04:33.280 --> 04:40.320
and user interface. He was finished like at 15, 16, in the afternoon, went home and in the next

04:40.320 --> 04:45.840
morning he took another phone and did the same program. This of course never happens anymore,

04:46.560 --> 04:56.880
but with performance, we are a little bit behind, I would say, it's not like a given that

04:58.080 --> 05:04.560
all the projects have some performance tests and we know which tools to use and we like just,

05:04.560 --> 05:09.280
you know, put some YAML lines into GitHub and it just works.

05:11.280 --> 05:23.120
So we are, I would say, 10, 20 years behind, so we are like the Velociraptor, which is the fastest

05:23.200 --> 05:33.600
dinosaur, so it's fast, but it's not there. And where is that Augustan Kamala right here?

05:34.640 --> 05:47.920
Yes. So I had to steal your slide because it actually is the problem. This is exactly the problem.

05:48.560 --> 05:54.160
I think there are some other problems. One is that performance and benchmarking is kind of

05:54.160 --> 06:00.400
difficult. Deployment, automated deployment, I think was also difficult until we had

06:00.400 --> 06:05.840
tools like Terraform or something which, you know, just take care of the hard

06:05.840 --> 06:13.680
problem. But the thing with the automated deployment is that when we do software, we do need the

06:13.680 --> 06:19.520
deployment. Otherwise there is no point in what we are doing. You can kind of postpone

06:19.520 --> 06:25.920
the performance stuff until customers really complain. So this might be one reason:

06:27.200 --> 06:32.960
that you really had to solve deployment first, and also for benchmarking, you need the deployment.

06:32.960 --> 06:37.200
So there is a dependence, you need to solve the deployment problem first anyway.

06:37.680 --> 06:47.680
Now, I've had a startup for two years, where I'm trying to sell performance and benchmarking related

06:49.120 --> 06:56.160
services to companies, one thing I find that the performance engineers are often very busy.

06:56.800 --> 07:04.720
Might take like weeks or months to talk with them because they are at some important customer fixing

07:04.800 --> 07:11.840
performance issues in production. And because we are fixing performance issues in production,

07:11.840 --> 07:19.200
we don't have the capacity to kind of evolve the tooling so that you could fix them before

07:19.200 --> 07:23.840
they get into production. The other thing is, and I used to do this as well, sometimes it's kind

07:23.840 --> 07:29.200
of nice to be the hero that you get to work with the most important customers and you go there

07:29.200 --> 07:34.080
and you know spend some days doing something and then it's fixed and everybody's really grateful

07:34.800 --> 07:40.720
the account manager takes you to a nice restaurant and then you fly to the next place and do the

07:40.720 --> 07:45.360
same thing again. So maybe we don't want to fix it. I don't know. Certainly,

07:47.040 --> 07:53.600
and this was genuinely my slide before I saw the Datadog talk, but part of the problem

07:53.600 --> 08:03.360
is that math is kind of difficult but unfortunately we are going to need it. So that is probably

08:03.840 --> 08:11.360
one problem as well. And later when we talk about the tuning which also has been covered a

08:11.360 --> 08:17.840
little bit here in previous talks, I find that most performance engineers actually find it unintuitive

08:18.720 --> 08:28.080
how to tune a server to get repeatable results. So what I'm trying to say is that maybe we

08:28.080 --> 08:36.400
need to do what was done for QA, testing and deployment. Some years ago there was the DevOps movement.

08:36.400 --> 08:42.000
Actually, the first time I heard about DevOps was in Boston, probably like 15 years ago or so.

08:43.200 --> 08:50.560
And if there were to be a continuous benchmarking movement, these are some

08:51.520 --> 08:57.280
people and conferences and open source projects and companies that you might

08:58.000 --> 09:04.640
want to follow. Scott Moore is very active on YouTube and LinkedIn. So he's a good

09:06.640 --> 09:15.520
good place to start. And if it wasn't obvious, Nyrkiö here is the company that

09:16.320 --> 09:23.440
I have been pushing for two years, and the goal is to mainstream some

09:23.440 --> 09:32.000
open source tools that in the past 10 years have been developed and open sourced and so on.

09:32.000 --> 09:40.720
So I'm here to tell you what I've learned and made available in the past 10 years and I hope that

09:40.800 --> 09:51.680
you will also tell others about these solutions that continuous benchmarking actually is possible

09:51.680 --> 09:59.280
and achievable and getting the repeatable, stable results is something we can do if we share

09:59.280 --> 10:06.400
the knowledge of how to do it. And from here on, the talk has three points.

10:07.360 --> 10:12.880
If you ever watch the Finnish President on TV, there is not one interview where the answer

10:12.880 --> 10:20.880
wouldn't be structured into three points. So I believe this is a good way to go. Now for the first

10:20.880 --> 10:28.720
one: we had kind of synchronized beforehand that benchmark design in itself is already hard,

10:29.440 --> 10:33.920
let alone doing it repeatably or continuously.

10:34.880 --> 10:41.680
So I'm not going to repeat a lot of what was already in the previous talk, and part of this is kind

10:41.680 --> 10:48.480
of well-established already. So the nice thing is we have these frameworks typically each

10:48.480 --> 10:54.800
language has like one or two. So if you are writing in Java you use JMH, and so on.
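(For Python, for example, pytest-benchmark is a common choice. As a dependency-free sketch of the same idea, using only the standard library, the workload function below is purely illustrative:)

```python
import timeit

def workload():
    # Toy function under test; replace with the code you care about.
    return "".join(str(i) for i in range(1000))

# Repeat the whole measurement several times and keep the minimum:
# the minimum is least affected by scheduler and neighbor noise,
# which is the same idea the benchmark frameworks automate for you.
runs = timeit.repeat(workload, repeat=5, number=200)
per_call = min(runs) / 200  # seconds per call
print(f"{per_call * 1e6:.1f} microseconds/call")
```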

10:56.320 --> 11:02.240
So that's kind of well understood and then of course if you use these frameworks you can

11:04.000 --> 11:13.360
relatively easily put them into GitHub workflow or Jenkins and so on. So here we have a good starting

11:13.360 --> 11:24.320
point. I have also in my career developed frameworks to run like distributed benchmarks

11:24.320 --> 11:31.360
so like deploy an entire MongoDB cluster and then deploy some application that creates some

11:31.360 --> 11:36.720
work load and then run the benchmark and then of course because it's in the cloud all these

11:36.720 --> 11:41.920
servers then spin down, so you need to collect all the log files and results and so on. And it's quite

11:41.920 --> 11:49.600
complicated to do that. I believe this is an area where solutions are not very established yet but

11:49.600 --> 11:59.200
I see a lot of opportunity for innovation here. So I'm not going to talk more about that but it's

11:59.360 --> 12:06.240
just to highlight an area where I think this isn't very established yet. But even if you start

12:06.240 --> 12:16.000
with single node benchmarks with the well-established tools that's a good starting point I want to

12:16.000 --> 12:24.880
mention one open source project that is like relatively popular and active and I have also like

12:24.880 --> 12:31.120
forked and made a new version of this one, but this has existed for I think six years

12:31.120 --> 12:40.160
at least and has several contributors and so what this project does is you run your benchmark with

12:40.160 --> 12:48.240
the frameworks from the previous slide, and this github-action-benchmark has parsers that understand

12:48.320 --> 12:55.120
the output of all of them and you can of course contribute more. I saw recently some activity

12:55.840 --> 13:03.440
I have also contributed a parser for when you just want to use the time command line utility

13:03.440 --> 13:11.040
to execute something from the command line, which of course isn't a very high

13:11.040 --> 13:17.680
fidelity result but it can be like a simple way to do at least something. So hopefully that will

13:17.840 --> 13:24.960
get merged upstream and what this then does is it collects your results and there's some

13:24.960 --> 13:32.240
interesting way to basically maintain a database or like a JSON file in your GitHub repository

13:34.080 --> 13:44.800
which I wouldn't do because your analytics gets kind of embedded in the test history itself

13:44.880 --> 13:50.160
but it's kind of a simple way it doesn't cost anything when it's in your GitHub repository
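(As a sketch, a workflow using it might look like the one below. The cargo step is purely an illustration, and you should check the project's documentation for the exact parser name and inputs for your framework:)

```yaml
name: Benchmark
on: [push]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Run your benchmark with one of the supported frameworks
      # and save its output (cargo bench here is just an example)
      - run: cargo bench | tee bench-output.txt
      - uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: cargo                       # which parser to use
          output-file-path: bench-output.txt
          github-token: ${{ secrets.GITHUB_TOKEN }}
          auto-push: true                   # keep history in the repository
          alert-threshold: '150%'           # the default only alerts at a 2x regression
          comment-on-alert: true
          fail-on-alert: true
```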

13:51.760 --> 13:56.960
and then it has threshold based alerts, with a default threshold, and this highlights the

13:57.840 --> 14:06.000
problem I'm talking about in this talk: the default threshold in this tool is 100%, so if your performance

14:06.000 --> 14:15.520
degrades 2x, then it will create a ticket or, depending on configuration, fail your pull request

14:16.080 --> 14:24.640
and block it. But 2x is quite large, is the point, so we would want to maybe catch

14:25.280 --> 14:39.360
regressions much smaller than that. And this leads to topic number 2: how do we do change point

14:39.360 --> 14:45.040
detection, and what is change point detection. So: in topic number 1 we have designed benchmarks;

14:46.560 --> 14:52.560
we try to design them well so that we get like useful results but they are still noisy

14:52.800 --> 15:07.280
This is from an old publication we did at MongoDB. Excuse me, I think I need to take my medicine,

15:07.280 --> 15:14.880
should have taken it before. So, this was a MongoDB test; I think at least the first one is

15:14.960 --> 15:25.600
one of the worst case scenarios that we had in 2015 and in this picture there is one regression

15:25.600 --> 15:34.720
on one of the graphs, but everything else is noise. So how do you work with this? Even

15:34.720 --> 15:40.560
looking at it as a human, it's not easy to spot where the regression is, let alone having

15:40.640 --> 15:47.200
something automated, where you have no human looking at it, and it should find the regression.

15:48.480 --> 15:54.960
But this is normal. I don't know, in the Datadog case, for example, it was

15:55.360 --> 16:02.080
20, 30% up and down. And you, for example, had variance in your slide.

16:03.040 --> 16:07.440
we decided in MongoDB to look at the range from minimum to maximum

16:07.920 --> 16:16.400
result which is stricter but we felt it was kind of like meaningful that we want to have

16:16.400 --> 16:28.400
all test runs behaving well. And so then you sometimes get up to 70% outliers, but

16:28.400 --> 16:36.960
but there is an actual change that is persistent so this was caused by something in the actual

16:36.960 --> 16:49.440
git commit at that point right so this means this means when I was on the team there were

16:49.440 --> 17:00.480
four of us, so in a month each of us had one week where we were assigned to triage the results

17:00.560 --> 17:09.840
coming out of the CI system, and you can imagine that mostly they were false alarms

17:11.280 --> 17:18.240
when the input data is like this. This is something more recent, from the Nyrkiö

17:20.240 --> 17:26.800
side, and some of our users choose to publish their results; especially for open source projects it's

17:26.800 --> 17:33.920
convenient because the contributors don't need to create an account and and here also this is

17:33.920 --> 17:44.080
the Turso database, a Finnish-American project, very popular; they get like 20 pull requests per day,

17:44.080 --> 17:51.920
many of them from external contributors. And two simple tests, SELECT 1 and SELECT COUNT(*):

17:51.920 --> 17:57.840
this is with the GitHub default runner, so once in a while you get these kinds of hiccups,

17:57.840 --> 18:05.840
up and down 40 to 50%. Here you can of course see, with the human eye,

18:06.400 --> 18:15.120
where the kind of normal result is; that's the one without these effects

18:15.120 --> 18:23.200
from the infrastructure. So now, what if we do threshold based alerting on this kind of data, as in

18:23.200 --> 18:34.720
MongoDB or Turso? It's the first idea that comes to mind. Here is a generated data set,

18:36.320 --> 18:44.320
and on these two graphs the green line is an actual change, which could be a regression,

18:44.800 --> 18:51.360
and everything else is false alarms and in 2015 this was my job so one week every month

18:52.720 --> 18:57.840
I would come to work on Monday and I would have to look through they were like issues created out of

18:57.840 --> 19:04.480
all of those and you know already before you look at them that maybe today 100% of them will be false

19:04.480 --> 19:10.960
positives or like once a week there would be like an actual regression that you then could

19:11.280 --> 19:22.000
file and send to the developer to fix and we were lucky so the first thing we tried

19:23.120 --> 19:29.520
something other than threshold based alerting, was this kind of algorithm from Matteson

19:29.520 --> 19:36.320
and James, which had just been published a year before. And you need to understand some

19:36.320 --> 19:42.000
math. We were also lucky that we had an intern who actually studied math, and went back for

19:42.000 --> 19:50.080
a master's and PhD later, who then did the first version of what is today the Otava

19:50.080 --> 20:00.640
incubating project in Apache. And this is the same graph, the same

20:00.800 --> 20:08.320
time series now in both of these graphs: the top one is the change point detection algorithm, and the bottom one

20:08.320 --> 20:17.040
is just comparing some percentage against the previous point to raise an alert. So

20:19.360 --> 20:27.920
the key to how you get this kind of better performance is that it looks at more

20:27.920 --> 20:37.360
data, of course, than just the previous point or some small window. And the key point here

20:37.360 --> 20:46.080
is that many monitoring tools that have these kinds of functions have been designed for monitoring

20:46.080 --> 20:53.280
production, and that is a different problem than finding regressions in a pull

20:53.280 --> 21:03.040
request before you commit. Why? So, in production, the two phenomena you have are outliers, which

21:03.040 --> 21:12.000
can be like a spike or single event and then change points which means there is a persistent

21:12.000 --> 21:19.760
change like in the middle here in this demo data set and and we are now only interested in

21:20.640 --> 21:29.600
change point detection. So, if there is a regression in the actual code, it

21:29.600 --> 21:35.520
means like when we rerun the benchmark tomorrow the regression will still be there and then everything

21:35.520 --> 21:42.640
else that's from like network flakiness or noisy neighbors or so on we want to ignore but a lot

21:42.640 --> 21:50.960
of the existing tooling is actually designed to also alert you on outliers, and

21:50.960 --> 21:59.360
we consider that a distraction for us. Okay, so that is change point detection.
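(Apache Otava implements the e-divisive approach of Matteson and James. As a toy illustration of the underlying idea, not Otava's actual API, a difference of means weighted by segment lengths makes a persistent shift stand out over a one-off outlier of the same size:)

```python
import math

def split_score(xs, k):
    """How strongly the series looks like it changed just before index k."""
    left, right = xs[:k], xs[k:]
    mean_l = sum(left) / len(left)
    mean_r = sum(right) / len(right)
    # The sqrt weight grows when both segments are long, so a persistent
    # shift outscores a single outlier of the same magnitude.
    weight = math.sqrt(len(left) * len(right) / len(xs))
    return weight * abs(mean_l - mean_r)

def best_change_point(xs):
    """Index and score of the most likely change point."""
    return max(((k, split_score(xs, k)) for k in range(1, len(xs))),
               key=lambda t: t[1])

shift = [10.0] * 20 + [13.0] * 20    # persistent regression at index 20
outlier = [10.0] * 39 + [13.0]       # one-off hiccup of the same size

k_shift, s_shift = best_change_point(shift)
k_out, s_out = best_change_point(outlier)
print(k_shift, round(s_shift, 2))    # the shift is found exactly at index 20
print(round(s_out, 2))               # the outlier scores much lower
```

A real implementation adds a statistical significance test and applies the split recursively; this sketch only shows why scoring whole segments beats comparing adjacent points.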

22:00.800 --> 22:08.800
You can check it out at Apache. But this was transformative for me and the team,

22:09.760 --> 22:19.520
and since we published it, it has been used by many other companies. And that was

22:19.520 --> 22:27.040
the first time that we started believing that this continuous benchmarking is possible in

22:27.040 --> 22:33.040
the sense that not just can we execute the benchmarks, but also we can automatically

22:33.120 --> 22:45.200
analyze results in an actionable way and also actually back then we did another project where

22:47.760 --> 22:51.680
we looked at the other problem, which is: why is there so much noise in the benchmark

22:51.680 --> 23:00.320
results in the first place? And now there's a little bit of overlap with

23:00.400 --> 23:08.800
the previous talks, but let's say, if you were here for the Datadog talk, this is the "wrong answers

23:08.800 --> 23:17.520
only" slide, so we already know the answer. But if I ask some people who weren't

23:17.520 --> 23:24.480
sitting in this room today then the typical answers are well you shouldn't use cloud infrastructure

23:24.560 --> 23:32.640
for benchmarking probably we had noisy neighbors or there is some other kind of bad cloud

23:32.640 --> 23:42.640
instances. Actually, the previous talk covered this, and it's true that sometimes you observe

23:42.640 --> 23:50.960
this correctly and there's reason why this kind of change in performance happens but it's not

23:51.920 --> 23:59.520
across the population; it's not well understood what the reasons are why we can't

23:59.520 --> 24:10.800
get easy to analyze, reliable performance results. And again we had to go back to our core principles:

24:11.520 --> 24:17.280
let's apply the scientific method, not just math, but also like, okay, let's stop,

24:18.240 --> 24:26.400
you know, quoting things that you read on somebody's blog, and let's get to work on an

24:26.400 --> 24:34.400
experiment and see what the results are, and then you repeat this. We actually had two engineers

24:34.400 --> 24:40.800
spend three months where the objective was to find out what would be a good

24:40.800 --> 24:50.720
configuration, in Amazon in our case, to minimize this range from minimum to maximum result

24:53.120 --> 25:02.160
of benchmarks where we knew that there was no regression; in fact the benchmark was the same

25:02.160 --> 25:09.280
all the time, run 25 times. And here is a fun exercise you can do. I'm not going to read all of

25:09.280 --> 25:15.200
them, we will come back to it. So we kind of retroactively listed the assumptions, because this is how

25:15.200 --> 25:22.640
science works: you have a hypothesis, then you test, and the test either confirms or

25:22.640 --> 25:28.000
invalidates the assumption and then you go to the next one and these were assumptions that were built

25:28.000 --> 25:36.960
into our benchmarking infrastructure, but they had never been validated; they were just accepted as

25:37.920 --> 25:49.600
fact. So for example, if you read the third one at the end, there was this kind

25:49.600 --> 25:57.760
of belief that when you launch cloud instances you can kind of sometimes get bad instances

25:58.480 --> 26:05.440
for some reason like maybe noisy neighbor or maybe they're just bad without neighbors and

26:05.520 --> 26:10.800
because of this we had a system where we run some kind of test benchmark first and if the result

26:10.800 --> 26:17.040
wasn't above some threshold, we would shut down everything and try again, three times, and the

26:17.040 --> 26:26.960
third time we just kind of gave up and ran the benchmark anyway. And yeah, so now, what we did was:

26:26.960 --> 26:32.160
instead we took like three months that we didn't really test MongoDB that much and we instead

26:32.240 --> 26:40.640
we tested our own infrastructure: can we get better results out of it? So we stopped on

26:40.640 --> 26:48.800
a specific git commit that we knew was good, so it was always the same binary,

26:50.080 --> 26:55.680
because you don't want to have too many moving parts, right? So you only want to

26:55.680 --> 27:01.920
change one thing in the test and keep everything else constant and then we launched five

27:01.920 --> 27:07.280
different servers and repeated each test five times and here it's like from an old publication

27:07.280 --> 27:14.960
again; I copied what we did. And this is what it looks like. So now, immediately when you see

27:14.960 --> 27:23.840
this result, what can we see from it that validates or invalidates the assumptions on the previous slide?

27:23.920 --> 27:35.120
So the first five dots on each line are from the same

27:35.120 --> 27:40.480
server, and then five to ten is the next server, and so on. And this does not show that suddenly there

27:40.480 --> 27:46.960
would be a server where the five dots are significantly different. You see, maybe in the first test

27:47.040 --> 27:52.400
there's a bit of a cold cache effect, and in some tests a few of the five runs are

27:53.120 --> 27:59.760
faster, but generally there's a lot of variability here. But the variability does not

28:00.960 --> 28:12.160
correlate with the boundaries at 5, 10, 15 where we changed servers. So this was false. And this was also

28:12.160 --> 28:19.920
at the time when we paid for Amazon instances per hour. So imagine we would start a 16 node

28:19.920 --> 28:25.280
sharded cluster; in the worst case it would run this test and say: oh no, this is one of

28:25.280 --> 28:31.520
the bad servers, let's shut down everything (we had already paid for an hour) and start 16 new ones.

28:32.960 --> 28:38.080
this was all based on the false assumption but this does happen the reason this didn't happen for us

28:38.080 --> 28:46.080
is we used c3 family instances, which are kind of a bit more expensive and better quality.

28:47.120 --> 28:51.440
If you use the m instances, it's true that you can get different CPU generations

28:53.440 --> 29:00.800
even if you are using the same instance type from Amazon; they might, I think,

29:01.040 --> 29:06.720
kind of give you a better CPU than you are paying for, or something. But

29:08.560 --> 29:14.320
with the c family, still today, this does not happen. So this is also a good lesson: now when you

29:14.320 --> 29:19.600
go home, you could make the same mistakes. And I listened to Hendrik at FOSDEM, and he said that

29:19.600 --> 29:25.040
this does not happen, but actually it can happen if you don't have the exact same instance type

29:25.040 --> 29:33.280
that I had. So you need to do the testing, with a kind of rigorous scientific attitude. So this was

29:33.280 --> 29:42.480
covered so once we iterated over this and kind of changed various configurations we found that

29:43.520 --> 29:51.120
for example, that the CPUs, the infrastructure that we use, aren't even trying to produce

29:51.200 --> 29:59.360
repeatable results in fact the CPUs have a lot of features that explicitly change performance

29:59.360 --> 30:09.280
all the time for environmental reasons for example to save energy so so there is now what

30:09.280 --> 30:16.720
is called cpupower, the command line utility, which you can then use to turn off

30:17.040 --> 30:23.520
most of these features that your CPUs have. And you can use this in the cloud; some

30:24.320 --> 30:34.160
things you cannot do on virtualized infrastructure, but actually it turns out that mostly

30:34.160 --> 30:39.440
the cloud is not the problem; it's mostly not a problem to do benchmarking in the cloud. All of these things,

30:39.440 --> 30:45.040
if you had your own server in your own laboratory this would still be a problem and you would

30:45.360 --> 30:52.160
have to turn this off by configuration. The other big source: we had assumed

30:53.440 --> 30:59.040
that SSDs of course are fast, so that has to be good for benchmarking; we had assumed that

31:00.320 --> 31:07.520
if you use the so-called local SSDs in Amazon, that's probably good. But that was the biggest source of

31:08.080 --> 31:15.440
variation. And the explanation is: if you read the Amazon documentation, nowhere does it say

31:15.440 --> 31:22.240
that these are local to the server instance, so they go somewhere, top of rack or wherever, and

31:22.240 --> 31:28.480
this is where you have the noisy neighbor problem. So actually the CPU virtualization, sorry,

31:28.560 --> 31:41.840
CPU virtualization is quite good quality, and in my experience CPU doesn't have a lot of noisy

31:41.840 --> 31:49.280
neighbor issues. The SSD is the reason we have this noise, and it could, even on the same instance, during

31:49.280 --> 31:56.320
the same hour, suddenly change a lot because of the SSD. And if you use EBS, at the time

31:56.400 --> 32:04.240
we used provisioned IOPS, this is very stable performance, because Amazon is stingy, so

32:04.240 --> 32:11.200
if you pay for like 5000 IOPS, you get exactly that, not more. And this is what it

32:11.200 --> 32:17.600
looked like when we then deployed the improvements in production: we were able to get more stable

32:18.320 --> 32:25.440
lines. In fact, in the MongoDB case (there is a blog post still public, and there is a link at

32:25.520 --> 32:34.400
the end of this presentation), we got all the tests within a 5% range, which is a significant improvement

32:34.400 --> 32:45.040
compared to something like 40%. One thing: as part of this project we also did tests

32:45.040 --> 32:51.440
where we omitted MongoDB completely and just tested CPU and disk and network performance, and

32:51.440 --> 32:57.200
then we thought: hey, why don't we put this in the CI as well, so every day we kind of

32:58.080 --> 33:06.800
verify that the infrastructure is still behaving the same as before. And generally the answer is yes,

33:07.920 --> 33:13.040
but one day we came to work and noticed this. It was January

33:13.440 --> 33:31.440
4th, in 2018. What happened? All cloud vendors released firmware fixes for Spectre,

33:31.840 --> 33:42.000
or what was the other one, I think this was Meltdown, so this was a security fix, and all our performance tests were red,

33:43.280 --> 33:48.960
and we thought, okay, so who screwed up? But we had the canaries running to say: oh no, actually

33:48.960 --> 33:56.080
this is now a problem in the infrastructure. But by that time we had already been going for a year or so,

33:57.600 --> 34:03.680
so this was an exceptional event, where the regression was explained by the cloud

34:03.680 --> 34:12.240
environment. Okay, so I've been working on this again in the last half year, 10 years later. What has

34:12.320 --> 34:25.440
changed? Oh, so, yeah, okay. So on nyrkio.com I've done the same kind of tuning. So you can have a

34:25.440 --> 34:33.120
GitHub runner and there are there are many companies that offer these third party runners and often

34:33.120 --> 34:41.120
they are like faster than the default runner, or cheaper, or often both. And Nyrkiö is the only

34:41.760 --> 34:47.680
one who is offering runners that are not faster and not cheaper but they are better quality

34:47.680 --> 34:55.600
because they are configured for repeatable results and okay I'm apparently out of time but

34:57.040 --> 35:02.320
but using this is quite simple: you install the app, and it's like one line in your workflow where you change the

35:02.640 --> 35:12.160
runner. And I just want to say, I was personally not expecting this, but in some of the tests, this

35:12.160 --> 35:18.720
is the same test that was in the beginning of the presentation, SELECT COUNT(*): when you switch to this

35:18.720 --> 35:28.560
configuration in the Turso project, you actually get, for a time period when there is no regression

35:28.640 --> 35:34.720
in the code itself, results that stay within one nanosecond. And I was quite impressed by this. So you can

35:34.720 --> 35:41.760
use cloud for continuous benchmarking and if you have your own hardware you will have the same

35:41.760 --> 35:49.760
problems, and this same configuration is needed. So I think this is a good slide to end on.
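(The kind of tuning described in this talk can be sketched as a few Linux commands. Treat these as assumptions to verify on your own machines, not a definitive recipe: the flags and sysfs paths vary by kernel version, CPU vendor, and hypervisor.)

```shell
# Pin the frequency governor to "performance" so the clock does not
# ramp up and down to save energy (needs the cpupower utility installed)
sudo cpupower frequency-set -g performance

# Disable turbo boost on Intel so runs do not get a variable
# opportunistic clock (on AMD the knob is typically "boost" instead)
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Reduce address space layout randomization noise for microbenchmarks
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
```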

35:50.960 --> 35:57.600
Do we take any questions? Nope. I will hang around here for the rest of the day, so happy to talk

35:58.960 --> 36:00.960
to you.

36:06.960 --> 36:10.960
Sorry, small announcement: could you please help us and leave through that door.

