WEBVTT

00:00.000 --> 00:12.560
Hi everybody. I'm Jonas Termansen. Today I'm going to be talking about os-test. I've been

00:12.560 --> 00:19.200
measuring POSIX on every single operating system that I can get my hands on. So, I'm

00:19.200 --> 00:25.800
Jonas. Previously I worked at Google for eight years. I did release engineering, testing and

00:25.800 --> 00:30.800
supply chain security for the Dart programming language. These days I'm working full-time on

00:30.800 --> 00:35.800
the Sortix operating system and I've got grant funding to work on os-test, which I'm going to

00:35.800 --> 00:42.360
be talking about today. So, Sortix is my operating system. I just saw the most

00:42.360 --> 00:50.360
chaotic boot ever and I've implemented about 90% of POSIX in Sortix according to the

00:50.360 --> 00:56.280
data that I'm going to be sharing today. Sortix is a fully self-hosting and installable operating

00:56.280 --> 01:02.680
system. I'm aiming for server-side as the primary use case. So, it's actually running its own

01:02.680 --> 01:09.320
infrastructure on sortix.org. It's building itself every night and everything is just done natively,

01:09.320 --> 01:16.920
including this presentation. So, did you know that there's a new POSIX 2024 standard? So, they didn't

01:17.000 --> 01:22.360
release a new standard for eight years and finally, they pushed out a new standard with

01:22.360 --> 01:29.560
exciting new features like asprintf, endian.h and just other quality-of-life things. Unfortunately,

01:29.560 --> 01:36.200
their official change list is just really, really incomplete. So, I did a blog post about that a few

01:36.200 --> 01:42.840
years ago where I actually found out what's actually new in the standard. And the big problem

01:42.840 --> 01:48.600
is that there's no official test suite for POSIX. There is one that is proprietary, but you

01:48.600 --> 01:53.800
can't really just go and use it yourself. It's a bit out of date, but I'm hearing that it's actually

01:53.800 --> 01:58.280
being updated to the latest version now, which is really nice. But we still need something that

01:58.280 --> 02:04.680
can benefit all of the open source systems. So, the good news is that my friends at the NLnet Foundation,

02:04.680 --> 02:11.320
they were willing to fund this work. So, that's what I'm going to be presenting today. So, my plan

02:11.320 --> 02:21.080
is to have 100% coverage of POSIX, going from top to bottom. So, the first idea is to have a lot of generated

02:21.080 --> 02:26.920
tests that are derived from the actual standard. And then make sure there's a test for every single function,

02:26.920 --> 02:33.640
every single declaration. And then I move on to actually invoking every function in a handwritten

02:33.720 --> 02:40.600
manner with hello world inputs. And then, finally, I go on to write a lot of low-level

02:40.600 --> 02:47.640
detailed test suites for every single area, such that at every point in time we have ever-increasing

02:47.640 --> 02:54.760
detail, but there's always 100% coverage. If you have ever read the POSIX standard, you'll realize

02:54.760 --> 03:00.600
it's actually almost code. It says things like: the time.h header shall declare the tm structure,

03:00.600 --> 03:05.240
which shall include at least the following members. There's a lot of sentences like that

03:05.240 --> 03:15.400
and they're all over POSIX. So, I parsed POSIX with regular expressions. With HTML and a lot of regular

03:15.400 --> 03:21.160
expressions and a lot of special cases and a lot of insanity and it's the worst code ever. And guess

03:21.160 --> 03:30.040
what? It works. So, I was able to output machine readable API files for POSIX. One line per

03:30.040 --> 03:36.520
definition, which says essentially this declaration must exist. It must have these allowed types.

03:36.520 --> 03:40.840
It could be a function, maybe it's a macro. There's a lot of different requirements. Some of

03:40.840 --> 03:46.200
these are option groups in POSIX. There's different namespace rules. It's all in there.

03:47.080 --> 03:54.440
And it's machine-readable, which means I can generate tests. So, I can generate 4,000 tests,

03:55.320 --> 04:00.120
for instance, here you can see: to check whether clock_gettime exists, I assign it to a function

04:00.120 --> 04:06.120
pointer of the right type. And the assignment is only allowed by GCC if it actually has the right

04:06.120 --> 04:13.000
parameters. So, not just any function is allowed. Only one with correct compatible types.
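
NOTE
A minimal sketch of what such a generated declaration test could look like.
This is not os-test's actual output: the names clock_gettime_ptr, tm_year_ptr
and declarations_smoke_check are made up for illustration, and it assumes a
compiler that rejects incompatible pointer types (GCC 14 does by default;
older releases need -Werror=incompatible-pointer-types).
```c
/* Hypothetical sketch of generated declaration tests; not os-test's real output. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
/* Compiles only if clock_gettime is declared with a compatible prototype. */
static int (*const clock_gettime_ptr)(clockid_t, struct timespec *) = clock_gettime;
/* Compiles only if struct tm has a member named tm_year of type int. */
static int *tm_year_ptr(struct tm *tm)
{
	return &tm->tm_year;
}
/* Optional runtime smoke check on top of the compile-time checks above. */
static void declarations_smoke_check(void)
{
	struct timespec ts;
	struct tm tm = { .tm_year = 124 };
	assert(clock_gettime_ptr(CLOCK_REALTIME, &ts) == 0);
	assert(*tm_year_ptr(&tm) == 124);
}
```
If clock_gettime or tm_year were missing, or declared with an incompatible
type, the file would fail to compile, and that failure is the test result.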

04:13.000 --> 04:18.520
And likewise, I can do the same with struct members and so on. And in doing so, I find a lot of

04:18.520 --> 04:23.560
missing features. And then I went on to do the same thing with the namespace, because I can do

04:23.640 --> 04:28.280
things in the opposite direction, where I preprocess the system headers. And then I just figure out,

04:28.280 --> 04:33.400
oh, here's a list of everything that's defined in those headers. And then I can cross-check with

04:33.400 --> 04:39.560
the actual list of what POSIX allows under the different feature macros. And in doing so,

04:39.560 --> 04:45.560
I find a lot of namespace pollution. And it's everywhere, even on glibc, especially due to the

04:45.560 --> 04:53.320
new POSIX standard, which shuffled a few things around. But just because the functions exist,

04:53.320 --> 04:59.240
that doesn't mean they work. So, a lot of functions in practice, it turns out, just return

04:59.240 --> 05:08.280
ENOSYS, EINVAL, or are just plain buggy in practice. And it turns out there's 1188 functions

05:08.280 --> 05:18.120
in POSIX. So, this is what happens when you invoke all of them. That's my YouTube video. But for instance,

05:18.120 --> 05:23.720
if I just give isalnum an 'f', I get true out. And I do the same with every single function.

05:23.720 --> 05:29.160
Some of them require a lot of complicated setup, others are super simple. And it actually turns out

05:29.160 --> 05:33.000
that a lot of functions are stubbed in practice. But now I have data on that.
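
NOTE
A hedged sketch of the kind of "hello world" invocation test described here.
It is not taken from os-test: isalnum and clock_getres are arbitrary picks, and
the names outcome, classify, basic_isalnum and basic_clock_getres are
hypothetical. The idea shown is only that a trivial call separates functions
that work from ones that are stubbed out with ENOSYS.
```c
/* Hypothetical sketch of basic invocation tests; not taken from os-test. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <time.h>
enum outcome { WORKS, STUBBED, BROKEN };
/* Classify a call by its return value and the errno it left behind. */
static enum outcome classify(int ret, int err)
{
	if (ret == 0)
		return WORKS;
	return err == ENOSYS ? STUBBED : BROKEN;
}
/* isalnum needs no setup: one alphanumeric and one non-alphanumeric input. */
static void basic_isalnum(void)
{
	assert(isalnum('f'));
	assert(!isalnum('!'));
}
/* clock_getres reports failure through errno, so a stub shows up as ENOSYS. */
static enum outcome basic_clock_getres(void)
{
	struct timespec res;
	errno = 0;
	int ret = clock_getres(CLOCK_REALTIME, &res);
	return classify(ret, errno);
}
```
Running one such function per interface on every system gives a table of
WORKS, STUBBED and BROKEN outcomes rather than a bare "it compiles".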

05:33.000 --> 05:41.640
Then finally, the final phase is to write a lot of detailed test suites. So, so far, I've written a bunch of

05:41.640 --> 05:51.640
stuff on I/O, malloc, processes, pseudo-terminals, signals, and UDP, and so on. And when you start poking

05:51.640 --> 05:56.920
at these interfaces in depth, you find bugs instantly. It turns out that if there's a sentence

05:56.920 --> 06:04.840
in POSIX, some system forgot to read that sentence. And they're going to have a bug. Especially,

06:04.840 --> 06:12.280
if you look at the stuff that most people actually don't use in practice. So, the next step was to

06:12.280 --> 06:19.160
make a laboratory with every single POSIX OS. All I need is the ability to SSH into the machine,

06:19.240 --> 06:25.080
run tar to copy stuff over, make to run it, sh to execute scripts, and cc, of course, to run

06:26.920 --> 06:32.680
compilations. And I have all of these operating systems. If you have any operating system that's

06:32.680 --> 06:37.800
proprietary and is not listed here, please do get in touch because I want to test everything.

06:39.000 --> 06:45.960
So, I ran the tests everywhere, and now I have a data problem. Because I found more bugs than I

06:45.960 --> 06:56.280
can possibly report anywhere. 75,000 data points, and yeah. So, even my website scripts

06:56.280 --> 07:01.880
crashed. So, I had to rewrite everything. So, this is my website as it looks now. It's a lot of

07:01.880 --> 07:06.920
data, but it's meant to be actionable. If you have an operating system, you will have a column,

07:06.920 --> 07:13.000
and you can click all the red boxes to explore the different results. So, on the front page I

07:13.240 --> 07:18.680
show aggregate values and percentages, and you can click things that are not 100% and then go in and

07:18.680 --> 07:27.480
see what is missing. So, this is how different operating systems compare on POSIX, going from

07:27.480 --> 07:33.880
left to right in terms of how well they actually do. And you can see that Minix is actually the

07:33.880 --> 07:41.880
worst POSIX system. No surprise, Linux is actually the best. But there's a lot of really interesting

07:41.960 --> 07:47.960
stuff in here. Like, OmniOS is doing extremely well. AIX is actually the best proprietary

07:47.960 --> 07:55.640
system, which is very faithfully strict POSIX 2008. So, yeah. I don't think anybody else has done this

07:55.640 --> 08:01.080
before. So, I did the same with the new version of the standard to diff the differences. The red

08:01.080 --> 08:06.360
bar here is how well they're doing on the new standard. And you can see Minix, which unfortunately

08:06.440 --> 08:12.520
is abandoned, also functions very well as a time capsule to show that some of the historical

08:12.520 --> 08:16.360
systems just happen to have some of the new stuff because the de facto stuff got standardized.

08:16.360 --> 08:21.960
Meanwhile, my Sortix system actually implements 90% of the new POSIX

08:22.520 --> 08:27.720
standard's additions. So, if you want to try those out, my system is actually the best one to

08:27.720 --> 08:33.880
do it on. So, here's some other bugs that I found. For instance, on DragonFly BSD,

08:34.840 --> 08:39.880
there's a few functions that have wrong parameters in the types. This will probably not matter

08:39.880 --> 08:46.120
if you invoke the functions in practice. But if you make a function pointer to these system calls,

08:47.240 --> 08:54.840
that's going to blow up with an API mismatch. In terms of runtime bugs, I find a lot

08:54.840 --> 08:59.560
of that stuff too. So, for instance, if you make an alternate signal stack and execute a new

08:59.560 --> 09:05.640
program, well, POSIX says that when you execute a new program, the alternate signal stack is

09:05.640 --> 09:11.640
reset, but it turns out that DragonFly BSD and OpenBSD actually preserve it, because they

09:11.640 --> 09:18.200
forgot to read the sentence in POSIX that says to reset this state. And that means after exec,

09:18.200 --> 09:24.520
well, that process might be in an unstable state if a signal happens. Here's another really cool

09:25.160 --> 09:29.960
graph. So, I've been doing this for six months now and I've been measuring POSIX compliance

09:29.960 --> 09:36.360
over time. And most systems in that time scale have not done much, so that is basically a

09:36.360 --> 09:42.040
horizontal line. I omitted most of them. But there is a signal in the graph here, so you can see

09:42.040 --> 09:49.080
OpenBSD and FreeBSD, they actually have a bunch of recent improvements on the latest releases.

09:49.800 --> 09:54.200
But the most interesting thing is the hobby operating systems, the upcoming systems.

09:54.200 --> 09:59.720
For instance, my Sortix system in blue here has been increasing steadily. And then, my friends

09:59.720 --> 10:07.080
from Managarm, in yellow, have overtaken me, which is just annoying. So, we are having a friendly

10:07.080 --> 10:13.880
rivalry and Redox is also just accelerating. So, this is measuring how my project is improving

10:14.840 --> 10:21.240
and this is the /dev/random room where everything is off topic, but I wanted to be on topic.

10:21.800 --> 10:29.960
I can tell you that /dev/random exists everywhere. This is portable. So, yeah. So, my idea is to have

10:30.760 --> 10:37.000
enough useful data to have a critical mass. I can't just go and file these thousands upon

10:37.000 --> 10:41.960
thousands of bug reports because I don't have time for that. And you know how it is: you end up

10:42.920 --> 10:47.560
filing a bug and then you have to debate everybody about it and say, I am right and you are wrong.

10:48.920 --> 10:54.680
But now, because all of the data is just published, it's self-service: people can come and just fix

10:54.680 --> 11:02.200
their bugs. I can just keep writing tests. So, in the end, os-test will just make every system

11:02.200 --> 11:10.760
slightly better. And os-test is a community project. So, if you want to know some

11:10.760 --> 11:15.960
portability data, like, if you have a .c file and you want to know if that works everywhere.

11:15.960 --> 11:20.680
Just send it to me. I'll just run that .c file everywhere. And if I find anything interesting,

11:20.680 --> 11:25.240
it's a test case. So, we are using it in the continuous integration for

11:25.240 --> 11:31.320
Sortix, Managarm, Redox, and hopefully more systems. I heard a rumor that Apple might also

11:31.320 --> 11:36.840
be looking at it, which is incredible because that's a POSIX system that's not doing so well.

11:36.840 --> 11:45.640
Yeah. So, in the end, I'm hoping that we can make new quality POSIX systems and improve

11:45.640 --> 11:52.200
the existing ones. So, what's the best POSIX system? It's musl libc on Alpine

11:52.200 --> 12:00.200
Linux, no doubt about it, but a lot of them are really close to getting up there. So,

12:00.200 --> 12:04.360
if people go and fix a few headers, feature macros, and a few small things here and there,

12:04.360 --> 12:09.400
glibc can probably beat it. And if you want to try out the latest

12:09.400 --> 12:18.040
POSIX 2024 features, then Sortix actually has 90% of them. So, thank you. I'm Jonas.

12:26.040 --> 12:28.040
And of course, this presentation was on Sortix.

12:28.040 --> 12:38.520
Any questions?

12:52.600 --> 12:57.880
Hello. Thanks for the amazing presentation. I want to ask, have you noticed

12:58.360 --> 13:04.200
like, development accelerating for these OSes live, in the graph, so to say, for more

13:04.200 --> 13:10.440
OSes? I'm in the Managarm development team, so to say, and we have definitely been trying to use

13:10.440 --> 13:15.480
this a lot to try and beat Linux at their own game, so to say, but I was wondering if you saw

13:15.480 --> 13:22.200
other OS groups or other development things, just basically seeing development live, so to say,

13:22.840 --> 13:35.640
when you add tests? Yeah. So, the question is whether I've seen a signal in other systems

13:35.640 --> 13:43.480
that have been increasing over time. Managarm, Sortix and Redox are the primary ones.

13:43.480 --> 13:48.280
When I started doing os-test a few years back in the original form before the major work,

13:48.280 --> 13:53.960
OpenBSD did find it and they did improve a bunch of stuff. I did also contact the musl

13:53.960 --> 13:59.160
and glibc communities, and they've been using my blog posts and so on to actually improve

13:59.160 --> 14:03.640
their systems. I also know that Bionic libc has also been using these blog posts. So,

14:03.640 --> 14:07.480
I think it's just a matter of time before I see more stuff in the data, but over time,

14:07.480 --> 14:11.320
I will be tracking everything. So, I can come back next year and say, yeah, POSIX 2024.

14:11.400 --> 14:20.920
It's now complete everywhere. Are there certain criteria, or can any operating system just be sent

14:20.920 --> 14:28.680
over and get tested too, or how does that work? Yeah, I had the slide, which basically said that

14:33.320 --> 14:41.080
I essentially just need make and sh plus, of course, a compiler. I want to be able

14:41.080 --> 14:48.280
to SSH into systems so I can do things automatically in a lab, and ideally there's some kind of

14:48.280 --> 14:53.160
experience for being able to install and upgrade the system. So, those are the minimum requirements,

14:53.160 --> 14:56.840
but for some of the upcoming systems I've just found ways to execute it otherwise,

14:56.840 --> 15:02.440
and then I'm just able to pull the data down from the CI, even if I'm not able to automate the execution.

15:02.840 --> 15:04.440
Thank you.

15:08.920 --> 15:16.120
What are the difficulties of implementing the new POSIX thing? Is it because there are a lot of things,

15:16.120 --> 15:22.920
some functionalities that are complicated to implement? Yeah, so the question is, what's the

15:22.920 --> 15:30.280
difficulty of implementing POSIX? Partially, there's just a lot of it, and without a public

15:30.360 --> 15:35.560
test suite, it's really hard to know how well you're doing. Oftentimes, you just port some piece

15:35.560 --> 15:41.080
of software, you see what compilation errors you get, and then you fix those compilation errors.

15:41.080 --> 15:45.880
And then, at runtime, you see what system calls are failing and you implement those,

15:45.880 --> 15:50.920
but you don't really have a coherent big picture about it. And that's what os-test

15:50.920 --> 15:55.320
is supposed to help you with. It's just knowing how well you're doing, how far you've come,

15:55.400 --> 16:00.920
and then there's just a ton of corner cases that you have to do, but that's the life of implementing an

16:00.920 --> 16:07.320
operating system. Thank you. Thank you once again.

