WEBVTT

00:00.000 --> 00:06.000
Thank you very much.

00:06.000 --> 00:07.000
Hi, everyone.

00:07.000 --> 00:08.000
Thanks for coming.

00:08.000 --> 00:09.000
My name is Philip.

00:09.000 --> 00:13.000
I currently work at Apple on the XNU kernel.

00:13.000 --> 00:16.000
I work out of an office that we have in London.

00:16.000 --> 00:20.000
But in years past, I spent quite a lot of my adolescence

00:20.000 --> 00:24.000
and adult life working on this hobby OS called axle.

00:24.000 --> 00:27.000
So I'll give you a quick demo now.

00:27.000 --> 00:31.000
So you can kind of get a sense for what the system is like.

00:31.000 --> 00:36.000
So this is a UEFI Rust bootloader that I made.

00:36.000 --> 00:38.000
And then we boot into the system.

00:38.000 --> 00:40.000
And we get the desktop environment.

00:40.000 --> 00:41.000
So we can click around a bit.

00:41.000 --> 00:44.000
Obviously it's running in QEMU.

00:44.000 --> 00:46.000
It's got a Doom port.

00:46.000 --> 00:49.000
So you can play Doom.

00:49.000 --> 00:51.000
Cheers.

00:55.000 --> 00:56.000
Cool.

00:56.000 --> 01:02.000
We could look at some other system utilities, like enumerating the PCI bus.

01:02.000 --> 01:05.000
As you can see, there's, like, GUI toolkits.

01:05.000 --> 01:07.000
Here's a file browser.

01:07.000 --> 01:11.000
We can look at, kind of, visualized virtual address spaces

01:11.000 --> 01:15.000
of different processes.

01:15.000 --> 01:18.000
Let's see what else can we look at.

01:18.000 --> 01:19.000
So it's a microkernel.

01:19.000 --> 01:22.000
So everything's in user space including device drivers,

01:22.000 --> 01:25.000
like the mouse driver and keyboard driver.

01:25.000 --> 01:27.000
And these all communicate via message passing.

01:27.000 --> 01:29.000
So as an example, we could look at like,

01:29.000 --> 01:33.000
we could open a text viewer for this codegen interface

01:33.000 --> 01:37.000
That the keyboard driver is using to send events

01:37.000 --> 01:39.000
to the window manager.

01:39.000 --> 01:42.000
We could look at some other stuff as well.

01:42.000 --> 01:50.000
Here's a Game Boy emulator that I also wrote in Rust.

01:50.000 --> 01:54.000
All the text that you see on the screen is rendered using a

01:54.000 --> 01:57.000
TrueType font renderer that I wrote in Rust.

01:57.000 --> 02:00.000
And this is all running within this window manager.

02:00.000 --> 02:03.000
This desktop environment called axle window manager.

02:03.000 --> 02:09.000
Now, in this kind of long list of stuff that I've put up on the screen

02:09.000 --> 02:14.000
and pointed out, you'll notice that nowhere in this have I pointed out a GPU driver.

02:14.000 --> 02:16.000
And there's a few reasons for that.

02:16.000 --> 02:19.000
One of them is that it's kind of difficult to take a kind of,

02:19.000 --> 02:22.000
you know, abstract system software component like a GPU driver

02:22.000 --> 02:25.000
and then point to it in a visual demo.

02:25.000 --> 02:29.000
And then the other reason is that I actually don't have a GPU driver

02:29.000 --> 02:30.000
in this operating system.

02:30.000 --> 02:33.000
This is all rendered on the CPU.

02:33.000 --> 02:36.000
And you might ask yourself, okay, so what?

02:36.000 --> 02:39.000
You know, all the other stuff that I showed is all in software

02:39.000 --> 02:42.000
running on the CPU. Why is the window manager,

02:42.000 --> 02:45.000
why does that have any special considerations?

02:45.000 --> 02:49.000
Well, it turns out that if you want to have this kind of environment

02:49.000 --> 02:53.000
of, you know, kind of an interactive system with animations

02:53.000 --> 02:56.000
and it's all quite interactive.

02:56.000 --> 03:00.000
It can be pretty difficult to render all of this entirely on the CPU

03:00.000 --> 03:04.000
while keeping the frame rate decent and fast.

03:04.000 --> 03:08.000
So this talk, as you might have seen, is kind of some of the things

03:08.000 --> 03:12.000
that we have to do along the way to save work

03:12.000 --> 03:17.000
and make this fast enough to be interactive.

03:17.000 --> 03:19.000
All right.

03:19.000 --> 03:24.000
So switching over to this kind of web-based slide deck here.

03:24.000 --> 03:27.000
Yeah, so just to give you a kind of feel for the system,

03:27.000 --> 03:30.000
axle OS as a whole, meaning like the kernel

03:30.000 --> 03:33.000
and then all of the user space stuff that I've written for it

03:33.000 --> 03:37.000
is roughly on the order of like 100,000 lines of code.

03:37.000 --> 03:42.000
And then this window manager itself is like, say, 5,000, 6,000 lines

03:42.000 --> 03:44.000
of Rust, give or take.

03:44.000 --> 03:46.000
This is a clipping of the source tree.

03:46.000 --> 03:47.000
Again, just to get a sense.

03:47.000 --> 03:51.000
So it handles a few things like the various state machines for

03:51.000 --> 03:54.000
window management and communicating with applications

03:54.000 --> 03:58.000
and kind of the window life cycles, kind of event loop stuff

03:58.000 --> 04:00.000
or interacting with the operating system.

04:00.000 --> 04:03.000
And then the core compositor itself, which is actually responsible

04:03.000 --> 04:06.000
for pushing all the pixels and figuring out what to draw

04:06.000 --> 04:11.000
and doing animations and that kind of thing.

04:11.000 --> 04:13.000
Cool.

04:13.000 --> 04:17.000
So conceptually, if you wanted to make this sort of project,

04:17.000 --> 04:21.000
if you wanted to make a compositing desktop environment,

04:21.000 --> 04:23.000
it's pretty straightforward conceptually.

04:23.000 --> 04:27.000
We have this like classic metaphor of documents laid out

04:27.000 --> 04:29.000
on the top of a desk, a desktop.

04:29.000 --> 04:33.000
And we have these kind of this idea of like the spatial indexes,

04:33.000 --> 04:36.000
like these documents are laid out in a stack

04:36.000 --> 04:39.000
and they all have a z index and you can drag them around on the desk

04:39.000 --> 04:41.000
and you can rearrange them.

04:42.000 --> 04:46.000
By the way, all of the demos in this talk,

04:46.000 --> 04:49.000
these are like live 3D renders.

04:49.000 --> 04:54.000
I don't know what that beep was, I don't think it was me.

04:54.000 --> 04:55.000
Cool.

04:55.000 --> 04:58.000
And then the way that this works is the compositor itself,

04:58.000 --> 05:01.000
the axle window manager, has like a giant frame buffer,

05:01.000 --> 05:04.000
that you know gets eventually rendered to the display

05:04.000 --> 05:05.000
output device.

05:05.000 --> 05:08.000
And that's like a big grid of pixels.

05:09.000 --> 05:12.000
And then so it'll draw like the desktop background to that.

05:12.000 --> 05:16.000
And then all the applications that have windows that are managed by the window manager

05:16.000 --> 05:20.000
have their own frame buffers, which is like a shared memory channel

05:20.000 --> 05:23.000
between the window manager and the application that owns the window.

05:23.000 --> 05:26.000
And then when the application says, hey, window manager,

05:26.000 --> 05:28.000
you know, I want to redraw my window.

05:28.000 --> 05:29.000
I've got some updated content.

05:29.000 --> 05:34.000
The window manager will draw like the window decorations

05:34.000 --> 05:37.000
to the final screen frame buffer.

05:37.000 --> 05:40.000
And then it'll take the frame buffer from that shared memory channel

05:40.000 --> 05:46.000
and copy it in to wherever it should be on the desktop.

05:46.000 --> 05:50.000
So this is really just like a specialized memcpy with lots of like rectangle

05:50.000 --> 05:51.000
math happening.

05:51.000 --> 05:54.000
But there's a term of jargon, a term of art, for this.

05:54.000 --> 05:55.000
It's called blitting.

05:55.000 --> 05:58.000
So the window manager does a lot of blitting of windows.
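
NOTE
A minimal sketch of the blit described above, assuming a row-major buffer with one u32 per pixel.
The Framebuffer type and its blit method are hypothetical names for illustration, not taken from the axle source.
Each visible row of the window is copied into the screen buffer with a slice copy, after clipping against the screen edges.
```rust
/// A row-major pixel buffer, one u32 per pixel (hypothetical layout).
pub struct Framebuffer {
    pub width: i32,
    pub height: i32,
    pub pixels: Vec<u32>,
}
impl Framebuffer {
    pub fn new(width: i32, height: i32) -> Self {
        Framebuffer { width, height, pixels: vec![0; (width * height) as usize] }
    }
    /// Copy all of `src` into `self` with its top-left corner at (dst_x, dst_y),
    /// clipping whatever falls outside `self`. Returns the number of pixels copied.
    pub fn blit(&mut self, src: &Framebuffer, dst_x: i32, dst_y: i32) -> usize {
        // Clip the destination rectangle against our own bounds.
        let x0 = dst_x.max(0);
        let y0 = dst_y.max(0);
        let x1 = (dst_x + src.width).min(self.width);
        let y1 = (dst_y + src.height).min(self.height);
        if x0 >= x1 || y0 >= y1 {
            return 0; // Entirely off-screen.
        }
        let row_len = (x1 - x0) as usize;
        for y in y0..y1 {
            // Translate the clipped row back into source coordinates.
            let src_start = ((y - dst_y) * src.width + (x0 - dst_x)) as usize;
            let dst_start = (y * self.width + x0) as usize;
            self.pixels[dst_start..dst_start + row_len]
                .copy_from_slice(&src.pixels[src_start..src_start + row_len]);
        }
        row_len * (y1 - y0) as usize
    }
}
```
Usage would look like screen.blit(&window_buffer, window_x, window_y), once per window per redraw.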

06:01.000 --> 06:02.000
Cool.

06:02.000 --> 06:05.000
So if you wanted to do this and kind of do it in the most simple way

06:05.000 --> 06:08.000
just to start out with, we could have a pretty simple setup

06:08.000 --> 06:13.000
where initially the compositor would draw the background to the screen

06:13.000 --> 06:16.000
and then every frame it would like loop through all the windows

06:16.000 --> 06:20.000
draw each of those and then finally draw the cursor.

06:22.000 --> 06:25.000
And we can see kind of how this would work in practice.

06:25.000 --> 06:30.000
So this now is the actual axle window manager,

06:30.000 --> 06:34.000
kind of extricated from the rest of the OS, compiled to WebAssembly,

06:34.000 --> 06:35.000
and running in the browser.

06:35.000 --> 06:37.000
So this is again like the real thing.

06:37.000 --> 06:40.000
And if we start to play with it a little bit,

06:40.000 --> 06:44.000
immediately we can see why this strategy kind of falls apart.

06:44.000 --> 06:48.000
We're not doing anything to like clear up the contents of previous frames.

06:48.000 --> 06:53.000
So over time we just get a corrupted frame buffer with a bunch of old work leftover.

06:53.000 --> 06:58.000
And we can throw in some windows to kind of highlight this problem a little bit more.

06:58.000 --> 07:03.000
So it looks all right, but clearly we've got some work to do.

07:04.000 --> 07:09.000
And you can drag some windows around, resize them.

07:09.000 --> 07:11.000
This is my jog, by the way, isn't it?

07:11.000 --> 07:12.000
Just taper.

07:18.000 --> 07:20.000
All right.

07:20.000 --> 07:24.000
So one really easy way to fix this is just every time we draw a frame

07:24.000 --> 07:27.000
just totally clear the canvas and start over.

07:27.000 --> 07:28.000
And that works pretty well.

07:28.000 --> 07:31.000
That solves the problem that we just had.

07:31.000 --> 07:35.000
And we can throw in some more windows.

07:35.000 --> 07:40.000
So really, this works, this works pretty well.

07:40.000 --> 07:42.000
We could just stop here.

07:42.000 --> 07:44.000
This is perfectly good enough.

07:44.000 --> 07:48.000
However, I've got about 15 minutes or so left in this slot.

07:48.000 --> 07:51.000
So we're definitely going to have to figure something else out.

07:51.000 --> 07:54.000
And really there is a problem with this approach.

07:54.000 --> 07:59.000
This approach is actually probably the best you could do in terms of the ratio between

07:59.000 --> 08:01.000
like effort to bugs.

08:01.000 --> 08:05.000
It was really easy to do and it has basically no bugs.

08:05.000 --> 08:10.000
But unfortunately, there's like a secret kind of third constraint that we're also going to worry about.

08:10.000 --> 08:14.000
And that is this approach is extremely slow.

08:14.000 --> 08:18.000
And this isn't really the way we like to do things in kind of the, you know,

08:18.000 --> 08:20.000
hobby operating systems world.

08:20.000 --> 08:22.000
We like things to be really fast.

08:22.000 --> 08:26.000
Take a lot of effort and inevitably have tons of bugs.

08:26.000 --> 08:31.000
Because that's where all the fun is.

08:31.000 --> 08:36.000
And we can kind of visualize the problem a little bit more here if I add in this frames counter

08:36.000 --> 08:41.000
and then do some movie magic to make it reflect like this actual frame rate.

08:41.000 --> 08:46.000
It works, but it's not super pretty.

08:46.000 --> 08:51.000
We can definitely do better than this.

08:51.000 --> 08:55.000
And there's a few reasons why this is. One, which is pretty easy to spot,

08:55.000 --> 08:58.000
is we might have done all the work to like render a window.

08:58.000 --> 09:02.000
But then it ends up just completely occluded by another window in front of it.

09:02.000 --> 09:06.000
And this problem of overdraw causes a lot of wasted work.

09:06.000 --> 09:08.000
And we can quantify this overdraw.

09:08.000 --> 09:11.000
So I've added this counter in the upper right here.

09:11.000 --> 09:15.000
And you can see that like as I make these windows larger,

09:15.000 --> 09:17.000
this overdraw counter goes up.

09:17.000 --> 09:21.000
And that's because, you know, we're doing all the work to like render the background.

09:21.000 --> 09:24.000
And then we just draw over it again with the window.

09:24.000 --> 09:28.000
And if I drag a window on top of another one, the problem gets even worse.

09:28.000 --> 09:31.000
Because we're drawing the background and then the other window, and then it all gets

09:31.000 --> 09:34.000
occluded again.
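
NOTE
A minimal sketch of how an overdraw counter like the one in the demo could be computed, assuming draws are recorded as rectangles in back-to-front order.
The Rect type and count_overdraw function are hypothetical names for illustration; the demo's real counter may be implemented differently.
A pixel counts as overdrawn each time it is written after an earlier draw in the same frame already wrote it.
```rust
#[derive(Clone, Copy, Debug)]
pub struct Rect {
    pub x: usize,
    pub y: usize,
    pub w: usize,
    pub h: usize,
}
/// Count pixel writes that land on a pixel already written earlier in the same frame.
pub fn count_overdraw(screen_w: usize, screen_h: usize, draws_back_to_front: &[Rect]) -> usize {
    let mut touched = vec![false; screen_w * screen_h];
    let mut overdraw = 0;
    for r in draws_back_to_front {
        // Clamp each draw to the screen so indexing stays in bounds.
        for y in r.y..(r.y + r.h).min(screen_h) {
            for x in r.x..(r.x + r.w).min(screen_w) {
                let idx = y * screen_w + x;
                if touched[idx] {
                    overdraw += 1;
                }
                touched[idx] = true;
            }
        }
    }
    overdraw
}
```
With a full-screen background and one window on top, the overdraw equals the window's on-screen area, which is why the counter grows as windows are enlarged or stacked.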

09:34.000 --> 09:38.000
And of course as I like close windows, if you look at our frames counter, it'll go up.

09:38.000 --> 09:43.000
Because we're doing less work.

09:43.000 --> 09:45.000
Okay, so how do we solve this?

09:45.000 --> 09:48.000
This problem of drawing these occluded regions?

09:48.000 --> 09:50.000
Well, we can do the math and figure out, all right,

09:50.000 --> 09:53.000
We've got all these windows, but what kind of portions of each of them are actually going to

09:53.000 --> 09:56.000
wind up visible on the final desktop?

09:56.000 --> 09:59.000
So if we think of a window, obviously it starts out.

09:59.000 --> 10:01.000
The full thing is visible.

10:01.000 --> 10:07.000
But then as we start adding more windows on top of it, we can kind of do this operation

10:07.000 --> 10:12.000
where we, like, you know, split up all the rectangles and figure out exactly what portions

10:12.000 --> 10:13.000
are still unoccluded.

10:13.000 --> 10:18.000
And we can recursively do this operation as we consider everything in the Z stack and keep on

10:18.000 --> 10:22.000
shrinking these rectangles until we find out, you know, the final list of what's

10:22.000 --> 10:26.000
going to be visible on the desktop.

10:26.000 --> 10:31.000
And one optimization that we'll go ahead and do here now is if you do this sort of algorithm

10:31.000 --> 10:36.000
naively, it's really easy to wind up with these rectangles which are like, you know, they

10:36.000 --> 10:39.000
could clearly be merged into a larger rectangle.

10:39.000 --> 10:43.000
And it pays off to do this because this operation of drawing these rectangles is like

10:43.000 --> 10:44.000
our main bottleneck.

10:44.000 --> 10:49.000
So we want to reduce the wasted work as much as we can up front.
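
NOTE
A minimal sketch of the merging step described above, assuming the input rectangles do not overlap.
try_merge and merge_all are hypothetical names for illustration; this is a simple quadratic pass, not necessarily how axle does it.
Two rectangles are merged only when they share one complete edge, so the result covers exactly the same pixels with fewer blits.
```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}
/// If `a` and `b` share a complete edge, return the single rectangle covering both.
pub fn try_merge(a: &Rect, b: &Rect) -> Option<Rect> {
    // Side by side: same rows, touching along a vertical edge.
    if a.y == b.y && a.h == b.h && (a.x + a.w == b.x || b.x + b.w == a.x) {
        return Some(Rect { x: a.x.min(b.x), y: a.y, w: a.w + b.w, h: a.h });
    }
    // Stacked: same columns, touching along a horizontal edge.
    if a.x == b.x && a.w == b.w && (a.y + a.h == b.y || b.y + b.h == a.y) {
        return Some(Rect { x: a.x, y: a.y.min(b.y), w: a.w, h: a.h + b.h });
    }
    None
}
/// Repeatedly merge pairs until no two rectangles can be combined.
pub fn merge_all(mut rects: Vec<Rect>) -> Vec<Rect> {
    loop {
        let mut merged = None;
        'search: for i in 0..rects.len() {
            for j in (i + 1)..rects.len() {
                if let Some(m) = try_merge(&rects[i], &rects[j]) {
                    merged = Some((i, j, m));
                    break 'search;
                }
            }
        }
        match merged {
            Some((i, j, m)) => {
                rects.swap_remove(j); // j > i, so index i stays valid.
                rects[i] = m;
            }
            None => return rects,
        }
    }
}
```
For example, four 5x5 quadrants of a 10x10 area collapse into one 10x10 rectangle, so one blit replaces four.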

10:49.000 --> 10:54.000
All right.

10:54.000 --> 10:57.000
So this operation sounds simple.

10:57.000 --> 11:02.000
And to be honest it is, there's just a lot of kind of hairy edge cases to think about

11:02.000 --> 11:03.000
and kind of deal with.

11:03.000 --> 11:07.000
So here's just a gallery of some of the cases you might be in.

11:07.000 --> 11:13.000
So like over here you've got, you know, you've got one visible frame that gets

11:13.000 --> 11:18.000
occluded and split into four, or it can be split into three, or it can be split into two

11:18.000 --> 11:22.000
horizontally or two vertically.

11:22.000 --> 11:23.000
That's not right.

11:23.000 --> 11:29.000
So just a lot of kind of edge cases to think about.

11:29.000 --> 11:34.000
And to deal with this, I kind of ended up writing a bunch of unit tests in this

11:34.000 --> 11:35.000
repository.

11:35.000 --> 11:40.000
So I tried to make it easy for myself to be like, okay, given this layout of windows on the

11:40.000 --> 11:45.000
desktop, here's exactly the configuration of visible regions that we expect to end up with.

11:45.000 --> 11:50.000
Which helped a lot while locking down the business logic here.

11:50.000 --> 11:56.000
All right, one last kind of optimization we'll make to this setup before moving on.

11:56.000 --> 12:01.000
So the naive way to do this is we compute all the visible regions and then we've got a big list,

12:01.000 --> 12:04.000
a big list of all the things that are visible on the desktop.

12:04.000 --> 12:09.000
But one of the main operations that we find ourselves doing in the compositor is like,

12:09.000 --> 12:14.000
okay, given this rectangle, say it's a mouse cursor or whatever, what does it intersect with?

12:15.000 --> 12:22.000
And we do this so often that scanning through this list of rectangles becomes a really heavy operation.

12:22.000 --> 12:25.000
And so we can pick a better data structure to kind of deal with this.

12:25.000 --> 12:30.000
So there are these really neat data structures called, like, R-trees, for spatial indexing,

12:30.000 --> 12:35.000
which gives much better algorithmic complexity on answering this question of given a rectangle,

12:35.000 --> 12:38.000
you know, what are the rectangles that intersect with it?

12:39.000 --> 12:45.000
We go to, like, roughly logarithmic complexity per query, from linear in the naive setup.

12:45.000 --> 12:51.000
And we know from like classical computational complexity that as we have more stuff on the desktop,

12:51.000 --> 12:56.000
the difference between these two setups becomes more and more pronounced.

12:56.000 --> 12:59.000
So you know, really pays off.
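
NOTE
A minimal sketch of a spatial index for the "what does this rectangle intersect?" query.
The talk uses R-trees; a full R-tree is too long to show here, so this swaps in a simpler uniform grid, which is a different structure with the same goal of not scanning every rectangle.
GridIndex, its methods and the cell size are hypothetical choices for illustration, and rectangles are assumed to have positive width and height.
```rust
use std::collections::{BTreeSet, HashMap};
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}
impl Rect {
    pub fn intersects(&self, o: &Rect) -> bool {
        self.x < o.x + o.w && o.x < self.x + self.w && self.y < o.y + o.h && o.y < self.y + self.h
    }
}
/// Buckets rectangles by the fixed-size grid cells they touch.
pub struct GridIndex {
    cell: i32,
    rects: Vec<Rect>,
    cells: HashMap<(i32, i32), Vec<usize>>,
}
impl GridIndex {
    pub fn new(cell: i32) -> Self {
        GridIndex { cell, rects: Vec::new(), cells: HashMap::new() }
    }
    /// Inclusive range of grid cells covered by `r`: (min_cx, min_cy, max_cx, max_cy).
    fn cell_range(&self, r: &Rect) -> (i32, i32, i32, i32) {
        (
            r.x.div_euclid(self.cell),
            r.y.div_euclid(self.cell),
            (r.x + r.w - 1).div_euclid(self.cell),
            (r.y + r.h - 1).div_euclid(self.cell),
        )
    }
    /// Store `r` and return its id.
    pub fn insert(&mut self, r: Rect) -> usize {
        let id = self.rects.len();
        self.rects.push(r);
        let (cx0, cy0, cx1, cy1) = self.cell_range(&r);
        for cy in cy0..=cy1 {
            for cx in cx0..=cx1 {
                self.cells.entry((cx, cy)).or_default().push(id);
            }
        }
        id
    }
    /// Ids of stored rectangles that intersect `q`, in ascending order.
    pub fn query(&self, q: &Rect) -> Vec<usize> {
        let (cx0, cy0, cx1, cy1) = self.cell_range(q);
        let mut hits = BTreeSet::new();
        for cy in cy0..=cy1 {
            for cx in cx0..=cx1 {
                if let Some(ids) = self.cells.get(&(cx, cy)) {
                    // Only rectangles sharing a cell with `q` get the exact test.
                    for &id in ids {
                        if self.rects[id].intersects(q) {
                            hits.insert(id);
                        }
                    }
                }
            }
        }
        hits.into_iter().collect()
    }
}
```
A small query such as the mouse cursor then only inspects the rectangles bucketed in the few cells it touches, instead of the whole list.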

13:01.000 --> 13:04.000
Cool. So now we're moving to a world in which we're doing this.

13:04.000 --> 13:06.000
We're figuring out, all right, given all the windows,

13:06.000 --> 13:09.000
what are their visible regions, and then only drawing those.

13:09.000 --> 13:12.000
So we'll go ahead and kind of move some stuff around,

13:12.000 --> 13:17.000
so you can kind of get a feel for how this works and what it looks like.

13:21.000 --> 13:24.000
We can do the same thing with just some slower animations.

13:24.000 --> 13:26.000
Again, just to get a sense.

13:26.000 --> 13:29.000
You can add some more windows.

13:36.000 --> 13:43.000
And now if we, all right, so same thing again,

13:43.000 --> 13:46.000
but I've thrown in some more desktop elements just to make this

13:46.000 --> 13:49.000
look a bit more realistic.

13:49.000 --> 13:53.000
So as I say, this is not the full operating system.

13:53.000 --> 13:56.000
This is just kind of everything is sort of simulated.

13:56.000 --> 14:01.000
So this menu bar and the dock normally these are like their own applications,

14:01.000 --> 14:04.000
running in user space, but here I've just sort of hacked them in.

14:04.000 --> 14:06.000
But anyway, you get the idea.

14:06.000 --> 14:08.000
And you can see that as elements,

14:08.000 --> 14:10.000
like these shortcuts become occluded,

14:10.000 --> 14:13.000
we see that, you know, we don't have to do anything to draw those,

14:13.000 --> 14:17.000
and we don't kind of render these regions here.

14:17.000 --> 14:19.000
We don't do the blitting, rather.

14:21.000 --> 14:24.000
As the Z index changes, this all updates.

14:27.000 --> 14:29.000
Now if we check our overdraw again,

14:29.000 --> 14:32.000
first of all our frames per second has rocketed up

14:32.000 --> 14:36.000
to a cool 11 frames per second, which is,

14:36.000 --> 14:40.000
it's all right, but we've still definitely got some improvements we can make from here.

14:40.000 --> 14:44.000
But this problem of overdraw has basically completely gone away.

14:44.000 --> 14:49.000
So you can see that we've got 196 pixels being overdrawn.

14:49.000 --> 14:51.000
And that's actually just the mouse cursor.

14:51.000 --> 14:53.000
It's 14 by 14.

14:53.000 --> 14:56.000
So we could do like the visible region splitting for the cursor,

14:56.000 --> 14:58.000
but it's not really worth it.

14:58.000 --> 15:00.000
We just accept a bit of overdraw for that.

15:00.000 --> 15:04.000
But everything else as I stack windows and add more stuff in,

15:04.000 --> 15:08.000
we don't get any more overdraw, which is great.

15:10.000 --> 15:15.000
Okay, so moving on, thinking about kind of the life cycle of an application,

15:15.000 --> 15:18.000
obviously windows are going to be rendering stuff over time.

15:18.000 --> 15:20.000
They're going to have a frame, they're going to do some work,

15:20.000 --> 15:21.000
they're going to have another frame,

15:21.000 --> 15:23.000
and they're going to tell the window manager,

15:23.000 --> 15:25.000
hey, please, please redraw me.

15:26.000 --> 15:30.000
So we can think about what the compositor would do in this case.

15:30.000 --> 15:34.000
So these are both kind of the final frame that's rendered by the compositor.

15:34.000 --> 15:36.000
So here's the previous frame,

15:36.000 --> 15:40.000
and then it re-renders everything with the new application frame,

15:40.000 --> 15:43.000
and draws that to the frame buffer.

15:43.000 --> 15:46.000
But if we look at this frame that we just drew,

15:46.000 --> 15:49.000
the only thing that changed was this,

15:49.000 --> 15:51.000
this window in the middle,

15:51.000 --> 15:54.000
all the other drawing that we did was just wasted work.

15:54.000 --> 15:59.000
Oops, and this is kind of going to be our next step

15:59.000 --> 16:01.000
in terms of making an efficient compositor.

16:01.000 --> 16:04.000
So previously what we were doing was this strategy up top,

16:04.000 --> 16:06.000
where every frame would clear the screen,

16:06.000 --> 16:09.000
figure out all the drawable regions, draw all of those.

16:09.000 --> 16:11.000
But then in the new world,

16:11.000 --> 16:13.000
we're not going to redraw everything,

16:13.000 --> 16:15.000
we're just going to be iterating whatever rectangles

16:15.000 --> 16:17.000
have been dirtied since the previous frame.
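
NOTE
A minimal sketch of a per-frame dirty-rectangle queue, assuming the compositor hands each queued rectangle to a redraw callback.
DirtyQueue, queue and render_frame are hypothetical names for illustration; the talk's compositor keeps several such queues plus the visible-region trees.
Nothing is cleared or redrawn unless something was queued since the previous frame.
```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}
#[derive(Default)]
pub struct DirtyQueue {
    rects: Vec<Rect>,
}
impl DirtyQueue {
    /// Mark a region as needing a redraw on the next frame.
    /// Empty rectangles and exact duplicates are ignored.
    pub fn queue(&mut self, r: Rect) {
        if r.w > 0 && r.h > 0 && !self.rects.contains(&r) {
            self.rects.push(r);
        }
    }
    /// Hand every queued rectangle to `redraw` once, leaving the queue empty
    /// for the next frame. Returns the number of pixels passed to `redraw`
    /// (overlapping rectangles are counted once each).
    pub fn render_frame(&mut self, mut redraw: impl FnMut(Rect)) -> i64 {
        let mut pixels = 0i64;
        for r in self.rects.drain(..) {
            redraw(r);
            pixels += (r.w as i64) * (r.h as i64);
        }
        pixels
    }
}
```
A frame in which nothing was queued does no drawing at all, which is where the large frame-rate improvement comes from.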

16:17.000 --> 16:24.000
So this involves quite a lot of like different queues

16:24.000 --> 16:28.000
and state management that we didn't really have to think about before.

16:28.000 --> 16:30.000
I'm not going to walk through this all,

16:30.000 --> 16:33.000
but just to give you a sense for kind of what this involves.

16:33.000 --> 16:35.000
We've got, as I say, these per-frame queues,

16:35.000 --> 16:37.000
we've got these trees of like visible regions,

16:37.000 --> 16:39.000
and then as I do these different scenarios,

16:39.000 --> 16:42.000
you know, moving the mouse or a window

16:42.000 --> 16:44.000
says it needs to be redrawn,

16:44.000 --> 16:48.000
you change the Z index or whatever, different queues

16:48.000 --> 16:50.000
kind of get implicated,

16:50.000 --> 16:55.000
and we'll need to deal with them in the render loop.

16:55.000 --> 17:00.000
Okay, so let's start to try and build up a strategy using these

17:00.000 --> 17:03.000
queues that will track our dirty regions.

17:03.000 --> 17:06.000
So just as a reminder, this is the sort of a scene

17:06.000 --> 17:08.000
we're going to try to be rendering.

17:08.000 --> 17:09.000
It's not going to look like this at first,

17:09.000 --> 17:12.000
but we'll build back up to it.

17:12.000 --> 17:14.000
So we've got these queues,

17:14.000 --> 17:19.000
but you can see them kind of down here.

17:19.000 --> 17:21.000
But we need to figure out,

17:21.000 --> 17:24.000
okay, when do we need to add something to one of these queues?

17:24.000 --> 17:27.000
Well, one example that's pretty easy to start with

17:27.000 --> 17:29.000
is whenever the mouse moves,

17:29.000 --> 17:31.000
we know we're going to need to redraw the background

17:31.000 --> 17:33.000
wherever the mouse previously was.

17:33.000 --> 17:36.000
So we can do that here by adding this line,

17:36.000 --> 17:37.000
you know, queue a redraw,

17:37.000 --> 17:39.000
where the mouse previously was,

17:39.000 --> 17:40.000
and that looks like this.

17:40.000 --> 17:42.000
You can see down here,

17:42.000 --> 17:44.000
in the list of background draws,

17:44.000 --> 17:46.000
we're queueing background draws,

17:46.000 --> 17:48.000
whenever I move the mouse around.

17:48.000 --> 17:52.000
And we can check,

17:52.000 --> 17:55.000
we can add back in our frames per second counter.

17:55.000 --> 17:57.000
And yeah, so this,

17:57.000 --> 17:59.000
we're getting like infinity frames per second.

17:59.000 --> 18:02.000
So this is definitely going to work well for us.

18:02.000 --> 18:04.000
Obviously we're not really drawing anything yet,

18:04.000 --> 18:06.000
so it's going to go down,

18:06.000 --> 18:09.000
but as a strategy, this is going to work great.

18:09.000 --> 18:13.000
Now it turns out that rather than just drawing the previous mouse frame,

18:13.000 --> 18:17.000
it is useful to union where the mouse previously was,

18:17.000 --> 18:18.000
and where it is now,

18:18.000 --> 18:20.000
so we can just go ahead and make that change,

18:20.000 --> 18:22.000
and that looks like this.
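
NOTE
A minimal sketch of the mouse-move step described above, assuming a 14x14 cursor and a single background redraw queue.
Compositor, move_mouse and background_redraws are hypothetical names for illustration, not the axle source.
The queued rectangle is the union of the cursor's previous and current frames, so one redraw covers both.
```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}
impl Rect {
    /// Smallest rectangle that contains both `self` and `o`.
    pub fn union(&self, o: &Rect) -> Rect {
        let x = self.x.min(o.x);
        let y = self.y.min(o.y);
        let right = (self.x + self.w).max(o.x + o.w);
        let bottom = (self.y + self.h).max(o.y + o.h);
        Rect { x, y, w: right - x, h: bottom - y }
    }
}
pub struct Compositor {
    pub cursor: Rect,
    pub background_redraws: Vec<Rect>,
}
impl Compositor {
    /// Move the cursor and queue one redraw covering both its old and new frames.
    pub fn move_mouse(&mut self, x: i32, y: i32) {
        let old = self.cursor;
        self.cursor = Rect { x, y, ..old };
        self.background_redraws.push(old.union(&self.cursor));
    }
}
```
For a small move the union is only slightly larger than the cursor itself, so this costs little compared with queueing the two frames separately.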

18:27.000 --> 18:30.000
Okay, now as we kind of looked at,

18:30.000 --> 18:32.000
there's some windows on the screen,

18:32.000 --> 18:34.000
but you can't really see that at all,

18:34.000 --> 18:35.000
like they're here,

18:35.000 --> 18:37.000
but obviously they're invisible.

18:37.000 --> 18:38.000
And the reason for that

18:38.000 --> 18:40.000
is we're never telling,

18:40.000 --> 18:41.000
we're never doing this operation,

18:41.000 --> 18:43.000
of like recomputing where its drawable regions are.

18:43.000 --> 18:45.000
So we can add that in next,

18:45.000 --> 18:46.000
we can say,

18:46.000 --> 18:48.000
these windows are animating in,

18:48.000 --> 18:49.000
so we can say,

18:49.000 --> 18:51.000
whenever we do an animation step,

18:51.000 --> 18:53.000
recompute the drawable regions of,

18:53.000 --> 18:54.000
you know,

18:54.000 --> 18:56.000
everything that's been dirtied by the animation.

18:56.000 --> 18:57.000
So that helps a bit,

18:57.000 --> 19:00.000
they've got these drawable rectangles now,

19:00.000 --> 19:02.000
but we still can't see them,

19:02.000 --> 19:05.000
because we also need to tell the compositor that,

19:06.000 --> 19:08.000
when we kind of,

19:08.000 --> 19:10.000
when we dirty a region,

19:10.000 --> 19:13.000
we need to redraw the stuff that's within it.

19:22.000 --> 19:24.000
So that works okay,

19:24.000 --> 19:25.000
but of course,

19:25.000 --> 19:27.000
we also need to clear up the background in that space,

19:27.000 --> 19:29.000
so we can do that now.

19:29.000 --> 19:31.000
So we'll keep track of,

19:31.000 --> 19:33.000
whenever a region has been dirtied,

19:33.000 --> 19:34.000
we'll keep track of,

19:34.000 --> 19:35.000
okay,

19:35.000 --> 19:37.000
what portion of the background is now unoccluded

19:37.000 --> 19:38.000
where it was previously occluded,

19:38.000 --> 19:41.000
and we can queue background draws for those.

19:41.000 --> 19:42.000
And this is,

19:42.000 --> 19:44.000
this is starting to look pretty good.

19:44.000 --> 19:46.000
I mean things fall apart a little bit

19:46.000 --> 19:48.000
when we like drag a window around,

19:48.000 --> 19:50.000
but we're definitely getting there,

19:50.000 --> 19:52.000
so we can keep going.

19:58.000 --> 19:59.000
All right,

19:59.000 --> 20:00.000
so now we can make it,

20:00.000 --> 20:01.000
so whenever we drag a window,

20:01.000 --> 20:05.000
we queue an update for the entire affected region of the drag.

20:09.000 --> 20:10.000
All right,

20:10.000 --> 20:12.000
we're getting somewhere.

20:12.000 --> 20:15.000
I noticed now that if I click on a window

20:15.000 --> 20:17.000
to change the Z order,

20:17.000 --> 20:20.000
nothing really updates until I drag the window around,

20:20.000 --> 20:21.000
which isn't right.

20:21.000 --> 20:22.000
I'm clicking this,

20:22.000 --> 20:24.000
which maybe you can't really see,

20:24.000 --> 20:26.000
but it's not popping to the front until I move it.

20:26.000 --> 20:28.000
So we'll also need to make it

20:28.000 --> 20:30.000
so when you modify the Z index,

20:30.000 --> 20:34.000
that triggers an update.

20:34.000 --> 20:39.000
And that's working pretty well now.

20:39.000 --> 20:41.000
So one thing here is,

20:41.000 --> 20:43.000
when you hover over the title bars,

20:43.000 --> 20:46.000
we're supposed to draw these like icons,

20:46.000 --> 20:49.000
which is kind of happening as I jiggle the mouse cursor over it,

20:49.000 --> 20:51.000
and triggers some dirty regions,

20:51.000 --> 20:52.000
but not as it should.

20:52.000 --> 20:55.000
So we'll make it so when you move the mouse over a title bar,

20:55.000 --> 20:58.000
we'll do a redraw there.

20:58.000 --> 21:03.000
And now that's working well.

21:03.000 --> 21:05.000
Cool.

21:05.000 --> 21:07.000
So keep it going a little bit.

21:07.000 --> 21:10.000
Let's add in some desktop shortcuts and see how that goes.

21:10.000 --> 21:13.000
So you can kind of see a few of them here,

21:13.000 --> 21:15.000
but the rest seem to be invisible,

21:15.000 --> 21:18.000
and actually they will show up if I, like, dirty them,

21:18.000 --> 21:21.000
by kind of hitting them with another shortcut,

21:21.000 --> 21:23.000
or something like that.

21:23.000 --> 21:24.000
That's kind of silly.

21:24.000 --> 21:25.000
Also they're meant to highlight,

21:25.000 --> 21:26.000
as I drag the cursor over them,

21:26.000 --> 21:29.000
and that's not happening.

21:29.000 --> 21:31.000
So we can make it so,

21:31.000 --> 21:34.000
in this kind of state machine of dealing with mouse input,

21:34.000 --> 21:36.000
as I drag over the shortcuts,

21:36.000 --> 21:39.000
they handle that, they get a refresh.

21:39.000 --> 21:41.000
Of course, they still weren't popping up

21:41.000 --> 21:43.000
like when the page is loaded, which isn't right.

21:43.000 --> 21:46.000
So we can make it so when the shortcuts are created,

21:46.000 --> 21:49.000
we'll also queue them to get a redraw.

21:49.000 --> 21:52.000
And now our shortcuts are looking good.

21:57.000 --> 21:59.000
Okay, that's it.

21:59.000 --> 22:01.000
So this is what we got to.

22:01.000 --> 22:04.000
Actually, maybe I'll show that with the,

22:04.000 --> 22:07.000
so you can see all these kind of dirty regions,

22:07.000 --> 22:11.000
see what happens, click around a bit more.

22:11.000 --> 22:13.000
So I change the Z index,

22:13.000 --> 22:15.000
you can see things get updated.

22:15.000 --> 22:17.000
Stuff can animate in and out,

22:17.000 --> 22:20.000
and you can see these kind of ghost rectangles,

22:20.000 --> 22:23.000
everything that's happening.

22:26.000 --> 22:29.000
So that's it. I've turned off that visualization of the,

22:29.000 --> 22:30.000
the dirty regions now.

22:30.000 --> 22:33.000
This is the kind of final state that we're going to get to in this,

22:33.000 --> 22:35.000
in this talk.

22:35.000 --> 22:37.000
We have animations.

22:37.000 --> 22:40.000
And everything is, kind of, you can see it in these bottom queues here.

22:40.000 --> 22:42.000
Everything is getting dirtied,

22:42.000 --> 22:45.000
as it should, without needing to do any unnecessary redraws.

22:55.000 --> 22:55.500
Cool.

22:55.500 --> 22:57.500
So one thing I thought I would mention is,

22:57.500 --> 22:59.500
over the course of writing this system,

22:59.500 --> 23:01.500
it can be kind of annoying,

23:01.500 --> 23:03.500
because you have to like boot up the operating system,

23:03.500 --> 23:04.500
and anyway,

23:04.500 --> 23:06.500
I made this mode in the compositor,

23:06.500 --> 23:09.500
where any time I would do anything like spawn a window,

23:09.500 --> 23:10.500
or like, you know,

23:10.500 --> 23:11.500
drag anything around,

23:11.500 --> 23:12.500
or move the mouse,

23:12.500 --> 23:15.500
it would dump all the actions to a file,

23:15.500 --> 23:16.500
which looks like this,

23:16.500 --> 23:17.500
it's a text file.

23:17.500 --> 23:19.500
And then I made another mode in the compositor,

23:19.500 --> 23:21.500
which could kind of take that file as input,

23:21.500 --> 23:23.500
and simulate all the actions happening.

23:24.500 --> 23:26.500
And then it would generate frames,

23:26.500 --> 23:29.500
like every frame would be outputted to a file,

23:29.500 --> 23:32.500
and then strung together into an animation.

23:32.500 --> 23:34.500
And then that allowed me to kind of test that,

23:34.500 --> 23:36.500
as I was making changes to the compositor,

23:36.500 --> 23:38.500
I wasn't introducing any visual regressions

23:38.500 --> 23:40.500
in what got rendered.
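
NOTE
A minimal sketch of the record-and-replay idea described above, written in Rust.
The `Event` variants, the text log format, and the names `record` and `replay` are
assumptions made for illustration; the talk does not show the compositor's real format.
```rust
/// One user action the compositor could log (hypothetical event set).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    MouseMove { x: i32, y: i32 },
    SpawnWindow { id: u32, w: u32, h: u32 },
}
/// Serialize events to a line-oriented text log, one action per line.
pub fn record(events: &[Event]) -> String {
    let mut out = String::new();
    for e in events {
        match e {
            Event::MouseMove { x, y } => out.push_str(&format!("mouse_move {} {}\n", x, y)),
            Event::SpawnWindow { id, w, h } => {
                out.push_str(&format!("spawn_window {} {} {}\n", id, w, h))
            }
        }
    }
    out
}
/// Parse one log line back into an event; returns None for lines that do not parse.
fn parse_line(line: &str) -> Option<Event> {
    let mut parts = line.split_whitespace();
    match parts.next()? {
        "mouse_move" => Some(Event::MouseMove {
            x: parts.next()?.parse().ok()?,
            y: parts.next()?.parse().ok()?,
        }),
        "spawn_window" => Some(Event::SpawnWindow {
            id: parts.next()?.parse().ok()?,
            w: parts.next()?.parse().ok()?,
            h: parts.next()?.parse().ok()?,
        }),
        _ => None,
    }
}
/// Turn a recorded log back into events that can be fed to the compositor offline.
pub fn replay(log: &str) -> Vec<Event> {
    log.lines().filter_map(parse_line).collect()
}
```
A regression check along these lines would replay the same log before and after a
change, render each frame to a file, and compare the two sets of frames.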

23:40.500 --> 23:42.500
Because it turns out that,

23:42.500 --> 23:44.500
when you're writing a compositor like this,

23:44.500 --> 23:48.500
it can be really easy to, like, accidentally break the rendering.

23:48.500 --> 23:50.500
So actually, over the course of making this presentation,

23:50.500 --> 23:54.500
I saved a few snapshots of kind of where things went wrong a bit.

23:54.500 --> 23:56.500
So here's one example,

23:56.500 --> 23:57.500
like,

23:57.500 --> 24:00.500
this is where it's that.

24:00.500 --> 24:04.500
It's these like rectangles showing the region that was dirty,

24:04.500 --> 24:08.500
but I wasn't properly excluding them from getting marked as dirty.

24:08.500 --> 24:09.500
So anyway,

24:09.500 --> 24:11.500
just nonsense.

24:14.500 --> 24:16.500
Oh yeah, so you get all kinds of situations like this,

24:16.500 --> 24:19.500
where you kind of forget to redraw something quite right,

24:19.500 --> 24:22.500
and often like the mouse cursor will kind of unstick it,

24:22.500 --> 24:25.500
because that's a pretty reliable redraw.

24:25.500 --> 24:27.500
It can be kind of fun.

24:27.500 --> 24:32.500
I don't really know, oh yeah,

24:32.500 --> 24:34.500
so just another,

24:34.500 --> 24:36.500
window drags are broken.

24:36.500 --> 24:38.500
There's a few of these.

24:40.500 --> 24:41.500
Yeah.

24:41.500 --> 24:49.500
All right.

24:49.500 --> 24:50.500
So that's it.

24:50.500 --> 24:52.500
Thanks for watching.

24:52.500 --> 24:57.500
This presentation is up at axleos.com slash compositor.

24:57.500 --> 25:01.500
If you want to try out any of these visualizations for yourself,

25:01.500 --> 25:06.500
and then I've left a couple links to the axle OS GitHub project there.

25:06.500 --> 25:09.500
So yeah, thanks for your attention.

25:09.500 --> 25:12.500
Thank you.

25:16.500 --> 25:17.500
Hey.

25:17.500 --> 25:18.500
Good talk.

25:18.500 --> 25:20.500
I like all the optimizations

25:20.500 --> 25:23.500
you did to avoid redrawing unnecessary regions.

25:23.500 --> 25:24.500
I was wondering,

25:24.500 --> 25:26.500
I guess everything is running on a single CPU,

25:26.500 --> 25:29.500
but now that systems are more multi core,

25:29.500 --> 25:32.500
would it make sense to cut up the screen

25:32.500 --> 25:34.500
into multiple sections and have,

25:34.500 --> 25:35.500
for example,

25:35.500 --> 25:37.500
four CPUs, one for a corner each?

25:37.500 --> 25:39.500
Or would that just complicate synchronization?

25:39.500 --> 25:41.500
Or would they not try to insert just what?

25:41.500 --> 25:43.500
Would it actually be beneficial?

25:43.500 --> 25:44.500
Yeah.

25:44.500 --> 25:46.500
I think that would be quite neat.

25:46.500 --> 25:47.500
I mean,

25:47.500 --> 25:51.500
the slow part is definitely still writing to the video memory.

25:51.500 --> 25:55.500
One note about axle OS in general.

25:55.500 --> 25:57.500
It is like SMP capable,

25:57.500 --> 25:59.500
so it can do multi core,

25:59.500 --> 26:02.500
but it's single threaded in the context of a process,

26:02.500 --> 26:04.500
which is just because I never got around

26:02.500 --> 26:04.500
to doing multithreading,

26:06.500 --> 26:08.500
so maybe you have to deal with that somehow.

26:08.500 --> 26:10.500
But yeah, you could do it in multiple processes if you wanted to.

26:10.500 --> 26:12.500
Assuming you had a kernel there

26:10.500 --> 26:12.500
that is perfectly multithreading capable,

26:12.500 --> 26:14.500
based on your experience writing compositors, would that

26:14.500 --> 26:18.500
boost performance quite a bit?

26:21.500 --> 26:24.500
I mean, if you say the bottleneck is writing to the framebuffer,

26:21.500 --> 26:24.500
if, instead of having one core writing to the framebuffer,

26:24.500 --> 26:27.500
you have four cores writing to different regions of the framebuffer,

26:30.500 --> 26:33.500
is the memory subsystem going to be the bottleneck or the CPUs?

26:33.500 --> 26:34.500
Yeah, I don't know.

26:34.500 --> 26:36.500
I mean, you definitely get into the problem of like tearing,

26:36.500 --> 26:38.500
because if you're having multiple processes writing to it,

26:36.500 --> 26:38.500
you still need some kind of synchronization

26:40.500 --> 26:43.500
to make sure you don't see like a midframe artifact.

26:43.500 --> 26:45.500
I would say that if we were trying to make this faster,

26:45.500 --> 26:49.500
the thing to do would definitely be to write a GPU driver,

26:49.500 --> 26:54.500
and do this with GPU acceleration instead,

26:54.500 --> 26:59.500
for the question of like making this faster,

26:59.500 --> 27:01.500
maybe do it on the GPU.

27:02.500 --> 27:03.500
I know this isn't what you asked,

27:03.500 --> 27:05.500
but I'll just kind of word vomit it anyway.

27:05.500 --> 27:10.500
I think that for writing a GPU driver in a hobby OS,

27:10.500 --> 27:12.500
like this, you basically have three options.

27:12.500 --> 27:16.500
You can target, like, a really old GPU,

27:16.500 --> 27:18.500
that is kind of well known in the hobby OS space,

27:18.500 --> 27:20.500
and well documented, et cetera.

27:20.500 --> 27:25.500
Target a modern GPU, or do like a paravirtualized GPU driver,

27:25.500 --> 27:28.500
where, like, I'm running this often in QEMU.

27:28.500 --> 27:31.500
And you have some kind of paravirtualized system

27:31.500 --> 27:33.500
where QEMU helps

27:33.500 --> 27:36.500
and offloads to the GPU on your behalf.

27:36.500 --> 27:38.500
So I looked into the first option,

27:38.500 --> 27:41.500
writing a GPU driver for like an old,

27:41.500 --> 27:44.500
you know, well known graphics card.

27:44.500 --> 27:47.500
And what I found was that all of these cards

27:47.500 --> 27:49.500
aren't really interesting for my purposes,

27:49.500 --> 27:52.500
because I want a high resolution desktop,

27:52.500 --> 27:53.500
and all of these cards,

27:53.500 --> 27:54.500
like I forgot what it's called,

27:54.500 --> 27:57.500
like the Voodoo 3DFX something,

27:58.500 --> 28:02.500
cap out at, say, 800p or whatever.

28:02.500 --> 28:04.500
So I didn't want to do that.

28:04.500 --> 28:06.500
I also didn't want to write a paravirtualized driver,

28:06.500 --> 28:08.500
because that kind of gives up the illusion

28:08.500 --> 28:10.500
that this is intended for real hardware,

28:10.500 --> 28:12.500
like, I often run it in QEMU,

28:12.500 --> 28:14.500
but in my mind, that's just for convenience,

28:14.500 --> 28:16.500
and I don't really want to commit to having

28:16.500 --> 28:18.500
a very different rendering path

28:18.500 --> 28:21.500
in the emulated environment versus on real hardware.

28:21.500 --> 28:25.500
So that leaves writing like a GPU driver,

28:25.500 --> 28:27.500
you know, a modern GPU driver,

28:27.500 --> 28:30.500
which I think would be a totally reasonable thing

28:30.500 --> 28:31.500
for me to go and do,

28:31.500 --> 28:34.500
but they're just quite complicated,

28:34.500 --> 28:36.500
and I never kind of got around to it.

28:36.500 --> 28:38.500
Thank you for the question.

28:38.500 --> 28:40.500
Thanks for the talk.

28:40.500 --> 28:41.500
It was amazing.

28:41.500 --> 28:42.500
Thank you.

28:42.500 --> 28:43.500
Thank you.

28:43.500 --> 28:45.500
So 25 years ago,

28:45.500 --> 28:47.500
when we started,

28:47.500 --> 28:49.500
I mean, before we started doing this,

28:49.500 --> 28:50.500
compositing approach,

28:50.500 --> 28:52.500
we had been doing direct rendering,

28:53.500 --> 28:56.500
but we do have a lot of different rendering

28:56.500 --> 28:58.500
and good things, right?

28:58.500 --> 29:00.500
Of course, if you have a powerful GPU,

29:00.500 --> 29:02.500
there's no reason to go back to it.

29:02.500 --> 29:04.500
But if you are going somewhere under it,

29:04.500 --> 29:05.500
I was wondering,

29:05.500 --> 29:08.500
would it be possible to somehow combine these two approaches

29:08.500 --> 29:12.500
so do the compositing approach

29:12.500 --> 29:13.500
in complicated situations,

29:13.500 --> 29:16.500
and maybe switch to direct rendering

29:16.500 --> 29:18.500
when, you know,

29:18.500 --> 29:20.500
one window is just changing something

29:20.500 --> 29:25.100
But in this kind of solo homegrown project,

29:25.100 --> 29:28.940
the biggest liability is me making mistakes

29:28.940 --> 29:31.860
and me, like, trampling on some memory

29:31.860 --> 29:34.260
or like the message subsystem isn't setting up

29:34.260 --> 29:35.420
the shared buffers properly,

29:35.420 --> 29:38.020
or I haven't clipped the rectangles properly

29:38.020 --> 29:41.380
and then I get memory corruption in some random who knows.

29:41.380 --> 29:44.900
And so I definitely try to not bifurcate stuff like that,

29:44.900 --> 29:48.180
but I think it would be a valid thing to try.

29:48.180 --> 29:50.340
I guess hopefully you would get to a place

29:50.340 --> 29:52.220
where the compositing overhead is, yeah.

29:52.220 --> 29:55.300
But if it's a partial rectangle for a framebuffer,

29:55.300 --> 29:57.220
you'll be striding through multiple areas,

29:57.220 --> 29:58.700
like multiple, absolutely, yeah.

29:58.700 --> 30:01.740
So it's more like one memcpy per row, right?

30:01.740 --> 30:02.740
Yeah.
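
NOTE
The "one memcpy per row" point above can be sketched in Rust as follows. The `Rect`
type and the `blit_rect` name are hypothetical, and the sketch assumes both buffers
share the same stride and that the rectangle lies fully inside them (it panics otherwise).
```rust
/// A rectangle in pixel coordinates (hypothetical helper type).
#[derive(Clone, Copy)]
pub struct Rect {
    pub x: usize,
    pub y: usize,
    pub w: usize,
    pub h: usize,
}
/// Copy `rect` from `src` into `dst`. Both buffers hold `stride` pixels per row.
/// A partial-width rectangle is not contiguous in memory, so each row is copied separately.
pub fn blit_rect(src: &[u32], dst: &mut [u32], stride: usize, rect: Rect) {
    for row in rect.y..rect.y + rect.h {
        let start = row * stride + rect.x;
        let end = start + rect.w;
        // One contiguous, memcpy-style copy for this row of the rectangle.
        dst[start..end].copy_from_slice(&src[start..end]);
    }
}
```
Only a rectangle that spans the full width of the buffer could be copied in a single
call, because only then are its rows adjacent in memory.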

30:04.740 --> 30:05.500
Thank you.

30:05.500 --> 30:11.300
This resembles what I did for my previous talk.

30:11.300 --> 30:14.060
And the question is, is VBE dead by now?

30:14.060 --> 30:17.700
Couldn't you use the VESA acceleration infrastructure?

30:18.260 --> 30:20.940
Yeah, so I'm getting the framebuffer from VESA,

30:20.940 --> 30:24.140
but I'm not kind of using anything else.

30:24.140 --> 30:27.460
Because it should provide you at least a blitter,

30:27.460 --> 30:29.300
which would speed you up.

30:29.300 --> 30:31.940
And you can avoid writing drivers.

30:31.940 --> 30:33.060
Yeah, I haven't looked at that,

30:33.060 --> 30:34.340
but definitely worth a look.

30:34.340 --> 30:35.620
Thanks for the heads up.

30:35.620 --> 30:38.140
I think I've got a, I've got a note that I'm out of time,

30:38.140 --> 30:40.140
but thank you everyone.

30:40.580 --> 30:42.300
Thank you for joining this video,

30:42.300 --> 30:43.140
thank you.

30:48.180 --> 30:48.980
Thank you.

