WEBVTT

00:00.000 --> 00:10.520
All right, last talk of the open media room, guys.

00:10.520 --> 00:16.400
I'm going to be talking about open source video mixing with real time control and also

00:16.400 --> 00:21.400
more specifically syncing multiple videos.

00:21.400 --> 00:24.840
I'm going to start with the project that this was done with. This is MistServer.

00:24.840 --> 00:26.880
It stores media in a certain way.

00:26.880 --> 00:31.520
There is a shared memory page in there that has the stream metadata.

00:31.520 --> 00:35.120
Now for each track, for the audio track, the video track, potentially multiple tracks, there is

00:35.120 --> 00:39.440
another page in shared memory that stores information about the track.

00:39.440 --> 00:44.520
And it has references to more pages, even more in shared memory that have roughly

00:44.520 --> 00:46.720
a minute of media data each.

00:46.720 --> 00:50.920
And what we do is that we write media as it comes in to one of these pages; at some point

00:50.920 --> 00:52.120
that fills up.

00:52.120 --> 00:56.400
And then the older pages get deleted and we keep like a small stack of them going so

00:56.400 --> 00:57.920
to speak.

00:57.920 --> 01:02.560
This works great for media because usually you want to be able to go back a little bit

01:02.560 --> 01:03.560
stuff like that.

01:03.560 --> 01:07.720
And this lets you keep stuff in memory without needing to continuously rewrite stuff.

01:07.720 --> 01:11.480
But it doesn't work so well for the raw video.

01:11.480 --> 01:15.080
So for raw tracks we do something else: if you have a raw video track, we don't do all

01:15.080 --> 01:16.080
those extra pages.

01:16.080 --> 01:20.920
Instead, we allocate a large ring buffer of raw pixel frames in the video metadata page

01:20.920 --> 01:21.920
in shared memory.

01:21.920 --> 01:26.000
And then we overwrite those in a loop.

01:26.000 --> 01:33.200
This lets you do raw tracks without needing to continuously create files in shared memory,

01:33.200 --> 01:37.680
which tends to get very inefficient at high bitrates.
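
NOTE A minimal Python sketch of the ring-buffer idea described in the talk, not MistServer's actual code. The class name FrameRing, the slot count and the frame size are assumptions made for illustration, and a plain bytearray stands in for the shared memory page.
```python
class FrameRing:
    """Fixed-size ring of raw frames that is overwritten in a loop."""
    def __init__(self, slots: int, frame_bytes: int) -> None:
        self.slots = slots
        self.frame_bytes = frame_bytes
        self.buf = bytearray(slots * frame_bytes)  # allocated once, never resized
        self.written = 0  # total number of frames written so far
    def write(self, frame: bytes) -> None:
        if len(frame) != self.frame_bytes:
            raise ValueError("frame has the wrong size")
        start = (self.written % self.slots) * self.frame_bytes
        self.buf[start:start + self.frame_bytes] = frame  # overwrites the oldest slot
        self.written += 1
    def read(self, index: int) -> bytes:
        """Return frame number `index` if it has not been overwritten yet."""
        if index < 0 or index >= self.written or index < self.written - self.slots:
            raise IndexError("frame is not (or no longer) in the ring")
        start = (index % self.slots) * self.frame_bytes
        return bytes(self.buf[start:start + self.frame_bytes])
    def newest(self) -> bytes:
        """Return the most recently written frame."""
        return self.read(self.written - 1)
```
A reader that falls too far behind gets an IndexError here; a compositor could then simply fall back to newest().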

01:37.680 --> 01:42.360
So with that in place, we can now do encoding or decoding to and from raw formats and keep

01:42.360 --> 01:44.800
those in memory the same way we keep normal streams.

01:44.800 --> 01:46.880
We have some ways to do this.

01:46.880 --> 01:52.640
One of them is MistProcMKVExec, which is literally a thing that lets you pipe a non-raw

01:52.640 --> 01:57.000
stream in Matroska format into a binary of your choice, and it expects the raw format out

01:57.000 --> 02:01.440
again in Matroska format, because you can actually send raw pixels over Matroska, in case people

02:01.440 --> 02:02.440
didn't know that.

02:02.440 --> 02:08.080
We also have a version that effectively uses the top one to call FFmpeg with the right parameters

02:08.080 --> 02:09.080
and do that.

02:09.080 --> 02:14.880
And then the last one, MistProcAV, is an integration with the FFmpeg libraries, so libavcodec.

02:14.880 --> 02:18.400
And it does it more directly so that you don't have to send things over a pipe.

02:18.400 --> 02:23.080
That's the fastest one, but it requires compiling with the FFmpeg libraries, which makes

02:23.080 --> 02:24.840
distributions a little trickier.

02:24.840 --> 02:31.520
We also support reading the raw pixels directly from a webcam with Video4Linux.

02:31.520 --> 02:35.680
So now that we have streams in a raw format, we can do fun things with it.

02:35.680 --> 02:42.720
So we came up with a compositor process, which takes these raw video metadata buffers from

02:42.720 --> 02:46.360
shared memory and will then read from them directly.

02:46.360 --> 02:49.760
And the neat thing is that if one of the streams goes away, well, we simply make

02:49.760 --> 02:54.040
the pointer invalid, but we can keep rendering and not really care.

02:54.040 --> 02:57.040
If one of them falls behind, well, we just use the newest frame we have.

02:57.040 --> 03:00.320
If one of them goes ahead, well, we can theoretically go back a little bit, but at some point

03:00.320 --> 03:03.920
the ring buffer runs out and we'll get newer data than we wanted, but at least we'll

03:03.920 --> 03:04.920
have video.

03:04.920 --> 03:09.480
It will display something, so that's neat.

03:09.480 --> 03:15.480
The compositor receives configuration in JSON format from the control process and then writes

03:15.520 --> 03:17.160
it to another pixel buffer.

03:17.160 --> 03:21.400
Which we can then do the opposite with, to turn it back into something useful, like H.264,

03:21.400 --> 03:23.640
HEVC, AV1, whatever.

03:23.640 --> 03:27.080
And it lets us create cool little things like this with multiple feeds in a grid, but

03:27.080 --> 03:28.080
they don't have to be in a grid.

03:28.080 --> 03:30.680
You can put them in any kind of free form control.

03:30.680 --> 03:32.480
This is what the next slide should be about.

03:32.480 --> 03:33.480
Yeah.

03:33.480 --> 03:34.800
We also have a fancy web interface.

03:34.800 --> 03:37.800
I know it's a little small, sorry.

03:37.800 --> 03:42.560
There are bigger versions if you install the software.

03:42.600 --> 03:45.920
There at the top you can select what sources you want, assuming that they're in

03:45.920 --> 03:48.000
a raw format, which this thing can do for you as well.

03:48.000 --> 03:51.760
You can pick if you want a grid, if you want freestyle, or if you want, like, one of those

03:51.760 --> 03:56.560
old-fashioned security camera views, and then you can make it happen.

03:56.560 --> 03:59.440
As a little trick, we also added in support for it.

03:59.440 --> 04:02.800
It's hard to see here, but we can actually put titles in the videos, so you can actually

04:02.800 --> 04:04.960
see what they're named or something like that.

04:04.960 --> 04:12.520
You can put your own text in there; we took the open IBM fonts to do that.

04:12.520 --> 04:17.000
And the neat thing is that if you change this configuration in the web interface or using

04:17.000 --> 04:21.360
JSON yourself with API calls, it updates in real time without anything going down.

04:21.360 --> 04:28.040
It immediately moves on to the next one, which brings me to the next problem, that none of

04:28.040 --> 04:29.040
that is in sync.

04:29.040 --> 04:32.000
Like I said, it takes the last frame for everything, and everyone can take the last

04:32.000 --> 04:34.840
frame for everything, but it doesn't mean that they are synchronized to each other.

04:34.840 --> 04:39.840
If you have two cameras in the one thing or three or four cameras, there will be a slight

04:39.840 --> 04:44.200
desync between them, because some cameras are faster, some cameras are slower.

04:44.200 --> 04:47.400
Maybe one of them is coming over the network, and the others are not.

04:47.400 --> 04:51.400
Maybe one is going over a different media protocol than the others, and it's not fun.

04:51.400 --> 04:52.400
So how do you solve this?

04:52.400 --> 04:54.680
Does anyone have any ideas?

04:54.680 --> 04:55.680
No?

04:55.680 --> 04:57.680
Okay, well, this is the solution.

04:57.680 --> 05:00.040
This looks really stupid, but it's this.

05:00.040 --> 05:05.600
This is a QR code that basically holds a UTC time in milliseconds.

05:05.600 --> 05:07.160
It's just a Unix time.

05:07.160 --> 05:10.440
Binary encoded, so that it becomes as small a code as possible.
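
NOTE A rough Python sketch of packing a Unix time in milliseconds into a small payload before it is turned into a QR code. The talk does not spell out the exact encoding, so the big-endian layout and the function names encode_timecode and decode_timecode are assumptions for illustration only; drawing the QR code itself is left out.
```python
import time
def encode_timecode(unix_ms: int) -> bytes:
    """Pack a Unix time in milliseconds into as few big-endian bytes as possible."""
    if unix_ms < 0:
        raise ValueError("timestamp must not be negative")
    length = max(1, (unix_ms.bit_length() + 7) // 8)  # at least one byte, even for 0
    return unix_ms.to_bytes(length, "big")
def decode_timecode(payload: bytes) -> int:
    """Inverse of encode_timecode: recover the millisecond timestamp."""
    return int.from_bytes(payload, "big")
now_ms = time.time_ns() // 1_000_000  # current UTC time in milliseconds
payload = encode_timecode(now_ms)
print(len(payload), decode_timecode(payload) == now_ms)
```
A shorter payload means fewer QR modules, which should make the code easier for a blurry camera to read.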

05:10.440 --> 05:12.840
You can change the frame rate, I mean, you can actually do this in real time.

05:12.840 --> 05:19.080
You can make this go faster, you can make it go slower, and each of these codes contains

05:19.080 --> 05:20.080
the time code.

05:20.080 --> 05:23.160
The text at the top is just there for debugging, you don't actually need it.

05:23.160 --> 05:27.240
What we did is that if the compositor detects in one of the feeds this code, it knows

05:27.240 --> 05:28.240
ah.

05:28.240 --> 05:31.400
When I see a frame with the code in it, that is the time that that frame should be displayed

05:31.400 --> 05:34.080
at or is intended to be displayed at.

05:34.080 --> 05:37.520
It then looks at all the codes that we see in all of the feeds.

05:37.520 --> 05:42.240
Takes the slowest of them, slows them all down to that pace, and boom, we're in sync.
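
NOTE A small Python sketch of the "slow everything down to the slowest feed" step, under stated assumptions. It assumes each feed's latency is measured as receive time minus the time stored in the code; the function name extra_delays and the sample feed names and numbers are hypothetical, not taken from MistServer.
```python
def extra_delays(latency_ms: dict[str, int]) -> dict[str, int]:
    """Per-feed delay in ms that lines every feed up with the slowest one."""
    if not latency_ms:
        return {}
    slowest = max(latency_ms.values())
    return {feed: slowest - lat for feed, lat in latency_ms.items()}
# Hypothetical observations: (time encoded in the code, time the frame was received).
seen = {"cam_a": (1_000, 1_120), "cam_b": (1_000, 1_450), "cam_c": (1_000, 1_200)}
latency = {feed: received - coded for feed, (coded, received) in seen.items()}
print(extra_delays(latency))
```
The slowest feed gets a delay of zero and every other feed is held back by the difference, so a frame with the same time code comes out of all feeds together.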

05:42.240 --> 05:45.120
So what does that look like?

05:45.120 --> 05:47.880
Well, I have pre-recorded the demo video.

05:47.880 --> 05:51.360
I would have loved to do it here in real time, but the setup alone would take 15 minutes,

05:51.360 --> 05:53.560
so that wasn't going to happen.

05:53.560 --> 05:58.320
So here is a little demo video that one of my colleagues recorded, who is sitting

05:58.320 --> 06:00.040
down the back.

06:00.040 --> 06:04.440
So this is MistServer's interface; for people who are familiar, it looks familiar. We're

06:04.440 --> 06:10.320
showing off here that we have a camera feed that it is decoding to raw, or I think that

06:10.320 --> 06:16.360
it is going to be set up to be decoding to, yeah, here we're adding a decoder.

06:16.360 --> 06:21.080
And then the raw pixels are available for putting in the compositor, so here we have

06:21.080 --> 06:25.480
a compositor set up currently with three feeds, and we're going to show off that all

06:25.480 --> 06:28.120
of them show the same thing, but from different angles, and that they're not

06:28.120 --> 06:29.120
in sync.

06:29.120 --> 06:33.040
He's going to clap his hands, and you'll see that they're not in sync at all.

06:33.040 --> 06:36.400
I mean, they're close, but they're not quite there.

06:36.400 --> 06:40.880
So then the next step is we're going to show that code on all the feeds.

06:40.880 --> 06:45.880
Yeah, he's quickly showing off how that interface works, that's not really important, not

06:45.880 --> 06:49.240
right now anyway.

06:49.240 --> 06:53.760
So we're going to pull up the QR code generator, and we made that a progressive web app,

06:53.760 --> 06:57.280
so you can just load it on your phone and install it as, like, a little app icon

06:57.280 --> 06:58.280
that you just pull up.

06:58.280 --> 07:01.680
It doesn't require internet, because it's all local.

07:01.680 --> 07:05.000
So you show that to the feeds, and you'll see that at the top of every feed there's debugging

07:05.000 --> 07:08.400
info, so at the top you can see some text. I hope it's readable; probably not, because

07:08.400 --> 07:09.400
it's tiny.

07:09.400 --> 07:12.240
It currently says it is guessing the time, based on the receive time.

07:12.240 --> 07:15.520
And as soon as it has managed to scan that code, which, depending on your camera, might

07:15.520 --> 07:20.680
take a while to get it not blurry, everyone has experience scanning QR codes, that's, but

07:20.680 --> 07:22.880
once it becomes not blurry, there we go.

07:22.920 --> 07:25.520
It has now synced by QR code, so it knows what the time is.

07:25.520 --> 07:28.240
You'll see that the clock that is running there is also synchronized to the clock that

07:28.240 --> 07:29.920
we see on the phone itself.

07:29.920 --> 07:36.200
And we do that in all three of them, and once all three of them have the time codes, then

07:36.200 --> 07:38.480
they'll hopefully be in sync.

07:38.480 --> 07:45.400
I know that they will be, because it's pre-recorded.

07:45.400 --> 07:49.960
And you can already, if you are able to see this, it's hard to tell because there's

07:49.960 --> 07:53.320
nothing moving, those two are already in sync with each other now, and the third one

07:53.320 --> 07:56.320
will join them as soon as the code is scanned.

07:56.320 --> 07:59.840
Of course, you might not have the ability to get all your feeds in sync, and if you don't

07:59.840 --> 08:05.800
scan a code on one of the feeds, that one just uses the latest frame, best effort.

08:05.800 --> 08:09.200
But now we have all of them, and now he's going to do the hand clap again, and we'll see

08:09.200 --> 08:17.640
that it is much more in sync.

08:17.640 --> 08:18.640
There we go.

08:18.640 --> 08:20.120
All synced up.

08:20.120 --> 08:25.240
And now, for bonus points, we're going to add a webcam to the multiviewer, which will

08:25.240 --> 08:28.680
real-time update it.

08:28.680 --> 08:32.100
And here you can see some of the interface as well, how you can position it, like with

08:32.100 --> 08:38.480
a nice drag-and-drop thing, and you can still, yep, and then go back to the feed, and it

08:38.480 --> 08:42.280
will be right there, normally it doesn't take that long to load, but it actually shut down

08:42.280 --> 08:46.400
because no one was watching it.

08:46.400 --> 08:49.000
And you'll see that the other three are still synced, because they didn't forget their

08:49.000 --> 08:53.840
time codes.

08:53.840 --> 08:57.600
And now, my colleague is realizing he forgot to put the debug info on that one, so he's

08:57.600 --> 09:04.840
going to turn the debug info on.

09:04.840 --> 09:10.360
There we go.

09:10.360 --> 09:13.720
And then once that is enabled, we can see that it's also in sync.

09:13.720 --> 09:18.040
You can see it's already set sync by QR codes.

09:18.040 --> 09:22.440
And now we're going to do one final hand clap with the fun fact that this webcam is terrible.

09:22.440 --> 09:28.400
You'll see the frame rate is absolutely, absolutely, it's terrible quality, and still, even though

09:28.400 --> 09:32.120
the frame rate is wildly different from the others, it is synced pretty well, as well as

09:32.120 --> 09:36.360
we can do with that low frame rate.

09:36.360 --> 09:40.840
So yeah, that's the demo video.

09:40.840 --> 09:46.440
When can you use this? Right now. Two days ago, we released 3.10 of MistServer and it includes this

09:46.440 --> 09:47.440
feature.

09:47.440 --> 09:48.920
It's public domain code.

09:48.920 --> 09:54.640
So anyone can use it, anyone can steal it, you can reuse our sync codes, we don't care.

09:54.640 --> 09:56.640
It'll work.

09:56.640 --> 10:00.920
This brings me to the end, which is: are there any questions?

10:01.840 --> 10:02.840
Yeah, go ahead.

10:02.840 --> 10:11.520
So you've shown us how you synchronize the clocks on the various media sources at one

10:11.520 --> 10:12.520
question.

10:12.520 --> 10:15.720
How do you deal with the clocks drifting after that?

10:15.720 --> 10:17.040
That's a good question.

10:17.040 --> 10:23.160
The question is, we've shown how the clocks are synced, but how do you deal with drift afterwards?

10:23.160 --> 10:24.960
Right now, we don't.

10:24.960 --> 10:29.240
We found that in reality, most video sources tend to not drift that much.

10:29.240 --> 10:34.680
It will be like maybe a few seconds per day, which for most purposes if you're doing

10:34.680 --> 10:37.960
a broadcast of an hour or two, you will not see noticeable drift.

10:37.960 --> 10:42.440
If it does drift, you can simply show the QR code again and it will update its sync.

10:42.440 --> 10:44.680
But it's not ideal, but it works.

10:44.680 --> 10:48.720
So if you have like a particularly bad camera that tends to drift a lot, just show the

10:48.720 --> 10:52.760
code to it after 30 minutes or something and you'll stay in sync.

10:52.760 --> 10:56.000
Of course, you can also put it in the feed somewhere in a corner or something and then

10:56.000 --> 10:57.000
normal mode.

10:57.000 --> 10:58.000
Hopefully.

10:58.760 --> 11:03.880
Yes, but the next time you're synced, you will have to be smaller than some of the streams

11:03.880 --> 11:06.480
and all you have to synchronize the code together.

11:06.480 --> 11:10.680
So the follow-up question is, won't that stop some of the streams?

11:10.680 --> 11:11.680
Yes.

11:11.680 --> 11:15.920
What we actually do is we rewind all the other streams if necessary immediately.

11:15.920 --> 11:19.480
So you'll see some of them repeat a small section to get in sync.

11:19.480 --> 11:22.680
But it will keep going and the frame rate of the output will stay steady.

11:22.680 --> 11:24.920
So that's pretty nice.

11:24.920 --> 11:25.680
Anyone else?

11:25.680 --> 11:26.480
Yes, go ahead.

11:26.720 --> 11:28.880
Do you vote on some piranhas?

11:28.880 --> 11:35.320
So if you think your code is worth it in the latest event, so are you going to sign this

11:35.320 --> 11:36.320
order?

11:36.320 --> 11:37.600
Ah, that's a good question.

11:37.600 --> 11:39.320
The question is, are the QR codes signed?

11:39.320 --> 11:42.760
The answer is, no, not in this version.

11:42.760 --> 11:47.480
We did think of that and the plan is to put a prefix in front of the code that you can

11:47.480 --> 11:51.120
determine yourself and only if that prefix is visible, then it would work.

11:51.120 --> 11:53.040
That's not quite signing, of course.

11:53.040 --> 11:57.880
The idea is that you would show these codes to the camera before the stream goes live.

11:57.880 --> 12:00.160
So other people don't see your prefix.

12:00.160 --> 12:05.160
And then theoretically, if someone is walking around with a sign that says, you know, it's 1970 or

12:05.160 --> 12:09.200
something, then it will not ruin your life.

12:09.200 --> 12:10.880
Of course, we could add signing in the future.

12:10.880 --> 12:12.520
This is just V1, you know?

12:12.520 --> 12:16.760
And if people say that this is going to be a problem, then yeah, we'll definitely add

12:16.760 --> 12:17.760
signing to it.

12:17.760 --> 12:22.920
And I bet that if this becomes a popular technique, we might have to.

12:22.920 --> 12:25.680
Yeah, we did think of that as well.

12:25.680 --> 12:26.680
Anyone else?

12:26.680 --> 12:27.680
Yes, go ahead.

12:27.680 --> 12:32.160
Have you ever considered syncing by audio, which is common, like, obviously?

12:32.160 --> 12:35.640
Yeah, we thought about syncing, sorry, let me repeat the question.

12:35.640 --> 12:37.920
Have you considered syncing by audio?

12:37.920 --> 12:40.040
Yes, we thought about that.

12:40.040 --> 12:45.000
For now, we figured that video was the important thing to have in sync in this case.

12:45.000 --> 12:47.760
We want to add more audio support to the compositor as well.

12:47.760 --> 12:51.840
Right now, we can copy an audio feed from one of the streams, but we can't, for example,

12:51.840 --> 12:52.840
mix them together yet.

12:52.840 --> 12:57.200
So if you have audio in two of them and you want to mix them, you can't do that yet.

12:57.200 --> 12:59.760
That's coming, but not quite.

12:59.760 --> 13:02.720
And at that point, I imagine it will become more important as well, especially if you have

13:02.720 --> 13:06.440
like, audio-only streams that you want to keep in sync with the video.

13:06.440 --> 13:10.440
And we were thinking of putting, so some frequency signals in there that effectively

13:10.440 --> 13:13.920
do the same thing as the QR code, but then with audio.

13:13.920 --> 13:19.360
But if you have cool ideas, then share them.

13:19.360 --> 13:20.360
Anyone else?

13:20.360 --> 13:21.360
Yes, go ahead.

13:21.520 --> 13:25.160
Are there a visible time that you are allowed to see?

13:25.160 --> 13:26.160
What was that, sorry?

13:26.160 --> 13:32.520
I already said on the amount of time that you are allowed to see, because if you are using

13:32.520 --> 13:33.520
the memory somehow.

13:33.520 --> 13:37.120
Ah, so is there a limit to how much desync you can have?

13:37.120 --> 13:43.200
Currently, we store, I think, from the top of my head, 200 frames for every raw feed; we

13:43.200 --> 13:49.480
figured that was a nice in-between of not having too few frames to sync within a reasonable

13:49.480 --> 13:53.480
time, and also not having so many that you run out of RAM all the time.

13:53.480 --> 13:57.080
That is configurable as a compile option.

13:57.080 --> 14:00.480
So you have to completely recompile the app if you want to change it, because that ring buffer

14:00.480 --> 14:03.280
was not made to be dynamically resized.

14:03.280 --> 14:08.120
200 frames gives you depending on your frame rate a few seconds, which tends to be enough,

14:08.120 --> 14:12.200
but if you're really far behind, then you'll need more.
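
NOTE A quick Python check of how much history a fixed ring of frames covers at common frame rates. The figure of 200 frames is the speaker's recollection from the talk; the function name buffer_seconds and the listed frame rates are assumptions for illustration.
```python
def buffer_seconds(frames: int, fps: float) -> float:
    """How much history a ring of `frames` frames holds at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps
for fps in (25, 30, 60):
    print(f"{fps} fps: {buffer_seconds(200, fps):.1f} s")
```
At 25, 30 and 60 frames per second this comes to roughly 8, 6.7 and 3.3 seconds, so a feed that lags further than that behind the others cannot be lined up from the ring alone.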

14:12.200 --> 14:16.520
What we're planning to maybe do is integrate the decoder more, because we could theoretically

14:16.520 --> 14:20.680
tell it, hey, when decoding, fall back this many seconds before you decode it, and then the

14:20.680 --> 14:24.240
buffer can stay small, and we just slow down the decode a little bit.

14:24.240 --> 14:27.440
But we haven't had to do that yet, we haven't found cameras that are more than three or four

14:27.440 --> 14:29.080
seconds behind.

14:29.080 --> 14:32.440
So hopefully we don't need to do that, but maybe in future.

14:32.440 --> 14:44.920
Yeah, go ahead.

14:44.920 --> 14:45.920
That is a good question.

14:45.920 --> 14:49.200
We'll be able to detect, of course, if you see the code again, but other than that, it will

14:49.200 --> 14:52.000
be pretty hard.

14:52.000 --> 14:55.480
So there are some ways that you can detect whether a camera is in sync or not, like if

14:55.480 --> 14:58.400
there's a big change in the light, you may have noticed that when the guy clapped in his

14:58.400 --> 15:02.760
hands, we also turned the light on and off to make it more obvious where that point was.

15:02.760 --> 15:06.920
That works great, but you can't rely on the time that the light changes intensity that

15:06.920 --> 15:08.480
that means you want to check sync, right?

15:08.480 --> 15:13.040
It could also be that maybe a lamp was aimed at a camera at some point, but not at the

15:13.040 --> 15:14.040
other cameras.

15:14.040 --> 15:15.800
So that's not reliable.

15:15.800 --> 15:20.720
We would love to do something that checks on the fly continuously, but we have not

15:20.720 --> 15:23.600
been able to come up with a reliable way.

15:23.600 --> 15:28.200
If you know one, let us know.

15:28.200 --> 15:30.200
All right, no more questions then?

15:30.200 --> 15:31.200
Cool.

15:31.200 --> 15:32.200
Well done.

15:32.200 --> 15:33.200
That was the end of the talk.

15:33.200 --> 15:34.800
And the end of the open media devroom.

15:34.800 --> 15:35.840
I hope y'all have a good time.

