WEBVTT

00:00.000 --> 00:12.960
Hello everyone. So, I'm Evariste, my call sign is F5OEO.

00:12.960 --> 00:23.920
I will show and demonstrate what we have done at the Open Research Institute; I am part

00:23.920 --> 00:32.320
of the team. I am not the specialist of this project, but I will try to explain it in more

00:32.320 --> 00:41.840
detail. So it is about space and terrestrial communication, especially for amateur radio.

00:41.840 --> 00:51.320
So what is Opulent Voice? Well, it is an open-source digital voice protocol. It is designed

00:51.320 --> 01:02.200
for bandwidth-constrained channels, and it can be used over terrestrial or satellite links.

01:02.200 --> 01:13.080
This project is developed by people from all around the world.

01:13.080 --> 01:29.720
Why did we develop this kind of new modem? Well, there is no really high-quality digital voice mode today.

01:29.720 --> 01:45.320
Because we mainly want to use the amateur radio spectrum, where below UHF the FCC requires signals

01:45.320 --> 01:55.400
to be less than 25 kilohertz. This particular modem and modulation are wider than that,

01:55.720 --> 02:08.760
and that extra width is what gives us more quality. What is important is that it is a freely implementable protocol,

02:08.760 --> 02:19.640
so it is completely open source. We already demonstrated it in summer 2025, and at the last meeting,

02:20.600 --> 02:32.600
we succeeded in transmitting voice and data over the air. So who is the Open Research

02:32.600 --> 02:43.320
Institute? Well, it is a nonprofit organization based in the US, with volunteers from, well,

02:43.400 --> 02:57.640
all around the world. There are a lot of projects in this foundation. Two years ago, I used

02:57.640 --> 03:07.960
personally the DVB-S2 FPGA for the Pluto SDR, which allowed me, for example, to broadcast on QO-100, which is a

03:07.960 --> 03:18.520
geostationary satellite, and there I used the open-source DVB-S2 FPGA IP. So it could be DVB-S2,

03:18.520 --> 03:24.760
it could be, well, there are projects like RFBitBanger, which is more of an electronics board,

03:25.480 --> 03:41.800
but also, the Open Research Institute does regulatory work, for example with the FCC in the US.

03:42.360 --> 03:59.720
You can find all the details at the link shown there. So what is Opulent Voice? Well, it is built around

04:00.680 --> 04:13.160
a 16 kilobit-per-second open-source voice core, and with this open-source core we have

04:13.160 --> 04:23.320
superior voice quality compared to the others. So the voice quality is good, and it can also be mixed

04:23.320 --> 04:34.360
with data, i.e. packets: imagine you can chat and send audio at the same time. Generally, we have

04:34.360 --> 04:42.760
some packet protocols and some audio protocols, and with this open protocol,

04:42.760 --> 04:54.120
we can mix both. So, about the voice quality: I do not have the samples here, but you can

04:55.160 --> 05:06.360
reach them at these links. We compared the audio quality of several codecs. There is a whole

05:06.440 --> 05:17.480
presentation on YouTube; please don't go there yet, because it is a spoiler, as that

05:17.480 --> 05:37.640
presentation is much the same as today's. So then, what is the architecture of this protocol?

05:37.640 --> 05:45.320
Well, first we use a 40 millisecond fixed frame length. Why? It is a best practice for the

05:45.400 --> 05:56.200
Opus codec, in order to have low latency, good quality and less overhead. In order to

05:56.200 --> 06:04.920
modulate it, we use minimum shift keying (MSK) modulation. So, as we have a 16 kilobit

06:05.480 --> 06:23.480
codec, we end up with a 54.2 kilobit-per-second channel bit rate in MSK. So we have a tone separation of 27.1 kilohertz.
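
Assuming the figures quoted here (16 kbit/s Opus, 40 ms frames, MSK with 27.1 kHz tone separation and hence a 54.2 kbit/s channel rate), the arithmetic can be checked quickly; the numbers are taken from the talk, not independently verified:

```python
# Sanity check of the Opulent Voice numbers quoted in the talk.
CODEC_RATE_BPS = 16_000      # Opus voice codec bit rate
FRAME_SECONDS = 0.040        # fixed 40 ms frame length
CHANNEL_RATE_BPS = 54_200    # channel bit rate after framing and FEC

# 16 kbit/s for 40 ms -> 80 bytes of Opus audio per frame
opus_bytes_per_frame = int(CODEC_RATE_BPS * FRAME_SECONDS / 8)

# MSK: the frequency separation between the two tones is half the bit
# rate, the minimum that keeps the tones orthogonal.
tone_separation_hz = CHANNEL_RATE_BPS / 2

print(opus_bytes_per_frame)   # 80
print(tone_separation_hz)     # 27100.0
```

The 80-byte figure matches the audio packet size mentioned later in the talk.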

06:24.200 --> 06:36.360
So we have a narrow-bandwidth, constant-envelope signal, and the main lobe of the spectrum is well contained.

06:40.040 --> 06:49.080
Using MSK is friendly to filters and amplifiers: we do not need

06:49.320 --> 07:01.640
a really linear amplifier like we would for QPSK, for example. So it is easier.
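
To illustrate the constant-envelope point: in MSK the information is carried entirely in the phase, which ramps up or down by 90 degrees over each bit, so the amplitude never varies. A small illustrative sketch (not the project's modulator):

```python
import cmath
import math

def msk_baseband(bits, samples_per_bit=8):
    """Generate complex MSK baseband samples: the phase ramps by +/- pi/2
    over each bit, so the envelope is perfectly constant."""
    samples = []
    phase = 0.0
    step = math.pi / 2 / samples_per_bit  # pi/2 of phase per bit period
    for bit in bits:
        direction = 1.0 if bit else -1.0
        for _ in range(samples_per_bit):
            phase += direction * step
            samples.append(cmath.exp(1j * phase))
    return samples

sig = msk_baseband([1, 0, 1, 1, 0])
# Every sample has magnitude 1, so a saturated (non-linear) power
# amplifier does not distort the signal -- unlike QPSK, whose envelope
# dips at symbol transitions after pulse shaping.
assert all(abs(abs(s) - 1.0) < 1e-12 for s in sig)
```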

07:04.920 --> 07:18.280
Now let us dive a little deeper into the codec side itself. First, we need something like a transport stream

07:18.360 --> 07:29.160
as in DVB. Here we have the baseband data frames, okay. And Interlocutor is the interface between

07:29.160 --> 07:47.240
the human and the baseband packets. Interlocutor is written in Python and is gathering all the

07:49.160 --> 08:00.600
input, which is the audio, the chat, and even other data, and can then send it to be received by

08:00.600 --> 08:12.600
the other Interlocutor. So there are two main cases. The first is quite easy, which is

08:13.480 --> 08:25.320
going from one to one just over IP, okay. So imagine you just have two computers or two devices.
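
The one-to-one IP case can be sketched with plain UDP sockets on the loopback interface; the frame bytes here are invented stand-ins, not the real baseband format:

```python
import socket

# Receiver: bind to an ephemeral UDP port on loopback.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(2.0)
port = rx.getsockname()[1]

# Transmitter: send one stand-in baseband frame (header + payload bytes
# are made up for the example).
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
frame = b"\x00" * 12 + b"opus-audio-bytes"
tx.sendto(frame, ("127.0.0.1", port))

# The "other Interlocutor" receives the frame unchanged.
data, addr = rx.recvfrom(2048)
assert data == frame
tx.close()
rx.close()
```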

08:26.120 --> 08:36.280
And with these baseband packets, you transmit and receive, okay. Now we can also do that

08:36.920 --> 08:51.960
over RF, and in that case we use the modem. So here is what Interlocutor looks like:

08:51.960 --> 09:06.040
well, it is a web-based interface or a command line. So you can easily chat and send

09:06.040 --> 09:20.360
audio just with a mic. So, well, there are several ways to connect with the modem.

09:21.400 --> 09:32.280
Well, the first one is simplex, which means that, okay, we send the data frame,

09:33.240 --> 09:40.120
the baseband data frame, to the modem, and then out over radio frequency in

09:40.120 --> 09:50.440
MSK, and then it is demodulated and goes to the other Interlocutor. So for that, we have

09:50.440 --> 09:59.880
to configure the IP address of the modem on both the transmitter and the receiver.

10:02.680 --> 10:13.080
Another case is like a bent pipe, a transponder, a satellite transponder. So instead of having

10:13.080 --> 10:21.800
the radio frequency going directly from one point to another, we use a satellite, which is

10:21.800 --> 10:31.880
a transponder. So this is the same case, but we use the transponder on the satellite

10:31.880 --> 10:48.280
to reach the receiver. And there is also a more complicated example, which is quite interesting.

10:49.000 --> 11:03.240
Here, we use another satellite, which embeds processing on board. So the processor

11:03.240 --> 11:17.480
is receiving the Opulent Voice protocol here in MSK. So the uplink is MSK, but on the

11:17.480 --> 11:27.640
downlink, instead of sending it in MSK, the same baseband, we multiplex it here into DVB

11:27.640 --> 11:40.280
-S2. DVB-S2 is, well, a general satellite protocol, used mainly

11:40.280 --> 11:49.400
for video; well, it is the standard commercial protocol that delivers all your video.

11:50.520 --> 12:04.600
But you can also send some IP data. So the idea here (maybe another slide shows it

12:05.560 --> 12:13.720
better) is that the satellite receives the MSK, demodulates it,

12:14.440 --> 12:25.720
then re-injects it into a multiplex. So you can have multiple streams multiplexed toward the receiver

12:25.720 --> 12:35.960
side. So, for example, you can have a conference, which means that several Interlocutors send

12:36.680 --> 12:48.440
MSK, the Opulent Voice uplink, to the satellite, which demodulates and translates it to DVB-S2.

12:48.440 --> 12:59.160
Then, of course, all the participants receive the multiplexed DVB-S2 broadcast.

13:05.240 --> 13:14.440
So how does it all work? How do your voice, text and data get out of Interlocutor,

13:14.440 --> 13:25.800
and how does the web interface work? Well, you have here a microphone, speaker, keyboard,

13:25.800 --> 13:33.000
terminal, browser, and then you have, well, either the command line or the web interface.

13:36.120 --> 13:39.640
As I said before, it is mainly based on Python scripts.

13:40.200 --> 13:50.360
So this is the example of the web interface and of the command line, and you see that there are

13:50.360 --> 14:04.440
a lot of common blocks in them. For the web browser, we use a WebSocket going into the Python classes,

14:05.080 --> 14:11.160
and in the terminal one, we have a direct connection to them.

14:17.720 --> 14:27.400
So here you can see more detail, and you can see that we can have options such as

14:27.800 --> 14:36.600
transcription, for example with Whisper, for voice-to-text.

14:41.000 --> 14:46.840
So this is the receive side, well, actually,

14:49.240 --> 14:54.840
here is the transmit side, and here is the receive side.

14:55.160 --> 15:05.160
So a lot of components are in common. So what does Opulent Voice

15:05.160 --> 15:18.680
look like as baseband data? Here we come back to the baseband we have developed in order to feed the

15:18.680 --> 15:33.960
modem, okay? So there are several packet types: audio, text,

15:33.960 --> 15:45.960
control, and a future one is data, which is any data. First of all, the Opulent Voice header

15:46.040 --> 15:56.440
has a station ID, which could be your call sign, for example, or something derived from the call sign.
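
One common way to derive a fixed-size station ID from a call sign is base-40 packing, as used by the M17 protocol; whether Opulent Voice uses exactly this alphabet and scheme is an assumption made here for illustration:

```python
# Base-40 call sign packing (M17-style; assumed, not confirmed, for
# Opulent Voice). Each character becomes a base-40 digit.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/."

def encode_station_id(callsign: str) -> int:
    """Pack a call sign into a single integer, least significant digit
    first, so short call signs give small numbers."""
    value = 0
    for ch in reversed(callsign.upper()):
        value = value * 40 + ALPHABET.index(ch)
    return value

def decode_station_id(value: int) -> str:
    """Unpack the integer back into the original call sign."""
    out = []
    while value:
        value, digit = divmod(value, 40)
        out.append(ALPHABET[digit])
    return "".join(out)

sid = encode_station_id("N0CALL")
assert decode_station_id(sid) == "N0CALL"
```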

15:59.160 --> 16:10.520
You can have a token field, which is not mandatory, but is there in case we need it in a

16:10.520 --> 16:22.680
future release. After that, we have Consistent Overhead Byte Stuffing (COBS), which is applied here

16:23.320 --> 16:36.120
on the payload. Why that? Because when we transmit data, we have variable-length data,

16:36.680 --> 16:46.680
and this byte stuffing can be used to keep track of all the data,

16:47.320 --> 17:00.040
reassemble packets, and can also do some stuffing in order to keep a constant frame.
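
COBS itself is simple enough to show in a few lines. This is a generic implementation of the scheme, not the project's code: it removes every zero byte from the payload, so a zero can safely mark packet boundaries inside the constant-size frames:

```python
def cobs_encode(data: bytes) -> bytes:
    """Consistent Overhead Byte Stuffing: replace each zero byte with a
    distance code, so the output contains no zeros at all."""
    out = bytearray()
    block = bytearray()
    for byte in data:
        if byte == 0:
            out.append(len(block) + 1)   # code = distance to next zero
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:        # max run without a zero
                out.append(255)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)

def cobs_decode(data: bytes) -> bytes:
    """Invert cobs_encode: re-insert the zeros at the coded distances."""
    out = bytearray()
    i = 0
    while i < len(data):
        code = data[i]
        out += data[i + 1:i + code]
        i += code
        if code < 255 and i < len(data):
            out.append(0)
    return bytes(out)

payload = b"\x01\x02\x00\x03\x00"
encoded = cobs_encode(payload)
assert 0 not in encoded                  # no zero bytes survive encoding
assert cobs_decode(encoded) == payload   # round trip is lossless
```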

17:00.760 --> 17:16.360
Because we have a constant frame format. So for example here, we have several

17:17.240 --> 17:37.560
packets, and we can see that there is the COBS marker at each start of packet, okay? Not on the

17:37.560 --> 17:55.080
voice, but on the data. So now we are on the audio side, okay? We carry 80 bytes of

17:55.080 --> 18:08.840
Opus audio each time. We use RTP, the Real-time Transport Protocol, which is there to provide

18:09.880 --> 18:18.040
synchronization, and to let us recreate a clock at the receiver side.

18:19.000 --> 18:30.040
So the audio frame is the audio payload, i.e. the Opus packet, inside the RTP packet.

18:30.680 --> 18:48.360
Sorry. So, on the RTP data, we are going deeper and deeper into the protocol,

18:49.160 --> 18:57.560
and in the RTP header, we have a hash of the station ID. We have a sequence number so that,

18:58.200 --> 19:08.200
if we lose some packets, we can then try to reassemble and recover synchronization.

19:08.600 --> 19:17.160
The timestamp normally increments every 40 milliseconds, so if we have a

19:17.800 --> 19:29.960
skip in the timestamp, we know that we have lost some packets. And we have a payload type field,

19:29.960 --> 19:38.600
which is for now the Opus codec, but maybe in the future we can switch to another codec.
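
The header fields just described (a station-ID hash, a sequence number, a timestamp that steps once per 40 ms frame, and a payload type) can be sketched with `struct`. The field sizes and layout below are a simplified assumption, not the actual Opulent Voice wire format:

```python
import struct

# Hypothetical simplified RTP-style header: payload type, sequence
# number, timestamp, station-ID hash (SSRC-like), 12 bytes in total.
RTP_HEADER = struct.Struct(">HHII")

def make_audio_packet(seq, timestamp, station_hash, opus_payload, payload_type=96):
    header = RTP_HEADER.pack(payload_type, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, station_hash)
    return header + opus_payload

# 80 bytes of Opus per 40 ms frame; the timestamp advances one frame at
# a time, so a jump of more than one step tells the receiver that
# packets were lost in between.
frames = [make_audio_packet(seq=n, timestamp=n * 40,
                            station_hash=0x1234ABCD,
                            opus_payload=bytes(80))
          for n in range(3)]

for n, pkt in enumerate(frames):
    ptype, seq, ts, sid = RTP_HEADER.unpack(pkt[:RTP_HEADER.size])
    assert (seq, ts) == (n, n * 40)
    assert len(pkt) - RTP_HEADER.size == 80
```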

19:38.600 --> 20:02.600
Now we are on the text and control messages. The payload is UTF-8 data. RTP is not used, because

20:02.600 --> 20:14.520
this is just data, and we do not have to synchronize all that. As the text and control messages

20:14.520 --> 20:23.400
are variable length, they can be much longer than the frame, and then we use COBS

20:23.400 --> 20:39.400
to reassemble all that. The UDP header carries the port number, which we use as a

20:39.720 --> 20:51.000
payload type. So we use this to prioritize the incoming data, and we want the voice

20:52.120 --> 21:01.400
to have priority over control, and then text and then data. So we use this UDP

21:01.400 --> 21:11.000
header for that, and so we can prioritize all of it. The IP header has source and destination

21:11.000 --> 21:26.600
addresses and a protocol field. And there is a packet type of data defined to transport

21:27.560 --> 21:37.400
any IP data. This is a future update, which I think we are developing right now.
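
The port-based prioritisation just described (voice before control before text before data) can be sketched like this; the port numbers and queue mechanics are invented for the example, only the ordering idea is from the talk:

```python
# Hypothetical UDP destination ports, one per payload type. Lower
# priority value = sent first.
PRIORITY = {
    57372: 0,  # voice   (highest priority)
    57373: 1,  # control
    57374: 2,  # text
    57375: 3,  # data    (lowest priority)
}

def schedule(packets):
    """packets: list of (udp_dest_port, payload).
    Return them in send order; sorted() is stable, so packets of equal
    priority keep their arrival order."""
    return sorted(packets, key=lambda p: PRIORITY.get(p[0], 99))

queue = [(57374, b"chat"), (57372, b"voice0"),
         (57375, b"file"), (57372, b"voice1")]
ordered = schedule(queue)
assert [p[1] for p in ordered] == [b"voice0", b"voice1", b"chat", b"file"]
```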

21:41.640 --> 21:49.160
So now, as we have analyzed the whole baseband protocol, how

21:49.160 --> 22:07.240
can we modulate it and go out over radio frequency? Well, we use several stages before modulating in MSK.

22:07.880 --> 22:26.200
First, there is a randomization process, which is a CCSDS-compatible LFSR process, then there is

22:26.200 --> 22:38.680
forward error correction, then an interleaver with 32 rows, then sync word insertion and detection

22:39.240 --> 22:52.680
in order to help demodulate it. The MSK modulator is a design which has been

22:52.840 --> 23:09.720
influenced by earlier MSK work. So in this process, the frame comes in, then we have a

23:10.680 --> 23:21.000
FIFO and some clock domain crossing, then we randomize and convolutionally encode for Viterbi decoding, then an

23:21.000 --> 23:31.560
interleaver, we add a sync word, and then we have the modulation, and the same chain on the

23:31.560 --> 23:42.680
receive path. So what are the performance characteristics? Well, the coding gain is about

23:42.680 --> 23:49.080
5 dB from the Viterbi decoder, and we have about 2 dB more because we use soft decisions.
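
The first stage of that transmit chain, the CCSDS-style randomizer, can be sketched as an LFSR. The polynomial used here (x^8 + x^7 + x^5 + x^3 + 1, all-ones seed, the standard CCSDS TM randomizer) is my assumption about what "CCSDS-compatible" means in this design:

```python
def ccsds_sequence(nbytes):
    """Generate the pseudo-random byte sequence from the LFSR
    x^8 + x^7 + x^5 + x^3 + 1, seeded with all ones (CCSDS TM)."""
    state = 0xFF
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            bit = state >> 7                      # output the MSB
            byte = (byte << 1) | bit
            fb = (bit ^ (state >> 6 & 1)          # taps 8, 7, 5, 3
                      ^ (state >> 4 & 1)
                      ^ (state >> 2 & 1))
            state = ((state << 1) | fb) & 0xFF
        out.append(byte)
    return bytes(out)

def randomize(frame):
    """XOR the frame with the sequence; applying it twice restores the
    original, so the same function works at both ends of the link."""
    return bytes(b ^ r for b, r in zip(frame, ccsds_sequence(len(frame))))

frame = bytes(32)                     # an all-zero frame has no bit transitions...
scrambled = randomize(frame)
assert scrambled != frame             # ...but plenty after randomization
assert randomize(scrambled) == frame  # XOR twice restores the frame
```

The point of the stage is exactly what the all-zero-frame example shows: long runs without transitions would break clock recovery, and the randomizer guarantees transitions regardless of payload.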

23:49.640 --> 24:04.760
The sync word performance is quite good and has a high peak-to-sidelobe ratio, which seems to be

24:05.080 --> 24:18.360
optimal. The sync word detection has two thresholds. We have one for when we are hunting:

24:19.160 --> 24:29.560
in the demodulator, when MSK radio frequency arrives, we first look at it

24:29.800 --> 24:38.360
globally, to estimate the power level, and try to synchronize to it. So we have a first

24:41.640 --> 24:50.520
threshold for this first hunting, and then, as soon as we have a hit, we have another

24:50.520 --> 25:04.600
threshold to verify the sync word. As for latency, because it is important, the main

25:04.600 --> 25:16.680
latency is in the device, which means that the modem latency is quite short, but right now it

25:16.760 --> 25:27.960
is more the device, the PC for example, which adds 50 to 100 milliseconds per frame.

25:28.520 --> 25:40.680
You have OS delays, buffering and the playback queue, and on the receive side it is the same. So it is

25:40.680 --> 25:52.760
mainly an issue with the device. We could use lower-latency devices, but right now we are at

25:52.760 --> 26:04.120
around 100 milliseconds again. On the latency of the modem itself, well, we work with a

26:04.680 --> 26:16.120
61.44 megahertz clock, which is the Pluto SDR's maximum sample rate clock. So the

26:16.120 --> 26:28.840
transmit side is at about 63 microseconds. On the receive side, well, we need some data, so we

26:28.840 --> 26:37.400
need enough data to make the soft decisions of the Viterbi decoder. So it is about one

26:37.400 --> 26:53.400
millisecond, which is very short compared to the Interlocutor latency. There is an implementation

26:53.400 --> 27:09.720
of the modem in C++, but we want to do it in HDL as well. Why? Because as soon as you have

27:11.960 --> 27:21.640
developed something in HDL, you can create an ASIC. Hardware is also fast, efficient and compact,

27:21.640 --> 27:34.680
and it is a good way to learn FPGA design, which means that there are a lot of volunteers who learn

27:34.680 --> 27:41.640
at the same time on this kind of project. So it could be FPGA, it could be software.

27:42.200 --> 27:53.880
And why do we use the amateur radio bands? Well, because they are coordinated for

27:54.120 --> 28:11.560
space use, they are non-commercial, and we can use sub-bands. From UHF up, we can experiment there.

28:11.720 --> 28:27.240
Why MSK and not GMSK? We have a lot of room, even on satellite. We are not constrained by bandwidth,

28:28.280 --> 28:38.120
really, the way a commercial satellite is, so we can spend a little more bandwidth on MSK and we get

28:38.200 --> 28:54.120
a better SNR for it. Well, GMSK has lower sidelobes, but we don't really

28:54.120 --> 29:03.720
need to pack a lot of channels. Why 16 kilobits per second for the codec? Well, it is a good trade-off:

29:04.200 --> 29:21.240
going up occupies more bandwidth, and, on the contrary, it is about the minimum for good quality.

29:22.120 --> 29:30.040
Yeah, so we use Opus, but why not Codec 2, which is also open source,

29:33.320 --> 29:44.440
developed by an amateur radio ham? Well, Codec 2 is mainly focused on very low bit rates,

29:45.160 --> 29:55.320
and so for amateurs it is more for the HF bands than for UHF; on HF you don't have a lot of bandwidth,

29:55.320 --> 30:05.880
only about 2.5 kilohertz. Here it is another case, because we have plenty of bandwidth.

30:06.200 --> 30:20.920
But Codec 2 is also very good. There are just different trade-offs, and we can choose one or the other.

30:24.440 --> 30:33.240
The important thing, and what we want, is mainly quality audio.

30:36.600 --> 30:54.520
So we tried to have some benchmarks for that, to compare with the others, and, well,

30:55.160 --> 30:59.640
it is not very easy to do an open comparison, but we tried to do it.

31:02.040 --> 31:12.040
So we used peak-to-sidelobe ratio comparisons, FEC comparisons, interleaving comparisons.

31:13.000 --> 31:20.680
And there are a lot of other metrics we can try to use.

31:24.520 --> 31:37.000
They will be published in future work. So here you have the comparison, feature by feature, for each

31:39.000 --> 31:49.160
protocol: Opulent Voice, M17, P25, D-STAR, DMR,

31:49.160 --> 32:00.440
and Yaesu System Fusion. And you can see that, yes, we want to be open source, and so

32:01.720 --> 32:08.520
only a few are good there, especially for the vocoder license,

32:10.840 --> 32:14.600
because nearly all of the others use the DVSI license.

32:19.800 --> 32:35.480
And this is for the sync word length, where ours is quite good compared to the others.

32:38.280 --> 32:47.960
I am not especially expert on these particular metrics, so I will just give you all the comparisons;

32:50.120 --> 32:54.760
I am not really a specialist on that, sorry.

32:58.440 --> 33:10.200
Here you have a comparison of all the FEC, the code rates, et cetera, and we can see whether

33:10.360 --> 33:23.960
an FEC architecture uses soft decision, which is, okay, well, the soft decision

33:25.720 --> 33:32.840
also increases the quality of the decoding.

33:33.160 --> 33:50.440
Here is the interleaver. The interleaver is there because, you know, if you have

33:50.440 --> 33:56.840
interference on the radio frequency, and there is a burst, we want to make sure we don't

33:56.840 --> 34:05.960
lose all the bits at the same time, so we try to spread them out by interleaving.

34:09.800 --> 34:21.560
There are several mechanisms to do that; we use 32 rows here.
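
A 32-row block interleaver of the kind described can be sketched as follows; this is a generic illustration, and the actual row/column geometry of the modem may differ:

```python
# Block interleaver: write the bits row by row, read them out column by
# column, so consecutive channel errors land far apart after
# de-interleaving.
ROWS = 32

def interleave(bits):
    cols = len(bits) // ROWS
    return [bits[r * cols + c] for c in range(cols) for r in range(ROWS)]

def deinterleave(bits):
    cols = len(bits) // ROWS
    return [bits[c * ROWS + r] for r in range(ROWS) for c in range(cols)]

data = list(range(32 * 40))              # 1280 "bits": 32 rows x 40 columns
assert deinterleave(interleave(data)) == data

# A burst of 4 consecutive errors on the channel hits 4 different rows,
# i.e. positions one row length (40) apart in the original stream, so
# the FEC sees isolated errors instead of a burst.
hit = interleave(data)[:4]
assert hit == [0, 40, 80, 120]
```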

34:21.720 --> 34:31.880
Where to find all these components, all these open tools: you have all the links for that here.

34:34.280 --> 34:42.440
And as soon as you are interested in this project, or in the others, you can get involved

34:43.320 --> 34:53.320
via this "getting started" link here, and we have weekly meetings for all the projects.

34:56.360 --> 34:58.040
Thank you.

