WEBVTT

00:00.000 --> 00:16.400
Okay, hi. So, my name is Afonso Oliveira, and today I'm going to show you

00:16.400 --> 00:23.520
RISC-V extension porting without the boring part. So, first of all, I've been working

00:23.520 --> 00:30.080
on hypervisors, and I also work on ARC-V CPUs at Synopsys, so I've mostly been working

00:30.080 --> 00:35.280
on operating systems and also some part of the toolchain, but most importantly for this talk,

00:35.280 --> 00:41.760
I sit next to a GCC engineer and I hear him cry a lot about how the specs change. So,

00:41.760 --> 00:48.240
I try to make his life a little bit easier, and I'll try to show you how. So today we're going

00:48.240 --> 00:53.440
to go through the challenges that the RISC-V ecosystem currently has with its

00:53.440 --> 00:59.200
specification, then we're going to showcase the RISC-V Unified DB, which is how we're trying to solve this,

00:59.200 --> 01:05.920
and then we're going to show some practical use cases with binutils and QEMU. So, starting with

01:05.920 --> 01:10.720
these challenges with the RISC-V specification, I hope that by the end of this section we agree

01:10.720 --> 01:16.240
that these are growth challenges, but still they exist and they're very annoying to deal with.

01:16.240 --> 01:22.720
So, currently the RISC-V ecosystem relies on multiple disconnected specifications. The most

01:22.720 --> 01:28.320
known one is the ISA manual, obviously, and this is what most people look into, but there are others,

01:28.320 --> 01:33.120
like the assembly manual, which is the only source for pseudo-instructions, and then there's even

01:33.120 --> 01:38.480
riscv-opcodes, which has the machine-readable description of instructions, and then there's

01:38.480 --> 01:43.680
Sail, which has these formal specifications that you can turn into an ISS, and it's also

01:43.680 --> 01:49.040
in some way part of the specification. And more than this, when you're developing, you'll need to

01:49.040 --> 01:53.920
look up specific things about other components that are not the ISA, and these are interrupt

01:53.920 --> 01:59.600
controllers, these are MMUs, and all these specs live in different repositories that you'll

01:59.600 --> 02:04.560
need to search, digging up the information on your own, which is very annoying.

02:04.560 --> 02:10.320
And when I was in my journey of doing this, I realized that trying to connect this and create

02:10.320 --> 02:15.520
a mental model was for me very, very hard, and I was very, very confused. And so,

02:16.080 --> 02:21.680
I started to dig a little bit more: why is this so wrong, and why is it so hard

02:21.680 --> 02:28.480
for me to grasp all these different manuals and whatnot? And so, I realized, well, the ISA manual

02:28.480 --> 02:35.280
is just one big chunk of text, in AsciiDoc, and that also has many other problems,

02:35.280 --> 02:39.280
beyond just searching for information. It also has the problem of how you verify that the

02:39.280 --> 02:44.720
information is actually correct, because it's just text, right? There's no way to verify anything,

02:44.800 --> 02:50.240
and then, how do you even diversify, because there are PRMs and TRMs that need to be made out of this,

02:50.240 --> 02:56.640
and are you going to copy-paste the spec? Well, that's what people do, but I think that's not the

02:56.640 --> 03:02.320
best way of doing this. Then there's riscv-opcodes, which is machine-readable, and it has

03:02.320 --> 03:07.360
all the instructions and pseudo-instructions, or most of the pseudo-instructions, but in reality,

03:07.360 --> 03:12.800
it's not very complete; it doesn't cover all pseudo-instructions, because the format itself doesn't allow it,

03:13.600 --> 03:18.400
and then RISC-V has this thing called implementation options, in which some parts of the spec

03:18.400 --> 03:23.840
change depending on how you want to implement it, and riscv-opcodes doesn't allow you to describe this.

03:24.480 --> 03:29.680
And so, yeah, you can't really rely on it for any testing, including verification,

03:29.680 --> 03:35.680
so it's not super complete. And I decided to understand why it is like this.

03:35.680 --> 03:40.960
RISC-V is growing and growing so fast, so why do we face so many problems in this specification,

03:40.960 --> 03:47.840
which supposedly is the first step, the first stone, right? And the reality is, RISC-V was first,

03:47.840 --> 03:53.840
the ISA manual was first published in 2015, and it had five extensions. Now, 11 years later,

03:53.840 --> 04:00.320
it has more than 200, so there's a big size difference, and even riscv-opcodes was created

04:00.320 --> 04:05.600
five years before the spec was first published. You can even see here the first commit in

04:05.600 --> 04:13.200
July 19, and it was refactored, but not fully refactored, and so it also went from five to 200 extensions.

04:15.200 --> 04:19.760
And what I see with this is that the infrastructure in which we are writing the specification

04:19.760 --> 04:26.800
was completely outgrown by the size of RISC-V, from five to 200 extensions, and then to all the other extensions.

04:26.880 --> 04:36.400
So to summarize: the information exists in disconnected repos, and it's very hard to find

04:36.400 --> 04:42.160
it; sources are sometimes not official, like the assembly manual, which is community

04:42.160 --> 04:50.080
maintained, so it's not an official ratified manual of RISC-V, and the data is not completely reliable,

04:50.080 --> 04:55.600
and there's different sources for the same data. And then sources of truth don't encapsulate all the

04:56.560 --> 05:00.480
information. And then I took a look at the tool side, which is what I do in my day-to-day, and I realized,

05:00.480 --> 05:05.600
well, we have all these projects in which we use RISC-V data, and this includes the kernel,

05:05.600 --> 05:13.280
QEMU, camp5, the GNU toolchain, Go, Zephyr, and well, well, VMS in the afternoon, right?

05:13.280 --> 05:19.200
But all these projects have the same information, and what I realized is, on one side, we have the

05:19.200 --> 05:26.720
information, and it's very hard to find this information.

05:26.720 --> 05:31.680
On the other side, we have all these projects that have the same information. So thinking about it,

05:31.680 --> 05:35.920
someone from each of these projects has to go through the same pain of looking at that

05:35.920 --> 05:41.520
information and losing a lot of hours of their lives, trying to fit it to their own project.

05:41.520 --> 05:46.240
So let's say the kernel, or let's say even the Go language, and then someone from the GNU toolchain

05:46.240 --> 05:52.080
will have to go down exactly the same path with the same information. And I don't think this is quite

05:52.080 --> 05:57.360
optimal, and I got an idea. What if instead of doing this all manually for all the information

05:57.360 --> 06:02.560
and all the projects, we just picked up this information and got it in a central place,

06:02.560 --> 06:06.800
well, with some refurbishing of the information so that it would be easier, and then we would

06:06.800 --> 06:12.640
create some pipeline that would allow us to output the information in the format that all these

06:12.640 --> 06:20.240
projects specify. And this is one of the reasons that UDB appeared. So what is the RISC-V Unified

06:20.240 --> 06:25.200
Database? I know database is not the best title, but well, trust me, this is just a bunch of

06:25.200 --> 06:33.280
YAML files, it's very, very simple and actually quite nice to use. So the RISC-V Unified

06:33.280 --> 06:38.400
Database is, first of all, a golden source of truth, that is, the RISC-V specification

06:38.400 --> 06:42.960
and all of the specification, and then even a custom overlay, if you do some custom extensions,

06:42.960 --> 06:48.480
or if you implement something extra to the ISA, then there's some tools that don't really matter,

06:48.480 --> 06:52.000
but then there are the use cases, right? We can generate documents out of this, as I said,

06:52.640 --> 06:57.360
different kinds of documents, but also different kinds of information for tools, and even for

06:57.360 --> 07:01.840
AI, and I'll showcase a little bit at the end of this presentation what I mean by that. But so,

07:03.520 --> 07:08.000
to understand what kind of information we have in the UDB, I put it in just a simple

07:08.000 --> 07:15.040
table, so we have instructions and the CSRs; for those of you who are not RISC-V natives, CSRs

07:15.040 --> 07:20.960
are control and status registers; they are just normal registers you use to interact with the CPU to see

07:20.960 --> 07:26.800
what's going on, and then there's profiles which are subsets of the ISA that people agree on to

07:26.800 --> 07:31.600
have software compatibility across implementations, and then there's even implementation specific

07:31.600 --> 07:36.800
attributes that you can specify for a different CPU, so different CPUs all have their own

07:36.880 --> 07:43.840
quirks and whatnot, and this is what this is about. And so, I'll showcase what this is described

07:43.840 --> 07:49.360
for, I'll just showcase the instructions, and this is just a YAML file, so it has these few

07:49.360 --> 07:54.480
parameters which describe an instruction, which are basically the kind, the name and long name, the description

07:54.480 --> 07:59.920
of what this instruction is; this is just a simple add, right? There's the assembly, and then there's

07:59.920 --> 08:06.560
the encoding, with its variables and where they are located in the bit string, and then you

08:06.560 --> 08:11.760
even have this operation and Sail, which is the semantics of the instruction, so that you can

08:11.760 --> 08:17.920
generate something that is semantically valuable out of UDB, and so this is what UDB is,

08:17.920 --> 08:23.680
it's just a bunch of YAML files together with some tools, right? And what are the use cases we're

08:23.680 --> 08:28.400
targeting with this? Well, first of all, we have the docs, as I've said, then there's even

08:28.400 --> 08:34.160
certification, so if you want to ensure your CPU is really what you're saying

08:34.160 --> 08:39.520
it is, you can generate automatic tests out of here and run them, and then there are

08:39.520 --> 08:44.000
tools, right? Compilers, debuggers, simulators, they all have this information, this bunch

08:44.000 --> 08:49.840
of define files, and so we can just bring this data and convert it into whatever

08:49.840 --> 08:54.000
your favorite format is, or the format you need, for example binutils, and you'll

08:54.000 --> 08:59.840
just generate it automatically and hopefully save a lot of time. And so, yeah.

08:59.840 --> 09:05.600
But going on to UDB in development, I did a proof of concept for binutils, and someone

09:05.600 --> 09:11.920
also did this for QEMU, and I'm going to showcase them both now. So how does this, well, first

09:11.920 --> 09:17.920
why, right? I spoke about the GCC engineer I hear complaining about his life a lot, and so

09:18.640 --> 09:24.960
this is kind of the first reason I did it. So in ARC-V CPU cores we have DSP extensions,

09:24.960 --> 09:29.440
and these have lots and lots of instructions, I think definitely more than 100, and this

09:29.440 --> 09:36.000
keeps changing. And so the first bring-up of these in binutils looked like this. So it was

09:36.000 --> 09:42.320
4,000 lines of declarative code, just literally a bunch of defines and just some very small tests,

09:42.320 --> 09:46.560
which, in my opinion, looks like some poor guy went through a very

09:46.560 --> 09:53.600
sad afternoon to create them, and I think he will agree with me. And so, how can we automate

09:53.600 --> 09:58.480
this boring part? Because that is just the start, but then the spec will change,

09:58.560 --> 10:02.000
you'll have to change one line, and then the tests, and then another line, and then the tests,

10:02.000 --> 10:07.280
and this is really just not the work we would prefer to be doing, right? So if we change the

10:07.280 --> 10:11.680
DB, maybe we can change binutils automatically without going through this process. And so,

10:12.400 --> 10:16.960
what we're trying to achieve is exactly that. We're trying to provide an aid to developers,

10:16.960 --> 10:22.400
and also to ensure that no one will lose their eyesight, as those 4,000 declarative lines

10:23.360 --> 10:27.520
definitely suggest. But we're not trying to achieve full replication of

10:27.520 --> 10:32.000
binutils; that's very complex, there are design decisions made by humans, and those

10:32.000 --> 10:36.720
can't be standardized. So this is not an end-to-end solution, it's really an aid so that you can

10:36.720 --> 10:43.200
save definitely some time. And so, what we generate, well, the first file we generate is

10:43.200 --> 10:49.280
riscv-opc.c, in which, as you may see, and if you remember that YAML file I showed, there's a lot of,

10:50.240 --> 10:54.960
there are a lot of these parameters. They're in the YAML file, and there are direct correlations

10:54.960 --> 10:59.440
between these two files, so the name is definitely there. The class is definitely reliant on

10:59.440 --> 11:04.880
the extension we are using, but this is a tricky one actually. Operands are there, we just need to

11:04.880 --> 11:10.240
map them to binutils operands, and then match and mask are definitely in that encoding in the YAML,

11:10.240 --> 11:15.600
and then there's this, well, standard match_opcode and pinfo; this is binutils-specific,

11:15.600 --> 11:20.640
so it's up to the generation, but as you may see, the bulk of this is already in the database.

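NOTE
A minimal sketch of the match/mask derivation described above, in Python. It assumes the
encoding is available as a 32-character string of 0, 1 and - characters (fixed bits and
operand bits). The function name and the pattern layout are illustrative assumptions,
not the actual UDB generator or binutils code.
```python
def match_and_mask(pattern: str) -> tuple[int, int]:
    """Turn a 32-character pattern of '0', '1' and '-' into (match, mask)."""
    if len(pattern) != 32 or set(pattern) - set("01-"):
        raise ValueError(f"bad encoding pattern: {pattern!r}")
    # Every fixed bit (0 or 1) is part of the mask; operand bits ('-') are not.
    mask = int("".join("0" if c == "-" else "1" for c in pattern), 2)
    # The match value is the pattern with all operand bits cleared.
    match = int(pattern.replace("-", "0"), 2)
    return match, mask
# R-type ADD: funct7=0000000, rs2, rs1, funct3=000, rd, opcode=0110011
ADD_PATTERN = "0000000----------000-----0110011"
match, mask = match_and_mask(ADD_PATTERN)
print(f"#define MATCH_ADD {match:#x}")  # prints "#define MATCH_ADD 0x33"
print(f"#define MASK_ADD  {mask:#x}")   # prints "#define MASK_ADD  0xfe00707f"
```
The printed lines have the shape of the match and mask defines mentioned in the talk.
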
11:20.640 --> 11:26.800
So, there's also riscv-opc.h, where you define match, mask, and class; we do this,

11:26.800 --> 11:31.200
and then there are even GAS tests, right? We have, well, the test .d and test .s files. It's

11:31.200 --> 11:36.080
all you can have, but we can generate these tests by manipulating how the operands are

11:36.080 --> 11:42.560
valid or invalid, basically knowing what these operands are. And so here you can have a

11:42.560 --> 11:49.200
look at riscv-opc.c that was generated out of UDB, so this is for I, which is the base

11:49.200 --> 11:55.360
RISC-V instruction set, the smallest set of instructions; this is just your basic

11:55.360 --> 12:03.440
branches and adds and ALU ops, and so you may see that the file itself looks very much as it

12:03.440 --> 12:09.200
should. There is one no-match here; this is because, well, binutils operands are a bit

12:09.200 --> 12:14.560
tricky to match, and in these specific cases, I decided to be very explicit that we're not

12:14.560 --> 12:21.840
matching what we should be matching, and so you should manually change the generation to map one

12:21.840 --> 12:28.080
to the other. And this is what riscv-opc.c looks like. This is what riscv-opc.h looks like,

12:28.080 --> 12:35.280
it's just some basic defines for the same file, this is a very reduced version of it, and then this

12:35.360 --> 12:39.200
is what tests look like. You have the tests that are valid, tests that are invalid,

12:39.200 --> 12:47.520
and these were all automatically generated out of that original database. And so on this binutils approach,

12:47.520 --> 12:54.720
I do need help from you guys, because I have this problem of what comes first, the egg or the

12:54.720 --> 12:59.600
chicken, in which I've developed this tool, but I haven't fully tested it, and I probably

12:59.680 --> 13:05.680
will, if you give me another year or so, and if I actually have enough time to fully test

13:05.680 --> 13:10.880
and build binutils with new extensions, but I actually need people that can help me on this, because,

13:10.880 --> 13:16.480
well, I'm just one guy. So if you're interested in porting extensions to binutils, please try this,

13:16.480 --> 13:23.040
and let me know how it goes, so that we can continue to iterate, right? And so next comes this

13:23.040 --> 13:29.520
QEMU approach. This was not done by me, this was presented by rev.ng Labs, I think in collaboration

13:29.520 --> 13:35.680
with Qualcomm, by Anton Johansson and Alessandro Di Federico, if my Italian is correct,

13:35.680 --> 13:40.960
and so what they did was an emulator in the loop design. They described these extensions inside

13:40.960 --> 13:46.560
UDB, and then they automatically generated the QEMU definition of all these extensions,

13:46.560 --> 13:50.560
so that they could rapidly prototype when defining an extension and go from

13:50.560 --> 13:56.640
architect to functional model, and then back and forth very closely. And this is quite interesting.

13:56.640 --> 14:01.600
There is a pipeline for this downstream use case, and all these tests and semantics,

14:02.800 --> 14:07.440
and TCG code in QEMU, these are all generated out of UDB, and if you're interested in this,

14:07.440 --> 14:11.360
please definitely check it out. Their work was very, very good on this matter.

14:12.400 --> 14:18.400
And so what other use cases can we have with UDB? So first certification and architectural tests,

14:18.720 --> 14:24.320
I talked about it earlier; then we have an ISS that integrates with

14:24.320 --> 14:28.640
Renode as well, so this is different from the QEMU approach, but this is also an instruction

14:28.640 --> 14:34.160
set simulator out of the database, automatically generated, and then we're using it

14:34.160 --> 14:38.960
for documentation inside Synopsys, for the programmer's reference manual, technical reference manual,

14:38.960 --> 14:42.400
and so on. And so,

14:42.480 --> 14:50.720
You mentioned ARC-V: is it the successor of ARC, and it's now RISC-V? Yeah.

14:52.960 --> 14:56.720
And so yeah, other downstream use cases: people are using riscv-opcodes

14:56.720 --> 15:03.760
already for creating decoders in hardware and other things, so we can also do that out of UDB.

15:05.040 --> 15:10.000
And so yeah, UDB is now being collaboratively developed. I mean, this slide was a bit

15:10.080 --> 15:14.400
of some of these companies were acquired, but they were all working on this,

15:14.400 --> 15:18.240
and there were a lot of mentees working on it, and now we're an official RISC-V project,

15:18.240 --> 15:23.600
we are on the RISC-V GitHub. So check us out; if you're interested, come see the repo,

15:23.600 --> 15:29.280
and before we go, if you're interested in how this UDB thing works and how it can be used

15:29.280 --> 15:35.120
for AI, I created this small demo that is like a chatbot with UDB, and if you want to ask

15:35.120 --> 15:39.600
it some questions, it will give you RISC-V information. Thanks.

15:48.640 --> 15:52.880
Well, we have five minutes for questions if you have any. Hey.

15:52.880 --> 16:08.560
Well, yes, I can repeat the question. So the question was, we have had a lot of this with

16:08.560 --> 16:14.800
CGEN; what can this do that CGEN can't do? So I haven't been around for that long, and I have never

16:14.800 --> 16:21.040
used CGEN. There are definitely a lot of CGEN tools in ARC-V and ARC that were used,

16:21.040 --> 16:26.960
and none of them were solving this problem. So maybe was it because people weren't inventive enough?

16:26.960 --> 16:31.520
Maybe, but I'm not the best person to answer that question. Sorry.

16:42.320 --> 16:47.120
Yeah, so the question is how we filled the database, whether it was by hand; the answer is

16:47.120 --> 16:51.520
partially. So there was this riscv-opcodes database that already had some of the information,

16:51.520 --> 16:56.000
but it was not quite enough. We used that to first generate the skeletons, and then we

16:56.000 --> 17:03.280
manually went, well, we had 13 mentees who helped with writing that, and we manually went

17:03.280 --> 17:08.400
on and on and filled this database. Yes.

17:08.400 --> 17:15.920
In your introduction, yes. A lot of IP that doesn't have any of the opcodes within the

17:15.920 --> 17:21.680
really talk to you because it's an opcodes. So do you have a way of modeling the other IP?

17:21.680 --> 17:26.720
So the question is, I mentioned a lot of IP that doesn't have a lot to do with opcodes,

17:26.720 --> 17:34.560
and I'll show this quickly. So these, I believe, are the non-ISA specifications,

17:34.560 --> 17:41.280
things like the interrupt architecture, and this is it. Well, the thing is, some of these things

17:41.280 --> 17:47.040
are not directly related to opcodes, but first, some of them actually have some opcodes

17:47.040 --> 17:52.400
relation, like the AIA actually introduces some instructions, and this is very tricky, but we do

17:52.400 --> 17:59.680
have a way of modeling things that are not ISA-only. This is a new project, so we first wanted

17:59.760 --> 18:05.600
to get the ISA and opcodes, but yes, now we're modeling the other things, but we don't do it

18:05.600 --> 18:11.760
functionally. So, functionally modeling, no, but describing them as they are in text and in different

18:11.760 --> 18:21.920
attributes. Yes. Is that, I'm happy to have a model of this myself, just as big fields.

18:21.920 --> 18:44.080
Okay, yeah, definitely, I am, yeah. Thanks. Hi. So, the question is if the spec has any incompatibilities,

18:44.080 --> 18:49.280
from like previous versions of the spec to the next versions of the spec. Well,

18:50.240 --> 18:55.840
actually, I don't think so. Well, I mean, I've modeled custom extensions in

18:55.840 --> 19:00.640
UDB that have incompatibilities, but what we have is this versioning system in which, if there's,

19:01.280 --> 19:05.760
for example, an instruction that has different operands in two versions, you can definitely specify, like,

19:05.760 --> 19:18.800
version 1.0 and 1.2. And so, yeah. Okay, I guess there's no other questions. Thank you.

19:19.280 --> 19:22.560
Thank you.

