WEBVTT

00:00.000 --> 00:18.000
Okay, so let me introduce the next very special talk, and it's especially special for me, as I'm actually the Chief Sticker Officer of the company and I was bullied into designing a sticker.

00:19.000 --> 00:33.000
But yes, Gianluca will guide us into what ET platform is. If anybody listened to my talk: this is the platform that I butcher all the time while trying to use it, and Gianluca is actually the one who builds it.

00:33.000 --> 00:35.000
So please go ahead.

00:37.000 --> 00:39.000
Hello everyone.

00:39.000 --> 00:45.000
I am the one who builds it, but there are people on Zoom who actually fix it.

00:45.000 --> 00:49.000
So, hi, my name is Gianluca.

00:49.000 --> 00:57.000
This is a practical introduction to ET platform, because I didn't want to speak about the hardware and the theory too much.

00:57.000 --> 01:05.000
I just wanted to show: this is how you get hands-on with the project, this is how you see what's wrong, what you can fix, and where you can join.

01:05.000 --> 01:10.000
Because, first of all, I think we need to start by saying what we are going to talk about.

01:10.000 --> 01:18.000
First of all, I don't believe everyone here knows what ET platform is, so we're going to define it.

01:18.000 --> 01:26.000
Then we're going to see how to actually get the platform on your machine, run it, and see what you have and what you can do.

01:26.000 --> 01:34.000
And then we're going to go from ET platform itself to what AIFoundry is, what it does, and what we're up to.

01:34.000 --> 01:40.000
Okay, about me: this slide is becoming very static; it doesn't change much.

01:40.000 --> 01:45.000
The two things that have changed in the past year are number three, because I joined Ainekko,

01:45.000 --> 01:49.000
and number five, because I had zero free time for personal projects.

01:50.000 --> 02:09.000
Okay, so what is it? The board that you see here is the development board that we actually use. It is based on the former Esperanto ET-SoC-1, which is now part of Ainekko, and we're planning to open source everything.

02:09.000 --> 02:19.000
The way to think about this chip, and I'll go into more detail later, is essentially as an accelerator based on circa a thousand RISC-V cores, and it has incredibly low power.

02:19.000 --> 02:24.000
And, as you can see there, to be more precise, you actually have over a thousand RISC-V cores.

02:24.000 --> 02:44.000
So I'm going to quickly describe the hardware, so at least we know what we're talking about, and then go back to the software you can run on it.

02:44.000 --> 02:48.000
Some people might have a sense of déjà vu, because I already used this slide today.

02:48.000 --> 02:59.000
So this is how the board looks. When you see the board in front of you, you'll quickly notice the big chip under the heatsink, and you'll find the LPDDR4 memory around it.

03:00.000 --> 03:17.000
This is the PMIC, and the one on the top left is actually the FTDI. So what are they useful for? The ET-SoC-1 is the big chip; the FTDI is the one that gives you UART over USB, so you can actually inspect and control things when you have the physical card.

03:17.000 --> 03:27.000
The PMIC is the part that actually handles the power of the chip, you see the memory, and the PCIe edge is the actual connector that fits the card into the slot.

03:27.000 --> 03:46.000
So, this is another one. I have various graphs that I use to explain how the hardware works, and none of them does justice to the elegance of the actual chip, which is actually very uniform.

03:46.000 --> 03:51.000
But when I think about programming this chip, this is probably the map that I have in my mind.

03:51.000 --> 03:54.000
As you can see, there's the PCIe interface, which is where you connect to the host.

03:54.000 --> 03:59.000
There's a service processor, which kind of controls everything.

03:59.000 --> 04:04.000
There are the compute shires, and that's where most of the thousand cores are.

04:04.000 --> 04:14.000
We've even got four out-of-order CPUs; those large cores are called Maxions, while the small cores in the shires are called Minions. And then you've got various devices, as usual.

04:14.000 --> 04:26.000
The thing to understand from this is that you get a huge array of CPUs, small and in-order, that can be programmed, plus a service processor that controls everything.

04:26.000 --> 04:37.000
Right, so what's a Minion? A Minion is one of the roughly one thousand CPUs that we have, and essentially you can think of it as a very small RV64 RISC-V CPU.

04:37.000 --> 04:51.000
So you actually have machine mode, supervisor mode, and user mode; it has various interesting extensions, such as atomics; it definitely has SIMD instructions; and you can do fixed-point packed arithmetic.

04:51.000 --> 04:56.000
And you get an actual tensor accelerator that does matrix multiplication.

04:56.000 --> 05:02.000
So it's actually quite powerful, and you've got a thousand of them.

05:02.000 --> 05:19.000
Right, so now that we've seen how the hardware works, we can actually start understanding what ET platform is. ET platform is the software; it is completely open source, it is on GitHub, and it contains what you see here, essentially everything you need to actually,

05:19.000 --> 05:31.000
if you have a card, run kernels on it, run software on these thousand CPUs; or, if you don't have a card, simulate it, just to learn, to test, and to see how it works.

05:32.000 --> 05:37.000
Going from higher level to lower level, you first get the runtime library.

05:37.000 --> 05:44.000
The runtime library is the library you actually interface with, and the way you interface with the runtime is that you write a program.

05:44.000 --> 05:57.000
You write a program, you compile it for these small CPUs, you use the runtime library to load it into device RAM, and then you tell all the CPUs: execute this program right now.

05:57.000 --> 06:10.000
And of course the program can easily say: if I'm CPU one, I'm going to do the job for CPU one, and if I'm CPU two, I do the job for CPU two. That's the differentiation you have, simplifying a bit.

06:10.000 --> 06:17.000
The other thing you'll find in ET platform is the device layer, which is the thing that abstracts the mechanism of having a device.

06:17.000 --> 06:31.000
And this allows us to run the same software both on the functional simulator, which you can just run on your computer, and on the actual hardware, which is going to sit in a Linux server.

06:31.000 --> 06:34.000
And as I said this is on GitHub.

06:34.000 --> 06:42.000
This is the software you will find: as I said, you're going to find the runtime library, the device layer, the simulator, and, though you don't see it here, also the firmware.

06:42.000 --> 06:54.000
And this is what's funny: you can actually run the firmware on the simulator, so you can actually test the firmware for this very complex machine even without the machine.

06:54.000 --> 06:59.000
And of course, everything is open source, right?

06:59.000 --> 07:08.000
This is what I wanted to do today: I wanted to be very precise about how to actually clone, compile, and run everything.

07:08.000 --> 07:17.000
First of all, I went the hard way: there is actually a Docker build, but that would have been too short for the slides, so I decided to show the full thing.

07:17.000 --> 07:31.000
There's definitely a Dockerfile you can look at; you'll find the README for that. But anyway, the first step to actually compile everything: you need the GCC toolchain for the device, which has particular extensions.

07:31.000 --> 07:42.000
And this is the step: as you can see, it's the usual riscv-gnu-toolchain, which is a very common way to build a RISC-V toolchain out there.

07:42.000 --> 07:47.000
I know it doesn't work right now, but we're going to fix it by the end of the week.

07:47.000 --> 07:50.000
It's not our problem; the issue is upstream.

07:50.000 --> 07:55.000
So yes, you're going to clone the riscv-gnu-toolchain, and then this is essentially it.

07:55.000 --> 08:02.000
I'm assuming you're on Ubuntu 24.04, but there are people who have been able to build and install this on Arch Linux and other

08:02.000 --> 08:11.000
distributions. But Linux is necessary; there are patches to build this on macOS, but those are off-branch so far.

08:11.000 --> 08:22.000
So essentially, yeah, you build it. These are the usual steps required to build GCC: you configure and you make, and then you will actually have,

08:22.000 --> 08:26.000
installed under /opt, as you can see,

08:26.000 --> 08:31.000
the full toolchain for the Minions. That's all you need to do.
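As a rough sketch, the flow being described matches the standard riscv-gnu-toolchain build. The install prefix below is an assumption, and any extra configure flags needed for the Minion extensions are not shown here; treat the et-platform README as authoritative.

```shell
# Clone the upstream RISC-V GNU toolchain (large repo; pulls submodules
# during the build).
git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain

# Configure with your chosen install prefix (the path is an assumption;
# flags for the Minion-specific extensions live in the et-platform docs).
./configure --prefix=/opt/riscv

# Build the Newlib-based bare-metal cross toolchain; this takes a while.
make -j"$(nproc)"
```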

08:31.000 --> 08:37.000
Once you have the toolchain, you can start compiling ET platform, and you can start compiling the firmware.

08:37.000 --> 08:44.000
And sorry, the first line is long; these slides are very fresh.

08:44.000 --> 08:49.000
So the first thing you need to do is clone et-platform.

08:49.000 --> 09:00.000
There's a README that explains even how to run it with Docker, but basically you get the dependencies and then it's a very normal CMake configure and build.

09:00.000 --> 09:07.000
The only thing that is different is that you need to tell it where to find the toolchain we just installed.

09:07.000 --> 09:10.000
But that's pretty much it.
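The build described here is an ordinary out-of-tree CMake flow plus pointing it at the cross toolchain. A sketch under stated assumptions: the repository URL and the cache-variable name for the toolchain path are guesses for illustration, so check the repository README for the real ones.

```shell
# Clone et-platform (check the AIFoundry GitHub org for the exact repo URL).
git clone https://github.com/aifoundry-org/et-platform.git
cd et-platform

# Ordinary CMake configure + build. The only unusual step is telling the
# build where the cross toolchain lives; the exact cache-variable name is
# documented in the README (the one below is a placeholder).
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DTOOLCHAIN_PREFIX=/opt/riscv
cmake --build build -j"$(nproc)"
```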

09:10.000 --> 09:17.000
So the moment you do this, you're actually going to have everything that I described before,

09:17.000 --> 09:23.000
including the simulator and everything. But that's not all, because once you compile it, what can you do? Well:

09:23.000 --> 09:25.000
You can run tests.

09:25.000 --> 09:30.000
If you look under /opt, you will find a lot of tests that actually stress the simulator.

09:30.000 --> 09:34.000
But if you just add --mode=pcie,

09:34.000 --> 09:38.000
you're actually going to run these same tests on the card, if you're lucky enough to have one.

09:38.000 --> 09:45.000
And I'll actually say something about how to access a card later.

09:45.000 --> 09:46.000
Exciting things.

09:46.000 --> 09:50.000
There's a full simulator already built and installed.

09:50.000 --> 09:54.000
And it's essentially what we call the functional model.

09:54.000 --> 09:57.000
And you will find it there.

09:57.000 --> 10:00.000
I will tell you later how to run stuff on it.

10:00.000 --> 10:04.000
If you're interested in the full firmware, the one that runs on the real card,

10:04.000 --> 10:07.000
you will find it under /opt as well, the Esperanto firmware.

10:07.000 --> 10:12.000
And you can modify it, compile it, and test it on the simulator.

10:12.000 --> 10:14.000
Right.

10:14.000 --> 10:16.000
So what else can you do?

10:16.000 --> 10:23.000
Well, there's definitely an example kernel, which is usually very interesting to run on the simulator just to see what happens.

10:23.000 --> 10:27.000
So you can just go into et-platform's examples directory.

10:27.000 --> 10:31.000
You type make, and, well, we should probably fix the Makefile,

10:31.000 --> 10:37.000
but you will actually find the resulting kernel ELF in this directory, in the build directory.

10:37.000 --> 10:42.000
And once you have that, you can actually run the simulator with it.

10:42.000 --> 10:47.000
And you'll actually see every single instruction and the registers it changed, and how it works.

10:47.000 --> 10:50.000
So this is actually how you learn how it all works.

10:50.000 --> 10:55.000
And that's actually how you can experiment: modify the tests and experiment to see what happens.

10:55.000 --> 10:58.000
But maybe you want to see actual kernels.

10:58.000 --> 11:04.000
And even if it's simple, we actually have a port of llama.cpp; it is a basic port,

11:04.000 --> 11:10.000
but it has a lot of kernels that actually work, in order to be able to run a model.

11:10.000 --> 11:13.000
You can actually just, you know, go and find it.

11:13.000 --> 11:18.000
You can just go find our llama.cpp; you can just get llama.cpp and actually find,

11:18.000 --> 11:22.000
in the ggml source tree, the ET backend.

11:22.000 --> 11:24.000
You're going to find the kernels.

11:24.000 --> 11:30.000
You can actually see how this can be used in a proper system.

11:30.000 --> 11:36.000
You might think this is ET-SoC-1 only. Well, not at all, because we are a very open company.

11:36.000 --> 11:41.000
And actually, ET platform already has simulator support for our next chip,

11:41.000 --> 11:44.000
which is the next chip.

11:44.000 --> 11:48.000
And so when you actually build ET platform, there will be two simulators.

11:48.000 --> 11:49.000
One is for the ET-SoC-1,

11:49.000 --> 11:52.000
and the other is for the new one.

11:52.000 --> 11:54.000
And that is the next chip.

11:54.000 --> 11:58.000
And you can actually even run its tests.

11:58.000 --> 12:00.000
Now, of course, what is this next chip?

12:00.000 --> 12:03.000
You would expect a slide from me, but I didn't do one.

12:03.000 --> 12:08.000
I'm not doing one because, actually, the whole spec, while we're developing the chip, is already

12:08.000 --> 12:09.000
there.

12:09.000 --> 12:13.200
So, actually, go there and you can even get the reference manual directly from the hardware

12:13.200 --> 12:16.200
engineers working on it.

12:16.200 --> 12:22.000
So, these are the basics, and the question is: how can you join?

12:22.000 --> 12:27.080
Because there's a lot going on, we're very open, and this attracts everyone: from people

12:27.080 --> 12:32.760
working on llama.cpp and AI accelerators based on RISC-V, to people actually interested

12:32.760 --> 12:38.880
in seeing how firmware works and how hardware is handled. And we say: yeah, well, we have this

12:38.880 --> 12:39.880
code.

12:39.880 --> 12:44.880
If you actually go to github.com and look at AIFoundry, you'll find all of this code,

12:44.880 --> 12:45.880
right.

12:45.880 --> 12:46.880
You should join us.

12:46.880 --> 12:51.400
Honestly, it's very hard to keep up with the average intelligence of the people

12:51.400 --> 12:54.560
I feel very stupid, but that's my life.

12:54.560 --> 12:59.400
It's actually very interesting: there are discussions going on, from compiler technology

12:59.400 --> 13:05.800
to whatever new thing is happening in the world of AI, to how to fix things in Linux, to people

13:05.800 --> 13:09.560
suggesting artifacts, kernel drivers, and everything else.

13:09.560 --> 13:16.560
And that's not even the most important thing: every Tuesday, I think at 5pm local time,

13:16.560 --> 13:23.080
we actually have an ET platform call, in which you'll find a lot of Ainekko and other

13:23.080 --> 13:28.800
engineers just joining and discussing technology: what's our plan, what's happening

13:28.800 --> 13:34.720
in the project, how people can help, what people are doing. And it's a completely

13:34.720 --> 13:40.680
open-source project, it's completely open, and you'll learn a lot just by being there;

13:40.680 --> 13:43.080
at least I did.

13:43.080 --> 13:47.080
And yes, you can actually go to the event calendar and you'll see it.

13:47.080 --> 13:53.840
And as I said: when we say that ET platform is open, we actually mean it.

13:53.840 --> 13:57.840
That's it for me. Are there any questions?

13:57.840 --> 13:58.840
Please.

13:58.840 --> 14:08.800
I have a question: is the new chip going to be RVA23 compliant?

14:08.800 --> 14:10.800
No, absolutely not.

14:10.800 --> 14:16.800
So the question is: is the new chip RVA23 compatible? Absolutely

14:16.800 --> 14:17.800
not.

14:17.800 --> 14:19.800
That's why I linked the spec.

14:19.800 --> 14:27.880
No. The new chip is actually, if you know the architecture, what is technically called a neighborhood.

14:27.880 --> 14:32.760
So essentially, it's eight of these Minions; it's still the same architecture,

14:32.760 --> 14:39.040
with some errata fixed, and then 24 megabytes of memory, and then it's actually

14:39.040 --> 14:43.600
going to have dual-mode operation: one where it's a small microcontroller, so it has GPIO,

14:43.600 --> 14:51.640
SPI, that kind of thing; or, as a slave device, it actually has an octal SPI bus interface

14:51.640 --> 14:52.640
for memory.

14:52.640 --> 14:56.640
So it will be a smart memory with a bit of compute.

14:56.640 --> 14:58.640
Please.

14:58.640 --> 14:59.640
Yes.

14:59.640 --> 15:12.880
We do have, we do have the hardware; the chip exists, you saw the picture, right?

15:12.880 --> 15:18.960
We do. So, the way it works is that we're not selling it so far, but we do have access

15:18.960 --> 15:23.280
for the community: people that are involved in the community can actually SSH to a machine

15:23.280 --> 15:27.640
in San Francisco; hopefully we're going to bring one to the other side of the Atlantic. And it's

15:27.640 --> 15:32.720
part of the community access, to experiment with the architecture.

15:32.720 --> 15:47.880
So, does the emulator include performance models? It is definitely an emulator

15:47.880 --> 15:52.840
that is used to simulate the chip correctly, but if you use the -l flag that I added, you

15:52.840 --> 15:57.600
get the traces used by DV, so you can actually see, clock by clock, how things changed.

15:57.600 --> 16:04.400
It's not clocked, let me say; it's not cycle accurate, but it's mostly a way to actually

16:04.400 --> 16:13.280
verify that the behavior of the simulator is exactly the same as the behavior of the chip.

16:13.280 --> 16:20.000
So, my question is: are you going to make and sell a chip with ninety thousand cores?

16:20.000 --> 16:21.000
Yeah.

16:21.600 --> 16:29.440
We're not building ninety thousand, of course; sorry, no, no, how many cores?

16:29.440 --> 16:32.720
Yeah, we're not building that many cores, no.

16:32.720 --> 16:36.960
Well, for now we're starting with eight, and then we'll go up, slowly.

16:36.960 --> 16:39.200
No, it's up to: where do we want to go?

16:39.200 --> 16:42.600
Essentially, our CEO is probably always better equipped to answer that question, but

16:42.600 --> 16:49.680
I can, if you want: we're going for inference, we're going for the middle of the market.

16:50.240 --> 16:53.720
I'm actually here for the blinking lights, so you can speak with them.

16:53.720 --> 16:57.040
I like to say that I'm here for the blinking lights and not for the business, because I prefer

16:57.040 --> 17:03.400
to talk about problems rather than actually selling something. So, anyone else? Please?

17:12.400 --> 17:17.000
That's a very good question. As I said, it is a simple port.

17:17.000 --> 17:22.080
So the question is: what is the performance of llama.cpp on the ET-SoC-1?

17:22.080 --> 17:26.080
It can go very well; the chip is definitely fast.

17:26.080 --> 17:34.000
llama.cpp is not, because this was a demo for us. We used llama.cpp as a way

17:34.000 --> 17:39.880
to show, as I was saying earlier in the other talk, that the software stack we were building

17:39.880 --> 17:42.600
actually could compile and work on the real card.

17:42.600 --> 17:48.920
So llama.cpp was actually a way for us not only to learn, but also to see that, using the highest

17:48.920 --> 17:52.040
possible level of the stack, we could actually make it work.

17:52.040 --> 17:55.680
It's quite interesting, because if you see how the kernels work: we actually wanted

17:55.680 --> 18:02.040
to test the fact that these were CPUs, because these are CPUs, not black-box accelerators,

18:02.040 --> 18:03.040
right?

18:03.040 --> 18:07.320
So we actually wrote it in plain C, just straightforward code, because we wanted to be sure

18:07.320 --> 18:08.320
of that.

18:08.320 --> 18:13.240
There are efforts in the community, and people are happy to join them; there's

18:13.240 --> 18:16.600
definitely an effort going on right now to actually write optimized kernels, and of course

18:16.600 --> 18:19.600
we started from matmul.

18:19.600 --> 18:21.600
Anyone else?

18:21.600 --> 18:25.600
I think we're done.

