WEBVTT

00:00.000 --> 00:10.800
So this one is going to be done by somebody who needs no introduction, because it would be

00:10.800 --> 00:17.160
weird for me to introduce myself. My lightning talk is tinygrad on microcontrollers.

00:17.160 --> 00:22.760
So we had one talk about, you know, people trying to run AI on NPUs. I figured, you

00:22.760 --> 00:27.240
know, I need AI on an NPU, because I happen to be part of the community that is about

00:27.240 --> 00:33.440
to tape out. It's a pretty interesting microcontroller-level NPU. It's one of the first

00:33.440 --> 00:39.080
fully open source tape-outs at 16 nanometers. If you're interested, go to the foundry's

00:39.080 --> 00:46.120
GitHub; you can find out all about it. It's a pretty capable NPU at about one TOPS, 16 megabytes

00:46.120 --> 00:50.480
of SRAM, so yeah, it's not super tiny, so you can actually do a lot of interesting things

00:50.480 --> 00:55.640
with it, like depth models, YOLO models, you know, stuff like that.

00:55.640 --> 01:00.560
It actually came from a much bigger architecture. Again, we're now dealing with

01:00.560 --> 01:05.880
an AI foundry called ET. It used to be done by this company called Esperanto Technologies,

01:05.880 --> 01:11.520
but now it's sort of like we're dealing with it. And back then, it was scaled out to the

01:11.520 --> 01:15.960
1000 cores. And, you know, if you want to play with those types of CPUs, I actually have some

01:15.960 --> 01:20.440
in my pocket and in my lab. But again, this microcontroller that we're taping out is just

01:20.440 --> 01:27.680
a much more scaled-down version of this 1000-core CPU. So, if I want to run something, you

01:27.680 --> 01:33.400
know, anywhere, what are my usual suspects for a small, kind of constrained device? Well, again,

01:33.400 --> 01:37.840
I mean, there is emlearn, about which, again, there was a great talk last year at FOSDEM, so if you're

01:37.840 --> 01:42.280
curious, you know, go check it out. It's kind of very Zephyr OS friendly, you know, you

01:42.280 --> 01:46.560
have a lot of fun using it, but it's really constrained in the types of models that it can

01:46.560 --> 01:54.160
run; it's not amenable, you know, to sort of the kind of models that I was interested in.

01:54.160 --> 01:58.080
Then there's obviously LiteRT for Microcontrollers. Again, the support for Zephyr OS

01:58.080 --> 02:03.080
is kind of like, I mean, nobody knows. Bazel is not my favorite thing. And then, of course,

02:03.080 --> 02:08.640
there's ExecuTorch. ExecuTorch, we actually had a talk about it. You know, it's this weird

02:08.640 --> 02:13.680
kind of combination of small things and big things. So, I'm not quite sure, you know, what

02:13.680 --> 02:18.320
to think about it yet; it's pretty young, so maybe it will develop into something that I would

02:18.320 --> 02:25.000
actually find a joy to use. But we're at FOSDEM, right? So, like, why should we constrain ourselves

02:25.000 --> 02:30.040
to the things that are kind of, like, pre-canned and given to us by big vendors, you know, like the

02:30.040 --> 02:34.880
PyTorch community? So, what I considered then was, you know, microTVM, but apparently

02:34.880 --> 02:40.440
that thing died. So, like, I didn't know. And then I'm like, okay, fine, I know GGML, I know

02:40.440 --> 02:44.240
tinygrad. And there was actually this other project from the Eclipse Foundation

02:44.240 --> 02:49.720
called Aidge that kind of tries to bridge the gap between big accelerators and small ones.

02:49.720 --> 02:53.360
And the way I look at them is, like, all of them are basically looking at a compute graph

02:53.360 --> 02:58.080
and trying to lower it onto, like, big devices or small devices. So, Daisytuner, again, there

02:58.080 --> 03:02.640
was a talk about that. It's one of the cool ones that I really would like to play with,

03:02.640 --> 03:08.800
and even, like, push it to the micro side of the devices. Aidge is something that does

03:08.800 --> 03:14.120
lower to a lot of NPUs, and actually lowers to ASICs even. So, that actually

03:14.120 --> 03:18.480
definitely has a backend, but, again, I'm unfamiliar with it. IREE and MLIR, I'm just

03:18.480 --> 03:24.240
like, don't talk to me about that. So, tinygrad. So, tinygrad is this framework not

03:24.240 --> 03:28.440
a lot of people, for some reason, know about. But it's basically kind of this idea that

03:28.440 --> 03:34.000
if we have a really optimized internal representation of a compute graph, it's very compact.

03:34.000 --> 03:38.760
It contains only a small number of operators. We can do a lot of good things with it.
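That idea, a compact compute graph over a tiny operator set, can be sketched in a few lines of plain Python. This is a toy illustration of the concept only, not tinygrad's actual UOp classes or API:

```python
# Toy sketch: a compute graph built from a very small operator set,
# evaluated by walking the graph. Mirrors the spirit of a compact IR
# like tinygrad's, not its real implementation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str          # one of: CONST, ADD, MUL
    src: tuple = ()    # input nodes
    arg: object = None # constant payload

def const(x): return Op("CONST", arg=x)
def add(a, b): return Op("ADD", (a, b))
def mul(a, b): return Op("MUL", (a, b))

def evaluate(node: Op) -> float:
    if node.kind == "CONST":
        return node.arg
    vals = [evaluate(s) for s in node.src]
    if node.kind == "ADD":
        return vals[0] + vals[1]
    if node.kind == "MUL":
        return vals[0] * vals[1]
    raise ValueError(node.kind)

# y = a*x + b expressed as a graph, then evaluated
y = add(mul(const(2.0), const(3.0)), const(1.0))
print(evaluate(y))  # 7.0
```

Because every transformation only has to handle a handful of node kinds, passes like optimization or code generation stay small, which is the property the talk is pointing at.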

03:38.760 --> 03:43.520
And it gives you kind of a toolbox of things that can be applied to it, right? So, that's

03:43.520 --> 03:49.160
what I chose. My additional constraint was to basically make sure that it can be done

03:49.160 --> 03:52.560
by Claude Code, because I'm lazy these days, and I just want Claude to do everything for

03:52.560 --> 03:57.080
me. And, by the way, tinygrad is brought to you by the same person who hacked the PlayStation

03:57.080 --> 04:01.320
way back when, and, you know, that blog post back in 2010 was such a breath of fresh air

04:01.320 --> 04:07.400
for me; it's the same guy. As my friend Luca likes to say, tinygrad is small enough

04:07.400 --> 04:12.320
to actually fit into Claude Code's context window. So, that is literally my kind of, you know,

04:12.320 --> 04:16.760
it's not quite that, but, like, that's my CLAUDE.md, right? You know, that's what I wanted Claude

04:16.760 --> 04:21.600
to do. And, amazingly, that actually went pretty well. So, like, there's a really good series

04:21.600 --> 04:25.000
of blog posts, and it's not just for Claude to read them. You're welcome to read them as

04:25.000 --> 04:29.640
well; they were written for humans. They kind of introduce you to tinygrad.

04:29.640 --> 04:33.680
What I decided to do is look at how they implemented the WebGPU backend, because, you know,

04:33.680 --> 04:37.480
tinygrad has it. And I kind of went from there and, just, you know, it actually

04:37.480 --> 04:42.880
worked amazingly well, you know, with Claude Code. So, takeaways. Takeaways from, like,

04:42.880 --> 04:46.720
experimenting for basically a week with Claude Code. I actually ended up generating, you know,

04:46.720 --> 04:54.800
some semblance of an ELF file. Tinygrad is kind of all predicated on, you know, taking, basically,

04:54.800 --> 05:00.400
a graph expressed somehow and producing a bunch of things that you can then push onto

05:00.400 --> 05:05.440
a device itself. Like, you can literally produce an ELF file. And all of that is done through

05:05.440 --> 05:10.960
the renderers. So, with the renderer, you basically have, again, either a graph that's given to

05:10.960 --> 05:16.080
you, or you can construct a graph, like I'm doing here with the UOps. You can kind of, like,

05:16.080 --> 05:21.600
chain them and build that graph in memory. And then you can ask it to render, and it will

05:21.600 --> 05:29.040
render into something like, you know, an actual CUDA kernel. So, that's what I did. One of the things

05:29.120 --> 05:34.000
that didn't work out: I never quite managed to generate, like, C and C++ code

05:34.000 --> 05:38.800
that would be compact. So, I need to kind of, like, look into tinygrad's pattern matching

05:38.800 --> 05:43.200
and optimizations like that. But other than that, I'm actually on the way to using tinygrad

05:44.080 --> 05:49.120
to produce code for microcontroller NPUs. Again, my models are YOLO and depth perception.
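To give a flavor of what "ask it to render" means, here is a hedged toy sketch of a renderer: it walks a small expression graph of the kind described above and emits a C kernel as a string. All names here (render_expr, render_kernel, the tuple-based node encoding) are invented for illustration; tinygrad's real renderers operate on its UOp graphs and are considerably more involved.

```python
# Toy renderer: lower a tiny expression graph into a C kernel string.
# Illustrative only; this is not tinygrad's API.

def render_expr(node):
    kind, *rest = node
    if kind == "INPUT":             # ("INPUT", index)
        return f"in[{rest[0]}]"
    if kind == "CONST":             # ("CONST", value)
        return f"{rest[0]}f"
    a, b = rest                     # ("ADD"/"MUL", lhs, rhs)
    op = "+" if kind == "ADD" else "*"
    return f"({render_expr(a)} {op} {render_expr(b)})"

def render_kernel(name, node):
    # Wrap the rendered expression in a C function a firmware build
    # could compile for a microcontroller target.
    body = render_expr(node)
    return (f"void {name}(const float* in, float* out) {{\n"
            f"  out[0] = {body};\n"
            f"}}\n")

# out[0] = in[0] * 2 + 1
graph = ("ADD", ("MUL", ("INPUT", 0), ("CONST", 2.0)), ("CONST", 1.0))
print(render_kernel("fma_kernel", graph))
```

The emitted string is plain C with no runtime dependencies, which is the property that makes this style of code generation attractive for constrained devices: the host does all the graph work, and the target only compiles and runs flat kernels.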

05:50.240 --> 05:54.800
So, I recommend you all try that, because, again, it just happens to be one of the toolboxes.

05:54.880 --> 05:59.120
It's not a product per se, but it's so easy to combine and work with that you can

05:59.120 --> 06:04.240
run a lot of experiments, especially if you use Claude, in a really short amount of time. So, that's it.

06:04.240 --> 06:07.600
Thank you so much.

