WEBVTT

00:00.000 --> 00:17.000
Okay, welcome everyone to the next talk. We have Sebastian, who has a really nice talk, and

00:17.000 --> 00:25.000
I'm really looking forward to seeing the write-once, run-everywhere approach for not only AI but also for

00:25.000 --> 00:35.000
all kinds of GPU operations that we can do on tensors, using Rust and Slang, and maybe Rust in the future

00:35.000 --> 00:41.000
fully. Let's see, Sebastian, it's all in your hands now.

00:41.000 --> 00:51.000
Yeah, thank you very much for the introduction. So yeah, I'm Sébastien Crozet, and we'll be talking

00:51.000 --> 01:03.000
about Slang and Rust. So, a little bit of background: I'm not exactly an AI guy. At first I was mostly

01:03.000 --> 01:10.000
specialized in linear algebra and physics engines. So if you are familiar with the Rust ecosystem,

01:10.000 --> 01:16.000
you might know some of my libraries, the main ones being nalgebra for linear algebra,

01:16.000 --> 01:24.000
matrix operations, etc., and Rapier, which is a physics engine for rigid bodies, so you can use it for

01:24.000 --> 01:34.000
games, for robotics, etc. More recently I've gotten interested in LLM inference, so I started making this toy

01:34.000 --> 01:44.000
AI inference library, which is named Slay; the SL in Slay stands for Slang. So my goal here is that,

01:44.000 --> 01:58.000
as a programmer, I really don't want to spend too much time rewriting, again and again, the same pieces of code for

01:58.000 --> 02:06.000
the same GPU operations. So right now, if we look at a lot of the existing ecosystem, like

02:06.000 --> 02:14.000
wgpu, Candle, and many inference engines, what happens is that, depending on the platform you want to

02:14.000 --> 02:22.000
target, you have to write your GPU code as many times as you have platforms. So if you want to target

02:22.000 --> 02:30.000
Metal, you need to write Metal shaders; if you want to target CUDA, you need to

02:30.000 --> 02:38.000
write CUDA shaders; etc. It's always the same thing, but with different languages. So my goal here is:

02:38.000 --> 02:46.000
can you just write things once and have them run everywhere? And this becomes especially important to me,

02:46.000 --> 02:54.000
because I'm not just doing AI: I want to do this for physics, I want to do this, I don't know, for sound

02:54.000 --> 03:02.000
synthesis, for all kinds of domains. And right now there are not a lot of

03:02.000 --> 03:10.000
GPU programmers, because it's very, very difficult. So if a programmer has to write things ten times

03:10.000 --> 03:18.000
to support every platform, that's just impossible. So even from a community perspective,

03:18.000 --> 03:26.000
it's extremely important to provide these tools, to provide these languages, that you can use

03:26.000 --> 03:36.000
to write things once and run them everywhere. So I've been experimenting with Slang. Slang is kind of old

03:36.000 --> 03:44.000
and new at the same time. It's old because it has been around for a few years; it was developed by

03:44.000 --> 03:52.000
NVIDIA. But at the end of 2024, what happened is that the project was transferred to Khronos.

03:52.000 --> 04:00.000
So now it's a Khronos project: it's open source, open governance, so it's way more appealing,

04:00.000 --> 04:08.000
from a community perspective, to start using it. So, Slang is two things: it's at the same

04:08.000 --> 04:16.000
time a programming language for the GPU, and a compiler for that language. The idea is that you

04:16.000 --> 04:24.000
write your shader once in Slang, and then the compiler will translate your shader into your platform-specific

04:24.000 --> 04:32.000
code. So, for example, if you want to support the web, it can convert your Slang shader into

04:32.000 --> 04:40.000
WGSL; if you want to support Metal, it creates MSL; it can create PTX for CUDA; it can even

04:40.000 --> 04:48.000
target the CPU; and it has some targets here that I have never really tried, like

04:48.000 --> 04:56.000
PyTorch; but you can see it has a very, very wide coverage in terms of targets. And actually, before

04:56.000 --> 05:04.000
doing some Slang, I actually did the same experiment using WGSL, which is very present

05:04.000 --> 05:11.600
in the Rust ecosystem, and the wgpu ecosystem has this library called Naga. Naga is very

05:11.600 --> 05:19.600
much like Slang, except that it takes WGSL as input, or also SPIR-V as input, and it will generate

05:19.600 --> 05:27.600
the same kinds of outputs; but it cannot target compute-only APIs, because it's a lot more focused on graphics.

05:27.600 --> 05:37.600
So why would you want to use Slang? Well, of course, open source, open governance: it sounds

05:37.600 --> 05:44.640
very promising for its future. You write your shader once, it runs everywhere. And it also

05:44.640 --> 05:50.640
supports both compute and rendering. So if you have a pipeline with some AI inference, and then you

05:50.640 --> 05:57.200
need to use the result of that in some rendering shaders, or some physics, or whatever, you could do this,

05:57.200 --> 06:04.320
because you remain within the same API for compute and graphics, so you wouldn't need any synchronization

06:04.320 --> 06:12.640
between CPU and GPU; you can just keep everything on the GPU. And unlike WebGPU, you are not limited

06:12.640 --> 06:21.920
by the WebGPU standard, so you do have access to more low-level operations, which can be

06:21.920 --> 06:28.320
important if you are interested in supporting higher-performance paths for native. So you can

06:28.320 --> 06:34.400
have some sort of conditional compilation: if I am targeting the web, I don't use these kinds of

06:34.400 --> 06:42.400
low-level operations, but if I am targeting native, then I can fall back to a more efficient

06:42.400 --> 06:50.080
implementation. It has automatic differentiation built in; I haven't used that, actually,

06:51.120 --> 06:58.000
but I suppose it can be very convenient for training. And the most important part is that

06:58.000 --> 07:06.640
it has a very modern syntax. So even if you just use Slang for writing shaders and creating

07:06.640 --> 07:13.440
SPIR-V kernels from it, that's already a win, just because it's so much nicer to use

07:14.080 --> 07:23.520
than WGSL or GLSL or any of these languages. And finally, it supports reflection, so you can

07:23.600 --> 07:30.720
ask the Slang compiler for information about your shaders. You can let the Slang compiler

07:30.720 --> 07:35.680
allocate binding sets and binding groups for you, and it will automatically tell you: okay,

07:35.680 --> 07:42.400
for this kernel argument, you need to bind it this way. So you don't have to worry about this

07:42.400 --> 07:48.240
kind of register location, which can sometimes be very annoying to deal with.
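The reflection workflow described here can be sketched as follows. This is an illustrative mock in Rust, not the actual Slang reflection API: `reflect_bindings`, `Binding`, and the eight-slots-per-group limit are all invented for the example; in reality you would query this information from the Slang compiler.

```rust
// Illustrative mock only: the names here are hypothetical, not the real
// Slang reflection API. The point is the workflow the talk describes:
// you ask the compiler where each kernel argument was bound instead of
// hard-coding register locations yourself.
use std::collections::HashMap;

/// What reflection hands back for one kernel argument.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Binding {
    group: u32,
    index: u32,
}

/// Assign bindings in declaration order, 8 slots per group
/// (an arbitrary limit chosen for this sketch).
fn reflect_bindings(params: &[&str]) -> HashMap<String, Binding> {
    params
        .iter()
        .enumerate()
        .map(|(i, name)| {
            let b = Binding { group: i as u32 / 8, index: i as u32 % 8 };
            (name.to_string(), b)
        })
        .collect()
}

fn main() {
    // The `add_assign` kernel discussed later has two tensor arguments.
    let bindings = reflect_bindings(&["a", "b"]);
    assert_eq!(bindings["a"], Binding { group: 0, index: 0 });
    assert_eq!(bindings["b"], Binding { group: 0, index: 1 });
    for (name, b) in &bindings {
        println!("{name}: group={}, binding={}", b.group, b.index);
    }
}
```

The design point is that the CPU-side code consumes whatever layout the compiler reports, so renaming or reordering kernel arguments never silently breaks the bindings.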

07:48.880 --> 07:56.320
There is this QR code here; this is a talk from Khronos which I found extremely interesting,

07:56.320 --> 08:01.520
so if you are interested in Slang, you might want to start with that talk, because it gives

08:01.520 --> 08:07.680
a lot of information about what it's about, how it works, the limitations, potential use cases;

08:08.880 --> 08:17.520
I found it very, very interesting. So, this is Slang. But this part here,

08:18.320 --> 08:27.440
this is only about writing and converting shaders. It does not tell you how you can use this

08:27.440 --> 08:34.960
from Rust, if you actually want to run it for your inference or for anything else. You need

08:34.960 --> 08:41.680
four parts. First, you need the Slang compiler; this translates your shaders from Slang to your

08:41.680 --> 08:47.760
target. You need Rust bindings, so you can actually call the Slang compiler; the two blue things here,

08:48.400 --> 08:54.880
the Slang C++ API and the shader-slang Rust bindings, already existed when I started this work, so we already

08:54.880 --> 09:02.000
had the Rust bindings. But these are low-level bindings, so I kind of had to make a very small

09:02.000 --> 09:08.240
wrapper on top of them to make them easier to use for the simple use case of translating a shader.

09:09.360 --> 09:16.080
And then the hard part, and I believe it's the hardest part, is to write an RHI. RHI

09:16.160 --> 09:23.360
means Render Hardware Interface. The idea is that, on the one hand, Slang allows you to write your shader

09:23.360 --> 09:29.360
once and convert it for every platform; on the other hand, you need to write your CPU code,

09:29.360 --> 09:36.960
so all the CPU-side orchestration: creating buffers, launching a kernel, etc. All of that you also want

09:36.960 --> 09:44.960
to write once and have it run on every platform. So it means that, when you write

09:45.040 --> 09:52.320
create_buffer, it needs to translate to the Metal API if you are on macOS, or translate

09:52.880 --> 09:58.480
to a CUDA allocation if you are targeting CUDA, etc. So you

09:59.680 --> 10:07.840
still need this kind of multi-backend interface, but only for the GPU API, not for the shaders themselves;

10:08.800 --> 10:16.720
so I've started working on slang-hal, which is essentially that API: you just call it,

10:16.720 --> 10:25.600
and depending on your platform it will automatically use the proper low-level hardware API.

10:26.560 --> 10:33.680
It's very incomplete right now: it only supports compute, and only WebGPU and some

10:34.480 --> 10:43.760
CUDA. What's interesting is that, if you are using C++, be aware that there

10:43.760 --> 10:53.120
already is an RHI implementation for Slang, which means that you will probably

10:53.120 --> 11:00.320
already be in a very good place for calling Slang shaders directly from C++. But on the Rust side

11:00.320 --> 11:06.640
it doesn't exist. And finally, the more interesting part: the actual tensors. So

11:06.640 --> 11:13.040
stensor is the tensor library, which provides tensor operations like matrix multiplication,

11:15.040 --> 11:21.840
addition of two vectors, these kinds of very general operations that you might want to use for

11:22.160 --> 11:29.040
inference, but also for physics simulation or anything else. And that's kind of the reason why

11:29.040 --> 11:36.960
this is a new library, instead of just creating a Slang backend for something like Burn or

11:36.960 --> 11:45.600
Candle: because I really need these operations to be reusable from different domains, and not just

11:46.080 --> 11:53.840
artificial intelligence. Slay is the inference library, and the physics libraries are just

11:54.720 --> 12:03.120
physics simulation, so they are not related to AI at all at the moment. So, to give you an idea of

12:03.120 --> 12:11.680
what Slang looks like. This is a very basic kernel, and the most important part, and I think it's

12:11.680 --> 12:18.000
kind of silly that today this is still important, is that you can see that A and B, the two tensor

12:18.000 --> 12:26.240
arguments, are specified as arguments to the add_assign kernel function. These are not

12:26.240 --> 12:33.200
global variables, as you would see in a lot of shading languages today, and just that makes the code

12:33.200 --> 12:40.000
so much easier to maintain and to read. The rest is fairly standard.

12:40.960 --> 12:49.680
One thing this does not show is that Slang has a module system. And you can see here that this

12:49.680 --> 12:55.920
kernel, and everything else, is generic

12:55.920 --> 13:01.680
over the backend, so you can write it once, and then it will specialize automatically

13:01.680 --> 13:09.440
depending on whether you select the WebGPU backend, the CUDA backend, etc. The Rust side is kind of similar:

13:09.520 --> 13:16.240
you can specify your kernel arguments, so you have a bunch of GPU buffers, and you don't have to know

13:16.240 --> 13:22.080
which backend the buffer originally comes from; and then the launch function just takes

13:22.080 --> 13:31.760
these arguments and triggers the add_assign operation. So nowhere here have you seen any kind of

13:31.760 --> 13:38.000
binding groups or binding indices, because these are automatically allocated by Slang. And, yeah,

13:38.160 --> 13:44.800
you don't need to worry about the backend, because this is automatically handled by your generic

13:44.800 --> 13:53.840
arguments. So here I've put a bunch of links, which are the main libraries I mentioned here.

13:55.120 --> 14:04.640
Slay is very incomplete right now; I'm still at the experimentation phase. You get a bunch

14:04.720 --> 14:12.320
of operations, and I've implemented the Llama and Whisper models, and the idea is to continue

14:12.320 --> 14:20.240
to provide these low-level operations, and more models, in the future. There is a similar experiment

14:20.240 --> 14:27.920
I did before starting with Slang, so if you are interested in a purely WebGPU, WGSL implementation,

14:28.400 --> 14:36.320
you can have a look at wgml. So, what's next? Right now, like I said, I'm

14:36.320 --> 14:42.240
still in the exploration phase, and my goal is to promote this idea of writing shaders once and

14:42.240 --> 14:49.440
having them run everywhere, so that we get more and more contributors from a wider range of

14:50.160 --> 14:59.120
backgrounds, and not just AI. And Slang is very nice; it's extremely close to the ideal solution,

15:00.000 --> 15:07.520
except that today it does not have a package manager, though I think there's some potential

15:07.520 --> 15:14.320
work on that in the future. But, more importantly, it does not allow any kind of CPU-GPU

15:14.320 --> 15:19.360
code sharing, just because it's a completely different language from Rust. So,

15:21.040 --> 15:29.040
I do have one last experiment, which I've started recently: to use something called Rust GPU

15:29.040 --> 15:37.840
and Rust CUDA. These are libraries which provide a compiler backend for the Rust compiler,

15:37.920 --> 15:47.520
so that it can compile Rust code directly into SPIR-V or directly into PTX.
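To illustrate the struct-sharing point that comes next, here is a minimal Rust sketch. It assumes a setup like Rust GPU, where the same module containing the struct would also be compiled to SPIR-V; `SimParams` is a hypothetical parameter struct invented for the example, and everything shown runs on the host.

```rust
// Sketch of the CPU/GPU code-sharing idea, under the assumption that a
// compiler backend like Rust GPU compiles this same module for the GPU.
// Everything here is plain host Rust; `SimParams` stands in for a struct
// shared between the kernel and the orchestration code.
use std::mem::{align_of, size_of};

/// One struct definition used by both the CPU side (to fill a uniform
/// buffer) and the GPU side (as the kernel's parameter block), so the two
/// layouts can never drift apart.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct SimParams {
    dt: f32,
    gravity: [f32; 3],
    substeps: u32,
}

fn main() {
    // With a single shared definition, a size/alignment mismatch between
    // the CPU buffer and the GPU's view of it is ruled out by construction.
    assert_eq!(size_of::<SimParams>(), 20);
    assert_eq!(align_of::<SimParams>(), 4);
    println!("SimParams: {} bytes", size_of::<SimParams>());
}
```

With two hand-maintained definitions (one in Rust, one in a shading language), a mismatch like this only shows up as garbage data at runtime; with one shared definition it cannot happen at all.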

15:48.640 --> 15:59.280
So, with that, we could automatically use the Cargo package manager, also for sharing shader code,

15:59.280 --> 16:06.800
and we could also have some way of sharing code between GPU and CPU, which is especially important

16:06.800 --> 16:12.880
if you want to share the definition of structs, for example, to avoid any kind of mismatch,

16:12.880 --> 16:19.120
which can be very difficult to debug. So that would be my last experiment before I actually

16:19.120 --> 16:25.680
dive into selecting my preferred solution for this cross-platform, single-source GPU physics

16:26.560 --> 16:34.320
implementation. And I also have a very strong focus here on generating datasets for

16:34.320 --> 16:44.800
embodied AI, based on GPU physics simulation. So yeah, thank you all for being here, and if you

16:44.800 --> 16:54.800
have any questions, don't hesitate.

17:14.880 --> 17:40.000
So, the question is whether I support generics. Well, there are three aspects to that. We can talk about

17:40.080 --> 17:46.480
generics in the language itself; those I don't need to support, because the

17:46.480 --> 17:56.000
Slang compiler does that for me. Generics on the Rust side are, right now, only about backend

17:56.000 --> 18:04.160
selection. So the only generic part you can specify is whether you want to use the WebGPU backend

18:04.240 --> 18:17.280
or the CUDA backend; but it does not let you change something like workgroup sizes, so you cannot

18:17.280 --> 18:27.280
customize these yet.
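The backend-generic setup described in this answer can be sketched roughly like this. It is a hypothetical Rust sketch, not slang-hal's real API: the `Backend` trait, `CpuBackend`, and `add_assign` are invented for illustration of how one CPU-side function can be written once and instantiated for any backend.

```rust
// Hypothetical sketch (not slang-hal's actual API) of CPU-side code that
// is generic only over the backend: the same orchestration code compiles
// against WebGPU, CUDA, or anything else implementing the trait.
trait Backend {
    type Buffer;
    fn create_buffer(&self, data: &[f32]) -> Self::Buffer;
    fn read_buffer(&self, buf: &Self::Buffer) -> Vec<f32>;
    fn launch_add_assign(&self, a: &mut Self::Buffer, b: &Self::Buffer);
}

/// A trivial CPU reference backend so the sketch runs anywhere.
struct CpuBackend;

impl Backend for CpuBackend {
    type Buffer = Vec<f32>;
    fn create_buffer(&self, data: &[f32]) -> Vec<f32> { data.to_vec() }
    fn read_buffer(&self, buf: &Vec<f32>) -> Vec<f32> { buf.clone() }
    fn launch_add_assign(&self, a: &mut Vec<f32>, b: &Vec<f32>) {
        for (x, y) in a.iter_mut().zip(b) { *x += y; }
    }
}

/// Written once, works with any backend. Note that workgroup sizes are not
/// exposed anywhere in the signature, matching the limitation mentioned
/// in the answer above.
fn add_assign<B: Backend>(backend: &B, a: &[f32], b: &[f32]) -> Vec<f32> {
    let mut ga = backend.create_buffer(a);
    let gb = backend.create_buffer(b);
    backend.launch_add_assign(&mut ga, &gb);
    backend.read_buffer(&ga)
}

fn main() {
    let out = add_assign(&CpuBackend, &[1.0, 2.0], &[10.0, 20.0]);
    assert_eq!(out, vec![11.0, 22.0]);
    println!("{out:?}");
}
```

The design choice this illustrates: shaders are made portable by the Slang compiler, while the CPU orchestration is made portable by ordinary Rust generics over a backend trait.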

