WEBVTT

00:00.000 --> 00:18.960
All right, thank you. So welcome everybody. Good morning. I hope everybody had enough sleep,

00:18.960 --> 00:25.840
and potentially enough coffee so far. My name is JP Lehr. I work at AMD. You can see that

00:25.840 --> 00:31.760
there, there, here, and on my T-shirt as well. So in case you missed it, you have plenty of

00:31.760 --> 00:38.000
opportunities to see it. So I'm going to talk about ROCm on the rocks. Now what is that? And I

00:38.000 --> 00:44.000
learned from a colleague after I put the slides together that people say something is "on the rocks"

00:44.000 --> 00:49.520
when it's in bad shape. So I was like, hmm, that's not the message I want to convey. So for me,

00:49.520 --> 00:57.360
on the rocks, it's just the preferred way to serve a drink that I don't particularly like, okay?

00:57.360 --> 01:05.520
So, but it's ROCm on the rocks. So let's first look at ROCm a bit. So ROCm is our

01:05.520 --> 01:12.160
software stack, as plenty of people probably know. There's a bunch of things in here. That's a picture,

01:12.240 --> 01:19.440
a newer picture that I've taken from a blog post that we did for a very nice version, ROCm

01:19.440 --> 01:28.800
7.9 tech preview. In case you didn't know that we had, you know, 7.1, 7.2, and 7.9. Anyway.

01:28.800 --> 01:32.960
So we have a bunch of operating systems supported at the bottom. We have then, of course,

01:32.960 --> 01:39.200
the infrastructure stuff, including the GPU driver and Kubernetes support and whatever.

01:39.200 --> 01:44.560
And then we have the software on top of it, which is the ROCm Core SDK. That is the one thing

01:44.560 --> 01:50.480
that I'm talking about mostly today, which has the compilers, the runtimes, and whatever.

01:50.480 --> 01:56.400
And then we have vertical SDKs, which are more domain specific packages with, you know,

01:56.400 --> 02:02.080
extensions to the Core SDK, tailored, for example, for data science, right? So we ship more stuff there.

02:02.960 --> 02:07.600
And then we have more operating system support, monitoring support. And then, of course,

02:07.760 --> 02:14.000
right on top, there's the whole AI ecosystem that, you know, is very, very, very prominent right now.

02:14.000 --> 02:22.560
So talking about the expansion SDKs, there's one for HPC. There's a couple of things where it adds

02:22.560 --> 02:27.600
value for Fortran. So if you actually want to call into some of the HIP libraries that we ship,

02:27.760 --> 02:36.640
so hipFFTW or hipFFT or something, the ROCm HPC expansion SDK comes with

02:36.640 --> 02:41.040
hipfort, which generates all the bindings that you can use from your Fortran application to call

02:41.040 --> 02:46.400
into those HIP libraries. And then there's the ROCm Data Science expansion SDK that gives you

02:46.400 --> 02:54.640
GPU-accelerated data processing. I am not an expert, so I asked some colleagues, and it apparently

02:54.640 --> 03:01.200
includes hipDF, which is HIP DataFrame, so the data frame processing, and hipGRAPH

03:01.200 --> 03:08.080
and hipMM, and it's out of preview. So it should work, right? So you can go check it out and see

03:08.080 --> 03:17.920
what actually does work. And then we also have the ROCm Life Science expansion SDK that

03:17.920 --> 03:26.720
gives you GPU-accelerated I/O and some medical image processing. That is, as far as I know,

03:27.280 --> 03:34.320
still in early access state. So you may see a couple of bumps there, but both of them are also

03:34.320 --> 03:40.160
compatible with the RAPIDS ecosystem. So I think that's one of the most important things here.

03:40.160 --> 03:45.040
So then looking at the Core SDK again, this is what most people think of or at least what I think

03:45.040 --> 03:49.760
of as kind of the ROCm thing, right? We have the compiler. We have the math libraries. We have

03:49.760 --> 03:58.160
the communication libraries and whatnot. And this is actually what TheRock builds, or what TheRock

03:59.840 --> 04:12.320
is for. And then TheRock: what is TheRock, right? So TheRock is basically our path forward to build,

04:12.480 --> 04:21.200
test, and distribute ROCm, right? It also comes with a couple of changes in how we organize our

04:21.200 --> 04:29.760
software in terms of the repository structure. And one of the goals, probably one of the main

04:29.760 --> 04:35.040
goals is to make it not only easier to build and ship it, but also faster, right? Previously,

04:35.120 --> 04:41.760
we had multiple weeks in between ROCm releases. With TheRock, we build

04:41.760 --> 04:48.800
nightly so that you can get, you know, the coolest features every night. So coming to why:

04:48.800 --> 04:55.040
again, faster shipping. But also, I mean, building ROCm was, to put it politely, not obvious,

04:55.680 --> 05:03.680
right? Anybody here tried to build ROCm? Yes? Who agrees that building ROCm wasn't obvious?

05:05.520 --> 05:13.200
Yes, okay, see. So with TheRock, it's getting better, certainly. So I think one of the cool things here

05:13.200 --> 05:22.240
is that actually making your software buildable is embracing the open source ecosystem,

05:22.240 --> 05:25.680
right? Because I mean, if we open-source the stuff, that's great, but if you cannot build it, that's

05:25.680 --> 05:34.160
worthless, right? So that's, I think that's great. So speaking of TheRock, TheRock is now kind of the

05:34.240 --> 05:41.040
single source of truth, right? And as part of that, the components or the libraries and

05:41.920 --> 05:50.480
the system-level stuff, those have been integrated into what are called super repositories. So instead of

05:50.480 --> 05:56.880
having like, I don't know, 56 different repositories, there are now only four or five, maybe, right?

05:58.000 --> 06:02.720
And I think that's a big step forward, because it's now a little more obvious,

06:02.720 --> 06:07.680
what you actually need in order to build ROCm in the first place. And so inside TheRock,

06:08.640 --> 06:14.160
you will have references to those other repositories that bring in the components.

06:14.640 --> 06:18.320
You will have the build recipes, and you will have the containers that we use as the

06:18.320 --> 06:25.440
build environment to actually build ROCm. In addition, test recipes and packaging recipes,

06:25.440 --> 06:31.600
although I think the test recipes aren't in there yet, and the packaging recipes are also still a little

06:31.680 --> 06:39.520
bit under development. So the repo consolidation, what does it actually mean?

06:41.200 --> 06:47.520
There now is a single rocm-libraries super repo that has all the, let's say, AMD SMI

06:48.400 --> 06:54.960
and other system-level components that were previously living in their own repositories.

06:54.960 --> 07:03.440
So they are now consolidated here in rocm-libraries. Then there is rocm-systems,

07:04.480 --> 07:09.120
sorry, rocm-libraries, of course, is the libraries. I was confused here a bit.

07:10.080 --> 07:15.600
rocm-libraries has all the math libs and those sorts of things. rocm-systems

07:15.600 --> 07:20.240
has AMD SMI, so the systems-level stuff. Apologies for the confusion here.

07:21.120 --> 07:27.120
And then we also have ROCgdb as one of the other high-level components.

07:28.400 --> 07:35.440
And then we also have the compiler. So those are kind of like your four high-level

07:35.440 --> 07:40.560
components that you actually need in order to build the ROCm Core SDK. And I'm a compiler person,

07:40.560 --> 07:48.240
so I really enjoy this one. And also we will see later that the compiler and ROCgdb are a little

07:48.320 --> 07:52.240
different when it comes to contributing to these things. So if you find a bug,

07:53.760 --> 07:57.520
how you would actually help us fix it is different here. But again, I'm a compiler person,

07:57.520 --> 08:02.000
so to me the compiler is special. Okay, not only for contributing reasons, but in general.

08:05.360 --> 08:13.200
So how does TheRock keep track of the components? So those four arrows should refer to the

08:13.280 --> 08:19.200
different super repositories that we have, which are the compiler, the libraries, the systems, and the

08:19.200 --> 08:25.520
debugger. And basically inside TheRock you will have submodule pointers that keep track of a certain

08:25.520 --> 08:32.560
state of that repository. So that this is a combination or a configuration of the rock, right?

08:32.560 --> 08:38.000
So this is a particular version that you can build with TheRock.

08:43.200 --> 08:57.360
And let's see. And then what happens is, for the libraries and the systems super repos,

08:58.320 --> 09:03.920
these submodule pointers are basically updated every day to track the most recent advancements

09:03.920 --> 09:09.680
in those super repositories if and only if the integration pipeline is green, right? So if you

09:09.760 --> 09:15.120
find a test error while you're advancing these submodule pointers, you are not going to

09:15.120 --> 09:19.920
maintain that; you're going to roll back, so that TheRock is kind of always green. That's the

09:19.920 --> 09:31.920
idea and that's what we want to achieve moving forward. Okay, so building ROCm.
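
NOTE
A minimal sketch of the advance-or-roll-back rule described in the cues above, assuming a pin is just a repository name plus a commit hash.
SubmodulePin, advance_pin and pipeline_is_green are hypothetical names for illustration; this is not TheRock's actual tooling.
```python
from dataclasses import dataclass
from typing import Callable
@dataclass
class SubmodulePin:
    """One submodule pointer: which super repo, pinned at which commit."""
    name: str
    commit: str
def advance_pin(pin: SubmodulePin, candidate: str,
                pipeline_is_green: Callable[[SubmodulePin], bool]) -> bool:
    """Move the pin to `candidate` only if the integration pipeline passes."""
    previous = pin.commit
    pin.commit = candidate          # tentatively advance the pointer
    if pipeline_is_green(pin):      # hypothetical hook that runs the integration tests
        return True                 # keep the new commit
    pin.commit = previous           # roll back so the tracked state stays green
    return False
```
The pipeline check is passed in as a callable so the rule itself stays independent of how the tests are actually run.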

09:32.000 --> 09:43.120
That has been a pain in the past. So I'm very happy that with TheRock, I can show you commands now,

09:43.120 --> 09:51.920
okay, and that actually "worked on my machine" (TM), right? So this. So all you do is you do a CMake,

09:51.920 --> 09:58.080
I use Ninja for building, I set the source to TheRock source, I put in the build directory,

09:58.080 --> 10:05.680
and I want to build for the AMDGPU families, gfx1030. gfx1030, by the way, is a consumer-grade GPU,

10:05.680 --> 10:10.800
right? It's not a data center GPU. So we have much more support now also

10:10.800 --> 10:14.960
for the consumer-grade GPUs, which I think is great. It's amazing because you don't need to spend

10:14.960 --> 10:22.080
thousands of dollars for a data center GPU to try stuff out, right? Okay, but so this, you know,

10:22.320 --> 10:30.240
this will build all of it, and that's maybe not what you want because you don't want to wait

10:30.240 --> 10:36.880
hours and hours for your, you know, your command to come back. So maybe you want to use this

10:36.880 --> 10:43.040
configuration. So I just added THEROCK_ENABLE_ALL=OFF, right, because we don't want to build

10:43.040 --> 10:47.760
all, we just want to build the compiler. I just want to build the compiler again because the compiler

10:47.840 --> 11:01.440
is special to me, right? Okay, so we build Flang now. So I tested this on a 16-core, 32-thread machine

11:01.440 --> 11:11.360
with 128 gigs of RAM, and you want to add that flag, because otherwise you will tell the

11:11.360 --> 11:16.720
out-of-memory manager, like, oh yeah, please take a job, right? Because Flang consumes six to

11:16.720 --> 11:23.040
seven gigabytes per process while compiling. So you can easily smash your system if you do not limit

11:23.040 --> 11:31.680
the parallelism when you're compiling Flang. The other thing is that TheRock uses something that

11:31.680 --> 11:40.160
is called like a job pool or something in CMake. So it will typically spawn more threads or

11:40.160 --> 11:46.400
processes than you specifically tell it to, right? And so there is a CMake flag with which

11:46.400 --> 11:52.560
you can basically set TheRock background jobs to just one. So it will just respect however many

11:53.840 --> 11:59.760
processes you specify when you tell it to build something so you don't overload your system. But again,

11:59.760 --> 12:05.840
Flang parallel compile jobs, that's important. And then maybe you also want to, you know, enable

12:05.840 --> 12:11.120
the HIP runtime because you want to actually produce something that's useful for you. So, you know,

12:11.200 --> 12:19.600
you can do that, and then, with the components that got built here, you can compile

12:19.600 --> 12:26.160
and run your HIP applications. Which I think is neat because that's way easier than what you

12:27.040 --> 12:33.520
used to do. Then of course, if you run into problems, you know, contributions welcome.
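
NOTE
A rough sketch of the configure step and the memory arithmetic from the cues above: 32 parallel Flang jobs at up to 7 GB each would need about 224 GB, more than the 128 GB in the speaker's machine.
The flag spellings below are a best reading of what the talk names aloud and are assumptions: check TheRock's README before relying on them, including whether the Flang job limit is forwarded by the top-level build.
The 16 GiB reserve and both function names are hypothetical choices made for this example.
```python
def max_flang_jobs(ram_gib: int, gib_per_job: int = 7, reserve_gib: int = 16) -> int:
    """Cap parallel Flang compile jobs so their peak memory fits in RAM."""
    # Leave some memory for the rest of the system; never go below one job.
    return max(1, (ram_gib - reserve_gib) // gib_per_job)
def configure_command(source_dir: str, build_dir: str, family: str, ram_gib: int) -> list[str]:
    """Assemble (not run) a CMake configure line for a compiler-plus-HIP build."""
    return [
        "cmake", "-G", "Ninja", "-S", source_dir, "-B", build_dir,
        f"-DTHEROCK_AMDGPU_FAMILIES={family}",   # assumed spelling of the families flag
        "-DTHEROCK_ENABLE_ALL=OFF",              # do not build every component
        "-DTHEROCK_ENABLE_COMPILER=ON",          # assumed spelling
        "-DTHEROCK_ENABLE_HIP_RUNTIME=ON",       # assumed spelling
        "-DTHEROCK_BACKGROUND_BUILD_JOBS=1",     # assumed spelling of the background-jobs flag
        f"-DFLANG_PARALLEL_COMPILE_JOBS={max_flang_jobs(ram_gib)}",
    ]
# Print the command for the machine described in the talk (128 GB RAM, gfx1030).
print(" ".join(configure_command("TheRock", "build", "gfx1030", 128)))
```
With these assumed defaults the cap comes out at 16 jobs for 128 GiB, since (128 - 16) // 7 = 16.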

12:33.520 --> 12:37.200
All of this is open source also for the reason that if you find a bug or something

12:37.280 --> 12:42.720
that doesn't work, open an issue in TheRock or in like one of the specific repositories,

12:42.720 --> 12:49.440
you know, create a fork, branch, contribute, whatever. But for ROCgdb, it's a little different.

12:49.440 --> 12:59.120
And so the contributions for ROCgdb kind of need to be upstreamable at some point. And

13:00.560 --> 13:06.800
for upstreaming into GDB, there's a specific process that people need to follow. And I think

13:06.800 --> 13:10.000
they have to sign some sort of agreement. That's what you told me.

13:18.000 --> 13:28.960
Okay. So whatever you contribute to ROCgdb needs to be upstreamable, and for that, and to make

13:29.040 --> 13:36.640
that happen, you have to hand over the copyright to the FSF, so that, you know, that's in a

13:36.640 --> 13:41.200
good shape. But so if you have questions about that in particular, Lancelot is here, so you can

13:41.200 --> 13:49.440
talk to him. And so now let's look at the compiler. When we look at this picture,

13:50.640 --> 13:55.760
I told you that the repositories move forward every night, right, these submodule pointers.

13:55.840 --> 14:04.080
For the compiler, they don't. For various reasons, one of them is we merge upstream LLVM into

14:04.080 --> 14:11.040
our downstream compiler about four to six times a day. And there's a lot of churn and there's a lot

14:11.040 --> 14:17.120
of breakage coming in. And so we want to give the developers that actually use TheRock for developing

14:17.200 --> 14:28.160
a more stable compiler for a certain amount of time. And so we aim for moving the compiler forward

14:28.160 --> 14:35.600
every two to four weeks. So that's why, you know, the arrow is a little bit higher here.

14:35.920 --> 14:45.920
Now looking again, if you want to contribute and looking at the compiler in that regard,

14:47.280 --> 14:52.080
the preferred way of contributing to the ROCm compiler is to actually go to LLVM upstream

14:53.600 --> 14:58.560
because that makes it, you know, much nicer, much easier to have for all the other projects that

14:58.560 --> 15:04.880
also want to run on AMD GPUs, for example, but that rely on the LLVM project to do so.

15:04.880 --> 15:11.520
So Triton is an example here, right? They track LLVM upstream. And so if you want to fix a bug in the

15:11.520 --> 15:18.640
AMDGPU backend, for example, you want to do that in LLVM upstream so that also Triton can pick it up.

15:19.520 --> 15:26.160
And then those changes get merged into the AMD LLVM fork. That's the one that is tracked downstream.

15:28.800 --> 15:32.880
And then we have some downstream special sauce that, you know, we contribute only there.

15:33.680 --> 15:40.480
And then TheRock will actually track that branch, that fork, in its submodule pointer.

15:43.360 --> 15:49.360
So let's say you build all of this with the first line here, right? You run cmake --build,

15:49.360 --> 15:53.760
probably want to add another --parallel here actually. And then you can actually just

15:53.760 --> 16:01.440
do ctest in the build directory and it will run some initial tests to see if the build worked

16:02.640 --> 16:08.000
and if that actually makes sense, what you produced. It does not yet do component-level testing

16:08.000 --> 16:12.960
if you run it that way. I think that will be there at some point, but it's not there yet at least not

16:12.960 --> 16:19.520
with this like super easy command. Now just again looking at this timeframe what I mentioned

16:20.240 --> 16:24.160
in the legacy ROCm version, basically. So this is time, going from left to right.

16:25.600 --> 16:32.720
We would merge from our development compiler into the release branches. Let's say here

16:33.920 --> 16:41.520
and there, right? So if we would fix a bug in the compiler here, you would need to wait until the next

16:42.480 --> 16:49.120
merge to actually get that. And that's kind of bad if you have like a fast-paced ecosystem.

16:49.840 --> 16:57.760
And so with TheRock, we will have many more updates to the compiler so that, you know,

16:57.760 --> 17:02.080
everybody can benefit from the advancements and from the fixes that we do.

17:02.400 --> 17:10.880
And so to sum it up, I think ROCm on the rocks is kind of great because it gives you a

17:11.520 --> 17:18.000
really open source way of doing things because it's in the public and also like AMD engineers

17:18.000 --> 17:23.680
are encouraged to open up issues on GitHub, not internally, or not only internally, so that, you know,

17:23.680 --> 17:28.000
people can track our progress, people can see what we do, and I think that's great.

17:28.080 --> 17:33.280
There's also more and more a community around this. So there's an AMD developer community on

17:33.280 --> 17:37.600
Discord. If you look at TheRock README, there's a link to it so you can join that.

17:39.200 --> 17:43.840
There's AI folks in there. There's HPC folks in there. There's people from TheRock. So if you have a

17:43.840 --> 17:49.600
question about how to build it or why doesn't that work, that's a great place to be and ask questions.

17:50.800 --> 17:57.120
And then again, for the more fast-paced ecosystem, which is not necessarily HPC, I

17:57.760 --> 18:02.720
know, but for the more fast-paced ecosystem, the faster delivery that TheRock gives us, I think it's

18:02.720 --> 18:09.200
a great, great thing. So then I put all the references here that I used so you can actually

18:09.200 --> 18:15.680
read them, look at them, and links to the repositories. Then I have this slide that I typically

18:15.680 --> 18:20.720
need to show you so I could be wrong here, right? And with that, I thank you very much and I'm

18:20.720 --> 18:30.560
hoping to have questions.

18:45.200 --> 18:50.500
Okay. So the question is whether TheRock has a version and all the other

18:50.500 --> 18:52.500
components also have a version and then there's a ROCm

18:52.500 --> 18:54.500
version, how that all comes together.

18:54.500 --> 18:56.500
The answer is I don't know.

18:56.500 --> 18:59.500
I got confused with all the versioning stuff,

18:59.500 --> 19:02.500
so I don't know what the current status

19:02.500 --> 19:04.500
there is.

19:04.500 --> 19:12.500
Can you go back to slide 13?

19:12.500 --> 19:17.500
Slide 13. I was asked to go back to slide 13.

19:17.500 --> 19:19.500
Which one is that?

19:19.500 --> 19:23.500
What's up with all the branches?

19:23.500 --> 19:27.500
What's up with all the branches?

19:27.500 --> 19:31.500
The contributing model for AMD engineers is not that we do forks

19:31.500 --> 19:33.500
and then do pull requests.

19:33.500 --> 19:37.500
But that everybody opens up branches in those repositories.

19:37.500 --> 19:40.500
That's why there are so many branches.

19:40.500 --> 19:44.500
Everybody writes to the repository

19:44.500 --> 19:47.500
is supposed to be stating branches essentially

19:48.500 --> 19:49.500
Yes, everybody.

19:49.500 --> 19:52.500
Yes, so people who have push rights work in those repositories

19:52.500 --> 19:55.500
and then again, so for example, the rocm-libraries repository

19:55.500 --> 19:57.500
has like I don't know how many different sub-projects

19:57.500 --> 19:59.500
because all the libraries are in there.

19:59.500 --> 20:01.500
So if you're just considering maybe there's

20:01.500 --> 20:03.500
30 libraries in there.

20:03.500 --> 20:09.500
And with so many engineers, that's why you see like up to a thousand branches.

20:09.500 --> 20:13.500
There is a naming scheme, yes.

20:13.500 --> 20:15.500
Yes, it makes sense.

20:15.500 --> 20:19.500
Okay, more questions here.

20:19.500 --> 20:25.500
Do you show the thing it's supposed to be?

20:25.500 --> 20:33.500
So the question was that we need to specify the GPU family

20:33.500 --> 20:37.500
and whether I'm getting confused with the preview here.

20:37.500 --> 20:39.500
Whether it's possible to have generic builds.

20:39.500 --> 20:41.500
The answer is yes.

20:41.500 --> 20:44.500
It's possible, and actually, so in TheRock,

20:45.500 --> 20:48.500
There's a distinction between targets and families.

20:48.500 --> 20:51.500
So this would be technically a target.

20:51.500 --> 20:54.500
The families can be even more abstract,

20:54.500 --> 20:57.500
but I don't recall them because I only think in targets.

20:57.500 --> 21:01.500
So it would be like gfx11-something.

21:01.500 --> 21:04.500
And that will give you support for all the gfx11-based systems

21:04.500 --> 21:06.500
that are currently supported.

21:06.500 --> 21:09.500
So you can build for multiple architectures at the same time.

21:09.500 --> 21:12.500
And I think you could also specify the actual generic target

21:12.500 --> 21:14.500
to build those as well.
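
NOTE
A small sketch of the target-versus-family distinction from this answer, assuming a family is simply a named group that expands to several concrete targets.
The family name and its member list below are made up for illustration and are not TheRock's actual table; FAMILIES and expand are hypothetical names.
```python
# Hypothetical family table: one family name maps to several concrete targets.
FAMILIES = {
    "gfx11-example-family": ["gfx1100", "gfx1101", "gfx1102"],
}
def expand(selection: list[str]) -> list[str]:
    """Expand family names to targets; plain target names pass through unchanged."""
    targets: list[str] = []
    for name in selection:
        for target in FAMILIES.get(name, [name]):
            if target not in targets:   # keep first occurrence, drop duplicates
                targets.append(target)
    return targets
```
For example, expand(["gfx11-example-family", "gfx1030"]) yields the three gfx11 targets followed by gfx1030, which is how one request can cover multiple architectures.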

21:22.500 --> 21:25.500
So the question is what's the size difference between a generic build

21:25.500 --> 21:26.500
and just a strictly tailored build?

21:26.500 --> 21:28.500
And the answer is I don't know.

21:28.500 --> 21:46.500
So what was the size difference?

21:46.500 --> 21:47.500
Yeah.

21:47.500 --> 21:50.500
So, wearing the system administrator's hat,

21:50.500 --> 21:55.500
the question was how is this supposed to work with the packaging?

21:55.500 --> 21:57.500
And the answer is I'm not quite sure.

21:57.500 --> 22:00.500
There's other people looking at that.

22:00.500 --> 22:03.500
But my understanding is that so right now,

22:03.500 --> 22:06.500
this will produce basically tar.gz artifacts

22:06.500 --> 22:09.500
that are currently also just pushed into some

22:09.500 --> 22:11.500
S3 storage so you can download those.

22:11.500 --> 22:15.500
And they're also producing Python wheels from it right now.

22:15.500 --> 22:17.500
And moving forward,

22:17.500 --> 22:22.500
there will be packages for the different distros.

22:22.500 --> 22:25.500
And I would hope that there will be some more,

22:25.500 --> 22:31.500
let's say, non-distro-specific means of grabbing the tar.gz

22:31.500 --> 22:33.500
and putting stuff somewhere.

22:33.500 --> 22:34.500
Right?

22:34.500 --> 22:37.500
Does that answer your question?

22:41.500 --> 22:43.500
Okay, maybe we can take this offline.

22:43.500 --> 22:44.500
Okay, cool.

22:53.500 --> 22:59.500
So the question was whether we do binary compatibility

22:59.500 --> 23:04.500
between releases or between certain releases.

23:04.500 --> 23:09.500
So is that ABI compatibility or?

23:09.500 --> 23:14.500
Okay, so my answer to that is I'm not quite sure, actually.

23:14.500 --> 23:17.500
That might be a question for somebody from TheRock team.

23:17.500 --> 23:20.500
I don't know if Marius is here.

23:21.500 --> 23:23.500
No, he isn't. Okay.

23:23.500 --> 23:24.500
But that would be another.

23:24.500 --> 23:26.500
So that would be something you can easily ask

23:26.500 --> 23:29.500
on this developer community thing.

23:29.500 --> 23:32.500
They may have an answer for you there.

23:32.500 --> 23:34.500
Okay.

23:34.500 --> 23:40.500
Is there a way for open source developers to get access to some resources,

23:40.500 --> 23:43.500
like the SSAH and this GPU there,

23:43.500 --> 23:45.500
or the way we're connecting?

23:45.500 --> 23:48.500
So the question is, is there a way for open source developers

23:48.500 --> 23:53.500
to get access to hardware, especially like GPUs?

23:53.500 --> 23:58.500
So there used to be the AMD developer,

23:58.500 --> 24:03.500
the AMD developer cloud, I think it was called.

24:03.500 --> 24:10.500
Last time I checked, you had to have like an AMD contact person.

24:10.500 --> 24:13.500
I think that has been relaxed a little bit.

24:13.500 --> 24:16.500
And there's also now there's another program

24:16.500 --> 24:21.500
that I forgot the name of, where it's mostly geared towards AI use cases,

24:21.500 --> 24:23.500
I think.

24:23.500 --> 24:29.500
But it is in general to bring up the ecosystem around AMD GPUs more.

24:29.500 --> 24:32.500
So that might be a vehicle.

24:32.500 --> 24:35.500
But yeah, for specifics, I'm not quite sure.

24:35.500 --> 24:37.500
I would also need to check the website or, you know,

24:37.500 --> 24:39.500
ask people.

24:39.500 --> 24:40.500
Okay.

24:40.500 --> 24:42.500
More questions.

24:42.500 --> 24:43.500
Yes?

24:44.500 --> 25:01.500
So the question is, how close do the versions from user land and kernel land need to be?

25:01.500 --> 25:02.500
Right?

25:02.500 --> 25:04.500
When you have, let's say, a PyTorch container,

25:04.500 --> 25:06.500
and you have the KFD on your host system.

25:06.500 --> 25:07.500
How close do they?

25:07.500 --> 25:12.500
At least in the past, we guaranteed two versions up and down of compatibility

25:12.500 --> 25:13.500
for everything.

25:13.500 --> 25:16.500
My experience is that it's much more than that.

25:16.500 --> 25:18.500
But also, there has been some restructuring going on,

25:18.500 --> 25:22.500
so that we're actually more isolating things a little bit

25:22.500 --> 25:26.500
because of, specifically, this AI container world.

25:26.500 --> 25:32.500
So I don't think it has to be super tightly put together

25:32.500 --> 25:33.500
from a version point of view.

25:33.500 --> 25:36.500
But again, the details, I'm not entirely sure.

25:36.500 --> 25:38.500
So yeah.

25:38.500 --> 25:39.500
And time's up.

25:39.500 --> 25:40.500
Thank you.

25:40.500 --> 25:41.500
Thank you.

25:41.500 --> 25:43.500
Thank you.

