WEBVTT

00:00.000 --> 00:16.560
OK. Good morning everybody. I'll give you an introduction to the EESSI project, which is

00:16.560 --> 00:24.080
our way of trying to keep the P in HPC. How many people here know what HPC stands for?

00:24.160 --> 00:32.720
Everybody. OK, I can skip one slide. So I work in HPC — that means supercomputing — here in Belgium,

00:32.720 --> 00:40.000
at the university, and I've been doing open source for a while — other things beyond open source as well.

00:40.000 --> 00:45.360
Usually if you bring me a beer, I'm happy — it's a bit too early for that. I've been coming

00:45.360 --> 00:49.840
to FOSDEM for a while. I've been co-organizing the HPC devroom, which is just two doors down,

00:49.840 --> 00:56.320
since 2014, and I'm the lead developer of the EasyBuild project and also a core contributor to

00:56.320 --> 01:02.320
EESSI, and there's a strong relation between the two. So HPC — everybody knows it, I can skip it.

01:03.120 --> 01:08.960
Large computers, parallel computing, expensive servers, lots of CPU cores, maybe some GPUs as well.

01:08.960 --> 01:14.160
Fast interconnects, fast storage. It's a complex beast, right? And it's not that easy to use either.

01:14.160 --> 01:23.520
So anything we can do to simplify this is actually helpful. What's important is that in supercomputers,

01:23.520 --> 01:30.800
you often see very recent CPUs — often the vendors will deliver a first batch

01:30.800 --> 01:36.880
of new CPUs to a supercomputer. So you're dealing with the latest stuff. Modern microprocessors

01:36.880 --> 01:42.160
support vector instructions, things like AVX, AVX2, AVX-512 — you've hopefully heard about these.

01:42.240 --> 01:49.120
On Arm, we have the Scalable Vector Extension. And using these is quite important for performance.

01:49.120 --> 01:52.800
Like, if you're not using the vector instructions or if the binaries you're running are not using

01:52.800 --> 01:59.120
vector instructions, it's like you're not using half your CPU. You might as well be running

01:59.120 --> 02:04.560
on a Pentium 4. So these things are important, because some parts of the hardware are

02:04.560 --> 02:09.600
really dedicated to running the parts of the binary that use these vector instructions. So if you're

02:09.600 --> 02:14.000
not using them, you're leaving a lot of performance on the table. You're not using a big part

02:14.000 --> 02:19.600
of what the CPU supports. So you will be impacted and potentially heavily impacted,

02:20.560 --> 02:26.240
especially with scientific software running on super computers. There's a flip side as well.

02:26.240 --> 02:31.120
If your binary does use vector instructions and you're running it on a CPU that's a bit older,

02:31.120 --> 02:35.440
that doesn't support them, it will just crash. You'll get an illegal instruction and it will not work at all.
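To make the trade-off concrete: on any Linux x86 machine you can check which of these vector extensions the CPU actually advertises before running an optimized binary. This is a generic sketch (not part of any tooling mentioned in the talk); on Arm you would look for `sve` in the `Features` line instead.

```shell
# List which common x86 vector extensions this machine's CPU advertises.
# Linux-only: /proc/cpuinfo does not exist on macOS.
for ext in sse2 avx avx2 avx512f; do
  if grep -qw "$ext" /proc/cpuinfo 2>/dev/null; then
    echo "$ext: supported"
  else
    echo "$ext: not supported"
  fi
done
```

A binary built to use an extension listed as "not supported" here is the one that dies with an illegal-instruction error.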

02:35.520 --> 02:40.960
So there's a balance between the two. But in high-performance computing, we care a lot about

02:40.960 --> 02:45.520
performance, so we go through the pain of not using these generic binaries that work everywhere.

02:46.640 --> 02:52.800
So we want to keep the P in HPC. That means we need to optimize our software for the hardware

02:52.800 --> 02:55.920
that it's going to run on. So we won't take a random binary from the internet and say,

02:55.920 --> 03:00.560
this will probably be fine. No, we're going to build our software from source on the host

03:00.560 --> 03:04.640
where it's actually going to run. And we're going to tell the compiler, optimize for this CPU that you're

03:04.720 --> 03:10.960
seeing right here. So this can be quite important, especially for scientific software. So how important?

03:11.520 --> 03:17.280
There's a good example. GROMACS is molecular dynamics simulation software, so you simulate

03:17.280 --> 03:22.880
molecules and atoms — and that's about all I know about GROMACS, don't ask me more. But I know it

03:22.880 --> 03:29.040
consumes a lot of compute. There are some numbers — I don't have exact statistics — it consumes about

03:29.120 --> 03:36.720
5% of the supercomputing power in Europe. That's a lot, right? 5% may not sound like much, but this is one

03:36.720 --> 03:41.360
software package, consuming 5% of everything that's there in Europe. So this is huge.

03:42.000 --> 03:50.320
Now, with GROMACS, you have to be very careful. What the graph shows here is: we're on one system,

03:50.320 --> 03:56.080
a fairly recent system with AMD EPYC Zen 4 CPUs, which has 192 cores in total. So we're sticking to a

03:56.160 --> 04:00.720
single computer. We're running a recent version of GROMACS with a particular input. The details

04:00.720 --> 04:05.120
are there, but they're not really too important. And the different graphs show you different

04:05.120 --> 04:10.400
GROMACS binaries. So exact same source code, exact same GROMACS version, exact same system.

04:10.400 --> 04:16.080
All we did was compile the binary differently. Higher is better, so higher means

04:16.080 --> 04:20.000
better performance. If we're using a generic binary on the left, we're essentially using

04:20.560 --> 04:25.600
SSE2 instructions, which I think Pentiums from 20 years ago already supported. So it makes

04:25.600 --> 04:32.240
zero sense. And then you get about eight nanoseconds of simulated time per day — it's a weird metric,

04:32.240 --> 04:38.560
but yeah, it gives you some idea. If we do a bit of a better job, if we say: build GROMACS

04:38.560 --> 04:44.880
for a more modern CPU, let's say from the last decade, anything that supports AVX2 instructions,

04:45.040 --> 04:51.440
we already gain a bit. We get to, let's say, eleven and a half. And if we go all out and

04:51.440 --> 04:57.360
really optimize for the CPU we're running on, we get another bump. Now, the difference here is 78%

04:57.360 --> 05:01.200
going from left to right. So if you're not careful and you happen to be on the left,

05:01.200 --> 05:05.120
you're losing a lot of performance and you're spending just way more time running your simulations.

05:05.120 --> 05:09.840
And you're not using half the hardware, which was quite expensive to buy. So that's a bad idea.

05:09.840 --> 05:14.000
You don't want to do that. You want to make sure you're on the right side of the graph as much

05:14.000 --> 05:20.960
as possible. Now, very good. Why don't we always do that? Well, we try to. But there's a lot of

05:20.960 --> 05:26.560
scientific software that gets run on supercomputers. And it's getting worse and worse; it's really expanding

05:26.560 --> 05:32.160
to other fields. Bioinformatics is now very compute-hungry as well. They have a lot of data,

05:32.160 --> 05:36.640
they can start running on GPUs, so supercomputing is very interesting for them. And I don't have to

05:36.640 --> 05:40.880
explain to you the AI boom that's happening. So that's bringing lots of new software

05:41.200 --> 05:46.080
to supercomputers as well. There's an increased interest in the cloud. So cloud is very flexible.

05:46.080 --> 05:50.800
You just spin up VMs, you run something, you tear down the VMs again. And you only pay for what you

05:50.800 --> 05:57.600
use, essentially. And, yeah, supercomputing centers cannot beat that. So it is very interesting.

05:58.640 --> 06:05.840
We see other microarchitectures, other CPU families, coming beyond Intel and AMD. For a

06:05.840 --> 06:10.480
very long time, it was like everything was Intel. There was nothing else. AMD came back in the

06:10.480 --> 06:14.720
game a while back, but that's still x86. It was close enough that it was still easy. But now we

06:14.720 --> 06:19.760
have Arm: the biggest supercomputer in Europe, JUPITER in Germany, just across

06:19.760 --> 06:26.240
the border with Belgium, near Aachen — it's full Arm, NVIDIA Grace CPUs, so it's all Arm. If your

06:26.240 --> 06:31.200
software cannot run on Arm, or cannot be compiled for Arm, you're not running on the biggest supercomputer

06:31.280 --> 06:36.320
in Europe, so that's a shame. So that's already interesting. And then there's the RISC-V CPU

06:36.320 --> 06:40.400
family — there was a devroom, I think, yesterday on RISC-V as well. That's coming,

06:40.400 --> 06:45.680
so that's a third CPU family, which is totally different from x86 and Arm. So if your

06:45.680 --> 06:51.600
software was built for Intel and AMD, it will not run on Arm, at least not without emulation

06:51.600 --> 06:56.320
and other dirty tricks, which only slow things down. Same thing for RISC-V. So you get this huge

06:56.400 --> 07:00.000
mess of things and it's only getting worse and worse. So we're getting more software,

07:00.000 --> 07:05.600
we're getting other platforms like cloud, we're getting bigger diversity in CPUs, and it was

07:05.600 --> 07:11.840
already pretty bad that we had to build everything from source. On the flip side, the people who

07:11.840 --> 07:16.640
help the researchers using the supercomputers — the support teams, which I am part of — they sort of

07:16.640 --> 07:21.280
have a cap on the number of people they can have. There's only so much money for staff,

07:21.280 --> 07:24.400
but they get a lot more work, a lot more questions and they need to help a lot more people.

07:25.360 --> 07:30.240
So this gives some idea. This is the number of software requests we've been seeing

07:30.800 --> 07:36.240
at Ghent University over the years since 2018, and the line just goes up, which is bad,

07:36.240 --> 07:40.720
because more requests means more work for us, and researchers waiting longer because they can't figure

07:40.720 --> 07:44.640
out how to install the software themselves, so that's bad news, just more and more software,

07:44.640 --> 07:49.600
more and more requests coming in. Yeah, so this has essentially doubled in about five years, which is pretty bad.

07:49.920 --> 07:56.000
So we're constantly busy, we have ideas to improve things, but we just don't have the time to

07:56.000 --> 08:02.640
implement the ideas because we're busy handling all the requests. What if we could change that?

08:02.640 --> 08:07.360
So what if there were a way that you didn't have to install the software you want to use anymore,

08:08.880 --> 08:15.040
even on your laptop, on the supercomputer, in the cloud, what if it was just there and what if

08:15.040 --> 08:20.720
that could be possible without compromising on performance? So that would be interesting,

08:20.720 --> 08:24.320
and that's what we're doing with the EESSI project. So that's exactly the goal we have:

08:25.200 --> 08:33.600
to really change the situation for the better. So how are we doing this? So EESSI is a bit of a

08:33.600 --> 08:38.880
forced acronym. It stands for the European Environment for Scientific Software Installations — very long.

08:38.880 --> 08:43.280
The 'European' actually helps, because it helps us get funding for doing this. You can pronounce it as

08:43.280 --> 08:50.320
'easy', which is funny in titles of talks and stuff. So what it really is, is a shared repository

08:50.320 --> 08:55.760
of installations of scientific software. And I say installations — that's important. It's not a

08:55.760 --> 09:00.960
package manager where you do an RPM install. No, it gives you the installation itself. So if you

09:00.960 --> 09:05.600
get access to EESSI, the software is there, you can start running it. You don't need to install anything else.

09:07.200 --> 09:11.840
We're really doing this to work together, so we want to avoid a lot of duplicate work.

09:11.840 --> 09:16.960
People were installing software again and again in their own silos, maybe exchanging ideas

09:16.960 --> 09:21.120
and working together on tools like EasyBuild, but they were still doing the actual work of

09:21.120 --> 09:27.920
getting things installed, compiled, and built in their silos. We also want this to be a uniform

09:27.920 --> 09:32.160
way of giving software to the users. So whether you're on your laptop or whether you're on a

09:32.160 --> 09:36.000
supercomputer or in the cloud, you should be doing the exact same thing, and it should just work.

09:36.560 --> 09:42.960
It should work on any Linux operating system. We're still geared towards supercomputers, which are

09:42.960 --> 09:47.840
100% Linux, but you can use virtual machines — I can actually run it on my Mac here as well,

09:47.840 --> 09:53.520
in a Linux virtual machine, so that's good enough. So we want to support laptops, workstations,

09:53.520 --> 10:00.240
supercomputers, clouds, Raspberry Pis as well — why not? — so we support different CPUs. We also

10:00.240 --> 10:04.960
take care to pay attention to the network in the supercomputers. We want to support GPUs, to

10:04.960 --> 10:10.480
make it even more interesting next to all the CPU families we already have, and performance is our

10:10.480 --> 10:15.840
number one focus point. We want to automate things — there's lots of software, so we need to

10:15.840 --> 10:20.640
automate things as much as possible. We're very serious about testing as well, and of course collaboration

10:20.640 --> 10:25.760
is in everything we do. Some of this is funded through a EuroHPC Centre of Excellence, so the

10:25.760 --> 10:32.960
'European' really does help for getting money to build this out. So yeah, some of this is repetition:

10:33.040 --> 10:36.640
a uniform software stack essentially means you have the same software environment everywhere. You

10:36.640 --> 10:42.400
go somewhere, and either EESSI is already there, or if not, you can easily install it; and if it is there,

10:42.400 --> 10:45.920
you know what's going on, you know how to use it. We don't want to sacrifice performance.

10:46.960 --> 10:50.720
You often see this with containers and Conda: somebody gives you a container image and

10:50.720 --> 10:55.520
everything is there, trust me, and it works everywhere. But that means you're running

10:55.520 --> 10:59.360
binaries that work everywhere, so they're not optimized for the host CPU, so performance is

10:59.360 --> 11:04.720
not going to be as good as it could be — I get portability of compute, but I'm sacrificing performance,

11:04.720 --> 11:08.960
and that's not what we want to do. To some extent, Conda does the same thing: they build one package

11:08.960 --> 11:16.640
for x86, they don't build for every possible x86 CPU, we do. We're avoiding duplicate work by

11:16.640 --> 11:24.080
working together. We use tools that automate the software installation process, like EasyBuild,

11:24.080 --> 11:28.480
but like I already said, people use them in silos; they're not really working together on the

11:28.480 --> 11:33.520
installations themselves, and that's something we want to change. And then

11:33.520 --> 11:39.760
it actually opens up additional use cases; it helps with training, for example. You can give people a temporary

11:39.760 --> 11:44.080
supercomputer in the cloud — there are tools for spinning up Slurm clusters in the cloud —

11:44.800 --> 11:50.240
you give people access, you give them EESSI to get access to the software that they want to run

11:50.240 --> 11:54.000
on that cluster, you train them on it for, like, a week. After that week, you throw the cluster away —

11:54.000 --> 11:57.600
you literally burn it down, all the VMs are gone — but they can still get the same software

11:57.680 --> 12:01.680
environment via EESSI somewhere else, on their laptop, on other supercomputers. So you train them in

12:01.680 --> 12:06.320
an environment that they will be familiar with and that they can translate to other platforms as well.

12:08.480 --> 12:12.240
So a bit more graphically, what does this all mean? Well, we have a whole bunch of software,

12:12.240 --> 12:17.600
like GROMACS, but many other things as well: Python packages, bioinformatics,

12:17.600 --> 12:22.640
PyTorch for AI, whatever you can think of, as long as we have source code, and it's open source

12:22.640 --> 12:28.480
software, we can install it in EESSI. Today that includes 650 different software projects,

12:28.480 --> 12:33.520
and that's not counting things like NumPy or R libraries — the slide says over a thousand,

12:33.520 --> 12:37.200
it's actually over two thousand additional things like those. If you count those

12:37.200 --> 12:42.720
as well, then it's close to 3,000 unique software projects available today. They

12:42.720 --> 12:47.440
work on any Linux system — I haven't seen any exceptions there — whether you're running

12:47.440 --> 12:55.200
Fedora or Ubuntu, I don't care — they work. We support 14 different CPU targets today,

12:55.200 --> 13:01.760
across essentially two CPU families — AMD and Intel, so x86, is one, Arm is the other — and we support

13:01.760 --> 13:08.160
multiple generations of CPUs within those. So for AMD there's Zen 2, Zen 3, Zen 4 —

13:08.160 --> 13:13.840
we support all of those, and we build the binaries for each of these generations. And then we also

13:13.920 --> 13:17.600
already have support for NVIDIA GPUs, across three generations of GPUs as well.

13:18.400 --> 13:23.600
RISC-V is coming — we're playing with that already, exploring it. Support for AMD GPUs,

13:23.600 --> 13:27.760
for the ROCm ecosystem, is a work in progress, so we're expanding our scope as well

13:27.760 --> 13:32.160
in terms of what we support. What's really interesting — I'll show this hands-on —

13:32.160 --> 13:36.640
is that the software is streamed in on demand. So once you have access to EESSI, you can see what

13:36.640 --> 13:41.920
is there — GROMACS, which versions, whatever. When you start running GROMACS, it's actually

13:41.920 --> 13:47.680
pulling things in behind your back, a bit like you go to Netflix, you pick a movie, the movie is not

13:47.680 --> 13:51.600
on your system or on your TV yet; when you start watching it, it's being pulled in as you

13:51.600 --> 13:56.080
watch it. The same thing happens with the software here, so things are pulled in on demand,

13:56.080 --> 14:00.480
so you're not pulling in a 10 gigabyte container image and then using a 100 megabytes of it,

14:00.480 --> 14:03.120
you're only downloading the 100 megabytes you're actually going to use.

14:04.800 --> 14:10.080
And this is now being embedded in the EuroHPC Federation Platform, so it's becoming

14:10.080 --> 14:16.560
a rather big thing in Europe, across all the European supercomputers. All of this is powered by

14:16.560 --> 14:21.360
open source software, so all of this is done in public. Anyone can access EESSI, there are no limitations;

14:21.360 --> 14:25.280
you don't have to tell us that you're using it, you just do what's in the documentation,

14:25.280 --> 14:28.320
you install and configure CernVM-FS and you get access to everything.

14:29.120 --> 14:33.520
The top layer of EESSI, which we call the software layer, is where the actual scientific software

14:33.600 --> 14:40.720
sits: those are installations done with EasyBuild, exposed through the module system

14:40.720 --> 14:45.040
that we use Lmod for. And there's a small component that does CPU detection, so you don't have to

14:45.040 --> 14:49.520
tell us which CPU you have; we detect it for you and we do the right thing under the

14:49.520 --> 14:55.760
covers. We use Gentoo Linux to build a middle layer that makes things compatible with essentially any Linux

14:55.760 --> 15:00.880
system, so we don't care about your Linux distribution — it'll work fine. And we use

15:00.880 --> 15:07.040
CernVM-FS to distribute the software and get this on-demand streaming behavior. ReFrame for software

15:07.040 --> 15:12.560
testing, Magic Castle to spin up clusters in the cloud, to do demos or to actually do the builds

15:12.560 --> 15:16.880
themselves. We do a lot of the builds in AWS, which is sponsoring EESSI — they just give us

15:16.880 --> 15:22.640
free access to whatever CPUs we need access to. So everything is essentially one big

15:22.640 --> 15:27.360
puzzle, and at some point all the components were there to actually make the puzzle and build

15:27.360 --> 15:34.080
out this bigger thing fully built with open source software. So same thing, software goes into

15:34.080 --> 15:39.120
EESSI, and EESSI pops up on local supercomputers, like we have in Flanders, the Flemish supercomputing

15:39.120 --> 15:45.200
center — there's actually one very close by here, in Zellik — a new one, Sophia, at the

15:45.200 --> 15:52.960
University of Brussels, the Flemish-speaking Free University of Brussels. And it goes along with you:

15:53.040 --> 16:00.160
so you can use it on your laptop, on your PC, as long as you have Linux in there somewhere

16:00.160 --> 16:04.320
in a VM or directly. You can use it in the cloud — commercial clouds, but also the

16:04.320 --> 16:08.000
European Open Science Cloud. If you haven't heard about it: if you're a researcher who

16:08.000 --> 16:13.120
sometimes needs, like, a small VM to try things, you can get one for free there. Every citizen in

16:13.120 --> 16:19.200
Europe can get a small Linux VM or whatever, where you can do some stuff; you have, like, a

16:19.200 --> 16:26.000
monthly budget within which you can run things. It works on Raspberry Pi — we have done demos on this —

16:26.000 --> 16:30.160
and it also works on RISC-V boards, if you're a bit careful, if you know what you need to do.

16:31.200 --> 16:34.960
and it is on the biggest supercomputers all across Europe, thanks to the Federation platform,

16:34.960 --> 16:42.960
so we're essentially everywhere. So how does this work? Like I already explained, software is

16:42.960 --> 16:48.640
streamed in on demand, as it is being used — thank you, CernVM-FS, for that. We build the software

16:48.640 --> 16:52.640
and we build the binaries so that they're independent of the host operating system — that's where

16:52.640 --> 16:58.960
Gentoo Linux comes in, that middle layer. We sometimes call this 'containers without

16:58.960 --> 17:02.800
the containing', because containers do essentially the same thing: they put a small operating system

17:02.800 --> 17:08.000
in the container image, and then they put you in a box to protect you from the outside world,

17:08.000 --> 17:11.920
We actually don't want to be in a box: we want to access the GPU, we want to access the inter

17:11.920 --> 17:17.440
connect, we want to get access to all the fancy hardware, so we're taking the idea of being

17:17.440 --> 17:22.080
independent of the host, but we're not containing, because we don't actually want to be contained,

17:22.080 --> 17:27.280
and we're optimizing for specific CPUs and GPUs as well. We auto-detect what you have,

17:27.280 --> 17:33.200
so you don't need to care whether you're on an Intel or an Arm system — it just works.

17:34.720 --> 17:39.520
As an end user — and this is something we can still improve, we're working on it — once you get

17:39.520 --> 17:45.440
access to EESSI — and I'll try to do a demo if the Wi-Fi holds up — as an end user, you source

17:45.520 --> 17:49.360
an init script — this is something I really want to improve, we're working on this —

17:49.360 --> 17:54.080
but it does the detection of your CPU, it sets up your environment, and from that point forward,

17:54.080 --> 17:59.600
you can just load modules to activate additional software and run the software, and everything will

17:59.600 --> 18:04.160
work fine. So once you have access to EESSI, you can stop installing software; everything is essentially

18:04.160 --> 18:09.600
there — if the software that you need is included in EESSI, of course. How does this work?
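As a sketch of that end-user workflow: the repository path below follows the EESSI documentation, but the version directory and the exact module name/version are examples — check what `module avail` actually lists on your system.

```shell
# 1. Source the init script: it detects the CPU (and any GPUs) and sets up Lmod.
source /cvmfs/software.eessi.io/versions/2023.06/init/bash

# 2. See what's available and load what you need.
module avail GROMACS
module load GROMACS            # or a specific version from the 'module avail' output

# 3. Just run it -- the binaries stream in via CernVM-FS on first use.
gmx --version
```

Nothing is installed locally beyond the CernVM-FS client; the software itself lives in the shared repository.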

18:09.600 --> 18:15.040
Well, you're this nice person working on their computer. You ask EESSI: do you have the

18:16.400 --> 18:22.400
GROMACS binary already? It checks the cache; if it's there, it just runs it. If it's not there,

18:22.400 --> 18:26.640
it will download the binary from a mirror server — one of the mirror servers that we have —

18:26.640 --> 18:30.960
put it in the cache, and then run it. And the mirror server synchronizes with the central server,

18:30.960 --> 18:35.520
so you have like a network of servers, essentially, serving this thing. All of this is done by

18:35.600 --> 18:41.520
CernVM-FS; we get this for free, essentially. So yeah, the streaming thing is interesting.

18:42.320 --> 18:48.320
This gives a bit of a better view of the CernVM-FS network. In the middle of the slide,

18:48.320 --> 18:52.640
the darker part is the central server — this is actually sitting in Groningen in the Netherlands,

18:52.640 --> 18:57.680
at one of the partners in the EESSI project. Then we have a couple of mirror servers around:

18:59.440 --> 19:05.280
there's one in AWS, again sponsored, thank you, AWS for that, we still have one in Azure,

19:05.280 --> 19:09.680
but that one is probably going to die soon, it's sitting in US East, which is not that interesting

19:09.680 --> 19:14.560
for people in Europe anyway. There are some people in the UK who introduced another mirror

19:14.560 --> 19:19.120
server because they like the idea of EESSI, so they want to help the network. And then there's a separate

19:19.840 --> 19:24.880
one on the top right that we use to synchronize local mirrors. That's interesting: if you have an

19:24.880 --> 19:30.880
HPC cluster, you actually want to have all the software close to the cluster, so you can actually

19:30.880 --> 19:34.560
set up your own mirror server, synchronize it with ours, and then you have everything close by and you're

19:34.640 --> 19:38.080
essentially in full control; you can disconnect from the network and everything still works nicely.

19:41.360 --> 19:45.440
All right, so demo, I'll skip this because I'll do it hands-on, I'll skip this too,

19:46.240 --> 19:49.920
essentially it's the same thing: you source an init script, you load modules,

19:49.920 --> 19:53.600
download an input file — for GROMACS in this case — you fire up GROMACS, and it works fine.

19:54.400 --> 19:58.720
And you can do the same thing elsewhere. This script I'm showing — you can copy-paste it, put it in a file

19:58.800 --> 20:05.840
called eessi-demo.sh, and if you use Lima — it's a tool for starting Linux containers,

20:05.840 --> 20:13.680
or Linux VMs, I guess, on macOS — you set up Lima to be aware of EESSI, to provide the EESSI

20:13.680 --> 20:18.880
repository under /cvmfs, and then you can run the script. So this script that I showed

20:18.880 --> 20:23.600
you works on the laptop, and it works exactly the same way on a supercomputer: you log in to the

20:23.600 --> 20:29.040
supercomputer, you use commands to submit the script as a job, without touching it, without making any

20:29.040 --> 20:35.120
changes, and it works fine — you get some performance numbers after running GROMACS. You log in

20:35.120 --> 20:41.600
to the cloud, where you may have a Slurm cluster. Here, actually — it doesn't show it... no, it does show

20:41.600 --> 20:48.320
it: the -p option says I'm submitting this to an Arm partition. Again, I can run the same script,

20:48.320 --> 20:53.280
no changes needed; I just sbatch the script and it comes back with the results. And the same thing

20:53.280 --> 20:58.160
on one of the supercomputers in Europe, I can do the same thing, so wherever you are, it's fine,

20:58.160 --> 21:02.160
if you're a bit smart about writing the script, so if you download essentially the input files that you need,

21:04.080 --> 21:06.800
it works fine, the software essentially follows you around wherever you go.
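The "same script everywhere" idea, expressed as a Slurm batch script, might look like this — the partition name, task count, and input-file URL are made up for the example; only the EESSI init/module-load pattern is taken from the talk:

```shell
#!/bin/bash
#SBATCH --job-name=gromacs-demo
#SBATCH --ntasks=8
#SBATCH --partition=arm        # hypothetical Arm partition, as in the -p example

# Identical to the laptop version: init EESSI, load the module, run.
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
module load GROMACS
curl -OL https://example.org/input.tpr   # placeholder input-file URL
gmx mdrun -s input.tpr
```

You'd submit the unmodified script with `sbatch eessi-demo.sh`; on a laptop you'd just execute it directly and the `#SBATCH` lines are ignored as comments.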

21:10.560 --> 21:15.440
So before I do the demo: how do you get access to this, how do you get this CernVM-FS software,

21:15.440 --> 21:21.040
this EESSI thing, on there? All you need to do is what's on the slide. So this is for a Red Hat-based

21:21.040 --> 21:30.000
system: you're essentially installing CernVM-FS. So it's just a yum install, an RPM install

21:30.000 --> 21:36.880
from a URL — you can find this in the documentation — for the repository where you can

21:36.880 --> 21:41.200
get CernVM-FS. Then you install CernVM-FS, and you configure CernVM-FS

21:41.200 --> 21:46.560
for, in this case, a single VM or single client, which just gives it a direct connection

21:46.640 --> 21:50.960
to the mirror server — if you're a single client, that's okay — and says: give me at most 10 gigabytes of

21:50.960 --> 21:56.080
disk space to download software into when I'm running it, so you can limit that as well. And you

21:56.080 --> 22:02.080
complete the setup. Run this on a Red Hat-based, RPM-based system and you get access to

22:02.080 --> 22:08.560
EESSI and it works. Now, you may not believe me, so the best way of showing you is actually doing this.
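For reference, the RHEL-flavoured setup on the slide corresponds to commands like these, based on my reading of the CernVM-FS and EESSI documentation — double-check the current docs before copy-pasting, as package URLs can change:

```shell
# Install the cvmfs-release package that sets up the CernVM-FS yum repository.
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm

# Install CernVM-FS itself (EESSI is already part of its default configuration).
sudo yum install -y cvmfs

# Single-client setup: connect directly to the mirror (Stratum 1) servers,
# and cap the local download cache at ~10 GB (CVMFS_QUOTA_LIMIT is in MB).
sudo sh -c 'cat > /etc/cvmfs/default.local' <<'EOF'
CVMFS_CLIENT_PROFILE=single
CVMFS_QUOTA_LIMIT=10000
EOF
sudo cvmfs_config setup
```

After this, `/cvmfs/software.eessi.io` auto-mounts on first access, as the demo shows.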

22:09.280 --> 22:15.360
So here I am in Amazon. I will spin up a VM — I'll make it an Arm one to make it a bit more interesting —

22:16.000 --> 22:21.280
with eight cores, 50 gigabytes of disk, which is way too big. I will launch this —

22:22.560 --> 22:30.240
it should take a sec. So I'll launch this VM — this should be quite quick — and there it goes.

22:30.240 --> 22:40.880
This gives me an IP address, I can SSH into that IP address, and it's fast — yes, there it goes.

22:41.600 --> 22:48.080
This is an entirely new VM, uptime zero minutes, so it was really spun up just now. This has,

22:48.080 --> 22:55.280
like, no software; it's a bare-bones Red Hat 9 — well, an Alma 9 — operating system. There's not even

22:55.280 --> 23:05.280
vim here — so how can I survive without vim? Anyway. So what I'm going to do:

23:05.280 --> 23:09.120
I'll just copy-paste here the installation instructions for getting access to EESSI —

23:09.120 --> 23:14.000
just literally copy-paste them, the same as was on the slide. I enter this, and now this

23:14.000 --> 23:20.720
is hopefully going to take only, like, a minute, but essentially all this is doing — the biggest

23:20.720 --> 23:27.040
part of it — is installing CernVM-FS in the VM. Now, the nice thing is we don't need to do anything extra;

23:27.040 --> 23:31.600
there's actually a second step here as well, which is installing a very small RPM that has

23:31.600 --> 23:36.720
the configuration for CernVM-FS that knows about EESSI. You can actually skip that one, because

23:36.800 --> 23:42.640
EESSI is included in the default configuration for CernVM-FS — the CernVM-FS people like us enough

23:43.520 --> 23:49.920
that we actually got into their default configuration. This is not relying on the Wi-Fi at all

23:49.920 --> 23:55.840
to do the installation, so this should be quick; it's literally just installing CernVM-FS and its

23:55.840 --> 24:01.040
dependencies. So you won't see GROMACS, OpenFOAM, or TensorFlow — none of that is actually being installed.

24:02.000 --> 24:12.480
So hopefully this completes quite soon... yes, complete. Alright, so now we have CernVM-FS available,

24:12.480 --> 24:19.120
if I remember the command name. So this is installed — that's actually all we installed. So now you

24:19.120 --> 24:24.320
think: oh yeah, we can access EESSI, right? So this /cvmfs path is now suddenly there,

24:25.120 --> 24:30.160
but it looks empty — like, what did I miss? Well, you didn't miss anything; this works like

24:30.240 --> 24:35.920
web pages: if you know the URL, you go to it, it auto-mounts the repository, and suddenly it's

24:35.920 --> 24:41.360
there. And now, of course, if I do that — let's see, here I do see it. Alright, so we have access to EESSI.

24:41.360 --> 24:48.640
excellent, now I'm thinking what the best way is to actually do the demo, well let's copy paste

24:49.920 --> 24:56.560
the script from the slides, like I promised you, this should work out of the box, so I'll

24:56.560 --> 25:04.720
literally put this in a file called eessi-demo, and of course I don't have that, I can use vi though,

25:04.720 --> 25:16.640
that's okay, put this in the script, make it executable, yeah, alright, so just to remind you what

25:16.640 --> 25:23.760
the script will do: it will set up our environment by doing CPU detection, what are we running

25:23.760 --> 25:29.040
on, please update my environment so I can start using software, it will load a GROMACS module,

25:29.840 --> 25:34.400
and it will just download the input file to run GROMACS and actually do the launch,

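The demo script from the slides does roughly the following; the init path and the placeholders are assumptions based on the EESSI repository layout, not the literal slide contents.

```shell
#!/bin/bash
# Sketch of the EESSI demo script (paths/version are assumptions).

# Initialize EESSI: detects the CPU (and any GPUs) and updates the
# environment so that modules optimized for this hardware become available.
source /cvmfs/software.eessi.io/versions/2023.06/init/bash

# Load a GROMACS module matching the detected hardware.
module load GROMACS

# Download the input file (URL as shown on the slides) and launch GROMACS.
curl -OL <input-file-url>
gmx mdrun -s <input>.tpr
```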
25:36.240 --> 25:41.280
so this spits out a whole bunch of output, also again something I'm not a big fan of right now,

25:41.280 --> 25:47.120
the key part here is it has detected that we have an ARM CPU, an ARM Neoverse V1 CPU,

25:47.120 --> 25:51.920
it supports certain instructions, that's the binary we're going to use, it didn't find any GPUs,

25:52.080 --> 25:56.800
that's correct, we don't have any, it does all that, it downloads the input file for GROMACS

25:56.800 --> 26:03.360
and it just runs GROMACS, I don't need to do anything else, so that's excellent, I'll cancel it,

26:03.360 --> 26:09.360
this will take a little while, to bring this home a bit more, and since I'm doing quite well on time,

26:11.360 --> 26:17.280
I'll download some of the demo scripts as well that we have, okay,

26:17.440 --> 26:26.560
do I need this, no, actually I don't have it, but I can, I mean, git is in EESSI, so I don't need to

26:26.560 --> 26:36.000
install it anymore, I can just do this, I set up EESSI to update my current shell to make

26:36.000 --> 26:41.360
modules available, so it does the same detection, and now I do have git, I didn't have to install

26:41.360 --> 26:48.160
it at all, so now I can do this git clone of the demo scripts, I have demo scripts here for

26:48.160 --> 26:56.080
a bunch of software, I have Bioconductor, ESPResSo, GROMACS, PyTorch, OpenFOAM, Quantum ESPRESSO,

26:56.080 --> 27:03.920
TensorFlow, somebody pick one, Quantum ESPRESSO, okay, that's the dangerous one I think, let's see

27:03.920 --> 27:12.960
what happens, so this looks wow, science, I have no idea what's going on, but essentially this

27:12.960 --> 27:18.640
will just load the modules and then do science, so let's run it and see what happens,

27:20.400 --> 27:25.840
so it sometimes takes a while to react, that's because it's firing up

27:25.840 --> 27:30.160
the binary, or trying to, and CernVM-FS goes, oh wait, I'll actually get it for you,

27:30.160 --> 27:33.600
and then you can run it, right, but it's all transparent, so you don't really notice this,

27:33.600 --> 27:39.200
so I don't know what this means, but it's something scientific, it's some kind of plot, I guess,

27:39.200 --> 27:48.000
or yeah, so it does work, okay, so hopefully that was a bit convincing and the live demo worked, yes,

27:56.400 --> 27:59.680
so that's really all there is to it, there are no hidden tricks here, you can essentially do it

27:59.680 --> 28:03.760
like I did, if you have access to any Linux system, we have installation instructions for you,

28:03.760 --> 28:09.120
Ubuntu as well, if you get really adventurous, if you're on RISC-V boards,

28:09.120 --> 28:13.280
you need to install CernVM-FS from source, it's not that difficult, it takes a while,

28:13.280 --> 28:17.680
but then yeah, so the steps may be a little bit different depending on your own system,

28:18.960 --> 28:23.120
now this is a software performance devroom, right, I already explained, you need to make sure

28:23.120 --> 28:29.280
you run binaries that are optimized for your hardware, that's very important, so build for your CPUs,

28:29.680 --> 28:34.720
don't sacrifice performance for portability of compute, like people typically do with containers,

28:34.720 --> 28:39.680
now you can actually build containers that are optimized for specific CPUs as well and run those,

28:39.680 --> 28:43.200
there's actually benefits to that as well, but that's just not what people typically do,

28:43.200 --> 28:48.000
they build a container image once and they drag it along everywhere and they assume everything is magic and fine,

28:48.000 --> 28:53.680
it's not, so you do have to be careful, like I showed you in the demo, another performance aspect,

28:53.760 --> 28:59.840
it takes minutes to get access, and then in a heartbeat it works fine, whatever you're running,

28:59.840 --> 29:03.600
it's streamed in on demand, so you don't need to wait long for it to actually start

29:04.480 --> 29:07.840
because you're only downloading what you're using, again, different from containers,

29:07.840 --> 29:11.920
if you have a container image that's big because it has a lot of software,

29:11.920 --> 29:16.160
you have to wait until the whole container is downloaded before you can start running it, so that's a bit annoying,

29:17.040 --> 29:20.720
so CernVM-FS uses lots of caching as well, so if you download something once,

29:20.880 --> 29:23.600
it has like a local cache that was that 10 gigabytes I mentioned,

29:25.280 --> 29:29.120
so it doesn't download again and again every time you run it, of course, and there's multiple levels of caching

29:30.000 --> 29:35.520
in the network, and it improves startup performance of software as well, compared to parallel file systems,

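The local cache mentioned here is configurable on the CernVM-FS client side; a minimal sketch of the relevant settings, assuming the roughly 10 gigabyte figure from the talk:

```shell
# /etc/cvmfs/default.local -- sketch of the client cache settings.
# CVMFS_QUOTA_LIMIT is in megabytes, so 10000 corresponds to the
# ~10 gigabyte local cache mentioned above.
CVMFS_CLIENT_PROFILE="single"
CVMFS_QUOTA_LIMIT=10000
```

After editing this file, `sudo cvmfs_config reload` applies the new settings.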
29:35.520 --> 29:38.720
and that's maybe the last big thing I want to talk about, I have more slides on this,

29:40.400 --> 29:45.040
and then humans waste less time as well, they collaborate, we essentially install the software

29:45.040 --> 29:53.040
once in EESSI, and everybody can use it, so the startup performance thing, I added these slides

29:53.040 --> 30:00.080
ten minutes ago, but so starting software on supercomputers can be slow, like what the heck,

30:00.080 --> 30:05.280
it's a supercomputer, why would it be slow? Well it's because very often you only have a parallel

30:05.280 --> 30:10.560
file system on the supercomputer, which is for really big things, huge data files, and a hundred computers

30:10.720 --> 30:16.560
working together on single files and writing things, parallel I/O, really nice, really bad for software,

30:16.560 --> 30:20.640
you don't want to install your software on a parallel file system, if you go to the documentation

30:20.640 --> 30:26.400
of Lustre and GPFS, it actually tells you: don't install your software on Lustre, it's just

30:26.400 --> 30:33.120
it's a bad fit for what we do, but very often it's the only option you have and like there's nothing else,

30:34.080 --> 30:39.760
so software installations are typically small files, binaries, Python files, whatever,

30:39.760 --> 30:44.800
and quite a lot of them, potentially, and it has a pretty weird access pattern compared to regular

30:44.800 --> 30:50.480
data files as well, you can solve this by putting the software in the container image and you have one

30:50.480 --> 30:55.280
big file, very good for a parallel file system, but you still want to build your software from source

30:55.280 --> 31:00.960
and then put it in the container, which is like extra work again, and if you're doing a central

31:00.960 --> 31:05.920
software stack, so you want to get something ready for your users, doing that with containers is a mess,

31:05.920 --> 31:09.600
because you either have very big containers or you have a hundred of them and people have to pick

31:09.600 --> 31:16.080
and it gets very messy again, so one very concrete example, I'll power through this quite quickly,

31:16.640 --> 31:21.200
startup performance, how do we test it, how do we benchmark it, let's try import TensorFlow,

31:21.200 --> 31:24.720
it should be fast, right? There's not a lot going on, it's a single import, we're not actually using

31:24.800 --> 31:32.320
TensorFlow, well, this triggers six and a half thousand open calls, and it opens over 500

31:32.320 --> 31:38.160
directories for some reason, it actually reads almost three and a half thousand of those files, that

31:38.160 --> 31:44.560
includes over two thousand files from TensorFlow itself, then almost a thousand files from dependencies

31:44.560 --> 31:48.800
and other stuff, and then a little bit of things from essentially the operating system and

31:49.360 --> 31:53.920
this middle layer, this compatibility layer, and in total it's about one gigabyte of data,

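Counts like these can be obtained by tracing the interpreter and counting the open syscalls. The real run below is commented out (it needs strace and TensorFlow installed); to show the counting step itself, this sketch fabricates a tiny trace file instead:

```shell
# The real measurement would be something like:
#   strace -f -e trace=openat -o tf.trace python -c 'import tensorflow'
# Here we fabricate a three-line trace just to show the counting step.
printf '%s\n' \
  'openat(AT_FDCWD, "/usr/lib/python3/dist-packages/tensorflow/__init__.py", O_RDONLY) = 3' \
  'openat(AT_FDCWD, "/usr/lib/python3/dist-packages/numpy/__init__.py", O_RDONLY) = 4' \
  'read(4, "...", 4096) = 4096' \
  'openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY) = 5' \
  > tf.trace

# Count the open calls (prints 3 for this fabricated trace):
grep -c '^openat' tf.trace
```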
31:53.920 --> 31:57.440
that's not a lot, but because it's scattered across three and a half thousand files,

31:57.440 --> 32:03.280
it's very bad for parallel file systems. To show this, some benchmarks, this is like the optimal

32:03.280 --> 32:09.680
case, crazy mode, where you install TensorFlow in memory, like you actually built TensorFlow from

32:09.680 --> 32:13.840
source and you put it in a RAM disk, which is bad because if you reboot your computer, it's gone,

32:13.840 --> 32:18.160
and you have to start again, but this is the best you can do, there's no faster file system than RAM,

32:19.120 --> 32:24.400
so if you do that, it takes about two seconds to do the import TensorFlow, not bad, that's

32:24.400 --> 32:30.560
worth waiting for, you can probably optimize this, but this is like the ground truth, we cannot really

32:30.560 --> 32:38.880
do better than this, more realistic is Lustre, so if you look at this, on the left, in cold

32:40.000 --> 32:44.480
situations, let's say a fresh reboot, Lustre hasn't cached anything yet, you're the first

32:44.480 --> 32:48.640
poor person to do import TensorFlow, you're waiting for over a minute for the import

32:48.640 --> 32:52.640
TensorFlow to complete, and this is like normal for Lustre, if you talk to the Lustre people,

32:52.640 --> 32:58.880
it's like, yeah, it's software, we told you not to do it, if it's a hot kernel cache, everything is

32:58.880 --> 33:03.440
cached already, you can get close to the optimal because then essentially everything is already

33:03.440 --> 33:07.840
in memory, so Lustre also caches things and it's not always as bad as it is on the left,

33:08.560 --> 33:14.320
Lustre is an open source parallel file system, very good, GPFS is commercial software,

33:15.760 --> 33:19.840
so there you get like 40 seconds, and whether it's cached or not, it doesn't really matter too much,

33:19.840 --> 33:25.600
so it's always like half a minute to do an import TensorFlow, very annoying, so bad fit for software,

33:26.240 --> 33:29.520
CernVM-FS does a little better, but you have to be careful how you set it up,

33:30.240 --> 33:35.120
so with CernVM-FS we get here, like in a cold situation, so the caches are clean,

33:35.360 --> 33:39.440
you get around 9 seconds if you have the software very close to your system, so with a

33:39.440 --> 33:45.040
private proxy or a private mirror server, if your mirror server is quite far away, so if you're

33:45.040 --> 33:49.520
in the Netherlands and your mirror server is in Dublin, then you're suffering about a minute,

33:49.520 --> 33:55.200
but that's as bad as GPFS, okay, I'm actually downloading from halfway across Europe,

33:55.200 --> 34:00.080
that's still doing fine, if your mirror server is across the Atlantic, then it's very bad,

34:00.160 --> 34:05.840
so you have to make sure your network is set up well, and then there's two additional situations,

34:05.840 --> 34:10.640
there's the hot cache, so everything is already in memory, so ground truth, or very close to that,

34:11.440 --> 34:16.480
if it's not in the kernel cache yet, so not in memory yet, but it is in the private cache of

34:17.440 --> 34:21.360
CernVM-FS, then you get the stuff on the right, but then we're still talking 4-5 seconds,

34:21.360 --> 34:25.440
so nothing even close to the 30 seconds or the minutes that we were seeing with the parallel file systems,

34:26.400 --> 34:30.160
and that's because CernVM-FS was built for software, right, so it knows the

34:30.160 --> 34:35.440
access patterns, it can handle small files quite well, it does local caching on the client,

34:35.440 --> 34:41.760
multiple caching levels, so way better than GPFS or Lustre do for this type of stuff, yeah,

34:42.320 --> 34:47.600
more information, time is up, okay, I won't go through this, I covered most of this, lots of

34:47.600 --> 34:53.440
software is already there, if software is missing, well, this is a community project,

34:53.440 --> 34:58.480
you can send us a pull request, please add this, we will check it, if it's okay, we will build it,

34:58.480 --> 35:02.960
we will ingest it in EESSI and it's there for everybody, if you want to learn more about this,

35:02.960 --> 35:09.120
come and talk to me, but this is a community project, EESSI is already available all across Europe

35:09.120 --> 35:13.120
in the cloud, on the big European systems, there's a list in our documentation,

35:14.720 --> 35:19.040
and it's part of the EuroHPC Federation platform, so really, essentially, Ghent

35:19.040 --> 35:24.080
University is being paid to integrate it into this new platform, which doesn't exist yet,

35:24.080 --> 35:30.160
it's coming end of March, and it's essentially a one-stop shop, one central point to access

35:30.160 --> 35:35.840
European supercomputers, EESSI is going to be part of this, more information in the webinar,

35:35.840 --> 35:42.320
and thanks to our sponsors, the European Union and EU projects, and lots of links to click on

35:42.320 --> 35:43.520
and get more information.

35:49.760 --> 35:54.800
I guess we have time for a question.

36:05.840 --> 36:12.400
So, if I listened correctly, the main goal of the EESSI project is to make infrastructure,

36:12.400 --> 36:16.720
so it's easy for the scientists to access the HPC, right?

36:17.120 --> 36:20.880
Yeah, to use the software that they want to use, not necessarily HPC, but.

36:20.880 --> 36:27.600
Okay, so then you mentioned that one issue that you have is making that software performant

36:27.600 --> 36:30.960
for specific CPUs, right?

36:30.960 --> 36:36.480
Yeah, like ARM, GPUs and so on, so there's another player now coming to the industry, which is the

36:36.480 --> 36:38.960
QPUs, right, the quantum processing units.

36:38.960 --> 36:43.120
Yeah. So, my question is, is there an initiative in your project,

36:43.120 --> 36:46.480
where that is taken into account, or how is that being handled?

36:46.480 --> 36:50.960
Because this is a hot topic now in the quantum computing industry.

36:50.960 --> 36:58.160
How can we integrate those things, right? How can you share the load between a GPU or

36:58.160 --> 37:03.520
a CPU and so on? It would be very nice to have some open source R&D there in this field,

37:03.520 --> 37:09.280
because there is a lot of proprietary stuff. So, summarized: could EESSI help with the situation where you

37:09.280 --> 37:14.640
are getting quantum computers and you want to optimize software for those, very good question.

37:14.640 --> 37:19.920
I don't have good answers there. What I can say is that EESSI is being integrated into this,

37:19.920 --> 37:23.440
which is not only the supercomputers, but also the quantum computers. So, in some sense,

37:23.440 --> 37:29.600
we'll have to provide an answer to that. Now, to me, the libraries that you would use to access

37:29.600 --> 37:35.040
the quantum computer are just another software package. I don't care too much. And for me, it's the

37:35.040 --> 37:38.880
job of the libraries to make sure that they are able to use the quantum computer very well.

37:39.840 --> 37:44.560
Or maybe the compiler that's being used, it's not really programming languages with quantum computers,

37:44.560 --> 37:48.400
it's very different. But I'm more on that side, and we will just install all the

37:48.400 --> 37:53.600
software that you need to make good use of the quantum computers. But it's not our job, I think,

37:53.600 --> 37:59.360
to optimize for, I don't know how the quantum computer works. I'm happy to install thousands of

37:59.440 --> 38:04.560
software packages. So, as long as it's open source, then we'll support it. Okay? Thanks a lot.

38:08.800 --> 38:14.720
Yeah, we have to wrap up, if anyone has questions, I'll be outside so you can meet me there.

38:15.760 --> 38:16.720
Thanks a lot. Thank you.

