WEBVTT

00:00.000 --> 00:09.440
So we'll be talking about what it actually costs in

00:09.440 --> 00:15.960
terms of energy, and perhaps other resources, to run these models on GPUs. It is not

00:15.960 --> 00:19.960
an inconsequential subject, but one that typically gets overlooked in all the excitement

00:19.960 --> 00:21.640
of actually running something.

00:21.640 --> 00:48.600
So, how many people are interested in optimizing AI workloads? And how many people are interested

00:48.600 --> 00:52.600
in optimizing energy efficiency in AI workloads?

00:52.600 --> 00:58.840
Okay, then this is for you, and my aim for this very quick session is to increase the number

00:58.840 --> 01:00.800
of hands that we see today.

01:00.800 --> 01:05.480
So, by the way, my name is Tushara and I'm an assistant professor at a Canadian university.

01:05.480 --> 01:10.600
So, let's start with one more question.

01:10.600 --> 01:19.760
For how many months, or how many years, could we power a typical American

01:19.760 --> 01:27.800
household with the energy that was used to train GPT-3?

01:27.800 --> 01:38.160
Any guesses? You can shout out how many years, or how many months,

01:38.160 --> 01:44.480
we could power a typical American household with the same energy that

01:44.480 --> 01:48.320
was used to train GPT-3.

01:48.320 --> 01:55.840
Well, there were some wild guesses, but the answer is 120 years.

01:55.840 --> 02:03.880
So, that is what we used, and if you are surprised by that number: GPT-4 was 40 times

02:03.880 --> 02:05.480
bigger than that.

02:05.480 --> 02:13.160
And this plot shows what kind of resources,

02:13.160 --> 02:18.160
in terms of electricity, we are consuming to train these large models.

02:18.160 --> 02:25.640
And this is another depiction, showing the predictions we are making in terms

02:25.640 --> 02:31.680
of the energy consumption of AI training and inference.

02:31.680 --> 02:36.720
You can see that it is a very tiny number at the beginning of 2023.

02:36.720 --> 02:45.040
And the prediction for 2030 is spiking; look at the scale: we go from about

02:45.040 --> 02:51.200
3 terawatt-hours to 653 terawatt-hours. Of course, it is an estimation,

02:51.200 --> 02:53.200
but still, give or take.

02:53.200 --> 02:55.400
So, what can we do about it?

02:55.960 --> 03:03.160
So, sustainable AI is an emerging field where we don't want to

03:03.160 --> 03:08.520
lose any of the benefits; we of course want to reap all the benefits of the AI technologies

03:08.520 --> 03:13.640
that people like you are developing, but we want to optimize them.

03:13.640 --> 03:22.200
So, that is the premise of sustainable AI, and the first important point here

03:22.280 --> 03:28.680
is: how can we measure how much energy your AI model is consuming?

03:28.680 --> 03:37.720
So, we can measure it: there are various hardware power meters and software

03:37.720 --> 03:44.040
energy meters. Some of them may be known to you; we have seen the commands

03:44.040 --> 03:51.640
people are trying with nvidia-smi, for example. So, the tools are there, but the problem is that they

03:51.720 --> 03:57.320
are good for computing energy consumption at the system level.
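
As a concrete illustration of what such software energy metering does under the hood, here is a minimal sketch of my own (not the speaker's tool): it integrates sampled power readings over time to estimate energy. In practice the samples would come from polling something like `nvidia-smi --query-gpu=power.draw`; here they are hard-coded for illustration.

```python
# Minimal sketch of software energy metering: integrate sampled power over time.
# Real samples would come from polling the GPU (e.g. nvidia-smi or NVML) at a
# fixed rate; the values below are made up for illustration.

def energy_joules(power_samples_w, interval_s):
    """Estimate energy (J) from power samples (W) taken every interval_s
    seconds, using the trapezoidal rule."""
    if len(power_samples_w) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(power_samples_w, power_samples_w[1:]):
        total += (a + b) / 2.0 * interval_s
    return total

# Example: five samples, one second apart, around 250 W.
samples = [240.0, 255.0, 260.0, 250.0, 245.0]
print(energy_joules(samples, 1.0), "J over ~4 s of the run")
```

Note that this yields one number for the whole device over the sampled window, which is exactly the system-level limitation the talk points out.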

03:57.320 --> 04:04.440
So, you can measure the energy that the entire system, the entire rack, is taking,

04:05.720 --> 04:11.960
but if you're interested in figuring out how much energy one single API call is

04:11.960 --> 04:18.920
taking, that is not so easy to figure out. Some time back, we wrote a paper

04:19.000 --> 04:26.280
about that: how we can instrument source code and insert appropriate checks,

04:26.840 --> 04:32.520
because accuracy is another concern. We can measure energy, but it should be accurate.

04:32.520 --> 04:38.920
So, how can we ensure that it is accurately calculated, and log all the details?

04:38.920 --> 04:45.080
And this instrumentation is done automatically.
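
The per-call instrumentation described here might look roughly like the following sketch. This is my own illustration, not the actual tool from the paper: a decorator brackets one function call with power readings so the energy can be attributed to that single call, and `read_power_w` is a hypothetical stub standing in for a real GPU query.

```python
# Rough sketch of per-call energy attribution via source instrumentation
# (an illustration, not the speaker's actual tool). read_power_w() is a
# stand-in stub; a real implementation would query the GPU (e.g. via NVML).
import functools
import time

def read_power_w():
    # Stub: returns a constant; a real version would read the device power.
    return 250.0

def measure_energy(func):
    """Decorator: estimate one call's energy as mean power x elapsed time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        p_start = read_power_w()
        t_start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - t_start
        p_end = read_power_w()
        wrapper.last_energy_j = (p_start + p_end) / 2.0 * elapsed
        return result
    wrapper.last_energy_j = 0.0
    return wrapper

@measure_energy
def api_call():
    time.sleep(0.01)  # stand-in for the real work of one API call
    return "ok"

api_call()
print(round(api_call.last_energy_j, 2), "J estimated for one call")
```

An automatic instrumenter, as in the paper, would insert such probes into the source for you rather than requiring manual decoration.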

04:45.960 --> 04:52.440
We, in fact, extended that tool. By the way, there was a tool in the previous paper as well,

04:52.440 --> 05:00.120
and it is open source. We recently extended it into the CodeGreen tool, which is also open source

05:00.120 --> 05:06.760
and which you can use; it has further increased the precision and accuracy

05:07.480 --> 05:14.600
of the energy measurements. And as a community, we know many, many techniques that we can apply

05:15.160 --> 05:21.800
in the various phases of AI workloads. So, I'm just summarizing some of them here.

05:23.240 --> 05:29.160
Now, the very last question that I want to ask is this: at the application level,

05:29.160 --> 05:36.360
there are various techniques, as you can see, but what about at the GPU level, or the GPU-optimization

05:37.080 --> 05:42.360
level? Some of you may say that, well, we can maximize the occupancy.

05:43.640 --> 05:52.360
And that is a valid guess, but here I would also like to highlight something very

05:52.360 --> 05:58.360
critical that we recently figured out: it is not only the number of threads that counts,

05:59.640 --> 06:06.040
but the thread block configuration also counts. Just to highlight that: all of the

06:06.920 --> 06:13.000
configurations that you see in the highlighted bar are using the same number of threads,

06:13.000 --> 06:20.200
but their block configuration is changing, and that is changing the energy consumption for that specific task.
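
To make the "same thread count, different block configuration" point concrete, here is a small sketch of my own (not from the talk): it enumerates CUDA-style launch configurations that all realize the same total number of threads. The talk's finding is that such configurations can differ in energy consumption even though the total work is identical.

```python
# Enumerate (threads_per_block, num_blocks) pairs that launch the same total
# number of threads. The talk's observation: energy can differ across these
# configurations even though the total thread count is fixed.

def block_configs(total_threads, max_threads_per_block=1024):
    """Return (threads_per_block, num_blocks) pairs whose product equals
    total_threads, stepping block sizes in warp-sized (32) increments."""
    configs = []
    for tpb in range(32, max_threads_per_block + 1, 32):
        if total_threads % tpb == 0:
            configs.append((tpb, total_threads // tpb))
    return configs

for tpb, blocks in block_configs(1 << 20):  # ~1M threads, several shapes
    assert tpb * blocks == 1 << 20
print(len(block_configs(1 << 20)), "launch shapes for the same thread count")
```

A benchmarking harness in the spirit of the talk would launch the same kernel under each of these shapes and compare the measured energy, not just the runtime.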

06:21.960 --> 06:28.360
And again, if you are interested in this work: we

06:28.920 --> 06:35.720
predict energy consumption by instrumenting not only the source, but also

06:36.280 --> 06:42.920
by using the PTX code, which some of you might know. So, that is it. If you are interested

06:42.920 --> 06:47.720
in this line of work, these are all the papers that we recently published. So,

06:47.720 --> 06:51.720
feel free to take a look. Thank you.

