WEBVTT

00:00.000 --> 00:29.840
All right, time for the next talk. Quiet, everyone. All right, so every

00:29.840 --> 00:34.880
year we try to find a presentation; sometimes they get submitted, sometimes we

00:34.880 --> 00:40.960
have to go find people introducing a new and exciting inference framework, because I think

00:40.960 --> 00:44.720
it's all about those things these days. Again, you can do a lot

00:44.720 --> 00:49.760
of things with PyTorch, but that is super boring, and it's not really inference-optimized. So last

00:49.760 --> 01:03.760
year it was the ML one; this year, it's Tract. So Julian and Matthew, take it away.

01:03.760 --> 01:09.200
Okay, well, thank you very much for attending this presentation. Julian and myself are both

01:10.080 --> 01:17.200
working as software engineers at Sonos voice control in Paris, and we are here today

01:17.200 --> 01:23.120
to introduce two open-source libraries. One has been around for a while, it's Tract, and the other

01:23.120 --> 01:30.240
is a companion project that we open-sourced last summer. So, just scratching the surface, we have

01:30.240 --> 01:36.880
just a couple of minutes to introduce Tract. So, what you have to remember: it's a generic

01:36.880 --> 01:43.680
neural network inference library. One of its characteristics is that it's Tract

01:43.760 --> 01:48.960
all the way down: it goes from parsing your model to executing it, so it contains the matrix

01:48.960 --> 01:54.880
multiplication and the optimized routines that we need. It's not new; it has been around for nearly

01:54.880 --> 02:01.920
ten years, and it's been used extensively at Sonos, but some other companies have been using it too.

02:02.800 --> 02:08.240
It's written in Rust, it's meant to be very integration-friendly, and it's released under a very

02:08.240 --> 02:15.440
permissive license. So just to give you an idea of what it looks like to use Tract,

02:16.800 --> 02:21.680
there's a bunch of code here, but very little of it is actually Tract code. The first two lines show

02:21.680 --> 02:27.520
how to instantiate the NNEF parser of Tract; the second line shows how to load a model,

02:28.400 --> 02:35.120
transform it into a runnable form which you can actually execute, which means finding

02:35.120 --> 02:39.840
out the right order of operations, stuff like that. So that's the first two lines at the top of

02:39.840 --> 02:46.240
the screen, and then you have a bunch of stuff which is just getting an input ready for processing,

02:46.240 --> 02:54.080
and finally you get to model.run, where you actually run the model on the image, and the rest is

02:54.080 --> 03:03.120
post-processing and display to make the example work. So it's very simple to use. Another facet of Tract

03:03.200 --> 03:10.320
is its command-line tool. It's designed to be able to first dump a model; here you can

03:10.320 --> 03:16.560
see the beginning of a MobileNet model. You can see the convolutions with some details, so it's useful

03:16.560 --> 03:23.360
when you're given a model and need to figure out how you can make it run, that's your first stop.

03:25.280 --> 03:30.640
But then once it's running, you can actually get insight into the performance. Here you can see

03:30.640 --> 03:37.760
a profile of the same MobileNet network, where you can see that the depthwise convolution takes

03:37.760 --> 03:45.760
nearly 10% of the entire execution time, and you also get some insight into the arithmetic

03:45.760 --> 03:51.120
intensity of the operators, so you can decide if something is what you expect or if you need to

03:51.120 --> 03:58.960
invest in it. As I said before, Tract is meant to be very easy to integrate, so one of the key
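As an aside, the arithmetic-intensity figure the profiler reports can be understood with a back-of-the-envelope estimate. The sketch below is purely illustrative (plain Python, the standard FLOPs-per-byte accounting, made-up layer sizes), not Tract's profiler logic:

```python
# Back-of-the-envelope arithmetic intensity (FLOPs per byte moved) for a
# depthwise convolution, the kind of figure the profiler reports.
# Layer sizes here are illustrative, not taken from the actual MobileNet dump.

def depthwise_conv_intensity(h, w, c, k, dtype_bytes=4):
    """FLOPs / bytes for a stride-1, 'same'-padded depthwise convolution."""
    flops = 2 * h * w * c * k * k          # one multiply-add per filter tap
    # Bytes moved: read input and weights, write output (ignoring caches).
    bytes_moved = dtype_bytes * (h * w * c      # input
                                 + k * k * c    # weights
                                 + h * w * c)   # output
    return flops / bytes_moved

print(round(depthwise_conv_intensity(112, 112, 32, 3), 2))   # 2.25
```

The low result, only a couple of FLOPs per byte, is why depthwise convolutions tend to be memory-bound rather than compute-bound, which matches the profiling observation above.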

03:58.960 --> 04:05.120
points is that it doesn't have any big, cumbersome third-party dependencies. It's very easy to cross-compile because we

04:05.120 --> 04:12.320
are using Rust. It has support for WebAssembly, so you can actually run a model in a browser,

04:13.760 --> 04:20.640
and we have a bunch of targets that we optimize for: ARM32 and ARM64, obviously, because,

04:20.880 --> 04:28.640
well, Sonos, but also Intel and WASM. WASM works with SIMD, I mean, all of them are SIMD,

04:28.640 --> 04:38.240
but that one was WASM SIMD. We are also optimizing for Metal and CUDA. Tract has a C API, and on top of this C API

04:38.240 --> 04:46.720
we have some Python bindings. Recently we've worked on adding GPU support to Tract, and we still

04:46.800 --> 04:51.840
want it to be very easy to use, so you can see the difference between the first two lines of code and

04:51.840 --> 04:58.880
the one in the middle of the screen: that's what it means to go from CPU to GPU, and everything is handled

04:58.880 --> 05:04.320
on Tract's side. It's still alpha, in fact; we're still working on the support, but you can see that in this case

05:04.320 --> 05:09.600
we get a nice improvement in performance, and you can also see a small demonstration showing

05:09.760 --> 05:18.000
how the command line can be used to bench a model. Here I'm going to show some of the

05:18.560 --> 05:26.320
weird things that you might find if you play with Tract, and in this instance we are

05:26.320 --> 05:36.800
showing the dump of a LLaMA model, and one thing that I want you to look at is the source, with this S thing,

05:37.680 --> 05:44.720
which means that Tract supports symbolic dimensions in shapes. Tract likes to know everything about

05:44.720 --> 05:49.280
the network before running it, so we need to know the rank of each tensor and we need to know

05:49.280 --> 05:56.560
the length of each dimension. But there is a twist: you can

05:56.560 --> 06:01.440
use symbols, so we don't have to have the exact value; a symbolic expression is enough to know

06:01.440 --> 06:08.960
how to reason about your model. And I have another example here, from the same

06:08.960 --> 06:16.880
model, the LLaMA one: the dynamic key-value cache operator. What you can see here is that the shape is now

06:16.880 --> 06:21.840
S plus P, S being the number of tokens in the current turn, often one or two if you're doing

06:21.840 --> 06:26.960
token generation, and P is the past sequence, so here you can have all your prompt
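To make the symbolic-dimension idea concrete, here is a toy, stdlib-only Python sketch. It illustrates the concept only; this is not Tract's API, and the names S and P simply mirror the dump described here:

```python
# Toy illustration of symbolic dimensions (not Tract's API): a shape can
# carry an expression like S+P that is only resolved to a number at run time.

class Sym:
    """A symbolic length, e.g. S (new tokens) or P (past tokens)."""
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        return Expr(f"{self.name} + {other.name}")

class Expr:
    def __init__(self, text):
        self.text = text
    def resolve(self, **values):
        # Substitute concrete values once they are known.
        return eval(self.text, {"__builtins__": {}}, values)

S, P = Sym("S"), Sym("P")
kv_len = S + P                       # symbolic KV-cache length, as in the dump

print(kv_len.resolve(S=12, P=0))     # prompt pass: 12 new tokens, empty cache
print(kv_len.resolve(S=1, P=12))     # generation: 1 new token, 12 cached
```

The engine can thus check and plan the whole model against the expression S + P before a single concrete sequence length exists.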

06:26.960 --> 06:34.560
or everything accumulated before. One other thing which is different about Tract compared to

06:34.560 --> 06:41.840
other engines is that Tract can do state management. What you can see here is that

06:42.640 --> 06:49.280
the dynamic key-value cache operator just has one input and one output; it doesn't have to be plugged

06:49.280 --> 06:55.200
into extra model inputs and outputs for state management, so state can be

06:55.200 --> 07:06.720
managed by Tract inside a state object that the Tract API exposes. All of the examples so far
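The distinction being made here can be sketched in a few lines of illustrative Python. Neither style below is Tract's actual API; the point is just to contrast caller-managed cache inputs/outputs with a runtime-owned state object:

```python
# Style 1: stateless engine - the caller must thread the cache through
# extra inputs and outputs on every call.
def run_stateless(token, cache):
    cache = cache + [token]          # stand-in for the KV-cache update
    return len(cache), cache         # output, plus the cache to pass back in

# Style 2: stateful session - the cache lives inside the session object,
# so the model keeps a single input and a single output.
class Session:
    def __init__(self):
        self._cache = []
    def run(self, token):
        self._cache.append(token)
        return len(self._cache)

out, cache = run_stateless("a", [])
out, cache = run_stateless("b", cache)   # caller carries `cache` around
sess = Session()
sess.run("a")
print(out, sess.run("b"))                # prints: 2 2
```

With the second style, the graph itself stays clean, which is the benefit being described for the dynamic key-value cache operator.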

07:06.720 --> 07:13.040
that I've shown are about NNEF, and we'll come back to that, but Tract also has some support for, I mean,

07:13.840 --> 07:20.560
first-class citizen support for ONNX: we support about 85% of the operator set, and that pans out to

07:20.560 --> 07:28.080
more models than that. We have to do some adjustments, because of the knowing all about the

07:28.080 --> 07:36.400
tensors beforehand thing, and the protobuf format doesn't contain everything we need,

07:36.480 --> 07:43.120
so here you can see the with_input_fact line, line four or five, where we actually load the model in two

07:43.120 --> 07:49.280
stages: first we load some kind of proto-model, then we add some type information, shape information here,

07:51.040 --> 07:58.480
to specify exactly the shape of one input and that's enough in that case to help Tract

07:58.480 --> 08:04.400
infer everything else about the shapes in the model. So it's really analogous to
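As a conceptual sketch of this two-stage loading (plain Python, a toy graph, not Tract's API): given only an untyped "proto-model" plus the shape of one input, every downstream shape becomes inferable:

```python
# Toy analogue of Tract's two-stage ONNX loading: stage 1 yields an op list
# with no shape information; stage 2 pins one input shape, and everything
# else is inferred by propagation. Op semantics here are simplified.

def infer_shapes(ops, input_shape):
    """Propagate a known NCHW input shape through a tiny op list."""
    shape = input_shape
    shapes = {"input": shape}
    for name, op, arg in ops:
        if op == "conv":                         # arg = (out_channels, kernel)
            c, k = arg
            shape = (shape[0], c, shape[2] - k + 1, shape[3] - k + 1)
        elif op == "flatten":
            shape = (shape[0], shape[1] * shape[2] * shape[3])
        shapes[name] = shape
    return shapes

# Stage 1: the "proto-model" is just the op list, with no shapes anywhere.
proto = [("conv1", "conv", (8, 3)), ("flat", "flatten", None)]
# Stage 2: pinning the single input fact makes every shape inferable.
print(infer_shapes(proto, (1, 3, 224, 224))["flat"])   # (1, 394272)
```

One pinned input fact is enough because each operator's output shape is a function of its input shapes, exactly the property the speaker relies on.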

08:05.040 --> 08:09.920
when you need to add some type information in a C++ or Rust program for the

08:09.920 --> 08:21.280
compiler to figure everything out. More recently, the LLM thing happened, and it came with

08:21.280 --> 08:26.160
engines that are no longer generic but take a new approach where basically you

08:26.160 --> 08:32.400
hard-code the model in the engine and then you provide the tensors on the side. And that doesn't work

08:32.400 --> 08:40.400
for Tract, because Tract wants the model as an input. So for some time we tried to manage with

08:40.400 --> 08:45.680
ONNX, but we realized we were not going anywhere with this, and we had this internal tool

08:45.680 --> 08:50.400
that is called torch-to-nnef, and that is what Julian is going to introduce now.

08:51.600 --> 08:52.320
thank you Matthew

08:54.640 --> 09:01.440
So we will now discuss shipping neural networks from Torch, via torch-to-nnef, to the NNEF format,

09:02.320 --> 09:07.920
an NNEF format which is highly compatible with the Tract inference engine that we have just

09:07.920 --> 09:16.160
seen. So before going on, we just want to define what a model asset is for us: something which

09:16.160 --> 09:22.560
is completely separate from the neural inference engine: a list of tensors and a graph of

09:22.560 --> 09:29.680
computations that have been defined correctly, which gives you the set of transformations that go from

09:29.680 --> 09:35.520
input to output. This is what you need to have something independent that you can consider

09:35.520 --> 09:42.720
an asset. So what is NNEF? NNEF stands for Neural Network Exchange Format. It addresses the same

09:42.720 --> 09:49.360
core problem as ONNX and was specified by the Khronos Group. It's something that didn't

09:51.200 --> 09:58.560
catch on in the community as much as it could have, because it landed one year after

09:58.560 --> 10:05.680
ONNX and it didn't have a proper inference engine to run on. But it has some very interesting

10:05.680 --> 10:12.640
key features. Namely, it has a readable graph structure, which you can see on the right there, so

10:12.640 --> 10:19.760
it's very easy to read: it's textual, so you can directly open it with your text editor,
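To give a feel for that textual readability, here is a small schematic fragment in the style of the Khronos NNEF specification (hand-written for illustration; the names and exact attribute syntax are not taken from the slide):

```
graph toy_net(input) -> (output)
{
    input = external(shape = [1, 3, 224, 224]);
    filter = variable(shape = [32, 3, 3, 3], label = "conv1_filter");
    conv1 = conv(input, filter, stride = [2, 2]);
    output = relu(conv1);
}
```

Every tensor and operation is a named, human-readable line, which is what makes the format easy to inspect and diff without special tooling.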

10:19.760 --> 10:25.200
and it has composition available directly. As you know, neural networks repeat blocks

10:25.280 --> 10:32.400
over and over again, so you don't have to repeat them in the file. Especially for us,

10:32.400 --> 10:37.760
what was interesting is that there is an extensible tensor format as well, defined in the

10:37.760 --> 10:45.760
specification, which allows us to have extended quantization logic. And so now, what is torch-to-

10:45.760 --> 10:52.560
nnef? torch-to-nnef is a project that was born in 2022, out of frustration trying to export

10:52.640 --> 10:58.880
ONNX quantized models for internal projects, and it aims to be a strong bridge between PyTorch

10:58.880 --> 11:08.480
and NNEF, and more precisely Tract's extended NNEF version, with checks upfront at export time that the

11:08.480 --> 11:16.320
two inference engines, that is the PyTorch one and the Tract one, are

11:16.560 --> 11:23.760
equivalent for some specific examples. And so you can see here an example where you export a model

11:23.760 --> 11:30.160
with an API which is very similar to the ONNX one you may be familiar with from PyTorch, except now you have this

11:30.160 --> 11:36.080
inference target parameter, which allows you to specify any inference engine, but here

11:36.080 --> 11:42.000
it is Tract, with a specific version, which allows you to unlock some specific features. And so why

11:42.080 --> 11:47.280
choose torch-to-nnef? As we discussed, it unlocks some specific quantization, with quantization

11:47.280 --> 11:55.680
functions which are more advanced. It can give you control of specific module serialization if you

11:55.680 --> 12:03.120
wish, so you can take the hand on some specific stuff. You can add additional expressions for

12:03.120 --> 12:10.720
the symbolic dimensions which we discussed about Tract, directly baked into the model assets. And finally,

12:10.720 --> 12:17.440
it's more focused on signal-based neural networks, so it will work better compared to ONNX to

12:17.440 --> 12:25.120
export this kind of model for Tract. So thanks for listening; if you have any questions, feel free to reach us

12:25.760 --> 12:31.920
on GitHub or just ask us now. There is documentation as well if you wish to contribute to

12:32.160 --> 12:43.040
or use the project. Thank you very much. We plan to stay around the room if you have questions

12:43.040 --> 12:56.560
and stuff. I also jumped into the Matrix group. So thank you so much, guys. I don't see any questions, so I

12:56.560 --> 13:02.000
guess we'll just let you hang out with the people.

