WEBVTT

00:12.000 --> 00:13.000
All right.

00:13.000 --> 00:15.000
So, hi everyone.

00:15.000 --> 00:20.000
Hi everyone, and welcome.

00:20.000 --> 00:23.000
My name is Son, and I'm from Hugging Face.

00:23.000 --> 00:30.000
And today I'm going to talk about my subject, which is multimodal support in llama.cpp.

00:30.000 --> 00:40.000
So, my plan for this talk: I will quickly do a self-introduction, and then dig into the history of multimodal support.

00:40.000 --> 00:44.000
And then I'll talk about my work and some future directions.

00:44.000 --> 00:49.000
And lastly, there will be some tips for you if you want to contribute to the project.

00:49.000 --> 00:56.000
And I will try to reserve maybe five minutes for you to ask some questions.

00:56.000 --> 00:59.000
So, I hope we will have time for questions.

00:59.000 --> 01:00.000
So, let's start.

01:00.000 --> 01:05.000
My name is Son, and I'm a software engineer at Hugging Face.

01:05.000 --> 01:09.000
I'm one of the core maintainers of llama.cpp.

01:09.000 --> 01:13.000
And you can see my website and my GitHub profile here.

01:13.000 --> 01:22.000
My slogan is that I'm doing it for fun, not for profit. And by fun, I mean the science behind machine learning and AI.

01:22.000 --> 01:30.000
And if you are curious about my work, you can visit my GitHub profile; I do a bunch of work on llama.cpp.

01:30.000 --> 01:33.000
So, let's move on to the subject.

01:33.000 --> 01:39.000
So, what is the history of multimodal support in llama.cpp?

01:39.000 --> 01:48.000
So, from the very beginning of the project in 2023, we had initial support for a model called LLaVA.

01:48.000 --> 01:54.000
And then, we also had that support in the llama.cpp server. It was nice.

01:54.000 --> 02:00.000
And then what happened is, unfortunately, we actually had to remove it, because the code was just too ugly.

02:00.000 --> 02:12.000
So, in the meantime, there were also new models, because back then multimodal, and vision models especially, were quite a hot thing.

02:12.000 --> 02:27.000
And the user experience was not very good, because for each new model, you had to use its own dedicated command-line binary.

02:27.000 --> 02:41.000
So, last year I started to work on this. I tried to, at the same time, improve the user experience and the developer experience by introducing a new thing. I call this library libmtmd.

02:41.000 --> 02:50.000
And that way, I was able to bring multimodal support back to llama.cpp, both the CLI and the server.

02:50.000 --> 03:02.000
So, before my work on libmtmd, what the landscape looked like is that we had three big things inside llama.cpp.

03:02.000 --> 03:05.000
We had libllama, which is the main library.

03:05.000 --> 03:12.000
And then, we had clip.cpp, which is an implementation of the vision transformer.

03:12.000 --> 03:18.000
And then llava.cpp, which was something specific to the LLaVA model.

03:18.000 --> 03:26.000
And then, when people wanted to add a new model, they tried to hack these libraries to add their new model inside.

03:26.000 --> 03:33.000
So it was not very nice; it was very costly to maintain that much infrastructure.

03:33.000 --> 03:40.000
So, let's get to libmtmd. What is mtmd? It's not a model name.

03:40.000 --> 03:54.000
So, initially, I proposed this as libllava2. Then I realized I cannot use a name that is tied to one specific model, so let's get rid of that naming entirely.

03:54.000 --> 04:04.000
So the idea is that now I abstract everything away, and what I want is one single library.

04:04.000 --> 04:16.000
So, about the core architecture: we used to have clip.cpp, which contains the vision model, the vision transformer.

04:16.000 --> 04:23.000
And then the, let's say, pre-processing step is done by another submodule.

04:23.000 --> 04:31.000
So it's almost, I would say, almost transparent to the end user and to the developer.

04:31.000 --> 04:40.000
So, libmtmd does more than just encapsulate the core library; it's not just clip.cpp.

04:40.000 --> 04:54.000
But I also aim to bring truly multimodal support, because we started with vision, and now we have audio, and then also video, which is just images and audio, and other things along the way.

04:54.000 --> 05:02.000
And so, along the way, I also provide submodules for audio input and audio pre-processing.

05:02.000 --> 05:10.000
And I also aim for it to be very extensible, because it's now more and more modular.

05:10.000 --> 05:16.000
So, here is an example of one of the APIs that I added to the project.

05:16.000 --> 05:19.000
It's called mtmd_tokenize.

05:19.000 --> 05:30.000
And the way it works, as you can see, is that we have an input text. Inside the input text, you can specify where the media goes, and then you can have text around it.

05:30.000 --> 05:38.000
And voila, this is the marker. And then, we can pass in bitmaps.

05:38.000 --> 05:47.000
So you can pass in a bitmap, which is an image or audio, or maybe something else in the future.

05:47.000 --> 05:51.000
And it will be substituted in at the exact position of the marker.

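The tokenization flow described here, text containing a media marker, with bitmaps substituted at each marker position, can be sketched in plain Python. This is a conceptual illustration only: the `MARKER` string and the whitespace "tokenizer" are stand-ins, not the real llama.cpp implementation (the actual API is the C function `mtmd_tokenize` in libmtmd, working on input-text and bitmap objects).

```python
# Conceptual sketch of marker-based multimodal tokenization.
# MARKER and the whitespace tokenizer are illustrative stand-ins.
MARKER = "<__media__>"

def tokenize_with_media(text, bitmaps):
    """Split `text` at each MARKER and interleave text chunks
    with the corresponding media bitmaps, in order."""
    parts = text.split(MARKER)
    if len(parts) - 1 != len(bitmaps):
        raise ValueError("number of markers must match number of bitmaps")
    chunks = []
    for i, part in enumerate(parts):
        if part.strip():
            # stand-in for real text tokenization
            chunks.append(("text", part.split()))
        if i < len(bitmaps):
            # stand-in for image/audio patch embedding
            chunks.append(("media", bitmaps[i]))
    return chunks

chunks = tokenize_with_media(
    "Describe this image: <__media__> in one sentence.",
    ["cat.png"],
)
```

The point is that the caller never handles model-specific image tokens: the library replaces each marker with the right media chunk for whatever model is loaded.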
05:51.000 --> 05:57.000
So, let's see a little bit more here.

05:57.000 --> 06:06.000
So, I'm not going to go into the developer side; I will only show you the user experience here.

06:06.000 --> 06:09.000
So, let's start with llama-cli.

06:09.000 --> 06:16.000
And I'm going to try a model: Gemma 3, the 4-billion-parameter one.

06:16.000 --> 06:29.000
So right now I'm using the CLI, the command-line interface, and the other interface is the llama-server.

06:29.000 --> 06:39.000
So, I attach an image. Let's try this image. Then I ask what it is: "What is this? What is that?"

06:39.000 --> 06:48.000
Yes, it says it's a piece of paper. This one, by the way. Yeah.

06:48.000 --> 06:52.000
All right. So, how do I go back? Yes.

06:52.000 --> 07:04.000
So, that's a little quick demo of multimodal on llama.cpp, and the web interface of it.

07:04.000 --> 07:08.000
So, let's talk about the future directions I plan.

07:08.000 --> 07:14.000
So, so far, I have planned to support these two big things.

07:14.000 --> 07:20.000
Let's dive into the first thing: multimodal output.

07:20.000 --> 07:28.000
So recently, we have a lot of image generation models, like... I don't remember the names yet.

07:28.000 --> 07:46.000
But the main idea of how it works is that you have libllama, which is the main library. That will produce some embedding output, and then we have to decode that embedding output into your audio or image.

07:46.000 --> 07:58.000
So, the overall idea looks like this. It's a little bit complicated. Why? Because some models right now generate text and multimodal output interleaved.

07:58.000 --> 08:10.000
So that's relatively complicated, because you can imagine that the model actually first generates some text saying, "here's the image that I have generated for you."

08:10.000 --> 08:25.000
Or maybe it does some reasoning steps, and then it emits some kind of token, like a start-generation token. And then when I get this token, I have to switch to using libmtmd to generate the image, for example.

08:25.000 --> 08:33.000
And then at some point, there will be another token that says "stop generation," and I have to switch back.

08:33.000 --> 08:37.000
So, that is one of the complicated parts of this system.

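The token-driven switching described here can be sketched as a small state machine over the generated token stream. Everything in this sketch is hypothetical: the special token names and the decoder helper are illustrative stand-ins, not the actual llama.cpp design.

```python
# Illustrative sketch of interleaved text + image generation.
# Token names and the decode helper are hypothetical stand-ins.
START_IMAGE = "<start_of_image>"
END_IMAGE = "<end_of_image>"

def decode_image(embedding_tokens):
    # stand-in for a diffusion- or codec-based media decoder
    return f"<image from {len(embedding_tokens)} embeddings>"

def run_interleaved(token_stream):
    """Route tokens either to text output or to an image decoder,
    switching modes on special start/stop tokens."""
    text_out, image_tokens, images = [], [], []
    generating_image = False
    for tok in token_stream:
        if tok == START_IMAGE:
            generating_image = True       # switch to the media decoder
            image_tokens = []
        elif tok == END_IMAGE:
            generating_image = False      # switch back to text
            images.append(decode_image(image_tokens))
        elif generating_image:
            image_tokens.append(tok)      # embeddings for the decoder
        else:
            text_out.append(tok)          # ordinary text token
    return " ".join(text_out), images

text, images = run_interleaved(
    ["Here", "is", "your", "image:", START_IMAGE, "e1", "e2", END_IMAGE, "Done."]
)
```

The complication the talk points at lives exactly in those two `if` branches: the runtime has to hand control back and forth between the language model and the media decoder mid-stream.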
08:37.000 --> 08:44.000
The second complicated part is the actual implementation under the hood.

08:44.000 --> 08:58.000
So, for the audio decoder, there are multiple choices here. We can use a transformer-based one, which is a little bit heavier, or we can use a vocoder-based one, which is much lighter.

08:58.000 --> 09:14.000
And for the vision decoder, so far I only know about diffusion-based models. And we already have something, stable-diffusion.cpp, but unfortunately it's not that easy to integrate into llama.cpp.

09:14.000 --> 09:24.000
It's not like I can just copy the code over; that's not how it works, unfortunately.

09:24.000 --> 09:30.000
So, yes, image generation will be a long way to go, I think.

09:30.000 --> 09:37.000
Another thing is, I also plan to have video input.

09:37.000 --> 09:43.000
So, video is just images plus audio, like an animation plus audio.

09:43.000 --> 09:54.000
So, see, the way to think about it is that I can just extract a bunch of images from a video file.

09:54.000 --> 10:05.000
But what happens if you have a video that plays, like, one hour long? It would take a lot of memory.

10:05.000 --> 10:23.000
So I'm "inventing," quote-unquote, a streaming API, in which I process not just a single frame: I want to process some frames at a time. It's batching stuff.

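The batched streaming idea can be sketched as a generator that yields a few frames at a time, so a one-hour video never has to be fully extracted into memory at once. The batch size and the frame source are illustrative stand-ins, not the actual llama.cpp API.

```python
# Sketch of a streaming API for video frames: hand the model a small
# batch of frames at a time instead of extracting the whole video upfront.
def stream_frames(frame_source, batch_size=8):
    """Yield lists of at most `batch_size` frames from any iterator,
    keeping peak memory proportional to the batch, not the video."""
    batch = []
    for frame in frame_source:
        batch.append(frame)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # trailing partial batch

# Usage: pretend each integer is one decoded frame.
batches = list(stream_frames(range(20), batch_size=8))
```

Each yielded batch can then be pre-processed and fed to the vision encoder before the next batch is even decoded, which is the "process some frames at a time" idea from the talk.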
10:23.000 --> 10:37.000
So I'm inventing this streaming infrastructure. And another question is which library I should use to decode the video.

10:37.000 --> 10:55.000
Because video codecs, so to say, are kind of a long story.

10:55.000 --> 11:16.000
But I think nobody is going to be happy if I go through all of it here, so I dropped the link here for you to learn more about my plans for the future.

11:17.000 --> 11:25.000
Okay, so now let's talk about some more important things: how you can get involved with the project.

11:25.000 --> 11:36.000
So yes, we are looking for contributors, maintainers, developers for mtmd, and not just mtmd, but llama.cpp as a whole project.

11:36.000 --> 11:41.000
So if you want to get involved, I have three tips to give to you.

11:41.000 --> 11:48.000
So the first tip is: whenever you want to contribute something new that you find interesting to do,

11:48.000 --> 11:54.000
let's first look around the codebase, and let's try to reuse what is already there.

11:54.000 --> 12:12.000
So, for example, for most of the multimodal models, most of the vision models or even some audio models, you have a function called build_vit, ViT as in vision transformer.

12:12.000 --> 12:19.000
So I already abstracted most of the vision transformer logic into one simple function that you can use.

12:19.000 --> 12:30.000
So a lot of models are just, like, build_vit and that's all. You see, many of those models are just a vision transformer inside.

12:31.000 --> 12:48.000
And if you need to add something that is not just about models, something that you think is going to be a big thing, try to first maybe open a discussion or an issue to discuss it with us.

12:48.000 --> 12:58.000
The second tip is, at least for mtmd, and maybe, I think, for parts of the codebase of llama.cpp:

12:58.000 --> 13:02.000
you can use AI, yeah, of course, to discover things.

13:02.000 --> 13:15.000
But I don't think, at least for mtmd, the codebase is mature enough for AI to work on, because I already saw some PRs here that used AI under the hood to write the code.

13:15.000 --> 13:29.000
And in one PR in particular, they tried to re-implement matrix multiplication inside, which is not nice, to say the least.

13:29.000 --> 13:42.000
Because, yeah, we already have a lot of matrix multiplication inside llama.cpp, in GGML, and for some reason, it still felt the need to reinvent its own method of matrix multiplication.

13:42.000 --> 13:54.000
So, to come back to this point: you can, and should, use AI to go back to my first point, which is to look at the codebase and see what's already there.

13:54.000 --> 14:03.000
And you can also discuss back and forth with the AI, you know, about whether what you're going to add will require a lot of code or not.

14:03.000 --> 14:10.000
And whether it aligns with the direction of the project, and which points should be brought up for discussion.

14:10.000 --> 14:13.000
So AI is very good for that kind of thing.

14:13.000 --> 14:20.000
And last but not least, let's keep things simple, please.

14:20.000 --> 14:25.000
So by simple, I mean: the codebase is quite young.

14:25.000 --> 14:31.000
So I'm not trying to add exact support for all of these, all of the models.

14:31.000 --> 14:40.000
I know some of the models have a somewhat-working implementation, not exactly the same implementation as the reference, not at the expected level.

14:40.000 --> 14:41.000
Yes, the codebase is young.

14:41.000 --> 14:44.000
So I just want us to have simple things first.

14:44.000 --> 14:57.000
So, whenever you push a change, let's try not to be too use-case specific or, yeah, too model-specific.

14:57.000 --> 15:01.000
And let's try not to break other models.

15:01.000 --> 15:04.000
Of course, that's what we don't want you to do.

15:04.000 --> 15:12.000
But yeah, by the way, when people use AI to write the code, it tends to break other models, which is not quite nice.

15:12.000 --> 15:16.000
Yeah, so I think that's all for my talk.

15:16.000 --> 15:21.000
And thank you for listening, and now it's question time for everyone.

15:21.000 --> 15:29.000
APPLAUSE

15:29.000 --> 15:32.000
We've got time for one question.

15:32.000 --> 15:35.000
OK.

15:35.000 --> 15:38.000
I don't see any hands.

15:38.000 --> 15:41.000
Oh, sorry, yeah.

15:41.000 --> 15:51.000
How long until llama.cpp will have the ability to produce multimodal output from the model, at a level where it actually works?

15:51.000 --> 15:58.000
Yeah, so I'll just repeat the question here, because we don't have a microphone over there.

15:57.000 --> 15:58.000
OK.

15:58.000 --> 16:09.000
So the question was: how long, in terms of time, until we have the first working, like, somewhat-working version of multimodal output?

16:09.000 --> 16:29.000
I don't have a timeline. I would say, maybe, I'm planning it for this: I'm making a demo for text-to-speech right now. So I think in one or two months, we're going to get it running. But for image generation, it's going to take more time. I don't know.

16:29.000 --> 16:31.000
Maybe another question, if we have time.

16:31.000 --> 16:33.000
Yes, sorry.

16:33.000 --> 16:41.000
I just wanted to ask you how to configure it for multiple users.

16:41.000 --> 16:50.000
Yeah, so the question is how to configure llama.cpp to serve multiple users on only one GPU or one server.

16:50.000 --> 16:52.000
It actually depends, I think. For now,

16:52.000 --> 17:03.000
the best way for you to do it is to try to tinker with the number of parallel requests, the -np, number-of-parallel-requests flag,

17:03.000 --> 17:08.000
And at the same time, try to balance it with the context size.

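The balance mentioned here comes from the fact that, roughly speaking, the parallel slots share the total context window: more concurrent users means fewer tokens of context per request. A back-of-the-envelope sketch (a simplification of the real llama-server behaviour):

```python
# Rough sketch: with n_parallel slots, each slot gets a share of the
# total context window (a simplification of llama-server's behaviour).
def per_slot_context(total_ctx, n_parallel):
    """Approximate context tokens available to each concurrent request."""
    if n_parallel < 1:
        raise ValueError("need at least one slot")
    return total_ctx // n_parallel

# E.g. a 32768-token context shared by 4 parallel requests:
ctx_per_request = per_slot_context(32768, 4)
```

So doubling the number of parallel users halves the usable context per request, which is exactly the trade-off the answer asks you to balance.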
17:08.000 --> 17:12.000
So that's about all I can tell you.

17:12.000 --> 17:16.000
Do we have more time for questions?

17:16.000 --> 17:17.000
Yeah.

17:17.000 --> 17:19.000
Oh, sorry.

17:19.000 --> 17:20.000
Yeah.

17:30.000 --> 17:39.000
So, FFmpeg, it's very famous. The FFmpeg library, why did I choose it? Just because it's so famous.

17:39.000 --> 17:40.000
It's nice.

17:40.000 --> 17:41.000
It's top of the list.

17:41.000 --> 17:43.000
Of course, there are some other considerations.

17:43.000 --> 17:44.000
Yeah.

17:44.000 --> 17:47.000
Did you consider making it an optional dependency?

17:47.000 --> 17:48.000
Sorry.

17:48.000 --> 17:52.000
Yeah, optional.

17:52.000 --> 18:02.000
So I haven't considered it yet. I don't have a clear plan on how to do it, but I'll try to make it optional.

18:02.000 --> 18:16.000
No more questions then. And just, people, because all of this gets recorded: please do not stand up and walk away until the speaker is done, otherwise the video will look horrible.

18:16.000 --> 18:21.000
So nobody stands up and leaves until the speaker is done.

18:21.000 --> 18:28.000
There will be, like, a three-minute, five-minute break in between the speakers. That's when leaving is allowed, not while there are still questions.

18:28.000 --> 18:29.000
Yeah.

18:29.000 --> 18:33.000
Do I have time for one more question here?

18:33.000 --> 18:34.000
Yeah.

18:34.000 --> 18:38.000
So, the guy here. Yes.

18:38.000 --> 18:50.000
So my understanding is that right now, audio input support in llama.cpp is already good enough.

18:50.000 --> 18:57.000
So, the understanding is that right now, audio input support in llama.cpp is working and good enough.

18:57.000 --> 19:05.000
What would be the most powerful model right now to get a wow effect out of trying audio input support?

19:05.000 --> 19:06.000
Yeah.

19:06.000 --> 19:14.000
So we have support from Mistral. That's the Mistral model, Voxtral, that has audio input.

19:14.000 --> 19:16.000
This model is quite big, I think.

19:16.000 --> 19:22.000
There's another model, Qwen, the Qwen2-Audio one, which is also very good.

19:22.000 --> 19:24.000
So you can try it.

19:24.000 --> 19:31.000
And by the way, there's Qwen2.5-Omni, which allows you to input both images and audio at the same time.

19:31.000 --> 19:32.000
Yeah.

19:32.000 --> 19:33.000
Thank you.

