WEBVTT

00:00.000 --> 00:13.000
All right, we're going to start off again with the LUMI supercomputer and meshing.

00:13.000 --> 00:15.000
Thank you.

00:15.000 --> 00:18.000
Good morning, everyone.

00:18.000 --> 00:19.000
My name is Boris.

00:19.000 --> 00:22.000
I want to talk to you about my adventures,

00:22.000 --> 00:25.000
scaling a finite element code on the LUMI supercomputer,

00:25.000 --> 00:32.000
and how we had to improve our meshing code to work on it.

00:32.000 --> 00:36.000
So first things first, I will describe briefly what Gmsh is,

00:36.000 --> 00:39.000
the mesh generator we use in our team.

00:39.000 --> 00:44.000
And then I will describe the kind of simulations we want to run.

00:44.000 --> 00:46.000
And I will focus on two aspects.

00:46.000 --> 00:50.000
The overlap of mesh subdomains.

00:51.000 --> 00:58.000
and what makes it challenging to have a scaling code on massively parallel machines.

00:58.000 --> 01:00.000
So what is Gmsh?

01:00.000 --> 01:04.000
Gmsh is a free and open-source mesh generator,

01:04.000 --> 01:09.000
which can build 2D and 3D meshes suited for finite elements.

01:09.000 --> 01:11.000
And it has a scripting language,

01:11.000 --> 01:13.000
bindings for many languages:

01:13.000 --> 01:16.000
C++, Python, Fortran, Julia.

01:16.000 --> 01:18.000
It also has a GUI.

01:18.000 --> 01:22.000
You can use it on the command line, so it's very versatile.
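
NOTE
To make this concrete, here is a minimal sketch using the Gmsh Python
API (a standard usage pattern, not code shown in the talk): build a
box, mesh it in 3D, and write the mesh to disk.
  import gmsh
  gmsh.initialize()
  gmsh.model.add("demo")
  gmsh.model.occ.addBox(0, 0, 0, 1, 1, 1)         # unit box via the OpenCASCADE kernel
  gmsh.model.occ.synchronize()
  gmsh.option.setNumber("Mesh.MeshSizeMax", 0.1)  # target element size
  gmsh.model.mesh.generate(3)                     # generate a 3D (tetrahedral) mesh
  gmsh.write("demo.msh")
  gmsh.finalize()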

01:22.000 --> 01:26.000
And it can generate this kind of mesh.

01:26.000 --> 01:29.000
So millions of elements,

01:29.000 --> 01:33.000
it is multithreaded, it can handle complex geometries,

01:33.000 --> 01:37.000
it's a very fun tool to use.

01:37.000 --> 01:42.000
And so this is the mesher I use for my own finite element simulations.

01:43.000 --> 01:48.000
My main field of interest is time harmonic wave simulations.

01:48.000 --> 01:52.000
So, wave simulations at one given frequency.

01:52.000 --> 01:57.000
It has many applications, such as radar design and imaging.

01:57.000 --> 02:03.000
It can be acoustic waves, electromagnetic waves, so it's a broad family.

02:03.000 --> 02:09.000
But each time, I use finite elements to solve this kind of problem.

02:10.000 --> 02:13.000
And when you do finite elements for time harmonic waves,

02:13.000 --> 02:19.000
you end up with a very large complex linear system with a sparse matrix.

02:19.000 --> 02:24.000
Typically, you can go up to hundreds of millions of unknowns.

02:24.000 --> 02:28.000
At this scale, sparse direct solvers are too expensive to use.

02:28.000 --> 02:34.000
And classical iterative methods, such as multigrid, don't work well.

02:34.000 --> 02:38.000
So we want to use an iterative solver, but which one?

02:38.000 --> 02:43.000
And so this is the reason we mostly use domain decomposition methods.

02:43.000 --> 02:47.000
Once again, it's a very broad family of numerical methods,

02:47.000 --> 02:50.000
which are naturally suited for parallel computing,

02:50.000 --> 02:55.000
because you split your very large problems into many subproblems.

02:55.000 --> 02:57.000
You solve them in parallel.

02:57.000 --> 03:01.000
And you use that as a preconditioner in some kind of iterative procedure

03:01.000 --> 03:04.000
to recover the entire solution.

03:04.000 --> 03:07.000
And so it is an iterative method overall,

03:07.000 --> 03:11.000
but you need to solve many subproblems in parallel many times.
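NOTE
As a toy illustration of the idea (a one-level additive Schwarz
preconditioner on a 1D Laplacian; the talk's actual solver for waves
is more sophisticated): each application solves small overlapping
local problems, which could run in parallel, and sums the results.
  import numpy as np
  n = 100
  A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # model sparse matrix
  subdomains = [np.arange(0, 55), np.arange(45, 100)]     # overlapping index sets
  def apply_preconditioner(r):
      z = np.zeros_like(r)
      for idx in subdomains:
          Ai = A[np.ix_(idx, idx)]               # local subproblem
          z[idx] += np.linalg.solve(Ai, r[idx])  # local solves are independent
      return z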

03:11.000 --> 03:14.000
And when you do that, you can solve very large problems,

03:14.000 --> 03:18.000
such as this plane engine intake,

03:18.000 --> 03:22.000
which requires a billion unknowns at least,

03:22.000 --> 03:24.000
because it's a high frequency wave.

03:24.000 --> 03:27.000
So you have many, many unknowns,

03:27.000 --> 03:30.000
but we can make it work.

03:31.000 --> 03:35.000
This particular test case was run on the LUMI supercomputer,

03:35.000 --> 03:39.000
which is a huge HPC cluster in Finland,

03:39.000 --> 03:43.000
whose main advantage is having enough resources

03:43.000 --> 03:45.000
for this kind of problem.

03:45.000 --> 03:50.000
So for instance, back to a plane, this time for the exhaust.

03:50.000 --> 03:53.000
It ran on LUMI with 500 nodes,

03:53.000 --> 03:57.000
which is the most you can ask for on the cluster.

03:57.000 --> 04:01.000
And we could solve this problem in 15 minutes,

04:01.000 --> 04:06.000
solving a problem with 1.3 billion unknowns.

04:06.000 --> 04:12.000
In this case, it was 4,000 MPI processes with 16 threads each.

04:15.000 --> 04:20.000
So that was the state of the art in my team before I came in.

04:20.000 --> 04:24.000
So: a high-order mesh, split with METIS,

04:24.000 --> 04:27.000
which generates a non-overlapping partition,

04:27.000 --> 04:31.000
and we ran on over 4,000 subdomains.

04:31.000 --> 04:34.000
And so there were two limitations.

04:34.000 --> 04:38.000
The first is that going above this, to 16,000 subdomains,

04:38.000 --> 04:41.000
was not very efficient.

04:41.000 --> 04:44.000
And also, it was a non-overlapping partition,

04:44.000 --> 04:47.000
but sometimes you prefer to have overlapping subdomains,

04:47.000 --> 04:50.000
because some solvers rely on it,

04:50.000 --> 04:53.000
and we wanted to know whether those were good or bad solvers.

04:54.000 --> 04:59.000
And so these are the two things I want to talk about today.

04:59.000 --> 05:05.000
So for the overlaps, I think a picture is worth a thousand words.

05:05.000 --> 05:09.000
So you open your favorite mesh with Gmsh.

05:09.000 --> 05:14.000
You want to split it, say, into four subdomains.

05:14.000 --> 05:18.000
So, as in the existing code, you call METIS,

05:18.000 --> 05:21.000
which provides a partition, and you use it.

05:21.000 --> 05:23.000
We build interface elements.

05:23.000 --> 05:26.000
So you can do the coupling.
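
NOTE
With the released Gmsh API, this non-overlapping part of the pipeline
can be scripted roughly as below (these are standard Gmsh options; the
overlap step described next is the new feature).
  import gmsh
  gmsh.initialize()
  gmsh.open("mesh.msh")  # your favorite mesh
  gmsh.option.setNumber("Mesh.PartitionCreateTopology", 1)  # interface entities
  gmsh.option.setNumber("Mesh.PartitionSplitMeshFiles", 1)  # one file per subdomain
  gmsh.model.mesh.partition(4)   # METIS under the hood
  gmsh.write("partitioned.msh")
  gmsh.finalize()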

05:26.000 --> 05:29.000
Now the new thing is that we can create overlaps.

05:29.000 --> 05:32.000
So we take a subset of the neighboring subdomains,

05:32.000 --> 05:35.000
and we append them to obtain an extended subdomain.

05:35.000 --> 05:40.000
And we use this extended subdomain as the computational component

05:40.000 --> 05:43.000
in the preconditioner.

05:43.000 --> 05:45.000
And so this is a new feature we implemented,

05:45.000 --> 05:50.000
which was quite an adventure.

05:50.000 --> 05:52.000
Once it is done, the API is pretty simple,

05:52.000 --> 05:56.000
and you can query the overlaps, as well as the boundaries.

05:56.000 --> 06:00.000
So the implementation is that we reuse existing elements

06:00.000 --> 06:02.000
for the overlaps.

06:02.000 --> 06:05.000
But we need to create new elements for the boundaries.

06:05.000 --> 06:08.000
In particular, we have two kinds of boundaries.

06:08.000 --> 06:14.000
We have an outer boundary, and what we call an inner boundary,

06:14.000 --> 06:17.000
which may not be a completely appropriate name.

06:17.000 --> 06:23.000
So the green part here is the extension of the already existing boundary.

06:23.000 --> 06:30.000
And the red part is the artificial boundary of the overlaps.

06:30.000 --> 06:34.000
But it actually is a curve inside your computational domain.

06:34.000 --> 06:37.000
And so it's critical to know the difference,

06:37.000 --> 06:41.000
because you want to apply different boundary conditions on these two families of curves,

06:41.000 --> 06:45.000
because it impacts the correctness and convergence of your method.
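
NOTE
A hypothetical sketch of that distinction (plain Python, not the Gmsh
API): a boundary face of the extended subdomain is "outer" if it lies
on the real domain boundary, and "inner" (artificial) otherwise.
  def classify_overlap_boundary(extended_boundary_faces, domain_boundary_faces):
      outer, inner = [], []
      for face in extended_boundary_faces:
          # faces are assumed hashable, e.g. sorted tuples of node tags
          (outer if face in domain_boundary_faces else inner).append(face)
      return outer, inner
  # The physical condition goes on "outer"; an absorbing/transmission-type
  # condition typically goes on "inner". Mixing them up breaks correctness.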

06:45.000 --> 06:49.000
And so we added these features into Gmsh.

06:49.000 --> 06:52.000
And once it's done, we can do some numerical experiments

06:52.000 --> 06:56.000
to perform large-scale simulations on the machine.

06:56.000 --> 07:00.000
This will be my test case until the end of the talk.

07:00.000 --> 07:05.000
It's a geophysics test case for acoustic simulations.

07:05.000 --> 07:09.000
So it's a heterogeneous medium with varying wave velocity:

07:09.000 --> 07:12.000
blue is water, where the waves are slow,

07:12.000 --> 07:16.000
and yellow and red are rocks or soil,

07:16.000 --> 07:18.000
where waves travel much faster.

07:18.000 --> 07:20.000
And so it's an interesting problem,

07:20.000 --> 07:24.000
because you have varying wavelengths, reflections, refractions,

07:24.000 --> 07:27.000
and interesting physics happening.

07:27.000 --> 07:32.000
And I want to simulate waves at the highest frequency possible.

07:32.000 --> 07:36.000
And the higher the frequency, the more expensive the problem:

07:36.000 --> 07:39.000
higher frequency means shorter wavelengths,

07:39.000 --> 07:40.000
which means a finer mesh.

07:40.000 --> 07:43.000
And so more degrees of freedom.

07:43.000 --> 07:48.000
And so I go from 1.6 to 6.25 hertz,

07:48.000 --> 07:54.000
which for our mesh translates to

07:54.000 --> 07:57.000
11 to 600 million unknowns.

07:57.000 --> 08:01.000
And we do it in a weak scaling way.

08:01.000 --> 08:05.000
So we want to keep the problem size per process constant,

08:05.000 --> 08:08.000
and we add more resources as we increase the frequency.

08:08.000 --> 08:15.000
And so we need one process for every 85,000 unknowns.

08:15.000 --> 08:19.000
And so, using the resources from LUMI,

08:19.000 --> 08:22.000
it went from 2 to 128 nodes,

08:22.000 --> 08:25.000
each node having 128 CPUs.

08:25.000 --> 08:27.000
I mean, threads.
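
NOTE
A back-of-the-envelope check of those figures (the exact process and
thread split per run is not spelled out in the talk, so this is only a
rough sizing helper):
  unknowns = 600e6          # largest weak-scaling problem
  per_process = 85e3        # one MPI process per ~85,000 unknowns
  processes = unknowns / per_process      # ~7,000 processes
  cores_per_node = 128                    # LUMI CPU nodes
  nodes = processes * 2 / cores_per_node  # e.g. with 2 threads per process
  print(round(processes), round(nodes))   # ~7059 processes, ~110 nodes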

08:27.000 --> 08:32.000
And so here is a typical example of the kind of mesh we have.

08:32.000 --> 08:35.000
The overlap is not shown, but it was computed.

08:35.000 --> 08:38.000
So you can see each color is a partition,

08:38.000 --> 08:40.000
so a different subdomain.

08:40.000 --> 08:43.000
They are designed to have the same size in terms of element count,

08:43.000 --> 08:46.000
for load balancing, but they actually represent different sizes

08:46.000 --> 08:48.000
in terms of geometric space,

08:48.000 --> 08:50.000
because we have smaller elements in some areas

08:50.000 --> 08:52.000
and larger elements in others,

08:52.000 --> 08:55.000
to match the local wavelength.
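
NOTE
The grading rule behind this is the standard one for wave problems:
the local wavelength is lambda = c / f, and one targets a fixed number
of points per wavelength, so slower media get smaller elements. A
sketch (n_ppw is an assumed typical value, not a number from the talk):
  def target_element_size(c_local, frequency, n_ppw=8):
      # lambda = c / f; resolve it with n_ppw points per wavelength
      return c_local / (frequency * n_ppw)
  # Example: water (c ~ 1500 m/s) at 6.25 Hz needs ~30 m elements,
  # while fast rock (c ~ 6000 m/s) can use elements four times larger.
  print(target_element_size(1500.0, 6.25), target_element_size(6000.0, 6.25))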

08:57.000 --> 09:00.000
In the original file format we used,

09:00.000 --> 09:02.000
we have one file per subdomain.

09:02.000 --> 09:06.000
Each file contains the necessary elements and nodes,

09:06.000 --> 09:08.000
so its part of the mesh.

09:08.000 --> 09:11.000
But we chose to duplicate the mesh topology.

09:11.000 --> 09:14.000
So each file contains information about which subdomain

09:14.000 --> 09:16.000
is attached to which subdomain,

09:16.000 --> 09:18.000
what the interfaces between the subdomains are,

09:18.000 --> 09:19.000
and that kind of thing.

09:19.000 --> 09:21.000
Because it was simpler.

09:21.000 --> 09:24.000
But at some point it turned out to be an issue.

09:24.000 --> 09:26.000
I'll come back to it later.

09:26.000 --> 09:31.000
But now that we have overlapping subdomains,

09:31.000 --> 09:34.000
you also need to export elements from the overlaps.

09:34.000 --> 09:36.000
So the files get a bit bigger.

09:36.000 --> 09:40.000
You also need more bookkeeping information

09:40.000 --> 09:43.000
to know which subsets to use and that kind of thing.

09:43.000 --> 09:46.000
You also need to store boundaries of overlaps.

09:46.000 --> 09:49.000
So the whole thing is a bit more expensive.

09:49.000 --> 09:53.000
And so we started with a fairly naive implementation:

09:53.000 --> 09:55.000
exporting all the elements from our subdomain,

09:55.000 --> 09:58.000
and all the elements from the neighboring subdomains.

09:58.000 --> 10:01.000
Which is a bit too much because we don't need all elements

10:01.000 --> 10:03.000
from the neighboring subdomain.

10:03.000 --> 10:06.000
But it's simpler and it's just a mesh.

10:06.000 --> 10:09.000
So we don't expect it to be too expensive.

10:09.000 --> 10:12.000
So there is some redundancy.

10:12.000 --> 10:16.000
And it worked on small problems.

10:16.000 --> 10:19.000
We managed to run up to 40 million unknowns.

10:19.000 --> 10:23.000
And then going above that, it got complicated,

10:23.000 --> 10:26.000
because we have too much duplication.

10:26.000 --> 10:29.000
And sometimes you cannot load all the meshes,

10:29.000 --> 10:32.000
because you have way too much data.

10:32.000 --> 10:34.000
And especially as you increase the problem size,

10:34.000 --> 10:36.000
you have a more refined mesh,

10:36.000 --> 10:39.000
which translates to more overlap between subdomains,

10:39.000 --> 10:41.000
so more elements to export.

10:41.000 --> 10:46.000
So we need to do something a bit more efficient.

10:47.000 --> 10:50.000
And so we decided to export only the necessary elements

10:50.000 --> 10:52.000
and the necessary nodes.

10:52.000 --> 10:54.000
Which takes some time, because you need to loop

10:54.000 --> 10:58.000
over the elements to see if each is needed or not, and keep a set.

10:58.000 --> 11:03.000
But it makes for smaller files, lower memory consumption, and less waste.

11:03.000 --> 11:08.000
So we do that for the elements and for the nodes.
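
NOTE
A minimal sketch of that filtering (plain Python, not the actual Gmsh
code): keep only the needed elements, then export only the nodes those
elements reference.
  def filter_export(elements, is_needed):
      # elements: dict of element tag -> tuple of node tags
      kept = {tag: nodes for tag, nodes in elements.items() if is_needed(tag)}
      needed_nodes = set()
      for nodes in kept.values():
          needed_nodes.update(nodes)  # only nodes actually referenced
      return kept, needed_nodes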

11:08.000 --> 11:09.000
And it helps.

11:09.000 --> 11:13.000
We could go a bit higher and solve 80-million-unknown problems.

11:13.000 --> 11:16.000
But going above, it was still crashing,

11:16.000 --> 11:19.000
because we ran out of memory.

11:19.000 --> 11:22.000
And so, as someone who loves to waste energy,

11:22.000 --> 11:26.000
I just re-ran the job with more memory to see if it worked.

11:26.000 --> 11:28.000
And it didn't.

11:28.000 --> 11:32.000
And I was a bit confused, because we export only what we need:

11:32.000 --> 11:36.000
a subset of elements, a subset of nodes.

11:36.000 --> 11:40.000
And it turned out to be the mesh topology I mentioned:

11:40.000 --> 11:44.000
we keep data on which subdomain is connected to

11:44.000 --> 11:48.000
which subdomain, and that kind of thing,

11:48.000 --> 11:51.000
which is duplicated across every file.

11:51.000 --> 11:55.000
When you have a few thousand partitions, it's not negligible anymore.

11:55.000 --> 11:58.000
And that actually makes up most of the memory you consume.

11:58.000 --> 12:03.000
And so what we had ignored as a minor overhead

12:03.000 --> 12:07.000
was actually wasting more than half of our memory,

12:08.000 --> 12:11.000
just to store this information.

12:11.000 --> 12:15.000
And so we felt a bit dumb when we noticed.

12:15.000 --> 12:19.000
And it was a bit annoying to fix because many parts of the code

12:19.000 --> 12:21.000
relied on this assumption.

12:21.000 --> 12:25.000
So it took almost a month to fix.

12:25.000 --> 12:29.000
But once we did it and everything worked well,

12:29.000 --> 12:33.000
we could go much higher and solve problems with

12:33.000 --> 12:39.000
200, 300, up to 600 million unknowns, and go up to 16,000 subdomains.

12:39.000 --> 12:42.000
Which is four times what we could do before.

12:42.000 --> 12:44.000
And that's with overlap.

12:44.000 --> 12:47.000
And we could even go higher, in theory.

12:47.000 --> 12:52.000
And now the mesh files are much, much smaller.

12:52.000 --> 12:58.000
And so we reached the efficiency we were hoping for.

12:58.000 --> 13:06.000
And using this, we could publish a paper comparing overlapping and non-overlapping numerical methods.

13:06.000 --> 13:12.000
And it turned out the non-overlapping one is a bit faster and requires less memory.

13:12.000 --> 13:14.000
So all this for that.

13:14.000 --> 13:18.000
But it was worth investigating.

13:18.000 --> 13:25.000
Here is a very small subset of our paper, where we do some weak scaling studies.

13:25.000 --> 13:28.000
It's a bit complex to read if you're not familiar with the problem.

13:28.000 --> 13:33.000
Because we expect that even in the case of perfect weak scaling,

13:33.000 --> 13:35.000
the timing increases with the frequency.

13:35.000 --> 13:38.000
Because the cost per iteration is stable.

13:38.000 --> 13:41.000
But the number of iterations still grows.

13:41.000 --> 13:45.000
So we expect a linear relationship between time and frequency,

13:45.000 --> 13:51.000
if we add resources as the frequency increases.

13:52.000 --> 13:57.000
And we could show that: the results on the left are the overlapping method,

13:57.000 --> 14:00.000
and the results on the right the non-overlapping method.

14:00.000 --> 14:07.000
And the non-overlapping method is faster.

14:07.000 --> 14:12.000
So once we published this paper, we could polish the implementation a bit.

14:12.000 --> 14:16.000
So we recently made the overlaps, boundaries included,

14:16.000 --> 14:19.000
work with high-order elements.

14:19.000 --> 14:22.000
Which changed the way we build the overlaps.

14:22.000 --> 14:27.000
Because it was a bit naive the first time.

14:27.000 --> 14:31.000
Instead of having a subdomain looking for elements in its neighbors,

14:31.000 --> 14:36.000
each subdomain looks at its boundary and tells the other subdomain:

14:36.000 --> 14:38.000
this should be your overlap.

14:38.000 --> 14:41.000
And it's much simpler and much faster.
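
NOTE
A sketch of the reworked construction (an illustration of the idea,
not the actual implementation): each subdomain walks its own interface
and pushes the adjacent layer of elements to its neighbor, as that
neighbor's overlap, instead of every subdomain searching its neighbors.
  def build_overlaps(subdomains):
      # subdomains: dict of id -> object exposing .neighbors and
      # .boundary_layer(j), the elements adjacent to the interface with j
      overlaps = {i: [] for i in subdomains}
      for i, sub in subdomains.items():
          for j in sub.neighbors:
              overlaps[j].extend(sub.boundary_layer(j))  # "this is your overlap"
      return overlaps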

14:41.000 --> 14:43.000
And we finally merged,

14:43.000 --> 14:46.000
last week, all these changes into the main branch of Gmsh.

14:46.000 --> 14:50.000
It's not part of the current release, but it will be in the next one.

14:50.000 --> 14:55.000
And so if you're interested, you can already try it out.

14:55.000 --> 15:06.000
So that was the short story of a big part of my life last year.

15:06.000 --> 15:08.000
It was a long adventure.

15:09.000 --> 15:15.000
So it's a good lesson on optimizing and scaling things.

15:15.000 --> 15:18.000
Because at every step, a new issue arrived.

15:18.000 --> 15:21.000
And it's never the one you expect.

15:21.000 --> 15:25.000
So it was a lesson in always profiling and questioning your assumptions.

15:25.000 --> 15:29.000
Because sometimes you think you know what is consuming too much memory,

15:29.000 --> 15:35.000
and it's actually something completely different that you didn't think of.

15:35.000 --> 15:43.000
But it was very interesting to always improve, iterate, and eliminate waste at each step.

15:43.000 --> 15:50.000
And going further forces you to find out what the main issue is.

15:50.000 --> 15:55.000
And it's a very fascinating thing to live through.

15:55.000 --> 16:01.000
The last thing we haven't fixed yet is that we can do fast simulations once the mesh is loaded.

16:01.000 --> 16:06.000
But the time we take to export the mesh properly is a bit long.

16:06.000 --> 16:11.000
Because we do extra checks to ensure we only export what we need and that everything is correct.

16:11.000 --> 16:14.000
So it could be optimized a bit further.

16:14.000 --> 16:19.000
But we came a long way since last year.

16:19.000 --> 16:22.000
That's about it for me.

16:22.000 --> 16:29.000
Just a small advertisement: we are organizing the first Gmsh user meeting at the end of the summer.

16:29.000 --> 16:36.000
So if you happen to be a Gmsh user, feel free to come by and discuss this kind of thing.

16:36.000 --> 16:39.000
And I'm open to questions if you have some.

16:59.000 --> 17:11.000
So the question is why I didn't use adaptive mesh refinement, right?

17:11.000 --> 17:13.000
Actually we did.

17:13.000 --> 17:15.000
No, it's not adaptive.

17:15.000 --> 17:19.000
It's, um, how to say it...

17:19.000 --> 17:21.000
We create a mesh in advance.

17:21.000 --> 17:28.000
But we know where it has to be fine and where it has to be coarse, because it comes from the local wave speed, which is known.

17:28.000 --> 17:39.000
So once you have partitioned your mesh, changing the mesh density would be a bit of a mess, and I'm not sure we plan for it.

17:39.000 --> 17:45.000
But in this case we already know the ideal refinement level.

17:45.000 --> 17:47.000
Okay.

17:47.000 --> 17:56.000
So the question is: does Gmsh support GPUs?

17:56.000 --> 18:00.000
Um, great question.

18:00.000 --> 18:02.000
Work in progress.

18:02.000 --> 18:08.000
So the meshing process itself still runs on CPUs, because it's not very GPU-friendly.

18:08.000 --> 18:12.000
But one of my colleagues is performing the same kind of simulation I'm doing.

18:12.000 --> 18:14.000
But on GPUs.

18:14.000 --> 18:18.000
So it's not very natural, because it's sparse linear algebra everywhere,

18:18.000 --> 18:21.000
and that's not naturally suited for GPUs.

18:21.000 --> 18:27.000
But essentially you do the same thing I do with subdomains, but with much smaller subdomains.

18:27.000 --> 18:31.000
And instead of having one subdomain per thread,

18:31.000 --> 18:35.000
you have maybe 100 or 1,000 subdomains per GPU.

18:35.000 --> 18:39.000
And you can use 10 or 100 GPUs if you want.

18:39.000 --> 18:43.000
Which is one more reason to partition efficiently.

18:43.000 --> 18:49.000
And in the future we will need to have maybe 100 subdomains per file instead of one file per subdomain,

18:49.000 --> 18:57.000
because HPC machines don't like it when we have 16,000 files as input.

18:57.000 --> 19:01.000
So far, nobody has murdered me yet.

19:01.000 --> 19:13.000
So the question is which GPUs we used.

19:13.000 --> 19:17.000
Um, which technology?

19:17.000 --> 19:21.000
I would have to ask my colleague for details.

19:21.000 --> 19:27.000
So we have access to a Belgian cluster, which has NVIDIA GPUs.

19:27.000 --> 19:31.000
And we have access to LUMI, which uses AMD.

19:31.000 --> 19:35.000
We mostly prototype on the smaller one, which is NVIDIA.

19:35.000 --> 19:37.000
But it's still work in progress.

19:37.000 --> 19:41.000
We're trying to be compatible with both.

19:41.000 --> 19:45.000
But there are still some compatibility issues with some libraries.

19:45.000 --> 19:49.000
And we're comparing different backends for the linear algebra part.

19:49.000 --> 19:55.000
Because we have some dense backends and sparse backends.

19:55.000 --> 19:59.000
And it's very exploratory.

20:05.000 --> 20:09.000
So the question is, did I consider out-of-core computations?

20:09.000 --> 20:13.000
Do you mean for the simulation part of the...

20:13.000 --> 20:23.000
Yeah, so for the meshing part, it's not really an issue for now.

20:23.000 --> 20:27.000
The issue is the time to export all the files.

20:27.000 --> 20:29.000
But we can do it.

20:29.000 --> 20:31.000
It does need a lot of memory.

20:31.000 --> 20:35.000
Sometimes you use what we call a fat node on the cluster.

20:35.000 --> 20:39.000
So, just more RAM and fewer CPUs, because we just need a lot of RAM.

20:39.000 --> 20:43.000
So maybe it can go up to a terabyte of RAM.

20:43.000 --> 20:47.000
And for the computation part...

20:55.000 --> 20:57.000
The solvers we use aim to avoid that.

20:57.000 --> 21:03.000
So you can use sparse direct solvers, such as MUMPS, which has an out-of-core capability.

21:03.000 --> 21:05.000
But while it works, it's slower.

21:05.000 --> 21:11.000
And even if you have enough memory, it's actually like five times slower than what we do.

21:11.000 --> 21:13.000
And so we save both memory and time,

21:13.000 --> 21:17.000
and we don't have to mess with the disk.

21:17.000 --> 21:21.000
But it depends on the application.

21:21.000 --> 21:23.000
Because in the cases we're investigating,

21:23.000 --> 21:25.000
we solve for multiple right-hand sides.

21:25.000 --> 21:28.000
I'm focused on cases with 50 or 100 right-hand sides.

21:28.000 --> 21:31.000
But if you have thousands of right-hand sides,

21:31.000 --> 21:35.000
maybe it's worth it to have a factorization of your matrix.

21:43.000 --> 21:44.000
Yes.

21:44.000 --> 21:47.000
So the question is what the Gmsh format is like.

21:47.000 --> 21:53.000
So it's a custom format, but it has binary and text versions.

21:53.000 --> 21:55.000
Binary is just a bit faster to read.

21:55.000 --> 21:58.000
But our implementation supports both.

21:58.000 --> 22:02.000
And when you open the file, it detects which one was used.
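
NOTE
For instance, with the Python API the binary flavor is selected
through a standard option before writing; readers then auto-detect
which flavor a file uses.
  import gmsh
  gmsh.initialize()
  gmsh.open("mesh.msh")
  gmsh.option.setNumber("Mesh.Binary", 1)  # 1 = binary .msh, 0 = ASCII
  gmsh.write("mesh_binary.msh")
  gmsh.finalize()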

22:29.000 --> 22:34.000
Can you repeat the last part?

22:46.000 --> 22:48.000
Okay, so the question was,

22:48.000 --> 22:51.000
whether we use GPU partitioners,

22:51.000 --> 22:55.000
and... come again?

22:59.000 --> 23:01.000
Okay, yes.

23:01.000 --> 23:04.000
So the question is about alternatives to METIS.

23:04.000 --> 23:09.000
For now, we only support METIS.

23:09.000 --> 23:11.000
We thought about ParMETIS,

23:11.000 --> 23:13.000
but there's a licensing issue,

23:13.000 --> 23:15.000
and the meshing in Gmsh itself is still sequential.

23:15.000 --> 23:17.000
So it's not immediate.

23:17.000 --> 23:22.000
We are working on prototypes of our own partitioners,

23:22.000 --> 23:24.000
based on space-filling curves,

23:25.000 --> 23:28.000
which are supposed to be much faster,

23:28.000 --> 23:31.000
but give very rough partitions.

23:31.000 --> 23:35.000
I'm not very familiar with GPU ones,

23:35.000 --> 23:38.000
but it would be nice at some point probably.

23:38.000 --> 23:41.000
And with regard to the physics,

23:41.000 --> 23:43.000
there's no particular trick.

23:43.000 --> 23:46.000
We just give it a graph of volume elements,

23:46.000 --> 23:49.000
so it just knows the connections between tetrahedra,

23:49.000 --> 23:52.000
and we just ask for partitions with a similar number of tetrahedra

23:53.000 --> 23:55.000
in each subdomain.
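
NOTE
A sketch of that element graph (plain Python over tetrahedra given as
4-tuples of node tags; this is the standard dual graph fed to a
partitioner, not Gmsh's internal code): two tets are connected when
they share a triangular face.
  from collections import defaultdict
  from itertools import combinations
  def dual_graph_edges(tets):
      face_owners = defaultdict(list)
      for t, nodes in enumerate(tets):
          for face in combinations(sorted(nodes), 3):  # the 4 faces of a tet
              face_owners[face].append(t)
      # an interior face shared by exactly two tets is a graph edge
      return {tuple(owners) for owners in face_owners.values() if len(owners) == 2}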

24:07.000 --> 24:08.000
Last one.

24:10.000 --> 24:11.000
Hexahedra?

24:11.000 --> 24:12.000
Yeah.

24:13.000 --> 24:15.000
So the question was,

24:15.000 --> 24:19.000
is there support for hexahedra or other kinds of elements?

24:20.000 --> 24:23.000
There is support for hexahedral elements.

24:23.000 --> 24:26.000
Very recently, like last week,

24:26.000 --> 24:31.000
we started working on prototypes for polyhedral elements.

24:31.000 --> 24:34.000
But it is very much a draft for now.

24:34.000 --> 24:36.000
But hexahedra, yes.

24:37.000 --> 24:40.000
At least they exist in the format.

24:41.000 --> 24:45.000
I'm not sure the algorithms for generating them are as advanced,

24:45.000 --> 24:47.000
but they exist.

24:49.000 --> 24:51.000
Alright, let's go.

