WEBVTT

00:00.000 --> 00:18.240
All right, next we're going to have Luca talking to us about reproducible

00:18.240 --> 00:20.560
XFS file systems.

00:20.560 --> 00:22.840
Hi, everyone.

00:22.840 --> 00:23.840
I'm Luca.

00:23.840 --> 00:29.000
Today we will talk about reproducible file systems with XFS.

00:29.000 --> 00:32.000
A little thing about me.

00:32.000 --> 00:34.560
I'm a software engineer at Chainguard.

00:34.560 --> 00:36.760
And an open source software enthusiast.

00:36.760 --> 00:42.720
You can find me everywhere as 89luca89, and that's my mail.

00:42.720 --> 00:46.920
Why are we interested in reproducible file systems?

00:46.920 --> 00:55.160
We build stuff and we want that stuff to be verifiable by the end user.

00:55.160 --> 00:59.160
The user has access to the source code, so they should be able to reproduce our own

00:59.160 --> 01:06.000
artefacts and be sure that we are not messing stuff up.

01:06.000 --> 01:13.280
It's also useful for us for debugging, for auditability and so on, so it's very important.

01:13.280 --> 01:22.720
Working at Chainguard on our VMs product, we wanted to be transparent with our users

01:22.720 --> 01:26.880
and have reproducible images for them.

01:26.880 --> 01:35.480
How does one usually approach populating a file system? You mkfs the file system,

01:35.480 --> 01:40.920
you mount it, you copy stuff inside, and you unmount it, but this is actually very much

01:40.920 --> 01:42.640
not reproducible.

01:42.640 --> 01:51.200
We have some caveats: we need root privileges to do mounts, and we need a kernel for

01:51.200 --> 01:58.960
that, which is not, I mean, it's very important.

01:58.960 --> 02:03.680
Timestamps obviously vary: you write at different times, you get different timestamps, and

02:03.680 --> 02:10.440
we have different random seeds that lead to different inode generation numbers, and so many

02:10.440 --> 02:12.840
moving parts.

02:12.840 --> 02:17.600
My first attempt was using libfaketime. If someone doesn't know what it is, it's

02:17.680 --> 02:27.320
a library where you basically use an LD_PRELOAD hack, let's say, and it will hijack most

02:27.320 --> 02:36.160
of the random functions, like gettimeofday, getrandom, get from 32 and stuff like

02:36.160 --> 02:37.160
that.

02:37.160 --> 02:44.560
So, if we try to use libfaketime and create an empty file system with mkfs, this

02:44.560 --> 02:45.560
really works.

02:45.560 --> 02:51.560
We have an empty file system, which is reproducible, but the moment we mount it to

02:51.560 --> 02:58.960
populate it, this will break reproducibility, even if we use noatime, nodiratime, all this

02:58.960 --> 02:59.960
stuff.

02:59.960 --> 03:02.440
There are many reasons for that.

03:02.440 --> 03:08.720
First of all, LD_PRELOAD works only in user space, and mount is kernel space, so that

03:08.720 --> 03:11.720
won't work, simply.

03:11.720 --> 03:17.320
Also, the creation time is set by the kernel, not by the libc, so you cannot intercept

03:17.320 --> 03:18.320
that.

03:18.320 --> 03:21.480
Also, it's a very complex file system.

03:21.480 --> 03:29.320
It has allocation groups, which have some heuristics to spread the load, it has parallel

03:29.320 --> 03:38.880
and async I/O, which is non-deterministic, and a non-deterministic order

03:38.880 --> 03:47.400
of allocation changes the B+ tree shape, so that's also not reproducible.

03:47.400 --> 03:51.520
So I went out and checked on reproducible-builds.org.

03:51.520 --> 03:59.400
There is a system images section that shows you how some file systems can

03:59.400 --> 04:09.560
actually be reproducible, like all the ext family, SquashFS, EFS, and ISOs.

04:09.560 --> 04:14.960
The common part is that you don't mount it to populate it, you populate it while you do

04:14.960 --> 04:16.320
it.

04:16.320 --> 04:20.240
So let's do it.

04:20.320 --> 04:29.400
The plan was: first, let's work on the hardest part, which is populating the

04:29.400 --> 04:37.800
directory while we do mkfs; then we can do the deterministic parts, so fix the

04:37.800 --> 04:47.200
time and fix the other random points; and then test, because we always have to test.

04:47.200 --> 04:53.400
The first step was populating. I went out and searched, and actually XFS already

04:53.400 --> 05:01.880
had a way to populate a file system, which is called protofile. It's a very, very old specification,

05:01.880 --> 05:11.440
which basically is like a text file that represents the file system in a textual manner,

05:11.480 --> 05:18.920
and as you can see, you have a reference to the original file that we want to copy, only

05:18.920 --> 05:27.900
for regular files. All the other files, like symlinks, char devices, I don't know, block

05:27.900 --> 05:33.960
devices and stuff like that, are actually created and they do not refer to the original

05:33.960 --> 05:39.920
file. So we have limited extended attribute support, because we lose them for directories,

05:40.000 --> 05:45.480
for symlinks and stuff like that, and we don't actually have control over timestamps,

05:45.480 --> 05:51.480
because we could copy the timestamps only for regular files; all the other files are set

05:51.480 --> 05:55.560
to current time, which we can control, but we lose the information from the original

05:55.560 --> 06:02.520
source. At the beginning, I proposed to change this, but this was rejected because it's

06:02.600 --> 06:14.280
a very old and, let's say, stable specification, you know. So they decided

06:14.280 --> 06:21.920
no, no, don't touch it. So they suggested that we should actually create a populate

06:21.920 --> 06:30.920
from directory functionality, and I tried. So basically what it does is we use the protofile

06:30.920 --> 06:35.920
infrastructure that was already there, and just branch out in the case that we are pointing

06:35.920 --> 06:43.560
to a directory to populate from, instead of a file to populate from. So it will recursively

06:43.560 --> 06:49.760
traverse all the entries that are actually in that source directory, and then if it's a directory,

06:49.760 --> 06:59.360
recurse; if it's a file, create it, copying from the source. For timestamps,

06:59.920 --> 07:09.200
the modification time is what we usually care about. Access time usually is just noise, so

07:09.200 --> 07:17.240
by default, it's not copied and is set to the current time. Creation time is always set to the current

07:17.240 --> 07:27.240
time; we will control the current time afterwards. So yeah, usually in the reproducible builds

07:27.240 --> 07:34.760
world, access time is stripped anyway, so by default, it's not preserved. After the

07:34.760 --> 07:44.640
timestamps, we need extended attributes. One caveat was that, for regular files, this was pretty

07:44.640 --> 07:51.080
easy; there were already lots of functions pre-made to do that. For

07:51.080 --> 08:01.200
symlinks, sockets and FIFOs, you cannot actually use the regular fgetxattr,

08:01.200 --> 08:09.840
because we do not open them regularly; we open them with O_PATH and with O_NOFOLLOW,

08:09.840 --> 08:18.560
so we don't, you know, have loops. So we have to handle the bad file descriptor error and

08:18.560 --> 08:28.000
fall back to lgetxattr and so on, because symlinks can have extended

08:28.000 --> 08:35.960
attributes that are important too, and if we just skip them, we might not have a functional

08:35.960 --> 08:43.640
file system, actually. The other part was tracking hard links from the source directory

08:43.640 --> 08:52.080
in the destination file system. The implementation I added concentrates more on correctness

08:52.080 --> 09:05.640
than performance, so it's a linear array, a static global array, and then we just register

09:05.720 --> 09:16.160
all the various hard links and refer to that, with some logic for the growth of the array.

09:16.160 --> 09:24.080
Actually, I didn't find it a problem performance-wise, because mkfs is I/O bound

09:24.080 --> 09:33.600
and it's sequential, no parallelism, and I tested it on my Flatpak folder, which is full

09:33.680 --> 09:44.120
of hard links, and it behaved well enough. Anyway, maybe in the future we can improve

09:44.120 --> 09:52.320
that and use a hash map or something like that. We have populated the file system, now we

09:52.320 --> 09:58.960
have to add determinism to it. So we can try, again, with libfaketime, and this

09:59.040 --> 10:05.760
actually works, but we want to get away from libfaketime, because first of all, the

10:05.760 --> 10:12.160
whole LD_PRELOAD type of thing is not ideal: it doesn't always work, static linking is a

10:12.160 --> 10:23.200
thing, it doesn't follow the Reproducible Builds standard of SOURCE_DATE_EPOCH, and basically

10:23.280 --> 10:31.760
we would depend on external tooling, instead of just doing mkfs, right? So, I went ahead and

10:32.720 --> 10:40.880
implemented the SOURCE_DATE_EPOCH thing, so we get a fixed current time. If SOURCE_DATE_EPOCH

10:40.880 --> 10:48.160
is specified and it's a valid number, because it's a Unix timestamp actually, so seconds since

10:48.240 --> 10:57.840
1970, instead of gettimeofday, we just return the fixed value. This is used for creation

10:57.840 --> 11:10.960
time and change time for inodes. The other part is the getrandom part. I was advised

11:10.960 --> 11:18.400
to do a very simple thing: just check whether it's deterministic mode and return a fixed value, instead

11:18.400 --> 11:27.680
of passing the value that you want to be returned, because anyway, getrandom at mkfs time,

11:27.680 --> 11:37.360
it's only used for the inode generation number, and at mkfs time, you don't reuse inodes, so

11:37.440 --> 11:44.160
it's not a problem if every inode has the same generation number, so this doesn't break stuff.

11:46.160 --> 11:54.080
And then we have the testing. So in xfstests, I added a test where we create a file system

11:54.080 --> 12:02.240
three times in a row, and they should always have the same hash. Very simple test. The file system,

12:02.240 --> 12:11.200
sorry, the source directory is generated with fsstress, so it has all types of files, links,

12:11.200 --> 12:18.800
and so on, so it should be quite representative. We can see it in action: we can have a very

12:18.800 --> 12:26.160
basic XFS file system that is pre-populated from our rootfs directory; we can preserve

12:26.160 --> 12:33.840
access time with atime=1, maybe it can be useful for someone; we can have an

12:33.840 --> 12:42.160
actually reproducible file system, an empty file system, without using libfaketime, by using the

12:44.000 --> 12:49.520
environment variables for SOURCE_DATE_EPOCH and the deterministic seed, so this works.

12:49.520 --> 12:58.560
Be mindful, we still need to fix the UUID of the file system because it's random otherwise, but yeah,

12:58.560 --> 13:08.320
that works. And we can do a little test with mkosi, to create a rootfs for

13:08.400 --> 13:19.120
Debian, and create multiple, sorry, images from it. So this is a script that just uses

13:19.120 --> 13:29.760
mkosi to create our rootfs, creates an image file, and then runs mkfs on it to populate it. So if we

13:29.840 --> 13:37.840
build it the first time, this will go and, sorry, create the rootfs, which is the slow part,

13:37.840 --> 13:48.560
actually. Then after this, you will see the familiar output of mkfs, which

13:48.560 --> 13:57.520
is done, and it automatically also populated the file system. You can see the rootfs. We save it

13:58.480 --> 14:09.920
under another name, delete the rootfs, and regenerate it again. This will rebuild the

14:09.920 --> 14:16.080
rootfs directory, which is reproducible because mkosi supports reproducible builds.

14:18.000 --> 14:24.800
After that, it will create a new image file and run a new mkfs command, as you can see.

14:25.600 --> 14:36.320
If we run file on the images, they are both valid XFS file systems, and both of them have the same hash.

14:36.320 --> 14:58.240
This is very useful for us, we use it to build our images, so we know when an image actually

14:58.240 --> 15:04.560
changed or not, before maybe shipping them to clouds and stuff like that, because many other

15:04.560 --> 15:14.320
steps are not reproducible, like converting our raw image to, I don't know, VHD, VMDK, various file formats,

15:14.320 --> 15:22.800
cloud formats, so we know if we have to fire up our CI or not from this, for example.

15:24.240 --> 15:33.120
So it's important for distributions, maybe also for embedded systems. This can be integrated with

15:33.280 --> 15:46.560
mkosi, as you saw. It's important for security, for compliance, where you need reproducible builds, and for testing.

15:48.560 --> 15:54.800
These are the links to the patches: the first two are the mkfs patches, then the test patches,

15:54.800 --> 16:12.000
and then the documentation on reproducible-builds.org. Thank you.

16:12.000 --> 16:17.040
My question would be, do you keep some artifacts of what was the environment you used to generate

16:17.040 --> 16:23.600
the image? Do you keep metadata files or artifacts that trace the

16:23.920 --> 16:28.400
software you used and the version you used to obtain a particular file system?

16:28.400 --> 16:38.080
For the mkosi part, you mean? So this one is just for the demo. At Chainguard, we use

16:38.080 --> 16:44.080
apko to build our rootfs images, which is also the tool that we use to build our container images,

16:44.960 --> 16:51.440
and that one is also fully reproducible and we have a very historic part of how it was composed,

16:51.680 --> 17:01.360
it's rendered, so it fixes the package versions, and it's pinned to snapshots

17:01.360 --> 17:06.000
of the repositories.

17:08.000 --> 17:13.520
All right, thanks a lot. Well, we're out of time, sorry. Okay, thank you.

