WEBVTT

00:00.000 --> 00:18.960
Thank you, and good afternoon, everybody. I'm Mario, and today I'd like to talk a bit about

00:18.960 --> 00:26.960
how we're working to implement S3-fronted cold storage at CERN. Just very quickly, a bit

00:26.960 --> 00:31.760
about me: I'm a computing engineer at CERN, I started just three months ago, and I joined

00:31.760 --> 00:38.640
the tape archive team. And my first task in the team was actually to review a proof of concept

00:38.640 --> 00:45.760
that was done by a summer student last year, concerning putting a tape backend behind an S3 endpoint,

00:45.760 --> 00:50.880
and my ultimate goal there will be to design an S3 interface for our tape infrastructure.

00:51.200 --> 00:58.080
Just briefly, what we will discuss today: we'll go over the project goal and give some

00:58.080 --> 01:05.280
technical context that will shed more light on the decisions and discussions that

01:05.280 --> 01:10.880
I will talk about. Then we'll analyze the proof of concept that was developed last year,

01:10.880 --> 01:17.280
and then we'll go over a brainstorming of the possible architectural solutions that we will be exploring,

01:18.240 --> 01:24.640
and at the end, there will be some time for questions. First of all, what is CERN?

01:24.640 --> 01:29.200
It's the world's biggest laboratory for particle physics; it mostly concerns itself

01:29.200 --> 01:36.000
with the study of fundamental physics, and it's most famous for the

01:36.000 --> 01:40.960
LHC, the Large Hadron Collider, a particle accelerator that is located beneath

01:41.920 --> 01:50.400
the Geneva area, spanning the border between Switzerland and France. Why does CERN need tape in the first

01:50.400 --> 01:55.920
place? Well, because experiments need to write a lot of data and it needs to be stored somewhere,

01:55.920 --> 02:03.520
and at CERN, tapes constitute a very efficient means of storage. In fact, just

02:03.520 --> 02:10.000
last December, we reached one exabyte of data stored in our tape libraries for experiment

02:10.000 --> 02:16.080
data, but that's not the only use case. We also store user data, which could be for various

02:16.080 --> 02:22.080
reasons: compliance or disaster recovery. As for the way that we

02:22.080 --> 02:27.120
store it: we have internally developed software to manage the tape libraries, which is called

02:27.600 --> 02:36.000
the CERN Tape Archive, or CTA, and it's open source software. So the idea here is to have

02:36.000 --> 02:43.360
CTA as the tape backend to an S3-plus-Glacier API endpoint. Why? Because S3

02:43.360 --> 02:50.800
is pretty much the industry standard, there's a lot of client support for it, and also we want to

02:50.800 --> 02:57.280
fit into the ecosystem that already exists around S3, so we want to avoid reinventing the

02:57.280 --> 03:02.800
wheel by creating yet another protocol. So S3 constitutes a good way to interface with it.

03:04.160 --> 03:08.240
I should also mention that, for the time being, this project will, at least internally,

03:08.240 --> 03:17.120
mostly target the user data archive use case. So to give a brief visual:

03:17.120 --> 03:23.040
imagine this being the whole service that we want to build, a tape backup service

03:23.040 --> 03:29.920
for users: you have backup buckets, and there must be an S3 endpoint somewhere. But you can identify

03:29.920 --> 03:36.720
within this system a subset, which I would call the appliance, which has its own S3 endpoint,

03:36.720 --> 03:42.320
and this is the one responsible for the actual physical tape storage, regardless of how the system

03:42.320 --> 03:46.720
is internally built, and building it will be our main challenge.

03:49.040 --> 03:57.360
So first, some technical context about the CERN Tape Archive. Again, it's the software that

03:57.920 --> 04:02.480
provides physical access to the tape libraries, and the interesting thing about it is that you

04:02.480 --> 04:10.720
don't interact directly with it. CTA needs a disk buffer in front, an intermediary

04:10.720 --> 04:19.760
storage medium, which will act as a buffer between the client and tape, and this is because

04:19.760 --> 04:25.040
the tape medium is really high latency, it's not always ready to write, so you need this intermediary

04:25.840 --> 04:32.240
space to store the data on, and for this reason you can see CTA as a tape backend for the disk buffer.

04:33.120 --> 04:38.800
For example, there are some supported flows for files:

04:38.800 --> 04:44.080
you can write files to the disk buffer and they will be archived, or you can request a file to be

04:44.080 --> 04:49.760
recalled, and it will be written back to the disk buffer if it was not already there.
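As a toy model of these two flows (illustrative only, not the real EOS/CTA implementation):

```python
# Toy model of the two disk-buffer flows described above: archive (write,
# then the content is truncated once it is safely on tape) and recall
# (the content is rehydrated from tape back into the buffer).

class DiskBuffer:
    """Keeps metadata always; keeps file content only while the file is online."""

    def __init__(self):
        self.meta = {}      # path -> metadata (always retained)
        self.content = {}   # path -> bytes (only while online)
        self.tape = {}      # path -> bytes (stands in for the tape backend)

    def write(self, path, data):
        # Archive flow: the client writes, the data eventually lands on tape,
        # and the buffer copy of the content is truncated (metadata stays).
        self.meta[path] = {"size": len(data)}
        self.tape[path] = data
        self.content.pop(path, None)

    def is_online(self, path):
        return path in self.content

    def recall(self, path):
        # Recall flow: rehydrate the content from tape into the buffer.
        self.content[path] = self.tape[path]

    def read(self, path):
        if not self.is_online(path):
            raise IOError(f"{path} is offline; issue a recall first")
        return self.content[path]
```

The point of the model is the asymmetry: metadata never leaves the buffer, while content bounces between buffer and tape.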

04:51.120 --> 04:56.240
Yeah, there are more flows, but we'll mostly focus on these, there's not enough time unfortunately,

04:56.240 --> 05:01.280
and it's free and open source software; it's on the CERN GitLab and on GitHub.

05:03.280 --> 05:07.920
And just to be a bit more clear, this is an example of the archive flow: there will be some

05:08.480 --> 05:14.160
data producers, which will dump data into the disk buffer, and the disk buffer, which knows that

05:14.160 --> 05:21.440
it is being backed by CTA, will basically queue work into the archive queue, and eventually the

05:21.440 --> 05:27.520
tape drives, when they are free to perform work, will pick up some of the work that needs to be done,

05:27.520 --> 05:34.160
will mount a cartridge, and will start writing to tape. You can see there

05:34.160 --> 05:43.280
is a 400 megabyte per second average; there is a reason for that, and this is for each of the

05:43.280 --> 05:50.880
tape drives. The workflow also works in reverse. One thing that I didn't mention here is that

05:50.880 --> 05:57.920
as soon as a file is written to tape, it's actually truncated in the disk buffer, so the disk

05:57.920 --> 06:03.120
buffer retains the metadata, but the content of the file is not on the disk buffer anymore; the

06:03.120 --> 06:09.920
data is only on tape. Given this situation, you have the recall workflow to get the file back:

06:09.920 --> 06:18.240
there is a component that will ask the disk buffer for the file, to restore it from tape,

06:18.240 --> 06:24.240
and this will again go through a retrieve queue, and whenever a drive is ready

06:25.520 --> 06:31.440
and the conditions allow it, it will mount the cartridge holding the requested file, and will dump the

06:31.680 --> 06:36.720
content of the file back into the disk buffer. It will rehydrate the file: the metadata was already there,

06:36.720 --> 06:43.840
but now the content is back as well, and the client can access it. Now, a bit more about the disk buffer

06:44.560 --> 06:50.320
that sits in front of CTA. It's the one holding the metadata; you can see it has a file system

06:50.320 --> 06:57.440
hierarchy, so you write to it as you would to a file system. But one thing that should be noted is that

06:57.440 --> 07:04.160
the metadata lives only here: whenever CTA reads from the disk buffer, it will only read the data,

07:04.160 --> 07:11.840
and write exclusively that to tape. So, EOS: I didn't mention it yet, but this is the

07:11.840 --> 07:18.240
software that we use as the disk buffer. EOS stands for EOS Open Storage, and it's also an open

07:18.240 --> 07:24.640
source technology. The fact that the metadata is held only by it means that its persistence

07:24.800 --> 07:29.440
is critical: if you lose EOS, for example, you would still be able to retrieve the file content

07:29.440 --> 07:37.120
but not the metadata, so you wouldn't know what the files are. A difference from a

07:37.120 --> 07:43.680
normal file system is that EOS is aware of files being online or offline: if the content

07:43.680 --> 07:48.800
is there, the file is considered online and you can retrieve it right away; but if the content is only on tape,

07:48.800 --> 07:52.320
even though the metadata is there, then the file is offline, because you need a retrieval operation

07:52.320 --> 08:00.480
to be able to read it back. EOS is also explicitly designed for large and stable throughput,

08:00.480 --> 08:09.520
which is important for tape, because tape has a minimum speed at which it should be

08:09.520 --> 08:15.040
written to and read from, and this is because of how it works: it's a linear tape,

08:15.040 --> 08:19.760
so it spins, and braking is not something that you can do immediately; if there's no data,

08:19.760 --> 08:29.760
it cannot stop right away, so there are some constraints around speed. From this technical context,

08:29.760 --> 08:37.600
the main points I want you to take away are: the minimum speed for the tapes, so whatever solution we pick

08:37.600 --> 08:42.320
needs to handle that (you can see these as constraints or features), so it needs to guarantee

08:42.320 --> 08:48.080
a stable, minimum speed; then, the fact that metadata lives on the disk buffer is

08:48.080 --> 08:54.800
also important: losing the disk buffer, whatever the implementation is, is critical because of that;

08:56.400 --> 09:02.400
then, there's no object affinity logic in CTA as of today. By this I mean that whenever there is a

09:02.400 --> 09:07.120
bunch of work to do, a bunch of files to write to tape, CTA currently doesn't

09:08.080 --> 09:13.520
have any logic around which files to write together; it will just do some work. But for some files,

09:13.600 --> 09:18.640
it may make sense to live on the same tape because they have a higher chance of being

09:18.640 --> 09:26.160
restored together, so you want to mount only one tape and not several. Also, one

09:27.840 --> 09:31.680
current semantic in CTA is that a file is considered safe only when it's fully on tape,

09:32.560 --> 09:37.600
and not if it's in between; so if you write to the buffer only, then it's not considered safe

09:37.600 --> 09:44.640
yet. And clients are able to ask the disk buffer whether the file is on tape or not.

09:45.200 --> 09:50.640
Also, there are no modify-file semantics, only delete, which may be relevant if you

09:50.640 --> 09:55.760
intend to interact with an S3 bucket: some operations that S3 can do may not be doable

09:57.360 --> 10:02.640
as far as the disk buffer is concerned. Speaking of which, the other half of the technical introduction that

10:02.640 --> 10:09.040
I want to give is about S3 and the Glacier API. Just really quickly, I think a lot of people are

10:09.040 --> 10:18.400
familiar with it, but some may not be. S3 is a product of AWS, which offers object storage,

10:18.400 --> 10:25.200
and you talk to it using the S3 API, a REST interface over HTTP. Here on the left you can see

10:25.200 --> 10:30.720
an example call that you can do. For example, the GetObject operation lets you retrieve

10:30.720 --> 10:37.360
the content and metadata of a file, and you would specify parameters; the most important ones are the

10:37.360 --> 10:44.640
bucket and key, which are central to the S3 service. The bucket is a namespace, a domain of files;

10:44.640 --> 10:51.440
you can see it as having its own file system, and the key is the path within that file system.

10:53.200 --> 10:57.360
On the right you can see how it maps to an HTTP request; you can see that the bucket

10:57.360 --> 11:07.280
and the file key both appear in the URL path, and a GetObject operation will essentially be

11:07.280 --> 11:15.600
a GET HTTP verb. This is the general idea behind S3: a REST interface. The kinds of

11:15.600 --> 11:22.960
operations you can do are at the object level (write, read, delete objects) and at the bucket level;

11:23.120 --> 11:27.440
you can of course create and enumerate buckets, but there's also the lifecycle configuration, which is

11:27.440 --> 11:33.840
really important; I previously touched upon it, but not extensively. You can configure the

11:33.840 --> 11:40.800
bucket for automated movement between storage classes, which are abstractions of which kind of storage

11:40.800 --> 11:48.800
medium will hold your data. Then you have metadata and other operations. But one point

11:49.120 --> 11:56.080
that is very relevant for this talk is the Glacier API subset of S3. You can see that

11:56.080 --> 12:03.680
it lets you do archival and recall of files, just what CTA models, but using the S3 API.

12:03.680 --> 12:11.280
Regarding the archival operation, it's actually not an imperative operation: you can

12:11.280 --> 12:21.200
not ask the S3 API to move a file to cold storage unless you copy the whole object. What is

12:21.200 --> 12:26.640
usually done is that you set a lifecycle policy on the bucket, so you're able to say, for example,

12:26.640 --> 12:32.560
after one year, move the file to tape, and it gets moved to a different storage class. As soon

12:32.560 --> 12:37.520
as this happens, the object gets truncated, and then you get an InvalidObjectState reply if you

12:37.520 --> 12:42.640
try to get the content, because the file is offline; it's not there anymore. To get it online

12:42.640 --> 12:48.800
again, you should do a restore operation. This one is imperative: you request a RestoreObject,

12:48.800 --> 12:55.440
and the system will do some work in the background to get your file from tape back into the bucket

12:55.440 --> 13:01.040
and rehydrate it; after the operation is completed, the file will be accessible again.
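To make the two operations concrete, here is roughly how they would look with a generic S3 client such as boto3; the dict shapes follow the S3 API, while the bucket name, key, and timings are made-up examples (the actual client calls are shown only in comments, since they need a live endpoint):

```python
# 1) Archival is declarative: a lifecycle rule such as "after 365 days,
#    transition objects to the GLACIER storage class".
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-after-one-year",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every key in the bucket
            "Transitions": [
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}
# With boto3 this would be applied as:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="backup-bucket", LifecycleConfiguration=lifecycle_config)

# 2) Recall is imperative: RestoreObject asks for a temporary online copy.
restore_request = {"Days": 7}  # keep the restored copy readable for 7 days
#   s3.restore_object(Bucket="backup-bucket", Key="some/file",
#                     RestoreRequest=restore_request)
# Until the restore completes, GetObject on the archived object fails
# with an "InvalidObjectState" error.
```

The asymmetry is the point made in the talk: you never "move to cold storage" imperatively, you only restore imperatively.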

13:01.120 --> 13:09.440
Most important point: as Glacier is a user-facing API, it doesn't specify any kind of interface

13:09.440 --> 13:15.440
to the tape infrastructure. You have no way to tell it how to transfer files to whatever

13:15.440 --> 13:21.440
tape system you use, because that's all an internal detail. So the way this can be

13:21.440 --> 13:28.480
done is implementation specific, and different S3 implementations among the open source

13:28.480 --> 13:35.040
ones available offer different degrees of both S3 compatibility and mechanisms to

13:35.040 --> 13:41.920
let you move files to tape. With that intro, we can finally take a look at the proof of concept

13:41.920 --> 13:47.520
that was worked on last summer. The architecture that was picked was to take the current stack,

13:47.520 --> 13:55.120
the disk buffer in front of CTA, and add another layer on top; in this case, NooBaa was picked. It's

13:55.120 --> 14:01.840
open source software with commercial production usage; it provides an S3 interface and a way to write

14:01.840 --> 14:09.040
files to tape. The whole of these three components forms the appliance in the

14:09.680 --> 14:17.200
schema that I showed you before. NooBaa is deployed on a Kubernetes cluster; there's an operator,

14:17.200 --> 14:23.360
so you deploy the custom resource, and you will automatically get an S3 endpoint in your cluster.

14:25.520 --> 14:32.560
Internally, it stores the data on what's called NSFS, a namespace file system,

14:34.080 --> 14:38.240
which is essentially a directory: it maps objects to file system paths.

14:39.440 --> 14:48.400
In fact, in the POC we deployed NooBaa using a local file system: you mount

14:48.400 --> 14:53.280
that directory within the pod, and you declare that it will be your storage, so it really

14:53.280 --> 15:00.000
maps to files. The way that you can write to tape is by using the tape cloud interface that they

15:00.000 --> 15:08.560
offer, into which we can dive a bit. How does it work? NooBaa implements storage

15:08.560 --> 15:15.120
classes as AWS does, so there's the warm storage class and there's Glacier. Whenever you write

15:15.120 --> 15:20.400
to Glacier, be it because you do it explicitly or because a lifecycle rule does it for you,

15:20.800 --> 15:28.000
nothing actually happens: the request gets appended to a log, called the migrate log. The same is true

15:28.000 --> 15:36.160
for retrieval: whenever you call a RestoreObject action, your operation is written down to the

15:36.160 --> 15:42.400
recall log. These are all asynchronous: the operation actually happens whenever

15:42.400 --> 15:48.960
a cron job or an operator calls this manage_nsfs script, which will do some

15:48.960 --> 15:54.080
log rotation and lock management, and actually delegate the most important part of the

15:54.080 --> 15:58.800
work, actually moving the data to tape, to another script that you write. So there's an

15:58.800 --> 16:05.200
exec interface to it: you can write a migrate and a recall script, and basically the log file will

16:05.200 --> 16:10.240
be passed on to you. That log file essentially contains the files under the NSFS, the names

16:10.240 --> 16:16.000
of the files that should be moved to tape, listed one by one; from there it's up to you

16:16.000 --> 16:23.760
and your implementation, so you can do it however you'd like. So we tested it out, and what

16:23.760 --> 16:30.960
did we observe from this? The interface is indeed very flexible, because it leaves all of the

16:30.960 --> 16:36.320
heavy lifting to you. It's also smart enough not to recall files if, for example, the disk buffer is

16:36.320 --> 16:41.120
close to full: in that case you shouldn't put more stuff into the disk buffer,

16:41.120 --> 16:45.280
and the user will basically have to wait: the user will do a restore operation, but will

16:45.280 --> 16:53.280
have to wait for it to happen, because the infrastructure is not ready. I think that documentation

16:53.280 --> 16:58.800
could use some improvement; in this case, for example, we found some outdated documentation

16:58.800 --> 17:04.880
about the log format, and also some things remained unclear, like the boundaries of failure

17:04.880 --> 17:12.960
handling, i.e. whose responsibility it is when something fails during script execution. Then, some

17:13.120 --> 17:18.320
features are missing: as far as I could understand, a storage-class lifecycle transition

17:19.120 --> 17:25.840
is not doable, so you can only write directly to Glacier. Also, we observed a failure at

17:25.840 --> 17:30.640
around 10K files during migration, but honestly this requires investigation; I didn't

17:30.640 --> 17:35.040
have the time to look into it, and this could very well be due to our implementation of the

17:35.040 --> 17:44.480
migration script, because it was done in batch, and it's a POC. So, a few more words about the architecture:

17:44.480 --> 17:48.800
the duplication of the disk buffer is something that this architecture implies. Basically, here

17:48.800 --> 17:54.640
there's NooBaa and there's EOS, each with its own metadata copy, which is not ideal: it could

17:54.640 --> 18:01.200
introduce synchronization problems, and the duplication of provisioned space is also not ideal.
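Going back to the tape-cloud script interface described a moment ago: as a rough sketch of the idea (not the POC's actual code; the one-path-per-line log format and the archive_file callback are assumptions for illustration), a migrate script could look like this:

```python
def run_migrate(log_path, archive_file):
    """Process a migrate log of the kind handed to the external script.

    Assumes (for illustration) one file path per line. `archive_file` is a
    stand-in for whatever actually pushes a file's data to tape; paths that
    fail are collected and returned so a later run can retry them.
    """
    failed = []
    with open(log_path) as log:
        for line in log:
            path = line.strip()
            if not path:
                continue  # skip blank lines in the log
            try:
                archive_file(path)
            except Exception:
                failed.append(path)
    return failed
```

In the POC the real work behind `archive_file` would be a transfer into the EOS/CTA stack; here it is deliberately left abstract.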

18:02.160 --> 18:07.200
I call this the "one more layer" approach, which made troubleshooting really painful, because there's a chain

18:07.200 --> 18:15.200
of heterogeneous technologies at work, one after the other in sequence. Also, the file content

18:15.200 --> 18:23.680
migration being initiated by NooBaa is not ideal, because it doesn't

18:23.680 --> 18:29.360
fit the CTA model: CTA is basically a long-running process that waits for the

18:29.360 --> 18:36.160
drive to be ready to do work. So if you run a script and you delegate the data movement to

18:36.160 --> 18:41.520
this script, it doesn't fit the CTA model, because CTA expects the object to be fully there for it,

18:41.520 --> 18:47.520
so if you want to link NooBaa with CTA, you need to put a disk buffer in the middle to hold the file,

18:49.120 --> 18:55.520
or a long-running implementation, but let's not go there. The last point may be

18:55.520 --> 19:01.520
very specific to CERN, but there's an abundance of Ceph expertise here, because there's a lot of

19:02.720 --> 19:09.600
production usage of Ceph, and so we also wanted to take a look at

19:10.160 --> 19:18.240
its RADOS Gateway implementation as well before going on. So with that premise, let's go over a few

19:18.240 --> 19:26.640
solutions. But first, one last premise: there's one important feature of Ceph that I should

19:26.640 --> 19:33.280
mention, which is the Zipper project, also called SAL, the storage abstraction layer. Ceph

19:33.280 --> 19:40.800
has internally split the way that it handles the S3 API from the storage driver,

19:40.800 --> 19:47.440
so you could theoretically implement a driver to write somewhere other than RADOS, for example

19:47.520 --> 19:53.920
a POSIX file system, and you can also write filters, one of which was

19:53.920 --> 20:00.640
also mentioned during the previous talk. The other important feature is the

20:00.640 --> 20:07.760
cloud transition and cloud restore feature, in which you can declare a cold storage

20:07.760 --> 20:12.960
class which actually maps to another bucket, instead of an actual cold storage

20:13.040 --> 20:20.240
medium. So in this case, you can chain buckets and make them colder and colder with this feature.
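As a hypothetical sketch of how such a chain might be configured with Ceph's cloud transition (the command shape follows the upstream cloud-tiering documentation, but the storage class name, endpoint, credentials, and target bucket are placeholders to verify against the docs):

```shell
# Declare a storage class whose tier type is "cloud-s3": objects
# transitioned into it are pushed to another S3 endpoint (here,
# hypothetically, the tape appliance's own endpoint).
radosgw-admin zonegroup placement add \
    --rgw-zonegroup default \
    --placement-id default-placement \
    --storage-class COLD-TAPE \
    --tier-type cloud-s3

# Point that storage class at the next, colder bucket.
radosgw-admin zonegroup placement modify \
    --rgw-zonegroup default \
    --placement-id default-placement \
    --storage-class COLD-TAPE \
    --tier-config=endpoint=http://tape-appliance.example:8080,access_key=ACCESS_KEY,secret=SECRET_KEY,target_path=tape-bucket
```

A bucket lifecycle rule transitioning objects to COLD-TAPE would then effectively chain this bucket to the colder one.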

20:20.800 --> 20:27.600
And I mentioned this because, again, in the schema of the mental model that I worked with,

20:27.600 --> 20:32.800
I distinguished the appliance, which has the responsibility of moving data

20:32.800 --> 20:39.200
to tape from anything else that comes before it, which may very well be another bucket, thanks to

20:39.200 --> 20:44.640
this feature. And there are a few benefits around it: there is isolation and control;

20:44.640 --> 20:50.080
for example, you isolate your appliance, and you can control the bandwidth between the buckets,

20:50.080 --> 20:56.800
because the appliance needs to output a stable throughput to the drives, for example;

20:56.800 --> 21:03.360
and there are other advantages. That said, what I will show you from

21:03.360 --> 21:09.680
the solution space here concerns the appliance. I grouped the solutions into families, because

21:09.680 --> 21:14.720
there are a few ways you could do it; these are just examples. First approach: one more layer,

21:14.720 --> 21:20.880
what we did with the proof of concept; we saw the pros and cons of this. It's an already

21:20.880 --> 21:27.040
working system: EOS and CTA are one after the other, and EOS can already output the throughput that

21:27.120 --> 21:34.480
CTA needs. You have a layer on top, so it's easy to do. But again, there's

21:34.480 --> 21:38.960
the metadata duplication, you need to maintain your own storage driver, you need to maintain

21:38.960 --> 21:44.160
the interface between the two, and it's painful to debug because the stack is heterogeneous.

21:46.560 --> 21:51.360
So let's take a look at what I call solution 1.5: one more thin layer.

21:51.680 --> 21:57.680
Another way to do it would be to have a thin S3 protocol translator. Since

21:57.680 --> 22:02.880
S3 basically reproduces the semantics of EOS

22:02.880 --> 22:07.600
(put object, read object, restore object: those are all things that

22:07.600 --> 22:13.920
EOS does), one colleague of mine has in fact created a project called

22:13.920 --> 22:19.520
EOS-S3, which is a module for the RADOS Gateway that does this exact translation. So this is

22:19.520 --> 22:25.440
another way to do it. You save yourself the metadata duplication, because there's a direct

22:25.440 --> 22:31.440
translation, so EOS is the only owner of the metadata. But you have to take care of things like

22:31.440 --> 22:38.080
the emulation of the S3 API yourself, for example; it's on you. Another way you could do it is

22:38.080 --> 22:43.840
what I call client-driven emulated Glacier, in which you reverse the relationship. So there's

22:43.920 --> 22:50.800
CTA. With S3 being such a widespread technology, the question comes naturally: why not use

22:50.800 --> 22:56.320
that as the disk buffer? Maybe CTA could be a client for it. The elephant in the room on this

22:56.320 --> 23:02.080
question is the fact that the S3 API, and Glacier in particular, does not provide you any mechanism

23:02.080 --> 23:08.800
to implement migration to cold storage. You cannot really ask for

23:08.800 --> 23:14.160
a file to go offline, for example, like it's usually done in the backend of an

23:14.160 --> 23:20.160
S3 service. But you could emulate it. One way you could do it: for example, CTA can

23:20.160 --> 23:26.400
write object metadata, so you could mark an object as offline, for example, through a tag.

23:26.400 --> 23:31.840
And the gateway, whenever it finds the offline tag, will reply in the expected way:

23:31.840 --> 23:35.840
it will emulate Glacier by replying InvalidObjectState, this object

23:35.840 --> 23:40.160
is offline. And then CTA is free to work with the objects. That's another way to do it.
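As a toy sketch of this emulation (the tag name "cta-state" and its values are invented for illustration; a real implementation would live in the gateway, e.g. as a filter or script, not in a Python dict):

```python
# Toy model of client-driven emulated Glacier: CTA marks objects offline
# through a tag, and the gateway replies like Glacier would.

OFFLINE_TAG_KEY = "cta-state"   # hypothetical tag name
OFFLINE_TAG_VALUE = "offline"

class InvalidObjectState(Exception):
    """The error S3 returns when you GET an archived/offline object."""

def get_object(store, key):
    obj = store[key]
    # Gateway-side check: if CTA has tagged the object as offline,
    # behave exactly like Glacier does for an archived object.
    if obj["tags"].get(OFFLINE_TAG_KEY) == OFFLINE_TAG_VALUE:
        raise InvalidObjectState(key)
    return obj["data"]

def mark_offline(store, key):
    # What CTA would do (e.g. via PutObjectTagging) once the data
    # is safely on tape and the online copy can be dropped.
    store[key]["tags"][OFFLINE_TAG_KEY] = OFFLINE_TAG_VALUE
```

The design choice here is that the S3 store stays standard; only the interpretation of one tag is added on top.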

23:42.560 --> 23:49.040
Another way would be out-of-band data control. We said that S3 doesn't provide

23:49.040 --> 23:56.080
you a way, the S3 API that is, to move an object to cold storage

23:56.080 --> 24:02.160
as a client. But perhaps there are internal ways to do this, out-of-band ways. One way

24:02.160 --> 24:06.160
could be to access librados objects directly, to truncate and rehydrate objects.

24:06.880 --> 24:11.760
Another way could be to implement either a non-standard S3 API extension or some tooling, for example

24:11.760 --> 24:17.760
through some other medium. These are just theoretical ideas; none of this is

24:17.760 --> 24:23.840
implemented, but these could be other ways in which this could be done. And of course,

24:23.840 --> 24:28.160
the pros and cons are very much implementation dependent. For example, if you access

24:28.240 --> 24:32.640
librados objects, there is the risk of depending on internal details, which are not guaranteed. I mean,

24:32.640 --> 24:38.080
they are in the documentation, but I think the developers reserve the right to change them;

24:38.080 --> 24:43.040
there's nothing set in stone there, and they could change in a new release, perhaps.

24:44.320 --> 24:48.640
And in the end, there are a few solutions which I won't spend many

24:48.640 --> 24:55.360
words on; I included them for completeness. One of them is having CTA implement the S3 API itself.

24:55.360 --> 25:01.360
I think there are enough S3 API providers that are open source, and, in addition to

25:01.360 --> 25:07.680
reinventing the wheel, it also introduces a new functional scope for CTA, which is disk

25:07.680 --> 25:14.000
buffers. CTA is a part of a full solution; it doesn't necessarily need to be the full

25:14.000 --> 25:20.560
appliance here, but it can be the backend of an appliance. So it would introduce

25:20.560 --> 25:25.920
scope creep. And the same goes if you theoretically wanted to bring the

25:25.920 --> 25:31.280
disk buffer into CTA and then create a standard interface to integrate with whatever

25:31.280 --> 25:37.120
solution: here, again, new functional scope for CTA, new problems. We are not

25:37.120 --> 25:43.680
considering that at the moment. So the next step for us would be to test out some of these solutions

25:43.680 --> 25:50.080
and see what works best. But I'm also interested, if any of you has implemented this already,

25:50.240 --> 25:56.960
to hear about your thoughts, your pains and your joys with your approach. I'm

25:56.960 --> 26:03.840
available for the rest of FOSDEM, so if you want to talk, I'd be happy to. Just

26:03.840 --> 26:09.120
quickly acknowledging some of my colleagues, Michael, Vladimir, and Niels, for providing

26:09.120 --> 26:14.240
technical feedback and graphs, as well as the Ceph website and Mermaid for the graphics.

26:15.120 --> 26:18.880
And that would be all.

26:26.880 --> 26:37.120
So, is it an alternative to the tape REST API interfaces? And I have

26:37.120 --> 26:42.800
several questions. Is there some checksumming of the files

26:42.800 --> 26:48.080
when there is a retrieval, checks of the retrieved data against the checksums

26:48.080 --> 26:57.120
in the metadata of EOS? So, the first question, about the tape REST API:

26:57.120 --> 27:03.120
sure. So the first question is whether this is an alternative

27:03.120 --> 27:06.560
to the tape REST API. So this is about Ceph, no?

27:06.560 --> 27:10.560
Oh, yeah, about that. Is there a tape REST API?

27:10.560 --> 27:18.640
The tape REST API, you know, the API protocol used by WLCG for

27:18.640 --> 27:26.880
retrieving data; there is a WLCG standard,

27:27.040 --> 27:35.120
the tape REST API, to recall files from tape. Okay. No, this talk is mostly about,

27:35.120 --> 27:41.120
basically, the fact that S3 doesn't provide such an interface. Will Ceph implement that interface?

27:41.120 --> 27:46.240
I don't know, I'm not part of the Ceph project, but that

27:46.240 --> 27:52.160
would be an interesting thing. Actually, I wasn't aware of this, so thank you for that.

27:52.160 --> 27:56.640
And the second question was about checksum verification whenever a cold object is

27:56.720 --> 28:01.280
retrieved, right? So that the object cannot have changed, is what you mean?

28:01.280 --> 28:08.160
So if the object changes on the second bucket, then the first bucket will return an error.

28:08.960 --> 28:14.880
Is this what you mean? Or, if there is an error when recalling the file from tape to

28:14.880 --> 28:20.400
the disk, are there some checks that the recall is done correctly, and that all the data

28:20.480 --> 28:28.000
is read correctly from tape? Or do you assume that the reading is always successful from

28:28.000 --> 28:35.120
tape to disk? Ah, no; as far as I remember, and I'm not a CTA developer, I think that CTA does that

28:36.160 --> 28:36.800
on its own.

28:36.800 --> 28:38.000
Thank you.

28:38.000 --> 28:38.800
Thank you.

28:38.800 --> 28:43.200
Is all the data written on tape only once, or in multiple copies?

28:43.200 --> 28:48.480
Sorry, let me repeat the question: does

28:49.120 --> 28:53.040
there exist only one copy of the file on tape, or more than one?

28:53.040 --> 28:57.920
CTA allows you to configure that. You can configure CTA storage classes, which

28:57.920 --> 29:00.320
define how many copies of the data you would like.

29:00.960 --> 29:20.960
Okay, so the question is: if you were to retrieve a tape from a library, would you be able to take a

29:20.960 --> 29:23.600
look at the tape and figure out which files are lost?

29:23.760 --> 29:28.400
Would you be able to work out from EOS which files are on this tape?

29:29.760 --> 29:31.680
Ah, from EOS, which files are on this tape?

29:31.680 --> 29:38.960
So this information is held... actually yes. Basically, the way that it

29:38.960 --> 29:44.640
works is, well, you will need CTA, this is the thing. EOS by itself doesn't know, because of the way that

29:44.640 --> 29:50.400
CTA and EOS basically agree on where a file is. So how can you track a file?

29:50.480 --> 29:58.480
On EOS, there is the concept of a file path, right? And in CTA, there is no such concept.

29:59.200 --> 30:04.000
So how does it happen? CTA, whenever there is a file, will also try to figure out

30:04.000 --> 30:10.080
whether that file is already on tape. And how does it do that? It looks at extended attributes of the object.

30:10.080 --> 30:15.200
If there's no extended attribute with the archive ID,

30:15.200 --> 30:20.720
then it will create a new one for it, an identifier internal to CTA.

30:20.720 --> 30:26.960
It will write it as an extended attribute, and from then on, that file on EOS will be trackable on tape.

30:26.960 --> 30:32.880
But only through CTA: CTA tracks which tape each archive ID is on, and where.

30:32.880 --> 30:38.560
And EOS only cares about knowing the file path and the archive ID. So this is the chain.
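That chain could be modeled like this (the attribute name "sys.archive.id" and the ID scheme are placeholders for illustration, not necessarily what EOS/CTA really use):

```python
# Toy model of the EOS<->CTA tracking chain described above:
# EOS knows path -> archive ID (via an extended attribute),
# CTA knows archive ID -> tape and position.

import itertools

_next_id = itertools.count(1)  # stands in for CTA's internal ID allocator

def ensure_archive_id(xattrs):
    """On archival, check the file's extended attributes (modeled as a dict):
    if no archive ID is present yet, allocate one and write it back, so the
    same file is never assigned two different IDs."""
    if "sys.archive.id" not in xattrs:
        xattrs["sys.archive.id"] = next(_next_id)
    return xattrs["sys.archive.id"]
```

The key property is idempotence: re-archiving an already-tracked file returns the existing ID instead of minting a new one.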

30:41.760 --> 30:45.040
Okay, so our time's up. If you have a question, please meet me later.

30:45.200 --> 30:47.200
Thank you.

