WEBVTT

00:00.000 --> 00:15.280
7,000 enterprise applications and IT services, 60,000 servers and containers, and the same amount

00:15.280 --> 00:21.360
of code repositories, and several hundred thousand open source components that we

00:21.360 --> 00:25.440
rely on in our supply chain.

00:25.440 --> 00:31.760
And for every one of these components, we need to know where it's running, how it's used,

00:31.760 --> 00:36.640
in which version, and so on, and in which context.

00:36.640 --> 00:38.640
And how are we going to do that?

00:38.640 --> 00:45.520
Well, probably with an expected amount of 500,000 SBOMs created, enriched, and then analyzed

00:45.520 --> 00:46.920
per day.

00:46.920 --> 00:52.440
That's the size of the program that a few colleagues and I have been conducting for the last

00:52.440 --> 00:54.560
two years.

00:54.560 --> 00:58.840
Yesterday, I already gave a presentation on how we set up this program, how the strategy

00:58.840 --> 01:00.440
worked, how we did that.

01:00.440 --> 01:03.440
So I will cover only a small part of that.

01:03.440 --> 01:05.840
And today, the focus is rather on the tooling.

01:05.840 --> 01:11.320
So what are we using to scale these operations and to get to a state that is good enough

01:11.320 --> 01:12.320
for us?

01:12.320 --> 01:19.720
First of all, who of you knows, or no, the other way around: who of you doesn't know Deutsche Bahn?

01:19.720 --> 01:21.720
Thank you.

01:21.720 --> 01:22.720
No, thank you.

01:22.800 --> 01:26.040
OK, so I won't talk much about this.

01:26.040 --> 01:29.680
The big thing, or the important thing is the organization is large.

01:29.680 --> 01:30.440
It's diverse.

01:30.440 --> 01:36.960
We have 500 different professions, 200,000 employees, and so on.

01:36.960 --> 01:44.120
So our IT is of the same size, more or less, with the same complexity, and also

01:44.120 --> 01:47.680
the different scenarios that we have to think about here.

01:47.680 --> 01:50.360
So I will cover a little bit about that later.

01:50.360 --> 01:53.920
But most importantly, Deutsche Bahn is not a software company.

01:53.920 --> 01:59.920
So our business is making or running trains, and somewhat maintaining the railway

01:59.920 --> 02:03.560
infrastructure in Germany.

02:03.560 --> 02:07.880
Talking about IT now and the scope of this talk.

02:07.880 --> 02:13.760
If we think about transparent supply chains, we have to also think about the diverse

02:13.760 --> 02:20.080
streams of software, how software is coming into our organization.

02:20.080 --> 02:25.480
And this is building software, of course, and for the most different applications, like

02:25.480 --> 02:30.960
on many of your phones, probably, but also on web servers, but also on embedded devices

02:30.960 --> 02:35.480
somewhere sitting in between stations on the railway tracks or somewhere else.

02:35.480 --> 02:40.520
We also buy software, obviously, and we operate software, not only in the cloud, not only

02:40.520 --> 02:46.840
containers, and so on, but also on edge devices in operational technology.

02:46.840 --> 02:50.080
And if we have the challenge that we also need to know which component is used where, we have

02:50.080 --> 02:54.320
to consider all these sourcing streams.

02:54.320 --> 02:58.960
So how can we find a common language to describe the composition of all of that?

02:58.960 --> 03:03.320
I mean, we are in the SBOM room, so of course it's SBOMs.

03:03.320 --> 03:09.380
But we don't see SBOM as a tool, we rather see it as a common methodology to describe

03:09.380 --> 03:14.180
software, not only to describe it, but actually to do something with it.

03:14.180 --> 03:17.300
So I won't go through all the use cases that we came up with, and this is only a

03:17.300 --> 03:18.300
subset.

03:18.300 --> 03:23.700
But this is more than just license compliance or security vulnerabilities or CRA compliance.

03:23.700 --> 03:26.900
This can also be more forward-looking.

03:26.900 --> 03:31.380
How do we understand and analyze our whole supply chain, to invest or engage somewhere

03:31.380 --> 03:34.340
strategically or to divest from somewhere?

03:34.340 --> 03:38.300
So SBOMs must become a shared infrastructure.

03:38.300 --> 03:42.500
But again, it's not only about SBOMs, it's also about thinking of these systems in

03:42.500 --> 03:45.180
these more open methodologies.

03:45.180 --> 03:51.140
And one important part of that, which is perfectly integrated with this, is VEX, also mentioned

03:51.140 --> 03:55.500
a few times here already, the Vulnerability Exploitability eXchange.

03:55.500 --> 04:01.340
It's a great way to cope with a challenge that we will have and already have: the more transparency

04:01.340 --> 04:07.980
we get into our supply chains, the more frightening findings we will have.

04:07.980 --> 04:12.300
Especially from the security point of view.

04:12.300 --> 04:17.620
And VEX is a great opportunity for teams to not double and triple their work on describing

04:17.620 --> 04:21.140
whether or not they are affected by a certain security vulnerability

04:21.140 --> 04:25.660
that may not even be exploitable in their specific context.

04:25.660 --> 04:30.300
So SBOM, VEX, but also SBOM signing and so on are all building blocks that we are trying

04:30.300 --> 04:32.420
to get into our organization.

04:32.420 --> 04:37.100
And you can imagine, given the size that we have, that this is not an easy task.

04:37.100 --> 04:43.380
So yesterday I talked a bit more on how we set this up and with which principles. I will

04:43.380 --> 04:48.580
not go into detail, but I would just like to highlight one or two points here, because

04:48.580 --> 04:52.380
we as a small group that started this, we set up a few principles.

04:52.380 --> 04:59.260
And one of the most important principles was to not talk in tools, but in capabilities.

04:59.260 --> 05:05.260
So once we started to leave out the exact tool names that we had or tools that were on

05:05.260 --> 05:09.860
the market and started rather thinking in systems, the whole picture became much clearer

05:09.860 --> 05:13.500
and we could also think big.

05:13.500 --> 05:19.020
On the other hand, on the technical side, we said, well, we have to consider all the different

05:19.020 --> 05:20.020
sourcing scenarios.

05:20.020 --> 05:23.020
We cannot only look for build SBOMs of our pipelines.

05:23.020 --> 05:27.100
We have to think about something that works for every workflow and there are many workflows

05:27.100 --> 05:28.780
within our company.

05:28.780 --> 05:36.100
So what was really helpful for us is our mental model of an SBOM lifecycle.

05:36.100 --> 05:38.500
And this is the simplified result.

05:38.500 --> 05:44.740
So you will see on the left the different sourcing streams of SBOMs, really showing where

05:44.740 --> 05:49.500
do SBOMs come from: from the most diverse systems, from the most diverse subsidiaries

05:49.500 --> 05:55.140
that we have within our company, and also from vendors and so on.

05:55.140 --> 06:02.740
And the idea is that these SBOMs, enriched to a baseline quality, are then, from decentralized

06:02.740 --> 06:08.500
sourcing, coming into a central SBOM database, the SBOM inventory, as we call it.

06:08.500 --> 06:09.700
And this is very important here.

06:09.700 --> 06:13.380
We cannot live in a world where we have multiple SBOM storages.

06:13.380 --> 06:18.700
We need to know in one place with one query what we are using where and how.

06:18.700 --> 06:23.700
And given the centralized nature of the SBOM storage, we can also enrich it with

06:23.700 --> 06:30.900
further information that cannot be described in the SBOM itself, like in which IT enterprise

06:30.900 --> 06:38.100
architecture application the SBOM is actually used, or who is responsible for its content.

06:38.100 --> 06:43.460
Who do I have to call out at night from their well-deserved sleep if something goes

06:43.460 --> 06:46.900
really wrong and like what's the team responsible for it?

06:46.900 --> 06:52.500
But then, on the other hand, if I look at the analysis of the SBOMs, this again is decentralized.

06:52.500 --> 06:56.260
We are not having a single UI to put in all the dashboards and so on.

06:56.260 --> 07:02.260
So you can imagine this SBOM inventory is like a database with quite performant APIs.

07:02.260 --> 07:07.140
That was the simplified version. Here I'm just showing that the system is more complex, showing

07:07.140 --> 07:12.580
the streams of SBOMs and VEX information going through our organization and systems.

07:12.580 --> 07:16.580
Again, this isn't tools. I mean, you could also write some tools here, but in the end it

07:16.580 --> 07:17.580
is systems.

07:17.580 --> 07:21.940
You don't have to take a picture of that, these slides I think are already uploaded and you will find

07:22.900 --> 07:25.780
that. So how do we translate all of this into reality?

07:28.740 --> 07:30.900
It's clear that this cannot happen overnight.

07:30.900 --> 07:38.660
So we need to have this step by step, and our strategy here was to first focus on SBOM adoption

07:38.660 --> 07:46.020
rather than SBOM quality. So we wanted to know fast which software components we are using.

07:46.900 --> 07:52.980
So we look for low-hanging fruit, we look for priorities, for instance set by the Cyber Resilience Act,

07:53.860 --> 08:01.060
and start with getting SBOMs for all source repositories and for all pipelines where the developers

08:01.700 --> 08:08.180
onboarded this. And in the course of this year we will add the SBOMs from runtime, so from our

08:08.260 --> 08:17.540
servers, physical and cloud servers, and the containers. So 60,000 roughly, and also following up

08:17.540 --> 08:22.180
with operational technology. I mentioned that earlier: you can imagine, as an infrastructure owner,

08:22.180 --> 08:26.820
we have a lot of OT that is running. It has been running for decades and it's probably still

08:26.820 --> 08:33.860
running for decades but we still need to know what IT is running there. Now let's get into tools

08:34.500 --> 08:43.780
and get more concrete. We looked at this overarching architecture before. So what we came up with,

08:43.780 --> 08:50.020
or what the idea was, is that we create SBOMs that have a certain degree of baseline quality.

08:51.380 --> 08:57.300
And this baseline quality has to be good enough to be working for our use cases, so they aren't

08:57.540 --> 09:04.180
perfect. So we came up with a default modular toolchain; you could call it a default

09:04.180 --> 09:10.420
process that developers can just plug into their pipelines, for instance, or that is automatically running.

09:10.420 --> 09:16.100
And the steps are similar to what Victor mentioned earlier: we go from a generation of

09:16.100 --> 09:24.260
SBOMs, over an enrichment of SBOMs, for instance regarding licensing information,

09:24.900 --> 09:30.740
to an analysis of SBOMs regarding security but also regarding license compliance. So for

09:30.740 --> 09:36.580
instance checking it against licensing policies that we have internally based on the usage scenario

09:36.580 --> 09:44.180
but also for creating, for instance, copyright notices based on that. On the horizontal

09:44.180 --> 09:51.140
you see that we have internal and external information systems. So from the internal ones, of course,

09:51.140 --> 09:56.340
we need to have some context information, but I'll go through this step by step. So how do we generate

09:56.340 --> 10:00.740
the SBOMs? Let's talk in tools now. Our default toolchain, and this is important: this is just

10:00.740 --> 10:05.300
the default. This is modular: developers can choose something else if they think that is better

10:05.300 --> 10:10.820
for their ecosystem. We currently generate SBOMs using Syft, because it does a really good job

10:10.820 --> 10:17.940
at integrating with various ecosystems, and do some post-processing. Then we enrich the

10:18.020 --> 10:24.020
SBOMs, especially from the licensing point of view. There we rely on ClearlyDefined. That's a

10:24.020 --> 10:30.660
great project, also mentioned a few times here already. It's basically a database of packages already

10:30.660 --> 10:37.700
scanned using ScanCode. So we do not have to rescan all these packages with

10:37.700 --> 10:43.220
ScanCode looking for copyright and licensing information, because given the size of our supply chain

10:43.220 --> 10:51.620
that would literally burn the planet. Then, for the analysis regarding security, we are

10:51.620 --> 10:58.100
using Grype, also from Anchore, and connecting this with our VEX information. To be honest, this isn't

10:58.100 --> 11:04.500
perfect yet, and we are also contributing to make Grype better with VEX information, but

11:04.500 --> 11:12.260
we have this working pretty well, and for instance that already had a really good effect:

11:12.260 --> 11:19.700
I can say that when we had the React2Shell vulnerability a few weeks or months ago, we were able to

11:19.700 --> 11:25.220
identify the affected products within our company 12 hours before our proprietary,

11:25.220 --> 11:30.900
very expensive security scanner that we have elsewhere did. So this is doing a great job. It's not

11:30.900 --> 11:38.820
perfect but it's well good enough in order to respond to such things. To check against the licensing

11:38.900 --> 11:44.580
policies we are currently using Grant, also from Anchore. Also not perfect, but we are also

11:44.580 --> 11:50.900
contributing to it and working together with Anchore. And regarding the copyright notices, or,

11:50.900 --> 11:56.260
as we call it, compliance artifacts generation, again ClearlyDefined is a great candidate for

11:56.260 --> 12:02.260
doing so. We are not calling the public API here, by the way, but have set up an internal

12:02.260 --> 12:11.940
ClearlyDefined instance. That again was the simplified version; now comes the complex one.

12:14.340 --> 12:18.580
Again, look at the slides, but this is really, again, talking in tools of course, but it's

12:18.580 --> 12:23.380
more or less systems and workflows like how do certain steps have to integrate with each other

12:23.380 --> 12:29.380
how VEX and SBOM information flows, which systems internally and externally are affected,

12:30.020 --> 12:35.220
basically or need to be included and what do we have to yeah still work on in order to

12:36.420 --> 12:43.780
get this whole picture working for all of our engineers. As I said, this tool suite, as we call it,

12:43.780 --> 12:53.300
is modular, and the idea was to have a really good adoption from our developers. So this default

12:53.300 --> 13:00.740
toolchain works in pipelines. We have internally a product called pipe shift, created by a

13:00.740 --> 13:07.540
great team of ours, coming up with GitLab templates for their CI, and there's just a one-liner

13:07.540 --> 13:13.140
that adds all of these, or many of these, magic steps: creating the SBOM, enriching it, and

13:13.140 --> 13:17.700
storing it into our central SBOM inventory, and it just needs, I don't know, two or three

13:17.700 --> 13:23.460
variables for configuration. But it's also important to note that not all products within our company

13:23.460 --> 13:29.140
and our organization are using pipelines. There are good reasons to not do that, or cases where it's not

13:29.140 --> 13:36.100
possible. So the idea here is that it's also available as binaries, as Homebrew packages, and we have

13:36.100 --> 13:44.500
a Homebrew tap for that, via mise-en-place and other options, so developers can basically

13:44.580 --> 13:49.380
integrate it into all the workflows that they have, for instance also on an SBOM that has been delivered

13:49.380 --> 13:54.420
by a vendor and they want to increase the quality because they do not directly have a leverage

13:54.420 --> 14:04.020
to do so. Now we have an okay, good SBOM. It can become better, and we're working on that, but what to do

14:04.020 --> 14:11.140
with it, right? We said it's going into our central SBOM database, but how do I look into it?

14:11.140 --> 14:16.660
And this is where the compliance suite comes into play. The compliance suite is an internal service

14:16.660 --> 14:23.380
that we have. It's like a web UI, but also with an API, and that helps teams with understanding

14:23.380 --> 14:28.740
their assets better, and their compliance status. Compliance can be not only security findings but

14:28.740 --> 14:35.060
also other things that we have internally. But it has a direct connection to the SBOM inventory.

14:35.940 --> 14:41.300
it helps the developers and teams but also service owners to understand their assets better so it

14:41.300 --> 14:47.140
can work on single repositories, on GitLab groups, but also on assets that can span multiple

14:47.140 --> 14:55.300
groups and multiple repositories so this is a great way this service also crawls through all

14:55.300 --> 15:00.980
our GitLab repositories, for instance, and accepts the SBOMs that are either coming from itself

15:01.140 --> 15:09.540
or fetched, or were sent to it by pipeline integrations and other steps. And technically this is

15:09.540 --> 15:16.580
a plugin to Backstage. Does anyone of you use Backstage, by the way? A few hands, 30%, great. So

15:16.580 --> 15:21.060
we are also working well together. I can only recommend it. We're using it internally as a developer

15:21.060 --> 15:29.060
portal so teams can do much more with it like setting up new repositories and whole projects quite quickly

15:29.860 --> 15:36.900
Regarding SBOMs, I mentioned already this compliance suite can interact directly with the SBOM

15:36.900 --> 15:44.180
inventory it's basically more or less the same database or close to it so developers can directly

15:44.180 --> 15:48.900
see the SBOM that they sent to it. They can also manually upload one if they, for instance,

15:48.900 --> 15:54.580
get it from a manufacturer and they can also inspect the quality they can re-upload or download

15:54.660 --> 16:02.020
the SBOM, so that works quite nicely. Based on these SBOMs, as we mentioned before,

16:02.020 --> 16:08.580
there are also findings. We have security findings, of course, from the Grype scanner running constantly

16:08.580 --> 16:14.580
on all the SBOMs, checking against an up-to-date security, or rather vulnerability, database whether there are

16:14.580 --> 16:20.580
findings and for the licensing the same applies so we have license checks integrated and so

16:20.580 --> 16:26.580
developers can see what's the status whether it complies with my own licensing policy or with

16:26.580 --> 16:33.860
the company's default depending on the usage scenario of the project or the product over there

16:35.140 --> 16:40.580
the idea here is that we have a machine readable and come close to an automated compliance

16:40.580 --> 16:46.580
so not only throw warnings at developers but also prioritize these warnings and that works quite

16:46.660 --> 16:53.380
well already technically I think the last slide on the compliance suite how it's working

16:54.020 --> 16:59.460
from the architectural side. I mentioned already the compliance suite itself: it's a plugin for

16:59.460 --> 17:05.540
Backstage that is doing much more than just these compliance things. The compliance suite itself

17:05.540 --> 17:12.740
has this runner, or the service, that crawls through GitLab and also our Azure DevOps. And on the deployment

17:12.740 --> 17:18.820
side, how this is deployed: of course Kubernetes, and with big support from Crossplane. And so these

17:18.820 --> 17:26.100
are really important open source projects that we rely on and also contribute to now we have all

17:26.100 --> 17:31.380
these SBOMs and findings. What do we do with that? And here I would like to showcase one of these

17:31.380 --> 17:37.620
use cases that we have if we have aggregated information over our supply chains and this shows just

17:37.620 --> 17:43.140
two stats of the many that we can generate. First is frontend frameworks, so we can quickly

17:43.140 --> 17:49.140
see, overall at Deutsche Bahn, what we are using as frontend frameworks, what the rising stars are,

17:49.140 --> 17:55.700
where there is technology that might be a hidden champion. Do we, for instance, want to strategically use

17:55.700 --> 18:01.140
only one or two frameworks instead of a whole mix of them, or are we using frontend

18:01.140 --> 18:06.020
frameworks or also programming languages that we actually do not want to use anymore because

18:06.100 --> 18:11.700
they are deprecated, and so on. This way, having this aggregated information in a rather federated

18:11.700 --> 18:17.300
organization, we can make better decisions. So SBOMs are not only good for security, compliance,

18:17.300 --> 18:22.980
and so on but also for forward thinking investments into ecosystem and the understanding

18:24.180 --> 18:29.220
I would like to show, and now we are running out of time, a few stats that I just compiled a few days

18:29.220 --> 18:36.340
ago, very roughly. I went through over 80,000 SBOMs. Basically it's a snapshot of a second; 80,000

18:36.340 --> 18:42.260
is the current size. We are having more SBOMs coming in basically every second that

18:42.260 --> 18:49.540
replace old SBOMs. And yeah, interesting numbers here: I found that we are using more than 100,000

18:49.540 --> 18:55.140
open source components, and that is without versions. If we add versions and architectures, this will be tripled,

18:55.140 --> 19:02.180
or factor 5, I think. So this is great insight that we have, and the question is: how do we actually

19:02.180 --> 19:08.900
act on these stats at the end and the important part here is it's people so we have to take

19:08.900 --> 19:14.020
people with us, not only developers, not only operators, but also those who are responsible

19:14.740 --> 19:20.820
for security like governance owners so make it easy for them integrate into their workflows

19:20.820 --> 19:25.860
consider their different ways how software is developed and operated that's very important and

19:26.660 --> 19:31.700
surely governance owners and those who run these projects, like I do, are an important part.

19:31.700 --> 19:36.500
Yeah, but we have to take people with us, and these are developers. So, concluding:

19:37.540 --> 19:44.020
SBOMs are not a product, they are a methodology, and therefore they can only be implemented incrementally.

19:44.020 --> 19:49.380
it has to be pleasant for the users and it has to be modular we cannot expect a single tool

19:49.460 --> 19:55.140
to solve all our issues. So internalize this knowledge into your organizations, do not externalize

19:55.140 --> 20:00.740
it do not rely on single manufacturers for that and let's collaborate not only on the tools

20:00.740 --> 20:08.260
but also on how we set this up with our organizations and communities so thank you very much for the

20:08.260 --> 20:12.900
you

