WEBVTT

00:00.000 --> 00:16.320
All right, folks. We're going to get started. Oh, my glasses. This is a new problem in my life.

00:16.320 --> 00:21.280
So some of you will understand when you get to a certain age. Anyway, this next session

00:21.280 --> 00:25.440
is a case study in critical infrastructure and what happens after a crisis. Please welcome

00:25.440 --> 00:36.480
John Erickson. Hello. So my name is John Erickson. I am currently the community's manager

00:36.480 --> 00:41.760
for the OpenSSL Foundation. In the past, I was a community manager at Stack Overflow and a few other

00:41.760 --> 00:49.760
places and today I'm going to talk about what might have been the worst day in the history

00:49.760 --> 00:57.520
of the OpenSSL community. And perhaps also the very best day. I like to start my talks with

00:57.520 --> 01:02.480
the OpenSSL mission. I'll just read it. We believe everyone should have access to security and

01:02.480 --> 01:08.800
privacy tools, whoever they are, wherever they are, and whatever their personal

01:08.800 --> 01:16.560
beliefs are as a fundamental human right. And this formulation is relatively new for the

01:16.560 --> 01:22.400
mission, but the belief system, the idea that we believe everyone, wherever they are, whoever

01:22.400 --> 01:30.480
they are, should have access to these tools. It goes back even before OpenSSL was thought of.

01:31.520 --> 01:39.840
So let's go back to the Mesolithic era of the Internet when the World Wide Web was invented.

01:39.840 --> 01:46.160
And the World Wide Web depended on the Internet Protocol, and in the Internet Protocol, you pass

01:46.160 --> 01:51.600
packets back and forth from one server to another. And anyone along the path can open the

01:51.600 --> 01:56.400
packet and see what's inside, right? And there's nothing to prevent anyone from seeing

01:56.400 --> 02:02.560
the plain text. And this is actually sort of a feature of the system because it's open by default.

02:02.560 --> 02:08.960
The idea is if you have the URL, you can get the information. We want information to be free,

02:08.960 --> 02:14.320
right. We want everyone to be able to find what they're looking for. And that's great. And it

02:14.320 --> 02:22.480
was started by research institutes who really wanted to share their information. But what if you want

02:22.480 --> 02:29.120
to sell stuff on the Internet? I know it's crazy, but imagine if you sold books by going to a

02:29.120 --> 02:34.400
web page and then the books would be delivered to your house. How would that work? Well, one problem

02:34.400 --> 02:41.280
you'd have is when people send your credit card to you, anybody along the chain could read your

02:41.280 --> 02:46.720
credit card information and that would be bad. And then things like they would be able to find

02:46.720 --> 02:51.760
your home address because how do you get the books to the home or the office? That's also bad.

02:51.760 --> 02:56.800
So there's a lot of places where you just don't want your information to be completely free

02:56.800 --> 03:03.040
and available for anyone. You actually want it to be very limited to certain people. And so this

03:03.040 --> 03:13.360
was a problem that was actually solved by a company called Netscape back in 1995. They developed

03:13.360 --> 03:19.760
an early browser and an early web server and they invented something called SSL. Doesn't matter

03:19.760 --> 03:25.760
what SSL stands for. One of the S's is for secure, and it meant that all of your packets that are

03:25.760 --> 03:30.720
passed back and forth would be encrypted. So if someone grabbed your packets they couldn't read it

03:30.720 --> 03:36.800
and figure out what your home address is or the very private email that you're about to send.

03:36.800 --> 03:42.320
And considering that I was thinking about my I have a new laptop and I thought about all the things

03:42.320 --> 03:48.720
I had to install and I was like okay so I'm going to install Firefox. Okay that's it.

03:50.320 --> 03:54.640
I mean, obviously a few other things, Homebrew and such, that I use for developing.

03:54.640 --> 04:03.120
There's a little problem or there was a problem in the early 1990s and that's that the US government

04:03.120 --> 04:08.080
said this is a military technology and you cannot export it outside of the US or Canada.

04:09.600 --> 04:16.000
So think about our mission. Wherever you are. This is what the world looked like in 1995.

04:16.800 --> 04:21.760
There's strong encryption for everybody in, you know, look at that section up there,

04:22.720 --> 04:29.680
and everyone else had sort of weak encryption, and by weak I don't mean just that the NSA could figure out what you are sending.

04:30.240 --> 04:35.760
Weak meaning like someone with a regular desktop computer and a few hours or a few days could

04:36.160 --> 04:43.360
extract your information. So unless your secret is very time constrained you know you have a time limit on it.

04:44.640 --> 04:50.480
That's not very strong. And by the way, this doesn't even explain the whole problem, because if your browser

04:51.040 --> 04:55.520
that you purchased is in the US but you're speaking to a server that's outside of the US

04:55.520 --> 05:00.880
well, you're getting weak encryption. You're not going to get strong encryption. That's a big problem.

05:01.760 --> 05:06.880
And so someone decided to solve it. In fact, two someones who happened to live in Australia,

05:07.840 --> 05:14.080
and these guys, Tim Hudson and Eric A. Young, invented a system called SSLeay

05:15.040 --> 05:24.080
and made it open source, and this is a very common thing in an open source project, right? You see something you want

05:24.720 --> 05:30.800
and you can't have it either because it doesn't exist or in this case you can't have it because you don't live in that

05:30.800 --> 05:37.040
certain geographic region and you build it yourself. And then instead of just saying oh I've got this thing for myself

05:37.120 --> 05:43.600
you say well I built this and I want some help building it and doing more and so I'm going to make the project open source

05:43.600 --> 05:54.240
and other people can contribute and you kind of conquer. If you've ever played the game Risk, you know the Australia strategy,

05:56.000 --> 06:00.800
which, anyway, it doesn't actually work in the game Risk, but it worked great for SSLeay,

06:01.440 --> 06:07.520
and so it just spread everywhere because, you know, weak encryption or strong encryption, which

06:07.520 --> 06:16.640
would you prefer? Okay, I agree, strong encryption it is. And then because SSLeay was used in other projects like

06:16.640 --> 06:23.840
Apache and Perl and all sorts of other things, it actually went and, you know, kind of conquered the US too,

06:24.560 --> 06:29.040
right because if you're building a free and open source project would you rather use

06:30.480 --> 06:35.520
SSLeay, which is open source, or would you rather use Netscape SSL, which you had to pay for?

06:36.640 --> 06:42.880
A couple of caveats on this: there was still a lot of use of Netscape SSL, right, because if you had

06:42.880 --> 06:48.880
purchased Netscape, you know, the server or the browser, for some reason, you know, like I did when I was in

06:48.880 --> 06:56.000
the 90s, then you'd use SSL from Netscape, and it's just a natural thing. Fortunately they

06:56.000 --> 07:00.800
talked to each other. The great thing, which I didn't mention before, is that Netscape had made the

07:00.800 --> 07:08.960
standard available. They'd published their standard so that other people could communicate with their SSL,

07:09.120 --> 07:21.120
and then the founders of SSLeay left to join RSA, so now you've got a great open source project

07:21.120 --> 07:26.160
and the maintainers have gone off and done something else, and so what do you do? I mean, this is,

07:26.160 --> 07:31.760
there's a problem, right? Well, it wasn't really much of a problem because there's a community

07:31.840 --> 07:39.440
on the SSLeay users list, and so that community said, hey, we can just fork the SSLeay code

07:40.160 --> 07:48.080
and then we can scrape off SSLeay from all the files that have it and put OpenSSL back on it,

07:48.080 --> 07:54.320
and they released it this is actually take two the email for take two that happened in January of

07:54.320 --> 08:04.640
2009, or 1999, and you can see they nicely credited the founders of SSLeay, and they actually

08:04.640 --> 08:14.880
made the license a little bit nicer. It's an Apache-style license. So in other words, OpenSSL was

08:14.880 --> 08:22.240
able to just sort of take over where SSLeay left off. It's everywhere, it does all the things, it's

08:22.240 --> 08:28.160
not just the web anymore you can use it for email you can use it for just anything you can think of

08:28.160 --> 08:34.560
where you're transmitting data from one place to another. I talked to a guy who is working on

08:34.560 --> 08:40.000
serial port communication which was great I was so excited about it here like this when I was an

08:40.000 --> 08:47.040
intern at the National Weather Service, and he has an OpenSSL connection so that you can

08:47.120 --> 08:53.600
actually connect to a serial port over the internet which I was like wow I wish I had that back in the 90s

08:56.000 --> 09:02.000
so that's the end of my talk, right? Because open source won, OpenSSL has conquered the world,

09:02.000 --> 09:12.080
yay team but there's a problem this is what I consider the worst day in the community

09:12.960 --> 09:18.880
and this is the front page of the New York Times, not the top story but still on the front page,

09:19.440 --> 09:25.680
is this story about this problem with these websites, right? What is this? It actually

09:25.680 --> 09:32.240
says it later in the article but not in this section, the headline: it's Heartbleed. How many of you have

09:32.240 --> 09:40.560
heard of Heartbleed? Yay. Heartbleed is amazing because it means that all sorts of people know

09:40.560 --> 09:48.800
about OpenSSL all of a sudden, right? So the downside is, as I said, I was working

09:48.800 --> 09:55.680
at Stack Overflow at the time, and our system administrators were like, oh no, what are we going

09:55.680 --> 10:00.640
to do? Because we had a lot of servers, and almost all those servers had OpenSSL, and

10:01.360 --> 10:06.480
all of them needed to be patched, and then you need to make sure that your patch actually was

10:06.480 --> 10:13.440
fixing the problem, that you had actually done all the steps. And, you know, there were news reports

10:13.440 --> 10:18.640
about how people should change their passwords and how they'd lost you know potentially lost

10:18.640 --> 10:23.120
all their data. That actually probably didn't happen; there probably wasn't a huge amount of data lost.

10:23.120 --> 10:28.480
we can't know for sure because there's no way to know because of the nature of the bug, but

10:29.440 --> 10:37.200
it cost a lot of time and it gave OpenSSL sort of a bad name among, you know, the general

10:37.200 --> 10:41.440
public, even. We're not talking about tech people, we're talking about people who read the New York Times.

10:44.640 --> 10:56.640
so yeah I mean it could be both right but I see what you mean so here's my diagnosis of why

10:56.960 --> 11:03.280
that happened: the OpenSSL community was not keeping up with the popularity of the project itself.

11:03.920 --> 11:08.960
OpenSSL was used everywhere by all sorts of people, and the community, this is the number of people

11:08.960 --> 11:15.680
who contributed some sort of commit each month, and I've gone all the way back to 1999, and the

11:15.680 --> 11:23.840
pink line is the 12 month average and you can see it's around five people you know it goes up and down

11:23.920 --> 11:27.520
five people for this project but that's not even telling the full truth because the number of

11:27.520 --> 11:35.280
people who are actively maintaining it was closer to two or maybe like one and a half so it was

11:35.280 --> 11:40.560
not a lot of people who are actively maintaining it and the community was sort of stagnant

11:40.560 --> 11:47.280
like you're not seeing anywhere near the growth compared to the usage, right? Okay, so let's ask

11:47.280 --> 11:54.800
the next question: why is that? I took this picture a couple weeks ago. It says Mojave yucca

11:54.800 --> 11:59.440
community I thought it was amusing because this is not looking like a very healthy community

12:00.400 --> 12:05.200
it's all prickly and dry and unless you know what you're looking for you can't even see the

12:05.200 --> 12:11.280
Mojave yucca. They're actually these two sort of stump things. They're supposed to have these big green

12:11.520 --> 12:18.720
leaves they look amazing but these do not look amazing they look prickly and I thought this is a

12:18.720 --> 12:26.560
good representation of what the community was like before Heartbleed. Cryptography is intimidating.

12:27.440 --> 12:33.680
People, you know, think about what happens if I start working on this project and something breaks,

12:34.320 --> 12:38.160
what happens if I start working on this project and I create something that's not secure

12:38.720 --> 12:44.320
oh my goodness that's a lot of responsibility and so not everyone feels like they can do it

12:44.320 --> 12:48.320
not everyone feels like they can contribute to OpenSSL, and that's true even today

12:50.640 --> 12:55.600
and not everyone can contribute code either and this is maybe even a little bit harder for

12:55.600 --> 13:03.520
a project like OpenSSL, which is mostly written in C. C is a fantastic language, but again it's

13:03.520 --> 13:08.000
intimidating. Not a lot of people, you know, a lot of people know it, but there are even more people who are

13:08.000 --> 13:18.000
excited about other languages. You know, I lost my train of thought, but C is not exactly the

13:18.000 --> 13:23.760
easiest language to work with for a lot of people. And then there's Perl. I love Perl. Perl is amazing.

13:24.800 --> 13:30.320
A lot of people don't like Perl, right? Perl is nicer than C, right? Well, not for some people.

13:30.800 --> 13:35.600
And then there's also sort of a caveat, which is that the Perl that's used in OpenSSL

13:36.640 --> 13:43.520
is used to generate assembly code. So if you're excited about Perl, I'm sorry, you're not actually

13:43.520 --> 13:49.680
writing Perl, you're writing a Perl generator for assembly code, and that's breaking my brain just talking

13:49.680 --> 13:55.520
about it imagine trying to work on this project and so these are intimidating things how many

13:55.520 --> 14:00.560
people could do this right and then the other thing that's going on is the people who are

14:00.560 --> 14:05.360
maintaining the project are really into it right they're really excited about it are they thinking

14:05.360 --> 14:10.640
about how do I bring new people into this project no they're thinking about how do I avoid having

14:10.640 --> 14:15.920
people try to commit something, or tell me this is how you should do it, right? That's a bother; it's taking

14:16.000 --> 14:23.920
away from the code and then the other thing is the community happened on mailing lists again I love

14:23.920 --> 14:30.960
mailing lists. It's email, you know, that was the way things worked in the 90s, but mailing

14:30.960 --> 14:35.680
lists have a problem, which is that to get involved in it, you kind of have to

14:35.680 --> 14:41.040
read the mailing list a while otherwise people are yelling at you and then you're submitting patches

14:41.040 --> 14:46.080
over email, you know, all that sort of thing. It's not that it's impossible, it's not that you

14:46.080 --> 14:52.160
can't do it it's just another barrier that makes it slightly harder for people to get into the project

14:55.040 --> 15:01.120
so this is actually the picture I took I cropped it down to just the fun part but this is the bigger

15:01.120 --> 15:05.200
picture which is you can see there's more variety there's more stuff going on you can actually see

15:05.360 --> 15:11.360
Mojave yucca with the green things sticking out. I love this picture. Some of you may not love the desert as

15:11.360 --> 15:18.000
much as I do, but I come from Southern California and the desert is my home, and this is a great

15:18.000 --> 15:24.720
picture, and what I like about it from a metaphorical point of view is that the Mojave yucca

15:24.720 --> 15:32.000
community is not just Mojave yucca, and the OpenSSL community is not just the maintainers and developers of

15:32.080 --> 15:38.880
code of OpenSSL. There's actually a lot more going on here. And then the other thing is Mojave yucca,

15:38.880 --> 15:45.040
one of the ways that they are able to thrive and reproduce is they depend on fire. So when fire comes

15:45.040 --> 15:50.720
through, it makes it easier for them to spread their seed around and grow and expand.

15:50.720 --> 15:59.840
Sometimes there's a necessity for creative destruction in life. So this is a change.

15:59.840 --> 16:07.280
Anyone know where the Heartbleed bug happened on this graph? It actually happened a little bit

16:07.280 --> 16:12.560
after when people started becoming more active I haven't really figured out exactly why but you can

16:12.560 --> 16:19.760
see that before Heartbleed you would go literal years without anyone new joining the community,

16:19.760 --> 16:26.320
and since then it's been going up and up and up and that's great right this is how a healthy

16:26.320 --> 16:35.440
community survives. People like order. So this is actually a different picture, near Nevada, and

16:35.440 --> 16:41.120
there's a Joshua tree and there's an art installation it may not be possible to see all the way in the back

16:41.120 --> 16:46.480
but there's this cool art installation of someone has put up stones that make these towers

16:46.480 --> 16:50.400
and a bunch of people have come out to the desert to see this I recommend it

16:50.720 --> 17:01.120
That's an excellent price too. This is what the community looked like in 2015. So still mailing lists,

17:02.320 --> 17:08.880
but now there's a wiki and it's on GitHub and I honestly think GitHub makes a big difference

17:08.880 --> 17:14.640
for a lot of people because instead of having to go to a mailing list learn how to make your patch

17:14.640 --> 17:19.440
follow the rules of this individual community, GitHub streamlines a lot of that, right?

17:19.520 --> 17:23.920
We all know how to create an issue in GitHub, we all know how to create a PR on GitHub.

17:24.640 --> 17:30.640
There's still some nuance, like from project to project, but the interface has actually helped.

17:30.640 --> 17:34.400
it creates a sense of order in the community that means that people are more likely to join

17:36.480 --> 17:42.560
another thing we did was we started adding tests so I don't know if you can see it really well

17:42.560 --> 17:50.000
but that top line is the code, which increases over time. It's actually a logarithmic graph, so that's not

17:50.000 --> 17:55.760
a weird shape and then there's tests which is the green or test which is the pink line

17:56.480 --> 18:03.200
and after Heartbleed there was a real focus on creating a testing framework and creating some tests,

18:03.200 --> 18:08.720
and that actually helps contributors a lot because you don't have to know the entire code base

18:08.800 --> 18:15.520
to make a change if I make a change in one part and I test my part and it seems to be working

18:15.520 --> 18:20.800
it's doing what I think it should and then I run the test suite and it didn't break something else

18:20.800 --> 18:27.200
somewhere else I have a higher sense of confidence that my change was actually a good change

18:27.200 --> 18:33.440
and not only that but the maintainers can look at that test case and the test suite and say

18:33.440 --> 18:37.520
okay it didn't break any of our tests and they can feel a little bit more confident that they

18:38.160 --> 18:46.080
that the code is worth bringing in. Heartbleed was good for the community. Communities that don't

18:46.080 --> 18:53.520
grow are in the process of dying and so this is my favorite graph because it's showing that the

18:53.520 --> 18:58.560
OpenSSL community is growing, and there's just a million different ways that that's happening,

18:58.560 --> 19:05.760
right? Here's some things that we've done in the past year, that have happened in the past year.

19:06.800 --> 19:16.160
I started a survey to find out where OpenSSL is being used, and we had a conference

19:16.160 --> 19:25.120
in October that was amazing more recently we had an AI the community that does AI find a bunch of

19:25.120 --> 19:32.720
vulnerabilities in our code and one of the things I like is that they say that it's hard to find

19:32.720 --> 19:38.800
it's extraordinarily difficult to find vulnerabilities in OpenSSL. And why is that? Because the

19:38.800 --> 19:43.200
community has gone through and fixed a lot of things. That doesn't mean Heartbleed isn't possible

19:43.200 --> 19:49.600
in the future it means it's less likely this is a picture I took just today I don't know if you can

19:49.600 --> 19:55.680
see it. Homebrew has their thing, and the example that they gave is brew install openssl.

19:56.400 --> 20:04.880
I don't know why I think that's so cool but I think it's awesome I want to say thank you to our

20:04.880 --> 20:09.760
supporters. We are the OpenSSL Foundation. We depend on the supporters. The supporters

20:10.560 --> 20:14.640
allow me to come to this conference which is not cheap because I come from California

20:14.720 --> 20:22.800
Yeah, the conference itself is cheap, but the coming isn't. And I'm now ready for questions, but

20:22.800 --> 20:30.480
while I'm answering questions, please scan my QR code so that you can get access to the newsletter

20:30.480 --> 20:37.360
that I'm definitely going to be writing tomorrow so I can get it out this week this is a brand new

20:37.360 --> 20:42.560
thing that the OpenSSL Foundation is doing, and, you know, I'm writing it so you don't have to

20:42.560 --> 20:50.160
worry about it being spam because I'm lazy and I'm going to be pushing every month or every

20:50.160 --> 20:58.560
couple of months to push one out all right ready for questions

20:58.880 --> 21:08.320
Thank you very much.

21:16.960 --> 21:23.360
Hi, one thing I noticed about the Heartbleed bug was that the code was committed at like 11:30

21:23.360 --> 21:30.480
PM on New Year's Eve. Yes. Do you think that was a factor? Well, I wasn't there, so I can't say for sure,

21:31.280 --> 21:37.200
but I don't think that was a positive thing, and, you know, it's just when

21:37.200 --> 21:41.360
there's a small number of developers and you're getting some code coming in, and this

21:41.360 --> 21:46.480
code that came in was actually part of the heartbeat feature, which is a good thing, right? You want to

21:46.480 --> 21:52.720
bring that in it wasn't like someone was dumping some dumb thing in but it makes it harder when

21:52.720 --> 21:57.360
there's only a small number of people and they have to frantically look through the code

21:57.360 --> 22:05.440
or else let someone down, right? So I think having a very small number of maintainers is a

22:05.440 --> 22:12.960
potential problem for projects, and so the maintainers themselves should be incentivized

22:12.960 --> 22:17.440
to grow the community so that there's a bigger group of people looking at each of these things. Yeah.

22:17.600 --> 22:25.600
That's really interesting, and good to hear the story told from someone who really knows it

22:25.600 --> 22:31.120
well tying into some things we heard on the main track this morning how do you protect

22:31.760 --> 22:36.880
the mental health of your contributors because you go to the Wikipedia page the guy who put in

22:36.880 --> 22:43.280
the commit that broke things is named I mean that's really rather scary I'm going to commit

22:43.280 --> 22:49.680
something and then I'm the world's sort of villain. How do you look after your community to protect them

22:49.680 --> 22:56.560
from that frankly unsafe that's probably putting it by being treated yeah you didn't pay for it

22:56.560 --> 23:01.120
that's a great question so one thing for me personally is I don't know the name of the person

23:01.120 --> 23:06.000
I mean I looked at it once and I was like I'm going to forget this because I do agree that there's

23:06.000 --> 23:12.480
a sense of shame right I think the other thing is there are more people in the loop so there's

23:12.560 --> 23:17.040
two reviewers for the code and so it's actually there's more people who have a sense of

23:17.040 --> 23:24.000
responsibility you kind of diffuse their responsibility a little bit yeah I don't know I mean there's

23:24.000 --> 23:28.880
I think there's sort of safety in numbers, right? The more people that you have

23:28.880 --> 23:36.080
involved it's helpful and then I personally just would never want to make a big deal about

23:36.240 --> 23:43.920
this poor person because, yeah, it's a potentially very shameful thing. So, time for one more,

23:45.040 --> 23:46.720
in the back

23:51.520 --> 23:55.600
Thank you for your involvement, and as someone who worked at a university and got to deal with this

23:56.160 --> 24:03.360
at a large scale, it was very interesting. I'm wondering to what extent you would measure the health

24:03.360 --> 24:10.240
of a community based on the percentage of tests that are in your product? Percentage of tests?

24:12.160 --> 24:20.800
Yeah, I mean, so I'd add, this is still not enough tests, I'll tell you that. So we're at about 70

24:20.800 --> 24:27.040
percent tests. There's sort of a point where you hit diminishing returns, where like you're testing things

24:27.040 --> 24:31.680
in lines of code that are just not getting run very often it's more work to create the test

24:31.680 --> 24:36.400
and very low impact on it. So there is definitely a place where you just reach

24:37.280 --> 24:45.040
diminishing returns. We're not close to that, right? And I think the other thing, you know,

24:45.040 --> 24:49.520
I didn't mention too much on this graph but the documentation is actually perhaps a bigger problem

24:49.520 --> 24:56.160
right now because, you know, the tests help you verify that the code is working, but from a community

24:56.160 --> 25:00.960
perspective we want people to contribute code and if the documentation has not reached the level

25:00.960 --> 25:06.160
where people are going to contribute code fairly quickly then then we have a problem and I think

25:07.440 --> 25:13.120
I've heard this a lot: the OpenSSL documentation is hard to use, and I agree. I'm fairly new to the

25:13.120 --> 25:19.680
project and reading it it's just hard to get into it does what the developers need it to do which is

25:19.680 --> 25:26.640
great so once you're inside the community it's fantastic but it's hard to get in through that

25:27.200 --> 25:33.200
So, I mean, I didn't really answer your question with a number. It's above 70, less than a hundred,

25:33.200 --> 25:39.520
right? And I think, I don't know, maybe 90 would be great; 80-something would be pretty good too.

25:41.120 --> 25:48.160
Yeah, thank you. We're at time. Okay, great. Thanks, John.


