WEBVTT

00:00.000 --> 00:15.680
Hi there. I'm Jade. I would have been presenting with Nex today. Unfortunately, they can't

00:15.680 --> 00:23.040
be here. A quick disclaimer before I start this talk. I'm not a cybersecurity professional.

00:23.120 --> 00:32.800
This is not a tutorial. This is how we dealt with a real situation. A bit about Continuwuity:

00:32.800 --> 00:49.360
Continuwuity is a Matrix chat server. Is that stuck? Oh no. Okay. There we go. Continuwuity is a Matrix

00:49.360 --> 00:59.760
chat server. Matrix is a distributed chat protocol, if you haven't heard of it. As a project,

00:59.760 --> 01:08.320
we run continuwuity.org, which is where we keep all of our official accounts. Continuwuity is part

01:08.320 --> 01:17.680
of an ecosystem of other servers all built from the same original code base. The currently active

01:17.680 --> 01:27.840
ones right now are Conduit, Grapevine and Tuwunel. Now, just to set the scene for this talk: it's 4

01:27.840 --> 01:38.880
a.m. and I can't sleep. So I open up my computer to download some books for my Kindle and I see this.

01:39.200 --> 01:49.360
This is quite bad. If you can't quite see what's going on there, my project account has just

01:49.360 --> 01:59.600
sent a message: admin query raw iter, user ID, device token. And that command has just printed

01:59.600 --> 02:08.480
out all of the session tokens for my account. There are a few odd things going on with that message,

02:08.560 --> 02:18.960
but we'll get to that later. Now I saw this message 23 seconds after it was sent. That was

02:18.960 --> 02:26.400
incredibly lucky because that meant we had time to react. If I hadn't seen it, things would

02:26.400 --> 02:33.920
probably be a lot worse. But what's the first thing you do when you suspect your account is compromised?

02:34.880 --> 02:41.360
The first thing you do is you alert your team. The second thing you do is panic, right?

02:43.520 --> 02:48.960
And then you go through and you try to limit the damage. You disable the account's admin access.

02:49.520 --> 02:55.120
We suspended the account so it couldn't log in. We reset the password in case that was how they

02:55.520 --> 03:03.520
broke in. We disabled all of the login tokens that had just been leaked. And then we started pulling out

03:03.600 --> 03:09.680
the plugs on all of the servers that we had access to. Unfortunately, the person who manages

03:09.680 --> 03:15.280
the continuwuity.org homeserver was asleep, so that's one plug that we couldn't pull out.

03:17.840 --> 03:26.240
So we're safe for now. But how did the attacker do that? Have we stopped them from doing it again?

03:26.240 --> 03:34.160
It's time to investigate. Identifying the attacker's server is pretty easy. Their account

03:34.160 --> 03:42.400
joined the room that that message was sent in just under an hour earlier. And we go and we

03:42.400 --> 03:49.360
look at the event. And looking from other servers, everything seems to check out. There's nothing

03:49.440 --> 03:56.720
suspicious from that point of view. But I have a hunch. And so I get the contents of the event

03:56.720 --> 04:02.960
from the continuwuity.org homeserver. And we can see there are the auth events; I'll explain what

04:02.960 --> 04:08.560
those are later. And you can see the body of the message where that admin command is.

04:10.400 --> 04:15.360
There's some extra data there that marks it as part of a thread. I think it's a server notice for

04:15.360 --> 04:22.000
some reason. The event ID. And then there's some more data there, like what room it was in,

04:22.000 --> 04:32.400
the sender, and some signatures. But something is missing, I realize. And that is the client

04:32.400 --> 04:39.680
transaction ID. Whenever a Matrix client sends a message, it adds a unique identifier to the message,

04:39.680 --> 04:47.840
so it can recognize the message when it gets it back. But this one doesn't have it. So the event must have come through

04:47.840 --> 04:54.160
a different route. It must have come over the federation API, which means it came from a different server.

04:54.160 --> 05:01.120
But other servers don't have this account, so it has to be forged. Now, in the meantime, whilst

05:01.120 --> 05:06.880
we're investigating this, the attacker is causing as much damage as possible. We've locked

05:06.880 --> 05:14.320
them out of my account, but they're going and exploiting the vulnerability in each of our public rooms,

05:14.320 --> 05:19.680
one by one, to ban every server from the room, so you can't communicate with people.

05:22.000 --> 05:31.040
And we're already feeling the time pressure. It's now 5 a.m.; it's been an hour since we were attacked.

05:31.120 --> 05:39.840
And every member of the team who was awake at that point is following their own little path of

05:39.840 --> 05:46.880
investigation. And I have a hot tip for you. If you are being attacked by a malicious actor,

05:46.880 --> 05:53.680
you don't talk to them. They will mislead you. Nex identified the attacker and started

05:53.680 --> 06:00.160
a conversation with them. And from them, we learned they were using a modified version of our software

06:00.160 --> 06:08.160
to perform the attack. But they also sent Nex down a whole side track about it having been a cryptography

06:08.160 --> 06:15.200
attack with signatures. It wasn't. And that was a waste of a couple of hours. The next real

06:15.200 --> 06:23.920
insight that we had came once we got access to logs from other servers and could see failed attack

06:24.560 --> 06:31.120
attempts. And based on these failed attempts, we could work out that the attacker

06:31.120 --> 06:39.840
required knowledge of the room. They required auth events. Now, auth events are previous messages

06:39.840 --> 06:46.480
in the room that control who has permission to do what. So when the room is first created,

06:46.800 --> 06:53.760
when permissions are set, when someone joins the room. And each event has to reference these

06:54.960 --> 07:00.800
by ID, so that when the message is sent, if a server doesn't know about them, it is able

07:00.800 --> 07:10.400
to go and fetch them. And that just means that the attacker already had to have an

07:10.400 --> 07:19.280
account in the room to perform the attack. So private rooms, we thought, were safe. And the attacker

07:19.280 --> 07:26.960
was handcrafting events. So that explained all of the oddities with the first message. They hadn't

07:26.960 --> 07:35.520
sent it through a normal client; they had manually typed out all the JSON. But it's half five now.

07:36.480 --> 07:43.840
And one of our maintainers has to go to sleep. And we want to call and now it's 10 past six.

07:43.840 --> 07:52.560
And I have to go to sleep. So we call in external help: Olivia from the Grapevine project,

07:52.560 --> 08:03.120
which is one of the other projects based on the same codebase. And we get some more useful

08:03.120 --> 08:11.120
information. The attacker starts probing a server we have set up with debug logging and instrumentation

08:11.120 --> 08:21.600
on it. And we now know the last piece of the vulnerability. The vulnerability triggers when the attacked

08:21.600 --> 08:30.000
server leaves a room. But at this point Nex expires, and it's only Olivia left.

08:30.080 --> 08:37.760
But it's seven in the morning, so people should be waking up soon. Now, before we get to the vulnerability,

08:37.760 --> 08:45.440
how do you leave a room in Matrix? To leave a room, your server has to send a leave event, so

08:45.440 --> 08:53.280
a message saying it's left the room. But to make that event, it requires knowledge of the

08:53.280 --> 09:00.800
state of the room, who's in it, what the occupants are and so on. But your server, when it's

09:00.800 --> 09:09.200
leaving a room, doesn't necessarily always have that information. For example, if you get invited to a

09:09.200 --> 09:15.360
room and reject that invite, you've never joined the room, but you're still leaving it. So

09:16.080 --> 09:24.400
you have to get that information somehow. So the leaving server picks a server that it

09:24.400 --> 09:30.240
knows is in the room properly and has all that information, and asks it to gather that information

09:30.240 --> 09:39.760
for it and make a leave event. And then that server returns the event and the leaving server

09:39.840 --> 09:45.600
signs that and sends it back for that server to distribute to everyone else in the room so they

09:45.600 --> 09:52.560
know you've left. What could go wrong here, I wonder? I think there are a few steps in this thing now.

09:53.440 --> 10:00.640
I mean, let's go through this code point by point first. So for each server we know about,

10:00.640 --> 10:07.280
we send that federation request, and if it's successful, we take that and we go to the next

10:07.360 --> 10:16.160
step and parse that and get the JSON from it. We insert some metadata into that and we sign

10:16.160 --> 10:24.000
that event and then we send it back. Yeah, there's still something missing there. Validation.

10:25.680 --> 10:35.440
The attacking server could return anything in that response, not just a leave event, but a normal

10:35.440 --> 10:47.840
message or a ban or anything like that. So it was time to develop the patch, and Olivia here

10:48.480 --> 10:57.200
worked hard and made the patch, where we basically validate that federation response:

10:58.080 --> 11:09.120
we check the event type, the sender, the affected user, the room, the type of membership that

11:09.120 --> 11:16.560
it is, whether it's a join or a leave, and so on and so on. But there's a missing piece still.
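
NOTE
A minimal sketch of the kind of validation listed in the cue above, not the actual patch.
It assumes the remote server's leave template has already been parsed into a dict.
The names validate_leave_template and InvalidLeaveTemplate are hypothetical; the
field names (type, room_id, sender, state_key, content.membership) follow the Matrix event format.
```python
class InvalidLeaveTemplate(Exception):
    """The template returned by the remote server is not a plain leave event."""
#
def validate_leave_template(event: dict, room_id: str, user_id: str) -> None:
    """Raise InvalidLeaveTemplate unless `event` is a leave for `user_id` in `room_id`."""
    content = event.get("content")
    membership = content.get("membership") if isinstance(content, dict) else None
    # Each entry is (field name, value the remote sent, value we require).
    checks = [
        ("type", event.get("type"), "m.room.member"),
        ("room_id", event.get("room_id"), room_id),
        ("sender", event.get("sender"), user_id),
        ("state_key", event.get("state_key"), user_id),
        ("content.membership", membership, "leave"),
    ]
    for field, got, expected in checks:
        if got != expected:
            # Refuse to sign anything that is not exactly the leave we asked for.
            raise InvalidLeaveTemplate(f"{field}: expected {expected!r}, got {got!r}")
```
The idea is that the leaving server runs a check like this before it signs the template and sends it back.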

11:16.560 --> 11:23.200
How did they trigger that exploit? Because your client doesn't just leave rooms by itself.

11:23.200 --> 11:30.480
You have to manually trigger that and at this point there are all sorts of wild theories going around.

11:30.480 --> 11:37.840
Maybe it was a Discord bridge that got broken into or something like that. But I've woken up and

11:37.840 --> 11:43.040
we now have access to the logs for continuwuity.org. On continuwuity.org, we run a little

11:43.040 --> 11:49.760
bot that does announcements, and this was the bot that was leaving the rooms, and it had a

11:49.760 --> 11:54.880
plug-in on it that manages which rooms it joins. But rather than just ignoring invites,

11:54.880 --> 11:59.680
it would automatically reject any invites. So that was what was triggering the leaves.

12:01.600 --> 12:06.400
So now we have a patch, we know exactly what happened, and it's time to distribute the patch.

12:08.000 --> 12:12.800
Now, like I was saying, we are a part of an ecosystem of similar projects. So three other

12:12.880 --> 12:20.480
projects are vulnerable in exactly the same way. The spec is also missing some validation,

12:20.480 --> 12:29.920
not all of it. And with some projects we have communication channels; with others we don't. And it takes

12:29.920 --> 12:35.680
a few hours to ensure everyone's got patches prepared and everyone can hit the release button

12:35.760 --> 12:48.560
at the same time. We released the security patch. From the attack to the patch release, it was 13

12:48.560 --> 13:02.400
hours and 45 minutes. We still have a couple of minutes. So we can talk about some of the ways

13:02.400 --> 13:08.720
we're securing Continuwuity against similar attacks in the future. We've added a lot of security

13:08.720 --> 13:15.360
features. So we've added the ability to lock an account using an API so that you can immediately

13:15.360 --> 13:21.360
prevent an account from performing any actions. We've added a command to force a logout

13:21.360 --> 13:29.840
of all of a user's sessions so that instead of taking ten minutes to do that, it takes us 30 seconds

13:29.840 --> 13:38.640
next time. We've added configuration-defined admins so that if the attacker gains access to the

13:38.640 --> 13:46.240
server, they can't promote their permissions. And we've added hardening to make sure that commands

13:46.240 --> 13:52.480
don't get executed outside of the places where you'd expect them. We've also been working on Project

13:52.480 --> 13:59.120
Drawbridge: over 100 commits hardening the federation APIs, which we haven't released yet,

13:59.120 --> 14:05.440
but we're going to soon. Any questions? We've got one minute. So one question.

14:05.440 --> 14:33.280
How difficult was it getting CVEs for vulnerabilities? Oh, that is actually shockingly easy,

14:33.760 --> 14:38.320
but where's my mouse gone? I can actually show you that right now.

14:40.880 --> 14:49.600
Change. So, stop mirroring. On my screen right now, we have the GitHub security advisory.

14:49.600 --> 14:56.640
Now, GitHub provides a really handy feature which allows you to manage vulnerability reports coming in,

14:57.600 --> 15:07.120
and it assigns them IDs. To get a CVE, we just fill out the security advisory and hit a button

15:07.120 --> 15:17.040
to request a CVE number, and it takes maybe a week and they come back to you with a CVE number.

15:17.120 --> 15:27.440
And now, we can just click publish. CVE number.

15:34.080 --> 15:35.840
Great. Any other questions?

15:35.840 --> 15:47.680
Perfect. All right, then. Okay, I was on. Thank you.

