WEBVTT

00:00.000 --> 00:15.280
So please welcome devgianlu. He's a software engineer, and it's his first time at

00:15.280 --> 00:20.960
FOSDEM, so it's a big occasion. It's not his first time reverse engineering, but it's the first

00:20.960 --> 00:33.960
time doing something that big, so a big round of applause for him.

00:33.960 --> 00:35.960
Thank you very much.

00:35.960 --> 00:41.560
So today we will be talking about reverse engineering the world's largest music streaming

00:41.560 --> 00:49.120
platform, which you may guess is Spotify, so I want to ask if there's anyone from Spotify

00:49.120 --> 00:53.120
in the room.

00:53.120 --> 01:04.120
Okay, actually no one, so I shouldn't get arrested for doing this.

01:04.120 --> 01:10.720
So, a bit about who I am: I'm a software developer and a big contributor to open

01:10.720 --> 01:17.720
source, to other people's projects and also to my own projects.

01:17.720 --> 01:22.720
I enjoy reverse engineering and a bit of software security.

01:22.720 --> 01:30.720
For that reason, I'm also interested in IoT stuff. I have a couple of CVEs to my name, and I have fun with

01:30.720 --> 01:33.720
other stuff that isn't necessarily public.

01:33.720 --> 01:40.720
I'm also a CTF player, so for those that don't know, CTFs are capture the flag competitions,

01:40.720 --> 01:42.720
which are cybersecurity related.

01:42.720 --> 01:49.720
I am a former TeamItaly member, which is the team that competes in the European Cybersecurity Challenge.

01:49.720 --> 01:57.720
With my old team, I was an organizer for ECSC 2024, which is the European Cybersecurity Challenge,

01:57.720 --> 02:04.720
and also a platform provider for ICC, which is the International Cybersecurity Challenge.

02:04.720 --> 02:09.720
And I've also been a DEF CON finalist with mhackeroni.

02:09.720 --> 02:13.720
Before we start, there's a big disclaimer.

02:13.720 --> 02:20.720
This talk presents only my opinion, not that of my current, previous, or future employers,

02:20.720 --> 02:25.720
and not that of my friends, my colleagues or people I met online.

02:25.720 --> 02:33.720
For the same reason, this talk includes only things that I learned and discovered personally,

02:34.720 --> 02:37.720
and no work from other people.

02:37.720 --> 02:42.720
The work presented in these slides is meant to enhance Spotify, not undermine it.

02:42.720 --> 02:49.720
We don't do piracy, and I have nothing to do with Anna's Archive.

02:49.720 --> 02:55.720
This talk was decided well before that happened.

02:55.720 --> 03:00.720
And to Spotify: please don't mind me, please don't sue me.

03:00.720 --> 03:02.720
Let's have a chat first.

03:02.720 --> 03:05.720
So now we begin.

03:05.720 --> 03:10.720
We will talk about Spotify, but we will also talk about librespot,

03:10.720 --> 03:17.720
which is the actual open-source project that I and other people work on.

03:17.720 --> 03:22.720
When I talk about librespot, I'm actually talking about many projects,

03:22.720 --> 03:26.720
because there are many projects in different languages,

03:26.720 --> 03:31.720
but what all of them share is that they are client libraries for Spotify,

03:31.720 --> 03:34.720
capable of playback through various backends.

03:34.720 --> 03:40.720
It is completely headless, so it does not require a desktop environment,

03:40.720 --> 03:43.720
and it's entirely based on reverse engineering.

03:43.720 --> 03:47.720
It is fully capable as a Spotify Connect endpoint,

03:47.720 --> 03:51.720
and it's an alternative to the old, deprecated libspotify,

03:51.720 --> 03:55.720
which I never even got to know.

03:55.720 --> 04:00.720
This project is not a downloader; don't use it for that, please.

04:00.720 --> 04:04.720
It is not a way to bypass ads; we do not support free accounts,

04:04.720 --> 04:09.720
it's not a wrapper around public APIs, it's not a control library,

04:09.720 --> 04:13.720
and it's not an alternative GUI or TUI client.

04:13.720 --> 04:17.720
So why does librespot exist?

04:17.720 --> 04:22.720
You can find your own reasons, but these are some of the most common.

04:22.720 --> 04:27.720
You can turn any device into a Spotify Connect endpoint.

04:27.720 --> 04:30.720
It is headless, so it does not require a desktop environment,

04:30.720 --> 04:33.720
and it has very low resource usage;

04:33.720 --> 04:35.720
many people use it for that reason.

04:35.720 --> 04:38.720
If you're really into open source,

04:38.720 --> 04:41.720
you may want to know exactly the code that you're running,

04:41.720 --> 04:45.720
and all the librespot projects allow you to do that.

04:45.720 --> 04:48.720
And also you may be an audio nerd,

04:48.720 --> 04:51.720
so you may want to build your own audio pipeline,

04:51.720 --> 04:55.720
your multi-room setup, your DIY DAC

04:55.720 --> 04:59.720
and streamer setup, all those kind of things.

04:59.720 --> 05:03.720
So, as I was saying, librespot is a family,

05:03.720 --> 05:07.720
because there are many projects, differing

05:07.720 --> 05:10.720
mainly in language and in what they support.

05:10.720 --> 05:13.720
We have a Python version and a C version,

05:13.720 --> 05:16.720
which runs on ESP32, for example,

05:16.720 --> 05:20.720
then we have what I'm currently working on,

05:20.720 --> 05:23.720
which is go-librespot, written in Go, obviously.

05:23.720 --> 05:26.720
There's also an older version in Go,

05:26.720 --> 05:31.720
which is not developed anymore.

05:31.720 --> 05:33.720
Then there's another project of mine,

05:33.720 --> 05:37.720
which was the original librespot... well, not the original,

05:37.720 --> 05:40.720
but my original project in Java.

05:40.720 --> 05:43.720
And then we have the original librespot,

05:43.720 --> 05:48.720
which is in Rust, and is the longest standing of them all.

05:48.720 --> 05:52.720
The Java version is actually deprecated,

05:52.720 --> 05:55.720
because, yeah.

05:55.720 --> 06:01.720
So we'll have a look into the Spotify infrastructure,

06:01.720 --> 06:06.720
a little bit, or at least what we need to get,

06:06.720 --> 06:09.720
a Spotify Connect endpoint working.

06:09.720 --> 06:13.720
This is the state of things after 2019,

06:13.720 --> 06:16.720
because there was a major change in the infrastructure.

06:16.720 --> 06:19.720
Also, this is what we

06:19.720 --> 06:21.720
discovered about the infrastructure

06:21.720 --> 06:25.720
through reverse engineering.

06:25.720 --> 06:27.720
If there was someone from Spotify,

06:27.720 --> 06:31.720
they could probably tell me I'm wrong.

06:31.720 --> 06:35.720
So, piece by piece: we have AP resolve,

06:35.720 --> 06:38.720
which is a service that returns a list

06:38.720 --> 06:41.720
of endpoints for other services.
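
NOTE
A minimal sketch of consuming such a response. It assumes a JSON object that maps each service type (such as accesspoint, dealer and spclient) to a list of "host:port" strings. That shape is our assumption, the sample hosts are made up, and no request is sent.
```python
from __future__ import annotations
import json
def pick_endpoints(body: str) -> dict[str, tuple[str, int]]:
    """Return the first advertised (host, port) for each service type in the response."""
    endpoints = {}
    for service, entries in json.loads(body).items():
        if not entries:
            continue  # nothing advertised for this service
        host, _, port = entries[0].rpartition(":")
        endpoints[service] = (host, int(port))
    return endpoints
# Made-up sample response; real host names and ports will differ.
sample = '{"accesspoint": ["ap.example.invalid:4070"], "dealer": ["dealer.example.invalid:443"], "spclient": ["spclient.example.invalid:443"]}'
print(pick_endpoints(sample))
```
Taking the first entry is a simplification. A client would normally fall back to the next entry if a connection fails.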

06:41.720 --> 06:44.720
Then we have the access point,

06:44.720 --> 06:47.720
which was the thing

06:47.720 --> 06:50.720
that was heavily used up until 2019.

06:50.720 --> 06:52.720
It's a custom protocol,

06:52.720 --> 06:54.720
over a TCP connection,

06:54.720 --> 06:56.720
that does a Diffie-Hellman key exchange,

06:56.720 --> 06:59.720
and then encrypts using some strange cipher,

06:59.720 --> 07:01.720
called Shannon,

07:01.720 --> 07:03.720
and transports simple data packets,

07:03.720 --> 07:05.720
which are composed of just a packet type,

07:05.720 --> 07:07.720
packet length and packet data.
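
NOTE
A minimal sketch of that plaintext framing. It assumes a one-byte packet type followed by a 16-bit big-endian payload length, which is our reading of librespot's codec. The Shannon encryption and the authentication tag applied on the wire are left out.
```python
from __future__ import annotations
import struct
def frame_packet(packet_type: int, payload: bytes) -> bytes:
    """Build a plaintext access point packet: type (1 byte), length (2 bytes, big-endian), data."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length field")
    return struct.pack(">BH", packet_type, len(payload)) + payload
def parse_packet(frame: bytes) -> tuple[int, bytes]:
    """Split a plaintext frame back into (packet_type, payload)."""
    packet_type, length = struct.unpack_from(">BH", frame)
    payload = frame[3:3 + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return packet_type, payload
frame = frame_packet(0xAB, b"login-request-bytes")
print(frame[:3].hex(), parse_packet(frame))
```
The fixed three-byte header is what makes these packets easy to recognize later in a memory dump.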

07:07.720 --> 07:09.720
From this connection,

07:09.720 --> 07:11.720
we get stored credentials,

07:11.720 --> 07:13.720
so we log in with the access point,

07:13.720 --> 07:16.720
and we get a set of stored credentials,

07:16.720 --> 07:20.720
so we can use them to authenticate other sessions,

07:20.720 --> 07:26.720
for example when you close and reopen the client.

07:26.720 --> 07:30.720
Then we have the Login5 service,

07:30.720 --> 07:34.720
which is a new authentication service

07:34.720 --> 07:38.720
that is used for the newer services in the infrastructure,

07:38.720 --> 07:41.720
and we have to authenticate to it using the credentials

07:41.720 --> 07:43.720
from the access point,

07:43.720 --> 07:46.720
and the most important thing that we still need

07:46.720 --> 07:50.720
the access point for is retrieving the encryption keys

07:50.720 --> 07:52.720
for the audio files,

07:52.720 --> 07:56.720
because Spotify serves only encrypted audio files,

07:56.720 --> 07:59.720
and you will need the key to decrypt them.

07:59.720 --> 08:02.720
On to the next piece, this is new stuff.

08:02.720 --> 08:04.720
As I said, Login5 provides authentication

08:04.720 --> 08:06.720
for other services.

08:06.720 --> 08:11.720
It provides a bearer token to authenticate to spclient,

08:11.720 --> 08:14.720
which is just a bunch of REST APIs.

08:14.720 --> 08:18.720
The most interesting ones are those related to

08:18.720 --> 08:20.720
the Connect state,

08:20.720 --> 08:22.720
which makes up the Connect cluster.

08:22.720 --> 08:25.720
So every Spotify Connect endpoint

08:25.720 --> 08:28.720
publishes its state to the server,

08:28.720 --> 08:30.720
which builds the Connect cluster

08:30.720 --> 08:34.720
and orchestrates all the endpoints.

08:34.720 --> 08:37.720
This part of the API

08:37.720 --> 08:40.720
is just a REST API,

08:40.720 --> 08:44.720
and Spotify uses Protobuf for all the communication

08:44.720 --> 08:46.720
on that API.
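
NOTE
A minimal sketch of how such a call could be assembled. It assumes a bearer token already obtained from Login5 and a Protobuf message already serialized to bytes. The host, the path, the PUT method and the content type are hypothetical placeholders. The request is only built, never sent.
```python
import urllib.request
def build_spclient_request(base_url: str, path: str, token: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a request carrying a serialized Protobuf message.
    The endpoint layout is hypothetical; only the bearer-token pattern is the point."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-protobuf",  # assumed content type
        },
    )
req = build_spclient_request("https://spclient.example.invalid", "/hypothetical/connect-state/device-1", "TOKEN", b"\x0a\x03abc")
print(req.get_method(), req.full_url, req.get_header("Authorization"))
```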

08:46.720 --> 08:51.720
Lastly, we have the event part of things,

08:51.720 --> 08:56.720
the WebSocket API, which is called the dealer;

08:56.720 --> 08:58.720
it is authenticated with Login5,

08:58.720 --> 09:02.720
and it uses Protobuf messages wrapped in JSON,

09:02.720 --> 09:04.720
don't ask me why.

09:05.720 --> 09:09.720
And it publishes all the events required

09:09.720 --> 09:13.720
for the client to work.

09:13.720 --> 09:17.720
It also contains a special token,

09:17.720 --> 09:21.720
which is used to synchronize the dealer and spclient,

09:21.720 --> 09:26.720
so that they work on the same stuff.
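
NOTE
A minimal sketch of unwrapping such a message. It assumes a JSON envelope with a "uri" field and a "payloads" list of base64-encoded Protobuf bytes. These field names are illustrative assumptions, and the bytes still need the right message definition to be decoded.
```python
from __future__ import annotations
import base64
import json
def unwrap_dealer_message(raw: str) -> tuple[str, list[bytes]]:
    """Return the message URI and the raw Protobuf payloads carried in a JSON envelope."""
    envelope = json.loads(raw)
    payloads = [base64.b64decode(p) for p in envelope.get("payloads", [])]
    return envelope.get("uri", ""), payloads
raw = json.dumps({"type": "message", "uri": "hm://example/topic", "payloads": [base64.b64encode(b"\x08\x01").decode()]})
print(unwrap_dealer_message(raw))
```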

09:26.720 --> 09:28.720
Now, we're going to have a look

09:28.720 --> 09:30.720
at some of the technical challenges,

09:30.720 --> 09:33.720
involved in reverse engineering Spotify.

09:33.720 --> 09:35.720
We'll start with the easiest one,

09:35.720 --> 09:38.720
which is intercepting HTTPS traffic.

09:38.720 --> 09:39.720
Traffic is encrypted,

09:39.720 --> 09:43.720
so there's no real way to intercept it passively.

09:43.720 --> 09:46.720
We will need to do a man-in-the-middle.

09:46.720 --> 09:50.720
Luckily, Spotify doesn't do certificate pinning,

09:50.720 --> 09:52.720
so it's even easier.

09:52.720 --> 09:56.720
We can just pull up something like mitmproxy,

09:56.720 --> 09:59.720
get its certificate authority,

09:59.720 --> 10:02.720
install it as a trusted certificate authority

10:03.720 --> 10:05.720
for the system and also for Chrome,

10:05.720 --> 10:09.720
because Spotify uses the Chromium Embedded Framework,

10:09.720 --> 10:12.720
which reads the CA from Chrome.

10:12.720 --> 10:15.720
We set the proxy URL in the desktop client,

10:15.720 --> 10:20.720
and voila, we can see all the traffic unencrypted.
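
NOTE
Once mitmproxy sits in the middle, a small addon can flag the Protobuf bodies for later decoding. This sketch assumes mitmproxy's addon convention: a module-level addons list whose objects define a response(flow) hook. It also assumes that Protobuf responses carry a content type containing "protobuf". The spotify.com host filter is illustrative.
```python
class LogProtobufResponses:
    """mitmproxy addon that records Spotify responses which look like Protobuf."""
    def __init__(self):
        self.seen = []  # (url, body length) pairs collected so far
    def response(self, flow):
        content_type = flow.response.headers.get("content-type", "")
        if "spotify.com" in flow.request.pretty_host and "protobuf" in content_type:
            body = flow.response.content or b""
            self.seen.append((flow.request.pretty_url, len(body)))
            print(f"[protobuf] {flow.request.pretty_url} ({len(body)} bytes)")
addons = [LogProtobufResponses()]
```
It could be loaded with mitmdump -s log_protobuf.py, where the file name is arbitrary.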

10:20.720 --> 10:25.720
You can see that the client talks to all the services

10:25.720 --> 10:27.720
I was telling you about,

10:27.720 --> 10:30.720
so first it goes to AP resolve,

10:30.720 --> 10:31.720
then it does

10:31.720 --> 10:33.720
Login5 authentication,

10:33.720 --> 10:35.720
it connects to the access point,

10:35.720 --> 10:38.720
it pushes its state to the server,

10:38.720 --> 10:42.720
and then connects to the dealer.

10:42.720 --> 10:45.720
We can see the access point connection here,

10:45.720 --> 10:48.720
but it's not actually decrypted.

10:48.720 --> 10:49.720
If we look into it,

10:49.720 --> 10:51.720
it would still be encrypted,

10:51.720 --> 10:54.720
and we'll solve this problem later.

10:54.720 --> 10:56.720
This was easy.

10:56.720 --> 10:59.720
Now we get onto the interesting stuff.

11:00.720 --> 11:06.720
So we are going to recover C++ Protobuf classes in Ghidra.

11:06.720 --> 11:08.720
What does that mean?

11:08.720 --> 11:10.720
C++.

11:10.720 --> 11:13.720
We all know that. But Protobuf and Ghidra,

11:13.720 --> 11:15.720
what are those?

11:15.720 --> 11:20.720
Protobuf is a mechanism for serializing structured data.

11:20.720 --> 11:23.720
It's just like XML or JSON,

11:23.720 --> 11:25.720
but way smaller.

11:25.720 --> 11:27.720
It's maintained by Google.

11:27.720 --> 11:29.720
It has a bunch of other features.

11:29.720 --> 11:34.720
But the interesting part about it is that it's very small.

11:34.720 --> 11:38.720
And for that reason, it requires code generation.

11:38.720 --> 11:42.720
So you cannot simply decode it like you would with JSON,

11:42.720 --> 11:47.720
because there's something missing in the serialized format

11:47.720 --> 11:51.720
that you have in the generated code for your messages.

11:51.720 --> 11:54.720
Spotify has been using it for a while,

11:54.720 --> 11:57.720
at least since librespot was born.

11:57.720 --> 11:59.720
So how does Protobuf work?

11:59.720 --> 12:03.720
As I said, the wire format, the serialization is entirely binary.

12:03.720 --> 12:06.720
As you can see, in the example,

12:06.720 --> 12:08.720
if we have a message called user,

12:08.720 --> 12:10.720
which has three fields, name, favorite number,

12:10.720 --> 12:13.720
and hobbies, which are respectively one, two, and three

12:13.720 --> 12:16.720
as field numbers. On the right,

12:16.720 --> 12:20.720
you see the serialized version of that message.

12:20.720 --> 12:24.720
And you see there's never any mention of the field names.

12:24.720 --> 12:30.720
So the names name, favorite number, and hobbies never appear.

12:30.720 --> 12:35.720
You only see the field number, which is called the field tag on the slide.

12:35.720 --> 12:38.720
That allows it to be small.
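
NOTE
To make this concrete, here is a minimal hand-rolled encoder for the example message. It assumes the field numbers from the slide (1 = name, 2 = favorite number, 3 = hobbies) and uses sample values of our own. Each field is written as a varint tag, (field_number << 3) | wire_type, followed by its value, so no field name ever reaches the output.
```python
from __future__ import annotations
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)
def encode_field(field_number: int, wire_type: int, payload: bytes) -> bytes:
    return encode_varint((field_number << 3) | wire_type) + payload
def encode_user(name: str, favorite_number: int, hobbies: list[str]) -> bytes:
    data = name.encode()
    out = encode_field(1, 2, encode_varint(len(data)) + data)  # wire type 2: length-delimited
    out += encode_field(2, 0, encode_varint(favorite_number))  # wire type 0: varint
    for hobby in hobbies:  # repeated field: one tagged entry per element
        data = hobby.encode()
        out += encode_field(3, 2, encode_varint(len(data)) + data)
    return out
print(encode_user("Martin", 1337, ["hacking"]).hex(" "))
```
The output starts with 0a 06, meaning field 1, length-delimited, 6 bytes. The raw bytes of the name follow.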

12:38.720 --> 12:42.720
But the problem is that you cannot recover the names

12:42.720 --> 12:45.720
unless you have the message type definition.

12:45.720 --> 12:48.720
Spotify uses a lot of Protobuf.

12:48.720 --> 12:52.720
So we may want to recover those.
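
NOTE
Without the definition, the best a decoder can do is walk the wire format and report field numbers, wire types and raw values. This minimal sketch handles varint, length-delimited and fixed-width fields only, and it rejects the deprecated group wire types.
```python
from __future__ import annotations
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
def decode_raw(buf: bytes) -> list[tuple[int, int, object]]:
    """Return (field_number, wire_type, value) triples; field names are unrecoverable."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type in (1, 5):  # fixed64 / fixed32, kept as raw bytes
            size = 8 if wire_type == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
print(decode_raw(bytes.fromhex("0a064d617274696e10b90a")))
```
A length-delimited value could be a string, raw bytes, or a nested message. Only the definition says which.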

12:52.720 --> 12:56.720
Luckily, there's a special message called FileDescriptorProto,

12:56.720 --> 12:59.720
which describes a .proto file.

12:59.720 --> 13:03.720
And that message is serialized,

13:03.720 --> 13:10.720
and included in the C++ source code that is generated for your messages.

13:10.720 --> 13:13.720
So you can use a tool like pbtk, the Protobuf toolkit.

13:13.720 --> 13:16.720
We run it against the desktop client binary,

13:16.720 --> 13:21.720
and we get 900 files and 2400 messages, which is quite a lot.

13:21.720 --> 13:26.720
And it's also quite fun, because sometimes you see stuff that hasn't been published yet,

13:26.720 --> 13:28.720
because apparently they ship

13:28.720 --> 13:34.720
non-production Protobuf definitions inside the binary.
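
NOTE
The core idea behind that kind of extraction fits in a few lines. A serialized FileDescriptorProto normally starts with field 1, the file name, encoded as tag 0x0a, then a length, then a name ending in ".proto". So the binary can be scanned for that pattern. This is a simplified heuristic of our own, not pbtk's actual code, and it only handles names shorter than 128 bytes.
```python
from __future__ import annotations
import re
# Tag 0x0a (field 1, length-delimited), a one-byte length, then a printable name ending in ".proto".
CANDIDATE = re.compile(rb"\x0a([\x01-\x7f])([\x20-\x7e]+?\.proto)")
def find_descriptor_candidates(blob: bytes) -> list[tuple[int, str]]:
    """Return (offset, file name) for byte runs that look like the start of a FileDescriptorProto."""
    hits = []
    for match in CANDIDATE.finditer(blob):
        length = match.group(1)[0]
        name = match.group(2)
        if length == len(name):  # the length prefix must cover exactly the name
            hits.append((match.start(), name.decode("ascii")))
    return hits
blob = b"\x00junk\x0a\x19google/protobuf/any.proto\x12\x0fgoogle.protobuf"
print(find_descriptor_candidates(blob))
```
A real extractor then has to find where each descriptor ends and hand the bytes to a Protobuf runtime to rebuild the .proto file.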

13:34.720 --> 13:36.720
This is what it looks like.

13:36.720 --> 13:40.720
On the top left, you see the FileDescriptorProto message;

13:40.720 --> 13:43.720
I've removed the stuff that we are not interested in.

13:43.720 --> 13:46.720
In the bottom left, you see an example message I've taken

13:46.720 --> 13:50.720
and will use for other examples, which is the Any message,

13:50.720 --> 13:58.720
which contains only two fields: the type URL and the value.

13:58.720 --> 14:03.720
On the right, you see what the serialized FileDescriptorProto

14:03.720 --> 14:07.720
for the Any .proto file looks like.

14:07.720 --> 14:11.720
And you can see there are many similarities,

14:11.720 --> 14:16.720
and you can see how you could reconstruct the original .proto file

14:16.720 --> 14:19.720
from what you see on the right.

14:19.720 --> 14:26.720
And that is what the Protobuf toolkit I showed you does.

14:26.720 --> 14:30.720
The next piece of tooling we will need is Ghidra;

14:30.720 --> 14:32.720
many of you may know it.

14:32.720 --> 14:35.720
It's a reverse engineering framework created

14:35.720 --> 14:39.720
and maintained by the NSA Research Directorate.

14:39.720 --> 14:43.720
We will need it mainly for its decompilation and scripting,

14:43.720 --> 14:46.720
but it does a bunch of other things.

14:46.720 --> 14:51.720
It helps us understand what is going on inside the Spotify

14:51.720 --> 14:54.720
desktop client, which is written in C++,

14:54.720 --> 15:00.720
and try to transform it into actual readable code for us humans.

15:00.720 --> 15:04.720
But if we do that, we get 6 million lines of code.

15:04.720 --> 15:09.720
Of course, the binary does not have debug symbols,

15:09.720 --> 15:14.720
so we get no variable names, no function names, anything like that.

15:14.720 --> 15:19.720
So it's simply 6 million lines of meaningless code.

15:19.720 --> 15:23.720
So how do we find what we are interested in?

15:23.720 --> 15:25.720
There are multiple ways.

15:25.720 --> 15:27.720
You can look for strings.

15:27.720 --> 15:31.720
You can do some fancy code flow analysis.

15:31.720 --> 15:35.720
What I did, and what has turned out to be useful for me,

15:35.720 --> 15:41.720
is figuring out where the classes generated

15:41.720 --> 15:45.720
by the Protobuf compiler are being used in the compiled code.

15:45.720 --> 15:50.720
For that, we will make use of the fact that those classes

15:50.720 --> 15:55.720
are C++ classes which extend the virtual

15:55.720 --> 15:59.720
google::protobuf::Message class.

15:59.720 --> 16:02.720
It's very important that class is virtual,

16:02.720 --> 16:04.720
because virtual classes have virtual tables,

16:04.720 --> 16:09.720
which are basically tables of addresses inside the binary,

16:09.720 --> 16:14.720
and we can use those to trace back to the constructors

16:14.720 --> 16:17.720
and destructors of said classes.

16:17.720 --> 16:19.720
To do that,

16:19.720 --> 16:22.720
we need to do it deterministically and automatically,

16:22.720 --> 16:26.720
because there are a lot of messages inside Spotify,

16:26.720 --> 16:29.720
so we can't do it by hand.

16:29.720 --> 16:31.720
So we'll have a look at an example

16:31.720 --> 16:35.720
of how we can, inside the generated C++ code,

16:35.720 --> 16:40.720
trace back the FileDescriptorProto message,

16:40.720 --> 16:42.720
which I showed you earlier,

16:42.720 --> 16:46.720
and we know that works because other tools use it.

16:46.720 --> 16:51.720
how we can trace this message back to the C++ class.

16:52.720 --> 16:55.720
To do that, we look in generated code.

16:55.720 --> 16:59.720
We see that this message is referenced inside

16:59.720 --> 17:03.720
an internal structure called the descriptor table.

17:03.720 --> 17:08.720
Then we see that this is referenced in a strange class,

17:08.720 --> 17:10.720
which I don't really know what it does,

17:10.720 --> 17:13.720
but it surely has something to do with compile time.

17:13.720 --> 17:16.720
Luckily for us, this function is eventually

17:16.720 --> 17:21.720
used inside the actual C++ class

17:21.720 --> 17:24.720
we are interested in, which is, for example,

17:24.720 --> 17:26.720
let's say, Any.

17:26.720 --> 17:30.720
This GetMetadata method is also virtual.

17:30.720 --> 17:34.720
And as I said, virtual methods

17:34.720 --> 17:37.720
end up in the vtable.

17:37.720 --> 17:43.720
So we can go from the FileDescriptorProto back

17:43.720 --> 17:46.720
to the vtable of that class,

17:46.720 --> 17:50.720
and from there we can get the constructors and destructors.
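
NOTE
The primitive behind that walk is "find every slot in the data that points at this address". This is a minimal sketch over raw bytes. It assumes a 64-bit little-endian binary, 8-byte-aligned vtable slots, and a known address for one GetMetadata implementation. It also assumes the pointers are present in the bytes as loaded (for example Ghidra's relocated view or a memory dump), which is not always true of the raw file of a position-independent binary. The addresses in the example are made up.
```python
from __future__ import annotations
import struct
def find_pointer_slots(section: bytes, section_addr: int, target: int) -> list[int]:
    """Return virtual addresses of 8-byte-aligned slots in `section` that hold `target`."""
    needle = struct.pack("<Q", target)
    hits = []
    start = (-section_addr) % 8  # first offset whose virtual address is 8-byte aligned
    for offset in range(start, len(section) - 7, 8):
        if section[offset:offset + 8] == needle:
            hits.append(section_addr + offset)
    return hits
# Toy section contents: a fake vtable whose third slot holds the GetMetadata pointer.
get_metadata = 0x1234560
fake_vtable = struct.pack("<4Q", 0x1000, 0x2000, get_metadata, 0x3000)
print([hex(a) for a in find_pointer_slots(fake_vtable, 0x500000, get_metadata)])
```
From a hit, one would step back to the start of the vtable. Then one would look for code that writes that vtable address into an object, which is what constructors and destructors do.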

17:50.720 --> 17:53.720
Can we automate this? Of course.

17:53.720 --> 17:55.720
All the code for the

17:55.720 --> 17:59.720
Ghidra script is available on my GitHub.

17:59.720 --> 18:02.720
It's quite old, but it still works against

18:02.720 --> 18:04.720
Spotify, I guess.

18:04.720 --> 18:09.720
It will not work with the latest version of the Protobuf generator,

18:09.720 --> 18:12.720
because they changed the generated code quite a bit.

18:13.720 --> 18:15.720
The end result is this.

18:15.720 --> 18:17.720
On the left, you can see that the script

18:17.720 --> 18:22.720
has recovered where the vtables for some messages are.

18:22.720 --> 18:25.720
I've filtered it just for one.

18:25.720 --> 18:28.720
You can see it has recognized where the vtables are.

18:28.720 --> 18:32.720
It has renamed the structures.

18:32.720 --> 18:37.720
And on the right, you see what the internal

18:37.720 --> 18:41.720
Protobuf descriptor table structure looks like.

18:41.720 --> 18:44.720
You can see it has, for example, the file name,

18:44.720 --> 18:49.720
and the pointer to the descriptor.

18:49.720 --> 18:51.720
So we're done with that.

18:51.720 --> 18:53.720
We did that.

18:53.720 --> 18:55.720
Now onto the last part.

18:55.720 --> 18:59.720
As I mentioned, we have the HTTPS traffic.

18:59.720 --> 19:05.720
We know what classes are being used to generate that HTTPS traffic,

19:05.720 --> 19:09.720
because the API mainly uses Protobuf.

19:09.720 --> 19:14.720
If we see an API call, we can look into Ghidra,

19:14.720 --> 19:18.720
find where the message for that call is being created,

19:18.720 --> 19:23.720
and look what the code does and try to figure out

19:23.720 --> 19:27.720
how to do the same in our code.

19:27.720 --> 19:32.720
The last piece we're missing is the access point I mentioned

19:32.720 --> 19:33.720
in the beginning.

19:33.720 --> 19:38.720
We still cannot see what traffic happens there.

19:38.720 --> 19:41.720
And there might be some interesting stuff.

19:41.720 --> 19:47.720
So now we're going to see how we can log the traffic.

19:47.720 --> 19:52.720
And for that, we will need some more tools.

19:52.720 --> 19:54.720
One of those is GDB.

19:54.720 --> 19:59.720
You may know GDB if you've written any C or C++ code.

19:59.720 --> 20:05.720
It's the standard tool for dynamic analysis

20:05.720 --> 20:08.720
and bug finding in C and C++ programs;

20:08.720 --> 20:11.720
so far, we have done only static analysis.

20:11.720 --> 20:17.720
But you may be used to using GDB with debug symbols,

20:17.720 --> 20:22.720
because you may use it on code you have written yourself.

20:22.720 --> 20:28.720
Here we don't have that, because Spotify doesn't ship debug symbols,

20:28.720 --> 20:29.720
obviously.

20:29.720 --> 20:33.720
So we will use an extension to GDB called pwndbg,

20:34.720 --> 20:38.720
which facilitates a lot of the reverse engineering process.

20:38.720 --> 20:43.720
You can see the staggering difference between running GDB

20:43.720 --> 20:51.720
on a hello world program without and with pwndbg.

20:51.720 --> 20:54.720
There's a lot more info, and also there's a lot more commands.

20:54.720 --> 20:58.720
that you can use to move faster.

20:58.720 --> 21:01.720
Another tool that we'll use is Frida.

21:02.720 --> 21:05.720
Frida is a dynamic code instrumentation toolkit.

21:05.720 --> 21:10.720
It's really cool because it works on basically any platform.

21:10.720 --> 21:20.720
And it allows you to write some JavaScript code to inject inside the process you want to look into.

21:20.720 --> 21:28.720
So what we will typically do is use GDB to manually verify what you are looking at

21:28.720 --> 21:30.720
and whether your assumptions are correct.

21:30.720 --> 21:34.720
And then switch to something that is more automated.

21:34.720 --> 21:39.720
You can also automate GDB, but we will do it with Frida.

21:39.720 --> 21:47.720
So you write some JavaScript code, and for our use case you basically get a script that logs all the traffic.

21:47.720 --> 21:51.720
The last piece: what is the cipher

21:51.720 --> 21:53.720
I was talking about in the beginning.

21:53.720 --> 21:56.720
It's called the Shannon cipher.

21:56.720 --> 21:59.720
It's part of the SOBER family.

21:59.720 --> 22:04.720
It was developed by Qualcomm Australia in 1997.

22:04.720 --> 22:08.720
The original implementation is available only through the Wayback Machine.

22:08.720 --> 22:12.720
I have no idea why Spotify used it in the first place.

22:12.720 --> 22:22.720
The only advantage is that it does encryption and message authentication simultaneously, which is, I guess, handy.

22:22.720 --> 22:25.720
The reference implementation is very simple.

22:25.720 --> 22:35.720
It has a method to set the key, a method to set the nonce, a method to encrypt, one to decrypt, and one to finish the encryption or decryption process,

22:35.720 --> 22:43.720
and generate the message authentication code for what we have encrypted so far.

22:44.720 --> 22:53.720
How do we find those functions inside Ghidra? We want to log what goes through the encryption and decryption functions.

22:53.720 --> 22:58.720
Well, one very useful trick is to look for constants.

22:58.720 --> 23:05.720
For example, this constant is used in the original source code for the Shannon cipher.

23:05.720 --> 23:09.720
We look for the same constant inside the Spotify binary.

23:09.720 --> 23:18.720
And we find it used just like in the original source code.

23:18.720 --> 23:23.720
So by looking at where it is used, with some trial and error,

23:23.720 --> 23:29.720
we can figure out where the functions are defined in Ghidra's decompiled code.
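
NOTE
The constant search itself is easy to reproduce outside Ghidra. This sketch looks for a 32-bit value stored little-endian, as it appears in an x86 immediate operand or a data table. The value 0x6996C53A is INITKONST from the Shannon reference code, to the best of our knowledge. Treat it as an assumption and check it against the reference source.
```python
from __future__ import annotations
import struct
INITKONST = 0x6996C53A  # assumption: key-loading constant from the Shannon reference implementation
def find_constant(blob: bytes, value: int) -> list[int]:
    """Return every offset where `value` appears as a little-endian 32-bit integer."""
    needle = struct.pack("<I", value)
    hits, pos = [], blob.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = blob.find(needle, pos + 1)
    return hits
# Toy code bytes: "mov dword ptr [rdi+0x40], imm32" (c7 47 40) followed by the little-endian immediate.
code = bytes.fromhex("c74740") + struct.pack("<I", INITKONST)
print(find_constant(code, INITKONST))
```
Each hit is only a lead. The surrounding function still has to be compared by hand with the reference implementation.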

23:29.720 --> 23:36.720
At that point, we can verify our assumption with GDB.

23:36.720 --> 23:40.720
So far, we've done only static analysis to find the function.

23:40.720 --> 23:44.720
Now we do the dynamic analysis.

23:44.720 --> 23:52.720
So for example, if we look at the Shannon encrypt function, we can set a breakpoint where we think the function is.

23:52.720 --> 23:54.720
It has just three parameters.

23:54.720 --> 23:57.720
We are not interested in the first one.

23:57.720 --> 24:04.720
But if we look at the second one, which is the data buffer, which should contain the data that is being encrypted,

24:04.720 --> 24:12.720
that is in the RSI register. The content of RSI looks like a pointer to a byte buffer.

24:12.720 --> 24:15.720
So we're probably in the right place.

24:15.720 --> 24:19.720
We look at the other parameter, which is in EDX.

24:19.720 --> 24:21.720
It contains a reasonable number.

24:21.720 --> 24:31.720
And if we print the content of RSI for 398 bytes, which is the value in EDX,

24:31.720 --> 24:35.720
we get something that looks like an access point packet.

24:35.720 --> 24:39.720
So we have the packet type, which is 0xAB.

24:39.720 --> 24:41.720
We have the packet length and then the data.

24:41.720 --> 24:43.720
The packet length is exactly 395.

24:43.720 --> 24:47.720
So it's 398 minus the three initial bytes.

24:47.720 --> 24:51.720
So we are definitely in the right place.
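
NOTE
The sanity check done by hand in GDB can be written down as a small helper. The check is a type byte, then a length equal to EDX minus the three header bytes. A helper is useful when the same check has to be repeated on many breakpoint hits. The 16-bit big-endian length and the packet-type names follow librespot's packet list as we understand it; treat them as assumptions.
```python
from __future__ import annotations
import struct
# Assumed packet-type values (0xAB = login is from the talk; the rest are our reading of librespot).
PACKET_NAMES = {0xAB: "Login", 0xAC: "APWelcome", 0x1B: "CountryCode", 0x50: "ProductInfo", 0xB2: "MercuryReq", 0xB5: "MercuryEvent"}
def classify_buffer(buf: bytes, size: int) -> str | None:
    """Return a packet name if buf[:size] looks like a plaintext access point packet, else None."""
    if size < 3 or len(buf) < size:
        return None
    packet_type, length = struct.unpack_from(">BH", buf)
    if length != size - 3:  # the length field must cover everything after the 3-byte header
        return None
    return PACKET_NAMES.get(packet_type, f"unknown(0x{packet_type:02x})")
# Stand-in for the 398 bytes read at RSI: type 0xAB, length 395, then the payload.
dump = bytes([0xAB]) + struct.pack(">H", 395) + bytes(395)
print(classify_buffer(dump, 398))
```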

24:51.720 --> 24:59.720
Now that we know the address of the encrypt function, we can set up Frida.

24:59.720 --> 25:05.720
I run all my tests on a virtual machine, because I can restore it.

25:05.720 --> 25:09.720
I can go back, stop it, and do whatever.

25:09.720 --> 25:15.720
Luckily, Frida supports this use case, because it has frida-server,

25:15.720 --> 25:22.720
which I can launch on the VM and then connect to from the host over a TCP connection.

25:22.720 --> 25:28.720
So you can see here, for example, in the screenshot that I'm connecting to my VM and launching

25:28.720 --> 25:37.720
the Spotify binary and getting the base address where the binary code has been loaded.

25:37.720 --> 25:39.720
We can do better.

25:39.720 --> 25:45.720
This is still very manual, because we are in the Frida REPL.

25:45.720 --> 25:52.720
And so we finally write the Frida script that I was talking about all along.

25:52.720 --> 26:00.720
We get the base address, and then we use the Interceptor module to hook the two functions,

26:00.720 --> 26:05.720
which are the encryption and the decryption functions.

26:05.720 --> 26:10.720
For the encryption function, we can do exactly what I just mentioned,

26:10.720 --> 26:13.720
so we just get the values of the registers.

26:13.720 --> 26:21.720
We read memory at the buffer address for the length that is set in the register.

26:21.720 --> 26:24.720
And we just log it.

26:24.720 --> 26:31.720
For the decryption function, it's not that simple, because when we enter the decryption function,

26:31.720 --> 26:34.720
obviously the data is still encrypted.

26:34.720 --> 26:39.720
We want to look at the data when we exit from the decryption function.

26:39.720 --> 26:47.720
And for that, we'll use a built-in Frida feature, which is onEnter and onLeave.

26:47.720 --> 26:50.720
So we save the value of the register when we enter.

26:50.720 --> 26:54.720
And then when we leave, we will dump the content of the memory.

26:54.720 --> 26:57.720
And we'll have the decrypted content.
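
NOTE
A sketch of such a script, driven from Python through Frida's bindings. Several things are assumptions: the module name "spotify", the two function offsets (placeholders for the values found in Ghidra, taken relative to the module base), and the (ctx, buf, nbytes) signature from the Shannon reference code. Process.getModuleByName, Interceptor.attach, onEnter/onLeave, readByteArray and send are Frida APIs; the rest is illustrative.
```python
import json
# Frida agent (JavaScript) injected into the client; offsets are relative to the module base.
AGENT_TEMPLATE = """
const base = Process.getModuleByName(%(module)s).base;
function hook(offset, label, readOnLeave) {
  Interceptor.attach(base.add(offset), {
    onEnter(args) {
      // Shannon reference signature: (ctx, buf, nbytes); buf is processed in place.
      this.buf = args[1];
      this.len = args[2].toInt32();
      if (!readOnLeave) send({ label: label }, this.buf.readByteArray(this.len));
    },
    onLeave(retval) {
      // For decryption the plaintext only exists once the function has returned.
      if (readOnLeave) send({ label: label }, this.buf.readByteArray(this.len));
    }
  });
}
hook(%(encrypt)s, "send", false);
hook(%(decrypt)s, "recv", true);
"""
def build_agent(encrypt_offset: int, decrypt_offset: int, module: str = "spotify") -> str:
    """Fill in the module name and the two function offsets found in Ghidra."""
    return AGENT_TEMPLATE % {"module": json.dumps(module), "encrypt": hex(encrypt_offset), "decrypt": hex(decrypt_offset)}
def main(server: str, pid: int, encrypt_offset: int, decrypt_offset: int) -> None:
    import frida  # only needed when actually attaching to the VM
    device = frida.get_device_manager().add_remote_device(server)  # frida-server running in the VM
    session = device.attach(pid)
    script = session.create_script(build_agent(encrypt_offset, decrypt_offset))
    script.on("message", lambda message, data: print(message.get("payload"), data.hex() if data else ""))
    script.load()
    input("Logging traffic; press Enter to stop.\n")
```
Usage would be along the lines of main("192.168.56.10:27042", pid, enc_off, dec_off). The VM address and offsets are placeholders; 27042 is frida-server's default port.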

26:57.720 --> 27:00.720
This is what the output of the script looks like.

27:00.720 --> 27:07.720
So we see that the client sends the AB packet, which is the login packet.

27:07.720 --> 27:12.720
Then it receives an AP welcome, an access point welcome packet, which

27:12.720 --> 27:16.720
signifies that the login was successful.

27:16.720 --> 27:21.720
Then it receives a country code packet and a product info packet.

27:21.720 --> 27:27.720
And then it sends a Mercury request packet, which is another thing.

27:27.720 --> 27:38.720
The Mercury protocol is something that they use to build essentially HTTP over their own custom protocol,

27:38.720 --> 27:41.720
and I have no idea why they did that in the first place.

27:41.720 --> 27:53.720
Then we receive a Mercury event, and we receive another Mercury request packet, which is actually the response to the original Mercury request packet.
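
NOTE
As a rough illustration of this "HTTP over a custom protocol", here is a parser for a Mercury payload. It assumes the layout used by librespot as we understand it: a length-prefixed sequence id, a flags byte, a part count, then length-prefixed parts. The first part is a Protobuf header carrying the URI and method. All lengths are taken as 16-bit big-endian. Treat the whole layout as an assumption.
```python
from __future__ import annotations
import struct
def parse_mercury(payload: bytes) -> tuple[bytes, int, list[bytes]]:
    """Split a Mercury payload into (sequence id, flags, parts). The layout is an assumption."""
    pos = 0
    def take(n: int) -> bytes:
        nonlocal pos
        if pos + n > len(payload):
            raise ValueError("truncated Mercury payload")
        chunk = payload[pos:pos + n]
        pos += n
        return chunk
    (seq_len,) = struct.unpack(">H", take(2))
    seq = take(seq_len)
    flags = take(1)[0]
    (count,) = struct.unpack(">H", take(2))
    parts = []
    for _ in range(count):
        (size,) = struct.unpack(">H", take(2))
        parts.append(take(size))  # part 0 would be the Protobuf header, the rest the body
    return seq, flags, parts
def build_mercury(seq: bytes, parts: list[bytes], flags: int = 1) -> bytes:
    out = struct.pack(">H", len(seq)) + seq + bytes([flags]) + struct.pack(">H", len(parts))
    for part in parts:
        out += struct.pack(">H", len(part)) + part
    return out
request = build_mercury(b"\x00\x00\x00\x07", [b"<header protobuf>", b"body"])
print(parse_mercury(request))
```
The sequence id is what lets a client pair the response with the request it sent.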

27:53.720 --> 27:57.720
With that, we are done with the technical challenges.

27:57.720 --> 28:04.720
Actually, there are many more technical challenges, but those are, I think, the most interesting ones.

28:04.720 --> 28:07.720
And this was the fun part.

28:07.720 --> 28:16.720
This is, for me, at least, the fun part of reverse engineering, the fun part of maintaining such a project.

28:16.720 --> 28:23.720
But then come the legal challenges involved in maintaining reverse engineering projects.

28:23.720 --> 28:31.720
I suppose many of you have an open source project, and you probably don't have to deal with legal problems,

28:31.720 --> 28:36.720
because it's your own code, it's stuff you have written yourself.

28:36.720 --> 28:44.720
Now, no one can tell you that you cannot have that code.

28:44.720 --> 28:51.720
That is not true for Spotify and, in general, for reverse engineering projects.

28:51.720 --> 29:00.720
So what happens is that we have a constant problem, which is not making Spotify angry.

29:00.720 --> 29:09.720
Because if we do, they have the legal power, the legal strength to take us down, and we don't.

29:09.720 --> 29:24.720
We, as individual contributors to open source projects, don't have the power or the willingness to answer back if Spotify ever were to come after us, and they did, several times.

29:24.720 --> 29:34.720
We don't have the power to answer back, so essentially we have to comply.

29:34.720 --> 29:45.720
I said they did: Spotify has filed many DMCAs on GitHub, and it has emailed many people, including me and other maintainers, about things.

29:45.720 --> 29:55.720
I have never received an actual legal inquiry from them, but some others have.

29:55.720 --> 30:08.720
For this reason, we will not implement many features that some are either requests, heavily requested, and others are kind of essential and said not to have.

30:08.720 --> 30:18.720
Those are: listener reporting, which we will talk about; lossless playback, which is a recent one; and ads playback and support for free accounts.

30:18.720 --> 30:37.720
So what is listener reporting? When you use Spotify, all your listens are reported to the server. That is used to build your pool of recently played tracks, and also to influence the algorithm.

30:37.720 --> 30:44.720
And it is also used to credit the artists.

30:44.720 --> 30:50.720
One downside of using the librespot projects is that this does not happen.

30:50.720 --> 30:59.720
So your listens will not be accounted for, and effectively artists you listen to will not be credited.

30:59.720 --> 31:05.720
That is very sad, but we really can't do it.

31:05.720 --> 31:16.720
Not because of the reverse engineering process, but because they rely on an entirely different system for logging playback.

31:16.720 --> 31:26.720
It doesn't have anything to do with the Connect state, and it doesn't have anything to do with downloading from the CDN.

31:26.720 --> 31:37.720
It's an entirely separate system, and reverse engineering it and making it public

31:37.720 --> 31:47.720
would bring a lot of risk to the project, because that is something many people look into to create so-called listen bots.

31:47.720 --> 32:02.720
These are services you can buy as an artist to get a paid, not organic, boost to your tracks, and essentially get into the algorithm.

32:02.720 --> 32:06.720
This is something that Spotify tries to combat.

32:06.720 --> 32:20.720
If we made that code open source, suddenly there would be a lot more of those listen bots, and Spotify would come to us and say: what the fuck, don't do that.

32:20.720 --> 32:24.720
And that is not something that we want to happen.

32:24.720 --> 32:35.720
Another one is lossless playback. This is a recent one, because Spotify launched support for FLAC files quite recently, a couple of months ago.

32:35.720 --> 32:56.720
Two or three years after the protobufs for FLAC playback started appearing, they actually released it. This is what I was talking about earlier: in the protobuf files you see things before they are released.

32:56.720 --> 33:05.720
They joined the game of hi-fi streaming providers, because the quality they usually served was quite low.

33:05.720 --> 33:24.720
And for that reason, lossless playback very quickly became a heavily requested feature, because many audio nerds have DIY setups where they want to get the best out of their audio pipeline,

33:24.720 --> 33:35.720
or something like that. Or, like me, I'm not an audio nerd, but I want to get the best out of my expensive subscription.

33:35.720 --> 33:53.720
I pay for it, so essentially I want to use all the features. But sadly, we received a quite explicit message from Spotify telling us: don't do that.

33:53.720 --> 34:06.720
If you keep going, you will be in trouble. That's because those FLAC files are protected by a new DRM, which we will call stopstop.

34:06.720 --> 34:13.720
It's not actually called stopstop, it's kind of a meme. You can probably guess what its real name is.

34:13.720 --> 34:23.720
And for context, other hi-fi streaming providers do not use DRM to protect their high-quality FLAC files.

34:23.720 --> 34:31.720
So what is this DRM, which is the big obstacle to supporting FLAC files?

34:31.720 --> 34:41.720
As I said at the beginning, right now, and originally, we can just get the encryption keys for the audio files from the access point.

34:41.720 --> 34:50.720
So there's a back-end service that returns the key, and it returns the key as is. You take the key, decrypt the data, and everything is fine.

34:50.720 --> 35:06.720
But recently, recently as in the past year, and also recently as in the Anna's Archive dump, they started cracking down on the usage of this API, first for free accounts.

35:06.720 --> 35:16.720
So free accounts became heavily limited in what they could do through this API, the old API.

35:17.720 --> 35:33.720
The funny thing is that they started killing their own products. Many Spotify partners essentially broke: you could not use them anymore, and that is still the case for many accounts.

35:33.720 --> 35:50.720
There's an old blog post, well, not really a blog post but a forum post, with people who are really, really angry because they cannot use their hi-fi streamers for some reason.

35:50.720 --> 36:06.720
And Spotify doesn't really seem to care, honestly. That also broke things for some librespot users, because they rely on this old API for the encryption keys.

36:06.720 --> 36:23.720
The new DRM is something entirely different. I call it the new DRM, but essentially there was no DRM before. The new DRM does not serve the decryption key as is; it serves an obfuscated decryption key.

36:23.720 --> 36:38.720
You need to deobfuscate it, which is pretty straightforward, but unluckily the deobfuscation code contains some constants and some procedures over which they can claim intellectual property infringement.

36:38.720 --> 36:50.720
So we cannot include that code in our public repositories, or they will finally have a reason to take us down.

36:50.720 --> 37:10.720
The last thing is ads playback and free accounts. Not supporting those is another deliberate choice, for many reasons. One of them is that Spotify doesn't care, as long as you don't hit them in the pocket.

37:10.720 --> 37:29.720
Supporting free accounts would potentially mean stealing revenue. Suppose we implemented all the limitations that come with free accounts, namely ads and not being able to listen to your playlists in order, only on shuffle.

37:29.720 --> 37:44.720
Even if we did that correctly, they would still have leverage to say: well, yes, mostly correct, but there was one thing wrong, so it's all broken and you have stolen some revenue.

37:44.720 --> 37:59.720
Also, it's not hard to reverse engineer, but they try to hide it, so it's somewhat harder than the other stuff, and we would have to spend a bit more time on it.

37:59.720 --> 38:13.720
The logic changes frequently, depending on the business side of things, and this is what modders want to know about

38:13.720 --> 38:42.720
for advanced or simple mods of the Spotify Android app. Supporting those kinds of things would give modders insight into how things work, and then we would get contributions from modders, who are people Spotify tries to combat. There's nothing wrong with them, but Spotify surely tries to combat modders.

38:42.720 --> 38:49.720
And we do not want to be lumped in with them.

38:49.720 --> 39:09.720
So, how I got temporarily banned from Spotify. That was bound to happen, honestly. On the 29th of October, we created a private, but not so private, Discord server with some of the other contributors to work on stopstop.

39:09.720 --> 39:16.720
By the third of November, we already had a working implementation of stopstop.

39:16.720 --> 39:23.720
So, on the fourth of November, let's go, we have the implementation, let's do this.

39:23.720 --> 39:38.720
So, in go-librespot, I first implemented a FLAC decoder, which we did not have before, in one pull request, and in another pull request I implemented support for stopstop.

39:38.720 --> 39:58.720
That did not include any of the deobfuscation code. It only included the API calls required to get the obfuscated key, and then you would have to provide your own deobfuscation code to plug into the project at compile time.

39:58.720 --> 40:02.720
So, that you could effectively use it.

40:02.720 --> 40:12.720
And I started using it. The code was not public; it was running on my machine. I was happy with it, and the FLAC files were working flawlessly.

40:12.720 --> 40:18.720
I could not hear the difference, but that doesn't matter. I wanted to use it anyway.

40:19.720 --> 40:30.720
The day after, on the fifth of November, we, as the main maintainers of the librespot projects, received an email from Spotify.

40:30.720 --> 40:37.720
Actually, it was a support ticket they opened towards us, which is quite funny for another reason.

40:37.720 --> 40:59.720
It said that they had seen what we were doing with the DRM and that we should stop, because they needed to preserve the integrity of their platform, shareholder value, and stuff like that.

40:59.720 --> 41:11.720
So, essentially, we stopped talking about it, some maintainers deleted some public information, and it all ended there.

41:11.720 --> 41:19.720
Nothing in that email said that I could not use it myself, so I kept using it.

41:19.720 --> 41:31.720
On the 15th of November, if I recall correctly, I was logged out of all my devices, and I quickly understood that I was banned.

41:31.720 --> 41:45.720
Luckily, I could appeal my suspension, and just 48 hours later I got my account back, but I have not used stopstop since.

41:45.720 --> 41:57.720
How should we interpret what happened? Most likely, it was a warning. I'm pretty sure it was not an automated system, because, well, yeah.

41:57.720 --> 42:15.720
We think it was a warning to me to stop doing it, since, as far as I know, I was the only one using stopstop on my setup.

42:15.720 --> 42:33.720
As I said, the fact that they opened a support ticket towards us was quite funny, because when we tried to email them back, they never responded.

42:33.720 --> 42:39.720
It's not the first time they have sent us an email, but they never respond when we ask for clarification.

42:39.720 --> 42:53.720
But some days later, when they apparently closed the ticket on their side, we received an email sent from support@spotify.com with the original title of the ticket, which was a time-sensitive email.

42:53.720 --> 42:59.720
And we were asked to rate our interaction with the customer service.

42:59.720 --> 43:25.720
So, to close, how can you help? As with all open source projects: if you haven't tried it, use it. Maybe it's not for you, but use it. Spotify will know if you use it, but that doesn't mean you will get banned.

43:25.720 --> 43:39.720
There are many people across the world using librespot clients, and no one has ever reported being banned, but Spotify will know, because we do not try to hide.

43:39.720 --> 43:44.120
We explicitly say that we are

43:44.120 --> 43:47.560
go-librespot, or librespot in Rust, and stuff like that.

43:47.560 --> 43:51.000
So that they can, for example, filter us out

43:51.000 --> 43:53.320
of their analytics.

43:53.320 --> 43:57.720
Contribute: bug reports and feature requests are always welcome,

43:57.720 --> 44:01.720
and we try our best to make those things happen.

44:01.720 --> 44:04.840
But you may want to make things happen on your own.

44:04.840 --> 44:09.360
So write some code, fix a bug, implement some new feature.

44:09.360 --> 44:14.880
If you work at Spotify, as apparently no one here does,

44:14.880 --> 44:18.320
get in touch with us, the librespot maintainers,

44:18.320 --> 44:20.240
you already have our emails.

44:20.240 --> 44:24.160
So you clearly can get in touch.

44:24.160 --> 44:28.720
We know you can, you just don't want to. And please don't ban me again,

44:28.720 --> 44:31.120
because I actually use it.

44:31.120 --> 44:32.120
Thank you.

44:32.120 --> 44:45.240
Thank you very much for a very interesting presentation,

44:45.240 --> 44:46.600
which is obviously very popular.

44:46.600 --> 44:49.960
We literally have three minutes for questions,

44:49.960 --> 44:51.960
so if you're happy to take them.

44:51.960 --> 44:54.120
Yep, okay.

44:54.120 --> 44:55.720
Questions?

45:02.760 --> 45:09.400
Yes, I'm a little bit concerned about the reporting back

45:09.400 --> 45:12.600
and the artists not getting revenue for it.

45:12.600 --> 45:16.760
Are there any changes or communications?

45:16.760 --> 45:20.120
Because I want to support the artists, not Spotify.

45:20.120 --> 45:24.360
The limitation is making the code public.

45:24.360 --> 45:25.400
I have it.

45:25.400 --> 45:27.000
I have the code that does that.

45:27.000 --> 45:29.720
I use it, and it works.

45:29.720 --> 45:33.480
The problem is that I can share it with the people I know:

45:33.480 --> 45:36.920
people I know personally, who have contributed to the project,

45:36.920 --> 45:39.880
and whom I trust. But I cannot share it publicly,

45:39.880 --> 45:43.640
and I cannot share it with people I don't know.

45:43.640 --> 45:48.360
That's the main problem, because I would get in trouble, essentially.

45:49.400 --> 45:51.400
So yeah, that's the downside.

45:51.400 --> 45:57.640
And that's the reason why we continuously try to get in touch with Spotify

45:57.720 --> 46:03.080
and tell them, give us a way to do things correctly

46:03.080 --> 46:04.840
and not like we do right now.

46:07.960 --> 46:12.600
I have a question, yeah.

46:12.600 --> 46:17.400
I wonder, how did you know that the encryption is Shannon?

46:21.880 --> 46:27.000
You can reverse engineer some of the original code

46:27.000 --> 46:31.000
from way back in time, which is a lot simpler,

46:31.000 --> 46:33.320
because it's a lot less code.

46:33.320 --> 46:38.200
And what you do, essentially, when you reverse engineer cryptographic stuff,

46:38.200 --> 46:40.760
is that you look for constants.

46:40.760 --> 46:43.320
So you find some magic numbers in the code.

46:43.320 --> 46:46.200
You Google it, and you find what cipher it is,

46:46.200 --> 46:49.720
and that is essentially what led us to understand that it was Shannon.

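NOTE
The constant-hunting approach described in the answer above can be sketched as follows. This is a minimal illustration, assuming you already have a raw dump of the binary as bytes. The constant table is a hypothetical, illustrative selection of well-known values (the SHA-256 and MD5/SHA-1 initial state words and the TEA delta), not the constants actually found in Spotify's client, and find_constants is a hypothetical helper name. Each 32-bit constant is searched in both little-endian and big-endian byte order, because how it is stored depends on the target architecture.
```python
import struct
# Illustrative table of well-known 32-bit constants (not Spotify-specific).
KNOWN_CONSTANTS = {
    "SHA-256 initial hash H0": 0x6A09E667,
    "MD5 A / SHA-1 H0": 0x67452301,
    "TEA/XTEA delta": 0x9E3779B9,
}
def find_constants(blob):
    """Return (name, offset, byte_order) for every occurrence of a known constant."""
    hits = []
    for name, value in KNOWN_CONSTANTS.items():
        # The same value is laid out differently depending on endianness.
        for byte_order, fmt in (("little", "<I"), ("big", ">I")):
            needle = struct.pack(fmt, value)
            offset = blob.find(needle)
            while offset != -1:
                hits.append((name, offset, byte_order))
                offset = blob.find(needle, offset + 1)
    # Report hits in the order they appear in the binary.
    return sorted(hits, key=lambda hit: hit[1])
# Usage on a tiny synthetic "binary" with two planted constants.
sample = bytes(4) + struct.pack("<I", 0x9E3779B9) + bytes(4) + struct.pack(">I", 0x6A09E667)
print(find_constants(sample))
# [('TEA/XTEA delta', 4, 'little'), ('SHA-256 initial hash H0', 12, 'big')]
```
A real search would use a much larger table. A hit only points at code worth reading, and the final identification still comes from looking the value up, as described in the answer.
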
46:53.160 --> 46:56.120
We have one more question here, but are you willing to take more questions

46:56.120 --> 46:57.080
outside?

46:57.080 --> 46:57.880
Yeah, yeah, of course.

46:57.880 --> 47:01.800
Yeah, so one more question here, and then, just outside,

47:01.800 --> 47:04.440
you can kindly carry on the conversation.

47:04.440 --> 47:07.560
Hi, have you looked into Deezer?

47:08.760 --> 47:10.200
Oh, no, no, no, no, no, no.

47:10.200 --> 47:11.400
I only use Spotify.

