WEBVTT

00:00.000 --> 00:12.000
Let me tell you a story, who will talk about Willow, a family of...

00:12.000 --> 00:16.000
Willow, a family of peer-to-peer storage protocols.

00:16.000 --> 00:18.000
Let's welcome her with a round of applause.

00:18.000 --> 00:36.000
Once upon a time, when Unix timestamps were still small.

00:36.000 --> 00:44.000
The ancients worked together to build a system which would let them share their knowledge from afar.

00:44.000 --> 00:56.000
The system connected places of learning and all were hopeful that it would usher in a new golden age of art and science.

00:56.000 --> 01:01.000
But the wicked hearts of men were drawn to the power of this system.

01:01.000 --> 01:09.000
And slowly, piece by piece, they twisted the system to serve their bottomless greed.

01:15.000 --> 01:18.000
Until it was theirs.

01:18.000 --> 01:22.000
So listen well, system builder.

01:22.000 --> 01:29.000
The systems you build today will be the weapons used against you tomorrow.

01:31.000 --> 01:51.000
And now the title screen.

01:51.000 --> 02:04.760
I'm not really sure why I wanted to include a title screen here, but after this, the video

02:04.760 --> 02:16.000
game allusions end. I should have taken that out. I'm Sammy Gwilym, a programmer, illustrator

02:16.000 --> 02:21.040
and mum. I've been working on some peer-to-peer protocols, called Willow, together with

02:21.040 --> 02:27.120
Aljoscha Meyer, who sadly could not be here today. The story you just heard is one we're

02:27.120 --> 02:31.600
waking up to. Centralized systems were designed with the best of intentions, but were

02:31.600 --> 02:37.520
weaponized anyway, and peer-to-peer systems will be exactly the same. If the next generation

02:37.520 --> 02:43.280
of networks succeeds, there will be well-resourced people trying to turn them against us. But how

02:43.280 --> 02:48.880
did they do this? I'm going to put the weaponization of networks into two broad categories,

02:48.880 --> 02:55.520
the weaponization of data and weaponization of infrastructure. Data, particularly data connected

02:55.520 --> 03:01.840
to individuals, is valuable and has a market, buyers who use that data to advertise to you,

03:01.840 --> 03:08.160
who use that information to watch you, to condemn you and eventually to find you. That might sound

03:08.160 --> 03:12.720
alarmist, but that is exactly what is happening in the United States as ICE partners with

03:12.720 --> 03:17.840
commercial data brokers. It was unthinkable not that long ago, and yet it's happening today.

03:19.440 --> 03:23.840
The infrastructure of the networks themselves is also weaponized. If you force others to

03:23.840 --> 03:28.880
traverse a network for basic functionality, you can listen to communications as a man in the middle.

03:29.920 --> 03:34.400
By co-locating storage and functionality on certain points of the network, you can force new

03:34.400 --> 03:38.240
terms and features on users who have no other choice. You can even make it so that people

03:38.240 --> 03:43.760
need to buy new hardware to participate on the network. Our reliance on massive data centers

03:43.760 --> 03:48.320
has even been weaponized to create a speculative AI bubble. How do you make the next generation

03:48.320 --> 03:52.720
of protocols more difficult to weaponize? This is the lens through which I'd like to introduce

03:52.720 --> 03:58.640
Willow today. Willow is a family of peer-to-peer protocols which deal with accessing and modifying

03:58.720 --> 04:05.520
shared data, access control and data exchange. These take the forms of protocols and low-level

04:05.520 --> 04:11.520
libraries. We want to provide plumbing, not products. I'm going to be talking about the why

04:11.520 --> 04:16.000
of Willow mostly, rather than what or how. That's because if you'd like to learn about the

04:16.000 --> 04:19.760
features of Willow or how it does what it does, we've got a website which can tell you that with

04:19.760 --> 04:25.760
specifications, diagrams, comics and lots more besides. There are QR codes as you've seen along

04:25.760 --> 04:29.040
the bottom of the slides, linking to different concepts where you can learn more.

04:31.280 --> 04:36.160
So, data has value. Are there ways that we can make it more difficult to gather?

04:37.760 --> 04:43.600
Mutability is the ability to change a mapping of one value to another. Willow is a system of

04:43.600 --> 04:48.080
mapping data to names. The deliberate act of naming makes it possible for us to assign new

04:48.080 --> 04:54.080
values to names and to forget the old values entirely. People make mistakes and change their minds.

04:54.160 --> 04:57.760
We want a system which gives users agency over their digital past.

04:57.760 --> 05:03.360
Mutability also makes external moderation possible. Forgetting data makes it harder to

05:03.360 --> 05:09.280
hoard and sell and analyse. And that's important because data has value. If the vast majority

05:09.280 --> 05:13.920
of peers coordinate deletion, it decreases the window in which a malicious data hoarder can

05:13.920 --> 05:20.400
ingest data before it disappears. Even metadata has value. People have been drone-bombed

05:20.400 --> 05:25.520
for being associated with metadata. So it's just not enough to be able to reassign values to names.

05:25.520 --> 05:29.040
Sometimes we want to forget the fact that any data was assigned to those names to begin with.

05:29.920 --> 05:34.480
Willow's naming structure allows peers to forget this kind of metadata and data wholesale.

05:35.120 --> 05:39.840
This is a mitigation, of course. We can't force malicious peers to forget. The same way

05:39.840 --> 05:44.160
I can't force you to forget this presentation after seeing it. But malicious peers have a

05:44.160 --> 05:48.880
much harder time when they're operating in a network of peers working together. That improvement

05:48.880 --> 05:54.560
of the odds alone is worth it. There are systems which present the most recent version

05:54.560 --> 05:59.280
as the true value, but still hold on to the data and metadata in the background.

05:59.280 --> 06:04.000
Usually as a requirement of some data structure they're using. This is ersatz mutability

06:04.000 --> 06:08.160
and does not sufficiently protect people. Not only does it not protect people but it

06:08.160 --> 06:13.280
has another cost. Those old values need to be stored somewhere and quite often they need to be processed

06:13.280 --> 06:17.920
too. When you have authentic deletion, you don't have these costs and you make it cheaper to

06:18.000 --> 06:24.000
participate in the system. Cheapness is an interesting quality. When the requirements for running

06:24.000 --> 06:28.000
your own network are lower, it's harder for others to make you feel like there's no alternative.

06:28.560 --> 06:34.560
How can we further pursue cheapness? Let's talk about CRDTs. There are very

06:34.560 --> 06:39.760
fancy CRDTs out there which are able to intelligently merge fine-grained edits into a single coherent

06:39.760 --> 06:44.640
edit. These kinds of fine-grained CRDTs are critical for applications where many people are editing

06:44.640 --> 06:51.040
the same thing at the same time, like a word processor or task list. But these fancy CRDTs

06:51.040 --> 06:56.000
also come with a cost, storage and computation. Participation requires carrying around a

06:56.000 --> 07:01.360
history of changes, perfect for data hoarders and a duty to process them into a final state.

07:01.360 --> 07:06.400
This puts a hard ceiling on how large and long-lived these spaces can be. Not cheap.

07:07.680 --> 07:13.440
Do we always need the ability to reconcile fine-grained edits? Arguably no. Most of the

07:13.440 --> 07:17.840
day, we're using applications where people author data alone and share it with others,

07:17.840 --> 07:24.000
private messaging, chat rooms, web pages, microblogs, media libraries, archives, forums,

07:24.000 --> 07:28.480
issue trackers. You could use a fancy CRDT for these, but you would be adding

07:28.480 --> 07:34.560
significant cost without benefit. So what if in the name of cheapness, we said we're not going

07:34.560 --> 07:38.480
to support collaborative word processors, but the majority of applications instead.

07:38.480 --> 07:44.880
Willow uses an extremely simple CRDT: last write wins between different devices owned

07:44.880 --> 07:51.680
by the same user. It's dirt cheap. Cheapness is interesting, not because it lets you run a

07:51.680 --> 07:55.680
community of millions on a single server, but because it lets you run a community of a

07:55.680 --> 08:01.520
dozen on a potato. This is critical in a world where we need to get more out of the hardware

08:01.520 --> 08:09.280
that we already have. I just mentioned different devices owned by the same user. This implies

08:09.280 --> 08:15.120
there's some notion of identity in Willow. Earlier, I mentioned how data is weaponized and used

08:15.120 --> 08:20.000
to advertise to, watch and locate individuals. This is done by associating different data points

08:20.000 --> 08:24.800
with one another through an abstract identity or digital double uniting them all, which can

08:24.800 --> 08:30.800
then often be tied to a real person. There's also a growing mania for associating individuals

08:30.880 --> 08:36.160
with a single digital identity. When digital identities become synonymous with the flesh and blood

08:36.160 --> 08:41.120
beings they're said to exclusively represent, we create the conditions for grievous fraud.

08:41.840 --> 08:46.080
How can we introduce identity to Willow in such a way that we can minimize these dangers?

08:46.080 --> 08:52.240
And at the same time, enjoy intuitive access control. Users of Willow can issue their own identities

08:52.240 --> 08:57.520
which are randomized IDs. They can keep that identity to themselves or share it with others.

08:57.600 --> 09:02.240
They can ascribe as much or as little identifying data like a name or picture to that ID as they

09:02.240 --> 09:07.520
want. They can reuse a single identity across many communities or issue themselves as many as they

09:07.520 --> 09:13.680
wish. For whatever scenario they see fit. It's a start, but malicious actors can still

09:13.680 --> 09:19.280
de-obfuscate the relation between different identities. We see this on the web where technologies

09:19.280 --> 09:23.520
like cookies are used to connect a single user's different identities from across many different

09:23.520 --> 09:28.480
domains and communities. Your identity as a reader of anarchist theory may be connected to your

09:28.480 --> 09:33.280
identity at your work, for instance. This is possible because the web's different domains

09:33.280 --> 09:38.080
are able to bleed into one another and share information. But what if we formalize a hard and

09:38.080 --> 09:44.080
uncrossable threshold between the domains? Willow has namespaces, which act as completely independent

09:44.080 --> 09:49.120
universes of data: a namespace for yourself, a namespace for your friend circle, a namespace for

09:49.200 --> 09:54.000
a social network. What belongs in one namespace cannot cross over into another and nobody can

09:54.000 --> 09:59.200
learn what is within a given namespace without being given explicit consent to access it first.

10:00.800 --> 10:05.680
But the barrier of explicit consent is still not enough. It is still possible to use a single

10:05.680 --> 10:10.640
identity across many namespaces linking them. A malicious actor can learn a lot by simply knowing

10:10.640 --> 10:16.640
which namespaces you belong to. To mitigate this, it must be impossible for someone to learn

10:16.640 --> 10:21.600
about the namespaces someone else is interested in unless they already know about it themselves.

10:23.680 --> 10:28.400
Willow uses a system called private interest overlap to confidentially determine what data

10:28.400 --> 10:33.440
any two peers are both interested in without revealing any interests which their partner does not

10:33.440 --> 10:38.640
know about themselves. This works at quite a granular level so you can not only hide namespaces

10:38.640 --> 10:42.640
from other peers but also parts of the namespace they don't have access to and presumably

10:42.640 --> 10:47.120
have no knowledge of. This makes it very hard for malicious peers to gather data and infer

10:47.120 --> 10:53.200
connections between data and identities. With independent namespaces, private interest overlap

10:53.200 --> 10:58.480
and mutability, communities can have private digital spaces with real moderation. This is especially

10:58.480 --> 11:02.240
relevant to vulnerable communities who need to be able to communicate with each other without

11:02.240 --> 11:08.560
being harassed and spied upon. But by retreating to our own enclaves, are we losing something?

11:09.120 --> 11:13.280
There is something special about finding a new website or profile you've never seen before

11:13.280 --> 11:18.240
and perhaps making new friendships through that. Inversely, there's also something special to

11:18.240 --> 11:22.080
being able to publish something you made and have it reach far further than you ever imagined

11:22.080 --> 11:28.000
it could. Can we do something like that while mitigating its accompanying risks? The risks

11:28.000 --> 11:32.800
being harassment, drive-by exposure to things you don't want to see and being forced to

11:32.800 --> 11:38.640
participate in the same space with entities you want nothing to do with. Quite often, you can

11:38.640 --> 11:43.440
only do something after the fact. In social networks you can block. On the Fediverse, you can go

11:43.440 --> 11:49.520
a little further and defederate. Both of these are opt-out approaches. But what if we invert that

11:49.520 --> 11:54.000
and make a public network that you opt into? Where every person you read data from is someone

11:54.000 --> 11:59.440
you've chosen to listen to and who has allowed you to listen? So in addition to the prior,

11:59.440 --> 12:04.720
invite-only namespaces, Willow's access control system, Meadowcap, has communal namespaces

12:04.720 --> 12:08.560
where anyone can claim their own little slice of the space, which they can write whatever they

12:08.560 --> 12:14.000
want to, with a catch being that nobody has to listen. Other users must choose whether they're

12:14.000 --> 12:19.360
interested in your data first and must be given express permission to read it. Both

12:19.360 --> 12:24.160
sorts of namespace have explicit consent at their heart. You can never read or write any data

12:24.160 --> 12:28.160
without someone having given you express permission first. Perhaps that permission is someone

12:28.160 --> 12:33.040
saying "I allow absolutely everyone to see this", but at least it's always explicit.

12:34.800 --> 12:40.320
What I've been skirting around is who stores Willow data and how it is exchanged. This being

12:40.320 --> 12:45.040
a peer-to-peer system, the data is stored on users' own devices. Devices only store what

12:45.040 --> 12:49.600
they've expressed an interest in, and they can express a lot of granularity in that request,

12:49.600 --> 12:55.440
not just what, like only blog posts, but also when, like only data from the last week,

12:55.440 --> 13:00.800
or how much, like, only the most recent 100 megabytes. Creating a correspondence between

13:00.800 --> 13:05.200
the community which uses a network and the infrastructure which hosts it is a wonderful way to keep

13:05.200 --> 13:12.240
outside interests from screwing things up. But how does data move from peer to peer? Many protocols

13:12.240 --> 13:16.880
assume a bi-directional connection can always be established, but establishing such a connection

13:16.880 --> 13:23.680
is a privilege. Maybe you don't have access to such infrastructure, or maybe you can't trust

13:23.840 --> 13:28.240
that infrastructure. Every connection leaves a trace, and perhaps that's something you can't

13:28.240 --> 13:35.840
afford. Infrastructure is a process, not an end state. Connectivity is lost and found.

13:35.840 --> 13:42.400
Servers go down for maintenance or permanently. Volunteers move on, channels are compromised,

13:42.400 --> 13:47.360
there must be different means of moving data to suit the given moment. And that's why we've

13:47.360 --> 13:51.920
decoupled our data model entirely from how the data is exchanged, and this allows us to design

13:51.920 --> 13:57.680
different sync protocols for different situations. For the good times, when you can establish

13:57.680 --> 14:01.920
bi-directional connections, we've sync protocols which are able to confidentially determine

14:01.920 --> 14:06.960
common interests with private interest overlap, resist man-in-the-middle attacks, intelligently

14:06.960 --> 14:11.280
determine the least amount of information to exchange via range-based set reconciliation,

14:11.280 --> 14:15.040
and even communicate memory constraints with each other so that constrained devices like

14:15.040 --> 14:20.720
microcontrollers can participate. These protocols don't specify the transport used, so it could be

14:20.720 --> 14:25.520
web sockets, Bluetooth, or anything else capable of establishing bi-directional communication.

14:27.120 --> 14:30.800
In particular, these protocols are able to let you securely communicate with peers

14:30.800 --> 14:36.000
you don't necessarily trust, which is vital when you need to take every opportunity you can get.

14:37.600 --> 14:42.160
For everything else, there's the drop format. This protocol serializes Willow data to a single

14:42.160 --> 14:48.960
blob, which can then be sent over the infrastructure you trust or already have. Signal, email,

14:49.040 --> 14:56.720
or maybe personally transported by USB key. We want to contribute new protocols,

14:56.720 --> 15:01.360
able to meet the many crises that we're meeting today. Users should be able to see the risks

15:01.360 --> 15:07.040
they're taking on with open eyes and have the tools to remedy them. One of the crises we

15:07.040 --> 15:12.000
sleepwalked into was caused by believing that the internet was a new world separate from ours.

15:13.040 --> 15:18.720
But networks are not an ephemeral virtual world, but grossly physical, a sprawl of cables and

15:18.720 --> 15:23.280
humming machinery forced to creep over the surface of the earth. These networks have real

15:23.280 --> 15:28.320
physical demands and constraints, and the protocols of tomorrow will need to justify every last

15:28.320 --> 15:37.120
byte of storage and memory. The systems we build today will be the weapons used against us

15:37.120 --> 15:43.760
tomorrow. Reality is messy, and when we ignore that, we betray the users who depend on the systems

15:43.760 --> 15:50.320
we design. So, let's make our systems as hard to wield against us as possible, and perhaps

15:50.320 --> 15:53.920
then we can truly make them ours. Thank you.

15:53.920 --> 16:16.800
Thank you. Thank you, Sammy. So great. Thank you. I see we've got so many

16:17.760 --> 16:24.160
questions. Just apologies for the ignorance because I haven't. I don't know much about

16:24.160 --> 16:29.600
Willow, but I was wondering if you could say perhaps if you're able to who maybe your biggest

16:29.600 --> 16:37.040
consumer is or maybe who is somebody who may be using it quite frequently. We are

16:37.040 --> 16:42.880
deep in the kit, I've got a mic at so far. We're deep in design and theory and implementation,

16:42.880 --> 16:51.040
so we're mostly just still procuring funding and building things.

16:51.040 --> 17:00.800
Hi. Just trying to understand your aim, so you're trying to provide a foundation like the

17:00.800 --> 17:06.720
web has, so including the protocol and how should different node access and how to be stored

17:06.800 --> 17:12.720
data. Yes, it's quite low. It's just kind of like protocols for how do you access

17:12.720 --> 17:19.280
and modify that data, how do you control the access to it, and also providing different

17:19.280 --> 17:24.080
means to exchange it with other people. But you could use existing protocols, transports,

17:24.080 --> 17:31.760
protocols, the web. But I mean like, Internet could be your underlying protocol use we're

17:31.760 --> 17:35.520
thinking but like we didn't have that. Yeah, no, it's definitely supposed to be like a

17:35.520 --> 17:40.560
companion to like the web. I see you. Thank you.

17:53.360 --> 17:58.400
It just looks really great. Thanks a lot. Do you already have applications for your protocol

17:58.480 --> 18:04.080
or are you still trying to figure out how all these different pieces fit together?

18:04.080 --> 18:09.360
We have Rust implementations for the data model, and also for our capabilities system

18:09.360 --> 18:15.360
Meadowcap, and we hope to have persistent storage and the drop format implemented in the next

18:15.360 --> 18:20.480
month or so. No, on the application layer currently you don't have.

18:28.480 --> 18:33.680
Thank you for the fantastic presentation. It was really inspiring. And I'm just wondering,

18:34.320 --> 18:40.880
first of all, did you do the drawings? Great. Fantastic. I figured as much. And just one

18:40.880 --> 18:46.000
question. So I believe I understand that you start from one theoretical starting point, which is

18:46.000 --> 18:50.480
the sentence you stated, that these will be the weapons of tomorrow that will be used

18:50.480 --> 18:56.880
against us. And how far are you in solving this problem you would say from a theoretical standpoint?

18:59.120 --> 19:05.360
We've been working on these protocols for, I would say, like four years or so, and working

19:05.360 --> 19:14.400
closely with sort of the prominent thinkers within that field as well. So I feel like our designs

19:14.400 --> 19:18.720
are pretty solid at this point for the data model, like the specifications are final.

19:20.880 --> 19:26.880
So, yes, I think this is what we're going to shoot our shot with really.

19:28.400 --> 19:31.440
Thank you very much. Do you have more questions? Please do this.

19:33.040 --> 19:37.760
We have a question from the internet.

19:37.760 --> 19:41.760
Okay. Tristan, Tristan, Tristan merely asks, any thoughts about backups?

19:42.800 --> 19:48.560
Backups. Yes, you could, you could use it because this is kind of like, um,

19:50.480 --> 19:54.720
because we don't use CRDTs for data, basically, everything's just kind of like byte strings.

19:55.680 --> 20:01.120
This system is ideal for storing like very large blocks of data. So, yeah, backups.

20:01.760 --> 20:02.800
Great. Let's do them.

20:05.280 --> 20:07.200
I'm not in the scene of the interrupt. Sorry.

20:13.440 --> 20:22.720
I wanted to ask about how to implement the Willow protocol, but that question kind of was answered or asked already.

20:23.040 --> 20:30.400
But I was wondering, so it seems like it's a different kind of network.

20:30.400 --> 20:37.920
So the devices that implement the Willow protocol and everything else would be different in the

20:37.920 --> 20:45.280
internet. I mean, they're all connected in the internet, but it could be that some places,

20:45.280 --> 20:51.520
some remotes, don't implement Willow and some will implement Willow. And the question is how you

20:51.600 --> 20:56.720
distinguish between each other? Yeah. I mean, I think kind of one of the

20:56.720 --> 21:01.920
core design principles that we have is sort of meeting people where they're already at with sort

21:01.920 --> 21:04.880
of the hardware that they have or the infrastructure that they already trust.

21:06.960 --> 21:14.240
And so, yes, if they need, if they need to be on the web or they need to really not be on the web,

21:14.240 --> 21:21.200
you know, we want this to be something that they can use and mix or not really. So, yeah,

21:21.280 --> 21:26.240
these things can sit side by side or they can be separate or, um, okay.

21:26.240 --> 21:29.200
That makes sense? Yeah. Great. Thank you.

21:30.240 --> 21:37.200
Well, especially in the room. Okay. And now that I have your attention. So first, we're going to thank you again.

21:37.200 --> 21:51.200
Thank you.

