WEBVTT

00:00.000 --> 00:15.000
Yes, thank you. My name is Andris. I'm probably best known for my Node.js email protocol

00:15.000 --> 00:24.500
libraries, like Nodemailer and some other libraries. Basically, if you use some kind

00:24.500 --> 00:34.300
of email client or service that uses Node.js or JavaScript, or Cloudflare Email Workers,

00:34.300 --> 00:40.700
then probably there's some library of mine in the stack somewhere, usually.

00:40.700 --> 00:48.300
But today I'm talking about one of my biggest projects, which is called the WildDuck email server.

00:48.300 --> 00:56.500
It's slightly similar to Stalwart, and I also started it quite a long time

00:56.500 --> 01:05.900
ago, probably in 2016, so it's quite old, but it's not very well known, because

01:05.900 --> 01:15.700
I've been mostly building it for one single company. But yeah, that's just some

01:15.700 --> 01:25.700
code I came up with. But yeah, it just means that email is basically, in

01:25.700 --> 01:33.100
today's world, like a digital passport. It's like a common login: if you

01:33.100 --> 01:40.500
sign a document with DocuSign or some other platform, then you are identified by your

01:41.300 --> 01:47.100
email address. So even though it's not meant to be, and it's not a digital passport,

01:47.100 --> 01:56.100
in reality it is. So the services that are using email have evolved

01:56.100 --> 02:03.900
quite a lot, but the email infrastructure itself usually has not, so much. So in this

02:03.900 --> 02:14.300
presentation I'm not trying to [unclear] things; it's more like a lessons-learned

02:14.300 --> 02:24.620
story here: why some design decisions were made, how they were made. And so it's mostly

02:24.620 --> 02:31.420
used by this one single Estonian company, and I'm also from Estonia, from Tallinn, and it's

02:31.420 --> 02:41.100
a web hosting company that uses it. There are more than 100,000 active accounts, and if you

02:41.100 --> 02:50.300
do the math, the minimum, actually, the minimum quota per account

02:50.300 --> 02:59.740
is 12 gigabytes, so if you multiply 12 gigabytes by 100,000, then you get quite a lot,

02:59.980 --> 03:07.660
more than 1000 terabytes. So obviously not everyone is using all the space that they have

03:07.660 --> 03:14.300
like available, but in today's world nobody's deleting the email anymore, it's just moved

03:14.300 --> 03:21.020
to the market, I've followed that, and so it means that mailboxes, mailbox sizes are like

03:21.020 --> 03:34.380
confetti growing, and people are reaching the quota limits, and in addition to this company

03:34.380 --> 03:40.700
uses custom email client built on top of a while the KTI, this email client is going to have

03:40.780 --> 03:54.620
to not open source, but I hope it will be, but right now it's not, and yes, so it looks, and

03:54.620 --> 04:03.180
and actually the problem statement, where the who's a fight act, like come from a way to

04:03.180 --> 04:08.940
speak like this in these are in the history of them who don't lamp stack, before we had like

04:08.940 --> 04:14.940
all these cloud hosting providers, they have like tons of small lamp hosting providers,

04:14.940 --> 04:22.860
the Linux Appetra, MySQL, and PHP, so it's something where you would just put and the

04:22.860 --> 04:30.380
host of birthplace for example, and thing was with these kind of providers is that the packages

04:30.460 --> 04:40.700
are usually quite cheap, like $5, $10, or years, and the main business was this website hosting,

04:40.700 --> 04:47.420
But in addition to that, the hosting providers usually also provided email

04:47.420 --> 04:57.020
accounts, and at first it was just a giveaway, basically. It was cheap to maintain for the

04:57.020 --> 05:04.620
company, because the mailbox sizes were very small. If you've been around, then you maybe

05:04.620 --> 05:11.500
remember that even Hotmail had a mailbox quota of two megabytes. So that's not one email,

05:11.500 --> 05:17.500
that's not one attachment, that's the entire mailbox. So all the emails had to fit into two megabytes

05:17.500 --> 05:27.260
of disk space. So it was not too difficult, even back in those days, to host this.

05:29.260 --> 05:38.460
With a small mailbox, you don't put a lot of emails into this space. So two megabytes

05:38.460 --> 05:45.020
is like an extreme example, but 10 megabytes was quite common, I would say, and maybe later

05:45.020 --> 05:51.180
there was a hundred megabytes, but it was still really small. But once Gmail came around,

05:51.180 --> 05:59.020
then suddenly Gmail had very large quota sizes, and people started demanding more.

05:59.020 --> 06:05.500
And now the web hosting companies had this issue that they were still getting these very cheap

06:06.780 --> 06:14.380
hosting clients, like $5 or $10 a year, but users demanded larger mailboxes,

06:14.460 --> 06:21.740
but they didn't want to pay for this. So they basically wanted to get the service

06:21.740 --> 06:28.780
for free, but providing the service was not so cheap for the hosting providers anymore.

06:34.540 --> 06:42.860
Yeah, so that's kind of the backstory of how WildDuck came to be. It's basically

06:45.340 --> 06:56.460
an attempt to quite cheaply provide a more or less normal mail service to users that do not want to pay for it.

06:59.020 --> 07:05.980
Yeah, and there are some additional problems as well,

07:06.060 --> 07:13.820
like with Maildir especially; we already heard it mentioned in the Stalwart talk as well.

07:14.780 --> 07:21.820
The main problem with Maildir is that all your emails are in a single folder,

07:22.380 --> 07:30.380
and so if you have a large number of users, then it gets difficult

07:30.380 --> 07:40.380
to provision disk space to host these users. Let's say you have one gigabyte of quota

07:40.380 --> 07:48.860
for one user, and you have a one-terabyte disk; that means that you can put a thousand users onto it,

07:48.860 --> 07:56.700
but usually users have used, let's say, maybe on average 100 megabytes. So it turns out that

07:56.700 --> 08:04.140
you have this one-terabyte disk, but you only use 10% of it, so 90% is unused.

08:04.140 --> 08:11.740
And so the providers, to cut costs, would oversell it, and instead of

08:11.740 --> 08:15.980
saying that this disk holds a thousand users, they'd add 10,000, let's say.

08:15.980 --> 08:24.540
But now, sooner or later, you will start to hit the limits, and the users will not fit onto this

08:24.540 --> 08:31.020
disk anymore, and you have to start to migrate to some other disk: install a new

08:31.020 --> 08:37.980
mail server and migrate a bunch of these Maildir folders to this new server. But Maildir

08:37.980 --> 08:44.540
means that each email is a separate file, and if you have, like, a one-gigabyte mailbox

08:44.540 --> 08:50.540
that you need to migrate at once, you can't just migrate it piece by piece; everything needs to be

08:50.540 --> 08:59.420
moved at the same time. Then there might be maybe 10,000 or 20,000 files in this folder,

08:59.420 --> 09:05.740
and maybe you have to migrate 100 users like this, so you end up having to

09:05.740 --> 09:12.540
start copying millions and millions of small files. And even though networks are

09:12.540 --> 09:19.820
fast, moving small files was never too fast, so it gets really tricky,

09:19.820 --> 09:25.740
and you have to do it all the time. If you go to this kind of hosting provider's

09:25.740 --> 09:32.220
network rack, then you can see that the lights are always blinking, because there are so many

09:34.060 --> 09:44.220
small files moving around. So it's very complicated. Another problem is the

09:44.220 --> 09:53.340
configuration changes: the standard servers use configuration files, which means

09:53.340 --> 10:00.380
that if you do something, like add an email address alias to some user, or change the

10:00.380 --> 10:07.660
quota or whatever, you have to reload the process. But if you have a large number of users,

10:07.660 --> 10:12.700
then there's always someone who's changing something, and you can't constantly

10:12.700 --> 10:21.260
keep reloading Postfix or Dovecot or whatever. So the solution is that

10:21.260 --> 10:26.620
you just wait a bit, maybe five minutes, maybe ten minutes, and then you reload,

10:26.620 --> 10:31.740
and then all the changes are applied at once, which is highly inconvenient,

10:31.740 --> 10:37.260
because if the user adds, like, an email address alias, they submit the

10:37.420 --> 10:43.260
form, and the alias still doesn't actually work, because it takes five minutes or

10:43.260 --> 10:47.100
ten minutes until the server configuration is reloaded.

10:50.940 --> 10:57.660
Some other problems: spam fighting. At first nobody really did

10:57.660 --> 11:02.940
anything about it, but now it's a full-time job: somebody has to keep an eye

11:03.020 --> 11:09.820
all the time on what's going on with the spam situation. Really bad visibility: just

11:10.620 --> 11:18.460
log files, and text files at best. And one of the worst problems: IP blacklisting.

11:18.460 --> 11:24.220
If you have a large number of users, then there are always some bad apples there, and if someone

11:24.220 --> 11:31.420
just sends out a spam email and your IP address gets blacklisted, then nobody on this

11:31.500 --> 11:37.500
server is going to be able to send any email, at least for an hour, which is not okay.

11:40.780 --> 11:49.660
And so WildDuck was meant to fix these issues. There are no single points of

11:49.660 --> 11:59.100
failure: each component can be replicated, and if any one of these components goes down,

11:59.580 --> 12:05.740
then nothing actually happens. You can just set up a new process.

12:07.180 --> 12:16.380
It also has horizontal scaling: you can just add new instances, and you don't have to

12:16.380 --> 12:23.820
migrate users. Actually, the main thing here is not WildDuck itself

12:23.820 --> 12:33.420
but the database, because most of the work is done by the database. And also, instant changes:

12:33.420 --> 12:40.460
all changes are done via the API, so if you add a new alias or change a quota, then it

12:40.460 --> 12:49.020
applies immediately. And there's also the storage and the API, but I'm just getting to that

12:49.100 --> 12:55.420
afterwards. And now, this is something there's a lot to talk about,

12:55.420 --> 13:02.060
because it uses MongoDB as the email storage. It might not seem very reasonable,

13:02.060 --> 13:09.420
but it is. It has been known for a long time that databases are not too

13:09.420 --> 13:19.340
good for email, but using MongoDB is different. MongoDB itself, I chose it

13:19.340 --> 13:27.580
back in 2016; it was all the hype, everyone used MongoDB back then. And also one of the

13:28.620 --> 13:36.620
good reasons for MongoDB is that there's a backing company, because we were

13:36.620 --> 13:44.460
considering MongoDB and [unclear], and [unclear] has been, like, dead for years,

13:44.460 --> 13:50.140
but because MongoDB had a backing company, it's still around and works quite well.

13:51.340 --> 13:57.100
And, I mean, the thing with MongoDB is that it stores data

13:57.100 --> 14:02.220
not as table rows but as documents, and this naturally fits the

14:02.460 --> 14:09.020
MIME tree structure, because in WildDuck, when an email comes in or is uploaded, it's

14:09.580 --> 14:18.860
parsed into a MIME tree and stored as a, like, tree structure, and not as the whole email file.
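
NOTE
Editor's sketch, not WildDuck's actual code (WildDuck itself is a Node.js project). It only illustrates, with Python's standard email package, what parsing a message into a nested MIME tree could look like before the tree is saved as one document. The helper name mime_tree and the field names content_type, headers, children and body are assumptions made for this example.
```python
from email import message_from_bytes
from email.message import Message
# Hypothetical helper: turn a parsed message into a nested dict (a "MIME tree").
def mime_tree(part: Message) -> dict:
    node = {
        "content_type": part.get_content_type(),
        "headers": [[name, value] for name, value in part.items()],
    }
    if part.is_multipart():
        # Multipart nodes only hold child nodes, never a body of their own.
        node["children"] = [mime_tree(child) for child in part.get_payload()]
    else:
        # Leaf nodes keep the decoded body bytes (transfer encoding undone).
        node["body"] = part.get_payload(decode=True)
    return node
RAW = (
    b"From: a@example.com\r\n"
    b"To: b@example.com\r\n"
    b"Subject: hello\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Hi there\r\n"
    b"--XYZ\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"aGVsbG8gd29ybGQ=\r\n"
    b"--XYZ--\r\n"
)
tree = mime_tree(message_from_bytes(RAW))
```
The resulting dict has one multipart root with two leaf children, which is the kind of shape a document database can hold directly.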

14:24.860 --> 14:30.540
Yeah, this is, like: normally, in Maildir, you store these email files

14:31.100 --> 14:36.940
in folders, and each Maildir folder is an actual folder on disk,

14:37.980 --> 14:43.900
but in WildDuck, the message is parsed and stored in this structured form.

14:45.500 --> 14:51.420
So what this makes really easy to do is build an API on top of it.

14:51.420 --> 14:56.860
So the WildDuck API is actually just a thin layer on top of MongoDB.

14:57.020 --> 15:03.660
You ask for a message, and there are, obviously, some permission checks,

15:03.660 --> 15:10.300
but in the end you just get almost the whole database entry from the database,

15:10.300 --> 15:16.860
and it's just structured JSON. So returning emails from

15:17.580 --> 15:25.980
storage is extremely cheap and fast. And there are also a lot of storage optimizations,

15:26.060 --> 15:33.260
because when an email comes in, it's stored, but

15:33.260 --> 15:39.900
attachments are taken out of the email structure. An attachment in an email is usually

15:39.900 --> 15:48.700
base64 encoded. So WildDuck takes out the attachment, decodes it into a normal

15:48.700 --> 15:58.220
binary file, and then deduplicates it against the attachment storage. So if we already

15:58.220 --> 16:05.020
have the same file stored in the database, then we do not store the same attachment a second time.
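
NOTE
Editor's sketch of the deduplication idea described here, not WildDuck's implementation. The talk does not say how identical files are detected; keying the store by a SHA-256 hash of the decoded bytes is an assumption, and attachment_store and store_attachment are hypothetical names used only for this example.
```python
import base64
import hashlib
# Hypothetical in-memory stand-in for the attachment storage: content hash maps to bytes.
attachment_store = {}
def store_attachment(b64_payload: str) -> str:
    """Decode a base64 attachment and store it once; return its content hash."""
    binary = base64.b64decode(b64_payload)
    key = hashlib.sha256(binary).hexdigest()
    # Only the first copy is written; later identical files reuse the same key.
    if key not in attachment_store:
        attachment_store[key] = binary
    return key
logo = base64.b64encode(b"pretend this is a logo image").decode("ascii")
first = store_attachment(logo)
second = store_attachment(logo)
other = store_attachment(base64.b64encode(b"a different file").decode("ascii"))
```
Both calls with the same logo return the same key, so an email only needs to keep that key instead of its own copy of the file.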

16:09.580 --> 16:15.660
One place where this helps a lot is footer signatures; that's usually

16:15.660 --> 16:23.740
some kind of logo or something. And these kinds of emails can be in the millions, so instead of

16:23.740 --> 16:30.860
storing a million copies of some logo file, it just stores one copy, and all the

16:30.860 --> 16:37.580
emails just reference that one copy in the database. So it's, like,

16:38.460 --> 16:48.300
a three-level optimization. And this slide just describes how we keep track of how we use

16:49.500 --> 16:54.780
attachments. So the problem is that when we want to delete an attachment, we want to make sure

16:54.780 --> 17:01.340
that it's definitely not referenced by any email anymore, and because MongoDB is not a

17:01.340 --> 17:08.300
relational database, it's kind of difficult to keep track of what references what,

17:09.660 --> 17:16.700
so we use this kind of magic counters just to keep track of it. But I'm not going to go very

17:16.700 --> 17:25.260
deep into it, because I don't have much time. Yeah, that's the same thing. The third level of

17:25.260 --> 17:31.740
compression that I didn't talk about was the WiredTiger compression, which also applies.

17:31.740 --> 17:39.340
So when we get the base64-encoded attachment, first we decode it into a binary file,

17:39.340 --> 17:46.060
then we deduplicate it, and finally it's stored as a compressed file.
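
NOTE
Editor's sketch of the three steps just listed (decode, deduplicate, compress) plus a very simplified usage counter. Everything here is an assumption for illustration: the talk does not detail its counter scheme, so a plain reference count is swapped in, zlib stands in for the storage engine's own block compression, and the names store, add_attachment and release_attachment are hypothetical.
```python
import base64
import hashlib
import zlib
# Hypothetical store: content hash maps to {"data": compressed bytes, "refs": counter}.
store = {}
def add_attachment(b64_payload: str) -> str:
    binary = base64.b64decode(b64_payload)  # step 1: undo base64
    key = hashlib.sha256(binary).hexdigest()
    if key in store:  # step 2: deduplicate, just count one more user of the file
        store[key]["refs"] += 1
    else:  # step 3: compress before writing the only copy
        store[key] = {"data": zlib.compress(binary), "refs": 1}
    return key
def release_attachment(key: str) -> None:
    # Drop one reference; delete the bytes only when no email points at them.
    store[key]["refs"] -= 1
    if store[key]["refs"] == 0:
        del store[key]
payload = base64.b64encode(b"logo " * 1000).decode("ascii")
k1 = add_attachment(payload)
k2 = add_attachment(payload)
stored_size = len(store[k1]["data"])
release_attachment(k1)
still_there = k1 in store
release_attachment(k2)
gone = k1 not in store
```
The file stays in the store after the first release because a second email still references it, and it disappears only after the last reference is released.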

17:46.060 --> 17:56.220
Another thing that we did is that we used this custom hardware. So we had this setup

17:56.220 --> 18:03.660
where we had a smaller SSD disk and a larger hard disk on every MongoDB instance,

18:03.660 --> 18:11.100
like, server instance. And the main point for us was that we structured the MongoDB

18:11.180 --> 18:20.540
storage in a way that the main database was mounted on the SSD, but

18:20.540 --> 18:32.380
the attachment contents were mounted on the larger, cheaper hard disk. And in this way,

18:33.580 --> 18:38.060
we had, like, a MongoDB cluster with several replicas. We didn't need to use any

18:38.060 --> 18:47.060
RAID or some other kind of guarantee that the disks are working.

18:53.060 --> 18:55.060
It's all stateless.

18:55.060 --> 18:58.060
You can just, at any point,

18:58.060 --> 19:01.060
connect to any one

19:01.060 --> 19:03.060
WildDuck instance,

19:03.060 --> 19:06.060
because it doesn't matter which instance

19:06.060 --> 19:10.060
you connect to.

19:10.060 --> 19:14.060
Yeah, maybe just one thing to mention is that

19:14.060 --> 19:16.060
we use, basically, flooding.

19:16.060 --> 19:18.060
So if you

19:18.060 --> 19:21.060
change something in an email,

19:21.060 --> 19:23.060
for example, mark it as seen

19:23.060 --> 19:25.060
or add a star or whatever,

19:25.060 --> 19:26.060
or delete it,

19:26.060 --> 19:29.060
then we flood the entire cluster

19:29.060 --> 19:30.060
with this information.

19:30.060 --> 19:34.060
And if the same user has

19:34.060 --> 19:36.060
an active session

19:36.060 --> 19:38.060
on some other server,

19:38.060 --> 19:40.060
then it gets notified of this,

19:40.060 --> 19:44.060
and we use the Redis pub/sub system for the flooding.
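
NOTE
Editor's sketch of the publish/subscribe pattern behind these change notifications. It is not WildDuck code and does not talk to Redis: the Broker class is a hypothetical in-process stand-in so the example runs on its own, and the channel naming user:42 and the event fields are assumptions.
```python
from collections import defaultdict
from typing import Callable
# Hypothetical in-process broker standing in for Redis pub/sub in this sketch.
class Broker:
    def __init__(self) -> None:
        self.subscribers = defaultdict(list)
    def subscribe(self, channel: str, callback: Callable[[dict], None]) -> None:
        self.subscribers[channel].append(callback)
    def publish(self, channel: str, event: dict) -> int:
        # Deliver to every listener on the channel; return how many were reached.
        for callback in self.subscribers[channel]:
            callback(event)
        return len(self.subscribers[channel])
broker = Broker()
seen_by_server_b = []
# Server B holds an open session for user 42 and listens on that user's channel.
broker.subscribe("user:42", seen_by_server_b.append)
# Server A marks a message as seen and announces the change.
delivered = broker.publish("user:42", {"message": 1001, "action": "seen"})
# Nobody has a session open for user 7, so this event reaches no one.
ignored = broker.publish("user:7", {"message": 5, "action": "deleted"})
```
The server that made the change does not need to know where the user's other sessions live; it only publishes, and whichever servers subscribed for that user pick the event up.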

19:48.060 --> 19:51.060
Yeah, that's the API thing.

19:51.060 --> 19:52.060
Yeah, that's the,

19:52.060 --> 19:55.060
we don't have to reload the system anymore.

19:55.060 --> 19:57.060
And there also,

19:57.060 --> 19:59.060
it's just an example

19:59.060 --> 20:01.060
that making, like, an

20:02.060 --> 20:05.060
IMAP client on top of,

20:05.060 --> 20:07.060
like, an actual IMAP server is

20:07.060 --> 20:09.060
way more difficult than just

20:09.060 --> 20:11.060
fetching JSON from there,

20:11.060 --> 20:13.060
which is

20:13.060 --> 20:14.060
extremely fast,

20:14.060 --> 20:17.060
because it's coming straight from the database.

20:17.060 --> 20:18.060
And also,

20:18.060 --> 20:20.060
as we control the entire stack,

20:20.060 --> 20:21.060
we can use, like,

20:21.060 --> 20:23.060
a modern authentication system,

20:23.060 --> 20:25.060
especially in the webmail client,

20:25.060 --> 20:27.060
we can use

20:28.060 --> 20:30.060
passkeys and whatever.

20:34.060 --> 20:35.060
And maybe it's just,

20:35.060 --> 20:36.060
yeah, also to notice

20:36.060 --> 20:38.060
is that

20:38.060 --> 20:40.060
we have really good visibility

20:40.060 --> 20:41.060
into everything

20:41.060 --> 20:44.060
that's happening in the mail cluster.

20:44.060 --> 20:45.060
So we can just see

20:45.060 --> 20:47.060
from which geographic location

20:47.060 --> 20:49.060
someone is logging in,

20:49.060 --> 20:51.060
or, even more important,

20:51.060 --> 20:53.060
from which geographic location

20:53.060 --> 20:54.060
someone is trying to log in

20:54.060 --> 20:55.060
to different kinds of users,

20:56.060 --> 20:57.060
for example,

20:57.060 --> 20:59.060
just to find out if someone

20:59.060 --> 21:01.060
is attacking the login system,

21:01.060 --> 21:06.060
or all kinds of different events that happen,

21:06.060 --> 21:07.060
because everything,

21:07.060 --> 21:09.060
everything that happens,

21:09.060 --> 21:10.060
is sent to the,

21:53.060 --> 21:59.380
[inaudible] identified the person who sent the spam, so we can disable them and just route

21:59.380 --> 22:04.660
everybody else's emails to some other MTA server, and we don't have to

22:04.660 --> 22:08.980
worry that this specific IP doesn't work at all.

22:08.980 --> 22:20.780
Just, yeah, and just to finish it: what worked is MongoDB. I

22:20.780 --> 22:27.020
mean, it fits well with the MIME tree. Having everything

22:27.020 --> 22:31.540
stateless is a huge win, because you can just really easily replace,

22:31.540 --> 22:41.100
upgrade, for example, and stuff. Yeah, and deduplication saves costs, like, it

22:41.100 --> 22:46.780
decreases the costs. But the main problem is that as the database is really

22:46.780 --> 22:52.220
large, like hundreds of terabytes of data, it gets quite

22:52.220 --> 22:58.620
difficult to tune this kind of database. And also the IMAP

22:58.620 --> 23:03.180
protocol, well, has a lot of edge cases, so each email client can

23:03.180 --> 23:12.620
make some really weird commands. Yeah, that's it for me.

23:16.780 --> 23:30.460
I see that you're not storing the raw MIME; you're decomposing

23:30.460 --> 23:36.460
it. As an IMAP client, I need the original MIME. For example, what do you

23:36.460 --> 23:41.100
do if I want to verify a digital signature? Am I getting it byte-exact if

23:42.060 --> 23:47.740
you're rebuilding the MIME tree? Yeah, yeah, the question is that we

23:47.740 --> 23:57.340
basically decompile the email into a structured format, but in a lot of cases

23:57.340 --> 24:03.420
we need the original email back. So the answer is that in most cases you

24:03.420 --> 24:09.500
really get the same email, byte for byte, but not in all cases, because

24:09.500 --> 24:16.780
when we take the email apart, there's some

24:16.780 --> 24:21.980
data loss involved. It's very little, but it's still there. So,

24:21.980 --> 24:27.740
for example, with DKIM validation, we just do all the validation beforehand

24:27.740 --> 24:31.980
and store this information with the email, so when you make the API request

24:31.980 --> 24:35.100
to get the email information, you also get the

24:35.100 --> 24:38.460
authentication validation results with it.

24:38.460 --> 24:43.260
But if you would try to replicate it, then in most cases you would get the same

24:43.260 --> 24:48.620
result, but not in all cases. The reason why I'm asking is that

24:48.620 --> 24:52.860
with [unclear] email signatures I can sign without encrypting, and then I can't

24:52.860 --> 24:58.860
verify anymore; this is a problem, right? Yes.

24:58.860 --> 25:04.860
Let's move to the next question.

25:04.860 --> 25:11.020
I want to continue the question: what will it be with forwarding in both cases?

25:11.020 --> 25:17.340
[unclear] with forwarding, you continue the same

25:17.340 --> 25:23.180
[unclear] DKIM, for example, and the DKIM signature covers the body

25:23.180 --> 25:30.780
and headers, and if the order and construction [unclear]

25:30.780 --> 25:34.380
[unclear] structure [unclear],

25:35.340 --> 25:39.740
then is it the case that the DKIM signature won't work,

25:39.740 --> 25:45.020
as it was applied to the body and more, right? Yeah, the question is about

25:45.020 --> 25:49.820
forwarding and keeping the integrity, I guess.

25:49.820 --> 25:57.340
With forwarding, everything works, because the email

25:57.340 --> 26:00.940
decompiling happens when we store the email, but the forwarding

26:00.940 --> 26:04.620
happens before it. So we get an email, we see that it matches

26:04.620 --> 26:07.980
a forwarding rule, and we just forward the email, and that's it.

26:07.980 --> 26:13.420
It's still intact, the email. But then we store it on the disk,

26:13.420 --> 26:19.660
then we do the decompiling, and after this we don't do any forwarding anymore.

26:19.900 --> 26:25.740
I may have missed it, but how do you handle search?

26:25.740 --> 26:30.220
Search is handled by MongoDB's built-in text search.

26:30.220 --> 26:40.220
Okay. It was really fast anyway. Otherwise, thanks again.

