WEBVTT

00:00.000 --> 00:11.960
Well, hello everyone, my name is Jacomo. I'm an Italian developer. I do quite like

00:11.960 --> 00:17.200
function programming though that is not the topic of today. I'm helping build what I think

00:17.200 --> 00:21.000
is a very cool open source project it's called gleam. It's a programming language. If you're

00:21.000 --> 00:24.880
curious about it, let's have a chat about it later. But we're here to talk about data

00:24.880 --> 00:30.400
phases today. Talking about data phases, of course, maybe I don't know our marketing

00:30.400 --> 00:36.640
department or whatever marketing advertisements, they want us to make beautiful graphs

00:36.640 --> 00:41.280
and crunch data because of course they do. They need that and so it's up to us to then

00:41.280 --> 00:46.400
go fetch the data and figure out all the data crunching and data wrangling that we need.

00:46.400 --> 00:51.360
Or maybe we're just building something for our friends. Well, I do love reading. I have a

00:51.440 --> 00:55.680
book club with my friends and of course being a nerd knowing how to code. I had to do

00:55.680 --> 01:00.160
something with all the data that I was gathering from them. And so here we are. We have books,

01:00.880 --> 01:05.920
they have genres, we rate them and so on and so forth. This is really just a simple example.

01:05.920 --> 01:10.400
But it doesn't really matter what you are building because when you are dealing with a

01:10.400 --> 01:15.360
database, well, that will come with the age or question of how do I talk to the database?

01:15.360 --> 01:20.400
How do I get the data out? How do I do the data wrangling? And answers might take

01:20.400 --> 01:25.840
entirely different looks. On one end of the spectrum, we might be writing sequel totally fine.

01:25.840 --> 01:31.840
At the other end, we might do something quite opposite. We want to abstract it away to even hide

01:31.840 --> 01:37.680
the fact that sequel is there. Using fancy tools like what amps, that shield the developer

01:37.680 --> 01:42.400
from ever having to write any line of sequel. That can be pretty cool as well.

01:43.200 --> 01:47.840
And usually as developers, when we are faced with this kind of choices, we like saying that

01:47.840 --> 01:53.200
a developer should really be a polyglot. We should be using the right language, the right tool

01:53.200 --> 01:58.640
for the job. We like that because that allows us to then advocate for the language we prefer.

01:58.640 --> 02:04.560
Of course, we should be using Java or Python. Oh, oh, not really the Java audience. I see.

02:04.560 --> 02:09.120
Okay, all gleam, for example, because that is a great language and we should be using that.

02:09.840 --> 02:14.400
But, well, it doesn't really matter. You could put any language in there. I think that

02:14.480 --> 02:20.080
they're right tool for the job. When it comes to talking to a database, well, that is sequel.

02:20.080 --> 02:26.960
That is the language meant to be talking to the database. So, it does some really nice advantages.

02:26.960 --> 02:31.040
I think this is quite the nice crowd to give this talk to because I want to find much push

02:31.040 --> 02:37.600
mixing that sequel is good. It makes us more intentional about how we are implementing our specific

02:37.680 --> 02:43.120
workflows because we can see that we're implementing something using what one, a pair,

02:43.120 --> 02:48.800
tens, hundreds or different queries and then we can see that we're doing something that is inefficient.

02:48.800 --> 02:54.240
Usually, we want to do as much as possible with this little round trips to the database as possible

02:54.240 --> 02:59.200
and being the ones that are writing the queries rather than having them generated under the

02:59.200 --> 03:06.480
route as an implementation detail puts us in the control spot where we can decide how to do something.

03:06.560 --> 03:11.360
So, it's also a lot harder to overfetch data because we are the ones who need to be writing select

03:11.360 --> 03:16.880
star overfetching the data. We can catch that pretty easily. I was reading our

03:16.880 --> 03:24.480
cool article, scaling Bosque's URL to power 800 million users and this company was facing

03:25.200 --> 03:30.640
most of its problems with some problematic queries coming from algorithms and they ended up having

03:30.640 --> 03:35.920
to then inspect those and see what was going on understanding what was going on under the wood.

03:36.640 --> 03:41.200
So, not really an implementation detail at this point when we need to have a look at it.

03:42.240 --> 03:49.280
So, if using SQL and not saying it's going to magically make our applications faster,

03:49.280 --> 03:54.720
but it kind of drives us towards this bit of success, what it's easier to do the right thing

03:54.720 --> 03:59.040
to not make something that is terribly and woeful is low without even noticing.

03:59.920 --> 04:07.440
So, if SQL is so good, then where is all the SQL? Why do we feel like we need different tools?

04:07.440 --> 04:12.800
Why do we feel like as developers we need to abstract it away? Well, when it comes to writing

04:12.800 --> 04:17.280
an application, we usually do that in a different language. This is what it might look in

04:17.280 --> 04:21.440
a gleam, for example, to run a query, it doesn't really matter how it does it. If you squint hard

04:21.440 --> 04:26.080
enough, it would be exactly the same in say Java, for example, seeing that you love it.

04:26.800 --> 04:32.400
So, what happens is our query becomes a string in our programming language.

04:32.400 --> 04:38.320
One first drawback is we lose syntax a lighting and then we need a way to bridge this gap

04:38.320 --> 04:43.920
between the relational wood where we have tables with rows and columns and the ward of our

04:43.920 --> 04:50.400
application where we might have objects, data structures or whatever we might use to then represent

04:50.400 --> 04:57.760
some domain concept. And so, we need to tell it that for example, here the first column

04:57.760 --> 05:02.880
ISBN that is a string and we're going to use that as the book I just began indeed.

05:02.880 --> 05:08.000
The second column is again a string that is the title and then we're using it to build this

05:08.000 --> 05:14.240
book data structure. It doesn't really matter how gleam does it. And this might look totally

05:14.240 --> 05:19.840
reasonable as long as we don't start writing more than just the simplest of the queries.

05:19.920 --> 05:23.840
Because of course, then requirements change, maybe we need to fetch more data.

05:23.840 --> 05:29.120
Maybe someone comes and does it refructuring, they switch columns around in our select query.

05:29.120 --> 05:35.280
Well, that might look silly, but I guarantee you if something can happen, it will happen sooner or later.

05:35.280 --> 05:40.320
And then now we have some really hard to find bugs because the decoder we've written saying that

05:40.320 --> 05:46.000
the first column is the, I just began and the second is the title is no longer right. And when do we

05:46.000 --> 05:51.280
start seeing these errors pop up? Well, if we've been diligent and we've written tests,

05:51.280 --> 05:55.760
those will happen in our test suit and we'll see something fail, hopefully, if we have a good

05:55.760 --> 06:01.200
coverage. Otherwise, those are going to happen in production, where things suddenly go wrong

06:01.200 --> 06:06.240
and then we get page and we need to figure out why something broke. Not really a pleasant developer

06:06.240 --> 06:11.440
experience. So we start from our place where we're thinking about the data that we need and really

06:11.440 --> 06:15.840
what we want to do is we want to fetch and use it, that's it. Of course, we need to write some

06:15.840 --> 06:21.360
SQL to do that, but now going from SQL to fetching the data is not a simple because we need to

06:21.360 --> 06:26.960
write some glue code that decoding logic that bridges the gap between these two words. And then

06:26.960 --> 06:32.240
finally we use it but glue code while it's in the name, it's sticky. It calls for more glue code

06:32.240 --> 06:36.960
and then we find bugs and we need to go back to writing SQL and then we need to write again

06:36.960 --> 06:44.400
some more glue code. It's really painful. No one wants to do that. So what should we do? Should we

06:44.480 --> 06:50.160
just give up? Except that as now I said SQL is, well, it's not the right language when building

06:50.160 --> 06:55.840
real-world applications. Well, of course, not, you know, the title of the talk. You do not need

06:55.840 --> 07:04.160
an ORM. So SQL could be the right language for the job. What would it look like then to embrace

07:04.160 --> 07:10.560
SQL rather than trying to abstract it away? Treat it as an implementation detail. Well, what I would

07:10.560 --> 07:16.400
like to show you? What if, sorry, calling a query was as simple as calling a function or a

07:16.400 --> 07:20.480
method if you're doing object oriented because then we know how to do that in our language of

07:20.480 --> 07:25.120
choice. Whatever that is, we deal with the data that is coming from the database. We know how to

07:25.120 --> 07:31.920
coding programming language x that we like. So I would like to show you a library that tries to

07:31.920 --> 07:36.160
do exactly that. It's called the SQL and the author of this library. Really, it's not the

07:36.160 --> 07:40.000
first of its kind. There's blends of libraries that do this in many different languages.

07:40.640 --> 07:46.560
This is thought for glean though. And I think the best way to get a sense for it because this

07:46.560 --> 07:50.320
might sound a bit and maybe what does it look like to embrace SQL and have a nice developer

07:50.320 --> 07:55.600
experience? Well, it's for with a live jammer, which is quite scary to me if someone could hold

07:55.600 --> 08:05.600
the microphone. So I don't, can I just, okay. Thank you. So let's jump right in. So here we have

08:06.080 --> 08:13.840
a project. And what it means to embrace SQL is we're writing our SQL queries just in plain

08:13.840 --> 08:20.160
old dot SQL files. As boring as it can be, but as nice as it can be. Because now we get, well,

08:20.160 --> 08:25.760
since accelerating, we get the output of IDEs. We might be using an entirely different one. Here,

08:25.760 --> 08:30.640
I'm using, for example, data grip. I think it's quite pleasant. It could be anything really.

08:30.640 --> 08:35.600
So here we're building a ranking of books. So we are giving a number to each book based on its

08:35.600 --> 08:40.480
rating. Number one is going to be the best and so on and so forth. Say we want to make this a

08:40.480 --> 08:46.640
tad more complex. We can turn it into a sub query, call it ranking. And only get the best

08:46.640 --> 08:52.000
three books, for example. So where the position is less than or equal to three. Maybe we want

08:52.000 --> 08:58.880
only the covers of the best three books for each genre. So let's group by genre. I don't think

08:58.880 --> 09:03.520
a sub query is all that nice to read. Let's make it a commentable expression. Much better.

09:03.520 --> 09:08.240
Now we see that we have a warning. Select star. Avoid that because that is all the fetching

09:08.240 --> 09:14.000
data. We never want to do it. We want to be precise about the data that we send to our application.

09:14.000 --> 09:22.800
So let's just select the genre and we want to do an aggregation. So a re aggregation of the

09:22.800 --> 09:30.320
covers. And just like that, we have a cover image, maybe. As covers. And just like that, we have

09:30.320 --> 09:39.120
something really nice. And basically the idea was writing this for us. We have refactorings.

09:39.120 --> 09:43.680
We have water completions. We can move code around pretty easily. SQL has been around for a long

09:43.680 --> 09:47.760
time. And these tools have been developed to making the experience of writing it really nice.

09:48.560 --> 09:54.800
But again, this is just a SQL query. So how do we actually run this? Then from our application,

09:54.800 --> 10:00.400
from our Glim code in this case? Well, to do that, you would add the squiro library as a dependency.

10:00.400 --> 10:07.760
I've already done that because I don't trust the Wi-Fi. And then we can just let run the

10:07.760 --> 10:13.680
squiro entry point. And just like that, you can see that something has happened. It says generated

10:13.680 --> 10:20.560
one query. So squiro is doing under the hood. Just some pretty straightforward code generation.

10:20.560 --> 10:24.960
I'm saying straightforward. It looks straightforward. It's actually pretty involved. But at the end

10:24.960 --> 10:31.840
of the day, what really matters is that what it does is it's understanding our queries, talking to the

10:31.840 --> 10:37.440
database and figuring out what is the shape of the data that we're getting back. So that it can

10:37.440 --> 10:42.320
generate those decoders that glue code that now we no longer own because it's taken care of by

10:42.320 --> 10:48.640
squiro. So it can see, for example, that the squiri with written returns rows that contain two columns.

10:48.640 --> 10:53.680
The genre column that is a string and the covers column look at that. That is a list of strings.

10:53.680 --> 10:58.880
We did an aggregation and it could figure that out pretty right. And then it defines a function

10:58.880 --> 11:05.440
to just call that simple query. And it is exactly the code I showed you earlier. Nothing fancy going on

11:05.440 --> 11:11.360
here. But now it is the one that is taking care of actually making sure that this is right

11:11.360 --> 11:17.840
and stays in sync with our database. So what this looks like when then we are developing our

11:17.840 --> 11:24.480
application is we write the SQL query and then just like that we can start using it. So we can call

11:24.480 --> 11:31.520
the function and we're good to go. We don't have to deal with any other really boring glue code

11:31.520 --> 11:37.360
and maintaining code that is really not important. And since now everything is nice and type safe,

11:37.360 --> 11:41.920
we can get the best tooling out that we want from our language. In the game, for example,

11:41.920 --> 11:48.240
I can automatically match on all the cases we see when talking to our database. We could get an error.

11:48.240 --> 11:52.640
Well, I will not bother with that. Here I'm making something pretty straightforward that I could

11:52.640 --> 11:58.240
manage in our live demo. Let's pretty short. We are just server side rendering some HTML.

11:58.240 --> 12:04.320
So what we could do is we make a list and then we transform those rows that are coming from the

12:04.320 --> 12:14.000
database. So we get those rows and we turn those into maybe a list item. So now this doesn't

12:14.000 --> 12:18.960
exist obviously, but since the language and the language server now can understand what we're

12:18.960 --> 12:23.840
dealing with, we get a lot of help from the language server. We can generate this automatically.

12:23.840 --> 12:28.960
We can automatically put a method to get the data out. We can then start writing some more code

12:28.960 --> 12:36.480
making the list item. We said we were making. So let's build maybe a title with the text that we want,

12:36.480 --> 12:43.760
the genre, for example. And then another list, a list of covers. So let's do that.

12:46.160 --> 12:52.080
So here we can transform those again. We can get those covers. We see we get her those are a list of

12:52.080 --> 12:57.440
strings. And we turn each cover into an image. Maybe to make something that is nice to look at.

12:57.520 --> 13:04.080
We generate it again and we're good to go. So one final list item and this contains

13:04.880 --> 13:16.880
an image. So let's do this source. And that is that cover that we have. And just like that,

13:16.880 --> 13:22.480
we're good to go. We could just focus on what we wanted making a nice to look application.

13:22.480 --> 13:28.880
And so what we have here, if we run it, maybe let's see if it works. Pingers crossed.

13:29.680 --> 13:35.520
Perfect. Seems like it started. And look at that. Okay, pretty right for what I could manage in our

13:35.520 --> 13:40.800
HTML. And we have the books that we're fetching from the live database that is running on my machine.

13:42.800 --> 13:50.160
Okay, let's get back to our presentation. So we have a really nice developer experience here

13:50.240 --> 13:55.040
because we're dealing with just plain old SQL files. And now also think about this. The data

13:55.040 --> 13:58.960
that you have is going to help to leave any application that you're going to build on top of it.

13:58.960 --> 14:03.280
So if you get tired about squirrel and gleam, that would make me really sad. But you can switch

14:03.280 --> 14:08.800
to any other language, take those SQL files, and you're good to go. You're not getting locked in

14:08.800 --> 14:13.920
by a specific language, a specific framework that then you have to keep using to reuse that code.

14:14.560 --> 14:19.040
Errors start happening at build time. The feedback loop is really tight. If we make a type

14:19.040 --> 14:24.000
or an query, well, we're going to see the error right in our terminal. As we generate the code,

14:24.000 --> 14:28.880
hopefully you can appreciate the effort that I put in to making that look nice and presentable.

14:30.320 --> 14:35.840
And so now it's a lot easier to deal with that. We started from a really bad place where to do

14:35.840 --> 14:39.760
with a lot of glue code. And to add them, don't get me wrong. They can have a great developer

14:39.760 --> 14:44.080
experience because they take care of those two steps below. Don't care about the SQL,

14:44.080 --> 14:49.040
you don't have to care about the glue code. Pretty happy, pretty right. But I think that obstructing

14:49.040 --> 14:55.040
the SQL away is a recipe for applications that are slower, not as performant. So what if we want

14:55.040 --> 15:00.800
to embrace SQL? Well, we can just let tools do the boring job. Take care of that glue code

15:00.800 --> 15:07.360
for us automatically. And so we can stick to the best language for the job at each step of the

15:08.320 --> 15:12.160
development. We write SQL when talking to the database. And then we use the language that we

15:12.160 --> 15:17.520
know and love, Glim for me, or whatever other language you prefer, when you're building your

15:17.520 --> 15:24.240
application. And so that is why you do not need an ORM. My name is Jacomo. I thank you so much for

15:24.240 --> 15:27.440
coming here today.

15:27.440 --> 15:39.200
If I can still 30 more seconds, please consider sharing your feedback. If you like this,

15:39.200 --> 15:43.760
let me know that would make my day. If you didn't, let me know how I can improve this. So I can

15:43.760 --> 15:49.280
make this better. If you want to keep in touch, I am on social media. You can send me a good

15:49.280 --> 16:02.640
or fashion email, or you can, of course, this is open source. You can have a look at Squirrel.

