WEBVTT

00:00.000 --> 00:08.000
Well, thank you so much.

00:08.000 --> 00:14.000
So, over the last couple of years, I've been leaning more and more into a lot of, like,

00:14.000 --> 00:19.000
I don't even say more advanced, but like, beyond tables in Postgres.

00:19.000 --> 00:23.000
So, like, views and functions and triggers and stuff.

00:23.000 --> 00:28.000
And I hit a wall when I was trying to figure out the best way to migrate my schema.

00:28.000 --> 00:32.000
And so, I started to go digging into how I wanted to do that.

00:32.000 --> 00:36.000
And I ended up rolling my own.

00:36.000 --> 00:40.000
This talk is not about the tool that migrates my schema.

00:40.000 --> 00:44.000
But it's about how I was able to solve a really critical question.

00:44.000 --> 00:51.000
As part of that, which is, how do I end the tool that truly understands the Postgres schema?

00:51.000 --> 00:57.000
Not just like running the, you know, SQL files, not running migrations against it, not storing it.

00:57.000 --> 00:59.000
But like, actually understanding it.

00:59.000 --> 01:03.000
What objects exist? How do they relate? And what depends on what?

01:03.000 --> 01:05.000
So, I know I had two options.

01:05.000 --> 01:09.000
I could write an SQL parser.

01:09.000 --> 01:14.000
I could parse those, create table, create function, create view, all of that.

01:14.000 --> 01:18.000
Handle all the edge cases and keep up to date with every Postgres release.

01:18.000 --> 01:23.000
Or, option B, I could just ask Postgres.

01:23.000 --> 01:26.000
It already parses SQL. That's kind of its whole job.

01:26.000 --> 01:32.000
And so, what if I let it do all the work and ask it what it knows?

01:32.000 --> 01:36.000
And also, I really, really didn't want to write a SQL parser.

01:36.000 --> 01:40.000
Because, like, what I have wanted to parse this, I mean, there's, look at that.

01:40.000 --> 01:43.000
We've got, like, a returns table with, like, four different arguments.

01:43.000 --> 01:46.000
Some of, there's sudden array with a default up there.

01:46.000 --> 01:49.000
There's a bunch of JSONB stuff down below.

01:49.000 --> 01:51.000
And even if you thought you could, like, skip over the whole thing.

01:51.000 --> 01:54.000
Notice that there's, like, an end for the begin atomic.

01:54.000 --> 01:56.000
And then there's also an end for the case statement.

01:56.000 --> 01:59.000
I knew I would just be messing the whole thing up the whole time.

01:59.000 --> 02:02.000
So, I didn't want to go down that path.

02:02.000 --> 02:06.000
And if you look at, like, what Postgres knows about a function like that.

02:06.000 --> 02:09.000
You know, it's got the name, it's got the argument types.

02:09.000 --> 02:13.000
It's got all these names. It's got the whole tree, whether it was stable.

02:13.000 --> 02:16.000
And that's kind of the information that I want to have.

02:16.000 --> 02:19.000
So, how do I go about actually setting that up?

02:19.000 --> 02:22.000
So, I was pretty simple.

02:22.000 --> 02:25.000
And I kind of leaned on this, like, shadow database pattern,

02:25.000 --> 02:29.000
which I've, so elegantly put up here in three steps,

02:29.000 --> 02:32.000
which is, you start with actual raw SQL files,

02:32.000 --> 02:36.000
so whatever you got, one file, many files, just whatever.

02:36.000 --> 02:39.000
And then you need some form of, like, a temporary postgres instance.

02:39.000 --> 02:42.000
So, a container would make sense, but, you know,

02:42.000 --> 02:46.000
if you just have some local running instance to, like, you can use that.

02:46.000 --> 02:51.000
And then you load it all in there, and you start in the system catalogs,

02:51.000 --> 02:56.000
which is super queryable, and you can get back everything you have.

02:56.000 --> 03:00.000
Of course, this does mean that you have to have a running postgres instance,

03:00.000 --> 03:05.000
but I'm kind of assuming you already do if you're doing some work with Postgres.

03:05.000 --> 03:07.000
Or at least have an easy way to access it.

03:07.000 --> 03:10.000
So, there's a whole bunch of system catalogs.

03:10.000 --> 03:13.000
You may have explored some of these before.

03:13.000 --> 03:15.000
I think there's, like, more than 50.

03:15.000 --> 03:18.000
You know, the big ones, PG classes like the big one.

03:18.000 --> 03:20.000
Anything that's, like, a table or, like, table,

03:20.000 --> 03:23.000
S, like, has columns or anything like that.

03:23.000 --> 03:25.000
The attributes or the, you know, columns on there,

03:25.000 --> 03:30.000
I'm not going to be this entire slide, but there's a lot of good stuff in there.

03:30.000 --> 03:34.000
And so, you can start, you know, querying them,

03:34.000 --> 03:36.000
and so you have a little table here, and you say, okay,

03:36.000 --> 03:39.000
what's going to be, you know, what are my columns, what are the defaults,

03:39.000 --> 03:42.000
and, you know, it takes a lot of reading the Postgres docs,

03:42.000 --> 03:46.000
but you can really start to understand some pretty rich details about

03:46.000 --> 03:47.000
about what you're looking at.

03:47.000 --> 03:50.000
And what's really cool is, you know, you're not even looking at the

03:50.000 --> 03:52.000
Create statement that would have created this table,

03:52.000 --> 03:55.000
but, like, I'm sure you can visualize in your head, like,

03:55.000 --> 03:56.000
what it would have looked like.

03:56.000 --> 03:59.000
I think everybody would have had slightly different formatting of what that

03:59.000 --> 04:01.000
Create statement would look like, but the, you know,

04:01.000 --> 04:03.000
the overall vibes are still there.

04:03.000 --> 04:06.000
And so, then if you've probably already done stuff like this before,

04:06.000 --> 04:07.000
so I won't belabor it.

04:07.000 --> 04:11.000
But the interesting bit is when we start getting into dependencies.

04:11.000 --> 04:13.000
Because order matters.

04:13.000 --> 04:15.000
You've probably seen something like this before.

04:15.000 --> 04:17.000
You go to, you know, drop a table, which, you know,

04:17.000 --> 04:19.000
we all do in production all the time.

04:19.000 --> 04:23.000
And Postgres says no, because it's a bunch of stuff depends on it.

04:23.000 --> 04:26.000
Or you try to, like, drop a type, because you want to, you know,

04:26.000 --> 04:27.000
do something with it.

04:27.000 --> 04:31.000
And it, again, says no, there's a whole bunch of dependencies on it.

04:31.000 --> 04:35.000
And so, if I'm trying to generate some TDL for, like,

04:35.000 --> 04:39.000
migrations or scheme a dump or whatever, that order really, really matters.

04:39.000 --> 04:43.000
And you can't create a view that references a function that doesn't exist yet.

04:43.000 --> 04:47.000
And I can't drop a table that a function still needs without using a cascade,

04:47.000 --> 04:49.000
but it's a little scary.

04:49.000 --> 04:52.000
And so, you know, we have this little dependency graph here,

04:52.000 --> 04:54.000
but how do we get it?

04:54.000 --> 04:57.000
Well, it turns out Postgres has the graph,

04:57.000 --> 04:59.000
and it's in something called PG-depend.

04:59.000 --> 05:03.000
And so, you know, Postgres wasn't just complaining

05:03.000 --> 05:05.000
retroactively when you try to drop things.

05:05.000 --> 05:08.000
It is storing the entire dependency graph in PG-depend.

05:08.000 --> 05:15.000
Every dependency is a row in this kind of view here that I've kind of adjusted a little bit,

05:15.000 --> 05:19.000
because those should be object IDs, and I put the names in there for readability here.

05:19.000 --> 05:25.000
But you can see that the pending order account function depends on the pending order's view,

05:25.000 --> 05:28.000
and the pending order's view depends on the order's status,

05:28.000 --> 05:29.000
and that's it.

05:29.000 --> 05:33.000
That's our whole chain, and we can look right there.

05:33.000 --> 05:36.000
And so, from there, to figure out some ordering,

05:36.000 --> 05:40.000
the graph gives you all the edges, and then you can just do a little topological sort.

05:40.000 --> 05:42.000
You go one direction if you need to do pre-ates.

05:42.000 --> 05:45.000
You go the other direction if you need to do some drops.

05:45.000 --> 05:50.000
However, there are a few nuances that you should be aware of.

05:50.000 --> 05:54.000
There was a column a couple slides ago that I kind of glossed over,

05:54.000 --> 05:56.000
and that was this dependency type.

05:56.000 --> 05:59.000
And there's more than just the four I have labeled here,

05:59.000 --> 06:02.000
but I'll go through them kind of quickly.

06:02.000 --> 06:04.000
So N is for normal.

06:04.000 --> 06:06.000
These are like the real dependencies between the objects.

06:06.000 --> 06:08.000
You created something, it depends on something else.

06:08.000 --> 06:11.000
You know, the kind of examples I gave before.

06:11.000 --> 06:14.000
The views depend on a function, it depends on table, whatever.

06:14.000 --> 06:17.000
Now the next one to start getting a little more interesting.

06:17.000 --> 06:18.000
A is automatic.

06:18.000 --> 06:22.000
Those are going to be like other objects that you could drop yourself,

06:22.000 --> 06:24.000
but they're created together.

06:24.000 --> 06:27.000
So like if you have a serial column, it creates a sequence,

06:27.000 --> 06:31.000
or there's a primary key has an index.

06:31.000 --> 06:34.000
So they're created together and they're kind of linked.

06:34.000 --> 06:36.000
And then there's I for internal.

06:36.000 --> 06:39.000
This is going to be stuff like your toast or some of the array types,

06:39.000 --> 06:42.000
which we'll get to in a moment, and then ease our extensions.

06:42.000 --> 06:47.000
And so like if you're querying something and the extension created a function,

06:47.000 --> 06:51.000
then you'll see that here.

06:51.000 --> 06:54.000
And so you start kind of going through these,

06:54.000 --> 06:57.000
and there are some catches, though.

06:57.000 --> 07:02.000
And so PG depend can't see stuff that's in string literals.

07:02.000 --> 07:07.000
By default, Postgres will enforce that the stuff exists at create time.

07:07.000 --> 07:11.000
But you generally can't create a function referencing a table that doesn't exist,

07:11.000 --> 07:13.000
but the dependency agent actually tracked.

07:13.000 --> 07:18.000
You could drop the table later and like PG depend won't care.

07:18.000 --> 07:22.000
And so for some of this stuff, there's really easy fix.

07:22.000 --> 07:24.000
If you're somewhat recent version of Postgres,

07:24.000 --> 07:26.000
which is just used begin atomic.

07:26.000 --> 07:30.000
And then it actually parses the whole query and stores it,

07:30.000 --> 07:32.000
and then it knows all of those things.

07:32.000 --> 07:35.000
It doesn't just sort as a string.

07:35.000 --> 07:40.000
So now we know that the function body depends on a whole bunch of things.

07:40.000 --> 07:44.000
That only works if you're using language SQL.

07:44.000 --> 07:49.000
So if you need PLPGSQL or any of the other things for any other reasons,

07:49.000 --> 07:51.000
then it's back to storing as a string.

07:51.000 --> 07:53.000
And so there's no fix for that.

07:53.000 --> 07:56.000
So you should use bigototomic, whether you can,

07:56.000 --> 07:59.000
but for everything else, you know, know the limits.

07:59.000 --> 08:02.000
And then I started to talk about this a little bit when we're talking about the dependency types.

08:02.000 --> 08:05.000
But there's implicit objects.

08:05.000 --> 08:08.000
So in this example here, you create a table.

08:08.000 --> 08:10.000
We have this serial primary key.

08:10.000 --> 08:13.000
And so behind the scenes, it's creating a constraint.

08:13.000 --> 08:16.000
It's creating an index and it's creating that sequence.

08:16.000 --> 08:18.000
And those are those dependency type A.

08:18.000 --> 08:22.000
At the end of the day, PG depend is mostly just that they're to yell at you

08:22.000 --> 08:24.000
if you try to drop something.

08:24.000 --> 08:30.000
And so these are A's, which means it'll just drop the other things if you do it won't stop you.

08:30.000 --> 08:33.000
But it's relevant information.

08:33.000 --> 08:35.000
And then there's array types.

08:35.000 --> 08:37.000
And these ones have a little bit of a different quirk.

08:37.000 --> 08:42.000
So like every type, you get as a array sibling.

08:42.000 --> 08:48.000
So if you create this order status, what you're actually going to see in the type catalog is you've

08:48.000 --> 08:49.000
loaded status.

08:49.000 --> 08:51.000
And then it's generally prefix with an underscore.

08:51.000 --> 08:54.000
But I think there's some sort of like collision detection things.

08:54.000 --> 08:55.000
So maybe it's different.

08:55.000 --> 08:59.000
Definitely don't like do like if it starts with an underscore thing.

08:59.000 --> 09:00.000
That's that's bad.

09:00.000 --> 09:02.000
But it creates these two.

09:02.000 --> 09:05.000
And that's that internal dependency type.

09:05.000 --> 09:14.000
And so what's interesting is now if I create a table that has something that uses that array type.

09:14.000 --> 09:17.000
Now I have to make two different jumps on PG depend.

09:17.000 --> 09:22.000
Because technically the table depends on that internal array type.

09:22.000 --> 09:29.000
And then internal array type looks at the actual type.

09:29.000 --> 09:33.000
So you've got to be able to follow those jumps there.

09:33.000 --> 09:39.000
And it's a somewhat similar story depending on what you need to do with the extension types as well.

09:39.000 --> 09:44.000
And so you also want to be a little bit honest with some of the trade-offs with this approach.

09:44.000 --> 09:46.000
So you do need money postgres.

09:46.000 --> 09:48.000
We talked about that one already.

09:48.000 --> 09:53.000
You do need to build support for like every object type explicitly.

09:53.000 --> 10:02.000
There's a lot of catalog tables and there's a lot of different things that you might not know about for all the different types of objects.

10:02.000 --> 10:07.000
So you kind of have to actually read the docs and codify all those things.

10:07.000 --> 10:08.000
So like this isn't magic.

10:08.000 --> 10:11.000
You're still offloading all the parsing to postgres.

10:11.000 --> 10:15.000
But you still need to build the queries that understand all those things.

10:15.000 --> 10:20.000
The postgres matters. Postgres versions do matter catalogs do evolve.

10:20.000 --> 10:30.000
They don't evolve as quickly as the I would say like the actual like language like syntax does.

10:30.000 --> 10:33.000
But things do change like that as well body.

10:33.000 --> 10:36.000
Does require a later version of postgres.

10:36.000 --> 10:40.000
And so sometimes we'll need to do some switching there if you try to support multiple versions.

10:40.000 --> 10:45.000
And then most importantly and this kind of relates to all these points you just review whatever out.

10:45.000 --> 10:52.000
But you have don't try to do anything super magical because there might be some feature that you didn't know existed and so you're not looking at.

10:52.000 --> 10:55.000
So don't just blindly trust it because postgres is doing the parsing.

10:55.000 --> 10:59.000
Like you still have to do some critical thinking of your own.

10:59.000 --> 11:05.000
So yeah overall we've got the shadow database let postgres to the parsing.

11:05.000 --> 11:07.000
We've got to clear the catalogs.

11:07.000 --> 11:15.000
Look through all the catalogs read all the docs implemented all PG depend is your friend for basically anything that requires ordering ever.

11:15.000 --> 11:22.000
So that's going to be you know the graph you can filter on the dependency types to know which ones you care about and how you want to treat them.

11:22.000 --> 11:28.000
Definitely know the blind spots and watch out for those implicit objects and then sort the whole thing.

11:28.000 --> 11:34.000
And so to kind of talk about the thing that IT's in the beginning I did build a tool that used all of this.

11:34.000 --> 11:41.000
This was my migration tool that needed to have really intimate knowledge of all of the different objects in the tree.

11:41.000 --> 11:49.000
So again this isn't to talk about that but it is schema as code for postgres so you add to the SQL files.

11:49.000 --> 11:55.000
We generate the diff from there and then let you review it and do everything you'd want to do there.

11:55.000 --> 12:02.000
And so you can find it on GitHub or the little website there you know give me a star or an issue or maybe even a PR.

12:03.000 --> 12:17.000
But of course all of these things we just talked about aren't just for migrations you could use it for drift detection some sort of schema lenders CI validation you know visualizations are always super fun.

12:17.000 --> 12:27.000
And that is so nervous about fitting under 20 minutes that I was able to get under 15 so that's the whole idea.

12:27.000 --> 12:39.000
Postgres knows your schema it parses SQL that's its whole point tracks dependencies and if you understand the types and columns then you don't have to reimplement it just ask postgres.

