WEBVTT

00:30.000 --> 00:41.000
All right, can you guys hear me?

00:41.000 --> 00:45.000
All right, so let's get started.

00:45.000 --> 00:49.000
Please, we have only a couple of minutes

00:49.000 --> 00:53.000
because we are going to present here with Kevin

00:53.000 --> 00:58.000
on contribution processes for MariaDB

00:58.000 --> 01:01.000
and for Postgres.

01:01.000 --> 01:06.000
So I'm going to try to be really quick

01:06.000 --> 01:08.000
and then give the floor to Kevin here.

01:08.000 --> 01:11.000
Okay, so let's get going.

01:11.000 --> 01:14.000
I want to walk you through the MariaDB

01:14.000 --> 01:16.000
server contribution process here

01:16.000 --> 01:20.000
and I want to show you a live example.

01:20.000 --> 01:23.000
So first of all, why?

01:23.000 --> 01:27.000
Basically contributing to database code

01:27.000 --> 01:29.000
bases is hard stuff.

01:29.000 --> 01:33.000
I remember my first contribution to the code base

01:33.000 --> 01:37.000
of the MariaDB server took three months to review.

01:37.000 --> 01:40.000
So it's not trivial.

01:40.000 --> 01:44.000
And that's why I want to show you some support

01:44.000 --> 01:47.000
and I want to show you how it's done basically

01:47.000 --> 01:51.000
so that it's not as intimidating as it might be.

01:52.000 --> 01:55.000
Right, so first of all, let's get the facts.

01:55.000 --> 01:58.000
The MariaDB server code

01:58.000 --> 02:00.000
base is a proper GitHub repository.

02:00.000 --> 02:02.000
It has some extensions.

02:02.000 --> 02:05.000
It has a, like, contributor license agreement

02:05.000 --> 02:08.000
bot, which checks your pull request,

02:08.000 --> 02:10.000
and then there is a Buildbot

02:10.000 --> 02:14.000
which also tests for regressions, whatever you submit.

02:14.000 --> 02:18.000
And you also need a Jira for larger submissions.

02:18.000 --> 02:22.000
This is basically to describe what you are trying to do

02:22.000 --> 02:25.000
in a better way than just a commit message.

02:25.000 --> 02:28.000
Because, well, as I said, it's complex stuff

02:28.000 --> 02:31.000
so it needs better explanation sometimes.

02:31.000 --> 02:35.000
So how to pick a task to work on.

02:35.000 --> 02:39.000
Basically the best advice that I can give you

02:39.000 --> 02:42.000
is pick something that is important to you.

02:42.000 --> 02:45.000
When people work on scratching their own

02:46.000 --> 02:48.000
itch, so to say,

02:48.000 --> 02:50.000
It's, they are most productive

02:50.000 --> 02:55.000
and they feel more accomplished when they achieve something.

02:55.000 --> 02:57.000
If you are wondering what to do,

02:57.000 --> 02:59.000
there is a list of beginner-friendly tasks

02:59.000 --> 03:02.000
in our contributing document.

03:02.000 --> 03:04.000
So take a look at those.

03:04.000 --> 03:06.000
They can get you started.

03:06.000 --> 03:08.000
There are interesting things there.

03:08.000 --> 03:11.000
Another way of finding contributions

03:11.000 --> 03:14.000
is to basically engage online with other community members

03:14.000 --> 03:18.000
or check Jira for open issues that are appealing.

03:18.000 --> 03:21.000
And then, last but not least,

03:21.000 --> 03:24.000
the MariaDB project participates in the Google

03:24.000 --> 03:26.000
Summer of Code program.

03:26.000 --> 03:30.000
So that's a huge incentive for people working

03:30.000 --> 03:33.000
on open source contributions.

03:33.000 --> 03:35.000
So, right.

03:35.000 --> 03:38.000
This is the contribution process in a nutshell.

03:38.000 --> 03:40.000
So basically you clone or fork

03:40.000 --> 03:43.000
the MariaDB server repository.

03:43.000 --> 03:45.000
You work on your feature.

03:45.000 --> 03:48.000
And then you submit the pull request from your branch.

03:48.000 --> 03:52.000
And then you make sure that the build passes okay.

03:52.000 --> 03:54.000
All the regression tests are good.

03:54.000 --> 03:56.000
Then you get a review.

03:56.000 --> 03:59.000
So basically you get a preliminary review by yours

03:59.000 --> 04:01.000
truly here.

04:01.000 --> 04:04.000
I try to do that as fast as possible

04:04.000 --> 04:06.000
when the PR appears.

04:06.000 --> 04:08.000
And then we get a final review by an actual developer

04:08.000 --> 04:10.000
owning the code base.

04:10.000 --> 04:13.000
And then make sure to have the final reviewer

04:13.000 --> 04:17.000
also push your change because sometimes

04:17.000 --> 04:20.000
they just approve it and they don't merge the PR,

04:20.000 --> 04:23.000
which is a problem, right.

04:23.000 --> 04:24.000
Okay.

04:24.000 --> 04:29.000
And for the case study that I promised,

04:29.000 --> 04:32.000
it's probably not very visible,

04:32.000 --> 04:36.000
but it is ten lines of actual code.

04:36.000 --> 04:39.000
Not a huge contribution, really.

04:39.000 --> 04:42.000
But it is something tangible.

04:42.000 --> 04:45.000
It was, as I said, based on a Jira,

04:45.000 --> 04:49.000
which describes what was the problem and what

04:49.000 --> 04:51.000
was the fix for it.

04:51.000 --> 04:55.000
Then it also got great test coverage.

04:55.000 --> 05:00.000
All the regression tests were fine and everything.

05:00.000 --> 05:03.000
And then the CLA was signed.

05:03.000 --> 05:07.000
It got the proper treatment on our end.

05:08.000 --> 05:10.000
There was an active conversation.

05:10.000 --> 05:11.000
I don't know if you see that,

05:11.000 --> 05:14.000
but there are 32 messages going back and forth

05:14.000 --> 05:16.000
for those ten lines of code.

05:16.000 --> 05:21.000
So I'm really impressed by the engagement

05:21.000 --> 05:26.000
that the developer showed in that particular case.

05:26.000 --> 05:30.000
There were also several iterations,

05:30.000 --> 05:33.000
several versions of this thing were

05:33.000 --> 05:36.000
published and presented by the contributor.

05:36.000 --> 05:39.000
And finally, finally, it got merged

05:39.000 --> 05:45.000
by a very senior developer in the MariaDB community, apparently.

05:45.000 --> 05:48.000
So yeah, it's not hard.

05:48.000 --> 05:50.000
You just need to follow the steps

05:50.000 --> 05:56.000
and you get your name into the contributors list, basically.

05:56.000 --> 06:01.000
So, takeaways: feel free to ask me,

06:01.000 --> 06:04.000
and communicate thoroughly, communicate often,

06:04.000 --> 06:07.000
and be responsive to requests from reviewers

06:07.000 --> 06:09.000
and be nice and don't panic.

06:09.000 --> 06:13.000
So those are my contact data

06:13.000 --> 06:16.000
if you want to talk to me please do.

06:16.000 --> 06:20.000
And with that, I guess I will give the floor to Kevin.

06:20.000 --> 06:33.000
Am I audible?

06:33.000 --> 06:35.000
Thank you.

06:50.000 --> 07:02.000
Okay, cool.

07:02.000 --> 07:07.000
So thanks, Georgie, for the overview of the MariaDB code base.

07:07.000 --> 07:09.000
So now I'm going to give a similar insight

07:09.000 --> 07:13.000
but into the Postgres code, like the contribution process.

07:13.000 --> 07:15.000
I'm going to give it from the perspective of someone

07:15.000 --> 07:18.000
who's done it just a few months ago for the first time.

07:18.000 --> 07:20.000
I'm relatively early in my career.

07:20.000 --> 07:23.000
And I just want to talk about the Postgres contribution process,

07:23.000 --> 07:25.000
what my personal experience was,

07:25.000 --> 07:27.000
and some takeaways for how it works overall,

07:27.000 --> 07:30.000
and maybe you guys can also do it in the future.

07:30.000 --> 07:32.000
My name is Kevin.

07:32.000 --> 07:35.000
So to begin with, I can talk about how my interest

07:35.000 --> 07:39.000
in Postgres began because it was not like a typical way.

07:39.000 --> 07:42.000
So my first job out of college in 2023

07:42.000 --> 07:44.000
was at a company called PeerDB.

07:45.000 --> 07:47.000
PeerDB is an open source ETL tool,

07:47.000 --> 07:48.000
or rather a CDC tool,

07:48.000 --> 07:51.000
that moves data from Postgres to multiple destinations,

07:51.000 --> 07:53.000
including ClickHouse.

07:53.000 --> 07:55.000
PeerDB uses logical replication,

07:55.000 --> 07:58.000
which is the Postgres feature to read changes

07:58.000 --> 08:01.000
as they happen on Postgres.

08:01.000 --> 08:04.000
And I think Rohit and, like, the PlanetScale

08:04.000 --> 08:07.000
folks talked about logical replication in detail earlier.

08:07.000 --> 08:10.000
It is a pretty complex and a pretty intricate feature,

08:10.000 --> 08:12.000
and it's mostly undocumented.

08:12.000 --> 08:15.000
So you end up needing to read Postgres code a lot

08:15.000 --> 08:18.000
to make the tool stable and work for all cases.

08:18.000 --> 08:20.000
PeerDB was acquired by ClickHouse,

08:20.000 --> 08:21.000
which is why I ended up here.

08:21.000 --> 08:24.000
And I continue doing the same work as part of ClickPipes.

08:24.000 --> 08:26.000
So now I focus on moving data from Postgres

08:26.000 --> 08:30.000
into ClickHouse as reliably as possible.

08:30.000 --> 08:33.000
So let's talk about the Postgres code base.

08:33.000 --> 08:37.000
So Postgres is a 30-plus-year-old C code base,

08:37.000 --> 08:40.000
and because it's C, and C is fairly feature-light,

08:41.000 --> 08:43.000
you need to build a lot on top of it

08:43.000 --> 08:45.000
to have a clean and structured code base.

08:45.000 --> 08:48.000
So Postgres has its own custom memory subsystem,

08:48.000 --> 08:50.000
has a lot of dynamic dispatch,

08:50.000 --> 08:54.000
has a lot of macro use just to make the code base make sense.

08:54.000 --> 08:57.000
Reading Postgres code is not trivial.

08:57.000 --> 08:59.000
It can be very overwhelming.

08:59.000 --> 09:01.000
There's a ton of files; each file is, you know,

09:01.000 --> 09:03.000
4,000 or 5,000 lines of code.

09:03.000 --> 09:07.000
So what I did was to isolate an area that I wanted to focus on

09:07.000 --> 09:09.000
which was logical replication, and, you know,

09:09.000 --> 09:11.000
focus on those files, those functions,

09:11.000 --> 09:12.000
ignore everything else,

09:12.000 --> 09:14.000
and figure out the area that I wanted to focus on.

09:14.000 --> 09:16.000
One great thing about Postgres is Postgres

09:16.000 --> 09:17.000
has great commit messages.

09:17.000 --> 09:21.000
So every line of code in Postgres has a good commit message

09:21.000 --> 09:23.000
where you can see why that change was made,

09:23.000 --> 09:25.000
and it links back to the mailing list,

09:25.000 --> 09:27.000
and the mailing list we'll get into later.

09:27.000 --> 09:30.000
But like you can read the commit message

09:30.000 --> 09:32.000
and then you can read the links in the commit message

09:32.000 --> 09:34.000
to get even more context out of it.

09:34.000 --> 09:38.000
So that's how the Postgres code base can be very readable

09:38.000 --> 09:41.000
just by the Git history.

09:41.000 --> 09:45.000
So until now, like for the past couple of years,

09:45.000 --> 09:48.000
my sort of relationship with the Postgres code base

09:48.000 --> 09:49.000
was more of a passive one.

09:49.000 --> 09:52.000
I was the guy who just read the code,

09:52.000 --> 09:54.000
you know, figured out customer issues,

09:54.000 --> 09:56.000
like helped things out at the mailing list,

09:56.000 --> 09:59.000
but I wasn't really focused on contributing to Postgres myself.

09:59.000 --> 10:03.000
And the reason that changed was kind of random.

10:04.000 --> 10:08.000
So what happened was I was reading through the Postgres docs

10:08.000 --> 10:11.000
and I found a setting called scram_iterations.

10:11.000 --> 10:14.000
This setting, all it does is it just controls the number of times

10:14.000 --> 10:18.000
the password for a user is hashed when creating or authenticating.
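
NOTE
A rough sketch of why a higher scram_iterations value costs more time: SCRAM (RFC 5802) derives the salted password with a function called Hi(), which chains one HMAC per iteration. The Python below is illustrative only and is not the Postgres C code; the function name salted_password and the sample inputs are made up for this sketch.
```python
import hashlib
import hmac
def salted_password(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Hi() from RFC 5802 with SHA-256: XOR together `iterations` chained HMACs."""
    # First block: HMAC over the salt followed by a big-endian 32-bit 1.
    u = hmac.new(password, salt + b"\x00\x00\x00\x01", hashlib.sha256).digest()
    result = int.from_bytes(u, "big")
    # One more HMAC per remaining iteration, so the cost grows linearly
    # with the iteration count.
    for _ in range(1, iterations):
        u = hmac.new(password, u, hashlib.sha256).digest()
        result ^= int.from_bytes(u, "big")
    return result.to_bytes(32, "big")
key = salted_password(b"example-password", b"example-salt", 4096)
print(len(key))
```
For SHA-256 the result should agree with hashlib.pbkdf2_hmac("sha256", password, salt, iterations), which is a handy cross-check.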

10:18.000 --> 10:23.000
It had a maximum value, a really high maximum value.

10:23.000 --> 10:26.000
So the sort of intrusive thought that entered my head was

10:26.000 --> 10:28.000
what happens when you set this to the maximum value?

10:28.000 --> 10:31.000
I just ran it on a virtual machine

10:31.000 --> 10:33.000
and it stayed running for a month.

10:33.000 --> 10:36.000
Like I left it running and then I realized it was just running forever.

10:36.000 --> 10:40.000
So I realized maybe there's a bug.

10:40.000 --> 10:44.000
So this is the code that Postgres had

10:44.000 --> 10:47.000
to actually do this hashing thing.

10:47.000 --> 10:50.000
And maybe a few of you can spot the issue immediately here

10:50.000 --> 10:53.000
that was causing an infinite loop.

10:53.000 --> 10:57.000
The problem was that like the,

10:58.000 --> 11:02.000
basically if you set it to the maximum value of an int and then increment i,

11:02.000 --> 11:05.000
it becomes a negative value because of integer overflow.

11:05.000 --> 11:09.000
So because of the less-than-or-equal-to sign in the for loop,

11:09.000 --> 11:12.000
it will hit the maximum value, it will still compare true.

11:12.000 --> 11:16.000
Then it'll increment it to a negative value and then the loop would never exit.

11:16.000 --> 11:19.000
So the fix for this was actually fairly simple.

11:19.000 --> 11:21.000
You just change the loop to not do that.
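
NOTE
The overflow described above can be modeled in a few lines. This is a Python sketch, not the Postgres source: wrap_int32 and loop_terminates are hypothetical helpers, the counter starts near the limit so the demo finishes quickly, and it assumes the C counter wraps around to a negative value as described in the talk (formally, signed overflow in C is undefined behavior). Switching to a strict comparison is one possible way to avoid the problem; the actual patch may differ.
```python
INT32_MAX = 2**31 - 1
def wrap_int32(value: int) -> int:
    """Model two's-complement wraparound of a 32-bit signed integer."""
    return (value + 2**31) % 2**32 - 2**31
def loop_terminates(start: int, limit: int, inclusive: bool, max_steps: int = 10) -> bool:
    """Simulate `for (i = start; i <= limit; i++)`, or `i < limit` if not inclusive.
    Returns True if the loop condition becomes false within max_steps checks."""
    i = start
    for _ in range(max_steps):
        in_range = (i <= limit) if inclusive else (i < limit)
        if not in_range:
            return True
        i = wrap_int32(i + 1)  # INT32_MAX + 1 wraps to a large negative number
    return False
# With `<=` and limit == INT32_MAX, every 32-bit value satisfies the condition.
print(loop_terminates(INT32_MAX - 2, INT32_MAX, inclusive=True))
# With `<`, the loop stops once i reaches the limit.
print(loop_terminates(INT32_MAX - 2, INT32_MAX, inclusive=False))
```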

11:21.000 --> 11:26.000
So this was a story of how I accidentally found a bug in Postgres.

11:26.000 --> 11:29.000
And the fix was like relatively simple.

11:29.000 --> 11:31.000
It is a bug that almost nobody would hit.

11:31.000 --> 11:34.000
But you know, it was something that I had found.

11:34.000 --> 11:36.000
But again, that was half the battle.

11:36.000 --> 11:40.000
Like I made the fix, but I actually had to contribute the fix upstream.

11:40.000 --> 11:42.000
And this is where the Postgres, you know,

11:42.000 --> 11:44.000
developer experience comes into the picture.

11:44.000 --> 11:48.000
So unlike MariaDB, it is not a standard GitHub repository.

11:48.000 --> 11:53.000
Postgres has its own Git repository that is hosted by Postgres itself.

11:53.000 --> 11:57.000
And, you know, the UI is quite different from GitHub.

11:57.000 --> 12:00.000
And there are no PRs.

12:00.000 --> 12:01.000
So there's no issue tracker.

12:01.000 --> 12:02.000
There's no PRs.

12:02.000 --> 12:04.000
There's pretty much nothing for Postgres.

12:04.000 --> 12:08.000
The way you actually commit code is you create a branch locally.

12:08.000 --> 12:10.000
You make your change locally.

12:10.000 --> 12:14.000
And then you use the git format-patch command to make a patch.

12:14.000 --> 12:17.000
And that patch is then attached to an email.

12:17.000 --> 12:19.000
So you actually send an email to the mailing list in Postgres.

12:19.000 --> 12:22.000
And that email contains, you know, your entire patch.

12:22.000 --> 12:25.000
Even if it's like a thousand lines of code, it will be a single file.

12:25.000 --> 12:28.000
as an attachment to the email you send to the mailing list.

12:28.000 --> 12:34.000
That's basically an email address that reaches hundreds of people who are in the Postgres community.

12:34.000 --> 12:43.000
So yeah, the commands I ran translate into this patch file that, you know, lists what files my patch changed.

12:43.000 --> 12:45.000
And, you know, what the changes are.

12:45.000 --> 12:50.000
And this is what I submit as part of my email to propose a change.

12:51.000 --> 12:53.000
So Postgres has something called a CommitFest as well.

12:53.000 --> 13:02.000
So this is the Postgres equivalent of, like, a code sprint or a review cycle, where a bunch of people submit their changes to a CommitFest.

13:02.000 --> 13:04.000
You know, they have reviewers.

13:04.000 --> 13:07.000
They have a version of CI.

13:07.000 --> 13:09.000
You know, people reject PRs.

13:09.000 --> 13:10.000
They approve it.

13:10.000 --> 13:11.000
It gets merged.

13:11.000 --> 13:13.000
Sometimes it gets pushed to the next CommitFest.

13:13.000 --> 13:17.000
So I think each Postgres release has like five or six CommitFests.

13:18.000 --> 13:20.000
And they happen every couple of months.

13:20.000 --> 13:24.000
And this is where most of the Postgres review happens because

13:24.000 --> 13:27.000
Postgres doesn't have a single company behind it.

13:27.000 --> 13:29.000
It's all a bunch of people working on this, like,

13:29.000 --> 13:31.000
part-time to review code changes.

13:31.000 --> 13:34.000
So it's kind of decentralized, and that's why it has the CommitFest process.

13:34.000 --> 13:39.000
So for me again, like the fix that I had made.

13:39.000 --> 13:42.000
The fix itself was like a few minutes of work.

13:42.000 --> 13:46.000
But it took me a lot more time to figure out how to draft that very first email.

13:46.000 --> 13:49.000
And how to actually attach a patch to Postgres.

13:49.000 --> 13:55.000
So, you know, this was the mail I sent to Postgres after some thinking.

13:55.000 --> 13:59.000
And after some comments and, you know,

13:59.000 --> 14:02.000
them deciding whether it needs tests or not,

14:02.000 --> 14:04.000
in the end, the change was merged.

14:04.000 --> 14:05.000
It was not merged by me.

14:05.000 --> 14:07.000
So I have no commit access to the Postgres repo.

14:07.000 --> 14:10.000
Someone merged it on my behalf and I was attributed as author.

14:10.000 --> 14:13.000
So this is kind of how the Postgres,

14:13.000 --> 14:16.000
you know, contribution process works.

14:16.000 --> 14:18.000
Yeah.

14:18.000 --> 14:21.000
So again, this was like the first bug I had found.

14:21.000 --> 14:24.000
And around the same time I had made this first commit.

14:24.000 --> 14:29.000
There was a much bigger issue that some customers of PeerDB were facing.

14:29.000 --> 14:34.000
Where they were running into an issue where the replication slot creation.

14:34.000 --> 14:37.000
So replication slot is basically the Postgres.

14:37.000 --> 14:41.000
thing that PeerDB connects to on Postgres to read the changes from Postgres.

14:41.000 --> 14:44.000
The command to create the slot would hang.

14:44.000 --> 14:47.000
And it would hang in a way where you couldn't stop it.

14:47.000 --> 14:50.000
Like even if you tried to Ctrl-C or send a terminate command to the query,

14:50.000 --> 14:52.000
it would just get stuck perpetually.

14:52.000 --> 14:53.000
You couldn't really stop it.

14:53.000 --> 14:57.000
And what some customers had to do was to entirely restart the database,

14:57.000 --> 15:00.000
which if it's Postgres and you're asking a customer to restart their Postgres,

15:00.000 --> 15:02.000
it's not a really good look.

15:02.000 --> 15:06.000
So the initial sort of thought, because we couldn't figure out what the issue was,

15:06.000 --> 15:08.000
was: was there an issue with their managed service?

15:08.000 --> 15:10.000
Was there an issue with RDS or GCP or so on and so forth?

15:10.000 --> 15:13.000
And what was a smoking gun for me was, you know,

15:13.000 --> 15:16.000
a customer actually sent us their strace output,

15:16.000 --> 15:19.000
so the system calls from the instance that was having the issue.

15:19.000 --> 15:23.000
And that, you know, helped me track it down.

15:23.000 --> 15:26.000
So this is like a pretty intricate issue,

15:26.000 --> 15:29.000
which I did write a blog post on if you're more interested in details.

15:29.000 --> 15:32.000
But what was actually the issue was,

15:32.000 --> 15:35.000
so when creating a replication slot,

15:35.000 --> 15:39.000
there is a step that requires waiting for older transactions

15:39.000 --> 15:41.000
to complete.

15:41.000 --> 15:44.000
This code doesn't function properly on read replicas.

15:44.000 --> 15:46.000
So on the primary Postgres instance,

15:46.000 --> 15:47.000
it works just fine.

15:47.000 --> 15:50.000
But on the read replica or hot standby,

15:50.000 --> 15:52.000
a bit of this code doesn't function properly.

15:52.000 --> 15:54.000
So it doesn't wait on a transaction,

15:54.000 --> 15:57.000
but it still thinks a transaction is running.

15:57.000 --> 15:59.000
So it just, you know,

15:59.000 --> 16:04.000
continues the loop indefinitely because it's not waiting.

16:04.000 --> 16:06.000
It's just checking and then it keeps failing.

16:07.000 --> 16:09.000
The problem with this loop is,

16:09.000 --> 16:12.000
this loop never checks for interrupts.

16:12.000 --> 16:14.000
Like, even if you signal this backend to,

16:14.000 --> 16:16.000
you know, terminate or cancel or whatever,

16:16.000 --> 16:19.000
because on a read replica

16:19.000 --> 16:22.000
this code doesn't function the expected way,

16:22.000 --> 16:25.000
it would never receive the signal, and so until this loop

16:25.000 --> 16:27.000
exits due to a transaction completing,

16:27.000 --> 16:31.000
the process would become unkillable.

16:31.000 --> 16:35.000
So the solution was not like a simple,

16:35.000 --> 16:37.000
isolated fix.

16:37.000 --> 16:39.000
So I submitted a patch for this,

16:39.000 --> 16:41.000
the immediate fix, which is to just let a customer

16:41.000 --> 16:44.000
stop the query, not let it run forever,

16:44.000 --> 16:46.000
which was to just allow the slot creation process

16:46.000 --> 16:48.000
to be interrupted by signals,

16:48.000 --> 16:49.000
even on read replica,

16:49.000 --> 16:52.000
irrespective of, you know, what was happening.
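
NOTE
The idea behind the immediate fix, checking for a pending cancel or terminate request on every pass of the wait loop, can be sketched as follows. This is a simplified Python model, not the Postgres C code: wait_for_transaction, is_running, cancel_requested and max_polls are all hypothetical names. In Postgres itself, backends conventionally check for pending signals with the CHECK_FOR_INTERRUPTS() macro.
```python
import threading
from typing import Callable
def wait_for_transaction(is_running: Callable[[], bool],
                         cancel_requested: threading.Event,
                         max_polls: int) -> str:
    """Poll until the transaction finishes, honoring cancellation on every pass."""
    for _ in range(max_polls):
        # Without this check the loop could only end when the transaction
        # completed, which is what made the stuck process unkillable.
        if cancel_requested.is_set():
            return "cancelled"
        if not is_running():
            return "finished"
    return "still waiting"
cancel = threading.Event()
cancel.set()  # simulate the user pressing Ctrl-C
print(wait_for_transaction(lambda: True, cancel, max_polls=5))
```
The max_polls bound exists only so the sketch always returns; a real wait loop would block or sleep between checks instead of spinning.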

16:52.000 --> 16:54.000
This was committed to the Postgres code base,

16:54.000 --> 16:55.000
so it is now in

16:55.000 --> 16:58.000
most Postgres minor versions.

16:58.000 --> 17:01.000
There was a follow-up patch by a different member of the community,

17:01.000 --> 17:03.000
which was meant to highlight this wait,

17:04.000 --> 17:06.000
so, to highlight the fact that we are stuck in this state,

17:06.000 --> 17:08.000
and this patch,

17:08.000 --> 17:10.000
unfortunately, has not been merged yet,

17:10.000 --> 17:12.000
so even though it's like a pretty small change

17:12.000 --> 17:14.000
which just highlights the thing,

17:14.000 --> 17:17.000
it has not been merged so far.

17:17.000 --> 17:20.000
A long term fix is to just not need this loop at all

17:20.000 --> 17:21.000
and to have read replicas,

17:21.000 --> 17:24.000
you know, just wait for transactions in a more efficient manner.

17:24.000 --> 17:27.000
This, unfortunately, needs a lot more thought,

17:27.000 --> 17:30.000
a lot more expertise from the folks who know Postgres,

17:30.000 --> 17:32.000
so it needs to be figured out and then committed.

17:32.000 --> 17:35.000
This is definitely not something that I can do right now.

17:35.000 --> 17:38.000
So the conclusion for me,

17:38.000 --> 17:41.000
like, this time last year,

17:41.000 --> 17:43.000
I had no idea how to contribute to Postgres,

17:43.000 --> 17:45.000
and you know, I figured it out over time,

17:45.000 --> 17:49.000
but the weird part for me was like the hardest part of all this

17:49.000 --> 17:51.000
was not necessarily the code itself.

17:51.000 --> 17:53.000
It was just convincing myself that, you know,

17:53.000 --> 17:54.000
I had something worth saying,

17:54.000 --> 17:57.000
that it was not something I was hallucinating,

17:57.000 --> 18:00.000
it's actually something that I found out was a bug

18:00.000 --> 18:03.000
and sending that very first email to the mailing list saying,

18:03.000 --> 18:04.000
I found this issue.

18:04.000 --> 18:06.000
This is true of open source,

18:06.000 --> 18:08.000
in general, even if you find an issue,

18:08.000 --> 18:10.000
it may actually not be an issue,

18:10.000 --> 18:12.000
or even if it is, sometimes a contribution

18:12.000 --> 18:13.000
you make gets rejected or ignored.

18:13.000 --> 18:16.000
But despite all that, like regardless of that,

18:16.000 --> 18:19.000
like if you have something to contribute to any open source project,

18:19.000 --> 18:22.000
whether it's Postgres or MariaDB or ClickHouse or whatever,

18:22.000 --> 18:24.000
even if it's a bug report or a

18:24.000 --> 18:26.000
docs update, like, just send it,

18:26.000 --> 18:29.000
and I feel that open source projects need more eyes

18:29.000 --> 18:31.000
on them, not fewer.

18:31.000 --> 18:32.000
Thank you.

18:32.000 --> 18:35.000
Thank you.

