WEBVTT

00:00.000 --> 00:09.000
Question, do you consider yourself a lazy person?

00:09.000 --> 00:10.000
Yes.

00:10.000 --> 00:11.000
Okay.

00:11.000 --> 00:16.000
If you're like me, I think you'll find this talk very interesting.

00:16.000 --> 00:23.000
So, we are here to talk about Argo CD and Kubernetes, of course, and we will see later

00:23.000 --> 00:28.000
how laziness deals with Argo CD and Kubernetes.

00:29.000 --> 00:31.000
Before starting, let me present myself.

00:31.000 --> 00:33.000
I'm Graziano Casto, I'm from Italy.

00:33.000 --> 00:36.000
That's why we are named.

00:36.000 --> 00:37.000
Louder?

00:37.000 --> 00:38.000
Okay.

00:38.000 --> 00:41.000
Yeah, thank you.

00:41.000 --> 00:43.000
Can everyone hear me?

00:43.000 --> 00:45.000
No.

00:45.000 --> 00:46.000
Louder?

00:46.000 --> 00:47.000
This way?

00:47.000 --> 00:48.000
Okay.

00:48.000 --> 00:50.000
It's great.

00:50.000 --> 00:53.000
Anyway, I'm a developer relations engineer at Akamas,

00:53.000 --> 00:57.000
an Italian company doing performance engineering stuff on Kubernetes.

00:57.000 --> 00:59.000
I'm an open source contributor.

00:59.000 --> 01:03.000
I'm a tech lead of the CNCF TAG Developer Experience.

01:03.000 --> 01:05.000
And a Kubernetes release team member.

01:05.000 --> 01:12.000
If you want to know more about how to contribute to the Developer Experience

01:12.000 --> 01:16.000
TAG or any other TAG in the CNCF, or join the release team for

01:16.000 --> 01:18.000
Kubernetes or another SIG, let me know.

01:18.000 --> 01:21.000
I'm here around the conference for the whole day.

01:21.000 --> 01:25.000
You can check my LinkedIn to connect with me or check my website

01:25.000 --> 01:29.000
and get all my articles and my talks.

01:29.000 --> 01:36.000
By the way, this is me at 3am working on my side projects or work projects.

01:36.000 --> 01:41.000
Something went wrong and I needed to change something on the cluster.

01:41.000 --> 01:47.000
Behind the scenes, there is Argo CD, of course, managing all the resources.

01:47.000 --> 01:53.000
And at 3am, I do the most reasonable thing that everyone does:

01:53.000 --> 01:58.000
open kubectl and patch the resources directly on the cluster.

01:58.000 --> 02:01.000
Wait for a couple of minutes.

02:01.000 --> 02:04.000
The change is live, everything is okay.

02:04.000 --> 02:07.000
And I go back to sleep.

02:07.000 --> 02:12.000
The problem is that you can have two different situations here.

02:12.000 --> 02:15.000
The first one: self-healing in Argo CD is enabled.

02:15.000 --> 02:20.000
So after, let's say, five minutes, your change is no longer in the cluster.

02:20.000 --> 02:25.000
So you have to wake up again to fix the issue.

02:25.000 --> 02:30.000
The second one is, okay, I'm going back to sleep.

02:30.000 --> 02:36.000
But tomorrow morning, I will update my repository to bring the changes back into it.

02:36.000 --> 02:38.000
But guys, I'm not only lazy.

02:38.000 --> 02:40.000
I also forget to do something.

02:40.000 --> 02:45.000
And someone else, the day after, updates the Git repository,

02:45.000 --> 02:48.000
triggering Argo CD, and the new changes go onto

02:48.000 --> 02:52.000
the cluster, overriding my changes from the night before.

02:52.000 --> 02:55.000
So at this point, everyone has a problem.

02:55.000 --> 02:58.000
The only question is when.

02:58.000 --> 03:01.000
Soon, if you have self-healing enabled in Argo CD,

03:01.000 --> 03:04.000
or maybe a couple of days later.

03:04.000 --> 03:09.000
So the problem is not that I'm lazy.

03:09.000 --> 03:11.000
Yes, this is a problem.

03:11.000 --> 03:16.000
The problem is that working with GitOps,

03:16.000 --> 03:21.000
we assume that the Git repository is your source of truth.

03:21.000 --> 03:26.000
That's just a lie, because the Git repository is your source of intent.

03:26.000 --> 03:32.000
It's like looking at a map and seeing, okay, from point A to point B,

03:32.000 --> 03:34.000
it's just a straight line.

03:34.000 --> 03:37.000
But in the middle of this straight line, there is a mountain.

03:37.000 --> 03:41.000
So you can't go straight to reach your destination.

03:41.000 --> 03:43.000
That's the same with the cluster.

03:43.000 --> 03:47.000
Someone else can patch the resources with kubectl.

03:47.000 --> 03:52.000
You yourself can patch the resources and forget to bring the changes back

03:52.000 --> 03:54.000
to the Git repository.

03:54.000 --> 03:57.000
Or you can have some operators or agents that do something

03:57.000 --> 04:00.000
on the cluster without a human in the loop.

04:00.000 --> 04:04.000
So the entropy in this situation is very high.

04:04.000 --> 04:09.000
And you can't be sure that the Git repository

04:09.000 --> 04:14.000
is exactly the state of the cluster at any point in time.

04:14.000 --> 04:18.000
Just to give you a final example of this analogy,

04:18.000 --> 04:22.000
by the way, is there someone from Portugal here?

04:22.000 --> 04:25.000
No? Okay, that's fine. Sorry.

04:25.000 --> 04:30.000
I traveled to Portugal for the first time last year.

04:30.000 --> 04:35.000
And I was very happy because I was able to find a cheap hotel

04:35.000 --> 04:39.000
just 500 meters from the conference venue.

04:39.000 --> 04:43.000
Then, when I was there, I realized that walking for 500 meters in Portugal

04:43.000 --> 04:47.000
is not the same as walking for 500 meters in Milan.

04:47.000 --> 04:51.000
In Portugal, it's more like hiking than just walking.

04:51.000 --> 04:53.000
And that's the same idea.

04:53.000 --> 04:58.000
By the way, how can we solve this situation?

04:58.000 --> 05:03.000
The idea is to have a mechanism, a process or something we will see later

05:03.000 --> 05:09.000
that does a pre-flight diff when you open new PRs on your Git repository

05:09.000 --> 05:15.000
to check that what you have on the cluster at the exact time you open the PR

05:15.000 --> 05:21.000
is what you are actually managing with the new PR you're opening on your Git repository.

05:21.000 --> 05:28.000
This way you are able to sync the changes if no drift is detected

05:28.000 --> 05:34.000
between the cluster and the repository, or block the sync if something has

05:34.000 --> 05:37.000
drifted from the initial state.

05:37.000 --> 05:42.000
And hopefully you can also have a report saying, okay, these are the fields

05:42.000 --> 05:50.000
in your resources that need to be changed to be consistent with your cluster.
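The gate just described can be sketched in a few lines of Python; this is a minimal illustration of the idea, not the actual tool:

```python
def preflight_gate(drift: dict) -> tuple[bool, str]:
    """Decide whether a PR may sync, and build a simple report.

    drift maps "file: field" -> (value in the repository, live value on
    the cluster); an empty mapping means repo and cluster agree.
    """
    if not drift:
        return True, "No drift detected: cluster matches the repository."
    lines = ["Drift detected, sync blocked:"]
    for key, (repo_value, live_value) in sorted(drift.items()):
        lines.append(f"  {key}: repo has {repo_value!r}, cluster has {live_value!r}")
    return False, "\n".join(lines)
```

An empty drift set lets the sync proceed; any drifted field blocks it and lands in the report.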

05:51.000 --> 05:58.000
So let's start with the demo and then I will show you all the phases of this pipeline.

05:58.000 --> 06:04.000
I recorded the demo to avoid live-demo panic, and I hope the video works.

06:04.000 --> 06:08.000
Okay, this is my Git repository.

06:08.000 --> 06:18.000
First, I'm going to apply the Argo CD application to deploy my initial state on the cluster.

06:19.000 --> 06:24.000
Okay, now let's check that my services are running.

06:24.000 --> 06:28.000
This is the simple guestbook application you can find in the Kubernetes documentation.

06:28.000 --> 06:31.000
It's a frontend and a backend, basically.

06:31.000 --> 06:34.000
Okay, everything is running.

06:34.000 --> 06:41.000
Now I will move to the deploy section to scale the replicas for the deployment

06:42.000 --> 06:45.000
from three replicas to five.

06:45.000 --> 06:54.000
Okay, I recorded it last night, so maybe something is not very smooth.

06:54.000 --> 06:57.000
Okay, five replicas.

06:57.000 --> 07:00.000
Now I'm patching the resource.

07:00.000 --> 07:04.000
Let's wait for it to scale.

07:04.000 --> 07:06.000
Okay.

07:07.000 --> 07:15.000
Now back to the Git repository where my Argo CD application is.

07:15.000 --> 07:21.000
First, open a branch to create the PR.

07:21.000 --> 07:24.000
Okay, let's say this is the day after.

07:24.000 --> 07:28.000
Another colleague wants to update the limits for the deployment.

07:28.000 --> 07:31.000
So open the branch, create the PR.

07:32.000 --> 07:36.000
The Argo CD application is targeting the main branch of the repository.

07:36.000 --> 07:38.000
This is the front end.

07:38.000 --> 07:41.000
You can see that replicas here is still three.

07:41.000 --> 07:46.000
And I'm going to update the CPU and memory requests to 400m.

07:46.000 --> 07:48.000
Okay.

07:48.000 --> 07:50.000
Nice.

07:50.000 --> 07:58.000
Now I can make my changes and I'm opening the PR.

07:58.000 --> 08:02.000
Why am I using GitHub Desktop? Because I'm lazy.

08:02.000 --> 08:05.000
I don't want to interact with the terminal.

08:05.000 --> 08:06.000
Okay.

08:06.000 --> 08:11.000
Create the pull request, and now a pipeline in GitHub Actions is triggered.

08:11.000 --> 08:16.000
Okay, now it's triggered.

08:16.000 --> 08:20.000
It basically installs a CLI tool called Calco.

08:20.000 --> 08:22.000
I will show you later what call call is.

08:22.000 --> 08:28.000
Basically do as an upshot of your entire cluster state into a temporary folder.

08:28.000 --> 08:31.000
In this case, or you can just do it in another repository.

08:31.000 --> 08:34.000
And do and perform free ways merge.

08:34.000 --> 08:39.000
Between your base branch, your PR branch and your snapshot.

08:39.000 --> 08:45.000
This way it can detect which are the intentional changes you did in your PR.

08:45.000 --> 08:47.000
And which are the drift.

08:47.000 --> 08:54.000
After this check, it posts a comment on the GitHub PR,

08:54.000 --> 09:06.000
showing you the diff of only the detected drift.
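The three-way merge idea can be modelled like this; a simplified Python sketch (not Calco's real implementation) that treats each manifest as a flat field-to-value mapping:

```python
def classify(base: dict, pr: dict, live: dict) -> dict:
    """Classify each field across base branch, PR branch, and live snapshot.

    A field changed between base and PR is an intentional change; a field
    that differs between base and live but is untouched by the PR is drift.
    """
    intentional, drift = {}, {}
    for key in set(base) | set(pr) | set(live):
        if base.get(key) != pr.get(key):
            intentional[key] = (base.get(key), pr.get(key))
        elif base.get(key) != live.get(key):
            drift[key] = (base.get(key), live.get(key))
    return {"intentional": intentional, "drift": drift}
```

With the demo's values, the CPU change in the PR comes out as intentional, while the replicas scaled by hand on the cluster come out as drift.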

09:06.000 --> 09:07.000
Okay.

09:07.000 --> 09:12.000
Now it's taking the snapshot of the cluster.

09:12.000 --> 09:14.000
Okay.

09:14.000 --> 09:18.000
Drift detected in frontend.yaml.

09:18.000 --> 09:19.000
Okay.

09:19.000 --> 09:21.000
And now the job is completed.

09:21.000 --> 09:24.000
Let's go back to the PR.

09:24.000 --> 09:27.000
This is the error.

09:27.000 --> 09:32.000
Back to the pull request.

09:32.000 --> 09:37.000
You can see that first a new label was added to the PR.

09:37.000 --> 09:43.000
calco-deny, which identifies all the PRs that can't be merged because of drift.

09:43.000 --> 09:51.000
And you can see the differences found in the PR compared with the actual state of the cluster.

09:51.000 --> 09:53.000
Now I'm back to the code.

09:53.000 --> 09:56.000
Bring back the change.

09:56.000 --> 10:01.000
Commit the new change, and now the cluster should be aligned.

10:01.000 --> 10:06.000
When the change is pushed, the pipeline is triggered again.

10:06.000 --> 10:10.000
Perform the same check with another snapshot.

10:10.000 --> 10:17.000
So every time you run this GitHub Action, a new snapshot is created.

10:17.000 --> 10:23.000
Just to ensure that the cluster doesn't change from one commit to another.
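Checking that the cluster did not move between two commits can be reduced to comparing digests of canonicalized snapshots; a toy illustration of the idea:

```python
import hashlib
import json

def snapshot_digest(resources: list[dict]) -> str:
    """Digest of a cluster snapshot: canonical JSON with sorted keys and a
    stable resource order, so logically equal snapshots hash identically."""
    ordered = sorted(resources, key=lambda r: json.dumps(r, sort_keys=True))
    blob = json.dumps(ordered, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

If the digest taken at the second run differs from the first, something changed on the cluster between commits and the check should run against the fresh snapshot.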

10:23.000 --> 10:30.000
And now, after all this process, you can see that the label has changed from calco-deny

10:30.000 --> 10:34.000
to calco-approved, and the report is basically clean.

10:34.000 --> 10:39.000
You can see some errors due to access to cluster-level resources.

10:39.000 --> 10:43.000
So I have to configure my user in a certain way.

10:43.000 --> 11:00.000
And just for a demo, it doesn't make sense to do that.

11:01.000 --> 11:10.000
The cool thing is that this is a simple CLI application using the Go client for Kubernetes.

11:10.000 --> 11:14.000
And you can access all the resources available in the cluster.

11:14.000 --> 11:19.000
So if you are using, for example, CRDs for Crossplane, whatever.

11:19.000 --> 11:26.000
You can export also the CRDs, not only the usual Deployments, Services, or whatever.

11:26.000 --> 11:33.000
Calco approves the new changes, and you can basically merge the PR.

11:33.000 --> 11:37.000
Question: can you limit the drift calculation to some namespaces?

11:37.000 --> 11:40.000
Sorry, can you repeat?

11:40.000 --> 11:46.000
Can you limit the drift calculation to some namespaces, to a set of namespaces?

11:46.000 --> 11:55.000
Yeah, the calco validate command, which I will show you later, has a namespace argument to limit the

11:55.000 --> 12:00.000
validation to only that namespace.

12:00.000 --> 12:07.000
Now, the tool, which is called Calco, is a side project I did this summer in Croatia.

12:07.000 --> 12:14.000
By the way, you can scan this small QR code, but I will show you the link to access the repository and the documentation.

12:14.000 --> 12:24.000
I originally created this tool for another use case: taking snapshots to import resources from one cluster to another.

12:24.000 --> 12:28.000
Then I implemented the auditing feature.

12:28.000 --> 12:37.000
You can create your snapshot repository and scan each new snapshot, getting a diff between the last one and the previous one, with a report.

12:37.000 --> 12:42.000
And now this use case, which validates the GitOps flow.

12:42.000 --> 12:46.000
The feature itself is very straightforward.

12:46.000 --> 12:53.000
For the export, I'm using the server-preferred resources from the discovery API in the Go client for Kubernetes.

12:53.000 --> 13:08.000
I sanitize the resources before creating the manifests, removing some annotations and metadata, because otherwise they're very big, and then commit and push them to your Git repository.
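The sanitization step might look roughly like this; the exact fields Calco strips are an assumption here, these are just common server-populated fields:

```python
import copy

# Server-populated fields that bloat exported manifests (assumed list,
# illustrating the kind of metadata a sanitizer would strip).
VOLATILE_FIELDS = {"managedFields", "resourceVersion", "uid",
                   "creationTimestamp", "generation"}
VOLATILE_ANNOTATIONS = {"kubectl.kubernetes.io/last-applied-configuration"}

def sanitize(manifest: dict) -> dict:
    """Return a copy of a manifest with volatile metadata and status removed."""
    clean = copy.deepcopy(manifest)
    meta = clean.get("metadata", {})
    for field in VOLATILE_FIELDS:
        meta.pop(field, None)
    annotations = meta.get("annotations", {})
    for ann in VOLATILE_ANNOTATIONS:
        annotations.pop(ann, None)
    if "annotations" in meta and not annotations:
        meta.pop("annotations")
    clean.pop("status", None)  # status is owned by the cluster, not the repo
    return clean
```

The result is a manifest that diffs cleanly against what lives in the Git repository.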

13:08.000 --> 13:11.000
These are all the phases of my pipeline.

13:11.000 --> 13:22.000
The CI pipeline runs on PRs against the master branch, for every change in the manifests folder.

13:22.000 --> 13:25.000
And this is my Argo CD application.

13:25.000 --> 13:37.000
It is important to disable self-healing, because that way you prevent Argo CD from overriding your live changes on the cluster.
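In the Application manifest, that corresponds to the `syncPolicy` block; field names follow the Argo CD Application spec, while the repo URL and paths here are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
spec:
  source:
    repoURL: https://github.com/example/guestbook.git  # illustrative URL
    targetRevision: main
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook
  syncPolicy:
    automated:
      selfHeal: false  # don't revert live changes; let the pipeline reconcile them
```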

13:37.000 --> 13:49.000
Then you can download the Calco tool directly from GitHub releases or you can use brew if you're using macOS, for example.

13:49.000 --> 13:53.000
And the first thing to do is to create your context.

13:53.000 --> 13:59.000
So you create your context by giving it the kubeconfig to access the cluster,

13:59.000 --> 14:06.000
the temporary folder you want to create, and a description for your context.

14:06.000 --> 14:10.000
The second step is the drift check.

14:10.000 --> 14:20.000
First, you perform the export: a simple calco export. There are other arguments to customize the export, but for this simple use case this is enough.

14:20.000 --> 14:24.000
The output is a folder like that.

14:24.000 --> 14:30.000
So: your cluster, the namespaces, the resource types, and the single resources.
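As a hypothetical example, an export might produce a tree like this (names are illustrative):

```
snapshot/
└── my-cluster/
    └── guestbook/
        ├── deployments/
        │   ├── frontend.yaml
        │   └── backend.yaml
        └── services/
            ├── frontend.yaml
            └── backend.yaml
```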

14:30.000 --> 14:41.000
Then there is the calco validate command, which is the one that performs the three-way merge between the resources.

14:41.000 --> 14:44.000
This command has different arguments.

14:44.000 --> 14:46.000
The first one is the base.

14:46.000 --> 14:53.000
You give Calco the base branch of your repository, in this case master.

14:53.000 --> 14:59.000
Then the directory where the files are stored, where you want to perform the diff.

14:59.000 --> 15:04.000
Here it is the temporary directory we created with the calco export.

15:04.000 --> 15:11.000
Here is the namespace argument, so that you can filter your resources and perform the validation

15:11.000 --> 15:14.000
only on the guestbook namespace.

15:14.000 --> 15:23.000
And the name of the report file, which will be used for the comment on the PR.
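Putting the arguments together, the invocation might look something like this; the flag names are reconstructed from the talk, not verified against Calco's documentation, so check the real CLI:

```shell
# Flag names are illustrative guesses:
# --base: base branch; --dir: manifest directory;
# --snapshot: temp dir from `calco export`;
# --namespace: limit scope; --report: report file for the PR comment.
calco validate \
  --base master \
  --dir manifests \
  --snapshot /tmp/calco \
  --namespace guestbook \
  --report report.md
```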

15:23.000 --> 15:26.000
This is how calco validate works.

15:26.000 --> 15:29.000
The first thing is identifying the changes in the PR.

15:29.000 --> 15:34.000
So it checks out the master branch, checks out the PR, and does a first diff.

15:34.000 --> 15:37.000
For any change, it does the three-way diff.

15:37.000 --> 15:42.000
In this way it can distinguish between the intentional diffs,

15:42.000 --> 15:50.000
so the diffs that are in the PR, and the drift, which is not in the PR but is live on the cluster.

15:50.000 --> 15:53.000
And then generate the report.

15:53.000 --> 15:58.000
Of course, to keep it simple, I'm doing this demo with raw manifests.

15:58.000 --> 16:05.000
So plain YAMLs. You can use it also with Helm or Kustomize or other templating systems.

16:05.000 --> 16:11.000
You just need another step first, which is rendering the chart.

16:11.000 --> 16:22.000
You are not going to use calco validate here, because calco validate is built to work with plain files.

16:22.000 --> 16:27.000
It basically iterates over all the files found in the repository.

16:27.000 --> 16:35.000
calco diff is an internal command that does basically the same thing, but file by file.

16:35.000 --> 16:40.000
So you would iterate over the files in the CI pipeline.

16:40.000 --> 16:45.000
I'm working on doing a similar thing out of the box, just to improve the developer experience.

16:45.000 --> 16:50.000
Let me say this is the final step.

16:50.000 --> 16:55.000
So the labeling is done by a simple GitHub Action.

16:55.000 --> 17:04.000
You get the final report as a comment on the PR, and the label, calco-deny or calco-approved.

17:04.000 --> 17:07.000
This is customizable.

17:07.000 --> 17:09.000
And that's all basically.

17:09.000 --> 17:16.000
Do you have any questions or?

17:16.000 --> 17:28.000
Can you scope the check also to just one namespace or a project, in a sense?

17:28.000 --> 17:32.000
Do you mean the scope of the export or of the validation?

17:32.000 --> 17:33.000
Both.

17:33.000 --> 17:34.000
Okay.

17:34.000 --> 17:35.000
That's right.

17:35.000 --> 17:41.000
We have clusters with like 600 namespaces per project, something like that.

17:41.000 --> 17:42.000
It would be too much.

17:42.000 --> 17:54.000
So for the validation, you can use the namespace argument, guestbook here, to filter the validation process to only one namespace in your export.

17:54.000 --> 18:04.000
If you want to reduce the export from your entire cluster, I'm working on bringing the same arguments to the export command.

18:04.000 --> 18:07.000
Right now it just exports everything.

18:08.000 --> 18:09.000
Exactly.

18:09.000 --> 18:10.000
Yes.

18:18.000 --> 18:23.000
So on the same field.

18:23.000 --> 18:24.000
Yeah.

18:26.000 --> 18:28.000
Okay.

18:28.000 --> 18:36.000
So, what happens if I change, in the PR, the same field that was changed in the cluster?

18:36.000 --> 18:40.000
Like in the example, if I change the replicas.

18:40.000 --> 18:48.000
For example, right now the standard behavior, to keep it simple, is that the PR wins.

18:48.000 --> 18:54.000
So if you are updating the replicas, the PR wins over the cluster state.

18:54.000 --> 19:05.000
I'm working on implementing another argument to decide if you want to have it reported as drift in the comment.

19:05.000 --> 19:09.000
Then you decide whether to merge the PR or evaluate it in another way.

19:09.000 --> 19:12.000
Any other questions?

19:29.000 --> 19:31.000
When?

19:31.000 --> 19:32.000
What?

19:40.000 --> 19:41.000
Sorry.

19:41.000 --> 19:42.000
What?

19:43.000 --> 19:46.000
Can you repeat the question?

19:47.000 --> 19:48.000
Okay.

20:01.000 --> 20:04.000
Yes.

20:04.000 --> 20:05.000
Yes.

20:05.000 --> 20:07.000
This is.

20:11.000 --> 20:12.000
Yeah.

20:12.000 --> 20:18.000
This is an automated process to validate the changes from the PR.

20:18.000 --> 20:24.000
And I'm using this tool that actually was born for another use case, which is the cluster snapshot thing.

20:24.000 --> 20:36.000
Because I created this tool to have an exact copy of the cluster, to move resources from one cluster to another, from one namespace to another, or to import the resources into a repo, for example.

20:36.000 --> 20:41.000
Then the tool evolved to do auditing for the cluster.

20:41.000 --> 20:44.000
So basically the same tool is running in a job.

20:44.000 --> 20:49.000
It creates this snapshot each night and creates a report from the last snapshot to the previous one.

20:49.000 --> 20:50.000
This one.

20:50.000 --> 20:52.000
Creating a report like: okay.

20:52.000 --> 20:55.000
Today you changed these fields.

20:55.000 --> 21:06.000
Then, since Calco was already up and running in our CI, I used it to implement this use case as well.

21:06.000 --> 21:09.000
But yeah, you can use it directly as well.

21:09.000 --> 21:12.000
Any other question?

21:13.000 --> 21:18.000
There is also another project for Argo CD that does something similar.

21:18.000 --> 21:19.000
I don't remember the name.

