WEBVTT

00:00.000 --> 00:11.040
Hello. Good afternoon. My name is Miguel Duarte. I'm here with my colleague Or Mergi to present

00:11.040 --> 00:17.440
this talk titled Go BGP or Go Home, which is about simplifying KubeVirt ingress with your

00:17.440 --> 00:25.200
favorite routing protocol, which is BGP. So first, the agenda. We will begin with the motivation

00:25.280 --> 00:38.400
for having, oh, for simplifying. Is the audio on? Well, I don't know. Better now?

00:42.480 --> 00:52.720
Okay, no, let's fix this. I think that you're stupid. I'll shout if I have to. Better?

00:53.520 --> 01:07.360
Well, I think this is M-D-Wab-Del-Suck. But again, I will try to speak as loud as I can. So first, motivation.

01:09.920 --> 01:18.480
Well, if we keep at this, nobody is going to hear anything. So I think we'll just carry on

01:18.480 --> 01:26.160
and I apologize to the people at home. First, why should we care about simplifying ingress? What is

01:26.160 --> 01:34.160
there for us in this? Second thing: why BGP? And then we will give a brief BGP introduction.

01:34.880 --> 01:42.480
We will describe the use cases we're tackling. We will describe our solution, show a couple of demos,

01:42.480 --> 01:47.840
because it's always cool to see things that work and we'll wrap this up with a few conclusions.

01:48.960 --> 01:54.640
So, I'm going to start this with a show of hands to make some sense out of this. So,

01:55.840 --> 02:02.240
who of you has used Kubernetes in the last years? Yeah, I guess a lot of you. KubeVirt?

02:03.360 --> 02:09.280
Okay, some. Just a question: how do you expose a KubeVirt VM to the outside world, like

02:09.280 --> 02:16.800
outside your cluster, if you want to access it? Do I hear service anywhere? I heard service somewhere.

02:18.480 --> 02:24.240
Yeah, you would use a service. And how would you implement a Kubernetes service?

02:25.840 --> 02:34.320
Well, the solution is somewhere in the slide, and it's NAT. Thing is, NAT breaks your performance.

02:34.320 --> 02:41.520
We don't want that. Plus, some verticals, partly because of performance but also for

02:41.600 --> 02:48.320
other reasons, they really hate that thing. Verticals like telco, financial services,

02:48.320 --> 02:53.680
they really dislike having this. So, we want to try to avoid that. We do not want to have to

02:53.680 --> 03:02.480
expose a workload or a couple of ports on a workload using that. And on another separate topic,

03:02.480 --> 03:08.640
we want to simplify the Kubernetes admin's life. We don't want them to have to be manually setting

03:08.640 --> 03:14.240
up routes on your clusters. So, these are two things: simplifying the admin's life and getting

03:14.240 --> 03:23.040
rid of NAT complexity. This is what drives us. Now, why BGP? First off, it's an industry standard.

03:23.040 --> 03:30.640
It scales a lot and it's proven to work. Well, it powers the whole internet according to what they say.

03:31.440 --> 03:36.560
Another thing, policy control. You don't have to expose everything. You can have policy to

03:36.640 --> 03:42.320
describe what you want to import, what routes you're interested in, what things or what routes

03:42.320 --> 03:48.960
you want to expose to the outside world. Thus you have a lot of control over how you design your

03:48.960 --> 03:54.880
traffic segmentation. And it is a routing protocol. You just deploy it. Your routers will speak

03:54.880 --> 04:02.320
to one another and they will scatter your network or the way to reach your network throughout

04:02.320 --> 04:05.920
the entire fabric. And those are the things that we are interested in.

04:07.440 --> 04:14.000
A very brief introduction to BGP, like fun facts, I guess, like this thing. I mean, some of

04:14.000 --> 04:19.440
you today could like be inventing the next cool thing like BGP. If you go for dinner, have a few

04:19.440 --> 04:24.800
beers, write a state machine on a napkin. This could be it. 20 years from now, or 40,

04:24.800 --> 04:30.560
we'll be here hearing about it. Now, going back to what we're here for. We are providing

04:30.640 --> 04:36.960
integration between KubeVirt, so we run virtual machines in Kubernetes clusters, and BGP.

04:36.960 --> 04:42.880
And for that, we are using a particular CNI, one particular CNI plugin. This won't

04:42.880 --> 04:47.120
work with all of them. This works with one specific one. It is OVN-Kubernetes.

04:47.920 --> 04:55.280
OVN-Kubernetes is a CNCF project. And it pretty much translates from the Kubernetes API. So things like

04:55.280 --> 05:03.200
services, pods, network policy, and translates that into OVN lingo or OVN entities.

05:04.160 --> 05:11.200
Good question. What is OVN? So OVN is an SDN control plane for Open vSwitch. So it just

05:11.200 --> 05:16.320
orchestrates a bunch of Open vSwitches, each of which would run on a dedicated node.

05:17.840 --> 05:22.640
And it supports KubeVirt virtual machines. By this, we mean that it has

05:23.600 --> 05:29.120
special code. Like, it understands that a pod holds a VM and does special stuff to it.

05:29.680 --> 05:35.040
And it is the default network provider for, well, for at least one downstream distribution

05:35.040 --> 05:41.920
of Kubernetes. Then how do we implement BGP? This is implemented using open source. So we are using

05:41.920 --> 05:49.760
FRR. FRR is pretty much like an open source implementation of a router. And in the Kubernetes

05:49.760 --> 05:56.000
ecosystem, that is exposed to the cluster or packaged into the cluster with FRR Kubernetes,

05:56.720 --> 06:01.520
which integrates with OVN-Kubernetes, in a way we will see in a few minutes.

06:03.120 --> 06:10.400
So how does this work in real life? So each of the nodes in your cluster will be running an FRR

06:10.400 --> 06:17.440
Kubernetes pod, which means that it is running a packaged version of FRR.

06:18.400 --> 06:24.880
And this router will communicate with your cluster edge router, which will propagate

06:24.880 --> 06:31.040
your routes or import routes from the fabric, which could be the internet, and would connect to,

06:31.040 --> 06:36.720
let's say, your thing that is running somewhere else via a provider edge router for you to access

06:36.720 --> 06:42.240
your provider network. Now for the use cases, I will leave it to my colleague, Or.

06:48.160 --> 06:56.000
Hello. So let's present a few use cases that we find interesting for our customers.

06:56.000 --> 07:04.800
So the first use case is getting access to the provider network. This is the painful point where

07:04.800 --> 07:09.760
you need to mess around with services and this kind of stuff. So the use case is basically

07:10.400 --> 07:17.360
to enable a pod or VM to access your cloud network or provider network.

07:19.120 --> 07:26.720
So we do that by importing the provider network routes into the cluster network. We use BGP for that.

07:27.440 --> 07:33.360
And we have a few custom resources that we can use to control it. We will see it in a few

07:33.360 --> 07:44.560
moments in the demo. And this resource is called FRRConfiguration. Remember this name,

07:44.560 --> 07:50.480
we will see it in a few. And the other use case is the other way around, is to expose your cluster

07:50.480 --> 08:01.920
network to the outside. It means that you enable services running on your cloud network

08:01.920 --> 08:10.160
to tap into your VMs or pods. We do that with two resources. The same one I mentioned

08:10.160 --> 08:18.240
earlier, FRRConfiguration, and another resource called RouteAdvertisements.

08:20.080 --> 08:29.520
And now let's get to a few implementation details. So API, we have two resources as I mentioned.

08:29.520 --> 08:38.000
One is FRRConfiguration. This resource controls whom to peer with. It controls

08:38.000 --> 08:46.000
who are my neighbors, which means the BGP routers that I want to connect to and to manage the

08:46.000 --> 08:53.920
sessions, what their autonomous system number is, and which nodes consume this configuration.

08:54.240 --> 09:03.440
It provides granular configuration that allows you to take a subset of nodes and do the BGP

09:03.440 --> 09:10.640
sessions over them and not your whole cluster. And the other resource is called RouteAdvertisements.

09:11.600 --> 09:17.760
This one controls what to advertise in your BGP session. It means it can be

09:17.760 --> 09:23.840
connected to your cluster default network, or user-defined networks,

09:23.840 --> 09:28.880
secondary networks. And which FRRConfiguration to use for that.

09:31.440 --> 09:38.800
Yeah, this is the resource set. So let's talk about the relationships between

09:38.880 --> 09:45.120
OVN-K, OVN-Kubernetes, and FRR instances. So we have OVN-Kubernetes.

09:46.000 --> 09:55.520
It monitors each node's kernel routing tables and it provisions FRR configurations

09:56.480 --> 10:01.760
according to the API, to the configuration that you want to make.

10:02.720 --> 10:11.520
FRR Kubernetes takes these configurations and translates them to the router configuration, running

10:11.520 --> 10:21.600
on each node's instance. It's pretty straightforward. This is how it's done. In addition to

10:21.600 --> 10:28.480
FRRConfiguration, OVN-Kubernetes also monitors the RouteAdvertisements,

10:28.560 --> 10:37.200
a resource that I mentioned earlier. And according to that, again, it is translated to an FRR configuration

10:37.200 --> 10:48.880
and so on, down to your node's router configuration. And one more nice topic is the failover

10:48.880 --> 10:56.800
capability using Bidirectional Forwarding Detection, or BFD for short. This is basically

10:56.880 --> 11:05.040
what lets BGP provide high availability and fail over to the next best route.
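
NOTE
Editor's sketch, not from the talk: the "fail over to the next best route" idea,
assuming a hypothetical list of candidate next hops where a lower metric is preferred
and fault detection reports a set of next hops that are down. The addresses are
documentation examples, not the demo's.
```python
# Hypothetical data model: candidate next hops for one prefix; lower metric wins.
def best_next_hop(candidates, down):
    """Pick the preferred next hop that fault detection has not marked down."""
    alive = [c for c in candidates if c["next_hop"] not in down]
    if not alive:
        return None  # no usable path left for this prefix
    return min(alive, key=lambda c: c["metric"])["next_hop"]
candidates = [
    {"next_hop": "192.0.2.1", "metric": 10},
    {"next_hop": "192.0.2.2", "metric": 20},
]
print(best_next_hop(candidates, down=set()))          # preferred path is used
print(best_next_hop(candidates, down={"192.0.2.1"}))  # falls back to the next best
```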

11:05.840 --> 11:10.160
So BFD is a protocol for detecting faulty links between two devices;

11:10.160 --> 11:18.480
a device can be a switch or a router. BGP utilizes it to detect broken links and converge

11:18.560 --> 11:26.160
to an alternate one. And it ensures rapid failover and high availability, which is essential

11:26.800 --> 11:35.600
for production applications and minimizes downtime. And with that, we will see a few demos when

11:35.600 --> 11:42.640
we get what we do. So I'm going to try this now. Yeah, this is a lot better. Should have thought

11:42.640 --> 11:50.960
of that before. Yeah, so you can scan this to access the demo, but I will be showing it.

11:52.240 --> 12:00.960
I really hope the font is big enough. And let us begin. So first thing, we are using a kind

12:00.960 --> 12:07.520
cluster for this, plus a couple of containers that are deployed on the laptop. So we have our

12:07.520 --> 12:13.280
kind nodes right there. So we have ovn-control-plane, ovn-worker, and ovn-worker2,

12:13.280 --> 12:19.920
with fairly intuitive naming. And then we have two extra containers. We have FRR and we have

12:19.920 --> 12:27.920
BGP server. So the setup we are running is kind of: FRR is a router, obviously. And then you have

12:27.920 --> 12:34.960
BGP server, which is connected directly into the router. And then you have the Kubernetes cluster

12:35.840 --> 12:42.000
containers, which are also connected to the router. So FRR has two interfaces, one connected to a switch,

12:42.000 --> 12:46.960
which interconnects with the Kubernetes nodes. And it has a second interface, which connects to the

12:46.960 --> 12:55.760
BGP server. The default route in BGP server is the router. Okay, what are we going to see?

12:55.760 --> 13:00.720
I'm going to start this script that does this. And this is the configuration that we have.

13:00.720 --> 13:09.360
So we have our cluster ASN, so 64512. I mean, it just has to be a number. And then the network

13:09.360 --> 13:14.640
we want to import. So remember, we have policy. It's not like I'm going to, I don't know,

13:15.920 --> 13:21.280
bring all the routes from the outside world. No, I just want to be able to reach this specific

13:21.280 --> 13:27.600
route. So you have policy for that. This is the prefix we want to import: 172.26.0.0/16.

13:28.000 --> 13:33.680
And we are deploying a virtual machine in our cluster as well.

13:35.920 --> 13:40.720
I think that is it. Oh, an important one: the BGP server. So the container that we have

13:40.720 --> 13:46.560
outside the cluster has this IP address, so the .3 IP.

13:49.520 --> 13:56.880
Okay. So, okay. This is the FRRConfiguration. Remember what Or told us,

13:56.960 --> 14:03.200
this defines the neighbors. So you're specifying whom to peer with here. So you're defining one

14:03.200 --> 14:10.480
router in your cluster. The ASN I told you about and this router has two neighbors. One using an

14:10.480 --> 14:17.440
IPv4 address and another using an IPv6 address. Essentially, it's the same router. The router is

14:17.440 --> 14:24.480
just dual stack. So as you see, it has the .5 address both in the IPv4 and the IPv6 address.

14:24.800 --> 14:31.760
And we define here this toReceive allowed. You're saying I just want to import this

14:31.760 --> 14:39.040
particular prefix from the fabric. In the allowed stanza, it could have

14:39.040 --> 14:45.280
been like all, and you'd import everything that the fabric knows about. You would bring that down

14:45.280 --> 14:51.040
to your cluster routers. That's not what you want. We just want this particular prefix.
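
NOTE
Editor's sketch, not from the talk: the receive policy described here, modelled as a
plain function. The helper name, the mode strings and the exact-match semantics are
assumptions for illustration; this is not the frr-k8s implementation. Only the
172.26.0.0/16 prefix comes from the demo, the other routes are made up.
```python
import ipaddress
# Hypothetical helper: decide which routes learned from a neighbor get imported.
def import_routes(advertised, mode, allowed_prefixes=()):
    """Return the advertised prefixes that pass the receive policy."""
    routes = [ipaddress.ip_network(p) for p in advertised]
    if mode == "all":
        # Import everything the fabric advertises (discouraged in the talk).
        return [str(r) for r in routes]
    # Filtered mode (assumed exact match): keep only explicitly allowed prefixes.
    allowed = {ipaddress.ip_network(p) for p in allowed_prefixes}
    return [str(r) for r in routes if r in allowed]
fabric = ["172.26.0.0/16", "10.0.0.0/8", "192.168.50.0/24"]
print(import_routes(fabric, "filtered", ["172.26.0.0/16"]))
print(import_routes(fabric, "all"))
```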

14:55.200 --> 15:00.960
Okay. This just shows that the routers are running in each of the nodes. And now we are

15:00.960 --> 15:06.000
provisioning the IP address. Sorry, provisioning the virtual machine. We see it's scheduling.

15:09.040 --> 15:15.120
Maybe fast forward. Okay. So it is now running. It got an IP address. So it just has the

15:15.120 --> 15:23.520
cluster default network attachment. And I'm showing here, yeah, the VMI spec for this virtual

15:23.520 --> 15:29.040
machine and the interesting thing here. It just has a cluster default network attachment. So you see

15:29.040 --> 15:40.640
here, masquerade interface type. Okay. Now what I'm going to do is to show the routes on the nodes

15:40.640 --> 15:46.880
themselves. So I created a debug pod on ovn-worker, which is where our virtual machine was

15:46.880 --> 15:53.280
scheduled on. And as we see we have imported this route from the fabric. So you see this

15:53.280 --> 16:02.400
172.26.0.0/16 is a route on, well, on the cluster nodes.

16:04.880 --> 16:10.800
And what we will do now is access the BGP server thing, which is located outside the cluster.

16:10.880 --> 16:22.880
So for that we console into our VM. For the refer to the best word. And we will now

16:22.880 --> 16:30.400
ping the IP address that is shown right here, and we can ping it. And we can also curl the

16:30.400 --> 16:40.640
thing, which is what we wanted to show in this first demo. Okay. We can see the

16:40.640 --> 16:47.600
ping again. Now what I want to show is that this is actually doing something. So I'm going to

16:47.600 --> 16:54.640
delete the configuration, thus saying that my router is no longer connected to this. You see here

16:54.720 --> 17:08.000
the FRRConfiguration. And so you notice the ping is still going, still going. I'm going to delete it.

17:13.680 --> 17:20.080
And the ping stopped. And what I'm going to do again is I'm going to provision like the script

17:20.160 --> 17:26.000
is declarative, so it does nothing if things already exist. One of the things it does is provision

17:26.000 --> 17:33.120
again the FRRConfiguration. So the ping resumes working. And this concludes our first demo, in which

17:33.120 --> 17:40.800
we try to access a provider network. Let's move to the second one, which shows the second use case

17:40.800 --> 17:46.640
Or told us about, in which we want to advertise to the outside world our internal network, or

17:46.720 --> 17:55.040
our cluster network. So for that, we use the second script. We are using iBGP here. We can see

17:55.040 --> 18:03.360
that from the... it's gone. From the cluster ASN and the external ASN: they're the same. So iBGP;

18:03.360 --> 18:08.720
our neighbor, the .5 address. We are no longer using dual stack for this. We could. There's no reason

18:08.720 --> 18:15.600
to. So let's not. Now, interestingly enough, the cluster network we want to advertise to the outside

18:15.600 --> 18:22.640
world is this prefix right here. So 10.100.0.0/16. We want to advertise

18:22.640 --> 18:29.040
the pod network. And the provider service that we have outside the cluster has the .3 IP address, the

18:29.040 --> 18:35.600
same one as we had before. So this is the FRRConfiguration. As you see, the only difference is

18:35.600 --> 18:40.960
in the allowed. It has mode all. So I want to import everything the fabric tells me. Don't do this

18:40.960 --> 18:49.920
in production ever. And this is, okay, this is how we configure our UDN. So this is an isolated network.

18:49.920 --> 18:57.520
This is just a detail. So let's skip over it. And this is the relevant part we have right here. So this

18:57.520 --> 19:05.040
is the second CR that Or told us about. This is the RouteAdvertisements. So while the other told us

19:05.120 --> 19:11.120
whom to peer with, this will tell us what we are advertising or exposing to the outside world. So

19:11.120 --> 19:18.400
we want to use the FRRConfiguration. So we want to consume the neighbors that

19:18.400 --> 19:26.320
we defined on FRRConfigurations with these labels. And we want to expose the networks that have

19:26.320 --> 19:32.080
this label right here. And that is pretty much it. Let's see what happens when we do it.
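
NOTE
Editor's sketch, not from the talk: how label-based selection of this kind works,
using Kubernetes-style matchLabels semantics. Every object name and label below is
hypothetical; the real resources and their field names are in the project docs.
```python
def matches(labels, match_labels):
    """Kubernetes-style matchLabels: every selector key/value must be present."""
    return all(labels.get(k) == v for k, v in match_labels.items())
# Hypothetical objects and label names, for illustration only.
frr_configs = {"receive-all": {"routeAdvertisements": "vm-net"}, "other": {"team": "infra"}}
networks = {"isolated-vm-net": {"advertise": "true"}, "internal-only": {}}
frr_selector = {"routeAdvertisements": "vm-net"}   # which neighbor definitions to consume
network_selector = {"advertise": "true"}           # which networks to expose
selected_configs = [n for n, l in frr_configs.items() if matches(l, frr_selector)]
selected_networks = [n for n, l in networks.items() if matches(l, network_selector)]
print(selected_configs, selected_networks)
```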

19:32.960 --> 19:40.320
Because again, remember our second use case, the thing we will be seeing now is how can we get rid of

19:40.320 --> 19:45.040
NAT and not have to use a Kubernetes service to access our workloads.

19:45.760 --> 20:02.400
Okay, the router pods are running. I'm scheduling the VM. Scheduling, scheduled, running.

20:03.200 --> 20:08.400
It has an IP address in the range of the network we want to expose. I'm going to copy that IP address.

20:09.280 --> 20:17.280
Now I'm showing the routes in the fabric, right? So, interestingly enough, you'll see here that

20:18.320 --> 20:25.840
the fabric knows a way to access this prefix. And knows how to reach it in three different ways.

20:25.840 --> 20:32.080
So you can use any of these three next-hop IP addresses to reach it. Why three? Like,

20:32.080 --> 20:37.520
these are the three cluster nodes that we have. So this means that the traffic could go

20:38.160 --> 20:41.040
into a workload into any of these IPs.
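
NOTE
Editor's sketch, not from the talk: one prefix reachable through three next hops.
The talk does not say how the demo router chooses among them; hashing the flow
5-tuple is a common approach and is assumed here. All addresses are hypothetical.
```python
import hashlib
def pick_next_hop(flow, next_hops):
    """Map a flow to one next hop deterministically by hashing its 5-tuple."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]
# Hypothetical route entry: one advertised prefix, three node next hops.
route = {"prefix": "10.100.0.0/16", "next_hops": ["172.18.0.2", "172.18.0.3", "172.18.0.4"]}
flow = ("172.26.0.3", 40000, "10.100.1.5", 80, "tcp")
print(pick_next_hop(flow, route["next_hops"]))  # same flow always gets the same next hop
```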

20:53.520 --> 20:59.680
Yeah, here you see the exact same thing, but not using the FRR API, just the typical route

20:59.680 --> 21:06.560
table on your nodes. And now we will docker exec into our container and try to reach

21:06.640 --> 21:12.480
into the VM we have deployed on this isolated network. So docker exec into the BGP server.

21:16.320 --> 21:20.320
And we will ping the IP address that we are seeing right there, and it is working.

21:21.600 --> 21:27.280
Which is what we hope to see. So you're exposing a workload running on a Kubernetes cluster directly

21:27.280 --> 21:33.920
using its IP. We did not have to create a service. There's no NAT involved. This is literally

21:33.920 --> 21:39.840
what we wanted to show. Now I'm going to do the exact same thing I did before. I'm going to delete

21:39.840 --> 21:44.880
the route advertisements and we're going to be seeing that the ping will stop. I could have

21:44.880 --> 21:50.480
done the same thing as I did before. I could remove the FRRConfiguration; the result would have been the same.

21:52.080 --> 21:56.480
As you see it stops, I'm going to do the same thing again. Run the entire script, which is

21:56.720 --> 22:11.040
idempotent, which will make the ping resume working. And it has resumed. So Or, take it away.

22:14.320 --> 22:25.840
And wrapping up with some conclusions. So before we go, we want you to go with the following points

22:27.120 --> 22:35.840
So we saw the integration between OVN-Kubernetes and FRR Kubernetes that allows us to connect

22:35.840 --> 22:45.280
workloads over BGP and drop the excessive service configuration and have a seamless,

22:46.080 --> 22:58.480
somewhat seamless experience of exposing your VMs or pods outside. And the most important thing

22:58.480 --> 23:07.600
is that you can reach your VM with its original IP. No NAT, no funny stuff, and no weird IPs.

23:07.840 --> 23:18.880
We saw that the integration with the provider network is dynamic. It means that if something fails

23:18.880 --> 23:25.680
in your provider network, thanks to BGP, it will recover and relearn your provider

23:25.680 --> 23:33.680
fabric routes and distribute them to your nodes. So you don't need to do manual configurations

23:34.480 --> 23:41.200
and it just makes your admin happier because they have less work to do.

23:43.600 --> 23:50.400
They no longer need to mess with annoying static configurations and manual work.

23:52.240 --> 24:02.480
And as we mentioned earlier, we finally dropped the NAT overhead and annoying configuration.

24:04.320 --> 24:14.160
Ingressing to the VM and exposing it by its original IP. So this feature is something that our

24:14.160 --> 24:21.520
customers have wanted for a long time. It makes them happier and admins happier.

24:22.480 --> 24:33.760
And this is the end. We have a few resources. We have a nice BGP intro,

24:34.320 --> 24:42.160
the project's docs, and the integration doc, which is pretty straightforward. I hope you find it useful.

24:42.800 --> 24:48.800
And that's all. Thank you.

24:50.960 --> 24:54.880
Are there any questions? Yes, it's been a long time ago.

24:54.880 --> 25:01.600
Do you happen to have some sort of, well, metrics that show how much improved

25:01.600 --> 25:05.360
the performance is going the BGP way compared to NAT?

25:06.080 --> 25:14.160
So the question was whether we have some metrics on how much we improve the network,

25:15.760 --> 25:22.560
comparing BGP to the NAT configuration. So rough metrics for that would be

25:23.280 --> 25:29.200
like 40% for the NAT overhead, and we didn't even talk about the underlying network

25:29.760 --> 25:37.840
that may use VXLAN and other encapsulation protocols. This basically, this basically

25:41.600 --> 25:57.280
do we have more time? So the question was whether the VM can

25:57.280 --> 26:08.960
ingress and egress with the same IP? So the thing about the egress IP is that you'll be pretty

26:08.960 --> 26:13.840
much be advertising that to the network as well. So the answer is yes, pretty much. If you're

26:16.080 --> 26:22.800
using it to egress, we want to try to force it to go the same way. Otherwise, your conntrack

26:22.800 --> 26:28.400
tables would go bananas, your traffic could get dropped somewhere, and it's a debugging nightmare.

26:28.400 --> 26:33.840
Yeah, that's partially one of the reasons why that feature, that integration, was created.

26:38.960 --> 26:49.680
Sorry. I do not know what those are, sorry. And I think time's up, sorry. You can catch us

26:49.680 --> 26:51.440
outside, okay?