In this article, I discuss external connectivity in Open Virtual Network (OVN), a subproject of Open vSwitch (OVS), using a distributed gateway router.
OVN provides external connectivity in two ways:
- A logical router with a distributed gateway port, which is referred to as a distributed gateway router in this article
- A logical gateway router
In this article, you will see how to create a distributed gateway router and an example of how it works.
For the CMS (cloud management system), creating a distributed gateway router has some advantages over using a logical gateway router:
- It is easier to create a distributed gateway router because the CMS doesn't need to create a transit logical switch, which is needed for a logical gateway router.
- A distributed gateway router supports distributed north/south traffic, whereas the logical gateway router is centralized on a single gateway chassis.
- A distributed gateway router supports high availability.
Note: The CMS can be OpenStack, Red Hat OpenShift, Red Hat Virtualization, or any other system that manages a cloud.
Setup details
Let's first talk about the deployment details. I will use an example setup with five nodes: three controller nodes and two compute nodes. The tenant VMs are created on the compute nodes. The controller nodes run the OVN database servers in active/passive mode.
Note: The ovn-nbctl and ovn-sbctl commands should be run on the node where the OVN database servers are running. Alternatively, you can pass the --db option with the IP address and port of the database server.
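For example, assuming the active northbound database listens on its default TCP port 6641 on controller-0 (the address below is taken from this setup; your database may listen on a different address):

# Query the northbound database remotely instead of running locally on the DB node
ovn-nbctl --db=tcp:172.17.2.28:6641 show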
Chassis in OVN
In OVN terminology, each node is referred to as a chassis. A chassis is simply a node where the ovn-controller service is running. For a chassis to act as a gateway chassis, it must be capable of providing external (north/south) connectivity to the tenant traffic. It also requires the following configuration:
- Configure ovn-bridge-mappings, which provides a list of key-value pairs that map a physical network name to a local OVS bridge that provides connectivity to that network:
ovs-vsctl set open . external-ids:ovn-bridge-mappings=provider:br-provider
- Create the provider OVS bridge and add the interface that provides external connectivity to it:
ovs-vsctl --may-exist add-br br-provider
ovs-vsctl --may-exist add-port br-provider INTERFACE_NAME
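As an optional sanity check, you can read the mapping back on the chassis:

# Should print "provider:br-provider" if the mapping above was applied
ovs-vsctl get open . external-ids:ovn-bridge-mappings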
In the above setup, all the controller nodes act as gateway chassis.
Below is the output of ovn-sbctl show for my setup:
Chassis "controller-0"
    hostname: "controller-0.localdomain"
    Encap geneve
        ip: "172.17.2.28"
        options: {csum="true"}
Chassis "controller-1"
    hostname: "controller-1.localdomain"
    Encap geneve
        ip: "172.17.2.26"
        options: {csum="true"}
Chassis "controller-2"
    hostname: "controller-2.localdomain"
    Encap geneve
        ip: "172.17.2.18"
        options: {csum="true"}
Chassis "compute-0"
    hostname: "compute-0.localdomain"
    Encap geneve
        ip: "172.17.2.15"
        options: {csum="true"}
Chassis "compute-1"
    hostname: "compute-1.localdomain"
    Encap geneve
        ip: "172.17.2.17"
        options: {csum="true"}
Let's first create a couple of logical switches and logical ports and attach them to a logical router:
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "00:00:01:00:00:03 10.0.0.3"

ovn-nbctl ls-add sw1
ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "00:00:02:00:00:03 20.0.0.3"

ovn-nbctl lr-add lr0

# Connect sw0 to lr0
ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
ovn-nbctl lsp-add sw0 sw0-lr0
ovn-nbctl lsp-set-type sw0-lr0 router
ovn-nbctl lsp-set-addresses sw0-lr0 router
ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0

# Connect sw1 to lr0
ovn-nbctl lrp-add lr0 lr0-sw1 00:00:00:00:ff:02 20.0.0.1/24
ovn-nbctl lsp-add sw1 sw1-lr0
ovn-nbctl lsp-set-type sw1-lr0 router
ovn-nbctl lsp-set-addresses sw1-lr0 router
ovn-nbctl lsp-set-options sw1-lr0 router-port=lr0-sw1
Below is the output of ovn-nbctl show:
ovn-nbctl show
switch 05cf23bc-2c87-4d6d-a76b-f432e562ed71 (sw0)
    port sw0-port1
        addresses: ["00:00:01:00:00:03 10.0.0.3"]
    port sw0-lr0
        type: router
        router-port: lr0-sw0
switch 0dfee7ef-13b3-4cd0-87a1-7935149f551e (sw1)
    port sw1-port1
        addresses: ["00:00:02:00:00:03 20.0.0.3"]
    port sw1-lr0
        type: router
        router-port: lr0-sw1
router c189f271-86d6-4f7f-891c-672cb3aa543e (lr0)
    port lr0-sw0
        mac: "00:00:00:00:ff:01"
        networks: ["10.0.0.1/24"]
    port lr0-sw1
        mac: "00:00:00:00:ff:02"
        networks: ["20.0.0.1/24"]
The port sw0-port1 can communicate with sw1-port1 since both switches are connected to the logical router lr0. East-west traffic is fully distributed in OVN.
Now let's create a provider logical switch:
ovn-nbctl ls-add public

# Create a localnet port
ovn-nbctl lsp-add public ln-public
ovn-nbctl lsp-set-type ln-public localnet
ovn-nbctl lsp-set-addresses ln-public unknown
ovn-nbctl lsp-set-options ln-public network_name=provider
Notice the option network_name=provider. The network_name should match a physical network name defined in ovn-bridge-mappings. When a localnet port is defined in a logical switch, the ovn-controller running on each gateway chassis creates an OVS patch port between the integration bridge and the provider bridge so that the logical tenant traffic can leave to and enter from the physical network.
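A minimal way to confirm this on a gateway chassis is to list the ports on both bridges; the patch port names are generated by ovn-controller and will vary:

# Patch ports created by ovn-controller for the localnet port
ovs-vsctl list-ports br-int | grep patch
ovs-vsctl list-ports br-provider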
At this point, the tenant traffic from the logical switches sw0 and sw1 still cannot enter the public logical switch, since there is no association between it and the logical router lr0.
Creating a distributed router port
Let's first connect lr0 to public:
ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.200/24
ovn-nbctl lsp-add public public-lr0
ovn-nbctl lsp-set-type public-lr0 router
ovn-nbctl lsp-set-addresses public-lr0 router
ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public
We still need to schedule the distributed gateway port lr0-public on a gateway chassis. What does scheduling mean here? It means the chassis selected to host the gateway router port provides the centralized external connectivity. The north/south tenant traffic is redirected to this chassis, which acts as a gateway. This chassis applies all the NAT rules before sending the traffic out via the patch port to the provider bridge. It also means that when someone pings 172.168.0.200 or sends an ARP request for 172.168.0.200, the gateway chassis hosting the port sends the ping and ARP replies.
Scheduling the gateway router port
This can be done in two ways:
- Non-high-availability (non-HA) mode: The gateway router port is configured to be scheduled on a single gateway chassis. If the gateway chassis hosting this port goes down for some reason, the external connectivity is completely broken until the CMS (cloud management system) detects this and reschedules it to another gateway chassis.
- HA mode: The gateway router port is configured to be scheduled on a set of gateway chassis. The gateway chassis with the highest priority claims the gateway router port. If this gateway chassis goes down for some reason, the gateway chassis with the next highest priority claims the gateway router port.
Scheduling in non-HA mode
Select a gateway chassis where you want to schedule the gateway router port. Let's schedule it on controller-0. There are two ways to do this; run one of the following commands:
ovn-nbctl set logical_router_port lr0-public options:redirect-chassis=controller-0

ovn-nbctl list logical_router_port lr0-public
_uuid               : 0ced9cdb-fbc9-47f1-b2e2-97a49988d622
enabled             : []
external_ids        : {}
gateway_chassis     : []
ipv6_ra_configs     : {}
mac                 : "00:00:20:20:12:13"
name                : "lr0-public"
networks            : ["172.168.0.200/24"]
options             : {redirect-chassis="controller-0"}
peer                : []

or

ovn-nbctl lrp-set-gateway-chassis lr0-public controller-0 20
In the ovn-sbctl show output below, you can see that controller-0 is hosting the gateway router port lr0-public.
ovn-sbctl show
Chassis "d86bd6f2-1216-4a73-bcaf-3200b8ed8126"
    hostname: "controller-0.localdomain"
    Encap geneve
        ip: "172.17.2.28"
        options: {csum="true"}
    Port_Binding "cr-lr0-public"
Chassis "20dc7bfb-a329-4cf9-a8ac-3485f7d5be46"
    hostname: "controller-1.localdomain"
...
...
Scheduling in HA mode
In this case, we select a set of gateway chassis and set a priority for each chassis. The chassis with the highest priority hosts the gateway router port.
In our example, let's use all three gateway chassis: controller-0 with priority 20, controller-1 with 15, and controller-2 with 10.
Run the following commands:
ovn-nbctl lrp-set-gateway-chassis lr0-public controller-0 20
ovn-nbctl lrp-set-gateway-chassis lr0-public controller-1 15
ovn-nbctl lrp-set-gateway-chassis lr0-public controller-2 10
You can verify the configuration by running the following commands:
ovn-nbctl list gateway_chassis
_uuid               : 745d7f84-0516-4a0f-9b3d-772e5cb58a48
chassis_name        : "controller-1"
external_ids        : {}
name                : "lr0-public-controller-1"
options             : {}
priority            : 15

_uuid               : 6f2921d4-2555-4f81-9428-640cbf62151e
chassis_name        : "controller-0"
external_ids        : {}
name                : "lr0-public-controller-0"
options             : {}
priority            : 20

_uuid               : 97595b29-139d-4a43-9973-8995ffe17c64
chassis_name        : "controller-2"
external_ids        : {}
name                : "lr0-public-controller-2"
options             : {}
priority            : 10
ovn-nbctl list logical_router_port lr0-public
_uuid               : 0ced9cdb-fbc9-47f1-b2e2-97a49988d622
enabled             : []
external_ids        : {}
gateway_chassis     : [6f2921d4-2555-4f81-9428-640cbf62151e, 745d7f84-0516-4a0f-9b3d-772e5cb58a48, 97595b29-139d-4a43-9973-8995ffe17c64]
ipv6_ra_configs     : {}
mac                 : "00:00:20:20:12:13"
name                : "lr0-public"
networks            : ["172.168.0.200/24"]
options             : {}
peer                : []

ovn-sbctl show
Chassis "d86bd6f2-1216-4a73-bcaf-3200b8ed8126"
    hostname: "controller-0.localdomain"
    Encap geneve
        ip: "172.17.2.28"
        options: {csum="true"}
    Port_Binding "cr-lr0-public"
Chassis "20dc7bfb-a329-4cf9-a8ac-3485f7d5be46"
    hostname: "controller-1.localdomain"
...
...
You can always delete a gateway chassis' association to the distributed router port by running the following command:
ovn-nbctl lrp-del-gateway-chassis lr0-public controller-1
To support HA, OVN uses the Bidirectional Forwarding Detection (BFD) protocol, which it configures on the tunnel ports. When a gateway chassis hosting a distributed gateway port goes down, all the chassis detect that (thanks to BFD), and the gateway chassis with the next highest priority claims the port. For more details, see the OVN man pages: man ovn-nb, man ovn-northd, and man ovn-controller.
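If you are curious about the BFD sessions OVN sets up on the tunnel interfaces, an optional way to check them on any chassis is to ask the local Open vSwitch daemon:

# Show BFD session state for the tunnel interfaces on this chassis
ovs-appctl bfd/show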
Chassis redirect port
In the output of ovn-sbctl show, you can see Port_Binding "cr-lr0-public". What is cr-lr0-public? For every gateway router port scheduled, ovn-northd internally creates a logical port of type chassisredirect. This port represents an instance of the distributed gateway port that is scheduled on the selected chassis.
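You can inspect this port directly in the southbound database; the record's type column should be chassisredirect, and the chassis column shows which chassis is currently hosting it:

# Look up the chassisredirect port binding for lr0-public
ovn-sbctl find port_binding logical_port=cr-lr0-public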
What happens when a VM sends external traffic?
Now let's briefly see, from the OVN logical datapath pipeline perspective, what happens when a VM associated with the logical port (let's say sw0-port1) sends a packet to the destination 172.168.0.110. Let's assume the VM is running on compute-0 and the chassis redirect port is scheduled on controller-0. 172.168.0.110 could be associated with a physical server or a VM that is reachable via the provider network.
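Before walking through the pipelines, you can simulate this packet with ovn-trace against the southbound database. The sketch below reuses the addresses from this setup; 172.168.0.110 is just the example external destination:

# Trace a packet from sw0-port1 towards the external destination
ovn-trace sw0 'inport == "sw0-port1" && eth.src == 00:00:01:00:00:03 && eth.dst == 00:00:00:00:ff:01 && ip4.src == 10.0.0.3 && ip4.dst == 172.168.0.110 && ip.ttl == 64'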
On the compute chassis, the following occurs:
- When the VM sends the traffic, the logical switch pipeline of sw0 is run.
- From the logical switch pipeline, the packet enters the ingress router pipeline via the lr0-sw0 port, as it needs to be routed.
- The ingress router pipeline is run, the routing decision is made, and the outport is set to lr0-public.
Logical flows:
table=0 (lr_in_admission ), priority=50 , match=(eth.dst == 00:00:00:00:ff:01 && inport == "lr0-sw0"), action=(next;)
...
table=7 (lr_in_ip_routing ), priority=49 , match=(ip4.dst == 172.168.0.0/24), action=(ip.ttl--; reg0 = ip4.dst; reg1 = 172.168.0.200; eth.src = 00:00:20:20:12:13; outport = "lr0-public"; flags.loopback = 1; next;)
...
table=9 (lr_in_gw_redirect ), priority=50 , match=(outport == "lr0-public"), action=(outport = "cr-lr0-public"; next;)
- Since cr-lr0-public is scheduled on controller-0, the packet is sent to controller-0 via the tunnel port:
table=32, priority=100,reg15=0x4,metadata=0x3 actions=load:0x3->NXM_NX_TUN_ID[0..23],set_field:0x4->tun_metadata0,move:NXM_NX_REG14[0..14]->NXM_NX_TUN_METADATA0[16..30],output:ovn-cont-0
On the controller-0 chassis, the following occurs:
- controller-0 receives the traffic on the tunnel port and sends it to the egress pipeline of the logical router lr0:
table=0, priority=100,in_port="ovn-comp-0" actions=move:NXM_NX_TUN_ID[0..23]->OXM_OF_METADATA[0..23],move:NXM_NX_TUN_METADATA0[16..30]->NXM_NX_REG14[0..14],move:NXM_NX_TUN_METADATA0[0..15]->NXM_NX_REG15[0..15],resubmit(,33)
- The NAT rules are applied. That is, the source IP address 10.0.0.3 is NATed to 172.168.0.200:
table=1 (lr_out_snat ), priority=25 , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat(172.168.0.200);)
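This SNAT logical flow exists because a NAT rule is configured on lr0. The setup steps above don't show that step; with the addresses used in this example, it would look something like this:

# SNAT tenant traffic from 10.0.0.0/24 to the gateway port address
ovn-nbctl lr-nat-add lr0 snat 172.168.0.200 10.0.0.0/24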
- The packet is sent to the logical switch public via the lr0-public port:
table=3 (lr_out_delivery ), priority=100 , match=(outport == "lr0-public"), action=(output;)
- And the packet is sent out via the localnet port to the provider bridge and reaches the destination.
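If you want to confirm that the traffic leaving the gateway chassis carries the NATed source address, an optional packet capture on the provider interface (the INTERFACE_NAME added to br-provider earlier) works, assuming ICMP traffic such as a ping:

# Capture the NATed traffic on the physical provider interface
tcpdump -nei INTERFACE_NAME icmp and host 172.168.0.110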
Now let's see what happens for the reply traffic.
On the controller-0 chassis:
- The packet is received by the physical interface present in the provider bridge and enters the ingress pipeline of the logical switch public via the localnet port:
table=0,priority=100,in_port="patch-br-int-to",dl_vlan=0 actions=strip_vlan,load:0x1->NXM_NX_REG13[],load:0x7->NXM_NX_REG11[],load:0x8->NXM_NX_REG12[],load:0x4->OXM_OF_METADATA[],load:0x2->NXM_NX_REG14[],resubmit(,8)
- From public, the packet enters the pipeline of lr0 via the public-lr0 logical port.
- In the ingress router pipeline, the unSNAT rules are applied. That is, the destination IP address is unNATed from 172.168.0.200 to 10.0.0.3:
table=0 (lr_in_admission ), priority=50 , match=(eth.dst == 00:00:20:20:12:13 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(next;)
...
table=3 (lr_in_unsnat ), priority=100 , match=(ip && ip4.dst == 172.168.0.200 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;)
- Since 10.0.0.3 belongs to the logical switch sw0, the packet enters the ingress pipeline of sw0 via lr0-sw0:
table=7 (lr_in_ip_routing ), priority=49 , match=(ip4.dst == 10.0.0.0/24), action=(ip.ttl--; reg0 = ip4.dst; reg1 = 10.0.0.1; eth.src = 00:00:00:00:ff:01; outport = "lr0-sw0"; flags.loopback = 1; next;)
- The ingress pipeline of sw0 is run and the packet is sent to compute-0 via the tunnel port, because OVN knows that sw0-port1 resides on compute-0.
On the compute-0 chassis, the following occurs:
- compute-0 receives the traffic on the tunnel port and sends it to the egress pipeline of the logical switch sw0.
- In the egress pipeline, the packet is delivered to sw0-port1.
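If you want to reproduce the tables shown in this walkthrough on your own setup: the logical flows come from the southbound database and the OpenFlow flows from the local integration bridge. Something like the following, run on the database node and on the relevant chassis, should get you close (the -O flag may need adjusting to the OpenFlow version in use):

# Logical flows for the router and switch datapaths
ovn-sbctl lflow-list lr0
ovn-sbctl lflow-list sw0

# OpenFlow flows installed by ovn-controller on the integration bridge
ovs-ofctl -O OpenFlow13 dump-flows br-int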
Conclusion
This article provides an overview of the distributed gateway router in OVN: how it is created and what happens when a VM sends external traffic. Hopefully, it will help you understand external connectivity support in OVN and troubleshoot any issues related to it.