Open vSwitch

Open Virtual Network (OVN) is a subproject of Open vSwitch (OVS), a performant, programmable, multi-platform virtual switch. OVN extends OVS's existing capabilities with support for overlay networks by introducing virtual network abstractions such as virtual switches and routers. Moreover, OVN provides native methods for setting up Access Control Lists (ACLs) and network services such as DHCP. Many Red Hat products, such as Red Hat OpenStack Platform and Red Hat Virtualization, are now using OVN, and Red Hat OpenShift Container Platform will be using OVN soon.

In this article, I'll cover how OVN ARP/ND_NS actions work, the main limitations of the current implementation, and how to overcome them. First, I'll provide a brief overview of OVN's architecture to frame the discussion.

OVN architecture

An OVN deployment consists of several components:

  • The OVN/CMS plugin (for example, Neutron) is the component through which the cloud management system (CMS) interfaces with OVN. The plugin's main purpose is to translate the CMS's notion of the logical network configuration into an intermediate representation, composed of logical switches and routers, that OVN can interpret.
  • The OVN northbound database (NBDB) is an OVSDB instance responsible for storing the logical network representation received from the CMS plugin. The OVN northbound database has only two clients: the OVN/CMS plugin and the ovn-northd daemon.
  • The ovn-northd daemon connects to both the OVN northbound database and the OVN southbound database. It translates the logical network configuration, expressed in conventional networking concepts and taken from the OVN northbound database, into logical datapath flows in the OVN southbound database.
  • The OVN southbound database (SBDB) is also an OVSDB database, but its schema is quite different from the northbound one. In particular, instead of familiar networking concepts, the southbound database defines the network in terms of match-action rule collections called logical flows. Logical flows, while conceptually similar to OpenFlow flows, use logical concepts, such as virtual machine instances, instead of physical ones, such as physical Ethernet ports. The southbound database includes three data types:
    • Physical network data, such as the VM's IP address and tunnel encapsulation format
    • Logical network data, such as packet forwarding mode
    • The binding relationship between the physical network and logical network

OVN architecture
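
Each layer of this pipeline can be inspected from the command line. Here is a minimal sketch, assuming a host running the OVN central services and an ovn-controller whose integration bridge is named br-int (the conventional name):

# Logical network configuration, as written by the CMS plugin (NBDB)
ovn-nbctl show

# Logical flows generated by ovn-northd (SBDB)
ovn-sbctl lflow-list

# OpenFlow flows installed by ovn-controller on the integration bridge
ovs-ofctl dump-flows br-int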

L2 address resolution problem

A typical OVN deployment is shown below, where the overlay network is connected to an external one through a localnet port (ext-localnet, in this case):

Typical OVN deployment

The related OVN NBDB network configuration is shown below:

switch 35b34afe-ee16-469b-9893-80b024510f33 (sw2)
    port sw2-port4
        addresses: ["00:00:02:00:00:04 1.2.0.4 2001:db8:2::14"]
    port sw2-port3
        addresses: ["00:00:02:00:00:03 1.2.0.3 2001:db8:2::13"]
    port sw2-portr0
        type: router
        addresses: ["00:00:02:ff:00:02"]
        router-port: lrp2
switch c16e344a-c3fe-4884-9121-d1d3a2a9d9b1 (sw1)
    port sw1-port1
        addresses: ["00:00:01:00:00:01 1.1.0.1 2001:db8:1::11"]
    port sw1-portr0
        type: router
        addresses: ["00:00:01:ff:00:01"]
        router-port: lrp1
    port sw1-port2
        addresses: ["00:00:01:00:00:02 1.1.0.2 2001:db8:1::12"]
switch ee2b44de-7d2b-4ffa-8c4c-2e1ac7997639 (sw-ext)
    port ext-localnet
        type: localnet
        addresses: ["unknown"]
    port ext-lr0
        type: router
        addresses: ["02:0a:7f:00:01:29"]
        router-port: lr0-ext
router 681dfe85-6f90-44e3-9dfe-f1c81f4cfa32 (lr0)
    port lrp2
        mac: "00:00:02:ff:00:02"
        networks: ["1.2.254.254/16", "2001:db8:2::1/64"]
    port lr0-ext
        mac: "02:0a:7f:00:01:29"
        networks: ["192.168.123.254/24", "2001:db8:f0f0::1/64"]
    port lrp1
        mac: "00:00:01:ff:00:01"
        networks: ["1.1.254.254/16", "2001:db8:1::1/64"]

Whenever a device belonging to the overlay network (for example, PC1) tries to reach an external device (for example, PC-EXT), it forwards the packet to the OVN logical router (LR0). If LR0 has not already resolved the L2/L3 address correspondence for PC-EXT, it sends an ARP request (or a Neighbor Solicitation, for IPv6 traffic) for PC-EXT. The current OVN implementation employs the ARP action to perform L2 address resolution. In other words, OVN instructs OVS to perform a "packet-in" action whenever it needs to forward an IP packet to an unknown L2 destination. The ARP action replaces the IPv4 packet being processed with an ARP frame that is forwarded on the external network to resolve the PC-EXT MAC address. The IPv4/IPv6 OVN SBDB rules corresponding to this processing are shown below:

table=10(lr_in_arp_request  ), priority=100  , match=(eth.dst == 00:00:00:00:00:00), action=(arp { eth.dst = ff:ff:ff:ff:ff:ff; arp.spa = reg1; arp.tpa = reg0; arp.op = 1; output; };)
table=10(lr_in_arp_request  ), priority=100  , match=(eth.dst == 00:00:00:00:00:00), action=(nd_ns { nd.target = xxreg0; output; };)
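
On a live deployment, these logical flows can be listed directly from the southbound database, for example:

ovn-sbctl lflow-list lr0 | grep lr_in_arp_request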

The main drawback of this processing is the loss of the first packet of each connection (as shown in the following ICMP traffic), which introduces latency in TCP connections established with devices that do not belong to the overlay network:

PING 192.168.123.10 (192.168.123.10) 56(84) bytes of data.
64 bytes from 192.168.123.10: icmp_seq=2 ttl=63 time=0.649 ms
64 bytes from 192.168.123.10: icmp_seq=3 ttl=63 time=0.321 ms
64 bytes from 192.168.123.10: icmp_seq=4 ttl=63 time=0.331 ms
64 bytes from 192.168.123.10: icmp_seq=5 ttl=63 time=0.137 ms
64 bytes from 192.168.123.10: icmp_seq=6 ttl=63 time=0.125 ms
64 bytes from 192.168.123.10: icmp_seq=7 ttl=63 time=0.200 ms
64 bytes from 192.168.123.10: icmp_seq=8 ttl=63 time=0.244 ms
64 bytes from 192.168.123.10: icmp_seq=9 ttl=63 time=0.224 ms
64 bytes from 192.168.123.10: icmp_seq=10 ttl=63 time=0.271 ms
--- 192.168.123.10 ping statistics ---
10 packets transmitted, 9 received, 10% packet loss, time 9214ms

Proposed solution: Add buffering support for IP packets

To overcome this limitation, a solution has been proposed that adds buffering support for IP packets: incoming IP packets whose destination L2 address has not been resolved yet are queued, and they are re-injected into ovs-vswitchd as soon as the neighbor discovery process completes.
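
As a side note, once resolution completes, the learned L2/L3 correspondence is stored in the southbound MAC_Binding table, so a quick way to verify that the external next hop (PC-EXT's 192.168.123.10 address, in this setup) has been resolved is:

ovn-sbctl list MAC_Binding

# Or filter for a single IP; --columns trims the output
ovn-sbctl --columns=logical_port,ip,mac find MAC_Binding ip=192.168.123.10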

Repeating the above test shows that even the first ICMP echo request is now received by PC-EXT:

PING 192.168.123.10 (192.168.123.10) 56(84) bytes of data.
64 bytes from 192.168.123.10: icmp_seq=1 ttl=63 time=1.92 ms
64 bytes from 192.168.123.10: icmp_seq=2 ttl=63 time=0.177 ms
64 bytes from 192.168.123.10: icmp_seq=3 ttl=63 time=0.277 ms
64 bytes from 192.168.123.10: icmp_seq=4 ttl=63 time=0.139 ms
64 bytes from 192.168.123.10: icmp_seq=5 ttl=63 time=0.281 ms
64 bytes from 192.168.123.10: icmp_seq=6 ttl=63 time=0.247 ms
64 bytes from 192.168.123.10: icmp_seq=7 ttl=63 time=0.211 ms
64 bytes from 192.168.123.10: icmp_seq=8 ttl=63 time=0.187 ms
64 bytes from 192.168.123.10: icmp_seq=9 ttl=63 time=0.439 ms
64 bytes from 192.168.123.10: icmp_seq=10 ttl=63 time=0.253 ms
--- 192.168.123.10 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9208ms

Future development

A possible future enhancement to the described methodology could be to use the developed IP buffering infrastructure to queue packets waiting for given events, and then send them back to ovs-vswitchd as soon as the requested message has been received. For example, the IP buffering infrastructure could be used to queue packets destined for an OpenShift pod that has not yet completed its bootstrap phase. Stay tuned :)

