Kernel Open vSwitch Flow Programming

Typically, users interact with the Open vSwitch kernel datapath by using the 'ovs-ofctl' utility to program OpenFlow rules into 'ovs-vswitchd'. However, this isn't the only mechanism for forwarding packets via the openvswitch kernel module. A direct flow-programming interface is also available: the 'ovs-dpctl' utility can add flows to the kernel itself. This post covers how to influence the movement of packets through the openvswitch kernel module using 'ovs-dpctl'.

It's important to know how packets move through the kernel datapath. Whenever a packet arrives, the very first thing that happens is that a flow key is filled with metadata from the packet. This metadata includes Ethernet header information, ip header information, arp information, and so on; it can also be layered with tunnel protocol information if the packet arrived over a tunnel. Once the flow key is filled, skb processing begins: the flow key is matched against the existing flow cache. If a match is found, the actions associated with that flow are performed. If no match is found, the packet is either sent to the ovs-vswitchd daemon (this is referred to as an upcall) or dropped, if ovs-vswitchd is not running.

A datapath, in Open vSwitch parlance, is merely a collection of vports. Typically, flow entries are populated for a datapath when an upcall happens: ovs-vswitchd evaluates the packet against the OpenFlow rules, and the resulting flow key plus actions are programmed into kernel space.
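The lookup described above can be sketched as a toy model in shell. This is only an illustration under simplifying assumptions: the table format and the handle_packet function are hypothetical, the keys are abbreviated, and the real kernel lookup also applies masks rather than comparing whole keys.

```shell
#!/bin/sh
# Toy model of the datapath lookup; not how the kernel stores flows.
# Each table line is "flow-key<TAB>actions".
FLOW_TABLE=$(printf '%s\t%s\n' \
    "in_port(1),eth_type(0x806)" "2" \
    "in_port(2),eth_type(0x806)" "1")

# handle_packet (hypothetical): $1 is the packet's flow key, $2 is "yes"
# when ovs-vswitchd is running. Prints what happens to the packet.
handle_packet() {
    actions=$(printf '%s\n' "$FLOW_TABLE" |
        awk -F '\t' -v k="$1" '$1 == k { print $2; exit }')
    if [ -n "$actions" ]; then
        echo "actions: $actions"    # flow cache hit: perform the actions
    elif [ "$2" = "yes" ]; then
        echo "upcall"               # miss: hand the packet to userspace
    else
        echo "drop"                 # miss and no daemon to ask
    fi
}

handle_packet "in_port(1),eth_type(0x806)" yes
handle_packet "in_port(1),eth_type(0x800)" yes
handle_packet "in_port(1),eth_type(0x800)" no
```

The three calls print the cache hit, the upcall, and the drop case in turn.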

There may be reasons not to use 'ovs-vswitchd' to perform these actions. One possible case is order-sensitive udp traffic: packets passed to userspace may arrive at the target application after later packets that were bridged directly by the kernel. Another is flow-based bridging without the use of a userspace daemon. Whatever the reason for wanting to program the kernel datapath directly, the Open vSwitch suite comes with a tool to do this, called 'ovs-dpctl'.

When programming rules with 'ovs-dpctl', there are a few caveats to understand. The biggest is that 'ovs-dpctl' doesn't accept OpenFlow syntax. Instead, it uses a special syntax that builds up a flow key specification. This means that the syntax given to 'ovs-ofctl' will not be accepted by 'ovs-dpctl'. However, the syntax is still simple to use and doesn't stray too far from the familiar OpenFlow.

The flow-key match rules are made up of various parts of the metadata. The ones we'll focus on are the most common: Ethernet match rules, arp match rules, and ip match rules. There is even an encap() specifier, which allows peering into encapsulated packets. Each flow key match rule may require at least an "empty match" for the next layer. As an example, requesting an Ethernet match for type 0x806 (arp) will require specifying an empty arp() rule. The full suite of match rules (including tunnel matches, etc.) can be found in the parse_odp_key_mask_attr() function of lib/odp-util.c at the time of writing.
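To make the layering concrete, here is a minimal sketch of a shell helper that assembles a flow key from its parts. The flow_key function is hypothetical (it is not part of Open vSwitch) and only builds the string; it doesn't check that the layers are valid for the given Ethernet type.

```shell
#!/bin/sh
# flow_key (hypothetical helper): build a flow key string for ovs-dpctl.
#   $1:    datapath port number for in_port()
#   $2:    Ethernet type, e.g. 0x806 for arp or 0x800 for ipv4
#   $3...: next-layer match rules, e.g. "arp()" or "ipv4(proto=1)" "icmp()"
flow_key() {
    port=$1
    eth_type=$2
    shift 2
    key="in_port(${port}),eth(),eth_type(${eth_type})"
    # Append each remaining layer, comma separated.
    for layer in "$@"; do
        key="${key},${layer}"
    done
    printf '%s\n' "$key"
}

# Key for arp arriving on port 1, including the empty arp() match.
flow_key 1 0x806 "arp()"
```

The last line prints in_port(1),eth(),eth_type(0x806),arp(), which is the form of key passed to 'ovs-dpctl add-flow'.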

As a simple introduction, let's create two network namespaces and link them via an openvswitch flow. This will require a system that has the openvswitch kernel module compiled and loaded, as well as sufficient privileges to create namespaces and interfaces.

First, create two namespaces, called 'ns1' and 'ns2':

 root@localhost ~:# ip netns add ns1 && ip netns add ns2

Next, create the virtual Ethernet (veth) pairs and assign them to their namespaces:

 root@localhost ~:# ip link add name vport0 type veth peer name vport1
 root@localhost ~:# ip link add name vport2 type veth peer name vport3
 root@localhost ~:# ip link set vport0 up
 root@localhost ~:# ip link set vport2 up
 root@localhost ~:# ip link set vport1 netns ns1
 root@localhost ~:# ip link set vport3 netns ns2

At this point, we have two network namespaces, and veth pairs linking them. Time to start wiring the flows up. First, we'll ensure that the openvswitch module is loaded:

 root@localhost ~:# modprobe openvswitch

Now, create a datapath and add the ports:

 root@localhost ~:# ovs-dpctl add-dp myDP
 root@localhost ~:# ovs-dpctl add-if myDP vport0
 root@localhost ~:# ovs-dpctl add-if myDP vport2
 root@localhost ~:# ovs-dpctl show
     lookups: hit:0 missed:0 lost:0
     flows: 0
     masks: hit:0 total:0 hit/pkt:0.0
     port 0: myDP (internal)
     port 1: vport0
     port 2: vport2

In a second terminal, enter ns2, and start a tcpdump session:

 root@localhost ~:# ip netns exec ns2 /bin/bash -i
 root@localhost ~:# ip link set vport3 up
 root@localhost ~:# tcpdump -i vport3

In a third terminal, enter ns1 and start an arping:

 root@localhost ~:# ip netns exec ns1 /bin/bash -i
 root@localhost ~:# ip link set vport1 up
 root@localhost ~:# arping -I vport1 -D

At this point, no packets will be observed. Let's add our first rule to the datapath - allowing arp to traverse.
In the first terminal, allow arp from ns1 to ns2:

 root@localhost ~:# ovs-dpctl add-flow "in_port(1),eth(),eth_type(0x806),arp()" 2

Now allow arp from ns2 to ns1:

 root@localhost ~:# ovs-dpctl add-flow "in_port(2),eth(),eth_type(0x806),arp()" 1

Note that in the above rules, we can use src= and dst= in the eth() flow key match rules to lock in the exact mac addresses we wish to allow through.
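As a rough sketch of what such a restricted rule might look like, the helper below builds an eth() match from optional source and destination addresses and prints the resulting command instead of running it. The eth_match function is hypothetical, and the mac address is a placeholder; substitute the address of your own interface.

```shell
#!/bin/sh
# eth_match (hypothetical helper): build an eth() rule.
#   $1: source mac address, or empty to leave it unmatched
#   $2: destination mac address, or empty to leave it unmatched
eth_match() {
    spec=""
    if [ -n "${1:-}" ]; then
        spec="src=$1"
    fi
    if [ -n "${2:-}" ]; then
        spec="${spec:+$spec,}dst=$2"
    fi
    printf 'eth(%s)\n' "$spec"
}

# Print (not run) an arp rule limited to one placeholder source address.
printf 'ovs-dpctl add-flow "in_port(1),%s,eth_type(0x806),arp()" 2\n' \
    "$(eth_match 02:00:00:00:00:01)"
```

Only src= is set in this example because arp requests are normally sent to the broadcast address, so pinning dst= to a single host would stop them from matching.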

You should see arp packets start to flow through the bridge. Let's assign some addresses and try to ping.

In ns1:

 root@localhost ~:# ip addr add dev vport1

In ns2:

 root@localhost ~:# ip addr add dev vport3

Now try pinging from ns1:

 root@localhost ~:# ping

Pings will not succeed. That's because although there are rules explaining to the kernel how arp should traverse, there's no rule telling the non-arp packets (such as ipv4 icmp packets) how they should move through the datapath. A packet dump should show the arp messages succeeding, but no ICMP messages flowing properly. No matter, let's start by adding the appropriate rules:

 root@localhost ~:# ovs-dpctl add-flow "in_port(1),eth(),eth_type(0x800),ipv4(proto=1)" 2
 2017-03-20T19:05:26Z|00001|dpif|WARN|system@myDP: failed to put[create] (Invalid argument) in_port(2),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=,dst=,proto=1,tos=0/0,ttl=0/0,error: partial mask not supported for frag (0)), actions:1
ovs-dpctl: updating flow table (Invalid argument)

Oops! This example was meant to show what happens when a required flow key is missing. In this case, we get a somewhat cryptic message indicating that partial mask is not supported. However, dmesg tells us more:

[20991.852062] openvswitch: netlink: Missing key (keys=d8, expected=890)

When we see this message, it's a good bet that a required flow key match attribute is missing. In this case, it's the icmp attribute, so let's add it and try:

 root@localhost ~:# ovs-dpctl add-flow "in_port(1),eth(),eth_type(0x800),ipv4(proto=1),icmp()" 2
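The keys= and expected= values in the dmesg line can be read as bitmasks, one bit per flow key attribute. The sketch below decodes them, assuming the bit positions follow enum ovs_key_attr in the kernel's include/uapi/linux/openvswitch.h (only the first 19 entries are listed, and the numbering should be checked against your kernel's header). The decode_keys and missing_keys functions are hypothetical helpers.

```shell
#!/bin/sh
# Attribute names by bit position; assumed to match enum ovs_key_attr
# in include/uapi/linux/openvswitch.h (bits 0 through 18 only).
KEY_ATTRS="UNSPEC ENCAP PRIORITY IN_PORT ETHERNET VLAN ETHERTYPE IPV4 IPV6 TCP UDP ICMP ICMPV6 ARP ND SKB_MARK TUNNEL SCTP TCP_FLAGS"

# decode_keys (hypothetical): print the names of the bits set in mask $1.
decode_keys() {
    mask=$(( $1 ))
    bit=0
    out=""
    for name in $KEY_ATTRS; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}

# missing_keys (hypothetical): attributes in expected ($2) absent from keys ($1).
missing_keys() {
    decode_keys $(( $2 & ~$1 ))
}

# Values taken from the dmesg line above.
missing_keys 0xd8 0x890
```

With the values from the log, this prints ICMP, which agrees with the attribute that had to be added to the flow key.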

Pings will now traverse into ns2 properly, but ns1 still won't see the responses. Let's add the appropriate return path:

 root@localhost ~:# ovs-dpctl add-flow "in_port(2),eth(),eth_type(0x800),ipv4(proto=1),icmp()" 1

Now ping data should flow between ns1 and ns2 as we've directed - for any mac address and any ip addresses. Each of eth(), ipv4(), ipv6(), even tcp()/udp() allows specifying a src= and dst= to exactly match just those flows you wish to specify.

For one last trick, we will use netcat to establish bidirectional communication. Note that this example doesn't use connection tracking; it is meant to be informative, not to indicate the best way of doing tcp connection filtering.

In ns1:

 root@localhost ~:# nc -lvnp 8080

In the main terminal, let's create the bidirectional tcp rules. We need to match packets travelling from port 2 to port 1 with tcp destination port 8080. The rules cover tcp packets that have the FIN, SYN, RST, or ACK flag set (tcp_flags bits 0x1, 0x2, 0x4, and 0x10, respectively).

 root@localhost ~:# ovs-dpctl add-flow "in_port(2),eth(),eth_type(0x800),ipv4(proto=6),tcp(dst=8080),tcp_flags(0x1/0x1)" 1
 root@localhost ~:# ovs-dpctl add-flow "in_port(2),eth(),eth_type(0x800),ipv4(proto=6),tcp(dst=8080),tcp_flags(0x2/0x2)" 1
 root@localhost ~:# ovs-dpctl add-flow "in_port(2),eth(),eth_type(0x800),ipv4(proto=6),tcp(dst=8080),tcp_flags(0x4/0x4)" 1
 root@localhost ~:# ovs-dpctl add-flow "in_port(2),eth(),eth_type(0x800),ipv4(proto=6),tcp(dst=8080),tcp_flags(0x10/0x10)" 1
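Since the four rules differ only in the flag bit, they can be generated in a loop. The following is a minimal sketch: print_tcp_rules is a hypothetical helper that only prints the commands, so they can be reviewed before being run by hand. The flag values are the standard tcp header bits.

```shell
#!/bin/sh
# print_tcp_rules (hypothetical helper): print one add-flow command per flag.
#   $1: in_port, $2: tcp() match such as "dst=8080", $3: output port
print_tcp_rules() {
    for flag in FIN:0x1 SYN:0x2 RST:0x4 ACK:0x10; do
        bit=${flag#*:}    # value after the colon, e.g. 0x2
        printf 'ovs-dpctl add-flow "in_port(%s),eth(),eth_type(0x800),ipv4(proto=6),tcp(%s),tcp_flags(%s/%s)" %s\n' \
            "$1" "$2" "$bit" "$bit" "$3"
    done
}

# Prints the four rules for traffic from port 2 to tcp port 8080 via port 1.
print_tcp_rules 2 dst=8080 1
```

Each value/mask pair such as 0x2/0x2 matches any packet with that bit set, whatever the other flag bits are.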

Likewise, for the return path, we'll want any packets that are sourced from port 8080. These arrive on port 1 and must be output to port 2:

 root@localhost ~:# ovs-dpctl add-flow "in_port(1),eth(),eth_type(0x800),ipv4(proto=6),tcp(src=8080),tcp_flags(0x1/0x1)" 2
 root@localhost ~:# ovs-dpctl add-flow "in_port(1),eth(),eth_type(0x800),ipv4(proto=6),tcp(src=8080),tcp_flags(0x2/0x2)" 2
 root@localhost ~:# ovs-dpctl add-flow "in_port(1),eth(),eth_type(0x800),ipv4(proto=6),tcp(src=8080),tcp_flags(0x4/0x4)" 2
 root@localhost ~:# ovs-dpctl add-flow "in_port(1),eth(),eth_type(0x800),ipv4(proto=6),tcp(src=8080),tcp_flags(0x10/0x10)" 2

Now, in ns2:

 root@localhost ~:# nc 8080

There are many other combinations of flow keys and actions available. Hopefully, this inspires you to play with the openvswitch engine as a flow-based bridging solution in the kernel.


Last updated: April 5, 2017