Open vSwitch (OVS) is an open source framework for software-defined networking (SDN) that is especially useful in virtualized environments. Just like a conventional network stack, OVS can offload tasks to hardware on the network interface card (NIC) to speed up the processing of network packets.
However, dozens of functions are invoked in a chain to achieve hardware offload. This article takes you through the chain of functions to help you debug networking problems with OVS.
This article assumes that you understand the basics of OVS and hardware offload. You should also be familiar with networking commands, particularly Linux's tc (traffic control) command, which lets you dump traffic flows and see whether they are offloaded.
For the flow illustrated in this article, I used a Mellanox NIC.
The start of a network transmission
Let's start our long journey from an OVS add/modify operation down to the hardware driver with the first few functions called. Figure 1 shows each function at the beginning of the process, along with the file in OVS that defines it.
There are two ways that flows can be installed in the datapath. One is dpctl_add_flow, which can be used to manually inject a flow into the datapath, as shown in Figure 1. But ovs-dpctl is not a common way of injecting datapath flows. Typically, a handler thread in OVS receives an upcall from dpif, processes it, and installs the flow via dpif_operate(), as shown in Figure 2.
OVS offload operations
The dpif_netlink_operate function is registered to the function pointer dpif->dpif_class->operate. Calling the function leads to the call stack in Figure 3.
OVS's /lib/netdev-offload.c file defines a netdev_register_flow_api_provider function. The chain of calls continues through a function pointer registered as follows:
netdev_register_flow_api_provider(&netdev_offload_tc);
The netdev_tc_flow_put function is assigned to the .flow_put struct member, as shown in the following excerpt:
const struct netdev_flow_api netdev_offload_tc = {
    .type = "linux_tc",
    …
    .flow_put = netdev_tc_flow_put,
    …
};
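OVS reaches that .flow_put member through a thin dispatch wrapper in lib/netdev-offload.c. The following is a paraphrased sketch of that dispatch; the argument list is trimmed and varies between OVS releases:

/* Paraphrased sketch of the dispatch in OVS's lib/netdev-offload.c: look up
 * the flow API provider attached to this netdev and call its flow_put hook.
 * Arguments are trimmed and vary between OVS releases. */
int
netdev_flow_put(struct netdev *netdev, struct match *match,
                struct nlattr *actions, size_t act_len,
                const ovs_u128 *ufid, struct offload_info *info,
                struct dpif_flow_stats *stats)
{
    const struct netdev_flow_api *flow_api =
        ovsrcu_get(const struct netdev_flow_api *, &netdev->flow_api);

    /* With the "linux_tc" provider, flow_put points at netdev_tc_flow_put. */
    return (flow_api && flow_api->flow_put)
           ? flow_api->flow_put(netdev, match, actions, act_len,
                                ufid, info, stats)
           : EOPNOTSUPP;
}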
After the call reaches netdev_tc_flow_put, the chain of calls continues as shown in Figure 4.
Sequence from a tc command
Let's set aside our pursuit of offloading through the chain of OVS calls for a moment and look at a more conventional sequence. Without OVS in the picture, a call from the tc utility proceeds as shown in Figure 5.
Whether sendmsg is issued from tc, from OVS, or from another sender, the message goes to the kernel and then to the hardware driver.
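To make that entry point concrete, here is a minimal, self-contained sketch of the kind of RTM_NEWTFILTER netlink request that tc or OVS ultimately hands to sendmsg. It is illustrative only: the interface name eth0 is an assumption, error handling is omitted, and a real request also carries TCA_KIND ("flower") and TCA_OPTIONS attributes describing the match and actions.

/* Minimal sketch: build a bare RTM_NEWTFILTER netlink request (no match or
 * action attributes) and hand it to sendmsg(), the same system call tc and
 * OVS use. "eth0" is an assumed interface name; error handling is omitted. */
#include <linux/netlink.h>
#include <linux/pkt_sched.h>
#include <linux/rtnetlink.h>
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    struct {
        struct nlmsghdr nlh;
        struct tcmsg    tcm;
    } req;
    memset(&req, 0, sizeof(req));

    req.nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct tcmsg));
    req.nlh.nlmsg_type  = RTM_NEWTFILTER;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;

    req.tcm.tcm_family  = AF_UNSPEC;
    req.tcm.tcm_ifindex = if_nametoindex("eth0");  /* assumed device */
    req.tcm.tcm_parent  = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);

    /* A real request appends TCA_KIND ("flower") and TCA_OPTIONS with the
     * match and actions; without them the kernel rejects the filter. */

    struct sockaddr_nl sa  = { .nl_family = AF_NETLINK };
    struct iovec       iov = { .iov_base = &req, .iov_len = req.nlh.nlmsg_len };
    struct msghdr      msg = { .msg_name = &sa, .msg_namelen = sizeof(sa),
                               .msg_iov = &iov, .msg_iovlen = 1 };

    sendmsg(fd, &msg, 0);  /* the sendmsg the next section picks up from */
    close(fd);
    return 0;
}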
The call to sendmsg
Now let's continue from where we paused earlier, at sendmsg. The chain of functions continues as shown in Figure 6.
The Linux kernel registers the following functions with the Routing Netlink (rtnl) subsystem:
tc_new_tfilter
tc_del_tfilter
tc_get_tfilter
tc_ctl_chain
These functions are registered by calling rtnl_register in the net/sched/cls_api.c file. The RTM_NEWCHAIN, RTM_GETCHAIN, and RTM_DELCHAIN operations all take place in tc_ctl_chain. In turn, rtnl_register invokes rtnl_register_internal, defined in net/core/rtnetlink.c.
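Abridged from tc_filter_init() in net/sched/cls_api.c, the registrations look roughly like this (the dumpit callbacks and flag arguments differ slightly across kernel versions):

/* Abridged from tc_filter_init() in net/sched/cls_api.c; dumpit callbacks
 * and flags vary across kernel versions. */
rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter, tc_dump_tfilter, 0);
rtnl_register(PF_UNSPEC, RTM_NEWCHAIN, tc_ctl_chain, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_DELCHAIN, tc_ctl_chain, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_GETCHAIN, tc_ctl_chain, tc_dump_chain, 0);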
The sequence continues based on the functions registered with the rtnl subsystem. tc_new_tfilter, defined in net/sched/cls_api.c, invokes the function pointer registered to tp->ops->change and ends up calling fl_change from the net/sched/cls_flower.c file.
fl_change checks whether the skip_hw or skip_sw policy is present. If the tc-policy is skip_hw, the flow is just added to tc and the function returns.
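In condensed form, that policy handling inside fl_change looks roughly like the following. This is a simplified sketch, not the literal kernel code:

/* Simplified sketch of the skip_sw / skip_hw handling in fl_change()
 * (net/sched/cls_flower.c); not the literal kernel code. */
if (!tc_skip_sw(fnew->flags)) {
        /* tc-policy is unset or skip_hw: insert the filter into the
         * flower software tables so the kernel datapath can match it. */
        err = fl_ht_insert_unique(fnew, fold, &in_ht);
        if (err)
                goto errout;
}

if (!tc_skip_hw(fnew->flags)) {
        /* tc-policy is unset or skip_sw: also try to program the NIC. */
        err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
        if (err)
                goto errout;
}

With skip_hw, only the first branch runs, which matches the behavior described above; the rest of this article follows the hardware branch.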
Figure 7 takes a deeper look into the fl_change function. The function has changed somewhat in the latest kernel versions, but the control flow is pretty much the same as the one shown in the figure.
If tc-policy is unset or skip_sw, the call sequence tries to add the flow to the hardware. Because we are interested in flows that get offloaded to hardware, we continue our journey with the following sequence of calls:
fl_hw_replace_filter (cls_flower.c) --> tc_setup_cb_add (cls_api.c) --> __tc_setup_cb_call (cls_api.c)
Finally, in the device driver
From here, the sequence goes to the hardware driver, which registered a traffic control handler when the device was initialized. For instance, the following code defines our Mellanox driver as the recipient of the message:
.ndo_setup_tc = mlx5e_setup_tc,
The mlx5e_setup_tc function issues the following call to register a flow block callback (CB):
flow_block_cb_setup_simple(type_data, &mlx5e_block_cb_list, mlx5e_setup_tc_block_cb, priv, priv, true);
In our case, the Mellanox driver's callback, mlx5e_setup_tc_block_cb, gets called.
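Putting those two pieces together, the driver-side entry point looks roughly like this (abridged from the mlx5 driver; details vary by kernel version):

/* Abridged sketch from the mlx5 driver: the ndo_setup_tc hook dispatches
 * block setup requests to flow_block_cb_setup_simple(), which registers
 * mlx5e_setup_tc_block_cb as the callback for this device. */
static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
                          void *type_data)
{
        struct mlx5e_priv *priv = netdev_priv(dev);

        switch (type) {
        case TC_SETUP_BLOCK:
                return flow_block_cb_setup_simple(type_data,
                                                  &mlx5e_block_cb_list,
                                                  mlx5e_setup_tc_block_cb,
                                                  priv, priv, true);
        default:
                return -EOPNOTSUPP;
        }
}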
So now we have reached the Mellanox driver code. A few more calls and we can see how the flow rule is added to the flow table for hardware offload (Figure 8).
The drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c file registers the following function, and the sequence continues as shown in Figure 9:
.create_fte = mlx5_cmd_create_fte,
The final function in Figure 9 invokes a command that adds the flow rule to the hardware. With this result, we have reached our destination.
Conclusion
I hope this article helps you understand what happens when a flow is added for hardware offload, and helps you troubleshoot problems you might encounter. To learn more about the basics of Open vSwitch hardware offload, I recommend reading Haresh Khandelwal's blog post on the subject.