
A new feature in version 2.16 of Open vSwitch (OVS) helps developers scale the OVS-DPDK userspace datapath to use multiple cores. The Data Plane Development Kit (DPDK) is a popular set of networking libraries and drivers that provide fast packet processing and I/O.

After reading this article, you will understand the new group assignment type for spreading the datapath workload across multiple cores, how this type differs from the default cycles assignment type, and how to use the new type in conjunction with the auto load balance feature in the poll mode driver (PMD) to improve OVS-DPDK scaling.

How PMD threads manage packets in the userspace datapath

In the OVS-DPDK userspace datapath, receive queues (RxQs) store packets from an interface that need to be received, processed, and usually transmitted to another interface. This work is done in OVS-DPDK by PMD threads that run on a set of dedicated cores and that continually poll the RxQs for packets. In OVS-DPDK, these datapath cores are commonly referred to just as PMDs.
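The cores dedicated to PMD threads are chosen by the user. As a minimal sketch, assuming cores 8 and 10 are the ones to dedicate (the same core IDs used in the example later in this article), the pmd-cpu-mask bitmask would be set as follows:

$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x500

The value 0x500 has bits 8 and 10 set, so OVS-DPDK creates one PMD thread on each of those cores.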

When there's more than one PMD, the workload should ideally be spread equally across them all. This prevents packet loss in cases where some PMDs may be overloaded while others have no work to do. In order to spread the workload across the PMDs, the interface RxQs that provide the packets need to be carefully assigned to the PMDs.

The user can manually pin individual RxQs to PMDs with the other_config:pmd-rxq-affinity option; any RxQs that are not pinned are assigned automatically by OVS-DPDK. In this article, we focus on OVS-DPDK's process for automatically assigning RxQs to PMDs.
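Pinning is configured per interface. In the following command the interface name, queue IDs, and core IDs are purely illustrative:

$ ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:8,1:10"

This pins RxQ 0 of dpdk0 to the PMD on core 8 and RxQ 1 to the PMD on core 10. By default, a PMD with pinned RxQs is marked as isolated and is then excluded from the automatic assignment discussed below.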

OVS-DPDK automatic assignment

RxQs can be automatically assigned to PMDs when there is a reconfiguration, such as the addition or removal of either RxQs or PMDs. Automatic assignment also occurs if triggered by the PMD auto load balance feature or the ovs-appctl dpif-netdev/pmd-rxq-rebalance command.
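The last of these is a one-shot manual trigger that takes no arguments and forces an immediate RxQ-to-PMD reassignment using the configured assignment type:

$ ovs-appctl dpif-netdev/pmd-rxq-rebalance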

The default cycles assignment type assigns the RxQs requiring the most processing cycles to different PMDs. However, the assignment also places the same or similar number of RxQs on each PMD.

The cycles assignment type is a trade-off between optimizing for the current workload and having the RxQs spread out across PMDs to mitigate against workload changes. The default type is designed this way because, when it was introduced in OVS 2.9, there was no PMD auto load balance feature to deal with workload changes.

The role of PMD auto load balance

PMD auto load balance is an OVS-DPDK feature that dynamically detects an imbalance in how the workload is spread across PMDs. If PMD auto load balance estimates that the workload can and should be spread more evenly, it triggers an RxQ-to-PMD reassignment. The reassignment, and the ability to rebalance the workload evenly among PMDs, depends on the RxQ-to-PMD assignment type.

PMD auto load balance is discussed in more detail in another article.
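For quick reference, PMD auto load balance is disabled by default and is enabled through other_config. The interval setting shown here is optional, and the option names and defaults should be checked against the documentation for your OVS version:

$ ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb="true"
$ ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-rebal-interval="5"

The second command limits automatic reassignments to at most one every 5 minutes.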

The group RxQ-to-PMD assignment type

In OVS 2.16, the cycles assignment type is still the default, but a more optimized group assignment type was added.

The main difference between these assignment types is that the group assignment type removes the trade-off of keeping similar numbers of RxQs on each PMD. Instead, it spreads the workload purely based on finding the best current balance of the workload across PMDs. This improved optimization is feasible now because the PMD auto load balance feature is available to deal with possible workload changes.

The group assignment type also scales better, because it recomputes the estimated workload on each PMD before every RxQ assignment.

The increased optimization can mean a more equally distributed workload and hence more equally distributed available capacity across the PMDs. This improvement, along with PMD auto load balance, can mitigate against changes in workload caused by changes in traffic profiles.

An RxQ-to-PMD assignment comparison

We can see some of the differing characteristics of cycles and group with an example. If we run OVS 2.16 with a few RxQs and a couple of PMDs, we can check the log messages to confirm that the default cycles assignment type is used for assigning RxQs to PMDs:

|dpif_netdev|INFO|Performing pmd to rx queue assignment using cycles algorithm.

Then we can take a look at the current RxQ-to-PMD assignments and the RxQ workload usage:

$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 8:
  isolated : false
  port: vhost1           queue-id:  0 (enabled)   pmd usage: 20 %
  port: dpdk0            queue-id:  0 (enabled)   pmd usage: 70 %
  overhead:  0 %
pmd thread numa_id 0 core_id 10:
  isolated : false
  port: vhost0           queue-id:  0 (enabled)   pmd usage: 20 %
  port: dpdk1            queue-id:  0 (enabled)   pmd usage: 30 %
  overhead:  0 %

The workload is visualized in Figure 1.

Figure 1. cycles RxQ-to-PMD assignment.

The display shows that the cycles assignment type has done a good job keeping the two RxQs that require the most cycles (dpdk0 70% and dpdk1 30%) on different PMDs. Otherwise, one PMD would be at 100% and Rx packets might be dropped as a result.

The display also shows that the assignment insists on both PMDs having an equal number of RxQs, two each. This means that PMD 8 is 90% loaded while PMD 10 is 50% loaded.

That is not a problem with the current traffic profile, because both PMDs have enough processing cycles to handle the load. However, it does mean that PMD 8 has available capacity of only 10% to account for any traffic profile changes that require more processing. If, for example, the dpdk0 traffic profile changed and the required workload increased by more than 10%, PMD 8 would be overloaded and packets would be dropped.

Now we can look at how the group assignment type optimizes for this kind of scenario. First we enable the group assignment type:

$ ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group

The logs confirm that it is selected and immediately put to use:

|dpif_netdev|INFO|Rxq to PMD assignment mode changed to: `group`.

As mentioned earlier, the group assignment type eliminates the requirement of keeping the same number of RxQs per PMD, and bases its assignments on estimates of the least loaded PMD before every RxQ assignment. We can see how this policy affects the assignments:

$ ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 8:
  isolated : false
  port: dpdk0            queue-id:  0 (enabled)   pmd usage: 70 %
  overhead:  0 %
pmd thread numa_id 0 core_id 10:
  isolated : false
  port: vhost0           queue-id:  0 (enabled)   pmd usage: 20 %
  port: dpdk1            queue-id:  0 (enabled)   pmd usage: 30 %
  port: vhost1           queue-id:  0 (enabled)   pmd usage: 20 %
  overhead:  0 %

The workload is visualized in Figure 2.

Figure 2. group RxQ-to-PMD assignment type.

Now PMD 8 and PMD 10 both have total loads of 70%, so the workload is better balanced between the PMDs.

In this case, if the dpdk0 traffic profile changes and the required workload increases by 10%, it could be handled by PMD 8 without any packet drops because there is 30% available capacity.

An interesting case is where RxQs are new or have no measured workload. If they were all put on the least loaded PMD, that PMD's estimated workload would not change, so it would keep being selected as the least loaded PMD and be assigned all the new RxQs. This might not be ideal if those RxQs later became active, so instead the group assignment type spreads RxQs with no measured history across the PMDs.

This example shows a change from the cycles to the group assignment type during operation. Although that can be done, the assignment type is typically set when initializing OVS-DPDK. Reassignments can then be triggered by any of the following mechanisms:

  • PMD auto load balance (provided that user-defined thresholds are met; see the example after this list)
  • A change in configuration (adding or removing RxQs or PMDs)
  • The ovs-appctl dpif-netdev/pmd-rxq-rebalance command
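For the first mechanism, the thresholds can be tuned through other_config. The options below exist in OVS 2.16 at the time of writing; the values are only illustrative and the names should be verified against the documentation for your release:

$ ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-load-threshold="70"
$ ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-improvement-threshold="25"

With these values, an automatic reassignment is only considered when a PMD exceeds 70% load, and only carried out when the estimated improvement in balance is at least 25%.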

Other RxQ considerations

All of the OVS-DPDK assignment types are constrained by the granularity of the workload on each RxQ. In the example in the previous section, it was possible to spread the workload evenly. In a case where dpdk0 was 95% loaded instead, PMD 8 would have a 95% load, while PMD 10 would have a 70% load.

If you expect an interface to have a high traffic rate and hence a high required load, it is worth considering the addition of more RxQs in order to help split the traffic for that interface. More RxQs mean a greater granularity to help OVS-DPDK spread the workload more evenly across the PMDs.
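For example, the number of RxQs on a physical DPDK port can be increased with the n_rxq option (the interface name and queue count here are illustrative; for vhost-user ports the queue count is set by the guest instead):

$ ovs-vsctl set Interface dpdk0 options:n_rxq=2

Because adding RxQs is a reconfiguration, the RxQs are automatically reassigned across the PMDs afterward.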

Conclusion

This article looked at the new group assignment type from OVS 2.16 for RxQ-to-PMD assignments.

Although the existing cycles assignment type might be good enough in many cases, the new group assignment type allows OVS-DPDK to more evenly distribute the workload across the available PMDs.

This dynamic assignment has the benefit of allowing more optimal use of PMDs and providing a more equally distributed available capacity across PMDs, which in turn can make them more resilient against workload changes. For larger changes in workload, the PMD auto load balance feature can trigger reassignments.

OVS 2.16 still has the same defaults as OVS 2.15, so users for whom OVS 2.15 multicore scaling is good enough will see the same behavior by default after an upgrade. However, the new option is available if required.

Further information about OVS-DPDK PMDs can be found in the documentation.
