
The most common problem people run into when deploying an Open vSwitch with Data Plane Development Kit (OvS-DPDK) solution is that performance falls short of expectations; for example, packets are being lost. This is where our journey for this series of blogs will start.

This first blog is about Poll Mode Driver (PMD) thread core affinity. It covers how to configure thread affinity and how to verify that it’s set up correctly. This includes making sure no other threads are using the CPU cores.

Dedicate CPU Cores to the PMD Threads

PMD threads are the threads that handle the receiving and processing of packets from the assigned receive queues. They do this in a tight loop, and anything interrupting these threads can cause packets to be dropped. That is why these threads must run on a dedicated CPU core; that is, no other threads in the system should run on this core. This is also true for various Linux kernel tasks.

Let’s assume you would like to use CPU cores 1 and 15 (a single hyper-thread pair) for your PMD threads. This translates into a pmd-cpu-mask of 0x8002, that is, bit 1 (0x2) and bit 15 (0x8000) set.
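
If you want to double-check the mask arithmetic, a few lines of shell will do it. This is only a sketch: cores_to_mask is a helper name made up for this example, and the ovs-vsctl line in the comment is the usual way to apply the mask, so verify it against the documentation of your OVS version.

```shell
# cores_to_mask CORE...: print the hex bitmask with one bit set per core.
# (Hypothetical helper, not part of OVS or DPDK.)
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

cores_to_mask 1 15    # prints 0x8002

# The mask would then typically be applied with:
#   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x8002
```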

To manually accomplish the isolation you have to do the following:

  • Use the Linux kernel command line option isolcpus to isolate the PMD cores from the general SMP balancing and scheduling algorithms. For the example above, you would use the following: isolcpus=1,15. Please note that the isolcpus= parameter is deprecated in favor of cpusets. For more information check the kernel documentation.
  • Use the combined nohz=on nohz_full=1,15 command-line options to reduce the number of clock tick interrupts. This reduces how often the PMD threads get interrupted to service timer interrupts. More details on this subject can be found here: NO_HZ.txt
  • Add another command-line option, rcu_nocbs=1,15, for the above to work correctly; otherwise the kernel will still interrupt the thread. Details are in the same document: NO_HZ.txt.

NOTE: For the above kernel options you might need to add additional cores that also need isolation. For example, cores assigned to one or more virtual machines and the cores configured by the dpdk-lcore-mask.
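
Because all three options take the same core list, it can help to generate them from a single value so they cannot drift apart. The sketch below does just that; isolation_args is a name invented for this example, and the grubby call in the comment is one common way to apply kernel arguments on RHEL-like systems, shown as an assumption about your setup rather than a required step.

```shell
# isolation_args CORELIST: print the kernel options discussed above,
# all using the same comma-separated core list.
# (Hypothetical helper for illustration.)
isolation_args() {
    printf 'isolcpus=%s nohz=on nohz_full=%s rcu_nocbs=%s\n' "$1" "$1" "$1"
}

isolation_args 1,15
# prints: isolcpus=1,15 nohz=on nohz_full=1,15 rcu_nocbs=1,15

# Assuming a grubby-managed boot loader, this could be applied with:
#   grubby --update-kernel=ALL --args="$(isolation_args 1,15)"
```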

To make all of the above more convenient, you could use a tuned profile called cpu-partitioning. There is a somewhat older blog on tuned that might be helpful. However, in short, this is how you configure it:

# systemctl enable tuned
# systemctl start tuned
# echo isolated_cores=1,15 >> /etc/tuned/cpu-partitioning-variables.conf
# echo no_balance_cores=1,15 >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot

NOTE: You still need to set up isolcpus manually as indicated above to achieve 0-loss performance.

Verify CPU Assignments

Command-line Options

First, check the Linux command-line options to see if they are configured as expected:

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 \
root=/dev/mapper/rhel_wsfd--netdev64-root ro crashkernel=auto \
rd.lvm.lv=rhel_wsfd-netdev64/root rd.lvm.lv=rhel_wsfd-netdev64/swap \
console=ttyS1,115200 iommu=pt intel_iommu=on \
default_hugepagesz=1G hugepagesz=1G hugepages=32 \
isolcpus=1,2,3,4,5,6,15,16,17,18,19,20 skew_tick=1 \
nohz=on nohz_full=1,2,3,4,5,6,15,16,17,18,19,20 \
rcu_nocbs=1,2,3,4,5,6,15,16,17,18,19,20 \
tuned.non_isolcpus=0fe07f81 intel_pstate=disable nosoftlockup
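
Comparing these long core lists by eye is error-prone, so here is a rough sketch of a scripted check. It assumes that isolcpus, nohz_full, and rcu_nocbs are all meant to carry exactly the same plain core list (no extra isolcpus flags); get_param and check_isolation are names made up for this example.

```shell
# get_param NAME CMDLINE: print the value of NAME=... from a kernel
# command line string (empty output if the option is absent).
get_param() {
    printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}

# check_isolation CMDLINE: succeed only if isolcpus, nohz_full and
# rcu_nocbs are all present and list the same cores.
check_isolation() {
    iso=$(get_param isolcpus "$1")
    [ -n "$iso" ] || { echo "isolcpus is missing"; return 1; }
    for p in nohz_full rcu_nocbs; do
        val=$(get_param "$p" "$1")
        [ "$val" = "$iso" ] || {
            echo "$p=$val does not match isolcpus=$iso"
            return 1
        }
    done
    echo "ok: $iso"
}

# Usage on a live system:
#   check_isolation "$(cat /proc/cmdline)"
```

Note that the tuned.non_isolcpus= option is deliberately not mistaken for isolcpus=, because the match is anchored at the start of each word.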

PMD Thread Affinity

Second, make sure the PMD threads are (or will be) running on the correct CPU cores. In this example you can see they are running on the wrong CPUs, 11 and 27:

# pidstat -t -p `pidof ovs-vswitchd` 1 | grep -E pmd\|%CPU
06:41:21      UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
06:41:22      995         -      1316  100.00    0.00    0.00  100.00    27  |__pmd33
06:41:22      995         -      1317  100.00    0.00    0.00  100.00    11  |__pmd32
06:41:22      UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
06:41:23      995         -      1316  100.00    0.00    0.00  100.00    27  |__pmd33
06:41:23      995         -      1317  100.00    0.00    0.00  100.00    11  |__pmd32

In this case, it’s due to a known bug in tuned, which moves processes off the isolated cores if they were running there before tuned initialized. Running systemctl restart openvswitch will solve this specific issue:

# systemctl restart openvswitch
# pidstat -t -p `pidof ovs-vswitchd` 1 | grep -E pmd\|%CPU
06:44:01      UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
06:44:02      995         -      2774  100.00    0.00    0.00  100.00     1  |__pmd32
06:44:02      995         -      2775  100.00    0.00    0.00  100.00    15  |__pmd33
06:44:02      UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
06:44:03      995         -      2774  100.00    0.00    0.00  100.00     1  |__pmd32
06:44:03      995         -      2775  100.00    0.00    0.00  100.00    15  |__pmd33

NOTE: Until the tuned issue is fixed, you always have to restart OVS after a tuned restart to get the correct CPU assignments!
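
Since this check has to be repeated after every tuned restart, it may be worth scripting. The sketch below reads pidstat -t output and reports any PMD thread found on a core outside an allowed list. The function name pmd_on_wrong_core is invented for this example, and it assumes the column layout shown above, with the CPU number second to last and the thread name last.

```shell
# pmd_on_wrong_core "CORES": read `pidstat -t` output on stdin and print
# every pmd thread seen on a CPU that is not in the space-separated list.
# Exits non-zero if any such thread is found. (Hypothetical helper.)
pmd_on_wrong_core() {
    awk -v allowed="$1" '
        BEGIN {
            bad = 0
            n = split(allowed, a, " ")
            for (i = 1; i <= n; i++) ok[a[i]] = 1
        }
        # Skip the "Average:" summary lines, which show "-" as the CPU.
        $1 != "Average:" && /\|__pmd/ {
            cpu = $(NF - 1)
            name = $NF
            sub(/^\|__/, "", name)
            if (!(cpu in ok) && !seen[name, cpu]++) {
                print name " is on CPU " cpu
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Usage on a live system:
#   pidstat -t -p "$(pidof ovs-vswitchd)" 1 1 | pmd_on_wrong_core "1 15"
```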

To be 100% sure the Linux kernel is not scheduling your PMD threads on another core, use the taskset command:

# taskset -pc 2774
pid 2774's current affinity list: 1
# taskset -pc 2775
pid 2775's current affinity list: 15

Other Threads Using the PMD Cores

Finally, make sure that no other threads are scheduled on the PMD cores. The following command will give the CPU affinity for all running userspace threads:

find -L /proc/[0-9]*/exe ! -type l | cut -d / -f3 | \
  xargs -l -i sh -c 'ps -p {} -o comm=; taskset -acp {}'

Below is a partial example output. Here you can see that my bash process is allowed to run on the PMD-reserved CPU cores, because its affinity list (0-15) includes cores 1 and 15:

...
agetty
pid 1443's current affinity list: 0,7-14,21-27
bash
pid 14863's current affinity list: 0-15
systemd
pid 1's current affinity list: 0,7-14,21-27
ovs-vswitchd
pid 3777's current affinity list: 2
pid 3778's current affinity list: 0,7-14,21-27
pid 3780's current affinity list: 2
pid 3781's current affinity list: 16
pid 3782's current affinity list: 2
pid 3785's current affinity list: 2
pid 3786's current affinity list: 2
pid 3815's current affinity list: 1
pid 3816's current affinity list: 15
pid 3817's current affinity list: 2
pid 3818's current affinity list: 2
pid 3819's current affinity list: 2
pid 3820's current affinity list: 2
...

NOTE: You could also use a tool called Tuna to list all processes running on a specific core.
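
On a busy system the affinity listing gets long, so a small filter can help spot the offenders. The following is a sketch under the assumption that the input has the taskset -acp line format shown above; list_has_core and find_intruders are names made up for this example. Keep in mind that the PMD threads themselves will also be reported, and ideally they are the only lines printed.

```shell
# list_has_core LIST CORE: succeed if CORE is covered by an affinity
# list such as "0,7-14,21-27". (Hypothetical helper.)
list_has_core() {
    for range in $(printf '%s' "$1" | tr ',' ' '); do
        lo=${range%-*}    # "7-14" -> 7, "0" -> 0
        hi=${range#*-}    # "7-14" -> 14, "0" -> 0
        if [ "$2" -ge "$lo" ] && [ "$2" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# find_intruders "CORES": read taskset -acp output on stdin and print
# every line whose affinity list includes one of the given cores.
find_intruders() {
    while IFS= read -r line; do
        case "$line" in
            pid*"affinity list: "*) ;;
            *) continue ;;    # skip process names and other lines
        esac
        list=${line##*: }
        for core in $1; do
            if list_has_core "$list" "$core"; then
                printf '%s\n' "$line"
                break
            fi
        done
    done
}

# Usage: pipe the output of the find/xargs command above into
#   find_intruders "1 15"
```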

Once you have verified all of the above, you can be confident that neither other threads nor the kernel will interfere with the PMD threads. Don’t forget to re-check the above if you make any major changes to your environment.

 

Last updated: March 23, 2022