The most common problem when people are trying to deploy an Open vSwitch with Data Plane Development Kit (OvS-DPDK) solution is that the performance is not as expected. For example, they are losing packets. This is where our journey for this series of blogs will start.
This first blog is about Poll Mode Driver (PMD) thread core affinity. It covers how to configure thread affinity and how to verify that it’s set up correctly. This includes making sure no other threads are using the CPU cores.
Dedicate CPU Cores to the PMD Threads
PMD threads are the threads that handle the receiving and processing of packets from the assigned receive queues. They do this in a tight loop, and anything interrupting these threads can cause packets to be dropped. That is why these threads must run on a dedicated CPU core; that is, no other threads in the system should run on this core. This is also true for various Linux kernel tasks.
Let’s assume you would like to use CPU cores 1 and 15 (a single hyper-thread pair) for your PMD threads. This translates into a pmd-cpu-mask of 0x8002.
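The mask is simply a bitmap with one bit set per CPU core. A quick sketch of how 0x8002 is derived, and (commented) how you would apply it; the ovs-vsctl invocation assumes a running Open vSwitch with DPDK support:

```shell
# Each bit in pmd-cpu-mask selects one CPU core: bit 1 for core 1 and
# bit 15 for core 15, so the mask is (1 << 1) | (1 << 15).
printf '0x%x\n' $(( (1 << 1) | (1 << 15) ))   # prints 0x8002

# On a live system you would then apply it with:
#   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x8002
```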
To manually accomplish the isolation you have to do the following:
- Use the Linux kernel command-line option isolcpus to isolate the PMD cores from the general SMP balancing and scheduling algorithms. For the example above, you would use isolcpus=1,15. Please note that the isolcpus= parameter is deprecated in favor of cpusets; for more information, check the kernel documentation.
- Reduce the number of clock tick interrupts with the combined nohz=on nohz_full=1,15 command-line options. This reduces how often the PMD threads get interrupted to service timer interrupts. More details on this subject can be found here: NO_HZ.txt
- For the above to work correctly, you need another command-line option, rcu_nocbs=1,15, or else the kernel will still interrupt the threads; details are in the same document: NO_HZ.txt.
NOTE: For the above kernel options you might need to add additional cores that also need isolation, for example, cores assigned to one or more virtual machines and the cores configured by the dpdk-lcore-mask.
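All three kernel options take the same core list, so it is easy to keep them consistent with a tiny helper. The function below is hypothetical, purely for illustration; it just assembles the argument string that you would append to the kernel command line:

```shell
# build_isolation_args: hypothetical helper that, given a comma-separated
# list of PMD cores, emits the matching kernel command-line options.
build_isolation_args() {
    cores="$1"    # e.g. "1,15"
    echo "isolcpus=${cores} nohz=on nohz_full=${cores} rcu_nocbs=${cores}"
}

build_isolation_args "1,15"
# prints: isolcpus=1,15 nohz=on nohz_full=1,15 rcu_nocbs=1,15
```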
To make all of the above more convenient, you could use a tuned profile called cpu-partitioning. There is a somewhat older blog on tuned that might be helpful. However, in short, this is how you configure it:
# systemctl enable tuned
# systemctl start tuned
# echo isolated_cores=1,15 >> /etc/tuned/cpu-partitioning-variables.conf
# echo no_balance_cores=1,15 >> /etc/tuned/cpu-partitioning-variables.conf
# tuned-adm profile cpu-partitioning
# reboot
NOTE: You still need to set up isolcpus manually as indicated above to achieve zero-loss performance.
Verify CPU Assignments
Command-line Options
First, check the Linux command-line options to see if they are configured as expected:
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-693.17.1.el7.x86_64 \
root=/dev/mapper/rhel_wsfd--netdev64-root ro crashkernel=auto \
rd.lvm.lv=rhel_wsfd-netdev64/root rd.lvm.lv=rhel_wsfd-netdev64/swap \
console=ttyS1,115200 iommu=pt intel_iommu=on \
default_hugepagesz=1G hugepagesz=1G hugepages=32 \
isolcpus=1,2,3,4,5,6,15,16,17,18,19,20 skew_tick=1 \
nohz=on nohz_full=1,2,3,4,5,6,15,16,17,18,19,20 \
rcu_nocbs=1,2,3,4,5,6,15,16,17,18,19,20 \
tuned.non_isolcpus=0fe07f81 intel_pstate=disable nosoftlockup
PMD Thread Affinity
Second, make sure the PMD threads are (or will be) running on the correct CPU cores. In this example, you can see they are assigned to the wrong CPUs, 11 and 27:
# pidstat -t -p `pidof ovs-vswitchd` 1 | grep -E pmd\|%CPU
06:41:21  UID  TGID   TID    %usr  %system  %guest  %CPU    CPU  Command
06:41:22  995  -      1316   100.00  0.00   0.00    100.00  27   |__pmd33
06:41:22  995  -      1317   100.00  0.00   0.00    100.00  11   |__pmd32
06:41:22  UID  TGID   TID    %usr  %system  %guest  %CPU    CPU  Command
06:41:23  995  -      1316   100.00  0.00   0.00    100.00  27   |__pmd33
06:41:23  995  -      1317   100.00  0.00   0.00    100.00  11   |__pmd32
In this case, it’s due to a known bug in tuned that moves away processes running on the isolated cores prior to its initialization. Running systemctl restart openvswitch will solve this specific issue:
# systemctl restart openvswitch
# pidstat -t -p `pidof ovs-vswitchd` 1 | grep -E pmd\|%CPU
06:44:01  UID  TGID   TID    %usr  %system  %guest  %CPU    CPU  Command
06:44:02  995  -      2774   100.00  0.00   0.00    100.00  1    |__pmd32
06:44:02  995  -      2775   100.00  0.00   0.00    100.00  15   |__pmd33
06:44:02  UID  TGID   TID    %usr  %system  %guest  %CPU    CPU  Command
06:44:03  995  -      2774   100.00  0.00   0.00    100.00  1    |__pmd32
06:44:03  995  -      2775   100.00  0.00   0.00    100.00  15   |__pmd33
NOTE: Until the tuned issue is fixed, you always have to restart OVS after a tuned restart to get the correct CPU assignments!
To be 100% sure the Linux kernel is not scheduling your PMD threads on other cores, use the taskset command:
# taskset -pc 2774
pid 2774's current affinity list: 1
# taskset -pc 2775
pid 2775's current affinity list: 15
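Rather than looking up each thread ID by hand, you can filter the PMD threads out of a per-thread process listing. The pmd_tids helper below is hypothetical; it reads "tid comm" lines in the format produced by ps -T -o spid=,comm= and prints only the PMD thread IDs:

```shell
# pmd_tids: hypothetical helper that reads "tid comm" lines on stdin
# (as produced by: ps -T -p <pid> -o spid=,comm=) and prints the thread
# IDs whose command name starts with "pmd".
pmd_tids() {
    awk '$2 ~ /^pmd/ {print $1}'
}

# On a live system you would feed it real ps output, e.g.:
#   ps -T -p "$(pidof ovs-vswitchd)" -o spid=,comm= | pmd_tids | xargs -n1 taskset -pc
printf '2773 ovs-vswitchd\n2774 pmd32\n2775 pmd33\n' | pmd_tids
```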
Other Threads Using the PMD Cores
Finally, make sure that no other threads are scheduled on the PMD cores. The following command will give the CPU affinity for all running userspace threads:
find -L /proc/[0-9]*/exe ! -type l | cut -d / -f3 | \
  xargs -l -i sh -c 'ps -p {} -o comm=; taskset -acp {}'
Below is a partial example output. Here you can see that my bash process is using the PMD-reserved CPU cores:
...
agetty
pid 1443's current affinity list: 0,7-14,21-27
bash
pid 14863's current affinity list: 0-15
systemd
pid 1's current affinity list: 0,7-14,21-27
ovs-vswitchd
pid 3777's current affinity list: 2
pid 3778's current affinity list: 0,7-14,21-27
pid 3780's current affinity list: 2
pid 3781's current affinity list: 16
pid 3782's current affinity list: 2
pid 3785's current affinity list: 2
pid 3786's current affinity list: 2
pid 3815's current affinity list: 1
pid 3816's current affinity list: 15
pid 3817's current affinity list: 2
pid 3818's current affinity list: 2
pid 3819's current affinity list: 2
pid 3820's current affinity list: 2
...
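Scanning that output by eye is error-prone, because taskset prints affinity as a mix of single cores and ranges (for example, 0,7-14,21-27). A small sketch of how you could check an affinity list programmatically; affinity_has_core is a hypothetical helper name:

```shell
# affinity_has_core: hypothetical helper that expands an affinity list as
# printed by taskset (e.g. "0,7-14,21-27") and tests whether it contains
# the given core number. Returns 0 (true) if the core is in the list.
affinity_has_core() {
    list="$1"; core="$2"
    for part in $(echo "$list" | tr ',' ' '); do
        case "$part" in
            *-*)  # a range such as "7-14"
                lo="${part%-*}"; hi="${part#*-}"
                [ "$core" -ge "$lo" ] && [ "$core" -le "$hi" ] && return 0
                ;;
            *)    # a single core
                [ "$part" -eq "$core" ] && return 0
                ;;
        esac
    done
    return 1
}

# The bash process above (affinity "0-15") overlaps PMD core 15:
affinity_has_core "0-15" 15 && echo "core in list"
```

Combined with the find/taskset pipeline above, this lets you flag every process whose affinity overlaps a PMD core instead of reading the list manually.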
NOTE: You could also use a tool called Tuna to list all processes running on a specific core.
If you verify all the above, you will prevent other threads and the kernel from interfering with the PMD threads. Don’t forget to re-check the above if you make any major changes to your environment.
Last updated: March 23, 2022