This article is about the PMD thread load-based sleeping feature in Open vSwitch with a Data Plane Development Kit data path (OVS-DPDK) v3.2.
After reading this article, you will understand how to enable it, how it operates, and how it can save power with correct system configuration and traffic conditions.
You will also learn about the trade-off between power saving and wakeup latency, and how to choose a pmd-sleep-max value that best suits your use case.
A PMD thread approach to processing packets
PMD stands for Poll Mode Driver. In the context of OVS-DPDK, a PMD thread is a thread that runs 1:1 on a dedicated core to continually poll interfaces for packets.
This polling-based method is traditionally how a lot of DPDK-based applications are designed to operate, as it achieves high throughput and low latency.
However, when there is low or no traffic, the PMD thread might not need to be polling as fast as possible, and if it slowed down, it could still process the traffic on time while also saving power.
A simple approach to reducing work
There are different approaches available for reducing the work that a PMD thread does based on traffic load.
NAPI is a well-known interrupt/polling hybrid, but it requires all of the DPDK NIC drivers to support interrupt mode. It also requires different implementations for devices with different APIs, such as DPDK NICs using the DPDK Ethdev API and vhost devices using the DPDK vhost library API. In addition, with NAPI the interfaces are either polled or in interrupt mode, with no gradual transition between the two. As a result, even a small amount of traffic can be enough to trigger polling as fast as possible when it isn't needed.
A simpler approach was taken that is agnostic to DPDK (or other) interfaces being polled by the PMD thread and does not require support at the device driver level. It also caters to high and low-load traffic with a gradual transition from fast to slower polling.
How simple? Very simple:
- If none of the RxQs being polled by the PMD thread has many packets available (fewer than 16), the PMD thread will sleep for a little bit.
- Still not many packets? Sleep for a little longer next time. Repeat.
- More packets (16+) available now? Stop sleeping and start polling continually again.
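The three steps above can be sketched as a small state update. This is an illustrative model only, not the OVS source: the function names, the `increment_us` step size, and the choice to track the largest batch seen on any RxQ are assumptions made for the example. Only the 16-packet threshold and the cap at pmd-sleep-max come from the description above.

```python
def next_sleep_us(current_us, max_packets_seen, max_sleep_us,
                  threshold=16, increment_us=1):
    """Return the sleep (in us) to request on the next PMD loop iteration.

    max_packets_seen is the largest batch received from any RxQ during the
    iteration that just finished. increment_us is a hypothetical step size.
    """
    # Sleeping disabled, or at least one RxQ was busy: poll continually.
    if max_sleep_us == 0 or max_packets_seen >= threshold:
        return 0
    # Otherwise sleep a little longer than last time, up to the maximum.
    return min(current_us + increment_us, max_sleep_us)


def simulate(batches, max_sleep_us, **kwargs):
    """Apply next_sleep_us over a sequence of per-iteration batch sizes."""
    sleep_us = 0
    history = []
    for packets in batches:
        sleep_us = next_sleep_us(sleep_us, packets, max_sleep_us, **kwargs)
        history.append(sleep_us)
    return history


if __name__ == "__main__":
    # Three quiet iterations, one busy iteration, then quiet again.
    print(simulate([0, 0, 0, 32, 0], max_sleep_us=2))
```

In this sketch the requested sleep grows by one step per quiet iteration until it reaches the maximum, and a single busy iteration resets it to zero.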
Why do we need pmd-sleep-max?
The trade-off for sleeping is that packets on interface receive queues (RxQs) that the PMD thread is polling have to wait until the end of the sleep before they can be processed, so they may see additional latency.
The good news is that the maximum time to sleep can be tuned by using pmd-sleep-max to find the right balance between power saving and packet latency.
Want to save more power and can tolerate increased packet latency on a sudden burst of packets? Sure, set pmd-sleep-max to a larger time.
Want to reduce packet latency on a sudden burst of packets but with less power saving? Sure, set pmd-sleep-max to a shorter time.
For example, to allow a max requested sleep time of 100 us:
$ ovs-vsctl set Open_vSwitch . other_config:pmd-sleep-max=100
To disable PMD sleeping entirely, remove the config or set it to 0:
$ ovs-vsctl set Open_vSwitch . other_config:pmd-sleep-max=0
Road test
Let's see how it behaves with a range of different pmd-sleep-max times and different traffic loads.
We'll use a typical Physical-Virtual-Physical (PVP) test with dpdk-testpmd forwarding packets in the guest, as shown in Figure 1.
We must also ensure that the system BIOS and any tuning software, such as tuned, allow the use of processor low-power C-States.
Power usage
Figure 2 shows the measured power with different traffic rates and different pmd-sleep-max times.
We'll start on the right-hand side with the 14.8 Mpps results (10 Gbps of 64-byte packets in this case). There is no time to sleep, as packets are always waiting to be polled. Regardless of the pmd-sleep-max time, there is no sleep or power saving.
For the 1 Mpps case, we see that there is some power saving when pmd-sleep-max is set, but the saving doesn't change with greater max sleeps. This is because even though a greater max sleep time is allowed, it is never reached: before the sleep can grow that long, enough packets have arrived that the PMD thread stops sleeping to process them.
For the 1 Kpps case, we start to see something interesting. In this case, when allowed, longer sleeps can be reached. Here, the max sleep setting directly affects how much power is saved. The longer the sleeps, the more power is saved.
With the 100 us pmd-sleep-max time, power consumption is reduced by a full 10 watts in this case.
For the 0 pps case, we see similar power saving to the 1 Kpps case. How can this be? The answer is that even though only one of these cases has traffic, most of the time is spent sleeping for the maximum amount of time allowed in both cases. So, both tests show a similar power saving.
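The pattern across the four traffic rates can be approximated with a back-of-the-envelope calculation: how long would a sleep have to be before 16 packets have queued up and polling resumes? The sketch below is a simplification that assumes all traffic arrives at a steady rate on a single RxQ and ignores packet processing time and sleep overshoot. The function names are hypothetical.

```python
def sleep_to_fill_threshold_us(rate_pps, threshold=16):
    """Sleep length (us) after which `threshold` packets have queued up."""
    if rate_pps <= 0:
        return float("inf")  # no traffic: the threshold is never reached
    return threshold * 1_000_000 / rate_pps


def effective_max_sleep_us(rate_pps, pmd_sleep_max_us, threshold=16):
    """Longest sleep reachable before either limit applies."""
    return min(pmd_sleep_max_us,
               sleep_to_fill_threshold_us(rate_pps, threshold))


if __name__ == "__main__":
    for rate in (14_800_000, 1_000_000, 1_000, 0):
        reachable = effective_max_sleep_us(rate, pmd_sleep_max_us=100)
        print(f"{rate:>10} pps: sleep limited to about {reachable:.1f} us")
```

Under these assumptions, at 1 Mpps the sleep is limited by traffic to roughly 16 us whatever pmd-sleep-max is set to, while at 1 Kpps and 0 pps it is limited only by pmd-sleep-max, which is consistent with those two cases showing similar savings.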
Packet latency
Latency is measured by counting packets in different latency bins. Figure 3 shows it only for the 1 Kpps packet rate, as that's the most interesting case.
An important point when considering packet latency is that there is an ingress and egress path through different PMD threads for this PVP test. So the packet latency shown is a round-trip latency, which is a combined latency from the packet's two journeys through OVS-DPDK PMD threads, either or both of which may sleep. Processor C-state wakeup latency may also contribute to packet latency.
It can be seen that as the pmd-sleep-max time is increased, so is the packet latency. It rises from the <10 us latency bin with no sleep allowed (pmd-sleep-max=0) to the 200-300 us latency bin for a pmd-sleep-max of 200 us.
The packet latency measured for any non-zero pmd-sleep-max value is a distribution of values. This is because the PMD threads may be sleeping for a variable amount of time up to the maximum allowed.
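As a rough model of that spread, the wait added by sleeping can be bounded from the requested sleep time alone. The sketch below assumes each packet crosses two PMD threads (as in the PVP test), that each thread is at its maximum requested sleep, and that a packet is equally likely to arrive at any point during a sleep. It ignores C-state wakeup latency and timer overshoot, so real figures can be higher. The function name is hypothetical.

```python
def added_wait_us(pmd_sleep_max_us, pmd_hops=2):
    """Model the extra wait caused by sleeping, in microseconds.

    Returns (best, mean, worst) over `pmd_hops` PMD threads, where the
    mean assumes uniform arrival within each sleep period.
    """
    best = 0                                # packet arrives just as sleep ends
    mean = pmd_hops * pmd_sleep_max_us / 2  # average wait is half a sleep
    worst = pmd_hops * pmd_sleep_max_us     # packet arrives as each sleep starts
    return best, mean, worst


if __name__ == "__main__":
    for max_sleep in (0, 100, 200):
        print(max_sleep, added_wait_us(max_sleep))
```

This only describes the requested sleep component. It is a way to reason about the range of values to expect, not a substitute for measuring latency in your own environment.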
Last notes
The number of PMD thread loop iterations that included a sleep, and the time spent sleeping, are reported with the other PMD thread statistics. (Don't forget to use pmd-stats-clear to reset them.)
$ ovs-appctl dpif-netdev/pmd-perf-show
- sleep iterations: 25249 ( 99.6 % of iterations)
Sleep time (us): 2546186 (101 us/iteration avg.)
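The per-iteration average in that output can be checked by hand, or with a short script when collecting results across several runs. The sketch below parses only the two lines shown above. The field labels are taken from that sample, so the patterns are an assumption about the output format and may need adjusting for other OVS versions.

```python
import re

# The two sleep-related lines from the sample output above.
SAMPLE = """\
- sleep iterations: 25249 ( 99.6 % of iterations)
Sleep time (us): 2546186 (101 us/iteration avg.)
"""


def parse_sleep_stats(text):
    """Return (sleep_iterations, total_sleep_us, average_us_per_iteration)."""
    iterations = re.search(r"sleep iterations:\s*(\d+)", text)
    sleep_time = re.search(r"Sleep time \(us\):\s*(\d+)", text)
    if not iterations or not sleep_time:
        raise ValueError("sleep statistics not found in output")
    count = int(iterations.group(1))
    total_us = int(sleep_time.group(1))
    average_us = total_us / count if count else 0.0
    return count, total_us, average_us


if __name__ == "__main__":
    count, total_us, average_us = parse_sleep_stats(SAMPLE)
    print(f"{count} sleeps, {total_us} us total, {average_us:.1f} us average")
```

For the sample, 2546186 us over 25249 iterations gives roughly 100.8 us per sleeping iteration, which matches the rounded 101 us reported.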
The PMD load-based sleeping feature is a simple feature that can result in power savings for OVS-DPDK under no or low traffic load conditions.
However, you must ensure that you have processor C-States enabled, and your results will vary depending on the processor and other workloads. You will also need to find the value of pmd-sleep-max that best suits your environment, traffic profiles, and packet latency tolerance.