This article covers the IOThread Virtqueue Mapping feature for Kernel-based Virtual Machine (KVM) guests, which was introduced in Red Hat Enterprise Linux (RHEL) 9.4.
The problem
Modern storage evolved to keep pace with growing numbers of CPUs by providing multiple queues through which I/O requests can be submitted. This allows CPUs to submit I/O requests and handle completion interrupts locally. The result is good performance and scalability on machines with many CPUs.
Although virtio-blk devices in KVM guests have multiple queues by default, they do not take advantage of multi-queue on the host. I/O requests from all queues are processed in a single thread on the host for guests with the <driver io='native' …> libvirt domain XML setting. This single thread can become a bottleneck for I/O bound workloads.
KVM guests can now benefit from multiple host threads for a single device through the new IOThread Virtqueue Mapping feature. This improves I/O performance for workloads where the single thread is a bottleneck. Guests with many vCPUs should use this feature to take advantage of additional capacity provided by having multiple threads.
If you are interested in the QEMU internals involved in developing this feature, you can find out more in this blog post and this KVM Forum presentation. Making QEMU’s block layer thread safe was a massive undertaking that we are proud to have contributed upstream.
How IOThread Virtqueue Mapping works
IOThread Virtqueue Mapping lets users assign individual virtqueues to host threads, called IOThreads, so that a virtio-blk device is handled by more than one thread. Each virtqueue can be assigned to one IOThread.
Most users will opt for round-robin assignment so that virtqueues are automatically spread across a set of IOThreads. Figure 1 illustrates how 4 queues are assigned in round-robin fashion across 2 IOThreads.
The libvirt domain XML for this configuration looks like this:
<domain>
  …
  <vcpu>4</vcpu>
  <iothreads>2</iothreads>
  …
  <devices>
    <disk …>
      <driver name='qemu' cache='none' io='native' …>
        <iothreads>
          <iothread id='1'></iothread>
          <iothread id='2'></iothread>
        </iothreads>
      </driver>
      …
    </disk>
    …
  </devices>
</domain>
More details on the syntax can be found in the libvirt documentation.
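Round-robin is not the only option, since each virtqueue can be assigned to a specific IOThread. As a rough sketch, and assuming your libvirt version supports the <queue> sub-element and the queues attribute described in the libvirt documentation, an explicit assignment that spreads 4 queues over 2 IOThreads in the same alternating pattern might look like the fragment below. The queue count and the queue-to-IOThread pairing are illustrative choices, not values taken from the benchmark configuration.

```xml
<disk …>
  <!-- Assumption: queues='4' is set explicitly so that queue ids 0-3 exist -->
  <driver name='qemu' cache='none' io='native' queues='4'>
    <iothreads>
      <!-- IOThread 1 handles virtqueues 0 and 2 -->
      <iothread id='1'>
        <queue id='0'/>
        <queue id='2'/>
      </iothread>
      <!-- IOThread 2 handles virtqueues 1 and 3 -->
      <iothread id='2'>
        <queue id='1'/>
        <queue id='3'/>
      </iothread>
    </iothreads>
  </driver>
  …
</disk>
```

For most guests the shorter round-robin form is likely to be enough. An explicit mapping is mainly useful when you want to control exactly which queues share a thread.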
Configuration tips
The following recommendations are based on our experience developing and benchmarking this feature:
Use 4-8 IOThreads. Usually this is sufficient to saturate disks. Adding more threads beyond the point of saturation does not increase performance and may harm it.
Share IOThreads between devices unless you know in advance that certain devices are heavily utilized. Keeping a few IOThreads busy but not too busy is ideal.
Pin IOThreads away from vCPUs with <iothreadpin> and <vcpupin> if you have host CPUs to spare. IOThreads need to respond quickly when the guest submits I/O. Therefore they should not compete for CPU time with the guest’s vCPU threads.
Use <driver io='native' cache='none' …>. IOThread Virtqueue Mapping was designed for io='native'. Using io='threads' is not recommended as it does not combine with IOThread Virtqueue Mapping in a useful way.
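To make the pinning tip above concrete, here is a minimal sketch of a <cputune> section for a guest with 4 vCPUs and 2 IOThreads. The host CPU numbers are hypothetical: it assumes host CPUs 0-3 are reserved for vCPUs and host CPUs 4-5 are free for IOThreads. Adjust the cpuset values to your host's topology.

```xml
<domain>
  …
  <vcpu>4</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <!-- Assumption: host CPUs 0-3 run the guest's vCPU threads -->
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <!-- Assumption: host CPUs 4-5 are spare and dedicated to the IOThreads -->
    <iothreadpin iothread='1' cpuset='4'/>
    <iothreadpin iothread='2' cpuset='5'/>
  </cputune>
  …
</domain>
```

Keeping the two cpuset ranges disjoint is what prevents IOThreads from competing with vCPU threads for CPU time.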
Performance
The following random read disk I/O benchmark compares IOThread Virtqueue Mapping with 2 and 4 IOThreads against a guest without IOThread Virtqueue Mapping (only 1 IOThread). The guest was configured with 8 vCPUs all submitting I/O in parallel. See Figure 2.
The most important fio benchmark options are shown here:
fio --ioengine=libaio --rw=randread --bs=4k --numjobs=8 --direct=1 \
    --cpus_allowed=0-7 --cpus_allowed_policy=split
This microbenchmark shows that when 1 IOThread is unable to saturate a disk, adding more IOThreads with IOThread Virtqueue Mapping yields a significant improvement. Virtqueues were assigned round-robin to the IOThreads. The disk was an Intel Optane SSD DC P4800X and the guest was running Fedora 39 x86_64. The libvirt domain XML, fio options, benchmark output, and an Ansible playbook are available here.
Real workloads may benefit less depending on how I/O bound they are and whether they submit I/O from multiple vCPUs. We recommend benchmarking your workloads to understand the effect of IOThread Virtqueue Mapping.
A companion article explores database performance with IOThread Virtqueue Mapping.
Conclusion
The new IOThread Virtqueue Mapping feature in RHEL 9.4 improves scalability of disk I/O for guests with many vCPUs. Enabling this feature on your KVM guests with virtio-blk devices can boost performance of I/O bound workloads.
Last updated: September 11, 2024