Introduction
This article is about debugging out-of-memory issues in Open vSwitch with the Data Plane Development Kit (OvS-DPDK). It explains the situations in which you can run out of memory when using OvS-DPDK and shows the log entries that are produced in those circumstances. It also shows some other log entries and commands for further debugging.
When you finish reading this article, you will be able to identify that you have an out-of-memory issue and you'll know how to fix it. Spoiler: usually adding some more memory on the relevant NUMA node works. This article is based on OvS 2.9.
Background
As is normal with DPDK-type applications, it is expected that hugepage memory has been set up and mounted. For further information, see the documentation on setting up huge pages.
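As a rough sketch (the page count and mount point below are examples for 1 GB pages, not requirements), huge pages are typically reserved via kernel boot parameters such as default_hugepagesz=1G hugepagesz=1G hugepages=8, after which the hugetlbfs mount looks something like this:
# mount -t hugetlbfs none /dev/hugepages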
The next step is to specify the amount of memory pre-allocated for OvS-DPDK. This is done through the Open vSwitch Database (OVSDB). In the example below, 4 GB of huge-page memory is pre-allocated on each of NUMA node 0 and NUMA node 1.
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,4096
If dpdk-socket-mem is not specified, the default is 1 GB for NUMA node 0.
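If you want to double-check the value actually stored in OVSDB, a quick way (using the same key as above) is:
# ovs-vsctl get Open_vSwitch . other_config:dpdk-socket-mem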
Now, let's look at the times when we can run out of memory.
Initialization
You can run out of memory when DPDK is initialized, which happens when ovs-vswitchd is running and the OVSDB entry dpdk-init is set to true.
A useful log entry to watch for during initialization is this:
|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --socket-mem 4096,4096
This will confirm that the dpdk-socket-mem you thought you were setting was actually set and passed to DPDK (thus avoiding the embarrassment of someone else pointing out that your scripts were wrong).
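Assuming the default log location (the path may differ on your system), one quick way to check is:
# grep "EAL ARGS" /var/log/openvswitch/ovs-vswitchd.log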
The most likely way to run out of memory during initialization is that huge page memory was not set up correctly:
|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --socket-mem 4096,4096
|dpdk|INFO|EAL: 32 hugepages of size 1073741824 reserved, but no mounted hugetlbfs found for that size
Another way is that you are requesting too much memory:
|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --socket-mem 32768,0
|dpdk|ERR|EAL: Not enough memory available on socket 0! Requested: 32768MB, available: 16384MB
Or you request none at all:
|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --socket-mem 0,0
|dpdk|ERR|EAL: invalid parameters for --socket-mem
All of these issues can be fixed by setting up huge pages correctly and requesting an appropriate amount of memory to pre-allocate.
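To verify the huge-page setup, the standard kernel interfaces are a good starting point; for example, for 1 GB pages:
# grep Huge /proc/meminfo
# cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages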
Adding a Port or Changing the MTU
These situations are grouped together because they can both result in a new pool of buffers being requested for a port. Where possible, these pools of buffers will be shared and reused, but that is not always possible due to differing port NUMA nodes or MTUs.
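For context, both of the following operations can trigger such a request; the bridge name, port name, and PCI address here are just examples:
# ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:01:00.0
# ovs-vsctl set Interface dpdk0 mtu_request=9000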
For new requests, the size of each buffer is fixed (based on MTU), but the number of buffers is variable; OvS-DPDK will retry with a lower number of buffers if there is not enough memory for the initial request.
When DPDK cannot provide the requested memory to any one of the requests, it reports the following:
|dpdk|ERR|RING: Cannot reserve memory
While that may look serious, it's nothing to worry about, because OvS handles it and simply retries with a lower amount. If, however, the retries do not work, the following will appear in the log:
|netdev_dpdk|ERR|Failed to create memory pool for netdev dpdk0, with MTU 9000 on socket 0: Cannot allocate memory
This case does affect functionality:
- If you were adding a port, it will not be usable.
- If you were changing the MTU, the MTU change fails but the port will continue to operate with the previous MTU.
How can you fix these errors? The general guidance is to give OvS-DPDK more memory on the relevant NUMA node, or to stick with a lower MTU.
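For example (the values are illustrative, and note that dpdk-socket-mem changes only take effect after ovs-vswitchd is restarted), you could either pre-allocate more memory on the affected NUMA node or request a lower MTU:
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,8192
# ovs-vsctl set Interface dpdk0 mtu_request=1500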
Starting a VM
It may not seem obvious why you would run out of memory when starting a VM, as opposed to when adding a vhost port for it (previous section). The key is vhost NUMA reallocation.
When a VM is started, DPDK checks the NUMA node of the memory shared from the guest. This may result in a request for a new pool of buffers on that NUMA node. Of course, there might be no memory pre-allocated with dpdk-socket-mem on that NUMA node, or there might be insufficient memory left.
The log entry would be similar to the add port/change MTU cases:
|netdev_dpdk|ERR|Failed to create memory pool for netdev vhost0, with MTU 1500 on socket 1: Cannot allocate memory
The fix for this is to have enough memory available on the relevant NUMA node, or to change the libvirt/QEMU settings so the VM memory comes from a different NUMA node.
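Taking the log entry above as an example (socket 1), a minimal sketch of the first option is to pre-allocate memory on NUMA node 1 as well; the values are illustrative, and the change only takes effect after ovs-vswitchd is restarted:
# ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=4096,4096
On the libvirt side, pinning guest memory to a particular NUMA node (for example, with a <numatune> element in the domain XML) covers the second option.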
Runtime, Adding a Port, or Adding Queues
Didn't we already cover adding a port? Yes, we did; however, this section covers the case where the requested pool of buffers is granted, but some time later proves to be insufficient.
This can happen when many ports and queues share a pool of buffers: by the time some buffers are reserved for Rx queues, some are in flight being processed, and some are waiting to be returned from Tx queues, there just aren't enough buffers to go around.
For example, the log entries when this occurs while using a physical NIC could look like this:
|dpdk|ERR|PMD: ixgbe_alloc_rx_queue_mbufs(): RX mbuf alloc failed queue_id=0
|dpdk|ERR|PMD: ixgbe_dev_rx_queue_start(): Could not alloc mbuf for queue:0
|dpdk|ERR|PMD: ixgbe_dev_start(): Unable to start rxtx queues
|dpdk|ERR|PMD: ixgbe_dev_start(): failure in ixgbe_dev_start(): -1
|netdev_dpdk|ERR|Interface dpdk0 start error: Input/output error
For vhost ports, buffers are not reserved in advance, but at runtime you may find that a new buffer cannot be obtained while polling vhost ports. The log entry could look like this:
|dpdk(pmd91)|ERR|VHOST_DATA: Failed to allocate memory for mbuf.
If all the ports are needed, the easiest way to resolve this is to reduce the number of Rx queues or the number of reserved buffers (descriptors) for the physical NICs. This can be done with the following command:
# ovs-vsctl set Interface dpdk0 options:n_rxq=4
or with this command:
# ovs-vsctl set Interface dpdk0 options:n_rxq_desc=1024
Alternatively, memory could be increased to ensure that a large pool of buffers will be available (that is, avoiding retries for lower amounts) but that approach scales only so far.
Further Debugging
If you run out of memory, there will be an error message in the log. If you want further details about the pools of memory being allocated, reused, and freed, you can turn on debug mode:
# ovs-appctl vlog/set netdev_dpdk:console:dbg
# ovs-appctl vlog/set netdev_dpdk:syslog:dbg
# ovs-appctl vlog/set netdev_dpdk:file:dbg
Allocated, reused, and freed messages will look like this:
|netdev_dpdk|DBG|Allocated "ovs_mp_2030_0_262144" mempool with 262144 mbufs
|netdev_dpdk|DBG|Reusing mempool "ovs_mp_2030_0_262144"
|netdev_dpdk|DBG|Freeing mempool "ovs_mp_2030_0_262144"
The name of the pool of buffers (that is, the mempool) gives us some information:
- 2030: padded size of the buffer (derived from the MTU)
- 0: NUMA node the memory is allocated from
- 262144: number of buffers in the pool
There is also a command to show which mempool a port is using, as well as lots of other details (not shown):
# ovs-appctl netdev-dpdk/get-mempool-info dpdk0
mempool <ovs_mp_2030_0_262144>@0x7f35ff77ce40
...
Wrap-up
If you have read this far, it probably means you've hit an issue with OvS-DPDK. Sorry to hear that. Hopefully, after reading the guide above, you'll be able to identify whether the issue was due to running out of memory, and you'll know how to fix it.
Some guidance on how much memory is required and how to configure OvS-DPDK for multi-NUMA (including dpdk-socket-mem) can be found in the OVS-DPDK: How much memory and OVS-DPDK: Multi-NUMA articles on this blog.