XDP: From zero to 14 Mpps

In past years, the kernel community has been using different approaches in the quest for ever-increasing networking performance. While improvements have been measurable in several areas, a new wave of architecture-related security issues and related counter-measures has undone most of the gains, and purely in-kernel solutions for some packet-processing intensive workloads still lag behind the kernel-bypass solution, namely the Data Plane Development Kit (DPDK), by almost an order of magnitude.

But the kernel community never sleeps (almost literally) and the holy grail of kernel-based networking performance has been found under the name of XDP: the eXpress Data Path. XDP is available in Red Hat Enterprise Linux 8, which you can download and run now.

This technology allows you to implement new networking features (and/or re-implement existing ones) via custom extended BPF (eBPF) programs attached to the kernel packet processing deep down the stack, eliminating the overhead of socket buffer (SKB) management, reducing the per-packet memory management overhead, and allowing more effective bulking. XDP eBPF programs have access to helpers for packet manipulation and packet forwarding, offering almost unlimited opportunity to change and extend the kernel behavior without adding new in-kernel code, while also reaching a higher processing speed. A modern driver with XDP support can easily handle more than 14 Mpps.

Excited by this opportunity, but scared by the unknown? This article will guide you towards your first XDP program, building a working example from zero and allowing you to build a light-speed network application from there.

XDP overview

XDP allows you to attach an eBPF program to a lower-level hook inside the kernel. Such a hook is implemented by the network device driver, inside the ingress traffic processing function (usually, the NAPI poll() method), before an SKB is allocated for the current packet.

The program entry function has a single argument: a context describing the current packet. The program can manipulate the packet in arbitrary ways, but it has to respect the constraints imposed by the eBPF verifier (a little more on this later). Finally, the program must return an action code to the owning device driver, specifying what the driver should do next with the processed packet, for example, passing it to upper-layer processing or dropping it.

The eBPF verifier imposes some restrictions on the code you can write, performing rigorous checks to ensure that the program:

  • contains no loops
  • accesses only valid memory (for example, does not exceed the packet boundaries)
  • uses a limited number of eBPF instructions (the limit in the Linux 4.14 kernel is 128K)
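
The memory-access rule is the one a first program is most likely to trip over. As a minimal user-space sketch (plain C, not loadable eBPF), the function below models the check-before-read pattern the verifier expects. The mock_ types and names are hypothetical stand-ins for the kernel's XDP context and actions; only the numeric values 1 (drop) and 2 (pass) mirror the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's XDP verdicts; the numeric
 * values mirror XDP_DROP and XDP_PASS in the kernel's uapi header. */
enum mock_xdp_action { MOCK_XDP_DROP = 1, MOCK_XDP_PASS = 2 };

/* Hypothetical stand-in for the XDP context: packet start and end. */
struct mock_xdp_ctx {
    const uint8_t *data;     /* first byte of the packet */
    const uint8_t *data_end; /* one past the last byte */
};

#define MOCK_ETH_HLEN 14     /* Ethernet header length in bytes */
#define MOCK_ETH_P_IP 0x0800 /* EtherType for IPv4 */

/* Pass IPv4 frames and drop everything else. */
int mock_xdp_ipv4_only(const struct mock_xdp_ctx *ctx)
{
    /* Prove the whole Ethernet header lies inside the packet before
     * reading it. A real XDP program usually writes this check as
     * "data + header size > data_end"; without such a check the
     * verifier rejects the program at load time. */
    if ((size_t)(ctx->data_end - ctx->data) < MOCK_ETH_HLEN)
        return MOCK_XDP_DROP;

    /* The EtherType sits at offset 12, in network byte order. */
    uint16_t proto = (uint16_t)((ctx->data[12] << 8) | ctx->data[13]);

    return proto == MOCK_ETH_P_IP ? MOCK_XDP_PASS : MOCK_XDP_DROP;
}
```

A frame shorter than 14 bytes is dropped without any byte being read, which is the property the verifier wants to be able to prove.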

The verifier itself runs inside the kernel and is executed when the eBPF program is loaded via the eBPF syscall. Once the program has been successfully loaded, it can be attached to the XDP hook for any number of devices. The kernel source bundles a user-space library (libbpf) with helper functions to simplify such tasks.

Adventurous (or advanced) users can write an eBPF program directly using the eBPF assembler, but it's much easier to use a higher-level programming language and let the compiler translate the code to eBPF. The eBPF community selected the LLVM compiler and the C language for such a task. Since the verifier takes action on compiled code, any optimization performed by the compiler can affect the results. For example, loops bound to a constant number of iterations are unrolled by the compiler. This allows you to circumvent the "no loops" constraint, but it also increases the number of eBPF instructions generated; nested loops can easily hit the 128K limit.
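
To make the unrolling point concrete, here is a minimal sketch of a loop whose trip count is a compile-time constant: a checksum over a 20-byte IPv4 header without options. The function name is hypothetical, and the code is plain C that also runs in user space. The unroll pragma is a Clang hint; whether a given compiler version fully unrolls the loop is an assumption to verify on the generated eBPF.

```c
#include <stdint.h>

#define IPV4_HDR_WORDS 10 /* 20-byte header without options, as 16-bit words */

/* One's-complement checksum over a fixed-size IPv4 header.
 * The loop bound is a constant, so the compiler can replace the loop
 * with ten straight-line additions. */
uint16_t ipv4_hdr_checksum(const uint8_t *hdr)
{
    uint32_t sum = 0;

#if defined(__clang__)
#pragma unroll
#endif
    for (int i = 0; i < IPV4_HDR_WORDS; i++)
        sum += (uint32_t)((hdr[2 * i] << 8) | hdr[2 * i + 1]);

    /* Fold the carries back into the low 16 bits; two folds are
     * enough for ten 16-bit words. */
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Called on a header whose checksum field is zero, the function returns the value to store there; called on a header that already carries a valid checksum, it returns 0. Each unrolled iteration adds instructions, so a larger constant bound, or a loop nested inside another, grows the program quickly.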

XDP flavors

Not all network device drivers implement the XDP hook. In such a case, you may fall back to the generic XDP hook, implemented by the core kernel and available regardless of the specific network device driver feature. Since such a hook takes place later in the networking stack, after SKB allocation, the performance observed there is much lower than the driver-based XDP hook, but it still allows experimenting with XDP. The network device drivers supporting the XDP hook in Linux 4.18 and later are:

  • bnxt
  • thunder
  • i40e
  • ixgbe
  • mlx4
  • mlx5
  • nfp
  • qede
  • tun
  • virtio_net

The list can vary with the kernel version, so it's worth checking for explicit support for your own driver in the running kernel; see below for how to do that.

Finally, the behavior of an XDP/eBPF program can be controlled and inspected from user space via a number of maps defined and used by the program. Such maps can also be read and modified from user space, again using the libbpf helpers.

"Hello world"

XDP is not a programming language, but it uses a programming language (eBPF), and every programming-related tutorial has to start with a "Hello world" program. While we can print debug messages from an XDP/eBPF program (and we will, later), we'll start instead with an even simpler XDP example: an eBPF program that does almost nothing, just passing each processed packet up to the kernel stack.

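// SPDX-License-Identifier: GPL-2.0

#define KBUILD_MODNAME "xdp_dummy"
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

/* "prog" is the ELF section iproute2 loads the program from by default. */
SEC("prog")
int xdp_dummy(struct xdp_md *ctx)
{
    /* Do nothing: hand every packet to the regular kernel stack. */
    return XDP_PASS;
}

/* Declares the program license to the verifier. */
char _license[] SEC("license") = "GPL";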

Let's look at the details of the code above. It uses C-language syntax and includes two external headers. The first one, provided by the kernel-headers package of most distributions, contains the definition of the XDP program return codes; XDP_PASS, used in this example, means that the kernel will pass the packet along to normal network processing. Other available values are XDP_DROP (drops the packet), XDP_TX (sends the packet back out of the interface that received it), and XDP_REDIRECT (sends the packet out of another interface).
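
For reference while debugging, the return codes are small integers. The sketch below is a hypothetical user-space mirror of the kernel's action enumeration (the DEMO_ names are made up for illustration; the numeric values follow the kernel's uapi header). It also lists XDP_ABORTED, which the kernel defines to signal a program error and which results in the packet being dropped.

```c
#include <stddef.h>

/* Hypothetical mirror of the kernel's enum xdp_action; the numeric
 * values match the kernel's uapi header. */
enum demo_xdp_action {
    DEMO_XDP_ABORTED = 0, /* program error, packet is dropped */
    DEMO_XDP_DROP,        /* drop the packet */
    DEMO_XDP_PASS,        /* hand the packet to the kernel stack */
    DEMO_XDP_TX,          /* send back out of the receiving interface */
    DEMO_XDP_REDIRECT     /* send out of another interface */
};

/* Translate a numeric verdict into its kernel name, for log output. */
const char *demo_xdp_action_name(int action)
{
    switch (action) {
    case DEMO_XDP_ABORTED:  return "XDP_ABORTED";
    case DEMO_XDP_DROP:     return "XDP_DROP";
    case DEMO_XDP_PASS:     return "XDP_PASS";
    case DEMO_XDP_TX:       return "XDP_TX";
    case DEMO_XDP_REDIRECT: return "XDP_REDIRECT";
    default:                return "unknown";
    }
}
```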

The second header, which is still part of the Linux kernel but is not usually packaged by most distributions, contains a list of the available eBPF helpers and the definition of the SEC() macro. The latter is used to place a fragment of the compiled object in different ELF sections. Such sections will be interpreted by the eBPF loader to detect, for example, the maps defined by the program (and allow user-space access to them).
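
The section placement itself is ordinary compiler machinery. As a minimal sketch, assuming a GCC- or Clang-compatible compiler targeting ELF (for example, on Linux), the macro below mirrors how SEC() is commonly defined; the section and symbol names are hypothetical and chosen only for this illustration.

```c
/* Assumed definition, mirroring the usual SEC() macro: place the
 * annotated symbol in the named ELF section and keep it even if
 * nothing references it. */
#define SEC(NAME) __attribute__((section(NAME), used))

/* Emitted into an ELF section named "demo_code" instead of .text. */
SEC("demo_code")
int demo_prog(void)
{
    return 2; /* numeric value of XDP_PASS */
}

/* Emitted into an ELF section named "demo_data" instead of .data. */
char demo_license[] SEC("demo_data") = "GPL";
```

Compiling this file to an object and listing its section headers (for example, with readelf -S) should show both custom sections next to the standard ones. A loader can then find a program or a license string by section name alone.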

Finally, the last line formally specifies the license associated with this program. Some eBPF helpers are accessible only by GPLed programs, and the verifier will use this info to enforce such a restriction.

Some rough edges

Let's build and run it! LLVM/Clang version 3.7 or later (version 6.0, the Fedora 28 default, was used for this writing) and a reasonably recent make version (here, 4.2.1) are needed. This is the makefile used:

KDIR ?= /lib/modules/$(shell uname -r)/source
CLANG ?= clang
LLC ?= llc
ARCH := $(subst x86_64,x86,$(shell arch))

BIN := xdp_dummy.o
CLANG_FLAGS = -I. -I$(KDIR)/arch/$(ARCH)/include \
-I$(KDIR)/arch/$(ARCH)/include/generated \
-I$(KDIR)/include \
-I$(KDIR)/arch/$(ARCH)/include/uapi \
-I$(KDIR)/arch/$(ARCH)/include/generated/uapi \
-I$(KDIR)/include/uapi \
-I$(KDIR)/include/generated/uapi \
-include $(KDIR)/include/linux/kconfig.h \
-I$(KDIR)/tools/testing/selftests/bpf/ \
-D__KERNEL__ -D__BPF_TRACING__ -Wno-unused-value -Wno-pointer-sign \
-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-Wno-unknown-warning-option \
-O2 -emit-llvm

all: $(BIN)

xdp_dummy.o: xdp_dummy.c
	$(CLANG) $(CLANG_FLAGS) -c $< -o - | \
	$(LLC) -march=bpf -mcpu=$(CPU) -filetype=obj -o $@

It is fairly non-trivial: because the program uses kernel headers, we need to cope with architecture specific headers dependencies. Invoking make on your distribution of choice with the above makefile can be disappointing, as you will probably get something like this:

clang -nostdinc -I. \
[... long argument list elided for clarity ...]
-O2 -emit-llvm -c xdp_dummy.c -o - | \
llc -march=bpf -mcpu= -filetype=obj -o xdp_dummy.o
xdp_dummy.c:5:10: fatal error: 'bpf_helpers.h' file not found
#include "bpf_helpers.h"
^~~~~~~~~~~~~~~
1 error generated.

Almost no distribution packages the complete kernel sources, including the required bpf_helpers.h. The current solution is, unfortunately, to download the full Linux sources, unpack them somewhere on your local disk, and invoke make as follows:

KDIR=<path to the kernel sources> make

Now we have our eBPF/XDP program, but we must load it into the kernel to get any effect. For a less-trivial example, the loading is usually done by a controlling user-space program, which will also monitor and interact with the XDP program via some maps. For instant gratification, we can use the ip command from iproute2 instead:

ip link set dev <net device name> xdp object xdp_dummy.o

The above command attaches our XDP program to the specified network device, trying the driver hook first and falling back to the generic one otherwise. Those who are brave and have a supported device driver can replace xdp with xdpdrv to force the usage of the driver-specific hook; in that case, attaching will fail if the driver does not provide it.
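
Under the hood, the keyword given to ip selects an attach-mode flag. The sketch below is a hypothetical user-space helper: the DEMO_ constants mirror the XDP flag values from the kernel's uapi if_link.h header, and the function name is made up for illustration. The xdpgeneric and xdpoffload keywords come from iproute2 and are not covered in this article.

```c
#include <string.h>

/* Hypothetical mirrors of the XDP attach flags in the kernel's
 * uapi if_link.h header. */
#define DEMO_XDP_FLAGS_SKB_MODE (1U << 1) /* generic hook, after SKB allocation */
#define DEMO_XDP_FLAGS_DRV_MODE (1U << 2) /* driver hook */
#define DEMO_XDP_FLAGS_HW_MODE  (1U << 3) /* offload to the NIC */

/* Map an ip-link keyword to the attach-mode flag it requests.
 * Returns 0 for plain "xdp" (no mode forced: driver hook if present,
 * generic hook otherwise) and -1 for an unknown keyword. */
int demo_xdp_mode_flags(const char *keyword)
{
    if (strcmp(keyword, "xdp") == 0)
        return 0;
    if (strcmp(keyword, "xdpgeneric") == 0)
        return (int)DEMO_XDP_FLAGS_SKB_MODE;
    if (strcmp(keyword, "xdpdrv") == 0)
        return (int)DEMO_XDP_FLAGS_DRV_MODE;
    if (strcmp(keyword, "xdpoffload") == 0)
        return (int)DEMO_XDP_FLAGS_HW_MODE;
    return -1;
}
```

A controlling user-space program would pass the same flag values when attaching the program itself, which is why forcing the driver mode fails outright on a driver without XDP support rather than silently falling back.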

What's next

With your newly written XDP program, you can experience unprecedented speed in packet filtering (unless you already did that by unplugging the Ethernet cable), as a modern driver with XDP support can easily handle more than 14 Mpps! But you may be interested in doing something useful, like dropping specific packets or collecting statistics about the XDP program activity. In the next article in this series, we will deal with that, moving on to a less-trivial example by introducing packet parsing, debugging, and map usage!

You might also be interested in this article: Using eXpress Data Path (XDP) maps in RHEL 8: Part 2.

Last updated: March 24, 2023