XDP (eXpress Data Path) is a powerful new networking feature in Linux that enables high-performance programmable access to networking packets before they enter the networking stack. But XDP has a high learning curve. Many developers have written introduction blogs for this feature, such as Paolo Abeni's Achieving high-performance, low-latency networking with XDP: Part I and Toke's Using the eXpress Data Path (XDP) in Red Hat Enterprise Linux 8.
XDP is based on extended Berkeley Packet Filter (eBPF) and is still fast-moving. The eBPF/XDP coding format and style are also changing. So developers are creating tools and frameworks to make eBPF and XDP applications easy to write. Two of these resources, the libbpf library and the xdp-tools utilities, are the topics of this article.
The article shows how to start writing XDP programs through the following tasks:
- Write and run a small introductory program:
- Write a program to drop all packets.
- Build and view a BPF object.
- Load a BPF object.
- Show information on a running BPF object.
- Unload a BPF object.
- Extend the program to let you deal with specific types of packets.
- Use a packet counter to use BPF maps.
- Add a customized userspace tool to load the BPF program.
The reader needs to be familiar with C code and IP header structures. All examples are tested with Red Hat Enterprise Linux (RHEL) 8.3.
Prerequisites
To prepare the developing environment, install the following packages:
$ sudo dnf install clang llvm gcc libbpf libbpf-devel libxdp libxdp-devel xdp-tools bpftool kernel-headers
Task 1: Write and run a simple program with XDP
This section teaches the minimal tasks you need to use XDP.
Task 1.1: Write a program to drop all packets
Let's start with a simple XDP program in C:
#include <linux/bpf.h> #include <bpf/bpf_helpers.h> SEC("xdp_drop") int xdp_drop_prog(struct xdp_md *ctx) { return XDP_DROP; } char _license[] SEC("license") = "GPL";
The linux/bpf.h
header file is provided by the kernel-header package, which defines all the supported BPF helpers and xdp_actions, like the XDP_DROP
action used in this example.
The bpf/bpf_helpers.h
header is provided by libbpf-devel, which provides some useful eBPF macros, like the SEC
macro used in this example. SEC
, short for section, is used to place a fragment of the compiled object in different ELF sections, which we will see in later output from the llvm-objdump
command.
The xdp_drop_prog()
function takes a parameter struct xdp_md *ctx
, which we have not used yet. I will talk about it later. This function returns XDP_DROP
, which means we will drop all incoming packets. Other XDP actions include XDP_PASS
, XDP_TX
, and XDP_REDIRECT
.
Finally, the last line formally specifies the license associated with this program. Some eBPF helpers are accessible only by GPL-licensed programs, and the verifier will use this information to enforce such a restriction.
Task 1.2: Build and dump the BPF object
Let's build the program in the previous section with clang
:
$ clang -O2 -g -Wall -target bpf -c xdp_drop.c -o xdp_drop.o
The -O
option specified which optimization level to use, and-g
generates debugging information.
You can use llvm-objdump
to show the ELF format after the build. llvm-objdump
is very useful if you want to know what a program does and you don't have the source code. The -h
option displays the sections in the object, and the -S
option displays the source interleaved with the disassembled object code. We'll show each of those options in turn.
$ llvm-objdump -h xdp_drop.o xdp_drop: file format ELF64-BPF Sections: Idx Name Size VMA Type 0 00000000 0000000000000000 1 .strtab 000000ad 0000000000000000 2 .text 00000000 0000000000000000 TEXT 3 xdp_drop 00000010 0000000000000000 TEXT 4 license 00000004 0000000000000000 DATA 5 .debug_str 00000125 0000000000000000 6 .debug_abbrev 000000ba 0000000000000000 7 .debug_info 00000114 0000000000000000 8 .rel.debug_info 000001c0 0000000000000000 9 .BTF 000001df 0000000000000000 10 .rel.BTF 00000010 0000000000000000 11 .BTF.ext 00000050 0000000000000000 12 .rel.BTF.ext 00000020 0000000000000000 13 .eh_frame 00000030 0000000000000000 DATA 14 .rel.eh_frame 00000010 0000000000000000 15 .debug_line 00000084 0000000000000000 16 .rel.debug_line 00000010 0000000000000000 17 .llvm_addrsig 00000002 0000000000000000 18 .symtab 000002d0 0000000000000000 $ llvm-objdump -S -no-show-raw-insn xdp_drop.o xdp_drop: file format ELF64-BPF Disassembly of section xdp_drop: 0000000000000000 xdp_drop_prog: ; return XDP_DROP; 0: r0 = 1 1: exit
Task 1.3: Load a BPF object
After you build the object, there are multiple ways to load it.
Warning: Do not load test XDP programs on the default interface. Instead, use the veth interface for testing. This is advisable to protect you from losing network connectivity because the program is dropping packets.
The easiest way to load the program is using the ip
command, like this:
$ sudo ip link set veth1 xdpgeneric obj xdp_drop.o sec xdp_drop
Butip
doesn't support the BPF Type Format (BTF) type map that we will talk about later. Although the latest ip-next release has fixed this issue by adding libbpf support, it has not been backported to the main line yet.
The recommended way to load the XDP object on RHEL is using xdp-loader
. This command depends on libbpf, which has full BTF support, and is the only way to load multiple programs on one interface.
To get Red Hat support for XDP, use libxdp
, as explained in the article XDP is conditionally supported in RHEL 8.3 release notes.
Now let's load the object on interface veth1
with xdp-loader
. We specify -m sbk
to use skb mode. Other possible modes include native and offload. But because these modes are not supported on all NIC drivers, we will just use skb mode in this article. The-s xdp_drop
option specifies the use of the section we created, xdp_drop
:
$ sudo xdp-loader load -m skb -s xdp_drop veth1 xdp_drop.o
Task 1.4: Show information on a running BPF object
There are also multiple ways to show information about a loaded XDP program:
$ sudo xdp-loader status CURRENT XDP PROGRAM STATUS: Interface Prio Program name Mode ID Tag Chain actions ------------------------------------------------------------------------------------- lo <no XDP program> ens3 <no XDP program> veth1 xdp_dispatcher skb 15 d51e469e988d81da => 50 xdp_drop_prog 20 57cd311f2e27366b XDP_PASS $ sudo bpftool prog show 15: xdp name xdp_dispatcher tag d51e469e988d81da gpl loaded_at 2021-01-13T03:24:43-0500 uid 0 xlated 616B jited 638B memlock 4096B map_ids 8 btf_id 12 20: ext name xdp_drop_prog tag 57cd311f2e27366b gpl loaded_at 2021-01-13T03:24:43-0500 uid 0 xlated 16B jited 40B memlock 4096B btf_id 16 $ sudo ip link show veth1 4: veth1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether ba:4d:98:21:3b:b3 brd ff:ff:ff:ff:ff:ff link-netns ns prog/xdp id 15 tag d51e469e988d81da jited
If you load your program with ip
cmd, only one XDP program can be loaded simultaneously. If you load your program withxdp-loader
, two programs will be loaded by default. One is xdp_dispatcher
, created by xdp_loader
, and the other is xdp_drop_prog
, written by us. The second command issued above shows that xdp_dispatcher
is running with ID 15 and xdp_drop_prog
is running with ID 20.
Task 1.5: Unload the XDP program
If you use ip cmd
to load the program, you can unload the program through:
$ sudo ip link set veth1 xdpgeneric off
Use the xdp
flag that corresponds to the way you loaded the file. In this example, we specified xdpgeneric off
because we loaded our program beginning with ip link set veth1 xdpgeneric obj
. Specify xdp off
if you loaded your beginning with ip link set veth1 xdp obj
.
To unload all XDP programs on an interface, issue xdp-loader
with the -a
option:
$ sudo xdp-loader unload -a veth1
Task 2: Drop specific packets with XDP
The first example dropped every packet, which has no practical use. Now let's do some real stuff. The example in this section drops all IPv6 packets:
#include <linux/bpf.h> #include <bpf/bpf_helpers.h> #include <linux/if_ether.h> #include <arpa/inet.h> SEC("xdp_drop") int xdp_drop_prog(struct xdp_md *ctx) { void *data_end = (void *)(long)ctx->data_end; void *data = (void *)(long)ctx->data; struct ethhdr *eth = data; __u16 h_proto; if (data + sizeof(struct ethhdr) > data_end) return XDP_DROP; h_proto = eth->h_proto; if (h_proto == htons(ETH_P_IPV6)) return XDP_DROP; return XDP_PASS; } char _license[] SEC("license") = "GPL";
Compare the code just shown with the first program. Here we added two more header files:linux/if_ether.h
to get theethhdr
struct and arpa/inet.h
to get the htons()
function.
The struct xdp_md
is also used in this example. It's defined (in Linux 5.10) like this:
struct xdp_md { __u32 data; __u32 data_end; __u32 data_meta; /* Below access go through struct xdp_rxq_info */ __u32 ingress_ifindex; /* rxq->dev->ifindex */ __u32 rx_queue_index; /* rxq->queue_index */ __u32 egress_ifindex; /* txq->dev->ifindex */ };
The packet contents lie between ctx->data
and ctx->data_end
. The data starts with an Ethernet header, so we assign the data toethhdr
like this:
void *data = (void *)(long)ctx->data; struct ethhdr *eth = data;
When accessing the data in struct ethhdr
, we must make sure we don't access invalid areas by checking whether data + sizeof(struct ethhdr) > data_end
, and returning without further action if it's true. This check is compulsory by the BPF verifer that verifies your program at runtime.
Then, determine whether the protocol in the Ethernet header is IPv6 by checking h_proto == htons(ETH_P_IPV6)
, and if it is, drop the packet by returning XDP_DROP
. For other packets, we just return XDP_PASS
.
Task 3: Map and count the processed packets
In the previous example, we dropped IPv6 packets. In this example, we will keep track of how many packets we dropped. The example introduces another BPF feature: maps. BPF maps are used to share data between the kernel and userspace. We can update the map data in the kernel and read it from userspace, or vice versa.
Here is an example of a new BTF-defined map (introduced by upstream commit abd29c931459):
struct { __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); __type(key, __u32); __type(value, long); __uint(max_entries, 1); } rxcnt SEC(".maps");
The map is named as rxcnt
with type BPF_MAP_TYPE_PERCPU_ARRAY
. This type indicates that we will have one instance of the map per CPU core; thus, if you have 4 cores, you will have 4 instances of the map. We will use each map to count how many packets are processed per core. The rest of the structure defines the key/value type and limits the maximum number entries to 1, because we need to count only one number (the number of received IPv6 packets). In C code, it would look like we defined an array on each CPU with a size of one, .e.g, unsigned int rxcnt[1]
.
Our program looks up the value of the rxcnt
entry with the function bpf_map_lookup_elem()
and updates the value. Here is the full code:
#include <linux/bpf.h> #include <bpf/bpf_helpers.h> #include <linux/if_ether.h> #include <arpa/inet.h> struct { __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY); __type(key, __u32); __type(value, long); __uint(max_entries, 1); } rxcnt SEC(".maps"); SEC("xdp_drop_ipv6") int xdp_drop_ipv6_prog(struct xdp_md *ctx) { void *data_end = (void *)(long)ctx->data_end; void *data = (void *)(long)ctx->data; struct ethhdr *eth = data; __u16 h_proto; __u32 key = 0; long *value; if (data + sizeof(struct ethhdr) > data_end) return XDP_DROP; h_proto = eth->h_proto; if (h_proto == htons(ETH_P_IPV6)) { value = bpf_map_lookup_elem(&rxcnt, &key); if (value) *value += 1; return XDP_DROP; } return XDP_PASS; } char _license[] SEC("license") = "GPL";
Let's name the new program xdp_drop_ipv6_count.c
and build it to create the object file xdp_drop_ipv6_count.o
. After loading the object, send some IPv6 packets to this interface. Using the bpftool map show
command, we can see that the rxcnt
ID in our map is 13. Then we can use bpftool map dump id 13
to show that 13 packets were processed on CPU 0 and 7 packets were processed on CPU 1:
$ sudo xdp-loader load -m skb -s xdp_drop_ipv6 veth1 xdp_drop_ipv6_count.o ...receive some IPv6 packets $ sudo bpftool map show bpftool map show 13: percpu_array name rxcnt flags 0x0 key 4B value 8B max_entries 1 memlock 4096B btf_id 20 19: array name xdp_disp.rodata flags 0x480 key 4B value 84B max_entries 1 memlock 8192B btf_id 28 frozen # bpftool map dump id 13 [{ "key": 0, "values": [{ "cpu": 0, "value": 13 },{ "cpu": 1, "value": 7 },{ "cpu": 2, "value": 0 },{ "cpu": 3, "value": 0 } ] } ]
BPF supports many more map types, such as BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_ARRAY, etc.
Task 4: Load XDP objects with the custom loader
We can load the XDP objects withip
and show the map number with bpftool
. But if we want more advanced features (to create, read, and write maps, attach XDP programs to interfaces, etc.), we need to write the loader ourselves.
Here is an example of how to show the total packets and the number of packets per second (PPS) we dropped. In this example, I hard-coded a lot of stuff such as kernel object name, section name, etc. These all can be set via parameters in your own code.
The purpose of each function is explained by comments in the code:
#include <unistd.h> #include <stdlib.h> #include <string.h> #include <linux/if_link.h> #include <signal.h> #include <net/if.h> #include <assert.h> /* In this example we use libbpf-devel and libxdp-devel */ #include <bpf/bpf.h> #include <bpf/libbpf.h> #include <xdp/libxdp.h> /* We define the following global variables */ static int ifindex; struct xdp_program *prog = NULL; /* This function will remove XDP from the link when the program exits. */ static void int_exit(int sig) { xdp_program__close(prog); exit(0); } /* This function will count the per-CPU number of packets and print out * the total number of dropped packets number and PPS (packets per second). */ static void poll_stats(int map_fd, int interval) { int ncpus = libbpf_num_possible_cpus(); if (ncpus < 0) { printf("Error get possible cpus\n"); return; } long values[ncpus], prev[ncpus], total_pkts; int i, key = 0; memset(prev, 0, sizeof(prev)); while (1) { long sum = 0; sleep(interval); assert(bpf_map_lookup_elem(map_fd, &key, values) == 0); for (i = 0; i < ncpus; i++) sum += (values[i] - prev[i]); if (sum) { total_pkts += sum; printf("total dropped %10llu, %10llu pkt/s\n", total_pkts, sum / interval); } memcpy(prev, values, sizeof(values)); } } int main(int argc, char *argv[]) { int prog_fd, map_fd, ret; struct bpf_object *bpf_obj; if (argc != 2) { printf("Usage: %s IFNAME\n", argv[0]); return 1; } ifindex = if_nametoindex(argv[1]); if (!ifindex) { printf("get ifindex from interface name failed\n"); return 1; } /* load XDP object by libxdp */ prog = xdp_program__open_file("xdp_drop_ipv6_count.o", "xdp_drop_ipv6", NULL); if (!prog) { printf("Error, load xdp prog failed\n"); return 1; } /* attach XDP program to interface with skb mode * Please set ulimit if you got an -EPERM error. */ ret = xdp_program__attach(prog, ifindex, XDP_MODE_SKB, 0); if (ret) { printf("Error, Set xdp fd on %d failed\n", ifindex); return ret; } /* Find the map fd from the bpf object */ bpf_obj = xdp_program__bpf_obj(prog); map_fd = bpf_object__find_map_fd_by_name(bpf_obj, "rxcnt"); if (map_fd < 0) { printf("Error, get map fd from bpf obj failed\n"); return map_fd; } /* Remove attached program when it is interrupted or killed */ signal(SIGINT, int_exit); signal(SIGTERM, int_exit); poll_stats(map_fd, 2); return 0; }
Set the ulimit
to unlimited and build the program with the -lbpf -lxdp
flags. Then run the program, which shows the packet count output:
$ sudo ulimit -l unlimited $ gcc xdp_drop_ipv6_count_user.c -o xdp_drop_ipv6_count -lbpf -lxdp $ sudo ./xdp_drop_ipv6_count veth1 total dropped 2, 1 pkt/s total dropped 129, 63 pkt/s total dropped 311, 91 pkt/s total dropped 492, 90 pkt/s total dropped 674, 91 pkt/s total dropped 856, 91 pkt/s total dropped 1038, 91 pkt/s ^C
Summary
This article helps you understand what a XDP program looks like, how to add a BPF map, and how to write a custom loader. To learn more about XDP programming, please visit xdp-tutorial.
Last updated: April 7, 2022