Using eXpress Data Path (XDP) maps in RHEL 8: Part 2

 

December 17, 2018
Paolo Abeni
Related topics:
C, C#, C++, Developer Tools, Linux
Related products:
Red Hat Enterprise Linux


    Diving into XDP

    In the first part of this series on XDP, I introduced XDP and discussed the simplest possible example. Let's now try to do something less trivial, exploring some more-advanced eBPF features—maps—and some common pitfalls.

    XDP is available in Red Hat Enterprise Linux 8, which you can download and run now.

    [Not] Reinventing the wheel

    We will start by adding packet parsing to our sample. To simplify this task, we reuse the kernel definitions for common networking protocols, adding the following to the include section of our XDP program:

    #include <linux/in.h>
    #include <linux/if_ether.h>
    #include <linux/if_packet.h>
    #include <linux/if_vlan.h>
    #include <linux/ip.h>
    

    We now need to access the packet contents via the XDP context. Let's take a look at its definition:

    struct xdp_md {
        __u32 data;
        __u32 data_end;
        __u32 data_meta;
        /* Below access go through struct xdp_rxq_info */
        __u32 ingress_ifindex; /* rxq->dev->ifindex */
        __u32 rx_queue_index; /* rxq->queue_index */
    };
    

    The packet contents are between ctx->data and ctx->data_end. We can now add the parsing code and put the parsed addresses to use. In this example, we drop any packet whose IPv4 destination address is zero:

    /* Parse IPv4 packet to get SRC, DST IP and protocol */
    static inline int parse_ipv4(void *data, __u64 nh_off, void *data_end, __be32 *src, __be32 *dest)
    {
        struct iphdr *iph = data + nh_off;
    
        *src = iph->saddr;
        *dest = iph->daddr;
        return iph->protocol;
    }
    
    SEC("prog")
    int xdp_drop(struct xdp_md *ctx)
    {
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        __be32 dest_ip, src_ip;
        __u16 h_proto;
        __u64 nh_off;
        int ipproto;
    
        nh_off = sizeof(*eth);
    
        /* parse vlan */
        h_proto = eth->h_proto;
        if (h_proto == __constant_htons(ETH_P_8021Q) ||
            h_proto == __constant_htons(ETH_P_8021AD)) {
            struct vlan_hdr *vhdr;
    
            vhdr = data + nh_off;
            nh_off += sizeof(struct vlan_hdr);
            h_proto = vhdr->h_vlan_encapsulated_proto;
        }
        if (h_proto != __constant_htons(ETH_P_IP))
            goto pass;
    
        ipproto = parse_ipv4(data, nh_off, data_end, &src_ip, &dest_ip);
        if (!dest_ip)
            return XDP_DROP;
    
    pass:
        return XDP_PASS;
    }
    

    Tripping on the verifier

    The above code should compile just fine, but if we try to load it with iproute, we get a bad surprise:

    Prog section 'prog' rejected: Permission denied (13)!
    - Type: 6
    - Instructions: 19 (0 over limit)
    - License:
    
    Verifier analysis:
    
    0: (61) r1 = *(u32 *)(r1 +0)
    1: (71) r2 = *(u8 *)(r1 +13)
    invalid access to packet, off=13 size=1, R1(id=0,off=0,r=0)
    R1 offset is outside of the packet
    
    Error fetching program/map!
    

    It fails to pass the verifier check! The verifier error message can be somewhat misleading, as we are accessing only the first few bytes of the packet. We know that each Ethernet frame must be at least 64 bytes long and, thus, that we are accessing valid offsets inside the packet.

    The verifier, instead, relies only on explicit checks: before accessing or manipulating any offset inside the packet, we must add a conditional check that such an offset is inside the packet body. In our example, before accessing each header, we must ensure that the end of the header does not lie beyond the packet end, by adding a patch like this:

    @@ -17,6 +17,9 @@ static inline int parse_ipv4(void *data, __u64 nh_off, void *data_end,
      {
          struct iphdr *iph = data + nh_off;
    
    +     if (iph + 1 > data_end)
    +         return 0;
    +
          *src = iph->saddr;
          *dest = iph->daddr;
          return iph->protocol;
    @@ -34,6 +37,8 @@ int xdp_drop(struct xdp_md *ctx)
          int ipproto;
    
          nh_off = sizeof(*eth);
    +     if (data + nh_off > data_end)
    +         goto pass;
    
          /* parse vlan */
          h_proto = eth->h_proto;
    @@ -43,6 +48,8 @@ int xdp_drop(struct xdp_md *ctx)
    
          vhdr = data + nh_off;
          nh_off += sizeof(struct vlan_hdr);
    +     if (data + nh_off > data_end)
    +         goto pass;
          h_proto = vhdr->h_vlan_encapsulated_proto;
          }
          if (h_proto != __constant_htons(ETH_P_IP))
    

    The verifier should be happy now!
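
    The same "check before every access" discipline can be tried outside the kernel. The following is a minimal user-space sketch, not the XDP program itself: parse_frame(), read_be16(), and the ETHERTYPE_*/…_LEN constants are names made up for this illustration, and the packet is a plain byte buffer. Each header is read only after confirming that its end does not lie beyond the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants (hypothetical names, standard values). */
#define ETHERTYPE_IPV4  0x0800
#define ETHERTYPE_VLAN  0x8100  /* 802.1Q */
#define ETHERTYPE_QINQ  0x88A8  /* 802.1ad */

#define ETH_HDR_LEN   14
#define VLAN_HDR_LEN  4
#define IPV4_HDR_LEN  20        /* minimal header, options ignored */

/* Read a big-endian 16-bit value without alignment assumptions. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: returns the IPv4 protocol number and stores the
 * addresses (network byte order) in *src and *dst. Returns -1 if the
 * frame is too short or is not IPv4.
 */
static int parse_frame(const uint8_t *data, const uint8_t *data_end,
                       uint32_t *src, uint32_t *dst)
{
    size_t len, off = ETH_HDR_LEN;
    uint16_t proto;

    assert(data != NULL && data <= data_end);
    len = (size_t)(data_end - data);

    /* Ethernet header must fit before we read the EtherType. */
    if (off > len)
        return -1;
    proto = read_be16(data + 12);

    if (proto == ETHERTYPE_VLAN || proto == ETHERTYPE_QINQ) {
        /* VLAN tag must fit before we read the encapsulated type. */
        if (off + VLAN_HDR_LEN > len)
            return -1;
        proto = read_be16(data + off + 2);
        off += VLAN_HDR_LEN;
    }
    if (proto != ETHERTYPE_IPV4)
        return -1;

    /* IPv4 header must fit before we read addresses and protocol. */
    if (off + IPV4_HDR_LEN > len)
        return -1;
    memcpy(src, data + off + 12, 4);
    memcpy(dst, data + off + 16, 4);
    return data[off + 9];
}
```

    Calling parse_frame() on a buffer truncated in the middle of any header returns -1 instead of reading past the end, which is the property the verifier insists on seeing proven in the eBPF program.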

    Custom XDP loader

    We already talked about maps in part 1; let's see how we can use them in practice. We want to enhance our XDP program to allow the user to configure the addresses to be dropped at runtime and also to be able to read the related stats.

    As a first step, we need to replace the iproute2 tool with a custom loader program, as the tool does not allow map manipulation. The code fragment used to load the XDP program should be something like this:

    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>
    #include <errno.h>
    #include <error.h>
    #include <net/if.h>     /* if_nametoindex() */
    #include <string.h>     /* strerror() */
    
    // [ ... ]
        struct bpf_prog_load_attr prog_load_attr = {
            .prog_type = BPF_PROG_TYPE_XDP,
            .file = "xdp_drop_kern.o",
        };
    // [ ... ]
        /* load the object file; prog_fd refers to the loaded program */
        if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
            error(1, errno, "can't load %s", prog_load_attr.file);
    
        ifindex = if_nametoindex(dev_name);
        if (!ifindex)
            error(1, errno, "unknown interface %s\n", dev_name);
        /* attach the program to the device with default flags */
        if (bpf_set_link_xdp_fd(ifindex, prog_fd, 0) < 0)
            error(1, errno, "can't attach to interface %s:%d: "
                  "%d:%s\n", dev_name, ifindex, errno,
                  strerror(errno));
    // [ ... ]
        // cleaning-up
        bpf_set_link_xdp_fd(ifindex, -1, 0);
    

    We are using the libbpf helper library, bundled in the Linux kernel sources. bpf_prog_load_xattr() loads the eBPF program specified by the prog_load_attr argument. It parses all the ELF sections of the specified object, extracting the related info and placing it into the obj status data. Each program found (text section) is then loaded into the kernel and made available via a newly allocated file descriptor (prog_fd).

    Such a file descriptor is later used to attach the loaded program to the selected device, via the bpf_set_link_xdp_fd() function. The last argument allows the user to specify several flags, such as a flag for replacing the existing XDP program, if any, or a flag for using the driver-level XDP hook. By default:

    • It will try to use the driver-level hook and then fall back to the common one.
    • If an XDP program is already installed on the specified device, it will fail.

    Finally, the last helper, to be invoked at program termination, detaches the XDP program from the NIC and frees all the associated kernel resources.

    Interacting with user space

    Let's now move to the juicy part: maps! Every data structure shared between the user space and the eBPF program is called a "map," but there are actually several different types: hashmap, array, queue, and so on. Usually, there are two different variants: simple and per-CPU. With the per-CPU variant, each entry is replicated for all the locally available CPUs; inside the kernel, each CPU will access only its private copy. The per-CPU variant avoids any kind of contention-related issue and it's the preferred one when the eBPF program must modify the data entries on a per-packet basis.

    The map data will be accessed by both user space and the eBPF program. It's convenient to add the data type definition in a header file included by both sides. In this example, we use a map to specify the source addresses to be filtered and count the number of bytes and packets dropped for each specified address. To add such a map to our program, we need something like this:

        // in xdp_drop_common.h
        struct stats_entry {
            __u64 packets;
            __u64 bytes;
        };
    
        // in xdp_drop_kern.c
        #include "xdp_drop_common.h"
        // [ ... ]
        /* drop map: source address -> drop stats */
        struct bpf_map_def SEC("maps") drop_map = {
            .type = BPF_MAP_TYPE_PERCPU_HASH,
            .key_size = sizeof(__be32),
            .value_size = sizeof(struct stats_entry),
            .max_entries = 100,
        };
    
        // in xdp_drop_user.c
        struct bpf_map *map;
        int map_fd;
        // [ ... ]
        map = bpf_object__find_map_by_name(obj, "drop_map");
        if (!map)
            error(1, errno, "can't load drop_map");
        map_fd = bpf_map__fd(map);
        if (map_fd < 0)
            error(1, errno, "can't get drop_map fd");
    

    Note that our map is really a per-CPU hash table, and its definition contains only the key and value size, as the kernel needs only such info to do the allocation, perform the lookup, and do the entry update. The map definition also contains the maximum number of entries allowed inside the map. Hashmaps are initially empty, and inserting entries beyond that limit fails. Arrays have a fixed size equal to the specified limit. User space accesses the map via its file descriptor. Using the libbpf helpers may look a little over-complicated here, but it really helps when the eBPF program exposes multiple maps.
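
    To make the capacity and lookup semantics concrete, here is a toy user-space model. It is only a sketch: toy_map, toy_lookup(), and toy_update() are invented for this illustration, the table uses a linear scan rather than hashing, and it ignores the per-CPU aspect entirely. What it does mirror is that the table starts empty, that an insert fails once max_entries keys are present, and that a lookup hands back a pointer to the stored value so the caller can update it in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 4   /* plays the role of .max_entries */

struct stats_entry {
    uint64_t packets;
    uint64_t bytes;
};

/* Toy stand-in for a bounded key/value map (hypothetical type). */
struct toy_map {
    uint32_t keys[MAX_ENTRIES];
    struct stats_entry values[MAX_ENTRIES];
    int used[MAX_ENTRIES];
};

/* Return a pointer to the stored value, or NULL if the key is absent. */
static struct stats_entry *toy_lookup(struct toy_map *m, uint32_t key)
{
    assert(m != NULL);
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (m->used[i] && m->keys[i] == key)
            return &m->values[i];
    return NULL;
}

/* Insert or overwrite; return 0 on success, -1 when the map is full. */
static int toy_update(struct toy_map *m, uint32_t key,
                      const struct stats_entry *val)
{
    struct stats_entry *cur = toy_lookup(m, key);

    if (cur) {                  /* existing key: overwrite the value */
        *cur = *val;
        return 0;
    }
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (!m->used[i]) {      /* free slot: create the entry */
            m->used[i] = 1;
            m->keys[i] = key;
            m->values[i] = *val;
            return 0;
        }
    }
    return -1;                  /* no free slot left */
}
```

    With MAX_ENTRIES set to 4, the fifth distinct key is rejected while updates to the four existing keys keep succeeding.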

    We are now ready to add the user space/eBPF interaction:

        // in xdp_drop_kern.c
        struct stats_entry *stats;
        // [ ... ]
        /* drop only packets whose source address is in the map */
        stats = bpf_map_lookup_elem(&drop_map, &src_ip);
        if (!stats)
            goto pass;
    
        stats->packets++;
        stats->bytes += ctx->data_end - ctx->data;
        return XDP_DROP;
    
        // in xdp_drop_user.c
        struct stats_entry entry;
        // [ ... ]
        /* add the address to the map with zeroed stats */
        memset(&entry, 0, sizeof(entry));
        if (bpf_map_update_elem(map_fd, &saddr, &entry, BPF_ANY))
            error(1, errno, "can't add address %s\n", argv[i]);
        // [ ... ]
        if (bpf_map_lookup_elem(map_fd, &ipv4_addr, &entry))
            error(1, errno, "no stats for rule %x\n",
                  ipv4_addr);
        printf("addr %x drop %llu:%llu\n", ipv4_addr,
               entry.packets, entry.bytes);
    

    Now the eBPF program drops the packet only if the source IP address is found in the drop_map hash table, and it updates the related stats. The user-space program fills such a map with zeroed stats and (periodically) looks up such entries, printing out the stats reported by the eBPF program.

    For brevity, the boilerplate user-space code that obtains the list of source addresses and terminates gracefully is omitted; once that is included, we are ready to build and run.
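
    The omitted boilerplate could look roughly like the following sketch. It is not the code from the example repository: parse_ipv4_key(), install_stop_handlers(), and the stop flag are hypothetical names. The first helper turns a dotted-quad string, such as a command-line argument, into a key in network byte order, matching the __be32 key of the map; the second arranges for SIGINT and SIGTERM to set a flag so that the main loop can exit and run its cleanup.

```c
#define _POSIX_C_SOURCE 200809L

#include <arpa/inet.h>
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

/* Set by the signal handler; polled by the main loop. */
static volatile sig_atomic_t stop;

static void on_signal(int signo)
{
    (void)signo;
    stop = 1;
}

/* Request a clean exit on Ctrl-C or kill; 0 on success, -1 on error. */
static int install_stop_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) < 0)
        return -1;
    return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Convert a dotted-quad string into a map key in network byte order.
 * Returns 0 on success, -1 if the text is not a valid IPv4 address.
 */
static int parse_ipv4_key(const char *text, uint32_t *key)
{
    struct in_addr addr;

    assert(text != NULL && key != NULL);
    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;
    *key = addr.s_addr;
    return 0;
}
```

    In the loader, each argv[i] would go through parse_ipv4_key() before the map update, and the stats loop would run while stop is zero, so that the detach call at the end of the program is always reached on a normal interrupt.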

    Some map caveats

    The results obtained with the current code could be disappointing, ranging from random crashes of the user-space program to the eBPF filter being apparently ineffective. If the user-space program terminates abnormally, it will leave the XDP program attached to the network device and later execution will fail on startup. In such a case, the user needs to manually detach the XDP program with iproute:

    ip link set dev <NIC> xdp off
    

    In some lucky cases, the current code could work almost flawlessly, just failing to detach the XDP program at shutdown time.

    While some of you may already guess where the problem is, we are going to use an XDP/eBPF debugging facility to dump the program status, by adding the following to xdp_drop_kern:

    @@ -33,6 +33,13 @@ static inline int parse_ipv4(void *data, __u64 nh_off, void *data_end,
          return iph->protocol;
      }
    
    +     #define bpf_printk(fmt, ...) \
    +     ({ \
    +         char ____fmt[] = fmt; \
    +         bpf_trace_printk(____fmt, sizeof(____fmt), \
    +         ##__VA_ARGS__); \
    +     })
    +
          SEC("prog")
          int xdp_drop(struct xdp_md *ctx)
          {
    @@ -45,6 +52,8 @@ int xdp_drop(struct xdp_md *ctx)
          __u64 nh_off;
          int ipproto;
    
    +     bpf_printk("xdp_drop\n");
    +
          nh_off = sizeof(*eth);
          if (data + nh_off > data_end)
              goto pass;
    @@ -72,6 +81,8 @@ int xdp_drop(struct xdp_md *ctx)
          if (!stats)
              goto pass;
    
    +     bpf_printk("xdp_drop pkts %lld:%lld\n", stats->packets, stats->bytes);
    +
          stats->packets++;
          stats->bytes += ctx->data_end - ctx->data;
          return XDP_DROP;
    

    Then we can run the example again and observe the messages emitted in:

    /sys/kernel/debug/tracing/trace
    

    The eBPF helper is invoked correctly for each ingress packet.

    With some luck, you may observe that the stats associated with the map entry created by the user space look corrupted, for example, containing fairly random values even when the first packet is received after the entry creation.

    We are using a per-CPU map: when setting the entry, the kernel reads <number of possible CPUs> values from the specified data address, copying each of them into the corresponding per-CPU value inside the kernel map. In our case, that means reading past the single entry variable and picking up unrelated data from the user-space program stack.

    Moreover, when the user-space process tries to read an entry from the map, the kernel copies the same amount of data to the specified address, again hitting data on the stack, and causing the random behavior mentioned above.
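
    To put numbers on the mismatch, the following sketch computes how many bytes a single lookup or update moves for one key of a per-CPU map. The helper name percpu_value_bytes() is made up for this illustration, and the rounding of each per-CPU slot up to a multiple of 8 bytes reflects our understanding of the kernel's copy logic rather than anything shown above. With the 16-byte struct stats_entry and, say, 8 possible CPUs, the kernel touches 128 bytes, while the single stack variable provides only 16.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes exchanged with user space for one key of a per-CPU map.
 * Assumption: each per-CPU slot is padded to a multiple of 8 bytes.
 */
static size_t percpu_value_bytes(size_t value_size, unsigned int nr_cpus)
{
    size_t slot;

    assert(value_size > 0 && nr_cpus > 0);
    slot = (value_size + 7) & ~(size_t)7;   /* round up to 8 bytes */
    return slot * nr_cpus;
}
```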

    The solution is simply allocating enough storage for the map entry with something like this:

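        /* Assumes configured CPUs == possible CPUs; the kernel sizes
         * per-CPU values by the number of possible CPUs. */
        int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
        struct stats_entry *entry;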
    // [ ... ]
        entry = calloc(nr_cpus, sizeof(struct stats_entry));
        if (!entry)
            error(1, 0, "can't allocate entry\n");
    

    And, when reading the map, walk and aggregate all the values:

        struct stats_entry all = { 0, 0};
    
        if (bpf_map_lookup_elem(map_fd, &ipv4_addr, entry))
            error(1, errno, "no stats for address %x\n",
                  ipv4_addr);
    
        for (j = 0; j < nr_cpus; j++) {
            all.packets += entry[j].packets;
            all.bytes += entry[j].bytes;
        }
    

    Now our IP filter application is ready!

    The road ahead

    In this article, we covered some of the functionality offered by XDP/eBPF, but there is much more. For example, there are many more eBPF helpers ready to be used for various tasks: updating the packet checksum after some modification, forwarding packets, and so on.

    A good starting point is with this header inside the Linux kernel sources, which contains the official documentation for the implemented helpers:

    include/uapi/linux/bpf.h
    

    Moreover, the samples/bpf/ directory, still in the kernel sources, contains several more-complex XDP examples. Some background knowledge is required before diving in, though.

    The full source for the example discussed above can be found at:

    https://github.com/altoor/xdp_walkthrough_examples
    

    Happy hacking!

    See also:

    • Achieving high-performance, low-latency networking with XDP (Part 1 of this article series)
    • Network debugging with eBPF
    • Red Hat Enterprise Linux 8 announcement


     

    Last updated: March 24, 2023
