
Network debugging with eBPF (RHEL 8)

December 3, 2018
Matteo Croce
Related topics:
C, C#, C++, Linux
Related products:
Red Hat Enterprise Linux

    Introduction

    Networks are fun to work with, but often they are also a source of trouble. Network troubleshooting can be difficult, and reproducing the bad behavior that is happening in the field can be painful as well.

    Luckily, some tools can come to our aid: network namespaces, virtual machines, tc, and netfilter. Simple network setups can be reproduced with network namespaces and veth devices, while more complex setups require interconnecting virtual machines with a software bridge and using standard networking tools, like iptables or tc, to simulate the bad behavior. If you have an issue with ICMP replies generated because an SSH server is down, iptables -A INPUT -p tcp --dport 22 -j REJECT --reject-with icmp-host-unreachable in the correct namespace or VM can do the trick.

    This article describes using eBPF (extended BPF), an extended version of the Berkeley Packet Filter, to troubleshoot complex network issues. eBPF is a fairly new technology and the project is still in an early stage, with documentation and the SDK not yet ready. But that should improve, especially with XDP (eXpress Data Path) being shipped in Red Hat Enterprise Linux 8, which you can download and run now.

    While eBPF is not a silver bullet, I think it is a very powerful tool for network debugging and it deserves attention. I am sure it will play a really important role in the future of networks.

    The problem

    I was debugging an Open vSwitch (OVS) network issue affecting a very complex installation: some TCP packets were scrambled and delivered out of order, and the throughput between VMs was dropping from a sustained 6 Gb/s to an oscillating 2–4 Gb/s. After some analysis, it turned out that the first TCP packet of every connection with the PSH flag set was sent out of order: only the first one, and only one per connection.

    I tried to replicate the setup with two VMs, and after many man pages and internet searches, I discovered that neither iptables nor nftables can mangle TCP flags, while tc can, but only by overwriting all the flags at once, which breaks new connections and TCP in general.

    I probably could have dealt with it using a combination of iptables marks, conntrack, and tc, but then I thought: this could be a job for eBPF.

    What is eBPF?

    eBPF is an extended version of the Berkeley Packet Filter. It adds many improvements to BPF; most notably, it allows writing memory instead of just reading it, so it can also edit packets in addition to filtering them.

    eBPF is often referred to as BPF, while BPF is referred to as cBPF (classic BPF), so the word BPF can be used to represent both, depending on the context: here, I'm always referring to the extended version.

    Under the hood, eBPF uses a very simple bytecode VM that can execute small portions of bytecode and edit some in-memory buffers. eBPF comes with some limitations, to prevent it from being used maliciously:

    • Loops are forbidden, so the program is guaranteed to exit in a finite time.
    • It can't access memory other than the stack and a scratch buffer.
    • Only kernel functions in a whitelist can be called.

    An eBPF program can be loaded into the kernel in many ways, enabling a plethora of debugging and tracing scenarios. In this case, we are interested in how eBPF works with the networking subsystem. There are two ways to use an eBPF program:

    • Attached via XDP to the very early RX path of a physical or virtual NIC
    • Attached via tc to a qdisc just like a normal action, in ingress or egress

    To create an eBPF program to attach, it is enough to write some C code and convert it into bytecode. Below is a simple example using XDP:

    SEC("prog")
    int xdp_main(struct xdp_md *ctx)
    {
        void *data_end = (void *)(uintptr_t)ctx->data_end;
        void *data = (void *)(uintptr_t)ctx->data;
    
        struct ethhdr *eth = data;
        struct iphdr *iph = (struct iphdr *)(eth + 1);
        struct icmphdr *icmph = (struct icmphdr *)(iph + 1);
    
        /* sanity check needed by the eBPF verifier */
        if ((void *)(icmph + 1) > data_end)
            return XDP_PASS;
    
        /* matched a pong packet */
        if (eth->h_proto != ntohs(ETH_P_IP) ||
            iph->protocol != IPPROTO_ICMP ||
            icmph->type != ICMP_ECHOREPLY)
            return XDP_PASS;
    
        if (iph->ttl) {
            /* save the old TTL to recalculate the checksum */
            uint16_t *ttlproto = (uint16_t *)&iph->ttl;
            uint16_t old_ttlproto = *ttlproto;
    
        /* set the TTL to a pseudorandom number between 1 and the old TTL */
            iph->ttl = bpf_get_prandom_u32() % iph->ttl + 1;
    
            /* recalculate the checksum; otherwise, the IP stack will drop it */
            csum_replace2(&iph->check, old_ttlproto, *ttlproto);
        }
    
        return XDP_PASS;
    }
    
    char _license[] SEC("license") = "GPL";

    The snippet above, stripped of include statements, helpers, and other unnecessary code, is an XDP program that changes the TTL of received ICMP echo replies, namely pongs, to a random number. The main function receives a struct xdp_md, which contains two pointers, to the packet start and end.
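    The two non-obvious pieces of that snippet are the TTL rewrite and the csum_replace2() helper, which implements the incremental checksum update from RFC 1624 so the IP stack does not drop the modified packet. Here is a minimal userland sketch of both, assuming a standard one's-complement Internet checksum; the function names csum16_add, ip_checksum, and pick_ttl are made up for this illustration, and this is not the kernel's actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/* One's-complement 16-bit addition with end-around carry. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/* Userland sketch of csum_replace2() (RFC 1624 incremental update):
 * replace the 16-bit field old_val with new_val in a packet whose
 * checksum is *sum, without re-reading the whole packet. */
static void csum_replace2(uint16_t *sum, uint16_t old_val, uint16_t new_val)
{
    *sum = ~csum16_add(csum16_add(~*sum, ~old_val), new_val);
}

/* Full one's-complement checksum over `words` 16-bit words,
 * used here only to verify the incremental result. */
static uint16_t ip_checksum(const uint16_t *data, int words)
{
    uint32_t sum = 0;
    for (int i = 0; i < words; i++)
        sum += data[i];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* The TTL rewrite from the snippet: rnd % ttl is in [0, ttl - 1],
 * so the result always lands in [1, ttl]. */
static uint8_t pick_ttl(uint32_t rnd, uint8_t ttl)
{
    return rnd % ttl + 1;
}
```

    Updating the checksum incrementally matters in XDP, where the program should touch as few bytes as possible; recomputing the full checksum would mean walking the whole header again for every packet.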

    To compile our code into eBPF bytecode, a compiler with support for it is needed. Clang supports it and produces eBPF bytecode by specifying bpf as the target at compile time:

    $ clang -O2 -target bpf -c xdp_manglepong.c -o xdp_manglepong.o

    The command above produces a file that seems to be a regular object file, but if inspected, you'll see that the reported machine type is Linux eBPF rather than the native architecture of the OS:

    $ readelf -h xdp_manglepong.o
    ELF Header:
      Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              REL (Relocatable file)
      Machine:                           Linux BPF  <--- HERE
      [...]

    Once wrapped in a regular object file, the eBPF program is ready to be loaded and attached to the device via XDP. This can be done using ip, from the iproute2 suite, using the following syntax:

    # ip -force link set dev wlan0 xdp object xdp_manglepong.o verbose

    This command specifies the target interface, wlan0, and with the -force option it overwrites any eBPF code already loaded. After loading the eBPF bytecode, this is the system behavior:

    $ ping -c10 192.168.85.1
    PING 192.168.85.1 (192.168.85.1) 56(84) bytes of data.
    64 bytes from 192.168.85.1: icmp_seq=1 ttl=41 time=0.929 ms
    64 bytes from 192.168.85.1: icmp_seq=2 ttl=7 time=0.954 ms
    64 bytes from 192.168.85.1: icmp_seq=3 ttl=17 time=0.944 ms
    64 bytes from 192.168.85.1: icmp_seq=4 ttl=64 time=0.948 ms
    64 bytes from 192.168.85.1: icmp_seq=5 ttl=9 time=0.803 ms
    64 bytes from 192.168.85.1: icmp_seq=6 ttl=22 time=0.780 ms
    64 bytes from 192.168.85.1: icmp_seq=7 ttl=32 time=0.847 ms
    64 bytes from 192.168.85.1: icmp_seq=8 ttl=50 time=0.750 ms
    64 bytes from 192.168.85.1: icmp_seq=9 ttl=24 time=0.744 ms
    64 bytes from 192.168.85.1: icmp_seq=10 ttl=42 time=0.791 ms
    
    --- 192.168.85.1 ping statistics ---
    10 packets transmitted, 10 received, 0% packet loss, time 125ms
    rtt min/avg/max/mdev = 0.744/0.849/0.954/0.082 ms

    Every received packet goes through the eBPF program, which may transform it and then decides whether to drop it or let it pass.

    How eBPF can help

    Going back to the original network issue, I needed to mangle some TCP flags, only one packet per connection, and neither iptables nor tc allows doing that. Writing C code for this scenario is very easy: set up two VMs connected by an OVS bridge and simply attach the eBPF program to the virtual device of one of the VMs.

    This looks like a nice solution, but you must take into account that XDP only supports handling received packets, and attaching eBPF to the RX path of the receiving VM would have no effect on the switch.

    To properly address this, the eBPF program has to be loaded with tc and attached to the egress path within the VM, since tc can load and attach eBPF programs to a qdisc just like any other action. To mangle packets leaving the host, an egress qdisc is needed to attach the program to.

    There are small differences between the XDP and tc APIs when loading an eBPF program: the default section name differs, the argument of the main function has a different structure type, and the return values differ, but this is not a big issue. Below is a snippet of a program that does TCP mangling when attached to a tc action:

    #define RATIO 10
    
    SEC("action")
    int bpf_main(struct __sk_buff *skb)
    {
        void *data = (void *)(uintptr_t)skb->data;
        void *data_end = (void *)(uintptr_t)skb->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph = (struct iphdr *)(eth + 1);
        struct tcphdr *tcphdr = (struct tcphdr *)(iph + 1);
    
        /* sanity check needed by the eBPF verifier */
        if ((void *)(tcphdr + 1) > data_end)
            return TC_ACT_OK;
    
        /* skip non-TCP packets */
        if (eth->h_proto != __constant_htons(ETH_P_IP) || iph->protocol != IPPROTO_TCP)
            return TC_ACT_OK;
    
        /* incompatible flags, or PSH already set */
        if (tcphdr->syn || tcphdr->fin || tcphdr->rst || tcphdr->psh)
            return TC_ACT_OK;
    
        if (bpf_get_prandom_u32() % RATIO == 0)
            tcphdr->psh = 1;
    
        return TC_ACT_OK;
    }
    
    char _license[] SEC("license") = "GPL";

    Compilation into bytecode is done as in the XDP example, via the following:

    clang -O2 -target bpf -c tcp_psh.c -o tcp_psh.o

    But the loading is different:

    # tc qdisc add dev eth0 clsact
    # tc filter add dev eth0 egress matchall action bpf object-file tcp_psh.o

    At this point, the eBPF program is loaded in the right place and packets leaving the VM are mangled. Checking the packets received by the second VM shows the following:

    # tcpdump -tnni eth0 -Q in
    [1579537.890082] device eth0 entered promiscuous mode
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 809667041:809681521, ack 3046223642, length 14480
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 14480:43440, ack 1, length 28960
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 43440:101360, ack 1, length 57920
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [P.], seq 101360:131072, ack 1, length 29712
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 131072:145552, ack 1, length 14480
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 145552:174512, ack 1, length 28960
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 174512:210712, ack 1, length 36200
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 210712:232432, ack 1, length 21720
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 232432:246912, ack 1, length 14480
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [P.], seq 246912:262144, ack 1, length 15232
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 262144:276624, ack 1, length 14480
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 276624:305584, ack 1, length 28960
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 305584:363504, ack 1, length 57920
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [P.], seq 363504:393216, ack 1, length 29712
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 393216:407696, ack 1, length 14480
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 407696:436656, ack 1, length 28960
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 436656:494576, ack 1, length 57920
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [P.], seq 494576:524288, ack 1, length 29712
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 524288:538768, ack 1, length 14480
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 538768:567728, ack 1, length 28960
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 567728:625648, ack 1, length 57920
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [.], seq 625648:627096, ack 1, length 1448
    IP 192.168.123.1.39252 > 192.168.123.2.5201: Flags [P.], seq 627096:655360, ack 1, length 28264

    tcpdump confirms that the new eBPF code is working, and about 1 of every 10 TCP packets has the PSH flag set. With just 20 lines of C code, we selectively mangled the TCP packets leaving a VM, replicating an error that happened in the field, all without recompiling any driver and without even rebooting! This greatly simplified the validation of the Open vSwitch fix, in a manner that would have been impossible with other tools.
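    The roughly 1-in-10 frequency observed above follows directly from the bpf_get_prandom_u32() % RATIO == 0 test in the tc program. This can be sanity-checked with a small userland simulation; here rand() is just an assumed stand-in for the kernel's bpf_get_prandom_u32(), and the function names are invented for the illustration:

```c
#include <assert.h>
#include <stdlib.h>

#define RATIO 10

/* Userland stand-in for the kernel helper bpf_get_prandom_u32();
 * any uniform PRNG is good enough to check the expected frequency. */
static unsigned int prandom_u32(void)
{
    return (unsigned int)rand();
}

/* The same per-packet test the tc program applies to each
 * eligible (non-SYN/FIN/RST, PSH-clear) TCP packet. */
static int should_set_psh(void)
{
    return prandom_u32() % RATIO == 0;
}

/* Fraction of packets that would get PSH set over `samples` trials. */
static double psh_ratio(int samples)
{
    int hits = 0;
    for (int i = 0; i < samples; i++)
        hits += should_set_psh();
    return (double)hits / samples;
}
```

    Over a large number of trials the ratio converges to 1/RATIO, matching what tcpdump showed on the wire.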

    Conclusion

    eBPF is a fairly new technology, and the community has strong opinions about its adoption. It's also worth noting that eBPF-based projects like bpfilter are becoming more popular, and as a consequence, various hardware vendors are starting to implement eBPF support directly in their NICs.

    While eBPF is not a silver bullet and should not be abused, I think it is a very powerful tool for network debugging and it deserves attention. I am sure it will play a really important role in the future of networks.

    Download Red Hat Enterprise Linux 8 and try eBPF.

    Additional Resources

    • Articles on Open vSwitch
    • Articles on Open Virtual Network
    • Introducing stapbpf – SystemTap’s new BPF backend
    Last updated: January 14, 2022
