
How VXLAN encapsulation affects Arm (aarch64) servers

February 3, 2025
Marcelo Ricardo Leitner
Related topics:
Linux, Application modernization, System Design
Related products:
Red Hat Enterprise Linux


    In the previous article, Linux on Arm (aarch64) servers..., we examined flat networks: we used only bare network interfaces and nothing else. That’s a good start, but it is often not the topology used in practice, especially with solutions like OpenStack. So in this installment, let’s evaluate how much VXLAN encapsulation costs in throughput and CPU consumption.

    We’ll reuse the same hosts and testing procedures, with only the modifications needed to add VXLAN. For details on the test environment, please refer to the previously mentioned article.

    VXLAN configuration

    Each host has a VXLAN tunnel to the other host. For reference, the following is the NMState YAML file used for one of the hosts. Note the MTU of 1450: VXLAN encapsulation adds 50 bytes of headers (the inner Ethernet header plus the VXLAN, UDP, and outer IPv4 headers), so the tunnel MTU is reduced accordingly:

    interfaces:
    - name: vxlan08-09
      type: vxlan
      state: up
      mtu: 1450
      ipv4:
        address:
        - ip: 192.168.2.1
          prefix-length: 24
        enabled: true
        dhcp: false
      vxlan:
        base-iface: enp1s0f0np0
        id: 809
        remote: 192.168.1.2
    routes:
      config:
        - destination: 192.168.2.2/32
          next-hop-interface: vxlan08-09
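    As a sketch of how such a configuration can be applied and checked (this is not part of the original test procedure, and the file name vxlan08-09.yml is hypothetical), NMState ships the nmstatectl tool:

```shell
# Apply the NMState configuration above (requires the nmstate package and root).
sudo nmstatectl apply vxlan08-09.yml

# Verify the tunnel came up with the expected VXLAN attributes
# (the -d flag prints tunnel details such as "vxlan id 809 remote 192.168.1.2").
ip -d link show vxlan08-09

# Reach the peer's tunnel address over the new interface.
ping -c 3 192.168.2.2
```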

    Tests with ELN kernel

    With an ELN kernel, kernel-6.9.0-0.rc4.37.eln136, the results are shown in Figure 1. Each pair of bars is a test: the generated traffic is in green, while the traffic received by the device under test (DUT) is in purple. For transmission control protocol (TCP) tests they are the same, but they vary on the user datagram protocol (UDP) tests. The error markers are the standard deviation over 10 runs of 1 minute each.

    For the CPU/throughput graphs, the CPU consumption of both cores, application and interrupt request (IRQ), is summed (for a theoretical maximum of 200%) and then divided by the observed throughput. The smaller the number, the better: less CPU is used to pump the same amount of traffic.
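    As a concrete reading of that metric, here is a small sketch; the formula is reconstructed from the description above (not taken from the article’s tooling), using the TCP receive numbers reported in the first table below:

```shell
# CPU cost per Gbps: busy% of the application core plus busy% of the
# S-IRQ core (theoretical maximum 200%), divided by the throughput.
app_busy=56.0   # 100 - %idle of the application CPU (TCP receive table)
irq_busy=99.9   # busy % of the S-IRQ CPU, same table
gbps=24.0       # observed receive throughput
awk -v a="$app_busy" -v i="$irq_busy" -v g="$gbps" \
    'BEGIN { printf "%.1f CPU%% per Gbps\n", (a + i) / g }'
# Prints "6.5 CPU% per Gbps", matching the receive-side figure caption.
```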

    TCP throughput: Figure 1 shows that encapsulation takes a toll when compared to the results in the previous article: throughput drops to about 60% of the original.

    Bar graph showing DUT being able to receive 24Gbps and send 40Gbps
    Figure 1: TCP throughput.

    The application CPU, which previously had just 11.1% idle, now gets some more slack because the IRQ CPU needs to do more work for each packet, as seen in Figures 2 and 3 and their respective tables. As before, what limits the throughput is the receiver IRQ CPU usage in all cases. Similar to the test with bare interfaces, it’s interesting to note that the DUT was able to send traffic as fast as the traffic generator could receive it, while still having nearly half of its CPU power left unused.

    Bar graph showing DUT using 6.5% of CPU per Gbps on receive
    Figure 2: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     1.8+-0.3   
    CPU App %sys     54.2+-0.6  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    44.0+-0.5  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.1  
    CPU S-IRQ %idle  0.1+-0.0
    Bar graph showing DUT using 2.5% CPU per Gbps sent
    Figure 3: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     0.7+-0.1
    CPU App %sys     56.0+-1.1
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    43.3+-1.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.1+-0.3
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  53.1+-0.6
    CPU S-IRQ %idle  46.8+-0.6

    UDP 1500 bytes

    Figure 4 shows that the throughput is halved compared to the test with bare interfaces, which is expected for this topology. 

    Bar graph showing DUT being able to send 3.6Gbps and receive 3.7Gbps
    Figure 4: UDP 1500 bytes throughput.

    We can observe that the CPU usage pattern on the sending side is similar to the bare interface test, with the application CPU being the bottleneck. As seen in Figures 5 and 6 and their respective tables, most of the VXLAN transmit processing is done on the application CPU, while the IRQ CPU only handles transmit completion (just freeing the sent packets), which puts considerable pressure on the sending application CPU. Conversely, on the receiving side, the DUT IRQ CPU becomes fully busy, and 42% idle time now shows up on the DUT application CPU.

    Bar graph showing DUT using 42% CPU per Gbps received
    Figure 5: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     6.9+-0.2   
    CPU App %sys     50.9+-0.4  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    42.2+-0.4  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.0  
    CPU S-IRQ %idle  0.0+-0.0 
    Bar graph showing DUT using 34% CPU per Gbps when sending
    Figure 6: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     4.2+-0.4
    CPU App %sys     95.7+-0.4
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.1+-0.2
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  23.7+-0.9
    CPU S-IRQ %idle  76.2+-1.0

    UDP 60 bytes

    In this test, the encapsulation toll is less pronounced because the stack is already under pressure due to the small packet size. The added overhead of the VXLAN processing is not as visible as in the previous test, as seen in Figures 7-9 and their respective tables.

    Bar graph showing DUT being able to receive 90Mbps and send 65Mbps
    Figure 7: UDP 60 bytes throughput.
    Bar graph showing DUT using 1.7% CPU per Mbps on receive
    Figure 8: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     15.0+-0.5  
    CPU App %sys     85.0+-0.5  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    0.0+-0.0   
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  58.9+-1.5  
    CPU S-IRQ %idle  41.0+-1.5
    Bar graph showing DUT using 1.8% CPU per Mbps sent
    Figure 9: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     6.5+-0.3
    CPU App %sys     93.4+-0.3
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.1+-0.3
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  19.1+-0.8
    CPU S-IRQ %idle  80.9+-0.8

    Tests with RHEL 9.4

    The Red Hat Enterprise Linux 9.4 kernel is based on kernel-5.14.0-427.el9. That’s the kernel we tested here, with the same procedure.

    TCP throughput

    The results were the same as with the ELN kernel above, as seen in Figures 10-12 and their respective tables, so there is nothing else to add here.

    Bar graph showing DUT being able to receive 24Gbps and send 40Gbps
    Figure 10: TCP throughput.
    Bar graph showing DUT using 7% CPU per Gbps received
    Figure 11: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     3.0+-0.5   
    CPU App %sys     64.1+-0.7  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    32.9+-0.5  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  100.0+-0.1 
    CPU S-IRQ %idle  0.0+-0.0 
    Bar graph showing DUT using 2.7% CPU per Gbps sent
    Figure 12: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     0.7+-0.2
    CPU App %sys     58.4+-1.4
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    40.9+-1.3
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  53.6+-0.9
    CPU S-IRQ %idle  46.4+-0.9

    UDP 1500 bytes

    The results were essentially the same as with the ELN kernel above, as can be seen in Figures 13-15 and their respective tables. Once again, there is nothing else to add here.

    Bar graph showing DUT being able to receive 4Gbps and send 3.5Gbps
    Figure 13: UDP 1500 bytes throughput.
    Bar graph showing DUT using 40% CPU per Gbps received
    Figure 14: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     7.6+-0.4   
    CPU App %sys     56.5+-0.7  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    35.9+-0.6  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.0  
    CPU S-IRQ %idle  0.0+-0.0 
    Bar graph showing DUT using 35% CPU per Gbps sent
    Figure 15: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     3.6+-0.2
    CPU App %sys     96.3+-0.2
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  23.3+-0.5
    CPU S-IRQ %idle  76.7+-0.5

    UDP 60 bytes

    Once again, the results were essentially the same as with the ELN kernel, as seen in Figures 16-18 and their respective tables, with nothing else to add.

    Bar graph showing DUT being able to receive 85Mbps and send 65Mbps
    Figure 16: UDP 60 bytes throughput.
    Bar graph showing DUT using 1.8% CPU per Mbps received
    Figure 17: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     13.6+-0.5  
    CPU App %sys     86.4+-0.5  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    0.0+-0.0   
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  55.4+-0.9  
    CPU S-IRQ %idle  44.6+-0.9
    Bar graph showing DUT using 1.8% CPU per Mbps sent
    Figure 18: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     5.8+-0.3
    CPU App %sys     94.1+-0.3
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  18.1+-0.4
    CPU S-IRQ %idle  81.9+-0.4

    Conclusions

    The DUT server kept up with the traffic generator and sustained datacenter-level traffic, as it did with bare interfaces. It is worth mentioning that there are many techniques to scale an application when using UDP sockets, such as UDP GRO, UDP segmentation offload, RPS, RFS, and XPS. Also, the DUT server was launched in mid-2020, and aarch64 designs often have plenty of cores that applications can use to scale.
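    As an illustration of two of those scaling techniques, RPS and RFS (plus XPS on the transmit side) are enabled through sysfs and procfs. This is a sketch only; the queue names (rx-0, tx-0) and CPU masks are examples that would need adjusting per system, with the interface name borrowed from the configuration earlier in the article:

```shell
# Receive Packet Steering (RPS): spread receive processing of queue rx-0
# across CPUs 0-3 (bitmask 0xf). All of these writes require root.
echo f > /sys/class/net/enp1s0f0np0/queues/rx-0/rps_cpus

# Receive Flow Steering (RFS): size the global flow table, then give
# this queue its share of entries so flows stick to the CPU running
# the consuming application.
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 2048  > /sys/class/net/enp1s0f0np0/queues/rx-0/rps_flow_cnt

# Transmit Packet Steering (XPS): let CPUs 0-3 use transmit queue tx-0.
echo f > /sys/class/net/enp1s0f0np0/queues/tx-0/xps_cpus
```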
