How VXLAN encapsulation affects Arm (aarch64) servers

February 3, 2025
Marcelo Ricardo Leitner
Related topics:
Linux, Application modernization, System Design
Related products:
Red Hat Enterprise Linux

    In the previous article, Linux on Arm (aarch64) servers..., we examined flat networks; that is, we used only bare network interfaces and nothing else. That's a good start, but it is often not the topology actually deployed, especially with solutions like OpenStack. So in this installment, let's evaluate how much VXLAN encapsulation costs in throughput and CPU consumption.

    We’ll reuse the same hosts and testing procedures, with only the modifications needed to add VXLAN. For details on the test environment, please refer to the previously mentioned article.

    VXLAN configuration

    Each host has a VXLAN tunnel to the other. For reference, the following is the NMState YAML file used on one of the hosts (applied with nmstatectl apply):

    interfaces:
    - name: vxlan08-09
      type: vxlan
      state: up
      mtu: 1450
      ipv4:
        address:
        - ip: 192.168.2.1
          prefix-length: 24
        enabled: true
        dhcp: false
      vxlan:
        base-iface: enp1s0f0np0
        id: 809
        remote: 192.168.1.2
    routes:
      config:
        - destination: 192.168.2.2/32
          next-hop-interface: vxlan08-09
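The mtu: 1450 in the configuration above follows directly from VXLAN's encapsulation overhead on an IPv4 underlay with a 1500-byte MTU. A quick sanity check of that arithmetic (standard header sizes, not figures taken from the article):

```python
# VXLAN-over-IPv4 per-packet overhead relative to the underlay MTU.
OUTER_IPV4 = 20   # outer IPv4 header
OUTER_UDP = 8     # outer UDP header
VXLAN_HDR = 8     # VXLAN header
INNER_ETH = 14    # encapsulated inner Ethernet header

overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HDR + INNER_ETH
underlay_mtu = 1500
print(underlay_mtu - overhead)  # 1450, the tunnel MTU used above
```

Setting the tunnel MTU 50 bytes below the underlay MTU keeps encapsulated packets from being fragmented on the wire.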

    Tests with the ELN kernel

    With an ELN kernel, kernel-6.9.0-0.rc4.37.eln136, the results are shown in Figure 1. Each pair of bars is one test: the generated traffic is in green, and the traffic received by the device under test (DUT) is in purple. For Transmission Control Protocol (TCP) they are the same, but they differ in the User Datagram Protocol (UDP) tests. The error markers show the standard deviation over 10 runs of 1 minute each.
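The per-metric values in the tables below are reported in that same mean-plus-deviation form. A minimal sketch of how such a summary can be produced (the sample values are made up for illustration):

```python
import statistics

def summarize(samples):
    """Format a list of per-run measurements as mean+-stdev,
    matching the notation used in the tables below."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return f"{mean:.1f}+-{stdev:.1f}"

# Hypothetical %sys readings from 10 one-minute runs:
runs = [54.0, 54.5, 53.8, 54.1, 54.9, 53.5, 54.3, 54.0, 54.6, 54.3]
print(summarize(runs))  # → "54.2+-0.4"
```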

    For the CPU/throughput graphs, the total CPU consumption of both cores, application and interrupt request (IRQ), is summed (so a theoretical maximum of 200%) and then divided by the observed throughput. The smaller the number, the better: less CPU is used to push the same amount of traffic.
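As a concrete illustration of that metric, here is the same calculation using approximate ELN TCP-receive numbers read off the figures below (these inputs are rough readings, not exact published values):

```python
def cpu_per_gbps(app_busy_pct, irq_busy_pct, throughput_gbps):
    """Sum the busy time of the application and IRQ cores (max 200%)
    and normalize by throughput: lower means less CPU per Gbps."""
    return (app_busy_pct + irq_busy_pct) / throughput_gbps

# Approximate ELN TCP-receive numbers: the application core is ~56% busy,
# the IRQ core ~100% busy, at ~24 Gbps received.
print(round(cpu_per_gbps(56.0, 99.9, 24.0), 1))  # → 6.5
```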

    TCP throughput

    Figure 1 shows that encapsulation takes a toll. Compared with the results in the previous article, throughput drops to about 60% of the original.

    Figure 1: TCP throughput (the DUT receives about 24 Gbps and sends about 40 Gbps).

    The application CPU, which previously had only 11.1% idle, now gets some slack because the IRQ CPU has to do more work for each packet, as seen in Figures 2 and 3 and their respective tables. As before, the receiver's IRQ CPU usage is what limits throughput in all cases. Similar to the test with bare interfaces, it is interesting to note that the DUT was able to send traffic as fast as the traffic generator could receive it while still leaving nearly half of its CPU power unused.

    Figure 2: DUT CPU % per throughput on receive (about 6.5% of CPU per Gbps).
                     DUT rcv    
    CPU App %usr     1.8+-0.3   
    CPU App %sys     54.2+-0.6  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    44.0+-0.5  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.1  
    CPU S-IRQ %idle  0.1+-0.0
    Figure 3: DUT CPU % per throughput on send (about 2.5% of CPU per Gbps).
                     DUT snd
    CPU App %usr     0.7+-0.1
    CPU App %sys     56.0+-1.1
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    43.3+-1.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.1+-0.3
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  53.1+-0.6
    CPU S-IRQ %idle  46.8+-0.6

    UDP 1500 bytes

    Figure 4 shows that the throughput is halved compared to the test with bare interfaces, which is expected for this topology. 

    Figure 4: UDP 1500 bytes throughput (the DUT sends about 3.6 Gbps and receives about 3.7 Gbps).

    We can observe that the CPU usage pattern on the sending side is similar to the bare-interface test, with the application CPU being the bottleneck. As seen in Figures 5 and 6 and their respective tables, most of the VXLAN transmit processing is done on the application CPU, while the IRQ CPU only handles transmit completion, that is, freeing the sent packets. This puts considerable pressure on the sending application CPU. Conversely, on the receiving side, the DUT IRQ CPU becomes fully busy, which leaves the DUT application CPU 42% idle.

    Figure 5: DUT CPU % per throughput on receive (about 42% of CPU per Gbps).
                     DUT rcv    
    CPU App %usr     6.9+-0.2   
    CPU App %sys     50.9+-0.4  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    42.2+-0.4  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.0  
    CPU S-IRQ %idle  0.0+-0.0 
    Figure 6: DUT CPU % per throughput on send (about 34% of CPU per Gbps).
                     DUT snd
    CPU App %usr     4.2+-0.4
    CPU App %sys     95.7+-0.4
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.1+-0.2
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  23.7+-0.9
    CPU S-IRQ %idle  76.2+-1.0

    UDP 60 bytes

    In this test, the encapsulation toll is less pronounced because the stack is already under pressure from the small packet size. The added overhead of the VXLAN processing is not as significant as in the previous test, as seen in Figures 7-9 and their respective tables.
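With packets this small, per-packet processing cost dominates over byte-moving cost, so the fixed VXLAN encapsulation work per packet matters less in relative terms. A rough estimate of the packet rate involved, using the approximate receive figure below (payload-only accounting; wire framing overhead is ignored):

```python
def pps(throughput_bps, packet_bytes):
    """Packet rate implied by a throughput figure and a packet size."""
    return throughput_bps / (packet_bytes * 8)

# Approximate receive numbers from the figure below:
# ~90 Mbps received with 60-byte packets.
rate = pps(90e6, 60)
print(f"{rate:,.0f} packets/s")  # → 187,500 packets/s
```

Each of those roughly 187,500 packets per second pays the full per-packet stack and encapsulation cost regardless of its size.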

    Figure 7: UDP 60 bytes throughput (the DUT receives about 90 Mbps and sends about 65 Mbps).
    Figure 8: DUT CPU % per throughput on receive (about 1.7% of CPU per Mbps).
                     DUT rcv    
    CPU App %usr     15.0+-0.5  
    CPU App %sys     85.0+-0.5  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    0.0+-0.0   
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  58.9+-1.5  
    CPU S-IRQ %idle  41.0+-1.5
    Figure 9: DUT CPU % per throughput on send (about 1.8% of CPU per Mbps).
                     DUT snd
    CPU App %usr     6.5+-0.3
    CPU App %sys     93.4+-0.3
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.1+-0.3
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  19.1+-0.8
    CPU S-IRQ %idle  80.9+-0.8

    Tests with RHEL 9.4

    The Red Hat Enterprise Linux 9.4 kernel is based on kernel-5.14.0-427.el9. That's the one we tested here, using the same procedure.

    TCP throughput

    The results were the same as with the ELN kernel, as seen in Figures 10-12 and their respective tables, so there is nothing else to add here.

    Figure 10: TCP throughput (the DUT receives about 24 Gbps and sends about 40 Gbps).
    Figure 11: DUT CPU % per throughput on receive (about 7% of CPU per Gbps).
                     DUT rcv    
    CPU App %usr     3.0+-0.5   
    CPU App %sys     64.1+-0.7  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    32.9+-0.5  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  100.0+-0.1 
    CPU S-IRQ %idle  0.0+-0.0 
    Figure 12: DUT CPU % per throughput on send (about 2.7% of CPU per Gbps).
                     DUT snd
    CPU App %usr     0.7+-0.2
    CPU App %sys     58.4+-1.4
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    40.9+-1.3
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  53.6+-0.9
    CPU S-IRQ %idle  46.4+-0.9

    UDP 1500 bytes

    The results were essentially the same as with the ELN kernel above, as seen in Figures 13-15 and their respective tables. Once again, there is nothing else to add here.

    Figure 13: UDP 1500 bytes throughput (the DUT receives about 4 Gbps and sends about 3.5 Gbps).
    Figure 14: DUT CPU % per throughput on receive (about 40% of CPU per Gbps).
                     DUT rcv    
    CPU App %usr     7.6+-0.4   
    CPU App %sys     56.5+-0.7  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    35.9+-0.6  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.0  
    CPU S-IRQ %idle  0.0+-0.0 
    Figure 15: DUT CPU % per throughput on send (about 35% of CPU per Gbps).
                     DUT snd
    CPU App %usr     3.6+-0.2
    CPU App %sys     96.3+-0.2
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  23.3+-0.5
    CPU S-IRQ %idle  76.7+-0.5

    UDP 60 bytes

    Once again, the results were essentially the same as with the ELN kernel, as seen in Figures 16-18 and their respective tables, with nothing else to add.

    Figure 16: UDP 60 bytes throughput (the DUT receives about 85 Mbps and sends about 65 Mbps).
    Figure 17: DUT CPU % per throughput on receive (about 1.8% of CPU per Mbps).
                     DUT rcv    
    CPU App %usr     13.6+-0.5  
    CPU App %sys     86.4+-0.5  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    0.0+-0.0   
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  55.4+-0.9  
    CPU S-IRQ %idle  44.6+-0.9
    Figure 18: DUT CPU % per throughput on send (about 1.8% of CPU per Mbps).
                     DUT snd
    CPU App %usr     5.8+-0.3
    CPU App %sys     94.1+-0.3
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  18.1+-0.4
    CPU S-IRQ %idle  81.9+-0.4

    Conclusions

    The DUT server kept up with the traffic generator and sustained datacenter-level traffic, as it did with bare interfaces. It is worth mentioning that there are many techniques to scale applications that use UDP sockets, such as UDP GRO, UDP segmentation offload, RPS, RFS, and XPS. Also, the DUT server was launched in mid-2020, and aarch64 designs often have plenty of cores that applications can use to scale.
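As an illustration of one of those scaling knobs: RPS is enabled by writing a hexadecimal CPU bitmask to /sys/class/net/&lt;iface&gt;/queues/rx-&lt;n&gt;/rps_cpus. A small helper to build that mask (the interface name and CPU choice in the example are hypothetical, not from the test setup):

```python
def rps_cpu_mask(cpus):
    """Build the hex bitmask that rps_cpus expects, with bit N set
    for each CPU N that should process received packets."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# Spread receive processing of queue 0 over CPUs 2-5 (hypothetical layout):
print(rps_cpu_mask([2, 3, 4, 5]))  # → "3c"
# Then, as root: echo 3c > /sys/class/net/enp1s0f0np0/queues/rx-0/rps_cpus
```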
