How VXLAN encapsulation affects Arm (aarch64) servers

February 3, 2025
Marcelo Ricardo Leitner
Related topics: Linux, Application modernization, System design
Related products:
Red Hat Enterprise Linux

    In the previous article, Linux on Arm (aarch64) servers: Can they handle datacenter-level networks?, we examined flat networks. That is, we used only bare network interfaces and nothing else. That's a good start, but it is often not the topology used in practice, especially with solutions like OpenStack. So in this installment, let’s evaluate how much VXLAN encapsulation impacts throughput and CPU consumption.

    We’ll reuse the same hosts and testing procedures, with only the modifications needed to add VXLAN. For details on the test environment, please refer to the previously mentioned article.
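    To make the procedure concrete, the runs below are 1-minute TCP and UDP streams between the two hosts. As an illustration only, a comparable run over the tunnel endpoints defined in the VXLAN configuration below might look like this with iperf3 (hypothetical; the articles do not name the traffic generator actually used):

```shell
# Hypothetical iperf3 invocations for 1-minute runs over the tunnel
# (192.168.2.x addresses are the tunnel endpoints from the VXLAN
# configuration below; iperf3 is an assumption, not the tool used).
iperf3 -s &                          # on the DUT
iperf3 -c 192.168.2.1 -t 60          # on the generator: TCP, 1 minute
iperf3 -c 192.168.2.1 -u -b 0 -t 60  # UDP at unlimited rate, 1 minute
```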

    VXLAN configuration

    Each host has a VXLAN tunnel to the other host. Note the MTU of 1450: VXLAN encapsulation over IPv4 adds 50 bytes of overhead (outer Ethernet, IP, UDP, and VXLAN headers), so the tunnel MTU is reduced accordingly to avoid fragmentation. For reference, the following is the NMState YAML file used for one of the hosts:

    interfaces:
    - name: vxlan08-09
      type: vxlan
      state: up
      mtu: 1450
      ipv4:
        address:
        - ip: 192.168.2.1
          prefix-length: 24
        enabled: true
        dhcp: false
      vxlan:
        base-iface: enp1s0f0np0
        id: 809
        remote: 192.168.1.2
    routes:
      config:
        - destination: 192.168.2.2/32
          next-hop-interface: vxlan08-09
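    For reference, an equivalent tunnel could be created by hand with iproute2 (a sketch, not the method used in the tests; names, addresses, and the VNI are taken from the YAML above, and 4789 is the standard VXLAN destination port):

```shell
# iproute2 sketch equivalent to the NMState file above (run as root).
# Unicast VXLAN: VNI 809 over enp1s0f0np0 to the peer at 192.168.1.2.
ip link add vxlan08-09 type vxlan id 809 dev enp1s0f0np0 \
    remote 192.168.1.2 dstport 4789
ip link set vxlan08-09 mtu 1450 up
ip addr add 192.168.2.1/24 dev vxlan08-09
ip route add 192.168.2.2/32 dev vxlan08-09
```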

    Tests with ELN kernel

    With an ELN kernel, kernel-6.9.0-0.rc4.37.eln136, the results are shown in Figure 1. Each pair of bars is one test: the generated traffic is in green, while the traffic received by the device under test (DUT) is in purple. For transmission control protocol (TCP) they are the same, but they vary in the user datagram protocol (UDP) tests. The error markers are standard deviations over 10 runs of 1 minute each.

    For the CPU/throughput graphs, the CPU consumption of both cores, the application core and the interrupt request (IRQ) core, is summed (giving a theoretical maximum of 200) and then divided by the observed throughput. The smaller the number, the better: less CPU used to push the same amount of traffic.
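    As a concrete check of this metric, the receive-side numbers from Figure 2 reproduce the caption's value. A minimal sketch in awk:

```shell
# Sketch of the CPU-per-throughput metric described above, using the
# receive-side numbers from Figure 2: application core 56.0% busy
# (100 - 44.0% idle), IRQ core 99.9% busy, at ~24 Gbps received.
app_busy=56.0
irq_busy=99.9
gbps=24
awk -v a="$app_busy" -v i="$irq_busy" -v t="$gbps" \
    'BEGIN { printf "%.1f%% CPU per Gbps\n", (a + i) / t }'
```

This yields 6.5% CPU per Gbps, matching the Figure 2 caption.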

    TCP throughput

    Figure 1 shows that encapsulation takes a toll: compared to the results in the previous article, throughput drops to about 60% of the original.

    Bar graph showing DUT being able to receive 24Gbps and send 40Gbps
    Figure 1: TCP throughput.

    The application CPU, which previously had just 11.1% idle time, now gets some more slack because the IRQ CPU needs to do more work for each packet, as seen in Figures 2 and 3 and their respective tables. Like before, what limits the throughput in all cases is the receiver IRQ CPU usage. Similar to the test with bare interfaces, it’s interesting to note that the DUT was able to send traffic as fast as the traffic generator could receive it while still leaving nearly half of its CPU power unused.

    Bar graph showing DUT using 6.5% of CPU per Gbps on receive
    Figure 2: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     1.8+-0.3   
    CPU App %sys     54.2+-0.6  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    44.0+-0.5  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.1  
    CPU S-IRQ %idle  0.1+-0.0
    Bar graph showing DUT using 2.5% CPU per Gbps sent
    Figure 3: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     0.7+-0.1
    CPU App %sys     56.0+-1.1
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    43.3+-1.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.1+-0.3
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  53.1+-0.6
    CPU S-IRQ %idle  46.8+-0.6

    UDP 1500 bytes

    Figure 4 shows that the throughput is halved compared to the test with bare interfaces, which is expected for this topology. 

    Bar graph showing DUT being able to send 3.6 Gbps and receive 3.7 Gbps
    Figure 4: UDP 1500 bytes throughput.

    We can observe that the CPU usage pattern on the sending side is similar to the bare interface test, with the application CPU being the bottleneck. As seen in Figures 5 and 6 and their respective tables, most of the VXLAN transmit processing is done on the application CPU, while the IRQ CPU only handles transmit completion (that is, freeing the sent packets), so considerable pressure falls on the sending application CPU. Conversely, on the receiving side, the DUT IRQ CPU becomes fully busy, leaving the DUT application CPU 42% idle.

    Bar graph showing DUT using 42% CPU per Gbps received
    Figure 5: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     6.9+-0.2   
    CPU App %sys     50.9+-0.4  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    42.2+-0.4  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.0  
    CPU S-IRQ %idle  0.0+-0.0 
    Bar graph showing DUT using 34% CPU per Gbps when sending
    Figure 6: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     4.2+-0.4
    CPU App %sys     95.7+-0.4
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.1+-0.2
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  23.7+-0.9
    CPU S-IRQ %idle  76.2+-1.0

    UDP 60 bytes

    In this test, the encapsulation toll is less pronounced because the stack is already under pressure due to the small packet size. The increased overhead of the VXLAN processing is not as visible as in the previous test, as seen in Figures 7-9 and their respective tables.

    Bar graph showing DUT being able to receive 90Mbps and send 65Mbps
    Figure 7: UDP 60 bytes throughput.
    Bar graph showing DUT using 1.7% CPU per Mbps on receive
    Figure 8: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     15.0+-0.5  
    CPU App %sys     85.0+-0.5  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    0.0+-0.0   
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  58.9+-1.5  
    CPU S-IRQ %idle  41.0+-1.5
    Bar graph showing DUT using 1.8% CPU per Mbps sent
    Figure 9: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     6.5+-0.3
    CPU App %sys     93.4+-0.3
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.1+-0.3
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  19.1+-0.8
    CPU S-IRQ %idle  80.9+-0.8

    Tests with RHEL 9.4

    The Red Hat Enterprise Linux 9.4 kernel is based on kernel-5.14.0-427.el9; we tested it here with the same procedure.

    TCP throughput

    The results were the same as with the ELN kernel, as seen in Figures 10-12 and their respective tables, so there is nothing else to add here.

    Bar graph showing DUT being able to receive 24Gbps and send 40Gbps
    Figure 10: TCP throughput.
    Bar graph showing DUT using 7% CPU per Gbps received
    Figure 11: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     3.0+-0.5   
    CPU App %sys     64.1+-0.7  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    32.9+-0.5  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  100.0+-0.1 
    CPU S-IRQ %idle  0.0+-0.0 
    Bar graph showing DUT using 2.7% CPU per Gbps sent
    Figure 12: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     0.7+-0.2
    CPU App %sys     58.4+-1.4
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    40.9+-1.3
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  53.6+-0.9
    CPU S-IRQ %idle  46.4+-0.9

    UDP 1500 bytes

    The results were essentially the same as with the ELN kernel above, as can be seen in Figures 13-15 and their respective tables. Once again, there is nothing else to add here.

    Bar graph showing DUT being able to receive 4Gbps and send 3.5 Gbps
    Figure 13: UDP 1500 bytes throughput.
    Bar graph showing DUT using 40% CPU per Gbps received
    Figure 14: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     7.6+-0.4   
    CPU App %sys     56.5+-0.7  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    35.9+-0.6  
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  99.9+-0.0  
    CPU S-IRQ %idle  0.0+-0.0 
    Bar graph showing DUT using 35% CPU per Gbps sent
    Figure 15: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     3.6+-0.2
    CPU App %sys     96.3+-0.2
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  23.3+-0.5
    CPU S-IRQ %idle  76.7+-0.5

    UDP 60 bytes

    Once again, the results were essentially the same as with the ELN kernel, as seen in Figures 16-18 and their respective tables, with nothing else to add.

    Bar graph showing DUT being able to receive 85 Mbps and send 65 Mbps
    Figure 16: UDP 60 bytes throughput.
    Bar graph showing DUT using 1.8% CPU per Mbps received
    Figure 17: DUT CPU % per throughput on receive.
                     DUT rcv    
    CPU App %usr     13.6+-0.5  
    CPU App %sys     86.4+-0.5  
    CPU App %irq     0.0+-0.0   
    CPU App %soft    0.0+-0.0   
    CPU App %idle    0.0+-0.0   
    CPU S-IRQ %usr   0.0+-0.0   
    CPU S-IRQ %sys   0.0+-0.0   
    CPU S-IRQ %irq   0.0+-0.0   
    CPU S-IRQ %soft  55.4+-0.9  
    CPU S-IRQ %idle  44.6+-0.9
    Bar graph showing DUT using 1.8% CPU per Mbps sent
    Figure 18: DUT CPU % per throughput on send.
                     DUT snd
    CPU App %usr     5.8+-0.3
    CPU App %sys     94.1+-0.3
    CPU App %irq     0.0+-0.0
    CPU App %soft    0.0+-0.0
    CPU App %idle    0.1+-0.0
    CPU S-IRQ %usr   0.0+-0.0
    CPU S-IRQ %sys   0.0+-0.0
    CPU S-IRQ %irq   0.0+-0.0
    CPU S-IRQ %soft  18.1+-0.4
    CPU S-IRQ %idle  81.9+-0.4

    Conclusions

    The DUT server kept up with the traffic generator and sustained datacenter-level traffic, as it did with bare interfaces. It is worth mentioning that there are many techniques to scale an application using UDP sockets, such as UDP GRO, UDP segmentation offload, RPS, RFS, and XPS. Also, the DUT server was launched in mid-2020, and aarch64 designs often have plenty of cores that applications can use to scale.
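    As an illustration of some of those knobs (a sketch only: the interface name comes from the VXLAN configuration earlier, the CPU mask and table sizes are hypothetical, and feature availability depends on the NIC and kernel):

```shell
# Sketch of the UDP scaling techniques mentioned above (run as root).
IFACE=enp1s0f0np0
# RPS: spread receive softirq processing over CPUs 0-3 (mask 0xf).
echo f > /sys/class/net/$IFACE/queues/rx-0/rps_cpus
# RFS: steer flows to the CPU where the consuming application runs.
echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
echo 32768 > /sys/class/net/$IFACE/queues/rx-0/rps_flow_cnt
# XPS: pin transmit queue 0 to CPUs 0-3.
echo f > /sys/class/net/$IFACE/queues/tx-0/xps_cpus
# UDP GRO forwarding, where the driver supports it.
ethtool -K $IFACE rx-udp-gro-forwarding on
```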

    Related Posts

    • Linux on Arm (aarch64) servers: Can they handle datacenter-level networks?

    • OpenJDK, AArch64, and Fedora

    • How Red Hat ported OpenJDK to 64-bit Arm: A community history

    • The importance of standardization to emerging 64-bit ARM servers
