Skip to main content
Redhat Developers  Logo
  • AI

    Get started with AI

    • Red Hat AI
      Accelerate the development and deployment of enterprise AI solutions.
    • AI learning hub
      Explore learning materials and tools, organized by task.
    • AI interactive demos
      Click through scenarios with Red Hat AI, including training LLMs and more.
    • AI/ML learning paths
      Expand your OpenShift AI knowledge using these learning resources.
    • AI quickstarts
      Focused AI use cases designed for fast deployment on Red Hat AI platforms.
    • No-cost AI training
      Foundational Red Hat AI training.

    Featured resources

    • OpenShift AI learning
    • Open source AI for developers
    • AI product application development
    • Open source-powered AI/ML for hybrid cloud
    • AI and Node.js cheat sheet

    Red Hat AI Factory with NVIDIA

    • Red Hat AI Factory with NVIDIA is a co-engineered, enterprise-grade AI solution for building, deploying, and managing AI at scale across hybrid cloud environments.
    • Explore the solution
  • Learn

    Self-guided

    • Documentation
      Find answers, get step-by-step guidance, and learn how to use Red Hat products.
    • Learning paths
      Explore curated walkthroughs for common development tasks.
    • Guided learning
      Receive custom learning paths powered by our AI assistant.
    • See all learning

    Hands-on

    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.
    • Interactive labs
      Learn by doing in these hands-on, browser-based experiences.
    • Interactive demos
      Click through product features in these guided tours.

    Browse by topic

    • AI/ML
    • Automation
    • Java
    • Kubernetes
    • Linux
    • See all topics

    Training & certifications

    • Courses and exams
    • Certifications
    • Skills assessments
    • Red Hat Academy
    • Learning subscription
    • Explore training
  • Build

    Get started

    • Red Hat build of Podman Desktop
      A downloadable, local development hub to experiment with our products and builds.
    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.

    Download products

    • Access product downloads to start building and testing right away.
    • Red Hat Enterprise Linux
    • Red Hat AI
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat Developer Toolset

    References

    • E-books
    • Documentation
    • Cheat sheets
    • Architecture center
  • Community

    Get involved

    • Events
    • Live AI events
    • Red Hat Summit
    • Red Hat Accelerators
    • Community discussions

    Follow along

    • Articles & blogs
    • Developer newsletter
    • Videos
    • Github

    Get help

    • Customer service
    • Customer support
    • Regional contacts
    • Find a partner

    Join the Red Hat Developer program

    • Download Red Hat products and project builds, access support documentation, learning content, and more.
    • Explore the benefits

Benchmark Cyclone DDS in CentOS Stream 9 VMs for low latency

March 21, 2024
Matias Ezequiel Vara Larsen Alexander Lougovski Marcelo Tosatti
Related topics:
LinuxOpen sourceVirtualization
Related products:
Red Hat Enterprise Linux

    When users have to choose a middleware technology to communicate different software components, they often pick Robot Operating System 2 (ROS 2) due to its features like efficiency and reliability. These components can be deployed locally (i.e., same host, or remotely, different hosts), and they communicate by following a service-oriented pattern.

    For example, a software component that reads data from a sensor would publish the sensor values whereas another software component that processes that data would subscribe to it. ROS 2 takes care of sending the data between the components and ensures that data arrives on time. ROS 2 relies on a DDS as its middleware, which is a specification that is implemented by different vendors, e.g., Eclipse Cyclone DDS, OpenDDS. DDS is used in a wide variety of systems, including air-traffic control, jet engine testing, railway control, medical systems, naval command-and-control, smart greenhouses, and much more. However, when using a DDS to communicate software components that are deployed in virtual machines (VMs), the latency can vary in unpredictable ways, thus making the communication unreliable.

    In this article, we demonstrate the real time capabilities of virtual machines under CentOS Stream 9 for the Cyclone DDS. We present how to set up two CentOS Stream 9 guests on top of CentOS Stream 9 host to communicate over Eclipse Cyclone DDS with low latency.

    How to benchmark latency

    To benchmark the Cyclone DDS, we have chosen the latency-test benchmark that is based on the ddspeft tool. The measuring of latency requires two instances of the ddsperf tool, one in each host/guest. The first instance is the sender that sends a given amount of data at a given rate. The second is the receiver that sends back the same amount of data to the sender in a ping-pong scenario. The sending action is triggered by the ping option. The receiving behavior is triggered by the pong action. The sender measures the round-trip time and computes the latency as half of the round-trip time. For a given payload, the benchmark executes several round trips and then gets the median and the maximum latency among other parameters. The benchmark is launched by running the following command line:

    ./latency-test -i enp1s0 -I enp1s0 root@guest-1

    Benchmarked configurations

    We have launched the benchmark in two different setups. The first is by running the benchmark between two physical hosts. This can be seen as a base reference to compare with the virtualized system. The second is by running the benchmark between two VMs in the same host. In the following examples, we detail each setup.

    Host-to-host

    For the host-to-host communication, we set up each host with a RT kernel, i.e., 5.14.0-410.el9.x86_64+rt and 5.14.0-407.el9.x86_64+rt (CentOS Stream 9). To install an RT kernel, we use dnf:

    dnf install kernel-rt-kvm 

    We isolate two cores to run the benchmark. To do this, we use the realtime-virtual-host profile. Note that the isolated cores shall belong to the same NUMA node. To do this, we first edit /etc/tuned/realtime-virtual-host-variables.conf to define the isolated cores, and then we activate the profile:

    tuned-adm profile realtime-virtual-host 

    The launching of the benchmark is done by using the taskset command that allows to set up the affinity to a group of cores. At the end of the benchmark, the following graph is displayed, as shown in Figure 1.

    Alt text
    Figure 1: Payload vs latency graph for two bare-metal hosts.

    In yellow, Figure 1 shows the maximum latency with different payloads. We can see that the latency generally increases with the payload but the increase gets more pronounced when payload gets bigger than MTU (i.e., 1500 bytes).

    Guest-to-guest

    For guest-to-guest communication, we have built two CentOS Stream 9 guests and we have set up a RT kernel for each guest. To install a RT kernel, you can simply run the following command after the first boot of the guest:

    dnf install kernel-rt 

    The host also is a KVM/RT kernel (we have set up this host in the previous configuration). For each of the guests, we isolate three cores to run the vCPUs and one extra core for the kthread that handles the virtio-net network card. This deployment thus requires four isolated cores in the host for each guest. To isolate these cores on the host, we rely on the realtime-virtual-host profile as we did for the host-to-host configuration. Note that the cores that are assigned for a VM shall belong to the same NUMA node. For each guest, we set the affinity for the vCPUs and the kthread by using virsh. Also, we have disabled the Virtio memory balloon device. To do this, we use the virsh edit [nameoftheguest] command to edit the guest configuration and set the affinity. For example, in our case, each guest configuration would look like the following:

    <vcpu placement='static'>3</vcpu>
    <cputune>
      <vcpupin vcpu='0' cpuset='25'/>
      <vcpupin vcpu='1' cpuset='27'/>
      <vcpupin vcpu='2' cpuset='29'/>
      <emulatorpin cpuset='33'/>
      <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
      <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
      <vcpusched vcpus='2' scheduler='fifo' priority='1'/>
    </cputune>

    Note that the cpuset parameter would depend on the guest and indicates to which CPU in the host the vCPU is pinned.

    In the guests, we use one vCPU for housekeeping and the other two vCPUs to run the benchmark. To set up this configuration, we activate the realtime-virtual-guest profile in each guest. To do this, we edit /etc/tuned/realtime-virtual-guest-variables.conf and then we apply the profile:

    tuned-adm profile realtime-virtual-guest

    Also, we set up the affinity of virtio-net IRQs to the cores that run the benchmark. This is important, because otherwise the kernel might decide to use IPI to handle the IRQs by a different core. Also, the housekeeping vCPU may become very busy if it handles all the IRQs thus interfering with one another. For example, if the virtio-net input IRQ is the 54, then, we can set the affinity to the 1 and 2 vCPUs by running:

    echo 06 > /proc/irq/54/smp_affinity 

    The benchmark runs in the guest by using the taskset on the isolated cores. Figure 2 illustrates the overall configuration of the host and the guests regarding affinity and isolation.

    Alt text
    Figure 2: Single host deployment with core and vCPU affinity.

    For networking between the guests, we let virsh use the default bridge and we choose vhost-net for each guest. 

    The graph in Figure 3 is when running the benchmark without any affinity configuration.

    Alt text
    Figure 3: Payload vs latency graph for two guests deployment with out-of-the-box configuration.

    Figure 3 shows that even though that median latency looks good the maximum latency varies in an unpredictable way. Let’s configure the guests and the host to get results similar than the physical host (see Figure 4).

    Alt text
    Figure 4: Payload vs latency graph for two guests deployment after configuration.

    Figure 4 shows that, after configuring the host and the guest and launching the benchmark, we achieve a lower max latency that is tightly to the median.

    We want to highlight that the improvement of latency also requires understanding the workload of the benchmark. We use strace to understand where time in the benchmark is spent. This investigation results in realizing that the benchmark is based on two threads one for reception and another for processing the requests. These threads exchange data by relying on a few futexes. Forcing these threads to run in the same core would increase contention in critical sections. To reduce such a contention, we make each thread run in a different core thus resulting in a reduction of latency. 

    Conclusion

    This article has presented how it is possible to reduce the latency when communicating VMs by using the CycloneDDS. To achieve this, we have configured the kernel host and the kernel in each guest. By doing so, we have presented results that are similar when deploying the DDS between physical hosts. 

    Related Posts

    • What CentOS Stream means for developers

    • A guide for using CentOS Project code

    • Virtio live migration technical deep dive

    • Network observability using TCP handshake round-trip time

    • What CentOS Stream means for developers

    • Convert CentOS Linux to RHEL using Red Hat Lightspeed

    Recent Posts

    • Every layer counts: Defense in depth for AI agents with Red Hat AI

    • Fun in the RUN instruction: Why container builds with distroless images can surprise you

    • Trusted software factory: Building trust in the agentic AI era

    • Build a zero trust AI pipeline with OpenShift and RHEL CVMs

    • Red Hat Hardened Images: Top 5 benefits for software developers

    What’s up next?

    Convert2RHEL is a command-line utility that can streamline your migration path from CentOS Linux 7 to a fully supported Red Hat Enterprise Linux (RHEL) operating system. This cheat sheet outlines how to convert your CentOS Linux instance to RHEL in just 7 steps.

    Get the cheat sheet
    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer tools
    • Interactive tutorials
    • API catalog

    Quicklinks

    • Learning resources
    • E-books
    • Cheat sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site status dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2026 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Chat Support

    Please log in with your Red Hat account to access chat support.