
Dive deeper into NUMA systems

May 31, 2013
Don Zickus
Related topics:
Linux
Related products:
Red Hat Enterprise Linux

    A common performance-related issue we see is certain instructions causing
    bottlenecks.  Sometimes it just doesn't make sense, especially when lots of
    threads or shared memory on a NUMA system are involved.

    For quite a while, a bunch of us have been writing tools that exploit features
    of the CPU to give us insight into not only the instruction at the bottleneck
    but also the data address.

    See, the instruction is only half the picture.  Having the data address lets
    you see two distinct functions operating on what looks like distinct data
    that is nevertheless intertwined on a single cache line.  These functions end
    up tugging memory back and forth, causing huge latency spikes.

    Sometimes the answer is to separate the data onto different cache lines; other
    times (in the case of locks) it may help to change the lock granularity to
    reduce contention.

    Intel CPUs can provide data addresses for loads and stores (along with
    latency times for loads) through their performance counters.  Userspace
    exploits this feature with a tool called 'perf'.

    The latest perf can be run with:

    # perf record -a -e cpu/mem-loads,ldlat=100/pp

    This samples loads system-wide on all CPUs; the ldlat=100 part keeps only loads
    whose latency is at least 100 cycles.  You can display the results with:

    # perf report --mem --stdio

    This shows some useful information:

    # Overhead       Samples  Local Weight      Memory access              Symbol              Data Symbol
    # ........  ............  ............  .................  ..................  .......................
    #
        17.84%           904  213           L3 hit             [.] spin            [.] 0x00007f1f84137080
        15.47%          1478  113           L3 hit             [.] spin            [.] 0x00007f1f84137080
        13.29%           780  184           L3 hit             [.] spin            [.] 0x00007f1f84137000
        12.69%           637  215           L3 hit             [.] spin            [.] 0x00007f1f84137080
        10.87%           624  188           L3 hit             [.] spin            [.] 0x00007f1f84137000
         6.45%           330  211           L3 hit             [.] spin            [.] 0x00007f1f84137080
         3.95%           384  111           L3 hit             [.] spin            [.] 0x00007f1f84137080

    Now I run the same thing, except looking for stores (writes):

    # perf record -a -e cpu/mem-stores/pp
    # perf report --mem --stdio

    and I see:

    # Overhead       Samples              Symbol              Data Symbol
    # ........  ............  ..................  .......................
    #
        10.64%          2048  [.] spin            [.] 0x00007f1f84137080                   
        10.21%          1967  [.] spin            [.] 0x00007f1f84137000                   
         5.22%          1006  [.] proc2           [.] 0x00007fffd8987c68                   
         5.13%           987  [.] proc1           [.] 0x00007fffd8987c68                   
         4.90%           943  [.] acquire_lock    [.] 0x00007fffd8987c58                   
         2.69%           518  [.] release_lock    [.] 0x00007f1f84137080                   
         2.67%           514  [.] release_lock    [.] 0x00007f1f84137000

    The next step is to combine the data symbols from the loads and the stores to get a bigger picture of what
    is going on. If you look at the top lines of each output, you see both are dominated by accesses to the data
    addresses 0x00007f1f84137080 and 0x00007f1f84137000.

    What this tells us is that the load latency is being caused by the constant stores (writes) to the same
    data addresses. Reading the symbol reveals which function is doing the loading and storing.

    Now, in this simple example, the test is just ping-ponging back and forth on two spin locks. Reducing the
    time spent holding the locks, or moving the threads onto the same NUMA node, would reduce the latency and
    contention.

    This example just shows how an expensive cache contention issue can slow down a NUMA system.
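
    For reference, a stripped-down C sketch of that kind of ping-pong test might look like the following (an
    assumed reconstruction, not the actual test program): each thread spins on its own lock, but the two locks
    are packed into the same cache line, so every acquire and release bounces the line between the threads and,
    when the threads sit on different NUMA nodes, between nodes:

    #include <pthread.h>
    #include <stdatomic.h>

    /* Two spin locks deliberately packed into one cache line
     * (hypothetical reconstruction of the ping-pong test). */
    static struct {
        atomic_flag lock1;
        atomic_flag lock2;
    } locks = { ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT };

    static void *spin(void *arg)
    {
        atomic_flag *lock = arg;

        for (long i = 0; i < 100000000L; i++) {
            while (atomic_flag_test_and_set(lock))   /* acquire */
                ;
            atomic_flag_clear(lock);                 /* release */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, spin, &locks.lock1);
        pthread_create(&t2, NULL, spin, &locks.lock2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

    Build it with something like 'gcc -O2 -pthread pingpong.c -o pingpong' and run it under the perf commands
    above; the spin symbol then shows up against the data addresses of the two locks. Pinning both threads to
    one node (for example with 'numactl --cpunodebind=0 ./pingpong') is an easy way to see how much of the
    latency comes from crossing nodes.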

    Come and see my talk ("NUMA - Verifying it's not hurting your application performance" @ Red Hat Developer
    Exchange) about the types of tools we have been working on to see whether your system has these kinds of
    problems and how they can be exposed easily.

    Last updated: January 9, 2023
