Speed up SystemTap scripts with statistical aggregates

April 10, 2019
William Cohen

    A common question that SystemTap can answer is how often particular events occur on the system. Some events, such as system calls, happen very frequently, so the goal is to make the SystemTap script as efficient as possible.

    Using statistical aggregates in place of regular integers is one way to improve the performance of SystemTap scripts. Statistical aggregates record data on a per-CPU basis to reduce the amount of coordination required between processors, allowing information to be recorded with less overhead. In this article, I’ll show an example of how to reduce overhead in SystemTap scripts.

    Below is a version of the syscalls_by_proc.stp example script that tallies the number of times each executable makes a system call. When the script exits, it prints the number of system calls made by each executable on the system, sorted from most to fewest.

    Line 1 defines the global array used to store the information. Line 3 is the probe that actually tallies each system call using the ++ operator. The end probe starting at line 5 prints out the data stored in the global syscalls associative array.

    global syscalls
    
    probe kernel.trace("sys_enter") { syscalls[execname()] ++ }
    
    probe end {
      printf ("%-10s %-s\n", "#SysCalls", "Process Name")
      foreach (proc in syscalls-)
        printf("%-10d %-s\n", syscalls[proc], proc)
    }

    The following code example shows a run of the previous script. The -v option causes SystemTap to provide timing information for each of the five passes. The -t option provides timing information for each of the probes in the script, which gives a way to compare the efficiency of the different implementations. The -c "make -j8" option starts the make command once the SystemTap instrumentation is ready and stops the instrumentation once the command finishes. At the end of the output is the "probe hit report," which provides information about the overhead of the various probes.

    The middle of the line for kernel.trace("raw_syscalls:sys_enter") shows that the probe was triggered more than 24 million times and that the handler took an average of 4224 clock cycles per hit. In addition, the "refresh report" section shows more than 3 million lock contentions on the global associative array (__global_syscalls).

    $ stap -v -t ~/present/2019blog/fast/syscalls_by_proc.stp -c "make -j8"
    Pass 1: parsed user script and 504 library scripts using 290680virt/85620res/3552shr/82780data kb, in 590usr/30sys/621real ms.
    Pass 2: analyzed script: 2 probes, 1 function, 0 embeds, 1 global using 296352virt/92244res/4484shr/88452data kb, in 130usr/190sys/331real ms.
    Pass 3: using cached /home/wcohen/.systemtap/cache/b8/stap_b8412b9e49934149b789a69c3e2a2b4e_1469.c
    Pass 4: using cached /home/wcohen/.systemtap/cache/b8/stap_b8412b9e49934149b789a69c3e2a2b4e_1469.ko
    Pass 5: starting run.
      DESCEND  objtool
      CALL    scripts/atomic/check-atomics.sh
      CALL    scripts/checksyscalls.sh
      CHK     include/generated/compile.h
      TEST    posttest
      Building modules, stage 2.
      MODPOST 3484 modules
    arch/x86/tools/insn_decoder_test: success: Decoded and checked 5057175 instructions
      TEST    posttest
    arch/x86/tools/insn_sanity: Success: decoded and checked 1000000 random instructions with 0 errors (seed:0x18e55059)
    Kernel: arch/x86/boot/bzImage is ready  (#9)
    #SysCalls  Process Name
    22614118   make
    1675055    sh
    138647     awk
    112797     modpost
    93456      cat
    69563      objdump
    68771      insn_decoder_te
    49887      grep
    14656      cc1
    13544      gcc
    11714      rm
    10174      as
    ...
    ----- probe hit report: 
    kernel.trace("raw_syscalls:sys_enter"), (/home/wcohen/present/2019blog/fast/syscalls_by_proc.stp:17:1), hits: 24894191, cycles: 310min/4224avg/477851max, variance: 5282267, from: kernel.trace("sys_enter"), index: 0
    end, (/home/wcohen/present/2019blog/fast/syscalls_by_proc.stp:21:1), hits: 1, cycles: 109027min/109027avg/109027max, variance: 0, from: end, index: 1
    ----- refresh report:
    '__global_syscalls' lock contention occurred 3230491 times
    Pass 5: run completed in 174200usr/100470sys/69766real ms.
    

    The overhead of the script can be significantly reduced by using statistical aggregates. In the code below, the ++ is replaced by <<< 1 to tally each system call, and the tallies are printed with @sum(syscalls[proc]).

    global syscalls
    
    probe kernel.trace("sys_enter") { syscalls[execname()] <<< 1 }
    
    probe end {
      printf ("%-10s %-s\n", "#SysCalls", "Process Name")
      foreach (proc in syscalls-)
        printf("%-10d %-s\n", @sum(syscalls[proc]), proc)
    }
    

    Below is an equivalent run of the script using statistical aggregates in place of the slower ++. Looking at the "probe hit report" toward the end of the listing, you will see that the raw_syscalls:sys_enter tracepoint was hit a similar number of times as in the other script, about 25 million times. However, notice that the average time required by the handler is 409 clock cycles, much lower than the 4224 cycles observed for the version with ++. The "refresh report" section also lists no lock contention for the syscalls associative array in the run using statistical aggregates.
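
    To put the two averages in perspective, the short Python sketch below multiplies hits by average cycles for each run. The numbers are copied from the "probe hit report" lines in this article; treating hits times average cycles as the total handler cost is a rough approximation, not something SystemTap reports directly, and the variable names are made up for illustration.

```python
# Figures copied from the two "probe hit report" listings in this article.
plus_plus = {"hits": 24894191, "avg_cycles": 4224}   # version using ++
aggregate = {"hits": 24895995, "avg_cycles": 409}    # version using <<< 1


def total_cycles(report):
    """Approximate total handler cost as hits * average cycles per hit."""
    return report["hits"] * report["avg_cycles"]


# Ratio of the per-hit averages for the sys_enter probe.
speedup = plus_plus["avg_cycles"] / aggregate["avg_cycles"]

# Rough estimate of cycles no longer spent in the probe handler.
saved = total_cycles(plus_plus) - total_cycles(aggregate)

print(f"per-hit speedup: {speedup:.1f}x")
print(f"approximate cycles saved in the handler: {saved:.3e}")
```

    By this estimate the handler is roughly ten times cheaper per hit in the run that uses statistical aggregates.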

    $ stap -v -t ~/present/2019blog/faster/syscalls_by_proc.stp -c "make -j8"
    Pass 1: parsed user script and 504 library scripts using 290680virt/85624res/3552shr/82780data kb, in 580usr/30sys/616real ms.
    Pass 2: analyzed script: 2 probes, 1 function, 0 embeds, 1 global using 296348virt/92256res/4492shr/88448data kb, in 140usr/190sys/332real ms.
    Pass 3: using cached /home/wcohen/.systemtap/cache/de/stap_de5d5dc935de72aac43603205a17cad4_1482.c
    Pass 4: using cached /home/wcohen/.systemtap/cache/de/stap_de5d5dc935de72aac43603205a17cad4_1482.ko
    Pass 5: starting run.
      DESCEND  objtool
      CALL    scripts/atomic/check-atomics.sh
      CALL    scripts/checksyscalls.sh
      CHK     include/generated/compile.h
      TEST    posttest
      Building modules, stage 2.
      MODPOST 3484 modules
    arch/x86/tools/insn_decoder_test: success: Decoded and checked 5057175 instructions
      TEST    posttest
    arch/x86/tools/insn_sanity: Success: decoded and checked 1000000 random instructions with 0 errors (seed:0xbefec711)
    Kernel: arch/x86/boot/bzImage is ready  (#9)
    #SysCalls  Process Name
    22616433   make
    1675055    sh
    138647     awk
    112797     modpost
    93456      cat
    69563      objdump
    68771      insn_decoder_te
    49891      grep
    14706      cc1
    13544      gcc
    11714      rm
    10174      as
    ...
    ----- probe hit report:
    kernel.trace("raw_syscalls:sys_enter"), (/home/wcohen/present/2019blog/faster/syscalls_by_proc.stp:17:1), hits: 24895995, cycles: 216min/409avg/395696max, variance: 3425671, from: kernel.trace("sys_enter"), index: 0
    end, (/home/wcohen/present/2019blog/faster/syscalls_by_proc.stp:19:1), hits: 1, cycles: 847759min/847759avg/847759max, variance: 0, from: end, index: 1
    ----- refresh report:
    Pass 5: run completed in 172780usr/66010sys/64215real ms.
    

    The tradeoff of using statistical aggregates is that @sum() needs to fetch the data from all the processors in the machine. This is more expensive than just fetching a single integer value stored in an associative array. In the runs above, the end probe took 847759 cycles with aggregates, compared with 109027 cycles with ++. However, for this example, reducing the overhead of the system call probes more than makes up for the overhead of @sum() used to print the results.
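
    The tradeoff can be pictured with a toy model in Python. This is only a conceptual sketch and not how SystemTap is implemented internally: the class name PerCpuAggregate and its methods are hypothetical, and the CPU numbers are passed in by hand rather than taken from a real scheduler.

```python
from collections import defaultdict


class PerCpuAggregate:
    """Toy model: one private tally table per CPU, merged only on read."""

    def __init__(self, ncpus):
        # One independent table per CPU, so writers never share state.
        self.slots = [defaultdict(int) for _ in range(ncpus)]

    def add(self, cpu, key, value=1):
        # Cheap path, analogous to `syscalls[key] <<< value`:
        # only the calling CPU's table is touched.
        self.slots[cpu][key] += value

    def sum(self, key):
        # Expensive path, analogous to `@sum(syscalls[key])`:
        # every CPU's table has to be visited to build the total.
        return sum(slot.get(key, 0) for slot in self.slots)


agg = PerCpuAggregate(ncpus=4)
for cpu, name in [(0, "make"), (1, "make"), (1, "sh"), (3, "make")]:
    agg.add(cpu, name)

print(agg.sum("make"), agg.sum("sh"))
```

    Recording is cheap because each write stays local, while reading costs one lookup per CPU. That suits a script like this one, which records millions of times and reads only once at exit.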

    When writing SystemTap scripts, you can use the -t option to better understand the overhead of your scripts, and consider using statistical aggregates when feasible. As this example shows, they can significantly reduce the overhead of instrumentation in SystemTap scripts.
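
    When comparing several runs, it can help to pull the numbers out of the -t output with a small script. The Python sketch below assumes the "probe hit report" line format shown in the listings in this article; other SystemTap versions may format these lines differently, and the function name parse_hit_line is hypothetical.

```python
import re

# Matches the "hits: N, cycles: AminBavg/Cmax" part of a probe hit report
# line, as it appears in the listings in this article.
HIT_RE = re.compile(r"hits: (\d+), cycles: (\d+)min/(\d+)avg/(\d+)max")


def parse_hit_line(line):
    """Return probe name, hit count, and cycle stats, or None if no match."""
    match = HIT_RE.search(line)
    if match is None:
        return None
    hits, cmin, cavg, cmax = map(int, match.groups())
    # The probe name is everything before the ", (" that starts the
    # source location.
    probe = line.split(", (", 1)[0].strip()
    return {"probe": probe, "hits": hits, "min": cmin, "avg": cavg, "max": cmax}


sample = (
    'kernel.trace("raw_syscalls:sys_enter"), '
    "(/home/wcohen/present/2019blog/faster/syscalls_by_proc.stp:17:1), "
    "hits: 24895995, cycles: 216min/409avg/395696max, variance: 3425671, "
    'from: kernel.trace("sys_enter"), index: 0'
)

info = parse_hit_line(sample)
print(info["probe"], info["hits"], info["avg"])
```

    Lines that do not match, such as the "----- probe hit report:" header, return None and can be skipped.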
