Exhaustive profiling toolkit: elfutils and libdwfl_stacktrace

November 12, 2025
Serhei Makarov
Related topics:
Developer Tools, Linux, Observability
Related products:
Red Hat Enterprise Linux

    Various "good enough" 80% solutions (i.e., framepointer unwinding and SFrame) have tended to dominate the Linux stack profiling landscape. These proved simpler to implement and deploy than 20% solutions (e.g., elfutils) that use CFI for more exhaustive profile coverage, including coverage of difficult control-flow sections (e.g., function prologues and epilogues, unusual ABIs). This article discusses the libdwfl_stacktrace initiative to make the elfutils project easier to use for stack profiling and examines ideas for further work, including potential improvements to the kernel’s perf_events infrastructure that would benefit both 80% and 20% profiling solutions.

    An exhaustive profiling solution

    In 2024, the elfutils project released the eu-stacktrace prototype. I developed eu-stacktrace to enable system-wide stack sample profiling without requiring programs to be compiled with frame pointers, and to remove the barriers that made the already-existing unwinding library in elfutils difficult for a system-wide profiler to adopt.

    The initial version of eu-stacktrace was an executable that communicated with a profiling tool through a FIFO, accepting stack sample packets, unwinding them, and returning call chains. This was a conservative design that assumed minimal modifications to the profiling tool. In early 2025, based on feedback from the Sysprof profiler project, I reworked the design into a more conventional and efficient library interface, released as libdwfl_stacktrace in elfutils 0.193.

    We designed libdwfl_stacktrace for adoption by profiler projects that currently receive callchain data from Linux perf_events. The profiler must be modified to collect stack samples rather than callchains and to pass those stack samples to libdwfl_stacktrace for unwinding.
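
    For perf_events-based profilers, this means requesting PERF_SAMPLE_STACK_USER and PERF_SAMPLE_REGS_USER rather than PERF_SAMPLE_CALLCHAIN when opening the sampling event. The following is a minimal sketch of such a configuration (open_stack_sampler is an illustrative helper added here, not code from elfutils or Sysprof); the sampling frequency, snapshot size, and register mask are placeholder values, and the register mask in particular is architecture-specific.

    /* Minimal sketch: open a perf_events sampler that captures a raw user-stack
       snapshot and user register values instead of a kernel-built callchain.
       The frequency, stack size, and register mask below are placeholders; the
       register mask is architecture-specific.  */
    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int
    open_stack_sampler (pid_t pid, int cpu)
    {
      struct perf_event_attr attr;
      memset (&attr, 0, sizeof attr);
      attr.size = sizeof attr;
      attr.type = PERF_TYPE_SOFTWARE;
      attr.config = PERF_COUNT_SW_CPU_CLOCK;
      attr.freq = 1;
      attr.sample_freq = 997;                 /* samples per second */
      /* Ask for what libdwfl_stacktrace needs: pid/tid, a snapshot of the user
         stack, and the sampled user-space register values.  */
      attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_REGS_USER
                         | PERF_SAMPLE_STACK_USER;
      attr.sample_regs_user = 0xff;           /* placeholder register mask */
      attr.sample_stack_user = 8192;          /* bytes of stack per sample */
      attr.exclude_kernel = 1;

      /* perf_event_open has no glibc wrapper; invoke it via syscall().  */
      return syscall (SYS_perf_event_open, &attr, pid, cpu, -1, 0);
    }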

    The core of libdwfl_stacktrace is the following function, which translates a perf_events stack snapshot into a sequence of frames and passes the frames to a callback.

    int dwflst_perf_sample_getframes (dwfl, elf, pid, tid,
      const void *stack, size_t stack_size,
      const Dwarf_Word *regs, size_t n_regs,
      uint64_t perf_regs_mask, uint32_t abi,
      callback, void *arg);
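
    The callback receives each unwound frame in turn. A plausible minimal callback is sketched below, assuming the convention matches libdwfl's existing frame callbacks (as used by dwfl_getthread_frames(), returning DWARF_CB_OK to continue); dwfl_frame_pc() is the standard libdwfl accessor for a frame's program counter, and collect_frame itself is a hypothetical name.

    /* Sketch of a frame callback, assuming the dwfl_getthread_frames()-style
       convention: invoked once per frame, returns DWARF_CB_OK to continue or
       DWARF_CB_ABORT to stop.  */
    #include <elfutils/libdwfl.h>
    #include <inttypes.h>
    #include <stdbool.h>
    #include <stdio.h>

    static int
    collect_frame (Dwfl_Frame *state, void *arg)
    {
      (void) arg;   /* per-sample context (e.g., an output buffer) goes here */

      Dwarf_Addr pc;
      bool isactivation;
      if (!dwfl_frame_pc (state, &pc, &isactivation))
        return DWARF_CB_ABORT;

      /* A return address points just past the call instruction; step back one
         byte (except in the innermost, "activation" frame) so the address maps
         to the calling source line.  */
      if (!isactivation)
        pc -= 1;

      printf ("  %#" PRIx64 "\n", (uint64_t) pc);
      return DWARF_CB_OK;
    }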
    

    With a couple of minor changes, the interface also adapts to non-perf_events-based profiling infrastructures.

    -int dwflst_perf_sample_getframes (dwfl, elf, pid, tid,
    +int dwflst_sample_getframes (dwfl, elf, pid, tid,
      const void *stack, size_t stack_size,
      const Dwarf_Word *regs, size_t n_regs,
    -  uint64_t perf_regs_mask, uint32_t abi,
    +  const int *regs_mapping, size_t n_regs_mapping,
      callback, void *arg);
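
    For the perf_events variant, the stack, regs, and abi arguments come straight out of the PERF_RECORD_SAMPLE payload produced by a configuration like the one sketched earlier. The following illustrative parser (struct stack_sample and parse_sample are names invented here) follows the field order documented in perf_event_open(2) for sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_REGS_USER | PERF_SAMPLE_STACK_USER; a real profiler must also deal with ring-buffer wrap-around and with the other record types.

    /* Illustrative sketch: extract the per-sample data from a PERF_RECORD_SAMPLE
       body, assuming sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_REGS_USER |
       PERF_SAMPLE_STACK_USER.  Field order follows perf_event_open(2).  */
    #include <linux/perf_event.h>
    #include <stddef.h>
    #include <stdint.h>

    struct stack_sample
    {
      uint32_t pid, tid;
      uint64_t abi;            /* PERF_SAMPLE_REGS_ABI_{NONE,32,64} */
      const uint64_t *regs;
      size_t n_regs;           /* popcount of attr.sample_regs_user */
      const void *stack;
      size_t stack_size;       /* bytes actually captured (dyn_size) */
    };

    static void
    parse_sample (uint64_t regs_mask, const char *body, struct stack_sample *out)
    {
      const char *p = body;    /* first byte after the perf_event_header */

      out->pid = *(const uint32_t *) p;          /* PERF_SAMPLE_TID */
      out->tid = *(const uint32_t *) (p + 4);
      p += 8;

      out->abi = *(const uint64_t *) p;          /* PERF_SAMPLE_REGS_USER */
      p += 8;
      if (out->abi != PERF_SAMPLE_REGS_ABI_NONE)
        {
          out->n_regs = (size_t) __builtin_popcountll (regs_mask);
          out->regs = (const uint64_t *) p;
          p += out->n_regs * sizeof (uint64_t);
        }
      else
        {
          out->n_regs = 0;
          out->regs = NULL;
        }

      uint64_t size = *(const uint64_t *) p;     /* PERF_SAMPLE_STACK_USER */
      p += 8;
      out->stack = p;
      p += size;
      out->stack_size = size != 0 ? (size_t) *(const uint64_t *) p : 0;
    }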
    

    Previously, the elfutils libdwfl library enabled stack unwinding, but the public interface made a number of limiting assumptions. A libdwfl library session was represented by a Dwfl structure tied to one process, generally assumed to be accessed as a core file or via a ptrace interface. This does not match the model used by profiling tools, which process sample packets for all processes on a system. Multiple Dwfl data structures did not share information, so the CFI for the same library was loaded repeatedly as it was dynamically linked into different processes.
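
    For reference, the long-standing per-process setup looks roughly like the sketch below (dwfl_for_pid is an illustrative wrapper; the libdwfl calls themselves are the standard ones). Each Dwfl created this way reports and loads its modules, and therefore their CFI, independently of every other Dwfl.

    /* Sketch of the pre-existing libdwfl pattern for attaching to one live
       process.  Every Dwfl created like this loads module data independently,
       so the CFI of a shared library such as libc.so.6 ends up loaded once per
       profiled process.  */
    #include <elfutils/libdwfl.h>
    #include <sys/types.h>

    static const Dwfl_Callbacks proc_callbacks =
      {
        .find_elf = dwfl_linux_proc_find_elf,
        .find_debuginfo = dwfl_standard_find_debuginfo,
      };

    static Dwfl *
    dwfl_for_pid (pid_t pid)
    {
      Dwfl *dwfl = dwfl_begin (&proc_callbacks);
      if (dwfl == NULL)
        return NULL;

      /* Enumerate the process's mapped modules from /proc/<pid>/maps.  */
      if (dwfl_linux_proc_report (dwfl, pid) != 0
          || dwfl_report_end (dwfl, NULL, NULL) != 0)
        {
          dwfl_end (dwfl);
          return NULL;
        }
      return dwfl;
    }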

    The libdwfl_stacktrace interface handles multiple processes by providing profiling tools with a Dwfl_Process_Tracker data structure that maintains a table of Dwfl structures and a cache of associated module data. Dwfl structures obtained from the tracker via the new dwflst_tracker_find_pid() interface cache CFI for their modules within the tracker. Thus, a commonly used module such as libc.so.6 is loaded only once, and the Dwfl structures representing different processes share it.

    The aim of libdwfl_stacktrace is to allow stack trace profilers to use elfutils’ mature support for CFI-based unwinding to cover programs across an entire system without missing edge case packets. Further work to make elfutils into an exhaustive profiling solution revolves around obtaining a more consistent data rate from the perf_events stack sampler.

    SFrame as a lightweight profiling solution

    In contrast to the exhaustive aim of libdwfl_stacktrace, the SFrame profiling effort is built around a lightweight CFI format whose interpreter is meant to be simple enough to include in the Linux kernel. Initial versions of SFrame improved on framepointer unwinding and did not aim to cover all architectures or all control-flow patterns. However, the project has since expanded its ambitions.

    A kernel patchset has been under review for quite some time, and the format was slated for testing in Fedora, but there have been delays. The initial infrastructure for deferred stack tracing was merged only this August, and it is not yet enabled for any architecture. To be honest, the slow pace of adoption was surprising. My interpretation of events is that SFrame, as a promising 80% solution, has become bogged down in the process of trying to also cover the 20% solution space. There is an inherent tension between SFrame’s reliance on established, implicit ABI rules and the potential need to cover code sections that do not adhere to these implicit rules.

    As SFrame becomes more complex to handle edge cases, the complexity of the standard begins to approach that of .eh_frame CFI, the kernel and distribution review process slows down, and the value of a completely new format becomes less clear. On the other hand, had the project been more committed to keeping its design footprint small and to supplanting framepointer unwinding as the already-existing "good enough, but not perfect" solution, SFrame support could already be available in the Linux kernel and slated for inclusion in Fedora as a simpler alternative to .eh_frame. Widely used framepointer unwinding exhibits various coverage gaps, such as in function prologues and epilogues, many of which SFrame resolves even in its initial form.

    An area that the SFrame project is beginning to explore is the generation of CFI data from JIT compilers. This is a slow but very promising area of development. Keeping the data format straightforward allows JIT compilers to implement support with minimal effort. Any work in this area is universally beneficial since we could also extend elfutils to read JIT-generated SFrame sections.

    The importance of profile consistency

    In performance testing various profiling solutions, I've come to appreciate that a consistent sample rate and profiling overhead are more important than the absolute overhead. The kernel often decides to lower the perf_events sampling rate on a loaded system:

    [806714.250516] perf: interrupt took too long (3949 > 3940), lowering kernel.perf_event_max_sample_rate to 50000
    [806714.881792] perf: interrupt took too long (5081 > 4936), lowering kernel.perf_event_max_sample_rate to 39000
    [806715.097290] perf: interrupt took too long (6390 > 6351), lowering kernel.perf_event_max_sample_rate to 31000
    [806715.175595] perf: interrupt took too long (8084 > 7987), lowering kernel.perf_event_max_sample_rate to 24000
    [806716.304292] perf: interrupt took too long (10176 > 10105), lowering kernel.perf_event_max_sample_rate to 19000
    

    Inconsistent sample rates within a single profile make the resulting data harder to study than a high but constant profiling overhead would. Measurements such as per-function sample counts are inevitably distorted once this kernel safety measure is triggered.

    On heavily loaded systems, even the baseline Sysprof profiler using perf_events with the default framepointer-based unwinding mechanism can trigger the throttling. This is an ongoing issue lurking in the background of the profile-quality discourse, and we need further testing to identify how often the kernel adjusts the sample rate. The throttling is known to become more likely as the number of cores on a system increases, which makes reliable profiling on high-CPU-count systems harder to achieve.
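
    One simple way to quantify this during testing is to record kernel.perf_event_max_sample_rate at the start and end of a profiling run and flag any drop. The helper below (read_max_sample_rate is an illustrative name, not tooling from elfutils or Sysprof) reads the current ceiling from procfs.

    /* Illustrative helper: read the current kernel.perf_event_max_sample_rate
       ceiling.  Comparing the value before and after a profiling run reveals
       whether throttling (as in the dmesg lines above) occurred mid-profile.  */
    #include <stdio.h>

    static long
    read_max_sample_rate (void)
    {
      long rate = -1;
      FILE *f = fopen ("/proc/sys/kernel/perf_event_max_sample_rate", "r");
      if (f != NULL)
        {
          if (fscanf (f, "%ld", &rate) != 1)
            rate = -1;
          fclose (f);
        }
      return rate;
    }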

    We may be able to overcome the kernel’s tendency to throttle perf_events sample rates with additional tuning parameters that accommodate high-bandwidth profiling data streams. Depending on the results of planned experimentation, it might make sense to reduce the likelihood that a one-off data glut triggers a long-term sample rate reduction, either by introducing an optional grace period for the throttling or by automatically raising kernel.perf_event_max_sample_rate after a period of heavy load has passed. Such tuning options do not exist today, but they would be justifiable for exhaustive, debug-oriented profiling on non-production systems.

    Wrap up

    In this article, we discussed two approaches to Linux stack profiling: elfutils with libdwfl_stacktrace for exhaustive profiling and SFrame for lightweight profiling. Future work revolves around obtaining a more consistent data rate from the perf_events stack sampler to make elfutils into a more viable exhaustive profiling solution.
