Get system-wide profiles of binaries without frame pointers

June 11, 2024
Serhei Makarov
Related topics:
Linux
Related products:
Red Hat Enterprise Linux

    Many profiling tools on Linux have been limited by their reliance on stack unwinding algorithms that require the commonly used frame pointer omission optimization to be disabled. This article introduces eu-stacktrace, a prototype tool that uses the elfutils toolkit’s unwinding libraries to let a sampling profiler unwind stack samples from programs compiled without frame pointers.

    Background

    Developers and customers benefit from profiling the performance of their applications in both development and production environments. A typical requirement for useful profile data is an accurate stack trace listing the active functions at every sample in the profile. Commonly used profiling tools try to fulfill this requirement with basic unwinder implementations that assume programs were compiled with a stack frame format that includes frame pointers.

    The reliance on frame pointer-based unwinding has led to a conflict of priorities in recent versions of popular Linux distributions. Historically, the omit-frame-pointer optimization in GCC has been a popular compiler default. This optimization reduces register pressure by reassigning the frame pointer register on architectures such as x86 to store general-purpose data. Recently, various users reliant on profiling have requested that this compiler optimization be disabled to allow existing profilers to function with system programs, while other users have been concerned about the resulting performance loss even on systems where profiling functionality is never used.

    It is worth noting that the elfutils toolkit includes a more versatile unwinder implementation that relies on .eh_frame data included in a program’s executable file. The .eh_frame format is a subset of the DWARF debug information format, restricted to call-frame information. Programs in almost all Linux distributions, including Red Hat Enterprise Linux and Fedora, are packaged and shipped with .eh_frame sections included in the executables.
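
    One way to confirm that a given executable carries this section is to list its section headers. The following is a minimal sketch, assuming binutils' readelf is installed (eu-readelf from elfutils also accepts -S); the has_eh_frame helper name and the two example paths are illustrative, not part of eu-stacktrace.

```shell
#!/bin/sh
# Report whether an ELF file carries an .eh_frame section.
# Assumes binutils' readelf is on the PATH.
has_eh_frame() {
    # -S lists section headers; -w in grep keeps .eh_frame_hdr from matching.
    readelf -S "$1" 2>/dev/null | grep -qw '\.eh_frame'
}

# Example paths only; substitute any executable of interest.
for bin in /bin/sh /bin/ls; do
    if has_eh_frame "$bin"; then
        echo "$bin: .eh_frame present"
    else
        echo "$bin: .eh_frame not found"
    fi
done
```

    A file that is missing, unreadable, or not an ELF object is reported as "not found", since readelf produces no section listing for it.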

    To make the elfutils unwinder implementation available for use by sampling profilers, I have been working on a tool called eu-stacktrace. Currently, a proof-of-concept version of eu-stacktrace integrates with a patched version of the Sysprof whole-system sampling profiler.

    In the following sections, I present the design of eu-stacktrace, compare the effectiveness of Sysprof with and without eu-stacktrace, and describe further goals for development. It is my hope that eu-stacktrace can help profiling tools on Linux to work reliably regardless of the presence or absence of frame pointers in the compiled applications.

    Implementation

    The prototype version of eu-stacktrace consists of a command line tool implemented in a branch of the elfutils source repository and a patchset for the Sysprof profiler.

    Sysprof is a whole-system profiler that uses the Linux kernel’s perf_events framework to periodically sample the processes and threads running on each CPU, recording a syscap file containing a stream of sample packets. The syscap file can then be visualized in Sysprof’s graphical interface.

    In its existing implementation, Sysprof invokes perf_events with the PERF_SAMPLE_CALLCHAIN option, which requests the kernel to analyze frame pointers to identify the sequence of program counters in the stack data of a process. To produce a stack trace from this sequence, Sysprof maps the program counters to function names via a simple post-processing pass that runs after the profile data has been captured. However, this method cannot be used to profile programs which were compiled without frame pointers.

    To use eu-stacktrace for stack unwinding, the patched version of Sysprof instead configures perf_events with the PERF_SAMPLE_STACK_USER option, which requests the kernel to return a fixed-size portion of the program’s stack data.

    The eu-stacktrace command line tool is launched concurrently with Sysprof and used as a helper process to unwind the stack samples.

    Sysprof sends stack sample packets to the eu-stacktrace process through a fifo. eu-stacktrace then retrieves any .eh_frame information available for the profiled programs, unwinds each stack sample to produce a sequence of program counters, and writes the program counters to the syscap file as a sample packet in the same format that Sysprof would generate in its default mode of operation. Sysprof’s post-processing pass works exactly as before, reading the syscap file and appending function information.

    The following command-line example clarifies how Sysprof and eu-stacktrace exchange data:

    mkfifo /tmp/stacktrace.fifo
    
    # eu-stacktrace reads from fifo, writes to test.syscap:
    eu-stacktrace </tmp/stacktrace.fifo >test.syscap &
    
    # sysprof writes sample packets to fifo during its profiling pass,
    # then appends to test.syscap during its annotation pass
    sysprof-cli --sample-stack --use-fifo=/tmp/stacktrace.fifo test.syscap
    

    However, the most convenient way to use Sysprof with eu-stacktrace is through the --use-stacktrace option, which will instruct the patched version of Sysprof to launch an eu-stacktrace process automatically:

    sysprof-cli --use-stacktrace test.syscap
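
    The data flow of the fifo example above can be tried without the patched tools. The following is a sketch only: sed and printf are placeholders for eu-stacktrace and sysprof-cli, the packet contents are made up, and the explicit wait before appending is a choice of this sketch rather than a statement about how Sysprof synchronizes with its helper.

```shell
#!/bin/sh
# Stand-in demo of the fifo data flow. The sed and printf commands below are
# placeholders for eu-stacktrace and sysprof-cli, not the real tools.
fifo_demo() {
    workdir=$(mktemp -d) || return 1
    fifo="$workdir/stacktrace.fifo"
    out="$workdir/test.out"
    mkfifo "$fifo" || return 1

    # Stand-in for eu-stacktrace: read raw packets from the fifo and write
    # the converted packets to the output file.
    sed 's/^raw-stack/callchain/' <"$fifo" >"$out" &
    reader=$!

    # Stand-in for the profiling pass: send sample packets through the fifo.
    # Closing the fifo afterwards gives the reader end-of-file.
    printf 'raw-stack 1\nraw-stack 2\n' >"$fifo"

    # Let the reader finish before appending, so the writes do not interleave.
    wait "$reader"

    # Stand-in for the annotation pass: append to the same output file.
    printf 'symbols\n' >>"$out"

    cat "$out"
    rm -r "$workdir"
}

fifo_demo
```

    The reader is started first and blocks until a writer opens the fifo, which is why the real example also launches eu-stacktrace in the background before starting sysprof-cli.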

    Estimate of effectiveness

    It’s important to check that the overhead of unwinding from call-frame information (CFI) with eu-stacktrace is not too large compared to Sysprof’s default mode of operation. If this overhead turns out to be in the same range as the performance loss from compiling programs with frame pointers, that would make a strong argument for re-enabling the frame pointer omission optimization once CFI unwinding is generally accessible to profilers.

    To give an initial idea of the CPU overhead of eu-stacktrace unwinding compared to Sysprof’s default mode of operation, I used Sysprof with and without eu-stacktrace to profile a system that was running the stress-ng "matrix" benchmark, invoked with stress-ng --matrix 0 -t 30s. On a system that was otherwise lightly loaded, using Sysprof with the default frame pointer profiling resulted in 0.09% of the samples coming from the sysprof-cli profiler process, while profiling with eu-stacktrace resulted in 1.18% of the samples coming from sysprof-cli and eu-stacktrace.

    The overhead of the elfutils unwinder scales with the number of distinct processes for which .eh_frame data needs to be processed, rather than with the number of samples. After launching several desktop applications and re-running the benchmark, the profiling overhead rose to 1.39% of the total samples.

    According to Fedora project discussions around the time frame pointers were being re-enabled in major distributions, slowdown due to frame pointers is reported to fall within the range of 0…2%. More extreme slowdowns have been observed for particular programs such as the Python interpreter, but are not ubiquitous.

    It is important to note that, unlike with overhead due to profiling, slowdown due to frame pointers occurs regardless of whether a particular system is being profiled or will ever need to be profiled. Thus, approximately 1% overhead with eu-stacktrace only during profiling is a reasonable tradeoff over 0…2% overhead for frame pointer inclusion on every system, all of the time. The overhead could be further reduced by making eu-stacktrace accessible via a library API rather than a fifo, at the cost of requiring more complex modifications to the profiling tools that use it.
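
    The tradeoff can be made concrete with a little arithmetic. The sketch below uses the sample shares measured above (0.09% and 1.18%); the 5% profiling-time fraction is an assumed figure chosen purely for illustration, and amortized_overhead is a hypothetical helper name.

```shell
#!/bin/sh
# Average overhead, in percent, of a cost that is paid only while profiling.
# $1: overhead while profiling, in percent
# $2: fraction of time the system is being profiled (hypothetical input)
amortized_overhead() {
    awk -v pct="$1" -v frac="$2" 'BEGIN { printf "%.3f\n", pct * frac }'
}

# Added profiler share of samples when unwinding with eu-stacktrace,
# in percentage points: 1.18% - 0.09%
awk 'BEGIN { printf "%.2f\n", 1.18 - 0.09 }'    # prints 1.09

# A machine profiled 5% of the time (an assumed figure, for illustration)
amortized_overhead 1.18 0.05                    # prints 0.059
```

    Under that assumption the time-averaged cost of profiling with eu-stacktrace is well below 0.1%, while any frame pointer slowdown applies for the full running time of every affected program.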

    Next steps

    At the time of writing, several tasks remain before eu-stacktrace can work off the shelf as a solution for profiling without frame pointers. In particular, additional fixes are needed to make the implementation portable across architectures (the current prototype works on x86_64 systems) and to handle executables within containers. More detailed benchmarking is also desirable to estimate the upper limit for the complexity of a workload that can be handled within a given target profiling overhead (e.g., less than 2%).

    After that, it will be feasible to integrate eu-stacktrace with other profiling tools beyond Sysprof. Currently, having eu-stacktrace interface with the profiling tool through a fifo ensures that the changes to the profiling tool will be as simple as possible.

    The eu-stacktrace prototype is available in a branch of the elfutils source repository, and the README describes how to build and test it with the currently required Sysprof patches.
