
Limitations of frame pointer unwinding

October 30, 2024
Serhei Makarov
Related topics:
Linux
Related products:
Red Hat Enterprise Linux

    Recent versions of commonly used Linux distributions, including Fedora and Ubuntu, have disabled the frame pointer omission optimization so that profiling tools can produce stack traces without needing a call-frame information interpreter. In this article, I will explain some overlooked limitations of unwinding with frame pointers and why enabling frame pointers does not constitute a full solution for profiling. I will also list some initiatives that aim to enable system-wide profiling without the need for frame pointers.

    Overview

    Several recent articles have discussed the interaction of frame pointer optimization defaults and profiling, including Guinevere Larsen’s overview of the issue, Will Cohen’s article on call-frame information and unwinding, and my own article on profiling frame pointer-less code with eu-stacktrace.

    In short, modern compilers can produce code either with or without a frame pointer register indicating the beginning of the current stack frame. With frame pointers enabled, the structure of the call stack is trivial to analyze; without frame pointers, an additional register becomes available for general computation. Since around 2011 with GCC version 4.6, the default has been to omit the frame pointer register, which means that debugging and profiling tools must use call-frame information to produce stack traces. In user space, call-frame information in the DWARF-based .eh_frame format is universally available.
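    The difference is easy to see directly in compiler output. A minimal sketch, assuming gcc is available (the register names in the comments are for x86-64; on aarch64 the frame pointer is x29):

    ```shell
    # A one-function translation unit to compile both ways
    cat > fp-demo.c <<'EOF'
    int add(int a, int b) { return a + b; }
    EOF

    # Frame pointer kept: the prologue saves and sets up the frame pointer
    # (on x86-64: pushq %rbp; movq %rsp, %rbp)
    gcc -O2 -fno-omit-frame-pointer -S -o with-fp.s fp-demo.c

    # Frame pointer omitted: the register is free for general computation
    gcc -O2 -fomit-frame-pointer -S -o no-fp.s fp-demo.c

    diff with-fp.s no-fp.s || true   # the prologue/epilogue lines differ
    ```

    With the frame pointer kept, every frame begins with a saved caller frame pointer, so the stack forms a simple linked list that an unwinder can walk; without it, recovering the caller requires call-frame information.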

    Unfortunately, it has not been feasible to include a full interpreter for DWARF and .eh_frame bytecode in the Linux kernel. Thus, the kernel’s perf_events framework can only use frame pointer unwinding for user-space code, which has affected profiling tools based on this framework and made them non-functional on most Linux distributions. This led to widespread calls to recompile distributions with frame pointers enabled, in the hopes of enabling system-wide stack trace profiling based on perf_events.
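    perf does offer a workaround: it can copy a slice of the raw user stack into each sample and unwind it offline with a DWARF unwinder at perf report time, at a significant cost in overhead and data volume. A sketch of the two modes (`./myapp` is a placeholder for the profiled program):

    ```shell
    # Kernel-side frame-pointer unwinding: cheap, but the user-space code
    # must have been compiled with frame pointers enabled
    perf record -g --call-graph fp -- ./myapp

    # DWARF mode: copies up to 8 KiB of user stack per sample and unwinds
    # it offline using .eh_frame call-frame information
    perf record -g --call-graph dwarf,8192 -- ./myapp

    perf report
    ```

    The stack-copying mode works without frame pointers, but the per-sample copies inflate perf.data considerably and deep stacks may be truncated, which is why it is not a full substitute for in-kernel unwinding.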

    Unfortunately, there are several issues with frame pointer unwinding that have been overlooked in the recent discussions:

    1. Uneven distribution of performance gains and losses
    2. Function prologues and epilogues
    3. Assembly-code functions in libraries

    Uneven distribution of performance gains and losses

    First, the users most impacted by the slowdown due to frame pointers are different from the users who benefit from profiling-driven fixes. This creates a win-lose tradeoff that is difficult to resolve to everyone's satisfaction.

    In general, one group of users is concerned with performance losses on systems with large numbers of interacting components. Such systems can exhibit problems due to mistuning, which can be fixed for a large performance gain once a profiler is available.

    Another group of users is concerned with raw computational capacity and obtaining the maximum degree of optimization from their compiler. There is no low-hanging fruit for such users to find; all they get from re-enabling frame pointers is a 1-2% performance loss, which translates to the loss of roughly 1 or 2 years of compiler improvements.

    It is never good for an upstream project to be forced to decide which group of users is more important. This is especially true for projects whose user base is as large and diverse as that of a compiler or Linux distribution. Thus, there is a lot of motivation to develop a profiling solution that does not require frame pointer optimizations to be disabled.

    Function prologues and epilogues

    Second, the profiles produced by frame pointer unwinding will inevitably exhibit gaps around function prologues and epilogues and in procedure linkage table (PLT) sections. In these portions of an executable, the frame pointer register does not accurately reflect the current stack frame, which causes the frame pointer unwinder to skip the innermost function. In particular, this affects the validity of the profile for evaluating code-locality optimizations.

    A lower-bound estimate of the problem can be obtained by a method suggested by Will Cohen. The minimum size of an x86 function prologue is 8 bytes. We can use the following perf command to check the number of samples that fall into the first 8 bytes of a function and are thus guaranteed to have an inaccurate frame pointer:

    # perf report --sort=sample,symoff |    # sort samples by symbol+offset
        grep -E '0x[01234567]$' |           # first 8 bytes of a function only
        grep -v '\[k\]' | grep -v '@plt'    # user space only, excluding PLT stubs

    When tested on x86, this analysis showed about 5.2% of samples falling into the first 8 bytes of a function. Therefore, at least that proportion of samples will have an inaccurate stack trace when frame pointer unwinding is used. The actual proportion is likely to be greater, since compiler optimizations may expand the prologue with additional initialization code. Similarly, on aarch64, where the minimal function prologue size is 12 bytes, at least 6.0% of samples were found to be inaccurate. The frequent occurrence of samples early in a function may be a result of sampling being more likely to happen immediately after code is loaded into cache following a TLB miss.

    In any case, this means that even with frame pointers enabled, call-frame information is still required to obtain an accurate profile.

    Assembly-code functions in libraries

    Third, hand-written assembly-code functions in commonly used libraries, particularly the glibc string and memory manipulation functions, introduce another source of inaccuracy. Again, these assembly-code sections do not maintain the frame pointer register the way ordinary compiled function code does.

    In the best case, the frame pointer unwind will skip the caller of the assembly-code function. That is, if function f calls function g which calls strcpy, the resulting stack trace will claim that function f called strcpy directly. In the worst case, if the assembly-code function uses the frame pointer register for general-purpose computation, the unwind will not be able to proceed at all.

    On the other hand, a Call Frame Information (CFI) unwinder will be able to unwind the call correctly. Since around 2003, the glibc assembly code has been hand-annotated with CFI directives, and these document how the canonical frame address can be calculated relative to the stack pointer, or relative to a value spilled to memory, or otherwise. Any other library that includes similar annotations can enable accurate CFI unwinding of assembly-code.
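    This machinery can be inspected directly. The following sketch (assuming gcc and binutils are available) builds a frame pointer-less binary and dumps the decoded call-frame information that a CFI unwinder would interpret:

    ```shell
    cat > cfi-demo.c <<'EOF'
    int main(void) { return 0; }
    EOF

    # Even with the frame pointer omitted, the compiler emits .eh_frame CFI
    gcc -O2 -fomit-frame-pointer -o cfi-demo cfi-demo.c

    # Decode it: each row maps a range of program-counter values to a rule
    # for computing the CFA (canonical frame address), e.g. relative to the
    # stack pointer, with no frame pointer involved
    readelf --debug-dump=frames-interp cfi-demo | head -n 25
    ```

    Hand-written assembly gains the same unwindability when it carries `.cfi_startproc`/`.cfi_endproc` annotations, which is exactly what the glibc assembly routines provide.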

    Modifying the glibc assembly-code functions to support frame pointer unwinding, by making them stop using the frame pointer register for computation and imitate the frame pointer-enabled code emitted by a compiler, is not a likely prospect. In addition to the volume of work required, only a subset of glibc users would find such a change desirable.

    Alternatives to frame pointer unwinding

    Fortunately, there are signs that frame pointer enablement in current Linux distributions is only a stopgap measure. Several initiatives are underway, each of which would make it feasible to obtain profiles via perf_events without relying on a frame pointer unwinder:

    1. My own eu-stacktrace project was described in a prior article. As of the time of writing, an initial version has been merged upstream into elfutils release 0.192 and can be enabled by compiling elfutils with the --enable-stacktrace option, as described in the README.
    2. The SFrame project is a simplified call-frame information format with stronger efficiency guarantees than .eh_frame, albeit slightly less flexibility. As of the time of writing, a patchset to implement SFrame unwinding for perf_events is being reviewed for inclusion in the Linux kernel. After that, SFrame support will need to be added to elfutils and then major distributions will consider compiling their packages with .sframe sections by default.
    3. New generations of hardware will include shadow stack support. Shadow stacks are a security feature which uses hardware assistance to track the structure of the call stack and monitor its integrity. This would also allow stack traces to be obtained without relying on call-frame information or on frame pointers.
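    For the curious, recent binutils can already emit and inspect the SFrame format today. A sketch, assuming binutils 2.41 or newer with SFrame support on x86-64 or aarch64 (option spellings may vary by version):

    ```shell
    cat > demo.c <<'EOF'
    int main(void) { return 0; }
    EOF

    # Ask the assembler to generate a .sframe section alongside the code
    gcc -Wa,--gsframe -O2 -o sframe-demo demo.c

    # Decode it: a compact table of FDEs mapping PC ranges to CFA, frame
    # pointer, and return address recovery rules, designed for fast lookup
    readelf --sframe sframe-demo | head
    ```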

    Overall, the accuracy of a stack trace profile depends on a number of subtle details that are easy to overlook. Fortunately, the current projects seem to be on track to improve the quality of profile information over what has been available in the past.

    Last updated: November 6, 2024

    Related Posts

    • Get system-wide profiles of binaries without frame pointers

    • Frame pointers: Untangling the unwinding

    • Debuginfo is not just for debugging programs

    • Improving math performance in glibc
