Profile-guided optimization in Clang: Dealing with modified sources

July 6, 2020
Serge Guelton
Related topics:
Linux


    Profile-guided optimization (PGO) is a now-common compiler technique for improving the quality of generated code. In PGO (sometimes pronounced "pogo"), a first version of the binary is used to collect a profile, through instrumentation or sampling, and that profile is then used to guide a second compilation.

    Profile-guided optimization can help the compiler make better decisions, for instance, concerning inlining or block ordering. In some cases, the profile used to guide compilation is obsolete, meaning that it was collected from an earlier version of the source. For reasons that I will explain, supporting obsolete profiles can benefit large projects. It also puts the burden on the compiler implementation to detect and handle inconsistencies.

    This article focuses on how the Clang compiler implements PGO, and specifically, how it instruments binaries. We will look at what happens when Clang instruments source code during the compilation step to collect profile information during execution. Then, I'll introduce a real-world bug that demonstrates the pitfalls of the current approach to PGO.

    Note: To learn more about PGO for Clang, see the Clang Compiler User's Manual.

    Instrumenting code in Clang

    In Clang, the -fprofile-instr-generate flag instructs the compiler to instrument code at the source instruction level, and the -fprofile-generate flag instructs the compiler to instrument code at the LLVM intermediate representation (IR) level. Both approaches share a design philosophy, with some differences in granularity. Our topic is -fprofile-instr-generate, and the way that it interacts with source code changes between profiling and recompilation.
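
    To make the idea of instrumentation more concrete, here is a hand-written analog of what an instrumented function conceptually does: each region of interest bumps a counter, and the counter values are what end up in the profile. This is only a sketch. The function `classify`, the counter array, and its layout are hypothetical and are not what Clang actually emits.

```c
#include <stdint.h>

/* Hypothetical counters: one slot per source region we care about.
   Clang's real instrumentation uses its own runtime and data layout. */
enum { CNT_ENTRY, CNT_THEN, CNT_ELSE, CNT_COUNT };
static uint64_t classify_counters[CNT_COUNT];

/* Original logic: return 1 for non-negative input, 0 otherwise.
   The counter increments are what "instrumentation" adds. */
int classify(int x) {
    classify_counters[CNT_ENTRY]++;      /* function entered */
    if (x >= 0) {
        classify_counters[CNT_THEN]++;   /* "then" branch taken */
        return 1;
    }
    classify_counters[CNT_ELSE]++;       /* fall-through path taken */
    return 0;
}

/* Read a counter back, as a profile writer would at program exit. */
uint64_t classify_counter(int which) {
    return classify_counters[which];
}
```

    After calling `classify` with a few inputs, the counters record how often each path ran, which is the kind of information a later compilation can use to decide which path to favor.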

    Consider the following scenario:

    1. Compile a code sample (C0) with -fprofile-instr-generate.
    2. Run it to collect profile information (P0).
    3. Edit the C0 sample and turn it into a new version, C1.
    4. Compile C1 using the original P0 profile information.

    How Clang deals with code modification

    The scenario of using somewhat obsolete profile information might seem odd because we usually compile, profile, and recompile. The profiling step can be quite time-consuming, however. In some cases, it is tempting for big projects to provide downloadable profile information based on a source snapshot. Users can then use that profile to recompile the code without the pain of collecting a new profile every time. (The dotnet runtime takes this approach.)

    Furthermore, for projects with a high commit rate, it could be infeasible to provide profile information for each commit. As a result, slight changes to the code might not be reflected in the profile used for recompilation. So, how would Clang cope with that?

    The trivial answer of "compare checksums for the whole file" is not satisfying because a slight change would invalidate the profile for the whole compilation unit. But the actual mechanism relies on the same idea: On a per-function basis, compute a checksum of the abstract syntax tree (AST), based on the tree structure. That way, changing a function doesn't invalidate the profile information collected for other functions. Of course, this approach has limitations. Removing a call site changes the number of times the called function is called, and thus its hotness, without changing its checksum. But at least it prevents having profile information that points to code that no longer exists, and the other way around.
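
    The per-function idea can be sketched as follows. This is a minimal illustration under loud assumptions: a function is reduced to a flat list of hypothetical node kinds, and FNV-1a stands in for the MD5-based hash. None of the names below come from Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node kinds; Clang has its own, much richer, set. */
enum node_kind { NODE_IF = 1, NODE_CALL = 2, NODE_RETURN = 3 };

/* A function body flattened to the kinds of its AST nodes, in visit order. */
struct function_shape {
    const char *name;
    const enum node_kind *nodes;
    size_t node_count;
};

/* FNV-1a, 64-bit: a stand-in for the MD5-based hash mentioned in the text. */
uint64_t structural_hash(const struct function_shape *fn) {
    uint64_t h = 14695981039346656037ULL;    /* FNV offset basis */
    for (size_t i = 0; i < fn->node_count; i++) {
        h ^= (uint8_t)fn->nodes[i];
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

/* A recorded profile entry is usable only if the hash still matches. */
int profile_still_valid(const struct function_shape *fn,
                        uint64_t recorded_hash) {
    return structural_hash(fn) == recorded_hash;
}
```

    With this shape, editing one function changes only that function's hash, so `profile_still_valid` keeps returning 1 for every untouched function in the same file.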

    Currently, if such outdated profile information is used, the Clang compiler ignores it and prints a warning:

    > echo 'int main() { return 0; }' > a.c && clang -fprofile-instr-generate a.c && LLVM_PROFILE_FILE=a.profraw ./a.out
    > llvm-profdata merge -output=a.profdata a.profraw
    > printf '#include <stdio.h>\nint main() { if(1) puts("hello"); return 0; }' > a.c && clang -fprofile-instr-use=a.profdata a.c
    warning: profile data may be out of date: of 1 function, 1 has mismatched data
          that will be ignored [-Wprofile-instr-out-of-date]
    1 warning generated.
    

    When the improbable happens

    Recently, I was tasked with debugging a Clang segmentation fault (segfault), which was reported as Red Hat Bugzilla Bug 1827282. After debugging, I ended up with two functions having the same checksum:

    extern int bar;
    
    // first version
    void foo() {
        if (bar) { }
        if (bar) { }
        if (bar) { if (bar) { } }
    }
    
    // second version
    void foo() {
        if (bar) { }
        if (bar) { }
        if (bar) { if (bar) { if (bar) { } } }
    }
    

    That's a strange outcome because the checksum algorithm used in Clang relies on MD5, so the chance of having a collision should be very low. Did the improbable happen?

    It turns out that it didn't. The collision was due to a slight bug in the way the hashing was finalized, and we fixed it with a patch (D79961). Basically, when computing the hash, a buffer (a uint64_t) is filled. Once it's full, it is converted to an array of bytes and sent to the hashing routine. In the final step, however, the uint64_t was sent directly to the routine and implicitly converted to a uint8_t, so only part of the buffer was hashed, potentially ignoring the trailing nodes of the AST. The patch also adds a new test case that checks that a small function change is reflected in the hash value.
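
    The effect of that narrowing can be reproduced in a few lines. The sketch below is not Clang's code: it assumes one byte per node kind, packs the newest kind into the low byte of the buffer, and uses FNV-1a as a stand-in for MD5. Which nodes get lost depends on those packing choices, but the mechanism is the same: a partially filled 64-bit buffer is cut down to 8 bits before being hashed.

```c
#include <stddef.h>
#include <stdint.h>

enum { NODE_IF = 1 };   /* hypothetical node kind, one byte each */

/* FNV-1a over a byte range; a stand-in for the MD5 routine. */
static uint64_t hash_bytes(uint64_t h, const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Feed all eight bytes of the working buffer, least significant first. */
static uint64_t feed_word(uint64_t h, uint64_t working) {
    uint8_t bytes[8];
    for (int i = 0; i < 8; i++)
        bytes[i] = (uint8_t)(working >> (8 * i));
    return hash_bytes(h, bytes, sizeof bytes);
}

/* Hash a sequence of node kinds. With buggy_finalize set, the last,
   partially filled buffer is narrowed to one byte before hashing. */
uint64_t function_hash(const uint8_t *kinds, size_t count,
                       int buggy_finalize) {
    uint64_t h = 14695981039346656037ULL;
    uint64_t working = 0;
    size_t pending = 0;
    for (size_t i = 0; i < count; i++) {
        working = (working << 8) | kinds[i];
        if (++pending == 8) {           /* buffer full: flush it */
            h = feed_word(h, working);
            working = 0;
            pending = 0;
        }
    }
    if (pending > 0) {
        if (buggy_finalize) {
            uint8_t narrowed = (uint8_t)working;  /* keeps the low byte only */
            h = hash_bytes(h, &narrowed, 1);
        } else {
            h = feed_word(h, working);
        }
    }
    return h;
}
```

    Feeding this function four `NODE_IF` kinds and then five, loosely mirroring the two versions of `foo` above, gives the same value when `buggy_finalize` is 1 and different values when it is 0.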

    The patch works, but it changes the hash of most existing functions, namely every function that had more than one element in its last buffer. That is an important side effect because changing the hash invalidates most of the existing cached profiling information. Fortunately, the patch doesn't impact the typical "compile, profile, recompile" scenario, but it could be an issue for large build systems that pre-compute profile data for the client to download as part of the build process.

    Conclusion

    Clang and GCC both support using obsolete profile information to guide the compilation process. If a function body changes, obsolete information is ignored. This feature can be beneficial for large projects, where gathering profile information is costly. This puts an extra burden on the compiler implementation to detect and handle inconsistencies, which also increases the likelihood of a compiler bug.

    Last updated: June 30, 2020
