
How I experimented with PGO enabled LLVM in Fedora

November 7, 2023
Konrad Kleine
Related topics:
Linux
Related products:
Red Hat Enterprise Linux


    In this article, I will discuss the promising experiments I’ve done to implement and evaluate a PGO-enabled LLVM toolchain in Fedora. The official documentation for Profile-Guided Optimization (PGO) covers the technique and the differences between sampling and instrumentation. In my experiments, I’ve focused solely on instrumentation.

    It’s okay to think of PGO as a black box

    Applying PGO to any application can be quite an involved task, and there’s enough to consider even when thinking about PGO as a black box. If you have an application, a performance testing workload, and maybe even a validation workload, then you can try PGO. If you don’t have a validation workload, just split your performance testing workload in half. The idea is that you don’t overfit your application to just one set of inputs and outputs, namely the performance testing workload.

    PGO is essentially a feedback-driven recompilation of your application. Unlike compiling with -O2 or -O3 in clang, you cannot benefit from PGO unless you take your application through the following five phases (a minimal command-line sketch follows the list):

    • Phase 1: Turning on instrumentation that makes your application slower.

      Compile your application with PGO enabled; it is just a flag in clang. Note that this doesn’t turn on an optimization like -O2. When you execute your application now, it will output PGO files, so, if anything, this will make your application slower.

    • Phase 2: Gather feedback from your workload.

      Then, you can run the application with the performance testing workload and collect the PGO output. Don’t judge the performance of this run, because the instrumentation adds a fair bit of overhead.

    • Phase 3: Merge the feedback into one file.

      After that, you take the PGO output, which could be multiple files, and merge it into one file. Loosely speaking, if your application passes a certain code branch in two of the outputs, this phase will add up both counts and put more weight onto that code branch.

    • Phase 4: Recompile your application.

      Next, recompile your application by feeding the merged PGO output into the compile process. This helps clang make better decisions, for example when ordering basic blocks.

    • Phase 5: Validate the performance improvement.

      Finally, test your application against the performance testing workload and the validation workload and make sure you get good performance optimizations out of both.
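
    Here is a minimal command-line sketch of these five phases for a standalone application, using clang's instrumentation-based PGO flags. The file names (app.c, workload.txt, validation.txt) are illustrative and not taken from my experiments.

    # Phase 1: build with instrumentation; this adds profiling overhead,
    # it does not optimize anything by itself
    clang -O2 -fprofile-instr-generate app.c -o app-instrumented

    # Phase 2: run the performance testing workload; each run writes a raw profile
    LLVM_PROFILE_FILE="app-%p.profraw" ./app-instrumented < workload.txt

    # Phase 3: merge the raw profiles into one indexed profile
    llvm-profdata merge -o app.profdata app-*.profraw

    # Phase 4: recompile, feeding the merged profile back into clang
    clang -O2 -fprofile-instr-use=app.profdata app.c -o app-optimized

    # Phase 5: validate against both the performance testing and the validation workload
    time ./app-optimized < workload.txt
    time ./app-optimized < validation.txt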

    Applying the 5 phases to LLVM in Fedora

    The previous section discussed an application without going into further details. Let’s replace the word “application” with “any PGO-optimizable binary from the LLVM toolchain.” Why such complicated phrasing? LLVM is made up of sub-packages with many binaries, among which clang is one of the most prominent. In Fedora and Red Hat Enterprise Linux we build packages using a so-called “standalone build mode.” That is a rather antiquated build mode that dates back to when LLVM was organized in SVN as separate repositories. Nowadays, LLVM is a Git monorepo and the default way of building it is as one; in fact, some sub-projects are deprecating the standalone build mode upstream.

    Throughout my experiments, my reasons for keeping the build mode we have were twofold. First, I wanted to make my changes as unintrusive as possible to ease their potential adoption. Second, I wanted to compare the results against our existing LLVM toolchain.

    Most of my work circled around changing the following three repositories from which I built part of the LLVM toolchain:

    • https://src.fedoraproject.org/rpms/llvm
    • https://src.fedoraproject.org/rpms/lld
    • https://src.fedoraproject.org/rpms/clang

    I decided to build for one operating system and one architecture only, namely Fedora 37 on x86_64.

    I branched off from the aforementioned Fedora sources and created a branch in each repository to store my work. Then, in phase 1, I built the llvm, lld, and clang packages inside a Fedora Copr repository: kkleine/llvm-pgo-instrumented. Recall that a PGO-instrumented clang, when compiling an application, will produce the regular application binaries but also PGO data files.

    In phase 2, I created another Copr repository to run the application against a workload and gather PGO feedback. This is where things became interesting. For the clang application, a workload can be any other project that is compiled using clang. blender and chromium, for example, are just two of the packages that we build and ship in the Fedora Linux distribution, so why not just use them? After all, wouldn’t it be nice to build a package in Copr and get a sub-package with the PGO data inside, generated automatically just like *-debug packages are? Can this be done without modifying each package’s spec file individually, keeping them as they are? The answer to all of this is yes.

    Copr can build in different chroots, depending on which operating system and architecture you want. So I first created a new project called kkleine/profile-data-collection in Copr and smuggled in our instrumented LLVM toolchain:

    copr create \
      --chroot fedora-37-x86_64 \
      --repo copr://kkleine/llvm-pgo-instrumented \
      profile-data-collection

    Any package that I build in the kkleine/profile-data-collection project and that has a BuildRequires: clang tag in its spec file will now use the PGO-instrumented clang. For the sake of clarity, let’s assume the package I want to build is blender. But how do I get an automatic sub-package with PGO data inside? This is a bit more involved, but I’ll show you how it can be done. You need to modify the buildroot in which blender is built so that another package is already installed before the build starts.

    copr edit-chroot \
      --packages llvm-pgo-instrumentation-macros \
      kkleine/profile-data-collection/fedora-37-x86_64

    This installs a so-called llvm-pgo-instrumentation-macros package into the buildroot. What this does is best explained with the chromium package. Chromium takes a long time to build; seriously, it sometimes takes more than a day to finish. The number of clang or clang++ invocations along the way is enormous, and each compiled file exercises many of the same places in the clang code. The PGO information gathered in the produced files therefore overlaps a lot and can be greatly reduced by merging the files.

    In fact, my first naive approach to compiling chromium was to let it produce as many PGO files as needed and merge them into one only at the very end. As it turned out, some of our builders ran out of disk space because of the PGO files; in total, there was more than a gigabyte of PGO data on disk. Luckily, you can merge PGO files continuously so that they always fit on disk. For chromium, the PGO files about clang were reduced from 1.6 GB to 2 MB this way.
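
    As a hedged sketch of the continuous-merging idea (file names are illustrative, and at least one new .profraw is assumed to exist): fold freshly written raw profiles into one indexed profile and delete them right away, so raw data never piles up on disk.

    # Merge new raw profiles into the running merge target, then delete them
    if [ -e merged.profdata ]; then
        llvm-profdata merge -o merged.profdata.tmp merged.profdata *.profraw
    else
        llvm-profdata merge -o merged.profdata.tmp *.profraw
    fi
    mv merged.profdata.tmp merged.profdata
    rm -f *.profraw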

    The background merge job and RPM macros

    WARNING: I’m not going to explain every RPM macro that I’m using because I want to emphasize the workflow in general and only hint at the overall complexity of this endeavor.

    RPM spec files can have sections like %prep, %build, %install, or %changelog to organize the build process, from unpacking sources to compiling and installing files on the target system. RPM itself taps into this process and kills any remaining background job at the end of the %build section:

    %__spec_build_post   %{___build_post}
    %___build_post \
      RPM_EC=$?\
      for pid in $(jobs -p); do kill -9 ${pid} || continue; done\
      exit ${RPM_EC}\
    %{nil}

    In order to automatically start a background job that collects and merges PGO data, I override the existing __spec_build_pre macro from /usr/lib/rpm/macros by regenerating it with one line appended:

    # Adds a backslash to each line unless it is a %{nil} line,
    # so the regenerated multi-line macro body stays one definition
    function add_backslash() {
      sed '/^%{nil}/!s,$, \\,'
    }
    # Print the current definition of __spec_build_pre with its body ...
    rpm --eval "%%__spec_build_pre %{macrobody:__spec_build_pre}" \
      | add_backslash
    # ... and append the line that kicks off the PGO background job
    echo "%{?__llvm_pgo_instrumented_spec_build_pre}"

    The appended macro then starts the background job: an endlessly running process that uses inotifywait to watch for close_write events on files matching the regular expression .*\.profraw$ in a given directory. Here’s how the background job is started through RPM macros:

    # Where to store all raw PGO profiles
    %__pgo_profdir %{_builddir}/raw-pgo-profdata
    # Auxiliary PGO profile to which the background
    # job merges continuously
    %__pgo_background_merge_target %{_builddir}/%{name}.llvm.background.merge
    # Place where the background job stores its PID file
    %__pgo_pid_file /tmp/background-merge.pid
    %__llvm_pgo_instrumented_spec_build_pre \
    [ 0%{__llvm_pgo_subpackage} > 0 ] \\\
    && %{__pgo_env} \\\
    && /usr/lib/rpm/redhat/pgo-background-merge.sh \\\
       -d %{__pgo_profdir} \\\
       -f %{__pgo_background_merge_target} \\\
       -p %{__pgo_pid_file} & \

    As we’ve seen before, RPM is unfortunately not very accommodating when it comes to jobs running in the background of a package’s %build section, and there are good reasons for that. So in order to stop the background job myself, I write to a file that the background job continuously listens for: %{__pgo_shutdown_file}. Once the background job has shut down cleanly, it deletes its PID file %{__pgo_pid_file}, and that deletion is what the macro below waits for.

    # Overriding __spec_build_post macro from /usr/lib/rpm/macros
    %__spec_build_post \
      %{?__llvm_pgo_instrumented_spec_build_post} \
      %{___build_post}
    
    %__llvm_pgo_instrumented_spec_build_post    \
      if [ 0%{__llvm_pgo_subpackage} > 0 ]\
      then\
       echo 'please exit' > %{__pgo_shutdown_file};\
       [ -e %{__pgo_pid_file} ] && inotifywait -e delete_self %{__pgo_pid_file} || true;\
      fi\
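
    Putting these pieces together, the following is a minimal sketch of what such a background merge loop could look like. It is not the actual pgo-background-merge.sh; the option names mirror the macro invocation above, and the shutdown file path is a stand-in for %{__pgo_shutdown_file}.

    #!/usr/bin/env bash
    # Illustrative background merge loop: watch a directory for finished
    # .profraw files, fold them into one merge target, and exit (deleting
    # the PID file) once the shutdown file shows up.
    while getopts "d:f:p:" opt; do
        case "${opt}" in
            d) profdir="${OPTARG}" ;;   # %{__pgo_profdir}
            f) target="${OPTARG}"  ;;   # %{__pgo_background_merge_target}
            p) pidfile="${OPTARG}" ;;   # %{__pgo_pid_file}
        esac
    done
    shutdown_file="${profdir}/please-exit"   # stand-in for %{__pgo_shutdown_file}

    mkdir -p "${profdir}"
    echo "$$" > "${pidfile}"

    until [ -e "${shutdown_file}" ]; do
        # Block until something in the profile directory is closed after writing
        inotifywait -q -e close_write "${profdir}" > /dev/null 2>&1 || true
        # Collect any finished raw profiles (file names without spaces assumed)
        raw=$(find "${profdir}" -name '*.profraw')
        [ -z "${raw}" ] && continue
        if [ -e "${target}" ]; then
            llvm-profdata merge -o "${target}.tmp" "${target}" ${raw}
        else
            llvm-profdata merge -o "${target}.tmp" ${raw}
        fi
        mv "${target}.tmp" "${target}"
        rm -f ${raw}
    done

    # Deleting the PID file is what %__llvm_pgo_instrumented_spec_build_post
    # waits for with inotifywait -e delete_self
    rm -f "${pidfile}"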

    I use a technique similar to how debug information is automatically created as a sub-package without the spec file actually asking for it:

    # Generate profiledata packages for the compiler
    %__llvm_pgo_subpackage_template \
    %package -n %{name}-llvm-pgo-profdata \
    Summary: Indexed PGO profile data from %{name} package \
    %description -n %{name}-llvm-pgo-profdata \
    This package contains profiledata for llvm that was generated while \
    compiling %{name}. This can be used for doing Profile Guided Optimizations \
    (PGO) builds of llvm \
    %files -n %{name}-llvm-pgo-profdata \
    %{_libdir}/llvm-pgo-profdata/%{name}/%{name}.llvm.profdata \
    %{nil}

    Think of %{name}.llvm.profdata as the file to which we’ve continuously merged our PGO data in the background job.

    As you can see, the sub-package with PGO data will be called %{name}-llvm-pgo-profdata, where %{name} resolves to the Name: tag in the spec file (e.g., blender-llvm-pgo-profdata).
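
    One step is implicit here: the continuously merged file still has to end up at the path listed in %files above. A purely illustrative way to do that at %install time (hypothetical, not taken from the actual macro package) could be:

    # Hypothetical %install-time step: index the continuously merged profile
    # and place it where the %files list above expects it
    llvm-profdata merge -o %{name}.llvm.profdata %{__pgo_background_merge_target}
    install -D -m 0644 %{name}.llvm.profdata \
        %{buildroot}%{_libdir}/llvm-pgo-profdata/%{name}/%{name}.llvm.profdata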

    This completes phase 3: the background job that gathers and merges PGO data files is started at the beginning of %build and shut down in an orderly handshake at its end (write the shutdown file, then wait for the PID file to disappear).

    Recompilation of LLVM with PGO

    There’s only one thing left to do. What if I want to get and later use PGO data from multiple projects like blender and chromium together? This is possible by collecting all generated subpackages through BuildRequires: tags in another package called llvm-pgo-profdata. During the build of this llvm-pgo-profdata package, all profiles are merged into an indexed profile data file. The final llvm-pgo-profdata RPM then installs the indexed profile data file into a location from which a PGO optimized build of LLVM can read it. This PGO optimized build of the LLVM toolchain is done in a third Copr project called kkleine/llvm-pgo-optimized. The llvm.spec file contains these lines to pull in the profile data:

    %if %{with pgo_optimized_build}
    BuildRequires: llvm-pgo-profdata
    %endif
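
    The llvm-pgo-profdata package that this pulls in could itself look roughly like the following; this is a hedged sketch, not the actual spec file, and the BuildRequires selection is illustrative.

    # Pull in the per-package profiles produced above (illustrative selection)
    BuildRequires: blender-llvm-pgo-profdata
    BuildRequires: chromium-llvm-pgo-profdata

    %build
    # Merge all per-package indexed profiles into the single file that the
    # PGO-optimized LLVM build consumes via -DLLVM_PROFDATA_FILE
    llvm-profdata merge -o llvm-pgo.profdata \
        %{_libdir}/llvm-pgo-profdata/*/*.llvm.profdata

    %install
    install -D -m 0644 llvm-pgo.profdata \
        %{buildroot}%{_libdir}/llvm-pgo-profdata/llvm-pgo.profdata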

    Then when CMake is invoked, the only change needed is to pass along the profile data:

    %if %{with pgo_optimized_build}
        -DLLVM_PROFDATA_FILE=%{_libdir}/llvm-pgo-profdata/llvm-pgo.profdata \
    %endif

    Evaluation

    What I tested here is the LLVM shipped with Fedora Rawhide at the time (16.0.3) against the PGO-optimized LLVM 16.0.2 that I built.

    I tested this using the LLVM test suite:

    “The test-suite contains benchmark and test programs. The programs come with reference outputs so that their correctness can be checked. The suite comes with tools to collect metrics such as benchmark runtime, compilation time and code size.”

    In the evaluation, I kept an eye on the execution, compile, and link time:

    $ /root/test-suite/utils/compare.py --metric exec_time --metric compile_time --metric link_time --lhs-name 16.0.3 --rhs-name 16.0.2-pgo /root/rawhide/results.json vs /root/pgo/results.json
    Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign.test' has no metrics, skipping!
    Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign2.test' has no metrics, skipping!
    Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign.test' has no metrics, skipping!
    Warning: 'test-suite :: SingleSource/UnitTests/X86/x86-dyn_stack_alloc_realign2.test' has no metrics, skipping!
    Tests: 3052
    Metric: exec_time,compile_time,link_time
    
    Program                                       exec_time                    compile_time                  link_time
                                                  16.0.3    16.0.2-pgo diff    16.0.3       16.0.2-pgo diff  16.0.3    16.0.2-pgo diff
    920428-1.t                                      0.00      0.00        inf%   0.00         0.00             0.03      0.02     -27.8%
    pr17078-1.t                                     0.00      0.00        inf%   0.00         0.00             0.03      0.03      -4.2%
    enum-2.t                                        0.00      0.00        inf%   0.00         0.00             0.03      0.04      36.4%
    doloop-1.t                                      0.00      0.00        inf%   0.00         0.00             0.03      0.04      30.0%
    divconst-3.t                                    0.00      0.00        inf%   0.00         0.00             0.02      0.02     -17.9%
    pr81556.t                                       0.00      0.00        inf%   0.00         0.00             0.03      0.03      24.6%
    divcmp-4.t                                      0.00      0.00        inf%   0.00         0.00             0.03      0.04      13.9%
    20020307-1.t                                    0.00      0.00        inf%   0.00         0.00             0.03      0.02     -26.5%
    20020314-1.t                                    0.00      0.00        inf%   0.00         0.00             0.02      0.03      23.7%
    divcmp-3.t                                      0.00      0.00        inf%   0.00         0.00             0.03      0.03     -20.3%
    20020328-1.t                                    0.00      0.00        inf%   0.00         0.00             0.03      0.03       6.0%
    20020406-1.t                                    0.00      0.00        inf%   0.00         0.00             0.03      0.03      27.0%
    20020411-1.t                                    0.00      0.00        inf%   0.00         0.00             0.04      0.03     -20.1%
    complex-4.t                                     0.00      0.00        inf%   0.00         0.00             0.03      0.03       1.4%
    20020508-1.t                                    0.00      0.00        inf%   0.00         0.00             0.04      0.03     -14.0%
                               Geomean difference                      -100.0%                         -9.7%                       -1.2%
               exec_time                             compile_time                             link_time
    l/r           16.0.3     16.0.2-pgo         diff       16.0.3   16.0.2-pgo        diff       16.0.3   16.0.2-pgo         diff
    count  3034.000000    3034.000000    2401.000000  2505.000000  2505.000000  440.000000  2505.000000  2505.000000  2505.000000
    mean   1091.690748    1074.387911    inf          0.259116     0.225875    -0.077137    0.049104     0.048398     0.014828
    std    21120.154138   20962.649384  NaN           2.214408     1.988421     0.199779    0.032997     0.032546     0.237169
    min    0.000000       0.000000      -1.000000     0.000000     0.000000    -0.494005    0.017100     0.017500    -0.551422
    25%    0.000000       0.000000      -0.227273     0.000000     0.000000    -0.195129    0.029100     0.029100    -0.161290
    50%    0.001100       0.001100       0.000000     0.000000     0.000000    -0.110612    0.034300     0.033600    -0.010672
    75%    0.126725       0.123600       0.212121     0.000000     0.000000     0.011439    0.045700     0.044400     0.161049
    max    817849.818925  828252.719527  inf          74.697400    69.996700    0.844595    0.206500     0.227000     0.980296

    The most important line to look at is this:

    Geomean difference       -100.0%     -9.7%    -1.2%

    To interpret the results, one has to understand that the tested programs are too fast for their execution time to be measured, hence the inf%. The compile time, on the other hand, shows a performance improvement of 9.7% when going from LLVM 16.0.3 to the PGO-optimized LLVM 16.0.2. Link time also improved, by 1.2%.

    A close to 10% performance improvement for the compiler is quite good, given that I haven’t changed a single line in the compiler itself.

    Summary

    I previously mentioned our rather antiquated way of building llvm, clang, compiler-rt, and openmp. While it provides us with great turnaround times for releasing bug fixes, it has also caused us quite some trouble. That is why we are internally investigating building those packages as one. This would open up a range of opportunities, such as bootstrap builds, LTO across all packages, and PGO without the need for multiple Copr repositories. Note that Fedora and RHEL are built with Koji, not Copr.

    I hope you enjoyed reading this and are as excited as I am about the potential changes to the LLVM toolset.
