
Customize the compilation process with Clang: Making compromises

August 6, 2019
Serge Guelton
Related topics:
Linux


    In this two-part series, we're looking at the Clang compiler and various ways of customizing the compilation process. These articles are an expanded version of the presentation Merci le Compilo, given at CPPP in June.

    In part one, we looked at specific options for customization. And, in this article, we'll look at some examples of compromises and tradeoffs involved in different approaches.

    Making compromises

    Debug precision vs. size

    Increasing the accuracy of debug information leads to a bigger binary; conversely, decreasing it reduces the binary's size. You can control this behavior with:

    • -g1: Lower precision.
    • -g2: Default precision.
    • -g3: Higher precision.
    • -fdebug-macro: Also include debug information for macros.

    Recall that one can extract debug information to a separate file. Distributions like Fedora, Red Hat Enterprise Linux, or Debian do that and provide separate debug packages:

    • objcopy --only-keep-debug to extract debug information.
    • objcopy --compress-debug-sections to compress them.

    Unlike GCC, Clang makes no distinction between -g2 and -g3 on our test case:

    $ for g in 1 2 3 ""
      do
        printf "-g$g: \t" && curl $sq | clang -c -O2 -g$g -xc - -o- | wc -c
      done
    -g1   : 3168632
    -g2   : 7025488
    -g3   : 7025488
    -g    : 7025488
    

    Bonus: -fdebug-macro -g : 7167752

    Impact of the optimization level on compilation time

    One could expect that more optimization takes more time—that the compiler tries harder. The following experiment, however, invalidates this intuition:

    $ for O in 0 1 2 3
      do
      /usr/bin/time -f "-O$O: %e s" clang sqlite3.c -c -O$O
      done
    -O0: 22.15 s
    -O1: 24.02 s
    -O2: 22.68 s
    -O3: 22.36 s
    

    This is still understandable; many optimizations remove instructions from the code, which leads to smaller input and thus faster processing by later optimization steps.

    Accuracy vs. performance

    In some situations, it may be relevant to trade accuracy (of the computations) for performance. This is especially true for floating-point operations:

    • -ffp-contract=fast|on|off: Control floating-point expression contraction.
    • -ffast-math: Assume floating-point arithmetic is associative and that there are no NaNs, infinities, or denormalized numbers.
    • -freciprocal-math: Optimize division by a literal constant.
    • -Ofast: Equivalent to -O3 -ffast-math.

    The following example illustrates how the compiler can turn a (slow) division into a (faster) multiply:

    $ clang -xc - -o- -S -emit-llvm -O2 -freciprocal-math << EOF
    double rm(double x) {
      return x / 10.;
    }
    EOF
    define double @rm(double) {
      %2 = fmul arcp double %0, 1.000000e-01
      ret double %2
    }
    

    The next example shows that clang successfully vectorizes the sum of a vector of doubles: -Ofast allows it to reorder the additions and vectorize them, as the <2 x double> LLVM vector type in the output points out.

    $ clang -xc++ - -o- -S -emit-llvm -Ofast << EOF
    #include <numeric>
    #include <vector>
    using namespace std;
    double acc(vector<double> const& some)
    {
      return accumulate(
               some.begin(),
               some.end(),
               0.);
    }
    EOF
    ...
    %95 = fadd fast <2 x double> %94, %93
    ...
    

    Portability vs. performance

    A binary may either be generic for an architecture, say x86_64, or take advantage of some instruction set (e.g., AVX). Trading one for the other can provide a great performance boost, at the cost of constraining the binary to a specific processor family.

    • -march=native: Use all instructions available on the host architecture.
    • -mavx: Generate code that can use the AVX instruction set (even if it's not available on the host).

    The following code combines an architecture-specific feature, here the availability of fused multiply-add (FMA), with relaxation of floating-point accuracy:

    $ clang++ -O2 -S -o- -march=native -ffp-contract=fast -xc++ - << EOF
    double fma(double x, double y, double z) {
      return x + y * z;
    }
    EOF
    ...
    vfmadd213sd %xmm0, %xmm2, %xmm1
    

    Performance vs. security

    The clang compiler provides several sanitizers that perform runtime checking of various aspects of the program. Combined with a decent test suite, they are a good way to detect problems in one's program. It's usually considered a bad idea to ship software compiled with sanitizer flags, as they significantly impact performance; the impact is still lower than running Valgrind on an uninstrumented executable, though.

    • -fsanitize=address: Instrument memory accesses, adding out-of-bound checks.
    • -fsanitize=memory: Trace accesses to uninitialized values.
    • -fsanitize=undefined: Trace undefined behavior.
    • -fsanitize=thread: Detect data races in multi-threaded programs.

    To illustrate the impact of instrumentation, let's investigate the LLVM bitcode generated by the compilation of the following snippet:

    // mem.cpp
    #include <memory>
    double x(std::unique_ptr<double> y) {
      return *y;
    }
    
    $ clang++ -fsanitize=address mem.cpp -S -emit-llvm -o- -O2
    

    Right before the memory access through a getelementptr, a key is computed and looked up to determine the status of the referenced memory location. The code then branches on that check and either reports an error or carries on.

    ...
    %h = getelementptr inbounds %"class.std::unique_ptr", %"class.std::unique_ptr"* %y, i64 0, i32 0, i32 0, i32 0, i32 0, i32 0
    %1 = ptrtoint double** %h to i64
    %2 = lshr i64 %1, 3
    %3 = add i64 %2, 2147450880
    %4 = inttoptr i64 %3 to i8*
    %5 = load i8, i8* %4
    %6 = icmp ne i8 %5, 0
    br i1 %6, label %7, label %8
    
    ; <label>:7:
    call void @__asan_report_load8(i64 %1)
    call void asm sideeffect "", ""()
    unreachable
    
    ; <label>:8:
    %9 = load double*, double** %h, align 8
    
    from __future__ import

    The Clang compiler supports different versions of the C++ standard, so if you're working on a given codebase, you can control which language features you're allowed to use. This capability is especially important if you plan to have a codebase compilable by several toolchains: the language version is firm common ground.

    • -std=c++11/14/17: Choose your standard version.
    • -std=gnu11/...: Pick your poison, and allow the use of a GNU dialect.
    • -fcoroutines-ts: Enable an experimental Technical Specification (here, coroutines).

    Using the auto-completion feature bundled in clang itself, it is possible to list all supported standards:

    $ clang --autocomplete=-std=,
    ...
    c++2a
    ...
    cuda
    ...
    gnu1x
    ...
    iso9899:2011
    

    Control security features

    It's also possible to insert various kinds of countermeasures in the code to prevent basic attacks that exploit buffer overflows or return-oriented programming (ROP).

    • -fstack-protector: Add a stack canary to detect (some) stack smashing.
    • -fstack-protector-strong: Same as above, but applied to more functions.
    • -fstack-protector-all: Same as above, but applied to all functions. The stack probing is not particularly costly, but it does make your code slower and bigger.
    • -fsanitize=safe-stack: Split the stack into a safe stack and an unsafe stack, making it harder to smash the part that holds return addresses.
    • -fsanitize=cfi: Instrument the control flow to detect various situations where an attacker could hijack it. Several protection schemes exist; see the Control Flow Integrity documentation.

    Let's have a look at the flight of a (stack) canary:

    $ clang -O2 -fstack-protector-all -S \
    -o- -xc++ - << EOF
    #include <array>
    using namespace std;
    auto access(array<__int128_t, 10> a,
                unsigned i)
    {
      return a[i];
    }
    EOF
    ...
    cmpq    (%rsp), %rcx
    jne .LBB0_2
    popq    %rcx
    retq
    .LBB0_2:
    callq   __stack_chk_fail
    

    At the end of the function, -fstack-protector-all has inserted a comparison between the canary on the stack and its reference value, and __stack_chk_fail is called if the comparison fails.

    Feeding the compiler

    Intuitively, the more information the compiler has, the better it can apply its optimizations. You can either gather more information or provide compiler hints.

    Profile guided optimization (PGO)

    If you have a relevant sample use case for your application, and you're willing to optimize your application based on that sample, you can use Profile Guided Optimization (PGO).

    1. Compile the whole code base with -fprofile-generate. This generates extra code to record the functions and branches most frequently visited.
    2. Run the generated binaries on the use cases.
    3. Recompile your code with -fprofile-use.
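
The steps above can be sketched as commands (hypothetical file names; this uses the instrumentation-based -fprofile-instr-generate variant and llvm-profdata, and skips itself if those tools are absent):

```shell
# PGO round-trip sketch: instrument, run a sample workload, merge the
# raw profile, then recompile using the profile.
set -e
if ! command -v clang >/dev/null 2>&1 || ! command -v llvm-profdata >/dev/null 2>&1; then
  echo "clang/llvm-profdata not found; skipping"; exit 0
fi
cat > app.c <<'EOF'
int main(void) { int s = 0; for (int i = 0; i < 1000; i++) s += i; return s > 0 ? 0 : 1; }
EOF
clang -O2 -fprofile-instr-generate app.c -o app            # step 1: instrument
LLVM_PROFILE_FILE=app.profraw ./app                        # step 2: run the workload
llvm-profdata merge app.profraw -o app.profdata            # merge raw profiles
clang -O2 -fprofile-instr-use=app.profdata app.c -o app    # step 3: reuse the profile
echo "done"
```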

    Thanks to the information gathered, the compiler can better group and place functions and basic blocks, and it has better hints for optimizations like loop unrolling or inlining.

    Link time optimization (LTO)

    Back in the day, separate compilation was a requirement due to memory limitations. Nowadays, loading the whole program in memory during compilation can be a valid option: that's Link Time Optimization (LTO).

    • -flto=full: At link time, the whole program is optimized once more. The memory requirements are higher, but this opens up more optimization opportunities.
    • -flto=thin: For each function, extra summaries are computed, and the compiler makes its decisions based on these summaries. This lowers the memory requirements, at the expense of potentially missing some optimizations.

    As a fun fact, the -flto flag actually produces LLVM bitcode instead of an ELF object file:

    $ echo 'int foo(void) { return 0; }' | clang -flto -O2 -xc - -c -o foo.o
    $ file foo.o
    foo.o: LLVM bitcode
    

    Tuning optimization

    Some individual passes accept extra parameters to control threshold effects. Most notably:

    • -mllvm -inline-threshold=n : controls inlining.
    • -mllvm -unroll-threshold=n : controls unrolling.

    The greater the threshold, the more functions are inlined and the more loops are unrolled.

    Unfortunately, these settings apply to the whole compilation unit. For finer-grained control, one can rely on compiler directives, a.k.a. pragmas.

    The following pragmas are valid on a loop and control various aspects of loop optimization. Their effect is relatively straightforward: they control whether the compiler will unroll the given loop, and let you choose the unrolling factor.

    #pragma clang loop unroll(enable|full)
    #pragma clang loop unroll_count(8)
    

    It's also possible to get a targeted version of -ffp-contract using the following pragma. In that case, the specified contraction strategy applies only to the decorated statement, and the default mode (or the one specified on the command line) applies elsewhere.

    #pragma clang fp contract(fast)
    

    More pragmas are detailed in the Language Extensions documentation.

    Getting Compiler Feedback

    It's a well-known fact that compiling C++ can take, say, some time. Clang can provide detailed feedback on how much time it spent in each compilation step. The relevant flag is -ftime-report.

    To follow the optimization process in detail, it's also possible to ask for a verbose output of the optimization process, on a per-optimization basis, using the remark mechanism:

    • -Rpass=inline
    • -Rpass=unroll
    • -Rpass=loop-vectorize

    These flags tend to produce a lot of noise though:

    $ { clang -xc++ - -c \
      -O2 -Rpass=inline << EOF
    #include <numeric>
    #include <vector>
    using namespace std;
    double acc(vector<double> const& some)
    {
      return accumulate(
               some.begin(),
               some.end(),
               0.);
    }
    EOF
    } 2>&1 | c++filt
    ...
    ... remark: __gnu_cxx::__normal_iterator<double const*, std::vector<double, std::allocator<double> > >::__normal_iterator(double const* const&) \
    ... inlined into std::vector<double, std::allocator<double> >::begin() const with cost=-40 (threshold=337) [-Rpass=inline]
    

    Concluding words

    These two articles aimed to show that a compiler is a complex piece of software with many more levers of action than just a "make the code faster" one. Compilation speed, executable size, security, and a faster development process are just some of the many targets a compiler tries hard to cover.

    Exploring the various flags is an endless quest, but here's a worthy one:

    $ clang --autocomplete=- | wc -l  # Count the number of compiler options that Clang accepts
    3197
    

    And this doesn't even take into account all the low-level tuning that can be done on the LLVM level!

    Last updated: July 29, 2019
