Customize the compilation process with Clang: Optimization options

August 5, 2019
Serge Guelton
Related topics:
Linux

    When using C++, developers generally aim to keep a high level of abstraction without sacrificing performance. That's the famous motto of "zero-cost abstractions." Yet the C++ language actually doesn't give a lot of guarantees to developers in terms of performance. You can have the guarantee of copy-elision or compile-time evaluation, but key optimizations like inlining, unrolling, constant propagation or, dare I say, tail call elimination are subject to the goodwill of the standard's best friend: the compiler.
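
To make the distinction concrete, here is a minimal sketch contrasting something the language guarantees with something it leaves to the optimizer. The function names factorial and sum_to are illustrative and not part of the original talk.

```cpp
#include <cstdint>

// Guaranteed by the language: a constexpr function called inside a constant
// expression (here, a static_assert) is evaluated by the compiler.
constexpr std::uint64_t factorial(std::uint64_t n) {
  return n <= 1 ? 1 : n * factorial(n - 1);
}
static_assert(factorial(10) == 3628800, "computed at compile time");

// Not guaranteed: the recursive call is in tail position, but whether it is
// turned into a loop depends on the compiler and the optimization level.
std::uint64_t sum_to(std::uint64_t n, std::uint64_t acc = 0) {
  return n == 0 ? acc : sum_to(n - 1, acc + n);
}
```

Without optimization, sum_to is likely to consume one stack frame per call, so a very large n may exhaust the stack; an optimizing build will probably, but not necessarily, compile it to a loop.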

    This article focuses on the Clang compiler and the various flags it offers to customize the compilation process. I've tried to keep this from being a boring list, and it certainly is not an exhaustive one.

    This write-up is an expanded version of the talk "Merci le Compilo" given at CPPP on June 15, 2019.

    The clang version used is based on trunk, running on RHEL 7.

    Every now and then, I'll be using the SQLite Amalgamation C source as a large third-party code. Let's assume that the following line has been sourced:

    sq=https://raw.githubusercontent.com/azadkuh/sqlite-amalgamation/master/sqlite3.c
    

    Introduction: Stating goals

    The following source code is a relatively dumb version of a program that sums up numbers read from standard input. It's most likely memory bound, but there's still some processing going on:

    #include <iostream>
    int main(int argc, char** argv) {
      long s = 0;
      while (std::cin) {
        long tmp = 0;
        std::cin >> tmp;
        s += tmp;
      }
      std::cout << s << std::endl;
      return 0;
    }
    

    This is a relatively similar—but not equivalent—program written in Python. Python uses big integers by default so it behaves differently with respect to overflow, but it's enough for our purposes.

    import sys
    print(sum(int(x) for x in sys.stdin.readlines()))
    

    Let's take a dumb approach and measure the execution time of these two programs on a relatively large input set:

    $ seq 1000000 > numbers
    $ clang++ sum.cpp -o sum
    $ time ./sum < numbers
    0.61s user 0.01s system 94% cpu 0.659 total
    
    $ time python sum.py < numbers
    0.77s user 0.04s system 99% cpu 0.818 total
    

    The native code certainly is faster, but not by much. We can't draw too many conclusions from a single run, but there's at least one sure thing: The clang user has not specified their intent, so the compiler just generated a valid binary—this is thankfully a hard constraint—and didn't try to optimize it for whatever metric its user is interested in.

    Had the user wanted to optimize for execution speed, they should have specified that intent, say, through the -O2 flag:

    $ clang++ -O2 sum.cpp -o sum
    $ time ./sum < numbers
    0.34s user 0.00s system 99% cpu 0.348 total
    

    Multi-criteria optimization

    For a wide range of codebases, there's more to optimize for than just speed. Sometimes, you want to limit the size of the binary; sometimes, you're okay with trading speed for extra security. The goal also depends on where you are in the development life cycle: during code editing, you want a fast analysis of your code, and during bug tracking, you want as much debug information as possible.

     #
     ##                           #
     ##                           ##
     ##            ##             ##
     ##            ##             ##
     ##            ##             ##
     ##    ##      ##             ##
     ##    ##      ##      #      ##
     ##    ##      ##      ##     ##
    PERF  DEBUG   EDIT    SECU   SIZE
    

    Performance

    "I want the generated binary to run fast" is a very common request to the compiler, so the following flags are among the most used ones:

    • -O0: No optimization at all.
    • -O1: O1 = (O0 + O2)/(2). I scarcely use this flag.
    • -O2: Optimize as much as possible, without taking the risk of significantly increasing the binary size or degrading performance.
    • -O3: Optimize even more, trading binary size for speed, and sometimes making decisions that may negatively impact performance.
    • -O4: A myth. Clang accepts it, but it is currently the same as -O3.

    Bonus: -O3 -mllvm -polly activates polyhedral optimizations, if Clang was compiled with Polly support.

    Debug

    "I want to debug my code, I don't care about performance" is sadly a common request too :-/

    • -g: Include debug information.
    • -Og: Currently equivalent to -O1 in Clang, which is already a trade-off between performance and debuggability. It does not imply -g, so pass both to get debug information.

    For the curious ones, the following snippet verifies that debug information sections are actually generated when passing the -g flag:

    $ curl $sq | clang -xc -c -g - -o sq.o
    $ objdump -h sq.o | grep debug
      #  name            size      ...
       9 .debug_str      00012b2d  ...
      10 .debug_abbrev   0000038d  ...
      11 .debug_info     0005056c  ...
      12 .debug_ranges   00000240  ...
      13 .debug_macinfo  00000001  ...
      14 .debug_pubnames 0000c73a  ...
      15 .debug_pubtypes 00001068  ...
      19 .debug_line     00073402  ...
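
The trade-off between performance and debuggability can also be made per function rather than per build. The sketch below assumes Clang's optnone function attribute; the KEEP_DEBUGGABLE macro and the checksum function are hypothetical names chosen for illustration.

```cpp
// Assumption: Clang's `optnone` attribute keeps a single function unoptimized
// even when the rest of the file is built with -O2. Other compilers get an
// empty macro so the code still builds.
#if defined(__clang__)
#define KEEP_DEBUGGABLE __attribute__((optnone))
#else
#define KEEP_DEBUGGABLE
#endif

// Hypothetical function one might want to step through in a debugger.
KEEP_DEBUGGABLE unsigned checksum(const char* s) {
  unsigned h = 0;
  for (; *s != '\0'; ++s) {
    h = h * 31u + static_cast<unsigned char>(*s);
  }
  return h;
}
```

Combined with -g, this should keep local variables such as h observable in that one function while the rest of the program stays optimized.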
    

    Security

    "I want to protect my code from others, and from myself" is a request that is growing in importance these days. There aren't a lot of flags that impact security without impacting performance, but it's worth mentioning -D_FORTIFY_SOURCE=2. This picks a different declaration for a few functions, for example:

    $ clang -xc -c -O2 - -S -emit-llvm -o - -D_FORTIFY_SOURCE=2 << EOF
    #include <stdio.h>
    void foo(char *s) {
      printf(s, s);
    }
    EOF
    define void @foo(i8*) {
      %2 = tail call i32 (i32, i8*, ...) @__printf_chk(i32 1, i8* %0, i8* %0)
      ret void
    }
    

    The macro definition enables a hardened version of printf, namely __printf_chk, that also checks the number of variadic arguments.
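
The hardened function is a safety net; the call printf(s, s) above still passes caller-controlled data as the format string. A minimal sketch of the usual source-level precaution follows. The echo helper and the 64-byte buffer size are assumptions made for illustration.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats untrusted text through a constant format
// string, so conversion specifiers inside `s` are copied verbatim and never
// interpreted.
std::string echo(const char* s) {
  char buf[64];
  // snprintf never writes more than sizeof buf bytes; long input is truncated.
  std::snprintf(buf, sizeof buf, "%s", s);
  return std::string(buf);
}
```

With a constant format string, the compiler can also check the arguments against the format at compile time, which is not possible for printf(s, s).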

    Size

    "I want to do some kind of weight control over my binary" may be a valid requirement for some embedded systems. In that case, you can use:

    • -Os: Same as -O2 with extra code size optimization, including different parameters for transformations like inlining.
    • -Oz: Same as -Os with more size optimizations, at the price of less performance.

    Let's showcase the impact of these flags on the amalgamation binary:

    $ curl $sq|clang -xc - -O2 -c -o-|wc -c
    1488400
    $ curl $sq|clang -xc - -Os -c -o-|wc -c
    850696
    $ curl $sq|clang -xc - -Oz -c -o-|wc -c
    796976
    

    Editing

    The compiler also helps to produce better code through a bunch of warning and code-editing features:

    • -Wall: (Almost) all warnings.
    • -Werror[=...]: If you believe that a warning should be an error, you can selectively enable that feature, per warning.
    • -w: If you don't know what it does, you probably don't want to :-)
    • -Xclang -code-completion-at: An internal flag that IDEs can use to provide smart code completion, as in the following session.

    $ cat hello.cpp
    #include <iostream>
    int main(int argc, char**argv) {
      std::co
    $ clang++ -Xclang -code-completion-at=hello.cpp:3:10 -fsyntax-only hello.cpp
    COMPLETION: codecvt : codecvt<<#typename _InternT#>, <#typename _ExternT#>, <#typename _StateT#>>
    COMPLETION: codecvt_base : codecvt_base
    ...
    COMPLETION: cout : [#ostream#]cout
    

    In this case, clang outputs all identifiers starting with co available in namespace std.
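
The selective promotion offered by -Werror=... also has an in-source counterpart. The sketch below assumes Clang's diagnostic pragmas and guards them so that other compilers skip them; the scale function is a hypothetical example.

```cpp
// Assumption: under Clang, this region treats -Wunused-variable as an error,
// much like passing -Werror=unused-variable for these lines only.
#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic error "-Wunused-variable"
#endif

// Every local is used here, so the function compiles cleanly; an unused
// local inside this region would be expected to stop the build.
int scale(int x) {
  int factor = 3;
  return x * factor;
}

#if defined(__clang__)
#pragma clang diagnostic pop  // restore the command-line diagnostic settings
#endif
```

This can be useful when a stricter policy should apply to one module before it is rolled out to the whole codebase.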

    In the next article, we'll look at various compromises and tradeoffs involved in optimization, such as debug precision versus binary size, the impact of the optimization level on compilation time, and performance versus security. Stay tuned.

    Last updated: July 29, 2019
