
Customize the compilation process with Clang: Optimization options

August 5, 2019
Serge Guelton
Related topics:
Linux

    When using C++, developers generally aim to keep a high level of abstraction without sacrificing performance: that's the famous motto of "zero-cost abstractions." Yet the C++ language actually doesn't give developers many guarantees in terms of performance. You get guaranteed copy elision and compile-time evaluation, but key optimizations like inlining, unrolling, constant propagation or, dare I say, tail call elimination are subject to the goodwill of the standard's best friend: the compiler.

    This article focuses on the Clang compiler and the various flags it offers to customize the compilation process. I've tried to keep this from being a boring list, and it certainly is not an exhaustive one.

    This write-up is an expanded version of the talk "Merci le Compilo" given at CPPP on June 15, 2019.

    The clang version used is based on trunk, running on RHEL 7.

    Every now and then, I'll use the SQLite amalgamation C source as a large piece of third-party code. Let's assume the following line has been sourced:

    sq=https://raw.githubusercontent.com/azadkuh/sqlite-amalgamation/master/sqlite3.c
    

    Introduction: Stating goals

    The following source code is a relatively dumb version of a program that sums up numbers read from standard input. It's most likely memory bound, but there's still some processing going on:

    #include <iostream>
    int main(int argc, char** argv) {
      long s = 0;
      while (std::cin) {
        long tmp = 0;
        std::cin >> tmp;
        s += tmp;
      }
      std::cout << s << std::endl;
      return 0;
    }
    

    Here is a similar (but not equivalent) program written in Python. Python uses arbitrary-precision integers by default, so it behaves differently with respect to overflow, but it's close enough for our purposes.

    import sys
    print(sum(int(x) for x in sys.stdin.readlines()))
    

    Let's take a dumb approach and measure the execution time of these two programs on a relatively large input set:

    $ seq 1000000 > numbers
    $ clang++ sum.cpp -o sum
    $ time ./sum < numbers
    0.61s user 0.01s system 94% cpu 0.659 total
    
    $ time python sum.py < numbers
    0.77s user 0.04s system 99% cpu 0.818 total
    

    The native code certainly is faster, but not by much. We can't draw too many conclusions from a single run, but there's at least one sure thing: The clang user has not specified their intent, so the compiler just generated a valid binary—this is thankfully a hard constraint—and didn't try to optimize it for whatever metric its user is interested in.

    Had the user wanted to optimize for execution speed, they should have specified that intent, say, through the -O2 flag:

    $ clang++ -O2 sum.cpp -o sum
    $ time ./sum < numbers
    0.34s user 0.00s system 99% cpu 0.348 total
    

    Multi-criteria optimization

    For a wide range of codebases, there's more to optimization than just speed. Sometimes you want to limit the size of the binary; sometimes you're okay with trading speed for extra security. The right trade-off also depends on where you are in the development life cycle: during code editing you want fast analysis of your code, during bug tracking you want as much debug information as possible, and so on.

     #
     ##                           #
     ##                           ##
     ##            ##             ##
     ##            ##             ##
     ##            ##             ##
     ##    ##      ##             ##
     ##    ##      ##      #      ##
     ##    ##      ##      ##     ##
    PERF  DEBUG   EDIT    SECU   SIZE
    

    Performance

    "I want the generated binary to run fast" is a very common request to the compiler, so the following flags are among the most used:

    • -O0: No optimization at all.
    • -O1: Halfway between -O0 and -O2. I scarcely use this flag.
    • -O2: Optimize as much as possible, without taking the risk of significantly increasing the binary size or degrading performance.
    • -O3: Optimize even more, trading binary size for speed, and sometimes making decisions that may negatively impact performance.
    • -O4: Equivalent to -O3. Anything beyond -O3 is a myth.

    Bonus: -O3 -mllvm -polly activates polyhedral optimizations, if Clang was compiled with Polly support.

    Debug

    "I want to debug my code, I don't care about performance" is sadly a common request too :-/

    • -g: Include debug information.
    • -Og: Essentially -O1, which is already a trade-off between performance and debuggability. You still want to pass -g alongside it to get debug information.

    For the curious ones, the following snippet verifies that debug information sections are actually generated when passing the -g flag:

    $ curl $sq | clang -xc -c -g - -o sq.o
    $ objdump -h sq.o | grep debug
      #  name            size      ...
       9 .debug_str      00012b2d  ...
      10 .debug_abbrev   0000038d  ...
      11 .debug_info     0005056c  ...
      12 .debug_ranges   00000240  ...
      13 .debug_macinfo  00000001  ...
      14 .debug_pubnames 0000c73a  ...
      15 .debug_pubtypes 00001068  ...
      19 .debug_line     00073402  ...
    

    Security

    "I want to protect my code from others, and from myself" is a request that's growing in importance these days. There aren't many flags that impact security without impacting performance, but -D_FORTIFY_SOURCE=2 is worth mentioning. This macro selects a different declaration for a few functions, for example:

    $ clang -xc -c -O2 - -S -emit-llvm -o - -D_FORTIFY_SOURCE=2 << EOF
    #include <stdio.h>
    void foo(char *s) {
      printf(s, s);
    }
    EOF
    define void @foo(i8*) {
      %2 = tail call i32 (i32, i8*, ...) @__printf_chk(i32 1, i8* %0, i8* %0)
      ret void
    }
    

    The macro definition enables a hardened version of printf, namely __printf_chk, which performs extra validation of the format string and its variadic arguments at run time.

    Size

    "I want some kind of weight control over my binary" may be a valid requirement for some embedded systems. In that case, you can use:

    • -Os: Same as -O2 with extra code size optimization, including different parameters for transformations like inlining.
    • -Oz: Same as -Os with more size optimizations, at the price of less performance.

    Let's showcase the impact of these flags on the amalgamation object file:

    $ curl $sq|clang -xc - -O2 -c -o-|wc -c
    1488400
    $ curl $sq|clang -xc - -Os -c -o-|wc -c
    850696
    $ curl $sq|clang -xc - -Oz -c -o-|wc -c
    796976
    

    Editing

    The compiler also helps to produce better code through a bunch of warning and code-editing features:

    • -Wall: (Almost) all warnings.
    • -Werror[=...]: If you believe that a warning should be an error, you can selectively enable that feature, per warning.
    • -w: If you don't know what it does, you probably don't want to :-)
    • -Xclang -code-completion-at: An internal flag that can be used by IDEs to provide smart code completion.
    $ cat hello.cpp
    #include <iostream>
    int main(int argc, char**argv) {
      std::co
    $ clang++ -Xclang -code-completion-at=hello.cpp:3:10 -fsyntax-only hello.cpp
    COMPLETION: codecvt : codecvt<<#typename _InternT#>, <#typename _ExternT#>, <#typename _StateT#>>
    COMPLETION: codecvt_base : codecvt_base
    ...
    COMPLETION: cout : [#ostream#]cout
    

    In this case, clang outputs all identifiers starting with co available in namespace std.
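    The selective -Werror= mechanism is easy to demonstrate; the warning used below, unused-variable, is one of those enabled by -Wall:

```shell
cat > warn.c << 'EOF'
int main(void) { int unused = 0; return 0; }
EOF
# -Wall only reports a warning; the compilation still succeeds
clang -Wall -c warn.c -o /dev/null
# Promoting that single warning to an error makes the compilation fail
clang -Wall -Werror=unused-variable -c warn.c -o /dev/null \
  || echo "unused-variable promoted to an error"
```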

    In the next article, we'll look at various compromises and tradeoffs involved in optimization, such as debug precision versus binary size, the impact of the optimization level on compilation time, and performance versus security. Stay tuned.

    Last updated: July 29, 2019
