Advantages of a multicompiler build

For a multitude of reasons, developers usually compile the project they are working on with only one compiler. On Red Hat Enterprise Linux 8, the system compiler for C and C++ is the GNU Compiler Collection (GCC) 8, and newer versions are available through the GCC toolset.

However, there are several reasons why you might also build your project with Clang. Red Hat Enterprise Linux 8 offers the LLVM toolset, which contains Clang.

In this article, we'll take a look at why one might use more than one compiler. We'll focus on a system where GCC is currently the default compiler and consider Clang as the main alternative.

Detecting compiler-specific behavior

When you try to compile a project with a compiler it does not usually use, one of the most frequent problems is that the project makes assumptions about the compiler in use. These assumptions can appear in many places within a project, from command-line options to supported features to compiler extensions.

Generally, all of those compiler-specific features are to be avoided, unless the project explicitly supports only compilers that offer nonstandard features. If you depend on a particular compiler or nonstandard features, that dependency should be documented in the project's build documentation and ideally enforced by the build system.

To ensure that a project keeps building with other compilers, it is useful to build it regularly with a different compiler. Such a build can also uncover reliance on implementation-defined or undefined behavior and, in rare cases, compiler bugs.

Other potential problems include differences in supported attributes (written as __attribute__((foo)) or, in C++, as [[attribute-name]]), differences in supported language standards, and the use of compiler built-ins that the other compiler does not implement.

Getting different error messages from multiple compilers

One of the biggest benefits of using a modern compiler is the warnings and error messages it can generate, many of which depend on the command-line options developers pass. Fixing these warnings can result in better code quality, increased portability, and fewer bugs.

One common problem is that not all compilers accept the same command-line arguments, so projects have to check for them at configure time. Depending on the needs of the project, most end up maintaining a list of compiler options for each compiler they build with.

Because checking whether a compiler supports a command-line option is so common, most build systems have built-in ways of performing these checks.

CMake uses the check_c_compiler_flag() or check_cxx_compiler_flag() functions:

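include(CheckCCompilerFlag)
check_c_compiler_flag("-Werror=header-guard" CC_SUPPORTS_HEADER_GUARD)
check_c_compiler_flag("-Werror=logical-op" CC_SUPPORTS_LOGICAL_OP)
if(CC_SUPPORTS_LOGICAL_OP)  # add the flag only if the compiler accepted it
  add_compile_options(-Werror=logical-op)
endif()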

Meson uses the get_supported_arguments() function of the compiler object:

test_cflags = [
    '-Werror=header-guard',
    '-Werror=logical-op'
]

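cc = meson.get_compiler('c')
supported_flags = cc.get_supported_arguments(test_cflags)
# Apply only the flags this compiler accepted
add_project_arguments(supported_flags, language: 'c')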

There are many more build systems, of course, but they all provide a more-or-less elegant way of dealing with this issue.

So, having configured our build with checks for compiler options, we can have a successful build with both Clang and GCC, or potentially any other compiler.

The following code triggers a warning when built with GCC using the -Wlogical-op command-line option. Clang does not support that option and stays silent about the suspicious use of logical operators:

int main(int argc, char **argv) {
    if (argc > 0 && argc > 0) {
        return 1;
    }
    return 0;
}

GCC 10's output with -Wlogical-op looks like this:

test.c: In function ‘main’:
test.c:2:16: warning: logical ‘and’ of equal expressions [-Wlogical-op]
    2 |     if (argc > 0 && argc > 0) {
      |         ~~~~~~~~~^~~~~~~~~~~

Note: See this code on godbolt.org.

As already mentioned, Clang does not flag this (potential) problem. Conversely, if we include the following header file in our previous test, GCC stays silent about the typo in the header guard, while Clang points out the mistake. Include guards of this form are still common in both C and C++ code, and a typo like this one is easy to miss:

#ifndef __TEST_HEADER_H__
#define __TEST_HEADRE_H__

/* ... Code ...*/

#endif

Clang with the -Werror=header-guard option tells us about the broken header guard:

In file included from test.c:3:
./test.h:1:9: error: '__TEST_HEADER_H__' is used as a header guard here, followed by #define of a different macro [-Werror,-Wheader-guard]
#ifndef __TEST_HEADER_H__
        ^~~~~~~~~~~~~~~~~

./test.h:2:9: note: '__TEST_HEADRE_H__' is defined here; did you mean '__TEST_HEADER_H__'?
#define __TEST_HEADRE_H__
        ^~~~~~~~~~~~~~~~~
        __TEST_HEADER_H__

Note: See this code on godbolt.org.

GCC 10 does not detect this problem. The two differences shown in this section are only a small sample of what one could find.

In practice, as long as the compilers emit only warnings and not errors, their usefulness depends on a developer actually looking at the compiler output. That is why passing -Werror to the compiler during development is useful: it makes the compiler treat all warnings as errors, so they cannot be overlooked.

Static analysis

One particularly well-known part of Clang is its static analyzer. Static analysis examines certain aspects of a program "statically," that is, without running it. Because the analyzer runs separately from the regular build, the time it takes matters much less than compile time, so it can afford a more thorough analysis than the compiler's own checks.

Because it does not need manual intervention, static analysis is particularly useful in continuous integration (CI), where we can use it for every push to a repository. However, it is usually hard to keep code 100% clean of reports from the static analyzer. This is partly due to how thorough static analyzers are: They find many problems that compilers don't find, but some of these are false positives, caused by implicit invariants the analyzer doesn't know about. Encoding these assumptions in the form of assertions usually improves code clarity and also makes the analyzer happy.

Sanitizers

Clang comes with several sanitizers, which instrument the compiled program so that it checks for certain classes of errors at runtime. A sanitizer reports such a problem at the moment it occurs, with details about where it happened. This saves developers from having to rerun the program, for example in a debugger, just to find out what went wrong. Sanitizers also catch issues that don't abort the program immediately but cause problems later on, such as integer overflows or access to uninitialized data.

The most useful and common sanitizers supported in Clang are the following:

  • AddressSanitizer can be used to detect various memory problems such as out-of-bounds accesses, use-after-free, double or invalid frees, and null pointer dereferences. It can be enabled via -fsanitize=address.
  • MemorySanitizer can be used to detect access to uninitialized memory. It can be enabled via -fsanitize=memory.
  • ThreadSanitizer detects data races in multithreaded programs. It can be enabled via -fsanitize=thread.
  • UndefinedBehaviorSanitizer detects various kinds of undefined behavior. It can be enabled via -fsanitize=undefined.

GCC supports all of these sanitizers except MemorySanitizer. These are the major sanitizers, but both GCC and Clang support many more fine-grained checks, so it is worth checking their documentation for other useful sanitizers and their options.

If the software has a test suite (which it should), and that test suite is run in a continuous integration fashion (which it should be), compiling the test suite with some of the sanitizers mentioned in this section makes a lot of sense and does not increase run time as much as running the programs in Valgrind would.

It is also helpful for developers to keep one or another sanitizer enabled in their day-to-day builds, to catch errors with real-world data as early as possible.

For a more in-depth introduction to sanitizers and a comparison to the super useful Valgrind tool, check out Jan Kratochvil's recent article on the topic.

Clang fuzzer

A fuzzer is a tool that generates random input for a library under test. Fuzz testing is useful to find errors and crashes in any sort of file parser. Clang contains libFuzzer, which can be used for this sort of testing.

The fuzzer keeps providing new input to the library under test until it finds a bug, so in the best case, the fuzzer program runs indefinitely. An open-ended run like this does not fit naturally into automated testing (though libFuzzer's -max_total_time and -runs options can cap a run). It is still very valuable when developers run it manually.
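libFuzzer drives the code under test through a single entry point, LLVMFuzzerTestOneInput(), which it calls repeatedly with generated inputs. The following minimal sketch assumes a hypothetical parser function, count_pairs(), that counts key=value lines. It could be built with clang -g -fsanitize=fuzzer,address fuzz_pairs.c -o fuzz_pairs, where libFuzzer supplies main(), and started with ./fuzz_pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser: counts newline-separated lines that contain '=' after
 * at least one key byte. The buffer is NOT NUL-terminated, because fuzz
 * inputs are arbitrary bytes. */
static size_t count_pairs(const uint8_t *data, size_t size) {
    size_t pairs = 0;
    size_t line_start = 0;
    for (size_t i = 0; i <= size; i++) {
        if (i == size || data[i] == '\n') {
            /* Search the line [line_start, i) for '=' after the first byte. */
            for (size_t j = line_start + 1; j < i; j++) {
                if (data[j] == '=') {
                    pairs++;
                    break;
                }
            }
            line_start = i + 1;
        }
    }
    return pairs;
}

/* Entry point called by libFuzzer for every generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    size_t pairs = count_pairs(data, size);
    /* Invariant: each counted line needs at least two bytes ("k="). */
    assert(pairs <= size / 2);
    return 0;
}
```

The assertion in the entry point turns a violated invariant into a crash that libFuzzer reports and saves a reproducer for, just as it does for memory errors detected by AddressSanitizer.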

For an introduction to using Clang's fuzzer on RHEL with llvm-toolset, read this article by Tom Stellard.

Link-time optimized (LTO) builds

More and more distributions are switching to link-time optimized (LTO) builds. In these builds, the compiler does not emit native object code but its intermediate representation (IR). The IR is then processed at link time, which allows optimizations across translation units.

LTO also helps identify issues with symbol visibility by removing symbols that are unused and not explicitly marked as externally visible. This can help you avoid accidentally exporting symbols that are not meant for public consumption.
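A common way to make visibility explicit is to build with -fvisibility=hidden and mark only the public interface as visible. Here is a minimal sketch, assuming a hypothetical library with an export macro named MYLIB_API.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical export macro: with -fvisibility=hidden, only symbols marked
 * with this attribute are exported from the shared library. */
#define MYLIB_API __attribute__((visibility("default")))

/* Internal helper. It has external linkage so other files of the library can
 * call it, but it is not marked MYLIB_API, so -fvisibility=hidden keeps it
 * out of the shared library's exported symbols. */
int mylib_scale(int x) {
    /* Guard against signed overflow in the multiplication. */
    assert(x >= INT_MIN / 3 && x <= INT_MAX / 3);
    return 3 * x;
}

/* Public entry point: the only symbol this sketch intends to export. */
MYLIB_API int mylib_compute(int x) {
    return mylib_scale(x) + 1;
}
```

Suppose it is built with gcc -O2 -flto -fPIC -fvisibility=hidden -shared mylib.c -o libmylib.so. The dynamic symbol table shown by nm -D libmylib.so should then list mylib_compute but not mylib_scale, which LTO is also free to inline and drop.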

Both GCC and LLVM support LTO builds, as well as several configuration options to fit the needs of different use cases. For more details on these options and on the inner workings of the compilers during LTO builds, consult the GCC and LLVM documentation on this topic.

Control flow integrity in Clang

Clang's control flow integrity (CFI) is a special type of sanitizer that requires link-time optimization (LTO). You can enable it via -fsanitize=cfi, which must be combined with -flto and an explicit -fvisibility= option.

CFI instruments the compiled program to detect certain forms of undefined behavior that could let an attacker subvert the program's control flow, such as an indirect call through a function pointer of the wrong type. It terminates the program when it detects one. The CFI sanitizer supports different schemes, most of which are efficient enough to be enabled even in release builds. Google, for example, is known to do this on Android.

This is an example of security hardening that is available only on Clang right now.

Conclusion

Different compilers come with their own strengths and weaknesses. Testing your project with different compilers helps ensure that you do not rely on particular compiler-specific behaviors, or even on compiler bugs. Many open source projects already have a CI pipeline that leverages more than one compiler, configuration, or platform. That is ideal, and you should try to do the same for all of your projects where it makes sense. If you can't, it makes sense to at least use another compiler once in a while, or to integrate this practice into your local development workflow.

Using sanitizers in your test suite and static analyzers regularly (or even via a special build in CI) is a great way of finding bugs ahead of time. Again, different tools show different defects in your code. Carefully evaluating them pays off in the long run. Try out other tools and use the one you feel most comfortable with on a daily basis. But always keep other options in mind and automate what you can.


Last updated: May 5, 2021