Introduction
Since version 4.8, the C and C++ compilers of the GNU Compiler Collection have shipped with two built-in detectors, one for memory errors and one for data races, named Address Sanitizer and Thread Sanitizer.
This article intends to quickly walk you through the highlights of these two interesting tools.
Spotting common memory access errors ...
When instructed to compile a given program, the Address Sanitizer sub-system of GCC emits additional code to instrument the memory accesses performed during the program's execution. Later, when the resulting program is executed, the instrumentation code checks the validity of every single memory access that is performed.
If a memory access is deemed invalid, the execution of the program is aborted and a stack trace is printed on the standard error output of the process, pointing the user to the location of the faulty access in the source code of the program.
The kinds of invalid memory accesses that can be spotted today are:
- Out-of-bounds accesses for local, global, and heap objects
- Use-after-free accesses for heap objects
Other kinds of invalid memory accesses are being added to subsequent development versions of the Address Sanitizer sub-system of the compiler.
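To make the two kinds of errors listed above concrete, here is a minimal sketch in C. The file name demo.c, the DEMO_MAIN macro and the helper names read_heap_element and make_counter are made up for this illustration; only the -fsanitize=address option comes from GCC itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns element `index` of a heap array holding `count` ints
   (0, 10, 20, ...). The missing bounds check is deliberate:
   index >= count is an out-of-bounds read on a heap object. */
int read_heap_element(size_t count, size_t index)
{
    size_t i;
    int result;
    int *values;

    assert(count > 0);
    values = malloc(count * sizeof *values);
    if (values == NULL)
        return -1;
    for (i = 0; i < count; i++)
        values[i] = (int)(i * 10);
    result = values[index];     /* invalid when index >= count */
    free(values);
    return result;
}

/* Allocates a single int on the heap; the caller owns it and must free it. */
int *make_counter(int start)
{
    int *counter = malloc(sizeof *counter);

    if (counter != NULL)
        *counter = start;
    return counter;
}

/* Hypothetical driver: build with -DDEMO_MAIN to get a standalone program. */
#ifdef DEMO_MAIN
int main(int argc, char **argv)
{
    if (argc > 1 && strcmp(argv[1], "overflow") == 0) {
        /* Reads one element past the end of a 4-element heap array. */
        printf("%d\n", read_heap_element(4, 4));
    } else if (argc > 1 && strcmp(argv[1], "uaf") == 0) {
        int *counter = make_counter(42);

        if (counter == NULL)
            return 1;
        free(counter);
        printf("%d\n", *counter);   /* use-after-free on a heap object */
    } else {
        printf("%d\n", read_heap_element(4, 3));   /* valid access */
    }
    return 0;
}
#endif
```

Assuming the file is saved as demo.c, building it with something like gcc -fsanitize=address -g -DDEMO_MAIN demo.c -o demo and then running ./demo overflow or ./demo uaf should end with an Address Sanitizer report on standard error, while ./demo with no argument performs only valid accesses. The -g option is there so that the stack trace can refer to source lines.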
... while being relatively fast and lean
One of the neat aspects of Address Sanitizer is that it's pretty fast. A program instrumented with Address Sanitizer typically runs about twice as slow as its non-instrumented counterpart and typically consumes about 20% more memory. If you compare this to, for instance, available solutions based on binary translation, this is quite a feat!
This relative frugality makes programs instrumented with Address Sanitizer usable in environments where some other kinds of memory error detectors are impractical because of their memory and speed overhead.
Finding data race errors
The Thread Sanitizer sub-system of GCC generates different instrumentation code, which checks memory accesses for potential data races in multi-threaded contexts. At run-time, the resulting instrumented program is aborted when a data race is detected and, as with Address Sanitizer, a stack trace is emitted to the error output stream of the process to help the user find the source code location where the data race occurred.
At the moment, the run-time overhead of Thread Sanitizer is much higher than that of Address Sanitizer. In a typical setup, a program instrumented with Thread Sanitizer would consume at least five times more memory than its non-instrumented counterpart.
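As an illustration of the kind of bug Thread Sanitizer looks for, the following sketch in C has two threads increment the same counter, with or without a mutex. The file name race.c, the DEMO_MAIN macro and the names worker and run_workers are invented for this example; only the -fsanitize=thread option is GCC's own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000

static long shared_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increments the shared counter ITERATIONS times.
   The lock is taken only if the int pointed to by `arg` is non-zero. */
static void *worker(void *arg)
{
    int use_lock = *(const int *)arg;
    int i;

    for (i = 0; i < ITERATIONS; i++) {
        if (use_lock)
            pthread_mutex_lock(&counter_lock);
        shared_counter++;       /* data race when use_lock == 0 */
        if (use_lock)
            pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter value. */
long run_workers(int use_lock)
{
    pthread_t first, second;
    int rc;

    shared_counter = 0;
    rc = pthread_create(&first, NULL, worker, &use_lock);
    assert(rc == 0);
    rc = pthread_create(&second, NULL, worker, &use_lock);
    assert(rc == 0);
    pthread_join(first, NULL);
    pthread_join(second, NULL);
    return shared_counter;
}

/* Hypothetical driver: build with -DDEMO_MAIN to get a standalone program.
   Any command-line argument selects the unsynchronized (racy) variant. */
#ifdef DEMO_MAIN
int main(int argc, char **argv)
{
    (void)argv;
    printf("%ld\n", run_workers(argc > 1 ? 0 : 1));
    return 0;
}
#endif
```

Assuming the file is saved as race.c, a build along the lines of gcc -fsanitize=thread -g -DDEMO_MAIN race.c -o race -pthread followed by ./race racy should make Thread Sanitizer flag the unsynchronized increment, whereas plain ./race takes the mutex and should run without a report. Older GCC releases may additionally need -fPIE -pie for Thread Sanitizer builds; treat that as something to check against your compiler's documentation.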
Supported platforms
Address Sanitizer is currently available for GNU/Linux on Intel, Arm and Power architectures, as well as on some variants of Mac OS X. Thread Sanitizer, on the other hand, is available only on GNU/Linux for the x86_64 architecture at the moment.
Sharing code with the LLVM project
The instrumentation code emitted by the Address Sanitizer and Thread Sanitizer sub-systems of GCC calls functions that are implemented in run-time libraries maintained within the LLVM project. We believe this arrangement is a good example of fruitful collaboration between these two upstream compiler infrastructure projects.
Learning more about Address & Thread Sanitizers
Interested readers can find many sources of detailed information about these GCC features on the internet. Here is a short selection of these:
Last updated: August 30, 2016