GCC Undefined Behavior Sanitizer – ubsan
Not every software bug has as serious consequences as seen in the Ariane 5 rocket crash. Notwithstanding that, bugs cost software companies a lot of money every year and upset customers, users, and developers. Some bugs happen as a result of undefined behavior occurring in the program. Undefined behavior is a concept known especially from the C and C++ languages: the semantics of certain operations are undefined, and the compiler presumes that such operations never happen. For instance, using a non-static variable before it has been initialized is undefined. If undefined behavior occurs, the compiler is free to do anything. The application can produce wrong results, crash, or print the complete text of Proust’s oeuvre.
Luckily, there are ways to detect at least some of the undefined behavior in a program. The compiler can issue a warning at compile time, but only when it can statically detect some kind of wrongdoing. Often it cannot, and the checking has to take place at run time.
GCC recently (version 4.9) gained Undefined Behavior Sanitizer (ubsan), a run-time checker for the C and C++ languages. In order to check your program with ubsan, compile and link the program with the -fsanitize=undefined option. The instrumented binary then has to be executed; if ubsan detects any problem, it outputs a “runtime error:” message and in most cases continues executing the program. It is also possible to make these diagnostic messages abort the program; just use the option
At present, ubsan offers a handful of different checks. The simplest is probably integer division by zero sanitization: if a division by zero occurs, or INT_MIN / -1 is evaluated for signed types, a run-time error is issued. Checking of floating-point division by zero is off by default, but can be turned on with the -fsanitize=float-divide-by-zero command-line option.
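Ubsan inserts its division checks automatically; to make the condition it tests explicit, here is a minimal hand-written sketch. The helper name checked_div is hypothetical and this is not ubsan's implementation; it simply refuses the two cases described above before dividing, so the division itself is always defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: divides a by b only when the result is defined.
   Returns false for division by zero and for INT_MIN / -1, the two
   signed int cases described above. */
static bool checked_div(int a, int b, int *quotient)
{
    if (b == 0)
        return false;               /* division by zero */
    if (a == INT_MIN && b == -1)
        return false;               /* INT_MAX + 1 is not representable */
    *quotient = a / b;
    return true;
}
```

For example, checked_div(7, 2, &q) stores 3 in q, while checked_div(INT_MIN, -1, &q) returns false without dividing.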
Sanitization of the shift operation ensures that the result of a shift operation is not undefined. Note that what exactly is considered undefined differs slightly between C and C++, as well as between ISO C90 and C99. Generally, the right operand must not be negative and must be less than the width of the (promoted) left operand. An example of an invalid shift operation (with a 32-bit int) is the following:
int i = 23;
i <<= 32;
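The shift-count rule can be checked by hand before shifting. The sketch below is illustrative only: checked_shl is a hypothetical helper, and it uses an unsigned left operand so that only the count rule applies (signed left operands carry additional, language-dependent rules that are not shown). It assumes unsigned int has no padding bits, so its width is sizeof(unsigned int) * CHAR_BIT.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the count must be non-negative and smaller than
   the width of the (promoted) left operand, here unsigned int. */
static bool shift_count_ok(int count)
{
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    return count >= 0 && count < width;
}

/* Hypothetical helper: performs value << count only when it is defined. */
static bool checked_shl(unsigned int value, int count, unsigned int *result)
{
    if (!shift_count_ok(count))
        return false;
    *result = value << count;
    return true;
}
```

With a 32-bit unsigned int, checked_shl(23u, 32, &r) is rejected, mirroring the invalid example above, while checked_shl(23u, 4, &r) yields 368.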
One of the most important checks is signed integer overflow checking. Practice shows that this kind of undefined behavior is very common in real programs. Ubsan is able to check that the result of addition, subtraction, multiplication, and negation does not overflow in signed arithmetic. For instance, in the example below ubsan would issue a run-time error:
int i = INT_MIN;
int j = -i;
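A portable way to avoid the overflows ubsan reports is to test the operands before performing the operation. This is a minimal sketch with hypothetical helper names covering addition and negation; subtraction follows the same pattern as addition, and multiplication needs a more involved check that is not shown. The negation check assumes a two's complement int, where -INT_MIN is not representable.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *r only if the sum fits in int. */
static bool checked_add(int a, int b, int *r)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;               /* a + b would overflow */
    *r = a + b;
    return true;
}

/* Hypothetical helper: stores -a in *r unless a is INT_MIN, whose
   negation overflows on two's complement targets (the example above). */
static bool checked_neg(int a, int *r)
{
    if (a == INT_MIN)
        return false;
    *r = -a;
    return true;
}
```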
However, since the integer promotions have to be taken into account, the following snippet is valid:
signed char c = SCHAR_MAX;
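To illustrate why the promotions matter, the sketch below (our own illustration; promoted_increment is a hypothetical name) adds 1 to a signed char holding SCHAR_MAX. The operand is promoted to int before the addition, and SCHAR_MAX + 1 is representable in int, so there is no signed overflow for ubsan to report. Storing the result back into a signed char would be a conversion whose result is implementation-defined, not undefined.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: c is promoted to int before the addition, so
   SCHAR_MAX + 1 is computed in int, where it cannot overflow. */
static int promoted_increment(signed char c)
{
    return c + 1;
}
```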
Even a conversion of a floating-point value to an integer value can overflow. Such a case is not diagnosed by default, but can be enabled specifically with the
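Converting a double to int is only defined when the value, truncated toward zero, fits in int. A minimal sketch of checking this by hand follows; double_to_int is a hypothetical helper, and it assumes an int of at most 53 bits (for example 32 bits), so that both bounds are exactly representable in double.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: converts d to int only when the conversion is
   defined, i.e. d is not NaN and trunc(d) lies within [INT_MIN, INT_MAX]. */
static bool double_to_int(double d, int *out)
{
    if (isnan(d))
        return false;
    /* Exclusive bounds; exact in double for a 32-bit int. */
    if (!(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0))
        return false;
    *out = (int)d;                  /* truncates toward zero */
    return true;
}
```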
Ubsan also provides NULL pointer dereference checking. If a program tries to load from or store to a NULL pointer, a run-time error is displayed. Furthermore, the NULL pointer check handles even the case when a method is invoked on an object through a NULL pointer.
Instrumentation of __builtin_unreachable calls simply issues a run-time error any time a __builtin_unreachable call is reached in the program. Return statement instrumentation applies only to C++ programs: it triggers when the end of a non-void function is reached without actually returning a value.
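A typical use of __builtin_unreachable is marking the end of a switch that handles every enumerator. In the hypothetical sketch below, control should never fall out of the switch; if it ever did, for instance because an out-of-range value was passed in, the unreachable instrumentation would report it. Without the __builtin_unreachable call, the equivalent situation in C++ (falling off the end of a non-void function) is what the return statement check catches.

```c
#include <assert.h>

enum color { RED, GREEN, BLUE };

/* Hypothetical example: every enumerator returns, so the end of the
   function is never expected to be reached. */
static const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    __builtin_unreachable();
}
```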
Out-of-bounds access is one of the most serious mistakes. Ubsan can help here too, since it is able to instrument array accesses and detect out-of-bounds ones. Note that in C a pointer that points just past the end of an array is valid (as long as it is not dereferenced), and a single object is treated as a 1-element array. Bounds instrumentation works on variable length arrays (VLAs) as well, but flexible array members are not instrumented.
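The one-past-the-end rule is what makes the common "end pointer" loop idiom valid. In the hypothetical helper below, a + n is formed but never dereferenced, and passing the address of a single int with n = 1 is allowed because that object counts as a 1-element array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n ints using a one-past-the-end pointer as
   the loop bound. Forming a + n is valid; dereferencing it would not be. */
static int sum_range(const int *a, size_t n)
{
    int sum = 0;
    for (const int *p = a, *end = a + n; p != end; ++p)
        sum += *p;
    return sum;
}
```

For example, with int x = 7, sum_range(&x, 1) returns 7.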
Related to the above, VLA checking merely verifies that a VLA’s size is a positive integer.
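Since a VLA must have a positive size, code that takes the size from its caller can validate it before declaring the array. The function below is a hypothetical sketch (returning -1 as an error value is our own choice) that guards a VLA declaration this way.

```c
#include <assert.h>

/* Hypothetical helper: computes 0^2 + 1^2 + ... + (n-1)^2 using a VLA.
   n is validated first, since declaring int vla[n] with n <= 0 would be
   undefined. Returns -1 for a non-positive n. */
static long sum_of_squares(int n)
{
    if (n <= 0)
        return -1;
    int vla[n];
    for (int i = 0; i < n; ++i)
        vla[i] = i * i;
    long total = 0;
    for (int i = 0; i < n; ++i)
        total += vla[i];
    return total;
}
```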
Accessing memory through a misaligned pointer also results in undefined behavior. Ubsan checks the alignment of pointers when they are dereferenced. Calling a method or a constructor on an improperly aligned object is not valid either, and ubsan is able to detect this mistake as well.
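A frequent source of misaligned accesses is reading multi-byte values out of byte buffers by casting pointers. The sketch below shows the usual alternative, copying with memcpy, which has no alignment requirement; load_u32 and is_aligned_u32 are hypothetical names. The alignment test converts the pointer to uintptr_t, which works on common platforms but is implementation-defined in general.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint32_t at an arbitrary byte offset.
   Dereferencing (const uint32_t *)(buf + offset) could be misaligned;
   memcpy avoids the alignment requirement altogether. */
static uint32_t load_u32(const unsigned char *buf, size_t offset)
{
    uint32_t v;
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

/* Hypothetical helper: true if p is suitably aligned for uint32_t. */
static int is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```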
GCC provides two attributes that can be used to tell the compiler that a function either never gets NULL as an argument (the nonnull attribute) or never returns NULL (returns_nonnull). With this information, the compiler is able to optimize the code better. But if the function gets or returns a NULL pointer nevertheless, all bets are off. Ubsan’s nonnull attribute checking can be used to catch such mistakes.
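For reference, this is how the two attributes are attached to function declarations. Both functions are hypothetical examples. Calling count_char with a NULL first argument, or having default_name return NULL, would break the promise made to the compiler, which is the kind of mistake the nonnull attribute checking is meant to report.

```c
#include <assert.h>
#include <stddef.h>

/* nonnull(1): argument 1 is promised never to be NULL. */
static size_t count_char(const char *s, char c) __attribute__((nonnull(1)));

/* returns_nonnull: the result is promised never to be NULL. */
static const char *default_name(void) __attribute__((returns_nonnull));

/* Hypothetical function: counts occurrences of c in the string s. */
static size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

/* Hypothetical function: always returns a pointer to a string literal. */
static const char *default_name(void)
{
    return "unnamed";
}
```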
Yet another feature is bool and enum load checking, which makes sure that loading a value other than 0 or 1 from a boolean does not go unnoticed, and likewise loading a value of an enumerated type that lies outside the range of that type’s values.
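Invalid bool and enum values usually enter a program when raw data (bytes read from a file, integers from a protocol) is copied into such variables unchecked. A minimal sketch of converting that data safely follows; enum level and both helpers are hypothetical. Note that a C conversion to bool always yields 0 or 1; the danger lies in copying a raw byte such as 2 directly into a bool’s storage.

```c
#include <assert.h>
#include <stdbool.h>

enum level { LOW, MEDIUM, HIGH };

/* Hypothetical helper: produces a bool holding exactly 0 or 1 instead of
   copying the raw byte into a bool's storage. */
static bool byte_to_bool(unsigned char byte)
{
    return byte != 0;
}

/* Hypothetical helper: accepts v as an enum level only when it names one
   of the enumerators LOW..HIGH. */
static bool int_to_level(int v, enum level *out)
{
    if (v < LOW || v > HIGH)
        return false;
    *out = (enum level)v;
    return true;
}
```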
And more to come
Some features are currently under development. The first one is object size checking. This makes use of the __builtin_object_size function, which returns the number of bytes remaining in the object a pointer points to. Typically, compiler optimizations must be enabled for __builtin_object_size to work properly. If the compiler can determine the size of an object and the program accesses bytes outside it, a run-time error is issued.
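The idea behind object size checking can be sketched by hand: compare the size of an access with what __builtin_object_size reports. The macro and function below are hypothetical and are not how ubsan implements the check. With type 0, __builtin_object_size returns (size_t)-1 when the size cannot be determined (for example, at -O0 in less trivial cases), and in that case the sketch performs no check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: passes the size GCC knows for dst (or (size_t)-1
   if unknown) to the implementation. The argument of
   __builtin_object_size is not evaluated. */
#define CHECKED_MEMCPY(dst, src, n) \
    checked_memcpy_impl((dst), (src), (n), __builtin_object_size((dst), 0))

/* Hypothetical helper: refuses a copy that would write past the end of
   an object of known size. Returns 0 on success, -1 if refused. */
static int checked_memcpy_impl(void *dst, const void *src, size_t n,
                               size_t dst_size)
{
    if (dst_size != (size_t)-1 && n > dst_size)
        return -1;
    memcpy(dst, src, n);
    return 0;
}
```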
Finally, another feature that is currently in the works is virtual pointer checking. As the name suggests, it is intended for C++ programs and ought to verify that virtual pointers are in order; if they are not, the application is likely wrong and prone to fail.
With this work we attempted to make it possible to discover as many bugs in programs as possible. That said, ubsan can’t prove that a program does not contain any bugs. Yet, if used regularly, and especially together with -fsanitize=address, it has proved useful in hunting down creeping bugs.
We’re always interested in receiving your feedback and questions, so feel free to add a comment or drop us an email at RHELdevelop AT redhat DOT com or tweet!