Detecting memory management bugs with GCC 11

Memory management bugs are among the hardest to find in C and C++ programs, and are a favorite target of exploits. These errors are difficult to debug because they involve three distinct sites in a program that are often far apart and obscured by the use of pointers: memory allocation, the use of the allocated memory, and the release of memory back to the system by deallocation. In this two-part article, we'll look at GNU Compiler Collection (GCC) 11 enhancements that help detect the subset of these bugs that affect dynamically allocated memory. The enhancements discussed here have been made to the GCC core. Related improvements to the GCC static analyzer are covered by David Malcolm in his article Static analysis updates in GCC 11.

Throughout this article, I include links to the code examples on Compiler Explorer for those who would like to experiment. You will find the links above the source code of each example.

Note: Read the second half of this article: Detecting memory management bugs with GCC 11, Part 2: Deallocation functions.

Overview of memory allocation policies

Let's start with a quick breakdown of the major kinds of memory management bugs. C and C++ outline four broad kinds of storage allocation policies:

  • Automatic: This policy allocates objects on the stack of a function. With the notable exception of the nonstandard, discouraged, yet still widely used alloca() function, automatic objects are allocated when they are declared. They are then commonly referred to by name, except when they are passed to other functions by reference or when pointers are used to point to the elements of arrays. As the name implies, automatic objects are deallocated automatically, at the end of the block in which they are declared. Objects allocated by alloca() are deallocated on function return.
  • Dynamic: This policy allocates objects on the heap by an explicit call to an allocation function. To avoid memory exhaustion, dynamically allocated objects must be deallocated by an explicit call to a corresponding deallocation function.
  • Static: This policy allocates named objects that last for the duration of a program. They never go out of scope and so they are never deallocated during the program's execution.
  • Thread: Like static, but limited in duration to a single thread of execution. Objects in this policy are automatically deallocated at the termination of the thread in which they are created.
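To make the four policies concrete, here is a minimal sketch that uses one object of each kind in a single function. The names next_ticket, calls_total, and calls_thread are made up for illustration, and the _Thread_local keyword assumes a C11 compiler.

```c
#include <assert.h>
#include <stdlib.h>

static int calls_total;                 /* static: one object for the whole program */
static _Thread_local int calls_thread;  /* thread: one object per thread (C11) */

/* Returns a dynamically allocated int holding the running call count,
   or NULL if the allocation fails.  The caller releases it with free().  */
int *next_ticket (void)
{
  int ticket = ++calls_total;           /* automatic: ends with this block */
  ++calls_thread;
  assert (calls_thread <= calls_total); /* holds in a single-threaded program */

  int *boxed = malloc (sizeof *boxed);  /* dynamic: lives until free() */
  if (boxed != NULL)
    *boxed = ticket;
  return boxed;   /* returning &ticket instead would leave a dangling pointer */
}
```

Only the object behind boxed needs an explicit deallocation call; the other three are released without any action by the program.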

Of these four policies, the one that represents the most common, but also the most insidious, class of problems is dynamic allocation. This form of allocation is the subject of this two-part article. To be sure, plenty of bugs also have to do with automatic storage (think about uninitialized reads, or accessing a local variable after it has gone out of scope through a pointer obtained while it was still live), but we will talk about those another time.

New command-line options in GCC 11

Before diving into the details of the dynamic memory management bugs that GCC 11 can detect, let's quickly summarize the command-line options that control detection. All the options are enabled by default. Although they perform best with optimization enabled, they don't require it.

GCC 11 provides two new options and significantly enhances one that has been available for several releases:

  • -Wmismatched-dealloc controls warnings about mismatches between calls to general memory allocation and deallocation functions. This option is new in GCC 11.
  • -Wmismatched-new-delete controls warnings about mismatches between calls specifically to operator new() and operator delete() in C++. This option is also new in GCC 11.
  • -Wfree-nonheap-object controls warnings about invalid attempts to deallocate pointers that were not returned by dynamic allocation functions. This option has been enhanced in GCC 11.

Dynamic memory management functions

The best known dynamic memory management functions in C are calloc(), malloc(), realloc(), and free(). But they are not the only ones. In addition to these C89 functions, C11 introduced aligned_alloc(). POSIX adds a few of its own allocation functions to the mix, including strdup(), strndup(), and tempnam(), among others. C library implementations often provide their own extensions. For instance, FreeBSD, Linux, and Solaris all define a function named reallocarray() that's a hybrid between calloc() and realloc(). The pointer returned by all these allocation functions must be passed to free() to be deallocated.
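As a small illustration of that last point, the following sketch obtains memory from strdup(), resizes it with realloc(), and leaves the caller to release the result with free(). The helper name concat is hypothetical, and the feature-test macro assumes a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string made of A followed by B, or NULL on
   failure.  The caller releases the result with free().  */
char *concat (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);

  char *s = strdup (a);             /* POSIX allocation function */
  if (s == NULL)
    return NULL;

  char *t = realloc (s, strlen (a) + strlen (b) + 1);
  if (t == NULL)
    {
      free (s);                     /* realloc() failed, so s is still valid */
      return NULL;
    }

  strcat (t, b);
  return t;
}
```

Whichever of the two functions produced the final pointer, free() is the one call that releases it.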

Besides functions that dynamically allocate raw memory, several other standard APIs allocate and deallocate other resources. For instance, fopen(), fdopen(), and the POSIX open_memstream() create and initialize FILE objects that must then be disposed of by calling fclose(); the popen() function also creates FILEs, but those must be closed by calling pclose(). Similarly, the POSIX newlocale() and duplocale() functions create locales that must be destroyed by calling freelocale().
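The sketch below shows two of these groups side by side, assuming a POSIX system that provides open_memstream(). The stream goes back to fclose(), while the buffer that the stream filled in is raw memory and goes to free(). The helper name format_pair is made up for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for open_memstream() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats "<key>=<value>" into a new heap string by way of a memory stream.
   Returns NULL on failure.  The caller releases the result with free().  */
char *format_pair (const char *key, int value)
{
  assert (key != NULL);

  char *buf = NULL;
  size_t len = 0;
  FILE *out = open_memstream (&buf, &len);   /* creates a FILE object */
  if (out == NULL)
    return NULL;

  fprintf (out, "%s=%d", key, value);

  /* The stream is disposed of by fclose(), never by free().  */
  if (fclose (out) != 0)
    {
      free (buf);
      return NULL;
    }

  return buf;   /* raw memory: this pointer does go to free() */
}
```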

Finally, many third-party libraries and programs define their own functions either to allocate raw memory or to initialize objects of various types that reside in allocated memory. These functions usually return pointers to the objects to their clients. The simplest of these can be deallocated directly by calling free(), but most APIs rely on their clients to destroy and deallocate objects by "returning" them to the appropriate deallocation function.
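Here is a sketch of what such a pair might look like. The struct message type and the message_create() and message_destroy() functions are hypothetical and stand in for any library API of this shape. Because the object owns a second allocation, passing it directly to free() would leak the text buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library type that owns a second allocation.  */
struct message
{
  char *text;
  size_t length;
};

/* Allocator: returns a new message holding a copy of TEXT, or NULL.  */
struct message *message_create (const char *text)
{
  assert (text != NULL);

  struct message *m = malloc (sizeof *m);
  if (m == NULL)
    return NULL;

  m->length = strlen (text);
  m->text = malloc (m->length + 1);
  if (m->text == NULL)
    {
      free (m);
      return NULL;
    }

  memcpy (m->text, text, m->length + 1);
  return m;
}

/* Matching deallocator: releases the text buffer, then the object.  */
void message_destroy (struct message *m)
{
  if (m == NULL)
    return;
  free (m->text);
  free (m);
}
```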

All these groups of APIs share a common theme: the allocation function in each group returns a pointer that's used to access the object and, importantly, that must eventually be passed to the appropriate deallocation function in the same group. The result of malloc() must be passed to free(), and that of fopen() to fclose(). Therefore, passing the result of fopen() to free() is a bug, as is calling fclose() on a pointer returned from malloc(). In addition, in C++, the result of a given form of operator new()—either ordinary or array—must be deallocated by the corresponding form of operator delete(), but not by calling free() or realloc().

Matching allocations with deallocations

Calling the wrong deallocation function to release a resource allocated by an allocation function from a different group usually leads to memory corruption. The call might crash right there and then, sometimes even with a helpful message, or might return to the caller and crash sometime later, in an area unrelated to the invalid call. Or the deallocation function might not crash at all but instead overwrite some data, leading to unpredictable behavior at some later point. Naturally, we would like to detect and prevent these bugs, not just before they make it into a product release, but ideally during code development before they are committed into the code base. The challenge is how to let our tools—compilers or static analyzers—know which functions must be used to deallocate each of the objects allocated by other functions.

For a subset of standard functions, the semantics and the associations can be and often are baked into the tools themselves. For example, GCC knows the effects of the standard C and C++ dynamic memory management functions and which ones go with which, but it doesn't have the same knowledge of <stdio.h> functions such as fopen() and fclose(), or about implementation-defined extensions. Additionally, GCC knows nothing about user-defined functions.

Attribute malloc

Enter attribute malloc, or more accurately, an enhancement to it implemented in GCC 11. In its traditional form, the attribute takes no arguments and simply lets GCC know that the function it applies to returns dynamically allocated memory like malloc(). This property is used by GCC to make aliasing assumptions about the contents of the returned memory and emit more efficient code. GCC 11 extends attribute malloc to take one or two arguments: the name of the deallocation function to call to release the allocated object and, optionally, the positional argument number to which the pointer must be passed. The same allocation function can be paired with any number of deallocation functions. For example, the following declarations designate fclose() as the deallocator for fopen(), fdopen(), fmemopen(), and tmpfile(), and pclose() as the only deallocator for popen().

#include <stdio.h>   /* for FILE and size_t */

int fclose (FILE*);

int pclose (FILE*);

__attribute__ ((malloc (fclose, 1)))
FILE* fdopen (int, const char*);

__attribute__ ((malloc (fclose, 1)))
FILE* fopen (const char*, const char*);

__attribute__ ((malloc (fclose, 1)))
FILE* fmemopen (void *, size_t, const char *);

__attribute__ ((malloc (pclose, 1)))
FILE* popen (const char*, const char*);

__attribute__ ((malloc (fclose, 1)))
FILE* tmpfile (void);

Ideally, the declarations in <stdio.h> and other C library headers would be decorated with the attribute malloc as just shown. A patch for glibc on Linux was submitted but hasn't been approved yet. Until that happens, you can add the previous declarations to your own header to enable the same detection. The full patch can also be downloaded from sourceware.org.

Both GCC proper and the integrated static analyzer make use of the new attribute to issue similar warnings. The static analyzer detects a broader set of problems at the cost of increased compilation time.
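The same attribute can be applied to user-defined functions. The following is a minimal sketch rather than a recommended pattern: struct counter, counter_acquire(), counter_release(), and the two macros are hypothetical names. The guard assumes that only GCC 11 and later accept the argument form of the attribute, so the macros expand to nothing elsewhere. Both functions are kept out of line because inlining only one function of a pair can make a correct call look like a mismatch.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: only GCC 11 or later accepts attribute malloc with arguments.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 11
#  define DEALLOC_WITH(fn, argno) __attribute__ ((malloc, malloc (fn, argno)))
#  define OUT_OF_LINE __attribute__ ((noinline))
#else
#  define DEALLOC_WITH(fn, argno)
#  define OUT_OF_LINE
#endif

struct counter
{
  long value;
};

/* Declare the deallocator first so that the attribute can name it.  */
void counter_release (struct counter *c);

/* The result must be passed as argument 1 of counter_release().  */
DEALLOC_WITH (counter_release, 1)
struct counter *counter_acquire (long start);

OUT_OF_LINE struct counter *counter_acquire (long start)
{
  assert (start >= 0);

  struct counter *c = malloc (sizeof *c);
  if (c != NULL)
    c->value = start;
  return c;
}

OUT_OF_LINE void counter_release (struct counter *c)
{
  free (c);
}
```

With these declarations in scope, a caller that passes the result of counter_acquire() to free() or fclose() instead of counter_release() is the kind of code that -Wmismatched-dealloc is meant to flag.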

Detecting mismatched deallocations

The new attribute malloc is used by a number of warnings in GCC 11 to detect various memory management bugs. The -Wmismatched-dealloc option controls warnings about deallocation calls with arguments returned from mismatched allocation functions. For example, given the declarations in the previous section, the call to fclose() in the following function is diagnosed because the pointer passed to it was returned from an allocation function that's not associated with it: popen(). The popen_pclose example shows how this works:

void test_popen_fclose (void)
{
   FILE *f = popen ("/bin/ls", "r");
   // use f
   fclose (f);
}

The compiler warning is:

In function 'test_popen_fclose':
warning: 'fclose' called on pointer returned from a mismatched allocation function [-Wmismatched-dealloc]
21 | fclose (f);
   | ^~~~~~~~~~
note: returned from 'popen'
19 | FILE *f = popen ("/bin/ls", "r");
   | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
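For comparison, here is a sketch of a correctly paired version, in which the stream returned by popen() is closed with pclose(). The helper name count_output_lines is made up for this example, and the code assumes a POSIX system with a working shell.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#endif
#include <assert.h>
#include <stdio.h>

/* Runs COMMAND through the shell and counts the lines it prints.
   Returns -1 if the command could not be started.  */
long count_output_lines (const char *command)
{
  assert (command != NULL);

  FILE *f = popen (command, "r");
  if (f == NULL)
    return -1;

  long lines = 0;
  for (int ch; (ch = getc (f)) != EOF; )
    if (ch == '\n')
      ++lines;

  pclose (f);   /* matches popen(), unlike the fclose() call diagnosed above */
  return lines;
}
```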

Conclusion to Part 1

Look for the second half of this article, where I will describe more options for detecting dynamic allocation bugs. I will conclude with situations that can lead to false positive or false negative identifications.

Last updated: February 11, 2024