Continuing in the effort to detect common programming errors, the just-released GCC 8 contains a number of new warnings as well as enhancements to existing checkers to help find non-obvious bugs in C and C++ code. This article focuses on those that deal with inadvertent string truncation and discusses some of the approaches for avoiding the underlying problems. If you haven't read it, you might also want to read David Malcolm's article Usability improvements in GCC 8.
To use GCC 8 on RHEL, see How to install GCC 8 and Clang/LLVM 6 on Red Hat Enterprise Linux 7. GCC 8 is the default compiler in Red Hat Enterprise Linux 8 Beta.
Why Is String Truncation a Problem?
It is well-known why buffer overflow is dangerous: writing past the end of an object can overwrite data in adjacent storage, resulting in data corruption. In the most benign cases, the corruption can simply lead to incorrect behavior of the program. If the adjacent data is an address in the executable text segment, the corruption may be exploitable to gain control of the affected process, which can lead to a security vulnerability. (See CWE-119 for more on buffer overflow.)
But string truncation does not overwrite any data, so why is it a problem? Inadvertently truncating a string can be considered data corruption: it is the creation of a sequence of characters from which some of the trailing characters are unintentionally missing. String truncation can take one of two general forms. One results in a NUL-terminated string that is shorter than the sum of the lengths of the concatenated strings. The other results in a sequence of bytes not terminated by a NUL character: that is, the result is not a string. Using such a result where a string is expected is undefined. (See CWE-170 for more about weaknesses resulting from improper string termination.) The different kinds of truncation are caused by different functions and their detection is controlled by different warning options, both of which are enabled by -Wall.
GCC String Truncation Checkers
GCC has two checkers that detect string truncation bugs: -Wformat-truncation (first introduced in GCC 7) and -Wstringop-truncation (new in GCC 8). -Wformat-truncation detects truncation by the snprintf family of standard input/output functions, and -Wstringop-truncation detects the same problem by the strncat and strncpy functions. The warnings are closely related to but distinct from -Wformat-overflow and -Wstringop-overflow, which detect buffer overflow by the corresponding unbounded standard I/O functions and by string-modifying functions declared in <string.h>, respectively. All of these warnings, although conceptually simple, rely heavily on advanced data and control flow analyses performed by a number of optimization passes within GCC to maximize efficacy.
Forming Truncated Strings with snprintf
Forming a string that is shorter than expected is the most common kind of truncation. It typically results from calls to functions such as snprintf and strncat. The result is a valid string in the sense that it is properly terminated by the NUL character, but its length is less than expected because it is missing one or more trailing characters. For instance, if a string represents a name, truncating it may result in it matching a different name, as in the following example:
    char dirname[256];
    char filename[256];

    FILE* open_file (void)
    {
      char pathname[256];
      snprintf (pathname, sizeof pathname, "%s/%i/%s", dirname, getpid (), filename);
      return fopen (pathname, "w");
    }
If the concatenation of the five components of the pathname does not fit in 256 bytes, the result will not refer to the intended file. A dirname that is close to 256 characters long means the PID may end up truncated, and the file created would not just have the wrong name but it would also be in the wrong directory. If the file contains sensitive data, another process, possibly one controlled by a hacker, may be able to read or manipulate it in illicit ways. To help detect this problem, GCC diagnoses the snprintf call above with a message similar to this:
    warning: '%i' directive output may be truncated writing between 1 and 11 bytes into a region of size between 0 and 255 [-Wformat-truncation=]
       snprintf (pathname, sizeof pathname, "%s/%i/%s", dirname, getpid (), filename);
                                                ^~
    note: 'snprintf' output between 4 and 524 bytes into a destination of size 256
       snprintf (pathname, sizeof pathname, "%s/%i/%s", dirname, getpid (), filename);
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The warning uses string lengths and ranges of integer arguments to determine when truncation is possible. When the length of a string argument cannot be determined and the argument is an array of known size, the warning uses the size of the array as the worst-case estimate of the string length instead. In the instance above, the text of the warning indicates that if dirname were as few as 244 characters long, appending the slash and a very large PID value close to INT_MAX would result in truncating the PID, not to mention the final slash and filename. The phrasing "output may be truncated" indicates that truncation is possible but not inevitable. If the truncation were certain, the phrase "output truncated" would be used instead. The final note then gives the minimum and maximum number of characters that GCC has determined the function will write into the destination.
The -Wformat-truncation warning is not new in GCC 8, but thanks to some enhancements it is able to detect more instances of the problem than GCC 7. It detects so many that users tend to be surprised by the volume of warnings for their code. Some users have complained that they find the warning too noisy. The most common complaint is that it points out code where the result of truncation either is not used in ways that would cause it to misbehave (such as to print a message on the terminal or into a log file where it doesn't matter that a part of the string is cut off) or that the truncation is handled later (such as by the fopen function failing to open a file with a truncated name).
Such views neglect to consider that incomplete or truncated messages make program output difficult for users or (in the case of log files) for operators to interpret. In more severe cases (such as relying on fopen to fail to open a file), they tend to underestimate the downstream security risks.
However, despite best efforts, -Wformat-truncation is not free of real false positives (instances when, by design, the warning should not be issued but is as a result of GCC bugs or limitations). Although they are proving harder to avoid than we would like, none is due to inherent flaws in GCC architecture or design but rather due to limitations in various optimization passes. As frustrating (and time consuming) as false positives can be both for users and for GCC developers, in this case they help highlight possible code generation improvement opportunities that might otherwise go unnoticed. GCC developers are tracking these false positives along with the optimization opportunities and working toward solutions.
Irrespective of whether a given instance of the -Wformat-truncation warning indicates a possible bug in a program or is a false positive, it is best to avoid truncation. Since the purpose of snprintf is to prevent buffer overflow, we recommend developers assume that every non-trivial call to snprintf can result in truncation (otherwise, using the function would be unnecessary) and handle it appropriately. When GCC detects that truncation cannot happen, it will optimize the handling away, eliminating any overhead that might otherwise result.
Using snprintf Safely
One way to avoid truncation is to use snprintf to determine the size of the destination buffer before storing output in it, and allocate it dynamically so it is just large enough. This is done by calling the function twice: once with a null destination pointer and then again with a pointer to the allocated buffer. The buffer can be allocated either by malloc or, when its size is known to be small enough, as a variable-length array (use -Wvla-larger-than to detect excessively large VLAs), for example, like this:
    FILE* open_file (const char *dirname, const char *filename)
    {
      // First call: compute the required length without writing anything.
      errno = 0;
      int n = snprintf (0, 0, "%s/%i/%s", dirname, getpid (), filename);
      if (n < 0)
        {
          perror ("snprintf failed");
          abort ();
        }

      // Allocate exactly enough space for the output plus the NUL.
      errno = 0;
      char *pathname = (char *) malloc (n + 1);
      if (!pathname)
        {
          perror ("malloc failed");
          abort ();
        }

      // Second call: format into the allocated buffer.
      errno = 0;
      n = snprintf (pathname, n + 1, "%s/%i/%s", dirname, getpid (), filename);
      if (n < 0)
        {
          perror ("snprintf failed");
          abort ();
        }

      FILE *fp = fopen (pathname, "w");
      free (pathname);
      return fp;
    }
Avoiding Truncation with fprintf and open_memstream
Another solution is to use the POSIX open_memstream function to create a FILE object that, when used with other I/O functions, manages a dynamically allocated buffer that grows as necessary to fit all output. With this approach, it is of course necessary to handle running out of memory, for example:
    FILE* open_file (const char *dirname, const char *filename)
    {
      char *pathname;
      size_t pathsize;
      FILE *pathfp = open_memstream (&pathname, &pathsize);
      if (!pathfp)
        {
          perror ("open_memstream failed");
          abort ();
        }

      fprintf (pathfp, "%s/%i/%s", dirname, getpid (), filename);
      if (fclose (pathfp))
        {
          // Likely out of memory.
          perror ("fclose failed");
          abort ();
        }

      FILE *fp = fopen (pathname, "w");
      free (pathname);
      return fp;
    }
Avoiding Truncation with asprintf
Finally, the BSD and GNU function asprintf can safely be used to both dynamically allocate the buffer and format output in a single call. As with the open_memstream approach, the trade-off with asprintf is that callers must be prepared to handle the function's failure to allocate memory (that is, to detect and handle ENOMEM).
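As a rough illustration of the asprintf approach, the sketch below formats the same kind of pathname into a buffer that asprintf allocates. The helper name alloc_pathname and the choice to pass the PID as a parameter are assumptions made for this example, not part of the original code; the _GNU_SOURCE definition reflects what glibc needs to declare asprintf and may not be needed on BSD systems.

```c
#ifndef _GNU_SOURCE
#  define _GNU_SOURCE   /* asprintf is a GNU/BSD extension, not ISO C */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd "dirname/pid/filename" string,
   or NULL if asprintf fails (typically because memory ran out).
   The caller is responsible for calling free on the result.  */
char *
alloc_pathname (const char *dirname, long pid, const char *filename)
{
  assert (dirname && filename);

  char *pathname = NULL;
  if (asprintf (&pathname, "%s/%li/%s", dirname, pid, filename) < 0)
    return NULL;   /* do not rely on the value of pathname after a failure */

  return pathname;
}
```

A caller would check the result for NULL before passing it to fopen and free it afterward. Because the buffer is sized by asprintf itself, there is no fixed bound for the output to exceed.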
Handling Truncation When It Occurs
When avoiding truncation is not feasible, it needs to be handled. To handle snprintf truncation, the value returned from the function must be used to take some action. GCC looks to see whether the value is used in a meaningful way and avoids issuing the warning when it is. Note that it is not sufficient to assign the returned value to an otherwise unused variable. The simplest, though not necessarily the most appropriate, way to handle truncation is to abort, for example:
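    FILE* open_file (const char *dirname, const char *filename)
    {
      char pathname[256];
      int n = snprintf (pathname, sizeof pathname, "%s/%i/%s", dirname, getpid (), filename);
      if (n < 0)
        {
          perror ("snprintf failed");
          abort ();
        }

      // snprintf returns the length the full output would have had, not
      // counting the NUL, so a value equal to the buffer size also means
      // the output was truncated.
      if ((size_t)n >= sizeof pathname)
        {
          // errno is not set on truncation, so perror would be misleading.
          fputs ("pathname too long\n", stderr);
          abort ();
        }

      return fopen (pathname, "w");
    }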
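Where aborting is too drastic, the same check can report the truncation to the caller instead. The following is only a sketch under stated assumptions: the helper name format_pathname, the PID parameter, and the use of the POSIX ENAMETOOLONG error code are choices made for this illustration rather than anything prescribed by GCC.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "dirname/pid/filename" into BUF.
   Returns 0 on success.  Returns -1 on an output error or when the
   result does not fit; in the latter case errno is set to ENAMETOOLONG
   and BUF is reset to the empty string so a partial name is never used.  */
int
format_pathname (char *buf, size_t bufsize,
                 const char *dirname, long pid, const char *filename)
{
  assert (buf && bufsize > 0);

  int n = snprintf (buf, bufsize, "%s/%li/%s", dirname, pid, filename);
  if (n < 0)
    return -1;

  if ((size_t) n >= bufsize)
    {
      buf[0] = '\0';
      errno = ENAMETOOLONG;
      return -1;
    }

  return 0;
}
```

Since the return value of snprintf is tested, this is the kind of use the warning is meant to accept; the caller then decides whether to retry with a larger buffer, report an error, or give up.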
Forming Truncated Strings with strncat
String truncation can also occur as a result of calling strncat. The origins of strncat (and strncpy) can be traced to Version 7 UNIX, which was released in 1979 and in which functions were introduced to manipulate arrays of binary data not necessarily terminated by the NUL character, such as directory entries or encryption keys. Unlike the other functions discussed in this article, strncat is impossible to use safely even in the originally intended cases. To be used safely, the function would need to take as arguments not just the size of the remaining space in the destination but also the maximum number of characters to copy from a non-string. However, by providing only one size argument, it is impossible to avoid both buffer overflow and string truncation. Since preventing buffer overflow tends to be viewed as more important than preventing string truncation, GCC assumes the size argument refers to the remaining space in the destination buffer and expects safe calls to match the following pattern (see also the US-CERT article on strncpy() and strncat()):
strncat (dest, src, dest_size - strlen (dest) - 1);
Calls that have this form are not diagnosed. Other calls, such as those where the size is derived in some way from the size or length of the source string, are diagnosed by -Wstringop-overflow. That includes unsafe calls like
strncat (dest, src, strlen (src)); // incorrect - warning
and
strncat (dest, src, sizeof src); // incorrect - warning
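To make the accepted pattern concrete, here is a minimal sketch of a wrapper that bounds strncat by the space remaining in the destination and also tells the caller whether anything was cut off. The name append_string and its boolean result are assumptions for the sake of the example; it also assumes that both dest and src are NUL-terminated strings on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: appends the string SRC to the string in DEST,
   an array of DEST_SIZE bytes.  Returns true if all of SRC fit and
   false if the result was truncated.  */
bool
append_string (char *dest, size_t dest_size, const char *src)
{
  size_t used = strlen (dest);
  assert (used < dest_size);             /* DEST must already hold a string */

  /* Space left in DEST, not counting the terminating NUL that strncat
     always appends.  */
  size_t room = dest_size - used - 1;

  strncat (dest, src, room);
  return strlen (src) <= room;
}
```

The bound is derived only from the destination, which matches the form shown above, while the comparison against the source length is what detects the truncation.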
Forming Non-NUL-Terminated Sequences
An entirely different form of string truncation is one that can result from calls to strncpy. Unlike functions such as snprintf and strncat that always append a terminating NUL, when the source string passed to strncpy is longer than the size specified by the third argument, the function truncates the copy without appending a NUL to the end. The result is not a string in the C or C++ sense (which is defined as a NUL-terminated sequence of bytes in both languages) and, thus, it is not suitable as an argument to functions that expect one. It is a common error to call a string-handling function such as strlen with an argument that is not a NUL-terminated string, for example:
    FILE* open_file (const char *dirname, const char *filename)
    {
      char pathname[256];
      strncpy (pathname, dirname, sizeof pathname);
      strncat (pathname, "/", sizeof pathname);
      strncat (pathname, filename, sizeof pathname);
      return fopen (pathname, "w");
    }
If dirname is longer than 255 characters, the call to strncpy will copy the first 256 characters from it to pathname without adding a terminating NUL. The subsequent calls to strncat will then try to write the path separator and the contents of filename somewhere past the end of the pathname buffer. Where exactly that occurs depends on the contents of memory beyond the end of the buffer (the location of the first NUL byte). GCC helps detect these errors by diagnosing the code like this:
    warning: 'strncpy' specified bound 256 equals destination size [-Wstringop-truncation]
       strncpy (pathname, dirname, sizeof pathname);
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    warning: 'strncat' specified bound 256 equals destination size [-Wstringop-overflow=]
       strncat (pathname, filename, sizeof pathname);
       ^~~~~~~
The first warning is the one relevant to the case we are discussing. (Note that this warning may be suppressed due to GCC bug 82944.) The second warning is caused by improperly bounding the number of characters copied by strncat.
Using strncpy Safely
In general, it is not possible to avoid string truncation by strncpy except by sizing the destination to be at least a byte larger than the length of the source string. With that approach, however, using strncpy becomes unnecessary and the function can be avoided in favor of other APIs such as strcpy or (less preferably) memcpy. Much has been written about the problems with strncpy and we recommend avoiding it whenever possible. It is, however, worth keeping in mind that unlike other standard string-handling functions, strncpy always writes exactly as many characters as specified by the third argument; if the source string is shorter, the function fills the remaining bytes with NULs.
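As one possible shape for the alternative just described, the sketch below checks the source length against the destination size first and then copies with memcpy, so no truncated copy is ever produced. The helper name copy_string and its behavior on failure (leaving an empty string in the destination) are assumptions chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copies the string SRC into DEST, an array of
   DEST_SIZE bytes, only if it fits in full, including the NUL.
   Otherwise stores an empty string in DEST and returns false.  */
bool
copy_string (char *dest, size_t dest_size, const char *src)
{
  assert (dest && dest_size > 0);

  size_t len = strlen (src);
  if (len >= dest_size)
    {
      dest[0] = '\0';
      return false;
    }

  /* The length is already known, so copy it along with the NUL.  */
  memcpy (dest, src, len + 1);
  return true;
}
```

Unlike strncpy, this does not pad the rest of the destination with NULs, which matters only to callers that depend on that padding.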
Mitigating strncpy Truncation
Since it is not possible to avoid truncation by strncpy, when using other functions is not feasible, it is necessary to make sure the result of strncpy is properly NUL-terminated by inserting the NUL explicitly after strncpy has returned:
    char pathname[256];
    strncpy (pathname, dirname, sizeof pathname);
    pathname[sizeof pathname - 1] = '\0';
GCC tries to detect these uses and avoid issuing the warning when it can determine that the NUL is inserted before the array is used by a string-handling function. However, the simple approach outlined above suffers from the same problem as ignoring snprintf truncation and so, to be safe, the truncation should be detected and handled as discussed above. GCC 8 doesn't detect the missing handling in this case but future versions might.
Avoiding the possible buffer overflow in the subsequent strncat calls in the example is left as an exercise for the reader.
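One way to detect strncpy truncation, sketched here under the assumption that a boolean result is an acceptable way to report it, relies on the padding behavior of strncpy: if the source is shorter than the bound, the last byte of the destination ends up as a NUL, so a non-NUL last byte means the copy was cut short. The helper name copy_truncating is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copies at most DEST_SIZE - 1 characters of SRC
   into DEST and always NUL-terminates the result.  Returns true if all
   of SRC was copied and false if it was truncated.  */
bool
copy_truncating (char *dest, size_t dest_size, const char *src)
{
  assert (dest && dest_size > 0);

  strncpy (dest, src, dest_size);

  /* strncpy pads short copies with NULs, so a non-NUL final byte means
     SRC had at least DEST_SIZE characters.  */
  bool truncated = dest[dest_size - 1] != '\0';

  dest[dest_size - 1] = '\0';
  return !truncated;
}
```

The caller can then treat a false result the same way as snprintf truncation, for example by reporting an error rather than using the shortened string.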
-Wstringop-truncation is arguably less prone to false negatives than the other warnings discussed in this article, but perhaps even more so than -Wformat-truncation, it can be prone to false positives. That is because the originally intended and safe uses of strncpy are not always distinguishable from the unsafe ones. It was a necessary judgment call to decide whether to issue diagnostics in those cases. GCC developers decided to err on the side of caution and issue the warning on the basis that false positives are easy to suppress, especially by experienced programmers, in the correct and safe uses, while the false negatives would let mistakes by less experienced or less careful programmers go unnoticed. To help differentiate between the two sets of use cases and allow the false positives to be avoided, GCC 8 has introduced a new attribute for decorating arrays and pointers that need not be NUL-terminated. The name of the attribute is nonstring, and it is used by GCC to suppress select -Wstringop-truncation instances where the missing NUL is intended. It is important to note that since character arrays that are not NUL-terminated are not valid arguments to functions that expect strings (such as strlen or strcpy), using nonstring arrays with such functions is diagnosed.
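For illustration, a fixed-width field that is intentionally not NUL-terminated might be declared as in the sketch below. The struct record type, the set_tag function, and the NONSTRING macro are hypothetical names invented for this example; the macro simply expands to nothing on compilers that are not GCC 8 or later.

```c
#include <assert.h>
#include <string.h>

/* Use the attribute only where it is known to be supported.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 8
#  define NONSTRING __attribute__ ((nonstring))
#else
#  define NONSTRING
#endif

struct record
{
  /* Fixed-width field: padded with NULs when the tag is short, but not
     NUL-terminated when the tag fills all four bytes.  */
  char tag[4] NONSTRING;
};

/* Hypothetical helper: stores up to four characters of TAG in REC.  */
void
set_tag (struct record *rec, const char *tag)
{
  assert (rec && tag);
  /* The missing NUL is intended here, which is what nonstring conveys.  */
  strncpy (rec->tag, tag, sizeof rec->tag);
}
```

Code that reads such a field would use bounded functions such as memcmp or a printf precision like "%.4s" rather than strlen or strcpy.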