Most of us who write C and C++ code intuitively understand why some programs might behave differently when compiled with different compilers or different compiler options, or when running under particular conditions. We're also aware of the dangers of undefined behavior and usually try to avoid it in our code. But not all of us appreciate the many useful (and often unavoidable) aspects of undefined behavior. This article explains the kinds of behavior the C standard uses to categorize programs and their subtle, sometimes unexpected, impacts on compilers, libraries, other standards, and programmers themselves.

Programmers are rarely familiar with the subtle nuances of undefined behavior. Nor do programmers pay as close attention to the other kinds of behavior that C programs are subject to, or when that behavior might occur. In our discussions, we often use terms like "valid code" or "correct program" without having a shared understanding of their meanings.

I'm sometimes surprised that not even expert programmers, including C committee members, always agree on what the terms mean. Some don't know why the basic categories of program behavior were introduced into C in the first place, or what purpose they serve.

The standard is a treaty

A programming language standard outlines two sets of requirements: One that implementations must follow when translating programs, and another that programmers are subject to in order for implementations to fulfill their end of the bargain. In the words of the C Charter: "the standard is a treaty between implementor (sic) and programmer."

The standard details the rules and defines the outcomes when both sides play by those rules. However, it does not and cannot define what happens when either side steps outside those rules.

That distinction seems straightforward. Yet, for historical reasons, details of the C standard are a bit more nuanced, and reality makes them more complicated still. The next section of this article explains how the C standard categorizes programs, depending on the extent to which they follow the rules. I'll explain why these mechanisms were introduced, and what programmers should expect of programs that fall under the different categories.

Categories of programs by behavior

The original goal of the C standard committee in the early 1980s was to codify the set of rules in existing implementations. This standardization was possible for features in the original Kernighan and Ritchie C manual where compilers and runtime libraries all agreed on their interpretation, but became more difficult when the compilers and libraries diverged from it. Some diverged because the manual was ambiguous, others to provide extensions useful for their target environment or CPU.

The committee was careful to minimize disruption, both to existing implementations and to the large body of C code out there. But the committee did not want to diminish the value of conformance. Therefore, the committee introduced the notion of degrees of conformance. Although all implementations must conform to the standard, programs can choose to conform to varying degrees, and with varying guarantees for code portability.

To that end, the C standard recognizes several categories of programs. Each category in turn places progressively weaker requirements on implementations, and, as a result, weaker guarantees about programs' behavior and portability.

Before going into the details, a few words on terminology:

  • An implementation consists of a compiler and the standard C library. The C library rarely stands on its own but is usually incorporated into a larger library that implements some superset of C such as POSIX, or some proprietary operating system API.

  • By behavior, the standard means the observable output of a program: The data it writes to open streams (such as stdout and stderr), or the accesses it performs on volatile storage (both reading from it and writing to it).

    Formally, the term behavior is defined as external appearance or action. It's important to understand that this doesn't include the behavior that might be observed in a debugger, such as changing values of variables while stepping through a function, or in the output of a tracing tool attached to the program by some sort of interprocess communication mechanism.

  • Finally, the term portability is defined in the ISO vocabulary as the capability of a program to be executed on various types of data processing systems without converting the program to a different language and with little or no modification.

Although the term behavior is frequently used to describe the effects of a single coding construct, such as a subtraction expression or a call to a library function like strlen, the implications of a construct can extend to the entire program. This is an important and often misunderstood point: The consequences of a single coding construct have an impact not just on the constructs that follow but sometimes also (and perhaps unintuitively) constructs that come before it in the program source. I'll have more to say about this topic in the section on undefined behavior.

It's also important to understand that the categories described in this article exist solely for the purpose of, and within the context of, the C standard. The same program doesn't fall under one category when compiled with compiler A or running on operating system X and under a different category when compiled with B or running on Y. Conversely, the writers of A or X don't get to decide what category a class of programs might fall into when targeting A and X, while those of B and Y get to make a different choice for their implementation. The categorization of a program is completely determined by the requirements of the C standard, even though implementers may, and often do, provide stronger guarantees about program behavior.

The same is true for other specifications that incorporate C, such as POSIX. The difference is that implementers who provide stronger guarantees, as noted, don't change the categorization of C programs, whereas the authors of other specifications tend to "override" it for their own purposes. As a result, a program that falls into one category in C might fall into a different category under POSIX, one with stronger guarantees than the C category. (The converse isn't possible without conflicting with the C standard.)

Well-defined behavior

Ideally, a program is portable without change, not just to the environments intended for it, but to all possible environments, whether or not they exist in the wild, ever did, or might in the future. Given the same input, a portable program produces the same output everywhere it's compiled today, and will continue to do so in the future. Another way we can describe a portable program like this is to call it well-defined.

Because the fundamental purpose of a standard is to foster portability, the prime goal of the C standard is to describe the behavior of portable programs. The behavior of all other categories is a secondary concern.

Portable programs provide the strongest guarantees to their users. To ensure these guarantees, portable programs must:

  • Consist only of code with well-defined syntax and rely only on constructs with well-defined semantics.
  • Adhere to all the most stringent requirements of the C standard, and therefore:
    • Either make no unconditional uses of any optional features specified by the standard, or guard the use of each feature by the appropriate __STDC_XXX__ feature test macro.
    • Use no implementation extensions.
    • Stay within all minimum implementation limits specified by the standard.

Unspecified behavior

The standard describes a few dozen coding constructs that might result in more than one kind of behavior (Annex J.1 lists 63 distinct cases). A couple of examples are whether identical string literals have distinct addresses, or the order in which subexpressions are evaluated. Uses of these constructs are valid code that implementations might handle in different ways that can even change from one instance to another. Because the identified constructs are valid code, their handling doesn't extend to issuing an error, either during compilation or at runtime.

A program with unspecified behavior is a program that is correct according to the standard, but that contains one or more coding constructs with unspecified behavior. Such a program must run successfully on every implementation (i.e., it must not crash—or trap, in the parlance of the standard—or otherwise misbehave), but could have different output from one execution to another.

For example, the behavior (specifically the result) of the following program is unspecified. It might return either 0 or 1, depending on whether or not the compiler merges the two empty strings into a single instance:

int main (void)
{
  const char *s = "";

  return s == "";
}

In theory, the program could also return 0 the first time it runs and 1 the second time, although in practice that's very unlikely.

Not every program that contains a coding construct with unspecified behavior necessarily has unspecified behavior itself. Programs have unspecified behavior only when their output depends in an observable way on the unspecified effects of the construct.

Implementation-defined behavior

Besides constructs with unspecified behavior, the standard also describes a number of others that might result in more than one observable behavior, but where implementations are required to document how each instance is handled. Annex J.3 contains an exhaustive list of the 127 instances of implementation-defined behavior in C.

It should be emphasized that, just like the programs with unspecified behavior that we discussed in the previous section, a program with implementation-defined behavior is a valid program that must run successfully. It must not crash or otherwise misbehave.

This category is less useful than it seems. First, simply because a construct's behavior is documented doesn't make the construct portable, so relying on it doesn't improve the portability of a program. And second, not all implementations follow the requirement to document their choices.

Locale-specific behavior

Locale-specific behavior is behavior that arises when a program runs in a locale other than the default C locale. An example locale is fr_CA for Canadian French. This behavior is a special category that C programs opt into, either by being translated in a specific locale, or at runtime by calling the setlocale function with a locale name (its second argument) other than "C".

Implementations are expected to document the behavior of programs in specific locales. We will not consider this category further; it's effectively a special case of well-defined behavior where the output of a program can change from one environment (locale) to another.

Undefined behavior

Programs that don't fit in any of the categories discussed so far are undefined. Annex J.2, "Undefined behavior," lists over 200 instances of such behavior. However, unlike the annexes that list unspecified and implementation-defined behavior, Annex J.2's list is not exhaustive. This is sometimes a source of misunderstanding that is worth clearing up. Annex J.2 enumerates only the instances explicitly called out in the main body of the text. Beyond those instances where the C standard explicitly makes the behavior of a construct undefined, there are many others where the standard doesn't specify the behavior under some conditions. These instances use the words shall or shall not to state a requirement on programs, and they don't typically appear in the annex.

When interpreting the standard, it's important to be aware of this caveat. Don't assume that a programming construct always works the way it's working on your current implementation, just because the construct isn't explicitly listed as undefined.

The C standard describes programs that contain an undefined construct as either nonportable or erroneous. The practical difference is that nonportable programs run correctly (don't crash or misbehave), perhaps even with the expected output, in some environments. In contrast, erroneous programs might (although they don't need to) behave erratically or crash.

Most real-world programs fall into the undefined category: They contain some code whose behavior is not defined by the C standard. This isn't necessarily a bad thing. The behavior may be defined by some other standard or provided as an extension of the implementation targeted by the program's authors.

However, these assurances don't change the categorization of the program as undefined under the rules of the C standard. The behavior may also seem to be defined by a given implementation, in the sense that the construct and the program behave as the programmer intended. But unless the implementation documents as an extension the behavior of the construct under the conditions the standard doesn't specify, it should be considered erroneous.

In practice, not all implementations document all their extensions. Such practices create a fertile source of subtle portability bugs, as working programs start to misbehave after a trivial upgrade of the compiler or libraries.

Undefined behavior can be broken down into two kinds: compile-time undefined behavior and runtime undefined behavior. We'll consider each category in turn.

Compile-time undefined behavior

Compile-time undefined behavior refers to the behavior of constructs that are processed during translation, such as preprocessing directives or the evaluation of constant expressions. An example of nonportable compile-time undefined behavior is the use of an implementation-specific literal, such as a binary constant like 0b101010 with GCC (prior to C23). An example of erroneous compile-time undefined behavior (in popular compilers like Clang and GCC) is the following definition:

enum { e, f, g = f / e };

The semantics of the division expression are undefined when the second operand is zero. Because the compiler evaluates this division expression to process the definition of the enumeration, the problem occurs at compile time.

High-quality implementations diagnose erroneous compile-time undefined behavior by issuing an error message and rejecting the program. Other implementations might evaluate the division to zero, possibly as a deliberate extension, or as an incidental outcome of the coding path taken in the compiler. The example on Godbolt's Compiler Explorer illustrates these two alternatives in popular compilers.

But the possibilities don't end there. Low-quality implementations might silently accept the code but emit a program that behaves erratically at runtime: It might abort or produce inaccurate output. And in other implementations, the construct might even cause the compiler itself to crash.

In summary, a program that contains a coding construct with compile-time undefined behavior is undefined regardless of whether the program depends on that construct in any way. And the ill effects extend beyond the program to the implementation itself. With such a compiler, conspiracy theorists would be vindicated for believing that even just compiling an undefined program might wipe out one's hard drive.

Runtime undefined behavior

Runtime undefined behavior, on the other hand, refers to the outcome of a construct during program execution. An example of nonportable runtime undefined behavior that's pervasive in almost all programs is calling a library function that's defined neither in the program nor by the C standard, such as POSIX popen.

Another example of runtime undefined behavior is calling the C standard printf function with a format directive whose behavior is undefined in the C standard but defined by some other standard such as POSIX. An example is %2$d:%1$d, used by POSIX to reorder the subsequent arguments in the output. Similar examples involve calling nonstandard functions that are either provided as intrinsics by the compiler or defined by the target system.

The most important difference from compile-time undefined behavior is that a program that contains a coding construct with runtime undefined behavior is itself undefined only if the construct is reached during the program's execution. However, if the construct is reached (or can be proven to be reached during program analysis), the effects of undefined runtime behavior can manifest at any point in the execution of the program. The effects can appear even before execution reaches the construct.

The pervasive reach of runtime constructs with undefined behavior is quite unintuitive, so much so that it has been colloquially referred to as undefined behavior time travel. The basic reason for this phenomenon is that the standard deliberately gives implementations latitude to transform code: As long as the observable behavior of executions without undefined behavior is preserved, implementations can execute code out of order, including in parallel. Therefore, statements with no observable side effects that don't depend on each other can be reordered, or even merged with others, for greater efficiency.

Another important difference from compile-time undefined behavior is the ways high-quality implementations respond to an erroneous instance when they detect it: Although they can issue a warning message (many do for a small subset of undefined constructs), the standard prohibits conforming compilers from rejecting the program unless they can prove that every execution of the program is undefined.

In typical scenarios, this restriction virtually rules out such a response for separately compiled programs that consist of multiple translation units. Although it's common to promote warnings to errors through options like -Werror, doing so must be left at the discretion of the user, because it goes against the requirement of the standard and thus renders the implementation not conforming.

Optional features

Starting with C99, the standard has introduced features that conforming implementations are not required to provide. In most cases, for better portability, the presence of the support can be tested with a preprocessor conditional.

For example, support for variably modified types, an optional feature, can be checked by testing whether the __STDC_NO_VLA__ macro is defined. An implementation that doesn't support the feature defines the macro as 1; if the macro isn't defined, the feature is supported.

Relying on an optional feature is well-defined, but doing so without an equivalent backup (obviously) degrades the portability of the program to implementations that do not support the feature.

Implementation extensions

Virtually all implementations of C and C++ provide features that are not specified by the C standard at all. Most hosted environments don't implement the C standard alone, but also some superset of it, such as POSIX or some proprietary layer. Similarly, most compilers support other specifications that are supersets of C or extend C's capabilities, such as OpenMP. Many implementations also provide additional functions and capabilities that are not specified by any standard.

Extensions do not affect the conformance of the implementation, as long as they have no impact on the behavior of strictly conforming programs (discussed next). Implementations are required to document their extensions.

Strictly conforming and conforming programs

Programs whose behavior depends exclusively on portable code are termed strictly conforming in the C standard. As discussed in the section on well-defined behavior, such programs must not rely on any unspecified or implementation-defined constructs, use any implementation extensions, or use optional C features without a corresponding preprocessor guard. Strictly conforming programs can, however, invoke locale-specific behavior.

Other programs that are accepted (the standard uses the word acceptable) by at least one conforming implementation are simply called conforming. Conforming programs might contain code with unspecified, implementation-defined, and even undefined behavior, including syntactic extensions such as lexical tokens and additional keywords.

It's worth taking a minute to clarify what the standard means by acceptable: It means not only that an implementation successfully compiles the program (with or without warnings), but also that the program runs successfully to completion, and that it doesn't do so simply by accident. If it contains undefined code, it must be only nonportable code whose semantics are documented by the implementation, not code with erroneous undefined behavior (i.e., code whose semantics are not documented, even if they happen to be benign and perfectly reasonable).

It's important to emphasize that, although strictly conforming programs must not contain code with undefined behavior, they can contain constructs with unspecified or implementation-defined behavior. However, those that do so must avoid relying on such behavior.

This distinction might seem like playing word games, but it's valid and important. Remember, even the most portable programs usually contain constructs with unspecified behavior. Here's an example that's not unusual in production code:

int main (void)
{
  int i = 0, j = 0;

  return (++i - ++j);
}

The order in which the two subexpressions (++i and ++j) are evaluated is unspecified. But because they are both evaluated before the subtraction, the result of the program is zero regardless. Thus, this program doesn't depend on the unspecified behavior, and so the program is strictly conforming: It's portable to, and will run with the same output in, all hosted environments.

The same argument would apply if the operand of the return statement were just 1 - 1. Even constant operands are evaluated, and the order of their evaluation is also unspecified.

Practical implications

So what are the practical implications of all this, that we as programmers (as opposed to the C committee, or compliance engineers) might care about?

The categories of behavior establish a common framework for reasoning about program correctness. Our number-one priority is to write correct programs, so it's rather important to have a solid, shared definition of what correct means.

Setting aside design requirements (those are unavoidably outside the scope of this discussion) and focusing strictly on coding, a correct program is a conforming program that's acceptable to the implementations we target.

With that definition in mind, deciding whether a program is correct should be easy as long as the implementations we target are conforming (including the requirement to document extensions). If all code in our program is defined either by the C standard (including unspecified behavior), or in the manuals that come with the compiler and libraries we depend on, it's correct.

In contrast, if our program contains code whose semantics aren't defined anywhere (i.e., nowhere in the manuals for our implementation or the rest of the system), it is, in the parlance of the C standard, erroneous. In common speech, it's buggy, and it doesn't matter whether the bug manifests itself in an observable way or is latent.

A corollary of this definition is that using an implementation that doesn't document some of the implementation-defined behavior or some of its extensions prevents us from relying on those features if we want correct programs. If using them cannot be avoided, we either need to accept that our programs are erroneous, or we need to redefine what correctness means to us. The latter course of action obviously presents problems when discussing our expectations with others—notably implementers, who rely on the standard definition of correctness.
