Toward a Better Use of C11 Atomics – Part 1

Introduction

Following the lead of C++, along with a memory model describing the requirements and semantics of multithreaded programs, the C11 standard adopted a proposal for a set of atomic types and operations into the language. This change has made it possible to write portable multi-threaded software that efficiently manipulates objects indivisibly and without data races. The atomic types are fully interoperable between the two languages so that programs can be developed that share objects of atomic types across the language boundary. This paper examines some of the trade-offs of the design, points out some of its shortcomings, and outlines solutions that simplify the use of atomic objects in both languages.

History

The need for software to operate on objects of basic types in an atomic way goes back to the first multiprocessor systems. Many solutions were developed over the decades, each with its own unique characteristics that made writing portable code that took advantage of those features difficult. The C11 and C++11 standards codify an approach that allows software to make use of the hardware support for atomics on the broadest spectrum of processors. The authors of the proposals for C and C++ atomics envisioned efficient, lock-free implementations of these interfaces on architectures that provide robust support for such operations. The authors, however, didn’t want to preclude lock-based emulations on older or less capable hardware. Although emulated implementations were expected to store the lock associated with each atomic object separately from the object itself, this aspect too wasn’t mandated by the proposed specification. The proposers didn’t want to preclude stateful implementations of atomics.

The original proposal to add support for the feature into C (WG14 N1284) called for the addition of a fixed number of otherwise unspecified structs corresponding to each of the basic integer, character and address (pointer) types, along with a small set of generic operations on objects of these types. In efficient implementations the structs were anticipated to contain just a single member of the corresponding type (called the direct type), perhaps padded to the required alignment boundary for the hardware. In emulated implementations, the structs could also store a mutex to provide the necessary locking. The specification of the atomic types was carefully crafted to allow for full interoperability between C and C++. To achieve it, the atomic types must have the same alignment, size, and representation in both languages.

Limitations of the Proposed Approach

It was recognized early on that the approach initially proposed for C had several limitations. This section briefly examines those limitations and describes the solutions ultimately adopted to overcome them.

No Support For User-Defined Types

All C++ atomic types are specializations of a primary class template that C++ programmers can instantiate on arbitrary user-defined types. Objects of specializations on the user-defined types can be operated on using the same intuitive syntax as for scalar types. While atomic operations on many user-defined types must be emulated in software, operations on user-defined types that fit in a single CPU register can commonly take advantage of hardware support to achieve the same efficiency as those on ordinary scalars. However, since the set of C atomic types was finite, C programmers would have had no way to define atomic types of their own. For example, defining an atomic form of a struct like the C++ one below would not have been possible in C:

struct IntPair { int a, b; };
typedef std::atomic<IntPair> AtomicIntPair;
AtomicIntPair aip = { 0, 1 };

Inconvenient Syntax For Operations On Atomic Scalars

Thanks to operator overloading, C++ provides an intuitive syntax for manipulating its atomic types with the conservative sequentially consistent semantics. However, since C doesn’t allow overloading, it provided only a functional interface to manipulate atomic objects. This interface was considerably less convenient to use than the C++ API. For example, when the guarantees of sequentially consistent semantics are required, an atomic_int variable ai can be incremented using the same compound assignment expression in C++ as an ordinary int can:

ai += 7;

In C, the same variable would have had to have been incremented solely using the atomic_fetch_add() or atomic_fetch_add_explicit() generic functions:

atomic_fetch_add (&ai, 7);

For situations when sequential consistency isn’t necessary, the more general atomic_fetch_add_explicit() interface lets both C and C++ callers specify the desired memory order using the third argument. The following call is equivalent with the call to atomic_fetch_add() above:

atomic_fetch_add_explicit (&ai, 7, memory_order_seq_cst);

No Way To Initialize Atomics

C++ has constructors that make initialization of non-trivial objects transparent to users: an object with a constructor is initialized automatically when it’s defined. Since C has no equivalent feature it must rely on the programmer to explicitly initialize struct members before using them. However, since the members of the C atomic structs were deliberately unspecified, there was no portable way to initialize objects of such types. This wasn’t an issue for the efficient lock-free implementations which stored just a single data member in each atomic type — the value of the corresponding direct type. However, it posed a problem for the (hypothetical) emulated implementations that stored additional state such as a mutex along with the value in the same object. Such types were thought to require a similar approach as that used to initialize POSIX threads mutex objects.

In addition, for full interoperability between the languages, both need to make it possible to create an object of an atomic type that is in an uninitialized state and that can be initialized by a construct in the other language. Such a construct wasn’t available.

Solutions Adopted In C11

User-Defined Atomic Types

To provide a comparable degree of extensibility and syntactic convenience as C++, in response to the Atomic Proposal (WG14 N1485) C adopted a new _Atomic keyword. The keyword, which takes the form of either a type specifier or type qualifier, provides a general mechanism that enables declaring any type atomic, including user-defined structs. For example, the declarations below define the AtomicInt96 type whose objects can be manipulated atomically:

typedef struct Int96 {
    uint32_t high;
    uint64_t low;
} Int96;

typedef _Atomic Int96 AtomicInt96;

Intuitive Syntax For Operations On Atomic Types

In addition to the ability to define user-defined atomic types, the _Atomic keyword makes it possible to use objects of all atomic types in nearly all the same expressions and contexts as their direct (non-atomic) counterparts. Owing to the C feature called generic selection, it is also possible to use the same small set of functions (called generic functions) to operate on objects of all atomic types. This way C achieved nearly the same degree of convenience for the uses of atomic types as C++. For example, the following expressions are valid in both C and C++ and have the same effect and the same sequentially consistent memory order guarantees in both languages:

extern atomic_int ai;
atomic_fetch_add (&ai, 7);   // valid in both C and C++
ai += 7;                     // also valid in C and C++

Initialization

The initialization problem was recognized by the C++ study group working on interoperability of atomics between the two languages. To solve it, the paper on Explicit Initializers for Atomics (WG14 N1445) introduced a pair of macros for C and C++ programmers to initialize atomic objects with. The ATOMIC_VAR_INIT() macro can be used to initialize atomic objects with static storage duration to a non-zero value, while the atomic_init() macro is used to initialize automatic or dynamically allocated objects. For example, the following statically initializes the global atomic variable index to -1:

atomic_int index = ATOMIC_VAR_INIT (-1);

and this function initializes the object pointed to by the argument a to the specified values at runtime:

void init96 (AtomicInt96 *a, uint32_t high, uint64_t low)
{
    Int96 val = { .high = high, .low = low };
    atomic_init (a, val);
}

Outstanding And New Problems

Unfortunately, not all problems with the specification of atomics were recognized before the C11 and C++11 standards were published. A number of issues have been discovered since then. Some of them have been brought to the attention of the language committees. A subset of those have been acknowledged as defects or are being actively discussed. Some others have only recently been uncovered and are waiting to be raised as potential defect reports. This section briefly summarizes the first set.  In Part 2 of the article we then spend a bit more time discussing the second set of remaining problems, specifically initialization.

DR 431 – atomic_compare_exchange: What does it mean to say two structs compare equal?

WG14 Defect Report 431 (DR 431) raises a question about the intended semantics of the atomic_compare_exchange() generic function. The function is specified as follows:

atomic_compare_exchange_strong (volatile A *object, C *expected, C desired);

Atomically, compares the value pointed to by object for equality with that in expected, and if true, replaces the value pointed to by object with desired, and if false, updates the value in expected with the value pointed to by object.

The problem with the specification is that the concept of equality is not defined for objects of aggregate types such as structs. The informative note that follows the description suggests that implementations are expected to use the memcmp() function to perform the comparison, as if like so:

if (memcmp (object, expected, sizeof (*object)) == 0)
    memcpy (object, &desired, sizeof (*object));
else
    memcpy (expected, object, sizeof (*object));

However, using memcmp() to compare objects containing padding is not guaranteed to yield consistent results for objects with distinct representations of the same value. This includes not only structs with padding, but also objects of type _Bool (and in C++ the type bool) that may have many representations of each of the two values, true and false. It also includes any integer or pointer types in the rare implementations where objects of such types have padding bits. Furthermore, in another defect report, DR 474, Blaine Garst, the author of the C _Atomic keyword proposal argues that the memcmp() semantics are not appropriate for structs.

To better understand the problem, consider the following function that atomically increments an AtomicInt96 value:

void inc96 (AtomicInt96 *object)
{
    Int96 expected = *object;
    Int96 desired;
    do {
        desired.low = expected.low + 1;
        desired.high = expected.high;
        if (desired.low == 0)
            ++desired.high;
      } while (!atomic_compare_exchange_strong (object, &expected, desired));
}

The function first copies the value of the atomic variable to the ordinary, non-atomic local variables expected and desired. It then starts a loop in which it non-atomically increments desired and attempts to atomically assign its value to object. The assignment is successful if the value of object is equal to expected, otherwise it fails. On success atomic_compare_exchange_strong() returns true, otherwise it returns false and stores the value of *object in expected. The function loops until the assignment is successful.

The problem pointed out by DR 431 is that calling memcmp() on a pair of structs with padding has unspecified results. This should be obvious when the destination object is initialized by assigning to its individual members as done in the inc96() function, but it is so even if the two structs are copies of one another obtained by assignment, because the assignment need not copy the padding bits. To make matters worse, the problem cannot be solved by zeroing out the storage occupied by the struct by calling memset(&desired, 0, sizeof desired) before assigning to its members because C allows each assignment to modify the padding bits in an unspecified way. This is permitted to let implementations to use faster instructions when assigning values to members that are smaller than the size of the register used in the more efficient instructions.

Fortunately, while this problem is real, as Hans Boehm, one of the authors of the atomic proposal points out, it doesn’t affect the correctness of the loop. At worst, it adds one iteration to it. This is because on failure, the atomic_compare_exchange_strong() call stores a bitwise copy of the atomic variable in expected, obtained as if by a call to memcpy(). Because the bitwise copy is obtained by accessing the object via an lvalue of unsigned char (that is how memcpy() is specified to copy bytes) it is guaranteed by the C standard to have the exact same bit representation as the original.

DR 485 — Problem with the specification of ATOMIC_VAR_INIT

WG14 Defect Report 485 (DR 485) points out that the ATOMIC_VAR_INIT() macro as specified by the C standard and as defined by most implementations cannot be used with the "{ a, b }" form of initializers to initialize user-defined structs. Typically, the macro is defined like so:

#define ATOMIC_VAR_INIT(VALUE) (VALUE)

Using this macro, an AtomicInt96 object can have one of its members initialized this way:

AtomicInt96 a = ATOMIC_VAR_INIT ({ .low = 1 });

However, the macro cannot be used to also initialize the other members of the struct using the same intuitive aggregate syntax:

AtomicInt96 a = ATOMIC_VAR_INIT ({ .high = 1, .low = 2 });

That’s because the macro takes a single argument and the comma separating the initializers is interpreted by the preprocessor as separating arguments to the macro. To get around the problem, clever users might resort to the following awkward form instead:

AtomicInt96 a = ATOMIC_VAR_INIT (((Int96){ .high = 1, .low = 2 }));

Notice how the whole initializer is enclosed in a pair parentheses to foil the preprocessor and have it treat the expression as a single argument to the ATOMIC_VAR_INIT() macro. This is not only unintuitive but also more difficult to read than the first form.

To solve the problem, the technical corrigendum proposed in DR 485 recommends that the standard be changed to essentially require ATOMIC_VAR_INIT() to be a macro accepting a variable number of arguments. With the macro defined like so any number of arguments can be provided:

#define ATOMIC_VAR_INIT(...) __VA_ARGS__

In the next section we show that the macro actually isn’t necessary at all and that atomic aggregate objects can be simply initialized the same way as ordinary, non-atomic aggregates.

 

Share