Toward a Better Use of C11 Atomics – Part 2

Continued from Part 1.

Static Initialization

The C and C++ standards specify that

…the default (zero) initialization for objects with static or thread-local storage duration is guaranteed to produce a valid state.

This means that, for example, defining an atomic object at file scope without an initializer as shown below is guaranteed to initialize it to a default state and default (zero) value.

atomic_int counter;

Other than zero-initialization, the standards require that atomic objects with static or thread storage duration be initialized using one of the ATOMIC_VAR_INIT() and atomic_init() macros. This requirement is a vestige of the original proposal, which specified atomic types as structs. The expectation was that emulated implementations that stored a mutex in the struct along with the value would define the ATOMIC_VAR_INIT() macro along the following lines:

#define ATOMIC_VAR_INIT(Value) \
    { .mutex = PTHREAD_MUTEX_INITIALIZER, .value = (Value) }

But since the atomic types C11 ended up with are not structs but rather qualified basic types, such an initializer would be invalid. Not only that, since in the same implementation some atomic types may be lock-free and others not, the same macro couldn’t be used to initialize objects of both kinds of types. As a result, the only reasonable way to define the macro is to have it expand to its argument, regardless of whether the implementation is lock-free or stateful. This is also how known implementations define it. Consequently, the ATOMIC_VAR_INIT() macro is entirely unnecessary, and atomic objects can simply be initialized using the same syntax as ordinary, non-atomic objects.
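As a minimal sketch of the point above, the following declares two file-scope atomic objects, one relying on zero-initialization and one using an ordinary initializer. The names hits, next_id, record_hit() and allocate_id() are hypothetical and chosen only for illustration; the sketch assumes a compiler such as GCC or Clang that accepts ordinary initializers for atomic objects.

```c
#include <assert.h>
#include <stdatomic.h>

// Zero-initialized: guaranteed to be in a valid state with the value 0.
static atomic_int hits;

// Initialized with the same syntax as a non-atomic int,
// without ATOMIC_VAR_INIT().
static atomic_int next_id = 1000;

// Hypothetical helper: counts one more hit and returns the new total.
int record_hit(void)
{
    return atomic_fetch_add(&hits, 1) + 1;
}

// Hypothetical helper: hands out consecutive identifiers starting at 1000.
int allocate_id(void)
{
    int id = atomic_fetch_add(&next_id, 1);
    assert(id >= 1000);   // the initializer took effect
    return id;
}
```

The first call to record_hit() returns 1 and the first call to allocate_id() returns 1000, just as they would if the two objects were plain ints updated by a single thread.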

Dynamic Initialization

Although atomic types are new in C11 and C++11, integers and pointers have been operated on atomically in various ways for many years. Most low-level software such as operating system kernels, and much higher-level software, including many general-purpose libraries such as the C or POSIX library, has made use of hardware atomic instructions to implement primitives such as counters efficiently. The major benefit that the C and C++ standards provide is a uniform interface, so that low-level software can be written more portably, without relying on knowledge of the specifics of the hardware on each target system. The goal is not only for new software to adopt the standard interfaces to atomics but also for existing software to transition to them over time, away from legacy APIs that are less portable, less well specified, and less well tested. To that end, it’s important that the standard interfaces provide an easy migration path that minimizes the risk of introducing bugs in the process.

Unfortunately, the C and C++ standards specify that each atomic object must be initialized exactly once, and only using one of the two initialization macros. Initializing an object multiple times is undefined. This again is a vestige of the original specification, where atomic types were structs, possibly with additional state holding the object’s mutex. Because a mutex must be initialized exactly once, it made sense to require the same of atomics. But since the standard C atomic types are basic types and not structs, this restriction is no longer necessary, either in lock-free implementations or in stateful ones. Let’s first see why atomic_init() isn’t necessary in lock-free implementations and then explore why it isn’t needed in the hypothetical stateful ones.

Known lock-free implementations define atomic_init() as some form of assignment. GCC 5, for example, implements it with sequentially consistent semantics:

#define atomic_init(PTR, VAL) \
    do {                      \
        *(PTR) = (VAL);       \
    } while (0)

It turns out that this isn’t the most efficient implementation (see GCC bug 68868) but it’s a correct one nonetheless. A more efficient definition of the macro for GCC and compatible compilers is simply this:

#define atomic_init(PTR, VAL) \
    atomic_store_explicit (PTR, VAL, memory_order_relaxed)

The difference between the two is that the first form will likely result in a fence emitted by the compiler while the second form will not. As compilers get better at optimizing atomic operations, they will likely avoid emitting the fence when it’s not necessary. For example, when a compiler can prove that the atomic object isn’t shared with other threads (a common situation when initializing an object that has just been dynamically allocated), it can safely treat the atomic variable as an ordinary one (see GCC bug 68622 and the GCC Atomic Optimizations Wiki page).

Either way, since the macro expands to either a simple construct in the language (assignment) which is already valid for atomics, or to an invocation of the atomic_store_explicit() API, there is no need for yet another way to do the same thing.
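To make the equivalence concrete, here is a small sketch that sets the same atomic object in the three ways discussed so far. The function name store_three_ways() is hypothetical. The three forms differ only in the memory ordering they request, not in the value that ends up stored.

```c
#include <assert.h>
#include <stdatomic.h>

// Hypothetical demonstration: sets one atomic object three ways
// and returns its final value.
int store_three_ways(void)
{
    atomic_int value;

    atomic_init(&value, 1);      // the dedicated initialization macro
    assert(atomic_load(&value) == 1);

    value = 2;                   // plain assignment: sequentially consistent
    assert(atomic_load(&value) == 2);

    // Explicit store that requests no ordering guarantees.
    atomic_store_explicit(&value, 3, memory_order_relaxed);
    assert(atomic_load(&value) == 3);

    return atomic_load(&value);
}
```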

Let’s now see why the macro wouldn’t be useful in a stateful implementation if one were to emerge that stored a mutex alongside the value in atomic objects. Suppose that this implementation defined the atomic_init() macro to first initialize the mutex as if by calling either the C11 mtx_init() function, or some other mutex initialization function like the POSIX pthread_mutex_init(). For illustration, the macro definition might then look something like this:

#define atomic_init(PTR, VAL)                 \
    do {                                      \
        /* mtx_init() takes a pointer */      \
        mtx_init (&(PTR)->mutex, mtx_plain);  \
        (PTR)->value = (VAL);                 \
    } while (0)

In reality, since atomic types aren’t structs but basic types, the macro would have to be somewhat more involved and likely employ some compiler magic to access the hidden mutex and value members of the struct. But that detail is not important for this discussion.

Since both mtx_init() and pthread_mutex_init() allocate resources and can fail, how should this hypothetical implementation let programs detect and handle the failure? The answer is that it couldn’t, because the atomic_init() macro doesn’t return a value. If the atomic object were then used after a failed initialization, an attempt to lock its uninitialized mutex would be undefined. It could lead to a crash of the program or, worse, it could allow the program to continue to run and access the object without synchronization, causing a data race.

But let’s suppose that this hypothetical implementation guaranteed the initialization of its mutex object to never fail. Could that implementation be then used safely? It turns out that it most likely could not. Here’s why. All implementations of mutexes provide a pair of functions: one to initialize it and one to destroy it and release the system resources acquired during its construction. In the C11 threads library these functions are mtx_init() and mtx_destroy(). In the POSIX threads library, they are pthread_mutex_init() and pthread_mutex_destroy(). Calls to these two functions must be paired to avoid resource leaks. The Windows SDK provides the CreateMutex() and CloseHandle() pair of functions with equivalent semantics. But since the C11 atomic API provides only atomic_init() and no equivalent of atomic_destroy(), programs using such an implementation would have no way to release the resources acquired in atomic_init(). As a result, implementations must avoid storing any state that requires non-trivial initialization in atomic objects.
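The two problems can be seen in a sketch of what a stateful atomic would need in order to be usable. Everything here is hypothetical: the stateful_int type and the stateful_init(), stateful_fetch_add(), and stateful_destroy() functions are illustrative stand-ins built on POSIX mutexes, not part of any known implementation. Unlike atomic_init(), the initializer returns a status, and unlike the C11 atomic API, it has a matching destroy function.

```c
#include <assert.h>
#include <pthread.h>

// Hypothetical stateful "atomic": a value guarded by its own mutex.
typedef struct stateful_int {
    pthread_mutex_t mutex;
    int             value;
} stateful_int;

// Returns 0 on success or an error number, so that the caller can detect
// a failed initialization. atomic_init() has no way to report this.
int stateful_init(stateful_int *p, int value)
{
    int status = pthread_mutex_init(&p->mutex, NULL);
    if (status == 0)
        p->value = value;
    return status;
}

// Adds n to the value under the lock and returns the previous value.
int stateful_fetch_add(stateful_int *p, int n)
{
    int status = pthread_mutex_lock(&p->mutex);
    assert(status == 0);
    int previous = p->value;
    p->value += n;
    status = pthread_mutex_unlock(&p->mutex);
    assert(status == 0);
    return previous;
}

// Releases the mutex. The C11 atomic API has no equivalent call,
// which is why such a design would leak resources.
int stateful_destroy(stateful_int *p)
{
    return pthread_mutex_destroy(&p->mutex);
}
```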

Consequently, it is safe to use the atomic_init() macro to “initialize” the same atomic variable more than once, because it has the same effect as plain assignment or, more efficiently, as the atomic_store_explicit() generic function invoked with the memory_order_relaxed argument. For the same reason, the macro is unnecessary. More than that, since the most efficient implementation of the atomic_init() macro provides only relaxed semantics, which normally must be requested explicitly by passing memory_order_relaxed as the last argument to the atomic_store_explicit() generic function, the macro is arguably less safe than the explicit alternative. This is especially true for legacy software migrating to C11 atomics, where it may not always be clear which assignment is an initialization that requires no synchronization and which is a store to a shared variable that must be protected from data races and on whose sequential order the program might depend. In that case, plain assignment with its sequentially consistent guarantees or, equivalently, a call to atomic_store() is the safer alternative. Conversely, when using atomic_init() is known to be safe, using atomic_store_explicit(..., memory_order_relaxed) instead is also safe and guaranteed to be at least as efficient, and possibly more so.
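The following sketch illustrates the distinction on an object that is reused, and therefore set more than once. The Slot type and its functions are hypothetical. The sketch relies on the argument made above that atomic objects in known implementations carry no hidden state, so that resetting one is just another store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

// Hypothetical reusable slot whose reference count is reset on each reuse.
typedef struct Slot {
    atomic_int refs;
} Slot;

// For use while the slot is private to one thread: a relaxed store is
// enough, and calling this again on the next reuse is just another store.
void slot_reset(Slot *slot)
{
    assert(slot != NULL);
    atomic_store_explicit(&slot->refs, 1, memory_order_relaxed);
}

// For use when other threads may observe the slot: atomic_store() is
// sequentially consistent, the same as plain assignment.
void slot_set_shared(Slot *slot, int refs)
{
    assert(slot != NULL);
    atomic_store(&slot->refs, refs);
}

// Reads the current reference count.
int slot_refs(Slot *slot)
{
    return atomic_load(&slot->refs);
}
```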

By way of example, suppose we are changing the following legacy code to use C11 atomics:

typedef struct SomeStruct {
      int         counter;   // treated as atomic
      struct Data data;      // ordinary non-atomic data
} SomeStruct;

void initSomeStruct (SomeStruct *ps) {
    legacy_atomic_store (&ps->counter, 0);   // possible initialization
    ...
}

As the first step, we change the type of counter to atomic_int, and replace the call to legacy_atomic_store() presumably made to initialize it, with an ordinary assignment:

typedef struct SomeStruct {
      atomic_int  counter;
      struct Data data;
} SomeStruct;

void initSomeStruct (SomeStruct *ps) {
    ps->counter = 0;   // safe but slow
    // Equivalently:
    //   atomic_store (&ps->counter, 0);
    // or
    //   atomic_store_explicit (&ps->counter, 0, memory_order_seq_cst);
    ...
}

After the changed code has been tested and validated, and after it has been confirmed that initSomeStruct() really is called only to initialize the data and not to enforce any particular memory order, it may be possible to optimize it by replacing the sequentially consistent store with a relaxed one:

void initSomeStruct (SomeStruct *ps) {
    atomic_store_explicit (&ps->counter, 0, memory_order_relaxed);   // fast initialization
    ...
}
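For reference, here is a self-contained sketch of the final step that compiles on its own. The definition of struct Data and the useSomeStruct() function are hypothetical placeholders, since the listings above elide them. The store uses memory_order_relaxed, the standard name for the weakest memory ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

// Hypothetical stand-in for the ordinary, non-atomic data.
struct Data {
    int payload[4];
};

typedef struct SomeStruct {
    atomic_int  counter;
    struct Data data;
} SomeStruct;

// Initializes an object that is not yet shared with other threads.
void initSomeStruct(SomeStruct *ps)
{
    assert(ps != NULL);
    atomic_store_explicit(&ps->counter, 0, memory_order_relaxed);
    memset(&ps->data, 0, sizeof ps->data);
}

// Hypothetical user of the struct: increments the counter with the
// default sequentially consistent ordering and returns the new value.
int useSomeStruct(SomeStruct *ps)
{
    return atomic_fetch_add(&ps->counter, 1) + 1;
}
```

If it later turns out that initSomeStruct() is also called on objects that other threads can already see, the relaxed store can be changed back to atomic_store() without touching the callers.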

Conclusion

C and C++ atomic types and operations are a powerful interface for writing high-performance, data race-free software running on multi-processor or multi-core systems. They were also designed with the intent that code written in the two languages can interoperate. In this article we have shown that while the interfaces aren’t perfect, the problems that have been uncovered so far are relatively minor and can be easily corrected to make atomics even easier and safer to use than they are today. In fact, users of existing implementations don’t need to wait for the standards to change to take advantage of the corrections. They are already available, and the standards simply need to catch up with existing practice.

