Red Hat at the ISO C++ Standards Meeting (June 2014): Parallelism and Concurrency

Recently Red Hat sent several representatives to the JTC1/SC22/WG21 C++ Standards Committee meetings, which were held in June 2014 at the University of Applied Sciences in Rapperswil, Switzerland.

As in past ISO C++ meetings, SG1, the study group on parallelism and concurrency, met for the whole week to discuss proposals and work on the technical specifications (TS) for both parallelism and concurrency.

The Parallelism TS seems ready for a first publication soon. SG1 renamed the execution policy that allows vector hardware to be used by the parallel algorithms library to distinguish it from traditional vector execution as found on many current CPUs; this change clarifies that this policy allows both vector and task parallelism to be used, and allows for a vector-only policy to be added later on. SG1 also discussed a proposal about abstractions for task parallelism, as well as a few other proposals related to parallelism.

On the concurrency side, SG1 discussed a few proposals related to synchronization, for example one about latches and barriers, which SG1 wants to add to the Concurrency TS after further review. The other large group of concurrency-related proposals revolves around the concurrent execution of tasks and how to let programmers control it. The proposals about coroutines and “Resumable Functions” are both, conceptually, based on userspace threads; I’ve argued that this common part should be unified across these proposals, and I think we made initial progress toward this during the meeting. Finally, we discussed a couple of other proposals that suggested changes to, or alternatives for, the Executor facilities in the Concurrency TS; in the end, there was no longer enough agreement on the current Executor feature, and SG1 agreed to remove this part from the Concurrency TS for now.
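To make the latch idea mentioned above a bit more concrete, here is a minimal sketch of a single-use countdown latch built from a mutex and a condition variable. The class name `SimpleLatch`, its member functions, and the helper `run_workers` are hypothetical names chosen for illustration; this is not the interface from the proposal SG1 reviewed.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical single-use latch: wait() blocks until count_down() has been
// called `count` times. Illustrative only, not the proposed TS interface.
class SimpleLatch {
public:
    explicit SimpleLatch(std::size_t count) : count_(count) {}

    void count_down() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ > 0 && --count_ == 0) {
            cv_.notify_all();  // wake every thread blocked in wait()
        }
    }

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
};

// Starts n workers, blocks on the latch until all of them have counted down,
// and returns how many workers had finished at that point.
int run_workers(int n) {
    SimpleLatch done(static_cast<std::size_t>(n));
    std::atomic<int> finished{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([&] {
            finished.fetch_add(1);  // the "work" of this worker
            done.count_down();      // signal completion
        });
    }
    done.wait();                    // returns only after n count_down() calls
    int seen = finished.load();
    for (auto& t : workers) t.join();
    return seen;
}
```

Calling `run_workers(4)` should return 4, because the waiting thread cannot get past `wait()` before every worker has incremented the counter and counted down.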

I presented two papers at the meeting. First, I presented a proposal about light-weight execution agents (EAs), which are an abstract notion used to specify how code is executed in parallel or concurrently. Specifically, I define three kinds of EAs that have different forward progress guarantees (i.e., guarantees for when and under which circumstances an EA will execute a piece of code): concurrent, parallel, and weakly parallel. Concurrent EAs always eventually make progress, no matter what other EAs are or are not doing; this is the same guarantee that OS threads provide under typical general-purpose OS schedulers. Parallel EAs give weaker guarantees, which resemble what a thread pool with a bounded number of threads would provide: once such an EA starts executing (e.g., an iteration of a parallel loop), it will behave like a thread; but there is no guarantee that such an EA will get started concurrently with another EA (because there might not be any thread left in the pool to start the EA). Finally, weakly parallel EAs provide even weaker progress guarantees but allow code to be executed with vector instructions. The benefit of these definitions is that they (1) give programmers precise rules about what kinds of synchronization are allowed in parallel tasks (e.g., can one safely use a mutex that blocks until another EA makes progress?) and (2) make clear what implementations have to provide (e.g., can a parallel loop be implemented using a bounded thread pool?). These definitions are not yet part of any of the technical specifications, but there was general agreement in SG1 at the meeting that forward progress guarantees are an essential part of specifying parallel or concurrent execution.
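The bounded-thread-pool analogy for parallel EAs can be illustrated with a small sketch. The helper `peak_concurrency` is a hypothetical function, not something from the paper: it runs a number of loop iterations on a fixed set of worker threads and reports the largest number of iterations that were ever in flight at once. Under these assumptions that number cannot exceed the worker count, which is why an iteration that blocks until some not-yet-started iteration makes progress could wait forever.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: executes `tasks` loop iterations on `workers` threads
// and returns the peak number of iterations running at the same time.
int peak_concurrency(int tasks, int workers) {
    std::atomic<int> next{0};     // index of the next iteration to hand out
    std::atomic<int> running{0};  // iterations currently executing
    std::atomic<int> peak{0};     // highest value of `running` observed

    auto worker = [&] {
        for (;;) {
            int i = next.fetch_add(1);
            if (i >= tasks) return;  // no iterations left

            int now = running.fetch_add(1) + 1;
            // Record a new maximum if this iteration raised it.
            int seen = peak.load();
            while (now > seen && !peak.compare_exchange_weak(seen, now)) {
            }

            std::this_thread::yield();  // stand-in for the iteration body

            running.fetch_sub(1);
        }
    };

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return peak.load();
}
```

For example, `peak_concurrency(16, 2)` never returns more than 2: iterations 3 through 16 only start once an earlier iteration has finished. A loop body that needed all 16 iterations to be running at once (say, to meet at a barrier) would therefore be unsafe under a parallel EA guarantee, while it would be fine with concurrent EAs.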

Second, I presented a paper about memory_order_consume that I co-authored with Paul McKenney (of Read-Copy-Update fame) and Jeff Preshing. memory_order_consume is a feature in C++11 and C11 that is supposed to allow synchronization experts to rely on ordering due to data dependencies instead of having to use explicit hardware barriers. This is heavily used in the Linux kernel’s Read-Copy-Update implementation, for example, and can decrease the cost of synchronization on Power and ARM CPUs. The problem with memory_order_consume is that the way it is specified in the current standard makes it impractical to implement in a compiler, and we are not aware of any optimized implementations of it. In the paper, we analyze why this is the case and propose alternative semantics that seem practical to implement. There is consensus in SG1 that we need to fix memory_order_consume. Our paper is a first step towards a better memory_order_consume; we got good feedback on it and even a suggestion for one further potential solution at the meeting.
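For readers who have not used it, the following sketch shows the pointer-publication pattern that memory_order_consume is meant to support, written against the C++11 `<atomic>` interface. The names `Config`, `g_published`, `publish`, `read_when_published`, and `publish_and_read` are hypothetical and exist only for this illustration. Compilers commonly treat a consume load as the stronger acquire load, so this code is expected to be correct but not necessarily cheaper than an acquire-based version.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Config {
    int value;
};

// Pointer through which the writer publishes a fully initialised object.
std::atomic<Config*> g_published{nullptr};

// Writer: initialise the object first, then publish the pointer with release
// ordering so that the initialisation is ordered before the store.
void publish(Config* c, int v) {
    assert(c != nullptr);
    c->value = v;
    g_published.store(c, std::memory_order_release);
}

// Reader: load the pointer with consume ordering. The read of p->value
// depends on the loaded pointer, which is the data dependency that consume
// is intended to order without a full acquire barrier.
int read_when_published() {
    Config* p;
    while ((p = g_published.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    return p->value;
}

// Runs one writer and one reader and returns the value the reader observed.
int publish_and_read(int v) {
    Config c{0};
    g_published.store(nullptr, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&] { seen = read_when_published(); });
    publish(&c, v);
    reader.join();
    return seen;
}
```

With this pattern, `publish_and_read(42)` returns 42: once the reader sees a non-null pointer, the dependent read of `value` is ordered after the writer's initialisation.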

We’re always interested to hear what matters to Red Hat Enterprise Linux developers, so if you have comments or questions on any aspects of the upcoming C++ standards – in the concurrency area, or otherwise – please feel free to get in touch with us at rheldevelop AT redhat DOT com or @RHELdevelop.

Editor’s note: A related article on C++ Core and Library is also available.