The February 2019 ISO C++ meeting was held in Kailua-Kona, Hawaii. As usual, Red Hat sent three of us to the meeting: I attended the SG1 (parallelism and concurrency) group, Jonathan Wakely the Library Working Group, and Jason Merrill the Core Working Group (see Jason's report here). In this report, I'll cover a few highlights of the meeting, focusing on the papers discussed in SG1.
The first part of the week in SG1 was spent primarily on papers related to the Executors proposal (p0443). First up was "Integrating executors with the parallel algorithms" (p1019). SG1 also saw this paper at the Fall WG21 meeting in San Diego (see my Fall 2018 trip report). Much of the discussion around this paper in Kona centered on whether supplying an executor to an algorithm should require the algorithm to execute on the supplied executor. Currently, execution policies are just hints, and an algorithm is free to ignore them: some algorithms have no profitable parallelization, and parallelization may not pay off for small input ranges, so an algorithm may ignore the user's request to parallelize.
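For context, here is what that hint looks like with the C++17 parallel algorithms as they stand today; nothing in this example obliges the implementation to actually run in parallel:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<int> v{5, 3, 1, 4, 2};
    // std::execution::par is a request, not a guarantee: for a tiny input
    // like this, an implementation may well run the sequential algorithm.
    std::sort(std::execution::par, v.begin(), v.end());
}
```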
We also spent some time trying to pin down what counts as a Thread of Execution (ToE) in the context of p1019 and the standard parallel algorithms (e.g., does a ToE imply TLS? What about fibers, SIMD lanes, etc.?), as well as how exceptions might be handled. Currently, an exception escaping a parallel algorithm terminates the calling program. The consensus was to aim for a model in which supplying an executor to an algorithm requires the algorithm to execute strictly on that executor, and the author was asked to prepare a subsequent revision of the paper with this guidance in mind. No conclusions were reached on exception propagation or on what specifically constitutes a ToE in this context.
Next, there was a brief discussion of an experience report I wrote for the Fall meeting (p1192). I had no new information on this paper for Kona, but I expect to bring either an update or a new paper based on upcoming work to replace the default execution backend of the libstdc++ parallel algorithms implementation, moving from Intel's Threading Building Blocks to a backend based on OpenMP.
We also looked at a paper proposing an "Occupancy" property for executors (p1259), which could communicate some measure of maximal available parallelism to a parallel algorithm. SG1 spent time discussing how to word what this property actually conveys. SG1 eventually selected some new wording and forwarded the paper to LEWG to be included alongside the main Executors paper (p0443).
On Monday afternoon, we considered a handful of networking-related papers:
- "Networking TS enhancement to enable custom I/O executors" (p1322)
- "Reconsider the Networking TS for inclusion in C++20" (p1446)
- "Merge most of Networking TS into C++ Working Draft" (p1259)
The first paper (p1322) describes changes to the Networking TS to allow implementations to provide custom I/O execution contexts beyond the one specified by the TS (currently, the io_context only gets threads that users donate to it by calling its run() member function). There was unanimous agreement to include the proposed changes in the current working draft of the Networking TS.
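As a rough sketch of that status quo, here is the thread-donation pattern using standalone Asio, on which the TS is based (in the TS itself these names live under std::experimental::net):

```cpp
#include <asio.hpp>   // standalone Asio; the Networking TS equivalent
#include <thread>

int main() {
    asio::io_context ctx;
    // Keep run() from returning before any work has been posted.
    auto guard = asio::make_work_guard(ctx);

    // Threads are "donated" to the I/O context by calling run(); completion
    // handlers execute only on threads that are inside run().
    std::thread t1([&] { ctx.run(); });
    std::thread t2([&] { ctx.run(); });

    asio::post(ctx, [] { /* some I/O completion handler */ });

    guard.reset();   // let run() return once outstanding work drains
    t1.join();
    t2.join();
}
```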
Discussion of p1446 centered primarily on what to do if LEWG moved Executors forward for C++20. As already noted, there was unanimous consent to merge p1322 into the current Networking TS draft, along with p0958, which SG1 had approved in San Diego, and unanimous consent that any future version of the Networking TS must include these changes. p1253 was deemed moot by LEWG's decision in San Diego not to advance networking for C++20.
The last paper considered by SG1 on Monday was p1478r0 (which was not in the pre-meeting mailing but should be in the forthcoming post-meeting mailing). This paper proposes formalizing a definition for byte-wise atomic memcpy. The paper focuses on Seqlocks (see also https://dl.acm.org/citation.cfm?doid=2247684.2247688) as a use-case for this feature, and it is expected that most existing implementations of memcpy already "do the right thing." As such, the paper proposes adding two additional versions of memcpy that also accept a memory_order argument:
- atomic_source_memcpy
- atomic_dest_memcpy
which would likely be implemented as aliases for existing memcpy implementations.
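To make the Seqlock use case concrete, here is a rough sketch of a reader built on such a facility. The atomic_source_memcpy below is my own stand-in, not the paper's wording: its signature is an assumption based on the description above, and its body is a plain memcpy plus a fence, reflecting the expectation that existing memcpy implementations already behave byte-wise atomically in practice:

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// Stand-in for the proposed facility (signature assumed): memcpy that also
// takes a memory_order. The fence orders the copied reads before later loads.
void* atomic_source_memcpy(void* dest, const void* src, std::size_t n,
                           std::memory_order order) {
    void* result = std::memcpy(dest, src, n);
    std::atomic_thread_fence(order);
    return result;
}

struct Seqlock {
    std::atomic<unsigned> seq{0};  // odd while a writer is mid-update
    char payload[64] = {};

    // Reader: retry until an even sequence number is unchanged across the copy.
    void read(char (&out)[64]) {
        unsigned before, after;
        do {
            before = seq.load(std::memory_order_acquire);
            // This copy can race with a writer; today that is a data race
            // (undefined behavior), which is exactly what p1478 aims to fix.
            atomic_source_memcpy(out, payload, sizeof payload,
                                 std::memory_order_acquire);
            after = seq.load(std::memory_order_relaxed);
        } while (before != after || (before & 1));
    }
};

int main() {
    Seqlock s;
    char buf[64];
    s.read(buf);  // trivially succeeds with no concurrent writer
}
```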
On Tuesday morning, SG1 looked at some papers on the intersection of coroutines and executors, all of which look beyond C++20. The papers discussed were:
- "Experience Report: Implementing a Coroutines TS Frontend to an Existing
Tasking Library" (p1403) - "Better integration of Sender Executors" (p1349)
- "Unifying Asynchronous APIs in C++ Standard Library" (p1341)
Tuesday afternoon, LEWG and SG1 held a joint session on Executors and on extracting the properties mechanism from the Executors proposal as a standalone facility (https://wg21.link/p1393). The consensus was to forward p1393 to LWG for C++ next. This effectively means that, in addition to Networking, Executors will not ship with C++20.
On Wednesday, SG1 considered a request to withdraw the Concurrency TS (p1445). The premise is that most of the Concurrency TS will be shipping as part of C++20, and the remaining elements are dependent on what happens with Executors. SG1 voted in favor of withdrawing the TS.
The next paper SG1 considered was a proposal for a second Concurrency TS (p0940r2), with the intention of collecting the features from the first Concurrency TS that are not part of C++20, as well as several new features SG1 has been looking at over the last few years, for example:
- Executor-related facilities
- Concurrent data structures (concurrent queues, associative containers, counters)
- New synchronization primitives (cell, hazard pointers, RCU)
- Stackful coroutines (a.k.a. fibers)
SG1 generally approves of this approach.
SG1 also discussed several papers related to concurrent data structures:
- "Concurrency TS is growing: Concurrent Utilities and Data Structures" (p0940)
- "Concurrent associative data structure with unsynchronized view" (p0652)
- "Memory Model Issues for Concurrent Data Structures" (p0387)
On Thursday, SG1 considered a number of papers:
- "Giving atomic_ref implementers more flexibility by providing customization points" (p1372).
- "Deprecating volatile" (p1152). This paper looks at moving more problematic uses of volatile to Annex D. SG1 voted to forward to EWG.
- "volatile_load<T> and volatile_store<T>" (p1382). This paper is related to p1152 and aims to give a standard-supported (as opposed to what the standard actually says about volatile) way to access shared/memory mapped regions. SG1 is generally in favor of the direction of this paper.
- "Asymmetric Fences" (p1202). This paper proposes standardizing facilities to support use-cases in which concurrent accesses can be split into a common "fast path" and uncommon "slow path." SG1 consensus was to forward to LEWG for inclusion in a Concurrency TS2.
- "Not all agents have TLS" (p1367). This paper aims to tighten the definition for thread locals within the standard.
- "Executor properties for affinity-based execution" (p1436). This paper is a successor to p0796 and looks to recast the notion of affinity as properties on executors via the executor properties mechanism.
- "Stop Token and Joining Thread, Rev 9" (p0660). This paper has already been forwarded by LEWG and will be in C++20. SG1 voted to approve some wording changes prior to review by LWG.
Friday was a short day for SG1, and we considered a couple of papers related to fibers:
- "fiber_context - fibers without a scheduler" (p0876). This paper has been making the rounds for a while. The authors of this paper, and the recently approved coroutines functionality once again stated that accepting coroutines does not mean we don't also want fibers. The bulk of the discussion was around wording review, and updated wording based on the Kona discussion is expected to be brought at the Cologne meeting in July.
- The other two papers regarding fibers on SG1's agenda were "Fibers under the magnifying glass" (p1364) and a "Response to 'Fibers under the magnifying glass'" (p0866). The first paper makes various performance and overhead claims regarding fiber implementations vs. stackless coroutines, which the second paper largely rebuts.
Stay tuned for more reports.