A new version of the OpenMP standard, 5.0, was released in November 2018 and brings several new constructs to users. OpenMP is an API consisting of compiler directives and library routines for high-level parallelism in C, C++, and Fortran programs. The upcoming version of GCC adds support for some parts of this newest version of the standard.
This article highlights some of the latest features, changes, and “gotchas” to look for in the OpenMP standard.
Task reductions
The new version of the standard allows the reduction clause on the taskloop construct and adds new clauses: task_reduction for the taskgroup construct and in_reduction for the task and taskloop constructs. Previously, variables could be reduced only across threads in parallel regions, across teams in teams regions, or across SIMD lanes. Now, on these constructs too, the reduction variables are privatized and many copies of them can be created.
For task reductions, the implementation can choose from various approaches. For example, it can create an array of the privatized copies of variables, with one element per thread, and initialize the privatized copies at the start of the taskgroup; it can create privatized copies lazily when a task first references them in an in_reduction clause; it can create privatized copies even more lazily when a task first accesses them; or it can choose something else. The variables need to be reduced by the time the taskgroup construct finishes.
The reduction clause on the taskloop construct is allowed only if the nogroup clause is not specified; thus there is an implicit taskgroup around the tasks, and the clause acts as both a task_reduction clause on that implicit taskgroup and an in_reduction clause for the individual explicit tasks the construct creates. A reduction clause specified on a parallel or worksharing construct can have a task modifier, and such a reduction is then usable in the in_reduction clause on task and taskloop constructs encountered during execution of the parallel or worksharing construct.
Here's an example:
int foo ()
{
  int r = 0;
  #pragma omp taskloop reduction (+:r)
  for (int i = 0; i < 128; i++)
    r += work (i);
  return r;
}
Inside of the tasks created for the taskloop construct (each of them can handle one or more iterations), references to the r variable might refer to a privatized copy of that variable, initialized somewhere before the first access to the variable (inside of the task to 0, in this case), generally by using the user-defined reduction initializer. By the time the taskloop construct finishes, the original r should be reduced from all privatized copies using the reduction combiner (in this case, omp_out += omp_in). The implementation can choose whether it will use locking, use atomic operations, or reduce serially when all the tasks are finished.
Task reductions work even with global variables, which are privatized when needed, as shown in the following example. The taskgroup construct establishes a task reduction for the r variable, and any tasks with a corresponding in_reduction clause will participate in that reduction. (Note that the arguments of the in_reduction and the corresponding task_reduction or reduction clause must be the same.) The taskloop construct in the foo function shows that the reduction clause there acts similarly, but r referenced directly in the tasks created by the taskloop construct participates in the reduction as well.
int r;

void bar (int i)
{
  #pragma omp task in_reduction (+:r)
  r += work (i, 0);
  #pragma omp task in_reduction (+:r)
  r += work (i, 1);
}

int foo ()
{
  #pragma omp taskgroup task_reduction (+:r)
  bar (0);
  #pragma omp taskloop reduction (+:r)
  for (int i = 1; i < 4; ++i)
    {
      bar (i);
      r += i;
    }
  return r;
}
!= conditions and C++ range loops
Older versions of the standard required the loop condition to use <, >, <=, or >= comparisons only. The new version of OpenMP allows != to be used as a loop condition as well, as shown in the example below, although in such a case the increment expression must increment or decrement the iteration variable by exactly 1, so that the compiler can determine at compile time whether the loop is incrementing or decrementing. Especially in C++, users of random access iterators are used to writing != instead of < or >; when the compiler can figure out at compile time whether the loop is incrementing or decrementing, it can transform the loop to use < or > comparisons instead, so this change is mainly syntactic sugar.
#pragma omp for
for (auto iv = something.begin (); iv != something.end (); ++iv)
  /* ... */;
The OpenMP 5.0 standard adds support for many of the C++11, C++14, and C++17 features, and the C++ range for loop is one of them. So you can use this:
#pragma omp parallel for
for (auto x : vec)
  /* ... */;
and the compiler will split the work among the threads of the parallel region.
Host teams construct
In OpenMP 4.5, the teams construct was allowed only immediately lexically nested inside of the target construct, for offloading. The OpenMP 5.0 standard allows teams to be used for host parallelization as well, especially for NUMA systems where communication between different NUMA nodes can be expensive. The different teams don't participate in the normal synchronization that is done inside of parallel regions. You can distribute work between the different NUMA nodes using the distribute construct, and within each NUMA node parallelize using parallel constructs and possibly simd constructs.
Iterators in the depend clause
In the depend clause, artificial iterators can be used to create a variable number of depend clauses at runtime. In the following code:
#pragma omp task depend(iterator (i=0:64:16, long j=v1:v2), in: arr[i][j])
  ;
if the v1 variable has the value 2 and the v2 variable has the value 4, then the above code is handled at runtime like this:
#pragma omp task depend(in: arr[0][2], arr[0][3], arr[16][2], arr[16][3]) \
                 depend(in: arr[32][2], arr[32][3], arr[48][2], arr[48][3])
where the artificial variable i has int type, j has long type, and both are in scope only within the depend clause; i iterates from 0 (inclusive) to 64 (exclusive) with step 16, and j iterates from v1 (inclusive) to v2 (exclusive) with step 1.
Atomic construct changes
In OpenMP 4.5, atomic constructs were either relaxed (when no extra clause was specified) or sequentially consistent (with the seq_cst clause). OpenMP 5.0 allows you to specify various other memory-ordering behaviors explicitly (the relaxed, acquire, release, and acq_rel clauses) and allows you to change the default, when no clause is specified explicitly, through a new requires directive. Additionally, you can specify a hint clause (although currently GCC just parses it and ignores it later on). On the flush construct, you can also specify a memory order clause.
#pragma omp requires atomic_default_mem_order(seq_cst)
// All atomic constructs in this translation unit will be
// sequentially consistent unless specified otherwise.
#pragma omp atomic update  // This will now be seq_cst
i += 1;
#pragma omp atomic capture relaxed  // This will be relaxed
v = j *= 2;
Various new combined constructs
As syntactic sugar, several new combined constructs are now supported, especially to save typing when using OpenMP tasking. These include:
#pragma omp parallel master
#pragma omp parallel master taskloop
#pragma omp parallel master taskloop simd
#pragma omp master taskloop
#pragma omp master taskloop simd
omp_pause_resource
OpenMP 5.0 has two new API calls, omp_pause_resource and omp_pause_resource_all, through which users can ask the library to release resources (threads, offloading device data structures, etc.). This allows, for example, the use of fork without an immediate exec when OpenMP directives have been used before the fork and will be used in the child as well.
Data sharing changes
In GCC 9, there is a change that actually isn't something new in OpenMP 5.0. OpenMP 4.0 introduced, likely by mistake, a change whereby const-qualified variables are no longer predetermined as shared. This has the intended effect that you can specify those variables in a shared clause (likely the motivation for the change), but it also has the undesirable effect that, when the default(none) clause is used, const-qualified variables must be specified explicitly in some data-sharing clause if they are used inside of the construct, whereas previously they didn't have to be. In older GCC releases I was hoping to get this change reversed, but it has been agreed that this is not going to change. Users encountering this have various options; see OpenMP data sharing for more details.
Trying it out
The OpenMP 5.0 standard is available and includes many more new features than those described above; see also the OpenMP 5.0 Reference Guide. The upcoming version of GCC (GCC 9) is going to support the features described above and various others (for C and C++ only for now), but many other new OpenMP 5.0 features will be implemented only in later GCC versions. See OpenMP 5.0 support for GCC 9 for details on what exactly is implemented in GCC 9 and what will be implemented in later GCC releases.