
New features in OpenMP 5.0 and 5.1

May 3, 2021
Jakub Jelínek
Related topics:
Developer Tools, C, C#, C++, Linux


    OpenMP is an API consisting of compiler directives and library routines for high-level parallelism in C, C++, and Fortran. Version 5.0 of OpenMP was released in November 2018, and version 5.1 in November 2020. This article discusses the OpenMP 5.0 features implemented in GCC 11, along with some new OpenMP 5.1 features.

    OpenMP 5.0 features

    Let's start with features that were added in the OpenMP 5.0 version of the standard.

    Support for non-rectangular collapsed loops

    Before OpenMP 5.0, all OpenMP looping constructs (worksharing loops, simd, distribute, taskloop, and combined or composite constructs based on those) were required to be rectangular: all of the lower bound, upper bound, and increment expressions of all the associated loops in the loop nest had to be invariant with respect to the outermost loop. OpenMP 5.0 still requires all the increment expressions to be loop-invariant, but allows some cases where the lower and upper bound expressions of the inner loops can be based on a single outer-loop iterator.

    There are restrictions to this new feature, however: an inner loop's bounds may reference at most one outer-loop iterator, and the expressions must have the form a * outer + b, where a and b are loop-invariant expressions. If the inner and referenced outer loops have different increments, there are further restrictions to support easy computation of the number of iterations of the collapsed loop nest before the loop. In addition, non-rectangular loops may not have schedule or dist_schedule clauses specified, which leaves the implementation free to choose whatever iteration distribution it prefers.

    The following triangular loop is an example:

    #pragma omp for collapse(2)
    for (int i = 0; i < 100; i++)
      for (int j = 0; j < i; j++)
        arr[i][j] = compute (i, j);

    But a non-rectangular loop can also be much more complex:

    #pragma omp distribute parallel for simd collapse(4)
    for (int i = 0; i < 20; i++)
      for (int j = a; j >= g + i * h; j -= n)
        for (int k = 0; k < i; k++)
          for (int l = o * j; l < p; l += q)
            arr[i][j][k][l] = compute (i, j, k, l);

    The easiest implementation is by computing a rectangular hull of the loop nest and doing nothing inside of the combined loop body for iterations that wouldn't be run by the original loop. For example, for the first loop in this section, the implementation would be:

    #pragma omp for collapse(2)
    for (int i = 0; i < 100; i++)
      for (int j = 0; j < 100; j++)
        if (j < i)
          arr[i][j] = compute (i, j);

    Unfortunately, such an implementation can cause a significant work imbalance where some threads do no real work at all. Therefore, except for non-combined non-rectangular simd constructs, GCC 11 computes an accurate number of iterations before the loop. In the case of loop nests with just one loop dependent on outer-loop iterators, it uses Faulhaber's formula, with adjustments for the fact that some values of the outer iterator might result in no iterations of the inner loop. This way, as long as the loop body performs roughly the same amount of work for each iteration, the work is distributed evenly.

    Conditional lastprivate

    In OpenMP, the lastprivate clause can be used to retrieve the value of the privatized variable that was assigned in the last iteration of the loop. The lastprivate clause with a conditional modifier works as a fancy reduction, which chooses the value from the thread (or team, SIMD lane, or task) that executed the maximum logical iteration number. For example:

    #pragma omp parallel for lastprivate(conditional:v)
    for (int i = 0; i < 1024; i++)
      if (cond (i))
        v = compute (i);
    result (v);

    For this construct to work, the privatized variable must be modified only by storing directly to it, and shouldn't be modified through pointers or modified inside of other functions. This allows the implementation to find those stores easily and adjust a store to remember the logical iteration that stored it. This feature is implemented in GCC 10 already.

    Inclusive and exclusive scan support

    OpenMP 5.0 added support for implementing parallel prefix sums (otherwise known as cumulative sums or inclusive and exclusive scans). This support allows C++17 std::inclusive_scan and std::exclusive_scan to be parallelized using OpenMP. The syntax is built upon the reduction clause with a special modifier and a new directive that divides the loop body into two halves. For example:

    #pragma omp parallel for reduction (inscan, +:r)
    for (int i = 0; i < 1024; i++)
      {
        r += a[i];
        #pragma omp scan inclusive(r)
        b[i] = r;
      }

    The implementation can then split the loop into the two halves, creating not just one privatized variable per thread, but a full array for the entire construct. After evaluating one of the halves of user code for all iterations—which differs between inclusive and exclusive scans—efficient parallel computation of the prefix sum can be performed on the privatized array, and finally, the other half of the user code can be evaluated by all threads. The syntax allows the code to work properly even when the OpenMP pragmas are ignored. This feature is implemented in GCC 10.

    Declare variant support and meta-directives

    In OpenMP 5.0, some direct calls can be redirected to specialized alternative implementations based on the OpenMP context from which they are called. The specialization can be done based on which OpenMP constructs the call site is lexically nested in. The OpenMP implementation can then select the correct alternative based upon the implementation vendor, the CPU architecture and ISA flags for which the code is compiled, and so on. Here is an example:

    void foo_parallel_for (void);
    void foo_avx512 (void);
    void foo_ptx (void);
    #pragma omp declare variant (foo_parallel_for) \
    match (construct={parallel,for},device={kind("any")})
    #pragma omp declare variant (foo_avx512) \
    match (device={isa(avx512bw,avx512vl,"avx512f")})
    #pragma omp declare variant (foo_ptx) match (device={arch("nvptx")})
    void foo (void);

    If foo is called directly from within the lexical body of a worksharing loop that is lexically nested in a parallel construct (including the combined parallel for), the call will be replaced by a call to foo_parallel_for. If foo is called from code compiled for the previously mentioned AVX512 ISAs, foo_avx512 will be called instead. And finally, if foo is called from code running on NVIDIA PTX, the compiler will call foo_ptx instead.

    A complex scoring system, including user scores, decides which variant will be used in case multiple variants match. This construct is partially supported in GCC 10 and fully supported in GCC 11. The OpenMP 5.0 specification also allows meta-directives using similar syntax, where one of several different OpenMP directives can be used depending on the OpenMP context in which it is used.

    The loop construct

    In OpenMP 4.5, the various looping constructs prescribed to the implementation how it should divide the work. A programmer specified whether the work should be divided between teams in the league of teams, or between threads in the parallel region, or across SIMD lanes in a simd construct, and so on. OpenMP 5.0 offers a new loop construct that is less prescriptive and leaves more freedom to the implementation about how to actually implement the work division. Here's an example:

    #pragma omp loop bind(thread) collapse(2)
    for (int i = 0; i < 1024; i++)
      for (int j = 0; j < 1024; j++)
        a[i][j] = work (i, j);

    The bind clause is required on orphaned constructs and specifies which kind of threads that encounter it will participate in the construct. If the pragma is lexically nested in an OpenMP construct that makes the binding obvious, the bind clause can be omitted. The implementation is allowed to use extra threads to execute the iterations. The loop construct is implemented in GCC 10.

    There are restrictions on which OpenMP directives can appear in the body of the loop, and no OpenMP API calls can be used there. These restrictions were imposed so that the user program can't observe and rely on how the directive is actually implemented. Restrictions on work scheduling have been added in OpenMP 5.1, which is discussed next.

    OpenMP 5.1 features

    In OpenMP 5.1, C++ programs can specify OpenMP directives using C++11 attributes, in addition to the older use of pragmas. Two examples using attributes follow:

    [[omp::directive (parallel for, schedule(static))]]
    for (int i = 0; i < 1024; i++)
      a[i] = work (i);
    
    [[omp::sequence (directive (parallel, num_threads(16)),
                     directive (for, schedule(static, 32)))]]
    for (int i = 0; i < 1024; i++)
      a[i] = work (i);

    OpenMP 5.1 added a scope directive, where all threads encountering it will execute the body of the construct. Private and reduction clauses can be applied to it. For example:

    #pragma omp scope private (i) reduction(+:r)
    {
      i = foo ();
      r += i;
    }

    Unless the nowait clause is present on the directive, there is an implicit barrier at the end of the region.

    OpenMP 5.1 has new assume, interop, dispatch, error, and nothing directives. Loop transformation directives were also added. The master construct was deprecated and replaced by the new masked construct. There are many new API calls, including:

    • omp_target_is_accessible
    • omp_get_mapped_ptr
    • omp_calloc
    • omp_aligned_alloc
    • omp_realloc
    • omp_set_num_teams
    • omp_set_teams_thread_limit
    • omp_get_max_teams
    • omp_get_teams_thread_limit

    The OpenMP API features history appendix covers all changes, including deprecated features.

    Try it out

    The specifications for both OpenMP 5.0 and OpenMP 5.1 are available at openmp.org/specifications/, in both PDF and HTML layouts. The latest version of GCC (GCC 11) supports the features described in this article and various others (this time not just for C and C++; many features are also supported for Fortran). Several other new OpenMP features, however, will be implemented only in later GCC versions.

    Last updated: April 29, 2021
