
OpenMP 4.0 support in Developer Toolset 3 Beta -- Parallel programming extensions for today's architectures

September 30, 2014
Jakub Jelínek, Matt Newsome
    In this article, we'll take a look at OpenMP 4.0, the parallel programming extensions to C, C++, and Fortran. These are available out of the box in GCC v4.9.1, available to Red Hat Enterprise Linux developers via Red Hat Developer Toolset v3.0 (currently at beta release).

    For a thorough backgrounder in parallelism and concurrency programming concepts, see Torvald Riegel's earlier articles (part 1 and part 2). In this article, we'll instead dig into the nuts and bolts of what OpenMP v4 provides to developers, and how it works in practice in GCC.

    OpenMP v4.0

    The OpenMP 4.0 standard was released in July 2013 and includes various enhancements compared to the OpenMP v3.1 support shipped in RHEL 7's system compiler and in Developer Toolset v2.1 and earlier. These enhancements include SIMD constructs, device constructs, enhanced CPU affinity support, task dependencies, task groups, user defined reductions, construct cancellation and various other smaller changes. We'll talk about each of these enhancements in turn below.

    As with older versions of OpenMP, to enable OpenMP support in GCC one should use the -fopenmp compiler option during both compilation and linking.
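    As a minimal sketch (my own example, not from the toolset documentation), here is the kind of reduction loop you would build with gcc -fopenmp:

```c
/* Sum 0..n-1 with an OpenMP parallel-for reduction.
 * Build and link with OpenMP enabled:
 *   gcc -fopenmp sum.c -o sum
 * Without -fopenmp, GCC simply ignores the pragma and the loop
 * runs serially, producing the same result. */
long parallel_sum (int n)
{
  long sum = 0;
  int i;
  #pragma omp parallel for reduction(+:sum)
  for (i = 0; i < n; i++)
    sum += i;
  return sum;
}
```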

    SIMD

    SIMD constructs were added to help the compiler vectorize performance critical loops. For example, in the following testcase:

    int foo (int *p, int *q) {
      int i, r = 0;
      #pragma omp simd reduction(+:r) aligned(p,q:32)
      for (i = 0; i < 1024; i++) {
        p[i] = q[i] * 2;
        r += p[i];
      }
      return r;
    }

    the new pragma directive tells the compiler that there are no loop-carried lexical backward data dependencies that would prevent vectorization, hints that both the "p" and "q" pointers are 32-byte aligned, and requests that the "r" variable be privatized and used to compute a reduction: each SIMD lane computes its own partial sum, and at the end those partial sums are combined.

    The SIMD constructs can be combined with various other constructs, so a loop can, for example, be parallelized and vectorized at the same time, and one can declare certain functions with an additional pragma to request the creation of extra versions that process multiple arguments simultaneously.

    #pragma omp declare simd simdlen(8) notinbranch uniform(y)
    int bar (int x, int y) { return x * y; }
    int foo (int *p, int *q) {
      int i, r = 0;
      #pragma omp parallel for simd reduction(+:r) aligned(p, q:32) schedule(static, 32)
      for (i = 0; i < 1024; i++) {
        p[i] = bar (q[i], 2);
        r += p[i];
      }
      return r;
    }

    In the above example, for i?86/x86_64 architectures GCC 4.9 creates three extra versions of bar: one for SSE2, one for AVX, and one for AVX2, each able to process 8 "x" values in one call, passed in vector registers. "y" is passed in as a scalar, and the return value is again a vector. The combined construct parallelizes the loop in 32-iteration chunks spread across CPU threads, and each chunk is then vectorized.

    Device Constructs

    Device constructs allow offloading of certain regions of code to specialized accelerator devices. GCC 4.9 recognizes the OpenMP 4.0 device constructs, but no accelerator devices are supported yet, so these regions are executed on the host CPU via the host fallback. Work is underway in the upstream GCC project, targeting GCC 5, to support offloading to Intel MIC accelerator cards, NVidia PTX, and eventually AMD HSA as well.
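    As a sketch of what a device construct looks like (a minimal example of my own, with illustrative names), the target construct with a map clause requests offloading of a region; under GCC 4.9's host fallback it simply runs on the host CPU with identical results:

```c
/* OpenMP 4.0 target construct: request that the loop run on an
 * accelerator, mapping v[0..n-1] to and from the device.  With
 * GCC 4.9 there is no device support yet, so this executes via
 * host fallback and behaves like a plain parallel loop. */
void scale_by_two (int *v, int n)
{
  int i;
  #pragma omp target map(tofrom: v[0:n])
  #pragma omp parallel for
  for (i = 0; i < n; i++)
    v[i] *= 2;
}
```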

    CPU affinity

    GCC 4.8 and earlier supported CPU affinity to some extent, e.g. through the GOMP_CPU_AFFINITY environment variable and a boolean OMP_PROC_BIND environment variable. GCC 4.9 offers the much more precise OMP_PROC_BIND thread-binding policies together with the OMP_PLACES environment variable, which allows describing the CPU topology.
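    For example (the values and the program name here are purely illustrative), one might pin one thread per core and spread the threads evenly across the machine:

```shell
# Describe places as whole cores; alternatives include "threads",
# "sockets", or an explicit list such as "{0,1},{2,3}".
export OMP_PLACES=cores
# Distribute threads as evenly as possible over the places list
# (other policies include close and master).
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=4
./my_openmp_app   # hypothetical program name
```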

    Task Dependencies and Groups

    Tasks in OpenMP 4.0 have been enhanced so that it is possible to describe dependencies between child tasks of the same parent task; for example, where a variable is shared by a number of tasks, one of those tasks may need to wait until all tasks writing
    to that variable have completed. Tasks can also be grouped into task groups, where the end of the task group region waits for all tasks in the group to complete.

    subroutine dep
      integer :: x
      x = 1
      !$omp parallel
        !$omp single
          !$omp taskgroup
            !$omp task shared (x) depend(out: x)
              x = 2
            !$omp end task
            !$omp task shared (x) depend(in: x)
              if (x.ne.2) call abort
            !$omp end task
            !$omp task shared (x) depend(in: x)
              if (x.ne.2) call abort
            !$omp end task
          !$omp end taskgroup
        !$omp end single
      !$omp end parallel
    end subroutine dep
    

    Here, the first task is a writer to x; the other two tasks can't be scheduled until it completes, but they can then run simultaneously with each other.
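    The same dependency pattern can also be written in C; this is my own transliteration of the Fortran example above:

```c
#include <assert.h>

/* One writer task on x, then two reader tasks: the depend clauses
 * ensure the readers start only after the writer completes, while
 * the two readers themselves may run concurrently. */
int dep (void)
{
  int x = 1;
  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp taskgroup
    {
      #pragma omp task shared(x) depend(out: x)
      x = 2;
      #pragma omp task shared(x) depend(in: x)
      assert (x == 2);
      #pragma omp task shared(x) depend(in: x)
      assert (x == 2);
    }
  }
  return x;
}
```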

    User-Defined Reductions

    In OpenMP 3.0, only basic arithmetic was possible in C/C++ reductions (and in Fortran with a couple of extra intrinsics). OpenMP 3.1 added new support for min and max intrinsic reductions for C/C++ developers. In OpenMP 4.0, however, users can define their own reductions for both arithmetic types and classes or structures, by specifying a combiner operation as well as, optionally, an initializer operation. For example:

    struct S
    {
      int s;
      void foo (S &x) { s += x.s; }
      S (const S &x) { s = 0; }
      S () { s = 0; }
      ~S ();
    };
    
    #pragma omp declare reduction (foo: S: omp_out.foo (omp_in)) \
      initializer (omp_priv (omp_orig))

    defines a user defined foo reduction on class S. When this is used as:

    int bar ()
    {
      S s;
      #pragma omp parallel for reduction (foo: s)
      for (int i = 0; i < 64; i++)
        s.s += i;
      return s.s;
    }

    each thread will have its own private object, will perform the partial sum on it, and then the foo method will be called for each thread on the original "s" variable with a reference to the private copy of the variable. User defined reductions may, of course, also be used together with SIMD constructs, or device constructs.

    Construct Cancellation

    Some constructs - parallel, for, taskgroup and sections - can be cancelled in OpenMP 4.0, as long as the cancellation construct is lexically within the construct being cancelled and a few other conditions are met. As C++ exceptions must not be thrown through the OpenMP constructs, this can sometimes be useful to avoid doing unnecessary work once some exception has been raised and caught in the region. As an alternative example, when using tasks to search for something, if a particular task succeeds in finding the one required result, it is possible to cancel the entire taskgroup. Here's an example using C++ exceptions:

    void foo () {
      const std::exception *exc = NULL;
      #pragma omp parallel shared(exc)
      {
        #pragma omp for
        for (int i = 0; i < N; i++) {
          #pragma omp cancellation point for
          try { something_that_might_throw (); }
          catch (const std::exception *e) {
            #pragma omp atomic write
            exc = e;
            #pragma omp cancel for
          }
        }
        if (exc) {
          #pragma omp cancel parallel
        }
      }
      if (exc) {
        // throw exc.
      }
    }

    In this case, exceptions are caught inside the loop construct and stored atomically into a shared variable. The cancelling thread then proceeds to the wait at the end of
    the loop construct. Other threads finish their current call to something_that_might_throw(); upon reaching the cancellation point at the start of their next iteration,
    they skip the remaining iterations of the worksharing construct.
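    The taskgroup-search scenario mentioned earlier could be sketched like this (function and variable names are mine; note also that the GNU OpenMP runtime only honors cancellation requests when OMP_CANCELLATION=true is set in the environment):

```c
/* Each task checks one element; a task that finds the key records
 * its index and cancels the rest of the taskgroup so no further
 * work is done.  A serial build (no -fopenmp) simply scans every
 * element inline. */
int find_index (const int *a, int n, int key)
{
  int found = -1;
  int i;
  #pragma omp parallel
  #pragma omp single
  #pragma omp taskgroup
  for (i = 0; i < n; i++)
    {
      #pragma omp task shared(found) firstprivate(i)
      {
        #pragma omp cancellation point taskgroup
        if (a[i] == key)
          {
            #pragma omp atomic write
            found = i;
            #pragma omp cancel taskgroup
          }
      }
    }
  return found;
}
```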

    Wrap-Up

    That completes this brief walk through the major new OpenMP features Red Hat Enterprise Linux developers can find in Red Hat Developer Toolset 3.0 Beta. We're always happy to receive your feedback and questions, so feel free to add a comment or drop us an email or tweet!

    Last updated: February 26, 2024
