Red Hat Developers

OpenMP 4.0 support in Developer Toolset 3 Beta -- Parallel programming extensions for today's architectures

September 30, 2014
Jakub Jelínek, Matt Newsome
Related topics: Developer tools
Related products: Developer Toolset

    In this article, we'll take a look at OpenMP 4.0, the newest version of the OpenMP parallel programming extensions to C, C++ and Fortran. These extensions are available out of the box in GCC 4.9.1, which Red Hat Enterprise Linux developers can use via Red Hat Developer Toolset 3.0 (currently in beta).

    For a thorough backgrounder in parallelism and concurrency programming concepts, see Torvald Riegel's earlier articles (part 1 and part 2). In this article, we'll instead dig into the nuts and bolts of what OpenMP v4 provides to developers, and how it works in practice in GCC.

    OpenMP v4.0

    The OpenMP 4.0 standard was released in July 2013 and includes various enhancements over OpenMP 3.1, the version supported by RHEL 7's system compiler and by Developer Toolset 2.1 and earlier. These enhancements include SIMD constructs, device constructs, enhanced CPU affinity support, task dependencies, task groups, user-defined reductions, construct cancellation and various smaller changes. We'll discuss each of the major enhancements in turn below.

    As with older versions of OpenMP, to enable OpenMP support in GCC one should use the -fopenmp compiler option during both compilation and linking.
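    A quick way to confirm that -fopenmp is in effect is the standard _OPENMP macro, which an OpenMP-enabled compiler defines as the year and month of the specification it implements (201307, i.e. July 2013, for OpenMP 4.0 in GCC 4.9). The helper name below is hypothetical; a minimal sketch:

```c
/* openmp_spec_date is a hypothetical helper: it reports which OpenMP
   version the compiler enabled for this translation unit. */
long openmp_spec_date(void)
{
#ifdef _OPENMP
  return _OPENMP;  /* yyyymm of the spec; GCC 4.9 uses 201307 (OpenMP 4.0) */
#else
  return 0;        /* compiled without -fopenmp: omp pragmas are ignored */
#endif
}
```

    Compiled with gcc -fopenmp -c and linked with gcc -fopenmp, it should return a nonzero date; without the option it returns 0 and every omp pragma in the file is ignored. Passing -fopenmp at link time is what pulls in GCC's libgomp runtime library.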

    SIMD

    SIMD constructs were added to help the compiler vectorize performance-critical loops. For example, in the following test case:

    int foo (int *p, int *q) {
      int i, r = 0;
      #pragma omp simd reduction(+:r) aligned(p,q:32)
      for (i = 0; i < 1024; i++) {
        p[i] = q[i] * 2;
        r += p[i];
      }
      return r;
    }

    the new pragma directive tells the compiler that the loop has no loop-carried lexical backward data dependencies that would prevent vectorization, asserts that both "p" and "q" point to 32-byte-aligned memory (a guarantee the programmer must uphold), and requests that the "r" variable be privatized and used to compute a reduction (each SIMD lane computes its own partial sum, and at the end those partial sums are combined).
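    To make the reduction mechanics concrete, here is a hand-written scalar model of what the simd directive permits, assuming 8 lanes (eight 32-bit ints, as in one 256-bit AVX2 register). It only illustrates the semantics: foo_lanes and LANES are hypothetical names, and GCC emits real vector instructions rather than this code.

```c
#define LANES 8  /* assumption: 8 x 32-bit ints fill one 256-bit AVX2 register */

/* foo_lanes is a hypothetical scalar model of "#pragma omp simd reduction(+:r)":
   each lane keeps a private partial sum of r, combined after the loop. */
int foo_lanes(int *p, const int *q)
{
  int lane_sum[LANES] = { 0 };  /* one private copy of r per SIMD lane */
  int i, lane, r = 0;

  for (i = 0; i < 1024; i += LANES)          /* 1024 is a multiple of LANES */
    for (lane = 0; lane < LANES; lane++) {   /* one vector iteration */
      p[i + lane] = q[i + lane] * 2;
      lane_sum[lane] += p[i + lane];
    }

  for (lane = 0; lane < LANES; lane++)       /* final combination of partial sums */
    r += lane_sum[lane];
  return r;
}
```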

    The SIMD constructs can be combined with various other constructs, so a loop can, for example, be parallelized and vectorized at the same time. Functions can also be marked with an additional pragma to request extra versions that process multiple arguments simultaneously.

    #pragma omp declare simd simdlen(8) notinbranch uniform(y)
    int bar (int x, int y) { return x * y; }
    int foo (int *p, int *q) {
      int i, r = 0;
      #pragma omp parallel for simd reduction(+:r) aligned(p, q:32) schedule(static, 32)
      for (i = 0; i < 1024; i++) {
        p[i] = bar (q[i], 2);
        r += p[i];
      }
      return r;
    }

    In the above example, on i?86/x86_64 architectures GCC 4.9 creates three extra versions of bar (one each for SSE2, AVX and AVX2), each of which processes 8 "x" values per call, passed in one or more vector registers. "y" is passed as a scalar, and the return value is again a vector. The combined construct parallelizes the loop in chunks of 32 iterations distributed across CPU threads, and each chunk is then vectorized.
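    The hypothetical C model below illustrates the two ideas in that paragraph. bar_simd8 mimics the interface of one simdlen(8) clone (a vector of "x" values in, a single uniform "y", a vector out), with a struct standing in for vector registers; static_owner computes which thread runs a given iteration under schedule(static, 32), since static chunks are dealt to threads round-robin in thread-number order. Neither function is what GCC actually generates.

```c
/* Hypothetical model of one simdlen(8) clone of bar: a struct of 8 ints
   stands in for the vector register(s) that carry the "x" arguments. */
typedef struct { int v[8]; } int8_vec;

int8_vec bar_simd8(int8_vec x, int y)  /* y is uniform: one scalar for all lanes */
{
  int8_vec r;
  int lane;
  for (lane = 0; lane < 8; lane++)
    r.v[lane] = x.v[lane] * y;  /* bar's body, applied to every lane */
  return r;
}

/* Hypothetical helper: thread that runs iteration i under
   schedule(static, chunk) with nthreads threads in the team. */
int static_owner(int i, int chunk, int nthreads)
{
  return (i / chunk) % nthreads;  /* chunks are assigned round-robin */
}
```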

    Device Constructs

    Device constructs allow certain regions of code to be offloaded to specialized acceleration devices. GCC 4.9 recognizes the OpenMP 4.0 device constructs but does not yet support any acceleration devices, so these regions are executed on the host CPU via host fallback. Work is underway in the upstream GCC project to support offloading in GCC 5, e.g. to Intel MIC accelerator cards, NVIDIA PTX and, eventually, AMD HSA.
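    A minimal sketch of a device construct, assuming a simple vector-scaling kernel (scale_on_device is a hypothetical name). The map clauses describe which array sections would be copied to and from an accelerator; with GCC 4.9 the region simply runs on the host.

```c
/* Hypothetical kernel: out[i] = in[i] * factor, written as a device region.
   map(to:) copies the input section in, map(from:) copies the result back. */
void scale_on_device(const float *in, float *out, int n, float factor)
{
  int i;
  #pragma omp target map(to: in[0:n]) map(from: out[0:n])
  #pragma omp parallel for
  for (i = 0; i < n; i++)
    out[i] = in[i] * factor;
}
```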

    CPU affinity

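    GCC 4.8 and earlier supported CPU affinity only to a limited extent, e.g. through the GOMP_CPU_AFFINITY environment variable and the boolean OMP_PROC_BIND environment variable. GCC 4.9 implements the much more precise OpenMP 4.0 affinity model: OMP_PROC_BIND now also accepts the master, close and spread binding policies, and the new OMP_PLACES environment variable describes the CPU topology (for example as threads, cores or sockets, or as an explicit list of CPUs).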
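    Besides the environment variables (for example, running a program as OMP_PLACES=cores OMP_PROC_BIND=spread ./app), OpenMP 4.0 lets an individual parallel region request a policy with the proc_bind clause. A minimal sketch, with count_threads_spread as a hypothetical name:

```c
/* Hypothetical helper: counts the threads that execute a parallel region
   whose threads are asked to spread out over the places in OMP_PLACES. */
int count_threads_spread(void)
{
  int count = 0;
  #pragma omp parallel proc_bind(spread) reduction(+:count)
  count += 1;  /* each thread contributes 1 to the reduction */
  return count;
}
```

    spread distributes the team's threads as evenly as possible over the available places, whereas close packs them onto places near the parent thread's place.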

    Task Dependencies and Groups

    Tasks in OpenMP 4.0 have been enhanced so that dependencies can be expressed between child tasks of the same parent task: for example, a variable may be shared by several tasks, and one of them needs to wait until all tasks writing to that variable have completed. Tasks can also be grouped into task groups; the end of a taskgroup region waits until all tasks created in the group, including their descendant tasks, have completed.

    subroutine dep
      integer :: x
      x = 1
      !$omp parallel
        !$omp single
          !$omp taskgroup
            !$omp task shared (x) depend(out: x)
              x = 2
            !$omp end task
            !$omp task shared (x) depend(in: x)
              if (x.ne.2) call abort
            !$omp end task
            !$omp task shared (x) depend(in: x)
              if (x.ne.2) call abort
            !$omp end task
          !$omp end taskgroup
        !$omp end single
      !$omp end parallel
    end subroutine dep
    

    Here, the first task writes x, so the other two tasks cannot be scheduled until it has completed; because those two tasks only read x, they can run simultaneously.
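    For C programmers, the same writer/reader pattern might look like the following sketch. task_dep_demo and the two reader computations are hypothetical; the point is the depend(out: x) / depend(in: x) pairing inside a taskgroup.

```c
/* Hypothetical C counterpart of the Fortran example: one writer task and two
   reader tasks ordered through depend clauses, all inside a taskgroup. */
int task_dep_demo(void)
{
  int x = 1, a = 0, b = 0;

  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp taskgroup
    {
      #pragma omp task shared(x) depend(out: x)
      x = 2;                       /* writer */

      #pragma omp task shared(x, a) depend(in: x)
      a = x * 10;                  /* reader 1: waits for the writer */

      #pragma omp task shared(x, b) depend(in: x)
      b = x + 1;                   /* reader 2: may run alongside reader 1 */
    }                              /* end of taskgroup: all three tasks are done */
  }
  return a + b;                    /* 20 + 3 */
}
```

    Without the depend clauses, either reader could observe the old value x == 1.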

    User-Defined Reductions

    In OpenMP 3.0, C/C++ reductions were limited to the built-in arithmetic, bitwise and logical operators (Fortran additionally allowed a few intrinsic procedures such as max, min and iand). OpenMP 3.1 added min and max reductions for C and C++. OpenMP 4.0 goes further: users can define their own reductions for both arithmetic types and classes or structures by specifying a combiner operation and, optionally, an initializer operation. For example:

    struct S
    {
      int s;
      void foo (const S &x) { s += x.s; }  // combiner: add another partial sum
      S (const S &) { s = 0; }             // copies start at 0, the identity for foo
      S () { s = 0; }
      ~S () {}
    };

    #pragma omp declare reduction (foo: S: omp_out.foo (omp_in)) \
      initializer (omp_priv (omp_orig))

    defines a user-defined reduction named foo for type S. When it is used as:

    int bar ()
    {
      S s;
      #pragma omp parallel for reduction (foo: s)
      for (int i = 0; i < 64; i++)
        s.s += i;
      return s.s;
    }

    each thread gets its own private copy of "s", created through the copy constructor (so its "s" member starts at 0), and accumulates its partial sum in that copy. At the end, the foo method is called on the original "s" once per thread, with a reference to that thread's private copy. User-defined reductions can, of course, also be used together with SIMD constructs or device constructs.
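    User-defined reductions work in C as well. The sketch below assumes a hypothetical 2-D vector type, vec2, and a hypothetical reduction identifier, vadd; the combiner calls a helper through omp_out and omp_in, and the initializer gives every private copy the value {0, 0}.

```c
/* Hypothetical 2-D vector type and a user-defined "vadd" reduction over it. */
typedef struct { long x, y; } vec2;

static void vec2_add(vec2 *out, const vec2 *in)
{
  out->x += in->x;
  out->y += in->y;
}

/* Combiner merges two partial sums; each private copy starts at {0, 0}. */
#pragma omp declare reduction(vadd : vec2 : vec2_add(&omp_out, &omp_in)) \
    initializer(omp_priv = { 0, 0 })

/* Hypothetical helper: componentwise sum of n points. */
vec2 sum_points(const vec2 *pts, int n)
{
  vec2 total = { 0, 0 };
  int i;
  #pragma omp parallel for reduction(vadd : total)
  for (i = 0; i < n; i++)
    vec2_add(&total, &pts[i]);
  return total;
}
```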

    Construct Cancellation

    Some constructs (parallel, for, taskgroup and sections) can be cancelled in OpenMP 4.0, as long as the cancel construct is nested inside the construct being cancelled and a few other conditions are met. Cancellation is disabled by default: cancel directives take effect only when the OMP_CANCELLATION environment variable is set to true. Because C++ exceptions must not propagate out of an OpenMP construct, cancellation can be useful to avoid doing unnecessary work once an exception has been raised and caught inside the region. As another example, when tasks are used to search for something, the task that finds the one required result can cancel the entire taskgroup. Here's an example using C++ exceptions:

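    #include <cstddef>
    #include <exception>

    // N and something_that_might_throw () stand for the application's own code;
    // as in the catch clause below, the callee is assumed to throw pointers.
    void foo () {
      const std::exception *exc = NULL;
      #pragma omp parallel shared(exc)
      {
        #pragma omp for
        for (int i = 0; i < N; i++) {
          #pragma omp cancellation point for
          try { something_that_might_throw (); }
          catch (const std::exception *e) {
            #pragma omp atomic write
            exc = e;
            #pragma omp cancel for
          }
        }
        // The implicit barrier at the end of the loop makes exc visible here.
        if (exc) {
          #pragma omp cancel parallel
        }
      }
      if (exc)
        throw exc;  // rethrown outside the parallel region, where that is allowed
    }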

    In this case, exceptions are caught inside the loop construct and stored atomically into a shared variable. The thread that caught the exception then cancels the loop and continues to the barrier at the end of the loop construct. Other threads keep executing something_that_might_throw() until it returns; when they start their next iteration, however, the cancellation point construct detects the pending cancellation, and they bypass the remaining iterations of the worksharing construct.
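    The taskgroup search case mentioned earlier can be sketched in C as follows. find_index and CHUNK are hypothetical; each task scans one chunk, and the task that finds the key cancels the whole taskgroup so that pending tasks are skipped. If the key occurs more than once, any matching index may be returned. When OMP_CANCELLATION is not set to true, the cancel directive is a no-op and the remaining tasks simply run to completion, so the result is still correct.

```c
#define CHUNK 256  /* hypothetical number of elements searched per task */

/* Hypothetical helper: returns an index i with data[i] == key, or -1.
   One task per chunk; the task that finds the key cancels the taskgroup. */
int find_index(const int *data, int n, int key)
{
  int found = -1;
  int start;

  #pragma omp parallel
  #pragma omp single
  #pragma omp taskgroup
  {
    for (start = 0; start < n; start += CHUNK) {
      #pragma omp task firstprivate(start) shared(found)
      {
        int i, end = start + CHUNK < n ? start + CHUNK : n;
        for (i = start; i < end; i++)
          if (data[i] == key) {
            #pragma omp atomic write
            found = i;
            #pragma omp cancel taskgroup
            break;  /* only reached when cancellation is disabled */
          }
      }
    }
  }
  return found;
}
```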

    Wrap-Up

    That completes this brief tour of the major new OpenMP features that Red Hat Enterprise Linux developers can find in Red Hat Developer Toolset 3.0 Beta. We're always happy to receive your feedback and questions, so feel free to add a comment or drop us an email or tweet!

    Last updated: February 26, 2024
