Intel® Advisor User Guide

ID 766448
Date 6/24/2024
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Add OpenMP Code to Synchronize the Shared Resources

OpenMP provides several forms of synchronization:

  • A critical section prevents multiple threads from accessing the critical section's code at the same time, thus only one active thread can update the data referenced by the code. A critical section may consist of one or more statements. To implement a critical section:

    • With C/C++: #pragma omp critical

    • With Fortran: !$omp critical and !$omp end critical

    Use the optional named form for a non-nested mutex, such as (C/C++) #pragma omp critical(name) or (Fortran) !$omp critical(name) and !$omp end critical(name). If the optional (name) is omitted, it locks a single unnamed global mutex. The easiest approach is to use the unnamed form unless performance measurement shows this shared mutex is causing unacceptable delays.

  • An atomic operation allows multiple threads to safely update a shared numeric variable on hardware platforms that support its use. An atomic operation applies to only one assignment statement that immediately follows it. To implement an atomic operation:

    • With C/C++: insert a #pragma omp atomic before the statement to be protected.

    • With Fortran: insert a !$omp atomic before the statement to be protected.

    The statement to be protected must meet certain criteria (see your compiler or OpenMP documentation).

  • Locks provide a low-level means of general-purpose locking. To implement a lock, use the OpenMP types, variables, and functions to provide more flexible and powerful use of locks. For example, use the omp_lock_t type in C/C++ or the type=omp_lock_kind in Fortran. These types and functions are easy to use and usually directly replace Intel Advisor lock annotations.

  • Reduction operations can be used for simple cases, such as incrementing a shared numeric variable or summing an array into a shared numeric variable. To implement a reduction operation, add the reduction clause within a parallel region to instruct the compiler to perform the summation operation in parallel using the specified operation and variable.

  • OpenMP provides other synchronization techniques, including specifying a barrier construct where threads will wait for each other, an ordered construct that ensures sequential execution of a structured block within a parallel loop, and master regions that can only be executed by the master thread. For more information, see your compiler or OpenMP documentation.

TIP:
After you rewrite your code to use OpenMP* parallel framework, you can analyze its performance with Intel® Advisor perspectives. Use the Vectorization and Code Insights perspective to analyze how well you OpenMP code is vectorized or use the Offload Modeling perspective to model its performance on a GPU.

The following topics briefly describe these forms of synchronization. Check your compiler documentation for details.