Add OpenMP Code to Synchronize the Shared Resources
OpenMP provides several forms of synchronization:
- A critical section prevents multiple threads from executing the critical section's code at the same time, so only one thread at a time can update the data referenced by the code. A critical section may consist of one or more statements. To implement a critical section:
- With C/C++: #pragma omp critical
- With Fortran: !$omp critical and !$omp end critical
Use the optional named form for a non-nested mutex, such as (C/C++) #pragma omp critical(name) or (Fortran) !$omp critical(name) and !$omp end critical(name). If the optional (name) is omitted, it locks a single unnamed global mutex. The easiest approach is to use the unnamed form unless performance measurement shows that this shared mutex is causing unacceptable delays.
- An atomic operation allows multiple threads to safely update a shared numeric variable on hardware platforms that support its use. An atomic operation applies to only the one assignment statement that immediately follows it. To implement an atomic operation:
- With C/C++: insert a #pragma omp atomic before the statement to be protected.
- With Fortran: insert a !$omp atomic before the statement to be protected.
The statement to be protected must meet certain criteria (see your compiler or OpenMP documentation).
- Locks provide a low-level means of general-purpose locking. To implement a lock, use the OpenMP types, variables, and functions to provide more flexible and powerful use of locks. For example, use the omp_lock_t type in C/C++ or the type=omp_lock_kind in Fortran. These types and functions are easy to use and usually directly replace Intel Advisor lock annotations.
- Reduction operations can be used for simple cases, such as incrementing a shared numeric variable or summing an array into a shared numeric variable. To implement a reduction operation, add the reduction clause within a parallel region to instruct the compiler to perform the summation operation in parallel using the specified operation and variable.
- OpenMP provides other synchronization techniques, including specifying a barrier construct where threads will wait for each other, an ordered construct that ensures sequential execution of a structured block within a parallel loop, and master regions that can only be executed by the master thread. For more information, see your compiler or OpenMP documentation.
After you rewrite your code to use the OpenMP* parallel framework, you can analyze its performance with Intel® Advisor perspectives. Use the Vectorization and Code Insights perspective to analyze how well your OpenMP code is vectorized, or use the Offload Modeling perspective to model its performance on a GPU.
The topics above briefly describe these forms of synchronization. Check your compiler documentation for details.