Intel® Fortran Compiler Classic and Intel® Fortran Compiler Developer Guide and Reference

ID 767251
Date 3/22/2024

PARALLEL Directive for OpenMP

OpenMP* Fortran Compiler Directive: Defines a parallel region.

Syntax

!$OMP PARALLEL [clause[[,] clause] ... ]

   loosely-structured-block

!$OMP END PARALLEL

-or-

!$OMP PARALLEL [clause[[,] clause] ... ]

   strictly-structured-block

[!$OMP END PARALLEL]

clause

Is one or more of the following:

  • ALLOCATE ([allocator :] list)

  • COPYIN (list)

  • DEFAULT (PRIVATE | FIRSTPRIVATE | SHARED | NONE)

  • FIRSTPRIVATE (list)

  • IF ([PARALLEL:] scalar-logical-expression)

  • NUM_THREADS (scalar-integer-expression)

    Specifies the number of threads to be used in a parallel region. The scalar-integer-expression must evaluate to a positive scalar integer value. Only a single NUM_THREADS clause can appear in the directive.

  • PRIVATE (list)

  • PROC_BIND (PRIMARY | MASTER | CLOSE | SPREAD)

    Specifies a method for mapping the threads in the team to the "places" in the current partition.

    Once a thread is assigned to a place, the OpenMP* implementation should not move it to another place.

    PRIMARY instructs the execution environment to assign every thread in the team to the same place as the primary thread. MASTER has been deprecated and replaced by PRIMARY. CLOSE instructs the execution environment to assign the threads to places close to the place of the parent thread. SPREAD creates a sparse distribution for a team of T threads among the P places of the parent's place partition.

    For CLOSE and SPREAD, the threads with the smallest thread numbers are assigned starting with the place of the primary thread. For CLOSE, threads are packed into consecutive places within the parent's partition. For SPREAD, the partition of the primary thread is subdivided, and threads in the team are assigned round-robin to those subpartitions.

    Only a single PROC_BIND clause can appear in the directive.

  • REDUCTION ([reduction-modifier, ]reduction-identifier : list)

  • SHARED (list)
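The following sketch combines several of these clauses in one directive. It assumes a compiler with OpenMP enabled (for example, ifx -qopenmp or gfortran -fopenmp) and the standard OMP_LIB module; the program name, the variables, and the request for four threads are illustrative only. Because PARALLEL does not divide work among threads by itself, each thread strides over the index range using its own thread number.

```fortran
program clause_demo
  use omp_lib
  implicit none
  integer, parameter :: n = 1000
  integer :: i, tid, nthr, offset, total
  logical :: big_enough

  offset = 10            ! FIRSTPRIVATE: each thread starts with its own copy of 10
  total = 0              ! REDUCTION(+:total): per-thread partial sums are added at the end
  big_enough = n > 100   ! IF: go parallel only when the problem is large enough

  !$omp parallel if(big_enough) num_threads(4) proc_bind(close) &
  !$omp default(none) private(i, tid, nthr) firstprivate(offset) reduction(+:total)
  tid = omp_get_thread_num()
  nthr = omp_get_num_threads()
  ! PARALLEL does not split the loop: each thread takes every nthr-th index
  do i = tid + 1, n, nthr
     total = total + i + offset
  end do
  !$omp end parallel

  print '(a, i0)', 'total = ', total   ! 500500 + 1000*10 = 510500
end program clause_demo
```

The result does not depend on how many threads the runtime actually provides, because every index is visited by exactly one thread and REDUCTION combines the partial sums.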

loosely-structured-block

Is a structured block (section) of statements or constructs. You cannot branch into or out of the block.

strictly-structured-block

Is a Fortran BLOCK construct. You cannot branch into or out of the BLOCK construct.
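A minimal sketch of the strictly-structured form, assuming a compiler that implements the OpenMP 5.1 BLOCK syntax and the standard OMP_LIB module; the names are illustrative. The closing !$OMP END PARALLEL directive is omitted here, and the variable declared inside the BLOCK construct is private to each thread.

```fortran
program block_demo
  use omp_lib
  implicit none
  integer :: counts(0:63)
  counts = 0

  ! Strictly-structured form: the region body is a BLOCK construct,
  ! so the closing !$omp end parallel directive can be left out.
  !$omp parallel num_threads(4) shared(counts)
  block
    integer :: me            ! declared inside BLOCK, so each thread has its own copy
    me = omp_get_thread_num()
    counts(me) = counts(me) + 1   ! each thread updates only its own element
  end block

  print '(a, i0)', 'threads that ran the block: ', sum(counts)
end program block_demo
```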

The PARALLEL and END PARALLEL directive pair must appear in the same routine in the executable section of the code.

The END PARALLEL directive denotes the end of the parallel region. There is an implied barrier at this point: no thread continues past it until every thread in the team has reached it. Only the primary thread of the team continues execution after the end of the parallel region.
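Because of the implied barrier, everything the team wrote inside the region is complete when the primary thread continues past END PARALLEL. A minimal sketch, assuming the standard OMP_LIB module; the names and the request for four threads are illustrative.

```fortran
program barrier_demo
  use omp_lib
  implicit none
  integer, parameter :: maxthreads = 64
  integer :: partial(0:maxthreads-1), nthr, total
  partial = 0
  nthr = 0

  !$omp parallel num_threads(4) shared(partial, nthr)
  ! Each thread writes only its own slot, so no extra synchronization is needed
  partial(omp_get_thread_num()) = 10 * (omp_get_thread_num() + 1)
  if (omp_get_thread_num() == 0) nthr = omp_get_num_threads()
  !$omp end parallel
  ! Implied barrier: all writes above are finished, and only the
  ! primary thread executes the statements below.

  total = sum(partial(0:nthr-1))
  print '(a, i0, a, i0)', 'threads: ', nthr, '  total: ', total
end program barrier_demo
```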

The number of threads in the team can be controlled by the NUM_THREADS clause, the environment variable OMP_NUM_THREADS, or by calling the runtime library routine OMP_SET_NUM_THREADS from a serial portion of the program.

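The NUM_THREADS clause takes precedence over the OMP_SET_NUM_THREADS routine, which in turn takes precedence over the OMP_NUM_THREADS environment variable. A NUM_THREADS clause applies only to the parallel region on which it appears; subsequent parallel regions are not affected unless they have their own NUM_THREADS clauses.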
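A sketch of this precedence, assuming the standard OMP_LIB module; the names are illustrative. OMP_SET_DYNAMIC(.FALSE.) is called first so that the runtime does not choose a smaller team than requested.

```fortran
program precedence_demo
  use omp_lib
  implicit none
  integer :: first_team, second_team

  call omp_set_dynamic(.false.)   ! keep the requested team sizes exact
  call omp_set_num_threads(2)     ! takes precedence over OMP_NUM_THREADS

  !$omp parallel num_threads(3) shared(first_team)
  !$omp single
  first_team = omp_get_num_threads()    ! the NUM_THREADS clause wins: 3
  !$omp end single
  !$omp end parallel

  !$omp parallel shared(second_team)
  !$omp single
  second_team = omp_get_num_threads()   ! no clause: the routine's value, 2
  !$omp end single
  !$omp end parallel

  print '(a, i0, a, i0)', 'first: ', first_team, '  second: ', second_team
end program precedence_demo
```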

Once the team is formed, the number of threads in it remains constant for the duration of that parallel region.

If dynamic adjustment of the number of threads is enabled, by the OMP_DYNAMIC environment variable or the OMP_SET_DYNAMIC library routine, the number of threads requested by the NUM_THREADS clause is the maximum number to use in the parallel region; the runtime may create a smaller team.
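A sketch of the dynamic case, assuming the standard OMP_LIB module; the names are illustrative. The actual team size is implementation dependent, so the program only reports it.

```fortran
program dynamic_demo
  use omp_lib
  implicit none
  integer :: team

  call omp_set_dynamic(.true.)   ! allow the runtime to pick a smaller team
  !$omp parallel num_threads(8) shared(team)
  !$omp single
  team = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  ! With dynamic adjustment enabled, 8 is only an upper bound
  print '(a, i0, a, l1)', 'team size: ', team, '  dynamic: ', omp_get_dynamic()
end program dynamic_demo
```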

The code contained within the dynamic extent of the parallel region is executed on each thread, and the code path can be different for different threads.
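A minimal sketch of divergent code paths, assuming the standard OMP_LIB module; the names are illustrative. The primary thread takes one branch and every other thread in the team takes the other.

```fortran
program divergent_demo
  use omp_lib
  implicit none
  integer :: team, primary_ran, worked(0:63)
  team = 0
  primary_ran = 0
  worked = 0

  !$omp parallel num_threads(4) shared(team, primary_ran, worked)
  if (omp_get_thread_num() == 0) then
     ! Path taken only by the primary thread (thread 0)
     primary_ran = 1
     team = omp_get_num_threads()
  else
     ! Path taken by every other thread in the team
     worked(omp_get_thread_num()) = 1
  end if
  !$omp end parallel

  print '(a, i0, a, i0)', 'team: ', team, '  non-primary threads: ', sum(worked)
end program divergent_demo
```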

If a thread executing a parallel region encounters another parallel region, it creates a new team and becomes the primary thread of that new team. By default, nested parallel regions are serialized, that is, executed by a team of one thread.
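The following sketch makes the serialization explicit by calling OMP_SET_MAX_ACTIVE_LEVELS(1) instead of relying on the default; it assumes the standard OMP_LIB module, and the names are illustrative. The inner region requests two threads but runs with one.

```fortran
program nested_demo
  use omp_lib
  implicit none
  integer :: inner_team, inner_level, inner_active

  call omp_set_dynamic(.false.)
  ! Permit only one active level, so any nested region is serialized
  call omp_set_max_active_levels(1)

  !$omp parallel num_threads(2) shared(inner_team, inner_level, inner_active)
  ! Each outer thread creates its own inner team and becomes its primary thread
  !$omp parallel num_threads(2) shared(inner_team, inner_level, inner_active)
  if (omp_get_ancestor_thread_num(1) == 0) then   ! only outer thread 0 records
     inner_team = omp_get_num_threads()       ! 1: the inner region is serialized
     inner_level = omp_get_level()            ! 2: two enclosing parallel regions
     inner_active = omp_get_active_level()    ! 1: only the outer region is active
  end if
  !$omp end parallel
  !$omp end parallel

  print '(3(a, i0))', 'inner team: ', inner_team, '  level: ', inner_level, &
       '  active level: ', inner_active
end program nested_demo
```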

Example

You can use the PARALLEL directive in coarse-grain parallel programs. In the following example, each thread in the parallel region uses its thread number to decide which part of the global array X to work on:

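  ! Assumes USE OMP_LIB, so that the OMP_* functions are typed INTEGER
  !$OMP PARALLEL DEFAULT(PRIVATE) SHARED(X,NPOINTS)
        IAM = OMP_GET_THREAD_NUM( )     ! this thread's number, 0 to NP-1
        NP = OMP_GET_NUM_THREADS( )     ! number of threads in the team
        IPOINTS = NPOINTS/NP            ! points per thread (integer division)
        CALL SUBDOMAIN(X,IAM,IPOINTS)   ! work on this thread's part of X
  !$OMP END PARALLEL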
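A self-contained version of this example might look as follows. SUBDOMAIN is not defined in this topic, so the module below is a hypothetical stand-in that takes an extra ISTART argument. The remainder handling for the last thread is also an addition: the fragment above leaves unassigned any points left over by the integer division NPOINTS/NP.

```fortran
module subdomain_mod
  implicit none
contains
  ! Hypothetical SUBDOMAIN: tags this thread's slice of X with its thread number
  subroutine subdomain(x, iam, istart, ipoints)
    real, intent(inout) :: x(:)
    integer, intent(in) :: iam, istart, ipoints
    x(istart:istart + ipoints - 1) = real(iam)
  end subroutine subdomain
end module subdomain_mod

program coarse_grain
  use omp_lib
  use subdomain_mod
  implicit none
  integer :: npoints, iam, np, ipoints, istart
  real, allocatable :: x(:)

  npoints = 1000
  allocate(x(npoints))
  x = -1.0                             ! -1 marks points no thread has touched

  !$omp parallel default(private) shared(x, npoints)
  iam = omp_get_thread_num()
  np = omp_get_num_threads()
  ipoints = npoints / np               ! base chunk size (integer division)
  istart = iam * ipoints + 1
  ! The last thread also takes the remainder left over by the division
  if (iam == np - 1) ipoints = npoints - istart + 1
  call subdomain(x, iam, istart, ipoints)
  !$omp end parallel

  print '(a, l1)', 'every point assigned: ', all(x >= 0.0)
end program coarse_grain
```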

Assuming you previously used the environment variable OMP_NUM_THREADS to set the number of threads to six, you can change the number of threads between parallel regions as follows:

        CALL OMP_SET_NUM_THREADS(3)   ! later regions request 3 threads
  !$OMP PARALLEL
  ...
  !$OMP END PARALLEL
        CALL OMP_SET_NUM_THREADS(4)   ! later regions request 4 threads
  !$OMP PARALLEL DO
  ...
  !$OMP END PARALLEL DO
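A runnable variant of this example is sketched below. It assumes the standard OMP_LIB module, uses a hypothetical loop in place of the elided body of the PARALLEL DO region, and adds OMP_SET_DYNAMIC(.FALSE.) so that the runtime does not shrink either team.

```fortran
program change_team_size
  use omp_lib
  implicit none
  integer, parameter :: n = 100
  integer :: i, team1, team2
  real :: a(n)

  call omp_set_dynamic(.false.)   ! make the requested sizes exact for this demo

  call omp_set_num_threads(3)
  !$omp parallel shared(team1)
  !$omp single
  team1 = omp_get_num_threads()   ! 3
  !$omp end single
  !$omp end parallel

  call omp_set_num_threads(4)
  !$omp parallel do shared(a, team2)
  do i = 1, n
     if (i == 1) team2 = omp_get_num_threads()   ! 4; only iteration i = 1 writes it
     a(i) = 2.0 * real(i)
  end do
  !$omp end parallel do

  print '(a, i0, a, i0)', 'first region: ', team1, '  second region: ', team2
end program change_team_size
```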