Compiling and Running an OpenMP Application

Developer Guide

oneAPI GPU Optimization Guide

ID 771772

Date 12/16/2022

Use the following compiler options to enable OpenMP offload onto Intel® GPUs. These options apply to both C/C++ and Fortran.

-fiopenmp -fopenmp-targets=spir64

By default, the Intel® compiler converts the offloaded code into an intermediate representation called SPIR-V and stores it in the binary produced by the compilation process. At run time, the SPIR-V code is translated into native code for the target device, so the same binary can run on any supported hardware platform. This process is called Just-In-Time (JIT) compilation.

To enable the output of the compiler optimization report, add the following options:

-qopt-report=3 -O3

Note:

The -qopenmp compiler option is equivalent to -fiopenmp, and the two options can be used interchangeably.
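
To make the options above concrete, here is a minimal sketch of a C function with one offloaded loop. The function name saxpy_offload and the file names mentioned afterwards are assumptions made for illustration; they do not come from this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The loop is offloaded when the code is built with OpenMP offload enabled;
 * without OpenMP the pragma is ignored and the loop runs serially on the host. */
void saxpy_offload(float a, const float *x, float *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);

    /* map(to:) copies x to the device; map(tofrom:) copies y in and back out. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Assuming the function is saved in a hypothetical saxpy.c next to a main.c that calls it, a JIT build with the options above could look like: icx -fiopenmp -fopenmp-targets=spir64 main.c saxpy.c -o saxpy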

Ahead-Of-Time (AOT) Compilation

For Ahead-Of-Time (AOT) compilation for Arctic Sound, change the offload target to spir64_gen and specify an additional compiler option (-Xs) that names the device, as shown below. These options apply to both C/C++ and Fortran.

-fiopenmp -fopenmp-targets=spir64_gen -Xs "-device ats"

OpenMP Runtime Routines

The following are some device-related runtime routines:

omp_target_alloc
omp_target_free
omp_target_memcpy

The following runtime routines are supported by the Intel® compilers as Intel® extensions:

omp_target_alloc_host
omp_target_alloc_device
omp_target_alloc_shared

omp_target_free can be called to free up the memory allocated using the above Intel® extensions.
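
The standard routines listed above can be combined as in the following sketch, which copies a buffer from the host to device memory and back. The function name roundtrip_through_device is hypothetical. The sketch falls back to the host when no device is reported, and to plain memcpy when OpenMP is disabled, so it should build in either configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Copies n doubles host -> device -> host. Returns 0 on success. */
int roundtrip_through_device(const double *src, double *dst, size_t n)
{
    assert(src != NULL && dst != NULL && n > 0);
    size_t bytes = n * sizeof(double);
#ifdef _OPENMP
    int host = omp_get_initial_device();
    /* Use the default device if one exists, otherwise the host itself. */
    int dev = (omp_get_num_devices() > 0) ? omp_get_default_device() : host;

    /* With the Intel compilers, one of the Intel extensions listed above
     * (for example omp_target_alloc_device) could presumably be used here
     * instead; check its signature in the compiler documentation. */
    double *d_buf = (double *)omp_target_alloc(bytes, dev);
    if (d_buf == NULL)
        return -1;

    /* Arguments: dst, src, length, dst offset, src offset, dst device, src device. */
    int rc = omp_target_memcpy(d_buf, src, bytes, 0, 0, dev, host);
    if (rc == 0)
        rc = omp_target_memcpy(dst, d_buf, bytes, 0, 0, host, dev);

    omp_target_free(d_buf, dev);
    return rc;
#else
    /* OpenMP disabled: no device memory exists, so copy directly. */
    memcpy(dst, src, bytes);
    return 0;
#endif
}
```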

For a listing of OpenMP features supported in the icx, icpx, and ifx compilers, see:

Environment Variables

Below are some environment variables that are useful for debugging or improving the performance of programs.

For additional information on environment variables, see:

LIBOMPTARGET_DEBUG=1

Enables the display of debugging information from libomptarget.so.

LIBOMPTARGET_DEVICES=<DeviceKind>

Controls how sub-devices are exposed to users.

<DeviceKind> := DEVICE | SUBDEVICE | SUBSUBDEVICE |
                device | subdevice | subsubdevice

DEVICE/device: Only top-level devices are reported as OpenMP devices, and the subdevice clause is supported.

SUBDEVICE/subdevice: Only first-level sub-devices are reported as OpenMP devices, and the subdevice clause is ignored.

SUBSUBDEVICE/subsubdevice: Only second-level sub-devices are reported as OpenMP devices, and the subdevice clause is ignored. On Intel® GPUs using the Level Zero backend, limiting a subsubdevice to a single compute slice within a tile also requires setting the additional GPU compute runtime environment variable CFESingleSliceDispatchCCSMode=1.

The default is equivalent to <DeviceKind>=device.

LIBOMPTARGET_INFO=<Num>

Allows the user to request different types of runtime information from libomptarget. For details, see:

https://openmp.llvm.org/design/Runtimes.html#libomptarget-info

LIBOMPTARGET_LEVEL0_MEMORY_POOL=<Option>

Controls how the reusable memory pool is configured.

<Option>       := 0 | <PoolInfoList>
<PoolInfoList> := <PoolInfo>[,<PoolInfoList>]
<PoolInfo>     := <MemType>[,<AllocMax>[,<Capacity>[,<PoolSize>]]]
<MemType>      := all | device | host | shared
<AllocMax>     := positive integer or empty, max allocation size in MB
<Capacity>     := positive integer or empty, number of allocations from
                  a single block
<PoolSize>     := positive integer or empty, max pool size in MB

A pool is a list of memory blocks, each of which can serve at least <Capacity> allocations of up to <AllocMax> MB, with the total pool size not exceeding <PoolSize> MB.

LIBOMPTARGET_LEVEL0_STAGING_BUFFER_SIZE=<Num>

Sets the staging buffer size to <Num> KB. The staging buffer is used to optimize copy operations between host and device when the host memory is not Unified Shared Memory (USM). The staging buffer is only used for discrete devices. The default staging buffer size is 16 KB.

LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=copy

Enables batching of commands for data transfer in a target region.

If there are map(to: ) clauses on a target construct, this environment variable allows multiple data transfers from the host to the device to occur concurrently. Similarly, if there are map(from: ) clauses on the target construct, it allows multiple data transfers from the device to the host to occur concurrently. Note that a map(tofrom: ) clause, or a map( ) clause with no map type, is split into map(to: ) and map(from: ).
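
A target construct of the kind this variable affects might look like the sketch below, which has two host-to-device transfers (a and b) and two device-to-host transfers (sum and scaled) that the runtime could batch. The function name add_and_scale is hypothetical, and whether batching gives a measurable benefit depends on the transfer sizes and the runtime version.

```c
#include <assert.h>
#include <stddef.h>

/* sum[i] = a[i] + b[i] and scaled[i] = 2 * a[i] for i in [0, n). */
void add_and_scale(const float *a, const float *b,
                   float *sum, float *scaled, int n)
{
    assert(a != NULL && b != NULL && sum != NULL && scaled != NULL && n >= 0);

    /* Two map(to:) items and two map(from:) items: these are the data
     * transfers that command batching is meant to group. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: sum[0:n], scaled[0:n])
    for (int i = 0; i < n; ++i) {
        sum[i] = a[i] + b[i];
        scaled[i] = 2.0f * a[i];
    }
}
```

The code itself needs no change to use batching; the variable is set in the environment of the process that runs the program.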

LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST=<Bool>

Enables or disables the use of immediate command lists for kernel submission.

<Bool> := 1 | T | t | 0 | F | f

By default, immediate command lists are disabled.

LIBOMPTARGET_PLUGIN=<Name>

Designates the offload plugin name to use.

<Name> := LEVEL0 | OPENCL | X86_64 |
          level0 | opencl | x86_64

By default, the offload plugin is LEVEL0.

LIBOMPTARGET_PLUGIN_PROFILE=<Enable>[,<Unit>]

Enables basic plugin profiling and displays the result when the program finishes.

<Enable> := 1 | T
<Unit>   := usec | unit_usec

By default, plugin profiling is disabled.

If <Unit> is not specified, microseconds (usec) is the default unit.

LIBOMPTARGET_PROFILE=<FileName>

Allows libomptarget.so to generate time profile output similar to Clang’s -ftime-trace option.

OMP_TARGET_OFFLOAD=MANDATORY

Specifies that program execution is terminated if a device construct or device memory routine is encountered and the device is not available or is not supported by the implementation.
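
Without this setting, a target region can fall back to the host when no device is available. One way to check where a region actually ran is sketched below using the standard omp_is_initial_device routine; the function name target_region_ran_on_host is hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if the target region executed on the host, 0 if it ran on a device. */
int target_region_ran_on_host(void)
{
    int on_host = 1; /* stays 1 when OpenMP is disabled and the pragma is ignored */

    #pragma omp target map(tofrom: on_host)
    {
#ifdef _OPENMP
        /* Nonzero when the calling thread is executing on the host device. */
        on_host = omp_is_initial_device() ? 1 : 0;
#endif
    }
    return on_host;
}
```

With OMP_TARGET_OFFLOAD=MANDATORY set and no usable device, the program is expected to terminate at the target region instead of returning 1 from this function.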

Environment Variables to Control Implicit and Explicit Scaling

To disable implicit scaling and use one GPU tile only, set: ZE_AFFINITY_MASK=0.0

To enable explicit scaling, set: LIBOMPTARGET_DEVICES=subdevice

For Ponte Vecchio, implicit scaling is on by default.
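
With LIBOMPTARGET_DEVICES=subdevice, first-level sub-devices are reported as separate OpenMP devices, so explicit scaling amounts to distributing work over device numbers in the program. The following is a minimal sketch under that assumption; the function name fill_squares_across_devices and the even chunking are illustrative choices, not a method prescribed by this guide.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Writes out[i] = i * i for i in [0, n), one contiguous chunk per OpenMP device. */
void fill_squares_across_devices(int *out, int n)
{
    assert(out != NULL && n >= 0);

    int ndev = 1; /* treat "no devices" or "no OpenMP" as a single worker */
#ifdef _OPENMP
    if (omp_get_num_devices() > 0)
        ndev = omp_get_num_devices();
#endif
    int chunk = (n + ndev - 1) / ndev; /* ceiling division */

    for (int d = 0; d < ndev; ++d) {
        int lo = d * chunk;
        int len = (lo + chunk <= n) ? chunk : n - lo;
        if (len <= 0)
            break;

        /* Each chunk goes to its own device; only that slice is mapped back. */
        #pragma omp target teams distribute parallel for \
            device(d) map(from: out[lo:len])
        for (int i = lo; i < lo + len; ++i)
            out[i] = i * i;
    }
}
```

This version submits the chunks one after another to keep the sketch short; real code would typically add nowait to each target construct and a taskwait after the loop so that the devices work concurrently.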

Environment Variables for SYCL

There are several SYCL_PI_LEVEL_ZERO environment variables that are useful for the development and debugging of SYCL programs (not just OpenMP). They are documented at:

https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
