Developer Guide

Compiling and Running an OpenMP Application

Use the following compiler options to enable OpenMP offload onto Intel® GPUs. These options apply to both C/C++ and Fortran.
-fiopenmp -fopenmp-targets=spir64
By default, the Intel® compiler converts the program into an intermediate language called SPIR-V and stores it in the binary produced by the compilation process. The code can then be run on any hardware platform by translating the SPIR-V code into the platform's assembly code at run time. This process is called Just-In-Time (JIT) compilation.
To enable the output of the compiler optimization report, add the following options:
-qopt-report=3 -O3
Note:
  • The -qopenmp compiler option is equivalent to -fiopenmp, and the two options can be used interchangeably.

Ahead-Of-Time (AOT) Compilation

For Ahead-Of-Time (AOT) compilation for Arctic Sound, you need to specify an additional compiler option (-Xs), as shown below. This option applies to both C/C++ and Fortran.
-fiopenmp -fopenmp-targets=spir64_gen -Xs "-device ats"

OpenMP Runtime Routines

The following are some device-related runtime routines:
omp_target_alloc omp_target_free omp_target_memcpy
The following runtime routines are supported by the Intel® compilers as Intel® extensions:
omp_target_alloc_host omp_target_alloc_device omp_target_alloc_shared
omp_target_free can be called to free the memory allocated using the above Intel® extensions.
For a listing of OpenMP features supported in the icx, icpx, and ifx compilers, see:

Environment Variables

Below are some environment variables that are useful for debugging or improving the performance of programs.
For additional information on environment variables, see:
LIBOMPTARGET_DEBUG=1
Enables the display of debugging information from libomptarget.so.
LIBOMPTARGET_DEVICES=<DeviceKind>
Controls how sub-devices are exposed to users.
<DeviceKind> := DEVICE | SUBDEVICE | SUBSUBDEVICE | device | subdevice | subsubdevice
DEVICE/device: Only top-level devices are reported as OpenMP devices, and the subdevice clause is supported.
SUBDEVICE/subdevice: Only first-level sub-devices are reported as OpenMP devices, and the subdevice clause is ignored.
SUBSUBDEVICE/subsubdevice: Only second-level sub-devices are reported as OpenMP devices, and the subdevice clause is ignored. On Intel® GPUs using the Level Zero backend, limiting the subsubdevice to a single compute slice within a tile also requires setting the additional GPU compute runtime environment variable CFESingleSliceDispatchCCSMode=1.
The default is equivalent to <DeviceKind>=device.
LIBOMPTARGET_INFO=<Num>
Allows the user to request different types of runtime information from libomptarget. For details, see:
LIBOMPTARGET_LEVEL0_MEMORY_POOL=<Option>
Controls how the reusable memory pool is configured.
<Option> := 0 | <PoolInfoList>
<PoolInfoList> := <PoolInfo>[,<PoolInfoList>]
<PoolInfo> := <MemType>[,<AllocMax>[,<Capacity>[,<PoolSize>]]]
<MemType> := all | device | host | shared
<AllocMax> := positive integer or empty, max allocation size in MB
<Capacity> := positive integer or empty, number of allocations from a single block
<PoolSize> := positive integer or empty, max pool size in MB
A pool is a list of memory blocks that can serve at least <Capacity> allocations of up to <AllocMax> size from a single block, with the total size not exceeding <PoolSize>.
LIBOMPTARGET_LEVEL0_STAGING_BUFFER_SIZE=<Num>
Sets the staging buffer size to <Num> KB. The staging buffer is used to optimize copy operations between host and device when host memory is not Unified Shared Memory (USM). The staging buffer is only used for discrete devices. The default staging buffer size is 16 KB.
LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=copy
Enables batching of commands for data transfer in a target region.
If there are map(to: ) clauses on a target construct, then this environment variable allows multiple data transfers from the host to the device to occur concurrently. Similarly, if there are map(from: ) clauses on the target construct, this environment variable allows multiple data transfers from the device to the host to occur concurrently. Note that map(tofrom: ) or map( ) would be split into map(to: ) and map(from: ).
LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST=<Bool>
Enables or disables the use of an immediate command list for kernel submission.
<Bool> := 1 | T | t | 0 | F | f
By default, the use of an immediate command list is disabled.
LIBOMPTARGET_PLUGIN=<Name>
Designates the offload plugin name to use.
<Name> := LEVEL0 | OPENCL | X86_64 | level0 | opencl | x86_64
By default, the offload plugin is LEVEL0.
LIBOMPTARGET_PLUGIN_PROFILE=<Enable>[,<Unit>]
Enables basic plugin profiling and displays the result when the program finishes.
<Enable> := 1 | T
<Unit> := usec | unit_usec
By default, plugin profiling is disabled. If <Unit> is not specified, microsecond (usec) is the default unit.
LIBOMPTARGET_PROFILE=<FileName>
Allows libomptarget.so to generate time profile output similar to Clang’s -ftime-trace option.
OMP_TARGET_OFFLOAD=MANDATORY
Specifies that program execution is terminated if a device construct or device memory routine is encountered and the device is not available or is not supported by the implementation.
Environment Variables to Control Implicit and Explicit Scaling
To disable implicit scaling and use one GPU tile only, set:
ZE_AFFINITY_MASK=0.0
To enable explicit scaling, set:
LIBOMPTARGET_DEVICES=subdevice
For Ponte Vecchio, implicit scaling is on by default.
Environment Variables for SYCL
There are several SYCL_PI_LEVEL_ZERO environment variables that are useful for the development and debugging of SYCL programs (not just OpenMP). They are documented at:
