Compiling and Running an OpenMP Application
Use the following compiler options to enable OpenMP offload onto
Intel® GPUs. These options apply to both C/C++ and Fortran.
-fiopenmp -fopenmp-targets=spir64
By default the Intel® compiler converts the program into an
intermediate language called SPIR-V and stores that in the binary
produced by the compilation process. The code can be run on any
hardware platform by translating the SPIR-V code into the assembly
code of the platform at runtime. This process is called Just-In-Time
(JIT) compilation.
To enable the output of the compiler optimization report, add the
following options:
-qopt-report=3 -O3
Note:
- The -qopenmp compiler option is equivalent to -fiopenmp, and the two options can be used interchangeably.
Ahead-Of-Time (AOT) Compilation
For Ahead-Of-Time (AOT) compilation for Arctic Sound, specify an
additional compiler option (-Xs), as shown below. This option
applies to both C/C++ and Fortran.
-fiopenmp -fopenmp-targets=spir64_gen -Xs "-device ats"
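The JIT and AOT option sets described above can be wrapped in a small helper so that a build script switches between them with one argument. The following is a minimal sketch, not an official build recipe: the function name compile_offload, the mode names jit and aot, and the OFFLOAD_CC variable are hypothetical, and icx is assumed as the default compiler driver.

```shell
# compile_offload MODE SRC OUT
#   MODE is "jit" (SPIR-V embedded, translated at run time) or
#   "aot" (device code generated at build time for Arctic Sound).
# OFFLOAD_CC is a hypothetical override for the compiler driver.
compile_offload() {
  mode=$1
  src=$2
  out=$3
  case $mode in
    jit) set -- -fiopenmp -fopenmp-targets=spir64 ;;
    aot) set -- -fiopenmp -fopenmp-targets=spir64_gen -Xs "-device ats" ;;
    *)   echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  # "$@" now holds the flag list; "-device ats" stays a single argument.
  "${OFFLOAD_CC:-icx}" "$@" "$src" -o "$out"
}
```

For example, compile_offload aot saxpy.c saxpy would build a hypothetical saxpy.c ahead of time. Holding the flags in the positional parameters keeps the quoted "-device ats" argument intact, which a plain string variable would not.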
OpenMP Runtime Routines
The following are some device-related runtime routines:
omp_target_alloc
omp_target_free
omp_target_memcpy
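As an illustration of how these three routines fit together, the sketch below writes a small C program to a temporary directory and builds it only if icx is found on PATH. The file name target_alloc_demo.c, the array size, and the use of mktemp are assumptions made for the example. The program allocates device memory, copies a host array into it, doubles each element in a target region, and copies the result back.

```shell
# Hypothetical file names; a temporary directory keeps the sketch self-contained.
workdir=$(mktemp -d)
src="$workdir/target_alloc_demo.c"

cat > "$src" <<'EOF'
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 1024;
    const int dev = omp_get_default_device();   /* offload target */
    const int host = omp_get_initial_device();  /* host device number */
    const size_t bytes = n * sizeof(double);

    double *h = malloc(bytes);
    if (!h) return 1;
    for (int i = 0; i < n; i++) h[i] = (double)i;

    /* Allocate device memory and copy the host buffer into it. */
    double *d = omp_target_alloc(bytes, dev);
    if (!d) { free(h); return 1; }
    if (omp_target_memcpy(d, h, bytes, 0, 0, dev, host) != 0) {
        omp_target_free(d, dev);
        free(h);
        return 1;
    }

    /* Operate directly on the device pointer. */
    #pragma omp target teams distribute parallel for is_device_ptr(d) device(dev)
    for (int i = 0; i < n; i++) d[i] *= 2.0;

    /* Copy the result back, then release the device allocation. */
    omp_target_memcpy(h, d, bytes, 0, 0, host, dev);
    omp_target_free(d, dev);

    printf("h[10] = %.1f\n", h[10]);
    free(h);
    return 0;
}
EOF

# Build and run only when the Intel compiler driver is available.
if command -v icx >/dev/null 2>&1; then
  icx -fiopenmp -fopenmp-targets=spir64 "$src" -o "$workdir/target_alloc_demo" &&
    "$workdir/target_alloc_demo"
else
  echo "icx not found; wrote $src only"
fi
```

Because the data is moved explicitly with omp_target_memcpy, the target region needs no map clauses. The is_device_ptr clause tells the compiler that d already refers to device memory.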
The following runtime routines are supported by the Intel® compilers
as Intel® extensions:
omp_target_alloc_host
omp_target_alloc_device
omp_target_alloc_shared
omp_target_free can be called to free memory allocated using the above
Intel® extensions.
For a listing of OpenMP features supported in the icx, icpx, and ifx
compilers, see:
Environment Variables
Below are some environment variables that are useful for debugging or
improving the performance of programs.
For additional information on environment variables, see:
LIBOMPTARGET_DEBUG=1
Enables the display of debugging information from libomptarget.so.
LIBOMPTARGET_DEVICES=<DeviceKind>
Controls how sub-devices are exposed to users.
<DeviceKind> := DEVICE | SUBDEVICE | SUBSUBDEVICE |
device | subdevice | subsubdevice
DEVICE/device: Only top-level devices are reported as OpenMP devices,
and the subdevice clause is supported.
SUBDEVICE/subdevice: Only first-level sub-devices are reported as
OpenMP devices, and the subdevice clause is ignored.
SUBSUBDEVICE/subsubdevice: Only second-level sub-devices are reported
as OpenMP devices, and the subdevice clause is ignored. On Intel® GPUs
using the Level Zero backend, limiting the subsubdevice to a single
compute slice within a tile also requires setting the additional GPU
compute runtime environment variable CFESingleSliceDispatchCCSMode=1.
The default is equivalent to <DeviceKind>=device.
LIBOMPTARGET_INFO=<Num>
Allows the user to request different types of runtime information from
libomptarget. For details, see:
LIBOMPTARGET_LEVEL0_MEMORY_POOL=<Option>
Controls how the reusable memory pool is configured.
<Option> := 0 | <PoolInfoList>
<PoolInfoList> := <PoolInfo>[,<PoolInfoList>]
<PoolInfo> := <MemType>[,<AllocMax>[,<Capacity>[,<PoolSize>]]]
<MemType> := all | device | host | shared
<AllocMax> := positive integer or empty, max allocation size in MB
<Capacity> := positive integer or empty, number of allocations from
a single block
<PoolSize> := positive integer or empty, max pool size in MB
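To make the grammar above concrete, the following sketch splits a single <PoolInfo> value into its four fields and prints them. The helper name describe_pool_info and the sample values are hypothetical and are not recommended settings. The helper handles one <PoolInfo> only, not a full <PoolInfoList>.

```shell
# describe_pool_info POOLINFO
#   Splits one <PoolInfo> string on commas and labels each field.
#   Fields left empty in the input are printed as "empty".
describe_pool_info() {
  IFS=, read -r mem_type alloc_max capacity pool_size <<EOF
$1
EOF
  printf 'MemType=%s AllocMax=%s Capacity=%s PoolSize=%s\n' \
    "$mem_type" "${alloc_max:-empty}" "${capacity:-empty}" "${pool_size:-empty}"
}

# Device memory: allocations up to 1 MB, 4 allocations per block, 256 MB pool.
describe_pool_info device,1,4,256

# Shared memory: only <Capacity> is given; <AllocMax> and <PoolSize> are empty.
describe_pool_info shared,,8

# Illustrative setting that uses the first form.
export LIBOMPTARGET_LEVEL0_MEMORY_POOL=device,1,4,256
```

The two calls show that trailing fields can be omitted and that a field in the middle can be left empty by writing consecutive commas, as the grammar permits.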
A pool is a list of memory blocks that can serve at least <Capacity>
allocations of up to <AllocMax> size from a single block, with total
size not exceeding <PoolSize>.
LIBOMPTARGET_LEVEL0_STAGING_BUFFER_SIZE=<Num>
Sets the staging buffer size to <Num> KB. The staging buffer is used
to optimize copy operations between host and device when host memory is
not Unified Shared Memory (USM). The staging buffer is only used for
discrete devices. The default staging buffer size is 16 KB.
LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=copy
Enables batching of commands for data transfer in a target region.
If there are map(to: ) clauses on a target construct, then this
environment variable allows multiple data transfers from the host to
the device to occur concurrently. Similarly, if there are map(from: )
clauses on the target construct, this environment variable allows
multiple data transfers from the device to the host to occur
concurrently. Note that map(tofrom: ) or map( ) would be split into
map(to: ) and map(from: ).
LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST=<Bool>
Enables/disables using immediate command list for kernel submission.
<Bool> := 1 | T | t | 0 | F | f
By default, using immediate command list is disabled.
LIBOMPTARGET_PLUGIN=<Name>
Designates the offload plugin name to use.
<Name> := LEVEL0 | OPENCL | X86_64 |
level0 | opencl | x86_64
By default, the offload plugin is LEVEL0.
LIBOMPTARGET_PLUGIN_PROFILE=<Enable>[,<Unit>]
Enables basic plugin profiling and displays the result when the
program finishes.
<Enable> := 1 | T
<Unit> := usec | unit_usec
By default, plugin profiling is disabled.
If <Unit> is not specified, microsecond (usec) is the default unit.
LIBOMPTARGET_PROFILE=<FileName>
Allows libomptarget.so to generate time profile output similar to
Clang’s -ftime-trace option.
OMP_TARGET_OFFLOAD=MANDATORY
Specifies that program execution is terminated if a device construct
or device memory routine is encountered and the device is not
available or is not supported by the implementation.
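Several of the variables above are often combined while chasing an offload problem. The sketch below shows one way to apply them to a single command without changing the calling shell's environment. The wrapper name run_offload_debug is hypothetical, and the particular combination of variables is an illustrative choice rather than a recommended configuration.

```shell
# run_offload_debug CMD [ARGS...]
#   Runs CMD with libomptarget debug output enabled, offload made
#   mandatory, and plugin profiling reported in microseconds.
#   The variables are set for CMD only, not for the calling shell.
run_offload_debug() {
  env LIBOMPTARGET_DEBUG=1 \
      OMP_TARGET_OFFLOAD=MANDATORY \
      LIBOMPTARGET_PLUGIN_PROFILE=T,usec \
      "$@"
}
```

For example, run_offload_debug ./my_app would launch a hypothetical my_app binary with these settings. With OMP_TARGET_OFFLOAD=MANDATORY the program terminates if the device is unavailable, so a run that would otherwise fall back to the host fails visibly.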
Environment Variables to Control Implicit and Explicit Scaling
To disable implicit scaling and use one GPU tile only, set:
ZE_AFFINITY_MASK=0.0
To enable explicit scaling, set:
LIBOMPTARGET_DEVICES=subdevice
For Ponte Vecchio, implicit scaling is on by default.
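The two settings above can be kept in one place so that a job script selects a scaling mode by name. This is a minimal sketch: the function name set_scaling_mode and the mode names single-tile and explicit are hypothetical. Clearing the other variable on each switch is a choice made for this sketch, so that a leftover setting from one mode does not carry into the other.

```shell
# set_scaling_mode MODE
#   single-tile: disable implicit scaling and use one GPU tile only.
#   explicit:    expose first-level sub-devices as OpenMP devices.
set_scaling_mode() {
  case $1 in
    single-tile)
      unset LIBOMPTARGET_DEVICES
      export ZE_AFFINITY_MASK=0.0
      ;;
    explicit)
      unset ZE_AFFINITY_MASK
      export LIBOMPTARGET_DEVICES=subdevice
      ;;
    *)
      echo "unknown scaling mode: $1" >&2
      return 2
      ;;
  esac
}
```

Calling set_scaling_mode explicit before launching the application exports LIBOMPTARGET_DEVICES=subdevice for every command that follows in the same shell.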
Environment Variables for SYCL
There are several SYCL_PI_LEVEL_ZERO environment variables that are
useful for the development and debugging of SYCL programs (not just
OpenMP). They are documented at: