C/C++ or Fortran with OpenMP* Offload Programming Model
The Intel® oneAPI DPC++/C++ Compiler and the Intel® Fortran Compiler enable software developers to use OpenMP* directives to offload work to Intel accelerators to improve the performance of applications.
The OpenMP target construct transfers control from the host to the target device, and variables are mapped between the host and the device. By default, the host thread waits until the offloaded computations are complete. For asynchronous execution, add the nowait clause, which specifies that the encountering thread does not wait for the target region to complete; other OpenMP tasks can then run on the host in the meantime.
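As an illustration of the nowait behavior described above, here is a minimal sketch. The function name scale_async and the choice of host work are hypothetical, and the code assumes the data lives in a std::vector whose raw pointer is mapped as an array section.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original text): doubles every element
// of v in a target region while the host sums the integers 0..n-1.
long scale_async(std::vector<float> &v) {
    const int n = static_cast<int>(v.size());
    assert(n > 0);
    float *x = v.data();  // map the raw storage, not the std::vector object
    long host_sum = 0;

    // nowait makes the target region a deferred task, so the encountering
    // thread does not have to wait for the offloaded loop to finish.
    #pragma omp target teams distribute parallel for map(tofrom: x[0:n]) nowait
    for (int i = 0; i < n; ++i)
        x[i] = 2.0f * x[i];

    // Independent host work that may overlap with the offloaded loop.
    // It must not touch x until the target task has completed.
    for (int i = 0; i < n; ++i)
        host_sum += i;

    // Wait for the deferred target task before x is read on the host.
    #pragma omp taskwait
    return host_sum;
}
```

The taskwait is what makes it safe to read v after the call; without it, the host could observe x before the device results are copied back.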
The C++ code snippet below offloads a SAXPY computation to the accelerator.
#pragma omp target map(tofrom:fa), map(to:fb,a)
#pragma omp parallel for firstprivate(a)
for(k=0; k<FLOPS_ARRAY_SIZE; k++)
    fa[k] = a * fa[k] + fb[k];
Array fa is mapped both to and from the accelerator since fa is both input to and output from the calculation. Array fb and the variable a are required as input to the calculation and are not modified, so there is no need to copy them out. The variable FLOPS_ARRAY_SIZE is implicitly mapped to the accelerator. The loop index k is implicitly private according to the OpenMP specification.
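For reference, the SAXPY snippet above can be wrapped into a self-contained function roughly as follows. This is a sketch under stated assumptions: the function name saxpy_offload is hypothetical, and because the data is held in std::vector here (rather than in fixed-size arrays, as the snippet appears to assume), the map clauses use explicit array sections on the raw pointers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wrapper around the SAXPY loop: va = a * va + vb.
void saxpy_offload(float a, std::vector<float> &va,
                   const std::vector<float> &vb) {
    assert(va.size() == vb.size());
    const int n = static_cast<int>(va.size());
    float *fa = va.data();
    const float *fb = vb.data();

    // fa is both input and output, so it is mapped tofrom.
    // fb and a are read-only inputs, so they are mapped to the device only.
    // n is a scalar referenced in the region and is made available implicitly.
    #pragma omp target map(tofrom: fa[0:n]) map(to: fb[0:n], a)
    #pragma omp parallel for firstprivate(a)
    for (int k = 0; k < n; k++)
        fa[k] = a * fa[k] + fb[k];
}
```

Calling saxpy_offload(2.0f, x, y) with x = {1, 2, 3} and y = {10, 20, 30} leaves x = {12, 24, 36}. With the Intel compiler, offload is typically enabled with options such as -fiopenmp -fopenmp-targets=spir64; if OpenMP is not enabled, the pragmas are ignored and the loop runs serially on the host with the same result.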
This Fortran code snippet offloads a matrix multiply to the accelerator.
!$omp target map(to: a, b ) map(tofrom: c )
!$omp parallel do private(j,i,k)
do j = 1, n
   do i = 1, n
      do k = 1, n
         c(i,j) = c(i,j) + a(i,k) * b(k,j)
      end do
   end do
end do
!$omp end parallel do
!$omp end target
Arrays a and b are mapped to the accelerator, while array c is both input to and output from the accelerator. The variable n is implicitly mapped to the accelerator. The private clause is optional since loop indices are automatically private according to the OpenMP specification.
To optimize data sharing between the host and the accelerator, the target data directive maps variables to the accelerator, and the variables remain mapped on the device for the extent of the target data region. This avoids repeated transfers when the same variables are used across multiple target regions.
#pragma omp target data [clause[[,] clause],...]
!$omp target data [clause[[,] clause],...]
!$omp end target data
The clauses can be one or more of the following. See TARGET DATA for more information.
IF ([TARGET DATA:] scalar-logical-expression)
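To illustrate how a target data region keeps variables on the device across several target regions, here is a minimal C++ sketch. The function name add_then_scale and the two loops are hypothetical examples, not taken from the original samples.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: computes v[i] = (v[i] + 1) * 2 using two separate
// target regions that share one device copy of the data.
void add_then_scale(std::vector<float> &v) {
    const int n = static_cast<int>(v.size());
    assert(n > 0);
    float *x = v.data();

    // x is copied to the device once on entry to the target data region
    // and copied back once on exit.
    #pragma omp target data map(tofrom: x[0:n])
    {
        // First target region: x is already present on the device, so this
        // map clause does not trigger another transfer.
        #pragma omp target map(tofrom: x[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            x[i] += 1.0f;

        // Second target region: reuses the same device copy of x.
        #pragma omp target map(tofrom: x[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            x[i] *= 2.0f;
    }
}
```

Without the enclosing target data region, each target construct would map x on entry and unmap it on exit, which means two round trips instead of one.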
Matrix Multiplication is a simple program that multiplies two large matrices and verifies the results. This program is implemented in two ways: with SYCL* and with OpenMP.
The ISO3DFD OpenMP Offload sample demonstrates three-dimensional finite-difference wave propagation in isotropic media. ISO3DFD is a three-dimensional stencil that simulates a wave propagating in a 3D isotropic medium. The sample shows some common challenges and techniques for achieving good performance when targeting OpenMP Offload devices in more complex applications.
openmp_reduction is a simple program that calculates pi. This program is implemented using C++ and OpenMP for CPUs and accelerators based on Intel® Architecture.
LLVM/OpenMP Runtimes describes the distinct types of runtimes available and can be helpful when debugging OpenMP offload.
Offload and Optimize OpenMP* Applications with Intel Tools <https://www.intel.com/content/www/us/en/developer/tools/oneapi/training/offload-optimize-openmp-applications.html> describes how to use OpenMP* directives to add parallelism to your application.