C/C++ OpenMP* and SYCL* Composability
The oneAPI programming model provides a unified compiler based on
LLVM/Clang with support for OpenMP* offload. This integration lets you
use OpenMP constructs either to parallelize host-side applications or
to offload work to a target device. Both
the Intel® oneAPI DPC++/C++ Compiler, available with the Intel® oneAPI
Base Toolkit, and Intel® C++ Compiler Classic, available with the Intel®
oneAPI HPC Toolkit or the Intel® oneAPI IoT Toolkit, support OpenMP and
SYCL composability with a set of restrictions. A single application can
offload execution to available devices using OpenMP target regions or
SYCL constructs in different parts of the code, such as different
functions or code segments.
OpenMP and SYCL offloading constructs may be used in separate files, in
the same file, or in the same function with some restrictions. OpenMP
and SYCL offloading code can be bundled together in executable files,
in static libraries, in dynamic libraries, or in various combinations.
The SYCL runtime for DPC++ uses the TBB runtime when executing device code on the CPU;
hence, using both OpenMP and SYCL on a CPU can lead to
thread oversubscription. Performance analysis of workloads
executing on the system can help determine whether this is occurring.
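One possible way to reduce oversubscription is to cap the number of threads used by OpenMP host regions so that some hardware threads remain free for the SYCL CPU device. The following is a minimal sketch under that assumption, not a documented recommendation. The helper names openmpThreadBudget and cappedSum, and the default even split, are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: give OpenMP 1/syclShare of the hardware threads and
// leave the rest for the SYCL CPU device. The default 50/50 split is an
// assumption, not a tuned value. Always returns at least one thread.
unsigned openmpThreadBudget(unsigned hardwareThreads, unsigned syclShare = 2) {
  assert(syclShare > 0);
  return std::max(1u, hardwareThreads / syclShare);
}

// Sum a vector with an OpenMP host loop limited to `threads` threads.
// If OpenMP is not enabled at compile time, the pragma is ignored and the
// loop runs serially with the same result.
double cappedSum(const std::vector<double> &data, unsigned threads) {
  assert(threads > 0);
  double sum = 0.0;
  const long n = static_cast<long>(data.size());
#pragma omp parallel for num_threads(static_cast<int>(threads)) reduction(+ : sum)
  for (long i = 0; i < n; ++i)
    sum += data[i];
  return sum;
}
```

A caller could pass std::thread::hardware_concurrency() to openmpThreadBudget and use the result as the thread count for each OpenMP host region that overlaps with SYCL work on the CPU. Whether a cap helps at all should be confirmed by profiling the actual workload.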
Restrictions
There are some restrictions to be considered when mixing OpenMP and
SYCL constructs in the same application.
- OpenMP directives cannot be used inside SYCL kernels that run on the device. Similarly, SYCL code cannot be used inside OpenMP target regions. However, it is possible to use SYCL constructs within OpenMP code that runs on the host CPU.
- The OpenMP and SYCL device parts of the program cannot have cross dependencies. For example, a function defined in the SYCL part of the device code cannot be called from OpenMP code that runs on the device, and vice versa. The OpenMP and SYCL device parts are linked independently and form separate binaries that become part of the resulting fat binary generated by the compiler.
- Direct interaction between the OpenMP and SYCL runtime libraries is not supported at this time. For example, a device memory object created by the OpenMP API is not accessible from SYCL code. Using a device memory object created by OpenMP in SYCL code results in unspecified behavior.
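Because device memory cannot be shared between the two runtimes, one way to pass data from an OpenMP target region to SYCL code is through host memory. The sketch below shows only the OpenMP half of that pattern: the map(from:) clause copies the result back to the host array when the target region ends. The names fillSquares and consumeOnHost are hypothetical, and consumeOnHost is a plain host stand-in for the point where a SYCL buffer would be constructed over the same host pointer.

```cpp
#include <cassert>
#include <cstddef>

// Fill `out` inside an OpenMP target region. map(from:) copies the device
// results back to the host array at the end of the region, so no
// OpenMP-owned device memory object outlives this function.
void fillSquares(float *out, std::size_t n) {
  assert(out != nullptr);
#pragma omp target teams distribute parallel for map(from : out[0:n])
  for (std::size_t i = 0; i < n; ++i)
    out[i] = static_cast<float>(i) * static_cast<float>(i);
}

// Hypothetical stand-in for the SYCL stage: it reads only host memory.
// In a real application, a SYCL buffer would wrap `in` at this point.
float consumeOnHost(const float *in, std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    sum += in[i];
  return sum;
}
```

The trade-off of this approach is an extra device-to-host copy, which is the cost of keeping the two runtimes independent.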
Example
The following code snippet uses SYCL and OpenMP offloading
constructs in the same application.
#include <CL/sycl.hpp>
#include <array>
#include <iostream>

// Approximate Pi by midpoint integration of 4 / (1 + x^2) over [0, 1],
// offloaded with OpenMP.
float computePi(unsigned N) {
  // The reduction variable must start at zero and be mapped in both
  // directions so the device sees the initial value.
  float Pi = 0.0f;
#pragma omp target map(tofrom : Pi)
#pragma omp parallel for reduction(+ : Pi)
  for (unsigned I = 0; I < N; ++I) {
    float T = (I + 0.5f) / N;
    Pi += 4.0f / (1.0f + T * T);
  }
  return Pi / N;
}

// Fill A[0..N-1] with 0, 1, 2, ... using a SYCL kernel.
void iota(float *A, unsigned N) {
  cl::sycl::range<1> R(N);
  cl::sycl::buffer<float, 1> AB(A, R);
  cl::sycl::queue().submit([&](cl::sycl::handler &cgh) {
    auto AA = AB.template get_access<cl::sycl::access::mode::write>(cgh);
    cgh.parallel_for<class Iota>(R, [=](cl::sycl::id<1> I) {
      AA[I] = I;
    });
  });
  // The buffer destructor waits for the kernel and copies data back to A.
}

int main() {
  std::array<float, 1024u> Vec;
  float Pi;
  // Run the SYCL and OpenMP offload parts in separate host sections.
#pragma omp parallel sections
  {
#pragma omp section
    iota(Vec.data(), Vec.size());
#pragma omp section
    Pi = computePi(8192u);
  }
  std::cout << "Vec[512] = " << Vec[512] << std::endl;
  std::cout << "Pi = " << Pi << std::endl;
  return 0;
}
The following command is used to compile the example code:
icpx -fsycl -fiopenmp -fopenmp-targets=spir64 offloadOmp_dpcpp.cpp
where
- The -fsycl option enables SYCL
- The -fiopenmp -fopenmp-targets=spir64 options enable OpenMP* offload
The following shows the program output from the example code.
./a.out
Vec[512] = 512
Pi = 3.14159
If the code does not contain OpenMP offload, but only normal OpenMP
code, use the following command, which omits -fopenmp-targets:
icpx -fsycl -fiopenmp omp_dpcpp.cpp