Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 9/08/2022
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Explicit SIMD SYCL Extension

oneAPI provides an Explicit SIMD SYCL extension (ESIMD) for lower-level Intel GPU programming.

ESIMD provides APIs that are similar to Intel's GPU Instruction Set Architecture (ISA), but it enables you to write explicitly vectorized device code. This explicit enabling gives you more control over the generated code and allows you to depend less on compiler optimizations.

The specification, API reference, and working code examples are available on GitHub.

NOTE:
Some parts of this extension are under active development and the APIs in the sycl::ext::intel::experimental::esimd package are subject to change. The restrictions are specified below.

ESIMD kernels and functions always require a subgroup size of one, which means that the compiler does not provide vectorization across work items in a subgroup. Instead, you must explicitly express the vectorization in your code. Below is an example that adds the elements of two arrays and writes the results to the third:

float *A = malloc_shared<float>(Size, q);
float *B = malloc_shared<float>(Size, q);
float *C = malloc_shared<float>(Size, q);

for (unsigned i = 0; i != Size; i++) {
  A[i] = B[i] = i;
}

q.submit([&](handler &cgh) {
  cgh.parallel_for<class Test>(
    Size / VL, [=](id<1> i)[[intel::sycl_explicit_simd]] {
    auto offset = i * VL;
    // pointer arithmetic, so offset is in elements:
    simd<float, VL> va(A + offset);
    simd<float, VL> vb(B + offset);
    simd<float, VL> vc = va + vb;
    vc.copy_to(C + offset);
  });
}).wait_and_throw();

In the example above, the lambda function passed to the parallel_for is marked with a special attribute: [[intel::sycl_explicit_simd]]. This attribute tells the compiler that the kernel is ESIMD-based and ESIMD APIs can be used inside it. Here the simd objects and copy_to intrinsics are used. They are available only in the ESIMD extension.

Fully runnable code samples can be found on GitHub.

Compile and Run ESIMD Code

Code that uses the ESIMD extension can be compiled and run using the same commands as you would with standard SYCL:

To compile using the open-source oneAPI DPC++ Compiler:

clang++ -fsycl vadd_usm.cpp

To compile using an Intel® oneAPI Toolkit:

icpx -fsycl vadd_usm.cpp

To run on an Intel specific GPU device, through the oneAPI Level Zero (Level Zero) backend:

SYCL_DEVICE_FILTER=level_zero:gpu ./a.out

The resulting executable ($./a.out) can be run only on Intel GPU hardware, such as Intel® UHD Graphics 600 or later. The SYCL runtime automatically recognizes ESIMD kernels and dispatches their execution, so no additional setup is needed. Both Linux and Windows platforms are supported, including OpenCL™ and Level Zero backends.

ESIMD Emulator

The ESIMD emulator (ESIMD_EMULATOR) provides a feature to execute ESIMD kernels on the host CPU without having an Intel GPU device in the system. It provides you with a way to debug ESIMD code in any debugger. Since the emulator tries to model massively parallel GPU kernel execution on CPU hardware, some differences in your execution profile may happen. Take this into account when debugging. You can redirect execution to the ESIMD emulator by setting an environment variable, no program recompilation is needed. When running a kernel via the emulator, the SYCL runtime sees the emulator as a normal GPU device (example, an is_gpu() test will return true for it).

Due to the specifics of ESIMD programming model, a standard SYCL host device cannot execute ESIMD kernels and needs supporting libraries to emulate barriers and GPU execution threads. It is impractical for the host part of a SYCL ESIMD app to include or link to all the necessary infrastructure components when there is no ESIMD code, or if debugging is not wanted. It is inconvenient or not possible for you to recompile the app with a switch to execute the ESIMD part on a CPU. The environment variable plus a separate back-end solves both problems.

The ESIMD emulator includes these components:

  • The ESIMD emulator plugin, which is a SYCL runtime back-end similar to OpenCL™ or Level Zero.
  • Host implementations of low-level ESIMD intrinsics, for example __esimd_scatter_scaled.
  • The supporting infrastructure linked dynamically to the plugin, for example the libCM library.

ESIMD Emulator Requirements

The ESIMD emulator backend uses a CM_EMU library for emulating GPUs using software multi-threading. The library is provided as separate pre-installed library in host machine, or built as part of the open-source oneAPI DPC++ Compiler. The required version for CM_EMU is 1.0.20 or later. To add the CM_EMU library as part of oneAPI DPC++ Compiler for ESIMD emulator backend, build the library during ESIMD emulator plug-in software module generation. Details on building CM_EMU library for ESIMD emulator, including required packages are described in ESIMD CPU Emulation.

Command Line/Environment Variable Options

There are no special command line options or environment variables required for building and running ESIMD kernels with the ESIMD emulator backend.

Running ESIMD Code in Emulation Mode

The compilation step for ESIMD kernels that are prepared for an ESIMD emulator backend is same as for OpenCL and Level Zero backends. The fully runnable code sample, and other samples, used below can be found on Github.

To compile using the open-source oneAPI DPC++ Compiler, use:

clang++ -fsycl vadd_usm.cpp

To compile using the Intel® oneAPI Toolkits, use:

 icpx -fsycl vadd_usm.cpp

To run under emulation through the ESIMD emulator backend, use:

SYCL_DEVICE_FILTER=ext_intel_esimd_emulator:gpu ./a.out

Code Sample

# Get sources
git clone https://github.com/intel/llvm-test-suite
cd llvm-test-suite
mkdir build && cd build

# Configure for make utility with compiler tools available in $PATH
cmake \
 -DCMAKE_CXX_COMPILER=clang++ \
 -DTEST_SUITE_SUBDIRS=SYCL \
 -DSYCL_BE="ext_intel_esimd_emulator" \
 -DSYCL_TARGET_DEVICES="gpu" \
 ..

# Build and Run
make check

# Or, for Ninja utility
cmake -G Ninja \
 -DCMAKE_CXX_COMPILER=clang++ \
 -DTEST_SUITE_SUBDIRS=SYCL \
 -DSYCL_BE="ext_intel_esimd_emulator" \
 -DSYCL_TARGET_DEVICES="gpu" \
 ..

# Build and Run
ninja check

NOTE:
Only ESIMD kernels are tested with the code sample, due to the following limitations:
  • The emulator is only available on Linux.
  • The emulator has limitation on the number of threads used under Linux. As software multi-threading is used for emulating hardware threads, the number of threads being launched for kernel execution is limited by the max number of threads supported by a Linux host machine.
  • The emulator supports only ESIMD kernels. Kernels written for SYCL cannot run with the ESIMD_EMULATOR backend. Kernels containing both SYCL and ESIMD code cannot run with the ESIMD_EMULATOR, unlike GPU backends like OpenCL™ or Level Zero.
  • The emulator cannot run in parallel with a Host Device.

Restrictions

This section contains lists of the main restrictions that apply when using the ESIMD extension.

NOTE:
Some extensions are not enforced by the compiler, which may lead to undefined program behavior.
  • Features not supported with ESIMD:
  • Unsupported standard SYCL APIs:
    • Local accessors. Local memory is allocated and accessed via explicit device-side APIs.
    • 2D and 3D accessors.
    • Constant accessors.
    • sycl::accessor::get_pointer(). All memory accesses through an accessor are done via explicit APIs. Example: sycl::ext::intel::esimd::block_store(acc, offset)
    • Accessors with offsets and/or access range specified.
    • sycl::sampler and sycl::stream classes.
  • Other restrictions:
    • Only Intel GPU devices are supported.
    • Interoperability between regular SYCL and ESIMD kernels is not yet supported. It is not possible to invoke an ESIMD kernel from SYCL kernel and vice versa.