Developer Guide

Intel oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs

ID 785441
Date 5/08/2024
Public

Specify Number of SIMD Work Items

You can increase the data-processing efficiency of a SYCL kernel by executing multiple work-items in a single instruction, multiple data (SIMD) manner, without manually vectorizing your kernel code.

Specify the number of work-items within a work-group that the Intel® oneAPI DPC++/C++ Compiler should execute in a SIMD or vectorized manner.

Deprecation Notice:

The [[cl::reqd_work_group_size(Z, Y, X)]] attribute is deprecated. Use the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute.

To specify the number of SIMD work-items in a work-group, insert the [[intel::num_simd_work_items(N)]] attribute in the kernel source code. The supported values of N are 2, 4, 8, and 16. Other values are accepted but ignored (no vectorization occurs).
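Because an unsupported value is accepted without vectorizing the kernel, a compile-time guard in host code can catch the mistake early. The following is a minimal sketch in plain C++, not part of the compiler's API: the helper is_supported_simd_width and the constant kNumSimdWorkItems are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>

// Hypothetical helper (not provided by oneAPI): returns true only for the
// SIMD widths this section lists as supported: 2, 4, 8, and 16.
constexpr bool is_supported_simd_width(std::size_t n) {
  return n == 2 || n == 4 || n == 8 || n == 16;
}

// Hypothetical constant that you would pass as N to
// [[intel::num_simd_work_items(N)]].
constexpr std::size_t kNumSimdWorkItems = 4;

// Stops the build if the width is one that this section says is ignored.
static_assert(is_supported_simd_width(kNumSimdWorkItems),
              "num_simd_work_items must be 2, 4, 8, or 16");
```

With this guard in place, changing kNumSimdWorkItems to a value such as 3 or 32 stops the build at the static_assert instead of producing a kernel that is not vectorized.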

Consider the following example:

cgh.parallel_for<class kernelComputeSIMD>(
  // 1D launch: N work-items in total, REQD_WORK_GROUP_SIZE per work-group
  nd_range<1>(range<1>(N), range<1>(REQD_WORK_GROUP_SIZE)),
  // The nd_item dimensionality (1) must match the nd_range dimensionality
  [=] (nd_item<1> it)
    [[intel::num_simd_work_items(NUM_SIMD_WORK_ITEMS),
    sycl::reqd_work_group_size(1, 1, REQD_WORK_GROUP_SIZE)]] {
      auto gid = it.get_global_id(0);
      accessorRes[gid] = sycl::sqrt(accessorIdx[gid]);
    });
NOTE:

Use the [[intel::num_simd_work_items(N)]] attribute in conjunction with the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute. The value N that you specify must evenly divide the last argument (X) that you pass to the reqd_work_group_size attribute.

For additional information about the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute, refer to Specify a Workgroup Size.
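The divisibility requirement in the note can also be checked at compile time. The following plain C++ sketch assumes that both values are available as compile-time constants. The helper simd_divides_work_group and the constants kReqdWorkGroupSize and kSimdWidth are hypothetical names, not part of the compiler's API.

```cpp
#include <cstddef>

// Hypothetical helper (not provided by oneAPI): true when the SIMD width
// evenly divides the last (X) argument of reqd_work_group_size.
constexpr bool simd_divides_work_group(std::size_t simd_width,
                                       std::size_t wg_size_x) {
  return simd_width != 0 && wg_size_x % simd_width == 0;
}

// Hypothetical values that you would pass to the two kernel attributes.
constexpr std::size_t kReqdWorkGroupSize = 64;  // X in reqd_work_group_size
constexpr std::size_t kSimdWidth = 8;           // N in num_simd_work_items

// Stops the build if N does not evenly divide X.
static_assert(simd_divides_work_group(kSimdWidth, kReqdWorkGroupSize),
              "num_simd_work_items must evenly divide the work-group size");
```

For example, a SIMD width of 8 with a work-group size of 64 passes the check, while a width of 16 with a work-group size of 24 does not, because 24 is not a multiple of 16.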