ID 683846
Date 6/21/2022
Public

5.2.13. Specifying Number of SIMD Work-Items

You can increase the data-processing efficiency of an OpenCL™ kernel by executing multiple work-items in a single instruction, multiple data (SIMD) manner without manually vectorizing your kernel code. To do so, specify the number of work-items within a work-group that the Intel® FPGA SDK for OpenCL™ Offline Compiler should execute in a SIMD (vectorized) manner.
Important: Use the num_simd_work_items(N) attribute together with the reqd_work_group_size(X, Y, Z) attribute. The value N that you specify must evenly divide X, the first argument of the reqd_work_group_size attribute.
To specify the number of SIMD work-items in a work-group, insert the num_simd_work_items(N) attribute in the kernel source code.
For example, the code fragment below assigns a fixed work-group size of 64 work-items to a kernel. It then consolidates the work-items within each work-group into four SIMD vector lanes:
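__attribute__((num_simd_work_items(4)))
__attribute__((reqd_work_group_size(64,1,1)))
__kernel void test(__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
    // Each work-item adds one pair of elements.
    size_t gid = get_global_id(0);
    answer[gid] = a[gid] + b[gid];
}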
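The divisibility requirement and the arithmetic behind the example can be checked with a small piece of plain C. This is only an illustrative sketch: the helper names simd_width_divides and simd_groups_per_work_group are hypothetical and are not part of the SDK, and the code models the arithmetic only, not how the offline compiler actually schedules work-items in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an SDK function): returns true when the SIMD
 * width n evenly divides x, the first argument of reqd_work_group_size. */
static bool simd_width_divides(size_t n, size_t x)
{
    return n != 0 && x % n == 0;
}

/* Hypothetical helper: number of n-wide groups of work-items needed to
 * cover the x work-items of one work-group. Asserts that n divides x. */
static size_t simd_groups_per_work_group(size_t n, size_t x)
{
    assert(simd_width_divides(n, x));
    return x / n;
}
```

With the values from the example, simd_width_divides(4, 64) holds and simd_groups_per_work_group(4, 64) evaluates to 16, so each 64-item work-group corresponds to 16 groups of four work-items. A width of 3 would fail the check because 3 does not evenly divide 64.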