Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/19/2022
Public

5.2.13. Specifying Number of SIMD Work-Items

You can increase the data-processing efficiency of an OpenCL™ kernel by executing multiple work-items in a single instruction multiple data (SIMD) manner, without manually vectorizing your kernel code. To do so, specify the number of work-items within a work-group that the Intel® FPGA SDK for OpenCL™ Offline Compiler should execute in a SIMD, or vectorized, manner.
Important: Use the num_simd_work_items(N) attribute together with the reqd_work_group_size(X, Y, Z) attribute. The value N that you specify for num_simd_work_items must evenly divide X, the first argument that you specify to the reqd_work_group_size attribute.
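The divisibility rule can be expressed as a one-line check, which may be handy in host-side tooling that generates kernel attributes. The following is a minimal sketch. The function name simd_divides_work_group is hypothetical and not part of the SDK. It models only the rule stated above, not every check the offline compiler performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an SDK API): returns true when a
 * num_simd_work_items value n evenly divides x, the first argument of
 * reqd_work_group_size(x, y, z). Models only the divisibility rule;
 * the offline compiler may impose further restrictions on n. */
static bool simd_divides_work_group(size_t n, size_t x)
{
    /* n == 0 is rejected explicitly to avoid a division by zero. */
    return n > 0 && x % n == 0;
}
```

For the kernel in this section, simd_divides_work_group(4, 64) returns true. A combination such as num_simd_work_items(8) with reqd_work_group_size(100,1,1) fails the check because 100 is not a multiple of 8.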
To specify the number of SIMD work-items in a work-group, insert the num_simd_work_items(N) attribute in the kernel source code.
For example, the code fragment below assigns a fixed work-group size of 64 work-items to a kernel. It then consolidates the work-items within each work-group into four SIMD vector lanes:
__attribute__((num_simd_work_items(4)))       // execute 4 work-items per SIMD step
__attribute__((reqd_work_group_size(64,1,1))) // fixed work-group size; 64 is a multiple of 4
__kernel void test(__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
   // Each work-item adds one pair of elements.
   size_t gid = get_global_id(0);
   answer[gid] = a[gid] + b[gid];
}
The offline compiler vectorizes the kernel datapath according to the value you specify for num_simd_work_items whenever possible.
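To make the effect of these two attributes concrete, the sketch below is a sequential C model of one 64-work-item work-group of the test kernel, processed in 64 / 4 = 16 steps of 4 lanes each. It is a conceptual illustration only, not how the offline compiler builds hardware: on the FPGA, the lanes become a widened datapath rather than an inner loop. The function name run_work_group and the grouping of consecutive local IDs into one step are assumptions made for illustration. The actual lane assignment is up to the compiler.

```c
#include <stddef.h>

#define WG_SIZE    64 /* mirrors reqd_work_group_size(64,1,1) */
#define SIMD_LANES 4  /* mirrors num_simd_work_items(4) */

/* The divisibility requirement, enforced at compile time (C11). */
_Static_assert(WG_SIZE % SIMD_LANES == 0,
               "SIMD_LANES must evenly divide WG_SIZE");

/* Hypothetical model of one work-group of the test kernel. Each outer
 * iteration stands in for one vectorized step; the inner loop stands
 * in for the SIMD_LANES parallel lanes of that step. */
static void run_work_group(size_t group_id,
                           const float *restrict a,
                           const float *restrict b,
                           float *restrict answer)
{
    for (size_t step = 0; step < WG_SIZE / SIMD_LANES; ++step) {
        for (size_t lane = 0; lane < SIMD_LANES; ++lane) {
            /* Assumption: consecutive local IDs share a step. */
            size_t local_id = step * SIMD_LANES + lane;
            /* Equivalent of get_global_id(0) in a 1-D NDRange. */
            size_t gid = group_id * WG_SIZE + local_id;
            answer[gid] = a[gid] + b[gid];
        }
    }
}
```

Calling run_work_group(1, a, b, answer) writes only elements 64 through 127, which is the range covered by the second work-group. Changing SIMD_LANES to a value that does not divide 64, such as 6, makes the _Static_assert fail at compile time. This mirrors the constraint stated in the Important note.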