Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 3/28/2022
Public



5.2.12. Specifying Number of Compute Units

To increase the data-processing efficiency of an OpenCL™ kernel, you can instruct the Intel® FPGA SDK for OpenCL™ Offline Compiler to generate multiple kernel compute units. Each compute unit is capable of executing multiple work-groups simultaneously.
CAUTION: Increasing the number of kernel compute units raises data throughput, but at the cost of higher FPGA resource consumption and more contention for global memory bandwidth among the compute units.
To specify the number of compute units for a kernel, insert the num_compute_units(N) attribute in the kernel source code.
For example, the following code fragment directs the offline compiler to instantiate two compute units for the kernel test:
__attribute__((num_compute_units(2)))
__kernel void test(__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
   size_t gid = get_global_id(0);
   answer[gid] = a[gid] + b[gid];
}
At run time, the hardware that the offline compiler generates distributes work-groups dynamically across the specified number of compute units.
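To give a feel for what dynamic distribution means, the following is a minimal host-side sketch in plain C, not the SDK's actual scheduler. It assumes a simplified model in which each compute unit runs one work-group at a time and the next work-group always goes to whichever compute unit becomes free first. The function name dispatch_work_groups, the MAX_COMPUTE_UNITS cap, and the per-work-group cost values are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMPUTE_UNITS 16 /* hypothetical cap, used by this model only */

/*
 * Toy model of dynamic work-group distribution (an assumption, not the
 * real hardware behavior): each work-group, taken in order, is handed to
 * the compute unit that becomes free earliest. Ties go to the lowest index.
 *
 * cost[g]     : hypothetical run time of work-group g, in arbitrary ticks
 * assigned[g] : output, index of the compute unit that ran work-group g
 * Returns the tick at which the last work-group finishes.
 */
long dispatch_work_groups(const int *cost, size_t num_groups,
                          int num_compute_units, int *assigned)
{
    long free_at[MAX_COMPUTE_UNITS] = {0}; /* tick at which each unit is idle */
    long finish = 0;

    assert(num_compute_units > 0 && num_compute_units <= MAX_COMPUTE_UNITS);

    for (size_t g = 0; g < num_groups; ++g) {
        int best = 0;
        /* Find the compute unit that is free soonest. */
        for (int cu = 1; cu < num_compute_units; ++cu) {
            if (free_at[cu] < free_at[best]) {
                best = cu;
            }
        }
        assigned[g] = best;
        free_at[best] += cost[g];
        if (free_at[best] > finish) {
            finish = free_at[best];
        }
    }
    return finish;
}
```

In this model, five work-groups with costs {4, 1, 1, 1, 1} finish at tick 8 on one compute unit and at tick 4 on two, because the second unit absorbs the four short work-groups while the first is busy with the long one. Real compute units can hold several work-groups in flight and share global memory bandwidth, so actual speedups depend on the kernel and the board.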
Note: To identify the specific compute unit on which a work-item is executing, call the get_compute_id() intrinsic function. Refer to Customization of Replicated Kernels Using the get_compute_id() Function for more information.