Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

ID 683176
Date 9/24/2018

2.8.1. Kernels

The Intel® FPGA SDK for OpenCL™ Offline Compiler compiles a kernel that does not use any built-in work-item functions, such as get_global_id() and get_local_id(), as a single work-item kernel. Otherwise, the offline compiler compiles the kernel as an NDRange kernel.
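As a minimal sketch with hypothetical kernel names, the two kernels below compute the same vector sum. By the rule above, `vadd_ndrange` would be compiled as an NDRange kernel because it calls `get_global_id()`. `vadd_single` would be compiled as a single work-item kernel because it uses no work-item functions. The `#ifndef __OPENCL_VERSION__` shims are an assumption added only so the file also builds as plain C on a host. There, the hypothetical `emulate_ndrange_vadd` stands in for an NDRange launch. None of the shims are part of the SDK.

```c
/* Plain-C shims (hypothetical, for host illustration only). An OpenCL
   compiler defines __OPENCL_VERSION__ and supplies the real __kernel,
   __global and get_global_id(), so this block is skipped there. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t emulated_gid; /* work-item index set by the host loop below */
static size_t get_global_id(unsigned dim) { (void)dim; return emulated_gid; }
#endif

/* Calls get_global_id(), so the offline compiler treats it as an
   NDRange kernel: each work-item adds one element. */
__kernel void vadd_ndrange(__global const float *a, __global const float *b,
                           __global float *c) {
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}

/* Uses no work-item functions, so it is compiled as a single work-item
   kernel: one invocation walks the whole array in a loop that the
   compiler attempts to pipeline. */
__kernel void vadd_single(__global const float *a, __global const float *b,
                          __global float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

#ifndef __OPENCL_VERSION__
/* Host-only helper (hypothetical): mimics an NDRange launch of n
   work-items by running the kernel body once per index, sequentially. */
static void emulate_ndrange_vadd(const float *a, const float *b, float *c,
                                 size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    for (emulated_gid = 0; emulated_gid < n; emulated_gid++)
        vadd_ndrange(a, b, c);
}
#endif
```

In the NDRange version the iteration space comes from the launch size. In the single work-item version it is an explicit loop, and that loop is what the compiler then tries to pipeline.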

For more information on built-in work-item functions, refer to section 6.11.1: Work-Item Functions of the OpenCL Specification version 1.0.

For single work-item kernels, the offline compiler attempts to pipeline every loop in the kernel to allow multiple loop iterations to execute concurrently. Kernel performance might degrade if the compiler cannot pipeline some of the loops effectively, or if it cannot pipeline the loops at all.
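The benefit of pipelining can be estimated with a simple textbook model. This model is assumed here for illustration and is not taken from the offline compiler. Each iteration has a fixed `latency` in cycles, and a pipelined loop launches a new iteration every `ii` cycles (the initiation interval). The function names are hypothetical.

```c
#include <assert.h>

/* Simplified cost model (assumption, not the compiler's scheduler). */

/* Without pipelining, iterations run back to back. */
static unsigned long sequential_cycles(unsigned long trips,
                                       unsigned long latency) {
    return trips * latency;
}

/* With pipelining, iterations overlap: the last one starts after
   (trips - 1) * ii cycles and then needs `latency` more cycles. */
static unsigned long pipelined_cycles(unsigned long trips,
                                      unsigned long latency,
                                      unsigned long ii) {
    assert(ii >= 1);
    if (trips == 0)
        return 0;
    return latency + (trips - 1) * ii;
}
```

Take 1000 iterations with a 20-cycle latency. They need 20000 cycles sequentially but 1019 cycles when pipelined with `ii = 1`. If the loop can only be pipelined with `ii = 4`, the model gives 4016 cycles. This shows how ineffective pipelining erodes the gain.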

The offline compiler cannot pipeline loops in NDRange kernels. However, these loops can accept multiple work-items simultaneously. A kernel might have multiple loops, each with nested loops. If you tabulate, for each outer loop, the total number of iterations including its nested loops, kernel throughput is usually reduced by a factor equal to the largest total in that tabulation.
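The tabulation described above can be sketched in C. The sketch assumes that each outer loop heads a single chain of perfectly nested loops with fixed trip counts, so its total is the product of those trip counts. The `loop_nest` type, `MAX_DEPTH`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 4 /* hypothetical limit on nesting depth for this sketch */

/* One outer loop of an NDRange kernel, described by its own trip count
   and those of the loops nested inside it (outermost first). */
struct loop_nest {
    size_t depth;
    unsigned long trips[MAX_DEPTH];
};

/* Total iterations executed by one outer loop, including nested loops. */
static unsigned long nest_total(const struct loop_nest *n) {
    assert(n->depth >= 1 && n->depth <= MAX_DEPTH);
    unsigned long total = 1;
    for (size_t d = 0; d < n->depth; d++)
        total *= n->trips[d];
    return total;
}

/* Largest tabulated total: the approximate factor by which kernel
   throughput is reduced, per the rule of thumb in the text. */
static unsigned long throughput_reduction(const struct loop_nest *nests,
                                          size_t count) {
    unsigned long worst = 1;
    for (size_t i = 0; i < count; i++) {
        unsigned long t = nest_total(&nests[i]);
        if (t > worst)
            worst = t;
    }
    return worst;
}
```

Consider a kernel with three outer loops of shapes 4 × 8, 16, and 2 × 3 × 5. Their totals are 32, 16, and 30, so by this rule of thumb throughput would be reduced by roughly a factor of 32.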

To execute efficiently, an NDRange kernel usually needs a large number of threads (work-items).
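A simplified model helps show why many work-items are needed. The model is an assumption made here for illustration and is not the compiler's actual behavior. Treat the kernel datapath as a pipeline `depth` stages deep that accepts one new work-item per cycle. Filling and draining the pipeline costs `depth - 1` cycles, and only a large number of work-items amortizes that overhead. The function name is hypothetical.

```c
#include <assert.h>

/* Fraction of cycles doing useful work when `work_items` flow through a
   pipeline `depth` stages deep at one work-item per cycle (assumed model):
   work_items / (work_items + depth - 1). */
static double pipeline_efficiency(unsigned long work_items,
                                  unsigned long depth) {
    assert(depth >= 1);
    if (work_items == 0)
        return 0.0;
    return (double)work_items / (double)(work_items + depth - 1);
}
```

With a 101-stage pipeline, 100 work-items keep it busy only half the time. One million work-items bring the efficiency above 99.9%.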