Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342
Date 4/22/2019
Public

12.1.1.1. Overview: Intel FPGA SDK for OpenCL Pipeline Approach

The following figure depicts the architecture of an Intel FPGA SDK for OpenCL pipeline:
Figure 16. Parallel Execution Model of Intel FPGA SDK for OpenCL Pipeline Stages

The operations on the right represent the SDK's pipeline implementation of the OpenCL kernel code on the left. Each yellow box is an operation or data value found in the pipeline. The number associated with each operation represents the number of threads in the pipeline.

Assume each level of operations is one stage in the pipeline. At each stage, the pipeline generated by the Intel® FPGA SDK for OpenCL™ Offline Compiler executes all of that stage's operations in parallel for the thread currently at that stage. For example, thread 2 executes Load A and Load B and copies the current global ID (gid) to the next pipeline stage, all at the same time. As with pipelined instruction execution in reduced instruction set computing (RISC) processors, the SDK's pipeline stages also execute in parallel with one another, each working on a different thread. Threads advance to the next pipeline stage only after all stages have completed execution.
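The lock-step behavior described above can be illustrated with a small behavioral model. The following Python sketch is not SDK code. It assumes the kernel in the figure is a vector addition of the form c[gid] = a[gid] + b[gid] and that the pipeline has three stages (load, add, store); the function name simulate and the stage layout are hypothetical and chosen only for illustration.

```python
# Behavioral sketch of a three-stage pipeline (assumed layout, not SDK output):
#   stage 0: Load A, Load B, pass gid   stage 1: Add   stage 2: Store C

def simulate(a, b):
    """Return (c, trace). trace[cycle] lists the thread ID in each stage (None = empty)."""
    n = len(a)
    c = [None] * n
    stage = [None, None, None]  # data held by each pipeline stage
    trace = []
    issued = 0
    # Keep cycling while a thread is waiting to enter or has not yet reached the last stage.
    while issued < n or stage[0] is not None or stage[1] is not None:
        # All stages advance together: each stage receives its predecessor's result.
        incoming = {"gid": issued} if issued < n else None
        if incoming is not None:
            issued += 1
        stage = [incoming, stage[0], stage[1]]

        # Within the cycle, every occupied stage performs its own operations.
        if stage[0] is not None:  # Load A, Load B, forward gid
            g = stage[0]["gid"]
            stage[0] = {"gid": g, "a": a[g], "b": b[g]}
        if stage[1] is not None:  # Add
            s = stage[1]
            stage[1] = {"gid": s["gid"], "sum": s["a"] + s["b"]}
        if stage[2] is not None:  # Store C
            s = stage[2]
            c[s["gid"]] = s["sum"]

        trace.append(tuple(None if s is None else s["gid"] for s in stage))
    return c, trace


c, trace = simulate([1, 2, 3], [10, 20, 30])
for cycle, occupancy in enumerate(trace):
    print(cycle, occupancy)
```

In this model, three threads pass through the three stages in five cycles. In the third cycle every stage is occupied by a different thread, which is the steady state that the figure depicts.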

Some operations are capable of stalling the Intel FPGA SDK for OpenCL pipeline. Examples include variable-latency operations such as memory loads and stores. To support stalls, ready and valid signals need to propagate throughout the pipeline so that the offline compiler can schedule the pipeline stages. However, ready signals are not necessary if all operations have fixed latency. In these cases, the offline compiler optimizes the pipeline to statically schedule the operations, which significantly reduces the logic necessary for pipeline implementation.
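The effect of a variable-latency operation on the pipeline can be sketched with a simple ready/valid model. The following Python example is a simplified illustration and does not represent the hardware that the offline compiler generates. The function run_pipeline and its latencies argument are hypothetical: latencies[s][t] is assumed to be the number of cycles (at least 1) that thread t occupies stage s. A stage is treated as valid when its result is complete, and the downstream stage is treated as ready when it is empty.

```python
# Simplified ready/valid model (illustrative assumption, not SDK-generated logic).

def run_pipeline(latencies):
    """Return (cycles, stalls) needed to drain all threads through the pipeline."""
    depth = len(latencies)
    n = len(latencies[0])
    slots = [None] * depth  # slots[s] = [thread_id, cycles_left] or None
    issued = done = cycles = stalls = 0
    while done < n:
        cycles += 1
        # A new thread enters only when the first stage is free.
        if slots[0] is None and issued < n:
            slots[0] = [issued, latencies[0][issued]]
            issued += 1
        # Visit the last stage first so that a slot freed downstream
        # can accept data from upstream within the same cycle.
        for s in reversed(range(depth)):
            if slots[s] is None:
                continue
            tid, left = slots[s]
            left = max(left - 1, 0)
            slots[s] = [tid, left]
            if left > 0:
                continue              # result not produced yet: valid is low
            if s == depth - 1:
                slots[s] = None       # last stage retires the thread
                done += 1
            elif slots[s + 1] is None:
                slots[s + 1] = [tid, latencies[s + 1][tid]]  # downstream is ready
                slots[s] = None
            else:
                stalls += 1           # valid data, downstream not ready: stall
    return cycles, stalls


fixed = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]      # every operation takes one cycle
variable = [[1, 1, 1], [1, 1, 1], [1, 3, 1]]   # thread 1 has a slow store
print(run_pipeline(fixed))
print(run_pipeline(variable))
```

With fixed latencies, the model never records a stall and the cycle count is always the number of threads plus the pipeline depth minus one, so the schedule is known before execution begins. With the slow store, the upstream stage holds valid data for two cycles until the store stage becomes ready again, which is the situation that ready signals exist to handle.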