4.1.1. Characteristics of Channels and Pipes
The default behavior of channels is blocking. The default behavior of pipes is nonblocking.
Concurrent Execution of Multiple OpenCL Kernels
You can execute multiple OpenCL kernels concurrently. To enable concurrent execution, modify the host code to instantiate multiple command queues. Each concurrently executing kernel is associated with a separate command queue.
The OpenCL pipe modifications outlined in Ensuring Compatibility with Other OpenCL SDKs in the Intel® FPGA SDK for OpenCL™ Programming Guide allow you to run your kernel on the SDK. However, they do not maximize the kernel throughput. The OpenCL Specification version 2.0 requires that pipe writes occur before pipe reads so that the kernel is not reading from an empty pipe. As a result, the kernels cannot execute concurrently. Because the Intel® FPGA SDK for OpenCL™ supports concurrent execution, you can modify your host application and kernel program to take advantage of this capability. The modifications increase the throughput of your application; however, you can no longer port your kernel to another SDK. Despite this limitation, the modifications are minimal, and it does not require much effort to maintain both types of code.
To enable concurrent execution of kernels containing pipes, replace the depth attribute in your kernel code with the blocking attribute (that is, __attribute__((blocking))). The blocking attribute introduces a blocking behavior in the read_pipe and write_pipe function calls. The call site blocks kernel execution until the other end of the pipe is ready.
If you add both the blocking attribute and the depth attribute to your kernel, the read_pipe calls only a block when the pipe is empty, and the write_pipe calls only a block when the pipe is full. Blocking behavior causes an implicit synchronization between the kernels, which forces the kernels to run in lock step with each other.
Implicit Kernel Synchronization
Synchronize the kernels implicitly via blocking channel calls or blocking pipe calls. Consider the following examples:
|Kernels with Blocking Channel Call||Kernels with Blocking Pipe Call|
You can synchronize the kernels such that a producer kernel writes data and a consumer kernel reads the data during each loop iteration. If the write_channel_intel or write_pipe call in producer does not write any data, consumer blocks and waits at the read_channel_intel or read_pipe call until producer sends valid data, and vice versa.
Data Persistence Across Invocations
After the write_channel_intel call writes data to a channel or the write_pipe call writes data to a pipe, the data is persistent across work-groups and NDRange invocations. Data that a work-item writes to a channel or a pipe remains in that channel or pipe until another work-item reads from it. In addition, the order of data in a channel or a pipe is equivalent to the sequence of write operations to that channel or pipe, and the order is independent of the work-item that performs the write operation.
For example, if multiple work-items try to access a channel or a pipe simultaneously, only one work-item can access it. The write_channel_intel call or write_pipe call writes the particular work-item data, called DATAX, to the channel or pipe, respectively. Similarly, the first work-item to access the channel or pipe reads DATAX from it. This sequential order of read and write operations makes channels and pipes an effective way to share data between kernels.
Imposed Work-Item Order
The SDK imposes a work-item order to maintain the consistency of the read and write operations for a channel or a pipe.