Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/19/2022
Public
Document Table of Contents

4.1.1. Characteristics of Channels and Pipes

To implement channels or pipes in your OpenCL™ kernel program, keep in mind their respective characteristics that are specific to the Intel® FPGA SDK for OpenCL™ .

Default Behavior

The default behavior of channels is blocking. The default behavior of pipes is nonblocking.

Concurrent Execution of Multiple OpenCL Kernels

You can execute multiple OpenCL kernels concurrently. To enable concurrent execution, modify the host code to instantiate multiple command queues. Each concurrently executing kernel is associated with a separate command queue.

Important:

Pipe-specific considerations:

The OpenCL pipe modifications outlined in Ensuring Compatibility with Other OpenCL SDKs in the Intel® FPGA SDK for OpenCL™ Programming Guide allow you to run your kernel on the SDK. However, they do not maximize the kernel throughput. The OpenCL Specification version 2.0 requires that pipe writes occur before pipe reads so that the kernel is not reading from an empty pipe. As a result, the kernels cannot execute concurrently. Because the Intel® FPGA SDK for OpenCL™ supports concurrent execution, you can modify your host application and kernel program to take advantage of this capability. The modifications increase the throughput of your application; however, you can no longer port your kernel to another SDK. Despite this limitation, the modifications are minimal, and it does not require much effort to maintain both types of code.

To enable concurrent execution of kernels containing pipes, replace the depth attribute in your kernel code with the blocking attribute (that is, __attribute__((blocking))). The blocking attribute introduces a blocking behavior in the read_pipe and write_pipe function calls. The call site blocks kernel execution until the other end of the pipe is ready.

If you add both the blocking attribute and the depth attribute to your kernel, the read_pipe calls only a block when the pipe is empty, and the write_pipe calls only a block when the pipe is full. Blocking behavior causes an implicit synchronization between the kernels, which forces the kernels to run in lock step with each other.

Implicit Kernel Synchronization

Synchronize the kernels implicitly via blocking channel calls or blocking pipe calls. Consider the following examples:

Table 14.  Blocking Channel and Pipe Calls for Kernel Synchronization
Kernels with Blocking Channel Call Kernels with Blocking Pipe Call
channel int c0; 

__kernel
void producer (__global int * in_buf) 
{ 
  for (int i = 0; i < 10; i++) 
  {   
    write_channel_intel (c0, in_buf[i]);
  } 
}

__kernel
void consumer (__global int * ret_buf) 
{                 
  for (int i = 0; i < 10; i++) 
  { 
    ret_buf[i] = read_channel_intel(c0);
  }
} 
__kernel
void producer (__global int * in_buf,
  write_only pipe int __attribute__
  ((blocking)) c0) 
{
  for (int i = 0; i < 10; i++) 
{
    write_pipe (c0, &in_buf[i]);
  }
}

__kernel
void consumer (__global int * ret_buf,
  read_only pipe int __attribute__
  ((blocking)) c0) 
{
  for (int i = 0; i < 10; i++) 
  {
    int x;
    read_pipe (c0, &x);
    ret_buf[i] = x;
  }
}

You can synchronize the kernels such that a producer kernel writes data and a consumer kernel reads the data during each loop iteration. If the write_channel_intel or write_pipe call in producer does not write any data, consumer blocks and waits at the read_channel_intel or read_pipe call until producer sends valid data, and vice versa.

Data Persistence Across Invocations

After the write_channel_intel call writes data to a channel or the write_pipe call writes data to a pipe, the data is persistent across work-groups and NDRange invocations. Data that a work-item writes to a channel or a pipe remains in that channel or pipe until another work-item reads from it. In addition, the order of data in a channel or a pipe is equivalent to the sequence of write operations to that channel or pipe, and the order is independent of the work-item that performs the write operation.

For example, if multiple work-items try to access a channel or a pipe simultaneously, only one work-item can access it. The write_channel_intel call or write_pipe call writes the particular work-item data, called DATAX, to the channel or pipe, respectively. Similarly, the first work-item to access the channel or pipe reads DATAX from it. This sequential order of read and write operations makes channels and pipes an effective way to share data between kernels.

Imposed Work-Item Order

The SDK imposes a work-item order to maintain the consistency of the read and write operations for a channel or a pipe.