Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 6/21/2022
Public

5.4.3. Multiple Work-Item Ordering for Channels

The OpenCL™ specification does not define a work-item ordering. The Intel® FPGA SDK for OpenCL™ enforces a work-item order to make it easier to reason about channel read and write operations.

Multiple work-item accesses to a channel can be useful in some scenarios. For example, they are useful when data words in the channel are independent, or when the channel is implemented for control logic. The main concern regarding multiple work-item accesses to a channel is the order in which the kernel writes data to and reads data from the channel. If possible, the SDK's channels extension processes work-item read and write operations to the channel in a deterministic order. As such, the read and write operations remain consistent across kernel invocations.

Requirements for Deterministic Multiple Work-Item Ordering

To guarantee deterministic ordering, the SDK checks that a channel access is work-item invariant based on the following characteristics:

  • All paths through the kernel must execute the channel access.
  • If the first requirement is not satisfied, none of the branch conditions that lead to the channel call may depend on the work-item.
  • The kernel is not inferred as a single work-item kernel.
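As an illustration of the requirements above, the following is a minimal sketch of a kernel in which every work-item reaches the same channel write. The channel name req and the kernel name invariant_write are hypothetical, chosen only for this example. Because the kernel calls get_global_id(), the compiler is expected to treat it as an NDRange kernel rather than infer a single work-item kernel.

```c
#pragma OPENCL EXTENSION cl_intel_channels : enable

// Hypothetical channel, declared only for this sketch.
channel int req;

__kernel void invariant_write(__global const int * restrict data) {
  int gid = get_global_id(0);
  int value = data[gid];

  // Work-item-dependent branching only affects the value that is sent.
  if (value < 0) {
    value = 0;
  }

  // The channel call sits outside any branch, so all paths through the
  // kernel execute it exactly once per work-item.
  write_channel_intel(req, value);
}
```

The design choice here is to keep data-dependent decisions in the computation of the value and leave the channel call itself unconditional.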

If the SDK cannot guarantee deterministic ordering of multiple work-item accesses to a channel, it warns you that the channels might not have well-defined ordering and therefore might exhibit nondeterministic execution. Primarily, the SDK cannot provide deterministic ordering when work-item-variant code determines whether a loop containing channel calls executes, as illustrated below:

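// req (a channel), N (the loop bound), and process() are assumed to be
// declared elsewhere in the source file.
__kernel void ordering (__global int * restrict check,
                        __global int * restrict data) {
  // condition is loaded per work-item, so the branch below is work-item variant.
  int condition = check[get_global_id(0)];

  if (condition) {
    for (int i = 0; i < N; i++) {
      process(data);
      // Only work-items that take this branch write to the channel.
      write_channel_intel (req, data[i]);
    }
  }
  else {
    process(data);
  }
}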
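
One possible way to restructure a kernel like the one above is to have every work-item execute the loop and its channel writes, and let the work-item-dependent condition select only the value that is written. The sketch below assumes that the reading side can recognize and skip a placeholder value. The names ordering_invariant and NO_DATA, and the value chosen for N, are hypothetical. The calls to process() are omitted to keep the focus on the channel access. Whether the compiler still issues the ordering warning for a given design should be confirmed from its output.

```c
#pragma OPENCL EXTENSION cl_intel_channels : enable

#define N 8           // Hypothetical trip count; the original leaves N undefined.
#define NO_DATA (-1)  // Hypothetical placeholder that the reader must skip.

// Hypothetical declaration of the channel used in the example above.
channel int req;

__kernel void ordering_invariant(__global const int * restrict check,
                                 __global const int * restrict data) {
  int condition = check[get_global_id(0)];

  // Every work-item runs the loop and issues the same N channel writes.
  for (int i = 0; i < N; i++) {
    // The work-item-dependent condition selects the value only,
    // not whether the channel call happens.
    int value = condition ? data[i] : NO_DATA;
    write_channel_intel(req, value);
  }
}
```

This trades extra channel traffic for an access pattern in which all paths through the kernel execute the channel call.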