Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/19/2022
Public
Document Table of Contents

5.4.5.8. Enforcing the Order of Channel Calls

To enforce the order of channel calls, introduce memory fence or barrier functions in your kernel program to control memory accesses. A memory fence function is necessary to create a control flow dependence between the channel accesses before and after the fence.

When the Intel® FPGA SDK for OpenCL™ Offline Compiler generates a compute unit, it does not always create instruction-level parallelism on all instructions that are independent of each other. As a result, channel read and write operations might not execute independently of each other even if there is no control or data dependence between them. When channel calls interact with each other, or when channels write data to external devices, deadlocks might occur.

For example, the code snippet below consists of a producer kernel and a consumer kernel. Channels c0 and c1 are unbuffered channels. The schedule of the channel read operations from c0 and c1 might occur in the reversed order as the channel write operations to c0 and c1. That is, the producer kernel writes to c0 but the consumer kernel might read from c1 first. This rescheduling of channel calls might cause a deadlock because the consumer kernel is reading from an empty channel.

__kernel void producer (__global const uint * src,
                        const uint iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        write_channel_intel(c0, src[2*i]);
        write_channel_intel(c1, src[2*i+1]);
    }
}

__kernel void consumer (__global uint * dst,
                        const uint iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        /*During compilation, the AOC might reorder the way the consumer kernel 
        writes to memory to optimize memory access. Therefore, c1 might be read
        before c0, which is the reverse of what appears in code.*/  

        dst[2*i+1] = read_channel_intel(c0);
        dst[2*i] = read_channel_intel(c1);
    }
}
To prevent deadlocks from occurring by enforcing the order of channel calls, include memory fence functions (mem_fence) in your kernel.
Inserting the mem_fence call with each kernel's channel flag forces the sequential ordering of the write and read channel calls. The code snippet below shows the modified producer and consumer kernels:
channel uint c0 __attribute__((depth(0)));
channel uint c1 __attribute__((depth(0)));

__kernel void producer (__global const uint * src,
                        const uint iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        write_channel_intel(c0, src[2*i]);
        mem_fence(CLK_CHANNEL_MEM_FENCE);
        write_channel_intel(c1, src[2*i+1]);
    }
}

__kernel void consumer (__global uint * dst;
                        const uint iterations)
{
    for (int i = 0; i < iterations; i++)
    {
        dst[2*i+1] = read_channel_intel(c0);
        mem_fence(CLK_CHANNEL_MEM_FENCE);
        dst[2*i] = read_channel_intel(c1);
    }
}

In this example, mem_fence in the producer kernel ensures that the channel write operation to c0 occurs before that to c1. Similarly, mem_fence in the consumer kernel ensures that the channel read operation from c0 occurs before that from c1.

Defining Memory Consistency Across Kernels When Using Channels

According to the OpenCL™ Specification version 1.0, memory behavior is undefined unless a kernel completes execution. A kernel must finish executing before other kernels can visualize any changes in memory behavior. However, kernels that use channels can share data through common global memory buffers and synchronized memory accesses. To ensure that data written to a channel is visible to the read channel after execution passes a memory fence, define memory consistency across kernels with respect to memory fences.
To create a control flow dependency between the channel synchronization calls and the memory operations, add the CLK_GLOBAL_MEM_FENCE flag to the mem_fence call.
For example:

__kernel void producer( __global const uint * src,
                        const uint iterations )
{
    for(int i=0; i < iterations; i++)
    {
        write_channel_intel(c0, src[2*i]);
        mem_fence(CLK_CHANNEL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
        write_channel_intel(c1, src[2*i+1]);
    }
}

In this kernel, the mem_fence function ensures that the write operation to c0 and memory access to src[2*i] occur before the write operation to c1 and memory access to src[2*i+1]. This allows data written to c0 to be visible to the read channel before data is written to c1.