3.1.3. Optimizing Buffer Inference for Channels or Pipes

Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

Download PDF

ID 683176

Date 9/24/2018

Version 18.1

Public

3.1.3. Optimizing Buffer Inference for Channels or Pipes

In addition to the manual addition of buffered channels or pipes, the improves kernel throughput by adjusting buffer sizes whenever possible.

During compilation, the offline compiler computes scheduling mismatches between interacting channels or pipes. These mismatches might cause imbalances between read and write operations. The offline compiler performs buffer inference optimization automatically to correct the imbalance.

Consider the following examples:

Kernel with Channels Kernel with Pipes

Table 7. Buffer Inference Optimization for Channels and Pipes
Kernel with Channels	Kernel with Pipes
`__kernel void producer ( __global const uint * restrict src, const uint iterations) { for(int i = 0; i < iteration; i++) { write_channel_intel(c0,src[2i]); write_channel_intel(c1,src[2i+1]); } } __kernel void consumer ( __global uint * restrict dst, const uint iterations) { for(int i = 0; i < iterations; i++) { dst[2i] = read_channel_intel(c0); dst[2i+1] = read_channel_intel(c1); } }`	__kernel void producer ( __global const uint * restrict src, const uint iterations, write_only pipe uint __attribute__((blocking)) c0, write_only pipe uint __attribute__((blocking)) c1) { for(int i = 0; i < iteration; i++) { write_pipe(c0,&src[2i]); write_pipe(c1,&src[2i+1]); } } __kernel void consumer ( __global uint * restrict dst, const uint iterations, read_only pipe uint __attribute__((blocking)) c0, read_only pipe uint __attribute__((blocking)) c1) { for(int i = 0; i < iterations; i++) { read_pipe(c0,&dst[2i]); read_pipe(c1,&dst[2i+1]); } }

__kernel void producer (
  __global const uint * restrict src,
  const uint iterations)
{
  for(int i = 0; i < iteration; i++)
  {
    write_channel_intel(c0,src[2*i]);
    write_channel_intel(c1,src[2*i+1]);
  }
}

__kernel void consumer (
  __global uint * restrict dst,
  const uint iterations)
{
  for(int i = 0; i < iterations; i++)
  {
    dst[2*i] = read_channel_intel(c0);
    dst[2*i+1] = read_channel_intel(c1);
  }
}

__kernel void producer (
  __global const uint * restrict src,
  const uint iterations,
  write_only pipe uint
    __attribute__((blocking)) c0,
  write_only pipe uint
    __attribute__((blocking)) c1)
{
  for(int i = 0; i < iteration; i++)
  {
    write_pipe(c0,&src[2*i]);
    write_pipe(c1,&src[2*i+1]);
  }
}

__kernel void consumer (
  __global uint * restrict dst,
  const uint iterations,
  read_only pipe uint
    __attribute__((blocking)) c0,
  read_only pipe uint
    __attribute__((blocking)) c1)
{
  for(int i = 0; i < iterations; i++)
  {
    read_pipe(c0,&dst[2*i]);
    read_pipe(c1,&dst[2*i+1]);
  }
}

The offline compiler performs buffer inference optimization if channels or pipes between kernels cannot form a cycle. A cycle between kernels is a path that originates from a kernel, through a write channel or a write pipe call, and returns to the original kernel. For the example, assume that the write channel or write pipe calls in the kernel producer are scheduled 10 cycles apart and the read channel or read pipe calls are scheduled 15 cycles apart. There exists a temporary mismatch in the read and write operations to c1 because five extra write operations might occur before a read operation to c1 occurs. To correct this imbalance, the offline compiler assigns a buffer size of five cycles to c1 to avoid stalls. The extra buffer capacity decouples the c1 write operations in the producer kernel and the c1 read operations in the consumer kernel.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

3.1.3. Optimizing Buffer Inference for Channels or Pipes