Visible to Intel only — GUID: ewa1403551727147
Ixiasoft
Visible to Intel only — GUID: ewa1403551727147
Ixiasoft
5.10.1. Inferring a Shift Register
Consider the following code example:
channel int in, out;
#define SIZE 512
//Shift register size must be statically determinable
__kernel void foo()
{
int shift_reg[SIZE];
//The key is that the array size is a compile time constant
// Initialization loop
#pragma unroll
for (int i=0; i < SIZE; i++)
{
//All elements of the array should be initialized to the same value
shift_reg[i] = 0;
}
while(1)
{
// Fully unrolling the shifting loop produces constant accesses
#pragma unroll
for (int j=0; j < SIZE–1; j++)
{
shift_reg[j] = shift_reg[j + 1];
}
shift_reg[SIZE – 1] = read_channel_intel(in);
// Using fixed access points of the shift register
int res = (shift_reg[0] + shift_reg[1]) / 2;
// ‘out’ channel will have running average of the input channel
write_channel_intel(out, res);
}
}
In each clock cycle, the kernel shifts a new value into the array. By placing this shift register into a block RAM, the Intel® FPGA SDK for OpenCL™ Offline Compiler can efficiently handle multiple access points into the array. The shift register design pattern is ideal for implementing filters (for example, image filters like a Sobel filter or time-delay filters like a finite impulse response (FIR) filter).
When implementing a shift register in your kernel code, keep in mind the following key points:
- Unroll the shifting loop so that it can access every element of the array.
- All access points must have constant data accesses. For example, if you write a calculation in nested loops using multiple access points, unroll these loops to establish the constant access points.
- Initialize all elements of the array to the same value. Alternatively, you may leave the elements uninitialized if you do not require a specific initial value.
- If some accesses to a large array are not inferable statically, they force the offline compiler to create inefficient hardware. If these accesses are necessary, use __local memory instead of __private memory.
- Do not shift a large shift register conditionally. The shifting must occur in very loop iteration that contains the shifting code to avoid creating inefficient hardware.
Did you find the information on this page useful?
Feedback Message
Characters remaining: