Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 6/21/2022
Public

5.10. Inferring a Register

The Intel® FPGA SDK for OpenCL™ Offline Compiler can implement data that is in the private address space in registers or in block RAMs. In general, the offline compiler chooses registers if the access to a variable is fixed and does not require any dynamic indexes. Accessing an array with a variable index usually forces the array into block RAMs. Implementing private data as registers is beneficial for data accesses that should occur in a single cycle (for example, feedback in a single work-item loop).
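The contrast between fixed and variable indexing can be sketched as follows. This is a minimal illustration and not code from the SDK: the function names sum_fixed and pick_dynamic are hypothetical, and OpenCL qualifiers such as __kernel and __global are omitted so that the sketch also compiles as plain C. How the offline compiler implements each array depends on the full kernel and on the size limits described in this section, so treat the comments as expectations to confirm in the compiler report.

```c
/* Fixed accesses: every index into v is a literal constant, so the
 * access pattern is known at compile time. Per the text above, this is
 * the case where the compiler generally chooses registers. */
int sum_fixed(int a, int b, int c, int d)
{
    int v[4] = {a, b, c, d};          /* private array */
    return v[0] + v[1] + v[2] + v[3];
}

/* Variable access: the index depends on the run-time value idx, so the
 * access is not fixed. Per the text above, this kind of access usually
 * pushes the array toward block RAM. */
int pick_dynamic(int a, int b, int c, int d, int idx)
{
    int v[4] = {a, b, c, d};          /* private array */
    return v[idx & 3];                /* mask keeps the index in 0..3 */
}
```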

The offline compiler infers private arrays as registers either as single values or in a piecewise fashion. Piecewise implementation results in very efficient hardware; however, the offline compiler must be able to determine data accesses statically. To facilitate piecewise implementation, hardcode the access points into the array. You can also facilitate register inference by unrolling loops that access the array.
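Hardcoding the access points might look like the following sketch. It assumes a hypothetical 4-tap running sum; the names windowed_sum, window, and TAPS are illustrative only. In a kernel, in and out would be __global pointers, and those qualifiers are left out here so that the code also compiles as plain C.

```c
#define TAPS 4

/* Hypothetical 4-tap running sum over an input stream of n values. */
void windowed_sum(const int *in, int *out, int n)
{
    int window[TAPS] = {0, 0, 0, 0};  /* private array */

    for (int k = 0; k < n; ++k) {
        /* Every index into window is a literal constant, so each
         * access point is known at compile time. */
        window[0] = window[1];
        window[1] = window[2];
        window[2] = window[3];
        window[3] = in[k];

        out[k] = window[0] + window[1] + window[2] + window[3];
    }
}
```

Because no access to window depends on a run-time index, each element can be treated as a separate value. Whether the offline compiler actually does so for a given kernel should be confirmed in the compiler report.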

If array accesses are not inferable statically, the offline compiler might still infer the array as registers. However, for single work-item kernels, the offline compiler limits the size of such arrays to 64 bytes. There is effectively no size limit for kernels with multiple work-items.

Consider the following code example:

int array[SIZE];    // private array
for (int j = 0; j < N; ++j)
{
    // This loop is not unrolled, so the index i is not a fixed access point.
    for (int i = 0; i < SIZE - 1; ++i)
    {
        array[i] = array[i + 1];
    }
}

The indexing into array[i] is not inferable statically because the loop is not unrolled. For single work-item kernels, if the size of array[SIZE] is less than or equal to 64 bytes (that is, SIZE is at most 16 for 4-byte int elements), the offline compiler implements array[SIZE] in registers as a single value. If the size of array[SIZE] is greater than 64 bytes, the offline compiler implements the entire array in block RAMs. For multiple work-item kernels, the offline compiler implements array[SIZE] in registers as a single value provided that its size is less than 1 kilobyte (KB).
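One way to make the accesses in this example statically inferable is to unroll the inner loop, as suggested earlier in this section. The following is a sketch under stated assumptions: SIZE is fixed at 8, the helper name shift_and_read is hypothetical, and the write to array[SIZE - 1] is added for illustration and is not part of the original example. OpenCL qualifiers are omitted so that the code also compiles as plain C, where a host compiler might ignore #pragma unroll.

```c
#define SIZE 8   /* assumed value for illustration */

/* Hypothetical helper: shifts n input values through the array and
 * returns the oldest element still held. */
int shift_and_read(const int *in, int n)
{
    int array[SIZE];

    /* Clear the array; unrolled so each element is written at a fixed index. */
    #pragma unroll
    for (int i = 0; i < SIZE; ++i) {
        array[i] = 0;
    }

    for (int j = 0; j < n; ++j) {
        /* Fully unrolling the shift turns each array[i] and array[i + 1]
         * into an access with a constant index. */
        #pragma unroll
        for (int i = 0; i < SIZE - 1; ++i) {
            array[i] = array[i + 1];
        }
        array[SIZE - 1] = in[j];   /* assumption: new data enters at the end */
    }

    return array[0];
}
```

With the inner loop unrolled, every index is a compile-time constant, which is the condition this section gives for piecewise register implementation. Check the compiler report to confirm how the array was implemented for a particular kernel.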