Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 6/21/2022
Public

5.2.1. Unrolling a Loop (unroll Pragma)

Loop unrolling replicates a loop body multiple times and reduces the trip count of the loop accordingly. Unroll loops to reduce or eliminate loop control overhead on the FPGA. In cases where there are no loop-carried dependencies and the offline compiler can perform loop iterations in parallel, unrolling loops can also reduce latency and overhead on the FPGA.
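
The trade between body replication and trip count can be pictured in plain C. The sketch below is only an illustration that runs on a host CPU, not compiler output: the function names and the trips counter are hypothetical, and the unrolled version assumes that n is a multiple of 4.

```c
#include <stddef.h>

/* Rolled form: one addition and one loop-control step per element. */
int sum_rolled(const int *x, size_t n, size_t *trips)
{
    int sum = 0;
    *trips = 0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        (*trips)++;              /* counts how often the loop body is entered */
    }
    return sum;
}

/* Hand-unrolled by 4: the body is replicated four times, so the loop
 * runs n / 4 times. Assumption: n is a multiple of 4. */
int sum_unrolled4(const int *x, size_t n, size_t *trips)
{
    int sum = 0;
    *trips = 0;
    for (size_t i = 0; i < n; i += 4) {
        sum += x[i] + x[i + 1] + x[i + 2] + x[i + 3];
        (*trips)++;
    }
    return sum;
}
```

For an 8-element input, both functions return the same sum, but the unrolled loop is entered 2 times instead of 8. The four additions in its body do not depend on one another's loop control, which is the kind of structure that lets iterations run in parallel.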

To direct the offline compiler to unroll a loop, or explicitly not to unroll it, insert an unroll kernel pragma in the kernel code immediately before the loop.
Note: Unrolling nested loops with large bounds might generate a huge number of instructions, which can lead to very long compile times.
  • Provide an unroll factor whenever possible. To specify an unroll factor N, insert the #pragma unroll <N> directive before a loop in your kernel code.
    The offline compiler attempts to unroll the loop at most <N> times.
    Consider the code fragment below. By assigning a value of 2 as the unroll factor, you direct the offline compiler to replicate the loop body twice, so the four-iteration loop executes in two iterations.
    #pragma unroll 2
    for(size_t k = 0; k < 4; k++)
    {
       mac += data_in[(gid * 4) + k] * coeff[k];
    }
  • To prevent a loop from unrolling, specify an unroll factor of 1 (that is, #pragma unroll 1).
  • To unroll a loop fully, omit the unroll factor and insert the #pragma unroll directive before the loop in your kernel code.
    The offline compiler attempts to unroll the loop fully if it can determine the trip count. The offline compiler issues a warning if it cannot honor the unroll request.
    For comparison, consider the following code fragment, where an unroll factor of 2 requests partial unrolling (here N is the array length, not the unroll factor):
    float data[N];
    #pragma unroll 2
    for (int i = 0; i < N; i++){
      data[i] = function(i, a);
    }

    The offline compiler partially unrolls the loop as shown in the following code fragment. The bounds check guards the second copy of the loop body when N is odd:

    float data[N];
    for (int i = 0; i < N; i += 2){
      data[i + 0] = function(i + 0, a);
      if (i + 1 < N){
        data[i + 1] = function(i + 1, a);
      }
    }
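
The partially unrolled fragment above can be checked against the original loop in plain C. This is a minimal sketch under stated assumptions: the body of function is a hypothetical stand-in (the guide does not define it), the names fill_rolled and fill_unrolled2 are illustrative, and the unrolled loop is written by hand instead of being generated from the pragma.

```c
/* Hypothetical stand-in for the guide's function(i, a); any pure
 * function of i and a would serve the same purpose. */
static float function(int i, float a)
{
    return (float)i * a;
}

/* Rolled reference: one call per iteration, n iterations. */
void fill_rolled(float *data, int n, float a)
{
    for (int i = 0; i < n; i++) {
        data[i] = function(i, a);
    }
}

/* Hand-written equivalent of the loop under "#pragma unroll 2":
 * two copies of the body per iteration, stepping i by 2. */
void fill_unrolled2(float *data, int n, float a)
{
    for (int i = 0; i < n; i += 2) {
        data[i + 0] = function(i + 0, a);
        if (i + 1 < n) {         /* skip the second copy on the last pass when n is odd */
            data[i + 1] = function(i + 1, a);
        }
    }
}
```

Calling both functions with an odd length such as 5 fills the two arrays identically. The guard is what keeps the second copy of the body from writing past the end of the array on the final iteration.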