Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

ID 683176
Date 9/24/2018
Public

2.3.1. Loop Analysis Report of an OpenCL Design Example

This example shows the loop analysis report for an OpenCL kernel that contains four outer loops, the first of which has a nested inner loop.

 1  // ND-Range kernel with unrolled loops
 2  __attribute__((reqd_work_group_size(1024,1,1)))
 3  kernel void t (global int * out, int N) {
 4    int i = get_global_id(0);
 5    int j = 1;
 6    for (int k = 0; k < 4; k++) {
 7      #pragma unroll
 8      for (int n = 0; n < 4; n++) {
 9        j += out[k+n];
10      }
11    }
12    out[i] = j;
13
14    int m = 0;
15    #pragma unroll 1
16    for (int k = 0; k < N; k++) {
17      m += out[k/3];
18    }
19    #pragma unroll
20    for (int k = 0; k < 6; k++) {
21      m += out[k];
22    }
23    #pragma unroll 2
24    for (int k = 0; k < 6; k++) {
25      m += out[k];
26    }
27    out[2] = m;
28  }

The loop analysis report of this design example highlights the unrolling strategy for the different kinds of loops defined in the code.

Figure 15. Loop Analysis Report of an OpenCL Design Example with Four Loops

The Intel® FPGA SDK for OpenCL™ Offline Compiler applies the following loop unrolling strategies based on the source code:

  • Fully unrolls the first loop (line 6) automatically
  • Fully unrolls the inner loop (line 8) within the first loop because of the #pragma unroll specification
  • Does not unroll the second outer loop, Block2 (line 16), because of the #pragma unroll 1 specification
  • Fully unrolls the third outer loop (line 20) because of the #pragma unroll specification
  • Unrolls the fourth outer loop, Block4 (line 24), by a factor of 2 because of the #pragma unroll 2 specification, so the six iterations execute as three iterations of a doubled loop body
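
To make the strategies above concrete, the following is a minimal host-side C sketch of what each unrolled form is arithmetically equivalent to. It is an illustration only: it is plain C rather than OpenCL kernel code, it does not reproduce the hardware the offline compiler generates, and all function names (sum_rolled, sum_unroll_by_2, sum_fully_unrolled, nest_rolled, nest_flattened) are hypothetical names chosen for this example.

```c
/* Host-side illustration only; not the offline compiler's output. */

/* Rolled reference for the 6-iteration loops (kernel lines 20 and 24). */
int sum_rolled(const int *out) {
    int m = 0;
    for (int k = 0; k < 6; k++) {
        m += out[k];
    }
    return m;
}

/* Hand-written analogue of "#pragma unroll 2":
 * two copies of the loop body per iteration, three iterations in total. */
int sum_unroll_by_2(const int *out) {
    int m = 0;
    for (int k = 0; k < 6; k += 2) {
        m += out[k];
        m += out[k + 1];
    }
    return m;
}

/* Hand-written analogue of "#pragma unroll" (full unroll):
 * six copies of the body and no loop left. */
int sum_fully_unrolled(const int *out) {
    int m = 0;
    m += out[0];
    m += out[1];
    m += out[2];
    m += out[3];
    m += out[4];
    m += out[5];
    return m;
}

/* Rolled reference for the nested loops (kernel lines 6 to 11). */
int nest_rolled(const int *out) {
    int j = 1;
    for (int k = 0; k < 4; k++) {
        for (int n = 0; n < 4; n++) {
            j += out[k + n];
        }
    }
    return j;
}

/* Both loops fully unrolled: 4 x 4 = 16 reads covering out[0] to out[6]. */
int nest_flattened(const int *out) {
    int j = 1;
    j += out[0] + out[1] + out[2] + out[3]; /* k = 0 */
    j += out[1] + out[2] + out[3] + out[4]; /* k = 1 */
    j += out[2] + out[3] + out[4] + out[5]; /* k = 2 */
    j += out[3] + out[4] + out[5] + out[6]; /* k = 3 */
    return j;
}
```

Each unrolled variant returns the same value as its rolled reference for any input; only the number of loop iterations and the number of body copies per iteration differ. The second outer loop (line 16) has no analogue here because #pragma unroll 1 leaves it rolled, and its trip count N is not known at compile time.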