Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/19/2022
Public

2.3. Viewing Throughput Bottlenecks in the Design

The Bottlenecks viewer, when used with the Loop Analysis and Schedule Viewer reports, provides information about the throughput bottlenecks in your design. This viewer lists all loops that result in a bottleneck for the currently selected system, kernel, or task. You can select any of these loops to view more details about its bottleneck in the Details pane. For more information about the concept of bottlenecks, refer to Loop Bottlenecks.

The Bottlenecks viewer identifies the following categories of bottlenecks:

  • FMAX reduced or initiation interval (II) increased, or both
  • Compiler-applied bottlenecks (private copies set to 1 on local memory)
  • Bottlenecks due to the pragmas or attributes you apply on a loop
  • Concurrency limiter bottlenecks

The following kernel is an example of a bottleneck caused by a data dependency:

kernel void lowered_fmax (global int *dst, int N) {
    int res = N;
    #pragma unroll 9
    for (int i = 0; i < N; i++) {
        res += 1;
        res ^= i;
    }
    dst[0] = res;
}

The Bottlenecks viewer displays the following message:
9X Partially unrolled lowered_fmax.B1:
Compiler failed to schedule this loop with smaller II due to data dependency on variable(s):
  res (Unknown location)
Most critical loop feedback path during scheduling:
  Number of nodes in critical path exceeded what the compiler has captured. Only the top 19 failing nodes are listed.
    1.00 clock cycle 32-bit Select Operation (fmax_ii.cl: 2, fmax_ii.cl: 6)

In the Bottlenecks viewer, you can then select the loop to display more information in the Details pane, which you can use to investigate what caused the bottleneck and why. For additional information about the bottleneck, refer to the System Viewer and the Schedule Viewer. The System Viewer provides information about the isolated failing path and the bottleneck type. The Schedule Viewer displays the bottleneck path for the variable.
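
To make the reported dependency on res easier to see, the following is a minimal host-side sketch in plain C that models the loop in lowered_fmax. It is not taken from the SDK or its reports. The function names lowered_fmax_model and lowered_fmax_unrolled_model and the UNROLL macro are hypothetical, and the arithmetic uses unsigned integers only to avoid signed-overflow behavior in the model. The second function replicates the loop body the way a partial unroll by 9 would at the source level. It shows that every copy of the body consumes the res value produced by the copy before it, so the 2 operations per original iteration become a serial chain of 2 * 9 = 18 operations in the loop feedback path.

```c
#include <stdint.h>

#define UNROLL 9  /* assumed to mirror "#pragma unroll 9" in the kernel */

/* Reference model of the kernel loop: each iteration reads the res value
   written by the previous iteration (a loop-carried dependency). */
int32_t lowered_fmax_model(int32_t n) {
    uint32_t res = (uint32_t)n;
    for (int32_t i = 0; i < n; i++) {
        res += 1u;            /* depends on res from the previous iteration */
        res ^= (uint32_t)i;   /* depends on the add directly above */
    }
    return (int32_t)res;
}

/* Same computation with the body replicated up to UNROLL times per outer
   iteration, as a source-level picture of a partial unroll. */
int32_t lowered_fmax_unrolled_model(int32_t n) {
    uint32_t res = (uint32_t)n;
    int32_t i = 0;
    while (i < n) {
        /* The copies cannot run in parallel: copy k needs the res produced
           by copy k-1, so a full group forms one chain of 2 * UNROLL ops. */
        for (int32_t k = 0; k < UNROLL && i < n; k++, i++) {
            res += 1u;
            res ^= (uint32_t)i;
        }
    }
    return (int32_t)res;
}
```

Both functions return the same value for any N, which reflects that unrolling does not change the result. In this model it only lengthens the chain of dependent operations that must complete before the next value of res is available, which is the kind of feedback path the message above reports.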