Intel® High Level Synthesis Compiler Standard Edition: Best Practices Guide

ID 683259
Date 12/18/2019
Public

3.1.1. Pointer Interfaces

Software developers accustomed to writing code that targets a CPU might first try to code this algorithm by declaring vectors a, b, and c as pointers to get the data in and out of the component.
Using pointers in this way results in a single Avalon Memory-Mapped (MM) Master interface that all three pointer arguments (a, b, and c) share.

Pointers in a component are implemented as Avalon® Memory-Mapped (Avalon® MM) master interfaces with default settings. For more details about pointer parameter interfaces, see Intel HLS Compiler Default Interfaces in Intel® High Level Synthesis Compiler Standard Edition Reference Manual.

The vector addition component example with pointer interfaces can be coded as follows:
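
#include "HLS/hls.h"  // provides the component keyword

// Each pointer argument is accessed through the shared Avalon MM master interface.
component void vector_add(int* a,
                          int* b,
                          int* c,
                          int N) {
  // Replicate the loop body 8 times per iteration of the unrolled loop.
  #pragma unroll 8
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}
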
The following diagram shows the Component Viewer report generated when you compile this example. Because the loop is unrolled by a factor of 8, the diagram shows that vector_add.B2 has 8 loads for vector a, 8 loads for vector b, and 8 stores for vector c. In addition, all of the loads and stores are arbitrated on the same memory, resulting in inefficient memory accesses.
Figure 1. Component View of vector_add Component with Pointer Interfaces
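
To make the load and store counts in the report easier to follow, the sketch below models in plain C++ what unrolling by a factor of 8 does to the loop body. It is only a software illustration, not the hardware that the compiler generates. The names AccessCounts, kUnroll, unrolled_iteration, and vector_add_unrolled_model are hypothetical, and the model assumes that N is a multiple of 8 to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counters, used only to tally memory operations in this model.
struct AccessCounts {
  std::size_t loads_a = 0;
  std::size_t loads_b = 0;
  std::size_t stores_c = 0;
};

constexpr int kUnroll = 8;  // mirrors "#pragma unroll 8" in the listing

// One iteration of the unrolled loop: eight copies of the original loop body.
void unrolled_iteration(const int* a, const int* b, int* c, int base,
                        AccessCounts& counts) {
  for (int u = 0; u < kUnroll; ++u) {
    const int lhs = a[base + u];  // load from vector a
    ++counts.loads_a;
    const int rhs = b[base + u];  // load from vector b
    ++counts.loads_b;
    c[base + u] = lhs + rhs;      // store to vector c
    ++counts.stores_c;
  }
}

// Software model of the whole loop.
// Assumption: N is a non-negative multiple of kUnroll.
AccessCounts vector_add_unrolled_model(const int* a, const int* b, int* c,
                                       int N) {
  assert(N >= 0 && N % kUnroll == 0);
  AccessCounts counts;
  for (int i = 0; i < N; i += kUnroll) {
    unrolled_iteration(a, b, c, i, counts);
  }
  return counts;
}
```

A single call to unrolled_iteration performs 8 loads from a, 8 loads from b, and 8 stores to c. In the pointer-based component, all 24 of these accesses go through the one shared interface, which is why the report shows them arbitrated on the same memory.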


The following Loop Analysis report shows that the component has an undesirably high loop initiation interval (II). The II is high because vectors a, b, and c are all accessed through the same Avalon MM Master interface. The Intel® HLS Compiler uses stallable arbitration logic to schedule these accesses, which results in poor performance and high FPGA area use.

In addition, the compiler cannot assume there are no data dependencies between loop iterations because pointer aliasing might exist. The compiler cannot determine that vectors a, b, and c do not overlap. If data dependencies exist, the Intel® HLS Compiler cannot pipeline the loop iterations effectively.
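
The aliasing concern can be illustrated with a small plain C++ sketch. It is a software model under stated assumptions, not HLS component code: vector_add_model, run_disjoint, and run_overlapping are hypothetical names, and the buffer contents are chosen only for illustration. When the output pointer overlaps an input, each iteration reads a value that the previous iteration wrote, so the iterations cannot be reordered or overlapped freely.

```cpp
#include <cassert>
#include <vector>

// Plain C++ stand-in for the loop in the component (no unrolling).
void vector_add_model(const int* a, const int* b, int* c, int N) {
  assert(N >= 0);
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}

// Case 1: a, b, and c are separate buffers, so iterations are independent.
std::vector<int> run_disjoint() {
  std::vector<int> a = {1, 0, 0, 0};
  std::vector<int> b = {1, 1, 1, 1};
  std::vector<int> c(4, 0);
  vector_add_model(a.data(), b.data(), c.data(), 4);
  return c;  // {2, 1, 1, 1}
}

// Case 2: c aliases a shifted by one element, so c[i] is the same memory
// location as a[i + 1]. Iteration i + 1 reads what iteration i wrote.
std::vector<int> run_overlapping() {
  std::vector<int> buf = {1, 0, 0, 0, 0};
  std::vector<int> b = {1, 1, 1, 1};
  vector_add_model(buf.data(), b.data(), buf.data() + 1, 4);
  return buf;  // {1, 2, 3, 4, 5}
}
```

Both calls pass the same input values, yet the overlapping case produces a running sum because of the loop-carried dependency. Because the component signature alone does not rule out the second case, the compiler has to schedule the loop conservatively.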



Compiling the component with an Intel® Quartus® Prime compilation flow targeting an Intel® Arria® 10 device results in the following QoR metrics. The high ALM usage, high latency, high II, and low fMAX are all undesirable properties in a component.
Table 2. QoR Metrics for a Component with a Pointer Interface¹

QoR Metric                           Value
ALMs                                 15593.5
DSPs                                 0
RAMs                                 30
fMAX (MHz)²                          298.6
Latency (cycles)                     24071
Initiation Interval (II) (cycles)    ~508

¹ The compilation flow used to calculate the QoR metrics used Intel® Quartus® Prime Pro Edition Version 17.1.
² The fMAX measurement was calculated from a single seed.