Intel® High Level Synthesis Compiler Standard Edition: Best Practices Guide

ID 683259
Date 12/18/2019
Public

7.2. Component Gets Bad Quality of Results

Your design can achieve a poor quality of results (QoR) for many reasons, but bad memory configurations are often an important factor. Review the Component Memory Viewer report in the High-Level Design Reports, and look for stallable arbitration nodes and unexpected RAM utilization.

The information in this section describes some common sources of stallable arbitration nodes or excess RAM utilization.

Component Uses More FPGA Resource Than Expected

By default, the Intel® HLS Compiler Standard Edition tries to optimize your component for the best throughput by maximizing its operating frequency (fMAX).

One way to reduce area consumption is to relax the fMAX requirement by setting a target fMAX value with the --clock i++ command option. The HLS compiler can often achieve a higher fMAX than you specify, so when you set a target fMAX that is lower than you need, your design might still achieve an acceptable fMAX while consuming less area.

Incorrect Bank Bits

If you access parts of an array in parallel (either a single- or multidimensional array), you might need to configure the memory bank selection bits.

See Memory Architecture Best Practices for details about how to configure efficient memory systems.
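To see why the choice of bank-select bits matters, the following sketch models bank selection in plain C++. It is a conceptual model under stated assumptions, not the compiler's implementation: it assumes that an element's flattened (row-major) index serves as its word address, and that the bank number is assembled from the listed address bits, in the spirit of the hls_bankbits memory attribute. The functions bankOf and rowsInDistinctBanks are hypothetical names used only for illustration.

```cpp
#include <set>

// Hypothetical host-side model, not HLS code: an element's flattened index is
// treated as its word address (an assumption), and the bank number is built
// from the address bits listed in bankBits (bankBits[0] becomes bank bit 0).
unsigned bankOf(unsigned address, const unsigned *bankBits, unsigned numBits) {
    unsigned bank = 0;
    for (unsigned b = 0; b < numBits; ++b) {
        bank |= ((address >> bankBits[b]) & 1u) << b;
    }
    return bank;
}

// For an int a[4][8] array stored row-major, checks whether the four parallel
// accesses a[0][col] .. a[3][col] fall into four different banks.
bool rowsInDistinctBanks(const unsigned *bankBits, unsigned numBits, unsigned col) {
    std::set<unsigned> banks;
    for (unsigned row = 0; row < 4; ++row) {
        unsigned address = row * 8 + col;  // row index occupies address bits 3-4
        banks.insert(bankOf(address, bankBits, numBits));
    }
    return banks.size() == 4;
}
```

In this model, selecting address bits 3 and 4 (the row index) places each row of an int a[4][8] array in its own bank, so the four reads a[0][col] through a[3][col] never share a bank. Selecting bits 0 and 1 (low column bits) instead sends all four reads to the same bank, which is the kind of conflict that shows up as arbitration in the Component Memory Viewer.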

Conditional Operator Accessing Two Different Arrays of struct Variables

In some cases, if you try to access different arrays of struct variables with a conditional operator, the Intel® HLS Compiler merges the arrays into the same RAM block. You might see stallable arbitration in the Component Memory Viewer because there are not enough load/store sites on the memory system.

For example, the following code examples show an array of struct variables, a conditional operator that results in stallable arbitration, and a workaround that avoids stallable arbitration.
struct MyStruct {
  float a;
  float b;
};

MyStruct array1[64];
MyStruct array2[64];
The following conditional operator that uses these arrays of struct variables causes stallable arbitration:
MyStruct value = (shouldChooseArray1) ? array1[idx] : array2[idx];
You can avoid the stallable arbitration that the conditional operator causes here by removing the operator and using an explicit if statement instead.
MyStruct value;
if (shouldChooseArray1)
{
    value = array1[idx];
} else
{
    value = array2[idx];
}
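The if-statement rewrite changes only how the selection is expressed, not which element is returned. The following sketch wraps both forms in plain C++ functions (no HLS-specific code; the names selectWithConditional and selectWithIf are hypothetical) so that a host-side test can confirm that they return the same element. It does not reproduce the arbitration difference itself, which is visible only in the Component Memory Viewer after an i++ compile for an FPGA target.

```cpp
struct MyStruct {
  float a;
  float b;
};

MyStruct array1[64];
MyStruct array2[64];

// Original form: a conditional operator selects between the two arrays.
MyStruct selectWithConditional(bool shouldChooseArray1, int idx) {
  return (shouldChooseArray1) ? array1[idx] : array2[idx];
}

// Workaround form: an explicit if statement performs the same selection.
MyStruct selectWithIf(bool shouldChooseArray1, int idx) {
  MyStruct value;
  if (shouldChooseArray1) {
    value = array1[idx];
  } else {
    value = array2[idx];
  }
  return value;
}
```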

File-Scoped Static Variables

The Intel® HLS Compiler Standard Edition supports file-scoped static variables, but any memory attributes that you apply to static arrays work only if the static array is declared within the component function. Memory attributes applied to file-scoped static variables are ignored. Memory attributes are also ignored if you apply them to array members in a struct or class definition.

If you want to override the default memory settings for an array variable, ensure that the array variable is declared in the scope of the component function where the array variable is used. You can pass pointers to the static array to any subroutines that might access the static array.

This code change is shown in the following example. The code samples and high-level design report views that follow compare two implementations of a component that reads data from a stream into a local memory, then processes the data that is in that local memory.

In the first code example, the local memory is a file-scoped static variable. In the second code example, the local memory is a function-scoped static variable.

The second code example gets better QoR because you can apply memory optimization attributes to the static variable declaration. In this second example, the hls_memory and hls_numbanks(1) attributes force the static array into a single bank of on-chip RAM blocks.

Figure 21. Example 1: File-scoped Static Variable
#include "HLS/hls.h"

// These memory attributes are ignored because the array is file-scoped.
hls_memory hls_numbanks(1) static int myStaticArray[64];

void loadData(ihc::stream_in<int> &intStreamIn)
{
	for(int idx = 0; idx < 64; idx++)
	{
		myStaticArray[idx] = intStreamIn.read();
	}
}

int findMax()
{
	int maxVal = 0;
	for(int idx = 0; idx < 64; idx++)
	{
		int val = myStaticArray[idx];
		if (val > maxVal)
		{
			maxVal = val;
		}
	}
	
	return maxVal;
}

component
int dut(ihc::stream_in<int> &intStreamIn)
{
	loadData(intStreamIn);
	return findMax();
}

Figure 22. Example 2: Function-scoped Static Variable
#include "HLS/hls.h"

void loadData(ihc::stream_in<int> &intStreamIn, int myStaticArray[64])
{
	for(int idx = 0; idx < 64; idx++)
	{
		myStaticArray[idx] = intStreamIn.read();
	}
}

int findMax(int myStaticArray[64])
{
	int maxVal = 0;
	for(int idx = 0; idx < 64; idx++)
	{
		int val = myStaticArray[idx];
		if (val > maxVal)
		{
			maxVal = val;
		}
	}
	
	return maxVal;
}

component
int dut(ihc::stream_in<int> &intStreamIn)
{
	// Declared inside the component, so these memory attributes take effect.
	hls_memory hls_numbanks(1) static int myStaticArray[64];

	loadData(intStreamIn, myStaticArray);
	return findMax(myStaticArray);
}

Cluster Logic

Your design might consume more RAM blocks than you expect, especially if you store many array variables in large registers. The Area Analysis of System report in the high-level design report (report.html) can help find this issue.

In an example component that intentionally stores three matrices in RAM blocks, the RAM blocks for those matrices account for less than half of the RAM blocks that the component consumes.

If you look further down the report, note how many RAM blocks are consumed by Cluster Logic or State. You might also see that some of the array values that you intended to store in registers were instead stored in large numbers of RAM blocks.

In some cases, you can reduce this RAM block usage with the following techniques:
  • Pipelining loops instead of unrolling them.
  • Storing local variables in local RAM blocks (hls_memory memory attribute) instead of large registers (hls_register memory attribute).
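Both techniques can be sketched in a single component. The following example is a minimal sketch under stated assumptions: the component name sumOfMultiples is hypothetical, and the preprocessor guard that defines component and hls_memory as empty macros is an illustrative convenience so that the same file also builds with an ordinary host C++17 compiler when the Intel HLS headers are not installed. The local buffer carries hls_memory rather than hls_register, and neither loop carries #pragma unroll, so the compiler is free to pipeline the loops instead of replicating their bodies.

```cpp
// Use the real HLS macros when the Intel HLS headers are available; otherwise
// define them as empty so that this sketch builds with a host compiler.
#if defined(__has_include)
#if __has_include("HLS/hls.h")
#include "HLS/hls.h"
#define HLS_HEADERS_FOUND
#endif
#endif

#ifndef HLS_HEADERS_FOUND
#define component
#define hls_memory
#endif

// Hypothetical component: fills a local buffer with multiples of factor and
// returns their sum.
component int sumOfMultiples(int factor) {
  // Local buffer requested in on-chip RAM blocks (hls_memory) instead of
  // large registers (hls_register).
  hls_memory int buffer[16];

  // No #pragma unroll: the loop stays rolled so that the compiler can pipeline
  // it, reusing one copy of the loop body rather than 16 copies.
  for (int i = 0; i < 16; i++) {
    buffer[i] = i * factor;
  }

  int sum = 0;
  for (int i = 0; i < 16; i++) {
    sum += buffer[i];
  }
  return sum;
}
```

Whether these changes reduce the Cluster Logic and State RAM usage for a particular design is something to confirm in the Area Analysis of System report after compiling.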