Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 4/01/2024
Public

4.1.3. Avalon® Memory Mapped Agent Memories

Depending on your component, you can sometimes optimize the memory structure of your component by placing your component parameters in Avalon® Memory Mapped (Avalon® MM) agent memories.

Agent memories are owned by the component and expose an MM agent interface for an MM Host to read from and write to.

When you allocate an agent memory, you must define its size. The size limits how large a value of N the component can process. In the example that follows, each RAM holds 1024 words, so N can be at most 1024.

The vector addition component example can be coded with an Avalon® MM agent interface as follows:
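#include "HLS/hls.h"

// Each pointer argument is backed by a 1024-word agent memory owned by the component.
component void vector_add(
     hls_avalon_agent_memory_argument(1024*sizeof(int)) int* a,
     hls_avalon_agent_memory_argument(1024*sizeof(int)) int* b,
     hls_avalon_agent_memory_argument(1024*sizeof(int)) int* c,
     int N) {
  // Unroll the loop by a factor of 8.
  #pragma unroll 8
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}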
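A testbench for this component might look like the following minimal sketch. It assumes that the testbench can pass ordinary buffers for the agent memory arguments and call the component like a regular function, and that the size argument is given in bytes, as the `1024*sizeof(int)` expression in the example suggests. The names `kAgentWords`, `SKETCH_HAVE_HLS`, and `check_vector_add` are hypothetical. The fallback macro definitions exist only so that the sketch also builds with a standard C++17 compiler when `HLS/hls.h` is not available.

```cpp
#include <vector>

// Use the real HLS header when it is available. Otherwise, define the HLS
// keywords as empty macros so the sketch builds with a plain C++17 compiler.
#if defined(__has_include)
#if __has_include("HLS/hls.h")
#include "HLS/hls.h"
#define SKETCH_HAVE_HLS 1
#endif
#endif
#ifndef SKETCH_HAVE_HLS
#define component
#define hls_avalon_agent_memory_argument(size_in_bytes)
#endif

// Hypothetical constant: capacity of each agent memory in int-sized words.
constexpr int kAgentWords = 1024;

component void vector_add(
    hls_avalon_agent_memory_argument(1024 * sizeof(int)) int* a,
    hls_avalon_agent_memory_argument(1024 * sizeof(int)) int* b,
    hls_avalon_agent_memory_argument(1024 * sizeof(int)) int* c,
    int N) {
#ifdef SKETCH_HAVE_HLS
  #pragma unroll 8
#endif
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}

// Hypothetical helper: runs vector_add on n elements and verifies the sums.
// Returns false when n does not fit in the 1024-word agent memories.
bool check_vector_add(int n) {
  if (n < 0 || n > kAgentWords) {
    return false;  // N is bounded by the size of the agent memories.
  }
  // Allocate full-size buffers so they match the declared memory size.
  std::vector<int> a(kAgentWords), b(kAgentWords), c(kAgentWords, 0);
  for (int i = 0; i < n; ++i) {
    a[i] = i;
    b[i] = 2 * i;
  }
  vector_add(a.data(), b.data(), c.data(), n);
  for (int i = 0; i < n; ++i) {
    if (c[i] != 3 * i) {
      return false;
    }
  }
  return true;
}
```

Calling `check_vector_add(1024)` from the testbench `main()` exercises the largest N that the memories allow. The guard at the top of the helper rejects larger values instead of letting the component index past the end of its RAMs.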
The following diagram shows the Function View in the System Viewer that is generated when you compile this example.
Figure 26. System Viewer Function View of vector_add Component with Avalon® MM Agent Interface


Compiling this component with a Quartus® Prime compilation flow targeting an Arria® 10 device results in the following QoR metrics:
Table 4. QoR Metrics Comparison for Avalon® MM Agent Interface¹

QoR Metric                         Pointer   Avalon® MM Host   Avalon® MM Agent
ALMs                               15593.5   643               490.5
DSPs                               0         0                 0
RAMs                               30        0                 48
fMAX (MHz)²                        298.6     472.37            498.26
Latency (cycles)                   24071     142               139
Initiation Interval (II) (cycles)  ~508      1                 1

¹ The compilation flow used to calculate the QoR metrics used Quartus® Prime Pro Edition Version 17.1.
² The fMAX measurement was calculated from a single seed.

The QoR metrics show that changing the ownership of the memory from the system to the component reduces the number of ALMs that the component uses (from 643 to 490.5) and the component latency (from 142 to 139 cycles). The fMAX of the component also increases (from 472.37 MHz to 498.26 MHz). The component uses more RAM blocks (48 instead of 0) because the memory is implemented in the component and not in the system. The total system RAM usage (not shown) should not increase because the RAM usage only shifts from the system to the FPGA RAM blocks of the component.