Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 12/13/2021
Public


4.1.5. Avalon® Streaming Interfaces

Avalon® Streaming (Avalon® ST) interfaces support a unidirectional flow of data and are typically used for components that drive high-bandwidth, low-latency data.

The vector addition example can be coded with an Avalon® ST interface as follows:
#include "HLS/hls.h"

// Packs eight ints into one stream word so eight additions run per iteration.
struct int_v8 {
  int data[8];
};

component void vector_add(
    ihc::stream_in<int_v8>&  a,
    ihc::stream_in<int_v8>&  b,
    ihc::stream_out<int_v8>& c,
    int N) {
  // Each iteration consumes one 8-element word from each input stream.
  for (int j = 0; j < (N/8); ++j) {
    int_v8 av = a.read();
    int_v8 bv = b.read();
    int_v8 cv;
    #pragma unroll 8
    for (int i = 0; i < 8; ++i) {
      cv.data[i] = av.data[i] + bv.data[i];
    }
    c.write(cv);
  }
}

An Avalon® ST interface has a data bus and uses ready and valid signals for handshaking. The int_v8 struct packs eight integers so that eight additions can occur in parallel, which keeps this example comparable with the examples for other interfaces. For the same reason, the loop count is divided by eight.
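The packing and the divided loop count can be checked in plain software before compiling for hardware. The following is a minimal sketch that assumes a hypothetical FifoStream class as a stand-in for the stream types; it is not the ihc::stream_in or ihc::stream_out API, and the helper names vector_add_model and pack_into_stream are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical software stand-in for a stream (NOT the ihc stream API).
template <typename T>
class FifoStream {
 public:
  void write(const T& value) { fifo_.push(value); }
  T read() {
    assert(!fifo_.empty() && "read from empty stream");
    T value = fifo_.front();
    fifo_.pop();
    return value;
  }
  std::size_t size() const { return fifo_.size(); }

 private:
  std::queue<T> fifo_;
};

// Same packing as the component: eight ints per stream word.
struct int_v8 {
  int data[8];
};

// Same loop structure as vector_add, written against the stand-in stream.
void vector_add_model(FifoStream<int_v8>& a, FifoStream<int_v8>& b,
                      FifoStream<int_v8>& c, int N) {
  for (int j = 0; j < N / 8; ++j) {
    int_v8 av = a.read();
    int_v8 bv = b.read();
    int_v8 cv;
    for (int i = 0; i < 8; ++i) {
      cv.data[i] = av.data[i] + bv.data[i];
    }
    c.write(cv);
  }
}

// Packs N ints into N/8 stream words. Because N/8 is integer division,
// a trailing partial word (N not a multiple of 8) is not transferred.
void pack_into_stream(const int* src, int N, FifoStream<int_v8>& out) {
  for (int j = 0; j < N / 8; ++j) {
    int_v8 word;
    for (int i = 0; i < 8; ++i) {
      word.data[i] = src[j * 8 + i];
    }
    out.write(word);
  }
}
```

With N = 16, each input stream carries two words and the model writes two words to the output. If N is not a multiple of eight, the remaining elements are never read, so a caller would need to pad the input or handle the remainder separately.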

The following diagram shows the Function View in the System Viewer that is generated when you compile this example.
Figure 27. System Viewer Function View of vector_add Component with Avalon® ST Interface


The main difference from other versions of the example component is the absence of memory.

The streaming interfaces can be stalled by the upstream sources and by the downstream sink. Because the interfaces are stallable, the loop initiation interval (II) is approximately 1 (instead of exactly 1). If the component does not receive any bubbles (gaps in data flow) from upstream or stall signals from downstream, then the component achieves the desired II of 1.
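To make "approximately 1" concrete, the following toy model counts how many cycles a loop needs to start a given number of iterations when some cycles carry bubbles or stalls. It is a simplified sketch under the assumption that one iteration starts on every cycle in which upstream data is valid and the downstream sink is ready; the function name effective_ii is hypothetical, and the model does not reproduce compiler or simulator output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy cycle-level model (an assumption for illustration, not tool output).
// upstream_valid[k]   : upstream presents data in cycle k (false = bubble)
// downstream_ready[k] : downstream accepts data in cycle k (false = stall)
// Returns cycles consumed divided by iterations started, or 0.0 if no
// iteration could start within the supplied trace.
double effective_ii(const std::vector<bool>& upstream_valid,
                    const std::vector<bool>& downstream_ready,
                    int iterations) {
  assert(upstream_valid.size() == downstream_ready.size());
  int started = 0;
  std::size_t cycle = 0;
  while (started < iterations && cycle < upstream_valid.size()) {
    // An iteration starts only when neither side holds the loop back.
    if (upstream_valid[cycle] && downstream_ready[cycle]) {
      ++started;
    }
    ++cycle;
  }
  if (started == 0) {
    return 0.0;
  }
  return static_cast<double>(cycle) / started;
}
```

In this model, a trace with no bubbles and no stalls gives exactly 1.0. A trace with a bubble on every other cycle needs seven cycles to start four iterations, an average of 1.75 cycles per iteration.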

If you know that the stream interfaces will never stall, you can further optimize this component by taking advantage of the usesReady and usesValid stream parameters.
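The stream parameters named above might be applied as in the following sketch, which sets ihc::usesValid<false> on the input streams and ihc::usesReady<false> on the output stream. The component name vector_add_nostall and the type aliases are hypothetical. The fallback stubs exist only so that the sketch also compiles with an ordinary C++ compiler when HLS/hls.h is not installed; they mimic the names but not the hardware behavior of the Intel classes. Check the parameter spelling against the reference manual for your compiler version.

```cpp
#include <cassert>
#include <queue>

#if defined(__has_include)
#  if __has_include("HLS/hls.h")
#    include "HLS/hls.h"
#    define HAVE_INTEL_HLS 1
#  endif
#endif

#ifndef HAVE_INTEL_HLS
// Fallback stubs (assumptions for illustration only, not the Intel library).
#define component
namespace ihc {
template <bool> struct usesValid {};
template <bool> struct usesReady {};
template <typename T, typename... Params>
class stream_in {
 public:
  void write(const T& v) { q_.push(v); }
  T read() { T v = q_.front(); q_.pop(); return v; }
 private:
  std::queue<T> q_;
};
template <typename T, typename... Params>
class stream_out {
 public:
  void write(const T& v) { q_.push(v); }
  T read() { T v = q_.front(); q_.pop(); return v; }
 private:
  std::queue<T> q_;
};
}  // namespace ihc
#endif

struct int_v8 {
  int data[8];
};

// Inputs omit the valid signal; the output omits the ready signal.
using in_stream_t  = ihc::stream_in<int_v8, ihc::usesValid<false> >;
using out_stream_t = ihc::stream_out<int_v8, ihc::usesReady<false> >;

component void vector_add_nostall(in_stream_t& a, in_stream_t& b,
                                  out_stream_t& c, int N) {
  for (int j = 0; j < (N / 8); ++j) {
    int_v8 av = a.read();
    int_v8 bv = b.read();
    int_v8 cv;
    #pragma unroll 8
    for (int i = 0; i < 8; ++i) {
      cv.data[i] = av.data[i] + bv.data[i];
    }
    c.write(cv);
  }
}
```

Removing these signals shifts responsibility to the surrounding system: the upstream sources must supply data whenever the component reads, and the downstream sink must accept data whenever the component writes. Use this variant only when that guarantee actually holds.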

Compiling this component with an Intel® Quartus® Prime compilation flow targeting an Intel® Arria® 10 device results in the following QoR metrics:
Table 5. QoR Metrics Comparison for Avalon® ST Interface [1]

QoR Metric                        | Pointer | Avalon® MM Host | Avalon® MM Agent | Avalon® ST
ALMs                              | 15593.5 | 643             | 490.5            | 314.5
DSPs                              | 0       | 0               | 0                | 0
RAMs                              | 30      | 0               | 48               | 0
fMAX (MHz) [2]                    | 298.6   | 472.37          | 498.26           | 389.71
Latency (cycles)                  | 24071   | 142             | 139              | 134
Initiation Interval (II) (cycles) | ~508    | 1               | 1                | 1

[1] The compilation flow used to calculate the QoR metrics used Intel® Quartus® Prime Pro Edition Version 17.1.
[2] The fMAX measurement was calculated from a single seed.

Moving the vector_add component to an Avalon® ST interface further improved ALM usage, RAM usage, and component latency. The component II is optimal if there are no stalls from the interfaces.