Intel® High Level Synthesis Compiler Standard Edition: Best Practices Guide

ID 683259
Date 12/18/2019

3.1.5. Pass-by-Value Interface

For software developers accustomed to writing code that targets a CPU, passing each element in an array by value might be unintuitive because it typically results in many function calls or large parameters. However, for code targeting an FPGA, passing array elements by value can result in smaller and simpler hardware on the FPGA.

The vector addition example can be coded to pass the vector array elements by value as follows. A struct is used because we want to pass the entire array (of 8 data elements) by value.
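#include "HLS/hls.h"

struct int_v8 {
  int data[8];
};

component int_v8 vector_add(
    int_v8 a,
    int_v8 b) {
  int_v8 c;
  // Fully unroll the loop so that all eight additions run in parallel.
  #pragma unroll 8
  for (int i = 0; i < 8; ++i) {
    c.data[i] = a.data[i] + b.data[i];
  }
  return c;
}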

This component takes only eight elements of vector a and eight elements of vector b per call, and returns eight elements of vector c. To compute 1024 elements for the example, the component needs to be called 128 times (1024/8). In the previous examples the component contained loops that were pipelined. Here the component is invoked many times, and the invocations themselves are pipelined.
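The 128-call pattern described above can be sketched as a host-side loop. This is a minimal illustration, not the guide's testbench: the helper name `add_vectors`, the constants `kLanes` and `kTotal`, and the empty `component` stand-in macro are assumptions introduced so that the sketch also builds with a regular C++ compiler. The `#pragma unroll` line is left out of this software model for the same reason.

```cpp
#include <cassert>

// Stand-in (assumption) so the sketch builds with a regular C++ compiler.
// With the Intel HLS Compiler, include "HLS/hls.h" instead, which provides
// the real `component` keyword.
#ifndef component
#define component
#endif

constexpr int kLanes = 8;     // elements processed per invocation
constexpr int kTotal = 1024;  // elements in each full vector
static_assert(kTotal % kLanes == 0, "vector length must be a multiple of 8");

struct int_v8 {
  int data[kLanes];
};

// Software model of the pass-by-value component from this section.
component int_v8 vector_add(int_v8 a, int_v8 b) {
  int_v8 c;
  for (int i = 0; i < kLanes; ++i) {
    c.data[i] = a.data[i] + b.data[i];
  }
  return c;
}

// Hypothetical helper: splits two 1024-element arrays into 8-element
// chunks, calls the component once per chunk, and returns the call count.
int add_vectors(const int* a, const int* b, int* c) {
  assert(a && b && c);
  int calls = 0;
  for (int base = 0; base < kTotal; base += kLanes) {
    int_v8 va, vb;
    for (int i = 0; i < kLanes; ++i) {
      va.data[i] = a[base + i];
      vb.data[i] = b[base + i];
    }
    const int_v8 vc = vector_add(va, vb);  // one invocation, by value
    for (int i = 0; i < kLanes; ++i) {
      c[base + i] = vc.data[i];
    }
    ++calls;
  }
  return calls;  // kTotal / kLanes
}
```

In this sketch the calls run one after another in software. For the simulated hardware to accept a new invocation every cycle, a testbench would typically use the HLS compiler's enqueue functions (such as `ihc_hls_enqueue` followed by `ihc_hls_component_run_all`) instead of direct calls; check the reference manual for their exact usage.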

The following diagram shows the Component Viewer report generated when you compile this example.
Figure 5. Component View of vector_add Component with Pass-By-Value Interface

The latency of this component is one, and it has a loop initiation interval (II) of one.
Compiling this component with an Intel® Quartus® Prime compilation flow targeting an Intel® Arria® 10 device results in the following QoR metrics:
Table 6. QoR Metrics Comparison for Pass-by-Value Interface¹

QoR Metric                        | Pointer | Avalon® MM Master | Avalon® MM Slave | Avalon® ST | Pass-by-Value
ALMs                              | 15593.5 | 643               | 490.5            | 314.5      | 130
DSPs                              | 0       | 0                 | 0                | 0          | 0
RAMs                              | 30      | 0                 | 48               | 0          | 0
fMAX (MHz)²                       | 298.6   | 472.37            | 498.26           | 389.71     | 581.06
Latency (cycles)                  | 24071   | 142               | 139              | 134        | 128
Initiation Interval (II) (cycles) | ~508    | 1                 | 1                | 1          | 1

¹ The compilation flow used to calculate the QoR metrics used Intel® Quartus® Prime Pro Edition Version 17.1.
² The fMAX measurement was calculated from a single seed.

The QoR metrics for the vector_add component with a pass-by-value interface show fewer ALMs used, a high component fMAX, and optimal values for latency and II. In this case, the II is the same as the component invocation interval: a new invocation of the component can be launched every clock cycle. With an initiation interval of 1, 128 component calls are processed in 128 cycles, so the overall latency is 128 cycles.
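The cycle count above can be checked with a simple pipeline model. This is a sketch under an assumption the guide does not state explicitly: the total is taken to be the latency of the first call plus one II for each additional call, with no stalls. The function names `total_cycles` and `pass_by_value_cycles` are hypothetical.

```cpp
#include <cassert>

// Cycles to finish `invocations` back-to-back calls of a pipelined
// component: the first result appears after `latency` cycles, and each
// further call adds `ii` cycles. Simple model; ignores stalls.
long total_cycles(long invocations, long latency, long ii) {
  assert(invocations >= 1 && latency >= 1 && ii >= 1);
  return latency + (invocations - 1) * ii;
}

// Values from this section: 1024 elements, 8 per call, latency 1, II 1.
long pass_by_value_cycles() {
  const long calls = 1024 / 8;       // 128 invocations
  return total_cycles(calls, 1, 1);  // 1 + 127 * 1 = 128
}
```

Under this model, the pass-by-value figure of 128 cycles follows directly from the component latency of one and the II of one reported earlier in the section.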