3.1.5. Pass-by-Value Interface

Intel® High Level Synthesis Compiler Standard Edition: Best Practices Guide

ID 683259

Date 12/18/2019

Version 19.1

Public

For software developers accustomed to writing code that targets a CPU, passing each element in an array by value might be unintuitive because it typically results in many function calls or large parameters. However, for code targeting an FPGA, passing array elements by value can result in smaller and simpler hardware on the FPGA.

The vector addition example can be coded to pass the vector array elements by value as follows. The array is wrapped in a struct because a bare C++ array argument is passed as a pointer; wrapping it lets the entire array of eight data elements be passed by value.

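#include "HLS/hls.h"  // provides the component keyword

// Wraps eight ints so that the whole array is passed by value.
struct int_v8 {
  int data[8];
};

component int_v8 vector_add(int_v8 a, int_v8 b) {
  int_v8 c;
  // Fully unroll the loop so that all eight additions occur in parallel.
  #pragma unroll 8
  for (int i = 0; i < 8; ++i) {
    c.data[i] = a.data[i] + b.data[i];
  }
  return c;
}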

This component takes and processes only eight elements of vector a and vector b at a time, and returns eight elements of vector c. To compute 1024 elements for the example, the component must be called 128 times (1024/8). In the previous examples, the component contained loops that were pipelined. Here, the component is invoked many times, and each invocation is pipelined.
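The 128 invocations described above can be sketched on the host side as follows. This is a minimal illustration in plain C++, not a testbench from the guide: the `component` keyword is omitted so that the code builds with any C++ compiler, and the driver function `add_in_chunks` is a hypothetical name introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Same struct as in the example: eight ints passed by value.
struct int_v8 {
  int data[8];
};

// Plain C++ stand-in for the component. With the Intel HLS compiler you
// would keep the component keyword and include "HLS/hls.h".
int_v8 vector_add(int_v8 a, int_v8 b) {
  int_v8 c;
  for (int i = 0; i < 8; ++i) {
    c.data[i] = a.data[i] + b.data[i];
  }
  return c;
}

// Hypothetical driver (assumption, not part of the guide): adds two
// n-element arrays by calling vector_add once per 8-element chunk and
// returns the number of invocations. n must be a multiple of 8.
std::size_t add_in_chunks(const int* a, const int* b, int* c, std::size_t n) {
  assert(n % 8 == 0);
  std::size_t calls = 0;
  for (std::size_t base = 0; base < n; base += 8) {
    int_v8 va, vb;
    for (int i = 0; i < 8; ++i) {
      va.data[i] = a[base + i];
      vb.data[i] = b[base + i];
    }
    int_v8 vc = vector_add(va, vb);  // one component invocation
    for (int i = 0; i < 8; ++i) {
      c[base + i] = vc.data[i];
    }
    ++calls;
  }
  return calls;
}
```

For 1024-element inputs, `add_in_chunks` makes 128 calls to `vector_add`. This sketch runs the calls one after another in software; it does not model how the hardware overlaps successive invocations.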

The following diagram shows the Component Viewer report generated when you compile this example.

Figure 5. Component View of vector_add Component with Pass-By-Value Interface

The latency of this component is one cycle, and it has a loop initiation interval (II) of one.

Compiling this component with an Intel® Quartus® Prime compilation flow targeting an Intel® Arria® 10 device results in the following QoR metrics:

Table 6. QoR Metrics Comparison for Pass-by-Value Interface¹
QoR Metric	Pointer	Avalon® MM Master	Avalon® MM Slave	Avalon® ST	Pass-by-Value
ALMs	15593.5	643	490.5	314.5	130
DSPs	0	0	0	0	0
RAMs	30	0	48	0	0
f_MAX (MHz)²	298.6	472.37	498.26	389.71	581.06
Latency (cycles)	24071	142	139	134	128
Initiation Interval (II) (cycles)	~508	1	1	1	1

¹The QoR metrics were calculated with a compilation flow that used Intel® Quartus® Prime Pro Edition Version 17.1.

²The f_MAX measurement was calculated from a single seed.

The QoR metrics for the vector_add component with a pass-by-value interface show fewer ALMs used, a high component f_MAX, and optimal values for latency and II. In this case, the II is the same as the component invocation interval, so a new invocation of the component can be launched every clock cycle. With an initiation interval of 1, 128 component calls are processed in 128 cycles, so the overall latency is 128 cycles.
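The cycle count above can be checked with a short calculation. The sketch below uses a simplified pipeline model that is an assumption for illustration, not a formula from the compiler: the first invocation completes after `latency` cycles, and each later invocation starts `ii` cycles after the previous one. The function names are hypothetical.

```cpp
#include <cassert>

// Number of component calls needed to cover `elements` items when each
// call handles `per_call` items (rounded up).
long invocations_needed(long elements, long per_call) {
  assert(elements > 0 && per_call > 0);
  return (elements + per_call - 1) / per_call;
}

// Simplified model (assumption): total cycles for back-to-back pipelined
// invocations is (invocations - 1) * ii + latency.
long total_cycles(long invocations, long ii, long latency) {
  assert(invocations > 0 && ii > 0 && latency > 0);
  return (invocations - 1) * ii + latency;
}
```

With 1024 elements at 8 per call, `invocations_needed` gives 128. With an II of 1 and a component latency of 1, `total_cycles(128, 1, 1)` gives 128, which matches the latency reported in the table.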