Intel® High Level Synthesis Compiler Pro Edition: Reference Manual

ID 683349
Date 3/28/2022
Public

A.2. Matrix Multiplication Library

The matrix multiplication source code library provided with the Intel® HLS Compiler Pro Edition is an FPGA-optimized, templated source code library that multiplies two matrices stored in 2-D arrays.

When you use the matrix multiplication library, you can control how many DSP blocks and RAM blocks it uses by setting the dot product vector size and the number of matrix elements read at one time. Increasing the dot product vector size can reduce latency, but it uses more DSP blocks and other FPGA resources.
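To make the latency and area trade-off concrete, the following sketch computes one dot product (one element of the result matrix is a dot product of length t_colsA) in chunks of DOT_VEC_SIZE elements. It is plain C++ (C++14 or later), not the library's implementation: chunked_dot and chunk_steps are hypothetical names, and the sketch assumes that the multiplies within a chunk map to DOT_VEC_SIZE DSP blocks working in parallel, so the number of sequential chunk steps, t_colsA / DOT_VEC_SIZE, serves only as a rough latency proxy.

```cpp
#include <cstddef>

// Hypothetical sketch (not the library's implementation): a length-N dot
// product split into N / DOT_VEC_SIZE sequential chunk steps. Within a chunk,
// the DOT_VEC_SIZE multiplies are independent, so hardware could evaluate them
// in parallel (assumed here: one DSP block per multiply).
template <typename T, std::size_t N, std::size_t DOT_VEC_SIZE>
constexpr T chunked_dot(const T (&a)[N], const T (&b)[N]) {
  static_assert(DOT_VEC_SIZE > 0 && N % DOT_VEC_SIZE == 0,
                "DOT_VEC_SIZE must be a factor of the vector length");
  T sum = T(0);
  for (std::size_t chunk = 0; chunk < N / DOT_VEC_SIZE; ++chunk) {
    // Independent multiplies within one chunk, reduced to a partial sum.
    T partial = T(0);
    for (std::size_t k = 0; k < DOT_VEC_SIZE; ++k) {
      const std::size_t idx = chunk * DOT_VEC_SIZE + k;
      partial += a[idx] * b[idx];
    }
    sum += partial;  // one accumulation per sequential chunk step
  }
  return sum;
}

// Number of sequential chunk steps: a rough latency proxy under the
// assumption above. Doubling DOT_VEC_SIZE halves the step count.
template <std::size_t N, std::size_t DOT_VEC_SIZE>
constexpr std::size_t chunk_steps() {
  static_assert(DOT_VEC_SIZE > 0 && N % DOT_VEC_SIZE == 0,
                "DOT_VEC_SIZE must be a factor of the vector length");
  return N / DOT_VEC_SIZE;
}
```

For example, with a vector length of 8, chunk_steps<8, 2>() is 4 and chunk_steps<8, 8>() is 1, while chunked_dot returns the same result for both settings when T is an integer type (floating-point results can differ slightly because the summation order changes).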

Header File

To use the matrix multiplication library in your component, add the following line to your component source code:
#include "HLS/matrix_mult.h"

The header file is self-documenting; review it to learn how to use the matrix multiplication library in your component.

Template Arguments

The matrix multiplication library multiplies two 2-D matrices, A and B. The resulting product is returned in a third matrix, C. The matrix multiplication library has the following template arguments:
T
The data type of the matrix elements (for example, int, float, long, or double).
t_rowsA
The number of rows in matrix A.
t_colsA
The number of columns in matrix A. This value is also the number of rows in matrix B.
t_colsB
The number of columns in matrix B.
DOT_VEC_SIZE
The number of DSP blocks to use in a single computation. This value must be a factor of t_colsA.

Increasing this value can improve component latency, but it uses more FPGA area. Keeping this value low reduces FPGA resource usage, but increases latency.

BLOCK_SIZE
The number of elements of matrix A to read at one time. The default value of BLOCK_SIZE is DOT_VEC_SIZE. You can reduce this value if matrix A needs less bandwidth than DOT_VEC_SIZE elements per read, but BLOCK_SIZE must remain a factor of DOT_VEC_SIZE.
RUNNING_SUM_MULT_L
You can adjust this parameter to try to improve the fMAX of a component that uses this library. For a detailed description of this argument and its effects, review the header file.
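The dimension relationships and factor rules described by these template arguments can be captured in plain C++ (C++14 or later) as a testbench reference model. This is a hypothetical sketch, not part of HLS/matrix_mult.h: Matrix, reference_matmul, and valid_matmul_params are illustrative names, and the code does not model how DOT_VEC_SIZE, BLOCK_SIZE, or RUNNING_SUM_MULT_L affect the hardware; it only computes the expected product and checks the documented factor constraints.

```cpp
#include <cstddef>

// Hypothetical 2-D matrix wrapper used only by this sketch.
template <typename T, std::size_t ROWS, std::size_t COLS>
struct Matrix {
  T data[ROWS][COLS];
};

// Hypothetical reference (golden) model: C = A * B, where A is
// t_rowsA x t_colsA, B is t_colsA x t_colsB, and C is t_rowsA x t_colsB.
template <typename T, std::size_t t_rowsA, std::size_t t_colsA,
          std::size_t t_colsB>
constexpr Matrix<T, t_rowsA, t_colsB> reference_matmul(
    const Matrix<T, t_rowsA, t_colsA>& A,
    const Matrix<T, t_colsA, t_colsB>& B) {
  Matrix<T, t_rowsA, t_colsB> C{};
  for (std::size_t i = 0; i < t_rowsA; ++i) {
    for (std::size_t j = 0; j < t_colsB; ++j) {
      T sum = T(0);
      for (std::size_t k = 0; k < t_colsA; ++k) {
        sum += A.data[i][k] * B.data[k][j];  // row i of A dot column j of B
      }
      C.data[i][j] = sum;
    }
  }
  return C;
}

// Hypothetical compile-time check of the documented factor rules:
// DOT_VEC_SIZE must divide t_colsA, and BLOCK_SIZE (default DOT_VEC_SIZE)
// must divide DOT_VEC_SIZE.
template <std::size_t t_colsA, std::size_t DOT_VEC_SIZE,
          std::size_t BLOCK_SIZE = DOT_VEC_SIZE>
constexpr bool valid_matmul_params() {
  return DOT_VEC_SIZE > 0 && BLOCK_SIZE > 0 &&
         t_colsA % DOT_VEC_SIZE == 0 && DOT_VEC_SIZE % BLOCK_SIZE == 0;
}
```

In a testbench, you could compare each element of the component's output matrix against reference_matmul(A, B), using a small tolerance when T is a floating-point type because the component may accumulate in a different order. A static_assert on valid_matmul_params, such as valid_matmul_params<8, 4, 2>(), rejects an invalid DOT_VEC_SIZE or BLOCK_SIZE at compile time.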