Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

Global Memory Bandwidth Use Calculation

Calculating the kernel bandwidth use helps you check whether your design fully utilizes the global memory bandwidth listed in the board specification file. The report.html file also displays the kernel bandwidth values in the global memory view of the System Viewer. The following formulas show how to calculate this value on a per-LSU basis:

Formulas for Calculating Kernel Bandwidth Use

The LSU bandwidth is the minimum of three bottleneck values, BW1, BW2, and BW3, and the remaining equations define those three bottlenecks. The result is the theoretical maximum bandwidth that a single LSU can consume, ignoring all other LSUs. The actual bandwidth depends on the LSU's access pattern and on how the interconnect arbitrates between all LSUs. To estimate the overall bandwidth, use the sum of the LSU bandwidths, which is shown on the controller in the global memory view of the System Viewer.
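The two relationships stated in the paragraph above can be written compactly as follows. The index i over LSUs and the symbols BW_LSU and BW_overall are notation introduced here for illustration. The definitions of BW1, BW2, and BW3 themselves are not reproduced in this block.

```latex
\begin{align}
  BW_{\mathrm{LSU}} &= \min\left(BW_1,\ BW_2,\ BW_3\right) \\
  BW_{\mathrm{overall}} &\approx \sum_{i \in \mathrm{LSUs}} BW_{\mathrm{LSU},\,i}
\end{align}
```

The second line is only an estimate, because each per-LSU value is a theoretical maximum that ignores arbitration with the other LSUs.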

The following table describes the variables used in the above equations:

Variables Used in Calculating Kernel Bandwidth
| Variable | Description |
| --- | --- |
| KWIDTH | Width, in bytes, of the LSU on the kernel side. In the report.html file, it is referred to as WIDTH. |
| MWIDTH | Width, in bytes, of the LSU on the side facing the external memory. In the report.html file, it is referred to as <Memory Name>_Width. |
| FMAX | Clock speed of the kernel in MHz. In the report.html file, you can identify this as the design’s clock speed. |
| MaxBandwidth | Maximum bandwidth (in MB/s) that the global memory can achieve. You can find this value in the board_spec.xml file for the specific global memory. |
| NUM_CHANNELS | Number of interfaces that an external memory has. You can find this value by counting the interfaces listed under that memory in the board_spec.xml file. |
| NUM_INTERLEAVING_CHANNELS | When interleaving is enabled, this equals the number of channels. Otherwise, this value is 1. |
| BW1 | Bottleneck at the kernel boundary. BW1 uses only kernel values, that is, values you can change by optimizing the design. If BW1 limits the overall bandwidth use, changing your design can relieve the bottleneck at the kernel boundary. |
| BW2 | Bottleneck at the memory interface to the kernel. BW2 uses the size of the memory interface and FMAX, so either improving the FMAX of your design or switching to a board with a wider memory interface can improve the bandwidth use. |
| BW3 | Bottleneck in the external memory. BW3 uses external memory properties exclusively. If BW3 limits your design, you have utilized the board bandwidth completely. |
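The following Python sketch shows one way to apply the variables above. Only the final step (taking the minimum of BW1, BW2, and BW3) comes directly from this section. The three bottleneck expressions in `assumed_bottlenecks` are assumptions inferred from the variable descriptions, not formulas taken from this text, so check them against the equations in the guide before relying on the results. The function names and all numeric values are hypothetical.

```python
def assumed_bottlenecks(kwidth, mwidth, fmax_mhz, max_bandwidth,
                        num_channels, interleaving_enabled):
    """Return BW1, BW2, BW3 in MB/s under ASSUMED formulas (see lead-in)."""
    # Per the table: equals the channel count when interleaving is on, else 1.
    num_interleaving_channels = num_channels if interleaving_enabled else 1

    # ASSUMPTION: kernel-side bytes per cycle times clock (bytes * MHz = MB/s).
    bw1 = kwidth * fmax_mhz
    # ASSUMPTION: memory-side bytes per cycle times clock, scaled by the
    # number of interleaved channels the LSU can reach.
    bw2 = mwidth * fmax_mhz * num_interleaving_channels
    # ASSUMPTION: share of the memory's peak bandwidth that the LSU can reach.
    bw3 = max_bandwidth * num_interleaving_channels / num_channels

    return {"BW1": bw1, "BW2": bw2, "BW3": bw3}


def lsu_bandwidth(bottlenecks):
    """Return (name, value) of the smallest bottleneck, which is the LSU bandwidth."""
    name = min(bottlenecks, key=bottlenecks.get)
    return name, bottlenecks[name]


# Hypothetical board: 4 channels, 76800 MB/s total, 64-byte LSU on both sides.
interleaved = lsu_bandwidth(assumed_bottlenecks(
    kwidth=64, mwidth=64, fmax_mhz=300, max_bandwidth=76800,
    num_channels=4, interleaving_enabled=True))

# Same hypothetical board at 400 MHz with interleaving disabled.
non_interleaved = lsu_bandwidth(assumed_bottlenecks(
    kwidth=64, mwidth=64, fmax_mhz=400, max_bandwidth=76800,
    num_channels=4, interleaving_enabled=False))

print("interleaved:", interleaved)
print("non-interleaved:", non_interleaved)
```

Under these assumed formulas, the first case is limited by BW1, which suggests a design change (for example, a wider LSU) could help, while the second is limited by BW3, meaning the reachable external memory bandwidth is already saturated.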