8.2. Optimize Global Memory Accesses
In most circumstances, the default burst-interleaved configuration leads to the best load balancing between the memory banks. However, in some cases, you might want to partition the banks manually as two non-interleaved (and contiguous) memory regions to achieve better load balancing.
The figure below illustrates the differences in memory mapping patterns between burst-interleaved and non-interleaved memory partitions.
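The two mapping patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two banks, a hypothetical burst size of 1024 bytes, and a hypothetical bank size of 4096 bytes. The function names are invented for this sketch and are not part of any SDK; the real burst and bank sizes depend on your board.

```python
# Illustrative only: burst size, bank size, and bank count are assumed values.

def bank_burst_interleaved(addr, num_banks=2, burst_bytes=1024):
    """Consecutive bursts rotate across the banks."""
    return (addr // burst_bytes) % num_banks


def bank_non_interleaved(addr, bank_bytes, num_banks=2):
    """Each bank owns one contiguous region of the address space."""
    return min(addr // bank_bytes, num_banks - 1)


if __name__ == "__main__":
    for addr in (0x000, 0x400, 0x800, 0xC00, 0x1000):
        print(hex(addr),
              "interleaved -> bank", bank_burst_interleaved(addr),
              "| non-interleaved -> bank", bank_non_interleaved(addr, bank_bytes=4096))
```

In the interleaved case, a linear walk through memory alternates between the banks every burst. In the non-interleaved case, the same walk stays in one bank until it crosses the region boundary, which is why manual partitioning only helps when you place buffers deliberately.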
Global Memory Bandwidth Use
To check whether your design fully uses the global memory bandwidth listed in the board specification file, calculate the kernel bandwidth use. The report.html file also displays the kernel bandwidth values in the global memory view of the System Viewer. The following formulas explain how you can calculate this value on a per-LSU basis:
The LSU bandwidth equation takes the minimum of three bottlenecks, and the remaining equations define those bottlenecks, each of which can limit the LSU bandwidth. These formulas represent the theoretical maximum bandwidth an LSU may consume, ignoring all other LSUs. The actual bandwidth depends on the LSU's access pattern and on how the interconnect arbitrates between all LSUs. For an estimate of the overall bandwidth, a sum of the LSU bandwidths is available in the controller of the global memory view of the System Viewer.
The following table describes the variables used in the above equations:
|Variable||Description|
|KWIDTH||Byte-width of the LSU on the kernel. In the report.html file, it is referred to as WIDTH.|
|MWIDTH||Byte-width of the LSU facing the external memory. In the report.html file, it is referred to as the <Memory Name>_Width.|
|FMAX||Clock speed of the kernel in MHz. In the report.html file, you can identify this as the design’s clock speed.|
|MaxBandwidth||Maximum bandwidth (measured in MB/s) the global memory can achieve. You can find this in the board_spec.xml file for the specific global memory.|
|NUM_CHANNELS||Number of interfaces an external memory has. You can find this by counting the number of interfaces listed in the board_spec.xml file under that memory.|
|NUM_INTERLEAVING_CHANNELS||When interleaving is enabled, this is the number of channels. Otherwise, this value is 1.|
|BW1||Bottleneck at the kernel boundary. BW1 uses only kernel values, that is, values you can change by optimizing the design. If BW1 limits the overall bandwidth use, changing your design can relieve the bottleneck at the kernel boundary.|
|BW2||Bottleneck at the memory interface to the kernel. Therefore, BW2 uses the size of the memory interface and the FMAX, which means either improving FMAX of your design or switching to a board with a wider memory interface can improve the bandwidth use.|
|BW3||Bottleneck in the external memory. Therefore, BW3 uses external memory properties exclusively, and if this is limiting your design, you have utilized the board bandwidth completely.|
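As a rough illustration of how the variables in the table combine, the following Python sketch computes a per-LSU estimate as the minimum of three bottlenecks. The exact form of each term is an assumption inferred from the variable descriptions (BW1 from KWIDTH and FMAX, BW2 from MWIDTH and FMAX, BW3 from the external memory properties), not a copy of the guide's equations. The function name `lsu_bandwidth` and the sample numbers are hypothetical.

```python
# Sketch under assumptions: the three terms below are inferred from the
# variable descriptions, not taken from the guide's equations.

def lsu_bandwidth(kwidth, mwidth, fmax_mhz, max_bandwidth,
                  num_channels, num_interleaving_channels=1):
    """Return (estimated bandwidth in MB/s, name of the limiting bottleneck)."""
    limits = {
        # Kernel boundary: bytes per cycle times MHz gives MB/s.
        "BW1": kwidth * fmax_mhz,
        # Memory interface as seen from the kernel clock domain.
        "BW2": mwidth * fmax_mhz,
        # External memory: share of the board bandwidth this LSU can reach.
        "BW3": max_bandwidth * num_interleaving_channels / num_channels,
    }
    limiting = min(limits, key=limits.get)
    return limits[limiting], limiting


if __name__ == "__main__":
    # Hypothetical board: 19200 MB/s across 2 channels, 64-byte interface.
    print(lsu_bandwidth(16, 64, 250, 19200, 2, 2))  # narrow LSU, interleaved
    print(lsu_bandwidth(64, 64, 250, 19200, 2, 1))  # wide LSU, non-interleaved
```

With these sample values, the narrow LSU is limited at the kernel boundary (BW1), while the wide LSU on a non-interleaved memory is limited by the single channel it can reach (BW3). Compare any such estimate against the values reported in report.html rather than treating it as authoritative.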