AN 1020: Using the FPGA AI Suite IP with High Bandwidth Memory on Stratix® 10 MX and Agilex™ 7 M-Series Devices
2.3. Conclusion
Ideally, one of the implementation suggestions described earlier, or a combination of them, yields an optimal trade-off between bandwidth, additional logic, and development effort.
The best performance, at the cost of the highest logic overhead, is achieved with an interleaved channel-stitching approach combined with additional width-adaptation logic. This logic spreads the two 256-bit memory accesses that result from each 512-bit transfer across the two HBM pseudo channels on one layer of the HBM stack.
The lowest logic overhead is achieved by dedicating one pseudo channel of one layer in an HBM stack to each FPGA AI Suite instance and configuring the FPGA AI Suite IP to use only a 256-bit wide data bus, which requires modifying the underlying .arch file.
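As a rough illustration of the width adaptation used in the highest-performance option, the following Python sketch models how one 512-bit transfer could be split into two 256-bit accesses, one per pseudo channel. It is a behavioral model only, not RTL and not part of the FPGA AI Suite. The function name split_transfer, the choice of sending the lower half to pseudo channel 0, and the address mapping (each pseudo channel stores half of every 512-bit word at the same local offset) are assumptions made for this example.

```python
BEAT_BYTES = 32  # 256-bit pseudo-channel data width
WORD_BYTES = 64  # 512-bit transfer issued by the IP-side interface


def split_transfer(byte_addr: int, data: bytes):
    """Split one 512-bit transfer into two 256-bit pseudo-channel accesses.

    Returns a list of (pseudo_channel, local_byte_addr, payload) tuples.
    The mapping is hypothetical: both halves of a word land at the same
    local address, one half in each pseudo channel.
    """
    if len(data) != WORD_BYTES:
        raise ValueError("expected exactly 64 bytes (512 bits) of data")
    if byte_addr % WORD_BYTES != 0:
        raise ValueError("address must be aligned to the 512-bit word size")

    # Each pseudo channel holds half of every word, so the local address
    # advances by 32 bytes for every 64 bytes of system address space.
    word_index = byte_addr // WORD_BYTES
    local_addr = word_index * BEAT_BYTES

    return [
        (0, local_addr, data[:BEAT_BYTES]),   # lower 256 bits -> pseudo channel 0
        (1, local_addr, data[BEAT_BYTES:]),   # upper 256 bits -> pseudo channel 1
    ]


if __name__ == "__main__":
    for pc, addr, payload in split_transfer(128, bytes(range(64))):
        print(f"PC{pc} addr=0x{addr:x} first_byte={payload[0]}")
```

Because the two halves go to different pseudo channels, both 256-bit accesses can be in flight at the same time, which is the property the interleaved approach relies on. In the lowest-overhead option no such split is needed, since the IP itself issues 256-bit transfers to a single pseudo channel.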