Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide
5.8.5. No Stalls, High Occupancy Percentage, and Low Bandwidth

In this example, the accelerator board can provide a bandwidth of 25600 megabytes per second (MB/s). However, the vector_add kernel requests only (2 reads + 1 write) x 4 bytes x 294 MHz = 12 bytes/cycle x 294 MHz = 3528 MB/s, which is about 14% of the available bandwidth. To increase the bandwidth the kernel uses, increase the number of tasks performed in each clock cycle.
Solutions for low bandwidth:
- Automatically or manually vectorize the kernel to make wider requests
- Unroll the innermost loop to make more requests per clock cycle
- Delegate some of the tasks to another kernel