Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide
5.8.4. No Stalls, Low Occupancy Percentage, and Low Bandwidth

In this example, the store to dst[] executes once every 20 iterations of the FACTOR2 loop and once every four iterations of the FACTOR1 loop. Therefore, the FACTOR2 loop is the source of the bottleneck.
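The ratios above can be checked with a small host-side model. The following is a minimal sketch in plain C, not kernel code and not the guide's own example. It assumes a FACTOR1 loop of four iterations that encloses a FACTOR2 loop of five iterations, followed by a single store to dst[]. The trip counts, the placeholder arithmetic, and the names `loop_counts` and `run_model` are hypothetical and chosen only to reproduce the 20:1 and 4:1 ratios.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical trip counts chosen to match the ratios in the text. */
#define FACTOR1 4
#define FACTOR2 5

/* Iteration and store counters collected by the model. */
typedef struct {
    unsigned long factor1_iters; /* iterations of the FACTOR1 loop */
    unsigned long factor2_iters; /* iterations of the FACTOR2 loop */
    unsigned long dst_stores;    /* stores to dst[] */
} loop_counts;

/*
 * Plain C model of an assumed kernel body. The loop over gid stands in
 * for the work-items; each work-item runs the loop nest and then
 * performs exactly one store to dst[].
 */
loop_counts run_model(const int *src, int *dst, size_t n)
{
    loop_counts c = {0, 0, 0};

    assert(src != NULL && dst != NULL);

    for (size_t gid = 0; gid < n; gid++) {
        int sum = 0;
        for (int j = 0; j < FACTOR1; j++) {
            c.factor1_iters++;
            for (int k = 0; k < FACTOR2; k++) {
                c.factor2_iters++;
                sum += src[gid] * (j + k); /* placeholder computation */
            }
        }
        dst[gid] = sum; /* the only store: once per FACTOR1 * FACTOR2 inner iterations */
        c.dst_stores++;
    }
    return c;
}
```

In this model the store is reached only after the whole loop nest finishes, so the store unit sits idle for most loop iterations even though nothing stalls.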
Solutions for resolving loop bottlenecks:
- Unroll the FACTOR1 and FACTOR2 loops evenly. Unrolling only the FACTOR2 loop further does not resolve the bottleneck.
- Vectorize your kernel to allow multiple work-items to execute during each loop iteration.
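
The effect of unrolling one loop versus both can be illustrated with a rough estimate. The sketch below is a simplified model in plain C and rests on an assumption: that the number of loop iterations between consecutive stores is the product of the remaining trip counts after unrolling. It is not a statement of what the offline compiler actually generates. The function `iterations_per_store` and its parameters are hypothetical names. In kernel source, the unroll factors would typically be requested with `#pragma unroll <N>`, and vectorization with the `num_simd_work_items` kernel attribute.

```c
#include <assert.h>

/* Ceiling division for positive integers. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/*
 * Simplified model (an assumption, not a compiler guarantee):
 * unrolling a loop by `unroll` divides its remaining trip count by
 * that factor, and one store is issued after the remaining iterations
 * of both nested loops complete.
 */
int iterations_per_store(int factor1, int factor2, int unroll1, int unroll2)
{
    assert(factor1 > 0 && factor2 > 0);
    assert(unroll1 > 0 && unroll2 > 0);
    return ceil_div(factor1, unroll1) * ceil_div(factor2, unroll2);
}
```

Under this model, with trip counts of 4 and 5, `iterations_per_store(4, 5, 1, 1)` gives 20, fully unrolling only the inner loop with `iterations_per_store(4, 5, 1, 5)` still leaves 4 iterations per store, and unrolling both loops with `iterations_per_store(4, 5, 4, 5)` gives 1.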