Memory access efficiency often dictates the overall performance of your SYCL* kernel. Refer to Memory Types for an introduction to memory accesses.
Because SYCL kernels on an FPGA execute as a pipeline-parallel datapath, the memory loads and stores in your SYCL code compete for access to memory resources (global, local, and private memories). If your SYCL kernel performs many memory accesses, the compiler must generate arbitration logic to share the available memory bandwidth among the memory access sites in your kernel's datapath. If the datapath demands more bandwidth than the memory and arbitration logic can supply, the datapath stalls. Stalls degrade the kernel's throughput because the compute pipeline must wait for a memory access to complete before it can resume.
When optimizing your design, it is important to understand whether your kernel's throughput is limited by memory accesses (a memory-bound kernel) or by the structure of the kernel datapath (a compute-bound kernel). These situations require different optimization techniques. The following sections discuss memory access optimization in detail.
Consider the following when developing your SYCL code:
- The maximum computation bandwidth of an FPGA is much larger than the available global memory bandwidth.
- The available global memory bandwidth is much smaller than the local and private memory bandwidth.
- Because of these bandwidth gaps, minimize the number of global memory accesses in your kernel.