Additional Recommendations
Optimizing memory accesses in your kernels can improve overall kernel performance. Consider the following techniques:
- Avoid designing systems where one kernel writes an intermediate result to global memory and another kernel reads this data back from global memory. Instead, implement a SYCL* pipe (described in Pipes) between the producer and consumer kernels for direct data transfer. Alternatively, you can merge both kernels into a single larger kernel and use helper functions to logically separate the two original kernels.
- The Intel® oneAPI DPC++/C++ Compiler implements local memory in FPGAs differently than in GPUs. If your kernel contains code to avoid GPU-specific local memory bank conflicts, remove that code because the compiler generates hardware that avoids local memory bank conflicts automatically whenever possible.
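
The kernel-merging alternative from the first recommendation can be illustrated with a minimal sketch. It is plain C++ that models only the kernel bodies, not a SYCL program: the names `ProducerStage`, `ConsumerStage`, `RunUnfused`, and `RunFused` are hypothetical, and the `intermediate` vector merely stands in for a buffer in global memory. In a real design, the body of `RunFused` would presumably sit inside a single kernel, with the two helper functions keeping the original stages logically separate.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stage 1 (the former producer kernel): scale one element.
inline int ProducerStage(int x) { return x * 2; }

// Hypothetical stage 2 (the former consumer kernel): offset one element.
inline int ConsumerStage(int x) { return x + 1; }

// Unfused baseline: two separate passes that communicate through an
// intermediate buffer. Here the buffer stands in for global memory, so
// every element is written once and read back once.
std::vector<int> RunUnfused(const std::vector<int>& in) {
  std::vector<int> intermediate(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    intermediate[i] = ProducerStage(in[i]);
  }
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out[i] = ConsumerStage(intermediate[i]);
  }
  return out;
}

// Fused version: one loop calls both helpers back to back. The
// intermediate value lives only in a local variable, so no intermediate
// buffer is written or read.
std::vector<int> RunFused(const std::vector<int>& in) {
  std::vector<int> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    const int tmp = ProducerStage(in[i]);  // result of the former producer
    out[i] = ConsumerStage(tmp);           // consumed immediately
  }
  return out;
}
```

Both functions compute the same result. The fused form differs only in that the intermediate value never leaves the loop body, which is the property that removes the global memory round trip.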