- Pursue general optimizations that apply across accelerators.
- Optimize aggressively for the prioritized accelerators.
- Optimize the host code in conjunction with the first two steps.
- High-level optimizations
- Loop-related optimizations
- Memory-related optimizations
- SYCL-specific optimizations
High-level Optimization Tips
- Increase the amount of parallel work. Provide more work than there are processing elements so that the processing elements stay more fully utilized.
- Minimize the code size of kernels. This helps keep the kernels in the instruction cache of the accelerator, if the accelerator contains one.
- Load balance kernels. Avoid significantly different execution times between kernels as the long-running kernels may become bottlenecks and affect the throughput of the other kernels.
- Avoid expensive functions. Avoid calling functions that have high execution times as they may become bottlenecks.
- When possible, specify a work-group size. The attribute [[cl::reqd_work_group_size(X, Y, Z)]], where X, Y, and Z are the integer dimensions of the work-group in the ND-range, sets the work-group size. The compiler can take advantage of this information to optimize more aggressively.
- Consider use of the -Xsfp-relaxed option when possible. This option relaxes the order of arithmetic floating-point operations.
- Consider use of the -Xsfpc option when possible. This option removes intermediary floating-point rounding operations and conversions whenever possible and carries additional bits to maintain precision.
- Consider use of the -Xsno-accessor-aliasing option. This option ignores dependencies between accessor arguments in a SYCL* kernel.
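
Two of the tips above can be illustrated in plain host-side C++, without any SYCL headers. The sketch below is not taken from the guide, and every function name in it is hypothetical. The first pair of helpers assumes that, once a work-group size is fixed, the global range of the ND-range has to be an exact multiple of it, so the problem size is padded up to the next multiple. The second pair shows why an option that relaxes the order of floating-point operations can change numeric results: floating-point addition is not associative. It does not model what the compiler actually does with -Xsfp-relaxed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a SYCL API): pads a problem size up to the next
// multiple of the chosen work-group size, so the global range divides evenly.
inline std::size_t round_up_global_size(std::size_t problem_size,
                                        std::size_t work_group_size) {
    assert(work_group_size > 0 && "work-group size must be positive");
    return (problem_size + work_group_size - 1) / work_group_size * work_group_size;
}

// Hypothetical helper: number of work-groups for a padded global size.
// Comparing this count with the number of processing elements gives a rough
// idea of whether there is enough parallel work to keep them busy.
inline std::size_t work_group_count(std::size_t global_size,
                                    std::size_t work_group_size) {
    assert(work_group_size > 0 && "work-group size must be positive");
    assert(global_size % work_group_size == 0 && "global size must be padded first");
    return global_size / work_group_size;
}

// Evaluates (a + b) + c. The volatile temporary keeps the intermediate sum
// rounded to double and stops the compiler from folding the expression.
inline double sum_left_to_right(double a, double b, double c) {
    volatile double partial = a + b;
    return partial + c;
}

// Evaluates a + (b + c), i.e. the same terms in a different association.
inline double sum_right_to_left(double a, double b, double c) {
    volatile double partial = b + c;
    return a + partial;
}
```

For example, round_up_global_size(1000, 64) yields 1024, which corresponds to 16 work-groups of 64 work-items; the kernel then needs a bounds check to skip the 24 padding work-items. For the inputs 1e20, -1e20, and 1.0, the left-to-right sum is 1.0 while the right-to-left sum is 0.0, because 1.0 is lost when it is added to -1e20 first. Code that depends on a specific evaluation order should therefore be validated before enabling a relaxed floating-point option.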