7.6. Minimizing the Memory Dependencies for Loop Pipelining
Loop dependencies can introduce bottlenecks in single work-item kernels because of the latency of the memory accesses involved. The offline compiler defers a memory operation until the memory operation it depends on completes, which can increase the loop initiation interval (II). The offline compiler reports the memory dependencies it finds in the optimization report.
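As an illustration of a genuine loop-carried memory dependency, consider the minimal sketch below. The kernel name running_sum and its arguments are hypothetical and do not come from this guide. Each iteration loads the value that the previous iteration stored, so the load cannot be issued until that store completes. The II that results depends on the compiler version and the target device; the optimization report is the authoritative source.

```c
/* Fallback so this sketch also compiles as plain C for host-side checking.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips these lines. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical single work-item kernel: in-place running sum.
   Iteration i reads data[i - 1], which iteration i - 1 has just written,
   so this is a true loop-carried memory dependency. */
__kernel void running_sum(__global int *data, int n)
{
    for (int i = 1; i < n; i++) {
        data[i] = data[i - 1] + data[i];
    }
}
```

Because the dependency here is real, it must be preserved. Removing it would require restructuring the computation rather than overriding the analysis.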
- Ensure that the offline compiler does not assume false dependencies.
When the static memory dependence analysis fails to prove that a dependency does not exist, the offline compiler assumes that one exists and modifies the kernel execution to enforce it. The impact of dependency enforcement is lower if the memory system is stall-free.
- Write-after-read operations with a data dependency on a load-store unit can take as few as two clock cycles (II=2). Other stall-free scenarios can take up to seven clock cycles.
- Read-after-write (control dependency) operations can be fully resolved by the offline compiler.
- Override the static memory dependence analysis by adding the line #pragma ivdep before the loop in your kernel code, but only if you are sure that the loop carries no dependencies.
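The following minimal sketch shows where #pragma ivdep goes when a dependency is false. The kernel name scale_into_upper, its arguments, and the precondition offset >= n are assumptions made for illustration. Under that precondition the loop reads only elements [0, n) and writes only elements [offset, offset + n), so no iteration reads a value that another iteration writes. Static analysis cannot see the precondition, because offset and n are run-time arguments, and would otherwise have to assume a dependency.

```c
/* Fallback so this sketch also compiles as plain C for host-side checking.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips these lines. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical single work-item kernel.
   ASSUMPTION: the host guarantees offset >= n, so the region read
   [0, n) and the region written [offset, offset + n) never overlap. */
__kernel void scale_into_upper(__global int *data, int n, int offset)
{
    /* Asserts to the offline compiler that the loop below carries
       no memory dependencies. */
    #pragma ivdep
    for (int i = 0; i < n; i++) {
        data[offset + i] = data[i] * 2;
    }
}
```

The pragma is an assertion, not a check. If the host passed offset < n, the dependency would be real and the kernel could produce incorrect results.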