Improve Loop Performance by Caching On-Chip Memory
When is the On-chip Memory Cache Technique Applicable?
- Failure to achieve II=1 because of a loop-carried memory dependency in on-chip memory
  The on-chip memory cache technique is applicable if the compiler could not pipeline a loop with II=1 because of an on-chip memory dependency. If the compiler could not achieve II=1 because of a global memory dependency, this technique does not apply, as global memory access latencies are too great.
  To check this for a given design, view the Loop Analysis report in the design's optimization report. The Loop Analysis report lists the II of all loops and explains why a lower II is not achievable. Check whether the reason given resembles "the compiler failed to schedule this loop with smaller II due to memory dependency". The report also describes the most critical loop feedback path during scheduling. Check whether this path includes on-chip memory load/store operations.
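To make the dependency concrete, here is a minimal sketch in plain C of the kind of loop the report flags. A histogram-style read-modify-write is a classic case: each iteration loads a location, updates it, and stores it back, so the next iteration may depend on a store that is still in flight. The function name `run_histogram` and the sizes are illustrative, not part of any report or compiler API.

```c
#include <assert.h>

#define BUCKETS 8

/* Illustrative histogram loop. Each iteration loads hist[b], increments
 * it, and stores it back. In an FPGA kernel, if hist is implemented in
 * on-chip memory with a multi-cycle load latency, iteration i+1 may need
 * a value that iteration i has not finished writing yet. That loop-carried
 * memory dependency is what forces the compiler to either raise the II or
 * cut the load latency to 1 (which costs fMAX). */
void run_histogram(const unsigned *data, int n, unsigned *hist) {
    for (int i = 0; i < n; i++) {
        unsigned b = data[i] % BUCKETS;
        hist[b] = hist[b] + 1;  /* read-modify-write through memory */
    }
}
```

In software this loop is harmless; the point is only the access pattern, which on an FPGA creates the feedback path the Loop Analysis report describes.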
- An II=1 loop with a load operation of latency 1
  The compiler is capable of reducing the latency of on-chip memory accesses to achieve II=1. In doing so, it trades fMAX for a better II. In a design whose critical loops have II=1 but whose fMAX is lower than desired, the on-chip memory cache technique might still be applicable: it can help recover fMAX by enabling the compiler to achieve II=1 with a higher-latency memory access. To check whether this is the case for a given design, view the Kernel Memory Viewer report in the design's optimization report. Select the desired on-chip memory from the Kernel Memory List and mouse over the load operation (LD) to check its latency. If the latency of the load operation is 1, this is a clear sign that the compiler has sacrificed fMAX to improve the loop II.
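The caching idea itself can be sketched as follows: keep the last few stored values in a small shift register of registers, and on each iteration prefer a matching cache entry over the (possibly stale, higher-latency) memory load. This is a functional C model only, assuming the histogram-style loop above; `run_histogram_cached`, `CACHE_DEPTH`, and `BUCKETS` are illustrative names, and the real transformation is expressed in the kernel source for the compiler to pipeline.

```c
#include <assert.h>

#define BUCKETS 8
#define CACHE_DEPTH 4  /* how many in-flight stores the cache covers */

/* Histogram with a register cache of the last CACHE_DEPTH stores. Because
 * any value the memory load might return "late" is also available in the
 * cache, the load no longer has to complete in a single cycle, letting an
 * FPGA compiler keep II=1 with a higher-latency (higher-fMAX) access. */
void run_histogram_cached(const unsigned *data, int n, unsigned *hist) {
    unsigned cache_val[CACHE_DEPTH] = {0};
    int cache_tag[CACHE_DEPTH];
    for (int j = 0; j < CACHE_DEPTH; j++) cache_tag[j] = -1;

    for (int i = 0; i < n; i++) {
        unsigned b = data[i] % BUCKETS;

        /* Load from memory; on an FPGA this load may take several cycles. */
        unsigned v = hist[b];

        /* If a recent iteration wrote this bucket, take the newer value
         * from the register cache. Index 0 is the newest entry, so scan
         * oldest-to-newest and let the newest match win. */
        for (int j = CACHE_DEPTH - 1; j >= 0; j--)
            if (cache_tag[j] == (int)b) v = cache_val[j];

        v += 1;

        /* Shift the cache and insert the new value at slot 0. */
        for (int j = CACHE_DEPTH - 1; j > 0; j--) {
            cache_val[j] = cache_val[j - 1];
            cache_tag[j] = cache_tag[j - 1];
        }
        cache_val[0] = v;
        cache_tag[0] = (int)b;

        hist[b] = v;
    }
}
```

In software the cached version produces the same histogram as the plain loop; the benefit appears only in hardware, where the load's latency budget grows by CACHE_DEPTH cycles.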