2.3.3. Reducing the Area Consumed by Nested Loops Using loop_coalesce
Consider the following example where orig and lc_test kernels are used to illustrate how to reduce latency in nested loops.
The orig kernel has nested loops to a depth of four. The nested loops created extra blocks (Block 2, 3, 4, 6, 7 and 8) that consume area due to the variables being carried, as shown in the following reports:
Due to loop coalescing, you can see the reduced latency in the lc_test. The Block 5 of orig kernel and Block 12 of lc_test kernel are the inner most loops.
Did you find the information on this page useful?