Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

ID 683176
Date 9/24/2018

2.3.3. Reducing the Area Consumed by Nested Loops Using loop_coalesce

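Loops nested more than three levels deep consume additional area, because each loop level adds blocks that carry loop variables. Coalescing the nest with the loop_coalesce pragma reduces this overhead.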

Consider two kernels, orig and lc_test, that illustrate how loop coalescing reduces latency in nested loops.
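As a rough illustration of what coalescing a loop nest means, the following plain C sketch compares a four-deep nest with a hand-flattened single loop that visits the same iterations in the same order. It is not the source of the orig or lc_test kernels, and it is not necessarily how the offline compiler implements coalescing. The names nested_fill, coalesced_fill, N, and TOTAL are hypothetical. In an OpenCL kernel, you would typically leave the nest as written and place #pragma loop_coalesce directly above the outermost loop, optionally followed by the number of nesting levels to coalesce.

```c
#include <stddef.h>

#define N 3                    /* hypothetical trip count of every loop level */
#define TOTAL (N * N * N * N)  /* total iterations of the four-deep nest */

/* Four-deep nest: each level keeps its own counter and exit test.
 * In an OpenCL kernel, "#pragma loop_coalesce" would go directly above
 * the outermost for statement. */
void nested_fill(int *out)
{
    for (unsigned i = 0; i < N; i++)
        for (unsigned j = 0; j < N; j++)
            for (unsigned k = 0; k < N; k++)
                for (unsigned l = 0; l < N; l++)
                    out[((i * N + j) * N + k) * N + l] = (int)(i + j + k + l);
}

/* Hand-coalesced equivalent: one loop over all TOTAL iterations.
 * The original indices are advanced like an odometer, so the body
 * sees the same (i, j, k, l) sequence as the nest above. */
void coalesced_fill(int *out)
{
    unsigned i = 0, j = 0, k = 0, l = 0;

    for (unsigned n = 0; n < TOTAL; n++) {
        out[n] = (int)(i + j + k + l);

        /* Roll over the innermost index first, then carry outward. */
        if (++l == N) {
            l = 0;
            if (++k == N) {
                k = 0;
                if (++j == N) {
                    j = 0;
                    ++i;
                }
            }
        }
    }
}
```

Both functions write identical contents to a buffer of TOTAL elements. The flattened form has a single loop structure instead of four, which is the property that lets a coalesced nest avoid the extra per-level blocks.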

The orig kernel has loops nested to a depth of four. The nested loops create extra blocks (Blocks 2, 3, 4, 6, 7, and 8) that consume area because of the variables they carry, as shown in the following reports:

Figure 21. Area Report and System Viewer Before and After Loop Coalescing

Because of loop coalescing, the lc_test kernel shows reduced latency. Block 5 of the orig kernel and Block 12 of the lc_test kernel are the innermost loops.

Figure 22. Area Report of lc_test and orig Kernels