Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/13/2021
Public



3.4.3.1. Reducing the Area Consumed by Nested Loops Using loop_coalesce

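When loops are nested to a depth greater than three, they consume more area. The loop_coalesce pragma directs the compiler to coalesce such nested loops into a single loop, which can reduce this area overhead.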

Consider the following example, in which the orig kernel (without loop coalescing) and the lc_test kernel (with loop coalescing) illustrate how to reduce area and latency in nested loops.

The orig kernel has loops nested to a depth of four. These nested loops create extra blocks (Blocks 2, 3, 4, 6, 7, and 8) that consume area because variables must be carried through them, as shown in the following reports:

Figure 65. Area Report and System Viewer (System View) Before and After Loop Coalescing

Because of loop coalescing, the lc_test kernel shows reduced latency. Block 5 of the orig kernel and Block 12 of the lc_test kernel are the innermost loops.

Figure 66. Area Report of lc_test and orig Kernels
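Conceptually, coalescing collapses the iteration space of the nest into one range. Writing N_0 to N_3 for the trip counts of a depth-four nest (outermost first) and i_0 to i_3 for its indices (notation introduced here for illustration, not taken from the reports), the single coalesced loop runs T iterations, and its counter t maps to and from the original indices as follows. This describes the iteration order only; it does not claim that the offline compiler computes divisions or remainders in hardware.

```latex
\begin{aligned}
T   &= N_0 N_1 N_2 N_3 \\
t   &= \bigl((i_0 N_1 + i_1) N_2 + i_2\bigr) N_3 + i_3, \qquad 0 \le t < T \\
i_3 &= t \bmod N_3 \\
i_2 &= \lfloor t / N_3 \rfloor \bmod N_2 \\
i_1 &= \lfloor t / (N_2 N_3) \rfloor \bmod N_1 \\
i_0 &= \lfloor t / (N_1 N_2 N_3) \rfloor
\end{aligned}
```

Because t advances by one per iteration, i_3 changes fastest and i_0 slowest, which matches the iteration order of the original nest.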