6. Loop Best Practices

Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152

Date 1/23/2025

Public

The Intel® High Level Synthesis Compiler pipelines your loops to enhance throughput. Review these loop best practices to learn techniques to optimize your loops to boost the performance of your component.

The Intel® HLS Compiler Pro Edition lets you know if there are any dependencies that prevent it from optimizing your loops. Try to eliminate these dependencies in your code for optimal component performance. You can also provide additional guidance to the compiler by using the available loop pragmas.
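As a hedged illustration of guiding the compiler with a loop pragma, the following sketch applies `#pragma ivdep` to a loop whose memory accesses the compiler cannot prove to be independent. The function name `scatter_add` and its parameters are hypothetical, and the pragma is only safe under the stated assumption that `perm` contains no repeated index. A standard C++ compiler ignores the pragma, so the function also runs as ordinary software.

```cpp
#include <cstddef>

// Hypothetical example: adds in[i] to out[perm[i]] for each i.
// Assumption (the caller's responsibility): perm holds no repeated index,
// so no two iterations touch the same element of out.
void scatter_add(int* out, const int* in, const std::size_t* perm,
                 std::size_t n) {
  // The compiler cannot prove that out[perm[i]] differs between iterations,
  // so it would otherwise assume a loop-carried memory dependency.
  // ivdep asserts that no such dependency exists.
#pragma ivdep
  for (std::size_t i = 0; i < n; ++i) {
    out[perm[i]] = out[perm[i]] + in[i];
  }
}
```

If the assumption does not hold, for example when `perm` can contain duplicates, the pragma must be removed, because the generated hardware could then produce incorrect results.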

As a start, try the following techniques:

Manually fuse adjacent loop bodies when the instructions in those loop bodies can be performed in parallel. These fused loops can be pipelined instead of being executed sequentially. Pipelining reduces the latency of your component and can reduce the FPGA area your component uses.
Use the #pragma loop_coalesce directive to have the compiler attempt to collapse nested loops. Coalescing loops reduces the latency of your component and can reduce the FPGA area overhead needed for nested loops.
If you have two loops that can execute in parallel, consider using a system of tasks. For details, see System of Tasks Best Practices.
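The first two techniques above can be sketched as follows. This is a minimal illustration, not code from the Intel tutorials: the function names, the array sizes `N`, `ROWS`, and `COLS`, and the arithmetic are all hypothetical. In an actual HLS design these functions would be part of a component; here they are plain C++ functions so that the sketch also compiles with a standard compiler, which ignores the `loop_coalesce` pragma.

```cpp
#include <cstddef>

constexpr std::size_t N = 8;     // hypothetical array length
constexpr std::size_t ROWS = 4;  // hypothetical matrix dimensions
constexpr std::size_t COLS = 4;

// Before fusion: two adjacent loops with independent bodies.
// The second loop starts only after the first loop finishes.
void scale_and_offset_unfused(const int* a, int* scaled, int* shifted) {
  for (std::size_t i = 0; i < N; ++i) {
    scaled[i] = a[i] * 3;
  }
  for (std::size_t i = 0; i < N; ++i) {
    shifted[i] = a[i] + 7;
  }
}

// After manual fusion: both bodies share one loop, so a single pipelined
// loop can perform the two independent operations in the same iteration.
void scale_and_offset_fused(const int* a, int* scaled, int* shifted) {
  for (std::size_t i = 0; i < N; ++i) {
    scaled[i] = a[i] * 3;
    shifted[i] = a[i] + 7;
  }
}

// Nested loops marked for coalescing: the pragma asks the compiler to try
// to collapse the loop nest into a single loop.
int sum_matrix(const int (&m)[ROWS][COLS]) {
  int total = 0;
#pragma loop_coalesce
  for (std::size_t r = 0; r < ROWS; ++r) {
    for (std::size_t c = 0; c < COLS; ++c) {
      total += m[r][c];
    }
  }
  return total;
}
```

Fusion is valid here only because neither loop body reads a value that the other writes. If the second loop depended on results of the first, the loops could not be merged this simply.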

Tutorials Demonstrating Loop Best Practices

The Intel® HLS Compiler Pro Edition comes with a number of tutorials that illustrate important Intel® HLS Compiler concepts and demonstrate good coding practices.

Review the following tutorials to learn about loop best practices that might apply to your design:

You can find these tutorials in the following location on your Quartus® Prime system: `<quartus_installdir>/hls/examples/tutorials`

Tutorial	Description
`best_practices/divergent_loops`	Demonstrates a source-level optimization for designs with divergent loops.
`best_practices/loop_coalesce`	Demonstrates the performance and resource utilization improvements of using the `loop_coalesce` pragma on nested loops.
`best_practices/loop_fusion`	Demonstrates the latency and resource utilization improvements of loop fusion.
`best_practices/loop_memory_dependency`	Demonstrates breaking loop-carried dependencies using the `ivdep` pragma.
`loop_controls/max_interleaving`	Demonstrates a method to reduce the area utilization of a loop that meets all of the following conditions: the loop has an II > 1; the loop is contained in a pipelined loop; the loop execution is serialized across the invocations of the pipelined loop.
`best_practices/optimize_ii_using_hls_register`	Demonstrates how to use the `hls_register` attribute to reduce loop II and how to use `hls_max_concurrency` to improve component throughput.
`best_practices/parallelize_array_operation`	Demonstrates how to improve f_MAX by correcting a bottleneck that arises when performing operations on an array in a loop.
`best_practices/relax_reduction_dependency`	Demonstrates a method to reduce the II of a loop that includes a floating-point accumulator, or other reduction operation that cannot be computed at high speed in a single clock cycle.
`best_practices/remove_loop_carried_dependency`	Demonstrates how to improve loop performance by removing accesses to the same variable across nested loops.
`best_practices/resource_sharing_filter`	Demonstrates two versions of a 32-tap finite impulse response (FIR) filter design: an optimized-for-throughput variant and an optimized-for-area variant.
`best_practices/speculated_iterations`	Demonstrates how to use `#pragma speculated_iterations` to control when speculated iterations are used.
`best_practices/triangular_loop`	Demonstrates a method for describing triangular loop patterns with dependencies.
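To make the idea behind relaxing a reduction dependency more concrete, the following sketch shows one common way to do it: spreading a floating-point accumulation across several partial sums held in a shift register, so that each partial sum is updated only once every `kLanes` iterations. This is an assumption-laden illustration rather than the tutorial's own code. The function name `sum_relaxed` and the constant `kLanes` are hypothetical, and a suitable value for `kLanes` depends on the adder latency of the target device.

```cpp
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical number of partial sums

// Sums n floats using kLanes partial sums instead of one accumulator.
// Each partial sum is read and rewritten only every kLanes iterations,
// which lengthens the loop-carried dependency distance from 1 to kLanes.
float sum_relaxed(const float* data, std::size_t n) {
  // shift[0..kLanes-1] hold the partial sums; shift[kLanes] is scratch space.
  float shift[kLanes + 1] = {};

  for (std::size_t i = 0; i < n; ++i) {
    // Add the new value to the oldest partial sum and place it at the end.
    shift[kLanes] = shift[0] + data[i];
    // Shift every entry down by one position. In an HLS design this short
    // fixed-bound loop would typically be fully unrolled.
    for (std::size_t j = 0; j < kLanes; ++j) {
      shift[j] = shift[j + 1];
    }
  }

  // Combine the partial sums once, after the main loop.
  float total = 0.0f;
  for (std::size_t j = 0; j < kLanes; ++j) {
    total += shift[j];
  }
  return total;
}
```

Because floating-point addition is not associative, this version can round differently from a single running sum. Whether that difference is acceptable is a design decision.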
