Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide
ID: 683152
Date: 1/23/2025
Public
9.4. Balancing Capacity in a System of Tasks
If your component contains parallel task paths with different latencies, you might experience poor performance, and in some cases, deadlock.
Typically, these performance issues are caused by a lack of capacity in the datapath of the functions that call task functions with ihc::launch and ihc::collect. You can improve system throughput in these cases by adding a buffer to the explicit streams to account for the latency of the task functions.
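To illustrate why buffer capacity matters, the following sketch models a fork/join in plain C++. It is a simplified software model under stated assumptions, not HLS Compiler code and not the tutorials' method: the function name simulate_cycles and its parameters are hypothetical, the slow path is assumed to be a fixed-latency pipeline, and the fast path is assumed to be a FIFO with a fixed number of slots.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical cycle-level model (an assumption for illustration only).
// A producer forks each item to two parallel paths:
//   - a slow path that delivers each item `latency` cycles after it enters
//   - a fast path that is a FIFO with `capacity` slots
// A join consumes one item from each path per cycle when both are available.
// Returns the number of cycles needed to pass `items` values through.
long simulate_cycles(long items, long latency, std::size_t capacity) {
    std::deque<long> slow_path;   // cycle at which each in-flight item becomes ready
    std::size_t fast_fifo = 0;    // current occupancy of the fast-path FIFO
    long produced = 0;
    long consumed = 0;
    long cycle = 0;

    while (consumed < items) {
        // Join: needs data from both paths in the same cycle.
        if (fast_fifo > 0 && !slow_path.empty() && slow_path.front() <= cycle) {
            slow_path.pop_front();
            --fast_fifo;
            ++consumed;
        }
        // Fork: the producer stalls whenever the fast-path FIFO is full.
        if (produced < items && fast_fifo < capacity) {
            slow_path.push_back(cycle + latency);
            ++fast_fifo;
            ++produced;
        }
        ++cycle;
    }
    return cycle;
}
```

In this model, calling simulate_cycles(100, 10, 1) takes roughly ten times as many cycles as simulate_cycles(100, 10, 10), because a one-slot FIFO forces the producer to wait for the slow path after every item. Once the FIFO capacity matches the slow-path latency, the producer no longer stalls, and extra capacity beyond that point does not help further.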
Review the following tutorials to learn more about avoiding potential performance issues in a component that uses a system of tasks:
- <quartus_installdir>/hls/examples/tutorials/system_of_tasks/balancing_pipeline_latency
- <quartus_installdir>/hls/examples/tutorials/system_of_tasks/balancing_loop_delay
- <quartus_installdir>/hls/examples/tutorials/system_of_tasks/launch_and_collect_capacity
The Intel® HLS Compiler Pro Edition emulator models the size of the buffer attached to a stream. However, the emulator does not fully account for hardware latencies, so in these cases your component might behave differently in emulation than in simulation.
In addition to the techniques outlined in the tutorials, apply the following practices to maximize the data throughput of your design.