Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 12/19/2022

8.4. Balancing Capacity in a System of Tasks

If your component contains parallel task paths with different latencies, you might experience poor performance, and in some cases, deadlock.

Typically, these performance issues are caused by a lack of capacity in the datapath of the functions that call task functions using ihc::launch and ihc::collect. You can improve system throughput in these cases by adding a buffer to the explicit streams to account for the latency of the task functions.
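To illustrate why missing capacity throttles throughput, the following is a minimal sketch in plain C++, not Intel HLS code. It assumes a simplified cycle-based model in which a fork feeds a slow path and a fast path that meet again at a join. The names Path and cycles_to_finish, and the assumption that each path can hold as many items as it has cycles of latency, are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical model (not an Intel HLS API): one path between a fork and a join.
struct Path {
    int latency;              // cycles before a written item becomes readable
    std::size_t capacity;     // max items in flight: pipeline slots plus stream buffer
    std::deque<int> ready_at; // cycle at which each in-flight item becomes readable

    bool can_accept() const { return ready_at.size() < capacity; }
    bool has_output(int now) const {
        return !ready_at.empty() && ready_at.front() <= now;
    }
    void push(int now) { ready_at.push_back(now + latency); }
    void pop() { ready_at.pop_front(); }
};

// Returns the number of cycles needed to move `items` values through the
// fork, both paths, and the join. `extra_buffer` is the additional capacity
// placed on the fast path (the role a stream buffer plays in this model).
int cycles_to_finish(int items, int slow_latency, int fast_latency,
                     int extra_buffer) {
    // Assumption: a path holds as many items as it has cycles of latency.
    Path slow{slow_latency, static_cast<std::size_t>(slow_latency), {}};
    Path fast{fast_latency,
              static_cast<std::size_t>(fast_latency + extra_buffer), {}};

    int sent = 0, received = 0, now = 0;
    while (received < items) {
        // Join: consume one item from each path only when both are ready.
        if (slow.has_output(now) && fast.has_output(now)) {
            slow.pop();
            fast.pop();
            ++received;
        }
        // Fork: issue a new item only when both paths have room.
        if (sent < items && slow.can_accept() && fast.can_accept()) {
            slow.push(now);
            fast.push(now);
            ++sent;
        }
        ++now;
    }
    return now;
}
```

In this model, calling cycles_to_finish(100, 10, 1, 0) takes 1001 cycles because the fast path fills after one item and stalls the fork until the slow path delivers. Calling cycles_to_finish(100, 10, 1, 9) takes 110 cycles because the extra nine slots let the fast path absorb the latency difference. Real hardware behavior depends on the generated datapath, so treat these numbers as a property of the sketch rather than of the compiler.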

Review the following tutorials to learn more about avoiding potential performance issues in a component that uses a system of tasks:
  • <quartus_installdir>/hls/examples/tutorials/system_of_tasks/balancing_pipeline_latency
  • <quartus_installdir>/hls/examples/tutorials/system_of_tasks/balancing_loop_delay
  • <quartus_installdir>/hls/examples/tutorials/system_of_tasks/launch_and_collect_capacity

The Intel® HLS Compiler Pro Edition emulator models the size of the buffer attached to a stream. However, the emulator does not fully account for hardware latencies, so in these cases your design might behave differently in emulation than in simulation.

In addition to the techniques outlined in the tutorials, use the following practices to help maximize the data throughput of your design.