- 3.4. Listing the Intel® FPGA SDK for OpenCL™ Offline Compiler Command Options (no argument, -help, or -h)
- 7.5. Specifying the Name of an Intel® FPGA SDK for OpenCL™ Offline Compiler Output File (-o <filename>)
- 7.6. Compiling a Kernel for a Specific FPGA Board and Custom Platform (-board=<board_name>) and (-board-package=<board_package_path>)
- 7.13. Converting Warning Messages from the Intel® FPGA SDK for OpenCL™ Offline Compiler into Error Messages (-Werror)
- 7.17. Forcing a Single Store Ring to Reduce Area at the Expense of Write Throughput to Global Memory (-force-single-store-ring)
- 7.18. Forcing Fewer Read Data Reorder Units to Reduce Area at the Expense of Read Throughput to Global Memory (-num-reorder)
5.2.6. Specifying a Loop Initiation interval (II)
The ii pragma applies to single work-item kernels (that is, single-threaded kernels) in which loops are pipelined. Refer to the Single Work-Item Kernel versus NDRange Kernel section of the Intel® FPGA SDK for OpenCL™ Best Practices Guide for information on loop pipelining, and on kernel properties that drive the offline compiler's decision on whether to treat a kernel as single-threaded.
The higher the II value, the longer the wait before the subsequent loop iteration starts executing. Refer to the Reviewing Your Kernel's report.html File section of the Intel® FPGA SDK for OpenCL™ Best Practices Guide for information on II, and on the compiler reports that provide you with details on the performance implications of II on a specific loop.
For some loops in your kernel, specifying a higher II value with the ii pragma than the value the compiler chooses by default can increase the maximum operating frequency (fMAX) of your kernel without a decrease in throughput.
- The loop is pipelined because the kernel is single-threaded.
- The loop is not critical to the throughput of your kernel.
- The running time of the loop is small compared to other loops it might contain.
The <desired_initiation_interval> parameter is required and is an integer that specifies the number of clock cycles to wait between the beginning of execution of successive loop iterations.
#pragma ii <desired_initiation_interval>
Consider a case where your kernel has two distinct, pipelineable loops: a short-running initialization loop that has a loop-carried dependence and a long-running loop that does the bulk of your processing. In this case, the compiler does not know that the initialization loop has a much smaller impact on the overall throughput of your design. If possible, the compiler attempts to pipeline both loops with an II of 1.
Because the initialization loop has a loop-carried dependence, it has a feedback path in the generated hardware. To achieve an II with such a feedback path, some clock frequency might be sacrificed. Depending on the feedback path in the main loop, the rest of your design can run at a higher operating frequency.
If you specify #pragma ii 2 on the initialization loop, you tell the compiler that it can be less aggressive in optimizing II for this loop. Less aggressive optimization permits the compiler to pipeline the path limiting the fmax and allow your overall kernel design to achieve a higher fmax.
The initialization loop takes longer to run with its new II. However, the decrease in the running time of the long-running loop due to higher fmax compensates for the increased length in running time of the initialization loop.
Did you find the information on this page useful?