Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

Pipeline Loops in Non-task Kernels (-Xsauto-pipeline)

To direct the Intel® oneAPI DPC++/C++ Compiler to pipeline loops in non-task (parallel_for) kernels when it compiles your design, include the -Xsauto-pipeline option in your icpx command. The host program invokes non-task kernels through one of the kernel execution functions parallel_for, parallel_for_work_item, or parallel_for_work_group.

Example

icpx -fsycl -fintelfpga -Xshardware -Xsauto-pipeline <source_file>.cpp

With the -Xsauto-pipeline option, the compiler attempts to pipeline the loops in your design, but pipelining is not guaranteed. If you omit the -Xsauto-pipeline option, the compiler does not pipeline the loops in parallel_for kernels, although it still executes different work items in parallel.
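
As a rough illustration of the kind of kernel this option targets, the following sketch shows a parallel_for kernel in which each work item runs an inner loop. The names row_sums, sum_row, and RowSumKernel, the row-major data layout, and the default-constructed queue are all hypothetical placeholders and are not taken from this guide; substitute the device selection your project uses. So that the sketch also builds with a plain C++ compiler, it falls back to a host loop when the SYCL header is not found.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define SKETCH_HAS_SYCL 1
#endif

// Hypothetical per-work-item body: sums one row of a row-major matrix.
// When the design is compiled with -Xsauto-pipeline, the compiler may
// attempt to pipeline this loop; pipelining is not guaranteed.
template <typename Input>
int sum_row(const Input& in, std::size_t row, std::size_t cols) {
  int acc = 0;
  for (std::size_t c = 0; c < cols; ++c) {
    acc += in[row * cols + c];
  }
  return acc;
}

#ifdef SKETCH_HAS_SYCL
class RowSumKernel;  // hypothetical kernel name
#endif

std::vector<int> row_sums(const std::vector<int>& data,
                          std::size_t rows, std::size_t cols) {
  assert(data.size() == rows * cols);
  std::vector<int> out(rows, 0);
  if (rows == 0 || cols == 0) return out;  // nothing to launch

#ifdef SKETCH_HAS_SYCL
  sycl::queue q;  // assumption: default device; pick an FPGA device in practice
  {
    sycl::buffer<int, 1> in_buf(data.data(), sycl::range<1>(data.size()));
    sycl::buffer<int, 1> out_buf(out.data(), sycl::range<1>(rows));
    q.submit([&](sycl::handler& h) {
      sycl::accessor in(in_buf, h, sycl::read_only);
      sycl::accessor result(out_buf, h, sycl::write_only, sycl::no_init);
      // Non-task kernel: one work item per row.
      h.parallel_for<RowSumKernel>(sycl::range<1>(rows), [=](sycl::id<1> row) {
        result[row] = sum_row(in, row[0], cols);
      });
    });
  }  // buffers go out of scope here, so results are copied back to out
#else
  // Host fallback (assumption): one loop iteration stands in for one work item.
  for (std::size_t row = 0; row < rows; ++row) {
    out[row] = sum_row(data, row, cols);
  }
#endif
  return out;
}
```

Whether the compiler actually auto-pipelines a kernel like this depends on the conditions described in this section, so check the Loop Analysis report rather than assuming the result.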

NOTE:

The -Xsauto-pipeline option might improve or degrade performance depending on the memory access pattern in your design.

  • If the auto-pipelining is successful, the Loop Analysis report displays the message "Auto-pipelined parallel_for", and the Details pane displays "parallel_for rewritten as a pipelined single_task". The compiler-generated loops are marked as "Compiler generated auto-pipeline loop" in the report.
  • If the compiler chooses not to auto-pipeline the loops, the Loop Analysis report displays a message for the kernel. The reasons for not auto-pipelining a loop can be one of the following:
    • A barrier in the function is not at the top-level function scope.
    • The kernel uses local or private memory.
    • The kernel uses volatile or atomic memory, or channels.

TIP:

If you want to prevent the compiler from pipelining some infrequently executed loops while still allowing it to auto-pipeline other loops, apply the [[intel::disable_loop_pipelining]] loop directive to those specific loops when you use the -Xsauto-pipeline option. The directive disables pipelining only for the loop it is applied to.
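
The placement of the loop directive can be sketched as follows. This is a minimal illustration under stated assumptions, not code from this guide: bounded_row_sum is a hypothetical per-work-item body intended to be called from a parallel_for kernel lambda, and bounded_row_sums is a hypothetical host-side driver whose loop stands in for the parallel_for range only so that the sketch runs without the toolkit. A host compiler that does not recognize the attribute ignores it, possibly with a warning.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-work-item body. In a real design, call it from inside a
// parallel_for kernel lambda compiled with -Xsauto-pipeline.
template <typename Input>
int bounded_row_sum(const Input& in, std::size_t row, std::size_t cols,
                    int limit) {
  int acc = 0;

  // Frequently executed loop: no directive, so it stays eligible for
  // auto-pipelining.
  for (std::size_t c = 0; c < cols; ++c) {
    acc += in[row * cols + c];
  }

  // Rarely needed correction loop (hypothetical): halve the sum until it
  // no longer exceeds limit. The directive asks the compiler not to
  // pipeline this loop.
  [[intel::disable_loop_pipelining]]
  for (int step = 0; step < 31 && acc > limit; ++step) {
    acc /= 2;
  }
  return acc;
}

// Host-side stand-in (assumption): one iteration per work item.
std::vector<int> bounded_row_sums(const std::vector<int>& data,
                                  std::size_t rows, std::size_t cols,
                                  int limit) {
  assert(data.size() == rows * cols);
  std::vector<int> out(rows, 0);
  for (std::size_t row = 0; row < rows; ++row) {
    out[row] = bounded_row_sum(data, row, cols, limit);
  }
  return out;
}
```

The directive goes immediately before the loop statement it controls, so the first loop is unaffected by it.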