Developer Guide

Contents

Loop Fuse Functions and
nofusion
Attribute

This topic describes the loop fuse functions and
nofusion
attribute that the
Intel® oneAPI
DPC++/C++
Compiler
supports.

Loop Fuse Functions

The loop fuse functions are declared in the
sycl/ext/intel/fpga_loop_fuse.hpp
header file, which is invoked by the
sycl/ext/intel/fpga_extensions.hpp
header file. Apply these loop functions to a block of code to indicate to the compiler that it must fuse adjacent loops in the code block overriding the compiler profitability or safety analysis of the fusion. Fusing adjacent loops reduces the amount of loop control overhead in your kernel that reduces the FPGA area used and increases the performance by executing both loops as one (fused) loop. For additional information, see Fuse Loops to Reduce Overhead and Improve Performance
The compiler supports the following loop fuse functions:
  • sycl::ext::intel::fpga_loop_fuse<v>(f)
    : Directs the compiler to fuse loops within the function
    f
    and up to a depth of
    v >= 1
    without affecting the functionality of either loop, overriding the compiler profitability analysis of fusing the loops. By default,
    v = 1
    , which is equivalent to indicating that the compiler should consider only the adjacent top-level loops for fusing. For example:
    [=]() { //Kernel sycl::ext::intel::fpga_loop_fuse<1>([&] { L1: for(...) {} L2: for(...) { L3: for(...) {} L4: for(...) { L5: for(...) {} L6: for(...) {} } } }); }
    By default (
    v = 1
    ), only loops
    L1
    and
    L2
    are initially considered for fusing. At a depth of
    v = 2
    , the compiler considers
    L1
    -
    L2
    and
    L3
    -
    L4
    loop pairs for fusing.
    The compiler automatically considers fusing adjacent loops with equal trip counts when the loops meet the Automatic Loop Fusion criteria or when fusion is deemed safe and profitable. You can use the
    sycl::ext::intel::fpga_loop_fuse<v>(f)
    function to inform the compiler to consider fusing adjacent loops with different trip counts, which is considered to be unprofitable by default.. With the loop fuse function applied to a block of code, the compiler always attempts to fuse adjacent loops (with equal or different trip counts) in the block whenever the compiler determines that it is safe to fuse the loops. Two loops are considered safe to merge if they meet the Fusion Criteria. The following example shows the effects of fusing loops with unequal trip counts:
    Unfused Loops
    Fused Loops
    [=]() { //Kernel sycl::ext::intel::fpga_loop_fuse([&] { for (int i = 0; i < N; i++) { // Loop Body 1 } for (int j = 0; j < M; j++) { // Loop Body 2 } }); }
    for (int f = 0; f < max(M,N); f++) { if (f < N) { // Loop Body 1 } if (f < M) { // Loop Body 2 } }
    A fused loop can itself be considered for fusing with other loops. For example, in the following code, L1 and L2 are initially considered for fusing. That resulting fused loop can then be considered for fusing with L4.
    [=]() { //Kernel sycl::ext::intel::fpga_loop_fuse([&] { L1: for(...) {} L2: for(...) { L3: for(...) {} } L4: for(...) { } }); }
  • sycl::ext::intel::fpga_loop_fuse_independent<v>(f)
    : Directs the compiler to fuse loops within the function
    f
    up to a depth
    v >= 1
    while overriding fusion-safety checks. Here,
    v = 1
    by default. When you use this function, you are guaranteeing to the compiler that fusing pairs of loops affected by the loop fuse function is safe. That is, there are no negative distance dependencies between the pairs of loops. If it is not safe, you might get functional errors in your kernel. Besides this difference, the
    sycl::ext::intel::fpga_loop_fuse<v>(f)
    and
    sycl::ext::intel::fpga_loop_fuse_independent<v>(f)
    functions behave identically.
Function Calls in Loop Fuse Code Blocks
If a function call occurs in a code block annotated with a loop fuse function and inlining that function call contains a loop, the resulting loop can be a candidate for loop fusion.
Nested Loop Fusion Functions
When you nest loop fusion functions, you might create overlapping sets of candidate loops. Consider the following example:
[=]() { //Kernel sycl::ext::intel::fpga_loop_fuse_independent<2>([&] { L1: for(...) {} L2: for(...) { sycl::ext::intel::fpga_loop_fuse<2>([&] { L3: for(...) {} L4: for(...) { L5: for(...) {} L6: for(...) {} } }); } }); }
In this example, the compiler considers the following loop pairs for fusion:
L1
-
L2
,
L3
-
L4
, and
L5
-
L6
. In addition, the compiler overrides the compiler negative distance dependency analysis of
L1
-
L2
and
L3
-
L4
loop pairs.

nofusion
Attribute

You can exempt a loop from being fused with an adjacent loop by annotating the loop with the
nofusion
loop attribute. This attribute prevents the annotated loop from being automatically or fused when it is subject to the loop fusion functions.
Syntax
[[intel::nofusion]]
Example
For example, the following code samples have the same effect. If one loop in a pair is annotated with the
nofusion
attribute, the other loop has no other loop to fuse with.
[[intel::nofusion]] L1: for (int j = 0; j < N; ++j){ data[j] += Q; } L2: for (int i = 0; i < N; ++l) { output[i] = Q * data[i]; }
L1: for (int j = 0; j < N; ++j){ data[j] += Q; } [[intel::nofusion]] L2: for (int i = 0; i < N; ++l) { output[i] = Q * data[i]; }
In the following example, the compiler does not apply the loop fusion transformation to the loops:
for (int x = 0; x < N; x++) { // loop 1 arr1_acc[x] = x; } [[intel::nofusion]] for (int x = 0; x < N; x++) { // loop 2 arr2_acc[x] = x; }
Because you have applied the
nofusion
attribute to loop 2, the compiler cannot fuse loop 1 with loop 2.

Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.