Loop Fuse Functions and nofusion Attribute
nofusion
AttributeThis topic describes the loop fuse functions and
nofusion
attribute that the
Intel® oneAPI
supports.
DPC++/C++
CompilerLoop Fuse Functions
The loop fuse functions are declared in the
sycl/ext/intel/fpga_loop_fuse.hpp
header file, which is invoked by the
sycl/ext/intel/fpga_extensions.hpp
header file. Apply these loop functions to a block of code to indicate to the compiler that it must fuse adjacent loops in the code block overriding the compiler profitability or safety analysis of the fusion. Fusing adjacent loops reduces the amount of loop control overhead in your kernel that reduces the FPGA area used and increases the performance by executing both loops as one (fused) loop. For additional information, see
Fuse Loops to Reduce Overhead and Improve PerformanceThe compiler supports the following loop fuse functions:
- : Directs the compiler to fuse loops within the functionsycl::ext::intel::fpga_loop_fuse<v>(f)fand up to a depth ofv >= 1without affecting the functionality of either loop, overriding the compiler profitability analysis of fusing the loops. By default,v = 1, which is equivalent to indicating that the compiler should consider only the adjacent top-level loops for fusing. For example:[=]() { //Kernel sycl::ext::intel::fpga_loop_fuse<1>([&] { L1: for(...) {} L2: for(...) { L3: for(...) {} L4: for(...) { L5: for(...) {} L6: for(...) {} } } }); }By default (v = 1), only loopsL1andL2are initially considered for fusing. At a depth ofv = 2, the compiler considersL1-L2andL3-L4loop pairs for fusing.The compiler automatically considers fusing adjacent loops with equal trip counts when the loops meet the Automatic Loop Fusion criteria or when fusion is deemed safe and profitable. You can use thesycl::ext::intel::fpga_loop_fuse<v>(f)function to inform the compiler to consider fusing adjacent loops with different trip counts, which is considered to be unprofitable by default.. With the loop fuse function applied to a block of code, the compiler always attempts to fuse adjacent loops (with equal or different trip counts) in the block whenever the compiler determines that it is safe to fuse the loops. Two loops are considered safe to merge if they meet the Fusion Criteria. The following example shows the effects of fusing loops with unequal trip counts:Unfused LoopsFused Loops[=]() { //Kernel sycl::ext::intel::fpga_loop_fuse([&] { for (int i = 0; i < N; i++) { // Loop Body 1 } for (int j = 0; j < M; j++) { // Loop Body 2 } }); }for (int f = 0; f < max(M,N); f++) { if (f < N) { // Loop Body 1 } if (f < M) { // Loop Body 2 } }A fused loop can itself be considered for fusing with other loops. For example, in the following code, L1 and L2 are initially considered for fusing. That resulting fused loop can then be considered for fusing with L4.[=]() { //Kernel sycl::ext::intel::fpga_loop_fuse([&] { L1: for(...) {} L2: for(...) { L3: for(...) {} } L4: for(...) { } }); }
- : Directs the compiler to fuse loops within the functionsycl::ext::intel::fpga_loop_fuse_independent<v>(f)fup to a depthv >= 1while overriding fusion-safety checks. Here,v = 1by default. When you use this function, you are guaranteeing to the compiler that fusing pairs of loops affected by the loop fuse function is safe. That is, there are no negative distance dependencies between the pairs of loops. If it is not safe, you might get functional errors in your kernel. Besides this difference, thesycl::ext::intel::fpga_loop_fuse<v>(f)andsycl::ext::intel::fpga_loop_fuse_independent<v>(f)functions behave identically.
Function Calls in Loop Fuse Code Blocks
If a function call occurs in a code block annotated with a loop fuse function and inlining that function call contains a loop, the resulting loop can be a candidate for loop fusion.
Nested Loop Fusion Functions
When you nest loop fusion functions, you might create overlapping sets of candidate loops. Consider the following example:
[=]() { //Kernel
sycl::ext::intel::fpga_loop_fuse_independent<2>([&] {
L1: for(...) {}
L2: for(...) {
sycl::ext::intel::fpga_loop_fuse<2>([&] {
L3: for(...) {}
L4: for(...) {
L5: for(...) {}
L6: for(...) {}
}
});
}
});
}
In this example, the compiler considers the following loop pairs for fusion:
L1
-L2
,
L3
-L4
, and
L5
-L6
. In addition, the compiler overrides the compiler negative distance dependency analysis of
L1
-L2
and
L3
-L4
loop pairs.
nofusion Attribute
nofusion
AttributeYou can exempt a loop from being fused with an adjacent loop by annotating the loop with the
nofusion
loop attribute. This attribute prevents the annotated loop from being automatically or fused when it is subject to the loop fusion functions.
Syntax
[[intel::nofusion]]
Example
For example, the following code samples have the same effect. If one loop in a pair is annotated with the
nofusion
attribute, the other loop has no other loop to fuse with.
|
|
In the following example, the compiler does not apply the loop fusion transformation to the loops:
for (int x = 0; x < N; x++) { // loop 1
arr1_acc[x] = x;
}
[[intel::nofusion]]
for (int x = 0; x < N; x++) { // loop 2
arr2_acc[x] = x;
}
Because you have applied the
nofusion
attribute to loop 2, the compiler cannot fuse loop 1 with loop 2.