Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/13/2021
Public

5.2.15. Specifying the use_stall_enable_clusters Cluster-control Attribute

Apply the use_stall_enable_clusters cluster-control attribute to your OpenCL kernel to reduce the kernel's area, at the possible cost of lower kernel fMAX and throughput, as follows:
__kernel void __attribute__((use_stall_enable_clusters))
example(__global int * restrict input,
        __global int * restrict output, int size) {
    // Copy each input element to the output buffer.
    for (int i = 0; i < size; ++i) {
        output[i] = input[i];
    }
    // ...
}

The Intel® FPGA SDK for OpenCL™ Offline Compiler typically groups related operations into clusters. In several scenarios, these clusters are stall-free clusters. A stall-free cluster executes its operations without any internal stalls and contains a FIFO at the end of the cluster that holds the results while the logic downstream of the cluster is stalled. This exit FIFO adds area and latency to the kernel, but might allow a higher fMAX and increased throughput.

If you prefer lower FPGA area use and lower latency over higher throughput, use the __attribute__((use_stall_enable_clusters)) attribute to bias the compiler toward producing stall-enabled clusters. A stall-enabled cluster has no exit FIFO to buffer data when the whole cluster is stalled. This reduces area and latency, but the stall signals must then be passed to the operations inside the cluster, which might reduce fMAX.
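The difference between the two cluster styles can be sketched with a small software model. The following is a toy cycle-level simulation in plain C, not a description of the hardware the compiler generates. The pipeline depth LATENCY, the exit FIFO depth FIFO_DEPTH, the stall pattern in downstream_stalled, and both function names are hypothetical and chosen only for illustration. The model assumes non-negative input values because -1 marks an empty pipeline stage.

```c
#include <assert.h>

/* Hypothetical parameters for this toy model only. */
enum { LATENCY = 4, FIFO_DEPTH = LATENCY, EMPTY = -1 };

/* Hypothetical downstream behavior: stalled 2 cycles out of every 5. */
static int downstream_stalled(int cycle) { return (cycle % 5) >= 3; }

/* Stall-free model: the stages advance every cycle and never see a stall.
 * Results land in an exit FIFO; new inputs are admitted only while every
 * item already inside the cluster is guaranteed a FIFO slot.
 * Returns the number of cycles needed to deliver all n results. */
static int run_stall_free(const int *in, int n, int *out, int *peak_fifo)
{
    int stage[LATENCY], fifo[FIFO_DEPTH];
    int head = 0, count = 0, in_flight = 0;
    int fed = 0, done = 0, cycle = 0;
    for (int s = 0; s < LATENCY; ++s) stage[s] = EMPTY;
    *peak_fifo = 0;

    while (done < n) {
        /* 1. Advance the pipeline unconditionally. */
        int leaving = stage[LATENCY - 1];
        for (int s = LATENCY - 1; s > 0; --s) stage[s] = stage[s - 1];
        stage[0] = EMPTY;
        if (leaving != EMPTY) {
            assert(count < FIFO_DEPTH);          /* FIFO must never overflow */
            fifo[(head + count) % FIFO_DEPTH] = leaving;
            ++count;
            --in_flight;
            if (count > *peak_fifo) *peak_fifo = count;
        }
        /* 2. Downstream takes one result if it is not stalled. */
        if (!downstream_stalled(cycle) && count > 0) {
            out[done++] = fifo[head];
            head = (head + 1) % FIFO_DEPTH;
            --count;
        }
        /* 3. Admit a new input only if a FIFO slot is reserved for it. */
        if (fed < n && in_flight + count < FIFO_DEPTH) {
            stage[0] = in[fed++];
            ++in_flight;
        }
        ++cycle;
    }
    return cycle;
}

/* Stall-enabled model: no exit FIFO. The downstream stall is passed to
 * every stage, so the whole pipeline freezes while downstream is stalled.
 * Returns the number of cycles needed to deliver all n results. */
static int run_stall_enabled(const int *in, int n, int *out)
{
    int stage[LATENCY];
    int fed = 0, done = 0, cycle = 0;
    for (int s = 0; s < LATENCY; ++s) stage[s] = EMPTY;

    while (done < n) {
        if (!downstream_stalled(cycle)) {
            if (stage[LATENCY - 1] != EMPTY) out[done++] = stage[LATENCY - 1];
            for (int s = LATENCY - 1; s > 0; --s) stage[s] = stage[s - 1];
            stage[0] = (fed < n) ? in[fed++] : EMPTY;
        }
        ++cycle;
    }
    return cycle;
}
```

Both functions deliver the same results in the same order. The stall-free model pays for FIFO_DEPTH words of extra storage, while the stall-enabled model needs no buffer but requires every stage to observe the stall condition. In hardware, that wide distribution of the stall signal is the kind of path that might limit fMAX. The returned cycle counts can be compared for a given stall pattern, but the numbers from this toy model say nothing about real kernel performance.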

Not all operations support stalling, and operations that do not cannot be placed in a stall-enabled cluster. The compiler generates a warning if some operations cannot be placed into a stall-enabled cluster.

By default, the compiler uses stall-free clusters for kernels because they are generally more beneficial. This attribute asks the compiler to form stall-enabled clusters wherever possible.

Intel Stratix 10 Restriction: This attribute does not apply to designs that target Intel® Stratix® 10 devices unless you specify the -hyper-optimized-handshaking=off option of the aoc command.