Developer Guide

ID 785441
Date 2/07/2024
Public

## Strategies for Inferring the Accumulator

To take advantage of the single-cycle floating-point accumulator feature, you can modify the accumulator description in your kernel code. This can improve efficiency or work around programming restrictions.

### Describe an Accumulator Using Multiple Loops

Consider a case where you describe an accumulator using multiple loops, some of which are unrolled:

```cpp
float acc = 0.0f;
for (int i = 0; i < k; i++) {
  #pragma unroll
  for (int j = 0; j < 16; j++)
    acc += (x[i+j]*y[i+j]);
}
```

Fast math is enabled by default, so the Intel® oneAPI DPC++/C++ Compiler may reorder floating-point operations. It automatically rearranges this loop nest to expose the accumulation.

### Modify a Multi-Loop Accumulator Description

If you want an accumulator to be inferred even when you compile with `-fp-model=precise`, rewrite your code to expose the accumulation explicitly.

For the previous example, rewrite the loop nest so that the unrolled inner loop computes a partial sum. Only that partial sum is then added to the accumulator:

```cpp
float acc = 0.0f;
for (int i = 0; i < k; i++) {
  float my_dot = 0.0f;
  #pragma unroll
  for (int j = 0; j < 16; j++)
    my_dot += (x[i+j]*y[i+j]);
  acc += my_dot;
}
```

### Modify an Accumulator Description Containing a Variable or Non-Zero Initial Value

Consider an accumulator that starts from a variable or non-zero initial value, for example when you apply an offset stored in `array[0]`:

```cpp
float acc = array[0];
for (int i = 0; i < k; i++) {
  acc += x[i];
}
```

The accumulator hardware does not support a variable or non-zero initial value in the description. You must therefore rewrite the description so that the accumulator starts at zero:

```cpp
float acc = 0.0f;
for (int i = 0; i < k; i++) {
  acc += x[i];
}
acc += array[0];
```

Rewriting the description this way lets the kernel use an accumulator in the loop. After the loop completes, `array[0]` is added to the result once.