Developer Guide

Intel oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs

ID 785441
Date 5/08/2024
Public

Strategies for Inferring the Accumulator

To take advantage of the single-cycle floating-point accumulator feature, you can modify the accumulator description in your kernel code, either to improve efficiency or to work around programming restrictions.

Describe an Accumulator Using Multiple Loops

Consider a case where you want to describe an accumulator using multiple loops, some of which are unrolled:

float acc = 0.0f;
for (int i = 0; i < k; i++) {
  #pragma unroll
  for (int j = 0; j < 16; j++)
    acc += (x[i+j]*y[i+j]);
}

Because fast math is enabled by default, the Intel® oneAPI DPC++/C++ Compiler automatically rearranges the floating-point operations in a way that exposes the accumulation, so the accumulator can be inferred.
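The rearrangement matters because floating-point addition is not associative, so regrouping a sum can change the rounded result. The following host-side C++ sketch illustrates this with three values. The function names and the input values are hypothetical, chosen only for illustration, and the code assumes it is compiled without fast-math style optimizations.

```cpp
#include <cassert>

// Adds three floats grouped as (a + b) + c.
float sum_left_grouped(float a, float b, float c) {
  float t = a + b;
  return t + c;
}

// Adds the same three floats grouped as a + (b + c).
float sum_right_grouped(float a, float b, float c) {
  float t = b + c;
  return a + t;
}

// Hypothetical self-check: the small value 1.0f is lost when it is first
// added to -1e20f, so the two groupings give different results.
void demo_regrouping() {
  assert(sum_left_grouped(1e20f, -1e20f, 1.0f) == 1.0f);
  assert(sum_right_grouped(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

A compiler that must preserve the written order of operations cannot regroup such sums on its own, which is why a precise floating-point model requires the accumulation to be exposed in the source code.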

Modify a Multi-Loop Accumulator Description

If you want an accumulator to be inferred even when using -fp-model=precise, rewrite your code to expose the accumulation.

For the code example above, rewrite it in the following manner:

float acc = 0.0f;
for (int i = 0; i < k; i++) {
  float my_dot = 0.0f;
  #pragma unroll
  for (int j = 0; j < 16; j++)
    my_dot += (x[i+j]*y[i+j]);
  acc += my_dot;
}
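As a rough host-side illustration, the two descriptions can be written as ordinary C++ functions and compared. This is a minimal sketch, not kernel code: the function names, the use of std::vector, and the constant kInner are assumptions made for the example, and the unroll pragma is omitted because it has no effect on the result computed on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kInner = 16;  // inner-loop trip count from the example

// Original form: a single running sum is updated inside the inner loop.
float accumulate_flat(const std::vector<float>& x,
                      const std::vector<float>& y, std::size_t k) {
  assert(x.size() == y.size());
  assert(k == 0 || x.size() >= k - 1 + kInner);  // highest index is k + 14
  float acc = 0.0f;
  for (std::size_t i = 0; i < k; i++)
    for (std::size_t j = 0; j < kInner; j++)
      acc += x[i + j] * y[i + j];
  return acc;
}

// Rewritten form: the inner loop builds a partial dot product, and the
// outer loop performs exactly one accumulation per iteration.
float accumulate_partial(const std::vector<float>& x,
                         const std::vector<float>& y, std::size_t k) {
  assert(x.size() == y.size());
  assert(k == 0 || x.size() >= k - 1 + kInner);
  float acc = 0.0f;
  for (std::size_t i = 0; i < k; i++) {
    float my_dot = 0.0f;
    for (std::size_t j = 0; j < kInner; j++)
      my_dot += x[i + j] * y[i + j];
    acc += my_dot;  // the exposed accumulation
  }
  return acc;
}
```

The two functions group the additions differently, so for general inputs their results can differ by rounding. For inputs whose intermediate sums are exactly representable, such as small integers, both forms return the same value.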

Modify an Accumulator Description Containing a Variable or Non-Zero Initial Value

Consider a situation where the accumulator description begins with a variable or non-zero value, for example to apply an offset to the sum:

float acc = array[0];
for (int i = 0; i < k; i++) {
  acc += x[i];
}

Because the accumulator hardware does not support variable or non-zero initial values in a description, you must rewrite the description so that the accumulation starts at zero:

float acc = 0.0f;
for (int i = 0; i < k; i++) {
  acc += x[i];
}
acc += array[0];

Rewriting the description in this manner enables the kernel to use an accumulator in the loop, because the accumulation now starts at zero. The value of array[0] is then added once, after the loop completes.
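The same rewrite can be sketched on the host as a pair of C++ functions. This is only an illustration under stated assumptions: the function names and the initial parameter (standing in for array[0]) are hypothetical, and the code runs as ordinary C++ rather than as a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: the running sum is seeded with a non-zero initial value.
float sum_seeded(const std::vector<float>& x, std::size_t k, float initial) {
  assert(k <= x.size());
  float acc = initial;
  for (std::size_t i = 0; i < k; i++)
    acc += x[i];
  return acc;
}

// Rewritten form: the running sum starts at zero, and the initial value
// is added once after the loop.
float sum_zero_start(const std::vector<float>& x, std::size_t k,
                     float initial) {
  assert(k <= x.size());
  float acc = 0.0f;
  for (std::size_t i = 0; i < k; i++)
    acc += x[i];
  acc += initial;
  return acc;
}
```

Moving the initial value to the end changes the order of the additions, so the two forms can differ by rounding for general inputs. They agree exactly when every intermediate sum is representable, as with small integer values.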