Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 3/31/2023
Public

max_reinvocation_delay Attribute (Beta)

The loop reinvocation delay is the delay between launching the last iteration of one loop invocation and launching the first iteration of the next loop invocation. For interleaved loops, the definition changes: the loop reinvocation delay is the delay between launching the last iteration of one loop invocation and launching the first iteration of the next loop invocation to enter the loop, which might not be the next loop invocation in program order. Interleaved loops hold multiple invocations in the loop body at the same time, so this definition applies when the loop body is fully occupied, one loop invocation is exiting the loop body, and a new loop invocation is ready to enter it.
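The definition for a non-interleaved loop can be made concrete with a small cycle-count sketch. This is a simplified model for illustration only, not the compiler's scheduler: it assumes no stalls, a fixed II, and the same trip count for every invocation. The function name last_launch_cycle and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified launch-time model (an assumption for illustration):
//  - iterations within one invocation launch every `ii` cycles;
//  - the first iteration of the next invocation launches
//    `reinvocation_delay` cycles after the last iteration of the
//    previous invocation.
// Returns the cycle at which the last iteration of the last invocation
// launches, counting the very first launch as cycle 0.
std::int64_t last_launch_cycle(std::int64_t invocations,
                               std::int64_t trip_count,
                               std::int64_t ii,
                               std::int64_t reinvocation_delay) {
  assert(invocations >= 1 && trip_count >= 1);
  assert(ii >= 1 && reinvocation_delay >= 1);
  // Cycles between the first and last launch inside one invocation.
  const std::int64_t span_within = (trip_count - 1) * ii;
  // One reinvocation gap separates each pair of consecutive invocations.
  return invocations * span_within + (invocations - 1) * reinvocation_delay;
}
```

Under this model, 100 invocations with a trip count of 2 and an II of 1 finish launching at cycle 199 when the reinvocation delay is 1, and at cycle 1090 when it is 10. The gap between invocations dominates when each invocation runs only a few iterations.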

Use the max_reinvocation_delay attribute to direct the Intel® oneAPI DPC++/C++ Compiler to implement the loop that follows the attribute declaration with the specified maximum reinvocation delay value. This attribute limits the delay between loop invocations for the implemented loop, assuming that there are no stalls and that a new invocation is ready to start executing.

Syntax

[[intel::max_reinvocation_delay(N)]]

The attribute parameter N is required and must be a positive constant expression of integer type. This parameter specifies the maximum number of clock cycles that you are willing to allow between launching the last iteration of a loop invocation and launching the first iteration of the next loop invocation. The higher the maximum reinvocation delay allowed, the longer the wait before the next loop invocation can start executing. However, the compiler can optimize the loop better if a high reinvocation delay is allowed.

NOTE:
  • The extra latency between invocations for a loop can be a significant factor in overall performance if the trip count of the loop is very small. In this case, specify the N value as 1 to allow subsequent loop invocations to start immediately, but at the cost of larger II and/or lower fMAX to allow time to evaluate the exit condition.
    for(int i = 0; i < maxI; i++) {
      int m = …
      [[intel::max_reinvocation_delay(1)]]
      while(m*m*m < N) {
        m += 1;
      }
      dst[i] = m;
    }

    The inner loop in this example is implemented such that the first iteration of the (i+1)th invocation of the inner loop launches one cycle after the last iteration of the ith invocation of the inner loop. In this code, N is the bound tested by the while loop, not the attribute parameter.

  • The max_reinvocation_delay attribute is a prototype loop attribute and currently supports only a value of N=1.
  • For hyper-optimized loops, the minimum reinvocation delay must be equal to the loop II. The compiler may be unable to implement the loop with an appropriate II for the currently supported values of the max_reinvocation_delay attribute.
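The example in the note above can be completed into a host-compilable sketch that also counts how often the inner loop is reinvoked. Several parts are assumptions made for illustration: the original elides the initial value of m, so the sketch takes it from a hypothetical start array; the helper advance_to_cube_bound and the LoopStats struct are not part of any Intel API; and the attribute is applied only when the __INTEL_LLVM_COMPILER macro is defined, so other compilers build the code as ordinary C++. Whether the attribute has any effect outside an FPGA kernel is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply the attribute only under an Intel LLVM-based compiler; elsewhere the
// macro expands to nothing so the sketch still compiles.
#if defined(__INTEL_LLVM_COMPILER)
#define REINVOCATION_DELAY_1 [[intel::max_reinvocation_delay(1)]]
#else
#define REINVOCATION_DELAY_1
#endif

// Hypothetical bookkeeping, used only to show how small the trip counts are.
struct LoopStats {
  long invocations = 0;  // number of times the inner loop is entered
  long iterations = 0;   // total inner-loop iterations over all invocations
};

// For each i, advance m from start[i] until m*m*m >= bound and store it.
// The start values are an assumption; the source does not say how m is set.
LoopStats advance_to_cube_bound(const std::vector<int>& start, int bound,
                                std::vector<int>& dst) {
  LoopStats stats;
  dst.resize(start.size());
  for (std::size_t i = 0; i < start.size(); i++) {
    int m = start[i];
    stats.invocations++;
    REINVOCATION_DELAY_1
    while (m * m * m < bound) {
      m += 1;
      stats.iterations++;
    }
    dst[i] = m;
  }
  return stats;
}
```

When the iteration count is close to the invocation count, each invocation runs only a few iterations. This is the small-trip-count case in which the delay between invocations matters most.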