Device Family: Intel® Arria® 10, Arria® V, Cyclone® V, Intel® Stratix® 10, Stratix® V

Intel Software: Intel FPGA SDK for OpenCL

Type: Answers, KDB Area


Last Modified: August 04, 2017
Version Found: v17.0
Version Fixed: v17.1
Bug ID: FB: 478222;

Why does #pragma ivdep not work correctly in the Intel FPGA SDK for OpenCL (aocl) version 17.0?

Description

In version 16.1, the following code behaved as expected: the outer loop was serialized because of its dependencies, and #pragma ivdep removed the dependencies on the inner loop.

// This loop gets serialized due to true dependencies with the inner loop
for (unsigned char x = 0; x < 4; x++) {
    // Inner loop has no inter-iteration dependencies, but depends on the outer loop
    #pragma ivdep
    for (unsigned char y = 0; y < 64; y++) {
        // ... loop body ...
    }
}

In 17.0, #pragma ivdep is applied to both the inner and the outer loop, so the compiler ignores the dependencies of the outer loop. As a result, similar code can produce incorrect results in hardware even though it works correctly in emulation.

Workaround/Fix

Workaround:

1. Add an extra int argument named "dummy" to the kernel. On the host side, always pass 1 for this argument.

BEFORE

__kernel void my_kernel(
    __global cpx_t* restrict input,
    __global cpx_t* restrict result)

AFTER

__kernel void my_kernel(
    __global cpx_t* restrict input,
    __global cpx_t* restrict result,
    int dummy)

2. In the loop nest, wrap the inner loop in "if (dummy)":

// This loop gets serialized due to true dependencies
for (unsigned char x = 0; x < 4; x++) {
    if (dummy) {
        // No dependencies within each set of 64 iterations
        #pragma ivdep
        for (unsigned char y = 0; y < 64; y++) {
            // ... loop body ...
        }
    }
}

 

This issue is scheduled to be fixed in a future version (v17.1) of the Intel® FPGA SDK for OpenCL™.