Device Family: Intel® Arria® 10, Intel® Stratix® 10

Intel Software: Intel FPGA SDK for OpenCL, Quartus Prime Pro

Type: Answers

Area: Embedded, OpenCL


Last Modified: June 20, 2019
Version Found: v18.1
Bug ID: 1507240683, 1807446135

Why do I get poor performance when compiling the vector_add example design with Intel® FPGA SDK for OpenCL™?

Description

Due to a problem in the Intel® FPGA SDK for OpenCL™ version 18.1 and later, the vector_add example design may perform significantly worse than the same code compiled with earlier versions. The performance measured for each version is as follows.

Intel® FPGA SDK for OpenCL™ version    Performance
V16.1                                  ~3 ms
V18.0                                  ~3 ms
V18.1                                  ~170 ms
V19.1                                  ~170 ms

Workaround/Fix

To work around this problem, add an attribute to the kernel in vector_add.cl that sets the required work-group size.

  __attribute__((reqd_work_group_size(1, 1, 1)))
  __kernel void vector_add(__global const float *x, 
                           __global const float *y, 
                           __global float *restrict z)
  {
      // get index of the work item
      int index = get_global_id(0);
      // add the vector elements
      z[index] = x[index] + y[index];
  }
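
Once the kernel declares a required work-group size, the host launch geometry has to agree with it: the OpenCL specification has clEnqueueNDRangeKernel return CL_INVALID_WORK_GROUP_SIZE when an explicit local_work_size does not match the attribute, and OpenCL 1.x also rejects a NULL local_work_size in that case. The following is a minimal host-side sketch in plain C, not taken from the example design, that builds and checks a matching geometry. The launch_dims type and both helper functions are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Must mirror reqd_work_group_size(1, 1, 1) in vector_add.cl. */
#define REQD_WG_X 1
#define REQD_WG_Y 1
#define REQD_WG_Z 1

/* Hypothetical helper type: launch geometry for the kernel. */
typedef struct {
    unsigned work_dim;  /* number of dimensions in use (1 for vector_add) */
    size_t global[3];   /* total work-items per dimension */
    size_t local[3];    /* work-items per work-group per dimension */
} launch_dims;

/* Build the geometry for n vector elements, one work-item per element. */
static launch_dims make_launch_dims(size_t n)
{
    assert(n > 0);
    launch_dims d = { 1, { n, 1, 1 }, { REQD_WG_X, REQD_WG_Y, REQD_WG_Z } };
    return d;
}

/* Return true if the local size equals the required work-group size
 * and divides the global size evenly in every dimension. */
static bool launch_dims_valid(const launch_dims *d)
{
    const size_t reqd[3] = { REQD_WG_X, REQD_WG_Y, REQD_WG_Z };
    for (unsigned i = 0; i < 3; ++i) {
        if (d->local[i] != reqd[i])
            return false;
        if (d->global[i] == 0 || d->global[i] % d->local[i] != 0)
            return false;
    }
    return true;
}
```

In a host program, d.work_dim, d.global, and d.local would be passed as the work_dim, global_work_size, and local_work_size arguments of clEnqueueNDRangeKernel. Whether the host code shipped with the example design needs this change depends on how it launches the kernel and on the runtime version, so treat this as an assumption to verify.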

This problem is scheduled to be fixed in a future release of the Intel® FPGA SDK for OpenCL™.