Intel Software: Intel FPGA SDK for OpenCL, Quartus Prime Pro, Quartus Prime Standard

Type: Answers

Area: Embedded, OpenCL


Last Modified: January 09, 2019
Version Found: v17.1
Bug ID: FB: 2206622154;

Why does my OpenCL kernel compilation fail to generate hardware even though the estimated resources are low?

Description

If your OpenCL™ kernel fails to generate hardware even though the estimated resources are low, the failure may be due to excessive unrolling of loops that access global memory.

Do not unroll a loop that accesses global memory so far that the combined read or write of one unrolled iteration is wider than the external memory interface in the board support package (BSP). Accesses wider than the interface cause memory contention and routing congestion, and may result in compilation failure.

Workaround/Fix

The width of each external memory interface is listed in the board_spec.xml file in the OpenCL™ BSP. Here is an example from the board_spec.xml of the Arria 10 GX development kit BSP (a10_ref):

<!-- DDR4-2400 -->

  <global_mem name="DDR" max_bandwidth="19200" interleaved_bytes="1024" config_addr="0x018">

    <interface name="board" port="kernel_mem0" type="slave" width="512" maxburst="16" address="0x00000000" size="0x80000000" latency="240"/>

  </global_mem>


As you can see, the external memory interface width on this BSP is 512 bits (width="512"). Therefore, a loop that accesses 32-bit integers in global memory should not be unrolled by a factor of more than 16 (512 / 32 = 16).
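The same arithmetic applies to other element sizes. The following is a minimal sketch in plain C, not part of the SDK; the function name max_unroll_factor and its parameters are hypothetical and only illustrate the division above.

```c
#include <assert.h>

/* Hypothetical helper (not an SDK function): the largest unroll factor for
 * which the combined access of one unrolled iteration still fits in a single
 * word of the external memory interface.
 *
 * interface_bits: the width attribute from board_spec.xml (512 for a10_ref)
 * element_bits:   size in bits of each global memory element accessed
 */
unsigned max_unroll_factor(unsigned interface_bits, unsigned element_bits)
{
    /* The element must be non-empty and no wider than the interface. */
    assert(element_bits > 0 && element_bits <= interface_bits);
    return interface_bits / element_bits;
}
```

For the a10_ref values, max_unroll_factor(512, 32) gives 16; under the same assumption, a loop over 64-bit elements would be limited to 8.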

If the original loop count is not a multiple of 16:

1. Round the loop count up to the next multiple of 16.

2. Make any on-chip memories used in the loop large enough to accommodate the new loop count.

3. Use conditionals to prevent global memory reads or writes in iterations beyond the original loop count.
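The three steps above can be sketched as follows. This is a host-side model in plain C, not kernel code from the SDK: the function scale_by_two, the trip count of 100, and the doubling operation are all hypothetical. In an actual kernel, the pointers would be __global and each loop would be preceded by #pragma unroll 16.

```c
#include <assert.h>
#include <stddef.h>

#define UNROLL 16        /* 512-bit interface / 32-bit int */
#define ORIG_COUNT 100   /* hypothetical original loop count, not a multiple of 16 */

/* Step 1: round the loop count up to the next multiple of UNROLL (112 here). */
#define PADDED_COUNT (((ORIG_COUNT + UNROLL - 1) / UNROLL) * UNROLL)

/* Hypothetical loop body: read src into a local buffer, then write doubled
 * values to dst. Both arrays hold ORIG_COUNT elements. */
void scale_by_two(const int *src, int *dst)
{
    /* Step 2: size the on-chip buffer for the padded loop count. */
    int buffer[PADDED_COUNT];
    int i;

    assert(src != NULL && dst != NULL);

    /* In a kernel, "#pragma unroll 16" would precede this loop. */
    for (i = 0; i < PADDED_COUNT; i++) {
        /* Step 3: guard the read so padded iterations never touch src. */
        buffer[i] = (i < ORIG_COUNT) ? src[i] : 0;
    }

    /* In a kernel, "#pragma unroll 16" would precede this loop as well. */
    for (i = 0; i < PADDED_COUNT; i++) {
        /* Step 3: guard the write so padded iterations never touch dst. */
        if (i < ORIG_COUNT) {
            dst[i] = 2 * buffer[i];
        }
    }
}
```

With these values the loops run 112 times instead of 100, and the 12 padded iterations only touch the local buffer.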