8.4. Improving Kernel Performance by Banking the Local Memory
The following code example depicts an 8 x 4 local memory system that is implemented in a single bank. As a result, no two elements in the system can be accessed in parallel.
local int lmem[8][4];
#pragma unroll
for (int i = 0; i < 4; i += 2) {
    lmem[i][x] = …;
}
To improve performance, you can add the numbanks(N) and bankwidth(M) attributes to your code to define the number of memory banks and the width of each bank in bytes. The following code implements eight memory banks, each 16 bytes wide. This memory bank configuration enables parallel memory accesses down the 8 x 4 array.
local int __attribute__((numbanks(8), bankwidth(16))) lmem[8][4];
#pragma unroll
for (int i = 0; i < 4; i += 2) {
    lmem[i][x & 0x3] = …;
}
To enable parallel access, you must mask the dynamic access on the lower array index (here, x & 0x3). Masking the dynamic access informs the Intel® FPGA SDK for OpenCL™ Offline Compiler that x stays within the bounds of the lower dimension (0 to 3).
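The effect of numbanks(8) and bankwidth(16) on the 8 x 4 array can be sketched with a small host-side C model. This is an illustration only, not SDK code: the helper bank_of and its constants are hypothetical, and the model assumes that consecutive bankwidth-byte words are assigned to banks in round-robin order and that an int occupies 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Layout of int lmem[8][4]; a 4-byte int is assumed. */
enum { COLS = 4, ELEM_BYTES = 4 };

/* Hypothetical helper (not part of the SDK): returns the bank that would
   hold element [row][col], assuming consecutive bankwidth-byte words are
   assigned to banks in round-robin order. col is deliberately not
   range-checked so that an unmasked index can be modeled too. */
static unsigned bank_of(unsigned row, unsigned col,
                        unsigned numbanks, unsigned bankwidth)
{
    assert(numbanks > 0 && bankwidth > 0);
    size_t byte_addr = ((size_t)row * COLS + col) * ELEM_BYTES;
    return (unsigned)((byte_addr / bankwidth) % numbanks);
}
```

Under these assumptions, each 16-byte row occupies exactly one bank, so bank_of(0, x, 8, 16) is 0 and bank_of(2, x, 8, 16) is 2 for every x from 0 to 3, and the two unrolled stores never share a bank. An index outside that range breaks the pattern: bank_of(0, 8, 8, 16) evaluates to 2, which suggests why the compiler needs the mask before it can tie each store to a fixed bank.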
By specifying different values for the numbanks(N) and bankwidth(M) kernel attributes, you can change the parallel access pattern. The following code implements four memory banks, each 4 bytes wide. This memory bank configuration enables parallel memory accesses across the 8 x 4 array.
local int __attribute__((numbanks(4), bankwidth(4))) lmem[8][4];
#pragma unroll
for (int i = 0; i < 4; i += 2) {
    lmem[x][i] = …;
}
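As a rough check of the "across the array" pattern, the following host-side C sketch models the numbanks(4), bankwidth(4) configuration. It is not SDK code: the names bank_of_element and row_spans_all_banks are hypothetical, and the model assumes a 4-byte int and round-robin assignment of consecutive 4-byte words to banks.

```c
#include <assert.h>
#include <stddef.h>

/* int lmem[8][4] with numbanks(4), bankwidth(4); a 4-byte int is assumed. */
enum { COLS = 4, ELEM_BYTES = 4, NUMBANKS = 4, BANKWIDTH = 4 };

/* Hypothetical helper: bank holding element [row][col] when consecutive
   BANKWIDTH-byte words are assigned to banks in round-robin order. */
static unsigned bank_of_element(unsigned row, unsigned col)
{
    assert(col < COLS);
    size_t byte_addr = ((size_t)row * COLS + col) * ELEM_BYTES;
    return (unsigned)((byte_addr / BANKWIDTH) % NUMBANKS);
}

/* Returns 1 if the four elements of one row land in four distinct banks. */
static int row_spans_all_banks(unsigned row)
{
    unsigned seen = 0;
    for (unsigned col = 0; col < COLS; ++col)
        seen |= 1u << bank_of_element(row, col);
    return seen == (1u << NUMBANKS) - 1u;
}
```

In this model, the bank number equals the column index for every row, so the unrolled stores to lmem[x][0] and lmem[x][2] fall into banks 0 and 2 whatever the value of x. That is consistent with the listing above leaving the upper index x unmasked.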