3.3.1. Changing the Memory Access Pattern Example

Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

Download PDF

ID 683521

Date 12/19/2022

Version

Public

Visible to Intel only — GUID: ahd1504718313058

Ixiasoft

View Details

3.3.1. Changing the Memory Access Pattern Example

The following is an example code of a simple OpenCL kernel:

kernel void big_lmem_4r_4w_nosplit (global int* restrict in, 
                                    global int* restrict out) {  
    local int lmem[4][1024]; 
 
    int gi = get_global_id(0);  
    int gs = get_global_size(0);  
    int li = get_local_id(0);  
    int ls = get_local_size(0);  
    int res = in[gi];              
    
    #pragma unroll   
    for (int i = 0; i < 4; i++) {          
         lmem[i][(li*i) % ls] = res;    
         res >>= 1;  }    

    // Global memory barrier
    barrier(CLK_GLOBAL_MEM_FENCE); 

    res = 0;  
    #pragma unroll   
    for (int i = 0; i < 4; i++) {    
        res ^= lmem[i][((ls-li)*i) % ls];  }     
    out[gi] = res;
}

In the System Viewer report, the system view of this example highlights the stallable loads and stores.

Figure 46. System View of the Example

Figure 47. Area Report of the Example

Figure 48. Kernel Memory Viewer of the Example

Observe that only two memory banks are created, with high arbitration on the first bank between load and store operations. Now, switch the banking indices to the second dimension, as shown in the following example code, :

kernel void big_lmem_4r_4w_nosplit (global int* restrict in, 
                                    global int* restrict out) {  
  local int lmem[1024][4];  

  int gi = get_global_id(0);  
  int gs = get_global_size(0);  
  int li = get_local_id(0);  
  int ls = get_local_size(0);  
  int res = in[gi];              
    
  #pragma unroll   
  for (int i = 0; i < 4; i++) {          
    lmem[(li*i) % ls][i] = res;    
    res >>= 1; 
  }    
    
  // Global memory barrier
  barrier(CLK_GLOBAL_MEM_FENCE); 
    
  res = 0;  
  #pragma unroll   
  for (int i = 0; i < 4; i++) {    
    res ^= lmem[((ls-li)*i) % ls][i];  
  }     
  out[gi] = res;
}

In the kernel memory viewer, you can observe that now four memory banks are created, with separate load store units. All load store instructions are stall-free.

Figure 49. Kernel Memory Viewer of the Example After Changing the Banking Indices

Figure 50. Area Report of the Example After Changing the Banking Indices

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

3.3.1. Changing the Memory Access Pattern Example