Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/19/2022
Document Table of Contents

6.2.2. Partitioning Buffers Across Different Memory Types (Heterogeneous Memory)

The board support package for your FPGA board can assemble a global memory space consisting of different memory technologies (for example, DRAM or SRAM). The board support package designates one such memory, which might consist of multiple interfaces, as the default memory. All buffers reside there.
Tip: For oneAPI SYCL-specific instructions, refer to Global Memory Accesses Optimization topic in the FPGA Optimization Guide for Intel® oneAPI Toolkits.

To use the heterogeneous memory, modify the code in your .cl file as follows:

  1. Determine the names of the global memory types available on your FPGA board in one of the following ways:
    • Refer to the board vendor's documentation for more information.
    • Find the names in the board_spec.xml file of your board Custom Platform. For each global memory type, the name is the unique string assigned to the name attribute of the global_mem element.
  2. To instruct the host to allocate a buffer to a specific global memory type, insert the buffer_location("<memory_type>") attribute, where <memory_type> is the name of the global memory type provided by your board vendor.
    For example:
    __kernel void foo(__global __attribute__((buffer_location("DDR"))) int *x,
                      __global __attribute__((buffer_location("QDR"))) int *y)
    If you do not specify the buffer_location attribute, the host allocates the buffer to the default memory type automatically. To determine the default memory type, consult the documentation provided by your board vendor. Alternatively, in the board_spec.xml file of your Custom Platform, search for the memory type that is defined first or has the attribute default=1 assigned to it.
    Intel® recommends that you define the buffer_location attribute in a preprocessor macro for ease of reuse, as follows:
    #define QDR\
    __global __attribute__((buffer_location("QDR")))
    #define DDR\
    __global __attribute__((buffer_location("DDR")))
    __kernel void foo (QDR uint * data, DDR uint * lup)
    Attention: If you assign a kernel argument to a non-default memory (for example, QDR uint * data and DDR uint * lup from the code above), you cannot declare that argument using the constant keyword. In addition, you cannot perform atomic operations with pointers derived from that argument.

By default, the host allocates buffers into the main memory when you load kernels into the OpenCL runtime via the clCreateProgramWithBinary function. During kernel invocation, the host automatically relocates heterogeneous memory buffers that are bound to kernel arguments to the main memory.

  1. To avoid the initial allocation of heterogeneous memory buffers in the main memory, include the CL_MEM_HETEROGENEOUS_INTELFPGA flag when you call the clCreateBuffer function. Also, bind the cl_mem buffer to the argument that used the buffer_location attribute using clSetKernelArg before doing any reads or writes from that buffer, as follows:
    mem = clCreateBuffer(context, flags|CL_MEM_HETEROGENEOUS_INTELFPGA,
                         memSize, NULL, &errNum);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &mem);
    clEnqueueWriteBuffer(queue, mem, CL_FALSE, 0, N, 0, NULL, &write_event);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, NULL,
                           0, NULL, &kernel_event);

    For example, the following clCreateBuffer call allocates memory into the lowest available memory region of a nondefault memory bank:

                         memSize, NULL, &errNum);
    Note: Host programs using CL_MEM_HETEROGENEOUS_INTELFPGA and CL_CHANNEL_*_INTELFPGA flags must include the CL/cl_ext_intelfpga.h header file.

    The clCreateBuffer call allocates memory into a certain global memory type based on what you specify in the kernel argument. If a memory (cl_mem) object residing in a memory type is set as a kernel argument that corresponds to a different memory technology, the host moves the memory object automatically when it queues the kernel. Do not pass a buffer as kernel arguments that associate it with multiple memory technologies.

For more information about optimizing heterogeneous global memory accesses, refer to the Heterogeneous Memory Buffers and the Manual Partitioning of Global Memory sections of the Intel® FPGA SDK for OpenCL™ Best Practices Guide.