Developer Guide


Perform Kernel Computations Using Local or Private Memory

To optimize memory access efficiency, minimize the number of global memory accesses by performing kernel computations in local or private memory.
A common approach is to preload the data that a group of computations needs from global memory into local or private memory, perform the kernel computations on the preloaded data, and then write the results back to global memory.
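The following plain C++ model contrasts the two approaches on a 3-point moving-sum stencil. It is a host-side sketch, not SYCL kernel code. `GlobalBuffer`, `stencil_naive`, `stencil_preloaded`, and `kTile` are hypothetical names, the fixed-size `local` array stands in for on-chip local memory, and only global reads are counted (writes are not instrumented).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a global-memory buffer: every read is counted
// so the two strategies below can be compared.
struct GlobalBuffer {
  std::vector<int> data;
  mutable std::size_t reads = 0;
  int load(std::size_t i) const {
    ++reads;
    return data[i];
  }
};

// Naive: each output reads its three inputs straight from global memory.
// out[i] = in[i] + in[i + 1] + in[i + 2] for every i with i + 2 < n.
std::vector<int> stencil_naive(const GlobalBuffer& in) {
  std::size_t n = in.data.size();
  std::vector<int> out;
  for (std::size_t i = 0; i + 2 < n; ++i)
    out.push_back(in.load(i) + in.load(i + 1) + in.load(i + 2));
  return out;
}

constexpr std::size_t kTile = 8;  // hypothetical tile size (outputs per tile)

// Preloaded: copy one tile of inputs (plus a 2-element halo) into a local
// array once, compute all outputs of the tile from the local copy, and
// write the results back to the output ("global") vector.
std::vector<int> stencil_preloaded(const GlobalBuffer& in) {
  std::size_t n = in.data.size();
  std::vector<int> out;
  if (n < 3) return out;
  std::size_t outputs = n - 2;
  for (std::size_t base = 0; base < outputs; base += kTile) {
    std::size_t count = std::min(kTile, outputs - base);
    int local[kTile + 2];  // models on-chip local memory
    // Each global element of the tile is read once, in address order.
    for (std::size_t j = 0; j < count + 2; ++j) local[j] = in.load(base + j);
    // All stencil reads are served from the local copy.
    for (std::size_t j = 0; j < count; ++j)
      out.push_back(local[j] + local[j + 1] + local[j + 2]);
  }
  return out;
}
```

For 18 inputs, the naive version performs 3 global reads per output (48 in total), while the preloaded version reads each tile plus its halo once (20 in total) and produces identical results.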

Preload Data into Local Memory or Private Memory

Local memory is considerably smaller than global memory, but it has significantly higher bandwidth and much lower latency. The kernel can access local memory in random order without the performance penalty that random global memory accesses incur. When you structure your kernel code, try to access global memory sequentially and buffer that data in on-chip local memory before your kernel uses it for computation.
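As a minimal illustration, assuming a square tile small enough to fit in local memory, a matrix transpose separates the two access patterns. Global memory is read strictly in address order, and the strided column-order reads are served from the local copy. This is a plain C++ sketch rather than SYCL code; `transpose_tile` and `kDim` are hypothetical names, and the `std::array` models the on-chip local buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kDim = 4;  // hypothetical tile edge length

// Transposes a kDim x kDim row-major matrix held in "global" memory.
std::vector<int> transpose_tile(const std::vector<int>& global_in) {
  assert(global_in.size() == kDim * kDim);
  std::array<int, kDim * kDim> local{};  // models on-chip local memory

  // Phase 1: sequential copy from global memory into local memory.
  for (std::size_t i = 0; i < kDim * kDim; ++i) local[i] = global_in[i];

  // Phase 2: strided (column-order) reads come from local memory only;
  // the output is written back to global memory in address order.
  std::vector<int> global_out(kDim * kDim);
  for (std::size_t r = 0; r < kDim; ++r)
    for (std::size_t c = 0; c < kDim; ++c)
      global_out[r * kDim + c] = local[c * kDim + r];
  return global_out;
}
```

Transposing a tile twice returns the original tile, which is a convenient sanity check.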

Store Variables and Arrays in Private Memory

Intel® oneAPI implements private memory using FPGA registers in the kernel datapath, block RAMs, or MLABs. The Intel® oneAPI compiler analyzes private memory accesses and promotes them to register accesses. Scalar variables are typically promoted. Aggregate data types, such as arrays, are promoted if the array-access indices are compile-time constants. Private memory is typically most useful for storing single variables or small arrays. Registers are plentiful hardware resources in FPGAs, so it is usually better to use private memory instead of other memory types whenever possible. The kernel can access private memories in parallel, which allows them to provide more bandwidth than either global or local memory.
For more information on the implementation of private memory using registers, refer to Inferring a Shift Register.
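The sketch below, written in plain C++ and assuming a small fixed filter length, shows the kind of code that fits the promotion rules above: a scalar accumulator and a small private array used as a shift register. Every loop over `window` has a compile-time trip count, so once the loops are fully unrolled (in an actual FPGA kernel, typically requested with `#pragma unroll`), every array index is a constant. `moving_sum` and `kTaps` are hypothetical names; whether the array is actually promoted to registers is decided by the compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kTaps = 4;  // hypothetical filter length

// 4-tap moving-sum filter built on a private shift register.
std::vector<int> moving_sum(const std::vector<int>& in) {
  std::array<int, kTaps> window{};  // small private array, zero-initialized
  std::vector<int> out;
  out.reserve(in.size());
  for (int sample : in) {
    // Shift: window[k] <- window[k - 1]; fixed trip count, so unrolling
    // turns every index into a compile-time constant.
    for (std::size_t k = kTaps - 1; k > 0; --k) window[k] = window[k - 1];
    window[0] = sample;  // newest sample enters at index 0

    int acc = 0;  // scalar private variable
    for (std::size_t k = 0; k < kTaps; ++k) acc += window[k];
    out.push_back(acc);
  }
  return out;
}
```

For the input `{1, 2, 3, 4, 5}`, the filter produces `{1, 3, 6, 10, 14}`: until the window fills, the missing taps contribute zero.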
