Perform Kernel Computations Using Local or Private Memory

Developer Guide

Intel oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs

ID 785441

Date 5/08/2024

To optimize memory access efficiency, minimize the number of global memory accesses by performing kernel computations in local or private memory.

It is often best to preload the data for a group of computations from global memory into local or private memory, perform the kernel computations on the preloaded data, and then write the results back to global memory.

Preload Data into Local Memory or Private Memory

When you structure your kernel code, if your global memory accesses are not sequential, consider refactoring the code so that it reads global memory sequentially and buffers that data in local or private memory before using it for computation. This can improve performance because the Intel® oneAPI DPC++/C++ Compiler implements local and private memory on-chip, whereas global memory is off-chip on most platforms. On-chip memory is smaller than off-chip memory, but it has significantly higher bandwidth and much lower latency. Additionally, on-chip memory handles random access patterns more efficiently than off-chip memory. For more information, refer to Memory Accesses and Memory Attributes.
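
The preload pattern can be sketched in plain C++ as follows. This is a host-side model of the access pattern only, not SYCL kernel code: the vectors `in` and `out` stand in for global memory, the fixed-size `tile` array stands in for a local or private buffer, and the names `kTile`, `kStride`, and `gather_in_tiles` are hypothetical choices made for this illustration. In an actual kernel, the loop body would sit inside the kernel function and `tile` would be a kernel-scope array or local memory allocation.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical tile size and access stride chosen for illustration.
// kStride must be coprime with kTile so that the gather is a permutation.
constexpr std::size_t kTile = 8;
constexpr std::size_t kStride = 3;

// Permutes the elements inside each kTile-sized block of `in`.
// `in` and `out` model global (off-chip) memory.
// `tile` models a small on-chip (local or private) buffer.
std::vector<int> gather_in_tiles(const std::vector<int>& in) {
  if (in.size() % kTile != 0) {
    throw std::invalid_argument("input length must be a multiple of kTile");
  }
  std::vector<int> out(in.size());
  for (std::size_t base = 0; base < in.size(); base += kTile) {
    std::array<int, kTile> tile{};

    // Step 1: preload. Global memory is read sequentially.
    for (std::size_t i = 0; i < kTile; ++i) {
      tile[i] = in[base + i];
    }

    // Step 2: compute. The non-sequential (strided) reads hit only the
    // on-chip buffer. Step 3: results go back to global memory sequentially.
    for (std::size_t i = 0; i < kTile; ++i) {
      out[base + i] = tile[(i * kStride) % kTile];
    }
  }
  return out;
}
```

Without the buffer, the strided index would be applied directly to `in`, producing the non-sequential global memory reads that this refactoring is meant to avoid.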

Store Variables and Arrays in Private Memory

The Intel® oneAPI DPC++/C++ Compiler implements private memory using FPGA registers in the kernel datapath, block RAMs, or MLABs.

Private memory is typically useful for storing single variables or small arrays. Aggregate data types are also implemented in registers if the array-access indices are compile-time constants. Otherwise, the compiler uses block RAMs or MLABs.

Registers are a plentiful hardware resource in FPGAs, and you should prefer register-based private memory over other memory types whenever possible. If a variable is implemented in registers, it can be accessed in parallel across the datapath because each stage of the pipeline has its own copy of the data.
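
As a minimal sketch of a small private array that is a candidate for register implementation, consider the following plain C++ model of a 4-tap filter. It is a host-side illustration rather than kernel code, and the names `kTaps`, `window`, and `fir_filter` are hypothetical. Both inner loops have a constant trip count, so once they are fully unrolled (for example, with an unroll pragma in kernel code) every index into `window` becomes a compile-time constant. Whether the compiler actually places `window` in registers depends on the design, so treat this as an assumption to confirm in the compiler reports.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical number of filter taps, kept small on purpose.
constexpr std::size_t kTaps = 4;

// Computes out[n] = sum over k of coeff[k] * in[n - k], treating samples
// before the start of `in` as zero.
std::vector<int> fir_filter(const std::vector<int>& in,
                            const std::array<int, kTaps>& coeff) {
  std::vector<int> out(in.size());

  // Small fixed-size array that models private memory.
  std::array<int, kTaps> window{};

  for (std::size_t n = 0; n < in.size(); ++n) {
    // Shift the window by one position. The trip count is a compile-time
    // constant, so full unrolling yields constant indices only.
    for (std::size_t k = kTaps - 1; k > 0; --k) {
      window[k] = window[k - 1];
    }
    window[0] = in[n];

    // All taps are read in the same iteration, which models parallel
    // access to the elements of the array.
    int acc = 0;
    for (std::size_t k = 0; k < kTaps; ++k) {
      acc += coeff[k] * window[k];
    }
    out[n] = acc;
  }
  return out;
}
```

If `window` were instead indexed by a value known only at run time, the indices would no longer be compile-time constants and the array would be a candidate for block RAMs or MLABs rather than registers.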

For more information on the implementation of private memory using registers, refer to Inferring a Shift Register.
