Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/13/2021
Public


8.5. Optimizing Accesses to Local Memory by Controlling the Memory Replication Factor

The memory replication factor is the number of M20K memory blocks that your design uses to implement the local memory system. To control the memory replication factor, use the max_replicates kernel attribute in your OpenCL™ kernel.

Intel® M20K memory blocks have two physical ports. The number of logical ports available in each M20K block depends on the degree of pumping. Pumping is a measure of the clock frequency of the M20K blocks relative to the rest of the design.

Consider the following code example, where the singlepump attribute is applied to a local memory system, lmem, which has three read accesses and one write access. The singlepump attribute indicates that the M20K blocks run at the same frequency as the rest of the design.

__kernel void three_copies(int raddr, int waddr) {
    int __attribute__((memory,
                       numbanks(1),
                       singlepump,
                       max_replicates(3)))
                       lmem[16];

    lmem[waddr] = lmem[raddr] + lmem[raddr + 1] + lmem[raddr + 2];
    // do something with lmem
}
Figure 85. Accesses to Single-Pumped M20K Memory Blocks
The compiler creates an arbitration-free network, as shown in Figure 85. Each single-pumped M20K block has two logical ports available. Each write port in the local memory system must be connected to all M20K blocks that your design uses to implement the memory system. Each read port in the local memory system must be connected to one M20K block. Because the single write occupies one port on every block, each block has only one port left for a read. Because of these connection constraints, there must be three M20K blocks to implement the specified number of ports in lmem.
Note: If you change max_replicates(3) to max_replicates(1), you observe one M20K block with arbitration between the three reads.

If you include the doublepump kernel attribute in your local variable declaration, you specify that the M20K memory blocks run at double the frequency of the rest of the design.

__kernel void three_copies(int raddr, int waddr) {
    int __attribute__((memory,
                       numbanks(1),
                       doublepump))
                       lmem[16];

    lmem[waddr] = lmem[raddr] + lmem[raddr + 1] + lmem[raddr + 2];
    // do something with lmem
}
Figure 86. Accesses to Double-Pumped M20K Memory Blocks

Each double-pumped M20K block has four logical ports available. As such, there only needs to be one M20K block to implement three read ports and one write port in lmem.
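The port arithmetic behind both examples can be sketched in plain C. This is a simplified counting model based only on the connection rules stated above, not the compiler's actual algorithm; the function name replicates_needed and its parameters are hypothetical and are not part of the SDK.

```c
#include <assert.h>

/* Simplified port-counting model (an assumption for illustration only).
 * pump is 1 for singlepump and 2 for doublepump.
 * Returns the number of replicates needed for an arbitration-free
 * memory system, or 0 if the writes alone occupy every logical port. */
static int replicates_needed(int reads, int writes, int pump)
{
    assert(reads >= 0 && writes >= 0 && pump >= 1);

    /* Each M20K block has two physical ports; pumping multiplies them. */
    int logical_ports = 2 * pump;

    /* Every write must connect to every replicate, so each write
     * occupies one logical port on each block. */
    int read_ports = logical_ports - writes;
    if (read_ports <= 0)
        return 0; /* no port left for reads, so replication cannot help */

    if (reads == 0)
        return 1; /* a single copy is enough when nothing is read */

    /* Each read connects to exactly one replicate: ceiling division. */
    return (reads + read_ports - 1) / read_ports;
}
```

Under this model, replicates_needed(3, 1, 1) evaluates to 3, matching the single-pumped example, and replicates_needed(3, 1, 2) evaluates to 1, matching the double-pumped example.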

Attention:
  • Double pumping the memory increases resource overhead. Use the doublepump kernel attribute only if it results in actual M20K savings, improves performance, or both.
  • Stores must be connected to every replicate. Hence, if there are more than three stores, the memory is not replicated. Local memory replication works well with a single store.
  • Because the entire memory system is replicated, you might observe potentially large M20K memory block usage.
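To get a feel for how replication multiplies the M20K count, the following C sketch computes a rough lower bound. It assumes a capacity of 20,480 bits (20 Kbits) per M20K block and ignores the width and depth configurations, banking, and any other choices the compiler makes, so the real count can be higher. The name m20k_estimate is hypothetical.

```c
#include <assert.h>

#define M20K_BITS 20480L /* capacity of one M20K block: 20 Kbits */

/* Capacity-only lower bound (an assumption for illustration only):
 * blocks needed for one copy of the memory, times the replication factor. */
static long m20k_estimate(long elements, long bits_per_element, long replicates)
{
    assert(elements > 0 && bits_per_element > 0 && replicates > 0);

    long bits = elements * bits_per_element;

    /* Ceiling division: a partially filled block still costs a whole block. */
    long blocks_per_copy = (bits + M20K_BITS - 1) / M20K_BITS;

    return blocks_per_copy * replicates;
}
```

For the 16-element int array in the examples above, m20k_estimate(16, 32, 3) evaluates to 3. For a hypothetical 4096-element int array with the same replication factor, m20k_estimate(4096, 32, 3) evaluates to 21, which shows how quickly the cost grows when a larger memory is replicated.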