Intel® FPGA SDK for OpenCL™ Standard Edition: Best Practices Guide

ID 683176
Date 9/24/2018
Public

7.5. Optimizing Accesses to Local Memory by Controlling the Memory Replication Factor

The memory replication factor is the number of M20K memory blocks that your design uses to implement the local memory system. To control the memory replication factor, include the singlepump or doublepump kernel attribute in your OpenCL™ kernel. The singlepump and doublepump kernel attributes are part of the Intel® FPGA SDK for OpenCL™ Standard Edition's advanced features.

Intel®'s M20K memory blocks have two physical ports. The number of logical ports that are available in each M20K block depends on the degree of pumping. Pumping is a measure of the clock frequency of the M20K blocks relative to the rest of the design.

Consider an example design where the kernel specifies three read ports and one write port for the local memory system, lmem. As shown in the code example below, including the singlepump kernel attribute in the local variable declaration indicates that the M20K blocks will run at the same frequency as the rest of the design.

int __attribute__((memory,
                   numbanks(1),
                   bankwidth(64),
                   singlepump,
                   numreadports(3),
                   numwriteports(1)))
                   lmem[16];

Figure 83. Accesses to Single-Pumped M20K Memory Blocks

Each single-pumped M20K block has two logical ports available. Each write port in the local memory system must be connected to every M20K block that your design uses to implement the memory system, whereas each read port needs to be connected to only one M20K block. In each block, the single write port therefore occupies one logical port and leaves one logical port for a read port. Because of these connection constraints, three M20K blocks are needed to implement the three read ports and one write port specified for lmem.

If you include the doublepump kernel attribute in your local variable declaration, you specify that the M20K memory blocks will run at twice the frequency of the rest of the design.


int __attribute__((memory,
                   numbanks(1),
                   bankwidth(64),
                   doublepump,
                   numreadports(3),
                   numwriteports(1)))
                   lmem[16];

Figure 84. Accesses to Double-Pumped M20K Memory Blocks

Each double-pumped M20K block has four logical ports available. As such, only one M20K block is needed to implement all three read ports and the one write port in lmem.
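
The port arithmetic behind the single-pumped and double-pumped examples can be sketched in plain C. This is a simplified model for illustration only, not the offline compiler's actual algorithm. The function name m20k_replication_factor and its parameters are hypothetical. The model assumes that every write port occupies one logical port on each replicate and that each read port occupies one logical port on exactly one replicate.

```c
/* Simplified model of M20K replication, for illustration only.
 * This is NOT the offline compiler's algorithm; all names are hypothetical. */

#define M20K_PHYSICAL_PORTS 2  /* each M20K block has two physical ports */

/* pump_factor: 1 models singlepump, 2 models doublepump.
 * Returns the number of M20K replicates the model needs, or 0 when the
 * arguments are invalid or the write ports leave no logical port for reads. */
int m20k_replication_factor(int read_ports, int write_ports, int pump_factor)
{
    if (read_ports < 0 || write_ports < 0 || pump_factor < 1) {
        return 0;
    }

    /* Pumping multiplies the number of logical ports per block. */
    int logical_ports = M20K_PHYSICAL_PORTS * pump_factor;

    /* Every write port is wired to every replicate, so it consumes one
     * logical port on each block. The remainder can serve read ports. */
    int read_slots = logical_ports - write_ports;
    if (read_slots <= 0) {
        return 0;
    }
    if (read_ports == 0) {
        return 1;  /* one block still holds the data */
    }

    /* Ceiling division: each replicate serves up to read_slots read ports. */
    return (read_ports + read_slots - 1) / read_slots;
}
```

Under these assumptions, m20k_replication_factor(3, 1, 1) yields 3 for the singlepump declaration of lmem, and m20k_replication_factor(3, 1, 2) yields 1 for the doublepump declaration, matching the block counts described above.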

Attention:
  • Double pumping the memory increases resource overhead. Use the doublepump kernel attribute only if it results in actual M20K savings or improves performance, or both.
  • Stores must be connected to every replicate and must not suffer contention. Hence, if there are more than three stores, the memory is not replicated. Local memory replication works best with a single store.
  • Because the entire memory system is replicated, you might observe a potentially large increase in M20K memory block usage.
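
The last two points can be illustrated with a rough C sketch. It is a simplified model under stated assumptions, not the compiler's behavior. The names replication_possible and total_m20k_lower_bound are hypothetical. The sketch assumes a nominal M20K capacity of 20,480 bits and ignores the width and depth configurations that determine how many blocks a real design uses, so the result is only a lower bound.

```c
/* Rough model of the store limit and the M20K cost of replication.
 * For illustration only; all names are hypothetical. */

#define M20K_BITS 20480L  /* nominal capacity of one M20K block (20 Kb) */

/* Returns 1 if, after every store takes one logical port on a block, at
 * least one logical port remains for a load; otherwise returns 0.
 * pump_factor: 1 models singlepump, 2 models doublepump. */
int replication_possible(int store_ports, int pump_factor)
{
    if (store_ports < 0 || pump_factor < 1) {
        return 0;
    }
    int logical_ports = 2 * pump_factor;  /* two physical ports per block */
    return store_ports < logical_ports;
}

/* Lower bound on the M20K blocks used by the whole memory system:
 * the blocks needed for one copy, multiplied by the replication factor. */
long total_m20k_lower_bound(long bits_per_copy, int replication_factor)
{
    long blocks_per_copy = (bits_per_copy + M20K_BITS - 1) / M20K_BITS;
    if (blocks_per_copy < 1) {
        blocks_per_copy = 1;  /* even a tiny memory occupies one block */
    }
    return blocks_per_copy * replication_factor;
}
```

In this model, a double-pumped memory with three stores still leaves one logical port per block for a load, while four stores leave none. A 64 KB local memory replicated three times needs at least 78 M20K blocks, compared with 26 for a single copy, which shows how quickly replication can multiply M20K usage.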