Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 12/13/2021
Public

4.2. Control LSUs For Your Variable-Latency MM Host Interfaces

Controlling the type of load-store units (LSUs) that the Intel® HLS Compiler Pro Edition uses to interact with variable-latency Memory Mapped (MM) Host interfaces can help save area in your design. You might also encounter situations where disabling static coalescing of a load/store with other load/store operations benefits the performance of your design.

Review the following tutorial to learn about controlling LSUs: <quartus_installdir>/hls/examples/tutorials/best_practices/lsu_control.

To determine whether you need LSU controls, review the High-Level Design Reports for your component, especially the Function Memory Viewer. Check whether the memory access pattern (and its associated LSUs) that the Intel® HLS Compiler Pro Edition inferred matches the memory access pattern that you expect. If they do not match, consider controlling the LSU type, LSU coalescing, or both.

Control the Type of LSU Created

The Intel® HLS Compiler Pro Edition creates either burst-coalesced LSUs or pipelined LSUs.

In general, use a burst-coalesced LSU when you expect the LSU to process many load/store requests to consecutive memory words. The burst-coalesced LSU attempts to dynamically coalesce the requests into larger bursts to use memory bandwidth more efficiently.

The pipelined LSU consumes significantly less FPGA area, but it processes load/store requests individually without any coalescing. A pipelined LSU is a good choice when your design is tight on area or when the accesses to the variable-latency MM Host interface are not necessarily consecutive.
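To make the trade-off between the two LSU types concrete, the following is a minimal software-only sketch that counts memory transactions for a sequence of word addresses. It does not describe how the compiler actually builds either LSU. The function names and the `max_burst_words` limit are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Software model only: a pipelined LSU issues every request on its own,
// so the transaction count equals the request count.
std::size_t count_pipelined_transactions(
    const std::vector<std::uint32_t>& word_addrs) {
  return word_addrs.size();
}

// Software model only: merge back-to-back requests to consecutive memory
// words into one burst, up to a hypothetical maximum burst length.
std::size_t count_coalesced_bursts(
    const std::vector<std::uint32_t>& word_addrs,
    std::size_t max_burst_words) {
  assert(max_burst_words > 0);
  std::size_t bursts = 0;
  std::size_t current_len = 0;  // words in the burst being built
  std::uint32_t prev = 0;       // address of the previous request
  for (std::uint32_t addr : word_addrs) {
    const bool extends = current_len > 0 &&
                         current_len < max_burst_words &&
                         addr == prev + 1;
    if (extends) {
      ++current_len;   // request joins the current burst
    } else {
      ++bursts;        // request starts a new burst
      current_len = 1;
    }
    prev = addr;
  }
  return bursts;
}
```

In this model, 16 requests to consecutive words collapse into a single burst when the burst limit is 16, while 16 scattered requests still cost 16 transactions. In the scattered case the extra area of a burst-coalesced LSU buys nothing, which is the situation where a pipelined LSU is the better fit.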

The following code example shows both types of LSU being implemented for a variable-latency MM Host interface:
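// Assumes that the Intel HLS headers are included (for example, "HLS/hls.h"),
// that the ihc namespace is in scope, and that SIZE is a compile-time
// constant defined elsewhere.
component void
dut(mm_host<int, dwidth<128>, awidth<32>, aspace<4>, latency<0>> &Buff1,
    mm_host<int, dwidth<32>, awidth<32>, aspace<5>, latency<0>> &Buff2) {
  int Temp[SIZE];

  // LSU types to request for individual load/store operations
  using pipelined = lsu<style<PIPELINED>>;
  using burst_coalesced = lsu<style<BURST_COALESCED>>;

  // Consecutive reads from Buff1: a good fit for dynamic coalescing
  for (int i = 0; i < SIZE; i++) {
    Temp[i] = burst_coalesced::load(&Buff1[i]); // Burst-Coalesced LSU
  }

  // Writes to Buff2 are issued one request at a time
  for (int i = 0; i < SIZE; i++) {
    pipelined::store(&Buff2[i], 2 * Temp[i]); // Pipelined LSU
  }
}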

Disable Static Coalescing

Static coalescing is typically beneficial because it reduces the total number of LSUs in your design by statically combining multiple load/store operations into wider load/store operations.

However, there are cases where static coalescing leads to unaligned accesses, which you might not want to occur. There are also cases where multiple loads/stores get coalesced even though you intended for only a subset of them to be operational at a time. In these cases, consider disabling static coalescing for the load/store operations that you did not want to be coalesced.

For the following code example, the Intel® HLS Compiler Pro Edition does not statically coalesce the two load operations into one wide load operation because both loads use an LSU type that specifies static_coalescing<false>:
component int
dut(mm_host<int, dwidth<256>, awidth<32>, aspace<1>, latency<0>> &Buff1,
    int i, bool Cond1, bool Cond2) {

  // Pipelined LSU type with static coalescing turned off
  using no_coalescing = lsu<style<PIPELINED>, static_coalescing<false>>;
  int Val = 0;
  if (Cond1) {
    Val = no_coalescing::load(&Buff1[i]);
  }
  if (Cond2) {
    // Adjacent address: a candidate for static coalescing if it were enabled
    Val = no_coalescing::load(&Buff1[i + 1]);
  }
  return Val;
}

If the two load operations were coalesced, an unaligned LSU would be created, which would hurt the throughput of your component.
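The alignment concern can be checked with a little address arithmetic. The following is a minimal sketch in plain C++, not part of the Intel HLS API. It assumes 4-byte `int` elements on the 256-bit interface from the example, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative constants: 4-byte elements (an assumption about int)
// on a 256-bit-wide interface, as in the example.
constexpr std::size_t kElemBytes = 4;
constexpr std::size_t kBusBytes = 256 / 8;

// True if an access to `count` elements starting at element index `first`
// begins at a byte address that is a multiple of the access width.
bool is_naturally_aligned(std::size_t first, std::size_t count) {
  assert(count > 0);
  const std::size_t width = count * kElemBytes;
  return (first * kElemBytes) % width == 0;
}

// True if the same access touches more than one bus-width memory word.
bool crosses_bus_word(std::size_t first, std::size_t count) {
  assert(count > 0);
  const std::size_t start = first * kElemBytes;
  const std::size_t end = start + count * kElemBytes - 1;
  return start / kBusBytes != end / kBusBytes;
}
```

Under these assumptions, a coalesced two-element access starting at index `i` is aligned to its own width only when `i` is even, and it spans two 256-bit memory words when `i % 8 == 7`. Because `i` is a run-time argument, neither condition can be ruled out at compile time. A single-element access is always aligned to its own width in this model.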