AN 802: Intel® Stratix® 10 SoC Device Design Guidelines

ID 683117
Date 12/14/2020
Public

3.2.3.4. Example 4: FPGA Writing Cache Coherent Data to HPS

In this example, the HPS MPU requires access to data that originates in the FPGA. The most efficient mechanism for sharing small blocks of data with the MPU is to have logic in the FPGA perform cacheable writes to the HPS. Keep the blocks written to the HPS relatively small: large block writes cause the L2 cache to thrash, so the cache ends up writing to SDRAM for most of the transfer. For large buffer transfers, it is more appropriate to have the FPGA write data directly to the FPGA-to-SDRAM ports, as shown in Example 2.

GUIDELINE: Perform full accesses targeting the FPGA-to-HPS bridge.

For the transaction to be cacheable, the FPGA master must write to the FPGA-to-HPS bridge and utilize the cache extension signaling of the ACE-Lite protocol. See the Related Information for details on the ACE-Lite protocol signaling extensions for cache coherent accesses.

Figure 16. FPGA Writing Cache Coherent Data

For abbreviations, refer to the figure in Overview of HPS Memory-Mapped Interfaces.

GUIDELINE: Perform cacheable accesses aligned to 32 bytes targeting the FPGA-to-HPS bridge.

The CCU slave of the HPS is optimized for transactions that are the same size as the cache line (32 bytes). As a result, you should attempt to align the data to 32-byte boundaries and ensure that, after data width adaptation, the burst length into the 64-bit CCU slave is four beats long. For example, if the FPGA-to-HPS bridge is set up for 128-bit transactions, you should align the data to 32-byte boundaries and perform full 128-bit accesses with a burst length of 2.
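The arithmetic behind this guideline can be sketched as follows. This is only an illustration, not Intel-provided code: the names `CACHE_LINE_BYTES`, `burst_length_for_line`, and `is_line_aligned` are hypothetical, and the 32-byte line size is taken from the text above.

```python
# Hypothetical helper names; the 32-byte line size comes from the guideline text.
CACHE_LINE_BYTES = 32


def burst_length_for_line(bridge_width_bits, line_bytes=CACHE_LINE_BYTES):
    """Return how many full-width beats are needed to move one cache line."""
    if bridge_width_bits <= 0 or bridge_width_bits % 8 != 0:
        raise ValueError("bridge width must be a positive multiple of 8 bits")
    bytes_per_beat = bridge_width_bits // 8
    if line_bytes % bytes_per_beat != 0:
        raise ValueError("cache line is not a whole number of beats")
    return line_bytes // bytes_per_beat


def is_line_aligned(address, line_bytes=CACHE_LINE_BYTES):
    """Return True when the start address sits on a cache-line boundary."""
    return address % line_bytes == 0


if __name__ == "__main__":
    # A 128-bit bridge moves 16 bytes per beat, so one 32-byte line takes 2 beats.
    print(burst_length_for_line(128))
    # A 64-bit interface moves 8 bytes per beat, so the same line takes 4 beats.
    print(burst_length_for_line(64))
    print(is_line_aligned(0x1000), is_line_aligned(0x1010))
```

The same calculation reproduces both figures quoted in the text: four beats at 64 bits and a burst length of 2 at 128 bits.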

GUIDELINE: When L2 ECC is enabled, ensure that cacheable accesses to the FPGA-to-HPS bridge are aligned to 8-byte boundaries.

If you enable error checking and correction (ECC) in the L2 cache, you must also ensure that each 8-byte group of data is completely written. The L2 cache performs ECC operations on 64-bit boundaries, so when performing cacheable accesses you must always align the access to 8-byte boundaries and write to all eight byte lanes at once. Failing to follow these rules results in double-bit errors, which cannot be recovered.

Regardless of whether ECC is enabled or disabled, 64-byte cache transactions result in the best performance. For more information about 64-byte cache transactions, refer to "GUIDELINE: Access 64 bytes per cacheable transaction" in the "Example 3: FPGA Reading Cache Coherent Data from HPS" section.
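The 8-byte alignment rule for ECC can be expressed as a simple pre-check on a planned write. The sketch below is an assumption-laden illustration: `ECC_GRANULE_BYTES` and `is_ecc_safe_write` are hypothetical names, and the check covers only the address and length of a contiguous write, not the per-beat byte strobes.

```python
# Hypothetical names; the 8-byte (64-bit) ECC granule comes from the text above.
ECC_GRANULE_BYTES = 8


def is_ecc_safe_write(address, length_bytes, granule=ECC_GRANULE_BYTES):
    """Return True when a contiguous write covers only whole ECC granules.

    The write must start on a granule boundary and its length must be a
    non-zero multiple of the granule size, so no 8-byte group is left
    partially written.
    """
    if length_bytes <= 0:
        return False
    return address % granule == 0 and length_bytes % granule == 0


if __name__ == "__main__":
    print(is_ecc_safe_write(0x2000, 32))  # starts on a boundary, four full granules
    print(is_ecc_safe_write(0x2004, 8))   # starts mid-granule
    print(is_ecc_safe_write(0x2000, 12))  # ends mid-granule
```

A master that cannot satisfy this check for a given buffer would leave at least one 8-byte group partially written, which is the condition the guideline warns against.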

GUIDELINE: When L2 ECC is enabled, ensure that cacheable accesses to the FPGA-to-HPS bridge have groups of eight write strobes enabled.

  • For FPGA-to-HPS accesses from 32-bit FPGA masters, burst length must be 2, 4, 8, or 16 with all write byte strobes enabled.
  • For FPGA-to-HPS accesses from 64-bit FPGA masters, all write byte strobes must be enabled.
  • For FPGA-to-HPS accesses from 128-bit FPGA masters, the upper eight or lower eight (or both) write byte strobes must be enabled.
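The three strobe rules above can be captured in one checker. This is a minimal sketch under stated assumptions, not a vendor utility: `strobes_ok` is a hypothetical name, each beat's write strobes are modeled as an integer whose bit i corresponds to byte lane i, and "lower eight" is taken to mean lanes 0 through 7.

```python
def strobes_ok(master_width_bits, beat_strobes):
    """Check a burst's per-beat write strobes against the L2 ECC rules.

    beat_strobes is a list with one integer per beat; bit i set means
    byte lane i is written (assumed lane numbering).
    """
    if not beat_strobes:
        return False
    if master_width_bits == 32:
        # Burst length 2, 4, 8, or 16 with all four strobes set on every beat.
        return len(beat_strobes) in (2, 4, 8, 16) and all(
            s == 0xF for s in beat_strobes
        )
    if master_width_bits == 64:
        # All eight strobes set on every beat.
        return all(s == 0xFF for s in beat_strobes)
    if master_width_bits == 128:
        # Lower eight, upper eight, or all sixteen strobes set on every beat.
        return all(s in (0x00FF, 0xFF00, 0xFFFF) for s in beat_strobes)
    raise ValueError("unsupported master width: %r" % master_width_bits)


if __name__ == "__main__":
    print(strobes_ok(32, [0xF, 0xF]))         # two full 32-bit beats
    print(strobes_ok(32, [0xF, 0xF, 0xF]))    # burst length 3 is not allowed
    print(strobes_ok(128, [0xFF00, 0x00FF]))  # whole 8-byte halves only
    print(strobes_ok(128, [0x0FF0]))          # straddles both halves partially
```

In every accepted case, the strobes cover whole 8-byte groups, which matches the granularity at which the L2 cache performs ECC.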