AN 886: Intel Agilex® 7 Device Design Guidelines

ID 683634
Date 10/09/2023
Public

5.1.8.2.4. Examples of Cacheable and Non-Cacheable Data Accesses From the FPGA

Example 1: FPGA Reading Non-Cache Coherent Data from HPS EMIF Directly

In this example, the FPGA requires access to data that is stored in the HPS EMIF. For the FPGA to see the same copy of the data as the MPU, the L1 data cache and L2 cache must be flushed if they already hold a copy of the data. Once the HPS EMIF contains the most up-to-date copy of the data, the optimal path is for FPGA masters to read the data through the FPGA-to-HPS bridge, directly targeting the SDRAM.

Figure 9. FPGA Reading Non-Cache Coherent Data

You can maximize read throughput through the FPGA-to-HPS bridge by setting the bridge width according to your system requirements. Intel recommends using a burst-capable master in the FPGA to read from the SDRAM, one that can issue bursts of four beats or longer.
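The need for the flush in Example 1 can be illustrated with a toy model. The sketch below is not HPS software: the dictionaries stand in for the SDRAM and a write-back cache, and every name (`mpu_write`, `flush`, `fpga_direct_read`) is hypothetical. It only shows that a direct, non-coherent read returns stale data until the dirty line is written back.

```python
# Toy model only: one dict stands in for SDRAM, another for a write-back cache.
# None of these names correspond to real HPS registers or driver APIs.
sdram = {0x1000: 0xAAAA}   # value currently stored in the HPS EMIF
cache = {}                 # address -> (value, dirty)

def mpu_write(addr, value):
    """MPU store that hits a write-back cache: SDRAM is not updated yet."""
    cache[addr] = (value, True)

def flush(addr):
    """Write a dirty line back to SDRAM and mark it clean."""
    if addr in cache and cache[addr][1]:
        value, _ = cache[addr]
        sdram[addr] = value
        cache[addr] = (value, False)

def fpga_direct_read(addr):
    """Non-coherent FPGA read: goes straight to SDRAM, bypassing the cache."""
    return sdram[addr]

mpu_write(0x1000, 0xBBBB)
stale = fpga_direct_read(0x1000)   # still the old SDRAM contents
flush(0x1000)
fresh = fpga_direct_read(0x1000)   # now matches what the MPU wrote
```

In this model `stale` holds the old SDRAM value and `fresh` holds the value the MPU wrote, which is the ordering the example relies on: flush first, then start the FPGA read.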

Example 2: FPGA Writing Non-Cache Coherent Data into HPS EMIF Directly

In this example, the HPS MPU requires access to data that originates in the FPGA. For the MPU to access the data coherently after it is written, software may need to flush or invalidate the affected cache lines before the transfer starts, so that the SDRAM holds the latest data once the write completes. If software skips these cache operations, one or more dirty cache lines can later be evicted, overwriting the data that the FPGA master wrote.

Figure 10. FPGA Writing Non-Cache Coherent Data
Note: As in "Example 1: FPGA Reading Non-Cache Coherent Data from HPS EMIF Directly", you can maximize write throughput through the FPGA-to-HPS bridge by setting the bridge width according to your system requirements.
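The eviction hazard described in Example 2 can be reproduced in a small simulation. The following is a minimal sketch under stated assumptions: `ToyWriteBackCache` and `run` are hypothetical names, a single address stands in for a whole buffer, and the model ignores everything about real cache hardware except dirty write-back on eviction.

```python
class ToyWriteBackCache:
    """Hypothetical single-level write-back cache in front of a dict 'SDRAM'."""

    def __init__(self, sdram):
        self.sdram = sdram
        self.lines = {}  # address -> (value, dirty)

    def mpu_write(self, addr, value):
        self.lines[addr] = (value, True)

    def mpu_read(self, addr):
        if addr not in self.lines:            # miss: fill from SDRAM
            self.lines[addr] = (self.sdram[addr], False)
        return self.lines[addr][0]

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:                             # dirty data is written back
            self.sdram[addr] = value

    def invalidate(self, addr):
        self.lines.pop(addr, None)            # drop the line without write-back

def run(invalidate_first):
    sdram = {0x2000: 0}
    cache = ToyWriteBackCache(sdram)
    cache.mpu_write(0x2000, 0x0BAD)   # MPU leaves a dirty line over the buffer
    if invalidate_first:
        cache.invalidate(0x2000)      # software prepares the buffer
    sdram[0x2000] = 0xF00D            # FPGA master writes SDRAM directly
    if 0x2000 in cache.lines:
        cache.evict(0x2000)           # later eviction overwrites the FPGA data
    return cache.mpu_read(0x2000)

without_maintenance = run(invalidate_first=False)
with_maintenance = run(invalidate_first=True)
```

Without the cache operation the MPU reads back its own stale value; with it, the MPU reads the value the FPGA wrote.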

Example 3: FPGA Reading Cache Coherent Data from HPS

In this example, the FPGA requires access to data that originates in the HPS. The MPU in the HPS recently accessed this data, so the data may still be in the cache, and it can be optimal for the FPGA to access the cached copy. To avoid the overhead of software flushing dirty cache lines, the FPGA can perform cache coherent reads through the FPGA-to-HPS bridge. Keep the buffers being read relatively small. Otherwise, the L2 cache might thrash, reading data from SDRAM for most of the transfer. For large buffer transfers, it is more appropriate to have the FPGA read data through the FPGA-to-HPS bridge directly from the SDRAM, as shown in "Example 1: FPGA Reading Non-Cache Coherent Data from HPS EMIF Directly".

GUIDELINE: Perform full accesses targeting the FPGA-to-HPS bridge.

For the transaction to be cacheable, the FPGA master must read from the FPGA-to-HPS bridge and utilize the cache extension signaling of the ACE-Lite protocol. For more information about the ACE-Lite protocol signaling extensions for cache coherent accesses, refer to the Cache Coherency Unit section in the Intel Agilex® 7 Hard Processor System Technical Reference Manual.

Figure 11. FPGA Reading Cache Coherent Data

GUIDELINE: Perform cacheable accesses aligned to 64 bytes targeting the FPGA-to-HPS bridge.

The CCU of the HPS is optimized for transactions that are the same size as a cache line (64 bytes). As a result, align the data to 64-byte boundaries where possible, and ensure that, after data width adaptation, the burst length into the 512-bit FPGA-to-HPS bridge is maximized. For example, a 128-bit FPGA master should align its data to 64-byte boundaries and perform full 128-bit (16-byte) accesses with a burst length of 4.
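The 64-byte alignment described above reduces to simple address arithmetic. The sketch below is illustrative only; the helper names (`align_down`, `align_up`, `cache_line_starts`) are hypothetical and not part of any Intel tool or driver. It lists the start address of every cache line that a buffer touches.

```python
CACHE_LINE_BYTES = 64  # cache line size stated in this section

def align_down(addr, alignment=CACHE_LINE_BYTES):
    """Round addr down to the nearest multiple of a power-of-two alignment."""
    return addr & ~(alignment - 1)

def align_up(addr, alignment=CACHE_LINE_BYTES):
    """Round addr up to the nearest multiple of a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)

def cache_line_starts(buf_addr, buf_len):
    """Start address of each 64-byte line touched by [buf_addr, buf_addr + buf_len)."""
    first = align_down(buf_addr)
    end = align_up(buf_addr + buf_len)
    return list(range(first, end, CACHE_LINE_BYTES))

aligned = cache_line_starts(0x1000, 128)     # buffer already on line boundaries
unaligned = cache_line_starts(0x1010, 0x50)  # buffer straddles two lines
```

A buffer that starts at 0x1010 and is 0x50 bytes long touches the lines at 0x1000 and 0x1040, so a master following this guideline would issue its bursts from those two addresses rather than from 0x1010.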

GUIDELINE: Access 64 bytes per cacheable transaction.

Ensure that each burst transaction accesses 64 bytes. Each transaction must start on a 64-byte boundary.

Table 27.  Burst Lengths for 64-byte Alignment
FPGA Master Width (Bits)   Access Size (Bytes)   Burst Length
32                         4                     16
64                         8                     8
128                        16                    4
256                        32                    2
512                        64                    1
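The rows of Table 27 follow from dividing the 64-byte cache line by the access size of the master. A minimal sketch of that arithmetic follows; the function name `burst_length_for_cache_line` is hypothetical, chosen only for illustration.

```python
def burst_length_for_cache_line(master_width_bits, line_bytes=64):
    """Beats needed for one burst to cover exactly one cache line."""
    if master_width_bits <= 0 or master_width_bits % 8:
        raise ValueError("master width must be a positive multiple of 8 bits")
    access_bytes = master_width_bits // 8
    if line_bytes % access_bytes:
        raise ValueError("access size must divide the cache line size evenly")
    return line_bytes // access_bytes

# Rebuild the table rows as (width in bits, access size in bytes, burst length).
table = [(w, w // 8, burst_length_for_cache_line(w)) for w in (32, 64, 128, 256, 512)]
```

Each tuple in `table` matches the corresponding row of Table 27.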

Example 4: FPGA Writing Cache Coherent Data to HPS

In this example, the HPS MPU requires access to data that originates in the FPGA. The most efficient mechanism for sharing small blocks of data with the MPU is to have logic in the FPGA perform cacheable writes to the HPS. Keep the blocks written to the HPS relatively small, because large block writes cause the L2 cache to thrash, writing to SDRAM for most of the transfer. For large buffer transfers, it is more appropriate to have the FPGA write data through the FPGA-to-HPS bridge directly to the SDRAM, as shown in Example 2.

GUIDELINE: Perform full accesses targeting the FPGA-to-HPS bridge.

For the transaction to be cacheable, the FPGA master must write to the FPGA-to-HPS bridge and utilize the cache extension signaling of the ACE-Lite protocol. For more information about the ACE-Lite protocol signaling extensions for cache coherent accesses, refer to the Cache Coherency Unit section in the Intel Agilex® 7 Hard Processor System Technical Reference Manual.

Figure 12. FPGA Writing Cache Coherent Data
For abbreviations, refer to the figure in Overview of HPS Memory-Mapped Interfaces.

GUIDELINE: When L2 ECC is enabled, ensure that cacheable accesses to the FPGA-to-HPS bridge are aligned to 8-byte boundaries.

If you enable error checking and correction (ECC) in the L2 cache, you must also ensure that each 8-byte group of data is completely written. The L2 cache performs ECC operations on 64-bit boundaries, so cacheable accesses must always be aligned to 8-byte boundaries and must write all eight byte lanes of a group at once. Failing to follow these rules results in double-bit errors, which cannot be corrected.

Regardless of whether ECC is enabled, 64-byte cache transactions result in the best performance. For more information about 64-byte cache transactions, refer to GUIDELINE: Access 64 bytes per cacheable transaction in the "Example 3: FPGA Reading Cache Coherent Data from HPS" section.

GUIDELINE: When L2 ECC is enabled, ensure that cacheable accesses to the FPGA-to-HPS bridge have groups of eight write strobes enabled.

  • For FPGA-to-HPS accesses from 32-bit FPGA masters, burst length must be 2, 4, 8, or 16 with all write byte strobes enabled.
  • For FPGA-to-HPS accesses from 64-bit FPGA masters, all write byte strobes must be enabled.
  • For FPGA-to-HPS accesses from 128-bit FPGA masters, the upper eight or lower eight (or both) write byte strobes must be enabled.
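The three write strobe rules above can be expressed as a simple check, for example in a testbench model. The following is a minimal sketch, not an Intel-provided utility: the function name `ecc_safe_write` is hypothetical, each burst is modeled as a list with one strobe mask per beat, and bit i of a mask is assumed to enable byte lane i, as with the AXI WSTRB signal. Master widths that the bullets do not cover are rejected rather than guessed.

```python
def ecc_safe_write(master_width_bits, strobes):
    """Return True if a burst satisfies the L2 ECC write strobe rules.

    strobes holds one integer mask per beat; bit i enables byte lane i.
    """
    burst_len = len(strobes)
    if master_width_bits == 32:
        # Beats must pair up into full 8-byte groups, all lanes enabled.
        return burst_len in (2, 4, 8, 16) and all(s == 0xF for s in strobes)
    if master_width_bits == 64:
        # Every beat is one 8-byte group and must be fully written.
        return burst_len > 0 and all(s == 0xFF for s in strobes)
    if master_width_bits == 128:
        # Lower eight lanes, upper eight lanes, or both; never a partial group.
        allowed = (0x00FF, 0xFF00, 0xFFFF)
        return burst_len > 0 and all(s in allowed for s in strobes)
    raise ValueError("master width not covered by the guideline")

ok_32 = ecc_safe_write(32, [0xF] * 4)             # four full 32-bit beats
bad_32 = ecc_safe_write(32, [0xF] * 3)            # odd beat count leaves half a group
bad_64 = ecc_safe_write(64, [0x0F])               # only four of eight lanes written
ok_128 = ecc_safe_write(128, [0xFF00, 0x00FF])    # one full group per beat
bad_128 = ecc_safe_write(128, [0x0FF0])           # straddles both groups partially
```

The rejected cases all have the same cause: at least one 8-byte group is only partially written, which is the condition the guideline warns against.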