AN 886: Intel® Agilex™ Device Design Guidelines

ID 683634
Date 8/26/2022
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

5.1.8.2.1. System Level Cache Coherency

Table 26.  Device Variant Checklist
Number Done? Checklist Item
1   Consider how many and which masters to use.
2   Determine how to manage cacheable accesses.
Consider how many masters and what masters to use:
  • MPU
  • DMA
  • Peripherals with master interfaces
  • Masters in FPGA connected to HPS.

    Cache coherency is a fundamental topic to understand any time data must be shared amongst multiple masters in a system. In the context of a SoC device these masters can be the MPU, DMA, peripherals with master interfaces, and masters in the FPGA connected to the HPS. Since the MPU contains level 1 and level 2 cache controllers, it can hold more up-to-date contents than main memory in the system. The HPS supports two mechanisms to make sure masters in the system observe a coherent view of memory: ensuring main memory contains the latest value, or have masters access a directory-based CCU fabric using the ACE-Lite interface.

    The MPU can allocate buffers to be non-cacheable which ensures data is never cached by the L1 and L2 caches. The MPU can also access cacheable data and either flush it to main memory or copy it to a non-cacheable buffer before other masters attempt to access the data. Operating systems typically provide mechanisms for maintaining cache coherency both ways described above.

    Masters in the system access coherent data by either relying on the MPU to place data into main memory instead of having it cached, or by having the master in the system perform a cacheable access through the CCU. The mechanism you use depends on the size of the buffer of memory the master is accessing.

    For more information, refer to the Interfacing to the FPGA section.

GUIDELINE: Ensure that data accessed through the CCU fits in the 1 MB L2 cache to avoid thrashing overhead.

Since the L2 cache is 1 MB in size, if a master in the system frequently accesses buffers whose total size exceeds 1 MB, thrashing results.

Cache thrashing is a situation where the size of the data exceeds the size of the cache, causing the cache to perform frequent evictions and prefetches to main memory. Thrashing negates the performance benefits of caching the data.

In potential thrashing situation, it makes more sense to have the masters access non-cache coherent data and allow software executing on the MPU maintain the data coherency throughout the system.

GUIDELINE: For small buffers of data shared between the MPU and system masters, consider having the system master perform cacheable accesses to avoid overhead caused by cache flushing operations.

If a master in the system requires access to smaller coherent blocks of data then you should consider having the MPU access the buffer as cacheable memory and the master in the system perform cacheable accesses to the data. Cacheable accesses to the CCU through the ACE-Lite protocol supported by the FPGA-to-HPS bridge ensure that the master and MPU access the same copy of the data. By having the MPU use cacheable buffers and the system master performing cacheable accesses, software does not have to maintain system wide coherency ensuring both the MPU and system master observe the same copy of data.