764369: Data or Unified Cache Line Maintenance by MVA Fails on Inner-Shareable Memory
Under certain timing circumstances, a data or unified cache line maintenance operation by Modified Virtual Address (MVA) that targets an inner-shareable memory region might fail to propagate to either the Point of Coherency (PoC) or to the Point of Unification (PoU) of the system.
As a consequence, the visibility of the updated data might not be guaranteed to either the instruction side, in the case of self-modifying code, or to an external non-coherent agent, such as a DMA engine.
This erratum requires a dual Cortex*-A9 MPCore* device working in Symmetric Multi-Processing (SMP) mode with the broadcasting of CP15 maintenance operations enabled.
The following scenario shows how this erratum can occur:
- One CPU performs a data or unified cache line maintenance operation by MVA targeting a memory region that is locally dirty.
- The second CPU issues a memory request targeting this same memory location within the same time frame.
A race condition can occur, resulting in the cache operation not being performed to the specified Point of Unification or Point of Coherency.
The following cache maintenance operations by MVA are affected:
- DCIMVAC: Invalidate data or unified cache line by MVA to PoC.
- DCCMVAC: Clean data or unified cache line by MVA to PoC.
- DCCMVAU: Clean data or unified cache line by MVA to PoU.
- DCCIMVAC: Clean and invalidate data or unified cache line by MVA to PoC.
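For reference, the four operations listed above can be written down as a small C lookup table. This is only an illustrative sketch: the CRm and opc2 values are the ARMv7-A CP15 encodings (each operation is issued as MCR p15, 0, <Rt>, c7, <CRm>, <opc2> with the MVA in <Rt>), while the struct, the array, and the helper name `find_affected_op` are hypothetical names introduced here and do not come from the erratum text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One affected operation, issued as:
 *   MCR p15, 0, <Rt>, c7, <CRm>, <opc2>   (MVA in <Rt>)
 * The struct and field names are hypothetical, for illustration only. */
struct mva_cache_op {
    const char *name;   /* mnemonic as used in this erratum */
    unsigned    crm;    /* CRm field of the CP15 c7 encoding */
    unsigned    opc2;   /* opc2 field of the CP15 c7 encoding */
    int         to_poc; /* 1 = operates to PoC, 0 = operates to PoU */
};

static const struct mva_cache_op affected_ops[] = {
    { "DCIMVAC",   6, 1, 1 },  /* invalidate by MVA to PoC */
    { "DCCMVAC",  10, 1, 1 },  /* clean by MVA to PoC */
    { "DCCMVAU",  11, 1, 0 },  /* clean by MVA to PoU */
    { "DCCIMVAC", 14, 1, 1 },  /* clean and invalidate by MVA to PoC */
};

static_assert(sizeof(affected_ops) / sizeof(affected_ops[0]) == 4,
              "the erratum lists exactly four affected operations");

/* Return the table entry for a mnemonic, or NULL if the operation
 * is not one of the four listed in this erratum. */
const struct mva_cache_op *find_affected_op(const char *name)
{
    for (size_t i = 0; i < sizeof(affected_ops) / sizeof(affected_ops[0]); i++) {
        if (strcmp(affected_ops[i].name, name) == 0)
            return &affected_ops[i];
    }
    return NULL;
}
```

A table like this might be used when auditing low-level code for affected cache maintenance instructions; operations that are not by MVA are not in the list and the lookup returns NULL for them.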
This erratum can occur when the second CPU performs any of the following operations:
- A read request resulting from a Load instruction; the Load might be a speculative one.
- A write request resulting from any Store instruction.
- A data prefetch resulting from a PLD instruction; the PLD might be a speculative one.
Because it is uncertain whether execution of the cache maintenance operation propagates to either the Point of Unification or the Point of Coherency, stale data might remain in the data cache and not become visible to other agents that should have gained visibility of it.
Note that the data remains coherent on the L1 data side. Any data read from the other processor in the Cortex*-A9 MPCore* cluster, or from the Accelerator Coherency Port (ACP), sees the correct data. In the same way, any write to the same cache line from the other processor in the Cortex*-A9 MPCore* cluster, or from the ACP, does not cause data corruption through the loss of data from either writer.
Consequently, the failure can only impact non-coherent agents in the system. These agents can be either the instruction cache of the processor, in the case of self-modifying code, or any non-coherent external agent in the system such as a DMA engine.
Two workarounds are available for this erratum.
The first workaround requires the three following elements to be applied together:
- Set bit in the undocumented SCU Diagnostic Control register located at offset 0x30 from the PERIPHBASE address. Setting this bit disables the “migratory line” feature and forces a dirty cache line to be evicted to the lower memory subsystem, which is both the Point of Coherency and the Point of Unification, when it is being read by another processor. Note that this bit can be written, but is always read as zero.
- Insert a DSB instruction before the cache maintenance operation. Note that if the cache maintenance operation executes within a loop that performs no other memory operations, ARM recommends adding a single DSB before entering the loop rather than one per iteration.
- Ensure there is no false sharing (at cache-line granularity) for self-modifying code or for data produced for an external non-coherent agent such as a DMA engine. For systems that cannot prevent false sharing in these regions, this third step can be replaced by performing the sequence of DSB followed by a cache maintenance operation twice.
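The three elements of the first workaround can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a vendor-supplied implementation. The function names are hypothetical. The SCU bit position is assumed to be bit 0, because the text above does not name the bit (the Linux kernel's handling of this erratum sets bit 0 at the same offset), so confirm it against the vendor documentation. The 32-byte line size is that of the Cortex-A9 L1 caches. The `periphbase` pointer is assumed to be an already-mapped view of the PERIPHBASE region. On non-ARM builds the barrier and cache operation compile to empty stubs so that the address arithmetic can be exercised on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE   32u        /* Cortex-A9 L1 cache line size, in bytes */
#define SCU_DIAG_CTRL_OFF 0x30u      /* offset from PERIPHBASE, per the text above */
#define SCU_DIAG_BIT      (1u << 0)  /* ASSUMPTION: bit 0; the erratum text omits the bit number */

#if defined(__arm__) && defined(__GNUC__)
static inline void dsb(void) { __asm__ volatile("dsb" ::: "memory"); }
static inline void dccmvac(uintptr_t mva)
{
    /* DCCMVAC: clean data or unified cache line by MVA to PoC */
    __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" :: "r"(mva) : "memory");
}
#else
/* Host stubs: no hardware effect, they only let the sketch compile and run. */
static inline void dsb(void) { }
static inline void dccmvac(uintptr_t mva) { (void)mva; }
#endif

/* Element 1: set the diagnostic bit that disables the "migratory line"
 * feature. Read-modify-write keeps any other bits as read; the target bit
 * itself always reads back as zero on hardware. */
void scu_disable_migratory_lines(volatile uint32_t *periphbase)
{
    periphbase[SCU_DIAG_CTRL_OFF / sizeof(uint32_t)] |= SCU_DIAG_BIT;
}

/* Elements 2 and 3: clean [buf, buf + len) to the PoC, one line at a time.
 * A DSB is issued once before each pass over the range. Pass passes = 2
 * when false sharing cannot be ruled out, which repeats the DSB plus
 * maintenance sequence. Returns the number of cache lines cleaned per pass. */
size_t clean_range_to_poc(const void *buf, size_t len, int passes)
{
    assert(passes >= 1);
    if (len == 0)
        return 0;

    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    size_t lines = 0;

    for (int p = 0; p < passes; p++) {
        lines = 0;
        dsb();                       /* DSB before the maintenance loop */
        for (uintptr_t mva = start; mva < end; mva += CACHE_LINE_SIZE) {
            dccmvac(mva);
            lines++;
        }
    }
    dsb();                           /* wait for the maintenance to complete */
    return lines;
}
```

To avoid false sharing in the first place, a buffer handed to a non-coherent agent could be declared with `_Alignas(32)` and sized as a multiple of 32 bytes, so that no unrelated data shares its first or last cache line; `clean_range_to_poc(buf, sizeof buf, 1)` then covers exactly the lines the buffer owns.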
Note that even when all three components of the workaround are in place, this erratum might still occur. However, this occurrence would require some extremely rare and complex timing conditions, so that the probability of reaching the point of failure is extremely low. This low probability, along with the fact that this erratum requires an uncommon software scenario, explains why this workaround is likely to be a reliable practical solution for most systems.
To ARM's knowledge, no failure has been observed in any system when all three components of this workaround have been implemented.
For critical systems that cannot cope with the extremely low failure risk associated with the above workaround, a second workaround is possible: change the mapping of the data being accessed so that it is in a non-cacheable area. This ensures that the written data remains uncached, which means it is always visible to non-coherent agents in the system, or to the instruction side in the case of self-modifying code, without any need for cache maintenance operations.