Developer Guide

  • 2022.1
  • 09/08/2022
  • Public

Cache Locking with Software SRAM

Software SRAM is a software construct that uses hardware capabilities to lock a portion of the physical address space into the cache. Once a software SRAM buffer is created, its addresses are protected from cache eviction by the same or other processes. The software SRAM must be configured before the cache allocation library can be used to allocate low-latency buffers.
The application uses the real-time configuration driver to access the software SRAM segment that was locked during system boot. Applications can allocate memory from this segment, up to the size defined in the file, with 4 KiB (default) page granularity. The driver supports simultaneous memory allocation by multiple applications from one or more software SRAM segments.
By default, the cache sizes are set to zero, which means all allocated cache is available to applications. You can replace the zeros with real values to limit the cache available to an application.
The following image shows the cache architecture of an example processor. The example contains four cores and two levels of cache. Each core has a private L1 cache instance. Cores 0 and 1 share one L2 cache instance, while Cores 2 and 3 share the other L2 cache instance.
The affinity mask of the application should match the cores that were selected during software SRAM configuration. The following image shows an application that runs on Core 1 and the software SRAM segment created in the associated L2 cache instance. This software SRAM segment cannot be accessed from Core 2 or Core 3 at the requested latency. Use the cpuid argument of the call to “pin” your application to a specific core.
The cache allocation library takes requirements from the application and allocates memory from the correct software SRAM segment according to the affinity and latency value described in How the System Allocates Buffers. If the library cannot find the applicable software SRAM segment or there is no more memory in any applicable segment, the library returns a NULL pointer.
Software SRAM buffers that are created in the shared L3 cache may be accessed from any core. However, extra care must be taken with software SRAM buffers created in L2 caches to prevent unintended eviction of the buffer. This requires coordination between the application and the OS scheduler to assign affinities appropriate for the application based on the L2 software SRAM region it intends to access. If the application is subsequently migrated to a different core that does not share the same L2 cache and continues to access the L2 software SRAM buffer, the performance of the software SRAM may diminish.
The following image shows how the library can work with multiple applications. To understand how it works, imagine three software SRAM segments and three applications on different cores.
The software SRAM segment0 has a CPU affinity mask that applies to core0 and core1 but not to core2 and core3. Segment1 applies to all cores, but the latency of this segment is 159 ns. Segment2 applies to core2 and core3. A real system can also have a segment that applies to only one core.
App0, which runs on core1, allocates memory from segment0 and segment1. App1, which runs on core2, allocates memory from segment1 and segment2. App2, which runs on core3, allocates memory from segment2.
App0 cannot allocate memory from segment2 because CPU mask bits for the segment are zero. App1 and App2 cannot allocate memory from segment0. But all applications can allocate memory from segment1, because all bits in the CPU mask are set.
When the cache allocation library allocates memory according to user requirements, it chooses the slowest memory that satisfies the latency and size requirements. For example, if App1 requests memory with 180 ns latency, segment1 and segment2 satisfy that requirement. If segment1 has enough memory, the buffer is allocated in segment1 because it has the higher latency. If segment1 does not have enough memory, the next-fastest segment is chosen, and so on.
