- Provide real-time applications with low-latency buffer access through software SRAM buffers, exposed via the cache allocation library API.
- Provide mechanisms to improve the worst-case execution time (WCET).
- Minimize the impact the GPU has on real-time applications running on the CPU cores.
- Partition the shared cache resources among the components that use the cache (such as CPU, GPU, or I/O), referred to in this guide as caching agents.
- Select from a variety of preset cache partitioning schemes. The presets provide varying levels of cache isolation and software SRAM to cover the most typical scenarios, so one of them is likely to suit your use case.
- Create a custom partitioning scheme. If you need a custom or more flexible setup, the tool offers an interactive interface to guide you through the process of adding or deleting software SRAM, as well as dividing the remaining cache among caching agents.
What Is a Cache Partitioning Scheme?
- First, determine how much of the cache should be reserved for software SRAM regions. Once cache space is reserved for software SRAM, it is no longer available to the rest of the system and is only accessible via the Cache Allocation Library.
- Determine how to partition the remaining cache between CPU cores, GPU, and I/O. Considerations:
- Sharing cache between multiple caching agents (CPU cores, GPU, and I/O) generally leads to increased jitter under loaded conditions.
- Isolating cores, GPU, and I/O from one another reduces the noisy neighbor effect.
- If App1 and App2 are affinitized to Core 1 and Core 2, respectively, consider using Classes of Service to differentiate the cache space available to each core. Intel supports multiple Classes of Service, which enable Core 1 to have a separate, potentially non-overlapping cache region from Core 2. If App1 is a real-time application, dedicated cache space may be desirable to minimize the impact App2 has on App1’s performance.
- On several Intel® processors, real-time I/O traffic (designated via Traffic Class 1) can allocate directly into the L3 cache. If the I/O traffic is time sensitive, the CPU can access the data faster when it resides in the cache instead of DRAM. Consider allocating a small portion of the cache for I/O traffic. The following processors support this feature:
- Intel® Xeon® W-11865MRE Processor
- Intel® Xeon® W-11865MLE Processor
- Intel® Xeon® W-11555MRE Processor
- Intel® Xeon® W-11555MLE Processor
- Intel® Xeon® W-11155MRE Processor
- Intel® Xeon® W-11155MLE Processor
- 11th Generation Intel® Core™ i3-1115GRE Processor
- 11th Generation Intel® Core™ i5-1145GRE Processor
- 11th Generation Intel® Core™ i7-1185GRE Processor
- If the integrated GPU is going to be used, consider minimizing the portion of the L3 cache available to the GPU. By default, Intel enables maximum GPU performance by providing access to the entire L3. For real-time designs, maximum GPU performance is often not needed and a smaller portion of the L3 cache can be used. Carefully selecting the cache available to the GPU, so that it does not overlap with regions dedicated to real-time applications, reduces the noisy neighbor effect of the GPU.
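The partitioning considerations above can be illustrated with a minimal sketch. It assumes that the cache is divided into ways and that each caching agent receives a contiguous bitmask of ways, as is common for way-based cache allocation. The 12-way cache, the agent names, and the helper functions `contiguous_mask`, `partition_ways`, and `overlaps` are hypothetical and are not part of the cache configurator or any Intel API.

```python
from __future__ import annotations


def contiguous_mask(start_way: int, num_ways: int) -> int:
    """Return a bitmask with num_ways contiguous bits set, starting at start_way."""
    if num_ways <= 0:
        raise ValueError("num_ways must be positive")
    return ((1 << num_ways) - 1) << start_way


def partition_ways(total_ways: int, requests: list[tuple[str, int]]) -> dict[str, int]:
    """Give each agent its own contiguous, non-overlapping block of cache ways."""
    masks: dict[str, int] = {}
    next_way = 0
    for agent, num_ways in requests:
        if next_way + num_ways > total_ways:
            raise ValueError(f"not enough cache ways left for {agent}")
        masks[agent] = contiguous_mask(next_way, num_ways)
        next_way += num_ways
    return masks


def overlaps(masks: dict[str, int]) -> list[tuple[str, str]]:
    """List every pair of agents whose way masks share at least one way."""
    names = list(masks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if masks[a] & masks[b]
    ]


# Hypothetical layout for a 12-way cache (all sizes are illustrative only).
layout = partition_ways(
    12,
    [
        ("software_sram", 2),     # reserved first, removed from the shared pool
        ("rt_core", 4),           # dedicated space for the real-time core
        ("best_effort_core", 3),  # core running the non-real-time application
        ("gpu", 2),               # deliberately small share for the GPU
        ("io", 1),                # small share for time-sensitive I/O traffic
    ],
)

for agent, mask in layout.items():
    print(f"{agent:>17}: {mask:#014b}")
```

An empty result from `overlaps(layout)` indicates that no two agents share a way, which corresponds to the fully isolated case discussed above. A scheme that intentionally shares ways between, say, the GPU and a best-effort core would show up as a pair in that list.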
Cache Configurator and Software SRAM Setting
Cache Configurator and Cache Allocation Library
- Size of the buffer required
- Worst-case access latency for a single element in the buffer
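As a rough illustration of how these two parameters could drive buffer placement, the sketch below picks a memory level for a request. The selection policy (choose the slowest level that still meets the latency bound, to preserve scarcer fast memory), the `MemoryLevel` type, the `pick_level` function, and all latency and capacity figures are assumptions for illustration. They do not describe the actual behavior or API of the cache allocation library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemoryLevel:
    name: str
    latency_ns: int  # hypothetical worst-case access latency for one element
    free_bytes: int  # hypothetical capacity still available at this level


def pick_level(levels: list[MemoryLevel], size: int, max_latency_ns: int) -> MemoryLevel | None:
    """Return the slowest level that fits the buffer and meets the latency bound.

    Returns None when no level satisfies both constraints.
    """
    candidates = [
        lvl for lvl in levels
        if lvl.latency_ns <= max_latency_ns and lvl.free_bytes >= size
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda lvl: lvl.latency_ns)


# Illustrative figures only; real values depend on the platform and configuration.
levels = [
    MemoryLevel("L2 software SRAM", latency_ns=10, free_bytes=256 * 1024),
    MemoryLevel("L3 software SRAM", latency_ns=50, free_bytes=1024 * 1024),
    MemoryLevel("DRAM", latency_ns=200, free_bytes=1024 * 1024 * 1024),
]

choice = pick_level(levels, size=64 * 1024, max_latency_ns=60)
print(choice.name if choice else "no level satisfies the request")
```

Under this assumed policy, a request with a loose latency bound falls through to DRAM and leaves software SRAM free for buffers that actually need it.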
- Real-time configuration driver at the OS level.
- Real-time configuration manager (RTCM) with the cache reservation library (CRL), or a hypervisor that supports CRL.
- Real-time configuration data (RTCD) and real-time configuration table (RTCT) at the BIOS level.