How the System Allocates Buffers
The cache allocation library performs buffer allocation in L2 or L3 cache by accessing an already prepared software SRAM buffer.
The library processes the latency parameter by comparing it to the latency values received from an OS driver called the real-time configuration driver. Using the following values as an example (11th Gen Intel® Core™ processors), the library allocates buffers as follows:

- If the latency parameter is lower than 6 ns, the library cannot create a buffer because the latency is too low. The library returns a null pointer.
- If the latency parameter is in the range of 6 to 48 ns, the library allocates a buffer in L2.
- If the latency parameter is in the range of 49 to 113 ns, the library allocates a buffer in L3.
- If the latency parameter is 114 ns or more, the library allocates a buffer in DRAM using standard memory functions (malloc, calloc, realloc, free).
- If you were to move your application to a platform with only L2 cache and DRAM, the library would allocate a buffer in L2 for any latency request in the L3 range.
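The tier selection described above can be sketched as follows. This is a rough illustration, not the library's actual API: the function name and constants are hypothetical, and the threshold values are the example numbers quoted for 11th Gen Intel® Core™ processors (the real library receives them from the real-time configuration driver).

```python
# Hypothetical sketch of the documented latency-to-tier mapping.
# Threshold values are the 11th Gen Intel Core example numbers above;
# the real library reads them from the real-time configuration driver.

L2_MIN_NS, L2_MAX_NS = 6, 48   # example L2 latency band
L3_MAX_NS = 113                # 114 ns or more falls through to DRAM

def select_tier(latency_ns):
    """Return the memory tier a buffer request would land in, or None."""
    if latency_ns < L2_MIN_NS:
        return None            # too low: the library returns a null pointer
    if latency_ns <= L2_MAX_NS:
        return "L2"
    if latency_ns <= L3_MAX_NS:
        return "L3"
    return "DRAM"              # standard malloc/calloc/realloc/free
```

On a platform with only L2 cache and DRAM, the library would map the L3 band to L2 as noted above; this sketch shows only the example configuration.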
The library processes the buffer size parameter by comparing it to the values in the config file, config/.tcc.config. The buffer size values in the config file are the maximum size allowed for one application. The total size of a software SRAM buffer is defined using Cache Configurator and may be shared by multiple applications. Increasing the buffer size value in the configuration file beyond the software SRAM size set in Cache Configurator has no effect.
By default, the cache size limits are set to zero, which means all allocated cache is available to applications. You can replace the zeros with actual values to limit the cache available to an application.
The configuration file is shared among all applications that access software SRAM. The buffer size value can be used to limit software SRAM consumption by one application and to prevent any application from taking all software SRAM.
Buffer size values for DRAM are an internal mechanism and have no limiting effect on the requested buffer size.
The following table shows a summary of the default buffer size values in the config file.
| Maximum L2 Buffer Size (bytes) | Maximum L3 Buffer Size (bytes) | Maximum DRAM Buffer Size (bytes) |
|---|---|---|
| 98304 | 262144 | Unlimited |
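The per-application limit check implied by these defaults can be sketched as simple bounds checking. This is a hypothetical helper, not the library's code; the dictionary mirrors the table above.

```python
# Hypothetical sketch of the per-application size limit implied by the
# config file defaults in the table above. Names are illustrative only.

MAX_BUFFER_BYTES = {"L2": 98304, "L3": 262144, "DRAM": None}  # None = unlimited

def request_allowed(tier, requested_bytes):
    """True if the requested size fits within the tier's configured maximum."""
    limit = MAX_BUFFER_BYTES[tier]
    return limit is None or requested_bytes <= limit
```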
You can allocate multiple buffers in one application, up to the size in .tcc.config. The library reserves 32 bytes of the empty buffer for service data, plus 8 bytes per allocation. Therefore, each allocation consumes the requested buffer size + 8 bytes. The current implementation does not support memory defragmentation. After allocating and freeing multiple buffers, the software SRAM may become highly fragmented, making it impossible to allocate large buffers. Intel does not address fragmentation because software designs using software SRAM are expected to preallocate memory upfront for the lifetime of the application.
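The bookkeeping overhead described above can be expressed as simple accounting. This is a hypothetical helper for illustration; the 32-byte and 8-byte figures are the ones quoted in this section.

```python
# Illustrative accounting of the overhead described above: 32 bytes of
# service data come out of the empty buffer, and each allocation costs
# its requested size plus 8 bytes of bookkeeping. Not the library's code.

SERVICE_DATA_BYTES = 32
PER_ALLOC_OVERHEAD = 8

def bytes_remaining(total_bytes, requested_sizes):
    """Free bytes left in a software SRAM buffer after the given allocations."""
    used = SERVICE_DATA_BYTES + sum(s + PER_ALLOC_OVERHEAD for s in requested_sizes)
    return total_bytes - used
```

Note that this models capacity only; as the text explains, fragmentation after repeated allocate/free cycles can make a large allocation fail even when enough total bytes remain.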
Buffer Performance
In Determining Cache Allocation Library Inputs, the latency
values in the table indicate the worst-case buffer access times for
the cache. Worst-case buffer access time refers to the observed longest
amount of time for the application to access an element from the buffer
from its location in the memory hierarchy. Generally, the best-case
buffer access time will occur when the buffer resides in the L1 cache
and the system is nearly idle, and the worst-case buffer access time
will occur when the buffer resides in DRAM and the system is extremely
congested.
The latency values in the table are based on the worst possible
environment. The following factors contribute to this environment:
- Memory access pattern of the real-time application
- Load
Memory Access Pattern
The time that it takes to access an element of the buffer is directly
related to where the element currently resides in the memory hierarchy.
Although the cache allocation library has the ability to “lock”
buffers into specific levels of the memory hierarchy, the actual level
where an element will be found depends on the pattern in which the buffer is accessed. This buffer access pattern is the manner in which the real-time application reads buffer elements. The application can read buffer elements in an arbitrary order; however, linear and random access are the two limiting cases, and every access pattern falls somewhere between them.
In addition to the spatial locality of the access pattern (linear
vs. random), the delay between references plays a part in where in the
memory hierarchy the buffer element would be found. Arithmetic intensity
is a measure of the amount of compute done on each element retrieved
from memory. A high arithmetic intensity would result in the CPU taking
longer to reference the next element due to the sheer number of
calculations that it must perform on the current element before
referencing the next element. A low arithmetic intensity would result in
near back-to-back memory references.
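Arithmetic intensity can be illustrated with a generic sketch (not tied to the library): the inner loop stands in for the computation performed on each element, which is what delays the next memory reference.

```python
# Illustrative sketch of arithmetic intensity: compute performed per
# element fetched. The `work` parameter is a stand-in for the number of
# calculations done on each element before the next memory reference.

def process(buffer, work):
    """Sum elements, doing `work` extra multiply-adds per element fetched."""
    total = 0.0
    for x in buffer:           # one memory reference per element
        v = float(x)
        for _ in range(work):  # high `work` = high arithmetic intensity:
            v = v * 1.000001 + 1e-9  # the next reference is delayed by compute
        total += v
    return total
```

With `work = 0`, references are near back-to-back (low arithmetic intensity); with large `work`, the hardware prefetchers have ample time to read ahead of a linear traversal.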
With a sufficiently high arithmetic intensity and a sufficiently linear
access pattern, the hardware prefetchers would be able to “read-ahead”
and prime the cache with the data.

In contrast, the worst-case access pattern is random pointer-chase.

In a random pointer-chase, the application traverses a linked list whose elements are laid out in random order. Because of the random nature of the pointer-chase, there is no spatial locality, and the data is generally not prefetched into cache.
As a result, each access of an element in the buffer incurs the latency
for whatever level of the memory hierarchy that the library “locked” the buffer
into.
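A random pointer-chase can be sketched as a linked list threaded through a buffer in shuffled order. This is an illustrative micro-example, not the library's code; each hop targets an unpredictable index, so there is no spatial locality for the prefetchers to exploit.

```python
import random

# Illustrative random pointer-chase: a "next-index" chain threaded
# through a buffer in shuffled order, forming one cycle over all slots.
# Each hop lands at an unpredictable index, defeating spatial prefetch.

def build_chase(n, seed=0):
    """Return a next-index array forming one random cycle over n slots."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]  # each slot points to the next in shuffled order
    return nxt

def chase(nxt, start=0):
    """Follow the chain once around the cycle; returns the number of hops."""
    hops, i = 0, start
    while True:
        i = nxt[i]
        hops += 1
        if i == start:
            return hops
```

In a real measurement, each hop through such a chain pays the full latency of whichever memory-hierarchy level holds the buffer, which is why this pattern approximates the worst-case access times in the table.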
Load
Load refers to the congestion of the system from a so-called
noisy-neighbor workload. The noisy neighbor impacts the buffer access
time whenever it is competing for shared resources with the function
attempting to access the buffer. Some of these shared resources could
include:
- Core resource congestion due to another hardware thread (Hyper-Threading)
- Translation Lookaside Buffer (TLB) congestion
- Congestion in any shared path or buffers between the core and the cache or memory
- Congestion in the shared cache itself
- Congestion in the Memory Controller
The worst-case interference exploits congestion of these shared
resources.
Summary
The cache allocation library uses latency values that were collected under the worst possible combination of memory access pattern and load. It is expected that, under the conditions in your environment, the actual measured access time will never exceed the worst-case time from the table.