In Determining Cache Allocation Library Inputs, the latency
values in the table indicate the worst-case buffer access times for
the cache. Worst-case buffer access time refers to the observed longest
amount of time for the application to access an element from the buffer
from its location in the memory hierarchy. Generally, the best-case
buffer access time will occur when the buffer resides in the L1 cache
and the system is nearly idle, and the worst-case buffer access time
will occur when the buffer resides in DRAM and the system is extremely busy.
The latency values in the table are based on the worst possible
environment. The following factors contribute to this environment:
Memory access pattern of the real-time application
The time that it takes to access an element of the buffer is directly
related to where the element currently resides in the memory hierarchy.
Although the cache allocation library has the ability to “lock”
buffers into specific levels of the memory hierarchy, the actual level
where an element will be found is dependent on the way or pattern in
which the buffer is accessed. This buffer access pattern is the manner
in which the real-time application reads buffer elements. The
application can read buffer elements in an arbitrary manner. However, linear and random access are the limit cases and the spectrum of access lies between the two.
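The two limit cases can be sketched in C as follows; the buffer length and element type here are arbitrary choices for illustration, not values from the library:

```c
#include <stddef.h>

#define BUF_LEN 4096   /* arbitrary buffer length for illustration */

/* Linear access: consecutive elements, maximal spatial locality;
 * the hardware prefetchers can easily read ahead. */
long sum_linear(const int *buf)
{
    long sum = 0;
    for (size_t i = 0; i < BUF_LEN; i++)
        sum += buf[i];
    return sum;
}

/* Random access: indices come from a precomputed permutation, so
 * consecutive references share no spatial locality. */
long sum_random(const int *buf, const size_t *perm)
{
    long sum = 0;
    for (size_t i = 0; i < BUF_LEN; i++)
        sum += buf[perm[i]];
    return sum;
}
```

Both functions read every element exactly once; only the order of the references differs, and that order alone is enough to move the access time between the best and worst cases.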
In addition to the spatial locality of the access pattern (linear
vs. random), the delay between references plays a part in where in the
memory hierarchy the buffer element would be found. Arithmetic intensity
is a measure of the amount of compute done on each element retrieved
from memory. A high arithmetic intensity would result in the CPU taking
longer to reference the next element due to the sheer number of
calculations that it must perform on the current element before
referencing the next element. A low arithmetic intensity would result in
near back-to-back memory references.
With a sufficiently high arithmetic intensity and a sufficiently linear
access pattern, the hardware prefetchers would be able to “read-ahead”
and prime the cache with the data.
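The contrast in arithmetic intensity can be sketched as two loops over the same buffer; the inner-loop iteration count is an arbitrary stand-in for real per-element computation:

```c
#include <stddef.h>

#define N 1024   /* arbitrary element count for illustration */

/* Low arithmetic intensity: one addition per element loaded, so the
 * next memory reference is issued almost immediately. */
double low_intensity(const double *buf)
{
    double acc = 0.0;
    for (size_t i = 0; i < N; i++)
        acc += buf[i];
    return acc;
}

/* High arithmetic intensity: many operations per element loaded.
 * The compute time between loads gives the hardware prefetchers a
 * window to prime the cache with upcoming elements. */
double high_intensity(const double *buf)
{
    double acc = 0.0;
    for (size_t i = 0; i < N; i++) {
        double x = buf[i];
        for (int k = 0; k < 100; k++)     /* stand-in computation */
            x = x * 1.0000001 + 1.0e-9;
        acc += x;
    }
    return acc;
}
```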
In contrast, the worst-case access pattern is a random pointer-chase: a
traversal of a linked list whose nodes are scattered through the buffer
in random order. Because each step jumps to an unpredictable address,
there is no spatial locality, and the data is generally not prefetched
into the cache.
As a result, each access of an element in the buffer incurs the latency
of whatever level of the memory hierarchy the library “locked” the
buffer into.
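A minimal pointer-chase can be built by threading a linked list through a buffer in shuffled order. This is an illustrative sketch, not part of the library: the 64-byte node size assumes a common cache-line size, and `rand()` is used only as a simple shuffle source.

```c
#include <stdlib.h>
#include <stddef.h>

/* List node embedded in the buffer; padded so each node occupies
 * roughly one 64-byte cache line (assumed line size). */
struct node {
    struct node *next;
    char pad[56];
};

/* Link the nodes in a random order so that every ->next dereference
 * jumps to an unpredictable address, defeating the prefetchers. */
struct node *build_chase(struct node *nodes, size_t n, unsigned seed)
{
    size_t *order = malloc(n * sizeof *order);
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
    for (size_t i = 0; i + 1 < n; i++)
        nodes[order[i]].next = &nodes[order[i + 1]];
    nodes[order[n - 1]].next = NULL;
    struct node *head = &nodes[order[0]];
    free(order);
    return head;
}

/* Walk the chase; with a large buffer, each step typically misses in
 * every cache level below wherever the buffer is locked. */
size_t chase_length(const struct node *head)
{
    size_t len = 0;
    for (; head != NULL; head = head->next)
        len++;
    return len;
}
```

Timing a walk of such a list is a common way to observe per-element latency, because each dereference must complete before the next address is known.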
Load refers to congestion of the system caused by a so-called
noisy-neighbor workload. The noisy neighbor impacts the buffer access
time whenever it competes for shared resources with the function
attempting to access the buffer. Some of these shared resources include:
Core resource congestion due to another hardware thread (Hyper-Threading)
Translation Lookaside Buffer (TLB) congestion
Congestion in any shared path or buffers between the core and the
cache or memory
Congestion in the shared cache itself
Congestion in the Memory Controller
The worst-case interference exploits congestion of these shared
resources.
The cache allocation library uses latency values that were collected
with the worst possible combination of memory access pattern and load.
It is expected that, under the conditions in your environment, the
actual measured access time will never exceed the worst-case time from
the table.