A more subtle cost is "cache cooling". Processors keep recently accessed
data in cache memory, which is very fast, but also relatively small
compared to main memory. When the processor runs out of cache memory, it
has to evict items from cache and put them back into main memory.
Typically, it chooses the least recently used items in the cache. (The
reality of set-associative caches is a bit more complicated, but this is
not a cache primer.) When a logical thread gets its time slice, each piece of data it references for the first time is pulled into cache, taking hundreds of cycles. If the data is referenced frequently
enough to not be evicted, each subsequent reference will find it in
cache, and only take a few cycles. Such data is called “hot in cache”.
Time slicing undoes this: if thread A finishes its time slice and thread B subsequently runs on the same physical thread, B will tend to evict data that was hot in cache for A, unless both threads need the data. When thread A gets its next time slice, it will need to reload the
evicted data, at the cost of hundreds of cycles for each cache miss. Or
worse yet, the next time slice for thread A may be on a different
physical thread that has a different cache altogether.