Generally, prefer software-controlled prefetch in situations where all of the following are true: irregular access patterns are present, short arrays must be prefetched, and making changes to existing application code is acceptable. In practice, the advantages and disadvantages of hardware and software prefetching must be weighed against the needs of each situation.
Software-controlled prefetch is not intended for prefetching code. Using it for that purpose can incur significant penalties on a multiprocessor system when code is shared.
Software prefetching has the following characteristics:
- Can handle irregular access patterns, which do not trigger the hardware prefetcher.
- Can use less bus bandwidth than hardware prefetching.
- Software prefetches must be added to new code, and they do not benefit existing applications.
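As an illustration of the first point, the following is a minimal sketch of a software prefetch in a loop with an irregular (indirectly indexed) access pattern. It assumes a GCC- or Clang-compatible compiler and uses the `__builtin_prefetch` builtin, which this text does not name. The function `gather_sum` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance of 8 elements is a placeholder that would need tuning by measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in loop iterations. Tune by measurement. */
#define PREFETCH_DISTANCE 8

/* Sum table[idx[i]] for i in [0, n). The index stream idx[] is read
 * sequentially, but the accesses into table[] follow no regular stride,
 * so a stride-based hardware prefetcher is unlikely to cover them. */
double gather_sum(const double *table, const size_t *idx, size_t n)
{
    assert(table != NULL && idx != NULL);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Request the element needed PREFETCH_DISTANCE iterations from now.
         * The bounds check keeps the read of idx[] inside the array and
         * avoids issuing prefetches for data that will never be used. */
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&table[idx[i + PREFETCH_DISTANCE]],
                               0 /* read access */,
                               3 /* keep in all cache levels */);
        }
        sum += table[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint, so the function returns the same result with or without it; the benefit, if any, is in latency and has to be confirmed by profiling.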
There are different strengths and weaknesses to software and hardware prefetching on the Pentium 4 processor. The characteristics of the hardware prefetching are as follows (compare with the software prefetching features listed above):
- Works with existing applications.
- Requires regular access patterns.
- Start-up penalty before the hardware prefetcher triggers and extra fetches after the array finishes. For short arrays, this overhead can reduce the effectiveness of the hardware prefetcher.
  - The hardware prefetcher requires a couple of misses before it starts operating.
  - Hardware prefetching will generate a request for data beyond the end of an array, which will not be utilized. This behavior wastes bus bandwidth. In addition, this behavior results in a start-up penalty when fetching the beginning of the next array; this occurs because the wasted prefetch should have been used instead to hide the latency for the initial data in the next array. Software prefetching can recognize and handle these cases.
- Will not prefetch across a 4-KByte page boundary (i.e., the program must initiate demand loads for the new page before the hardware prefetcher starts prefetching from that page).
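The short-array case described above can be handled in software roughly as follows. This is a sketch under stated assumptions, not a tuned implementation: it assumes a GCC- or Clang-compatible compiler (`__builtin_prefetch`), a 64-byte cache line, and a hypothetical function `sum_short_arrays` that processes a sequence of short arrays. It never prefetches past the end of the current array, and it prefetches the start of the next array ahead of time instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache line, 4-byte float, so 16 floats per line. */
#define LINE_FLOATS 16

/* Sum the elements of `count` short arrays, where arrays[k] has lens[k]
 * elements. */
float sum_short_arrays(const float *const *arrays, const size_t *lens,
                       size_t count)
{
    assert(arrays != NULL && lens != NULL);

    float total = 0.0f;
    for (size_t k = 0; k < count; k++) {
        const float *a = arrays[k];
        size_t n = lens[k];

        /* Before working on this array, request the first line of the
         * next one, so its initial data is not fetched on demand. */
        if (k + 1 < count && lens[k + 1] > 0) {
            __builtin_prefetch(arrays[k + 1], 0 /* read */, 3);
        }

        for (size_t i = 0; i < n; i++) {
            /* Once per line, request the next line of this array, but
             * only if that line still starts inside the array. */
            if (i % LINE_FLOATS == 0 && i + LINE_FLOATS < n) {
                __builtin_prefetch(&a[i + LINE_FLOATS], 0 /* read */, 3);
            }
            total += a[i];
        }
    }
    return total;
}
```

Because the array lengths are known to the program, the bounds checks cost little and avoid both the wasted requests past the end of an array and the on-demand fetch of the start of the next one.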