Measuring Memory Bandwidth on Intel® Xeon® Processor 7500 Platform
This paper shows how to reproduce memory bandwidth measurements for the Intel® Xeon® processor 7500 series platform, and why the result of the STREAM benchmark doesn’t always answer the question “What is the maximum memory bandwidth of this platform?”
Introduction: The STREAM benchmark was created by John McCalpin while at the University of Virginia. Details about the benchmark, along with source code and some binaries, are available at http://www.cs.virginia.edu/stream/. It is generally accepted that the best measure of a platform’s memory bandwidth is the STREAM Triad result, which performs the operation a(i) = b(i) + q*c(i), where q is a scalar and i indexes the elements of the arrays a, b and c. The arrays are sized to be at least 2X the size of the largest cache in the system, so that the data is always read from and written to main memory rather than the on-die caches. Each iteration of the STREAM Triad performs two reads from memory (b(i) and c(i)) and one write (the result, a(i)). With 8-byte double-precision elements, that is a total of 24 bytes of data transferred per iteration. The memory bandwidth is then calculated from how many iterations can be completed in a given amount of time: 1 million iterations in 1 second would result in a score of 24 MB/s.
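As a rough illustration of how the Triad score is derived, the C sketch below shows the kernel and the bytes-per-iteration accounting described above. It assumes 8-byte `double` elements, as in the reference STREAM code. The function names `triad` and `triad_mb_per_s` are hypothetical, and this is not the benchmark's own source, which also handles timing, repeated trials and result validation.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes STREAM credits to one Triad iteration: two 8-byte reads (b[i], c[i])
   plus one 8-byte write (a[i]). */
#define TRIAD_BYTES_PER_ITER (3 * sizeof(double))

/* Triad kernel: a[i] = b[i] + q * c[i], where q is a scalar. */
void triad(double *a, const double *b, const double *c, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + q * c[i];
}

/* Bandwidth in MB/s (1 MB = 10^6 bytes, as STREAM reports it) for a run of
   `iterations` Triad iterations that took `seconds` of wall-clock time. */
double triad_mb_per_s(size_t iterations, double seconds)
{
    assert(seconds > 0.0);
    return (double)(TRIAD_BYTES_PER_ITER * iterations) / seconds / 1.0e6;
}
```

For example, `triad_mb_per_s(1000000, 1.0)` returns 24.0, matching the 24 MB/s figure above.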
Does STREAM tell the whole story? As noted above, the STREAM Triad performs two reads and one write to memory. Any data transferred to or from memory beyond these three transactions is not counted in the bandwidth result the benchmark calculates. The platform's actual memory bandwidth may therefore be higher than what the STREAM benchmark reports.
This is important to note, because the cache coherency protocol on most processors will not allow a processor to write to a cache line without first reading that line from memory. The read is done before the write to ensure that no other processor holds a copy of that cache line and that the processor writing it has ownership of the line. When running the STREAM benchmark, if the processor must first read the cache line for a(i) before writing it, the Triad effectively performs three reads and one write to memory. However, only two reads and one write are counted toward the STREAM bandwidth score.
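The effect of that extra read can be sketched as a simple traffic model. The C code below is a hypothetical illustration, not part of STREAM. It assumes that every cache line of a(i) is read for ownership before being written (no streaming or non-temporal stores). Under that assumption each iteration moves 32 bytes while STREAM credits only 24, so the bandwidth actually sustained is 4/3 of the reported score.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of one STREAM array element (double precision). */
#define ELEM_BYTES sizeof(double)

/* Bytes STREAM credits to one Triad iteration: read b(i), read c(i), write a(i). */
size_t triad_counted_bytes(void)
{
    return 3 * ELEM_BYTES;
}

/* Bytes actually moved per iteration in this hypothetical model: when
   read_for_ownership is true, a(i) is also read from memory before it is
   written, giving three reads and one write. */
size_t triad_actual_bytes(bool read_for_ownership)
{
    return (read_for_ownership ? 4 : 3) * ELEM_BYTES;
}

/* Scale a reported STREAM Triad score (MB/s) to the memory traffic the
   model says was actually sustained. */
double actual_bandwidth(double reported_mb_s, bool read_for_ownership)
{
    assert(reported_mb_s >= 0.0);
    return reported_mb_s * (double)triad_actual_bytes(read_for_ownership)
           / (double)triad_counted_bytes();
}
```

Under this model, a reported Triad score of 24 MB/s corresponds to `actual_bandwidth(24.0, true)`, or 32 MB/s of actual memory traffic.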
Read the full paper: Measuring Memory Bandwidth on Intel® Xeon® Processor 7500 Platform.