Volume 11, Issue 03

Tera-scale Computing


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1103.08

  • Published August 22, 2007

  Section 6 of 10  

High-Performance Physical Simulations on Next-Generation Architecture with Many Cores

IMPLICATIONS FOR THE MEMORY SUBSYSTEM

Memory bandwidth requirements grow in proportion to the number of cores on a multi-core chip, and they are expected to grow further as applications and workloads evolve. Current server memory bandwidth projections are mostly based on traditional benchmarks such as TPC-C, SPECjAppServer (SJAS), and SPECjbb (SJBB). Unfortunately, these benchmarks do not accurately reflect important future workloads such as our physical simulation applications.

Figure 11 shows the projected external memory bandwidth requirements for five different sizes of last-level cache (the other caches are assumed to be small and inclusive). The projection is based on running the workloads at 64 giga-instructions per second (GIPS). We analyze the bandwidth requirements for all important modules and compare them to those of TPC-C, SJAS, and SJBB. For each cache size, the modules are sorted according to their bandwidth requirements. The bandwidth requirements for the traditional benchmarks are highlighted for comparison.
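Because the projection fixes the execution rate at 64 GIPS, any bandwidth figure can also be read as off-chip bytes moved per instruction. The following is a minimal sketch of that conversion. The function names and the sample values (1.0 byte per instruction, 32 GB/s) are illustrative assumptions, not figures from the study.

```python
GIPS = 64  # execution rate assumed by the projection (giga-instructions per second)

def bandwidth_gb_per_s(bytes_per_instruction, gips=GIPS):
    """External bandwidth (GB/s) needed to move this many off-chip bytes per instruction."""
    # giga-instructions/s * bytes/instruction = gigabytes/s
    return bytes_per_instruction * gips

def bytes_per_instruction(bandwidth_gb_s, gips=GIPS):
    """Off-chip bytes per instruction implied by a bandwidth figure at the given rate."""
    return bandwidth_gb_s / gips

# Hypothetical sample values, chosen only to show the conversion.
print(bandwidth_gb_per_s(1.0))    # 64.0
print(bytes_per_instruction(32))  # 0.5
```

Read this way, a larger last-level cache lowers the bandwidth requirement by lowering the off-chip bytes each instruction generates, since the instruction rate is held constant.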



Figure 11: Projection of external memory bandwidth requirements (GB/s) for a given last-level on-die cache size

The results show the following behaviors:

  1. With less than 128MB of last-level cache, the physical simulation modules have a wide range of bandwidth requirements, from a few gigabytes per second to over 200GB/s. The bandwidth usage of the traditional benchmarks, on the other hand, is much lower: at most 40GB/s, even with only 8MB of cache.
  2. To put the results into context, we compare the requirements to the bandwidth projected to be available in 2010. Memory bandwidth typically grows at 30% per year, so we expect about 48GB/s to be available in 2010. Workloads that require more than this will be limited by memory bandwidth. Some of our modules require far more than 48GB/s unless the last-level cache is at least 64MB.
  3. The average bandwidth usage for each of the applications is significantly lower than the peak bandwidth usage. This is because each application is made up of modules with different bandwidth requirements. The scalability of the module with the highest bandwidth requirement often limits the scalability of the entire application.
  4. Our physical simulation modules benefit significantly more than traditional benchmarks do from a large last-level cache. When an application's entire working set fits into cache, the external memory bandwidth usage becomes minimal. For our applications, this happens when the cache is 128MB.
  5. One of our most memory-intensive modules is the incomplete Cholesky Preconditioned Conjugate Gradient (PCG) method from production fluid simulation. PCG is used to solve a system of equations arising from the discretization of the Poisson equation. (PCG is one of the most popular approaches for solving large symmetric positive-definite systems of equations because it is more robust than direct solvers and converges quickly. As such, PCG is of great importance beyond the study of this application.) The method consists of a number of operations performed sequentially on a set of two matrices and a number of vectors. The solver iterates tens of times until the solution converges. During each iteration, both matrices (which occupy about 40MB each) are streamed over. Thus, we see a huge bandwidth requirement when the last-level cache cannot hold the matrices. When the last-level cache is big enough to hold both matrices (and all the vectors), the bandwidth requirement is greatly reduced.
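The 2010 figure in item 2 is a compound-growth projection. As a rough check, the sketch below backs out the starting bandwidth that 30% annual growth would need in order to reach 48GB/s. The base year of 2007 (the publication year) is an assumption, since the text does not state the baseline it used, and the names are illustrative.

```python
GROWTH_PER_YEAR = 0.30    # annual growth rate stated in the text
TARGET_YEAR = 2010
TARGET_BANDWIDTH = 48.0   # GB/s, projected availability stated in the text
BASE_YEAR = 2007          # ASSUMPTION: publication year taken as the baseline

def project(base_gb_s, years, growth=GROWTH_PER_YEAR):
    """Compound a starting bandwidth forward by the given number of years."""
    return base_gb_s * (1 + growth) ** years

years = TARGET_YEAR - BASE_YEAR
# Invert the compounding to find the baseline implied by the 2010 figure.
implied_base = TARGET_BANDWIDTH / (1 + GROWTH_PER_YEAR) ** years

print(round(implied_base, 1))  # 21.8
```

Under that assumption, the projection corresponds to a starting point of roughly 22GB/s three years earlier.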
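To illustrate why the PCG module in item 5 streams its matrices on every iteration, here is a minimal sketch of a preconditioned conjugate gradient loop. It is not the production solver: it assumes a small 1D Poisson matrix and swaps in a simple Jacobi (diagonal) preconditioner for the incomplete Cholesky factor, and all names are hypothetical. The comments mark where each iteration touches the system matrix and the preconditioner.

```python
def poisson_matvec(x):
    """Return A @ x for the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        v = 2.0 * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        y.append(v)
    return y

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def pcg(b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Jacobi-preconditioned CG; returns (x, iterations)."""
    n = len(b)
    inv_diag = [0.5] * n   # Jacobi stand-in: inverse of diag(A), which is all 2s
    x = [0.0] * n
    r = list(b)            # residual for the zero initial guess
    z = [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = poisson_matvec(p)                      # pass over the system matrix
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it
        z = [d * ri for d, ri in zip(inv_diag, r)]  # pass over the preconditioner
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Usage: build a right-hand side whose exact solution is all ones.
n = 50
b = poisson_matvec([1.0] * n)
x, iters = pcg(b)
print(iters, max(abs(xi - 1.0) for xi in x))
```

Every iteration makes one full pass over the system matrix and one over the preconditioner. With two matrices of about 40MB each, as in the text, that is roughly 80MB of matrix data per iteration that must come from external memory whenever the last-level cache cannot hold it.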
