Tera-scale Computing
Package Technology to Address the Memory Bandwidth Challenge for Tera-scale Computing
MEMORY BANDWIDTH FUNDAMENTALS
It is useful to review several fundamental concepts before turning to the topic of memory bandwidth: the definition of memory bandwidth, the key elements related to it, and the role that the package interconnect plays. At its simplest, memory bandwidth is the product of the number of data bits in the memory bus and the data rate of each bit. This can be expressed as
BW = (number of data bits) x (bit rate per bit)     Eq. (1)
For example, if a memory bus is 8 bits (1 byte) wide and each bit transmits data at 1Gb/s (gigabits per second), then the memory bandwidth is 1 byte (1B) x 1Gb/s, or 1GB/s. A more realistic example is a typical DDR2 bus that is 16 bytes (128 bits) wide, with each bit operating at 800Mb/s. The memory bandwidth of that bus is 16 bytes x 800Mb/s, or 12.8GB/s.
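The arithmetic in these two examples can be checked with a few lines of Python. This is a minimal sketch of Eq. (1) only; the function name `memory_bandwidth_bytes_per_s` and its parameter names are illustrative choices, not taken from the article, and the result is a peak figure that ignores protocol overhead.

```python
def memory_bandwidth_bytes_per_s(bus_width_bits: int, bit_rate_bps: float) -> float:
    """Eq. (1): bandwidth = number of data bits x per-bit data rate.

    The product is in bits per second; dividing by 8 converts it to bytes per second.
    """
    return bus_width_bits * bit_rate_bps / 8


# First example from the text: 8-bit bus, 1 Gb/s per bit.
simple_bus = memory_bandwidth_bytes_per_s(bus_width_bits=8, bit_rate_bps=1e9)

# Second example from the text: 128-bit DDR2 bus, 800 Mb/s per bit.
ddr2_bus = memory_bandwidth_bytes_per_s(bus_width_bits=128, bit_rate_bps=800e6)

print(f"8-bit bus at 1 Gb/s:      {simple_bus / 1e9:.1f} GB/s")
print(f"128-bit bus at 800 Mb/s:  {ddr2_bus / 1e9:.1f} GB/s")
```

Because the two factors simply multiply, doubling either the bus width or the per-bit rate doubles the peak bandwidth.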
Besides bandwidth itself, the other key elements of memory are latency and capacity. Latency is the round-trip time that it takes to receive a response after a request has been sent, and it is typically measured in nanoseconds (ns). Capacity refers to the size of the memory and is typically measured in megabytes (MB).
The memory subsystem hierarchy of a computer architecture consists of many levels. Memory can be located at the chip level, the package level, the board level, and in separate devices off the board (such as the hard disk). There is a tradeoff among the types and the key elements of memory (bandwidth, latency, and capacity) depending upon the location in the memory subsystem hierarchy. In general, faster, lower-capacity memory is located on-chip, while slower, higher-capacity memory is located off-chip. On-chip memory usually uses Static Random Access Memory (SRAM) technology, which is fast but expensive and low-density compared to other memory technologies. On-chip memory usually serves as a cache and can be further divided into levels of cache, e.g., L1 cache, L2 cache, etc. [2]. Off-chip memory typically uses Dynamic Random Access Memory (DRAM) technology, which is slower but cheaper and higher-density than SRAM. Off-chip memory located on the system board serves as the main memory for the computer system.
Today's typical computer architecture consists of the microprocessor (CPU), the chipset, and the main memory. Busses connect the various components of the system. Figure 1 illustrates a typical system architecture consisting of a microprocessor connected to a chipset through the system bus. The chipset in this example is divided into a Memory Controller Hub (MCH) and a separate Graphics Processing Unit (GPU). Each has a memory bus connecting to on-board memory. The system bus connects the CPU, by way of the chipset, to the on-board main system memory.

Figure 1: System architecture with main memory bus connected to the chipset
In this example, there are potential bottlenecks at each interconnect transition with respect to providing memory bandwidth to the CPU. Specifically, there is a transition from the CPU to the MCH through the system bus, and there is a transition from the MCH to the system memory through the main memory bus. The challenge for memory bandwidth in this traditional architecture has been to scale the capabilities of both the system bus and the main memory bus to keep up with the steadily increasing memory demand of the CPU with each new generation. Figure 2 illustrates the historical trend of system bus bandwidth capability vs. the memory bandwidth of the system. The two bandwidths have needed to scale together for optimum system performance, because a shortfall in either bus limits the bandwidth delivered to the CPU. Scaling of bus capability has usually involved a combination of increasing the bus width and increasing the bus speed.
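The reason the two busses must scale together can be illustrated with a short Python sketch. It assumes a simplified model in which the links between the CPU and main memory are traversed in series, so sustained bandwidth cannot exceed that of the slowest link. The function name `path_bandwidth_limit` and the 8.0GB/s system bus figure are hypothetical values chosen for illustration; only the 12.8GB/s memory bus figure comes from the DDR2 example earlier in this section.

```python
from typing import Dict, Tuple


def path_bandwidth_limit(link_bandwidths_gb_s: Dict[str, float]) -> Tuple[str, float]:
    """Return the name and bandwidth of the slowest link in a series path.

    Simplified model: data crosses every link in turn, so the slowest
    link is an upper bound on the bandwidth delivered end to end.
    """
    slowest = min(link_bandwidths_gb_s, key=link_bandwidths_gb_s.get)
    return slowest, link_bandwidths_gb_s[slowest]


links = {
    "system bus (CPU to MCH)": 8.0,        # hypothetical value, for illustration only
    "main memory bus (MCH to DRAM)": 12.8,  # DDR2 example from the text
}

bottleneck, limit = path_bandwidth_limit(links)
print(f"Bottleneck: {bottleneck}, at most {limit} GB/s reaches the CPU")

# Speeding up only the memory bus leaves the end-to-end limit unchanged.
links["main memory bus (MCH to DRAM)"] = 25.6
_, limit_after = path_bandwidth_limit(links)
print(f"After doubling the memory bus alone: at most {limit_after} GB/s")
```

In this model, widening or speeding up one bus alone yields no gain once the other bus becomes the limiting link.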

Figure 2: Historical trend for system bus bandwidth vs. memory bandwidth
Achieving both increased bus width and increased bus speed depends on the ability to scale the package technology to accommodate wider and faster busses. The package is the transitional interconnect between the fine-pitch, high-density features of the processor chip and the much coarser-pitch, low-density features of the system board. The ability of packaging technology to bridge this gap between the chip and the system board has been critical to enabling increases in system memory bandwidth in the past. Packaging technology will play an increasingly important enabling role as we transition to tera-scale computing architectures.