Intel Technology Journal
Tera-scale Computing
Integration Challenges and Tradeoffs for Tera-scale Architectures
FEEDING THE BEAST: MEMORY ARCHITECTURE
With substantial increases in computation power on a single die comes the challenge of feeding it enough data bandwidth. For a small class of applications with small memory footprints, memory accesses will mainly exercise the on-die caches. The majority of applications, however, require a major increase in off-chip memory bandwidth. This requirement translates into two challenges: (1) providing power-efficient, high-speed off-die I/O, and (2) providing power-efficient, high-bandwidth DRAM access. The former has seen steady progress over the past decade, but not at the required pace. The latter may require rethinking DRAM core and I/O design.
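To make the bandwidth argument concrete, the following minimal sketch estimates the off-die traffic a chip generates from its compute rate, the bytes of operand traffic each flop consumes, and the fraction of that traffic served by on-die caches. The function name and every parameter value (0.5 bytes/flop, the two hit rates) are hypothetical illustrations, not measurements from the article.

```python
# Back-of-envelope model: off-chip bandwidth needed to keep the cores fed.
# All parameter values below are hypothetical, chosen only for illustration.

def required_bandwidth_gbs(peak_gflops: float,
                           bytes_per_flop: float,
                           cache_hit_rate: float) -> float:
    """Off-chip bandwidth (GB/s) needed to sustain peak_gflops.

    bytes_per_flop: bytes of operand traffic each flop generates (hypothetical).
    cache_hit_rate: fraction of that traffic served by on-die caches.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1]")
    on_die_demand = peak_gflops * bytes_per_flop  # GB/s requested by the cores
    return on_die_demand * (1.0 - cache_hit_rate)  # GB/s that must cross off-die


if __name__ == "__main__":
    # 1 Tflops = 1000 Gflops; compare a cache-friendly and a streaming workload.
    for hit_rate in (0.99, 0.5):
        bw = required_bandwidth_gbs(1000.0, 0.5, hit_rate)
        print(f"hit rate {hit_rate:.2f}: {bw:.1f} GB/s off-die")
```

Under these assumed numbers, the small-footprint case that stays in cache needs only a few GB/s off-die, while the streaming case needs two orders of magnitude more, which is the gap the two challenges above must close.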
The first step in addressing the memory bandwidth challenge can be more efficient on-die storage or improved management of it. For example, embedded DRAM [3] increases the density of on-die storage compared with SRAM. Managing on-die storage efficiently, for instance by avoiding duplication of data across the cache hierarchy as discussed in the previous section, is another way to increase its effective capacity.
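The effect of duplication on effective capacity can be illustrated with a minimal sketch that contrasts two idealized extremes: a fully inclusive hierarchy, where the last level holds a copy of every line in the inner levels, and a fully exclusive one, where no line lives in two levels. The scheme discussed in the previous section may differ from both; the function names, the cache sizes, and the eDRAM density ratio are hypothetical.

```python
# Hypothetical model: how much distinct data an on-die cache hierarchy can hold.
from typing import Sequence


def effective_capacity_mb(level_sizes_mb: Sequence[float], inclusive: bool) -> float:
    """Upper bound on distinct data held on-die, in MB.

    level_sizes_mb: sizes from innermost (e.g. L1) to outermost (last level).
    """
    if not level_sizes_mb:
        return 0.0
    if inclusive:
        # The outer level duplicates everything above it, so inner levels
        # add no distinct capacity.
        return float(level_sizes_mb[-1])
    # Exclusive: every level holds distinct lines, so capacities add up.
    return float(sum(level_sizes_mb))


def denser_cell_capacity_mb(sram_capacity_mb: float, density_ratio: float) -> float:
    """Capacity if the same die area used a denser cell such as eDRAM.

    density_ratio: hypothetical bits-per-area ratio relative to SRAM.
    """
    return sram_capacity_mb * density_ratio


if __name__ == "__main__":
    levels = (0.25, 2.0, 8.0)  # hypothetical L1, L2, L3 sizes in MB
    print("inclusive:", effective_capacity_mb(levels, inclusive=True), "MB")
    print("exclusive:", effective_capacity_mb(levels, inclusive=False), "MB")
```

Both levers compound: a denser cell raises the raw capacity per unit area, and avoiding duplication raises the fraction of that capacity holding distinct data.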
Integrating DRAM, e.g., GDDR memory, inside the processor package can offer more control over the I/O channel and thus allow higher bandwidth than a path that crosses from the package to the motherboard, through a connector, to a DIMM. Recent work has demonstrated 3D-stacked SRAM as a low-capacity, high-bandwidth option, e.g., Intel's tera-scale prototype [13] and IBM's work on 3D integrated circuits [26]. The freedom in the footprint design of such SRAM devices enables power-efficient solutions; however, their limited capacity restricts their application. Stacking multiple DRAM dies in 3D can increase memory capacity, but it requires dense Through-Silicon Vias (TSVs) to provide the concurrency needed for accesses to independent DRAM banks.
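Both in-package integration and TSV-based stacking raise bandwidth mainly by allowing more, or wider, independent data paths. The generic peak-bandwidth formula below (channels times bus width in bytes times transfer rate) shows that scaling. It is a simplified sketch that ignores protocol overhead and bank conflicts, and the two configurations compared are hypothetical, not figures from the article or from any specific product.

```python
# Generic peak-bandwidth arithmetic for a memory interface:
#   bandwidth = independent paths * (bus width in bytes) * transfers per second
# Ignores refresh, protocol overhead, and bank conflicts.


def peak_bandwidth_gbs(paths: int, bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_s = paths * bytes_per_transfer * mega_transfers_per_s * 1e6
    return bytes_per_s / 1e9


if __name__ == "__main__":
    # Hypothetical off-package setup: few narrow channels through the board.
    board = peak_bandwidth_gbs(paths=2, bus_width_bits=64, mega_transfers_per_s=1600)
    # Hypothetical stacked setup: many wide, independent TSV paths, each
    # serving its own set of banks, even at a lower per-pin rate.
    stacked = peak_bandwidth_gbs(paths=8, bus_width_bits=128, mega_transfers_per_s=1000)
    print(f"board: {board:.1f} GB/s, stacked: {stacked:.1f} GB/s")
```

The independent paths only pay off if each one can be kept busy, which is why the stacked case depends on enough TSVs to reach independent banks concurrently.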
Tera-scale Architecture Prototype
The Intel® Teraflop processor [27] prototypes some of the elements of the tera-scale architecture. It implements 80 cores connected by a 2D-mesh interconnect and delivers more than 1 Tflops of performance while dissipating less than 100 W. This illustrates the potential of the tera-scale architecture and validates the efficacy of some of its architectural building blocks.
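Two quick calculations help put the prototype in perspective. First, more than 1 Tflops at less than 100 W implies an efficiency better than 10 Gflops/W. Second, in a 2D mesh the distance a message travels grows with the grid dimensions. The sketch below computes the average hop count between distinct cores, assuming dimension-ordered (XY) routing so that hops equal Manhattan distance. The 8 x 10 arrangement of the 80 cores and the routing policy are assumptions for illustration, not details stated in this section.

```python
# Sketch: average hop count in a rows x cols 2D mesh under XY routing
# (hops = Manhattan distance), plus a simple efficiency calculation.
from itertools import product


def average_hops(rows: int, cols: int) -> float:
    """Mean Manhattan distance over all ordered pairs of distinct nodes."""
    nodes = list(product(range(rows), range(cols)))
    total = 0
    pairs = 0
    for (r1, c1), (r2, c2) in product(nodes, repeat=2):
        if (r1, c1) != (r2, c2):
            total += abs(r1 - r2) + abs(c1 - c2)
            pairs += 1
    return total / pairs


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency in Gflops per watt."""
    return gflops / watts


if __name__ == "__main__":
    # Assumed 8 x 10 layout of the 80 cores.
    print(f"average hops: {average_hops(8, 10):.2f}")  # prints "average hops: 6.00"
    # 1 Tflops at 100 W is the boundary case; the prototype beats both bounds.
    print(f"efficiency bound: {gflops_per_watt(1000.0, 100.0):.1f} Gflops/W")
```

Under these assumptions a typical core-to-core message crosses about six routers, so router latency and energy per hop directly shape the cost of on-die communication in such a mesh.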
