

Volume 11, Issue 03

Tera-scale Computing


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1103.06

Published August 22, 2007



Datacenter-on-Chip Architectures: Tera-scale Opportunities and Challenges

INTRODUCTION

We have entered the era of chip multiprocessor (CMP) platforms, with Intel's dual-core and quad-core processors [5, 8] flourishing in the mobile, desktop, and server marketplaces. Within a decade, we expect to integrate ever more cores on-die, creating tera-scale architectures with several tens of physical cores and hundreds of hardware threads. Such tera-scale architectures are well suited to high-performance throughput computing, especially in the server marketplace.

A decade ago, datacenters employed tens to hundreds of dual-processor and quad-processor server platforms (each running a single application) connected by an Ethernet fabric. Recent trends, however, show that most datacenters have started employing virtualization [21, 23, 31, 32] to consolidate multiple applications onto the same platform in order to improve efficiency and manageability and to reduce overall cost [6]. Tera-scale architectures [7] could accelerate this consolidation trend and even enable small datacenters to run on a single platform (or a few), hence the term "Datacenter-on-Chip" (DoC) architectures. In this paper, we use an e-commerce benchmark, TPC-W [29], to illustrate this by showing how an earlier TPC-W configuration with 60+ server platforms could potentially run on a single tera-scale DoC platform with 32 cores and 128 threads (4 threads per core).



Figure 1: Datacenter-on-chip usage models: classification and examples

Tera-scale DoC architectures also pose the challenge of designing a balanced platform with enough resources to sustain a large number of cores actively running virtual machines (VMs). In this paper, we evaluate the cache, memory, and I/O requirements of the DoC architecture and characterize its behavior. We do so by analyzing the TPC-W configuration and by running detailed platform simulations in which multiple VMs simultaneously run Online Transaction Processing (OLTP), Java* application server, and even enterprise resource planning (ERP) workloads. To address the cache and memory scalability requirements, we show that (a) a hierarchy of shared caches is best suited to DoC architectures because it maximizes performance and area efficiency when running virtualized server workloads, and (b) integrating a large-capacity DRAM cache can significantly reduce memory bandwidth requirements, thereby improving performance and scalability.
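As a rough first-order illustration of point (b), and only under the simplifying assumption that every access missing the on-die cache hierarchy is looked up in the DRAM cache first, the off-die memory bandwidth demand scales with the DRAM cache miss rate. The symbols below are illustrative and not defined in this paper: B_miss is the bandwidth that would otherwise go to off-die memory, h is the DRAM cache hit rate, and the relation ignores writebacks of dirty lines evicted from the DRAM cache.

```latex
% First-order filtering model (assumption, not this paper's model):
% only DRAM-cache misses reach off-die memory.
B_{\text{mem}} \approx (1 - h)\, B_{\text{miss}}
% Example: h = 0.6  =>  B_mem is roughly 0.4 * B_miss
```

For instance, a hit rate of 0.6 would cut off-die demand to roughly 40% of B_miss in this simplified view.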

Another critical challenge in DoC architectures is that the performance of each VM can be highly non-deterministic, since it depends heavily on the other VMs running simultaneously. Because tera-scale DoC architectures provide cores in abundance, VMs need not contend for cores; the non-determinism instead stems from interference in shared platform resources such as cache and memory. Through detailed simulations of simultaneously running VMs, we quantify the impact of this interference and the resulting lack of quality of service (QoS) for each individual workload. Since datacenters typically offer service-level agreements (SLAs), it is important to incorporate QoS hooks into platform resources such as cache and memory. In this paper, we describe potential platform QoS mechanisms and evaluate how effectively they improve the performance isolation provided to each VM.
