Volume 11, Issue 03

Tera-scale Computing


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1103.02

  • Published August 22, 2007

  Section 2 of 10  

Accelerator Exoskeleton

INTRODUCTION

The relentless pace of Moore's Law will lead to mainstream multi-core microprocessor designs with extensive on-die integration of a large number of cores [11]. Fundamentally, to scale multi-core processor designs to incorporate a large number of cores, ultra-low Energy Per Instruction (EPI) cores are essential [6]. One approach to improving EPI by an order of magnitude is heterogeneous multi-core design, in which some cores vary in functionality, instruction set architecture (ISA), performance, power, and energy efficiency [14]. The key challenge then becomes how to accomplish such heterogeneous integration and achieve high performance while still maintaining the look and feel of the classic mainstream IA-based programming models and software ecosystem.

In this paper we present an overview of EXOCHI, which has two parts: Exoskeleton Sequencer (EXO), an architecture proposal to represent heterogeneous accelerators as ISA-based MIMD architectural resources; and C for Heterogeneous Integration (CHI), a programming environment that supports tightly coupled integration of heterogeneous cores. The EXO architecture supports the familiar POSIX shared virtual memory multi-threaded programming model for heterogeneous cores. Architecturally, the heterogeneous cores are exposed to the programmer as a new form of sequencer resource. They can be regarded as application-level MIMD functional units on which user-level threads, or shreds, encoded in the accelerator-specific ISA can execute. Having a shared virtual address space between the IA sequencer and accelerator sequencers facilitates code and data sharing and harmonizes cooperation between the concurrent shreds of different ISAs. Such a program is said to be multi-shredded.
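To make the benefit of a shared virtual address space concrete, the following is a minimal sketch in standard C++, not EXO hardware or CHI code. Ordinary `std::thread` threads stand in for shreds, and the names `Task`, `run_shred`, and `sum_with_two_shreds` are hypothetical. The point it illustrates is that a raw pointer handed from one thread to another remains valid because both see the same addresses, so no data has to be copied or marshalled.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical work descriptor. The `input` pointer is meaningful to every
// thread because all threads share one virtual address space.
struct Task {
    const int* input;  // points into the caller's buffer; nothing is copied
    int count;         // number of elements this task should read
    long result;       // written by the worker, read by the caller after join
};

// Stand-in for a shred. Assumption: a plain C++ thread is used here in place
// of a thread running on an accelerator sequencer.
void run_shred(Task* t) {
    long sum = 0;
    for (int i = 0; i < t->count; ++i) {
        sum += t->input[i];
    }
    t->result = sum;
}

// Splits `data` into two halves and sums each half on its own thread.
long sum_with_two_shreds(const std::vector<int>& data) {
    assert(!data.empty());
    const int total = static_cast<int>(data.size());
    const int half = total / 2;

    Task first{data.data(), half, 0};
    Task second{data.data() + half, total - half, 0};

    std::thread a(run_shred, &first);
    std::thread b(run_shred, &second);
    a.join();
    b.join();

    return first.result + second.result;
}
```

Calling `sum_with_two_shreds` on a vector returns the sum of its elements. In a system without a shared address space, the two halves of the buffer would instead have to be copied to and from each worker explicitly.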

The CHI integrated programming environment allows an application developer to inline blocks of accelerator-specific assembly or domain-specific language with traditional C/C++ code. The CHI compiler produces a single fat binary consisting of executable code sections corresponding to the different ISAs. CHI further extends the OpenMP pragmas [21, 23, 26] to allow the programmer to express thread-level parallelism by demarcating parallel regions of code targeting heterogeneous accelerators. The CHI extensions to OpenMP support both fork-join and producer-consumer parallelism among the accelerator shreds and between the IA shreds and the accelerator shreds. The CHI runtime can judiciously spread the shreds across the heterogeneous sequencers dynamically to maximize throughput performance while minimizing power.
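The fork-join style of parallelism mentioned above can be sketched with a standard OpenMP pragma. This is an illustration under stated assumptions: it uses only the stock `parallel for` construct on the host CPU, it does not show CHI's accelerator-targeting extensions or inline accelerator assembly, and the function name `scale_fork_join` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: writes k * in[i] into out[i] for every element.
// Only standard OpenMP is used; CHI-specific clauses are deliberately omitted.
void scale_fork_join(const std::vector<float>& in,
                     std::vector<float>& out,
                     float k) {
    assert(in.size() == out.size());
    const long n = static_cast<long>(in.size());

    // Fork: a team of threads divides the loop iterations among themselves.
    // Join: an implicit barrier at the end of the loop waits for all of them.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        out[i] = k * in[i];
    }
}
```

When the compiler is invoked without OpenMP support, the pragma is ignored and the loop runs serially with the same result, which is one reason pragma-based annotation is attractive for marking parallel regions in otherwise ordinary C/C++ code.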

The rest of the paper is organized as follows. We first briefly review related work. We then introduce the EXO architecture, which supports a shared virtual memory heterogeneous multi-threaded programming model. Next, we present an overview of the CHI integrated programming environment, which extends the Intel® C++ Compiler, runtime, and tool chains to provide the familiar IA look and feel for programming heterogeneous cores. To prototype the EXO architecture, we describe potential heterogeneous multi-core processors that combine an Intel® Core™2 Duo processor [27] with one of two possible accelerators: an 8-core 32-thread Intel® Graphics Media Accelerator (GMA) X3000 [10] or the Datastream Processing Engine (DPE) from a research Scalable Communication Core (SCC) prototype [8]. Finally, we demonstrate code examples and evaluate performance.



Figure 1: Alternate programming environments
