
Volume 11, Issue 03

Tera-scale Computing


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1103.02

  • Published August 22, 2007

  Section 3 of 10  

Accelerator Exoskeleton

RELATED WORK

There has been a rich body of research on heterogeneous acceleration. In most published work, the execution model falls into one of two categories: (category 1) an ISA-based, tightly coupled approach or (category 2) a device driver-based, loosely coupled approach. An example of the tightly coupled approach is the Software-configurable Processor (SCP) architecture [4], in which a custom ISA extension represents the operations implemented by a hardware accelerator attached to the CPU. The CPU is then responsible for sequencing, decoding, and dispatching each co-processor instruction, stalling until the co-processor execution completes. This approach resembles classic x87 escape-wait style co-processor execution, in which the co-processor does not sequence instructions independently of the CPU.
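
To make the category 1 control flow concrete, the following is a minimal software model, not SCP or x87 code. The instruction encoding (`OP_ADD`, `OP_ESC_MUL`, `OP_HALT`) and the functions `coproc_execute` and `cpu_run` are hypothetical names chosen for this sketch. The CPU fetches every instruction. When it meets an "escape" operation, it dispatches that operation to the co-processor and blocks until a result returns, so the co-processor never sequences instructions on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction encoding: ordinary CPU operations plus an
 * "escape" operation that the CPU forwards to an attached co-processor. */
typedef enum { OP_ADD, OP_ESC_MUL, OP_HALT } opcode_t;

typedef struct {
    opcode_t op;
    int operand;
} insn_t;

/* Simulated co-processor: it never fetches instructions itself; it only
 * executes the single operation that the CPU hands to it. */
static int coproc_execute(opcode_t op, int acc, int operand) {
    switch (op) {
    case OP_ESC_MUL:
        return acc * operand;
    default:
        return acc; /* not a co-processor operation: no effect */
    }
}

/* The CPU sequences the whole instruction stream. For an escape
 * instruction it decodes and dispatches the operation, then stalls
 * (modeled here as a blocking call) until the co-processor returns. */
int cpu_run(const insn_t *prog, size_t len) {
    assert(len == 0 || prog != NULL);
    int acc = 0;
    for (size_t pc = 0; pc < len; pc++) {
        switch (prog[pc].op) {
        case OP_ADD:
            acc += prog[pc].operand;
            break;
        case OP_ESC_MUL:
            acc = coproc_execute(prog[pc].op, acc, prog[pc].operand);
            break;
        case OP_HALT:
            return acc;
        }
    }
    return acc;
}
```

For example, the stream `{OP_ADD, 3}, {OP_ESC_MUL, 4}, {OP_ADD, 1}, {OP_HALT, 0}` leaves 13 in the accumulator. The next `OP_ADD` is not fetched until the multiply has completed, because the co-processor has no instruction pointer of its own.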

Examples of the second category include most existing GPGPU infrastructures [1, 3, 5, 13, 15, 16, 17, 18, 19, 20, 22, 24, 25, 28]. As depicted in Figure 1(a), the CPU resources (cores and memory) are managed by the operating system (OS), and the GPU resources are separately managed by vendor-supplied device drivers. Applications and device drivers run in separate address spaces. Consequently, data communication and synchronization between them are usually carried out at coarse granularity, through explicit data copying via device driver APIs. In the EXOCHI framework depicted in Figure 1(b), the EXO architecture supports an execution model with a shared virtual address space and a POSIX multi-threaded programming model for the OS-managed IA sequencer and the application-managed non-IA accelerator sequencers.
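
The difference in data movement between the two models can be sketched in plain C. The code simulates a device instead of driving one. `dev_mem`, `drv_copy_to_device`, `drv_copy_from_device`, `dev_scale`, `scale_via_driver`, and `scale_shared` are all hypothetical names, and no real driver API is implied. In the driver-based path, the device can only touch its own memory, so every launch is bracketed by explicit copies. In the shared-address-space path, accelerator-side code receives the application's pointer directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEV_MEM_WORDS 1024

/* Model of a driver-managed device with its own memory, separate from the
 * application's address space. All names are hypothetical. */
static float dev_mem[DEV_MEM_WORDS];
static size_t bytes_copied; /* traffic that passed through the "driver" */

static void drv_copy_to_device(size_t off, const float *src, size_t n) {
    memcpy(&dev_mem[off], src, n * sizeof *src);
    bytes_copied += n * sizeof *src;
}

static void drv_copy_from_device(float *dst, size_t off, size_t n) {
    memcpy(dst, &dev_mem[off], n * sizeof *dst);
    bytes_copied += n * sizeof *dst;
}

/* Kernel that operates on device-resident data only. */
static void dev_scale(size_t off, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        dev_mem[off + i] *= k;
}

/* Category 2 (loosely coupled): copy in, launch, copy out. */
void scale_via_driver(float *buf, size_t n, float k) {
    assert(n <= DEV_MEM_WORDS);
    drv_copy_to_device(0, buf, n);
    dev_scale(0, n, k);
    drv_copy_from_device(buf, 0, n);
}

/* Shared virtual address space: accelerator-side code dereferences the
 * application's own pointer, so no copy is required. */
void scale_shared(float *buf, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        buf[i] *= k;
}
```

Scaling four floats through the driver path moves `2 * 4 * sizeof(float)` bytes. The shared path moves none. In a real system, that copy traffic and the associated driver call overhead push driver-based designs toward coarse-grained communication.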

EXO differs from the existing tightly coupled approaches (category 1) by allowing independent sequencing and concurrent execution of multiple instruction streams on multiple sequencers within a single OS thread context. EXO also differs from the loosely coupled, driver-based approaches (category 2) by directly exposing the heterogeneous sequencers to application programs and by supporting a shared virtual address space among these sequencers. EXOCHI's user-level runtime can be used to schedule shreds (user-level threads) and to coordinate efficient, lightweight inter-shred data communication through shared virtual memory.
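
A user-level runtime of this kind can be pictured as a queue of shred descriptors. Each descriptor holds an entry point and a pointer into memory that all shreds share. The sketch below is a hypothetical, simplified model and not the EXOCHI runtime. `shred_t`, `runtime_t`, `rt_spawn`, `rt_run`, and the example shreds `produce` and `consume` are invented names. The sketch also dispatches shreds one after another on the calling OS thread, where a real runtime would assign them to free accelerator sequencers that run concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* A shred descriptor: an entry point plus an argument that points into
 * memory shared by every shred in the process (hypothetical layout). */
typedef struct {
    void (*entry)(void *arg);
    void *arg;
} shred_t;

#define MAX_SHREDS 16

/* Ring buffer of pending shreds, managed entirely at user level. */
typedef struct {
    shred_t queue[MAX_SHREDS];
    size_t head, tail;
} runtime_t;

/* Enqueue a shred; returns 0 on success, -1 if the queue is full. */
int rt_spawn(runtime_t *rt, void (*entry)(void *), void *arg) {
    assert(entry != NULL);
    if (rt->tail - rt->head == MAX_SHREDS)
        return -1;
    rt->queue[rt->tail % MAX_SHREDS] = (shred_t){ entry, arg };
    rt->tail++;
    return 0;
}

/* Dispatch pending shreds in FIFO order, sequentially on the caller. */
void rt_run(runtime_t *rt) {
    while (rt->head != rt->tail) {
        shred_t s = rt->queue[rt->head % MAX_SHREDS];
        rt->head++;
        s.entry(s.arg);
    }
}

/* Example shreds that communicate through shared memory, not copies. */
typedef struct {
    int data[8];
    int sum;
} shared_t;

static void produce(void *arg) {
    shared_t *s = arg;
    for (int i = 0; i < 8; i++)
        s->data[i] = i + 1;
}

static void consume(void *arg) {
    shared_t *s = arg;
    s->sum = 0;
    for (int i = 0; i < 8; i++)
        s->sum += s->data[i];
}
```

Spawning `produce` and then `consume` with the same `shared_t` pointer and calling `rt_run` leaves `sum` equal to 36. The consumer reads the producer's results in place, which is the kind of inter-shred communication that a shared virtual address space makes cheap.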

In addition, by supporting this heterogeneous multi-threaded execution model with shared virtual memory, the CHI integrated programming environment enables application developers to inline blocks written in accelerator-specific assembly or domain-specific languages within traditional C/C++ code. This allows performance-sensitive parts of an algorithm to be optimized for the accelerator ISA, much as Intel's SSE ISA extensions are traditionally used to implement high-performance math libraries. CHI's extensions to OpenMP allow programmers to express the underlying thread-level parallelism in a familiar parallel programming environment.
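
For readers less familiar with OpenMP, the sketch below shows the kind of standard directive that CHI builds on. It does not show CHI's own syntax. This section does not specify the CHI extensions, so the example uses only the standard `parallel for` work-sharing construct with a `reduction` clause. The function name `dot` is illustrative. In CHI, per the text above, a performance-critical loop body like this one could instead be written inline in accelerator-specific code.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product whose loop iterations are divided among threads by a
 * standard OpenMP work-sharing directive. Each thread accumulates a
 * private partial sum, and the reduction clause combines them. Compiled
 * without OpenMP support (e.g. without -fopenmp), the pragma is ignored
 * and the loop simply runs serially. */
float dot(const float *a, const float *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += (double)a[i] * (double)b[i];
    return (float)sum;
}
```

The parallel and serial builds compute the same sum up to floating-point rounding order. This is the appeal of a directive-based model: the parallelism is expressed as an annotation on otherwise ordinary C code.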
