Volume 11, Issue 03

Tera-scale Computing


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1103.04

  • Volume 11
  • Issue 03
  • Published August 22, 2007

  Section 5 of 11  

Runtime Environment for Tera-scale Platforms

SUPPORTING HETEROGENEITY

McRT uses multiple scheduling domains to support platform heterogeneity. Each domain can represent a set of hardware units with specific features, such as good scalar performance, good throughput, or special instruction sets. McRT extends the task queue mechanism to support scheduling domains. To create a domain Dk consisting of logical processors Pi to Pj, a client creates a task queue Qk that is accessed only by processors Pi to Pj. New tasks created on these processors are added only to Qk. The scheduler also exports an API that allows a task to yield its current logical processor and enqueue itself on a different task queue. A task executing in a domain Dk can switch to a different domain D'k by enqueuing itself on the task queue Q'k, at which point it will be executed by the processors in D'k. In this way, applications, or even different parts of the same application, can be scheduled on different hardware units based on their requirements.
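The domain mechanism described above can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not McRT's actual API: the names `Domain`, `SwitchDomain`, `run`, and `example_task` are hypothetical, tasks are modeled as Python generators, and a round-robin loop stands in for the real logical processors.

```python
from collections import deque


class SwitchDomain:
    """Yielded by a task to ask for migration to another domain (hypothetical)."""

    def __init__(self, target):
        self.target = target


class Domain:
    """A scheduling domain: a set of logical processors sharing one task queue."""

    def __init__(self, name, processors):
        self.name = name
        self.processors = list(processors)
        self.queue = deque()

    def spawn(self, task):
        # Tasks created in this domain are added only to this domain's queue.
        self.queue.append(task)


def run(domains):
    """Simulate the processors of every domain until all queues are empty.

    Returns a trace of (work label, domain name, processor id) tuples.
    """
    trace = []
    while any(d.queue for d in domains.values()):
        for d in domains.values():
            for proc in d.processors:
                if not d.queue:
                    break
                task = d.queue.popleft()
                step = next(task, None)  # run the task until it yields or ends
                if step is None:
                    continue  # task finished
                if isinstance(step, SwitchDomain):
                    # The task gives up this processor and enqueues itself
                    # on the target domain's queue.
                    domains[step.target].queue.append(task)
                else:
                    trace.append((step, d.name, proc))
                    d.queue.append(task)
    return trace


def example_task():
    yield "serial-setup"              # work done in the starting domain
    yield SwitchDomain("throughput")  # migrate to the other domain
    yield "parallel-kernel"           # work done after the switch


if __name__ == "__main__":
    domains = {
        "scalar": Domain("scalar", [0]),
        "throughput": Domain("throughput", [1, 2]),
    }
    domains["scalar"].spawn(example_task())
    for entry in run(domains):
        print(entry)
```

In this sketch a domain is nothing more than a queue plus the processors allowed to drain it, so migrating a task reduces to moving it from one queue to another.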

We prototyped a heterogeneous hardware platform on an 8-way SMP system. One processor (referred to as the OS processor) in the system boots up Windows* Server 2003, while the remaining seven processors (referred to as the sequestered processors) use McRT for all the threading services, without using the OS. The sequestered processors use a lightweight executive for interrupt handling. Thus, the sequestered processors emulate an attached compute engine, with the OS processor emulating a host CPU. Internally, McRT creates two scheduling domains, one representing the sequestered processors and the other representing the OS processor. The system configuration is shown in Figure 7.

We ran Equake* using the standard SPEC input on the sequestered system. The application begins with a serial phase that reads the input; it then forks a number of threads to perform the computation. McRT scheduled the serial part on the OS processor and the parallel part on the sequestered processors.
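The serial-then-parallel placement described above can be mimicked with ordinary Python threads. This is a rough sketch, not McRT or Equake code: the functions `start_domain`, `stop_domain`, and `run_fork_join` are hypothetical, a sum of squares stands in for the real computation, and one worker thread plays the OS processor while seven play the sequestered processors.

```python
import queue
import threading


def worker(task_queue):
    """One logical processor: run tasks from its domain's queue until stopped."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut this worker down
            break
        task()


def start_domain(name, num_processors):
    """Create a domain as one task queue served by num_processors threads."""
    q = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(q,), name=f"{name}-{i}")
        for i in range(num_processors)
    ]
    for t in threads:
        t.start()
    return q, threads


def stop_domain(q, threads):
    """Queue one sentinel per worker, then wait for all workers to exit."""
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()


def run_fork_join(data, num_parallel=7):
    os_q, os_threads = start_domain("os", 1)
    seq_q, seq_threads = start_domain("seq", num_parallel)
    partials = [0] * num_parallel
    ran_on = [None] * num_parallel
    forked = threading.Event()

    def parallel_chunk(i):
        ran_on[i] = threading.current_thread().name
        partials[i] = sum(x * x for x in data[i::num_parallel])

    def serial_phase():
        # Stand-in for the serial input phase: runs on the OS domain and
        # forks the parallel work onto the sequestered domain's queue.
        for i in range(num_parallel):
            seq_q.put(lambda i=i: parallel_chunk(i))
        forked.set()

    os_q.put(serial_phase)
    forked.wait()
    # Sentinels are queued after the chunks (FIFO), so every chunk runs first.
    stop_domain(seq_q, seq_threads)
    stop_domain(os_q, os_threads)
    return sum(partials), ran_on


if __name__ == "__main__":
    total, ran_on = run_fork_join(list(range(1000)))
    print(total, ran_on)
```

The serial phase never touches the sequestered queue's workers directly; it only enqueues work there, which is the same separation the two scheduling domains provide.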



Figure 7: Sequestered system

Figure 8 shows the performance of Equake on the sequestered system. We first ran the benchmark on the 8-way SMP system with Windows running on all the processors. These numbers are reported as "Native" and "McRT-OS": "Native" refers to the performance of the Intel OpenMP* implementation (referred to as KAI in the figure), and "McRT-OS" refers to the performance of McRT running on top of Windows on the 8-way SMP. "McRT-Sequestered" refers to the performance on the sequestered system with one OS processor and seven sequestered processors. All speedups are reported with respect to the single-threaded "Native" execution time.
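The normalization described above can be written compactly. The symbols here are notation introduced for illustration and do not appear in the original article: T_c(n) denotes the execution time of configuration c with n threads, and S_c(n) the reported speedup.

```latex
S_c(n) = \frac{T_{\text{Native}}(1)}{T_c(n)},
\qquad c \in \{\text{Native},\ \text{McRT-OS},\ \text{McRT-Sequestered}\}
```

Under this definition all three curves share the same baseline, so a value above 1 means that configuration c with n threads finishes faster than the single-threaded "Native" run.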



Figure 8: Equake on sequestered system

Equake performs much better on the sequestered system, mainly because the software stack is much more lightweight and is interrupted less often. In addition, in sequestered mode the application reserves and locks down enough memory at initialization that it does not encounter page faults during execution.
