Intel Technology Journal
Tera-scale Computing
Architectural Support for Fine-Grained Parallelism on Multi-core Architectures
CONCLUSION
Multi-core architectures (MCAs) provide an opportunity to greatly accelerate applications. However, to harness the rapidly growing compute resources of MCAs, applications must expose their thread-level parallelism to the hardware. We explore one approach commonly used on large-scale multiprocessor systems: decomposing the parallel sections of a program into many tasks and letting a task scheduler dynamically assign those tasks to threads.
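To illustrate this decomposition, here is a minimal Python sketch of a software dynamic task scheduler built on a single shared queue. This is one common software design. The optimized schedulers evaluated in the article may differ, for example by using per-thread queues with work stealing. The names `run_tasks` and `make_chunk_task` are hypothetical. Because of CPython's global interpreter lock, these threads will not speed up CPU-bound work, so the sketch shows only the scheduling logic.

```python
import queue
import threading


def run_tasks(tasks, num_threads):
    """Hypothetical helper: dynamically assign tasks to worker threads.

    All tasks go into one shared queue. Each worker repeatedly pulls the
    next available task, so threads that finish early take on more work
    (dynamic load balancing). Results are returned in task order.
    """
    task_queue = queue.Queue()
    for index, task in enumerate(tasks):
        task_queue.put((index, task))

    results = [None] * len(tasks)

    def worker():
        while True:
            try:
                # Each dequeue locks the shared queue: this is per-task
                # scheduler overhead, paid once for every task.
                index, task = task_queue.get_nowait()
            except queue.Empty:
                return  # no tasks left, so this worker exits
            results[index] = task()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def make_chunk_task(start, stop):
    """Hypothetical helper: one task covering loop iterations [start, stop)."""
    return lambda: sum(i * i for i in range(start, stop))


# Decompose a parallel loop (sum of squares of 0..999) into 10 tasks.
chunk = 100
tasks = [make_chunk_task(s, min(s + chunk, 1000)) for s in range(0, 1000, chunk)]
partials = run_tasks(tasks, num_threads=4)
print(sum(partials))  # prints 332833500
```

The `chunk` size sets task granularity. Smaller chunks give the scheduler more freedom to balance load, but they also pay the per-task dequeue cost more often relative to the useful work in each task.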
Previous work has proposed software implementations of dynamic task schedulers, which we examine in the context of a key emerging application domain: recognition, mining, and synthesis (RMS). We find that a significant number of RMS applications achieve poor parallel speedups with software dynamic task scheduling. This is because their tasks are small enough that the scheduler's overhead becomes large relative to the useful work in each task.
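A simple back-of-the-envelope model shows why small tasks hurt. Its assumptions are not taken from the article. Suppose a parallel section is split into N equal tasks of t cycles each and run on P threads, with N much larger than P. Assume also that the scheduler adds o cycles of overhead per task with no contention. Then the serial time is T_1 = N t, and:

```latex
T_P \approx \frac{N}{P}\,(t + o), \qquad
S = \frac{T_1}{T_P} \approx \frac{N t}{\frac{N}{P}\,(t + o)} = P \cdot \frac{t}{t + o}
```

Consider hypothetical values P = 64, t = 1000 cycles, and o = 500 cycles. These give S ≈ 64 × 1000 / 1500 ≈ 42.7 rather than 64. The same overhead with t = 10,000 cycles gives S ≈ 61. Contention on shared scheduler state can grow with P, which would make the software case worse than this optimistic model suggests.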
To enable good parallel scaling even for applications with very small tasks, we propose a hardware scheme that accelerates dynamic task scheduling. It requires relatively simple hardware and tolerates growing on-die latencies, which makes it a good fit for scalable MCAs.
We compare the proposed hardware to optimized software task schedulers and to an idealized hardware task scheduler. For the RMS benchmarks we study, our hardware delivers large performance gains over the software schedulers and comes very close to the performance of the idealized hardware scheduler.
