Intel Lecture
Multithreading and Energy

Edward Grochowski

Computer workloads can be broadly classified into two categories: those with little inherent parallelism (scalar) and those with significant parallelism (parallel). Ideally, a single microprocessor would run both kinds of workload at high performance. However, the design techniques that achieve short latency differ sharply from those that achieve high throughput. Short latency often requires expending a large amount of energy per instruction, whereas high throughput requires minimizing the energy expended per instruction.

The advent of chip multiprocessors, combined with the requirement not to raise power consumption beyond today's levels, has created a sharp tradeoff between latency performance and throughput performance. Both are needed to run multithreaded workloads, which inherently contain phases of sequential execution as well as phases of parallel execution. Amdahl's law dictates that the speedup achievable by parallelizing such workloads is limited by the sequential portion of the computation.
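Amdahl's law makes this limit concrete: if a fraction p of the work can be spread across n cores and the rest must run sequentially, the ideal speedup is 1 / ((1 - p) + p / n). The short Python sketch below evaluates that textbook formula. The function name `amdahl_speedup` is illustrative, and the model assumes perfect parallel scaling with no communication or synchronization overhead.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Ideal speedup on n_cores when only parallel_fraction of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial work takes its full time; parallel work is divided across the cores.
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # With 90% of the work parallel, adding cores yields diminishing returns.
    for n in (1, 4, 16, 64, 1024):
        print(f"{n:5d} cores: speedup = {amdahl_speedup(0.9, n):.2f}")
```

Even with 90% of the work parallel, the speedup can never exceed 1 / (1 - 0.9) = 10, no matter how many cores are added; the sequential phase dominates.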

In this talk, we show that the key to achieving both excellent latency performance and excellent throughput performance is to vary dynamically the amount of energy expended per instruction according to the amount of parallelism available in the software. We survey four techniques for varying energy per instruction (voltage/frequency scaling, asymmetric cores, variable-size cores, and speculation control) and propose an energy per instruction (EPI) throttle. Because a chip's power consumption equals its EPI multiplied by the aggregate number of instructions retired per second, the EPI throttle varies the energy expended per instruction in inverse proportion to that aggregate rate in order to keep a chip multiprocessor's total power consumption within a fixed power budget.
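Since joules per instruction times instructions per second gives watts, holding chip power at a budget P means the allowed EPI is P divided by the measured aggregate instruction rate. The Python sketch below applies that relation to choose among a set of discrete operating points. It is a minimal illustration under stated assumptions, not the throttle described in the talk: the `EpiLevel` type, the `select_epi_level` function, and the EPI values are all hypothetical, and the sketch ignores the fact that switching levels changes the instruction rate itself, which a real controller would handle by re-evaluating every sampling interval.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EpiLevel:
    """HYPOTHETICAL operating point, e.g. a voltage/frequency pair or a core size."""
    name: str
    epi_nj: float  # energy per instruction at this level, in nanojoules


def select_epi_level(levels: Sequence[EpiLevel],
                     power_budget_w: float,
                     aggregate_ips: float) -> EpiLevel:
    """Pick the highest-EPI level whose predicted power fits the budget.

    Power (W) = EPI (J/instr) * aggregate IPS (instr/s), so the target EPI is
    power_budget_w / aggregate_ips: the more instructions retire per second
    (i.e. the more parallelism), the less energy each one may consume.
    """
    if not levels:
        raise ValueError("at least one EPI level is required")
    if aggregate_ips <= 0:
        # Nothing is retiring, so any level fits; allow the most aggressive one.
        return max(levels, key=lambda lvl: lvl.epi_nj)
    target_epi_nj = power_budget_w / aggregate_ips * 1e9  # joules -> nanojoules
    fitting = [lvl for lvl in levels if lvl.epi_nj <= target_epi_nj]
    if not fitting:
        # Even the cheapest level overshoots; fall back to the lowest EPI.
        return min(levels, key=lambda lvl: lvl.epi_nj)
    return max(fitting, key=lambda lvl: lvl.epi_nj)


if __name__ == "__main__":
    levels = [EpiLevel("low", 2.0), EpiLevel("mid", 5.0), EpiLevel("high", 12.0)]
    # One fast thread retiring 4e9 instr/s under 50 W: target 12.5 nJ -> "high".
    print(select_epi_level(levels, 50.0, 4e9).name)
    # Many threads retiring 2e10 instr/s under 50 W: target 2.5 nJ -> "low".
    print(select_epi_level(levels, 50.0, 2e10).name)
```

Choosing the highest EPI that still fits the budget reflects the intent of the throttle: when few threads are running, each may spend more energy per instruction to finish sooner; when many are running, each must spend less.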

To evaluate the effectiveness of EPI throttling, we present an experimental prototype that uses the Pentium 4 microprocessor's clock throttling mechanism and Linux thread affinity to emulate the performance effects of controlling EPI. Across a range of multithreaded workloads, we show that a multiprocessor that can vary its EPI will outperform a multiprocessor that runs at constant EPI when both are given the same power budget. By continuously optimizing the use of a fixed power budget according to the amount of parallelism in each phase, EPI throttling mitigates the effects of Amdahl's law. We believe that techniques to control EPI will become an essential part of future microprocessor microarchitecture.
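A back-of-the-envelope model illustrates why varying EPI can help under a fixed power budget. The Python sketch below is a toy analytical model, not a reproduction of the Pentium 4 prototype. It assumes (1) a first-order voltage/frequency scaling relationship in which EPI grows with the square of frequency, so per-thread instructions per second grow with the square root of EPI; (2) a workload with one serial phase and one perfectly parallel phase; and (3) a constant-EPI chip whose EPI is set so that all n cores together just fit the budget. The function names and the scaling constant `k` are hypothetical.

```python
import math


def per_thread_ips(epi: float, k: float = 1.0) -> float:
    """ASSUMED model: EPI ~ f**2 under voltage/frequency scaling, so IPS ~ sqrt(EPI)."""
    return k * math.sqrt(epi)


def epi_for_budget(power_budget: float, active_threads: int, k: float = 1.0) -> float:
    """Largest EPI at which active_threads threads together draw exactly power_budget.

    Per-thread power = EPI * IPS = k * EPI**1.5, so solve
    active_threads * k * EPI**1.5 == power_budget for EPI.
    """
    return (power_budget / (active_threads * k)) ** (2.0 / 3.0)


def run_time(parallel_fraction: float, n_cores: int, power_budget: float,
             variable_epi: bool, work: float = 1.0, k: float = 1.0) -> float:
    """Time to retire `work` instructions split into one serial and one parallel phase."""
    # Constant-EPI chip: EPI is fixed so that all n_cores busy at once fit the budget.
    full_load_epi = epi_for_budget(power_budget, n_cores, k)
    # Variable-EPI chip: a lone serial thread may spend the entire budget.
    serial_epi = epi_for_budget(power_budget, 1, k) if variable_epi else full_load_epi
    serial_time = (1.0 - parallel_fraction) * work / per_thread_ips(serial_epi, k)
    # With every core busy, both chips are held to the same budget-limited EPI.
    parallel_time = parallel_fraction * work / (n_cores * per_thread_ips(full_load_epi, k))
    return serial_time + parallel_time


if __name__ == "__main__":
    # Hypothetical configuration: 4 cores, budget of 4 power units, half the work parallel.
    t_const = run_time(0.5, 4, 4.0, variable_epi=False)
    t_var = run_time(0.5, 4, 4.0, variable_epi=True)
    print(f"constant EPI: {t_const:.3f}  variable EPI: {t_var:.3f}  "
          f"speedup: {t_const / t_var:.2f}")
```

In this model both chips behave identically in the parallel phase, because all cores are busy and both are held to the same budget-limited EPI. The entire gain comes from the serial phase, where the variable-EPI chip lets a single thread spend the whole budget; as the serial fraction grows, the advantage approaches n^(1/3) under these assumptions.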

Ed Grochowski is a Senior Principal Engineer at the Architecture and Microarchitecture Research Lab in Santa Clara, California. Ed joined Intel in 1986 and has had various technical and managerial responsibilities in the Intel 486®, Pentium®, Pentium® II, and Itanium® microprocessor design teams. Ed is currently working on the microarchitectural techniques needed for future energy-efficient chip-level multiprocessors.

Ed received his BSEE and MSEE from the University of California, Berkeley in 1985 and 1986, respectively. Ed holds over 30 patents in the areas of microarchitecture and logic design.


©Intel Corporation