Computer workloads may be broadly classified into two categories: those that
have little inherent parallelism (scalar) and those that have significant
amounts of parallelism (parallel). It is desirable to design a microprocessor
that can run both scalar and parallel workloads at high performance. However,
scalar workloads demand short latency while parallel workloads demand high
throughput, and the design techniques for the two differ sharply. Achieving
short latency often requires expending large amounts of energy per
instruction, whereas achieving high throughput requires minimizing the energy
expended per instruction.
The advent of chip multiprocessors, coupled with the requirement not to raise
power consumption beyond today's levels, has created a sharp tradeoff between
latency performance and throughput performance. Good performance in both areas
is needed to run multithreaded workloads that inherently contain phases of
sequential execution as well as phases of parallel execution. Amdahl's
law dictates that the speedup achievable through parallelization of such
workloads will be limited by the sequential portion of the computation.
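
As a rough illustration of that limit, the sketch below evaluates the standard form of Amdahl's law. The function name and the example figures (a workload that is 90% parallel, run on 16 processors) are assumptions chosen for illustration and do not come from the talk.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup predicted by Amdahl's law for a workload whose
    parallelizable share is `parallel_fraction` (0.0 to 1.0)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the same speed no matter how many processors
    # exist; only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


if __name__ == "__main__":
    # Illustrative numbers only: 90% parallel work on 16 processors.
    print(amdahl_speedup(0.9, 16))
    # With a 10% serial portion the speedup stays below 10x
    # however many processors are added.
    print(amdahl_speedup(0.9, 1_000_000))
```

Under these assumed figures, 16 processors yield a speedup of about 6.4 rather than 16, which is why performance in the sequential phases matters so much.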
In this talk, we show that the key to achieving both excellent latency
performance and excellent throughput performance is to dynamically vary
the amount of energy expended to process instructions according to the amount
of parallelism available in the software. We survey four techniques for
achieving variable energy per instruction (voltage/frequency scaling,
asymmetric cores, variable-size cores, and speculation control) and propose an
energy per instruction (EPI) throttle. The EPI throttle varies the amount of
energy expended per instruction in inverse proportion to the aggregate number
of instructions retired per second in order to maintain a chip multiprocessor's
total power consumption within a fixed power budget.
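
The relationship behind the throttle can be sketched from the identity power = EPI x instructions per second. The code below is a minimal sketch of that inverse proportion, not the mechanism presented in the talk: the function name, the clamping to an achievable EPI range, and all numeric values are hypothetical.

```python
def epi_target(power_budget_watts: float,
               instructions_per_second: float,
               epi_min_joules: float,
               epi_max_joules: float) -> float:
    """Energy per instruction (joules) that keeps total power at the budget.

    Since power = EPI * (instructions retired per second), holding power
    constant means EPI must vary inversely with the aggregate retire rate.
    The min/max bounds are a hypothetical stand-in for the range of EPI
    that a given chip could actually reach.
    """
    if instructions_per_second <= 0:
        # Nothing is retiring, so the budget imposes no constraint.
        return epi_max_joules
    unclamped = power_budget_watts / instructions_per_second
    return min(epi_max_joules, max(epi_min_joules, unclamped))


if __name__ == "__main__":
    budget = 100.0  # watts, assumed for illustration
    # Sequential phase: few instructions retire, so each may use more energy.
    print(epi_target(budget, 1e9, 1e-9, 5e-8))
    # Parallel phase: many instructions retire, so each must use less energy.
    print(epi_target(budget, 1e11, 1e-9, 5e-8))
```

In this sketch the sequential phase is capped by the assumed upper EPI bound, while the parallel phase is driven down to the lower bound.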
To evaluate the effectiveness of EPI throttling, we present an experimental
prototype that uses the Pentium 4 microprocessor's clock throttling mechanism
and Linux thread affinity to emulate the performance effects of controlling
EPI. Across a range of multithreaded workloads, we show that a multiprocessor
that can vary its EPI will outperform a multiprocessor that runs at constant
EPI when both are given the same power budget. By continuously optimizing the
use of a fixed power budget according to the amount of parallelism in each
phase, EPI throttling mitigates the effects of Amdahl's law. We believe that
techniques to control EPI will become an essential part of future
microprocessor microarchitecture.
Ed Grochowski is a Senior Principal Engineer at the Architecture and
Microarchitecture Research Lab in Santa Clara, California. Ed joined Intel in
1986 and has had various technical and managerial responsibilities in the Intel
486®, Pentium®, Pentium® II, and Itanium® microprocessor design teams. Ed is
currently working on the microarchitectural techniques needed for future
energy-efficient chip-level multiprocessors.
Ed received his BSEE in 1985 and his MSEE in 1986, both from the University
of California, Berkeley. Ed holds over 30 patents in the areas of
microarchitecture and logic design.