The Intel® Core™ Duo processor is the first mobile core to implement Core Multi-Processor (CMP) technology on one die. The
implementation was chosen both to maximize performance, so that it can be used as a general-purpose processor, and to
minimize power consumption, so as to extend battery life and fit a wide variety of thermal envelopes.
Performance was improved in two ways: by enhancing the micro-architecture of the single core, which is based on Pentium® M
processor technology, and by combining two cores on the same die. To meet the power-consumption goal, we examined each
micro-architectural decision with respect to its power/performance benefit. A general overview of the processor and its
unique features can be found in this special issue of the Intel Technology Journal [1]. This paper focuses on the
multi-core design and performance aspects of the processor, but for each decision described here, we also discuss how
power and thermal considerations were taken into account.

Figure 1: Theoretical power consumption for the same performance: single thread vs. dual thread
The first question one might ask is: why choose a CMP implementation for a mobile processor? Figure 1 compares the power
needed to complete the same amount of work in the same execution time when using frequency scaling vs. when using dual
cores. For the comparison, we take as a baseline a single-core processor that consumes 1 Watt at a given frequency and
voltage. To double its performance, one can either double both its frequency and its voltage, or double the number of
cores (assuming the software scales perfectly). As the graph shows, under these simple conditions parallel execution is a
better way to reach the same performance than increasing the speed of the processor. The power (P) a processor consumes
depends on its voltage and frequency. To explain the graph of Figure 1, consider a more realistic relationship between the
power the processor consumes and its voltage and frequency. The basic relationship is given by Equation 1:

$$P = C \cdot V^{2} \cdot \alpha \cdot F \qquad (1)$$

where P stands for power, C for capacitance, V for voltage, α for the activity factor, and F for frequency. For each
frequency within the design space, there is a minimum V that can support it; we call the pair (Fi, Vi) a working point of
the processor. As long as Vmin < Vi < Vmax, we can approximate Fi as linearly dependent on Vi, and for every (Fj, Vj) such
that Vj < Vmin, we set Vj equal to Vmin. As a result, within the dynamic range of V, substituting V ∝ F into Equation 1
shows that power grows with the cube of the frequency, while below Vmin, where V is fixed, power depends only linearly on
frequency. Figure 1 uses Equation 1 to estimate the power consumption of each configuration, but to represent a more
realistic scenario we use an exponent of 2.5 rather than 3 (the cubic relation). Unfortunately, this power-law relation
between power and frequency/voltage holds only while the working point is within the dynamic-scaling range of the
voltage, and the dual-core advantage materializes only when the software being used exposes enough parallelism.
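
The comparison behind Figure 1 can be illustrated with a small numeric model. The sketch below is a minimal illustration under stated assumptions, not the authors' tool: it normalizes the baseline core to 1 W at frequency 1.0, applies the 2.5 exponent from the text inside the dynamic-scaling range, and assumes a hypothetical knee (`f_at_vmin = 0.5`) below which the voltage is pinned at Vmin so that power falls only linearly. Vmax is not modeled, and perfect software scaling across cores is assumed, as in the text. The names `PowerModel`, `core_power`, and `power_for_throughput` are illustrative only.

```python
from dataclasses import dataclass

# The text replaces the ideal cube law (P ~ F^3) with a 2.5 exponent.
POWER_EXPONENT = 2.5


@dataclass(frozen=True)
class PowerModel:
    """Toy model of per-core power, normalized to a 1 W core at frequency 1.0.

    f_at_vmin is a HYPOTHETICAL normalized frequency at which the voltage
    reaches Vmin; below it the voltage stays clamped at Vmin.
    Vmax is not modeled.
    """

    baseline_watts: float = 1.0
    f_at_vmin: float = 0.5  # hypothetical knee, not a value from the paper
    exponent: float = POWER_EXPONENT

    def core_power(self, f: float) -> float:
        """Power (W) of one core running at normalized frequency f."""
        if f <= 0:
            raise ValueError("frequency must be positive")
        if f >= self.f_at_vmin:
            # Dynamic-scaling range: V tracks F, so P grows as F ** exponent.
            return self.baseline_watts * f ** self.exponent
        # Below Vmin: V is fixed, so P = C * Vmin^2 * alpha * F is linear in F.
        knee_watts = self.baseline_watts * self.f_at_vmin ** self.exponent
        return knee_watts * (f / self.f_at_vmin)

    def power_for_throughput(self, throughput: float, cores: int) -> float:
        """Total power (W) to deliver `throughput` (1.0 = one baseline core),
        assuming the work splits perfectly across `cores` identical cores."""
        if cores < 1:
            raise ValueError("need at least one core")
        per_core_f = throughput / cores
        return cores * self.core_power(per_core_f)


if __name__ == "__main__":
    model = PowerModel()
    for n in (1, 2):
        watts = model.power_for_throughput(2.0, n)
        print(f"2x performance on {n} core(s): {watts:.2f} W")
    # prints:
    # 2x performance on 1 core(s): 5.66 W
    # 2x performance on 2 core(s): 2.00 W
```

Under these assumptions, doubling performance costs about 5.66 W (2^2.5) on one core versus 2 W on two cores. The linear branch below `f_at_vmin` reflects the caveat in the text: once each core's voltage is pinned at Vmin, spreading the same work over more, slower cores no longer yields super-linear savings.
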
Since Intel Core Duo technology is aimed at the general-purpose mobile market, the design had to balance power
consumption and performance. We therefore used the following criteria to choose between design alternatives:
- When the system runs single-threaded applications, its performance should be the same as or better than that of
previous-generation Pentium M processors (with the same cache size and at the same frequency).
- When the system runs multi-threaded applications, its performance should be maximized, while power is conserved
through a new and efficient power and thermal control system.
In addition to the requirements above, we also had to weigh the complexity of each solution, since our experience has
shown that complex solutions consume more power. Thus, any new feature had to deliver a performance improvement large
enough to justify its complexity.
The primary goal of this paper is to discuss the CMP implementation and its resulting performance. We do not focus on the
power-saving techniques of Intel Core Duo processors, since reference [2] covers that aspect of the system. However, when
we discuss our design alternatives and why we chose one solution over another, the reader will notice that power savings
(both static and dynamic) were a major factor in our decisions.
The rest of this paper is organized as follows. In the next section we focus on the CMP implementation, including the
tradeoffs we considered, why we chose the current implementation, and its power and performance impact. Next, we focus on
performance measurements, and in the last section we extend our discussion to cover software optimizations.