Parallel computing is the simultaneous use of more than one computer or processor to execute a program. Ideally, a program runs faster in parallel because more engines (CPUs) work on it at the same time. Even single-core computers can take part in parallel processing by connecting to other computers over a network.
Intel leadership in parallelism
Intel continues to pioneer one of the most important directions in microprocessor architecture: increasing parallelism for increased performance. As shown in Figure 1, we started with the superscalar architecture of the original Intel® Pentium® processor and multiprocessing, continued in the mid-'90s by adding capabilities like out-of-order execution, and in 2002 introduced Hyper-Threading Technology in the Intel® Pentium® 4 processor.
Figure 1. Driving increasing degrees of parallelism on Intel® processor architectures.
These technologies paved the way for the next major step: the move away from a single monolithic processing core to multiple cores on a single chip. Early in 2005, we introduced Intel multi-core processor-based platforms to the mainstream. These platforms now contain Intel processors with two cores and will soon evolve to include many more. Over the coming decade, we plan to deliver Intel processors that will have dozens, and in some cases even hundreds, of cores. We believe that Intel's chip-level multiprocessing (CMP) architectures represent the future of microprocessors, because they deliver massive performance scaling while effectively managing power and heat.
The challenges of shrinking chips
In the past, performance scaling in conventional single-core processors was accomplished largely through increases in clock frequency (accounting for roughly 80 percent of the performance gains to date). But frequency scaling is running into some fundamental physical barriers.
First, as chip geometries shrink and clock frequencies rise, transistor leakage current increases, leading to excess power consumption and heat. Second, the advantages of higher clock speeds are in part negated by memory latency, since memory access times have not kept pace with increasing clock frequencies. Third, for certain applications, traditional serial architectures are becoming less efficient as processors get faster (due to the so-called von Neumann bottleneck), further undercutting any gains that frequency increases might otherwise buy. Fourth, resistance-capacitance (RC) delays in signal transmission are growing as feature sizes shrink, imposing an additional bottleneck that frequency increases don't address.
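The memory-latency point can be made concrete with a little arithmetic. The sketch below is a toy model, not a measurement. It assumes that run time splits into compute cycles, which shrink as the clock rises, and memory stalls, which cost a fixed number of nanoseconds regardless of clock speed. The function name and all workload numbers are hypothetical, chosen only for illustration.

```python
def run_time_ns(compute_cycles, mem_accesses, clock_ghz, mem_latency_ns):
    """Toy model: compute time scales with the clock, memory stalls do not."""
    compute_ns = compute_cycles / clock_ghz      # at 1 GHz, one cycle takes 1 ns
    stall_ns = mem_accesses * mem_latency_ns     # fixed cost per stalled access
    return compute_ns + stall_ns


# Hypothetical workload: 1,000,000 compute cycles and 5,000 stalls of 100 ns each.
slow = run_time_ns(1_000_000, 5_000, clock_ghz=2.0, mem_latency_ns=100.0)
fast = run_time_ns(1_000_000, 5_000, clock_ghz=4.0, mem_latency_ns=100.0)
speedup = slow / fast
print(f"2 GHz: {slow:.0f} ns, 4 GHz: {fast:.0f} ns, speedup: {speedup:.2f}x")
```

Under these assumed numbers, doubling the clock from 2 GHz to 4 GHz yields only about a 1.33x speedup, because the time spent waiting on memory is unchanged.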
Therefore, performance gains will have to come from means other than boosting the clock speed of large monolithic cores. Instead, the solution is to divide and conquer, breaking up functions into many concurrent operations and distributing these across many small processing units. Rather than carrying out a few operations serially at an extremely high frequency, Intel's multi-core processors will achieve extreme performance at more practical clock rates by executing many operations in parallel. Our multi-core architectures will circumvent the problems posed by frequency scaling (increased leakage current, mismatches between core performance and memory speed, and von Neumann bottlenecks). Intel® architecture with many cores will also mitigate the impact of RC delays.
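The divide-and-conquer idea can be illustrated in software. The following is a minimal sketch using Python's standard concurrent.futures module, not anything specific to Intel's processors. The workload (a sum of squares) and the helper names split_range, sum_of_squares and parallel_sum_of_squares are hypothetical, picked only to show one job being broken into independent chunks whose partial results are combined at the end.

```python
import concurrent.futures as cf


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)  # spread the remainder
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_of_squares(bounds):
    """Work for one chunk; it needs nothing from the other chunks."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4, executor_cls=cf.ProcessPoolExecutor):
    """Divide the range, compute partial sums concurrently, then combine."""
    chunks = split_range(n, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

A script would typically call parallel_sum_of_squares(1_000_000) from inside an `if __name__ == "__main__":` guard, which process pools require on some platforms. The default uses worker processes because CPython threads generally do not run CPU-bound Python code in parallel. Passing cf.ThreadPoolExecutor as executor_cls gives the same result without the multi-core speedup.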
Multi-core takes computing to the next level
Intel multi-core architectures provide a way not only to scale performance dramatically, but also to do so while minimizing power consumption and heat dissipation. Rather than relying on one big, power-hungry, heat-producing core, our multi-core processors need to activate only the cores required for a given function, while idle cores are powered down. This fine-grained control over processing resources enables the chip to use only as much power as it needs at any time.
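As a rough illustration of this idea, the sketch below models a chip that wakes only as many cores as the current load requires. It is a toy model under stated assumptions, not a description of how Intel processors manage power: the function names, the 8-core chip, the per-core capacity and the wattage figures are all hypothetical.

```python
import math


def cores_to_activate(load, per_core_capacity, total_cores):
    """Smallest core count that covers the load, capped at the chip's total.

    Assumption: at least one core always stays on.
    """
    needed = math.ceil(load / per_core_capacity)
    return max(1, min(needed, total_cores))


def chip_power_watts(active, total_cores, active_w, gated_w):
    """Toy power model: active cores draw active_w, powered-down cores gated_w."""
    return active * active_w + (total_cores - active) * gated_w


# Hypothetical 8-core chip: 10 W per active core, 0.5 W per powered-down core.
active = cores_to_activate(load=250, per_core_capacity=100, total_cores=8)
print(active, chip_power_watts(active, 8, active_w=10.0, gated_w=0.5))
```

With these assumed figures, the model wakes three cores and draws 32.5 W, compared with 80 W if all eight cores stayed active.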
Our multi-core architectures will also provide the essential special-purpose performance and adaptability that future platforms will require. In addition to general-purpose cores, Intel's chips will include specialized cores for various classes of computation, such as graphics, speech recognition algorithms and communications protocol processing. Moreover, Intel will design processors that allow dynamic reconfiguration of the cores, interconnects and caches to meet diverse and changing requirements.
Such reconfiguration might be performed by the chip manufacturer, to repurpose the same silicon for different markets; by the original equipment manufacturer (OEM), to tailor the processor to different kinds of systems; or in the field at runtime, to support changing workload requirements on the fly. Intel® IXP processors today provide some of these capabilities for special-purpose network processing. A related Intel research area focuses on developing a reconfigurable radio architecture that enables processors to adapt dynamically to different wireless networking environments (such as 802.11b, 802.11a, and W-CDMA).