Volume 11, Issue 04

Multi-Core Software


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1104.p

  • Volume 11
  • Issue 04
  • Published November 15, 2007

PREFACE Q4'07

By Lin Chao
Publisher and Editor, Intel Technology Journal

Multi-core processors are balanced for performance and power consumption. They can achieve high performance with optimal power consumption by sharing the work of executing tasks across multiple execution cores. To take advantage of these cores, software (SW) needs to be designed to execute in parallel across them through the use of threads. SW threads have long been used in scientific and high-performance computing, where large-scale computing resources are tied together to work on complex, numerically based mathematical problems. Today, multi-core processors are not just used in scientific computing; they are the standard in consumer desktop and mobile computers.
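As a minimal sketch of what sharing work across cores with threads can look like, the following C++ fragment splits a summation into contiguous chunks and gives each chunk to its own thread. It is an illustration only and is not taken from any paper in this issue: the function name parallel_sum is hypothetical, and it assumes the C++11 std::thread facility, which postdates this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from the journal): sums `data` using up to
// `num_threads` worker threads, one contiguous chunk per thread.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);

    // Never start more workers than there are elements, but always at least one.
    const std::size_t workers =
        std::max<std::size_t>(1, std::min<std::size_t>(num_threads, data.size()));
    const std::size_t chunk = (data.size() + workers - 1) / workers;  // ceiling division

    // Each worker writes only its own slot, so no locking is needed.
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> threads;
    threads.reserve(workers);

    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, &partial, w, begin, end] {
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }

    // Wait for every worker, then combine the per-thread results serially.
    for (std::thread& t : threads) {
        t.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A caller might pass std::thread::hardware_concurrency() as num_threads to request one worker per reported core; that function may return 0 when the core count is unknown, so a fallback of 1 would be needed in that case.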

Application developers face a daunting task when threading their applications. New SW development tools won't eliminate the inherent challenges, but they can help simplify the problem by identifying thread correctness issues and performance opportunities. The nine papers in Volume 11, Issue 4 of the Intel Technology Journal focus on multi-core SW and take a detailed and comprehensive look at important tools and methodologies for successfully threading many types of applications. Among these tools, the Threading Building Blocks Library (TBB) is available as open source. As the number of available cores continues to increase, it is important for SW developers to have the right tools to design and implement scalable solutions for today's and tomorrow's multi-core systems.

Below are snapshots of the nine papers.

The first paper takes a detailed look at the Intel® 10.1 C++/Fortran Compiler that includes new tools for code parallelization and vectorization. This compiler features various advanced optimizations to leverage the enhanced capabilities of Intel® Core™2 Duo and Quad processors. Significant performance gains are shown using the SPEC CPU2006* suite running on a system configured with two Intel® quad-core processors.

The second paper looks at the Intel® Performance Tuning Utility, whose performance analysis feature can assist at virtually every stage of both parallel and sequential SW performance tuning and may be extremely helpful in the preliminary stages of determining parallelization strategies. Using real program examples, the authors discuss a data-level decomposition strategy and illustrate how to estimate the efficiency of a parallel implementation and which steps to take to optimize a parallel program with this Intel utility.

Intel provides a set of threading tools targeting various phases of the development cycle. In the third paper, we introduce the principles of parallel application design and then show how to parallelize an application with the help of threading tools during each phase of the development cycle. A multiple pattern matching algorithm is used as an example.

In the fourth paper, we present the Intel® Math Kernel Library (MKL) as a mathematical SW package for scientific and technical computation designed for ease of use in environments that can vary greatly. This paper is devoted to the optimization and parallelization of the library. The goal has been to provide an easy-to-use SW package to aid in the development of mathematical SW. Achieving this goal has a number of facets including functionality, compiler independence, and the most recent efforts in performance, focusing on helping the user get the full benefits from Intel® multi-core systems.

In the fifth paper, we provide an overview of the Intel® Threading Building Blocks (TBB), a SW C++ template library that simplifies the development of SW applications running in parallel. In this paper, we describe the design of the TBB task scheduler and several scheduling optimizations users can keep in mind while coding their applications. The task scheduler is complemented by the Intel TBB scalable memory allocator. We provide an overview of its design and look at the tradeoffs.

The sixth paper looks at a case study of semi-automatic parallelization of large-scale integer applications. The application is Intel's own C++ Compiler, and we detail how we threaded this compiler to achieve an average 2x speedup when compiling a range of CPU2000 benchmarks, showcasing our methodology and tools. We believe our approach is generally applicable to threading a large class of applications.

The seventh paper looks at a forward-looking, scalable programming model called Ct and its associated API, which leverages the strengths of data parallel programming to help address the challenges of multi-core software development. In this paper, we describe how Ct is designed for minimal effort by the developer, while providing forward scaling on multi-core Intel® Architecture platforms.

In the eighth paper, we look at applications for video analysis and management, including search and retrieval. These applications are becoming mainstream and are mass-marketed. "Content-Based Video Information Retrieval (CBVIR)" is one of the commonly used techniques in this class of applications. In this paper, we optimize and parallelize a set of typical visual feature extraction applications. The underlying optimization and parallelization techniques are representative of those used in video-analysis applications and can be applied to other applications to improve their performance on multi-core systems.

The ninth paper looks at how the different multi-core topologies and the associated processor power management technologies bring new optimization opportunities to the process scheduler. We look into different scheduling mechanisms and the associated tradeoffs. Using the Linux* Operating System as an example, we also look into how some of these scheduling mechanisms are currently implemented. As the multi-core platform is evolving, some portions of the hardware (HW) and SW are being reshaped to take maximum advantage of the platform resources. We close this paper with a look at where future efforts in this technology are heading.
