Intel Technology Journal - Featuring Intel's Recent Research and Development
Intel® Centrino® Duo Mobile Technology
Volume 10    Issue 02    Published May 15, 2006
ISSN 1535-864X    DOI: 10.1535/itj.1002.02

  Section 6 of 10  
CMP Implementation in Systems Based on the Intel® Core™ Duo Processor
COMPARING SPLIT CACHE WITH SHARED CACHE

Several recent dual-core architectures use a split last-level cache in order to shorten time-to-market. This solution has two clear downsides:

  1. Cache-coherence-related events that must be served over the front-side bus (FSB), such as read-for-ownership (RFO) or invalidation signals, greatly impact performance and power.
  2. A single-threaded (ST) application cannot take advantage of the entire cache.

A hard-partitioned cache may have one significant benefit over a unified cache: it may prevent one application from significantly reducing the amount of cache memory available to an application running on the other core. In this section we therefore compare two systems: one uses a split L2 cache and the other uses a unified L2 cache. To make the comparison fair, we present speedup numbers rather than absolute numbers.

A sample physics engine game was created (using Microsoft DirectX*) to perform this study. The application is multithreaded (MT) using data domain decomposition. The threads are synchronized before the updates are rendered on the screen. Since the dependency among the threads is minimal, we expected the MT version to achieve a ~2.0x performance improvement over the ST version.
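The structure described above can be illustrated with a minimal C++ sketch. It is not the application used in the study: the `Particle` type, the function names `integrate_range` and `step_parallel`, and the simple position update are all hypothetical, chosen only to show data domain decomposition followed by a synchronization point before rendering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical particle type; the article does not describe its data layout.
struct Particle {
    float position;
    float velocity;
};

// Advance one contiguous slice [begin, end) of the shared array by dt.
void integrate_range(std::vector<Particle>& particles,
                     std::size_t begin, std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        particles[i].position += particles[i].velocity * dt;
    }
}

// Split the array into num_threads contiguous slices, update each slice on
// its own thread, and join every thread before returning. The joins act as
// the synchronization point that must complete before rendering.
void step_parallel(std::vector<Particle>& particles,
                   unsigned num_threads, float dt) {
    assert(num_threads > 0);
    const std::size_t n = particles.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no work left for further threads
        workers.emplace_back(integrate_range, std::ref(particles),
                             begin, end, dt);
    }
    for (std::thread& w : workers) {
        w.join();
    }
}
```

A frame loop would call `step_parallel(particles, 2, dt)` and only then render. Because the slices are disjoint, the worker threads need no locking among themselves, which matches the article's remark that the dependency among the threads is minimal.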

The system with the split L2 cache showed approximately a 1.68x performance improvement due to MT. Running the same application on the Intel® Core™ Duo processor-based system demonstrates ~1.90x scaling, in line with our expectations.
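One way to read these two figures is as parallel efficiency on two cores. The symbols below (S for speedup, E for efficiency, N for the number of cores, and the ST and MT run times) are notation introduced here for illustration and do not appear in the original article.

```latex
S = \frac{T_{\mathrm{ST}}}{T_{\mathrm{MT}}}, \qquad
E = \frac{S}{N}, \qquad N = 2

E_{\mathrm{split}} = \frac{1.68}{2} = 0.84, \qquad
E_{\mathrm{shared}} = \frac{1.90}{2} = 0.95
```

Under this reading, the split-cache system reaches about 84% of ideal two-core scaling and the shared-cache system about 95%. Each speedup is measured against the ST run on the same system, so the figures compare scaling and not absolute performance.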

The root cause of the difference in scaling is the shared L2 cache on the Intel Core Duo system. The sample application under study is designed so that both threads work on data from a shared data structure. On the system with the split L2 cache, the second processor must therefore go to main memory to access data modified by the first, which results in many L2 cache misses. Because the Intel Core Duo system has a unified L2 cache, the penalty of a cache miss and an access to main memory is avoided: data modified by one core is immediately available to the other core.


