

Volume 12, Issue 02

Intel's 45nm CMOS Technology


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1202.04

  • Volume 12
  • Issue 02
  • Published June 17, 2008


  Section 3 of 8  

45nm SRAM Technology Development and Technology Lead Vehicle

X-CHIP OVERVIEW

The X-chip series began many generations ago with the X1 testchip, which was the certification vehicle for the 180nm process technology node. From X1 to X6, the role and content of the testchip have grown significantly, yet some elements and challenges have remained the same.

The memory cell remains the most critical collateral, and its design requires careful consideration. The footprint of the memory cell often sets the design-rule limit at a given technology node, and it has a large impact on product die size, given the multi-megabyte caches of modern CPUs. The stability of the cell at low voltage often determines the low-voltage operating limit of the products and hence their power consumption. Because product caches tile millions of memory bits, any systematic process defect mechanism in the cell has a huge impact on product yield. For these reasons, product designs have for several generations been required to copy exactly the memory bits developed on the X-chips.
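The yield argument above can be illustrated with a first-order model. The sketch below is not an Intel yield model: it assumes bit failures are independent, uses hypothetical per-bit failure probabilities, ignores redundancy and ECC, and computes the probability that every bit in an array works, (1 - p)^N.

```python
import math


def array_yield(p_bit_fail: float, n_bits: int) -> float:
    """Probability that all n_bits work, assuming independent failures with
    per-bit probability p_bit_fail and no redundancy or ECC."""
    # log1p keeps precision when p_bit_fail is tiny
    return math.exp(n_bits * math.log1p(-p_bit_fail))


if __name__ == "__main__":
    cache_bits = 6 * 2**20 * 8  # a 6MB cache, data bits only
    for p in (1e-9, 1e-8, 1e-7):  # hypothetical per-bit failure rates
        print(f"p={p:.0e}  yield={array_yield(p, cache_bits):.3f}")
```

For a 6MB array (about 50 million data bits), a tenfold rise in the per-bit failure rate from 1e-9 to 1e-8 drops this idealized yield from roughly 95% to roughly 60%, which is why a systematic bitcell defect is so costly once the cell is tiled millions of times.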

The SRAM bits are organized into addressable units called subarrays, each a rectangular matrix of rows and columns. During a read or write operation, the address activates a specific row and specific columns of a subarray, and a group of bits called a word is read or written. Because a subarray is tiled many times to form the on-die cache of a real product, it is designed to be very compact, which is achieved by using the tightest design rules. Once a subarray design is complete, large portions of the X-chip area can be tiled with it with minimal additional effort. Because this dense, design-rule-limited layout covers much of the die, the SRAM on the X-chips has always proved sensitive to many different kinds of process defects, and the fine-grained addressability of the compact subarrays allows the location of a defect to be isolated rapidly.
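The fine-grained addressability described above means a failing address maps directly to a physical location. The sketch below shows that mapping as a simple field split of a flat word address into subarray, row, and column-group indices; the geometry constants are hypothetical and do not describe the actual X6 subarray organization.

```python
from typing import NamedTuple

# Hypothetical geometry for illustration only; not the actual X6 organization.
ROWS_PER_SUBARRAY = 256   # wordlines per subarray
WORDS_PER_ROW = 4         # words sharing one row, selected by column decode
WORDS_PER_SUBARRAY = ROWS_PER_SUBARRAY * WORDS_PER_ROW


class WordLocation(NamedTuple):
    subarray: int
    row: int
    column_group: int


def decode(word_address: int) -> WordLocation:
    """Split a flat word address into subarray, row, and column-group fields."""
    subarray, offset = divmod(word_address, WORDS_PER_SUBARRAY)
    row, column_group = divmod(offset, WORDS_PER_ROW)
    return WordLocation(subarray, row, column_group)


def encode(loc: WordLocation) -> int:
    """Inverse of decode: rebuild the flat word address from its fields."""
    return (loc.subarray * WORDS_PER_SUBARRAY
            + loc.row * WORDS_PER_ROW
            + loc.column_group)


if __name__ == "__main__":
    # A failing word address reported by the tester maps to one subarray, row, and column group.
    print(decode(5000))
```

Because decode and encode are exact inverses, a fail list can be converted to physical coordinates and back without loss, which is what makes rapid defect localization possible.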

On the test-infrastructure side, X-chips have always carried an I/O interface that can communicate at the required speeds with all of the different tester platforms. A Programmable Built-in Self Test (PBIST) has also been an integral part of the X-chips, allowing high-speed on-die testing and burn-in tests. Among the mixed-signal blocks, X-chips have always included a PLL, which also serves as a clock multiplier for high-speed on-die SRAM testing.

X-chip design faces some challenges that product design does not. All the key electrical parameters of the transistors and interconnects evolve significantly between early development and the point at which the process reaches maturity. If the design is not robust enough to tolerate this evolution and remain largely functional, it cannot provide reliable silicon data for continuous process learning. Testchip circuits are designed largely on engineering judgment, because only limited simulation data are available when the design is finalized, so a "correct-by-construction" approach is essential to ensure a robust design.

Layout design rules are defined almost simultaneously with the testchip design, which creates uncertainty and inevitable last-minute changes for testchip designers. Working closely with the rule-definition teams mitigates the impact of these uncertainties. This tight coupling is also needed because the testchip often exposes scenarios that were not considered when the design rules were defined: it provides the first reality check for the process design-rule set and the validation flows. The design time allowed for the testchip is short, and its schedule must be synchronized with the process integration schedule so that it does not become a bottleneck for technology development. Process learning accelerates significantly once the first testchip silicon arrives.



Figure 1: X6 reticle

Figure 1 shows a diagram of the X6 reticle. The reticle carries two different flavors of SRAM-based die, each instanced twice, giving four SRAM-based dice; two additional dice are allocated to discrete test structures. This article focuses on the SRAM-based X6 testchip. The two flavors of the X6-SRAM testchip have similar architecture and content. Having two flavors, however, lets designers implement different variants of the same circuits and compare their merits directly. Although this comparison could also be made within a single testchip design, two die flavors let the two designs proceed in parallel, which shortens design time.

X6 Test Features

The main chip-level tileable SRAM unit is the Raster Unit (RU), a block of SRAM sized for convenient generation of raster maps (bitmaps of failing cells). An RU is significantly larger than a single subarray but small enough to be tested in a reasonable time, and all RUs can be tested in parallel, which saves test time. The RU-based architecture of the testchip allows easy integration of different flavors of memory bits and subarray designs while ensuring that each flavor has a statistically significant sample size.
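A raster map turns a list of failing cells into a picture whose shape hints at the defect mechanism. The sketch below renders a fail list as a text bitmap and sorts failures into row, column, and single-bit signatures. The classification threshold and the synthetic fail list are hypothetical; the article does not describe how X6 raster data are post-processed.

```python
from collections import Counter


def classify_fails(fails, line_threshold=4):
    """Group failing (row, col) cells into row, column, and isolated single-bit signatures.

    line_threshold is a hypothetical cutoff: a row or column with at least this
    many failing cells is reported as a line failure.
    """
    fails = set(fails)
    row_counts = Counter(r for r, _ in fails)
    col_counts = Counter(c for _, c in fails)
    bad_rows = {r for r, n in row_counts.items() if n >= line_threshold}
    bad_cols = {c for c, n in col_counts.items() if n >= line_threshold}
    singles = {(r, c) for r, c in fails if r not in bad_rows and c not in bad_cols}
    return sorted(bad_rows), sorted(bad_cols), sorted(singles)


def render(fails, n_rows, n_cols):
    """Text raster map: '#' marks a failing cell, '.' a passing one."""
    fails = set(fails)
    return "\n".join(
        "".join("#" if (r, c) in fails else "." for c in range(n_cols))
        for r in range(n_rows)
    )


if __name__ == "__main__":
    # Synthetic fail list: one bad row, one bad column, and one isolated bit.
    fails = [(2, c) for c in range(8)] + [(r, 5) for r in range(8)] + [(6, 1)]
    print(render(fails, 8, 8))
    print(classify_fails(fails))
```

A row signature typically points at wordline or row-decoder problems and a column signature at bitline or sense-path problems, while isolated bits point at the cell itself; that distinction is what makes raster data useful for process learning.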

Chip-level inputs arrive through I/Os located along two edges of the die. The major buses run along a center spine that connects the RUs to the I/O and other control blocks. A second center spine, in the orthogonal direction, contains the major non-SRAM blocks such as the PBIST, the Test Access Port (TAP), the fuse unit, and the PLL. With the exception of the PBIST, all synchronous circuits are located within the RUs. Each RU has two clock domains: a high-frequency clock and a low-frequency clock. The high-frequency clock is contained entirely within the RU, and no synchronous paths run between RUs, which makes the chip design very tolerant of clock skew.

The PBIST on X6 can be programmed to generate the test patterns required for SRAM testing. This is useful where the tester cannot support complex test patterns, as on the burn-in test platform, and for high-frequency testing, where on-die testing avoids the need for expensive high-speed testers and complex I/O circuitry. The scheme for high-frequency testing is illustrated in Figure 2. The PBIST belongs to the low-frequency clock domain, so it is unlikely to become the frequency bottleneck and tolerates a process that is off-target in performance during the development stage. A Parallel-to-Serial (PS) converter turns the parallel instructions received at low frequency into a high-speed serial instruction stream [1]. An on-die compare circuit compares the "read" data from the SRAM with the "expect" data from the PBIST pattern to determine pass/fail.
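The article does not say which patterns the X6 PBIST is programmed with. As an assumed example, the sketch below applies March C-, a widely used SRAM test algorithm, to a bit-wide memory model with injected stuck-at cells, and performs the read-versus-expect comparison in software. The memory model and fault injection are hypothetical.

```python
# March C-: each element is (address order, [operations]); ("w", v) writes v,
# ("r", v) reads and expects v. The first and last elements may run in either
# order; "up" is used here.
MARCH_C_MINUS = [
    ("up",   [("w", 0)]),
    ("up",   [("r", 0), ("w", 1)]),
    ("up",   [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("up",   [("r", 0)]),
]


class FaultyMemory:
    """Bit-wide memory model with optional stuck-at cells (hypothetical faults)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # address -> forced value

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def run_march(mem, size, algorithm=MARCH_C_MINUS):
    """Apply a March algorithm and compare each read with its expected value,
    mimicking an on-die read-vs-expect comparator. Returns (address, expected) fails."""
    fails = []
    for direction, ops in algorithm:
        order = range(size) if direction == "up" else range(size - 1, -1, -1)
        for addr in order:
            for op, value in ops:
                if op == "w":
                    mem.write(addr, value)
                elif mem.read(addr) != value:  # miscompare: read data != expect data
                    fails.append((addr, value))
    return fails
```

A cell stuck at 1 miscompares on the three reads that expect 0, and a cell stuck at 0 on the two reads that expect 1, so the fail list identifies both the address and the fault polarity.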



Figure 2: High-frequency testing scheme

The high-frequency testing scheme described above is adequate for determining the maximum SRAM operating frequency (Fmax). For SRAM design validation and process learning, however, it is also useful to have raster data at Fmax to determine what limits Fmax, which is especially important for products. The high-frequency raster feature stalls the PBIST when a failure occurs. The failing addresses are recorded in on-die storage elements and can later be scanned out to identify the location of the faulty bits.
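The stall-and-record flow can be sketched as follows. Only the stall, record, and scan-out steps come from the text; the storage depth, and the choice to scan the storage out and resume the pattern whenever it fills, are assumptions made for illustration.

```python
def capture_fails(read_data, expect_data, depth):
    """Model at-speed raster capture with a small on-die fail store.

    On each miscompare the pattern stalls while the failing address is latched.
    When the (assumed) storage of `depth` entries is full, it is scanned out
    before the pattern resumes. Returns (addresses seen by the tester, scan-outs).
    """
    storage, tester_log, scan_outs = [], [], 0
    for address, (got, want) in enumerate(zip(read_data, expect_data)):
        if got == want:
            continue
        # Miscompare: stall the pattern and latch the failing address on die.
        storage.append(address)
        if len(storage) == depth:
            # Storage full: scan the entries out, clear them, then resume.
            tester_log.extend(storage)
            storage.clear()
            scan_outs += 1
    if storage:
        # Final scan-out of any entries left when the pattern finishes.
        tester_log.extend(storage)
        scan_outs += 1
    return tester_log, scan_outs
```

Because the pattern itself still runs at speed between stalls, the captured addresses reflect failures at Fmax even though the storage is read out slowly through scan.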

Error Correction Codes (ECC) are becoming very important in modern SRAM design, where ECC is an essential part of large arrays for maintaining data integrity under all product conditions. Since X6 is used to optimize the SRAM memory bit and subarray design, it is important to quantify the impact of ECC, and X6 includes a simple ECC emulator for this purpose.
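The article does not describe how the X6 ECC emulator works. One hypothetical way to quantify the benefit of single-error correction (SEC) is shown below: count failing words in measured fail data with and without correcting one bit per word, and compare the analytic word-failure probabilities under an independent-failure assumption. The 64-bit word size is an assumption.

```python
import math
from collections import Counter


def ecc_benefit(failing_bits, word_bits=64):
    """Emulate SEC on a measured fail list.

    failing_bits: flat addresses of bits that failed test.
    word_bits: data bits per ECC word (hypothetical; check bits are ignored).
    Returns (failing words without ECC, uncorrectable words with SEC).
    """
    fails_per_word = Counter(bit // word_bits for bit in set(failing_bits))
    without_ecc = len(fails_per_word)  # any failing bit corrupts the word
    with_sec = sum(1 for n in fails_per_word.values() if n >= 2)  # SEC repairs one bit per word
    return without_ecc, with_sec


def word_fail_prob(p, n, correctable=1):
    """Probability that more than `correctable` of n independent bits fail."""
    ok = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(correctable + 1))
    return 1 - ok
```

For example, with a hypothetical per-bit failure probability of 1e-6, `word_fail_prob(1e-6, 64, 0)` is about 6.4e-5 for an unprotected 64-bit word, while `word_fail_prob(1e-6, 72, 1)` for 64 data bits plus 8 check bits (as in a common (72,64) code) is about 2.6e-9, more than four orders of magnitude lower.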

X6 has a TAP that controls the scan function, and there are many scan chains, each dedicated to a specific purpose. For process learning, the two major roles of the scan chains are to physically isolate defects and to support process-variation tracking with In-Die-Variation (IDV) oscillators. When a process defect occurs between two stages of a pipelined synchronous circuit, a scan chain can drive the logic cone containing the defect with various input combinations, and the output of the logic cone can be captured through a scan chain in the same way. By analyzing the logic-cone output for different input combinations, the failing device can be pinpointed in a large number of cases. Most of the circuitry above the subarray level in the RUs is pipelined, so scan provides a useful method of failure isolation and process learning. IDV oscillators are well known for their ability to track variation in many process parameters, ranging from transistor performance to various kinds of leakage, interconnect delay, and device mismatch [2]. The easiest way to control IDV oscillators and extract their data is through scan chains. X6 provides a robust set of scan chains for this purpose, allowing IDV oscillators to be placed on-chip with a fine spatial granularity that would be impossible to manage using I/O bumps alone.
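The scan-based isolation described above amounts to fault diagnosis: apply input combinations to a logic cone, capture its outputs, and keep only the candidate defects consistent with every observation. The sketch below does this for a small hypothetical cone, y = (a AND b) OR (NOT c), using the single stuck-at fault model as an assumed stand-in for real defect behavior.

```python
from itertools import product

NETS = ("a", "b", "c", "n1", "n2", "y")


def eval_cone(a, b, c, fault=None):
    """Evaluate the hypothetical cone y = (a AND b) OR (NOT c).
    fault is an optional (net, stuck_value) pair modeling a defect."""
    def f(net, value):
        return fault[1] if fault and fault[0] == net else value

    a, b, c = f("a", a), f("b", b), f("c", c)
    n1 = f("n1", a & b)
    n2 = f("n2", 1 - c)
    return f("y", n1 | n2)


def diagnose(observed):
    """Return the single stuck-at faults consistent with every observed
    (inputs -> output) pair captured through scan."""
    candidates = [(net, v) for net in NETS for v in (0, 1)]
    return [
        fault for fault in candidates
        if all(eval_cone(*inputs, fault=fault) == out for inputs, out in observed.items())
    ]


if __name__ == "__main__":
    # Emulate a defect (n1 stuck at 0): scan in all 8 input vectors, capture outputs.
    observed = {v: eval_cone(*v, fault=("n1", 0)) for v in product((0, 1), repeat=3)}
    print(diagnose(observed))
```

For the injected n1 stuck-at-0 defect, the diagnosis returns a, b, or n1 stuck at 0: these faults produce identical outputs, so scan alone narrows the defect to one gate rather than one net. This mirrors the text's caveat that the failing device can be pinpointed "in a large number of cases" rather than always.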

X6 Memory Design: SRAM and Register File

CMOS technology scaling continues to drive increases in on-die memory density to meet the performance needs of applications such as microprocessors. Meanwhile, device variation and leakage increase as transistors continue to shrink. As a result, it has become increasingly challenging to design SRAM with an adequate stability margin for low-voltage operation while keeping power consumption low enough to meet system-level power requirements. The X6 testchip is a 153Mb SRAM design optimized for a 45nm Hi-K Metal-Gate technology [3].

Figure 3 shows the SEM top-down view of the 0.346 µm² 6T-SRAM cell fabricated in the 45nm Hi-K Metal-Gate CMOS technology [3]. The cell design takes full advantage of new technology features, such as trench contacts, to achieve high density and low-voltage operation. The Hi-K Metal-Gate transistor technology essentially eliminates gate leakage in the bitcell, and the overall measured cell leakage is reduced by 10x compared with the 65nm technology, as shown in Figure 3. This leakage reduction provides enormous benefits in power-constrained applications.
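The link between cell footprint and die size can be made concrete with simple arithmetic. The sketch below multiplies the 0.346 µm² cell area from the text by a bit count, assuming 1Mb = 2**20 bits. It gives raw bitcell area only; periphery, redundancy, and ECC overhead are excluded, so real arrays are larger.

```python
BITS_PER_MB = 2**20      # assumption: 1Mb = 2**20 bits
CELL_AREA_UM2 = 0.346    # 45nm 6T SRAM cell area quoted in the text


def bitcell_area_mm2(n_bits: int, cell_area_um2: float = CELL_AREA_UM2) -> float:
    """Raw bitcell area only: no periphery, redundancy, or ECC overhead."""
    return n_bits * cell_area_um2 / 1e6  # 1 mm^2 = 1e6 um^2


if __name__ == "__main__":
    print(f"153Mb X6 array: {bitcell_area_mm2(153 * BITS_PER_MB):.1f} mm^2")
    print(f"6MB cache:      {bitcell_area_mm2(6 * 8 * BITS_PER_MB):.1f} mm^2")
```

The 153Mb array needs about 55.5 mm² of bitcells, and 6MB of data bits about 17.4 mm². Every 0.01 µm² saved per cell therefore removes roughly 1.6 mm² from a 153Mb array.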



Figure 3: SEM top-down view of 0.346 µm² SRAM cell in 45nm technology (left); SRAM cell leakage comparison between 65nm and 45nm technologies (right)



Figure 4: Measured voltage-frequency shmoo

Figure 5 describes the array architecture and configuration along with the critical timing of the array. The array is built from a 16KB subarray with 256 cells per bitline and 128 cells per wordline, chosen to achieve optimal array efficiency while meeting an aggressive frequency target. The subarray has built-in column and row redundancy for yield improvement. In Figure 5, row and column counts are written as (M+N), where M is the base number and N is the number of redundant rows or columns. To meet today's microprocessor bandwidth requirements, the subarray supports 64-bit-wide reads and writes. A write transfers 64 bits of data in a single cycle, while read data leaves the subarray as two 32-bit chunks in two consecutive cycles to minimize global routing congestion. Sleep and Forward Body Bias (FBB) each have independent control circuits along the 256x256 sector boundary, providing the fine granularity needed to balance switching power against area efficiency. As shown in Figure 4, the design achieves operating frequencies of up to 3.8GHz at a 1.1V power supply.
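The two-cycle read can be sketched as splitting a 64-bit word into two 32-bit chunks and reassembling them at the receiver. Sending the lower half first is an assumption; the text does not specify the chunk order.

```python
MASK32 = (1 << 32) - 1


def read_chunks(word64: int):
    """Yield a 64-bit word as two 32-bit chunks on consecutive cycles.
    Lower half first is an assumed order."""
    yield word64 & MASK32           # cycle 1
    yield (word64 >> 32) & MASK32   # cycle 2


def reassemble(chunks):
    """Rebuild the 64-bit word from (low, high) chunks."""
    lo, hi = chunks
    return (hi << 32) | lo
```

The trade-off is one extra cycle of read latency in exchange for a global read bus half as wide, which is what reduces routing congestion across the large array.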



Figure 5: 16-KB subarray configuration and critical signals' timing diagram

Adequate PMOS strength in the 6T SRAM cell is essential for maintaining cell stability in active mode during low-voltage operation, but using lower-threshold-voltage PMOS devices is often prohibitive because of excessive transistor leakage. In X6, a dynamic FBB for the PMOS devices in the SRAM cell was developed to improve the robustness of low-voltage operation while meeting stringent product and manufacturing requirements with minimal design overhead. Figure 6 shows the critical circuits of the dynamic FBB design.
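The reason FBB strengthens the PMOS follows from the standard long-channel body-effect relation, given here as a textbook approximation rather than a model of the X6 devices. For a PMOS with its source at V_CC and its N-well at V_NW, lowering V_NW below V_CC forward-biases the source-body junction by V_FB, where gamma is the body-effect coefficient and phi_F the Fermi potential:

```latex
|V_{tp}| = |V_{tp0}| + \gamma\left(\sqrt{2|\phi_F| - V_{FB}} - \sqrt{2|\phi_F|}\right),
\qquad V_{FB} = V_{CC} - V_{NW} \ge 0
```

Because a positive V_FB makes the first square root smaller than the second, |V_tp| drops and the PMOS drive increases. The relation holds only while V_FB stays well below the junction turn-on voltage, which is consistent with the need to limit how far the N-well is pulled down.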



Figure 6: Dynamic SRAM PMOS FBB circuit

The amount of FBB is set by the ratio of two PMOS devices (PL and PD), which have built-in programming control. To meet the speed requirements of dynamic operation, an NMOS pull-down path formed by transistors ND and NS and controlled by a pulse signal provides a fast voltage transition on the N-well. A feedback (shutoff) mechanism prevents the N-well voltage from dropping too low and causing excessive junction leakage; the trip points of the inverters SI1 and SI2 are optimized for this purpose. The pull-down pulse is programmable and is generated from the wakeup signal. It starts discharging the N-well one cycle before the wordline (WL) is turned on, ensuring that the N-well voltage has reached its intended static level. Because FBB is applied selectively to the activated portions of the large array, its overall power impact is kept to a minimum. Test results demonstrate that a wide range of FBB strength can be achieved under high-frequency operation. The stronger PMOS under FBB improves the minimum operating voltage by up to 75mV without increasing the overall SRAM leakage power.
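The roles of the pulsed pull-down and the feedback shutoff can be illustrated with a toy discrete-time model. This is not a circuit simulation: every parameter is hypothetical, a fixed per-cycle fraction stands in for each device's pull strength, and the static level stands in for the PL/PD ratio.

```python
def nwell_transient(vcc=1.0, ratio=0.85, trip=0.87, pulse_cycles=4,
                    steps=40, fast=0.1, slow=0.1):
    """Toy per-cycle model of a dynamically biased N-well (all values hypothetical).

    ratio: static N-well level as a fraction of vcc (stand-in for the PL/PD ratio).
    trip:  fraction of vcc at which the feedback shuts off the fast pull-down.
    fast, slow: per-cycle pull strengths of the pulsed NMOS path and the PMOS pair.
    Returns the N-well voltage after each cycle.
    """
    v_static, v_trip = ratio * vcc, trip * vcc
    v, trace = vcc, []
    for cycle in range(steps):
        # Pulsed NMOS path pulls toward ground while the pulse is high, but the
        # feedback shuts it off once the well falls below the trip point.
        if cycle < pulse_cycles and v > v_trip:
            v -= fast * v
        # The PL/PD pair continuously pulls the well toward its static level.
        v += slow * (v_static - v)
        trace.append(v)
    return trace
```

With the defaults, the shutoff limits the dip to about 0.81 before the well settles at 0.85; calling `nwell_transient(trip=0.0)` disables the shutoff and lets the well fall to about 0.69 during the pulse, the kind of excess forward bias that the real feedback circuit guards against.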

Dynamic sleep, which lowers the SRAM power supply, has proven effective in reducing static power consumption by reducing leakage [4]. Controlling the SRAM voltage during the sleep state is critical for maintaining the integrity of the data stored in the array, and high-volume manufacturing requires programming capability [5]. A scheme with active feedback control of SRAM VCC has been proposed to improve tolerance to Process-Voltage-Temperature (PVT) variation [6], but it requires an off-chip voltage reference. In this design, an on-die programmable voltage generator is built with N-well-based precision resistors. It has low design overhead and is insensitive to PVT conditions, and a simplified Op-Amp further reduces the overall area overhead to less than 2%. The integrated design is shown in Figure 7. SRAM VCC is more sensitive to temperature here because it depends on subthreshold leakage, which is the dominant leakage source in Hi-K Metal-Gate technology, where gate leakage has been eliminated. Active control combined with the temperature-insensitive on-die reference voltage generator provides a much tighter SRAM VCC distribution than passive control, which translates into an approximately 100mV lower standby voltage during the sleep state and better leakage reduction.
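One reason a resistor-based reference can be temperature-insensitive is that a divider's output depends only on a resistance ratio. The sketch below assumes a programmable tapped ladder of matched segments sharing one temperature coefficient; the ladder topology, segment count, resistance, and coefficient are all hypothetical, since the article does not give the generator's circuit details.

```python
def ladder_reference(vdd, tap, n_segments=16, temp_c=25.0, r0_ohm=1000.0, tc_per_c=3e-3):
    """Output of a programmable tapped resistor ladder (hypothetical values).

    All segments are assumed to be matched resistors with the same temperature
    coefficient, so temperature scales every segment equally and cancels in the ratio.
    """
    if not 0 <= tap <= n_segments:
        raise ValueError("tap out of range")
    r_seg = r0_ohm * (1 + tc_per_c * (temp_c - 25.0))  # every segment drifts together
    r_bottom = tap * r_seg
    r_total = n_segments * r_seg
    return vdd * r_bottom / r_total
```

Selecting a tap programs the reference, for example tap 12 of 16 gives 0.75 of the supply, and the output is the same at -10°C and 110°C because the temperature term divides out. In practice, matching between segments rather than their absolute value sets the accuracy.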



Figure 7: Active SRAM VCC control with integrated on-die programmable reference voltage generation



Figure 8: 45nm Intel® Core™2 Processor with a 6MB L2 cache built from the common 16KB subarray

The modular architecture of the X6 SRAM design described above enabled the 16KB subarray to be used directly as the building block of the 6MB L2 cache in the next-generation Intel® Core™2 processor-based CPU [7]. Figure 8 shows the die photo of this product with the L2 cache highlighted. This silicon-verified, process-optimized subarray was one of the key contributors to the successful production ramp of the product, which demonstrates the value of using a common SRAM design across products with similar applications to reduce manufacturing risk. A detailed description of the SRAM design can be found in [8].
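The scale of reuse follows from simple arithmetic. The sketch below counts how many 16KB subarrays supply 6MB of data capacity, assuming binary KB and MB; redundant rows and columns, and any ECC or tag storage, are not counted.

```python
KB = 1024
MB = 1024 * KB


def subarrays_needed(capacity_bytes, subarray_bytes=16 * KB):
    """Number of subarrays needed to hold capacity_bytes of data (ceiling division)."""
    return -(-capacity_bytes // subarray_bytes)


if __name__ == "__main__":
    print(subarrays_needed(6 * MB))
```

A 6MB cache divides evenly into 384 copies of the 16KB subarray, so any design or process fix validated on the X6 subarray is multiplied 384 times in the product cache.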

Multi-port register file (RF) memories are common and important in CPUs and other logic products. Because of the unique circuit topologies used in this kind of memory, its sensitivity to process variation, for example to the NMOS-to-PMOS strength ratio, has increased as technology scaling continues. The X6 testchip contained several important RF arrays that are used directly in Intel's lead CPU products. These arrays proved very effective both for optimizing transistor parameters and for improving circuit models for various designs.
