Defect-Based Test: A Key Enabler for Successful Migration to Structural Test



Challenges of Defect-Based Test

Enumerating Defect Sites

The number of all possible defects on a chip is astronomical, and it is neither feasible nor worthwhile to generate tests for all of them. Fault enumeration is the task of identifying the most important defect sites and then mapping them into fault models that can be targeted by fault simulation and ATPG tools.

To enumerate likely defect sites, we need to understand the underlying causes of defects. Broadly speaking, defects are caused by process variations or random localized manufacturing imperfections, both of which are explained below:

  • Process variations such as transistor channel length variation, transistor threshold voltage variation, metal interconnect thickness variation, and inter metal layer dielectric thickness variation have a big impact on device speed characteristics. In general, the effect of process variation shows up first in the most critical paths in the design, those with maximum and minimum delays.
  • Random imperfections such as resistive bridging defects between metal lines, resistive opens on metal lines, improper via formations, and shallow trench isolation defects are another source of defects. Depending on the parameters of the defect and the neighboring parasitics, the defect may result in a static or an at-speed failure.

Techniques used for the extraction of faults due to random defects and process variations may differ, but the fundamental approach is to identify design marginalities that are likely to turn into defects when perturbed. The output of a fault extraction tool is typically ordered by probability of occurrence.
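As a minimal sketch of an extraction output ordered by probability of occurrence, the following Python fragment ranks candidate bridge sites. The `BridgeSite` record, its field names, and the weighting heuristic (parallel run length divided by spacing) are illustrative assumptions, not the format or metric of any particular extraction tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeSite:
    """Hypothetical record for a pair of nets that could be bridged."""
    net_a: str
    net_b: str
    run_length_um: float  # length over which the two nets run side by side
    spacing_um: float     # distance between the two nets


def weight(site):
    # Assumed heuristic: long parallel runs at tight spacing are more likely to bridge.
    return site.run_length_um / site.spacing_um


def rank_sites(sites):
    """Return (site, relative likelihood) pairs, most likely site first."""
    total = sum(weight(s) for s in sites)
    ranked = sorted(sites, key=weight, reverse=True)
    return [(s, weight(s) / total) for s in ranked]


sites = [
    BridgeSite("a", "b", run_length_um=10.0, spacing_um=0.5),
    BridgeSite("c", "d", run_length_um=100.0, spacing_um=0.25),
    BridgeSite("e", "f", run_length_um=5.0, spacing_um=1.0),
]
for site, likelihood in rank_sites(sites):
    print(site.net_a, site.net_b, round(likelihood, 3))
```

A real flow would replace the heuristic with weights derived from layout geometry and fab defect data, but the ordered list it hands to fault simulation has the same shape.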

Defect Modeling

To test a device, we apply a set of input stimuli and measure the response of the circuit at an output pin. Manufacturing defects, whether random or systematic, eventually manifest themselves as incorrect values on output pins.

Fault simulators and ATPG tools operate at the logical level for efficiency. A fault model is a logic level representation of the defect that is inserted at the defect location. The challenge of fault modeling is to strike a balance between accuracy and simplicity as explained below:

  • Accuracy. The output response of the logic-level netlist with the fault model inserted should closely approximate the output response of the defective circuit for all input stimuli.
  • Simplicity. The fault model should be tractable, i.e., it should not impose a severe burden on fault simulation and ATPG tools.

During the model development phase, the effectiveness of alternative models is evaluated by circuit simulation. Vectors generated on the fault model are simulated at the circuit level in the neighborhood of the defect site, using an accurate device-level model of the defect. However, due to the number of possible defect sites and the complexity of circuit simulation, this can only be done for a small sample.

Defect-Based Fault Simulation

Simulation of defect-based models is conceptually similar to stuck-at fault simulation, with a couple of twists:

  • The number of possible defect-based faults is orders of magnitude larger than the number of stuck-at faults, so tool performance degrades severely. To be effective, a defect-based fault simulator has to be at least an order of magnitude faster than a conventional stuck-at fault simulator.
  • Defect-based faults may involve interactions between nodes across hierarchical boundaries, making it impractical to use a hierarchical or mixed-level approach to fault simulation. It is necessary to simulate the entire design at once, which also imposes capacity and performance requirements.

Defect-Based Test of Cache Memories

Background: The Growth of Caches for Microprocessors

The use of caches for mainstream microprocessors on Intel® architectures, beginning in the early 90s with the i486™ processor, heralded a return to Intel's original technical core competency, silicon memories, albeit with several new twists. The embedded CPU caches have increased in size from the 4K byte cache of the i486 processor generation to 10s and 100s of kilobytes on today's processors and to even larger embedded CPU caches being considered for the future. This has resulted in a steady increase in the fraction of overall memory transistors per CPU and in the amount of CPU cache die area throughout the last decade.

A second key cache test challenge is the increasing number of embedded arrays within a CPU. The number of embedded memory arrays per CPU has gone from a handful on the i486 and i860™ processors to dozens on the more recent Pentium® Pro and Pentium® II processor lines.

Memory Testing Fundamentals: Beyond the Stuck-At Model

The commodity stand-alone memory industry, i.e., DRAMs and 4T SRAMs, has evolved fairly complex sets of tests to thoroughly test simple designs (compared to the complexity of a modern microprocessor) [6]. The targeted fault behaviors include stuck-at, transition, coupling, and disturb faults, and the resulting number of targeted tests per circuit, per transistor, or per fault primitive on a memory is much higher than for digital logic devices. On VLSI logic, the challenge is to achieve stuck-at fault coverage in the upper 90 percent range, while on stand-alone memories, a robust memory test program typically applies hundreds or, more likely, thousands of accesses per bit.

One reason for the greater complexity of memory tests is that at the core of a typical digital memory is a sensitive, small-signal circuit system made up of the bit line, the bit-bar line, and the sense amp. Even for stand-alone memories, access and testing of the analog characteristics (e.g., gain, common mode rejection ratio, etc.) is not directly possible and must be done indirectly through the digital interface of address and data control and observability. A large number of first-order variables subtly affect the observability of silicon memory defect behavior. Therefore, most memory vendors characterize each variant of a given product line empirically against a broad range of memory patterns before settling on the test suite that meets quality and cost considerations for high-volume manufacturing. These characterization test suites (also known as "kitchen sink" suites) consist of numerous algorithmic march patterns and different sets of cell stability tests (e.g., data retention, bump tests, etc.).

A key concept for robust memory testing is the logical-to-physical mapping. On a given physical design of an array, the physical adjacencies and ordering of bits, bit lines, word lines, decoder bits, etc., typically do not match the logical ordering of bits (such as an address sequence from bit 0 to bit 1 to … highest order bit). Memory tests are designed to be specifically structural: they force worst-case interactions between implemented silicon structures that are truly physically adjacent. Thus the physical-to-logical mapping is a transform that must subsequently be applied to a given memory pattern in order to maximize its ability to sensitize and observe defects and circuit marginality. Correct and validated documentation of the actual physical-to-logical mapping, delivered to the downstream test writer, is as important as other design collateral.
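To illustrate applying such a transform, here is a small sketch that derives the logical address sequence a test must issue in order to walk the rows of an array in true physical order. The Gray-code scramble in `logical_to_physical` is purely a stand-in assumption; a real array's mapping comes from its design documentation.

```python
def logical_to_physical(addr):
    # Hypothetical decoder scramble (Gray code), standing in for a real layout.
    return addr ^ (addr >> 1)


def physical_order(num_rows):
    """Logical addresses to issue so that physical rows 0, 1, 2, ... are visited in order.

    Assumes num_rows is a power of two, so the scramble is a one-to-one mapping.
    """
    inverse = {logical_to_physical(a): a for a in range(num_rows)}
    return [inverse[p] for p in range(num_rows)]


print(physical_order(8))  # [0, 1, 3, 2, 7, 6, 4, 5]
```

A march pattern that steps through this sequence, rather than through 0, 1, 2, ..., exercises physically adjacent rows back to back.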

Embedded Cache Testing and DFT in the Context of Logic Technologies

Testing of embedded caches also needs to consider the context of related logic technologies. To start with, the basic embedded cache memory cell is typically a six transistor (6T) SRAM as compared to the more typical DRAMs and four transistor (4T) SRAM of the stand-alone silicon memory industry. The 6T SRAM offers better robustness against soft errors and can be thoroughly tested to acceptable quality levels with somewhat simpler test suites. However, the critical motivating factor is that a 6T SRAM cell is feasible, within the context of a high-performance logic silicon fabrication process technology, without additional process steps.

The smaller size (area and number of bits) and 6T cell of the embedded CPU cache make it less sensitive than the stand-alone commodity 4T SRAMs and DRAMs. This is somewhat offset by the fact that embedded caches are generally pushing the SRAM design window on a given fabrication technology for system performance reasons. Therefore, adequate testing of embedded 6T SRAMs requires an optimal use of robust memory test techniques targeted at defect behaviors, such as complex march algorithms and cell stability tests.

A critical challenge for embedded SRAM caches is the architectural complexity of access and observability of such arrays compared to a stand-alone memory. For example, for an embedded array such as an instruction cache or a translation buffer, there may not be a normal functional datapath from the array output to the chip primary outputs, making writing of even the simplest memory algorithmic patterns such as the 10N March C- an extreme challenge for even the most experienced CPU design engineers and architects.
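For reference, the 10N March C- pattern mentioned above can be sketched in a few lines when the array is directly accessible, which is exactly what an embedded array often is not. The `Memory` class and its stuck-at injection are hypothetical test scaffolding, and the sketch walks logical address order only.

```python
class Memory:
    """One-bit-wide array; `stuck_at` optionally pins cells (hypothetical fault injection)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # {address: forced value}

    def write(self, addr, value):
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def march_c_minus(mem):
    """Run March C- and return a list of (address, expected, observed) mismatches."""
    size = len(mem.cells)
    up = range(size)
    down = range(size - 1, -1, -1)
    errors = []

    def check(addr, expected):
        observed = mem.read(addr)
        if observed != expected:
            errors.append((addr, expected, observed))

    for a in up:        # M0: either order, write 0
        mem.write(a, 0)
    for a in up:        # M1: ascending, read 0 then write 1
        check(a, 0)
        mem.write(a, 1)
    for a in up:        # M2: ascending, read 1 then write 0
        check(a, 1)
        mem.write(a, 0)
    for a in down:      # M3: descending, read 0 then write 1
        check(a, 0)
        mem.write(a, 1)
    for a in down:      # M4: descending, read 1 then write 0
        check(a, 1)
        mem.write(a, 0)
    for a in up:        # M5: either order, read 0
        check(a, 0)
    return errors


print(march_c_minus(Memory(8)))                   # []
print(march_c_minus(Memory(8, stuck_at={3: 0})))  # [(3, 1, 0), (3, 1, 0)]
```

The six march elements perform 1 + 2 + 2 + 2 + 2 + 1 = 10 operations per cell, hence the name 10N.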

In the end, the number and variety of caches and embedded arrays in today's microprocessors demand multiple DFT and test solutions optimized to the physical size and area of the various arrays, the performance and cost boundary conditions, and the architectural and micro-architectural details of an embedded array's surroundings. Circuit-level DFT, such as WWTM [7], can offer targeted structural coverage, in this case against cell stability issues and weak bits. External access via special test modes or self-test (BIST) circuits may provide the better solution under a different set of constraints. In any case, care must be taken to ensure the completeness and correctness of the solution and to ensure that some level of structural approach is used, i.e., appropriate stimulus-response mapped to the physical implementation of the memory structures. Different types of memory structures, e.g., small signal SRAMs, full Vcc rail swing CMOS register files, CAMs, or domino arrays, each require a targeted structural approach mapped to their strengths and weaknesses with respect to defect response.

Technology Development Strategy

The technology for defect-based test spans multiple disciplines in design, CAD tooling, and manufacturing. Although individual components have been tried both within Intel and in academia and industry, real data on high-volume, high-performance microprocessors is needed to establish the value of this approach.

The defect-based test program at Intel emphasizes early data collection on the effectiveness of fault models. Partnerships with design teams interested in pioneering these new capabilities as they are developed form a cornerstone of this effort. Technology development proceeds in phases as follows:

  • Fault model development. There are a large number of possible defect types that can be modeled. Defects are chosen for modeling based on frequency of occurrence, ease of modeling, escape rate, and perceived importance to the partner design team. Bridges and path delay faults will be the first set of fault models to be investigated.
  • Tool development. A minimal set of prototype tools is developed for the enumeration and simulation of the target fault models. These tools are targeted for limited deployment to a select group of experts in the project design team. The focus of tool development is on accuracy, not performance. Where possible, the tools are validated against existing "golden" capabilities.

    Tools for defect enumeration need to leverage physical design and performance verification tools. Close co-operation with tool builders and project design automation teams is required to build on existing tools, flow, and data in order to facilitate the defect-extraction process.

  • Enumerating fault sites. The actual task of enumerating fault sites is performed jointly by the technology development team and the design team. Working together, the teams identify test holes such as new architectural enhancements or modules for which legacy tests could not be effectively ported. Fault grading resources are allocated for defect-based test on those regions. When available, data from the FABs are used to assign probabilities to defect sizes.
  • Test generation. Defect-based tests are generated by first grading functional validation and traditional fault grade vectors, and then by targeting the undetected faults for manual test writing. Test writing is necessary at this time because a defect-based ATPG is not yet available, and the legacy designs on which the technology is being pioneered do not have adequate levels of structured DFT. To contain the cost of test writing, defect-based tests are written for carefully selected modules of the design.
  • Model validation. Model validation requires close partnering with the product engineering team. Some changes are required to the manufacturing flow to collect data on the unique DPM contribution of the defect-based tests.
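The test generation step in the list above (grade the existing vectors first, then hand the undetected faults to test writers) can be sketched as follows. The vector names, fault names, and the lookup table standing in for a fault simulator are all hypothetical.

```python
def fault_grade(vectors, faults, detects):
    """Return (coverage, undetected faults) after grading `vectors` against `faults`.

    `detects(vector, fault)` stands in for a defect-based fault simulator.
    """
    undetected = set(faults)
    for vec in vectors:
        undetected = {f for f in undetected if not detects(vec, f)}
    coverage = 1.0 - len(undetected) / len(faults)
    return coverage, sorted(undetected)


# Toy detection data in place of real simulation results (illustrative only).
DETECTION_TABLE = {
    "func_001": {"bridge_a_b"},
    "func_002": {"bridge_a_b", "path_p1"},
}


def table_detects(vector, fault):
    return fault in DETECTION_TABLE.get(vector, set())


faults = ["bridge_a_b", "bridge_c_d", "path_p1", "path_p2"]
coverage, todo = fault_grade(["func_001", "func_002"], faults, table_detects)
print(coverage, todo)  # 0.5 ['bridge_c_d', 'path_p2']
```

The `todo` list is what would be passed on for manual test writing on the selected modules.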

Data from the model validation phase is fed back into model development, as illustrated in Figure 4. Once a particular fault model is validated, we will enter into development (or co-development with a tool vendor) of an ATPG capability for that model.

Figure 4: Technology development process flow

Defect Modeling

Modeling of Random Defects

The challenge in fault modeling is to capture a general cause and effect relationship that can be easily simulated or targeted in the case of automatic test pattern generation. A degenerate case of this general approach is a line stuck-at fault model where output at a node is always a logical zero or always a logical one regardless of the logic value it is driven by. Another popular fault model that has been used to target random speed failures is the transition fault model, which is essentially a stuck-at fault with the addition of the condition that the faulty node make a transition, i.e., be at the opposite logic value in the cycle prior to detection.
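As a minimal sketch of these two classical models, consider a hypothetical three-input circuit y = (a AND b) OR c with the fault placed on the internal node n = a AND b. The circuit and function names are assumptions chosen for illustration.

```python
def simulate(a, b, c, stuck=None):
    """Evaluate y = (a AND b) OR c; `stuck` optionally forces internal node n."""
    n = a & b
    if stuck is not None:
        n = stuck
    return n | c


def detects_stuck_at(vector, value):
    """A vector detects n stuck-at-`value` if the good and faulty outputs differ."""
    return simulate(*vector) != simulate(*vector, stuck=value)


def detects_transition(v1, v2, slow_to):
    """Two-vector test for a transition fault on n.

    slow_to=1 means slow-to-rise: n must be 0 under v1 (the cycle before detection),
    and v2 must detect n stuck-at-0. slow_to=0 is the slow-to-fall counterpart.
    """
    initial = 1 - slow_to
    n_before = v1[0] & v1[1]
    return n_before == initial and detects_stuck_at(v2, initial)


print(detects_stuck_at((1, 1, 0), 0))               # True
print(detects_stuck_at((1, 1, 1), 0))               # False: c = 1 masks node n
print(detects_transition((0, 1, 0), (1, 1, 0), 1))  # True
print(detects_transition((1, 1, 0), (1, 1, 0), 1))  # False: n makes no transition
```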

In creating a realistic fault model for a defect, we must avoid explicitly tabulating the behavior of the defect for every state of the circuit. A table-driven approach will not lend itself to a scalable automated solution for design sizes that exceed 5 million primitives. The approach we use here is to transcribe the deviation in analog behavior into simple conditional logical deviations.

There are a large number of possible failure mechanisms that cause random defects. Rather than develop models for them all and then launch into model validation, our approach is to stage the development of models and tools to address defect types in the order of their importance, and to intercept designs with a complete prototype flow for each model as it becomes available. This allows us to collect data on the DPM impact of defect-based test early on, and it provides feedback that we can use to refine our models.

One of the most common defect types today is interconnect bridges. As metal densities increase, the importance of metal bridges as a defect-inducing mechanism will grow. Interconnect bridging defects exhibit a range of behavior based on different values of bridge resistance.

This effect is illustrated for the circuit in Figure 5. There is a bridge defect between nodes j and k in this example. Node k is held at logic 0 as j changes from 0 to 1. The signal transition is propagated and observed at output v.

Figure 5: Example circuit with a bridge defect

Figure 6 shows the output response of the circuit for different values of the bridge resistance. Threshold voltages are marked using horizontal dashed lines, and the vertical dashed line shows the required arrival time at node v for the transition to be captured in a downstream latch.

The plot shows three distinct circuit behaviors for varying bridge resistance. For low resistance values, the output never reaches the correct logic value, and the defect shows up as a static logic failure. For intermediate resistances, the output goes to logic 0 too late, resulting in a speed failure. Very high bridge resistances are benign from the viewpoint of correct logical operation of the circuit.
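The three behaviors can be told apart from a simulated output waveform by comparing the time at which the output crosses the logic threshold with the required arrival time. The function below is a minimal sketch of that classification; its name, its arguments, and the sample times are assumptions rather than values taken from Figure 6.

```python
def classify_response(crossing_time_ns, required_time_ns):
    """Classify a bridge defect from the output's threshold-crossing time.

    crossing_time_ns is None when the output never reaches the correct logic value.
    """
    if crossing_time_ns is None:
        return "static failure"   # low bridge resistance
    if crossing_time_ns > required_time_ns:
        return "speed failure"    # intermediate bridge resistance
    return "benign"               # high bridge resistance


print(classify_response(None, 2.0))  # static failure
print(classify_response(2.4, 2.0))   # speed failure
print(classify_response(1.7, 2.0))   # benign
```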

Figure 6: Output responses for a range of bridge resistance

For low resistances, the defect can be modeled as node j stuck at logic 0 with the condition that k is at logic 0. Speed failures can be modeled as a slow-to-rise transition at node j, with the condition that k is held at logic 0. Such fault models, based on generalizations of the conventional stuck-at and transition faults, are called constrained fault models.
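A constrained stuck-at fault can be sketched as an ordinary stuck-at fault plus an activation condition on the neighboring node. The circuit below (j = a AND b, k = c, v = j XOR k) is a hypothetical stand-in, not the circuit of Figure 5, and the `ConstrainedFault` record is an illustrative assumption.

```python
from collections import namedtuple

# Node j behaves as `forced_value` only while node k equals `condition_on_k`.
# condition_on_k=None gives an ordinary, unconstrained stuck-at fault on j.
ConstrainedFault = namedtuple("ConstrainedFault", ["forced_value", "condition_on_k"])


def evaluate(a, b, c, fault=None):
    """Hypothetical circuit: j = a AND b, k = c, v = j XOR k."""
    j = a & b
    k = c
    if fault is not None and fault.condition_on_k in (None, k):
        j = fault.forced_value  # the fault is active for this input vector
    return j ^ k


def detects(vector, fault):
    """A vector detects the fault if the good and faulty outputs differ."""
    return evaluate(*vector) != evaluate(*vector, fault=fault)


plain_sa0 = ConstrainedFault(forced_value=0, condition_on_k=None)
bridge_sa0 = ConstrainedFault(forced_value=0, condition_on_k=0)

print(detects((1, 1, 1), plain_sa0))   # True
print(detects((1, 1, 1), bridge_sa0))  # False: k = 1, so the fault is not activated
print(detects((1, 1, 0), bridge_sa0))  # True
```

The second result is the practical point: a vector that detects the plain stuck-at fault on j does not necessarily detect the constrained one, so existing stuck-at vectors have to be regraded against the constrained model.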

As feature sizes are scaled down, the metal pitch is reduced in tandem to increase density. Reduced metal pitch in turn limits the height of metal interconnects, which must also decrease to improve manufacturability. Thus the line resistance per unit length goes up almost quadratically. Sustained yield requirements dictate that defect densities remain the same, which in turn implies that interconnect bridge defects also scale in dimension. Higher bridge resistance coupled with lower device resistance during the ON state results in more speed failures than hard failures, as illustrated in Figure 6 above.
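The quadratic trend follows from the usual expression for the resistance of a uniform conductor. As a worked sketch, assume a rectangular cross-section of width $w$ and height $h$, a resistivity $\rho$ that does not change with scaling, and a single scale factor $s$ applied to both dimensions (all simplifying assumptions):

```latex
\frac{R}{L} = \frac{\rho}{w\,h},
\qquad
w' = s\,w,\quad h' = s\,h \quad (0 < s < 1)
\;\Longrightarrow\;
\frac{R'}{L} = \frac{\rho}{(s\,w)(s\,h)} = \frac{1}{s^{2}}\,\frac{R}{L}.
```

For example, $s = 0.7$ gives $1/s^{2} \approx 2.04$, roughly a doubling of resistance per unit length. If the height is scaled less aggressively than the width, the increase is somewhat smaller, which is one plausible reading of "almost quadratically."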

Modeling of Systematic Defects

Not all defects are of a random nature. Known factors such as reticle position, die location on a wafer, mask imperfections, polysilicon density, device orientation, etc., cause systematic variation across wafers and dice. These effects are expected to gain prominence due to reduced noise tolerance as well as a general increase in systematic variability because of such factors as migration towards 300 mm wafers, lithographic equipment, and material re-use.

Steeper production ramps are putting increasing pressure on cutting down the time for design correction and test creation based on silicon data. Thus modeling such effects is critical to the success of test.

Process variations lead to delay problems. Therefore, how to use information about process variations when selecting targets for speed test is an issue that needs to be addressed.



