Defect-Based Test: A Key Enabler for Successful Migration to Structural Test



Defect-Based Test Tooling Challenges

Defect Enumeration

The goal of defect enumeration is to prune the list of all possible defects to a manageable number of the most likely faults. Because the likelihood of a fault has a strong dependence on layout geometry, process parameters and timing marginality, defect enumeration is a multi-disciplinary problem.

Here we describe layout-driven and timing-driven approaches to fault enumeration, and we discuss the inherent challenges.

Physical Design Inductive Fault Analysis

Inductive Fault Analysis (IFA) is based on the premise that the probability of a defect occurring at a particular site is a function of the local layout geometry and the distribution of failure mechanisms observed for the manufacturing process. The most commonly observed defects can be classified into two broad categories of physical faults:

  • Bridges occur when the defect causes a conducting path between two nodes that are electrically isolated by design. The resistance of the bridge can vary by process, layer, and defect mechanism.
  • Breaks happen when the defect introduces undesired impedance along a conducting path. In an extreme case, a break can result in an open circuit.

These physical fault models are then mapped onto logical fault models that can be used for fault simulation at the logical, or gate, level of abstraction. If the likelihood of the defect mechanisms causing bridges and breaks is known for the process, the physical fault sites extracted by IFA can be weighted by probability. These probabilities can be used for pruning the fault list, and for expressing the fault coverage obtained by fault simulation in terms of the overall probability of catching a defective part. This weighted fault coverage number can be a better predictor of outgoing DPM than stuck-at fault coverage.
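The difference between plain and weighted fault coverage can be illustrated with a minimal sketch. The fault names, the relative weights, and the detection results below are invented for illustration; they do not come from any real IFA extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    name: str       # hypothetical fault-site label
    weight: int     # relative likelihood from IFA, arbitrary units (assumed)
    detected: bool  # outcome of fault simulation for the test set


def unweighted_coverage(faults):
    """Fraction of fault sites detected, each site counted equally."""
    return sum(f.detected for f in faults) / len(faults)


def weighted_coverage(faults):
    """Fraction of the total defect likelihood that the test set detects."""
    total = sum(f.weight for f in faults)
    return sum(f.weight for f in faults if f.detected) / total


faults = [
    Fault("bridge_n1_n2", 50, False),   # most likely site, missed by the tests
    Fault("bridge_n3_vdd", 30, True),
    Fault("break_n4", 15, True),
    Fault("break_n5", 5, True),
]

print(unweighted_coverage(faults), weighted_coverage(faults))
```

In this made-up case three of four sites are detected, yet the undetected site carries half of the total weight, so the weighted number is considerably lower than the unweighted one.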

Traditionally, IFA has focused on layout geometry and defect distribution, and it has ignored the testability of a fault. This last parameter is an important one: if the faults identified using IFA are highly testable, i.e., easily covered by tests for stuck-at faults, then an IFA-based approach will not yield a significant incremental DPM improvement over a standard stuck-at fault model. Examples of highly likely and highly testable faults are bridges to power rails and clock lines. Therefore, the challenge for effective IFA tools is to identify faults that are both highly likely and relatively difficult to detect using stuck-at fault vectors.
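One simple way to combine likelihood with testability is to drop the sites that the stuck-at vectors already cover and rank the remainder by likelihood. The sketch below assumes that both inputs are available from earlier steps; the function name `select_ifa_targets` and all site names and probabilities are hypothetical.

```python
def select_ifa_targets(site_likelihood, covered_by_stuck_at, top_n):
    """Return the top_n most likely sites not already covered by stuck-at tests.

    site_likelihood: dict mapping site name -> likelihood (assumed known)
    covered_by_stuck_at: set of site names the stuck-at vectors already detect
    """
    hard = [
        (name, p)
        for name, p in site_likelihood.items()
        if name not in covered_by_stuck_at
    ]
    # Highest likelihood first.
    hard.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hard[:top_n]]


sites = {
    "bridge_a_vdd": 0.4,   # likely, but easy: bridge to a power rail
    "bridge_a_b": 0.3,
    "break_c": 0.2,
    "bridge_d_clk": 0.1,   # bridge to a clock line, also easy
}
already_covered = {"bridge_a_vdd", "bridge_d_clk"}

targets = select_ifa_targets(sites, already_covered, top_n=2)
print(targets)
```

A production tool would more likely use a graded testability measure than a covered/not-covered set; the binary filter is used here only to keep the example short.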

Because they work at such a low level of abstraction, IFA tools need to be scalable in order to be effective on increasingly large designs. Two divide-and-conquer approaches can be applied to the problem:

  • Hierarchical analysis. Layout blocks are analyzed at a detailed level for bridges and breaks on cell-level nodes, and the design is analyzed at a global level for inter-block connectivity. The obvious drawback of this method is that interactions between wires across blocks, and between block-level and chip-level layout, are ignored. This problem is accentuated by the increasing trend toward over-the-cell global routing.
  • Layout carving, or "cookie-cutting." In this approach, the layout is flattened and carved into manageable pieces called "cookies." Each cookie includes the layout to be analyzed, as well as sufficient surrounding context. A second phase is required to roll up the results collected at the cookie level, and to tie up the inter-cookie interactions.
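The carving step can be sketched as follows, under the assumption that the flattened layout is described only by its bounding box and that a fixed halo of surrounding context is sufficient. The function names `carve` and `roll_up`, the tile size, and the halo width are all hypothetical.

```python
def carve(width, height, tile, halo):
    """Split a width x height layout into cookies.

    Each cookie is a pair (core, window) of (x0, y0, x1, y1) rectangles:
    core is the region the cookie is responsible for, window adds `halo`
    units of surrounding context, clipped to the layout boundary.
    """
    cookies = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            x1, y1 = min(x0 + tile, width), min(y0 + tile, height)
            core = (x0, y0, x1, y1)
            window = (
                max(x0 - halo, 0),
                max(y0 - halo, 0),
                min(x1 + halo, width),
                min(y1 + halo, height),
            )
            cookies.append((core, window))
    return cookies


def roll_up(per_cookie_bridges):
    """Merge per-cookie bridge candidates, dropping duplicates.

    A bridge between nets a and b may be seen from several overlapping
    windows, so each pair is stored as an unordered frozenset.
    """
    merged = set()
    for bridges in per_cookie_bridges:
        merged.update(frozenset(pair) for pair in bridges)
    return merged


cookies = carve(width=100, height=100, tile=50, halo=10)
merged = roll_up([[("n1", "n2")], [("n2", "n1"), ("n3", "n4")]])
print(len(cookies), len(merged))
```

The roll-up shown here only removes duplicate reports. Combining likelihoods for a wire pair that runs through several cookies would need additional bookkeeping that this sketch leaves out.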

Timing-Driven Analysis

As mentioned in a previous section, the performance verification (PV) tools for large microprocessor designs are not entirely foolproof. To begin with, the PV database is made up of data from different sources, some of which are SPICE-like simulations (very accurate) and some of which are simple estimators. The net result could be incorrectly ordered critical paths (speed-limiting circuit paths). Some of these issues are generally uncovered during silicon debug and characterization.

However, more serious issues arise as we look into the future. First, the increased on-die variation in deep sub-micron technologies means that different paths on the chip can be impacted differently. Further, the trend towards higher frequencies implies fewer gates between sequential elements, which may lead to a larger proportion of the chip's paths having small margins. These two factors combined pose one of the biggest test challenges, namely, speed test.

It is no longer sufficient to have only a few of the most critical paths in the circuit characterized during silicon debug. What is required is an automatic way to enumerate all such paths and then grade the structural tests for "path delay fault" coverage. Two main issues need to be solved. First, PV tool limitations related to generating an ordered list of critical paths need to be worked around. Second, modeling issues related to the mapping of paths from the transistor level to the gate level need to be resolved. (Fault simulation happens at the gate level.)

It is likely that this huge path list can be pruned to a more manageable size. Paths could be selected based on how speed-critical they are to the design and on the diversity of their composition, that is, on how their delay is distributed among constituent factors such as the individual interconnect layers and the actual devices.
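A pruning scheme along these lines might keep only paths below a slack limit and then retain a few paths per dominant delay contributor, so that the selection is diverse as well as critical. This is only a sketch of one possible policy; the path records, field names, and numbers are invented.

```python
def select_paths(paths, slack_limit, per_bucket):
    """Pick critical paths, keeping at most per_bucket paths for each
    dominant delay contributor (for example devices or a metal layer)."""
    critical = [p for p in paths if p["slack"] <= slack_limit]
    buckets = {}
    # Visit the most critical (smallest slack) paths first.
    for p in sorted(critical, key=lambda p: p["slack"]):
        delay = p["delay_by_source"]
        dominant = max(delay, key=delay.get)
        chosen = buckets.setdefault(dominant, [])
        if len(chosen) < per_bucket:
            chosen.append(p["name"])
    return buckets


paths = [
    {"name": "p1", "slack": 5, "delay_by_source": {"device": 80, "metal1": 20}},
    {"name": "p2", "slack": 8, "delay_by_source": {"device": 70, "metal1": 30}},
    {"name": "p3", "slack": 10, "delay_by_source": {"device": 30, "metal4": 70}},
    {"name": "p4", "slack": 50, "delay_by_source": {"device": 90, "metal1": 10}},
]

selected = select_paths(paths, slack_limit=20, per_bucket=1)
print(selected)
```

Here `p4` is dropped for having ample slack and `p2` is dropped because a more critical device-dominated path has already been chosen, while the interconnect-dominated `p3` is kept even though it is less critical than `p2`.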

Comprehensive Defect Enumeration

While layout analysis may identify potential bridge defect sites, a resistive bridge may not always manifest itself as a logic error. For example, if the defect site has adequate slack designed into it, an increase in delay of up to the slack amount will not ordinarily be detectable. Slack may change with a change in cycle time or in power supply voltage, thus altering the test realities.

It is therefore required that the defect enumeration scheme be coupled with timing analysis tools, which in turn should be designed to understand the effect of the test environment (temperature, voltage, cycle time) on slack.
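The dependence of detectability on the test environment can be shown with a small calculation. The sketch assumes that slack is simply cycle time minus path delay, and it folds the effect of voltage and temperature into a single multiplicative `derate` factor; both are simplifications introduced here, and all numbers are invented.

```python
def slack(cycle_time_ps, path_delay_ps, derate=1.0):
    """Slack of a path; derate scales the nominal path delay to model
    the test voltage and temperature (assumed lumped factor)."""
    return cycle_time_ps - path_delay_ps * derate


def is_detectable(extra_delay_ps, cycle_time_ps, path_delay_ps, derate=1.0):
    """A defect-induced delay is caught only if it exceeds the slack."""
    return extra_delay_ps > slack(cycle_time_ps, path_delay_ps, derate)


PATH_DELAY = 800     # nominal path delay in ps (hypothetical)
BRIDGE_DELAY = 150   # extra delay from a resistive bridge in ps (hypothetical)

relaxed = is_detectable(BRIDGE_DELAY, 1000, PATH_DELAY)          # slack 200 ps
tight = is_detectable(BRIDGE_DELAY, 900, PATH_DELAY)             # slack 100 ps
low_voltage = is_detectable(BRIDGE_DELAY, 1000, PATH_DELAY, 1.1)  # slower gates

print(relaxed, tight, low_voltage)
```

The same defect escapes at the relaxed cycle time but is caught when the cycle time is shortened or when the path is slowed down, which is why the enumeration scheme needs to know the conditions under which the test will be applied.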

Defect-Based Simulation and ATPG

Traditional test automation tools need to be rethought in the context of defect-based test. The fundamental reason for the effectiveness of the stuck-at fault model is that a test for a stuck-at fault opens up an observation path starting from the fault site. Unfortunately, the conditions needed to cause the erroneous circuit behavior may not be created at the time the observation path is set up.

Data reported in the literature show that the effectiveness of a test set could be improved by including vectors that detect the same stuck-at fault multiple times, in different ways. This approach, called N-detection, is a random way to set up the conditions needed to activate different failure modes. Defect-based fault models take this notion a step further by specifying the actual excitation conditions, called constraints.
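Grading a test set for N-detection amounts to counting, for every stuck-at fault, how many vectors detect it. The sketch below assumes that a fault simulator has already produced a per-vector detection list; the vector and fault names are hypothetical.

```python
from collections import Counter


def detection_counts(detects_by_vector):
    """Count how many vectors detect each fault."""
    counts = Counter()
    for detected in detects_by_vector.values():
        counts.update(detected)
    return counts


def below_n(all_faults, counts, n):
    """Faults detected fewer than n times (a missing fault counts as 0)."""
    return sorted(f for f in all_faults if counts[f] < n)


detects_by_vector = {
    "v1": {"a/0", "b/1"},
    "v2": {"a/0"},
    "v3": {"a/0", "c/0"},
}
counts = detection_counts(detects_by_vector)
needs_more = below_n(["a/0", "b/1", "c/0", "d/1"], counts, n=2)
print(needs_more)
```

Counting detections does not check that the detections happen "in different ways"; that would require comparing the circuit states under which each vector detects the fault.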

  • Excitation conditions. These are a relatively straightforward extension to commonly used fault models. Constrained stuck-at and constrained transition faults behave like their traditional counterparts except that the fault effect becomes manifest only when an externally specified condition is met.

    Existing fault simulation and test-generation tools can be used to simulate these models by augmenting the target netlist with logic that detects the excitation condition and injects the fault when it occurs. However, this can be expensive in terms of netlist size for big designs. Also, depending on the location of the fault and of the nodes involved in the constraints, the augmenting circuitry can cause design-rule violations, such as phase-coloring violations.

  • Propagation conditions. Certain types of physical faults (such as highly resistive bridges and opens) can manifest themselves as localized delay defects. However, the size of the delay is not always large enough to allow it to be treated as a transition, or gross delay, fault. In such cases, the effectiveness of the test can be increased by propagating the fault effect along the paths with the lowest slack. This method implies a tie-in to the timing analysis sub-system.
  • Path delay fault simulation. Several path delay fault models have been proposed in the literature with a view to identifying tests that are robust (less susceptible to off-path circuit delays), and to simplifying the model to ease fault simulation and test generation. Any of these fault models can be used, but there are two new considerations:

    Paths in high-performance designs are not always limited to a single combinational logic block between two sequential elements. A path can span multiple clock phases, crossing sequential elements when they are transparent. A practical path delay fault model should therefore be applicable to multi-cycle paths. Note that such paths may feed back onto themselves (either to the source of the path or to an off-path input).

    The second consideration is that fault simulation and ATPG are typically performed at the gate level, whereas paths are described at the switch level. When a switch-level path is mapped to the gate level, the path may become incompletely specified: there may be multiple ways to test the same gate-level path, not all of which exercise the same switch-level path. This problem can be addressed by specifying gate-level conditions that will exercise the switch-level path, in a manner analogous to specifying excitation conditions for random defects.

  • Circuit design styles. High-performance designs have core engines running at very high speeds and external interfaces running at lower speeds. In addition, there may be internal subsystems that run at a different clock frequency. Test generation and fault simulation tools have to be designed to accommodate multiple clock domains running at different frequencies. The clocks are typically generated internally and synchronized. DFT design rules, particularly those that check the clocking methodology, need to be enhanced to handle such designs.

    Another important design consideration is power delivery and consumption. In order to reduce a chip's power needs, clocks are often gated to dynamically turn off units that are not being used at a particular time. In the past, many tool designers assumed that clock-gating logic could be controlled directly by external pins, or they treated clock-gating logic as untestable. These assumptions are no longer valid.

  • Capacity and performance. Next-generation CPUs are expected to require 5 to 10 million primitives to model at the gate level. The designs contain on the order of a hundred embedded memory arrays. These arrays have multiple read/write ports, with some ports accessing only parts of the address or data spaces of the array. In the past, most ATPG tools have provided support for simple RAM/ROM primitives that can be combined to model more complex arrays. However, from the point of view of database size and test generation complexity, it is essential to directly support more general behavioral models.

    Defect-based fault models impose additional performance requirements on the tools because of the exploding number of faults that need to be targeted. In order to deal with larger designs, shrinking time-to-quality goals, and the larger number of faults, the performance of test automation tools needs to increase by an order of magnitude.
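The constrained stuck-at model described under excitation conditions above can be made concrete with a toy simulation. The three-input circuit, the net names `n` and `m`, and the constraint (the fault on `n` is active only while the neighboring net `m` is 0) are all invented for illustration; a real tool would operate on the design netlist rather than on hand-written Python.

```python
from itertools import product


def simulate(vector, constraint=None):
    """Evaluate a toy circuit with outputs (n, m), where n = a AND b and
    m = c. If a constraint is given and holds, n is forced to 0."""
    a, b, c = vector
    n = a & b
    m = c
    if constraint is not None and constraint(n, m):
        n = 0  # constrained stuck-at-0 on net n
    return (n, m)


def detects(vector, constraint):
    """A vector detects the fault if the faulty response differs."""
    return simulate(vector) != simulate(vector, constraint)


def plain_sa0(n, m):
    return True        # classic stuck-at-0: always active


def bridge_like(n, m):
    return m == 0      # active only while the neighboring net m is 0


vectors = list(product((0, 1), repeat=3))
plain_tests = [v for v in vectors if detects(v, plain_sa0)]
constrained_tests = [v for v in vectors if detects(v, bridge_like)]
print(plain_tests, constrained_tests)
```

Both vectors with a = b = 1 detect the plain stuck-at-0 fault, but only the one that also sets c = 0 detects the constrained fault. A test generator targeting the plain model is free to pick either vector, so it may miss the defect.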

Failure Diagnosis

Automated failure diagnosis is valuable at different stages of a product's life: silicon debug and qualification, manufacturing test, and analysis of customer returns. Next-generation failure analysis tools have two major requirements:

  • They must support defect-based models. Diagnostic tools need to leverage the defect resolution provided by the new fault models. This will enhance diagnostic resolution by narrowing the probable cause of a failing device down to a single defect-based fault in cases where the stuck-at fault model previously yielded only partial matches. Diagnostic resolution can be further enhanced by using defect probability to prioritize candidate failures.
  • They must support limited sequentiality for high-performance designs that cannot afford scan DFT in pipelined stages.
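Candidate prioritization of this kind can be sketched as a two-key ranking: first by how well a candidate's predicted failures match the observed ones, then by defect probability. The matching score used below (intersection over union of failing outputs) is an assumption chosen for brevity, and the candidate names, predicted failures, and probabilities are invented.

```python
def rank_candidates(observed, candidates):
    """Rank fault candidates for a failing device.

    observed: set of failing outputs seen on the tester
    candidates: dict name -> (predicted failing outputs, defect probability)
    Returns (name, match, probability) tuples, best candidate first.
    """
    ranked = []
    for name, (predicted, probability) in candidates.items():
        union = observed | predicted
        match = len(observed & predicted) / len(union) if union else 0.0
        ranked.append((name, match, probability))
    # Best match first; ties are broken by higher defect probability.
    ranked.sort(key=lambda r: (-r[1], -r[2]))
    return ranked


observed = {"o1", "o2"}
candidates = {
    "n7/0": ({"o1", "o2", "o3"}, 0.1),        # stuck-at: only a partial match
    "bridge_n7_n9": ({"o1", "o2"}, 0.3),      # exact match
    "bridge_n7_n4": ({"o1", "o2"}, 0.6),      # exact match, more likely defect
}

ranking = rank_candidates(observed, candidates)
print([name for name, _, _ in ranking])
```

The stuck-at candidate explains the observed failures only partially, while both bridge candidates match exactly; the defect probability then separates the two.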


