|
Interconnect and Noise Immunity Design for the Pentium® 4 Processor (continued)
FULL-CHIP WIRE NOISE VERIFICATION
The key idea behind the Pentium® 4 processor full-chip noise verification is "strobed signaling." A non-restoring node for noise is defined as a node that, if falsely tripped by noise, will not recover with the passage of time (e.g., a domino node or an off pass-gate latch). A signal is called "strobed" if its logic cone leading to a non-restoring noise node is controlled by a clock (e.g., D1k domino). In this case, the effect of noise on this node may depend on the clock frequency.
As shown with the D1-k example in Figure 13, at a lower frequency the noise settles down before the signal is sampled, so the noise event does not cause a failure at that lower frequency. In most cases, the aggressors switch earlier in the noise case than max-delay timing analysis predicts, because the Miller Coupling Factor (MCF) is reduced in the noise case. Further, the worst noise case usually occurs on fast silicon at high voltage, conditions that are good for speed. As such, in most cases, we can ignore the cases leading to a slight frequency slowdown in our analysis.
The tricky situations are those that lead to excessive frequency slowdown or, even worse, frequency shmoo holes. Before spending valuable CAD tool resources on these non-trivial cases, we needed to convince ourselves that the common benign case is indeed the dominant one and therefore the one on which to base our full-chip wiring methodology. Most full-chip signals are busses (~59,000 out of 72,000 nets), and less than 10% of full-chip signals are "sensitive" (feeding domino receivers, direct pass gates, etc.). Most busses have similar timing among their bits, which should ease the frequency slowdown and shmoo problem.
Figure 14 shows the significant effect of this analysis. Most of the effect of this filtering was due to the "required filtering" that characterized frequency slowdown, and very little was due to "valid filtering," which looks for aggressors that do not switch together.
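The frequency dependence of a strobed signal can be illustrated with a minimal sketch. It assumes, purely for illustration, that the noise pulse occupies a fixed time window after the launching clock edge, while the sampling instant scales with the clock period. The function names, the `sample_fraction` parameter, and all numbers are hypothetical and are not taken from the Pentium 4 processor flow.

```python
def noise_corrupts_sample(period_ps, noise_start_ps, noise_end_ps, sample_fraction=0.5):
    """Return True if the noise pulse is still active when the receiver samples.

    Assumption: the noise window is fixed in absolute time after the clock
    edge, and the receiver samples at sample_fraction * period_ps.
    """
    sample_time_ps = sample_fraction * period_ps
    return noise_start_ps <= sample_time_ps <= noise_end_ps


def failing_period_band(noise_start_ps, noise_end_ps, sample_fraction=0.5):
    """Return the (min, max) clock periods for which sampling lands in the noise window."""
    return (noise_start_ps / sample_fraction, noise_end_ps / sample_fraction)


# Hypothetical noise pulse active from 300 ps to 400 ps after the clock edge.
for period_ps in (500, 700, 1000):
    status = "FAIL" if noise_corrupts_sample(period_ps, 300, 400) else "pass"
    print(f"period {period_ps} ps: {status}")

print(failing_period_band(300, 400))
```

Under these assumed numbers, the node fails only for clock periods between 600 ps and 800 ps. It passes at the slower 1000 ps period because the noise has settled before sampling, and it also passes at the faster 500 ps period because sampling occurs before the noise arrives. A failing band bracketed by passing frequencies on both sides is how a shmoo hole would appear in such a model.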
Frequency Independent Filtering
The novel features of timing filtering for the Pentium 4 processor include three modes of frequency analysis (low frequency for burn-in analysis, high frequency for at-frequency noise and delay tests, and an all-frequency sweep for noise effects); timing skew between victim and aggressors; required-time filtering with victim recovery; and an interactive graphical waveform interface for timing filter debug.
The design of the Pentium 4 processor brought new challenges to timing filtering because of the complexity of its clocking system. In earlier clocking styles, an excessive slowdown or shmoo hole was usually caused by a very late signal coupling into a signal with an early required time, or by the interaction between signals from opposite phases. The Pentium 4 processor design, however, incorporates several clocks that are multiples of each other: signals are F(ast), M(edium), and S(low) clocked signals. Signals therefore occur not only in different phases but also with different periods. In addition, these differently clocked signals interact, as they are not a priori restricted to different regions of the chip. Thus, mid-frequency shmoo holes are much more probable in such a design.
The new approach handles a clocking system with an arbitrary number of phases and an arbitrary number of synchronous clock frequencies by using a Multi-Frequency Algorithm. At very low frequencies, signals activated by different phases are so widely separated in time that they do not interact. This represents the low end of the frequencies to be considered, while the target operating frequency represents the high end. Sweeping frequencies at an increment small enough to catch all waveform overlaps is prohibitive due to the complexity of the internal scan. We therefore needed a more adaptive algorithm. Here is the entire algorithm with an all-frequency sweep as its outer loop:
For each victim net:
The most difficult part of the algorithm is to compute the frequencies of interaction, as illustrated in Figure 16. Given that an O(N log N) scan is in the internal loop, the algorithm cannot afford to sweep with a grain fine enough to catch all interactions. The key to computing the next frequency of interaction is to comprehend the relative velocity of timing edge references as one slows the primary clock. By carefully searching for the edges closest to one another and keeping track of their relative velocities, this algorithm can be made reasonably efficient. One difficulty is handling edges that refer to a previous clock phase, which actually move backward with respect to other timing edges as frequency is increased. To handle this and other difficulties, we developed a general approach, based on the concept of relative edge velocity, that handles both the modular nature of signal timings and the measurement of the frequency at which they may intersect.
Full-Chip Noise Convergence
To circumvent these problems, simple "perturbation"-based models were built using mathematical spreadsheet software. Parallel probes gather all relevant information about a net (timing, parasitics, length, circuit, etc.), for a total of 87 relevant metrics for each net! Approximately 40 full-chip models were built in one week for various "what if" (perturbation) scenarios. These models looked at tweaking various knobs: number of aggressors, switching probabilities of small aggressors, synchronization of noise propagation with coupling, probability of multiple noise events on the same gate, various clock skew assumptions for timing filtering, various frequencies for allowed frequency slowdown, etc. The goal was to find reasonable settings that expose the really serious problems without producing too many false violations. A detailed NoisePad model was used as the starting point for these models.
After this analysis, the new noise was assumed to be a slight perturbation around its NoisePad value and was predicted from the change in the knob (e.g., changing lumped %xcap from 100% to 50%). Although these fast models were very crude, they were surprisingly accurate because they did not try to predict the real noise but rather the perturbation, which carries a much smaller error. Based on these fast models, another detailed NoisePad model was built with the correct knob settings and used for final convergence. As can be clearly seen from Figure 18, this exercise helped us greatly with convergence and saved us an estimated one to two months in our noise convergence schedule. The dramatic decrease in noise violations seen in Figure 17 involved no work from the design team!
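The perturbation idea can be sketched as a first-order (linear) update around a detailed baseline value. This is only an illustrative assumption about how such a fast model might be structured, not the spreadsheet models actually used. The function names, the per-net sensitivities, the noise limit, and all numeric values are hypothetical.

```python
def perturbed_noise_mv(base_noise_mv, sensitivity_mv_per_pct, knob_old_pct, knob_new_pct):
    """Predict the new noise as the detailed baseline plus a linear correction.

    Assumption: noise responds linearly to a small change in the knob
    (here a lumped cross-capacitance percentage).
    """
    return base_noise_mv + sensitivity_mv_per_pct * (knob_new_pct - knob_old_pct)


def count_violations(nets, knob_old_pct, knob_new_pct, limit_mv):
    """Count nets whose predicted noise exceeds the (hypothetical) noise limit."""
    return sum(
        1
        for base_mv, sens in nets.values()
        if perturbed_noise_mv(base_mv, sens, knob_old_pct, knob_new_pct) > limit_mv
    )


# Hypothetical nets: name -> (baseline noise in mV, sensitivity in mV per % xcap).
nets = {
    "net_a": (420.0, 2.0),
    "net_b": (500.0, 1.0),
    "net_c": (360.0, 0.5),
}

print(count_violations(nets, 100, 100, limit_mv=350.0))  # baseline knob setting
print(count_violations(nets, 100, 50, limit_mv=350.0))   # lumped %xcap 100% -> 50%
```

With these made-up numbers, all three nets violate the limit at the baseline setting, and only one still does after the knob change. Because only the correction term is approximated, any error in the assumed sensitivity affects the difference from the baseline rather than the full noise value.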
|