OS Machine Check Recovery on Itanium®-Based Systems
The Intel® Itanium® processor family supports an advanced machine check architecture, which allows processors to cooperate with the chipset, firmware, and operating system to contain, signal, correct, and log machine check errors. Most errors are corrected by processor or chipset hardware, but multilevel error handling allows Processor Abstraction Layer (PAL) firmware and System Abstraction Layer (SAL) firmware to provide additional error correction capabilities. To further enhance system availability and reliability, errors that cannot be corrected by hardware or firmware are handed off to the operating system for recovery.
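The escalation order described above (hardware first, then PAL, then SAL, then the operating system) can be modelled roughly as a chain of handlers. The following is only a minimal sketch under stated assumptions: the `mc_error` type, its numeric `severity` field, the handler functions, and `mc_dispatch` are hypothetical names invented for illustration and are not part of the PAL, SAL, or OS interfaces defined in the Itanium documentation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handling levels, listed in the order the text describes. */
typedef enum { MC_HARDWARE, MC_PAL, MC_SAL, MC_OS, MC_UNRECOVERED } mc_level;

/* Hypothetical error record: a higher severity needs a higher level.
   Real error records carry far more detail than a single number. */
typedef struct { int severity; } mc_error;

typedef bool (*mc_handler)(const mc_error *);

/* Illustrative stand-ins for each level's correction attempt. */
static bool hw_correct(const mc_error *e)  { return e->severity <= 0; }
static bool pal_correct(const mc_error *e) { return e->severity <= 1; }
static bool sal_correct(const mc_error *e) { return e->severity <= 2; }
static bool os_recover(const mc_error *e)  { return e->severity <= 3; }

/* Try each level in turn and report the first one that handles the error.
   Returns MC_UNRECOVERED if no level succeeds. */
mc_level mc_dispatch(const mc_error *e)
{
    static const mc_handler chain[] = {
        hw_correct, pal_correct, sal_correct, os_recover
    };
    for (size_t i = 0; i < sizeof chain / sizeof chain[0]; ++i) {
        if (chain[i](e)) {
            return (mc_level)i;  /* array order matches the enum order */
        }
    }
    return MC_UNRECOVERED;
}
```

In this model, an error that the hardware and firmware stand-ins both decline reaches `os_recover`, which mirrors the hand-off to the operating system; an error that no level accepts is reported as `MC_UNRECOVERED`.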
This document focuses on recoverable machine check aborts (MCAs) and describes the actions that SAL firmware and the operating system can take to handle them successfully. It explains how to distinguish between different MCAs, describes some MCA conditions that can be recovered at the OS level, and provides guidelines for recovery. High-level pseudocode for the OS_MCA handler illustrates the OS actions needed to recover from an MCA. The document is intended for OS developers implementing MCA handling and recovery, and for SAL developers who want a better understanding of the OS actions needed for recovery.
This document should be read in conjunction with the following documents:
• Intel® Itanium® Architecture Software Developer’s Manual available at http://developer.intel.com
• Itanium® Processor Family System Abstraction Layer Specification available at http://developer.intel.com
• Itanium® Processor Family Error Handling Guide available at http://developer.intel.com