A guide to memory ECC correctable errors and when they trigger an event
Steps to follow when an ECC correctable error event is logged in the System Event Log (SEL)
An ECC correctable error event indicates that the number of correctable errors on a given Dual In-line Memory Module (DIMM) exceeded a threshold within a given timeframe.
- If there is no catastrophic issue (Purple Screen Of Death (PSOD) or unexpected restart), and the correctable ECC errors, including Adaptive Double Device Data Correction (ADDDC) errors, number fewer than 10 events within every 24 hours for each DIMM location, the errors are within the threshold limit. The recommendation is to monitor each DIMM location that triggered the event for any recurrence of the ECC error.
- If there is a catastrophic issue (PSOD or unexpected restart), and the correctable ECC errors, including ADDDC errors, number more than 10 events within every 24 hours for each DIMM location, it is recommended to re-seat the DIMM at each affected location by following the steps below:
  - Power OFF the system and remove the AC power cable.
  - Identify the DIMM location to re-seat. Refer to the Technical Product Specifications for your server platform to identify the DIMM location.
  - Re-seat the identified DIMM(s).
  - Insert the AC power cable and power ON the system.
  - Observe the system for 24 hours for any recurrence of the ECC error.
- If the ECC error persists at the same DIMM location that was re-seated, generate the SEL and Debug logs from the BMC Web Console and send both to Intel Customer Support.
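The decision rule above can be sketched in a few lines of Python. This is only an illustration, not an Intel tool: it assumes the SEL entries have already been parsed into (timestamp, DIMM location) pairs, and the function names, the DIMM labels, and the "review" result are hypothetical. The guidance above does not say what to do when the event count and the catastrophic-issue condition disagree, or when the count is exactly 10, so the sketch flags those cases for manual review instead of guessing.

```python
from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 10                 # events per DIMM location, taken from the guidance above
WINDOW = timedelta(hours=24)   # observation window, taken from the guidance above


def max_events_in_window(timestamps, window=WINDOW):
    """Return the largest number of events that fall inside any single window."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, current in enumerate(times):
        # Drop events that are a full window (or more) older than the current one.
        while current - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def recommend(events, catastrophic):
    """Map each DIMM location to 'monitor', 're-seat', or 'review'.

    events: iterable of (datetime, dimm_location) pairs (assumed input format).
    catastrophic: True if a PSOD or unexpected restart occurred.
    """
    by_dimm = defaultdict(list)
    for timestamp, dimm in events:
        by_dimm[dimm].append(timestamp)

    result = {}
    for dimm, stamps in by_dimm.items():
        count = max_events_in_window(stamps)
        if not catastrophic and count < THRESHOLD:
            result[dimm] = "monitor"
        elif catastrophic and count > THRESHOLD:
            result[dimm] = "re-seat"
        else:
            # Combination not covered by the guidance; left for a human to decide.
            result[dimm] = "review"
    return result


if __name__ == "__main__":
    base = datetime(2024, 1, 1)
    # Hypothetical DIMM labels; real names come from the platform documentation.
    sample = [(base + timedelta(minutes=5 * i), "CPU2_DIMM_B1") for i in range(12)]
    sample += [(base + timedelta(hours=30 * i), "CPU1_DIMM_A1") for i in range(3)]
    print(recommend(sample, catastrophic=True))
```

The sliding window counts the worst 24-hour period per DIMM location rather than calendar days, which is one possible reading of "within every 24 hours".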
Correctable Error Correction Code (ECC) errors are corrected automatically. Depending on the Reliability, Availability, and Serviceability (RAS) configuration of the memory, the Integrated Memory Controller (IMC) may take the affected DIMM offline.
Event definitions differ between Intel server platforms. Refer to the System Event Log Troubleshooting Guide for your server platform.
Intel recommends updating the system BIOS to the latest available version for your server platform.
If the system is an Intel® Data Center Systems certified for Nutanix* Enterprise Cloud Platform, visit the Nutanix* Life Cycle Manager page. For a list of hardware and firmware compatibility, visit the Nutanix* Hardware and Firmware compatibility page.