System reports ECC memory errors on Intel® Server System
An Intel® Server System that reports ECC memory errors should be handled as follows:
In the document:
Memory Replacement Guideline and Advanced Memory Test for Intel® Server Products Based on Intel® 62X Chipset – White Paper
Chapter 4.1 states that the Advanced Memory Test (AMT) features were introduced in the Datacenter Solutions Group (DSG) firmware stack starting with BIOS revisions 02.01.0014 and 02.01.0097 for the Intel® Server Systems S2600BP, S2600WF and S2600ST and the Intel® Server System S9200WK, respectively.
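To check whether an installed BIOS already includes AMT, compare its revision with the first AMT-capable revision named above. The sketch below assumes revisions are dot-separated numeric fields such as "02.01.0014" and compares them field by field as integers. The function names are hypothetical, and reading the installed revision from the system is not shown.

```python
def parse_revision(rev: str) -> tuple[int, ...]:
    """Split a revision such as "02.01.0014" into integers: (2, 1, 14)."""
    return tuple(int(field) for field in rev.strip().split("."))


def meets_minimum(current: str, minimum: str) -> bool:
    """True if `current` is the same as or newer than `minimum`.

    Fields are compared as integers, left to right, so "02.01.0100"
    ranks above "02.01.0099" (a plain string compare would also work
    here only because the fields are zero-padded).
    """
    return parse_revision(current) >= parse_revision(minimum)


if __name__ == "__main__":
    # Minimum revision from the white paper; the installed values are illustrative.
    print(meets_minimum("02.01.0012", "02.01.0014"))  # False: predates AMT
    print(meets_minimum("02.01.0014", "02.01.0014"))  # True
```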
To resolve the memory issue, update the BIOS/firmware to revision 02.01.0014 if VMware is running, or to a later version if VMware is not used, to get the enhanced DIMM features (the VMware Web site lists no such restriction for the Intel® Server System S9200WK Family). Then follow Chapter 5: AMT Enablement through the BIOS Setup Utility to enable Advanced Memory Test with the steps below:
A DIMM needs to be replaced if any of the three events below is found in the event log:
Correctable ECC events, "Advanced Memory Test Failure and Post Package Repair Finish" events and "Post Package Repair Runtime Request" events are informational only. They do not indicate a DIMM defect, and the DIMM does not need to be replaced.
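When reviewing a long event log, it can help to set aside the informational events listed above before looking for replacement-worthy ones. The sketch below assumes each event is available as a plain text string whose wording matches the article's event names, ignoring case and extra spaces; real event log text may be formatted differently. It does not encode the three replacement events themselves, so anything it returns still has to be checked against them by hand. The names `INFORMATIONAL_EVENTS`, `is_informational` and `events_needing_review` are hypothetical.

```python
# Informational-only events, named as in the article (assumed log wording).
INFORMATIONAL_EVENTS = {
    "correctable ecc",
    "advanced memory test failure and post package repair finish",
    "post package repair runtime request",
}


def is_informational(event_text: str) -> bool:
    """True if the event is one of the informational-only events."""
    # Lower-case and collapse runs of whitespace before matching.
    normalized = " ".join(event_text.lower().split())
    return normalized in INFORMATIONAL_EVENTS


def events_needing_review(events: list[str]) -> list[str]:
    """Return the events that are NOT informational-only.

    These must still be compared by hand against the three events that
    require DIMM replacement; this sketch does not decide replacement.
    """
    return [event for event in events if not is_informational(event)]


if __name__ == "__main__":
    log = ["Correctable ECC", "Post Package Repair  Runtime Request", "Some other event"]
    print(events_needing_review(log))  # ['Some other event']
```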
ECC: Error Correction Code
PPR: Post Package Repair
AMT: Advanced Memory Test
ADDDC: Adaptive Double Device Data Correction
DSG: Datacenter Solutions Group