Article ID: 000094930 | Content Type: Maintenance & Performance | Last Reviewed: 05/22/2023

ECC Memory Errors on Intel® Server System

Environment

Intel® Server Systems S2600BP, Intel® Server Systems S2600WF, Intel® Server Systems S2600ST, Intel® Server System S9200WK

OS Independent family, VMware*

Summary

Server-oriented troubleshooting guidance for ECC memory errors

Description

An Intel® Server System reports ECC memory errors.

Resolution

ECC memory errors on an Intel® Server System should be handled as follows.

In the white paper Memory Replacement Guideline and Advanced Memory Test for Intel® Server Products Based on Intel® 62X Chipset, Chapter 4.1 states that the Advanced Memory Test (AMT) features were introduced in the Datacenter Solutions Group (DSG) firmware stack starting with BIOS revision 02.01.0014 for the Intel® Server Systems S2600BP, S2600WF, and S2600ST, and BIOS revision 02.01.0097 for the Intel® Server System S9200WK.
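Before updating, the minimum-revision check can be sketched in Python. This is a minimal sketch, not an Intel tool: it assumes that 02.01.0014 applies to the S2600BP, S2600WF, and S2600ST families and 02.01.0097 to the S9200WK family, and that the installed revision has already been reduced to the dotted form used in this article. The family keys and the functions parse_revision and supports_amt are hypothetical.

```python
# Minimum BIOS revisions that include the Advanced Memory Test (AMT).
# ASSUMPTION: 02.01.0014 covers the S2600BP/S2600WF/S2600ST families and
# 02.01.0097 covers the S9200WK family.
MIN_AMT_BIOS = {
    "S2600BP": "02.01.0014",
    "S2600WF": "02.01.0014",
    "S2600ST": "02.01.0014",
    "S9200WK": "02.01.0097",
}


def parse_revision(rev: str) -> tuple[int, ...]:
    """Turn a dotted revision such as '02.01.0014' into (2, 1, 14)."""
    return tuple(int(part) for part in rev.split("."))


def supports_amt(family: str, installed_rev: str) -> bool:
    """True if the installed revision is at or above the AMT minimum for the family."""
    minimum = MIN_AMT_BIOS[family]
    # Tuples compare element by element, so (2, 1, 15) >= (2, 1, 14).
    return parse_revision(installed_rev) >= parse_revision(minimum)


if __name__ == "__main__":
    print(supports_amt("S2600WF", "02.01.0012"))  # False
    print(supports_amt("S2600WF", "02.01.0015"))  # True
    print(supports_amt("S9200WK", "02.01.0014"))  # False
```

Comparing integer tuples rather than raw strings keeps the check correct even if a field ever differs in digit count.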

To address the memory errors, update the BIOS/firmware to get the enhanced DIMM features: use revision 02.01.0014 if the system runs VMware, or revision 02.01.0014 or later if it does not (according to the VMware website, the Intel® Server System S9200WK family has no such restriction). Then follow Chapter 5, AMT Enablement through the BIOS Setup Utility, to enable the Advanced Memory Test with the steps below:

  1. In BIOS Setup, make the following changes:
    1. Go to Advanced > Memory Configuration and set PPR Type to <Hard PPR>.
    2. Go to the Advanced > Memory Configuration > Memory RAS and Performance Configuration screen and set ADDDC Sparing to <Disabled>.
    3. Go to the Advanced > Memory Configuration > Memory RAS and Performance Configuration screen and set Correctable Error Threshold to <500>.
  2. Reboot the system. The AMT runs for several minutes.
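The three BIOS Setup changes can be kept as a checklist and compared against values read by hand from the setup screens. A minimal Python sketch, assuming the current values are typed in manually; the dictionary keys are descriptive labels, not option names from any Intel configuration utility, and settings_to_change is a hypothetical helper.

```python
from typing import Optional

# Required BIOS Setup values for enabling AMT, taken from the steps above.
# Keys are (menu path, setting name) labels for readability only (assumption).
MEM = "Advanced > Memory Configuration"
RAS = MEM + " > Memory RAS and Performance Configuration"

REQUIRED_SETTINGS = {
    (MEM, "PPR Type"): "Hard PPR",
    (RAS, "ADDDC Sparing"): "Disabled",
    (RAS, "Correctable Error Threshold"): "500",
}


def settings_to_change(
    current: dict[tuple[str, str], str]
) -> dict[tuple[str, str], tuple[Optional[str], str]]:
    """Return {key: (current value, required value)} for each setting that differs.

    A setting missing from `current` is reported with a current value of None.
    """
    return {
        key: (current.get(key), required)
        for key, required in REQUIRED_SETTINGS.items()
        if current.get(key) != required
    }


if __name__ == "__main__":
    # Example values as read from the setup screens (hypothetical).
    current = {
        (MEM, "PPR Type"): "Soft PPR",
        (RAS, "ADDDC Sparing"): "Disabled",
        (RAS, "Correctable Error Threshold"): "500",
    }
    for (path, name), (have, want) in settings_to_change(current).items():
        print(f"{path}: set {name} to <{want}> (currently {have})")
    # prints: Advanced > Memory Configuration: set PPR Type to <Hard PPR> (currently Soft PPR)
```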

Replace a DIMM if any of the following three events is found for it in the event log:

  1. DIMM with uncorrectable ECC
  2. DIMM with post package repair failure event
  3. DIMM with advanced memory test completion with error

Correctable ECC errors, the "Advanced Memory Test Failure and Post Package Repair Finish" event, and the "Post Package Repair Runtime Request" event are informational only. They do not indicate a defective DIMM, and the DIMM does not need to be replaced.
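The replacement rule above can be sketched as a small classifier over event-log entries. This Python sketch assumes the entries have already been exported as (DIMM slot, event description) pairs; the slot names and the exact description strings are hypothetical and will differ from the text a particular BMC firmware writes to the log. Events the article does not mention are flagged for manual review rather than guessed.

```python
from enum import Enum


class Action(Enum):
    REPLACE_DIMM = "replace DIMM"
    INFORMATIONAL = "informational only"
    REVIEW = "not covered by this article"


# Event categories from the article, lower-cased for matching. These strings are
# hypothetical stand-ins for the exact text a given BMC writes to the event log.
REPLACE_EVENTS = {
    "uncorrectable ecc",
    "post package repair failure",
    "advanced memory test completion with error",
}
INFO_EVENTS = {
    "correctable ecc",
    "advanced memory test failure and post package repair finish",
    "post package repair runtime request",
}


def classify(description: str) -> Action:
    """Map one event description to the action the article prescribes."""
    key = description.strip().lower()
    if key in REPLACE_EVENTS:
        return Action.REPLACE_DIMM
    if key in INFO_EVENTS:
        return Action.INFORMATIONAL
    return Action.REVIEW  # not listed in the article; check manually


def dimms_to_replace(events: list[tuple[str, str]]) -> set[str]:
    """Return the slots that logged at least one replacement-triggering event."""
    return {slot for slot, description in events if classify(description) is Action.REPLACE_DIMM}


if __name__ == "__main__":
    log = [
        ("A1", "Correctable ECC"),
        ("B2", "Uncorrectable ECC"),
        ("A1", "Post Package Repair Runtime Request"),
    ]
    print(sorted(dimms_to_replace(log)))  # ['B2']
```

Exact matching is used instead of a keyword search on purpose: the informational "Advanced Memory Test Failure and Post Package Repair Finish" event contains the word "Failure", so a naive search for "failure" would wrongly mark that DIMM for replacement.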

Additional information

ECC: Error Correction Code
PPR: Post Package Repair
AMT: Advanced Memory Test
ADDDC: Adaptive Double Device Data Correction
DSG: Datacenter Solutions Group

VMware Compatibility Guide