Challenges
- Real-time visibility into server memory health
- Predicting catastrophic server memory failures before they happen
Solution
- Intel® Memory Failure Prediction
Executive Summary
Meituan-Dianping (Meituan), a leading Chinese e-commerce platform for services, set up Intel® Memory Failure Prediction (Intel® MFP) in a test deployment across several thousand servers based on Intel® Xeon® Scalable Processors. The goal was to help improve the performance and reliability of its server memory, which is essential to fast data analytics computing.
Meituan deployed Intel® MFP in its data center, integrating it into its existing management solutions to take advantage of its memory analysis and predictive capabilities. The aim was to analyze and model server memory-failure data in order to predict potential failures, prevent downtime, and optimize the company's current Dual Inline Memory Module (DIMM) upgrade.
The Intel® MFP deployment improved memory reliability through predictions based on analysis of micro-level memory failure logs. Intel® MFP allowed data center staff to migrate workloads before catastrophic memory failures could happen, use page offlining policies to isolate unreliable memory cells or pages, or replace failing DIMMs before they reached a terminal stage. Responding before a server failure occurred reduced downtime.
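The three responses described above (page offlining, workload migration, and DIMM replacement) can be pictured as a simple escalation policy. The sketch below is illustrative only: the `DimmPrediction` record, the 0.0 to 1.0 `risk_score` scale, and the threshold values are assumptions made for this example and do not represent Intel® MFP's actual output format or Meituan's operating policy.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MONITOR = "keep monitoring"
    OFFLINE_PAGES = "offline unreliable pages"
    MIGRATE_WORKLOADS = "migrate workloads off the server"
    REPLACE_DIMM = "migrate workloads and replace the DIMM"


@dataclass
class DimmPrediction:
    # Hypothetical record; field names are illustrative, not Intel MFP output.
    server: str
    dimm_slot: str
    risk_score: float  # assumed scale: 0.0 (healthy) to 1.0 (failure imminent)


# Illustrative thresholds only; they are not taken from the case study.
OFFLINE_THRESHOLD = 0.4
MIGRATE_THRESHOLD = 0.6
REPLACE_THRESHOLD = 0.8


def choose_action(pred: DimmPrediction) -> Action:
    """Map a predicted risk score to the least disruptive adequate response."""
    if not 0.0 <= pred.risk_score <= 1.0:
        raise ValueError(f"risk_score out of range: {pred.risk_score}")
    if pred.risk_score >= REPLACE_THRESHOLD:
        return Action.REPLACE_DIMM
    if pred.risk_score >= MIGRATE_THRESHOLD:
        return Action.MIGRATE_WORKLOADS
    if pred.risk_score >= OFFLINE_THRESHOLD:
        return Action.OFFLINE_PAGES
    return Action.MONITOR


if __name__ == "__main__":
    # Made-up sample predictions for demonstration.
    samples = [
        DimmPrediction("srv-001", "DIMM_A1", 0.10),
        DimmPrediction("srv-002", "DIMM_B2", 0.45),
        DimmPrediction("srv-003", "DIMM_C1", 0.90),
    ]
    for p in samples:
        print(f"{p.server} {p.dimm_slot}: {choose_action(p).value}")
```

Ordering the checks from most to least severe means each DIMM gets the mildest response that still matches its predicted risk, so staff reserve hardware replacement for modules closest to failure.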
“We would like to thank Intel for the Memory Failure Prediction collaboration with Meituan,” said Rui Guo, leader of Infrastructure/Server Technology at Meituan. “The testing results indicate that Intel® MFP’s prediction capabilities could significantly reduce server hardware failures, by up to 40 percent.”