DCMI Reliability and Resilience Guidelines, V1.0
This document presents reliability and resilience guidelines for Internet portal servers that conform to the Data Center Manageability Interface (DCMI) specification.
Reliability is defined in this context as the predictable and consistent behavior of the DCMI manageability controller within the server and across the data centers.
Resilience is defined in this context as the ability of the DCMI manageability controller to recover from or adjust to changes in factors external to the server within the data centers, and to keep DCMI capabilities available.
The key areas of reliability and resilience are influenced by the data center networks, the server operating systems, and the behavior of the remote agent.
The DCMI specification defines standardized, abstracted interfaces to the server management subsystem that are specific to data center servers. It is built upon the Intelligent Platform Management Interface (IPMI) V2.0 specification. The DCMI specification defines the interfaces and capabilities, but it does not specify reliability and resilience for their implementation; this document covers those aspects.
Data centers require consistency in the reliability and resilience of the implementations of the DCMI specification in order to yield the most benefit from systems that support DCMI.
The purpose of this document is to help data center architects and OEMs establish a baseline of reliability and resilience requirements that can be applied to suppliers of DCMI-conformant systems. For example, data centers can use this document to define service level agreements (SLAs) with their suppliers. Similarly, suppliers can use this document as part of the validation and quality assurance processes for their DCMI implementations.