Case Study | Cloud Data Center | XTREME-D

Intel® DCM Optimizes Cooling and Uptime with Virtual Console for Server Group Management

Intel® DCM delivers significant savings and efficiency by providing virtual KVM access, thermal visibility, and group policy controls for servers

Business: A Japanese managed supercomputer-as-a-service startup supporting HPC (high-performance computing) and HPDA (high-performance data analytics) for researchers and scientists working in AI and deep learning.

Challenges
• Real-time server power and thermal data collection
• Real-time health monitoring
• Cooling analysis
• Automated server discovery
• Cross-platform group policy control
• KVM for device management

Solution
• Intel® Data Center Manager

Executive Summary

XTREME-D is an HPC-as-a-service startup based in Tokyo, Japan. It provides high-performance computing (HPC) to data-intensive research and scientific organizations running artificial intelligence (AI) applications, deep learning (DL) applications, and large-scale simulations.

The company deployed Intel® Data Center Manager (Intel® DCM) with three goals. First, it wanted to improve the thermal health of its data center environment and of its Intel® Data Center Blocks (Intel® DCB) servers (Intel 1U, 2U, and 2U multi-node devices connected via Omni-Path). Second, it sought to reduce downtime by decreasing the Mean Time to Repair (MTTR) for server failures. Third, it sought to use remote access controls and group management to reduce man-hours and improve the efficiency of its data center operation.

The Intel® DCM test deployment covered 100 servers. The data center staff downloaded and deployed Intel® DCM to gain insight into the cooling efficiency and health of their data center. Using Intel® DCM capabilities including cooling analysis, automated server discovery, thermal health monitoring, and remote access control, data center operators began to assess the cooling efficiency of their operations and compile reports on their findings. The added visibility allowed them to optimize their operation safely and efficiently.
Because Intel® DCM turns servers into sensors, XTREME-D avoided purchasing expensive Power Distribution Unit (PDU) hardware. For the initial 100-server deployment, the cost savings would be $5,600 USD.

Intel® DCM's granular visibility allowed data center staff to safely raise the server room temperature by a total of 3°C while continuously monitoring server health. For the 100-server subset, the calculated annual savings from the higher temperature would be $2,270 USD.

Figure 1. Intel® Data Center Manager Console

Intel® DCM allowed XTREME-D to improve the Service Level Agreement (SLA) of its operation by providing automated alerts, server location mapping, and diagnostic features that reduce downtime by an average of two hours per event ($40 per hour). By lowering the MTTR, Intel® DCM would save the company $960 USD annually for this deployment.

Intel® DCM offers a cross-platform virtual KVM (keyboard-video-mouse) solution for troubleshooting and diagnosing server issues and for controlling servers remotely. This eliminates the need for expensive hardware KVM devices, saving the company an additional $6,250 USD.

Finally, Intel® DCM's remote access and patented group policy control features allowed the company to perform a batch firmware update on its Intel® DCB servers, eliminating manual processes and yielding an annual savings of $125,000 USD.

Deploying Intel® DCM on the company's 100-server Intel® DCB Proof of Concept (POC) would yield XTREME-D an annual savings of $140,080 USD, the sum of the five figures above.

Background

XTREME-D installed Intel® DCM in its high-performance server data center operation to monitor the power consumption and thermal levels of its devices. The data center staff wanted to reduce overhead margins and improve server health and operational