Intel® Data Center Diagnostic Tool for Intel® Xeon® Processors

000058107

11/18/2021

Introduction

The Intel® Data Center Diagnostic Tool is diagnostic software that you can run on your data center platforms to:

  • Verify the functionality of all cores within an Intel® Xeon® Processor.
  • Support a regular system maintenance program.

High reliability and availability in the data center require the right tools and a commitment to maintenance. Intel believes it is an industry best practice to use maintenance tools such as these for both initial deployment and periodic testing to help ensure the best system experience.

    Note
    • Modern computing infrastructure brings ever-increasing demand for processing power combined with business expectations for service quality and high availability (and guarantees on service-level agreements [SLAs] in general). These expectations emphasize the need for powerful software tools that can help predict, identify, and minimize unexpected system faults that might compromise service quality or uptime. Read a paper from IDC that covers the need for diagnostic tools including the Intel® Data Center Diagnostic Tool.

    System requirements

    The Intel Data Center Diagnostic Tool is a Linux* application that can be installed and run on many current Linux distributions. There is no Windows* version of this tool.

    For best coverage, run the application directly on the host operating system of the server. It can also run inside a container or virtual machine, but some functionality may be disabled there.

    Supported processors:

    • 3rd Generation Intel® Xeon® Scalable Processors (formerly Ice Lake and Cooper Lake)
    • 2nd Generation Intel® Xeon® Scalable Processors (formerly Cascade Lake)
    • 1st Generation Intel® Xeon® Scalable Processors (formerly Skylake)
    • Intel® Xeon® Processor E5 v4 Family (formerly Broadwell)
    • Intel® Xeon® Processor E7 v4 Family (formerly Broadwell)
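
As a rough pre-check before installing, the CPU family and model reported in /proc/cpuinfo can be mapped to the microarchitectures listed above. The sketch below is only an illustration: the function names are hypothetical, and the model numbers (79, 85, 106 in family 6) are an assumption based on commonly documented values rather than something this page states. The tool's own processor check remains the authoritative one.

```shell
# Map an x86 CPU family/model pair to one of the microarchitectures named in
# the supported-processor list. The model numbers are an assumption based on
# commonly documented values, not taken from the tool itself.
xeon_generation() {
    family=$1
    model=$2
    if [ "$family" != "6" ]; then
        echo "unknown"
        return
    fi
    case "$model" in
        79)  echo "Broadwell (Xeon E5/E7 v4)" ;;
        85)  echo "Skylake, Cascade Lake, or Cooper Lake (Xeon Scalable)" ;;
        106) echo "Ice Lake (3rd Gen Xeon Scalable)" ;;
        *)   echo "unknown" ;;
    esac
}

# Print the first value of a field from a cpuinfo-style file.
# $1 = field name, $2 = file (defaults to /proc/cpuinfo)
cpuinfo_field() {
    awk -F': *' -v key="$1" '
        { name = $1; sub(/[ \t]+$/, "", name) }
        name == key { print $2; exit }
    ' "${2:-/proc/cpuinfo}"
}

# Example (run on the server under test):
#   xeon_generation "$(cpuinfo_field 'cpu family')" "$(cpuinfo_field model)"
```

A result of "unknown" only means the model is not in this small table; it does not by itself mean the processor is unsupported.
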
    Note
    • For developers: Intel started the Open Data Center Diagnostic Project, which opens Intel’s Data Center Diagnostic framework and provides select tests. This offers developers a consistent test development framework that invites the creativity of the open-source community to enhance cloud fleet management through the development of unique test screens and other innovative solutions. For more information and access to this framework and tests, see the Open Data Center Diagnostic Project.

    Installation

    Notes
    • Additional details are available in the /usr/share/doc/dcdiag/README.rst file included in the installation.
    • We recommend using the steps in the sections below to link to the repository, which ensures that you get the latest version of the Intel® Data Center Diagnostic Tool. However, if you require a downloadable binary, use an RPM file or DEB file.

     

    Debian*/Ubuntu*

    To install the Intel® Data Center Diagnostic Tool software packages on Debian*-based distributions, add the Intel software package repository and install the appropriate packages.

    Before copying and pasting the commands below into your console, you may want to run sudo ls and enter your password, so that the sudo password prompt does not consume the pasted commands:

    Set up the key to verify the package signatures

    curl https://repositories.intel.com/dcdt/dcdiag.pub | sudo apt-key add -

    Set up the repository

    sudo apt-add-repository 'deb https://repositories.intel.com/dcdt/debian stable main'

    Install the package

    sudo apt-get update
    sudo apt-get install dcdiag

    Fedora*/CentOS*/RHEL*

    To install the Intel Data Center Diagnostic Tool software packages on a Fedora-based distribution, add the Intel software package repository and install the package.

    The first time you install, YUM or DNF will prompt you to accept the signing key. Verify that the fingerprint is as follows, and then accept it:
    Userid: "CN=Release Key"
    Fingerprint: 6226 CA48 AAB6 0900 2093 C7C4 0A04 4B42 CF00 5B79
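
Package managers do not all group the fingerprint digits the same way (this page shows groups of four for YUM or DNF and groups of eight for zypper), which makes a comparison by eye error-prone. The following is a minimal sketch of a helper that strips whitespace and letter case before comparing; the function and variable names are hypothetical and not part of the tool.

```shell
# Expected release-key fingerprint, copied from this page.
EXPECTED_FPR="6226 CA48 AAB6 0900 2093 C7C4 0A04 4B42 CF00 5B79"

# Remove all whitespace and uppercase the hex digits so that differently
# grouped fingerprints compare equal.
normalize_fpr() {
    printf '%s' "$1" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]'
}

# Succeeds (exit status 0) when the given fingerprint matches the expected one.
fpr_matches() {
    [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$EXPECTED_FPR")" ]
}

# Example: pass the fingerprint shown in the prompt as the argument.
#   fpr_matches "6226CA48 AAB60900 2093C7C4 0A044B42 CF005B79" && echo "match"
```
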

    Before copying and pasting the commands below into your console, you may want to run sudo ls and enter your password, so that the sudo password prompt does not consume the pasted commands:

    Install the repository file

    sudo yum install https://repositories.intel.com/dcdt/dcdiag-repo.rpm

    Install the package

    sudo yum install dcdiag

    openSUSE*/SUSE Linux Enterprise*

    Install the repository file

    sudo zypper ar https://repositories.intel.com/dcdt/dcdiag.repo

    Install the package

    sudo zypper install dcdiag

    You will be warned that repomd.xml is not signed. Answer yes to continue. You will then be given a chance to verify the package signing key. Verify that the fingerprint is as follows, and then accept it:

    Repository: dcdiag
    Key Name: CN=Release Key
    Key Fingerprint: 6226CA48 AAB60900 2093C7C4 0A044B42 CF005B79
    Key Created: Tue 24 Nov 2020 01:47:38 PM PST
    Key Expires: Sat 25 Nov 2023 01:47:38 PM PST
    Rpm Name: gpg-pubkey-cf005b79-5fbd7f7a
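
The three procedures above differ mainly in the package manager they use, so a provisioning script needs to know which one applies. The sketch below picks a family from the ID and ID_LIKE fields of /etc/os-release. It assumes the distribution follows the usual os-release conventions, the function name is hypothetical, and it only reports the family; it does not add the repository or install anything.

```shell
# Print which of the install procedures on this page applies, based on the
# ID and ID_LIKE fields of an os-release file (default: /etc/os-release).
# Prints debian, fedora, suse, or unknown (with a non-zero exit status).
install_family() {
    (
        # Source the file in a subshell so its variables do not leak out.
        ID=""; ID_LIKE=""
        . "${1:-/etc/os-release}"
        for id in $ID $ID_LIKE; do
            case "$id" in
                debian|ubuntu)        echo "debian"; exit 0 ;;
                fedora|centos|rhel)   echo "fedora"; exit 0 ;;
                suse|sles|opensuse*)  echo "suse";   exit 0 ;;
            esac
        done
        echo "unknown"
        exit 1
    )
}

# Example, after the matching repository has been added as described above:
#   case "$(install_family)" in
#       debian) sudo apt-get install dcdiag ;;
#       fedora) sudo yum install dcdiag ;;
#       suse)   sudo zypper install dcdiag ;;
#   esac
```
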

     

    How to test the Intel Xeon Processor

    Once installed, the Intel Data Center Diagnostic Tool is automatically enabled for background execution. To verify that the service is enabled and running, use the following command:

    # systemctl status dcdiag
    ● dcdiag.service - Intel® Data Center Diagnostic Tool
    Loaded: loaded (/usr/lib/systemd/system/dcdiag.service; enabled; vendor preset: disabled)
    Active: active (running) since Fri 2021-02-19 11:24:17 MST; 4 days ago
    Docs: file:///usr/share/doc/dcdiag/README.rst
    Main PID: 8777 (dcdiag)
    CGroup: /system.slice/dcdiag.service
    └─8777 /usr/bin/dcdiag --service

    If the tool detects any errors, it writes them to the system log. You can also check whether the background scan detected any errors by running the tool with the --query argument.

    # dcdiag --query
    Intel® Data Center Diagnostic Tool Version 506
    Test completed successfully. No issues detected.
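
For a routine health check, the service status, the query result, and the log can be combined in one function. The sketch below is an illustration under a few assumptions: the system log is the systemd journal, the success message matches the text shown above, and the function name is hypothetical. It matches on the message text because this page does not document the exit status of dcdiag --query.

```shell
# Summarize the background scan: is the service running, and did the last
# query report a clean result? Run as root, like the commands above.
dcdiag_background_ok() {
    if ! systemctl is-active --quiet dcdiag; then
        echo "dcdiag service is not active"
        return 1
    fi
    if dcdiag --query | grep -q "No issues detected"; then
        echo "background scan: no issues detected"
        return 0
    fi
    # Anything else is treated as a problem; show recent entries for the unit.
    echo "background scan reported a problem; recent log entries:"
    journalctl -u dcdiag --no-pager -n 20
    return 1
}

# Example:
#   dcdiag_background_ok || echo "follow up on this server"
```
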

    You can also run the tool manually in the foreground by entering the following at a Linux command prompt:

    # dcdiag

    The manual test runs for about 45 minutes and has high CPU utilization.

    When the diagnostic completes, the system returns one of the following messages:

    • Test completed successfully. No issues detected.
       
    • Test completed successfully. One or more machine check errors occurred. Please check the system logs.
       
    • This processor is not supported by this version of the tool.

      Check the system's processor model and version. This message appears if the Intel Data Center Diagnostic Tool does not detect a production version of the supported processors. Engineering samples are not supported by this tool.

      Find help in identifying the processor.
       
    • Test completed. Results are inconclusive due to an outdated version of microcode.

      The latest version of the microcode addresses known issues. Please update. Microcode updates are usually delivered by your Linux distribution vendor alongside security fixes and other firmware updates for various components. If your system does not have these updates enabled, we recommend that you enable them. The microcode is automatically loaded by the Linux kernel on every boot and can be reloaded at runtime with the following command as root:

      echo 1 > /sys/devices/system/cpu/microcode/reload
       
    • Test completed. Results are inconclusive due to the system exceeding temperature limits.

      The system is not providing enough cooling for the CPU to operate within its required temperature limits, which can have a variety of causes. We recommend that you check that the required cooling is operating correctly. Possible causes include faulty fans, incorrect airflow, or other environmental issues.
       
    • Test completed. Results are inconclusive, one or more machine check errors occurred.

      Check system logs.
       
    • Test failed. Contact your system manufacturer or processor vendor for support.

      If the test results show a failure, check whether your server node's processors are still under warranty:

      • If you have a Boxed Intel® Xeon® Processor still under 3-year warranty, contact Intel Customer Support for assistance.
      • If you have a tray processor, contact your system or processor vendor or place of purchase to check if the processor is still under warranty.
        Note: Tray processors are sold directly to system manufacturers or Intel authorized distributors. Intel does not provide direct warranty to end users for tray processors unless they came preinstalled in Intel® Data Center Blocks (Intel® DCB) server systems. Except for Intel DCB systems, the warranty for a tray processor comes from the vendor or place of purchase of the processor, or of the system if the processor came preinstalled. Intel recommends purchasing from Intel Authorized Distributors, Intel Approved Suppliers, and resellers of Intel® products.
      • Be aware that Intel does not have an out-of-warranty replacement program.
         
    • Test failed.

      Test completed, and an error was detected on the physical processor containing /sys/devices/system/cpu/cpuXX.

      Contact your system manufacturer or processor vendor for support.

    • Test failed.

      Test is unable to determine which physical processor caused the failure.

      Contact your system manufacturer or processor vendor for support.
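
When the tool is run across many servers, it can help to reduce the final message to a coarse status that a script can act on. The sketch below matches on the message texts listed above; the status names (pass, pass-mce, unsupported, inconclusive, fail, unknown) and the function name are illustrative choices and are not defined by the tool.

```shell
# Classify the final message printed by a dcdiag run into a coarse status.
# More severe patterns are checked first, so an inconclusive result that also
# mentions machine check errors is reported as "inconclusive".
classify_dcdiag_result() {
    case "$1" in
        *"Test failed"*)                       echo "fail" ;;
        *"Results are inconclusive"*)          echo "inconclusive" ;;
        *"is not supported by this version"*)  echo "unsupported" ;;
        *"machine check errors occurred"*)     echo "pass-mce" ;;
        *"No issues detected"*)                echo "pass" ;;
        *)                                     echo "unknown" ;;
    esac
}

# Example (as root; the run takes about 45 minutes):
#   classify_dcdiag_result "$(dcdiag 2>&1)"
```

Treating "unknown" as a result that needs a manual look is the safer default, since later versions of the tool may word their messages differently.
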
       

    Version history

    Date            Version    Description
    July 7, 2021    540        Initial version