User Guide

  • 2021.4
  • 10/01/2021
  • Public Content

Using Checks to Diagnose Your System

The Diagnostics Utility for Intel® oneAPI Toolkits contains checks that either verify the state of your system or report information about it. Accordingly, each check is either an information check or a verification check. Checks are also organized into semantic groups that correspond to tags (keywords that identify the area a check belongs to). A check can belong to several groups at once, meaning it can be marked with several tags. Some checks make up the default group, which includes the checks that provide basic information about the system.
To get the current list of checks with their tags and descriptions, run the following command:
python3 diagnostics.py --list

Example Output

The output shows these items:
  • Check Name: the name of the check that the Diagnostics Utility for Intel® oneAPI Toolkits can call by using the Check Name or by using the Tag associated with that Check Name.
  • Tags: groups similar checks together so that you can run all checks with the same Tag by using just one command. A check may have multiple Tags.
  • Rights: the permissions needed to run this check.
  • Description: a short description of the check.

Description of Checks

| Check Name | Tags | Rights | Description |
|---|---|---|---|
| oneapi_app_check | compile, default, host, runtime, sysinfo, target | user | The check shows version information of installed oneAPI products. |
| gpu_backend_check | compile, default, gpu, host, runtime, sysinfo, target | user | The check shows information from OpenCL and LevelZero drivers. |
| vtune_check | gpu, runtime, target, vtune | user | The check verifies if the system is ready to do VTune analysis on GPU(s). |
| gcc_version_check | compile, default, host, sysinfo | user | The check shows information about GCC compiler version. |
| intel_gpu_detector_check | advisor, default, gpu, runtime, sysinfo, target, vtune | user | The check shows which Intel GPU(s) is on the system, based on lspci information and internal table. |
| base_system_check | compile, host, runtime, sysinfo, target | user | The check shows information about hostname, CPU, BIOS, and operating system. |
| hangcheck_check | advisor, gpu, runtime, sysinfo, target, vtune | user | The check verifies that the GPU hangcheck option is disabled to allow long-running jobs. |
| user_group_check | advisor, gpu, runtime, target, vtune | user | The check verifies that the current user is in the same group as the GPU(s). |
| kernel_boot_options_check | runtime, sysinfo, target | user | The check shows kernel boot options. |
| gpu_metrics_check | gpu, runtime, target | user | The check verifies that GPU metrics are good. |
| oneapi_gpu_check | gpu, sysinfo | user | The check runs GPU workloads and verifies readiness to run applications on GPU(s). |
| advisor_check | advisor, gpu, kernel, runtime, target | user | The check verifies if environment is ready to analyze GPU kernels. |
| user_resources_limits_check | compile, host, runtime, sysinfo, target | user | The check shows resources limits. |
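
The relationship between checks and tags can be illustrated with a small Python sketch. It is not part of the Diagnostics Utility: the `CHECK_TAGS` dictionary is simply transcribed from the Description of Checks table, and `checks_by_tag` is a hypothetical helper that inverts it to show which checks each tag selects.

```python
from collections import defaultdict

# Transcribed from the Description of Checks table (check name -> tags).
# Illustration only; the utility itself does not expose this dictionary.
CHECK_TAGS = {
    "oneapi_app_check": "compile default host runtime sysinfo target",
    "gpu_backend_check": "compile default gpu host runtime sysinfo target",
    "vtune_check": "gpu runtime target vtune",
    "gcc_version_check": "compile default host sysinfo",
    "intel_gpu_detector_check": "advisor default gpu runtime sysinfo target vtune",
    "base_system_check": "compile host runtime sysinfo target",
    "hangcheck_check": "advisor gpu runtime sysinfo target vtune",
    "user_group_check": "advisor gpu runtime target vtune",
    "kernel_boot_options_check": "runtime sysinfo target",
    "gpu_metrics_check": "gpu runtime target",
    "oneapi_gpu_check": "gpu sysinfo",
    "advisor_check": "advisor gpu kernel runtime target",
    "user_resources_limits_check": "compile host runtime sysinfo target",
}


def checks_by_tag(check_tags):
    """Invert the table: map each tag to the sorted list of checks carrying it."""
    groups = defaultdict(list)
    for name, tags in check_tags.items():
        for tag in tags.split():
            groups[tag].append(name)
    return {tag: sorted(names) for tag, names in groups.items()}


groups = checks_by_tag(CHECK_TAGS)
print(groups["default"])
print(groups["vtune"])
```

Because a check can carry several tags, the same check name appears under several keys of `groups`.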

Run the default set of checks

The default set of checks is defined in the Description of Checks table. Each check with “default” in the Tags column is part of the default set. To run the default set of checks:
python3 diagnostics.py
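
If you want to start the utility from another Python script, the following minimal sketch shows one way to do it. It assumes `diagnostics.py` is in the current working directory and uses only the two invocations shown in this guide (no arguments, and `--list`). The helper names `build_command` and `run_diagnostics` are hypothetical, and the meaning of the process return code is not documented here, so the sketch only passes it through.

```python
import subprocess
import sys


def build_command(script="diagnostics.py", list_only=False):
    """Build the argument list for the two invocations shown in this guide."""
    cmd = [sys.executable, script]  # same interpreter as the caller (assumption)
    if list_only:
        cmd.append("--list")  # list available checks instead of running them
    return cmd


def run_diagnostics(script="diagnostics.py", list_only=False):
    """Run the utility and return (return code, captured console output)."""
    completed = subprocess.run(
        build_command(script, list_only),
        capture_output=True,
        text=True,
        check=False,  # do not raise; the caller decides how to treat the code
    )
    return completed.returncode, completed.stdout


# Usage (requires diagnostics.py to be present):
#   code, output = run_diagnostics()
#   print(output)
```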

Example output

Checks results:
================================================================================
Check name: oneapi_app_check
Description : This is a module for gettings oneAPI product information.
Result status: ERROR
There is no information about Level Zero driver.
================================================================================
================================================================================
Check name: gpu_backend_check
Description : This is a module for getting GPU information.
Result status: ERROR
Level Zero driver is not initialized
================================================================================
================================================================================
Check name: gcc_version_check
Description : Contains information about GCC compiler version.
Result status: PASS
================================================================================
================================================================================
Check name: intel_gpu_detector_check
Description : Detect which Intel GPU is on the system.
Result status: ERROR
Unable to get information about initialized devices because the user doesn't have read access to /sys/kernel/debug/dri/.
================================================================================
4 CHECKS, 1 PASSED, 0 FAILED, 0 WARNING, 3 ERROR
Console output file: /home/test/intel/diagnostics/diagnostics_nnladtldev-01_20210831-141156.txt
JSON output file: /home/test/intel/diagnostics/diagnostics_nnladtldev-01_20210831-141156.json
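
The console output has a regular shape, so it can be summarized with a short script. The sketch below is an illustration based only on the example output in this section: it assumes the `Check name:` / `Result status:` labels and the summary-line wording stay exactly as shown, and the `SAMPLE` text is an excerpt, not a complete run. The structure of the JSON output file is not described here, so the sketch does not read it.

```python
import re

# Excerpt modeled on the example console output in this section.
SAMPLE = """\
Check name: gpu_backend_check
Description : This is a module for getting GPU information.
Result status: ERROR
Level Zero driver is not initialized
Check name: gcc_version_check
Description : Contains information about GCC compiler version.
Result status: PASS
4 CHECKS, 1 PASSED, 0 FAILED, 0 WARNING, 3 ERROR
"""

# Pair each check name with the first result status that follows it.
CHECK_RE = re.compile(r"Check name:\s*(\S+).*?Result status:\s*(\w+)", re.DOTALL)
# Totals line printed at the end of a run.
SUMMARY_RE = re.compile(
    r"(\d+) CHECKS, (\d+) PASSED, (\d+) FAILED, (\d+) WARNING, (\d+) ERROR"
)


def parse_console_output(text):
    """Return (per-check statuses, totals) parsed from console output text."""
    statuses = dict(CHECK_RE.findall(text))
    totals = None
    match = SUMMARY_RE.search(text)
    if match:
        keys = ("checks", "passed", "failed", "warning", "error")
        totals = dict(zip(keys, map(int, match.groups())))
    return statuses, totals


statuses, totals = parse_console_output(SAMPLE)
print(statuses)
print(totals)
```

To apply it to a real run, read the console output file whose path is printed at the end of the run and pass its contents to `parse_console_output`.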

Status Definitions

After performing a check, you will see one of the following statuses:
  • PASS - A check passes if the actual result matches its expected result.
  • WARNING - A check produces the warning result if the actual result matches its expected result, but there is a potential issue.
  • FAIL - A check fails if the actual result does not match its expected result.
  • ERROR - A check produces the error result if the system or the check is unable to carry out actions required to perform the check.
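
When post-processing results, it can help to reduce many statuses to a single overall one. The sketch below models the four statuses above as a Python enum and picks the most severe. The severity ordering (PASS < WARNING < FAIL < ERROR) and the `worst_status` helper are assumptions made for illustration; the utility does not define such an ordering in this guide.

```python
from enum import IntEnum


class Status(IntEnum):
    """The four result statuses, ranked by an assumed severity order."""
    PASS = 0
    WARNING = 1
    FAIL = 2
    ERROR = 3


def worst_status(names):
    """Return the most severe status among the given status names.

    An empty input yields PASS (an arbitrary choice for this sketch).
    """
    return max((Status[name] for name in names), default=Status.PASS)


overall = worst_status(["ERROR", "ERROR", "PASS", "ERROR"])
print(overall.name)
```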
To run a specific check, see Run a Specific Set of Checks. To create a custom script to run specific checks, see Run a Customized Set of Checks.
