Correctness Checker Configuration Options
The table below lists the environment variables that configure MPI correctness checking. Look through them to understand their purpose; all of them are used in the examples in this tutorial.
Environment Variable | Value | Description |
---|---|---|
`VT_DEADLOCK_TIMEOUT <delay>` | `<delay>`: time threshold. Default: `1m`. Examples: `VT_DEADLOCK_TIMEOUT 1m`, `VT_DEADLOCK_TIMEOUT 10s` | If no process makes progress for this amount of time, Intel Trace Collector assumes that a deadlock has occurred, stops the application, and writes a trace file. NOTE: For interactive use, set this variable to a small value such as `10s` so that deadlocks are detected quickly instead of after a long timeout. |
`VT_DEADLOCK_WARNING <delay>` | `<delay>`: time threshold. Default: `5m`. Example: `VT_DEADLOCK_WARNING 5m` | Displays a GLOBAL:DEADLOCK:NO_PROGRESS warning if the time MPI processes have spent in their last MPI call exceeds the specified threshold. This warning indicates a load imbalance or a deadlock that cannot otherwise be detected. Such a deadlock may occur when at least one process polls for progress instead of blocking inside an MPI call. |
`VT_CHECK_TRACING <on \| off>` | `<on \| off>`. Default: `off` | When set to `on`, Intel Trace Collector records all events, including any MPI errors found during the run, and writes a trace file. |
`VT_CHECK_MAX_ERRORS <value>` | `<value>`: maximum number of errors. Default: `1` | Number of errors a process must reach before the application is aborted; `0` disables the limit. Some errors are fatal and always cause an abort. Errors are counted per process because a global counter would require communication among processes, and the drawbacks of that communication outweigh the advantage of a global count. |