Deadlocks
(GLOBAL:DEADLOCK)
Deadlocks are detected through a heuristic: the background thread in each process
cooperates with the MPI wrappers to detect that the process is stuck in a certain
MPI call. That alone is not an error because some other processes might still make
progress. Therefore the background threads communicate with each other once at least one process appears to be stuck. If all processes are stuck, this is treated as a deadlock. The timeout after which a process, and thus the application, is considered stuck is configurable with DEADLOCK-TIMEOUT. The timeout defaults to one minute, which should be long enough to ensure that even very long-running MPI operations are not incorrectly reported as stuck. In applications which are known to complete correct MPI calls much faster, it is advisable to decrease this timeout so that a deadlock is detected sooner.
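For example, assuming the usual Intel® Trace Collector conventions that configuration options can be placed in the file named by the VT_CONFIG environment variable or set directly as environment variables with a VT_ prefix, a shorter timeout might be requested roughly like this (the exact option and time syntax should be checked against the configuration reference; the values are purely illustrative):

    # hypothetical entry in the configuration file referenced by VT_CONFIG:
    # report a deadlock after 10 seconds instead of the default one minute
    DEADLOCK-TIMEOUT 10s

    # assumed equivalent environment variable form (VT_ prefix, dash
    # replaced by an underscore)
    export VT_DEADLOCK_TIMEOUT=10s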
This heuristic fails if the application uses non-blocking calls like MPI_Test() to poll for completion of an operation which can no longer complete. This case is covered by another heuristic: if the average time spent inside the last MPI call of each process exceeds the DEADLOCK-WARNING threshold, then a GLOBAL:DEADLOCK:NO_PROGRESS warning is printed, but the application is allowed to continue, because the same high average blocking time also occurs in correct applications with a high load imbalance. For the same reason the warning threshold is also higher than the hard deadlock timeout.
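A hedged sketch of such a polling loop, with made-up ranks and tag: rank 1 posts a receive that no process ever matches and keeps polling it with MPI_Test(), so it repeatedly leaves and re-enters MPI and is never stuck in a single call.

    #include <mpi.h>

    /* Illustration only: the receive with tag 42 is never matched, so the
     * request can never complete and the loop spins forever.  Because the
     * process is never blocked inside one MPI call, only the
     * GLOBAL:DEADLOCK:NO_PROGRESS warning can catch this situation.
     * Run with at least two processes. */
    int main(int argc, char **argv)
    {
        int rank, flag = 0, payload = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {
            MPI_Irecv(&payload, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, &req);
            while (!flag)
                MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }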
To help analyze the deadlock, Intel® Trace Collector prints the call stack of all processes. A real hard deadlock exists if there is a cycle of processes, each waiting for data from the previous process in the cycle. This data dependency can be an explicit MPI_Recv(), but also a collective operation like MPI_Reduce().

If messages are involved in the cycle, then it might help to replace the send or receive calls with their non-blocking variants. If a collective operation prevents one process from reaching a message send that another process is waiting for, then reordering the message send and the collective operation in the first process fixes the problem.
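A hedged sketch of that second situation, with made-up ranks and tag: rank 0 enters MPI_Reduce() before sending the message that rank 1 waits for, while rank 1 blocks in MPI_Recv() before joining the reduction, so neither process can make progress.

    #include <mpi.h>

    /* Illustration only: the root of the reduction cannot complete it
     * without rank 1's contribution, and rank 1 cannot contribute because
     * it is blocked waiting for a message that rank 0 has not sent yet.
     * Moving the MPI_Send() in rank 0 before the MPI_Reduce() breaks the
     * cycle. */
    int main(int argc, char **argv)
    {
        int rank, value = 1, sum = 0, msg = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        } else {
            MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }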
Another reason could be messages which were accidentally sent to the wrong process. In debuggers which support it, this can be checked by looking at the pending message queues. In the future Intel® Trace Collector might also support visualizing the program run in Intel® Trace Analyzer in case of an error. This would help to find messages which were not only sent to the wrong process, but also received by that process and thus do not show up in the pending message queue.
In addition to the real hard deadlock from which the application cannot recover, MPI applications might also contain potential deadlocks: the MPI standard does not guarantee that a blocking send returns unless the recipient calls a matching receive. In the simplest case of a head-to-head send between two processes, both enter a send and then the receive for the message that the peer just sent. This deadlocks unless the MPI implementation buffers the messages completely and returns from the sends without waiting for the corresponding receives.
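A minimal sketch of such a head-to-head exchange, with made-up tag and message size; whether it completes depends entirely on how much the MPI implementation buffers internally:

    #include <mpi.h>

    /* Illustration only: both ranks send first and receive afterwards.
     * This works only if the implementation buffers the outgoing message
     * and lets MPI_Send() return before the matching receive is posted.
     * Run with exactly two processes. */
    int main(int argc, char **argv)
    {
        int rank, peer, out = 0, in = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;

        MPI_Send(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }

A portable fix is to let one of the two processes receive first, or to use MPI_Sendrecv() for the exchange.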
Because this relies on undocumented behavior of MPI implementations, it is a hard-to-detect portability problem. Intel® Trace Collector detects these
GLOBAL:DEADLOCK:POTENTIAL
errors by turning
each normal send into a synchronous send. The MPI standard then guarantees that the
send blocks until the corresponding receive is at least started. Send requests are
also converted to their synchronous counterparts; they block in the call which waits
for completion. With these changes any potential deadlock automatically leads to a
real deadlock at runtime and will be handled as described above. To distinguish
between the two types, check whether any process is stuck in a send function. Because of this detection method, even the normally non-critical potential deadlocks do not allow the application to proceed.
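The effect of this conversion can also be reproduced by hand: replacing MPI_Send() with MPI_Ssend() in the head-to-head sketch above turns the potential deadlock into a guaranteed one on every implementation, because a synchronous send may not complete before the matching receive has been started. A minimal sketch, again assuming exactly two processes:

    #include <mpi.h>

    /* Illustration only: with synchronous sends the head-to-head exchange
     * always deadlocks, which is roughly the effect of Intel® Trace
     * Collector converting normal sends while checking for
     * GLOBAL:DEADLOCK:POTENTIAL errors. */
    int main(int argc, char **argv)
    {
        int rank, peer, out = 0, in = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;

        MPI_Ssend(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }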