Error Message: Bad Termination
NOTE:
The values in the tables below may not reflect the exact node or MPI process where a failure can occur.
Case 1
Error Message
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 27494 RUNNING AT node1
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
or:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 27494 RUNNING AT node1
= KILLED BY SIGNAL: 8 (Floating point exception)
===================================================================================
Cause
One of the MPI processes was terminated by a signal (for example, Segmentation fault or Floating point exception) on node1.
Solution
Find the reason for the MPI process termination. It can be an out-of-memory issue in the case of Segmentation fault, or a division by zero in the case of Floating point exception.
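When many job logs have to be triaged, the fields of the banner can be extracted programmatically before looking for the root cause. The following is a minimal Python sketch, assuming the banner layout shown in this section; `parse_bad_termination` is a hypothetical helper name and is not part of any MPI library.

```python
import re
import signal

# Matches the rank, PID, node, and signal fields of the banner.
# Assumption: the field order is the one shown in the error messages above.
BANNER_RE = re.compile(
    r"RANK\s+(?P<rank>\d+)\s+PID\s+(?P<pid>\d+)\s+RUNNING AT\s+(?P<node>\S+)"
    r".*?KILLED BY SIGNAL:\s+(?P<signum>\d+)\s+\((?P<desc>[^)]*)\)",
    re.DOTALL,
)


def parse_bad_termination(text):
    """Return a dict describing the first banner found in text, or None."""
    match = BANNER_RE.search(text)
    if match is None:
        return None
    signum = int(match.group("signum"))
    try:
        # Map the signal number to its symbolic name on the local platform.
        name = signal.Signals(signum).name
    except ValueError:
        name = "UNKNOWN"
    return {
        "rank": int(match.group("rank")),
        "pid": int(match.group("pid")),
        "node": match.group("node"),
        "signum": signum,
        "signal_name": name,
        "description": match.group("desc"),
    }
```

Calling `parse_bad_termination(log_text)` on the captured job output returns the rank, PID, node, and signal in one record, which makes it easier to tell which process to inspect first. Signal numbers are mapped with the numbering of the machine that runs the script, which may differ from the compute node on unusual platforms.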
Case 2
Error Message
================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 20066 RUNNING AT node01
= KILLED BY SIGNAL: 9 (Killed)
================================================================================
Cause
One of the MPI processes was terminated by a signal (for example, SIGTERM or SIGKILL) on node01 due to:
- a host reboot;
- an unexpected signal being received;
- out-of-memory (OOM) manager errors;
- termination by the process manager (if another process was terminated before the current process);
- job termination by the job scheduler (PBS Pro*, SLURM*) when a resource limit is exceeded (for example, a walltime or CPU time limit).
Solution
- Check the system log files.
- Find the reason for the MPI process termination and fix the issue.
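As one way to check the system log files for the out-of-memory case, the lines of a kernel log can be scanned for the PID reported in the banner. The following is a minimal Python sketch under stated assumptions: it looks for the phrase "Killed process <pid> (<name>)", which Linux kernels commonly print when the OOM killer acts, but the exact wording varies between kernel versions. `find_oom_kills` is a hypothetical helper name, and how the log lines are obtained (for example, from `dmesg` output or a file under `/var/log`) depends on the system.

```python
import re

# Assumption: the kernel log reports OOM kills as "Killed process <pid> (<name>)".
# Adjust the pattern if the local kernel uses different wording.
OOM_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)")


def find_oom_kills(log_lines, pid=None):
    """Return (pid, command, line) tuples for OOM-kill entries.

    If pid is given, only entries for that process ID are returned.
    """
    hits = []
    for line in log_lines:
        match = OOM_RE.search(line)
        if match is None:
            continue
        killed_pid = int(match.group("pid"))
        if pid is None or killed_pid == pid:
            hits.append((killed_pid, match.group("comm"), line.rstrip("\n")))
    return hits
```

For the banner in this case, `find_oom_kills(lines, pid=20066)` would return any matching entries for the terminated process. An empty result does not rule out the other causes listed above, such as a scheduler limit or a host reboot.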