Error Message: Fatal Error
Case 1
Error Message
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(653)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(698): OFI addrinfo() failed (ofi_init.h:698:MPIDI_NM_mpi_init_hook:No data available)
Cause
The current provider cannot run on these nodes. This happens when the MPI application runs over the psm2 provider on a node without an Intel® Omni-Path Architecture card, or over the verbs provider on a node without an InfiniBand*, iWARP, or RoCE card.
Solution
- Change the provider or run the MPI application on the right nodes. Use fi_info to get information about the current provider.
- Check that the required services are running on the nodes (opafm for Intel® Omni-Path Architecture and opensmd for InfiniBand*).
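The two checks above can be sketched as a small script to run on each affected node. It assumes that fi_info is in PATH and that opafm and opensmd are managed as systemd units under those names. Neither assumption is confirmed by this page, so adjust the script to your installation. The helper names list_providers and service_state are illustrative only.

```shell
#!/bin/sh
# Sketch: list the libfabric providers visible on this node and report the
# state of the fabric services named in the text.

# Print the providers fi_info knows about, or a note if fi_info is missing.
list_providers() {
    if command -v fi_info >/dev/null 2>&1; then
        fi_info -l
    else
        echo "fi_info not found in PATH"
    fi
}

# Print "<service>: <state>" for one unit (assumption: systemd is in use).
service_state() {
    if command -v systemctl >/dev/null 2>&1; then
        state=$(systemctl is-active "$1" 2>/dev/null)
        echo "$1: ${state:-unknown}"
    else
        echo "$1: systemctl not available"
    fi
}

list_providers
service_state opafm     # Intel Omni-Path Architecture fabric manager
service_state opensmd   # InfiniBand subnet manager
```

If the provider you intend to use is missing from the list, or the matching service is not reported as active, the node is probably not suitable for that provider.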
Case 2
Error Message
Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
…
MPIDI_OFI_send_handler(704)............: OFI tagged inject failed (ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)
Cause
The OFI transport uses an IP interface that has no access to the remote ranks.
Solution
Set the interface variable for your provider to the name of an IP interface that can reach the remote ranks: FI_SOCKET_IFACE if the socket provider is used, or FI_TCP_IFACE or FI_VERBS_IFACE for the TCP and verbs providers, respectively. To retrieve the list of configured and active IP interfaces, use the ifconfig utility.
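As a minimal sketch of that step, the script below maps a provider name to the variable named above and exports it. The helper iface_var_for_provider, the provider choice tcp, and the interface name eth0 are hypothetical placeholders. Take the real interface name from the ifconfig output on your nodes.

```shell
#!/bin/sh
# Map a libfabric provider name to the interface variable named in the text.
iface_var_for_provider() {
    case "$1" in
        sockets) echo FI_SOCKET_IFACE ;;
        tcp)     echo FI_TCP_IFACE ;;
        verbs)   echo FI_VERBS_IFACE ;;
        *)       echo "unknown provider: $1" >&2; return 1 ;;
    esac
}

# Hypothetical values: replace them with your provider and with an interface
# that can reach the remote ranks.
provider=tcp
iface=eth0

var=$(iface_var_for_provider "$provider") || exit 1
export "$var=$iface"
echo "$var=$iface"
```

The exported variable is inherited by any mpirun command started from the same shell afterwards.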
Case 3
Error Message
Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
…
MPIDI_OFI_send_handler(704)............: OFI tagged inject failed (ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)
Cause
Ethernet is used as the interconnection network.
Solution
Run FI_PROVIDER=sockets mpirun … to work around this problem.
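One way to apply this setting to a single launch without changing the rest of the shell session is a small wrapper. This is a sketch only: the function name run_over_sockets is hypothetical, and a stand-in command is used in place of a real mpirun command line so that the effect is visible.

```shell
#!/bin/sh
# Run any command with FI_PROVIDER=sockets set for that command only.
# There must be no spaces around "=": with spaces, the shell would try to
# execute FI_PROVIDER as a command.
run_over_sockets() {
    env FI_PROVIDER=sockets "$@"
}

# Demonstration with a stand-in command instead of a real MPI launch:
run_over_sockets sh -c 'echo "FI_PROVIDER=$FI_PROVIDER"'
```

In practice, pass your own launch command as the arguments, for example run_over_sockets mpirun … with your usual options.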