Fortran Coarray Application Hang After Executing FAIL IMAGE

ID 659458
Updated 12/16/2019
Version Latest
Public

Affected Product  Intel® Fortran Compiler 19.1

Problem Description
The FAIL IMAGE statement, used with coarrays, was introduced in the Intel Fortran Compiler 19.1. FAIL IMAGE lets you debug recovery code for failed images without having to wait for an actual image failure.
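
As a minimal sketch of how FAIL IMAGE might be used to exercise recovery code, the program below has one image simulate a failure while the others synchronize with STAT= and check for STAT_FAILED_IMAGE. The program name, the choice of the last image as the failing one, and the variable sync_stat are illustrative assumptions, not part of the original note. With the affected compiler version, a program like this may itself hang at termination because of the issue described in this article.

```fortran
program fail_image_demo
  use, intrinsic :: iso_fortran_env, only: stat_failed_image
  implicit none
  integer :: sync_stat

  ! Hypothetical choice: the last image simulates a failure.
  if (num_images() > 1 .and. this_image() == num_images()) then
    fail image
  end if

  ! Only surviving images reach this point. STAT= lets them continue
  ! instead of error-terminating when a failed image is detected.
  sync all (stat=sync_stat)
  if (sync_stat == stat_failed_image) then
    print '(a,i0,a)', 'image ', this_image(), ': a failed image was detected'
    ! Recovery code under test would go here.
  end if
end program fail_image_demo
```

This assumes the program is built with coarray support enabled (for the Intel compiler on Linux, the -coarray option).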

The current implementation has a known issue: after an image executes the FAIL IMAGE statement, the application may hang when the remaining images do any of the following:

  1. Terminate the application
  2. Create a coarray (declaring a non-allocatable coarray or allocating an allocatable coarray)
  3. Free a coarray (by deallocation or at termination)
  4. Perform a broadcast or reduce operation, that is, CO_BROADCAST, CO_MAX, CO_MIN, CO_REDUCE, or CO_SUM
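
To make the list above concrete, the following sketch marks where each kind of action appears in ordinary coarray code. It contains no FAIL IMAGE statement, so it only shows the statements that could hang if another image had failed earlier. The program name, array size, and variable names are hypothetical.

```fortran
program hang_prone_actions
  implicit none
  real, allocatable :: work(:)[:]   ! allocatable coarray
  real :: total

  allocate (work(100)[*])           ! item 2: creates a coarray
  work = real(this_image())
  total = sum(work)
  call co_sum(total)                ! item 4: reduce operation across images
  deallocate (work)                 ! item 3: frees a coarray
end program hang_prone_actions      ! item 1: application termination
```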

Workaround
There is no certain way to prevent a hang when an image fails. However, if you structure your program so that synchronization points are infrequent, the chance of a failure happening just before a synchronization point is lower. If images frequently do coarray loads and stores or check image status, they are more likely to discover a failed image sooner. The FAILED_IMAGES intrinsic checks for failed images, but other images might not get the same result from that call.
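
The status checks mentioned above could look like the following sketch, which uses the Fortran 2018 intrinsics FAILED_IMAGES and IMAGE_STATUS. The program and variable names are assumptions for illustration. As noted, each image may see a different result, so this is a local check rather than a guarantee, and it does not by itself prevent the hang.

```fortran
program check_failed_images
  use, intrinsic :: iso_fortran_env, only: stat_failed_image
  implicit none
  integer, allocatable :: failed(:)
  integer :: i

  ! Ask for the list of images this image currently knows to have failed.
  failed = failed_images()
  if (size(failed) > 0) then
    print '(a,i0,a,*(1x,i0))', 'image ', this_image(), &
          ' sees failed images:', failed
  end if

  ! Alternatively, query the status of each image individually.
  do i = 1, num_images()
    if (image_status(i) == stat_failed_image) then
      print '(a,i0,a,i0,a)', 'image ', this_image(), &
            ': image ', i, ' has failed'
    end if
  end do
end program check_failed_images
```

A program might run a check like this before work that leads up to a synchronization point, so that surviving images can switch to recovery code as early as possible.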