FAILED_IMAGES
Transformational Intrinsic Function (Generic): Returns an array containing the image indices of the images that are known to have failed.
result = FAILED_IMAGES ([team, kind])
| team | (Input; optional) Must be a scalar of type TEAM_TYPE defined in the intrinsic module ISO_FORTRAN_ENV whose value represents the current or an ancestor team. If not present, the current team is assumed. |
| kind | (Input; optional) Must be a scalar integer expression with a value that is a valid INTEGER kind type parameter. |
The result is a rank-one integer array. Its kind type parameter is the value of kind if kind is present; otherwise, it is default integer kind. The size of the array is equal to the number of images in the specified team that are known to have failed.
The elements of the result array are the image index values of the images in the specified team that are known to have failed. The indices are arranged in increasing numeric order.
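The following is a minimal sketch of the three calling forms described above: no arguments, with kind, and with team. The program and variable names (failed_images_basic, failed, failed64, current) are illustrative only, and the sketch assumes a compiler with coarray support enabled.

```fortran
program failed_images_basic
  use, intrinsic :: iso_fortran_env, only: int64, team_type
  implicit none
  integer, allocatable        :: failed(:)    ! default integer kind
  integer(int64), allocatable :: failed64(:)  ! kind chosen with KIND=
  type(team_type)             :: current

  ! No arguments: current team, default integer result.
  failed = failed_images()

  ! KIND= selects the integer kind of the result array.
  failed64 = failed_images(kind=int64)

  ! TEAM= names the team explicitly; GET_TEAM() returns the current team.
  current = get_team()
  failed = failed_images(team=current)

  if (this_image() == 1) then
    print '(a,i0)', 'Images known to have failed: ', size(failed)
    if (size(failed) > 0) print '(a,*(1x,i0))', 'Failed image indices:', failed
  end if
end program failed_images_basic
```

Because the result is an allocatable-friendly rank-one array, assigning it to an allocatable variable sizes that variable automatically, including to size zero when no failures are known.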
If the image executing the FAILED_IMAGES reference previously executed a collective subroutine whose STAT argument was assigned the value STAT_FAILED_IMAGE, defined in the intrinsic module ISO_FORTRAN_ENV, or an image control statement whose STAT= specifier was assigned that value, then at least one image in the team that executed the collective subroutine or image control statement is known to have failed.
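As an illustration of the rule above, the sketch below checks the STAT= specifier of a SYNC ALL statement and calls FAILED_IMAGES only when the status is STAT_FAILED_IMAGE. The program and variable names (sync_with_stat, sync_stat, failed) are hypothetical, and the handling shown is only one possible reaction to a failure.

```fortran
program sync_with_stat
  use, intrinsic :: iso_fortran_env, only: stat_failed_image
  implicit none
  integer :: sync_stat
  integer, allocatable :: failed(:)

  ! With STAT= present, a detected failure is reported through sync_stat
  ! rather than causing error termination of this image.
  sync all (stat=sync_stat)

  if (sync_stat == stat_failed_image) then
    ! At least one image in the current team is now known to have failed.
    failed = failed_images()
    print '(a,i0,a,*(1x,i0))', 'Image ', this_image(), &
          ' knows of failed images:', failed
  else if (sync_stat /= 0) then
    print '(a,i0,a,i0)', 'Image ', this_image(), &
          ': SYNC ALL returned STAT=', sync_stat
  end if
end program sync_with_stat
```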
Failed images may lead to unavoidable hangs
Coarray programs are parallel programs, so between synchronization points the relative ordering of events in different images is unknown and undefined. The failure of an image or the execution of the FAIL IMAGE statement is such an event. It happens at a definite time in the image that fails, but the other images will discover the failure at different points in their execution. Also, because a failed image does not participate in synchronization points, it is possible for the discovery to happen before a synchronization point in one image and after it in another.
This means that when the other images synchronize (for example, with a SYNC ALL statement), some of them may already know that an image has failed while others do not. In that case, the images that do not know will attempt to synchronize with the failed image, and the application will hang, making no progress.
There is no guaranteed way to prevent a hang when an image fails. However, if you structure your program so that synchronization points are infrequent, the chance of a failure happening just before a synchronization point is lower. Images that frequently perform coarray loads and stores, or that check image status, are likely to discover a failed image sooner. The FAILED_IMAGES intrinsic checks for failed images, but other images might not get the same result from the same call.
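One way to check for failures between chunks of work is sketched below. It is only an illustration under stated assumptions: the program name, the three-step loop, and the known_failed counter are hypothetical, and polling like this does not remove the possibility of a hang at a later synchronization point. Different images may print different lists at the same step.

```fortran
program poll_for_failures
  implicit none
  integer :: step, known_failed
  integer, allocatable :: failed(:)

  known_failed = 0
  do step = 1, 3
    ! ... a chunk of local work would go here (placeholder) ...

    ! Ask which images this image currently knows to have failed.
    failed = failed_images()

    ! Report only when the list has grown since the previous check.
    if (size(failed) > known_failed) then
      known_failed = size(failed)
      print '(a,i0,a,i0,a,*(1x,i0))', 'Image ', this_image(), &
            ', step ', step, ': failed images so far:', failed
    end if
  end do
end program poll_for_failures
```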
If images 5 and 12 of the current team are known to have failed, the result of FAILED_IMAGES ( ) is a default integer array of size 2 with the value [5, 12]. If no images in the current team are known to have failed, the result of FAILED_IMAGES ( ) is a zero-sized array.