• 04/11/2022

Modifying the Program to Use Coarrays

Coarrays are used to split the trials across multiple copies of the program, which are called images. Each image has its own local variables, plus its own portion of every coarray, which the other images can also access. A coarray can be a scalar or an array. It can be thought of as having extra dimensions, referred to as codimensions. To declare a coarray, either add the codimension attribute or specify the cobounds alongside the variable name. The cobounds are always enclosed in square brackets. Some examples:
real, dimension(100), codimension[*] :: A
integer :: B[3,*]
When specifying cobounds in a declaration, the last cobound must be an asterisk. This indicates that it depends on the number of images in the application. According to the Fortran standard, you can have up to 15 cobounds (a corank of 15), but the sum of the number of cobounds and array bounds must not exceed 31. As with array bounds, it is possible to have a lower cobound that is not 1, though this is not common.
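To see how cobounds map images onto cosubscripts, here is a minimal sketch that reuses the two declarations above and adds a hypothetical coarray C with a lower cobound of 0. The names C, cb, and cc are illustrative only and are not part of the tutorial program. Called with a coarray argument, THIS_IMAGE returns the cosubscripts of the invoking image for that coarray.

```fortran
program cobounds_demo
    implicit none
    real, dimension(100), codimension[*] :: A   ! array coarray, corank 1
    integer :: B[3,*]                           ! scalar coarray, corank 2
    integer :: C[0:*]                           ! hypothetical: lower cobound of 0
    integer :: cb(2), cc(1)

    ! Each image touches only its own local parts here
    A = 0.0
    B = 0
    C = 0

    ! THIS_IMAGE(coarray) returns one cosubscript per codimension
    cb = this_image(B)
    cc = this_image(C)

    print '(4(A,I0))', "Image ", this_image(), ": B cosubscripts = [", cb(1), &
        ",", cb(2), "], C cosubscript = ", cc(1)
end program cobounds_demo
```

Because the last cobound is an asterisk, the same source works for any number of images; only the range of the final cosubscript changes.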
Since the work is split across the images, a coarray is needed to keep track of each image's subtotal of points within the circle. At the end, the subtotals are added to create a grand total, which is then divided by the number of trials just as in the sequential version. The variable total is reused, but it becomes a coarray. Delete the existing declaration of total and insert the following into the declaration section of the program:
! Declare scalar coarray that will exist on each image
integer(K_BIGINT) :: total[*] ! Per-image subtotal
The important aspect of coarrays is that there is a local part that resides on an individual image, but you can access the part on other images. To read the value of total on image 3, use the syntax total[3]. To reference the local copy, the coindex in brackets is omitted. For best performance, minimize touching the storage of other images.
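The difference between local and coindexed access can be illustrated with a small standalone sketch. It is not part of the tutorial program: the names subtotal and grand_total and the values stored are assumptions chosen for illustration.

```fortran
program coindex_demo
    implicit none
    integer :: subtotal[*]      ! scalar coarray: one copy per image
    integer :: i, grand_total

    ! No coindex: each image writes only its own local copy
    subtotal = 10 * this_image()

    ! Make sure every image has stored its value before any image reads it
    sync all

    if (this_image() == 1) then
        grand_total = subtotal                        ! local copy on image 1
        do i = 2, num_images()
            grand_total = grand_total + subtotal[i]   ! remote read from image i
        end do
        print '(A,I0,A,I0)', "Sum over ", num_images(), " images: ", grand_total
    end if
end program coindex_demo
```

Only the loop on image 1 touches remote storage, and it does so once per image, which keeps cross-image traffic small. With Intel Fortran, coarray support is enabled with the -coarray compiler option (/Qcoarray on Windows).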
In a coarray application, each image has its own set of I/O units. The standard input is preconnected only on image 1. The standard output is preconnected on all images. The standard encourages implementations to merge the output from all images, but the order in which lines from different images appear is unpredictable. Intel® Fortran supports this merging.
It is typical to have image 1 do any setup and terminal I/O. Change the initial display to show how many images are doing the work, and verify that the number of trials is evenly divisible by the number of images (by default, the number of images is the number of cores times threads-per-core). Image 1 does all the timing.
Open the file
and save it as
Replace:
print '(A,I0,A)', "Computing pi using ",num_trials," trials sequentially"
! Start timing
call SYSTEM_CLOCK(clock_start)
With:
! Image 1 initialization
if (THIS_IMAGE() == 1) then
    ! Make sure that num_trials is divisible by the number of images
    if (MOD(num_trials,INT(NUM_IMAGES(),K_BIGINT)) /= 0_K_BIGINT) &
        error stop "num_trials not evenly divisible by number of images!"
    print '(A,I0,A,I0,A)', "Computing pi using ",num_trials," trials across ",NUM_IMAGES()," images"
    call SYSTEM_CLOCK(clock_start)
end if
The new code does the following:
  1. Makes the test using the intrinsic function THIS_IMAGE(). When it is called without arguments, it returns the index of the invoking image. The code should execute only on image 1.
  2. Ensures that the number of trials is evenly divisible by the number of images. The intrinsic function NUM_IMAGES() returns the number of images. If the check fails, the program terminates with error stop, which is similar to stop except that it forces all images in a coarray application to exit.
  3. Prints the number of trials and the number of images.
  4. Starts the timing.
Images other than 1 skip this code and proceed to what comes next. In more complex applications you might want other images to wait until the initialization is done. When that is desired, insert a sync all statement. The execution does not continue until all images have reached that statement.
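As a minimal sketch of that wait-for-initialization pattern, the following standalone program lets image 1 set a value and has every image read it only after sync all. The coarray name chunk_size and the value 1000 are hypothetical and do not appear in the tutorial program.

```fortran
program init_then_sync
    implicit none
    integer :: chunk_size[*]    ! hypothetical setting chosen by image 1

    chunk_size = 0
    if (this_image() == 1) then
        ! Image 1 performs the setup; 1000 is an arbitrary illustrative value
        chunk_size = 1000
    end if

    ! No image passes this point until all images have reached it,
    ! so image 1's assignment is complete before any image reads it
    sync all

    ! Every image can now safely fetch the value from image 1
    print '(A,I0,A,I0)', "Image ", this_image(), " sees chunk_size = ", chunk_size[1]
end program init_then_sync
```

Without the sync all, an image other than 1 could read chunk_size[1] before image 1 had assigned it. Because every image prints, the output lines may appear in any order.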
The initialization of total does not need to be changed, because each image initializes its own local copy.
The main compute loop needs to be changed to split the work. Replace:
do bigi=1_K_BIGINT,num_trials
With:
do bigi=1_K_BIGINT,num_trials/int(NUM_IMAGES(),K_BIGINT)
After the do loop, insert:
! Wait for everyone
sync all
Sum the image-specific totals, compute pi, and display the result. Again, this is done only on image 1. Replace:
! total/num_trials is an approximation of pi/4
computed_pi = 4.0_K_DOUBLE*(REAL(total,K_DOUBLE)/REAL(num_trials,K_DOUBLE))
print '(A,G0.8,A,G0.3)', "Computed value of pi is ", computed_pi, &
    ", Relative Error: ",ABS((computed_pi-actual_pi)/actual_pi)
! Show elapsed time
call SYSTEM_CLOCK(clock_end,clock_rate)
print '(A,G0.3,A)', "Elapsed time is ", &
    REAL(clock_end-clock_start)/REAL(clock_rate)," seconds"
With:
! Image 1 end processing
if (this_image() == 1) then
    ! Sum all of the images' subtotals
    do i=2,num_images()
        total = total + total[i]
    end do
    ! total/num_trials is an approximation of pi/4
    computed_pi = 4.0_K_DOUBLE*(REAL(total,K_DOUBLE)/REAL(num_trials,K_DOUBLE))
    print '(A,G0.8,A,G0.3)', "Computed value of pi is ", computed_pi, &
        ", Relative Error: ",ABS((computed_pi-actual_pi)/actual_pi)
    ! Show elapsed time
    call SYSTEM_CLOCK(clock_end,clock_rate)
    print '(A,G0.3,A)', "Elapsed time is ", &
        REAL(clock_end-clock_start)/REAL(clock_rate)," seconds"
end if
Note the following about the new code:
  1. It executes only on image 1.
  2. The variable total (without a coindex) already has the count from image 1, so the loop adds in the values from the other images. Note the [i] coindex.
  3. The rest of the code is the same as in the sequential version.
All of the images exit.
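Putting the pieces together, the modified program might look roughly like the sketch below. This section does not show the declarations or the body of the sampling loop from the sequential version, so the kind parameters K_BIGINT and K_DOUBLE, the value of num_trials, the variables x and y, and the use of the Fortran 2018 intrinsic RANDOM_INIT for per-image seeding are all assumptions made to keep the example self-contained. Treat it as an illustration of the structure, not as the tutorial's reference source.

```fortran
program mcpi_coarray_sketch
    implicit none
    ! Assumed definitions; the tutorial's actual values are not shown in this section
    integer, parameter :: K_BIGINT = selected_int_kind(15)
    integer, parameter :: K_DOUBLE = selected_real_kind(15, 300)
    integer(K_BIGINT), parameter :: num_trials = 60000000_K_BIGINT
    real(K_DOUBLE), parameter :: actual_pi = 3.141592653589793_K_DOUBLE

    integer(K_BIGINT) :: total[*]       ! per-image subtotal
    integer(K_BIGINT) :: bigi
    integer(K_BIGINT) :: clock_start, clock_end, clock_rate
    integer :: i
    real(K_DOUBLE) :: x, y, computed_pi

    ! Image 1 initialization
    if (this_image() == 1) then
        if (mod(num_trials, int(num_images(), K_BIGINT)) /= 0_K_BIGINT) &
            error stop "num_trials not evenly divisible by number of images!"
        print '(A,I0,A,I0,A)', "Computing pi using ", num_trials, &
            " trials across ", num_images(), " images"
        call system_clock(clock_start)
    end if

    ! Assumption: give each image a different random sequence (Fortran 2018)
    call random_init(repeatable=.false., image_distinct=.true.)

    ! Each image runs its share of the trials and counts hits in its local total
    total = 0_K_BIGINT
    do bigi = 1_K_BIGINT, num_trials / int(num_images(), K_BIGINT)
        call random_number(x)
        call random_number(y)
        if (x*x + y*y <= 1.0_K_DOUBLE) total = total + 1_K_BIGINT
    end do

    ! Wait for everyone
    sync all

    ! Image 1 end processing
    if (this_image() == 1) then
        ! Sum all of the images' subtotals
        do i = 2, num_images()
            total = total + total[i]
        end do
        ! total/num_trials is an approximation of pi/4
        computed_pi = 4.0_K_DOUBLE * (real(total, K_DOUBLE) / real(num_trials, K_DOUBLE))
        print '(A,G0.8,A,G0.3)', "Computed value of pi is ", computed_pi, &
            ", Relative Error: ", abs((computed_pi - actual_pi) / actual_pi)
        ! Show elapsed time
        call system_clock(clock_end, clock_rate)
        print '(A,G0.3,A)', "Elapsed time is ", &
            real(clock_end - clock_start) / real(clock_rate), " seconds"
    end if
end program mcpi_coarray_sketch
```

The sync all before the final block matters: without it, image 1 could read total[i] from an image that has not finished its loop.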
