This is an essential guide to using the Coarray Fortran (CAF) feature of the Intel® Fortran Compiler.
The shared memory, single-node version of Coarray Fortran is available in any edition of Intel® Parallel Studio XE 2015 or newer, including the Intel® Fortran Compiler Classic in the oneAPI HPC Toolkit. The distributed memory implementation of CAF is available on Linux only with Intel® Parallel Studio XE 2015 Cluster Edition for Linux or newer, or with the oneAPI HPC Toolkit for Linux.
CAF is also implemented for Windows using Intel Parallel Studio XE Cluster Edition for Windows or the oneAPI HPC Toolkit for Windows.
The CAF feature is currently not available under macOS.
Configuration Set Up
To run a distributed memory Coarray Fortran application, you must have an established Linux cluster and an installation of Intel's implementation of MPI (Message Passing Interface). Intel's Coarray Fortran relies on Intel MPI for cluster communication.
New to Intel MPI? Start with the Intel MPI Library getting-started documentation, particularly the Prerequisites section. When you have run the setup scripts, especially sshconnectivity.exp, make sure you can run a simple MPI 'hello world' program on your cluster across multiple nodes. Attached to this article is a small Fortran MPI version of a 'hello world' program.
Successful configuration and running of MPI jobs under Intel MPI is a prerequisite to using the Intel CAF feature in distributed memory mode.
When the 'hello world' program has run successfully, here are additional steps for CAF.
1) Set up a machine file
If your cluster hosts are fixed and you do not run under a batch system like PBS or Slurm, set up a hosts file. In your home directory, create a file with the hostnames of your cluster, one host per line and, optionally, with the number of processors on each host, or the number of CAF images to place on each node. Something like this:
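(The hostnames below are illustrative; substitute the names of your own nodes.)

```
node01:4
node02:4
node03:4
node04:4
```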
The syntax is <hostname>[:<number of CAF images>]. This example states to run 4 MPI/CAF processes on each node.
You may use any name for the file; by convention, "machinefile" or "hostfile" are probably easiest to remember and maintain. If you are running under a batch system where the hosts are assigned dynamically, see the Intel MPI Library Developer Guide for details on host selection.
2) Source the setup scripts
Source the Intel MPI and Intel Fortran compiler scripts to set up the paths to Intel MPI and Intel Fortran in your environment. These scripts must also be sourced by any child processes. It is recommended to run the following source commands and/or add them to the .bashrc or .cshrc file in your home directory:
source <path to Intel MPI installation>/[ia32 | intel64]/bin/mpivars.sh
source <path to Intel Fortran installation>/bin/compilervars.sh [ia32 | intel64]
where you choose between 32 and 64 bit environments with ia32 or intel64, respectively.
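For example, the lines added to ~/.bashrc might look like this (the installation paths below are illustrative; adjust them to match your own installation):

```
source /opt/intel/impi/latest/intel64/bin/mpivars.sh
source /opt/intel/compilers/latest/bin/compilervars.sh intel64
```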
3) Set up a Coarray Fortran (CAF) configuration file
When you run a distributed memory Coarray Fortran program, the application first invokes a job launcher. This launcher uses the Intel MPI mpiexec.hydra command to start the job on the hosts in the cluster, and it first reads the CAF configuration file to pick up arguments to pass to mpiexec.hydra. Thus, the CAF configuration file is nothing more than a list of arguments for the Intel MPI mpiexec.hydra command. An example CAF configuration file may contain:
-genvall -genv I_MPI_FABRICS=shm:tcp -machinefile ./hostsfile -n 8 ./my_caf_prog
In this example,
- -genvall tells the launcher that the child CAF images will inherit environment variable settings from the current shell
- -genv I_MPI_FABRICS=shm:tcp selects the fabric(s) to use in priority order: use shared memory within a node, TCP for remote. See the Intel MPI Library Developer Reference for setting fabrics. Available network fabrics can vary with the Intel MPI version.
- -machinefile ./hostsfile says to find the list of cluster nodes on which to run. For batch based systems see the Intel MPI Library Developer Guide.
- -n 8 launches 8 CAF images
- ./my_caf_prog is the name of the Coarray Fortran program
There are many commands and configurations possible in the configuration file. See the documentation on mpiexec.hydra for a complete list of possible control options. Some other useful options to consider:
- -rr round-robin distribution of images to nodes. DEFAULT. Round-robin is one way to avoid using hyperthreaded cores. With the -rr option to mpiexec.hydra, image 1 is assigned to host1 from your machine file or PBS_NODEFILE, image 2 to host2, and so on up to image N on hostN, at which point the allocation cycles back: image N+1 goes to host1, and so on.
- -perhost N distribute images to hosts in groups of N. OPTIONAL. This is another way to avoid hyperthreaded cores: set N to the number of real cores on each host. Images 1..N are allocated on host1, images N+1..2N on host2, etc.
Building the Application
You are now ready to compile your Coarray Fortran application. Create or use an existing Coarray Fortran application. A sample Coarray Fortran 'hello world' application is included in the <compiler install dir>/Samples/en_US/Fortran/coarray_samples/ directory. The essential compiler arguments to use for distributed memory coarray applications are:
ifort -coarray=distributed -coarray-config-file=<CAF config filename> ./my_caf_prog.f90
- -coarray=distributed is necessary to create a distributed memory CAF application. This option is only available on systems with a valid Intel Cluster Edition or Cluster Studio license. Without this license you cannot create distributed memory Coarray Fortran applications. You can, however, create and use shared memory CAF applications with any existing Intel Composer XE for Linux or Windows license.
- -coarray-config-file=<CAF configuration file> tells the CAF job launcher where to find the configuration file with runtime arguments for mpiexec.hydra. This file need not exist at the time of compilation; it is ONLY read at job launch. Thus, it can be changed between job runs to change the number of images, along with any other valid control option to mpiexec.hydra. This gives the programmer a way to change the number of images and other parameters without having to recompile the application. A reasonable name for the file may be ~/cafconfig.txt, but the name and location of the file are up to the user to decide.
- The executable name is hard-coded in the CAF config file, so be sure that the executable name in the config file matches the name you used with the 'ifort -o <name>' option. Also, be sure to use either the full pathname to the executable OR the current directory "dot" name, such as './a.out' or './mycafprogram.exe'.
- -coarray-num-images=N compiler option is ignored for -coarray=distributed. This option is only used by shared memory Coarray Fortran applications. The number of images for distributed memory CAF applications is ONLY controlled at job launch by the '-n N' option in the CAF config file.
Of course, you can include any other compiler options including all optimization options.
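For reference, a minimal coarray 'hello world' along the lines of the bundled sample might look like the following. This is only a sketch; the sample shipped with the compiler may differ. It uses the standard Fortran 2008 intrinsics this_image() and num_images():

```fortran
program caf_hello
  implicit none
  ! Each image runs this program independently and reports its own
  ! image index along with the total number of images launched.
  write(*,'(a,i0,a,i0)') 'Hello from image ', this_image(), &
                         ' of ', num_images()
end program caf_hello
```

Compiled with -coarray=distributed, every image started by mpiexec.hydra executes this program, so you should see one line of output per image.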
Running the Application
After compiling the program, simply execute it, for example ./my_caf_prog.
That's it! The CAF executable locates your CAF config file, which passes arguments to the mpiexec.hydra command to start your distributed CAF program. Host information is pulled from the machine file named in the config file.
Need to change the number of images launched or the arguments to mpiexec.hydra? Simply change the settings in the CAF config file. Remember, the -coarray-config-file= option used at compile time sets the name and location for this file. You should use a name and location you can remember, such as -coarray-config-file=~/cafconfig.txt
Then just add mpiexec.hydra options to ~/cafconfig.txt, for example,
-perhost 2 -envall -n 64 ./a.out
Note: The environment variable FORT_COARRAY_NUM_IMAGES has no effect on distributed memory CAF applications. This environment variable is only honored by a shared memory CAF image. Only the -n option in the CAF config file is used to control the number of CAF images for a distributed memory CAF application.
Again, read the mpiexec.hydra and I_MPI_FABRICS documentation in the Intel MPI Library Developer Reference.
Known Issues or Limitations
Many clusters have multiple MPI implementations installed alongside Intel MPI. The PATH and LD_LIBRARY_PATH environment variables must have Intel MPI paths BEFORE any other MPI installed on your system. Make sure to ONLY source the mpivars.sh script to set this correctly, OR ensure that the correct Intel MPI paths appear before other MPI paths.
Batch system notes: In the notes above, we added the option '-envall' to the CAF config file. This is an attempt to get your current working environment variables inherited by your spawned remote CAF processes. This was done to help ensure that your PATH and LD_LIBRARY_PATH contain the paths to Intel MPI and Intel Fortran AND that those paths appear before other MPI installations and compilers on your system. HOWEVER, some batch scheduling systems will not allow environment inheritance. In other words, they will discard your current environment variables and use defaults instead. That is why we suggested adding
source <path to intel MPI>/[ia32 | intel64]/bin/mpivars.sh
to your .bashrc, .cshrc, or .bash_profile. These dot files are invoked by each child process created and hence SHOULD set the PATH and LD_LIBRARY_PATH appropriately. When in doubt, execute 'which mpiexec.hydra' interactively, or put 'echo `which mpiexec.hydra`' in your batch script to ensure the Intel MPI mpiexec.hydra is being used. The 'mpiexec' commands from other MPI implementations cannot be used and will cause errors.
It is critical to ensure that you can execute an Intel MPI application BEFORE attempting to run an Intel CAF program.
READ the Intel MPI Release Notes and the Getting_Started.pdf document that come with Intel MPI in the <installdir>/doc/ directory.
Our User Forums are great places to see current issues and to post questions:
Intel Fortran User Forum
oneAPI HPC Toolkit Forum (MPI)
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.