Collect MPI Performance/Correctness Data
To collect performance or correctness data for an MPI application with the
Intel® VTune™ Profiler / Intel Inspector on a Windows* or Linux* OS, use the
following command:
$ mpirun -n <N> <abbr>-cl -r my_result -collect <analysis type> my_app [my_app_options]
where <abbr> is amplxe or inspxe, respectively. To view the list of available
analysis types, run the amplxe-cl -help collect command.
The collection commands create a number of result directories in the current
directory, one for each analyzed process in the job, named my_result.0 -
my_result.3 for a four-process job. The numeric suffix is the MPI process rank,
which the collector detects and captures automatically. The suffix ensures that
multiple amplxe-cl / inspxe-cl instances launched in the same directory on
different nodes can work in parallel without overwriting each other's data.
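The naming rule above can be sketched as a small shell helper. The function name expected_result_dirs is hypothetical and is not part of either tool; it only prints the directory names one would expect to find after a collection, assuming one result directory per rank.

```shell
# Hypothetical helper (not part of amplxe-cl / inspxe-cl): print the result
# directory names expected for a job of <nranks> ranks launched with -r <base>.
expected_result_dirs() {
    base=$1
    nranks=$2
    rank=0
    while [ "$rank" -lt "$nranks" ]; do
        # One directory per MPI rank: <base>.<rank>
        printf '%s.%s\n' "$base" "$rank"
        rank=$((rank + 1))
    done
}

expected_result_dirs my_result 4
```

For a four-process job this lists my_result.0 through my_result.3, which can be compared against the actual contents of the current directory after the run.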
Sometimes it is necessary to collect data for only a subset of the MPI
processes in the workload. In this case, use the per-host syntax of
mpirun/mpiexec* to specify different command lines for different processes.
When launching the collection on Windows OS, we recommend passing the -genvall
option to the mpiexec tool to make sure that the user environment variables are
passed to all instances of the profiled process. Otherwise, the processes are
launched by default in the context of a system account, and some environment
variables (USERPROFILE, APPDATA) do not point to the locations the tools expect.
There are also some peculiarities of stdout / stdin behavior in MPI jobs
profiled with the tools:
- Pass the -quiet / -q option to amplxe-cl / inspxe-cl to keep every tool process in the job from printing diagnostic output, such as progress messages, to the console.
- Consider passing the -l option to mpiexec / mpirun to get stdout lines marked with the MPI rank.
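The two recommendations above can be combined in a single launch line. The following is a dry-run sketch that only composes and prints the command; the variable names (NRANKS, RESULT_DIR, APP, LAUNCH_CMD) and the ./my_app path are placeholders chosen for illustration.

```shell
# Dry-run sketch: compose the launch command and print it instead of running it.
# NRANKS, RESULT_DIR and APP are placeholder names for this illustration.
NRANKS=4
RESULT_DIR=my_result
APP=./my_app

# -l     : passed to mpirun so that stdout lines are marked with the MPI rank
# -quiet : passed to amplxe-cl to suppress progress messages from every rank
LAUNCH_CMD="mpirun -l -n $NRANKS amplxe-cl -quiet -r $RESULT_DIR -collect hotspots -- $APP"

echo "$LAUNCH_CMD"
```

Printing the command first makes it easy to check which options go to the MPI launcher and which go to the collector before starting a real job.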
Example
The most reasonable analysis type to start with for the Intel VTune Profiler
is hotspots, so an example of a full command line for collection would be:
$ mpirun -n 4 amplxe-cl -r my_result -collect hotspots -- my_app [my_app_options]
A similar command line for the Intel Inspector and its ti1/mi1 analysis types
(the lowest-overhead threading and memory correctness analysis types,
respectively) would look like:
$ mpirun -n 4 inspxe-cl -r my_result -collect mi1 -- my_app [my_app_options]
$ mpirun -n 4 inspxe-cl -r my_result -collect ti1 -- my_app [my_app_options]
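When both analysis types are collected for the same application, it may be convenient to keep the results apart. The sketch below prints one command per analysis type; the per-type result names (my_result_mi1, my_result_ti1), the function name inspector_commands, and the ./my_app path are assumptions made for illustration, not names the tools require.

```shell
# Dry-run sketch: print one collection command per Inspector analysis type.
# The per-type result names and ./my_app are assumptions for illustration.
inspector_commands() {
    for analysis in mi1 ti1; do
        echo "mpirun -n 4 inspxe-cl -r my_result_$analysis -collect $analysis -- ./my_app"
    done
}

inspector_commands
```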
Here is an example where the job has 16 processes distributed across the hosts
and hotspots data should be collected for only two of them:
$ mpirun -host myhost -n 14 ./a.out : -host myhost -n 2 amplxe-cl -r foo -c hotspots ./a.out
As a result, two directories will be created in the current directory: foo.14
and foo.15 (given that process ranks 14 and 15 were assigned to the last 2
processes in the job). As an alternative to specifying the command line above,
it is possible to create a configuration file with the following content:
# config.txt configuration file
-host myhost -n 14 ./a.out
-host myhost -n 2 amplxe-cl -quiet -collect hotspots -r foo ./a.out
and run the data collection as:
$ mpirun -configfile ./config.txt
to achieve the same result as above (foo.14 and foo.15 result directories will
be created). Similarly, you can use specific host names to control where the
analyzed processes are executed:
# config.txt configuration file
-host myhost1 -n 14 ./a.out
-host myhost2 -n 2 amplxe-cl -quiet -collect hotspots -r foo ./a.out
When the host names are mentioned, consecutive MPI ranks are allocated to the
specified hosts. In the case above, ranks 0 to 13, inclusive, will be assigned
to myhost1, and the remaining ranks 14 and 15 will be assigned to myhost2. On
Linux, it is possible to omit the exact hosts, in which case the processes will
be distributed between the hosts in round-robin fashion. That is, myhost1 will
get MPI ranks 0, 2, and 4 through 15, while myhost2 will get MPI ranks 1 and 3.
The latter behavior may change in the future.
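The consecutive allocation described for explicitly named hosts can be checked with a short script. This is a minimal sketch: it writes the two-host configuration file from the example and then uses a hypothetical helper, rank_ranges (not part of any MPI or Intel tool), to print the rank range each line would receive. It models only the consecutive case, not the round-robin one.

```shell
# Write the two-host configuration file from the example, one argument set per line.
cat > config.txt <<'EOF'
# config.txt configuration file
-host myhost1 -n 14 ./a.out
-host myhost2 -n 2 amplxe-cl -quiet -collect hotspots -r foo ./a.out
EOF

# Hypothetical helper: for each argument set, print the host and the range of
# MPI ranks it receives when ranks are allocated consecutively.
rank_ranges() {
    awk '
        /^#/ || NF == 0 { next }    # skip comments and blank lines
        {
            host = ""; n = 0
            for (i = 1; i < NF; i++) {
                if ($i == "-host") host = $(i + 1)
                if ($i == "-n")    n = $(i + 1)
            }
            printf "%s %d-%d\n", host, first, first + n - 1
            first += n              # next argument set starts after this one
        }
    ' "$1"
}

rank_ranges config.txt
```

For the file above the helper reports myhost1 0-13 and myhost2 14-15, which matches the result directories foo.14 and foo.15 produced by the two profiled processes.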
The examples in this reference use the mpirun command as opposed to mpiexec
and mpiexec.hydra, while real-world jobs might use the mpiexec* ones. mpirun is
a higher-level command that dispatches to mpiexec or mpiexec.hydra depending on
the current default and the options passed. All the examples listed in this
reference work for the mpiexec* commands as well as the mpirun command.