User Guide

MPI Workflow Example

This section shows example workflows for analyzing MPI applications with Intel® Advisor. In the commands below:
  • The path to the application executable is ./mpi-sample/1_mpi_sample_serial.
    Info: Make sure to replace myApplication with your application executable path and name before executing a command. If your application requires additional command line options, add them after the executable name.
  • The path to the Intel Advisor project directory is ./advi_results.
  • Performance is modeled for all MPI ranks. To reduce overhead, you can run performance modeling only for specific MPI ranks using gtool, as shown in the sketch after this list.
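
For example, here is a minimal sketch of restricting the Survey collection to two specific ranks with gtool; the rank set after the colon (0,2) is an illustration only, and the rest of this section collects data for all ranks:
$ mpirun -n 4 -gtool "advisor --collect=survey --project-dir=./advi_results:0,2" ./mpi-sample/1_mpi_sample_serial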

Analyze MPI Application Performance

This example shows how to run a Survey analysis to get a basic performance and vectorization report for an MPI application. The analysis is performed for an application that is run in four processes.
  1. Collect survey data for all ranks into the shared ./advi_results project directory.
    $ mpirun -n 4 "advisor --collect=survey --project-dir=./advi_results" ./mpi-sample/1_mpi_sample_serial
    To collect survey data for a single rank only (for example, rank 0), you can use the following command instead:
    $ mpirun -n 4 -gtool "advisor --collect=survey --project-dir=./advi_results:0" ./mpi-sample/1_mpi_sample_serial
    If you need to copy the data to the development system, do so now (see the sketch after these steps).
  2. Import and finalize your data.
    $ advisor --import-dir=./advi_results --project-dir=./new_advi_results --mpi-rank=3 --search-dir src:=./mpi_sample
    The --project-dir should be a different directory that stores the finalized analysis results on the development system.
  3. Open the results in the Intel Advisor GUI.
    $ advisor-gui ./new_advi_results
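If you collected the data on a remote system, the copy mentioned in step 1 can be as simple as the following sketch; the host name and destination path are hypothetical, and any file transfer tool you normally use works equally well:
$ scp -r ./advi_results user@dev-system:~/advi_results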
You can proceed to run other analyses one by one, for example as sketched below. After you finish, import and finalize the results for the MPI rank of interest to be able to view them.
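As an illustration only, a Trip Counts and FLOP collection on top of the Survey result followed by a re-import for the rank of interest might look like this; the exact set of options you need depends on which analyses you plan to view:
$ mpirun -n 4 "advisor --collect=tripcounts --flop --project-dir=./advi_results" ./mpi-sample/1_mpi_sample_serial
$ advisor --import-dir=./advi_results --project-dir=./new_advi_results --mpi-rank=3 --search-dir src:=./mpi_sample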
For a full vectorization workflow, see the Analyze Vectorization and Memory Aspects of an MPI Application recipe in the Intel Advisor Cookbook.

Model MPI Application Performance on GPU

This example shows how to run Offload Modeling to get insights about your MPI application performance modeled on a GPU. In this example:
  • The analyses are performed for an application that is run in four processes.
  • Performance is modeled for Intel® HD Graphics 630 (gen9_gt2 configuration).
This example uses the advisor command line interface and the analyze.py script to model performance. You can also run the performance modeling using the --collect=projection action.
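As a rough sketch, the --collect=projection alternative could be run per rank after the data collection steps below; whether --config and --mpi-rank combine with this action exactly as shown is an assumption to verify against the advisor command line help for your version:
$ advisor --collect=projection --project-dir=./advi_results --config=gen9_gt2 --mpi-rank=3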
To model performance:
  1. Generate command lines for performance collection:
    $ advisor-python $APM/collect.py ./advi_results --dry-run --config=gen9_gt2 -- ./mpi-sample/1_mpi_sample_serial
    For Windows* OS, replace $APM with %APM%.
  2. Copy the printed commands to the clipboard, add mpirun or mpiexec to each command, and run them one by one. The Survey and Trip Counts and FLOP analyses are required; the others are optional. For example, with mpirun:
    1. Collect survey data for all ranks into the shared ./advi_results project directory.
      $ mpirun -n 4 "advisor --collect=survey --project-dir=./advi_results --return-app-exitcode --auto-finalize --static-instruction-mix" ./mpi-sample/1_mpi_sample_serial
    2. Collect trip counts and FLOP data:
      $ mpirun -n 4 "advisor --collect=tripcounts --project-dir=./advi_results --return-app-exitcode --flop --auto-finalize --ignore-checksums --stacks --enable-data-transfer-analysis --track-memory-objects --profile-jit --cache-sources --track-stack-accesses --enable-cache-simulation --cache-config=3:1w:4k/1:64w:512k/1:16w:8m" ./mpi-sample/1_mpi_sample_serial
      The cache configuration specified with the --cache-config option is specific to the selected target device. Do not change the option value generated by collect.py --dry-run.
    3. [Optional] Collect Dependencies data:
      $ mpirun -n 4 "advisor --collect=dependencies --project-dir=./advi_results --return-app-exitcode --filter-reductions --loop-call-count-limit=16 --ignore-checksums" ./mpi-sample/1_mpi_sample_serial
  3. Run performance modeling for all MPI ranks of the application:
    $ for x in ./advi_results/rank.*; do advisor-python $APM/analyze.py $x --config=gen9_gt2 -o $x/perf_models; done
    The results are generated per rank in a ./advi_results/rank.X/perf_models directory. You can transfer them to the development system and view the report there.
    If you want to model a single rank, provide a path to that rank's results or use the --mpi-rank option, as shown below.
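    For example, a minimal sketch of modeling only rank 3 by pointing analyze.py at that rank's result directory (the rank number is an illustration):
    $ advisor-python $APM/analyze.py ./advi_results/rank.3 --config=gen9_gt2 -o ./advi_results/rank.3/perf_models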
For all analysis types: When using a shared partition on Windows*, either use network paths to specify the project and executable location, or use the MPI options mapall or map to specify these locations on the network drive.
For example:
$ mpiexec -gwdir \\<host1>\mpi -hosts 2 <host1> 1 <host2> 1 advisor --collect=survey --project-dir=\\<host1>\mpi\advi_results -- \\<host1>\mpi\mpi_sample.exe
$ advisor --import-dir=\\<host1>\mpi\advi_results --project-dir=\\<host1>\mpi\new_advi_results --search-dir src:=\\<host1>\mpi --mpi-rank=1
$ advisor --report=survey --project-dir=\\<host1>\mpi\new_advi_results
Or:
$ mpiexec -mapall -gwdir z:\ -hosts 2 <host1> 1 <host2> 1 advisor --collect=survey --project-dir=z:\advi_results -- z:\mpi_sample.exe
Or:
$ mpiexec -map z:\\<host1>\mpi -gwdir z:\ -hosts 2 <host1> 1 <host2> 1 advisor --collect=survey --project-dir=z:\advi_results -- z:\mpi_sample.exe
