Identify Communication Issues with Intel® Trace Analyzer and Collector

There are three key reasons for an application to be MPI-bound:
  • High wait times inside the MPI library. This occurs when a process waits for data from other processes before it can proceed. This case is characterized by high values of the MPI Imbalance indicator (see the sketch below).
  • Intensive active communication, where a significant share of the run time is spent transferring data between processes.
  • Poorly chosen or incorrectly configured optimization settings of the library.
The first two issues can be addressed with the help of the Intel® Trace Analyzer and Collector. Profiling with this tool is as easy as with Application Performance Snapshot: you just need to add the -trace option to the launch command.
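To make the first case concrete, here is a minimal sketch (illustrative code, not part of heart_demo) in which the last rank works ten times longer than the rest, so every other rank accumulates wait time inside the MPI library:

    /* imbalance.c: every rank but the last finishes its "work" early and
       then blocks in MPI_Barrier until the slowest rank arrives; a profiler
       attributes the blocked time to the MPI library. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        sleep(rank == size - 1 ? 10 : 1);   /* uneven compute phase */

        double t0 = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);        /* faster ranks wait here */
        printf("rank %d waited %.2f s in MPI_Barrier\n",
               rank, MPI_Wtime() - t0);

        MPI_Finalize();
        return 0;
    }

Run under mpirun with the -trace option described below, the barrier wait shows up as time spent inside MPI.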

Run Intel Trace Analyzer and Collector Analysis

Assuming that you have set up the Intel® MPI Library environment, follow these steps to perform the analysis:
  1. Set up the environment for the Intel Trace Analyzer and Collector:
    $ source <itac_installdir>/bin/itacvars.sh
    where <itac_installdir> is the installation directory of the Intel Trace Analyzer and Collector (the default location is /opt/intel).
  2. Run the heart_demo application with the -trace option. Use the host file created in the previous step and the same process and thread configuration:
    $ export OMP_NUM_THREADS=64
    $ mpirun -genv VT_LOGFILE_FORMAT=SINGLESTF -trace -n 16 -ppn 2 -f hosts.txt ./heart_demo -m ../mesh_mid -s ../setup_mid.txt -t 50
    In this launch command, the -genv VT_LOGFILE_FORMAT=SINGLESTF setting ensures that the resulting trace is generated as a single file rather than as a set of files (the default).
    The heart_demo.single.stf file is created.
  3. Open the trace file to analyze the application:
    $ traceanalyzer ./heart_demo.single.stf &
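If MPI-level events alone do not explain the behavior, the Trace Collector also provides an instrumentation API (declared in VT.h) for marking user-defined regions so that they appear as separate functions in the charts. The following is a hedged sketch based on the documented VT_funcdef/VT_begin/VT_end calls; the compute function is a made-up placeholder, and you should verify the exact signatures against the VT.h header shipped with your installation:

    /* trace_regions.c: registers and marks a user-defined region.
       Under MPI, the collector is initialized by MPI_Init and the
       trace is written out during MPI_Finalize. */
    #include <mpi.h>
    #include <VT.h>

    static void compute(void) { /* application work goes here */ }

    int main(int argc, char **argv) {
        int region;
        MPI_Init(&argc, &argv);

        VT_funcdef("compute", VT_NOCLASS, &region);  /* register the region */
        VT_begin(region);                            /* enter it */
        compute();
        VT_end(region);                              /* leave it */

        MPI_Finalize();
        return 0;
    }

Link against the collector by adding the -trace option to the compilation command:
    $ mpiicc -trace trace_regions.c -o trace_regions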

Interpret Intel Trace Analyzer and Collector Result Data

Among all of the Intel® Trace Analyzer charts, the most informative one for the heart_demo application is the Message Profile. This chart indicates the intensity of point-to-point communications for each sender-receiver pair.
To open the Message Profile chart, go to Charts > Message Profile or press Ctrl + Alt + M. For the heart_demo application, the chart should look similar to this:
In this chart, the vertical axis represents the sender ranks, and the horizontal axis represents the receiver ranks. As you can see from the chart, each rank communicates with all of the others, and rank 0 receives slightly more messages than the others.
Such a picture is typical for a communication pattern where one of the processes (in this case, rank 0) is a so-called "master" process that distributes the workload among the other processes and gathers the results of their calculations.
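The pure form of this pattern is easy to reproduce. Below is a generic sketch (not code from heart_demo): rank 0 sends one work item to every other rank and receives one result back, which concentrates point-to-point traffic on the master's row and column of the Message Profile:

    /* master_worker.c: rank 0 ("master") distributes work items and
       gathers results; the worker ranks talk only to the master. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            for (int r = 1; r < size; r++) {         /* distribute work */
                int item = r * 100;
                MPI_Send(&item, 1, MPI_INT, r, 0, MPI_COMM_WORLD);
            }
            for (int r = 1; r < size; r++) {         /* gather results */
                int result;
                MPI_Recv(&result, 1, MPI_INT, r, 1, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("result from rank %d: %d\n", r, result);
            }
        } else {
            int item, result;
            MPI_Recv(&item, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            result = item + rank;                    /* compute a share */
            MPI_Send(&result, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }

Tracing such a run, for example with $ mpirun -trace -n 4 ./master_worker, produces a Message Profile whose only hot cells are in rank 0's row and column; in heart_demo, by contrast, every rank also exchanges messages with every other rank, so all cells are filled and rank 0 is only slightly hotter.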

Key Take-Away

When evaluating an application with Intel Trace Analyzer and Collector, it is not always obvious where the problem area is. Examine the application and all available charts closely to find issues. If your application is MPI-bound, start by determining if the issue is due to high MPI library wait times, active communications, or poor library optimization settings.
