User Guide

  • 2022.0
  • 05/15/2022
  • Public Content

View MPI Collected Data

Once the results are collected, you can open any of them in the standalone GUI or generate a command-line report. Use the
-help report
option of the command-line tool to see the options available for generating reports.
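When there is one result per MPI rank, a small driver can generate the same text report for each of them. The Python sketch below is an illustration, not part of the product. It assumes per-rank result directories named like foo.14 and reuses only the amplxe-cl options that appear in the hotspots text-report example in this section (-R hotspots, -q, -format text, -group-by, -r). The function names report_command and run_reports are hypothetical.

```python
import shutil
import subprocess


# Hypothetical helper: builds the same command line as the hotspots
# text-report example in this section. Only options shown there are used.
def report_command(result_dir, group_by=None, tool="amplxe-cl"):
    cmd = [tool, "-R", "hotspots", "-q", "-format", "text"]
    if group_by:  # e.g. "module", as in the -group-by example
        cmd += ["-group-by", group_by]
    cmd += ["-r", result_dir]
    return cmd


# Hypothetical driver: runs one report per rank result and collects stdout.
def run_reports(result_dirs, group_by=None, tool="amplxe-cl"):
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} not found on PATH")
    outputs = {}
    for result_dir in result_dirs:
        proc = subprocess.run(
            report_command(result_dir, group_by, tool),
            capture_output=True, text=True, check=True,
        )
        outputs[result_dir] = proc.stdout
    return outputs
```

For example, run_reports(["foo.0", "foo.1"], group_by="module") would return one module-level text report per (assumed) rank result, keyed by result directory.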
To view the results in the GUI, run the
{amplxe | inspxe}-gui <result path>
command or launch the
tool and use the
menu item to point to the result. Sometimes it is also convenient to copy the result to another system and view it there (for example, to open a result collected on a Linux cluster on a Windows workstation).
MPI functions are classified by
Intel® VTune™
as system functions, which makes its level of support in this regard similar to that for
Intel® oneAPI Threading Building Blocks (oneTBB)
and OpenMP*. This helps you focus on your own code rather than on MPI internals.
The
Intel VTune
Call Stack Mode
option and the CLI
switches turn on the mode in which system functions are displayed, so that you can view and analyze the internals of the MPI implementation. The
User functions+1
call stack mode is especially useful for finding the MPI functions that consume the most CPU time (Hotspots analysis) or wait the most (Locks and Waits analysis). For example, assume there is a call chain
main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ...
where the function called directly from your code is the actual MPI API function you use and the deeper functions are MPI implementation details. The call stack modes behave as follows:
  • The default
    Only user functions
    call stack mode attributes the time spent in MPI calls to the calling user function,
    so that you can see which of your functions to change to actually improve performance.
  • The
    User functions+1
    mode attributes the time spent in the MPI implementation to the top-level system function, that is, the MPI API function called from your code,
    so that you can easily spot unusually heavy MPI calls.
  • The
    User/system functions
    mode shows the call tree without any reattribution, so that you can see exactly where in the Intel® MPI Library the time was spent.
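To make the three modes concrete, the Python sketch below models how sampled time could be charged under each of them for the call chain above. It is a toy model, not VTune's implementation. It assumes each sample is a call stack listed outermost frame first, that any function whose name starts with MPI_ counts as a system function, and the timings and the mode labels "user-only", "user+1", and "user/system" are made up for illustration (they are not CLI option values).

```python
from collections import Counter


# Toy model only: not how VTune is implemented. A stack is a list of
# function names, outermost (caller) first, innermost (running) last.
def attribute(stack, is_system, mode):
    """Return the frame that one sample taken in `stack` is charged to."""
    if mode == "user/system":
        # No reattribution: charge the function that was actually running.
        return stack[-1]
    # Deepest frame that belongs to user code (index 0 if there is none).
    last_user = max((i for i, f in enumerate(stack) if not is_system(f)), default=0)
    if mode == "user-only":
        # Charge the deepest user function, i.e. the caller of any system code below it.
        return stack[last_user]
    if mode == "user+1":
        # Charge the first system frame called from user code, if there is one.
        return stack[min(last_user + 1, len(stack) - 1)]
    raise ValueError(f"unknown mode: {mode}")


def profile(samples, is_system, mode):
    """Sum seconds per attributed function; samples are (stack, seconds) pairs."""
    totals = Counter()
    for stack, seconds in samples:
        totals[attribute(stack, is_system, mode)] += seconds
    return dict(totals)


# Hypothetical classification rule and timings: 3 s inside the MPI call, 1 s in foo() itself.
def is_mpi(name):
    return name.startswith("MPI_")


samples = [
    (["main", "foo", "MPI_Bar", "MPI_Bar_Impl"], 3.0),
    (["main", "foo"], 1.0),
]

for mode in ("user-only", "user+1", "user/system"):
    print(mode, profile(samples, is_mpi, mode))
```

With this data, the user-only mode charges all 4 s to foo(), user+1 splits it into 3 s for MPI_Bar() and 1 s for foo(), and user/system charges the 3 s to MPI_Bar_Impl().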
Intel VTune
and Intel Inspector provide oneTBB and OpenMP support. We recommend using these thread-level parallel solutions in addition to MPI-style parallelism to maximize CPU resource usage across the cluster, and using
Intel VTune
and Intel Inspector to analyze, respectively, the performance and correctness of that level of parallelism. The MPI, OpenMP, and oneTBB features in the tools are functionally independent, so all the usual OpenMP and oneTBB support features apply when you examine a result collected for an MPI process.
Here is an example of viewing the text report for functions and modules after an
Intel VTune
analysis (note that we open individual results, each of which was collected for a specific rank of the MPI process
in the example above):
$ amplxe-cl -R hotspots -q -format text -r foo.14
Function Module CPU Time
-------- ------ --------
f        a.out  6.070
main     a.out  2.990
$ amplxe-cl -R hotspots -q -format text -group-by module -r foo.14
Module CPU Time
------ --------
a.out  9.060
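In this example, the module-level total is the sum of the function-level rows: 6.070 + 2.990 = 9.060 seconds of CPU time in a.out. The Python sketch below reproduces that aggregation from the function-level text output. It assumes the whitespace-separated layout shown above (a header line, a dashed separator, then one row per function ending in module and CPU time); the helper names parse_function_report and group_by_module are hypothetical.

```python
from collections import defaultdict


def parse_function_report(text):
    """Turn a 'Function Module CPU Time' text table into (function, module, seconds) rows."""
    rows = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for line in lines[2:]:  # skip the header and the dashed separator
        # Split from the right so function names containing spaces survive.
        function, module, cpu = line.rsplit(maxsplit=2)
        rows.append((function, module, float(cpu)))
    return rows


def group_by_module(rows):
    """Sum CPU time per module, rounded to the report's 3 decimal places."""
    totals = defaultdict(float)
    for _function, module, seconds in rows:
        totals[module] += seconds
    return {module: round(t, 3) for module, t in totals.items()}


report = """\
Function Module CPU Time
-------- ------ --------
f        a.out  6.070
main     a.out  2.990
"""

print(group_by_module(parse_function_report(report)))  # {'a.out': 9.06}
```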
