Intel® Inspector User Guide for Linux* OS

ID 767796
Date 7/13/2023
Public

View MPI Collected Data

Once the results are collected, you can open any of them in the standalone GUI or generate a command-line report. Use inspxe-cl -help report or vtune -help report to see the options available for generating reports.

To view a result in the GUI, run the {vtune | inspxe}-gui <result path> command, or launch the *-gui tool and use the File > Open > Result... menu item to select the result. It can also be convenient to copy a result to another system and view it there (for example, to open a result collected on a Linux cluster on a Windows workstation).

Intel® VTune™ Profiler classifies MPI functions as system functions, which makes its level of MPI support similar to its support for Intel® oneAPI Threading Building Blocks (oneTBB) and OpenMP*. This helps you focus on your own code rather than on MPI internals.

To display system functions, and thus view and analyze the internals of the MPI implementation, use the Call Stack Mode option in the Intel VTune Profiler GUI or the -stack-mode switch on the command line. The User functions+1 call stack mode is especially useful for finding the MPI functions that consume the most CPU Time (Hotspots analysis) or wait the longest (Locks and Waits analysis).

For example, assume there is a call chain main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ... where MPI_Bar() is the MPI API function you call and the deeper functions are MPI implementation details. The call stack modes behave as follows:

  • The default Only user functions call stack mode attributes the time spent in MPI calls to the user function foo(), so you can see which of your functions to change to improve performance.

  • The User functions+1 mode attributes the time spent in the MPI implementation to the top-level system function, MPI_Bar(), so you can easily spot unusually heavy MPI calls.

  • The User/system functions mode shows the call tree without any reattribution, so you can see exactly where in the Intel MPI library the time was spent.
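The attribution rules above can be sketched as a small shell function. This is only an illustration of the behavior described here, not how Intel VTune Profiler is implemented. The helper names is_system and attribute, the mode labels user-only, user-plus-one, and all, and the rule that any name starting with MPI_ is a system function are all assumptions made for the sketch.

```shell
#!/usr/bin/env bash
# Sketch of the three call stack modes; not the profiler's actual logic.

# Hypothetical classifier: any name starting with MPI_ counts as a system function.
is_system() {
  case "$1" in
    MPI_*) return 0 ;;
    *)     return 1 ;;
  esac
}

# attribute MODE FRAME...
# Frames are listed from the outermost caller to the innermost callee.
# Prints the function that is charged with the time spent in the innermost frame.
attribute() {
  local mode=$1
  shift
  local last_user="" first_system="" innermost="" frame
  for frame in "$@"; do
    innermost=$frame
    if is_system "$frame"; then
      # Remember only the first system frame below the last user frame.
      if [ -z "$first_system" ]; then
        first_system=$frame
      fi
    else
      last_user=$frame
      first_system=""
    fi
  done
  case "$mode" in
    user-only)     echo "$last_user" ;;                      # Only user functions
    user-plus-one) echo "${first_system:-$last_user}" ;;     # User functions+1
    all)           echo "$innermost" ;;                      # User/system functions
    *)             echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# The call chain from the text: main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl()
for mode in user-only user-plus-one all; do
  printf '%-14s %s\n' "$mode" "$(attribute "$mode" main foo MPI_Bar MPI_Bar_Impl)"
done
```

For the example call chain, the loop reports foo for the first mode, MPI_Bar for the second, and MPI_Bar_Impl for the third, matching the three bullets above.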

Intel VTune Profiler and Intel Inspector support oneTBB and OpenMP. It is recommended to use these thread-level parallel solutions in addition to MPI-style parallelism to maximize CPU resource usage across the cluster, and to use Intel VTune Profiler or Intel Inspector to analyze the performance or correctness of that level of parallelism. The MPI, OpenMP, and oneTBB features in the tools are functionally independent, so all the usual OpenMP and oneTBB support features apply when you examine a result collected for an MPI process.

Example

Here is an example of viewing the text reports for functions and modules after an Intel VTune Profiler analysis. Note that each result is opened individually, because each one was collected for a specific MPI process rank (for example, foo.14 and foo.15):

$ vtune -R hotspots -q -format text -r foo.14
Function Module CPU Time
-------- ------ --------
f        a.out  6.070
main     a.out  2.990

$ vtune -R hotspots -q -format text -group-by module -r foo.14
Module CPU Time
------ --------
a.out  9.060
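
As a quick consistency check, the per-function CPU Time values in a text report can be summed and compared with the module total. The following is a minimal sketch. The helper name sum_cpu_time is hypothetical, and it assumes the report has exactly two header lines and that CPU Time is the last column, as in the output above.

```shell
#!/usr/bin/env bash
# sum_cpu_time: hypothetical helper that reads a text report on stdin
# and prints the sum of the last column (assumed to be CPU Time).
# Assumes two header lines (column names and dashes) precede the data rows.
sum_cpu_time() {
  awk 'NR > 2 && NF >= 2 { total += $NF } END { printf "%.3f\n", total }'
}

# Sample input copied from the function report for foo.14 shown above.
sum_cpu_time <<'EOF'
Function Module CPU Time
-------- ------ --------
f        a.out  6.070
main     a.out  2.990
EOF
```

With the sample rows, the sum is 9.060, which agrees with the a.out total in the module report. In a real session the same helper could be fed from the report command used above, for example vtune -R hotspots -q -format text -r foo.14 | sum_cpu_time, assuming the report output keeps this layout.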