View MPI Collected Data
Once the results are collected, the user can open any of them in the
standalone GUI or generate a command-line report. Use inspxe-cl -help report
or amplxe-cl -help report to see the options available for generating
reports.
To view a result in the GUI, run the {amplxe | inspxe}-gui <result path>
command, or launch the *-gui tool and use the menu to point to the result.
Sometimes it is also convenient to copy the result to another system and view
it there (for example, to open a result collected on a Linux cluster on a
Windows workstation).
Intel® VTune™ Profiler classifies MPI functions as system functions, which
makes its level of MPI support similar to its support for Intel® oneAPI
Threading Building Blocks (oneTBB) and OpenMP*. This helps you focus on your
own code rather than on MPI internals. The Intel VTune Profiler GUI Call Stack
Mode and CLI -stack-mode switches turn on the mode where system functions are
displayed, so that the internals of the MPI implementation can be viewed and
analyzed. The call stack mode User functions+1 is especially useful for finding
the MPI functions that consumed the most CPU Time (Hotspots analysis) or waited
the most (Locks and Waits analysis). For example, assume there is a call chain
main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ...
where MPI_Bar() is the actual MPI API function you use and the deeper functions
are MPI implementation details. The call stack modes behave as follows:
- The default Only user functions call stack mode attributes the time spent in MPI calls to the user function foo(), so that you can see which of your functions to change to actually improve performance.
- The User functions+1 mode attributes the time spent in the MPI implementation to the top-level system function, MPI_Bar(), so that you can easily spot unusually heavy MPI calls.
- The User/system functions mode shows the call tree without any reattribution, so that you can see exactly where in the Intel® MPI library the time was spent.
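The three attribution rules above can be illustrated with a small toy model. This is only a sketch of the behavior described in the list, not the profiler's actual implementation: the mode strings ("user", "user+1", "user/system"), the function names attributed_function and summarize, and the hand-written system/user flags are all hypothetical and are not values accepted by the -stack-mode switch.

```python
from collections import defaultdict

# One call stack, outermost caller first. Each frame is (name, is_system).
# The is_system flags are set by hand for this illustration.
STACK = [
    ("main", False),
    ("foo", False),
    ("MPI_Bar", True),
    ("MPI_Bar_Impl", True),
]


def attributed_function(stack, mode):
    """Return the function that the time of one sample is reported against.

    Assumes the stack contains at least one user (non-system) frame.
    """
    if mode == "user/system":
        # No reattribution: the innermost frame keeps its own time.
        return stack[-1][0]

    # Position of the innermost user frame in the stack.
    last_user = max(i for i, (_, is_system) in enumerate(stack) if not is_system)

    if mode == "user":
        # Time spent in system frames is folded into the calling user function.
        return stack[last_user][0]
    if mode == "user+1":
        # Time is folded into the first system function called from user code,
        # or stays with the user function if no system frame follows it.
        return stack[min(last_user + 1, len(stack) - 1)][0]
    raise ValueError(f"unknown mode: {mode}")


def summarize(samples, mode):
    """Sum sample times per attributed function for the given mode."""
    totals = defaultdict(float)
    for stack, seconds in samples:
        totals[attributed_function(stack, mode)] += seconds
    return dict(totals)


if __name__ == "__main__":
    # Two samples: one deep inside the MPI implementation, one in foo() itself.
    samples = [(STACK, 2.0), (STACK[:2], 0.5)]
    for mode in ("user", "user+1", "user/system"):
        print(mode, summarize(samples, mode))
```

With these two samples, the "user" mode reports all the time against foo(), the "user+1" mode separates out MPI_Bar(), and the "user/system" mode shows the time in MPI_Bar_Impl().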
Intel VTune Profiler / Intel Inspector provide oneTBB and OpenMP support. It is
recommended to use these thread-level parallel solutions in addition to
MPI-style parallelism to maximize the CPU resource usage across the cluster,
and to use the Intel VTune Profiler / Intel Inspector to analyze the
performance / correctness of that level of parallelism. The MPI, OpenMP, and
oneTBB features in the tools are functionally independent, so all usual
features of OpenMP and oneTBB support are applicable when looking into a result
collected for an MPI process.
Example
Here is an example of viewing the text report for functions and modules
after an Intel VTune Profiler analysis. Note that we open individual results,
each of which was collected for a specific rank of the MPI process (foo.14 and
foo.15 in the example above):
$ amplxe-cl -R hotspots -q -format text -r foo.14
Function  Module  CPU Time
--------  ------  --------
f         a.out   6.070
main      a.out   2.990

$ amplxe-cl -R hotspots -q -format text -group-by module -r foo.14
Module  CPU Time
------  --------
a.out   9.060
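When text reports from several ranks need to be compared, it can help to post-process them in a script. The following minimal sketch parses the function-level report shown above and sums CPU Time per module. It assumes the report is laid out as whitespace-separated columns with a header line and a dashed rule, as in the sample output; real reports may differ (for example, function names that contain spaces would need a more careful parser). The names parse_function_report and cpu_time_by_module are hypothetical helpers, not part of any Intel tool.

```python
# Sample report text copied from the foo.14 example; in practice this would be
# the captured output of the report command.
REPORT = """\
Function  Module  CPU Time
--------  ------  --------
f         a.out   6.070
main      a.out   2.990
"""


def parse_function_report(text):
    """Parse a function-level text report into (function, module, seconds) rows.

    Assumes two header lines followed by three whitespace-separated columns.
    """
    rows = []
    for line in text.splitlines()[2:]:  # skip the header and the dashed rule
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or unexpected lines
        function, module, cpu_time = parts
        rows.append((function, module, float(cpu_time)))
    return rows


def cpu_time_by_module(rows):
    """Sum CPU time per module, similar in spirit to grouping by module."""
    totals = {}
    for _function, module, cpu_time in rows:
        totals[module] = totals.get(module, 0.0) + cpu_time
    return totals


if __name__ == "__main__":
    for module, seconds in cpu_time_by_module(parse_function_report(REPORT)).items():
        print(f"{module}  {seconds:.3f}")
```

For the sample data, the per-module total agrees with the 9.060 seconds that the -group-by module report shows for a.out.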