Developer Guide

MPI Tuning

Intel® MPI Library provides the following tuning utilities:

Autotuner

Autotuner is the recommended utility for application-specific tuning. If an application spends significant time in MPI collective operations, autotuning might improve its performance. Autotuner is easy to use, and its overhead is close to zero.
The autotuner's tuning scope is the I_MPI_ADJUST_<opname> family of environment variables, which select the algorithms used for MPI collective operations. Autotuner limits tuning to the current cluster configuration (fabric, number of ranks, number of ranks per node). It works while the application is running, so simply enabling it can improve performance. It can also generate a new tuning file with MPI collective operations adjusted to the application's needs; this file can then be passed to the library through the I_MPI_TUNING_BIN environment variable.
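The two-step workflow described above (run once with the autotuner enabled, then reuse the generated tuning file) could be scripted roughly as follows. This is a minimal sketch, not the library's prescribed procedure. Only I_MPI_TUNING_BIN is named in this section; I_MPI_TUNING_MODE=auto and I_MPI_TUNING_BIN_DUMP are assumed here to be the variables that enable the autotuner and write the tuning file, so check them against the environment variable reference for your library version. The application name ./my_app and the file name tuning.dat are hypothetical.

```python
import os


def autotuner_env(dump_path=None, base_env=None):
    """Return a copy of the environment with the autotuner enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: this variable enables the autotuner (not named in this section).
    env["I_MPI_TUNING_MODE"] = "auto"
    if dump_path is not None:
        # Assumption: this variable makes the autotuner write a tuning file.
        env["I_MPI_TUNING_BIN_DUMP"] = dump_path
    return env


def tuned_env(tuning_file, base_env=None):
    """Return a copy of the environment that reuses an existing tuning file."""
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_TUNING_BIN"] = tuning_file
    return env


def launch_command(ranks, ranks_per_node, app):
    """Build an mpirun command line; nothing is executed here."""
    return ["mpirun", "-n", str(ranks), "-ppn", str(ranks_per_node), app]


if __name__ == "__main__":
    cmd = launch_command(4, 2, "./my_app")  # hypothetical application
    first_run = autotuner_env(dump_path="tuning.dat", base_env={})
    later_run = tuned_env("tuning.dat", base_env={})
    print(" ".join(cmd))
    print(first_run)
    print(later_run)
```

To launch for real, pass the command and environment to subprocess.run(cmd, env=env, check=True). Because the autotuner limits tuning to the current cluster configuration, keep the fabric, rank count, and ranks per node the same in both runs.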

mpitune_fast

mpitune_fast is the recommended easy-to-use utility for cluster-wide tuning. It uses the autotuner internally, so its search space is also limited to collective operation algorithms.
mpitune_fast iteratively launches IMB (Intel® MPI Benchmarks) with the options you provide (for example, the scale of tuning and the collective operations to tune) and generates a file with tuning parameters for the cluster configuration. You can pass this file to the Intel MPI Library with the I_MPI_TUNING_BIN environment variable.
mpitune_fast supports the Slurm* and LSF* workload managers and should automatically detect the job-allocated hosts to use.
mpitune_fast can also validate new tuning files and generate CSV files with performance results, so you do not have to validate the tuning manually.

mpitune

mpitune is useful if the search space of the autotuner is not sufficient for your needs.
mpitune iteratively launches a benchmarking application with different configurations to measure performance and stores the results of each launch. Based on these results, the tuner generates optimal values for the parameters being tuned.
mpitune can search for optimal values of variables other than I_MPI_ADJUST_<opname>, and it can be used for both application-specific and cluster-wide tuning. For example, it can tune parameters of collective operations, such as the radix.
The key difference between mpitune and mpitune_fast is that mpitune runs the application N times, where N is the number of possible variable values, while mpitune_fast finds the optimal I_MPI_ADJUST_<opname> value after running IMB for that collective operation only once.
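The cost of the launch-per-value approach can be illustrated with a toy exhaustive search. This is only a conceptual sketch of measuring every candidate and keeping the best one, not mpitune's actual implementation. The candidate radix values and the timings are made up, and fake_benchmark is a hypothetical stand-in for a real benchmark launch.

```python
def exhaustive_search(candidates, measure):
    """Measure every candidate once and return (best_value, all_results)."""
    results = {}
    for value in candidates:
        # One benchmark launch per candidate value: N candidates, N launches.
        results[value] = measure(value)
    # Lower time is better, so pick the candidate with the smallest result.
    best = min(results, key=results.get)
    return best, results


# Made-up timings in microseconds, standing in for real benchmark launches.
FAKE_TIMINGS_US = {2: 41.0, 4: 28.5, 8: 33.2}


def fake_benchmark(radix):
    """Hypothetical stand-in for launching a benchmark with a given radix."""
    return FAKE_TIMINGS_US[radix]


if __name__ == "__main__":
    best, results = exhaustive_search([2, 4, 8], fake_benchmark)
    print(f"launches: {len(results)}, best radix: {best}")
```

The number of launches grows with the number of candidate values, which is why this style of search has a higher tuning overhead than a single autotuned run.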
Differences between the tuning utilities:

Parameter                            Autotuner   mpitune_fast   mpitune
Low tuning overhead                  +           +              -
Ease of use                          +           +              -
Application tuning                   +           -              +
Microbenchmark tuning                +           +              +
Tuning beyond collective operations  -           -              +
