Intel® MPI Library provides the following tuning utilities:
Autotuner is the recommended utility for application-specific tuning. If an application spends significant time in MPI collective operations, autotuning might improve its performance. Autotuner is easy to use, and its overhead is close to zero.
The autotuner's tuning scope is the I_MPI_ADJUST_<opname> family of environment variables, which select the algorithms used for MPI collective operations. Autotuner limits tuning to the current cluster configuration (fabric, number of ranks, number of ranks per node). It works while the application is running, so enabling the autotuner can be enough to improve performance. It can also generate a new tuning file with MPI collective operations adjusted to the application's needs; this file can then be passed to the library through the I_MPI_TUNING_BIN variable.
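The two-step workflow described above (tune during one run, then reuse the resulting file) can be sketched as a small Python launcher helper. This is a minimal sketch under stated assumptions, not the library's documented procedure: I_MPI_TUNING_BIN comes from the text above, while I_MPI_TUNING_MODE=auto and I_MPI_TUNING_BIN_DUMP are variable names assumed here for enabling the autotuner and for writing the tuning file, and should be checked against the Intel MPI Library reference for your version. The helper names, file name, and rank counts are hypothetical.

```python
import os


def autotune_env(dump_path=None, base_env=None):
    """Return an environment that enables the autotuner for one run."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: this variable switches the autotuner on.
    env["I_MPI_TUNING_MODE"] = "auto"
    if dump_path is not None:
        # Assumption: this variable names the tuning file to write.
        env["I_MPI_TUNING_BIN_DUMP"] = dump_path
    return env


def tuned_env(tuning_file, base_env=None):
    """Return an environment that reuses a previously generated tuning file."""
    env = dict(os.environ if base_env is None else base_env)
    # From the text above: the tuning file is passed through I_MPI_TUNING_BIN.
    env["I_MPI_TUNING_BIN"] = tuning_file
    return env


def launch_command(nranks, ppn, app):
    """Build an mpirun command line; rank counts and app path are placeholders."""
    return ["mpirun", "-n", str(nranks), "-ppn", str(ppn), app]
```

With these helpers, a first run could be started with subprocess.run(launch_command(64, 16, "./app"), env=autotune_env("tuning.dat"), check=True), and later runs with env=tuned_env("tuning.dat"). Because the autotuner limits tuning to the current cluster configuration, the later runs would need to use the same fabric, number of ranks, and ranks per node.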
mpitune_fast is the recommended, easy-to-use utility for cluster-wide tuning. It uses the autotuner internally, so its search space is also limited to collective operation algorithms. mpitune_fast iteratively launches IMB (Intel® MPI Benchmarks) with the options provided (for example, the scale of tuning and the collective operations to tune) and generates a file with tuning parameters for the cluster configuration. This file can be provided to the Intel MPI Library through the I_MPI_TUNING_BIN environment variable. mpitune_fast supports the Slurm* and LSF* workload managers and should automatically detect the job-allocated hosts to use. mpitune_fast can also validate new tuning files and generate CSV files with performance results, so you do not have to validate the tuning manually.
mpitune is useful if the search space of the autotuner is not sufficient for your needs. mpitune iteratively launches a benchmarking application with different configurations to measure performance and stores the results of each launch. Based on these results, the tuner generates optimal values for the parameters being tuned. mpitune can search for optimal values of variables other than I_MPI_ADJUST_<opname>, and it can be used for both application-specific and cluster-wide tuning. For example, it can tune parameters of collective operations, such as the radix.
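The launch, measure, store, and select loop described above can be illustrated with a generic exhaustive search over one variable. This is only a conceptual sketch in Python, not mpitune's implementation or interface: the function name exhaustive_tune and the caller-supplied measure callable are hypothetical, and a real measure would launch the benchmark with the given environment overrides and return its run time.

```python
def exhaustive_tune(variable, candidates, measure):
    """Try every candidate value of one variable and return the fastest.

    measure is a caller-supplied (hypothetical) callable that launches the
    benchmark with the given environment overrides and returns a time in
    seconds; lower is better.
    """
    results = {}
    for value in candidates:
        # One benchmark launch per candidate value; every result is stored.
        results[value] = measure({variable: str(value)})
    # Select the value with the smallest measured time.
    best = min(results, key=results.get)
    return best, results
```

The stored results dictionary is returned along with the best value so that the measurements can be inspected or saved. The number of launches grows with the number of candidate values, which is the cost of searching a space wider than the autotuner's.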
The key difference between mpitune and mpitune_fast is that mpitune runs the application N times, where N is the number of possible values of the variable being tuned, while mpitune_fast finds the optimal I_MPI_ADJUST_<opname> value after running IMB for that collective operation only once.
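To make the difference in launch counts concrete, the following Python sketch compares the two approaches for a set of collective operations. It assumes that each I_MPI_ADJUST_<opname> variable is tuned independently, so the mpitune count is the sum of the candidate values per variable; the candidate counts are hypothetical and chosen only for illustration.

```python
def launch_counts(candidates_per_collective):
    """Return (mpitune launches, mpitune_fast launches) for the given collectives.

    candidates_per_collective maps a collective name to the number of
    possible values of its I_MPI_ADJUST_<opname> variable.
    """
    # mpitune: one application run per possible value of each variable
    # (assumption: variables are tuned one at a time, independently).
    mpitune_launches = sum(candidates_per_collective.values())
    # mpitune_fast: one IMB run per collective operation.
    mpitune_fast_launches = len(candidates_per_collective)
    return mpitune_launches, mpitune_fast_launches


# Hypothetical candidate counts, for illustration only.
example = {"allreduce": 12, "bcast": 14, "alltoall": 5}
print(launch_counts(example))  # (31, 3)
```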
Differences between the tuning utilities:
|Feature||Autotuner||mpitune_fast||mpitune|
|Low tuning overhead||+||+||-|
|Ease of use||+||+||-|
|Tuning beyond collective operations||-||-||+|