Autotuning
If an application spends significant time in MPI collective operations, tuning might improve its performance.
Tuning depends heavily on the specifications of the particular platform. The autotuner searches for the best implementation of a collective operation during application runtime. Each collective operation has its own presets, each consisting of an algorithm and its parameters; the autotuner goes through these presets and evaluates the performance of each one. Once it has evaluated the search space, it chooses the fastest implementation and uses it for the rest of the application runtime, which improves application performance. The autotuner search space can be modified with the I_MPI_ADJUST_<opname>_LIST variable (see I_MPI_ADJUST Family Environment Variables).
The autotuner determines the tuning parameters and makes them available for autotuning. Use I_MPI_TUNING_MODE and the I_MPI_TUNING_AUTO family environment variables to find the best settings (see Tuning Environment Variables and I_MPI_TUNING_AUTO Family Environment Variables).
I_MPI_TUNING_MODE and the I_MPI_TUNING_AUTO family environment variables support only Intel processors and cannot be used on other platforms.
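As an illustration of narrowing the autotuner search space with I_MPI_ADJUST_<opname>_LIST, the fragment below restricts the MPI_Allreduce candidates before enabling the autotuner. It is a sketch only: the algorithm identifiers 1-3 and 5 are placeholders, and the range-and-comma list syntax is assumed to match the I_MPI_ADJUST family reference. Check the valid identifiers for your library version before using it.

```shell
# Illustrative values only: the algorithm IDs 1-3 and 5 are placeholders,
# not a recommendation. Consult the I_MPI_ADJUST family reference for the
# identifiers that exist for MPI_Allreduce in your library version.
export I_MPI_ADJUST_ALLREDUCE_LIST=1-3,5

# Enable the autotuner so that it evaluates only the listed candidates.
export I_MPI_TUNING_MODE=auto
```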
The collectives currently available for autotuning are:
MPI_Allreduce, MPI_Bcast, MPI_Barrier, MPI_Reduce, MPI_Gather, MPI_Scatter, MPI_Alltoall, MPI_Allgatherv, MPI_Reduce_scatter, MPI_Reduce_scatter_block, MPI_Scan, MPI_Exscan, MPI_Iallreduce, MPI_Ibcast, MPI_Ibarrier, MPI_Ireduce, MPI_Igather, MPI_Iscatter, MPI_Ialltoall, MPI_Iallgatherv, MPI_Ireduce_scatter, MPI_Ireduce_scatter_block, MPI_Iscan, and MPI_Iexscan.
Using the autotuner involves these steps:
- Launch the application with the autotuner enabled and specify the dump file that stores the results:
  I_MPI_TUNING_MODE=auto
  I_MPI_TUNING_BIN_DUMP=<tuning-results.dat>
- Launch the application with the tuning results generated at the previous step:
  I_MPI_TUNING_BIN=./tuning-results.dat
  Or use the -tune Hydra option.
If you experience performance issues, see I_MPI_TUNING_AUTO Family Environment Variables.
Examples
- $ export I_MPI_TUNING_MODE=auto
  $ export I_MPI_TUNING_AUTO_SYNC=1
  $ export I_MPI_TUNING_AUTO_ITER_NUM=5
  $ export I_MPI_TUNING_BIN_DUMP=./tuning_results.dat
  $ mpirun -n 128 -ppn 64 IMB-MPI1 allreduce -iter 1000,800 -time 4800
- $ export I_MPI_TUNING_BIN=./tuning_results.dat
  $ mpirun -n 128 -ppn 64 IMB-MPI1 allreduce -iter 1000,800 -time 4800
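The two runs can also be chained in a small wrapper that starts the second run only if the first one produced a dump file. This is a minimal sketch, not part of the library: the function name tune_and_run and the TUNING_FILE and MPIRUN variables are hypothetical helpers introduced here. Only the I_MPI_TUNING_* variables come from the steps described in this section.

```shell
#!/bin/sh
# Sketch of the two-step autotuning workflow.
# tune_and_run, TUNING_FILE and MPIRUN are hypothetical names for illustration.

TUNING_FILE="${TUNING_FILE:-./tuning_results.dat}"
MPIRUN="${MPIRUN:-mpirun}"

tune_and_run() {
    # Step 1: run with the autotuner enabled and dump the results.
    # The subshell keeps the tuning variables from leaking into step 2.
    (
        export I_MPI_TUNING_MODE=auto
        export I_MPI_TUNING_BIN_DUMP="$TUNING_FILE"
        $MPIRUN "$@"
    ) || return 1

    # Do not start the second run if no dump file was written.
    if [ ! -f "$TUNING_FILE" ]; then
        echo "no tuning results found in $TUNING_FILE" >&2
        return 1
    fi

    # Step 2: run again, loading the tuning results from step 1.
    (
        export I_MPI_TUNING_BIN="$TUNING_FILE"
        $MPIRUN "$@"
    )
}
```

With this in place, `tune_and_run -n 128 -ppn 64 IMB-MPI1 allreduce -iter 1000,800 -time 4800` would perform both runs, passing the same launcher arguments to each.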
To tune collectives on a communicator identified with the help of Application Performance Snapshot (APS), set the following variable at step 1:
I_MPI_TUNING_AUTO_COMM_LIST=comm_id_1, … , comm_id_n
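A tuning run restricted to selected communicators might then be set up as sketched below. The identifiers 1001 and 1002 are made-up placeholders, and joining them with commas and no spaces is an assumption. Substitute the communicator IDs that APS reports for your application.

```shell
# Placeholder communicator IDs for illustration only; replace them with
# the values obtained from Application Performance Snapshot (APS).
COMM_IDS="1001,1002"

# Step 1 settings: enable the autotuner, choose the dump file, and limit
# tuning to the listed communicators.
export I_MPI_TUNING_MODE=auto
export I_MPI_TUNING_BIN_DUMP=./tuning_results.dat
export I_MPI_TUNING_AUTO_COMM_LIST="$COMM_IDS"
```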