Multithreaded MPI-1 Benchmarks
The IMB-MT component of the Intel(R) MPI Benchmarks provides benchmarks for some of the MPI-1 functions, running in multiple threads. This implies the use of the MPI_THREAD_MULTIPLE mode and the execution of several threads per rank, each performing its own communication.
The design of the multithreaded benchmarks follows from one key principle: the communication patterns must remain meaningful when several threads communicate at once. To achieve this, each benchmark has to meet the following requirements:
- Data must be distributed between threads. To avoid threads transferring the same data, the input and output data must be properly partitioned among the threads. This must be done before the main benchmarking loop starts.
- The communication pattern must ensure a deterministic order of data sends and receives. For point-to-point MPI-1 communications, this could be achieved by separating the threads' message flows with tags. This method, however, is unavailable for collective MPI-1 communications, which do not take a tag argument. Therefore, both collective and point-to-point benchmarks use a common method instead: each thread communicates over its own MPI communicator.
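The data-distribution requirement can be illustrated with a small block-partitioning helper. This sketch assumes that a buffer of `n` elements is split into contiguous, non-overlapping slices, one per thread. The function name `thread_slice` and the even block split are illustrative choices, not necessarily how IMB-MT itself partitions its buffers.

```c
#include <stddef.h>

/* Hypothetical helper: computes the contiguous slice
 * [*offset, *offset + *count) of an n-element buffer owned by thread
 * `tid` out of `nthreads`. The first (n % nthreads) threads receive one
 * extra element, so the slices cover the buffer exactly once and never
 * overlap. */
void thread_slice(size_t n, int nthreads, int tid,
                  size_t *offset, size_t *count)
{
    size_t base = n / (size_t)nthreads;
    size_t rem  = n % (size_t)nthreads;
    size_t t    = (size_t)tid;

    *count  = base + (t < rem ? 1 : 0);
    *offset = t * base + (t < rem ? t : rem);
}
```

For example, 10 elements over 4 threads yield slices starting at 0, 3, 6 and 8 with lengths 3, 3, 2 and 2. Each thread would then pass `buf + offset` and `count` to its own send or receive calls, so no two threads transfer the same elements.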
Thread control inside a rank is performed using the OpenMP* API.
The following benchmarks are available within the IMB-MT component:
- PingPongMT 
- PingPingMT 
- SendrecvMT 
- ExchangeMT 
- UnibandMT 
- BibandMT 
- BcastMT 
- ReduceMT 
- AllreduceMT