Code Change Guide
The example in this section shows one way to change a legacy program so that it benefits from the MPI_THREAD_SPLIT threading model.
In the original code (thread_split.cpp), the functions work_portion_1(), work_portion_2(), and work_portion_3() represent a CPU load that modifies the content of the memory pointed to by the in and out pointers. In this particular example, these functions perform correctness checking of the MPI_Allreduce() function.
Changes Required to Use the OpenMP* Threading Model
- To run MPI functions in a multithreaded environment, call MPI_Init_thread() with the required thread support level set to MPI_THREAD_MULTIPLE instead of calling MPI_Init().
- According to the MPI_THREAD_SPLIT model, each thread must execute MPI operations only over the communicator specific to that thread. So, in this example, the MPI_COMM_WORLD communicator must be duplicated once per thread so that each thread has its own copy of MPI_COMM_WORLD. NOTE: The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with the same thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level and the OpenMP level must be combined.
- Check that the runtime sets up a reasonable affinity for OpenMP threads. Typically, the OpenMP runtime does this out of the box, but sometimes setting the environment variable OMP_PLACES=cores might be necessary for optimal multithreaded MPI performance.
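The data-splitting step in the list above can be sketched as follows. This is a minimal illustration, not code taken from thread_split.cpp: the `Chunk` structure and the `thread_chunk()` helper are hypothetical names, and the even split with the remainder given to the lowest-numbered threads is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of thread_split.cpp): the slice of a
// buffer that a single thread owns.
struct Chunk {
    std::size_t offset;  // index of the first element owned by the thread
    std::size_t count;   // number of elements owned by the thread
};

// Split `total` elements across `num_threads` threads as evenly as possible.
// Assumed policy: the first (total % num_threads) threads get one extra element.
inline Chunk thread_chunk(std::size_t total, int num_threads, int thread_id) {
    assert(num_threads > 0 && thread_id >= 0 && thread_id < num_threads);
    const std::size_t n = static_cast<std::size_t>(num_threads);
    const std::size_t t = static_cast<std::size_t>(thread_id);
    const std::size_t base = total / n;   // elements every thread receives
    const std::size_t extra = total % n;  // leftover elements to hand out
    Chunk c;
    c.count = base + (t < extra ? 1 : 0);
    c.offset = t * base + (t < extra ? t : extra);
    return c;
}
```

Inside an OpenMP parallel region, a thread could call `thread_chunk(count, omp_get_num_threads(), omp_get_thread_num())` and then pass `in + offset`, `out + offset`, and its own `count` to MPI_Allreduce() on its thread-specific communicator. Because the slices do not overlap, the threads never touch each other's portion of the buffers.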
Changes Required to Use the POSIX Threading Model
- To run MPI functions in a multithreaded environment, call MPI_Init_thread() with the required thread support level set to MPI_THREAD_MULTIPLE instead of calling MPI_Init().
- Each thread must execute MPI collective operations over its own communicator. So MPI_COMM_WORLD must be duplicated to create a specific communicator for each thread.
- The info key thread_id must be properly set for each of the duplicated communicators. NOTE: The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with the same thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level and the POSIX level must be combined.
- The affinity of POSIX threads can be set up explicitly to reach optimal multithreaded MPI performance.
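The explicit affinity setup mentioned in the last item could look like the sketch below. It assumes Linux with glibc, because pthread_setaffinity_np() is a non-portable GNU extension and not part of POSIX itself. The function name `pin_current_thread()` is hypothetical, and the choice of which CPU each thread should use is left to the caller.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Hypothetical helper (not part of thread_split.cpp): restrict the calling
// thread to a single logical CPU.
// Returns 0 on success or an error number on failure, as reported by
// pthread_setaffinity_np().
inline int pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);      // start from an empty CPU mask
    CPU_SET(cpu, &set);  // allow exactly one CPU
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Each worker thread could call `pin_current_thread()` at the start of its thread function, before it issues any MPI call. One possible mapping is to derive the CPU number from the thread's thread_id, but a suitable mapping depends on the node topology and on how the MPI ranks themselves are pinned.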