Instrumenting an Example with Data Type Mismatch
To experiment with the data type mismatch example, copy the contents of the <install-dir>/itac/examples/checking/global/collective/datatype_mismatch/ directory to your working directory:
$ cp -r <install-dir>/itac_latest/examples/checking/global/collective/datatype_mismatch/ ~
$ cd ~/datatype_mismatch
Then compile and run the MPI_Bcast example located in the directory using the following commands:
$ mpiicc -g MPI_Bcast.c -o MPI_Bcast
$ mpirun -n 4 -check_mpi -genv VT_CHECK_MAX_ERRORS 0 MPI_Bcast
The command lines above use the following flags:
- -g – generate the debugging information in the object file to be able to analyze the source files
- -check_mpi – dynamically link the correctness checker library (VTmc.so)
- -genv VT_CHECK_MAX_ERRORS 0 – set the maximum number of reported errors to unlimited (the default is 1)
After running the application you will get the following output:
...
[0] ERROR: GLOBAL:COLLECTIVE:DATATYPE:MISMATCH: error
[0] ERROR:    Mismatch found in local rank [1] (global rank [1]),
[0] ERROR:    other processes may also be affected.
[0] ERROR:    No problem found in local rank [0] (same as global rank):
[0] ERROR:       MPI_Bcast(*buffer=0x7fff1066e814, count=1, datatype=MPI_INT, root=0, comm=MPI_COMM_WORLD)
[0] ERROR:       main (/checking/global/collective/datatype_mismatch/MPI_Bcast.c:50)
[0] ERROR:    1 elements transferred by peer but 4 expected by
[0] ERROR:    the 3 processes with local ranks [1:3] (same as global ranks):
[0] ERROR:       MPI_Bcast(*buffer=..., count=4, datatype=MPI_CHAR, root=0, comm=MPI_COMM_WORLD)
[0] ERROR:       main (/checking/global/collective/datatype_mismatch/MPI_Bcast.c:53)
[0] INFO: GLOBAL:COLLECTIVE:DATATYPE:MISMATCH: found 1 time (1 error + 0 warnings), 0 reports were suppressed
[0] INFO: Found 1 problem (1 error + 0 warnings), 0 reports were suppressed.
The error messages refer to lines 50 and 53 in the MPI_Bcast.c source file:
...
39 int main (int argc, char **argv)
40 {
41     int rank, size;
42
43     MPI_Init( &argc, &argv );
44     MPI_Comm_size( MPI_COMM_WORLD, &size );
45     MPI_Comm_rank( MPI_COMM_WORLD, &rank );
46
47     /* error: types do not match */
48     if( !rank ) {
49         int send = 0;
50         MPI_Bcast( &send, 1, MPI_INT, 0, MPI_COMM_WORLD );
51     } else {
52         char recv[4];
53         MPI_Bcast( &recv, 4, MPI_CHAR, 0, MPI_COMM_WORLD );
54     }
55
56     MPI_Finalize( );
57
58     return 0;
59 }
The code above passes mismatched data types to the MPI_Bcast function. The root process (rank 0) broadcasts one element of type MPI_INT, while the other processes expect four elements of type MPI_CHAR. The number of transferred bytes is the same (assuming a 4-byte int), so normally this issue is not detected by MPI.
To fix the issue:
- in line 52, change the receiver type from char array to int.
- in line 53, change the MPI data-type argument from MPI_CHAR to MPI_INT, and the number of received elements to 1.
52         int recv;
53         MPI_Bcast( &recv, 1, MPI_INT, 0, MPI_COMM_WORLD );
To check that you have eliminated the errors reported by the correctness checker, re-compile and re-run the application. The output should contain the following message:
...
[0] INFO: Error checking completed without finding any problems.
...