Actual Benchmarking
To reduce measurement errors caused by insufficient clock resolution, every benchmark is run repeatedly. The repetition count is as follows:
For IMB-MPI1, IMB-NBC, and the aggregate flavors of the IMB-EXT, IMB-IO, and IMB-RMA benchmarks, the repetition count is MSGSPERSAMPLE. This constant is defined in IMB_settings.h and IMB_settings_io.h, with values of 1000 and 50, respectively.
To avoid excessive run times for large transfer sizes X, the repetition count is bounded above by OVERALL_VOL/X. OVERALL_VOL is defined in IMB_settings.h and IMB_settings_io.h, with values of 4 MB and 16 MB, respectively.
Given transfer size X, the repetition count for all aggregate benchmarks is defined as follows:
n_sample = MSGSPERSAMPLE (X=0)
n_sample = max(1,min(MSGSPERSAMPLE,OVERALL_VOL/X)) (X>0)
The repetition count for non-aggregate benchmarks is defined analogously, with MSGSPERSAMPLE replaced by MSGS_NONAGGR. Because non-aggregate runs usually take much longer, a reduced repetition count is recommended for them.
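The rule above can be sketched as a small C helper. This is a minimal illustration, not code taken from the IMB sources: the function name repetition_count is hypothetical, the cap is passed in as a parameter so that the same helper covers both MSGSPERSAMPLE and MSGS_NONAGGR, and 4 MB is assumed to mean 4 * 2^20 bytes.

```c
#include <assert.h>

/* Values quoted in the text for IMB_settings.h; the byte value of
   4 MB (4 * 2^20) is an assumption made for this sketch. */
#define MSGSPERSAMPLE 1000
#define OVERALL_VOL   (4L * 1024L * 1024L)

/* Hypothetical helper: repetition count for a transfer size of x bytes.
   msgs_cap is MSGSPERSAMPLE for aggregate benchmarks and MSGS_NONAGGR
   for non-aggregate ones. */
long repetition_count(long x, long msgs_cap, long overall_vol)
{
    assert(msgs_cap >= 1 && overall_vol >= 0);

    if (x <= 0)                         /* X = 0: use the cap unchanged */
        return msgs_cap;

    long bound = overall_vol / x;       /* integer division, may be 0 */
    long n = (msgs_cap < bound) ? msgs_cap : bound;
    return (n > 1) ? n : 1;             /* never fewer than one repetition */
}
```

With these values the cap of 1000 applies up to X = 4096 bytes. Beyond that the volume bound takes over, for example 512 repetitions at X = 8192 bytes and a single repetition once X reaches 4 MB.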
In the following examples, elementary transfer means a pure function call (MPI_[Send, ...], MPI_Put, MPI_Get, MPI_Accumulate, MPI_File_write_XX, MPI_File_read_XX), without any further function call. Assured completion of transfers means:
IMB-EXT benchmarks: MPI_Win_fence
IMB-IO Write benchmarks: a triplet MPI_File_sync/MPI_Barrier(file_communicator)/MPI_File_sync
IMB-RMA benchmarks: MPI_Win_flush, MPI_Win_flush_all, MPI_Win_flush_local, or MPI_Win_flush_local_all
Other benchmarks: empty
MPI-1 Benchmarks
for ( i=0; i<N_BARR; i++ ) MPI_Barrier(MY_COMM)
time = MPI_Wtime()
for ( i=0; i<n_sample; i++ )
    execute MPI pattern
time = (MPI_Wtime()-time)/n_sample
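The averaging step in this loop can be tried out without an MPI installation. The following is a minimal single-process sketch under stated assumptions: wall_seconds is a hypothetical stand-in for MPI_Wtime built on the C11 timespec_get call, the warm-up barriers are omitted because there is only one process, and the pattern is an arbitrary callback rather than a real MPI pattern.

```c
#include <assert.h>
#include <time.h>

/* Stand-in for MPI_Wtime(): wall-clock seconds via C11 timespec_get. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical callback type standing in for "execute MPI pattern". */
typedef void (*pattern_fn)(void *ctx);

/* Run the pattern n_sample times and return the mean time per run. */
double time_per_sample(pattern_fn pattern, void *ctx, int n_sample)
{
    assert(n_sample > 0);

    double t = wall_seconds();
    for (int i = 0; i < n_sample; i++)
        pattern(ctx);
    return (wall_seconds() - t) / n_sample;
}

/* Trivial pattern for demonstration: counts how often it was executed. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

Timing the whole loop and dividing by n_sample is what keeps a coarse clock from dominating the result, since a single call may take less time than one clock tick.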
IMB-EXT and Blocking I/O Benchmarks
For aggregate benchmarks, the kernel loop looks as follows:
for ( i=0; i<N_BARR; i++ ) MPI_Barrier(MY_COMM)
/* Negligible integer (offset) calculations ... */
time = MPI_Wtime()
for ( i=0; i<n_sample; i++ )
    execute elementary transfer
assure completion of all transfers
time = (MPI_Wtime()-time)/n_sample
For non-aggregate benchmarks, every transfer is completed before going on to the next transfer:
for ( i=0; i<N_BARR; i++ ) MPI_Barrier(MY_COMM)
/* Negligible integer (offset) calculations ... */
time = MPI_Wtime()
for ( i=0; i<n_sample; i++ )
{
    execute elementary transfer
    assure completion of transfer
}
time = (MPI_Wtime()-time)/n_sample
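The only structural difference between the two kernel loops is where the completion call sits. The sketch below makes that visible by counting calls. It does not use MPI: fake_transfer and fake_completion are hypothetical stand-ins for an elementary transfer (for example MPI_Put) and its completion call (for example MPI_Win_fence), and timing is left out on purpose.

```c
#include <assert.h>

/* Hypothetical callback type for a transfer or a completion step. */
typedef void (*step_fn)(void *state);

struct call_counts {
    int transfers;
    int completions;
};

void fake_transfer(void *state)
{
    ((struct call_counts *)state)->transfers++;
}

void fake_completion(void *state)
{
    ((struct call_counts *)state)->completions++;
}

/* Aggregate: issue all n_sample transfers, then complete them once. */
void run_aggregate(step_fn transfer, step_fn complete, void *state, int n_sample)
{
    assert(n_sample >= 0);
    for (int i = 0; i < n_sample; i++)
        transfer(state);
    complete(state);
}

/* Non-aggregate: complete every transfer before issuing the next one. */
void run_non_aggregate(step_fn transfer, step_fn complete, void *state, int n_sample)
{
    assert(n_sample >= 0);
    for (int i = 0; i < n_sample; i++) {
        transfer(state);
        complete(state);
    }
}
```

For n_sample = 50 the aggregate variant makes one completion call and the non-aggregate variant makes 50, which is consistent with the longer run times noted for non-aggregate benchmarks.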
Non-blocking I/O Benchmarks
A nonblocking benchmark has to provide three timings:
t_pure - blocking pure I/O time
t_ovrl - nonblocking I/O time concurrent with CPU activity
t_CPU - pure CPU activity time
The actual benchmark consists of the following stages:
Calling the equivalent blocking benchmark, as defined in Actual Benchmarking, and taking the benchmark time as t_pure.
Closing and re-opening the related file(s).
Re-synchronizing the processes.
Running the nonblocking case concurrently with CPU activity (which takes t_CPU when running undisturbed), and taking the effective time as t_ovrl.
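One way to condense the three timings into a single figure is an overlap percentage. The formula in the sketch below is not stated in this section and should be read as an assumption: it treats t_pure + t_CPU as the time without any overlap, normalizes the saving by the shorter of the two activities, and clamps the result to the range 0 to 100. The function name overlap_percent is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper (assumed formula, not taken from this section):
   100 * max(0, min(1, (t_pure + t_cpu - t_ovrl) / min(t_pure, t_cpu))) */
double overlap_percent(double t_pure, double t_ovrl, double t_cpu)
{
    assert(t_pure >= 0.0 && t_ovrl >= 0.0 && t_cpu >= 0.0);

    double shorter = (t_pure < t_cpu) ? t_pure : t_cpu;
    if (shorter <= 0.0)
        return 0.0;                     /* nothing that could be overlapped */

    double ratio = (t_pure + t_cpu - t_ovrl) / shorter;
    if (ratio < 0.0) ratio = 0.0;       /* slower than running back to back */
    if (ratio > 1.0) ratio = 1.0;       /* cannot hide more than the shorter part */
    return 100.0 * ratio;
}
```

Under this assumption, t_ovrl equal to t_pure + t_CPU gives 0 percent (no overlap), and t_ovrl equal to the longer of the two gives 100 percent (the shorter activity is hidden completely).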
You can set the desired CPU time t_CPU in IMB_settings_io.h:
#define TARGET_CPU_SECS 0.1 /* unit seconds */
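To give a feel for what a CPU-time target means, the sketch below keeps the processor busy until a requested amount of CPU time has been consumed. It is a hypothetical stand-in and not the workload that IMB itself runs: burn_cpu and cpu_seconds_now are illustrative names, the arithmetic in the inner loop is arbitrary, and CPU time is read with the standard clock() call.

```c
#include <assert.h>
#include <time.h>

#define TARGET_CPU_SECS 0.1 /* unit seconds, value quoted in the text */

/* Processor time used by this program so far, in seconds. */
double cpu_seconds_now(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Hypothetical stand-in workload: compute until at least target_secs of
   CPU time have been spent, and return the CPU time actually used. */
double burn_cpu(double target_secs)
{
    assert(target_secs >= 0.0);

    volatile double sink = 0.0;         /* volatile keeps the loop from being optimized away */
    double start = cpu_seconds_now();
    double elapsed = 0.0;

    while (elapsed < target_secs) {
        for (int i = 0; i < 1000; i++)
            sink += (double)i * 0.5;    /* arbitrary floating-point work */
        elapsed = cpu_seconds_now() - start;
    }
    return elapsed;
}
```

Calling burn_cpu(TARGET_CPU_SECS) returns a value of at least 0.1. The overshoot depends on how long one batch of the inner loop takes and on the resolution of clock().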