Actual Benchmarking

Intel® MPI Benchmarks User Guide

ID 766171

Date 3/26/2021

To reduce measurement errors caused by insufficient clock resolution, every benchmark is run repeatedly. The repetition count is as follows:

For IMB-MPI1, IMB-NBC, and the aggregate flavors of the IMB-EXT, IMB-IO, and IMB-RMA benchmarks, the repetition count is MSGSPERSAMPLE. This constant is defined in IMB_settings.h and IMB_settings_io.h, with values 1000 and 50, respectively.

To avoid excessive run times for large transfer sizes X, the repetition count is capped at OVERALL_VOL/X. The OVERALL_VOL value is defined in IMB_settings.h and IMB_settings_io.h, with values 4 MB and 16 MB, respectively.

Given transfer size X, the repetition count for all aggregate benchmarks is defined as follows:

n_sample = MSGSPERSAMPLE (X=0)

n_sample = max(1,min(MSGSPERSAMPLE,OVERALL_VOL/X)) (X>0)

The repetition count for non-aggregate benchmarks is defined analogously, with MSGSPERSAMPLE replaced by MSGS_NONAGGR. A smaller value is recommended for MSGS_NONAGGR, because non-aggregate run times are usually much longer.
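The repetition-count formula can be written as a small C function. This is a sketch, not IMB's own code: n_sample here is a hypothetical helper, the macro values are copied from the text above (with "4MB" assumed to mean 4 × 1024 × 1024 bytes), and OVERALL_VOL/X is assumed to use C integer division. Passing MSGS_NONAGGR instead of MSGSPERSAMPLE as max_reps gives the non-aggregate count.

```c
/* Values as stated in this guide for IMB_settings.h; illustrative only,
 * the header shipped with your IMB version is authoritative. */
#define MSGSPERSAMPLE 1000
#define OVERALL_VOL   (4L * 1024 * 1024)  /* "4MB", assumed to be 4 MiB */

/* Hypothetical helper: repetition count for transfer size x (bytes).
 * max_reps plays the role of MSGSPERSAMPLE (aggregate) or MSGS_NONAGGR
 * (non-aggregate); overall_vol plays the role of OVERALL_VOL. */
static long n_sample(long x, long max_reps, long overall_vol)
{
    if (x == 0)
        return max_reps;                        /* X = 0 case */
    long cap = overall_vol / x;                 /* assumed integer division */
    long n = max_reps < cap ? max_reps : cap;   /* min(max_reps, cap) */
    return n > 1 ? n : 1;                       /* max(1, ...) */
}
```

With these values, n_sample(8192, MSGSPERSAMPLE, OVERALL_VOL) returns 4194304/8192 = 512, and any transfer of 4 MB or more returns 1. With the IMB-IO values, n_sample(1048576, 50, 16L * 1024 * 1024) returns 16.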

In the following examples, an elementary transfer means a single call to a pure transfer function (MPI_[Send, ...], MPI_Put, MPI_Get, MPI_Accumulate, MPI_File_write_XX, MPI_File_read_XX), without any further function call. Assured transfer completion means:

IMB-EXT benchmarks: MPI_Win_fence
IMB-IO Write benchmarks: a triplet MPI_File_sync/MPI_Barrier(file_communicator)/MPI_File_sync
IMB-RMA benchmarks: MPI_Win_flush, MPI_Win_flush_all, MPI_Win_flush_local, or MPI_Win_flush_local_all
Other benchmarks: empty

MPI-1 Benchmarks

for (i = 0; i < N_BARR; i++) MPI_Barrier(MY_COMM);
time = MPI_Wtime();
for (i = 0; i < n_sample; i++) {
    /* execute MPI pattern */
}
time = (MPI_Wtime() - time) / n_sample;
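The MPI-1 timing kernel can be expressed as a reusable C function so that its structure can be exercised without an MPI library. In this sketch, time_kernel, clock_fn, step_fn, and the simulated struct sim are hypothetical names: the clock, the barrier, and the MPI pattern are passed in as function pointers. In a real MPI program, the callbacks would wrap MPI_Wtime(), MPI_Barrier(MY_COMM), and the benchmark's communication pattern.

```c
/* Hypothetical callback types standing in for MPI_Wtime(),
 * MPI_Barrier(MY_COMM), and "execute MPI pattern". */
typedef double (*clock_fn)(void *ctx);
typedef void (*step_fn)(void *ctx);

/* Synchronize n_barr times, then return the average time of one
 * repetition of the pattern over n_sample repetitions. */
static double time_kernel(clock_fn now, step_fn barrier, step_fn pattern,
                          void *ctx, int n_barr, int n_sample)
{
    for (int i = 0; i < n_barr; i++)
        barrier(ctx);                       /* not timed */
    double start = now(ctx);
    for (int i = 0; i < n_sample; i++)
        pattern(ctx);                       /* timed repetitions */
    return (now(ctx) - start) / n_sample;   /* time per repetition */
}

/* Simulated environment: each pattern call advances the clock by cost. */
struct sim { double clock; double cost; int barriers; };

static double sim_now(void *ctx) { return ((struct sim *)ctx)->clock; }
static void sim_barrier(void *ctx) { ((struct sim *)ctx)->barriers++; }
static void sim_pattern(void *ctx)
{
    struct sim *s = ctx;
    s->clock += s->cost;
}
```

With a simulated pattern cost of 0.25, time_kernel(sim_now, sim_barrier, sim_pattern, &s, 2, 1000) returns 0.25, and the two barriers run before the clock is read.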

IMB-EXT and Blocking I/O Benchmarks

For aggregate benchmarks, the kernel loop looks as follows:

for (i = 0; i < N_BARR; i++) MPI_Barrier(MY_COMM);
/* Negligible integer (offset) calculations ... */
time = MPI_Wtime();
for (i = 0; i < n_sample; i++) {
    /* execute elementary transfer */
}
/* assure completion of all transfers */
time = (MPI_Wtime() - time) / n_sample;

For non-aggregate benchmarks, every transfer is completed before going on to the next transfer:

for (i = 0; i < N_BARR; i++) MPI_Barrier(MY_COMM);
/* Negligible integer (offset) calculations ... */
time = MPI_Wtime();
for (i = 0; i < n_sample; i++) {
    /* execute elementary transfer */
    /* assure completion of transfer */
}
time = (MPI_Wtime() - time) / n_sample;
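The practical difference between the aggregate and non-aggregate loops is how often the completion call is paid for. The following sketch uses a hypothetical cost model instead of real MPI calls: struct sim_io and its assumed per-call costs t_xfer (one elementary transfer) and t_sync (one completion call, such as MPI_Win_fence) are illustrative names only. The aggregate kernel completes all transfers once after the loop, while the non-aggregate kernel completes every transfer before the next one.

```c
/* Hypothetical cost model: each call advances a simulated clock. */
struct sim_io { double clock; double t_xfer; double t_sync; };

static void elementary_transfer(struct sim_io *s) { s->clock += s->t_xfer; }
static void assure_completion(struct sim_io *s)   { s->clock += s->t_sync; }

/* Aggregate mode: one completion covers all n_sample transfers. */
static double aggregate_kernel(struct sim_io *s, long n_sample)
{
    double start = s->clock;
    for (long i = 0; i < n_sample; i++)
        elementary_transfer(s);
    assure_completion(s);
    return (s->clock - start) / n_sample;
}

/* Non-aggregate mode: each transfer is completed before the next. */
static double nonaggregate_kernel(struct sim_io *s, long n_sample)
{
    double start = s->clock;
    for (long i = 0; i < n_sample; i++) {
        elementary_transfer(s);
        assure_completion(s);
    }
    return (s->clock - start) / n_sample;
}
```

With t_xfer = 1.0, t_sync = 4.0, and n_sample = 4, the aggregate kernel reports 2.0 per transfer (t_xfer + t_sync/4), whereas the non-aggregate kernel reports 5.0 (t_xfer + t_sync).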

Non-blocking I/O Benchmarks

A nonblocking benchmark has to provide three timings:

t_pure - blocking pure I/O time
t_ovrl - nonblocking I/O time concurrent with CPU activity
t_CPU - pure CPU activity time

The actual benchmark consists of the following stages:

Running the equivalent blocking benchmark, as defined in Actual Benchmarking, and taking the benchmark time as t_pure.
Closing and re-opening the related file(s).
Re-synchronizing the processes.
Running the nonblocking case concurrently with CPU activity (which takes t_CPU when run undisturbed), and taking the effective time as t_ovrl.
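The three timings bound each other: with no overlap at all, t_ovrl equals t_pure + t_CPU, and with perfect overlap it drops to the larger of t_pure and t_CPU. One way to summarize a run is an overlap fraction, sketched below. The formula (t_pure + t_CPU - t_ovrl) / min(t_pure, t_CPU), clamped to [0, 1], is an assumption modeled on the overlap figure IMB prints for nonblocking benchmarks, and overlap_fraction is a hypothetical helper; check the output description of your IMB version before relying on it.

```c
/* Hypothetical helper: fraction of the shorter activity hidden by
 * overlap, clamped to [0, 1]. 0 means t_ovrl = t_pure + t_cpu (no
 * overlap); 1 means t_ovrl = max(t_pure, t_cpu) (perfect overlap). */
static double overlap_fraction(double t_pure, double t_ovrl, double t_cpu)
{
    double shorter = t_pure < t_cpu ? t_pure : t_cpu;
    if (shorter <= 0.0)
        return 0.0;                          /* nothing to overlap */
    double f = (t_pure + t_cpu - t_ovrl) / shorter;
    if (f < 0.0)
        f = 0.0;
    if (f > 1.0)
        f = 1.0;
    return f;
}
```

For t_pure = 0.5 and t_CPU = 0.25, overlap_fraction(0.5, 0.75, 0.25) gives 0.0 (no overlap), overlap_fraction(0.5, 0.5, 0.25) gives 1.0 (the CPU work is fully hidden), and overlap_fraction(0.5, 0.625, 0.25) gives 0.5.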

You can set the desired CPU time t_CPU in IMB_settings_io.h:

#define TARGET_CPU_SECS 0.1 /* unit seconds */
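To make the CPU activity last about TARGET_CPU_SECS, a benchmark has to calibrate its busy-work loop on the machine at hand. The sketch below shows one simple approach and is not IMB's implementation: busy_work and calibrate_iters are hypothetical helpers, CPU time is measured with the standard C clock() function, and the iteration count is assumed to scale linearly with run time.

```c
#include <time.h>

#define TARGET_CPU_SECS 0.1 /* unit seconds */

/* Hypothetical busy-work kernel; volatile keeps the compiler from
 * removing the loop. */
static void busy_work(long iters)
{
    volatile double x = 0.0;
    for (long i = 0; i < iters; i++)
        x += 1e-9 * (double)i;
}

/* Estimate how many busy_work iterations take about target_secs of CPU
 * time: time a probe run with clock() and scale linearly (assumption). */
static long calibrate_iters(double target_secs)
{
    long probe = 1000000;
    double secs;
    for (;;) {
        clock_t start = clock();
        busy_work(probe);
        clock_t stop = clock();
        secs = (double)(stop - start) / CLOCKS_PER_SEC;
        if (secs >= 0.01)
            break;                    /* long enough to time reliably */
        probe *= 10;                  /* probe too short: enlarge it */
    }
    return (long)(target_secs / secs * probe);
}
```

calibrate_iters(TARGET_CPU_SECS) returns an iteration count for busy_work. The probe is enlarged until it runs for at least 10 ms, so that the resolution of clock() does not dominate the estimate, the same concern that motivates repeating the benchmarks.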
