Measuring Communication and Computation Overlap | Measuring Pure Communication Time
---|---
Iallgather | Iallgather_pure
Iallgatherv | Iallgatherv_pure
Iallreduce | Iallreduce_pure
Ialltoall | Ialltoall_pure
Ialltoallv | Ialltoallv_pure
Ibarrier | Ibarrier_pure
Ibcast | Ibcast_pure
Igather | Igather_pure
Igatherv | Igatherv_pure
Ireduce | Ireduce_pure
Ireduce_scatter | Ireduce_scatter_pure
Iscatter | Iscatter_pure
Iscatterv | Iscatterv_pure
Ireduce_scatter
This benchmark for MPI_Ireduce_scatter measures communication and computation overlap. It reduces a vector of L = X/sizeof(float) float items, where X is the message size in bytes. The MPI data type is MPI_FLOAT and the MPI operation is MPI_SUM. In the scatter phase, the L items are split among the np processes as evenly as possible:

L = r*np + s

where

r = ⌊L/np⌋
s = L mod np

The process with rank i receives:

r+1 items when i < s
r items when i ≥ s
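As an illustration, the split can be expressed with a few lines of C. This is a minimal sketch; the helper name split_counts is hypothetical and not part of the benchmark source:

```c
/* Hypothetical helper: split L float items over np ranks as evenly as
 * possible, following the r/s definitions above.  Rank i receives
 * r+1 items when i < s, and r items otherwise. */
void split_counts(int L, int np, int *recvcounts)
{
    int r = L / np;   /* r = floor(L/np) */
    int s = L % np;   /* s = L mod np    */
    for (int i = 0; i < np; i++)
        recvcounts[i] = (i < s) ? r + 1 : r;
}
```

For example, with L = 10 and np = 4, r = 2 and s = 2, so the ranks receive 3, 3, 2, and 2 items, which sum back to L.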
Property | Description
---|---
Measured pattern | MPI_Ireduce_scatter/IMB_cpu_exploit/MPI_Wait
MPI data type | MPI_FLOAT
MPI operation | MPI_SUM
Reported timings | For details, see Measuring Communication and Computation Overlap.
Reported throughput | None
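The measured pattern can be sketched in C as follows. This is a simplified illustration, not the benchmark's actual code: do_cpu_work() is a hypothetical stand-in for IMB_cpu_exploit, and the recvcounts array is assumed to be pre-computed as described above.

```c
#include <mpi.h>

/* Hypothetical stand-in for IMB_cpu_exploit: the CPU work that is
 * overlapped with the non-blocking collective in the real benchmark. */
static void do_cpu_work(void) { /* ... artificial CPU load ... */ }

/* Minimal sketch of the measured pattern:
 * MPI_Ireduce_scatter / CPU work / MPI_Wait. */
void ireduce_scatter_overlap(const float *sendbuf, float *recvbuf,
                             const int *recvcounts, MPI_Comm comm)
{
    MPI_Request req;

    /* Start the non-blocking reduce-scatter (MPI_FLOAT / MPI_SUM). */
    MPI_Ireduce_scatter(sendbuf, recvbuf, recvcounts,
                        MPI_FLOAT, MPI_SUM, comm, &req);

    /* Perform CPU work while the collective may progress in the background. */
    do_cpu_work();

    /* Complete the collective; comparing this total time with the pure
     * communication time indicates how much overlap was achieved. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```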