User Guide

  • 2022.2
  • 08/08/2022
  • Public Content

Checking Collective Operations

Checking correct usage of collective operations is easier than checking messages. At the beginning of each operation, Intel® Trace Collector broadcasts the same data from rank #0 of the communicator. This data includes:
  • Type of the operation
  • Root (zero if not applicable)
  • Reduction type (predefined types only)
All involved processes then check these parameters against their own and report an error if they do not match. If the operation type is the same, the root rank is also checked for collective operations that have a root process, and the reduction operation is also checked for reduce operations. A mismatched reduction operation can only be detected for predefined reduction operations, because it is impossible to verify whether the program code associated with a custom reduction operation has the same semantics on all processes. After this step, further parameters are shared between the processes and checked, depending on the operation.
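The comparison described above can be illustrated with a small simulation. This is only a sketch in plain Python, not the Intel® Trace Collector implementation: the names `CollectiveHeader`, `check_header`, and `check_all` are hypothetical, each list entry stands in for one rank's parameters, and a custom reduction is modeled as `None` so that it is skipped by the check.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CollectiveHeader:
    """Hypothetical record of the data sent out by rank 0."""
    op_type: str                     # type of the operation, e.g. "MPI_Reduce"
    root: int = 0                    # zero if not applicable
    reduction: Optional[str] = None  # predefined reduction name; None = custom or not applicable


def check_header(reference: CollectiveHeader, local: CollectiveHeader) -> Optional[str]:
    """Compare one rank's parameters with rank 0's header.

    Returns a description of the first mismatch, or None if everything matches.
    """
    if local.op_type != reference.op_type:
        return f"operation {local.op_type} does not match {reference.op_type}"
    # Root and reduction are only compared once the operation type agrees.
    if local.root != reference.root:
        return f"root {local.root} does not match {reference.root}"
    # Assumption of this sketch: custom reductions (None) cannot be compared.
    if (reference.reduction is not None and local.reduction is not None
            and local.reduction != reference.reduction):
        return f"reduction {local.reduction} does not match {reference.reduction}"
    return None


def check_all(headers: List[CollectiveHeader]) -> Dict[int, str]:
    """Simulate every rank checking itself against rank 0's header."""
    reference = headers[0]
    errors: Dict[int, str] = {}
    for rank, local in enumerate(headers):
        message = check_header(reference, local)
        if message is not None:
            errors[rank] = message
    return errors


if __name__ == "__main__":
    ranks = [
        CollectiveHeader("MPI_Reduce", 0, "MPI_SUM"),
        CollectiveHeader("MPI_Reduce", 0, "MPI_SUM"),
        CollectiveHeader("MPI_Reduce", 1, "MPI_SUM"),  # wrong root
        CollectiveHeader("MPI_Bcast", 0),              # wrong operation
    ]
    for rank, message in check_all(ranks).items():
        print(f"rank {rank}: {message}")
```

In this sketch, a rank that calls a different operation is reported for the operation mismatch only, because the root and the reduction are not compared until the operation types agree.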
Invalid parameters, such as an invalid value passed where a valid data type is required, are detected while checking the parameters. They are reported as a single error that describes which parameter is invalid in each process. This produces less output than printing one error per process.
If any of these checks fails, the original operation is not executed on any process. The application can therefore continue running, but its semantics will be affected.
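The reporting and skipping behavior can be sketched in the same simulated style. Again, this is an illustration under stated assumptions rather than the tool's actual code: `combined_error` and `guarded_collective` are hypothetical names, an invalid data type is modeled as `None`, and the collective itself is a plain Python callable.

```python
from typing import Callable, List, Optional, Tuple


def combined_error(datatypes: List[Optional[str]]) -> Optional[str]:
    """Build one error message covering every rank with an invalid data type.

    Assumption of this sketch: None stands for an invalid data type.
    Returns None when all ranks passed a valid value.
    """
    bad_ranks = [rank for rank, dt in enumerate(datatypes) if dt is None]
    if not bad_ranks:
        return None
    details = ", ".join(f"rank {rank}: invalid data type" for rank in bad_ranks)
    return f"invalid parameter in collective operation ({details})"


def guarded_collective(
    datatypes: List[Optional[str]],
    operation: Callable[[], object],
) -> Tuple[Optional[str], Optional[object]]:
    """Run the operation only if the check passed on every rank."""
    error = combined_error(datatypes)
    if error is not None:
        # The operation is skipped everywhere; the caller can still continue.
        return error, None
    return None, operation()


if __name__ == "__main__":
    error, result = guarded_collective(["MPI_INT", None, "MPI_INT", None], lambda: "executed")
    print(error)   # one message that names both failing ranks
    print(result)  # None, because the operation was not executed
```

Because the operation is skipped rather than aborted, any code after the call still runs, but it works with data the collective never produced, which is why the application's semantics are affected.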
