Data Distribution Among Processes
This page describes how the local arrays that make up the virtual global array, on which the DFT is applied, are distributed across the available processes. It is the user’s responsibility to allocate the local arrays, initialize the input, and retrieve the result. The sections below describe how the input and output data are expected to be distributed among the processes.
Non-batched transforms
For single multi-dimensional transforms, the forward and backward domain data must be distributed among the available processes. This can be done either with the built-in slab decompositions or by providing a custom pencil or slab decomposition.
Slab decompositions
The distributed DFT interface provides a built-in row distribution for 2D transforms and a built-in slab distribution for 3D transforms. Note that the slab decomposition does not allow strides that lead to a non-packed layout of the global array. For a real forward domain, the data must be padded for an in-place operation, whereas it can be either padded or packed for an out-of-place operation.
- For a 2D transform, consider a C-style 2D array of size [Y][X] distributed over p processes. The possible row distributions are listed below, where / denotes integer division.
  - Decomposition along Y: the first Y % p processes each own (Y/p+1)*X elements and the remaining processes each own (Y/p)*X elements.
  - Decomposition along X: the first X % p processes each own (X/p+1)*Y elements and the remaining processes each own (X/p)*Y elements.
- Similarly, for a 3D transform, consider a C-style 3D array of size [Z][Y][X] distributed over p processes. The possible slab decompositions are listed below, where / denotes integer division.
  - Decomposition along Z: the first Z % p processes each own (Z/p+1)*Y*X elements and the remaining processes each own (Z/p)*Y*X elements.
  - Decomposition along Y: the first Y % p processes each own (Y/p+1)*X*Z elements and the remaining processes each own (Y/p)*X*Z elements.
  - Decomposition along X: the first X % p processes each own (X/p+1)*Y*Z elements and the remaining processes each own (X/p)*Y*Z elements.
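The element counts above can be computed with a few lines of standalone C++. The following is a minimal sketch, not part of the oneMKL interface: the helper names local_extent and slab_elements are hypothetical, and the process index rank is assumed to run from 0 to p-1 in the same order in which the text counts "the first" processes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a oneMKL function): number of indices along the
// split dimension owned by process `rank` when `extent` indices are divided
// over `p` processes. The first (extent % p) processes own one extra index.
std::size_t local_extent(std::size_t extent, std::size_t p, std::size_t rank) {
    assert(p > 0 && rank < p);
    return extent / p + (rank < extent % p ? 1 : 0);
}

// Hypothetical helper: number of elements owned by process `rank` for a
// C-style array with dimensions `dims` (e.g. {Y, X} or {Z, Y, X}) that is
// decomposed along dimension index `split_dim`.
std::size_t slab_elements(const std::vector<std::size_t>& dims,
                          std::size_t split_dim,
                          std::size_t p,
                          std::size_t rank) {
    assert(split_dim < dims.size());
    // Local extent along the split dimension...
    std::size_t count = local_extent(dims[split_dim], p, rank);
    // ...times the full extent of every other dimension.
    for (std::size_t d = 0; d < dims.size(); ++d) {
        if (d != split_dim) {
            count *= dims[d];
        }
    }
    return count;
}
```

For a [5][4][3] array decomposed along Z over two processes, slab_elements gives 36 elements for the first process and 24 for the second, which together account for all 60 elements. Such a helper can be useful for sizing the local allocation before initializing the input.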
The dimension along which the decomposition is done, for either the forward or the backward domain, can be chosen by using the set_value member function of the oneapi::mkl::experimental::dft::distributed_dft class.
Custom pencil and slab distribution
Additionally, the interface supports custom data decompositions in the form of rectangles for 2D transforms and blocks for 3D transforms. Each rectangle or block defines a subregion of the global array by specifying its lower and upper corners. By assigning each block to a process, one can represent a data distribution in which each process owns a portion of the global array.
Calling the set_value member function of the oneapi::mkl::experimental::dft::distributed_dft class with appropriate bounds and strides informs the library that a custom decomposition is being used.
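Before passing a custom decomposition to the library, it can help to check that the blocks cover the global array exactly once. The sketch below is an illustration only: the Block struct and the functions volume, overlap, and tiles_global_array are hypothetical and are not oneMKL types or calls. It also assumes that the lower corner is inclusive and the upper corner is exclusive, which this page does not specify, so the convention should be adapted to whatever the library expects.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one block of a [Z][Y][X] global array.
// Assumption of this sketch: `lower` is inclusive, `upper` is exclusive.
struct Block {
    std::array<std::size_t, 3> lower;
    std::array<std::size_t, 3> upper;
};

// Number of elements in a block. Precondition: lower <= upper in each dimension.
std::size_t volume(const Block& b) {
    std::size_t v = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        assert(b.lower[d] <= b.upper[d]);
        v *= b.upper[d] - b.lower[d];
    }
    return v;
}

// True if two non-empty blocks share at least one element.
bool overlap(const Block& a, const Block& b) {
    for (std::size_t d = 0; d < 3; ++d) {
        if (a.upper[d] <= b.lower[d] || b.upper[d] <= a.lower[d]) {
            return false;  // separated along dimension d
        }
    }
    return true;
}

// True if the blocks lie inside the global array, do not overlap, and
// together contain every element of the global array exactly once.
bool tiles_global_array(const std::vector<Block>& blocks,
                        const std::array<std::size_t, 3>& dims) {
    // Reject malformed or out-of-bounds blocks first.
    for (const Block& b : blocks) {
        for (std::size_t d = 0; d < 3; ++d) {
            if (b.lower[d] > b.upper[d] || b.upper[d] > dims[d]) {
                return false;
            }
        }
    }
    std::size_t total = 0;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (volume(blocks[i]) == 0) {
            continue;  // empty blocks cannot overlap anything
        }
        total += volume(blocks[i]);
        for (std::size_t j = i + 1; j < blocks.size(); ++j) {
            if (volume(blocks[j]) != 0 && overlap(blocks[i], blocks[j])) {
                return false;
            }
        }
    }
    // In-bounds, pairwise disjoint blocks tile the array exactly when
    // their volumes add up to the volume of the global array.
    return total == dims[0] * dims[1] * dims[2];
}
```

In this sketch a 2D rectangle can be represented by giving the leading dimension an extent of 1. A pencil decomposition of a [4][4][4] array over four processes, for instance, would use four blocks that each span half of Z, half of Y, and all of X.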
Batched transforms
For batches of transforms, the transforms in the batch are divided among the available processes, and each individual transform is executed completely within its respective process. The transforms are distributed as evenly as possible. For a batch of size b performed on p processes, where b is not divisible by p, the first b % p processes perform b/p + 1 transforms and the remaining processes perform b/p transforms, where / denotes integer division.
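The batch split can be tabulated for all processes at once. The following is a minimal sketch in standalone C++; the function name batch_counts is hypothetical and not part of the oneMKL interface, and entry i of the result is assumed to correspond to the i-th process in the order used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a oneMKL function): number of transforms each of
// `p` processes performs for a batch of size `b`. The first (b % p) processes
// perform b/p + 1 transforms, the remaining ones b/p (integer division).
std::vector<std::size_t> batch_counts(std::size_t b, std::size_t p) {
    assert(p > 0);
    std::vector<std::size_t> counts(p, b / p);
    for (std::size_t rank = 0; rank < b % p; ++rank) {
        ++counts[rank];
    }
    return counts;
}
```

For a batch of 10 transforms on 4 processes, this yields 3, 3, 2, and 2 transforms. When b is divisible by p, every process receives b/p transforms.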