Multi-GPU Programming: Scale Up & Out Using Intel® MPI Library

For shared-memory programming of general-purpose graphics processing unit (GPGPU) systems, users either have to distribute their domain decomposition manually across the available GPUs and GPU tiles or rely on implicit scaling mechanisms that transparently scale their offload code across multiple GPU tiles. The former approach can be cumbersome, and the latter does not always deliver the best performance.

The Intel® MPI Library can take away that burden by enabling the user to program for only a single GPU or tile and leave the distribution to the library. This can make high-performance computing (HPC) and GPU programming much easier. The Intel MPI Library not only lets users pin individual message passing interface (MPI) ranks to individual GPUs or tiles, but also lets them pass GPU memory pointers directly to the library.
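To illustrate the idea, below is a minimal sketch (not taken from the article) in which each MPI rank programs a single GPU and passes device buffers directly to MPI calls. It assumes a SYCL-capable compiler and an Intel MPI Library build with GPU buffer support enabled (for example, via I_MPI_OFFLOAD=1 in recent versions); the buffer size and ring-exchange pattern are purely illustrative.

```cpp
// Minimal sketch: one MPI rank per GPU/tile, exchanging device buffers
// directly through MPI. Assumes a GPU-aware Intel MPI Library setup
// (e.g., launched with I_MPI_OFFLOAD=1) and a SYCL 2020 compiler.
#include <mpi.h>
#include <sycl/sycl.hpp>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank programs "its" GPU or tile; the mapping of ranks to
    // devices/tiles is left to the MPI launcher and library settings.
    sycl::queue q{sycl::gpu_selector_v};

    const int n = 1 << 20;  // illustrative buffer size
    int *send_buf = sycl::malloc_device<int>(n, q);
    int *recv_buf = sycl::malloc_device<int>(n, q);

    // Initialize the send buffer on the GPU.
    q.parallel_for(sycl::range<1>{static_cast<size_t>(n)},
                   [=](sycl::id<1> i) {
        send_buf[i] = static_cast<int>(i) + rank;
    }).wait();

    // Pass the GPU pointers directly to MPI: a simple ring exchange.
    int next = (rank + 1) % size;
    int prev = (rank + size - 1) % size;
    MPI_Sendrecv(send_buf, n, MPI_INT, next, 0,
                 recv_buf, n, MPI_INT, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Copy one element back to the host just to verify the transfer.
    int sample = 0;
    q.memcpy(&sample, recv_buf, sizeof(int)).wait();
    std::printf("rank %d received first element %d from rank %d\n",
                rank, sample, prev);

    sycl::free(send_buf, q);
    sycl::free(recv_buf, q);
    MPI_Finalize();
    return 0;
}
```

With the oneAPI toolchain, such a program would typically be built with the Intel MPI compiler wrapper (for example, `mpiicpx -fsycl ring.cpp -o ring`) and launched with `mpirun -n <ranks> ./ring`, with the rank-to-GPU/tile pinning controlled through the library's offload settings.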

Michael Steyer is an HPC technical consulting engineer supporting technical and HPC segments within the Software and Advanced Technology Group at Intel.

Dmitry Durnov is the architect of the Intel MPI Library and Intel® oneAPI Collective Communications Library (oneCCL) products at Intel.

Anatoliy Rozanov is an Intel MPI Library lead developer responsible for Intel® GPU enablement and for Intel MPI Library process management and deployment infrastructure at Intel.