Intel® MPI Library
Deliver flexible, efficient, and scalable cluster messaging.
One Library with Multiple Fabric Support
Intel® MPI Library is a multifabric message-passing library based on the open source MPICH implementation of the MPI standard. Use the library to create, maintain, and test advanced, complex applications that perform better on HPC clusters based on Intel® and compatible processors.
- Develop applications that can run on multiple cluster interconnects that you choose at runtime.
- Quickly deliver maximum end-user performance without having to change the software or operating environment.
- Achieve the best latency, bandwidth, and scalability through automatic tuning.
- Reduce the time to market by linking to one library and deploying on the latest optimized fabrics.
Download as Part of the Toolkit
Intel MPI Library is included in the Intel® oneAPI HPC Toolkit. Get the toolkit to analyze, optimize, and deliver applications that scale.
Download the Stand-Alone Version
A stand-alone download of Intel MPI Library is available. You can download binaries from Intel or choose your preferred repository.
Features
OpenFabrics Interface* (OFI) Support
This optimized framework exposes and exports communication services to HPC applications. Key components include APIs, provider libraries, kernel services, daemons, and test applications.
Intel MPI Library uses OFI to handle all communications.
- Enables a more streamlined path that starts at the application code and ends with data communications
- Allows tuning for the underlying fabric to happen at runtime through simple environment settings, including network-level features like multirail for increased bandwidth
- Helps you deliver optimal performance on extreme scale solutions based on Mellanox InfiniBand* and Cornelis Networks*
As a result, you gain increased communication throughput, reduced latency, simplified program design, and a common communication infrastructure.
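To make this concrete, here is a minimal sketch (illustrative only, not Intel sample code) of what a fabric-agnostic application looks like: the program contains only standard MPI calls, and the OFI provider or fabric is selected at launch time, commonly through environment variables such as I_MPI_FABRICS or FI_PROVIDER, rather than in the source code.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal fabric-agnostic MPI program: no interconnect-specific calls
       appear here; the transport (shared memory, TCP, RDMA, ...) is chosen
       at runtime by the library and OFI. */
    int main(int argc, char **argv) {
        int rank, size;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        local = (double)rank;   /* each rank contributes its rank id */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of ranks across %d processes = %.0f\n", size, global);

        MPI_Finalize();
        return 0;
    }

The same binary can then be launched over different fabrics, for example by setting environment variables such as I_MPI_FABRICS or FI_PROVIDER before the run, with no rebuild required.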
Scalability
This library implements the high-performance MPI 3.1 and MPI 4.0 standards on multiple fabrics. This lets you quickly deliver maximum application performance (even if you change or upgrade to new interconnects) without requiring major modifications to the software or operating environment.
- Thread safety allows you to develop and trace hybrid multithreaded MPI applications for optimal performance on multicore and manycore Intel architectures (a minimal sketch follows this list).
- Improved startup scalability comes from the mpiexec.hydra process manager, which:
  - is a process management system for starting parallel jobs
  - is designed to work natively with multiple launchers and job schedulers, such as ssh, rsh, PBS, Slurm, and SGE
- Built-in cloud support for Amazon Web Services*, Microsoft Azure*, and Google Cloud Platform*
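The following sketch illustrates the thread-safety point above. It requests the MPI_THREAD_MULTIPLE support level so that every OpenMP thread may make MPI calls, and it assumes each rank runs the same number of threads. It is a minimal illustration under those assumptions, not an Intel-provided sample.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, size;

        /* Request full thread support so that any OpenMP thread may call MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n", provided);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each OpenMP thread exchanges data with the matching thread on the
           neighboring ranks, using the thread id as the message tag. This is
           only legal when the library provides MPI_THREAD_MULTIPLE. */
        #pragma omp parallel
        {
            int tid  = omp_get_thread_num();
            int next = (rank + 1) % size;
            int prev = (rank - 1 + size) % size;
            int sendval = rank * 1000 + tid, recvval = -1;

            MPI_Sendrecv(&sendval, 1, MPI_INT, next, tid,
                         &recvval, 1, MPI_INT, prev, tid,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            printf("rank %d, thread %d: received %d from rank %d\n",
                   rank, tid, recvval, prev);
        }

        MPI_Finalize();
        return 0;
    }

A build along the lines of mpiicc -qopenmp hybrid.c followed by an mpirun launch is typical; the exact wrapper name and OpenMP flag depend on the compiler installed with your toolkit.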
Performance and Tuning Utilities
Two additional tools, the Intel® MPI Benchmarks and the mpitune tuning utility (both described below), help you achieve top performance from your applications.
Interconnect Independence
The library provides an accelerated, universal, multifabric layer for fast interconnects via OFI, including for these configurations:
- Transmission Control Protocol (TCP) sockets
- Shared memory
- Interconnects based on Remote Direct Memory Access (RDMA), including Ethernet and InfiniBand
It accomplishes this by dynamically establishing the connection only when needed, which reduces the memory footprint. It also automatically chooses the fastest transport available.
- Develop MPI code independent of the fabric, knowing it will run efficiently on whatever network you choose at runtime (see the sketch after this list).
- Use a two-phase communication buffer-enlargement capability to allocate only the memory space required.
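As an illustrative sketch of the first bullet (not Intel sample code), the fragment below uses only portable nonblocking point-to-point calls; the same code path runs over shared memory, TCP sockets, or RDMA-capable hardware, depending on what is selected at launch.

    #include <mpi.h>
    #include <stdio.h>

    #define N 1024

    int main(int argc, char **argv) {
        int rank, size;
        double sendbuf[N], recvbuf[N];
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < N; i++)
            sendbuf[i] = rank + i * 0.001;

        int next = (rank + 1) % size;
        int prev = (rank - 1 + size) % size;

        /* Portable nonblocking exchange around a ring: nothing here refers to
           a particular interconnect, so the same binary runs over shared
           memory, TCP, or RDMA fabrics chosen at launch time. */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        if (rank == 0)
            printf("received first element %.3f from rank %d\n", recvbuf[0], prev);

        MPI_Finalize();
        return 0;
    }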
Application Binary Interface Compatibility
An application binary interface (ABI) is the low-level nexus between two program modules. It determines how functions are called and also the size, layout, and alignment of data types. With ABI compatibility, applications conform to the same set of runtime naming conventions.
Intel MPI Library offers ABI compatibility with existing MPI-1.x and MPI-2.x applications. So even if you are not ready to move to the new 3.1 and 4.0 standards, you can take advantage of the library’s performance improvements by using its runtimes, without recompiling.
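One way to see ABI compatibility in practice is to have an existing executable report which runtime it picked up at launch. The sketch below (offered only as an illustration) uses the standard MPI_Get_version and MPI_Get_library_version calls to print the MPI standard level and the vendor string of whichever ABI-compatible library the program was run against.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int version, subversion, len;
        char lib[MPI_MAX_LIBRARY_VERSION_STRING];

        MPI_Init(&argc, &argv);

        /* Report the MPI standard level and the vendor string of whichever
           ABI-compatible runtime the executable was launched against. */
        MPI_Get_version(&version, &subversion);
        MPI_Get_library_version(lib, &len);   /* available since MPI-3.0 */

        printf("MPI standard %d.%d\n%s\n", version, subversion, lib);

        MPI_Finalize();
        return 0;
    }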
Intel® MPI Benchmarks provide a set of MPI performance measurements for point-to-point and global communication operations across a range of message sizes. Run all of the supported benchmarks, or specify a single executable file on the command line to get results for a particular subset (a minimal sketch of this kind of measurement follows the list below).
The generated benchmark data fully characterizes:
- Performance of a cluster system, including node performance, network latency, and throughput
- Efficiency of the MPI implementation
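For orientation, the sketch below hand-rolls the kind of point-to-point timing that a ping-pong benchmark reports. It is not the Intel MPI Benchmarks source, just a minimal two-rank illustration of measuring latency and bandwidth across message sizes.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REPS 1000

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size < 2) {
            if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
            MPI_Finalize();
            return 1;
        }

        char *buf = malloc(1 << 20);          /* up to 1 MiB messages */

        /* Ranks 0 and 1 bounce a message back and forth and report the
           derived one-way latency and bandwidth per message size. */
        for (int bytes = 1; bytes <= (1 << 20); bytes *= 2) {
            MPI_Barrier(MPI_COMM_WORLD);
            double t0 = MPI_Wtime();
            for (int i = 0; i < REPS; i++) {
                if (rank == 0) {
                    MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                } else if (rank == 1) {
                    MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
                }
            }
            double t = (MPI_Wtime() - t0) / (2.0 * REPS);   /* one-way time */
            if (rank == 0)
                printf("%8d bytes  %10.2f us  %8.2f MB/s\n",
                       bytes, t * 1e6, bytes / t / 1e6);
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }

With the actual suite, the equivalent measurement is typically obtained by launching the IMB-MPI1 executable with the PingPong benchmark name under mpirun; see the Intel MPI Benchmarks documentation for the supported executables and benchmark names.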
The library has a robust set of default parameters that you can use as is, or refine to ensure the highest performance. To tune beyond the defaults, use the mpitune utility to adjust your cluster or application parameters, iterating until you achieve the best performance.
Benchmarks
Documentation
Get Started
Developer Guides
Developer References
Featured Documentation
- Tune the Intel MPI Library: Basic Techniques
- Developer Reference: Autotuning
- Improve Performance and Stability with Intel MPI Library on InfiniBand*
- Introducing Intel MPI Benchmarks
- Introduction to Message Passing Interface 3 (MPI-3) Shared Memory Programming
- Hybrid Applications: Intel MPI Library and OpenMP*
- Control MPI Process Placement
Training
Tutorials
- Use the MPI Tuner for Intel MPI Library: Linux (PDF) | Windows (PDF)
- Analyze an OpenMP and MPI Application on Linux
Specifications
Processors:
- Intel® Xeon® processors and CPUs with compatible Intel® 64 architecture
- Intel® Data Center GPU Max Series
Development environments:
- Windows: Microsoft Visual Studio*
- Linux: Eclipse* and Eclipse C/C++ Development Tooling (CDT)*
Languages:
- Natively supports C, C++, and Fortran development
Interconnect fabric support:
- Shared memory
- Sockets such as TCP/IP over Ethernet and Gigabit Ethernet Extender*
- OFI-capable network fabrics, including interconnects based on Remote Direct Memory Access (RDMA) over Ethernet and InfiniBand
Operating systems:
- Windows
- Linux
Related Tools
- Intel® VTune™ Profiler: Locate performance bottlenecks fast.
- Intel® Advisor: Optimize HPC code for modern hardware.
- Intel® oneAPI Collective Communications Library: Scalable and efficient distributed training for deep neural networks. It is built on the Intel MPI Library.