Deliver flexible, efficient, and scalable cluster messaging.
One Library with Multiple Fabric Support
Intel® MPI Library is a multifabric message-passing library that implements the open-source MPICH specification. Use the library to create, maintain, and test advanced, complex applications that perform better on high-performance computing (HPC) clusters based on Intel® processors.
Develop applications that can run on multiple cluster interconnects that you choose at run time.
Quickly deliver maximum end-user performance without having to change the software or operating environment.
Achieve the best latency, bandwidth, and scalability through automatic tuning for the latest Intel® platforms.
Reduce the time to market by linking to one library and deploying on the latest optimized fabrics.
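As a minimal sketch of that fabric independence, consider the classic MPI hello world below: nothing in it is specific to any interconnect, so the same binary can run over TCP sockets on a workstation or over a high-speed fabric on a cluster, with the choice made at launch time.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* initialize the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut down the MPI runtime */
        return 0;
    }

Built once with mpicc, the program needs no source changes when the cluster’s interconnect changes.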
Develop in the Cloud
Get what you need to build and optimize your oneAPI projects for free. With an Intel® DevCloud account, you get 120 days of access to the latest Intel® hardware—CPUs, GPUs, FPGAs—and Intel oneAPI tools and frameworks. No software downloads. No configuration steps. No installations.
Intel MPI Library uses the OpenFabrics Interfaces (OFI) framework to handle all communications. This optimized framework exposes and exports communication services to HPC applications; key components include APIs, provider libraries, kernel services, daemons, and test applications. Relying on OFI:
Enables a more streamlined path that starts at the application code and ends with data communications
Allows tuning for the underlying fabric to happen at run time through simple environment settings, including network-level features like multirail for increased bandwidth (see the sketch below)
Helps you deliver optimal performance on extreme scale solutions based on Mellanox InfiniBand* and Cornelis Networks*
As a result, you gain increased communication throughput, reduced latency, simplified program design, and a common communication infrastructure.
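To illustrate the run-time tuning mentioned above: the OFI provider is typically chosen through environment variables rather than code. The sketch below sets libfabric’s standard FI_PROVIDER selector before MPI_Init as one plausible way to do this from inside a program; in practice you would usually export the variable in your shell or job script instead.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* Sketch only: FI_PROVIDER is libfabric's standard provider
         * selector (e.g., "tcp" or "verbs"). Setting it before MPI_Init
         * influences which fabric OFI picks; overwrite = 0 keeps any
         * value already exported in the environment. */
        setenv("FI_PROVIDER", "tcp", 0);

        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            printf("requested provider: %s\n", getenv("FI_PROVIDER"));

        MPI_Finalize();
        return 0;
    }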
This library implements the high-performance MPI 3.1 standard on multiple fabrics. This lets you quickly deliver maximum application performance (even if you change or upgrade to new interconnects) without requiring major modifications to the software or operating systems.
Thread safety allows you to trace hybrid multithreaded MPI applications for optimal performance on multicore and manycore Intel® architectures.
Support for multi-endpoint communications lets an application efficiently split data communication among threads, maximizing interconnect utilization.
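A minimal sketch of the hybrid pattern these features support, written with standard MPI calls (MPI_Init_thread plus one duplicated communicator per OpenMP thread) rather than any Intel-specific multi-endpoint API: each thread drives its own share of the communication on its own communicator.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define NTHREADS 4

    int main(int argc, char **argv)
    {
        int provided;
        /* Ask for full multithreaded support so any thread may call MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;

        /* One communicator per thread, duplicated serially up front:
         * concurrent collectives on the same communicator are not allowed. */
        MPI_Comm comms[NTHREADS];
        for (int t = 0; t < NTHREADS; t++)
            MPI_Comm_dup(MPI_COMM_WORLD, &comms[t]);

        #pragma omp parallel num_threads(NTHREADS)
        {
            int tid = omp_get_thread_num();
            int sendbuf = rank, recvbuf = -1;
            /* Each thread exchanges with its ring neighbors on its own
             * communicator, so threads never contend for one channel. */
            MPI_Sendrecv(&sendbuf, 1, MPI_INT, next, 0,
                         &recvbuf, 1, MPI_INT, prev, 0,
                         comms[tid], MPI_STATUS_IGNORE);
        }

        for (int t = 0; t < NTHREADS; t++)
            MPI_Comm_free(&comms[t]);
        MPI_Finalize();
        return 0;
    }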
Improved startup scalability comes through the mpiexec.hydra process manager, which is:
a process management system for starting parallel jobs
designed to work natively with multiple launchers and resource managers, such as ssh, rsh, pbs, slurm, and sge
The library provides an accelerated, universal, multifabric layer for fast interconnects via OFI, including for these configurations:
Transmission Control Protocol (TCP) sockets
Interconnects based on Remote Direct Memory Access (RDMA), including Ethernet and InfiniBand
The library accomplishes this by establishing connections dynamically, only when needed, which reduces the memory footprint. It also automatically chooses the fastest transport available.
Develop MPI code independent of the fabric, knowing it will run efficiently on whatever network you choose at run time (see the sketch after this list).
Use a two-phase communication buffer-enlargement capability to allocate only the memory space required.
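For example, in the hypothetical one-dimensional halo exchange sketched below, each rank only ever talks to its two neighbors, so a library that establishes connections on demand never wires up a full all-to-all mesh; the code itself again contains nothing fabric-specific.

    #include <mpi.h>

    #define N 1024   /* interior cells per rank */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local[N + 2] = {0};  /* interior cells plus two halo cells */
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Only neighboring ranks communicate, so connections are set up
         * for those pairs alone; edge ranks use MPI_PROC_NULL no-ops. */
        MPI_Sendrecv(&local[1], 1, MPI_DOUBLE, left,  0,
                     &local[N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&local[N], 1, MPI_DOUBLE, right, 0,
                     &local[0], 1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }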
Application Binary Interface Compatibility
An application binary interface (ABI) is the low-level nexus between two program modules. It determines how functions are called, as well as the size, layout, and alignment of data types. With ABI compatibility, applications conform to the same set of runtime naming conventions.
Intel MPI Library offers ABI compatibility with existing MPI-1.x and MPI-2.x applications. So even if you are not ready to move to the MPI 3.1 standard, you can take advantage of the library’s performance improvements without recompiling, simply by using its runtimes.
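Since ABI compatibility means the runtime underneath a binary can change without recompiling, it is often useful to confirm at run time which MPI library was actually loaded. The sketch below uses the standard MPI-3 query MPI_Get_library_version; no Intel-specific calls are assumed.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char version[MPI_MAX_LIBRARY_VERSION_STRING];
        int len;

        MPI_Init(&argc, &argv);

        /* Report which MPI runtime this binary was loaded against; with
         * ABI compatibility, this can change without recompiling. */
        MPI_Get_library_version(version, &len);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            printf("%s\n", version);

        MPI_Finalize();
        return 0;
    }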
Performance and Tuning Utilities
Two additional utilities help you achieve top performance from your applications.
Intel® MPI Benchmarks provide a set of MPI performance measurements for point-to-point and global communication operations across a range of message sizes. Run all of the supported benchmarks, or specify a single executable file on the command line to get results for a particular subset.
The generated benchmark data fully characterizes the performance of a cluster system, including node performance, network latency, and throughput.
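The core of such a measurement is a ping-pong loop. The hypothetical two-rank sketch below times round trips for a single message size; Intel MPI Benchmarks automate this across message sizes and operations, with proper warm-up and statistics.

    #include <mpi.h>
    #include <stdio.h>

    #define MSG_SIZE 1024   /* bytes per message (a benchmark sweeps many sizes) */
    #define REPS     1000

    int main(int argc, char **argv)
    {
        int rank;
        char buf[MSG_SIZE] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);    /* start all ranks together */
        double t0 = MPI_Wtime();

        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {            /* rank 0 sends, then awaits the echo */
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {     /* rank 1 echoes every message back */
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }                           /* any other ranks simply idle */
        }

        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("avg one-way latency: %.2f us\n",
                   (t1 - t0) / (2.0 * REPS) * 1e6);

        MPI_Finalize();
        return 0;
    }

Launched with two ranks (mpiexec -n 2), rank 0 reports the average one-way latency.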
The library provides a robust set of default parameters that you can use as is or refine to ensure the highest performance. To tune parameters beyond the defaults, use the mpitune utility to adjust your cluster or application settings, then iterate until you achieve the best performance.