Intel® Omni-Path Fabric 100 Series

The next generation of High Performance Computing (HPC) fabrics

The Next-Generation Fabric

Intel® Omni-Path Architecture (Intel® OPA), an element of Intel® Scalable System Framework, delivers the performance for tomorrow’s high performance computing (HPC) workloads and the ability to scale to tens of thousands of nodes—and eventually more—at a price competitive with today’s fabrics. The Intel OPA 100 Series product line is an end-to-end solution of PCIe* adapters, silicon, switches, cables, and management software. As the successor to Intel® True Scale Fabric, this optimized HPC fabric is built upon a combination of enhanced IP and Intel® technology.

For software applications, Intel OPA will maintain consistency and compatibility with existing Intel True Scale Fabric and InfiniBand* APIs by working through the open source OpenFabrics Alliance (OFA) software stack on leading Linux* distribution releases. Intel True Scale Fabric customers will be able to migrate to Intel OPA through an upgrade program.

A New High Performance Fabric for HPC

Intersect360 Research reports in detail why the Intel® Omni-Path Architecture product line marks the most significant new interconnect for HPC since the introduction of InfiniBand*.

Read the white paper

Intel® Omni-Path Builds Bridges at Pittsburgh Supercomputing Center

Addison Snell, CEO of Intersect360 Research, speaks with Nick Nystrom, Ph.D., and Barry Davis about Pittsburgh Supercomputing Center's impressive Bridges system.

Listen to the podcast

The Future of High Performance Fabrics

Current standards-based high performance fabrics, such as InfiniBand*, were not originally designed for HPC, resulting in performance and scaling weaknesses that are currently impeding the path to Exascale computing. Intel® Omni-Path Architecture is being designed specifically to address these issues and scale cost-effectively from entry level HPC clusters to larger clusters with 10,000 nodes or more. To improve on the InfiniBand specification and design, Intel is using the industry’s best technologies including those acquired from QLogic and Cray alongside Intel® technologies.

While both Intel OPA and InfiniBand Enhanced Data Rate (EDR) will run at 100Gbps, there are many differences. The enhancements of Intel OPA will help enable the progression towards Exascale while cost-effectively supporting clusters of all sizes with optimization for HPC applications at both the host and fabric levels for benefits that are not possible with the standard InfiniBand-based designs.

Intel OPA is designed to provide the:

  • Features and functionality at both the host and fabric levels to greatly raise levels of scaling
  • CPU and fabric integration necessary for the increased computing density, improved reliability, reduced power, and lower costs required by significantly larger HPC deployments
  • Fabric tools to readily install, verify, and manage fabrics at this level of complexity

Intel® Omni-Path Key Fabric Features and Innovations

Adaptive Routing

Adaptive Routing monitors the performance of the possible paths between fabric end-points and selects the least congested path to rebalance the packet load. While other technologies also support routing, the implementation is vital. Intel’s implementation is based on cooperation between the Fabric Manager and the switch ASICs. The Fabric Manager—with a global view of the topology—initializes the switch ASICs with several egress options per destination, updating these options as the fundamental fabric changes when links are added or removed. Once the switch egress options are set, the Fabric Manager monitors the fabric state, and the switch ASICs dynamically monitor and react to the congestion sensed on individual links. This approach enables Adaptive Routing to scale as fabrics grow larger and more complex.

Dispersive Routing

One of the critical roles of fabric management is the initialization and configuration of routes through the fabric between pairs of nodes. Intel® Omni-Path Fabric supports a variety of routing methods, including defining alternate routes that disperse traffic flows for redundancy, performance, and load balancing. Instead of sending all packets from a source to a destination via a single path, Dispersive Routing distributes traffic across multiple paths. Once received, packets are reassembled in their proper order for rapid, efficient processing. By leveraging more of the fabric to deliver maximum communications performance for all jobs, Dispersive Routing promotes optimal fabric efficiency.

Traffic Flow Optimization

Traffic Flow Optimization optimizes the quality of service beyond selecting the priority—based on virtual lane or service level—of messages to be sent on an egress port. At the Intel® Omni-Path Architecture link level, variable length packets are broken up into fixed-sized containers that are in turn packaged into fixed-sized Link Transfer Packets (LTPs) for transmitting over the link. Since packets are broken up into smaller containers, a higher priority container can request a pause and be inserted into the ISL data stream before completing the previous data.

The key benefit is that Traffic Flow Optimization reduces the variation in latency seen through the network by high priority traffic in the presence of lower priority traffic. It addresses a traditional weakness of both Ethernet and InfiniBand* in which a packet must be transmitted to completion once the link starts even if higher priority packets become available.

Packet Integrity Protection

Packet Integrity Protection allows for rapid and transparent recovery of transmission errors between a sender and a receiver on an Intel® Omni-Path Architecture link. Given the very high Intel® OPA signaling rate (25.78125G per lane) and the goal of supporting large scale systems of a hundred thousand or more links, transient bit errors must be tolerated while ensuring that the performance impact is insignificant. Packet Integrity Protection enables recovery of transient errors whether it is between a host and switch or between switches. This eliminates the need for transport level timeouts and end-to-end retries. This is done without the heavy latency penalty associated with alternate error recovery approaches.

Dynamic Lane Scaling

Dynamic Lane Scaling allows an operation to continue even if one or more lanes of a 4x link fail, saving the need to restart or go to a previous checkpoint to keep the application running. The job can then run to completion before taking action to resolve the issue. Currently, InfiniBand* typically drops the whole 4x link if any of its lanes drops, costing time and productivity.

Ask How Intel® Omni-Path Architecture Can Meet Your HPC Needs

Intel is clearing the path to Exoscale computing and addressing tomorrow’s HPC issues. Contact your Intel representative or any authorized Intel® True Scale Fabric provider to discuss how Intel® Omni-Path Architecture can improve the performance of your future HPC workloads.

Watch as Barry Davis from Intel describes how the company is moving forward with Intel® Omni-Path Architecture for high performance computing.

Watch the video

Related Videos