Built on top of lower-level communication middleware – MPI and libfabric
Optimized to scale communication patterns by allowing compute resources to be traded for communication performance
Enables a set of DL-specific optimizations, such as prioritization, persistent operations, and out-of-order execution
DPC++-aware API to run across various hardware targets, such as CPUs and GPUs
Works across various interconnects: Intel® Omni-Path Architecture (Intel® OPA), InfiniBand*, and Ethernet
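
To make the prioritization and out-of-order execution items above more concrete, here is a minimal, library-independent sketch in C++. It does not use or represent the library's actual API: `PendingOp`, `ByPriority`, and `ToyScheduler` are hypothetical names invented for illustration. The assumed scenario is the usual one in data-parallel training: gradients of the last layers are ready first during the backward pass, but gradients of the first layers are needed first in the next forward pass, so it can help to let more urgent operations overtake earlier submissions.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical descriptor for a pending collective operation.
// This is an illustrative type, not part of any real library.
struct PendingOp {
    std::string name;   // e.g. the tensor whose gradients are being reduced
    int priority;       // larger value = more urgent
    std::size_t seq;    // submission order, used only to break ties
};

// Orders operations so that the most urgent one is on top of the queue.
// Among equal priorities, the earlier submission wins.
struct ByPriority {
    bool operator()(const PendingOp& a, const PendingOp& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;
    }
};

// Toy scheduler: accepts operations in submission order and
// reports the order in which it would start them.
class ToyScheduler {
public:
    void submit(const std::string& name, int priority) {
        queue_.push(PendingOp{name, priority, next_seq_++});
    }

    // Returns operation names in execution order, highest priority first.
    std::vector<std::string> drain() {
        std::vector<std::string> order;
        while (!queue_.empty()) {
            order.push_back(queue_.top().name);
            queue_.pop();
        }
        return order;
    }

private:
    std::priority_queue<PendingOp, std::vector<PendingOp>, ByPriority> queue_;
    std::size_t next_seq_ = 0;
};
```

In this sketch, submitting the gradients of layer 3, layer 2, and layer 1 in that order with priorities 0, 1, and 2 makes `drain()` return them in the reverse order, so the reduction needed soonest is started first. A real implementation would also have to deal with in-flight operations, chunking of large messages, and agreement on ordering across ranks, none of which is modeled here.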