In collaboration with Facebook*, Intel has incorporated many optimizations directly into PyTorch* to provide superior performance on Intel architecture. The Intel optimization for PyTorch* provides the binary version of the latest PyTorch release for CPUs, and further adds Intel extensions and bindings for the oneAPI Collective Communications Library (oneCCL) for efficient distributed training.
The Intel extension, Intel® Optimization for PyTorch, extends PyTorch with optimizations for an extra performance boost on Intel hardware. Most of these optimizations will eventually be included in stock PyTorch releases; the intention of the extension is to deliver up-to-date features and optimizations for PyTorch on Intel hardware. Examples include Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX). You can get more detailed info here.
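As a minimal sketch of how the extension is typically applied, the snippet below optimizes a small model for inference, assuming the extension is installed as the `intel_extension_for_pytorch` package and exposes an `ipex.optimize()` entry point; it falls back to stock PyTorch when the package is absent.

```python
import torch
import torch.nn as nn

# A small example model to optimize.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# The extension is optional: most of its optimizations land in stock
# PyTorch eventually, so fall back gracefully if it is not installed.
# (Package name and API are assumptions based on the extension's releases.)
try:
    import intel_extension_for_pytorch as ipex
    model = ipex.optimize(model)  # apply Intel-specific operator optimizations
except ImportError:
    pass

with torch.no_grad():
    out = model(torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 10])
```

The same pattern applies to training, where the optimizer is passed alongside the model so both can be rewritten for Intel hardware.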
To improve the performance of distributed training, the torch-ccl PyTorch module implements the PyTorch C10D ProcessGroup API for the Intel® oneAPI Collective Communications Library (oneCCL). Intel oneCCL is a library for efficient distributed deep learning training that implements collectives such as allreduce, allgather, and alltoall. For more information on oneCCL, please refer to the oneCCL documentation. The torch-ccl module can be dynamically loaded as an external ProcessGroup and currently works only on Linux. You can get more detailed info here.
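A minimal sketch of using torch-ccl through the C10D ProcessGroup API is shown below. It assumes importing the module (the `torch_ccl` package name is an assumption; newer releases of the bindings use a different name) registers a `"ccl"` backend with `torch.distributed`, and falls back to the built-in `"gloo"` backend so the sketch also runs without the module.

```python
import os
import torch
import torch.distributed as dist

# Importing torch-ccl registers the "ccl" backend with torch.distributed
# (Linux only). Fall back to "gloo" if the module is not installed.
try:
    import torch_ccl  # noqa: F401 -- package name is an assumption
    backend = "ccl"
except ImportError:
    backend = "gloo"

# Single-process setup for illustration; a real job launches one process
# per rank (e.g. via mpirun) with matching rank/world_size values.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend=backend, rank=0, world_size=1)

# allreduce: every rank ends up with the element-wise sum across ranks.
t = torch.ones(4)
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(t)  # with world_size == 1 the tensor is unchanged

dist.destroy_process_group()
```

In a multi-process job, the same `all_reduce` call is what oneCCL accelerates, overlapping communication with computation across ranks.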
See the article Intel and Facebook collaborate to Boost PyTorch CPU Performance and the tutorial website for more details on performance accelerations.
Please follow the installation instructions on the Github* page.
- Intel® Optimization for PyTorch - More detailed information can be found on the tutorial website.
- torch-ccl module
Alternatively, the Intel® AI Analytics Toolkit bundles the complete Intel optimization for PyTorch package: binaries from the latest PyTorch release, Intel® Optimization for PyTorch, and the torch-ccl module. You can find more detailed information about the toolkit here.
We have open-sourced sample code for Intel® Optimization for PyTorch* on Github. You can find more detailed information here.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.