Getting Started with Intel® Optimization for PyTorch*

Published: 03/26/2019  

Last Updated: 03/26/2019

By Nathan G Greeneltch, Jing Xu, and Shailendrsingh Kishore Sobhee

In collaboration with Facebook*, PyTorch* is now directly combined with many Intel optimizations to provide superior performance on Intel architecture. The Intel® Optimization for PyTorch* provides the binary version of the latest PyTorch release for CPUs and adds Intel extensions and bindings with the oneAPI Collective Communications Library (oneCCL) for efficient distributed training.

The Intel extension, Intel® Extension for PyTorch (IPEX), aims to improve the out-of-the-box user experience of PyTorch* on CPU while achieving good performance. The extension will also serve as the pull-request (PR) buffer for the Intel PyTorch framework development team. The PR buffer will contain not only functions but also optimizations (for example, ones that take advantage of new Intel hardware features). You can get more detailed info here.
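As a hypothetical sketch of how IPEX is used, the snippet below follows the legacy `intel_pytorch_extension` API of this era, in which modules and tensors are moved to `ipex.DEVICE`. Later IPEX releases expose a different interface (`intel_extension_for_pytorch.optimize`), so treat the names here as assumptions to adjust for your installed version.

```python
def run_on_ipex(model, sample_input):
    """Run one inference pass with IPEX (legacy intel_pytorch_extension API).

    The import is done lazily so this sketch can be defined even where
    IPEX is not installed; calling it requires PyTorch* plus IPEX.
    """
    import intel_pytorch_extension as ipex  # lazy import: requires IPEX

    model = model.to(ipex.DEVICE)            # move weights to the IPEX device
    sample_input = sample_input.to(ipex.DEVICE)
    return model(sample_input)               # inference runs on the IPEX device
```

The lazy import keeps the helper importable everywhere; only invoking it needs the extension.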

To raise the performance of distributed training, a PyTorch* module, torch-ccl, implements the PyTorch* C10D ProcessGroup API for Intel® oneCCL (collective communications library). Intel oneCCL is a library for efficient distributed deep learning training that implements collectives such as allreduce, allgather, and alltoall. For more information on oneCCL, please refer to the oneCCL documentation. torch-ccl can be loaded dynamically as an external ProcessGroup and currently works only on Linux. You can get more detailed info here.
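The registration step described above can be sketched as follows. This is a minimal, hedged example assuming the torch-ccl convention that importing the module registers the "ccl" backend with torch.distributed; the helper name, default address, and port are illustrative only.

```python
def init_ccl_process_group(rank, world_size,
                           master_addr="127.0.0.1", master_port="29500"):
    """Initialize torch.distributed with the oneCCL backend (Linux only).

    Imports are lazy so the sketch can be defined without PyTorch* or
    torch-ccl installed; calling it requires both.
    """
    import os
    import torch.distributed as dist
    import torch_ccl  # noqa: F401  side effect: registers the "ccl" backend

    # Rendezvous settings read by init_process_group's default env:// method
    os.environ.setdefault("MASTER_ADDR", master_addr)
    os.environ.setdefault("MASTER_PORT", master_port)
    dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)
    return dist
```

After initialization, collectives such as `dist.all_reduce` would run through oneCCL instead of the default Gloo backend.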

See the article Intel and Facebook* collaborate to Boost PyTorch* CPU Performance for more details on recent performance accelerations.


  • Install via Intel® AI Analytics Toolkit

Intel® AI Analytics Toolkit includes the entire Intel® Optimization for PyTorch package: binaries from the latest PyTorch release, the Intel® Extension for PyTorch (IPEX), and torch-ccl. There are multiple ways to get the toolkit and its components. It is distributed through several channels: Anaconda, Docker containers, package managers (Yum, Apt, Zypper), and an online/offline installer from Intel. To download Intel Optimization for PyTorch from the AI Analytics Toolkit, visit here and choose the installation method of your choice. You can find more detailed information about the toolkit here.

  • Install via alternative methods for individual component

Please follow the installation instructions on the GitHub page of each individual component.

Sanity Check

You can check whether these components are installed with `pip list`, or from Python with the snippet below.

import torch
import intel_pytorch_extension as ipex  # import succeeds only if IPEX is installed

# Query which acceleration back ends were compiled into this PyTorch build
mkldnn_enabled = torch.backends.mkldnn.is_available()
mkl_enabled = torch.backends.mkl.is_available()
openmp_enabled = torch.backends.openmp.is_available()
print('mkldnn : {0},  mkl : {1}, openmp : {2}'.format(mkldnn_enabled, mkl_enabled, openmp_enabled))


Getting Started

We have open sourced sample code for Intel® Optimization for PyTorch* on GitHub. You can find more detailed information here.

Performance Considerations

For performance considerations of PyTorch running on Intel® architecture processors, please refer to the Data Layout, Non-Uniform Memory Access (NUMA) Controls Affecting Performance, and oneDNN Technical Performance Considerations sections of Maximize TensorFlow* Performance on CPU: Considerations and Recommendations for Inference Workloads.
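As a hedged illustration of the NUMA and threading controls that article discusses, the fragment below pins a run to one socket with `numactl` and sets common OpenMP knobs. The values and the script name are placeholders; tune thread counts to your machine's physical core count.

```shell
# Hypothetical tuning fragment: values below are examples, not recommendations.
export OMP_NUM_THREADS=28                        # e.g. physical cores per socket
export KMP_AFFINITY=granularity=fine,compact,1,0 # pin OpenMP threads to cores
export KMP_BLOCKTIME=1                           # short spin time after parallel regions

# Bind both CPU and memory allocation to NUMA node 0 (socket 0)
numactl --cpunodebind=0 --membind=0 python your_inference_script.py
```

Binding CPU and memory to the same node avoids remote-memory accesses, which is the main NUMA effect the referenced article describes.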

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at