Apache* MXNet* v1.2.0 optimized with Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN)


The Apache MXNet community recently announced the v1.2.0 release of the Apache MXNet deep learning framework. One of the most important features in this release is the Intel-optimized CPU backend: MXNet now integrates with Intel MKL-DNN to accelerate neural network operators. The accelerated operators include Convolution, Deconvolution, FullyConnected, Pooling, Batch Normalization, Activation, LRN, and Softmax, as well as common operators such as sum and concat. More details are available in the release notes and release blog. In this blog, we detail how to install the v1.2.0 release and showcase the resulting performance benefits on Intel® Xeon® Scalable processors.

Performance Improvement

Latency Optimization

When deploying a machine learning (ML) or deep learning (DL) model, low latency is essential. In the latest MXNet release, specific optimizations using Intel MKL-DNN reduce latency for better real-time results, especially at small batch sizes.

As the following chart shows, the latency of single-image inference (batch size 1) is significantly decreased.

NOTE: Latency in milliseconds can be calculated as (1000 * batch size / throughput), with throughput measured in images/second.
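To make the conversion in the note concrete, here is a minimal sketch of the formula in Python. The function name `latency_ms` is our own, chosen for illustration; the two throughput values are the ResNet-50 figures quoted in the Batch Scalability section of this post.

```python
def latency_ms(throughput_images_per_sec: float, batch_size: int) -> float:
    """Return per-batch latency in ms: 1000 * batch_size / throughput."""
    if throughput_images_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 * batch_size / throughput_images_per_sec


# ResNet-50 throughput figures quoted in the Batch Scalability section.
print(round(latency_ms(83.7, 1), 2))    # batch size 1  -> 11.95
print(round(latency_ms(199.3, 32), 2))  # batch size 32 -> 160.56
```

Note that the result is the time to process one whole batch, so a larger batch size gives higher throughput but also a longer wait for each individual result.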

Throughput Improvement

For larger batch sizes, such as a batch size (BS) of 32, throughput improves significantly thanks to the integration with Intel MKL-DNN.

As the following chart shows, throughput at batch size 32 is about 23.4x to 56.9x higher than with the original backend.
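Throughput here means images processed per second. The sketch below shows one way such a number could be measured; it is not the benchmark script used for the charts in this post. The names `measure_throughput` and `dummy_batch` are hypothetical, and `dummy_batch` is a plain-Python stand-in for a model forward pass.

```python
import time
from typing import Callable


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       iterations: int = 20, warmup: int = 5) -> float:
    """Return images/second for a callable that processes one batch per call."""
    # Warm-up runs are excluded from timing so one-time setup cost is not counted.
    for _ in range(warmup):
        run_batch()
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * iterations / elapsed


def dummy_batch() -> None:
    # Hypothetical stand-in for a forward pass over one batch.
    sum(i * i for i in range(10000))


throughput = measure_throughput(dummy_batch, batch_size=32)
print(f"{throughput:.1f} images/second")
```

With MXNet, which executes operators asynchronously, `run_batch` would also need to block until the output is ready (for example by calling `asnumpy()` on the result); otherwise the loop only measures how fast work is queued.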

Batch Scalability

Additionally, integrating Intel MKL-DNN has improved scalability across batch sizes. In the chart below, throughput for the non-optimized backend stays flat at ~8 images/second regardless of batch size.

With Apache MXNet v1.2.0, batch scalability has improved: ResNet-50 throughput rises from 83.7 images/second (BS=1) to 199.3 images/second (BS=32).

Raw Data

Benchmark script
Commands to reproduce the results:

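# Pin OpenMP threads to cores
export KMP_AFFINITY=granularity=fine,compact,1,0
# Count logical CPUs (vCPUs)
export vCPUs=$(grep -c ^processor /proc/cpuinfo)
# Use half as many OpenMP threads as logical CPUs
export OMP_NUM_THREADS=$((vCPUs / 2))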
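For readers who launch their workload from Python rather than a shell, the same settings can be sketched as follows. The helper name `recommended_env` is hypothetical. Unlike the shell version, this sketch clamps the thread count to at least 1 so that a single-vCPU machine does not end up with `OMP_NUM_THREADS=0`.

```python
import os


def recommended_env(logical_cpus: int) -> dict:
    """Mirror the exports above: same affinity string, half the logical CPUs."""
    return {
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
        # Assumption: clamp to 1 so tiny machines still get a valid value.
        "OMP_NUM_THREADS": str(max(1, logical_cpus // 2)),
    }


env = recommended_env(os.cpu_count() or 1)
# Apply the settings to the current process environment.
os.environ.update(env)
print(env)
```

The variables should be set before the library that uses OpenMP is imported, since OpenMP runtimes typically read them at initialization.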

Installation guide

Install from PyPI

● Install prerequisites: wget and latest pip (if needed)
$ sudo apt-get update
$ sudo apt-get install -y wget python gcc
$ wget https://bootstrap.pypa.io/get-pip.py && sudo python get-pip.py

● Install MXNet with MKL-DNN acceleration
The MKL-DNN backend is available starting with MXNet 1.2.0.
$ pip install mxnet-mkl==1.2.0 [--user]
Please note that the mxnet-mkl package is built with USE_BLAS=openblas. If you want the additional performance boost from MKL BLAS, build MXNet from source instead.

● Install MXNet without MKL-DNN acceleration
$ pip install mxnet==1.2.0 [--user]
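After either pip install, a quick check can confirm which version Python actually picks up. The sketch below uses a hypothetical helper, `installed_mxnet_version`, and only reports the version string; it does not tell you whether MKL-DNN acceleration is active.

```python
import importlib


def installed_mxnet_version():
    """Return the installed MXNet version string, or None if it cannot be imported."""
    try:
        mx = importlib.import_module("mxnet")
    except (ImportError, OSError):
        # Not installed, or a native library failed to load.
        return None
    return getattr(mx, "__version__", None)


version = installed_mxnet_version()
print("MXNet not installed" if version is None else f"MXNet {version}")
```

If both `mxnet` and `mxnet-mkl` have been installed in the same environment, it is worth uninstalling one of them first, since both provide the same `mxnet` module.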

Install from source code

● Download MXNet source code from GitHub
$ git clone --recursive https://github.com/apache/incubator-mxnet
$ cd incubator-mxnet
$ git checkout 1.2.0
$ git submodule update --init --recursive

● Build with MKL-DNN backend

Note 1: When the build command is run, MKL-DNN is downloaded and built automatically.
Note 2: The MKL2017 backend has been removed from the MXNet master branch, so users can no longer build MXNet with the MKL2017 backend from source.
Note 3: To use MKL as the BLAS library, users may need to install Intel® Parallel Studio for best performance.
Note 4: If MXNet cannot find the MKLML libraries, first add the MKLML library path to LD_LIBRARY_PATH and LIBRARY_PATH.

HW Configuration

System Info

Important Official Pages

Access and Installation

Performance Tuning








