Getting Started with Intel® Software Optimization for Chainer* CPU Platforms

Published: 04/18/2019  

Last Updated: 04/17/2019

By Jing Xu

Introduction Series for Intel® Software Optimization for Chainer*

Vol 1: Getting Started - Installation instructions for Intel® Software Optimization for Chainer* and a getting started guide.

Vol 2: Performance considerations - Introduces hardware and software configurations to fully utilize CPU computation resources with Intel Software Optimization for Chainer.

Vol 3: Performance numbers [In Progress] - Presents performance numbers for Intel Software Optimization for Chainer.


Chainer*, a Python*-based deep learning framework, is younger than Caffe* or TensorFlow* but growing rapidly. Similar to PyTorch*, Chainer is a dynamic framework that lets users define neural networks on the fly at run time. Intel provides CPU acceleration for the framework in a package called Intel® Software Optimization for Chainer*.

Installation:

Installing Chainer is as simple as running the following pip command.

pip install chainer

Note: Ubuntu* and CentOS* are the recommended operating systems for Chainer. Chainer should also run correctly on Windows* or macOS*, but there is no official guarantee.

Users can also find detailed installation information on the Chainer help page.

Intel accelerates Chainer through an open source library, Chainer Backend for Intel Architecture (iDeep). Detailed information and source code are available on its GitHub page. Intel also provides a Python package, ideep4py, via pip and conda. Run either of the following commands to install this acceleration package.

pip install ideep4py
conda install -c intel ideep4py
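After installation, a quick guarded check (a minimal sketch; it simply attempts the import) confirms that the package is importable:

```python
# Sanity check: try to import ideep4py and report whether it is installed.
try:
    import ideep4py  # noqa: F401
    have_ideep = True
except ImportError:
    have_ideep = False

print("ideep4py installed:", have_ideep)
```

If this prints False, Chainer will silently fall back to plain NumPy execution even when iDeep is requested.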

If you would like to compile the acceleration library from source, use Python's setuptools.

On CentOS:

$ git submodule update --init && mkdir build && cd build && cmake3 .. 
$ cd ../python 
$ python setup.py install

On other Linux distributions:

$ git submodule update --init && mkdir build && cd build && cmake .. 
$ cd ../python 
$ python setup.py install

Note: ideep4py v1.0.x is incompatible with v2.0.x and is not supported in Chainer v5.0 or later.

For detailed instructions on installing Chainer Backend for Intel Architecture, refer to its GitHub page.

Docker images are also available on Docker Hub. Users can choose an image from the list according to their platform.

docker pull chainer/chainer:latest-intel-python2
docker run -it chainer/chainer:latest-intel-python2 /bin/bash

How to Enable Intel Acceleration:

Currently, Intel acceleration support is disabled by default. To enable it, set the following environment variable before running your code.

export CHAINER_USE_IDEEP="auto"
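The same setting can also be applied from Python, as long as it happens before chainer is imported (a minimal sketch using only the standard library; Chainer reads the variable at import time):

```python
import os

# Must be set before `import chainer`, since Chainer reads
# CHAINER_USE_IDEEP when the package is first imported.
os.environ["CHAINER_USE_IDEEP"] = "auto"
print(os.environ["CHAINER_USE_IDEEP"])
```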

Alternatively, you can call chainer.using_config() in your code to change the configuration.

import numpy as np
import chainer

x = np.ones((3, 3), dtype='f')
with chainer.using_config('use_ideep', 'auto'):
    y = chainer.functions.relu(x)
# Prints ideep4py.mdarray when iDeep is active, numpy.ndarray otherwise.
print(type(y.data))

Users can find more details on the Chainer tips page.

Getting Started

Once Chainer is installed, you can run its official MNIST example to explore its functionality.

wget https://github.com/chainer/chainer/archive/v5.4.0.tar.gz
tar xzf v5.4.0.tar.gz
python chainer-5.4.0/examples/mnist/train_mnist.py

By default, Intel's acceleration is not enabled, so you will see results like the following (running on CPU).

$ python chainer-5.4.0/examples/mnist/train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.19257     0.0864364             0.94185        0.9719                    10.465
2           0.073246    0.081475              0.97715        0.9729                    22.1638
3           0.0474693   0.0714523             0.9846         0.9781                    34.3896
4           0.0357018   0.0829585             0.988383       0.9763                    46.9038
5           0.028669    0.0765507             0.99085        0.9797                    59.5154
6           0.0237237   0.0777785             0.991883       0.9794                    72.4904
7           0.0215181   0.0851396             0.993217       0.9793                    85.7599
8           0.015366    0.0676308             0.995033       0.9832                    99.0905
9           0.0162062   0.0955388             0.99495        0.9786                    112.692
10          0.0157223   0.0833277             0.9948         0.9826                    126.432
11          0.0127758   0.0961516             0.996067       0.9791                    140.72
12          0.0150901   0.0871599             0.995233       0.98                      155.477
13          0.00863747  0.0959247             0.997133       0.9789                    170.877
14          0.0159231   0.0899868             0.995067       0.9821                    186.494
15          0.00784604  0.102112              0.997517       0.9795                    202.942
16          0.0108176   0.108022              0.997          0.9789                    219.78
17          0.00667875  0.107687              0.9982         0.9814                    237.004
18          0.0110276   0.101468              0.996767       0.9821                    254.609
19          0.00907735  0.113613              0.9974         0.9806                    272.191
20          0.0104995   0.0986947             0.9973         0.9827                    290.137

To enable Intel's acceleration feature, confirm that the following two requirements are met.

  1. ideep4py has been installed.
  2. export CHAINER_USE_IDEEP="auto"

When you run the example again with the environment variable MKLDNN_VERBOSE set to 1, you will see verbose messages like the following, confirming that Intel's optimizations are accelerating the run.

$ MKLDNN_VERBOSE=1 python chainer-5.4.0/examples/mnist/train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20

mkldnn_verbose,exec,inner_product,gemm:blas,forward_training,fsrc:nc fwei:oi fbia:x fdst:nc,,mb100ic784oc1000,57.0559
mkldnn_verbose,exec,eltwise,jit:avx512_common,forward_training,fdata:nc fdiff:undef,alg:eltwise_relu,mb100ic1000ih32593iw0,0.240967
mkldnn_verbose,exec,inner_product,gemm:blas,forward_training,fsrc:nc fwei:oi fbia:x fdst:nc,,mb100ic1000oc1000,0.994873
mkldnn_verbose,exec,eltwise,jit:avx512_common,forward_training,fdata:nc fdiff:undef,alg:eltwise_relu,mb100ic1000ih32593iw0,0.107178
......
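Besides reading the verbose log, availability can be queried programmatically. A small guarded sketch, assuming Chainer v5's chainer.backends.intel64 module:

```python
# Guarded probe: report whether the iDeep backend is usable.
try:
    from chainer.backends import intel64
    ideep_status = intel64.is_ideep_available()
except ImportError:
    ideep_status = None  # chainer is not installed in this environment

print("iDeep available:", ideep_status)
```

A result of True means Chainer can dispatch supported operations to iDeep; False usually means ideep4py is missing or an incompatible version is installed.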

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.