Introduction Series for Intel® Software Optimization for Chainer*
Vol 1: Getting Started - Installation instructions for Intel® Software Optimization for Chainer* and a getting-started guide.
Vol 2: Performance Considerations - Introduces hardware and software configurations that fully utilize CPU computation resources with Intel Software Optimization for Chainer.
Vol 3: Performance Numbers [In Progress] - Introduces performance numbers for Intel Software Optimization for Chainer.
Chainer*, a Python*-based deep learning framework, is younger than Caffe* or TensorFlow* but growing rapidly. Similar to PyTorch*, Chainer is a dynamic framework that allows users to define neural networks on the fly at run time. Intel provides CPU acceleration for the framework in a package called Intel® Software Optimization for Chainer*.
Installing Chainer is as simple as running the following pip command.
pip install chainer
Note: Ubuntu* and CentOS* are the recommended operating systems for Chainer. Chainer should also run correctly on Windows* or macOS*, but this is not officially guaranteed.
Users can also find detailed installation information on the Chainer help page.
Intel provides acceleration for Chainer through an open source library, Chainer Backend for Intel Architecture (iDeep). Users can find more detailed information and the source code on its GitHub page. Intel also provides a Python package, ideep4py, via pip and conda. Users can run either of the following commands to install this acceleration package.
pip install ideep4py
conda install -c intel ideep4py
If you would like to compile this acceleration library from source code, you need Python's setuptools.

On CentOS*:

$ git submodule update --init
$ mkdir build && cd build
$ cmake3 ..
$ cd ../python
$ python setup.py install
On other Linux distributions:
$ git submodule update --init
$ mkdir build && cd build
$ cmake ..
$ cd ../python
$ python setup.py install
Note: ideep4py v1.0.x is incompatible with v2.0.x, and is not supported in Chainer v5.0 or later.
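The compatibility note above can be expressed as a small check. This is a minimal sketch only; ideep_compatible is a hypothetical helper name, not a function shipped by Chainer or ideep4py.

```python
def ideep_compatible(chainer_version: str, ideep_version: str) -> bool:
    """Encode the rule stated above: ideep4py v1.0.x is not
    supported in Chainer v5.0 or later (hypothetical helper)."""
    chainer_major = int(chainer_version.split('.')[0])
    ideep_major = int(ideep_version.split('.')[0])
    if chainer_major >= 5 and ideep_major < 2:
        return False
    return True

print(ideep_compatible('5.4.0', '2.0.0'))  # True
print(ideep_compatible('5.4.0', '1.0.4'))  # False
```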
For detailed instructions on installing Chainer Backend for Intel Architecture, please refer to its GitHub page.
Docker images are also available on the Docker Hub site. Users can choose from the list according to their platform.
docker pull chainer/chainer:latest-intel-python2
docker run -it chainer/chainer:latest-intel-python2 /bin/bash
How to Enable Intel Acceleration:
Currently, Intel acceleration support is disabled by default. To enable this feature, set the following environment variable before running your code.

export CHAINER_USE_IDEEP="auto"
Alternatively, you can call chainer.using_config() in your code to change the configuration.
import numpy as np
import chainer

x = np.ones((3, 3), dtype='f')
with chainer.using_config('use_ideep', 'auto'):
    y = chainer.functions.relu(x)
print(type(y.data))
Users can find more detailed information on the Chainer tips page.
Once Chainer is installed, you can use its official examples to explore its functionality.
wget https://github.com/chainer/chainer/archive/v5.4.0.tar.gz
tar xzf v5.4.0.tar.gz
python chainer-5.4.0/examples/mnist/train_mnist.py
Because Intel's acceleration is not enabled by default, you will get the following results (running on the CPU).
$ python chainer-5.4.0/examples/mnist/train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.19257     0.0864364             0.94185        0.9719                    10.465
2           0.073246    0.081475              0.97715        0.9729                    22.1638
3           0.0474693   0.0714523             0.9846         0.9781                    34.3896
4           0.0357018   0.0829585             0.988383       0.9763                    46.9038
5           0.028669    0.0765507             0.99085        0.9797                    59.5154
6           0.0237237   0.0777785             0.991883       0.9794                    72.4904
7           0.0215181   0.0851396             0.993217       0.9793                    85.7599
8           0.015366    0.0676308             0.995033       0.9832                    99.0905
9           0.0162062   0.0955388             0.99495        0.9786                    112.692
10          0.0157223   0.0833277             0.9948         0.9826                    126.432
11          0.0127758   0.0961516             0.996067       0.9791                    140.72
12          0.0150901   0.0871599             0.995233       0.98                      155.477
13          0.00863747  0.0959247             0.997133       0.9789                    170.877
14          0.0159231   0.0899868             0.995067       0.9821                    186.494
15          0.00784604  0.102112              0.997517       0.9795                    202.942
16          0.0108176   0.108022              0.997          0.9789                    219.78
17          0.00667875  0.107687              0.9982         0.9814                    237.004
18          0.0110276   0.101468              0.996767       0.9821                    254.609
19          0.00907735  0.113613              0.9974         0.9806                    272.191
20          0.0104995   0.0986947             0.9973         0.9827                    290.137
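Note that the elapsed_time column in this output is cumulative. A minimal sketch for recovering per-epoch wall time from it (per_epoch_times is a hypothetical helper; the sample values are the first three epochs of the run above):

```python
def per_epoch_times(cumulative):
    """Convert cumulative elapsed_time values (seconds) into
    per-epoch durations by differencing successive entries."""
    prev = 0.0
    durations = []
    for t in cumulative:
        durations.append(t - prev)
        prev = t
    return durations

# First three cumulative elapsed_time values from the log above
print(per_epoch_times([10.465, 22.1638, 34.3896]))
```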
To enable Intel's acceleration feature, you need to confirm the following two requirements:
- ideep4py is installed.
- The environment variable CHAINER_USE_IDEEP is set, e.g. export CHAINER_USE_IDEEP="auto".
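The two requirements above can be checked from Python before launching a training run. This is a minimal sketch using only the standard library; ideep_requirements is a hypothetical helper name (Chainer performs equivalent checks internally when use_ideep is 'auto'):

```python
import importlib.util
import os

def ideep_requirements():
    """Check the two prerequisites for Intel acceleration:
    the ideep4py package is importable, and CHAINER_USE_IDEEP
    is set to an enabling value (hypothetical helper)."""
    has_ideep = importlib.util.find_spec('ideep4py') is not None
    use_ideep = os.environ.get('CHAINER_USE_IDEEP', 'never')
    return has_ideep, use_ideep in ('auto', 'always')

os.environ.setdefault('CHAINER_USE_IDEEP', 'auto')
installed, enabled = ideep_requirements()
print(f"ideep4py installed: {installed}, CHAINER_USE_IDEEP enabled: {enabled}")
```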
Now when you run the example again with the environment variable MKLDNN_VERBOSE set to 1, you will see verbose messages confirming that Intel's optimizations are accelerating your workload.
$ MKLDNN_VERBOSE=1 python chainer-5.4.0/examples/mnist/train_mnist.py
GPU: -1
# unit: 1000
# Minibatch-size: 100
# epoch: 20
mkldnn_verbose,exec,inner_product,gemm:blas,forward_training,fsrc:nc fwei:oi fbia:x fdst:nc,,mb100ic784oc1000,57.0559
mkldnn_verbose,exec,eltwise,jit:avx512_common,forward_training,fdata:nc fdiff:undef,alg:eltwise_relu,mb100ic1000ih32593iw0,0.240967
mkldnn_verbose,exec,inner_product,gemm:blas,forward_training,fsrc:nc fwei:oi fbia:x fdst:nc,,mb100ic1000oc1000,0.994873
mkldnn_verbose,exec,eltwise,jit:avx512_common,forward_training,fdata:nc fdiff:undef,alg:eltwise_relu,mb100ic1000ih32593iw0,0.107178
......
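Each mkldnn_verbose line is comma-separated, and the last field is the primitive's execution time in milliseconds. A small sketch that aggregates time per primitive type; the two sample lines are taken verbatim from the output above:

```python
from collections import defaultdict

# Two sample mkldnn_verbose lines captured from the run above
lines = [
    "mkldnn_verbose,exec,inner_product,gemm:blas,forward_training,"
    "fsrc:nc fwei:oi fbia:x fdst:nc,,mb100ic784oc1000,57.0559",
    "mkldnn_verbose,exec,eltwise,jit:avx512_common,forward_training,"
    "fdata:nc fdiff:undef,alg:eltwise_relu,mb100ic1000ih32593iw0,0.240967",
]

totals = defaultdict(float)
for line in lines:
    fields = line.split(',')
    # fields[2] is the primitive name; the last field is time in ms
    totals[fields[2]] += float(fields[-1])

for primitive, ms in totals.items():
    print(f"{primitive}: {ms:.3f} ms")
```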
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.