Intel® CPU Excels in MLPerf* Reinforcement Learning Training

Published: 07/10/2019  

Last Updated: 07/10/2019

By Koichi Yamada, Wei Li, Guokai Ma, Nathan G Greeneltch, and Linlin Cheng

Today, the MLPerf* consortium, a group of 40 companies and university research institutes, published the second round of benchmark results, based on MLPerf training v0.6. MLPerf training benchmarks are designed to measure the performance of training workloads across cloud providers and on-premises hardware platforms.

In Intel’s MLPerf submission, we measured 77.95 minutes[1] to train MiniGo on a single node of a 2-socket Intel® Xeon® Platinum 9280 system. On a 32-node cluster of 2-socket Intel® Xeon® Platinum 8260L processor systems, we completed training of the MiniGo model in just 14.43 minutes[2]. These results demonstrate that 2nd generation Intel® Xeon® Scalable processors can deliver reinforcement learning (MiniGo) training times comparable to the best accelerator result in today’s MLPerf 0.6 publication, 13.57 minutes[3].

Reinforcement Learning is Gaining Momentum


MiniGo is a representative benchmark in MLPerf for reinforcement learning. Reinforcement learning is one of the hottest research areas in machine learning, with applications in robotics, autonomous driving, traffic control systems, finance, and games. What sets reinforcement learning apart from other machine learning methods is that it allows an algorithm to learn what to do by interacting with its environment and continuously receiving a reward or penalty for every action. As a result, reinforcement learning can learn to solve complex tasks, such as developing a strategy to win at the board game Go, that other standard machine learning techniques cannot easily handle.

MiniGo is inspired by the work done by DeepMind* in the following papers: "Mastering the Game of Go with Deep Neural Networks and Tree Search", "Mastering the Game of Go without Human Knowledge", and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". Our results used Intel® Optimization for TensorFlow*. The MiniGo algorithm closely follows AlphaGo Zero* by Google*. It is a full reinforcement learning workload. The algorithm starts from randomized weights and then runs self-play. The self-play results are used for training, and the newly trained model is used in the next iteration of self-play to reinforce the learning. MiniGo’s self-play inference is the most expensive part of the whole reinforcement learning pipeline. MLPerf has a version of MiniGo for TensorFlow, which is based on a fork of the MiniGo project.
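The loop described above (randomized weights, then self-play, then training, then self-play again with the updated weights) can be illustrated with a minimal sketch. This is not MiniGo code: there is no neural network and no tree search, and the toy game, the constants `NUM_MOVES` and `WINNING_MOVE`, and every function name are hypothetical stand-ins chosen only to show the shape of the iteration.

```python
import math
import random

NUM_MOVES = 4      # hypothetical toy game: choose one of four moves
WINNING_MOVE = 2   # hypothetical: the only move that wins the toy game


def policy(weights):
    """Softmax over per-move weights (shifted by the max for stability)."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def self_play(weights, num_games, rng):
    """Play games with the current weights; return (move, reward) records."""
    probs = policy(weights)
    records = []
    for _ in range(num_games):
        move = rng.choices(range(NUM_MOVES), weights=probs)[0]
        reward = 1.0 if move == WINNING_MOVE else -1.0
        records.append((move, reward))
    return records


def train(weights, records, learning_rate=0.1):
    """One REINFORCE-style policy-gradient update from self-play records."""
    probs = policy(weights)
    updated = list(weights)
    for move, reward in records:
        for action in range(NUM_MOVES):
            # Gradient of log softmax: indicator(action == move) - prob(action)
            grad = (1.0 if action == move else 0.0) - probs[action]
            updated[action] += learning_rate * reward * grad
    return updated


def reinforcement_loop(iterations, games_per_iteration, seed=0):
    """Alternate self-play and training, starting from randomized weights."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(NUM_MOVES)]
    for _ in range(iterations):
        records = self_play(weights, games_per_iteration, rng)  # generate data
        weights = train(weights, records)                       # learn from it
    return weights
```

Calling `policy(reinforcement_loop(200, 32))` should concentrate most of the probability on the hypothetical winning move, because each round of self-play is generated by the weights that the previous round of training produced.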

Intel® Xeon® Scalable processors are excellent multi-purpose hardware that runs many of the world’s cloud and datacenter workloads. We benchmarked the 2nd-Generation Intel® Xeon® Scalable processor family, including the Intel Xeon Platinum 9280 (56 cores) and 8260L (24 cores) processors, with 48 CPU cores for each node.

We optimized the self-play inference task with Intel® Deep Learning Boost (Intel® DL Boost) technology, available on 2nd-Generation Intel Xeon Scalable processors. This technology includes integer Vector Neural Network Instructions (VNNI), which provide high 8-bit inference throughput with a theoretical peak compute gain of 4x INT8 OPS over FP32 OPS (a vector register holds four times as many 8-bit values as 32-bit values, and VNNI multiplies and accumulates them in a single instruction). For optimal performance, we also ran the ‘self-play path’ and the ‘train path’ in parallel by allocating dedicated CPU compute resources to training and to self-play inference. The high core and thread counts of Intel Xeon processors allow us to divide compute resources at a fine-grained level between simulation and inference tasks and to run many instances in parallel across CPU cores, which makes self-play more compute efficient than coarse-grained resource allocation with a large-batch compute approach.
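As an illustration of dedicating cores to the train path and to many parallel self-play instances, here is a minimal sketch. The split used in the actual submission is not stated in this article, so the numbers in the usage note (8 training cores, 4 cores per self-play worker) are assumptions, as are the function names `partition_cores` and `pin_current_process`. The only real API used is Python's `os.sched_setaffinity`, which is available on Linux.

```python
import os


def partition_cores(total_cores, train_cores, cores_per_worker):
    """Split core IDs into one training set and several self-play sets.

    Cores left over after forming full self-play groups stay unassigned.
    """
    if not 0 < train_cores < total_cores:
        raise ValueError("train_cores must be between 1 and total_cores - 1")
    if cores_per_worker <= 0:
        raise ValueError("cores_per_worker must be positive")
    train_set = list(range(train_cores))
    remaining = list(range(train_cores, total_cores))
    worker_sets = [
        remaining[start:start + cores_per_worker]
        for start in range(0, len(remaining) - cores_per_worker + 1,
                           cores_per_worker)
    ]
    return train_set, worker_sets


def pin_current_process(core_ids):
    """Restrict the calling process to core_ids where the OS supports it."""
    if hasattr(os, "sched_setaffinity"):  # Linux-only API
        os.sched_setaffinity(0, core_ids)
        return True
    return False
```

With the 48 cores per node mentioned above, `partition_cores(48, 8, 4)` would reserve cores 0 to 7 for training and form ten 4-core self-play groups. Each worker process could then call `pin_current_process` with its own group so that training and self-play instances do not compete for the same cores.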

Intel Xeon processors are the standard for deep learning inference today. The Intel CPU MiniGo results show that Intel Xeon processors can also be used effectively for training, which is especially beneficial for customers who wish to run multiple workloads on their infrastructure without investing in dedicated hardware. We look forward to additional optimizations that will provide our customers with continued machine learning improvements on the hardware they already know and trust. Reinforcement learning is becoming a very important workload in many industries, and this MLPerf result is an example of how Intel’s continued investment in AI research is moving the field forward.



[1] Training time of 77.95 minutes on the Reinforcement Learning benchmark (MiniGo) using a 2 chip count Intel Xeon Platinum 9282. System employed Intel Optimization for TensorFlow with Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) v0.18 library. MLPerf v0.6 Training Closed; Retrieved from 10 July 2019, entry 0.6-31. MLPerf name and logo are trademarks. See for more information.

Configuration detail:


[2] Training time of 14.43 minutes on the Reinforcement Learning benchmark (MiniGo) using 32 nodes of 2 chip count Intel Xeon Platinum 8260L. System employed Intel Optimization for TensorFlow with the Intel MKL-DNN v0.18 library. MLPerf v0.6 Training Closed; Retrieved from 10 July 2019, entry 0.6-7. MLPerf name and logo are trademarks. See for more information.

Configuration detail:


[3] Training time of 13.57 minutes on the Reinforcement Learning benchmark (MiniGo) using 3x Nvidia* DGX-1 systems (24 GPUs). System employed TensorFlow and NGC19.05. MLPerf v0.6 Training Closed; Retrieved from 10 July 2019, entry 0.6-11. MLPerf name and logo are trademarks. See for more information.

Configuration detail: 

Notices and Disclaimers

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.

Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.   For more complete information visit

Performance results are based on testing as of dates shown in configuration details and may not reflect all publicly available security updates.


See references (above) for configurations

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product user and reference guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.
