Inside Intel: The Race for Faster Machine Learning



The goal is not just the fastest but the most productive machine-learning platform for researchers.

Pradeep Dubey, an Intel Fellow at the Intel Labs division

Product and Performance Information


Configuration information - Hardware: Intel® Xeon® Processor E5-2699 v3, 2 eighteen-core CPUs (45MB LLC, 2.3GHz), Intel® Turbo Boost Technology off, Intel® Hyper-Threading Technology off, 64GB of RAM; Operating System: RHEL 6.5 GA x86_64; Testing source: internal Intel measurements.


Up to 2.3x faster training per system claim based on AlexNet* topology workload (batch size = 256) using a large image database running 4-nodes Intel® Xeon Phi™ processor 7250 (16 GB, 1.4 GHz, 68 Cores) in Intel® Server System LADMP2312KXXX41, 96GB DDR4-2400 MHz, quad cluster mode, MCDRAM flat memory mode, Red Hat Enterprise Linux* 6.7 (Santiago), 1.0 TB SATA drive WD1003FZEX-00MK2A0 System Disk, running Intel® Optimized DNN Framework, Intel® Optimized Caffe (source: training 1.33 billion images/day in 10.5 hours compared to 1-node host with four NVIDIA “Maxwell” GPUs training 1.33 billion images/day in 25 hours (source: slide 32).


Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Performance varies by use, configuration and other factors. Learn more at

Configuration information: One 2-socket Intel® Xeon® processor E5-2697 v4 (45M cache, 2.3GHz, 18 cores), memory 128GB vs one NVIDIA* Tesla K80 GPUs, NVIDIA CUDA* 7.5.17 (Driver 352.39), ECC enabled, persistence mode enabled.