Accelerating AI on Intel® Xeon® Scalable Processors

Huma Abidi

All around the world, customers like Novartis[1], Warner Bros.[2], GE Healthcare[3], and Ziva Dynamics[4] are achieving excellent real-world AI results on Intel® architecture. However, AI hardware is nothing without software. The complex set of machine learning, deep learning, and advanced analytics workloads that comprise modern AI applications requires versatile, performant software optimized to make the best use of that hardware’s features.

My team and I deliver software optimizations for deep learning on current-gen and future-gen Intel® Xeon® Scalable processors. I’m excited to share our progress this week at O’Reilly AI San Francisco.

Making an Impact in AI

In 2017 alone, Intel produced more than $1 billion in AI-driven Intel Xeon processor revenue.[5] “One billion” is a big number, but it still doesn’t fully capture the effect that Intel Xeon Scalable processors are having in AI. Much of AI today occurs on Intel Xeon processor-based servers that organizations already use for tasks that keep critical infrastructure up and running, perform advanced analytics, or enable high-performance computing. With this in mind, we enhanced the Intel Xeon Scalable platform specifically to run high-performance AI workloads alongside the other cloud and data center workloads these servers already run, so organizations get the best of both worlds. At Intel’s 2018 Data-Centric Innovation Summit, we showcased Intel® Deep Learning Boost (Intel® DL Boost), a set of new features coming in future generations of the Intel Xeon Scalable platform that will further accelerate deep learning inferencing on Intel architecture.

The first of these technologies, the Vector Neural Network Instruction set (VNNI), will be included in the next generation of the Intel Xeon Scalable platform and will accomplish in a single instruction what formerly required three. With VNNI, we’ve projected a performance increase of up to 11x in low-precision inferencing for this next-generation platform[6], compared to the performance of the Intel Xeon Scalable platform at its launch in July 2017. The microarchitecture to follow will add support for bfloat16, a new numeric format quickly being adopted by AI practitioners for highly accurate algorithmic performance and increased parallelism at a fraction of the power[7].
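To make the "one instruction instead of three" idea concrete, here is a minimal Python sketch of the 8-bit dot-product case. It assumes the commonly documented mapping in which the fused VNNI instruction (VPDPBUSD) stands in for the sequence VPMADDUBSW, VPMADDWD, and VPADDD. The function names are illustrative only, the code models the arithmetic rather than the hardware, and 32-bit wraparound is ignored.

```python
def saturate_int16(value):
    """Clamp a Python int to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def three_step_dot(acc, a_u8, b_s8):
    """Model the pre-VNNI sequence: three steps per accumulation."""
    # Step 1 (VPMADDUBSW-like): multiply unsigned 8-bit by signed 8-bit
    # values, add adjacent pairs, and saturate each sum to 16 bits.
    pairs = [
        saturate_int16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
        for i in range(0, len(a_u8), 2)
    ]
    # Step 2 (VPMADDWD-like, multiplying by ones): add adjacent 16-bit
    # sums into 32-bit values.
    quads = [pairs[i] + pairs[i + 1] for i in range(0, len(pairs), 2)]
    # Step 3 (VPADDD-like): add into the 32-bit accumulators.
    return [acc[j] + quads[j] for j in range(len(quads))]


def fused_dot(acc, a_u8, b_s8):
    """Model the fused form: each accumulator lane takes four products at once."""
    return [
        acc[j] + sum(a_u8[4 * j + k] * b_s8[4 * j + k] for k in range(4))
        for j in range(len(acc))
    ]


activations = [1, 2, 3, 4, 5, 6, 7, 8]    # unsigned 8-bit inputs
weights = [1, -1, 2, -2, 3, -3, 4, -4]    # signed 8-bit weights
accumulators = [10, 20]                   # two 32-bit lanes

print(three_step_dot(accumulators, activations, weights))  # [7, 13]
print(fused_dot(accumulators, activations, weights))       # [7, 13]
```

For small values like these the two paths agree. They can differ when the 16-bit intermediate in step 1 saturates, which the fused form avoids by accumulating directly in 32 bits.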

Accelerating the Most Popular Deep Learning Frameworks

Many recent results point to the efficacy of Intel Xeon Scalable processors for deep learning applications across enterprises and in the cloud.

  • Stanford DAWNBench - In April 2018, Intel® Optimized Caffe* running on an Amazon EC2 c5.18xlarge instance demonstrated the ability to classify one ImageNet image in just a few milliseconds, using a model with a top-5 validation accuracy of 93% or greater. As of September 2018, Intel has posted the three fastest completion times for this particular inferencing task.[8]
  • Novartis - Pharmaceutical leader Novartis cut the time to train a multiscale convolutional neural network (M-CNN) on 10K high-content cellular microscopic images from hours to minutes, with more than 99 percent accuracy, using multi-node Intel Xeon Scalable processor-based servers, Intel® Omni-Path Architecture (Intel® OPA), and multi-node TensorFlow*. This amounts to an improvement of greater than 6x[9].
  • Apache MXNet* - As of its v1.2.0 release, MXNet integrates Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) to speed the execution of deep neural network operations including Convolution, Deconvolution, FullyConnected, Pooling, Batch Normalization, Activation, LRN, Softmax, as well as common operators such as sum and concat. In early testing by an Intel AI team, these optimizations have been shown to decrease latency for single-picture inference by up to 43x and increase throughput by up to 56.9x with a batch size of 32 images[10].
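The MXNet result above reports two different metrics: latency (how long one inference call takes) and throughput (how many images are processed per second at a given batch size). The following Python sketch shows one simple way to measure both. It is an illustration under stated assumptions: `measure` is a hypothetical helper, and `fake_inference` is a stand-in workload, not a real MXNet model or the method used in the testing cited above.

```python
import time


def measure(run_batch, batch_size, repeats=10):
    """Return (latency in ms per call, throughput in images per second)."""
    run_batch(batch_size)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        run_batch(batch_size)
    elapsed = time.perf_counter() - start
    latency_ms = 1000.0 * elapsed / repeats
    throughput = batch_size * repeats / elapsed
    return latency_ms, throughput


def fake_inference(batch_size):
    """Hypothetical stand-in: does work proportional to the batch size."""
    total = 0
    for _ in range(batch_size * 1000):
        total += 1
    return total


# Latency is usually reported at batch size 1, throughput at a larger batch.
single_latency_ms, _ = measure(fake_inference, batch_size=1)
_, batch_throughput = measure(fake_inference, batch_size=32)
print("latency (ms):", single_latency_ms)
print("throughput (images/s):", batch_throughput)
```

To use it with a real model, replace `fake_inference` with a function that runs one forward pass on a batch of the requested size.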

Facilitating Deep Learning Application Development and Deployment

Our work prioritizes the out-of-box experience for data scientists and developers using TensorFlow through Optimized Wheels and the Anaconda* Python* distribution. Our goal is to improve access to the latest performance improvements for Intel processors in TensorFlow. These performance improvements are largely due to the integration of and improvements to Intel MKL-DNN.

Gaining the benefit of Intel MKL-DNN in TensorFlow formerly required building TensorFlow with the MKL tag, which could be a tedious, time-consuming process. We’re now easing this process through the release of Intel-optimized Wheels (or pre-built binaries) and containers for TensorFlow. Customers can now simply use ‘pip’ to install these existing libraries instead of building a new optimized TensorFlow instance.

We’re additionally excited to showcase that the latest Intel optimizations (using Intel MKL-DNN libraries) can be installed easily and quickly using “conda install” in a conda environment on Linux* OS. Anaconda is a Python distribution that includes many of the most popular packages for data science, analytics, machine learning, and deep learning. Anaconda users can now easily install TensorFlow optimized with Intel MKL-DNN into their virtual environments. These performance-optimized wheels and streamlined TensorFlow installations through Anaconda represent great improvements in terms of ease of use.
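After installing through either route, it can be useful to confirm that the TensorFlow build in the active environment actually reports MKL support. The sketch below is one hedged way to do that. It assumes a TensorFlow 1.x build, where the flag is exposed as `tf.pywrap_tensorflow.IsMklEnabled()`. Newer releases expose it elsewhere, so the helper (a hypothetical name) returns `None` when it cannot find the flag.

```python
import importlib


def mkl_enabled_in_tensorflow():
    """Return True or False if the build reports MKL status, else None."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None  # TensorFlow is not installed in this environment

    # Assumption: TensorFlow 1.x exposes the flag at
    # tf.pywrap_tensorflow.IsMklEnabled(); other versions may not.
    wrapper = getattr(tf, "pywrap_tensorflow", None)
    check = getattr(wrapper, "IsMklEnabled", None)
    if check is None:
        return None  # flag not found at the assumed location
    return bool(check())


print("MKL enabled:", mkl_enabled_in_tensorflow())
```

A result of `None` only means the check could not be made at the assumed location. It does not mean the optimizations are absent.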

Accelerating Real AI on Intel® Architecture

Software is key to moving AI forward. Intel – and my team – will continue to deliver the performance and simplicity needed to shorten the distance between idea and production AI solution. For more on software optimizations and tools for AI on Intel architecture, please look for us at O’Reilly AI San Francisco this week, follow @intelAI on Twitter, and stay tuned to









[9] § Configuration: CPU: Intel Xeon 6148 processor @ 2.4GHz, Hyper-threading: Enabled. NIC: Intel® Omni-Path Host Fabric Interface, TensorFlow: v1.7.0, Horovod: 0.12.1, OpenMPI: 3.0.0. OS: CentOS 7.3, OpenMPU 23.0.0, Python 2.7.5. Time to Train to converge to 99% accuracy in model. Performance results are based on testing as of May 25, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.

