Build Faster AI Solutions with the Intel-Optimized ONNX* Runtime

Published: 05/08/2019  

Last Updated: 05/08/2019

By Jose A Vargas

At the Microsoft* Build 2019 Developer Conference in Seattle, Intel showcased new data-centric products aimed at satisfying the exponential growth in demand for data generation and consumption. That demand drives huge opportunities for developers to deliver AI-fueled solutions on heterogeneous hardware from edge to cloud. Intel and Microsoft are co-engineering powerful tools based on the open source Open Neural Network Exchange (ONNX*) Runtime so developers can build applications that take advantage of the latest AI-boosting features from Intel and deliver proven, cross-platform solutions to satisfy organizations’ nearly insatiable data-centric demands.

The Opportunity for Developers

AI is a key technology driver for making productive use of the massive amounts of data resulting from the rapid proliferation of digitally enhanced products, services, and experiences, from edge to cloud.

“Previously, AI capabilities were only accessible by companies with deep expertise in the field. In just a few years, we’re seeing Intel customers around the world realizing transformative successes across a wide range of use cases and environments using AI,” observed Naveen Rao, Corporate VP and General Manager, AI Products Group at Intel, in a recent IT Peer Network blog. “This is due to the rising maturity of the software tools, ecosystems, and hardware capabilities.”

What Microsoft* and Intel Are Doing

Using the Microsoft Open Neural Network Exchange (ONNX) Runtime, a new open-source AI inference engine for ONNX models, Intel and Microsoft are co-engineering powerful development tools to take advantage of Intel’s latest AI-accelerating technologies across the intelligent cloud and the intelligent edge.

Intel data-centric product logos

The ONNX Runtime features an extensible design to allow Execution Provider (EP) plugins to provide deployment options across a variety of hardware choices to satisfy a broad range of compute needs. The Intel and Microsoft co-engineering efforts include upstream open source contributions of capabilities and performance optimizations for three ONNX Runtime EPs: nGraph deep learning compiler, Intel® Distribution of OpenVINO™ toolkit and Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). Together, these EPs enable ONNX model execution across a broad range of Intel® CPUs, integrated GPUs, FPGAs and VPUs.
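As a conceptual sketch of the EP design described above (not ONNX Runtime's actual internals), the runtime tries providers in priority order and falls back to the default CPU provider for anything the preferred EPs cannot handle. The `resolve_providers` function below is hypothetical; the provider names mirror the EPs mentioned in this article.

```python
# Conceptual sketch: how an Execution Provider priority list might
# resolve against the EPs bundled in a particular ONNX Runtime build.
def resolve_providers(preferred, available):
    """Keep preferred EPs that are present, in order, ending with CPU."""
    chosen = [ep for ep in preferred if ep in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")  # universal fallback
    return chosen

# Hypothetical build that bundles the OpenVINO and MKL-DNN EPs:
available = {"OpenVINOExecutionProvider", "MklDnnExecutionProvider",
             "CPUExecutionProvider"}
print(resolve_providers(["NGraphExecutionProvider",
                         "OpenVINOExecutionProvider"], available))
# ['OpenVINOExecutionProvider', 'CPUExecutionProvider']
```

The nGraph EP is absent from this hypothetical build, so it is skipped; the CPU provider is always appended so every operator in a model has somewhere to run.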

As an open format supported by various ML and deep neural network (DNN) frameworks and tools, ONNX enables developers to leverage a heterogeneous mix of hardware and use whichever AI framework they prefer. Developers can train a model with any popular framework (including PyTorch and TensorFlow), convert it to ONNX format, and inference efficiently across a wide range of hardware with ONNX Runtime. Models can also be run at the edge and on client systems. Key features of the ONNX Runtime include:

  • Interoperability: Fully compliant with the ONNX 1.4 specification
  • Performance: Microsoft sees a 2x performance improvement compared to other existing solutions1
  • Cross-platform and multi-language support: Runs on Android*, Linux*, iOS*/macOS*, and Windows*, with APIs for Python*, C#, and C
  • Extensible architecture: Add hardware accelerators as plugins, or "Execution Providers"
  • Lightweight: A disk footprint small enough for edge deployments as well as the cloud
  • Ease of development: Develop once and deploy on multiple target platforms
  • Proven: Used with Microsoft models and services such as Bing* Search, Bing Ads, and Office 365*

Offering high performance and interoperability through the ONNX standard, the ONNX Runtime is lightweight enough for a wide variety of deployment scenarios.

In the Cloud

At Microsoft Build 2019, Intel showcased these efforts with Microsoft for the ONNX Runtime. We’re seeing a greater than 3.4X performance improvement2 on key benchmarks like ResNet-50 and Inception v3 in our testing with Intel® DL Boost on 2nd Gen Intel® Xeon® Scalable processor-based systems and the nGraph EP added to the ONNX Runtime.

Arpan Shah and Lisa Spelman at Intel’s Data Centric Innovation Day

Recently, Microsoft’s Arpan Shah joined Intel’s Lisa Spelman on stage at Data Centric Innovation Day and confirmed that Microsoft improved ONNX inference performance by 3.4X2. This dramatic increase was due to nGraph plus DL Boost. “We’re seeing amazing results,” Shah said.

At the Intelligent Edge

Solution developers can use ONNX Runtime to inference not only in the cloud but also at the edge for faster, more portable AI applications. Using ONNX Runtime with the OpenVINO™ toolkit, developers can seamlessly deploy pre-trained Microsoft topologies and models, or custom models created with Azure* Machine Learning services, to the edge across Intel CPUs (including Intel Atom®, Intel® Core™, and 2nd Gen Intel Xeon Scalable processors), GPUs, VPUs, and FPGAs.

The Intel AI in Production Program offers multiple edge devices to help developers along this journey across a wide range of use cases in retail, manufacturing, healthcare, smart cities, and more. Developers can start with the Intel® Neural Compute Stick 2 (Intel® NCS2) and Intel® Vision Accelerator Design products, developer kits, and Intel® IoT RFP Ready Kits to develop their applications. To deploy complete, scalable solutions, developers can take advantage of capabilities from equipment providers, software and analytics providers, system integrators, solution aggregators, and cloud service providers.

Ubiquitous Computing

Microsoft and Intel share the vision of ubiquitous computing, enabled by cloud and AI technologies, for every type of intelligent application and system you can imagine. With the ONNX Runtime, the Intel and Microsoft co-engineering strategy is simple: give customers the flexibility to choose their preferred deep learning framework, without getting locked-in, coupled with the ability to run models efficiently anywhere.

Get Started Today

2nd Gen Intel Xeon Scalable processor-based hardware is shipping now, and the ONNX Runtime software is continuously being updated and improved, so download it today, and explore the possibilities.

Related Content

Learn about and download the nGraph deep learning compiler, a framework-neutral Deep Neural Network (DNN) model compiler that can target a variety of devices.

Learn about and download the Intel Distribution of the OpenVINO toolkit, to make your vision a reality on Intel® platforms—from smart cameras to robotics, transportation, and more.

Learn about and download the Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN), an open source performance library for Deep Learning (DL) applications intended for acceleration of DL frameworks on Intel® architecture.

Accelerate the end-to-end machine learning lifecycle with Azure Machine Learning service. Simplify model training for any skill level and deploy anywhere across cloud and edge.

Deploy containerized computer vision inference and analytics applications from the cloud to the edge with Intel® Development Kits for Microsoft Azure Cloud-to-Edge Solutions.

Hardware-accelerated Function-as-a-Service Using Microsoft Azure IoT Edge enables cloud developers to deploy inference functionalities on Intel® IoT edge devices with accelerators.

Deploy your own IoT solutions by using these prebuilt open source projects: Intel IoT Reference Implementations for Developers.

About the Author

Andy Vargas is a 24-year Intel veteran and leads a division that provides architecture definition, performance analysis and optimization, feature-enabling capabilities and co-engineering in Microsoft Windows for Intel platforms across a wide range of Intel products. He is also a senior principal engineer who works closely with other business unit architects to define next-generation capabilities and solutions, especially in the areas of cloud and enterprise. Andy holds 22 US and foreign patents in reliability, availability, scalability, and interconnect architecture, with additional patents pending.


1. ONNX Runtime is now open source.

2. Performance results are based on testing as of dates shown in configuration below and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit Performance Benchmark Test Disclosure.

Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms and assigning them a relative performance number that correlates with the performance improvements reported. SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjbb, SPECjvm, SPECWeb, SPECompM, SPECompL, SPEC MPI, SPECjEnterprise* are trademarks of the Standard Performance Evaluation Corporation. See SPEC* for more information. TPC-C*, TPC-H*, TPC-E* are trademarks of the Transaction Processing Council. See TPC* for more information.
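The relative-performance arithmetic described above reduces to dividing each result by the baseline result. The sketch below illustrates the method; the throughput numbers are hypothetical, chosen only to mirror the shape of a 3.4X INT8-vs-FP32 comparison like the configurations listed below.

```python
# Arithmetic sketch of the relative-performance method: the baseline is
# normalized to 1.0 and every other result is divided by the baseline.
def relative_performance(baseline, results):
    return {name: round(value / baseline, 2)
            for name, value in results.items()}

# Hypothetical ResNet-50 throughput (images/sec):
scores = relative_performance(100.0, {"FP32 baseline": 100.0,
                                      "INT8 w/ DL Boost": 340.0})
print(scores)  # {'FP32 baseline': 1.0, 'INT8 w/ DL Boost': 3.4}
```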

Performance test configurations:

Tested by Intel as of 3/12/2019. Intel® Xeon® Platinum 8268 Processor, 24 cores, 384 GB (12 slots/ 32 GB/ 2666 MHz), HT ON, BIOS: SE5C620.86B.BR.2018.6.10.1757, Ubuntu, kernel 4.19.5-041905, nGraph version: b8106133dca9c63bf167e34306513111adf61995, ONNX version: 1.3.0, MKL-DNN version: v0.18, MKLML_VERSION_2019.0.3.20190125, Topology: ResNet-50, BS=1, Dataset: Synthetic, Datatype: INT8 w/ Intel® DL Boost
Tested by Intel as of 3/12/2019. Intel® Xeon® Platinum 8168 Processor, 24 cores, 384 GB (12 slots/ 32 GB/ 2666 MHz), HT ON, BIOS: SE5C620.86B.BR.2018.6.10.1757, Ubuntu, kernel 4.19.5-041905, nGraph version: b8106133dca9c63bf167e34306513111adf61995, ONNX version: 1.3.0, MKL-DNN version: v0.18, MKLML_VERSION_2019.0.3.20190125, Topology: ResNet-50, BS=1, Dataset: Synthetic, Datatype: FP32

Copyright © 2019 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon and Intel Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. All dates and products specified are for planning purposes only and are subject to change without notice.

*Other names and brands may be claimed as the property of others.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.

Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at