Windows* Machine Learning: Performance Improvements on Intel Integrated Graphics

Published: 10/12/2018  

Last Updated: 10/12/2018

By Michael J Coppock

In a previous post, Windows* Machine Learning: AI Acceleration on Intel® Hardware, we introduced some of the work happening at Intel and Microsoft*. The key messages were that machine learning (ML) functionality is provided through a simple Windows* ML API, and that hardware acceleration for various workloads can be made available across a variety of underlying hardware without complicating the API for the application developer. In this post, we preview what that acceleration looks like in the first release on devices with Intel integrated graphics.

In conjunction with the Windows* 10 October 2018 Update, end users and ISVs can benefit from performance improvements on Intel integrated graphics on supported 7th generation and newer processors.

Leverage the Windows* ML API to Execute ML Workloads

The API can execute operations through Microsoft* DirectML, whose High-Level Shader Language (HLSL) shaders run on devices compatible with the DirectX* 12 API. In addition to that path, Intel® graphics drivers now contain implementations of select Metacommands, which replace the HLSL shaders for certain DirectML operators when workload requirements are met. This lets the Intel graphics driver indicate which optimizations it supports and transparently provide an additional performance benefit to end users and ISV applications, with no additional effort from the application developer at the Windows ML API level. Another advantage of this driver-based deployment model is that further Metacommand enhancements can be delivered through future Intel graphics driver updates.

Metacommand Performance

In the initial release of DirectML Metacommands, the Intel graphics driver accelerates many cases of fp32-based convolution operations, a smaller set of cases of fp16-based convolution operations, and many cases of fp32-based matrix multiply operations. A large number of well-known convolutional neural networks can benefit from these fundamental operations.
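The fallback behavior described above can be pictured with a small conceptual model. The Python sketch below is not Windows ML, DirectML, or driver code: the names `Op`, `DRIVER_METACOMMANDS`, and `select_path` are hypothetical, and the support table only mirrors the three operator families listed in this post. In the real stack, the driver decides for each operation whether its requirements are met, and that decision is invisible to the application.

```python
from dataclasses import dataclass

# Hypothetical support table mirroring the operator families named in this post.
# Real drivers apply additional per-workload requirements that are not modeled here.
DRIVER_METACOMMANDS = {
    ("convolution", "fp32"),
    ("convolution", "fp16"),
    ("matmul", "fp32"),
}


@dataclass(frozen=True)
class Op:
    kind: str                        # e.g. "convolution", "matmul", "relu"
    precision: str                   # "fp32" or "fp16"
    meets_requirements: bool = True  # stand-in for the driver's own workload checks


def select_path(op: Op) -> str:
    """Return which implementation would run the operator in this toy model."""
    supported = (op.kind, op.precision) in DRIVER_METACOMMANDS
    if supported and op.meets_requirements:
        return "metacommand"
    return "hlsl"  # fall back to the default shader path


if __name__ == "__main__":
    network = [
        Op("convolution", "fp32"),
        Op("relu", "fp32"),
        Op("matmul", "fp16"),
        Op("convolution", "fp16", meets_requirements=False),
    ]
    for op in network:
        print(op.kind, op.precision, "->", select_path(op))
```

The point of the model is that the application never calls `select_path` itself. It submits the same operators either way, and only the execution path underneath changes.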

The charts below show the expected Metacommand performance improvements on a set of Open Neural Network Exchange (ONNX) models compatible with Windows ML. The existence and scale of any gains depend on the specifics of the network topology and on the precision level that the application evaluates through the Windows ML API.

Figure 1. For some fp32-based topologies, this data shows a performance improvement of more than three times over the default HLSL implementation.

Figure 2. For some fp16-based topologies, this data shows a performance improvement of more than four times over the default HLSL implementation.
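One way to see why gains vary by topology: only the operators that a Metacommand replaces get faster, so the whole-model improvement is bounded by the share of run time spent in those operators. The sketch below applies the standard Amdahl's law estimate. The function name is illustrative, and the fractions and the 5x per-operator speedup in the example are made-up values, not measurements taken from these charts.

```python
def overall_speedup(accelerated_fraction: float, operator_speedup: float) -> float:
    """Amdahl's law: whole-model speedup when only part of the run time is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if operator_speedup <= 0:
        raise ValueError("operator_speedup must be positive")
    remaining = 1.0 - accelerated_fraction  # time share that still runs on the default path
    return 1.0 / (remaining + accelerated_fraction / operator_speedup)


if __name__ == "__main__":
    # Illustrative values only: vary the share of time spent in accelerated operators.
    for fraction in (0.5, 0.8, 0.95):
        result = overall_speedup(fraction, 5.0)
        print(f"{fraction:.0%} of time accelerated 5x -> {result:.2f}x overall")
```

Under this estimate, a convolution-heavy network would see an overall gain close to the per-operator speedup, while a network dominated by operators without Metacommand support would see little change.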

The test environment includes the following configuration details:

  • Intel® Core™ i7-7567U processor with Iris™ Plus Graphics 650
  • Graphics driver
  • Microsoft* Windows* 10 October 2018 Update, build 17763
  • Operating system power plan set to “High Performance”
  • Tests were performed using Microsoft’s WinMLRunner tool in the Windows ML sample application. Inferences/sec is calculated from the application’s reported “Evaluate” time over 1000 iterations.
  • Pretrained models were obtained from the ONNX documentation.
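The throughput arithmetic behind the charts can be sketched as follows. This assumes the tool's “Evaluate” figure is an average time per iteration in milliseconds. If a run reports a total over all iterations instead, `average_ms` converts it first. The function names are illustrative, and the timings in the example are placeholders, not the measured results.

```python
def average_ms(total_evaluate_ms: float, iterations: int = 1000) -> float:
    """Average per-iteration Evaluate time, given a total over all iterations."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return total_evaluate_ms / iterations


def inferences_per_second(avg_evaluate_ms: float) -> float:
    """Convert an average per-inference time in milliseconds to inferences/sec."""
    if avg_evaluate_ms <= 0:
        raise ValueError("avg_evaluate_ms must be positive")
    return 1000.0 / avg_evaluate_ms


def speedup(hlsl_avg_ms: float, metacommand_avg_ms: float) -> float:
    """Ratio of Metacommand throughput to default HLSL throughput."""
    return inferences_per_second(metacommand_avg_ms) / inferences_per_second(hlsl_avg_ms)


if __name__ == "__main__":
    # Placeholder timings for illustration only.
    hlsl_ms = average_ms(30000.0)         # 30 s total over 1000 iterations
    metacommand_ms = average_ms(10000.0)  # 10 s total over 1000 iterations
    print("HLSL inferences/sec:", inferences_per_second(hlsl_ms))
    print("Metacommand inferences/sec:", inferences_per_second(metacommand_ms))
    print("Speedup:", speedup(hlsl_ms, metacommand_ms))
```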


With this latest release, the Windows ML API begins to take advantage of DX12 DirectML Metacommands on Intel integrated graphics.


Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at