Accelerate AI with oneDNN

Louie Tsai and Jing Xu, Intel AI Technical Consulting Engineers

The oneAPI Deep Neural Network Library (oneDNN) is an open-source, standards-based performance library for deep-learning applications. It is already integrated into leading deep-learning frameworks like TensorFlow* because of the performance and portability it provides. oneDNN has been ported to multiple architectures, including Arm*, Huawei, and Intel® CPUs and GPUs. The library includes the basic building blocks of neural networks:
 

  • Convolutional neural network primitives (convolutions, inner product, pooling, etc.)
  • Recurrent neural network primitives (vanilla RNN, LSTM, GRU)
  • Normalizations (local response normalization, batch, layer)
  • Elementwise operations (ReLU, ELU, hyperbolic tangent, abs, etc.)
  • Softmax, Sum, Concat, Shuffle
  • Reorders from/to optimized data layouts
  • 32-bit and 16-bit floating point (including bfloat16) and 8-bit integer data types
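To make two of these building blocks concrete, here is a minimal conceptual sketch in plain Python of an elementwise ReLU and a reorder into a blocked data layout. This is illustration only, not the oneDNN API; the function names `relu` and `reorder_to_blocked` are invented for the example, and oneDNN implements the real versions as highly optimized, vectorized primitives.

```python
# Conceptual sketch of two oneDNN building blocks in plain Python.
# Not the oneDNN API: oneDNN provides these as optimized primitives.

def relu(x):
    """Elementwise ReLU over a flat list of floats."""
    return [v if v > 0.0 else 0.0 for v in x]

def reorder_to_blocked(channels, block=4):
    """Reorder a per-channel list into blocks of `block` channels --
    the idea behind oneDNN's blocked (optimized) memory layouts."""
    assert len(channels) % block == 0
    return [channels[i:i + block] for i in range(0, len(channels), block)]

print(relu([-1.0, 0.5, 2.0]))                   # → [0.0, 0.5, 2.0]
print(reorder_to_blocked(list(range(8))))       # → [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Blocked layouts like this let a vectorized kernel load one block per SIMD register, which is why oneDNN inserts reorders between the plain layouts frameworks use and the layouts its compute primitives prefer.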

We are not sitting still, however, and continue to improve the entire software stack to fuel the next AI breakthroughs. New features are being added to oneDNN, like support for Intel® Advanced Matrix Extensions (Intel® AMX), an addition to the x86 instruction set architecture. Intel AMX will be introduced in the next-generation Intel® Xeon® Scalable processors (code-named Sapphire Rapids). These extensions are designed to accelerate the matrix computations that dominate artificial intelligence workloads. Intel AMX hardware can execute more multiply-add operations per cycle than the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) in current Intel Xeon processors. Internal matrix-multiply microbenchmarks using Intel AMX run faster than VNNI instructions on preproduction Sapphire Rapids processors. oneDNN will allow users to take advantage of Intel AMX without writing low-level code, and both inference and training workloads will benefit from this simple integration. Support for Intel GPUs based on Xe architecture is also being added to oneDNN so that users can take advantage of accelerated computing with minimal code changes.
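The operation Intel AMX accelerates is a tiled matrix multiply-accumulate: small tiles of low-precision inputs (e.g., 8-bit integers) multiplied and summed into wider accumulators. A plain-Python sketch of that arithmetic, under the assumption of int8 inputs and int32 accumulation (the function name `tile_matmul_acc` is invented for this illustration; oneDNN emits the actual tile instructions so user code never has to):

```python
# Conceptual sketch of the tiled multiply-accumulate that Intel AMX
# performs in hardware: C += A @ B on small tiles, with low-precision
# (e.g., int8) products summed into wider (int32) accumulators.
# Plain Python for illustration only.

def tile_matmul_acc(A, B, C):
    """C[i][j] += sum_k A[i][k] * B[k][j] for small integer tiles."""
    rows, inner, cols = len(A), len(B), len(B[0])
    for i in range(rows):
        for j in range(cols):
            acc = C[i][j]                 # wide accumulator in hardware
            for k in range(inner):
                acc += A[i][k] * B[k][j]  # low-precision products
            C[i][j] = acc
    return C

print(tile_matmul_acc([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 0], [0, 0]]))
# → [[19, 22], [43, 50]]
```

In hardware, an entire tile of these multiply-adds completes per instruction, which is where the per-cycle advantage over VNNI comes from.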

oneDNN Graph API extends oneDNN with a unified high-level API for multiple AI hardware classes (CPU, GPU, accelerators). With a flexible interface, it maximizes the opportunity to generate efficient code across a variety of hardware and can be tightly integrated with training and inference engines. oneDNN Graph API partitions an input deep-learning graph so that nodes that are candidates for fusion are grouped together. oneDNN Graph API compiles and runs a group of deep-learning operations in a graph partition as a fused operation. It can then perform target-specific optimization and code generation on a larger scope, which allows it to map the operation to hardware resources and improve execution efficiency and data locality with a global view of the computation graph.
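The partitioning step can be sketched in a few lines. The following plain-Python illustration walks a linear chain of ops and groups runs that a backend could fuse (a compute op followed by elementwise ops); the names `partition` and `FUSIBLE_FOLLOWERS`, and the fusion rule itself, are invented for the example. The real oneDNN Graph API operates on general graphs with hardware-specific fusion rules.

```python
# Conceptual sketch of graph partitioning for fusion: group a compute op
# with the elementwise ops that follow it, so each group can be compiled
# and executed as one fused operation. Illustration only.

FUSIBLE_FOLLOWERS = {"relu", "add", "mul"}  # toy fusion rule

def partition(ops):
    """Group a linear op chain into partitions of fusible runs."""
    partitions = []
    for op in ops:
        if partitions and op in FUSIBLE_FOLLOWERS:
            partitions[-1].append(op)   # fuse into the current partition
        else:
            partitions.append([op])     # start a new partition
    return partitions

print(partition(["conv", "relu", "matmul", "add", "relu", "pool"]))
# → [['conv', 'relu'], ['matmul', 'add', 'relu'], ['pool']]
```

Each resulting partition corresponds to one fused kernel, so intermediate results stay in registers or cache instead of round-tripping through memory, which is the data-locality benefit described above.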

oneDNN is intended for deep-learning framework developers, but data scientists and other deep-learning practitioners can take advantage of its optimizations through those frameworks. These include the most popular frameworks, such as TensorFlow, PyTorch*, the OpenVINO™ toolkit, Apache MXNet*, ONNX* (Open Neural Network Exchange) Runtime, PaddlePaddle, Deeplearning4j, Apache SINGA, and Flashlight. For example, oneDNN optimizations are included in the official x86-64 releases of PyTorch and TensorFlow (v2.5 and later).
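In TensorFlow, for example, the oneDNN code path is controlled by the `TF_ENABLE_ONEDNN_OPTS` environment variable: opt-in in the 2.5–2.8 releases and, to the best of our knowledge, enabled by default from 2.9 onward. The variable must be set before TensorFlow is imported:

```python
# Enable oneDNN optimizations in TensorFlow (opt-in for TF 2.5-2.8).
# The environment variable must be set BEFORE `import tensorflow`.
import os

os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

# import tensorflow as tf   # subsequent ops now use oneDNN primitives
print(os.environ["TF_ENABLE_ONEDNN_OPTS"])  # → 1
```

No model changes are required; the same graphs simply dispatch to oneDNN-backed kernels on supported CPUs.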

The oneDNN project welcomes community contributions. In addition to the GitHub* repository, it is distributed in the Intel® oneAPI Base Toolkit and via APT and YUM channels. oneDNN provides optimizations for Intel, Arm* 64-bit (AArch64), and NVIDIA* architectures, and it also supports the Power ISA, IBM Z, and RISC-V architectures. The work from Fujitsu and RIKEN is a notable example of community contribution: they ported oneDNN to Arm and achieved 7.8x better deep-learning performance on the Fugaku supercomputer. Fugaku recently took first place for CosmoFlow, one of the key MLPerf HPC benchmarks for deep-learning training.

There are plenty of good reasons to start using oneDNN to optimize your AI workloads.

Here are a few resources on how to use this library:

Get Started with oneDNN

See how others are using this library:

  • Use Automatic Differentiation to Optimize Parallel Computing (Read)
  • Deploy Math Routines on CPUs and GPUs with Free Math Library (Watch)
  • Accelerate AI Inferencing from Development to Deployment (Watch)
  • Optimize Data Science & Machine Learning Pipelines (Watch)

Intel® Distribution of OpenVINO™ Toolkit
Deploy deep learning inference with unified programming models and broad support for trained neural networks from popular deep learning frameworks.

Get It Now

See All Tools