Intel® oneAPI Deep Neural Network Library
Increase Deep Learning Framework Performance on CPUs and GPUs
Develop Faster Deep Learning Frameworks and Applications
The Intel® oneAPI Deep Neural Network Library (oneDNN) provides highly optimized implementations of deep learning building blocks. With this open source, cross-platform library, deep learning application and framework developers can use the same API for CPUs, GPUs, or both—it abstracts out instruction sets and other complexities of performance optimization.
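In practice, targeting a device is a one-line choice of engine kind. Below is a minimal sketch of that idea, assuming the oneDNN v3.x C++ API (error handling omitted):

```cpp
// Minimal sketch: the same oneDNN code targets a CPU or a GPU by
// changing only the engine kind (oneDNN v3.x C++ API assumed).
#include <dnnl.hpp>

int main() {
    using namespace dnnl;
    // Pick a GPU if the library can see one, otherwise fall back to CPU.
    auto kind = engine::get_count(engine::kind::gpu) > 0
            ? engine::kind::gpu
            : engine::kind::cpu;
    engine eng(kind, 0); // Device abstraction
    stream strm(eng);    // Execution queue on that device
    // Primitives created against `eng` run on that device with no
    // ISA- or vendor-specific code in the application.
    return 0;
}
```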
Using this library, you can:
- Improve the performance of frameworks you already use, such as the OpenVINO™ toolkit, Intel® AI Analytics Toolkit, Intel® Extension for PyTorch*, and Intel® Optimization for TensorFlow*.
- Develop faster deep learning applications and frameworks using optimized building blocks.
- Deploy applications optimized for Intel® CPUs and GPUs without writing any target-specific code.
Download as Part of the Toolkit
oneDNN is included as part of the Intel oneAPI Base Toolkit, which is a core set of tools and libraries for developing high-performance, data-centric applications across diverse architectures.
Download the Stand-Alone Version
A stand-alone download of oneDNN is available. You can download binaries from Intel or choose your preferred repository.
Develop in the Free Intel® Developer Cloud
Get what you need to build and optimize your oneAPI projects for free. With an Intel® Developer Cloud account, you get 120 days of access to the latest Intel® hardware—CPUs, GPUs, FPGAs—and Intel® oneAPI tools and frameworks. No software downloads. No configuration steps. No installations.
Help oneDNN Evolve
oneDNN is part of the oneAPI industry standards initiative. We welcome you to participate.
Features
Automatic Optimization
- Use existing deep learning frameworks
- Develop platform-independent deep learning applications and deploy them with automatic detection of the instruction set architecture (ISA) and optimization for it, as shown in the sketch below.
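For illustration, oneDNN's service API exposes what that automatic detection found. A minimal sketch, assuming the v3.x C++ API:

```cpp
// Minimal sketch: ISA dispatch is automatic, but the service API lets
// you inspect or cap what oneDNN detected (v3.x C++ API assumed).
#include <dnnl.hpp>
#include <iostream>

int main() {
    using namespace dnnl;
    // Optional: cap dispatch for experiments, e.g. force AVX2 code paths.
    // Honored only in builds with runtime ISA control enabled (the
    // default), and only before any primitive has been created.
    // set_max_cpu_isa(cpu_isa::avx2);

    // The highest instruction set oneDNN will actually use on this CPU.
    cpu_isa isa = get_effective_cpu_isa();
    std::cout << "effective CPU ISA (enum value): "
              << static_cast<int>(isa) << "\n";
    return 0;
}
```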
Network Optimization
- Identify performance bottlenecks using Intel® VTune™ Profiler
- Use automatic memory format selection and propagation based on the hardware and convolution parameters
- Fuse primitives with operations applied to the primitive’s result, for instance, Conv+ReLU (see the sketch after this list)
- Quantize primitives from FP32 to FP16, BF16, or INT8 using Intel® Neural Compressor
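Fusion is expressed through primitive attributes and post-ops. The following is a minimal sketch of a convolution with a fused ReLU, assuming the oneDNN v3.x C++ API (the tensor shapes are arbitrary illustration values; format_tag::any also shows the automatic memory format selection noted above):

```cpp
// Minimal sketch: fusing ReLU into a convolution via oneDNN post-ops
// (oneDNN v3.x C++ API assumed; signatures differ slightly in v2.x).
#include <dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    // NCHW tensors: batch 1, 3 -> 8 channels, 224x224 image, 3x3 kernel.
    memory::dims src_dims = {1, 3, 224, 224};
    memory::dims wei_dims = {8, 3, 3, 3};
    memory::dims dst_dims = {1, 8, 224, 224};
    memory::dims strides = {1, 1}, padding = {1, 1};

    // format_tag::any lets the library pick the optimal memory layout.
    memory::desc src_md(src_dims, memory::data_type::f32, memory::format_tag::any);
    memory::desc wei_md(wei_dims, memory::data_type::f32, memory::format_tag::any);
    memory::desc dst_md(dst_dims, memory::data_type::f32, memory::format_tag::any);

    // Attach an eltwise ReLU post-op so Conv+ReLU runs as one primitive.
    post_ops ops;
    ops.append_eltwise(algorithm::eltwise_relu, 0.f, 0.f);
    primitive_attr attr;
    attr.set_post_ops(ops);

    auto conv_pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::convolution_direct,
            src_md, wei_md, dst_md, strides, padding, padding, attr);

    // Allocate memory matching the layouts the library chose.
    memory src_mem(conv_pd.src_desc(), eng);
    memory wei_mem(conv_pd.weights_desc(), eng);
    memory dst_mem(conv_pd.dst_desc(), eng);

    convolution_forward(conv_pd).execute(strm,
            {{DNNL_ARG_SRC, src_mem},
             {DNNL_ARG_WEIGHTS, wei_mem},
             {DNNL_ARG_DST, dst_mem}});
    strm.wait();
    return 0;
}
```

Running the fused primitive avoids a separate pass over the activation tensor; the same post-ops mechanism covers other fusions, such as sums and binary operations.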
Optimized Implementations of Key Building Blocks
- Convolution
- Matrix multiplication
- Pooling
- Batch normalization
- Activation functions
- Recurrent neural network (RNN) cells
- Long short-term memory (LSTM) cells
Abstract Programming Model
- Primitive: Any low-level operation from which more complex operations are constructed, such as convolution, data format reorder, and memory
- Memory: A handle to memory allocated on a specific engine, together with its tensor dimensions, data type, and memory format
- Engine: A hardware processing unit, such as a CPU or GPU
- Stream: A queue of primitive operations on an engine
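These four abstractions compose directly. A minimal sketch, assuming the oneDNN v3.x C++ API, that creates an engine and a stream, allocates two memory objects, and runs a data format reorder primitive between them:

```cpp
// Minimal sketch tying the four abstractions together: an engine and a
// stream, two memory objects, and a reorder primitive between layouts
// (oneDNN v3.x C++ API assumed; dimensions are illustrative).
#include <dnnl.hpp>

int main() {
    using namespace dnnl;
    engine eng(engine::kind::cpu, 0); // Engine: a CPU device
    stream strm(eng);                 // Stream: execution queue on it

    // Memory: a 2x3x4x5 f32 tensor in NCHW layout, and an NHWC copy.
    memory::dims dims = {2, 3, 4, 5};
    memory::desc nchw_md(dims, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc nhwc_md(dims, memory::data_type::f32, memory::format_tag::nhwc);
    memory nchw_mem(nchw_md, eng);
    memory nhwc_mem(nhwc_md, eng);

    // Primitive: a data format reorder from NCHW to NHWC.
    reorder(nchw_mem, nhwc_mem)
            .execute(strm, {{DNNL_ARG_FROM, nchw_mem}, {DNNL_ARG_TO, nhwc_mem}});
    strm.wait();
    return 0;
}
```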
Benchmarks
Case Studies
Preparing for Aurora: Ensuring the Portability of Deep Learning Software to Explore Fusion Energy
Argonne National Laboratory ported FusionDL, a collection of machine learning models implemented in multiple frameworks, including oneDNN-optimized TensorFlow and PyTorch, to the Aurora exascale supercomputer.
Digital Transformation in Tough Times: Four Innovative Examples Powered by Data, AI, and Flexible Infrastructure
Large datasets and AI are applied securely and reliably to challenges in supply chains, utilities, healthcare, and privacy-preserving COVID-19 risk management for returning to work.
Demonstrations
Leverage Deep Learning Optimizations from Intel in TensorFlow*
oneDNN optimizations are available in TensorFlow, enabling developers to benefit from them seamlessly.
Accelerate Bfloat16-based PyTorch*
Engineers from Intel and Facebook* introduce the latest software advancements added to Intel® Extension for PyTorch* on top of PyTorch and oneDNN.
News
TensorFlow and oneDNN in Partnership
Google* and Intel have been collaborating closely and optimizing TensorFlow to fully use new hardware features and accelerators.
Software AI Accelerators: AI Performance Boost for Free
Accelerate the deep learning framework you already use, such as TensorFlow, PyTorch, or Apache MXNet*, with oneDNN.
Documentation & Code Samples
Code Samples
Learn how to access oneAPI code samples from the command line or an IDE.
- oneDNN Get Started
- oneDNN with SYCL* Interoperability
- oneDNN Library Convolutional Neural Network (CNN) Inference (FP32)
View All Code Samples (GitHub)
Specifications
Processors:
- Intel Atom® processors with Intel® Streaming SIMD Extensions
- Intel® Core™ processors
- Intel® Xeon® processors
- Intel® Xeon® Scalable processors
GPUs:
- Intel® Processor Graphics Gen9 and above
- Intel® Iris® Xe MAX graphics
Host & target operating systems:
- Linux*
- Windows*
- macOS*
Languages:
- SYCL
  Note: Requires the Intel oneAPI Base Toolkit
- C and C++
Compilers:
- Intel® oneAPI DPC++/C++ Compiler
- Intel® C++ Compiler Classic
- Clang*
- GNU C++ Compiler*
- Microsoft Visual Studio*
- LLVM* for Apple*
Threading runtimes:
- Intel® oneAPI Threading Building Blocks
- OpenMP*
- SYCL
For more information, see the system requirements.
Get Help
Your success is our success. Access these resources when you need assistance.
For additional help, see oneAPI Support.
Stay in the Know with All Things CODE
Sign up to receive the latest trends, tutorials, tools, training, and more to
help you write better code optimized for CPUs, GPUs, FPGAs, and other
accelerators—stand-alone or in any combination.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.