Intel® oneAPI Deep Neural Network Library Release Notes and System Requirements

ID 918376
Updated 5/1/2026
Version 2026
Public

Where to Find the Release

You can get the latest version of Intel® oneAPI Deep Neural Network Library as part of the Intel® oneAPI Toolkit or as a stand-alone download.

If you have Priority Support for Intel® oneAPI Toolkit or any other oneAPI toolkit (i.e., an unexpired free license or a paid license with active support at the time of the build date for this product release), you can download the installer by logging in to Intel® Registration Center and selecting the appropriate product. You may need to create an account and/or register your product. For additional information, see the Product Registration and Sign-Up FAQ.

Release history

Release version        Release Date
2026.0                 April 29, 2026

New in this Release

2026.0

Highlights

  • Optimized support for the latest Intel® CPUs and GPUs, delivering faster model training and inference.
  • Reduced memory bottlenecks through host-side scalar memory management, optimizing critical matrix multiplication and convolution operations.
  • Added support for emerging floating-point formats (MXFP8, NVFP4), enabling users to leverage state-of-the-art quantization methods that deliver faster training and inference while maintaining accuracy.

Improvements and Changes

Performance Optimizations

Intel® Architecture Processors

  • Improved performance on future Intel® Xeon® processors with Intel® Advanced Vector Extensions 10.2 (Intel® AVX10.2) and Intel® Advanced Matrix Extensions (Intel® AMX) instruction set support. This functionality is not dispatched by default and requires opt-in with the environment variable ONEDNN_MAX_CPU_ISA=AVX10_2_512_AMX_2.
  • Improved performance on future Intel® Core™ processors with Intel® AVX10.2 instruction set support. This functionality is not dispatched by default and requires opt-in with the environment variable ONEDNN_MAX_CPU_ISA=AVX10_2_512.
  • Improved performance of matmul primitive on processors with Intel® AMX support.
  • Improved performance of f32 matmul primitive for GEMV cases on processors with Intel® AVX2 instruction set support.
  • Improved matmul performance with int4 and int8 compressed weights and per-channel zero-points.
  • Improved f32 matmul performance with int4 and int8 compressed weights on processors with Intel® AVX2 and Intel® AVX-512 instruction set support.
  • Improved bf16 matmul performance with int4 and int8 compressed weights on processors with Intel® AVX-512, Intel® Deep Learning Boost (Intel® DL Boost) and bfloat16 instruction set support.
  • Improved performance of int8 convolution primitive when using zero points.
  • Improved performance of int8 matmul and inner product primitives with fp16 destination.
  • Improved performance of f32 and bf16 convolution primitive with int8 destination.
  • Improved performance of subgraphs containing a sequence of multiple binary ops with Graph API.
  • Improved fp32 matmul performance with fp4 compressed weights.
  • Improved fp32 matmul performance for cases when one of the tensors has a trivial dimension on processors with Intel® AVX-512 instruction set support.
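The AVX10.2/AMX optimizations above are opt-in via the environment. A minimal sketch (the variable and values are quoted from the notes above; oneDNN's CPU dispatcher reads the variable at runtime):

```shell
# Cap/enable oneDNN CPU ISA dispatch for this shell session.
# AVX10_2_512_AMX_2 opts in to the AVX10.2 + AMX code paths that are
# not dispatched by default; use AVX10_2_512 for the AVX10.2-only paths.
export ONEDNN_MAX_CPU_ISA=AVX10_2_512_AMX_2
echo "$ONEDNN_MAX_CPU_ISA"
```

Set the variable before launching the application; oneDNN consults it when selecting CPU kernels, so changing it after kernels have been generated has no effect.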

Intel® Graphics Products

  • Improved fp16/bf16 matmul performance for large tensor cases on Intel® Graphics for Intel® Core™ Ultra Processor Series 3 (formerly Panther Lake).
  • Improved matmul performance for cases with 4-byte alignment on Intel® GPUs based on Xe2 architecture.
  • Improved performance of fp16/bf16 matmul with mxfp4 weights.
  • Improved convolution performance with host-side scalar scales and zero points.
  • Improved matmul performance for LLM inference workloads on Intel® GPUs based on Xe2/Xe3 architectures.
  • Improved f32 SDPA performance for small head sizes.
  • Improved GEMM performance for small batch size on Intel® Core™ Ultra processors (Series 2) (formerly Lunar Lake).
  • Improved matmul performance for Qwen2-7B shapes on Intel® Arc™ Graphics (formerly Alchemist) and Intel® Arc™ Graphics for Intel® Core™ Ultra processors (formerly Arrow Lake-H).
  • Improved int8 matmul performance with int4 weights and per-tensor zero-points.
  • Improved bf16 matmul performance with fp8 weights.
  • Graph API optimizations:
      • Improved Scaled Dot Product Attention (SDPA) subgraph performance for inference when relaxed accumulation mode is enabled on Intel® Core™ Ultra processors (formerly Meteor Lake).
      • Improved performance of SDPA and GQA subgraphs when using host-side scalars.
      • Improved performance of GQA subgraph for 2nd token scenarios.
      • Improved performance of subgraphs containing a sequence of multiple binary ops.
      • Improved performance of Grouped Query Attention (GQA) subgraphs for training forward and backward propagation.

Functionality

Functional API

  • Introduced destination tensor dynamic quantization in the matmul primitive following the Open Compute Project Microscaling (MX) formats specification. See the MXFP8 matmul tutorial for a quick introduction to MX capabilities in oneDNN.
  • Introduced support for the NVFP4 quantization scheme. The changes include support for fp8_e4m3 grouped scales and dynamic quantization support for the destination tensor with an NVFP4-specific formula for scale computation.
  • Introduced support for dropout as a primitive attribute for matmul, softmax and eltwise primitives.
  • Introduced host-side scalar memory objects. This functionality allows passing host-side scalars instead of device memory objects when using oneDNN with OpenCL or SYCL runtimes. Host-side scalars are currently supported in matmul and convolution primitives on Intel® GPUs.
  • Introduced support for pre-computed reductions in the matmul primitive. This functionality is intended to improve performance for int8 activations and int8 weights with zero-point.
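To illustrate the two block-scaling schemes mentioned above: MX uses a shared power-of-two scale per block of 32 values, sized so that the scaled block fits the fp8_e4m3 range, while NVFP4 chooses the scale as the block's absolute maximum divided by the fp4 e2m1 maximum (6.0) and stores that scale itself as fp8_e4m3. The sketch below shows only the scale arithmetic in pure Python; it is not the oneDNN API, element rounding to fp8/fp4 is omitted, and all names are ours.

```python
import math

MX_BLOCK = 32        # MX block size per the OCP MX specification
E4M3_EMAX = 8        # exponent of the largest fp8_e4m3 value (448 = 1.75 * 2**8)
E4M3_MAX = 448.0
FP4_E2M1_MAX = 6.0   # largest magnitude representable in fp4 e2m1

def mx_block_scale(block):
    """Shared power-of-two (E8M0-style) scale for one MX block,
    chosen so that block / scale lands inside the e4m3 range."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0
    return 2.0 ** (math.floor(math.log2(amax)) - E4M3_EMAX)

def nvfp4_block_scale(block):
    """NVFP4-style scale: amax mapped onto the fp4 e2m1 maximum.
    In the real scheme this scale is itself stored as fp8_e4m3."""
    amax = max(abs(v) for v in block)
    return amax / FP4_E2M1_MAX if amax else 1.0

def mx_quantize(block):
    """Scale and clamp one block (e4m3 rounding of elements is omitted)."""
    s = mx_block_scale(block)
    return s, [max(-E4M3_MAX, min(E4M3_MAX, v / s)) for v in block]

data = [i / 7.0 for i in range(MX_BLOCK)]
scale, q = mx_quantize(data)
# q[i] * scale reconstructs data[i] (exactly here, since rounding is omitted)
```

Dequantization is simply `q[i] * scale` per block; with real fp8/fp4 element rounding the reconstruction is approximate rather than exact.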

Graph API

  • Introduced support for RMS Normalization operation.
  • Introduced support for output gradient of attention mask for SDPA and GQA training.
  • Introduced host_scalar property for logical tensors. This functionality allows passing host-side scalars instead of device memory objects when using oneDNN with OpenCL or SYCL runtimes. Host-side scalars are currently supported to define attention scale, sequence length, and the negative infinity value in SDPA/GQA subgraphs.
  • Introduced accumulation mode attribute support in Matmul op. This attribute allows relaxing fp32 accumulation requirements to achieve performance benefits on some platforms.

Intel® Graphics Products

  • Introduced support for fp4 weights in the matmul primitive.
  • Introduced support for weight scales and zero-points with group size 16 in matmul with compressed weights.
  • Introduced support for convolution with u8 weights.
  • Introduced support for 2D grouped scales in fp8 and dual zero points in matmul.
  • Extended support for 5D and 6D tensors in matmul with post-ops.

Intel® Architecture Processors

  • Introduced support for different data types of source and destination in pooling forward propagation.
  • Introduced fp4 weights support for fp32 matmul and convolution for future Intel® Xeon® processors with Intel® AVX10.2 instruction set support.
  • Added support for bf16 and f16 matmul with transposed source on Intel® CPUs with Intel® AVX2 instruction set support.

Usability

Common

  • Extended quantization attributes documentation to cover all quantization schemes supported by the library.
  • Added matmul fp8 quantization example demonstrating use of matmul primitive with fp8 source, destination, and weights.
  • Extended diagnostics available in verbose mode for primitive descriptor creation issues.

Intel® Graphics

  • Extended dispatch diagnostics in verbose mode output for primitives implementations on Intel® GPUs.

Fixed Issues

  • Fixed a sporadic correctness issue in fp32 matmul on Intel® GPUs based on Xe2 architecture.
  • Fixed a correctness issue in fp16/bf16 matmul on Intel® GPUs based on Xe3 architecture.
  • Fixed a performance regression in bf16 convolution weight gradient on Intel® Arc™ B-series graphics.
  • Fixed a regression in matmul primitive creation time on Intel® GPUs.
  • Fixed a convolution performance regression on Intel® Arc™ B-series graphics.
  • Fixed a memory leak in Graph API related to host scalar use.
  • Fixed an f16 matmul performance regression with int4 weights on Intel® Arc™ Graphics for Intel® Core™ Ultra processors (Series 3).
  • Fixed a bf16 matmul performance regression on Intel® Xeon® processors with Intel® AMX instruction set support.
  • Changed register allocation in the BRGEMM kernel to avoid register conflicts and improve code safety.
  • Fixed a crash related to incorrect caching of the int8 convolution primitive on Intel® GPUs.
  • Fixed a bug preventing correct detection of the Intel® AVX10.2 instruction set on Intel® Xeon® processors.
  • Fixed a performance regression in bf16 matmul with int4 weights on Intel® GPUs based on Xe2 architecture.
  • Fixed a performance regression in the inner product primitive with transposed weights on Intel® CPUs.
  • Fixed an out-of-registers issue in SDPA fusion with Graph API on Intel® GPUs.
  • Fixed an integer overflow in the softmax primitive implementation for Intel® GPUs.
  • Fixed incorrect results in f64 convolution weight gradient on Intel® GPUs based on Xe-LPG architecture.
  • Removed the in-place optimization for reorder in Graph API to avoid correctness issues.
  • Fixed a correctness issue in f32 convolution with a small number of input channels.
  • Fixed a correctness issue in matmul with a binary post-op and non-trivial strides on x64 CPUs.
  • Fixed a correctness issue in 3D grouped convolution weight gradient on Intel® GPUs.
  • Fixed a page fault issue in f32 SDPA subgraph on Intel® GPUs.
  • Fixed a performance regression in bf16 matmul on Intel® CPUs with Intel® AMX instruction set support.
  • Fixed a segmentation fault in matmul on Intel® processors with Intel® AVX10.2 and Intel® AMX instruction set support.
  • Fixed a correctness issue in SDPA subgraph with non-trivial strides for the mask on Intel® GPUs.
  • Fixed an issue with unintentionally exposed internal symbols in Graph API.
  • Fixed an integer overflow in memory descriptor size computation for very large tensors.
  • Fixed a potential heap corruption in f32 GEMM kernels on Intel® CPUs.

Known Issues and Limitations

  • Convolution primitive may require an excessive amount of scratchpad memory for shapes with a large input width value on Intel® CPUs.
  • bf16 convolution primitive has a performance regression on Intel® Arc™ B-series graphics.
  • Reduction primitive may produce incorrect results for tensors exceeding 4 GB on Intel® Arc™ Graphics (formerly DG2) and Intel® Arc™ Graphics for Intel® Core™ Ultra processors (formerly Arrow Lake-H).
  • Concat primitive may produce incorrect results for certain shapes on Intel® Arc™ A-series GPUs.
  • fp16 matmul primitive has a performance regression on Intel® GPUs based on Xe2 architecture.
  • f32 matmul primitive may sporadically produce incorrect results on Intel® Arc™ B-series graphics.
  • int8 inner product primitive with tensors exceeding 4 GB in size may produce incorrect results on Intel® Data Center GPU Max Series.
  • bf16 layer normalization backpropagation may produce incorrect results on Intel® Data Center GPU Max Series.

System Requirements

Hardware requirements

CPU Processor Requirements
  • Intel Atom® Processor
  • Intel® Core™ Processor Family
  • Intel® Core™ Ultra Processors
  • Intel® Xeon® Processor Family 
  • Intel® Xeon® Scalable Performance Processor Family 

 

GPU Accelerator Requirements
  • Intel® UHD Graphics for 11th Generation and newer Intel® Processors 
  • Intel® Iris® Graphics
  • Intel® Iris® Pro Graphics
  • Intel® Iris® Plus Graphics
  • Intel® Iris® Xe Graphics
  • Intel® Iris® Xe Max Graphics
  • Intel® Data Center GPU Max Series
  • Intel® Data Center GPU Flex Series
  • Intel® Arc™ A-Series Graphics
  • Intel® Arc™ B-Series Graphics

 

Software Requirements

Supported Operating Systems

Intel® CPU Target Support

  • Linux*

    • Ubuntu* LTS v22.04, v24.04
    • SUSE Linux Enterprise Server* (SLES) 15 SP4, SP5, SP6, SP7, 16.0
    • Red Hat* Enterprise Linux* 8, 9, 10
    • Fedora* 41, 42
    • Debian* 11, 12
    • WSL 2 (via Ubuntu* or SUSE Linux Enterprise Server*)
    • Amazon Linux 2023, 2025
    • Rocky Linux 9
  • Windows*

    • Windows* Pro & Enterprise 10, 11
    • Windows* Server 2019, 2022, 2025

Note: Minor releases (e.g., 2026.1) automatically inherit all OS requirements from the base major release (2026.0). Only operating systems marked with (+) are newly added in that minor release, and those marked with (–) are removed. Asterisks (**) indicate deprecation. All others remain unchanged and are not duplicated in the list.

Note: With Microsoft's Windows Subsystem for Linux 2 (WSL 2) on Windows 10 and Windows 11, you can install the native Linux distribution of Intel® oneAPI toolkits and libraries on Windows for CPU and GPU workflows.

Intel® GPU Accelerator Support

  • Intel® UHD Graphics for 11th Gen and newer Intel® Processors
    • Windows* Pro & Enterprise 10, 11
    • Ubuntu* v22.04 LTS, v24.04 LTS, v24.10, v25.04
    • Rocky Linux 9.6, 10
  • Intel® Arc™ Graphics
    • Windows* 10, 11
    • Ubuntu* v22.04 LTS, v24.04 LTS, v24.10
  • Intel® Data Center GPU Flex Series
    • Windows* 10, 11
    • Windows* Server 2019, 2022
    • Ubuntu* v22.04 LTS, v24.04 LTS, v25.04
    • Red Hat Enterprise Linux (RHEL)* 8.10, 9.4, 9.6, 10
  • Intel® Data Center GPU Max Series 
    • Ubuntu* v22.04 LTS, v24.04 LTS, v25.04
    • Red Hat Enterprise Linux (RHEL)* 8.10, 9.4, 9.6, 10
    • SUSE Linux Enterprise Server* (SLES) 15 SP4, SP5, SP6, SP7

 

Developer Tools

CPU and GPU:

Intel® oneAPI DPC++/C++ Compiler is supported for Windows and Linux systems.

The developer tools required for oneAPI Deep Neural Network Library (oneDNN) are the same as those for the Intel® oneAPI Toolkit. Refer to the Intel® oneAPI Toolkit System Requirements.

Graphics Driver Installation

  • Windows Intel® Graphics Driver
    To install the driver, follow the directions in the article appropriate for your device:

    • Intel® Arc™ Graphics, 11th-13th Gen Intel® Core™ processor directions.
    • Xe Dedicated, 6th-10th Gen Intel® Core™ Processor Graphics, and related Intel Atom®, Pentium®, and Celeron® processors directions. Driver version varies depending on the Intel® Graphics in the system.
    • Intel® Data Center GPU Flex Series (ATS-M). Contact your OEM representative for access to the Intel® Registration Center.
  • Linux General Purpose Intel® GPUs (GPGPU) Driver
    For all Intel® GPUs, see this article, Overview, and follow the directions for your device.

Deprecation announcements

Please note that the Intel® oneAPI Deep Neural Network Library will no longer be part of the Intel® oneAPI Toolkit starting 2027.0.

Removal announcements

Please note that the Intel® oneAPI Deep Neural Network Library will be removed from Intel® Deep Learning Essentials in 2026.0.

Other Documentation and Support

Customers with active Priority Support for Intel® oneAPI Toolkit can open a ticket for this product at Online Service Center.

You can also visit the technical support forum, FAQs, and other support information at:

Attributions

For the avoidance of doubt, the Intel® oneAPI Deep Neural Network Library is solely governed by the terms and conditions of the End User License Agreement for Intel® Software Development Product that accompanies the Intel® oneAPI Deep Neural Network Library.

Legal Information

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors which may cause deviations from published specifications. Current characterized errata are available on request. No product or component can be absolutely secure.

Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting Resource & Documentation Center.

Intel, the Intel logo, Intel Core, Intel Xeon Phi, VTune, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Java is a registered trademark of Oracle and/or its affiliates.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.


Copyright 2010 - 2026 Intel Corporation.

This software and the related documents are Intel copyrighted materials, and your use of them is governed by the express license under which they were provided to you (License). Unless the License provides otherwise, you may not use, modify, copy, publish, distribute, disclose or transmit this software or the related documents without Intel's prior written permission.

This software and the related documents are provided as is, with no express or implied warranties, other than those that are expressly stated in the License.

 
