This document summarizes new and changed product features and includes notes about features and problems not described in the product documentation.
Where to Find the Release
Please follow the steps to download the Intel® oneAPI Base Toolkit from the Intel® oneAPI Base Toolkit Download page and follow the installation instructions to install.
The Intel® oneAPI DPC++/C++ Compiler’s integrated support for Altera FPGA has been removed as of the 2025.1 release. Altera® will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.
For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.
oneAPI 2025.2, Compiler Release 2025.2
Major New Features and Enhancements
- ThreadSanitizer Support: Extended CPU Thread Sanitizer support to device-side, to detect data races in both CPU and device code. It supports data race detection within USM memory, SYCL buffers, and device_global memory in SYCL and OpenMP C/C++ device code.
- MemorySanitizer Support: The device-side Memory Sanitizer is extended to support OpenMP offload to detect uninitialized memory use. It’s also extended to support the detection on local and private memory as an experimental feature.
- Hardware Profile Guided Optimization (HWPGO): Enhanced HWPGO to remove pseudo probe description and restore dwarf discriminator for call instruction when using -sample-profile-remove-probe. Enhanced Clang driver to automatically add column info, which is important for HWPGO to generate/load profile file, when using -gdwarf or -fprofile-sample-use.
New Features
C/C++ Compiler:
- Added a --lbr-mispredicts mode to llvm-profgen which can use the LBR_INFO branch prediction flag to create a branch mispredict profile instead of samples of a separate branch mispredict event. This approach has the known downside of only collecting mispredicts on taken branches, but it may simplify sampling requirements in special use cases. Using a separate branch mispredict event as described in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference should still be used in preference to --lbr-mispredicts.
- Improved code generation for AMX ISA and inline assembly error checking.
- Various improvements in integer and floating point arithmetic.
- For the -qopt-report option:
- Memory prefetching will be reported in store/load-only mode (as well as the default mode).
- Memory accesses using omp simd nontemporal will be recorded in the optimization report.
- The vectorizer will add a remark to the optimization report when overriding the unroll factor provided by #pragma unroll. This can happen when the unroll factor is too large for the number of vectorized loop iterations.
- Added support for -f[no]-offload-fp32-prec-div and -f[no]-offload-fp32-prec-sqrt compiler flags to control precision of floating-point division and square root.
- Native CPU Device:
- Added support for source-based code coverage on Native CPU.
SYCL Compiler:
- The sycl_ext_oneapi_kernel_compiler extension specification was updated to accept SYCL as source language.
- Initial support for runtime compilation of SYCL code was implemented.
- The Level Zero v2 (L0 v2) adapter is a new backend that was added as an experimental feature, with plans to make it the default in 2025.3. This is a redesigned version of the L0 adapter that focuses on maximizing the performance of each queue mode individually. It currently supports immediate in-order mode only. This second version of the adapter significantly reduces host runtime overhead and improves latency of kernel submissions. If you experience any performance or functional issues with this adapter enabled, please report them here, specifying the adapter used.
- It can be enabled with SYCL_UR_USE_LEVEL_ZERO_V2=1.
SYCL Library:
- SYCL Graphs:
- Implemented sycl_ext_codeplay_enqueue_native_command extension which allows submitting custom commands for interoperability with native runtimes to graphs built using the sycl_ext_oneapi_graph extension.
- Introduced ability to update host-task nodes in graphs.
- SYCL Bindless Images:
- Added support for more kinds of copy operations (image_mem_handle to USM and vice versa, USM to USM, etc.).
- Added support for Vulkan* timeline semaphores.
- SYCL Extensions:
- Implemented sycl_khr_default_context extension.
- Introduced and implemented sycl_ext_oneapi_device_image_backend_content extension, which allows querying the underlying content of a device image for interoperability with other runtimes (such as OpenCL or Level Zero).
- Introduced and implemented sycl_ext_oneapi_current_device extension which introduces another state into SYCL holding per-thread device.
- Introduced and implemented sycl_ext_oneapi_work_group_static and sycl_ext_oneapi_work_group_scratch_memory extensions that provide different ways of allocating and accessing device local memory (i.e. shared by all work-items within a work-group).
- Introduced and implemented sycl_ext_intel_kernel_queries extension.
- Implemented proposed sycl_ext_intel_event_mode extension.
- Completed implementation of sycl_ext_oneapi_launch_queries extension.
- Completed implementation of the sycl_ext_oneapi_kernel_arg_properties extension by implementing the missing unaliased property. The property used to be called restrict in previous versions of the extension, but it was renamed to avoid conflict with the C99 restrict type qualifier.
- Introduced and implemented the sycl_ext_oneapi_num_compute_units extension.
- Support for core SYCL 2020 functionality:
- Aligned SYCL_LANGUAGE_VERSION macro definition with the recent SYCL 2020 spec change. (See KhronosGroup/SYCL-Docs#704).
- Implemented swizzle method for swizzles.
- Support for pre-C++11 ABI:
- Many SYCL APIs use std::string as an argument or return type, and its ABI is known to have been broken by GCC at some point. Some applications are still built using the old, pre-C++11 ABI; to support them, the SYCL Runtime should not have std::string (and some other classes) used at the ABI boundary. This effort has been largely completed, but some APIs still surface from time to time and are being fixed:
- Added support for print_graph API in pre-C++11 ABI mode.
- Added support for pipe::get_pipe_name API in pre-C++11 ABI mode.
- Decided not to support get_backend_info in pre-C++11 ABI mode (at least for now) because there are no queries that could be done through it. Calling it under pre-C++11 ABI mode now causes an error.
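To see which std::string ABI a given translation unit is built against, a compile-time check along the following lines may help. This is only a sketch: it assumes GCC's libstdc++, which defines the _GLIBCXX_USE_CXX11_ABI macro, and the names StringAbi, string_abi_in_use, and abi_name are hypothetical helpers, not part of SYCL or the compiler.

```cpp
#include <string>  // any libstdc++ header defines _GLIBCXX_USE_CXX11_ABI

// Hypothetical classification of the std::string ABI seen by this file.
enum class StringAbi { Cxx11, PreCxx11, Unknown };

// Reports the ABI selected at compile time. Standard libraries other than
// libstdc++ do not define the macro and are reported as Unknown.
constexpr StringAbi string_abi_in_use() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
  return StringAbi::Cxx11;
#elif defined(_GLIBCXX_USE_CXX11_ABI)
  return StringAbi::PreCxx11;
#else
  return StringAbi::Unknown;
#endif
}

// Human-readable name for logging or diagnostics.
constexpr const char *abi_name(StringAbi abi) {
  return abi == StringAbi::Cxx11      ? "cxx11"
         : abi == StringAbi::PreCxx11 ? "pre-cxx11"
                                      : "unknown";
}
```

Printing abi_name(string_abi_in_use()) from an application shows the mode in effect. On libstdc++ toolchains that permit it, building with -D_GLIBCXX_USE_CXX11_ABI=0 selects the pre-C++11 string layout.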
OpenMP:
- Support the OpenMP 6.0 stripe loop-transformation construct.
- The nowait clause in target, target enter/exit data, and target update constructs can now take an optional Boolean argument to conditionally choose between asynchronous or synchronous offloading.
- For spir64 devices, a new command-line flag, -fopenmp-target-teams-default-vla-alloc-mode=malloc/wilocal (default: wilocal), was added to allow control over how local copies for variable-length arrays private to teams and distribute constructs are allocated.
- Added a new command-line flag, -fopenmp-target-loop-stride=local-size/global-size/one (default: local-size), to tune performance of spir64-offloaded OpenMP loops by controlling their loop stride.
- Improved debug info in OpenMP-outlined routines where some variables were previously reported as optimized away by gdb.
Unified Runtime:
- Expanded support for the Level Zero adapters to provide binary backwards compatibility with Level Zero drivers with APIs as old as v1.7 of the Level Zero Specification.
Improvements and Bugfixes
SYCL Runtime:
- Reduced amount of string copies unnecessarily made by the SYCL Runtime for debug traces even if debug tracing is disabled.
- Reduced number of times shared_ptrs are copied.
- Reduced amount of memory allocations happening by moving away from using std::function. This should also help with reducing compilation time of SYCL headers.
- Reduced amount of memory allocations required for local_accessor.
- Reduced amount of memory allocations on "fast" kernel enqueue path and dropped some unnecessary runtime checks.
- Made more queue operations go through the "fast" path.
C/C++ and SYCL Compilers:
- Introduced a new optimization to eliminate back-to-back barriers when it is safe. Such a chain of barriers may occur when multiple group algorithms are used next to each other.
- Removed a busy-wait loop from the implementation of -fsycl-max-parallel-link-jobs flag, making it consume fewer resources when waiting.
- Uplifted maximum version of SPIR-V that compiler can generate to 1.5.
- Made compiler embed device library needed for bfloat16 support into the application (if it is used). This change will allow us to reduce the size of redistributable SYCL Runtime package by eliminating some files from it.
- Added a compiler warning diagnostic about undefined SYCL_EXTERNAL functions used in a module to help catch linking errors earlier.
- Addressed issue where the compiler would generate invalid SPIR-V if kernel used arguments of boolean type.
- Switched to use native bfloat16 implementation for devices that support it, as well as fixed a bug where native implementation won't be used if multiple AOT targets are specified.
- Aligned behavior of -Wimplicit-float-conversion with the upstream clang for non-SYCL language modes.
- Improved check for unsupported data types to actually rely on target information instead of hardcoded knowledge.
- Fixed an issue where compilation with -mlong-double-64 would still result in an error that a 128-bit double is not supported by a target.
- Fixed a bug where linking static libraries with SYCL code in them using -l:libname.a spelling would ignore device code from those libraries.
- Fixed a bug where having a pure virtual function during device compilation would cause unresolved symbol errors emitted by device compiler on Windows.
- Fixed a bug where having two kernels (one annotated with reqd_work_group_size attribute/property and another without it) together with -fsycl-device-code-split=off would cause runtime error about mismatched work-group size.
- Native CPU device:
- Improved support for dynamic_address_cast on Native CPU device.
- Improved performance of Native CPU device: fewer memory allocations and thread launches.
- Fixed a bug where submitting the same kernel multiple times at about the same time with different arguments would lead to incorrect arguments being used.
- Fixed compiler crashes when building applications that use atomics.
- Fixed segfaults happening in SYCL CTS tests for async_work_group_copy API.
- Improved support for sub-groups by updating version of oneAPI Construction Kit.
- Sanitizers:
- Reduced the frequency of shadow memory reallocation to reduce memory overhead and improve runtime performance.
- Fixed ASAN throwing an exception with UR_RESULT_ERROR_INVALID_ARGUMENT when detecting incorrect memory free operation.
- Explicit SIMD extension:
- Extended sycl_ext_intel_esimd extension specification and implementation with new queries to check support for 2d load/store/prefetch operations.
- Fixed miscompilations of ESIMD functions under high optimization levels when compiler performs aggressive inlining.
SYCL Library:
- Made group_[load|store] functions use native built-ins when used with vectors of 16 shorts.
- Extended support for shared libraries to make it work with kernel bundles as well.
- Added tracing (through SYCL_UR_TRACE) for SYCL_DEVICE_ALLOWLIST decisions for better discoverability of the feature.
- Aligned implementation of info::execution_capability query with the recent SYCL 2020 specification change made in KhronosGroup/SYCL-Docs#625.
- Fixed compilation issues with group functions like select_from_group with certain data types (pointers, marray<bfloat16, 4>, for example).
- Implemented persistent cache eviction.
- Enforced constraints documented by the sycl_ext_oneapi_reduction_properties extension.
- Clarified and enforced properties constraints in the sycl_ext_oneapi_group_load_store extension specification and implementation.
- Implemented properties validation for kernel bundle and graph APIs.
- Updated the sycl_ext_oneapi_in_order_queue_events extension specification and implementation to make event returned by ext_oneapi_get_last_event optional for queues where no work had been submitted.
- Updated the sycl_ext_oneapi_group_load_store extension specification and implementation to accept the alignment property in group load/store built-in functions to allow for more optimized implementation.
- Lifted restriction that host APIs from sycl_ext_oneapi_free_function_kernels had to be guarded by #ifndef __SYCL_DEVICE_ONLY__.
- Fixed potential resource leaks in online compiler extension.
- Fixed an issue where known_identity<min|max> would return incorrect values with the -ffast-math flag.
- Fixed a UB in implementation of device_global which sometimes led to spurious results.
- Fixed a static_assert failure in SYCL headers when an application is built with -funsigned-char.
- Resolved an issue that caused memory operations enqueued through sycl_ext_oneapi_enqueue_functions extension to break functionality of sycl_ext_oneapi_enqueue_barrier extension.
- Fixed a bug where compiling with -D_FORTIFY_SOURCE=2 would cause errors from device compilers at JIT stage (or during AOT compilation) about undefined __memcpy_chk symbol.
- Fixed an incorrect result of std::exp(std::complex) in some corner cases.
- Fixed a crash happening when you launch a kernel that is defined in both the application and a dlopen-ed shared library after that library was unloaded through dlclose.
- Fixed a memory leak happening when a kernel submission failed.
- Fixed a bug where using vec::operator[] would cause compilation issues on Windows when an application is built using clang.exe and _DEBUG macro is set.
- Aligned joint_matrix_apply implementation with the specification change to be able to modify both matrices.
- Bindless Images:
- Added support for ext_oneapi_bindless_sampled_image_fetch_1d, ext_oneapi_bindless_sampled_image_fetch_1d_usm, ext_oneapi_bindless_sampled_image_fetch_2d, ext_oneapi_bindless_sampled_image_fetch_2d_usm and ext_oneapi_bindless_sampled_image_fetch_3d aspects on Level Zero backend.
- Fixed return types of image extent queries to match the specification.
- Clarified the types of supported USM memory in the extension specification.
- Fixed compiler crash caused by the use of anisotropic sampling operations on 3D mipmaps, due to the intrinsic being generated with an incorrect number of LOD gradient parameters.
- SYCL Graphs:
- Reimplemented topological sort algorithm used to determine graph nodes execution order to avoid issues with overflowing stack on huge graphs and improve performance.
- Documented kernel binary update feature which allows to update kernel nodes in graphs.
- Fixed race condition in command_graph node queries.
- Fixed the issue with not all graph-related classes fully implementing common reference semantics.
- Made ext_oneapi_weak_object extension work with graph objects.
- Fixed a bug where using local_accessor or work_group_memory objects as part of graph update would function incorrectly on non-SYCL backends.
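The stack-overflow concern noted for the graph topological sort is typical of recursive depth-first implementations on very deep graphs. The sketch below shows one common iterative alternative, Kahn's algorithm, in plain C++. It is an illustration of the general technique only and is not the SYCL Runtime's implementation; the function name topological_order and the edge representation are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Iterative topological sort (Kahn's algorithm) over nodes 0..num_nodes-1.
// Each edge {from, to} means "from must execute before to".
// Returns the nodes in a valid execution order; the result holds fewer than
// num_nodes entries if the graph contains a cycle.
std::vector<std::size_t> topological_order(
    std::size_t num_nodes,
    const std::vector<std::pair<std::size_t, std::size_t>> &edges) {
  std::vector<std::vector<std::size_t>> successors(num_nodes);
  std::vector<std::size_t> pending_inputs(num_nodes, 0);
  for (const auto &edge : edges) {
    assert(edge.first < num_nodes && edge.second < num_nodes);
    successors[edge.first].push_back(edge.second);
    ++pending_inputs[edge.second];
  }

  // The work list lives on the heap, so graph depth does not affect the
  // call stack.
  std::queue<std::size_t> ready;
  for (std::size_t node = 0; node < num_nodes; ++node)
    if (pending_inputs[node] == 0)
      ready.push(node);

  std::vector<std::size_t> order;
  order.reserve(num_nodes);
  while (!ready.empty()) {
    const std::size_t node = ready.front();
    ready.pop();
    order.push_back(node);
    // A successor becomes ready once all of its predecessors are emitted.
    for (std::size_t next : successors[node])
      if (--pending_inputs[next] == 0)
        ready.push(next);
  }
  return order;
}
```

Calling topological_order(4, {{0, 1}, {0, 2}, {1, 3}, {2, 3}}) on a diamond-shaped graph yields an order that starts with node 0 and ends with node 3. Because no recursion is involved, a chain of a million nodes costs heap memory only.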
SYCLcompat Library:
- Introduced new set of group utility functions and classes aimed to reduce the gap between syclcompat and dpct namespaces.
- Fixed compare_mask putting results in the wrong 2-byte segment of 4-byte output.
- Optimized implementation of permute_sub_group_by_xor for the case when logical_sub_group_size is 32.
- Added new function ternary_logic_op to perform bitwise logical operations on three input values based on the specified 8-bit truth table.
- Fixed issues with multiple vectorized operations returning wrong results.
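To make the idea of a three-input bitwise operation driven by an 8-bit truth table concrete, here is a small host-side reference model. It is a sketch, not the syclcompat implementation: the name ternary_logic_ref is hypothetical, and the bit ordering of the table (bit i holds the result for i = a*4 + b*2 + c, the convention used by the CUDA lop3 instruction) is an assumption that these notes do not confirm for ternary_logic_op.

```cpp
#include <cstdint>

// Reference model of a three-input bitwise operation selected by an 8-bit
// truth table. Assumption: bit i of lut holds the result for the input
// combination i = (a_bit << 2) | (b_bit << 1) | c_bit.
// Requires C++14 or later because of the loop in a constexpr function.
constexpr std::uint32_t ternary_logic_ref(std::uint32_t a, std::uint32_t b,
                                          std::uint32_t c, std::uint8_t lut) {
  std::uint32_t result = 0;
  for (unsigned i = 0; i < 8; ++i) {
    if ((lut >> i) & 1u) {
      // Mask of bit positions whose (a, b, c) bits spell out combination i.
      const std::uint32_t match = ((i & 4u) ? a : ~a) &
                                  ((i & 2u) ? b : ~b) &
                                  ((i & 1u) ? c : ~c);
      result |= match;
    }
  }
  return result;
}
```

Under this convention the table value 0x96 selects a ^ b ^ c, 0x80 selects a & b & c, and 0xE8 selects the bitwise majority of the three inputs.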
OpenMP:
- Fixed a bug affecting usage of target offload in lambda.
- Implemented a more robust mechanism to detect OpenMP loops that were optimized away to better differentiate them from malformed loops.
- Corrected the handling of OpenMP simd loops that were optimized away, previously crashing the compiler in some cases.
- Fixed an issue in the collapsing of OpenMP simd loops at -O0 that caused incorrect vectorization.
- Fixed incorrect OMPT callbacks for teams distribute parallel for constructs.
- Fixed a lastprivate issue in task and target regions causing spurious function arguments to be created for outer target/task regions.
- Improved emission of -qopt-report remarks about how OpenMP data-sharing clauses were optimized.
- Fixed an incorrect behavior of tile construct nested inside a target teams distribute parallel for collapse(N) construct.
- The flush construct is no longer ignored for OpenMP spir64 offload.
Issues with 3rd-party host compilers:
- Fixed compilation issue with get_vec_idx internal helper with MSVC as host compiler.
- Fixed missing #include when building with GCC 13 as host compiler.
- Fixed compilation issue with joint matrix extension with MSVC from Visual Studio 2019 as host compiler.
Misc:
- Removed testing on FPGA Emulator as a step towards our strategy to drop FPGA support. Starting with this release there is no guarantee that FPGA-specific features continue to work.
- Docker images containing nightly builds are not provided anymore, but we still provide Dockerfiles so you can build those images yourself.
- Fixed OCL CPU Runtime installation script leaving incorrect permissions on a system folder.
Known Issues and Limitations
SYCL:
- SYCL headers use unreserved identifiers which sometimes cause clashes with user-provided macro definitions. Known identifiers include: G, VL.
- When using sycl_ext_oneapi_matrix extension it is important for some devices to use the sm version (Compute Capability) corresponding to the device that will run the program. This particularly affects matrix operations using half data type.
- C/C++ math built-ins (like exp or tanh) can return incorrect results on Windows for some edge-case input. The problems have been fixed in the SYCL implementation, and the remaining issues are thought to be in MSVC.
- There are known issues and limitations in virtual functions functionality, such as:
- Optional kernel features handling implementation is not complete yet.
- AOT support is not complete yet.
- A virtual function definition and definitions of all kernels using it must be in the same translation unit. Please refer to sycl/test-e2e/VirtualFunctions to see the list of working and non-working examples.
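For the macro clash with unreserved identifiers such as G or VL listed above, one possible workaround (a suggestion of this sketch, not a fix documented in these notes) is to hide the user macro while the header is processed, using the push_macro and pop_macro pragmas supported by GCC, Clang, and MSVC. The stand-in template below plays the role of a library header so that the example compiles without a SYCL installation; holder and gravity are hypothetical names.

```cpp
// User code defines a short macro that collides with an identifier used
// inside a library header.
#define G 9.81

// Temporarily hide the macro while the header is processed.
#pragma push_macro("G")
#undef G
// In a real program the SYCL header include would go here. A stand-in
// declaration that uses G as a template parameter name takes its place.
template <typename G> struct holder { G value; };
#pragma pop_macro("G")

// The user's macro is visible again after pop_macro.
constexpr double gravity() { return G; }
```

Without the push/undef/pop sequence the template declaration would expand to template <typename 9.81> and fail to compile, which mirrors the kind of error the clash produces.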
OpenMP:
- Some OpenMP spir64 offload programs compiled with -O0 -g may result in a segfault failure at runtime. Workaround: compile the program without -g, or compile it with -O2 -g.
- For OpenMP spir64 offload, the memory-order and memscope clauses of the flush construct are silently ignored and the default (more conservative but correct) values of seq_cst/device are used for now.
Unified Runtime:
- On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is because on Windows the release of the plugin DLLs races against the release of static global variables (like the default context).
Intel® Graphics Compiler:
- The Vector Compute backend does not support -O0 code and often miscompiles it, producing wrong answers and crashes. This issue directly affects ESIMD code at -O0. As a temporary workaround, the ESIMD code is optimized even in -O0 mode.
CPATH to C_INCLUDE_PATH and CPLUS_INCLUDE_PATH Transition:
The oneAPI environment setup scripts have historically added paths for C, C++, and Fortran header files to the CPATH environment variable. These scripts are being transitioned to add relevant paths to the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables instead of CPATH. This transition is intended to isolate Intel provided header files from the effects of compiler options used by customers to request or suppress compiler warnings in their own source files.
Paths present in the CPATH environment variable specify user include paths. By default, compiler warnings are issued for potentially problematic source code in header files found via these paths subject to use of options like -Wall, and most other options that begin with -W. Paths present in the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables specify system include paths. By default, compiler warnings are suppressed for header files found via these paths (warnings in system header files can be enabled with the -Wsystem-headers option).
For most customers, this transition will be transparent with the only observed difference being that warnings are less likely to be issued for source code in Intel provided header files. However, there are some edge cases that could cause other differences in behavior for some customers.
Paths present in the CPATH environment variable are searched after paths specified by the -I option, but before paths present in the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables, paths specified by the -isystem option, and paths implicitly added by the compiler. Customers that have been adding their own include paths to the end of the CPATH environment variable (after paths historically added by the oneAPI environment scripts) will now find that their include paths will be searched before include paths for Intel-provided header files. This could result in a customer-provided header file with the same name as an Intel-provided header file now being found first, whereas previously the Intel-provided header file would have been found first.
If the same path is present in both the CPATH and C_INCLUDE_PATH or CPLUS_INCLUDE_PATH environment variables, or is specified by the -isystem option or implicitly added by the compiler, then the matching path in the CPATH environment variable will be ignored. Customers that, perhaps inadvertently, add a path to Intel-provided header files to the CPATH environment variable that matches a path added to the C_INCLUDE_PATH or CPLUS_INCLUDE_PATH environment variables by the oneAPI environment setup scripts, will find their addition to CPATH ignored. Since the oneAPI environment setup scripts would have previously added paths to CPATH, this could result in different include path search orders and thus different header files being found when a customer-provided header file has the same name as one provided by Intel.
Both of the above scenarios depend on header files with the same name being present in multiple include paths. Observable differences are only likely to occur if those header files have different contents or if they use the #include_next directive.
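The two scenarios can be made concrete with a small model of the search order. The sketch below is a simplification based only on the rules stated above (-I first, then CPATH, then the system group, with CPATH entries dropped when they duplicate a system directory). It treats C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, -isystem, and implicit directories as one group, and the directory names and the helper names IncludeConfig, search_order, and demo_transition are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Simplified model of the three groups of include directories.
struct IncludeConfig {
  std::vector<std::string> dash_i;      // directories given with -I
  std::vector<std::string> cpath;       // entries of CPATH
  std::vector<std::string> system_dirs; // C_INCLUDE_PATH / CPLUS_INCLUDE_PATH,
                                        // -isystem, and implicit directories
};

// Returns directories in the order they would be searched in this model.
std::vector<std::string> search_order(const IncludeConfig &cfg) {
  std::vector<std::string> order = cfg.dash_i;
  for (const std::string &dir : cfg.cpath) {
    // A CPATH entry that is also a system directory is ignored.
    const bool is_system =
        std::find(cfg.system_dirs.begin(), cfg.system_dirs.end(), dir) !=
        cfg.system_dirs.end();
    if (!is_system)
      order.push_back(dir);
  }
  order.insert(order.end(), cfg.system_dirs.begin(), cfg.system_dirs.end());
  return order;
}

// First scenario from the text, using hypothetical directory names.
void demo_transition() {
  const std::string intel = "/opt/intel/include";
  const std::string user = "/home/me/include";

  IncludeConfig before; // setup script puts Intel headers in CPATH
  before.cpath = {intel, user};

  IncludeConfig after; // setup script uses the system variables instead
  after.cpath = {user};
  after.system_dirs = {intel};

  assert(search_order(before).front() == intel);
  assert(search_order(after).front() == user);
}
```

In the second scenario, a CPATH of {intel, user} combined with a system group of {intel} also produces the order user, intel in this model, because the duplicated CPATH entry is dropped.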
API/ABI Breaking Changes
- Removed support for FPGA-related options as part of our strategy to drop FPGA support.
- Removed options: -fintelfpga, -fsycl-targets=spir64_fpga[-unknown-unknown], -fsycl-link=early|image, -Xsycl-target-backend=spir64_fpga "opt", -reuse-exe=arg, and -fsycl-help=fpga.
- Removed experimental sycl_ext_intel_oneapi_compiler extension support. Its APIs have been marked as deprecated for a while and sycl_ext_oneapi_kernel_compiler extension should be used instead.
Deprecations
- Deprecated sycl_ext_oneapi_default_context extension in favor of sycl_khr_default_context extension.
- Deprecated -fsycl-fp32-prec-sqrt compiler flag in favor of -foffload-fp32-prec-sqrt flag.
- Deprecated overloads of single_task and parallel_for APIs that accept properties which used to be a part of sycl_ext_oneapi_kernel_properties extension. sycl_ext_oneapi_enqueue_functions extension should be used instead.
- Deprecated overloads were completely removed from the extension specification.
- Deprecated current implementation of get_backend_info API. The SYCL 2020 specification currently does not document anything that could be queried through it and therefore existing queries supported through it are deprecated to avoid possible confusion.
- A known issue when compiling SYCL code on Windows using CMake with the 2025.1.0 version of the compiler has been fixed in this patch release. Please use 2025.1.1 or just replace IntelSYCLconfig.cmake from 2025.1 with 2025.1.1 for the latest fix for “CMake Error: Could NOT find IntelSYCL (missing: SYCL_LIBRARY)”.
- Fixed the issue that led to the launch failure of an application built with the ‘-x’ option specifying the target platform (which is TigerLake or above) on an OS without CET support.
- Updated encodings of VCOMX*/VUCOMX* and VGETEXPPBF16 instructions according to AVX10.2 spec rev. 2. This may require an update to SDE 9.53 or later when binary is built with -mavx10.2 or with options implying -mavx10.2.
Major New Features and Enhancements:
MemorySanitizer Support: Extended CPU Memory Sanitizer support to device-side, including GPUs facilitating detection and troubleshooting of memory issues in both CPU and device code. This improves application reliability by ensuring comprehensive memory error checking across platforms.
ccache* Integration: Compiler now supports ccache* to significantly speed up build times for C++ and SYCL codes. By caching previous compilations and reusing them, developers can experience faster iterations and more efficient workflows.
Floating Point Accuracy Controls: User control over accuracy of floating-point operations and library calls is now extended to the device code.
SYCL Interoperability with Graphics APIs: Added initial support for SYCL interoperability with DirectX* 12 and Vulkan*, which enables developers to build efficient visual compute, media processing, and rendering applications on Intel® Graphics. For details on image-formats and platform support, refer to SYCL Interoperability Limited Support.
New Features
SYCL Compiler:
- Implemented initial support for SYCL Virtual Functions with the intent to gather initial feedback from users. Please refer to the Known Issues section for details on current limitations of this feature.
- Dynamic linking of device code is now supported via -fsycl-allow-device-image-dependencies command line option. This feature allows device code to be exported via a Windows DLL and includes support for dynamic linking of AOT compiled images for the OpenCL GPU backend.
- Enhancements to free function kernel support include the addition of structs as kernel arguments and the inclusion of work group memory as a kernel parameter.
- Device sanitizer now supports invalid kernel argument detection, and address sanitizer has been enhanced to detect null pointers.
- A mechanism has been implemented to lift restrictions on SYCL device code in constant expressions via the option -fsycl-allow-all-features-in-constexpr.
SYCL Library:
- Enhanced SYCL Graph functionality with implicit recording mechanism and dynamic command-groups, and a new graph enqueue function, execute_graph, in accordance with the updated sycl_ext_oneapi_graph extension.
- Added support for Intel® Arc™ B series and Intel® Core Ultra Series device architectures.
- Added additional devices with Joint Matrix support: Battlemage, Lunar Lake and Arrow Lake H. Added more types and shapes to PVC combinations for SYCL Matrix.
- New ESIMD features include mask compressed ESIMD load/store API, support for root group barriers, addition of clamp API for ESIMD, and support for the ext::intel::experimental::esimd::frem function.
- Implemented the following set of extensions:
- Added support for sycl_ext_oneapi_enqueue_functions to SYCL Graph.
- Implemented sycl_ext_oneapi_raw_kernel_arg extension.
- Added initial support for sycl_ext_oneapi_atomic16 extension.
- Implemented sycl_ext_oneapi_get_kernel_info extension.
- Implemented sycl_ext_oneapi_work_group_memory extension.
- Implemented sycl_ext_oneapi_reduction_properties extension.
Unified Runtime:
- To support NPU/GPU device coexistence in the same application, support for the new L0 init zeInitDrivers has been added in 2025.1. This enables SYCL, OpenVINO™, and other NPU device libraries to coexist in the same application, using GPU and NPU functionality simultaneously.
- Updated the Mutable Command List support in the UR L0 Adapter to utilize the Level Zero Specification’s extension functionality instead of the driver experimental.
- For improved performance, usage of immediate command lists is the default behavior on Linux in the UR L0 adapter for Intel® Arc™ Series GPUs along with Intel® Core Ultra 200v Series.
On Windows, usage of immediate command lists is the default behavior on Intel® Arc™ B Series GPUs along with Intel® Core Ultra 200v Series.
OpenMP:
- Support the OMP6.0 interchange loop-transformation construct and the permutation clause.
- Emit opt-report remarks for load/store of variables listed in the nontemporal clause of the simd construct.
Misc:
- Added several enhancements in sanitizer support:
- New Numerical Stability Sanitizer (NSAN) for C++ Code adopted from community contributions
- Memory Sanitizer extended to support SYCL and OpenMP C/C++ Device Code (only USM device allocations)
- Major improvements to Address Sanitizer for Device Code: invalid kernel argument detection, null-pointer detection, memory leak detection, private memory support for OpenMP offload
- For C/C++ compilations on Linux, added support for -q[no-]unknown-option-as-warning option which provides the ability to handle unknown options on the command line with a warning diagnostic. The default behavior is to error on unknown options.
- The compiler's code coverage tool has been enhanced to offer detailed analysis and comprehensive HTML reports like ICC to identify tested and untested code sections.
Improvements
SYCL Compiler:
- Removed the need for the SYCL_EXTERNAL attribute in free function kernel definitions.
- Improved compilation time for ESIMD kernels.
- Disabled attribute propagation from SYCL 1.2.1 and removed remaining SYCL 2017/1.2.1 compatibility elements, including -Wsycl-strict diagnostics.
- Ensured compiler-generated integration headers/footers are warning-free to prevent -Werror build failures, especially with third-party host compilers.
- Built basic functionality of the SYCL joint_matrix extension on the SPV_KHR_cooperative_matrix extension.
- Expanded supported aspects for the CPU AOT target.
- Added diagnostics for incorrect arguments with -fsycl-device-obj.
- Introduced a warning for applying kernel-only attributes to non-kernel functions.
- Fixed misleading diagnostics for non-external functions/variables when using attributes like [[sycl_device]] or [[intel::device_indirectly_callable]].
- Updated -fsycl-link=image to package host objects like -fsycl-link=early, ensuring proper linking, especially on Windows.
- Added extra optimization passes in the Native CPU pipeline.
- Updated -fsycl-host-compiler to use only user-provided hints (e.g., PATH) for locating the specified compiler, avoiding incorrect binary usage.
- Deprecated [[intel::reqd_sub_group_size]]; use the SYCL 2020 spelling with the sycl:: namespace.
- Disabled ITT annotations in device code by default to reduce code size.
- Enabled floating-point atomics via atomicrmw instructions for Native CPU.
- Enabled nonsemantic debug info by default to improve the debugging experience.
SYCL Library:
- Added binary caching support to the kernel_compiler extension.
- Enabled a check on Linux systems to inform users to use SYCL_UR_TRACE instead of SYCL_PI_TRACE.
- Improved GDB printers for SYCL types and values.
- Renamed ur to ur.call in XPTI traces.
- Refactored the XPTI framework to use 128-bit keys for collision elimination and added support for 64-bit universal IDs for backward compatibility.
- Made repeated calls to command_graph::begin_recording an error.
- Aligned sycl_ext_oneapi_address_cast implementation with the specification.
- Optimized the atomic_ref constructor for the SPIR-V target.
- Enhanced handling of compile-time properties.
- Refined parsing of Device Sanitizer options via the UR_LAYER_ASAN_OPTIONS environment variable.
- Improved detection of conflicts between kernel properties related to work group size.
- Enhanced framework/app software layers to provide code locations for SYCL-generated XPTI events.
- Improved performance of the rsqrt ESIMD API.
- Added property validation to core SYCL object constructors.
- Deprecated __SYCL_USE_VARIADIC_SPIRV_OCL_PRINTF__.
- Enforced data type restrictions in marray and vec.
- Improved sycl_ext_oneapi_address_cast by changing "dynamic" behavior to "static" where allowed.
- Enhanced sycl-ls to report ext::intel::info::device::device_id.
- Added no-op implementations for runtime APIs for Native CPU, as programs are compiled offline.
- Updated the local_accessor GDB printer to display elements with a decorated pointer and address space qualifier.
- Improved ESIMD copy_to() and copy_from() to use block_load/block_store for better performance.
- The OpenCL adapter now uses the local work size set in program IL when not specified in clEnqueueNDRangeKernel.
- Improved OpenCL adapter to support older ICD loaders.
- Repurposed SYCL_CACHE_TRACE for fine-grained tracing of all SYCL program caches.
- Enabled Sysman API by default in the L0 adapter, removing the need to set ZES_ENABLE_SYSMAN.
- Allowed copy-construction of device_global without the device_image_scope property.
- Improved UR libraries to avoid unnecessary overhead if nothing is subscribed to the ur.call XPTI call stream.
- Refactored copy engine usage checks in the L0 adapter for better performance.
- Implemented tracing for in-memory kernel and program cache.
- Improved error handling in the SYCL RT command enqueue function to provide clearer exceptions.
- Added address sanitizer AOT libraries for various GPU/CPU targets and renamed the device sanitizer library to libsycl-asan.
- Undeprecated legacy multi_ptr as it is no longer deprecated in the SYCL specification.
- Deprecated info::device::atomic64; use sycl::aspect::atomic64 instead.
- Removed build options from the fast kernel cache key to reduce lookup overhead.
- Improved OpenCL adapter to use the extension version of clGetKernelSubGroupInfo when necessary.
- Updated SYCL graph design documentation with a new command-list enqueue path.
- Enhanced online_compiler::compile to support pre-C++11 ABI.
Misc:
- Support for OpenCL __attribute__((blocking)) has been removed. This allows enabling support for the [[clang::nonblocking]], [[clang::nonallocating]], [[clang::blocking]] and [[clang::allocating]] function type attributes, as well as their GNU-style variants.
- For functions that return structs by value, the ABI requires passing an implicit parameter that holds the address of the memory where the returned struct is placed. Users do not see this parameter and cannot provide a vector specification for it. Support has been added for such functions, including emitting the vector-variants attribute for them.
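To illustrate the last item: a function that returns a struct too large for the return registers is typically lowered with a hidden pointer to the result storage. The sketch below assumes the "vector specification" in question is an OpenMP `declare simd` declaration. The `Vec4` type and the `splat` and `fill` functions are hypothetical names, and whether a vector variant is actually generated depends on the compiler and its flags.

```cpp
#include <cstddef>

// 32 bytes: on common x86-64 ABIs a struct of this size is returned through
// an implicit pointer parameter rather than in registers.
struct Vec4 {
    double x, y, z, w;
};

// Request vector variants of a function that returns a struct by value.
// Compilers without OpenMP SIMD support enabled simply ignore the pragma.
#pragma omp declare simd
Vec4 splat(double v) {
    return Vec4{v, 2.0 * v, 3.0 * v, 4.0 * v};
}

// Caller loop that a vectorizer may map onto a vector variant of splat().
void fill(Vec4* out, const double* in, std::size_t n) {
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = splat(in[i]);
    }
}
```

The scalar behavior is the same with or without vectorization; the pragma only gives the compiler permission to create and call a vector variant.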
Bug Fixes
SYCL:
- Resolved false positives in Device Sanitizer by unpoisoning local/private shadow memory before function return.
- Added ext_oneapi_ballot_group aspect to the spir64_x86_64 target, supported since OpenCL CPU 2024.2.
- Restored kernel instantiations on the host for debugger compatibility with SYCL code.
- Fixed local scope module variables for Native CPU.
- Corrected device libraries requirement mask for the SPIRV target to ensure proper linking.
- Suppressed system errors when loading adapters on Windows.
- Disabled internalization of kernels for dynamic linking to ensure visibility.
- Fixed a use-after-free bug in the clang-linker-wrapper.
- Enforced SYCL headers to be included with #include <sycl/sycl.hpp>.
- Fixed device module splitting for ESIMD related to using assert in user code.
- Correctly assigned architectures to their respective targets with -fsycl-targets.
- Fixed devicelib handling when linking multiple images.
- Matched -device_options with -device for AOT GPU.
- Stopped passing HEX values to -device_options due to IGC limitations.
- Fixed crash with an empty -fsycl-targets option.
- Set calling convention to spir_func for SPIRV function calls related to specialization constants and hierarchical parallelism.
- Added a workaround for SPIRV AccessChain usage in SYCL matrix operations.
- Addressed code splitting issues with FPGA archives.
- Fixed parsing of device values in backend target options.
- Limited Device Sanitizer to report only one error per kernel instance.
- Resolved issues with vector shuffle built-ins on the NativeCPU backend.
- Fixed incorrect symbolizer output for shared libraries in Device Sanitizer.
- Disabled Address Sanitizer on modules with ESIMD to prevent excessive kernel code size.
- Fixed iterator invalidation issue in the SYCL Joint Matrix pass on Windows debug builds.
- Corrected integration footer for device_global with explicit template specialization.
OpenMP:
- Fixed a bug related to mapping of variable-length arrays where the size is known at compile time.
- Fixed a performance issue when an unroll construct is in a loop nest bound to an outer parallel for construct.
- Fixed potential unsafe vectorization of some loops that are bound to parallel for.
- Improved performance of some collapsed loops by choosing a more optimal data size for the collapsed loop IV.
- Improved offload performance of some target teams distribute parallel for reduction loops with constant trip count.
- Fixed intermittent failures caused by race conditions when using the dispatch construct with SYCL interop objects.
- Fixed a bug where the nogroup clause of a taskloop construct was not honored.
- Fixed a crash when running certain target nowait (asynchronous offload) kernels containing loops.
- Fixed an ICE in some cases where a tile construct is bound to the same loop bound to an outer for construct.
- Fixed an issue where the device clause was not honored for the dispatch construct.
- Improved performance of some low-trip-count loops bound to the loop construct.
- Fixed a bug where some for or simd loops with trip counts > MAX_INT were not being transformed correctly.
- GPU dispatch now supports “Battlemage” architecture integrated (Lunar Lake) and discrete graphics (Intel® Arc™ B-Series graphics cards) parts that utilize the Xe2 microarchitecture.
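The trip-count fix and the collapsed-loop IV sizing listed above both come down to how many iterations a loop nest has and which integer width can hold that count. A minimal sketch of that arithmetic follows. `trip_count`, `collapsed_trip_count`, and `fits_in_int32` are hypothetical helpers for illustration, not the compiler's actual logic, and the product of the two trip counts is assumed not to overflow 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Trip count of `for (i = lb; i < ub; i += step)` with step > 0,
// computed in unsigned 64-bit arithmetic so that it cannot wrap.
inline std::uint64_t trip_count(std::int64_t lb, std::int64_t ub,
                                std::int64_t step) {
    assert(step > 0);
    if (ub <= lb) return 0;
    const std::uint64_t span =
        static_cast<std::uint64_t>(ub) - static_cast<std::uint64_t>(lb);
    const std::uint64_t s = static_cast<std::uint64_t>(step);
    return (span + s - 1) / s;  // ceiling division
}

// Iteration count of two perfectly nested loops collapsed into one.
// Assumption: the product fits in 64 bits.
inline std::uint64_t collapsed_trip_count(std::uint64_t outer,
                                          std::uint64_t inner) {
    return outer * inner;
}

// True when a 32-bit signed induction variable can represent the count.
inline bool fits_in_int32(std::uint64_t n) {
    return n <= static_cast<std::uint64_t>(
                    std::numeric_limits<std::int32_t>::max());
}
```

For instance, a loop from 0 to 3,000,000,000 with step 1 has a trip count that does not fit in a 32-bit signed integer, while a collapsed 1000 x 1000 nest does fit and could use a narrower IV.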
Known Issues & Limitations
SYCL:
- Following are the details on the limited support of SYCL interoperability:
  - Platform Support: Intel® Arc™ B-Series Graphics (Battlemage), Intel® Iris® Xe Graphics (DG2), Intel® Core™ Ultra Processors (Lunar Lake and Meteor Lake).
  - Image channels: 1, 2 and 4-channel
  - Image formats: VK_FORMAT_R16G16_SFLOAT, VK_FORMAT_R32_SFLOAT, VK_FORMAT_R16G16B16A16_SFLOAT, VK_FORMAT_R32G32_SFLOAT, VK_FORMAT_R16_SFLOAT
  - Known issues
    - On Intel® Iris® Xe Graphics and Intel® Core™ Ultra Series 1 (Meteor Lake) Processors, there is currently a known issue with compressed 2D and 3D images with 1, 2 or 4 channels that are larger than 64KB: if users export images from other APIs and import them into SYCL for manipulation, data mismatches occur once SYCL performs computations on the images. This issue, found in GPU driver version 2507.12, will be addressed in an upcoming GPU driver release.
- There is a known issue when compiling SYCL code on Windows using CMake with the 2025.1 version of the compiler, which can cause errors like:
  CMake Error at C:/Program Files/CMake/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
  Could NOT find IntelSYCL (missing: SYCL_LIBRARY)
  Reason given by package: SYCL: It appears that the C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe does not support SYCL
  Workaround: The C:\Program Files (x86)\Intel\oneAPI\compiler\latest\lib\cmake\IntelSYCL\IntelSYCLConfig.cmake file needs to be updated with the following two changes:
  - Line 332: Replace set(sycl_lib_suffix "7") with set(sycl_lib_suffix "8")
  - Line 365: Replace set(SYCL_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SYCL_FLAGS}") with
    list(JOIN SYCL_FLAGS " " SYCL_FLAGS_STRING)
    message(DEBUG "SYCL_FLAGS_STRING: ${SYCL_FLAGS_STRING}")
    set(SYCL_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SYCL_FLAGS_STRING}")
  A fix for this issue is now available in the 2025.1.1 compiler release.
- On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is because on Windows the release of the plugin DLLs races against the release of static global variables (like the default context).
- Intel Graphics Compiler's Vector Compute backend does not support O0 code; such code is often miscompiled, producing wrong answers and crashes. This issue directly affects ESIMD code at O0. As a temporary workaround, ESIMD code is optimized even in O0 mode.
- C/C++ math built-ins (like exp or tanh) can return incorrect results on Windows for some edge-case input. The problems have been fixed in the SYCL implementation, and the remaining issues are thought to be in MSVC.
- [new] There are known issues and limitations in virtual functions functionality, such as:
  - Optional kernel features handling implementation is not complete yet.
  - AOT support is not complete yet.
  - A virtual function definition and definitions of all kernels using it must be in the same translation unit. Please refer to sycl/test-e2e/VirtualFunctions to see the list of working and non-working examples.
- When running synthetic benchmarks, it is possible for performance on Intel's Flex and Arc A Series GPUs to be less than previously measured when running with the new defaults using Immediate Command Lists in SYCL/Unified Runtime L0 Adapter. To mitigate this issue on those workloads, one can regain the lost performance by creating the SYCL queue with the `no_immediate_command_list` queue property or by setting the environment variable UR_L0_USE_IMMEDIATE_COMMANDLISTS=0. These will enforce the usage of command batching in the Unified Runtime L0 adapter which may improve the performance of those workloads.
OpenMP:
- Offload code with reduction across teams may result in incorrect results or even hangs on some platforms with integrated GPUs.
- ICX and ICPX ignore "#pragma omp flush" for spir64 offload compilation.
Other Known Issues:
- The switch from a static to a dynamic sanitizer runtime in 2025.1 compiler has led to runtime crashes due to the missing clang_rt.asan_dynamic-x86_64.dll. The workaround is to add C:\Program Files (x86)\Intel\oneAPI\compiler\2025.1\lib\clang\20\lib\windows to the PATH environment variable.
API/ABI Breaking Changes
- Updated experimental sycl_ext_oneapi_bindless_images extension documentation and implementation: interoperability structs/functions were renamed to use the external keyword instead of interop.
- Removed sycl::ext::oneapi::experimental::is_property_key.
- Removed some OSUtil::* functions from the ABI under -fpreview-breaking-changes; these are used internally in the DSO and don't need to be exposed outside.
- Made the ext_oneapi_cl_profile implementation ABI-neutral.
- Fixed SYCL Graph API to be ABI-neutral to avoid dual-ABI issues on Linux.
This patch release of the compiler consists of various bug fixes and quality improvements.
Deprecation Notice: The Intel® oneAPI DPC++/C++ Compiler integrated support for Altera FPGA is now deprecated and will be removed with the compiler's release in the first quarter of 2025. Altera* will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.
For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.
This patch release consists of the following new features, improvements and bug fixes:
- Added functionality to compress device images during compilation and decompress them at runtime as needed. More details on this feature and case studies can be found at C++ with SYCL Device Image Compression.
- The Unified Runtime Level Zero Adapter enabled the usage of Level Zero System Management functionality by default.
- Created the launch API to SYCL Compat API library.
- ABI neutral version of modifiable_command_graph::print_graph has been enabled under preview option and will be enabled by default in the next major release.
- Fixed "-ipp" / "-qipp" switch linkage error.
- Added the following missing option values in IDE for -x, -ax, /arch, /Qx, /Qax flags:
  [-x|-ax][SIERRAFOREST|GRANDRIDGE|GRANITERAPIDS|EMERALDRAPIDS|GRANITERAPIDS-D|ARROWLAKE|ARROWLAKE-S|LUNARLAKE|PANTHERLAKE|CLEARWATERFOREST] // Linux
  [/arch:|/Qx|/Qax][SIERRAFOREST|GRANDRIDGE|GRANITERAPIDS|EMERALDRAPIDS|GRANITERAPIDS-D|ARROWLAKE|ARROWLAKE-S|LUNARLAKE|PANTHERLAKE|CLEARWATERFOREST] // Windows
- SYCLcompat introduces a new experimental launch API which allows the user to pass kernel properties, launch properties, and required local memory size in a launch_policy struct. These requirements are passed down to the SYCL runtime to define how the kernel is launched.
- Other small usability improvements
Major Enhancements and New Features
New Features:
- Hardware Enablement: Optimized for new Intel hardware including EMR, GNR, BMG, and LNL, with features such as cache hints and new data types for AI applications, delivering improved efficiency and computing power.
- Bindless Textures Support: Implemented Bindless Textures for Intel GPUs (DG2, Arc), allowing dynamic texture usage at runtime without compile-time knowledge, enabling enhanced performance and scalability.
Performance Tuning and Enhancements:
- AI and HPC Optimization: Tuned performance for AI frameworks and HPC applications.
- OpenMP Enhancements: Early support for OpenMP 6.0 features, including the DEVICE_TYPE clause for the TARGET construct and mandatory offloading support. Also, fixed the OpenMP loop rotation issue. Check out Advanced OpenMP* Device Offload with Intel® Compilers for more details.
- Compiler Reports: Enhanced opt-report for better user experience, now providing detailed information on OpenMP offloading and integrating with the open-source optimization remark framework. Details on recent enhancements can be found at Develop Highly Optimized Applications Faster with Compiler Optimization Reports.
- Sanitizers for Device Code: Device code now supports LLVM sanitizers to help detect and resolve issues during development. It includes a compiler instrumentation module and runtime support, allowing it to detect issues such as out-of-bounds memory access on USM, SYCL buffers, local memory, and device globals, as well as bad-free, use-after-free, bad context, and more. In this release, PVC GPUs and CPUs are supported on Linux OS. More details on how and when to use sanitizers can be found at Find Bugs Quickly Using Sanitizers with the Intel® oneAPI DPC++/C++ Compiler.
- Comprehensive Performance Insights: Upgraded optimization reports now cover SYCL, OpenMP, and AOT compilation, offering developers deeper insights into application performance.
- Hardware Profile Guided Optimization (HWPGO): Key improvements include enhanced profile propagation for better accuracy, additional profile-driven optimizations to further boost performance, and early support for "pseudo probes" on Windows as an alternative to DWARF for profiling. Additionally, HWPGO has introduced selective function outlining, allowing for specific functions to be optimized based on profiling data, further enhancing runtime efficiency.
New Features
SYCL Compiler:
- SYCL Offload Model: Introduced a new SYCL offload driver mechanism with --offload-new-driver to improve infrastructure for better link times by reducing I/O and external processes.
- Range Rounding Control: Added -fsycl-range-rounding option for managing range rounding, including forcing full rounding to reduce binary size. Additionally, the experimental -fsycl-exp-range-rounding option performs rounding across all dimensions.
- Double Type Emulation: Added -fsycl-fp64-conv-emu option for partial emulation of double data types on Intel GPUs.
- Dynamic Linking: Initial support added for dynamic linking, though some features like kernel_bundle API and AOT mode are not yet supported.
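As background for the range rounding options, the sketch below shows the general idea in plain C++, assuming that range rounding means padding the launched range up to a convenient multiple and guarding the kernel body with a bounds check. The multiple of 32 used in the usage note and the `round_up` and `run_rounded` helpers are illustrative assumptions, not the compiler's actual parameters or implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round n up to the next multiple of `multiple`.
inline std::size_t round_up(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

// Emulates launching a kernel over a rounded range: the padded work-items
// are filtered out by a bounds check, so only `user_range` items do work.
// Returns the number of work-items that were "launched".
inline std::size_t run_rounded(std::size_t user_range, std::size_t multiple,
                               std::vector<int>& out) {
    const std::size_t launched = round_up(user_range, multiple);
    for (std::size_t id = 0; id < launched; ++id) {
        if (id >= user_range) continue;  // guard wrapped around the kernel body
        out[id] += 1;                    // stand-in for the user's kernel body
    }
    return launched;
}
```

With a user range of 1000 and a multiple of 32, 1024 work-items are launched and the last 24 do nothing. Rounding in only some cases would require keeping both a guarded and an unguarded version of each kernel, which is one plausible reading of why forcing full rounding can reduce binary size.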
SYCL Library:
- Extensions: Implemented multiple extensions, including sycl_ext_oneapi_prod, sycl_ext_oneapi_profiling_tag, sycl_ext_oneapi_forward_progress, sycl_ext_oneapi_private_alloca, sycl_ext_codeplay_enqueue_native_command, and sycl_ext_oneapi_enqueue_functions.
- Group Load/Store: Added support for sycl_ext_oneapi_group_load_store, enabling native hardware block read/write capabilities where applicable.
- Free Function Kernels: Initial support for sycl_ext_oneapi_free_function_kernels extension, with known limitations around argument types and diagnostics.
- Fused Multiply-Add (FMA): Added the experimental ESIMD function fma, which guarantees that a fused multiply-add operation is performed.
- Improved sycl_ext_oneapi_group_sort extension: Updated implementation of sycl_ext_oneapi_group_sort extension to match revision 2 of the specification. Revision 1 is no longer available, and some code changes may be required.
Improvements
SYCL Compiler
- Improved Compilation Flow: The process of generating integration footers has been optimized when no third-party host compiler is used, resulting in fewer temporary files and faster compilation times.
- Additional Math Function Support: New support for math functions like truncf, sinpif, rsqrtf, exp10f, ceilf, copysignf, cospif, fmaxf, and fminf in SYCL kernels has been added as part of the C-CXX-StandardLibrary extension. More Intel Math Functions (IMF), ::rand and ::srand in device code on Intel devices, have also been integrated.
- Enhanced Error Messaging: Error messages have been improved for scenarios involving implicit this capture in kernels and missing architecture information when multiple targets are passed into the -fsycl-targets flag.
- Optimized Compilation Flow: The number of commands needed for generating dependencies using the -MD flag has been reduced, streamlining the build process.
- Security and Debugging: Security-related compilation flags for libraries and tools have been strengthened, and the debugging experience has been improved for both Linux and Windows environments.
SYCL Library
- Support for ESIMD functions: Added support for sqrt and rsqrt functions for double data types in ESIMD.
- Cubemap and Sampled Image Arrays Support: Updated sycl_ext_oneapi_bindless_images extension to support cubemap images and sampled image arrays.
- Named Barrier Allocation in ESIMD: Introduced ESIMD API for dynamic allocation of named barriers.
- Executable Command Graph Update: Added support for whole graph updates using executable_command_graph::update.
- Deprecation Warning: A warning has been added for the use of the deprecated <CL/sycl.hpp> header.
- Accessor Improvements: local_accessor::get_pointer and local_accessor::get_multi_ptr now throw an invalid exception if called on the host.
- Queue Operations Detection: Extended detection of nested queue operations to support shortcut methods.
- Simplified ESIMD API Interface: Added overloads of various ESIMD APIs (e.g., atomic_update, block_load, block_store) allowing omission of some template arguments.
- Bfloat16 Math Functions: Updated sycl_ext_oneapi_bfloat16_math_functions to support vectors of bfloat16 passed to math functions.
- Optimized sycl::vec::as: Improved the performance of sycl::vec::as by optimizing the implementation of sycl::detail::memcpy.
- SYCL 2020 Exception Updates: Updated the implementation to throw SYCL 2020 exceptions instead of legacy SYCL 1.2.1 exceptions across the board.
- sycl::vec::convert Support: Added support for sycl::vec::convert to and from vec<bfloat16, N>.
- Deprecations: marray<bool, n>::operator++/-- and accessor::get_multi_ptr for non-device accessors have been deprecated.
- ESIMD Named Barriers: Moved ESIMD named barrier APIs out of the experimental namespace.
- SYCL Extensions and API Enhancements:
  - Implemented the latest revision of sycl_ext_oneapi_free_function_queries.
  - Extended sycl-ls --verbose to print detailed device information, including UUIDs and architecture.
  - Introduced support for compile-time properties in copy_to and copy_from ESIMD APIs.
- Non-Variadic printf Interface: Switched experimental::printf to a non-variadic interface to improve usability when printing float values.
- Enhanced ESIMD API Validation: Improved validation for rdregion and wrregion APIs using static assertions on template arguments.
- SYCL 2020 Specification Alignment: Updated mutating swizzle operators and scalar conversions for vec to align with the SYCL 2020 specification.
- Miscellaneous ESIMD Improvements:
  - Added support for 1- and 2-byte data types to ESIMD prefetch APIs.
  - Enabled ext_intel_matrix support for Intel GNR devices.
  - Introduced new overloads of load_2d, store_2d, and prefetch_2d ESIMD APIs with compile-time properties.
  - Added support for group shift algorithms (e.g., shift_group_left, permute_group_by_xor) for non-uniform groups.
  - Lifted restrictions on the ESIMD block_store API and enhanced the slm_atomic_update API to support fsub and fadd.
- Graph and Semaphore Support:
  - Added support for graph update functionality and external semaphore wait/signal operations with values in the bindless images extension.
  - Introduced device-to-device copying of image_device_handle.
- Unified Runtime: Removed the Plugin Interface, replacing it with the Unified Runtime, which reduces the number and size of redistributable libraries.
- Performance Improvements: Reduced startup overhead of libsycl.so by outlining the SYCL JIT compiler into a standalone library, dynamically loaded on first use.
Bug Fixes
SYCL Compiler
- Fixed a bug where using the -fsycl-link-targets flag would inadvertently trigger additional device code linking steps.
- Resolved an issue where AOT-compiling for Intel GPUs would pass PVC-specific flags even if the target device was not a PVC.
- Fixed a bug with incorrect file extensions being emitted in AOT compilation when using --save-temps.
- Fixed an issue where performing separate compilation and linking with -fsycl-link resulted in a "number of output files and targets should match in unbundling mode" error during the link step.
- Resolved an issue where passing pointers in the generic address space to certain built-in math functions could cause compilation failure.
- Fixed a bug where compiling kernels with different reqd_work_group_size attributes using -fsycl-device-code-split=none could result in a runtime exception about mismatching work-group sizes.
- Resolved a bug where using the reqd_work_group_size attribute with fewer than three arguments caused a crash.
- Addressed issues with shift_group_[right|left], permute_by_xor, and select_from_group algorithms returning invalid values when used with the half data type.
SYCL Library
- Fixed a situation where querying sycl::ext::oneapi::experimental::info::device could result in an exception instead of returning an empty vector.
- Corrected the esimd::atan implementation under the -ffast-math flag.
- Fixed an issue where component devices were not correctly identified as descendants of composite devices when creating a queue.
- Addressed an issue where querying for composite devices could return duplicate entries.
- Fixed bugs in the copy-constructor of the config_2d_mem_access ESIMD class, which led to compilation errors.
- Resolved an issue where the use of atomic_ref<T*> was not detected as using the atomic64 aspect, leading to errors.
- Fixed bugs with ctanh and cexp returning incorrect values in edge cases.
- Fixed an issue where values passed to the -Xs option via build_options were not passed down to the device compiler.
- Fixed a compilation error when defining kernels as named functors while using -fno-sycl-unnamed-lambda.
- Corrected compilation issues with the -fpreview-breaking-changes flag caused by conflicts with macros in windows.h.
- Resolved strict aliasing violations in the implementation of sycl::vec<sycl::half, N>::operator[] that caused errors.
- Fixed bugs where barriers submitted to a command queue with host tasks ignored them, and improved synchronization of host tasks with barriers.
- Fixed issues where the compiler could emit unsupported SPIR-V instructions for bit-reversal.
- Addressed a bug where default-constructed local_accessor arguments could cause runtime errors, especially on Windows and under -O0 optimization on Linux.
- Resolved a hang when invalid values were passed to the ONEAPI_DEVICE_SELECTOR.
- Fixed issues with persistent cache functionality where certain setups would prevent necessary directories from being created.
- Corrected a bug where querying a kernel by name from a kernel bundle could crash the program.
- Fixed an error handling bug where non-blocking pipe operations would mistakenly throw exceptions.
- Addressed compilation issues when using non-uniform group built-ins with marray and vec.
- Resolved a bug where memory attributes applied to a struct used as a type of a device_global variable were ignored.
- Added missing value_type and vector_t member type aliases to swizzles.
- Fixed shutdown sequence issues when SYCL RT was used in applications or libraries with custom shutdown processes.
- Resolved a crash when calling event::get_backend() on a default-constructed event in environments with malformed ONEAPI_DEVICE_SELECTOR.
- Fixed a bug where sycl-ls with --ignore-device-selectors would not properly ignore the environment variable.
- Corrected memory order capabilities returned by the Native CPU backend.
- Fixed the variadic constructor of sycl::ext::oneapi::experimental::properties to match the extension specification.
- Fixed build program failures when using ESIMD functions like load_2d, store_2d, or prefetch_2d.
- Resolved a bug where querying free device memory on integrated Intel GPUs returned 0 instead of throwing an exception for unsupported features.
- Addressed a heap buffer overflow in the sycl_ext_oneapi_kernel_compiler_opencl extension implementation.
- Corrected a bug where the sycl_ext_oneapi_graph extension ignored the access mode of accessors, creating unnecessary graph edges.
- Fixed issues where graph submissions involving barriers could result in runtime errors or cause resource leaks.
- Addressed performance regressions when kernels without dependencies were submitted to in-order queues.
- Fixed profiling issues in Level Zero backend where timestamps could be zeros or incorrect for in-order queues.
- Resolved crashes when using multiple queues with immediate command list properties (immediate_command_list and no_immediate_command_list).
- Fixed a bug where info::kernel_device_specific::work_group_size would return the device-specific limit, ignoring the kernel on the Level Zero backend.
Misc
SYCL Compiler
- Reverted changes previously made on Windows to support a separate compilation scenario where the compilation step was performed without the -fsycl flag, but the link step included the -fsycl flag. This scenario is now considered unsupported, as the compiler does not know which version of the standard library to link during the link step.
API/ABI Breaking Changes in 2025.0
This release is an ABI-breaking release, meaning that any applications built with older versions of the toolchain must be recompiled to run with newer versions of the SYCL runtime library.
- Bumped the major version of the SYCL runtime library to 8.
- Cleaned up the list of symbols exported from the SYCL runtime library by dropping some legacy symbols and hiding others that should not have been exported.
- Updated the ABI of several functions and methods to avoid using std::string and other objects in the library interface, allowing SYCL RT to be used in applications built with pre-C++11 ABI.
- Changed the ext_oneapi_copy API from the experimental sycl_ext_oneapi_bindless_images extension to accept const-qualified types for the Src parameter.
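One common way to keep a library interface independent of the caller's std::string ABI is to pass only a pointer and a length across the library boundary and do the std::string conversion in a header-only wrapper. The sketch below illustrates that general pattern with a hypothetical `mylib` namespace and `count_spaces` function; it is not the SYCL runtime's actual code.

```cpp
#include <cstddef>
#include <string>

namespace mylib {  // hypothetical library, used only for illustration
namespace detail {

// Would be the exported entry point in a real library: only a raw pointer
// and a length cross the library boundary, so the signature does not depend
// on which std::string ABI the caller was built with.
std::size_t count_spaces_impl(const char* data, std::size_t size) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] == ' ') ++n;
    }
    return n;
}

}  // namespace detail

// Header-only convenience wrapper: compiled as part of the caller's code,
// so it always uses the caller's own std::string layout.
inline std::size_t count_spaces(const std::string& s) {
    return detail::count_spaces_impl(s.data(), s.size());
}

}  // namespace mylib
```

Because the wrapper is inline, the only symbol the library itself has to export is the pointer-and-length function, which looks the same under both the old and the C++11 libstdc++ ABI.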
Several API breaking changes were made, including dropping support for previously deprecated APIs and switching implementations of some classes to a preview implementation. Code modification recommendations for some of these breaking changes can be found here.
- Removed the sycl::abs overload taking a floating-point argument.
- Removed sycl::host_ptr and sycl::device_ptr.
- Removed queue::discard_or_return.
- Removed sycl::make_unique_ptr.
- Removed the use_primary_context property and methods related to the previously removed host device.
- Removed SYCL 1.2.1 exception subclasses, including runtime_error, nd_range_error, invalid_parameter_error, device_error, and feature_not_supported.
- Removed queue::mem_advice overload accepting pi_mem_advice.
- Removed several deprecated ESIMD APIs.
- Removed the non-standard sycl::id -> sycl::range conversion operator.
- Removed deprecated APIs from the sycl_ext_oneapi_bindless_images extension implementation.
- Renamed the experimental destroy_external_semaphore API from the sycl_ext_oneapi_bindless_images extension to release_external_semaphore.
- Replaced the image_channel_order field of the image_descriptor struct with the number of channels in the experimental sycl_ext_oneapi_bindless_images extension.
- Enforced restrictions on the first argument of lambdas/functors passed to parallel_for(range) and parallel_for(nd_range).
- Switched the sycl::vec implementation to its preview version, which uses a different storage type to fix several strict aliasing rule violations.
- Restricted math operations available to vec<std::byte, N> to those applicable to std::byte.
- Switched the sycl::exception implementation to its preview version.
- Switched math built-ins implementation to use their preview version.
- Switched bfloat16 implementation to use its preview version.
- Switched sycl::nd_item implementation to use its preview version.
- Enforced a restriction that a buffer's element type must be device copyable.
- Restructured SYCL headers to exclude <cmath> and <complex>.
- Dropped support for the SYCL_DEVICE_FILTER environment variable.
- Updated the accessor::get_pointer interface to return global_ptr<value_type>, which can be const-qualified if the accessor data type is const-qualified or if the accessor is read-only.
- Removed deprecated APIs related to sycl_ext_oneapi_free_function_queries.
- Moved slm_allocator ESIMD APIs into the experimental namespace.
- Removed the deprecated usm_system_allocator aspect.
- Removed get_child_group API from the experimental sycl_ext_oneapi_root_group extension.
- Simplified template arguments related to simd_view of many ESIMD APIs.
- Removed ESIMD atomic_op::predec.
- Dropped interfaces from revision 1 of the experimental sycl_ext_oneapi_group_sort extension.
- Changed the return type of command_graph::begin_recording and command_graph::end_recording from void to bool in the experimental sycl_ext_oneapi_graph extension.
Breaking changes were also made to compiler flags:
- Removed the deprecated -fsycl-link-huge-device-code, -fsycl-[add|link]-targets, -foffload-static-lib, -foffload-whole-static-lib, -fsycl-disable-range-rounding, -sycl-std flags.
SYCL Known Issues
- On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is due to the release of the plugin DLLs racing against the release of static global variables, such as the default context.
- The Intel Graphics Compiler's Vector Compute backend does not support certain optimization levels and often produces incorrect results or crashes. This issue directly affects ESIMD code. As a temporary workaround, optimize ESIMD code even in the affected mode.
- When using the sycl_ext_oneapi_matrix extension, it is important for some devices to use the appropriate settings corresponding to the device that will run the program, particularly for matrix operations using half data type.
- When using queue shortcut functions with in-order queues, dependencies between commands submitted to different queues may be ignored. For example, a second kernel that depends on a first kernel submitted to a different queue can start executing before the first completes. A workaround is to explicitly call .wait(). This issue will be fixed in the next release.
- C/C++ math built-ins can return incorrect results for some edge-case inputs when called from SYCL kernels.
- To enhance performance on Intel® GPUs using the Unified Runtime Level Zero Adapter, support for driver-optimized in-order lists has been introduced in version 2025.0. However, when running workloads with sycl::property::queue::enable_profiling, some performance overhead from these lists is expected. If this overhead negatively impacts performance, it can be mitigated by disabling the driver in-order lists. To do so, set UR_L0_USE_DRIVER_INORDER_LISTS=0.
- To ensure compatibility with the Intel® oneAPI DPC++ Compiler on Windows*, which requires OpenCL 3.0, it is essential to address potential issues caused by older versions of opencl.dll on your system. If an outdated opencl.dll is present in system directories or takes precedence in the library path, it may lead to failures, including SYCL-related issues and crashes in tools like Intel® VTune™ and Intel® Advisor when specific OpenCL 3.0 features are used. The recommended solution is to replace the old opencl.dll with the one installed in the DPC++ package. You can do this by copying the newer opencl.dll from $oneAPI_Install_Folder\compiler\latest\bin to your system folder. Be sure to back up the original opencl.dll in case it's needed for other applications.
- sycl_ext_oneapi_free_function_kernels has limitations including:
  - free function kernels are only supported if defined at file scope
  - SYCL_EXTERNAL has to be used alongside SYCL_EXT_ONEAPI_FUNCTION_PROPERTY to define a free function kernel
  - compiler won't emit any diagnostics if some restrictions from the extension specification are violated
  - arguments of a free function kernel cannot be composite data types like structs or SYCL classes like accessor
  - using -fsycl-dead-args-optimization (ON by default) can lead to failures
  - info::kernel::num_args won't return the right result for free function kernels
New OpenMP Features
- Support for the -fopenmp-offload-mandatory compiler flag to omit creation of host-fallback code and emit a runtime error if OpenMP offload to the device fails.
- Improved optimization report support for OpenMP constructs.
- Enhanced conversion scheme of nested loop constructs to consider loop trip counts.
- Updates to the declare variant for a dispatch construct to include GPUs with the Xe2 architecture when the match clause specifies device={arch(gen)}.
- Support for the device_type(host|nohost|any) clause for the target construct.
- Inclusion of the if clause for the teams construct.
- Change of the map-type property to "default," allowing map-type modifiers to be specified without a map-type. For example, map(always : x) is equivalent to map(always, tofrom : x).
- Support for the Intel extension ompx_sub_group_size clause for the target construct to set the SIMD width of the kernel.
- Support for the Intel extension ompx_dyn_cgroup_mem clause for the target construct, allowing dynamic allocation in SLM for GPU offloading.
- Extension of environment variables OMP_THREAD_LIMIT, OMP_TEAMS_THREAD_LIMIT, and OMP_NUM_THREADS to support abstract names. For example, OMP_THREAD_LIMIT=n_cores.
- Extension of the syntax of the environment variable OMP_PLACES to support bound and stride for abstract names. For example, OMP_PLACES=threads(4:2).
- Host runtime support for the environment variable OMP_AVAILABLE_DEVICES.
- Extension of the environment variable OMP_DEFAULT_DEVICE to support device selection by traits.
Notable OpenMP Fixes
- Fixed a bug where the dispatch construct’s device clause was not updating OpenMP’s default-device-var ICV.
- Resolved an internal compiler error when the declare variant for a dispatch construct did not specify an adjust_args clause.
- Fixed an optimization bug in OpenMP for and simd loops with large trip counts.
- Corrected a regression where enclosing task constructs inside a teams construct triggered a compiler error message.
- When thread_limit is specified for both target and teams, the compiler now correctly chooses their minimum instead of always using the one specified for target.
- Fixed an internal compiler error related to the initialization of global variables allocated in GPU’s SLM.
- Addressed a problem in offload runtime where the reference counts of variables mapped using declare mapper were not decremented correctly.
- Fixed a GPU offload performance issue related to L1 cache being affected by temporary copies of reduction variables.
- Resolved a bug where user-defined reduction variables were not properly constructed or destructed.
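The thread_limit fix listed above can be pictured with a small model. The sketch below is only an illustration of the stated rule (the effective limit is the minimum of the two values); the helper name effective_thread_limit and the numeric values are hypothetical and are not part of OpenMP or the compiler.

```cpp
#include <algorithm>

// Hypothetical helper (not an OpenMP API): models the rule that when
// thread_limit appears on both the target and the teams construct, the
// smaller of the two values takes effect.
constexpr int effective_thread_limit(int target_limit, int teams_limit) {
    return std::min(target_limit, teams_limit);
}

// Directive form the rule applies to (values are illustrative only):
//   #pragma omp target thread_limit(64)
//   #pragma omp teams thread_limit(32)
//   { /* under the fixed behavior, the limit of 32 applies */ }
static_assert(effective_thread_limit(64, 32) == 32, "teams value is smaller");
static_assert(effective_thread_limit(16, 128) == 16, "target value is smaller");
```

Before the fix, the value given on target would have been used in both cases, so code that relied on a smaller teams limit could have run with more threads than intended.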
OpenMP Known Issues
- Implicit barriers at the end of parallel regions do not act as synchronization points for the tasks associated with target nowait and dispatch nowait constructs. This may cause incorrect results or crashes. A workaround is to place #pragma omp taskwait at the end of the parallel region, which synchronizes the target/dispatch nowait regions at the point where the parallel region’s implicit barrier would otherwise have done so.
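A minimal sketch of the taskwait workaround described above, assuming a simple target nowait region launched from inside a parallel region. The function name scale_in_target_region and the loop body are hypothetical and chosen only for illustration. When compiled without OpenMP support the pragmas are ignored and the loop runs on the host.

```cpp
#include <vector>

// Hypothetical example function: doubles every element of `data` in an
// asynchronous target region that is launched inside a parallel region.
void scale_in_target_region(std::vector<int>& data) {
    int* p = data.data();
    const int n = static_cast<int>(data.size());

    #pragma omp parallel num_threads(2)
    {
        // One thread launches the offload; nowait makes it a deferred task.
        #pragma omp single
        {
            #pragma omp target nowait map(tofrom: p[0:n])
            {
                for (int i = 0; i < n; ++i) {
                    p[i] *= 2;
                }
            }
        }

        // Workaround: wait explicitly for the target task before the
        // parallel region ends, rather than relying on its implicit barrier.
        #pragma omp taskwait
    }
    // With the taskwait in place, `data` is safe to read here.
}
```

The taskwait is placed after the single construct so that the thread that generated the target task waits for it; the other threads have no child tasks and pass through immediately.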
Other Known Issues and Limitations
- Visual Studio IDE Integration: Building a C++ project with 'Intel C++ Compiler 2025' for the Win32 platform fails with an error that the ICX compiler was not found. The Win32 platform is not supported with 'Intel C++ Compiler 2025'; compile projects for the x64 platform only.
Hardware Support:
- -march=lunarlake
- -march=graniterapids
Please check here for details about -march usage.
Toolchain Support to Intel Platforms
Granite Rapids | Granite Rapids-D | Lunar Lake |
GCC 13.1 | GCC 13.1 | GCC 14.1 |
Binutils 2.40 | Binutils 2.41 | Binutils 2.42 |
Glibc 2.37 | Glibc 2.37 | Glibc 2.39 |
LLVM 16.0 | LLVM 17.0 | LLVM 18.0 |
ICX 2023.1 | ICX 2023.2 | ICX 2024.0 |
C/C++ Standard
- Intel® oneAPI DPC++/C++ Compiler version 2025.0 supports the C/C++ standards through the Clang 19 front end.
- Initiated support for C++2c, the next release of C++ after C++23, and C2y, the next release of C after C23
- Finalized the implementation of “deducing this” (C++23)
- Relaxed some constexpr restrictions (C++23)
- Implemented the [[assume]] attribute (C++23)
- Completed support for Concepts (C++20)
- Added support for char8_t (C23)
- Implemented the constexpr keyword for object declarations (C23)
- Implemented #embed for embedding binary resources in source (C23)
System Requirements
Additional Documentation
- Get Started with the Intel® oneAPI Toolkit for Linux*
- Get Started with the Intel® oneAPI Toolkit for Windows*
- OneAPI Versioning Schema based on Semantic Versioning
- Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference
- Intel® oneAPI Programming Guide
- SYCL* 2020 Specification Features and DPC++ Language Extensions Supported
- OpenMP* Features and Extensions Supported in Intel® oneAPI DPC++/C++ Compiler
Notices and Disclaimers
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
Intel technologies may require enabled hardware, software, or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.