Intel® oneAPI DPC++/C++ Compiler Release Notes

This document summarizes new and changed product features and includes notes about features and problems not described in the product documentation.

Where to Find the Release

Download the Intel® oneAPI Base Toolkit from the Intel® oneAPI Base Toolkit Download page and follow the installation instructions.

The Intel® oneAPI DPC++/C++ Compiler’s integrated support for Altera FPGA has been removed as of the 2025.1 release. Altera® will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.
For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.

oneAPI 2025.2, Compiler Release 2025.2

Major New Features and Enhancements

ThreadSanitizer Support: Extended CPU ThreadSanitizer support to the device side to detect data races in both CPU and device code. It supports data race detection in USM memory, SYCL buffers, and device_global memory in SYCL and OpenMP C/C++ device code.
MemorySanitizer Support: The device-side MemorySanitizer now supports OpenMP offload to detect uninitialized memory use. As an experimental feature, it also supports detection in local and private memory.
Hardware Profile Guided Optimization (HWPGO): Enhanced HWPGO to remove the pseudo-probe description and restore the DWARF discriminator for call instructions when using -sample-profile-remove-probe. Enhanced the Clang driver to automatically add column info, which is important for HWPGO to generate and load profile files, when using -gdwarf or -fprofile-sample-use.

New Features

C/C++ Compiler:

Added an --lbr-mispredicts mode to llvm-profgen which can use the LBR_INFO branch prediction flag to create a branch mispredict profile instead of samples of a separate branch mispredict event. This approach has the known downside of only collecting mispredicts on taken branches, but it may simplify sampling requirements in special use cases. A separate branch mispredict event, as described in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference, should still be used in preference to --lbr-mispredicts.
Improved code generation for AMX ISA and inline assembly error checking.
Various improvements in integer and floating point arithmetic.
For the -qopt-report option:
- Memory prefetching will be reported in store/load-only mode (as well as the default mode).
- Memory accesses using omp simd nontemporal will be recorded in the optimization report.
- The vectorizer will add a remark to the optimization report when overriding the unroll factor provided by #pragma unroll. This can happen when the unroll factor is too large for the number of vectorized loop iterations.
Added support for the -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt compiler flags to control the precision of floating-point division and square root.
Native CPU Device:
- Added support for source-based code coverage on Native CPU.
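
The -qopt-report remark about an overridden #pragma unroll factor, listed above, concerns loops like the one in the following sketch. The function name scale16 and the assumed 8-wide vectorization are illustrative only: with 16 iterations and 8 lanes, two vector iterations remain, so a requested unroll factor of 8 cannot be honored. Whether the remark is actually emitted depends on the target and the options used.

```cpp
// Hypothetical example loop; scale16 is not part of any Intel library.
// The trip count is fixed at 16 so that the requested unroll factor (8)
// is larger than the number of iterations left after vectorization on a
// target that processes 8 floats per vector iteration (an assumption).
void scale16(float *a, float s) {
    #pragma unroll(8)
    for (int i = 0; i < 16; ++i) {
        a[i] *= s;
    }
}
```

Compiling a file containing this function with optimization enabled and -qopt-report is one way to check whether the vectorizer reports a changed unroll factor. Compilers that do not recognize the pragma ignore it, and the loop still computes the same result.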

SYCL Compiler:

The sycl_ext_oneapi_kernel_compiler extension specification was updated to accept SYCL as source language.
Initial support for runtime compilation of SYCL code was implemented.
The Level Zero v2 (L0 v2) adapter is a new backend that was added as an experimental feature, with plans to make it the default in 2025.3. This is a redesigned version of the L0 adapter that focuses on maximizing the performance of each queue mode individually. It currently supports immediate in-order mode only. This second version of the adapter significantly reduces host runtime overhead and improves latency of kernel submissions. If you experience any performance or functional issues with this adapter enabled, please report them here, specifying the adapter used.
- It can be enabled with SYCL_UR_USE_LEVEL_ZERO_V2=1.

SYCL Library:

SYCL Graphs:
- Implemented sycl_ext_codeplay_enqueue_native_command extension which allows submitting custom commands for interoperability with native runtimes to graphs built using the sycl_ext_oneapi_graph extension.
- Introduced ability to update host-task nodes in graphs.
SYCL Bindless Images:
- Added support for more kinds of copy operations (image_mem_handle to USM and vice versa, USM to USM, etc.).
- Added support for Vulkan* timeline semaphores.
SYCL Extensions:
- Implemented sycl_khr_default_context extension.
- Introduced and implemented sycl_ext_oneapi_device_image_backend_content extension which allows querying the underlying content of a device image for interoperability with other runtimes (such as OpenCL or Level Zero).
- Introduced and implemented sycl_ext_oneapi_current_device extension which introduces another state into SYCL holding per-thread device.
- Introduced and implemented sycl_ext_oneapi_work_group_static and sycl_ext_oneapi_work_group_scratch_memory extensions that provide different ways of allocating and accessing device local memory (i.e. shared by all work-items within a work-group).
- Introduced and implemented sycl_ext_intel_kernel_queries extension.
- Implemented proposed sycl_ext_intel_event_mode extension.
- Completed implementation of sycl_ext_oneapi_launch_queries extension.
- Completed implementation of the sycl_ext_oneapi_kernel_arg_properties extension by implementing missing unaliased property.
  - It used to be called restrict in previous versions of the extension, but a renaming was done to avoid conflict with C99 restrict type qualifier.
- Introduced and implemented the sycl_ext_oneapi_num_compute_units extension.
Support for core SYCL 2020 functionality:
- Aligned SYCL_LANGUAGE_VERSION macro definition with the recent SYCL 2020 spec change. (See KhronosGroup/SYCL-Docs#704).
- Implemented swizzle method for swizzles.
Support for pre-C++11 ABI:
- Many SYCL APIs use std::string as an argument or return type, and its ABI is known to have been broken by gcc at some point. Some applications are still built using the old, pre-C++11 ABI, and in order to support them, the SYCL Runtime should not use std::string (and some other classes) at the ABI boundary. This effort has been largely completed, but some APIs still surface from time to time and are being fixed:
  - Added support for print_graph API in pre-C++11 ABI mode.
  - Added support for pipe::get_pipe_name API in pre-C++11 ABI mode.
  - Decided not to support get_backend_info in pre-C++11 ABI mode (at least for now) because there are no queries that could be done through it. Calling it under pre-C++11 ABI mode now causes an error.

OpenMP:

Support the OpenMP 6.0 stripe loop-transformation construct.
The nowait clause in target, target enter/exit data, and target update constructs can now take an optional Boolean argument to conditionally choose between asynchronous or synchronous offloading.
For spir64 devices, a new command-line flag, -fopenmp-target-teams-default-vla-alloc-mode=malloc/wilocal (default: wilocal), was added to allow control over how local copies for variable-length arrays private to teams and distribute constructs are allocated.
Added a new command-line flag, -fopenmp-target-loop-stride=local-size/global-size/one (default: local-size), to tune performance of spir64-offloaded OpenMP loops by controlling their loop stride.
Improved debug info in OpenMP-outlined routines where some variables were previously reported as optimized away by gdb.
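
The optional Boolean argument of the nowait clause described above can be sketched as follows. The function name scale_on_device and the run_async parameter are hypothetical; the sketch assumes the OpenMP 6.0 form nowait(expression), where a true value requests asynchronous offloading and a false value requests synchronous offloading.

```cpp
#include <vector>

// Hypothetical helper: scales x in place, offloading the loop when the
// code is built with OpenMP offload enabled. run_async selects at run
// time whether the target region is launched asynchronously.
void scale_on_device(std::vector<float> &x, float s, bool run_async) {
    float *p = x.data();
    const int n = static_cast<int>(x.size());

    #pragma omp target teams distribute parallel for map(tofrom: p[0:n]) nowait(run_async)
    for (int i = 0; i < n; ++i) {
        p[i] *= s;
    }

    // Wait for the deferred target task (if any) before the caller reads x.
    #pragma omp taskwait
}
```

When the code is built without OpenMP support, the pragmas are ignored and the loop runs serially on the host with the same result.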

Unified Runtime:

Expanded support for the Level Zero adapters to provide binary backwards compatibility with Level Zero drivers with APIs as old as v1.7 of the Level Zero Specification.

Improvements and Bugfixes

SYCL Runtime:

Reduced amount of string copies unnecessarily made by the SYCL Runtime for debug traces even if debug tracing is disabled.
Reduced number of times shared_ptrs are copied.
Reduced amount of memory allocations happening by moving away from using std::function. This should also help with reducing compilation time of SYCL headers.
Reduced amount of memory allocations required for local_accessor.
Reduced amount of memory allocations on "fast" kernel enqueue path and dropped some unnecessary runtime checks.
Made more queue operations go through the "fast" path.

C/C++ and SYCL Compilers:

Introduced a new optimization to eliminate back-to-back barriers when it is safe. Such chains of barriers may occur when multiple group algorithms are used next to each other.
Removed a busy-wait loop from the implementation of the -fsycl-max-parallel-link-jobs flag, making it consume fewer resources when waiting.
Uplifted maximum version of SPIR-V that compiler can generate to 1.5.
Made compiler embed device library needed for bfloat16 support into the application (if it is used). This change will allow us to reduce the size of redistributable SYCL Runtime package by eliminating some files from it.
Added a compiler warning diagnostic about undefined SYCL_EXTERNAL functions used in a module to help catch linking errors earlier.
Addressed issue where the compiler would generate invalid SPIR-V if kernel used arguments of boolean type.
Switched to use native bfloat16 implementation for devices that support it, as well as fixed a bug where native implementation won't be used if multiple AOT targets are specified.
Aligned behavior of -Wimplicit-float-conversion with the upstream clang for non-SYCL language modes.
Improved check for unsupported data types to actually rely on target information instead of hardcoded knowledge.
Fixed an issue where compilation with -mlong-double-64 would still result in an error that a 128-bit double is not supported by a target.
Fixed a bug where linking static libraries with SYCL code in them using -l:libname.a spelling would ignore device code from those libraries.
Fixed a bug where having a pure virtual function during device compilation would cause unresolved symbol errors emitted by device compiler on Windows.
Fixed a bug where having two kernels (one annotated with reqd_work_group_size attribute/property and another without it) together with -fsycl-device-code-split=off would cause runtime error about mismatched work-group size.
Native CPU device:
- Improved support for dynamic_address_cast on Native CPU device.
- Improved performance of Native CPU device: fewer memory allocations and thread launches.
- Fixed a bug where submitting the same kernel multiple times at about the same time with different arguments would lead to incorrect arguments being used.
- Fixed compiler crashes when building applications that use atomics.
- Fixed segfaults happening in SYCL CTS tests for async_work_group_copy API.
- Improved support for sub-groups by updating version of oneAPI Construction Kit.
Sanitizers:
- Reduced the frequency of shadow memory reallocation to reduce memory overhead and improve runtime performance.
- Fixed ASAN throwing an exception with UR_RESULT_ERROR_INVALID_ARGUMENT when detecting an incorrect memory free operation.
Explicit SIMD extension:
- Extended sycl_ext_intel_esimd extension specification and implementation with new queries to check support for 2d load/store/prefetch operations.
- Fixed miscompilations of ESIMD functions under high optimization levels when compiler performs aggressive inlining.

SYCL Library:

Made group_[load|store] functions use native built-ins when used with vectors of 16 shorts.
Extended support for shared libraries to make it work with kernel bundles as well.
Added tracing (through SYCL_UR_TRACE) for SYCL_DEVICE_ALLOWLIST decisions for better discoverability of the feature.
Aligned implementation of info::execution_capability query with the recent SYCL 2020 specification change made in KhronosGroup/SYCL-Docs#625.
Fixed compilation issues with group functions like select_from_group with certain data types (pointers, marray<bfloat16, 4>, for example).
Implemented persistent cache eviction.
Enforced constraints documented by the sycl_ext_oneapi_reduction_properties extension.
Clarified and enforced properties constraints in the sycl_ext_oneapi_group_load_store extension specification and implementation.
Implemented properties validation to kernel bundle and graph APIs.
Updated the sycl_ext_oneapi_in_order_queue_events extension specification and implementation to make event returned by ext_oneapi_get_last_event optional for queues where no work had been submitted.
Updated the sycl_ext_oneapi_group_load_store extension specification and implementation to accept the alignment property in group load/store built-in functions to allow for a more optimized implementation.
Lifted restriction that host APIs from sycl_ext_oneapi_free_function_kernels had to be guarded by #ifndef __SYCL_DEVICE_ONLY__.
Fixed potential resource leaks in online compiler extension.
Fixed an issue where known_identity<min|max> would return incorrect values with the -ffast-math flag.
Fixed a UB in implementation of device_global which sometimes led to spurious results.
Fixed a static_assert failure in SYCL headers when an application is built with -funsigned-char.
Resolved an issue that caused memory operations enqueued through the sycl_ext_oneapi_enqueue_functions extension to break functionality of the sycl_ext_oneapi_enqueue_barrier extension.
Fixed a bug where compiling with -D_FORTIFY_SOURCE=2 would cause errors from device compilers at JIT stage (or during AOT compilation) about undefined __memcpy_chk symbol.
Fixed an incorrect result of std::exp(std::complex) in some corner cases.
Fixed a crash happening when you launch a kernel that is defined in both the application and a dlopen-ed shared library after that library was unloaded through dlclose.
Fixed a memory leak happening when a kernel submission failed.
Fixed a bug where using vec::operator[] would cause compilation issues on Windows when an application is built using clang.exe and _DEBUG macro is set.
Aligned joint_matrix_apply implementation with the specification change to be able to modify both matrices.
Bindless Images:
- Added support for ext_oneapi_bindless_sampled_image_fetch_1d, ext_oneapi_bindless_sampled_image_fetch_1d_usm, ext_oneapi_bindless_sampled_image_fetch_2d, ext_oneapi_bindless_sampled_image_fetch_2d_usm and ext_oneapi_bindless_sampled_image_fetch_3d aspects on Level Zero backend.
- Fixed return types of image extent queries to match the specification.
- Clarified the types of supported USM memory in the extension specification.
- Fixed compiler crash caused by the use of anisotropic sampling operations on 3D mipmaps, due to the intrinsic being generated with an incorrect number of LOD gradient parameters.
SYCL Graphs:
- Reimplemented topological sort algorithm used to determine graph nodes execution order to avoid issues with overflowing stack on huge graphs and improve performance.
- Documented the kernel binary update feature, which allows updating kernel nodes in graphs.
- Fixed race condition in command_graph node queries.
- Fixed the issue with not all graph-related classes fully implementing common reference semantics.
- Made ext_oneapi_weak_object extension work with graph objects.
- Fixed a bug where using local_accessor or work_group_memory objects as part of graph update would function incorrectly on non-SYCL backends.
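
The reimplemented topological sort mentioned under SYCL Graphs avoids deep recursion on huge graphs. The sketch below is not the SYCL Runtime's implementation; it shows one well-known non-recursive approach (Kahn's algorithm) under that assumption. The function name topo_order and the adjacency-list representation are hypothetical.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// successors[i] lists the nodes that depend on node i.
// Returns the nodes in an order where every node follows its predecessors,
// or an empty vector if the graph contains a cycle.
// No recursion is used, so stack depth does not grow with graph size.
std::vector<std::size_t> topo_order(
    const std::vector<std::vector<std::size_t>> &successors) {
    const std::size_t n = successors.size();

    // Count unprocessed predecessors of each node.
    std::vector<std::size_t> pending(n, 0);
    for (const auto &out : successors)
        for (std::size_t s : out) ++pending[s];

    // Nodes without predecessors can execute first.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (pending[i] == 0) ready.push(i);

    std::vector<std::size_t> order;
    order.reserve(n);
    while (!ready.empty()) {
        const std::size_t node = ready.front();
        ready.pop();
        order.push_back(node);
        // A successor becomes ready once all its predecessors are ordered.
        for (std::size_t s : successors[node])
            if (--pending[s] == 0) ready.push(s);
    }

    if (order.size() != n) order.clear();  // a cycle left nodes unordered
    return order;
}
```

For a diamond-shaped graph with edges 0→1, 0→2, 1→3, and 2→3, the adjacency list is {{1, 2}, {3}, {3}, {}} and node 3 is ordered only after both 1 and 2.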

SYCLcompat Library:

Introduced a new set of group utility functions and classes aimed at reducing the gap between the syclcompat and dpct namespaces.
Fixed compare_mask putting results in the wrong 2-byte segment of 4-byte output.
Optimized implementation of permute_sub_group_by_xor for the case when logical_sub_group_size is 32.
Added new function ternary_logic_op to perform bitwise logical operations on three input values based on the specified 8-bit truth table.
Fixed issues with multiple vectorized operations returning wrong results.
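
The truth-table idea behind ternary_logic_op, listed above, can be sketched in plain C++. The function below is a hypothetical stand-in, not the syclcompat function, and its bit-ordering convention is an assumption: for each bit position, the bits of a, b, and c form a 3-bit index (a is the most significant bit) that selects one bit of the 8-bit table.

```cpp
#include <cstdint>

// Hypothetical reference sketch; the real syclcompat signature and
// truth-table bit order may differ.
constexpr std::uint32_t ternary_logic_sketch(std::uint32_t a, std::uint32_t b,
                                             std::uint32_t c,
                                             std::uint8_t lut) {
    std::uint32_t result = 0;
    for (int bit = 0; bit < 32; ++bit) {
        // Build the table index from the three input bits at this position.
        const unsigned idx = (((a >> bit) & 1u) << 2) |
                             (((b >> bit) & 1u) << 1) |
                             ((c >> bit) & 1u);
        // Copy the selected truth-table bit into the result.
        result |= ((lut >> idx) & 1u) << bit;
    }
    return result;
}
```

Under this convention, the table 0x96 computes a ^ b ^ c, 0x80 computes a & b & c, and 0xFE computes a | b | c.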

OpenMP:

Fixed a bug affecting usage of target offload in lambda.
Implemented a more robust mechanism to detect OpenMP loops that were optimized away to better differentiate them from malformed loops.
Corrected the handling of OpenMP simd loops that were optimized away, previously crashing the compiler in some cases.
Fixed an issue in the collapsing of OpenMP simd loops at -O0 that caused incorrect vectorization.
Fixed incorrect OMPT callbacks for teams distribute parallel for constructs.
Fixed a lastprivate issue in task and target regions causing spurious function arguments to be created for outer target/task regions.
Improved emission of -qopt-report remarks about how OpenMP data-sharing clauses were optimized.
Fixed an incorrect behavior of tile construct nested inside a target teams distribute parallel for collapse(N) construct.
The flush construct is no longer ignored for OpenMP spir64 offload.

Issues with 3rd-party host compilers:

Fixed compilation issue with get_vec_idx internal helper with MSVC as host compiler.
Fixed missing #include when building with GCC 13 as host compiler.
Fixed compilation issue with joint matrix extension with MSVC from Visual Studio 2019 as host compiler.

Misc:

Removed testing on FPGA Emulator as a step towards our strategy to drop FPGA support. Starting with this release there is no guarantee that FPGA-specific features continue to work.
Docker images containing nightly builds are not provided anymore, but we still provide Dockerfiles so you can build those images yourself.
Fixed OCL CPU Runtime installation script leaving incorrect permissions on a system folder.

Known Issues and Limitations

SYCL:

SYCL headers use unreserved identifiers which sometimes cause clashes with user-provided macro definitions. Known identifiers include: G, VL.
When using sycl_ext_oneapi_matrix extension it is important for some devices to use the sm version (Compute Capability) corresponding to the device that will run the program. This particularly affects matrix operations using half data type.
C/C++ math built-ins (like exp or tanh) can return incorrect results on Windows for some edge-case input. The problems have been fixed in the SYCL implementation, and the remaining issues are thought to be in MSVC.
There are known issues and limitations in virtual functions functionality, such as:
- Optional kernel features handling implementation is not complete yet.
- AOT support is not complete yet.
- A virtual function definition and definitions of all kernels using it must be in the same translation unit. Please refer to sycl/test-e2e/VirtualFunctions to see the list of working and non-working examples.

OpenMP:

Some OpenMP spir64 offload programs compiled with -O0 -g may result in a segfault failure at runtime. Workaround: compile the program without -g, or compile it with -O2 -g.
For OpenMP spir64 offload, the memory-order and memscope clauses of the flush construct are silently ignored and the default (more conservative but correct) values of seq_cst/device are used for now.

Unified Runtime:

On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is because on Windows the release of the plugin DLLs races against the release of static global variables (like the default context).

Intel® Graphics Compiler:

The Vector Compute backend does not support -O0 code and often miscompiles it, producing wrong answers and crashes. This issue directly affects ESIMD code at -O0. As a temporary workaround, the ESIMD code is optimized even in -O0 mode.

CPATH to C_INCLUDE_PATH and CPLUS_INCLUDE_PATH Transition:

The oneAPI environment setup scripts have historically added paths for C, C++, and Fortran header files to the CPATH environment variable. These scripts are being transitioned to add relevant paths to the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables instead of CPATH. This transition is intended to isolate Intel provided header files from the effects of compiler options used by customers to request or suppress compiler warnings in their own source files.

Paths present in the CPATH environment variable specify user include paths. By default, compiler warnings are issued for potentially problematic source code in header files found via these paths subject to use of options like -Wall, and most other options that begin with -W. Paths present in the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables specify system include paths. By default, compiler warnings are suppressed for header files found via these paths (warnings in system header files can be enabled with the -Wsystem-headers option).

For most customers, this transition will be transparent with the only observed difference being that warnings are less likely to be issued for source code in Intel provided header files. However, there are some edge cases that could cause other differences in behavior for some customers.

Paths present in the CPATH environment variable are searched after paths specified by the -I option, but before paths present in the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables, paths specified by the -isystem option, and paths implicitly added by the compiler. Customers that have been adding their own include paths to the end of the CPATH environment variable (after paths historically added by the oneAPI environment scripts) will now find that their include paths are searched before include paths for Intel-provided header files. This could result in a customer-provided header file with the same name as an Intel-provided header file now being found first, whereas previously the Intel-provided header file would have been found first.

If the same path is present in both the CPATH and C_INCLUDE_PATH or CPLUS_INCLUDE_PATH environment variables, or is specified by the -isystem option or implicitly added by the compiler, then the matching path in the CPATH environment variable will be ignored. Customers that, perhaps inadvertently, add a path to Intel-provided header files to the CPATH environment variable that matches a path added to the C_INCLUDE_PATH or CPLUS_INCLUDE_PATH environment variables by the oneAPI environment setup scripts will find their addition to CPATH ignored. Since the oneAPI environment setup scripts would have previously added paths to CPATH, this could result in different include path search orders and thus different header files being found when a customer-provided header file has the same name as one provided by Intel.

Both of the above scenarios depend on header files with the same name being present in multiple include paths. Observable differences are only likely to occur if those header files have different contents or if they use the #include_next directive.

API/ABI Breaking Changes

Removed support for FPGA-related options as part of our strategy to drop FPGA support.
Removed options: -fintelfpga, -fsycl-targets=spir64_fpga[-unknown-unknown], -fsycl-link=early|image, -Xsycl-target-backend=spir64_fpga "opt", -reuse-exe=arg, and -fsycl-help=fpga.
Removed experimental sycl_ext_intel_oneapi_compiler extension support. Its APIs have been marked as deprecated for a while and sycl_ext_oneapi_kernel_compiler extension should be used instead.

Deprecations

Deprecated sycl_ext_oneapi_default_context extension in favor of sycl_khr_default_context extension.
Deprecated -fsycl-fp32-prec-sqrt compiler flag in favor of -foffload-fp32-prec-sqrt flag.
Deprecated overloads of single_task and parallel_for APIs that accept properties which used to be a part of sycl_ext_oneapi_kernel_properties extension. sycl_ext_oneapi_enqueue_functions extension should be used instead.
Deprecated overloads were completely removed from the extension specification.
Deprecated current implementation of get_backend_info API. The SYCL 2020 specification currently does not document anything that could be queried through it and therefore existing queries supported through it are deprecated to avoid possible confusion.

Compiler Patch Release 2025.1.1

A known issue when compiling SYCL code on Windows using CMake with the 2025.1.0 version of the compiler has been fixed in this patch release. Please use 2025.1.1 or just replace IntelSYCLconfig.cmake from 2025.1 with 2025.1.1 for the latest fix for “CMake Error: Could NOT find IntelSYCL (missing: SYCL_LIBRARY)”.
Fixed an issue that caused the launch failure of an application built with the ‘-x’ option specifying the target platform (Tiger Lake or above) on an OS without CET support.
Updated encodings of VCOMX*/VUCOMX* and VGETEXPPBF16 instructions according to AVX10.2 spec rev. 2. This may require an update to SDE 9.53 or later when binary is built with -mavx10.2 or with options implying -mavx10.2.

oneAPI 2025.1, Compiler Release 2025.1

Major New Features and Enhancements

MemorySanitizer Support: Extended CPU Memory Sanitizer support to the device side, including GPUs, facilitating detection and troubleshooting of memory issues in both CPU and device code. This improves application reliability by ensuring comprehensive memory error checking across platforms.

ccache* Integration: Compiler now supports ccache* to significantly speed up build times for C++ and SYCL codes. By caching previous compilations and reusing them, developers can experience faster iterations and more efficient workflows.

Floating Point Accuracy Controls: User control over accuracy of floating-point operations and library calls is now extended to the device code.

SYCL Interoperability with Graphics APIs: Added initial support for SYCL interoperability with DirectX* 12 and Vulkan*, which enables developers to build efficient visual compute, media processing, and rendering applications on Intel® Graphics. For details on image formats and platform support, refer to SYCL Interoperability Limited Support.

New Features

SYCL Compiler:

Implemented initial support for SYCL Virtual Functions with the intent to gather initial feedback from users. Please refer to the Known Issues section for details on current limitations of this feature.
Dynamic linking of device code is now supported via -fsycl-allow-device-image-dependencies command line option. This feature allows device code to be exported via a Windows DLL and includes support for dynamic linking of AOT compiled images for the OpenCL GPU backend.
Enhancements to free function kernel support include the addition of structs as kernel arguments and the inclusion of work group memory as a kernel parameter.
Device sanitizer now supports invalid kernel argument detection, and address sanitizer has been enhanced to detect null pointers.
A mechanism has been implemented to lift restrictions on SYCL device code in constant expressions via the option -fsycl-allow-all-features-in-constexpr.

SYCL Library:

Enhanced SYCL Graph functionality with implicit recording mechanism and dynamic command-groups, and a new graph enqueue function, execute_graph, in accordance with the updated sycl_ext_oneapi_graph extension.
Added support for Intel® Arc™ B series and Intel® Core Ultra Series device architectures.
Added additional devices with Joint Matrix support: Battlemage, Lunar Lake and Arrow Lake H. Added more types and shapes to PVC combinations for SYCL Matrix.
New ESIMD features include mask compressed ESIMD load/store API, support for root group barriers, addition of clamp API for ESIMD, and support for the ext::intel::experimental::esimd::frem function.
Implemented the following set of extensions:
- Added support for sycl_ext_oneapi_enqueue_functions to SYCL Graph.
- Implemented sycl_ext_oneapi_raw_kernel_arg extension.
- Added initial support for sycl_ext_oneapi_atomic16 extension.
- Implemented sycl_ext_oneapi_get_kernel_info extension.
- Implemented sycl_ext_oneapi_work_group_memory extension.
- Implemented sycl_ext_oneapi_reduction_properties extension.

Unified Runtime:

To support NPU/GPU device coexistence in the same application, support for the new L0 init zeInitDrivers has been added in 2025.1. This enables SYCL, OpenVINO™, and other NPU device libraries to coexist in the same application, utilizing GPU + NPU functionality simultaneously.
Updated the Mutable Command List support in the UR L0 Adapter to utilize the Level Zero Specification’s extension functionality instead of the driver-experimental functionality.
For improved performance, usage of immediate command lists is the default behavior on Linux in the UR L0 adapter for Intel® Arc™ Series GPUs along with Intel® Core Ultra 200v Series.
On Windows, usage of immediate command lists is the default behavior on Intel® Arc™ B Series GPUs along with Intel® Core Ultra 200v Series.

OpenMP:

Support the OpenMP 6.0 interchange loop-transformation construct and the permutation clause.
Emit opt-report remarks for load/store of variables listed in the nontemporal clause of the simd construct.
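
The interchange construct and its permutation clause, mentioned above, can be sketched as follows. The function name add_one is hypothetical. The sketch assumes the OpenMP 6.0 meaning of permutation(2, 1), which asks the compiler to make the second loop of the nest the outermost one.

```cpp
#include <vector>

// Hypothetical example: adds one to every element of a row-major
// rows x cols matrix. As written, the outer loop walks columns, which
// gives strided memory access; the directive requests that the two loops
// be swapped so that the inner loop walks contiguous elements.
void add_one(std::vector<double> &m, int rows, int cols) {
    #pragma omp interchange permutation(2, 1)
    for (int j = 0; j < cols; ++j) {
        for (int i = 0; i < rows; ++i) {
            m[i * cols + j] += 1.0;
        }
    }
}
```

The transformation changes only the iteration order, so the result is the same whether or not the compiler applies it; compilers without OpenMP support enabled ignore the pragma.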

Misc:

Added several enhancements in sanitizer support:
- New Numerical Stability Sanitizer (NSAN) for C++ Code adopted from community contributions
- Memory Sanitizer extended to support SYCL and OpenMP C/C++ Device Code (only USM device allocations)
- Major improvements to Address Sanitizer for Device Code: invalid kernel argument detection, null-pointer detection, memory leak detection, private memory support for OpenMP offload
For C/C++ compilations on Linux, added support for -q[no-]unknown-option-as-warning option which provides the ability to handle unknown options on the command line with a warning diagnostic. The default behavior is to error on unknown options.
The compiler's code coverage tool has been enhanced to offer detailed analysis and comprehensive HTML reports, similar to those of ICC, to identify tested and untested code sections.

Improvements

SYCL Compiler:

Removed the need for the SYCL_EXTERNAL attribute in free function kernel definitions.
Improved compilation time for ESIMD kernels.
Disabled attribute propagation from SYCL 1.2.1 and removed remaining SYCL 2017/1.2.1 compatibility elements, including -Wsycl-strict diagnostics.
Ensured compiler-generated integration headers/footers are warning-free to prevent -Werror build failures, especially with third-party host compilers.
Built basic functionality of the SYCL joint_matrix extension on the SPV_KHR_cooperative_matrix extension.
Expanded supported aspects for the CPU AOT target.
Added diagnostics for incorrect arguments with -fsycl-device-obj.
Introduced a warning for applying kernel-only attributes to non-kernel functions.
Fixed misleading diagnostics for non-external functions/variables when using attributes like [[sycl_device]] or [[intel::device_indirectly_callable]].
Updated -fsycl-link=image to package host objects like -fsycl-link=early, ensuring proper linking, especially on Windows.
Added extra optimization passes in the Native CPU pipeline.
Updated -fsycl-host-compiler to use only user-provided hints (e.g., PATH) for locating the specified compiler, avoiding incorrect binary usage.
Deprecated [[intel::reqd_sub_group_size]]; use the SYCL 2020 spelling with the sycl:: namespace.
Disabled ITT annotations in device code by default to reduce code size.
Enabled floating-point atomics via atomicrmw instructions for Native CPU.
Enabled nonsemantic debug info by default to improve the debugging experience.

SYCL Library:

Added binary caching support to the kernel_compiler extension.
Enabled a check on Linux systems to inform users to use SYCL_UR_TRACE instead of SYCL_PI_TRACE.
Improved GDB printers for SYCL types and values.
Renamed ur to ur.call in XPTI traces.
Refactored the XPTI framework to use 128-bit keys for collision elimination and added support for 64-bit universal IDs for backward compatibility.
Made repeated calls to command_graph::begin_recording an error.
Aligned sycl_ext_oneapi_address_cast implementation with the specification.
Optimized the atomic_ref constructor for the SPIR-V target.
Enhanced handling of compile-time properties.
Refined parsing of Device Sanitizer options via the UR_LAYER_ASAN_OPTIONS environment variable.
Improved detection of conflicts between kernel properties related to work group size.
Enhanced framework/app software layers to provide code locations for SYCL-generated XPTI events.
Improved performance of the rsqrt ESIMD API.
Added property validation to core SYCL object constructors.
Deprecated __SYCL_USE_VARIADIC_SPIRV_OCL_PRINTF__.
Enforced data type restrictions in marray and vec.
Improved sycl_ext_oneapi_address_cast by changing "dynamic" behavior to "static" where allowed.
Enhanced sycl-ls to report ext::intel::info::device::device_id.
Added no-op implementations for runtime APIs for Native CPU, as programs are compiled offline.
Updated the local_accessor GDB printer to display elements with a decorated pointer and address space qualifier.
Improved ESIMD copy_to() and copy_from() to use block_load/block_store for better performance.
The OpenCL adapter now uses the local work size set in program IL when not specified in clEnqueueNDRangeKernel.
Improved OpenCL adapter to support older ICD loaders.
Repurposed SYCL_CACHE_TRACE for fine-grained tracing of all SYCL program caches.
Enabled Sysman API by default in the L0 adapter, removing the need to set ZES_ENABLE_SYSMAN.
Allowed copy-construction of device_global without the device_image_scope property.
Improved UR libraries to avoid unnecessary overhead if nothing is subscribed to the ur.call XPTI call stream.
Refactored copy engine usage checks in the L0 adapter for better performance.
Implemented tracing for in-memory kernel and program cache.
Improved error handling in the SYCL RT command enqueue function to provide clearer exceptions.
Added address sanitizer AOT libraries for various GPU/CPU targets and renamed the device sanitizer library to libsycl-asan.
Undeprecated legacy multi_ptr as it is no longer deprecated in the SYCL specification.
Deprecated info::device::atomic64; use sycl::aspect::atomic64 instead.
Removed build options from the fast kernel cache key to reduce lookup overhead.
Improved OpenCL adapter to use the extension version of clGetKernelSubGroupInfo when necessary.
Updated SYCL graph design documentation with a new command-list enqueue path.
Enhanced online_compiler::compile to support pre-C++11 ABI.

Misc:

Support for OpenCL __attribute__((blocking)) has been removed. This allows enabling support for the [[clang::nonblocking]], [[clang::nonallocating]], [[clang::blocking]] and [[clang::allocating]] function type attributes, as well as their GNU-style variants.
For functions that return structs by value, the ABI requires passing a special parameter that holds the address of the memory where the returned struct is placed. This parameter is implicit: users do not see it and cannot provide a vector specification for it. Support for such functions, including emitting the vector-variants attribute for them, has been added.
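As a rough illustration of the function type attributes named in the first item above, the sketch below marks a function as non-blocking and passes it through a function pointer type that carries the same attribute. The names mix_sample, MixFn, and process_block are hypothetical and exist only for this example. Compilers that do not recognize the clang:: attributes are expected to ignore them, typically with a warning.

```cpp
#include <cstddef>

// Hypothetical helper. [[clang::nonblocking]] is a function type attribute,
// so it is written after the parameter list rather than before the return type.
float mix_sample(float a, float b) [[clang::nonblocking]] {
    return 0.5f * (a + b);
}

// Because the attribute belongs to the function type, a function pointer
// type can carry it as well.
using MixFn = float (*)(float, float) [[clang::nonblocking]];

// A non-blocking function that only calls through a non-blocking pointer.
float process_block(const float* in, std::size_t n, MixFn mix) [[clang::nonblocking]] {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc = mix(acc, in[i]);
    return acc;
}
```

Calling process_block(data, 2, mix_sample) with data = {1.0f, 3.0f} folds the two samples into a single value. The attributes do not change the computed result; they only describe the functions' effects to the compiler.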

Bug Fixes

SYCL:

Resolved false positives in Device Sanitizer by unpoisoning local/private shadow memory before function return.
Added ext_oneapi_ballot_group aspect to the spir64_x86_64 target, supported since OpenCL CPU 2024.2.
Restored kernel instantiations on the host for debugger compatibility with SYCL code.
Fixed local scope module variables for Native CPU.
Corrected device libraries requirement mask for the SPIRV target to ensure proper linking.
Suppressed system errors when loading adapters on Windows.
Disabled internalization of kernels for dynamic linking to ensure visibility.
Fixed a use-after-free bug in the clang-linker-wrapper.
Enforced SYCL headers to be included with #include <sycl/sycl.hpp>.
Fixed device module splitting for ESIMD related to using assert in user code.
Correctly assigned architectures to their respective targets with -fsycl-targets.
Fixed devicelib handling when linking multiple images.
Matched -device_options with -device for AOT GPU.
Stopped passing HEX values to -device_options due to IGC limitations.
Fixed crash with an empty -fsycl-targets option.
Set calling convention to spir_func for SPIRV function calls related to specialization constants and hierarchical parallelism.
Added a workaround for SPIRV AccessChain usage in SYCL matrix operations.
Addressed code splitting issues with FPGA archives.
Fixed parsing of device values in backend target options.
Limited Device Sanitizer to report only one error per kernel instance.
Resolved issues with vector shuffle built-ins on the NativeCPU backend.
Fixed incorrect symbolizer output for shared libraries in Device Sanitizer.
Disabled Address Sanitizer on modules with ESIMD to prevent excessive kernel code size.
Fixed iterator invalidation issue in the SYCL Joint Matrix pass on Windows debug builds.
Corrected integration footer for device_global with explicit template specialization.

OpenMP:

Fixed a bug related to mapping of variable-length arrays where the size is known at compile time.
Fixed a performance issue when an unroll construct is in a loop nest bound to an outer parallel for construct.
Fixed potential unsafe vectorization of some loops that are bound to parallel for.
Improved performance of some collapsed loops by choosing a more optimal data size for the collapsed loop IV.
Improved offload performance of some target teams distribute parallel for reduction loops with constant trip count.
Fixed intermittent failures caused by race conditions when using the dispatch construct with SYCL interop objects.
Fixed a bug where the nogroup clause of a taskloop construct was not honored.
Fixed a crash when running certain target nowait (asynchronous offload) kernels containing loops.
Fixed an ICE in some cases where a tile construct is applied to the same loop that is bound to an outer for construct.
Fixed an issue where the device clause was not honored for the dispatch construct.
Improved performance of some low-trip-count loops bound to the loop construct.
Fixed a bug where some for or simd loops with trip counts > MAX_INT were not being transformed correctly.
GPU dispatch now supports “Battlemage” architecture integrated (Lunar Lake) and discrete graphics (Intel® Arc™ B-Series graphics cards) parts that utilize the Xe2 microarchitecture.

Known Issues & Limitations

SYCL:

The following details describe the limited support for SYCL interoperability:
- Platform Support: Intel® Arc™ B-Series Graphics (Battlemage), Intel® Iris® Xe Graphics (DG2), Intel® Core™ Ultra Processors (Lunar Lake and Meteor Lake).
- Image channels: 1, 2 and 4-channel
- Image formats: VK_FORMAT_R16G16_SFLOAT, VK_FORMAT_R32_SFLOAT, VK_FORMAT_R16G16B16A16_SFLOAT, VK_FORMAT_R32G32_SFLOAT, VK_FORMAT_R16_SFLOAT
- Known issues
  - On Intel® Iris® Xe Graphics and Intel® Core™ Ultra Series 1 (Meteor Lake) Processors, there is a known issue with compressed 2D and 3D images with 1, 2, or 4 channels that are larger than 64 KB. If such images are exported from other APIs and imported into SYCL for manipulation, data mismatches occur once SYCL performs computations on them. This issue, found in GPU driver version 2507.12, will be addressed in an upcoming GPU driver release.
There is a known issue when compiling SYCL code on Windows using CMake with the 2025.1 version of the compiler, which can cause errors like the following:

CMake Error at C:/Program Files/CMake/share/cmake-3.31/Modules/FindPackageHandleStandardArgs.cmake:233 (message):
Could NOT find IntelSYCL (missing: SYCL_LIBRARY)
Reason given by package: SYCL: It appears that the C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/icx.exe does not support SYCL
Workaround: Update the file C:\Program Files (x86)\Intel\oneAPI\compiler\latest\lib\cmake\IntelSYCL\IntelSYCLConfig.cmake with the following two changes:
- Line 332: Replace set(sycl_lib_suffix "7") with set(sycl_lib_suffix "8")
- Line 365: Replace set(SYCL_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SYCL_FLAGS}") with
  list(JOIN SYCL_FLAGS " " SYCL_FLAGS_STRING)
  message(DEBUG "SYCL_FLAGS_STRING: ${SYCL_FLAGS_STRING}")
  set(SYCL_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SYCL_FLAGS_STRING}")
  
  A fix for this issue is now available in the 2025.1.1 compiler release.
On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is because the release of the plugin DLLs races against the release of static global variables (like the default context).
The Intel Graphics Compiler's Vector Compute backend does not support O0 code; such code is often miscompiled, producing wrong answers and crashes. This issue directly affects ESIMD code at O0. As a temporary workaround, ESIMD code is optimized even in O0 mode.
C/C++ math built-ins (like exp or tanh) can return incorrect results on Windows for some edge-case input. The problems have been fixed in the SYCL implementation, and the remaining issues are thought to be in MSVC.
[new] There are known issues and limitations in virtual functions functionality, such as:
- Optional kernel features handling implementation is not complete yet.
- AOT support is not complete yet.
- A virtual function definition and definitions of all kernels using it must be in the same translation unit. Please refer to sycl/test-e2e/VirtualFunctions to see the list of working and non-working examples.
In synthetic benchmarks, performance on Intel Flex and Arc A-Series GPUs may be lower than previously measured when running with the new default of Immediate Command Lists in the SYCL/Unified Runtime L0 adapter. To regain the lost performance on those workloads, create the SYCL queue with the `no_immediate_command_list` queue property (sycl::ext::intel::property::queue::no_immediate_command_list) or set the environment variable UR_L0_USE_IMMEDIATE_COMMANDLISTS=0. Either setting enforces command batching in the Unified Runtime L0 adapter, which may improve the performance of those workloads.

OpenMP:

Offload code with reduction across teams may result in incorrect results or even hangs on some platforms with integrated GPUs.
ICX and ICPX ignore "#pragma omp flush" for spir64 offload compilation.

Other Known Issues:

The switch from a static to a dynamic sanitizer runtime in the 2025.1 compiler has led to runtime crashes caused by a missing clang_rt.asan_dynamic-x86_64.dll. The workaround is to add C:\Program Files (x86)\Intel\oneAPI\compiler\2025.1\lib\clang\20\lib\windows to the PATH environment variable.

API/ABI Breaking Changes

Updated the documentation and implementation of the experimental sycl_ext_oneapi_bindless_images extension: interoperability structs and functions were renamed to use the keyword external instead of interop.
Removed sycl::ext::oneapi::experimental::is_property_key.
Removed some OSUtil::* functions from the ABI under -fpreview-breaking-changes; these are used internally in the DSO and do not need to be exposed outside it.
Made the ext_oneapi_cl_profile implementation ABI-neutral.
Made the SYCL Graph API ABI-neutral to avoid dual-ABI issues on Linux.

oneAPI 2025.0.1, Compiler Patch Release 2025.0.4

This patch release of the compiler consists of various bug fixes and quality improvements.

Deprecation Notice: The Intel® oneAPI DPC++/C++ Compiler integrated support for Altera FPGA is now deprecated and will be removed with the compiler's release in the first quarter of 2025. Altera* will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.

For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.

Compiler Patch Release 2025.0.1

This patch release consists of the following new features, improvements and bug fixes:

Added functionality to compress device images during compilation and decompress them at runtime as needed. More details on this feature and case studies can be found at C++ with SYCL Device Image Compression.
The Unified Runtime Level Zero Adapter enabled the usage of Level Zero System Management functionality by default.
Added the launch API to the SYCLcompat library.
An ABI-neutral version of modifiable_command_graph::print_graph has been enabled under the preview option and will be enabled by default in the next major release.
Fixed "-ipp" / "-qipp" switch linkage error.
Added the following missing option values in the IDE for the -x, -ax, /arch, /Qx, and /Qax flags:
- Linux: [-x|-ax][SIERRAFOREST|GRANDRIDGE|GRANITERAPIDS|EMERALDRAPIDS|GRANITERAPIDS-D|ARROWLAKE|ARROWLAKE-S|LUNARLAKE|PANTHERLAKE|CLEARWATERFOREST]
- Windows: [/arch:|/Qx|/Qax][SIERRAFOREST|GRANDRIDGE|GRANITERAPIDS|EMERALDRAPIDS|GRANITERAPIDS-D|ARROWLAKE|ARROWLAKE-S|LUNARLAKE|PANTHERLAKE|CLEARWATERFOREST]
SYCLcompat introduces a new experimental launch API which allows the user to pass kernel properties, launch properties, and required local memory size in a launch_policy struct. These requirements are passed down to the SYCL runtime to define how the kernel is launched.
Other small usability improvements

oneAPI 2025.0, Compiler Release 2025.0

Major Enhancements and New Features

New Features:

Hardware Enablement: Optimized for new Intel hardware including EMR, GNR, BMG, and LNL, with features such as cache hints and new data types for AI applications, delivering improved efficiency and computing power.
Bindless Textures Support: Implemented Bindless Textures for Intel GPUs (DG2, Arc), allowing dynamic texture usage at runtime without compile-time knowledge, enabling enhanced performance and scalability.

Performance Tuning and Enhancements:

AI and HPC Optimization: Tuned performance for AI frameworks and HPC applications.
OpenMP Enhancements: Early support for OpenMP 6.0 features, including the DEVICE_TYPE clause for the TARGET construct and mandatory offloading support. Also, fixed the OpenMP loop rotation issue. Check out Advanced OpenMP* Device Offload with Intel® Compilers for more details.
Compiler Reports: Enhanced opt-report for better user experience, now providing detailed information on OpenMP offloading and integrating with the open-source optimization remark framework. Details on recent enhancements can be found at Develop Highly Optimized Applications Faster with Compiler Optimization Reports.
Sanitizers for Device Code: Device code now supports LLVM sanitizers to help detect and resolve issues during development. It includes a compiler instrumentation module and runtime support, allowing it to detect issues such as out-of-bounds memory access on USM, SYCL buffers, local memory, and device globals, as well as bad-free, use-after-free, bad context, and more. In this release, PVC GPUs and CPUs are supported on Linux OS. More details on how and when to use sanitizers can be found at Find Bugs Quickly Using Sanitizers with the Intel® oneAPI DPC++/C++ Compiler.
Comprehensive Performance Insights: Upgraded optimization reports now cover SYCL, OpenMP, and AOT compilation, offering developers deeper insights into application performance.
Hardware Profile Guided Optimization (HWPGO): Key improvements include enhanced profile propagation for better accuracy, additional profile-driven optimizations to further boost performance, and early support for "pseudo probes" on Windows as an alternative to DWARF for profiling. Additionally, HWPGO has introduced selective function outlining, allowing for specific functions to be optimized based on profiling data, further enhancing runtime efficiency.

New Features

SYCL Compiler:

SYCL Offload Model: Introduced a new SYCL offload driver mechanism with --offload-new-driver to improve infrastructure for better link times by reducing I/O and external processes.
Range Rounding Control: Added -fsycl-range-rounding option for managing range rounding, including forcing full rounding to reduce binary size. Additionally, the experimental -fsycl-exp-range-rounding option performs rounding across all dimensions.
Double Type Emulation: Added -fsycl-fp64-conv-emu option for partial emulation of double data types on Intel GPUs.
Dynamic Linking: Initial support added for dynamic linking, though some features like kernel_bundle API and AOT mode are not yet supported.
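To make the idea behind the range rounding options above more concrete, the following host-only sketch models what rounding a launch range means: the range is rounded up to a multiple of some factor, and a guard keeps the extra work-items from touching data. It is a conceptual model, not the compiler's implementation. The rounding factor of 32, the helper names round_up and simulate_rounded_launch, and the shape of the guard are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Round n up to the next multiple of `multiple`.
// The factor is a hypothetical parameter and must be greater than zero.
constexpr std::size_t round_up(std::size_t n, std::size_t multiple) {
    return ((n + multiple - 1) / multiple) * multiple;
}

// Host-side model of a rounded launch: iterate over the rounded range and
// let "work-items" beyond the user's range return without doing any work.
std::size_t simulate_rounded_launch(std::vector<int>& data, std::size_t multiple) {
    const std::size_t user_range = data.size();
    const std::size_t launched = round_up(user_range, multiple);
    for (std::size_t id = 0; id < launched; ++id) {
        if (id >= user_range)
            continue;      // bounds guard standing in for the kernel wrapper
        data[id] += 1;     // stands in for the user's kernel body
    }
    return launched;
}
```

In this model, a user range of 1000 elements with a factor of 32 launches 1024 iterations, of which the last 24 do nothing.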

SYCL Library:

Extensions: Implemented multiple extensions, including sycl_ext_oneapi_prod, sycl_ext_oneapi_profiling_tag, sycl_ext_oneapi_forward_progress, sycl_ext_oneapi_private_alloca, sycl_ext_codeplay_enqueue_native_command, and sycl_ext_oneapi_enqueue_functions.
Group Load/Store: Added support for sycl_ext_oneapi_group_load_store, enabling native hardware block read/write capabilities where applicable.
Free Function Kernels: Initial support for sycl_ext_oneapi_free_function_kernels extension, with known limitations around argument types and diagnostics.
Fused Multiply-Add (FMA): Added the experimental ESIMD function fma, which guarantees that a fused multiply-add operation is performed.
Improved sycl_ext_oneapi_group_sort extension: Updated the implementation of the sycl_ext_oneapi_group_sort extension to match revision 2 of the specification. Revision 1 is no longer available, and some code changes may be required.
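The value of a guaranteed fused multiply-add, as mentioned for the ESIMD fma function above, can be sketched on the host with the standard std::fma. This example does not use the ESIMD API. It only contrasts one rounding (fused) with two roundings (multiply, then add), and the helper names are hypothetical.

```cpp
#include <cmath>

// a*b + c with two roundings: the product is rounded to double before the add.
// The volatile temporary keeps the compiler from contracting this into an FMA.
double multiply_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// a*b + c with a single rounding applied to the exact result.
double fused_multiply_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With IEEE 754 doubles and round-to-nearest, take a = 1 + 2^-27, b = 1 - 2^-27, and c = -1. The exact product is 1 - 2^-54, which rounds to 1.0, so multiply_then_add returns 0.0, while fused_multiply_add returns the exact result -2^-54.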

Improvements

SYCL Compiler

Improved Compilation Flow: The process of generating integration footers has been optimized when no third-party host compiler is used, resulting in fewer temporary files and faster compilation times.
Additional Math Function Support: New support for math functions like truncf, sinpif, rsqrtf, exp10f, ceilf, copysignf, cospif, fmaxf, and fminf in SYCL kernels has been added as part of the C-CXX-StandardLibrary extension. More Intel Math Functions (IMF), as well as ::rand and ::srand, are now also available in device code on Intel devices.
Enhanced Error Messaging: Error messages have been improved for scenarios involving implicit this capture in kernels and missing architecture information when multiple targets are passed into the -fsycl-targets flag.
Optimized Compilation Flow: The number of commands needed for generating dependencies using the -MD flag has been reduced, streamlining the build process.
Security and Debugging: Security-related compilation flags for libraries and tools have been strengthened, and the debugging experience has been improved for both Linux and Windows environments.

SYCL Library

Support for ESIMD functions: Added support for sqrt and rsqrt functions for double data types in ESIMD.
Cubemap and Sampled Image Arrays Support: Updated sycl_ext_oneapi_bindless_images extension to support cubemap images and sampled image arrays.
Named Barrier Allocation in ESIMD: Introduced ESIMD API for dynamic allocation of named barriers.
Executable Command Graph Update: Added support for whole graph updates using executable_command_graph::update.
Deprecation Warning: A warning has been added for the use of the deprecated <CL/sycl.hpp> header.
Accessor Improvements: local_accessor::get_pointer and local_accessor::get_multi_ptr now throw an invalid exception if called on the host.
Queue Operations Detection: Extended detection of nested queue operations to support shortcut methods.
Simplified ESIMD API Interface: Added overloads of various ESIMD APIs (e.g., atomic_update, block_load, block_store) allowing omission of some template arguments.
Bfloat16 Math Functions: Updated sycl_ext_oneapi_bfloat16_math_functions to support vectors of bfloat16 passed to math functions.
Optimized sycl::vec::as: Improved the performance of sycl::vec::as by optimizing the implementation of sycl::detail::memcpy.
SYCL 2020 Exception Updates: Updated the implementation to throw SYCL 2020 exceptions instead of legacy SYCL 1.2.1 exceptions across the board.
sycl::vec::convert Support: Added support for sycl::vec::convert to and from vec<bfloat16, N>.
Deprecations: marray<bool, n>::operator++/-- and accessor::get_multi_ptr for non-device accessors have been deprecated.
ESIMD Named Barriers: Moved ESIMD named barrier APIs out of the experimental namespace.
SYCL Extensions and API Enhancements:
- Implemented the latest revision of sycl_ext_oneapi_free_function_queries.
- Extended sycl-ls --verbose to print detailed device information, including UUIDs and architecture.
- Introduced support for compile-time properties in copy_to and copy_from ESIMD APIs.
Non-Variadic printf Interface: Switched experimental::printf to a non-variadic interface to improve usability when printing float values.
Enhanced ESIMD API Validation: Improved validation for rdregion and wrregion APIs using static assertions on template arguments.
SYCL 2020 Specification Alignment: Updated mutating swizzle operators and scalar conversions for vec to align with the SYCL 2020 specification.
Miscellaneous ESIMD Improvements:
- Added support for 1- and 2-byte data types to ESIMD prefetch APIs.
- Enabled ext_intel_matrix support for Intel GNR devices.
- Introduced new overloads of load_2d, store_2d, and prefetch_2d ESIMD APIs with compile-time properties.
- Added support for group shift algorithms (e.g., shift_group_left, permute_group_by_xor) for non-uniform groups.
- Lifted restrictions on the ESIMD block_store API and enhanced the slm_atomic_update API to support fsub and fadd.
Graph and Semaphore Support:
- Added support for graph update functionality and external semaphore wait/signal operations with values in the bindless images extension.
- Introduced device-to-device copying of image_device_handle.
Unified Runtime: Removed the Plugin Interface, replacing it with the Unified Runtime, which reduces the number and size of redistributable libraries.
Performance Improvements: Reduced startup overhead of libsycl.so by outlining the SYCL JIT compiler into a standalone library, dynamically loaded on first use.

Bug Fixes

SYCL Compiler

Fixed a bug where using the -fsycl-link-targets flag would inadvertently trigger additional device code linking steps.
Resolved an issue where AOT-compiling for Intel GPUs would pass PVC-specific flags even if the target device was not a PVC.
Fixed a bug with incorrect file extensions being emitted in AOT compilation when using --save-temps.
Fixed an issue where performing separate compilation and linking with -fsycl-link resulted in a "number of output files and targets should match in unbundling mode" error during the link step.
Resolved an issue where passing pointers in the generic address space to certain built-in math functions could cause compilation failure.
Fixed a bug where compiling kernels with different reqd_work_group_size attributes using -fsycl-device-code-split=none could result in a runtime exception about mismatching work-group sizes.
Resolved a bug where using the reqd_work_group_size attribute with fewer than three arguments caused a crash.
Addressed issues with shift_group_[right|left], permute_by_xor, and select_from_group algorithms returning invalid values when used with the half data type.

SYCL Library

Fixed a situation where querying sycl::ext::oneapi::experimental::info::device could result in an exception instead of returning an empty vector.
Corrected the esimd::atan implementation under the -ffast-math flag.
Fixed an issue where component devices were not correctly identified as descendants of composite devices when creating a queue.
Addressed an issue where querying for composite devices could return duplicate entries.
Fixed bugs in the copy-constructor of the config_2d_mem_access ESIMD class, which led to compilation errors.
Resolved an issue where the use of atomic_ref<T*> was not detected as using the atomic64 aspect, leading to errors.
Fixed bugs with ctanh and cexp returning incorrect values in edge cases.
Fixed an issue where values passed to the -Xs option via build_options were not passed down to the device compiler.
Fixed a compilation error when defining kernels as named functors while using -fno-sycl-unnamed-lambda.
Corrected compilation issues with the -fpreview-breaking-changes flag caused by conflicts with macros in windows.h.
Resolved strict aliasing violations in the implementation of sycl::vec<sycl::half, N>::operator[] that caused errors.
Fixed bugs where barriers submitted to a command queue containing host tasks ignored those host tasks, and improved synchronization of host tasks with barriers.
Fixed issues where the compiler could emit unsupported SPIR-V instructions for bit-reversal.
Addressed a bug where default-constructed local_accessor arguments could cause runtime errors, especially on Windows and under -O0 optimization on Linux.
Resolved a hang when invalid values were passed to the ONEAPI_DEVICE_SELECTOR.
Fixed issues with persistent cache functionality where certain setups would prevent necessary directories from being created.
Corrected a bug where querying a kernel by name from a kernel bundle could crash the program.
Fixed an error handling bug where non-blocking pipe operations would mistakenly throw exceptions.
Addressed compilation issues when using non-uniform group built-ins with marray and vec.
Resolved a bug where memory attributes applied to a struct used as a type of a device_global variable were ignored.
Added missing value_type and vector_t member type aliases to swizzles.
Fixed shutdown sequence issues when SYCL RT was used in applications or libraries with custom shutdown processes.
Resolved a crash when calling event::get_backend() on a default-constructed event in environments with malformed ONEAPI_DEVICE_SELECTOR.
Fixed a bug where sycl-ls with --ignore-device-selectors would not properly ignore the environment variable.
Corrected memory order capabilities returned by the Native CPU backend.
Fixed the variadic constructor of sycl::ext::oneapi::experimental::properties to match the extension specification.
Fixed build program failures when using ESIMD functions like load_2d, store_2d, or prefetch_2d.
Resolved a bug where querying free device memory on integrated Intel GPUs returned 0 instead of throwing an exception for unsupported features.
Addressed a heap buffer overflow in the sycl_ext_oneapi_kernel_compiler_opencl extension implementation.
Corrected a bug where the sycl_ext_oneapi_graph extension ignored the access mode of accessors, creating unnecessary graph edges.
Fixed issues where graph submissions involving barriers could result in runtime errors or cause resource leaks.
Addressed performance regressions when kernels without dependencies were submitted to in-order queues.
Fixed profiling issues in Level Zero backend where timestamps could be zeros or incorrect for in-order queues.
Resolved crashes when using multiple queues with the immediate command list properties immediate_command_list and no_immediate_command_list.
Fixed a bug where info::kernel_device_specific::work_group_size would return the device-specific limit, ignoring the kernel on the Level Zero backend.

Misc

SYCL Compiler

Reverted changes previously made on Windows to support a separate compilation scenario where the compilation step was performed without the -fsycl flag, but the link step included the -fsycl flag. This scenario is now considered unsupported, as the compiler does not know which version of the standard library to link during the link step.

API/ABI Breaking Changes in 2025.0

This release is an ABI-breaking release, meaning that any applications built with older versions of the toolchain must be recompiled to run with newer versions of the SYCL runtime library.

Bumped the major version of the SYCL runtime library to 8.
Cleaned up the list of symbols exported from the SYCL runtime library by dropping some legacy symbols and hiding others that should not have been exported.
Updated the ABI of several functions and methods to avoid using std::string and other objects in the library interface, allowing SYCL RT to be used in applications built with pre-C++11 ABI.
Changed the ext_oneapi_copy API from the experimental sycl_ext_oneapi_bindless_images extension to accept const-qualified types for the Src parameter.

Several API breaking changes were made, including dropping support for previously deprecated APIs and switching implementations of some classes to a preview implementation. Code modification recommendations for some of these breaking changes can be found here.

Removed the sycl::abs overload taking a floating-point argument.
Removed sycl::host_ptr and sycl::device_ptr.
Removed queue::discard_or_return.
Removed sycl::make_unique_ptr.
Removed the use_primary_context property and methods related to the previously removed host device.
Removed SYCL 1.2.1 exception subclasses, including runtime_error, nd_range_error, invalid_parameter_error, device_error, and feature_not_supported.
Removed queue::mem_advice overload accepting pi_mem_advice.
Removed several deprecated ESIMD APIs.
Removed the non-standard sycl::id -> sycl::range conversion operator.
Removed deprecated APIs from the sycl_ext_oneapi_bindless_images extension implementation.
Renamed the experimental destroy_external_semaphore API from the sycl_ext_oneapi_bindless_images extension to release_external_semaphore.
Replaced the image_channel_order field of the image_descriptor struct with the number of channels in the experimental sycl_ext_oneapi_bindless_images extension.
Enforced restrictions on the first argument of lambdas/functors passed to parallel_for(range) and parallel_for(nd_range).
Switched the sycl::vec implementation to its preview version, which uses a different storage type to fix several strict aliasing rule violations.
Restricted math operations available to vec<std::byte, N> to those applicable to std::byte.
Switched the sycl::exception implementation to its preview version.
Switched math built-ins implementation to use their preview version.
Switched bfloat16 implementation to use its preview version.
Switched sycl::nd_item implementation to use its preview version.
Enforced a restriction that a buffer's element type must be device copyable.
Restructured SYCL headers to exclude <cmath> and <complex>.
Dropped support for the SYCL_DEVICE_FILTER environment variable.
Updated the accessor::get_pointer interface to return global_ptr<value_type>, which can be const-qualified if the accessor data type is const-qualified or if the accessor is read-only.
Removed deprecated APIs related to sycl_ext_oneapi_free_function_queries.
Moved slm_allocator ESIMD APIs into the experimental namespace.
Removed the deprecated usm_system_allocator aspect.
Removed get_child_group API from the experimental sycl_ext_oneapi_root_group extension.
Simplified template arguments related to simd_view of many ESIMD APIs.
Removed ESIMD atomic_op::predec.
Dropped interfaces from revision 1 of the experimental sycl_ext_oneapi_group_sort extension.
Changed the return type of command_graph::begin_recording and command_graph::end_recording from void to bool in the experimental sycl_ext_oneapi_graph extension.

Breaking changes were also made to compiler flags:

Removed the deprecated -fsycl-link-huge-device-code, -fsycl-[add|link]-targets, -foffload-static-lib, -foffload-whole-static-lib, -fsycl-disable-range-rounding, and -sycl-std flags.

SYCL Known Issues

On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is due to the release of the plugin DLLs racing against the release of static global variables, such as the default context.
The Intel Graphics Compiler's Vector Compute backend does not support certain optimization levels and often produces incorrect results or crashes. This issue directly affects ESIMD code. As a temporary workaround, ESIMD code is optimized even in the affected mode.
When using the sycl_ext_oneapi_matrix extension, it is important for some devices to use the appropriate settings corresponding to the device that will run the program, particularly for matrix operations using half data type.
When using queue shortcut functions with in-order queues, dependencies between commands submitted to different queues may be ignored. A workaround is to explicitly call .wait(). This issue will be fixed in the next release. In the example below, the second kernel can start executing before the first one completes.


// q1 long running task
sycl::event e = q1.single_task([=](){ /* ... */ });
// q2 task
q2.single_task(e, [=](){ /* ... */ });

C/C++ math built-ins can return incorrect results for some edge-case inputs when called from SYCL kernels.
To enhance performance on Intel® GPUs using the Unified Runtime Level Zero Adapter, support for driver-optimized in-order lists has been introduced in version 2025.0. However, when running workloads with sycl::property::queue::enable_profiling, some performance overhead from these lists is expected. If this overhead negatively impacts performance, it can be mitigated by disabling the driver in-order lists. To do so, set UR_L0_USE_DRIVER_INORDER_LISTS=0.
To ensure compatibility with the Intel® oneAPI DPC++ Compiler on Windows*, which requires OpenCL 3.0, it is essential to address potential issues caused by older versions of opencl.dll on your system. If an outdated opencl.dll is present in system directories or takes precedence in the library path, it may lead to failures, including SYCL-related issues and crashes in tools like Intel® VTune™ and Intel® Advisor when specific OpenCL 3.0 features are used. The recommended solution is to replace the old opencl.dll with the one installed in the DPC++ package. You can do this by copying the newer opencl.dll from $oneAPI_Install_Folder\compiler\latest\bin to your system folder. Be sure to back up the original opencl.dll in case it's needed for other applications.
sycl_ext_oneapi_free_function_kernels has limitations, including:
- Free function kernels are supported only if they are defined at file scope.
- SYCL_EXTERNAL has to be used alongside SYCL_EXT_ONEAPI_FUNCTION_PROPERTY to define a free function kernel.
- The compiler does not emit diagnostics if some restrictions from the extension specification are violated.
- Arguments of a free function kernel cannot be composite data types such as structs, or SYCL classes such as accessor.
- Using -fsycl-dead-args-optimization (ON by default) can lead to failures.
- info::kernel::num_args does not return the correct result for free function kernels.

New OpenMP Features

Support for the -fopenmp-offload-mandatory compiler flag to omit creation of host-fallback code and emit a runtime error if OpenMP offload to the device fails.
Improved optimization report support for OpenMP constructs.
Enhanced conversion scheme of nested loop constructs to consider loop trip counts.
Updates to the declare variant for a dispatch construct to include GPUs with the Xe2 architecture when the match clause specifies device={arch(gen)}.
Support for the device_type(host|nohost|any) clause for the target construct.
Inclusion of the if clause for the teams construct.
Change of the map-type property to "default," allowing map-type modifiers to be specified without a map-type. For example, map(always : x) is equivalent to map(always, tofrom : x).
Support for the Intel extension ompx_sub_group_size clause for the target construct to set the SIMD width of the kernel.
Support for the Intel extension ompx_dyn_cgroup_mem clause for the target construct, allowing dynamic allocation in SLM for GPU offloading.
Extension of environment variables OMP_THREAD_LIMIT, OMP_TEAMS_THREAD_LIMIT, and OMP_NUM_THREADS to support abstract names. For example, OMP_THREAD_LIMIT=n_cores.
Extension of the syntax of the environment variable OMP_PLACES to support bound and stride for abstract names. For example, OMP_PLACES=threads(4:2).
Host runtime support for the environment variable OMP_AVAILABLE_DEVICES.
Extension of the environment variable OMP_DEFAULT_DEVICE to support device selection by traits.
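The map-type default described in the list above can be illustrated with a minimal sketch. The helper name `increment_on_device` is hypothetical and not part of the release. The code uses the explicit spelling `map(always, tofrom : x)`, which older OpenMP compilers also accept; per the note above, `map(always : x)` is expected to behave the same way with this release. If no offload device is available, or OpenMP is not enabled, the region simply runs on the host.

```cpp
// Hypothetical helper: increments x inside a target region and returns it.
int increment_on_device(int x) {
    // Explicit map-type. Per the release note, `map(always : x)` is
    // equivalent because the map-type now defaults to tofrom.
    #pragma omp target map(always, tofrom : x)
    {
        x += 1;
    }
    return x;
}
```

Because the scalar is mapped tofrom rather than treated as firstprivate, the incremented value is copied back to the host when the region ends.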

Notable OpenMP Fixes

Fixed a bug where the dispatch construct’s device clause was not updating OpenMP’s default-device-var ICV.
Resolved an internal compiler error when the declare variant for a dispatch construct did not specify an adjust_args clause.
Fixed an optimization bug in OpenMP for and simd loops with large trip counts.
Corrected a regression where enclosing task constructs inside a teams construct triggered a compiler error message.
When thread_limit is specified for both target and teams, the compiler now correctly chooses their minimum instead of always using the one specified for target.
Fixed an internal compiler error related to the initialization of global variables allocated in GPU’s SLM.
Addressed a problem in offload runtime where the reference counts of variables mapped using declare mapper were not decremented correctly.
Fixed a GPU offload performance issue related to L1 cache being affected by temporary copies of reduction variables.
Resolved a bug where user-defined reduction variables were not properly constructed or destructed.
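For readers unfamiliar with user-defined reductions, the following is a minimal sketch of the kind of code the last fix concerns: a reduction over a type with a non-trivial default constructor. The `MinMax` type, the `minmax` reduction identifier, and the `find_range` helper are all hypothetical names chosen for illustration; they do not come from the release.

```cpp
#include <limits>
#include <vector>

// Hypothetical accumulator whose constructor must run for each private copy.
struct MinMax {
    int lo;
    int hi;
    MinMax()
        : lo(std::numeric_limits<int>::max()),
          hi(std::numeric_limits<int>::min()) {}
    void include(int v) {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    void merge(const MinMax& other) {
        if (other.lo < lo) lo = other.lo;
        if (other.hi > hi) hi = other.hi;
    }
};

// User-defined reduction: combine partial results with merge(), and
// default-construct each thread's private copy.
#pragma omp declare reduction(minmax : MinMax : omp_out.merge(omp_in)) initializer(omp_priv = MinMax())

// Hypothetical helper: returns the smallest and largest element of v.
MinMax find_range(const std::vector<int>& v) {
    MinMax r;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(minmax : r)
    for (int i = 0; i < n; ++i) {
        r.include(v[i]);
    }
    return r;
}
```

The result does not depend on how iterations are split across threads, since merging partial minima and maxima is associative and commutative.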

OpenMP Known Issues

Implicit barriers at the end of parallel regions do not act as synchronization points for the tasks associated with target nowait and dispatch nowait constructs. This may result in incorrect results or crashes. A workaround is to use #pragma omp taskwait at the end of the parallel region, so that target/dispatch nowait regions are synchronized at the point where the parallel region's implicit barrier would otherwise have synchronized them.
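The taskwait workaround can be sketched as follows. The helper name `sum_after_offload` and the data layout are assumptions made for illustration only. If no offload device is present, the target region is expected to fall back to the host.

```cpp
#include <vector>

// Hypothetical helper: adds 1 to each of n elements in a deferred target
// region launched from inside a parallel region, then sums them on the host.
int sum_after_offload(int n) {
    std::vector<int> data(n, 1);
    int* p = data.data();

    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            // Deferred target task: nowait lets the host thread continue.
            #pragma omp target teams distribute parallel for nowait map(tofrom : p[0:n])
            for (int i = 0; i < n; ++i) {
                p[i] += 1;
            }
        }
        // Workaround: explicitly wait for target tasks generated by this
        // thread instead of relying on the region's implicit barrier.
        #pragma omp taskwait
    }

    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Every thread executes the taskwait, but only the thread that generated the target task has a child task to wait for; the others return immediately.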

Other Known Issues and Limitations

Visual Studio IDE Integration: Building a C++ project with 'Intel C++ Compiler 2025' for the Win32 platform fails. The Win32 platform is not supported by 'Intel C++ Compiler 2025', and projects must be compiled for the x64 platform only. If the Win32 platform is selected, the build reports an error stating that the ICX compiler was not found.

Hardware Support:

-march=lunarlake
-march=graniterapids

Please check here for details about -march usage.

Toolchain Support to Intel Platforms

Granite Rapids	Granite Rapids-D	Lunar Lake
GCC 13.1	GCC 13.1	GCC 14.1
Binutils 2.40	Binutils 2.41	Binutils 2.42
Glibc 2.37	Glibc 2.37	Glibc 2.39
LLVM 16.0	LLVM 17.0	LLVM 18.0
ICX 2023.1	ICX 2023.2	ICX 2024.0

C/C++ Standard

Intel® oneAPI DPC++/C++ Compiler version 2025.0 supports the C/C++ standards through the Clang 19 front end.
Initiated support for C++2c, the next release of C++ after C++23, and C2y, the next release of C after C23
Finalized the implementation of “deducing this” (C++23)
Relaxed some constexpr restrictions (C++23)
Implemented the [[assume]] attribute (C++23)
Completed support for Concepts (C++20)
Added support for char8_t (C23)
Implemented the constexpr keyword for object declarations (C23)
Implemented #embed for embedding binary resources in source (C23)
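As one small illustration of the C++23 items in the list above, here is a sketch that uses the [[assume]] attribute. The function name `scale_down` is hypothetical. The sketch assumes compilation in C++23 mode; compilers that do not recognize the attribute typically ignore it with a warning.

```cpp
// Hypothetical helper: divides value by a divisor the caller guarantees
// to be positive.
int scale_down(int value, int divisor) {
    // Optimization hint only: tells the compiler it may assume the
    // condition holds. Behavior is undefined if it does not.
    [[assume(divisor > 0)]];
    return value / divisor;
}
```

The attribute does not check the condition at run time, so callers remain responsible for passing a positive divisor.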

System Requirements

Intel® oneAPI DPC++/C++ Compiler System Requirements

Additional Documentation

Notices and Disclaimers

Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.
