Intel® oneAPI DPC++/C++ Compiler Release Notes

ID 852942
Updated 6/23/2025
Version
Public

This document summarizes new and changed product features and includes notes about features and problems not described in the product documentation.

Where to Find the Release

Download the Intel® oneAPI Base Toolkit from the Intel® oneAPI Base Toolkit Download page and follow the installation instructions.

The Intel® oneAPI DPC++/C++ Compiler’s integrated support for Altera FPGA has been removed as of the 2025.1 release. Altera® will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.
For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.

oneAPI 2025.2, Compiler Release 2025.2

Major New Features and Enhancements

  • ThreadSanitizer Support: Extended CPU ThreadSanitizer support to the device side to detect data races in both CPU and device code. It detects data races in USM memory, SYCL buffers, and device_global memory in SYCL and OpenMP C/C++ device code.
  • MemorySanitizer Support: Extended the device-side MemorySanitizer to support OpenMP offload for detecting uses of uninitialized memory. As an experimental feature, it can also detect such uses in local and private memory.
  • Hardware Profile Guided Optimization (HWPGO): Enhanced HWPGO to remove the pseudo probe description and restore the DWARF discriminator for call instructions when using -sample-profile-remove-probe. Enhanced the Clang driver to automatically add column info, which HWPGO needs to generate and load the profile file, when using -gdwarf or -fprofile-sample-use.

New Features

C/C++ Compiler:

  • Added an --lbr-mispredicts mode to llvm-profgen, which can use the LBR_INFO branch prediction flag to create a branch mispredict profile instead of using samples of a separate branch mispredict event. This approach has the known downside of collecting mispredicts only on taken branches, but it may simplify sampling requirements in special use cases. A separate branch mispredict event, as described in the Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference, should still be used in preference to --lbr-mispredicts.
  • Improved code generation for AMX ISA and inline assembly error checking.
  • Various improvements in integer and floating point arithmetic.
  • For the -qopt-report option:
    • Memory prefetching will be reported in store/load-only mode (as well as the default mode).
    • Memory accesses using omp simd nontemporal will be recorded in the optimization report.
    • The vectorizer will add a remark to the optimization report when overriding the unroll factor provided by #pragma unroll. This can happen when the unroll factor is too large for the number of vectorized loop iterations.
  • Added support for the -f[no-]offload-fp32-prec-div and -f[no-]offload-fp32-prec-sqrt compiler flags to control the precision of floating-point division and square root.
  • Native CPU Device:
    • Added support for source-based code coverage on Native CPU.

SYCL Compiler:

  • The sycl_ext_oneapi_kernel_compiler extension specification was updated to accept SYCL as source language.
  • Initial support for runtime compilation of SYCL code was implemented.
  • The Level Zero v2 (L0 v2) adapter is a new backend that was added as an experimental feature, with plans to make it the default in 2025.3. This is a redesigned version of the L0 adapter that focuses on maximizing the performance of each queue mode individually. It currently supports immediate in-order mode only. This second version of the adapter significantly reduces host runtime overhead and improves the latency of kernel submissions. If you experience any performance or functional issues with this adapter enabled, please report them and specify the adapter used.
    • It can be enabled with SYCL_UR_USE_LEVEL_ZERO_V2=1.

SYCL Library:

  • SYCL Graphs:
  • SYCL Bindless Images:
    • Added support for more kinds of copy operations (image_mem_handle to USM and vice versa, USM to USM, etc.).
    • Added support for Vulkan* timeline semaphores.
  • SYCL Extensions:
  • Support for core SYCL 2020 functionality:
    • Implemented swizzle method for swizzles.
  • Support for pre-C++11 ABI:
    • Many SYCL APIs use std::string as an argument or return type, and the std::string ABI was changed by GCC when it introduced the C++11 ABI. Some applications are still built with the old, pre-C++11 ABI; to support them, the SYCL Runtime should not use std::string (and some other classes) at the ABI boundary. This effort has been largely completed, but some APIs still surface from time to time and are being fixed:
      • Added support for print_graph API in pre-C++11 ABI mode.
      • Added support for pipe::get_pipe_name API in pre-C++11 ABI mode.
      • Decided not to support get_backend_info in pre-C++11 ABI mode (at least for now) because there are no queries that could be done through it. Calling it under pre-C++11 ABI mode now causes an error.
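
The general technique behind the pre-C++11 ABI work can be sketched as follows. This is a minimal illustration in standard C++, not the SYCL Runtime's actual code: the names abi_string_view, lib_count_chars, and count_chars are hypothetical. The idea is that only a pointer and a length cross the library boundary, while the conversion from std::string happens in a header-only wrapper that is compiled with whichever std::string ABI the application uses (in libstdc++ this is selected by _GLIBCXX_USE_CXX11_ABI).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical ABI-stable view: a pointer and a length. Its layout does not
// depend on which std::string ABI the application was built with.
struct abi_string_view {
  const char *data;
  std::size_t size;
};

// Hypothetical library entry point. In a real library this would be the
// exported symbol; it only ever sees the plain struct, never std::string.
inline std::size_t lib_count_chars(abi_string_view name, char c) {
  assert(name.data != nullptr || name.size == 0);
  std::size_t count = 0;
  for (std::size_t i = 0; i < name.size; ++i)
    if (name.data[i] == c)
      ++count;
  return count;
}

// Header-only wrapper. It is compiled as part of the application, so it uses
// the application's std::string layout and converts before crossing the
// boundary.
inline std::size_t count_chars(const std::string &name, char c) {
  return lib_count_chars(abi_string_view{name.data(), name.size()}, c);
}
```

With this split, an application built with either ABI can call count_chars, because std::string never appears in the signature of the function that lives in the library.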

OpenMP:

  • Added support for the OpenMP 6.0 stripe loop-transformation construct.
  • The nowait clause in target, target enter/exit data, and target update constructs can now take an optional Boolean argument to conditionally choose between asynchronous or synchronous offloading.
  • For spir64 devices, a new command-line flag, -fopenmp-target-teams-default-vla-alloc-mode=malloc/wilocal (default: wilocal), was added to allow control over how local copies for variable-length arrays private to teams and distribute constructs are allocated.
  • Added a new command-line flag, -fopenmp-target-loop-stride=local-size/global-size/one (default: local-size), to tune performance of spir64-offloaded OpenMP loops by controlling their loop stride.
  • Improved debug info in OpenMP-outlined routines where some variables were previously reported as optimized away by gdb.

Unified Runtime:

  • Expanded support for the Level Zero adapters to provide binary backwards compatibility with Level Zero drivers with APIs as old as v1.7 of the Level Zero Specification.

Improvements and Bugfixes

SYCL Runtime:

  • Reduced the number of string copies that the SYCL Runtime made unnecessarily for debug traces even when debug tracing was disabled.
  • Reduced the number of times shared_ptrs are copied.
  • Reduced the number of memory allocations by moving away from std::function. This should also help reduce the compilation time of SYCL headers.
  • Reduced the number of memory allocations required for local_accessor.
  • Reduced the number of memory allocations on the "fast" kernel enqueue path and dropped some unnecessary runtime checks.
  • Made more queue operations go through the "fast" path.

C/C++ and SYCL Compilers:

  • Introduced a new optimization to eliminate back-to-back barriers when it is safe. Such chains of barriers may occur when multiple group algorithms are used next to each other.
  • Removed a busy-wait loop from the implementation of the -fsycl-max-parallel-link-jobs flag, making it consume fewer resources when waiting.
  • Uplifted maximum version of SPIR-V that compiler can generate to 1.5.
  • Made compiler embed device library needed for bfloat16 support into the application (if it is used). This change will allow us to reduce the size of redistributable SYCL Runtime package by eliminating some files from it.
  • Added a compiler warning diagnostic about undefined SYCL_EXTERNAL functions used in a module to help catch linking errors earlier.
  • Addressed issue where the compiler would generate invalid SPIR-V if kernel used arguments of boolean type.
  • Switched to the native bfloat16 implementation for devices that support it, and fixed a bug where the native implementation was not used if multiple AOT targets were specified.
  • Aligned behavior of -Wimplicit-float-conversion with the upstream clang for non-SYCL language modes.
  • Improved check for unsupported data types to actually rely on target information instead of hardcoded knowledge.
  • Fixed an issue where compilation with -mlong-double-64 would still result in an error that a 128-bit double is not supported by the target.
  • Fixed a bug where linking static libraries with SYCL code in them using -l:libname.a spelling would ignore device code from those libraries.
  • Fixed a bug where having a pure virtual function during device compilation would cause unresolved symbol errors emitted by device compiler on Windows.
  • Fixed a bug where having two kernels (one annotated with reqd_work_group_size attribute/property and another without it) together with -fsycl-device-code-split=off would cause runtime error about mismatched work-group size.
  • Native CPU device:
    • Improved support for dynamic_address_cast on Native CPU device.
    • Improved performance of Native CPU device: fewer memory allocations and thread launches.
    • Fixed a bug where submitting the same kernel multiple times at about the same time with different arguments would lead to incorrect arguments being used.
    • Fixed compiler crashes when building applications that use atomics.
    • Fixed segfaults happening in SYCL CTS tests for async_work_group_copy API.
    • Improved support for sub-groups by updating version of oneAPI Construction Kit.
  • Sanitizers:
    • Reduced the frequency of shadow memory reallocation to reduce memory overhead and improve runtime performance.
    • Fixed ASAN throwing an exception with UR_RESULT_ERROR_INVALID_ARGUMENT when detecting an incorrect memory free operation.
  • Explicit SIMD extension:
    • Extended sycl_ext_intel_esimd extension specification and implementation with new queries to check support for 2d load/store/prefetch operations.
    • Fixed miscompilations of ESIMD functions under high optimization levels when compiler performs aggressive inlining.

SYCL Library:

  • Made group_[load|store] functions use native built-ins when used with vectors of 16 shorts.
  • Extended support for shared libraries to make it work with kernel bundles as well.
  • Added tracing (through SYCL_UR_TRACE) for SYCL_DEVICE_ALLOWLIST decisions for better discoverability of the feature.
  • Aligned implementation of info::execution_capability query with the recent SYCL 2020 specification change made in KhronosGroup/SYCL-Docs#625.
  • Fixed compilation issues with group functions like select_from_group with certain data types (pointers, marray<bfloat16, 4>, for example).
  • Implemented persistent cache eviction.
  • Enforced constraints documented by the sycl_ext_oneapi_reduction_properties extension.
  • Clarified and enforced properties constraints in the sycl_ext_oneapi_group_load_store extension specification and implementation.
  • Implemented properties validation to kernel bundle and graph APIs.
  • Updated the sycl_ext_oneapi_in_order_queue_events extension specification and implementation to make event returned by ext_oneapi_get_last_event optional for queues where no work had been submitted.
  • Updated the sycl_ext_oneapi_group_load_store extension specification and implementation to accept the alignment property in group load/store built-in functions, allowing for a more optimized implementation.
  • Lifted restriction that host APIs from sycl_ext_oneapi_free_function_kernels had to be guarded by #ifndef __SYCL_DEVICE_ONLY__.
  • Fixed potential resource leaks in online compiler extension.
  • Fixed an issue where known_identity<min|max> would return incorrect values with the -ffast-math flag.
  • Fixed undefined behavior in the implementation of device_global which sometimes led to spurious results.
  • Fixed a static_assert failure in SYCL headers when an application is built with -funsigned-char.
  • Resolved an issue where memory operations enqueued through the sycl_ext_oneapi_enqueue_functions extension broke the functionality of the sycl_ext_oneapi_enqueue_barrier extension.
  • Fixed a bug where compiling with -D_FORTIFY_SOURCE=2 would cause errors from device compilers at JIT stage (or during AOT compilation) about undefined __memcpy_chk symbol.
  • Fixed an incorrect result of std::exp(std::complex) in some corner cases.
  • Fixed a crash happening when you launch a kernel that is defined in both the application and a dlopen-ed shared library after that library was unloaded through dlclose.
  • Fixed a memory leak happening when a kernel submission failed.
  • Fixed a bug where using vec::operator[] would cause compilation issues on Windows when an application is built using clang.exe and _DEBUG macro is set.
  • Aligned joint_matrix_apply implementation with the specification change to be able to modify both matrices.
  • Bindless Images:
    • Added support for ext_oneapi_bindless_sampled_image_fetch_1d, ext_oneapi_bindless_sampled_image_fetch_1d_usm, ext_oneapi_bindless_sampled_image_fetch_2d, ext_oneapi_bindless_sampled_image_fetch_2d_usm and ext_oneapi_bindless_sampled_image_fetch_3d aspects on Level Zero backend.
    • Fixed return types of image extent queries to match the specification.
    • Clarified the types of supported USM memory in the extension specification.
    • Fixed compiler crash caused by the use of anisotropic sampling operations on 3D mipmaps, due to the intrinsic being generated with an incorrect number of LOD gradient parameters.
  • SYCL Graphs:
    • Reimplemented topological sort algorithm used to determine graph nodes execution order to avoid issues with overflowing stack on huge graphs and improve performance.
    • Documented the kernel binary update feature, which allows kernel nodes in graphs to be updated.
    • Fixed race condition in command_graph node queries.
    • Fixed the issue with not all graph-related classes fully implementing common reference semantics.
    • Made ext_oneapi_weak_object extension work with graph objects.
    • Fixed a bug where using local_accessor or work_group_memory objects as part of graph update would function incorrectly on non-SYCL backends.
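
To illustrate why a non-recursive topological sort avoids stack overflow on huge graphs, here is a minimal sketch using Kahn's algorithm in standard C++. It is an assumption-laden illustration only: the notes do not say which algorithm the SYCL Graphs implementation uses, and the function name topological_order and the adjacency-list representation are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// successors[i] lists the nodes that depend on node i (indices must be < n).
// Returns the nodes in an order where every node comes after all of its
// predecessors. For a non-empty graph, an empty result means a cycle exists.
inline std::vector<std::size_t>
topological_order(const std::vector<std::vector<std::size_t>> &successors) {
  const std::size_t n = successors.size();

  // Count the unresolved predecessors of every node.
  std::vector<std::size_t> pending(n, 0);
  for (const auto &succ : successors) {
    for (std::size_t s : succ) {
      assert(s < n);
      ++pending[s];
    }
  }

  // Nodes without predecessors can be scheduled immediately.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i)
    if (pending[i] == 0)
      ready.push(i);

  // The work list lives on the heap, so the depth of the graph does not
  // translate into call-stack depth as it would with a recursive DFS.
  std::vector<std::size_t> order;
  order.reserve(n);
  while (!ready.empty()) {
    const std::size_t node = ready.front();
    ready.pop();
    order.push_back(node);
    for (std::size_t s : successors[node])
      if (--pending[s] == 0)
        ready.push(s);
  }

  if (order.size() != n)
    order.clear(); // Some nodes were never released: the graph has a cycle.
  return order;
}
```

A chain of several hundred thousand nodes, which could exhaust the stack in a recursive depth-first traversal, is handled by this loop with memory proportional to the number of nodes and edges.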

SYCLcompat Library:

  • Introduced new set of group utility functions and classes aimed to reduce the gap between syclcompat and dpct namespaces.
  • Fixed compare_mask putting results in the wrong 2-byte segment of 4-byte output.
  • Optimized implementation of permute_sub_group_by_xor for the case when logical_sub_group_size is 32.
  • Added new function ternary_logic_op to perform bitwise logical operations on three input values based on the specified 8-bit truth table.
  • Fixed issues with multiple vectorized operations returning wrong results.
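
As an illustration of what a three-input bitwise operation driven by an 8-bit truth table computes, the following reference model is written in standard C++. It is only a sketch: the signature and bit ordering of syclcompat's ternary_logic_op are not given in these notes, so the names ternary_logic_ref and make_truth_table are hypothetical, and the indexing convention (bits of a, b, c forming the index a<<2 | b<<1 | c, as used by CUDA's lop3 instruction) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Reference model: for every bit position, the bits of a, b, and c form a
// 3-bit index into the 8-bit truth table, and the selected table bit becomes
// the result bit. Assumed index layout: (a << 2) | (b << 1) | c.
inline std::uint32_t ternary_logic_ref(std::uint32_t a, std::uint32_t b,
                                       std::uint32_t c, std::uint8_t lut) {
  std::uint32_t result = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    const unsigned index = (((a >> bit) & 1u) << 2) |
                           (((b >> bit) & 1u) << 1) | ((c >> bit) & 1u);
    result |= static_cast<std::uint32_t>((lut >> index) & 1u) << bit;
  }
  return result;
}

// Under the assumed layout, the table for any bitwise function f(a, b, c) is
// obtained by evaluating f on the patterns 0xF0, 0xCC, and 0xAA.
template <typename F> std::uint8_t make_truth_table(F f) {
  return static_cast<std::uint8_t>(f(0xF0u, 0xCCu, 0xAAu) & 0xFFu);
}

// Small usage example: a three-way XOR expressed through its truth table.
inline void ternary_logic_example() {
  const std::uint8_t xor3 = make_truth_table(
      [](unsigned a, unsigned b, unsigned c) { return a ^ b ^ c; });
  assert(xor3 == 0x96);
  assert(ternary_logic_ref(0x0Fu, 0x33u, 0x55u, xor3) ==
         (0x0Fu ^ 0x33u ^ 0x55u));
}
```

The benefit of this formulation is that any combination of AND, OR, XOR, and NOT over three inputs collapses into one table lookup per bit, which hardware with a matching instruction can execute as a single operation.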

OpenMP:

  • Fixed a bug affecting usage of target offload in lambda.
  • Implemented a more robust mechanism to detect OpenMP loops that were optimized away to better differentiate them from malformed loops.
  • Corrected the handling of OpenMP simd loops that were optimized away, previously crashing the compiler in some cases.
  • Fixed an issue in the collapsing of OpenMP simd loops at -O0 that caused incorrect vectorization.
  • Fixed incorrect OMPT callbacks for teams distribute parallel for constructs.
  • Fixed a lastprivate issue in task and target regions causing spurious function arguments to be created for outer target/task regions.
  • Improved emission of -qopt-report remarks about how OpenMP data-sharing clauses were optimized.
  • Fixed an incorrect behavior of tile construct nested inside a target teams distribute parallel for collapse(N) construct.
  • The flush construct is no longer ignored for OpenMP spir64 offload.

Issues with 3rd-party host compilers:

  • Fixed compilation issue with get_vec_idx internal helper with MSVC as host compiler.
  • Fixed missing #include when building with GCC 13 as host compiler.
  • Fixed compilation issue with joint matrix extension with MSVC from Visual Studio 2019 as host compiler.

Misc:

  • Removed testing on FPGA Emulator as a step towards our strategy to drop FPGA support. Starting with this release there is no guarantee that FPGA-specific features continue to work.
  • Docker images containing nightly builds are not provided anymore, but we still provide Dockerfiles so you can build those images yourself.
  • Fixed OCL CPU Runtime installation script leaving incorrect permissions on a system folder.

Known Issues and Limitations

SYCL:

  • SYCL headers use unreserved identifiers which sometimes cause clashes with user-provided macro definitions. Known identifiers include: G, VL.
  • When using sycl_ext_oneapi_matrix extension it is important for some devices to use the sm version (Compute Capability) corresponding to the device that will run the program. This particularly affects matrix operations using half data type.
  • C/C++ math built-ins (like exp or tanh) can return incorrect results on Windows for some edge-case input. The problems have been fixed in the SYCL implementation, and the remaining issues are thought to be in MSVC.
  • There are known issues and limitations in virtual functions functionality, such as:
    • Optional kernel features handling implementation is not complete yet.
    • AOT support is not complete yet.
    • A virtual function definition and definitions of all kernels using it must be in the same translation unit. Please refer to sycl/test-e2e/VirtualFunctions to see the list of working and non-working examples.

OpenMP:

  • Some OpenMP spir64 offload programs compiled with -O0 -g may segfault at runtime. Workaround: compile the program without -g, or compile it with -O2 -g.
  • For OpenMP spir64 offload, the memory-order and memscope clauses of the flush construct are silently ignored and the default (more conservative but correct) values of seq_cst/device are used for now.

Unified Runtime:

  • On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is because on Windows the release of the plugin DLLs races against the release of static global variables (like the default context).

Intel® Graphics Compiler:

  • The Vector Compute backend does not support -O0 code and often miscompiles it, producing wrong answers and crashes. This issue directly affects ESIMD code at -O0. As a temporary workaround, the ESIMD code is optimized even in -O0 mode.

CPATH to C_INCLUDE_PATH and CPLUS_INCLUDE_PATH Transition:

The oneAPI environment setup scripts have historically added paths for C, C++, and Fortran header files to the CPATH environment variable. These scripts are being transitioned to add relevant paths to the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables instead of CPATH. This transition is intended to isolate Intel-provided header files from the effects of compiler options used by customers to request or suppress compiler warnings in their own source files.

Paths present in the CPATH environment variable specify user include paths. By default, compiler warnings are issued for potentially problematic source code in header files found via these paths subject to use of options like -Wall, and most other options that begin with -W. Paths present in the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables specify system include paths. By default, compiler warnings are suppressed for header files found via these paths (warnings in system header files can be enabled with the -Wsystem-headers option).

For most customers, this transition will be transparent, with the only observed difference being that warnings are less likely to be issued for source code in Intel-provided header files. However, there are some edge cases that could cause other differences in behavior for some customers.

Paths present in the CPATH environment variable are searched after paths specified by the -I option, but before paths present in the C_INCLUDE_PATH and CPLUS_INCLUDE_PATH environment variables, paths specified by the -isystem option, and paths implicitly added by the compiler. Customers that have been adding their own include paths to the end of the CPATH environment variable (after paths historically added by the oneAPI environment scripts) will now find that their include paths are searched before include paths for Intel-provided header files. This could result in a customer-provided header file with the same name as an Intel-provided header file now being found first, whereas previously the Intel-provided header file would have been found first.

If the same path is present in both the CPATH and C_INCLUDE_PATH or CPLUS_INCLUDE_PATH environment variables, or is specified by the -isystem option or implicitly added by the compiler, then the matching path in the CPATH environment variable will be ignored. Customers that, perhaps inadvertently, add a path to Intel-provided header files to the CPATH environment variable that matches a path added to the C_INCLUDE_PATH or CPLUS_INCLUDE_PATH environment variables by the oneAPI environment setup scripts will find their addition to CPATH ignored. Since the oneAPI environment setup scripts would have previously added paths to CPATH, this could result in different include path search orders and thus different header files being found when a customer-provided header file has the same name as one provided by Intel.

Both of the above scenarios depend on header files with the same name being present in multiple include paths. Observable differences are only likely to occur if those header files have different contents or if they use the #include_next directive.
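
The two scenarios can be made concrete with a small model of the search order described in this section. The sketch below is standard C++ and is not compiler code: effective_search_order is a hypothetical helper, all directory names are made up, and the system paths (C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, -isystem, and implicit paths) are treated as one group whose internal order is supplied by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using path_list = std::vector<std::string>;

// Model of the rules in this section:
//   1. -I paths are searched first.
//   2. CPATH entries follow, except entries that also appear among the
//      system paths, which are ignored.
//   3. System paths are searched last.
inline path_list effective_search_order(const path_list &dash_i,
                                        const path_list &cpath,
                                        const path_list &system_paths) {
  path_list order = dash_i;
  for (const std::string &dir : cpath) {
    const bool also_system =
        std::find(system_paths.begin(), system_paths.end(), dir) !=
        system_paths.end();
    if (!also_system)
      order.push_back(dir);
  }
  order.insert(order.end(), system_paths.begin(), system_paths.end());
  return order;
}

// Hypothetical directories illustrating the transition.
inline void include_order_example() {
  const std::string intel = "/opt/intel/include";
  const std::string user = "/home/user/include";

  // Before the transition: the setup scripts put the Intel path in CPATH and
  // the customer appended their own path after it.
  path_list before = effective_search_order({}, {intel, user}, {});
  assert(before.front() == intel);

  // After the transition: the Intel path is a system path, so the customer's
  // CPATH entry is now searched first.
  path_list after = effective_search_order({}, {user}, {intel});
  assert(after.front() == user);

  // A CPATH entry that duplicates a system path is ignored.
  path_list duplicate = effective_search_order({}, {intel, user}, {intel});
  assert(duplicate.size() == 2 && duplicate.front() == user);
}
```

In all three cases the set of directories is the same; only the position of the Intel-provided directory changes, which matters only when both directories contain a header with the same name.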

API/ABI Breaking Changes

  • Removed support for FPGA-related options as part of our strategy to drop FPGA support.
    • Removed options: -fintelfpga, -fsycl-targets=spir64_fpga[-unknown-unknown], -fsycl-link=early|image, -Xsycl-target-backend=spir64_fpga "opt", -reuse-exe=arg, and -fsycl-help=fpga.
  • Removed experimental sycl_ext_intel_oneapi_compiler extension support. Its APIs have been marked as deprecated for a while and sycl_ext_oneapi_kernel_compiler extension should be used instead.

Deprecations

  • Deprecated sycl_ext_oneapi_default_context extension in favor of sycl_khr_default_context extension.
  • Deprecated -fsycl-fp32-prec-sqrt compiler flag in favor of -foffload-fp32-prec-sqrt flag.
  • Deprecated overloads of single_task and parallel_for APIs that accept properties which used to be a part of sycl_ext_oneapi_kernel_properties extension. sycl_ext_oneapi_enqueue_functions extension should be used instead.
    • Deprecated overloads were completely removed from the extension specification.
  • Deprecated current implementation of get_backend_info API. The SYCL 2020 specification currently does not document anything that could be queried through it and therefore existing queries supported through it are deprecated to avoid possible confusion.
  • A known issue when compiling SYCL code on Windows using CMake with the 2025.1.0 version of the compiler has been fixed in this patch release. Please use 2025.1.1 or just replace IntelSYCLconfig.cmake from 2025.1 with 2025.1.1 for the latest fix for “CMake Error: Could NOT find IntelSYCL (missing: SYCL_LIBRARY)”.
  • Fixed an issue that caused a launch failure for applications built with the ‘-x’ option specifying a target platform (Tiger Lake or later) on an OS without CET support.
  • Updated encodings of VCOMX*/VUCOMX* and VGETEXPPBF16 instructions according to AVX10.2 spec rev. 2. This may require an update to SDE 9.53 or later when binary is built with -mavx10.2 or with options implying -mavx10.2.

Major Enhancements and New Features

New Features:

  • Hardware Enablement: Optimized for new Intel hardware including EMR, GNR, BMG, and LNL, with features such as cache hints and new data types for AI applications, delivering improved efficiency and computing power.
  • Bindless Textures Support: Implemented Bindless Textures for Intel GPUs (DG2, Arc), allowing dynamic texture usage at runtime without compile-time knowledge, enabling enhanced performance and scalability.

Performance Tuning and Enhancements:

  • AI and HPC Optimization: Tuned performance for AI frameworks and HPC applications.
  • OpenMP Enhancements: Early support for OpenMP 6.0 features, including the DEVICE_TYPE clause for the TARGET construct and mandatory offloading support. Also fixed the OpenMP loop rotation issue. See Advanced OpenMP* Device Offload with Intel® Compilers for more details.
  • Compiler Reports: Enhanced opt-report for better user experience, now providing detailed information on OpenMP offloading and integrating with the open-source optimization remark framework. Details on recent enhancements can be found at Develop Highly Optimized Applications Faster with Compiler Optimization Reports.
  • Sanitizers for Device Code: Device code now supports LLVM sanitizers to help detect and resolve issues during development. It includes a compiler instrumentation module and runtime support, allowing it to detect issues such as out-of-bounds memory access on USM, SYCL buffers, local memory, and device globals, as well as bad-free, use-after-free, bad context, and more. In this release, PVC GPUs and CPUs are supported on Linux OS. More details on how and when to use sanitizers can be found at Find Bugs Quickly Using Sanitizers with the Intel® oneAPI DPC++/C++ Compiler.
  • Comprehensive Performance Insights: Upgraded optimization reports now cover SYCL, OpenMP, and AOT compilation, offering developers deeper insights into application performance.
  • Hardware Profile Guided Optimization (HWPGO): Key improvements include enhanced profile propagation for better accuracy, additional profile-driven optimizations to further boost performance, and early support for "pseudo probes" on Windows as an alternative to DWARF for profiling. Additionally, HWPGO has introduced selective function outlining, allowing for specific functions to be optimized based on profiling data, further enhancing runtime efficiency.

New Features

SYCL Compiler:

  • SYCL Offload Model: Introduced a new SYCL offload driver mechanism with --offload-new-driver to improve infrastructure for better link times by reducing I/O and external processes.
  • Range Rounding Control: Added -fsycl-range-rounding option for managing range rounding, including forcing full rounding to reduce binary size. Additionally, the experimental -fsycl-exp-range-rounding option performs rounding across all dimensions.
  • Double Type Emulation: Added -fsycl-fp64-conv-emu option for partial emulation of double data types on Intel GPUs.
  • Dynamic Linking: Initial support added for dynamic linking, though some features like kernel_bundle API and AOT mode are not yet supported.
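
The idea behind range rounding can be sketched in standard C++ as follows. This is a simplified model under stated assumptions, not the compiler's generated code: the helper names round_up and for_each_rounded are hypothetical, and the rounding multiple of 32 used in the example is an assumption chosen for illustration. The launch range is padded up to a convenient multiple, and a bounds check inside the kernel wrapper keeps the padded work-items from touching data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds n up to the next multiple of `multiple` (which must be non-zero).
inline std::size_t round_up(std::size_t n, std::size_t multiple) {
  assert(multiple > 0);
  return (n + multiple - 1) / multiple * multiple;
}

// Calls body(id) for every id in [0, n) while iterating over the padded
// range, the way a rounded launch would. Returns how many padded ids were
// skipped by the bounds check.
template <typename Body>
std::size_t for_each_rounded(std::size_t n, std::size_t multiple, Body body) {
  const std::size_t rounded = round_up(n, multiple);
  std::size_t skipped = 0;
  for (std::size_t id = 0; id < rounded; ++id) {
    if (id < n)
      body(id); // Work-item inside the user's range.
    else
      ++skipped; // Padding work-item: does nothing.
  }
  return skipped;
}
```

For a range of 1000 items and a multiple of 32, the padded range is 1024 and 24 ids are skipped; the user-visible result is the same as for the unpadded range.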

SYCL Library:

  • Extensions: Implemented multiple extensions, including sycl_ext_oneapi_prod, sycl_ext_oneapi_profiling_tag, sycl_ext_oneapi_forward_progress, sycl_ext_oneapi_private_alloca, sycl_ext_codeplay_enqueue_native_command, and sycl_ext_oneapi_enqueue_functions.
  • Group Load/Store: Added support for sycl_ext_oneapi_group_load_store, enabling native hardware block read/write capabilities where applicable.
  • Free Function Kernels: Initial support for sycl_ext_oneapi_free_function_kernels extension, with known limitations around argument types and diagnostics.
  • Fused Multiply-Add (FMA): Added experimental ESIMD function fma, which guarantees that a fused multiply-add operation is performed.
  • Improved sycl_ext_oneapi_group_sort extension: Updated implementation of sycl_ext_oneapi_group_sort extension to match revision 2 of the specification. The previous revision 1 is no longer available and some code changes may be required.

Improvements

SYCL Compiler

  • Improved Compilation Flow: The process of generating integration footers has been optimized when no third-party host compiler is used, resulting in fewer temporary files and faster compilation times.

  • Additional Math Function Support: Added support for math functions such as truncf, sinpif, rsqrtf, exp10f, ceilf, copysignf, cospif, fmaxf, and fminf in SYCL kernels as part of the C-CXX-StandardLibrary extension. Support for more Intel Math Functions (IMF), as well as ::rand and ::srand in device code on Intel devices, has also been added.

  • Enhanced Error Messaging: Error messages have been improved for scenarios involving implicit this capture in kernels and missing architecture information when multiple targets are passed into the -fsycl-targets flag.

  • Optimized Compilation Flow: The number of commands needed for generating dependencies using the -MD flag has been reduced, streamlining the build process.

  • Security and Debugging: Security-related compilation flags for libraries and tools have been strengthened, and the debugging experience has been improved for both Linux and Windows environments.

SYCL Library

  • Support for ESIMD functions: Added support for sqrt and rsqrt functions for double data types in ESIMD.
  • Cubemap and Sampled Image Arrays Support: Updated sycl_ext_oneapi_bindless_images extension to support cubemap images and sampled image arrays.
  • Named Barrier Allocation in ESIMD: Introduced ESIMD API for dynamic allocation of named barriers.
  • Executable Command Graph Update: Added support for whole graph updates using executable_command_graph::update.
  • Deprecation Warning: A warning has been added for the use of the deprecated <CL/sycl.hpp> header.
  • Accessor Improvements: local_accessor::get_pointer and local_accessor::get_multi_ptr now throw an invalid exception if called on the host.
  • Queue Operations Detection: Extended detection of nested queue operations to support shortcut methods.
  • Simplified ESIMD API Interface: Added overloads of various ESIMD APIs (e.g., atomic_update, block_load, block_store) allowing omission of some template arguments.
  • Bfloat16 Math Functions: Updated sycl_ext_oneapi_bfloat16_math_functions to support vectors of bfloat16 passed to math functions.
  • Optimized sycl::vec::as: Improved the performance of sycl::vec::as by optimizing the implementation of sycl::detail::memcpy.
  • SYCL 2020 Exception Updates: Updated the implementation to throw SYCL 2020 exceptions instead of legacy SYCL 1.2.1 exceptions across the board.
  • sycl::vec::convert Support: Added support for sycl::vec::convert to and from vec<bfloat16, N>.
  • Deprecations: marray<bool, n>::operator++/-- and accessor::get_multi_ptr for non-device accessors have been deprecated.
  • ESIMD Named Barriers: Moved ESIMD named barrier APIs out of the experimental namespace.
  • SYCL Extensions and API Enhancements:
    • Implemented the latest revision of sycl_ext_oneapi_free_function_queries.
    • Extended sycl-ls --verbose to print detailed device information, including UUIDs and architecture.
    • Introduced support for compile-time properties in copy_to and copy_from ESIMD APIs.
  • Non-Variadic printf Interface: Switched experimental::printf to a non-variadic interface to improve usability when printing float values.
  • Enhanced ESIMD API Validation: Improved validation for rdregion and wrregion APIs using static assertions on template arguments.
  • SYCL 2020 Specification Alignment: Updated mutating swizzle operators and scalar conversions for vec to align with the SYCL 2020 specification.
  • Miscellaneous ESIMD Improvements:
    • Added support for 1- and 2-byte data types to ESIMD prefetch APIs.
    • Enabled ext_intel_matrix support for Intel GNR devices.
    • Introduced new overloads of load_2d, store_2d, and prefetch_2d ESIMD APIs with compile-time properties.
    • Added support for group shift algorithms (e.g., shift_group_left, permute_group_by_xor) for non-uniform groups.
    • Lifted restrictions on the ESIMD block_store API and enhanced the slm_atomic_update API to support fsub and fadd.
  • Graph and Semaphore Support:
    • Added support for graph update functionality and external semaphore wait/signal operations with values in the bindless images extension.
    • Introduced device-to-device copying of image_device_handle.
  • Unified Runtime: Removed the Plugin Interface, replacing it with the Unified Runtime, which reduces the number and size of redistributable libraries.
  • Performance Improvements: Reduced startup overhead of libsycl.so by outlining the SYCL JIT compiler into a standalone library, dynamically loaded on first use.

Bug Fixes

SYCL Compiler

  • Fixed a bug where using the -fsycl-link-targets flag would inadvertently trigger additional device code linking steps.
  • Resolved an issue where AOT-compiling for Intel GPUs would pass PVC-specific flags even if the target device was not a PVC.
  • Fixed a bug with incorrect file extensions being emitted in AOT compilation when using --save-temps.
  • Fixed an issue where performing separate compilation and linking with -fsycl-link resulted in a "number of output files and targets should match in unbundling mode" error during the link step.
  • Resolved an issue where passing pointers in the generic address space to certain built-in math functions could cause compilation failure.
  • Fixed a bug where compiling kernels with different reqd_work_group_size attributes using -fsycl-device-code-split=none could result in a runtime exception about mismatching work-group sizes.
  • Resolved a bug where using the reqd_work_group_size attribute with fewer than three arguments caused a crash.
  • Addressed issues with shift_group_[right|left], permute_by_xor, and select_from_group algorithms returning invalid values when used with the half data type.

SYCL Library

  • Fixed a situation where querying sycl::ext::oneapi::experimental::info::device could result in an exception instead of returning an empty vector.
  • Corrected the esimd::atan implementation under the -ffast-math flag.
  • Fixed an issue where component devices were not correctly identified as descendants of composite devices when creating a queue.
  • Addressed an issue where querying for composite devices could return duplicate entries.
  • Fixed bugs in the copy-constructor of the config_2d_mem_access ESIMD class, which led to compilation errors.
  • Resolved an issue where the use of atomic_ref<T*> was not detected as using the atomic64 aspect, leading to errors.
  • Fixed bugs with ctanh and cexp returning incorrect values in edge cases.
  • Fixed an issue where values passed to the -Xs option via build_options were not passed down to the device compiler.
  • Fixed a compilation error when defining kernels as named functors while using -fno-sycl-unnamed-lambda.
  • Corrected compilation issues with the -fpreview-breaking-changes flag caused by conflicts with macros in windows.h.
  • Resolved strict aliasing violations in the implementation of sycl::vec<sycl::half, N>::operator[] that caused errors.
  • Fixed bugs where barriers submitted to a command queue containing host tasks ignored those host tasks, and improved synchronization of host tasks with barriers.
  • Fixed issues where the compiler could emit unsupported SPIR-V instructions for bit-reversal.
  • Addressed a bug where default-constructed local_accessor arguments could cause runtime errors, especially on Windows and under -O0 optimization on Linux.
  • Resolved a hang when invalid values were passed to the ONEAPI_DEVICE_SELECTOR.
  • Fixed issues with persistent cache functionality where certain setups would prevent necessary directories from being created.
  • Corrected a bug where querying a kernel by name from a kernel bundle could crash the program.
  • Fixed an error handling bug where non-blocking pipe operations would mistakenly throw exceptions.
  • Addressed compilation issues when using non-uniform group built-ins with marray and vec.
  • Resolved a bug where memory attributes applied to a struct used as a type of a device_global variable were ignored.
  • Added missing value_type and vector_t member type aliases to swizzles.
  • Fixed shutdown sequence issues when SYCL RT was used in applications or libraries with custom shutdown processes.
  • Resolved a crash when calling event::get_backend() on a default-constructed event in environments with malformed ONEAPI_DEVICE_SELECTOR.
  • Fixed a bug where sycl-ls with --ignore-device-selectors would not properly ignore the environment variable.
  • Corrected memory order capabilities returned by the Native CPU backend.
  • Fixed the variadic constructor of sycl::ext::oneapi::experimental::properties to match the extension specification.
  • Fixed build program failures when using ESIMD functions like load_2d, store_2d, or prefetch_2d.
  • Resolved a bug where querying free device memory on integrated Intel GPUs returned 0 instead of throwing an exception for unsupported features.
  • Addressed a heap buffer overflow in the sycl_ext_oneapi_kernel_compiler_opencl extension implementation.
  • Corrected a bug where the sycl_ext_oneapi_graph extension ignored the access mode of accessors, creating unnecessary graph edges.
  • Fixed issues where graph submissions involving barriers could result in runtime errors or cause resource leaks.
  • Addressed performance regressions when kernels without dependencies were submitted to in-order queues.
  • Fixed profiling issues in Level Zero backend where timestamps could be zeros or incorrect for in-order queues.
  • Resolved crashes when using multiple queues with the immediate command list properties immediate_command_list and no_immediate_command_list.
  • Fixed a bug where info::kernel_device_specific::work_group_size would return the device-specific limit, ignoring the kernel on the Level Zero backend.

Misc

SYCL Compiler

  • Reverted changes previously made on Windows to support a separate compilation scenario where the compilation step was performed without the -fsycl flag, but the link step included the -fsycl flag. This scenario is now considered unsupported, as the compiler does not know which version of the standard library to link during the link step.

API/ABI Breaking Changes in 2025.0

This release is an ABI-breaking release, meaning that any applications built with older versions of the toolchain must be recompiled to run with newer versions of the SYCL runtime library.

  • Bumped the major version of the SYCL runtime library to 8.
  • Cleaned up the list of symbols exported from the SYCL runtime library by dropping some legacy symbols and hiding others that should not have been exported.
  • Updated the ABI of several functions and methods to avoid using std::string and other objects in the library interface, allowing SYCL RT to be used in applications built with pre-C++11 ABI.
  • Changed the ext_oneapi_copy API from the experimental sycl_ext_oneapi_bindless_images extension to accept const-qualified types for the Src parameter.

Several API breaking changes were made, including dropping support for previously deprecated APIs and switching implementations of some classes to a preview implementation. Code modification recommendations for some of these breaking changes can be found here.

  • Removed the sycl::abs overload taking a floating-point argument.
  • Removed sycl::host_ptr and sycl::device_ptr.
  • Removed queue::discard_or_return.
  • Removed sycl::make_unique_ptr.
  • Removed the use_primary_context property and methods related to the previously removed host device.
  • Removed SYCL 1.2.1 exception subclasses, including runtime_error, nd_range_error, invalid_parameter_error, device_error, and feature_not_supported.
  • Removed queue::mem_advice overload accepting pi_mem_advice.
  • Removed several deprecated ESIMD APIs.
  • Removed the non-standard sycl::id -> sycl::range conversion operator.
  • Removed deprecated APIs from the sycl_ext_oneapi_bindless_images extension implementation.
  • Renamed the experimental destroy_external_semaphore API from the sycl_ext_oneapi_bindless_images extension to release_external_semaphore.
  • Replaced the image_channel_order field of the image_descriptor struct with the number of channels in the experimental sycl_ext_oneapi_bindless_images extension.
  • Enforced restrictions on the first argument of lambdas/functors passed to parallel_for(range) and parallel_for(nd_range).
  • Switched the sycl::vec implementation to its preview version, which uses a different storage type to fix several strict aliasing rule violations.
  • Restricted math operations available to vec<std::byte, N> to those applicable to std::byte.
  • Switched the sycl::exception implementation to its preview version.
  • Switched math built-ins implementation to use their preview version.
  • Switched bfloat16 implementation to use its preview version.
  • Switched sycl::nd_item implementation to use its preview version.
  • Enforced a restriction that a buffer's element type must be device copyable.
  • Restructured SYCL headers to exclude <cmath> and <complex>.
  • Dropped support for the SYCL_DEVICE_FILTER environment variable.
  • Updated the accessor::get_pointer interface to return global_ptr<value_type>, which can be const-qualified if the accessor data type is const-qualified or if the accessor is read-only.
  • Removed deprecated APIs related to sycl_ext_oneapi_free_function_queries.
  • Moved slm_allocator ESIMD APIs into the experimental namespace.
  • Removed the deprecated usm_system_allocator aspect.
  • Removed get_child_group API from the experimental sycl_ext_oneapi_root_group extension.
  • Simplified template arguments related to simd_view of many ESIMD APIs.
  • Removed ESIMD atomic_op::predec.
  • Dropped interfaces from revision 1 of the experimental sycl_ext_oneapi_group_sort extension.
  • Changed the return type of command_graph::begin_recording and command_graph::end_recording from void to bool in the experimental sycl_ext_oneapi_graph extension.

Breaking changes were also made to compiler flags:

  • Removed the deprecated flags -fsycl-link-huge-device-code, -fsycl-[add|link]-targets, -foffload-static-lib, -foffload-whole-static-lib, -fsycl-disable-range-rounding, and -sycl-std.

SYCL Known Issues

  • On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is due to the release of the plugin DLLs racing against the release of static global variables, such as the default context.
  • The Intel Graphics Compiler's Vector Compute backend does not support certain optimization levels and often produces incorrect results or crashes. This issue directly affects ESIMD code. As a temporary workaround, optimize ESIMD code even in the affected mode.
  • When using the sycl_ext_oneapi_matrix extension, it is important for some devices to use the appropriate settings corresponding to the device that will run the program, particularly for matrix operations using half data type.
  • When using queue shortcut functions with in-order queues, dependencies between commands submitted to different queues may be ignored. A workaround is to explicitly call .wait(). This issue will be fixed in the next release. In the example below, the second kernel will start execution before the first completes its execution.
    // q1 long running task
    sycl::event e = q1.single_task([=](){ /* ... */ });
    // q2 task
    q2.single_task(e, [=](){ /* ... */ });
  • C/C++ math built-ins can return incorrect results for some edge-case inputs when called from SYCL kernels.
  • To enhance performance on Intel® GPUs using the Unified Runtime Level Zero Adapter, support for driver-optimized in-order lists has been introduced in version 2025.0. However, when running workloads with sycl::property::queue::enable_profiling, some performance overhead from these lists is expected. If this overhead negatively impacts performance, it can be mitigated by disabling the driver in-order lists. To do so, set UR_L0_USE_DRIVER_INORDER_LISTS=0.
  • To ensure compatibility with the Intel® oneAPI DPC++ Compiler on Windows*, which requires OpenCL 3.0, it is essential to address potential issues caused by older versions of opencl.dll on your system. If an outdated opencl.dll is present in system directories or takes precedence in the library path, it may lead to failures, including SYCL-related issues and crashes in tools like Intel® VTune™ and Intel® Advisor when specific OpenCL 3.0 features are used. The recommended solution is to replace the old opencl.dll with the one installed in the DPC++ package. You can do this by copying the newer opencl.dll from $oneAPI_Install_Folder\compiler\latest\bin to your system folder. Be sure to back up the original opencl.dll in case it's needed for other applications.

  • sycl_ext_oneapi_free_function_kernels has limitations including:
    • free function kernels are only supported if defined at file scope
    • SYCL_EXTERNAL has to be used alongside SYCL_EXT_ONEAPI_FUNCTION_PROPERTY to define a free function kernel
    • compiler won't emit any diagnostics if some restrictions from the extension specification are violated
    • arguments of a free function kernel cannot be composite data types like structs or SYCL classes like accessor
    • using -fsycl-dead-args-optimization (ON by default) can lead to failures
    • info::kernel::num_args won't return the right result for free function kernels

New OpenMP Features

  • Support for the -fopenmp-offload-mandatory compiler flag to omit creation of host-fallback code and emit a runtime error if OpenMP offload to the device fails.
  • Improved optimization report support for OpenMP constructs.
  • Enhanced conversion scheme of nested loop constructs to consider loop trip counts.
  • Updates to the declare variant for a dispatch construct to include GPUs with the Xe2 architecture when the match clause specifies device={arch(gen)}.
  • Support for the device_type(host|nohost|any) clause for the target construct.
  • Inclusion of the if clause for the teams construct.
  • Change of the map-type property to "default," allowing map-type modifiers to be specified without a map-type. For example, map(always : x) is equivalent to map(always, tofrom : x).
  • Support for the Intel extension ompx_sub_group_size clause for the target construct to set the SIMD width of the kernel.
  • Support for the Intel extension ompx_dyn_cgroup_mem clause for the target construct, allowing dynamic allocation in SLM for GPU offloading.
  • Extension of environment variables OMP_THREAD_LIMIT, OMP_TEAMS_THREAD_LIMIT, and OMP_NUM_THREADS to support abstract names. For example, OMP_THREAD_LIMIT=n_cores.
  • Extension of the syntax of the environment variable OMP_PLACES to support bound and stride for abstract names. For example, OMP_PLACES=threads(4:2).
  • Host runtime support for the environment variable OMP_AVAILABLE_DEVICES.
  • Extension of the environment variable OMP_DEFAULT_DEVICE to support device selection by traits.
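As a minimal sketch of the map-type change listed above, the hypothetical helper below (scale_in_place is an illustrative name, not part of any Intel API) maps an array with the always modifier on a combined target construct. It spells out tofrom explicitly, which earlier compilers also accept; per the note above, map(always : x[0:n]) is expected to behave the same way with this release. When OpenMP is not enabled the pragma is ignored and the loop runs serially on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration: multiplies every element by `factor`,
// offloading the loop when OpenMP offload is enabled and a device is available.
void scale_in_place(std::vector<double>& v, double factor) {
    double* x = v.data();
    const int n = static_cast<int>(v.size());

    // Explicit form: the "always" modifier together with the "tofrom" map-type.
    // According to the release note, omitting the map-type, as in
    // map(always : x[0:n]), should now be equivalent because the map-type
    // defaults to tofrom.
    #pragma omp target teams distribute parallel for map(always, tofrom : x[0:n])
    for (int i = 0; i < n; ++i) {
        x[i] *= factor;
    }
}
```

A typical call would be scale_in_place(data, 2.0) on a std::vector<double>. Adding -fopenmp-offload-mandatory to the offload build should, per the first item in the list, turn a failed offload into a runtime error instead of a silent host fallback.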

Notable OpenMP Fixes

  • Fixed a bug where the dispatch construct’s device clause was not updating OpenMP’s default-device-var ICV.
  • Resolved an internal compiler error when the declare variant for a dispatch construct did not specify an adjust_args clause.
  • Fixed an optimization bug in OpenMP for and simd loops with large trip counts.
  • Corrected a regression where enclosing task constructs inside a teams construct triggered a compiler error message.
  • When thread_limit is specified for both target and teams, the compiler now correctly chooses their minimum instead of always using the one specified for target.
  • Fixed an internal compiler error related to the initialization of global variables allocated in GPU’s SLM.
  • Addressed a problem in offload runtime where the reference counts of variables mapped using declare mapper were not decremented correctly.
  • Fixed a GPU offload performance issue related to L1 cache being affected by temporary copies of reduction variables.
  • Resolved a bug where user-defined reduction variables were not properly constructed or destructed.
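For readers unfamiliar with the user-defined reductions mentioned in the last fix, the sketch below shows one with a non-trivial C++ type, where each private copy must be constructed by the initializer and destroyed after being combined. The reduction identifier vec_concat and the function collect_even are hypothetical names chosen for illustration; this is not the reproducer for the fixed bug.

```cpp
#include <cassert>
#include <vector>

// Hypothetical user-defined reduction: concatenates per-thread vectors.
// Each private copy (omp_priv) is constructed as an empty vector and is
// destroyed once it has been merged into omp_out.
#pragma omp declare reduction(vec_concat : std::vector<int> : \
        omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end())) \
    initializer(omp_priv = std::vector<int>())

// Collects the even numbers in [0, n). Element order is unspecified when
// the loop runs in parallel, so callers should not rely on it.
std::vector<int> collect_even(int n) {
    std::vector<int> result;
    #pragma omp parallel for reduction(vec_concat : result)
    for (int i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            result.push_back(i);
        }
    }
    return result;
}
```

Because the combination order depends on the thread schedule, checks on the result are best written against its size or sum, not against a specific element order.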

OpenMP Known Issues

  • Implicit barriers at the end of parallel regions do not act as synchronization points for the tasks associated with target nowait and dispatch nowait constructs. This may result in incorrect results or crashes. A workaround is to use #pragma omp taskwait at the end of the parallel region to ensure synchronization of target/dispatch nowait regions, where it would otherwise have happened due to the presence of the parallel region’s implicit barrier.
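The workaround described above can be sketched as follows. The function deferred_sum is a hypothetical example, not code from the product; it launches one deferred target region from inside a parallel region and adds an explicit taskwait before the region ends, so it does not rely on the implicit barrier.

```cpp
#include <cassert>

// Hypothetical example: sums 0..n-1 inside a deferred (nowait) target region.
long deferred_sum(int n) {
    long sum = 0;

    #pragma omp parallel
    {
        // One thread creates the deferred target task.
        #pragma omp single
        {
            #pragma omp target nowait map(tofrom : sum)
            for (int i = 0; i < n; ++i) {
                sum += i;
            }
        }

        // Workaround: wait explicitly for child tasks before leaving the
        // parallel region. The thread that created the target task blocks
        // here until it completes; for the other threads this is a no-op.
        #pragma omp taskwait
    }

    return sum;
}
```

With OpenMP disabled the pragmas are ignored and the function computes the same sum serially, which makes the sketch easy to check on any host.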

Other Known Issues and Limitations

  • Visual Studio IDE Integration: Building a C++ project with 'Intel C++ Compiler 2025' for the Win32 platform fails with an error reporting that the ICX compiler was not found. The Win32 platform is not supported by 'Intel C++ Compiler 2025'; compile projects for the x64 platform only.

Hardware Support:

  • -march=lunarlake
  • -march=graniterapids

Please check here for details about -march usage. 

Toolchain Support to Intel Platforms

Component   Granite Rapids   Granite Rapids-D   Lunar Lake
GCC         13.1             13.1               14.1
Binutils    2.40             2.41               2.42
Glibc       2.37             2.37               2.39
LLVM        16.0             17.0               18.0
ICX         2023.1           2023.2             2024.0

C/C++ Standard

  • Intel® oneAPI DPC++/C++ Compiler version 2025.0 supports the C/C++ standards through the Clang 19 front end. 
  • Initiated support for C++2c, the next release of C++ after C++23, and C2y, the next release of C after C23
  • Finalized the implementation of “deducing this” (C++23)
  • Relaxed some constexpr restrictions (C++23)
  • Implemented the [[assume]] attribute (C++23)
  • Completed support for Concepts (C++20)
  • Added support for char8_t (C23)
  • Implemented the constexpr keyword for object declarations (C23)
  • Implemented #embed for embedding binary resources in source (C23)

System Requirements

Additional Documentation

Notices and Disclaimers

Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.
