Intel® oneAPI DPC++/C++ Compiler Release Notes
Version: 2023.0
Published: 11/02/2020
Last Updated: 02/09/2023
This document summarizes new and changed product features and includes notes about features and problems not described in the product documentation.
Where to Find the Release
Download the toolkit from the Base Toolkit Download page and follow the installation instructions.
oneAPI 2023.0, Compiler Release 2023.0
New Features and Improvements
- The compiler now uses C++17 as the default C++ language standard. To use an older standard, specify it as a compiler option. For example, to use C++14, specify -std=c++14.
- Added support for the FPGA IP authoring flow. It allows you to target your SYCL* code to generate standalone IP components on different targets and integrate them into a custom Intel® Quartus® Prime project. You can target your compilation to a supported Intel® FPGA device family or part number instead of a specific acceleration platform.
- FPGA optimization reports now support user-defined loop labels replacing the system-generated loop labels. For example:
LOOP1: for( int i = 0; i < 12; i++ ) { ... }
- Added support for the standalone Intel® oneAPI FPGA Reports tool.
- Added support for using latency controls with a stall-free loop in FPGA.
- Added support to view simulation waveforms in the simulators supported by FPGA.
- Added ability to enforce stateless memory accesses for ESIMD.
- Added support for the -fsycl-force-target compiler option.
- Added support for the -fsycl-link-huge-device-code compiler option, which allows linking object files larger than 2GB.
- Implemented group collective built-in functions for more integral types.
- Implemented SYCL 2020 callable device selectors.
- Implemented SYCL 2020 standalone device selectors.
- Added SYCL 2020 property interfaces for the local_accessor, usm_allocator, accessor, and host_accessor classes.
- Added support for fpga_simulator_selector.
- Added support for local_accessor. Deprecated target::local.
- Added support for querying free device memory on Level Zero backend.
- Implemented bfloat16 conversions from/to float for host.
- Added support for ext::oneapi::property::queue::discard_events to Level Zero PI plugin.
- Added lsc_atomic support on ESIMD emulator.
- Added dpas support on ESIMD emulator.
- Added C++ API for imf libdevice built-ins.
- Introduced predicates for ESIMD lsc_block_store/load.
- Added experimental set_kernel_properties API and use_double_grf property for ESIMD.
- Added "eager initialization" mode to Level Zero PI plugin. It might result in unnecessary work done by the plugin, but it ensures the fastest possible execution on hot and reportable paths.
- Implemented group::get_linear_id(int) method.
- Ensured that a correct errc is thrown for an unassociated placeholder accessor.
- Removed dependency on OpenCL ICD Loader from the runtime.
- Added support for ZEBIN format to persistent caching mechanism.
- Added identification mechanism for binaries in the newer ZEBIN format.
- Switched to use struct information descriptors in accordance with SYCL 2020. Removed some deprecated information queries.
- Updated kernel_device_specific::max_sub_group_size query to match SYCL 2020 spec. Deprecated the old variant.
- Deprecated SYCL 1.2.1 device selectors.
- Improved error messages reported for unsupported device partitioning.
- Made device and platform default to default_selector_v.
- Deprecated address_space::constant_space.
- Marked sycl::exception::has_context as noexcept.
- Improved range reduction performance on CPU.
- Made sycl::exception nothrow copy constructible.
- Marked has_property methods as noexcept.
- Improved sycl::event::get_profiling_info exception message when event is default constructed.
- Added a diagnostic (in the form of static_assert) about kernel lambda size mismatch between host and device.
- Updated pipes class to throw exceptions if used on the host.
- Updated ESIMD Emulator PI plugin to report support for cl_khr_fp64 extension.
- Updated Level Zero plugin to prefer copy engine for memory read/write operations.
- Optimized some memory transfers.
- Enabled event caching in the Level Zero PI plugin.
- Optimized some reductions for parallel_for accepting sycl::range for discrete GPUs.
- Added ability to use descendent devices of context members within that context. Not supported with the OpenCL backend yet.
- Limited allowed argument types for rol/ror ESIMD functions to better represent HW capabilities.
- Implemented lazy mechanism of setting the context for default-constructed events.
- Improved performance for multi-dimensional accessors with multiple accesses in a kernel.
- Increased max _BitInt size to 4096 for FPGA target.
- Removed deprecation message for [[intel::disable_loop_pipelining]] attribute.
- Allowed __builtin_assume_aligned to be called from device code.
- Improved link step performance when per_kernel device code split is used.
- Added support for SYCL_EXTERNAL on device_global variables.
- Updated __builtin_intel_fpga_mem to accept more parameters.
- Updated ivdep attribute to allow safelen = 0.
- Improved linking with sycl.lib on Windows.
- Implemented more diagnostics for incorrect device_global usages.
- Improved library resolution for libsycl.so.
. - Improved diagnostics when linking with mismatched objects.
- Added a warning for floating-point size changes after implicit conversions.
- Made invoke_simd convert its argument to appropriate types.
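The first item in the list above makes C++17 the default language standard. As a minimal sketch in standard C++ (not SYCL-specific), the code below relies on structured bindings, a C++17 language feature, so it is expected to build under the new default without any -std option; when an older standard such as -std=c++14 is requested, a compiler may warn about or reject the construct. The helper names compiled_as_cxx17_or_newer and sum_values are hypothetical and exist only for this illustration.

```cpp
#include <map>
#include <string>

// Hypothetical helper: reports whether this translation unit is being
// compiled as C++17 or newer, based on the standard __cplusplus macro.
constexpr bool compiled_as_cxx17_or_newer() {
  return __cplusplus >= 201703L;
}

// Hypothetical helper: sums the mapped values using structured bindings,
// a language feature introduced in C++17.
int sum_values(const std::map<std::string, int>& counts) {
  int total = 0;
  for (const auto& [name, count] : counts) {
    total += count;
  }
  return total;
}
```

Code that must stay on an older standard can keep doing so by passing the corresponding -std option explicitly, as described in the list item.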
Bug Fixes
- Removed deprecated kernel::get_work_group_info.
- Removed deprecated get_native class method.
- Removed support for intel::fpga_pipeline attribute.
- Added MAJOR_VERSION to the name of the SYCL library on Windows.
- Removed sycl::program class.
- Removed ext::oneapi::reduction.
- Removed deprecated address_space enum values.
- Removed event::get method.
- Removed using namespace experimental inside ext::intel.
- Made intel-specific device info descriptors namespace-qualified.
- Removed deprecated make_queue API.
- Aligned return types of sycl::get_native and interop::get_native_mem functions to be in conformance with SYCL 2020 spec.
- Aligned sycl::buffer_allocator interface with SYCL 2020 spec.
- Removed cl namespace from sycl/sycl.hpp header.
- Dropped support for compiling SYCL in less than C++17 mode.
- Many other ABI-breaking changes resulting from internal refactoring.
- When compiling for FPGA, you can now use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 FX FPGA.
- When compiling for FPGA emulator flow on Windows system, an issue leading to the failure to launch device kernels has been fixed.
- Fixed a compilation issue where it wasn't possible to pass an initializer list for dependency events vector in queue shortcuts with offset parameter.
- Fixed sycl::get_pointer_device throwing an exception when passed a descendent device (sub-device) instead of a root device.
- Fixed memory leak happening when kernel bundles are linked.
- Fixed USM free throwing an exception when passed a context created for a descendent device.
- Fixed a compilation issue when using multi-dimensional accessor's subscript operator.
- Fixed "definition with the same mangled name" error happening when using multiple buffer reductions in a kernel.
- Fixed a compilation issue with SYCL math built-ins when GCC < 11.1 is used as a host compiler.
- Fixed a compilation issue with SYCL math built-ins (such as sycl::modf, for example) not accepting pointers to half.
- Fixed an issue with reductions when MSVC is used as the host compiler.
- Fixed a compilation issue when fully specialized sycl::span is initialized from an array.
- Fixed a crash in Level Zero PI plugins caused by specialization constants not being used on the device side, but present in a program.
- Fixed event leak in the Level Zero plugin.
- Fixed an issue with sub-sub-devices in the Level Zero plugin.
- Fixed an issue with incorrect half conversion on ESIMD emulator.
- Fixed a compilation issue with abs ESIMD function.
- Fixed some warnings coming out of SYCL headers when compiled in C++20 mode.
- Fixed a compilation issue when using multiple bitwise shift operations in ESIMD.
- Fixed a crash in Level Zero PI plugin, which occurs when the runtime tries to reset a command list that does not have a synchronization fence associated with it.
- Fixed a compilation issue with sycl::get_native<sycl::backend::ext_oneapi_cuda>(sycl::device) free function (#6653).
- Fixed synchronization issue for explicit dependencies (depends_on usage) which is blocked by the host task or host accessor.
- Fixed an issue in the Level Zero plugin, which could cause barriers not to be correctly applied for an entire queue.
- Fixed accessor so gdb can parse its template parameters correctly.
- Fixed uses of common macro names in the implementation's header files.
- Fixed a performance regression related to the command list in the Level Zero backend.
- Fixed cleanup of temporary files produced by unbundling archives.
- Fixed optimizing out device_global variables with internal linkage.
- Fixed an issue when compiling and linking with different optimization levels that could cause runtime errors.
- Fixed description of -f[no-]sycl-unnamed-lambda compiler option.
- Fixed an issue when building SYCL programs in Debug mode with Windows-Clang.cmake.
. - Fixed an issue causing incorrect conversions involving unsigned types in ESIMD.
- Fixed a crash in applications containing a mix of unnamed ESIMD and non-ESIMD kernels.
- Fixed an issue when op[] was called with a typedef argument under gdb.
Known Issues and Limitations
- Customers might see "fatal error: 'iostream' file not found" when trying to compile a simple program with Intel® oneAPI DPC++/C++ Compiler on a Linux* machine if a matching GNU g++ package is not installed. For further details, see the article "fatal error: <C++ header> file not found with Intel® oneAPI DPC++/C++ Compiler".
- This release is not backward compatible with previous releases, which means that existing SYCL applications won't work with the newer runtime without re-compilation.
- There is a potential for incorrect results using OpenMP pragmas to offload to Intel GPUs where a parallel loop nested inside a TEAM construct is using a variable in a REDUCTION clause and the TEAM construct does not have the same REDUCTION clause. To avoid incorrect results, compile with -mllvm -vpo-paropt-atomic-free-reduction-slm=true to disable global memory buffers.
- There is a known issue with using opt-reports with programs containing OpenMP loop constructs with "schedule(dynamic)", which may cause the compiler to emit an error. In this case, it is recommended that the user remove -qopt-report from their compilation.
- Intel® oneAPI DPC++ Compiler 2023.0.0 may not include all the latest functional and security updates. A new version of Intel® oneAPI DPC++/C++ Compiler is targeted to be released by March 2023 and will include additional functional and security updates. Customers should update to the latest version as it becomes available.
- If your design has nested loops and data is carried across the loops, you should run simulation to verify that the output is correct. In very rare circumstances, when you have nested loops and data is carried across the loops, the RTL generated by the compiler is functionally incorrect. If there are any errors in the simulation output, you might be affected by this issue. You can work around the issue by removing the loop nest, either by using the loop-coalesce attribute or by manually changing the code. This issue is scheduled to be fixed in a future version of oneAPI.
- If you use SUSE15 U3 and include the <complex.h> header, you might run into an error: "expanded from macro 'I'". The <complex.h> header defines the macro 'I' (https://en.cppreference.com/w/c/numeric/complex/I), but the identifier 'I' is widely used in SYCL headers. The error appears on SUSE15 U3 but not on other operating systems because the provided C/C++ headers vary between operating systems.
- When compiling with the following options, -fiopenmp -fopenmp-targets=spir64_gen -Xopenmp-target-backend "-device xxx" -fopenmp-device-code-split=per_kernel, i.e. Ahead of Time (AOT), and the offload kernel contains print statements, the program will stop with a runtime failure.
- SYCL built-in group algorithms may produce wrong results on CPU or FPGA emulator devices if all of the following conditions are met:
  - The work-group size on the highest dimension is larger than the sub-group size
  - The group algorithm is applied to the work-group
  - The group algorithm produces the same result for all work items in the work group (e.g. all_of_group, any_of_group, group_broadcast, reduce_over_group)
  - The group algorithm is used in a loop, and the result may change due to input changes.
  For example, the following kernel code would produce wrong results (the while loop may not exit or acc[gid] may not be set for all work items due to the known issue):
    cgh.parallel_for(
        sycl::nd_range<1>(8, 8),
        [=](sycl::nd_item<1> item) [[intel::reqd_sub_group_size(4)]] {
          // work-group size > sub-group size
          bool predicate = true;
          int gid = item.get_global_id(0);
          while (sycl::all_of_group(item.get_group(), predicate)) {
            // applying all_of_group to the work-group
            // and all_of_group is expected to produce same result for all work-items in the group
            // and is used inside a loop
            acc[gid] = 1;
            predicate = false; // the result of all_of_group would change on the second loop iteration because predicate is changing
          }
        });
- SYCL 2020 barriers show worse performance than SYCL 1.2.1 barriers do.
- When using fallback assert in a separate compilation flow, explicit linking against lib/libsycl-fallback-cassert.o or lib/libsycl-fallback-cassert.spv is required.
- Limit alignment of allocation requests at 64KB, which is the only alignment supported by Level Zero.
- In the following scenario on the Level Zero backend:
  1. Kernel A, which uses buffer A, is submitted to queue A.
  2. Kernel B, which uses buffer B, is submitted to queue B.
  3. queueA.wait().
  4. queueB.wait().
  The DPC++ runtime used to treat unmap/write commands for buffer A/B as host dependencies (i.e., they were waited for before enqueueing any command that depends on them). This allowed the Level Zero plugin to detect that each queue is idle on steps 1/2 and submit the command list immediately. This is no longer the case since these dependencies are now passed in an event waitlist, and the Level Zero plugin attempts to batch these commands, so the execution of kernel B starts only on step 4. The workaround restores the old behavior in this case until this is resolved.
- User-defined functions with the name and signature matching those of any OpenCL C built-in function (i.e., an exact match of arguments, return type doesn't matter) can lead to Undefined Behavior.
- A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open up the device and lock it to that process since the runtime needs to query the actual device to obtain that information.
- The format of the object files produced by the compiler can change between versions. The workaround is to rebuild the application.
- Using sycl::program/sycl::kernel_bundle API to refer to a kernel defined in another translation unit leads to undefined behavior.
- Linkage errors with the following message:
error LNK2005: "bool const std::_Is_integral<bool>" (??$_Is_integral@_N@std@@3_NB) already defined
can happen when a SYCL application is built using MS Visual Studio 2019 version below 16.3.0 and the user specifies -std=c++14 or /std:c++14.
- Printing internal defines is not supported on Windows.
- The usage of new -ax (auto cpu dispatch) is not currently supported when building libraries with -fpic option.
- /Fo<file or dir/> flag no longer accepts directory arguments. Using this flag will result in an error message: clang-offload-bundler command failed with exit code 1. Fix is not available in this release.
- Having a MESA OpenCL implementation that provides no devices on a system may cause incorrect device discovery. As a workaround, such an OpenCL implementation can be disabled by removing /etc/OpenCL/vendors/mesa.icd.
- Compilation may fail on Windows in debug mode if a kernel uses std::array. This happens because the debug version of std::array in Microsoft STL C++ headers calls functions that are illegal for the device code. As a workaround, the following can be done:
  - Dump compiler pipeline execution strings by passing the -### option to the compiler. The compiler will print the internal execution strings of compilation tools. The actual compilation will not happen.
  - Modify the (usually) first execution string (it should have the -fsycl-is-device option) by adding the -D_CONTAINER_DEBUG_LEVEL=0 -D_ITERATOR_DEBUG_LEVEL=0 options to the end of the string. Execute all strings one by one.
- -fsycl-dead-args-optimization cannot eliminate the offset of the accessor even though it is created with no offset specified.
- The compile times can be significant when compiling for FPGA and using a read-only accessor for a very wide struct. As a workaround, use a read-write accessor instead to address long compile times.
- When you perform FPGA compile and link stages with a single dpcpp command (for example, dpcpp -fintelfpga <other arguments> -Xshardware src/kernel.cpp), if the source code is not located in the current directory, you might observe that the source code browser is missing in the generated FPGA optimization reports. To work around this issue, compile and link the executable in separate stages, as follows:
icpx -fsycl -fintelfpga <other arguments> -Xshardware -c src/kernel.cpp -o kernel.o
icpx -fsycl -fintelfpga <other arguments> -Xshardware kernel.o
- When compiling for FPGA, the debug support on Windows is unavailable when using device-side libraries. To avoid this issue, do not run a debugger on the emulator platform on Windows.
- The modulefiles-setup.sh script is not supported for FPGA in this release. As a workaround, use the setvars.sh script.
- On Windows, compiling FPGA designs in a directory with a long path name might fail, and you might see the following error:
dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
NMAKE : fatal error U1077: '…\oneAPI\compiler\latest\windows\bin\dpcpp.EXE' : return code '0x1'
As a workaround, either compile the design in a directory with a short path name or reset TMP and TEMP environment variables to point to a shorter path (for example, C:\temp).
- When using the atomic_fence function for FPGA, the memory_scope::system constraint is not supported. The broadest scope supported is the memory_scope::device constraint. There is no workaround available for this currently.
- When compiling for FPGA, the compiler might produce a different intermediate representation (IR) on Windows than Linux. Misaligned structs cause this issue. As a result, some designs that compile with an II=1 on Linux might have, for example, II=10 on Windows. As a workaround, force an alignment on the misaligned structs, as shown in the following example:
//Code with misaligned struct
struct Item {
  bool valid;
  int value1;
  unsigned char value2;
};

//Forced alignment of the struct
struct Item {
  bool valid;
  bool __empty__[3];
  int value1;
  unsigned char value2;
  unsigned char __empty2__[3];
};
- The FPGA emulator does not recognize different Avalon interfaces when defining a host pipe. This can lead to unexpected behavior when specifying the Avalon interface type. There is no known workaround for this issue.
- When compiling for FPGA and trying to reduce the II of the II-critical path, the scheduler may return an incorrect II-critical path. This means the compiler reduces the II of the wrong path, and the II goal is not achieved. You might observe this issue only when there are multiple negative cycles in the LSU's critical path. There is no known workaround for this issue. However, your design's functionality stays unaffected. Performance (QoR) might get degraded slightly.
- When simulating FPGA designs, a design with a host channel might pose two signal mismatch errors: dataBitsPerSymbol and firstSymbolInHighOrderBits.
  - dataBitsPerSymbol error can occur in the FPGA IP authoring flow when you specify a dataBitsPerSymbol value that is not equal to 8. As a workaround, set the dataBitsPerSymbol to 8.
  - firstSymbolInHighOrderBits error can occur in the FPGA IP authoring flow when you set firstSymbolInHighOrderBits to false. As a workaround, set the firstSymbolInHighOrderBits to true.
- With the FPGA IP Authoring flow, you can intuitively integrate your design into the Platform Designer by copying the generated .prj folder into your Intel® Quartus® Prime project directory. The Platform Designer detects the project automatically. However, there is a known issue with the generated hw.tcl file, which does not map the signals correctly. To work around this issue, follow these steps on both Linux and Windows systems:
  - Add python to your PATH environment variable to run python from your command line.
  - Execute the following commands to run the <kernel-name>_di_hw_tcl_adjustment_script.py python script generated in your .prj directory before integrating your IP authoring kernel into the Platform Designer:
    $ cd <kernel_name>.prj
    $ python <kernel-name>_di_hw_tcl_adjustment_script.py
- When compiling an FPGA kernel that calls the sycl::ext::oneapi::experimental::printf() function, the compiler issues the following warning message:
compiler warning: argument 'llvm_fpga_printf_buffer_start' on component '<your kernel name>' is never used by the component. Note that the compiler may optimize it away.
There is no known workaround for this issue. However, you can ignore this warning since it does not impact the kernel's functionality.
- When compiling for FPGA, if your SYCL code contains the std::popcount function inside a fixed-size loop (bit-widths not in 8, 16, 32, or 64), it gets mapped directly into llvm.ctpop, and the compilation fails with an error message. There is no known workaround for this issue. However, Intel recommends avoiding the use of the std::popcount function inside loops.
- On the Windows system, the standalone Intel® oneAPI FPGA Reports Tool application might fail to run on a mapped network drive and display "GPU process launch failed" error message on the console. As a workaround for this issue, copy the Intel® oneAPI FPGA Reports Tool application from the mapped network drive to your local computer and run it locally.
- The Intel FPGA IP authoring encryption flow is not fully supported on Windows systems.
- In the Intel FPGA IP authoring flow, the fpga_tools::UnrolledLoop utility defined in the unrolled_loop.hpp code sample header file does not support the kernel argument interface macros (mmhost, conduit_mmhost, and register_map_mmhost). For example:
fpga_tools::UnrolledLoop<ROWS>([&](auto row) {
  #pragma unroll
  for (int i = COLS - 1; i > 0; i--) {
    shift_reg[row][i] = shift_reg[row][i - 1];
  }
  shift_reg[row][0] = MA[col * ROWS + row];
});
As a workaround, use the #pragma unroll before a for loop, as shown in the following example:
#pragma unroll
for (int row = 0; row < ROWS; row++) {
  #pragma unroll
  for (int i = COLS - 1; i > 0; i--) {
    shift_reg[row][i] = shift_reg[row][i - 1];
  }
  shift_reg[row][0] = MA[col * ROWS + row];
}
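The forced-alignment workaround for misaligned structs in the list above can be checked at compile time. The following is a minimal sketch in standard C++, assuming a platform where int is 4 bytes with 4-byte alignment (typical for x86-64 hosts). The type name ItemPadded and the member names pad0 and pad1 are hypothetical stand-ins for the padding members shown in the release note. It only verifies the host-side layout and says nothing about the IR that the FPGA flow generates.

```cpp
#include <cstddef>

// Layout from the known issue: on the assumed platform the compiler adds
// implicit padding after 'valid' and after 'value2'.
struct Item {
  bool valid;
  int value1;
  unsigned char value2;
};

// Same data with the padding written out explicitly.
// pad0 and pad1 are hypothetical names used only in this sketch.
struct ItemPadded {
  bool valid;
  bool pad0[3];
  int value1;
  unsigned char value2;
  unsigned char pad1[3];
};

// These checks assume a 4-byte, 4-byte-aligned int.
static_assert(offsetof(ItemPadded, value1) == 4, "value1 should start at byte 4");
static_assert(offsetof(ItemPadded, value2) == 8, "value2 should start at byte 8");
static_assert(sizeof(ItemPadded) == 12, "explicit padding should give 12 bytes");
static_assert(sizeof(ItemPadded) == sizeof(Item), "explicit padding should not change the size");
```

If any of these assertions fails on a given toolchain, the explicit padding does not match that platform's natural layout and would need to be adjusted.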
System Requirements
Additional Documentation
- Get Started with the Intel® oneAPI Toolkits for Linux*
- Get Started with the Intel® oneAPI Toolkits for Windows*
- oneAPI Versioning Schema based on Semantic Versioning
- Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference
- SYCL* 2020 Specification Features and DPC++ Language Extensions Supported
- OpenMP* Features and Extensions Supported in Intel® oneAPI DPC++/C++ Compiler
Previous oneAPI Releases
Notices and Disclaimers
Intel technologies may require enabled hardware, software, or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.