Where to Find the Release
Download the toolkit from the Intel® oneAPI Base Toolkit Download page and follow the installation instructions.
Overview
The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C++ Compiler and provides high-productivity APIs that aim to minimize the programming effort of C++ developers creating efficient heterogeneous applications.
2022.10.0
Deprecation Notices
The ONEDPL_USE_AOT_COMPILATION and ONEDPL_AOT_ARCH CMake options are deprecated and will be removed in a future release. Please use the relevant compiler flags to enable this feature.
New Features
- Added parallel range algorithms in namespace oneapi::dpl::ranges: set_intersection, set_union, set_difference, set_symmetric_difference, includes, unique, unique_copy, destroy, uninitialized_fill, uninitialized_move, uninitialized_copy, uninitialized_value_construct, uninitialized_default_construct, reverse, reverse_copy, swap_ranges. These algorithms operate with C++20 random access ranges.
- Improved performance of the gpu::inclusive_scan kernel template and added support for binary operator and type combinations which do not have a SYCL known identity.
- Improved performance of inclusive_scan_by_segment, exclusive_scan_by_segment, set_union, set_difference, set_intersection, and set_symmetric_difference when using device policies.
- Improved performance of search operations (e.g., find, all_of, equal, search, etc.) and of the is_heap and is_heap_until algorithms on Intel® Arc™ B-series GPU devices.
Fixed Issues
- Removed requirement of GPU double precision support to use set_union, set_difference, set_intersection, and set_symmetric_difference on Windows operating systems.
- Removed default-constructible requirements from the value type for the reduce and transform_reduce algorithms, as well as copy-constructible requirements when these algorithms are used with a native ("host") policy.
- Fixed an issue with ranges::merge when projections of the two input ranges were not the same.
- Fixed equal returning false for empty input sequences; now it returns true.
- Fixed the compilation error "SYCL kernel cannot use exceptions" occurring with libstdc++ version 10 when calling the adjacent_find, is_sorted and is_sorted_until range algorithms with device policies.
- Fixed an issue with the PSTL_USE_NONTEMPORAL_STORES macro having no effect.
- Fixed a bug where unique called with a device policy returned an incorrect result iterator.
- Fixed a bug in the exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, and inclusive_scan_by_segment algorithms when using device policies with different input and output value types.
- Fixed a bug in return value types of the minmax_element and mismatch range algorithms.
- Fixed compile errors in set_union and set_symmetric_difference when using device policies with different second-input and output value types.
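The equal fix in the list above brings the result in line with the C++ standard algorithm, which treats two empty sequences as equal. A minimal sketch of that expected behavior, using the serial std::equal from the standard library as a stand-in rather than oneDPL itself (the helper name sequences_equal is hypothetical):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: compares two sequences element by element with the
// standard serial std::equal, used here only as a reference for the
// behavior the release note describes.
bool sequences_equal(const std::vector<int>& a, const std::vector<int>& b) {
    // The four-iterator overload also returns false when the lengths differ.
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
```

Calling sequences_equal with two empty vectors yields true, since there is no pair of elements that differs.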
Known Issues and Limitations
New in This Release
- The copy_if, unique_copy, set_union, set_intersection, set_difference, and set_symmetric_difference range algorithms require the output range to have sufficient size to hold all resulting elements.
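One way to satisfy the output size requirement is to size the output for the worst case and trim it afterwards. The sketch below shows the idea with the standard serial std::copy_if as a stand-in; it is not the oneDPL range algorithm call, and the helper name copy_even is hypothetical:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: keeps the even values of `in`.
std::vector<int> copy_even(const std::vector<int>& in) {
    // Worst case: every element passes the predicate, so make room for all.
    std::vector<int> out(in.size());
    auto last = std::copy_if(in.begin(), in.end(), out.begin(),
                             [](int x) { return x % 2 == 0; });
    // Trim the unused tail once the real result size is known.
    out.erase(last, out.end());
    return out;
}
```

For the set algorithms the worst-case size differs (for example, the sum of both input sizes for a union), so the bound has to be chosen per algorithm.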
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- The histogram algorithm requires the output value type to be an integral type no larger than four bytes when used with a device policy on hardware that does not support 64-bit atomic operations.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler 2024.1 or earlier with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux. To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
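The in-place scan condition in the list above amounts to passing the same iterator, of the same type, as both input start and destination. A minimal sketch of that call shape, using the serial std::exclusive_scan without an execution policy as a stand-in (the helper name is hypothetical; with oneDPL a policy would be passed as the first argument):

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: in-place exclusive prefix sum.
void exclusive_prefix_sum_in_place(std::vector<int>& data) {
    // Input start and destination are the same iterator type and compare
    // equal (data.begin() == data.begin()), which is the in-place form.
    std::exclusive_scan(data.begin(), data.end(), data.begin(), 0);
}
```

Wrapping only one of the two iterators in an adaptor, so that they no longer compare equal, would fall outside the condition described above.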
References
- the oneDPL Specification
- oneDPL Guide
- Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes
- Restrictions and Known Limitations
- Tested Standard C++ API
- Macros
- 2022.0 Changes
- sycl device copyable
- oneAPI DPC++ Library Manual Migration Guide
2022.9.0
New Features
- Added parallel range algorithms in namespace oneapi::dpl::ranges: fill, move, replace, replace_if, remove, remove_if, mismatch, minmax_element, min, max, find_first_of, find_end, is_sorted_until. These algorithms operate with C++20 random access ranges.
- Improved performance of set operation algorithms when using device policies: set_union, set_difference, set_intersection, set_symmetric_difference.
- Improved performance of copy, fill, for_each, replace, reverse, rotate, transform and 30+ other algorithms with device policies on GPUs when using std::reverse_iterator.
- Added ADL-based customization point is_onedpl_indirectly_device_accessible, which can be used to mark iterator types as indirectly device accessible. Added public trait oneapi::dpl::is_indirectly_device_accessible[_v] to query if types are indirectly device accessible.
Fixed Issues
- Eliminated runtime exceptions encountered when compiling code that called inclusive_scan, copy_if, partition, unique, reduce_by_segment, and related algorithms with device policies using the open source oneAPI DPC++ Compiler without specifying an optimization flag.
- Fixed a compilation error in reduce_by_segment regarding return type deduction when called with a device policy.
- Eliminated multiple compile time warnings throughout the library.
Known Issues and Limitations
New in This Release
- The set_intersection, set_difference, set_symmetric_difference, and set_union algorithms with a device policy require GPUs with double-precision support on Windows, regardless of the value type of the input sequences.
Existing Issues
- Incorrect results may be observed when calling sort with a device policy on Intel® Arc™ graphics 140V with data sizes of 4-8 million elements.
- The histogram algorithm requires the output value type to be an integral type no larger than four bytes when used with a device policy on hardware that does not support 64-bit atomic operations.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler 2024.1 or earlier with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux. To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
- With libstdc++ version 10, the compilation error "SYCL kernel cannot use exceptions" occurs when calling the range-based adjacent_find, is_sorted or is_sorted_until algorithms with device policies.
- The range-based count_if may produce incorrect results on Intel® Data Center GPU Max Series when the driver version is "Rolling 2507.12" and newer.
2022.8.0
New Features
- Added support of host policies for histogram algorithms.
- Added support for an undersized output range in the range-based merge algorithm.
- Improved performance of the merge and sorting algorithms (sort, stable_sort, sort_by_key, stable_sort_by_key) that rely on Merge sort [1], with device policies for large data sizes.
- Improved performance of copy, fill, for_each, replace, reverse, rotate, transform and 30+ other algorithms with device policies on GPUs.
- Improved oneDPL use with SYCL implementations other than Intel® oneAPI DPC++/C++ Compiler.
Fixed Issues
- Fixed an issue with drop_view in the experimental range-based API.
- Fixed compilation errors in find_if and find_if_not with device policies where the user provided predicate is device copyable but not trivially copyable.
- Fixed incorrect results or synchronous SYCL exceptions for several algorithms when compiled with -O0 and executed on a GPU device.
- Fixed an issue preventing inclusion of the <numeric> header after the <execution> and <algorithm> headers.
- Fixed several issues in the sort, stable_sort, sort_by_key and stable_sort_by_key algorithms. The fixes:
  - Allow the use of non-trivially-copyable comparators.
  - Eliminate duplicate kernel names.
  - Resolve incorrect results on devices with sub-group sizes smaller than four.
  - Resolve synchronization errors that were seen on Intel® Arc™ B-series GPU devices.
Known Issues and Limitations
New in This Release
- Incorrect results may be observed when calling sort with a device policy on Intel® Arc™ graphics 140V with data sizes of 4-8 million elements.
- The sort, stable_sort, sort_by_key and stable_sort_by_key algorithms fail to compile when using Clang 17 and earlier versions, as well as compilers based on these versions, such as Intel® oneAPI DPC++/C++ Compiler 2023.2.0.
- When compiling code that uses device policies with the open source oneAPI DPC++ Compiler (clang++ driver), synchronous SYCL runtime exceptions regarding unfound kernels may be encountered unless an optimization flag is specified (for example -O1) as opposed to relying on the compiler's default optimization level.
Existing Issues
- histogram requires the output value type to be an integral type no larger than 4 bytes when used with an FPGA policy.
- histogram may provide incorrect results with device policies in a program built with the -O0 option.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux. To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
2022.7.0
New Features
- Improved performance of the following algorithms with device policies: adjacent_find, all_of, any_of, copy_if, exclusive_scan, equal, find, find_if, find_end, find_first_of, find_if_not, inclusive_scan, includes, is_heap, is_heap_until, is_partitioned, is_sorted, is_sorted_until, lexicographical_compare, max_element, min_element, minmax_element, mismatch, none_of, partition, partition_copy, reduce, remove, remove_copy, remove_copy_if, remove_if, search, search_n, stable_partition, transform_exclusive_scan, transform_inclusive_scan, unique, and unique_copy.
- Improved performance of the sort, stable_sort, and sort_by_key algorithms with device policies when using Merge sort [1].
- Added the stable_sort_by_key algorithm in namespace oneapi::dpl.
- Introduced parallel range algorithms in namespace oneapi::dpl::ranges: all_of, any_of, none_of, for_each, find, find_if, find_if_not, adjacent_find, search, search_n, transform, sort, stable_sort, is_sorted, merge, count, count_if, equal, copy, copy_if, min_element, max_element. These algorithms work with C++20 random access ranges and views while also taking an execution policy, similar to other oneDPL algorithms.
- Added support for operators ==, !=, <<, and >> for RNG engines and distributions.
- Introduced experimental support for the Philox RNG engine in namespace oneapi::dpl::experimental.
- Added the <oneapi/dpl/version> header containing oneDPL version macros and new feature testing macros.
Fixed Issues
- Fixed unused variable and unused type warnings.
- Resolved memory leaks when using the sort and stable_sort algorithms with the oneTBB backend.
- Fixed a build error for the oneapi::dpl::begin and oneapi::dpl::end functions used with the Microsoft* Visual C++ standard library and with C++20.
- Reordered template parameters of the histogram algorithm to match its function parameter order. We recommend removing explicit template parameter specifications and instead adding explicit type conversions for function arguments.
- The gpu::esimd::radix_sort and gpu::esimd::radix_sort_by_key kernel templates now throw std::bad_alloc if they fail to allocate global memory.
- Fixed a potential hang with the gpu::esimd::radix_sort and gpu::esimd::radix_sort_by_key kernel templates.
- Corrected the documentation for the sort_by_key algorithm, previously described as stable, which can be unstable for some execution policies. Use stable_sort_by_key for guaranteed stability.
- Fixed an error when calling sort with device execution policies on CUDA devices.
- Allowed passing C++20 random access iterators to oneDPL algorithms.
- Fixed issues with the initialization of SYCL queues in predefined device execution policies, which have been updated to be immutable (const) objects.
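The distinction between sort_by_key and stable_sort_by_key in the list above matters only when keys repeat: a stable key-value sort keeps the values of equal keys in their original relative order. The serial reference below sketches that guarantee in standard C++; it is an illustration of the semantics, not the oneDPL implementation, and the function name is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical serial reference: reorders `keys` and `values` together by
// key, keeping the original relative order of entries with equal keys.
// Assumes keys.size() == values.size().
void stable_sort_by_key_reference(std::vector<int>& keys,
                                  std::vector<char>& values) {
    // Sort a permutation of indices instead of the data itself.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Apply the permutation to both sequences.
    std::vector<int> sorted_keys;
    std::vector<char> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

With keys {2, 1, 2, 1} and values {'a', 'b', 'c', 'd'}, the result is keys {1, 1, 2, 2} and values {'b', 'd', 'a', 'c'}. An unstable sort would be free to swap 'b' with 'd' or 'a' with 'c'.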
Known Issues and Limitations
New in This Release
- histogram may provide incorrect results with device policies when built with the -O0 option.
- Including <oneapi/dpl/dynamic_selection> before <oneapi/dpl/random> may result in compilation errors. Include <oneapi/dpl/random> first as a workaround.
- Incorrect results may occur when using oneapi::dpl::experimental::philox_engine with undefined template parameters and with word_size values other than 64 and 32.
- Some algorithms (e.g., exclusive_scan, inclusive_scan, transform_exclusive_scan, copy_if) may produce incorrect results or trigger synchronous SYCL exceptions when built with -O0 on GPU devices.
- For certain algorithms (transform_inclusive_scan, inclusive_scan), the value type of the input sequence must be convertible to the type of the initial element.
- Some algorithms (copy_if, remove, partition_copy, unique) with device policies may exceed the C++ standard's requirements for predicate applications, applying the predicate or equality operator O(n) times.
- Some algorithms (adjacent_find, find_if, search, etc.) may cause segmentation faults when used with device execution policies on a CPU device, particularly when built on Linux with Intel® oneAPI DPC++/C++ Compiler 2025.0.0 and -O0 -g compiler options.
Existing Issues
- histogram requires the output value type to be an integral type no larger than 4 bytes when used with an FPGA policy.
- Compilation issues may arise when passing zip iterators to exclusive_scan_by_segment on Windows.
- Algorithms such as sort, stable_sort, and sort_by_key may work incorrectly or cause segmentation faults when used with device policies on a CPU device and built with -O0 -g. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Some scan algorithms (exclusive_scan, reduce_by_segment) may produce incorrect results when compiled with certain OpenMP options on Linux.
- 64-bit reductions with reduce and transform_reduce may produce incorrect results on GPU devices. Workaround: define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro.
- std::tuple and std::pair cannot be used with SYCL buffers for data transfer.
- The oneapi::dpl::experimental::ranges::reverse algorithm is unavailable with the -fno-sycl-unnamed-lambda option.
- STL algorithms (e.g., std::for_each) do not compile with the debug version of the Microsoft* Visual C++ standard library.
2022.6.0
News
- oneAPI DPC++ Library Manual Migration Guide to simplify the migration of Thrust* and CUB* APIs from CUDA*.
- The radix_sort and radix_sort_by_key kernel templates were moved into the oneapi::dpl::experimental::kt::gpu::esimd namespace. The former oneapi::dpl::experimental::kt::esimd namespace is deprecated and will be removed in a future release.
- The for_loop, for_loop_strided, for_loop_n, for_loop_n_strided algorithms in namespace oneapi::dpl::experimental are enforced to fail with device execution policies.
New Features
- Added an experimental inclusive_scan kernel template algorithm residing in the oneapi::dpl::experimental::kt::gpu namespace.
- The radix_sort and radix_sort_by_key kernel templates are extended with overloads for out-of-place sorting. These overloads preserve the input sequence and sort data into the user provided output sequence.
- Improved performance of the reduce, min_element, max_element, minmax_element, is_partitioned, lexicographical_compare, binary_search, lower_bound, and upper_bound algorithms with device policies.
- The sort, stable_sort, sort_by_key algorithms now use Radix sort [1] for sorting sycl::half elements compared with std::less or std::greater.
Fixed Issues
- Fixed compilation errors when using reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare with Intel oneAPI DPC++/C++ compiler 2023.0 and earlier.
- Fixed possible data races in the following algorithms used with device execution policies: remove_if, unique, inplace_merge, stable_partition, partial_sort_copy, rotate.
- Fixed excessive copying of data in std::vector allocated with a USM allocator for standard library implementations which have allocator information in the std::vector::iterator type.
- Fixed an issue where checking std::is_default_constructible for transform_iterator with a functor that is not default-constructible could cause a build error or an incorrect result.
- Fixed handling of sycl device copyable for internal and public oneDPL types.
- Fixed handling of std::reverse_iterator as input to oneDPL algorithms using a device policy.
- Fixed set_intersection to always copy from the first input sequence to the output, where previously some calls would copy from the second input sequence.
- Fixed compilation errors when using oneapi::dpl::zip_iterator with the oneTBB backend and C++20.
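Which input the set_intersection output is copied from becomes observable when the comparator looks at only part of each element. The sketch below shows the expected behavior with the serial std::set_intersection from the standard library, which copies matching elements from the first range; the Tagged type and intersect_by_key helper are hypothetical names used only for illustration:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical element type: `key` drives the comparison, `source` records
// which input sequence the element came from.
struct Tagged {
    int key;
    char source;
};

// Intersects two key-sorted sequences, comparing by key only.
std::vector<Tagged> intersect_by_key(const std::vector<Tagged>& first,
                                     const std::vector<Tagged>& second) {
    std::vector<Tagged> out;
    std::set_intersection(first.begin(), first.end(),
                          second.begin(), second.end(),
                          std::back_inserter(out),
                          [](const Tagged& a, const Tagged& b) { return a.key < b.key; });
    return out;
}
```

Every element of the result carries the source tag of the first sequence, which is the behavior the fix above restores for device policies.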
Known Issues and Limitations
New in This Release
- The histogram algorithm requires the output value type to be an integral type no larger than 4 bytes when used with an FPGA policy.
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- When compiled with the -fsycl-pstl-offload option of the Intel oneAPI DPC++/C++ compiler and with libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq in accordance with the -fsycl-pstl-offload option value.
- When using the dpl modulefile to initialize the user's environment and compiling with the -fsycl-pstl-offload option of the Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory containing libpstloffload.so not being included in the search path. Use env/vars.sh to configure the working environment to avoid the issue.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- The sort, stable_sort, sort_by_key, partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a DPC++ execution policy for a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux. To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer and executed on GPU devices. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
[1] The sorting algorithms in oneDPL use Radix sort for arithmetic data types and sycl::half (since oneDPL 2022.6) compared with std::less or std::greater, otherwise Merge sort.
2022.5.0
New Features
- Added new histogram algorithms for generating a histogram from an input sequence into an output sequence representing either equally spaced or user-defined bins. These algorithms are currently only available for device execution policies.
- Supported zip_iterator for the transform algorithm.
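To make the equally spaced variant of histogram concrete, the serial reference below counts values into bins of equal width. It is a sketch of the idea only, not the oneDPL implementation or its signature: the function name is hypothetical, and the half-open value range with out-of-range values skipped is an assumption made for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial reference for an evenly binned histogram.
// Assumption: the value range is half-open, [min_val, max_val), and values
// outside it are not counted.
std::vector<unsigned> histogram_even_reference(const std::vector<double>& in,
                                               std::size_t num_bins,
                                               double min_val, double max_val) {
    std::vector<unsigned> counts(num_bins, 0u);
    if (num_bins == 0 || !(min_val < max_val)) {
        return counts;  // nothing sensible to count into
    }
    const double width = (max_val - min_val) / static_cast<double>(num_bins);
    for (double x : in) {
        if (x < min_val || x >= max_val) {
            continue;  // outside the assumed half-open range
        }
        std::size_t bin = static_cast<std::size_t>((x - min_val) / width);
        if (bin >= num_bins) {
            bin = num_bins - 1;  // guard against floating-point rounding
        }
        ++counts[bin];
    }
    return counts;
}
```

The counts use a four-byte unsigned type, which also fits the output value type restriction listed among the known issues for some device policies.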
Fixed Issues
- Fixed handling of permutation_iterator as input to oneDPL algorithms for a variety of source iterator and permutation types which caused issues.
- Fixed zip_iterator to be SYCL device copyable for trivially copyable source iterator types.
- Added a workaround for reduction algorithm failures with 64-bit data types. Define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
Known Issues and Limitations
New in This Release
- Crashes or incorrect results may occur when using oneapi::dpl::reverse_iterator or std::reverse_iterator as input to oneDPL algorithms with device execution policies.
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- When compiled with the -fsycl-pstl-offload option of the Intel oneAPI DPC++/C++ compiler and with libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq in accordance with the -fsycl-pstl-offload option value.
- When using the dpl modulefile to initialize the user's environment and compiling with the -fsycl-pstl-offload option of the Intel® oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory containing libpstloffload.so not being included in the search path. Use env/vars.sh to configure the working environment to avoid the issue.
- Compilation issues may be encountered when passing zip_iterators to exclusive_scan_by_segment on Windows.
- Incorrect results may be produced by set_intersection with a DPC++ execution policy, where elements are copied from the second input range rather than the first input range.
- For transform_exclusive_scan and exclusive_scan to run in-place (with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- The sort, stable_sort, sort_by_key, and partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a DPC++ execution policy for a CPU device, and built on Linux with the Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux. To avoid the issue, pass the -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce, reduce_by_segment, and transform_reduce with 64-bit data types when compiled by Intel® oneAPI DPC++/C++ Compiler versions 2021.3 and newer and executed on GPU devices. For a workaround, define the ONEDPL_WORKAROUND_FOR_IGPU_64BIT_REDUCTION macro to 1 before including oneDPL header files.
- std::tuple and std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
2022.4.0 (available on oneDPL GitHub)
New Features
- Added experimental radix_sort and radix_sort_by_key algorithms residing in the oneapi::dpl::experimental::kt::esimd namespace. These algorithms are the first in the family of kernel templates that allow configuring a variety of parameters, including the number of elements to process by a work item and the size of a workgroup. The algorithms only work with Intel® Data Center GPU Max Series.
- Added new transform_if algorithm for applying a transform function conditionally based on a predicate, with overloads provided for one and two input sequences that use correspondingly unary and binary operations and predicates.
- Optimizations used with Intel® oneAPI DPC++/C++ Compiler are expanded to the open source oneAPI DPC++ compiler.
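The conditional transform described for transform_if above can be pictured with a short serial reference for the one-input form. This is a sketch of the semantics as understood here, not the oneDPL implementation: the function name is hypothetical, and leaving output positions untouched where the predicate is false is an assumption of this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial reference for a one-input conditional transform.
// Assumption: positions whose input element fails the predicate keep
// whatever value the output already holds.
template <typename T, typename UnaryOp, typename Pred>
void transform_if_reference(const std::vector<T>& in, std::vector<T>& out,
                            UnaryOp op, Pred pred) {
    for (std::size_t i = 0; i < in.size() && i < out.size(); ++i) {
        if (pred(in[i])) {
            out[i] = op(in[i]);  // write only where the predicate holds
        }
    }
}
```

For input {1, 2, 3, 4}, an output pre-filled with zeros, an operation that multiplies by 10, and a predicate that selects even values, the output becomes {0, 20, 0, 40}.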
Known Issues and Limitations
New in This Release
- The esimd::radix_sort and esimd::radix_sort_by_key kernel templates fail to compile when a program is built with the -g, -O0, or -O1 compiler options.
- The esimd::radix_sort_by_key kernel template produces wrong results with the following combinations of kernel_param and types of keys and values:
  - sizeof(key_type) + sizeof(val_type) == 12, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 96
  - sizeof(key_type) + sizeof(val_type) == 16, kernel_param::workgroup_size == 64, and kernel_param::data_per_workitem == 64
2022.3.0
New Features
- Added an experimental feature to dynamically select an execution context, e.g., a SYCL queue. The feature provides selection functions such as select, submit and submit_and_wait, and several selection policies: fixed_resource_policy, round_robin_policy, dynamic_load_policy, and auto_tune_policy.
- unseq and par_unseq policies now enable vectorization for Intel oneAPI DPC++/C++ Compiler.
- Added support for passing zip iterators as segment value data in reduce_by_segment, exclusive_scan_by_segment, and inclusive_scan_by_segment.
- Improved performance of the merge, sort, stable_sort, sort_by_key, reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
Fixed Issues
- Fixed the reduce_async function to not ignore the provided binary operation.
Known Issues and Limitations
New in This Release
- When compiled with the -fsycl-pstl-offload option of the Intel oneAPI DPC++/C++ compiler and with libstdc++ version 8 or libc++, oneapi::dpl::execution::par_unseq offloads standard parallel algorithms to the SYCL device similarly to std::execution::par_unseq in accordance with the -fsycl-pstl-offload option value.
- When using the dpl modulefile to initialize the user's environment and compiling with the -fsycl-pstl-offload option of the Intel oneAPI DPC++/C++ compiler, a linking issue or program crash may be encountered due to the directory containing libpstloffload.so not being included in the search path. Use env/vars.sh to configure the working environment to avoid the issue.
- Compilation issues may be encountered when passing zip iterators to exclusive_scan_by_segment on Windows.
- Incorrect results may be produced by set_intersection with a DPC++ execution policy, where elements are copied from the second input range rather than the first input range.
- For transform_exclusive_scan and exclusive_scan to run in-place (that is, with the same data used for both input and destination) and with an execution policy of unseq or par_unseq, it is required that the provided input and destination iterators are equality comparable. Furthermore, the equality comparison of the input and destination iterator must evaluate to true. If these conditions are not met, the result of these algorithm calls is undefined.
- The sort, stable_sort, stable_sort_by_key, partial_sort_copy algorithms may work incorrectly or cause a segmentation fault when used with a DPC++ execution policy for a CPU device, and built on Linux with Intel® oneAPI DPC++/C++ Compiler and -O0 -g compiler options. To avoid the issue, pass the -fsycl-device-code-split=per_kernel option to the compiler.
- Incorrect results may be produced by exclusive_scan, inclusive_scan, transform_exclusive_scan, transform_inclusive_scan, exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment with unseq or par_unseq policy when compiled by Intel® oneAPI DPC++/C++ Compiler with -fiopenmp, -fiopenmp-simd, -qopenmp, -qopenmp-simd options on Linux. To avoid the issue, pass -fopenmp or -fopenmp-simd option instead.
- Incorrect results may be produced by reduce and transform_reduce with 64-bit types and std::multiplies, sycl::multiplies operations when compiled by Intel® C++ Compiler 2021.3 and newer and executed on GPU devices.
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with the std::swap function or the swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with the -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
2022.2.0
New Features
- Added sort_by_key algorithm for key-value sorting.
- Improved performance of the reduce, min_element, max_element, minmax_element, is_partitioned, and lexicographical_compare algorithms with DPC++ execution policies.
- Improved performance of the reduce_by_segment, inclusive_scan_by_segment, and exclusive_scan_by_segment algorithms for binary operators with known identities when using DPC++ execution policies.
- Improved sort algorithm performance for the arithmetic data types with std::less or std::greater comparison operator and DPC++ execution policies.
- Added value_type to all views in oneapi::dpl::__ranges.
- Extended oneapi::dpl::experimental::ranges::sort to support projections applied to the range elements prior to comparison.
Fixed Issues
- The minimally required CMake version is raised to 3.11 on Linux and 3.20 on Windows.
- Added new CMake package oneDPLIntelLLVMConfig.cmake to resolve issues using CMake 3.20+ on Windows for icx and icx-cl.
- Fixed an error in the sort and stable_sort algorithms when performing a descending sort on signed numeric types with negative values.
- Fixed an error in reduce_by_segment algorithm when a non-commutative predicate is used.
- Fixed an error in sort and stable_sort algorithms for integral types wider than 4 bytes.
- Fixed an error for some compilers where OpenMP or SYCL backend was selected by CMake scripts without full compiler support.
- Fixed an error that caused segmentation faults in transform_reduce, minmax_element, and related algorithms when run on CPU devices.
- Fixed a compilation error in transform_reduce, minmax_element, and related algorithms on FPGAs.
- Fixed a radix sort issue with 64-bit signed integer types.
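To illustrate why operand order matters in a segmented reduction with a non-commutative operation, here is a sequential sketch in standard C++. The names reduce_by_segment_reference and concat_by_segment are hypothetical and are not oneDPL functions, and the sketch says nothing about how the library implements or fixed the algorithm. It assumes a segment is a run of adjacent equal keys and that values are combined left to right.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sequential reference (not the oneDPL implementation).
// Each run of adjacent equal keys forms one segment; the values of a
// segment are combined left to right with `op`.
// Assumption: keys.size() == values.size().
template <typename Key, typename Value, typename BinaryOp = std::plus<Value>>
std::pair<std::vector<Key>, std::vector<Value>>
reduce_by_segment_reference(const std::vector<Key>& keys,
                            const std::vector<Value>& values,
                            BinaryOp op = BinaryOp{})
{
    std::vector<Key> out_keys;
    std::vector<Value> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i)
    {
        if (!out_keys.empty() && out_keys.back() == keys[i])
        {
            // Same segment: the accumulated value stays the LEFT operand.
            out_values.back() = op(out_values.back(), values[i]);
        }
        else
        {
            // A new key starts a new segment.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        }
    }
    return {out_keys, out_values};
}

// String concatenation is associative but not commutative, so swapping
// the operands of `op` would change the result.
inline std::vector<std::string>
concat_by_segment(const std::vector<int>& keys,
                  const std::vector<std::string>& values)
{
    return reduce_by_segment_reference(keys, values,
                                       std::plus<std::string>{}).second;
}
```

With keys {1, 1, 2, 2, 2} and values {"a", "b", "c", "d", "e"}, concat_by_segment returns {"ab", "cde"}. An implementation that swapped operands would yield "ba" for the first segment.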
Known Issues and Limitations
New in This Release
- Incorrect results may be produced with in-place scans using unseq and par_unseq policies on CPUs with the Intel® C++ Compiler 2021.8.
Existing Issues
See oneDPL Guide for other restrictions and known limitations.
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
2022.1.0
New Features
- Added generate, generate_n, transform algorithms to Tested Standard C++ API.
- Improved performance of inclusive_scan, exclusive_scan, reduce, and max_element algorithms with DPC++ execution policies.
Fixed Issues
- Added a workaround for the "TBB headers not found" issue occurring with libstdc++ version 9 when oneTBB headers are not present in the environment. The workaround requires inclusion of the oneDPL headers before the libstdc++ headers.
- When possible, oneDPL CMake scripts now enforce C++17 as the minimum required language version.
- Fixed an error in the exclusive_scan algorithm when the output iterator is equal to the input iterator (in-place scan).
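As an illustration of what an in-place exclusive scan has to get right, the sketch below writes each prefix result over the element it was computed from. It is a sequential, standard C++ example; the helper name exclusive_scan_in_place is hypothetical, and the code does not reflect how oneDPL implements the algorithm.

```cpp
#include <vector>

// Hypothetical sequential sketch of an in-place exclusive scan: element i
// is replaced by init + data[0] + ... + data[i - 1].
template <typename T>
void exclusive_scan_in_place(std::vector<T>& data, T init)
{
    T running = init;
    for (T& element : data)
    {
        // Read the input value before its slot is overwritten; skipping
        // this step is the typical hazard when input and output alias.
        T current = element;
        element = running;
        running = running + current;
    }
}
```

Applied to {1, 2, 3, 4} with an initial value of 0, the vector becomes {0, 1, 3, 6}.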
Known Issues and Limitations
New in This Release
- None in this release.
Existing Issues
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
NOTE: See oneDPL Guide for other restrictions and known limitations.
2022.0.0
New Features
- Added ``<complex>`` header functionality as Tested Standard C++ API.
- Improved performance of ``sort`` and ``stable_sort`` algorithms on GPU devices when using Radix sort. The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with ``std::less`` or ``std::greater``, otherwise Merge sort.
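The selection rule described above (Radix sort for arithmetic types compared with std::less or std::greater, Merge sort otherwise) can be expressed as a compile-time check. The sketch below is only an illustration of that rule: the trait name uses_radix_sort is hypothetical and is not a oneDPL identifier, and transparent comparators such as std::less<> are deliberately not modeled.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical trait (not a oneDPL name) encoding the rule stated in the
// release note. Any comparator not listed below falls back to merge sort.
template <typename T, typename Compare>
struct uses_radix_sort : std::false_type
{
};

// std::less<T> selects radix sort only when T is an arithmetic type.
template <typename T>
struct uses_radix_sort<T, std::less<T>>
    : std::bool_constant<std::is_arithmetic_v<T>>
{
};

// std::greater<T> follows the same rule.
template <typename T>
struct uses_radix_sort<T, std::greater<T>>
    : std::bool_constant<std::is_arithmetic_v<T>>
{
};

template <typename T, typename Compare>
inline constexpr bool uses_radix_sort_v = uses_radix_sort<T, Compare>::value;
```

Under these assumptions, uses_radix_sort_v<int, std::less<int>> is true, while a custom lambda comparator or a non-arithmetic element type yields false.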
Fixed Issues
- Fixed an error in ``oneapi::dpl::experimental::ranges::guard_view`` and ``oneapi::dpl::experimental::ranges::zip_view`` when using ``operator[]`` with a very large index.
- Fixed permutation_iterator to work with C++ lambda functions for index permutation.
- Fixed an error in ``oneapi::dpl::experimental::ranges::guard_view`` and ``oneapi::dpl::experimental::ranges::zip_view`` when using ``operator[]`` with an index exceeding the limits of a 32-bit integer type.
- Fixed errors when data size is 0 in the ``upper_bound``, ``lower_bound``, and ``binary_search`` algorithms.
Changes Affecting Backward Compatibility
- Removed support of C++11 and C++14.
- Changed the size and the layout of the ``discard_block_engine`` class template. For further details, please refer to 2022.0 Changes.
Known Issues and Limitations
New in This Release
- None in this release.
Existing Issues
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
NOTE: See oneDPL Guide for other restrictions and known limitations.
Additional Documentation
- oneDPL Specification
- Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes
Previous oneDPL Releases
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.