This article contains the release notes for Intel® oneAPI Data Analytics Library (oneDAL).
Version History
Document revision  Date  Change History

2023.2  2023-07-13  2023.2 Release Update
2023.1  2023-03-30  2023.1 Release Update
2023.0  2022-12-16  2023.0 Release Update
2022.3.1  2022-11-10  2022.3.1 Release Update
2022.3  2022-09-27  2022.3 Release Update
2022.2  2022-04-13  2022.2 Release Update
2022.1  2021-12-07  2022.1 Release Update
2021.4  2021-09-29  2021.4 Release Update
2021.3  2021-06-22  2021.3 Release Update
2021.2  2021-03-29  2021.2 Release Update
2021.1  2020-12-07  2021.1 Release Update
Overview
oneDAL is a library of Intel® architecture-optimized building blocks covering all stages of compute-intense data analytics: data acquisition from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.
System Requirements
Please see the dedicated system requirements article.
2023.2
Deprecation Notice
 The compression functionality in the oneDAL library is deprecated. Starting with the 2024.0 release, oneDAL will not support the compression functionality
 The DAAL CPP SYCL Interfaces in the oneDAL library are deprecated. Starting with the 2024.0 release, oneDAL will not support the DAAL CPP SYCL Interfaces
 The Java* interfaces in the oneDAL library are marked as deprecated. The future releases of the oneDAL library may no longer include support for these Java* interfaces
 ABI compatibility is to be broken as part of the 2024.0 release of oneDAL. The library’s major version is to be incremented to two to enforce the relinking of existing applications
 macOS* support is deprecated for oneDAL. The 2023.x releases are the last to provide it
Library Engineering
 The CSR table interface has been changed and moved out of the detail namespace
What's New
 Introduced new Intel® oneDAL functionality:
 Distributed KMeans++ algorithm
 Logistic Loss objective algorithm
 Introduced new functionality for Intel® Extension for Scikit-learn:
 NaN (missing values) support was added to Model Builders
 Improved performance for the following Intel® Extension for Scikit-learn algorithms:
 Model Builders performance has been improved by up to 2x
2023.1
What's New
 Introduced new Intel® oneDAL functionality:
 Distributed Linear Regression, kNN, PCA algorithms
 Introduced new functionality for Intel® Extension for Scikit-learn:
 Enabled PCA, Linear Regression, Random Forest algorithms and SPMD policy as preview
 Scikit-learn 1.2 support
 sklearn_is_patched() function added to validate the patching status of algorithms
 Improved performance for the following Intel® Extension for Scikit-learn algorithms:
 t-SNE for the “Barnes-Hut” algorithm
 SVM algorithm for single row inference
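The sklearn_is_patched() function listed above can be tried with a short script. This is a minimal sketch rather than an authoritative recipe: it assumes the scikit-learn-intelex package is installed and that patch_sklearn, unpatch_sklearn and sklearn_is_patched can be imported from sklearnex. The helper name patching_status is hypothetical, and it returns None when the package is not available.

```python
# Minimal sketch: query the patching status reported by the extension.
try:
    from sklearnex import patch_sklearn, sklearn_is_patched, unpatch_sklearn
    HAVE_SKLEARNEX = True
except ImportError:  # scikit-learn-intelex is not installed
    HAVE_SKLEARNEX = False


def patching_status():
    """Hypothetical helper: patch, query the status, then undo the patch."""
    if not HAVE_SKLEARNEX:
        return None
    patch_sklearn()  # switch scikit-learn to the accelerated implementations
    try:
        return bool(sklearn_is_patched())  # status as reported by the extension
    finally:
        unpatch_sklearn()  # restore the stock scikit-learn implementations


print("patched:", patching_status())
```

Calling the check right after patch_sklearn() is a quick way to confirm that patching took effect before running a longer workload.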
Known Issues
 Under certain conditions, the DAAL SYCL interface might hang with the Level Zero (L0) backend. Please use the oneDAL DPC++ interfaces instead. If the older interfaces are required, the OpenCL backend can be used as a workaround.
Library Engineering
 Reduced the size of Intel® oneDAL library by approximately 30%
 Enabled NuGet distribution channel for Intel® oneDAL on Linux and macOS
Support Materials
The following additional materials were created:
 Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs: https://arxiv.org/abs/2212.11506
Deprecation Notice
 DAAL data compression functionality is deprecated and will be removed in the 2024.0 release
 oneDAL make and Visual Studio examples are deprecated. Please use the CMake-based examples instead
 DAAL cpp_sycl interfaces are deprecated and will be removed in the 2024.0 release
What's New
 Introduced new Intel® oneDAL functionality:
 DPC++ interface for Linear Regression algorithm
Known Issues
 Intel® Extension for Scikit-learn SVC.fit and KNN.fit do not support GPU
 Most Intel® Extension for Scikit-learn sycl examples fail when using GPU context

Running the Random Forest algorithm with versions 2021.7.1 and 2023.0 of scikit-learn-intelex on 2nd Generation Intel® Xeon® Scalable Processors (formerly Cascade Lake) may result in an 'Illegal instruction' error.

No workaround is currently available for this issue.

Recommendation: Use an older version of scikit-learn-intelex until the issue is fixed in a future release.

Deprecation Notice
 The sequential version of oneDAL is deprecated starting with version 2023.0. Please use TBB capabilities to limit the thread count if execution on a single core is required
 Intel® oneAPI Data Analytics Library KDB* Samples were deprecated in the open source distribution
 Intel® oneAPI Data Analytics Library Hadoop* Samples on macOS were deprecated in the open source distribution
 Intel® oneAPI Data Analytics Library Spark* Samples on macOS were deprecated in the open source distribution
What's New
 Get more functionality and productivity for Intel® Extension for Scikit-learn with Minkowski and Chebyshev distances in kNN and acceleration of the t-SNE algorithm.
 For oneDAL, take advantage of the new LinReg algorithm and distributed PCA algorithm.
 This release is immediately available through the Intel® Developer Zone. It will be available through repositories at a later date.
Deprecation Notice
The zlib and bzip2 compression methods were deprecated. They are dispatched to the lzo method starting with version 2022.3.1.
There are no updates for the 2022.3 release. Please refer to the 2022.2 release notes.
Library Engineering
 Reduced the size of oneDAL Python runtime package by approximately 8%
 Added Python 3.10 support for daal4py and Intel® Extension for Scikit-learn packages
Support Materials
Created Kaggle kernels for Intel® Extension for Scikit-learn:
 Fast Feature Importance using scikit-learn-intelex
 [TPS-DEC] Fast Feature Importance with sklearnex
 [TPS-Dec] SVC with sklearnex 20x speedup
 [TPS-Jan] Fast PyCaret with Scikit-learn-Intelex
 [TPS-Feb] KNN with sklearnex 13x speedup
 Fast SVM for Sparse Data from NLP Problem
 Introduction to scikit-learn-intelex
 [Datasets] Fast Feature Importance using sklearnex
 [TPS-Mar] Fast workflow using scikit-learn-intelex
What's New
 Improved performance of oneDAL algorithms:
 Optimized data conversion for tables with column-major layout in host memory to tables with row-major layout in device memory
 Optimized the computation of Minkowski distances in brute-force kNN on CPU
 Optimized Covariance algorithm
 Added DPC++ column-wise atomic reduction
 Introduced new oneDAL functionality:
 KMeans distributed random dense initialization
 Distributed PcaCov
 sendrecv_replace communicator method
 Added new parameters to oneDAL algorithms:
 Weights in Decision Forest for CPU
 Cosine and Chebyshev distances for KNN on GPU
 Improved performance for the following Intel® Extension for Scikit-learn algorithms:
 t-SNE for the “Barnes-Hut” algorithm
 Introduced new functionality for Intel® Extension for Scikit-learn:
 Manhattan, Minkowski, Chebyshev and Cosine distances for KNeighborsClassifier and NearestNeighbors with “brute” algorithm
 Fixed the following issues in Intel® Extension for Scikit-learn:
 An issue with the search of common data type in pandas DataFrame
 Patching overhead of finiteness checker for specific small data sizes
 Incorrect values in a tree visualization with plot_tree function in RandomForestClassifier
 Unexpected error for device strings in {device}:{device_index} format while using config context
 The sequential version of oneDAL will be deprecated starting in the next release
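For readers unfamiliar with the distance metrics named above for KNeighborsClassifier and NearestNeighbors, the sketch below spells out their standard textbook definitions in plain Python. It is an illustration only and says nothing about how oneDAL computes them internally; all function names are hypothetical.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p (p >= 1) between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Manhattan distance: the Minkowski distance with p = 1."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    """Chebyshev distance: the largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def cosine(x, y):
    """Cosine distance: 1 minus the cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


x, y = [0.0, 0.0], [3.0, 4.0]
print(manhattan(x, y), minkowski(x, y, 2), chebyshev(x, y))
print(cosine([1.0, 0.0], [0.0, 1.0]))
```

In scikit-learn itself, a metric is normally selected through the metric parameter of the estimator together with algorithm="brute".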
The release introduces the following changes:
Library Engineering
 Reduced the size of oneDAL library by approximately 15%.
Support Materials
The following additional materials were created:
 oneDAL samples:
 Intel® Extension for Scikit-learn samples:
 demo samples of the Intel® Extension for Scikit-learn usage with the performance comparison to original Scikit-learn for ElasticNet, K-means, Lasso Regression, Linear regression, and Ridge Regression
 demo samples of the Modin usage
 daal4py samples:
 an example of CatBoost converter usage
 Kaggle kernels for Intel® Extension for Scikit-learn:
 [Tabular Playground Series - Sep 2021] Ridge with sklearn-intelex 2x speedup
 [Tabular Playground Series - Oct 2021] Fast AutoML with Intel Extension for Scikit-learn
 [Titanic – Machine Learning from Disaster] AutoML with Intel Extension for Sklearn
 [Tabular Playground Series - Nov 2021] AutoML with Intel® Extension
 [Tabular Playground Series - Nov 2021] Log Regression with sklearnex 17x speedup
What's New
 Introduced new oneDAL functionality:
 Distributed algorithms for Covariance, DBSCAN, Decision Forest, Low Order Moments
 oneAPI interfaces for Linear Regression, DBSCAN, KNN
 Improved error handling for distributed algorithms in oneDAL in case of compute node failures
 Improved performance for the following oneDAL algorithms:
 Louvain algorithm
 KNN and SVM algorithms on GPU
 Introduced new functionality for Intel® Extension for Scikit-learn:
 Scikit-learn 1.0 support
 Fixed the following issues:
 Stabilized the results of Linear Regression in oneDAL and Intel® Extension for Scikit-learn
 Fixed an issue with RPATH on macOS
The release introduces the following changes:
Library Engineering
 Introduced new functionality for Intel® Extension for Scikit-learn*:
 Enabled patching for all Scikit-learn applications at once:
 You can enable global patching via command line:
 python -m sklearnex.glob patch_sklearn
 Or via code:
 from sklearnex import patch_sklearn
 patch_sklearn(global_patch=True)
 Read more in Intel® Extension for Scikit-learn documentation.
 Added the support of Python 3.9 for both Intel® Extension for Scikit-learn and daal4py. The packages are available from PyPI and the Intel Channel on Anaconda Cloud.
 Introduced new oneDAL functionality:
 Added pkg-config support for Linux, macOS, Windows and for static/dynamic, thread/sequential configurations of oneDAL applications.
 Reduced the size of oneDAL library by approximately 30%.
Support Materials
The following additional materials were created:
 Samples:
 Added demo samples comparing the usage of Intel® Extension for Scikit-learn and the original Scikit-learn for KNN, Logistic Regression, SVM and Random Forest algorithms
 Anaconda blogs:
 Medium blogs:
 Oracle blogs:
 Kaggle kernels:
 [Tabular Playground Series - Jun 2021] Fast LogReg with scikit-learn-intelex
 [Tabular Playground Series - Jun 2021] AutoGluon with sklearnex
 [Tabular Playground Series - Jul 2021] Fast RandomForest with sklearnex
 [Tabular Playground Series - Jul 2021] RF with Intel Extension for Scikit-learn
 [Tabular Playground Series - Jul 2021] Stacking with scikit-learn-intelex
 [Tabular Playground Series - Aug 2021] NuSVR with Intel Extension for Sklearn
 [Predict Future Sales] Stacking with scikit-learn-intelex
 [House Prices - Advanced Regression Techniques] NuSVR sklearn-intelex 4x speedup
What's New
 Introduced new oneDAL functionality:
 General:
 Basic statistics (Low order moments) algorithm in oneDAL interfaces
 Result options for kNN Brute-force in oneDAL interfaces: using a single function call to return any combination of responses, indices, and distances
 CPU:
 Sigmoid kernel of SVM algorithm
 Model converter from CatBoost to oneDAL representation
 Louvain Community Detection algorithm technical preview
 Connected Components algorithm technical preview
 Search task and cosine distance for kNN Brute-force
 GPU:
 The full range support of Minkowski distances in kNN Brute-force
 Improved oneDAL performance for the following algorithms:
 CPU:
 Decision Forest training and prediction
 Brute-force kNN
 KMeans
 NuSVMs and SVR training
 Introduced new functionality in Intel® Extension for Scikit-learn:
 General:
 Enabled the global patching of all Scikit-learn applications
 Provided an integration with dpctl for heterogeneous computing (the support of dpctl.tensor.usm_ndarray for input and output)
 Extended API with set_config and get_config methods. Added the support of target_offload and allow_fallback_to_host options for device offloading scenarios
 Added the support of predict_proba in RandomForestClassifier estimator
 CPU:
 Added the support of Sigmoid kernel in SVM algorithms
 GPU:
 Added binary SVC support with Linear and RBF kernels
 Improved the performance of the following scikit-learn estimators via scikit-learn patching:
 SVR algorithm training
 NuSVC and NuSVR algorithms training
 RandomForestRegressor and RandomForestClassifier algorithms training and prediction
 KMeans
 Fixed the following issues:
 General:
 Fixed an incorrectly raised exception during the patching of Random Forest algorithm when the number of trees was more than 7000.
 CPU:
 Fixed an accuracy issue in Random Forest algorithm caused by the exclusion of constant features.
 Fixed an issue in NuSVC Multiclass.
 Fixed an issue with KMeans convergence inconsistency.
 Fixed incorrect behavior of train_test_split with specific subset sizes.
 GPU:
 Fixed incorrect bias calculation in SVM.
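The set_config and get_config methods and the allow_fallback_to_host option mentioned in this release can be illustrated as follows. This is a minimal sketch under stated assumptions: scikit-learn-intelex is installed, set_config and get_config are importable from sklearnex, and get_config() returns a dictionary that contains the target_offload and allow_fallback_to_host keys. The helper name enable_host_fallback is hypothetical, and it returns None when the package is missing.

```python
# Minimal sketch: toggle allow_fallback_to_host and read the setting back.
try:
    from sklearnex import get_config, set_config
    HAVE_SKLEARNEX = True
except ImportError:  # scikit-learn-intelex is not installed
    HAVE_SKLEARNEX = False


def enable_host_fallback():
    """Hypothetical helper: enable host fallback and return the active config."""
    if not HAVE_SKLEARNEX:
        return None
    previous = get_config()  # remember the current settings
    set_config(allow_fallback_to_host=True)
    try:
        return get_config()  # snapshot taken while the option is enabled
    finally:
        # Put the original value back so the sketch has no lasting effect.
        set_config(allow_fallback_to_host=previous["allow_fallback_to_host"])


config = enable_host_fallback()
if config is not None:
    print(config["target_offload"], config["allow_fallback_to_host"])
```

The target_offload option is set the same way; it is left at its current value here so that the sketch does not require a GPU.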
Known Issues
 GPU:
 For most algorithms, performance degradations were observed when the 2021.4 version of Intel® oneAPI DPC++ Compiler was used.
 Examples are failing when run with Visual Studio Solutions on hardware that does not support double precision floating-point operations.
The release introduces the following changes:
Library Engineering
 Introduced a new Python package, Intel® Extension for Scikit-learn*. The scikit-learn-intelex package contains scikit-learn patching functionality that was originally available in the daal4py package. All future updates for the patches will be available only in Intel® Extension for Scikit-learn. We recommend using the scikit-learn-intelex package instead of daal4py.
 Download the extension using one of the following commands:
 pip install scikit-learn-intelex
 conda install scikit-learn-intelex -c conda-forge
 Enable Scikit-learn patching:
 from sklearnex import patch_sklearn
 patch_sklearn()
 Introduced optional dependencies on DPC++ runtime to daal4py. To enable DPC++ backend, install the dpcpp_cpp_rt package. This reduces the default package size with all dependencies from 1.2 GB to 400 MB.
 Added the support of building oneDAL-based applications with /MD and /MDd options on Windows. The d suffix is used in the names of oneDAL libraries that are built with debug runtime (/MDd).
Support Materials
The following additional materials were created:
 Medium blogs:
 Superior Machine Learning Performance on the Latest Intel Xeon Scalable Processors
 Leverage Intel Optimizations in Scikit-Learn (SVM Performance Training and Inference)
 Optimizing CatBoost Performance
 Performance Optimizations for End-to-End AI Pipelines
 Optimizing the End-to-End Training Pipeline on Apache Spark Clusters
 Kaggle kernels:
 [Tabular Playground Series - Apr 2021] RF with Intel Extension for Scikit-learn
 [Tabular Playground Series - Apr 2021] SVM with Intel Extension for Scikit-learn
 [Tabular Playground Series - Apr 2021] SVM with scikit-learn-intelex
 Samples that illustrate the usage of Intel Extension for Scikit-learn
What's New
 Introduced new oneDAL and daal4py functionality:
 CPU:
 SVM Regression algorithm
 NuSVM algorithm for both Classification and Regression tasks
 Polynomial kernel support for all SVM algorithms (SVC, SVR, NuSVC, NuSVR)
 Minkowski and Chebyshev distances for kNN Brute-force
 The brute-force method and the voting mode support for kNN algorithm in oneDAL interfaces
 Multiclass support for SVM algorithms in oneDAL interfaces
 CSR matrix support for SVM algorithms in oneDAL interfaces
 Subgraph Isomorphism algorithm technical preview
 Single Source Shortest Path (SSSP) algorithm technical preview
 Improved oneDAL and daal4py performance for the following algorithms:
 CPU:
 Support Vector Machines training and prediction
 Linear, Ridge, ElasticNet, and LASSO regressions prediction
 GPU:
 Decision Forest training and prediction
 Principal Components Analysis training
 Introduced the support of scikit-learn 1.0 version in Intel Extension for Scikit-learn. The 2021.3 release of Intel Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.
 Introduced new functionality for Intel Extension for Scikit-learn:
 General:
 The support of patch_sklearn for all algorithms
 CPU:
 Acceleration of SVR estimator
 Acceleration of NuSVC and NuSVR estimators
 Polynomial kernel support in SVM algorithms
 Improved the performance of the following scikit-learn estimators via scikit-learn patching:
 SVM algorithms training and prediction
 Linear, Ridge, ElasticNet, and Lasso regressions prediction
 Fixed the following issues:
 General:
 Fixed binary incompatibility for the versions of numpy earlier than 1.19.4
 Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm.
 Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
 CPU:
 Improved numerical stability of training for Alternating Least Squares (ALS) and Linear and Ridge regressions with Normal Equations method
 Reduced the memory consumption of SVM prediction
 GPU:
 Fixed an issue with kernel compilation on the platforms without hardware FP64 support
Known Issues
 Intel® Extension for Scikit-learn and daal4py packages installed from the PyPI repository can’t be found on Debian systems (including Google Colab). Mitigation: add the “site-packages” folder to the Python package search path before importing the packages:
import sys
import os
import site
sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages"))
The release introduces the following changes:
Library Engineering
 Enabled new PyPI distribution channel for daal4py:
 The four latest Python versions (3.6, 3.7, 3.8, 3.9) are supported on Linux, Windows and macOS.
 Support of both CPU and GPU is included in the package.
 You can download daal4py using the following command: pip install daal4py
 Introduced CMake support for oneDAL examples
Support Materials
The following additional materials were created:
 Medium blogs:
 Kaggle kernels:
What's New
 Introduced new oneDAL and daal4py functionality:
 CPU:
 Hist method for Decision Forest Classification and Regression, which outperforms the existing exact method
 Bit-to-bit results reproducibility for: Linear and Ridge regressions, LASSO and ElasticNet, KMeans training and initialization, PCA, SVM, Logistic Regression, kNN Brute Force method, Decision Forest Classification and Regression
 GPU:
 Multi-node multi-GPU algorithms: K-means (batch and online), Covariance (batch and online), Low order moments (batch and online) and PCA
 Sparsity support for SVM algorithm
 Improved oneDAL and daal4py performance for the following algorithms:
 CPU:
 Decision Forest training Classification and Regression
 Support Vector Machines training and prediction
 Logistic Regression, Logistic Loss and Cross Entropy for non-homogeneous input types
 GPU:
 Decision Forest training Classification and Regression
 All algorithms with GPU kernels (as a result of migration to Unified Shared Memory data management)
 Reduced performance overhead for oneAPI C++ interfaces on CPU and oneAPI DPC++ interfaces on GPU
 Added technical preview features in Graph Analytics:
 CPU:
 Local and Global Triangle Counting
 Introduced new functionality for scikit-learn patching through daal4py:
 CPU:
 Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
 Acceleration of roc_auc_score function
 Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
 Improved performance of the following scikit-learn estimators via scikit-learn patching:
 CPU:
 RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
 Principal Component Analysis (PCA) scikit-learn estimator: training
 Support Vector Classification (SVC) scikit-learn estimators: training and prediction
 Support Vector Classification (SVC) scikit-learn estimator with the probability==True parameter: training and prediction
 Fixed the following issues:
 Scikit-learn patching:
 Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
 Fixed patching issues with pairwise_distances
 Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
 Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
 Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
 Fixed performance issue for interoperability with Modin
 daal4py:
 GPU:
 Fixed the crash of SVM and kNN algorithms on Windows
 oneDAL:
 CPU:
 Improved accuracy of Decision Forest Classification and Regression
 GPU:
 Improved accuracy of KMeans algorithm
 Improved stability of Linear Regression and Logistic Regression algorithms
Known Issues
 The oneDAL vars.sh script does not support KornShell
The release introduces the following changes:
The release contains all functionality of Intel® DAAL. See Intel® DAAL release notes for more details.
Library Engineering
 Renamed the library from Intel® Data Analytics Acceleration Library to oneAPI Data Analytics Library and changed the package names to reflect this.
 Deprecated the 32-bit version of the library.
 Introduced Intel GPU support for both OpenCL and Level Zero backends.
 Introduced Unified Shared Memory (USM) support
What's New
 Introduced new Intel® DAAL and daal4py functionality:
 GPU:
 Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
 Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
 Added Data Management functionality to support DPC++ APIs: a new table type for representation of SYCL-based numeric tables (SyclNumericTable) and an optimized CSV data source
 Improved oneDAL and daal4py performance for the following algorithms:
 CPU:
 Logistic Regression training and prediction
 k-Nearest Neighbors prediction with Brute Force method
 Logistic Loss and Cross Entropy objective functions
 Added Technical Preview Features in Graph Analytics:

CPU:
 Undirected graph without edge and vertex weights (undirected_adjacency_array_graph), where vertex indices can only be of type int32
 Jaccard Similarity Coefficients for all pairs of vertices, a batch algorithm that processes the graph by blocks
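The idea of computing Jaccard similarity coefficients for all vertex pairs while processing the graph by blocks can be sketched in plain Python. This is an illustration under stated assumptions, not the oneDAL implementation or its API: the graph is an undirected, unweighted adjacency structure stored as a dictionary of neighbor sets, the coefficient of a pair is the size of the intersection of the two neighborhoods divided by the size of their union, and the function names and block_size parameter are hypothetical.

```python
def jaccard_block(adj, rows, cols):
    """Return {(u, v): coefficient} for u in rows and v in cols, nonzero entries only."""
    result = {}
    for u in rows:
        for v in cols:
            shared = len(adj[u] & adj[v])      # common neighbors
            if shared:
                union = len(adj[u] | adj[v])   # all neighbors of either vertex
                result[(u, v)] = shared / union
    return result


def jaccard_all_pairs(adj, block_size):
    """Split the vertices into blocks and merge the per-block results."""
    vertices = sorted(adj)
    blocks = [vertices[i:i + block_size] for i in range(0, len(vertices), block_size)]
    result = {}
    for rows in blocks:
        for cols in blocks:
            result.update(jaccard_block(adj, rows, cols))
    return result


# Undirected graph with edges 0-1, 0-2, 1-2 and 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
coeffs = jaccard_all_pairs(adj, block_size=2)
print(coeffs[(0, 1)], coeffs[(0, 3)])
```

Working on one block of row and column vertices at a time bounds the amount of output held for any single step, which is the usual motivation for block processing of an all-pairs result.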

 Aligned the library with oneDAL Specification 1.0 for the following algorithms:
 CPU/GPU:
 K-means, PCA, Random Forest Classification and Regression, kNN and SVM
 Introduced new functionality for scikit-learn patching through daal4py:
 CPU:
 Acceleration of NearestNeighbors and KNeighborsRegressor scikit-learn estimators with Brute Force and K-D tree methods
 Acceleration of TSNE scikit-learn estimator
 GPU:
 Intel GPU support in scikit-learn for DBSCAN, K-means, Linear and Logistic Regression
 Improved performance of the following scikit-learn estimators via scikit-learn patching:
 CPU:
 LogisticRegression fit, predict and predict_proba methods
 KNeighborsClassifier predict, predict_proba and kneighbors methods with “brute” method

Known Issues
 oneDAL DPC++ APIs do not work on GEN12 graphics with OpenCL backend. Use Level Zero backend for such cases.
 train_test_split in daal4py patches for Scikit-learn* can produce incorrect shuffling on Windows*
 The following daal4py examples do not work on Intel® Iris Xe MAX with float64 compute mode:
 gradient_boosted_regression_batch
 decision_forest_classification_batch
 decision_forest_regression_batch
 bf_knn_classification_batch
 dbscan_batch
 svm_batch
 sklearn_sycl.py
 kmeans_batch
Run daal4py examples using float32 compute mode instead:
 Use np.float32 data type for input data. To do this, add parameter t=np.float32 to the readcsv function used in the examples.
 Set the parameter fptype to float in the algorithm object constructor: fptype='float'.
 Switch on float64 software emulation on Intel® Iris Xe MAX
 KMeans example in daal4py (examples/sycl/kmeans_batch.py) produces different results on GPU and CPU. To avoid failures, comment out the assert statements that compare GPU results and classic results in the example.
 DBSCAN example in daal4py (examples/sycl/dbscan_batch.py) hangs when it is running on CPU with data wrapped in sycl_buffer. To avoid hangs, do not pass sycl_buffer objects to DBSCAN on CPU.
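The float32 workaround described above for the daal4py examples (load the input as np.float32 and pass fptype='float' to the algorithm constructor) might look like the sketch below. The readcsv helper here is a simplified stand-in written for this illustration, not the exact function from the examples, and the sample data is made up. The daal4py part assumes the package is installed and that kmeans_init accepts the nClusters, method and fptype arguments; it is skipped otherwise.

```python
import os
import tempfile

import numpy as np


def readcsv(path, cols=None, t=np.float64):
    """Simplified stand-in: load a comma-separated file as a 2-D array of dtype t."""
    return np.loadtxt(path, delimiter=",", usecols=cols, dtype=t, ndmin=2)


# Write a tiny CSV file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("1.0,2.0\n3.0,4.0\n5.0,6.0\n")
    csv_path = f.name

data = readcsv(csv_path, t=np.float32)  # step 1: float32 input data
os.remove(csv_path)
print(data.dtype, data.shape)

try:
    import daal4py as d4p
    # Step 2: request single-precision computation in the algorithm object.
    init_algo = d4p.kmeans_init(nClusters=2, method="plusPlusDense", fptype="float")
except ImportError:
    init_algo = None  # daal4py is not installed; only the data-loading step ran
```

Keeping both the input dtype and fptype in single precision avoids float64 kernels on hardware without native double-precision support.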
Getting Started Guide
Please refer to the oneDAL Getting Started Guide.
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and noninfringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.