Release Notes for Intel® oneAPI Data Analytics Library

Published: 07/01/2019  

Last Updated: 09/29/2021

This article includes the Release Notes for Intel® oneAPI Data Analytics Library (oneDAL).

Version History

Document Revision   Date         Change History
2021.4              2021-09-29   2021.4 Release Update
2021.3              2021-06-22   2021.3 Release Update
2021.2              2021-03-29   2021.2 Release Update
2021.1              2020-12-07   2021.1 Release Update

Overview

oneDAL is a library of building blocks optimized for Intel® architecture. It covers all stages of compute-intensive data analytics: data acquisition from a data source, preprocessing, transformation, data mining, modeling, validation, and decision making.

System Requirements

Please see the dedicated system requirements article.

2021.4

The release introduces the following changes: 

Library Engineering

  • Introduced new functionality for Intel® Extension for Scikit-learn*:
    • Enabled patching for all Scikit-learn applications at once.
    • Added support for Python 3.9 in both Intel® Extension for Scikit-learn and daal4py. The packages are available from PyPI and the Intel Channel on Anaconda Cloud.
  • Introduced new oneDAL functionality:
    • Added pkg-config support for Linux, macOS, Windows and for static/dynamic, thread/sequential configurations of oneDAL applications.
    • Reduced the size of the oneDAL library by approximately 30%.
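
As an illustration of the pkg-config support listed above, the following minimal sketch queries compiler and linker flags from a build script. The package name pattern dal-<linking>-<threading>-host is an assumption about how the oneDAL .pc files are named and should be checked against the files shipped with your installation. The helper name onedal_build_flags is hypothetical.

```python
import shutil
import subprocess


def onedal_build_flags(linking="dynamic", threading="threading"):
    """Return compiler and linker flags reported by pkg-config, or None.

    The package name pattern below is an assumption; verify it against
    the .pc files in your oneDAL installation.
    """
    package = f"dal-{linking}-{threading}-host"
    if shutil.which("pkg-config") is None:
        return None  # pkg-config is not installed on this machine
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", package],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # the package is not known to pkg-config
    return result.stdout.split()


flags = onedal_build_flags(linking="dynamic", threading="threading")
print(flags if flags is not None else "oneDAL pkg-config files not found")
```

The function returns None instead of raising an error, so a build script can fall back to manually specified paths when the oneDAL environment has not been set up.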

Support Materials

The following additional materials were created:

What's New

  • Introduced new oneDAL functionality: 
    • General:
      • Basic statistics (Low order moments) algorithm in oneDAL interfaces
      • Result options for kNN Brute-force in oneDAL interfaces: using a single function call to return any combination of responses, indices, and distances
    • CPU:
      • Sigmoid kernel of SVM algorithm
      • Model converter from CatBoost to oneDAL representation
      • Louvain Community Detection algorithm technical preview
      • Connected Components algorithm technical preview
      • Search task and cosine distance for kNN Brute-force
    • GPU:
      • Support for the full range of Minkowski distances in kNN Brute-force
  • Improved oneDAL performance for the following algorithms:
    • CPU:
      • Decision Forest training and prediction
      • Brute-force kNN
      • KMeans
      • NuSVMs and SVR training
  • Introduced new functionality in Intel® Extension for Scikit-learn:
    • General:
      • Enabled the global patching of all Scikit-learn applications
      • Provided an integration with dpctl for heterogeneous computing (the support of dpctl.tensor.usm_ndarray for input and output)
      • Extended API with set_config and get_config methods. Added the support of target_offload and allow_fallback_to_host options for device offloading scenarios
      • Added the support of predict_proba in RandomForestClassifier estimator
    • CPU:
      • Added the support of Sigmoid kernel in SVM algorithms
    • GPU:
      • Added binary SVC support with Linear and RBF kernels
  • Improved the performance of the following scikit-learn estimators via scikit-learn patching:
    • SVR algorithm training
    • NuSVC and NuSVR algorithms training
    • RandomForestRegressor and RandomForestClassifier algorithms training and prediction
    • KMeans
  • Fixed the following issues:
    • General:
      • Fixed an incorrectly raised exception during the patching of Random Forest algorithm when the number of trees was more than 7000.
    • CPU:
      • Fixed an accuracy issue in Random Forest algorithm caused by the exclusion of constant features.
      • Fixed an issue in NuSVC Multiclass.
      • Fixed an issue with KMeans convergence inconsistency.
      • Fixed incorrect behavior of train_test_split with specific subset sizes.
    • GPU:
      • Fixed incorrect bias calculation in SVM.
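
The set_config and get_config additions listed above can be exercised with a short sketch like the one below. It assumes that both functions are importable from the sklearnex package and that get_config returns a dictionary that includes the target_offload and allow_fallback_to_host keys. The helper name configure_offload is hypothetical, and a device string such as "gpu:0" is an assumption that depends on your hardware and runtime.

```python
def configure_offload(target="auto", fallback=True):
    """Set device offloading options and return the resulting configuration.

    Returns None when scikit-learn-intelex is not installed.
    """
    try:
        from sklearnex import get_config, set_config
    except ImportError:
        return None
    # target_offload selects the device; allow_fallback_to_host lets an
    # algorithm run on the host when it cannot run on that device.
    set_config(target_offload=target, allow_fallback_to_host=fallback)
    return get_config()


# "auto" keeps the default device selection. A string such as "gpu:0"
# (assumed format) would request a specific device instead.
config = configure_offload(target="auto", fallback=True)
if config is not None:
    print(config.get("target_offload"), config.get("allow_fallback_to_host"))
```

Only the configuration is changed here. No estimator is fitted, so the sketch does not require a GPU.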

Known Issues

  • GPU:
    • For most algorithms, performance degradations were observed when the 2021.4 version of Intel® oneAPI DPC++ Compiler was used. 
    • Examples fail when run from Visual Studio solutions on hardware that does not support double precision floating-point operations.

2021.3

The release introduces the following changes:

Library Engineering

  • Introduced a new Python package, Intel® Extension for Scikit-learn*. The scikit-learn-intelex package contains scikit-learn patching functionality that was originally available in daal4py package. All future updates for the patches will be available only in Intel® Extension for Scikit-learn. We recommend using scikit-learn-intelex package instead of daal4py.
    • Download the extension using one of the following commands:
      • pip install scikit-learn-intelex
      • conda install scikit-learn-intelex -c conda-forge
    • Enable Scikit-learn patching:
      • from sklearnex import patch_sklearn
      • patch_sklearn()
  • Made the DPC++ runtime an optional dependency of daal4py. To enable the DPC++ backend, install the dpcpp_cpp_rt package. This change reduces the default package size with all dependencies from 1.2 GB to 400 MB.
  • Added the support of building oneDAL-based applications with /MD and /MDd options on Windows. The -d suffix is used in the names of oneDAL libraries that are built with debug run-time (/MDd).
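
The patching commands above can be combined into a small helper that prefers scikit-learn-intelex and falls back to the older daal4py entry point. This is a minimal sketch and not the only way to enable patching. The helper name enable_sklearn_acceleration is hypothetical, and the sketch assumes that patching should happen before any scikit-learn estimators are imported.

```python
def enable_sklearn_acceleration():
    """Patch scikit-learn and report which package provided the patches."""
    try:
        # Recommended package starting with this release.
        from sklearnex import patch_sklearn
        backend = "sklearnex"
    except ImportError:
        try:
            # Original location of the patching functionality.
            from daal4py.sklearn import patch_sklearn
            backend = "daal4py"
        except ImportError:
            # Neither package is installed: stock scikit-learn is used.
            return "stock"
    patch_sklearn()
    return backend


# Call this before importing any scikit-learn estimators.
backend = enable_sklearn_acceleration()
print("scikit-learn backend:", backend)
```

Returning the backend name makes it easy to log which implementation a given run used.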

Support Materials

The following additional materials were created:

What's New

  • Introduced new oneDAL and daal4py functionality: 
    • CPU:
      • SVM Regression algorithm
      • NuSVM algorithm for both Classification and Regression tasks
      • Polynomial kernel support for all SVM algorithms (SVC, SVR, NuSVC, NuSVR)
      • Minkowski and Chebyshev distances for kNN Brute-force
      • The brute-force method and the voting mode support for kNN algorithm in oneDAL interfaces
      • Multiclass support for SVM algorithms in oneDAL interfaces
      • CSR-matrix support for SVM algorithms in oneDAL interfaces
      • Subgraph Isomorphism algorithm technical preview
      • Single Source Shortest Path (SSSP) algorithm technical preview
  • Improved oneDAL and daal4py performance for the following algorithms:
    • CPU:
      • Support Vector Machines training and prediction
      • Linear, Ridge, ElasticNet, and LASSO regressions prediction
    • GPU:
      • Decision Forest training and prediction
      • Principal Components Analysis training 
  • Introduced the support of scikit-learn 1.0 version in Intel Extension for Scikit-learn. The 2021.3 release of Intel Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.
  • Introduced new functionality for Intel Extension for Scikit-learn:
    • General:
      • The support of patch_sklearn for all algorithms
    • CPU:
      • Acceleration of SVR estimator
      • Acceleration of NuSVC and NuSVR estimators
      • Polynomial kernel support in SVM algorithms
  • Improved the performance of the following scikit-learn estimators via scikit-learn patching:
    • SVM algorithms training and prediction
    • Linear, Ridge, ElasticNet, and Lasso regressions prediction
  • Fixed the following issues:
    • General:
      • Fixed binary incompatibility with NumPy versions earlier than 1.19.4
      • Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm.
      • Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
    • CPU:
      • Improved numerical stability of training for Alternating Least Squares (ALS) and Linear and Ridge regressions with Normal Equations method
      • Reduced the memory consumption of SVM prediction
    • GPU:
      • Fixed an issue with kernel compilation on the platforms without hardware FP64 support

Known Issues

  • Intel® Extension for Scikit-learn and daal4py packages installed from the PyPI repository cannot be found on Debian systems (including Google Colab). Mitigation: add the "site-packages" folder to the Python package search path before importing the packages:

import sys
import os
import site
sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages"))
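
The mitigation above can also be wrapped in a function so that it is safe to run more than once, for example in a notebook cell that is re-executed. This is a minimal sketch of the same idea. The function name add_site_packages_to_path is hypothetical.

```python
import os
import site
import sys


def add_site_packages_to_path():
    """Append the sibling "site-packages" folder to sys.path if missing.

    Returns the folder path that was checked.
    """
    # Same computation as the one-line mitigation: take the parent of the
    # first site directory and look for "site-packages" next to it.
    first_entry = site.getsitepackages()[0]
    candidate = os.path.join(os.path.dirname(first_entry), "site-packages")
    # Avoid adding duplicate entries on repeated calls.
    if candidate not in sys.path:
        sys.path.append(candidate)
    return candidate


path = add_site_packages_to_path()
print("Package search path includes:", path)
```

Run this before importing sklearnex or daal4py, as in the original mitigation.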

2021.2

The release introduces the following changes:

Library Engineering

  • Enabled new PyPI distribution channel for daal4py:
    • The four latest Python versions (3.6, 3.7, 3.8, 3.9) are supported on Linux, Windows and macOS.
    • Support of both CPU and GPU is included in the package.
    • You can download daal4py using the following command: pip install daal4py
  • Introduced CMake support for oneDAL examples

Support Materials

The following additional materials were created:

What's New

  • Introduced new oneDAL and daal4py functionality:
    • CPU:
      • Hist method for Decision Forest Classification and Regression, which outperforms the existing exact method
      • Bit-to-bit results reproducibility for: Linear and Ridge regressions, LASSO and ElasticNet, KMeans training and initialization, PCA, SVM, Logistic Regression, kNN Brute Force method, Decision Forest Classification and Regression
    • GPU:
      • Multi-node multi-GPU algorithms: K-means (batch and online), Covariance (batch and online), Low order moments (batch and online) and PCA
      • Sparsity support for SVM algorithm
  • Improved oneDAL and daal4py performance for the following algorithms:
    • CPU:
      • Decision Forest Classification and Regression training
      • Support Vector Machines training and prediction
      • Logistic Regression, Logistic Loss and Cross Entropy for non-homogeneous input types
    • GPU:
      • Decision Forest Classification and Regression training
      • All algorithms with GPU kernels (as a result of migration to Unified Shared Memory data management)
  • Reduced performance overhead for oneAPI C++ interfaces on CPU and oneAPI DPC++ interfaces on GPU
  • Added technical preview features in Graph Analytics:
    • CPU:
      • Local and Global Triangle Counting
  • Introduced new functionality for scikit-learn patching through daal4py:
    • CPU:
      • Patches for the four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
      • Acceleration of roc_auc_score function
      • Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
  • Improved performance of the following scikit-learn estimators via scikit-learn patching:
    • CPU:
      • RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
      • Principal Component Analysis (PCA) scikit-learn estimator: training 
      • Support Vector Classification (SVC) scikit-learn estimators: training and prediction
      • Support Vector Classification (SVC) scikit-learn estimator with the probability=True parameter: training and prediction
  • Fixed the following issues:
    • Scikit-learn patching:
      • Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
      • Fixed patching issues with pairwise_distances
      • Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
      • Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
      • Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
      • Fixed performance issue for interoperability with Modin
    • daal4py:
      • GPU:
        • Fixed the crash of SVM and kNN algorithms on Windows
    • oneDAL:
      • CPU:
        • Improved accuracy of Decision Forest Classification and Regression
      • GPU:
        • Improved accuracy of KMeans algorithm
        • Improved stability of Linear Regression and Logistic Regression algorithms

Known Issues

  • The oneDAL vars.sh script does not support KornShell

 

Getting Started Guide

Please refer to the oneDAL Getting Started Guide.

 

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.