Get Started Guide

  • 2021.6
  • 04/11/2022
  • Public Content

Get Started with the Intel® oneAPI Data Analytics Library

Intel® oneAPI Data Analytics Library (oneDAL) is a library that helps speed up big data analysis by providing highly optimized algorithmic building blocks for all stages of data analytics (preprocessing, transformation, analysis, modeling, validation, and decision making) in batch, online, and distributed processing modes of computation.
For general information about oneDAL, visit the official oneDAL page.

Before You Begin

oneDAL is located in the <install_dir>/dal directory, where <install_dir> is the directory in which Intel® oneAPI Base Toolkit was installed.
The current version of oneDAL with SYCL support is available for Linux* and Windows* 64-bit operating systems. The prebuilt oneDAL libraries can be found in the <install_dir>/dal/<version>/redist directory.
To learn about the system requirements and the dependencies needed to build examples, refer to the System Requirements page.

End-to-end Example

The following example shows a typical workflow for a oneDAL algorithm on GPU, using the Principal Component Analysis (PCA) algorithm.
The steps below show how to:
  • Read the data from a CSV file
  • Run the training and inference operations for PCA
  • Access intermediate results obtained at the training stage
  1. Include the following header that makes all oneDAL declarations available.
    #include "oneapi/dal.hpp"
    /* Standard library headers required by this example */
    #include <cassert>
    #include <iostream>
  2. Create a SYCL* queue with the desired device selector. In this case, the GPU selector is used:
    const auto queue = sycl::queue{ sycl::gpu_selector{} };
  3. Since all oneDAL declarations are in the oneapi::dal namespace, import all declarations from the oneapi namespace to use dal instead of oneapi::dal for brevity:
    using namespace oneapi;
  4. Use the CSV data source to read the data from the CSV file into a table:
    const auto data = dal::read<dal::table>(queue, dal::csv::data_source{"data.csv"});
  5. Create a PCA descriptor, configure its parameters, and run the training algorithm on the data loaded from CSV.
    const auto pca_desc = dal::pca::descriptor<float>{}
        .set_component_count(3)
        .set_deterministic(true);
    const dal::pca::train_result train_res = dal::train(queue, pca_desc, data);
  6. Print the learned eigenvectors:
    const dal::table eigenvectors = train_res.get_eigenvectors();
    const auto acc = dal::row_accessor<const float>{eigenvectors};
    for (std::int64_t i = 0; i < eigenvectors.get_row_count(); i++) {
        /* Get the i-th row from the table; the eigenvector stores a pointer to USM */
        const dal::array<float> eigenvector = acc.pull(queue, {i, i + 1});
        assert(eigenvector.get_count() == eigenvectors.get_column_count());
        std::cout << i << "-th eigenvector: ";
        for (std::int64_t j = 0; j < eigenvector.get_count(); j++) {
            std::cout << eigenvector[j] << " ";
        }
        std::cout << std::endl;
    }
  7. Use the trained model for inference to reduce dimensionality of the data:
    const dal::pca::model model = train_res.get_model();
    /* Pass the trained model to the inference call */
    const dal::table data_transformed =
        dal::infer(queue, pca_desc, model, data).get_transformed_data();
    assert(data_transformed.get_column_count() == 3);
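To illustrate what the dimensionality reduction in step 7 produces, here is a minimal sketch in plain C++ that does not use oneDAL. It assumes that eigenvectors are stored one per row, as the loop in step 6 suggests, and it ignores any centering or scaling that a PCA implementation may apply before projecting. The names matrix and project are hypothetical and are not part of the oneDAL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a dense row-major matrix of floats.
using matrix = std::vector<std::vector<float>>;

// Projects each row of `data` (n x p) onto the rows of `eigenvectors` (k x p).
// The result has n rows and k columns, one column per retained component.
matrix project(const matrix& data, const matrix& eigenvectors) {
    matrix result(data.size(), std::vector<float>(eigenvectors.size(), 0.0f));
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t c = 0; c < eigenvectors.size(); ++c) {
            // Every eigenvector must have as many entries as a data row has features.
            assert(eigenvectors[c].size() == data[i].size());
            float dot = 0.0f;
            for (std::size_t j = 0; j < data[i].size(); ++j) {
                dot += data[i][j] * eigenvectors[c][j];
            }
            result[i][c] = dot;
        }
    }
    return result;
}
```

With three eigenvectors, every output row has three entries regardless of the number of input features, which is the property the assertion in step 7 checks.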

Build and Run Examples

Perform the following steps to build and run examples demonstrating the basic usage scenarios of oneDAL with SYCL support. Go to <install_dir>/dal/<version> and then set up the environment as shown below.
All content below that starts with # is a comment and should not be run with the code.
  1. Set up the required environment for oneDAL (variables such as CPATH, LIBRARY_PATH, and LD_LIBRARY_PATH):
    • On Linux, there are two ways to set up the required environment: via the vars.sh script or via modulefiles.
      • Setting up the oneDAL environment via the vars.sh script
        Run the following command:
        source ./env/vars.sh
      • Setting up the oneDAL environment via modulefiles
        1. Initialize modules:
          source $MODULESHOME/init/bash
          Refer to the Environment Modules documentation for details.
        2. Provide modules with a path to the modulefiles directory:
          module use ./modulefiles
        3. Load the module:
          module load dal
    • On Windows, run the following command:
      .\env\vars.bat
  2. Copy ./examples/oneapi/dpc to a writable directory if necessary (building the examples creates temporary files):
    cp -r ./examples/oneapi/dpc ${WRITABLE_DIR}
  3. Set up the compiler environment for Intel® oneAPI DPC++/C++ Compiler. See Get Started with Intel® oneAPI DPC++/C++ Compiler for details.
  4. Build and run the examples that show how to use oneDAL with SYCL support:
    You need write permissions to the examples folder to build the examples, and execute permissions to run them. Otherwise, copy the examples/oneapi/dpc and examples/oneapi/data folders to a directory where you have these permissions. The two folders must stay at the same directory level relative to each other.
    • On Linux:
      # Navigate to the directory containing examples and then build them:
      cd ./examples/oneapi/dpc
      make so example=svm_two_class_thunder_dense_batch # This compiles and runs the svm_two_class_thunder_dense_batch example using Intel(R) oneAPI DPC++/C++ Compiler
      make so mode=build # This compiles all examples in the current directory
    • On Windows:
      # Navigate to the directory containing examples and then build them:
      cd .\examples\oneapi\dpc
      nmake dll example=svm_two_class_thunder_dense_batch # This compiles and runs the svm_two_class_thunder_dense_batch example using Intel(R) oneAPI DPC++/C++ Compiler
      nmake dll mode=build # This compiles all examples in the current directory
    To see all available parameters of the build procedure, type make on Linux* or nmake on Windows*.
  5. The resulting example binaries and log files are written into the _results directory.
    Run the examples from the examples/oneapi/dpc folder, not from the _results folder. Most examples expect their data in the examples/oneapi/data folder and locate it through a path relative to the examples/oneapi/dpc folder.
    You can build the traditional C++ examples located in the examples/oneapi/cpp folder in a similar way.

Compile and build applications with pkg-config

pkg-config is a widely used tool for building software with dependencies. Intel® oneAPI Data Analytics Library provides files with pkg-config metadata for compiling and linking an application to the library.
Set up the environment
To use pkg-config, build the library and then set up the environment using the vars.sh or vars.bat script:
  • On Linux:
    source ./env/vars.sh
  • On Windows:
    .\env\vars.bat
Choose a metadata file
The metadata files provided by oneDAL cover only the host device configuration for C++ on 64-bit Linux, macOS, and Windows operating systems.
Choose the metadata file based on the oneDAL threading mode and the linking method you use:
oneDAL pkg-config metadata files
                    Single-threaded (non-threaded)   Multi-threaded (internally threaded)
  Static linking    dal-static-sequential-host       dal-static-threading-host
  Dynamic linking   dal-dynamic-sequential-host      dal-dynamic-threading-host
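The four file names in the table above follow one pattern: dal, then the linking method, then the threading mode, then host. The following is a minimal C++ sketch of that pattern, which may be handy in a build helper. The enum and function names are hypothetical and are not part of oneDAL or pkg-config.

```cpp
#include <string>

// Hypothetical enums mirroring the two choices in the table above.
enum class linking { static_link, dynamic_link };
enum class threading { sequential, threaded };

// Builds the pkg-config metadata file name for the chosen combination,
// for example "dal-dynamic-threading-host".
std::string metadata_file_name(linking link, threading mode) {
    std::string name = "dal-";
    name += (link == linking::static_link) ? "static" : "dynamic";
    name += "-";
    name += (mode == threading::sequential) ? "sequential" : "threading";
    name += "-host";
    return name;
}
```

The returned string is the name to pass to pkg-config in the compile commands.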
Compile a program using pkg-config
To compile a test.cpp program with oneDAL and pkg-config, provide the name of the oneDAL pkg-config metadata file as an input parameter. For example:
  • On Linux or macOS:
    icc test.cpp $(pkg-config --cflags --libs dal-dynamic-threading-host)
  • On Windows:
    for /F "delims=," %i in ('pkg-config --cflags --libs dal-dynamic-threading-host') do icl test.cpp %i
For example, to compile the svm_two_class_thunder_dense_batch example, run the following from the examples/oneapi/cpp directory:
  • On Linux or macOS:
    icc -I source/ source/svm/svm_two_class_thunder_dense_batch.cpp $(pkg-config --cflags --libs dal-dynamic-threading-host)
  • On Windows:
    for /F "delims=," %i in ('pkg-config --cflags --libs dal-dynamic-threading-host') do icl -I source/ source/svm/svm_two_class_thunder_dense_batch.cpp %i

Find More

  • Refer to the oneDAL Developer Guide and Reference for detailed information about implemented algorithms.
  • Check the system requirements before you install Intel® oneAPI Data Analytics Library.
  • Refer to the release notes for Intel® oneAPI Data Analytics Library to learn about new updates in the latest release.
  • Learn how to use oneDAL with daal4py, a Python* API.
  • Learn about the requirements for implementations of oneAPI Data Analytics Library.

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.