Developer Guide and Reference

  • 2021.4
  • 09/27/2021
  • Public Content

Principal Components Analysis (PCA)

Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors with possibly correlated features into a new set of uncorrelated features, called principal components. The principal components are the directions of largest variance, that is, the directions along which the data is most spread out.

Mathematical formulation

Programming Interface

All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.
Descriptor
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
Template Parameters
  • Float
    – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
  • Method
    – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
  • Task
    – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
descriptor(std::int64_t component_count = 0)
Creates a new instance of the class with the given component_count property value.
Properties
std::int64_t component_count
The number of principal components r. If it is zero, the algorithm computes the eigenvectors for all features, r = p, where p is the number of features.
Default value: 0.
Getter & Setter


std::int64_t get_component_count() const
auto & set_component_count(int64_t value)

Invariants


component_count >= 0

bool deterministic
Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors must be deterministic.
Default value: true.
Getter & Setter


bool get_deterministic() const
auto & set_deterministic(bool value)

Method tags
struct cov
Tag-type that denotes the Covariance computational method.
struct svd
Tag-type that denotes the SVD computational method.
using by_default = cov
Alias tag-type for the Covariance computational method.
Task tags
struct dim_reduction
Tag-type that parameterizes entities used for solving the dimensionality reduction problem.
using by_default = dim_reduction
Alias tag-type for the dimensionality reduction task.
Model
template<typename Task = task::by_default>
class model
Template Parameters
  • Task
    – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
model()
Creates a new instance of the class with the default property values.
Properties
const table & eigenvectors
An r × p table with the eigenvectors. Each row contains one eigenvector.
Default value: table{}.
Getter & Setter


const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)

Training
train(...)
Input
template<typename Task = task::by_default>
class train_input
Template Parameters
  • Task
    – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
train_input(const table & data)
Creates a new instance of the class with the given data property value.
Properties
const table & data
An n × p table with the training data, where each row stores one feature vector.
Default value: table{}.
Getter & Setter


const table & get_data() const
auto & set_data(const table &data)

Result
template<typename Task = task::by_default>
class train_result
Template Parameters
  • Task
    – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Public Methods
const table & get_eigenvectors() const
An r × p table with the eigenvectors. Each row contains one eigenvector.
Properties
const table & variances
A 1 × r table that contains the variances for the first r features.
Default value: table{}.
Getter & Setter


const table & get_variances() const
auto & set_variances(const table &value)

const table & means
A 1 × r table that contains the mean values for the first r features.
Default value: table{}.
Getter & Setter


const table & get_means() const
auto & set_means(const table &value)

const model<Task> & model
The trained PCA model.
Default value: model<Task>{}.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table & eigenvalues
A 1 × r table that contains the eigenvalues for the first r features.
Default value: table{}.
Getter & Setter


const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)

Operation
template<typename Descriptor>
pca::train_result train(const Descriptor & desc, const pca::train_input & input)
Parameters
  • desc
    – PCA algorithm descriptor
  • input
    – Input data for the training operation
Preconditions


input.data.has_data == true
input.data.column_count >= desc.component_count

Postconditions


result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == desc.component_count
result.model.eigenvectors.column_count == input.data.column_count

Inference
infer(...)
Input
template<typename Task = task::by_default>
class infer_input
Template Parameters
  • Task
    – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
infer_input(const model<Task> & trained_model, const table & data)
Creates a new instance of the class with the given model and data property values.
Properties
const model<Task> & model
The trained PCA model.
Default value: model<Task>{}.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table & data
The dataset for inference.
Default value: table{}.
Getter & Setter


const table & get_data() const
auto & set_data(const table &value)

Result
template<typename Task = task::by_default>
class infer_result
Template Parameters
  • Task
    – Tag-type that specifies the type of the problem to solve. Can be task::dim_reduction.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
const table & transformed_data
An n × r table that contains the data projected onto the r principal components.
Default value: table{}.
Getter & Setter


const table & get_transformed_data() const
auto & set_transformed_data(const table &value)

Operation
template<typename Descriptor>
pca::infer_result infer(const Descriptor & desc, const pca::infer_input & input)
Parameters
  • desc
    – PCA algorithm descriptor
  • input
    – Input data for the inference operation
Preconditions


input.data.has_data == true
input.model.eigenvectors.row_count == desc.component_count
input.model.eigenvectors.column_count == input.data.column_count

Postconditions


result.transformed_data.row_count == input.data.row_count
result.transformed_data.column_count == desc.component_count

Usage example

Training
pca::model<> run_training(const table& data) {
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);

   const auto result = train(pca_desc, data);

   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());

   return result.get_model();
}
Inference
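table run_inference(const pca::model<>& model, const table& new_data) {
   // Project onto as many components as the trained model holds.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_component_count());

   const auto result = infer(pca_desc, model, new_data);

   print_table("transformed data", result.get_transformed_data());

   // The function is declared to return a table, so return the projection.
   return result.get_transformed_data();
}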

Examples

oneAPI DPC++
Batch Processing:
oneAPI C++
Batch Processing:
Python* with DPC++ support
Batch Processing:
