Intel® oneAPI Data Analytics Library Developer Guide and Reference
Principal Components Analysis (PCA)
Principal Component Analysis (PCA) is an algorithm for exploratory data analysis and dimensionality reduction. PCA transforms a set of feature vectors with possibly correlated features into a new set of uncorrelated features, called principal components. Principal components are the directions of the largest variance, that is, the directions along which the data is most spread out.
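To make "directions of the largest variance" concrete, here is a minimal standalone sketch in plain C++ that does not use the oneDAL API. It computes the sample covariance matrix of 2-D points and approximates the leading principal component by power iteration. The names Vec2, Mat2, covariance, and leading_component are hypothetical helpers introduced only for this illustration, and power iteration is a stand-in technique, not necessarily what the library does internally.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Sample covariance matrix of 2-D points (divides by n - 1).
// Assumes at least two points are supplied.
Mat2 covariance(const std::vector<Vec2>& points) {
    const double n = static_cast<double>(points.size());
    Vec2 mean{0.0, 0.0};
    for (const auto& p : points) {
        mean[0] += p[0] / n;
        mean[1] += p[1] / n;
    }
    Mat2 cov{};  // value-initialized to zeros
    for (const auto& p : points) {
        const Vec2 d{p[0] - mean[0], p[1] - mean[1]};
        for (std::size_t i = 0; i < 2; ++i)
            for (std::size_t j = 0; j < 2; ++j)
                cov[i][j] += d[i] * d[j] / (n - 1.0);
    }
    return cov;
}

// Power iteration: repeatedly multiply a start vector by the covariance
// matrix and renormalize. The result approximates the unit eigenvector
// with the largest eigenvalue, i.e. the first principal component.
Vec2 leading_component(const Mat2& cov, int iterations = 100) {
    Vec2 v{1.0, 1.0};
    for (int it = 0; it < iterations; ++it) {
        const Vec2 w{cov[0][0] * v[0] + cov[0][1] * v[1],
                     cov[1][0] * v[0] + cov[1][1] * v[1]};
        const double norm = std::sqrt(w[0] * w[0] + w[1] * w[1]);
        if (norm == 0.0) break;  // zero variance along v: nothing to refine
        v = {w[0] / norm, w[1] / norm};
    }
    return v;
}
```

For points that lie on the line y = 2x, the returned direction is proportional to (1, 2), which is the only direction in which such data varies.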
Operation | Computational methods | Programming Interface
Mathematical formulation
Programming Interface
All types and functions in this section are declared in the oneapi::dal::pca namespace and are available via inclusion of the oneapi/dal/algo/pca.hpp header file.
Enum classes
enum class normalization
- normalization::none
-
No normalization is necessary, or the data is not normalized.
- normalization::mean_center
-
Only mean centering is necessary, or the data is already centered.
- normalization::zscore
-
Z-score normalization is necessary, or the data is already normalized.
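The three modes can be pictured with a small standalone sketch in plain C++ that does not call oneDAL. The local enum below only mirrors the names above, and Matrix and normalize are hypothetical helpers. The sketch assumes z-score means "subtract the column mean and divide by the sample standard deviation (n - 1 denominator)"; the exact denominator used by the library is not stated in this section.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Local stand-in that mirrors the names of the library enum (illustration only).
enum class normalization { none, mean_center, zscore };

using Matrix = std::vector<std::vector<double>>;  // each row is one feature vector

// Returns a normalized copy of `data` according to `mode`.
Matrix normalize(Matrix data, normalization mode) {
    if (mode == normalization::none || data.empty()) return data;
    const std::size_t n = data.size();
    const std::size_t p = data[0].size();
    for (std::size_t j = 0; j < p; ++j) {
        // Column mean.
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i) mean += data[i][j];
        mean /= static_cast<double>(n);
        // Center the column and accumulate the squared deviations.
        double sq_dev = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            data[i][j] -= mean;
            sq_dev += data[i][j] * data[i][j];
        }
        // For z-score, additionally scale to unit sample standard deviation.
        if (mode == normalization::zscore && n > 1) {
            const double stddev = std::sqrt(sq_dev / static_cast<double>(n - 1));
            if (stddev > 0.0)
                for (std::size_t i = 0; i < n; ++i) data[i][j] /= stddev;
        }
    }
    return data;
}
```

With mean centering, a column holding 1 and 3 becomes -1 and 1. With z-score, the same column becomes -1/sqrt(2) and 1/sqrt(2) under the assumptions above.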
Descriptor
template <typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor
- Template Parameters
-
Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
Method – Tag-type that specifies an implementation of the algorithm. Can be method::cov or method::svd.
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
descriptor(std::int64_t component_count = 0)
Creates a new instance of the class with the given component_count property value.
Public Methods
bool whiten() const
auto & set_whiten(bool value)
Properties
result_option_id result_options
Choose which results should be computed and returned.
- Getter & Setter
-
result_option_id get_result_options() const
auto & set_result_options(const result_option_id &value)
normalization normalization_mode
Default value: normalization::zscore.
- Getter & Setter
-
normalization get_normalization_mode() const
auto & set_normalization_mode(normalization value)
bool deterministic
Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors must be deterministic. Default value: true.
- Getter & Setter
-
bool get_deterministic() const
auto & set_deterministic(bool value)
std::int64_t component_count
The number of principal components r. If it is zero, the algorithm computes the eigenvectors for all features. Default value: 0.
- Getter & Setter
-
std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
- Invariants
-
component_count >= 0
normalization data_normalization
Default value: normalization::none.
- Getter & Setter
-
normalization get_data_normalization() const
auto & set_data_normalization(normalization value)
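The deterministic property above refers to a sign-flip technique. An eigenvector is only defined up to sign, so v and -v describe the same direction, and different runs or platforms may return either one. The standalone sketch below shows one common convention for fixing the sign: flip the vector so that its largest-magnitude component is positive. This convention is an assumption made for illustration; the section does not state which rule the library applies, and sign_flip is a hypothetical helper, not a oneDAL function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns `v` or `-v`, whichever has a positive largest-magnitude component.
// Assumed convention for illustration only.
std::vector<double> sign_flip(std::vector<double> v) {
    // Find the index of the component with the largest absolute value.
    std::size_t arg_max = 0;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (std::fabs(v[i]) > std::fabs(v[arg_max])) arg_max = i;
    // Negate the whole vector if that component is negative.
    if (!v.empty() && v[arg_max] < 0.0)
        for (double& x : v) x = -x;
    return v;
}
```

Under this rule, both (0.6, -0.8) and (-0.6, 0.8) map to (-0.6, 0.8), so either output of an eigensolver yields the same reported direction.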
Method tags
struct cov
Tag-type that denotes Covariance computational method.
struct precomputed
struct svd
Tag-type that denotes SVD computational method.
using by_default = cov
Alias tag-type for Covariance computational method.
Task tags
struct dim_reduction
Tag-type that parameterizes entities used for solving dimensionality reduction problem.
using by_default = dim_reduction
Alias tag-type for dimensionality reduction task.
Model
template <typename Task = task::by_default> class model
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
model()
Creates a new instance of the class with the default property values.
Properties
const table & eigenvectors
A table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
const table & variances
Variances. Default value: table{}.
- Getter & Setter
-
const table & get_variances() const
auto & set_variances(const table &value)
const table & means
Means. Default value: table{}.
- Getter & Setter
-
const table & get_means() const
auto & set_means(const table &value)
const table & eigenvalues
Eigenvalues. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
Training train(...)
Input
template <typename Task = task::by_default> class train_input
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
train_input()
train_input(const table &data)
Creates a new instance of the class with the given data property value.
Properties
const table & data
A table with the training data, where each row stores one feature vector. Default value: table{}.
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &data)
Result and Finalize Result
template <typename Task = task::by_default> class train_result
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Properties
const table & variances
A table that contains the variances for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_variances() const
auto & set_variances(const table &value)
const result_option_id & result_options
Result options that indicate the availability of the properties. Default value: default_result_options<Task>.
- Getter & Setter
-
const result_option_id & get_result_options() const
auto & set_result_options(const result_option_id &value)
const table & eigenvalues
A table that contains the eigenvalues for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
const table & singular_values
A table that contains the singular values for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_singular_values() const
auto & set_singular_values(const table &value)
const table & means
A table that contains the mean values for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_means() const
auto & set_means(const table &value)
const table & eigenvectors
A table with the eigenvectors. Each row contains one eigenvector. Default value: table{}.
- Getter & Setter
-
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
- Invariants
-
eigenvectors == model.eigenvectors
const model< Task > & model
The trained PCA model. Default value: model<Task>{}.
- Getter & Setter
-
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
const table & explained_variances_ratio
A table that contains the explained variance ratios for the first r features. Default value: table{}.
- Getter & Setter
-
const table & get_explained_variances_ratio() const
auto & set_explained_variances_ratio(const table &value)
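As a reading aid for the explained_variances_ratio property, the following standalone sketch in plain C++ (no oneDAL calls) computes ratios from a list of eigenvalues. It assumes the usual definition, in which each ratio is an eigenvalue divided by the sum of all eigenvalues passed in. Whether the library normalizes by the sum over all features or only over the retained components is not stated in this section, so treat the denominator as an assumption. The function name explained_variance_ratio is hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// ratio[i] = eigenvalues[i] / sum(eigenvalues).
// Returns all zeros when the total variance is not positive.
std::vector<double> explained_variance_ratio(const std::vector<double>& eigenvalues) {
    const double total =
        std::accumulate(eigenvalues.begin(), eigenvalues.end(), 0.0);
    std::vector<double> ratio(eigenvalues.size(), 0.0);
    if (total <= 0.0) return ratio;
    for (std::size_t i = 0; i < eigenvalues.size(); ++i)
        ratio[i] = eigenvalues[i] / total;
    return ratio;
}
```

For eigenvalues 3 and 1, the first component accounts for 0.75 of the total variance and the second for 0.25.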
Operation
template <typename Descriptor> pca::train_result train(const Descriptor &desc, const pca::train_input &input)
- Parameters
-
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the training operation
- Preconditions
-
input.data.has_data == true
input.data.column_count >= desc.component_count
- Postconditions
-
result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == 1
result.model.eigenvectors.column_count == desc.component_count
Partial Training
Partial Input
template <typename Task = task::by_default> class partial_train_input
Constructors
partial_train_input()
partial_train_input(const table &data)
partial_train_input(const partial_train_result< Task > &prev, const table &data)
Properties
const partial_train_result< Task > & prev
- Getter & Setter
-
const partial_train_result< Task > & get_prev() const
auto & set_prev(const partial_train_result< Task > &value)
const table & data
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &value)
Partial Result and Finalize Input
template <typename Task = task::by_default> class partial_train_result
Constructors
partial_train_result()
Public Methods
std::int64_t get_auxiliary_table_count() const
Properties
const table & partial_crossproduct
The crossproduct matrix. Default value: table{}.
- Getter & Setter
-
const table & get_partial_crossproduct() const
auto & set_partial_crossproduct(const table &value)
const table & partial_n_rows
The number of observations (nobs). Default value: table{}.
- Getter & Setter
-
const table & get_partial_n_rows() const
auto & set_partial_n_rows(const table &value)
const table & auxiliary_table
- Getter & Setter
-
const table & get_auxiliary_table(const std::int64_t) const
auto & set_auxiliary_table(const table &value)
const table & partial_sum
Sums. Default value: table{}.
- Getter & Setter
-
const table & get_partial_sum() const
auto & set_partial_sum(const table &value)
Finalize Training
Inference infer(...)
Input
template <typename Task = task::by_default> class infer_input
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
infer_input(const model< Task > &trained_model, const table &data)
Creates a new instance of the class with the given model and data property values.
Properties
const model< Task > & model
The trained PCA model. Default value: model<Task>{}.
- Getter & Setter
-
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
const table & data
The dataset for inference. Default value: table{}.
- Getter & Setter
-
const table & get_data() const
auto & set_data(const table &value)
Result
template <typename Task = task::by_default> class infer_result
- Template Parameters
-
Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
const table & transformed_data
A table that contains data projected to the r principal components. Default value: table{}.
- Getter & Setter
-
const table & get_transformed_data() const
auto & set_transformed_data(const table &value)
Operation
template <typename Descriptor> pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)
- Parameters
-
desc – PCA algorithm descriptor pca::descriptor
input – Input data for the inference operation
Usage Example
Training
pca::model<> run_training(const table& data) {
   // Request five principal components with deterministic eigenvector signs.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(5)
      .set_deterministic(true);

   const auto result = train(pca_desc, data);

   print_table("means", result.get_means());
   print_table("variances", result.get_variances());
   print_table("eigenvalues", result.get_eigenvalues());
   print_table("eigenvectors", result.get_eigenvectors());

   return result.get_model();
}
Inference
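table run_inference(const pca::model<>& model,
                    const table& new_data) {
   // Each row of the eigenvectors table holds one principal component,
   // so the row count gives the number of components in the model.
   const auto pca_desc = pca::descriptor<float>{}
      .set_component_count(model.get_eigenvectors().get_row_count());

   const auto result = infer(pca_desc, model, new_data);

   print_table("transformed data", result.get_transformed_data());

   return result.get_transformed_data();
}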
Examples
oneAPI DPC++
Batch Processing:
Online Processing:
oneAPI C++
Batch Processing:
Online Processing: