Principal Components Analysis (PCA)
Programming Interface
- template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor
- Template Parameters
- Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
- Method – Tag-type that specifies the computational method. Can be method::cov or method::svd.
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
- descriptor(std::int64_t component_count = 0)
- Creates a new instance of the class with the given component_count property value.
Properties
- std::int64_t component_count
- The number of principal components, r. If it is zero, the algorithm computes the eigenvectors for all features.
Default value: 0.
- Getter & Setter
std::int64_t get_component_count() const
auto & set_component_count(std::int64_t value)
- Invariants
component_count >= 0
- bool deterministic
- Specifies whether the algorithm applies the sign-flip technique. If it is true, the directions of the eigenvectors are guaranteed to be deterministic.
Default value: true.
- Getter & Setter
bool get_deterministic() const
auto & set_deterministic(bool value)
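The sign-flip idea can be illustrated with a minimal sketch. An eigenvector is only defined up to its sign, so a deterministic result needs a fixed convention for choosing between v and -v. The convention used below (make the coordinate with the largest absolute value positive) and the helper name flip_sign are assumptions for illustration; this page does not state which rule the algorithm applies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the documented interface.
// Normalizes the sign of one eigenvector in place.
// Assumed convention: the coordinate with the largest absolute value
// becomes positive (ties are resolved by the lowest index).
inline void flip_sign(std::vector<double>& v) {
    assert(!v.empty());
    std::size_t arg_max = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (std::fabs(v[i]) > std::fabs(v[arg_max])) {
            arg_max = i;
        }
    }
    if (v[arg_max] < 0.0) {
        for (double& x : v) {
            x = -x;
        }
    }
}
```

Under this convention a vector and its negation are mapped to the same result, which is what makes repeated runs comparable.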
Method tags
- struct cov
- Tag-type that denotes the Covariance computational method.
- struct svd
- Tag-type that denotes the SVD computational method.
- using by_default = cov
- Alias tag-type for the Covariance computational method.
Task tags
- struct dim_reduction
- Tag-type that parameterizes entities used for solving the dimensionality reduction problem.
- using by_default = dim_reduction
- Alias tag-type for the dimensionality reduction task.
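Tag-types carry no data; they only select an implementation at compile time. The following is a minimal sketch of that pattern, assuming a hypothetical namespace sketch with its own kernel and descriptor templates. None of these names belong to the documented interface; only the tag names mirror the ones listed above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace sketch {
namespace method {
struct cov {};           // tag for the Covariance method
struct svd {};           // tag for the SVD method
using by_default = cov;  // alias for the default method
} // namespace method

static_assert(std::is_same<method::by_default, method::cov>::value,
              "by_default aliases cov");

// The primary template is declared only; each tag gets its own specialization,
// so an unsupported tag fails at compile time.
template <typename Method>
struct kernel;

template <>
struct kernel<method::cov> {
    static std::string name() { return "cov"; }
};

template <>
struct kernel<method::svd> {
    static std::string name() { return "svd"; }
};

// Hypothetical descriptor that forwards to the kernel chosen by the tag.
template <typename Method = method::by_default>
struct descriptor {
    std::string method_name() const { return kernel<Method>::name(); }
};
} // namespace sketch
```

With this layout, sketch::descriptor<> resolves to the Covariance kernel, while sketch::descriptor<sketch::method::svd> selects the SVD kernel, with no run-time branching.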
- template<typename Task = task::by_default> class model
- Template Parameters
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
- model()
- Creates a new instance of the class with the default property values.
Properties
- const table & eigenvectors
- An r × p table with the eigenvectors, where p is the number of features. Each row contains one eigenvector.
Default value: table{}.
- Getter & Setter
const table & get_eigenvectors() const
auto & set_eigenvectors(const table &value)
- template<typename Task = task::by_default> class train_input
- Template Parameters
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
- train_input(const table &data)
- Creates a new instance of the class with the given data property value.
Properties
- const table & data
- An n × p table with the training data (n observations, p features), where each row stores one feature vector.
Default value: table{}.
- Getter & Setter
const table & get_data() const
auto & set_data(const table &data)
- template<typename Task = task::by_default> class train_result
- Template Parameters
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
- train_result()
- Creates a new instance of the class with the default property values.
Public Methods
- const table & get_eigenvectors() const
- An r × p table with the eigenvectors. Each row contains one eigenvector.
Properties
- const table & eigenvalues
- A 1 × r table that contains the eigenvalues for the first r features.
Default value: table{}.
- Getter & Setter
const table & get_eigenvalues() const
auto & set_eigenvalues(const table &value)
- const table & variances
- A 1 × r table that contains the variances for the first r features.
Default value: table{}.
- Getter & Setter
const table & get_variances() const
auto & set_variances(const table &value)
- const table & means
- A 1 × r table that contains the mean values for the first r features.
Default value: table{}.
- Getter & Setter
const table & get_means() const
auto & set_means(const table &value)
- const model<Task> & model
- The trained PCA model.
Default value: model<Task>{}.
- Getter & Setter
const model<Task> & get_model() const
auto & set_model(const model<Task> &value)
- template<typename Descriptor> pca::train_result train(const Descriptor &desc, const pca::train_input &input)
- Parameters
- desc – PCA algorithm descriptor pca::descriptor
- input – Input data for the training operation
- Preconditions
input.data.has_data == true
input.data.column_count >= desc.component_count
- Postconditions
result.means.row_count == 1
result.means.column_count == desc.component_count
result.variances.row_count == 1
result.variances.column_count == desc.component_count
result.variances[i] >= 0.0
result.eigenvalues.row_count == 1
result.eigenvalues.column_count == desc.component_count
result.model.eigenvectors.row_count == desc.component_count
result.model.eigenvectors.column_count == input.data.column_count
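To make the quantities in these postconditions concrete, here is a minimal sketch of the idea behind a covariance-based computation on a small dense matrix: form the covariance matrix of the data, then extract its leading eigenpair. Several things are assumptions and not taken from this page: the row-major std::vector storage, the division by n - 1, and the use of power iteration, which is swapped in here for brevity and is not claimed to be what the library does. The names covariance, leading_eigenpair, and eigenpair are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>;

// Sample covariance (assumed n - 1 normalization) of row-major data:
// n rows (observations), p columns (features). Returns a p x p matrix.
inline matrix covariance(const matrix& data) {
    const std::size_t n = data.size();
    assert(n > 1);
    const std::size_t p = data[0].size();
    std::vector<double> mean(p, 0.0);
    for (const auto& row : data)
        for (std::size_t j = 0; j < p; ++j) mean[j] += row[j] / n;
    matrix cov(p, std::vector<double>(p, 0.0));
    for (const auto& row : data)
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < p; ++k)
                cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
    return cov;
}

struct eigenpair {
    double value;                   // eigenvalue estimate
    std::vector<double> direction;  // unit-length eigenvector estimate
};

// Leading eigenpair of a symmetric positive semi-definite matrix by power
// iteration. The fixed start vector must not be orthogonal to the leading
// eigenvector; this sketch does not guard against that case.
inline eigenpair leading_eigenpair(const matrix& a, int iterations = 100) {
    const std::size_t p = a.size();
    std::vector<double> v(p, 0.0);
    for (std::size_t j = 0; j < p; ++j) v[j] = 1.0 / (j + 1);
    double lambda = 0.0;
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> w(p, 0.0);
        for (std::size_t j = 0; j < p; ++j)
            for (std::size_t k = 0; k < p; ++k) w[j] += a[j][k] * v[k];
        double norm = 0.0;
        for (double x : w) norm += x * x;
        norm = std::sqrt(norm);
        if (norm == 0.0) break;  // zero matrix: nothing to extract
        for (std::size_t j = 0; j < p; ++j) v[j] = w[j] / norm;
        // From the second pass on v has unit length, so ||A v|| approaches
        // the largest eigenvalue.
        lambda = norm;
    }
    return {lambda, v};
}
```

For three points on the line y = x, such as (1, 1), (2, 2), (3, 3), every covariance entry is 1, so the leading eigenvalue is 2 and the leading direction is (1/√2, 1/√2). Further components would require deflation or a full eigensolver, which this sketch leaves out.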
- template<typename Task = task::by_default> class infer_input
- Template Parameters
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Properties
- const table & data
- The dataset for inference.
Default value: table{}.
- Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
- const model<Task> & model
- The trained PCA model.
Default value: model<Task>{}.
- Getter & Setter
const model<Task> & get_model() const
auto & set_model(const model<Task> &value)
- template<typename Task = task::by_default> class infer_result
- Template Parameters
- Task – Tag-type that specifies the type of problem to solve. Can be task::dim_reduction.
Constructors
- infer_result()
- Creates a new instance of the class with the default property values.
Properties
- const table & transformed_data
- An n × r table that contains data projected to the r principal components.
Default value: table{}.
- Getter & Setter
const table & get_transformed_data() const
auto & set_transformed_data(const table &value)
- template<typename Descriptor> pca::infer_result infer(const Descriptor &desc, const pca::infer_input &input)
- Parameters
- desc – PCA algorithm descriptor pca::descriptor
- input – Input data for the inference operation
- Preconditions
input.data.has_data == true
input.model.eigenvectors.row_count == desc.component_count
input.model.eigenvectors.column_count == input.data.column_count
- Postconditions
result.transformed_data.row_count == input.data.row_count
result.transformed_data.column_count == desc.component_count
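The shape relations above match a matrix product of the input data with the transposed eigenvector table. The sketch below shows that product on plain std::vector storage. The helper name project is hypothetical, and the sketch applies no centering or scaling to the data, because this page does not say whether inference does so.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using matrix = std::vector<std::vector<double>>;

// Hypothetical helper: projects n x p data onto r eigenvectors stored as the
// rows of an r x p matrix, producing an n x r matrix.
// Assumption: the data is used as-is (no centering or scaling).
inline matrix project(const matrix& data, const matrix& eigenvectors) {
    const std::size_t r = eigenvectors.size();
    matrix out(data.size(), std::vector<double>(r, 0.0));
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t c = 0; c < r; ++c) {
            // Mirrors the precondition
            // eigenvectors.column_count == data.column_count.
            assert(eigenvectors[c].size() == data[i].size());
            for (std::size_t j = 0; j < data[i].size(); ++j) {
                out[i][c] += data[i][j] * eigenvectors[c][j];
            }
        }
    }
    return out;
}
```

The result has as many rows as the input data and one column per eigenvector, which is the relation stated in the postconditions.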
Usage example
// Trains a PCA model on `data` and prints the training results.
// print_table is assumed to be a user-defined helper that prints a table.
pca::model<> run_training(const table& data) {
    const auto pca_desc = pca::descriptor<float>{}
                              .set_component_count(5)
                              .set_deterministic(true);

    const auto result = train(pca_desc, data);

    print_table("means", result.get_means());
    print_table("variances", result.get_variances());
    print_table("eigenvalues", result.get_eigenvalues());
    print_table("eigenvectors", result.get_eigenvectors());

    return result.get_model();
}

// Projects `new_data` onto the principal components of a trained model.
table run_inference(const pca::model<>& model,
                    const table& new_data) {
    const auto pca_desc = pca::descriptor<float>{}
                              .set_component_count(model.get_component_count());

    const auto result = infer(pca_desc, model, new_data);

    print_table("transformed data", result.get_transformed_data());

    // The function is declared to return a table, so return the projection.
    return result.get_transformed_data();
}