K-Means
Mathematical formulation
Programming Interface
- template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor
- Template Parameters
- Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
- Method – Tag-type that specifies an implementation of the algorithm. Can be method::lloyd_dense.
- Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- descriptor(std::int64_t cluster_count = 2)
- Creates a new instance of the class with the given cluster_count.
Properties
- std::int64_t max_iteration_count
- The maximum number of iterations T. Default value: 100.
- Getter & Setter
std::int64_t get_max_iteration_count() const
auto & set_max_iteration_count(int64_t value)
- Invariants
max_iteration_count >= 0
- std::int64_t cluster_count
- The number of clusters k. Default value: 2.
- Getter & Setter
std::int64_t get_cluster_count() const
auto & set_cluster_count(int64_t value)
- Invariants
cluster_count > 0
- double accuracy_threshold
- The threshold for the stop condition. Default value: 0.0.
- Getter & Setter
double get_accuracy_threshold() const
auto & set_accuracy_threshold(double value)
- Invariants
accuracy_threshold >= 0.0
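The invariants listed for the descriptor properties can be read as validation rules applied by the setters. The sketch below is a hypothetical stand-in, not oneDAL's kmeans::descriptor: the class name kmeans_params and the use of std::invalid_argument are assumptions, and the library may report violations differently. It mirrors the documented defaults (cluster_count = 2, max_iteration_count = 100, accuracy_threshold = 0.0) and returns *this from each setter so that calls chain as in the usage example.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for kmeans::descriptor: it only mirrors the
// documented properties, defaults, and invariants; it is not oneDAL code.
class kmeans_params {
public:
    explicit kmeans_params(std::int64_t cluster_count = 2) {
        set_cluster_count(cluster_count);
    }

    std::int64_t get_cluster_count() const { return cluster_count_; }
    std::int64_t get_max_iteration_count() const { return max_iteration_count_; }
    double get_accuracy_threshold() const { return accuracy_threshold_; }

    // Each setter checks its invariant before storing the value and
    // returns *this so that calls can be chained.
    kmeans_params& set_cluster_count(std::int64_t value) {
        if (value <= 0) throw std::invalid_argument("cluster_count > 0");
        cluster_count_ = value;
        return *this;
    }
    kmeans_params& set_max_iteration_count(std::int64_t value) {
        if (value < 0) throw std::invalid_argument("max_iteration_count >= 0");
        max_iteration_count_ = value;
        return *this;
    }
    kmeans_params& set_accuracy_threshold(double value) {
        // Written as !(value >= 0.0) so that NaN is rejected as well.
        if (!(value >= 0.0)) throw std::invalid_argument("accuracy_threshold >= 0.0");
        accuracy_threshold_ = value;
        return *this;
    }

private:
    std::int64_t cluster_count_ = 2;
    std::int64_t max_iteration_count_ = 100;
    double accuracy_threshold_ = 0.0;
};
```

Because a setter throws before assigning, a rejected value leaves the previous setting in place.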
- struct lloyd_dense
- Tag-type that denotes Lloyd’s computational method.
- using by_default = lloyd_dense
- Alias tag-type for Lloyd’s computational method.
- struct clustering
- Tag-type that parameterizes entities used for solving the clustering problem.
- using by_default = clustering
- Alias tag-type for the clustering task.
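The method and task tags are empty types that constrain and select an implementation at compile time. A minimal sketch of that pattern, assuming simplified names: descriptor_sketch, run_method, and describe are hypothetical and not part of oneDAL, and the library's internal dispatch is not shown here.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical tag types mirroring the documented method and task tags.
namespace method {
struct lloyd_dense {};
using by_default = lloyd_dense; // alias tag for Lloyd's method
} // namespace method

namespace task {
struct clustering {};
using by_default = clustering; // alias tag for the clustering task
} // namespace task

// The tags carry no data; they only constrain and select code at compile time.
template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
struct descriptor_sketch {
    static_assert(std::is_same<Float, float>::value ||
                      std::is_same<Float, double>::value,
                  "Float can be float or double");
    static_assert(std::is_same<Method, method::lloyd_dense>::value,
                  "only Lloyd's dense method is modeled here");
    static_assert(std::is_same<Task, task::clustering>::value,
                  "only the clustering task is modeled here");
};

// Overload resolution on an empty tag object picks the implementation.
inline std::string run_method(method::lloyd_dense) { return "lloyd_dense"; }

template <typename Float, typename Method, typename Task>
std::string describe(const descriptor_sketch<Float, Method, Task>&) {
    return run_method(Method{});
}
```

In this sketch, supporting another method would mean a new tag struct plus a matching run_method overload; the static_asserts reject unsupported tags at compile time rather than at run time.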
- template<typename Task = task::by_default> class model
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- model()
- Creates a new instance of the class with the default property values.
Public Methods
- std::int64_t get_cluster_count() const
- Number of clusters k in the trained model.
Properties
- table centroids
- A $k \times p$ table with the cluster centroids. Each row of the table stores one centroid. Default value: table{}.
- Getter & Setter
const table & get_centroids() const
auto & set_centroids(const table &value)
- template<typename Task = task::by_default> class train_input
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Properties
- table initial_centroids
- A $k \times p$ table with the initial centroids, where each row stores one centroid.
- Getter & Setter
const table & get_initial_centroids() const
auto & set_initial_centroids(const table &data)
- table data
- An $n \times p$ table with the data to be clustered, where each row stores one feature vector.
- Getter & Setter
const table & get_data() const
auto & set_data(const table &data)
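Before calling train, the preconditions listed for the training operation can be checked against the two input tables. This sketch uses a hypothetical row-major dense_table struct in place of oneDAL's table and a hypothetical train_input_is_valid helper; treating has_data as "at least one row and one column" is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row-major dense table standing in for oneDAL's table.
struct dense_table {
    std::int64_t row_count = 0;
    std::int64_t column_count = 0;
    std::vector<double> values; // row_count * column_count entries

    // Assumed meaning of has_data: the table is non-empty.
    bool has_data() const { return row_count > 0 && column_count > 0; }
};

// Restates the documented preconditions of train() for a chosen cluster_count.
// Returns false instead of throwing so the caller decides how to react.
inline bool train_input_is_valid(const dense_table& data,
                                 const dense_table& initial_centroids,
                                 std::int64_t cluster_count) {
    return data.has_data() &&
           initial_centroids.row_count == cluster_count &&
           initial_centroids.column_count == data.column_count;
}
```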
- template<typename Task = task::by_default> class train_result
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- train_result()
- Creates a new instance of the class with the default property values.
Properties
- model<Task> model
- The trained K-Means model. Default value: model<Task>{}.
- Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
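- double objective_function_value
- The value of the objective function $\Phi_X(C) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert^2$, where $C = \{c_1, \ldots, c_k\}$ is model.centroids.
- Getter & Setter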
double get_objective_function_value() const
auto & set_objective_function_value(double value)
- Invariants
objective_function_value >= 0.0
- int64_t iteration_count
- The number of iterations performed by the algorithm. Default value: 0.
- Getter & Setter
int64_t get_iteration_count() const
auto & set_iteration_count(std::int64_t value)
- Invariants
iteration_count >= 0
- table labels
- An $n \times 1$ table with the labels $y_i$ assigned to the samples $x_i$ in the input data, $1 \le i \le n$. Default value: table{}.
- Getter & Setter
const table & get_labels() const
auto & set_labels(const table &value)
- table responses
- An $n \times 1$ table with the responses $y_i$ assigned to the samples $x_i$ in the input data, $1 \le i \le n$. Default value: table{}.
- Getter & Setter
const table & get_responses() const
auto & set_responses(const table &value)
- template<typename Descriptor> kmeans::train_result train(const Descriptor &desc, const kmeans::train_input &input)
- Parameters
- desc – K-Means algorithm descriptor kmeans::descriptor
- input– Input data for the training operation
- Preconditions
input.data.has_data == true
input.initial_centroids.row_count == desc.cluster_count
input.initial_centroids.column_count == input.data.column_count
- Postconditions
result.labels.row_count == input.data.row_count
result.labels.column_count == 1
result.labels[i] >= 0
result.labels[i] < desc.cluster_count
result.iteration_count <= desc.max_iteration_count
result.model.centroids.row_count == desc.cluster_count
result.model.centroids.column_count == input.data.column_count
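Lloyd's method alternates an assignment step and a centroid-update step. The sketch below is a plain C++ illustration under stated assumptions, not oneDAL's implementation: the matrix type and all function names are hypothetical, empty clusters simply keep their previous centroid, and the stop rule (stop when the objective changes by less than accuracy_threshold between iterations) is an assumed reading of the threshold. The returned labels and objective are computed against the returned centroids, and iteration_count never exceeds max_iteration_count, as the postconditions above require.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical row-major matrix used in place of oneDAL's table.
struct matrix {
    std::int64_t rows = 0;
    std::int64_t cols = 0;
    std::vector<double> v; // rows * cols values
    double& at(std::int64_t i, std::int64_t j) {
        return v[static_cast<std::size_t>(i * cols + j)];
    }
    double at(std::int64_t i, std::int64_t j) const {
        return v[static_cast<std::size_t>(i * cols + j)];
    }
};

struct lloyd_result {
    matrix centroids;
    std::vector<std::int64_t> labels;
    double objective = 0.0;
    std::int64_t iteration_count = 0;
};

// Assignment step: label each row with its nearest centroid and return
// the sum of squared Euclidean distances to those centroids.
inline double assign_step(const matrix& x, const matrix& c,
                          std::vector<std::int64_t>& labels) {
    double objective = 0.0;
    for (std::int64_t i = 0; i < x.rows; ++i) {
        double best = std::numeric_limits<double>::infinity();
        for (std::int64_t j = 0; j < c.rows; ++j) {
            double d2 = 0.0;
            for (std::int64_t f = 0; f < x.cols; ++f) {
                const double diff = x.at(i, f) - c.at(j, f);
                d2 += diff * diff;
            }
            if (d2 < best) {
                best = d2;
                labels[static_cast<std::size_t>(i)] = j;
            }
        }
        objective += best;
    }
    return objective;
}

// Update step: move each centroid to the mean of its assigned rows.
// Empty clusters keep their previous centroid in this sketch.
inline void update_step(const matrix& x, const std::vector<std::int64_t>& labels,
                        matrix& c) {
    std::vector<double> sums(c.v.size(), 0.0);
    std::vector<std::int64_t> counts(static_cast<std::size_t>(c.rows), 0);
    for (std::int64_t i = 0; i < x.rows; ++i) {
        const std::int64_t j = labels[static_cast<std::size_t>(i)];
        ++counts[static_cast<std::size_t>(j)];
        for (std::int64_t f = 0; f < x.cols; ++f)
            sums[static_cast<std::size_t>(j * c.cols + f)] += x.at(i, f);
    }
    for (std::int64_t j = 0; j < c.rows; ++j) {
        const std::int64_t n = counts[static_cast<std::size_t>(j)];
        if (n == 0) continue;
        for (std::int64_t f = 0; f < c.cols; ++f)
            c.at(j, f) = sums[static_cast<std::size_t>(j * c.cols + f)] / static_cast<double>(n);
    }
}

inline lloyd_result lloyd_train(const matrix& x, const matrix& initial_centroids,
                                std::int64_t max_iteration_count,
                                double accuracy_threshold) {
    // Documented preconditions of train().
    assert(x.rows > 0 && x.cols > 0);
    assert(initial_centroids.rows > 0 && initial_centroids.cols == x.cols);

    lloyd_result r;
    r.centroids = initial_centroids;
    r.labels.assign(static_cast<std::size_t>(x.rows), 0);
    r.objective = assign_step(x, r.centroids, r.labels);
    for (std::int64_t t = 0; t < max_iteration_count; ++t) {
        update_step(x, r.labels, r.centroids);
        const double next = assign_step(x, r.centroids, r.labels);
        r.iteration_count = t + 1;
        const bool converged = std::fabs(r.objective - next) < accuracy_threshold;
        r.objective = next;
        if (converged) break;
    }
    return r;
}
```

For the points (0, 0), (0, 1), (5, 5), (5, 6) with initial centroids (0, 0) and (5, 5), the first update moves the centroids to (0, 0.5) and (5, 5.5); the second iteration leaves the objective at 1.0, so the loop stops after two iterations.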
- template<typename Task = task::by_default> class infer_input
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Properties
- table data
- An $n \times p$ table with the data to be assigned to the clusters, where each row stores one feature vector. Default value: table{}.
- Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
- model<Task> model
- The trained K-Means model. Default value: model<Task>{}.
- Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
- template<typename Task = task::by_default> class infer_result
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
- infer_result()
- Creates a new instance of the class with the default property values.
Properties
- double objective_function_value
- The value of the objective function $\Phi_X(C) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert^2$, where $C$ is defined by the corresponding infer_input::model::centroids. Default value: 0.0.
- Getter & Setter
double get_objective_function_value() const
auto & set_objective_function_value(double value)
- Invariants
objective_function_value >= 0.0
- table labels
- An $n \times 1$ table with the labels assigned to the feature vectors in the input data. Default value: table{}.
- Getter & Setter
const table & get_labels() const
auto & set_labels(const table &value)
- table responses
- An $n \times 1$ table with the responses assigned to the feature vectors in the input data. Default value: table{}.
- Getter & Setter
const table & get_responses() const
auto & set_responses(const table &value)
- template<typename Descriptor> kmeans::infer_result infer(const Descriptor &desc, const kmeans::infer_input &input)
- Parameters
- desc – K-Means algorithm descriptor kmeans::descriptor
- input– Input data for the inference operation
- Preconditions
input.data.has_data == true
input.model.centroids.has_data == true
input.model.centroids.row_count == desc.cluster_count
input.model.centroids.column_count == input.data.column_count
- Postconditions
result.labels.row_count == input.data.row_count
result.labels.column_count == 1
result.labels[i] >= 0
result.labels[i] < desc.cluster_count
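Inference does not move the centroids; it assigns each row of the new data to its nearest centroid and reports the objective for those fixed centroids. A hypothetical sketch with a stand-in matrix type (not oneDAL's table) and a hypothetical infer_nearest function, checking the documented preconditions with assert:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical row-major matrix standing in for oneDAL's table.
struct matrix {
    std::int64_t rows = 0;
    std::int64_t cols = 0;
    std::vector<double> v; // rows * cols values
    double at(std::int64_t i, std::int64_t j) const {
        return v[static_cast<std::size_t>(i * cols + j)];
    }
};

struct infer_sketch_result {
    std::vector<std::int64_t> labels; // one label per row of data
    double objective = 0.0;           // sum of squared distances
};

// Assigns each row of data to its nearest centroid; centroids are read only.
inline infer_sketch_result infer_nearest(const matrix& centroids,
                                         const matrix& data,
                                         std::int64_t cluster_count) {
    // Documented preconditions of infer().
    assert(data.rows > 0 && data.cols > 0);
    assert(centroids.rows == cluster_count);
    assert(centroids.cols == data.cols);

    infer_sketch_result r;
    r.labels.assign(static_cast<std::size_t>(data.rows), 0);
    for (std::int64_t i = 0; i < data.rows; ++i) {
        double best = std::numeric_limits<double>::infinity();
        for (std::int64_t j = 0; j < centroids.rows; ++j) {
            double d2 = 0.0;
            for (std::int64_t f = 0; f < data.cols; ++f) {
                const double diff = data.at(i, f) - centroids.at(j, f);
                d2 += diff * diff;
            }
            if (d2 < best) {
                best = d2;
                r.labels[static_cast<std::size_t>(i)] = j;
            }
        }
        r.objective += best;
    }
    return r;
}
```

Because the centroids are fixed, a single pass over the data is enough and no iteration count is involved.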
Usage example
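#include <string>
#include "oneapi/dal/algo/kmeans.hpp"

namespace dal = oneapi::dal;
namespace kmeans = oneapi::dal::kmeans;
using dal::table;

// Output helpers assumed to be defined by the application (not part of oneDAL).
void print_table(const std::string& name, const table& t);
void print_value(const std::string& name, double value);

kmeans::model<> run_training(const table& data,
                             const table& initial_centroids) {
    // initial_centroids must have cluster_count (10) rows and as many columns as data.
    const auto kmeans_desc = kmeans::descriptor<float>{}
                                 .set_cluster_count(10)
                                 .set_max_iteration_count(50)
                                 .set_accuracy_threshold(1e-4);
    const auto result = dal::train(kmeans_desc, data, initial_centroids);
    print_table("labels", result.get_labels());
    print_table("centroids", result.get_model().get_centroids());
    print_value("objective", result.get_objective_function_value());
    return result.get_model();
}

table run_inference(const kmeans::model<>& model,
                    const table& new_data) {
    // The descriptor's cluster_count must match the number of centroids in the model.
    const auto kmeans_desc = kmeans::descriptor<float>{}
                                 .set_cluster_count(model.get_cluster_count());
    const auto result = dal::infer(kmeans_desc, model, new_data);
    print_table("labels", result.get_labels());
    // Return the cluster assignments for new_data.
    return result.get_labels();
}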