Developer Guide and Reference

  • 2021.4
  • 09/27/2021

K-Means

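The K-Means algorithm solves the clustering problem by partitioning $n$ feature vectors into $k$ clusters. Each cluster is characterized by a representative point, called a centroid. The partition is chosen to minimize the sum of squared Euclidean distances from each feature vector to its nearest centroid.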

Mathematical formulation
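The following is a sketch of the standard K-Means formulation that the properties on this page appear to rely on. The notation ($X$, $C$, $y_i$, $S_j$) and the exact form of the stop rule are assumptions and are not taken from this page. Given feature vectors $X = \{x_1, \ldots, x_n\}$ in $\mathbb{R}^p$ and centroids $C = \{c_1, \ldots, c_k\}$, the objective is the sum of squared Euclidean distances to the nearest centroid. Lloyd's method starts from the initial centroids $C^{(0)}$ and alternates an assignment step and an update step. It stops when the objective decreases by at most $\varepsilon$ (accuracy_threshold) or after $T$ (max_iteration_count) iterations.

```latex
\Phi_X(C) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert^2

y_i^{(t)} = \operatorname*{arg\,min}_{1 \le j \le k} \lVert x_i - c_j^{(t)} \rVert^2, \qquad 1 \le i \le n

c_j^{(t+1)} = \frac{1}{\lvert S_j^{(t)} \rvert} \sum_{i \in S_j^{(t)}} x_i, \qquad S_j^{(t)} = \{\, i : y_i^{(t)} = j \,\}

\text{stop if } \Phi_X(C^{(t)}) - \Phi_X(C^{(t+1)}) \le \varepsilon \ \text{ or } \ t + 1 \ge T
```

The formulas index clusters from 1, while the API reports 0-based labels ($0 \le$ label $<$ cluster_count, per the postconditions of train and infer). If a cluster is empty ($\lvert S_j \rvert = 0$), its mean is undefined. How that case is handled is implementation-specific and is not described on this page.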

Programming Interface

All types and functions in this section are declared in the oneapi::dal::kmeans namespace and are available by including the oneapi/dal/algo/kmeans.hpp header file.
Descriptor
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
Template Parameters
  • Float
    – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
  • Method
    – Tag-type that specifies an implementation of the algorithm. Can be method::lloyd_dense.
  • Task
    – Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
descriptor(std::int64_t cluster_count = 2)
Creates a new instance of the class with the given cluster_count.
Properties
std::int64_t cluster_count
The number of clusters $k$.
Default value: 2.
Getter & Setter


std::int64_t get_cluster_count() const
auto & set_cluster_count(int64_t value)

Invariants


cluster_count > 0

double accuracy_threshold
The threshold $\varepsilon$ for the stop condition.
Default value: 0.0.
Getter & Setter


double get_accuracy_threshold() const
auto & set_accuracy_threshold(double value)

Invariants


accuracy_threshold >= 0.0

std::int64_t max_iteration_count
The maximum number of iterations $T$.
Default value: 100.
Getter & Setter


std::int64_t get_max_iteration_count() const
auto & set_max_iteration_count(int64_t value)

Invariants


max_iteration_count >= 0
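The three descriptor properties, their defaults, and their invariants can be mirrored in a small class. The class below, kmeans_params, is hypothetical and is not part of oneDAL. Each setter rejects an out-of-range value with std::invalid_argument and returns *this so calls can be chained. This page does not say whether oneDAL reports violations the same way or with which exception type, so treat this only as an illustration of the invariants.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for kmeans::descriptor (NOT the oneDAL class): it only
// mirrors the documented defaults and invariants of the three properties.
class kmeans_params {
public:
    explicit kmeans_params(std::int64_t cluster_count = 2) {
        set_cluster_count(cluster_count);
    }

    std::int64_t get_cluster_count() const { return cluster_count_; }
    double get_accuracy_threshold() const { return accuracy_threshold_; }
    std::int64_t get_max_iteration_count() const { return max_iteration_count_; }

    // Invariant: cluster_count > 0.
    kmeans_params& set_cluster_count(std::int64_t value) {
        if (value <= 0) throw std::invalid_argument("cluster_count must be > 0");
        cluster_count_ = value;
        return *this;  // returning *this allows chained setters
    }

    // Invariant: accuracy_threshold >= 0.0 (the negated test also rejects NaN).
    kmeans_params& set_accuracy_threshold(double value) {
        if (!(value >= 0.0)) throw std::invalid_argument("accuracy_threshold must be >= 0.0");
        accuracy_threshold_ = value;
        return *this;
    }

    // Invariant: max_iteration_count >= 0.
    kmeans_params& set_max_iteration_count(std::int64_t value) {
        if (value < 0) throw std::invalid_argument("max_iteration_count must be >= 0");
        max_iteration_count_ = value;
        return *this;
    }

private:
    std::int64_t cluster_count_ = 2;          // documented default: 2
    double accuracy_threshold_ = 0.0;         // documented default: 0.0
    std::int64_t max_iteration_count_ = 100;  // documented default: 100
};
```

A rejected value leaves the previous setting untouched, because each setter throws before assigning.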

Method tags
struct lloyd_dense
Tag-type that denotes Lloyd’s computational method.
using by_default = lloyd_dense
Alias tag-type for Lloyd’s computational method.
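Below is a minimal, single-threaded C++ sketch of the textbook Lloyd iteration. It is not oneDAL's optimized lloyd_dense implementation. It makes three assumptions:

- Data and centroids are plain row-major std::vector<double> buffers instead of oneDAL tables.
- An empty cluster keeps its previous centroid.
- The loop stops when the objective decreases by at most accuracy_threshold, or after max_iteration_count iterations.

The names lloyd, assign, sq_dist, and lloyd_result are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical result type: k x p row-major centroids, n labels in [0, k).
struct lloyd_result {
    std::vector<double> centroids;
    std::vector<std::int64_t> labels;
    double objective = 0.0;  // sum of squared distances to the nearest centroid
    std::int64_t iteration_count = 0;
};

// Squared Euclidean distance between row i of a and row j of b (both p columns wide).
inline double sq_dist(const std::vector<double>& a, std::size_t i,
                      const std::vector<double>& b, std::size_t j, std::size_t p) {
    double s = 0.0;
    for (std::size_t f = 0; f < p; ++f) {
        const double d = a[i * p + f] - b[j * p + f];
        s += d * d;
    }
    return s;
}

// Assignment step: label every row with its nearest centroid; returns the objective.
inline double assign(const std::vector<double>& data, const std::vector<double>& centroids,
                     std::size_t n, std::size_t k, std::size_t p,
                     std::vector<std::int64_t>& labels) {
    double objective = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::infinity();
        std::int64_t best_j = 0;
        for (std::size_t j = 0; j < k; ++j) {
            const double d = sq_dist(data, i, centroids, j, p);
            if (d < best) { best = d; best_j = static_cast<std::int64_t>(j); }
        }
        labels[i] = best_j;
        objective += best;
    }
    return objective;
}

lloyd_result lloyd(const std::vector<double>& data, std::vector<double> centroids,
                   std::size_t n, std::size_t k, std::size_t p,
                   double accuracy_threshold, std::int64_t max_iteration_count) {
    assert(k > 0 && data.size() == n * p && centroids.size() == k * p);
    lloyd_result r;
    r.labels.assign(n, 0);
    double objective = assign(data, centroids, n, k, p, r.labels);
    for (std::int64_t t = 0; t < max_iteration_count; ++t) {
        // Update step: move each centroid to the mean of the rows assigned to it.
        std::vector<double> sums(k * p, 0.0);
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            const auto j = static_cast<std::size_t>(r.labels[i]);
            ++counts[j];
            for (std::size_t f = 0; f < p; ++f) sums[j * p + f] += data[i * p + f];
        }
        for (std::size_t j = 0; j < k; ++j) {
            if (counts[j] == 0) continue;  // empty cluster: keep the old centroid (one possible policy)
            for (std::size_t f = 0; f < p; ++f) centroids[j * p + f] = sums[j * p + f] / counts[j];
        }
        // Reassign with the new centroids and test the assumed stop rule.
        const double next = assign(data, centroids, n, k, p, r.labels);
        r.iteration_count = t + 1;
        const bool converged = objective - next <= accuracy_threshold;
        objective = next;
        if (converged) break;
    }
    r.centroids = std::move(centroids);
    r.objective = objective;
    return r;
}
```

Consider the one-dimensional points {0, 1, 10, 11} with initial centroids {0, 10} and threshold 1e-4. The centroids move to {0.5, 10.5} and the loop stops after two iterations, because the second iteration leaves the objective at 1.0. With the default threshold of 0.0, the "at most" comparison stops the loop as soon as the objective no longer decreases.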
Task tags
struct clustering
Tag-type that parameterizes entities used for solving the clustering problem.
using by_default = clustering
Alias tag-type for the clustering task.
Model
template<typename Task = task::by_default>
class model
Template Parameters
Task
– Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
model()
Creates a new instance of the class with the default property values.
Public Methods
std::int64_t get_cluster_count() const
The number of clusters $k$ in the trained model.
Properties
const table & centroids
A $k \times p$ table with the cluster centroids, where $p$ is the number of features. Each row of the table stores one centroid.
Default value: table{}.
Getter & Setter


const table & get_centroids() const
auto & set_centroids(const table &value)

Training
train(...)
Input
template<typename Task = task::by_default>
class train_input
Template Parameters
Task
– Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
train_input(const table & data)
train_input(const table & data, const table & initial_centroids)
Creates a new instance of the class with the given data and initial_centroids.
Properties
const table & initial_centroids
A $k \times p$ table with the initial centroids, where each row stores one centroid.
Getter & Setter


const table & get_initial_centroids() const
auto & set_initial_centroids(const table &data)

const table & data
An $n \times p$ table with the data to be clustered, where each row stores one feature vector.
Getter & Setter


const table & get_data() const
auto & set_data(const table &data)

Result
template<typename Task = task::by_default>
class train_result
Template Parameters
Task
– Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Properties
const table & responses
An $n \times 1$ table with the responses $y_i$ assigned to the samples $x_i$ in the input data, $1 \le i \le n$.
Default value: table{}.
Getter & Setter


const table & get_responses() const
auto & set_responses(const table &value)

int64_t iteration_count
The number of iterations performed by the algorithm.
Default value: 0.
Getter & Setter


int64_t get_iteration_count() const
auto & set_iteration_count(std::int64_t value)

Invariants


iteration_count >= 0

const model<Task> & model
The trained K-Means model.
Default value: model<Task>{}.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

double objective_function_value
The value of the objective function $\Phi_X(C) = \sum_{i=1}^{n} \min_{1 \le j \le k} \lVert x_i - c_j \rVert^2$, where $C$ is model.centroids.
Getter & Setter


double get_objective_function_value() const
auto & set_objective_function_value(double value)

Invariants


objective_function_value >= 0.0
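Here is a small worked instance of the objective $\Phi_X(C) = \sum_i \min_j \lVert x_i - c_j \rVert^2$. It uses made-up one-dimensional data, not oneDAL output. With $X = \{0, 1, 10, 11\}$ and $C = \{0.5, 10.5\}$, each point lies $0.5$ from its nearest centroid, so each term contributes $0.25$.

```latex
\Phi_X(C) = \sum_{i=1}^{4} \min_{1 \le j \le 2} (x_i - c_j)^2
          = (0 - 0.5)^2 + (1 - 0.5)^2 + (10 - 10.5)^2 + (11 - 10.5)^2
          = 4 \times 0.25 = 1
```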

const table & labels
An $n \times 1$ table with the labels $y_i$ assigned to the samples $x_i$ in the input data, $1 \le i \le n$.
Default value: table{}.
Getter & Setter


const table & get_labels() const
auto & set_labels(const table &value)

Operation
template<typename Descriptor>
kmeans::train_result train(const Descriptor & desc, const kmeans::train_input & input)
Parameters
  • desc
    – K-Means algorithm descriptor (kmeans::descriptor)
  • input
    – Input data for the training operation
Preconditions


input.data.has_data == true
input.initial_centroids.row_count == desc.cluster_count
input.initial_centroids.column_count == input.data.column_count

Postconditions


result.labels.row_count == input.data.row_count
result.labels.column_count == 1
result.labels[i] >= 0
result.labels[i] < desc.cluster_count
result.iteration_count <= desc.max_iteration_count
result.model.centroids.row_count == desc.cluster_count
result.model.centroids.column_count == input.data.column_count
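The preconditions and postconditions above can be restated as plain checks. This sketch works on a hypothetical shape struct (row and column counts) and a flat std::vector of labels rather than on oneDAL tables, so none of these names are oneDAL API. Because the labels vector is one-dimensional, it stands in for the labels.column_count == 1 condition.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical table shape used only for illustration (not a oneDAL type).
struct shape {
    std::int64_t row_count = 0;
    std::int64_t column_count = 0;
    bool has_data() const { return row_count > 0 && column_count > 0; }
};

// Preconditions of train(desc, input), restated over shapes.
bool train_preconditions_hold(const shape& data, const shape& initial_centroids,
                              std::int64_t cluster_count) {
    return data.has_data() &&
           initial_centroids.row_count == cluster_count &&
           initial_centroids.column_count == data.column_count;
}

// Postconditions on the training result; labels holds one label per input row.
bool train_postconditions_hold(const shape& data, const std::vector<std::int64_t>& labels,
                               std::int64_t iteration_count, const shape& centroids,
                               std::int64_t cluster_count, std::int64_t max_iteration_count) {
    if (static_cast<std::int64_t>(labels.size()) != data.row_count) return false;
    for (const auto label : labels) {
        if (label < 0 || label >= cluster_count) return false;  // each label in [0, k)
    }
    return iteration_count <= max_iteration_count &&
           centroids.row_count == cluster_count &&
           centroids.column_count == data.column_count;
}
```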

Inference
infer(...)
Input
template<typename Task = task::by_default>
class infer_input
Template Parameters
Task
– Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
infer_input(const model<Task> & trained_model, const table & data)
Creates a new instance of the class with the given model and data.
Properties
const model<Task> & model
The trained K-Means model.
Default value: model<Task>{}.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table & data
An $n \times p$ table with the data to be assigned to the clusters, where each row stores one feature vector.
Default value: table{}.
Getter & Setter


const table & get_data() const
auto & set_data(const table &value)

Result
template<typename Task = task::by_default>
class infer_result
Template Parameters
Task
– Tag-type that specifies the type of the problem to solve. Can be task::clustering.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
double objective_function_value
The value of the objective function $\Phi_X(C)$, where $C$ is defined by the corresponding infer_input::model::centroids.
Default value: 0.0.
Getter & Setter


double get_objective_function_value() const
auto & set_objective_function_value(double value)

Invariants


objective_function_value >= 0.0

const table & labels
An $n \times 1$ table with the labels assigned to the feature vectors in the input data.
Default value: table{}.
Getter & Setter


const table & get_labels() const
auto & set_labels(const table &value)

const table & responses
An $n \times 1$ table with the responses assigned to the feature vectors in the input data.
Default value: table{}.
Getter & Setter


const table & get_responses() const
auto & set_responses(const table &value)

Operation
template<typename Descriptor>
kmeans::infer_result infer(const Descriptor & desc, const kmeans::infer_input & input)
Parameters
  • desc
    – K-Means algorithm descriptor (kmeans::descriptor)
  • input
    – Input data for the inference operation
Preconditions


input.data.has_data == true
input.model.centroids.has_data == true
input.model.centroids.row_count == desc.cluster_count
input.model.centroids.column_count == input.data.column_count

Postconditions


result.labels.row_count == input.data.row_count
result.labels.column_count == 1
result.labels[i] >= 0
result.labels[i] < desc.cluster_count
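Inference assigns each new feature vector to its nearest trained centroid and accumulates the objective. The following is a minimal sketch, assuming:

- Data and centroids are row-major std::vector<double> buffers instead of oneDAL tables.
- Distance is squared Euclidean.
- Ties are resolved toward the lower cluster index (a choice made here, not documented behavior).

The assignment type and the assign_to_nearest function are hypothetical. The returned labels satisfy the postconditions above: one label per row, each in $[0, k)$.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical result type mirroring infer_result's labels and
// objective_function_value (not the oneDAL class).
struct assignment {
    std::vector<std::int64_t> labels;  // n labels, each in [0, k)
    double objective_function_value = 0.0;
};

// Assign each row of the n x p row-major `data` to the nearest row of the
// k x p row-major `centroids`; ties go to the lower cluster index.
assignment assign_to_nearest(const std::vector<double>& data,
                             const std::vector<double>& centroids,
                             std::size_t p) {
    assert(p > 0 && data.size() % p == 0 && centroids.size() % p == 0);
    const std::size_t n = data.size() / p;
    const std::size_t k = centroids.size() / p;
    assert(n > 0 && k > 0);  // mirrors the has_data preconditions
    assignment result;
    result.labels.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::infinity();
        std::int64_t best_j = 0;
        for (std::size_t j = 0; j < k; ++j) {
            double d = 0.0;  // squared distance from row i to centroid j
            for (std::size_t f = 0; f < p; ++f) {
                const double diff = data[i * p + f] - centroids[j * p + f];
                d += diff * diff;
            }
            if (d < best) { best = d; best_j = static_cast<std::int64_t>(j); }
        }
        result.labels.push_back(best_j);
        result.objective_function_value += best;
    }
    return result;
}
```

Take the points (0, 0), (0, 2), and (9, 9) with centroids (0, 1) and (10, 10). The labels are {0, 0, 1} and the objective is 1 + 1 + 2 = 4.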

Usage example

Training
kmeans::model<> run_training(const table& data, const table& initial_centroids) {
    // 10 clusters, at most 50 iterations, accuracy threshold 1e-4 for the stop condition.
    const auto kmeans_desc = kmeans::descriptor<float>{}
        .set_cluster_count(10)
        .set_max_iteration_count(50)
        .set_accuracy_threshold(1e-4);

    // initial_centroids must have cluster_count rows and as many columns as data.
    const auto result = train(kmeans_desc, data, initial_centroids);

    print_table("labels", result.get_labels());
    print_table("centroids", result.get_model().get_centroids());
    print_value("objective", result.get_objective_function_value());
    return result.get_model();
}
Inference
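table run_inference(const kmeans::model<>& model, const table& new_data) {
    // The descriptor's cluster_count must match the number of centroids in the model.
    const auto kmeans_desc = kmeans::descriptor<float>{}
        .set_cluster_count(model.get_cluster_count());

    const auto result = infer(kmeans_desc, model, new_data);

    print_table("labels", result.get_labels());
    return result.get_labels();  // the function is declared to return a table
}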

Examples

oneAPI DPC++
Batch Processing:
oneAPI C++
Batch Processing:
Python* with DPC++ support
Batch Processing:

Product and Performance Information


Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.