## Developer Guide and Reference

• 2021.3
• 06/28/2021
• Public Content

# k-Nearest Neighbors Classification (k-NN)

The k-NN classification algorithm infers the class for a new feature vector by computing the majority vote of the k nearest observations from the training set.
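The majority-vote rule at the heart of the algorithm can be sketched in a few lines. The helper below is an illustration only (it is not part of the oneDAL API): it takes the labels of the k nearest training observations and returns the most frequent one.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Majority vote over the labels of the k nearest neighbors.
// Ties are broken in favor of the smallest label (an arbitrary choice here).
std::int64_t majority_vote(const std::vector<std::int64_t>& neighbor_labels) {
    std::map<std::int64_t, std::size_t> counts;
    for (std::int64_t label : neighbor_labels)
        ++counts[label];
    std::int64_t best = neighbor_labels.front();
    std::size_t best_count = 0;
    for (const auto& [label, count] : counts) {
        if (count > best_count) {
            best = label;
            best_count = count;
        }
    }
    return best;
}
```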

## Mathematical formulation

### Training

Let $X = \{ x_1, \ldots, x_n \}$ be the training set of $p$-dimensional feature vectors and let $Y = \{ y_1, \ldots, y_n \}$ be the corresponding set of class labels, where $y_i \in \{ 0, \ldots, c-1 \}$, $1 \leq i \leq n$. Given $X$, $Y$, and the number of nearest neighbors $k$, the problem is to build a model that allows distance computation between the feature vectors in the training and inference sets at the inference stage.
**Training method: brute-force**

The training operation produces the model that stores all the feature vectors from the initial training set $X$.

**Training method: k-d tree**

The training operation builds a $k$-$d$ tree that partitions the training set $X$ (for more details, see k-d Tree).
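The partitioning idea behind the k-d tree can be sketched with a minimal median-split build. This is an illustration of the general technique only, not oneDAL's internal implementation; all names here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

using point = std::vector<double>; // one p-dimensional feature vector

struct kd_node {
    point pt;                          // splitting point stored at this node
    std::size_t axis;                  // coordinate used for the split
    std::unique_ptr<kd_node> left, right;
};

// Recursively build a k-d tree by splitting at the median of coordinate
// `depth % p`, so the splitting axis cycles through all p dimensions.
std::unique_ptr<kd_node> build_kdtree(std::vector<point> pts,
                                      std::size_t depth = 0) {
    if (pts.empty())
        return nullptr;
    const std::size_t p = pts.front().size();
    const std::size_t axis = depth % p;
    const std::size_t mid = pts.size() / 2;
    // Partially order so pts[mid] is the median along `axis`.
    std::nth_element(pts.begin(), pts.begin() + mid, pts.end(),
                     [axis](const point& a, const point& b) {
                         return a[axis] < b[axis];
                     });
    auto node = std::make_unique<kd_node>();
    node->pt = pts[mid];
    node->axis = axis;
    node->left = build_kdtree(
        std::vector<point>(pts.begin(), pts.begin() + mid), depth + 1);
    node->right = build_kdtree(
        std::vector<point>(pts.begin() + mid + 1, pts.end()), depth + 1);
    return node;
}
```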
### Inference

Let $X' = \{ x'_1, \ldots, x'_m \}$ be the inference set of $p$-dimensional feature vectors. Given $X'$, the model produced at the training stage, and the number of nearest neighbors $k$, the problem is to predict the label $y'_j$ for each $x'_j \in X'$, $1 \leq j \leq m$, by performing the following steps:
1. Identify the set $N(x'_j) \subseteq X$ of the $k$ feature vectors in the training set that are nearest to $x'_j$ with respect to the Euclidean distance, which is chosen by default. The distance can be customized with the predefined set of pairwise distances: Minkowski distances with fractional degree (including the Euclidean distance) and the Chebyshev distance.

2. Estimate the conditional probability for the $l$-th class as the fraction of vectors in $N(x'_j)$ whose labels are equal to $l$:

   $$P_{jl} = \frac{1}{|N(x'_j)|} \left| \{ x_r \in N(x'_j) : y_r = l \} \right|, \quad 1 \leq j \leq m, \quad 0 \leq l < c. \tag{1}$$

3. Predict the class that has the highest probability for the feature vector $x'_j$:

   $$y'_j = \arg\max_{0 \leq l < c} P_{jl}, \quad 1 \leq j \leq m. \tag{2}$$

**Inference method: brute-force**

The brute-force inference method determines the set $N(x'_j)$ of the nearest feature vectors by iterating over all the pairs $(x'_j, x_i)$ in an implementation-defined order, $1 \leq i \leq n$, $1 \leq j \leq m$. The final prediction is computed according to equations (1) and (2).
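The brute-force search together with the three prediction steps can be sketched as a standalone function. This is an illustration with the default Euclidean distance, not oneDAL code; the function and variable names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using point = std::vector<double>;

// Squared Euclidean distance; monotone in the Euclidean distance,
// so the neighbor ranking it induces is identical.
double squared_euclidean(const point& a, const point& b) {
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

// Predict the label of `query` by scanning all training pairs:
// step 1 finds the k nearest vectors, step 2 estimates the class
// fractions P_l, and step 3 returns argmax over l of P_l.
std::int64_t knn_predict(const std::vector<point>& train_x,
                         const std::vector<std::int64_t>& train_y,
                         std::int64_t class_count,
                         std::size_t k,
                         const point& query) {
    // Step 1: order training indices so the k closest come first.
    std::vector<std::size_t> idx(train_x.size());
    for (std::size_t i = 0; i < idx.size(); ++i)
        idx[i] = i;
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return squared_euclidean(train_x[a], query) <
                                 squared_euclidean(train_x[b], query);
                      });
    // Step 2: class fractions among the k nearest neighbors (equation (1)).
    std::vector<double> prob(class_count, 0.0);
    for (std::size_t j = 0; j < k; ++j)
        prob[train_y[idx[j]]] += 1.0 / k;
    // Step 3: the most probable class (equation (2)).
    return std::max_element(prob.begin(), prob.end()) - prob.begin();
}
```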
**Inference method: k-d tree**

The k-d tree inference method traverses the $k$-$d$ tree to find the feature vectors associated with a leaf node that are closest to $x'_j$, $1 \leq j \leq m$. The set $\tilde{N}(x'_j)$ of the currently known $k$ nearest neighbors is progressively updated during tree traversal. The search algorithm skips the nodes for which the distance between $x'_j$ and the respective part of the feature space is not less than the distance between $x'_j$ and the most distant feature vector in $\tilde{N}(x'_j)$. Once tree traversal is finished, $N(x'_j) = \tilde{N}(x'_j)$. The final prediction is computed according to equations (1) and (2).
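The predefined pairwise distances mentioned in step 1 have simple closed forms. A plain C++ sketch of the two families (illustrative only, not the oneDAL distance descriptors):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Minkowski distance of degree m: (sum_d |a_d - b_d|^m)^(1/m).
// m = 2 gives the Euclidean distance, m = 1 the Manhattan distance;
// fractional m > 0 is also allowed.
double minkowski_distance(const std::vector<double>& a,
                          const std::vector<double>& b, double m) {
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        s += std::pow(std::abs(a[d] - b[d]), m);
    return std::pow(s, 1.0 / m);
}

// Chebyshev distance: max_d |a_d - b_d|, the limit of the
// Minkowski distance as m goes to infinity.
double chebyshev_distance(const std::vector<double>& a,
                          const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        s = std::max(s, std::abs(a[d] - b[d]));
    return s;
}
```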

## Usage example

### Training

```cpp
knn::model<> run_training(const table& data,
                          const table& labels) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = train(knn_desc, data, labels);

   return result.get_model();
}
```
### Inference

```cpp
table run_inference(const knn::model<>& model,
                    const table& new_data) {
   const std::int64_t class_count = 10;
   const std::int64_t neighbor_count = 5;
   const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};

   const auto result = infer(knn_desc, model, new_data);

   print_table("labels", result.get_labels());
   return result.get_labels();
}
```

## Examples

- oneAPI DPC++: batch processing
- oneAPI C++: batch processing
- Python* with DPC++ support: batch processing

#### Product and Performance Information

¹ Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.