Developer Guide and Reference

  • 2021.4
  • 09/27/2021

k-Nearest Neighbors Classification and Search (k-NN)

k-NN classification and search algorithms are based on finding the k observations in the training set that are nearest to a given feature vector. For classification, the problem is to infer the class of a new feature vector by computing the majority vote of its k nearest observations from the training set. For search, the problem is to find the k nearest observations from the training set to a new feature vector. The nearest observations are computed based on the chosen distance metric.

Mathematical formulation

Training
Classification
Let X = { x_1, ..., x_n } be the training set of p-dimensional feature vectors and Y = { y_1, ..., y_n } be the corresponding set of class labels, where y_i ∈ {0, ..., C - 1}, 1 <= i <= n, and C is the number of classes. Given X, Y, and the number of nearest neighbors k, the problem is to build a model that allows distance computation between the feature vectors in the training and inference sets at the inference stage.
Search
Let X = { x_1, ..., x_n } be the training set of p-dimensional feature vectors. Given X and the number of nearest neighbors k, the problem is to build a model that allows distance computation between the feature vectors in the training and inference sets at the inference stage.
Training method: brute-force
The training operation produces a model that stores all the feature vectors from the initial training set X.
Training method: k-d tree
The training operation builds a k-d tree that partitions the training set X (for more details, see k-d Tree).
Inference
Classification
Let X' = { x'_1, ..., x'_m } be the inference set of p-dimensional feature vectors. Given X', the model produced at the training stage, and the number of nearest neighbors k, the problem is to predict the label y'_j from the Y set for each x'_j, 1 <= j <= m, by performing the following steps:
  1. Identify the set N(x'_j) of the k feature vectors in the training set that are nearest to x'_j with respect to the Euclidean distance, which is chosen by default. The distance can be customized with the predefined set of pairwise distances: Minkowski distances with fractional degree (including Euclidean distance), Chebyshev distance, and Cosine distance.
  2. Estimate the conditional probability for the l-th class as the fraction of vectors in N(x'_j) whose labels y_r are equal to l:
     P_{jl} = (1 / k) * |{ x_r ∈ N(x'_j) : y_r = l }|,  1 <= j <= m,  0 <= l < C.
  3. Predict the class that has the highest probability for the feature vector x'_j:
     y'_j = argmax_{0 <= l < C} P_{jl},  1 <= j <= m.
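The three classification steps above can be sketched in plain C++. This is an illustrative standalone implementation of brute-force majority voting, not the oneDAL API; all names are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Brute-force k-NN classification: find the k nearest training vectors
// (step 1), estimate per-class probabilities as label fractions (step 2),
// and predict the most probable class (step 3).
int knn_classify(const std::vector<std::vector<double>>& train_x,
                 const std::vector<int>& train_y,
                 const std::vector<double>& query,
                 std::size_t k, int class_count) {
    // Step 1: squared Euclidean distance from the query to every training vector.
    std::vector<std::pair<double, std::size_t>> dist(train_x.size());
    for (std::size_t i = 0; i < train_x.size(); ++i) {
        double d = 0.0;
        for (std::size_t j = 0; j < query.size(); ++j) {
            const double diff = train_x[i][j] - query[j];
            d += diff * diff;
        }
        dist[i] = {d, i};
    }
    // Keep the k nearest neighbors at the front.
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    // Step 2: class probabilities as fractions of labels among the k neighbors.
    std::vector<double> prob(class_count, 0.0);
    for (std::size_t i = 0; i < k; ++i)
        prob[train_y[dist[i].second]] += 1.0 / static_cast<double>(k);
    // Step 3: the class with the highest probability wins the vote.
    return static_cast<int>(
        std::max_element(prob.begin(), prob.end()) - prob.begin());
}
```

A query near the cluster of one class is assigned that class by the majority vote of its k neighbors.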
Search
Let X' = { x'_1, ..., x'_m } be the inference set of p-dimensional feature vectors. Given X', the model produced at the training stage, and the number of nearest neighbors k:
  1. Identify the set N(x'_j) of the k feature vectors in the training set that are nearest to x'_j with respect to the Euclidean distance, which is chosen by default. The distance can be customized with the predefined set of pairwise distances: Minkowski distances with fractional degree (including Euclidean distance), Chebyshev distance, and Cosine distance.
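The predefined distance metrics mentioned above can be sketched as follows. These are illustrative standalone C++ functions, not the oneDAL distance primitives:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Minkowski distance of degree p; p may be fractional, and p = 2 gives the
// default Euclidean distance.
double minkowski(const std::vector<double>& a, const std::vector<double>& b,
                 double p) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += std::pow(std::fabs(a[i] - b[i]), p);
    return std::pow(s, 1.0 / p);
}

// Chebyshev distance: the largest coordinate-wise absolute difference.
double chebyshev(const std::vector<double>& a, const std::vector<double>& b) {
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        m = std::max(m, std::fabs(a[i] - b[i]));
    return m;
}

// Cosine distance: one minus the cosine of the angle between the vectors.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return 1.0 - dot / (std::sqrt(na) * std::sqrt(nb));
}
```

For example, minkowski(a, b, 2.0) on the points (0, 0) and (3, 4) yields the Euclidean distance 5, while chebyshev yields 4.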
Inference method: brute-force
The brute-force inference method determines the set N(x'_j) of the nearest feature vectors by iterating over all the pairs (x'_j, x_i) in an implementation-defined order, 1 <= i <= n, 1 <= j <= m.
Inference method: k-d tree
The k-d tree inference method traverses the k-d tree to find the feature vectors associated with a leaf node that are closest to x'_j, 1 <= j <= m. The set N(x'_j) of the currently known nearest k neighbors is progressively updated during the tree traversal. The search algorithm prunes the nodes for which the distance between x'_j and the respective part of the feature space is not less than the distance between x'_j and the most distant feature vector from N(x'_j). Once the tree traversal is finished, N(x'_j) contains the k nearest neighbors of x'_j.
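The pruning rule described above can be illustrated with a simplified 1-nearest-neighbor k-d tree search in standalone C++ (hypothetical names; the actual implementation maintains a set of k candidates rather than a single best point):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// A k-d tree node: a point, the splitting axis, and two subtrees.
struct Node {
    std::vector<double> point;
    std::size_t axis = 0;
    std::unique_ptr<Node> left, right;
};

// Build a balanced k-d tree by splitting on the median along a cycling axis.
std::unique_ptr<Node> build(std::vector<std::vector<double>> pts,
                            std::size_t depth = 0) {
    if (pts.empty()) return nullptr;
    const std::size_t axis = depth % pts[0].size();
    std::sort(pts.begin(), pts.end(),
              [axis](const auto& a, const auto& b) { return a[axis] < b[axis]; });
    const std::size_t mid = pts.size() / 2;
    auto node = std::make_unique<Node>();
    node->point = pts[mid];
    node->axis = axis;
    node->left = build({pts.begin(), pts.begin() + mid}, depth + 1);
    node->right = build({pts.begin() + mid + 1, pts.end()}, depth + 1);
    return node;
}

double sq_dist(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Recursive search: descend into the near subtree first, then visit the far
// subtree only if its region can still contain a closer point.
void search(const Node* node, const std::vector<double>& q,
            const std::vector<double>** best, double* best_d) {
    if (!node) return;
    const double d = sq_dist(node->point, q);
    if (d < *best_d) { *best_d = d; *best = &node->point; }
    const double delta = q[node->axis] - node->point[node->axis];
    const Node* near = delta < 0 ? node->left.get() : node->right.get();
    const Node* far = delta < 0 ? node->right.get() : node->left.get();
    search(near, q, best, best_d);
    // Prune: skip the far subtree when the splitting plane is already farther
    // from the query than the best neighbor found so far.
    if (delta * delta < *best_d) search(far, q, best, best_d);
}
```

In the worst case the search still visits every node, but on typical data the pruning condition skips most of the tree.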

Usage example

Training
knn::model<> run_training(const table& data, const table& labels) {
    const std::int64_t class_count = 10;
    const std::int64_t neighbor_count = 5;
    const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};
    const auto result = train(knn_desc, data, labels);
    return result.get_model();
}
Inference
table run_inference(const knn::model<>& model, const table& new_data) {
    const std::int64_t class_count = 10;
    const std::int64_t neighbor_count = 5;
    const auto knn_desc = knn::descriptor<float>{class_count, neighbor_count};
    const auto result = infer(knn_desc, model, new_data);
    print_table("labels", result.get_labels());
    return result.get_labels();
}

Examples

oneAPI DPC++
Batch Processing:
oneAPI C++
Batch Processing:
Python* with DPC++ support
Batch Processing:

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.