Developer Guide and Reference

  • 2021.4
  • 09/27/2021
  • Public Content

Naïve Bayes Classifier

Naïve Bayes is a family of simple yet powerful classification methods often used for text classification, medical diagnosis, and other classification problems. Although these classifiers assume that features are independent, they often work well even when this assumption does not hold. Another advantage is that they require only a small amount of training data to estimate the model parameters.

Details

The library provides the Multinomial Naïve Bayes classifier [Renie03].
Let $J$ be the number of classes, indexed $0, 1, \ldots, J-1$. The integer-valued feature vector $x_i = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, n$, contains scaled frequencies: the value of $x_{ik}$ is the number of times the $k$-th feature is observed in the vector $x_i$ (in terms of the document classification problem, $x_{ik}$ is the number of occurrences of the word indexed $k$ in the document $x_i$). For a given data set (a set of $n$ documents), $(x_1, \ldots, x_n)$, the problem is to train a Naïve Bayes classifier.
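To make the feature representation concrete, the following minimal sketch builds one count vector from a tokenized document. The vocabulary, the tokens, and the helper name `count_vector` are hypothetical and chosen only for illustration; they are not part of the library.

```python
from collections import Counter

def count_vector(tokens, vocabulary):
    """Return x_i, where x_i[k] is how often vocabulary[k] occurs in tokens."""
    counts = Counter(tokens)
    # Words outside the vocabulary are simply ignored.
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary with p = 3 features.
vocabulary = ["ball", "goal", "vote"]
# Hypothetical tokenized document; "crowd" is not in the vocabulary.
document = ["goal", "ball", "goal", "crowd"]

x_i = count_vector(document, vocabulary)
print(x_i)
```

Each position of the resulting vector corresponds to one feature index k, which is the layout the training and prediction formulas assume.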
Training Stage
The Training stage involves calculation of these parameters:
  • $\log(\theta_{jk}) = \log\left(\frac{N_{jk} + \alpha_k}{N_j + \alpha}\right)$, where $N_{jk}$ is the number of occurrences of the feature $k$ in the class $j$, $N_j$ is the total number of occurrences of all features in the class, the $\alpha_k$ parameter is the imagined number of occurrences of the feature $k$ (for example, $\alpha_k = 1$), and $\alpha$ is the sum of all $\alpha_k$.
  • $\log(p(\theta_j))$, where $p(\theta_j)$ is the prior class estimate.
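The training parameters can be illustrated with a small plain-Python sketch. It assumes the usual smoothed multinomial estimate, theta_jk = (N_jk + alpha_k) / (N_j + alpha), with the same alpha_k for every feature, and it assumes uniform priors when none are given. The function name, the arguments, and the toy data are hypothetical; this is not the library's implementation or API.

```python
import math

def train_multinomial_nb(X, y, num_classes, alpha_k=1.0, priors=None):
    """Return (log_prior, log_theta) for count vectors X and labels y."""
    num_features = len(X[0])
    # N_jk[j][k]: occurrences of feature k in documents of class j.
    N_jk = [[0] * num_features for _ in range(num_classes)]
    for x, j in zip(X, y):
        for k, value in enumerate(x):
            N_jk[j][k] += value

    alpha = alpha_k * num_features  # sum of all alpha_k (all equal here)
    log_theta = []
    for j in range(num_classes):
        N_j = sum(N_jk[j])  # total occurrences of all features in class j
        log_theta.append([
            math.log((N_jk[j][k] + alpha_k) / (N_j + alpha))
            for k in range(num_features)
        ])

    # Assumption of this sketch: uniform priors unless the caller supplies them.
    if priors is None:
        priors = [1.0 / num_classes] * num_classes
    log_prior = [math.log(p) for p in priors]
    return log_prior, log_theta

# Toy data: three documents, three features, two classes.
X = [[2, 1, 0], [1, 0, 0], [0, 1, 3]]
y = [0, 0, 1]
log_prior, log_theta = train_multinomial_nb(X, y, num_classes=2)
```

With alpha_k = 1, a feature that never occurs in a class still receives a nonzero probability, so its logarithm stays finite.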
Prediction Stage
Given a new feature vector $x_i$, the classifier determines the class the vector belongs to:
$\mathrm{class}(x_i) = \mathrm{argmax}_j \left( \log(p(\theta_j)) + \sum_k x_{ik} \log(\theta_{jk}) \right)$
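As an illustration of the prediction step, the sketch below applies the standard multinomial decision rule: choose the class j that maximizes log p(theta_j) plus the sum over k of x_ik * log(theta_jk). The parameter values are hypothetical and hard-coded so that the example is self-contained; the function name `predict` is likewise an assumption, not a library call.

```python
import math

def predict(x, log_prior, log_theta):
    """Return the index of the class with the highest log-score for x."""
    scores = [
        log_prior[j] + sum(x_k * lt for x_k, lt in zip(x, log_theta[j]))
        for j in range(len(log_prior))
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical trained parameters for two classes and three features.
log_prior = [math.log(0.5), math.log(0.5)]
log_theta = [
    [math.log(4 / 7), math.log(2 / 7), math.log(1 / 7)],  # class 0
    [math.log(1 / 7), math.log(2 / 7), math.log(4 / 7)],  # class 1
]

label_a = predict([3, 0, 0], log_prior, log_theta)  # dominated by feature 0
label_b = predict([0, 1, 2], log_prior, log_theta)  # dominated by feature 2
```

Working with sums of logarithms rather than products of probabilities avoids numeric underflow for long documents.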

Computation

The following computation modes are available:
  • Batch Processing
  • Online Processing
  • Distributed Processing
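Batch, online, and distributed processing can be expected to yield the same model for this algorithm because the per-class feature counts are additive: counts computed on separate blocks of data can be summed before the logarithms are taken. The sketch below illustrates that property only. It is not the library's API, and the helper names `partial_counts` and `merge_counts` are hypothetical.

```python
def partial_counts(X_block, y_block, num_classes):
    """Per-class feature counts N_jk for one block of the data set."""
    num_features = len(X_block[0])
    counts = [[0] * num_features for _ in range(num_classes)]
    for x, j in zip(X_block, y_block):
        for k, value in enumerate(x):
            counts[j][k] += value
    return counts

def merge_counts(a, b):
    """Combine the counts of two blocks by element-wise addition."""
    return [[u + v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Toy data: four documents, three features, two classes.
X = [[2, 1, 0], [1, 0, 0], [0, 1, 3], [0, 0, 1]]
y = [0, 0, 1, 1]

# Batch: all data at once.
batch = partial_counts(X, y, num_classes=2)

# Online or distributed: two blocks processed separately, then merged.
merged = merge_counts(
    partial_counts(X[:2], y[:2], num_classes=2),
    partial_counts(X[2:], y[2:], num_classes=2),
)
```

In this sketch the merged counts equal the batch counts, so any parameters derived from them are identical as well.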

Examples

C++ (CPU)
Batch Processing:
Online Processing:
Distributed Processing:
Java*
There is no support for Java on GPU.
Batch Processing:
Online Processing:
Distributed Processing:
Python*
Batch Processing:
Online Processing:
Distributed Processing:

Performance Considerations

Training Stage
To get the best overall performance at the Naïve Bayes classifier training stage:
  • If input data is homogeneous:
    • For the training data set, use a homogeneous numeric table of the same type as specified in the algorithmFPType class template parameter.
    • For class labels, use a homogeneous numeric table of type int.
  • If input data is non-homogeneous, use the array-of-structures (AOS) layout rather than the structure-of-arrays (SOA) layout.
The training stage of the Naïve Bayes classifier algorithm is memory-access bound in most cases. Therefore, use an efficient data layout whenever possible.
Prediction Stage
To get the best overall performance at the Naïve Bayes classifier prediction stage:
  • For the working data set, use a homogeneous numeric table of the same type as specified in the algorithmFPType class template parameter.
  • For predicted labels, use a homogeneous numeric table of type int.
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.
Notice revision #20201201
