Developer Guide and Reference

  • 2021.4
  • 09/27/2021
  • Public Content

Multivariate BACON Outlier Detection

In multivariate outlier detection methods, the observation point is the entire feature vector.

Details

Given a set $X$ of $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of dimension $p$, the problem is to identify the vectors that do not belong to the underlying distribution using the BACON method (see [Billor2000]).
In the iterative method, each iteration involves several steps:
  1. Identify an initial basic subset of $m > p$ feature vectors that can be assumed as not containing outliers. The constant $m$ is set to $5p$. The library supports two approaches to selecting the initial subset:
    • Based on distances from the medians $\|x_i - \mathrm{med}\|$, where:
      • $\mathrm{med}$ is the vector of coordinate-wise medians
      • $\|\cdot\|$ is the vector norm
      • $i = 1, \ldots, n$
    • Based on the Mahalanobis distance $d_i(\mathrm{mean}, S) = \sqrt{(x_i - \mathrm{mean})^T S^{-1} (x_i - \mathrm{mean})}$, where:
      • $\mathrm{mean}$ and $S$ are the mean and the covariance matrix, respectively, of the $n$ feature vectors
      • $i = 1, \ldots, n$
    Each method chooses the $m$ feature vectors with the smallest distances.
  2. Compute the discrepancies using the Mahalanobis distance above, where $\mathrm{mean}$ and $S$ are the mean and the covariance matrix, respectively, computed for the feature vectors contained in the basic subset.
  3. Set the new basic subset to all feature vectors with the discrepancy less than $c_{npr} \chi_{p, \alpha/n}$, where:
    • $\chi^2_{p, \alpha}$ is the $(1 - \alpha)$ percentile of the Chi-square distribution with $p$ degrees of freedom
    • $c_{npr} = c_{np} + c_{hr}$, where:
      • $r$ is the size of the current basic subset
      • $c_{hr} = \max\{0, (h - r)/(h + r)\}$, where $h = [(n + p + 1)/2]$ and $[\cdot]$ is the integer part of a number
      • $c_{np} = 1 + \frac{p + 1}{n - p} + \frac{2}{n - 1 - 3p}$
  4. Iterate steps 2 and 3 until the size of the basic subset no longer changes.
  5. Nominate the feature vectors that are not part of the final basic subset as outliers.
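The median-based choice of the initial subset in step 1 and the correction factor in step 3 can be sketched in plain Python. This is a minimal illustration and not the library's implementation. The function names initial_subset_by_median and correction_factor are hypothetical, the vector norm is assumed to be Euclidean, and the expressions for c_np, c_hr, and h are assumed to follow Billor et al. (2000): c_np = 1 + (p + 1)/(n - p) + 2/(n - 1 - 3p), c_hr = max(0, (h - r)/(h + r)), h = integer part of (n + p + 1)/2.

```python
import math
from statistics import median


def initial_subset_by_median(data, m):
    """Return the sorted indices of the m rows closest to the coordinate-wise medians.

    Assumption: "vector norm" is taken to be the Euclidean norm.
    """
    p = len(data[0])
    # Vector of coordinate-wise medians.
    med = [median(row[j] for row in data) for j in range(p)]
    # Distance of every feature vector from the median vector.
    dist = [
        math.sqrt(sum((row[j] - med[j]) ** 2 for j in range(p)))
        for row in data
    ]
    # Keep the m feature vectors with the smallest distances.
    order = sorted(range(len(data)), key=lambda i: dist[i])
    return sorted(order[:m])


def correction_factor(n, p, r):
    """Return c_npr = c_np + c_hr (formulas assumed from Billor et al., 2000)."""
    h = (n + p + 1) // 2                      # integer part of (n + p + 1) / 2
    c_hr = max(0.0, (h - r) / (h + r))        # shrinks to 0 once r reaches h
    c_np = 1.0 + (p + 1) / (n - p) + 2.0 / (n - 1 - 3 * p)
    return c_np + c_hr


# Hypothetical toy data: five points near the origin and one distant point.
points = [[0, 0], [1, 1], [1, 0], [0, 1], [0.5, 0.5], [100, 100]]
subset = initial_subset_by_median(points, 5)

# Correction factor for a small and for a large basic subset.
early = correction_factor(n=100, p=2, r=10)
late = correction_factor(n=100, p=2, r=60)
```

In this toy data the distant point is left out of the initial subset. The correction factor is larger while the basic subset is small (r below h) and settles at c_np once r reaches h.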

Batch Processing

Algorithm Input
The multivariate BACON outlier detection algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

| Input ID | Input |
|---|---|
| data | Pointer to the $n \times p$ numeric table with the data for outlier detection. The input can be an object of any class derived from the NumericTable class. |
Algorithm Parameters
The multivariate BACON outlier detection algorithm has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| initializationMethod | baconMedian | The initialization method. Can be baconMedian (median-based method) or defaultDense (Mahalanobis distance-based method). |
| alpha | 0.05 | One-tailed probability that defines the $(1 - \alpha)$ quantile of the $\chi^2$ distribution with $p$ degrees of freedom. Recommended value: $\alpha / n$, where $n$ is the number of observations. |
| toleranceToConverge | 0.005 | The stopping criterion. The algorithm terminates when the size of the basic subset changes by less than this threshold. |
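To give a feel for how alpha moves the chi-square cutoff, here is a small worked example restricted to p = 2. In that special case the chi-square CDF has the closed form 1 - exp(-x/2), so the (1 - alpha) quantile is -2 ln(alpha). The helper name chi_square_quantile_2dof and the value n = 100 are hypothetical, and dividing alpha by n is assumed to be the recommended scaling. This is not a library call.

```python
import math


def chi_square_quantile_2dof(alpha):
    """(1 - alpha) quantile of the chi-square distribution with 2 degrees of freedom.

    Valid only for p = 2, where the CDF is 1 - exp(-x / 2).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -2.0 * math.log(alpha)


n = 100  # hypothetical number of observations

# Cutoff for the default alpha, and for alpha scaled by the number of observations.
default_cutoff = chi_square_quantile_2dof(0.05)
scaled_cutoff = chi_square_quantile_2dof(0.05 / n)
```

A smaller alpha gives a larger cutoff, so fewer feature vectors are flagged as outliers. For other values of p a numerical chi-square quantile routine is needed.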
Algorithm Output
The multivariate BACON outlier detection algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

| Result ID | Result |
|---|---|
| weights | Pointer to the $n \times 1$ numeric table of zeros and ones. Zero in the $i$-th position indicates that the $i$-th feature vector is an outlier. By default, the result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable. |
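Once the weights table has been copied into ordinary Python data, reading it is a simple filter. The sketch below assumes the table has already been converted to a list of single-element rows. The function name outlier_indices and the sample values are hypothetical, and no library API is used.

```python
def outlier_indices(weights):
    """Return the positions whose weight is zero, that is, the detected outliers.

    `weights` is assumed to be an n x 1 table given as a list of one-element rows.
    """
    return [i for i, row in enumerate(weights) if row[0] == 0]


# Hypothetical result for four feature vectors: the second and fourth are outliers.
sample_weights = [[1], [0], [1], [0]]
flagged = outlier_indices(sample_weights)
```

The returned positions are zero-based, so they index directly into the rows of the input data table.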

Examples

C++ (CPU)
Batch Processing:
Java*
There is no support for Java on GPU.
Batch Processing:
Python*
Batch Processing:
