Application Notes for oneMKL Summary Statistics

ID 772991
Date 3/31/2023
Public

Detecting Multivariate Outliers

Use the BACON algorithm to detect multivariate outliers [Billor2000].

The parameters of the algorithm are packed into the BaconParams array. Use the EditOutliersDetection editor to pass the pointer to this array, along with the other required parameters, into the library. The Structure of the Array of BACON Parameters table in the Summary Statistics section of [MKLMan] describes the structure of the BaconParams array.

The BACON algorithm outputs a vector of weights, BaconWeights, whose elements take the following values:

  1. If the i-th observation is detected as an outlier, BaconWeights(i) = 0.

  2. If the vector of input weights is not provided and the i-th observation is not detected as an outlier, BaconWeights(i) = 1.

  3. In all other cases, BaconWeights(i) = w(i), where w is the vector of input weights.
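
The three rules above can be sketched as a small lookup function. This is only an illustration of the rules as stated here, not oneMKL code: the name expected_bacon_weight and its parameters are hypothetical, and passing NULL for the input weights is an assumed way of expressing "the vector of input weights is not provided".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): returns the value that the
 * rules above predict for BaconWeights(i).
 *   is_outlier    - nonzero if the i-th observation was detected as an outlier
 *   input_weights - vector of input weights w of length n, or NULL if the
 *                   vector of input weights is not provided (assumption)
 */
double expected_bacon_weight(int is_outlier, const double *input_weights,
                             size_t n, size_t i)
{
    assert(i < n);

    if (is_outlier) {
        return 0.0;               /* rule 1: outliers get weight 0 */
    }
    if (input_weights == NULL) {
        return 1.0;               /* rule 2: no input weights, not an outlier */
    }
    return input_weights[i];      /* rule 3: keep the input weight w(i) */
}
```

Note that under rule 3 an observation whose input weight is already 0 cannot be told apart from a detected outlier by its output weight alone.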

The example below illustrates outlier detection with the BACON algorithm:

#include "mkl_vsl.h"
 
#define DIM 10     /* dimension of the task */
#define N   1000   /* number of observations */
#define M      3   /* number of BACON algorithm parameters */
 
int main()
{
    VSLSSTaskPtr task;
    double x[DIM][N];  /* matrix of observations; fill with data before computing */
    double BaconParams[VSL_SS_BACON_PARAMS_N];
    double BaconWeights[N];
    MKL_INT p, n, xstorage;
    MKL_INT NParams;
    int status;
    double init_method, alpha, beta;
 
    /* Task and Initialization Parameters */
    p = DIM;
    n = N;
    xstorage = VSL_SS_MATRIX_STORAGE_ROWS;
 
    /* Parameters of the BACON algorithm */
    init_method = VSL_SS_METHOD_BACON_MEDIAN_INIT;
    alpha  = 0.01;
    beta   = 0.01;
    NParams = VSL_SS_BACON_PARAMS_N;
 
    BaconParams[0] = init_method;
    BaconParams[1] = alpha;
    BaconParams[2] = beta;
 
    /* Create a task */
    status = vsldSSNewTask( &task, &p, &n, &xstorage, (double*)x, 0, 0 );
 
    /* Initialize the task parameters */
    status = vsldSSEditOutliersDetection( task, &NParams, BaconParams, 
                                          BaconWeights );
 
    /* Detect the outliers in the observations */
    status = vsldSSCompute( task, VSL_SS_OUTLIERS, VSL_SS_METHOD_BACON );
 
    /* BaconWeights now holds zeros and/or ones (no input weights were provided) */
 
    /* Deallocate the task resources */
    status = vslSSDeleteTask( &task );
 
    return 0;
}
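
Once the computation has finished, the weights can be post-processed in ordinary C. The sketch below counts the observations marked as outliers. The helper count_outliers is hypothetical and not part of oneMKL. It assumes that no input weights were provided, so a weight of exactly 0 can only mean "outlier".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of oneMKL): counts the observations whose
 * output weight is exactly 0, that is, those detected as outliers.
 * Assumes no input weights were provided, so every weight is 0 or 1. */
size_t count_outliers(const double *weights, size_t n)
{
    size_t count = 0;
    size_t i;

    assert(weights != NULL || n == 0);

    for (i = 0; i < n; ++i) {
        if (weights[i] == 0.0) {
            ++count;
        }
    }
    return count;
}
```

In the example above, a call such as count_outliers(BaconWeights, N) after vsldSSCompute would give the number of detected outliers. It is only meaningful if the returned status indicates that the computation completed.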

NOTE:

Outlier detection is supported only for data that is available all at once, either as the complete dataset or as a separate block of it.

Calculation of the Mahalanobis distance used in the BACON algorithm requires computation of the inverse of the variance-covariance matrix. In some cases the inverse cannot be calculated, for example, if components of the random vector are linearly dependent. The oneMKL version of the BACON algorithm checks the invertibility of the matrix by calculating its eigenvalues. If the minimum eigenvalue is non-positive, the algorithm searches for the smallest eigenvalue E that exceeds 1000*P, where P is the smallest positive floating-point number. If the routine fails to find such an eigenvalue, the computations terminate with a corresponding error code. Otherwise, the variance-covariance matrix is corrected by adding 0.01*E to the elements of the main diagonal, and the calculations continue. Upon successful completion, the function returns the VSL_SS_NOT_FULL_RANK_MATRIX warning, indicating that the algorithm has detected a variance-covariance matrix that is not of full rank.
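
The correction step described above can be sketched in plain C. This illustrates the description only and is not the oneMKL implementation. The function name, the return codes, and the row-major p-by-p layout are hypothetical. The eigenvalues are assumed to be computed elsewhere, and P is assumed to be DBL_MIN, the smallest positive normalized double.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical sketch (not the oneMKL implementation).
 *   cov - p-by-p variance-covariance matrix, row-major, corrected in place
 *   eig - its p eigenvalues, computed elsewhere
 * Returns  0 if the minimum eigenvalue is positive (no correction needed),
 *          1 if 0.01*E was added to the main diagonal,
 *         -1 if no eigenvalue exceeds 1000*P (correction is impossible). */
int correct_covariance(double *cov, const double *eig, size_t p)
{
    const double threshold = 1000.0 * DBL_MIN; /* assumption: P = DBL_MIN */
    double min_eig, e = 0.0;
    int found = 0;
    size_t i;

    assert(cov != NULL && eig != NULL && p > 0);

    /* Find the minimum eigenvalue. */
    min_eig = eig[0];
    for (i = 1; i < p; ++i) {
        if (eig[i] < min_eig) min_eig = eig[i];
    }
    if (min_eig > 0.0) return 0;

    /* Search for the smallest eigenvalue E that exceeds 1000*P. */
    for (i = 0; i < p; ++i) {
        if (eig[i] > threshold && (!found || eig[i] < e)) {
            e = eig[i];
            found = 1;
        }
    }
    if (!found) return -1;

    /* Add 0.01*E to the elements of the main diagonal. */
    for (i = 0; i < p; ++i) {
        cov[i * p + i] += 0.01 * e;
    }
    return 1;
}
```

In this sketch, return value 1 corresponds to the case in which the library continues and finally reports the VSL_SS_NOT_FULL_RANK_MATRIX warning, and -1 corresponds to termination with an error code.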