## Using the BACON Algorithm for Outlier Detection

The BACON algorithm is a tool for outlier detection that finds "suspicious" observations and provides unbiased statistical estimates for contaminated datasets.

Consider a dataset generated from a multivariate Gaussian distribution using the corresponding generator available in Intel® oneAPI Math Kernel Library (oneMKL). Some of the observations are then replaced with outliers drawn from a multivariate Gaussian distribution that has a significantly larger mean. Approximately 20% of the observations are outliers.
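For experimenting with this kind of setup without oneMKL, a contaminated sample can be sketched in plain C. This is an illustration only and not the oneMKL generator: it uses a small linear congruential generator, approximates each Gaussian component by a sum of 12 uniform values, and assumes independent unit-variance components stored one observation per row. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: uniform value strictly inside (0, 1) from a 64-bit LCG. */
static double uniform01(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* Keep the top 52 bits and add 0.5 so that 0 and 1 are never returned. */
    return ((double)(*state >> 12) + 0.5) / 4503599627370496.0;
}

/* Approximate N(0, 1) variate: sum of 12 uniforms minus 6 (mean 0, variance 1). */
static double approx_normal(uint64_t *state)
{
    double sum = 0.0;
    for (int k = 0; k < 12; ++k)
        sum += uniform01(state);
    return sum - 6.0;
}

/*
 * Fill x (n observations by p variables, one observation per row).
 * Each observation is an outlier with probability `fraction`; outliers
 * are centered at `shift`, all other observations at 0.
 * is_outlier[i] is set to 1 for outlier rows and 0 otherwise.
 * Returns the number of outlier rows.
 */
size_t make_contaminated(double *x, int *is_outlier, size_t n, size_t p,
                         double fraction, double shift, uint64_t seed)
{
    uint64_t state = seed;
    size_t n_out = 0;
    for (size_t i = 0; i < n; ++i) {
        is_outlier[i] = uniform01(&state) < fraction;
        double mean = is_outlier[i] ? shift : 0.0;
        for (size_t j = 0; j < p; ++j)
            x[i * p + j] = mean + approx_normal(&state);
        n_out += (size_t)is_outlier[i];
    }
    return n_out;
}
```

Calling `make_contaminated(x, flags, 1000, 2, 0.2, 10.0, 42)` yields a sample in which roughly one observation in five is shifted far from the bulk of the data, which matches the contamination level described above.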

To detect the outliers using the BACON algorithm, do the following:

1. Initialize the algorithm parameters:

   - Define the initialization scheme of the algorithm. You can choose between the median-based and the Mahalanobis distance-based schemes.
   - Define the rejection level as `alpha` and the stopping criteria level as `beta`.

   The parameters are initialized as follows:

       init_method = VSL_SS_METHOD_BACON_MEDIAN_INIT;
       /* alpha (the rejection level) must also be assigned before it is used below; its value is not shown here */
       beta = 0.005;
       BaconN = VSL_SS_BACON_PARAMS_N;
       BaconParams[0] = init_method;
       BaconParams[1] = alpha;
       BaconParams[2] = beta;

   For details on the parameters, see the table "Structure of the Array of BACON Parameters" in the Summary Statistics section of [MKLMan].

2. Pass the parameters into the library using a suitable editor:

       errcode = vsldSSEditOutliersDetection( task, &BaconN, BaconParams, BaconWeights );

   The `BaconWeights` parameter is an array of weights that holds the output of the algorithm and points at suspicious observations. The size of the array equals the number of observations. A value of 0 in the `i`-th position of the array indicates that the `i`-th observation requires special attention. A value of 1 indicates that the observation is not flagged as an outlier.

3. Call the `Compute` routine:

       errcode = vsldSSCompute( task, VSL_SS_OUTLIERS, VSL_SS_METHOD_BACON );

When the computation completes, the `BaconWeights` array contains the weights of the observations: 0 for each observation that has to be analyzed as a possible outlier and 1 for every other observation. You can use this array in further data processing: register it as the array of observation `weights` and compute estimates in the usual manner. With the outliers excluded in this way, the statistical estimates for the contaminated dataset are expected to be unbiased.
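As a rough illustration of what using the output as observation weights amounts to, the following plain C sketch computes a weighted mean of one variable from a 0/1 weight array. It does not call oneMKL, and `count_flagged` and `weighted_mean` are hypothetical helper names; in a real application the library computes the estimates once the weights are registered with the task.

```c
#include <stddef.h>

/* Number of observations flagged as suspicious (weight equal to 0). */
size_t count_flagged(const double *weights, size_t n)
{
    size_t flagged = 0;
    for (size_t i = 0; i < n; ++i)
        if (weights[i] == 0.0)
            ++flagged;
    return flagged;
}

/*
 * Weighted mean of one variable: sum(w[i] * x[i]) / sum(w[i]).
 * Stores the result in *mean and returns 0 on success.
 * Returns -1 and leaves *mean untouched if the weights sum to zero.
 */
int weighted_mean(const double *x, const double *weights, size_t n,
                  double *mean)
{
    double sum_w = 0.0;
    double sum_wx = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum_w += weights[i];
        sum_wx += weights[i] * x[i];
    }
    if (sum_w == 0.0)
        return -1;
    *mean = sum_wx / sum_w;
    return 0;
}
```

For the observations `{1, 2, 3, 100}` with weights `{1, 1, 1, 0}`, `weighted_mean` gives 2.0, whereas equal weights give 26.5. This shows how a single flagged observation stops distorting the estimate once its weight is 0.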
