Univariate Outlier Detection

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 12/16/2022

A univariate outlier is an occurrence of an abnormal value within a single observation point.

Details

Given a set X of n feature vectors of dimension p, the problem is to identify the vectors that do not belong to the underlying distribution (see [Ben2005] for exact definitions of an outlier).

The algorithm for univariate outlier detection considers each feature independently. The univariate outlier detection method can be parametric: it assumes a known underlying distribution for the data set and defines an outlier region such that any observation that falls in the region is marked as an outlier. The definition of the outlier region depends on the assumed underlying data distribution.

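The following is an example of an outlier region for univariate outlier detection:

$$\mathrm{Outlier}(\alpha_n, m_n, \sigma_n) = \left\{ x : \frac{|x - m_n|}{\sigma_n} > g(n, \alpha_n) \right\}$$

where $m_n$ and $\sigma_n$ are (robust) estimates of the mean and standard deviation computed for a given data set, $\alpha_n$ is the confidence coefficient, and $g(n, \alpha_n)$ defines the limits of the region and should be adjusted to the number of observations.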
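A minimal Python sketch of such a region test may help make the definition concrete. It assumes the region has the form |x - m| / sigma > g, where m is the mean estimate, sigma the standard deviation estimate, and g the limit of the region. The function name `in_outlier_region` and the sample numbers are illustrative only and are not part of the library.

```python
def in_outlier_region(x, m, sigma, g):
    """Return True when x lies in the region {x : |x - m| / sigma > g}.

    m, sigma, and g are assumed to be supplied by the caller; this sketch
    does not estimate them from data.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return abs(x - m) / sigma > g


# Hypothetical estimates for one feature: mean 10.0, standard deviation 2.0,
# and a region limit of 3.0.
m, sigma, g = 10.0, 2.0, 3.0
flags = [in_outlier_region(x, m, sigma, g) for x in (9.5, 15.0, 17.0, 3.0)]
print(flags)  # [False, False, True, True]
```

The values 17.0 and 3.0 lie 3.5 standard deviations from the mean, which exceeds the limit of 3.0, so only those two are flagged.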

Batch Processing

Algorithm Input

The univariate outlier detection algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for Univariate Outlier Detection (Batch Processing)
Input ID	Input
`data`	Pointer to the numeric table with the data for outlier detection. NOTE: The input can be an object of any class derived from the `NumericTable` class.
`location`	Pointer to the numeric table with the vector of means. NOTE: The input can be an object of any class derived from `NumericTable` except `PackedSymmetricMatrix` and `PackedTriangularMatrix`.
`scatter`	Pointer to the numeric table with the vector of standard deviations. NOTE: The input can be an object of any class derived from `NumericTable` except `PackedSymmetricMatrix` and `PackedTriangularMatrix`.
`threshold`	Pointer to the numeric table with non-negative numbers that define the outlier region. NOTE: The input can be an object of any class derived from `NumericTable` except `PackedSymmetricMatrix` and `PackedTriangularMatrix`.

If you omit any of the `location`, `scatter`, or `threshold` inputs, the library initializes all three with the following default values:

Default Values for Algorithm Input of Univariate Outlier Detection (Batch Processing)
Input ID	Default Value
`location`	A set of 0.0
`scatter`	A set of 1.0
`threshold`	A set of 3.0
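The fallback rule can be sketched in plain Python. This is an illustration, not the library's implementation: `resolve_inputs` is a hypothetical helper, and it assumes the reading that a single missing input causes all three to be replaced with the defaults, one value per feature.

```python
def resolve_inputs(p, location=None, scatter=None, threshold=None):
    """Return (location, scatter, threshold) vectors of length p.

    Assumption: if any one of the three inputs is missing, all three
    fall back to the documented defaults (0.0, 1.0, and 3.0).
    """
    if location is None or scatter is None or threshold is None:
        return [0.0] * p, [1.0] * p, [3.0] * p
    return list(location), list(scatter), list(threshold)


# Only location is supplied, so every input is replaced with its default.
print(resolve_inputs(2, location=[5.0, 5.0]))
# ([0.0, 0.0], [1.0, 1.0], [3.0, 3.0])
```

With these defaults, a value is treated as an outlier when its absolute value exceeds 3.0, which is only meaningful for features that are already centered and scaled.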

Algorithm Parameters

The univariate outlier detection algorithm has the following parameters:

Algorithm Parameters for Univariate Outlier Detection (Batch Processing)
Parameter	Default Value	Description
`algorithmFPType`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	`defaultDense`	Performance-oriented computation method, the only method supported by the algorithm.

Algorithm Output

The univariate outlier detection algorithm calculates the result described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Algorithm Output for Univariate Outlier Detection (Batch Processing)
Result ID	Result
`weights`	Pointer to the numeric table of zeros and ones. Zero in position (i, j) indicates an outlier in the i-th observation of the j-th feature. NOTE: By default, the result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except `PackedSymmetricMatrix`, `PackedTriangularMatrix`, and `CSRNumericTable`.
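To show how a table of this shape can be read, the following pure-Python sketch builds a zero/one table from per-feature location, scatter, and threshold vectors and then lists the flagged positions. It assumes the rule |x - location| / scatter > threshold is applied to each feature independently. The function `compute_weights` and the sample data are hypothetical and do not use the library's API.

```python
def compute_weights(data, location, scatter, threshold):
    """Build an n x p table of ones and zeros; 0 marks an outlier.

    data is a list of n rows with p values each; location, scatter, and
    threshold each hold one value per feature (column).
    """
    weights = []
    for row in data:
        weights.append([
            0 if abs(x - m) / s > g else 1
            for x, m, s, g in zip(row, location, scatter, threshold)
        ])
    return weights


# Three observations of two features (illustrative values).
data = [
    [0.5, 10.0],
    [4.0, 11.0],
    [-0.2, 30.0],
]
weights = compute_weights(data, [0.0, 10.0], [1.0, 2.0], [3.0, 3.0])
print(weights)  # [[1, 1], [0, 1], [1, 0]]

# Zero-based (observation, feature) positions flagged as outliers.
outliers = [
    (i, j)
    for i, row in enumerate(weights)
    for j, w in enumerate(row)
    if w == 0
]
print(outliers)  # [(1, 0), (2, 1)]
```

Because each feature is tested on its own, one observation can be an outlier in one column and not in another, as rows 1 and 2 show.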

Examples

C++ (CPU)

Batch Processing:

out_detect_uni_dense_batch.cpp

Java*

NOTE: There is no support for Java on GPU.

Batch Processing:

OutDetectUniDenseBatch.java

Python*

Batch Processing:

univariate_outlier_batch.py
