Application Notes for oneMKL Summary Statistics

ID 772991
Date 3/31/2023

Estimating Raw and Central Moments and Sums, Skewness, Excess Kurtosis, Variation, and Variance-Covariance/Correlation/Cross-Product Matrix

Summary Statistics offers the following methods for computing raw and central moments and sums, skewness, excess kurtosis (hereafter referred to as kurtosis), variation, and the variance-covariance, correlation, and cross-product matrices:

  1. Method VSL_SS_METHOD_FAST is a performance-oriented implementation for computing the estimates.

  2. Method VSL_SS_METHOD_FAST_USER_MEAN computes the estimates using a user-provided mean.

  3. Method VSL_SS_METHOD_1PASS implements a one-pass algorithm: all requested estimates are computed in a single pass over the data. For an example of such an algorithm, see [West79].

  4. Method VSL_SS_METHOD_CP_TO_COVCOR computes a variance-covariance and/or correlation matrix from the corresponding cross-product matrix.

  5. Method VSL_SS_METHOD_SUM_TO_MOM computes raw and central statistical moments, as well as kurtosis, skewness, and variation, from the corresponding raw and central sums.
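Methods VSL_SS_METHOD_CP_TO_COVCOR and VSL_SS_METHOD_SUM_TO_MOM convert quantities that have already been accumulated. The Python sketch below illustrates only the underlying arithmetic on a small dataset; it is not the oneMKL API, and the helper names `cross_product`, `cp_to_covcor`, and `sums_to_moments` are hypothetical. It assumes unit weights, full p × p storage as nested lists, a cross-product matrix defined as the sum of products of deviations from the mean, an unbiased 1/(n - 1) covariance, and 1/n central moments. The normalizations the library actually applies (for example, which variance estimate appears in the denominators of skewness, kurtosis, and variation) are defined in the Summary Statistics section of [MKLMan] and may differ.

```python
import math


def cross_product(data):
    """Cross-product matrix: cp[i][j] = sum over observations of (x_i - mean_i) * (x_j - mean_j).

    data is a list of observations, each a list of p variable values.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
             for j in range(p)] for i in range(p)]


def cp_to_covcor(cp, n):
    """Convert a full p x p cross-product matrix into covariance and correlation matrices.

    Assumes unit weights and the unbiased 1/(n - 1) covariance normalization.
    Correlation does not depend on that normalization: cp_ij / sqrt(cp_ii * cp_jj).
    """
    p = len(cp)
    cov = [[cp[i][j] / (n - 1) for j in range(p)] for i in range(p)]
    cor = [[cp[i][j] / math.sqrt(cp[i][i] * cp[j][j]) for j in range(p)]
           for i in range(p)]
    return cov, cor


def sums_to_moments(raw_sum, central_sums, n):
    """Derive moments and shape statistics for one variable from accumulated sums.

    raw_sum is sum(x); central_sums[k] is sum((x - mean)**k) for k = 2, 3, 4.
    Central moments use 1/n here (an assumption, see the lead-in).
    """
    mean = raw_sum / n
    c2, c3, c4 = (central_sums[k] / n for k in (2, 3, 4))
    return {
        "mean": mean,
        "c2": c2, "c3": c3, "c4": c4,
        "skewness": c3 / c2 ** 1.5,
        "kurtosis": c4 / c2 ** 2 - 3.0,      # excess kurtosis
        "variation": math.sqrt(c2) / mean,   # coefficient of variation
    }


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    cp = cross_product(data)                 # [[5.0, 10.0], [10.0, 20.0]]
    cov, cor = cp_to_covcor(cp, len(data))
    print(cov)
    print(cor)                               # the columns are perfectly correlated
    print(sums_to_moments(10.0, {2: 5.0, 3: 0.0, 4: 10.25}, 4))
```

Correlation is computed directly from the cross-product entries, so it is unaffected by whichever covariance normalization is chosen.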

The VSL_SS_METHOD_FAST method for variance-covariance estimation can be numerically unstable for some datasets, such as a dataset drawn from a Gaussian distribution whose standard deviation is several orders of magnitude smaller than its mean. For such datasets, to estimate the variance-covariance matrix, the cross-product matrix, or another estimate that relies on the mean, use the one-pass algorithm supported by the library, or the two-pass algorithm [West79], whose building blocks are available in the library. For the two-pass algorithm, do the following:

  1. Compute the mean using Summary Statistics functions.

  2. Compute the variance-covariance matrix, the cross-product matrix, or another estimate by passing the mean from step 1 and applying the VSL_SS_METHOD_FAST_USER_MEAN method.
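The instability and both remedies can be illustrated on the classic dataset 10^9 + {4, 7, 13, 16}, whose exact sample variance is 30. The Python sketch below is not the oneMKL API, and this document does not describe the arithmetic inside VSL_SS_METHOD_FAST: `naive_variance` is a textbook single-sweep sum-of-squares formula used only as a stand-in to show the cancellation that affects data whose standard deviation is tiny relative to its mean. `sample_mean` and `variance_with_user_mean` mirror steps 1 and 2 of the two-pass approach (in the library, step 2 corresponds to applying VSL_SS_METHOD_FAST_USER_MEAN), and `one_pass_variance` uses Welford's incremental update, which [West79] extends to weighted data. All function names are hypothetical.

```python
import math


def naive_variance(xs):
    """Stand-in single sweep over sum(x) and sum(x*x); cancels badly when mean >> std."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def sample_mean(xs):
    """Two-pass, step 1: compute the mean."""
    return sum(xs) / len(xs)


def variance_with_user_mean(xs, m):
    """Two-pass, step 2: accumulate squared deviations from a caller-supplied mean."""
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def one_pass_variance(xs):
    """Welford's one-pass update of the running mean and sum of squared deviations."""
    m, m2 = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - m
        m += delta / k
        m2 += delta * (x - m)   # uses the updated mean
    return m2 / (len(xs) - 1)


if __name__ == "__main__":
    data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]   # exact sample variance is 30
    print("naive:   ", naive_variance(data))
    print("two-pass:", variance_with_user_mean(data, sample_mean(data)))  # 30.0
    print("one-pass:", one_pass_variance(data))                           # 30.0
```

In double precision the naive result does not match 30 for this input, while both the two-pass and one-pass versions return 30.0, because they accumulate deviations from the mean rather than raw squares of values near 10^9.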

Each estimate is stored as a one-dimensional array. The required size of the array depends on the type of the estimate, as follows:

  1. Raw and central moments, raw and central sums, kurtosis, skewness, and variation: the array must be large enough to store at least p elements, where p is the dimension of the task.

  2. Variance-covariance, correlation, and cross-product matrices: the size depends on the storage format. For details, see Table Storage formats of a variance-covariance/correlation/cross-product matrix in the Summary Statistics section of [MKLMan].
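The sizing rule can be expressed as a small helper. This is a sketch under stated assumptions: the function name `estimate_array_length` and the `"full"`/`"packed"` labels are hypothetical, full storage is assumed to hold all p × p entries, and packed storage is assumed to hold one triangle including the diagonal, that is, p(p + 1)/2 entries. Confirm the actual sizes against the Storage formats table in [MKLMan] before allocating.

```python
def estimate_array_length(kind, p, storage="full"):
    """Number of elements to allocate for one estimate of a p-dimensional task.

    kind: "vector" for a moment, sum, kurtosis, skewness, or variation estimate;
          "matrix" for a variance-covariance, correlation, or cross-product matrix.
    storage: hypothetical labels, "full" (all p*p entries) or
             "packed" (one triangle including the diagonal, assumed).
    """
    if kind == "vector":
        return p                          # minimum: one value per variable
    if kind == "matrix":
        if storage == "full":
            return p * p
        if storage == "packed":
            return p * (p + 1) // 2
        raise ValueError(f"unknown storage format: {storage!r}")
    raise ValueError(f"unknown estimate kind: {kind!r}")


if __name__ == "__main__":
    p = 4
    print(estimate_array_length("vector", p))            # 4
    print(estimate_array_length("matrix", p, "full"))    # 16
    print(estimate_array_length("matrix", p, "packed"))  # 10
```

Each order of a moment or sum needs its own p-element array, so requesting, for example, the second, third, and fourth central moments means allocating three such arrays.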