Application Notes for oneMKL Summary Statistics

ID 772991
Date 3/31/2023
Public

Common Usage Model of Summary Statistics Algorithms

A typical application that uses Summary Statistics passes through four stages:

  1. Creating a task

  2. Modifying the task parameters

  3. Computing statistical estimates

  4. Destroying the task

Example:

To compute the mean, the variance-covariance matrix, and the variation coefficient, do the following:

  1. Create a new task and pass the parameters of the problem into the library: the dimension p, the number of observations n, and a pointer to the memory location where the dataset X is stored:

    xstorage = VSL_SS_MATRIX_STORAGE_COLS;
    errcode = vsldSSNewTask( &task, &p, &n, &xstorage, X, weights, indices );

    where

    1. The weights array contains the weights assigned to each observation.

    2. The indices array determines which components of the random vector are analyzed. Set the entry for a component to zero to exclude that component from the analysis. For example, indices can be initialized as follows:

      indices[p] = {0, 1, 1, 0, 1,..};

    You can store the dataset in column-major or in row-major order. Use the xstorage variable to pass the storage format into the library. If you need to set all weights to 1 and process all components of the random vector, pass NULL pointers instead of weights and indices.

  2. Register arrays to hold computation results and other parameters. Use the editors available in the Summary Statistics domain. The example below illustrates how to use some of them:

    errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
    errcode = vsldSSEditTask( task, VSL_SS_ED_VARIATION, Variation );
    errcode = vsldSSEditMoments( task, Xmean, Raw2Mom, 0, 0, Central2Mom, 0, 0 );
     
    covstorage = VSL_SS_MATRIX_STORAGE_FULL;
    errcode = vsldSSEditCovCor( task, Xmean, Cov, &covstorage, 0, 0 );

    The arrays Xmean, Raw2Mom, Central2Mom, Cov, and Variation store estimates for the mean, the second raw (algebraic) moment, the second central moment (variance), the variance-covariance matrix, and the variation coefficient, respectively. You need to specify the storage format for the variance-covariance matrix; you can choose between full and packed formats. Registering an array of means is required in most cases, even if you do not need this estimate, because many other statistical estimates use the mean value. For more details, see the Estimation of Raw and Central Moments and Sums, Skewness, Kurtosis, Variation, and Variance-Covariance/Correlation/Cross-Product Matrix chapter of this document and the Summary Statistics section of [MKLMan].

  3. Compute the estimates you need by calling the computing routine, which calculates them all at once:

    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    errcode = vsldSSCompute( task, estimates, VSL_SS_METHOD_FAST );

    The library only expects a pointer to the memory that holds the dataset. This lets you place another dataset in the same memory location and call the Compute routine again without re-editing the task descriptor.

  4. Deallocate task resources:

    errcode = vslSSDeleteTask( &task );
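
To make the quantities requested in the example concrete, the following plain C sketch computes a mean, a second raw moment, and a second central moment per component without calling the library. It is only an illustration: the function names ref_weighted_mean and ref_summary, the layout x[j * p + i], and the n - 1 normalization of the central moment are assumptions of this sketch, not the library's API or its documented behavior. It also mirrors the roles of the weights and indices arrays described in step 1, where a NULL pointer means unit weights or all components.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout for this sketch only: x[j * p + i] is component i of
   observation j (p components, n observations). */

/* Weighted mean of component i; weights == NULL means all weights are 1.
   The weights must not sum to zero. */
double ref_weighted_mean(const double *x, size_t p, size_t n, size_t i,
                         const double *weights)
{
    double sum = 0.0, wsum = 0.0;
    for (size_t j = 0; j < n; ++j) {
        double w = weights ? weights[j] : 1.0;
        sum += w * x[j * p + i];
        wsum += w;
    }
    return sum / wsum;
}

/* Fills mean, raw2, and central2 for every component whose indices[i] is
   nonzero; indices == NULL selects all components. Uses unit weights.
   central2 is normalized by n - 1, which is an assumption of this sketch. */
void ref_summary(const double *x, size_t p, size_t n, const int *indices,
                 double *mean, double *raw2, double *central2)
{
    assert(n >= 2);
    for (size_t i = 0; i < p; ++i) {
        if (indices && indices[i] == 0)
            continue;  /* excluded component: its outputs stay untouched */
        double m = ref_weighted_mean(x, p, n, i, NULL);
        double sq = 0.0, dev = 0.0;
        for (size_t j = 0; j < n; ++j) {
            double v = x[j * p + i];
            sq += v * v;
            dev += (v - m) * (v - m);
        }
        mean[i] = m;
        raw2[i] = sq / (double)n;
        central2[i] = dev / (double)(n - 1);
    }
}
```

For example, with p = 2, n = 4, and indices = {1, 0}, only the first component is processed and the output slots for the second component are left untouched. The variation coefficient follows from these values as the square root of the second central moment divided by the mean.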

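The remark in step 3 about reusing the dataset memory can be illustrated with a toy descriptor that, like the task described above, keeps a pointer to the caller's buffer rather than a copy. The toy_task type and the toy_compute function are hypothetical and say nothing about how the library is implemented. They only show why overwriting the buffer and calling the compute routine again is enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: it borrows the caller's buffers and never
   copies them. */
typedef struct {
    const double *x;  /* dataset of n values, owned by the caller */
    size_t n;         /* number of observations */
    double *mean;     /* caller-registered location for the result */
} toy_task;

/* Reads the dataset through the stored pointer on every call, so new
   values placed in the same buffer are picked up automatically. */
void toy_compute(const toy_task *t)
{
    assert(t->n > 0);
    double sum = 0.0;
    for (size_t j = 0; j < t->n; ++j)
        sum += t->x[j];
    *t->mean = sum / (double)t->n;
}
```

In this sketch, filling the same buffer with a new batch of observations and calling toy_compute again updates the registered result, and the descriptor itself is never edited. A different buffer address or a different number of observations would require updating the descriptor.
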