Application Notes for oneMKL Summary Statistics

ID 772991
Date 3/31/2023
Public

Estimating Pooled/Group Variance-Covariance Matrices/Means

Use the VSL_SS_METHOD_1PASS method to compute pooled/group variance-covariance matrices, or pooled/group means.

For the definition of pooled/group variance-covariance matrices, see the Mathematical Notation and Definitions chapter in the Summary Statistics section of [MKLMan].

To compute a pooled variance-covariance matrix and/or a pooled mean, split the observations into g groups by allocating the array grp_indices of size n, where n is the number of observations. Group indices take values from the range [0, 1, ..., g-1]. Thus, grp_indices[j] = k if observation j belongs to the group indexed k.
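
As an illustration of the grouping rule above, the following sketch fills grp_indices for groups that occupy consecutive runs of observations. The helper name fill_group_indices, the sizes array, and the idx_t typedef (a stand-in for MKL_INT so that the sketch compiles without oneMKL) are assumptions made for this example and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for MKL_INT so the sketch compiles without oneMKL (assumption). */
typedef int idx_t;

/*
 * Fill grp_indices[0..n-1] so that the first sizes[0] observations belong to
 * group 0, the next sizes[1] observations to group 1, and so on.
 * Returns 0 on success, -1 if the group sizes do not sum to n.
 */
int fill_group_indices(idx_t *grp_indices, size_t n,
                       const size_t *sizes, idx_t g)
{
    size_t total = 0, j = 0;
    idx_t k;

    assert(grp_indices != NULL && sizes != NULL && g > 0);

    for (k = 0; k < g; k++)
        total += sizes[k];
    if (total != n)
        return -1;

    for (k = 0; k < g; k++) {
        size_t end = j + sizes[k];
        for (; j < end; j++)
            grp_indices[j] = k;   /* observation j belongs to group k */
    }
    return 0;
}
```

With n = 5 and sizes = {2, 3}, the helper produces grp_indices = {0, 0, 1, 1, 1}. Observations that are not contiguous by group would need a different assignment rule.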

The pooled variance-covariance matrix is packed as a one-dimensional array. For information on available storage formats and memory requirements, see the table "Storage formats of a variance-covariance/correlation matrix" in the Summary Statistics section of [MKLMan]. The pooled mean estimate is returned in an array that must hold at least p elements, where p is the dimension of the task.
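
The memory requirement can be sketched as a small helper. It assumes the usual sizes for a p-by-p symmetric matrix: p*p elements for full storage and p*(p+1)/2 elements for packed triangular storage. The enum and function names are hypothetical stand-ins for this example; the table in [MKLMan] remains the authoritative source for the library's storage constants and their memory requirements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the library's storage-format constants. */
enum cov_storage { COV_FULL, COV_L_PACKED, COV_U_PACKED };

/* Number of elements that one p-by-p variance-covariance matrix occupies. */
size_t cov_elements(size_t p, enum cov_storage fmt)
{
    assert(p > 0);
    switch (fmt) {
    case COV_FULL:
        return p * p;             /* every element is stored */
    case COV_L_PACKED:
    case COV_U_PACKED:
        return p * (p + 1) / 2;   /* one triangle including the diagonal */
    }
    return 0;                     /* unknown format */
}
```

For p = 3, this gives 9 elements for full storage and 6 elements for either packed format.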

You can get estimates for group variance-covariance matrices and/or group means by passing into the library the array grp_cov_indices of size g. This array determines the group variance-covariance matrices and/or means to be returned:

  1. grp_cov_indices[idx] = 1 if the variance-covariance matrix and/or the vector of means for the group indexed idx are to be returned.

  2. Otherwise, grp_cov_indices[idx] = 0.

The estimates for group variance-covariance matrices and group means are stored in one-dimensional arrays grp_cov and grp_means, respectively.

The group means are packed in the grp_means array in series. The size of the array should be sufficient for at least p*k elements,

where

  1. p is the dimension of the task.

  2. k is the number of group matrices to be returned.

Group matrices are packed in the grp_cov array in series according to the contents of the array grp_cov_indices. The size of the grp_cov array should be sufficient for at least cov_dim*k elements,

where

  1. cov_dim is the size of a single group matrix defined by the chosen storage format.

  2. k is the number of group matrices to be returned.

The library checks that the grp_indices pointer is initialized correctly and that the values stored in the array are non-negative. If either check fails, computation of the pooled/group variance-covariance matrices terminates with an error code. In this case, make sure that the grp_indices array contains all values from 0 to g-1 inclusive, and that the memory allocated for the grp_cov_indices array is sufficient to hold at least g values.
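
The conditions on grp_indices can also be verified in user code before the task is computed. The following sketch is a user-side check and does not reproduce the library's internal validation; the function name, the return codes, and the idx_t typedef are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for MKL_INT so the sketch compiles without oneMKL (assumption). */
typedef int idx_t;

/*
 * Check that every entry of grp_indices[0..n-1] lies in [0, g-1] and that
 * every group index from 0 to g-1 occurs at least once.
 * Returns 0 if valid, 1 if an index is out of range, 2 if a group is empty,
 * and -1 if scratch memory cannot be allocated.
 */
int validate_group_indices(const idx_t *grp_indices, size_t n, idx_t g)
{
    unsigned char *seen;
    size_t j;
    idx_t k;
    int rc = 0;

    assert(grp_indices != NULL && g > 0);

    seen = calloc((size_t)g, 1);
    if (seen == NULL)
        return -1;

    for (j = 0; j < n; j++) {
        if (grp_indices[j] < 0 || grp_indices[j] >= g) {
            rc = 1;               /* index outside [0, g-1] */
            break;
        }
        seen[grp_indices[j]] = 1;
    }
    for (k = 0; rc == 0 && k < g; k++)
        if (!seen[k])
            rc = 2;               /* group k has no observations */

    free(seen);
    return rc;
}
```

Running such a check before the compute call makes it easier to tell which condition was violated when the library reports an error.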

The following example shows how to compute pooled and group variance-covariance matrices:

 #include "mkl_vsl.h"

 #define DIM 3      /* dimension of the task */
 #define N   1000   /* number of observations */
 #define G   2      /* number of groups */
 #define GN  2      /* number of group variance-covariance matrices */

 int main()
 {
    int i, status;
    VSLSSTaskPtr task;
    MKL_INT g_indices[N];           /* indices of the groups (integer array) */
    double x[N][DIM];               /* matrix of observations; fill it with data before computing */
    MKL_INT g_cov_indices[G]={1,1}; /* two group matrices to be returned */

    double pcov[DIM*DIM];           /* pooled variance-covariance matrix */
    double pmean[DIM];              /* array of pooled means */

    double gcov[DIM*DIM*GN];        /* array for group variance-covariance matrices */
    double gmean[DIM*GN];           /* array for group means */

    MKL_INT p, n, xstorage, pcovstorage, gcovstorage;
    unsigned long long estimates;

    /* Parameters of the task and initialization */
    p = DIM;
    n = N;
    xstorage = VSL_SS_MATRIX_STORAGE_COLS;
    pcovstorage = VSL_SS_MATRIX_STORAGE_FULL;
    gcovstorage = VSL_SS_MATRIX_STORAGE_FULL;

    /* The first N/2 observations belong to the first group, the rest belong to the second group */
    for ( i = 0; i < N/2; i++ )
    {
        g_indices[i] = 0;
        g_indices[i + N/2] = 1;
    }

    /* Create a double-precision task (x holds doubles) */
    status = vsldSSNewTask( &task, &p, &n, &xstorage, &x[0][0], 0, 0 );

    /* Initialize the task parameters; storage formats are integer parameters */
    status = vsliSSEditTask( task, VSL_SS_ED_POOLED_COV_STORAGE,
                                   &pcovstorage );
    status = vsliSSEditTask( task, VSL_SS_ED_GROUP_COV_STORAGE,
                                   &gcovstorage );
    status = vsldSSEditPooledCovariance( task, g_indices, pmean,
                                         pcov, g_cov_indices, gmean, gcov );

    /* Compute the pooled and group variance-covariance matrices */
    estimates = VSL_SS_POOLED_COV|VSL_SS_GROUP_COV;
    status = vsldSSCompute( task, estimates, VSL_SS_METHOD_1PASS );

    /* Deallocate the task resources */
    status = vslSSDeleteTask( &task );

    return 0;
 }
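
To sanity-check the library output on a small dataset, a naive reference computation can help. The sketch below uses the textbook pooled estimator: the within-group scatter matrices summed over all groups and divided by n - g. Whether this normalization matches the library exactly is an assumption here; the definition in [MKLMan] is authoritative. The function name is hypothetical, observations are assumed to be stored one after another with p values each, and the result uses full storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Naive reference: pooled covariance as the summed within-group scatter
 * divided by (n - g). x holds n observations of p values each
 * (x[j*p + a] is component a of observation j); pooled receives p*p values.
 * Returns 0 on success, -1 on invalid input or allocation failure.
 */
int pooled_cov_reference(const double *x, size_t n, size_t p,
                         const int *grp_indices, int g, double *pooled)
{
    double *mean;
    size_t *count;
    size_t j, a, b;
    int k, rc = -1;

    assert(x != NULL && grp_indices != NULL && pooled != NULL);
    if (g <= 0 || p == 0 || n <= (size_t)g)
        return -1;

    mean = calloc((size_t)g * p, sizeof *mean);
    count = calloc((size_t)g, sizeof *count);
    if (mean == NULL || count == NULL)
        goto done;

    /* Accumulate per-group sums and observation counts. */
    for (j = 0; j < n; j++) {
        k = grp_indices[j];
        if (k < 0 || k >= g)
            goto done;
        count[k]++;
        for (a = 0; a < p; a++)
            mean[(size_t)k * p + a] += x[j * p + a];
    }
    /* Turn the sums into group means; an empty group is an error. */
    for (k = 0; k < g; k++) {
        if (count[k] == 0)
            goto done;
        for (a = 0; a < p; a++)
            mean[(size_t)k * p + a] /= (double)count[k];
    }

    /* Sum the deviations from each observation's own group mean. */
    for (a = 0; a < p * p; a++)
        pooled[a] = 0.0;
    for (j = 0; j < n; j++) {
        const double *m = mean + (size_t)grp_indices[j] * p;
        for (a = 0; a < p; a++)
            for (b = 0; b < p; b++)
                pooled[a * p + b] += (x[j * p + a] - m[a]) * (x[j * p + b] - m[b]);
    }
    for (a = 0; a < p * p; a++)
        pooled[a] /= (double)(n - (size_t)g);
    rc = 0;

done:
    free(mean);
    free(count);
    return rc;
}
```

For example, with p = 1, observations {1, 3, 10, 14}, and groups {0, 0, 1, 1}, the within-group scatter is 2 + 8 = 10, and dividing by n - g = 2 gives a pooled variance of 5.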

Computation of pooled/group variance-covariance matrices does not support datasets available in blocks.