## Application Notes for Intel® oneAPI Math Kernel Library Summary Statistics

ID 772991
Date 12/04/2020
Public


## Processing Data in Blocks

Summary Statistics enables block-based data analysis that can help you:

1. compute statistical estimates for out-of-memory datasets, splitting them into blocks

2. analyze in-memory data arrays that become available block by block

3. tune your applications for out-of-memory data support
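As a rough illustration of the first use case, the sketch below walks through a binary file of `double` values one block at a time, so that only a single block is held in memory. The helper names `read_next_block` and `blockwise_sum` are hypothetical and not part of oneMKL; the plain sum stands in for whatever estimate is being accumulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a oneMKL routine): reads up to max_obs values
   from an open binary file into x and returns how many were read.
   A return value of 0 means the data is exhausted. */
static size_t read_next_block(FILE *fp, double *x, size_t max_obs)
{
    assert(fp != NULL && x != NULL);
    return fread(x, sizeof(double), max_obs, fp);
}

/* Sums every value in the file while keeping only one block in memory.
   buf must have room for block_obs values; total_obs receives the count. */
static double blockwise_sum(FILE *fp, double *buf, size_t block_obs,
                            size_t *total_obs)
{
    double sum = 0.0;
    size_t n;

    *total_obs = 0;
    while ((n = read_next_block(fp, buf, block_obs)) > 0) {
        for (size_t i = 0; i < n; i++) {
            sum += buf[i];
        }
        *total_obs += n;
    }
    return sum;
}
```

The last block may be shorter than the others, which is why the loop uses the count returned by the reader rather than the nominal block size.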

To compute statistical estimates for out-of-memory datasets, do the following:

1. Set the estimates of interest to zero or to any other meaningful initial value:

    for( i = 0; i < p; i++ )
    {
        Xmean[i] = 0.0;
        Raw2Mom[i] = 0.0;
        Central2Mom[i] = 0.0;
        for( j = 0; j < p; j++ )
        {
            Cov[i][j] = 0.0;
        }
    }

2. Initialize array W of size 2 with zero values.

This array holds the accumulated weights, which are required to compute the estimates correctly:

    W[0] = 0.0; W[1] = 0.0;

3. Get the first portion of the dataset into array X, and the corresponding weights into array weights:

    GetNextDataChunk( X, weights );

4. Follow the common usage model of the Summary Statistics algorithms:

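    /* Create a task; n (a name assumed here) is the number of observations in a block */
    xstorage = VSL_SS_MATRIX_STORAGE_COLS;
    errcode = vsldSSNewTask( &task, &p, &n, &xstorage, X, weights, indices );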

    /* Edit the task parameters */
    errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
    errcode = vsldSSEditTask( task, VSL_SS_ED_VARIATION, Variation );
    errcode = vsldSSEditMoments( task, Xmean, Raw2Mom, 0, 0, Central2Mom, 0, 0 );

    covstorage = VSL_SS_MATRIX_STORAGE_FULL;
    errcode = vsldSSEditCovCor( task, Xmean, (double*)Cov, &covstorage, 0, 0 );

    /* Compute the estimates for the dataset split into chunks */
    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    for( nchunk = 0; ; nchunk++ )
    {
        errcode = vsldSSCompute( task, estimates, VSL_SS_METHOD_1PASS );
        if ( nchunk >= N ) break;
        GetNextDataChunk( X, weights );
    }

    /* Deallocate task resources */
    errcode = vslSSDeleteTask( &task );
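To show why the estimates and the accumulated weights must persist between calls, here is a minimal sketch in plain C of a weighted running mean and sum of squared deviations that is updated block by block. It is an illustration of the general technique (a weighted incremental update), not the library's internal algorithm. The `RunningStats` type and its functions are hypothetical, and the pairing of a sum of weights with a sum of squared weights mirrors what the two-element array W is assumed to hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical running state carried across blocks (not a oneMKL type). */
typedef struct {
    double w_sum;   /* accumulated sum of weights */
    double w2_sum;  /* accumulated sum of squared weights */
    double mean;    /* running weighted mean */
    double m2;      /* running weighted sum of squared deviations */
} RunningStats;

/* Start from zero, as in steps 1 and 2 above. */
static void running_init(RunningStats *s)
{
    assert(s != NULL);
    s->w_sum = 0.0;
    s->w2_sum = 0.0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold one block of n observations x with weights w into the state. */
static void running_update_block(RunningStats *s, const double *x,
                                 const double *w, size_t n)
{
    assert(s != NULL && x != NULL && w != NULL);
    for (size_t i = 0; i < n; i++) {
        if (w[i] <= 0.0) {
            continue;  /* skip observations that carry no weight */
        }
        double w_new = s->w_sum + w[i];
        double delta = x[i] - s->mean;
        s->mean += delta * w[i] / w_new;
        s->m2 += w[i] * delta * (x[i] - s->mean);
        s->w_sum = w_new;
        s->w2_sum += w[i] * w[i];
    }
}

/* Weighted variance normalized by the accumulated weight. */
static double running_variance(const RunningStats *s)
{
    assert(s != NULL && s->w_sum > 0.0);
    return s->m2 / s->w_sum;
}
```

Calling `running_update_block` once per block gives the same mean and variance as a single pass over the whole dataset, provided the state is initialized once and never reset between blocks.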

Summary Statistics also lets you read the next data block into a different array. The computation scheme stays the same; you only need to pass the address of the new data block to the library:

    double* NextXChunk[N];
    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    for( nchunk = 0; ; nchunk++ )
    {
        errcode = vsldSSCompute( task, estimates, VSL_SS_METHOD_1PASS );
        if ( nchunk >= N ) break;
        GetNextDataChunk( NextXChunk[nchunk], weights );
        errcode = vsldSSEditTask( task, VSL_SS_ED_OBSERV, NextXChunk[nchunk] );
    }
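The pattern above can be mimicked in plain C to make the division of responsibilities visible: the accumulators live in the task, and only the pointer to the current block changes between computations. The `BlockTask` type and its functions are hypothetical stand-ins for illustration, not oneMKL routines, and the running mean stands in for the full set of estimates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature "task" (not a oneMKL type). */
typedef struct {
    const double *x;  /* current block; may point to a different array each time */
    size_t n;         /* number of observations per block */
    double count;     /* accumulated number of observations */
    double mean;      /* running mean carried across blocks */
} BlockTask;

/* Bind the first block and zero the accumulators. */
static void block_task_init(BlockTask *t, const double *x, size_t n)
{
    assert(t != NULL && x != NULL);
    t->x = x;
    t->n = n;
    t->count = 0.0;
    t->mean = 0.0;
}

/* Point the task at another block; the accumulators are left untouched. */
static void block_task_set_observations(BlockTask *t, const double *x)
{
    assert(t != NULL && x != NULL);
    t->x = x;
}

/* Fold the current block into the running mean. */
static void block_task_compute(BlockTask *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->n; i++) {
        t->count += 1.0;
        t->mean += (t->x[i] - t->mean) / t->count;
    }
}
```

A driver loop would call `block_task_compute`, stop once all blocks are consumed, and otherwise call `block_task_set_observations` with the next block's address before computing again.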

For the list of estimators that support processing datasets in blocks, see the table "VS Summary Statistics Estimates Obtained with Compute Routine" in the Summary Statistics section of [MKLMan].
