## Processing Data in Blocks

Summary Statistics enables block-based data analysis that can help you:

- compute statistical estimates for out-of-memory datasets by splitting them into blocks
- analyze in-memory data arrays that become available block by block
- tune your applications for out-of-memory data support
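
How a dataset is split into blocks is left to the application. As a minimal sketch, assuming the observations of a single variable are stored as raw `double` values in a binary file, a helper such as the hypothetical `read_next_block` below could fill a fixed-size buffer one block at a time. Neither function is part of the library; both names are introduced here for illustration only.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not a library routine): read up to max_n doubles
   from an open binary file into buf. Returns the number of observations
   actually read; 0 means the data is exhausted. */
static size_t read_next_block(FILE *f, double *buf, size_t max_n)
{
    return fread(buf, sizeof(double), max_n, f);
}

/* Example driver: sum a one-dimensional dataset block by block, so that
   only max_n observations are held in memory at any time. */
static double sum_in_blocks(FILE *f, double *buf, size_t max_n)
{
    double sum = 0.0;
    size_t n;

    while ((n = read_next_block(f, buf, max_n)) > 0) {
        for (size_t i = 0; i < n; i++) {
            sum += buf[i];
        }
    }
    return sum;
}
```

The last block may be shorter than `max_n`, so the caller should always use the returned count rather than the buffer size.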

To compute statistical estimates for out-of-memory datasets, do the following:

Set the estimates of interest to zero or to another meaningful value:

    for( i = 0; i < p; i++ )
    {
        Xmean[i] = 0.0;
        Raw2Mom[i] = 0.0;
        Central2Mom[i] = 0.0;
        for( j = 0; j < p; j++ )
        {
            Cov[i][j] = 0.0;
        }
    }

Initialize array `W` of size 2 with zero values. This array holds the accumulated weights, which are required to compute the estimates correctly:

    W[0] = 0.0;
    W[1] = 0.0;

Get the first portion of the dataset into array `X` and the corresponding weights into array `weights`:

    GetNextDataChunk( X, weights );

Follow the common usage model of the Summary Statistics algorithms:

    /* Create a task */
    xstorage = VSL_SS_MATRIX_STORAGE_COLS;
    errcode = vsldSSNewTask( &task, &p, &nblock, &xstorage, X, weights, indices );

    /* Edit the task parameters */
    errcode = vsldSSEditTask( task, VSL_SS_ED_ACCUM_WEIGHT, W );
    errcode = vsldSSEditTask( task, VSL_SS_ED_VARIATION, Variation );
    errcode = vsldSSEditMoments( task, Xmean, Raw2Mom, 0, 0, Central2Mom, 0, 0 );
    covstorage = VSL_SS_MATRIX_STORAGE_FULL;
    errcode = vsldSSEditCovCor( task, Xmean, Cov, &covstorage, 0, 0 );

    /* Compute the estimates for the dataset split into chunks */
    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    for( nchunk = 0; ; nchunk++ )
    {
        errcode = vsldSSCompute( task, estimates, VSL_SS_1PASS_METHOD );
        if ( nchunk >= N ) break;   /* stop after the last chunk */
        GetNextDataChunk( X, weights );
    }

    /* Deallocate task resources */
    errcode = vslSSDeleteTask( &task );
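
To make the role of the accumulated weights more concrete, the following plain-C sketch accumulates a weighted mean and variance for a single variable block by block. It does not call the library and is not a description of the algorithm Summary Statistics uses internally; `block_stats` and its functions are hypothetical names introduced here. The sketch assumes that `W[0]` accumulates the sum of weights and `W[1]` the sum of squared weights.

```c
#include <stddef.h>

/* Hypothetical accumulator for one variable; not part of the library. */
typedef struct {
    double W[2];  /* W[0]: sum of weights, W[1]: sum of squared weights */
    double mean;  /* weighted mean of all observations seen so far */
    double m2;    /* weighted sum of squared deviations from the mean */
} block_stats;

/* Start from zero, mirroring the initialization steps above. */
static void block_stats_init(block_stats *s)
{
    s->W[0] = 0.0;
    s->W[1] = 0.0;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Fold one block of n observations x[] with weights w[] into the
   running estimates using an incremental weighted update. */
static void block_stats_update(block_stats *s, const double *x,
                               const double *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (w[i] <= 0.0) {
            continue;  /* skip observations that carry no weight */
        }
        double wsum = s->W[0] + w[i];
        double delta = x[i] - s->mean;
        s->mean += (w[i] / wsum) * delta;
        s->m2 += w[i] * delta * (x[i] - s->mean);
        s->W[0] = wsum;
        s->W[1] += w[i] * w[i];
    }
}

/* Weighted (biased) variance; returns 0 if no weight was accumulated. */
static double block_stats_variance(const block_stats *s)
{
    return s->W[0] > 0.0 ? s->m2 / s->W[0] : 0.0;
}
```

Because each update depends only on the running estimates and the accumulated weight, processing the data in two blocks gives the same result as a single pass over the whole array. This is why the accumulated weights must persist between calls.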

Summary Statistics also lets you read the next data block into a different array. The overall computation scheme stays the same; you only need to pass the address of the new data block to the library:

    double* NextXChunk[N];

    estimates = VSL_SS_MEAN | VSL_SS_2C_MOM | VSL_SS_COV | VSL_SS_VARIATION;
    for( nchunk = 0; ; nchunk++ )
    {
        errcode = vsldSSCompute( task, estimates, VSL_SS_1PASS_METHOD );
        if ( nchunk >= N ) break;   /* stop after the last chunk */

        /* Read the next block into its own array */
        GetNextDataChunk( NextXChunk[nchunk], weights );

        /* Point the task at the new block */
        errcode = vsldSSEditTask( task, VSL_SS_ED_OBSERV, NextXChunk[nchunk] );
    }
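
The pointer-switching pattern can also be illustrated without the library. In the sketch below, `mini_task` is a hypothetical stand-in for a task descriptor that stores only the address of the current block, and `mini_task_set_observations` plays the role that editing the observations address plays in the loop above. All names are assumptions made for illustration; the sketch tracks only a running mean.

```c
#include <stddef.h>

/* Hypothetical stand-in for a task: it keeps a pointer to the current
   block of observations, not a copy of the data. */
typedef struct {
    const double *x;  /* current block of observations */
    size_t n;         /* observations per block */
    double sum;       /* running sum over all processed blocks */
    size_t count;     /* number of observations processed so far */
} mini_task;

static void mini_task_init(mini_task *t, const double *x, size_t n)
{
    t->x = x;
    t->n = n;
    t->sum = 0.0;
    t->count = 0;
}

/* Re-target the task at another block; the accumulated state is kept. */
static void mini_task_set_observations(mini_task *t, const double *x)
{
    t->x = x;
}

/* Process whichever block the task currently points to. */
static void mini_task_compute(mini_task *t)
{
    for (size_t i = 0; i < t->n; i++) {
        t->sum += t->x[i];
    }
    t->count += t->n;
}

/* Drive the loop: compute on the current block, then switch the pointer. */
static double mean_over_blocks(const double *const blocks[],
                               size_t nblocks, size_t n)
{
    if (nblocks == 0 || n == 0) {
        return 0.0;
    }
    mini_task t;
    mini_task_init(&t, blocks[0], n);
    for (size_t b = 0; ; b++) {
        mini_task_compute(&t);
        if (b + 1 >= nblocks) break;
        mini_task_set_observations(&t, blocks[b + 1]);
    }
    return t.sum / (double)t.count;
}
```

Switching the pointer avoids copying each block into a single shared buffer, which may be convenient when blocks are produced by different sources or must remain available after processing.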

For the list of estimators that support processing datasets in blocks, see Table VS Summary Statistics Estimates Obtained with Compute Routine in the Summary Statistics section of [MKLMan].
