## Noise filtering in financial market data streams

### Goal

Detect how the price movements of some stocks influences the price movements of others in a large stock portfolio.

### Solution

Split a correlation matrix representing the overall dependencies in data into two components, a signal matrix and a noise matrix. The signal matrix gives an accurate estimate of dependencies between stocks. The algorithm ([Zhang12],[Kargupta02]) relies on an eigenstate-based approach that separates noise from useful information by considering the eigenvalues of the correlation matrix for the accumulated data.

oneMKL Summary Statistics provides functions to calculate correlation matrix for streaming data. oneMKL LAPACK contains a set of computational routines to compute eigenvalues and eigenvectors for symmetric matrices of various properties and storage formats.

The online noise filtering algorithm is:

Compute

`λ`_{min}and`λ`_{max}, the boundaries of the interval of the noise eigenstates.Get a new block of data from the data stream.

Update the correlation matrix using the latest data block.

Compute the eigenvalues and eigenvectors that define the noise component, by searching the eigenvalues of the correlation matrix belonging to the interval [

`λ`_{min},`λ`_{max}].Compute the correlation matrix of the noise component by combining the eigenvalues and eigenvectors computed in Step 4.

Compute the correlation matrix of the signal component by subtracting the noise component from the overall correlation matrix. If there is more data, go back to Step 2.

Source code: see the nf folder in the samples archive available at https://software.intel.com/content/dam/develop/external/us/en/documents/mkl-cookbook-samples-120115.zip.

### Initialization

Initialize a correlation analysis task and its parameters.

```
VSLSSTaskPtr task;
double *x, *mean, *cor;
double W[2];
MKL_INT x_storage, cor_storage;
...
scanf("%d", &m); // number of observations in block
scanf("%d", &n); // number of stocks (task dimension)
...
/* Allocate memory */
nfAllocate(m, n, &x, &mean, &cor, ...);
/* Initialize Summary Statistics task structure */
nfInitSSTask(&m, &n, &task, x, &x_storage, mean, cor, &cor_storage, W);
...
/* Allocate memory */
void nfAllocate(MKL_INT m, MKL_INT n, double **x, double **mean, double **cor,
...)
{
*x = (double *)mkl_malloc(m*n*sizeof(double), ALIGN);
CheckMalloc(*x);
*mean = (double *)mkl_malloc(n*sizeof(double), ALIGN);
CheckMalloc(*mean);
*cor = (double *)mkl_malloc(n*n*sizeof(double), ALIGN);
CheckMalloc(*cor);
...
}
/* Initialize Summary Statistics task structure */
void nfInitSSTask(MKL_INT *m, MKL_INT *n, VSLSSTaskPtr *task, double *x,
MKL_INT *x_storage, double *mean, double *cor,
MKL_INT *cor_storage, double *W)
{
int status;
/* Create VS Summary Statistics task */
*x_storage = VSL_SS_MATRIX_STORAGE_COLS;
status = vsldSSNewTask(task, n, m, x_storage, x, 0, 0);
CheckSSError(status);
/* Register array of weights in the task */
W[0] = 0.0;
W[1] = 0.0;
status = vsldSSEditTask(*task, VSL_SS_ED_ACCUM_WEIGHT, W);
CheckSSError(status);
/* Initialization of the task parameters using full storage
for correlation matrix computation */
*cor_storage = VSL_SS_MATRIX_STORAGE_FULL;
status = vsldSSEditCovCor(*task, mean, 0, 0, cor, cor_storage);
CheckSSError(status);
}
```

### Computation

Perform noise filtering steps for each block of data.

```
/* Set threshold that define noise component */
sqrt_n_m = sqrt((double)n / (double)m);
lambda_min = (1.0 - sqrt_n_m) * (1.0 - sqrt_n_m);
lambda_max = (1.0 + sqrt_n_m) * (1.0 + sqrt_n_m);
...
/* Loop over data blocks */
for (i = 0; i < n_block; i++)
{
/* Read next portion of data */
nfReadDataBlock(m, n, x, fh);
/* Update "signal" and "noise" covariance estimates */
nfKernel(m, n, lambda_min, lambda_max, x, cor, cor_copy,
task, eval, evect, work, iwork, isuppz,
cov_signal, cov_noise);
}
...
void nfKernel(…)
{
...
/* Update correlation matrix estimate using FAST method */
errcode = vsldSSCompute(task, VSL_SS_COR, VSL_SS_METHOD_FAST);
CheckSSError(errcode);
...
/* Compute eigenvalues and eigenvectors of the correlation matrix */
dsyevr(&jobz, &range, &uplo, &n, cor_copy, &n, &lmin, &lmax,
&imin, &imax, &abstol, &n_noise, eval, evect, &n, isuppz,
work, &lwork, iwork, &liwork, &info);
/* Calculate "signal" and "noise" part of covariance matrix */
nfCalculateSignalNoiseCov(n, n_signal, n_noise,
eval, evect, cor, cov_signal, cov_noise);
}
...
static int nfCalculateSignalNoiseCov(int n, int n_signal, int n_noise,
double *eval, double *evect, double *cor, double *cov_signal,
double *cov_noise)
{
int i, j, nn;
/* SYRK parameters */
char uplo, trans;
double alpha, beta;
/* Calculate "noise" part of covariance matrix. */
for (j = 0; j < n_noise; j++) eval[j] = sqrt(eval[j]);
for (i = 0; i < n_noise; i++)
for (j = 0; j < n; j++)
evect[i*n + j] *= lambda[i];
uplo = 'U';
trans = 'N';
alpha = 1.0;
beta = 0.0;
nn = n;
if (n_noise > 0)
{
dsyrk(&uplo, &trans, &nn, &n_noise, &alpha, evect, &nn,
&beta, cov_noise, &nn);
}
else
{
for (i = 0; i < n*n; i++) cov_noise[i] = 0.0;
}
/* Calculate "signal" part of covariance matrix. */
if (n_signal > 0)
{
for (i = 0; i < n; i++)
for (j = 0; j <= i; j++)
cov_signal[i*n + j] = cor[i*n + j] - cov_noise[i*n + j];
}
else
{
for (i = 0; i < n*n; i++) cov_signal[i] = 0.0;
}
return 0;
}
```

### Deinitialization

Delete the task and release associated resources.

```
errcode = vslSSDeleteTask(task);
CheckSSError(errcode);
MKL_Free_Buffers();
```

### Routines Used

Task |
Routine |
Description |
---|---|---|

Initialize a summary statistics task and define the objects for analysis: dataset, its sizes (number of variables and number of observations), and the storage format. |
vsldSSNewTask |
Creates and initializes a new summary statistics task descriptor. |

Specify the memory to hold the correlation matrix. |
vsldSSEditCovCor |
Modifies the pointers to covariance/correlation/cross-product parameters. |

Specify the two-element array intended to hold accumulated weights of observations processed so far (necessary for correct computation of estimates for data streams) |
vsldSSEditTask |
Modifies address of an input/output parameter in the task descriptor. |

Call the major compute driver by specifying computation type VSL_SS_COR, and computation method, VSL_SS_METHOD_FAST. |
vsldSSCompute |
Computes Summary Statistics estimates. |

De-allocate resources associated with the task. |
vslSSDeleteTask |
Destroys the task object and releases the memory. |

Compute eigenvalues and eigenvectors of the correlation matrix. |
dsyevr |
Computes selected eigenvalues and, optionally, eigenvectors of a real symmetric matrix using the Relatively Robust Representations. |

Perform a symmetric rank-k update. |
dsyrk |
Performs a symmetric rank-k update. |

### Discussion

Step 4 of the algorithm involves solving an eigenvalue problem for a symmetric matrix. The online noise filtration algorithm requires computation of eigenvalues that belong to the predefined interval [`λ`_{min}, `λ`_{max} ], which define noise in the data. The LAPACK driver routine ?syevr is the default routine for solving this type of problem. The ?syevr interface allows the caller to specify a pair of values, in this case corresponding to `λ`_{min} and `λ`_{max}, as the lower and upper bounds of the interval to be searched for eigenvalues.

The eigenvectors found are returned as columns of an array containing an orthogonal matrix `A`, and eigenvalues are returned in an array containing elements of the diagonal matrix `Diag`. The correlation matrix for the noise component can be obtained by computing `A`*`Diag`*`A`^{T}. However, instead of constructing a noise correlation matrix using two general matrix multiplications, this can be more efficiently computed with one diagonal matrix multiplication and one rank-n update operation:

For the rank-n update operation, oneMKL provides the BLAS function ?syrk.

**Parent topic:**Intel® oneAPI Math Kernel Library Cookbook