Z-score

Intel® oneAPI Data Analytics Library Developer Guide and Reference

ID 772611

Date 11/07/2023


Z-score normalization is an algorithm that produces data with each feature (column) having zero mean and unit variance.

Details

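Given a set $X$ of $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of dimension $p$, the problem is to compute the matrix $Y = (y_{ij})$ of dimension $n \times p$ as follows:

$$y_{ij} = \frac{x_{ij} - m_j}{\Delta}$$

where:

$m_j$ is the mean of the $j$-th component of set $(X)_j$, where $j = 1, \ldots, p$
the value of $\Delta$ depends on the computation mode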
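Here $m_j$ is the ordinary average of the $j$-th column of $X$. A small worked instance with an illustrative column of $n = 3$ values shows the centering step:

$$m_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$$

$$(x_{1j}, x_{2j}, x_{3j}) = (2, 4, 9) \;\Rightarrow\; m_j = \frac{2 + 4 + 9}{3} = 5, \qquad (x_{1j} - m_j,\ x_{2j} - m_j,\ x_{3j} - m_j) = (-3, -1, 4)$$

The centered values sum to zero, which is why every computation mode yields a result column with zero mean.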

oneDAL provides two modes for computing the result matrix. You select the mode with the doScale flag (for details, see Algorithm Parameters). The available modes are:

Centering only (doScale is false). In this case, $\Delta = 1$ and no scaling is performed. After normalization, the mean of the $j$-th component of the result set $(Y)_j$ is zero.
Centering and scaling (doScale is true). In this case, $\Delta = \sigma_j$, where $\sigma_j$ is the standard deviation of the $j$-th component of set $(X)_j$. After normalization, the $j$-th component of the result set $(Y)_j$ has zero mean and unit variance.
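The two modes can be sketched in plain Python. This is a minimal illustration, not oneDAL code: `zscore` and its `do_scale` argument are hypothetical names standing in for the algorithm and its doScale flag, and the sketch assumes $\sigma_j$ is the sample standard deviation (denominator n - 1), which the text above does not specify.

```python
import statistics


def zscore(rows, do_scale=True):
    """Return y_ij = (x_ij - m_j) / delta for a list of equal-length rows.

    delta = 1 when do_scale is False (centering only) and delta = sigma_j,
    the standard deviation of column j, when do_scale is True.
    Sketch assumptions: sigma_j uses the n - 1 denominator (statistics.stdev),
    and constant columns (sigma_j = 0) are not handled.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    means = [statistics.mean(col) for col in columns]
    if do_scale:
        deltas = [statistics.stdev(col) for col in columns]
    else:
        deltas = [1.0] * len(columns)
    return [
        [(x - m) / d for x, m, d in zip(row, means, deltas)]
        for row in rows
    ]


# 3 feature vectors (rows) of dimension 2 (columns).
data = [[1.0, 10.0],
        [2.0, 20.0],
        [3.0, 30.0]]

print(zscore(data, do_scale=False))  # [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
print(zscore(data, do_scale=True))   # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

The centering-only output keeps each column's spread, while the scaled output maps both columns to the same values because the second column is ten times the first.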

NOTE:

Some algorithms require the normalization parameters (mean and variance) as input. By default, the oneDAL implementation of the Z-score algorithm does not return these values; to have them returned, set the resultsToCompute flag. For details, see Algorithm Parameters.
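To illustrate why these parameters matter, the following minimal Python sketch (not oneDAL code) computes per-feature means and variances once and then reuses them to normalize a row that was not part of the original data, for example a new observation arriving later. The helper names `fit_zscore` and `apply_zscore` are hypothetical, and the sample variance (denominator n - 1) is an assumption of the sketch.

```python
import math
import statistics


def fit_zscore(rows):
    """Return per-feature (means, variances) of the given rows.

    Sketch assumption: variances use the n - 1 denominator (statistics.variance).
    """
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    variances = [statistics.variance(col) for col in columns]
    return means, variances


def apply_zscore(rows, means, variances):
    """Center and scale rows with previously computed statistics."""
    sigmas = [math.sqrt(v) for v in variances]
    return [[(x - m) / s for x, m, s in zip(row, means, sigmas)] for row in rows]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
means, variances = fit_zscore(train)
print(means, variances)                               # [2.0, 20.0] [1.0, 100.0]
print(apply_zscore([[4.0, 40.0]], means, variances))  # [[2.0, 2.0]]
```

Keeping the statistics separate from the transformation means the new row is scaled exactly as the original data was, instead of being normalized against its own (single-row) statistics.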

Batch Processing

Algorithm Input

The Z-score normalization algorithm accepts the input described below. Pass the Input ID as a parameter to the methods that provide input for your algorithm. For more details, see Algorithms.

Algorithm Input for Z-score (Batch Processing)
Input ID	Input
`data`	Pointer to the numeric table of size $n \times p$. NOTE: This table can be an object of any class derived from `NumericTable`.

Algorithm Parameters

The Z-score normalization algorithm has the following parameters. Some of them are required only for specific values of the computation method parameter `method`:

Algorithm Parameters for Z-score (Batch Processing)
Parameter	method	Default Value	Description
`algorithmFPType`	`defaultDense` or `sumDense`	`float`	The floating-point type that the algorithm uses for intermediate computations. Can be `float` or `double`.
`method`	Not applicable	`defaultDense`	Available computation methods: `defaultDense`: a performance-oriented method; mean and variance are computed by the low order moments algorithm (for details, see Batch Processing for Moments of Low Order). `sumDense`: a method that uses the basic statistics associated with the numeric table of pre-computed sums; returns an error if pre-computed sums are not defined.
`moments`	`defaultDense`	`SharedPtr<low_order_moments::Batch<algorithmFPType, low_order_moments::defaultDense> >`	Pointer to the low order moments algorithm that computes means and standard deviations to be used for Z-score normalization with the `defaultDense` method.
`doScale`	`defaultDense` or `sumDense`	`true`	If true, the algorithm applies both centering and scaling. Otherwise, the algorithm provides only centering.
`resultsToCompute`	`defaultDense` or `sumDense`	Not applicable	Optional. Pointer to the data collection containing the following key-value pairs for Z-score: `mean` (means) and `variance` (variances). Provide one of these values to request a single characteristic, or use bitwise OR to request a combination of them.
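The idea behind the `sumDense` method can be sketched as follows: when per-column sums are already available, the means follow directly as $m_j = S_j / n$ without summing the data again. This is a hedged illustration, not oneDAL's implementation: `zscore_from_sums` and the plain list `column_sums` are hypothetical stand-ins for the numeric table's pre-computed basic statistics, and the standard deviation is recomputed from the data with the n - 1 denominator as an assumption of the sketch.

```python
import math


def zscore_from_sums(rows, column_sums, do_scale=True):
    """Normalize rows using pre-computed per-column sums (hypothetical helper).

    Means come from the supplied sums; the standard deviation still needs a
    pass over the data in this sketch (n - 1 denominator, an assumption).
    Raises ValueError when the sums are missing, mirroring the error that
    sumDense reports when pre-computed sums are not defined.
    """
    if column_sums is None or len(column_sums) != len(rows[0]):
        raise ValueError("pre-computed sums are not defined")
    n = len(rows)
    means = [s / n for s in column_sums]
    if do_scale:
        sigmas = [
            math.sqrt(sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1))
            for j in range(len(means))
        ]
    else:
        sigmas = [1.0] * len(means)
    return [[(x - m) / s for x, m, s in zip(row, means, sigmas)] for row in rows]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
sums = [6.0, 60.0]  # assumed to have been computed by an earlier step
print(zscore_from_sums(data, sums))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```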

Algorithm Output

The Z-score normalization algorithm calculates the results described below. Pass the Result ID as a parameter to the methods that access the results of your algorithm. For more details, see Algorithms.

Algorithm Output for Z-score (Batch Processing)
Result ID	Result
`normalizedData`	Pointer to the numeric table that stores the result of normalization. NOTE: By default, the result is an object of the `HomogenNumericTable` class, but you can define the result as an object of any class derived from `NumericTable` except `PackedTriangularMatrix`, `PackedSymmetricMatrix`, and `CSRNumericTable`.
`means`	Optional. Pointer to the numeric table that contains the mean value of each feature. If this result is not requested through the `resultsToCompute` parameter, the numeric table contains a `NULL` pointer.
`variances`	Optional. Pointer to the numeric table that contains the variance of each feature. If this result is not requested through the `resultsToCompute` parameter, the numeric table contains a `NULL` pointer.

NOTE:

By default, each numeric table specified by the collection elements is an object of the HomogenNumericTable class. You can also define the result as an object of any class derived from NumericTable, except for PackedSymmetricMatrix, PackedTriangularMatrix, and CSRNumericTable.
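The request-by-flag pattern for the optional outputs can be sketched in Python. The `ResultsToCompute` flags and the `zscore_with_results` function are hypothetical and only model the `mean`/`variance` options and the bitwise-OR combination described above; they are not oneDAL's enum or API. Unrequested statistics come back as `None`, playing the role of the `NULL` pointers in the output table. The sample variance (n - 1 denominator) is an assumption of the sketch.

```python
import math
import statistics
from enum import IntFlag


class ResultsToCompute(IntFlag):
    """Hypothetical flags modeled on the mean/variance options (not oneDAL's enum)."""
    NONE = 0
    MEAN = 1
    VARIANCE = 2


def zscore_with_results(rows, results_to_compute=ResultsToCompute.NONE):
    """Return normalized data plus, if requested, the means and variances."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    variances = [statistics.variance(col) for col in columns]  # n - 1 denominator (assumption)
    sigmas = [math.sqrt(v) for v in variances]
    normalized = [
        [(x - m) / s for x, m, s in zip(row, means, sigmas)]
        for row in rows
    ]
    return {
        "normalizedData": normalized,
        # Unrequested statistics are None, like the NULL pointers above.
        "means": means if results_to_compute & ResultsToCompute.MEAN else None,
        "variances": variances if results_to_compute & ResultsToCompute.VARIANCE else None,
    }


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
out = zscore_with_results(data, ResultsToCompute.MEAN)
print(out["means"], out["variances"])  # [2.0, 20.0] None
both = zscore_with_results(data, ResultsToCompute.MEAN | ResultsToCompute.VARIANCE)
print(both["variances"])               # [1.0, 100.0]
```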

Examples

C++ (CPU)

Batch Processing:

zscore_dense_batch.cpp

Python*

Batch Processing:

https://github.com/intel/scikit-learn-intelex/tree/master/examples/daal4py/normalization_zscore_batch.py
