Developer Guide and Reference

  • 2021.4
  • 09/27/2021

Classification Decision Forest

Decision forest classifier is a special case of the Decision Forest model.

Details

Given:
  • n feature vectors $X = \{x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\}$ of size p;
  • their non-negative sample weights $w = (w_1, \ldots, w_n)$;
  • the vector of class labels $y = (y_1, \ldots, y_n)$ that describes the class to which the feature vector $x_i$ belongs, where $y_i \in \{0, 1, \ldots, C-1\}$ and C is the number of classes.
The problem is to build a decision forest classifier.
Training Stage
The decision forest classifier follows the algorithmic framework of decision forest training and uses the Gini index as the impurity metric [Breiman84]. If sample weights are provided as input, the library uses a weighted version of the algorithm.
The Gini index is an impurity metric, calculated as follows:
$I_{Gini}(D) = 1 - \sum_{i=0}^{C-1} p_i^2$
where
  • D is the set of observations that reach the node;
  • $p_i$ depends on whether sample weights are provided:
    • Without sample weights: $p_i$ is the observed fraction of observations that belong to class i in D.
    • With sample weights: $p_i$ is the observed weighted fraction of observations that belong to class i in D:
      $p_i = \frac{\sum_{d \in \{d \in D \mid y_d = i\}} w_d}{\sum_{d \in D} w_d}$, where $w_d$ is the sample weight of observation d.
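The Gini index definition above can be illustrated with a minimal Python sketch. It is not the library's implementation; the function name `gini_index` and its arguments are hypothetical, and an empty node is assumed to have zero impurity.

```python
from collections import defaultdict


def gini_index(labels, weights=None):
    """Gini index 1 - sum_i p_i^2 of the observations that reach a node.

    labels  -- class labels of the observations in D
    weights -- optional non-negative sample weights (same length as labels)
    """
    if weights is None:
        # Unweighted case: every observation counts equally.
        weights = [1.0] * len(labels)
    class_weight = defaultdict(float)
    for label, weight in zip(labels, weights):
        class_weight[label] += weight
    total = sum(class_weight.values())
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((w / total) ** 2 for w in class_weight.values())


print(gini_index([0, 0, 1, 1]))                # 0.5
print(gini_index([0, 0, 1, 1], [3, 3, 1, 1]))  # 0.375
```

In the second call, class 0 carries 6 of the 8 units of weight, so the node is treated as purer than in the unweighted case.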
Prediction Stage
Given a decision forest classifier and vectors $x_1, \ldots, x_r$, the problem is to calculate the labels for those vectors. For each query vector $x_i$, the algorithm finds the leaf node that the vector reaches in each tree of the forest; that leaf gives the tree's classification response. The forest then returns the label y that receives the majority of votes across the trees.
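As a rough illustration of majority voting, the sketch below represents each tree as nested dictionaries. This representation, the helper names, the `f < featureValue` test form, and the tie-breaking rule are all assumptions made for the example, not the library's data structures.

```python
from collections import Counter


def stump(feature_index, feature_value, true_label, false_label):
    """Build a hypothetical one-split tree for the example."""
    return {
        "kind": "split",
        "featureIndex": feature_index,
        "featureValue": feature_value,
        "true": {"kind": "leaf", "classLabel": true_label},
        "false": {"kind": "leaf", "classLabel": false_label},
    }


def predict_tree(node, x):
    """Walk from the root to a leaf and return that leaf's class label."""
    while node["kind"] == "split":
        # Assumed test form: f < featureValue selects the "true" branch.
        passed = x[node["featureIndex"]] < node["featureValue"]
        node = node["true"] if passed else node["false"]
    return node["classLabel"]


def predict_forest(trees, x):
    """Return the label that receives the most votes across the trees."""
    votes = Counter(predict_tree(tree, x) for tree in trees)
    # Ties go to the smallest label (an arbitrary choice for this sketch).
    return min(votes, key=lambda label: (-votes[label], label))


forest = [stump(0, 0.5, 0, 1), stump(1, 2.0, 0, 1), stump(0, 1.5, 0, 1)]
print(predict_forest(forest, [1.0, 3.0]))  # two of three trees vote for 1
```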
Out-of-bag Error
The decision forest classifier follows the algorithmic framework for calculating the decision forest out-of-bag (OOB) error. The OOB predictions of all trees are aggregated and the OOB error of the decision forest is calculated as follows:
  • For each vector $x_i$ in the dataset X, predict its label $\hat{y}_i$ as the label that receives the majority of votes from the trees that contain $x_i$ in their OOB set.
  • Calculate the OOB error of the decision forest T as the average of misclassifications:
    $OOB(T) = \frac{1}{|D'|} \sum_{y_i \in D'} I\{y_i \neq \hat{y}_i\}$, where $D'$ is the set of observations that belong to the OOB set of at least one tree.
  • If the OOB error value for each observation is required, calculate the prediction error for $x_i$: $OOB(x_i) = I\{y_i \neq \hat{y}_i\}$
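The OOB aggregation can be sketched as follows. The input layout is an assumption for illustration: `oob_votes[i]` holds the labels predicted for observation i by the trees that have it in their OOB set. Observations with no such tree are skipped, and ties are broken toward the smallest label, which is an arbitrary choice.

```python
from collections import Counter


def oob_errors(y_true, oob_votes):
    """Return (forest OOB error, per-observation errors).

    y_true    -- true labels of the observations
    oob_votes -- oob_votes[i] lists the labels predicted for observation i
                 by the trees whose OOB set contains it (may be empty)
    """
    per_observation = {}
    for i, (label, votes) in enumerate(zip(y_true, oob_votes)):
        if not votes:
            continue  # in-bag for every tree, so no OOB prediction exists
        counts = Counter(votes)
        y_hat = min(counts, key=lambda c: (-counts[c], c))  # majority vote
        per_observation[i] = int(y_hat != label)
    if not per_observation:
        return None, per_observation
    forest_error = sum(per_observation.values()) / len(per_observation)
    return forest_error, per_observation


error, per_obs = oob_errors([0, 1, 1, 0], [[0, 0, 1], [0, 0], [1], []])
print(error, per_obs)  # one of the three OOB observations is misclassified
```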
Variable Importance
The library computes the Mean Decrease Impurity (MDI) importance measure, also known as the Gini importance or Mean Decrease Gini, using the Gini index as the impurity metric.

Usage of Training Alternative

To build a Decision Forest Classification model using methods of the Model Builder class of Decision Forest Classification, complete the following steps:
  • Create a Decision Forest Classification model builder using a constructor with the required number of classes and trees.
  • Create a decision tree and add nodes to it:
    • Use the createTree method with the required number of nodes in a tree and a label of the class for which the tree is created.
    • Use the addSplitNode and addLeafNode methods to add split and leaf nodes to the created tree. See the note below describing the decision tree structure.
    • After you add all nodes to the current tree, proceed to creating the next one in the same way.
  • Use the getModel method to get the trained Decision Forest Classification model after all trees have been created.
Each tree consists of internal nodes (called non-leaf or split nodes) and external nodes (leaf nodes). Each split node denotes a feature test that is a Boolean expression, for example, f < featureValue or f = featureValue, where f is a feature and featureValue is a constant. The test type depends on the feature type: continuous, categorical, or ordinal. For more information on the test types, see Decision Tree.
The induced decision tree is a binary tree, meaning that each non-leaf node has exactly two branches: true and false. Each split node contains featureIndex, the index of the feature used for the feature test in this node, and featureValue, the constant for the Boolean expression in the test. Each leaf node contains a classLabel, the predicted class for this leaf. For more information on decision trees, see Decision Tree.
Add nodes to the created tree in accordance with the pre-calculated structure of the tree. Check that the leaf nodes do not have child nodes and that the split nodes have exactly two children.
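The structural check described above can be sketched independently of the library. The `Node` class and `is_valid_tree` function below are hypothetical helpers for validating a pre-calculated tree before passing it to a model builder; they do not call or reproduce the Model Builder API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Split nodes carry feature_index and feature_value;
    # leaf nodes carry class_label.
    feature_index: Optional[int] = None
    feature_value: Optional[float] = None
    class_label: Optional[int] = None
    true_child: Optional["Node"] = None
    false_child: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.class_label is not None


def is_valid_tree(node: Optional[Node]) -> bool:
    """Check that leaves have no children and splits have exactly two."""
    if node is None:
        return False  # a missing child makes the parent split invalid
    children = [node.true_child, node.false_child]
    if node.is_leaf:
        return all(child is None for child in children)
    if node.feature_index is None or node.feature_value is None:
        return False  # a split node needs a complete feature test
    return all(is_valid_tree(child) for child in children)


good = Node(feature_index=0, feature_value=0.5,
            true_child=Node(class_label=0), false_child=Node(class_label=1))
bad = Node(feature_index=0, feature_value=0.5, true_child=Node(class_label=0))
print(is_valid_tree(good), is_valid_tree(bad))  # True False
```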
Examples
Examples are available for the following interfaces:
  • C++ (CPU)
  • Java* (there is no support for Java on GPU)
  • Python*

Batch Processing

Decision forest classification follows the general workflow described in Decision Forest and Classification Usage Model.
Training
In addition to the parameters of a classifier (see Classification Usage Model) and decision forest parameters described in Batch Processing, the training algorithm for decision forest classification has the following parameters:
Parameter (default value): description
  • algorithmFPType (default: float): The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
  • method (default: defaultDense): The computation method used by the decision forest classification.
    For CPU:
    • defaultDense - default performance-oriented method
    • hist - inexact histogram computation method
    For GPU:
  • nClasses (default: not applicable): The number of classes. A required parameter.
Output
Decision forest classification calculates the result of classification and decision forest. For more details, refer to Batch Processing and Classification Usage Model.
Prediction
For the description of the input and output, refer to Classification Usage Model.
In addition to the parameters of a classifier, decision forest classification has the following parameters at the prediction stage:
Parameter (default value): description
  • algorithmFPType (default: float): The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
  • method (default: defaultDense): The computation method used by the decision forest classification. The only prediction method supported so far is the default dense method.
  • nClasses (default: not applicable): The number of classes. A required parameter.
  • votingMethod (default: weighted): A flag that specifies which method is used to compute probabilities and class labels:
    • weighted
      • The probability for each class is computed as the sample mean of estimates across all trees, where each estimate is the normalized number of training samples of this class recorded in the leaf node that the current input reaches.
      • The algorithm returns the label of the class with the maximal value of the sample mean.
    • unweighted
      • Probabilities are computed as the normalized distribution of votes across all trees of the forest.
      • The algorithm returns the label of the class that gets the majority of votes across all trees of the forest.
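The difference between the two voting methods can be seen in a small sketch. It assumes a hypothetical input `leaf_class_counts`, where `leaf_class_counts[t][c]` is the number of training samples of class c recorded in the leaf that tree t reaches for the current input. It also assumes that, for unweighted voting, each tree votes for the most frequent class in that leaf. Neither assumption is taken from the library's API.

```python
def predict_proba(leaf_class_counts, voting_method="weighted"):
    """Return (class probabilities, predicted label) for one input vector."""
    n_trees = len(leaf_class_counts)
    n_classes = len(leaf_class_counts[0])
    proba = [0.0] * n_classes
    for counts in leaf_class_counts:
        if voting_method == "weighted":
            # Each tree contributes its normalized leaf class counts
            # (leaves are assumed to be non-empty).
            total = sum(counts)
            for c in range(n_classes):
                proba[c] += counts[c] / total / n_trees
        elif voting_method == "unweighted":
            # Each tree casts one vote for its leaf's most frequent class.
            winner = max(range(n_classes), key=lambda c: counts[c])
            proba[winner] += 1.0 / n_trees
        else:
            raise ValueError("unknown voting method: " + voting_method)
    label = max(range(n_classes), key=lambda c: proba[c])
    return proba, label


leaves = [[9, 1], [4, 6], [4, 6]]  # three trees, two classes
print(predict_proba(leaves, "weighted"))    # class 0 has the larger mean
print(predict_proba(leaves, "unweighted"))  # class 1 gets two of three votes
```

With these counts the two methods disagree: one confident tree outweighs two marginal ones under weighted voting, while unweighted voting counts each tree equally.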
Examples
Batch processing examples are available for the following interfaces:
  • oneAPI DPC++
  • oneAPI C++
  • C++ (CPU)
  • Java* (there is no support for Java on GPU)
  • Python*
