Intel® oneAPI Data Analytics Library Developer Guide and Reference
Classification Decision Tree
A classification decision tree is a kind of decision tree described in Decision Tree.
Details
Given:
- $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of size $p$
- The vector of class labels $y = (y_1, \ldots, y_n)$ that describes the class to which the feature vector $x_i$ belongs, where $y_i \in \{0, 1, \ldots, C-1\}$ and $C$ is the number of classes.
The problem is to build a decision tree classifier.
Split Criteria
The library provides the decision tree classification algorithm based on the following split criteria: Gini index [Breiman84] and Information gain [Quinlan86], [Mitchell97].
Gini index

$$I_{Gini}(D) = 1 - \sum_{i=0}^{C-1} p_i^2$$

where

- $D$ is a set of observations that reach the node
- $p_i$ is the observed fraction of observations with class $i$ in $D$

To find the best test using the Gini index, each possible test is examined using

$$\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} I_{Gini}(D_v)$$

where

- $O(\tau)$ is the set of all possible outcomes of test $\tau$
- $D_v$ is the subset of $D$ for which the outcome of $\tau$ is $v$, that is, $D_v = \{d \in D \mid \tau(d) = v\}$

The test to be used in the node is selected as $\operatorname{argmax}_{\tau} \, \Delta I_{Gini}(D, \tau)$. For a binary decision tree with 'true' and 'false' branches,

$$\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \frac{|D_{true}|}{|D|} I_{Gini}(D_{true}) - \frac{|D_{false}|}{|D|} I_{Gini}(D_{false})$$

Information gain

$$\operatorname{InfoGain}(D, \tau) = I_{Entropy}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} I_{Entropy}(D_v)$$

where

- $O(\tau)$, $D$, and $D_v$ are defined above
- $I_{Entropy}(D) = -\sum_{i=0}^{C-1} p_i \log p_i$, with $p_i$ defined above in Gini index

Similarly to the Gini index, the test to be used in the node is selected as $\operatorname{argmax}_{\tau} \, \operatorname{InfoGain}(D, \tau)$. For a binary decision tree with 'true' and 'false' branches,

$$\operatorname{InfoGain}(D, \tau) = I_{Entropy}(D) - \frac{|D_{true}|}{|D|} I_{Entropy}(D_{true}) - \frac{|D_{false}|}{|D|} I_{Entropy}(D_{false})$$
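To make the two criteria concrete, here is a small standalone C++ sketch (illustrative only, not part of the library API; all function and variable names are invented for the example) that computes the Gini index, the entropy, and the resulting gain of one binary test that sends observations to 'true' and 'false' branches.

```cpp
#include <cmath>
#include <iostream>
#include <vector>

// Observed class fractions p_i for labels in 0..nClasses-1.
static std::vector<double> classFractions(const std::vector<int>& labels, int nClasses) {
    std::vector<double> p(nClasses, 0.0);
    for (int y : labels) p[y] += 1.0;
    for (double& f : p) f /= static_cast<double>(labels.size());
    return p;
}

// I_Gini(D) = 1 - sum_i p_i^2
static double giniIndex(const std::vector<int>& labels, int nClasses) {
    double s = 0.0;
    for (double f : classFractions(labels, nClasses)) s += f * f;
    return 1.0 - s;
}

// I_Entropy(D) = -sum_i p_i * log(p_i)
static double entropy(const std::vector<int>& labels, int nClasses) {
    double h = 0.0;
    for (double f : classFractions(labels, nClasses))
        if (f > 0.0) h -= f * std::log(f);
    return h;
}

// Gain of a binary test:
// impurity(D) - |D_true|/|D| * impurity(D_true) - |D_false|/|D| * impurity(D_false)
template <typename Impurity>
static double splitGain(const std::vector<int>& dTrue, const std::vector<int>& dFalse,
                        int nClasses, Impurity impurity) {
    std::vector<int> d(dTrue);
    d.insert(d.end(), dFalse.begin(), dFalse.end());
    const double n = static_cast<double>(d.size());
    return impurity(d, nClasses)
           - (dTrue.size() / n) * impurity(dTrue, nClasses)
           - (dFalse.size() / n) * impurity(dFalse, nClasses);
}

int main() {
    // Class labels of observations falling into the 'true' and 'false' branches
    // of one candidate test (illustrative data, two classes).
    const std::vector<int> dTrue  = {0, 0, 0, 1};
    const std::vector<int> dFalse = {1, 1, 1, 0};
    const int nClasses = 2;

    std::cout << "Gini gain:        " << splitGain(dTrue, dFalse, nClasses, giniIndex) << "\n";
    std::cout << "Information gain: " << splitGain(dTrue, dFalse, nClasses, entropy)   << "\n";
    return 0;
}
```

During training, each candidate test is scored this way and the node keeps the test with the largest gain under the chosen criterion.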
Training Stage
The classification decision tree follows the algorithmic framework of decision tree training described in Decision Tree.
Prediction Stage
The classification decision tree follows the algorithmic framework of decision tree prediction described in Decision Tree.
Given the decision tree and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors.
Batch Processing
Decision tree classification follows the general workflow described in Classification Usage Model.
Training
In addition to common input for a classifier, decision trees can accept the following inputs that are used for post-pruning:
| Input ID | Input |
|---|---|
| dataForPruning | Pointer to the numeric table with the pruning data set. This table can be an object of any class derived from NumericTable. |
| labelsForPruning | Pointer to the numeric table with class labels. This table can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix. |
At the training stage, the decision tree classifier has the following parameters:
| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | The computation method used by the decision tree classification. The only training method supported so far is the default dense method. |
| nClasses | Not applicable | The number of classes. A required parameter. |
| splitCriterion | infoGain | Split criterion to choose the best test for split nodes. Available split criteria for decision trees: gini (the Gini index) and infoGain (the information gain). |
| pruning | reducedErrorPruning | Method to perform post-pruning. Available options for the pruning parameter: none (do not prune) and reducedErrorPruning (reduced error pruning, which uses the dataForPruning and labelsForPruning inputs). |
| maxTreeDepth | 0 | Maximum tree depth. Zero value means unlimited depth. Can be any non-negative number. |
| minObservationsInLeafNodes | 1 | Minimum number of observations in the leaf node. Can be any positive number. |
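As a rough illustration of how these inputs and parameters fit together, the sketch below assumes the classic daal.h C++ batch interface (decision_tree::classification::training::Batch) and NumericTable pointers trainData, trainLabels, pruneData, and pruneLabels loaded elsewhere; the parameter-member access and exact namespaces should be verified against the headers of your library version (see the C++ batch example listed under Examples).

```cpp
// A minimal training sketch, assuming the classic daal.h C++ batch interface.
// trainData, trainLabels, pruneData, and pruneLabels are assumed to be
// NumericTablePtr objects loaded elsewhere (for example, from CSV data sources).
#include "daal.h"

using namespace daal;
using namespace daal::algorithms;

decision_tree::classification::training::ResultPtr trainDecisionTree(
    const data_management::NumericTablePtr &trainData,
    const data_management::NumericTablePtr &trainLabels,
    const data_management::NumericTablePtr &pruneData,
    const data_management::NumericTablePtr &pruneLabels,
    size_t nClasses)
{
    // nClasses is a required parameter of the training algorithm.
    decision_tree::classification::training::Batch<> algorithm(nClasses);

    // Common classifier inputs: training data and class labels.
    algorithm.input.set(classifier::training::data,   trainData);
    algorithm.input.set(classifier::training::labels, trainLabels);

    // Additional inputs used for reduced error post-pruning.
    algorithm.input.set(decision_tree::classification::training::dataForPruning,   pruneData);
    algorithm.input.set(decision_tree::classification::training::labelsForPruning, pruneLabels);

    // Optional parameters from the table above; direct member access to
    // `parameter` is an assumption -- verify against your library version.
    algorithm.parameter.maxTreeDepth = 0;               // 0 means unlimited depth
    algorithm.parameter.minObservationsInLeafNodes = 1; // at least one observation per leaf

    // Build the tree and return the training result holding the model.
    algorithm.compute();
    return algorithm.getResult();
}
```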
Prediction
At the prediction stage, the decision tree classifier has the following parameters:

| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | The computation method used by the decision tree classification. The only prediction method supported so far is the default dense method. |
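Under the same assumptions as the training sketch above, the prediction step might look as follows; trainingResult is the object returned by the training step and testData is a NumericTable pointer loaded elsewhere.

```cpp
// A minimal prediction sketch under the same assumptions as the training sketch.
#include "daal.h"

using namespace daal;
using namespace daal::algorithms;

data_management::NumericTablePtr predictDecisionTree(
    const decision_tree::classification::training::ResultPtr &trainingResult,
    const data_management::NumericTablePtr &testData)
{
    // Some library versions may expect the number of classes in the constructor;
    // the default constructor is assumed here.
    decision_tree::classification::prediction::Batch<> algorithm;

    // Common classifier prediction inputs: test data and the trained model.
    algorithm.input.set(classifier::prediction::data,  testData);
    algorithm.input.set(classifier::prediction::model,
                        trainingResult->get(classifier::training::model));

    // Compute and return the predicted class labels, one per feature vector.
    algorithm.compute();
    return algorithm.getResult()->get(classifier::prediction::prediction);
}
```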
Examples
C++ (CPU)
Batch Processing:
Python*
Batch Processing: