Decision Forest Classification and Regression (DF)
Mathematical formulation
Programming Interface
- error_metric_mode::none
- Do not compute error metric.
- error_metric_mode::out_of_bag_error
- Train produces a $1 \times 1$ table with the cumulative prediction error for out-of-bag observations.
- error_metric_mode::out_of_bag_error_per_observation
- Train produces an $n \times 1$ table with the prediction error for out-of-bag observations.
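To illustrate the two out-of-bag modes, here is a minimal C++ sketch for classification. It is not the oneDAL implementation: it assumes every tree's prediction for every training observation is already known, combines out-of-bag votes by simple majority, and reports a 0/1 misclassification per observation. The names `OobErrors` and `oob_errors`, and the use of -1.0 for observations that are never out of bag, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container mirroring the two outputs: the cumulative error
// (1 x 1 table) and the per-observation error (n x 1 table).
struct OobErrors {
    double cumulative = 0.0;
    std::vector<double> per_observation;  // -1.0 marks "never out of bag"
};

// predictions[t][i]: class predicted by tree t for training observation i.
// in_bag[t][i]: true if observation i was drawn into tree t's bootstrap sample.
OobErrors oob_errors(const std::vector<std::vector<std::int64_t>>& predictions,
                     const std::vector<std::vector<bool>>& in_bag,
                     const std::vector<std::int64_t>& labels,
                     std::int64_t class_count) {
    assert(predictions.size() == in_bag.size());
    assert(class_count > 0);
    const std::size_t n = labels.size();
    OobErrors result;
    result.per_observation.assign(n, -1.0);
    std::size_t oob_observations = 0;
    std::size_t misclassified = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // Collect votes only from trees that did not train on observation i.
        std::vector<std::int64_t> votes(static_cast<std::size_t>(class_count), 0);
        bool has_oob_vote = false;
        for (std::size_t t = 0; t < predictions.size(); ++t) {
            if (!in_bag[t][i]) {
                ++votes[static_cast<std::size_t>(predictions[t][i])];
                has_oob_vote = true;
            }
        }
        if (!has_oob_vote) continue;
        // Simple majority vote; ties go to the smallest class index.
        std::size_t best = 0;
        for (std::size_t c = 1; c < votes.size(); ++c)
            if (votes[c] > votes[best]) best = c;
        const bool wrong = static_cast<std::int64_t>(best) != labels[i];
        result.per_observation[i] = wrong ? 1.0 : 0.0;
        ++oob_observations;
        if (wrong) ++misclassified;
    }
    if (oob_observations > 0)
        result.cumulative = static_cast<double>(misclassified) /
                            static_cast<double>(oob_observations);
    return result;
}
```

In this sketch the cumulative value is the misclassification rate over the observations that received at least one out-of-bag vote.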
- variable_importance_mode::none
- Do not compute variable importance.
- variable_importance_mode::mdi
- Mean Decrease Impurity. Computed as the sum of weighted impurity decreases for all nodes where the variable is used, averaged over all trees in the forest.
- variable_importance_mode::mda_raw
- Mean Decrease Accuracy (permutation importance). For each tree, the prediction error on the out-of-bag portion of the data is computed (error rate for classification, MSE for regression). The same is done after permuting each predictor variable. The differences between the two are then averaged over all trees.
- variable_importance_mode::mda_scaled
- Mean Decrease Accuracy (permutation importance). This is the MDA_Raw value scaled by its standard deviation.
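The two MDA variants differ only in the final scaling step. The following C++ sketch assumes the per-tree out-of-bag errors before and after permuting one feature have already been computed; `MdaScores` and `mda_for_feature` are hypothetical names. It reads "scaled by its standard deviation" as dividing the raw mean by the sample standard deviation of the per-tree differences. That reading is an assumption: some random forest implementations divide by the standard error instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct MdaScores {
    double raw = 0.0;     // mean increase in OOB error over all trees
    double scaled = 0.0;  // raw divided by the std. deviation of the per-tree increases
};

// oob_error[t]: OOB error of tree t on the original data.
// permuted_error[t]: OOB error of tree t after permuting one feature.
MdaScores mda_for_feature(const std::vector<double>& oob_error,
                          const std::vector<double>& permuted_error) {
    assert(oob_error.size() == permuted_error.size());
    assert(oob_error.size() > 1);  // a sample std. deviation needs at least two trees
    const std::size_t trees = oob_error.size();
    std::vector<double> diff(trees);
    double sum = 0.0;
    for (std::size_t t = 0; t < trees; ++t) {
        diff[t] = permuted_error[t] - oob_error[t];
        sum += diff[t];
    }
    MdaScores s;
    s.raw = sum / static_cast<double>(trees);
    double sq = 0.0;
    for (double d : diff) sq += (d - s.raw) * (d - s.raw);
    const double stddev = std::sqrt(sq / static_cast<double>(trees - 1));
    // Hypothetical convention: report 0 when all trees agree exactly.
    s.scaled = stddev > 0.0 ? s.raw / stddev : 0.0;
    return s;
}
```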
- infer_mode::class_labels
- Infer produces an $n \times 1$ table with the predicted labels.
- infer_mode::class_responses
- deprecated
- infer_mode::class_probabilities
- Infer produces an $n \times c$ table with the predicted class probabilities for each observation, where $c$ is the number of classes.
- voting_mode::weighted
- The final prediction is combined by weighted majority voting.
- voting_mode::unweighted
- The final prediction is combined by simple majority voting.
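The two voting modes can be sketched for a single observation. This sketch assumes, without confirmation from the text above, that weighted voting sums each tree's leaf class-probability vector, while unweighted voting gives each tree one vote for its most probable class. Averaging over trees then gives the per-class probabilities for that observation, and the predicted label is their argmax. `Voting`, `argmax`, and `forest_probabilities` are hypothetical names, not oneDAL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Voting { weighted, unweighted };

// Index of the largest entry; ties go to the smallest index.
std::size_t argmax(const std::vector<double>& v) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < v.size(); ++c)
        if (v[c] > v[best]) best = c;
    return best;
}

// tree_probs[t][c]: probability of class c in the leaf that one observation
// reaches in tree t. Returns the forest's class probabilities for it.
std::vector<double> forest_probabilities(const std::vector<std::vector<double>>& tree_probs,
                                         Voting mode) {
    assert(!tree_probs.empty());
    std::vector<double> acc(tree_probs.front().size(), 0.0);
    for (const auto& p : tree_probs) {
        if (mode == Voting::weighted) {
            // Each tree contributes its whole probability vector.
            for (std::size_t c = 0; c < p.size(); ++c) acc[c] += p[c];
        } else {
            // Each tree casts a single vote for its most probable class.
            acc[argmax(p)] += 1.0;
        }
    }
    for (double& a : acc) a /= static_cast<double>(tree_probs.size());
    return acc;
}
```

With leaf probabilities {0.6, 0.4}, {0.6, 0.4}, and {0.0, 1.0}, the weighted mode in this sketch predicts class 1 (0.4 vs 0.6), while the unweighted mode predicts class 0 (two votes against one).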
- template <typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor
- Template Parameters
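- Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
- Method – Tag-type that specifies the computational method. Can be method::dense or method::hist.
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.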
Constructors
- descriptor() = default
- Creates a new instance of the class with the default property values.
Properties
- double impurity_threshold
- The impurity threshold: a node is split if the split decreases the impurity by an amount greater than or equal to this value. Default value: 0.0.
- Getter & Setter
double get_impurity_threshold() const
auto & set_impurity_threshold(double value)
- Invariants
impurity_threshold >= 0.0
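The split rule above can be sketched as follows. The sketch assumes Gini impurity (the text does not name the impurity measure) and defines the decrease as the parent impurity minus the size-weighted impurities of the two children. `gini`, `impurity_decrease`, and `accept_split` are hypothetical names; this is not oneDAL's split-finding code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Gini impurity of a node given per-class observation counts.
double gini(const std::vector<std::int64_t>& counts) {
    std::int64_t total = 0;
    for (auto c : counts) total += c;
    if (total == 0) return 0.0;
    double sum_sq = 0.0;
    for (auto c : counts) {
        const double p = static_cast<double>(c) / static_cast<double>(total);
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Parent impurity minus the size-weighted impurities of the two children.
double impurity_decrease(const std::vector<std::int64_t>& left,
                         const std::vector<std::int64_t>& right) {
    assert(left.size() == right.size());
    std::vector<std::int64_t> parent(left.size());
    std::int64_t n_left = 0, n_right = 0;
    for (std::size_t c = 0; c < left.size(); ++c) {
        parent[c] = left[c] + right[c];
        n_left += left[c];
        n_right += right[c];
    }
    const double n = static_cast<double>(n_left + n_right);
    if (n == 0.0) return 0.0;
    return gini(parent) - (n_left / n) * gini(left) - (n_right / n) * gini(right);
}

// Split rule as described for impurity_threshold.
bool accept_split(double decrease, double impurity_threshold) {
    assert(impurity_threshold >= 0.0);  // documented invariant
    return decrease >= impurity_threshold;
}
```

A split that separates two classes perfectly, for example 4 observations of class 0 to the left and 4 of class 1 to the right, yields a decrease of 0.5 in this sketch; a split that leaves both children with the parent's class mix yields 0.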
- std::int64_t max_tree_depth
- The maximal depth of the tree. If 0, nodes are expanded until all leaves are pure or until all leaves contain no more than min_observations_in_leaf_node samples. Default value: 0.
- Getter & Setter
std::int64_t get_max_tree_depth() const
auto & set_max_tree_depth(std::int64_t value)
- std::int64_t max_bins
- The maximal number of discrete bins used to bucket continuous features. Used with the method::hist split-finding method only. Increasing the number results in higher computation costs. Default value: 256.
- Getter & Setter
std::int64_t get_max_bins() const
auto & set_max_bins(std::int64_t value)
- Invariants
max_bins > 1
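One way to bucket a continuous feature into at most max_bins bins is equal-frequency (quantile) binning. The C++ sketch below uses that strategy purely for illustration; whether method::hist computes its bin boundaries exactly this way is an assumption, and `bin_boundaries` and `bin_index` are hypothetical names. Repeated values collapse into one boundary, so a feature with few distinct values gets fewer bins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Upper bin boundaries for one feature using equal-frequency binning.
// Duplicate boundaries are dropped, so fewer than max_bins bins may result.
std::vector<double> bin_boundaries(std::vector<double> values, std::int64_t max_bins) {
    assert(max_bins > 1);  // documented invariant
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    const std::size_t bins = static_cast<std::size_t>(max_bins);
    std::vector<double> edges;
    for (std::size_t b = 1; b <= bins; ++b) {
        // One past the index of the last value that falls into bin b.
        const std::size_t idx = b * n / bins;
        if (idx == 0) continue;
        const double edge = values[idx - 1];
        if (edges.empty() || edge > edges.back()) edges.push_back(edge);
    }
    return edges;
}

// Bin of a value: the first bin whose upper boundary is >= value.
std::size_t bin_index(const std::vector<double>& edges, double value) {
    const auto it = std::lower_bound(edges.begin(), edges.end(), value);
    if (it == edges.end()) return edges.size() - 1;  // clamp values above the last edge
    return static_cast<std::size_t>(it - edges.begin());
}
```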
- std::int64_t seed
- Seed for the random number generator used by the algorithm.
- Getter & Setter
std::int64_t get_seed() const
auto & set_seed(std::int64_t value)
- std::int64_t tree_count
- The number of trees in the forest. Default value: 100.
- Getter & Setter
std::int64_t get_tree_count() const
auto & set_tree_count(std::int64_t value)
- Invariants
tree_count > 0
- variable_importance_mode variable_importance_mode
- The variable importance mode. Default value: variable_importance_mode::none.
- Getter & Setter
variable_importance_mode get_variable_importance_mode() const
auto & set_variable_importance_mode(variable_importance_mode value)
- voting_mode voting_mode
- The voting mode. Used with task::classification only.
- Getter & Setter
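template <typename T = Task, typename None = detail::enable_if_classification_t<T>> voting_mode get_voting_mode() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_voting_mode(voting_mode value)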
- std::int64_t features_per_node
- The number of features to consider when looking for the best split for a node. Default value: sqrt(p) for task::classification and p/3 for task::regression, where p is the total number of features.
- Getter & Setter
std::int64_t get_features_per_node() const
auto & set_features_per_node(std::int64_t value)
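The default rule can be written as a small helper. Truncating the result toward zero and never returning less than 1 are assumptions not stated in the text; `Task` and `default_features_per_node` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

enum class Task { classification, regression };

// Default number of features examined per split:
// sqrt(p) for classification, p/3 for regression.
// Truncation and the lower bound of 1 are assumptions of this sketch.
std::int64_t default_features_per_node(Task task, std::int64_t p) {
    assert(p > 0);
    const double k = task == Task::classification
                         ? std::sqrt(static_cast<double>(p))
                         : static_cast<double>(p) / 3.0;
    return std::max<std::int64_t>(1, static_cast<std::int64_t>(k));
}
```

For 100 features, this sketch gives 10 features per node for classification and 33 for regression.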
- double min_impurity_decrease_in_split_node
- The minimal impurity decrease in a split node, a threshold for stopping tree growth early. A node is split if its impurity is above the threshold; otherwise, it becomes a leaf. Default value: 0.0.
- Getter & Setter
double get_min_impurity_decrease_in_split_node() const
auto & set_min_impurity_decrease_in_split_node(double value)
- Invariants
min_impurity_decrease_in_split_node >= 0.0
- bool memory_saving_mode
- The memory saving mode. Default value: false.
- Getter & Setter
bool get_memory_saving_mode() const
auto & set_memory_saving_mode(bool value)
- bool bootstrap
- The bootstrap mode. If true, the training set for each tree is a bootstrap sample of the whole training set; if false, the whole dataset is used to build each tree. Default value: true.
- Getter & Setter
bool get_bootstrap() const
auto & set_bootstrap(bool value)
- std::int64_t min_bin_size
- The minimal number of observations in a bin. Used with the method::hist split-finding method only. Default value: 5.
- Getter & Setter
std::int64_t get_min_bin_size() const
auto & set_min_bin_size(std::int64_t value)
- Invariants
min_bin_size > 0
- double observations_per_tree_fraction
- The fraction of observations per tree. Default value: 1.0.
- Getter & Setter
double get_observations_per_tree_fraction() const
auto & set_observations_per_tree_fraction(double value)
- Invariants
observations_per_tree_fraction > 0.0
observations_per_tree_fraction <= 1.0
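The following sketch shows how the bootstrap, observations_per_tree_fraction, and seed properties could interact when drawing the rows for one tree. It assumes that, with bootstrap enabled, a tree gets fraction * n rows (truncated, at least one) drawn with replacement, that bootstrap == false ignores the fraction and uses every row, and that a per-tree random stream is derived as seed + tree index. All three choices, and every name in the code, are hypothetical. Rows never drawn form the tree's out-of-bag set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct TreeSample {
    std::vector<std::size_t> rows;  // row indices used to grow the tree (may repeat)
    std::vector<bool> in_bag;       // in_bag[i] == false means row i is out of bag
};

// Hypothetical helper: draws the training rows for tree `tree_idx`.
TreeSample draw_tree_sample(std::size_t n, bool bootstrap, double fraction,
                            std::int64_t seed, std::int64_t tree_idx) {
    assert(n > 0);
    assert(fraction > 0.0 && fraction <= 1.0);  // documented invariants
    TreeSample s;
    if (!bootstrap) {
        // Whole training set: no row is out of bag.
        for (std::size_t i = 0; i < n; ++i) s.rows.push_back(i);
        s.in_bag.assign(n, true);
        return s;
    }
    s.in_bag.assign(n, false);
    // Per-tree stream derived from the user seed, so runs are reproducible.
    std::mt19937_64 rng(static_cast<std::uint64_t>(seed) +
                        static_cast<std::uint64_t>(tree_idx));
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    const std::size_t m =
        std::max<std::size_t>(1, static_cast<std::size_t>(fraction * static_cast<double>(n)));
    for (std::size_t k = 0; k < m; ++k) {
        const std::size_t i = pick(rng);  // sampling with replacement
        s.rows.push_back(i);
        s.in_bag[i] = true;
    }
    return s;
}
```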
- std::int64_t class_count
- The class count. Used with task::classification only. Default value: 2.
- Getter & Setter
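template <typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_class_count(std::int64_t value)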
- std::int64_t max_leaf_nodes
- The maximal number of leaf nodes in a tree. If 0, the number of leaf nodes is not limited. Default value: 0.
- Getter & Setter
std::int64_t get_max_leaf_nodes() const
auto & set_max_leaf_nodes(std::int64_t value)
- error_metric_mode error_metric_mode
- The error metric mode. Default value: error_metric_mode::none.
- Getter & Setter
error_metric_mode get_error_metric_mode() const
auto & set_error_metric_mode(error_metric_mode value)
- double min_weight_fraction_in_leaf_node
- The minimal weighted fraction of the total sum of weights (of all input observations) required to be at a leaf node. Default value: 0.0.
- Getter & Setter
double get_min_weight_fraction_in_leaf_node() const
auto & set_min_weight_fraction_in_leaf_node(double value)
- Invariants
min_weight_fraction_in_leaf_node >= 0.0
min_weight_fraction_in_leaf_node <= 0.5
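A minimal check for the leaf weight constraint, assuming observation weights are available as a plain vector and a candidate leaf is given by its row indices. `leaf_weight_ok` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if a candidate leaf holds at least the required fraction of the
// total weight of all input observations.
bool leaf_weight_ok(const std::vector<double>& weights,
                    const std::vector<std::size_t>& leaf_rows,
                    double min_weight_fraction_in_leaf_node) {
    assert(min_weight_fraction_in_leaf_node >= 0.0 &&
           min_weight_fraction_in_leaf_node <= 0.5);  // documented invariants
    double total = 0.0;
    for (double w : weights) total += w;
    double leaf = 0.0;
    for (std::size_t i : leaf_rows) leaf += weights[i];
    return leaf >= min_weight_fraction_in_leaf_node * total;
}
```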
- std::int64_t min_observations_in_leaf_node
- The minimal number of observations in a leaf node.Default value: 1 for classification, 5 for regression.
- Getter & Setter
std::int64_t get_min_observations_in_leaf_node() const
auto & set_min_observations_in_leaf_node(std::int64_t value)
- Invariants
min_observations_in_leaf_node > 0
- infer_mode infer_mode
- The infer mode. Used with task::classification only.
- Getter & Setter
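template <typename T = Task, typename None = detail::enable_if_classification_t<T>> infer_mode get_infer_mode() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_infer_mode(infer_mode value)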
- std::int64_t min_observations_in_split_node
- The minimal number of observations in a split node.Default value: 2.
- Getter & Setter
std::int64_t get_min_observations_in_split_node() const
auto & set_min_observations_in_split_node(std::int64_t value)
- Invariants
min_observations_in_split_node > 1
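The depth and observation-count limits combine into simple stopping checks. In this sketch the root is at depth 0, and a node is split only if both children can still satisfy min_observations_in_leaf_node; both are assumptions about how the limits interact, and `StopParams`, `may_split`, and `split_point_ok` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct StopParams {
    std::int64_t max_tree_depth = 0;                 // 0 means unlimited
    std::int64_t min_observations_in_split_node = 2;
    std::int64_t min_observations_in_leaf_node = 1;  // classification default
};

// May a node at `depth` holding `n` observations be split at all?
bool may_split(const StopParams& p, std::int64_t depth, std::int64_t n) {
    assert(p.min_observations_in_split_node > 1);  // documented invariant
    assert(p.min_observations_in_leaf_node > 0);   // documented invariant
    if (p.max_tree_depth > 0 && depth >= p.max_tree_depth) return false;
    if (n < p.min_observations_in_split_node) return false;
    // Both children must be able to hold the minimal leaf size.
    return n >= 2 * p.min_observations_in_leaf_node;
}

// Is a specific split point (n_left observations go left) admissible?
bool split_point_ok(const StopParams& p, std::int64_t n, std::int64_t n_left) {
    const std::int64_t n_right = n - n_left;
    return n_left >= p.min_observations_in_leaf_node &&
           n_right >= p.min_observations_in_leaf_node;
}
```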
- struct dense
- Tag-type that denotes dense computational method.
- struct hist
- Tag-type that denotes hist computational method.
- struct classification
- Tag-type that parameterizes entities used for solving a classification problem.
- struct regression
- Tag-type that parameterizes entities used for solving a regression problem.
- Alias tag-type for the classification task.
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
- model()
- Creates a new instance of the class with the default property values.
Public Methods
- std::int64_t get_tree_count() const
- The number of trees in the forest.
- template <typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const
- The class count. Used with oneapi::dal::decision_forest::task::classification only.
- template <typename Visitor> void traverse_depth_first(std::int64_t tree_idx, Visitor &&visitor) const
- Performs a depth-first traversal of the tree with index tree_idx.
- Parameters
- tree_idx – Index of the tree to traverse.
- visitor – This functor gets notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).
- template <typename Visitor> void traverse_breadth_first(std::int64_t tree_idx, Visitor &&visitor) const
- Performs a breadth-first traversal of the tree with index tree_idx.
- Parameters
- tree_idx – Index of the tree to traverse.
- visitor – This functor gets notified when tree nodes are visited, via the corresponding operators: bool operator()(const decision_forest::split_node_info<Task>&) and bool operator()(const decision_forest::leaf_node_info<Task>&).
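The visitor protocol can be illustrated with a toy tree. In the sketch below, `split_info`, `leaf_info`, and `Node` are hypothetical stand-ins for decision_forest::split_node_info<Task> and decision_forest::leaf_node_info<Task>, not their real definitions, and the sketch assumes that a visitor returning false stops the traversal. A visitor is any object that provides the two bool-returning call operators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-ins for split_node_info<Task> / leaf_node_info<Task>.
struct split_info { std::int64_t feature_index; double threshold; std::int64_t level; };
struct leaf_info { std::int64_t label; std::int64_t level; };

// Toy tree node stored in an array; left/right are -1 for leaves.
struct Node {
    bool is_leaf;
    std::int64_t feature_or_label;  // split feature for splits, class label for leaves
    double threshold;
    int left;
    int right;
};

// Pre-order depth-first traversal; returns false once the visitor asks to stop.
template <typename Visitor>
bool traverse_depth_first(const std::vector<Node>& tree, int idx, std::int64_t level,
                          Visitor&& visitor) {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < tree.size());
    const Node& n = tree[static_cast<std::size_t>(idx)];
    if (n.is_leaf) return visitor(leaf_info{n.feature_or_label, level});
    if (!visitor(split_info{n.feature_or_label, n.threshold, level})) return false;
    return traverse_depth_first(tree, n.left, level + 1, visitor) &&
           traverse_depth_first(tree, n.right, level + 1, visitor);
}

// Level-order traversal with the same stopping rule.
template <typename Visitor>
void traverse_breadth_first(const std::vector<Node>& tree, Visitor&& visitor) {
    if (tree.empty()) return;
    std::queue<std::pair<int, std::int64_t>> pending;  // (node index, level)
    pending.push({0, 0});
    while (!pending.empty()) {
        const auto [idx, level] = pending.front();
        pending.pop();
        const Node& n = tree[static_cast<std::size_t>(idx)];
        const bool keep_going = n.is_leaf
            ? visitor(leaf_info{n.feature_or_label, level})
            : visitor(split_info{n.feature_or_label, n.threshold, level});
        if (!keep_going) return;
        if (!n.is_leaf) {
            pending.push({n.left, level + 1});
            pending.push({n.right, level + 1});
        }
    }
}
```

For example, a visitor struct whose two operators increment split and leaf counters and always return true counts the nodes of a tree; the code requires C++17 for the structured binding.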
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
Properties
- Vector of labels for the training set. Default value: table{}.
- Getter & Setter
const table & get_labels() const
auto & set_labels(const table &value)
- Vector of responses for the training set. Default value: table{}.
- Getter & Setter
const table & get_responses() const
auto & set_responses(const table &value)
- The training set. Default value: table{}.
- Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
- train_result()
- Creates a new instance of the class with the default property values.
Properties
- A $1 \times 1$ table containing the cumulative out-of-bag error value. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error. Default value: table{}.
- Getter & Setter
const table & get_oob_err() const
auto & set_oob_err(const table &value)
- A $1 \times p$ table containing the variable importance value for each feature, where $p$ is the number of features. Computed when variable_importance_mode != variable_importance_mode::none. Default value: table{}.
- Getter & Setter
const table & get_var_importance() const
auto & set_var_importance(const table &value)
- An $n \times 1$ table containing the out-of-bag error value for each observation. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error_per_observation. Default value: table{}.
- Getter & Setter
const table & get_oob_err_per_observation() const
auto & set_oob_err_per_observation(const table &value)
- The trained Decision Forest model. Default value: model<Task>{}.
- Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
- template <typename Descriptor> decision_forest::train_result train(const Descriptor &desc, const decision_forest::train_input &input)
- Parameters
- desc – Decision Forest algorithm descriptor decision_forest::descriptor.
- input – Input data for the training operation.
- Preconditions
input.data.is_empty == false
input.labels.is_empty == false
input.labels.column_count == 1
input.data.row_count == input.labels.row_count
desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_variable_importance_mode() != variable_importance_mode::mda_raw && desc.get_variable_importance_mode() != variable_importance_mode::mda_scaled)
desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_error_metric_mode() == error_metric_mode::none)
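The preconditions can be restated as a plain validation function. Without bootstrap, every tree is built on the whole dataset, so no observation is out of bag; this is why MDA importance and the out-of-bag error modes require bootstrap. The structs and enums below are simplified, hypothetical stand-ins for the descriptor, its mode enums, and table shapes, not oneDAL types.

```cpp
#include <cstdint>
#include <string>

enum class error_metric_mode { none, out_of_bag_error, out_of_bag_error_per_observation };
enum class variable_importance_mode { none, mdi, mda_raw, mda_scaled };

// Hypothetical, simplified stand-ins for the descriptor and table shapes.
struct Desc {
    bool bootstrap = true;
    error_metric_mode error_metric = error_metric_mode::none;
    variable_importance_mode importance = variable_importance_mode::none;
};
struct Shape { std::int64_t rows = 0, cols = 0; };

// Returns an empty string if the preconditions hold, otherwise the reason.
std::string check_train_preconditions(const Desc& d, Shape data, Shape labels) {
    if (data.rows == 0 || data.cols == 0) return "data is empty";
    if (labels.rows == 0 || labels.cols == 0) return "labels are empty";
    if (labels.cols != 1) return "labels must have exactly one column";
    if (data.rows != labels.rows) return "data and labels row counts differ";
    // Without bootstrap nothing is out of bag, so OOB-based outputs are unavailable.
    if (!d.bootstrap) {
        if (d.importance == variable_importance_mode::mda_raw ||
            d.importance == variable_importance_mode::mda_scaled)
            return "MDA importance requires bootstrap";
        if (d.error_metric != error_metric_mode::none)
            return "out-of-bag error metrics require bootstrap";
    }
    return {};
}
```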
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
Properties
- The dataset for inference. Default value: table{}.
- Getter & Setter
const table & get_data() const
auto & set_data(const table &value)
- The trained Decision Forest model. Default value: model<Task>{}.
- Getter & Setter
const model< Task > & get_model() const
auto & set_model(const model< Task > &value)
- Template Parameters
- Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
- infer_result()
- Creates a new instance of the class with the default property values.
Properties
- The $n \times 1$ table with the predicted labels. Default value: table{}.
- Getter & Setter
const table & get_labels() const
auto & set_labels(const table &value)
- The $n \times 1$ table with the predicted responses. Default value: table{}.
- Getter & Setter
const table & get_responses() const
auto & set_responses(const table &value)
- An $n \times c$ table with the predicted class probabilities for each observation, where $c$ is the number of classes.
- Getter & Setter
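template <typename T = Task, typename None = detail::enable_if_classification_t<T>> const table & get_probabilities() const
template <typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_probabilities(const table &value)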
- template <typename Descriptor> decision_forest::infer_result infer(const Descriptor &desc, const decision_forest::infer_input &input)
- Parameters
- desc – Decision Forest algorithm descriptor decision_forest::descriptor.
- input – Input data for the inference operation.
- Preconditions
input.data.is_empty == false