# Decision Forest Classification and Regression (DF)

Decision Forest (DF) classification and regression algorithms are based on an ensemble of tree-structured classifiers known as decision trees. A decision forest is built using bagging (bootstrap aggregation) combined with a random choice of features. For more details, see [Breiman84] and [Breiman2001].
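
The two sources of randomness named above can be illustrated with a short standalone sketch. It does not use the library; the helper names `bootstrap_rows` and `random_features` are hypothetical, and the sampling scheme shown (rows with replacement, features without replacement) is the textbook one rather than a statement about the library's internals.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Draw n row indices with replacement: the bootstrap sample for one tree.
// n must be positive.
std::vector<std::size_t> bootstrap_rows(std::size_t n, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    std::vector<std::size_t> rows(n);
    for (auto& r : rows) r = pick(rng);
    return rows;
}

// Choose k distinct feature indices out of p (k <= p), without replacement.
std::vector<std::size_t> random_features(std::size_t p, std::size_t k, std::mt19937& rng) {
    std::vector<std::size_t> all(p);
    std::iota(all.begin(), all.end(), std::size_t{0});
    std::shuffle(all.begin(), all.end(), rng);
    all.resize(k);
    return all;
}
```

Rows that a bootstrap sample never draws are the out-of-bag observations for that tree; the error metrics and MDA importance modes described below rely on them.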
## Programming Interface

All types and functions in this section are declared in the oneapi::dal::decision_forest namespace and are available via inclusion of the oneapi/dal/algo/decision_forest.hpp header file.
### Enum classes
error_metric_mode
error_metric_mode::none
Do not compute error metric.
error_metric_mode::out_of_bag_error
Train produces a table with the cumulative prediction error for out-of-bag observations.
error_metric_mode::out_of_bag_error_per_observation
Train produces a table with the prediction error for each out-of-bag observation.
variable_importance_mode
variable_importance_mode::none
Do not compute variable importance.
variable_importance_mode::mdi
Mean Decrease Impurity. Computed as the sum of weighted impurity decreases for all nodes where the variable is used, averaged over all trees in the forest.
variable_importance_mode::mda_raw
Mean Decrease Accuracy (permutation importance). For each tree, the prediction error on the out-of-bag portion of the data is computed (error rate for classification, MSE for regression). The same is done after permuting each predictor variable. The differences between the two errors are then averaged over all trees.
variable_importance_mode::mda_scaled
Mean Decrease Accuracy (permutation importance). This is the mda_raw value scaled by its standard deviation.
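
The relationship between the raw and scaled MDA values can be sketched for a single feature as follows. This is a standalone illustration, not library code: `mda_values` and `mda_from_tree_diffs` are hypothetical names, and the use of the population standard deviation (and returning 0 when the spread is zero) is an assumption, since the text above does not specify the exact normalization.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct mda_values {
    double raw;     // mean of the per-tree error increases
    double scaled;  // raw divided by the standard deviation of the increases
};

// diffs[t] = (OOB error of tree t after permuting the feature)
//          - (OOB error of tree t before permuting). Must be non-empty.
mda_values mda_from_tree_diffs(const std::vector<double>& diffs) {
    const double n = static_cast<double>(diffs.size());
    double sum = 0.0;
    for (double d : diffs) sum += d;
    const double mean = sum / n;

    double sq = 0.0;
    for (double d : diffs) sq += (d - mean) * (d - mean);
    const double stddev = std::sqrt(sq / n);  // population std. deviation (assumption)

    // Guard against a zero spread; report 0 in that case (assumption).
    return { mean, stddev > 0.0 ? mean / stddev : 0.0 };
}
```

A feature whose permutation raises the error by a similar amount in every tree gets a large scaled value, while one with the same mean but a wide spread across trees gets a smaller one.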
infer_mode
infer_mode::class_labels
Infer produces an $n \times 1$ table with the predicted labels.
infer_mode::class_responses
deprecated
infer_mode::class_probabilities
Infer produces a table with the predicted class probabilities for each observation.
voting_mode
voting_mode::weighted
The final prediction is obtained by weighted majority voting.
voting_mode::unweighted
The final prediction is obtained by simple majority voting.
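
The difference between the two voting modes can be shown with a small standalone sketch. It is not the library's implementation: the function names are hypothetical, and the weighted scheme shown here (each tree contributes its per-class probabilities) is an assumption about what "weighted" means.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using class_probs = std::vector<double>;  // one probability per class

// Index of the largest element (the first one on ties).
std::size_t argmax(const std::vector<double>& v) {
    return static_cast<std::size_t>(std::max_element(v.begin(), v.end()) - v.begin());
}

// Unweighted: every tree casts one vote for its most probable class.
// trees must be non-empty.
std::size_t vote_unweighted(const std::vector<class_probs>& trees) {
    std::vector<double> votes(trees.front().size(), 0.0);
    for (const auto& p : trees) votes[argmax(p)] += 1.0;
    return argmax(votes);
}

// Weighted (assumed scheme): each tree contributes its class probabilities.
std::size_t vote_weighted(const std::vector<class_probs>& trees) {
    std::vector<double> score(trees.front().size(), 0.0);
    for (const auto& p : trees)
        for (std::size_t c = 0; c < p.size(); ++c) score[c] += p[c];
    return argmax(score);
}
```

With three trees reporting {0.55, 0.45}, {0.55, 0.45} and {0.05, 0.95}, the unweighted vote picks class 0 (two votes to one), while the weighted vote picks class 1 because the third tree is far more confident.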
### Descriptor
template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default> class descriptor
Template Parameters
• Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
• Method – Tag-type that specifies the implementation of the algorithm. Can be method::dense or method::hist.
• Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
descriptor() = default
Creates a new instance of the class with the default property values.
Properties
double
impurity_threshold
The impurity threshold. A node is split if the split induces a decrease of the impurity greater than or equal to this value.
Default value
: 0.0.
Getter & Setter

double get_impurity_threshold() const
auto & set_impurity_threshold(double value)

Invariants

impurity_threshold >= 0.0

std::int64_t
max_tree_depth
The maximal depth of the tree. If 0, nodes are expanded until all leaves are pure or until every leaf contains no more than min_observations_in_leaf_node observations.
Default value
: 0.
Getter & Setter

std::int64_t get_max_tree_depth() const
auto & set_max_tree_depth(std::int64_t value)

std::int64_t
max_bins
The maximal number of discrete bins to bucket continuous features. Used with the method::hist split-finding method only. Increasing the number results in higher computation costs.
Default value
: 256.
Getter & Setter

std::int64_t get_max_bins() const
auto & set_max_bins(std::int64_t value)

Invariants

max_bins > 1

std::int64_t
seed
Seed for the random number generator used by the algorithm.
Getter & Setter

std::int64_t get_seed() const
auto & set_seed(std::int64_t value)

std::int64_t
tree_count
The number of trees in the forest.
Default value
: 100.
Getter & Setter

std::int64_t get_tree_count() const
auto & set_tree_count(std::int64_t value)

Invariants

tree_count > 0

variable_importance_mode
variable_importance_mode
The variable importance mode.
Default value
: variable_importance_mode::none.
Getter & Setter

variable_importance_mode get_variable_importance_mode() const
auto & set_variable_importance_mode(variable_importance_mode value)

voting_mode
voting_mode
The voting mode. Used with task::classification only.
Getter & Setter

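template<typename T = Task, typename None = detail::enable_if_classification_t<T>> voting_mode get_voting_mode() const
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_voting_mode(voting_mode value)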

std::int64_t
features_per_node
The number of features to consider when looking for the best split for a node.
Default value
: task::classification ? sqrt(p) : p/3, where p is the total number of features.
Getter & Setter

std::int64_t get_features_per_node() const
auto & set_features_per_node(std::int64_t value)
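
As a worked illustration of the default described above, the following standalone sketch computes sqrt(p) for classification and p/3 for regression. `toy_task` and `default_features_per_node` are hypothetical names; truncating toward zero and clamping the result to at least 1 are assumptions, since the rounding rule is not stated here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

enum class toy_task { classification, regression };

// p is the total number of features (p >= 1).
std::int64_t default_features_per_node(toy_task task, std::int64_t p) {
    const std::int64_t k = (task == toy_task::classification)
        ? static_cast<std::int64_t>(std::sqrt(static_cast<double>(p)))  // truncates
        : p / 3;                                                        // integer division
    return std::max<std::int64_t>(k, 1);  // never fewer than one feature (assumption)
}
```

For example, 16 features give 4 candidate features per node for classification, and 9 features give 3 for regression.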

double
min_impurity_decrease_in_split_node
The min impurity decrease in a split node is a threshold for stopping the tree growth early. A node will be split if its impurity is above the threshold, otherwise it is a leaf.
Default value
: 0.0.
Getter & Setter

double get_min_impurity_decrease_in_split_node() const
auto & set_min_impurity_decrease_in_split_node(double value)

Invariants

min_impurity_decrease_in_split_node >= 0.0

bool
memory_saving_mode
The memory saving mode.
Default value
: false.
Getter & Setter

bool get_memory_saving_mode() const
auto & set_memory_saving_mode(bool value)

bool
bootstrap
The bootstrap mode. If true, the training set for each tree is a bootstrap sample of the whole training set. If false, the whole dataset is used to build each tree.
Default value
: true.
Getter & Setter

bool get_bootstrap() const
auto & set_bootstrap(bool value)

std::int64_t
min_bin_size
The minimal number of observations in a bin. Used with the method::hist split-finding method only.
Default value
: 5.
Getter & Setter

std::int64_t get_min_bin_size() const
auto & set_min_bin_size(std::int64_t value)

Invariants

min_bin_size > 0

double
observations_per_tree_fraction
The fraction of observations per tree.
Default value
: 1.0.
Getter & Setter

double get_observations_per_tree_fraction() const
auto & set_observations_per_tree_fraction(double value)

Invariants

observations_per_tree_fraction > 0.0
observations_per_tree_fraction <= 1.0

std::int64_t
class_count
The class count. Used with task::classification only.
Default value
: 2.
Getter & Setter

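template<typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_class_count(std::int64_t value)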

std::int64_t
max_leaf_nodes
The maximal number of the leaf nodes. If 0, the number of leaf nodes is not limited.
Default value
: 0.
Getter & Setter

std::int64_t get_max_leaf_nodes() const
auto & set_max_leaf_nodes(std::int64_t value)

error_metric_mode
error_metric_mode
The error metric mode.
Default value
: error_metric_mode::none.
Getter & Setter

error_metric_mode get_error_metric_mode() const
auto & set_error_metric_mode(error_metric_mode value)

double
min_weight_fraction_in_leaf_node
The min weight fraction in a leaf node. The minimum weighted fraction of the total sum of weights (of all input observations) required to be at a leaf node.
Default value
: 0.0.
Getter & Setter

double get_min_weight_fraction_in_leaf_node() const
auto & set_min_weight_fraction_in_leaf_node(double value)

Invariants

min_weight_fraction_in_leaf_node >= 0.0
min_weight_fraction_in_leaf_node <= 0.5

std::int64_t
min_observations_in_leaf_node
The minimal number of observations in a leaf node.
Default value
: 1 for classification, 5 for regression.
Getter & Setter

std::int64_t get_min_observations_in_leaf_node() const
auto & set_min_observations_in_leaf_node(std::int64_t value)

Invariants

min_observations_in_leaf_node > 0

infer_mode
infer_mode
The infer mode. Used with task::classification only.
Getter & Setter

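template<typename T = Task, typename None = detail::enable_if_classification_t<T>> infer_mode get_infer_mode() const
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_infer_mode(infer_mode value)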

std::int64_t
min_observations_in_split_node
The minimal number of observations in a split node.
Default value
: 2.
Getter & Setter

std::int64_t get_min_observations_in_split_node() const
auto & set_min_observations_in_split_node(std::int64_t value)

Invariants

min_observations_in_split_node > 1
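
The setters listed above return a reference to the descriptor, which allows calls to be chained, and each property may carry invariants. The pattern can be sketched with a stand-in class. `toy_descriptor` is hypothetical and is not the library's descriptor; throwing `std::domain_error` on a violated invariant is an assumption made for this sketch.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in with two of the properties described above.
class toy_descriptor {
public:
    std::int64_t get_tree_count() const { return tree_count_; }
    double get_observations_per_tree_fraction() const { return fraction_; }

    // Returning *this lets calls be chained.
    toy_descriptor& set_tree_count(std::int64_t value) {
        if (value <= 0) throw std::domain_error("tree_count > 0 violated");
        tree_count_ = value;
        return *this;
    }

    toy_descriptor& set_observations_per_tree_fraction(double value) {
        if (!(value > 0.0 && value <= 1.0))
            throw std::domain_error("observations_per_tree_fraction must be in (0, 1]");
        fraction_ = value;
        return *this;
    }

private:
    std::int64_t tree_count_ = 100;  // default value listed above
    double fraction_ = 1.0;          // default value listed above
};
```

A chained call then reads `toy_descriptor{}.set_tree_count(10).set_observations_per_tree_fraction(0.5)`, and a rejected value leaves the previous setting in place.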

### Method tags
struct dense
Tag-type that denotes the dense computational method.
struct hist
Tag-type that denotes the hist computational method.
using by_default = dense
Alias tag-type for the dense computational method.
### Task tags
struct classification
Tag-type that parameterizes entities used for solving a classification problem.
struct regression
Tag-type that parameterizes entities used for solving a regression problem.
using by_default = classification
Alias tag-type for the classification task.
### Model
template<typename Task = task::by_default> class model
Template Parameters
• Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
model()
Creates a new instance of the class with the default property values.
Public Methods
std::int64_t get_tree_count() const
The number of trees in the forest.
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const
The class count. Used with task::classification only.
template<typename Visitor> void traverse_depth_first(std::int64_t tree_idx, Visitor && visitor) const
Performs a depth-first traversal of the i-th tree.
Parameters
• tree_idx – Index of the tree to traverse.
• visitor – This functor gets notified when tree nodes are visited, via the corresponding operators:
bool operator()(const decision_forest::split_node_info<Task>&)
bool operator()(const decision_forest::leaf_node_info<Task>&)
template<typename Visitor> void traverse_breadth_first(std::int64_t tree_idx, Visitor && visitor) const
Performs a breadth-first traversal of the i-th tree.
Parameters
• tree_idx – Index of the tree to traverse.
• visitor – This functor gets notified when tree nodes are visited, via the corresponding operators:
bool operator()(const decision_forest::split_node_info<Task>&)
bool operator()(const decision_forest::leaf_node_info<Task>&)
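
The visitor protocol used by both traversal methods can be sketched without the library. Everything below is hypothetical: `toy_split`, `toy_leaf`, `toy_node` and the free functions stand in for the library's node-info types and member functions, and the meaning of the boolean result (true continues, false stops the traversal) is an assumption, since it is not stated above.

```cpp
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node records; the library's node-info types differ.
struct toy_split { std::int64_t feature; double threshold; };
struct toy_leaf  { double response; };

struct toy_node {
    bool is_leaf;
    toy_split split;  // meaningful when !is_leaf
    toy_leaf leaf;    // meaningful when is_leaf
    int left;         // child indices into the tree vector, -1 for leaves
    int right;
};

using toy_tree = std::vector<toy_node>;  // node 0 is the root

// Pre-order depth-first walk. Returns false once the visitor asks to stop.
template <typename Visitor>
bool depth_first(const toy_tree& tree, int idx, Visitor&& visitor) {
    const toy_node& n = tree[idx];
    if (n.is_leaf) return visitor(n.leaf);
    if (!visitor(n.split)) return false;
    return depth_first(tree, n.left, visitor) && depth_first(tree, n.right, visitor);
}

// Level-by-level walk using a FIFO queue.
template <typename Visitor>
void breadth_first(const toy_tree& tree, Visitor&& visitor) {
    std::queue<int> pending;
    pending.push(0);
    while (!pending.empty()) {
        const toy_node& n = tree[pending.front()];
        pending.pop();
        if (n.is_leaf) {
            if (!visitor(n.leaf)) return;
        } else {
            if (!visitor(n.split)) return;
            pending.push(n.left);
            pending.push(n.right);
        }
    }
}

// A visitor with one overload per node kind; it records the visiting order.
struct order_recorder {
    std::string order;
    bool operator()(const toy_split&) { order.push_back('S'); return true; }
    bool operator()(const toy_leaf&)  { order.push_back('L'); return true; }
};
```

On a full tree of depth two, the depth-first walk reports the left subtree's leaves before the right split node, while the breadth-first walk reports all split nodes before any leaf.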
### Training
train(...)
Input
template<typename Task = task::by_default> class train_input
Template Parameters
• Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
train_input(const table & data, const table & responses)
Creates a new instance of the class with the given data and responses property values.
Properties
const table &
labels
Vector of labels for the training set.
Default value
: table{}.
Getter & Setter

const table & get_labels() const
auto & set_labels(const table &value)

const table &
responses
Vector of responses for the training set.
Default value
: table{}.
Getter & Setter

const table & get_responses() const
auto & set_responses(const table &value)

const table &
data
The training set.
Default value
: table{}.
Getter & Setter

const table & get_data() const
auto & set_data(const table &value)

Result
template<typename Task = task::by_default> class train_result
Template Parameters
• Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Properties
const table &
oob_err
A table containing the cumulative out-of-bag error value. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error.
Default value
: table{}.
Getter & Setter

const table & get_oob_err() const
auto & set_oob_err(const table &value)

const table &
var_importance
A table containing the variable importance value for each feature. Computed when variable_importance_mode != variable_importance_mode::none.
Default value
: table{}.
Getter & Setter

const table & get_var_importance() const
auto & set_var_importance(const table &value)

const table &
oob_err_per_observation
A table containing the out-of-bag error value per observation. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error_per_observation.
Default value
: table{}.
Getter & Setter

const table & get_oob_err_per_observation() const
auto & set_oob_err_per_observation(const table &value)

const model<Task> &
model
The trained Decision Forest model.
Default value
Getter & Setter

const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Operation
template<typename Descriptor> decision_forest::train_result train(const Descriptor & desc, const decision_forest::train_input & input)
Parameters
• desc – Decision Forest algorithm descriptor decision_forest::descriptor.
• input – Input data for the training operation.
Preconditions

input.data.is_empty == false
input.labels.is_empty == false
input.labels.column_count == 1
input.data.row_count == input.labels.row_count
desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_variable_importance_mode() != variable_importance_mode::mda_raw && desc.get_variable_importance_mode() != variable_importance_mode::mda_scaled)
desc.get_bootstrap() == true || (desc.get_bootstrap() == false && desc.get_error_metric_mode() == error_metric_mode::none)
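
The last two preconditions both say that any output derived from out-of-bag observations requires bootstrap sampling. A standalone sketch of that rule follows; the `toy_` enums mirror the enum classes described earlier but are hypothetical, as is the function name.

```cpp
enum class toy_importance { none, mdi, mda_raw, mda_scaled };
enum class toy_error_metric { none, out_of_bag_error, out_of_bag_error_per_observation };

// True when the requested outputs are compatible with the bootstrap setting.
bool oob_settings_are_valid(bool bootstrap, toy_importance vi, toy_error_metric em) {
    if (bootstrap) return true;  // out-of-bag rows exist, so every mode is allowed

    // Without bootstrap every tree sees every row, so nothing is out of bag.
    const bool importance_needs_oob =
        vi == toy_importance::mda_raw || vi == toy_importance::mda_scaled;
    const bool error_needs_oob = em != toy_error_metric::none;
    return !importance_needs_oob && !error_needs_oob;
}
```

In other words, with bootstrap disabled only the none and mdi importance modes and the none error metric mode remain usable.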

### Inference
infer(...)
Input
template<typename Task = task::by_default> class infer_input
Template Parameters
• Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
infer_input(const model<Task> & trained_model, const table & data)
Creates a new instance of the class with the given model and data property values.
Properties
const table &
data
The dataset for inference.
Default value
: table{}.
Getter & Setter

const table & get_data() const
auto & set_data(const table &value)

const model<Task> &
model
The trained Decision Forest model.
Default value
Getter & Setter

const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

Result
template<typename Task = task::by_default> class infer_result
Template Parameters
• Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
const table &
labels
The table with the predicted labels.
Default value
: table{}.
Getter & Setter

const table & get_labels() const
auto & set_labels(const table &value)

const table &
responses
The table with the predicted responses.
Default value
: table{}.
Getter & Setter

const table & get_responses() const
auto & set_responses(const table &value)

const table &
probabilities
A table with the predicted class probabilities for each observation.
Getter & Setter

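template<typename T = Task, typename None = detail::enable_if_classification_t<T>> const table & get_probabilities() const
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_probabilities(const table &value)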

Operation
template<typename Descriptor> decision_forest::infer_result infer(const Descriptor & desc, const decision_forest::infer_input & input)
Parameters
• desc – Decision Forest algorithm descriptor decision_forest::descriptor.
• input – Input data for the inference operation.
Preconditions

input.data.is_empty == false

