|
Statistical information processing is needed for many applications to extract
patterns and unknown interdependencies between factors. A wide variety of
data-mining algorithms have been developed over the last decade, but active human
intervention is still required to drive an analysis. The aim of expert work is
to refine models step by step and to compare them in order to provide accurate
predictions. The productivity of this work is largely constrained by the time
needed to compute model updates.
Recent modeling techniques such as classification and regression trees, and
ensembles of machine-learning classifiers, incur high computational loads.
Building such models in an online, interactive mode is a challenging task for
upcoming platforms.
Tree-based models are applicable to a wide range of problems that include
medical expert systems, analysis of manufacturing data, financial analysis, and
market prediction. Ensembles of trees are notable for their accuracy. They can
handle mixed-type data (both numerical and categorical) and missing values.
Several commercial packages implement these techniques.
In this paper, we consider several data-mining methods based on ensembles of
trees. The balance between complexity and accuracy is studied for different
parameter sets. We analyze the computational resources required by the
algorithms and discuss how they scale on shared-memory multiprocessor
systems.
|