Compute-Intensive, Highly Parallel Applications and Uses
Volume 09    Issue 02    Published May 19, 2005
ISSN 1535-864X    DOI: 10.1535/itj.0902.05
Performance and Scalability Analysis of Tree-Based Models in Large-Scale Data-Mining Problems
Alexander Borisov, Technology and Manufacturing Group, Intel Corporation
Igor Chikalov, Technology and Manufacturing Group, Intel Corporation
Victor Eruhimov, Corporate Technology Group, Intel Corporation
Eugene Tuv, Technology and Manufacturing Group, Intel Corporation

Index words: machine learning, data mining, decision trees

Citation for this paper: Borisov, A.; Chikalov, I.; Eruhimov, V.; Tuv, E. "Performance and Scalability Analysis of Tree-Based Models in Large-Scale Data-Mining Problems." Intel Technology Journal. http://developer.intel.com/technology/itj/2005/volume09issue02/art05_tree-based_models/p01_abstract.htm (May 2005).
ABSTRACT

Many applications need statistical information processing to extract patterns and previously unknown interdependencies between factors. A wide variety of data-mining algorithms has been developed over the last decade, but active human intervention is still required to drive an analysis. The expert's task is to refine models step by step and to compare them in order to obtain accurate predictions. The expert's productivity is largely constrained by the time needed to compute model updates.

Recent modeling techniques such as classification and regression trees, and ensembles of machine-learning classifiers, incur high computational loads. Building such models in online interactive mode is a challenging task for upcoming platforms.

Tree-based models are applicable to a wide range of problems that include medical expert systems, analysis of manufacturing data, financial analysis, and market prediction. Ensembles of trees are notable for their accuracy. They can handle mixed-type data (consisting of both numerical and categorical data) and missing values. Several commercial packages implement these techniques.

In this paper we consider several data-mining methods based on ensembles of trees. The balance between complexity and accuracy is studied for different parameter sets. We provide an analysis of the computational resources required by the algorithms, and we discuss how they scale for execution on multiprocessor systems with shared memory.
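To make the trade-off between ensemble size and accuracy concrete, the following is a minimal sketch, not the authors' implementation. It assumes bagging (bootstrap aggregation) as the ensemble method, one-split trees ("stumps") as the base learner, and a synthetic two-feature data set. All function names (fit_stump, fit_bagged, make_data and so on) are hypothetical. Because each bagged tree is built independently from the same training data, the trees can be built by concurrent workers that share memory. Note that threads in CPython illustrate the structure only and will not give a real speedup for pure-Python code.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fit_stump(rows, labels):
    """Fit a one-split tree by exhaustive search over features and thresholds."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if r[f] <= t else right) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    return best[1:]


def predict_stump(stump, row):
    f, t, left, right = stump
    return left if row[f] <= t else right


def fit_one(task):
    """Build one tree on a bootstrap sample; the seed makes it reproducible."""
    rows, labels, seed = task
    rng = random.Random(seed)
    idx = [rng.randrange(len(rows)) for _ in rows]
    return fit_stump([rows[i] for i in idx], [labels[i] for i in idx])


def fit_bagged(rows, labels, n_trees, workers=4):
    """Build n_trees independent trees; workers share the training data."""
    tasks = [(rows, labels, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_one, tasks))


def predict_bagged(stumps, row):
    """Majority vote over the ensemble (ties go to class 0)."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0


def accuracy(stumps, rows, labels):
    hits = sum(predict_bagged(stumps, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def make_data(n, seed):
    """Synthetic data (an assumption): label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    rows = [[rng.random(), rng.random()] for _ in range(n)]
    labels = [1 if r[0] + r[1] > 1.0 else 0 for r in rows]
    return rows, labels


if __name__ == "__main__":
    train_x, train_y = make_data(150, seed=0)
    test_x, test_y = make_data(150, seed=1)
    for n_trees in (1, 5, 25):
        model = fit_bagged(train_x, train_y, n_trees)
        print(n_trees, "trees, test accuracy:", accuracy(model, test_x, test_y))
```

Training cost in this sketch grows linearly with the number of trees, so the loop at the bottom gives a rough picture of what extra computation buys in accuracy. A process pool or a native-code tree builder would be needed to measure actual multiprocessor scaling.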
