Benchmarking attribute selection techniques for discrete class data mining

Hall, Mark A.; Holmes, Geoffrey

Authors

Hall, Mark A.

Holmes, Geoffrey

Files

uow-cs-wp-2002-02.pdf (1.02 MB)

Permanent Link

https://hdl.handle.net/10289/1013

Abstract

Data engineering is generally considered to be a central issue in the development of data mining applications. The success of many learning schemes, in their attempts to construct models of data, hinges on the reliable identification of a small set of highly predictive attributes. The inclusion of irrelevant, redundant and noisy attributes in the model building process can result in poor predictive performance and increased computation. Attribute selection generally involves a combination of search and attribute utility estimation plus evaluation with respect to specific learning schemes. This leads to a large number of possible permutations and has led to a situation where very few benchmark studies have been conducted. This paper presents a benchmark comparison of several attribute selection methods for supervised classification. All the methods produce an attribute ranking, a useful device for isolating the individual merit of an attribute. Attribute selection is achieved by cross-validating the attribute rankings with respect to a classification learner to find the best attributes. Results are reported for a selection of standard data sets and two diverse learning schemes, C4.5 and naïve Bayes.
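
The rank-then-cross-validate idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: information gain stands in for the ranking method, a small hand-written naïve Bayes stands in for the learner, the function names (`rank_attributes`, `cv_accuracy`, `select_attributes`) are hypothetical, and the toy data set is invented. For brevity the ranking is computed once on all the data rather than inside each training fold, which a careful benchmark would likely avoid.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of one discrete attribute with respect to the class."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[attr]].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, labels):
    """Attribute indices sorted from highest to lowest information gain."""
    return sorted(range(len(rows[0])),
                  key=lambda a: info_gain(rows, labels, a), reverse=True)

def nb_predict(train_rows, train_labels, row, attrs):
    """Naive Bayes with Laplace smoothing, using only the attributes in attrs."""
    best, best_score = None, -math.inf
    for c, n_c in sorted(Counter(train_labels).items()):
        score = math.log(n_c / len(train_labels))
        for a in attrs:
            n_values = len({r[a] for r in train_rows})
            match = sum(1 for r, y in zip(train_rows, train_labels)
                        if y == c and r[a] == row[a])
            score += math.log((match + 1) / (n_c + n_values))
        if score > best_score:
            best, best_score = c, score
    return best

def cv_accuracy(rows, labels, attrs, folds=3):
    """k-fold cross-validated accuracy of naive Bayes on the chosen attributes."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(rows)) if i % folds != f]
        test = [i for i in range(len(rows)) if i % folds == f]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        correct += sum(nb_predict(tr_rows, tr_labels, rows[i], attrs) == labels[i]
                       for i in test)
    return correct / len(rows)

def select_attributes(rows, labels, folds=3):
    """Rank attributes, then keep the shortest ranking prefix with the best CV accuracy."""
    ranking = rank_attributes(rows, labels)
    scores = [cv_accuracy(rows, labels, ranking[:k], folds)
              for k in range(1, len(ranking) + 1)]
    best_k = scores.index(max(scores)) + 1  # first maximum, so ties favour fewer attributes
    return ranking, ranking[:best_k]

# Invented toy data: attribute 0 determines the class, attribute 1 is irrelevant.
rows = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"),
        ("a", "x"), ("b", "x"), ("a", "y"), ("b", "y")]
labels = ["yes", "no", "yes", "no", "yes", "no", "yes", "no"]
ranking, selected = select_attributes(rows, labels)
print(ranking, selected)
```

Choosing the first maximum keeps the smallest attribute subset among equally accurate prefixes, which matches the abstract's emphasis on a small set of highly predictive attributes; other tie-breaking rules are equally possible.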

Citation

Hall, M. A. & Holmes, G. (2002). Benchmarking attribute selection techniques for discrete class data mining. (Working paper 02/02). Hamilton, New Zealand: University of Waikato, Department of Computer Science.

Type

Working Paper

Series name

Computer Science Working Papers

Date

2002-04

Publisher

University of Waikato, Department of Computer Science
