Feature subset selection: a correlation based filter approach

Rights

This is an author’s version of an article published in the 1997 International Conference on Neural Information Processing and Intelligent Information Systems. © Springer.

Abstract

Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed, or their performance adversely affected, by too much data, some of which may be irrelevant or redundant to the learning task. Feature subset selection, then, is a method of enhancing the performance of learning algorithms, reducing the hypothesis search space, and, in some cases, reducing the storage requirement. This paper describes a feature subset selector that uses a correlation-based heuristic to determine the goodness of feature subsets, and evaluates its effectiveness with three common ML algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance-based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvement for all three algorithms; C4.5 generated smaller decision trees.
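
The abstract does not state the heuristic's formula, so the following is only a minimal sketch of how a correlation-based filter might score and search feature subsets. It assumes the merit function commonly associated with correlation-based feature selection, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. Absolute Pearson correlation and greedy forward search are stand-ins chosen for brevity and may differ from the measures and search used in the paper. The names `pearson`, `merit`, and `forward_select` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def merit(features, target, subset):
    """Assumed merit: rewards correlation with the class, penalises redundancy within the subset."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (
        sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
        if pairs
        else 0.0
    )
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target):
    """Greedy forward search: add the best feature until the merit stops improving."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        score, name = max(
            (merit(features, target, selected + [f]), f) for f in sorted(remaining)
        )
        if score <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best


# Toy data (invented for illustration): one predictive feature,
# an exact duplicate of it, and one feature unrelated to the class.
target = [0, 0, 1, 1, 0, 1, 1, 0]
features = {
    "a": [0, 0, 1, 1, 0, 1, 1, 0],
    "a_copy": [0, 0, 1, 1, 0, 1, 1, 0],
    "noise": [0, 1, 0, 1, 1, 0, 1, 0],
}
selected, best = forward_select(features, target)
print(selected, best)
```

Because the filter scores subsets from the data alone, it runs once before training and the reduced feature set can then be handed to any learner. In this toy case the duplicate and the unrelated feature add no merit, so only one of the two identical predictive features is kept.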

Citation

Hall, M. A. & Smith, L. A. (1997). Feature subset selection: a correlation based filter approach. In 1997 International Conference on Neural Information Processing and Intelligent Information Systems (pp. 855-858). Berlin: Springer.

Publisher

Springer
