Feature subset selection: a correlation based filter approach
Hall, M. A. & Smith, L. A. (1997). Feature subset selection: a correlation based filter approach. In 1997 International Conference on Neural Information Processing and Intelligent Information Systems (pp. 855-858). Berlin: Springer.
Permanent Research Commons link: https://hdl.handle.net/10289/1515
Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed, or their performance adversely affected, by too much data, some of which may be irrelevant or redundant to the learning task. Feature subset selection, then, is a method of enhancing the performance of learning algorithms, reducing the hypothesis search space, and, in some cases, reducing the storage requirement. This paper describes a feature subset selector that uses a correlation based heuristic to determine the goodness of feature subsets, and evaluates its effectiveness with three common ML algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvement for all three algorithms; C4.5 generated smaller decision trees.
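The abstract does not state the heuristic itself. As a rough illustration only, the sketch below scores a subset with the merit form commonly associated with correlation based feature selection, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Pearson correlation is used here as a stand-in measure, and the function names and toy data are hypothetical, not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def merit(features, target, subset):
    """Score a subset: reward correlation with the class, penalise redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation (assumed measure: Pearson).
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data: one predictive feature, an exact copy of it, and noise.
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_copy": [1, 2, 3, 4, 5, 6],
    "noise": [3, 1, 2, 2, 1, 3],
}
target = [0, 0, 0, 1, 1, 1]

score_single = merit(features, target, ["a"])
score_redundant = merit(features, target, ["a", "a_copy"])
score_irrelevant = merit(features, target, ["a", "noise"])
```

Under these assumptions, adding a perfectly redundant copy leaves the score unchanged, while adding a feature uncorrelated with the class lowers it, which is the behaviour a filter of this kind relies on when comparing candidate subsets.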
This is an author’s version of an article published in 1997 International Conference on Neural Information Processing and Intelligent Information Systems. © Springer.