Feature subset selection: a correlation based filter approach

Hall, Mark A.; Smith, Lloyd A.
Issued: 1997 (made available in the repository: 2008-12-02)
Citation: Hall, M. A. & Smith, L. A. (1997). Feature subset selection: a correlation based filter approach. In 1997 International Conference on Neural Information Processing and Intelligent Information Systems (pp. 855-858). Berlin: Springer.
URI: https://hdl.handle.net/10289/1515

Abstract: Recent work has shown that feature subset selection can have a positive effect on the performance of machine learning algorithms. Some algorithms can be slowed, or their performance adversely affected, by too much data, some of which may be irrelevant or redundant to the learning task. Feature subset selection is therefore a way of enhancing the performance of learning algorithms, reducing the hypothesis search space and, in some cases, reducing the storage requirement. This paper describes a feature subset selector that uses a correlation based heuristic to determine the goodness of feature subsets, and evaluates its effectiveness with three common machine learning algorithms: a decision tree inducer (C4.5), a naive Bayes classifier, and an instance-based learner (IB1). Experiments using a number of standard data sets drawn from real and artificial domains are presented. Feature subset selection gave significant improvement for all three algorithms, and C4.5 generated smaller decision trees.

Rights: This is an author’s version of an article published in 1997 International Conference on Neural Information Processing and Intelligent Information Systems. © Springer.
Keywords: computer science; feature selection; decision tree; naive Bayes
Type: Conference Contribution
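The abstract does not spell out the correlation based heuristic. The sketch below illustrates the idea using the subset "merit" formula from Hall's later correlation-based feature selection (CFS) work, Merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf is the mean feature-class correlation and r_ff is the mean feature-feature intercorrelation. It is a minimal sketch under stated assumptions, not the paper's implementation: it assumes numeric features, uses absolute Pearson correlation as the correlation measure, and searches with a simple greedy forward selection. The paper may use different correlation measures and a different search strategy. The function names `pearson`, `merit` and `forward_select` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def merit(subset, features, target):
    """CFS-style merit: high class correlation, low intercorrelation (assumed |Pearson r|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    # Mean absolute feature-feature intercorrelation over all pairs in the subset.
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target):
    """Greedy forward search (an assumed search strategy): add features while merit improves."""
    selected, best = [], 0.0
    remaining = list(range(len(features)))
    while remaining:
        # Candidate whose addition gives the highest merit (max() keeps the lowest index on ties).
        cand = max(remaining, key=lambda f: merit(selected + [f], features, target))
        score = merit(selected + [cand], features, target)
        if score <= best:
            break  # no strict improvement, so stop
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


# Toy data (illustrative only): f0 matches the class, f1 duplicates f0, f2 is unrelated.
target = [0, 0, 1, 1, 0, 1, 1, 0]
f0 = list(target)
f1 = list(target)
f2 = [1, 0, 1, 0, 1, 0, 1, 0]
selected, best = forward_select([f0, f1, f2], target)
print(selected, best)  # → [0] 1.0
```

In this toy run, the redundant copy f1 does not raise the merit, because its intercorrelation with f0 is 1. The irrelevant feature f2 lowers it to about 0.707, so the search keeps only f0. This mirrors the abstract's point that irrelevant and redundant features can be filtered out before a learner such as C4.5, naive Bayes or IB1 is trained.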