Frank, E. & Witten, I.H. (1998). Using a permutation test for attribute selection in decision trees. In Proceedings of the 15th International Conference on Machine Learning, Madison, Wisconsin (pp. 152-160). San Francisco: Morgan Kaufmann Publishers.
Permanent Research Commons link: http://hdl.handle.net/10289/1506
Most techniques for attribute selection in decision trees are biased towards attributes with many values, and several ad hoc solutions to this problem have appeared in the machine learning literature. Statistical tests for the existence of an association with a prespecified significance level provide a well-founded basis for addressing the problem. However, many statistical tests are computed from a chi-squared distribution, which is a valid approximation to the actual distribution only in the large-sample case, and this patently does not hold near the leaves of a decision tree. An exception is the class of permutation tests. We describe how permutation tests can be applied to this problem. We choose one such test for further exploration, and give a novel two-stage method for applying it to select attributes in a decision tree. Results on practical datasets compare favourably with other methods that also adopt a pre-pruning strategy.
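To illustrate the general idea of a permutation test for attribute-class association, the following is a minimal Python sketch. It is not the specific test or the two-stage method chosen in the paper, which the abstract does not name. The function names (`chi_squared`, `permutation_p_value`), the use of the chi-squared statistic as the test statistic, and the Monte Carlo sampling of permutations are all assumptions made for illustration. The point it shows is that the p-value comes from reshuffling the class labels rather than from the chi-squared distribution, so it does not rely on a large-sample approximation.

```python
import random
from collections import Counter


def chi_squared(attr, cls):
    """Chi-squared statistic of the attribute-by-class contingency table."""
    n = len(attr)
    attr_counts = Counter(attr)
    cls_counts = Counter(cls)
    joint = Counter(zip(attr, cls))
    stat = 0.0
    for a, n_a in attr_counts.items():
        for c, n_c in cls_counts.items():
            expected = n_a * n_c / n
            observed = joint.get((a, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(attr, cls, n_permutations=1000, seed=0):
    """Monte Carlo estimate of the permutation-test p-value (illustrative).

    Class labels are shuffled repeatedly; the p-value is the fraction of
    shuffles whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = chi_squared(attr, cls)
    shuffled = list(cls)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if chi_squared(attr, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction counts the observed labelling as one permutation.
    return (hits + 1) / (n_permutations + 1)


# A small node with 8 instances, as might occur near the leaves of a tree.
attribute = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p = permutation_p_value(attribute, labels)
```

In a pre-pruning setting, an attribute would be considered for a split only if its p-value falls below the prespecified significance level. How that decision is organised in the paper's two-stage method is not described in the abstract and is not reproduced here.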
Morgan Kaufmann Publishers
This article has been published in Proceedings of the 15th International Conference on Machine Learning, Madison, Wisconsin. ©1998 Morgan Kaufmann.