dc.contributor.author: Frank, Eibe
dc.contributor.author: Witten, Ian H.
dc.date.accessioned: 2008-10-17T03:34:23Z
dc.date.available: 2008-10-17T03:34:23Z
dc.date.issued: 1999-06
dc.identifier.citation: Frank, E. & Witten, I.H. (1999). Reduced-error pruning with significance tests. (Working paper 99/10). Hamilton, New Zealand: University of Waikato, Department of Computer Science. [en_US]
dc.identifier.issn: 1170-487X
dc.identifier.uri: https://hdl.handle.net/10289/1039
dc.description.abstract: When building classification models, it is common practice to prune them to counter spurious effects of the training data: this often improves performance and reduces model size. “Reduced-error pruning” is a fast pruning procedure for decision trees that is known to produce small and accurate trees. Apart from the data from which the tree is grown, it uses an independent “pruning” set, and pruning decisions are based on the model’s error rate on this fresh data. Recently it has been observed that reduced-error pruning overfits the pruning data, producing unnecessarily large decision trees. This paper investigates whether standard statistical significance tests can be used to counter this phenomenon. The problem of overfitting to the pruning set highlights the need for significance testing. We investigate two classes of test, “parametric” and “non-parametric.” The standard chi-squared statistic can be used both in a parametric test and as the basis for a non-parametric permutation test. In both cases it is necessary to select the significance level at which pruning is applied. We show empirically that both versions of the chi-squared test perform equally well if their significance levels are adjusted appropriately. Using a collection of standard datasets, we show that significance testing improves on standard reduced-error pruning if the significance level is tailored to the particular dataset at hand using cross-validation, yielding consistently smaller trees that perform at least as well and sometimes better. [en_US]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Computer Science, University of Waikato [en_NZ]
dc.relation.ispartofseries: Computer Science Working Papers
dc.subject: computer science [en_US]
dc.subject: Machine learning
dc.title: Reduced-error pruning with significance tests [en_US]
dc.type: Working Paper [en_US]
uow.relation.series: 99/10
pubs.elements-id: 54958
pubs.place-of-publication: Hamilton [en_NZ]
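
The abstract mentions using the chi-squared statistic as the basis for a non-parametric permutation test. The following is a minimal sketch of that general idea, not the authors' implementation: it computes the Pearson chi-squared statistic for the table of branch against class on the pruning data, then estimates a p-value by repeatedly shuffling the class labels. The function names, the default of 1000 permutations, the add-one correction, and the rule "keep the subtree only if the association is significant" are all assumptions made for illustration.

```python
import random
from collections import Counter


def chi_squared(branches, labels):
    """Pearson chi-squared statistic for the branch-by-class contingency table."""
    n = len(labels)
    branch_totals = Counter(branches)
    class_totals = Counter(labels)
    observed = Counter(zip(branches, labels))
    stat = 0.0
    for branch, n_branch in branch_totals.items():
        for cls, n_class in class_totals.items():
            # Expected count if branch and class were independent.
            expected = n_branch * n_class / n
            stat += (observed[(branch, cls)] - expected) ** 2 / expected
    return stat


def permutation_p_value(branches, labels, n_permutations=1000, seed=0):
    """Estimate a p-value by shuffling class labels (hypothetical helper)."""
    rng = random.Random(seed)
    observed_stat = chi_squared(branches, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if chi_squared(branches, shuffled) >= observed_stat - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero (an assumption).
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def keep_subtree(branches, labels, significance_level=0.05):
    """Assumed decision rule: keep the split only if it is significant."""
    return permutation_p_value(branches, labels) <= significance_level
```

Here `branches` holds, for each pruning-set instance, the branch of the split it falls into, and `labels` holds its class. The significance level is left as a parameter because the abstract reports that it should be tuned per dataset, for example by cross-validation.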

