Pruning decision trees and lists

dc.contributor.advisor: Witten, Ian H.
dc.contributor.author: Frank, Eibe
dc.date.accessioned: 2022-06-01T21:06:27Z
dc.date.available: 2022-06-01T21:06:27Z
dc.date.issued: 2000
dc.date.updated: 2022-06-01T20:55:36Z
dc.description.abstract: Machine learning algorithms are techniques that automatically build models describing the structure at the heart of a set of data. Ideally, such models can be used to predict properties of future data points, and people can use them to analyze the domain from which the data originates. Decision trees and lists are potentially powerful predictors and embody an explicit representation of the structure in a dataset. Their accuracy and comprehensibility depend on how concisely the learning algorithm can summarize this structure. The final model should not incorporate spurious effects, that is, patterns that are not genuine features of the underlying domain. Given an efficient mechanism for determining when a particular effect is due to chance alone, non-predictive parts of a model can be eliminated or “pruned.” Pruning mechanisms require a sensitive instrument that uses the data to detect whether there is a genuine relationship between the components of a model and the domain. Statistical significance tests are theoretically well-founded tools for doing exactly that. This thesis presents pruning algorithms for decision trees and lists that are based on significance tests. We explain why pruning is often necessary to obtain small and accurate models and show that the performance of standard pruning algorithms can be improved by taking the statistical significance of observations into account. We compare the effect of parametric and non-parametric tests, analyze why current pruning algorithms for decision lists often prune too aggressively, and review related work, in particular existing approaches that use significance tests in the context of pruning. The main outcome of this investigation is a set of simple pruning algorithms that should prove useful in practical data mining applications.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10289/14883
dc.language.iso: en
dc.publisher: The University of Waikato
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.title: Pruning decision trees and lists
dc.type: Thesis
pubs.place-of-publication: Hamilton, New Zealand
thesis.degree.grantor: The University of Waikato
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD)
Files
Original bundle
Name: thesis.pdf
Size: 13.33 MB
Format: Adobe Portable Document Format