Show simple item record  

dc.contributor.authorKirkby, Richard Brendonen_NZ
dc.date.accessioned2008-04-15T10:37:51Z
dc.date.available2008-04-23T15:48:32Z
dc.date.issued2007en_NZ
dc.identifier.citationKirkby, R. B. (2007). Improving Hoeffding Trees (Thesis, Doctor of Philosophy (PhD)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/2568en
dc.identifier.urihttps://hdl.handle.net/10289/2568
dc.description.abstractModern information technology allows information to be collected at a far greater rate than ever before. So fast, in fact, that the main problem is making sense of it all. Machine learning offers promise of a solution, but the field mainly focusses on achieving high accuracy when data supply is limited. While this has created sophisticated classification algorithms, many do not cope with increasing data set sizes. When the data set sizes get to a point where they could be considered to represent a continuous supply, or data stream, then incremental classification algorithms are required. In this setting, the effectiveness of an algorithm cannot simply be assessed by accuracy alone. Consideration needs to be given to the memory available to the algorithm and the speed at which data is processed in terms of both the time taken to predict the class of a new data sample and the time taken to include this sample in an incrementally updated classification model. The Hoeffding tree algorithm is a state-of-the-art method for inducing decision trees from data streams. The aim of this thesis is to improve this algorithm. To measure improvement, a comprehensive framework for evaluating the performance of data stream algorithms is developed. Within the framework memory size is fixed in order to simulate realistic application scenarios. In order to simulate continuous operation, classes of synthetic data are generated providing an evaluation on a large scale. Improvements to many aspects of the Hoeffding tree algorithm are demonstrated. First, a number of methods for handling continuous numeric features are compared. Second, tree prediction strategy is investigated to evaluate the utility of various methods. Finally, the possibility of improving accuracy using ensemble methods is explored. The experimental results provide meaningful comparisons of accuracy and processing speeds between different modifications of the Hoeffding tree algorithm under various memory limits. The study on numeric attributes demonstrates that sacrificing accuracy for space at the local level often results in improved global accuracy. The prediction strategy shown to perform best adaptively chooses between standard majority class and Naive Bayes prediction in the leaves. The ensemble method investigation shows that combining trees can be worthwhile, but only when sufficient memory is available, and improvement is less likely than in traditional machine learning. In particular, issues are encountered when applying the popular boosting method to streams.en_NZ
dc.format.mimetypeapplication/pdf
dc.language.isoen
dc.publisherThe University of Waikatoen_NZ
dc.rightsAll items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subjectmachine learningen_NZ
dc.subjectclassificationen_NZ
dc.subjectdata streamsen_NZ
dc.subjectdecision treesen_NZ
dc.subjecthoeffding treesen_NZ
dc.subjectboostingen_NZ
dc.subjectbaggingen_NZ
dc.subjectoption treesen_NZ
dc.titleImproving Hoeffding Treesen_NZ
dc.typeThesisen_NZ
thesis.degree.disciplineComputing and Mathematical Sciencesen_NZ
thesis.degree.grantorUniversity of Waikatoen_NZ
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy (PhD)en_NZ
uow.date.accession2008-04-15T10:37:51Zen_NZ
uow.date.available2008-04-23T15:48:32Zen_NZ
uow.identifier.adthttp://adt.waikato.ac.nz/public/adt-uow20080415.103751en_NZ
uow.date.migrated2009-06-14T21:23:33Zen_NZ
pubs.place-of-publicationHamilton, New Zealanden_NZ


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record