
dc.contributor.advisor: Durrant, Robert J
dc.contributor.advisor: Hunt, Lynette Anne
dc.contributor.author: Lim, Jin Sean
dc.date.accessioned: 2020-02-10T20:18:43Z
dc.date.available: 2020-02-10T20:18:43Z
dc.date.issued: 2020
dc.identifier.citation [en]: Lim, J. S. (2020). Ensemble learning of high dimension datasets (Thesis, Doctor of Philosophy (PhD)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/13422
dc.identifier.uri: https://hdl.handle.net/10289/13422
dc.description.abstract: Ensemble learning, an approach in machine learning, makes decisions based on the collective decision of a committee of learners to solve complex tasks with minimal human intervention. Advances in computing technology have enabled researchers to build datasets with features numbering in the thousands, and to build more accurate predictive models. Unfortunately, high-dimensional datasets are especially challenging for machine learning due to a phenomenon dubbed the "curse of dimensionality". One approach to overcoming this challenge is ensemble learning with the Random Subspace (RS) method, which has been shown to perform very well empirically but has little theoretical explanation for its effectiveness in classification tasks. In this thesis, we aim to provide theoretical insights into RS ensemble classifiers to give a more in-depth understanding of the theoretical foundations of other ensemble classifiers. We investigate the conditions for norm preservation in RS projections. Insights into this provide us with the theoretical basis for RS in algorithms that are based on the geometry of the data (e.g. clustering, nearest neighbour). We then investigate the guarantees for the dot product of two random vectors after RS projection. This guarantee is useful for capturing the geometric structure of a classification problem. Next, we investigate the accuracy of a majority vote ensemble using a generalized Pólya urn model, and how the parameters of the model are derived from diversity measures. We discuss the practical implications of the model, explore the noise tolerance of ensembles, and give a plausible explanation for the effectiveness of ensembles. We provide empirical corroboration for our main results with both synthetic and real-world high-dimensional data. We also discuss the implications of our theory for other applications (e.g. compressive sensing).
Based on our results, we propose a method of building ensembles for deep neural network (DNN) image classification using RS projections without needing to retrain the neural network; these ensembles showed improved accuracy and very good robustness to adversarial examples. Ultimately, we hope that the insights gained in this thesis will make inroads towards answering a key open question for ensemble classifiers: "When will an ensemble of weak learners outperform a single carefully tuned learner?"
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: The University of Waikato
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: Ensemble learning
dc.subject: High dimensional datasets
dc.subject: Classification
dc.subject: Random subspace method
dc.title: Ensemble learning of high dimension datasets
dc.type: Thesis
thesis.degree.grantor: The University of Waikato
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD)
dc.date.updated: 2020-01-30T02:00:36Z
pubs.place-of-publication [en_NZ]: Hamilton, New Zealand
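The abstract's first technical step, norm and dot-product preservation under RS projection, can be illustrated with a minimal sketch. It assumes (this is not taken from the thesis) that an RS projection keeps k of the d features chosen uniformly at random without replacement, and that projected quantities are rescaled by d/k so they are unbiased estimates of the full ones. The function names `random_subspace`, `project`, `scaled_dot` and `scaled_sq_norm` are hypothetical. It does not reproduce the thesis's conditions or bounds; it only contrasts a vector whose energy is spread over all coordinates with one whose energy sits in five coordinates.

```python
import random

# Assumption (not from the thesis): an RS projection samples k of the d
# coordinates uniformly without replacement; results are rescaled by d/k.


def random_subspace(d: int, k: int, rng: random.Random) -> list[int]:
    """Choose k of the d feature indices uniformly at random, without replacement."""
    return sorted(rng.sample(range(d), k))


def project(x: list[float], idx: list[int]) -> list[float]:
    """Random Subspace projection: keep only the coordinates listed in idx."""
    return [x[i] for i in idx]


def scaled_dot(x: list[float], y: list[float], idx: list[int]) -> float:
    """Dot product of the projected vectors, rescaled by d/k.

    Each coordinate is kept with probability k/d, so the rescaled value is an
    unbiased estimate of the full dot product <x, y>.
    """
    d, k = len(x), len(idx)
    return (d / k) * sum(a * b for a, b in zip(project(x, idx), project(y, idx)))


def scaled_sq_norm(x: list[float], idx: list[int]) -> float:
    """Rescaled squared norm of the projection (the dot product of x with itself)."""
    return scaled_dot(x, x, idx)


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, trials = 1000, 200, 500

    # Energy spread over all coordinates vs. concentrated in five of them.
    dense = [rng.gauss(0.0, 1.0) for _ in range(d)]
    sparse = [1.0 if i < 5 else 0.0 for i in range(d)]

    for name, x in [("dense", dense), ("sparse", sparse)]:
        true = sum(v * v for v in x)
        errs = [abs(scaled_sq_norm(x, random_subspace(d, k, rng)) - true) / true
                for _ in range(trials)]
        print(f"{name}: mean relative error {sum(errs) / trials:.3f}, max {max(errs):.3f}")
```

When run, the dense vector's mean relative error should come out much smaller than the sparse vector's: the sparse estimate collapses to 0 whenever none of the five nonzero coordinates is kept and overshoots whenever two or more are. Under the stated assumptions, this gives one informal picture of why norm preservation under RS depends on how evenly a vector's energy is spread.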
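For the majority vote analysis, the thesis uses a generalized Pólya urn model whose parameters are derived from diversity measures; its exact form is not given in this record. The following sketch swaps in the classic Pólya urn purely for illustration (an assumption, not the thesis's model): the urn starts with `a` "correct" and `b` "wrong" balls, each ensemble member's vote is one draw, and `c` balls of the drawn colour are added after each draw. With `c = 0` the votes are independent; with `c > 0` any two votes have correlation c/(a + b + c), and the number of correct votes follows a beta-binomial distribution. The names `polya_vote_pmf` and `majority_vote_accuracy` are hypothetical.

```python
import math


def polya_vote_pmf(n: int, a: float, b: float, c: float) -> list[float]:
    """P(exactly k of n votes are correct), k = 0..n, under a classic Polya urn.

    Illustrative assumption, not the thesis's generalized model: the urn starts
    with `a` "correct" and `b` "wrong" balls, and `c` balls of the drawn colour
    are added after each draw. c = 0 gives independent (binomial) votes;
    c > 0 gives positively correlated (beta-binomial) votes.
    """
    if a <= 0 or b <= 0 or c < 0:
        raise ValueError("need a > 0, b > 0 and c >= 0")
    # Log prefix products, e.g. log_a[k] = log prod_{i<k} (a + i*c).
    # Working in logs avoids float overflow for large ensembles.
    log_a, log_b, log_ab = [0.0], [0.0], [0.0]
    for t in range(n):
        log_a.append(log_a[-1] + math.log(a + t * c))
        log_b.append(log_b[-1] + math.log(b + t * c))
        log_ab.append(log_ab[-1] + math.log(a + b + t * c))
    # Every ordering of k correct and n - k wrong draws is equally likely
    # (exchangeability), so multiply one ordering's probability by C(n, k).
    return [
        math.exp(math.log(math.comb(n, k)) + log_a[k] + log_b[n - k] - log_ab[n])
        for k in range(n + 1)
    ]


def majority_vote_accuracy(n: int, a: float, b: float, c: float) -> float:
    """Probability that more than half of an odd-sized ensemble votes correctly."""
    if n % 2 == 0:
        raise ValueError("use an odd ensemble size so that ties cannot occur")
    return sum(polya_vote_pmf(n, a, b, c)[n // 2 + 1:])


if __name__ == "__main__":
    a, b = 3.0, 2.0  # each member is correct with marginal probability a / (a + b) = 0.6
    for c in (0.0, 1.0):
        rho = c / (a + b + c)  # pairwise correlation between two members' votes
        accs = [majority_vote_accuracy(n, a, b, c) for n in (1, 11, 51, 201)]
        print(f"rho = {rho:.3f}: " + ", ".join(f"{acc:.3f}" for acc in accs))
```

With `c = 0` the accuracies climb towards 1 as the ensemble grows, as expected for independent members that are better than chance. With `c = 1` (pairwise correlation 1/6) they level off near 11/16, the probability that a Beta(3, 2) variable exceeds 1/2, because the fraction of correct votes converges to such a variable rather than to 0.6. This toy model only illustrates, under its own assumptions, how correlation between members can cap majority vote accuracy.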

