Tree-based Density Estimation: Algorithms and Applications
Schmidberger, G. (2009). Tree-based Density Estimation: Algorithms and Applications (Thesis). The University of Waikato, Hamilton, New Zealand. Retrieved from http://hdl.handle.net/10289/3283
Permanent Research Commons link: http://hdl.handle.net/10289/3283
Data Mining can be seen as an extension to statistics. It comprises the preparation of data and the process of gathering new knowledge from it. The extraction of new knowledge is supported by various machine learning methods. Many of the algorithms are based on probabilistic principles or use density estimations for their computations. Density estimation has been practised in the field of statistics for several centuries. In the simplest case, a histogram estimator, like the simple equal-width histogram, can be used for this task and has been shown to be a practical tool to represent the distribution of data visually and for computation. Like other nonparametric approaches, it can provide a flexible solution. However, flexibility in existing approaches is generally restricted because the size of the bins is fixed: either the width of the bins or the number of values in them. Attempts have been made to generate histograms with a variable bin width and a variable number of values per interval, but the computational approaches in these methods have proven too difficult and too slow even with modern computer technology.
In this thesis new flexible histogram estimation methods are developed and tested as part of various machine learning tasks, namely discretization, naive Bayes classification, clustering and multiple-instance learning. Not only are the new density estimation methods applied to machine learning tasks, they also borrow design principles from algorithms that are ubiquitous in artificial intelligence: divide-and-conquer methods are a well known way to tackle large problems by dividing them into small subproblems. Decision trees, used for machine learning classification, successfully apply this approach. This thesis presents algorithms that build density estimators using a binary split tree to cut a range of values into subranges of varying length. No class values are required for this splitting process, making it an unsupervised method. The result is a histogram estimator that adapts well even to complex density functions: a novel density estimation method with flexible density estimation ability and good computational behaviour.
Algorithms are presented for both univariate and multivariate data. The univariate histogram estimator is applied to discretization for density estimation and also used as density estimator inside a naive Bayes classifier. The multivariate histogram, used as the basis for a clustering method, is applied to improve the runtime behaviour of a well-known algorithm for multiple-instance classification. Performance in these applications is evaluated by comparing the new approaches with existing methods.
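The idea of cutting a range into variable-width bins with a binary split tree can be illustrated with a small univariate sketch. The abstract does not state the split criterion or the stopping rule, so everything specific here is an assumption made for illustration: the split is chosen by the gain in training log-likelihood, and the names `log_lik`, `build_bins`, `density`, `min_count` and `min_gain` are hypothetical and not taken from the thesis.

```python
import math
import random


def log_lik(n, width, n_total):
    """Log-likelihood of n points in one bin of a histogram density."""
    if n == 0:
        return 0.0
    return n * math.log(n / (n_total * width))


def build_bins(sorted_vals, lo, hi, n_total, min_count=20, min_gain=5.0):
    """Recursively cut [lo, hi] in two; returns a list of (lo, hi, density).

    Assumed criterion: pick the cut with the largest log-likelihood gain,
    and stop when no cut gains at least min_gain.
    """
    n = len(sorted_vals)
    base = log_lik(n, hi - lo, n_total)
    best_gain, best_i, best_cut = 0.0, None, None
    # Candidate cuts are midpoints between neighbouring distinct values,
    # restricted so that each side keeps at least min_count points.
    for i in range(min_count, n - min_count + 1):
        left, right = sorted_vals[i - 1], sorted_vals[i]
        if left == right:
            continue
        cut = (left + right) / 2.0
        gain = (log_lik(i, cut - lo, n_total)
                + log_lik(n - i, hi - cut, n_total) - base)
        if gain > best_gain:
            best_gain, best_i, best_cut = gain, i, cut
    if best_i is None or best_gain < min_gain:
        # Leaf of the split tree: one bin with constant density.
        return [(lo, hi, n / (n_total * (hi - lo)))]
    return (build_bins(sorted_vals[:best_i], lo, best_cut, n_total,
                       min_count, min_gain)
            + build_bins(sorted_vals[best_i:], best_cut, hi, n_total,
                         min_count, min_gain))


def density(bins, x):
    """Look up the estimated density at x (0.0 outside the covered range)."""
    for lo, hi, d in bins:
        if lo <= x <= hi:
            return d
    return 0.0


# Toy data: a narrow peak next to a broad flat region. No class labels
# are used anywhere, so the procedure is unsupervised.
random.seed(0)
data = sorted([random.gauss(0.0, 0.2) for _ in range(500)]
              + [random.uniform(2.0, 6.0) for _ in range(500)])
bins = build_bins(data, data[0], data[-1], len(data))
for lo, hi, d in bins:
    print(f"[{lo:7.3f}, {hi:7.3f}]  density = {d:.4f}")
```

Because every leaf receives probability mass equal to its share of the data, the bins integrate to one by construction. An equal-width histogram would need many narrow bins across the whole range to resolve the peak, whereas the recursive cuts place narrow bins only where the data demand them.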
The University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.