dc.contributor.author | Schmidberger, Gabi | en_NZ |
dc.date.accessioned | 2009-09-08T16:05:42Z | |
dc.date.available | 2009-09-11T14:44:49Z | |
dc.date.issued | 2009 | en_NZ |
dc.identifier.citation | Schmidberger, G. (2009). Tree-based Density Estimation: Algorithms and Applications (Thesis). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/3283 | en |
dc.identifier.uri | https://hdl.handle.net/10289/3283 | |
dc.description.abstract | Data Mining can be seen as an extension to statistics. It comprises the preparation
of data and the process of gathering new knowledge from it. The extraction of
new knowledge is supported by various machine learning methods. Many of the
algorithms are based on probabilistic principles or use density estimations for their
computations. Density estimation has been practised in the field of statistics for
several centuries. In the simplest case, a histogram estimator, like the simple equal-width
histogram, can be used for this task and has been shown to be a practical
tool to represent the distribution of data visually and for computation. Like other
nonparametric approaches, it can provide a flexible solution. However, flexibility
in existing approaches is generally restricted because the size of the bins is fixed:
either the width of the bins or the number of values in them. Attempts have been
made to generate histograms with a variable bin width and a variable number of
values per interval, but the computational approaches in these methods have proven
too difficult and too slow even with modern computer technology.
In this thesis new flexible histogram estimation methods are developed and tested
as part of various machine learning tasks, namely discretization, naive Bayes classification,
clustering and multiple-instance learning. Not only are the new density
estimation methods applied to machine learning tasks, they also borrow design
principles from algorithms that are ubiquitous in artificial intelligence: divide-and-conquer
methods are a well-known way to tackle large problems by dividing them
into small subproblems. Decision trees, used for machine learning classification,
successfully apply this approach. This thesis presents algorithms that build density
estimators using a binary split tree to cut a range of values into subranges of
varying length. No class values are required for this splitting process, making it an
unsupervised method. The result is a histogram estimator that adapts well even to
complex density functions: a novel density estimation method with flexible density
estimation ability and good computational behaviour.
Algorithms are presented for both univariate and multivariate data. The univariate
histogram estimator is applied to discretization for density estimation and
also used as a density estimator inside a naive Bayes classifier. The multivariate histogram,
used as the basis for a clustering method, is applied to improve the runtime
behaviour of a well-known algorithm for multiple-instance classification. Performance
in these applications is evaluated by comparing the new approaches with
existing methods. | en_NZ |
dc.format.mimetype | application/pdf | |
dc.language.iso | en | |
dc.publisher | The University of Waikato | en_NZ |
dc.rights | All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated. | |
dc.subject | machine learning | en_NZ |
dc.subject | data mining | en_NZ |
dc.subject | density estimation | en_NZ |
dc.subject | multivariate density estimation | en_NZ |
dc.subject | histograms | en_NZ |
dc.subject | multi-dimensional histograms | en_NZ |
dc.subject | nonparametric density estimation | en_NZ |
dc.subject | discretization | en_NZ |
dc.subject | naive Bayes | en_NZ |
dc.subject | clustering | en_NZ |
dc.subject | multi-dimensional clustering | en_NZ |
dc.subject | subspace clustering | en_NZ |
dc.subject | multiple-instance learning | en_NZ |
dc.subject | data exploration | en_NZ |
dc.subject | data visualization | en_NZ |
dc.title | Tree-based Density Estimation: Algorithms and Applications | en_NZ |
dc.type | Thesis | en_NZ |
thesis.degree.discipline | Computer Science | en_NZ |
thesis.degree.grantor | University of Waikato | en_NZ |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy (PhD) | |
uow.date.accession | 2009-09-08T16:05:42Z | en_NZ |
uow.date.available | 2009-09-11T14:44:49Z | en_NZ |
uow.identifier.adt | http://adt.waikato.ac.nz/public/adt-uow20090908.160542 | en_NZ |
pubs.place-of-publication | Hamilton, New Zealand | en_NZ |