Clustering using finite mixture models
Rights
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
Abstract
This thesis introduces a class of multivariate mixture models that includes latent class models and mixtures of multivariate normal distributions as special cases. Like latent class models, these models make free use of local independence, both to reduce the number of parameters and to give cluster descriptions that are easily understood. Provision is made for introducing within-cluster associations between the variables.
Discrete, multivariate normal and location model distributions are the ‘atoms’ with which the models are built, but where more is known about the nature of the distributions in sub-populations, other types of distributions could be used in their place.
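As a reading aid, the general shape of such a model can be written down. The notation below is an assumption made for this sketch and is not taken from the thesis: $K$ clusters with mixing proportions $\pi_k$, and the variables partitioned into blocks $x_1, \dots, x_B$ that are independent within each cluster, each block having its own 'atom' density $f_{kb}$.

```latex
f(x) \;=\; \sum_{k=1}^{K} \pi_k \prod_{b=1}^{B} f_{kb}\!\left(x_b \mid \theta_{kb}\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1 .
```

Under this reading, a latent class model corresponds to every block being a single discrete variable, while a mixture of multivariate normals corresponds to one block holding all variables. Blocks with more than one variable are where within-cluster associations would enter.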
We use the EM algorithm to find the maximum likelihood estimates of the model parameters; however, the emphasis is less on parameter estimation than on the use of the estimated component distributions to cluster the data. The approach is implemented in a Fortran 77 program. The program is used to fit models to several data sets, including a large medical data set. Analysis of the resulting clusters shows them to be sensible.
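To illustrate the idea of fitting by EM and then clustering by the estimated components, here is a minimal Python sketch. It is not the Fortran 77 program described above. It assumes the simplest case only: continuous variables, each normally distributed and locally independent within a cluster. The function name `em_local_independence` and the `init_means` argument are hypothetical names chosen for this example.

```python
import math


def log_normal(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def em_local_independence(data, init_means, n_iter=50):
    """EM for a mixture of normals with locally independent variables.

    data: list of rows (lists of floats); init_means: one starting mean
    vector per cluster. Returns (weights, means, variances, labels).
    """
    n, d, k = len(data), len(data[0]), len(init_means)
    weights = [1.0 / k] * k
    means = [list(m) for m in init_means]
    # Start every cluster at the overall variance of each variable.
    centre = [sum(row[j] for row in data) / n for j in range(d)]
    var0 = [max(sum((row[j] - centre[j]) ** 2 for row in data) / n, 1e-6)
            for j in range(d)]
    variances = [list(var0) for _ in range(k)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each row.
        resp = []
        for row in data:
            logp = [math.log(weights[c])
                    + sum(log_normal(row[j], means[c][j], variances[c][j])
                          for j in range(d))
                    for c in range(k)]
            top = max(logp)  # subtract the maximum for numerical stability
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted proportions, means and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12  # guard against empty clusters
            weights[c] = nc / n
            for j in range(d):
                m = sum(r[c] * row[j] for r, row in zip(resp, data)) / nc
                v = sum(r[c] * (row[j] - m) ** 2
                        for r, row in zip(resp, data)) / nc
                means[c][j] = m
                variances[c][j] = max(v, 1e-6)  # floor avoids zero variance
    # Assign each row to its most probable cluster (from the last E-step).
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, means, variances, labels
```

The final line reflects the emphasis of the abstract: the fitted component distributions are used to assign observations to clusters, rather than being of interest only as parameter estimates.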
The thesis also shows how the analysis of data using multivariate mixture models can be extended to handle data that are missing at random in the sense of Rubin (1976). The program written for this thesis incorporates this facility. The scope of the methods proposed is illustrated by clustering several data sets.
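One common way to handle values that are missing at random in a locally independent mixture is to base each observation's cluster probabilities on its observed variables only, since a missing variable integrates out of the product. The Python sketch below shows that idea under the same simplifying assumption of independent normal variables. The thesis's own treatment may differ, particularly where within-cluster associations are modelled. The function names and the use of `None` to mark a missing value are choices made for this example.

```python
import math


def observed_log_density(row, means, variances):
    """Log density of the observed entries of a row for one cluster.

    None marks a missing value; such entries are skipped, which is the
    same as integrating them out when variables are locally independent.
    """
    total = 0.0
    for x, m, v in zip(row, means, variances):
        if x is None:
            continue
        total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def cluster_posterior(row, weights, means, variances):
    """Posterior cluster probabilities using only the observed values."""
    logp = [math.log(w) + observed_log_density(row, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logp)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in logp]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

A row with every value missing carries no information, so its posterior probabilities reduce to the mixing proportions.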
Publisher
The University of Waikato