
Clustering with finite data from semi-parametric mixture distributions

Abstract
Existing clustering methods for the semi-parametric mixture distribution perform well as the volume of data increases. However, they all suffer from a serious drawback in finite-data situations: small outlying groups of data points can be completely ignored in the clusters that are produced, no matter how far away they lie from the major clusters. This can result in unbounded loss if the loss function is sensitive to the distance between clusters. This paper proposes a new distance-based clustering method that overcomes the problem by avoiding global constraints. Experimental results illustrate its superiority to existing methods when small clusters are present in finite data sets; they also suggest that it is more accurate and stable than other methods even when there are no small clusters.
Type
Working Paper
Series
Computer Science Working Papers
Citation
Wang, Y. & Witten, I. H. (1999). Clustering with finite data from semi-parametric mixture distributions. (Working paper 99/14). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
Date
1999-11
Publisher
Dept. of Computer Science, University of Waikato