Clustering with finite data from semi-parametric mixture distributions

Wang, Yong; Witten, Ian H.

Authors

Wang, Yong

Witten, Ian H.

Files

uow-cs-wp-1999-14.pdf (569.21 KB)

Permanent Link

https://hdl.handle.net/10289/1043

Abstract

Existing clustering methods for the semi-parametric mixture distribution perform well as the volume of data increases. However, they all suffer from a serious drawback in finite-data situations: small outlying groups of data points can be completely ignored in the clusters that are produced, no matter how far away they lie from the major clusters. This can result in unbounded loss if the loss function is sensitive to the distance between clusters. This paper proposes a new distance-based clustering method that overcomes the problem by avoiding global constraints. Experimental results illustrate its superiority to existing methods when small clusters are present in finite data sets; they also suggest that it is more accurate and stable than other methods even when there are no small clusters.

Citation

Wang, Y. & Witten, I. H. (1999). Clustering with finite data from semi-parametric mixture distributions. (Working paper 99/14). Hamilton, New Zealand: University of Waikato, Department of Computer Science.

Type

Working Paper

Series name

Computer Science Working Papers

Date

1999-11

Publisher

Dept. of Computer Science, University of Waikato
