Clustering large datasets using cobweb and K-means in tandem

Li, Mi; Holmes, Geoffrey; Pfahringer, Bernhard

doi:10.1007/978-3-540-30549-1_33

Authors

Li, Mi

Holmes, Geoffrey

Pfahringer, Bernhard

Permanent Link

https://hdl.handle.net/10289/1461

DOI

10.1007/978-3-540-30549-1_33

Abstract

This paper presents a single-scan algorithm for clustering large datasets, based on a two-phase process that combines two well-known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with subclusters at the leaves, and K-means is then applied to the resulting subclusters. The resulting method, Scalable Cobweb, is compared with a single-pass K-means algorithm and with standard K-means. The evaluation considers clustering error, measured by the sum of squared error, and sensitivity to the order in which data points are processed.
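
The two-phase idea in the abstract can be illustrated with a rough sketch. This is not the authors' Scalable Cobweb: phase 1 below swaps in a simple distance-threshold, single-scan subclustering as a stand-in for the modified Cobweb tree, and phase 2 runs a count-weighted K-means over the subcluster centroids. All function names and the radius parameter are hypothetical choices made for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_scan_subclusters(points, radius):
    """Phase 1 stand-in (NOT Cobweb): one pass over the data.

    Each point joins the nearest subcluster if its centroid lies within
    `radius`, otherwise it starts a new subcluster. Returns a list of
    (count, centroid) summaries.
    """
    subs = []  # each entry is [count, per-dimension sum]
    for p in points:
        best, best_d = None, None
        for s in subs:
            centroid = [v / s[0] for v in s[1]]
            d = sq_dist(p, centroid)
            if best_d is None or d < best_d:
                best, best_d = s, d
        if best is not None and best_d <= radius ** 2:
            best[0] += 1
            best[1] = [a + b for a, b in zip(best[1], p)]
        else:
            subs.append([1, list(p)])
    return [(n, [v / n for v in s]) for n, s in subs]


def weighted_kmeans(subclusters, k, iters=50, seed=0):
    """Phase 2: K-means over subcluster centroids, weighted by their counts."""
    if k > len(subclusters):
        raise ValueError("k exceeds the number of subclusters")
    rng = random.Random(seed)
    centers = [list(c) for _, c in rng.sample(subclusters, k)]
    dim = len(centers[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        weights = [0] * k
        for n, c in subclusters:
            j = min(range(k), key=lambda i: sq_dist(c, centers[i]))
            weights[j] += n
            sums[j] = [s + n * v for s, v in zip(sums[j], c)]
        # An empty cluster keeps its previous center.
        new_centers = [
            [s / weights[j] for s in sums[j]] if weights[j] else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sse(points, centers):
    """Sum of squared error: each point against its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    rng = random.Random(1)
    # Two synthetic blobs; the data and parameters are illustrative only.
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
    data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
    for trial in range(3):
        rng.shuffle(data)  # vary the processing order
        subs = single_scan_subclusters(data, radius=1.0)
        centers = weighted_kmeans(subs, k=2)
        print(trial, len(subs), sse(data, centers))
```

Shuffling the data between runs, as in the demo loop, is one simple way to probe the order sensitivity that the evaluation mentions: the subclusters formed in phase 1 depend on arrival order, so the final sum of squared error may vary from run to run.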

Citation

Li, M., Holmes, G. & Pfahringer, B. (2005). Clustering large datasets using cobweb and K-means in tandem. In G.I. Webb & Xinghuo Yu (Eds.), Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, December 4-6, 2004 (pp. 368-379). Berlin: Springer.

Type

Conference Contribution

Date

2005

Publisher

Springer, Berlin
