Clustering large datasets using cobweb and K-means in tandem
Abstract
This paper presents a single-scan algorithm for clustering large datasets based on a two-phase process that combines two well-known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with subclusters at the leaves, and K-means is then applied to the resulting subclusters. The resulting method, Scalable Cobweb, is compared to a single-pass K-means algorithm and to standard K-means. The evaluation considers two criteria: clustering error, measured by the sum of squared error, and sensitivity to the order in which data points are processed.
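The two-phase idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Scalable Cobweb: phase 1 here is a simple distance-threshold summarizer standing in for the modified Cobweb tree, and the names `summarize`, `weighted_kmeans`, `sse`, and the `radius` parameter are hypothetical choices made for illustration. Phase 2 runs K-means over the subcluster centroids, weighted by how many points each subcluster absorbed, and `sse` computes the sum of squared error named as the evaluation measure.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summarize(points, radius):
    """Phase 1 stand-in (NOT Cobweb): a single pass over the data.

    Each point joins the nearest subcluster whose centroid lies within
    `radius`; otherwise it starts a new subcluster.
    Returns a list of (centroid, count) pairs.
    """
    subs = []  # each entry is [coordinate_sums, count]
    for p in points:
        best, best_d = None, radius ** 2
        for s in subs:
            centroid = [v / s[1] for v in s[0]]
            d = sq_dist(p, centroid)
            if d <= best_d:
                best, best_d = s, d
        if best is None:
            subs.append([list(p), 1])
        else:
            best[0] = [v + x for v, x in zip(best[0], p)]
            best[1] += 1
    return [([v / n for v in s], n) for s, n in subs]


def weighted_kmeans(subclusters, k, iters=20, seed=0):
    """Phase 2: K-means over subcluster centroids, weighted by count."""
    rng = random.Random(seed)
    centers = [list(c) for c, _ in rng.sample(subclusters, k)]
    dim = len(centers[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        weights = [0] * k
        for c, n in subclusters:
            j = min(range(k), key=lambda i: sq_dist(c, centers[i]))
            sums[j] = [s + n * v for s, v in zip(sums[j], c)]
            weights[j] += n
        # Keep the old center if no subcluster was assigned to it.
        centers = [[s / w for s in sums[j]] if w else centers[j]
                   for j, w in enumerate(weights)]
    return centers


def sse(points, centers):
    """Sum of squared error: squared distance of each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)
```

A caller would run `summarize` once over the data, pass the result to `weighted_kmeans`, and score the returned centers against the original points with `sse`. Because phase 1 processes points in arrival order, shuffling the input and comparing the resulting error gives a rough view of the order sensitivity the abstract mentions.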
Type
Conference Contribution
Citation
Li, M., Holmes, G. & Pfahringer, B. (2005). Clustering large datasets using cobweb and K-means in tandem. In G.I. Webb & Xinghuo Yu (Eds.), Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, December 4-6, 2004 (pp. 368-379). Berlin: Springer.
Date
2005
Publisher
Springer, Berlin