Clustering large datasets using cobweb and K-means in tandem
Abstract
This paper presents a single-scan algorithm for clustering large datasets, based on a two-phase process that combines two well-known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with subclusters at the leaves, and K-means is then applied to the resulting subclusters. The resulting method, Scalable Cobweb, is compared with a single-pass K-means algorithm and with standard K-means. The evaluation considers clustering error, measured by the sum of squared error, and sensitivity to the order in which data points are processed.
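The two-phase idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' Scalable Cobweb: the abstract gives no details of the modified Cobweb tree, so phase 1 here is replaced by a simple distance-threshold summarizer (an assumption made purely for illustration), and all function names and the `radius` parameter are hypothetical. Phase 2 runs K-means on the subcluster centroids, weighted by subcluster size, and the result is scored with the sum of squared error.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def summarize_single_scan(points, radius):
    """Phase 1 (stand-in, NOT Cobweb): one pass over the data.
    A point joins the nearest subcluster whose centroid lies within
    `radius`; otherwise it starts a new subcluster.
    Returns a list of (count, centroid) summaries."""
    subclusters = []  # each entry: [count, coordinate_sums]
    for p in points:
        best, best_d = None, radius ** 2
        for sc in subclusters:
            centroid = [s / sc[0] for s in sc[1]]
            d = sq_dist(p, centroid)
            if d <= best_d:
                best, best_d = sc, d
        if best is None:
            subclusters.append([1, list(p)])
        else:
            best[0] += 1
            best[1] = [s + x for s, x in zip(best[1], p)]
    return [(n, [s / n for s in total]) for n, total in subclusters]

def weighted_kmeans(summaries, k, iters=50, seed=0):
    """Phase 2: K-means over subcluster centroids, weighted by count.
    Assumes k <= len(summaries)."""
    rng = random.Random(seed)
    centers = [list(c) for _, c in rng.sample(summaries, k)]
    dim = len(centers[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        weights = [0] * k
        for n, c in summaries:
            j = min(range(k), key=lambda i: sq_dist(c, centers[i]))
            weights[j] += n
            sums[j] = [s + n * x for s, x in zip(sums[j], c)]
        new_centers = [
            [s / weights[j] for s in sums[j]] if weights[j] else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers

def sse(points, centers):
    """Sum of squared error: each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    summaries = summarize_single_scan(data, radius=1.0)
    centers = weighted_kmeans(summaries, k=2)
    print(len(summaries), sse(data, centers))
```

Because phase 1 sees each point only once, the subclusters it forms depend on the order of the input, which is the kind of order sensitivity the evaluation examines. Shuffling `data` and comparing the resulting `sse` values is one simple way to observe it in this sketch.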
Citation
Li, M., Holmes, G. & Pfahringer, B. (2005). Clustering large datasets using cobweb and K-means in tandem. In G.I. Webb & Xinghuo Yu (Eds.), Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, December 4-6, 2004 (pp. 368-379). Berlin: Springer.
Publisher
Springer, Berlin