Li, M., Holmes, G. & Pfahringer, B. (2005). Clustering large datasets using Cobweb and K-means in tandem. In G.I. Webb & Xinghuo Yu (Eds.), Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, Cairns, Australia, December 4-6, 2004 (pp. 368-379). Berlin: Springer.
Permanent Research Commons link: http://hdl.handle.net/10289/1461
This paper presents a single-scan algorithm for clustering large datasets, based on a two-phase process that combines two well-known clustering methods. The Cobweb algorithm is modified to produce a balanced tree with subclusters at the leaves, and K-means is then applied to the resulting subclusters. The resulting method, Scalable Cobweb, is compared with a single-pass K-means algorithm and with standard K-means. The evaluation considers both the clustering error, measured as the sum of squared errors, and sensitivity to the order in which data points are processed.
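The second phase (K-means over subclusters rather than raw points) can be illustrated with a minimal sketch. It is not the authors' implementation and does not show the modified Cobweb phase. It assumes each leaf subcluster is summarized by its point count, per-dimension sum, and sum of squared norms (a BIRCH-style summary that the abstract does not specify), and that K-means is initialized by sampling k leaf centroids. The names `Subcluster`, `weighted_kmeans`, and `sse` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

Point = Sequence[float]


@dataclass
class Subcluster:
    """Hypothetical leaf summary (assumption): count, per-dimension sum, sum of squared norms."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_points(cls, points: Sequence[Point]) -> "Subcluster":
        dim = len(points[0])
        ls = [sum(p[d] for p in points) for d in range(dim)]
        ss = sum(x * x for p in points for x in p)
        return cls(len(points), ls, ss)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(subs: List[Subcluster], k: int, iters: int = 100, seed: int = 0):
    """K-means over subcluster centroids, each weighted by its point count."""
    rng = random.Random(seed)
    cents = [s.centroid() for s in subs]
    # Assumed initialization: k randomly chosen leaf centroids.
    centers = [list(c) for c in rng.sample(cents, k)]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(c, centers[j])) for c in cents]
        if new_assign == assign:
            break  # assignments are stable, so centers already match them
        assign = new_assign
        for j in range(k):
            members = [s for s, a in zip(subs, assign) if a == j]
            total = sum(s.n for s in members)
            if total > 0:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(s.ls[d] for s in members) / total
                              for d in range(len(centers[j]))]
    return centers, assign


def sse(subs: List[Subcluster], centers, assign) -> float:
    """Exact sum of squared errors over all original points, from summaries only."""
    total = 0.0
    for s, j in zip(subs, assign):
        c = centers[j]
        dot = sum(a * b for a, b in zip(c, s.ls))
        norm = sum(a * a for a in c)
        # sum ||x - c||^2 = ss - 2 c.ls + n ||c||^2
        total += s.ss - 2 * dot + s.n * norm
    return total


# Four hand-made groups stand in for the leaves a Cobweb phase would produce.
groups = [[(0, 0), (0, 1)], [(1, 0), (1, 1)],
          [(10, 10), (10, 11)], [(11, 10), (11, 11)]]
subs = [Subcluster.from_points(g) for g in groups]
centers, assign = weighted_kmeans(subs, k=2)
err = sse(subs, centers, assign)
print(err)  # 4.0
```

Keeping the sum of squared norms in each summary is what lets the sum of squared errors over every original point be computed without a second pass over the data, which fits the single-scan goal; whether Scalable Cobweb stores exactly this information is an assumption here.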