Hierarchical document clustering using automatically extracted keyphrases
Citation
Jones, S. & Mahoui, M. (2000). Hierarchical document clustering using automatically extracted keyphrases. (Working paper 00/13). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
Permanent Research Commons link: https://hdl.handle.net/10289/1029
Abstract
In this paper we present a technique for automatically generating hierarchical clusters of documents. Our technique exploits document keyphrases as features of the document space to support clustering. In fact, we cluster keyphrases rather than the documents themselves and then associate documents with keyphrase clusters. We discuss alternative measures of similarity between ‘soft-clusters’ which seed Ward’s hierarchical clustering algorithm, and present the resulting cluster hierarchies that we have produced for a large collection of scientific technical reports. We analyse the effect of the alternative similarity measures and suggest improvements to our technique.
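The general idea described in the abstract (cluster keyphrases, then attach documents to the keyphrase clusters) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy documents and keyphrases, the function names, the representation of each keyphrase as a binary vector over documents, and the use of plain squared Euclidean distance in Ward's merge criterion in place of the soft-cluster similarity measures the paper compares. The sketch also stops at a fixed number of clusters instead of recording the full hierarchy.

```python
from itertools import combinations

# Hypothetical toy data: document id -> automatically extracted keyphrases.
DOCS = {
    "d1": {"neural networks", "machine learning"},
    "d2": {"machine learning", "decision trees"},
    "d3": {"neural networks", "machine learning"},
    "d4": {"digital libraries", "information retrieval"},
    "d5": {"information retrieval", "digital libraries"},
}


def keyphrase_vectors(docs):
    """Represent each keyphrase as a 0/1 vector over the sorted document ids."""
    doc_ids = sorted(docs)
    phrases = sorted(set().union(*docs.values()))
    return {p: [1.0 if p in docs[d] else 0.0 for d in doc_ids] for p in phrases}


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    dist2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
    return a["size"] * b["size"] / (a["size"] + b["size"]) * dist2


def ward_cluster(vectors, n_clusters):
    """Naive agglomerative clustering with Ward's criterion (assumed metric)."""
    clusters = [
        {"members": {p}, "centroid": list(v), "size": 1}
        for p, v in sorted(vectors.items())
    ]
    while len(clusters) > n_clusters:
        # Pick the pair whose merge increases the total variance the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        merged = {
            "members": a["members"] | b["members"],
            "centroid": [
                (a["size"] * x + b["size"] * y) / size
                for x, y in zip(a["centroid"], b["centroid"])
            ],
            "size": size,
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [c["members"] for c in clusters]


def assign_documents(docs, phrase_clusters):
    """Attach a document to every keyphrase cluster it shares a phrase with."""
    return [
        sorted(d for d, phrases in docs.items() if phrases & cluster)
        for cluster in phrase_clusters
    ]


if __name__ == "__main__":
    phrase_clusters = ward_cluster(keyphrase_vectors(DOCS), n_clusters=2)
    for cluster, members in zip(phrase_clusters, assign_documents(DOCS, phrase_clusters)):
        print(sorted(cluster), "->", members)
```

Because a document is attached to every cluster with which it shares a keyphrase, a document may belong to more than one cluster in this sketch; whether and how the paper's method allows such overlap is not stated in the abstract.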
Date
2000-10
Type
Report No.
00/13
Publisher
University of Waikato, Department of Computer Science
Collections
- 2000 Working Papers [12]