Clustering documents using a Wikipedia-based concept representation

Abstract
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. We also develop a similarity measure that evaluates the semantic relatedness between the concept sets of two documents. We test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that, although further optimizations could be performed, our approach already improves upon related techniques.
Citation
Huang, A., Witten, I. H., Frank, E., & Milne, D. (2009). Clustering documents using a Wikipedia-based concept representation. In Proceedings of the 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009 (pp. 628-636).
Date
2009
Publisher
Springer
Rights
This is the author’s accepted version of an article published in Proceedings of the 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009. © 2009 Springer-Verlag Berlin Heidelberg.