dc.contributor.author: Huang, Anna
dc.contributor.author: Witten, Ian H.
dc.contributor.author: Frank, Eibe
dc.contributor.author: Milne, David N.
dc.coverage.spatial: Conference held at Pisa, Italy [en_NZ]
dc.date.accessioned: 2010-02-09T01:56:36Z
dc.date.available: 2010-02-09T01:56:36Z
dc.date.issued: 2009
dc.identifier.citation: Huang, A., Witten, I. H., Frank, E. & Milne, D. (2009). Clustering documents with active learning using Wikipedia. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, December 15-19, 2008. (pp. 839-844). Washington, DC, USA: IEEE Computer Society. [en]
dc.identifier.uri: https://hdl.handle.net/10289/3557
dc.description.abstract: Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated with a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pairwise instance-level constraints for supervised clustering, guiding clustering in the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields performance comparable to previous attempts, and that adding constraints improves clustering performance further by up to 20%. [en]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: IEEE Computer Society [en_NZ]
dc.rights: ©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. [en]
dc.source: 8th IEEE International Conference on Data Mining [en_NZ]
dc.subject: computer science [en]
dc.subject: Wikipedia [en]
dc.subject: Machine learning
dc.title: Clustering documents with active learning using Wikipedia [en]
dc.type: Conference Contribution [en]
dc.identifier.doi: 10.1109/ICDM.2008.80 [en]
dc.relation.isPartOf: Proc Eighth IEEE International Conference on Data Mining (ICDM) [en_NZ]
pubs.begin-page: 839 [en_NZ]
pubs.elements-id: 18445
pubs.end-page: 844 [en_NZ]
pubs.finish-date: 2008-12-19 [en_NZ]
pubs.start-date: 2008-12-15 [en_NZ]

