dc.contributor.author: Huang, Lan
dc.contributor.author: Milne, David N.
dc.contributor.author: Frank, Eibe
dc.contributor.author: Witten, Ian H.
dc.date.accessioned: 2012-07-25T04:30:15Z
dc.date.available: 2012-07-25T04:30:15Z
dc.date.issued: 2012
dc.identifier.citation: Huang, L., Milne, D.N., Frank, E. & Witten, I.H. (2012). Learning a concept-based document similarity measure. Journal of the American Society for Information Science and Technology, 63(8), 1593-1608. [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10289/6561
dc.description.abstract: Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning techniques. Experiments show that the new measure produces values for documents that are more consistent with people's judgments than people are with each other. We also use it to classify and cluster large document sets covering different genres and topics, and find that it improves both classification and clustering performance. [en_NZ]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Wiley [en_NZ]
dc.relation.uri: http://onlinelibrary.wiley.com/doi/10.1002/asi.22689/abstract [en_NZ]
dc.rights: This is a preprint of an article accepted for publication in Journal of the American Society for Information Science and Technology. © 2012 (American Society for Information Science and Technology) [en_NZ]
dc.subject: content analysis [en_NZ]
dc.subject: text mining [en_NZ]
dc.subject: semantic analysis [en_NZ]
dc.title: Learning a concept-based document similarity measure [en_NZ]
dc.type: Journal Article [en_NZ]
dc.identifier.doi: 10.1002/asi.22689 [en_NZ]
dc.relation.isPartOf: Journal of the American Society for Information Science and Technology [en_NZ]
pubs.begin-page: 1593 [en_NZ]
pubs.edition: August [en_NZ]
pubs.elements-id: 37717
pubs.end-page: 1608 [en_NZ]
pubs.issue: 8 [en_NZ]
pubs.volume: 63 [en_NZ]
uow.identifier.article-no: 8 [en_NZ]

