Research Commons

Semantic document representation: Do It with Wikification

dc.contributor.author Witten, Ian H.
dc.date.accessioned 2012-12-06T22:57:17Z
dc.date.available 2012-12-06T22:57:17Z
dc.date.issued 2012
dc.identifier.citation Witten, I. (2012). Semantic document representation: Do It with Wikification. In Lecture Notes in Computer Science, 2012, Volume 7608, String Processing and Information Retrieval: 19th International Symposium, SPIRE 2012, p. 17. en_NZ
dc.identifier.isbn 9783642341083
dc.identifier.issn 0302-9743
dc.identifier.uri http://hdl.handle.net/10289/6933
dc.description.abstract Wikipedia is a goldmine of information. Each article describes a single concept, and together they constitute a vast investment of manual effort and judgment. Wikification is the process of automatically augmenting a plain-text document with hyperlinks to Wikipedia articles. This involves associating phrases in the document with concepts, disambiguating them, and selecting the most pertinent. All three processes can be addressed by exploiting Wikipedia as a source of data. For the first, link anchor text illustrates how concepts are described in running text. For the second and third, Wikipedia provides millions of examples that can be used to prime machine-learned algorithms for disambiguation and selection respectively. Wikification produces a semantic representation of any document in terms of concepts. We apply this to (a) select index terms for scientific documents, and (b) determine the similarity of two documents, in both cases outperforming humans in terms of agreement with human judgment. I will show how it can be applied to document clustering and classification algorithms, and to produce back-of-the-book indexes, improving on the state of the art in each case. en_NZ
dc.language.iso en
dc.publisher Springer-Verlag en_NZ
dc.relation.uri http://link.springer.com/book/10.1007/978-3-642-34109-0/page/1 en_NZ
dc.subject document representation en_NZ
dc.subject Wikipedia en_NZ
dc.subject large scale data mining en_NZ
dc.subject semantic representation en_NZ
dc.title Semantic document representation: Do It with Wikification en_NZ
dc.type Conference Contribution en_NZ
dc.identifier.doi 10.1007/978-3-642-34109-0/page/1 en_NZ
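
The three steps named in the abstract (associating phrases with concepts, disambiguating them, and selecting the most pertinent) can be illustrated with a toy sketch. This is not the author's method: the abstract describes machine-learned algorithms trained on Wikipedia, whereas the sketch below swaps in simple hand-written scores. All data (ANCHORS, LINK_PROBABILITY, RELATED), all function names, and the 0.3 threshold are hypothetical stand-ins invented for illustration.

```python
# Toy illustration only. The tables below are made-up stand-ins for statistics
# that would be mined from Wikipedia link anchors; none come from the paper.

# anchor phrase -> {candidate concept: number of links using that anchor}
ANCHORS = {
    "java": {"Java (programming language)": 80, "Java (island)": 20},
    "python": {"Python (programming language)": 70, "Pythonidae": 30},
    "island": {"Island": 100},
}

# anchor phrase -> fraction of its occurrences that are links (hypothetical)
LINK_PROBABILITY = {"java": 0.6, "python": 0.5, "island": 0.1}

# concept -> related concepts (hypothetical, hand-written)
RELATED = {
    "Java (programming language)": {"Python (programming language)"},
    "Python (programming language)": {"Java (programming language)"},
    "Java (island)": {"Island"},
    "Island": {"Java (island)"},
    "Pythonidae": set(),
}


def find_phrases(text):
    """Step 1: associate words in the text with known anchor phrases."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [t for t in tokens if t in ANCHORS]


def disambiguate(phrase, context_concepts):
    """Step 2: pick the candidate concept with the best heuristic score."""
    candidates = ANCHORS[phrase]
    total = sum(candidates.values())

    def score(concept):
        commonness = candidates[concept] / total
        overlap = len(RELATED.get(concept, set()) & context_concepts)
        return commonness + overlap

    return max(candidates, key=score)


def wikify(text, min_link_probability=0.3):
    """Return {phrase: concept} for phrases judged worth linking."""
    phrases = find_phrases(text)
    links = {}
    for phrase in phrases:
        # Context: every candidate concept of the other phrases in the text.
        context = set()
        for other in phrases:
            if other != phrase:
                context.update(ANCHORS[other])
        concept = disambiguate(phrase, context)
        # Step 3: keep only phrases that are frequently used as links.
        if LINK_PROBABILITY[phrase] >= min_link_probability:
            links[phrase] = concept
    return links


print(wikify("Java is an island."))
print(wikify("I write Java and Python."))
```

In this sketch the word "island" still serves as disambiguation context for "Java" even though its own link probability is too low for it to be selected as a link. The resulting set of concepts is the kind of semantic representation the abstract refers to, which could then feed index-term selection or document similarity.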

