Semantic document representation: Do It with Wikification

dc.contributor.author: Witten, Ian H.
dc.coverage.spatial: Conference held at Cartagena de Indias, Colombia [en_NZ]
dc.date.accessioned: 2012-12-06T22:57:17Z
dc.date.available: 2012-12-06T22:57:17Z
dc.date.issued: 2012
dc.description.abstract: Wikipedia is a goldmine of information. Each article describes a single concept, and together they constitute a vast investment of manual effort and judgment. Wikification is the process of automatically augmenting a plain-text document with hyperlinks to Wikipedia articles. This involves associating phrases in the document with concepts, disambiguating them, and selecting the most pertinent. All three processes can be addressed by exploiting Wikipedia as a source of data. For the first, link anchor text illustrates how concepts are described in running text. For the second and third, Wikipedia provides millions of examples that can be used to prime machine-learned algorithms for disambiguation and selection respectively. Wikification produces a semantic representation of any document in terms of concepts. We apply this to (a) select index terms for scientific documents, and (b) determine the similarity of two documents, in both cases outperforming humans in terms of agreement with human judgment. I will show how it can be applied to document clustering and classification algorithms, and to produce back-of-the-book indexes, improving on the state of the art in each case. [en_NZ]
dc.identifier.citation: Witten, I. (2012). Semantic document representation: Do It with Wikification. In Lecture Notes in Computer Science, 2012, Volume 7608 String Processing and Information Retrieval: 19th International Symposium, SPIRE 2012, pp. 17-17. [en_NZ]
dc.identifier.doi: 10.1007/978-3-642-34109-0_3 [en_NZ]
dc.identifier.isbn: 9783642341083
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://hdl.handle.net/10289/6933
dc.language.iso: en
dc.publisher: Springer-Verlag [en_NZ]
dc.relation.isPartOf: Proc 19th International Symposium on String Processing and Information Retrieval [en_NZ]
dc.relation.uri: http://link.springer.com/book/10.1007/978-3-642-34109-0/page/1 [en_NZ]
dc.source: SPIRE 2012 [en_NZ]
dc.subject: document representation [en_NZ]
dc.subject: Wikipedia [en_NZ]
dc.subject: large scale data mining [en_NZ]
dc.subject: semantic representation [en_NZ]
dc.title: Semantic document representation: Do It with Wikification [en_NZ]
dc.type: Conference Contribution [en_NZ]
pubs.begin-page: 17 [en_NZ]
pubs.elements-id: 22397
pubs.end-page: 17 [en_NZ]
pubs.finish-date: 2012-10-25 [en_NZ]
pubs.place-of-publication: Germany [en_NZ]
pubs.start-date: 2012-10-21 [en_NZ]
pubs.volume: LNCS 7608 [en_NZ]
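
Illustrative note: the abstract names three wikification steps (associating phrases with concepts, disambiguating them, and selecting the most pertinent) and a document-similarity application. The toy sketch below shows one way these steps could fit together. It is not the author's method: the abstract describes machine-learned disambiguation and selection, whereas this sketch swaps in simple frequency heuristics. The anchor statistics, link probabilities, threshold, and all function names are hypothetical, and the Jaccard overlap used for similarity is an assumption.

```python
from collections import Counter

# Hypothetical anchor statistics: anchor phrase -> how often it links to each article.
ANCHOR_TARGETS = {
    "java": Counter({"Java (programming language)": 80, "Java (island)": 20}),
    "tree": Counter({"Tree (data structure)": 30, "Tree": 70}),
    "index": Counter({"Index (publishing)": 60, "Database index": 40}),
}

# Hypothetical link probability: share of a phrase's occurrences that are links.
LINK_PROBABILITY = {"java": 0.4, "tree": 0.05, "index": 0.2}


def detect(text):
    """Step 1: find single-word phrases that match known anchor text."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [t for t in dict.fromkeys(tokens) if t in ANCHOR_TARGETS]


def disambiguate(phrase):
    """Step 2: pick the most frequent target article for the phrase.

    This uses a frequency prior only; a learned model would also use context.
    """
    targets = ANCHOR_TARGETS[phrase]
    article, count = targets.most_common(1)[0]
    return article, count / sum(targets.values())


def wikify(text, min_link_probability=0.1):
    """Step 3: keep only phrases whose link probability passes the threshold."""
    links = {}
    for phrase in detect(text):
        if LINK_PROBABILITY[phrase] >= min_link_probability:
            links[phrase] = disambiguate(phrase)[0]
    return links


def concept_similarity(text_a, text_b):
    """Compare two documents by Jaccard overlap of their linked concepts."""
    a = set(wikify(text_a).values())
    b = set(wikify(text_b).values())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(wikify("The Java index uses a tree."))
    print(concept_similarity("Java index", "java tree"))
```

In this sketch "tree" is detected but dropped at the selection step because its hypothetical link probability falls below the threshold, which is the role the abstract assigns to selecting the most pertinent concepts.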