dc.contributor.author: Medelyan, Olena [en_NZ]
dc.date.accessioned: 2009-12-18T00:41:54Z
dc.date.available: 2009-12-18T00:41:54Z
dc.date.issued: 2009 [en_NZ]
dc.identifier.citation: Medelyan, O. (2009). Human-competitive automatic topic indexing (Thesis). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/3513 [en]
dc.identifier.uri: https://hdl.handle.net/10289/3513
dc.description.abstract: Topic indexing is the task of identifying the main topics covered by a document. These are useful for many purposes: as subject headings in libraries, as keywords in academic publications and as tags on the web. Knowing a document's topics helps people judge its relevance quickly. However, assigning topics manually is labor intensive. This thesis shows how to generate them automatically in a way that competes with human performance. Three kinds of indexing are investigated: term assignment, a task commonly performed by librarians, who select topics from a controlled vocabulary; tagging, a popular activity of web users, who choose topics freely; and a new method of keyphrase extraction, where topics are equated to Wikipedia article names. A general two-stage algorithm is introduced that first selects candidate topics and then ranks them by significance based on their properties. These properties draw on statistical, semantic, domain-specific and encyclopedic knowledge. They are combined using a machine learning algorithm that models human indexing behavior from examples. This approach is evaluated by comparing automatically generated topics to those assigned by professional indexers, and by amateurs. We claim that the algorithm is human-competitive because it chooses topics that are as consistent with those assigned by humans as their topics are with each other. The approach is generalizable, requires little training data and applies across different domains and languages. [en_NZ]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: The University of Waikato [en_NZ]
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: topic indexing [en_NZ]
dc.subject: keyphrase extraction [en_NZ]
dc.subject: tagging [en_NZ]
dc.subject: wikipedia [en_NZ]
dc.subject: machine learning [en_NZ]
dc.title: Human-competitive automatic topic indexing [en_NZ]
dc.type: Thesis [en_NZ]
thesis.degree.discipline: Computing and Mathematical Sciences [en_NZ]
thesis.degree.grantor: University of Waikato [en_NZ]
thesis.degree.level: Doctoral
uow.date.accession: 2009-10-29T16:09:23Z [en_NZ]
uow.identifier.adt: http://adt.waikato.ac.nz/public/adt-uow20091029.160923
pubs.place-of-publication: Hamilton, New Zealand [en_NZ]

