An Evaluation of Document Keyphrase Sets

Abstract
Keywords and keyphrases have many useful roles as document surrogates and descriptors, but the manual production of keyphrase metadata for large digital library collections is at best expensive and time-consuming, and at worst logistically impossible. Algorithms for keyphrase extraction like Kea and Extractor produce a set of phrases that are associated with a document. Though these sets are often utilized as a group, keyphrase extraction is usually evaluated by measuring the quality of individual keyphrases. This paper reports an assessment that asks human assessors to rate entire sets of keyphrases produced by Kea, Extractor and document authors. The results provide further evidence that human assessors rate all three sources highly (with some caveats), but show that the relationship between the quality of the phrases in a set and the set as a whole is not always simple. Choosing the best individual phrases will not necessarily produce the best set; combinations of lesser phrases may result in better overall quality.
Type
Conference Contribution
Citation
Jones, S. & Paynter, G.W. (2003). An evaluation of document keyphrase sets. Journal of Digital Information, 4(1).
Date
2003
Publisher
British Computer Society
Rights
This is an article published in the Journal of Digital Information. The original publication is available at http://journals.tdl.org/jodi/index