Medelyan, O. & Witten, I.H. (2006). Measuring inter-indexer consistency using a thesaurus. In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, Chapel Hill, NC, USA, June 11-15, 2006 (pp. 296-297). New York: ACM.
Permanent Research Commons link: http://hdl.handle.net/10289/1346
When professional indexers independently assign terms to a given document, the term sets generally differ between indexers. Studies of inter-indexer consistency measure the percentage of matching index terms, but none considers the semantic relationships among these terms. We propose representing the data from multiple indexers in a vector space and using the cosine metric as a new consistency measure, one that can be extended with semantic relations between index terms. We believe that this new measure is more accurate and realistic than existing ones and is therefore more suitable for evaluating automatically extracted index terms.
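
As an illustration of the idea only, the following minimal Python sketch computes a cosine-based consistency score between two indexers' term sets. The abstract does not give the exact formulation, so everything here is an assumption: the function names (`cosine`, `consistency`), the binary term weighting, the `related` mapping standing in for thesaurus links, and the partial weight of 0.5 given to a term that is only semantically related to an assigned term. The example terms are likewise made up.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty term set has no direction; treat as no agreement
    return dot / (norm_u * norm_v)


def consistency(terms_a, terms_b, related=None, weight=0.5):
    """Cosine consistency between two indexers' term sets.

    `related` is a hypothetical mapping from a term to the set of terms
    linked to it in a thesaurus. `weight` is an assumed partial credit
    for a term that was not assigned but is related to an assigned one.
    """
    related = related or {}
    terms_a, terms_b = set(terms_a), set(terms_b)
    vocabulary = sorted(terms_a | terms_b)

    def vector(term_set):
        components = []
        for term in vocabulary:
            if term in term_set:
                components.append(1.0)      # term assigned by this indexer
            elif related.get(term, set()) & term_set:
                components.append(weight)   # only a related term was assigned
            else:
                components.append(0.0)
        return components

    return cosine(vector(terms_a), vector(terms_b))


# Hypothetical term sets from two indexers for the same document.
indexer_a = {"forestry", "timber"}
indexer_b = {"forestry", "wood"}
# Hypothetical thesaurus link between the two non-matching terms.
links = {"timber": {"wood"}, "wood": {"timber"}}

print(consistency(indexer_a, indexer_b))          # 0.5
print(consistency(indexer_a, indexer_b, links))   # approximately 0.889
```

Without the links, only the one shared term contributes and the score is 0.5. With the assumed link between the two differing terms, the score rises, which is the kind of behaviour a semantically extended measure is meant to capture.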