Medelyan, O., & Witten, I.H. (2005). Thesaurus-based index term extraction for agricultural documents. In Proceedings of the 2005 EFITA/WCCA Joint Congress on IT in Agriculture, 25-28 July 2005, Vila Real, Portugal (pp. 1122-1129).
Permanent Research Commons link: http://hdl.handle.net/10289/8101
This paper describes a new algorithm for automatically extracting index terms from documents in the agricultural domain. The domain-specific Agrovoc thesaurus, developed by the Food and Agriculture Organization of the United Nations (FAO), serves both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed sample of 200 items from the FAO's document repository, and the new algorithm's performance is compared with that of a state-of-the-art keyphrase extraction system.
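To make the two ideas in the abstract concrete (using a thesaurus as a controlled vocabulary, then scoring the assigned terms against manual indexing), here is a minimal sketch under stated assumptions. It is not the authors' algorithm. The vocabulary `TOY_THESAURUS` is a hypothetical stand-in, not real Agrovoc content; stopword removal, word-order-insensitive matching, and ranking by frequency are illustrative choices; and the sketch ignores the thesaurus's semantic relations, which the abstract says the real algorithm exploits. The abstract also does not name its evaluation measure, so precision, recall, and F1 against the manually assigned terms are assumed here. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary, NOT real Agrovoc content: each entry term
# (preferred or non-preferred) maps to the descriptor that should be assigned.
TOY_THESAURUS = {
    "maize": "maize",
    "corn": "maize",  # non-preferred entry term -> preferred descriptor
    "soil fertility": "soil fertility",
    "irrigation": "irrigation",
    "water supply": "water supply",
}

# Assumed, illustrative stopword list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(phrase):
    """Drop stopwords and sort the remaining words so word order is ignored."""
    words = [w for w in tokenize(phrase) if w not in STOPWORDS]
    return " ".join(sorted(words))


def candidate_ngrams(text, max_len=3):
    """Yield word n-grams that neither start nor end with a stopword."""
    tokens = tokenize(text)
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            if tokens[i] in STOPWORDS or tokens[i + n - 1] in STOPWORDS:
                continue
            yield " ".join(tokens[i:i + n])


def assign_terms(text, thesaurus, k=5):
    """Map candidate n-grams to descriptors and return the k most frequent."""
    index = {normalize(entry): desc for entry, desc in thesaurus.items()}
    counts = Counter()
    for gram in candidate_ngrams(text):
        descriptor = index.get(normalize(gram))
        if descriptor is not None:
            counts[descriptor] += 1
    return [term for term, _ in counts.most_common(k)]


def precision_recall_f1(assigned, manual):
    """Compare automatically assigned terms with manually assigned ones."""
    assigned, manual = set(assigned), set(manual)
    hits = len(assigned & manual)
    p = hits / len(assigned) if assigned else 0.0
    r = hits / len(manual) if manual else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    doc = ("Corn yields depend on irrigation. Irrigation of maize "
           "improves soil fertility management.")
    assigned = assign_terms(doc, TOY_THESAURUS)
    print(sorted(assigned))  # ['irrigation', 'maize', 'soil fertility']
    p, r, f = precision_recall_f1(assigned, ["maize", "irrigation", "water supply"])
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

Because only vocabulary entries can be assigned, "corn" is reported as the descriptor "maize". This is the practical benefit of indexing with a controlled vocabulary rather than with free keyphrases: different surface forms collapse to one index term that can be compared directly with a human indexer's choices.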
© 2005 the authors.