Medelyan, O., Manion, S., Broekstra, J., Divoli, A., Huang, A.-L., & Witten, I. H. (2013). Constructing a focused taxonomy from a document collection. In P. Cimiano et al. (Eds.): ESWC 2013, LNCS 7882 (pp. 367-381). Berlin, Germany: Springer-Verlag Berlin Heidelberg.
Permanent Research Commons link: http://hdl.handle.net/10289/7975
We describe a new method for constructing custom taxonomies from document collections. It involves identifying relevant concepts and entities in text; linking them to knowledge sources like Wikipedia, DBpedia, Freebase, and any supplied taxonomies from related domains; disambiguating conflicting concept mappings; and selecting semantic relations that best group them hierarchically. An RDF model supports interoperability of these steps, and also provides a flexible way of including existing NLP tools and further knowledge sources. From 2,000 news articles we construct a custom taxonomy with 10,000 concepts and 12,700 relations, similar in structure to manually created counterparts. Evaluation by 15 human judges shows the precision to be 89% and 90% for concepts and relations, respectively; recall was 75% with respect to a manually generated taxonomy for the same domain.
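The steps named in the abstract (linking, disambiguation, selecting hierarchical relations) can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate mappings, scores, concept identifiers, and the `skos:broader` predicate are all hypothetical toy data, and disambiguation is reduced to picking the highest-scoring candidate, a stand-in for whatever method the paper actually uses. Triples are plain tuples rather than a real RDF store.

```python
from collections import defaultdict

# Hypothetical candidate mappings: surface term -> [(concept id, score)].
# In practice these would come from linking text to external knowledge sources.
CANDIDATES = {
    "Apple": [("dbpedia:Apple_Inc.", 0.8), ("dbpedia:Apple", 0.2)],
    "iPhone": [("dbpedia:IPhone", 0.9)],
}

# Hypothetical "broader" relations supplied by a knowledge source.
BROADER = {
    "dbpedia:Apple_Inc.": "dbpedia:Technology_company",
    "dbpedia:IPhone": "dbpedia:Smartphone",
    "dbpedia:Apple": "dbpedia:Fruit",
}


def disambiguate(candidates):
    """Keep the highest-scoring concept per term (a simplifying assumption)."""
    return {
        term: max(options, key=lambda option: option[1])[0]
        for term, options in candidates.items()
    }


def build_triples(chosen, broader):
    """Emit (child, predicate, parent) triples for the chosen concepts only."""
    triples = set()
    for concept in chosen.values():
        parent = broader.get(concept)
        if parent is not None:
            triples.add((concept, "skos:broader", parent))
    return triples


def children_by_parent(triples):
    """Group child concepts under their parent to expose the hierarchy."""
    tree = defaultdict(list)
    for child, _predicate, parent in triples:
        tree[parent].append(child)
    return {parent: sorted(children) for parent, children in tree.items()}


chosen = disambiguate(CANDIDATES)
triples = build_triples(chosen, BROADER)
taxonomy = children_by_parent(triples)
```

Because the rejected sense `dbpedia:Apple` is dropped before relations are selected, its parent `dbpedia:Fruit` never enters the resulting hierarchy, which is the sense in which the taxonomy stays focused on the collection.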
This is an author’s accepted version. The original publication is available at www.springerlink.com.