Research Commons

      Extracting corpus specific knowledge bases from Wikipedia

      Milne, David N.; Witten, Ian H.; Nichols, David M.
      Files
      content.pdf
      93.08Kb
      Citation
      Milne, D., Witten, I.H. & Nichols, D.M. (2007). Extracting corpus specific knowledge bases from Wikipedia. (Working paper series. University of Waikato, Department of Computer Science. No. 03/2007). Hamilton, New Zealand: University of Waikato.
      Permanent Research Commons link: https://hdl.handle.net/10289/69
      Abstract
      Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted for decades without much success, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers, namely those who are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide WikiSauri: manually defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval.
      Date
      2007-06-01
      Type
      Working Paper
      Series
      Computer Science Working Papers
      Report No.
      03/2007
      Publisher
      University of Waikato, Department of Computer Science
      Collections
      • 2007 Working Papers [8]