Research Commons
      • Browse 
        • Communities & Collections
        • Titles
        • Authors
        • By Issue Date
        • Subjects
        • Types
        • Series
      • Help 
        • About
        • Collection Policy
        • OA Mandate Guidelines
        • Guidelines FAQ
        • Contact Us
      • My Account 
        • Sign In
        • Register
      View Item 
      •   Research Commons
      • University of Waikato Research
      • Computing and Mathematical Sciences
      • Computing and Mathematical Sciences Papers
      • View Item
      •   Research Commons
      • University of Waikato Research
      • Computing and Mathematical Sciences
      • Computing and Mathematical Sciences Papers
      • View Item
      JavaScript is disabled for your browser. Some features of this site may not work without it.

      An open-source toolkit for mining Wikipedia

      Milne, David N.; Witten, Ian H.
      DOI
       10.1016/j.artint.2012.06.007
      Find in your library  
      Citation
      Export citation
      Milne, D.N., & Witten, I. H. (2013). An open-source toolkit for mining Wikipedia. Artificial Intelligence, 194, 222-239.
      Permanent Research Commons link: https://hdl.handle.net/10289/6733
      Abstract
      The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles. For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing and many other research areas. This paper introduces the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia's rich semantics into their own applications. The toolkit creates databases that contain summarized versions of Wikipedia's content and structure, and includes a Java API to provide access to them. Wikipedia's articles, categories and redirects are represented as classes, and can be efficiently searched, browsed, and iterated over. Advanced features include parallelized processing of Wikipedia dumps, machine-learned semantic relatedness measures and annotation features, and XML-based web services. Wikipedia Miner is intended to be a platform for sharing data mining techniques. Annotation; Disambiguation; Ontology extraction; Semantic relatedness; Toolkit; Wikipedia
      Date
      2013
      Type
      Journal Article
      Publisher
      Elsevier
      Collections
      • Computing and Mathematical Sciences Papers [1455]
      Show full item record  

      Usage

       
       
       

      Usage Statistics

      For this itemFor all of Research Commons

      The University of Waikato - Te Whare Wānanga o WaikatoFeedback and RequestsCopyright and Legal Statement