Research Commons

      Improving access to large-scale digital libraries through semantic-enhanced search and disambiguation

      Hinze, Annika; Taube-Schock, Craig; Bainbridge, David; Matamua, Rangi; Downie, J. Stephen
      Files
      JCDL15-hinze-Capisco.pdf
      Accepted version, 1.866Mb
      DOI
       10.1145/2756406.2756920
      Citation
      Hinze, A., Taube-Schock, C., Bainbridge, D., Matamua, R., & Downie, J. S. (2015). Improving access to large-scale digital libraries through semantic-enhanced search and disambiguation. In Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 147–156). Knoxville, Tennessee, USA: ACM. http://doi.org/10.1145/2756406.2756920
      Permanent Research Commons link: https://hdl.handle.net/10289/9570
      Abstract
      With 13,000,000 volumes comprising 4.5 billion pages of text, the HathiTrust Digital Library (HTDL) is currently very difficult for scholars to search for sets of documents relevant to their research using traditional lexically based retrieval techniques. Existing document search tools and document clustering approaches rely purely on lexical analysis, which cannot address the inherent ambiguity of natural language. A semantic search approach could overcome the shortcomings of lexical search, but even if an appropriate network of ontologies could be agreed upon, it would require full semantic markup of every document. In this paper, we present a conceptual design and report on the initial implementation of a new framework that provides the benefits of semantic search while minimizing the problems of applying existing semantic analysis at scale. Our approach avoids the need for complete semantic markup of documents against pre-existing ontologies. Instead, it builds an automatically generated Concept-in-Context (CiC) network, seeded by a priori analysis of Wikipedia texts and the identification of semantic metadata. Our Capisco system analyzes documents according to the semantics and context of their content. Search queries are disambiguated interactively so as to make full use of the scholar's domain knowledge. Our method thus achieves a form of semantic-enhanced search while still exploiting the proven scalability of lexical indexing.
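      The abstract describes the approach only at a high level. The sketch below is a rough illustration of the general idea: concept-tagged documents stored in an ordinary inverted index, with query terms disambiguated by the user. It is explicitly not the Capisco implementation. The tiny CONCEPTS inventory, the Lesk-style context-overlap heuristic, and every function name are hypothetical stand-ins for the Wikipedia-derived CiC network the paper describes.

```python
from collections import defaultdict

# HYPOTHETICAL concept inventory: each ambiguous term maps to candidate senses,
# and each sense has a few context words. In the paper this role is played by
# the Wikipedia-derived Concept-in-Context network; this dict is invented.
CONCEPTS = {
    "jaguar": {
        "jaguar_(animal)": {"cat", "rainforest", "predator", "species"},
        "jaguar_(car)": {"engine", "vehicle", "luxury", "motor"},
    },
    "python": {
        "python_(snake)": {"snake", "reptile", "species", "constrictor"},
        "python_(language)": {"programming", "code", "interpreter", "software"},
    },
}


def tokenize(text):
    """Lower-case whitespace tokenizer that strips simple punctuation."""
    return [t.strip(".,;:!?()").lower() for t in text.split()]


def annotate(text):
    """Return the set of concept IDs detected in a document.

    For each ambiguous term, pick the sense whose context words overlap most
    with the document's tokens (a Lesk-style heuristic used as a stand-in,
    not Capisco's actual algorithm).
    """
    context = set(tokenize(text))
    found = set()
    for term in context.intersection(CONCEPTS):
        senses = CONCEPTS[term]
        best = max(senses, key=lambda s: len(senses[s] & context))
        if senses[best] & context:  # skip terms with no supporting context
            found.add(best)
    return found


def build_concept_index(docs):
    """Inverted index from concept ID to document IDs, like a lexical index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index


def candidate_senses(query_term):
    """Senses offered to the user for interactive query disambiguation."""
    return sorted(CONCEPTS.get(query_term.lower(), {}))


def search(index, chosen_sense):
    """Return the documents tagged with the sense the user selected."""
    return sorted(index.get(chosen_sense, set()))


if __name__ == "__main__":
    docs = {
        "d1": "The jaguar is a large cat and an apex predator of the rainforest.",
        "d2": "The new Jaguar has a powerful engine and a luxury interior.",
        "d3": "Python code runs in an interpreter.",
    }
    index = build_concept_index(docs)
    print(candidate_senses("jaguar"))     # ['jaguar_(animal)', 'jaguar_(car)']
    print(search(index, "jaguar_(car)"))  # ['d2']
```

      In this sketch, sense tagging happens once, at indexing time. A query then costs a single inverted-index lookup on the chosen concept ID, just as a lexical term lookup would. This loosely mirrors the abstract's point about keeping the scale benefits of lexical indexing.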
      Date
      2015
      Type
      Conference Contribution
      Publisher
      ACM
      Rights
      This is an author’s accepted version of an article published in Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. © 2015 ACM.
      Collections
      • Computing and Mathematical Sciences Papers [1455]
      The University of Waikato - Te Whare Wānanga o Waikato