Research Commons

      Improving Digital Library Support for Historic Newspaper Collections

      Lin, Leo
      Files
      thesis.pdf
      15.78 MB
      Citation
      Lin, L. (2009). Improving Digital Library Support for Historic Newspaper Collections (Thesis, Master of Science (MSc)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/3262
      Permanent Research Commons link: https://hdl.handle.net/10289/3262
      Abstract
      National and international initiatives are underway around the globe to digitise vast treasure troves of historical artefacts and make them available as digital libraries (DLs). These DLs are often constructed from facsimile pages with pre-existing metadata, such as historic newspapers stored on microfiche or images generated from the non-destructive scanning of precious manuscripts. Access to the source documents is therefore limited to methods built on the metadata. Other projects look to introduce full-text indexing by applying off-the-shelf commercial Optical Character Recognition (OCR) software. While this has greater potential to improve the end-user experience than the metadata-only versions, the approach currently taken is a best effort in the time available rather than a process informed by detailed analysis of the issues. In this thesis, we investigate whether a richer level of support and service can be achieved by integrating image processing techniques more closely with DL software.

      The thesis presents a variety of experiments implemented within the recently published open-source OCR system OCRopus. In particular, existing segmentation algorithms are compared against our own algorithm, which is based on the Hough transform, using a corpus we created from several major online archives of digitised historic newspapers.
      Date
      2009
      Type
      Thesis
      Degree Name
      Master of Science (MSc)
      Supervisors
      Bainbridge, David
      Publisher
      The University of Waikato
      Rights
      All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
      Additional information
      DVD-ROM Appendix available with the print copy of this thesis.
      Collections
      • Masters Degree Theses [2384]