Research Commons
      • Browse 
        • Communities & Collections
        • Titles
        • Authors
        • By Issue Date
        • Subjects
        • Types
        • Series
      • Help 
        • About
        • Collection Policy
        • OA Mandate Guidelines
        • Guidelines FAQ
        • Contact Us
      • My Account 
        • Sign In
        • Register
      View Item 
      •   Research Commons
      • University of Waikato Theses
      • Higher Degree Theses
      • View Item
      •   Research Commons
      • University of Waikato Theses
      • Higher Degree Theses
      • View Item
      JavaScript is disabled for your browser. Some features of this site may not work without it.

      A Predictive Model for the Parallel Processing of Digital Libraries

      Thompson, John Matthew
      Thumbnail
      Files
      thesis.pdf
      2.430Mb
      Citation
      Export citation
      Thompson, J. M. (2015). A Predictive Model for the Parallel Processing of Digital Libraries (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/9835
      Permanent Research Commons link: https://hdl.handle.net/10289/9835
      Abstract
      The computing world is facing the problem of a seemingly exponential increase in the amount of raw digital data, and the speed at which it is being collected, is eclipsing our ability to manage it manually. Combine this with the increasing expectations of a growing number of experienced computer users—including real-time access and a demand for expensive-to-process file types such as multimedia—and it is not hard to understand why managing data of this scale and providing timely access to useful information requires specialized algorithms, techniques, and software.

      Digital libraries are being used to help address these challenges. Drawing upon knowledge learned through traditional library science, digital libraries excel in providing structured user access to a wide variety of documents. They increasingly include tools for managing, moderating, and marking up these documents. Furthermore, they often feature phases where documents are independently processed and so can benefit from the application of parallel processing techniques—the focus of this thesis. Whether a digital library collection can benefit from parallel processing depends on considerations such as document type, processing cost per document, number of documents, and file-system input/output.

      To aid in deciding when to apply parallel processing techniques to digital libraries, this thesis explores the creation a model for predicting key outcomes of leveraging such techniques. It does so by implementing parallel processing in three distinct open-source digital library tools, undertaking experiments designed to measure key processing features (such as processing time versus number of compute nodes), and applying machine learning techniques to these features in order to derive a predictive model.

      The model created predicts parallel processing performance at 96% accuracy (adjusted r-squared) for a number of exemplar collection types. The result is a generally applicable tool for estimating the benefits of applying parallel processing to a wide range of digital collections.
      Date
      2015
      Type
      Thesis
      Degree Name
      Doctor of Philosophy (PhD)
      Supervisors
      Bainbridge, David
      Witten, Ian H.
      Publisher
      University of Waikato
      Rights
      All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
      Collections
      • Higher Degree Theses [1748]
      Show full item record  

      Usage

      Downloads, last 12 months
      406
       
       

      Usage Statistics

      For this itemFor all of Research Commons

      The University of Waikato - Te Whare Wānanga o WaikatoFeedback and RequestsCopyright and Legal Statement