dc.contributor.advisor: Bainbridge, David
dc.contributor.advisor: Witten, Ian H.
dc.contributor.author: Thompson, John Matthew
dc.date.accessioned: 2016-01-07T01:38:41Z
dc.date.available: 2016-01-07T01:38:41Z
dc.date.issued: 2015
dc.identifier.citation: Thompson, J. M. (2015). A Predictive Model for the Parallel Processing of Digital Libraries (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/9835 (language: en)
dc.identifier.uri: https://hdl.handle.net/10289/9835
dc.description.abstract: The computing world faces a seemingly exponential increase in the amount of raw digital data, and the speed at which it is being collected is eclipsing our ability to manage it manually. Combine this with the increasing expectations of a growing number of experienced computer users (including real-time access and a demand for expensive-to-process file types such as multimedia) and it is not hard to understand why managing data of this scale and providing timely access to useful information require specialized algorithms, techniques, and software. Digital libraries are being used to help address these challenges. Drawing upon knowledge learned through traditional library science, digital libraries excel in providing structured user access to a wide variety of documents. They increasingly include tools for managing, moderating, and marking up these documents. Furthermore, they often feature phases where documents are independently processed and so can benefit from the application of parallel processing techniques, the focus of this thesis. Whether a digital library collection can benefit from parallel processing depends on considerations such as document type, processing cost per document, number of documents, and file-system input/output. To aid in deciding when to apply parallel processing techniques to digital libraries, this thesis explores the creation of a model for predicting key outcomes of leveraging such techniques. It does so by implementing parallel processing in three distinct open-source digital library tools, undertaking experiments designed to measure key processing features (such as processing time versus number of compute nodes), and applying machine learning techniques to these features in order to derive a predictive model. The model created predicts parallel processing performance at 96% accuracy (adjusted R-squared) for a number of exemplar collection types. The result is a generally applicable tool for estimating the benefits of applying parallel processing to a wide range of digital collections.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: University of Waikato
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: Digital Library
dc.subject: Parallel Processing
dc.subject: Mathematical Modelling
dc.title: A Predictive Model for the Parallel Processing of Digital Libraries
dc.type: Thesis
thesis.degree.grantor: University of Waikato
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD)
dc.date.updated: 2015-12-15T21:23:37Z
pubs.place-of-publication: Hamilton, New Zealand (language: en_NZ)

