Thompson, J., Bainbridge, D., & Roüast, M. (2012). Parallel processing videos in very large digital libraries. In 14th International Conference on Asia-Pacific Digital Libraries, ICADL 2012, November 12-15, 2012. Lecture Notes in Computer Science, Volume 7634 (pp. 219-228).
Permanent Research Commons link: http://hdl.handle.net/10289/6966
Nowhere are the 'growing pains' of Very Large-scale Digital Libraries more pronounced than in collections containing multimedia data. Not only do such collections contain large numbers of items, but they also push the boundaries of scale in terms of storage space and processing expense. In this paper we explore how applying open-source parallel processing libraries and techniques, previously developed for and applied to textual content, can be of benefit to multimedia digital libraries. We provide a real-world use case of ingesting video into the ReplayMe! system, an extension of the Greenstone digital library software that simultaneously records and ingests all of the free-to-air television channels available in New Zealand. Current ingest of video in ReplayMe! is intentionally light due to processing time constraints on the single-processor architecture on which it was developed. The work reported here investigates how this system can be scaled up to include the conversion of the broadcast video transport format to a suitable streaming format (MP4) and the automatic extraction of content-analysis-based keyframes, while still performing in real time. By applying parallel processing, and utilizing a cluster of sixteen desktop computers, the paper shows how this processing time can be significantly reduced compared to the equivalent computation conducted serially. We then generalize the work, and show how the same basic techniques can be applied to other common digital library software such as DSpace to provide similar advantages when dealing with processor-intensive content.
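As a rough illustration of the idea of processing recordings in parallel rather than serially, the sketch below fans a per-file task out over a pool of workers. It is not the method used in the paper: it runs on a single machine with Python's standard `concurrent.futures` module instead of a cluster, and the function names `build_mp4_command` and `ingest_parallel`, the file names, and the choice of ffmpeg codec options are all assumptions made for the example. The sketch only builds the conversion commands; actually running them would require ffmpeg to be installed.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_mp4_command(ts_path):
    """Build (but do not run) an ffmpeg command converting one recording to MP4.

    Hypothetical helper: the codec choices here are illustrative assumptions.
    """
    out_path = Path(ts_path).with_suffix(".mp4")
    return ["ffmpeg", "-i", str(ts_path),
            "-c:v", "libx264", "-c:a", "aac", str(out_path)]


def ingest_parallel(ts_paths, process, workers=4):
    """Apply `process` to every recording concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, ts_paths))


if __name__ == "__main__":
    # Hypothetical recordings, one per channel.
    recordings = ["channel1.ts", "channel2.ts", "channel3.ts"]
    for command in ingest_parallel(recordings, build_mp4_command):
        print(" ".join(command))
```

In a real ingest, `process` would launch the conversion (for example through `subprocess.run`) and the keyframe extraction for each file. Threads are sufficient for this on one machine because the heavy work happens in external processes; spreading the work over several computers, as the paper does, would need a distributed framework instead.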