Retention of User Edits in Content Analysis Output Combination and its Utilisation in Digital Repositories
Abstract
This thesis investigates how the output of content analysis algorithms, applied repeatedly to a document corpus, can be combined with human corrections so that the corrections are not erased when the corpus is re-processed. Diarisation (who speaks when) is the focal content analysis type; however, the edit-aware concepts also apply to other areas such as Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR). As a means of experimentation, an enriched audio player is built to visualise diarisation data. This web tool is further developed to support editing speaker region bounds and speaker names, locking corrected regions, and comparing two diarisation outputs. Sequence merging and region conflict concepts are also explored and implemented. The audio tool is integrated into the Greenstone Digital Library Software, which serves as an example digital library framework. The concepts are presented in a way that allows the strategies to be transferred to other content analysis systems or digital libraries.
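The abstract does not specify how re-processed output and locked corrections are merged. Below is a minimal Python sketch of one possible approach. It is an assumption, not the thesis's method. The `Region` record (start/end in seconds, speaker label, `locked` flag) and the function names are hypothetical. Locked regions from an earlier run are kept verbatim. Each region from a fresh automatic run is trimmed or split so that it never overlaps a locked region. Unlocked regions from the earlier run are simply replaced by the fresh output.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Region:
    """Hypothetical diarisation region: a span of audio assigned to one speaker."""
    start: float          # seconds
    end: float            # seconds
    speaker: str
    locked: bool = False  # True once a user has corrected and locked the region


def subtract(region: Region, blocker: Region) -> List[Region]:
    """Return the parts of `region` that fall outside `blocker` (0, 1 or 2 pieces)."""
    if blocker.end <= region.start or blocker.start >= region.end:
        return [region]  # no overlap: keep the region whole
    pieces = []
    if region.start < blocker.start:  # part before the locked span
        pieces.append(replace(region, end=blocker.start))
    if blocker.end < region.end:      # part after the locked span
        pieces.append(replace(region, start=blocker.end))
    return pieces


def merge_with_locked(previous: List[Region], fresh: List[Region]) -> List[Region]:
    """Combine a fresh automatic run with earlier output, preserving locked regions."""
    locked = [r for r in previous if r.locked]
    result = list(locked)
    for region in fresh:
        pieces = [region]
        # Carve every locked span out of the automatic region.
        for lock in locked:
            pieces = [p for piece in pieces for p in subtract(piece, lock)]
        result.extend(pieces)
    return sorted(result, key=lambda r: (r.start, r.end))


previous = [Region(0.0, 5.0, "Alice", locked=True), Region(5.0, 9.0, "SPEAKER_1")]
fresh = [Region(0.0, 4.0, "SPEAKER_0"), Region(4.0, 10.0, "SPEAKER_1")]
merged = merge_with_locked(previous, fresh)
```

Here the user's locked "Alice" region survives re-processing unchanged, while the fresh automatic regions are clipped to start where it ends. A real system would also need to handle the conflicts the abstract mentions, for example a fresh region that reassigns part of a locked speaker's turn. This sketch always resolves such conflicts in favour of the locked edit.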
Type
Thesis
Date
2023
Publisher
The University of Waikato
Rights
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.