Improving the performance of Hierarchical Hidden Markov Models on Information Extraction tasks

dc.contributor.author: Chou, Lin-Yi [en_NZ]
dc.date.accessioned: 2007-02-12T15:26:08Z
dc.date.available: 2007-03-06T15:12:46Z
dc.date.issued: 2006 [en_NZ]
dc.description.abstract: This thesis presents novel methods for creating and improving hierarchical hidden Markov models. The work centers on transforming a traditional tree-structured hierarchical hidden Markov model (HHMM) into an equivalent model that reuses repeated sub-trees. This process temporarily breaks the tree structure constraint in order to leverage the benefits of combining repeated sub-trees. These benefits include a lower cost of testing and increased accuracy of the final model, thus providing the model with greater performance. The result is called a merged and simplified hierarchical hidden Markov model (MSHHMM). The thesis goes on to detail four techniques for improving the performance of MSHHMMs when applied to information extraction tasks, in terms of accuracy and computational cost. Briefly, these techniques are: a new formula for calculating the approximate probability of previously unseen events; pattern generalisation to transform observations, thus increasing testing speed and prediction accuracy; restructuring states to focus on state transitions; and an automated flattening technique for reducing the complexity of HHMMs. The basic model and four improvements are evaluated by applying them to the well-known information extraction tasks of Reference Tagging and Text Chunking. In both tasks, MSHHMMs show consistently good performance across varying sizes of training data. In the case of Reference Tagging, the accuracy of the MSHHMM is comparable to other methods. However, when the volume of training data is limited, MSHHMMs maintain high accuracy whereas other methods show a significant decrease. These accuracy gains were achieved without any significant increase in processing time. For the Text Chunking task, the accuracy of the MSHHMM was again comparable to other methods. However, the other methods incurred much higher processing delays than the MSHHMM.
The results of these practical experiments demonstrate the benefits of the new method: increased accuracy, lower computation costs, and better performance. [en_NZ]
dc.format.mimetype: application/pdf
dc.identifier.citation: Chou, L.-Y. (2006). Improving the performance of Hierarchical Hidden Markov Models on Information Extraction tasks (Thesis, Doctor of Philosophy (PhD)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/2633 [en]
dc.identifier.uri: https://hdl.handle.net/10289/2633
dc.language.iso: en
dc.publisher: The University of Waikato [en_NZ]
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: hidden Markov model [en_NZ]
dc.subject: hierarchical hidden Markov model [en_NZ]
dc.subject: information extraction [en_NZ]
dc.subject: text mining [en_NZ]
dc.title: Improving the performance of Hierarchical Hidden Markov Models on Information Extraction tasks [en_NZ]
dc.type: Thesis [en_NZ]
pubs.place-of-publication: Hamilton, New Zealand [en_NZ]
thesis.degree.discipline: Computer Science [en_NZ]
thesis.degree.grantor: University of Waikato [en_NZ]
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD) [en_NZ]
uow.date.accession: 2007-02-12T15:26:08Z [en_NZ]
uow.date.available: 2007-03-06T15:12:46Z [en_NZ]
uow.date.migrated: 2009-06-12T04:51:38Z [en_NZ]
uow.identifier.adt: http://adt.waikato.ac.nz/public/adt-uow20070212.152608 [en_NZ]
Files
Name: thesis.pdf
Size: 1.73 MB
Format: Adobe Portable Document Format