dc.contributor.author: Yeates, Stuart Andrew [en_NZ]
dc.date.accessioned: 2006-09-25T13:35:40Z
dc.date.available: 2006-10-16T13:40:22Z
dc.date.issued: 2006 [en_NZ]
dc.identifier.citation: Yeates, S. A. (2006). Text Augmentation: Inserting markup into natural language text with PPM Models (Thesis, Doctor of Philosophy (PhD)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/2600 [en]
dc.identifier.uri: https://hdl.handle.net/10289/2600
dc.description.abstract: This thesis describes a new optimisation and new heuristics for automatically marking up XML documents. These are implemented in CEM, using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BIBTEX system and marked up in XML with every field from the original BIBTEX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists’ Communique corpus and the Reuters’ corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora. [en_NZ]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: The University of Waikato [en_NZ]
dc.rights: All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: Markup [en_NZ]
dc.subject: Text Augmentation [en_NZ]
dc.subject: Textual Analysis [en_NZ]
dc.subject: Hidden Markov Models [en_NZ]
dc.subject: HMM [en_NZ]
dc.subject: PPM [en_NZ]
dc.subject: Viterbi Search [en_NZ]
dc.subject: Part-Of-Speech Tagging [en_NZ]
dc.subject: XML [en_NZ]
dc.subject: Metadata [en_NZ]
dc.title: Text Augmentation: Inserting markup into natural language text with PPM Models [en_NZ]
dc.type: Thesis [en_NZ]
thesis.degree.discipline: Computer Science [en_NZ]
thesis.degree.grantor: University of Waikato [en_NZ]
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (PhD) [en_NZ]
uow.date.accession: 2006-09-25T13:35:40Z [en_NZ]
uow.date.available: 2006-10-16T13:40:22Z [en_NZ]
uow.identifier.adt: http://adt.waikato.ac.nz/public/adt-uow20060925.133540 [en_NZ]
uow.date.migrated: 2009-06-14T21:34:08Z [en_NZ]
pubs.place-of-publication: Hamilton, New Zealand [en_NZ]

