Publication:
Correcting English text using PPM models

dc.contributor.author: Teahan, W.J.
dc.contributor.author: Inglis, Stuart J.
dc.contributor.author: Cleary, John G.
dc.contributor.author: Holmes, Geoffrey
dc.date.accessioned: 2008-10-22T02:52:11Z
dc.date.available: 2008-10-22T02:52:11Z
dc.date.issued: 1997-11
dc.description.abstract: An essential component of many applications in natural language processing is a language modeler able to correct errors in the text being processed. For optical character recognition (OCR), poor scanning quality or extraneous pixels in the image may cause one or more characters to be mis-recognized; while for spelling correction, two characters may be transposed, or a character may be inadvertently inserted or missed out. This paper describes a method for correcting English text using a PPM model. A method that segments words in English text is introduced and is shown to be a significant improvement over previously used methods. A similar technique is also applied as a post-processing stage after pages have been recognized by a state-of-the-art commercial OCR system. We show that the accuracy of the OCR system can be increased from 95.9% to 96.6%, a decrease of about 10 errors per page. [en_US]
dc.format.mimetype: application/pdf
dc.identifier.citation: Teahan, W.J., Inglis, S., Cleary, J.G. & Holmes, G. (1997). Correcting English text using PPM models. (Working paper 97/26). Hamilton, New Zealand: University of Waikato, Department of Computer Science. [en_US]
dc.identifier.issn: 1170-487X
dc.identifier.uri: https://hdl.handle.net/10289/1122
dc.language.iso: en
dc.publisher: Computer Science, University of Waikato [en_NZ]
dc.relation.ispartofseries: Computer Science Working Papers
dc.subject: computer science [en_US]
dc.subject: Machine learning
dc.title: Correcting English text using PPM models [en_US]
dc.type: Working Paper [en_US]
dspace.entity.type: Publication
pubs.place-of-publication: Hamilton [en_NZ]
uow.relation.series: 97/26

Files

Original bundle

Name: uow-cs-wp-1997-26.pdf
Size: 1.73 MB
Format: Adobe Portable Document Format
