Research Commons
      Correcting English text using PPM models

      Teahan, W.J.; Inglis, Stuart J.; Cleary, John G.; Holmes, Geoffrey
      Files
      uow-cs-wp-1997-26.pdf
      1.730Mb
      Citation
      Teahan, W.J., Inglis, S., Cleary, J.G. & Holmes, G. (1997). Correcting English text using PPM models. (Working paper 97/26). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
      Permanent Research Commons link: https://hdl.handle.net/10289/1122
      Abstract
      An essential component of many applications in natural language processing is a language modeler able to correct errors in the text being processed. In optical character recognition (OCR), poor scanning quality or extraneous pixels in the image may cause one or more characters to be misrecognized; in spelling correction, two characters may be transposed, or a character may be inadvertently inserted or omitted.

      This paper describes a method for correcting English text using a PPM model. A method that segments words in English text is introduced and is shown to be a significant improvement over previously used methods. A similar technique is also applied as a post-processing stage after pages have been recognized by a state-of-the-art commercial OCR system. We show that the accuracy of the OCR system can be increased from 95.9% to 96.6%, a decrease of about 10 errors per page.
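
      The accuracy figures in the abstract can be restated as error rates. The sketch below is only arithmetic on the three numbers quoted above (95.9%, 96.6%, and about 10 errors per page). It assumes that accuracy and errors are counted over the same units, for example characters, which the abstract does not state; the function name and dictionary keys are illustrative and do not come from the paper.

```python
def error_summary(acc_before, acc_after, errors_removed_per_page):
    """Restate two accuracy figures as error rates and derived quantities.

    Assumption (not stated in the abstract): accuracy and the per-page
    error count are measured over the same units, e.g. characters.
    """
    err_before = 1.0 - acc_before          # error rate before correction
    err_after = 1.0 - acc_after            # error rate after correction
    absolute_drop = err_before - err_after
    relative_drop = absolute_drop / err_before
    # If removing `absolute_drop` of all units equals N errors per page,
    # then a page holds roughly N / absolute_drop units.
    implied_units_per_page = errors_removed_per_page / absolute_drop
    return {
        "err_before": err_before,
        "err_after": err_after,
        "absolute_drop": absolute_drop,
        "relative_drop": relative_drop,
        "implied_units_per_page": implied_units_per_page,
    }


summary = error_summary(0.959, 0.966, 10)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

      Under that assumption, the error rate falls from 4.1% to 3.4%, a relative reduction of about 17%, and a saving of 10 errors per page corresponds to pages of roughly 1,400 units.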
      Date
      1997-11
      Type
      Working Paper
      Series
      Computer Science Working Papers
      Report No.
      97/26
      Publisher
      Computer Science, University of Waikato
      Collections
      • 1997 Working Papers [31]