      Tag based models of English text

      Teahan, W.J.; Cleary, John G.
      Files
      uow-cs-wp-1997-24.pdf
      1.891Mb
      Citation
      Teahan, W.J. & Cleary, J.G. (1997). Tag based models of English text. (Working paper 97/24). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
      Permanent Research Commons link: https://hdl.handle.net/10289/1120
      Abstract
      The problem of compressing English text is important both because of the ubiquity of English as a target for compression and because of the light that compression can shed on the structure of English. English text is examined in conjunction with additional information about the part of speech of each word in the text (referred to as “tags”). It is shown that the tags plus the text can be compressed to a smaller size than the text alone. Essentially, the tags can be compressed for nothing, or even at a small net saving in size. A comparison is made of a number of different ways of integrating the compression of tags and text using an escape mechanism similar to that of PPM. These are also compared with standard word based and character based compression programs. The result is that the tag character and word based schemes always outperform the character based schemes. Overall, the tag based schemes outperform the word based schemes. We conclude by conjecturing that tags chosen for compression rather than for linguistic purposes would perform even better.
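The abstract does not spell out the models, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes a toy adaptive model in which each tag is predicted from the previous tag and each word from its own tag, with a PPM-style escape (escape probability distinct / (total + distinct)) to a uniform fallback. The names `EscapeModel`, `tagged_cost`, `word_only_cost`, and the fallback alphabet sizes are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict


class EscapeModel:
    """Adaptive frequency model with a PPM-style escape to a uniform fallback.

    Hypothetical illustration; not the model described in the working paper.
    """

    def __init__(self, fallback_size):
        # Assumed size of the alphabet used when a symbol is novel in its context.
        self.fallback_size = fallback_size
        self.counts = defaultdict(lambda: defaultdict(int))

    def cost_and_update(self, context, symbol):
        """Return the code length in bits of `symbol` in `context`, then update."""
        seen = self.counts[context]
        total = sum(seen.values())
        distinct = len(seen)
        if symbol in seen:
            p = seen[symbol] / (total + distinct)
        else:
            # Escape, then code the symbol uniformly over the fallback alphabet.
            p_escape = distinct / (total + distinct) if total else 1.0
            p = p_escape / self.fallback_size
        seen[symbol] += 1
        return -math.log2(p)


def tagged_cost(pairs, n_tags, n_words):
    """Bits needed to code both tags and words: tag | previous tag, word | tag."""
    tag_model = EscapeModel(n_tags)
    word_model = EscapeModel(n_words)
    bits, prev_tag = 0.0, None
    for word, tag in pairs:
        bits += tag_model.cost_and_update(prev_tag, tag)
        bits += word_model.cost_and_update(tag, word)
        prev_tag = tag
    return bits


def word_only_cost(pairs, n_words):
    """Bits needed to code the words alone: word | previous word."""
    word_model = EscapeModel(n_words)
    bits, prev_word = 0.0, None
    for word, _tag in pairs:
        bits += word_model.cost_and_update(prev_word, word)
        prev_word = word
    return bits


if __name__ == "__main__":
    # Tiny hand-tagged sample; the tag names are made up for this sketch.
    sample = [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
              ("the", "DET"), ("dog", "NOUN"), ("sat", "VERB")]
    print("tags + words:", round(tagged_cost(sample, 16, 1024), 2), "bits")
    print("words only:  ", round(word_only_cost(sample, 1024), 2), "bits")
```

On a sample this small the two totals say nothing about the paper's findings; the sketch only shows how the cost of coding tags can be offset when tags make the following words more predictable.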
      Date
      1997-11
      Type
      Working Paper
      Series
      Computer Science Working Papers
      Report No.
      97/24
      Collections
      • 1997 Working Papers [31]