
      Fast convergence with a greedy tag-phrase dictionary

      Peeters, Ross; Smith, Tony C.
Files
uow-cs-wp-1997-23.pdf (1.678 MB)
Citation
      Peeters, R. & Smith, T.C. (1997). Fast convergence with a greedy tag-phrase dictionary. (Working paper 97/23). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
      Permanent Research Commons link: https://hdl.handle.net/10289/1119
      Abstract
      The best general-purpose compression schemes make their gains by estimating a probability distribution over all possible next symbols given the context established by some number of previous symbols. Such context models typically obtain good compression results for plain text by taking advantage of regularities in character sequences. Frequent words and syllables can be incorporated into the model quickly and thereafter used for reasonably accurate prediction. However, the precise context in which frequent patterns emerge is often extremely varied, and each new word or phrase immediately introduces new contexts which can adversely affect the compression rate.

      A great deal of the structural regularity in a natural language is given rather more by properties of its grammar than by the orthographic transcription of its phonology. This implies that access to a grammatical abstraction might lead to good compression. While grammatical models have been used successfully for compressing computer programs [4], grammar-based compression of plain text has received little attention, primarily because of the difficulties associated with constructing a suitable natural language grammar. But even without a precise formulation of the syntax of a language, there is a linguistic abstraction which is easily accessed and which demonstrates a high degree of regularity which can be exploited for compression purposes—namely, lexical categories.
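The first paragraph of the abstract describes the standard context-modelling approach: estimate a probability distribution over the next symbol given the few symbols that precede it. The Python sketch below is an editorial illustration of that general idea only, not the paper's method; a practical scheme such as PPM would blend several context orders and reserve escape probability for symbols never seen in a context.

    from collections import Counter, defaultdict

    class CharContextModel:
        """Toy order-k character model: estimates P(next char | previous k chars).

        Illustration only; real coders mix context orders and handle
        unseen symbols with escape probabilities.
        """

        def __init__(self, order=2):
            self.order = order
            self.counts = defaultdict(Counter)   # context -> Counter of next chars

        def train(self, text):
            text = " " * self.order + text       # pad so the first chars have a context
            for i in range(self.order, len(text)):
                context = text[i - self.order:i]
                self.counts[context][text[i]] += 1

        def distribution(self, context):
            seen = self.counts[context[-self.order:]]
            total = sum(seen.values())
            if total == 0:
                return {}                        # unseen context: a real coder would escape
            return {ch: n / total for ch, n in seen.items()}

    model = CharContextModel(order=2)
    model.train("the cat sat on the mat and the cat ran")
    print(model.distribution("th"))   # probability mass concentrated on 'e' after "th"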
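The second paragraph argues that lexical categories (part-of-speech tags) expose grammatical regularity that a character model misses. As a toy illustration of that point, again not the algorithm from the paper, the sketch below learns tag-to-tag transitions from a tiny hand-tagged corpus and predicts the next word in two steps: first the likely category, then the most frequent word within it. The corpus and tag labels are invented for the example.

    from collections import Counter, defaultdict

    # Tiny hand-tagged corpus; the tag set is hypothetical, chosen for illustration.
    tagged = [("the", "DET"), ("dog", "NOUN"), ("chased", "VERB"), ("the", "DET"),
              ("cat", "NOUN"), ("and", "CONJ"), ("the", "DET"), ("cat", "NOUN"),
              ("chased", "VERB"), ("a", "DET"), ("mouse", "NOUN")]

    tag_bigrams = defaultdict(Counter)      # P(next tag | previous tag): the grammatical regularity
    word_given_tag = defaultdict(Counter)   # P(word | tag): which word fills the predicted category

    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        tag_bigrams[t1][t2] += 1
    for w, t in tagged:
        word_given_tag[t][w] += 1

    def predict_next(prev_tag):
        """Predict the most likely next tag, then the most frequent word in that tag."""
        next_tag = tag_bigrams[prev_tag].most_common(1)[0][0]
        word = word_given_tag[next_tag].most_common(1)[0][0]
        return next_tag, word

    print(predict_next("DET"))   # ('NOUN', 'cat'): DET is almost always followed by NOUN

Splitting prediction this way reflects the abstract's claim: the tag stream is far more regular than the raw word stream, so a model conditioned on lexical categories has regularity to exploit even without a full grammar.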
      Date
      1997-11
      Type
      Working Paper
      Series
      Computer Science Working Papers
      Report No.
      97/23
      Collections
      • 1997 Working Papers [31]

Usage
Downloads, last 12 months: 195

