
dc.contributor.author: Peeters, Ross
dc.contributor.author: Smith, Tony C.
dc.date.accessioned: 2008-10-22T02:16:14Z
dc.date.available: 2008-10-22T02:16:14Z
dc.date.issued: 1997-11
dc.identifier.citation: Peeters, R. & Smith, T.C. (1997). Fast convergence with a greedy tag-phrase dictionary. (Working paper 97/23). Hamilton, New Zealand: University of Waikato, Department of Computer Science. [en_US]
dc.identifier.issn: 1170-487X
dc.identifier.uri: https://hdl.handle.net/10289/1119
dc.description.abstract: The best general-purpose compression schemes make their gains by estimating a probability distribution over all possible next symbols given the context established by some number of previous symbols. Such context models typically obtain good compression results for plain text by taking advantage of regularities in character sequences. Frequent words and syllables can be incorporated into the model quickly and thereafter used for reasonably accurate prediction. However, the precise context in which frequent patterns emerge is often extremely varied, and each new word or phrase immediately introduces new contexts which can adversely affect the compression rate. A great deal of the structural regularity in a natural language is given rather more by properties of its grammar than by the orthographic transcription of its phonology. This implies that access to a grammatical abstraction might lead to good compression. While grammatical models have been used successfully for compressing computer programs [4], grammar-based compression of plain text has received little attention, primarily because of the difficulties associated with constructing a suitable natural language grammar. But even without a precise formulation of the syntax of a language, there is a linguistic abstraction which is easily accessed and which demonstrates a high degree of regularity which can be exploited for compression purposes: namely, lexical categories. [en_US]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.relation.ispartofseries: Computer Science Working Papers
dc.subject: computer science [en_US]
dc.subject: Machine learning
dc.title: Fast convergence with a greedy tag-phrase dictionary [en_US]
dc.type: Working Paper [en_US]
uow.relation.series: 97/23

