
Lexical attraction for text compression

Abstract
New methods of acquiring structural information in text documents may support better compression by identifying an appropriate prediction context for each symbol. The method of “lexical attraction” infers syntactic dependency structures from statistical analysis of large corpora. We describe the generation of a lexical attraction model, discuss its application to text compression, and explore its potential to outperform fixed-context models such as word-level PPM. Perhaps the most exciting aspect of this work is the prospect of using compression as a metric for structure discovery in text.
Type
Working Paper
Series
Computer Science Working Papers
Citation
Bach, J. & Witten, I.H. (1999). Lexical attraction for text compression. (Working paper 99/01). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
Date
1999-01
Publisher
Computer Science, University of Waikato