Bach, J. & Witten, I.H. (1999). Lexical attraction for text compression. (Working paper 99/01). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
Permanent Research Commons link: https://hdl.handle.net/10289/1030
New methods of acquiring structural information in text documents may support better compression by identifying an appropriate prediction context for each symbol. The method of “lexical attraction” infers syntactic dependency structures from statistical analysis of large corpora. We describe the generation of a lexical attraction model, discuss its application to text compression, and explore its potential to outperform fixed-context models such as word-level PPM. Perhaps the most exciting aspect of this work is the prospect of using compression as a metric for structure discovery in text.
Computer Science, University of Waikato
- 1999 Working Papers