Semantic and generative models for lossy text compression

The apparent divergence between the research paradigms of text and image compression has led us to consider the potential for applying methods developed for one domain to the other. This paper examines the idea of "lossy" text compression, which transmits an approximation to the input text rather than the text itself. In image coding, lossy techniques have proven to yield compression factors that are vastly superior to those of the best lossless schemes, and we show that this is also the case for text. Two different methods are described here, one inspired by the use of fractals in image compression. They can be combined into an extremely effective technique that provides much better compression than the present state of the art and yet preserves a reasonable degree of match between the original and received text. The major challenge for lossy text compression is identified as the reliable evaluation of the quality of this match.
Working Paper
Computer Science Working Papers
Witten, I. H., Bell, T. C., Moffat, A., Smith, T. C., & Nevill-Manning, C. G. (1992). Semantic and generative models for lossy text compression (Computer Science Working Papers 92/8). Hamilton, New Zealand: Department of Computer Science, University of Waikato.
Department of Computer Science, University of Waikato
© 1992 by Ian H. Witten, Timothy C. Bell, Alistair Moffat, Tony C. Smith & Craig G. Nevill-Manning.