Semantic and generative models for lossy text compression

The apparent divergence between the research paradigms of text and image compression has led us to consider the potential for applying methods developed for one domain to the other. This paper examines the idea of "lossy" text compression, which transmits an approximation to the input text rather than the text itself. In image coding, lossy techniques have proven to yield compression factors that are vastly superior to those of the best lossless schemes, and we show that this is also the case for text. Two different methods are described here, one inspired by the use of fractals in image compression. They can be combined into an extremely effective technique that provides much better compression than the present state of the art and yet preserves a reasonable degree of match between the original and received text. The major challenge for lossy text compression is identified as the reliable evaluation of the quality of this match.

Citation

Witten, I. H., Bell, T. C., Moffat, A., Smith, T. C., & Nevill-Manning, C. G. (1992). Semantic and generative models for lossy text compression (Computer Science Working Papers 92/8). Hamilton, New Zealand: Department of Computer Science, University of Waikato.

Type

Working Paper

Series name

Computer Science Working Papers

Date

1992

Publisher

Department of Computer Science, University of Waikato
