Semantic and generative models for lossy text compression

Witten, Ian H.; Bell, Timothy C.; Moffat, Alistair; Smith, Tony C.; Nevill-Manning, Craig G.

Abstract

The apparent divergence between the research paradigms of text and image compression has led us to consider the potential for applying methods developed for one domain to the other. This paper examines the idea of "lossy" text compression, which transmits an approximation to the input text rather than the text itself. In image coding, lossy techniques have proven to yield compression factors that are vastly superior to those of the best lossless schemes, and we show that this is also the case for text. Two different methods are described here, one of them inspired by the use of fractals in image compression. They can be combined into an extremely effective technique that provides much better compression than the present state of the art, yet preserves a reasonable degree of match between the original and received text. The major challenge for lossy text compression is identified as the reliable evaluation of the quality of this match.

Type

Working Paper

Series

Computer Science Working Papers

Citation

Witten, I. H., Bell, T. C., Moffat, A., Smith, T. C., & Nevill-Manning, C. G. (1992). Semantic and generative models for lossy text compression (Computer Science Working Papers 92/8). Hamilton, New Zealand: Department of Computer Science, University of Waikato.

Date

1992

Publisher

Department of Computer Science, University of Waikato

Permanent link

https://hdl.handle.net/10289/9911

Collections

1992 Working Papers