Better text compression from fewer lexical n-grams

Rights

This paper has been published in the Proceedings of the Data Compression Conference (DCC ’01). ©2001 IEEE Computer Society.

Abstract

Word-based context models for text compression have the capacity to outperform simpler character-based models, but are generally unattractive because of inherent problems with exponential model growth and the corresponding data sparseness. These ill effects can be mitigated in an adaptive lossless compression scheme by modelling syntactic and semantic lexical dependencies independently.

Citation

Smith, T.C. & Lorenz, M. (2001). Better text compression from fewer lexical n-grams. In Proceedings of Data Compression Conference (DCC ’01). Washington, DC, USA: IEEE Computer Society.

Publisher

IEEE Computer Society
