Better text compression from fewer lexical n-grams
Files
Citation
Export citationSmith, T.C. & Lorenz, M.(2001). Better text compression from fewer lexical n-grams. In Proceedings of Data Compression Conference (DCC ‘01). Washington, DC, USA: IEEE Computer Society.
Permanent Research Commons link: https://hdl.handle.net/10289/1722
Abstract
Word-based context models for text compression have the capacity to outperform more simple character-based models, but are generally unattractive because of inherent problems with exponential model growth and corresponding data sparseness. These ill-effects can be mitigated in an adaptive lossless compression scheme by modelling syntactic and semantic lexical dependencies independently.
Date
2001Publisher
IEEE Computer Society
Rights
This paper has been published in the Proceedings of Data Compression Conference(DCC ‘01). ©2001 IEEE Computer Society.