Smith, Tony C.
Lorenz, Michelle
2008-12-18
2008-12-18
2001
Smith, T.C. & Lorenz, M. (2001). Better text compression from fewer lexical n-grams. In Proceedings of Data Compression Conference (DCC ‘01). Washington, DC, USA: IEEE Computer Society.
https://hdl.handle.net/10289/1722
Word-based context models for text compression have the capacity to outperform more simple character-based models, but are generally unattractive because of inherent problems with exponential model growth and corresponding data sparseness. These ill-effects can be mitigated in an adaptive lossless compression scheme by modelling syntactic and semantic lexical dependencies independently.
application/pdf
en
This paper has been published in the Proceedings of Data Compression Conference (DCC ‘01). ©2001 IEEE Computer Society.
computer science
text compression
Machine learning
Better text compression from fewer lexical n-grams
Conference Contribution
10.1109/DCC.2001.10047