Better text compression from fewer lexical n-grams

Smith, Tony C.; Lorenz, Michelle

Abstract

Word-based context models for text compression can outperform simpler character-based models, but are generally unattractive because of inherent problems with exponential model growth and the corresponding data sparseness. These ill effects can be mitigated in an adaptive lossless compression scheme by modelling syntactic and semantic lexical dependencies independently.

Type

Conference Contribution

Citation

Smith, T.C. & Lorenz, M. (2001). Better text compression from fewer lexical n-grams. In Proceedings of Data Compression Conference (DCC ’01). Washington, DC, USA: IEEE Computer Society.

Date

2001

Publisher

IEEE Computer Society
