Loading...
Thumbnail Image
Item

Online and offline heuristics for inferring hierarchies of repetitions in sequences

Abstract
Hierarchical dictionary-based compression schemes form a grammar for a text by replacing each repeated string with a production rule. While such schemes usually operate online, making a replacement as soon as repetition is detected, offline operation permits greater freedom in choosing the order of replacement. In this paper, we compare the online method with three offline heuristics for selecting the next substring to replace: longest string first, most common string first, and the string that minimized the size of the grammar locally. Surprisingly, two of the offline techniques, like the online method, run in time linear in the size of the input. We evaluate each technique on artificial and natural sequences. In general, the locally-most-compressive heuristic performs best, followed by most frequent, the online technique, and, lagging by some distance, the longest-first technique.
Type
Journal Article
Type of thesis
Series
Citation
Nevill-Manning, C.G. & Witten, I.H. (2000). Online and offline heuristics for inferring hierarchies of repetitions in sequences. Proceedings of the IEEE, 88(11), 1745-1755.
Date
2000
Publisher
IEEE, New York
Degree
Supervisors
Rights
This article has been published in the journal: Proc IEEE, ©2008 IEEE.