Nevill-Manning, C.G. & Witten, I.H. (2000). Online and offline heuristics for inferring hierarchies of repetitions in sequences. Proceedings of the IEEE, 88(11), 1745-1755.
Permanent Research Commons link: http://hdl.handle.net/10289/1319
Hierarchical dictionary-based compression schemes form a grammar for a text by replacing each repeated string with a production rule. While such schemes usually operate online, making a replacement as soon as repetition is detected, offline operation permits greater freedom in choosing the order of replacement. In this paper, we compare the online method with three offline heuristics for selecting the next substring to replace: longest string first, most common string first, and the string that minimized the size of the grammar locally. Surprisingly, two of the offline techniques, like the online method, run in time linear in the size of the input. We evaluate each technique on artificial and natural sequences. In general, the locally-most-compressive heuristic performs best, followed by most frequent, the online technique, and, lagging by some distance, the longest-first technique.
IEEE, New York
This article has been published in the journal: Proc IEEE, ©2008 IEEE.