Nevill-Manning, C. G. & Witten, I. H. (1996). Authorship patterns in information systems. (Working paper 96/25). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
Permanent Research Commons link: http://hdl.handle.net/10289/1186
This paper describes an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing phrases which appear more than once by a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence. The algorithm works by maintaining two constraints: every diagram in the grammar must be unique, and every rule must be used more than once. It breaks new ground by operating incrementally. Moreover, its simple structure permits a proof that it operates in space and time that is linear in the size of the input. Our implementation can process 10,000 symbols/second and has been applied to an extensive range of sequences encountered in practice.
- 1996 Working Papers