Yeates, S., Witten, I.H. & Bainbridge, D. (2001). Tag insertion complexity. In J. A. Storer (Ed.), Proceedings of the Data Compression Conference, March 2001, Snowbird, Utah (pp. 243-252). Washington DC, USA: IEEE Press.
Permanent Research Commons link: https://hdl.handle.net/10289/1324
This paper is about inferring markup information, a generalization of part-of-speech tagging. We build compression models from a marked-up training corpus and apply them to fresh, unmarked text. In effect, this technique builds filters that extract information from text, and these filters generalize because they depend on training text rather than on preprogrammed heuristics.
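The abstract states the idea only at a high level. The following is a minimal toy sketch under stated assumptions, not the authors' method. It trains an order-3 character model with add-one smoothing on marked-up text (a simplified stand-in for a real compression model such as PPM). It then chooses, for each word of unmarked input, either no tag or one tag from a given set, so that the marked-up output has the smallest code length. Tags are assumed to wrap whole words only. Because the model sees only the last k characters, a Viterbi-style dynamic program keyed on those characters finds the cheapest markup exactly. The names `CharModel` and `insert_tags`, the `<n>` tag, and the training corpus are all hypothetical and invented for illustration.

```python
import math
from collections import defaultdict


class CharModel:
    """Order-k character model with add-one (Laplace) smoothing.

    Hypothetical, simplified stand-in for a real compression model such as
    PPM; the alphabet is assumed to be 256 byte values.
    """

    def __init__(self, k=3, alphabet_size=256):
        self.k = k
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> next char -> count
        self.totals = defaultdict(int)                       # context -> total count

    def train(self, text):
        """Count every (k-character context, next character) pair in marked-up text."""
        padded = " " * self.k + text  # pad so the first characters have a context
        for i in range(self.k, len(padded)):
            ctx = padded[i - self.k:i]
            self.counts[ctx][padded[i]] += 1
            self.totals[ctx] += 1

    def cost(self, context, s):
        """Code length in bits of string s, given the preceding k characters."""
        bits = 0.0
        history = context
        for ch in s:
            ctx = history[-self.k:]
            seen = self.counts.get(ctx, {}).get(ch, 0)
            p = (seen + 1) / (self.totals.get(ctx, 0) + self.alphabet_size)
            bits -= math.log2(p)
            history += ch
        return bits


def insert_tags(model, words, tags):
    """Pick no tag or one tag per word so the whole output codes most cheaply.

    States are the last k output characters; since the model's cost depends
    only on those, keeping the cheapest path per state is exact (Viterbi).
    """
    k = model.k
    beam = {" " * k: (0.0, "")}  # state -> (total bits, marked-up text so far)
    for i, word in enumerate(words):
        sep = "" if i == 0 else " "
        candidates = [word] + [f"<{t}>{word}</{t}>" for t in tags]
        new_beam = {}
        for state, (bits, out) in beam.items():
            for cand in candidates:
                piece = sep + cand
                total = bits + model.cost(state, piece)
                new_state = (state + piece)[-k:]
                if new_state not in new_beam or total < new_beam[new_state][0]:
                    new_beam[new_state] = (total, out + piece)
        beam = new_beam
    return min(beam.values())[1]  # cheapest complete markup


if __name__ == "__main__":
    # Toy training corpus (invented for illustration): names wrapped in <n>...</n>.
    model = CharModel(k=3)
    model.train(" ".join(["<n>alice</n> met <n>bob</n> ."] * 200))
    print(insert_tags(model, ["alice", "met", "bob", "."], ["n"]))
    # prints: <n>alice</n> met <n>bob</n> .
```

Restricting tags to whole words keeps the candidate set small. Allowing tags to open and close at arbitrary character positions, or to nest, would make the search considerably larger.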
IEEE Computer Society
Copyright © IEEE 2001. This article has been published in Proceedings of the Data Compression Conference, March 2001, Snowbird, Utah.