Yeates, Stuart Andrew; Witten, Ian H.; Bainbridge, David
(IEEE Computer Society, 2001)
This paper is about inferring markup information, a generalization of part-of-speech tagging. We use compression models based on a marked-up training corpus and apply them to fresh, unmarked text. In effect, this technique ...