
dc.contributor.author: Teahan, W.J.
dc.contributor.author: Cleary, John G.
dc.date.accessioned: 2008-10-22T02:19:45Z
dc.date.available: 2008-10-22T02:19:45Z
dc.date.issued: 1997-11
dc.identifier.citation: Teahan, W.J. & Cleary, J.G. (1997). Tag based models of English text. (Working paper 97/24). Hamilton, New Zealand: University of Waikato, Department of Computer Science. [en_US]
dc.identifier.issn: 1170-487X
dc.identifier.uri: https://hdl.handle.net/10289/1120
dc.description.abstract: The problem of compressing English text is important both because of the ubiquity of English as a target for compression and because of the light that compression can shed on the structure of English. English text is examined in conjunction with additional information about the parts of speech of each word in the text (these are referred to as “tags”). It is shown that the tags plus the text can be compressed more than the text alone. Essentially the tags can be compressed for nothing or even a small net saving in size. A comparison is made of a number of different ways of integrating compression of tags and text using an escape mechanism similar to PPM. These are also compared with standard word based and character based compression programs. The result is that the tag character and word based schemes always outperform the character based schemes. Overall, the tag based schemes outperform the word based schemes. We conclude by conjecturing that tags chosen for compression rather than linguistic purposes would perform even better. [en_US]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.relation.ispartofseries: Computer Science Working Papers
dc.subject: computer science [en_US]
dc.title: Tag based models of English text [en_US]
dc.type: Working Paper [en_US]
uow.relation.series: 97/24
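The abstract says the schemes integrate tags and text using an escape mechanism similar to PPM. The sketch below is a rough illustration of such an escape mechanism only. It estimates the ideal code length of a character string under a simple PPM-style model that uses escape method C and no exclusions. It is not the tag based models described in the working paper. The function name `ppm_code_length`, the default maximum order of 2, and the 256-symbol uniform fallback are assumptions made for this example.

```python
import math
from collections import defaultdict


def ppm_code_length(text, max_order=2, alphabet_size=256):
    """Ideal code length in bits of `text` under a simple PPM-style model.

    Assumptions (illustrative only): escape method C, no exclusions,
    and a uniform order -1 fallback over `alphabet_size` symbols.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    bits = 0.0
    for i, sym in enumerate(text):
        prob = 1.0
        coded = False
        # Try the longest available context first, then escape to shorter ones.
        for order in range(min(max_order, i), -1, -1):
            table = counts[text[i - order:i]]
            total = sum(table.values())
            if total == 0:
                continue  # context never seen: no escape needs to be coded
            distinct = len(table)
            if sym in table:
                prob *= table[sym] / (total + distinct)
                coded = True
                break
            prob *= distinct / (total + distinct)  # method C escape probability
        if not coded:
            prob *= 1.0 / alphabet_size  # order -1: uniform fallback
        bits += -math.log2(prob)
        # Update the counts for every context order that was available.
        for order in range(min(max_order, i) + 1):
            counts[text[i - order:i]][sym] += 1
    return bits
```

For example, `ppm_code_length("ab")` gives 17.0 bits. The first symbol costs 8 bits from the uniform fallback. The unseen `b` then costs one bit for the escape plus 8 bits from the fallback. Dividing the result by `len(text)` gives bits per character, the usual way compression schemes are compared.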
