Publication:
Lexical attraction for text compression

dc.contributor.author: Bach, Joscha
dc.contributor.author: Witten, Ian H.
dc.date.accessioned: 2008-10-17T02:52:40Z
dc.date.available: 2008-10-17T02:52:40Z
dc.date.issued: 1999-01
dc.description.abstract: New methods of acquiring structural information in text documents may support better compression by identifying an appropriate prediction context for each symbol. The method of “lexical attraction” infers syntactic dependency structures from statistical analysis of large corpora. We describe the generation of a lexical attraction model, discuss its application to text compression, and explore its potential to outperform fixed-context models such as word-level PPM. Perhaps the most exciting aspect of this work is the prospect of using compression as a metric for structure discovery in text. [en_US]
dc.format.mimetype: application/pdf
dc.identifier.citation: Bach, J. & Witten, I.H. (1999). Lexical attraction for text compression. (Working paper 99/01). Hamilton, New Zealand: University of Waikato, Department of Computer Science. [en_US]
dc.identifier.doi: 10.1109/DCC.1999.785673 [en_NZ]
dc.identifier.issn: 1170-487X
dc.identifier.uri: https://hdl.handle.net/10289/1030
dc.language.iso: en
dc.publisher: Computer Science, University of Waikato [en_NZ]
dc.relation.isPartOf: Data Compression Conference, DCC 1999, Snowbird, Utah, USA, March 29-31, 1999. [en_NZ]
dc.relation.ispartofseries: Computer Science Working Papers
dc.subject: computer science [en_US]
dc.subject: Machine learning
dc.title: Lexical attraction for text compression [en_US]
dc.type: Working Paper [en_US]
dspace.entity.type: Publication
pubs.begin-page: 516 [en_NZ]
pubs.end-page: 516 [en_NZ]
pubs.place-of-publication: Hamilton [en_NZ]
uow.relation.series: 99/01

Files

Original bundle

Name: uow-cs-wp-1999-01.pdf
Size: 761.14 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.8 KB
Format: Item-specific license agreed upon to submission