dc.contributor.author: Nevill-Manning, Craig G.
dc.contributor.author: Witten, Ian H.
dc.contributor.author: Olsen, Dan R., Jr.
dc.date.accessioned: 2010-12-02T21:15:14Z
dc.date.available: 2010-12-02T21:15:14Z
dc.date.issued: 1996
dc.identifier.citation: Nevill-Manning, C.G., Witten, I.H. & Olsen, D.R., Jr. (1996). Compressing semi-structured text using hierarchical phrase identification. In Data Compression Conference (DCC '96), Snowbird, Utah, March 31-April 3, 1996 (pp. 63-72). California, USA: IEEE Computer Society Press. [en_NZ]
dc.identifier.uri: https://hdl.handle.net/10289/4835
dc.description.abstract: Many computer files contain highly-structured, predictable information interspersed with information which has less regularity and is therefore less predictable, such as free text. Examples range from word-processing source files, which contain precisely-expressed formatting specifications enclosing tracts of natural-language text, to files containing a sequence of filled-out forms which have a predefined skeleton clothed with relatively unpredictable entries. These represent extreme ends of a spectrum. Word-processing files are dominated by free text, and respond well to general-purpose compression techniques. Forms generally contain database-style information, and are most appropriately compressed by taking into account their special structure. But one frequently encounters intermediate cases. For example, in many email messages the formal header and the informal free-text content are equally voluminous. Short SGML files often contain comparable amounts of formal structure and informal text. Although such files may be compressed quite well by general-purpose adaptive text compression algorithms, which will soon pick up the regular structure during the course of normal adaptation, better compression can often be obtained by methods that are equipped to deal with both formal and informal structure. [en_NZ]
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: IEEE Computer Society Press [en_NZ]
dc.relation.uri: http://www.computer.org/portal/web/csdl/doi/10.1109/DCC.1996.488311 [en_NZ]
dc.rights: ©1996 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. [en_NZ]
dc.subject: computer science [en_NZ]
dc.subject: compressing [en_NZ]
dc.subject: Machine learning
dc.title: Compressing semi-structured text using hierarchical phrase identification [en_NZ]
dc.type: Conference Contribution [en_NZ]
dc.identifier.doi: 10.1109/DCC.1996.488311 [en_NZ]