Using compression to identify acronyms in text

Yeates, Stuart Andrew; Bainbridge, David; Witten, Ian H.

Authors

Yeates, Stuart Andrew

Bainbridge, David

Witten, Ian H.

Files

uow-cs-wp-2000-01.pdf (726.6 KB)

Permanent Link

https://hdl.handle.net/10289/1018

Abstract

Text mining looks for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key technology for text mining, and backed this up with a study showing how particular kinds of lexical tokens (names, dates, locations, etc.) can be identified and located in running text, using compression models to provide the leverage necessary to distinguish different token types.
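
The idea of using compression models to tell token types apart can be illustrated with a minimal sketch. It is not the authors' implementation: a smoothed character-bigram model stands in for a real compression model, and the names `CharModel`, `code_length`, and `classify`, as well as the tiny training lists, are hypothetical. Each category gets its own model, and a token is assigned to the category whose model would encode it in the fewest bits.

```python
import math
from collections import Counter


class CharModel:
    """Character-bigram model with add-one smoothing.

    Assumption: this is a simple stand-in for a real compression model.
    """

    def __init__(self, examples, alphabet_size=256):
        self.alphabet_size = alphabet_size
        self.pair_counts = Counter()     # counts of (previous char, char)
        self.context_counts = Counter()  # counts of previous char
        for text in examples:
            padded = "^" + text  # "^" marks the start of a token
            for prev, cur in zip(padded, padded[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def code_length(self, token):
        """Bits an ideal coder driven by this model would spend on the token."""
        bits = 0.0
        padded = "^" + token
        for prev, cur in zip(padded, padded[1:]):
            seen = self.pair_counts[(prev, cur)] + 1
            total = self.context_counts[prev] + self.alphabet_size
            bits -= math.log2(seen / total)
        return bits


def classify(token, models):
    """Return the label whose model gives the shortest code length."""
    return min(models, key=lambda label: models[label].code_length(token))


# Illustrative training data only; a real system would train on large corpora.
models = {
    "date": CharModel(["12/03/1999", "01/01/2000", "23/11/1998", "07/06/1997"]),
    "name": CharModel(["Stuart Yeates", "David Bainbridge", "Ian Witten"]),
}

for token in ["15/08/1996", "David Yeates"]:
    print(token, "->", classify(token, models))
```

With these toy models, a string shaped like a date is cheaper to encode under the date model and a string shaped like a name is cheaper under the name model, which is the leverage the abstract refers to.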

Citation

Yeates, S., Bainbridge, D. & Witten, I.H. (2000). Using compression to identify acronyms in text. (Working paper 00/01). Hamilton, New Zealand: University of Waikato, Department of Computer Science.

Type

Working Paper

Series name

Computer Science Working Papers

Date

2000-01

Publisher

University of Waikato, Department of Computer Science
