Text categorization using compression models

Frank, Eibe; Chui, Chang; Witten, Ian H.

Authors

Frank, Eibe

Chui, Chang

Witten, Ian H.

Files

uow-cs-wp-2000-02.pdf (746.63 KB)

Permanent Link

https://hdl.handle.net/10289/1019

Abstract

Text categorization, the assignment of natural language texts to predefined categories based on their content, is of growing importance as the volume of information available on the internet continues to overwhelm us. The use of predefined categories implies a “supervised learning” approach to categorization: already-classified articles, which effectively define the categories, are used as “training data” to build a model that classifies the new articles making up the “test data”. This contrasts with “unsupervised” learning, where there is no training data and clusters of similar documents are sought among the test articles. With supervised learning, meaningful labels (such as keyphrases) are attached to the training documents, and appropriate labels can be assigned automatically to test documents according to the category they fall into.
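
The abstract describes the supervised setup but not the compression step named in the title. As a rough illustration only, the sketch below shows one common way compression can drive categorization: build one corpus per category from the training data, then assign a new document to the category whose corpus lets it be encoded in the fewest extra bytes. The use of Python's `zlib`, the helper names (`compressed_size`, `train`, `classify`), and the tiny example texts are all assumptions made for this sketch; they are not taken from the paper, whose actual compression scheme is not stated in this record.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to encode the text (zlib is a stand-in compressor)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def train(labeled_docs):
    """Join the training documents of each category into a single corpus string."""
    corpora = {}
    for label, doc in labeled_docs:
        corpora[label] = corpora.get(label, "") + doc + "\n"
    return corpora


def classify(corpora, doc: str) -> str:
    """Return the category whose corpus makes the new document cheapest to encode."""
    def extra_bytes(label):
        corpus = corpora[label]
        # Cost of the document given the category: growth in compressed size.
        return compressed_size(corpus + doc) - compressed_size(corpus)
    return min(corpora, key=extra_bytes)


# Hypothetical training data: (category label, article text) pairs.
training_data = [
    ("sport", "The home team scored a late goal to win the final match."),
    ("sport", "The coach praised the goalkeeper after the cup final."),
    ("finance", "The central bank raised interest rates to curb inflation."),
    ("finance", "Shares fell as investors worried about rising bond yields."),
]

corpora = train(training_data)
prediction = classify(corpora, "The home team scored a late goal")
print(prediction)
```

With realistic data each category corpus would be far larger, and a general-purpose compressor with a small window, such as zlib, would only see the most recent part of it, so this should be read as a toy demonstration of the idea rather than a usable classifier.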

Citation

Frank, E., Chui, C. & Witten, I.H. (2000). Text categorization using compression models. (Working paper 00/02). Hamilton, New Zealand: University of Waikato, Department of Computer Science.

Type

Working Paper

Series name

Computer Science Working Papers

Date

2000-01

Publisher

University of Waikato, Department of Computer Science
