Research Commons

      Text categorization using compression models

      Frank, Eibe; Chui, Chang; Witten, Ian H.
      Files
      uow-cs-wp-2000-02.pdf
      746.6Kb
      Citation
      Frank, E., Chui, C. & Witten, I.H. (2000). Text categorization using compression models. (Working paper 00/02). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
      Permanent Research Commons link: https://hdl.handle.net/10289/1019
      Abstract
      Text categorization, or the assignment of natural language texts to predefined categories based on their content, is of growing importance as the volume of information available on the internet continues to overwhelm us. The use of predefined categories implies a “supervised learning” approach to categorization, in which already-classified articles, which effectively define the categories, are used as “training data” to build a model for classifying the new articles that make up the “test data”. This contrasts with “unsupervised” learning, where there is no training data and clusters of similar documents are sought among the test articles. With supervised learning, meaningful labels (such as keyphrases) are attached to the training documents, and appropriate labels can be assigned automatically to test documents depending on which category they fall into.
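
      The abstract describes the supervised setting but not how a compression model is applied to it. As a rough illustration only, the sketch below assumes one common approach: treat the concatenated training text of each category as a model, and assign a test document to the category whose text lets it compress into the fewest additional bytes. Python's zlib is used as a stand-in compressor, and the names `compressed_size`, `classify`, and the toy `training` data are hypothetical; none of this is taken from the paper itself.

```python
import zlib


def compressed_size(text):
    """Number of bytes zlib needs for the UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document, training):
    """Pick the category whose training text best 'explains' the document.

    `training` maps a category label to the concatenated text of its
    already-classified articles (the training data). The score for a
    category is the number of extra bytes needed to compress the
    document once the compressor has seen that category's text.
    """
    def extra_bytes(category):
        corpus = training[category]
        return compressed_size(corpus + " " + document) - compressed_size(corpus)

    return min(training, key=extra_bytes)


# Toy training data (assumption: invented for illustration).
training = {
    "sport": (
        "the team won the match after scoring two late goals "
        "in the final minutes of the game while the home crowd cheered"
    ),
    "finance": (
        "the bank raised interest rates as inflation pushed share "
        "prices lower on the stock market and investors sold bonds"
    ),
}

test_doc = "scoring two late goals in the final minutes of the game"
print(classify(test_doc, training))
```

      A general-purpose compressor such as zlib only sees a limited window of preceding text, so this sketch is suitable for small examples rather than realistic training sets.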
      Date
      2000-01
      Type
      Working Paper
      Series
      Computer Science Working Papers
      Report No.
      00/02
      Publisher
      University of Waikato, Department of Computer Science
      Collections
      • 2000 Working Papers [12]