Research Commons

      Racing committees for large datasets.

      Frank, Eibe; Holmes, Geoffrey; Kirkby, Richard Brendon; Hall, Mark A.
      Files
      content.pdf
      387.7Kb
      Citation
      Frank, E., Holmes, G., Kirkby, R. & Hall, M. (2002). Racing committees for large datasets. (Working paper series. University of Waikato, Department of Computer Science. No. 03/02/2002). Hamilton, New Zealand: University of Waikato.
      Permanent Research Commons link: https://hdl.handle.net/10289/39
      Abstract
      This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers using a standard boosting algorithm. It allows the processing of large datasets even if the underlying base learning algorithm cannot efficiently do so. The basic idea is to split incoming data into chunks and build a committee from the classifiers built on these individual chunks [3]. Our method extends earlier work in two ways: (a) the best chunk size is chosen automatically by racing committees corresponding to different chunk sizes, and (b) the committees are pruned adaptively to keep the size of each individual committee as small as possible without negatively affecting accuracy. This paper shows that choosing an appropriate chunk size automatically is important because the accuracy of the resulting committee can vary significantly with the chunk size. It also shows that pruning is crucial to make the method practical for large datasets in terms of running time and memory requirements. Surprisingly, the results demonstrate that pruning can also improve accuracy.
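
      The idea in the abstract (chunk the data, build one committee per chunk size, keep each committee small, and let the committees compete) can be illustrated with a minimal sketch. It is not the authors' algorithm: the paper uses a standard boosting algorithm, whereas this sketch substitutes plain majority voting, a hypothetical nearest-centroid base learner, a simple "keep the new member only if validation accuracy improves" rule in place of the paper's adaptive pruning, and a sequential comparison in place of a true race over incoming data. All function names (train_centroids, build_committee, race) and the synthetic data are assumptions made for illustration.

```python
import random
from collections import Counter


def train_centroids(chunk):
    """Hypothetical base learner: nearest class centroid, trained on one chunk."""
    sums, counts = {}, Counter()
    for x, y in chunk:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def vote(committee, x):
    """Unweighted majority vote (the paper uses boosting instead)."""
    return Counter(member(x) for member in committee).most_common(1)[0][0]


def accuracy(committee, data):
    if not committee:
        return 0.0
    return sum(vote(committee, x) == y for x, y in data) / len(data)


def build_committee(stream, chunk_size, validation):
    """Train one classifier per full chunk; keep it only if validation accuracy rises."""
    committee, best = [], 0.0
    for start in range(0, len(stream) - chunk_size + 1, chunk_size):
        chunk = stream[start:start + chunk_size]
        candidate = committee + [train_centroids(chunk)]
        acc = accuracy(candidate, validation)
        if acc > best:  # simplified stand-in for adaptive pruning
            committee, best = candidate, acc
    return committee, best


def race(stream, chunk_sizes, validation):
    """Build one committee per chunk size and pick the most accurate on validation data."""
    results = {size: build_committee(stream, size, validation) for size in chunk_sizes}
    winner = max(results, key=lambda size: results[size][1])
    return winner, results


def make_data(n, rng):
    """Synthetic two-class data: Gaussian blobs around (0, 0) and (3, 3)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        centre = 3.0 * label
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    stream, validation = make_data(2000, rng), make_data(300, rng)
    winner, results = race(stream, [10, 50, 200], validation)
    for size, (committee, acc) in results.items():
        print(f"chunk size {size}: {len(committee)} members, accuracy {acc:.3f}")
    print("winning chunk size:", winner)
```

      Because a candidate member is discarded whenever it fails to improve validation accuracy, each committee in this sketch stays small regardless of how many chunks arrive, which mirrors the running-time and memory motivation stated in the abstract.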
      Date
      2002-06-01
      Type
      Working Paper
      Series
      Computer Science Working Papers
      Report No.
      03/02
      Publisher
      University of Waikato, Department of Computer Science
      Collections
      • 2002 Working Papers [12]