Racing committees for large datasets.

Frank, Eibe; Holmes, Geoffrey; Kirkby, Richard Brendon; Hall, Mark A.

Files

content.pdf (387.79 KB)

Permanent Link

https://hdl.handle.net/10289/39

Abstract

This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers using a standard boosting algorithm. The method can process large datasets even if the underlying base learning algorithm cannot do so efficiently. The basic idea is to split incoming data into chunks and build a committee from classifiers built on these individual chunks [3]. Our method extends earlier work in two ways: (a) the best chunk size is chosen automatically by racing committees corresponding to different chunk sizes, and (b) the committees are pruned adaptively to keep the size of each individual committee as small as possible without negatively affecting accuracy. This paper shows that choosing an appropriate chunk size automatically is important because the accuracy of the resulting committee can vary significantly with the chunk size. It also shows that pruning is crucial to make the method practical for large datasets in terms of running time and memory requirements. Surprisingly, the results demonstrate that pruning can also improve accuracy.
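The chunk-and-race idea summarized above can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It makes several assumptions:

- AdaBoost-style exponential reweighting stands in for "a standard boosting algorithm".
- One-level decision stumps serve as the base learner.
- A single held-out validation set is used for both pruning and racing.
- Pruning is simplified to a rule that keeps a new member only if validation accuracy improves. This is a stand-in for the paper's adaptive pruning.
- Elimination during the race uses a fixed margin.

The names `Committee`, `race`, `make_data`, `checkpoint` and `margin` are all hypothetical.

```python
import math
import random
from typing import List, Tuple

# (feature vector, label in {-1, +1})
Example = Tuple[List[float], int]


def train_stump(data: List[Example], weights: List[float]):
    """Fit a one-level decision stump that minimizes weighted training error."""
    best = None  # (weighted error, feature index, threshold, sign)
    for f in range(len(data[0][0])):
        for thr in sorted({x[f] for x, _ in data}):
            for sign in (1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if (sign if x[f] > thr else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    err, f, thr, sign = best
    return err / sum(weights), (f, thr, sign)


def stump_predict(stump, x: List[float]) -> int:
    f, thr, sign = stump
    return sign if x[f] > thr else -sign


class Committee:
    """Boosted committee that tries to add one stump per completed chunk."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.members = []  # list of (alpha, stump)
        self.buffer: List[Example] = []

    def score(self, x: List[float]) -> float:
        return sum(alpha * stump_predict(s, x) for alpha, s in self.members)

    def predict(self, x: List[float]) -> int:
        return 1 if self.score(x) >= 0 else -1

    def accuracy(self, data: List[Example]) -> float:
        return sum(self.predict(x) == y for x, y in data) / len(data)

    def add(self, example: Example, validation: List[Example]) -> None:
        self.buffer.append(example)
        if len(self.buffer) == self.chunk_size:
            self._grow(self.buffer, validation)
            self.buffer = []

    def _grow(self, chunk: List[Example], validation: List[Example]) -> None:
        # AdaBoost-style reweighting: examples the committee gets wrong weigh more.
        # The exponent is capped to avoid overflow/underflow in math.exp.
        weights = [math.exp(max(min(-y * self.score(x), 50.0), -50.0))
                   for x, y in chunk]
        err, stump = train_stump(chunk, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        if err >= 0.5:
            return  # no better than chance on the reweighted chunk
        before = self.accuracy(validation)
        self.members.append((0.5 * math.log((1 - err) / err), stump))
        # Simplified pruning (an assumption): keep a new member only if it
        # improves validation accuracy; the first member is always kept.
        if len(self.members) > 1 and self.accuracy(validation) <= before:
            self.members.pop()


def race(stream: List[Example], validation: List[Example], chunk_sizes: List[int],
         checkpoint: int = 500, margin: float = 0.05) -> Committee:
    """Race one committee per chunk size over the same stream; drop clear losers."""
    alive = [Committee(size) for size in chunk_sizes]
    for i, example in enumerate(stream, start=1):
        for committee in alive:
            committee.add(example, validation)
        if i % checkpoint == 0:
            accs = {id(c): c.accuracy(validation) for c in alive if c.members}
            if accs:
                best = max(accs.values())
                # Committees that have not finished a chunk yet stay in the race.
                alive = [c for c in alive
                         if id(c) not in accs or accs[id(c)] >= best - margin]
    return max(alive, key=lambda c: c.accuracy(validation) if c.members else 0.0)


def make_data(n: int, rng: random.Random) -> List[Example]:
    """Hypothetical toy data: one feature, label +1 iff the feature exceeds 0.5."""
    data = []
    for _ in range(n):
        x = [rng.random()]
        data.append((x, 1 if x[0] > 0.5 else -1))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    validation, stream = make_data(200, rng), make_data(1500, rng)
    winner = race(stream, validation, chunk_sizes=[50, 100, 250])
    print(winner.chunk_size, len(winner.members), winner.accuracy(validation))
```

Each candidate committee only ever holds one chunk in memory. The pruning rule bounds committee growth, because a member that does not help on validation data is discarded immediately. Reusing one validation set for both pruning and racing keeps the sketch short, but it gives optimistic estimates. Separate held-out data for each decision would be more careful.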

Citation

Frank, E., Holmes, G., Kirkby, R. & Hall, M. (2002). Racing committees for large datasets. (Working paper series. University of Waikato, Department of Computer Science. No. 03/02/2002). Hamilton, New Zealand: University of Waikato.

Type

Working Paper

Series name

Computer Science Working Papers

Date

2002-06-01

Publisher

University of Waikato, Department of Computer Science
