Learning from batched data: model combination vs data combination

Ting, Kai Ming; Low, Boon Toh; Witten, Ian H.

Authors

Ting, Kai Ming

Low, Boon Toh

Witten, Ian H.

Files

uow-cs-wp-1997-14.pdf (4.7 MB)

Permanent Link

https://hdl.handle.net/10289/1077

Abstract

When presented with multiple batches of data, one can either combine them into a single batch before applying a machine learning procedure or learn from each batch independently and combine the resulting models. The former procedure, data combination, is straightforward; this paper investigates the latter, model combination. Given an appropriate combination method, one might expect model combination to prove superior when the data in each batch were obtained under somewhat different conditions or when different learning algorithms were used on the batches. Empirical results show that model combination often outperforms data combination even when the batches are drawn randomly from a single source of data and the same learning method is used on each. Moreover, this is not just an artifact of one particular method of combining models: it occurs with several different combination methods. We relate this phenomenon to the learning curve of the classifiers being used. Early in the learning process, when the learning curve is steep, there is much to gain from data combination; later, when it becomes shallow, there is less to gain, and model combination achieves a greater reduction in variance and hence a lower error rate. The practical implication of these results is that one should consider using model combination rather than data combination, especially when multiple batches of data for the same task are readily available. It is often superior even when the batches are drawn randomly from a single sample, and we expect its advantage to increase if genuine statistical differences between the batches exist.
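The two procedures contrasted in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the learner is a hypothetical nearest-centroid classifier written from scratch, the combination method is plain majority voting (the abstract says several combination methods were tried but does not name them here), and the synthetic two-class Gaussian data is made up purely for demonstration.

```python
import random
from collections import Counter


def nearest_centroid_fit(rows):
    """Hypothetical learner: store one mean feature vector per class label."""
    sums, counts = {}, Counter()
    for x, y in rows:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_centroid_predict(model, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(model, key=lambda y: sqdist(model[y]))


def data_combination(batches, fit):
    """Pool every batch into one training set, then learn a single model."""
    pooled = [row for batch in batches for row in batch]
    return fit(pooled)


def model_combination(batches, fit):
    """Learn one model per batch, independently."""
    return [fit(batch) for batch in batches]


def vote(models, predict, x):
    """Assumed combiner: majority vote. Ties (possible with an even number
    of models) go to the label that was voted for first."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


def make_data(n, rng):
    """Synthetic i.i.d. data: class 0 centred at (0, 0), class 1 at (3, 3)."""
    rows = []
    for _ in range(n):
        y = rng.choice([0, 1])
        mu = 0.0 if y == 0 else 3.0
        rows.append(([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)], y))
    return rows


def error_rate(classify, rows):
    wrong = sum(1 for x, y in rows if classify(x) != y)
    return wrong / len(rows)


rng = random.Random(0)
train = make_data(400, rng)
test = make_data(2000, rng)

# Rows are i.i.d., so strided slices behave like randomly drawn batches.
k = 4
batches = [train[i::k] for i in range(k)]

pooled_model = data_combination(batches, nearest_centroid_fit)
batch_models = model_combination(batches, nearest_centroid_fit)

err_data = error_rate(lambda x: nearest_centroid_predict(pooled_model, x), test)
err_model = error_rate(
    lambda x: vote(batch_models, nearest_centroid_predict, x), test
)
print(f"data combination error:  {err_data:.3f}")
print(f"model combination error: {err_model:.3f}")
```

The numbers this prints are not meant to reproduce the paper's findings. The point is the structure: data combination calls the learner once on the pooled batches, while model combination calls it once per batch and defers the decision to a combiner. Swapping in a higher-variance learner, a different combiner, or batches drawn under different conditions is where the abstract's learning-curve argument would come into play.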

Citation

Ting, K.M., Low, B.T. & Witten, I.H. (1997). Learning from batched data: model combination vs data combination. (Working paper 97/14). Hamilton, New Zealand: University of Waikato, Department of Computer Science.

Type

Working Paper

Series name

Computer Science Working Papers

Date

1997-05

Publisher

Department of Computer Science, University of Waikato