Learning from batched data: model combination vs data combination

Ting, Kai Ming; Low, Boon Toh; Witten, Ian H.

dc.contributor.author	Ting, Kai Ming
dc.contributor.author	Low, Boon Toh
dc.contributor.author	Witten, Ian H.
dc.date.accessioned	2008-10-20T03:49:44Z
dc.date.available	2008-10-20T03:49:44Z
dc.date.issued	1997-05
dc.description.abstract	When presented with multiple batches of data, one can either combine them into a single batch before applying a machine learning procedure or learn from each batch independently and combine the resulting models. The former procedure, data combination, is straightforward; this paper investigates the latter, model combination. Given an appropriate combination method, one might expect model combination to prove superior when the data in each batch was obtained under somewhat different conditions or when different learning algorithms were used on the batches. Empirical results show that model combination often outperforms data combination even when the batches are drawn randomly from a single source of data and the same learning method is used on each. Moreover, this is not just an artifact of one particular method of combining models: it occurs with several different combination methods. We relate this phenomenon to the learning curve of the classifiers being used. Early in the learning process when the learning curve is steep there is much to gain from data combination, but later when it becomes shallow there is less to gain and model combination achieves a greater reduction in variance and hence a lower error rate. The practical implication of these results is that one should consider using model combination rather than data combination, especially when multiple batches of data for the same task are readily available. It is often superior even when the batches are drawn randomly from a single sample, and we expect its advantage to increase if genuine statistical differences between the batches exist.	en_US
dc.format.mimetype	application/pdf
dc.identifier.citation	Ting, K.M., Low, B.T. & Witten, I.H. (1997). Learning from batched data: model combination vs data combination. (Working paper 97/14). Hamilton, New Zealand: University of Waikato, Department of Computer Science.	en_US
dc.identifier.issn	1170-487X
dc.identifier.uri	https://hdl.handle.net/10289/1077
dc.language.iso	en
dc.publisher	Department of Computer Science, University of Waikato	en_NZ
dc.relation.ispartofseries	Computer Science Working Papers
dc.subject	Computer science	en_US
dc.subject	Machine learning
dc.title	Learning from batched data: model combination vs data combination	en_US
dc.type	Working Paper	en_US
pubs.begin-page	83	en_NZ
pubs.elements-id	54840
pubs.end-page	106	en_NZ
pubs.place-of-publication	Hamilton	en_NZ
uow.relation.series	97/14