Ensembles of balanced nested dichotomies for multi-class problems

Rights

This is an author’s accepted version of a conference paper published in Proc 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. © 2005 Springer.

Abstract

A system of nested dichotomies is a hierarchical decomposition of a multi-class problem with c classes into c−1 two-class problems and can be represented as a tree structure. Ensembles of randomly generated nested dichotomies have proven to be an effective approach to multi-class learning problems [1]. However, sampling trees by giving each tree equal probability means that the depth of a tree is limited only by the number of classes, and very unbalanced trees can negatively affect runtime. In this paper, we investigate two approaches to building balanced nested dichotomies—class-balanced nested dichotomies and data-balanced nested dichotomies—and evaluate them in the same ensemble setting. Using C4.5 decision trees as the base models, we show that both approaches can reduce runtime with little or no effect on accuracy, especially on problems with many classes. We also investigate the effect of caching models when building ensembles of nested dichotomies.
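
As a rough illustration of the structure described in the abstract, the following minimal sketch builds one random nested dichotomy in which each split divides the classes as evenly as possible. It assumes that "class-balanced" means an even split by class count at every node; the function names (`build_class_balanced`, `count_internal`, `depth`) are hypothetical and are not taken from the paper or from any library. No base learners such as C4.5 are trained here; only the tree over class labels is built.

```python
import random


def build_class_balanced(classes, rng):
    """Build a random nested dichotomy over `classes`.

    Leaves are single class labels; internal nodes are 2-tuples
    (left_subtree, right_subtree). Labels must not themselves be tuples.
    Assumption: each split is an even split by class count.
    """
    classes = list(classes)
    if len(classes) == 1:
        return classes[0]
    rng.shuffle(classes)          # random assignment of classes to the two sides
    mid = len(classes) // 2       # sides differ in size by at most one class
    return (build_class_balanced(classes[:mid], rng),
            build_class_balanced(classes[mid:], rng))


def count_internal(tree):
    """Number of internal nodes, i.e. the number of two-class problems."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + count_internal(tree[0]) + count_internal(tree[1])


def depth(tree):
    """Length of the longest root-to-leaf path."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))


if __name__ == "__main__":
    tree = build_class_balanced(range(10), random.Random(0))
    print(tree)
    print("two-class problems:", count_internal(tree))
    print("depth:", depth(tree))
```

For c classes the tree always contains c−1 internal nodes, matching the c−1 two-class problems mentioned above, and even splitting keeps the depth near log2(c) rather than letting it grow up to c−1 as it can under uniform sampling of trees. An ensemble would repeat the construction with different random seeds.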

Publisher

SPRINGER-VERLAG BERLIN
