On calibration of nested dichotomies
Leathart, T., Frank, E., Pfahringer, B., & Holmes, G. (2019). On calibration of nested dichotomies. In Q. Yang, Z.-H. Zhou, Z. Gong, M.-L. Zhang, & S.-J. Huang (Eds.), Advances in Knowledge Discovery and Data Mining: Proceedings of the 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2019), LNCS 11439, Part I (pp. 69–80). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-16148-4_6
Permanent Research Commons link: https://hdl.handle.net/10289/12887
Nested dichotomies (NDs) are a method of transforming a multiclass classification problem into a series of binary problems. A tree structure is induced that recursively splits the set of classes into subsets, and a binary classification model learns to discriminate between the two subsets of classes at each node. In this paper, we demonstrate that NDs typically exhibit poor probability calibration, even when the binary base models are well-calibrated. We also show that this problem is exacerbated when the binary models are themselves poorly calibrated. We discuss the effectiveness of different calibration strategies and show that accuracy and log-loss can be significantly improved by calibrating both the internal base models and the full ND structure, especially when the number of classes is large.
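The combination rule underlying NDs — multiplying the binary models' probability estimates along the path from the root to each class leaf — can be sketched as follows. The tree shape and the branch probabilities below are invented for illustration and are not taken from the paper:

```python
def nd_class_probabilities(node, p=1.0, out=None):
    """Combine binary estimates in a nested dichotomy into multiclass probabilities.

    A node is either a class label (a leaf) or a tuple
    (p_left, left_subtree, right_subtree), where p_left is the binary
    model's estimated probability that the instance belongs to the left
    subset of classes. Each class's probability is the product of the
    branch probabilities on its root-to-leaf path.
    """
    if out is None:
        out = {}
    if not isinstance(node, tuple):  # leaf: a single class
        out[node] = p
        return out
    p_left, left, right = node
    nd_class_probabilities(left, p * p_left, out)          # left branch
    nd_class_probabilities(right, p * (1.0 - p_left), out)  # right branch
    return out

# Hypothetical ND over four classes {a, b, c, d}, split as ({a,b}, {c,d}).
# Each internal node carries an (assumed) binary estimate P(left subset | x).
tree = (0.6,               # P({a, b} | x) = 0.6
        (0.5, "a", "b"),   # within {a, b}: P(a | x) = 0.5
        (0.25, "c", "d"))  # within {c, d}: P(c | x) = 0.25

probs = nd_class_probabilities(tree)
# probs == {"a": 0.3, "b": 0.3, "c": 0.1, "d": 0.3}; the estimates sum to 1.
```

Because each class probability is a product of several binary estimates, small calibration errors at internal nodes compound multiplicatively, which is one intuition for why NDs can be poorly calibrated even when every binary model is individually well-calibrated.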
© 2019 Springer Nature Switzerland AG. This is the author's accepted version. The final publication is available at Springer via https://doi.org/10.1007/978-3-030-16148-4_6