Carnein, M., Trautmann, H., Bifet, A., & Pfahringer, B. (2020). confStream: Automated algorithm selection and configuration of stream clustering algorithms. In I. S. Kotsireas & P. M. Pardalos (Eds.), Proceedings of the 14th International Conference on Learning and Intelligent Optimization (LION 2020) (Vol. LNCS 12096, pp. 80–95). Athens, Greece: Springer. https://doi.org/10.1007/978-3-030-53552-0_10
Permanent Research Commons link: https://hdl.handle.net/10289/14113
Machine learning has become one of the most important tools in data analysis. However, selecting the most appropriate machine learning algorithm and tuning its hyperparameters to their optimal values remains a difficult task. This is even more difficult for streaming applications where automated approaches are often not available to help during algorithm selection and configuration. This paper proposes the first approach for automated algorithm selection and configuration of stream clustering algorithms. We train an ensemble of different stream clustering algorithms and configurations in parallel and use the best performing configuration to obtain a clustering solution. By drawing new configurations from better performing ones, we are able to improve the ensemble performance over time. In large experiments on real and artificial data we show how our ensemble approach can improve upon default configurations and can also compete with a-posteriori algorithm configuration. Our approach is considerably faster than a-posteriori approaches and applicable in real-time. In addition, it is not limited to stream clustering and can be generalised to all streaming applications, including stream classification and regression.
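The ensemble idea summarised in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation. Everything named here is an assumption made for illustration: the 1-D `OnlineKMeans` stand-in for a stream clustering algorithm, the penalised squared-distance cost used as the quality measure, the `perturb` rule for drawing a new configuration, and the parameters `ensemble_size` and `horizon`. The sketch trains several configurations in parallel on the stream, periodically draws a new configuration from a better-performing one, and finally returns the best-performing member.

```python
import random


class OnlineKMeans:
    """Toy 1-D stream clusterer (hypothetical stand-in, not a real library class)."""

    def __init__(self, k, lr, rng):
        self.k, self.lr = k, lr
        self.centers = [rng.uniform(0.0, 10.0) for _ in range(k)]

    def learn_one(self, x):
        # Move the nearest centre a fraction `lr` of the way towards x.
        i = min(range(self.k), key=lambda j: abs(x - self.centers[j]))
        self.centers[i] += self.lr * (x - self.centers[i])

    def cost(self, window):
        # Assumed quality measure: mean squared distance to the nearest
        # centre plus a small penalty per cluster (lower is better).
        ssq = sum(min((x - c) ** 2 for c in self.centers) for x in window)
        return ssq / len(window) + 0.1 * self.k


def perturb(config, rng):
    """Draw a new configuration near an existing one (assumed sampling rule)."""
    k = max(1, config["k"] + rng.choice([-1, 0, 1]))
    lr = min(0.9, max(0.01, config["lr"] * rng.uniform(0.5, 2.0)))
    return {"k": k, "lr": lr}


def run_ensemble(stream, ensemble_size=5, horizon=100, seed=0):
    rng = random.Random(seed)
    configs = [{"k": rng.randint(1, 6), "lr": rng.uniform(0.01, 0.5)}
               for _ in range(ensemble_size)]
    models = [OnlineKMeans(c["k"], c["lr"], rng) for c in configs]
    window, recent = [], []

    for x in stream:
        for m in models:          # every ensemble member sees every point
            m.learn_one(x)
        window.append(x)
        if len(window) == horizon:
            costs = [m.cost(window) for m in models]
            # Pick a parent with probability proportional to 1 / cost,
            # so better-performing configurations are chosen more often.
            weights = [1.0 / c for c in costs]
            parent = rng.choices(range(len(models)), weights=weights)[0]
            child_cfg = perturb(configs[parent], rng)
            child = OnlineKMeans(child_cfg["k"], child_cfg["lr"], rng)
            for y in window:      # train the newcomer on the current window
                child.learn_one(y)
            # Replace the worst member only if the newcomer beats it.
            worst = max(range(len(models)), key=costs.__getitem__)
            if child.cost(window) < costs[worst]:
                configs[worst], models[worst] = child_cfg, child
            recent, window = window, []

    eval_window = window or recent
    if not eval_window:
        raise ValueError("stream produced no data")
    costs = [m.cost(eval_window) for m in models]
    best = min(range(len(models)), key=costs.__getitem__)
    return configs[best], models[best], costs[best]
```

Calling `run_ensemble(stream)` on any iterable of floats returns the configuration, the fitted toy model, and its cost on the most recent window. The same loop shape would apply to other streaming learners by swapping the model class and the quality measure.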
This is a post-peer-review, pre-copyedit version of an article published in the Proceedings of 14th International Conference on Learning and Intelligent Optimization (LION 2020). The final authenticated version is available online at: http://dx.doi.org/10.1007/978-3-030-53552-0_10