Carnein, M., Trautmann, H., Bifet, A., & Pfahringer, B. (2019). Towards automated configuration of stream clustering algorithms. In P. Cellier & K. Driessens (Eds.), Machine Learning and Knowledge Discovery in Databases: Proc ECML PKDD 2019, Part 1 (Vol. CCIS 1167, pp. 137–143). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-43823-4_12
Permanent Research Commons link: https://hdl.handle.net/10289/13746
Clustering is an important technique in data analysis which can reveal hidden patterns and unknown relationships in the data. A common problem in clustering is the proper choice of parameter settings. To tackle this, automated algorithm configuration can be used to find the best parameter settings. In practice, however, many of today’s data sources are data streams due to the widespread deployment of sensors, the internet-of-things or (social) media. Stream clustering aims to tackle this challenge by identifying, tracking and updating clusters over time. Unfortunately, none of the existing approaches for automated algorithm configuration are directly applicable to the streaming scenario. In this paper, we explore the possibility of automated algorithm configuration for stream clustering algorithms using an ensemble of different configurations. In initial experiments, we demonstrate that our approach is able to automatically find superior configurations and refine them over time.
This is a post-peer-review, pre-copyedit version of an article published in Proceedings of ECML PKDD: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-43823-4_12