Improving ensembles and prediction intervals for machine learning on data streams

Sun, Yibin

Permanent Link

https://hdl.handle.net/10289/17616

Abstract

The rapid growth of streaming data poses significant challenges for traditional machine learning, including common tasks such as regression and classification. This thesis proposes adaptive and dynamic methods that address key issues in evolving data streams, including concept drift, uncertainty quantification, and ensemble optimization. The Self-Optimising K Nearest Leaves (SOKNL) regression algorithm integrates k-Nearest Neighbors (kNN) with Adaptive Random Forest Regression (ARF-Reg), dynamically optimizing neighbor selection to improve regression accuracy without relying on fixed window sizes. Extensive experiments suggest that SOKNL outperforms state-of-the-art streaming regression algorithms, including ARF-Reg, on which it is built. For classification, the Dynamic Ensemble Member Selection (DEMS) method dynamically adjusts the ensemble size and selects members based on accuracy and diversity, improving predictive performance while handling concept drift. DEMS carries SOKNL's idea of dynamically selecting ensemble members over to classification, with more flexible selection criteria. The Adaptive Prediction Interval (AdaPI) framework provides robust uncertainty quantification by adjusting prediction intervals according to their historical coverage, ensuring reliability in streaming regression. To evaluate prediction intervals holistically, the thesis introduces Coverage Interval Width in Non-dominated Groups (CING), a multi-objective evaluation method that balances interval width against coverage. To evaluate the proposed regression methods, this thesis also contributes the New Zealand Energy Pricing (NZEP) datasets, a comprehensive repository for real-time energy analytics. NZEP aims to provide a real, growing, and customizable regression data source that can enrich current regression benchmarks for stream learning and, potentially, for time-series analysis.
By providing scalable, adaptive solutions for regression and classification, this research advances real-time decision-making in streaming data environments.
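To make the idea of "non-dominated groups" over interval width and coverage more concrete, the following is a generic non-dominated sorting (Pareto-front peeling) sketch. It is not the thesis's CING procedure. It assumes that higher coverage and narrower average width are both preferred, and the function names, method labels, and numbers are hypothetical and purely illustrative, not results from the thesis.

```python
from typing import Dict, List, Tuple

# Each hypothetical result is (coverage, average interval width).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Assumed preference: a dominates b if its coverage is no lower and its
    width is no larger, and it is strictly better in at least one of the two."""
    cov_a, width_a = a
    cov_b, width_b = b
    no_worse = cov_a >= cov_b and width_a <= width_b
    strictly_better = cov_a > cov_b or width_a < width_b
    return no_worse and strictly_better


def non_dominated_groups(results: Dict[str, Point]) -> List[List[str]]:
    """Repeatedly peel off the set of methods no remaining method dominates.
    Group 0 is the first Pareto front, group 1 the next, and so on."""
    remaining = dict(results)
    groups: List[List[str]] = []
    while remaining:
        front = [
            name
            for name, point in remaining.items()
            if not any(
                dominates(other, point)
                for other_name, other in remaining.items()
                if other_name != name
            )
        ]
        groups.append(sorted(front))
        for name in front:
            del remaining[name]
    return groups


if __name__ == "__main__":
    # Hypothetical (coverage, width) pairs for four unnamed interval methods.
    example = {
        "method_a": (0.95, 2.0),
        "method_b": (0.90, 1.5),
        "method_c": (0.88, 1.8),
        "method_d": (0.95, 2.5),
    }
    print(non_dominated_groups(example))
    # → [['method_a', 'method_b'], ['method_c', 'method_d']]
```

Here `method_a` and `method_b` form the first group because each trades coverage against width without being beaten on both, while `method_c` (dominated by `method_b`) and `method_d` (dominated by `method_a`) fall into the second group. Grouping rather than producing a single score avoids committing to an arbitrary weighting between the two objectives.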

Type

Thesis

Date

2025

Publisher

The University of Waikato

Degree

Doctor of Philosophy (PhD)

Supervisor

Pfahringer, Bernhard
Bifet, Albert
Gomes, Heitor Murilo