Selected data exploration methods in hydroclimatology
Vetrova, V. (2016). Selected data exploration methods in hydroclimatology (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/10824
Permanent Research Commons link: https://hdl.handle.net/10289/10824
The volume of climatological data is growing rapidly owing to the development of new acquisition platforms and advances in data storage technologies. Such advances pose challenging new problems for data analysis methods. As a result, there is increasing interest in the application and development of new machine learning and data mining methods in climatological data analysis. This dissertation contributes to the field of data analysis by evaluating selected methods and developing new techniques with application to hydro-climatological datasets. It is shown that data pre-processing with the decimated discrete wavelet transform can cause false predictive accuracy in regression machine learning algorithms. A general result is obtained that a decimated discrete wavelet transform based on a pyramidal algorithm requires some future values of the time series concerned. When the discrete wavelet transform is used as a pre-processing step for forecasting the time series, the necessary independence of calibration and validation data is compromised. This in turn translates into over-optimistic forecasting accuracy, or even the illusion of forecasting skill when there is none. The result is general and has wide implications in any discipline where the discrete wavelet transform is used in forecasting frameworks. In addition, a general framework for creating simple predictive models is presented, based on LASSO regularised regression. The method is illustrated for time series modelling with external event forcing, but the approach has general applicability. As a contribution to association discovery, two tests of bivariate association are developed. The first method is designed for detecting threshold-like associations in a scatter plot, with particular reference to testing the significance of the extent of a data-sparse region within a scatter plot. The second method is a more general test of non-random associations.
Both methods use significance testing based on randomisations. Finally, LASSO regularised regression is investigated as a tool for discovering informative large-scale climatological predictors of local hydrological processes. A cross-validation scheme is proposed that mirrors practical forecasting of the next time interval to come, while at the same time maximising the use of available information. The proposed methodology was applied to a case study predicting next-season river discharges in the Upper Waitaki River in New Zealand. The proposed forecasting methodology and cross-validation framework are applicable to similar hydroclimatological forecasting situations. The physical aspect of this part of the study included discovering the influence of the Interdecadal Pacific Oscillation on winter discharges in the upper Waitaki catchment.
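The look-ahead property of the decimated discrete wavelet transform described in the abstract can be illustrated with a small numerical sketch. The example below is not taken from the thesis: it assumes the Haar filter (the simplest case), a series length divisible by 2**levels, and a hypothetical helper named haar_pyramid. It perturbs only the final sample of a series and checks which coefficients change.

```python
import numpy as np


def haar_pyramid(x, levels):
    """Decimated Haar DWT computed with the pyramidal algorithm.

    Hypothetical helper for illustration only. Returns the detail
    coefficients per level (finest first) and the final approximation.
    Assumes len(x) is divisible by 2**levels.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return details, approx


rng = np.random.default_rng(0)
series = rng.normal(size=16)
levels = 3

details, approx = haar_pyramid(series, levels)

# Change only the last observation, i.e. the most "future" value.
perturbed = series.copy()
perturbed[-1] += 1.0
details_p, approx_p = haar_pyramid(perturbed, levels)

# Each level-3 approximation coefficient summarises a block of 2**3 samples.
block = 2 ** levels
first_sample_in_block = len(series) - block

# Indices of approximation coefficients affected by the perturbation.
changed_approx = np.flatnonzero(~np.isclose(approx, approx_p))

print("changed approximation coefficients:", changed_approx.tolist())
print("earliest sample sharing that coefficient:", first_sample_in_block)
```

In this sketch the last level-3 approximation coefficient covers samples 8 to 15, so a feature attached to any of those time steps already carries information from sample 15. If such coefficients are used as predictors before the series is split into calibration and validation periods, later values can leak into earlier inputs, which is the kind of dependence the abstract warns about.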
University of Waikato
All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.