Reinforcement learning for optimisation of a cascade river system

Rights

All items in Research Commons are provided for private study and research purposes and are protected by copyright with all rights reserved unless otherwise indicated.

Abstract

Getting the most out of renewable power stations has both financial benefits and the environmental benefit of displacing electricity generation from fossil fuels. This thesis considers how to reduce losses in the Waikato Hydro Scheme that are caused by inefficient operation, while ensuring that no operating constraints are breached. The hydroelectric dams are currently run centrally by a human "hydro controller", who decides how much generation to assign to each station at any given time. The hydro controller ensures that no operating constraints are breached, but it may be possible to improve the running efficiency of the hydro stations. Hydro controllers currently have only basic tools to assist them in deciding which action to take, so the aim of this thesis is to create live recommendations to support those decisions. To this end, historical data is used to train a reinforcement learning model that recommends, live, how much generation to assign to each station. The Waikato River hydro control problem is formulated as a Markov decision process, and several variations on the Monte Carlo control algorithm for reinforcement learning are proposed and compared empirically. The problem has a large action space, with billions of distinct ways to assign generation to stations, so the K-medoids algorithm is used to cluster actions and make learning practical. Furthermore, once a model has been learned, a hill-climbing approach is used to improve the quality of the selected actions. Combining these techniques gave better running efficiency than historical human performance in 18% of scenarios without breaching any operating constraints. However, constraints were breached in 34% of scenarios, so possible improvements to address this are also discussed.
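
The abstract does not specify the state representation, the learned value estimates, or the neighbourhood used for hill climbing, so the following is only a minimal sketch of how a clustered action could be refined by local search. It assumes an action is a tuple of generation levels (one per station), that a learned model can be queried through a hypothetical value function, and that each station has a simple lower and upper generation limit. All names (Action, hill_climb, recommend, value, limits, medoids) are illustrative and are not taken from the thesis.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: generation level assigned to each station.
Action = Tuple[int, ...]


def hill_climb(
    start: Action,
    value: Callable[[Action], float],
    limits: Sequence[Tuple[int, int]],
    step: int = 1,
    max_iters: int = 1000,
) -> Action:
    """Greedy local search: repeatedly move to the best neighbour that
    changes one station by +/- step, until no neighbour scores higher."""
    current, current_value = start, value(start)
    for _ in range(max_iters):
        best, best_value = current, current_value
        for i, (lo, hi) in enumerate(limits):
            for delta in (-step, step):
                level = current[i] + delta
                if not lo <= level <= hi:
                    continue  # skip neighbours outside the assumed station limits
                candidate = current[:i] + (level,) + current[i + 1:]
                candidate_value = value(candidate)
                if candidate_value > best_value:
                    best, best_value = candidate, candidate_value
        if best == current:
            break  # local optimum: no single-station change improves the value
        current, current_value = best, best_value
    return current


def recommend(
    medoids: Sequence[Action],
    value: Callable[[Action], float],
    limits: Sequence[Tuple[int, int]],
) -> Action:
    """Pick the highest-valued cluster medoid, then refine it by hill climbing."""
    start = max(medoids, key=value)
    return hill_climb(start, value, limits)
```

In this sketch the medoids stand in for the representative actions produced by K-medoids clustering, and the hill-climbing step searches the unclustered action space around the chosen medoid. Like any greedy local search, it can stop at a local optimum, and a real value function would also need to reflect the operating constraints.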

Publisher

The University of Waikato
