MaxGain: Regularisation of neural networks by constraining activation magnitudes
Gouk, H., Pfahringer, B., Frank, E., & Cree, M. J. (2019). MaxGain: Regularisation of neural networks by constraining activation magnitudes. In M. Berlingerio, F. Bonchi, T. Gärtner, N. Hurley, & G. Ifrim (Eds.), Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science (Vol. 11051, pp. 541–556). Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-10925-7_33
Permanent Research Commons link: https://hdl.handle.net/10289/12301
Effective regularisation of neural networks is essential to combat overfitting due to the large number of parameters involved. We present an empirical analogue to the Lipschitz constant of a feed-forward neural network, which we refer to as the maximum gain. We hypothesise that constraining the gain of a network will have a regularising effect, similar to how constraining the Lipschitz constant of a network has been shown to improve generalisation. A simple algorithm is provided that involves rescaling the weight matrix of each layer after each parameter update. We conduct a series of studies on common benchmark datasets, and also a novel dataset that we introduce to enable easier significance testing for experiments using convolutional networks. Performance on these datasets compares favourably with other common regularisation techniques. Data related to this paper is available at: https://www.cs.waikato.ac.nz/~ml/sins10/.
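The rescaling step mentioned in the abstract might look roughly like the following sketch for a single linear layer. It assumes, which the abstract does not state, that the gain of a layer is the largest ratio of output norm to input norm (Euclidean) observed over a batch, and that the weights are scaled down only when this ratio exceeds a threshold. The names `batch_gain`, `rescale_to_max_gain`, and `gamma` are hypothetical, and the paper's actual definition and algorithm may differ.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def batch_gain(W, batch, eps=1e-12):
    """Assumed definition: largest ||W x|| / ||x|| over the batch."""
    return max(l2_norm(matvec(W, x)) / max(l2_norm(x), eps) for x in batch)


def rescale_to_max_gain(W, batch, gamma):
    """Scale W down so its batch gain does not exceed gamma (hypothetical threshold).

    Intended to be called after each parameter update; W is returned
    unchanged when the constraint is already satisfied.
    """
    gain = batch_gain(W, batch)
    if gain <= gamma:
        return W
    scale = gamma / gain
    return [[w * scale for w in row] for row in W]


# Toy layer and batch, chosen only for illustration.
W = [[3.0, 0.0], [0.0, 1.0]]
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

gain_before = batch_gain(W, batch)
W_new = rescale_to_max_gain(W, batch, gamma=2.0)
gain_after = batch_gain(W_new, batch)
```

Because the gain is measured on observed activations rather than over all possible inputs, it can be no larger than the layer's true Lipschitz constant under the same norm, which is why the abstract calls it an empirical analogue.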
© 2019 Springer Nature Switzerland AG. This is the author's accepted version. The final publication is available at Springer via dx.doi.org/10.1007/978-3-030-10925-7_33