Learning discrete and Lipschitz representations
Gouk, H. (2019). Learning discrete and Lipschitz representations (Thesis, Doctor of Philosophy (PhD)). The University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/13144
Learning to embed data into a low-dimensional vector space that is more useful for some downstream task is one of the most common problems addressed in the representation learning literature. Conventional approaches to this problem typically rely on training neural networks with labelled data. To construct an accurate embedding function that generalises to data not seen during training, one must either gather a very large training dataset or adequately bias the learning process. This thesis focuses on incorporating new inductive biases into the representation learning paradigm by constraining the set of functions from which a learned feature extractor can be drawn.
The first part of this thesis investigates how one can learn a mapping that changes slowly with respect to its input. This is first addressed by deriving an upper bound on the Lipschitz constant of common feed-forward neural network architectures, and subsequently demonstrating how this bound can be used to constrain the Lipschitz constant during training. Next, the thesis investigates how a similar goal can be accomplished when the inputs of interest are assumed to lie near a low-dimensional manifold embedded in a high-dimensional vector space. This results in an algorithm that takes advantage of an empirical analogue of the Lipschitz constant. Experimental results show that these methods compare favourably with other methods commonly used to impose inductive biases on neural network learning algorithms.
In the second part of this thesis, methods for extracting representations using decision tree models are developed. The first is a problem transformation approach that allows existing tree induction techniques to be reused. The second shows how decision trees can be constructed incrementally using gradient information as the source of supervision, which allows an ensemble of decision trees to be used as a layer in a neural network.
The experimental results indicate that these approaches improve the performance of representation learning on tabular data across multiple tasks.
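As an illustration of the general idea behind the first part, the following is a minimal sketch, not the thesis's implementation, of bounding and constraining the Lipschitz constant of a feed-forward network. It assumes a network built only from dense layers (weight matrices of shape outputs by inputs) separated by ReLU activations, which are 1-Lipschitz; biases do not affect the constant. Under these assumptions, the Lipschitz constant with respect to a p-norm is at most the product of the layer-wise operator norms, and each layer can be kept within a per-layer limit by rescaling its weights after every update. The names `forward`, `lipschitz_upper_bound`, `project` and the limit `lam` are hypothetical.

```python
import numpy as np


def forward(weights, x):
    """Dense layers (no biases) with ReLU between them; the last layer is linear."""
    for i, w in enumerate(weights):
        x = w @ x
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU is 1-Lipschitz in any p-norm
    return x


def lipschitz_upper_bound(weights, p=2):
    """Product of the operator norms induced by the vector p-norm (p in {1, 2, inf})."""
    bound = 1.0
    for w in weights:
        # For 2-D arrays: ord=1 is the max absolute column sum, ord=2 the largest
        # singular value, ord=np.inf the max absolute row sum.
        bound *= np.linalg.norm(w, ord=p)
    return float(bound)


def project(w, lam, p=2):
    """Rescale w so its operator norm is at most lam (unchanged if already within)."""
    return w / max(1.0, np.linalg.norm(w, ord=p) / lam)


# Usage: a random two-layer network mapping R^32 -> R^10.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 32)), rng.standard_normal((10, 64))]
print("bound before:", lipschitz_upper_bound(weights))

# During training the projection would be applied to every layer after each
# gradient step; here it is applied once with a hypothetical limit lam=1.0.
constrained = [project(w, lam=1.0) for w in weights]
print("bound after:", lipschitz_upper_bound(constrained))

# Empirical check: the output distance never exceeds bound * input distance.
x, y = rng.standard_normal(32), rng.standard_normal(32)
ratio = np.linalg.norm(forward(constrained, x) - forward(constrained, y)) / np.linalg.norm(x - y)
print("observed ratio:", ratio)
```

For p = 2, the exact spectral norm used here requires a singular value decomposition; a few steps of power iteration are a common, cheaper estimate. The product bound is generally loose, which is one reason the observed ratio is usually well below it.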
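For the second part, the abstract does not specify how trees are grown from gradient information. The sketch below substitutes the standard gradient-boosting idea, which is not necessarily the thesis's algorithm: each new regression stump (a depth-1 tree) is fitted by least squares to the negative gradient of a loss with respect to the ensemble's current output. In a neural network that gradient would be back-propagated from the downstream layers; here a squared-error loss stands in for it. The functions `fit_stump` and `grow_ensemble`, the learning rate `lr` and the toy data are all hypothetical.

```python
import numpy as np


def fit_stump(x, target):
    """Depth-1 regression tree chosen to minimise the squared error to `target`."""
    best = (np.inf, 0, 0.0, target.mean(), target.mean())
    for j in range(x.shape[1]):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are always non-empty.
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            lv, rv = target[left].mean(), target[~left].mean()
            sse = ((target[left] - lv) ** 2).sum() + ((target[~left] - rv) ** 2).sum()
            if sse < best[0]:
                best = (sse, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda z: np.where(z[:, j] <= t, lv, rv)


def grow_ensemble(x, grad_fn, n_trees=20, lr=0.5):
    """Add stumps one at a time, each fitted to the negative gradient of the loss."""
    trees, out = [], np.zeros(len(x))
    for _ in range(n_trees):
        g = grad_fn(out)  # dLoss/dOutput; a downstream network would supply this
        tree = fit_stump(x, -g)
        trees.append(tree)
        out = out + lr * tree(x)
    return trees, out


# Usage on toy data: the loss is 0.5 * ||out - y||^2, so its gradient is out - y.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 5))
y = np.where(x[:, 0] > 0, 1.0, -1.0)
trees, out = grow_ensemble(x, grad_fn=lambda o: o - y)
initial_loss = 0.5 * np.sum(y ** 2)
final_loss = 0.5 * np.sum((out - y) ** 2)
print(initial_loss, final_loss)
```

Because each stump only needs the current gradient, the same loop applies when `grad_fn` returns a gradient back-propagated through a network (with one ensemble per output unit in this single-output sketch). The exhaustive split search is only suitable for small data.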