Out-of-distribution detection with deep hybrid models

Deep learning systems suffer from “silent failures” where they make highly confident, but incorrect, predictions for input instances well outside their training data. This motivates the development of out-of-distribution (OOD) detection for such systems: the ability to recognise when an input deviates significantly from the training data. The “deep hybrid model” (DHM) for image classification presented by Cao and Z. Zhang (2022a) uses a normalising flow to perform density estimation for OOD detection. It addresses shortcomings of approaches that model pixel-space densities by performing density estimation on classifier features instead. Remarkably, Cao and Z. Zhang (2022a) claim 100% detection accuracy on a number of common benchmarks but do not make their code available. As we find the principles behind the DHM interesting and sound, we reimplement it to either confirm its capabilities or understand why it falls short. We perform an extensive search over possible model configurations to maximise performance and provide a detailed record for best practice. Although we are unable to achieve 100% detection accuracy in our experiments, the DHM delivers competitive performance with careful fine-tuning, while exhibiting great sensitivity to hyperparameter settings. We argue that this sensitivity is predominantly due to an adversarial relationship between the classifier and the normalising flow that can result in the collapse of the feature space. We verify this by means of several synthetic datasets and show that one of the assumptions underlying the DHM architecture, namely that the feature extractor can be regularised to preserve input-space densities in feature space, is not satisfied. This provides an understanding of where the DHM falls short and informs the development of future OOD detectors based on modelling feature-space densities.
We also evaluate the DHM on a real-world dataset of endemic and invasive stink bugs in New Zealand that poses a fine-grained OOD problem due to the high visual similarity between the bug species. The DHM performs worse on this dataset than on common OOD benchmarks, which reveals the benefit of testing OOD systems in real-world settings.
The University of Waikato