dc.description.abstract | Can deep neural networks super-resolve satellite imagery to a high perceptual quality? This thesis explores the trade-off between the pixel accuracy and perceptual quality of super-resolved imagery by comparing and combining a discriminative and a generative network. Rather than solving a theoretical problem, we tackle a real-world low-resolution scenario: Sentinel-2 imagery is super-resolved and evaluated against high-resolution aerial photos as ground truth; this is in contrast to super-resolving previously down-sampled data, which is the methodology used in most other studies. An existing feed-forward network architecture designed for super-resolution, called DeepSUM, is used to super-resolve multiple low-resolution images by a factor of four to obtain a single high-resolution image. DeepSUM is trained using a range of loss functions to assess their effect on network accuracy. A novel loss function, called variation loss, is created to better define edges and textures and produce a sharper, perceptually superior product. Using an SSIM loss function gives the best result in terms of pixel-based performance. Running DeepSUM alone produces a superior output compared to bicubically up-sampling the input data, but the output is blurry and not photo-realistic. A generative model from the literature, ESRGAN (Enhanced SRGAN), a Generative Adversarial Network, is trained on both raw Sentinel-2 data and the output of DeepSUM. Using ESRGAN for super-resolution creates a perceptually better, more realistic-looking output. However, the ESRGAN output is less accurate than the DeepSUM output, as measured using pixel-based metrics. Combining ESRGAN with DeepSUM is found to inherit some of the advantages of both approaches. In an end-to-end process, applying ESRGAN to the output of DeepSUM trained with variation loss is found to super-resolve an image that better shows boundaries, textures and detail. | |