
Deep unsupervised hyperspectral image super-resolution

DT15100120_Abstract.pdf
[abstract] 2.78 MB
DT15100120_FullText.pdf
[fulltext] 4.35 MB
Title
教師無し深層学習によるハイパースペクトル画像の超解像度
Deep unsupervised hyperspectral image super-resolution
Degree 博士(理学) (Doctor of Science) Dissertation Number 創科博甲第120号 (2023-09-26)
Degree Grantors Yamaguchi University
[kakenhi]15501 grid.268397.1
Abstract
Hyperspectral (HS) imaging captures the detailed spectral signature of each spatial location of a scene and thus enables a better understanding of material characteristics than traditional imaging systems. However, existing HS sensors can in practice only provide low-spatial-resolution images at a video rate. Reconstructing a high-resolution HS (HR-HS) image by fusing a low-resolution HS (LR-HS) image with a high-resolution RGB (HR-RGB) image using image processing and machine learning techniques, known as hyperspectral image super-resolution (HSI SR), has therefore attracted considerable attention. Existing methods for HSI SR fall into two main research directions: mathematical-model-based methods and deep-learning-based methods. Mathematical-model-based methods generally formulate the degradation procedure of the observed LR-HS and HR-RGB images with a mathematical model and solve it with an optimization strategy. Owing to the ill-posed nature of the fusion problem, most works leverage hand-crafted priors to model the underlying structure of the latent HR-HS image and pursue a more robust solution. Recently, deep-learning-based approaches have evolved for HS image reconstruction, and current efforts mainly concentrate on designing more complicated and deeper network architectures in pursuit of better performance. Although they achieve impressive reconstruction results compared with mathematical-model-based methods, existing deep learning methods have the following three limitations. 1) They are usually implemented in a fully supervised manner and require a large-scale external dataset consisting of the degraded observations (the LR-HS/HR-RGB images) and their corresponding HR-HS ground-truth images, which are difficult to collect, especially for the HSI SR task.
2) They aim to learn a common model from training triplets, and such a model is inevitably insufficient to capture the abundant image priors of diverse HR-HS images with rich content, whose spatial structures and spectral characteristics differ considerably. 3) They generally assume that the spatial and spectral degradation procedures for capturing the LR-HS and HR-RGB images are fixed and known, and synthesize the training triplets accordingly to learn the reconstruction model, which yields very poor recovery performance on observations captured under different degradation procedures. To overcome these limitations, our research focuses on an unsupervised learning framework for HSI SR that learns the specific prior of the scene under study without any external dataset. To deal with observations captured under different degradation procedures, we further learn automatically the spatial blurring kernel and the camera spectral response function (CSF) associated with the specific observations, and incorporate them into the above unsupervised framework to build a highly generalized blind unsupervised HSI SR paradigm.
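The degradation model underlying this fusion setting can be sketched as follows: the LR-HS observation is a blurred and subsampled version of the latent HR-HS image, and the HR-RGB observation is its spectral projection through the CSF. This is a minimal illustrative sketch; the Gaussian kernel, the CSF matrix, and all sizes below are placeholder assumptions, not the dissertation's actual parameters.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=2.0):
    """Isotropic 2-D Gaussian blurring kernel, normalized to sum to one."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def spatial_degrade(hrhs, kernel, scale):
    """Blur every spectral band with a shared kernel, then subsample."""
    C, H, W = hrhs.shape
    kh, kw = kernel.shape
    x = np.pad(hrhs, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    blurred = np.empty_like(hrhs)
    for i in range(H):
        for j in range(W):
            blurred[:, i, j] = (x[:, i:i + kh, j:j + kw] * kernel).sum(axis=(1, 2))
    return blurred[:, ::scale, ::scale]               # LR-HS observation

def spectral_degrade(hrhs, csf):
    """Project the spectral dimension with a 3 x C CSF matrix."""
    return np.tensordot(csf, hrhs, axes=([1], [0]))   # HR-RGB observation

C, H, scale = 31, 64, 4                # illustrative: 31 bands, 64x64 HR grid
hrhs = np.random.rand(C, H, H)         # latent HR-HS image
csf = np.random.rand(3, C)
csf /= csf.sum(axis=1, keepdims=True)  # placeholder CSF, rows sum to one
lrhs = spatial_degrade(hrhs, gaussian_kernel(), scale)   # (31, 16, 16)
hrrgb = spectral_degrade(hrhs, csf)                      # (3, 64, 64)
```

Recovering the 31-band, 64x64 `hrhs` from the two degraded observations is exactly the ill-posed inverse problem the abstract describes.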
Moreover, motivated by the fact that cross-scale patterns frequently recur in natural images, we synthesize pseudo training triplets from the LR-HS and HR-RGB observations and their degraded versions, and conduct both supervised and unsupervised internal learning to obtain a scene-specific model for HSI SR, dubbed generalized internal learning. Overall, the main contributions of this dissertation are three-fold, summarized as follows:
1. A deep unsupervised fusion-learning framework for HSI SR is proposed. Inspired by the insight that convolutional neural networks themselves capture a large amount of low-level image statistics (priors) and generate images with regular spatial structure and spectral patterns more easily than noisy data, this study proposes an unsupervised framework that automatically generates the target HS image from the LR-HS and HR-RGB observations alone, without any external training database. Specifically, we explore two paradigms for HS image generation: 1) learning the HR-HS target from a data-generation view, using randomly sampled noise as the input of the generative network; 2) reconstructing the target from a self-supervised learning view, using the fused context of the LR-HS and HR-RGB observations as the input of the generative network. Both paradigms automatically model the specific priors of the scene under study by optimizing the parameters of the generative network instead of the raw HR-HS target. Concretely, we employ an encoder-decoder architecture for our generative network and generate the target HR-HS image from the noise or fused-context input. We assume that the spatial and spectral degradation procedures for the LR-HS and HR-RGB observations under study are known, so that approximated versions of the observations can be produced by degrading the generated HR-HS image; their reconstruction errors against the observations intuitively serve as the loss function for network training. Our unsupervised learning framework not only models the specific prior of the scene under study to reconstruct a plausible HR-HS estimate without any external dataset, but is also easily adapted to observations captured under various imaging conditions, simply by changing the degradation operations in our framework.
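In symbols, the training objective sketched above can be written schematically as follows, where f_θ denotes the generative network, z its noise or fused-context input, D_s and D_c the known spatial and spectral degradation operators, and Y and Z the LR-HS and HR-RGB observations (the notation is ours, chosen for illustration):

    θ* = argmin_θ  ‖ D_s(f_θ(z)) − Y ‖² + ‖ D_c(f_θ(z)) − Z ‖²,    X̂ = f_θ*(z)

Only the network parameters θ are optimized; the HR-HS estimate X̂ is read off as the network output after training, so no ground-truth HR-HS image is ever required.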
2. A novel blind learning method for unsupervised HSI SR is proposed. The deep unsupervised framework described above requires the spatial and spectral degradation procedures to be known. However, different optical designs of HS imaging devices and RGB cameras entail various degradation processes, such as the spatial blurring kernels for capturing LR-HS images and the camera spectral response functions (CSF) of RGB sensors, and it is difficult for general users to obtain this detailed knowledge. Moreover, the exact degradation procedures can be further distorted under various imaging conditions. In real applications, therefore, the degradation knowledge is rarely available for each scene under study. To handle this issue, this study proposes a novel parallel blind unsupervised approach that automatically and jointly learns the degradation parameters and the generative network. Specifically, depending on which components are unknown, we propose three variants: 1) a spatial-blind method that automatically learns the spatial blurring kernel used in capturing the LR-HS observation, with the CSF of the RGB sensor known; 2) a spectral-blind method that automatically learns the CSF transformation matrix used in capturing the HR-RGB observation, with the blurring kernel of the HS imaging device known; 3) a complete-blind method that simultaneously learns both the spatial blurring kernel and the CSF matrix. Building on our previously proposed unsupervised framework, we design special convolution layers to realize the spatial and spectral degradation procedures in parallel, where the layer parameters are treated as the weights of the blurring kernel and the CSF matrix and are learned automatically.
The spatial degradation procedure is implemented by a depthwise convolution layer, where the kernels for all spectral channels are constrained to be identical and the stride is set to the spatial expanding factor, while the spectral degradation procedure is realized by a pointwise convolution layer with three output channels that produces the approximated HR-RGB image. With this learnable implementation of the degradation procedures, we construct an end-to-end framework that jointly learns the specific prior of the target HR-HS image and the degradation knowledge, and thereby build a highly generalized HSI SR system. Moreover, the proposed framework can be unified to realize the different blind variants of HSI SR by fixing the parameters of the implemented convolutions to the known blurring kernel or CSF, and it adapts readily to arbitrary observations for HSI SR.
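The two learnable degradation layers just described can be emulated as follows. This is a NumPy sketch for illustration only; the dissertation implements them as trainable CNN layers, and the kernel size, padding scheme, and all weight values here are illustrative assumptions.

```python
import numpy as np

def depthwise_degrade(x, kernel, stride):
    """Depthwise convolution with one kernel shared by every spectral channel
    and stride equal to the spatial expanding factor; its output plays the
    role of the approximated LR-HS image."""
    C, H, W = x.shape
    kh, kw = kernel.shape
    pad = (kh - stride) // 2                       # keeps output size H/stride
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    oh = (H + 2 * pad - kh) // stride + 1
    ow = (W + 2 * pad - kw) // stride + 1
    out = np.empty((C, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[:, i, j] = (patch * kernel).sum(axis=(1, 2))
    return out

def pointwise_degrade(x, weight):
    """1x1 convolution with three output channels: the learnable CSF mapping
    the HR-HS estimate to the approximated HR-RGB image."""
    return np.tensordot(weight, x, axes=([1], [0]))

C, H, scale = 31, 64, 4
hrhs_est = np.random.rand(C, H, H)            # generative-network output
blur = np.random.rand(2 * scale, 2 * scale)   # learnable blurring-kernel weights
blur /= blur.sum()
csf = np.random.rand(3, C)                    # learnable CSF weights

lrhs_approx = depthwise_degrade(hrhs_est, blur, scale)   # (31, 16, 16)
hrrgb_approx = pointwise_degrade(hrhs_est, csf)          # (3, 64, 64)
```

In the blind setting, `blur` and `csf` would be trainable parameters updated jointly with the generator; fixing one of them to its known value recovers the spatial-blind or spectral-blind variant.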
3. A generalized internal learning method for HSI SR is proposed. Motivated by the strong internal data repetition and cross-scale internal recurrence of natural images, we further synthesize labeled training triplets from the LR-HS and HR-RGB observations only, and combine them with the unlabeled observations as training data to conduct both supervised and unsupervised learning, constructing a more robust image-specific CNN model of the HR-HS data under study. Specifically, we downsample the observed LR-HS and HR-RGB images to their "son" versions and form training triplets from the LR-HS/HR-RGB sons and the LR-HS observation, where the relation among them is the same as that among the LR-HS/HR-RGB observations and the HR-HS target, despite the difference in resolution. With these synthesized training samples, it is possible to train an image-specific CNN model that takes the observations as input and produces the HR-HS target, dubbed internal learning. However, the synthesized labeled training samples are usually few, especially for a large spatial expanding factor, and further downsampling the LR-HS observation introduces severe spectral mixing of surrounding pixels, causing a deviation between the spectral mixing levels at the training and test phases. These limitations can degrade the super-resolution performance of naive internal learning. To mitigate them, we combine naive internal learning with our self-supervised learning method for unsupervised HSI SR, and present a generalized internal learning method for more robust HR-HS image reconstruction.
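The pseudo-triplet synthesis above can be sketched as follows. `box_downsample` is a simple stand-in for the actual spatial degradation used in the dissertation, and all sizes (scale factor 4, 31 bands) are illustrative assumptions.

```python
import numpy as np

def box_downsample(img, s):
    """Average non-overlapping s x s blocks of each channel."""
    C, H, W = img.shape
    return img.reshape(C, H // s, s, W // s, s).mean(axis=(2, 4))

s, C = 4, 31
lrhs = np.random.rand(C, 16, 16)       # observed LR-HS image
hrrgb = np.random.rand(3, 64, 64)      # observed HR-RGB image

# "Son" versions of the observations become the pseudo inputs, and the LR-HS
# observation itself becomes the pseudo HR-HS label, mirroring the relation
# between the observations and the unknown HR-HS target.
lrhs_son = box_downsample(lrhs, s)     # (31, 4, 4)
hrrgb_son = box_downsample(hrrgb, s)   # (3, 16, 16)
pseudo_triplet = (lrhs_son, hrrgb_son, lrhs)
```

The tiny 4x4 spatial extent of `lrhs_son` illustrates the scarcity problem the abstract notes: at large expanding factors the labeled pseudo samples carry very little spatial content, which is why they are combined with unsupervised learning on the full observations.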
Creators LIU ZHE
Languages eng
Resource Type doctoral thesis
File Version Version of Record
Access Rights open access