
Time: 2023-11-20

Progressive Spatial–Spectral Joint Network for Hyperspectral Image Reconstruction

(Hyperspectral image reconstruction with progressive joint spatial-spectral networks)

(☆☆☆☆☆☆☆ learning to build HS from MS ☆☆☆☆☆☆☆)

Hyperspectral (HS) images are widely used to identify and characterize targets in scenes of interest, but they have high acquisition cost and low spatial resolution. Acquiring high-spatial-resolution HS images (HSI) by spectrally reconstructing high-spatial-resolution multispectral (MS) images is an inexpensive alternative. In this paper, we propose a progressive spatial-spectral joint network (PSJN) to reconstruct HSI from MS images. The PSJN consists of a 2-D spatial feature extraction module, a 3-D progressive spatial-spectral feature construction module, and a spectral post-processing module. PSJN takes full advantage of the shallow spatial features extracted by the 2-D spatial feature extraction module and the spatial-spectral features extracted by the 3-D progressive spatial-spectral feature construction module. The 3-D progressive spatial-spectral feature construction module is designed to extract spatial-spectral information from local spatial and spectral neighborhoods and to construct it in a pyramidal structure from a few bands to many bands. In addition, a network update mechanism is proposed to improve the spectral reconstruction of images whose original reconstruction is poor. Experimental results on three HS-MS datasets and one MS dataset validate the effectiveness of the proposed method.

Introduction

Hyperspectral imagery (HSI) provides rich and diverse spectral information about scenes with hundreds or thousands of narrow spectral bands. The rich spectral features help to distinguish different targets. Therefore, HSI is widely used for classification, target detection, intrinsic image recovery, change detection, spectral unmixing, and scene segmentation. Because the spectral bands are narrow, individual bands have low energy. Hyperspectral sensors therefore need to enlarge the instantaneous field of view to obtain a reasonable signal-to-noise ratio, which limits spatial resolution. In addition, because the number of HS satellites is very limited, the revisit time of satellite-borne HS data is very long. In contrast, multispectral satellites have higher spatial resolution (mostly better than 10 meters) and shorter revisit times for satellite constellations. Some scholars have attempted to add HSIs from different time phases or different sensors to existing datasets through transfer learning. However, the available HSI data are still too few.
To resolve the contradiction between the various resolutions of multimodal HS data (essentially between spectral and spatial resolution), researchers attempted to fix one resolution and improve another. Initially, research focused on improving the spatial resolution of HSI. By fusing high-spatial-resolution MS images with low-spatial-resolution HSI or other relevant information, many methods have managed to recover images with both high spectral and high spatial resolution. However, due to the long revisit cycle of HS satellites, it is challenging to collect well-registered HSI and MSI at any given time to meet our needs.
Given the advantages of short revisit time and high spatial resolution of MS images, recovering the lost spectral information of MS images is also an effective way to resolve the contradiction between spectral and spatial resolution; this is known as spectral reconstruction or spectral super-resolution. Spectral reconstruction establishes a mapping from a few bands (3 or 4-20) to a large number of bands (>100). This inverse process is a severely ill-posed problem. Many spectral reconstruction methods have been proposed to solve it.
Early researchers used shallow mappings based on principal component analysis (PCA) or the pseudo-inverse (PI). Mapping methods based on sparse dictionary learning then emerged, exploiting richer prior knowledge. With the development of deep learning, CNN- or GAN-based methods have been applied to spectral reconstruction. The powerful feature representation and mapping construction capabilities of deep learning methods have achieved great success in spectral reconstruction.
However, spectral reconstruction models based on deep learning methods such as CNNs and GANs still have some shortcomings. On the one hand, most existing spectral reconstruction models are designed for ground-level images and rely heavily on up-sampling, down-sampling, and non-local attention structures. Because remote sensing image features are large in scale, numerous, and structurally complex, these structures struggle to work well for the spectral reconstruction of remote sensing images. On the other hand, remote sensing HSI has high spectral resolution and relatively low spatial resolution. The spectral correlation between remote sensing image bands is stronger than the spatial correlation, so more attention should be paid to the continuity of neighboring spectra. However, it is difficult for existing 2-D-CNN-based models to express the spectral continuity between neighboring bands.
To address these issues, we propose a progressive spatial-spectral joint network (PSJN) that combines the spatial features of a 2-D CNN with the spatial-spectral features of a 3-D CNN. The PSJN consists of a spatial-spectral feature construction module and a spectral post-processing module. The spatial-spectral feature construction module combines a 2-D spatial feature extraction module and a 3-D progressive spatial-spectral feature construction module. The progressive spatial-spectral feature extraction modules, which compose the 3-D progressive spatial-spectral feature construction module, generate the high-frequency and low-frequency spectral information layer by layer. With this structure, HSI can be reconstructed from MS images with high accuracy.

Contributions

1) A PSJN combining the spatial features of a 2-D CNN and the spatial-spectral features of a 3-D CNN is proposed for spectral reconstruction. The 2-D network extracts the shallow spatial information of MS images, and the 3-D network constructs localized spatial-spectral correlation features step by step. Combining spatial and spectral features allows accurate reconstruction of HSI.
2) A spectral-dimension progressive pyramid structure is proposed to gradually recover the spectral information from low to high frequencies by multilayer stacking. In the 3-D progressive modules of the pyramid structure, 3-D convolution combines features at the same spatial location and at different spectral depths.
3) A network update mechanism is proposed to improve the spectral reconstruction of images whose original reconstruction is poor. We evaluated the quality of the spectral reconstruction method on three HS-MS datasets and one MS dataset in terms of both similarity and classification performance. Extensive experiments demonstrate the superiority of the PSJN model for HS reconstruction.

Related Work

Early researchers expected to find a mapping matrix that could directly express the correlation between MS images and HSI. Typical methods are spectral reconstruction based on principal component analysis (PCA), Wiener estimation (WEN), and the pseudo-inverse (PI). In the past five years, spectral reconstruction methods have split into two branches: knowledge-driven methods based on sparse dictionary learning and data-driven methods based on deep neural networks.
Sparse-dictionary-based spectral reconstruction methods aim to represent MS images and HSIs by the same weight matrix with different dictionaries. The corresponding HSI can then be obtained from the HS dictionary and the weight matrix computed from the MS dictionary and the MS image. Based on this idea, researchers have proposed many variants that exploit different prior information.
Arad and Ben-Shahar obtained an overcomplete dictionary from HSI by the K-SVD algorithm and a low-dimensional dictionary by projection. Robles-Kelly processed convolutional features by dictionary learning. Wu et al. improved Arad's sparse dictionary method by introducing an "A+"-based approach. Jia et al. proposed a two-step mapping method using manifolds as intermediate structures. Han et al. used a spectral library as auxiliary information to construct a sparse dictionary. Yi et al. reconstructed spectra by combining sparse dictionaries and unmixing models. Gao et al. learned low-rank sparse dictionaries on HS and MS images, respectively.
Deep-learning-based spectral reconstruction methods aim to fit the inverse mapping by constructing a suitable deep neural network. Nguyen et al. attempted to fit the inverse mapping with an RBF network. CNN-based spectral reconstruction appeared later, in 2017. Galliani et al. first introduced CNN mapping from low-dimensional images to high-dimensional images. Xiong et al. proposed HSCNN, which reconstructs HSI from spectrally upsampled input images. Alvarez-Gila et al. used a generative adversarial network to reconstruct spectra. Paul and Kumar proposed extracting the features of a region to construct the spectrum of its central pixel and applied this idea to remote sensing images. Similar to spectral reconstruction, related deep learning techniques are also used for compressed sensing.
In 2018, Arad et al. organized the first spectral image reconstruction challenge, the NTIRE 2018 Spectral Reconstruction Challenge, in which all participants proposed deep learning methods. Shi et al. developed a deep ResNet structure and a deep DenseNet structure, named HSCNN-R and HSCNN-D respectively, and won first place in the challenge. The second spectral reconstruction challenge was held in 2020. As in 2018, all participating teams in NTIRE 2020 used deep learning methods. Li et al. won first place on the "clean" track with the Adaptive Weighted Attention Network (AWAN), and Zhao et al. won first place on the "real world" track with the Hierarchical Regression Network (HRNet).

Methodology

Problem Formulation

Both HS and MS images are subsamples of a continuous spectral image. HSI retains more information than MS images; MS images lose much of the spectral information during sampling by the MS sensor. The observation of the MS sensor in each band is related to the continuous spectrum and to the values of the spectral response function (SRF) of the MS sensor. Let L and L_M be the continuous and MS signals, respectively. The relationship between L and L_M can be characterized by the SRF of the MS sensor, g:
L_M^i = ∫ g_i(λ) L(λ) dλ        (1)
where i is the specified band of L_M, g_i is the SRF of the MS sensor in band i, λ is the wavelength, and the range of λ depends on the sensor and i.
Since ∫ g_i(λ) dλ is a fixed value for a specified band of a specified sensor, we can rewrite Eq. (1) by normalizing the SRF:
L_M^i = ∫ g_i(λ) L(λ) dλ / ∫ g_i(λ) dλ        (2)
In practice, the above integral is usually approximated in discrete form. Therefore, (2) can also be written as
L_M^i ≈ Σ_{n=1…N} ĝ_i(λ_n) L(λ_n)        (3)
where ĝ_i is the normalized SRF of band i.
where N is the number of points sampled by the SRF of the MS sensor.
For example, the SRF of GF1 covers the bands from 400 to 1000 nm, and the N value of GF1 is 601. The observation in each band of GF1 can therefore be regarded as the actual continuous spectral radiance, sampled at 601 points, weighted by an SRF of length 601.
Due to the high spectral resolution and narrow SRF bandwidth of the hyperspectral sensor, the hyperspectrum can be interpolated to obtain a continuous spectrum. On this basis, with the help of the MS sensor's SRF, MS images can be obtained from the continuous spectral images produced by HSI interpolation. Thus, the SRF of a specified band from HSI to MS images, g_HM, can be obtained from the interpolation weights of the HS bands and the SRF of the MS sensor:
L_M^i = Σ_{b=1…B_H} g_HM^i(b) L_H(b)        (4)
where L_H is the HS signal and B_H is the number of HS bands.
Therefore, (4) is equivalent to
L_M = G_HM · L_H        (5)
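For concreteness, here is a minimal NumPy sketch of the discrete relation in (3)-(5): an MS image is simulated from an HS cube with a row-normalized SRF matrix applied pixel by pixel. The array shapes and the random SRF values are illustrative placeholders, not the real GF1/GF5 response curves.

```python
import numpy as np

def simulate_ms(hs_cube, srf):
    """hs_cube: (B_H, X, Y) HS image; srf: (B_M, B_H) row-normalized SRF weights."""
    b_h, x, y = hs_cube.shape
    # L_M = G_HM · L_H, applied pixel by pixel (Eq. (5))
    return (srf @ hs_cube.reshape(b_h, -1)).reshape(srf.shape[0], x, y)

# Toy example with random data in place of real GF5/GF1 responses:
hs = np.random.rand(100, 32, 32).astype(np.float32)
srf = np.random.rand(4, 100).astype(np.float32)
srf /= srf.sum(axis=1, keepdims=True)       # normalize each band's SRF weights (Eq. (2))
ms = simulate_ms(hs, srf)                   # shape (4, 32, 32)
```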
The SRF of GF1 and the normalized SRF of specific bands from GF5 to GF1 are shown in Fig. 1.
[Fig. 1: SRF of GF1 and normalized SRF of specific bands from GF5 to GF1]
The spectral reconstruction from MS images to HSI is actually the inverse mapping of (5). G_HM is a wide, row-full-rank matrix with more columns than rows, so it has no left inverse. When L_M is fixed, infinitely many solutions for L_H exist. Therefore, an inverse mapping matrix for spectral reconstruction cannot be found directly from the SRF of the MS sensor; the inverse mapping is inherently ill-posed. Arad and Ben-Shahar demonstrated the feasibility of solving this ill-posed problem by exploiting the sparsity of HS signals in natural scenes.
In HSI of natural scenes, there is spatial continuity between neighboring pixels and spectral continuity between neighboring bands. Reasonable use of this spatial and spectral continuity makes spectral reconstruction from MS images to HSI possible.
However, the reconstruction may fail if the data distribution of the test MS image is completely inconsistent with that of the MS images in the training set. In this case, the update mechanism can be used for spectral reconstruction.
[Fig. 2: Overall workflow of the proposed method]

Based on the above analysis, the overall workflow of this paper is shown in Fig. 2. First, HS-MS sample pairs are constructed from overlapping HSI and MS image patches. Second, the HS samples are reconstructed with the designed network, PSJN. The loss is calculated from the difference between the reconstructed HS samples and the real HS samples to steer the network toward more accurate training. Finally, the trained network is used for spectral reconstruction of MS images to HSI.
During spectral reconstruction, the reconstruction quality of the HSI can be self-evaluated, and the network can be updated for samples that reconstruct poorly, achieving a better overall spectral reconstruction.
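As a rough illustration of the sample-pair construction step, the following sketch cuts co-registered HS and MS images into overlapping patch pairs; the patch size, stride, and function name are assumptions rather than the paper's exact settings.

```python
import numpy as np

def make_pairs(hs, ms, patch=64, stride=32):
    """hs: (B_H, X, Y), ms: (B_M, X, Y), co-registered -> stacked overlapping patch pairs."""
    hs_patches, ms_patches = [], []
    _, x, y = ms.shape
    for i in range(0, x - patch + 1, stride):
        for j in range(0, y - patch + 1, stride):
            hs_patches.append(hs[:, i:i + patch, j:j + patch])
            ms_patches.append(ms[:, i:i + patch, j:j + patch])
    # shapes: (P, B_H, patch, patch) and (P, B_M, patch, patch)
    return np.stack(hs_patches), np.stack(ms_patches)
```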

Network Architecture

[Fig. 3: Overall architecture of the proposed PSJN]
As shown in Fig. 3, our proposed PSJN consists of three parts: a 2-D spatial feature extraction module (2-D SFEM), a 3-D progressive spatial-spectral feature construction module (3-D PSFCM), and a spectral post-processing module. In the 2-D spatial feature extraction module, the transmitted features have four dimensions: the sample dimension, the spectral dimension (the channel dimension of the 2-D convolution), and two spatial dimensions.
In the 3-D progressive spatial-spectral feature construction module, the transmitted features have five dimensions: the sample dimension, the feature dimension (the channel dimension of the 3-D convolution), the spectral dimension, and two spatial dimensions. Given I_M ∈ N × M × X × Y (N is the number of samples, M is the number of MS bands, and X and Y are the spatial dimensions) as the MS input, the output of the 2-D spatial feature extraction module is
I_2-D = H_2-D SFEM(I_M)        (6)
where H_2-D SFEM denotes the system response of the 2-D spatial feature extraction module and I_2-D ∈ N × H × X × Y (H is the number of HS bands).
The output of the other branch of the network, the 3-D PSFCM, is
I_3-D = H_3-D PSFCM(I_M)        (7)
where H_3-D PSFCM denotes the system response of the 3-D progressive spatial-spectral feature construction module and I_3-D ∈ N × 1 × H × X × Y (1 is the feature dimension).
After compressing the extra second dimension, the output becomes I_3-Dsq ∈ N × H × X × Y.
The input of the spectral post-processing module is the sum of the 2-D SFEM and 3-D PSFCM outputs, and its output is the recovered HS sample:
I_H = H_SE(I_2-D + I_3-Dsq)        (8)
where H_SE denotes the system response of the spectral post-processing module.
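A minimal PyTorch sketch of the data flow in Eqs. (6)-(8) follows. The two branch modules are stand-in single convolutions (the real 2-D SFEM and 3-D PSFCM are described in the next sections); only the branch-sum-post-process structure and the tensor shapes follow the paper.

```python
import torch
import torch.nn as nn

class PSJNSketch(nn.Module):
    """Data flow only: both branches are stand-in convolutions, not the real modules."""
    def __init__(self, ms_bands=4, hs_bands=100):
        super().__init__()
        self.sfem_2d = nn.Conv2d(ms_bands, hs_bands, 3, padding=1)   # stand-in for the 2-D SFEM
        self.psfcm_3d = nn.Conv2d(ms_bands, hs_bands, 3, padding=1)  # stand-in for the 3-D PSFCM
        self.post = nn.Conv2d(hs_bands, hs_bands, 3, padding=1)      # spectral post-processing

    def forward(self, i_m):                        # i_m: (N, M, X, Y)
        i_2d = self.sfem_2d(i_m)                   # Eq. (6): (N, H, X, Y)
        i_3d = self.psfcm_3d(i_m).unsqueeze(1)     # Eq. (7): (N, 1, H, X, Y)
        i_3d_sq = i_3d.squeeze(1)                  # drop the extra feature dimension
        return self.post(i_2d + i_3d_sq)           # Eq. (8): recovered HS sample

# e.g. PSJNSketch()(torch.rand(2, 4, 32, 32)) has shape (2, 100, 32, 32)
```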

2-D Spatial Feature Extraction Module

[Fig. 4: Structure of the 2-D spatial feature extraction module]
Figure 4 illustrates that the 2-D SFEM consists of three parts: a dimension-ascending layer, a feature processing layer, and a dimension-descending layer.
The dimension-ascending layer consists of a 2-D convolutional layer and an activation layer; it boosts the number of data channels from the number of raw MS bands to the specified spectral dimension (256 in the experiments).
The feature processing layer consists of several residual channel attention modules and a feature concatenation module.
[Fig. 5: Structure of the residual channel attention module]
As shown in Fig. 5, the residual channel attention module consists of a residual block and a channel attention block. Two 2-D convolutional layers and two parametric rectified linear unit (PReLU) layers alternate in the residual block.
[Fig. 6: Structure of the squeeze-and-excitation (SE) block]

The channel attention block uses a squeeze-and-excitation (SE) structure. As shown in Fig. 6, the SE block consists of a global average pooling layer, two fully connected layers, a ReLU layer, and a Sigmoid layer.
The SE block improves the feature representation by explicitly modeling the interdependencies between feature channels.
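The following is a minimal PyTorch sketch of the SE block in Fig. 6: global average pooling, two fully connected layers with a ReLU in between, and a sigmoid gate. The reduction ratio of 16 is an assumed hyperparameter, not a value stated in the paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global average pooling
        self.fc = nn.Sequential(                       # excite: FC -> ReLU -> FC -> Sigmoid
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (N, C, X, Y)
        w = self.pool(x).flatten(1)                    # (N, C)
        w = self.fc(w).view(x.size(0), x.size(1), 1, 1)
        return x * w                                   # reweight each channel
```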
The 2-D spatial feature extraction module gains sufficient depth to fully extract features by stacking residual channel attention modules. Feature maps from the dimension-ascending layer and from the channel attention blocks are concatenated along the channel dimension to combine features from different depths.
The dimension-descending layer consists of a 2-D convolutional layer and an activation layer, which reduce the number of data channels to the true number of HS bands.
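Putting the pieces together, here is a hedged sketch of the 2-D SFEM: a dimension-ascending layer, stacked residual channel attention modules (with a compact SE gate equivalent to the sketch above), channel-wise concatenation of features from different depths, and a dimension-descending layer. The number of residual modules and the kernel sizes are assumptions; the width of 256 follows the text.

```python
import torch
import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    """Residual block (two conv/PReLU pairs) followed by a compact SE gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.PReLU(),
        )
        self.gate = nn.Sequential(                     # channel attention (1x1 convs act as FC layers)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.body(x)
        return x + h * self.gate(h)

class SFEM2D(nn.Module):
    def __init__(self, ms_bands=4, hs_bands=100, width=256, n_blocks=4):
        super().__init__()
        self.up = nn.Sequential(nn.Conv2d(ms_bands, width, 3, padding=1), nn.PReLU())
        self.blocks = nn.ModuleList([ResidualChannelAttention(width) for _ in range(n_blocks)])
        # concatenate the ascended features and every block output, then descend to hs_bands
        self.down = nn.Sequential(
            nn.Conv2d(width * (n_blocks + 1), hs_bands, 3, padding=1), nn.PReLU())

    def forward(self, x):                              # x: (N, M, X, Y)
        feats = [self.up(x)]
        for blk in self.blocks:
            feats.append(blk(feats[-1]))
        return self.down(torch.cat(feats, dim=1))      # combine features from different depths
```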

3-D Progressive Spatial–Spectral Feature Construction Module

The 3D PSFCM consists of several progressive spatial-spectral feature extraction modules (referred to as progressive modules) and a downscaling layer. The structure of the progressive modules is shown in Fig. 7.
[Fig. 7: Structure of the progressive spatial-spectral feature extraction module]
The progressive module has two inputs and one output. The inputs are the original MS samples in 4-D and the output of the previous progressive module in 5-D. The output is 5-D data with a doubled spectral dimension.
The progressive module consists of five parts: a dimension-ascending layer, a feature extraction layer, a concatenation along the feature dimension, a summation layer, and a 3-D transposed convolution layer.
Each progressive module has its assigned spectral dimension. The spectral dimensions start slightly above the number of MS bands and are doubled module by module until the final progressive module exceeds the number of HS bands. The dimension-ascending layer, which consists of a convolutional layer and an activation layer, boosts the spectral dimension from the number of MS bands to the specified dimension.
Similar to the 2-D module, the feature extraction layer consists of several residual attention modules used to extract features at different depths. Note that the features extracted by the residual attention modules of the different progressive modules are also based on their respective specified spectral dimensions. In addition, constrained by the subsequent 3-D convolution and 3-D transposed convolution, the order of the captured features along the spectral dimension is directly related to the true spectral order. After feature extraction, concatenation unites the feature maps from different depths. In contrast to the 2-D SFEM, which combines feature maps along the spectral dimension, the 3-D PSFCM combines feature maps along the feature dimension (the added dimension). After the combination, the size of the spectral dimension remains the same, while the data changes from four dimensions to five. The advantage is that the previously extracted feature maps are constrained to be correlated at specified points (same spatial location and same spectral position). As shown in Fig. 8, by stacking features along the feature dimension, the output of the 3-D convolution is correlated only with neighboring spectral features and not with distant spectral features. Spectral continuity is thus enforced by local processing along the spectral dimension.
[Fig. 8: Stacking features along the feature dimension so that 3-D convolution mixes only neighboring spectral features]
The concatenated data is united with the output of the previous progressive module by direct summation in the same dimensions. The last part of the progressive module is the transposed convolutional layer: the sum obtained in the previous step is fed into the 3-D transposed convolutional layer (with a stride of 2 along the spectral dimension) to obtain an output with a doubled spectral dimension.
Through several progressive modules, the size of the spectral dimension grows beyond the number of HS bands. The final progressive module no longer contains a transposed convolutional layer; instead, it contains a 3-D convolutional layer that reduces the spectral dimension to the number of HS bands.
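The following sketch illustrates one progressive module under the structure described above: ascend the spectral dimension, extract 3-D features at different depths, concatenate along the feature dimension, sum with the previous module's output, and double the spectral dimension with a stride-2 3-D transposed convolution. The kernel sizes, feature width, and number of blocks are assumptions, and plain 3-D convolutions stand in for the residual attention modules.

```python
import torch
import torch.nn as nn

class ProgressiveModule(nn.Module):
    def __init__(self, ms_bands=4, spec_dim=8, feat=16, n_blocks=2):
        super().__init__()
        # dimension-ascending layer: lift the MS bands to this module's assigned spectral dimension
        self.up = nn.Sequential(nn.Conv2d(ms_bands, spec_dim, 3, padding=1), nn.PReLU())
        # 3-D feature extraction at different depths (plain conv blocks as placeholders)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv3d(1 if i == 0 else feat, feat, 3, padding=1), nn.PReLU())
            for i in range(n_blocks)])
        # fuse the maps concatenated along the feature dimension
        self.fuse = nn.Conv3d(feat * n_blocks + 1, feat, 3, padding=1)
        # stride 2 along the spectral dimension doubles the number of bands
        self.expand = nn.ConvTranspose3d(feat, feat, kernel_size=(4, 3, 3),
                                         stride=(2, 1, 1), padding=(1, 1, 1))

    def forward(self, i_m, prev=None):        # i_m: (N, M, X, Y); prev: (N, feat, spec_dim, X, Y)
        x = self.up(i_m).unsqueeze(1)         # (N, 1, spec_dim, X, Y): add the feature dimension
        feats, h = [x], x
        for blk in self.blocks:
            h = blk(h)
            feats.append(h)
        fused = self.fuse(torch.cat(feats, dim=1))   # concatenate along the feature dimension
        if prev is not None:
            fused = fused + prev                     # sum with the previous module's output
        return self.expand(fused)                    # (N, feat, 2 * spec_dim, X, Y)

# Chaining doubles the spectral dimension stage by stage, e.g. 8 -> 16 -> 32:
# m1, m2 = ProgressiveModule(spec_dim=8), ProgressiveModule(spec_dim=16)
# ms = torch.rand(2, 4, 32, 32)
# out = m2(ms, m1(ms))   # (2, 16, 32, 32, 32); the final stage would instead apply a 3-D
#                        # convolution mapping the spectral dimension to the HS band number
```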

Loss Function and Others

The reconstructed HSI should have two characteristics.
First, the reconstructed HSI should be sufficiently similar to the real HSI. Second, the reconstructed HSI, when passed through the SRF of the MS sensor, should restore the original MS image.
Based on these two properties, the loss function of the PSJN network contains two parts: a difference measure between the reconstructed HSI and the real HSI, and a difference measure between the reconstructed MS image and the real MS image:
Loss_HS = D(Î_H, I_H),    Loss_MS = D(G_HM · Î_H, I_M)
where Î_H is the reconstructed HSI, I_H is the real HSI, I_M is the real MS image, and D(·,·) is the difference measure.

The two loss functions are combined by linear summation:
Loss = Loss_HS + α · Loss_MS
where α weights the MS term.
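A hedged sketch of this two-part loss, assuming RMSE as the difference measure D and a scalar weight α (the paper's exact measure and weighting may differ):

```python
import torch

def rmse(a, b):
    return torch.sqrt(torch.mean((a - b) ** 2))

def psjn_loss(hs_pred, hs_true, ms_true, srf, alpha=1.0):
    """srf: (B_M, B_H) normalized SRF, applied along the band axis of the HS prediction."""
    loss_hs = rmse(hs_pred, hs_true)                            # reconstructed HSI vs. real HSI
    ms_pred = torch.einsum('mh,nhxy->nmxy', srf, hs_pred)       # project HSI back to MS bands
    loss_ms = rmse(ms_pred, ms_true)                            # reconstructed MS vs. real MS
    return loss_hs + alpha * loss_ms
```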
Considering the difference measure between the reconstructed MS image and the real MS image during training improves the spectral reconstruction. However, because the distribution of the training samples may not coincide with that of the real MS images, the reconstructed MS image may still be inconsistent with the MS image of a test sample. In this case, the difference can be used as an indirect metric for tuning the network. Specifically, we designed two block-level update evaluation metrics. One is the root mean square error (RMSE) between the sample MS image and the reconstructed MS image, and the other is the number of pixels whose absolute difference between the MS pixel and the reconstructed MS pixel exceeds a specific threshold. If either exceeds its limit, the distribution difference is considered too large and the network needs to be adjusted.
For blocks with large distribution differences, the current MS samples are used as training samples and the RMSE is used as the loss for retraining. After 20 training iterations, or once the evaluation metrics are satisfied, training stops and the current retrained network reconstructs the MS samples.
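A minimal sketch of this update mechanism, with assumed thresholds, optimizer, and learning rate: a block whose reconstructed MS image deviates too much from the observed MS image triggers up to 20 retraining steps using the MS-domain RMSE as the loss.

```python
import torch

def needs_update(ms_pred, ms_obs, rmse_thresh=0.05, pix_thresh=0.1, max_bad=100):
    """Flag a block whose reconstructed MS image deviates too much from the observed MS image."""
    rmse = torch.sqrt(torch.mean((ms_pred - ms_obs) ** 2))
    n_bad = (torch.abs(ms_pred - ms_obs) > pix_thresh).sum()
    return bool(rmse > rmse_thresh) or bool(n_bad > max_bad)

def update_network(model, ms_obs, srf, steps=20, lr=1e-4):
    """Retrain on the current MS sample, using the MS-domain RMSE as the loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):                                   # at most 20 iterations, per the text
        hs_pred = model(ms_obs)
        ms_pred = torch.einsum('mh,nhxy->nmxy', srf, hs_pred)
        loss = torch.sqrt(torch.mean((ms_pred - ms_obs) ** 2))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```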
The spectral-dimension settings of each layer of the PSJN structure are listed in Table I.
[Table I: Spectral-dimension settings of each PSJN layer]
Considering the large computation time brought by 3-D convolution, we also propose a simplified form called narrow PSJN (NPSJN). NPSJN reduces the spectral dimensions of the 3-D PSFCM throughout, which reduces the computation brought by 3-D convolution and thus reduces the computation time.
