Research Article  Open Access
Shaolei Zhang, Guangyuan Fu, Hongqiao Wang, Yuqing Zhao, "Superpixel Spectral Unmixing for Hyperspectral Image Superresolution Using a Coupled Encoder-Decoder Network", Journal of Sensors, vol. 2020, Article ID 8886178, 8 pages, 2020. https://doi.org/10.1155/2020/8886178
Superpixel Spectral Unmixing for Hyperspectral Image Superresolution Using a Coupled Encoder-Decoder Network
Abstract
In this paper, we propose a novel hyperspectral image superresolution method based on superpixel spectral unmixing using a coupled encoder-decoder network. The hyperspectral image and the multispectral image are fused to generate a high-resolution hyperspectral image through a spectral unmixing framework with a low-rank constraint. Specifically, the endmember and abundance information is extracted via a coupled encoder-decoder network that naturally integrates the priors needed for unmixing. The coupled network consists of two encoders and one shared decoder, where spectral information is preserved through the encoder. The multispectral image is clustered into superpixels to exploit self-similarity, and the superpixels are then unmixed to obtain the abundance matrix. By imposing a low-rank constraint on the abundance matrix, we further improve the superresolution performance. Experiments on the CAVE and Harvard datasets indicate that our superresolution method outperforms the compared methods in terms of both quantitative evaluation and visual quality.
1. Introduction
With rich spectral and spatial information, hyperspectral images (HSI) have received extensive attention and have been widely used in many fields, especially remote sensing [1, 2] and medical imaging [3]. High resolution is essential to obtain high performance in many HSI applications, such as spectral unmixing, pixel-wise classification, object detection, and object tracking. However, due to hardware limitations, hyperspectral imaging requires long exposure times to ensure a high signal-to-noise ratio, which leads to low spatial resolution [4]. Image superresolution is a promising way to acquire images with high resolution in both the spatial and spectral domains.
In recent decades, several excellent methods have been proposed for HSI superresolution by fusing an observed low-resolution HSI with the corresponding high-resolution multispectral image (MSI). Some HSI superresolution methods are based on sparse representation theory, which is widely used in computer vision tasks. Akhtar et al. [4] imposed nonnegativity constraints on the spectral dictionary and the sparse representations to improve HSI superresolution performance. Dong et al. [5] utilized a structural sparsity constraint to exploit clustering-based sparsity. The superpixel-based sparse representation model [6] learns sparse coding over MSI superpixels to exploit the spatial self-similarity of the MSI. Han et al.'s work [7] combined nonnegative sparse representation, local similarity, and nonlocal similarity for HSI superresolution, which explores self-similarity both within superpixels and across the entire image. However, when learning the spectral dictionary, these methods do not take into account the spectral correlation in the HSI, causing severe spectral distortion in the reconstructed images.
In order to overcome this problem, spectral unmixing-based superresolution methods were proposed. Yokoya et al. [8] exploited the nonnegativity property while jointly unmixing the HSI and MSI. Wycoff et al. [9] further introduced a sparsity constraint when extracting the endmember and abundance matrices. The HSI and MSI were jointly unmixed under sparsity, nonnegativity, and abundance sum-to-one constraints in [10]. Zou and Xia [11] explored spatial structure information in the spectra and the abundances via mutual distance and graph Laplacian regularization, respectively. These methods showed excellent superresolution performance. However, spectral unmixing-based methods generally assume that the downsampling operation from high-resolution to low-resolution images is known, an assumption that often does not hold in practice.
Inspired by the great success of deep learning-based methods in the superresolution recovery of colour images, researchers have attempted to apply deep learning to HSI superresolution. These deep learning-based methods mainly learn the mapping from low-resolution images to high-resolution images in a supervised way; typical models include the autoencoder, 3D convolutional neural networks [12, 13], and deep residual networks [14, 15]. These methods assume that a consistent mapping exists across different image pairs. However, this assumption is not always valid, which causes severe spectral distortion.
In this paper, the local similarity in high-resolution images is analysed, and the low-rank property of the HSI is explored. We then propose a superresolution algorithm based on superpixel spectral unmixing. A coupled encoder-decoder network with a sparsity constraint is employed to extract the endmember and abundance information, which naturally integrates meaningful constraints to enhance the unmixing quality. As shown in Figure 1, the shared decoder preserves the endmember information of the hyperspectral image. To obtain accurate high-resolution abundance information, we cluster the high-resolution image into superpixels, which have adaptive shapes and sizes, and impose a low-rank constraint on the abundance matrix. Finally, we combine the endmembers and the abundance matrix to reconstruct the high-resolution HSI. The combination of the coupled encoder-decoder network and the low-rank constraint leads to accurate endmember and abundance information, so that promising HSI superresolution performance is obtained.
2. Spectral Unmixing-Based HSI Superresolution
The goal of HSI superresolution is to predict the high-resolution HSI Z ∈ ℝ^(W×H×L), where W and H denote the image width and height in the spatial dimensions and L indicates the spectral dimension. The spectral unmixing-based method takes a low-resolution HSI Y ∈ ℝ^(w×h×L) and a high-resolution MSI X ∈ ℝ^(W×H×l) as input, with w ≪ W, h ≪ H, and l ≪ L. For notational convenience, we represent each three-dimensional image in matrix form. Accordingly, we get the matrices Z ∈ ℝ^(L×N), Y ∈ ℝ^(L×n), and X ∈ ℝ^(l×N), where N = WH and n = wh. We can estimate the desired HSI Z as follows:

Ẑ = arg min_Z ‖Y − ZD‖²_F + ‖X − RZ‖²_F, (1)

where the matrix D ∈ ℝ^(N×n) models the spatial downsampling and blurring operators, so that Y = ZD, and R ∈ ℝ^(l×L) is the spectral degradation matrix depending on the multispectral sensor, so that X = RZ. In this paper, we assume that the spectral response function R is known.
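As a concreteness check, the degradation model can be simulated with made-up sizes. All numbers below (band counts, spatial size, scale factor, and the box-shaped sensor response) are illustrative stand-ins, not the paper's actual settings:

```python
import numpy as np

# Sketch of the observation model: Z (L x N) is the target HSI in matrix
# form, Y = Z @ D is the low-resolution HSI, X = R @ Z is the MSI.
rng = np.random.default_rng(0)
L_bands, l_bands = 31, 3        # HSI / MSI band counts (illustrative)
W = H = 8                       # high-resolution spatial size (illustrative)
scale = 4                       # downsampling factor (illustrative)
N, n = W * H, (W // scale) * (H // scale)

Z = rng.random((L_bands, N))    # random stand-in for the ground-truth HSI

# D averages each disjoint scale x scale block: every column holds the
# weights (1/scale^2) of the high-resolution pixels in one block.
D = np.zeros((N, n))
for j in range(n):
    by, bx = divmod(j, W // scale)
    for dy in range(scale):
        for dx in range(scale):
            i = (by * scale + dy) * W + (bx * scale + dx)
            D[i, j] = 1.0 / scale**2

# R integrates groups of HSI bands into MSI bands (uniform stand-in response).
R = np.zeros((l_bands, L_bands))
for b in range(l_bands):
    R[b, b * 10:(b + 1) * 10] = 0.1

Y = Z @ D   # low-resolution HSI,  shape (L_bands, n)
X = R @ Z   # high-resolution MSI, shape (l_bands, N)
```

Each column of `D` sums to one, so a spatially constant image stays constant after downsampling, which is the behaviour expected of an averaging/blurring operator.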
The naive estimation of the desired HSI according to (1) is computationally complex and inaccurate. Thus, spectral unmixing is used to deal with these problems efficiently. According to the linear spectral mixing model [16], we have

Z = EA, (2)

where E ∈ ℝ^(L×c) is the endmember matrix and c represents the number of spectral bases; A ∈ ℝ^(c×N) is the abundance matrix. In the linear mixture model, each endmember represents the reflectance spectrum of one pure material. The abundances are the proportions of the endmembers, and thus they should be nonnegative. Also, each column of the abundance matrix should meet the sum-to-one (STO) requirement, i.e., 1ᵀA = 1ᵀ. The fact that only a few endmembers are present in any one pixel means that the abundances should also be sparse. Therefore, we can get the endmembers and abundances according to

min_{E,A} ‖Z − EA‖²_F  s.t.  E ≥ 0, A ≥ 0, 1ᵀA = 1ᵀ, ‖A‖₀ ≤ s, (3)

where ‖·‖₀ denotes the sparsity constraint.
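A minimal numerical sketch of the constrained unmixing above, assuming the endmembers E are already known and only the abundances A are estimated. The projected-gradient loop and the clip-and-renormalize step (a common heuristic, not the exact Euclidean simplex projection) are illustrative choices, not the paper's solver:

```python
import numpy as np

# Linear unmixing Z ~= E A under the constraints from the text:
# nonnegative abundances whose columns sum to one (STO).
rng = np.random.default_rng(1)
L_bands, c, N = 31, 5, 100                      # bands, endmembers, pixels
E = rng.random((L_bands, c))                     # stand-in endmember matrix
A_true = rng.dirichlet(np.ones(c), size=N).T     # STO by construction
Z = E @ A_true                                   # mixed pixels

A = np.full((c, N), 1.0 / c)                     # initialise on the simplex
step = 1.0 / np.linalg.norm(E.T @ E, 2)          # 1 / Lipschitz constant
for _ in range(1000):
    A -= step * (E.T @ (E @ A - Z))              # gradient step on ||Z - EA||_F^2
    A = np.clip(A, 1e-12, None)                  # nonnegativity
    A /= A.sum(axis=0, keepdims=True)            # sum-to-one per pixel (heuristic)

rmse = np.sqrt(np.mean((Z - E @ A) ** 2))        # reconstruction error
```

Because a feasible exact solution exists (`A_true` satisfies both constraints), the residual shrinks to near zero while every iterate remains a valid abundance matrix.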
3. Method
This section describes the proposed HSI superresolution method in detail. The proposed method is based on superpixel spectral unmixing and a coupled encoder-decoder network. As shown in Figure 1, the endmembers and abundances are extracted using the coupled encoder-decoder network. In order to take advantage of self-similarity in the high-resolution image, the network extracts the abundances from the superpixels of the high-resolution image.
3.1. MSI Superpixel Segmentation and Low-Rank Constraint
A square window cannot effectively represent the complex structure of an image, since spatial information in natural images is usually not regular. Therefore, the traditional unmixing methods, whether pixel-based or patch-based, cannot fully utilize the spatial information of the image. High-resolution MSIs are locally self-similar; that is, adjacent pixels in the MSI have similar spectral responses. Therefore, we cluster the MSI into superpixels to extract the abundance information.
Superpixels were initially proposed by Ren and Malik [17]; a superpixel groups pixels based on spectral response and other properties. Guided by an evaluation of 28 state-of-the-art superpixel segmentation algorithms [18], we adopt the efficient topology-preserving segmentation (ETPS) algorithm [19] to obtain superpixels from the MSI. ETPS formulates segmentation with an objective function similar to k-means clustering, making the obtained superpixels coherent in appearance while keeping a regular shape. The objective function is a Markov energy function that sums several terms over the set of superpixel centres and mean positions and over the eight-neighbourhood of each pixel: a colour term encourages colour homogeneity within each superpixel, a shape term encourages the superpixels to be regular in shape, a boundary term encourages each superpixel to have a small boundary length, and two further terms promote topological connectivity and balanced superpixel size, respectively.
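The full ETPS energy is more involved; the sketch below keeps only two of the terms described above (colour homogeneity and boundary length, with an arbitrary weight `lam`) to show why a segmentation aligned with image edges scores lower energy than one that ignores them. The toy image and labellings are invented for illustration:

```python
import numpy as np

# Simplified two-term superpixel energy: colour homogeneity + boundary length.
def energy(img, labels, lam=0.1):
    e_col = 0.0
    for k in np.unique(labels):
        pix = img[labels == k]                       # pixels of superpixel k
        e_col += ((pix - pix.mean(axis=0)) ** 2).sum()  # distance to mean colour
    # boundary length: count label changes between 4-neighbours
    e_b = (labels[:, 1:] != labels[:, :-1]).sum() \
        + (labels[1:, :] != labels[:-1, :]).sum()
    return e_col + lam * e_b

# piecewise-constant image: left half dark, right half bright
img = np.zeros((8, 8, 3))
img[:, 4:] = 1.0

aligned = np.zeros((8, 8), int)
aligned[:, 4:] = 1                                   # respects the edge
rng = np.random.default_rng(0)
random_lab = rng.integers(0, 2, size=(8, 8))         # ignores the edge

e_aligned = energy(img, aligned)
e_random = energy(img, random_lab)
```

The edge-aligned labelling has zero colour cost (each superpixel is colour-constant) and a short boundary, so its energy is strictly lower.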
Since the pixels within a superpixel have similar spectral responses, the matrix reshaped from a superpixel has low rank. Therefore, when a hyperspectral image is unmixed superpixel by superpixel, a low-rank constraint can be imposed. Also, the subspace in which the high-resolution HSI resides is the same as the space spanned by the endmember matrix. Based on this fact, we impose the low-rank constraint on the abundance matrix rather than on the high-resolution HSI. The optimization model of the coupled network is defined as

min_{E,A_h,A_m} ‖Y − EA_h‖²_F + ‖X − REA_m‖²_F + α‖A_m‖₀ + β Σ_k ‖A_m^(k)‖_*, (4)

with the nonnegativity and STO constraints, where A_h and A_m are the abundance matrices of the HSI and MSI, respectively, and A_m^(k) is the abundance block of the k-th superpixel. Here, ‖·‖_* denotes the matrix nuclear norm, which approximately replaces the low-rank constraint.
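The standard way to act on a nuclear-norm term is singular-value soft-thresholding, its proximal operator. The sketch below (the threshold `tau`, noise level, and block shape are illustrative) applies it to a stand-in abundance block of one superpixel, recovering the low rank that similar pixels imply:

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: the proximal operator of
    tau * ||M||_*, the convex surrogate for the low-rank constraint."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)            # shrink singular values
    return U @ np.diag(s) @ Vt

# abundance block of one superpixel: nearly rank-1 (similar pixels) plus noise
rng = np.random.default_rng(0)
base = rng.random((5, 1)) @ np.ones((1, 40))          # identical abundances
noisy = base + 0.01 * rng.standard_normal((5, 40))    # small perturbation
denoised = svt(noisy, 0.2)                             # low-rank step
```

The noise singular values fall below `tau` and are zeroed out, so the thresholded block is exactly rank one, matching the assumption that a superpixel's pixels share abundances.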
3.2. Coupled Encoder-Decoder Network
Motivated by the work in [20], we propose a coupled encoder-decoder network to solve the HSI superresolution problem described in (4). As shown in Figure 2, the basic structure of the coupled network is an asymmetric stacked autoencoder network. During the reconstruction of the HSI and MSI, the endmember and abundance information in the image scene is extracted. The autoencoder network consists of an encoder and a decoder. The encoder maps the hyperspectral data to a low-dimensional representation layer, and the decoder reconstructs the hyperspectral data from the representation layer:

h^(i) = f^(i)(W_e^(i) h^(i−1)),   ẑ^(i) = f^(i)(W_d^(i) ẑ^(i−1)), (5)

where W_e^(i) and W_d^(i) denote the weights in the i-th layer of the encoder and decoder, respectively, and f^(i) denotes the activation function in the i-th layer.
Note that when extracting the endmembers from the HSI through the network, the hidden layer should reflect the abundance information, which means that it needs to meet the nonnegativity and sum-to-one requirements. The ReLU function is used as the activation function in the encoder to make sure the hidden layer is nonnegative. Following [21], we make the hidden-layer variables obey a Dirichlet distribution, which is generated using the stick-breaking process. Let z_k represent the k-th hidden-layer variable; then z_k is defined as follows:

z_k = v_k ∏_{j<k} (1 − v_j), (6)

where v_k is sampled from the inverse transform of the Kumaraswamy distribution, whose parameters are obtained from the encoder-decoder network.
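The stick-breaking construction can be sketched in a few lines. The Kumaraswamy parameters `a` and `b` below are fixed stand-ins (in the network they would be produced by the encoder), and the hidden-layer size of ten matches the endmember count used later in the paper:

```python
import numpy as np

def kumaraswamy_icdf(u, a, b):
    # inverse CDF of Kumaraswamy(a, b): F^{-1}(u) = (1 - (1 - u)^{1/b})^{1/a}
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def stick_breaking(u, a=1.0, b=3.0):
    v = kumaraswamy_icdf(u, a, b)          # break fractions v_k in (0, 1)
    # remaining stick before break k: prod_{j<k} (1 - v_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining                   # stick pieces z_k

rng = np.random.default_rng(0)
z = stick_breaking(rng.random(10))         # 10 hidden units -> 10 abundances
```

Each piece is nonnegative and the pieces sum to at most one, so the hidden layer satisfies the abundance constraints by construction rather than by an added penalty.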
The bias parameter is not used in the decoder, and the identity function is used as its activation function, so the decoder reduces to a linear map and its weights correspond to the endmember matrix, W_d = E. In this way, the deep network can extract the spectral information from the HSI, and the hidden layer preserves the spatial information effectively. Similarly, the high-resolution abundance information can be extracted during the reconstruction of the high-resolution MSI.
Due to the limited number of MSI bands, unmixing the MSI alone is not ideal for generating endmember and abundance information. Since the HSI and MSI reflect the ground objects of the same scene, the endmember information of the MSI and HSI is correlated, i.e., E_m = RE_h. Thus, the weights of the decoder are shared for both the HSI and MSI, and they are frozen while learning the autoencoder network parameters for the MSI.
3.3. Network Structure and Implementation Detail
The ℓ₀ regularization in (4) is nonconvex and is usually replaced with ℓ₁ regularization. However, the abundance vectors must satisfy the STO constraint, so every feasible abundance vector has the same ℓ₁ norm, and the commonly used ℓ₁ regularization cannot guarantee that the abundance vector is sparse. Similar to [21], we instead introduce the generalized Shannon entropy function as the sparsity constraint on the hidden layer. The function is defined as follows:

H_p(a) = −Σ_j (|a_j|^p / ‖a‖_p^p) log(|a_j|^p / ‖a‖_p^p). (7)
Because the latent variables of the representation layers are nonnegative, we choose p = 1 so that the computation is efficient. Substituting (7) into (4), the optimization function can be rewritten as

min_{E,A_h,A_m} ‖Y − EA_h‖²_F + ‖X − REA_m‖²_F + αH_p(A_m) + β Σ_k ‖A_m^(k)‖_*. (8)
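A quick sketch of why the entropy works where ℓ₁ fails under STO: it is invariant to scaling (every STO vector has the same ℓ₁ norm), yet it is small for concentrated vectors and maximal for uniform ones. The test vectors and `eps` guard below are illustrative:

```python
import numpy as np

def shannon_entropy_p(a, p=1, eps=1e-12):
    """Generalized Shannon entropy H_p: low for sparse (concentrated)
    vectors, high for dense ones, and invariant to rescaling of a."""
    w = np.abs(a) ** p
    w = w / (w.sum() + eps)                 # normalise to a distribution
    return float(-(w * np.log(w + eps)).sum())

sparse = np.array([0.9, 0.05, 0.05, 0.0, 0.0])   # concentrated STO vector
dense = np.full(5, 0.2)                          # uniform STO vector
```

Both vectors sum to one (same ℓ₁ norm), but the entropy separates them: the uniform vector attains the maximum log(5), while the concentrated one scores far lower, so minimizing H_p pushes the abundances toward sparsity.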
In this paper, we solve the optimization problem via the coupled deep encoder-decoder network. Following the general practice of deep networks, the L2 norm is applied to the decoder weights to prevent overfitting. The objective functions of the coupled network are expressed as follows:

L_h = ‖Y − Ŷ‖²_F + αH_p(A_h) + μ‖W_d‖²₂, (9)

L_m = ‖X − X̂‖²_F + αH_p(A_m) + β Σ_k ‖A_m^(k)‖_*, (10)

where α, μ, and β are used for balancing the reconstruction error against the sparsity constraint, the weight loss, and the locally low-rank constraint, respectively. Ŷ and X̂ are the reconstructed images, and A_h and A_m are the abundance matrices of the HSI and MSI, respectively.
The coupled network consists of two sparse autoencoder networks that extract endmember and abundance information from the HSI and MSI, respectively. The network is optimized as follows: first, the autoencoder extracts the endmember information from the HSI under the objective function in (9); then, the learned decoder weights are frozen and shared with the decoder of the autoencoder network for the MSI, and only the encoder weights of that network are updated according to the objective function in (10).
In the experiments, the number of endmembers is set to ten; that is, the hidden layer of the autoencoder network has ten nodes. The details of the coupled network are shown in Table 1. The autoencoder network for the HSI includes three hidden layers, whereas the autoencoder network for the MSI includes four layers, with the number of nodes per layer increasing from five to ten. Since different HSIs describe different scenes, the ground materials may vary greatly between scenes, so we extract the spectral and spatial information from each HSI and its corresponding MSI separately.

4. Experimental Results
Two commonly used hyperspectral datasets, CAVE [22] and Harvard [23], are employed to evaluate the proposed method. The details of the two datasets are shown in Table 2. We crop the top-left region of each Harvard image in our experiments. For both benchmark datasets, the original images are rescaled to a common spatial size and used as the ground truth. To simulate the low-resolution HSI, we downsample the HSI by averaging over disjoint blocks. The MSIs are synthesized by integrating the raw data along the spectral dimension using the spectral response of a Nikon D700 camera.
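The block-averaging simulation above can be written as one reshape. The image size and block size below are illustrative stand-ins, since the cropped sizes and scale factor are not restated in this section:

```python
import numpy as np

def block_average(hsi, s):
    """Downsample an (H, W, L) cube by averaging disjoint s x s blocks,
    band by band -- the low-resolution HSI simulation described above."""
    H, W, L = hsi.shape
    assert H % s == 0 and W % s == 0
    return hsi.reshape(H // s, s, W // s, s, L).mean(axis=(1, 3))

rng = np.random.default_rng(0)
hsi = rng.random((16, 16, 31))      # stand-in high-resolution HSI
low = block_average(hsi, 4)          # -> (4, 4, 31)
```

The reshape splits each spatial axis into (block index, offset within block); averaging over the two offset axes leaves one value per block per band.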

To evaluate the accuracy of the estimated high-resolution HSI, four widely used quality metrics are employed: root mean square error (RMSE), peak signal-to-noise ratio (PSNR), spectral angle mapper (SAM), and relative dimensionless global error in synthesis (ERGAS). For PSNR, a larger value indicates better performance, whereas for the other three metrics, a smaller value is better.
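For reference, hedged implementations of the four metrics using common textbook definitions (the paper may use slightly different conventions, e.g. for the peak value or the ERGAS scale factor):

```python
import numpy as np

def rmse(x, y):
    return float(np.sqrt(np.mean((x - y) ** 2)))

def psnr(x, y, peak=1.0):
    return float(10 * np.log10(peak ** 2 / np.mean((x - y) ** 2)))

def sam(x, y, eps=1e-12):
    """Mean spectral angle in degrees; x, y have shape (pixels, bands)."""
    num = (x * y).sum(axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1) + eps
    ang = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return float(ang.mean())

def ergas(ref, est, scale):
    """ref, est: (pixels, bands); scale = low/high resolution ratio."""
    band_rmse = np.sqrt(np.mean((ref - est) ** 2, axis=0))
    band_mean = np.mean(ref, axis=0)
    return float(100 * scale * np.sqrt(np.mean((band_rmse / band_mean) ** 2)))

rng = np.random.default_rng(0)
ref = rng.random((100, 31)) + 0.1    # stand-in reference (strictly positive)
```

Sanity checks: a perfect reconstruction gives RMSE, SAM, and ERGAS of zero, a uniform offset of 0.1 gives RMSE 0.1 and PSNR 20 dB (with peak 1), and SAM is invariant to per-pixel scaling.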
4.1. Comparison with StateoftheArt Methods
The proposed method is compared with existing HSI superresolution methods, including CNMF [8], SNNMF [9], GSOMP+ [4], BSR [24], CSU [10], NSSR [5], and uSDN [21]. In the experiments, we use the original code provided by the authors, except for uSDN. Among these methods, CNMF, SNNMF, CSU, and NSSR need the downsampling function as a prior, which is generally unknown in practice. Thus, the downsampling operation in the original implementations is replaced by a bicubic function, which differs from the operation used to simulate the low-resolution HSI.
Table 3 shows the RMSE and SAM for some examples from the CAVE and Harvard datasets, and Table 4 shows the average RMSE, PSNR, SAM, and ERGAS results. Tables 3 and 4 show that our method outperforms all other methods and achieves a significant improvement on the CAVE dataset. The unmixing-based methods usually have better SAM scores. However, CNMF [8] and SNNMF [9] are not competitive on the CAVE dataset in terms of RMSE because they do not exploit constraints on the endmember and abundance matrices. The sparse representation-based approaches usually have lower reconstruction errors; i.e., GSOMP+ [4], BSR [24], and NSSR [5] achieve better RMSE. However, the first two methods have larger SAM scores because the spectral correlation in the HSI is not considered. uSDN [21] extracts accurate spectral information and spatial representations via a sparse Dirichlet network, minimizing an angular similarity that leads to better RMSE and smaller SAM. Our method further explores the spatial structure information and the local low-rank property to estimate the abundance matrix, achieving a better reconstruction than the other methods.


4.2. The Effects of Parameters
There are three parameters in the proposed method: α, β, and μ. Here, α and β are used to balance the reconstruction error against the sparsity constraint and the low-rank constraint, respectively, and μ is used for the decoder weight loss. To select the optimal parameters, we evaluate the HSI recovery performance over ranges of α and β. We first evaluate the RMSE by adjusting α while fixing the low-rank constraint weight β; in a similar way, we then choose the appropriate weight parameter β. Figure 3 shows how the RMSE changes with the parameters. Accordingly, the optimal values of α and β are selected and used in our experiments.
4.3. Qualitative Comparisons
Figures 4 and 5 show the recovered HSIs and the absolute difference images at wavelengths of 460 nm, 540 nm, and 670 nm for "cloth" from CAVE and "imgb8" from Harvard, respectively. The recovered images from BSR [24], CSU [10], NSSR [5], uSDN [21], and our method are compared according to their performance in Table 4. The results show that the absolute differences of our method (Figures 4(f) and 5(f)) are smaller than those of the other methods, which indicates that the image recovered by the proposed method is more similar to the original image. Compared with uSDN [21] (Figures 4(e) and 5(e)), our method adopts superpixel segmentation and the low-rank constraint; the smaller differences indicate better superresolution performance and demonstrate the effectiveness of these improvements.
5. Conclusion
In this paper, we propose an HSI superresolution method based on superpixel spectral unmixing with a deep network. The method combines spectral unmixing with superpixels to simultaneously exploit the spectral and spatial information in the image. Due to the powerful representation capability of deep networks, we can preserve more accurate spectral information. Experiments on two widely used HSI datasets show that our method achieves outstanding performance compared with other methods: the average RMSE and SAM on the CAVE dataset reach 3.69 and 6.53, respectively. In future work, we will further explore the spectral correlation in HSI for superresolution.
Data Availability
CAVE dataset is available from the website: https://www.cs.columbia.edu/CAVE/databases/multispectral/. Harvard dataset is available from the website: http://vision.seas.harvard.edu/hyperspec/index.html.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
We would like to thank Prof. F. Yasuma for the CAVE dataset and Prof. Ayan Chakrabarti for the Harvard dataset. This work is jointly supported by the National Natural Science Foundation for Young Scientists of China (Grant Nos. 61403397, 61503389, and 61202332) and the Natural Science Foundation of Shaanxi Province, China (Grant No. 2015JM6313).
References
[1] J. Xia, N. Falco, J. A. Benediktsson, P. Du, and J. Chanussot, "Hyperspectral image classification with rotation random forest via KPCA," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 4, pp. 1601–1609, 2017.
[2] X. Cao, F. Zhou, L. Xu, D. Meng, Z. Xu, and J. Paisley, "Hyperspectral image classification with Markov random fields and a convolutional neural network," IEEE Transactions on Image Processing, vol. 27, no. 5, pp. 2354–2367, 2018.
[3] G. Lu and B. Fei, "Medical hyperspectral imaging: a review," Journal of Biomedical Optics, vol. 19, no. 1, article 010901, 2014.
[4] N. Akhtar, F. Shafait, and A. Mian, "Sparse spatio-spectral representation for hyperspectral image super-resolution," in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., vol. 8695 of Lecture Notes in Computer Science, pp. 63–78, Springer, Cham, 2014.
[5] W. Dong, F. Fu, G. Shi et al., "Hyperspectral image super-resolution via non-negative structured sparse representation," IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2337–2352, 2016.
[6] L. Fang, H. Zhuo, and S. Li, "Super-resolution of hyperspectral image via superpixel-based sparse representation," Neurocomputing, vol. 273, pp. 171–177, 2017.
[7] X.-H. Han, B. Shi, and Y. Zheng, "Self-similarity constrained sparse representation for hyperspectral image super-resolution," IEEE Transactions on Image Processing, vol. 27, no. 11, pp. 5625–5637, 2018.
[8] N. Yokoya, T. Yairi, and A. Iwasaki, "Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion," IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 2, pp. 528–537, 2012.
[9] E. Wycoff, T.-H. Chan, K. Jia, W.-K. Ma, and Y. Ma, "A non-negative sparse promoting algorithm for high resolution hyperspectral imaging," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1409–1413, Vancouver, BC, Canada, 2013.
[10] C. Lanaras, E. Baltsavias, and K. Schindler, "Hyperspectral super-resolution by coupled spectral unmixing," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 3586–3594, Santiago, Chile, 2015.
[11] C. Zou and Y. Xia, "Hyperspectral image superresolution based on double regularization unmixing," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 7, pp. 1022–1026, 2017.
[12] F. Palsson, J. R. Sveinsson, and M. O. Ulfarsson, "Multispectral and hyperspectral image fusion using a 3-D-convolutional neural network," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 5, pp. 639–643, 2017.
[13] S. Mei, X. Yuan, J. Ji, Y. Zhang, S. Wan, and Q. Du, "Hyperspectral image spatial super-resolution via 3D full convolutional neural network," Remote Sensing, vol. 9, no. 11, p. 1139, 2017.
[14] C. Wang, Y. Liu, X. Bai, W. Tang, P. Lei, and J. Zhou, "Deep residual convolutional neural network for hyperspectral image super-resolution," in Image and Graphics. ICIG 2017, Y. Zhao, X. Kong, and D. Taubman, Eds., vol. 10668 of Lecture Notes in Computer Science, pp. 370–380, Springer, Cham, 2017.
[15] Z. Shi, C. Chen, Z. Xiong, D. Liu, Z.-J. Zha, and F. Wu, "Deep residual attention network for spectral image super-resolution," in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 214–229, Munich, Germany, 2018.
[16] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon et al., "Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 354–379, 2012.
[17] X. Ren and J. Malik, "Learning a classification model for segmentation," in Proceedings Ninth IEEE International Conference on Computer Vision, Nice, France, 2003.
[18] D. Stutz, A. Hermans, and B. Leibe, "Superpixels: an evaluation of the state-of-the-art," Computer Vision and Image Understanding, vol. 166, pp. 1–27, 2018.
[19] J. Yao, M. Boben, S. Fidler, and R. Urtasun, "Real-time coarse-to-fine topologically preserving segmentation," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2947–2955, Boston, MA, USA, 2015.
[20] R. Guo, W. Wang, and H. Qi, "Hyperspectral image unmixing using autoencoder cascade," in 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), pp. 1–4, Tokyo, Japan, 2015.
[21] Y. Qu, H. Qi, and C. Kwan, "Unsupervised sparse Dirichlet-Net for hyperspectral image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2511–2520, Salt Lake City, UT, USA, 2018.
[22] F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar, "Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum," IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2241–2253, 2010.
[23] A. Chakrabarti and T. Zickler, "Statistics of real-world hyperspectral images," in CVPR 2011, pp. 193–200, Providence, RI, USA, 2011.
[24] N. Akhtar, F. Shafait, and A. Mian, "Bayesian sparse representation for hyperspectral image super resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3631–3640, Boston, MA, USA, 2015.
Copyright
Copyright © 2020 Shaolei Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.