Abstract

High-resolution meteorological satellite images are the basic data for weather forecasting, climate prediction, and early warning of various meteorological disasters. However, the limited resolution of current images hampers both subjective and automated analyses. Through our investigation, we found that meteorological satellite images are complex data with multimodal and multitemporal characteristics. Based on zero-shot learning theory, this complexity can be exploited to enhance the resolution of the image itself. In this work, we propose a novel framework called MSLp (Meteorological Satellite Loss phase). Specifically, we choose a zero-shot network as the backbone and propose a phase loss function. A mapping from low- to high-resolution meteorological satellite images is trained to improve the resolution by up to a factor of 4×. Our quantitative study demonstrates the superiority of the proposed approach over ZSSR and bicubic interpolation. For qualitative analysis, visual tests were performed by 7 meteorologists to confirm the utility of the proposed algorithm. The mean opinion score is 9.32 (out of 10). These meteorologists consider that weather forecasters need higher-resolution meteorological satellite images and that the high-resolution images obtained by our method have the potential to greatly help weather analysis and forecasting.

1. Introduction

Meteorological satellite images objectively reflect the structure, occurrence, and development of the atmosphere around the Earth’s surface [1]. Therefore, high-resolution meteorological satellite images are an important basic dataset needed by atmospheric science and the meteorological industry. They play a major role in weather forecasting, climate prediction, and early warning of and defense against various meteorological disasters. However, owing to the limitations of the satellite optical system, manufacturing cost, and technological level, the spatial resolution of meteorological satellite images does not satisfy application requirements [2]: it is difficult to find small-scale atmospheric systems or to monitor the states and changes of the microscale atmosphere (such as air pollution sources and haze) in these low-resolution images.

Single image superresolution (SISR) is a hot topic with great achievements in computer vision. It is defined as recovering a high-resolution (HR) image from its low-resolution (LR) observation [3]. It offers a new way to obtain the high-resolution meteorological satellite images required by atmospheric science research. Furthermore, it avoids the manufacturing cost of upgrading the meteorological satellite. However, since there are multiple solutions for any LR input, SISR is an ill-posed problem. In recent years, a variety of deep learning (DL) methods have been proposed to tackle this inverse problem by learning mappings between LR and HR image pairs [4], such as SRCNN [5], VDSR [6], LapSRN [7], SRGAN [8], EDSR [9,10], and RCAN [11]. However, applying these methods directly to meteorological satellite image superresolution does not work well. The primary reason is the lack of sufficient LR-HR meteorological satellite image pairs. Therefore, meteorological satellite image superresolution is more challenging than general SISR.

Inspired by recent zero-shot learning theory, we design an unsupervised deep network that needs no external training data. An overview of the proposed network is shown in Figure 1. Our SISR method relies only on the internal features of the LR satellite image. Moreover, we propose a loss function based on the phase consistency principle. The phase loss function supervises the network training and makes the network robust to external influences such as scale change and complex degradation processes. To the best of our knowledge, this is the first time that phase has been used for SISR CNN training. Experimental results show that, guided by our phase loss function, the neural network achieves better performance on meteorological satellite image superresolution. The main contributions of this paper are as follows:

(1) Zero-shot learning-based superresolution network: we propose an unsupervised deep neural network architecture based on zero-shot learning for the meteorological satellite image superresolution task.

(2) Phase loss function: we introduce the phase loss, a loss function specifically optimized for satellite images. It measures the difference between generated HR images and the ground truth in the frequency domain. Combining this phase loss with other spatial losses can improve image quality in terms of pixel values and texture.

In the rest of this paper, we review related work on unsupervised deep networks and loss functions in Section 2. The network architecture and the phase loss function are discussed in Section 3. In Section 4, the results of both quantitative and qualitative evaluations are presented. Finally, Section 5 concludes this paper.

2. Related Work

2.1. Unsupervised Superresolution

Most existing superresolution works use supervised learning, i.e., they learn the LR-to-HR mapping from matched LR-HR image pairs. However, since it is difficult to collect images of the same scene at different resolutions, and HR images are often unavailable, the LR images in SR datasets are usually obtained by applying a predefined degradation to HR images. SR models trained on these datasets are therefore more likely to learn the inverse of the predefined process. To prevent the adverse effects of predefined degradation, researchers are paying more attention to unsupervised superresolution, so that the resulting SISR models can reconstruct degraded images in the real world. Next, we briefly introduce several existing unsupervised SR models based on deep learning.

Considering that the structure of a CNN is sufficient to capture a great deal of low-level image statistics as a prior for inverse problems, Ulyanov et al. [12] employ a randomly initialized CNN as a handcrafted prior to perform SR. Because the network is randomly initialized and never trained on datasets, the only prior is the CNN structure itself. This shows the rationality of the CNN architecture itself and prompts us to improve superresolution by combining the deep learning methodology with handcrafted priors such as CNN structures or self-similarity. Another approach to unsupervised superresolution is to treat the LR space and the HR space as two domains and use a cycle-in-cycle structure to learn the mappings between them. Motivated by CycleGAN [13], Yuan et al. [14] propose a cycle-in-cycle SR network (CinCGAN) composed of 4 generators and 2 discriminators. Because it avoids the predefined degradation, the unsupervised CinCGAN not only achieves performance comparable to supervised methods but is also applicable to various cases, even under very harsh conditions. In 2018, considering that the internal statistics of a single image are sufficient to provide the information needed for superresolution, ZSSR [15] used “LR sons,” downsampled versions of the given LR test image, as training inputs. Using these pseudopairs, the authors train small image-specific SR networks at test time rather than training a generic model on large external datasets. Specifically, they use a kernel estimation method [16] to directly estimate the degradation kernel from a single test image and use this kernel to construct a small dataset by performing degradation with different scaling factors on the test image. Then, a small CNN for superresolution is trained on this dataset and used for the final prediction.

In meteorological satellite image processing, real-world images tend to suffer from unknown degradation, for example, additive noise, compression artefacts, and blurring, and the exact mathematical model of the degradation is not available. To apply unsupervised superresolution to meteorological satellite images, advanced strategies concerning the architecture, activation function, and loss function are also needed to reduce training difficulty and instability.

2.2. ZSSR and Loss Function

In our research, we choose ZSSR as the backbone. ZSSR uses a deep neural network to superresolve an image based only on its internal image statistics. During training, the aim is to predict the test image from an LR image created from the test image. Hence, it does not require external training images as input. ZSSR has eight convolutional layers of 64 channels, each followed by ReLU, and learns the residual image using the $L_1$ loss. Besides a good architecture, robust learning strategies are also needed to achieve satisfactory results. Next, we discuss the loss function, which is a promising direction among learning strategies.

In early research, the $L_1$ and $L_2$ losses were the most widely used loss functions for network optimization. Later, however, it was discovered that these pixelwise loss functions cannot take image quality into account; the results often lack high-frequency details and are perceptually unsatisfying, with overly smooth textures [17-19]. Therefore, a variety of loss functions have been adopted to better measure the reconstruction error. To evaluate the perceptual quality of images, the content loss was introduced [20]. In contrast to the pixel loss, the content loss encourages the output image to be perceptually similar to the target image instead of forcing them to match pixels exactly. Specifically, it measures the semantic differences between images using a pretrained image classification network. Thus, it produces perceptually better results and is widely used for SISR [21, 22]. Because the reconstructed image should have the same style (e.g., colors, textures, and contrast) as the target image, the texture loss was introduced into superresolution, based on the style representation in [23, 24]. By employing the texture loss, the SR model can create realistic textures and produce visually more satisfactory results. In recent years, GANs [25] have been introduced to various computer vision tasks. In the superresolution field, Ledig et al. [20] first introduced SRGAN, which uses an adversarial loss based on cross-entropy. The discriminator extracts latent patterns of real HR images and pushes the generated HR images to conform to them, which helps to generate more realistic images [26, 27]. In 2017, Yuan et al. [14] presented a cycle consistency loss for image superresolution. Specifically, the cycle-in-cycle approach motivated by CycleGAN [13] superresolves the LR image to the HR image by constraining pixel-level consistency between images.

In order to suppress noise in generated images, the total variation (TV) loss was introduced by Aly et al. [28]. It is defined as the sum of the absolute differences between neighbouring pixels and measures how much noise is in the images. In addition to the above loss functions, external prior knowledge has been proposed to constrain the generation process [27]. By introducing more prior knowledge, the performance of superresolution can be further improved.

In general, each loss function measures one specific kind of error. In practice, to get better superresolution results, several loss functions are combined to calculate the error. However, from the pixel loss to the external prior loss, all these loss functions are designed in the spatial domain. No matter how they are combined, the result remains an error measurement of spatial features only. In contrast, we discuss a new phase loss function in the frequency domain, which minimizes the phase difference between the generated HR image and the ground truth. More details are given in Section 3.

3. Methods

Because the training data available for meteorological satellite image superresolution are insufficient, we propose a CNN based on zero-shot learning theory and a new phase loss to supervise the training of this unsupervised network. In the following subsections, we first introduce the proposed CNN structure in Section 3.1 and then discuss the phase loss function designed in our approach in Section 3.2.

3.1. Proposed CNN Architecture

Following the methods of Wang [29], we propose a neural network for meteorological satellite image superresolution. The proposed network aims to estimate the SR meteorological satellite image from its LR image only, synthesizing plausible textures while preserving consistency with the LR image in content. The architecture of the proposed MSLp network, inspired by [15], is shown in Figure 2. Considering that the internal statistics of a single meteorological satellite image are sufficient to provide the information needed for superresolution, we handle unsupervised SR by training small image-specific SR networks at test time rather than training a generic model on large external datasets. The proposed network model consists of 8 hidden layers, each with 64 channels, and ReLU activation is used in every layer except the last.
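To make the size of such a network concrete, the following framework-free Python sketch lists one possible layer configuration and derives its parameter count and receptive field. The 3×3 kernels, the single-channel input and output, and the reading of "8 hidden layers" as eight convolutions in total are assumptions made for illustration; the text does not specify these details, and the helper names are hypothetical.

```python
def build_layer_specs(num_layers=8, width=64, image_channels=1, kernel=3):
    """List the convolution layers of a small image-specific SR network.

    ASSUMPTION: kernel size and channel counts at the two ends are illustrative.
    """
    specs = []
    for i in range(num_layers):
        is_last = i == num_layers - 1
        specs.append({
            "in": image_channels if i == 0 else width,
            "out": image_channels if is_last else width,
            "kernel": kernel,
            "relu": not is_last,  # ReLU after every layer except the last
        })
    return specs


def count_parameters(specs):
    """Weights plus biases of all convolution layers."""
    return sum(s["out"] * s["in"] * s["kernel"] ** 2 + s["out"] for s in specs)


def receptive_field(specs):
    """Side length of the input window seen by one output pixel (stride 1)."""
    return 1 + sum(s["kernel"] - 1 for s in specs)


specs = build_layer_specs()
print(count_parameters(specs))  # 222785
print(receptive_field(specs))   # 17
```

Under these assumptions the network has roughly 0.22 million parameters and each output pixel depends on a 17×17 neighbourhood, which fits the idea of a small network that can be trained on a single image at test time.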

3.2. Loss Function

Our goal is to recover the sharp image from a single low-resolution meteorological satellite image. The degraded image can be modelled as the convolution of the sharp image with a subsampled kernel:

$$I^{LR} = I^{HR} \otimes k, \tag{1}$$

where $I^{HR}$ is the HR image, $I^{LR}$ denotes the LR image, $k$ is the subsampled kernel, and $\otimes$ is the convolution operator. This is a highly ill-posed problem, since multiple pairs of $I^{HR}$ and $k$ can lead to the same $I^{LR}$. In the Fourier domain, equation (1) corresponds to $\mathcal{F}(I^{LR}) = \mathcal{F}(I^{HR}) \odot \mathcal{F}(k)$, where $\odot$ represents component-wise multiplication and $\mathcal{F}(\cdot)$ is the Fourier transform. The phase and amplitude of a complex number $c = |c|e^{i\phi}$ are $\phi$ and $|c|$, respectively. We denote taking the phase of a complex signal by $\mathcal{P}(\cdot)$, so that $\mathcal{P}(c) = e^{i\phi}$. Taking the inverse Fourier transform of the phase component gives the phase-only image, which can be formulated as

$$P(I) = \mathcal{F}^{-1}\big(\mathcal{P}(\mathcal{F}(I))\big). \tag{2}$$

From the previous analysis, we know that recovering the phase is helpful for SISR. Therefore, to make full use of both spatial and frequency domain information, we propose to jointly model the loss function in the spatial domain and the frequency domain. Recovering the true phase in the frequency domain is motivated by the fact, noted in [30], that the Fourier components of an edge tend to be in phase with each other. In this paper, we define a loss function called the phase loss to measure the difference between the superresolution image generated by the deep network and the real high-resolution image.

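The definition of the phase loss is critical for the performance of our neural network. The phase loss is calculated as

$$l_{P} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\| P(I^{HR})_{x,y} - P(G(I^{LR}))_{x,y} \right\|, \tag{3}$$

where $l_{P}$ is the phase loss of the SR image, $r$ is the downsampling factor, $W$ and $H$ are the width and height of the LR image, $I^{HR}$ is the HR image, $I^{LR}$ is the LR image, $G(I^{LR})$ is the SR image generated by the network, and $P(I^{HR})$ and $P(G(I^{LR}))$ are the phase-only images of the HR image and the SR image, each described by equation (2).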
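The phase-only image of equation (2) and a phase loss built on it can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the mean absolute (L1) distance is an assumption, since the norm is not stated in the text, the small `eps` guard is added only for numerical safety, and the function names `phase_only_image` and `phase_loss` are hypothetical.

```python
import numpy as np


def phase_only_image(img, eps=1e-8):
    """Inverse FFT of the unit-amplitude spectrum exp(i * phase) of a 2-D image."""
    spectrum = np.fft.fft2(np.asarray(img, dtype=np.float64))
    amplitude = np.abs(spectrum)
    # Divide out the amplitude; eps guards against division by zero
    # at frequencies where the spectrum vanishes.
    unit_spectrum = spectrum / np.maximum(amplitude, eps)
    # For a real input the result is real up to rounding error.
    return np.real(np.fft.ifft2(unit_spectrum))


def phase_loss(sr, hr):
    """Mean absolute difference between the phase-only images of SR and HR.

    ASSUMPTION: an L1 distance is used here; the text does not fix the norm.
    """
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    if sr.shape != hr.shape:
        raise ValueError("SR and HR images must have the same shape")
    return float(np.mean(np.abs(phase_only_image(hr) - phase_only_image(sr))))


rng = np.random.default_rng(0)
hr = rng.random((32, 32))
print(phase_loss(hr, hr))                      # identical images
print(phase_loss(2.0 * hr, hr))                # brightness scaling keeps the phase
print(phase_loss(np.roll(hr, 3, axis=1), hr))  # a shift changes the phase
```

Because a global rescaling changes only the amplitude of the spectrum, the second value is zero up to rounding error, so a phase term needs to be paired with a spatial loss that constrains the pixel values.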

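In addition to the phase loss function described so far, we also use spatial losses, such as the pixel loss and the TV loss, in our MSLp network. Therefore, the loss function of our work is

$$l = \lambda_{S}\, l_{S} + \lambda_{P}\, l_{P}, \tag{4}$$

where $l_{S}$ is the spatial loss and $l_{P}$ is the phase loss. $\lambda_{S}$ and $\lambda_{P}$ are the coefficients that balance the different loss terms. In our experiments, $l_{S}$ is replaced by combinations of $l_{pix}$ and $l_{TV}$. Here, $l_{pix}$ is the pixel loss in the spatial domain, calculated as

$$l_{pix} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left| I^{HR}_{x,y} - G(I^{LR})_{x,y} \right|, \tag{5}$$

and $l_{TV}$ is the total variation loss, formulated by

$$l_{TV} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\| \nabla G(I^{LR})_{x,y} \right\|, \tag{6}$$

where $I^{HR}$ is the HR image, $G(I^{LR})$ is the SR image generated by the network from the LR image $I^{LR}$, $r$ is the downsampling factor, $W$ and $H$ are the width and height of the LR image, and $\|\nabla I\|$ is the total variation; the form is $\|\nabla I_{x,y}\| = \sqrt{(I_{x+1,y} - I_{x,y})^{2} + (I_{x,y+1} - I_{x,y})^{2}}$.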
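The spatial terms and their weighted combination with a phase term can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the pixel loss is taken as the mean absolute difference, the TV loss as the mean gradient magnitude computed with forward differences, and the default weights are placeholders rather than values from the experiments. All function and parameter names are hypothetical.

```python
import numpy as np


def pixel_loss(sr, hr):
    """L1 pixel loss: mean absolute difference in the spatial domain."""
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    return float(np.mean(np.abs(hr - sr)))


def tv_loss(sr):
    """Mean gradient magnitude of the SR image (isotropic total variation).

    ASSUMPTION: forward differences and the isotropic form are illustrative.
    """
    sr = np.asarray(sr, dtype=np.float64)
    dx = np.diff(sr, axis=1)[:-1, :]  # horizontal differences, cropped to (H-1, W-1)
    dy = np.diff(sr, axis=0)[:, :-1]  # vertical differences, cropped to (H-1, W-1)
    return float(np.mean(np.sqrt(dx ** 2 + dy ** 2)))


def total_loss(l_pix, l_tv, l_phase, w_pix=1.0, w_tv=0.1, w_phase=0.1):
    """Weighted sum of the loss terms; the default weights are placeholders."""
    return w_pix * l_pix + w_tv * l_tv + w_phase * l_phase


ramp = np.tile(np.arange(8.0), (8, 1))  # brightness grows by 1 per column
print(pixel_loss(ramp, ramp + 0.5))     # every pixel differs by 0.5
print(tv_loss(ramp))                    # unit horizontal gradient everywhere
```

Keeping the three terms as separate functions makes it straightforward to switch individual terms off by setting their weight to zero when comparing loss combinations.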

In our experiments, we discuss the effect of the phase loss and the other spatial loss functions on meteorological satellite image superresolution. Moreover, we study the mechanism of the proposed network to further explain the effectiveness of the phase loss and to give the best loss combination in the spatial and frequency domains.

4. Experiments

4.1. Data Preparation

We chose a portion of the NSMC (National Satellite Meteorological Centre) dataset for training and testing. This dataset provides visible and infrared images every hour, and the portion we used consists of 500 images. In our experiments, we first use a kernel estimation method [16] to directly estimate the degradation kernel from a single test image and use this kernel to construct a small “training set” by performing degradation with different scaling factors on the test image. Since this “training set” consists of one instance only (the test image), we employ data augmentation on the test image to extract more LR-HR example pairs. The augmentation is done by downscaling the test image to many smaller versions of itself. These play the role of HR supervision and are called “HR fathers.” Each of the HR fathers is then downscaled by the desired SR scale factor to obtain the “LR sons,” which form the input training instances. The resulting training set consists of many image-specific LR-HR example pairs. Our network can then be trained stochastically over these pairs.
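The construction of HR fathers and LR sons from a single test image can be sketched as follows. This is a simplified illustration, not the pipeline used in the experiments: block averaging with integer factors stands in for degradation with the estimated kernel, a factor of 1 keeps the test image itself as one of the fathers, and the helper names are hypothetical.

```python
import numpy as np


def crop_to_multiple(img, factor):
    """Crop a 2-D image so that both sides are divisible by `factor`."""
    h, w = img.shape
    return img[: h - h % factor, : w - w % factor]


def box_downscale(img, factor):
    """Downscale by an integer factor using block averaging.

    ASSUMPTION: block averaging stands in for the estimated degradation kernel.
    """
    img = crop_to_multiple(np.asarray(img, dtype=np.float64), factor)
    h, w = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def make_training_pairs(test_img, sr_factor=2, father_factors=(1, 2, 3)):
    """Return a list of (LR son, HR father) pairs built from one test image."""
    pairs = []
    for f in father_factors:
        # Each father is a smaller version of the test image.
        father = crop_to_multiple(box_downscale(test_img, f), sr_factor)
        # Each son is its father downscaled by the desired SR factor.
        son = box_downscale(father, sr_factor)
        pairs.append((son, father))
    return pairs


image = np.random.default_rng(0).random((96, 96))
for son, father in make_training_pairs(image, sr_factor=4):
    print(son.shape, "->", father.shape)
```

Every pair satisfies the same size relation as the final task (the father is `sr_factor` times larger than the son), so a network trained on these pairs can be applied to the test image itself at the same scale factor.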

4.2. Investigation of Phase Loss

Several spatial loss types have been investigated to guide the convergence of the MSLp network. The L1-norm pixel loss, the TV loss, and the phase loss are the main components of the loss function, as described in the previous sections. The overall loss function of our network is

$$l = \alpha\, l_{pix} + \beta\, l_{TV} + \gamma\, l_{P}, \tag{7}$$

where $l_{pix}$ is the pixel loss between the target HR image and the SR image in the spatial domain, described in equation (5), $l_{TV}$ represents the gradient amplitude summation of the SR image, described in equation (6), and $l_{P}$ is the phase loss. The variable $\alpha$ modulates the influence of the pixel loss on the overall loss. Similarly, $\beta$ controls the noise to keep the SR image smooth. The variable $\gamma$ controls the importance of the phase loss, which evaluates whether the high-frequency information is recovered correctly. We investigated the effect of the loss choices for the MSLp network. Quantitative results are listed in Table 1. Adding $l_{P}$ to the spatial loss raises the PSNR and SSIM values, while combining $l_{pix}$, $l_{TV}$, and $l_{P}$ together gives better results. This shows that adding the frequency loss to the spatial loss is beneficial when training an unsupervised superresolution neural network. In our experiments, the variables $\alpha$, $\beta$, and $\gamma$ can be adjusted. In Table 1, we give the combinations for which the standard quantitative measures PSNR and SSIM reach their highest values.

4.3. Qualitative Evaluation

We trained and tested our MSLp network on an NVIDIA TITAN X GPU and an Intel(R) Core(TM) i7-4790 3.6 GHz CPU with 16 GB of DDR3-1600 RAM. The models are trained using PyTorch 0.4.1. The network input is interpolated to the output size and blurred with the estimated degradation kernel. As in previous ZSSR-based SR methods, we only learn the residual between the LR image and its HR counterpart. We use the $L_1$ loss with the Adam optimizer. We start with a learning rate of 0.001. We periodically take a linear fit of the reconstruction error and change the learning rate by 0.05∼1.5.
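The learning-rate policy is only outlined above. The sketch below shows one way such a rule can work, following the plateau test used in ZSSR [15]: fit a line to the recent reconstruction errors and reduce the learning rate when the scatter around the line is large compared with the downward slope. The window length, the ratio of 1.5, the division by 10, and the floor of 1e-6 are assumptions for illustration and are not values confirmed by this paper; `adjust_learning_rate` is a hypothetical helper.

```python
import numpy as np


def adjust_learning_rate(errors, lr, window=60, ratio=1.5, divisor=10.0, min_lr=1e-6):
    """Reduce `lr` when the recent reconstruction error has stopped improving.

    ASSUMPTION: all thresholds are illustrative defaults, not the paper's values.
    """
    if len(errors) < window:
        return lr  # not enough history for a meaningful fit
    recent = np.asarray(errors[-window:], dtype=np.float64)
    steps = np.arange(window)
    slope, intercept = np.polyfit(steps, recent, 1)  # least-squares line
    residual_std = np.std(recent - (slope * steps + intercept))
    # A steep negative slope means the error is still falling; otherwise
    # the noise dominates the trend and the learning rate is lowered.
    if residual_std > -ratio * slope:
        lr = max(lr / divisor, min_lr)
    return lr


falling = list(np.linspace(1.0, 0.1, 60))              # steadily improving
plateau = [0.5 + 0.01 * (-1) ** i for i in range(60)]  # oscillating around 0.5
print(adjust_learning_rate(falling, 1e-3))
print(adjust_learning_rate(plateau, 1e-3))
```

In a training loop this check would be called every fixed number of iterations, and training could stop once the learning rate reaches its floor.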

4.3.1. Objective Evaluations

Peak signal-to-noise ratio (PSNR) is a full-reference image quality measure of pixelwise similarity; higher PSNR values represent greater pixelwise similarity. It is a widely used image quality metric and should be considered when comparing performance with other models. Structural similarity index (SSIM) is a well-known full-reference quality metric that takes into account the structural composition of pixels. By making use of the contrast, luminance, and structure of images, SSIM aims to measure quality as perceived by the human visual system, and it takes values between 0 and 1. We compared the performance of MSLp with the state-of-the-art ZSSR and bicubic interpolation approaches. As can be seen from Table 1, the MSLp network outperforms ZSSR and bicubic interpolation in terms of PSNR and SSIM. Figure 3 shows the SR results for the meteorological satellite visible images. Each row shows, from left to right, the degraded image estimated from the “blur estimation network,” bicubic, ZSSR, MSLp, and the ground truth. Figure 4 shows the SR results for the meteorological satellite infrared images.
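For reference, PSNR follows directly from the mean squared error between the ground truth and the reconstruction. The sketch below uses the standard definition; the default `data_range` of 255 assumes 8-bit images, which is an assumption about the data rather than something stated in the text.

```python
import numpy as np


def psnr(reference, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


truth = np.zeros((4, 4))
print(psnr(truth, np.ones((4, 4))))  # an error of one grey level at every pixel
```

SSIM is more involved; a maintained implementation such as `skimage.metrics.structural_similarity` in scikit-image is usually preferable to a hand-written one.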

4.3.2. Mean Opinion Score Evaluations

Mean opinion score (MOS) is a survey-based subjective image quality metric in which human participants are asked to choose a score regarding a set of specified features (e.g., sharpness). Some SR models might have low PSNR and SSIM scores while the perceptual quality of their HR images is still high, in which case MOS testing should be employed. We performed a MOS test to subjectively quantify the sharpness, suitability for weather analysis, and detail level of the superresolution satellite images reconstructed by the different approaches. Specifically, we asked 7 meteorologists to assign an integer score from 1 (bad quality) to 10 (excellent quality) to 10 output images for each of the 3 comparison metrics and each reconstruction method. MOS results for all reference methods on real meteorological satellite images are listed in Table 2. For the two evaluation indicators detail level and sharpness, the mean of MSLp is the highest while the standard deviation (Std) of MSLp is the lowest. This shows that most meteorologists are more satisfied with our results and that there is little deviation between their opinions. The results of the new method are more acceptable than those of all the reference methods.
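The mean and standard deviation reported for a MOS test can be computed as in the short sketch below. The scores are made up purely to show the calculation and are not the survey data; the use of the sample standard deviation is also an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def mos_summary(scores):
    """Return (mean, sample standard deviation) of integer opinion scores.

    `scores` is a list with one list of scores per rater.
    """
    flat = [s for rater_scores in scores for s in rater_scores]
    if len(flat) < 2:
        raise ValueError("at least two scores are needed")
    if any(not (1 <= s <= 10) for s in flat):
        raise ValueError("scores must lie between 1 and 10")
    return mean(flat), stdev(flat)


# Made-up scores for illustration only: 3 raters x 4 images.
example = [[9, 10, 9, 9], [10, 9, 9, 10], [9, 9, 10, 9]]
mos, spread = mos_summary(example)
print(f"MOS = {mos:.2f}, Std = {spread:.2f}")
```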

5. Conclusions

We have introduced MSLp, an unsupervised deep superresolution network specifically designed and optimized for meteorological satellite images. The proposed network combines spatial and frequency loss functions during network training. Our investigation of the network structure indicates that the proposed unsupervised deep network is beneficial for meteorological satellite image superresolution tasks. We demonstrated the performance of MSLp using quantitative and qualitative tests for a 4× upscaling factor and compared it with state-of-the-art reference methods. The test results indicate that the reconstructions of MSLp are perceptually more plausible for weather analysis applications. Finally, our method showed promising results in unsupervised superresolution of meteorological satellite images, implying its potential extensibility to other unsupervised low-level vision tasks.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request. Other supplementary materials used to support the findings of this study were supplied by the authors under license and so cannot be made freely available. Requests for access to these data should be made to the corresponding author.

Conflicts of Interest

All authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant number 61802199).