Special Issue: Graph-Theoretic Techniques for the Study of Structures or Networks in Engineering
Multimodal Semisupervised Deep Graph Learning for Automatic Precipitation Nowcasting
Precipitation nowcasting plays a key role in land security and emergency management of natural calamities. A majority of existing deep learning-based techniques realize precipitation nowcasting by learning a deep nonlinear function from a single information source, e.g., weather radar. In this study, we propose a novel multimodal semisupervised deep graph learning framework for precipitation nowcasting. Unlike existing studies, different modalities of observation data (including both meteorological and nonmeteorological data) are modeled jointly, thereby benefiting each other. All information is converted into image structures; precipitation nowcasting is then treated as a computer vision task to be optimized. To handle areas with unavailable precipitation, we convert all observation information into a graph structure and introduce a semisupervised graph convolutional network with a sequence connect architecture to learn the features of all local areas. With the learned features, precipitation is predicted through a multilayer fully connected regression network. Experiments on real datasets confirm the effectiveness of the proposed method.
The goal of precipitation nowcasting is to predict the future rainfall intensity in a local region over a relatively short period. This task plays an important role in land security and emergency management of natural calamities. Although it is crucial, precipitation nowcasting still depends on the real-time manual analysis of meteorological observation data by forecasters.
To address this issue, researchers are increasingly trying to replace forecasters with computers. In essence, precipitation nowcasting is a spatiotemporal sequence forecasting problem with a sequence of past meteorological observation data as input and a fixed-length sequence (usually longer than 1) of future precipitation values as output. However, traditional optical flow-based methods have unsatisfactory performance.
Precipitation nowcasting has three long-standing problems. The first challenge is spatiotemporal correlations. Precipitation at any location is influenced by the weather condition of its neighboring region over a relatively short period. The second challenge is diversified observation data. With the progress of technology, we can obtain various meteorological or nonmeteorological observation data of a certain region, e.g., radar echo maps, satellite images, topographic map, temperature, and air humidity. However, handling these data in a unified framework is an open problem. The third challenge is semisupervision. It is generally known that the distribution of meteorological stations for observing precipitation is uneven, implying that precipitation at many locations is unknown.
Recent advances in deep neural networks [2, 3] have demonstrated their high capability in various classification and regression tasks, e.g., image classification by He et al., object detection by Redmon and Farhadi, video representation by Qiu et al., and speech recognition by Chiu et al. There are also some studies on precipitation nowcasting using deep learning techniques. Xingjian et al. and Shi et al. first introduced a convolutional long short-term memory (ConvLSTM) network to capture spatiotemporal correlations, which has been shown to outperform traditional optical flow-based methods for precipitation nowcasting, indicating that deep learning models have a huge potential for solving this problem. Subsequently, Singh et al. and Karevan and Suykens introduced a more complicated LSTM structure, and Shi et al. used convolutional neural networks (CNNs) to extract more efficient representations.
Although the methods above can achieve some progress, their inputs are merely weather radar echo maps, and they only focus on addressing the problem of spatiotemporal correlations, neglecting the latter two problems.
Inspired by the success of Deep Learning Vision for Nonvision Tasks by Pechyonkin, we propose a novel multimodal semisupervised deep graph learning framework for precipitation nowcasting in this study. In contrast to previous studies, different modalities of observation data (including both meteorological and nonmeteorological data) are modeled jointly and thus benefit each other. All information is converted to an image structure. Then, precipitation nowcasting is treated as a computer vision task to be optimized. To handle areas without available precipitation, we convert all observation information into a graph structure and introduce a semisupervised graph convolutional network (GCN) with a sequence connect architecture to learn the features of all local areas. With the learned features, precipitation is predicted through a multilayer fully connected regression network.
We summarize the contributions of this work as follows:
(1) We introduce a novel short-term precipitation nowcasting solution by leveraging multimodal observation data (including meteorological and nonmeteorological data).
(2) A unified multimodal semisupervised deep graph learning framework is proposed, aiming at simultaneously addressing three long-standing problems in precipitation nowcasting, i.e., spatiotemporal correlations, diversified observation data, and semisupervision.
(3) To the best of our knowledge, this is the first attempt to address the precipitation nowcasting problem using a GCN.
(4) The experimental results on the collected large-scale real dataset validate the effectiveness of the proposed solution.
The rest of the paper is organized as follows. In Section 2, related work is reviewed. Section 3 presents the problem formulation and introduces the proposed semisupervised deep regression framework in detail. In Section 4, we report and analyze the experimental results. Finally, we conclude the paper and discuss future work in Section 5.
2. Related Work
In this section, we briefly review closely related methods in two groups: deep learning-based precipitation nowcasting methods and graph-based semisupervised learning techniques.
2.1. Deep Learning-Based Precipitation Nowcasting
Xingjian et al. formulated precipitation nowcasting as a spatiotemporal sequence forecasting problem and proposed a ConvLSTM model, which extends the LSTM by adding convolutional structures into both input-to-state and state-to-state transitions. Using radar echo sequences for model training, the authors showed that ConvLSTM was better at capturing spatiotemporal correlations than the fully connected LSTM and provided more accurate predictions than the real-time optical flow by variational methods for echoes of radar (ROVER) algorithm. However, the convolutional recurrence structure in ConvLSTM-based models is location invariant, whereas natural motion and transformation (e.g., rotation) are location variant in general. To address this, Shi et al. proposed a trajectory-gated recurrent unit model that can actively learn a location-variant structure for recurrent connections. To obtain a more robust representation, Karevan and Suykens proposed a two-layer spatiotemporal stacked LSTM model. Unlike LSTM-based methods, which are widely used for sequence learning and time series prediction, Shi et al. used recurrent dynamic CNNs to handle the spatiotemporal information of radar data.
Unfortunately, all techniques mentioned above only consider radar echo maps as the input data. In addition to radar echo maps, other types of meteorological observation data, such as satellite images, temperature, and air humidity, and nonmeteorological observation data, such as topographic maps, are available, and these data were incorporated in this study. Notably, data from different sources are complementary. In this work, the proposed model is capable of not only handling spatiotemporal information from different observation sources simultaneously but also unifying labeled and unlabeled data.
2.2. Graph-Based Semisupervised Learning
Existing semisupervised learning methods using graph representations can roughly be divided into two categories: methods that use some form of explicit graph Laplacian regularization and graph embedding-based approaches.
Typical graph Laplacian regularization techniques include label propagation by Zhu et al., manifold regularization by Belkin et al., and deep semisupervised embedding by Weston et al. Typical graph embedding-based methods include DeepWalk by Perozzi et al., LINE by Tang et al., and node2vec by Grover and Leskovec. Perozzi et al. learned embeddings via the prediction of the local neighborhood of nodes, sampled from random walks on a graph. Tang et al. and Grover and Leskovec [20, 21] extended DeepWalk with more sophisticated random walk or breadth-first search schemes.
Although the methods mentioned above have achieved significant progress, these frameworks include a multistep pipeline, which means that they cannot be trained in an end-to-end way. Recently, Kipf and Welling proposed a simplified GCN for semisupervised learning by employing a first-order approximation of spectral filters. The GCN and subsequent variants have achieved state-of-the-art results in various application areas, including multilabel classification by Chen et al., zero-shot learning by Wang et al., social networks, and natural language processing.
In this paper, graph-based semisupervised learning is utilized for precipitation nowcasting. There are two main differences between the proposed method and that by Kipf and Welling (including its variants). First, as observation data are obtained from different sources, graphs in our framework are multimodal. In contrast, most graphs in conventional graph-based semisupervised learning methods are built from a single modality (image, text, etc.). Second, precipitation nowcasting is a regression problem in our study, whereas previous techniques focus on classification tasks.
In this section, we first define the precipitation nowcasting problem addressed in this paper. Then, we introduce the proposed multimodal semisupervised deep graph learning in detail.
3.1. Problem Definition
Our goal is to forecast precipitation at a place for the next several hours based on the weather conditions of its neighbors over the past several hours. Precipitation is related to many factors. As illustrated in Figure 1, according to the existing observation conditions, we assume that future precipitation can be predicted by modeling the following factors: radar echo maps, air humidity images, satellite images, temperature images, and available precipitation data over the past several hours with the corresponding topographic map. In addition, as predicting precipitation at every longitude and latitude is impossible, we assume that if a region is divided into many cells (that is, each cell represents a local area), then precipitation at different locations in the same cell can be deemed almost the same.
As the factors belong to different modalities, precipitation nowcasting can be deemed as a multimodal analysis task. Moreover, from Figure 1, we can observe that precipitation in some cells may be unknown when the target area is divided into many grid cells because precipitation stations are not placed everywhere. Thus, precipitation nowcasting is also a semisupervised learning problem in this paper. Therefore, we present a multimodal semisupervised learning method to solve the precipitation nowcasting problems herein. The problem formulations are detailed below.
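Let $O_t = \{R_t, S_t, M, T_t, H_t\}$ be the observation data at time $t$, where $R_t$ are the radar echo maps of the target area, $S_t$ are the satellite images of the same area, $M$ is the topographic map of the same area, and $T_t$ and $H_t$ are the temperature and air humidity image sequences from a remote sensing satellite, respectively. $i$ ($i = 0, 1, \dots, k$) is the sample index before time $t$, so that $O_{t-i}$ is the observation sampled $i$ steps before $t$. If the target area is divided into $n$ grid cells, the precipitation of all cells is denoted as $P_t = (p_1^t, \dots, p_n^t)$, where $p_i^t$ ($i = 1, \dots, n$) is the precipitation of the $i$th cell. It should be noted that $p_i^t$ will be unknown if no precipitation stations are placed in the corresponding cell. If a cell has several precipitation stations, the cell precipitation value will be the average of the observation values from the stations. In addition, the precipitation of $\tau$ time points in the future is denoted as $\hat{P}_{t+1}, \dots, \hat{P}_{t+\tau}$, where $\hat{P}_{t+j}$ ($j = 1, \dots, \tau$) are the predicted precipitation in all local areas.
Therefore, the problem can be formulated as follows:
$$\left(\hat{P}_{t+1}, \dots, \hat{P}_{t+\tau}\right) = F\left(O_{t-k}, \dots, O_{t-1}, O_t\right), \tag{1}$$
where $O_{t-i}$ is the observation data sampled $i$ steps before time $t$, $\hat{P}_{t+j}$ is the predicted precipitation of all local areas $j$ steps ahead, and $F$ is the mapping to be learned.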
To address it, we propose a multimodal semisupervised deep graph learning framework herein.
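The cell-level labels described in the problem definition can be illustrated with a small sketch. It averages the readings of all stations that fall in a cell and marks cells without stations as unknown. The function name `cell_precipitation`, the `(lat, lon, value)` station format, and the rectangular `bounds` are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def cell_precipitation(stations, n_rows, n_cols, bounds):
    """Average station readings per grid cell; cells without stations stay None.

    stations: list of (lat, lon, precipitation) tuples (hypothetical format).
    bounds:   (lat_min, lat_max, lon_min, lon_max) of the target region.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in stations:
        # Map the coordinate to a row/column index, clamping the upper border.
        row = min(int((lat - lat_min) / (lat_max - lat_min) * n_rows), n_rows - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * n_cols), n_cols - 1)
        sums[(row, col)] += value
        counts[(row, col)] += 1
    # None marks an unlabeled cell (no station inside it).
    grid = [[None] * n_cols for _ in range(n_rows)]
    for (row, col), total in sums.items():
        grid[row][col] = total / counts[(row, col)]
    return grid


stations = [(0.1, 0.1, 2.0), (0.2, 0.3, 4.0), (0.9, 0.9, 1.0)]
grid = cell_precipitation(stations, n_rows=2, n_cols=2, bounds=(0.0, 1.0, 0.0, 1.0))
```

Cells that remain `None` are exactly the unlabeled nodes that make the task semisupervised.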
3.2. Multimodal Semisupervised Deep Graph Learning
3.2.1. Graph Representation
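Multimodal and heterogeneous data are represented by a graph structure. The target region of precipitation nowcasting is divided into many grid cells, and each cell represents a local area (as shown in Figure 2, the rectangular area represents the target region of the precipitation nowcasting and the blue points represent the precipitation stations). Let $G = (V, E)$ be the graph structure of the observation data of the target region, where the nodes $V$ represent local areas and the edges $E$ represent the relationships between pairs of local areas. Here, red points mean that the precipitation is available, while green points mean that the precipitation is unknown.
Therefore, the first problem is the vector representation of $G$. To obtain the node and edge representations of $G$, we utilize a convolutional autoencoder to learn the feature representation of a radar echo map, an air humidity image, a satellite image, a temperature image, and a topographic map. The convolutional autoencoder architecture in our method (as shown in Figure 3) consists of five parts: an input image, an encoder, a feature representation layer, a decoder, and a reconstructed image. It should be noted that the five convolutional autoencoders have the same structure but are trained separately. Moreover, to ensure that the reconstructed images are as similar as possible to the input images, we utilize mean structural similarity (MSSIM) as our objective function in the convolutional autoencoder training step. If the input and reconstructed images are marked as $I$ and $\hat{I}$, respectively, then the MSSIM of $I$ and $\hat{I}$ is formulated as follows:
$$\mathrm{MSSIM}(I, \hat{I}) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{SSIM}(x_k, \hat{x}_k), \tag{2}$$
where $\mathrm{MSSIM}(I, \hat{I})$ represents the image similarity computed over $K$ patches and $x_k$ and $\hat{x}_k$ represent the $k$th image patch of $I$ and $\hat{I}$. $\mathrm{SSIM}(x, \hat{x})$ is the structural similarity measurement of two image patches formulated as
$$\mathrm{SSIM}(x, \hat{x}) = \frac{(2\mu_x \mu_{\hat{x}} + c)(2\sigma_{x\hat{x}} + c)}{(\mu_x^2 + \mu_{\hat{x}}^2 + c)(\sigma_x^2 + \sigma_{\hat{x}}^2 + c)}, \tag{3}$$
where $\mu$ and $\sigma^2$ are the mean and variance of the image patch, $\sigma_{x\hat{x}}$ is the covariance of the two patches, and $c$ is a small constant which avoids division by zero.
Once all convolutional autoencoders are trained, the node representations in graph $G$ can be denoted as
$$v_i = \left\{f_i^{\mathrm{re}}, f_i^{\mathrm{si}}, f_i^{\mathrm{ah}}, f_i^{\mathrm{ti}}, f_i^{\mathrm{tm}}\right\}, \tag{4}$$
where $v_i$ is the representation of node $i$, and $f_i^{\mathrm{re}}$, $f_i^{\mathrm{si}}$, $f_i^{\mathrm{ah}}$, $f_i^{\mathrm{ti}}$, and $f_i^{\mathrm{tm}}$ are the representations of the radar echo map, satellite image, air humidity image, temperature image, and topographic map of the corresponding local area, obtained from the outputs of the decoders in Figure 3.
Then, the edges in graph $G$ are denoted as $e_{ij}$, where $e_{ij}$ is the similarity of two nodes $i$ and $j$, computed from $d_{ij} = \lVert v_i - v_j \rVert_2$, the Euclidean distance of the two node representations.
Therefore, let $\mathcal{G}$ be the graph representation of graph $G$, which can be formulated as
$$\mathcal{G} = \left(\{v_i\}_{i=1}^{n}, \{e_{ij}\}_{i,j=1}^{n}\right), \tag{6}$$
where $v_i$ is the representation of node $i$, $e_{ij}$ is the similarity of nodes $i$ and $j$, and $n$ is the number of nodes.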
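To make the MSSIM objective concrete, the following is a minimal sketch of patch-wise structural similarity averaged over an image. It assumes the standard SSIM form with a single stabilizing constant `c`, grayscale images stored as nested lists, and non-overlapping square patches; the patch size and the value of `c` are illustrative choices, not values reported in the paper.

```python
def ssim(x, y, c=1e-4):
    """Structural similarity of two flattened patches of equal length."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c) * (2 * cov + c)
    denominator = (mu_x ** 2 + mu_y ** 2 + c) * (var_x + var_y + c)
    return numerator / denominator


def patches(img, size):
    """Yield non-overlapping size x size patches as flat lists."""
    rows, cols = len(img), len(img[0])
    for r in range(0, rows - size + 1, size):
        for col in range(0, cols - size + 1, size):
            yield [img[r + i][col + j] for i in range(size) for j in range(size)]


def mssim(img_a, img_b, size=2):
    """Mean SSIM over all patch pairs of two equally sized images."""
    scores = [ssim(p, q) for p, q in zip(patches(img_a, size), patches(img_b, size))]
    return sum(scores) / len(scores)


image = [[0.0, 1.0, 0.0, 1.0],
         [1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0],
         [1.0, 0.0, 1.0, 0.0]]
inverted = [[1.0 - v for v in row] for row in image]
```

Because a perfect reconstruction gives an MSSIM of 1, a training loss would typically be `1 - mssim(input, reconstruction)`; that choice is also an assumption here.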
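As a sketch of how such a graph could be assembled, the code below builds a dense weighted adjacency matrix from per-node feature vectors. The source only states that edge similarity depends on the Euclidean distance between node representations, so the Gaussian kernel and the bandwidth `sigma` used here are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_adjacency(node_features, sigma=1.0):
    """Dense similarity matrix; closer nodes receive larger edge weights."""
    n = len(node_features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = euclidean(node_features[i], node_features[j])
                # ASSUMPTION: Gaussian kernel; the paper's exact mapping from
                # distance to similarity is not recoverable from the text.
                adj[i][j] = math.exp(-d * d / (2 * sigma ** 2))
    return adj


features = [[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
adjacency = build_adjacency(features)
```

Any monotonically decreasing function of the distance would serve the same purpose; the kernel only controls how quickly the influence of distant cells fades.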
3.2.2. Precipitation Nowcasting Model Learning
Based on the graph representation mentioned above, the observation data of different sampled times can be deemed as a graph sequence, as shown in Figure 4. Therefore, the precipitation nowcasting task is converted to a sequence graph modeling problem, which aims to obtain the precipitation of each node according to a series of graphs.
To address it, as illustrated in Figure 5, a multimodal semisupervised deep graph learning framework is proposed as a solution. A GCN and LSTM are used to model spatial and temporal information, respectively.
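A GCN contains one input layer, several propagation (hidden) layers, and one final perceptron layer (Kipf and Welling). Given an input $X^{(0)} = X$ and an adjacency matrix $A$, the GCN conducts the following layerwise propagation in the hidden layers:
$$X^{(k+1)} = \sigma\left(D^{-1/2} A D^{-1/2} X^{(k)} W^{(k)}\right), \tag{7}$$
where $k = 0, 1, \dots, K-1$. $D = \mathrm{diag}(d_1, \dots, d_n)$ is a diagonal matrix with $d_i = \sum_{j} A_{ij}$. $W^{(k)}$ is a layer-specific weight matrix needing to be trained. $\sigma(\cdot)$ denotes a nonlinear activation function, and $X^{(k)}$ denotes the activation output in the $k$th layer.
As each row in $X^{(K)}$ represents a local area, the precipitation can be predicted as follows:
$$\hat{P} = f\left(X^{(K)}\right), \tag{8}$$
where $f(\cdot)$ is a multilayer fully connected neural network. It should be noted that equation (8) is a regression formulation, which is different from the commonly used semisupervised GCN model. In conventional semisupervised node classification, each row in $Z$ represents a node, and the GCN defines the final perceptron layer as $Z = \mathrm{softmax}\left(D^{-1/2} A D^{-1/2} X^{(K)} W^{(K)}\right)$ to classify all nodes.
Considering that the input $X$ is crucial for computing equation (7), we design a fusion unit to integrate the five features into one single hidden feature representation. Here, we apply a fully connected layer with a hyperbolic tangent activation function. The detailed transformation is as follows:
$$x_i = \tanh\left(W_f \left[f_i^{\mathrm{re}}; f_i^{\mathrm{si}}; f_i^{\mathrm{ah}}; f_i^{\mathrm{ti}}; f_i^{\mathrm{tm}}\right]\right), \tag{9}$$
where $[\cdot\,;\cdot]$ denotes the concatenation of the five feature vectors of node $i$, $x_i$ is the fused representation forming the $i$th row of $X$, and $W_f$ is the transformation parameter, that is, the weights of the fully connected layer, which are learned automatically during training. This fusion unit is capable of learning the weights for the different types of features.
It is well known that precipitation is related to not only the current weather but also meteorological conditions over the past period. To handle this temporal influence, LSTM is utilized to establish relationships between the node at time $t$ and the nodes before time $t$, which is formulated as
$$\tilde{x}_i^{t} = g\left(x_i^{t-k}, \dots, x_i^{t-1}, x_i^{t}\right), \tag{10}$$
where $x_i^{t}$ is the feature of node $i$ at time $t$ and $g(\cdot)$ is an LSTM-based sequence model.
The optimal weight parameters are trained by minimizing the following loss function over all labeled nodes, i.e.,
$$\mathcal{L} = \sum_{t} \sum_{i \in \Omega} \left(\hat{p}_i^{t} - p_i^{t}\right)^2, \tag{11}$$
where $\Omega$ indicates the set of labeled nodes, $t$ denotes the index of all observation graphs, and $\hat{p}_i^{t}$ and $p_i^{t}$ are the predicted and observed precipitation of node $i$.
After the model training is completed, the future precipitation can be computed directly by a forward pass of the trained network.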
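The layerwise propagation can be sketched in a few lines of plain Python. The example symmetrically normalizes the adjacency matrix by the node degrees, multiplies by the node features and a weight matrix, and applies an activation. ReLU is assumed as the nonlinearity, and the helper names (`matmul`, `normalize_adjacency`, `gcn_layer`) are illustrative rather than part of the authors' implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2}, where D holds the row sums of A."""
    degrees = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(normalized_adj @ features @ weight)."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in propagated]


adj = [[1.0, 1.0], [1.0, 1.0]]   # two fully connected nodes with self-loops
features = [[1.0], [3.0]]        # one feature per node
hidden = gcn_layer(adj, features, [[1.0]])
```

With this toy graph, each node ends up with the average of both node features, which shows how a propagation step mixes information between neighboring local areas.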
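The difference from node classification is only in the output stage: instead of a softmax over classes, every node feature vector is mapped to a single real value. A minimal sketch of such a regression head follows, assuming ReLU between layers and a linear last layer; the layer sizes and weights are made up for the example.

```python
def dense(rows, weight, bias, activate):
    """Fully connected layer; weight holds one weight vector per output unit."""
    outputs = []
    for row in rows:
        values = [sum(x * w for x, w in zip(row, unit)) + b
                  for unit, b in zip(weight, bias)]
        outputs.append([max(0.0, v) for v in values] if activate else values)
    return outputs


def regression_head(node_features, layers):
    """Map each node feature vector to one scalar precipitation estimate."""
    hidden = node_features
    for index, (weight, bias) in enumerate(layers):
        is_last = index == len(layers) - 1
        # ReLU on hidden layers, no activation on the final regression output.
        hidden = dense(hidden, weight, bias, activate=not is_last)
    return [row[0] for row in hidden]


node_features = [[1.0, 2.0], [0.0, -1.0]]
layers = [
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # hidden layer with two units
    ([[1.0, 2.0]], [0.5]),                    # output layer with one unit
]
predictions = regression_head(node_features, layers)
```

The head is applied to every row independently, so labeled and unlabeled cells receive predictions in the same pass.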
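A fusion unit of this kind can be sketched as a single fully connected layer with a tanh activation applied to the concatenated modality features. The function name `fuse`, the absence of a bias term, and the toy dimensions are assumptions for illustration.

```python
import math


def fuse(modality_features, weight):
    """Concatenate per-modality vectors and apply tanh(W @ concat)."""
    concat = [value for feature in modality_features for value in feature]
    return [math.tanh(sum(w * x for w, x in zip(row, concat))) for row in weight]


# Five one-dimensional modality features for one node (toy values).
modalities = [[1.0], [0.0], [0.0], [0.0], [0.0]]
# Learned weights would come from training; these are fixed for the example.
weight = [[1.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0]]
fused = fuse(modalities, weight)
```

Each row of the weight matrix decides how strongly every modality contributes to one fused dimension, which is how the unit can weight the feature types differently.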
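To illustrate the temporal step, here is a deliberately small, single-unit LSTM cell that consumes a scalar sequence for one node and returns the final hidden state. It uses the standard LSTM gate equations; the scalar state, the parameter layout `(w_x, w_h, b)` per gate, and the function names are simplifying assumptions and do not reflect the model size used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, params):
    """One step of a single-unit LSTM cell on a scalar input x."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h + b

    i = sigmoid(pre_activation("input"))        # input gate
    f = sigmoid(pre_activation("forget"))       # forget gate
    o = sigmoid(pre_activation("output"))       # output gate
    g = math.tanh(pre_activation("candidate"))  # candidate cell value
    c_next = f * c + i * g
    h_next = o * math.tanh(c_next)
    return h_next, c_next


def encode_sequence(values, params):
    """Run the cell over a node's feature history and return the last state."""
    h, c = 0.0, 0.0
    for x in values:
        h, c = lstm_step(x, h, c, params)
    return h


zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("input", "forget", "output", "candidate")}
demo_params = dict(zero_params, candidate=(1.0, 0.0, 0.0))
state = encode_sequence([1.0], demo_params)
```

In a full model the same recurrence would run on vector-valued node features, one sequence per local area.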
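Restricting the loss to labeled nodes amounts to masking out cells without a station. The sketch below assumes a squared-error criterion averaged over the labeled entries and uses `None` to mark unknown precipitation; both choices are illustrative assumptions.

```python
def masked_squared_error(predictions, targets):
    """Mean squared error over labeled nodes across all observation graphs.

    predictions[t][i]: predicted precipitation of node i in graph t.
    targets[t][i]:     observed value, or None when the node is unlabeled.
    """
    total, count = 0.0, 0
    for graph_pred, graph_target in zip(predictions, targets):
        for predicted, observed in zip(graph_pred, graph_target):
            if observed is not None:  # skip nodes without a station
                total += (predicted - observed) ** 2
                count += 1
    return total / count if count else 0.0


predictions = [[1.0, 2.0], [3.0, 4.0]]
targets = [[1.0, None], [1.0, None]]
loss = masked_squared_error(predictions, targets)
```

Unlabeled nodes still influence training indirectly, because their features are propagated to labeled neighbors through the graph convolution.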
In this section, we first evaluate the proposed method on a large-scale real meteorological dataset. Then, we demonstrate extensive experimental results and analysis.
The dataset used in the benchmark contains radar echo images, satellite images from 2016 to 2018, and topographic maps. The radar reflectivity images at elevation with a resolution of pixels were collected from local new-generation weather radar stations (Hefei, Bengbu, and Huangshan) and cover a local area centered on each radar station. The topographic maps were collected from Baidu maps. Infrared and vapor channel images from the FengYun-2 meteorological satellite were chosen to increase the accuracy of precipitation nowcasting. The topographic maps and satellite images have the same spatial resolution and cover the same areas. Thus, a large-scale real meteorological dataset was obtained (partial raw data can be queried from http://www.amo.org.cn/tcsj/zh/index.jsp). All data were checked by five volunteers. Any 27 consecutive hours of sample data constitute a complete training sample, because this paper aims to forecast precipitation at a place for the next 3 h based on the weather conditions of its neighbors over the past 24 h. Moreover, to match the sampling time, the precipitation herein is accumulated over the past 20 min. For each local radar station, we randomly selected 5000 samples as the training set, 500 samples as the validation set, and 2000 samples as the test set.
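The sample construction can be sketched as a sliding window. Assuming one observation every 20 min (inferred from the 20 min accumulation period, not stated explicitly as the sampling interval), a 27 h sample spans 81 time steps: 72 steps of past observations and 9 steps of future precipitation. The function name and defaults are illustrative.

```python
def make_windows(series, step_minutes=20, past_hours=24, future_hours=3):
    """Split a time-ordered series into (past, future) training samples."""
    steps_per_hour = 60 // step_minutes
    n_past = past_hours * steps_per_hour      # 72 steps for 24 h at 20 min
    n_future = future_hours * steps_per_hour  # 9 steps for 3 h at 20 min
    window = n_past + n_future
    samples = []
    for start in range(len(series) - window + 1):
        past = series[start:start + n_past]
        future = series[start + n_past:start + window]
        samples.append((past, future))
    return samples


series = list(range(100))  # stand-in for 100 consecutive observations
samples = make_windows(series)
```

Random selection of training, validation, and test samples would then draw from the resulting list of windows.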
4.2. Evaluation Metrics
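We utilize the mean-square error (MSE) to evaluate the model's effectiveness. The formulation is as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{s=1}^{N} \left(\hat{p}_s - p_s\right)^2, \tag{12}$$
where $N$ is the number of test samples, $\hat{p}_s$ and $p_s$ are the predicted and observed precipitation of the $s$th test sample, and a smaller MSE means better performance.
Furthermore, to quantitatively evaluate how our model's effectiveness varies with time, we compute the MSE in different sampling time periods in the test phase:
$$\mathrm{MSE}_j = \frac{1}{N} \sum_{s=1}^{N} \left(\hat{p}_{s,j} - p_{s,j}\right)^2, \tag{13}$$
where $j$ is the index of the $j$th precipitation map.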
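Both metrics are straightforward to compute. The sketch below evaluates the overall MSE and the MSE per forecast step, assuming predictions are stored as one list of future values per test sample; the data layout and function names are assumptions for illustration.

```python
def mse(predicted, observed):
    """Mean-square error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)


def mse_per_horizon(predictions, targets):
    """MSE for each forecast step; predictions[s][j] is sample s, step j."""
    n_steps = len(predictions[0])
    return [mse([row[j] for row in predictions], [row[j] for row in targets])
            for j in range(n_steps)]


predictions = [[1.0, 2.0], [3.0, 4.0]]
targets = [[1.0, 0.0], [3.0, 4.0]]
per_step = mse_per_horizon(predictions, targets)
```

Reporting the error per step makes it visible how quickly accuracy degrades as the forecast horizon grows.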
4.3. Implementation Details
We trained the proposed model from scratch on the training set and considered the model with the smallest mean-square error on the validation set as our final model. All images were normalized to . Note that images from different sources have different resolutions. Therefore, we aligned and cropped all images before the training. We first selected two key points as anchors and resized all images based on the anchor distance of each image. Then, we cropped all images to pixels based on the radar longitude and latitude. ResNet18 was utilized to extract feature representations of all images, and the feature dimension was set to 256 (that is, the average pooling layer of the ResNet18 was replaced by a 256-d fully connected layer).
The open source deep learning framework PyTorch was employed to implement the proposed semisupervised deep regression model. The parameter initialization for all layers was Xavier, and we used the Adam optimizer with an initial learning rate of 0.001.
4.4. Results and Discussion
4.4.1. Ablation Study
Table 1 presents the mean-square errors of different time nodes of the three radar stations. We can observe that the proposed framework predicts the precipitation in the next hour well. It also clearly shows that the error increases with time. Moreover, we can see that our model has the worst performance for Huangshan. We believe that the reason is that the causes of precipitation in mountainous areas are more complicated than elsewhere.
To the best of our knowledge, this is the first attempt to model various types of the observed data in a unified network. We describe the performed ablation study below to explore the performance influenced by different factors.
The inputs have five forms:
re: inputs that only include radar echo maps
re + si: inputs that include radar echo maps and satellite images
re + si + ah: inputs that include radar echo maps, satellite images, and air humidity images
re + si + ah + ti: inputs that include radar echo maps, satellite images, air humidity images, and temperature images
re + si + ah + ti + tm: inputs that include radar echo maps, satellite images, air humidity images, temperature images, and the topographic map
From Table 2, we can see that all observed data used in this paper contribute to short-term precipitation nowcasting. Table 2 also shows that using the topographic map improves the accuracy only slightly compared with the other four factors. We think that the reason is that all other observed data are influenced by the topography. Therefore, the features extracted from the other factors represent the topographic map implicitly.
4.4.2. Comparison with Existing Methods
We note that some deep learning-based precipitation nowcasting methods have recently been proposed, e.g., [28–30]. However, they are all focused on supervised tasks. As we attempt to address the semisupervised automatic precipitation nowcasting problem, we compare our method with two widely used semisupervised node representation methods, i.e., DeepWalk by Perozzi et al. and ICA by Lu and Getoor. Specifically, the GCN utilized in the proposed framework is replaced by these two methods. The results are reported in Table 2. It can be observed that the proposed method achieves the best performance. We think that the major reason is that the proposed framework can handle the spatial (local region) relationship more effectively.
5. Conclusion and Future Work
In this paper, we propose a multimodal semisupervised deep graph learning framework for precipitation nowcasting. In particular, multimodal factors, i.e., the radar echo maps, air humidity images, satellite images, temperature images, and available precipitation data over the past several hours with the corresponding topographic map, are handled in a unified framework. In addition, we handle areas without precipitation stations, where precipitation is unknown. A GCN with sequence information is presented in this paper to model temporal and spatial information with semisupervised labels simultaneously. Using the methods above, we implemented precipitation nowcasting via a computer vision technique. To verify our method, we built large-scale real precipitation nowcasting datasets. Extensive experimentation demonstrated that our approach achieves superior performance to the baselines. In the future, we will explore how to embed attention mechanisms in our framework, which may improve the accuracy further.
The data used to support the findings of this study are available at http://data.cma.cn/site/index.html.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Kaichao Miao and Wei Wang contributed equally to this work and should be considered co-first authors.
This work was supported by the Anhui Provincial Natural Science Foundation (no. 2008085QF295), Scientific Research Development Foundation of Hefei University (no. 19ZR15ZDA), and Talent Research Foundation of Hefei University (no. 18-19RC54).
J. Gao, T. Zhang, and C. Xu, “I know the relationships: zero-shot action recognition via two-stream graph convolutional networks and knowledge graphs,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 8303–8311, Vancouver, Canada, July 2019.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, NV, USA, June 2016.
J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271, Honolulu, HI, USA, July 2017.
Z. Qiu, T. Yao, and T. Mei, “Learning spatio-temporal representation with pseudo-3D residual networks,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 5533–5541, Venice, Italy, October 2017.
C.-C. Chiu, T. N. Sainath, Y. Wu et al., “State-of-the-art speech recognition with sequence-to-sequence models,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4774–4778, IEEE, Calgary, Canada, April 2018.
S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, “Convolutional LSTM network: a machine learning approach for precipitation nowcasting,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 802–810, Montreal, Canada, December 2015.
X. Shi, Z. Gao, L. Lausen et al., “Deep learning for precipitation nowcasting: a benchmark and a new model,” in Proceedings of the Advances in Neural Information Processing Systems, pp. 5617–5627, Long Beach, CA, USA, December 2017.
S. Singh, S. Sarkar, and P. Mitra, “Leveraging convolutions in recurrent neural networks for doppler weather radar echo prediction,” in Proceedings of the International Symposium on Neural Networks, pp. 310–317, Springer, Hokkaido, Japan, June 2017.
E. Shi, Q. Li, D. Gu, and Z. Zhao, “A method of weather radar echo extrapolation based on convolutional neural networks,” in Proceedings of the International Conference on Multimedia Modeling, pp. 16–28, Springer, Bangkok, Thailand, February 2018.
Pechyonkin, Deep Learning Vision for Non-Vision Tasks, 2018, https://pechyonkin.me/deep-learning-vision-non-vision-tasks.
X. Zhu, Z. Ghahramani, and J. D. Lafferty, “Semi-supervised learning using gaussian fields and harmonic functions,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 912–919, Washington, DC, USA, August 2003.
M. Belkin, P. Niyogi, and V. Sindhwani, “Manifold regularization: a geometric framework for learning from labeled and unlabeled examples,” Journal of Machine Learning Research, vol. 7, pp. 2399–2434, 2006.
J. Weston, F. Ratle, H. Mobahi, and R. Collobert, “Deep learning via semi-supervised embedding,” in Neural Networks: Tricks of the Trade, pp. 639–655, Springer, Heidelberg, Germany, 2012.
B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, ACM, New York, NY, USA, August 2014.
J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077, International World Wide Web Conferences Steering Committee, Florence, Italy, May 2015.
A. Grover and J. Leskovec, “Node2vec: scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, ACM, San Francisco, CA, USA, August 2016.
Z.-M. Chen, X.-S. Wei, P. Wang, and Y. Guo, “Multi-label image recognition with graph convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5177–5186, Long Beach, CA, USA, June 2019.
X. Wang, Y. Ye, and A. Gupta, “Zero-shot recognition via semantic embeddings and knowledge graphs,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6857–6866, Salt Lake City, UT, USA, June 2018.
Q. Lu and L. Getoor, “Link-based classification,” in Proceedings of the 20th International Conference on Machine Learning, pp. 496–503, ICML-03, Washington, DC, USA, August 2003.