Tales of Two Societies: On the Complexity of the Coevolution between the Physical Space and the Cyber Space
Research Article (Open Access)
Complexity to Forecast Flood: Problem Definition and Spatiotemporal Attention LSTM Solution
Abstract
With the rapid development of sensors and the Internet of Things, researchers can now readily observe what happens in physical space by acquiring time-varying values of various factors. The growing variety and volume of data greatly help to solve problems that arise in physical space. In this paper, we address a complex problem that affects both cities and villages, i.e., floods. To reduce the impact of floods, hydrological factors acquired from physical space and data-driven models in cyber space have been adopted to forecast floods accurately. Considering the significance of modeling attention among hydrological factors, we believe that extracting discriminative hydrological factors not only reflects natural rules in physical space but also optimally models the interactions of factors to forecast runoff values in cyber space. Therefore, we propose a novel data-driven model named STA-LSTM, which integrates a Long Short-Term Memory (LSTM) structure with a spatiotemporal attention module and is capable of forecasting floods for small and medium-sized rivers. The proposed spatiotemporal attention module first explores the spatial relationship between input hydrological factors from different locations and runoff outputs, assigning time-varying weights to the various factors. Afterwards, the attention module allocates time-dependent weights to the hidden output of each LSTM cell, which describe the significance of each state output for the final forecasting results. Taking the Lech and Changhua river basins as cases of physical space, several groups of comparative experiments show that STA-LSTM is capable of optimizing the complexity of mathematically modeling floods in cyber space.
1. Introduction
As more sensors are deployed to acquire various data from physical space, researchers try to build a corresponding cyber space that describes the inherent mathematical relationship between sensor-acquired factors and outcomes, which makes it much more convenient to find novel solutions to real-world problems. However, mathematical and technical complexity arises in both procedures, i.e., transforming problem-related data from physical space to cyber space and utilizing models to solve problems in cyber space. Inspired by data-driven and artificial intelligence approaches to solving problems in physical space, we intend to address flood forecasting by means of complexity modeling and optimization.
Floods often occur suddenly and with devastating force, causing huge losses of life and property. It is therefore important to forecast flood disasters in advance. To minimize the impact of floods, researchers have proposed a large number of methods for accurate forecasting in the past decade [1, 2]. Based on their core forecasting ideas, we divide these methods into two categories: physical models [3, 4] and data-driven models [5, 6]. Physical models explain hydrological processes, such as rainfall, evaporation, and flow concentration, with conceptual mathematical equations. A highly nonlinear system of functions is then constructed to model the complex flood process from hydrological clues to the resulting large runoff values. Carefully designed physical models can be found in the works of Fan et al. [7] and Pontes et al. [8, 9], whose models are well fitted to specific areas to handle the complexity of flood forecasting.
Data-driven models directly model the mathematical interactions between hydrological factors and runoff values based on historical observations. In other words, data-driven models learn a mapping between flooding cues and flow rates without considering detailed physical processes, which is the main difference between physical and data-driven models [10]. Owing to the rapid development of machine learning, many novel data-driven forecasting methods have been proposed and put into practice, including Bayesian networks [1], SVM models [11], neural networks [6], and their variations and combinations.
Both physical and data-driven models are sensitive to their internal parameters, whose adjustment requires both a large amount of reliable data and a great deal of manual effort from researchers. In other words, the main difficulty in applying models to small or medium river basins is that the available data are insufficient to support accurate forecasting. Moreover, small or medium river basins in most developing countries have generally received little dedicated research, which makes it difficult to design appropriate physical models for forecasting. Based on these considerations, we aim to optimize the complexity of flood forecasting in small or medium river basins with data-driven models.
Recently, deep learning structures have gained much attention owing to their strong classification and regression results on tasks such as text detection, object categorization, and language translation. Following this progress in accuracy, we concentrate on the LSTM network with the goal of more accurate flood forecasting. LSTM is developed on the basis of the Recurrent Neural Network (RNN) and can handle long sequential data through its specially designed gates. Given its strong potential for forecasting time-varying variables, we apply LSTM to model the inherent and complex relationship between hydrological factors and runoff values.
The attention mechanism is widely used in prediction tasks. It borrows its idea from human visual attention, namely, that humans purposely view parts of an environment or picture guided by context information or advanced semantic knowledge. Inspired by this core idea, we study flood formation in small and medium river basins by first gathering related hydrological data from different locations and times. Afterwards, a novel LSTM structure embedding a spatiotemporal attention module, named STA-LSTM, is constructed to dynamically select hydrological features for accurate forecasting.
STA-LSTM takes advantage of the original LSTM structure, which is capable of handling long time-varying sequences. Meanwhile, the embedded spatiotemporal attention module first dynamically assigns spatial-wise weights to input hydrological factors acquired from different locations. It then allocates temporal-wise weights to the hidden output of each LSTM cell, which is interpreted as context information in Liu et al. [12]. Such spatial-wise and temporal-wise weights make it possible to dynamically characterize the significance of hydrological factors obtained at any time and location. According to case study experiments in the Lech river basin in Europe and the Changhua river basin in China, STA-LSTM can achieve accurate flood forecasting by constructing context-based weighting schemes.
We summarize the two contributions of this paper as follows:
(i) Facing the complexity of modeling cyber-physical interaction, we propose a novel LSTM model embedding a spatial-temporal attention module, which is capable of accurately predicting runoff values in cyber space based on hydrological data acquired from physical space.
(ii) We design a novel temporal attention module, which is built on contextual information to compute weights for each LSTM cell output. By incorporating both spatial and temporal context information, the proposed attention module helps describe how hydrological factors interact to form floods in physical space and appropriately reproduces this process in cyber space by constructing weighting schemes in STA-LSTM.
2. Related Work
In this section, we review related methods in two categories, i.e., data-driven models for flood forecasting and attention models.
2.1. Data-Driven Model for Flood Forecasting
Early on, Juliang et al. [13] propose an accelerated genetic algorithm (AGA). Their method utilizes a Back Propagation Neural Network (BPNN) to optimize initial parameters, which brings better and faster convergence. Inspired by the development of the support vector machine (SVM), Yu et al. [11] compare the flood forecasting performance of an artificial neural network (ANN) and an SVM. After a number of comparison experiments, they conclude that SVM is slightly better than ANN in forecasting floods. Later, Minghua et al. [14] compare in detail the experimental results achieved by the Xinanjiang model (a well-known physical model) and an ANN model. They conclude that ANN can reflect the time-varying characteristics of the hydrological process, which is an advantage over the abstract representation of the hydrological process in the Xinanjiang model.
After analyzing various flood forecasting models, Cheng et al. [15] utilize a quantum particle swarm optimization method to handle the complexity of defining parameters for ANN, which is then validated by experiments on forecasting daily runoff values of reservoirs. Lima et al. [16] conduct flood frequency analysis with a hierarchical Bayesian framework, which estimates Generalized Extreme Value (GEV) distribution parameters locally to explicitly model and reduce uncertainties. Recently, Wang et al. [10] proposed a Bayesian-based method, which establishes a posterior distribution for daily flow rate forecasts and uncertainty quantification.
With the idea of coupling the strengths of physical and data-driven models, O’Connel et al. [17] use paleohydrologic information constraints to effectively reduce the uncertainty in flood frequency analysis. Following this idea, Biondi et al. [18] first simulate the hydrologic response with a rainfall-runoff model named Infiltration and Saturation Excess (RISE) and then utilize the extracted hydrological information for subsequent deterministic Bayesian forecasting. Recently, Wu et al. [5] successfully transformed the hydrological process described by the Xinanjiang model into entities and connections of a Bayesian network, which offers a way to integrate expert knowledge into a data-driven model. To offer task-specific computing services, data-driven models have recently been developed alongside the Internet of Things [19, 20], cloud-edge computing [21–23], big data [24, 25], and other technologies [26–28].
2.2. Introduction to Attention Model
The core idea behind attention models is to select informative and significant information for the task goal, which coincides with the principle of the human selective visual mechanism. Existing attention models for deep learning can be divided into two groups: hard and soft attention. Hard attention can be understood as spatial selection of salient regions, which splits the input into parts with values of 0 (areas to ignore) or 1 (areas to concentrate on). In contrast, soft attention assigns flexible weight values between 0 and 1 to parts of the input data.
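The distinction between the two groups can be illustrated with a minimal sketch. The relevance scores, the threshold, and the use of a sigmoid for the soft weights are assumptions made for illustration only; they are not taken from any of the cited methods.

```python
import math

def hard_attention(scores, threshold=0.0):
    """Binary mask: 1.0 keeps a region, 0.0 ignores it (threshold is assumed)."""
    return [1.0 if s > threshold else 0.0 for s in scores]

def soft_attention(scores):
    """Sigmoid weights that lie strictly between 0 and 1."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

def apply_weights(features, weights):
    """Weight each input part by its attention value."""
    return [f * w for f, w in zip(features, weights)]

features = [2.0, 4.0, 6.0]
scores = [-1.0, 0.0, 3.0]  # hypothetical relevance scores per input part
hard = hard_attention(scores)
soft = soft_attention(scores)
print(apply_weights(features, hard))
print(apply_weights(features, soft))
```

With hard attention, a part either passes through unchanged or is dropped entirely; with soft attention, every part contributes, scaled by its weight.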
Mnih et al. [29] introduce the general idea of hard attention by optimally selecting salient regions from input images based on predefined selection rules. Their method performs recognition tasks on the selected salient regions with a novel RNN structure. Following the idea of hard attention, He et al. [30] propose a convolutional neural network, named Text-CNN, that incorporates an attention scheme for scene text detection. Specifically, their scheme not only extracts salient regions as informative parts of input images but also selects informative features from feature pools for more accurate detection.
Soft attention is a flexible and efficient add-on module for deep learning networks. For example, Song et al. [31] propose a spatial attention module to recognize human actions accurately and robustly. Their method first constructs a spatial-wise weighting scheme that pays attention to informative joints in each RGB-D skeleton frame and then assigns spatial attention weights to guide the construction of the feature map of the corresponding frame. To utilize global attention information for higher accuracy and robustness, a global context-aware attention LSTM [12] is built, which constructs and optimizes global attention information for each RGB-D human action sequence from the dataset in [32].
Most recently, Yeung et al. [33] utilize a soft attention model to assign frame-wise weights to frames captured by a sliding window, which helps fuse multiframe information for recognition tasks. Chen et al. [34] build an attention model on a novel network architecture that combines the advantages of CNN and RNN and extracts informative, modality-specific features for human activity recognition. They claim that the extracted features can represent high-level visual information even when trained on an imbalanced dataset of limited size. Anderson et al. [35] construct a bottom-up and top-down attention model on top of Faster R-CNN, which is capable of assigning weights at the level of objects or salient image regions. In experiments on several public datasets, they achieve state-of-the-art performance on the image captioning task. Inspired by the above attention models, we design the proposed spatial-temporal attention model to allocate attention weights along the temporal and spatial dimensions.
3. The Proposed Method
We first introduce the experimental small river basins, i.e., the Lech and Changhua river basins. Then, we introduce the overall network structure of the proposed STA-LSTM model. Finally, we present a novel spatial-temporal attention module and show how context information is extracted.
3.1. Introduction to Experimental River Basins
We take two river basins, i.e., Lech and Changhua, as experimental areas because they are small and have complex flood formation. Thanks to the development of remote sensing and sensor technologies, we build our prediction model on data collected from remote sensing imagery and sensors. Specifically, data on the Lech river basin are obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF), which offers worldwide weather and hydrological information for free download. Meanwhile, we obtain hydrological data on the Changhua river basin through cooperation with Chinese government departments.
Figure 1(a) shows the map of the Lech river basin, where we take the latitude and longitude range of the basin to be from (10.68E, 47.65N) to (10.94E, 48.73N) with 0.01 × 0.01 radius precision. Originating from the northwest slope of Lysitar mountain, the Lech river finally flows into the Danube river 40 km north of Augsburg. The total length, basin area, and average annual flow at the estuary of the Lech river are 263 km, 4126 km^{2}, and 120 m^{3}/s, respectively. The weather in the Lech river basin is warm and humid throughout the year.
Figure 1: Maps of the experimental river basins. (a) Lech river basin. (b) Changhua river basin.
Figure 1(b) shows the map of the Changhua river basin, where we can see that the Changhua river originates in Jixi County and finally flows into the Xinanjiang river. The total length, basin area, and average annual flow at the estuary of the Changhua river are 96 km, 905 km^{2}, and 146.651 m^{3}/s, respectively. The daily runoff of the Changhua river can vary from 0.58 m^{3}/s to 2100 m^{3}/s.
Our goal for both river basins is to forecast surface runoff at their convergence locations (shown as red circles in Figure 1) with the proposed STA-LSTM model. Specifically, the proposed model takes multiple hydrological factors as inputs, including precipitation, evaporation, soil tension water, temperature, and wind.
3.2. Network Architecture Design
We first offer a brief introduction to the mathematical theory of the LSTM cell. Then, we explain how the attention scheme improves the accuracy of flood forecasting. Afterwards, we design a novel LSTM network architecture that incorporates context information for the task of flood forecasting. Finally, we provide pseudocode of STA-LSTM for readers’ convenience.
3.2.1. Mathematical Theory of LSTM Cell
Because a plain RNN has difficulty maintaining long-distance dependency information, LSTM modifies the RNN by introducing a gate structure to keep a long-term state. The typical structure of an LSTM cell is shown in Figure 2, where we can observe the input gate i, output gate o, input modulation gate g, forget gate f, and memory cell c. Each LSTM cell updates its hidden output representation h at each state t with the following functions:

$$i_{t} = \sigma(W_{i} x_{t} + U_{i} h_{t-1} + b_{i}), \quad f_{t} = \sigma(W_{f} x_{t} + U_{f} h_{t-1} + b_{f}), \quad o_{t} = \sigma(W_{o} x_{t} + U_{o} h_{t-1} + b_{o}), \tag{1}$$

$$g_{t} = \tanh(W_{g} x_{t} + U_{g} h_{t-1} + b_{g}), \tag{2}$$

where x represents the input signal, function σ() refers to the sigmoid operation, and W, U, and b denote the weight matrices and bias vectors of the corresponding gates.
LSTM introduces a long-term memory structure c to maintain long-term information for each cell. Furthermore, it decides whether to forget information inside the memory based on the following equations:

$$c_{t} = f_{t} \odot c_{t-1} + i_{t} \odot g_{t}, \tag{3}$$

$$h_{t} = o_{t} \odot \tanh(c_{t}), \tag{4}$$

where ⊙ refers to elementwise multiplication. From equation (3), we can see that the forget gate f controls how much of the previous memory c_{t−1} is retained in the internal memory cell c_{t}, while the signal controlled by the input gate i and the input modulation gate g is added to it. Afterwards, the LSTM cell updates the hidden output h_{t} on the basis of the output gate o and the current memory cell c_{t}, as described in equation (4). With these designs of the memory cell and gates, the output of LSTM can be associated with previous input signals to memorize long sequential information [36].
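The gate and memory updates of an LSTM cell can be sketched in a few lines of plain Python. This is a minimal illustration of the standard LSTM step, not the authors' implementation. The name g for the input modulation gate, the `params` layout (one weight matrix acting on the concatenation of input and previous hidden output), and the helper names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps a gate name to (W, b), with W acting on [x_t; h_prev]."""
    z = x_t + h_prev  # list concatenation of input and previous hidden output
    pre = {name: affine(W, z, b) for name, (W, b) in params.items()}
    i = [sigmoid(a) for a in pre["i"]]    # input gate
    f = [sigmoid(a) for a in pre["f"]]    # forget gate
    o = [sigmoid(a) for a in pre["o"]]    # output gate
    g = [math.tanh(a) for a in pre["g"]]  # input modulation gate
    # Memory update: keep part of the old memory, add the gated new signal.
    c_t = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # Hidden output: gated, squashed view of the current memory.
    h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
    return h_t, c_t

# Hypothetical toy setup: one input value, one hidden unit, all-zero parameters.
zero_params = {name: ([[0.0, 0.0]], [0.0]) for name in ("i", "f", "o", "g")}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], zero_params)
print(h_t, c_t)
```

With all-zero parameters every sigmoid gate equals 0.5 and g equals 0, so the cell simply halves its previous memory, which makes the role of the forget gate easy to see.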
3.2.2. Function of Attention Scheme in Flood Forecasting
After years of research on applying data-driven models to flood forecasting, we find that adopting all hydrological data does not lead to satisfactory forecasting results, since some hydrological features are of little use for, or even independent of, runoff prediction. For example, soil tension water is an important factor for the initial state of floods in humid areas. Once the soil tension water exceeds the maximum amount that the soil can hold during rainfall, it no longer affects the variation of runoff values. Furthermore, soil tension water does not affect river flow in dry locations with sandy soil, because sandy soil retains water poorly. All these facts can be found in hydrological simulation studies or physical models [37].
Owing to the high spatial and temporal variation of hydrological processes, it is highly recommended to collect hydrological factors through a dense network of hydrometeorological stations. Given sufficient stations to collect hydrological features, modeling the degree of informativeness of each feature contributes to accurate forecasting. In other words, selectively utilizing informative factors acquired at significant times and locations is the key idea behind adopting an attention scheme in flood forecasting. With the ability to focus on key features and ignore irrelevant ones, data-driven models can appropriately integrate different factors to fit the flood process, in place of the expert knowledge used in physical models. Besides, irrelevant factors can introduce noise that decreases forecasting accuracy.
Based on the above discussion, we establish a dynamic feature selection mechanism, i.e., an attention mechanism, to describe the degree of informativeness of flood factors, so that different combinations or weights can be applied to input hydrological features based on context information, i.e., the inherent characteristics of the river basin for flood formation. There is a trend in the deep learning domain toward implementing all functions within one single network, which brings the advantages of less computation and high optimization efficiency. Following this trend, we design a novel LSTM network, i.e., STA-LSTM, in which the attention module completes the task of choosing variables, playing a role similar to that of Principal Component Analysis (PCA).
3.2.3. Structure of STA-LSTM
The network structure of the proposed STA-LSTM is shown in Figure 3, where the proposed attention module allocates dynamic weights to both the input and the output of STA-LSTM cells in order to select informative features. After the attention module is applied, the hidden outputs of all STA-LSTM cells are concatenated to form F, which is used to predict the increase or decrease in runoff values. We prefer to predict from all hidden outputs because the forget gate limits the ability of the LSTM structure to preserve global contextual information. Considering flood forecasting as a global regression problem, we therefore perform forecasting on all hidden outputs rather than only on the hidden output of the final state.
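The difference between forecasting from all hidden outputs and from the final one only can be sketched as follows. The linear regression head and all names are assumptions for illustration; the source does not specify how F is mapped to the runoff forecast.

```python
def concat_hidden(hidden_outputs):
    """Flatten [h_1, ..., h_tau] into one feature vector F."""
    return [value for h in hidden_outputs for value in h]

def linear_head(F, weights, bias):
    """Hypothetical regression head: a single weighted sum of the entries of F."""
    return sum(w * v for w, v in zip(weights, F)) + bias

# Three states with two hidden units each (made-up values).
hidden = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

F_all = concat_hidden(hidden)         # uses every state: 6 values
F_last = concat_hidden(hidden[-1:])   # uses only the final state: 2 values

print(linear_head(F_all, [1.0] * len(F_all), 0.0))
print(linear_head(F_last, [1.0] * len(F_last), 0.0))
```

Concatenation keeps the information of early states available to the head even if the recurrent memory has partly forgotten it.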
As shown in Figure 3, we first acquire raw hydrological data from a small or medium river basin to construct the input dataset X = {x^{i} ∣ i = 1, 2, …, n}, where i and n refer to the index and total number of samples, i.e., there are n flood records in X. Afterwards, each raw sample x^{i} is normalized to construct the corresponding hydrological feature set I^{i}:

$$I^{i} = f_{\mathrm{norm}}(x^{i}) = \{ I^{i}_{t} \in \mathbb{R}^{H \times W} \mid t = 1, 2, \ldots, \tau \}, \tag{5}$$

where function f_{norm}() represents the normalization function, τ represents the total number of states over the whole network, and H × W refers to the size of the input feature for each state, which is formed by the various hydrological factors.
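The source does not state which normalization f_{norm}() is used. As one common possibility, the sketch below applies min-max scaling to each factor across the states of a sample; the function name, the choice of min-max scaling, and the flattened per-state layout are all assumptions.

```python
def min_max_normalize(sample):
    """Scale each factor (column) of a sample to [0, 1] across its states.

    sample: list of tau states, each a flat list of factor values.
    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*sample))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for state in sample:
        row = []
        for value, lo, hi in zip(state, lows, highs):
            row.append((value - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(row)
    return normalized

# Hypothetical sample: 3 states, 2 factors (e.g., precipitation and evaporation).
raw = [[0.0, 10.0], [5.0, 10.0], [10.0, 30.0]]
print(min_max_normalize(raw))
```

Scaling each factor separately keeps factors with large physical ranges, such as runoff-related quantities, from dominating factors with small ranges.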
At state t, which corresponds to the part marked by the blue rectangle in Figure 3, the tth LSTM cell computes the hidden output of the current state h_{t} with

$$h_{t} = f_{\mathrm{lstm},t}(\hat{I}_{t}, \hat{h}_{t-1}), \tag{6}$$

where function f_{lstm,t}() represents the processing of the tth LSTM cell to maintain long-term information, Î_{t} represents the weighted input feature computed by the spatial attention module S, and ĥ_{t−1} refers to the weighted former hidden output computed by the temporal attention module T. Note that the number of LSTM cells equals the total number of states τ.
Specifically, the weighted input feature Î_{t} in equation (6) is computed by the proposed spatial attention module S with

$$\hat{I}_{t} = \alpha_{t} \otimes I_{t}, \quad \alpha_{t} = f_{S,t-1}(I_{t}), \tag{7}$$

where α_{t} refers to the spatial attention weight for state t, ⊗ denotes an elementwise operation, and function f_{S,t−1}() denotes the operations inside the (t − 1)th spatial attention module to compute α_{t}. Note that the number of spatial attention modules, like the number of temporal attention modules, is τ − 1.
Meanwhile, the weighted hidden output ĥ_{t−1} is computed by the proposed temporal attention module T with

$$\hat{h}_{t-1} = \beta_{t-1} \odot h_{t-1}, \quad \beta_{t-1} = f_{T,t-1}(I_{t-1}, I_{t}), \tag{8}$$

where I_{t−1} and I_{t} are the original input features for states t − 1 and t, respectively, β_{t−1} refers to the temporal attention weight for state t − 1, ⊙ represents elementwise multiplication, and function f_{T,t−1}() represents the operations inside the (t − 1)th temporal attention module to compute β_{t−1}.
3.2.4. Pseudocode of STA-LSTM
After describing the steps of building the STA-LSTM network with mathematical functions, we provide detailed pseudocode in Algorithm 1, from which readers can easily understand the experimental process and implementation details of the STA-LSTM model.

3.3. Spatial-Temporal Attention Module
The structure of the proposed spatial-temporal attention module is shown in Figure 4. Compared with traditional physical models, which rely on expert knowledge and experience to assign factor weights manually, the proposed attention model automatically selects informative factors for forecasting based on the inherent characteristics of the collected data, which makes it more flexible across application scenarios.
3.3.1. Spatial Attention Module
Data acquired from ECMWF are organized as grids defined by latitude and longitude, which offers detailed information on the spatial distribution of the input hydrological factors. However, such a fine resolution, i.e., 0.01 × 0.01 radius, would greatly increase the computational burden of the proposed model. Therefore, we reorganize the acquired data to balance precision and efficiency, and the reconstructed data structure is shown in Figure 5. Figure 5(a) is abstracted from Figure 1 with a flip operation and larger spatial grids. After accumulating data from the original small grids into the large grids, we obtain a novel and effective representation of the input hydrological factors in Figure 5(b), where each factor can be viewed as a 3D vector with feature, position, and time values.
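Figure 5: Reconstructed data structure. (a) Lech river basin abstracted into large spatial grids. (b) 3D representation of the input hydrological factors with feature, position, and time values.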
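Accumulating fine grid cells into larger ones can be sketched as a block sum. The block size, the use of a plain sum, and the function name are assumptions for illustration; the source only states that data from small grids are accumulated into large grids.

```python
def coarsen_grid(grid, block):
    """Sum non-overlapping block x block cells of a 2D grid into one coarse cell.

    Cells at the right and bottom edges form smaller blocks if the grid size
    is not a multiple of block.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, block):
        coarse_row = []
        for c in range(0, cols, block):
            total = sum(
                grid[i][j]
                for i in range(r, min(r + block, rows))
                for j in range(c, min(c + block, cols))
            )
            coarse_row.append(total)
        coarse.append(coarse_row)
    return coarse

# Hypothetical 4 x 4 fine grid of one factor at one time step.
fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(coarsen_grid(fine, 2))
```

A block size of 2 reduces the number of cells by a factor of four, which shows how the coarse grid trades spatial detail for a lower computational burden.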
The informativeness of input hydrological factors varies greatly across locations. For example, regions near the Lech river should be more important than regions far away, since rainfall near the river quickly converges into it and increases runoff values. To utilize this spatial property, a spatial attention module is constructed to assign weights to hydrological features acquired from different location grids. Essentially, the spatial attention module explores the interchannel relationship among features obtained from different locations, which helps STA-LSTM pay more attention to salient grids for accurate forecasting.
As shown in Figure 4, the input feature I_{t} is processed by a fully connected layer and a sigmoid function to output the spatial-wise weight α_{t}:

$$\alpha_{t} = f_{s}(W_{S} I_{t} + b_{s}), \tag{9}$$

where function f_{s}() refers to the sigmoid operation, and W_{S} and b_{s} represent the weight matrix and bias parameters of the fully connected layer, respectively.
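A fully connected layer followed by a sigmoid, applied to a flattened input feature, can be sketched as below. Treating the weighting of the input as elementwise multiplication and flattening I_{t} into a vector are assumptions; the parameter values are made up.

```python
import math

def spatial_attention(I_t, W_S, b_s):
    """Return (alpha_t, weighted input) for a flattened input feature I_t.

    alpha_t = sigmoid(W_S I_t + b_s); the weighted input is assumed to be the
    elementwise product of alpha_t and I_t.
    """
    pre = [sum(w * x for w, x in zip(row, I_t)) + b for row, b in zip(W_S, b_s)]
    alpha = [1.0 / (1.0 + math.exp(-p)) for p in pre]
    weighted = [a * x for a, x in zip(alpha, I_t)]
    return alpha, weighted

# Hypothetical feature with three grid cells and an untrained (all-zero) layer.
I_t = [2.0, 4.0, 8.0]
W_S = [[0.0] * 3 for _ in range(3)]
b_s = [0.0] * 3
alpha_t, I_hat = spatial_attention(I_t, W_S, b_s)
print(alpha_t, I_hat)
```

An all-zero layer assigns the neutral weight 0.5 to every cell; training would move the weights of informative cells toward 1 and those of uninformative cells toward 0.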
3.3.2. Temporal Attention Module
Considering that sequential data generally exhibit a trend, researchers designed the Holt-Winters double exponential smoothing filter to describe the relationship between current and former observation values. Following this assumption and implementing it with a dynamically updated weighting scheme, we utilize the temporal attention module to assign weights to the hidden outputs of STA-LSTM cells, which acts as a function that models the relation between observations at different times.
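For readers unfamiliar with double exponential smoothing, the sketch below shows the fixed-weight version (Holt's linear trend method) that motivates the temporal attention module. The smoothing factors, the initialization, and the function name are illustrative assumptions; STA-LSTM replaces such fixed weights with learned, dynamically updated ones.

```python
def holt_forecast(series, alpha=0.5, beta=0.5):
    """One-step-ahead forecast with double exponential smoothing.

    alpha smooths the level and beta smooths the trend; both are fixed here.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # The new level blends the observation with the previous prediction.
        level = alpha * y + (1 - alpha) * (level + trend)
        # The new trend blends the latest level change with the old trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + trend

print(holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Each update weighs the current observation against the former state, which is the relationship that the temporal attention weights model adaptively.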
As shown in Figure 3, we utilize the hidden outputs of the current state and the former state to construct the temporal attention module, which explores the difference between the two states to decide whether the current input is informative. The detailed structure of the proposed temporal module is shown in the right part of Figure 4. The temporal weight β_{t−1} for state t − 1 is computed by applying the ReLU function f_{R}() to a linear combination defined by the parameter matrices W_{t−1,t−2} and W_{t−1,t−2}, which are learned during training, and the bias vector b_{t−1}. In effect, the temporal-wise weight controls how much information passes through the network from the former hidden output to the next cell. Therefore, the temporal attention module is a beneficial complement to the spatial attention module.
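One way such a temporal weight could be computed is sketched below. The exact operands of the two parameter matrices are not recoverable from the text, so this sketch assumes that they act on the hidden outputs of the former and the current state; the function name and all parameter values are hypothetical.

```python
def temporal_attention(h_former, h_current, W_former, W_current, bias):
    """Return (beta, weighted hidden output) under the stated assumptions.

    beta = ReLU(W_former h_former + W_current h_current + bias), and the
    weighted hidden output is the elementwise product of beta and h_current.
    """
    beta = []
    for row_f, row_c, b in zip(W_former, W_current, bias):
        pre = (
            sum(w * v for w, v in zip(row_f, h_former))
            + sum(w * v for w, v in zip(row_c, h_current))
            + b
        )
        beta.append(max(0.0, pre))  # ReLU
    weighted = [bt * h for bt, h in zip(beta, h_current)]
    return beta, weighted

# Made-up parameters that measure the change between the two states.
h_former, h_current = [1.0, 1.0], [1.0, 3.0]
W_former = [[-1.0, 0.0], [0.0, -1.0]]
W_current = [[1.0, 0.0], [0.0, 1.0]]
beta, h_hat = temporal_attention(h_former, h_current, W_former, W_current, [0.0, 0.0])
print(beta, h_hat)
```

With these parameters, a hidden unit that did not change between the two states receives weight 0 and is suppressed, while a unit that increased is passed on with a positive weight.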
4. Experiments
In this section, we first introduce the datasets and measurements. Then, we design two groups of experiments to discuss the sensitivity of hydrological features and the effectiveness of the proposed attention module. Afterwards, we conduct comparative studies with several flood forecasting methods. Finally, we offer implementation details of STA-LSTM.
4.1. Dataset and Measurement
We utilize two datasets, i.e., the Lech and Changhua river basins, to demonstrate the effectiveness of STA-LSTM. Their features differ, since they are collected from ECMWF and cooperating government departments, respectively. Specifically, we utilize the tool provided by ECMWF to collect 7360 hydrological instances of the Lech river basin ranging from May 1, 2002, to January 1, 2018. These instances show significant increases in runoff values and provide raw data for detecting patterns of runoff variation. Meanwhile, we collect 8555 samples ranging from January 1, 1998, to December 31, 2010, which represent 40 floods and consist of manually recorded hydrological data from rainfall, evaporation, and gauging stations in the Changhua watershed. The Lech dataset contains sufficient information on all adopted hydrological features, i.e., precipitation, evaporation, soil tension water, temperature, and wind, whereas the Changhua dataset lacks information on temperature and wind. The absence of these two minor hydrological factors does not have a great impact on the accuracy of flood forecasting results. Moreover, we obtain soil tension water data from the calculation of the Xinanjiang model (XAJ model for short), a well-known physical model for forecasting floods in semihumid regions.
Table 1 offers descriptive statistics of the flow and rainfall data collected from the Lech and Changhua river basins, where we can observe that the two datasets are represented differently. This is due to their distinct data collection procedures: the Lech dataset is constructed from remote sensing imagery, whereas the Changhua dataset is collected manually. From Table 1, we can also notice an obvious difference in data distribution between the Lech and Changhua rivers, since the characteristics of river datasets vary greatly from one to another. This makes it difficult to forecast river runoff values accurately, since models must describe the relation between input hydrological features and runoff values without overfitting.

Because ECMWF collects hydrological data every 3 hours, each state in STA-LSTM corresponds to 3 hours. As shown in Figure 5, we use the 3D feature with the time value (number of states) set to 6, which yields an input feature that describes the hydrological factors over 18 hours. After training, we perform regression with STA-LSTM on the runoff values of the next 1, 2, and 3 states based on the hydrological features of the former 6 states.
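Building training samples with 6 input states and forecasting targets 1, 2, and 3 states ahead can be sketched with a sliding window. The function name and the toy data are hypothetical; only the window length and the horizons come from the text.

```python
def make_windows(features, runoff, n_input=6, horizons=(1, 2, 3)):
    """Pair each run of n_input consecutive states with future runoff targets.

    features: one feature list per 3-hour state.
    runoff:   one runoff value per state, aligned with features.
    """
    samples = []
    last_start = len(features) - n_input - max(horizons)
    for start in range(last_start + 1):
        end = start + n_input  # index of the first state after the window
        x = features[start:end]
        y = [runoff[end + h - 1] for h in horizons]
        samples.append((x, y))
    return samples

# Toy series with 10 states (30 hours) and a single feature per state.
features = [[float(t)] for t in range(10)]
runoff = [10.0 * t for t in range(10)]
for x, y in make_windows(features, runoff):
    print([v[0] for v in x], y)
```

Each input window spans 6 × 3 = 18 hours, and the three targets lie 3, 6, and 9 hours after the end of the window.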
We use standard quality measures, i.e., Root Mean Square Error (RMSE), Mean Absolute Percent Error (MAPE), and Deterministic Coefficient (DC), to measure the quality of flood forecasting. These three measurements are formulated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i}-q_{i})^{2}}, \quad \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_{i}-q_{i}}{q_{i}}\right| \times 100\%, \quad \mathrm{DC} = 1 - \frac{\sum_{i=1}^{n}(y_{i}-q_{i})^{2}}{\sum_{i=1}^{n}(q_{i}-\bar{q})^{2}},$$

where y_{i} and q_{i} refer to the forecast runoff values and the corresponding ground-truth runoff values, q̄ refers to the average of the ground-truth runoff values, and n is the number of testing instances. A higher DC value implies more reliable flood forecasting results. Meanwhile, RMSE and MAPE quantify the error between forecasting results and ground-truth values, so smaller RMSE and MAPE values indicate higher forecasting accuracy.
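The three measures can be computed as in the following sketch, which uses their standard definitions. Expressing MAPE in percent and the function names are assumptions; the toy values are made up.

```python
import math

def rmse(forecast, truth):
    """Root mean square error between forecasts and ground truth."""
    return math.sqrt(sum((y - q) ** 2 for y, q in zip(forecast, truth)) / len(truth))

def mape(forecast, truth):
    """Mean absolute percent error, in percent; ground truth must be nonzero."""
    return 100.0 * sum(abs((y - q) / q) for y, q in zip(forecast, truth)) / len(truth)

def dc(forecast, truth):
    """Deterministic coefficient: 1 minus residual variance over truth variance."""
    mean_q = sum(truth) / len(truth)
    residual = sum((y - q) ** 2 for y, q in zip(forecast, truth))
    total = sum((q - mean_q) ** 2 for q in truth)
    return 1.0 - residual / total

truth = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 30.0]
print(rmse(forecast, truth), mape(forecast, truth), dc(forecast, truth))
```

A perfect forecast gives RMSE = 0, MAPE = 0, and DC = 1, while a forecast that always returns the mean of the ground truth gives DC = 0.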
4.2. Performance Analysis
We design three comparative experiments to analyze the performance of STA-LSTM. Specifically, the first experiment estimates the sensitivity of the input hydrological features, the second compares the effectiveness of STA-LSTM with and without attention modules, and the last compares the performance of STA-LSTM with other methods. For all experiments in this paper, the input time period is set from T − 5 to T, so that the input data contain the hydrological features of the 6 states up to T.
4.2.1. Feature Sensitivity Experiment
We show the experimental data on feature sensitivity in Table 2. In each case, we eliminate one input hydrological feature and keep the other inputs the same, which shows the sensitivity of forecasting accuracy to that specific feature. Because the Changhua dataset lacks two hydrological features, we use the Lech dataset for the feature sensitivity experiment. To better compare results, the tables offer two additional statistics, i.e., the mean and bias values, denoted by the subscripts A and σ, where the latter is the difference between the result under the current running condition and that achieved by STA-LSTM.
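The leave-one-feature-out setup and the two summary statistics can be sketched as follows. The helper names are hypothetical, and computing the bias as the difference between mean scores is an assumption, since the text does not give the exact formula.

```python
def drop_feature(sample, names, excluded):
    """Remove one named feature (column) from every state of a sample."""
    keep = [k for k, name in enumerate(names) if name != excluded]
    return [[state[k] for k in keep] for state in sample]

def mean_and_bias(scores, reference_scores):
    """Mean score of one run and its difference from the reference run's mean.

    scores and reference_scores hold one value per forecasting horizon.
    """
    mean = sum(scores) / len(scores)
    reference_mean = sum(reference_scores) / len(reference_scores)
    return mean, mean - reference_mean

names = ["precipitation", "evaporation", "wind"]
sample = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # two states, three factors
print(drop_feature(sample, names, "evaporation"))

# Made-up RMSE values for horizons of 1, 2, and 3 states.
print(mean_and_bias([3.0, 4.0, 5.0], [2.0, 3.0, 4.0]))
```

A bias close to zero means that removing the feature barely changes the result, which is how a weakly related factor would show up in such an experiment.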

From Table 2, we can see that the smallest RMSE is achieved by WoTemperature and the best MAPE and DC are achieved by WoWind. In other words, eliminating either temperature or wind has only a small impact on forecasting results. Based on this observation, we conclude that these two hydrological factors are less related to the formation of floods. Meanwhile, the worst performance in all three measurements is obtained by WoPrecipitation, which shows that rainfall is the most significant factor for accurate flood forecasting.
For the WoEvaporation experiment, the bias values in the three measurements increase with longer forecasting time. On the contrary, the bias values corresponding to WoSTWater decrease with longer forecasting time. Based on this observation, the importance of evaporation gradually grows for long-time forecasting, while STWater mostly contributes to short-time forecasting. Essentially, STWater defines the initial state before the formation of a flood, which makes it significant for short-time forecasting. Meanwhile, evaporation affects flood formation throughout the whole flood process. Last but not least, neither evaporation nor STWater is a major feature for accurate prediction when compared with precipitation.
4.2.2. Ablation Experiment
Details of the comparative experiment on the effectiveness of the attention modules are shown in Table 3. Specifically, we perform three groups of experiments: with the spatial attention module only, with the temporal attention module only, and with both modules.

From Table 3, we can observe that STALSTM performs better in the reported measurements than the variants with only the spatial or only the temporal attention module, which indicates that the attention modules effectively involve context information to enhance hydrological features. We can also notice that most of the second-best results, and the best results in bias value, are achieved by the variant with the temporal module, which indicates that temporal attention information contributes more to forecasting than spatial attention information. Essentially, a flood is a complex procedure of runoff generation, separation, and routing, which makes timing an important factor for flood forecasting. Therefore, temporal context information extracted from sequential data contributes more to accurate flood forecasting.
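The two modules compared in this ablation can be illustrated with a small sketch. It assumes softmax-normalized weights and takes the raw attention scores as plain inputs, whereas in STALSTM they are produced by learned layers. The function names and the `use_temporal` fallback to the last hidden output are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(x_t, scores_t):
    """Reweight the hydrological factors observed at one time step."""
    weights = softmax(scores_t)
    return [w * x for w, x in zip(weights, x_t)]

def temporal_attention(hidden_states, scores):
    """Combine per-step hidden outputs into one weighted summary vector."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]

def summarize(hidden_states, scores, use_temporal=True):
    """Ablation switch (assumed): without temporal attention,
    fall back to the last hidden output."""
    if use_temporal:
        return temporal_attention(hidden_states, scores)
    return list(hidden_states[-1])

x_t = [2.0, 4.0]                       # two factors at one time step
weighted_x = spatial_attention(x_t, [0.0, 0.0])
hidden = [[1.0, 0.0], [0.0, 1.0]]      # hidden outputs of two time steps
with_temporal = summarize(hidden, [0.0, 0.0], use_temporal=True)
without_temporal = summarize(hidden, [0.0, 0.0], use_temporal=False)
```

With equal scores every factor and every time step receives the same weight. Unequal scores shift the weight toward the more informative inputs or timings.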
4.2.3. Experiment with Comparative Methods
Tables 4 and 5 offer detailed statistics obtained by testing STALSTM and the comparative methods on the Lech and Changhua datasets, respectively. Specifically, we implement SVM, LSTM, and a 10-layer FCN (fully connected network) as comparative methods. For fair comparison, the structure and parameters of LSTM are set to be the same as those of STALSTM. Moreover, we include the XAJ model in Table 5 to compare data-driven models with physical models. We do not apply the XAJ model to the Lech dataset because it is specially designed for the Changhua river basin and other semi-humid regions, and we therefore assume it does not fit the Lech river basin.


As shown by R_{A}, M_{A}, and D_{A} in Table 4, STALSTM achieves the best performance on the Lech dataset. Meanwhile, Table 5 shows that STALSTM achieves the best performance in DC and the second-best performance in RMSE and MAPE on the Changhua dataset. Comparing the XAJ model with STALSTM in Table 5, we can notice that the specifically designed physical model, i.e., the XAJ model, is capable of obtaining higher accuracy in flood forecasting, especially for long-time forecasting. This can be explained by the fact that a flood is a complex process to model in a data-driven way with a limited amount of data. Therefore, fitting the flood process with insufficient data, without building inherent and knowledge-embedded relations, does not work well for long-time forecasting. Furthermore, errors and noise in data-driven models easily accumulate during forecasting without appropriate means of error optimization.
Comparing STALSTM with the other data-driven models in Tables 4 and 5, we can notice that FCN performs better than STALSTM for forecasting at T + 3 on the Lech dataset. However, it fails to obtain consistent performance for long-time forecasting, i.e., T + 6 and T + 9. In fact, the ten-layer FCN with a limited number of parameters is suitable for relatively simple short-time forecasting. However, complexity increases greatly with a longer forecasting period, and FCN lacks sufficient parameters to model and optimize this complexity, which results in worse performance. SVM is widely used for learning with a limited amount of data. However, traditional SVM is not appropriate for regression inference on complex and sequential data, which leads to worse performance than STALSTM. The original LSTM performs worse than STALSTM in all cases, which indicates that the attention module significantly improves accuracy by focusing on informative hydrological factors and timings, especially when forecasting with a small dataset.
Evidently, all data-driven methods perform better for short-time forecasting, i.e., T + 3, since the core task of short-time forecasting for a data-driven model is to fit data with suitable parameters and prevent overfitting. Due to the accumulation of uncertainty and errors from models and input factors such as weather forecasts, performance decreases for long-time forecasting. To deal with the complexity of long-time forecasting, the proposed STALSTM is built on the LSTM structure, whose cell memory is designed to represent and memorize long-time dependencies. Moreover, STALSTM inherently models context information to better describe long-term memory and to reduce the impact of noisy input. As a result, STALSTM is capable of forecasting floods with higher accuracy and over a longer time period than the other data-driven models, as shown by its best performance at T + 6 and T + 9.
We adopt one instance in the test datasets to compare the runoff forecasting performance of STALSTM and the comparative methods in Figure 6. According to the plot for T + 3, FCN performs best, the performance of STALSTM is close to that of LSTM, and SVM achieves the worst forecasting result. Essentially, the forecasting curve obtained by SVM is too smooth to follow the strongly time-varying ground-truth runoff values, due to its inherent modeling assumption that output values should be smooth to a certain extent. Because of this smooth output property, the forecasting results of SVM are low in RMSE but fail to coincide with the time-varying runoff plot. Comparing the forecasting curves for short-time and long-time forecasting, we can notice obvious distortions for T + 6 and T + 9 due to the large increase in complexity and difficulty of the forecasting task. Among the methods for long-time forecasting, STALSTM performs best, LSTM and SVM achieve slightly worse results, and FCN performs worst. Overall, the forecasting performance of STALSTM is much more accurate and consistent than that of the comparative methods in this case study. Meanwhile, the plots obtained by SVM and LSTM are visually close, and both show an obvious degree of distortion. Last but not least, FCN produces a forecasting plot with serious distortion and large deviations at several key timings.
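Figure 6: Runoff forecasting results of STALSTM and the comparative methods on one test instance, panels (a), (b), and (c).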
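For readers who wish to reproduce the comparisons in this subsection, the three measurements can be computed as sketched below. This is a minimal sketch assuming the commonly used definitions: RMSE as the root of the mean squared error, MAPE as the mean absolute relative error in percent, and DC (deterministic coefficient) as one minus the ratio of squared error to the variance of the observations. The function names and the sample values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and forecast runoff."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be nonzero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def dc(obs, pred):
    """Deterministic coefficient: 1 is a perfect fit, 0 matches the mean forecast."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

# Placeholder runoff values for illustration only, not data from the paper.
observed = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]

scores = {
    "RMSE": rmse(observed, forecast),
    "MAPE": mape(observed, forecast),
    "DC": dc(observed, forecast),
}
```

Lower is better for RMSE and MAPE, while higher is better for DC, so the three should be read together when ranking methods.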
4.3. Implementation
Experiments were performed on a server (2.4 GHz 6-core Xeon CPU, 60 GB RAM, and one Nvidia GeForce GTX 1080 Ti card). For the Lech and Changhua datasets, we utilize 4-fold cross-validation to test STALSTM and the comparative methods. In the network design, the dimension of the hidden state and the total number of states of the STALSTM network are set to 128 and 32, respectively. During training, the learning rate, weight decay, and number of iterations are set to 0.0025, 10^{−6}, and 500, respectively. We update the learning rate every 100 iterations, with a corresponding decrease rate of 0.01.
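The training setup above can be summarized in a short configuration sketch. Two points are assumptions rather than facts from the text: the folds are formed as contiguous blocks of samples, and the decrease rate of 0.01 is read as a multiplicative factor applied every 100 iterations (it could also mean a 1% reduction). All names below are hypothetical.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def step_lr(base_lr, iteration, step=100, gamma=0.01):
    """Learning rate after `iteration` updates under a step schedule.

    gamma is treated as a multiplicative factor (assumption); pass
    gamma=0.99 to model a 1% reduction per step instead.
    """
    return base_lr * gamma ** (iteration // step)

# Hyperparameters as stated in the text.
config = {
    "hidden_dim": 128,
    "num_states": 32,
    "learning_rate": 0.0025,
    "weight_decay": 1e-6,
    "iterations": 500,
    "lr_step": 100,
    "lr_gamma": 0.01,
}

folds = list(k_fold_indices(10, k=4))  # toy sample count for illustration
lr_at_start = step_lr(config["learning_rate"], 0)
lr_after_100 = step_lr(config["learning_rate"], 100)
```

Keeping `gamma` as a parameter makes it easy to test both readings of the decrease rate without changing the rest of the training loop.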
5. Conclusion
Facing the difficulties of transforming problem-related data from physical space to cyber space and of utilizing models to solve problems in cyber space, we first define the problem of flood from the views of both physical and cyber space, and then propose STALSTM, which embeds spatial-temporal attention information for flood forecasting. STALSTM can selectively utilize informative hydrological features acquired from significant locations and timings. Experiments on the Lech and Changhua river basins demonstrate the effectiveness of STALSTM in comparison with several other methods. Our future work includes constructing a lightweight flood forecasting model by eliminating useless hydrological features, which would not only boost the running speed of the flood forecasting system but also largely decrease the complexity of collecting data and modeling feature relationships.
Data Availability
The image and acquired sensor data used to support the findings of this study were supplied by Yukai Ding under license and so cannot be made freely available. Requests for access to these data should be made to Yirui Wu (wuyirui@hhu.edu.cn).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Key R&D Program of China under Grant 2018YFC0407901, the Natural Science Foundation of China under Grant 61702160, and the Natural Science Foundation of Jiangsu Province under Grant BK20170892.
Copyright
Copyright © 2020 Yirui Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.