Discrete Dynamics in Nature and Society

Volume 2017, Article ID 4293731, 18 pages

https://doi.org/10.1155/2017/4293731

## Spatial Interpolation of Annual Runoff in Ungauged Basins Based on the Improved Information Diffusion Model Using a Genetic Algorithm

^{1}Research Center of Ocean Environment Numerical Simulation, Institute of Meteorology and Oceanography, PLA University of Science and Technology, Nanjing, China

^{2}Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disaster, Nanjing University of Information Science & Technology, Nanjing 210044, China

^{3}Key Laboratory of Surficial Geochemistry, Ministry of Education, Department of Hydrosciences, School of Earth Sciences and Engineering, State Key Laboratory of Pollution Control and Resource Reuse, Nanjing University, Nanjing 210093, China

Correspondence should be addressed to Ren Zhang; majin_rene@sina.com and Dong Wang; wangdong@nju.edu.cn

Received 21 November 2016; Accepted 31 January 2017; Published 14 March 2017

Academic Editor: Alicia Cordero

Copyright © 2017 Mei Hong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Prediction in Ungauged Basins (PUB) is an important task for water resources planning and management and remains a fundamental challenge for the hydrological community. In recent years, geostatistical methods have proven valuable for estimating hydrological variables in ungauged catchments. However, four major problems restrict the development of geostatistical methods. We establish a new information diffusion model based on a genetic algorithm (GIDM) for the spatial interpolation of runoff in ungauged basins. Genetic algorithms (GA) are used to generate high-quality solutions to optimization and search problems, so the optimal window width parameter can be obtained using a GA. To test the new method, seven experiments on annual runoff interpolation based on the GIDM are carried out at 17 stations on the mainstream and tributaries of the Yellow River and compared with the inverse distance weighting (IDW) method, the Cokriging (COK) method, and conventional IDMs using the same sparse observed data. All seven experiments show that the GIDM method can solve the four problems of the previous geostatistical methods to some extent and obtains the best accuracy among the four different models. The key problems of PUB research are the lack of observation data and the difficulty of information extraction. The GIDM is therefore a new and useful tool for solving the PUB problem and improving water management.

#### 1. Introduction

Prediction in Ungauged Basins (PUB) [1] is an important task for water resources planning and management and remains a fundamental challenge for the hydrological community. PUB was identified as a key issue in hydrological studies by the IAHS. Accurate estimates of hydrologic variables such as streamflow at ungauged sites allow objective, quantitative, and statistical decision-making with respect to water resources management and natural hazard assessments. Because ungauged basins lack data for model calibration and verification, hydrological regionalization [2] is required to transfer information (e.g., model parameters) from gauged catchments. Regionalization can be defined as the transfer of information from one catchment to another [2]. This transfer is typically from gauged to ungauged catchments (e.g., [3, 4]). Its aim is to estimate parameter values of hydrological models for any grid cell, subcatchment, or large geographic region without the need to calibrate or “tune” the model to get the best fit.

Over the years, regionalization has received increasing attention from the hydrological community. A number of regional models are currently available, including (1) the proxy-basin method [5, 6]; (2) spatial interpolation methods, for instance, linear interpolation by Guo et al. [7], inverse distance weighting (IDW) interpolation by Di Piazza et al. (2001), and Kriging interpolation by Vandewiele and Elias [8]; (3) the clustering approach [9, 10]; (4) bi- and multivariate regression methods [11, 12]; and (5) one-step regression–regional calibration [13]. Among these, spatial interpolation is one of the earliest and most widely used methods; it estimates the value of unknown spatial data based on known spatial data. In essence, it forecasts the whole unknown region from a few known points. Spatial interpolation techniques that produce a continuous surface from point measurements fall into two main groups: deterministic and geostatistical. Deterministic interpolation techniques create surfaces from measured points using mathematical functions, which are based on either the extent of similarity (e.g., inverse distance weighted) or the degree of smoothing (e.g., radial basis functions). Geostatistical interpolation techniques (e.g., Kriging) utilize both the mathematical and the statistical properties of the measured points [14, 15]. Recent studies highlight that geostatistical interpolation, which was originally developed for the spatial interpolation of point data (see, e.g., [16]), can be effectively applied to the problem of predicting the streamflow regime in ungauged basins [17, 18].
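To make the deterministic group concrete, the following is a minimal sketch of inverse distance weighting, assuming planar coordinates and a distance exponent of 2. The function name `idw_interpolate`, the exponent, and the gauge coordinates and runoff values are all illustrative assumptions and are not taken from the studies cited above.

```python
import math

def idw_interpolate(known, target, power=2.0):
    """Estimate a value at `target` from (x, y, value) triples in `known`.

    Each known site is weighted by 1 / distance**power, so nearer sites
    contribute more to the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in known:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # the target coincides with a gauged site
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical gauges: (x, y, annual runoff). All numbers are invented.
gauges = [(0.0, 0.0, 100.0), (10.0, 0.0, 200.0), (0.0, 10.0, 300.0)]
estimate = idw_interpolate(gauges, (5.0, 5.0))
```

Because the target in this example is equidistant from the three gauges, the estimate reduces to their plain average; moving the target toward one gauge pulls the estimate toward that gauge's value.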

Recently, geostatistical methods have proven valuable for estimating hydrological variables in ungauged catchments. Among geostatistical methods, traditional Kriging and its related algorithms (e.g., Universal Kriging or Cokriging (COK)) are the most widely used. Skøien and Blöschl [18], for instance, developed the topological Kriging technique (or top-Kriging), which accounts for hydrodynamic and geographical dispersion. Their results indicate that this technique not only can outperform deterministic runoff models in regions where stream gauge density is sufficiently high, as it avoids problems with input data errors and parameter identifiability, but also provides more robust estimates than regional regression models [19]. A comparison of top-Kriging with Physiographical-Space Based Interpolation (PSBI) highlights the complementary utility of the two methods for headwater and larger scale catchments [20].

However, geostatistical methods (e.g., Kriging and its related algorithms) have four major limitations.

(1) Ordinary Kriging is defined as a “best linear unbiased estimator.” It is “linear” because its estimates are calculated by a linear equation. Because the change of runoff is a nonlinear process, this linearity causes some deviations.

(2) The methods need many sites for modelling, generally more than eight [21]. They are therefore more feasible for interpolation in large watersheds. When only a few study sites are available in a minor watershed, the methods cannot capture the spatial structure of the hydrological variables.

(3) The interpolation process of many geostatistical methods needs additional water balance constraints, which control the export flows of each subbasin. Before the runoff calculation, the runoff data in the study should be normalized [22]. During interpolation, in order to measure the correlation between the basins, the distance between subbasins used by the interpolation algorithm needs to be redefined [23]. The whole calculation is complex and the conversion is troublesome.

(4) These methods have mainly been applied in natural European basins [24]. Their application in watersheds strongly affected by human activities still needs to be verified.
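To illustrate what “linear” means in limitation (1), the ordinary Kriging estimate at an ungauged location can be written in its standard textbook form. The symbols below are not taken from this paper: $x_0$ is the target location, $Z(x_i)$ are the $n$ observed values, and $\lambda_i$ are the Kriging weights.

```latex
\hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i \, Z(x_i),
\qquad
\sum_{i=1}^{n} \lambda_i = 1 .
```

The estimate is a weighted sum of the observations, and the constraint that the weights sum to one is what makes it unbiased. However the weights are chosen, the estimate remains a linear combination of the observed values, which is the property that limitation (1) contrasts with the nonlinear behaviour of runoff.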

To overcome these four problems, a more effective and rational spatial interpolation method is needed. We therefore introduce the information diffusion model (IDM) in this paper. The IDM is a useful method for dealing with the small sample problem [25, 26]. By spreading the observed data with diffusion methods, it can extract additional information from them. Huang [25] showed that a simple window width (SWW) can easily be determined from incomplete data based on the nearby criteria. This method has been widely used for risk assessment [27–29]. However, the IDM with SWW (SIDM) cannot accurately handle hydrological or meteorological data that do not follow a normal distribution. To solve this problem, Wang et al. [30] presented the IDM with an optimal window width (OIDM), based on the principle of least mean squared error. However, the optimal window width (OWW) is prone to the local optimum problem. A genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems [31]. Hence, to obtain the globally optimal diffusion coefficients, Bai et al. [32] proposed a new information diffusion model using a GA (GIDM) to interpolate river runoff. However, they used the GIDM only to interpolate river runoff time series. This paper extends the idea of the GIDM and uses it to spatially interpolate river runoff in ungauged basins.

To our knowledge, no previous hydrological literature has reported the use of the IDM to establish a model for hydrological spatial interpolation. We therefore explain the principle of the SWW and the OWW in Section 2, where we also discuss how to improve the information diffusion model based on a GA. In Section 3, we illustrate the interpolation of river runoff based on the IDM in detail. To test the new method, seven experiments on runoff interpolation based on the GIDM are carried out in Section 4 and compared with SIDM, OIDM, IDW, and COK. Finally, Section 5 summarizes and discusses the results.

#### 2. Information Diffusion Model and the Window Width

##### 2.1. Simple Window Width and Optimal Window Width

The principle of information diffusion and simple window width (SWW) is discussed in the Appendix and was introduced in the previous literature [25].

The SWW method extracts more information from a small sample [26, 33]. However, when the population does not follow a normal distribution, the method is invalid. Based on the idea of the least mean square error, the optimal window width (OWW) can be obtained by an iterative formula [30], which is indexed by the ordinal number of the records and by the iteration number and which starts from an initial iterative value.
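The role of the window width can be illustrated with a small sketch. It assumes the Gaussian (normal) diffusion function that is commonly used with information diffusion; the Appendix formulas are not reproduced here, so the exact form, the function names, the sample values, and the two window widths are all assumptions made for illustration. Neither the SWW rule nor the OWW iteration is implemented; the window width `h` is simply passed in.

```python
import math

def diffusion_density(sample, u, h):
    """Normal-diffusion estimate of the density at point u.

    Every observation spreads its information to u through a Gaussian
    kernel whose spread is the window width h.
    """
    n = len(sample)
    total = sum(math.exp(-((u - x) ** 2) / (2.0 * h * h)) for x in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))

def estimate_on_grid(sample, grid, h):
    """Evaluate the diffusion estimate at every grid point."""
    return [diffusion_density(sample, u, h) for u in grid]

# Invented small sample; the values carry no hydrological meaning.
sample = [3.1, 4.7, 5.0, 6.2, 8.4]
lo, hi, steps = -10.0, 25.0, 3500
step = (hi - lo) / steps
grid = [lo + step * k for k in range(steps + 1)]

narrow = estimate_on_grid(sample, grid, h=0.3)  # small window: spiky estimate
wide = estimate_on_grid(sample, grid, h=2.0)    # large window: smooth estimate
```

Both estimates integrate to about one over the grid, but the narrow window concentrates the information around the individual observations while the wide window smooths it out. Choosing the window width well is therefore the central tuning problem that the SWW and OWW address.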

Because the OWW is based on the mean value, it is prone to the local optimum problem [31]. A GA is therefore introduced.

##### 2.2. Searching the General Optimal Window Width Based on GA

Based on the principles of natural genetics and biological evolution, genetic algorithms (GA) can effectively avoid the “local optima” problem [31]. We therefore use a GA to search for the globally optimal diffusion coefficients. For window width searching, the combination of the IDM and the GA includes three major phases. First, the GA initializes a population composed of random codes drawn from a search domain [30] that is defined by the minimum and maximum values of the samples. Then the fitness of all chromosomes is evaluated; following Wang et al. [30], the window width is obtained from the different records of the sample and the information diffusion estimator, using second-order schemes. Finally, the global optimum is searched through the evolutionary processes described in [31]. The result is the improved window width (IWW).
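The search described above can be sketched with a small real-coded GA. This is an illustration under stated assumptions, not the procedure of [30] or [31]: the fitness used here is a leave-one-out log-likelihood of a Gaussian diffusion estimate, which is swapped in because the original fitness formula is not reproduced in this text. The search bounds, population size, truncation selection, arithmetic crossover, mutation scale, and all function names are likewise hypothetical.

```python
import math
import random

def loo_log_likelihood(sample, h):
    """Assumed fitness: leave-one-out log-likelihood for window width h."""
    n = len(sample)
    norm = (n - 1) * h * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i, xi in enumerate(sample):
        s = sum(math.exp(-((xi - xj) ** 2) / (2.0 * h * h))
                for j, xj in enumerate(sample) if j != i)
        total += math.log(max(s / norm, 1e-300))  # guard against log(0)
    return total

def ga_search(sample, pop_size=30, generations=60, mutation_rate=0.2, seed=0):
    """Search for a window width that maximises the assumed fitness."""
    rng = random.Random(seed)
    span = max(sample) - min(sample)
    low, high = 1e-3 * span, span  # assumed search domain

    def fitness(h):
        return loo_log_likelihood(sample, h)

    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism keeps the best code
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = w * a + (1.0 - w) * b      # arithmetic crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.05 * span)  # Gaussian mutation
            children.append(min(max(child, low), high))
        population = children
    return max(population, key=fitness)

# Invented sample, used only to exercise the search.
sample = [3.1, 4.7, 5.0, 6.2, 8.4]
best_h = ga_search(sample)
```

Because the population samples the whole search domain and mutation keeps injecting new candidates, the search is less likely to stall at the first local optimum it meets than a single iterative update started from one initial value.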

The techniques of the IDM and the improved IDM have been explained above; how to interpolate runoff based on the IDM is discussed in the following section.

#### 3. Information Diffusion Method with Fuzzy Inference for the Runoff Estimation

The IDM based on the numerical method [28] is as follows. Consider a training set of monthly discharge data consisting of input-output pairs of real numbers. The input is the index of records sorted by meteorological observations (such as soil moisture, precipitation, and evaporation) or by chronological order, and the output is the river discharge.

Let the input and the output each take values in a set of discrete elements. Applying the membership functions based on (A.1) and (A.2) to these elements gives the diffused information of each observation.

The diffusion is controlled by two window widths, one for the input and one for the output. Each pair formed by one input element and one output element is called an illustrating point, and every observation contributes an information gain to every illustrating point. The information gains are collected in an information matrix, from which a fuzzy relation matrix [34] is obtained. A value to be estimated is then expressed as an input fuzzy set, and the output fuzzy set is calculated by fuzzy inference using the max–min fuzzy composition rule. Finally, the gravity center of the output fuzzy set is taken as the output.

In general, the given sample is used to construct the relationship between river discharge and its meteorological factors or its antecedent values; the input vector collects these factors, and the output is the flow in the ungauged basin. In this way, the value of river runoff in ungauged basins can be obtained by the IDM method.
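The chain from information matrix to defuzzified output can be sketched as follows. The max–min composition and the gravity center follow the description above; the Gaussian diffusion function, the product form of the information gain, the normalisation of each column by its maximum, the function names, and the toy data are assumptions made for illustration, since the original formulas are not reproduced in this text.

```python
import math

def diffuse(value, points, h):
    """Gaussian membership of `value` at each discrete element in `points`."""
    return [math.exp(-((value - p) ** 2) / (2.0 * h * h)) for p in points]

def fuzzy_relation(samples, u_points, v_points, hx, hy):
    """Build an information matrix and normalise it into a fuzzy relation."""
    q = [[0.0] * len(v_points) for _ in u_points]
    for x, y in samples:
        mu_x = diffuse(x, u_points, hx)
        mu_y = diffuse(y, v_points, hy)
        for j, a in enumerate(mu_x):
            for k, b in enumerate(mu_y):
                q[j][k] += a * b  # assumed information gain of one observation
    rows, cols = range(len(u_points)), range(len(v_points))
    col_max = [max(q[j][k] for j in rows) for k in cols]
    # Assumed normalisation: divide each column by its largest entry.
    return [[q[j][k] / col_max[k] if col_max[k] > 0 else 0.0 for k in cols]
            for j in rows]

def infer(x, relation, u_points, v_points, hx):
    """Max-min fuzzy inference followed by gravity-center defuzzification."""
    mu_a = diffuse(x, u_points, hx)  # input fuzzy set
    mu_b = [max(min(mu_a[j], relation[j][k]) for j in range(len(u_points)))
            for k in range(len(v_points))]  # output fuzzy set
    return sum(v * m for v, m in zip(v_points, mu_b)) / sum(mu_b)

# Toy data with an increasing input-output relationship; values are invented.
samples = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0), (5.0, 50.0)]
u_points = [1.0, 2.0, 3.0, 4.0, 5.0]
v_points = [10.0, 20.0, 30.0, 40.0, 50.0]
relation = fuzzy_relation(samples, u_points, v_points, hx=0.5, hy=5.0)
mid_estimate = infer(3.0, relation, u_points, v_points, hx=0.5)
```

In this toy setting the data are symmetric about the middle observation, so the estimate at the middle input falls at the middle output, and estimates increase as the input moves from the low end to the high end of the sample.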

#### 4. Case Study

To test the runoff interpolation effectiveness of our model (GIDM), we carried out seven experiments, which fall into the following groups. Experiments 1 and 2 are spatial interpolation experiments of runoff on the mainstream of the Yellow River; Experiments 3 and 4 are spatial interpolation experiments of runoff on tributaries of the Yellow River; and Experiments 5 and 6 are spatial interpolation experiments of runoff on both the mainstream and the tributaries of the Yellow River. These six experiments test the spatial interpolation and prediction ability of the GIDM model for the mainstream and tributaries of the same river basin. Finally, Experiment 7 is a spatial interpolation experiment of runoff in the Daying mine region in Guizhou Province (representing a nonclosure small watershed with no runoff information), which is used to validate the spatial interpolation and prediction ability of the GIDM model for a minor watershed with few hydrological sites. The application of the GIDM to runoff interpolation is compared with SIDM, OIDM, IDW, and COK based on the same data.

##### 4.1. Study Area and Data

The Tangnaihai, Lanzhou, Toudaoguai, Longmen, Tongguan, Huayuankou, Gaocun, Aishan, and Lijin stations on the mainstream of the Yellow River and the Hongqi, Huangfu, Wenjiachuan, Baijiachuan, Fanguyi, Zhangjiashan, Baimasi, and Huaxian stations on tributaries of the Yellow River were selected for Experiments 1–6 (see Figure 1). The Yellow River is the third-longest river in Asia, following the Yangtze River and the Yenisei River, and the sixth-longest in the world, with an estimated length of 5,464 km. Originating in the Bayan Har Mountains in Qinghai province of western China, it flows through nine provinces and empties into the Bohai Sea near the city of Dongying in Shandong province. The Yellow River basin has an east–west extent of about 1,900 km and a north–south extent of about 1,100 km. Its total basin area is about 742,443 square kilometers. Because of the river's importance, hydrological stations on the Yellow River were selected to test the performance of our model.