Advances in Meteorology, Volume 2015, Article ID 935868, 12 pages
https://doi.org/10.1155/2015/935868

Research Article | Open Access
Special Issue: Hydrological Processes in Changing Climate, Land Use, and Cover Change

Interpolation of Missing Precipitation Data Using Kernel Estimations for Hydrologic Modeling

Academic Editor: Fubao Sun
Received: 17 Oct 2014
Revised: 15 Jan 2015
Accepted: 05 Feb 2015
Published: 08 Oct 2015

Abstract

Precipitation is the main driver of hydrologic modeling; therefore, missing precipitation data can cause malfunctions in hydrologic models. Although interpolation of missing precipitation data is recognized as an important research topic, only a few methods follow a regression approach. In this study, daily precipitation data were interpolated using five different kernel functions, namely, Epanechnikov, Quartic, Triweight, Tricube, and Cosine, to estimate missing precipitation data. This study also presents an assessment that compares estimation of missing precipitation data through k-nearest neighborhood (k-NN) regression to the five different kernel estimations and their performance in simulating streamflow using the Soil and Water Assessment Tool (SWAT) hydrologic model. The results show that the kernel approaches provide higher quality interpolation of precipitation data compared with the k-NN regression approach, in terms of both statistical data assessment and hydrologic modeling performance.

1. Introduction

Precipitation data are a key input to hydrologic modeling of the rainfall-runoff mechanism [1]. Hydrologic models can malfunction when the precipitation time series used as input is discontinuous. In light of this important issue, estimation of missing precipitation data is a challenging task for hydrologic modeling. Many hydrologic modeling applications require interpolation of missing precipitation data [2], completion of meteorological data series [3], or imputation of meteorological data [4]. To estimate missing precipitation, researchers should consider spatiotemporal variations in precipitation (rainfall and snowfall) and the related physical processes. However, accounting for spatiotemporal variation and physical processes can be difficult if measurement equipment is lacking. Thus, statistical approaches have emerged as widely used methods for filling in missing precipitation data [5].

Many studies have investigated infilling missing streamflow data with several statistical approaches [5], but there are limited studies on the interpolation of incomplete precipitation and temperature data [6–10]. Recently, artificial neural networks (ANNs [11]), a more advanced statistical approach, have been proposed for estimating missing precipitation data [12]. ANNs can learn from training data to reconstruct a nonlinear relationship and obtain values for missing data. Pisoni et al. [13] investigated the interpolation of missing data for sea surface temperature (SST) satellite images using the ANN method; they found that the ANN approach is more accurate than the interpolation system suggested by Seze and Desbois (1987). Nevertheless, ANNs are still under dispute because their neuron systems cannot provide clear relationships between data [14].

The American Society of Civil Engineers (ASCE) Task Committee [15] noted that although the performance of ANNs for estimating missing precipitation data has already been verified, an alternative solution should be available for cases in which the available data are insufficient, because ANNs rely on high data quality and quantity. Additionally, ANNs have other limitations, such as a lack of physical concepts and relations, and their results depend on the experience and preferences of those using, studying, and training the networks [15–17]. Since ANNs are regarded as black-box models [18], it is difficult to use them to reveal explicit relationships, even though ANNs can achieve convergence for almost any problem [17]. Thus, for real mechanisms in hydrologic models, in which linear relationships exist between series of weather inputs, the ANN solution is less explicit [19].

Generally, a regression or a distance-weighted method is most commonly used for estimating missing precipitation for hydrologic modeling [20]. Daly et al. [21] also proposed a variety of regression models to incorporate spatial variation in weather data. However, Creutin et al. [22] found that even though simple linear regression and interpolation approaches show satisfactory serial correlation for daily or monthly streamflow, precipitation patterns are not properly correlated when these approaches are used. Furthermore, if a regression method is used to estimate missing precipitation and produce a refined precipitation time series, a small data sample may not satisfy the normality assumption that underlies linear regression theory.

Another approach for estimating missing precipitation data is to weight neighboring data by distance. Xia et al. [23] used the closest station to reconstruct missing precipitation data through geometrical distance weighting; Willmott et al. [24] used arithmetic averaging of neighboring data to fill in missing precipitation; and Teegavarapu and Chandramouli [25] used an inverse distance weighting method applied to neighboring data to estimate missing precipitation data. Smith [26], Simanton and Osborn [27], and Salas [28] suggested traditional weighting and data-driven methods, namely, distance-based weighting methods, for interpolating missing precipitation data. Distance weighting approaches for estimating missing precipitation data have also been combined with linear regression and median-based regression [29, 30]. Young [31] and Filippini et al. [32] suggested spatially interpolating the correlation to define the weight of each station.

Estimation of missing precipitation data is also possible when data are available for the same location. Linacre (1992) investigated the interpolation of missing precipitation data using the mean value of a data series at the same location, and Lowry [33] suggested simple interpolation between available data series. Acock and Pachepsky [34] used data from several days before and after missing precipitation data points to estimate the incomplete precipitation data. k-nearest neighborhood (k-NN) regression is a basic method for estimating missing precipitation data that considers vicinity. However, the method has weaknesses when the data contain outliers or a nonlinear trend exists around the missing data. Whereas k-NN regression rests on the assumption that the data follow a normal distribution, which is often statistically unsound, the kernel method uses a weighted mean, which can overcome this weakness through kernel weighting. By applying a kernel function to neighboring data, the missing values can be estimated even when the data show a nonlinear trend.

The objective of this study was to reconstruct daily precipitation data by using five different kernel functions (Epanechnikov, Quartic, Triweight, Tricube, and Cosine) to estimate missing precipitation data. This study also presents an assessment that compares estimation of missing precipitation data through k-NN regression to the five different kernel estimations and their performance in simulating streamflow using the Soil and Water Assessment Tool (SWAT) hydrologic model. The remainder of this paper is organized as follows. Section 2 provides a description of the study area and the hydrologic model. In Section 3, the methodology of the five different kernel methods is presented. Section 4 presents the results of the interpolation of the missing daily precipitation data and the hydrologic model simulation. Finally, conclusions are presented in Section 5.

2. Study Area and Hydrologic Model

The Imha watershed (Figure 1) was selected as the test bed for this study. The Imha watershed is a tributary of the Nakdong River basin and is located in the upper part of the Nakdong River basin in South Korea. It is predominantly mountainous; approximately 79.8% of the total area of 1,361 km² is mountainous, and slopes of 40% to 60% cover 655 km², corresponding to 33% of the total watershed area. The elevation of the Imha watershed ranges from 80 to 1,215 m. The average annual precipitation, minimum temperature, maximum temperature, humidity, and wind speed for the Imha watershed are 1,050 mm, 7°C, 18.8°C, 65%, and 1.6 m/s, respectively (Water Management Information System (WAMIS), http://www.wamis.go.kr/). Since the climate of this area is warm, there is no precipitation in the form of snow; all precipitation falls as rainfall. For this evaluation of precipitation data interpolation and hydrologic model performance, the precipitation and streamflow gauges shown in Figure 1 were selected, and precipitation and streamflow data were sourced from the Water Management Information System (http://www.wamis.go.kr/).

This study selected the SWAT model for analysis. SWAT has a GIS extension, ArcSWAT, which allows various GIS-based datasets to be used to model the geomorphology of a given basin. The SWAT model was developed through research by the United States Department of Agriculture (USDA) Agricultural Research Service (ARS). Major data inputs for SWAT include temperature (maximum and minimum), daily precipitation, solar radiation, relative humidity, wind speed, and geospatial data representing soil types, land cover, and elevation. A watershed is divided into subbasins, which are further broken up into smaller units known as hydrologic response units (HRUs), each characterized by uniform land use and soil type. SWAT can be used to accurately predict hydrologic patterns over extended periods of time [35]. Canopy interception is implicit in the curve number (CN) method and explicit in the Green-Ampt method. In SWAT, infiltration can be accounted for using either the CN method or the Green-Ampt method; however, the Green-Ampt method has not been shown to increase accuracy over the CN method, so the CN method was used in this study.

3. Methodology

This study used five kernel functions, Epanechnikov, Quartic, Triweight, Tricube, and Cosine, as weights for predicting missing values. The Tricube kernel places large weight near the target point; although its weights are similar to those of the Triweight kernel, its weight decreases less sharply with distance from the target point. The Quartic kernel gives the next highest weight near the target point, and its weight decreases at a rate similar to that of the Triweight kernel. Both the Epanechnikov and Cosine kernels weight neighboring values more evenly. A brief description of the five kernel functions and their application to reconstructing missing values is presented below, and the specific kernel functions are described in Appendix A.

3.1. Epanechnikov

The Epanechnikov kernel is the most commonly used kernel function. In this study it assigns zero weight to observations that lie beyond the chosen interval width of four, six, or eight nearest neighbors around the reference point; this choice is often called the smoothing parameter or bandwidth selection. The main characteristic of the Epanechnikov kernel is that its weights decay smoothly even for values far from the target, namely, the missing value in this research. It is given by the following, where K(u) is the kernel function and u is the normalized distance from the missing value to a neighboring value:

K(u) = \frac{3}{4}\left(1 - u^{2}\right), \quad |u| \le 1.

3.2. Quartic

The second kernel function used in this research was the Quartic kernel, whose weight is more sensitive to the distance from the missing value. Because the applied weight differs substantially between near and far data points, the estimate is influenced more strongly by the surrounding data. The Quartic kernel is a fourth-order function and is therefore more sensitive to distance than a second-order function. It is described by the following:

K(u) = \frac{15}{16}\left(1 - u^{2}\right)^{2}, \quad |u| \le 1.

3.3. Triweight

The third kernel function used in this research was the Triweight kernel, which is a sixth-order function. It is highly sensitive to distance because the sixth-order form weights the estimate of the missing value strongly according to the difference in distance, as shown in the following:

K(u) = \frac{35}{32}\left(1 - u^{2}\right)^{3}, \quad |u| \le 1.

3.4. Tricube

The fourth kernel function used in this research was the Tricube kernel, which uses absolute values. Because it uses absolute values, it presents a smoother pattern for the nearest values than the Triweight kernel; however, as values move further from the nearest ones, the weight drops steeply. The Tricube kernel is the most sensitive in terms of weighted distance because it is effectively a ninth-order function, as shown in the following:

K(u) = \frac{70}{81}\left(1 - |u|^{3}\right)^{3}, \quad |u| \le 1.

3.5. Cosine

The fifth kernel function used in this research was the Cosine kernel. It is widely applied in various fields because it has constant curvature. Its shape is similar to that of the Epanechnikov kernel, even though it uses a cosine function, as shown in the following:

K(u) = \frac{\pi}{4}\cos\!\left(\frac{\pi}{2}u\right), \quad |u| \le 1.
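For reference, the five kernels above can be written compactly in code. The following Python sketch (our own function names, not code from the original study) implements their standard closed forms; evaluating, for example, the Epanechnikov kernel at u = 1/3 and u = 2/3 gives 0.667 and 0.417, matching the 4-NN weights listed in Table 6 of Appendix B.

```python
import numpy as np

def epanechnikov(u):
    """K(u) = 3/4 (1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def quartic(u):
    """K(u) = 15/16 (1 - u^2)^2 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2)**2, 0.0)

def triweight(u):
    """K(u) = 35/32 (1 - u^2)^3 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 35.0 / 32.0 * (1 - u**2)**3, 0.0)

def tricube(u):
    """K(u) = 70/81 (1 - |u|^3)^3 for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, 70.0 / 81.0 * (1 - np.abs(u)**3)**3, 0.0)

def cosine(u):
    """K(u) = pi/4 cos(pi u / 2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1, np.pi / 4 * np.cos(np.pi * u / 2), 0.0)

print(epanechnikov([1/3, 2/3]))  # [0.667, 0.417], cf. Table 6 (4-NN row, Ep)
```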

3.6. Calculation of the Missing Value

After using a kernel function to calculate the weights, the missing value is estimated with the weighted mean below, where \hat{x} is the missing value, k is the number of nearest neighbors, x_i is the i-th nearest value relative to the missing point (positive i denotes the right side and negative i the left side), and u_i = i/(k/2 + 1) is the scaled distance:

\hat{x} = \frac{1}{k} \sum_{\substack{i = -k/2 \\ i \neq 0}}^{k/2} K(u_i)\, x_i.

The kernel function must be bilaterally symmetric about zero. If, for example, the four nearest neighbors are used to estimate the missing value, two neighboring values are taken from the right side and two from the left side. The specific equation for this example is shown in the following, and an example calculation is described in Appendix B:

\hat{x} = \frac{1}{4}\left[K\!\left(\tfrac{2}{3}\right)x_{-2} + K\!\left(\tfrac{1}{3}\right)x_{-1} + K\!\left(\tfrac{1}{3}\right)x_{1} + K\!\left(\tfrac{2}{3}\right)x_{2}\right].
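As a worked illustration, the following Python sketch (our own helper names) applies this estimator with the Epanechnikov kernel to the 4-NN example from Appendix B (Table 7) and reproduces its estimate of about 5.95 mm.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, K(u) = 3/4 (1 - u^2) for |u| <= 1, else 0."""
    return 0.75 * (1.0 - u**2) if abs(u) <= 1 else 0.0

def estimate_missing(left, right, kernel):
    """Kernel-weighted mean of the k nearest neighbors (k/2 values on each side).

    `left` / `right` hold the values 1, 2, ... days before / after the missing day,
    ordered by increasing distance. Scaled distances are i / (k/2 + 1), and the sum
    is normalized by k, matching the worked example in Appendix B.
    """
    half = len(left)                  # assumes len(left) == len(right) == k/2
    k = 2 * half
    weights = np.array([kernel(i / (half + 1)) for i in range(1, half + 1)])
    return (np.dot(weights, left) + np.dot(weights, right)) / k

# 4-NN Epanechnikov example from Table 7: neighbors of the missing day 2010-02-12
left = [17.2, 15.0]    # 2010-02-11, 2010-02-10 (1 and 2 days before)
right = [9.1, 0.0]     # 2010-02-13, 2010-02-14 (1 and 2 days after)
print(estimate_missing(left, right, epanechnikov))   # ~5.95 mm (Table 7: 5.949)
```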

3.7. Statistical Tests

A normality test is required to evaluate the methods for filling in the missing data. The Shapiro-Wilk [36] normality test was applied to the nineteen samples to determine whether the differences between observed and interpolated values are normally distributed. The test statistic is

W = \frac{\left(\sum_{i=1}^{n} a_{i} x_{(i)}\right)^{2}}{\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)^{2}},

where x_{(i)} is the i-th order statistic, namely, the i-th smallest value in the sample, \bar{x} is the sample mean, and a_{i} are constants derived from the ordered data. The null hypothesis of the Shapiro-Wilk normality test is that the sample is normally distributed; if the significance probability is less than 5%, the null hypothesis is rejected, meaning the sample does not follow a normal distribution. Since the significance probability for every group (Table 1) is below 5%, the null hypothesis is rejected. This study therefore used a nonparametric test for the subsequent analysis.


Table 1: Shapiro-Wilk normality test results.

          4-NN                    6-NN                    8-NN
        W      DF  p value     W      DF  p value     W      DF  p value
Ep    0.808    19  0.0015    0.740    19  0.0002    0.766    19  0.0004
Qu    0.831    19  0.0033    0.768    19  0.0004    0.721    19  0.0001
Tw    0.827    19  0.0029    0.789    19  0.0008    0.745    19  0.0002
Tc    0.839    19  0.0045    0.764    19  0.0004    0.742    19  0.0002
Co    0.817    19  0.0020    0.742    19  0.0002    0.763    19  0.0003
Reg   0.876    19  0.0186    0.858    19  0.0089    0.883    19  0.0242

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression).
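For reference, the Shapiro-Wilk test above can be run in Python with SciPy as in the hedged sketch below; the difference series is only a synthetic stand-in for the nineteen observed-minus-interpolated differences evaluated in the paper.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the 19 differences (observed minus interpolated precipitation, mm)
diff = np.array([-1.2, 0.5, -3.4, 2.1, -0.8, -15.5, 1.3, -2.2, 0.0, -1.9,
                 4.6, -0.4, -2.8, 1.1, -5.0, 0.9, -1.5, 3.2, -0.7])

w_stat, p_value = stats.shapiro(diff)
print(f"W = {w_stat:.3f}, p = {p_value:.4f}")
# If p < 0.05, normality is rejected and a nonparametric test (Section 3.7) is used instead.
```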

The Friedman test [37], a k-sample test that can detect differences between paired values, was selected as the nonparametric test. This method evaluates small samples for differences by ranking the values within each block. The null hypothesis of the Friedman test is that there is no average difference among the groups; if the significance probability is less than 5%, the null hypothesis is rejected, leading to the conclusion that an average difference exists among the groups. The test statistic can be written as

Q = \frac{SS_{t}}{SS_{e}}, \quad SS_{t} = n\sum_{j=1}^{k}\left(\bar{r}_{\cdot j} - \bar{r}\right)^{2}, \quad SS_{e} = \frac{1}{n(k-1)}\sum_{i=1}^{n}\sum_{j=1}^{k}\left(r_{ij} - \bar{r}\right)^{2},

where SS_{t} and SS_{e} are the sum of squares of the treatments and the sum of squares of the error, respectively, and r_{ij} is the rank of method j within block i.
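A hedged Python sketch of how such a test can be run with SciPy is shown below; the six series are synthetic stand-ins for the Ep/Qu/Tw/Tc/Co/Reg difference series summarized in Table 2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins: one difference series (19 values) per interpolation method
methods = [rng.normal(loc=m, scale=5.0, size=19)
           for m in (-1.3, -1.4, -1.6, -1.2, -1.4, 2.8)]

chi2, p_value = stats.friedmanchisquare(*methods)
print(f"Friedman chi-square = {chi2:.3f}, p = {p_value:.4f}")
# p < 0.05 indicates that at least one method ranks systematically differently
# from the others across the 19 blocks (cf. Table 2).
```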

The null hypothesis was rejected in this instance because the significance probability was less than 5% for each group, so this study concluded that the interpolation methods differ on average; that is, each method can be considered independent, even though five of them are kernel methods. For example, with four reference points the mean ranks decrease in the order k-NN regression, Tricube, Quartic, Cosine, Triweight, and Epanechnikov (Table 2). For six reference points, the order is k-NN regression, Tricube, Triweight, Quartic, Cosine, and Epanechnikov (Table 2). For eight reference points, the order is k-NN regression, Triweight, Tricube, Quartic, Cosine, and Epanechnikov (Table 2). As shown in Table 2, k-NN regression has the largest mean rank and Epanechnikov the smallest for all of the reference point cases. This result demonstrates the dissimilarity of these methods.

Table 2: Descriptive statistics, mean ranks, and Friedman test results for the differences between observed and interpolated values.

(a) 4-NN

       N    Mean    SD      Min.     Max.   Mean rank   χ²       p value
Ep     19  −1.29   4.46   −15.50     5.76     2.53      55.602   0.0000
Qu     19  −1.42   5.03   −15.06     7.51     2.74
Tw     19  −1.64   5.37   −14.90     8.13     2.58
Tc     19  −1.18   5.06   −14.87     8.29     4.47
Co     19  −1.40   4.69   −15.42     6.08     2.68
Reg    19   2.76   5.42   −13.47    14.38     6.00

(b) 6-NN

       N    Mean    SD      Min.     Max.   Mean rank   χ²       p value
Ep     19  −2.61   4.72   −16.68     1.58     1.53      66.519   0.0000
Qu     19  −2.27   4.71   −16.20     2.82     3.16
Tw     19  −2.18   4.84   −15.89     4.06     3.79
Tc     19  −2.15   4.63   −16.20     2.84     4.21
Co     19  −2.54   4.69   −16.59     1.50     2.32
Reg    19   0.06   4.97   −15.48     7.33     6.00

(c) 8-NN

       N    Mean    SD      Min.     Max.   Mean rank   χ²       p value
Ep     19  −3.40   5.04   −17.33     1.94     1.32      75.812   0.0000
Qu     19  −3.10   4.75   −16.90     0.45     3.21
Tw     19  −2.74   4.79   −16.59     1.29     4.58
Tc     19  −2.93   4.77   −16.96     1.51     3.68
Co     19  −3.28   4.93   −17.25     1.85     2.21
Reg    19  −1.24   5.35   −16.49     8.08     6.00

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression).

To determine which methods are dissimilar to the others, this study performed the Wilcoxon signed rank test [38]. The basic feature of the Wilcoxon signed rank test is that the paired data samples are assumed to come from the same population; the test statistic is

W = \sum_{i=1}^{n} \operatorname{sgn}\left(x_{2,i} - x_{1,i}\right) R_{i},

where n is the sample size, x_{2,i} is the i-th value of the second data set, x_{1,i} is the i-th value of the first data set, and R_{i} is the rank of |x_{2,i} - x_{1,i}|. If the p value is less than 5%, the two samples (or methods) are governed by different mechanisms. Table 3 shows that the p value for k-NN regression against each kernel method is less than 5% in all cases. Accordingly, k-NN regression is clearly dissimilar to the other methods. Although the five kernel methods exhibit similarity or dissimilarity to one another depending on the number of reference points, all of the kernel methods can be distinguished from k-NN regression using the Wilcoxon signed rank test.

Table 3: Wilcoxon signed rank test results for k-NN regression against each kernel method.

(a) 4-NN

           Ep        Qu        Tw        Tc        Co
Reg      −3.823    −3.823    −3.823    −3.823    −3.823
p value   0.0001    0.0001    0.0001    0.0001    0.0001

(b) 6-NN

           Ep        Qu        Tw        Tc        Co
Reg      −3.823    −3.823    −3.823    −3.823    −3.823
p value   0.0001    0.0001    0.0001    0.0001    0.0001

(c) 8-NN

           Ep        Qu        Tw        Tc        Co
Reg      −3.823    −3.823    −3.823    −3.823    −3.823
p value   0.0001    0.0001    0.0001    0.0001    0.0001

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression).
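A minimal Python sketch of the Wilcoxon signed rank test with SciPy follows; the two series are synthetic stand-ins for the paired difference series of one kernel method and of k-NN regression.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic stand-ins for paired difference series (19 values each)
kernel_diff = rng.normal(loc=-1.3, scale=4.5, size=19)
reg_diff = rng.normal(loc=2.8, scale=5.4, size=19)

stat, p_value = stats.wilcoxon(kernel_diff, reg_diff)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4f}")
# p < 0.05 indicates the paired samples differ systematically, i.e., the two
# interpolation methods behave differently (cf. Table 3).
```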

4. Results

Since Epanechnikov has the smallest average rank, which signifies the smallest difference between the observed and interpolated values for all reference points in Table 2, the interpolated data obtained from the Epanechnikov method are the best among the studied methods. Figure 2 shows that the data filled in by k-NN regression exhibit a large difference at both four and six reference points. The interpolated data from the kernel methods are close to zero for both the average and the median at four reference points, meaning that the interpolated data are similar to the observations. On the other hand, more than 75% of the interpolated data from k-NN regression differ from zero. When the interpolated data are evaluated at six reference points in Figure 2, the median value from k-NN regression lies far from zero. At eight reference points, k-NN regression is close to zero for both the average and the median; however, it is difficult to conclude that this is an ideal method because outlying maximum values affect the average and the median.

This study also evaluated the interpolated precipitation data by simulating streamflow with the SWAT hydrologic model. In SWAT, surface runoff is estimated from excess precipitation after abstractions and infiltration using the Soil Conservation Service Curve Number (SCS-CN) method; the Green-Ampt (GA) infiltration method is an alternative. Studies show that both methods give reasonable results, with no significant advantage of one over the other; however, the GA method appears to have more limitations in modeling seasonal variability than the SCS-CN method does. Hence, the SCS-CN method was used for infiltration in this study. A curve-number-based simulation requires the curve number to be updated at each time step as the soil water content changes. The excess rainfall equation in the SCS-CN method was derived from more than 20 years of historical relationships between the curve number and hydrologic response. Throughout the surface runoff calculation, infiltration is updated over time according to the soil type. Other abstractions, such as evapotranspiration and soil and snow evaporation, are calculated with the Penman-Monteith method and meteorological statistics. Finally, a kinematic storage model is used to compute groundwater storage and seepage, and the flow generated in SWAT is routed from the HRUs to the watershed outlet. Figure 3 shows the calibration of the model simulation as the initial step, and the calibrated parameters are described in Table 4. After calibration of the SWAT model, the six different interpolated precipitation datasets, with three different reference ranges for each (twenty-four interpolated precipitation datasets in total), were used to assess the performance of the interpolated precipitation data in hydrologic model simulation. Streamflow simulations were run for three years, from 2008 to 2010. To evaluate the model performance for the different interpolated precipitation datasets, this study used NSE (Nash-Sutcliffe efficiency coefficient), R² (coefficient of determination), and RMSE (root mean square error). Table 5 and Figure 4 show that the simulations based on k-NN regression exhibit low SWAT performance for streamflow estimation, with an average NSE of 0.54, R² of 0.74, and RMSE of 23.78 m³/s. All of the kernel functions, on the other hand, exhibit good performance for hydrologic simulations with interpolated precipitation data (Table 5 and Figure 4); the average NSE, R², and RMSE are (1) 0.83, 0.86, and 14.03 m³/s for Epanechnikov; (2) 0.84, 0.88, and 13.03 m³/s for Quartic; (3) 0.93, 0.93, and 9.30 m³/s for Triweight; (4) 0.94, 0.95, and 8.13 m³/s for Tricube; and (5) 0.93, 0.94, and 9.00 m³/s for Cosine.
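For clarity, the three performance measures just listed can be computed as in the short Python sketch below (the streamflow arrays are placeholders, not the Imha data).

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)

def rmse(obs, sim):
    """Root mean square error, in the units of the series (here m^3/s)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

# Placeholder daily streamflow series (m^3/s)
obs = np.array([12.0, 35.5, 80.2, 44.1, 20.3, 15.8, 10.2])
sim = np.array([10.5, 38.0, 75.0, 50.2, 22.1, 14.0, 11.3])
print(nse(obs, sim), r_squared(obs, sim), rmse(obs, sim))
```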


Table 4: Calibrated SWAT parameter values.

Parameter   Description                                                                              Selected value
ESCO        Soil evaporation compensation factor                                                     0.9500
EPCO        Plant water uptake compensation factor                                                   1.0000
EVLAI       Leaf area index at which no evaporation occurs from the water surface (m²/m²)            3.0000
FFCB        Initial soil water storage expressed as a fraction of field capacity water content       0.0000
IEVENT      Rainfall/runoff code: 0 = daily rainfall/CN                                              0.0000
ICRK        Crack flow code: 1 = model crack flow in soil                                            0.0000
SURLAG      Surface runoff lag time (days)                                                           4.0000
ADJ_PKR     Peak rate adjustment factor for sediment routing in the subbasin (tributary channels)    0.0000
PRF         Peak rate adjustment factor for sediment routing in the main channel                     1.0000
SPCON       Linear parameter for the maximum amount of sediment reentrained during channel sediment routing   0.0001
SPEXP       Exponent parameter for calculating sediment reentrained in channel sediment routing      1.0000


Table 5: Streamflow simulation performance (NSE, R², and RMSE in m³/s) for each interpolation method and reference range.

            4-NN                    6-NN                    8-NN
       NSE    R²    RMSE       NSE    R²    RMSE       NSE    R²    RMSE
Ep     0.80  0.83  15.32       0.91  0.92  10.60       0.78  0.82  16.16
Qu     0.73  0.78  17.83       0.91  0.93  10.48       0.88  0.92  11.80
Tw     0.91  0.91  10.56       0.93  0.94   9.25       0.95  0.95   8.10
Tc     0.95  0.95   7.72       0.93  0.94   9.03       0.95  0.95   7.64
Co     0.93  0.94   8.83       0.95  0.95   7.72       0.91  0.93  10.44
Reg    0.69  0.80  19.14       0.21  0.65  30.71       0.71  0.73  21.48

5. Conclusions

Five different kernel functions were applied to the Imha watershed to evaluate the performance of each weighting method for estimating missing precipitation data, and the use of the interpolated data for hydrologic simulations was assessed. The following conclusions can be drawn from this research.

(1) To estimate missing precipitation data points, exploratory procedures should consider the spatiotemporal variations of precipitation. Because these variations are difficult to account for, statistical methods for estimating missing precipitation data are commonly used.

(2) Although ANNs are an advanced approach for estimating missing data, their mechanisms are unclear because the neuron system is ultimately a black-box model. Thus, regression methods are widely used for estimating missing data, even though they are limited in that small samples may not follow a normal distribution.

(3) When kernel functions are used as weights, the estimated missing data better satisfy the normality assumption, which is more statistically sound. Kernel methods can also overcome the weakness of k-NN regression when the data have outliers and/or a nonlinear trend around the missing data points, because they estimate a weighted mean.

(4) This study assessed five kernel functions, Epanechnikov, Quartic, Triweight, Tricube, and Cosine, as weights for predicting missing values. In comparison with the k-NN regression method, this study demonstrates that the kernel approaches provide higher quality interpolated precipitation data. In addition, the kernel function results better conform to statistical standards.

(5) Furthermore, higher quality interpolated precipitation data lead to better performance in hydrologic simulations, as exemplified in this study. All of the statistical analyses of the streamflow simulations showed that simulations using interpolated precipitation data from the kernel functions provide better results than those using k-NN regression.

(6) Kernel-based weighting is more effective than regression when the precipitation data have an upward or downward trend. However, if the precipitation data have a strongly nonlinear trend, it remains difficult to reconstruct the missing values effectively. For further research, a time series analysis or a random walk model using a stochastic process are possible methods for estimating missing data where a nonlinear trend exists.

Appendices

A. Kernel Functions

Kernel density estimation is an unsupervised learning procedure, which historically precedes kernel regression. It also leads naturally to a simple family of procedures for nonparametric classification.

A.1. Kernel Density Estimation

Suppose we have a random sample x_1, ..., x_N drawn from a probability density f_X(x) and we wish to estimate f_X at a point x_0. For simplicity we assume for now that X ∈ ℝ. Arguing as before, a natural local estimate has the form

\hat{f}_X(x_0) = \frac{\#\{x_i \in \mathcal{N}(x_0)\}}{N\lambda},  (A.1)

where \mathcal{N}(x_0) is a small metric neighborhood around x_0 of width \lambda and \# denotes the number of sample points x_i that fall in it. This estimate is bumpy, and the smooth Parzen estimate is preferred,

\hat{f}_X(x_0) = \frac{1}{N\lambda}\sum_{i=1}^{N} K_{\lambda}(x_0, x_i),  (A.2)

because it counts observations close to x_0 with weights that decrease with distance from x_0. In this case a popular choice for K_{\lambda} is the Gaussian kernel K_{\lambda}(x_0, x) = \phi(|x - x_0|/\lambda). Letting \phi_{\lambda} denote the Gaussian density with mean zero and standard deviation \lambda, (A.2) has the form

\hat{f}_X(x) = \frac{1}{N}\sum_{i=1}^{N} \phi_{\lambda}(x - x_i) = \left(\hat{F} \star \phi_{\lambda}\right)(x),  (A.3)

the convolution of the sample empirical distribution \hat{F} with \phi_{\lambda}. The distribution \hat{F} puts mass 1/N at each of the observed x_i and is jumpy; in \hat{f}_X(x) we have smoothed \hat{F} by adding independent Gaussian noise to each observation x_i.

The Parzen density estimate is the equivalent of the local average, and improvements have been proposed along the lines of local regression (on the log scale for densities). We will not pursue these here. In \mathbb{R}^{p} the natural generalization of the Gaussian density estimate amounts to using the Gaussian product kernel in (A.3),

\hat{f}_X(x_0) = \frac{1}{N\left(2\lambda^{2}\pi\right)^{p/2}} \sum_{i=1}^{N} e^{-\frac{1}{2}\left(\left\| x_i - x_0 \right\|/\lambda\right)^{2}}.  (A.4)
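As a small illustration of the Parzen estimate in (A.2)-(A.3), the Python sketch below evaluates a Gaussian kernel density estimate at one point for a synthetic sample (scipy.stats.gaussian_kde provides an equivalent implementation with automatic bandwidth selection).

```python
import numpy as np

def parzen_gaussian_kde(x0, sample, lam):
    """Gaussian Parzen estimate: f_hat(x0) = (1/N) * sum_i phi_lam(x0 - x_i)."""
    sample = np.asarray(sample, dtype=float)
    z = (x0 - sample) / lam
    return float(np.mean(np.exp(-0.5 * z**2) / (lam * np.sqrt(2.0 * np.pi))))

rng = np.random.default_rng(2)
sample = rng.normal(loc=5.0, scale=2.0, size=200)   # synthetic one-dimensional sample
print(parzen_gaussian_kde(5.0, sample, lam=0.8))    # density estimate near the mode (~0.2)
```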

A.2. Kernel Density Classification

One can use nonparametric density estimates for classification in a straightforward fashion using Bayes' theorem. Suppose for a J-class problem we fit nonparametric density estimates \hat{f}_{j}(X), j = 1, \ldots, J, separately in each of the classes, and we also have estimates of the class priors \hat{\pi}_{j} (usually the sample proportions). Then

\hat{\Pr}(G = j \mid X = x_0) = \frac{\hat{\pi}_{j}\,\hat{f}_{j}(x_0)}{\sum_{k=1}^{J}\hat{\pi}_{k}\,\hat{f}_{k}(x_0)}.

In regions where the data are sparse for both classes, and because the Gaussian kernel density estimates use metric kernels, the density estimates are low and of poor quality (high variance). The local logistic regression method uses the tricube kernel with a k-NN bandwidth; this effectively widens the kernel in such regions and makes use of the local linear assumption to smooth out the estimate (on the logit scale).

If classification is the ultimate goal, then learning the separate class densities well may be unnecessary and can in fact be misleading. In learning the separate densities from data, one might decide to settle for a rougher, high-variance fit that captures only features which are irrelevant for the purposes of estimating the posterior probabilities. In fact, if classification is the ultimate goal, then we need only estimate the posterior well near the decision boundary (for two classes, this is the set \{x : \Pr(G = 1 \mid X = x) = 1/2\}).
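A hedged sketch of this Bayes rule with class-wise Gaussian kernel density estimates is shown below, using synthetic two-class data and SciPy's gaussian_kde for the per-class densities.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
# Synthetic one-dimensional training data for two classes
x_class1 = rng.normal(0.0, 1.0, size=300)
x_class2 = rng.normal(2.5, 1.2, size=200)

priors = np.array([x_class1.size, x_class2.size], dtype=float)
priors /= priors.sum()                               # class priors as sample proportions
densities = [gaussian_kde(x_class1), gaussian_kde(x_class2)]

def posterior(x0):
    """Posterior class probabilities from Bayes' theorem with KDE class densities."""
    f = np.array([d(x0)[0] for d in densities])
    unnorm = priors * f
    return unnorm / unnorm.sum()

print(posterior(1.2))   # near the decision boundary the two probabilities are comparable
```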

B. Procedure for Estimating Missing Precipitation

This appendix gives an example calculation of the kernel-weighted mean. Because the kernel functions are all symmetric, the same weight is used on both sides of the missing value for a given day distance. Table 6 lists the weights for the 1st, 2nd, 3rd, and 4th day distances for each kernel and neighborhood size. For example, to estimate the missing precipitation for 2010-02-12 (actual value 6 mm) with the 4-NN Epanechnikov kernel, follow the three steps below (Table 7).


Table 6: Kernel weights by day distance from the missing value.

         4-NN               6-NN                       8-NN
       1st    2nd        1st    2nd    3rd         1st    2nd    3rd    4th
Ep    0.667  0.417      0.703  0.563  0.328       0.720  0.630  0.480  0.270
Qu    0.741  0.289      0.824  0.527  0.179       0.864  0.662  0.384  0.122
Tw    0.768  0.188      0.901  0.461  0.092       0.968  0.648  0.287  0.051
Tc    0.772  0.301      0.824  0.579  0.167       0.844  0.709  0.416  0.100
Co    0.680  0.393      0.726  0.555  0.301       0.747  0.635  0.462  0.243


Table 7: Example estimation for 2010-02-12 with the 4-NN Epanechnikov kernel.

                Step 1     Step 2                    Step 3
Date            Prec.      Weight   Prec.·Weight     Estimation
2010-02-10      15.0       0.417    6.255            5.949
2010-02-11      17.2       0.667    11.472
2010-02-12      6.0 (missing)
2010-02-13      9.1        0.667    6.070
2010-02-14      0.0        0.417    0.000

Step 1. Select the date of the target (missing) value.

Step 2. Determine the precipitation on the k nearest days and the corresponding kernel weights.

Step 3. Calculate the weighted average to estimate the missing value.

The estimates obtained with the remaining kernel methods and with k-NN regression are given in Table 8.


Table 8: Estimates of the missing precipitation on 2010-02-12 by method and neighborhood size.

Method    4-NN     6-NN     8-NN
Ep        5.949    5.172    4.862
Qu        5.956    5.302    4.936
Tw        5.755    5.294    4.952
Tc        6.205    5.407    4.963
Co        5.945    5.197    4.876
Reg       10.325   8.967    8.813

(Ep: Epanechnikov, Qu: Quartic, Tw: Triweight, Tc: Tricube, Co: Cosine, and Reg: regression).

C. Sample Calculations with Real Values

This section shows how missing precipitation is calculated with the kernel-weighted mean using actual numbers. Daily data from 2008 to 2010 were sampled, and values were removed at random with a probability of 0.02 to create artificial gaps; the locations of the removed data were then recorded. Zhang et al. [39] reported that kernel-based nonparametric multiple imputation performs better than general linear regression when the sample data are small or limited.

Table 9 shows the kernel weighting procedure for each function. Data from Feb. 10 to Feb. 14, 2010, were used to estimate the missing value on Feb. 12, 2010. The Epanechnikov kernel assigns the most distant data the highest weight (0.417), whereas the Triweight kernel assigns it the lowest (0.188). The Tricube kernel gives the highest weight to the nearest values and the Epanechnikov kernel the lowest. In general, the Tricube kernel, with its high near-point weights, tends to overestimate the missing precipitation.

Table 9: Kernel weights and weighted values for the example of Feb. 10-14, 2010 (missing value on Feb. 12).

(a) Epanechnikov

Date            2.10.    2.11.    2.12.    2.13.    2.14.
Prec.           15.0     17.2     6.0      9.1      0.0
Ep. weight      0.417    0.667    -        0.667    0.417
Prec.·weight    6.26     11.47    -        6.07     0.00
Estimation      5.95

(b) Quartic

Date            2.10.    2.11.    2.12.    2.13.    2.14.
Prec.           15.0     17.2     6.0      9.1      0.0
Qu. weight      0.289    0.741    -        0.741    0.289
Prec.·weight    4.34     12.75    -        6.74     0.00
Estimation      5.96

(c) Triweight

Date            2.10.    2.11.    2.12.    2.13.    2.14.
Prec.           15.0     17.2     6.0      9.1      0.0
Tw. weight      0.188    0.768    -        0.768    0.188
Prec.·weight    2.82     13.21    -        6.99     0.00
Estimation      5.75

(d) Tricube

Date            2.10.    2.11.    2.12.    2.13.    2.14.
Prec.           15.0     17.2     6.0      9.1      0.0
Tc. weight      0.301    0.772    -        0.772    0.301
Prec.·weight    4.52     13.28    -        7.03     0.00
Estimation      6.20

(e) Cosine

Date            2.10.    2.11.    2.12.    2.13.    2.14.
Prec.           15.0     17.2     6.0      9.1      0.0
Co. weight      0.393    0.680    -        0.680    0.393
Prec.·weight    5.90     11.70    -        6.19     0.00
Estimation      5.94

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

1. K. Kang and V. Merwade, “Development and application of a storage-release based distributed hydrologic model using GIS,” Journal of Hydrology, vol. 403, no. 1-2, pp. 1–13, 2011.
2. A. J. Abebe, D. P. Solomatine, and R. G. W. Venneker, “Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events,” Hydrological Sciences Journal, vol. 45, no. 3, pp. 425–436, 2000.
3. P. Ramos-Calzado, J. Gómez-Camacho, F. Pérez-Bernal, and M. F. Pita-López, “A novel approach to precipitation series completion in climatological datasets: application to Andalusia,” International Journal of Climatology, vol. 28, no. 11, pp. 1525–1534, 2008.
4. T. Schneider, “Analysis of incomplete climate data: estimation of mean values and covariance matrices and imputation of missing values,” Journal of Climate, vol. 14, no. 5, pp. 853–871, 2001.
5. S. K. Regonda, D.-J. Seo, B. Lawrence, J. D. Brown, and J. Demargne, “Short-term ensemble streamflow forecasting using operationally-produced single-valued streamflow forecasts—a Hydrologic Model Output Statistics (HMOS) approach,” Journal of Hydrology, vol. 497, pp. 80–96, 2013.
6. P. Coulibaly and N. D. Evora, “Comparison of neural network methods for infilling missing daily weather records,” Journal of Hydrology, vol. 341, no. 1-2, pp. 27–41, 2007.
7. H. A. El Sharif and R. S. V. Teegavarapu, “Evaluation of spatial interpolation methods for missing precipitation data: preservation of spatial statistics,” in Proceedings of the World Environmental and Water Resources Congress, pp. 3822–3832, May 2012.
8. A. Bárdossy and G. Pegram, “Infilling missing precipitation records—a comparison of a new copula-based method with other techniques,” Journal of Hydrology, vol. 519, pp. 1162–1170, 2014.
9. K. Schamm, M. Ziese, A. Becker et al., “Global gridded precipitation over land: a description of the new GPCC first guess daily product,” Earth System Science Data, vol. 6, no. 1, pp. 49–60, 2014.
10. R. S. V. Teegavarapu, “Statistical corrections of spatially interpolated missing precipitation data estimates,” Hydrological Processes, vol. 28, no. 11, pp. 3789–3808, 2014.
11. Y. Da and G. Xiurun, “An improved PSO-based ANN with simulated annealing technique,” Neurocomputing, vol. 63, pp. 527–533, 2005.
12. A. di Piazza, F. L. Conti, L. V. Noto, F. Viola, and G. La Loggia, “Comparative analysis of different techniques for spatial interpolation of rainfall data to create a serially complete monthly time series of precipitation for Sicily, Italy,” International Journal of Applied Earth Observation and Geoinformation, vol. 13, no. 3, pp. 396–408, 2011.
13. E. Pisoni, F. Pastor, and M. Volta, “Artificial Neural Networks to reconstruct incomplete satellite data: application to the Mediterranean Sea Surface Temperature,” Nonlinear Processes in Geophysics, vol. 15, no. 1, pp. 61–70, 2008.
14. V. Sharma, S. Rai, and A. Dev, “A comprehensive study of artificial neural networks,” International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 10, pp. 278–284, 2012.
15. ASCE Task Committee, “Artificial neural networks in hydrology. II: hydrologic applications,” Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 124–137, 2000.
16. ASCE Task Committee, “Artificial neural networks in hydrology. I: preliminary concepts,” Journal of Hydrologic Engineering, vol. 5, no. 2, pp. 115–123, 2000.
17. D. E. Rumelhart, B. Widrow, and M. A. Lehr, “The basic ideas in neural networks,” Communications of the ACM, vol. 37, no. 3, pp. 87–92, 1994.
18. J. Amorocho and W. E. Hart, “A critique of current methods in hydrologic systems investigation,” Transactions of the American Geophysical Union, vol. 45, no. 2, pp. 307–321, 1964.
19. A. W. Minns and M. J. Hall, “Artificial neural networks as rainfall-runoff models,” Hydrological Sciences Journal, vol. 41, no. 3, pp. 399–417, 1996.
20. K. Kang and V. Merwade, “The effect of spatially uniform and non-uniform precipitation bias correction methods on improving NEXRAD rainfall accuracy for distributed hydrologic modeling,” Hydrology Research, vol. 45, no. 1, pp. 23–42, 2014.
21. C. Daly, W. P. Gibson, G. H. Taylor, G. L. Johnson, and P. Pasteris, “A knowledge-based approach to the statistical mapping of climate,” Climate Research, vol. 22, no. 2, pp. 99–113, 2002.
22. J. D. Creutin, H. Andrieu, and D. Faure, “Use of a weather radar for the hydrology of a mountainous area. Part II: radar measurement validation,” Journal of Hydrology, vol. 193, no. 1–4, pp. 26–44, 1997.
23. Y. L. Xia, P. Fabian, A. Stohl, and M. Winterhalter, “Forest climatology: estimation of missing values for Bavaria, Germany,” Agricultural and Forest Meteorology, vol. 96, no. 1–3, pp. 131–144, 1999.
24. C. J. Willmott, S. M. Robeson, and J. J. Feddema, “Estimating continental and terrestrial precipitation averages from rain-gauge networks,” International Journal of Climatology, vol. 14, no. 4, pp. 403–414, 1994.
25. R. S. V. Teegavarapu and V. Chandramouli, “Improved weighting methods, deterministic and stochastic data-driven models for estimation of missing precipitation records,” Journal of Hydrology, vol. 312, no. 1–4, pp. 191–206, 2005.
26. J. A. Smith, “Precipitation,” in Handbook of Hydrology, D. R. Maidment, Ed., vol. 3, chapter 3, McGraw-Hill, New York, NY, USA, 1993.
27. J. R. Simanton and H. B. Osborn, “Reciprocal-distance estimate of point rainfall,” Journal of Hydraulic Engineering Division, vol. 106, no. 7, pp. 1242–1246, 1980.
28. J. D.-J. Salas, “Analysis and modeling of hydrological time series,” in Handbook of Hydrology, D. R. Maidment, Ed., vol. 19, chapter 19, pp. 19.1–19.72, McGraw-Hill, New York, NY, USA, 1993.
29. S. J. Jeffrey, J. O. Carter, K. B. Moodie, and A. R. Beswick, “Using spatial interpolation to construct a comprehensive archive of Australian climate data,” Environmental Modelling and Software, vol. 16, no. 4, pp. 309–330, 2001.
30. M. Franklin, V. R. Kotamarthi, M. L. Stein, and D. R. Cook, “Generating data ensembles over a model grid from sparse climate point measurements,” Journal of Physics: Conference Series, vol. 125, Article ID 012019, 2008.
31. K. C. Young, “A three-way model for interpolating for monthly precipitation values,” Monthly Weather Review, vol. 120, no. 11, pp. 2561–2569, 1992.
32. F. Filippini, G. Galliani, and L. Pomi, “The estimation of missing meteorological data in a network of automatic stations,” Transactions on Ecology and the Environment, vol. 4, pp. 283–291, 1994.
33. W. P. Lowry, Compendium of Lecture Notes in Climatology for Class IV Meteorological Personnel, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 1972.
34. M. C. Acock and Y. A. Pachepsky, “Estimating missing weather data for agricultural simulations using group method of data handling,” Journal of Applied Meteorology, vol. 39, no. 7, pp. 1176–1184, 2000.
35. S. L. Neitsch, J. G. Arnold, J. R. Kiniry, J. R. Williams, and K. W. King, Soil and Water Assessment Tool—Theoretical Documentation (Version 2005), Texas Water Resource Institute, College Station, Tex, USA, 2005.
36. S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality: complete samples,” Biometrika, vol. 52, no. 3-4, pp. 591–611, 1965.
37. M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” Journal of the American Statistical Association, vol. 32, no. 200, pp. 675–701, 1937.
38. F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
39. S. Zhang, Z. Jin, X. Zhu, and J. Zhang, “Missing data analysis: a kernel-based multi-imputation approach,” in Transactions on Computational Science III, vol. 5300 of Lecture Notes in Computer Science, pp. 122–142, Springer, Berlin, Germany, 2009.

Copyright © 2015 Hyojin Lee and Kwangmin Kang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
