Reservoir Inflow Prediction under GCM Scenario Downscaled by Wavelet Transform and Support Vector Machine Hybrid Models
Climate change significantly alters precipitation patterns, which in turn causes variation in reservoir inflow. At present, Indonesian hydrologists predict reservoir inflow according to the technical guideline Pd-T-25-2004-A. This guideline does not consider climate variables directly, which results in significant deviations from observations. This research predicts reservoir inflow using statistical downscaling (SD) of General Circulation Model (GCM) outputs. The GCM outputs are obtained from the National Centers for Environmental Prediction/National Center for Atmospheric Research Reanalysis (NCEP/NCAR Reanalysis). A newly proposed hybrid SD model, named Wavelet Support Vector Machine (WSVM), was used. It combines Multiscale Principal Component Analysis (MSPCA) with nonlinear Support Vector Machine regression. The model was validated at Sutami Reservoir, Indonesia. Training and testing were carried out using data from 1991–2008 and 2008–2012, respectively. The results showed that MSPCA extracted the predictor data more compactly than PCA and that WSVM predicted reservoir inflow better than the technical guideline. This research also applied WSVM to future reservoir inflow prediction based on GCM ECHAM5 under the SRES A1B scenario.
1. Introduction
Global warming caused by increased concentrations of greenhouse gases has led to climate change, which alters rainfall patterns in both space and time. According to the fourth assessment report of the Intergovernmental Panel on Climate Change (IPCC), the pattern of rainfall and of extreme rainfall events in Southeast Asian countries will change as the climate changes. In Indonesia, rainfall patterns have also changed significantly in space and time due to climate change. Some regions have experienced a timing shift between wet months and dry months. Climate change also affects the trends and patterns of rainfall in the Brantas Watershed in East Java, Indonesia.
The spatial-temporal changes in precipitation patterns may lead to changes in reservoir inflow. At present, reservoir inflow is predicted using the Pd-T-25-2004-A technical guideline issued by the Department of Settlement and Regional Infrastructure (Kimpraswil), Indonesia. This guideline does not consider climate variables directly, which results in significant deviations from the observations.
The objective of this research is to predict reservoir inflow by including climate variables directly. The atmospheric data of General Circulation Model (GCM) outputs are taken from the National Oceanic and Atmospheric Administration (NOAA), National Centers for Environmental Prediction/National Center for Atmospheric Research Reanalysis (NCEP/NCAR Reanalysis). The NCEP/NCAR Reanalysis outputs have a coarse spatial resolution, which makes them unsuitable for direct use in hydrologic modeling at the watershed scale [5–7]. This mismatch in spatial resolution is resolved by applying a downscaling technique [8, 9].
Downscaling techniques can be grouped into two types of approaches, that is, dynamic downscaling (DD) and statistical downscaling (SD). The DD approach uses a Regional Climate Model (RCM), which takes its physical boundary conditions from the GCM at the regional scale. This approach requires a complex design and a very high computational cost [10, 11]. The SD approach is computationally simple and economical: it determines an empirical transfer function that connects atmospheric circulation variables (predictors) to local climate variables (predictand).
Streamflow modeling using the SD model is performed using two approaches, that is, indirect downscaling and direct downscaling. The first approach is performed by linking the SD model of precipitation and the hydrological model [7, 13–15]. Meanwhile, the second approach is performed using the SD model of the streamflow which is based on the GCM outputs directly [9, 16, 17]. In this approach, influences of land use, soil cover, and groundwater storage are not considered.
The choice of potential predictors from the GCM outputs is an important part of the SD model. The predictor selection may vary by region, depending on the characteristics of the GCM outputs and of the predictand. In addition, selecting the optimum domain grid of the GCM outputs may yield a better correlation between predictor and predictand. Ghosh and Mujumdar and Sachindra et al. employed optimum domain grids of 5 × 5 and 7 × 6, respectively, for SD of streamflow, while Tripathi et al. and Tolika et al. selected their own optimum domain grids for SD of precipitation. Predictor data in the optimum GCM domain grid are usually preprocessed (extracted) using Principal Component Analysis (PCA) [9, 16, 17]. Data extraction can also be based on the wavelet transform, namely, Multiscale Principal Component Analysis (MSPCA), which is better suited to data containing contributions that change over time and frequency.
This study developed direct statistical downscaling models to predict reservoir inflow using a novel hybrid model, namely, Wavelet Support Vector Machine (WSVM). WSVM is a combination of GCM output data extraction based on wavelet transform and nonlinear Support Vector Machine regression.
2. Data and Methods
2.1. Study Area and Data Resource
The calibration and validation tests of the SD model for reservoir inflow prediction were conducted at Sutami Reservoir. The Sutami Dam is located in the Upper Brantas Watershed, which has a catchment area of about 2052 km². The site is in Karangkates Village, Sumberpucung Subdistrict, Malang Regency, Indonesia. Geographically, the Upper Brantas Watershed lies between 7° 44′ 29′′ and 8° 19′ 47′′ south latitude and between 112° 27′ 25′′ and 112° 55′ 23′′ east longitude. There are eight rainfall stations, that is, Pujon, Tangkil, Poncokusumo, Dampit, Sengguruh, Sutami, Wagir, and Tunggorono. The study location is shown in Figure 1.
To provide the inputs for the calibration and validation of the SD model, NCEP/NCAR Reanalysis data for the period 1991–2012 were obtained from http://www.esrl.noaa.gov/psd/. The potential predictors of the NCEP/NCAR Reanalysis data were selected on the basis of a correlation coefficient above 0.5 with the predictand (local precipitation). The potential predictors consist of precipitable water (prwtr), zonal velocity component (uwnd), meridional velocity component (vwnd), temperature (air), pressure (pres), sea level pressure (slp), relative humidity at 500 hPa (rhum500), relative humidity at 850 hPa (rhum850), specific humidity at 500 hPa (shum500), specific humidity at 850 hPa (shum850), omega at 500 hPa (omega500), omega at 850 hPa (omega850), zonal velocity component at 850 hPa (uwnd850), and meridional velocity component at 850 hPa (vwnd850). Monthly precipitation and reservoir inflow for the same period were obtained from Perum Jasa Tirta I, Malang, Indonesia. The datasets were separated into two groups, that is, training data (1991 to 2008) and testing data (2008 to 2012).
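The correlation-based screening described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `pearson_r` and `select_predictors` are hypothetical, the short series are made-up numbers rather than NCEP/NCAR or Sutami data, and applying the 0.5 threshold to the signed coefficient (rather than its absolute value) is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def select_predictors(candidates, predictand, threshold=0.5):
    """Keep candidate predictors whose correlation with the predictand
    exceeds the threshold (signed coefficient; an assumption)."""
    selected = {}
    for name, series in candidates.items():
        r = pearson_r(series, predictand)
        if r > threshold:
            selected[name] = r
    return selected

# Illustrative (made-up) monthly values, not data from the study.
precip = [310.0, 280.0, 250.0, 120.0, 60.0, 20.0]
candidates = {
    "prwtr": [52.0, 50.0, 48.0, 40.0, 35.0, 30.0],
    "slp": [1008.0, 1012.0, 1007.0, 1011.0, 1009.0, 1010.0],
}
selected = select_predictors(candidates, precip)
print(sorted(selected))
```

With these toy numbers only `prwtr` passes the screening, because the `slp` series is negatively correlated with the precipitation series.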
For the future projection of reservoir inflow, monthly outputs of the GCM ECHAM5 under the Special Report on Emissions Scenarios (SRES) A1B scenario of the IPCC were obtained from the Program for Climate Model Diagnosis and Intercomparison (PCMDI) website (http://www.ipcc-data.org/sim/gcm_monthly/SRES_AR4/index.html) for the period 2013–2035. The GCM ECHAM5 was chosen for reservoir inflow prediction following previous studies [20, 21]. SRES A1B is a climate change scenario in which the atmospheric CO2 concentration reaches 720 ppm in the year 2100. Research by Ambarsari and Tedjasukmana demonstrated that the atmospheric CO2 concentration over Indonesia increased from 370 ppm to 390 ppm over 8 years (2002 to 2010). If the rate of increase is assumed to remain constant (2.5 ppm per year), the concentration will reach 615 ppm in the year 2100, which is closest to the SRES A1B scenario.
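The arithmetic behind the 615 ppm figure can be checked with a short sketch. The function name `linear_co2_projection` is hypothetical, and the constant rate of increase is the assumption stated in the text, not a physical model.

```python
def linear_co2_projection(c_start, c_end, year_start, year_end, target_year):
    """Extrapolate a concentration linearly from two observations.

    Returns the rate (ppm per year) and the projected concentration (ppm).
    """
    rate = (c_end - c_start) / (year_end - year_start)
    projected = c_end + rate * (target_year - year_end)
    return rate, projected

# Values quoted in the text: 370 ppm (2002) and 390 ppm (2010).
rate, co2_2100 = linear_co2_projection(370.0, 390.0, 2002, 2010, 2100)
print(rate, co2_2100)
```

The rate is (390 − 370)/8 = 2.5 ppm per year, and 390 + 2.5 × 90 = 615 ppm for the year 2100.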
2.2. Wavelet Transform
Wavelet analysis is an important tool that provides information on both the frequency and the time domain of time series data. The wavelet transform decomposes time series data into different frequencies using wavelet functions. The wavelet transform has been applied to streamflow forecasting by several researchers, such as Guo et al., Kisi and Cimen, and Santos and Silva. The wavelet transform is also used to reduce high data dimensions.
The advantage of the wavelet transform over PCA in reducing the data dimension is its ability to provide many alternative transformation matrices to choose from, so that the resulting dimension is compatible with the original data. The matrices of wavelet coefficients are obtained by dilation and translation of two types of wavelet functions, namely, the father wavelet ($\phi$) and the mother wavelet ($\psi$).
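A multilevel decomposition of this kind can be sketched in a few lines. For brevity the sketch uses the Haar wavelet, which is an assumption made only to keep the filters trivial; it is not the wavelet used in this study. The function names `haar_step` and `haar_decompose` are hypothetical and the series is made up.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Multilevel Haar DWT.

    Returns the coarsest approximation and a list of detail coefficients
    ordered from level 1 (finest scale) to the coarsest level.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Made-up series of length 8, decomposed to level 3.
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(series, levels=3)
print(len(approx), [len(d) for d in details])
```

Each level halves the number of coefficients, and because the transform is orthonormal the sum of squared coefficients equals the sum of squared samples.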
2.3. Multiscale Principal Component Analysis (MSPCA)
Data dimension reduction by the wavelet transform may cause multicollinearity; thus, further analysis using PCA is needed. The combination of the wavelet transform and PCA for data dimension reduction is called Multiscale Principal Component Analysis (MSPCA). MSPCA combines the ability of PCA to decorrelate the variables by extracting a linear relationship with the ability of wavelet analysis to extract deterministic features and approximately decorrelate autocorrelated measurements. MSPCA computes the PCA of the wavelet coefficients at each scale and then combines the results at the relevant scales. Applications of MSPCA for data preprocessing can be found in several previous studies, such as Aminghafari et al., Sharma et al., Widjaja et al., and Anwar et al.
Figure 2 presents the MSPCA algorithm. The NCEP/NCAR Reanalysis predictors are decomposed using the Daubechies wavelet of order 10 (db-10) up to level 3, yielding detail coefficients and approximation coefficients indexed by the decomposition level. PCA is then applied to the wavelet coefficients at each scale. If the first eigenvalue at a scale exceeds the mean of the eigenvalues (Kaiser's rule), new wavelet coefficients are computed from the retained components; otherwise, the wavelet coefficients at that scale are set to zero. Finally, the new wavelet coefficients from all scales are reconstructed, and the final principal components (PCs) are obtained from the reconstructed data.
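The per-scale decision in this algorithm can be illustrated with a small sketch of Kaiser's rule. Only the selection step is shown; the wavelet decomposition and the eigenvalue computation are assumed to have been done elsewhere. The function name `kaiser_retained`, the scale labels, and the eigenvalues are hypothetical.

```python
def kaiser_retained(eigenvalues):
    """Indices of components whose eigenvalue exceeds the mean eigenvalue."""
    mean_eig = sum(eigenvalues) / len(eigenvalues)
    return [i for i, ev in enumerate(eigenvalues) if ev > mean_eig]

# Made-up PCA eigenvalues of the wavelet coefficients at each scale.
eigenvalues_by_scale = {
    "detail level 1": [1.0, 1.0, 1.0],
    "detail level 2": [5.0, 0.6, 0.4],
    "approximation level 3": [9.0, 2.0, 1.0],
}

decisions = {}
for scale, eigs in eigenvalues_by_scale.items():
    kept = kaiser_retained(eigs)
    # An empty list means no component stands out at this scale, so its
    # wavelet coefficients would be set to zero before reconstruction.
    decisions[scale] = kept if kept else "set to zero"
print(decisions)
```

In this toy case the finest scale carries no dominant component and would be zeroed, while the other two scales each keep their first component.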
2.4. Support Vector Machine (SVM)
In the past decades, traditional Artificial Neural Networks (ANN) such as Multilayer Backpropagation (MLBP) and Radial Basis Function (RBF) networks have been used intensively in hydrological modeling. However, local minima and overfitting are frequently encountered in modeling with ANN. Vapnik developed a machine learning algorithm called Support Vector Machine (SVM), which provides an excellent solution to these problems.
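The nonlinear relationship can be expressed as

$$f(\mathbf{x}) = \mathbf{w}^{T}\varphi(\mathbf{x}) + b, \tag{1}$$

where $f(\mathbf{x})$ is the model output, $\mathbf{w}$ is an adjustable weight vector, $\varphi(\cdot)$ is the nonlinear mapping into the feature space, and $b$ is a bias. The parameters can be estimated by minimizing the cost function [35, 36]:

$$\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_{i} + \xi_{i}^{*} \right) \tag{2}$$

subject to

$$y_{i} - f(\mathbf{x}_{i}) \le \varepsilon + \xi_{i}, \qquad f(\mathbf{x}_{i}) - y_{i} \le \varepsilon + \xi_{i}^{*}, \qquad \xi_{i},\, \xi_{i}^{*} \ge 0, \qquad i = 1, \ldots, N.$$

The slack variables $\xi_{i}$ and $\xi_{i}^{*}$ describe the $\varepsilon$-insensitive loss function. The constant $C$ is a user-specified positive parameter. The first term of the cost function controls the magnitude of $\mathbf{w}$ in order to improve the generalization of the model. The second term is a penalty function that penalizes deviations between output and target larger than $\varepsilon$, using the $\varepsilon$-insensitive loss function. Following Smola and Schölkopf, the SVM soft margin setting is shown in Figure 4.

The optimization in (2) is the primal problem for regression. It can be solved using the Lagrange multipliers method, which leads to the dual problem

$$\max_{\alpha,\,\alpha^{*}} \; \sum_{i=1}^{N} y_{i} \left( \alpha_{i} - \alpha_{i}^{*} \right) - \varepsilon \sum_{i=1}^{N} \left( \alpha_{i} + \alpha_{i}^{*} \right) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( \alpha_{i} - \alpha_{i}^{*} \right) \left( \alpha_{j} - \alpha_{j}^{*} \right) K(\mathbf{x}_{i}, \mathbf{x}_{j}) \tag{3}$$

subject to

$$\sum_{i=1}^{N} \left( \alpha_{i} - \alpha_{i}^{*} \right) = 0, \qquad 0 \le \alpha_{i},\, \alpha_{i}^{*} \le C,$$

where $\alpha_{i}$ and $\alpha_{i}^{*}$ are Lagrange multipliers, which are nonnegative real constants. The data points for which $(\alpha_{i} - \alpha_{i}^{*})$ is not zero are called the support vectors. From (3), the nonlinear function estimation of SVM is obtained and expressed as

$$f(\mathbf{x}) = \sum_{i=1}^{N} \left( \alpha_{i} - \alpha_{i}^{*} \right) K(\mathbf{x}, \mathbf{x}_{i}) + b, \tag{4}$$

where $K(\mathbf{x}, \mathbf{x}_{i})$ is the inner-product kernel defined in accordance with Mercer's theorem. It is expressed as

$$K(\mathbf{x}, \mathbf{x}_{i}) = \varphi(\mathbf{x})^{T} \varphi(\mathbf{x}_{i}). \tag{5}$$

Several kernel functions can be used, including polynomial, Gaussian or Radial Basis Function (RBF), and sigmoid kernels. In this study, the RBF kernel is used to map the input space into the higher dimensional feature space. The advantage of the RBF kernel is that it can effectively handle cases in which the relationship between predictors and predictand is nonlinear. Moreover, the RBF kernel is computationally simpler than the polynomial kernel, which has more parameters. The RBF kernel is given by

$$K(\mathbf{x}, \mathbf{x}_{i}) = \exp\!\left( -\frac{\lVert \mathbf{x} - \mathbf{x}_{i} \rVert^{2}}{2\sigma^{2}} \right). \tag{6}$$

The SVM with RBF kernel involves the selection of the penalty parameter ($C$) and the RBF kernel parameter ($\sigma$). The optimal values of the SVM parameters are obtained by the grid search method. The architecture of SVM for regression can be seen in Figure 5.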
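The function estimation and the RBF kernel described above can be evaluated directly once the support vectors and their coefficients are known. The sketch below assumes the common Gaussian parameterization with a width parameter `sigma`, and it takes the coefficients (the differences of the Lagrange multipliers) as given instead of solving the dual problem. All names and numbers are hypothetical.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, coeffs, bias, sigma):
    """Kernel expansion: bias + sum_i coeff_i * K(x, sv_i).

    Each coeff_i stands for the difference of the two Lagrange
    multipliers attached to support vector i.
    """
    return bias + sum(
        c * rbf_kernel(x, sv, sigma) for sv, c in zip(support_vectors, coeffs)
    )

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Made-up support vectors and coefficients, for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [2.0, -1.0]
prediction = svr_predict([0.0, 0.0], support_vectors, coeffs, bias=0.5, sigma=1.0)
print(prediction)
print(eps_insensitive_loss(10.0, 10.05, epsilon=0.1))
```

At the first support vector the kernel equals 1, so the prediction is 0.5 + 2 − exp(−1). The loss call shows that an error smaller than epsilon is not penalized.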
2.5. Performances of SD Model
The performance of the SD model was evaluated by comparing the model output with the observed reservoir inflow. Model performance was assessed using the Coefficient of Determination ($R^2$), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). $R^2$ measures the proportion of variability in the dependent variable (predictand) that is explained by the regression model through the independent variables (predictors); it is a measure of the goodness of fit of the estimated regression model. The RMSE measures the difference between predicted and observed values. It is commonly used with iterative algorithms and, because errors are squared, is the better measure for high values. The MAE measures the average magnitude of the errors in a set of forecasts without considering their direction and is less influenced by high values.
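The three criteria can be computed as in the sketch below. The text does not give the formulas, so the definition of the coefficient of determination as one minus the ratio of residual to total sum of squares is an assumption (some studies use the squared correlation coefficient instead). The inflow values are made up.

```python
import math

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

# Made-up observed and predicted inflows.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(r_squared(observed, predicted), rmse(observed, predicted), mae(observed, predicted))
```

For these values the squared errors sum to 26 and the total sum of squares is 500, so the coefficient of determination is 0.948, the RMSE is the square root of 6.5, and the MAE is 2.5.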
2.6. Reservoir Inflow Prediction Using Technical Guideline
At present, reservoir inflow is predicted by referring to the technical guideline (Pd-T-25-2004-A) issued by the Department of Settlement and Regional Infrastructure (Kimpraswil), Indonesia. In the technical guideline, the reservoir inflow characteristics are classified into three hydrometeorological conditions, that is, the wet year, normal year, and dry year. Reservoir inflow is predicted from the historical reservoir inflow data by considering those three hydrometeorological conditions. The classification of hydrometeorological conditions refers to the percentage of the reservoir inflow volume as shown in Table 1.
Based on the monthly inflow record, the annual inflow is computed as the cumulative monthly inflow for each year. The annual values are then listed in ascending order. The probability of each value is obtained from its rank number and the number of annual inflow data. A plot of annual inflow against its associated probability can then be established (Figure 6). The data are grouped into three classes based on their probability values, representing dry, normal, and wet years. Finally, the inflow prediction for the consecutive months is obtained according to the type of year.
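The ranking and classification steps can be sketched as follows. Two things in this sketch are assumptions and do not come from the guideline: the Weibull plotting position m/(n + 1) for the probability, and the class limits 0.35 and 0.65, which stand in for the Table 1 values that are not reproduced in the text. The function names and the inflow volumes are hypothetical.

```python
def annual_totals(monthly_by_year):
    """Sum the monthly inflows of each year."""
    return {year: sum(months) for year, months in monthly_by_year.items()}

def classify_years(annual_inflow, dry_below=0.35, wet_above=0.65):
    """Rank annual inflows in ascending order and label each year.

    Probability uses the Weibull plotting position rank / (n + 1).
    The two thresholds are placeholders, not the guideline's values.
    """
    ranked = sorted(annual_inflow.items(), key=lambda item: item[1])
    n = len(ranked)
    result = {}
    for rank, (year, _volume) in enumerate(ranked, start=1):
        prob = rank / (n + 1)
        if prob < dry_below:
            label = "dry"
        elif prob > wet_above:
            label = "wet"
        else:
            label = "normal"
        result[year] = (prob, label)
    return result

# Made-up annual inflow volumes.
annual = {2001: 1800.0, 2002: 2400.0, 2003: 2100.0, 2004: 1500.0, 2005: 2700.0}
classes = classify_years(annual)
for year in sorted(classes):
    print(year, classes[year])
```

Once a year has been assigned to a class, the monthly prediction would be taken from the historical inflows of that class, as described in the guideline.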
3. Results and Discussion
3.1. Selection of Optimum Domain Grid and Potential Predictors
There is no specific guideline for determining the optimum size of the domain grid of NCEP/NCAR Reanalysis. Ghosh and Mujumdar and Sachindra et al. used 5 × 5 and 7 × 6 grids, respectively, in their streamflow downscaling models. This work determined the optimum grid based on the correlation coefficient. The results for various domain grids are presented in Table 2, where the target location was in the center of the domain grid (Figure 7). The table shows that the optimum grid is of 4 × 4 size, which has the highest correlation value.
3.2. Preprocessing Data of NCEP/NCAR Reanalysis
Each predictor at the 4 × 4 domain grid had 25 observation points. SD modeling for reservoir inflow prediction required 14 predictors, so 14 × 25 = 350 observation points were employed. Both PCA and MSPCA were then applied to reduce the data dimension. The results are listed in Table 3. Following previous practice, this work retained the PCs that give a cumulative variance of more than 98%. MSPCA required only 7 PCs, instead of the 11 PCs of PCA, to achieve 98% cumulative variance. Using the wavelet transform in MSPCA therefore reduced the data dimension further than PCA alone. MSPCA was chosen for the decomposition of the predictors because of its more detailed format, which provides a better representation of the predictor data.
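The dimension count and the cumulative variance criterion can be illustrated with a short sketch. The eigenvalues are made up and are not those behind Table 3; the function name `n_components_for_variance` is hypothetical.

```python
def n_components_for_variance(eigenvalues, target=0.98):
    """Smallest number of leading PCs whose cumulative share of the
    total variance reaches the target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, ev in enumerate(ordered, start=1):
        cumulative += ev
        if cumulative / total >= target:
            return k
    return len(ordered)

# Input dimension stated in the text: 14 predictors at 25 points each.
n_inputs = 14 * 25

# Made-up eigenvalues: cumulative shares are 60%, 85%, 95%, 99%, 100%.
toy_eigenvalues = [60.0, 25.0, 10.0, 4.0, 1.0]
n_pcs = n_components_for_variance(toy_eigenvalues)
print(n_inputs, n_pcs)
```

In this toy case four components are needed, because three components explain only 95% of the variance.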
3.3. SVM and WSVM Downscaling Model Calibration and Validation
The SVM downscaling models used the PCA and MSPCA results as input data: 11 PCs from PCA and 7 PCs from MSPCA. The SVM with MSPCA input data was named WSVM. The SVM with RBF kernel used in this study has two parameters to be determined, the penalty parameter and the RBF kernel parameter. These parameters are mutually dependent, so changing the value of one changes the optimal value of the other. The parameter values were obtained by the grid search method, and the optimal parameters of SVM and WSVM were taken as the average over the fivefold cross validation. Separate optimal values of the penalty parameter and the RBF kernel parameter were obtained for SVM and for WSVM.
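A grid search with fivefold cross validation over the two parameters can be sketched in plain Python. To keep the example self-contained, the regressor is kernel ridge regression with an RBF kernel and ridge term 1/C, which is a stand-in for support vector regression and not the model used in this study. The data, the parameter grids, and all function names are hypothetical.

```python
import math

def rbf(x, z, sigma):
    """Gaussian RBF kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_predict(x_train, y_train, x_test, c, sigma):
    """Kernel ridge regression with ridge 1/c (stand-in for SVR)."""
    n = len(x_train)
    gram = [[rbf(x_train[i], x_train[j], sigma) + (1.0 / c if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, y_train)
    return [sum(a * rbf(x, xt, sigma) for a, xt in zip(alpha, x_train))
            for x in x_test]

def cv_rmse(x, y, c, sigma, k=5):
    """RMSE averaged over k interleaved folds."""
    n = len(x)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        x_tr = [x[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        x_te = [x[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        pred = fit_predict(x_tr, y_tr, x_te, c, sigma)
        mse = sum((p - t) ** 2 for p, t in zip(pred, y_te)) / len(y_te)
        scores.append(math.sqrt(mse))
    return sum(scores) / k

def grid_search(x, y, c_grid, sigma_grid, k=5):
    """Return (best score, best c, best sigma) over the parameter grid."""
    best = None
    for c in c_grid:
        for sigma in sigma_grid:
            score = cv_rmse(x, y, c, sigma, k)
            if best is None or score < best[0]:
                best = (score, c, sigma)
    return best

# Made-up one-dimensional data; grids are illustrative.
x = [[i / 10.0] for i in range(20)]
y = [math.sin(2.0 * xi[0]) for xi in x]
c_grid = [1.0, 10.0, 100.0]
sigma_grid = [0.1, 0.5, 2.0]
best_score, best_c, best_sigma = grid_search(x, y, c_grid, sigma_grid)
print(best_score, best_c, best_sigma)
```

Because the two parameters interact, every pair in the grid is evaluated jointly instead of tuning one parameter at a time.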
The results of the SVM and WSVM models during the training and testing stages are given in Table 4, and the corresponding plots are shown in Figures 8 and 9. The performance of Pd-T-25-2004-A is also reported in Table 4 and Figure 9.
The results show that the WSVM needs fewer PC inputs than the SVM to generate results with similar accuracy. This indicates that WSVM is a parsimonious model for reservoir inflow prediction, using as few inputs (PCs) as possible.
3.4. Comparison of Reservoir Inflow Modeling
The results of the SD models using SVM and WSVM (Figure 9 and Table 4) show that both generate better predictions than the reservoir inflow prediction model currently in use (Pd-T-25-2004-A). The technical guideline cannot predict reservoir inflow well when the seasons shift abnormally or when the duration of the rainy and dry seasons departs from the normal 6 months. This is illustrated by the reservoir inflow predicted for 2010, which was lower than the observed reservoir inflow (Figure 9). The year 2010 was a wet year in which the rainy season lasted longer than usual.
The direct downscaling model using SVM and WSVM is applicable to predicting reservoir inflow under climate change. Its advantage is that global climate determinants (atmospheric circulation variables) are included directly in the prediction of reservoir inflow. However, the model is limited in that the effect of the physical characteristics of the Sutami Watershed is not included: the reservoir inflow change is assumed to be influenced only by the change in the precipitation pattern due to climate change. In reality, the reservoir inflow change results from a complex combination of global climate change and changes in the physical characteristics of the watershed.
3.5. Reservoir Inflow Prediction under the Climate Change Scenario
Reservoir inflow prediction under the climate change scenario is based on the Special Report on Emissions Scenarios (SRES) A1B scenario and the GCM ECHAM5 of the Max Planck Institute. Figure 10 shows the monthly change of reservoir inflow under the climate change scenario for the future 22 years (2013–2035).
The trend of the future reservoir inflow (2013–2035) follows the same pattern as the trend of the historical reservoir inflow (1991–2012). Predicting the trend of future reservoir inflow is important for optimal reservoir operation. According to Zhang et al., knowledge of trends in streamflow is important for efficient management of water resources.
4. Conclusions
This work built and validated a statistical downscaling framework that predicts reservoir inflow directly from GCM outputs. A newly proposed hybrid SD model called WSVM was successfully applied to predict reservoir inflow. It used MSPCA, which required fewer inputs than PCA to generate similar performance. Using NCEP/NCAR Reanalysis outputs, the model provided better predictions than the Indonesian technical guideline, which ignores the effect of climate change. The model was also used to forecast the reservoir inflow trend for the period 2013–2035 using GCM ECHAM5. However, WSVM did not consider hydrologic factors such as land use change and groundwater storage in the reservoir inflow prediction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research was funded by the Higher Education of Indonesia Education Ministry and Research Institute of University of Jember (Grant no. 432/UN25.3.1/LT.6/2014).
References
IPCC, Climate Change 2007: The Physical Science Basis, Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, Cambridge, UK, 2007.
S. L. Slamet and S. S. Berliana, “Indikasi Perubahan Iklim dari Pergeseran Bulan Basah, Bulan Kering dan Lembab,” in Seminar Nasional Pemanasan Global dan Perubahan Iklim (LAPAN '07), November 2007.
Kimpraswil, “Single reservoir operation: the technical guideline of constructing and building,” Tech. Rep. Pd-T-25-2004-A, Departemen Pemukiman dan Prasarana Wilayah, 2004, (Indonesian).
R. L. Wilby, S. P. Charles, E. Zorita, B. Timbal, P. Whetton, and L. O. Mearns, Guidelines for Use of Climate Scenarios Developed from Statistical Downscaling Methods, 2004.
K. Tolika, P. Maheras, M. Vafiadis, H. A. Flocas, and A. Arseni-Papadimitriou, “Simulation of seasonal precipitation and raindays over Greece: a statistical downscaling technique based on artificial neural networks (ANNs),” International Journal of Climatology, vol. 27, no. 7, pp. 861–881, 2007.
R. G. Crane and B. C. Hewitson, “Doubled CO2 precipitation change for the Susquehanna basin: downscaling from the genesis general circulation model,” International Journal of Climatology, vol. 18, no. 1, pp. 65–76, 1998.
D. A. Sachindra, F. Huang, A. Barton, and B. J. C. Perera, “Least square support vector and multi-linear regression for statistically downscaling general circulation model outputs to catchment streamflows,” International Journal of Climatology, vol. 33, no. 5, pp. 1087–1106, 2013.
A. H. Wigena, B. Aunuddin, and R. Boer, Statistical downscaling modeling using projection pursuit regression to forecast monthly rainfall, a case of monthly rainfall in Indramayu [Ph.D. thesis], Bogor Agricultural University, Bogor, Indonesia, 2006, (Indonesian).
N. Ambarsari and B. S. Tedjasukmana, “Kajian perkembangan teknologi sounding untuk mengukur konsentrasi CO2 di atmosfer,” Berita Dirgantara, vol. 12, no. 1, pp. 28–27, 2011.
S. Sunaryo, K. A. Notodipuro, L. K. Darusalam, and I. W. Mangku, Calibration model using wavelet transform as preprocessing methods [Ph.D. thesis], Bogor Agricultural University, Bogor, Indonesia, 2005, (Indonesian).
L. N. Sharma, S. Dandapat, and A. Mahanta, “Multiscale PCA based quality controlled denoising of multichannel ECG signals,” International Journal of Information and Electronics Engineering, vol. 2, pp. 107–111, 2012.
D. Widjaja, E. V. Diest, and S. V. Huffel, “Extraction of direct respiratory influences from the tachogram using multiscale principal component analysis,” International Journal of Bioelectromagnetism, vol. 15, no. 1, pp. 97–101, 2013.
A. Anwar, G. Halik, and Edijatno, “Statistical downscaling model for assessing drought disaster due to climate change at Sampean watershed Indonesia,” in Proceedings of the 22nd ICID Congress on Irrigation and Drainage, vol. 1 of Question 58 and 59, p. 427, Gwangju, Republic of Korea, September 2014.
V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
W. W. Hsieh, Machine Learning Methods in the Environmental Sciences, Cambridge University Press, Cambridge, UK, 2009.
S. Haykin, Neural Networks: A Comprehensive Foundation, Pearson Education, Singapore, 4th edition, 2003.
C. W. Hsu, C. C. Chang, and C. J. Lin, A Practical Guide to Support Vector Classification, 2003, http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.