Abstract

River flow prediction is essential in many applications of water resources planning and management. In this paper, the accuracy of multivariate adaptive regression splines (MARS), model 5 regression tree (M5RT), and conventional multiple linear regression (CMLR) is compared with that of a hybrid least square support vector regression-gravitational search algorithm (HLGSA) in predicting monthly river flows. In the first part of the study, all four methods were compared with each other in predicting the river flows of each basin. It was found that the HLGSA method performed better than MARS, M5RT, and CMLR in river flow prediction. The effect of log transformation on the prediction accuracy of the regression methods was examined in the second part of the study. Log transformation of the river flow data significantly increased the prediction accuracy of all regression methods. It was also found that log HLGSA (LHLGSA) performed better than the other regression methods. In the third part of the study, the accuracy of the LHLGSA and HLGSA methods was examined in river flow estimation using nearby river flow data. On the basis of the results of all applications, it was found that LHLGSA and HLGSA could be successfully used in the prediction and estimation of river flow.

1. Introduction

River flow forecasting plays a vital role in the planning of water projects, irrigation systems, and hydropower systems and in the optimized utilization of water resources [1]. Owing to the continuous growth of population, industrial use, and irrigation needs, river flow forecasting has received great attention from researchers for operational river management [2]. Forecasting of river flow provides alerts of approaching floods and also assists in controlling reservoir outflows during low-flow days. Floods affect countless lives, infrastructure, and property and cause more damage than any other natural disaster. In Pakistan, a flood in 2010 whose magnitude had not been assessed in advance caused the loss of thousands of lives and damage to agricultural land worth millions of dollars [3]. It is not possible to provide complete safety from floods, but large amounts of money and many lives can be saved by accurately predicting flood occurrence, magnitude, and duration [4]. The importance of water measurement has compelled researchers to apply various types of forecasting methods to estimate and forecast river flows.

Since the last three decades of the previous century, statistical methods have been applied successfully in the field of hydrology, including river flow forecasting. Statistical methods try to find inherent relationships within the observed data. The autoregressive integrated moving average (ARIMA) and seasonal autoregressive integrated moving average (SARIMA) methods are the most popular statistical methods and have been extensively used to model different variables in hydrology [5–15]. Ahlert and Mehta [5] and Kurunç et al. [7] used ARIMA models for modeling river flow data. Ahmad et al. [6] and Mirzavand and Ghazavi [8] applied ARIMA methods to analyse water quality and groundwater data, respectively. Otok and Suhartono [11], Rabenja et al. [13], and Valipour [14] forecasted runoff data in Indonesia and USA, respectively, by applying the SARIMA model and compared it with ARIMA. Psilovikos and Elhag [12] and Yang et al. [15] applied ARIMA models successfully in modeling different processes of evaporation data. Mishra and Desai [9] and Modarres [10] used the SARIMA method efficiently for drought forecasting in India and Iran, respectively. In the previous two decades, artificial neural networks (ANN) have replaced the statistical methods in solving different problems because of their flexible nature and their ability to capture the nonlinearity in the data. Many researchers have compared ANN with statistical methods in solving hydrological problems and reported that ANN outperformed the statistical methods [16–21]. Huang et al. [18] used the ANN method to forecast the river flows of the Apalachicola River, USA, using previous rainfall and river flow data. They compared the quarterly and yearly river flow forecasts with the results of the ARIMA method and found that the ANN performed better than the ARIMA method in the prediction of river flow.
A detailed discussion of all ANN applications in comparison with statistical methods for forecasting different hydrological variables is beyond the scope of this paper. However, ANN also have some major weaknesses, namely, overfitting, falling into local minima, slow convergence, and the need for a large number of training data. Thus, in the last decade, support vector regression (SVR) took priority over ANN because of its parallel distributed processing, its self-learning features, its avoidance of overfitting problems, and its globally optimal solutions [22–28]. Ahmad et al. [22] applied the SVR model to forecast the runoff of the Bakhtiyari Basin, Iran, and the results showed that SVR was more accurate than the ANN methods for daily runoff forecasting, especially in the prediction of higher river flow values. However, SVR faces computational difficulties in determining the optimal solution because it relies on quadratic programming with nonlinear equations, which is time consuming.

Recently, least square support vector regression (LSSVR), an improved version of SVR, has received much attention in the field of prediction methods because it uses the least squares principle for the loss function instead of the quadratic programming of the SVR method and because of its fast computational speed [29–38]. Shabri and Suhartono [37] and Kisi [34] applied LSSVR successfully to forecast river flows. Shabri and Suhartono [37] compared the prediction accuracy of LSSVR with ANN and multivariate linear regression methods, whereas Kisi [34] compared it with the adaptive neurofuzzy embedded fuzzy c-means clustering (ANFIS-FCM) method, and both found that LSSVR performed better than the other methods. Kisi [33] and Goyal et al. [30] forecasted the reference evapotranspiration and pan evaporation by using the LSSVR method. Kisi [33] compared its results with feed forward ANN, whereas Goyal et al. [30] compared it with ANN and ANFIS methods, and they found that LSSVR performed better than the other methods. Okkan and Serbes [36] and Bhagwat and Maity [29] successfully used the LSSVR method to forecast runoff data by using previous meteorological and river flow data. Kisi [32] estimated the suspended sediment from river flow data with the LSSVR method and reported that LSSVR gave better estimates in comparison with ANN and sediment rating curve (SRC) methods. Hwang et al. [31] predicted the daily water demand of Seoul City, Korea, and the daily mean inflow of Cheng-ju Dam by using the LSSVR model. They compared the predicted results of LSSVR with the conventional multiple linear regression (CMLR) and back propagation neural network methods in both cases and found that LSSVR was superior in prediction accuracy. Wu et al. [38] and Mellit et al. [35] applied the LSSVR method to predict different meteorological variables and found that LSSVR performed better than the ANN method.
Motivated by these successful applications, the LSSVR method was selected as the forecasting method in this research. The LSSVR method has parameters that play a vital role in determining its prediction accuracy, and suitable values of these parameters produce better river flow predictions. Still, there is no specific way to determine the optimal LSSVR parameters in the literature of river flow forecasting. Thus, the novelty of this study is a hybrid LSSVR-gravitational search algorithm (HLGSA) river flow forecasting method, in which GSA is used to find the optimal parameter values of the LSSVR method and thereby increase its prediction accuracy. GSA was preferred in this study over other heuristic algorithms such as the simulated annealing algorithm, genetic algorithm, memetic algorithm, differential evolution, and particle swarm optimization, because those algorithms suffer from premature convergence and parameter sensitivity and consume too much time in obtaining the global optimal solution. In contrast to these heuristic algorithms, GSA improves the global search ability and optimization speed by using the principle of gravity and motion. To the best knowledge of the authors, there is no published work in the literature that predicts river flow using the HLGSA method. Recently, researchers have preferred hybrid methods for solving different problems in the field of hydrology.

In addition to the HLGSA method, multivariate adaptive regression splines (MARS) is another popular regression method used to model complex nonlinear relationships among variables. MARS is a nonparametric regression method that has recently been applied extensively in the field of hydrology to predict different variables [39–45]. To determine the benefits of using MARS over other conventional regression methods, the MARS method was compared with the CMLR and model 5 regression tree (M5RT) methods in this study. The cross validation (CV) technique was used to better assess the prediction accuracy of all applied methods. A log transform was also applied in this study to examine its effect on the prediction accuracy of these methods.

2. Hybrid LSSVR-Gravitational Search Algorithm (HLGSA) Method for River Flow Prediction

2.1. LSSVR

LSSVR, introduced by Suykens and Vandewalle [46], is a modified version of SVR; its advantage over SVR is a less complex optimization process, because it solves linear equations instead of a quadratic programming problem [47]. Figure 1 demonstrates the process of the LSSVR algorithm. Using time series inputs $x$ (lagged river flows) and output $y$ (predicted river flow), the nonlinear LSSVR function is given as

$$f(x) = w^{T}\varphi(x) + b, \quad (1)$$

where $w^{T}\varphi(x)$ represents the dot product, $\varphi(x)$ is a nonlinear function that employs regression, and $w$ and $b$ are the weight vector and bias term, respectively [48]. The cost function ($J$) of LSSVR can be minimized as

$$\min J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{N} e_i^{2}, \quad \text{subject to } y_i = w^{T}\varphi(x_i) + b + e_i, \; i = 1, \dots, N, \quad (2)$$

where $\gamma$ and $e_i$ represent the regularization constant and the training error for $x_i$, respectively.

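To solve (2), the Lagrange function is used to find the solutions of $w$ and $e$. The Lagrange function can be written as [49]

$$L(w, b, e, \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \left\{ w^{T}\varphi(x_i) + b + e_i - y_i \right\},$$

where $\alpha_i$ are the Lagrange multipliers.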

The solution of the above equation can be obtained by taking the partial derivatives of the Lagrange function and applying a kernel function (KF) that satisfies Mercer’s condition. To solve regression problems, there are many types of KF including polynomial, radial basis, Gaussian, sigmoid, Mexican hat, Meyer, and Morlet. The KF type plays a vital role in constructing a highly accurate LSSVR model [50]. This study used the radial basis KF (RBKF) because of its effectiveness for nonlinear regression problems [51]. The performance of the RBKF is compared with that of other KFs in Section 6 (Table 4). The RBKF can be expressed as

$$K(x, x_i) = \exp\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right).$$

After selecting the RBKF for the LSSVR method, proper values must be found for the penalty factor parameter, $\gamma$, and the RBKF parameter, $\sigma$. There is no specific way to obtain the optimal values of these parameters. For this reason, GSA is adopted in this study to calculate suitable parameter values.
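As an illustration of why LSSVR only requires linear equations, the following minimal sketch assumes the standard result that eliminating the weight vector and the errors from the Lagrange conditions leaves one linear system in the bias and the Lagrange multipliers. The function names, the symbols `gamma` (penalty factor) and `sigma` (RBF width), and the synthetic data are illustrative assumptions, not the implementation used in this study.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Kernel matrix with K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma^2))."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, multipliers alpha


def lssvr_predict(X_train, b, alpha, sigma, X_new):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b


if __name__ == "__main__":
    # Synthetic stand-in for lagged flows; these are not the study's data.
    X = np.linspace(0.0, 6.0, 30).reshape(-1, 1)
    y = np.sin(X[:, 0])
    b, alpha = lssvr_fit(X, y, gamma=1000.0, sigma=1.0)
    print(lssvr_predict(X, b, alpha, 1.0, np.array([[1.5]])))
```

In this formulation the training residual of sample i equals alpha_i / gamma, so a larger penalty factor forces a closer fit to the training data, which is why the choice of the two parameters matters for prediction accuracy.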

2.2. Gravitational Search Algorithm (GSA)

GSA is one of the effective optimization algorithms compared with other evolutionary algorithms. It is based on the law of gravity and motion and was first proposed by Rashedi et al. [52]. In GSA, each agent has four parameters: position, velocity, inertial mass, and gravitational mass. The location of the agent corresponds to a solution of the problem, whereas its gravitational and inertial masses are obtained using a fitness function [53]. The location of agent $i$ can be expressed as

$$X_i = \left( x_i^{1}, \dots, x_i^{k}, \dots, x_i^{n} \right), \quad i = 1, 2, \dots, N,$$

where $x_i^{k}$ represents the location of the $i$th agent in the $k$th dimension. The mass of each agent is computed after calculating the fitness of the current population as [52, 54]

$$q_i(t) = \frac{fit_i(t) - worst(t)}{best(t) - worst(t)}, \qquad M_i(t) = \frac{q_i(t)}{\sum_{j=1}^{N} q_j(t)},$$

where $fit_i(t)$ and $M_i(t)$ represent the fitness value and mass of the $i$th agent at time $t$, respectively, whereas $best(t)$ and $worst(t)$ represent the minimum fitness value and maximum fitness value, respectively.

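To calculate the gravitational acceleration of agent $i$, the force exerted on it by a heavy agent $j$ should first be computed as

$$F_{ij}^{k}(t) = G(t)\, \frac{M_{pi}(t)\, M_{aj}(t)}{R_{ij}(t) + \varepsilon} \left( x_j^{k}(t) - x_i^{k}(t) \right),$$

where $M_{pi}$ and $M_{aj}$ are the passive and active gravitational masses, respectively, corresponding to agents $i$ and $j$ at the $t$th generation, $G(t)$ and $\varepsilon$ are the gravitational constant and a small constant, $x_i^{k}$ and $x_j^{k}$ indicate the positions in the $k$th dimension of agents $i$ and $j$ at the $t$th generation, and $R_{ij}(t)$ is the Euclidean distance between agents $i$ and $j$.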

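The total gravitational acceleration of the $i$th agent can be calculated using the law of motion as follows:

$$a_i^{k}(t) = \frac{F_i^{k}(t)}{M_{ii}(t)},$$

where $a_i^{k}(t)$ represents the gravitational acceleration of agent $i$ in the $k$th dimension and $M_{ii}(t)$ is its inertial mass. The total gravitational force exerted on agent $i$ in the $k$th dimension is obtained from the pairwise forces as

$$F_i^{k}(t) = \sum_{j=1,\, j \neq i}^{N} rand_j\, F_{ij}^{k}(t),$$

where $rand_j$ indicates a random variable with uniform distribution in the interval $[0, 1]$. Then the speed and location of the agent are updated as follows:

$$v_i^{k}(t+1) = rand_i\, v_i^{k}(t) + a_i^{k}(t), \qquad x_i^{k}(t+1) = x_i^{k}(t) + v_i^{k}(t+1).$$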

This brief description shows that GSA uses the gravitational force as the direct means of communication and cooperation between agents. The heavy agents in GSA correspond to good solutions and move more slowly than lighter ones, which guarantees the exploitation step of the algorithm. In other words, GSA searches for the ideal solution by appropriately adjusting the inertial and gravitational masses of the agents, where every agent represents a solution. As time progresses, the heaviest agent comes to represent an ideal solution in the search space [55].
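The update cycle described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the configuration used in this study: the active, passive, and inertial masses are taken as equal (so the agent's own mass cancels in the acceleration), the gravitational constant decays exponentially as `g0 * exp(-alpha * t / n_iter)`, all agents attract each other (the shrinking set of heaviest agents used in some GSA variants is omitted), and positions are clipped to the search box. The function name and default values are hypothetical.

```python
import numpy as np


def gsa_minimize(f, lower, upper, n_agents=20, n_iter=100,
                 g0=100.0, alpha=20.0, seed=0):
    """Minimize f over a box with a basic gravitational search."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    x = rng.uniform(lower, upper, size=(n_agents, lower.size))
    v = np.zeros_like(x)
    best_x, best_f = x[0].copy(), np.inf
    eps = 1e-12  # small constant that avoids division by zero

    for t in range(n_iter):
        fit = np.array([f(xi) for xi in x])
        i_best = int(np.argmin(fit))
        if fit[i_best] < best_f:
            best_f, best_x = float(fit[i_best]), x[i_best].copy()

        # Masses: the lowest fitness receives the largest mass.
        best, worst = fit.min(), fit.max()
        q = np.ones(n_agents) if best == worst else (fit - worst) / (best - worst)
        mass = q / q.sum()

        # Assumed exponential decay of the gravitational constant.
        g = g0 * np.exp(-alpha * t / n_iter)

        acc = np.zeros_like(x)
        for i in range(n_agents):
            diff = x - x[i]                      # x_j - x_i for every agent j
            dist = np.linalg.norm(diff, axis=1)  # Euclidean distances R_ij
            weight = rng.random(n_agents) * g * mass / (dist + eps)
            weight[i] = 0.0                      # an agent does not attract itself
            acc[i] = weight @ diff

        v = rng.random(x.shape) * v + acc
        x = np.clip(x + v, lower, upper)

    return best_x, best_f


if __name__ == "__main__":
    print(gsa_minimize(lambda z: float(np.sum(z**2)), [-5.0, -5.0], [5.0, 5.0]))
```

For parameter tuning, `f` would map a candidate pair of LSSVR parameters to a prediction error on held-out data, and the bounds would be the parameter ranges of the search.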

2.3. HLGSA (Hybrid LSSVR-GSA)

The process of constructing the river flow prediction model HLGSA by hybridizing the LSSVR and GSA methods is described in this section and shown in Figure 2. The process is as follows:
(i) Divide all river flow data sets into training and test parts.
(ii) Select the RBF kernel function and the initial parameters for the HLGSA method to build the initial LSSVR model. The initial values are set as follows: the range of the penalty factor is 0.1 to 2000, the range of the RBF parameter is 0.001 to 20, the number of iterations is 15, the number of particles can be set up to 40, the constant alpha is found to be better in the range of 16 to 20, and the initial gravitational constant is found to be better in the range from 105 to 115.
(iii) Compute the fitness value of each agent with the fitness function selected for this method.
(iv) Choose the best parameter combination through GSA to obtain the optimal values of the LSSVR parameters.
(v) If the stopping criterion is not met, use the new combination of parameters to reconstruct the LSSVR model, and recompute the fitness until the stopping criterion is satisfied.
(vi) Use the resulting parameter values to build the optimal LSSVR model for forecasting river flow, and apply this model to the test data to obtain the river flow predictions.

3. Regression Methods

In this study, the performance accuracy of a hybrid nonlinear optimized regression method (HLGSA) was compared with a nonlinear, nonparametric regression method (MARS), with a piecewise linear regression method (M5RT), and with a conventional linear regression method (CMLR) in forecasting monthly river flow.

3.1. Multivariate Adaptive Regression Splines

MARS is a flexible method that finds relationships that are nearly additive or involve interactions with fewer parameters. The general MARS method was introduced by Friedman [56] and is expressed by the following equation:

$$\hat{f}(x) = a_0 + \sum_{m=1}^{M} a_m \prod_{k=1}^{K_m} \left[ s_{km} \left( x_{v(k,m)} - t_{km} \right) \right]_{+},$$

where $\hat{f}(x)$ is the river flow forecasted by MARS, that is, the dependent variable, $a_0$ is a constant, $a_m$ are the model coefficients calibrated to provide the best fit to the data, $M$ is the number of basis functions (BFs), $K_m$ is the number of splits that generate the $m$th BF, $s_{km}$ takes the value 1 or −1 and represents the (right/left) sense of the associated step function, $v(k,m)$ is the label of the independent variable, and $t_{km}$ is the knot location on that variable; $[\cdot]_{+}$ denotes the positive part [57].

MARS builds the optimal model in two steps. In the first step, MARS generates a large number of BFs chosen to overfit the data; variables may enter as continuous, categorical, or ordinal, and they can interact with each other or be restricted to enter only as additive components. In the second step, BFs are deleted in order of least contribution using the generalized cross validation (GCV) criterion. A measure of variable importance can then be obtained by observing the change in the computed GCV when a variable is removed from the model. This procedure continues until all remaining BFs satisfy the predefined requirements. The GCV can be computed as [58]

$$GCV(M) = \frac{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^{2}}{\left[ 1 - \frac{C(M)}{N} \right]^{2}},$$

where $y_i$ and $\hat{y}_i$ are the actual and predicted river flow values, $N$ is the number of observations, and $C(M)$ is a complexity penalty function.
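The two ingredients of this procedure, hinge-type basis functions and the GCV score used for pruning, can be illustrated with a short sketch. The complexity penalty used here, `C = n_basis + penalty * (n_basis - 1) / 2` with `penalty = 3`, is one common choice and is an assumption; the penalty function used in this study is not specified in the text. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def hinge(x, knot, sign=1):
    """Basis function [sign * (x - knot)]_+ with sign = +1 (right) or -1 (left)."""
    return np.maximum(0.0, sign * (np.asarray(x, dtype=float) - knot))


def gcv(y, y_hat, n_basis, penalty=3.0):
    """Generalized cross validation score for a model with n_basis basis functions."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    # Assumed complexity penalty; other forms of C(M) are possible.
    c = n_basis + penalty * (n_basis - 1) / 2.0
    mse = np.mean((y - y_hat) ** 2)
    return float(mse / (1.0 - c / n) ** 2)


if __name__ == "__main__":
    # Synthetic piecewise-linear series with a break at x = 4 (not study data).
    x = np.linspace(0.0, 10.0, 50)
    y = np.where(x < 4.0, 1.0, 1.0 + 2.0 * (x - 4.0))
    B = np.column_stack([np.ones_like(x), hinge(x, 4.0, +1), hinge(x, 4.0, -1)])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    print(gcv(y, B @ coef, n_basis=B.shape[1]))
```

In a backward pass, each candidate BF would be dropped in turn, the coefficients refitted, and the deletion that yields the lowest GCV retained.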

After building the MARS model, the relative importance of a variable in terms of its contribution to the fit of the model can be estimated. MARS is capable of tracking very complex data structures, so it was selected in this study for modeling river flow time series.

3.2. Model 5 Regression Tree (M5RT)

In a decision tree (DT), each branch node indicates a choice between a number of alternatives and a decision is made in every leaf node [59]. Regression trees (RT) are applied to forecasting problems that have a numeric response variable. They differ from the DT only in that a numeric value rather than a class label is attached to the leaves [60]. The M5RT method combines the features of the DT and RT methods: the construction of the M5RT is similar to that of the DT but, instead of class labels, it has linear regression functions at the leaves. The M5RT is a piecewise linear method that was introduced by Quinlan [61] and has many successful applications in the field of water resources [41, 62–68], which motivated the authors to use the M5RT method in this paper for river flow prediction.

The division criterion of the M5RT method treats the standard deviation of the class values that reach a node as an error measure and computes the expected reduction in this error as a consequence of testing each attribute at that node. The standard deviation reduction (SDR) is computed by

$$SDR = sd(T) - \sum_{i} \frac{\lvert T_i \rvert}{\lvert T \rvert}\, sd(T_i),$$

where $T$ stands for the set of samples that reach the node, $T_i$ indicates the subset of samples that have the $i$th outcome of the potential test, and $sd$ is the standard deviation [69].
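A small sketch may help to make the splitting criterion concrete. It evaluates the standard deviation reduction of a binary split on a single attribute and scans the midpoints between sorted attribute values for the best threshold. The use of the population standard deviation, the binary split, and the function names are assumptions made for illustration only.

```python
import numpy as np


def sdr(y, left_mask):
    """Standard deviation reduction of splitting y into y[left_mask] and the rest."""
    y = np.asarray(y, dtype=float)
    left_mask = np.asarray(left_mask, dtype=bool)
    subsets = (y[left_mask], y[~left_mask])
    weighted = sum(s.size / y.size * np.std(s) for s in subsets if s.size > 0)
    return float(np.std(y) - weighted)


def best_split(x, y):
    """Return the threshold on attribute x with the largest SDR, and that SDR."""
    x = np.asarray(x, dtype=float)
    values = np.unique(x)
    thresholds = (values[:-1] + values[1:]) / 2.0  # midpoints between neighbours
    best_thr, best_gain = None, -np.inf
    for thr in thresholds:
        gain = sdr(y, x <= thr)
        if gain > best_gain:
            best_thr, best_gain = float(thr), gain
    return best_thr, best_gain


if __name__ == "__main__":
    # Toy example: the response jumps between x = 3 and x = 4.
    print(best_split([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5]))
```

A full model tree would apply this search recursively to each subset and then fit a linear regression function in every leaf.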

3.3. Conventional Multiple Linear Regression (CMLR)

Multiple linear regression methods forecast the values of a dependent variable ($Y$) based on independent variables ($X_1, X_2, \dots, X_n$). Two main advantages of the CMLR are that it has a simple structure and that it is included in many statistical packages [70]. In this study, after determining the independent lagged river flow values for the dependent river flows of both basins, the CMLR can be constructed as follows:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n,$$

where $Y$ is the dependent variable, $\beta_0, \beta_1, \dots, \beta_n$ are the equation parameters of the linear relation, and $X_1, X_2, \dots, X_n$ are the independent lagged river flow values used to forecast river flow. However, CMLRs have some disadvantages in predicting nonlinear situations because of their linear structure [71].
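As a minimal sketch, the parameters of such a linear relation can be estimated by ordinary least squares. The function names and the synthetic data below are illustrative assumptions; any statistical package would give the same estimates.

```python
import numpy as np


def fit_cmlr(X, y):
    """Least squares estimate of [intercept, slope_1, ..., slope_n]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta


def predict_cmlr(beta, X):
    """Evaluate the fitted linear relation for new inputs."""
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]


if __name__ == "__main__":
    # Synthetic inputs standing in for two lagged flow values (not study data).
    rng = np.random.default_rng(0)
    X = rng.uniform(20.0, 600.0, size=(50, 2))
    y = 5.0 + 0.7 * X[:, 0] - 0.2 * X[:, 1]
    print(fit_cmlr(X, y))
```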

4. Study Sites and Data Preprocessing

The study used river flow data from two catchments, Astore and Shyok, in the Upper Indus Basin of Pakistan. Figure 3 shows the location map of the catchments. The Astore Basin is located approximately between longitudes 74°24′ and 75°14′E and between latitudes 34°45′ and 35°38′N. The river covers a catchment area of about 3750 km². The Water and Power Development Authority (WAPDA), Pakistan, operates one flow gauging station, Doyian, in this area for flow records under the Surface Water Hydrology Project (SWHP). The elevation of this gauging station is 1583 masl and it is located at 35°33′N latitude and 74°42′E longitude. The Shyok Basin covers a drainage area of 68,458 km² with an average basin elevation of 4940 m. WAPDA also installed one flow gauging station in this area at Yogo, at an elevation of 2469 m and located at 35°11′N latitude and 76°06′E longitude. The recorded monthly river flow data of both catchments were collected through WAPDA for the period 1975–2006, a total time span of 384 months. According to WAPDA, the mean annual river flow over the 32 yr (1975–2006) record is 142 m³/s for the Astore catchment, whereas it is 457 m³/s for the Shyok catchment.

In this research, the cross validation (CV) technique was applied to better assess the prediction accuracy of the applied methods. In the CV technique, the whole data is divided into $k$ equal data sets (DS); $k-1$ DS are used to train the method and the remaining DS is used to test its accuracy. This process is repeated $k$ times until every DS has been used to test the applied method. This CV technique is preferred over other cross validation techniques because every data set is used for testing, which makes it closer to the real world problem [69, 72]. In this research, the whole river flow data was divided into four equal DS. In all the applications, three DS were used to train the method and the remaining DS was used to test it. This procedure was repeated four times until every DS had been used to test the method. The monthly river flow time series statistics of the Astore and Shyok catchments are reported in Table 1. Here, DS1, DS2, DS3, and DS4 represent the four equal data sets of the whole data for CV analysis, and the reported statistics are the mean, standard deviation, skewness coefficient, minimum, and maximum river flows. The recorded monthly river flow data show similarly high positive skewness for the Astore and Shyok catchments (the skewness coefficient of Shyok is 1.88). However, the range of the flow data of the Shyok Basin (36.7–2080.7 m³/s) is much wider than that of the Astore Basin (19.3–654.9 m³/s). The lagged values of river flows show low persistence (e.g., Lag 1 = 0.735, Lag 2 = 0.266, and Lag 3 = −0.152). However, the lagged values of the Astore Basin give slightly better persistence than those of the Shyok Basin.
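The four-way division can be sketched as follows, assuming that the data sets are contiguous chronological blocks of the 384-month record; the text does not state how the months were assigned to the data sets, so this layout and the function name are assumptions.

```python
import numpy as np


def blocked_cv_splits(n_samples, k=4):
    """Yield (train_idx, test_idx) pairs, each block serving once as the test set."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


if __name__ == "__main__":
    for number, (train, test) in enumerate(blocked_cv_splits(384), start=1):
        print(f"DS{number} as test: months {test[0]}..{test[-1]}, "
              f"{train.size} training months")
```

Each method is then trained on the three training blocks and evaluated on the held-out block, and the error statistics of the four runs are reported per data set.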

5. Input Combination Selection and Performance Evaluation Criteria

The selection of input combinations (IC) is an important step in model development and plays a key role in increasing the accuracy of the model. The autocorrelation function is generally used to examine the correlation of the lagged values and to determine the number of effective lagged inputs. In many river flow forecasting studies, previous lagged river flow values are taken as IC [37, 73, 74]. In this research, the effect of the lagged river flow values of both catchments was examined through the autocorrelation function and is reported in Figure 4. According to the analysis, the following three IC were selected as inputs on the basis of the most significant lagged river flow values for both basins, where $Q_{t-j}$ denotes the river flow $j$ months before the forecast month: IC1: $Q_{t-1}$, $Q_{t-2}$; IC2: $Q_{t-1}$, $Q_{t-2}$, $Q_{t-11}$; and IC3: $Q_{t-1}$, $Q_{t-2}$, $Q_{t-11}$, $Q_{t-12}$.
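The two operations involved in this step, computing the autocorrelation at a given lag and assembling a matrix of lagged inputs, can be sketched as below. The function names are hypothetical, and the lag sets passed to `make_lagged` stand for the input combinations built from antecedent monthly flows.

```python
import numpy as np


def autocorr(q, lag):
    """Sample autocorrelation of the series q at the given lag."""
    q = np.asarray(q, dtype=float)
    dev = q - q.mean()
    return float(np.sum(dev[lag:] * dev[: q.size - lag]) / np.sum(dev**2))


def make_lagged(q, lags):
    """Build inputs X[t] = [q[t - lag] for lag in lags] and targets y[t] = q[t]."""
    q = np.asarray(q, dtype=float)
    max_lag = max(lags)
    X = np.column_stack([q[max_lag - lag: q.size - lag] for lag in lags])
    y = q[max_lag:]
    return X, y


if __name__ == "__main__":
    # Synthetic seasonal series with a 12-month cycle (not study data).
    months = np.arange(120)
    q = 100.0 + 50.0 * np.sin(2.0 * np.pi * months / 12.0)
    print([round(autocorr(q, lag), 3) for lag in (1, 2, 11, 12)])
    X, y = make_lagged(q, lags=(1, 2, 11))
    print(X.shape, y.shape)
```

Lags with high absolute autocorrelation are natural candidates for inputs, and the first `max(lags)` months are lost because their antecedent values are unavailable.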

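In this paper, two error indices were selected to evaluate the performance of the models in the prediction of monthly river flows: the root mean square error (RMSE) and the mean absolute error (MAE). The similarity between the observed and forecasted river flow values is measured by the determination coefficient ($R^2$). These three indexes have been extensively applied in many water resources problems for evaluating model performance [19, 26, 75]. The performance evaluation indexes can be calculated as

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Q_{o,i} - Q_{f,i} \right)^{2}}, \qquad MAE = \frac{1}{N} \sum_{i=1}^{N} \left\lvert Q_{o,i} - Q_{f,i} \right\rvert,$$

$$R^{2} = \left[ \frac{\sum_{i=1}^{N} \left( Q_{o,i} - \bar{Q}_o \right) \left( Q_{f,i} - \bar{Q}_f \right)}{\sqrt{\sum_{i=1}^{N} \left( Q_{o,i} - \bar{Q}_o \right)^{2} \sum_{i=1}^{N} \left( Q_{f,i} - \bar{Q}_f \right)^{2}}} \right]^{2},$$

where $N$ is the total number of observations in the river flow time series, $Q_{o,i}$ is the observed river flow, $Q_{f,i}$ is the forecasted river flow, $\bar{Q}_o$ is the average of the observed river flows, and $\bar{Q}_f$ is the average forecasted river flow.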
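A short sketch of the three indexes is given below. It assumes that the determination coefficient is computed as the squared correlation between observed and forecasted flows, which is consistent with the use of both averages in its definition; the function names are hypothetical.

```python
import numpy as np


def rmse(obs, pred):
    """Root mean square error."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    """Mean absolute error."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(obs - pred)))


def r2(obs, pred):
    """Determination coefficient as the squared correlation coefficient."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    obs_dev, pred_dev = obs - obs.mean(), pred - pred.mean()
    r = np.sum(obs_dev * pred_dev) / np.sqrt(np.sum(obs_dev**2) * np.sum(pred_dev**2))
    return float(r**2)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    forecast = [2.0, 3.0, 4.0, 5.0]
    print(rmse(observed, forecast), mae(observed, forecast), r2(observed, forecast))
```

In this form the determination coefficient is insensitive to a constant bias: the forecasts in the example are all 1 unit too high, yet the squared correlation is 1. This is one reason to report the error indexes alongside it.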

6. River Flow Prediction Using Soft Computing Methods

In the first part of the paper, the performance of the proposed hybrid LSSVR-GSA (HLGSA) method was compared with that of the other regression methods in predicting the monthly river flows of the Astore and Shyok catchments separately, using the three input combinations comprising antecedent river flows. The CV technique was used for each applied method by dividing the river flow data into four equal DS. The test statistics of the HLGSA, MARS, M5RT, and CMLR methods for the Astore catchment are compared in Table 2. It is clear from the table that all four applied methods provide different prediction results for different DS and input combinations. Among the input combinations, IC1, comprising the river flow values of the two consecutive previous months, provides the worst prediction results for the HLGSA, MARS, and CMLR methods. IC2, comprising the river flow values of the two consecutive antecedent months and of the antecedent eleventh month, gives the best prediction results for the HLGSA, MARS, and M5RT methods, whereas, for the CMLR method, IC3, comprising the river flow values of the two consecutive antecedent months and of the antecedent eleventh and twelfth months, performs better than the other two input combinations. Among the data sets, DS2 gives the worst forecasts for all the regression models, including the proposed HLGSA method. The reason is that the maximum river flow value of the Astore Basin’s test set DS2 (654.9 m³/s) is higher than the corresponding extreme value of the training DS (see Table 1). This indicates that all trained methods encounter problems when extrapolating to the higher values of DS2. The higher values of two statistical parameters for the DS2 data set in comparison with the other data sets may be another reason for the worst results. It is evident that all methods provide good forecasts for DS4 under all input combination scenarios.
The best model structures for the HLGSA, MARS, and M5RT methods were found for DS4 and IC2. However, in the case of the CMLR method, the best model structure was found for DS4 and IC3. The best RMSE of the HLGSA (40.09 m³/s) is lower than those of the MARS, M5RT, and CMLR methods (43.46, 57.26, and 58.08 m³/s, respectively). This is also true for the MAE values, where the best MAE of the HLGSA is 25.05 m³/s compared with those of the MARS, M5RT, and CMLR methods (28.36, 29.83, and 32.70 m³/s, respectively). Table 2 clearly shows that the HLGSA method provides better prediction results than the other regression methods under all data sets and input combination scenarios. MARS is ranked second and performs better than M5RT and CMLR, whereas M5RT performs better than CMLR under all data sets. In the case of IC3, however, CMLR gives better forecasts than M5RT for all data sets, with lower values of the error indexes (RMSE and MAE) and a higher value of the correlation index ($R^2$). The best prediction results of HLGSA, MARS, and M5RT for IC2 indicate that the river flows of the two preceding months and of the eleventh month strongly affect the current month’s river flow. The results of IC3 support this statement, because adding the river flow data of the twelfth month negatively affects the performance of these methods. However, the CMLR benefited from the additional input, showing its best prediction results for IC3.

For the sake of simplicity, the accuracy of all methods was evaluated by comparing the overall mean errors, that is, the mean error over all data sets and input combinations. Figures 5(a)-5(b) show the overall average error statistics of all methods. The mean error statistics of RMSE and MAE clearly indicate that the HLGSA method performs better than the other methods in the prediction of the Astore catchment’s river flows, with lower values of both error indexes. M5RT and CMLR give almost the same mean error statistics and provide the worst prediction accuracy in comparison with HLGSA and MARS, owing to higher values of both error indexes. This indicates the nonlinearity of the investigated phenomenon because both M5RT and CMLR have linear structures. HLGSA decreases the overall mean error of MARS, M5RT, and CMLR by 8.22%, 23.15%, and 24.49%, respectively. The observed and forecasted monthly river flows of the Astore Basin by all the methods using their best model structures are reported in Figures 6(a)–6(d). The figure clearly shows that the HLGSA method fits the original river flow data better than the other methods. The HLGSA gives a higher value of the correlation index ($R^2$) than the other methods. From Figure 6, it can be clearly seen that the $R^2$ value of the HLGSA method is 0.916, which is higher than those of the MARS, M5RT, and CMLR methods (0.900, 0.888, and 0.882, respectively).

Table 3 reports the results of the three performance evaluation indexes for the HLGSA, MARS, M5RT, and CMLR methods in forecasting the river flows of the Shyok catchment. Among the input combinations, IC1 again gives the worst forecast results compared with IC2 and IC3. However, in contrast to the Astore application, here IC3 shows better accuracy than IC2 for all the applied methods. Among the data sets, it is evident from the table that DS4 gives the worst prediction results for all the methods. Similar to the Astore Basin, the maximum river flow value of the Shyok Basin’s test set DS4 (2080.7 m³/s) is higher than that of the training data (see Table 1). Another reason may be that DS4 has low correlations with the preceding river flow input data in comparison with the other data sets. It is obvious from the table that all methods give good prediction results for DS3 among all input combination scenarios. The best model structures for all four applied methods in the case of the Shyok catchment are found for DS3 and IC3. Similar to the Astore Basin, Table 3 clearly shows that the HLGSA method outperforms the other methods from the RMSE, MAE, and $R^2$ viewpoints for all data sets and input combination scenarios. Here also, MARS gives better accuracy than the M5RT and CMLR methods for all data sets and input combinations. However, in contrast to the Astore Basin, here M5RT provides better prediction results than the CMLR method. The best RMSE of the proposed method is 109.20 m³/s, which is lower than those of the MARS, M5RT, and CMLR methods (128.38, 141.13, and 151.21 m³/s, respectively). This is also true for the MAE values, where the best MAE of the proposed method is 55.75 m³/s compared with those of the MARS, M5RT, and CMLR methods (69.52, 77.27, and 86.70 m³/s, respectively).

Figures 7(a)-7(b) compare the overall average error statistics of all methods in predicting the river flows of the Shyok Basin. Similar to Astore, the figure clearly shows that the RMSE and MAE indexes of the HLGSA method are lower than those of the MARS, M5RT, and CMLR methods, indicating that the HLGSA performs better than the other methods. However, in contrast to Astore, here MARS and M5RT give almost the same accuracy, whereas CMLR provides the worst performance in comparison with the other methods, owing to higher values of both error indexes. In the case of the Shyok catchment, HLGSA decreases the overall mean error of MARS, M5RT, and CMLR by 11.48%, 19.55%, and 36.19%, respectively. The scatterplots of the original and predicted monthly river flows of the Shyok Basin by all the methods using their best model structures are reported in Figures 8(a)–8(d). The figure clearly shows the superior accuracy of the HLGSA method over the MARS, M5RT, and CMLR methods. It is evident from the figure that the HLGSA method fits the observed river flow data better than the other methods, with a higher value of $R^2$. From Figure 8, it can be clearly seen that the $R^2$ value of the HLGSA method is 0.947, which is higher than those of the MARS, M5RT, and CMLR methods (0.917, 0.891, and 0.876, respectively).

Selection of the proper kernel function (KF) is very important in obtaining a highly accurate HLGSA model, because different types of KF give different accuracy. In this research, the RBKF was used for determining the river flow prediction results of both basins. However, there are many other types of KF; the most common are the polynomial KF, sigmoid KF, Gaussian KF, Morlet KF, Mexican hat KF, and Meyer KF. To evaluate the performance of the KF applied in this study, six different kernel functions were compared in the prediction of the river flows of both catchments by using the best HLGSA model structures, and the results are reported in Table 4. Table 4 shows the superiority of the RBKF over the other kernel functions, with smaller values of both error indexes (RMSE and MAE) and a higher value of $R^2$ for both basins.

7. Effect of Log Transform on the Prediction Accuracy of the Soft Computing Methods

In this section, the effect of the log transform on the prediction accuracy of the applied methods is investigated by log-transforming the time series data of both basins before applying the methods. Here also, the CV technique was applied, and the log-transformed data was divided into four equal data sets. Table 5 reports the test results of the LHLGSA, LMARS, LM5RT, and LCMLR methods for the Astore Basin. Here also, IC2 generally provides the best forecasts for the LHLGSA, LMARS, and LM5RT methods, whereas IC3 generally gives better results for the LCMLR method. However, IC3 generally gives worse results than IC1 and IC2 for the LM5RT method. LCMLR performs better than LM5RT under the IC2 and IC3 scenarios, with lower values of both error indexes and higher values of the similarity index. As in the previous Astore application, DS2 gives the worst results, whereas DS4 performs best among all data sets; the reason for the poor results of DS2 was given earlier. For the Astore Basin, the best values of the two error indexes and the similarity index for LHLGSA (35.33 m3/s, 13.08 m3/s, and 0.921) are better than those of HLGSA (40.09 m3/s, 25.05 m3/s, and 0.916). The same holds for the other methods: the best values for LMARS (38.78 m3/s, 20.88 m3/s, and 0.906), LM5RT (46.18 m3/s, 24.80 m3/s, and 0.899), and LCMLR (39.02 m3/s, 21.08 m3/s, and 0.901) are better than those of MARS (43.46 m3/s, 28.36 m3/s, and 0.900), M5RT (57.26 m3/s, 29.83 m3/s, and 0.888), and CMLR (58.08 m3/s, 32.70 m3/s, and 0.888), respectively.
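The procedure described in this paragraph (log-transform the series, split it into four equal data sets, and test on each in turn) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the natural logarithm is assumed because the base is not specified here, predictions are assumed to be mapped back to m3/s before errors are computed, the flow values are hypothetical, and the training mean stands in for a real regression model.

```python
import math

def log_transform(flows):
    """Natural log of strictly positive flows (the log base is an assumption)."""
    return [math.log(q) for q in flows]

def inverse_log_transform(values):
    """Map log-space values back to flow units (m3/s)."""
    return [math.exp(v) for v in values]

def four_fold_splits(n_samples, n_folds=4):
    """Yield (train_idx, test_idx); each contiguous block is the test set once."""
    fold_size = n_samples // n_folds  # any remainder at the end is left unused
    for k in range(n_folds):
        start, stop = k * fold_size, (k + 1) * fold_size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_folds * fold_size) if not start <= i < stop]
        yield train_idx, test_idx

# Hypothetical monthly flows (m3/s), for illustration only.
flows = [35.0, 42.0, 110.0, 390.0, 520.0, 310.0, 95.0, 48.0]
log_flows = log_transform(flows)

for fold, (train_idx, test_idx) in enumerate(four_fold_splits(len(flows)), start=1):
    train = [log_flows[i] for i in train_idx]
    # Placeholder model: predict the training mean in log space.
    mean_log = sum(train) / len(train)
    pred = inverse_log_transform([mean_log] * len(test_idx))
    print(f"fold {fold}: test indices {test_idx}, prediction {pred[0]:.1f} m3/s")
```

Replacing the placeholder mean with any of the regression methods gives the log variant of that method; only the transform and its inverse change relative to the untransformed workflow.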

The mean error statistics of all log models are illustrated in Figures 9(a)-9(b). The figure clearly shows that LHLGSA outperforms the other log methods. According to Table 5, LCMLR generally performs better than LM5RT for most input combinations. On the basis of the mean error values, however, LCMLR performs worse than LM5RT because of its inaccurate forecasts for IC1, which affect the mean error values. Figures 10(a)-10(d) illustrate the observed and forecasted river flows of the Astore Basin obtained with the log transform methods. The figure clearly shows that the LHLGSA method fits the observed data well. Comparing the scatter plots of the log and normal methods (Figures 6 and 10), it is evident that the log methods fit the observed data better than the normal methods. In terms of the fit line equations, the log methods are closer to the exact line than the normal methods (see the fit line coefficients in Figures 6 and 10). LHLGSA decreases the overall mean error of the LMARS, LM5RT, and LCMLR by 11.46%, 24.16%, and 35.65%, respectively.

Test statistics of the LHLGSA, LMARS, LM5RT, and LCMLR methods for the Shyok Basin are reported in Table 6. Here, in contrast to the previous Shyok application, IC2 gives better forecasts for the LHLGSA and LM5RT methods, whereas IC3 performs better for the LMARS and LCMLR methods. For the DS3 data set, however, IC3 performs slightly better for LM5RT, whereas IC2 performs better for LMARS. As in the previous Shyok application, DS3 performs best and DS4 performs worst among all data sets. To identify the best log method, the mean error indexes of all log models are plotted as bar graphs in Figures 11(a)-11(b). The bar graphs clearly show that LHLGSA performs better than the other log methods, with lower values of both error indexes. Scatter plots of the observed and predicted river flows of the Shyok catchment for all log methods using their best model structures are shown in Figures 12(a)-12(d). The figure clearly shows that the LHLGSA method gives less scattered estimates, with a higher similarity index, than the LMARS, LM5RT, and LCMLR methods. The figure also reveals that all log methods are closer to the exact line than the normal methods (compare Figures 8 and 12); that is, the normal methods give more scattered forecasts than the log methods. For the Shyok catchment, LHLGSA decreases the overall mean error of the LMARS, LM5RT, and LCMLR by 14.49%, 21.73%, and 43.84%, respectively. The log transform does not affect all input combinations and data sets equally, but its effect is more prominent when more inputs are used in the case of the CMLR methods (see Tables 5 and 6). In contrast to the Astore Basin application, for the Shyok Basin the best values of the two error indexes for HLGSA (109.20 m3/s and 55.75 m3/s) are slightly better than those of LHLGSA (109.41 m3/s and 59.01 m3/s). In terms of the similarity index, however, the best value for LHLGSA (0.959) is better than that of HLGSA (0.947). For the LMARS, LM5RT, and LCMLR methods, the best values of all three indexes are better than those of the MARS, M5RT, and CMLR methods, as in the Astore Basin application.

To evaluate the overall effect of the log transform on all applied methods, the overall mean error indexes for both basins with and without the log transform are compared in Figures 13(a), 13(b), 14(a), and 14(b). Both graphs clearly show that the log methods are more accurate than the normal methods, except that LCMLR is less accurate than CMLR with respect to one of the error indexes. In terms of the other error index, however, LCMLR is more accurate than CMLR for both basins. The poor performance of LCMLR is due to its worse results for IC1, which affect the overall value. The bar graphs also show that the proposed HLGSA method is more accurate than the other models for both the normal and the log-transformed time series data. According to the bar graphs, and in terms of the former error index, LHLGSA reduces the overall error of HLGSA by 5.66% and 4.87% for the Astore and Shyok Basins, respectively; LMARS reduces that of MARS by 2.20% and 1.52%; and LM5RT reduces that of M5RT by 4.41% and 2.22%. For multiple linear regression, however, CMLR reduces the overall error of LCMLR by 9.05% and 7.45% for the Astore and Shyok Basins, respectively, whereas, in terms of the other error index, LCMLR reduces the overall error of CMLR by 3.60% and 10.80%, respectively.
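The percentage reductions quoted throughout this section can be read as relative decreases of an error index with respect to a baseline method. The short Python sketch below assumes that definition, (baseline - improved) / baseline, which is not stated explicitly in the text; the function name is hypothetical. As input it uses the first of the best error values reported earlier in this section for the Astore Basin (40.09 m3/s for HLGSA and 35.33 m3/s for LHLGSA).

```python
def percent_reduction(baseline_error, improved_error):
    """Relative decrease of improved_error with respect to baseline_error, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - improved_error) / baseline_error

# Best values of the first error index for the Astore Basin, as quoted in this section.
hlgsa_best = 40.09   # m3/s, HLGSA
lhlgsa_best = 35.33  # m3/s, LHLGSA
print(f"{percent_reduction(hlgsa_best, lhlgsa_best):.2f}%")  # about 11.87%
```

This figure is computed from the best single results, so it is not expected to match the reductions of the overall mean values reported from the bar graphs. A negative return value indicates that the second method has the larger error.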

8. Comparison of LHLGSA and HLGSA Methods in Estimating River Flows Using Nearby River Flows Data

In this last section, the performance of HLGSA and LHLGSA is evaluated in estimating the river flow of a basin using flow data from a nearby basin. Estimating river flows from a nearby basin’s flow data is a vital issue because Pakistan is a developing country and many basins have long periods of missing flow data owing to financial problems in maintaining hydraulic gauging stations at higher altitudes. Since river flows play a key role in planning and designing hydropower projects and in flood mitigation, it is necessary to find a suitable way to fill in these missing data. In this paper, the river flow data of the Astore Basin is used to estimate the flow data of the Shyok Basin. The CV technique is again applied to better assess the accuracy of both methods. Table 7 shows the values of the two error indexes and the similarity index for both methods in estimating monthly river flows of the Shyok Basin. The best input combination for the LHLGSA and HLGSA methods is IC3, while IC2 generally gives worse results for both methods. DS4 gives better accuracy than the other data sets, whereas DS1 provides the worst estimates for both methods. Table 7 clearly shows that LHLGSA is more accurate than HLGSA under all data sets and input combinations. For the best data set with the best input combination, LHLGSA gives lower values of both error indexes (177.62 for LHLGSA versus 189.03 for HLGSA, and 80.21 versus 99.66). The observed and estimated river flows by the LHLGSA and HLGSA methods using their best model structures (DS4 with IC3) are illustrated in Figure 15. The figure clearly shows that LHLGSA has a higher similarity index (0.882 > 0.871), representing better estimation than the HLGSA method. Figure 16 compares the mean error indexes of both methods in estimating river flows of the Shyok Basin. It can be seen from the figure that LHLGSA is more accurate than HLGSA, with lower values of both error indexes. LHLGSA decreases the overall mean error of the HLGSA method by 4.10% in the estimation of Shyok Basin flows using the river flows of the Astore Basin.

9. Discussion

On the basis of the above results, the key findings can be summarized as follows.
(1) The Astore Basin shows lower values of both error indexes than the Shyok Basin. This is due to the difference in mean river flow: 142 m3/s for the Astore Basin versus 457 m3/s for the Shyok Basin.
(2) The prediction accuracy of all methods, including the proposed method, mostly improved as the number of inputs increased, which indicates that all input combinations have positive effects on river flow prediction, especially for the CMLR method.
(3) When the maximum river flow of the testing data set was higher than the maximum river flows of the training data sets, the methods had to extrapolate, which produced the worst prediction results for that data set.
(4) Overall, the HLGSA and LHLGSA methods outperformed the MARS, M5RT, and CMLR methods and the LMARS, LM5RT, and LCMLR methods, respectively. Moreover, the comparison between Figures 13 and 14 indicates that, on a mean basis, prediction results with the log transform are better than those without it for all regression methods including the proposed method, which suggests that the log transform is suitable for denoising the river flow data.
(5) In the literature, many studies reported that MARS performed better than or equally to the LSSVR methods [43, 44, 76–78]. In this study, however, the hybrid LSSVR method with the gravitational search algorithm performed better than the MARS method. The main reason may be LSSVR’s strong generalization capability and nonlinear fitting ability; the second reason may be the selection of optimal LSSVR control parameters through GSA, which directly affects and improves the accuracy of the method. The powerful global search ability of GSA helps find optimal and suitable values for the LSSVR control parameters in a shorter time than other algorithms. We can conclude that, in applications of LSSVM, the control parameters should be adequately optimized using global optimization techniques. This will decrease the uncertainties in obtaining optimal LSSVM models.
(6) In general, the benchmark regression methods can be ranked by prediction accuracy as MARS, M5RT, and CMLR. The poorer results of the M5RT and CMLR methods may be due to the linear structure of these models.
(7) The results for all three indexes confirm that the HLGSA and LHLGSA methods can be effectively applied for the prediction and estimation of river flow.
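Finding (5) attributes part of the accuracy gain to well-chosen LSSVR control parameters. The following Python sketch illustrates why these parameters matter by fitting a least squares support vector regressor with an RBF kernel through its standard linear system. It is only an illustration: the two control parameters are assumed here to be a regularization constant `gamma` and a kernel width `sigma`, the toy data are hypothetical, the function names are invented for this example, and the GSA search itself is not implemented (the parameter pairs are simply fixed by hand).

```python
import math

def rbf(x, z, sigma):
    """RBF kernel value for two input vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lssvr_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the kernel diagonal
        A.append(row)
    solution = solve_linear(A, [0.0] + list(y))
    return solution[0], solution[1:]  # bias b, dual weights alpha

def lssvr_predict(X_train, b, alpha, sigma, x_new):
    """Prediction f(x) = sum_i alpha_i K(x, x_i) + b."""
    return b + sum(a * rbf(x_new, xi, sigma) for a, xi in zip(alpha, X_train))

# Hypothetical toy data, for illustration only.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]

# Two hand-picked parameter pairs; an optimizer such as GSA would search this space.
for gamma, sigma in [(1.0, 1.0), (1e6, 1.0)]:
    b, alpha = lssvr_fit(X, y, gamma, sigma)
    fitted = [lssvr_predict(X, b, alpha, sigma, x) for x in X]
    print(f"gamma={gamma:g}, sigma={sigma:g}: {[round(v, 3) for v in fitted]}")
```

With a small `gamma` the fit is heavily regularized and departs from the training targets, whereas a large `gamma` reproduces them almost exactly. A global optimizer would pick the pair that minimizes the error on held-out data rather than on the training set.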

10. Conclusions

In the current study, river flow data of the Astore and Shyok rivers was used to assess the forecasting capability of the HLGSA, MARS, M5RT, and CMLR methods using antecedent river flow values as inputs. Two error indexes and one similarity index were used to compare the prediction accuracy of these methods. The CV technique was used in all applications to better assess the prediction accuracy across the data sets. In the first part of the study, HLGSA provided better results than the other three regression methods in predicting the monthly river flow data of both catchments. HLGSA improved the prediction accuracy of the MARS, M5RT, and CMLR by 8.22%, 23.15%, and 24.49%, respectively, in the Astore Basin and by 11.48%, 19.55%, and 36.19%, respectively, in the Shyok Basin. The radial basis kernel function was selected for the HLGSA model because of its better prediction accuracy. In the second part of the study, the effect of the logarithm transform on the prediction performance of all regression methods was investigated. After the logarithm function was applied to the river flow time series data, all the regression methods provided better prediction accuracy for both basins. The prediction results also showed that the HLGSA method outperformed the other methods. LHLGSA decreased the overall mean error of the LMARS, LM5RT, and LCMLR by 11.46%, 24.16%, and 35.65%, respectively, for the Astore Basin and by 14.49%, 21.73%, and 43.84%, respectively, for the Shyok Basin. Comparing the log-transformed and normal methods, LHLGSA reduced the overall error of HLGSA by 5.66% and 4.87% for the Astore and Shyok Basins, respectively; LMARS reduced that of MARS by 2.20% and 1.52%; and LM5RT reduced that of M5RT by 4.41% and 2.22%. The third part of the study evaluated the performance of the HLGSA and LHLGSA methods in river flow estimation using river flow data of a nearby basin. The test results revealed that LHLGSA performed better than HLGSA in estimating river flows of the Shyok Basin using Astore Basin data.

In this study, river flows were forecasted using only previous river flow values as inputs. The prediction accuracy of the applied methods could be improved if more input variables were available. Further studies may include more inputs such as rainfall, snowpack, and temperature and/or build prediction models using more advanced modeling methods at these study sites. The proposed data-driven methods may be applied to other regions with similar or different climates. In that case, however, the methods should be properly calibrated using a large amount of river flow data.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 51379080 and no. 41571514) and the Hubei Provincial Collaborative Innovation Center for New Energy Microgrid, China Three Gorges University. The authors thank the staff of WAPDA for providing the river flow data of both basins.