Abstract

Accurate prediction of hydrological processes is key for optimal allocation of water resources. In this study, two novel hybrid models are developed to improve the prediction precision of hydrological time series data based on the principal of three stages as denoising, decomposition, and decomposed component prediction and summation. The proposed architecture is applied on daily rivers inflow time series data of Indus Basin System. The performances of the proposed models are compared with traditional single-stage model (without denoised and decomposed), the hybrid two-stage model (with denoised), and existing three-stage hybrid model (with denoised and decomposition). Three evaluation measures are used to assess the prediction accuracy of all models such as Mean Relative Error (MRE), Mean Absolute Error (MAE), and Mean Square Error (MSE). The proposed, three-stage hybrid models have shown improvement in prediction accuracy with minimum MRE, MAE, and MSE for all case studies as compared to other existing one-stage and two-stage models. In summary, the accuracy of prediction is improved by reducing the complexity of hydrological time series data by incorporating the denoising and decomposition.

1. Introduction

Accurate prediction of hydrological processes is key for optimal allocation of water resources. It is challenging because of its nonstationary and multiscale stochastic characteristics of hydrological process which are affected not only by climate change but also by other socioeconomic development projects. The human activities also effected the climate change through contributing in Earth’s atmosphere by burning of fossil fuels which release carbon dioxide in atmosphere. Instead of these, greenhouse and aerosols have made effect on Earth’s atmosphere by altering in-out coming solar radiations which is the part of Earth’s energy balance. This makes the prediction of hydrological time series data challenging. To predict such hydrological processes, two broad types of models are commonly used, one is the process-based models which further included the lumped conceptual models, hydrological model, and one-two-dimensional hydrodynamic models [1], and the second is data driven models which included autoregressive moving averages and artificial neural network (which are also known as black box models). The process-based models considered the physical mechanism of stochastic hydrological processes, which requires a large amount of data for calibration and validation [2]. Moreover, physical-based models demand the scientific principles of energy and water movements spatiotemporally. Zahidul [3] concluded that unavailability of sufficient amount of data and scientific knowledge of water movement can lead to poor understanding of hydrological system which makes the hydrological modeling a challenging task. In order to overcome these drawbacks, hydrologists used data driven models to efficiently model the hydrological process [4, 5]. The data driven models only take the advantage of inherent the input-output relationship through data manipulation without considering the internal physical process. The data-driven models are efficient over the process-driven models by appreciating the advantage of less demanding the quantitative data, simple formulation with better prediction performance [6]. These data-driven models are further divided into two categories: simple traditional statistical techniques and more complex machine learning methods. In the last few decades, many traditional statistical time series models are developed including Autoregressive (AR), Moving Averages (MA), Autoregressive Moving Averages (ARMA), and Autoregressive Integrated Moving Averages (ARIMA) [7]. Application of ARIMA model to monitoring hydrological processes like river discharge is common and successfully applied [8]. But the problem with all these traditional statistical techniques required that the time series data to be stationary. However, hydrological data was characterized as both nonstationary and nonlinear due to its time varying nature. Therefore, these techniques are not enough to capture the nonlinear characteristics of hydrological series [6]. To rescue the drawbacks of existing traditional models, machine learning (ML) algorithms have been put forward and widely exploited, which provide powerful solution to the instability of hydrological time series data [4]. ML models include Artificial Neural Network (ANN), Support Vector Machine (SVM), and random forest and genetic algorithms [914]. Riad et al. [5] developed an ANN to model the nonlinear relation between rainfall and runoff and concluded that ANN model is better to model the complex hydrological system over the traditional statistical models. However, these ML methods have their own drawbacks such as overfitting and being sensitive to parameter selection. In addition, there are two main drawbacks of using ML models: first is that ML models ignore the time varying characteristics of hydrological time series data and secondly the hydrological data contains noises which deprive the researchers to accurately predict the hydrological time series data in an effective way [15]. These time varying and noise corrupted characteristics of hydrological time series data require hybrid approaches to model the complex characteristics of hydrological time series data [16].

To conquer the limitations of existing single models, some hybrid algorithms such as data preprocessing methods are utilized with data-driven models with the hope to enhance the prediction performance of complex hydrological time series data by extracting time varying components with noise reduction. These preprocess based hybrid models have already been applied in hydrology [2]. The framework of hybrid model usually comprised “decomposition,” “prediction,” and “ensemble” [2, 6, 13]. The most commonly used data preprocessing method is wavelet analysis (WA) which is used to decompose the nonlinear and nonstationary hydrological data into multiscale components [13]. These processed multiscale components are further used as inputs in black box models at prediction stage and finally predicted components are ensemble to get final predictions. Peng et al. [6] proposed hybrid model by using empirical wavelet transform and ANN for reliable stream flow forecasting. They demonstrated their proposed hybrid model efficiency over single models. Later, Wu et al. [11] exploited a two-stage hybrid model by incorporating Wavelet Multi-Resolution Analysis (WMRA), and other data preprocessing methods as MA, singular spectrum analysis with ANN to enhance the estimate of daily flows. They proposed five models including ANN-MA, ANN-SSA1, ANN-SSA2, ANN-WMRA1, and ANN-WMRA2 and suggested that decomposition with MA model performs better than WMRA. An improvement in wavelet decomposition method has been made to get more accurate hybrid results comprising WA [17]. However, the problem which reduces the performance of WA, i.e., selection of mother wavelet basis function, is still an open debate as the selection of mother wavelet is subjectively determined among many wavelet basis functions. The optimality of multiscale characteristics entirely depends on the choice of mother wavelet function as poorly selected mother wavelet function can lead to more uncertainty in time-scale components. To overcome this drawback, Huang et al. [18] proposed a purely data-driven Empirical Mode Decomposition (EMD) technique. The objective of EMD is to decompose the nonlinear and nonstationary data adaptively into number of oscillatory components called Intrinsic Mode Decomposition (IMF). A number of studies have been conducted combining the EMD with data driven models [15, 1821]. Specifically in hydrology, EMD is used with ANN for wind speed and stream flow prediction [15, 20]. Agana and Homaifar [21] developed the EMD-based predictive deep belief network for accurately predicting and forecasting the Standardized Stream flow Index (SSI). Their study manifested that their proposed model is better than the existing standard methods with the improvement in prediction accuracy of SSI. However, Kang et al. [22] revealed that EMD suffers with mode mixing problem which ultimately affects the efficiency of decomposition. In order to deal with this mode mixing problem, Wu and Hang [23] proposed an improved EMD by successively introducing white Gauss noise in signals, called Ensemble Empirical Mode Decomposition (EEMD) that addresses the issue of frequent apparent of mode mixing in EMD. Later, EEMD was effectively used as data decomposition method to extract the multiscale characteristics [2426]. Di et al. [2] proposed a four-stage hybrid model (based on EEMD for decomposition) to improve the prediction accuracy by reducing the redundant noises and concluded that coupling the appropriate data decomposition with EEMD method with data driven models could improve the prediction performance compared to existing EMD based hybrid model. Jiang et al. [26] proposed another two-stage hybrid approach coupling EEMD with data-driven models to forecast high speed rail passenger flow to estimate daily ridership. They suggested that their proposed hybrid model is more suitable for short term prediction by accounting for the day to day variation over other hybrid and single models. However, due to successive addition of independent white Gauss noise, the performance of EEMD is affected which reduces the accuracy of extracted IMFs through EEMD algorithm. Dai et al. [27] reported in their study that EEMD based hybrid models did not perform appropriately due to independent noise addition.

This study aimed to develop a robust hybrid model to decompose the hydrological time varying characteristics using CEEMDAN [28]. The CEEMDAN successively adds white noise, following the steps of EEMD, with interference in each decomposition level to overcome the drawback of EEMD algorithm. Dai et al. [27] developed a model comprising CEEMDAN to daily peak load forecasting which shows robust decomposed ability for reliable forecasting. Therefore, the purpose of using CEEMDAN method for decomposition in this study is to find an effective way to decompose the nonlinear data which enhances the prediction accuracy of the complex hydrological time series data [27].

2. Proposed Methods

In this study, two novel approaches are proposed to enhance the prediction accuracy of the hydrological time series. Both models have the same layout except in stage of denoising, where two different approaches have been used to remove noises from hydrological time series data. In both models, at decomposition stage, an improved version of EEMD, i.e., CEEMDAN, is used to find oscillations, i.e., the high to low frequencies in terms of IMF. At prediction stage, multimodels are used to accurately predict the extracted IMFs by considering the nature of IMFs instead of using only single stochastic model. The purpose of using multimodel is two-way: one is for accurately predicting the IMFs by considering the nature of IMFs and the other is to assess the performance of simple and complex models after reducing the complexity of hydrological time series data through decomposition. Predicted IMFs are added to get the final prediction of hydrological time series. The proposed three stages involve denoising (D-step), decomposition (Decompose-step), and component prediction (P-step), which are briefly described below:(1)D-step: WA and EMD based denoising methods are presented to remove the noises from hydrological time series data.(2)Decomposed-step: Using CEEMDAN, two separately denoised series are decomposed into IMFs and one residual.(3)P-step: The denoised-decomposed series into IMFs and one residual are predicted with linear stochastic and nonlinear machine learning models. The model with the lowest error rate of prediction is selected by three performance evaluation measures. Finally the predicted results are added to get the final prediction.

For convenient, two proposed methods as named as EMD (denoising), CEEMDAN (decomposing), MM (multi-models) i.e. EMD-CEEMDAN-MM and WA (denoising), CEEMDAN (denoising) and MM (multi-models) i.e. WA-CEEMDAN-MM. The proposed architecture of WA/EMD-CEEMDAN-MM is given in Figure 1.

2.1. D-Step

In hydrology time series data, noises or stochastic volatiles are inevitable component which ultimately reduced the performance of prediction. To reduce the noise from data, many algorithms have been proposed in literature such as Fourier analysis, spectral analysis, WA, and EMD [29], as besides decomposition, these techniques have the ability to remove the noises from data. However, the spectral and Fourier analysis only considered the linear and stationary signals, whereas WA and EMD have the ability to address the nonlinear and nonstationary data with better performance. In this study, WA- and EMD-based threshold are adopted to reduce the stochastic volatiles from the hydrological data.

(i) Wavelet analysis based denoising: in order to remove noises, discrete wavelet threshold method is recognized as powerful mathematical functions with hard and soft threshold. With the help of symlet 8 mother wavelet [30], hydrological time series data is decomposed into approximation and details coefficients with the following equations, respectively [31]; andAfter estimating the approximation and details coefficients, threshold is calculated for each coefficient to remove noises. The energy of data is distributed only on few wavelet coefficients with high magnitude whereas most of the wavelet coefficients are noisiest with low magnitude. To calculate the noise free coefficient, hard and soft threshold rules are opted, which are listed as follows, respectively [31]:andwhere is the threshold calculated as , where is constant which takes the values between 0.4 and 1.4 with step of 0.1 and is median deviation of all details. Then, the decomposed data is reconstructed using the noise free details and approximations using the following equation:where is threshold approximation coefficient and is threshold detailed coefficient.

(ii) Empirical mode decomposition based denoising: an EMD is data-driven algorithm which has been recently proposed to decompose nonlinear and nonstationary data into several oscillatory modes [18]. Due to adaptive nature, EMD directly decomposes data into number of IMFs by satisfying two conditions as follows: (a) From complete data set, the number of zero crossings and the number of extremes must be equal or differ at most by one; (b) the mean value of the envelope which is smoothed, through cubic spline interpolation, based on the local maxima and minima should be zero at all points.

The EMD structure is defined as follows:(1)Identify all local maxima and minima from time series and make upper envelope of maxima and lower envelope minima through cubic spline interpolation.(2)Find the mean of upper and lower envelope Find the difference between original series and extracted mean as(3)Check the properties defined in (a) and (b) of ; if both conditions are satisfied then mark this as IMF; then the next step will be to replace the original series by ; if is not IMF just replace with .(4)Repeat the process of (1-3), until the residue becomes a monotone function from which no further IMFs can be extracted.

Finally, original series can be written as the sum of all extracted IMFs and residue aswhere is the number of IMFs, as and is the IMF, and is the trend of the signal. The way of denoised IMF is the same as mentioned in (3)-(5), except the last two IMFs which are used completely without denoising due to low frequencies. The subscript in EMD-based threshold case in (3)-(5) is replaced with according to number of IMFs. The denoised signal is reconstructed as follows:

2.2. Decompose-Step: Decomposition Step

The EEMD method: the EEMD is a technique to stabilize the problem of mode mixing which arises in EMD and decomposes the nonlinear signals into number which contains the information of local time varying characteristics. The procedure of EEMD is as follows:(a)Add a white Gaussian noise series to the original data set.(b)Decompose the signals with added white noise into IMFs using conventional EMD method.(c)Repeat steps (a) and (b) time by adding different white noises ) in original signals.(d)Obtain the ensemble means of all IMFs ensemble time as the final results as , where is IMF.

The CEEMDAN based decomposition: although the EEMD can reduce the mode mixing problem to some extent, due to the successive addition of white Gauss noise in EEMD, the error cannot be completely eliminated from IMFs. To overcome this situation, CEEMDAN function is introduced by Torres et al. [28]. We employed the CEEMDAN to decompose the hydrological time series data. The CEEMDAN is briefly described as follows:(1)In CEEMDAN, extracted modes are defined as ; in order to get complete decomposition we need to calculate the first residual by using the first , which is calculated by EEMD as .(2)Then replace by where and add white Gaussian noises, i.e., , time in , and find the IMF by taking the average of first IMF to get the . Calculate and repeat (2) until stoppage criteria are meet. However, selection of number of ensemble members and amplitude of white noise is still an open challenge but here in this paper the number of ensemble members is fixed as 100 and standard deviation of white noise is settled as 0.2.(3)The resulting decomposed modes, i.e., , and one residual are used for further prediction of hydrological time series.

More details of EMD, EEMD, and CEEMDAN are given in [20, 28].

2.3. P-Step

Prediction of All IMF’s. In prediction stage, denoised IMFs are further used to predict the hydrological time series data as inputs by using simple stochastic and complex machine learning time series algorithms. The reason of using two types of model is that as first few IMFs contain high frequencies which are accurately predicted through complex ML models whenever, last IMFs contain low frequencies which are accurately predictable through simple stochastic models. The selected models are briefly described as follows.

The IMF prediction with ARIMA model: to predict the IMFs, autoregressive moving average model is used as follows:Here, is the IMF and is the residual of CEEMDAN where is autoregressive lag and is moving average lag value. Often the case, time series is not stationary; [7] made a proposal that differencing to an appropriate degree can make the time series stationary; if this is the case then the model is said to be ARIMA where d is the difference value which is used to make the series stationary.

The IMF prediction with group method of data handling type neural network: ANN has been proved to be a powerful tool to model complex nonlinear system. One of the submodels of NN, which is constructed to improve explicit polynomial model by self-organizing, is Group Method of Data Handling-type Neural Network (GMDH-NN) [32]. The GMDH-NN has a successful application in a diverse range of area; however, in hydrological modeling it is still scarce. The algorithm of GMDH-NN worked by considering the pairwise relationship between all selected lagged variables. Each selected combination of pairs entered in a neuron and output is constructed for each neuron. The structure of GMDH-NN is illustrated in Figure 2 with four variables, two hidden and one output layer. According to evaluation criteria, some neurons are selected as shown in Figure 2, four neurons are selected, and the output of these neurons becomes the input for next layer. A prediction mean square criterion is used for neuron output selection. The process is continued till the last layer. In the final layer, only single best predicted neuron is selected. However, the GMDH-NN only considers the two variable relations by ignoring the individual effect of each variable. The Architecture Group Method of Data Handling type Neural Network (RGMDH-NN), an improved form of GMDH-NN, is used which simulates not only the two-variable relation but also their individuals. The model for RGMDH-NN is described in the following equation:The rest of the procedure of RGMDH-NN is the same as GMDH-NN. The selected neuron with minimum Mean Square Error is transferred in the next layer by using the transfer functions listed in Table 1. The coefficients of each neuron are estimated with regularized least square estimation method as this method of estimation has the ability to solve the multicollinearity problem which is usually the inherited part of time series data with multiple lagged variables.

Radial basis function neural network: to predict the denoised IMFs, nonlinear neural network, i.e., Radial Basis Function (RBFNN), is also adopted. The reason for selecting RBFNN is that it has a nonlinear structure to find relation between lagged variables. The RBFNN is a three-layer feed-forward neural network which consists of an input layer, a hidden layer, and an output layer. The whole algorithm is illustrated in Figure 3. Unlike GMDH-NN, RBFNN takes all inputs in each neuron with corresponding weights and then hidden layer transfers the output by using radial basis function with weights to output. The sigmoid basis function is used to transfer the complex relation between lagged variables as follows:

3. Case Study and Experimental Design

Selection of Study Area. In this study, the Indus Basin System (IBS), known to be the largest river system in Pakistan, is considered which plays a vital role in the power generation and irrigation system. The major tributaries of this river are River Jhelum, River Chenab, and River Kabul. These rivers get their inflows mostly from rainfall, snow, and glacier melt. As in Pakistan, glaciers covered 13,680 km2 area in which estimated 13% of the areas are covered by Upper Indus Basin (UIB) [33]. About 50% melted water from these 13% areas adds the significant contribution of water in these major rivers. The Indus river and its tributaries cause flooding due to glacier and snow melting and rainfall [34]. The major events of flood usually occur in summer due to heavy monsoon rainfall which starts from July and end in September. It was reported [35] that, due to excessive monsoon rainfall in 2010, floods have been generated in IBS which affected 14 million people and around 20,000,000 inhabitants were displaced. Moreover, surface water system of Pakistan is also based on flows of IBS and its tributaries [36]. Pappas [37] mentioned that around 65% of agriculture land is irrigated with the Indus water system. Therefore, for effective water resources management and improving sustainable economic and social development and for proactive and integrated flood management, there is a need to appropriately analyze and predict the rivers inflow data of IBS and its tributaries.

Data. To thoroughly investigate the proposed models, four rivers’ inflow data is used in this study which is comprised of daily rivers inflow (1-January to 19-June) for the period of 2015-2018. We consider the main river inflow of Indus at Tarbela with its two principal, one left and one right, bank tributaries [38]: Jhelum inflow at Mangla, Chenab at Marala, and Kabul at Nowshera, respectively (see Figure 4). Data is measured in 1000 cusecs. The rivers inflow data was acquired from the site of Pakistan Water and Power Development Authority (WAPDA).

Comparison of Proposed Study with Other Methods. The proposed models are compared with other prediction approaches by considering with and without principals of denoising and decomposition. For that purpose, the following types of models are selected:(I)Without denoising and decomposing, only single statistical model is selected, i.e., ARIMA (for convenience, we call one-stage model 1-S) as used in [8].(II)Only denoised based models: in this stage, the noise removal capabilities of WA and EMD are assessed. The wavelet based models are WA-ARIMA, WA-RBFNN, and WA-RGMDH whereas the empirical mode decomposition based models are EMD-ARIMA, EMD-RBFNN, and EMD-RGMDH. The different prediction models are chosen for the comparison of traditional statistical models with artificial intelligence based models as RBFN and RGMDH (for convenience, we call two-stage model 2-S). The 2-S selected models for comparison are used from [15, 17] for the comparison with the proposed model.(III)With denoising and decomposition (existing method): for that purpose, three-stage EMD-EEMD-MM model is used from [2] for the comparison with proposed models. Under this, the multiple models are selected by keeping the prediction characteristics similar to proposed model for comparison purpose (for convenience, we call three-stage model 3-S).

Evaluation Criteria. The prediction accuracy of models is assessed using three evaluation measures such as Mean Relative Error (MRE), Mean Absolute Error (MAE), and Mean Square Error (MSE). The following are their equations, respectively:andAll proposed and selected models are evaluated using these criteria. Moreover, in GMDH-NN and RGMDH-NN models, neurons are selected according to MSE.

4. Results

D-stage results: the results of two noise removal filters, i.e., WA and EMD, are described below.

Wavelet based denoised: after calculating the approximations from (1) and details from (2), the hard and soft rule of thresholding are used to remove noises from hydrological time series coefficients. Both hard and soft rules are calculated from (3) and (4) respectively. On behalf of lower MSE, hard threshold based denoised series are reconstructed through (5) for WA.

EMD-based threshold: to remove noises through EMD, intrinsic mode functions are calculated from (7), and then hard and soft thresholds are used to denoise the calculated IMFs except the last two IMFs as, due to smooth and low frequency characteristics, there is no need to denoise the last two IMFs. Hard threshold based denoised IMFs are further used to reconstruct the noise free hydrological time series data from (8).

The WA and EMD based denoised Indus and Jhelum rivers inflow are shown in Figure 5. The statistical measures including mean , standard deviation , MRE, MAE, and MSE of original and denoised series for all case studies of both noise removal methods are presented in Table 2. The results show that the statistical measures are almost the same for both denoising methods except MSE, as for Indus and Jhelum inflow, WA-based denoised series have lower MSE than EMD; however, for Kabul and Chenab inflow, EMD-based denoised series have lower MSE than WA-based denoised series. Overall, it was concluded that both methods have equal performance in denoising the hydrological time series data. In decomposing stage, both of WA and EMD based denoised series are separately used as input to derive the time varying characteristics in terms of high and low frequencies.

Decompose-stage results: to extract the local time varying features from denoised hydrological data, the WA/EMD-based denoised hydrological time series data are further decomposed into nine IMFs and one residual. The CEEMDAN decomposition method is used to extract the IMFs from all four rivers. EMD-based denoised-CEEMDAN-based decomposed results of Indus and Jhelum rivers inflow are shown in Figure 6 whenever WA-CEEMDAN-based noise free decomposed results of Indus and Jhelum rivers inflow are shown in Figure 7. All four rivers are decomposed into nine IMFs and one residual showing similar characteristics for both methods. The extracted IMFs show the characteristics of hydrological time series data where the starting IMFs represent the higher frequency whereas last half IMFs show the low frequencies and residual are shown as trends as shown in Figures 6 and 7. The amplitude of white noise is set as 0.2 as in [2] and numbers of ensemble members are selected as maximum which is 1000.

P-step results: for all extracted IMFs and residual, three methods of predictions are adopted to get the more precise and near-to-reality results. For that reason, one traditional statistical method, i.e., ARIMA (p, d, q), with two other nonlinear ML methods, i.e., GMDH-NN and RBFNN, are used to predict the IMFs and residuals of all four river inflows. The rivers inflow data of all four rivers are split: 70% for training set and 30% for testing set. The parameters and structure of models are estimated using 886 observations of rivers inflow. The validity of proposed and selected models is tested using 30% data of rivers inflow. After successful estimation of multimodels on each IMF and residual, the best method with minimum MRE, MAE, and MSE is selected for each IMF prediction. The testing results of proposed models with comparison to all other models for all four rivers’ inflow, i.e., Indus inflow, Jhelum inflow, Chenab inflow, and Kabul inflow, are presented in Table 3. The proposed EMD-CEEMDAN-MM and WA-CEEMDAN-MM model prediction results fully demonstrate the effectiveness for all 4 cases with minimum MRE, MAE, and MSE compared to all 1-S [8], 2-S [15, 17], and 3-S [2] evaluation models. However, overall, the proposed WA-CEEMDAN-MM model attains the lowest MSE as compared to other EMD-CEEMDAN-MM proposed models. The worst predicted model is 1-S, i.e., ARIMA, without denoising and without decomposing the hydrological time series data with highest MSE. The predicted graphs of proposed model, i.e., EMD-CEEMDAN-MM, with comparison to 2-S models, i.e., with EMD based denoised for Indus and Jhelum river inflow, are shown in Figure 8 and WA-CEEMDAN-MM with comparison to 2-S models, i.e., with WA based denoised, are shown in Figure 9.

To improve the prediction accuracy of complex hydrological time series data from simple time series models one can take the advantage from three principals of “denoising,” “decomposition,” and “ensembling the predicted results.” The 2-S model, with simple ARIMA and GMDH, can perform well as compared to 2-S models with complex models and 1-S models by optimal decomposition methods. Moreover, with addition to extracting time varying frequencies from denoised series, one can get the more precise results over 2-S models. However, from Table 3, it can be concluded that the proposed WA-CEEMDAN-MM and EMD-CEEMDAN-MM models perform more efficiently to predict the hydrological time series data by decreasing the complexity of hydrological time series data and enhancing the prediction performance over 1-S, 2-S, and 3-S existing models.

The following conclusions are drawn based on the testing error presented in Table 3.

Overall comparison: the overall performances of proposed models WA-CEEMDAN-MM and WA-CEEMDAN-MM are better than all other evaluation models selected from the study [2, 8, 15, 17] with the lowest MAE, MRE, and MSE values for all case studies. However, among two proposed models, WA-CEEMDAN-MM performs well by attaining on average 8.49%, 24.19%, and 5.43% lowest MAE, MRE, and MSE values, respectively, for all four rivers’ inflow prediction as compared to EMD-CEEMDAN-MM as listed in Table 3. It is shown that both proposed models perform well with comparison to 1-S, 2-S, and existing 3-S. Moreover, it is also noticed that most of IMFs are precisely predicted with simple traditional statistical ARIMA model except the first two IMFs as the first IMFs presented high frequencies showing more volatile time varying characteristics with the rest of IMFs. However, overall WA-CEEMDAN-MM is more accurate in predicting the rivers inflow.

Comparison of proposed models with other only denoised series models: removing the noise through WA- and EMD-based threshold filters before statistical analysis improved the prediction accuracy of complex hydrological time series. It can be observed from Table 3 that the MAE, MRE, and MSE values of four cases perform well for 2-S model as compared to 1-S model in both WA and EMD based denoised inputs. However, like overall performance of WA-CEEMDAN-MM, WA-based denoised models perform well compared to EMD-based denoised. Moreover, with denoised series, the several statistical (simple) and machine learning (complex) methods are adopted to further explore the performances between simple and complex methods to predict inflows. This can be seen from Table 3, where WA-RBFN and EMD-RBFN perform the worst compared to WA-ARIMA, WA-RGMDH, EMD-ARIMA, and EMD-RGMDH. This means that with denoising the hydrological series one can move towards simple models as compared to complex models like radial basis function neural network. WA-RGMDH and EMD-RGMDH attain the highest accuracy among all 2-S models.

Comparison of proposed models with other denoised and decomposed models: in addition to denoising, the decomposition of hydrological time series strategy effectively enhances the prediction accuracy by reducing the complexity of hydrological data in multiple dimensions. It is shown from Table 3 that the 3-S (existing) performs better as on average for all four rivers MAE, MRE, and MSE values are 13.76%, -6.55%, and 54.79%, respectively, lower than 1-S model and 63.40, 64.76%, and 96.78% lower than 2-S model (EMD-RBFNN). Further research work can be done to explore the ways to reduce the mathematical complexity of separate denoising and decomposition like only single filter which not only denoises but also decomposes the hydrological time series data with the same filter to effectively predict or simulate the data.

5. Conclusion

The accurate prediction of hydrological time series data is essential for water supply and water resources purposes. Considering the instability and complexity of hydrological time series, some data preprocessing methods are adopted with the aim to enhance the prediction of such stochastic data by decomposing the complexity of hydrological time series data in an effective way. This research proposed two new methods with three stages as “denoised,” decomposition, and prediction and summation, named as WA-CEEMDAN-MM and EMD-CEEMDAN-MM, for efficiently predicting the hydrological time series. For the verification of proposed methods, four cases of rivers inflow data from Indus Basin System are utilized. The overall results show that the proposed hybrid prediction model improves the prediction performance significantly and outperforms some other popular prediction methods. Our two proposed, three-stage hybrid models show improvement in prediction accuracy with minimum MRE, MAE, and MSE for all four rivers as compared to other existing one-stage [8] and two-stage [15, 17] and three-stage [2] models. In summary, the accuracy of prediction is improved by reducing the complexity of hydrological time series data by incorporating the denoising and decomposition. In addition, these new prediction models are also capable of solving other nonlinear prediction problems.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RG-1437-027.