Abstract

Streamflow prediction is vital for flood control and mitigation. Physical prediction models often provide satisfactory results, but they require massive computational work and many hydrogeomorphological variables to develop a prediction system. In contrast, data-driven prediction models are quick to apply, easy to handle, and reliable. This study investigates a new hybrid model, the wavelet bootstrap quadratic response surface (WBQRS), for accurate streamflow prediction. Wavelet analysis is a well-known joint time-frequency analysis technique applied to biological, vibration, and hydrological signals, among others; here it is used to denoise the time series data. The bootstrap is a nonparametric method that addresses uncertainty through intensive resampling with replacement. The studied models were compared using several statistical metrics, and the wavelet bootstrap quadratic response surface model gave the best results.

1. Introduction

Water is essential for all living things, including plants, animals, and people. It is an all-purpose solvent and a gift from nature. The sustainability of modern ecology, human existence on Earth, and the provision of food for an expanding population depend heavily on water availability [1]. Ghafoor and Nawaz [2] explain the worldwide phenomenon of climate change and its varied implications for different ecologies. For example, variations in rainfall intensity and duration can cause floods or droughts, and these catastrophes pose a serious risk to regional and global food security. Over the last few decades, streamflow prediction has played a significant part in hydrology and in the administration of water resources. Accurate and reliable streamflow estimation provides the foundation for improving reservoir capacity, flood avoidance, water resources management, and the design of hydroelectric projects [3]. Short-term streamflow prediction can therefore help to enhance the management of water supply. However, accurate and consistent prediction is a tremendous challenge for hydrologists because of the complicated variability of river systems [4].

In the last two decades, artificial intelligence (AI) techniques such as the adaptive neuro-fuzzy inference system (ANFIS), support vector machine (SVM), and neural network (NN) have been employed for river stage prediction. These methods were used because of their ability to model the nonlinear behavior of time series data [5–12]. A deep learning model has been used to estimate the uncertainty associated with river flow and to support flood prediction [13]. A wavelet artificial neural network (WANN) model was tested by Shafaei and Kisi [14] for river flow prediction. The results show that the proposed model yields better results than the traditional artificial neural network (ANN) and SVM models. Delafrouz et al. [15] proposed a new hybrid model that combines phase-space reconstruction (PSR) and ANN methods for reliable daily streamflow prediction. The developed PSR-ANN model was compared with the traditional ANN and gene expression programming (GEP) models, and it produced the best prediction results. Two AI-based techniques, WANN and linear genetic programming (LGP), were introduced by Danandeh Mehr et al. [16] for monthly streamflow forecasting. The results showed that the LGP model is better than the WANN model for monthly streamflow prediction. An SVM model with adaptive insensitive factors was introduced by Gueo et al. [17] for monthly streamflow prediction. In that work, the wavelet transform (WT) technique was used to reduce noise in the data, and the phase-space reconstruction technique was applied to determine the structure of the forecasting model. The results indicated that the improved SVM model with an adaptive insensitive factor is suitable for processing complex hydrological data.

WT is a prevalent technique for removing noise from nonstationary time series data [18–22]. WANN and wavelet adaptive neuro-fuzzy inference system (WANFIS) models were examined in [23] to check the reliability of the proposed models at the maximum lead time. The WANFIS model was found to be suitable for 1–6-hour prediction and the WANN model for 8–10-hour prediction, and WT increased the efficiency of both models. Drisya et al. [24] compared the feedforward neural network (FFNN) and WANN models for streamflow prediction; the WANN model gave better results than the FFNN model. Seo et al. [25] introduced three hybrid models, the wavelet packet artificial neural network (WPANN), wavelet packet-adaptive neuro-fuzzy inference system (WP-ANFIS), and wavelet packet-support vector regression (WPSVR), by coupling wavelet packet decomposition (WPD) with the traditional machine learning models ANN, ANFIS, and support vector regression (SVR). The WPD technique significantly increases the predictive power of the machine learning models. Overall, the WP-ANFIS model produces reliable results for daily river stage prediction. Wavelet-based regression models and wavelet-based NN models were studied by Partal [26] for monthly streamflow prediction. The wavelet transformation has a positive impact on forecasting results when coupled with regression and NN-based models. Khan et al. [27] coupled the wavelet technique with autoregressive integrated moving average (ARIMA) and ANN models for drought forecasting. A wavelet bootstrap multiple linear regression (WBMLR) model was proposed by Sehgal et al. [28] for river stage prediction. They showed that the WBMLR model produced better results than the other ANN- and MLR-based models used in their study. Two models, the wavelet neural network (WNN) and ANN, were combined with the block bootstrap (BB) sampling technique by Kasiviswanathan et al. [29]. They concluded that the WNN-BB model consistently performs better for flood management than the ANN-BB model. The response surface (RS) method with higher-order polynomial functions has also been used for flood forecasting, where the fifth-order RS model gave the best river stage forecasts [30].

This study aims to determine the performance of the wavelet bootstrap quadratic response surface (WBQRS) model for daily reservoir inflow prediction. In addition, it compares the performance of the developed model with some traditional and hybrid models.

2. Methods

2.1. Wavelet Transform

WT decomposes a one-dimensional signal into a two-dimensional time-frequency representation, so that time and frequency information are available simultaneously. Moreover, wavelets are irregular and asymmetric in shape. Due to these properties, the wavelet technique is useful for analyzing signals with sharp changes and discontinuities [31]. The discrete wavelet transform (DWT) provides a timescale representation of time series data and of the relationships within it. The DWT is useful for removing noise from nonstationary data. In addition, it is useful when the data contain jumps or shifts, and it produces a very accurate analysis in such cases [32].

Time series can be analyzed with numerous techniques, the best known of which is the Fourier transform (FT). FT breaks time series data down into basic sinusoids of various frequencies, but time information is lost during the transformation. FT is therefore not a suitable choice for analyzing signals with transitory characteristics such as trends, discontinuities, drift, and breakdown points. WT, on the other hand, uses long time intervals where low-frequency components are required and short time intervals where high-frequency components are required [33].

WT is obtained through the time-frequency components of a signal. WT is divided into two main types: the discrete wavelet transform (DWT) and the continuous wavelet transform (CWT). The CWT is given by

$$W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \tag{1}$$

where $\psi^{*}$ denotes the complex conjugate of the mother wavelet $\psi$.

Calculating wavelet coefficients at every possible resolution level (scale) creates a massive amount of data. Choosing the resolution levels on a dyadic scale makes the wavelet analysis more efficient while remaining accurate. This transformation is called the DWT. In equation (1), $f(t)$ is converted into a wavelet coefficient $W(a,b)$. Here, $b$ (a real number) is the translation parameter, whereas $a$ (a real and positive number) is the dilation parameter. These two parameters dilate and translate the mother wavelet $\psi(t)$, respectively. The DWT is obtained by discretizing these two parameters: the dilation parameter is discretized as $a = 2^{j}$, while the translation parameter is discretized as $b = k\,2^{j}$. In the process of discretization, the wavelet function takes the form

$$\psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(2^{-j} t - k\right) \tag{2}$$

where $j$ and $k$ are integers representing the scale and translation factors, respectively [34].

Mallat [35] introduced two filters that separate the signal into two different scales: a high-pass filter and a low-pass filter. The high-pass filter (wavelet function) represents high frequency and low scale. Wavelet functions capture rapidly changing signal features, and they are obtained by correlating the original signal with the compressed signal. Other names for the output of the high-pass filter (wavelet function) include running differences, detail, and fluctuation. The low-pass filter (scaling function) represents low frequency and high scale. The output of the scaling function is also known as the approximation or trend. Figure 1 shows the scaling function and wavelet function of the symlet wavelet with vanishing moment 15.
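The split into a low-pass (approximation) and a high-pass (detail) output can be illustrated with a minimal sketch. It uses the Haar wavelet rather than the sym15 wavelet used in this study, because the Haar filters have only two coefficients and can be written out by hand; the function names and the sample values are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail).

    The approximation is the low-pass (scaling function) output and the
    detail is the high-pass (wavelet function) output.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [s * (signal[i] + signal[i + 1]) for i in pairs]
    detail = [s * (signal[i] - signal[i + 1]) for i in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar DWT."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out


# Illustrative values only, not data from the study.
flow = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_dwt(flow)
reconstructed = haar_idwt(approx, detail)
```

Applying `haar_dwt` again to `approx` gives the next decomposition level, which is how a multilevel decomposition such as D1, D2, D3, and A3 is built up.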

2.2. MLR Model

In regression analysis, the intercept and regression coefficients are estimated by minimizing the sum of squared residuals. The MLR model has been used in hydrological modeling for several decades. Tiwari and Chatterjee [36] and Kisi [37] present recent applications in the hydrological literature. In matrix form, the model is

$$Y = X\beta + \varepsilon$$

The response variable $Y$ is a column vector of order n × 1.

The predictor variable represented by $X$ is an n × 2 matrix.

The regression coefficients $\beta$ form a 2 × 1 column vector.

The error term $\varepsilon$ is an n × 1 column vector.
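As a minimal sketch of least squares estimation for the matrix form described above, the following solves the normal equations $X^{\top}X\beta = X^{\top}Y$ for a small n × 2 design matrix (an intercept column plus one predictor). The helper names and the toy data are assumptions made for illustration; a numerical library would normally be used instead of hand-written elimination.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)


# Toy data generated from y = 1 + 2x (illustrative only).
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
beta = ols(X, y)  # intercept and slope
```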

2.3. Bootstrap Technique

The bootstrap technique resamples the original data set with replacement and trains the model on each resampled data set instead of on the original data. It is applied to a single realization of a distribution to generate a set of bootstrap samples. These bootstrap samples furnish a better understanding of the average and variability of the original unknown distribution [36]. When the ordinary bootstrap, with bootstrap sample size m = n, is inconsistent and gives wider confidence intervals, there is no need to keep m = n. In such a situation, the m-out-of-n bootstrap technique is useful because it makes the bootstrap consistent [38].

Three steps are involved in applying the bootstrap: drawing bootstrap samples with replacement, determining the bootstrap distribution, and applying the bootstrap distribution. The bootstrap distribution of a statistic is the sampling distribution of that statistic obtained through resampling. The main advantage of bootstrap sampling is that it simplifies otherwise complex calculations of the mean, standard error, and confidence interval. The bootstrap can therefore be applied successfully, in comparison with other resampling techniques such as the jackknife, to estimate unsmoothed parameters and the variance of nonlinear statistics. The vital point to remember is that the parametric bootstrap is used when the data can be assumed to follow a known distribution such as the normal distribution. When the data do not fulfill the assumption of normality, the nonparametric bootstrap is used [39].

Let $X = (x_1, x_2, \ldots, x_n)$ denote a random sample of size n that is independently and identically distributed (iid). The sample is drawn from an unknown probability distribution $F$, and $x_1, x_2, \ldots, x_n$ are taken to be the observed values. Bootstrap samples are generated from the observed values, and the empirical distribution $\hat{F}$ (the bootstrap population) is developed by uniform sampling with replacement from the observed time series data. Let $X^{*}$ be a sample drawn from $\hat{F}$, which places probability $1/n$ on each observed value. The set of bootstrap samples can be written as $X^{*}_{1}, X^{*}_{2}, \ldots, X^{*}_{B}$, where B is the total number of bootstrap samples.
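A minimal sketch of nonparametric bootstrap resampling, assuming uniform sampling with replacement from the observed values, is shown below. The function name, the seed, the number of resamples, and the sample values are hypothetical choices for illustration; the optional `m` argument gives the m-out-of-n variant.

```python
import random
import statistics


def bootstrap_samples(data, n_boot, m=None, seed=0):
    """Draw n_boot resamples with replacement.

    Each resample has size m; m defaults to len(data), which is the
    ordinary bootstrap. Choosing m < len(data) gives m-out-of-n.
    """
    rng = random.Random(seed)
    size = len(data) if m is None else m
    return [rng.choices(data, k=size) for _ in range(n_boot)]


# Illustrative values only, not data from the study.
observed = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]

samples = bootstrap_samples(observed, n_boot=200)
boot_means = [statistics.mean(s) for s in samples]
mean_estimate = statistics.mean(boot_means)   # bootstrap estimate of the mean
std_error = statistics.stdev(boot_means)      # bootstrap standard error

# m-out-of-n variant with m = 5 and n = 8.
small_samples = bootstrap_samples(observed, n_boot=200, m=5)
```

Percentiles of `boot_means` would give a simple bootstrap confidence interval for the mean.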

2.4. Response Surface Method

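Response surface (RS) modeling is a procedure for mimicking how the response of a system changes with its predictors [40]. RS modeling is a helpful technique for dealing with natural events in hydrology. A mathematical equation can express the relationship between the regressors and the response variable:

$$Y = f(X_1, X_2, \ldots, X_k) + \varepsilon \tag{5}$$

In equation (5), $Y$ is the response variable, $f$ is the response function, $X_1, X_2, \ldots, X_k$ are the regressors, and $\varepsilon$ is the error term.

A linear relationship between the regressors and the response variable is expressed as

$$Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \varepsilon$$

The FORS model additionally involves the cross-product terms:

$$Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \sum_{i<j} \beta_{ij} X_i X_j + \varepsilon$$

The mathematical equation of the QRS model is

$$Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \sum_{i<j} \beta_{ij} X_i X_j + \sum_{i=1}^{k} \beta_{ii} X_i^{2} + \varepsilon$$

The QRS model comprises additional second-order terms in comparison with the FORS model.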
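The difference between the two response surfaces can be sketched through the regressor terms each one uses. The sketch assumes that the FORS terms are an intercept, the linear terms, and the pairwise cross-products, and that the additional second-order terms of the QRS model are the squared regressors; the function names are hypothetical.

```python
from itertools import combinations


def fors_features(x):
    """Intercept, linear terms, and pairwise cross-product terms."""
    cross = [a * b for a, b in combinations(x, 2)]
    return [1.0] + list(x) + cross


def qrs_features(x):
    """FORS terms plus the squared (second-order) terms."""
    squares = [a * a for a in x]
    return fors_features(x) + squares


# One observation with two regressors, X1 = 2 and X2 = 3.
row = (2.0, 3.0)
fors_row = fors_features(row)  # [1, X1, X2, X1*X2]
qrs_row = qrs_features(row)    # [1, X1, X2, X1*X2, X1^2, X2^2]
```

Stacking one such row per observation gives a design matrix, and the coefficients can then be estimated by least squares exactly as for the MLR model.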

2.5. Performance Indices

The statistical indicators used to compare the performance of the developed models are RMSE, MAE, NSE, and CP. The first three are defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^{2}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|, \qquad \mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}}$$

where $O_i$ is the observed flow, $P_i$ is the predicted flow, $\bar{O}$ is the mean observed flow, and n is the size of the data set. RMSE measures the discrepancy between observed and predicted streamflow values and, being an average, does not grow with the sample size. RMSE is non-negative, and a value of zero indicates a perfect fit. MAE is the average absolute error. It shows how close the forecast value is to the observed value [41]. NSE is applied in hydrology to appraise predictive power. NSE ranges from $-\infty$ to 1, and a value close to 1 means that the model fit is good.
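The following is a minimal sketch of RMSE, MAE, and NSE using their standard definitions. CP is omitted because its formula is not stated in the text. The function names and the sample values are illustrative assumptions.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, and values at or
    below 0 mean the model is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


# Illustrative values only, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "NSE": nse(observed, predicted),
}
```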

3. Study Area and Data Set

The river Chenab is formed by the confluence of the Chandra and Bhaga rivers at Tandi in Himachal Pradesh state, India. It then descends from the uplands to the vast alluvial lowlands of Punjab, Pakistan. The Chenab river basin is positioned in eastern Punjab, and the river flows in a southwesterly direction through Punjab province. The Marala, Khanki, and Qadrabad gauging stations play an important role in the river canal link system in Punjab, Pakistan. The Marala gauging station was built in district Sialkot in 1968 and has a length of 1.366 kilometers. It is situated between N to E and has a discharge capacity of 1.1 million cusecs. The Khanki gauging station is situated on the river Chenab where it enters Gujranwala district. It was constructed in 1982 and has a maximum discharge capacity of 0.8 million cusecs. The next gauging station on the river Chenab is Qadrabad in district Mandi Bahauddin. The total length of the Chenab river basin is around 974 km, and it feeds several irrigation canals. Table 1 gives the summary statistics of the three gauging stations selected for the analysis, while Figure 2 shows the Chenab river basin with its adjoining rivers. For simplicity, the gauging stations are abbreviated as Marala = Mar, Khanki = Kha, Qadrabad = Qad, Trimmu = Tri, and Punjab = Punj.

The data used for this study are the daily water discharges of the Chenab river basin from three gauging stations during 2005–2010 (1 July–30 September). The data are divided into two parts: the training data cover 2005–2009, while the testing data cover 2010.

4. Model Development

The first step in hydrological modeling is selecting appropriate input variables for the models [29, 42]. However, there is no exact method for selecting the input variables of prediction models, so different researchers use different techniques to select significant variables. In the present study, input variables are selected using the correlation technique together with the geographic location of the gauging stations. Figure 2 shows the Chenab river basin with its adjoining rivers.

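The correlations among gauging stations are shown in Table 2 and are used to select the relevant input variables for streamflow prediction. The Pearson correlation coefficient is calculated between gauging stations (variables) on the complete data to select suitable input variables. Its mathematical form is given as

$$r = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}}$$

where $x_i$ and $y_i$ are the paired observations of the two variables and $\bar{x}$ and $\bar{y}$ are their means.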

Table 2 clearly shows that the Mar gauging station correlates strongly with the Kha and Qad gauging stations but has a weak correlation with the Tri and Punj gauging stations. The Kha gauging station has a strong correlation with the Mar and Qad gauging stations. Tri and Punj have a strong correlation with each other. The geographic location of the Chenab river basin shows that the Jehlum river joins the river Chenab at the Tri gauging station. Therefore, the Qad gauging station is our forecast site and is used as the response variable, while Mar and Kha are used as predictors.
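Correlation-based input selection of this kind can be sketched as follows. The station names, the series, and the 0.8 threshold are hypothetical; the text does not state a numerical cut-off, so the threshold is an assumption made only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


def select_inputs(target, candidates, threshold):
    """Keep candidates whose absolute correlation with the target
    reaches the (assumed) threshold."""
    return [
        name
        for name, series in candidates.items()
        if abs(pearson(series, target)) >= threshold
    ]


# Illustrative series only, not data from the study.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "station_a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "station_b": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = select_inputs(target, candidates, threshold=0.8)
```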

The main idea behind correlation analysis is to determine the relationship between two variables. “Correlation analysis is the study of the relationship between variables” ([43], p. 375). The correlation coefficient between Mar and Kha is 0.8690 (Table 2), which is a sign of high multicollinearity between the two predictors. Principal component analysis was applied to remove the multicollinearity problem. Applying this operation to the predictors forms principal components, which are orthogonal to each other and therefore linearly uncorrelated. These components produce principal component scores, which are used as model inputs in place of the original predictors to eliminate multicollinearity [44].
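For two predictors, the principal component scores can be written down directly, which gives a minimal sketch of how the transformation removes the correlation. It assumes that the predictors are standardized first, so the 2 × 2 correlation matrix has eigenvectors (1, 1)/√2 and (1, −1)/√2 whenever the correlation is nonzero. The function names and the sample values are hypothetical.

```python
import math


def standardize(x):
    """Center to zero mean and scale to unit sample variance."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return [(v - mean) / sd for v in x]


def pc_scores_two(x1, x2):
    """Principal component scores of two standardized predictors."""
    z1, z2 = standardize(x1), standardize(x2)
    s = 1.0 / math.sqrt(2.0)
    pc1 = [s * (a + b) for a, b in zip(z1, z2)]
    pc2 = [s * (a - b) for a, b in zip(z1, z2)]
    return pc1, pc2


def sample_variance(x):
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


# Two strongly correlated illustrative series, not data from the study.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.2, 1.9, 3.2, 3.8, 5.1]
pc1, pc2 = pc_scores_two(x1, x2)

# The scores have zero mean, so this sum is proportional to their covariance.
cross_product = sum(a * b for a, b in zip(pc1, pc2))
```

When the two predictors are positively correlated, `pc1` carries most of the variance, and the two score series are uncorrelated by construction.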

The DWT method is applied to decompose the river streamflow data into wavelet components. The choice of the mother wavelet is another important step in wavelet analysis because the results obtained from time series data depend greatly on it. Maheswaran and Khosa [45] explain that the choice of a suitable mother wavelet is based on the properties and application of the wavelet function, that is, the vanishing moments and region of support. The support region of the wavelet function describes its localization characteristics, while the vanishing moments relate to the polynomial behavior of the time series data [46]. Four wavelet families with different vanishing moments are tested on a traditional FORS-based model for 1-d ahead prediction to select the best wavelet function. The wavelet families tested are Haar, symlets (sym2, sym6, sym12, sym15, and sym20), coiflets (coif1, coif2, and coif4), and Daubechies (db4, db12, db15, and db20). The symlet mother wavelet with vanishing moment 15 is selected to decompose the time series data because it performs well on all performance indices compared with the other mother wavelet functions, as shown in Table 3.

The best performance in the wavelet domain also depends on selecting the optimal level of decomposition. The decomposition level of WT can be determined using the following empirical formula [47, 48]:

$$L = \operatorname{int}\left[\log(N)\right]$$

In this empirical formula, $L$ is the decomposition level, $\operatorname{int}[\cdot]$ is the function that converts a decimal value to an integer, and $N$ is the total length of the data. Three levels of decomposition are chosen using this empirical formula. First, the data are decomposed into detail components (D1, D2, and D3) and an approximation component (A3). Then, the effective wavelet components (D3 and A3) are chosen as inputs to the wavelet-based models. Figure 3 shows the time series data with the wavelet decomposition at the third level.

4.1. Wavelet-Based Model Development

The DWT technique is coupled with the RS-based models (FORS and QRS) to produce new hybrid models: WFORS and WQRS. The algorithm of the WFORS and WQRS models consists of two steps.
(1) In the first step, the original streamflow data are decomposed into wavelet components using the DWT technique after the optimal decomposition level has been selected.
(2) In the second step, the effective discrete wavelet components are provided as input to the FORS and QRS models.

4.2. Wavelet-Bootstrap-Based Model Development

Finally, we develop new hybrid models by applying the wavelet and bootstrap methods to the FORS model to establish the WBFORS model and to the QRS model to establish the WBQRS model. To develop the wavelet-bootstrap-based models, we apply the bootstrap technique to the effective wavelet components and provide the bootstrap-resampled data to the FORS and QRS models. The flowchart in Figure 4 shows the process of the hybrid models.

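Figure 4 shows the algorithm of the models developed in this study. This article studies the relationship between the Qad gauging station (response variable, $Y$) and the Mar and Kha gauging stations (predictors $X_1$ and $X_2$). The traditional models, fitted by the least squares method, are

$$\text{MLR:}\quad Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

$$\text{FORS:}\quad Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \varepsilon$$

$$\text{QRS:}\quad Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \beta_{11} X_1^{2} + \beta_{22} X_2^{2} + \varepsilon$$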

The authors collected monsoon-season data (1 July–30 September) for the river Chenab from all five gauging stations in Pakistan. The correlation technique was then applied to select suitable input variables, which gave Mar, Kha, and Qad. According to its geographic location, the Qad gauging station is the response variable, and the remaining two gauging stations (Mar and Kha) are the independent variables. In the first step, the traditional models were fitted to these data. In the second step, the authors applied the wavelet technique to the time series data using the symlet mother wavelet with vanishing moment 15 at the third level. The effective wavelet components A3 and D3 were then used to obtain the WFORS and WQRS models. After that, the bootstrap technique was applied to the discrete wavelet components to obtain the WBFORS and WBQRS models. The bootstrap technique is used to reduce uncertainty because hydrological data have nonstationary trends. The performance of all studied models (Figure 4) was checked with the performance indices RMSE, MAE, CP, and NSE, and the results for the testing data set are presented in Table 4. When the bootstrap technique is combined with the wavelet method, the model gives outstanding prediction results, as shown in Table 4.

To attain model consistency and remove the effect of randomness, the authors use the m-out-of-n technique. In this method, m is the bootstrap sample size and n is the size of the original sample. The bootstrap sample size and the number of bootstrap repetitions must be selected carefully to ensure that the model gives consistent results. The m-out-of-n bootstrap technique is useful for consistency and ensures that the bootstrap models give consistent results under repetition [38].

5. Results and Discussion

The predictive ability of the developed models is assessed with the performance indices RMSE, MAE, NSE, and CP. The performance of the MLR, FORS, WFORS, WBFORS, QRS, WQRS, and WBQRS models is presented in Table 4. Among the traditional models, the QRS model performs better for river inflow prediction than MLR and FORS.

This study compares the wavelet-based RS models (WFORS and WQRS) with the traditional models (QRS, FORS, and MLR). Table 4 shows that the predictive efficiency of the wavelet-based models is much better than that of the traditional models in terms of RMSE, MAE, NSE, and CP in the Chenab river basin. The reason behind the weak performance of the MLR, FORS, and QRS models is that they use raw streamflow data for modeling, and such raw data comprise various frequency components. River streamflow data do not present the actual characteristics of the time series when only one resolution component is used for prediction [14]. The wavelet-based models (WFORS and WQRS), in contrast, use different frequency resolution levels of the streamflow data. The wavelet transform is therefore a more suitable choice for reservoir inflow prediction than the traditional models. On the basis of the performance indices, the WQRS (RMSE = 0.217 m³/s, MAE = 0.1354 m³/s, NSE = 0.9796, CP = 0.9563) and WFORS (RMSE = 0.2250 m³/s, MAE = 0.1470 m³/s, NSE = 0.9761, CP = 0.9521) models have much better results for 1-d ahead prediction on the testing data set than the traditional models MLR (RMSE = 0.4010 m³/s, MAE = 0.2082 m³/s, NSE = 0.8426, CP = 0.7398), FORS (RMSE = 0.3780 m³/s, MAE = 0.1941 m³/s, NSE = 0.8621, CP = 0.7719), and QRS (RMSE = 0.3390 m³/s, MAE = 0.1952 m³/s, NSE = 0.8913, CP = 0.8201).

The values of RMSE and MAE are higher for the MLR, FORS, QRS, WFORS, and WQRS models than for the WBFORS and WBQRS models, while the values of NSE and CP are closer to one for the WBFORS and WBQRS models than for the remaining models for 1–3 d ahead prediction. Thus, both WBFORS and WBQRS yield better prediction performance, and the WBQRS model performs best.

Scatter plots for 1 d and 3 d ahead prediction of the MLR, FORS, WFORS, WBFORS, QRS, WQRS, and WBQRS models are presented in Figures 5–8. According to the NSE criterion, the fitted straight line of the WBQRS model is in better agreement with the observed streamflow. Therefore, the WBQRS model shows good predictive power for 1–3 d ahead prediction compared with the other models.

6. Conclusion

Streamflow prediction plays a vital role in hydrology for assessing water patterns. This study addresses water flow prediction using wavelet and bootstrap techniques with the RS model. The authors introduce a new hybrid model, WBQRS, for reservoir inflow prediction. To check its validity, the proposed WBQRS model was compared with the MLR, FORS, QRS, WFORS, WQRS, and WBFORS models on the statistical criteria RMSE, MAE, NSE, and CP. The observed data are decomposed into different frequency components by applying the DWT technique at the third level. The mother wavelet used throughout is the symlet with vanishing moment 15 (sym15). In comparison with the other models, the WBQRS model gives the best prediction results, with RMSE = 0.0356 m³/s, MAE = 0.0283 m³/s, NSE = 0.9877, and CP = 0.9957 for 1-d ahead prediction. In all cases, the models show reliable results for 1–3 d ahead prediction. This study will help authorities to make early predictions about floods and to activate the flood warning response system. The results indicate that the WBQRS model is the most suitable of the studied models for short-term streamflow prediction.

Data Availability

Data are available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported via funding from Prince Sattam Bin Abdulaziz University project number (PSAU/2023/R/1444).