#### Abstract

Estimation of discharge flowing through rivers is an important aspect of water resource planning and management. The most common way to address this concern is to develop stage-discharge relationships at various river sections. Various computational techniques have been applied to develop discharge ratings and improve the accuracy of estimated discharges. In this regard, the present study explores the application of the novel hybrid multigene genetic programming-generalized reduced gradient (MGGP-GRG) technique for estimating river discharges for steady as well as unsteady flows. It also compares the MGGP-GRG performance with those of the commonly used optimization techniques. As a result, the rating curves of eight different rivers were developed using the conventional method, evolutionary algorithm (EA), the modified honey bee mating optimization (MHBMO) algorithm, artificial neural network (ANN), MGGP, and the hybrid MGGP-GRG technique. The comparison was conducted on the basis of several widely used performance evaluation criteria. It was observed that no model outperformed others for all datasets and metrics considered, which demonstrates that the best method may be different from one case to another one. Nevertheless, the ranking analysis indicates that the hybrid MGGP-GRG model overall performs the best in developing stage-discharge relationships for both single-value and loop rating curves. For instance, the hybrid MGGP-GRG technique improved sum of square of errors obtained by the conventional method between 4.5% and 99% for six out of eight datasets. Furthermore, EA, the MHBMO algorithm, and artificial intelligence (AI) models (ANN and MGGP) performed satisfactorily in some of the cases, while the idea of combining MGGP with GRG reveals that this hybrid method improved the performance of MGGP in this specific application. Unlike the black box nature of ANN, MGGP offers explicit equations for stream rating curves, which may be counted as one of the advantages of this AI model.

#### 1. Introduction

For quantification of discharge in open channels, hydraulic engineers often restore to the application of rating curves to avoid the need for in situ discharge measurements conducted either continuously or at particular time intervals [1]. These curves are developed by analysing the relationship between the water level and discharge at a specific river section under steady and unsteady circumstances. In the former, the rating curve is single-valued because of the one-to-one relationship between water depth and discharge. However, corresponding to the same water level, higher discharges are observed when the water level is rising during floods than in receding stages resulting in a hysteresis relationship and loop ratings [2]. Finally, once a discharge rating curve is obtained, it can be deliberately used to estimate discharges by measuring water depth exclusively.

Realizing the importance and wide applications of rating curves, the topic has been studied remarkably. Different issues investigated on the topic include [3] (1) developing discharge rating curves using various data-driven techniques under both steady and unsteady flows [4, 5], (2) physical interpretations of different parts of rating curves, (3) uncertainties associated with stage-discharge relationships [6, 7], (4) introducing a rating curve applicable beyond the range of discharge measurements [8, 9], and (5) applying rating curves to practical problems in water resources [10, 11]. Among these issues, the majority of the efforts in the literature may be allocated to the first category, which indicates that the inevitable need to develop more rigorous rating curves is the main problem investigated in many previous studies. Likewise, the problem statement in this study focuses on the assessment of different models for defining single-value and loop rating curves.

According to the literature, available approaches for deriving a stage-discharge relationship may be categorized into two main groups named hard computing and soft computing models. The former exploited either numerical models [12–14] or empirical models [15–17] to determine discharge rating equations, whereas the latter utilized artificial intelligence (AI) models for the same purpose [18–23]. Starting from the simple regression-based ratings, hydraulic engineers gradually have turned towards complex computer programming and machine learning approaches not only to develop discharge rating curves but also to estimate discharges more accurately. Moreover, several studies have reported the superiority of AI models over general regressive techniques [24]. Despite all conducted efforts, the quest not only for new approaches but also for assessing different models available for estimating the stage-discharge relation more precisely is still ongoing due to the importance and wide applications of discharge rating curves.

This study aims not only to explore the applicability of AI models in developing rating curves for steady as well as unsteady flow conditions but also to assess their performances in comparison with some other approaches available in the current literature. Therefore, the performances of the MHBMO algorithm, multigene genetic programming (MGGP), and the hybrid MGGP-GRG, which has been recommended for this specific purpose for the first time in the literature, were compared with the performances of the conventional method, the Excel Solver or evolutionary algorithm (EA), rise and fall (RF) method, and ANN. The comparison is conducted for eight datasets with single-value and loop rating curves using several metrics.

#### 2. Materials and Methods

##### 2.1. Datasets

In the present study, eight datasets (four under the steady and four under the unsteady flow conditions) were used in this study. These data, which have already been published in the literature [16, 17, 24–28], are depicted in Figure 1. As shown, the first four datasets have single-value stage-discharge relations, whereas the next four ones display a loop relationship.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

**(h)**

Each dataset used in this study was randomly divided into two parts: (1) train data (75% of each dataset) and (2) test data (25% of each dataset). The former was utilized for calibrating and training different methods, while the latter was exploited for comparing the estimated and observed discharges. The ranges of the train and test data of the eight datasets are summarized in Table 1. As shown, in each dataset, the minimum and maximum values of each parameter in the train data are lower and higher than the minimum and maximum of the same parameter in the test data, respectively. This characteristic in the data division enables to calibrate or train different methods for a wider range of values so that they can predict the test data better.

##### 2.2. Models Used for Developing Rating Curves

In the subsequent section, the seven models including the conventional method, EA, RF, the MHBMO algorithm, ANN, MGGP, and the hybrid MGGP-GRG model, which were used to develop rating curves, are presented.

###### 2.2.1. Conventional Method

The general form of a discharge rating curve is based on Manning’s equation. This stage-discharge relationship is generally expressed in the following equation [24]:where is a stream discharge, is a river stage, is a constant representing the gauge reading corresponding to zero discharge, and and are the rating curve parameters.

In the conventional method, the value of is determined by the straight line fitted to the curve of versus . This curve is commonly plotted by adopting a trial-and-error process by playing with the value of . In this regard, the transformed logarithmic form of equation (1) is given in the following equation:

By defining , , , and , equation (2) provides a straight relation between and as is *Y *= *AX* + *B*. The values of and can be calculated by using the conventional regression analysis (the least square method).

###### 2.2.2. Evolutionary Algorithm

EA is a population-based probabilistic method with a bioinspired principle of survival of the fittest. This algorithm can be used to solve highly nonlinear problems with less than 200 decision variables [29]. In essence, EA is a combination of evolutionary algorithms, GA, and classical optimization methods. In this zero-order algorithm embedded within Microsoft Excel, a population is initialized randomly. Thereafter, it is subjected to the selection, recombination, and mutation iteratively until it finds a better fit solution in the given time regarding a set of constraints specified.

In the present study, the Excel Solver has been used to optimize the parameters of the rating curve relation shown in equation (1). To obtain the optimal values of the decision variables (*K*, *a*, and *n*), minimization of sum of square of errors (SSE) was set as the objective function. A constraint was put on the variable *a* so that its optimal value is less than or equal to the stage corresponding to the minimum observed discharge. The mutation rate was set as 0.05, while the population size and convergence criterion were fixed as 50 and 0.00001, respectively.

###### 2.2.3. Rise and Fall (RF) Method

Several attempts have also been made by various researchers to address the issues related to the measurement of unsteady flow. During floods, higher discharges are observed when flood waves propagate through the gauging site, while lower discharges are observed for the same stage when the flood wave recedes. This subsequently leads to different relationships between stages and discharges during the rising and falling limbs of flood hydrographs. Since the failure probability of hydraulic structures increases during high discharges, it is vital to distinguish different behaviours of water levels in different limbs of a flood hydrograph. In this regard, separate rating curve equations are fitted to the rising and falling limbs of stage-discharge flood hydrographs, commonly known as the rise and fall (RF) method. Therefore, separate rating curves will be required to be developed to comply with changes in the stage-discharge relationship.

###### 2.2.4. The MHBMO Algorithm

The MHBMO algorithm is a zero-order optimization algorithm that simulates the mating process of honey bees between a queen and drones of a generation. This simulation process mainly searches for the best honey bee (queen) by producing successive generations, which are improved in the light of the fitness criterion through the mating process. Similar to other metaheuristic algorithms, the MHBMO algorithm has several controlling parameters, which have been set in accordance with that of previous studies in the literature [30]. Based on the literature, it is the first time that the MHBMO algorithm has been used for developing rating curves.

###### 2.2.5. Artificial Neural Network

Artificial neural network (ANN) serves as an estimation tool in various fields of research. In essence, it comprises an organized network, which contains three main layers named as input layer, hidden layer, and output layer. Each layer consists of a specific number of neurons, which are permitted to have connections exclusively with other neurons placed in different layers [31]. This principle dictates that the input data, like the output data, should be independent variables. The former and latter data are commonly placed in the input and output layers, respectively. On the other hand, the neurons of the hidden layers play the role of transferring the input data to the output data. Finally, the data flow in the ANN architecture continues until an adequate relationship, which satisfies a desirable accuracy, is obtained [32].

In this study, the input and output data for each dataset consist of normalized stages and normalized discharges, respectively. For the training process of ANN, normalized variables were utilized instead of dimensional variables to capture the relationship between the input and output data more accurately, while normalized variables were turned into ones with dimensions for comparison purposes. Additionally, a feed-forward backpropagation ANN with the Levenberg–Marquardt optimization algorithm was used, while the ANN controlling parameters, except the minimum gradient parameter, were assumed the same as those adopted in the literature [33]. The best value for the minimum gradient parameter was determined by a trial-and-error procedure as it was found that the value of this parameter may have a major impact on the results. Finally, ANN used in this study has a three-layer network, which contains one, ten, and one neurons in the input, hidden, and output layers, respectively.

###### 2.2.6. Multigene Genetic Programming

Genetic programming (GP) is categorized as one of the AI models and as a modified variant of GA. In the latter context, GA works as the search engine of GP to strive for finding a relationship between two sets of variables. To be more precise, GP not only benefits from the random-based searching characteristics of GA but also overcomes one of its limitations, which is known to be the incapability of determining a relation between input and output data associated with a problem with a high order of complexity. The latter advantage of GP over GA is provided with the use of a tree-like configuration [33].

Among various versions of GP, MGGP contains a user-defined number of genes, which enables the implementation of various types of functions and mathematical operators in the quest for a suitable relation between the data under consideration. MGGP exploits GA as an optimization engine to minimize a defined objective function [34]. In the process of solving an optimization problem using MGGP, the main steps of GA, which includes initialization, selection, reproduction, and termination, are successively repeated [35]. To be more specific, it commences with the creation of an initial population. This randomly generated population basically includes different functions and terminals, which are subjected to GA operators, such as crossover and mutation. This process is repeated until one of the termination criteria, which are the maximum number of generation and fitness termination value, is met [36]. Finally, MGGP assigns a weight to each member of the GP population containing a number of trees or genes, while the MGGP output is a linear combination of these trees.

In this study, a MATLAB version code of MGGP, which was adopted from the literature [34, 37], was used. The objective function of this code is minimizing the root mean square of errors between the estimated and observed sets of normalized discharges, while normalized stages were utilized as the input data. Furthermore, the same data division considered for ANN was applied to MGGP. Also, Table 2 summarizes the MGGP parameters used in this study. Since MGGP attempts to solve a defined optimization problem, each run of MGGP may lead to a different equation. Hence, many (more than 50) runs of MGGP were conducted for each dataset, and the relation with the best fitness value was selected as the final MGGP result.

###### 2.2.7. Hybrid MGGP-GRG Method

The hybrid MGGP-GRG refers to the combination of multigene genetic programming and generalized reduced gradient technique. The latter is a first-order optimization algorithm, which has been combined with other models in the literature [30]. Figure 2 illustrates the detailed flowchart of the hybrid MGGP-GRG method used for developing rating curves. As shown, the best-fit MGGP model is selected in the first stage. Then, the coefficients of the model developed by MGGP are obtained by GRG optimization technique with target cell as minimization of SSE. In this way, the equation obtained by MGGP can be further improved by applying GRG optimization technique. Based on the current literature, it is the first time that the hybrid MGGP-GRG has been proposed.

##### 2.3. Performance Evaluation Criteria

The developed ratings obtained by different models were compared on the basis of several performance evaluation criteria including (1) SSE (equation (3)), (2) Nash–Sutcliffe efficiency (NE) (equation (4)), (3) relative error (RE) (equation (5)), (4) mean absolute relative error (MARE) (equation (6)), and (5) maximum absolute relative error (MXARE) (equation (7)). In this regard, the model with the highest efficiency and lowest error will be the best-fit model:where and are the *i*^{th} observed and estimated discharges for each dataset, respectively.

#### 3. Results

The seven methods described in the previous section were used to develop rating curves for all datasets separately. For this purpose, the training part of each dataset was used for (1) calibrating equation (1) using the conventional method, EA, RF, and the MHBMO algorithm and (2) training the AI and hybrid models. The rest of the data in each dataset (test data) were utilized for quantifying the performances of different methods. The results achieved by different methods are presented as follows.

##### 3.1. Results of the Conventional Method, EA, and the MHBMO Algorithm

The rating curve parameters of equation (1), which were obtained by the conventional method, EA, and the MHBMO algorithm, are reported in Table 3. As shown, although these three models applied to the same relation for discharge rating curves (equation (1)), they yielded completely different values of rating curve parameters for Dataset 3, Dataset 5, and Dataset 7. Furthermore, applying EA and the MHBMO algorithm to Dataset 1, Dataset 2, Dataset 4, and Dataset 6 resulted in almost similar optimal parameters of equation (1), while the parameters calibrated by the three methods (the conventional method, EA, and the MHBMO algorithm) are quite close for Dataset 8. The discrepancy in this parameter estimation mainly arises because the parameter in the conventional method is obtained from the best fit of a straight line through a trial-and-error procedure, and the estimates of the other two parameters (and ) are dependent on the value of the parameter . On the other hand, all the rating curve parameters calibrated by EA and the MHBMO algorithm are optimized simultaneously to obtain the global optima. Evidently, the datasets whose parameters gained different values in the calibration process may provide a better perspective when the performances of the conventional method, EA, and the MHBMO algorithm are compared.

##### 3.2. Results of ANN

ANN, as an AI model, leads to a calibrated network for estimating discharges for the train and test data. By conducting many attempts for running ANN, it was found that the minimum gradient parameter in the training process of ANN has a major impact on the discharges estimated by ANN. Therefore, a trial-and-error procedure was adopted to find appropriate values of this controlling parameter.

##### 3.3. Results of MGGP

MGGP provides explicit equations in terms of stage so that discharges can be directly estimated. In the proposed equations, the stage and discharge values were normalized before applying MGGP to achieve better results. For instance, the observed discharges of the *i*^{th} dataset () were normalized by , where , , and are the normalized, minimum, and maximum discharges of the *i*^{th} dataset. The explicit equation obtained by MGGP for estimating normalized discharges of the *i*^{th} dataset is a function of the normalized water depth (). After the dimensionless discharges predicted by the MGGP-based models were transformed back to the dimensional variable (the estimated discharge), they were compared with the corresponding observed values. The equations developed by MGGP are presented in the following for the eight datasets:(1)For Dataset 1:(2)For Dataset 2: where and .(3)For Dataset 3:(4)For Dataset 4:(5)For Dataset 5:(6)For Dataset 6: where the second term in the right-hand side of equation (13) is zero when .(7)For Dataset 7: where the first term in the right-hand side of equation (14) is zero when .(8)For Dataset 8:

##### 3.4. Results of the Hybrid MGGP-GRG Model

The hybrid MGGP-GRG model modified the coefficients of the equations obtained by MGGP. These equations are given in the following for the eight datasets:(1)For Dataset 1:(2)For Dataset 2:(3)For Dataset 3:(4)For Dataset 4:(5)For Dataset 5:(6)For Dataset 6: where the second term in the right-hand side of equation (21) is zero when .(7)For Dataset 7:(8)For Dataset 8:

##### 3.5. Comparison of Different Methods for Single-Value Rating Curves

The discharges estimated by the conventional method, AE, the MHBMO algorithm, ANN, MGGP, and the hybrid MGGP-GRG method are compared with the observed ones using four performance criteria for the single-value rating curves. The results of this comparative analysis are reported in Table 4 and Figure 3 for both train and test data. According to Table 4 and Figure 3, the results obtained by the conventional method are the most inferior for the first four datasets. Since the coefficients determined by EA and the MHBMO algorithm are the same for Dataset 1 and Dataset 2, the performances of these methods are the same for these data. Furthermore, the application of the MHBMO algorithm to Dataset 3 and Dataset 4 slightly improves discharge ratings estimated by EA because it relatively resulted in lower SSE, MARE, and MXARE and higher NE. According to Table 4, ANN improved SSE of the test part of Dataset 1 and SSE of the train parts of Dataset 3 and Dataset 4 comparing to the MHBMO algorithm, whereas the MHBMO algorithm achieved better SSE for the test part of Dataset 3. Based on Figure 3, the MXARE value calculated by ANN is better than that of the MHBMO algorithm for the test part of Dataset 1, whereas the latter has better MXARE and MARE for Dataset 4. The discrepancy between MARE and MXARE obtained by the MHBMO algorithm and ANN is not considerable for Dataset 2 and Dataset 3. Based on the SSE values shown in Table 4, the discharges of the train part of all four datasets estimated by MGGP are much closer to the observed ones in comparison with those of ANN and the MHBMO algorithm, while the hybrid MGGP-GRG improved the SSE values of MGGP for the train part of all single-value rating curves. Furthermore, the MGGP-based model obtained the best SSE for the test part of Dataset 2 compared with ANN and the MHBMO algorithm. Also, the hybrid MGGP-GRG model achieved the best SSE in Table 4 for the test parts of Dataset 2 and Dataset 3, while the best SSE values in this table were computed by ANN and EA for the test parts of Dataset 1 and Dataset 4, respectively. Finally, the lowest MXARE values for the test parts of Dataset 1 and Dataset 3 (shown in Figure 3) were achieved by ANN and the MHBMO algorithm, respectively, while several methods reach the best MXARE values for the test parts of Dataset 2 and Dataset 4.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

For better clarification, Dataset 2 was selected to be focused for further investigations. In this regard, Figure 4 depicts the comparison analysis conducted between different models in terms of relative errors for the test part of Dataset 2. As shown, the conventional method yielded the widest range of relative errors for developing single-value rating curves, whereas the maximum and minimum of relative errors are quite the same in other methods. Also, Figure 4 shows that the patterns of relative errors are quite similar for the applied methods. The similarity of these patterns may reveal that the data belong to a single-value rating curve, whose each stage is designated with a unique discharge.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

##### 3.6. Loop Rating Curves

The performances of different models for the hysteresis-affected data are presented in Table 5 and Figure 5. As shown in Table 5, the highest SSE values were obtained by the conventional method and EA for the training part and the conventional method, EA, and ANN for the test part of Dataset 5. On the other hand, the hybrid MGGP-GRG model achieved the lowest SSE values for both parts of Dataset 5. According to Table 5, the worst SSE for the train part and the best SSE for the test part of Dataset 6 belong to the conventional method, whereas ANN and RF reach the best and the worst SSE values for the train and test parts of Dataset 6, respectively. For the train part of Dataset 7, the conventional method resulted in an SSE value, which is 4.5 times higher than SSE values obtained by other methods. Furthermore, ANN and the hybrid MGGP-GRG method achieved the best SSE value for Dataset 7. Table 5 shows that the conventional method, EA, and the MHBMO algorithm yielded the highest SSE for the test part of Dataset 7, whereas the AI models (ANN and MGGP) and the hybrid MGGP-GRG model significantly improved the SSE of other methods for the corresponding data. Additionally, the best SSE values for the train and test parts of Dataset 8 were achieved by RF and the hybrid MGGP-GRG model, respectively. Moreover, the conventional method achieved the lowest NE for most train and test parts of loop rating curves, whereas the AI models and the hybrid MGGP-GRG method overall performed acceptably and even the best in some cases as shown in Figure 5. Comparing the MARE and MXARE values reported in Figure 5, it is obviously seen that the AI models and the hybrid MGGP-GRG model yielded the lowest values for the train parts of Dataset 5, Dataset 6, and Dataset 7. These methods also resulted in better MARE for the test parts of Dataset 5, Dataset 7, and Dataset 8 and better MXARE for the test parts of Dataset 6 and Dataset 7 than other methods compared in Figure 5. Additionally, the hybrid MGGP-GRG model significantly reduced MXARE for the test part of Dataset 5.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

Figure 6 compares relative errors computed by seven methods for the test part of Dataset 7, whose rating curve is affected by the hysteresis phenomenon. Unlike Figure 4, Figure 6 show that the AI models and the hybrid MGGP-GRG method resulted in relatively smaller ranges of relative errors compared to other methods. Moreover, the patterns of relative errors presented by the former methods are completely different from those achieved by the latter ones. This clearly indicates that equation (1) yielded the same pattern of relative errors for loop rating curves, while the methods used for calibrating equation (1) may have an impact on the maximum and minimum relative errors and not on the corresponding patterns. On the other hand, the AI models and the hybrid MGGP-GRG technique achieved quite the same patterns of relative errors, while the maximum and minimum relative errors are different. Therefore, Figure 6 demonstrates that equation (1), regardless of which method is adopted for the calibration, has limited potential to capture the relation between stages and discharges in loop rating curves, whereas the AI models and the hybrid MGGP-GRG model were found to have more capabilities when it comes to develop hysteresis-affected rating curves.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

Figure 7 compares the confidence limits of different models for the test data of the eight datasets. First of all, the datasets used in this study have a wide range of discharge values as shown in both Figure 7 and Table 1. Based on Figure 7, the most considerable discrepancy is between the confidence limits of Dataset 6, Dataset 7, and Dataset 8. By comparing the confidence limits presented in Figure 7, it can be observed that the confidence limits obtained by the AI models and the hybrid MGGP-GRG model are closer to the observed values compared to other methods. Based on the comparison carried out in Tables 4 and 5 and Figures 3–5, it can be concluded that MGGP and the hybrid MGGP-GRG method not only provided explicit equations between the stage and discharge for each dataset but also significantly improved discharge estimations by developing precise stage-discharge relations, particularly for the loop rating curves.

**(a)**

**(b)**

**(c)**

**(d)**

**(e)**

**(f)**

**(g)**

**(h)**

#### 4. Discussion

Characterization of the stage-discharge relationship at different river sections has remained an important aspect of hydraulic as well hydrologic studies. Initially, Tawfik et al. [18] employed ANN to simulate hysteresis-affected discharge rating curves. Since Tawfik et al.’s [18] contribution, ANN has remained a preferred choice to address the stage-discharge curves. Ghimire and Reddy [38] suggested that GA is more efficient than model tree and the conventional method for developing rating curves. Azamathulla et al. [23] proposed the application of gene expression programming (GEP) and the classical GP, which are both modified versions of GA, for developing discharge rating curves. They concluded that the efficiency of GEP is higher than that of GP for this purpose. Recently, Zakwan et al. [4] asserted that EA is as efficient as GA for determining stage-discharge relationships. Progressing in a similar direction, the present study has explored the capability of MGGP and the hybrid MGGP-GRG model as compared to the conventional method, EA, RF, the MHBMO algorithm, and ANN to develop stage-discharge curves. Unlike most of the earlier studies, this study examined the proposed models for their efficiency in case of steady as well as unsteady flow conditions by considering four datasets for steady and four datasets for unsteady flow conditions.

The application of MGGP to problems in water resource field is limited to very recent investigations. MGGP and hybrid models incorporating MGGP have been already used for several applications like estimating sediment transport [39], forecasting monthly streamflow [40], developing a rainfall-runoff model to predict streamflow [41], river flow forecasting [42], and predicting longitudinal dispersion coefficients [43]. In almost all of these applications, MGGP has been found to be an accurate estimation tool, particularly when the problem under investigation is complicated. According to the conducted literature review, it is the first time that MGGP has been utilized for developing rating curves, while the hybrid MGGP-GRG method has been introduced for the first time in this study.

When machine learning techniques are used as estimation models, one of the challenges is to take care of the overfitting. In this study, a considerable number of runs of the AI models were conducted. Afterwards, the achieved results were checked and the ones that performed well for the test data alone (and not well for the train data) were excluded because they did not generalize well from the train data to the test data. According to Tables 4 and 5, the results of the AI models are acceptable for both train and test parts of the data for each dataset considered.

As previously mentioned, various functions were considered in applying MGGP to develop stage-discharge relations for each dataset considered in this study. The opportunity to implement a desirable series of functions is indeed one of the main characteristics of MGGP, which makes it a powerful estimation tool. Because of the availability of various built-in functions in MGGP, several combinations of functions were considered to determine the best-fitted rating curves. As a result, MGGP may result in different equations comprising of different types of functions, which yield different numbers of accurate equations for different databases as shown in equations (8)–(15). This feature can be beneficial particularly when the user is not sure about which types of equation are suitable for a specific set of data. The entire process of the application of MGGP and the hybrid MGGP-GRG technique can be found in Figure 2.

Tables 4 and 5 and Figures 3 and 5 present quantitative and qualitative comparative analysis of the performances of different techniques used in the present study. Although much variation is not seen in NE values presented in Figures 3 and 5, there have been significant discrepancies between SSE values obtained by different methods in Tables 4 and 5. To be more specific, the hybrid MGGP-GRG model improved SSE values obtained by the conventional method from 4.5% (for the test part of Dataset 8) to 99% (for the test part of Dataset 3), while the conventional method achieved the second best and the best SSE values for the test parts of Dataset 1 and Dataset 5, respectively. Also, Tables 4 and 5 indicate that the reduction of SSE as a result of combining MGGP with GRG varies between 0.16% (for the test part of Dataset 6) and 7.9% (for the test part of Dataset 4). This improvement reveals that the hybrid MGGP-GRG method performed better than MGGP in terms of SSE for all datasets considered in this study.

By comparing the results shown in Tables 4 and 5 and Figures 3 and 5, it is not evident that which technique outperformed others in all cases. In order to determine which model performs the best, a ranking system was adopted from the literature [32] to evaluate different methods based on the criteria and datasets considered. In essence, this ranking system assumes an equal weight for each criterion. For the test part of each dataset, an integer value from one (the best performance) to seven (the worst performance) was designated to each model based on its performance in terms of each criterion (SSE, NE, MARE, and MXARE). As a result, each model has four ranking values for each dataset. The algebraic summation of these four ranking values is basically used to assign a ranking number to each model for each dataset. The detailed process of this ranking system is presented in Supplementary Materials for each and every dataset considered in this study. The ranking numbers achieved for each model applied to the data with single-value rating curves are given in Table 6. The summation of the ranking numbers of each model is also computed in Table 6, which is called summation of ranks. It was utilized to determine the total rank, which delineates the overall performance of each method for all data and criteria considered. According to Table 6, the hybrid MGGP-GRG model obtained the best summation of ranks and the best ranking numbers for data whose rating curves indicate a one-to-one relation between stage and discharge values. This implies that it has the best performance among different methods for single-value rating curves. Moreover, the MHBMO algorithm takes the second place in Table 6. Furthermore, the summation of ranks calculated for EA and ANN is the same, which shows that their performances for developing the stage-discharge relations for single-value rating curves were overall close to one another, while they took the third total ranks. On the other hand, the conventional method yielded a relatively high summation of ranks, and consequently, it was identified as the fourth (worst) method for developing single-value rating curves. Comparing the total ranks of MGGP and the hybrid MGGP-GRG method indicates that the idea of combining MGGP with GRG significantly improved the performances of MGGP in estimating discharges under steady conditions.

The results of the ranking analysis conducted for the hysteresis-affected data are presented in Table 7. As shown, it indicates that the first two best performances for data with loop rating curves are the hybrid MGGP-GRG model and MGGP. Although ANN was identified as the third best method in Table 6, it performs as the worst model in Table 7. By considering the results of Tables 6 and 7, it can be concluded that no method can be found that performs the best for all eight datasets based on the metrics considered. In other words, ANN or MGGP or even the hybrid MGGP-GRG technique is not the best model for all datasets and metrics considered, while the best method may be different from one case to another one. Nevertheless, Tables 6 and 7, which show the ranking results for the datasets and metrics considered, demonstrate that the hybrid MGGP-GRG method overall performs the best in developing rating curves considered in this study.

GP and its variants (like MGGP) can be used as a sensitivity analysis tool because they have the ability to formulate and structure the equation used for forecasting [40]. For the case of this study, the discharge (output) is exclusively a function of stage (input). Hence, the sensitivity analysis does not give any perspective on which independent variable has the highest impact and which one has the lowest impact on the dependent variable based on the proposed MGGP-based models.

When MGGP is applied to develop stream rating curves, functions, parameters, and the structure of the stage-discharge relation do not need to be known in advance, whereas the conventional method, EA, RF, and the MHBMO algorithm require assuming a specific formula for the stage-discharge relationship. In other words, MGGP requires to know neither the equation form nor functions comprising the relation between the input and output data, while it exploits a set of built-in functions to construct an adequate relation. This is one of the advantages of MGGP, which enables developing a stage-discharge relationship without shape limitation. It also allows the user to decide about the inclusion of each built-in function in the estimation process, while it does not provide the chance to add new functions to the list. This may be counted as one of the major drawbacks of the version of MGGP used in this study. Since MGGP benefits from a search-based optimization algorithm, it may yield a different result in each run. Therefore, it requires running a considerable number of times to make sure that the best results have been obtained. This may be accounted as one of the shortcomings of MGGP. In this regard, several studies suggested coupling MGGP with another program or optimization algorithm not only to improve its efficiency but also to tailor it to their specific problem [40]. In this regard, the hybrid MGGP-GRG model has been proposed for the first time in this study, while it improved the performance of MGGP in this particular application.

MGGP provides the opportunity to choose between the precision and complexity of the stage-discharge relation. This trade-off can be addressed by tuning the several controlling parameters playing the key roles in MGGP. The two controlling parameters, which have a major impact on the accuracy and simplicity of the final results, are the maximum genes allowed in each individual and the depth of trees [35]. These parameters should be set based on either previous studies or a sensitivity analysis because the poor evaluation of these parameters may have a significant influence on the final results.

MGGP is basically a flexible estimation tool that seeks the best relation between any given input and output data regardless of the physical meaning or theoretical background of the data. This feature enables us to apply MGGP to various applications. Combining MGGP with other optimization algorithms can improve the performance of this artificial intelligence technique. Thus, application of different hybrid MGGP-based models can be explored in various sectors of engineering and allied sciences.

#### 5. Conclusions

Rating curves have wide applications in water resource engineering. Numerous studies have been carried out to improve the accuracy of stream rating curves using optimization algorithms as well as AI models. In the present study, an attempt was made to develop ratings considering four datasets under steady (single rating curve) and four datasets under unsteady (loop rating curve) conditions based on the conventional method, EA, the MHBMO algorithm, ANN, MGGP, and the hybrid MGGP-GRG method. Rating curves for the four sites affected by hysteresis (loop rating curves) were also developed by the RF method. The performances of these techniques were compared based on the commonly used performance indicators and error indices. Comparing the performances of different methods applied to the eight datasets indicated that none of the methods outperformed in each and every case. For instance, the conventional method achieved the second best SSE for the first dataset and the best SSE for the sixth dataset, whereas the hybrid MGGP-GRG model improved SSE obtained by the conventional method between 4.5% and 99% for the rest of the datasets. To assess the overall performances of these techniques, a ranking system with equal weight designated to each metric was adopted from the literature. It was observed that the discharges estimated by the hybrid MGGP-GRG model were in good agreement with the observed data for both single and loop rating curves as it achieved the first ranking place for both scenarios. Based on the comparative analysis, the conventional method provided the worst estimates of discharge, whereas the MGGP-based models yielded the most precise discharge values based on SSE, MARE, and MXARE in most of the cases. However, this significant improvement was obtained by the compensation of relatively more complex equations, which can be directly used to predict discharges. The MHBMO algorithm and ANN also performed satisfactorily in most of the cases. Additionally, ANN performed satisfactorily during training, but failed to produce reliable discharge estimations during testing of several cases, which is probably due to overfitting during the training phase. Comparing the confidence limits of different methods with the observed data showed that the discrepancies between estimated and observed discharges are much evident for the hysteresis-affected rating curves than those of single ones. Moreover, MGGP provides several explicit equations between the stage and discharge that can be selected depending on the requirements of accuracy and simplicity. Also, one of the advantages of MGGP is that there is no need to assume both functions and parameters of the stage-discharge relation, while tuning of its parameters requires a bit of trial-and-error process, like any other AI models. However, application of MGGP not only requires a number of trials before the best results could be obtained but also may involve relatively more complex equations than that used by the conventional method in this study. Application of MGGP and hybrid MGGP-based techniques can be explored in the field of water resource engineering, especially in the field of sediment transport, as it involves highly nonlinear and complex equations. Finally, the results of this study demonstrated that a combination of MGGP with another optimization algorithm like GRG can improve the performance of this AI model in water resource applications.

#### Data Availability

Data may be obtained upon request to authors as authors are not entitled to provide the data openly.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Supplementary Materials

This appendix presents the detailed calculation process of the ranking system applied to different methods for developing stage-discharge rating curves.* (Supplementary Materials)*