#### Abstract

Since Bayesian Model Averaging (BMA) method can combine the forecasts of different models together to generate a new one which is expected to be better than any individual model’s forecast, it has been widely used in hydrology for ensemble hydrologic prediction. Previous studies of the BMA mostly focused on the comparison of the BMA mean prediction with each individual model’s prediction. As BMA has the ability to provide a statistical distribution of the quantity to be forecasted, the research focus in this study is shifted onto the comparison of the prediction uncertainty interval generated by BMA with that of each individual model under two different BMA combination schemes. In the first BMA scheme, three models under the same Nash-Sutcliffe efficiency objective function are, respectively, calibrated, thus providing three-member predictions ensemble for the BMA combination. In the second BMA scheme, all three models are, respectively, calibrated under three different objective functions other than Nash-Sutcliffe efficiency to obtain nine-member predictions ensemble. Finally, the model efficiency and the uncertainty intervals of each individual model and two BMA combination schemes are assessed and compared.

#### 1. Introduction

To date, various hydrological models have been put forward and widely used in flood forecasting, planning, and water resources management [1, 2]. Since different models have strengths in capturing different aspects of the real world processes, combining the results from diverse models by weighting procedures can present a better performance than any individual model [3–5]. The early model combination researches in hydrologic forecasting employed such tools as neural network [6] and fuzzy system [7]. Recently, Bayesian Model Averaging (BMA), a method for averaging over different competing models, has been introduced to ensemble hydrologic predictions.

Bayesian Model Averaging came to prominence in statistics in the mid-1990s, and Madigan and Raftery [8] were the first to propose this method for combining predictions. Subsequently, Raftery [9] and Draper [10] gave more detailed discussion about BMA. It has been applied in diverse fields such as economics [11], biology [12], ecology [13], public health [14], toxicology [15], meteorology [16], and management science [17]. In many case studies, BMA produces accurate and reliable predictions and was shown to be a better scheme than other model-combining methods [18–20]. In recent years, hydrologists have also applied BMA to hydrologic modeling, such as groundwater [21] and rainfall-runoff modeling [22–24].

A prediction from a single model has been recognized to be associated with a certain degree of uncertainty, and so is the prediction from combining a number of different single models. Thus, uncertainty analysis is an indispensable element for any hydrologic modeling study. The uncertainty usually arises from errors during the calibration of parameters, the design of model structure, and measurements of input and output data [25, 26]. To account for these uncertainties, many uncertainty analysis techniques have been developed and applied to diverse catchments, such as Generalized Likelihood Uncertainty Estimation (GLUE), Parameter Solution (ParaSol), and Bayesian inference based on Markov chain Monte Carlo (MCMC) [27, 28]. Each of those techniques has its own advantage in uncertainty analysis. In the uncertainty analysis of BMA scheme, the composition of Monte Carlo method [29] is used to generate BMA probabilistic ensemble predictions, and then the 90% uncertainty intervals can be derived within the range of the 5% and 95% quantiles.

Previous studies of BMA in hydrology mostly focused on the comparison of the BMA mean prediction with each individual model’s prediction, to prove the better performance of the prediction after weighted averaging. As BMA also has the ability to provide a statistical distribution of the quantity to be forecasted, the research focus in this study is shifted onto the comparison of the prediction uncertainty interval generated by the BMA with that of each individual model, in order to see if BMA can also improve the prediction reliability. The technical route of the research in this paper is described in Figure 1. Another purpose of this paper is that by calibrating different hydrological models under different objective functions, each of which has distinctive advantages in better modeling certain flow ranges, we can construct different sets of ensemble members for combination in order to fully explore the superiority of BMA. Therefore, two kinds of BMA combination schemes are designed, analyzed, and compared. In the first BMA scheme, we calibrate each of the three models under the same Nash-Sutcliffe efficiency objective function, thus providing three-member predictions ensemble for the BMA combination. In the second BMA scheme, three different objective functions other than Nash-Sutcliffe efficiency are adopted, each of which is supposed to have some advantage of better simulating a certain range of flows (low flow, medium flow, and high flow). All three models are, respectively, calibrated for each of three objective functions to obtain the optimized parameter sets.

#### 2. Methods

##### 2.1. Bayesian Model Averaging

Bayesian Model Averaging (BMA) is a statistical technique designed to infer a prediction by weighted averaging over many different competing models. This method is not only a scheme for model combination but also a coherent approach for accounting for between-model and within-model uncertainty [22]. Below is a brief description of the basic ideas of this method.

Let us consider a quantity to be predicted on the basis of input data ( denotes the input forcing data, and stands for the observational flow data). is the ensemble of the -member predictions. The probabilistic prediction of BMA is given by

The terms in (1) are explained as follows. is the posterior probability of the prediction given the input data and reflects how well model fits . Actually is just the BMA weight , and better performing predictions receive higher weights than the worse performing ones; all weights are positive and should add up to 1. is the conditional probability density function (PDF) of the predictand conditional on and . For computation convenience, is always assumed to be a normal PDF and is represented as , where is the variance associated with model prediction and observations . In order to make this assumption valid, some techniques such as Box-Cox transformation are needed to make the data approximately normally distributed and to narrow the data range.

The BMA mean prediction is a weighted average of the individual model’s predictions, with their posterior probabilities being the weights. In the case that the observations and individual model predictions are all normally distributed, the BMA mean prediction can be expressed as

##### 2.2. EM Algorithm for BMA Parameter Estimation

To estimate BMA weight and model prediction variance , the Expectation-Maximization (EM) algorithm, which has proved to be an efficient technique for BMA calculation based on the assumption that -member predictions are normally distributed, is described in this section [23].

Firstly, if we denote the set of BMA parameters to be estimated by , the log form of likelihood function can be represented as

It is difficult to maximize the function (3) by analytical method. The EM algorithm is a method for finding the maximum likelihood by alternating between two steps, the expectation step and maximization step. The two steps are iterated to convergence when there is no significant change between two consecutive iterative log-likelihood estimations. In EM algorithm, a latent variable (unobserved quantity) is used as an assistant for estimating BMA weight . The procedure of EM algorithm for BMA scheme is described as follows.

*(**1) Initialization. *Set .

Initialize where is the number of iteration and is the number of data in the calibration period. and are denoted as the observation and the corresponding prediction by the th model for the time .

*(**2) Calculate the Initial Likelihood:*

*(**3) Compute the Latent Variable.* Set , then calculate

*(**4) Update the Weight:*

*(**5) Update the Variance:*

*(**6) Update the Likelihood:*

*(**7) Check for Convergence.* If is less than a prespecified tolerance level, stop the whole estimation procedure; else go back to Step (3).

##### 2.3. Estimation of Prediction Uncertainty Interval

After BMA weight and prediction variance being estimated, we use the composition of Monte Carlo method to generate BMA probabilistic predictions for any time [29]. The procedures are described as follows.(1)Generate an integer value of from with probability . A specific procedure is described as follows.(1a)Set the cumulative weight and compute for .(1b)Generate a random number between 0 and 1.(1c)If , it indicates that we choose the th member of the ensemble predictions.(2)Generate a value of from the PDF of . Here, represents the normal distribution with mean and variance .(3)Repeat the above steps (1) and (2) for times. is the probabilistic ensemble size. In this paper, we set .

After generating the BMA probabilistic ensemble predictions, sort them in the ascending order. Then the 90% uncertainty intervals can be derived within the range of the 5% and 95% quantiles.

For each individual model in the BMA scheme, the prediction uncertainty interval can also be constructed, with the Monte Carlo sampling method still being used to approximate the assumed PDF of .

#### 3. Materials

##### 3.1. Study Area and Data

The study area is Mumahe catchment, a branch of Han River. It is located in Shanxi Province of China and the total area is 1224 km^{2}. The basin has a subtropical climate, and the area is humid with fairly high precipitation. The mean annual rainfall for the period of 1980–1987 is 1070 mm, and the mean annual runoff is 687 mm, or roughly 64% of the annual rainfall. The hydrological data include daily runoff, rainfall, and evaporation. There are 2992 data points in total, and 1825 (the period of 1980.1.1–1985.12.31) of them are used for calibration, while the rest 1167 data points (the period of 1986.1.1–1987.12.31) are used for validation.

##### 3.2. Hydrological Models and Optimization Algorithm

In this study, three conceptual hydrological models are employed for testing the capability of BMA: the Xinanjiang Rainfall-Runoff Model (XAJ), the Soil Moisture Accounting and Routing Model (SMAR), and SIMHYD Rainfall-Runoff Model.

Xinanjiang Rainfall-Runoff Model was developed in 1970s. It is a conceptual hydrologic model, which has been widely used in humid and semihumid regions of China. And all the 15 parameters of this model have strong physical meanings. SMAR model is a lumped conceptual model with soil moisture as a central theme. The model consists of two components in sequence: a water balance component with 5 water balance parameters and a routing component with 4 routing parameters. SIMHYD model is a daily conceptual model that estimates daily stream flow from daily rainfall and areal potential evapotranspiration data and it contains 7 parameters [30]. For calibrating these hydrological models, Shuffled Complex Evolution (SCE-UA) method is employed here for parameter optimization [31].

##### 3.3. Objective Functions

The selection of objective function (OF) is of great importance since it will have great influence on the values of calibrated parameters and thus on simulation results of the rainfall-runoff model. Different objective functions can be adopted for different kinds of practical issues. For example, the objective function of squared model errors of squared transformed flow can be applied in high flow studies, and the objective function of squared model errors of logarithmic transformed flow can be applied in low flow studies [32]. In this study, four objective functions have been used for the parameter calibration.

*(**1) OF1: The Nash-Sutcliffe Coefficient of Efficiency (**):*
where and are observed and simulated data at time and is the average of observed data in the calibration period.

*(**2) OF2: Mean Squared Error of Squared Transformed (**):*
Transforming the observed data in squared form puts great emphasis on fitting peak values.

*(**3) OF3: Mean Squared Error of Squared Root Transformed (**):* can be employed in the medium flow simulation.

*(**4) OF4: Mean Squared Error of Logarithmic Transformed (**):*
This transformation helps model parameterization to better fit the low flow values.

##### 3.4. Construction of BMA(3) and BMA(9) Schemes

When the prediction data are highly non-Gaussian, we should firstly transform the data to be normally distributed by Box-Cox transformation before using EM algorithm. OF1 is the most widely used objective function for parameter optimization and is used in calibrating each of three hydrological models mentioned above to generate three different predictions. We combine these three different predictions by BMA to construct a three-member predictions ensemble; thus, we denote the first BMA scheme as BMA(3). Figure 2 shows the procedure of BMA(3) combination scheme. The other three objective functions, that is, OF2, OF3, and OF4 are, respectively, fit for high, medium, and low flow simulation. All three hydrological models are, respectively, calibrated for each of these three objective functions to obtain the optimized parameter sets. As the same model with different parameter sets will give rise to different outcomes, nine different predictions are generated. We can use BMA method to combine these nine different predictions to construct a nine-member predictions ensemble, which is just the second BMA scheme denoted as BMA(9). The procedure of BMA(9) combination scheme is described in Figure 3.

Let denote the uncertainty of the forecast, and it can be written as , including three components, that is, the high flow simulation uncertainty , the medium flow simulation uncertainty , and the low flow simulation uncertainty . In BMA(9), the forecasts which are generated under OF2 have relatively small , so they can get higher weights than other forecasts in high flow simulation. Similarly, the forecasts generated under OF3 have relatively high weights in medium flow simulation, while the ones generated under OF4 have higher weights than others in low flow simulation. By averaging the forecasts from a set of different combinations of hydrological model and objective function, the advantage of BMA(9) is its ability to reduce the simulation errors by giving weights to each of the nine-member forecasts according to their performance in different flow ranges.

##### 3.5. Performance Criteria for Evaluating the Mean Prediction

There are three indices for evaluating the mean prediction.

*(**1) The Nash-Sutcliffe Coefficient of Efficiency (**). *The definition of has expressed in (10). is not only an objective function but also a widely used performance criterion. It ranges from minus infinity to 1.0, with higher values indicating better agreement. It is difficult to evaluate the performance of the model with in all flow ranges, since the value of is always negative in the medium flow range.

*(**2) Daily Root Mean Square Error (**):*
where and are observed and simulated data at time . is sensitive to the differences between the observations and simulations. The lower the value is, the better the prediction performance is.

*(**3) Relative Error of Total Runoff (**):*
It reflects the performance in the simulation of the total runoff amount. Lower values of indicate better agreement of total surface runoff.

##### 3.6. Performance Criteria for Assessing the Prediction Uncertainty Interval

Xiong et al. [33] have presented a set of indices for assessing the prediction uncertainty intervals generated by the uncertainty analysis methods. Three main indices are selected here to assess the prediction uncertainty intervals produced by BMA schemes as well as from each individual hydrological model.

*(**1) Containing Ratio (**).* The containing ratio is used for assessing the goodness of the uncertainty interval. It is defined as the percentage of observed data points that are covered in the prediction bounds.

*(**2) Average Band-Width (**).* Consider
where and are denoted as upper and lower prediction bounds at time . The average band-width is also an index for measuring the performance of estimated uncertainty interval.

*(**3) Average Deviation Amplitude (**).* The average deviation amplitude is an index to quantify the average deflection of the curve of the middle points of the prediction bounds from the observed streamflow hydrograph. It is defined as
where is the observed discharge at time .

#### 4. Results and Discussion

The weights of individual models in BMA(3) scheme are displayed in Figure 4, while the weights in BMA(9) are showed in Figure 5. Moreover, in order to compare the performance of two BMA schemes in different flow ranges, according to the characteristics of the streamflow values of Mumahe catchment, data are broken into three flow ranges: high flow (top 10%), medium flow (middle 50%), and low flow (bottom 40%).

##### 4.1. BMA(3) Results

We check the mean prediction of BMA(3) using three criteria illustrated in Section 4.1. Results of BMA(3) and its 3 individual models in the mean prediction for the whole flow series are presented in Table 1. In terms of , the mean prediction of BMA(3) can achieve 90.68% in calibration period and 86.98% in validation period, which is better than its best individual model prediction (XAJ). However, in terms of , the mean prediction of BMA(3) performs much worse than its best individual model prediction.

Three indices illustrated in Section 4.2 are used for assessing the prediction uncertainty intervals of both BMA(3) and its three individual models. The results for the whole flow series are also showed in Table 1. It is clear that BMA(3) uncertainty interval has the largest values of and , and almost the smallest , in both calibration and validation periods. In other words, BMA(3) uncertainty interval has better properties than any individual model’s uncertainty interval in terms of and , but worse in terms of . Then we compare the differences between BMA(3) and its individual model in uncertainty interval by the graph. Figure 6 displays the mean prediction and 90% uncertainty interval of both BMA(3) and its 3 individual models for Mumahe catchment in the year of 1983 during the calibration period. The observations of 1983 are shown as dots, and the BMA(3) mean prediction and its individual models’ predictions are represented by solid curve. As the statistical results showed in Table 1, the uncertainty intervals of the individual models have low containing ratio and large deviation amplitude. But the uncertainty interval of BMA(3) is much broader than that of any of its individuals. It can be found from Figure 7 that the results of validation period are similar to that of the calibration period. In general, the uncertainty interval of BMA(3) has better performance than its individual models for the whole flow series.

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(c)**

**(d)**

##### 4.2. BMA(9) Results

Table 2 lists the results of BMA(9) and its 9 individual models in the mean prediction for the whole flow series. And from it we can easily find that in calibration period, the mean prediction of BMA(9) performs better than its best individual prediction according to the value of and , though the mean prediction of BMA(9) does not have any advantage in comparison to its individual model predictions in terms of .

The results of the uncertainty intervals of BMA(9) and its 9 individual models are also listed in Table 2. The containing ratio of BMA(9) uncertainty interval reaches 91.11% in calibration period and 90.23% in validation period, which are much higher than those of the uncertainty intervals of any individual model. The average deviation amplitude of the BMA(9) uncertainty interval is smaller than that of the uncertainty intervals of most of its nine individual models. From Figures 8 and 9, the similar conclusion can be concluded both in calibration and validation periods.

**(a)**

**(b)**

**(a)**

**(b)**

##### 4.3. Comparison of BMA(3) and BMA(9)

The results of both BMA(3) and BMA(9) in terms of the mean prediction and 90% uncertainty interval for the whole flow series are listed in Table 3 for comparison. BMA(3) mean prediction has slightly better performance than BMA(9) mean prediction in terms of and in both calibration and validation periods, while BMA(3) mean prediction is slightly worse than BMA(9) mean prediction in terms of *.* For the uncertainty intervals, some findings are listed as follows: (1) in terms of , BMA(9) uncertainty interval is much higher than BMA(3) uncertainty interval in both calibration and validation periods; (2) in terms of , BMA(9) uncertainty interval is obviously larger than BMA(3) uncertainty interval in both calibration and validation periods; (3) in terms of , BMA(9) uncertainty interval performs slightly better than BMA(3) uncertainty interval in both calibration and validation periods.

Further, we compare the BMA(3) and BMA(9) mean predictions with respect to three flow ranges in Table 4. According to the values of three indices for mean prediction, BMA(3) mean prediction has better performance than BMA(9) mean prediction in high flow range, but has worse performance in medium and low flow ranges, during both calibration and validation periods. Then we compare the uncertainty intervals of BMA(3) and BMA(9) in three different flow ranges and have some findings as follows: (1) the value of BMA(9) uncertainty interval has absolute predominance in comparison with that of BMA(3) uncertainty interval for each of three flow ranges in both calibration and validation periods; (2) the value of BMA(9) uncertainty interval is larger than that of BMA(3) uncertainty interval for all three flow ranges in both calibration and validation periods; (3) the value of BMA(9) uncertainty interval is slightly larger than that of BMA(3) in high flow range but smaller in medium and low flow ranges in both calibration and validation periods.

#### 5. Conclusions

In this paper, the Bayesian Model Averaging (BMA) method is employed to construct a three-member predictions ensemble, denoted by BMA(3), and a nine-member predictions ensemble, denoted by BMA(9), for ensemble prediction as well as for prediction uncertainty analysis. There are three kinds of comparisons made in terms of both mean prediction and prediction uncertainty interval in this study: BMA(3) with its three individual models, BMA(9) with its nine individual models, and BMA(3) with BMA(9). In particular, we break observational flows into three different ranges for detailed comparison and analysis. The performance of two BMA schemes can be summarized as follows.(1)In terms of mean predictions, BMA(3) performs generally better than any of its individual models. And BMA(9) mean prediction has generally higher accuracy than each of its individual model predictions. The comparison between BMA(3) and BMA(9) in mean predictions indicates that BMA(9) does not have any advantage compared to BMA(3) as far as the entire flow series is concerned. The performance of BMA(9) mean prediction is better than that of BMA(3) in both medium and low flow ranges, however, worse in the high flow range.(2)In terms of the containing ratio for assessing the uncertainty intervals, the BMA(3) has a larger value than any of its individual models. And the containing ratio of BMA(9) uncertainty interval is also markedly larger than that of all its individual models when the value is calculated for the whole flow series. When the value is compared for different flow ranges, BMA(9) uncertainty interval performs better than its individual models in high, medium, and low flow ranges. In comparison with BMA(3), BMA(9) uncertainty interval also has absolute predominance in terms of .(3)The average band-width of BMA(3) uncertainty interval is larger than that of all its individuals. And the average band-width of BMA(9) uncertainty interval is even larger than that of BMA(3). It is found that, for uncertainty intervals, the increase of containing ratio is accompanied by the increase of band-width, which has already been pointed out by Xiong et al. [33].(4)The average deviation amplitude of BMA(3) uncertainty interval is generally smaller than the best individual in the ensemble. In terms of , BMA(9) uncertainty interval also has a better performance than the best individual among its nine-member ensemble, especially in high flow range. Moreover, in terms of , BMA(9) uncertainty interval performs better than BMA(3) uncertainty interval in medium and low flow ranges, but worse in the high flow range.

Based on this study, it is found that BMA is a particularly useful method for dealing with two issues. Firstly, when there are two or more competing models or methods available for the same problem, BMA can assess the relative performances of all models by assigning weights to each model or method and then produce more accurate mean prediction by weighted averaging of all predictions from those models or methods. Secondly, BMA can be used when there is uncertainty over control variables. The uncertainty intervals for both individual predictions and the BMA prediction can be derived when the distribution of the data is known or assumed.

Two issues from this study of BMA also need to be pointed out. The first is about the data transformation process. It is obvious that the daily flow data do not strictly obey the normal distribution even after the Box-Cox transformation. In fact, it is impossible to make every prediction from every model be normally distributed by using only a uniform transformation coefficient. Another problem is about the quality of the hydrological models chosen for combination. In this paper, the models employed here are all conceptual hydrological models. If better models are chosen as the ensemble members, then it is expected that the better results will come out of the BMA combination.

#### Acknowledgments

This research is supported by the National Natural Science Foundation of China (Grant nos. 51190094, 51079098), which is greatly appreciated. The comments and suggestions from the editor and the reviewers are very helpful in the improvement of the paper and are greatly appreciated.