Abstract

Single-objection function cannot describe the characteristics of the complicated hydrologic system. Consequently, it stands to reason that multiobjective functions are needed for calibration of hydrologic model. The multiobjective algorithms based on the theory of nondominate are employed to solve this multiobjective optimal problem. In this paper, a novel multiobjective optimization method based on differential evolution with adaptive Cauchy mutation and Chaos searching (MODE-CMCS) is proposed to optimize the daily streamflow forecasting model. Besides, to enhance the diversity performance of Pareto solutions, a more precise crowd distance assigner is presented in this paper. Furthermore, the traditional generalized spread metric (SP) is sensitive with the size of Pareto set. A novel diversity performance metric, which is independent of Pareto set size, is put forward in this research. The efficacy of the new algorithm MODE-CMCS is compared with the nondominated sorting genetic algorithm II (NSGA-II) on a daily streamflow forecasting model based on support vector machine (SVM). The results verify that the performance of MODE-CMCS is superior to the NSGA-II for automatic calibration of hydrologic model.

1. Introduction

Some research has been reported that streamflow processes are affected by several known factors, such as rainfall and soil moisture, and many potential factors. Consequently, the streamflow processes always result in being nonlinear and time-varying. And the differences between the characteristics of high flow processes and low flow processes are very significant most of time. Therefore, it tends to be very difficult to predict the streamflow processes.

Firstly, selecting and constructing a proper model are the great and first important step of the research. Statistics forecast models [1], such as autoregressive (AR) model and autoregressive integrated moving average (ARIMA) model, are used the most. However, these types of models often have poor performance when outside or near the limits of the data [2]. Due to the strong ability of Artificial Neural Networks (ANNs) with nonlinear mapping, they have been employed in the field of streamflow forecasting. However, the ANNs also have several shortcomings, such as overfitting, falling into local optimization, and slow convergence speed, especially when dealing with complex hydrological problems. Support vector machine (SVM), which is proposed by Vapnik [3], is one of the most effective algorithms for prediction in this decade. SVM basically involves solving a quadratic programming problem. In theory, SVM can gain the global optimum solution of the original problem. During the past few years, SVM has been widely employed for streamflow forecasting, and the performance of SVM is shown to be better than ANNs [48]. Thus, we build a daily streamflow forecasting model based on SVM here.

As the model is established, the next vital work is to make calibration of the hydrologic model. Calibration of hydrologic model is to find the optimal parameters which could make the model match the characteristic of the hydrology as good as possible. Usually, the hydrological models are calibrated under the framework of single-objective paradigm. And the major work is focused on choosing the single-objective function and the optimization strategy to optimize that measure. However, several research reports reveal that single-objective functions often cannot properly describe all characteristics of the hydrological models [9]. Therefore, consequently, it stands to reason that multiobjective functions are needed for calibration of hydrologic model. In Gupta et al. [10], the advantages of a multiple-objective representation of the model calibration problem were discussed. Different with single-objective optimization, model calibration with multiobjective functions will result in a set of so-called equal good solutions. And each solution is not dominated by any one solution of the set of “equal good solutions.” This set of “equal good solutions” is called the Pareto front or the set of nondominated solutions. The detail of the definition of Pareto front or nondominated solutions can be referred to [11]. In general, two methods can be used: one is to transform the multiobjective optimization problem to single-objective optimization by setting weight coefficients to each objective function. Although this method is simple to implement, it can only gain a single Pareto solution at a time. The other method is to solve the multiobjective optimization problem directly [10]. In the recent years, a variety of multiobjective optimization algorithms are proposed to optimize hydrologic models. In the research of hydrologic modeling, Yapo et al. [12] presented a multiobjective optimization algorithm MOCOM-UA for multiple-objective global optimization calibration of hydrological models and demonstrated its features and capabilities by a simple example involving calibration of a conceptual rainfall-runoff model using two objectives, and some related research of hydrologic calibration and evaluation presented that the MOCOM-UA can efficiently generate a set of Pareto optimal solutions [10, 1218]; Vrugt et al. [9] developed an effective and efficient algorithm MOSCEM-UA, which can gain a fairly uniform approximation of the Pareto frontier; Khu and Madsen [19] introduced the nondominated sorting genetic algorithm II (NSGA-II) into the multiobjective calibration of the MIKE11/NAM rainfall-runoff model; Gill et al. [20] modified the multiobjective particle swarm optimization (MOPSO) to account for multiobjective problems by introducing the Pareto rank concept. And the application of the new MOPSO in multiobjective calibration of hydrologic model has gained encouraging results; De Vos and Rientjes [21] used the NSGA-II algorithm for calibration of a feed-forward ANN model and the HBV conceptual model to compare the multiobjective performance of the two models; de Vos and Rientjes [22] presented results on the application of MOSCEM-UA, NSGA-II, and two other single-objective optimization algorithms for the training of artificial neural network rainfall-runoff models and showed that more than one objective function can be helpful in constraining the neural network training.

In this paper, a new multiobjective optimization method named multiobjective optimization method based on differential evolution with adaptive Cauchy mutation and Chaos searching (MODE-CMCS) is proposed to optimize a three-parameter support vector machine model for daily streamflow forecasting. And the efficacy of the new algorithm MODE-CMCS is compared with the NSGA-II algorithm on a three-parameter SVM based daily streamflow forecasting model.

This paper is organized as follows: Section 2 provides details of the proposed MODE-CMCS algorithm and a simple introduction about the NSGA-II and Differential Evolution (DE) algorithm is also given; Section 3 describes the performance metrics used for multiobjective algorithm evaluation in this research; in Section 4, the MODE-CMCS is first employed on solving 5 benchmark test problems; Section 5 presents the application of the MODE-CMCS to multiobjective parameter optimization of a three-parameter SVM based hydrological model for streamflow prediction; and conclusions are made in the last section.

2. Multiobjective Optimization Algorithms

In this section, we will give a simple introduction about the NSGA-II, and after that the details of the proposed multiobjective algorithm will be described.

2.1. NSGA-II

NSGA-II, which is an improvement over the NSGA, is first proposed by Deb et al. [23]. It is one of the most effective and efficient algorithms for solving multiobjective problems. The flowchart of NSGA-II is shown in Figure 1. For more details about NSGA-II, readers are encouraged to refer to [23].

Several main features of the algorithm NSGA-II can be summarized as follows:(1)NSGA-II is significantly more efficient than the original version of the algorithm. And NSGA-II can reduce the computational complexity into , where is the number of objective functions and is the size of evolving population.(2)NSGA-II employs a Crowded-Comparison Operator to maintain the diversity of the nondominated Pareto solutions. Each nondominated solution has a new attribute-crowd value, which is calculated through the Crowded-Comparison Operator. And each nondominated solution is ranked by the crowd value. The nondominated solution with higher crowd value is more likely to be preserved to the next population. The diversity of the NSGA-II is superior to the original version of the algorithm.(3)NSGA-II does not have the parameter-share parameter, which is designed in the original version of the algorithm and can hardly set a proper value. This will make the NSGA-II be more flexible and easier to use.

2.2. MODE-CMCS Algorithm

The proposed multiobjective algorithm MODE-CMCS is based on the DE algorithm; thus the DE algorithm will be introduced first, and then the details about MODE-CMCS are presented.

2.2.1. DE Algorithm

DE is firstly proposed by Storn and Price [24, 25]. And it has been applied in several engineering problems [2630]. One format of DE is the so-called DE/rand/1/bin. Similar with other evolutionary algorithms (e.g., GA), DE also mainly contains three basic operations: mutation, crossover, and selection. For DE/rand/1/bin, suppose a population in DE consists of , , where denotes the generation number, and each , which is a D-dimension vector, denotes a possible solution; the details are as follows:

The mutant vectors , are calculated bywhere , , are randomly chosen from the population and is the mutation control parameter that affects the amplification of the differential vector.

Thereafter, the discrete recombination is employed to create vectors , , as follows:where returns a random float number from a uniform random distribution from the section ; returns a random real number from a uniform random distribution from the section ; is the crossover parameter.

DE adopts a greedy selection method, and then the next generation can be gained by

2.2.2. MODE-CMCS

Details of the proposed multiobjective algorithm MODE-CMCS are discussed in this section. The MODE-CMCS mainly contains 7 operators as follows.

(a) Nondominated Sorting and Improved Crowd Distance Assignment Operator. The nondominated sorting method is the same as the NSGA-II. But the MODE-CMCS uses a more precise crowd distance assignment operator. For NSGA-II algorithm, the crowd distance of each point is got by calculating the sum of the two points on either side of this point along each of the objectives. But this method does not take into account the distribution of this point. For example, in Figure 2, suppose that the ranges of objective function 1 and objective function 2 have been scaled to the same scope, according to the crowd distance assignment of NSGA-II; the crowd distance of point A is 12, which is equal to that of point B, while the crowd distance of point C is 6. But it is obvious that the point A has better diversity than point B; that is to say, the crowd distance of point A should be larger than that of point B. Therefore, in this approach, we propose a more precise crowd distance assignment method as follows:where , is the number of individuals, is the number of objective functions, and and are the distances of the th solution to its lower and upper adjacent solution along the th objective, respectively. According to (4), the crowd distance of point A is 228 while those of point B and point C are 211.3196 and 114, respectively. It is evident that the new crowd distance measurement is more precise than that of NSGA-II.

(b) Archive Set Updating Operator. Similar with the improved Strength Pareto Evolutionary Algorithm (SPEA2) [31], the archive set, which is designed to store the nondominated solutions until now, is employed to enhance the algorithm convergence performance. Suppose the archive set is ; the operations on archive set can be expressed as Algorithm 1.

For each non-dominated individual in population
Begin
  If1 is none, then add into directly;
  Else1
  Begin
    If2 is dominated by one member in , then continue;
    Else2
    Begin
      add into and delete the members in dominated by ;
      If3 the length of is larger than the preset size of then
      Begin
        Calculate the crowd distance of individuals in ;
        Delete the individuals with the minimum crowd distance;
      End if3
    End else2
  End else1
End for

(c) Mutation Operator. The mutation operation here is similar to DE. But unlike DE, the random individuals , , and are chosen from the archive set rather than the parent population. As the individuals in the archive set are all nondominated, it can accelerate the convergence of the algorithm to some extent. Furthermore, the mutation control parameter should be set larger at the beginning of the procedure to allow the algorithm with higher global searching ability and be set smaller at the end of the procedure to let the algorithm search within a local scope. Thus, an adaptive dynamic control mechanism for choosing the suitable value of during the evolutionary progress is presented in this paper aswhere is the current generation number and denotes the maximum generation number.

(d) Crossover Operator. The crossover operator is the same as DE’s recombination operator. However, as mentioned in [30], the traditional constant CR cannot completely guarantee the optimization’s ergodicity in the search space. Therefore, in this paper, we adjust the value of CR according to the progress of generation bywhere is the current generation number and denotes the maximum generation number.

(e) Selection Operator. This operator uses the greedy strategy based on the concept of dominate, and it can be described as Algorithm 2.

For each pair of selected candidate individual and parent individual
Begin
  If dominates , then enters the next population ;
  Else if dominates , then enters the next population ;
  Else if and are non-dominated with regard to each other, then
     and both enter the next population ;
End for
If the length of is larger than the preset size
Begin
  Sort the individuals of with non-dominated sorting;
  Calculate the crowd distance of individuals in ;
  Select the first better individuals with regard to their rank and crowd distance as the next population
End if

(f) Adaptive Cauchy Mutation Operator. DE explores new space mainly by mutation and crossover operations. From (1) and (2), we can see that when the diversity of the population decreases to some extent, the algorithm will fall into local optimal solution. To overcome this premature convergence problem, the adaptive Cauchy mutation operator is employed to make algorithm escape out of local optimal area. In the procedure of DEMO-CMCS, we first calculate the diversity of each dimension of the population according to the following equation:where denotes the size of the population, is th dimension of the th individual, is the average of jth dimension of the population, and and are the upper and lower bound of the th dimension, respectively.

If the value is below the preset threshold , the mutation operation will be carried out on this dimension aswhere is th dimension of the th individual, denotes a standard Cauchy variable, is the coefficient of Cauchy mutation, is the current generation number, and denotes the maximum generation number.

(g) Chaos Searching Operator. It is well known that Chaos has the characteristic of ergodicity, stochastic property, and “regularity”; literature [32] has revealed that the Chaos can enhance the search accuracy of algorithm when the searching scope of Chaos is small. Therefore, a Chaos searching operator with a small searching scope is introduced at the end of the procedure. In this research, the Chaos searching is based on the Logistic map, whose equation is aswhere is iteration number; , is a chaotic series between 0 and 1 when .

The procedure of the Chaos searching operator in this paper is as Algorithm 3.

For each individual in archive set
Begin
  Set ;
  Generate a -dimension initial vector ;
  While ( is smaller than the maximum iteration number)
  Begin
    Calculate according to (9);
    Map the to the preset scope [, ] according to:
                  
    Calculate the new individual generated from the current individual according to:
      , where is the shrinkage factor and should be set a small number to ensure that
      the Chaos searches in a small scope;
    If dominates
    Begin
      Replace with ;
      Delete the individuals dominated by ;
      Break while;
    End if
    Else
    Begin
      Replace with ;
    End else
    ;
  End while
End for

With the above 7 operators, the flowchart of the proposed multiobjective algorithm MODE-CMCS can be expressed as Figure 3.

3. Performance Metrics for Multiobjective Algorithm

In this paper, two common indexes are employed. Otherwise, a novel index for evaluating diversity of the solutions is proposed. The details of the three indexes are as follows.

3.1. Generational Distance Metric (GD)

GD is used to estimate the convergence of the algorithm. It measures how far the obtained set of nondominated solutions is from the real Pareto frontier. The definition of GD is aswhere is the number of obtained nondominated solutions and is the minimum Euclidean distance in the objective space between th obtained solution and solutions of the optimal Pareto set.

3.2. Generalized Spread Metric (SP)

The generalized spread metric SP measures the diversity of the obtained set of solutions [23]. It is designed aswhere is the number of obtained nondominated solutions; is the Euclidean distance between th solution and ()th solution in the objective space; is the mean of all distances ; and are the Euclidean distance between the boundary solutions of the obtained nondominated set and the known set of optimal Pareto solutions.

3.3. Generalized Spread Metric Based Correlation Analysis (SPC)

In practical applications, the size of the obtained set of Pareto solutions may be different using different algorithms (in this research, NSGA-II generates 100 Pareto optimal solutions while MODE-CMCS generates 30). At this situation, taking SP as the diversity evaluation indicator is unsuitable and inequitable here, as the SP value is very sensitive with the length of sample. The explanation is as follows.

Suppose and are zero; we have two distance series and . The SP values can be calculated aswhere is the SP value of ; is the SP value of ; denotes the average distance of ; denotes the average distance of .

As and are zero, this means that . Therefore, it is obvious that the SP value is very sensitive with the length of distance series.

To overcome this problem, a novel index SPC is proposed in this paper. We convert the problem into calculating some indicator independent of sample length.

First, a series of distance DIS is constructed base on the definition of as follows:

Second, a series REF with the length is created by equal interval sampling of the uniform distribution on as follows:

Then, the index SPC is defined as calculating the correlation coefficient between DIS and REF; the calculated equation is as

The value of SPC can be used to measure the diversity of generated Pareto optimal solutions, and the larger the SPC is, the better the diversity of the set of Pareto optimal solutions is.

4. Numerical Experiments and Results

In this part, the MODE-CMCS algorithm is employed to solve 5 benchmark problems. And the performance is compared with other reported results.

4.1. Description of the Benchmark Test Problems

ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6, are adopted, and the features of these test problems are given in Table 1.

4.2. Parameter Settings of the MODE-CMCS Algorithm

To make equal comparison with the results in [23], the parameters of MODE-CMCS algorithm are set as follows: dimension of the decision space equals the listed in Table 1; the maximum number of function evaluations is set to 25000; the maximum iteration number of the local Chaos searching is set to 150; population size is 100; size of archive set is 100; the crossover rate is set to 0.8; the mutation rate is set to 0.02; diversity threshold ε is set to 0.1; coefficient of Cauchy mutation η is 0.5; dimension of the objective space is 2.

4.3. Experimental Results

The metrics GD and SP are selected to measure the performance of algorithms. The mean and variance of the values of the two metrics are calculated by MODE-CMCS on 30 individual tests, and the two metrics are shown in Table 2. The best values are marked by boldface. We can find that the MODE-CMCS algorithm performs better than NSGA-II on all benchmark problems.

5. Case Study

In this section, the performance of MODE-CMCS is compared with NSGA-II on calibration of a three-parameter SVM based daily streamflow forecasting model. In this case study, we are especially concerned with the ability of finding the approximate nondominated frontier.

5.1. Description of the Study Area

Figure 4 shows the Yangtze River Basin, which originates in the Qinghai-Tibet Plateau and flows about 6300 km eastwards to the East China Sea. The drainage basin lies between 91°E–122°E and 25°N–35°N, covering a total area of 1808.5 × 103 km2, and its mean annual rainfall is approximately 1067 mm. As the Three Gorges Dam has interests in flood controlling, hydroelectric power generation, agriculture irrigation, and municipal and industrial water supply, the precise streamflow forecasting can make considerable economic benefit and meanwhile be helpful for flood controlling of downstream of the Yangtze River.

5.2. Description of the Streamflow Prediction Model

The daily streamflow forecasting model used in this research is a three-parameter SVM model, which is similar with the SVM model in [20]. The SVM for regression is employed in this paper to build the relationship between input data (measured daily streamflow) and output data (predicted daily streamflow). The kernel function of SVM for regression can map the nonlinearity data into high dimensional feature space. Then in this high dimensional feature space, the nonlinearity data can be described with linearity functions. Also the -insensitive tube is introduced to deal with the existence of fitting errors, and the data located outside of the -insensitive tube is allocated with a penalty coefficient. There are three parameters, penalty coefficient , tolerance , and kernel parameter , of the SVM model needed to be determined through calibration. For one day ahead forecasting, the inputs to the model are the streamflow of the past five days: , , , , and ; the output is the predicted daily streamflow . The inputs are selected based on autocorrelation analysis of streamflow series. To make calibration of this model, the daily streamflow data measured between 2000 and 2002 of the Yichang Station, whose location is shown in Figure 4, is chosen. And the first two years of flow data is used for training; the remaining data is used for testing.

5.3. Selected Objective Functions

The parameters of the daily streamflow forecasting SVM model are optimized by NSGA-II and the proposed MODE-CMCS algorithm with two pairs of objective functions: mean squared logarithmic error (MLSE) versus mean squared derivative error (MSDE) and mean fourth-power error (M4E) versus mean squared derivative error (MSDE) [21, 23]. The equations of the above three objective functions are as follows:where K is total number of flow data, is the th measured value of flow, and is the kth forecasted value of flow.

The MSLE function is more suitable for low flows due to the logarithmic transformation while the M4E is considered as an indicator of goodness-of-fit to high flows as larger deviations are given more contributions. The MSDE can be taken as an indicator of the fit of the shape of hydrograph. As the MSDE function does not take into account bias between observed and simulated value, so it cannot be used as the only objective function for calibration of hydrologic models; it should be used in combination with MSLE or M4E.

5.4. Results of Multiobjective Calibration and Discussion

For NSGA-II, the control parameters are preset as follows: the size of the population is set to 100, the crossover rate is set to 0.8, the mutation rate is set to 0.02, and the maximum number of function evaluations is set to 100000. And for MODE-CMCS, the following values of control parameters are selected: the size of the population is set to 100, the size of the archive set is set to 30 (to lower computation load), the threshold is set to 0.1, the maximum number of function evaluations is set to 100000, and the maximum iteration number of the local Chaos searching is set to 150. To diminish the influence of random factor, each algorithm runs 10 times independently. The approximate true Pareto set is gained by gathering all generated solutions obtained by the algorithms NSGA-II and MODE-CMCS.

The prediction results of the three-parameter SVM model are shown in Figures 58. Figure 5 presents the Pareto plots of the SVM model on the objective space MSLE and MSDE using NSGA-II algorithm and the proposed MODE-CMCS algorithm, whereas the Pareto solutions for combinations of MSDE and M4E are shown in Figure 6.

From Figures 5 and 6, it is apparent that there are conflicts existing between MSDE and the other two objective functions (MSLE and M4E). Figure 5 indicates that a very good calibration of MSLE provides a bad calibration of MSDE and vice versa. Likewise, it can be noted in Figure 6 that a good calibration of M4E leads to a bad calibration of MSDE and vice versa. The detailed convergence performance results are summarized in Table 3.

From Table 3, we can see that the MODE-CMCS algorithm can generate better Pareto solutions than the NSGA-II algorithm. When the objective functions are MSDE and MSLE, the MODE-CMCS generates significantly better solutions than NSGA-II. This indicates that there may be many local optimal solutions for this functions combination and the NSGA-II can be easily trapped into the local optimal solution. If the functions combination is M4E and MSLE, the MODE-CMCS generates slightly better results than NSGA-II. These better results are mainly due to the adaptive Cauchy mutation operator and the Chaos local searching operator, as the former operator can help the algorithm escape from local optimal solution while the latter can improve the searching accuracy of the algorithm.

However, in multiobjective optimization, we usually have two goals. The first one is to generate an approximation Pareto optimal set which are as close as possible to the optimal frontier, and the second one is to maintain the diversity of the solutions. We have to emphasize that the second goal is as important as the first one. This means that generating a better set of Pareto optimal solutions is not enough. Thus, to enhance the diversity of the generated approximation Pareto optimal set, the improved distance assignment operator, which has been discussed in Section 2.2.2(a), is proposed in this paper.

The SPC values obtained from NSGA-II and the proposed MODE-CMCS are given in Table 4. From Table 4, we can see that the performance of MODE-CMCS in maintaining diversity of the generated Pareto optimal solutions is better than NSGA-II.

From the above analysis, it can be noted that the performance of the proposed MODE-CMCS in this paper is better than the NSGA-II in terms of searching better Pareto optimal solutions and maintaining the diversity of the generated solutions.

Furthermore, the multiobjective calibration can help the modelers have a better understanding of the hydrology models. Figures 7 and 8 show the prediction uncertainty ranges associated with the Pareto optimal solutions generated by MODE-CMCS algorithm.

From Figures 7 and 8, it can be noted that the streamflow prediction ranges match the high flow and medium flow very well whereas the prediction accuracy turns to be worse during the low flow terms. Accordingly, it is suggested that the structure may be in need of improvement to enhance the prediction accuracy on low flow events.

6. Conclusions

This paper presents a novel multiobjective algorithm named MODE-CMCS for automatic calibration of hydrology model. Considering the drawback of the DE algorithm of being easily trapped in local minimum, the adaptive Cauchy mutation operator is introduced. And to generate better Pareto optimal solutions (which are closer to the true Pareto optimal frontier), we employ the local Chaos searching operator. Forward, the MODE-CMCS uses a more precise crowd distance assignment than that of the NSGA-II. Besides, as the traditional generalized spread metric SP is sensitive with the size of Pareto set, a novel diversity performance metric SPC, which is independent of Pareto set size, is proposed in this research.

The proposed MODE-CMCS algorithm is firstly employed to solve 5 benchmark problems. The results present that the MODE-CMCS is more powerful than NSGA-II in both providing an approximation of the true Pareto frontier and maintaining the diversity of Pareto set. Furthermore, the performance of the proposed MODE-CMCS algorithm for generating approximate Pareto optimal solutions is compared with the NSGA-II algorithm for calibration of a three-parameter SVM based daily streamflow forecasting model. And two pairs of objective functions, which are MSLE versus MSDE and MSDE versus M4E, are considered. The results of the case study also prove that the performance of the proposed MODE-CMCS is better than the NSGA-II in terms of both searching better Pareto optimal solutions and maintaining the diversity of the generated solutions. Moreover, the multiobjective calibration of hydrology models can help us understand the models better and then put forward improving proposal.

Competing Interests

The authors declare no competing interests.

Authors’ Contributions

Yi Liu and Jun Guo conceived and designed the experiments; Huaiwei Sun and Yueran Wang performed the experiments; Yueran Wang and Wei Zhang developed the model code and performed the simulations; Yi Liu and Jun Guo prepared the paper with contributions from all coauthors.

Acknowledgments

The authors appreciate the support from the National Natural Science Foundation Project of China (NSFC) (no. 51509095), the CRSRI Open Research Program (Program SN:CKWV2014220/KY), and the State Key Program of State Grid Hunan Electric Power Company of China (no. 5216A514003Z).