Application of Harmony Search to Design Storm Estimation from Probability Distribution Models
The precision of design storm estimation depends on the selection of an appropriate probability distribution model (PDM) and parameter estimation techniques. Generally, estimated parameters for PDMs are provided based on the method of moments, probability weighted moments, and maximum likelihood (ML). The results using ML are more reliable than the other methods. However, the ML is more laborious than the other methods because an iterative numerical solution must be used. In the meantime, metaheuristic approaches have been developed to solve various engineering problems. A number of studies focus on using metaheuristic approaches for estimation of hydrometeorological variables. Applied metaheuristic approaches offer reliable solutions but use more computation time than derivative-based methods. Therefore, the purpose of the current study is to enhance parameter estimation of PDMs for design storms using a recently developed metaheuristic approach known as a harmony search (HS). The HS is compared to the genetic algorithm (GA) and ML via simulation and case study. The results of this study suggested that the performance of the GA and HS was similar and showed more accurate results than that of the ML. Furthermore, the HS required less computation time than the GA.
Floods that occurred due to severe rain or major storm generally result in damage to properties and negative impacts on human activity. Flood estimation is the process of minimizing property damage and reducing the threat to human activity. Design storm based on precipitation frequency analysis is one of the main procesess for flood estimation as well as a statistical representation of a precipitation event.
The primary purpose of precipitation frequency analysis in hydrometeorology is to estimate the magnitude of a storm event with a given frequency of occurrence. It can also be used to estimate the frequency of occurrence of a storm event with a given magnitude . The precision of precipitation frequency analysis depends on the selection of an appropriate probability distribution model (PDM) and parameter estimation techniques. A number of PDMs have been developed to describe the probability distribution of the hydrometeorological variables. In practice, it is often assumed that the correct PDM is a member of the developed PDMs. However, the selection of an appropriate PDM is still one of the major problems in precipitation frequency analysis .
For each of the developed PDMs, estimated parameters are provided based on alternative estimation techniques, such as the method of moments (MOM), probability weighted moments (PWM), linear function of ranked observations (L-moments), and maximum likelihood (ML) [1–5]. The MOM is one of the most simple and commonly used methods for estimating the parameters. PWM and L-moments are discussed by Stedinger et al. . Estimates for PWM can be obtained from order statics. Additionally, Stedinger et al.  recommended a parameter estimator for PWM in regionalization studies. L-moments tend to produce less variable estimates for higher moments, when an unusually large or small observation happens to be present in a sample . The moment estimators, including MOM, PWM, and L-moments, are a simpler way to obtain the PDM parameters but are less accurate than the ML estimators. The results using ML are generally more reliable than the other methods. However, the ML is more laborious than the other methods because an iterative numerical solution, such as the Newton-Raphson method , must be used for estimating the parameters.
Metaheuristic approaches have been developed to solve various engineering optimization problems (e.g., linear and stochastic, dynamic, and nonlinear). These approaches include genetic algorithms [8, 9], ant colony optimization , simulated annealing , tabu searches , and evolutionary computation methods. Metaheuristic approaches use a stochastic random search instead of a gradient search so that intricate derivative information is unnecessary . Therefore, the metaheuristic approaches have been shown to be a useful strategy to solve optimization problems in hydrology [5, 14–21].
Karahan et al.  used the genetic algorithm (GA) to predict rainfall intensities for a given set of return periods. The results showed that the proposed GA could be used to develop rainfall intensity-duration-frequency relationships with the lowest mean-squared error between the observed and predicted intensities. Karahan et al.  concluded that predicted intensities were in good agreement with the analyzed return period. Hassanzadeh et al.  studied the most suitable PDM for annual maximum discharge in East-Azerbaijan, Iran. Hassanzadeh et al.  employed the GA and ant colony optimization (ACO) techniques to find the parameters of PDMs. The performance of these algorithms was evaluated by comparison with conventional methods, such as MOM, ML, and PWM. The results showed that the GA and ACO were effective optimization tools compared to other methods for the parameter estimation of PDMs. The GA and ACO techniques could also be used for systems that are more complex and involve nonlinear optimization problems. However, the GA and ACO techniques need more computation time than the MOM, ML, and PWM methods to find the parameters of PDMs. Although the GA and ACO techniques are reasonable alternatives to solve hydrological problems, large computation time is an obvious disadvantage.
Therefore, the purpose of this study is to enhance the design storm estimation with improving parameter estimation of PDMs using a recently developed metaheuristic approach, a harmony search (HS) by Geem et al. . The performance of the HS approach is compared with the GA and conventional methods (i.e., ML).
This paper is organized in the following manner. Section 2 introduces the methodology for the parameters of PDMs with statistical test criteria. The results of the simulation are shown in Section 3, and a case study is reported in Section 4. Finally, the summary and conclusions are presented in Section 5.
2.1. Probability Distribution Models
To test the performance of the proposed parameter estimation approach, the two-parameter lognormal (LN2), two-parameter gamma (GAM2), generalized extreme value (GEV), and Gumbel (GUM) distribution models are used. The general properties of the PDMs are given in Appendices.
2.2. Metaheuristic Approaches
In the current study, two metaheuristic approaches (i.e., GA and HS) were used to estimate the parameters of PDMs.
2.2.1. Genetic Algorithms
The GA is a stochastic search algorithm based on natural evolution and mechanisms of population genetics [8, 9]. The simple ideas of GA are based on the biological processes of survival and adaption. Only the best of the population is allowed to survive and propagate to successive generations. Therefore, the GA does not require derivatives of the objective function to solve complex and discontinuous optimization problems. A number of GAs are introduced, but the following general description encompasses most of the important features.
The analogy with nature is established by the creation of a set of solutions, called a population. The initial population of solutions is usually chosen at random and allowed to evolve over a number of generations. Each individual in a population is represented by a set of parameter values that completely describe a solution and undergo constant change by means of genetic operations of reproduction, crossover, and mutation. At each generation, the fitness of individuals with respect to the objective function is calculated for reproduction and propagated to the next generation. Based on the fitness, individuals with relatively high fitness (called parent) are selected for reproduction of the next generation. For example, as shown in Figure 1(a), there are two parents selected with binary cording. The strings, including the last three digits of two parents (i.e., 110 and 011), are recombined to produce offspring that will comprise the next generation. Then, the parents are replaced in the population by the offspring to keep a stable population size. The recombination operation is usually called the crossover. The offsprings (new generations) have a higher average fitness than their parents (previous generations). Occasionally, mutation is introduced into the population to prevent the convergence to a local optimum and help generate unexpected directions in the parameter space, as shown in Figure 1(b). A fixed fitness of generations has been created when a generation has reached the highest fitness and there is no further improvement with repeated iteration .
The GA method has been widely applied in a variety of engineering optimization problems. It has been established that the GA approach is an attractive alternative to solve optimization problems with nondifferentiable, nonlinear, and multimodal objective functions [9, 25, 26].
2.2.2. Harmony Search
The HS developed by Geem et al.  is a phenomenon-mimicking algorithm inspired by the improvisational processes of musicians. The algorithm is based on natural musical performance processes in which a musician searches for a better state of harmony. Assume that the optimization problem is where is an objective function; is the set of each decision variable ; (representing probability parameter); is the number of decision variables (the number of parameters for a probability distribution); and and are the upper and lower limits, respectively, of the decision variable .
To solve the optimization problem, the procedure of the HS is as follows.(1)Generate the harmony memory (HM) randomly up to the harmony memory size (HMS) from a uniform distribution, denoted as where ; ; and represents the uniform distribution ranging from to . The generated HM is (2)Improvise a new set () by where HMCR is the harmony memory considering rate for all the variables and . Equation (4) implies that each decision variable of a new harmony set is sampled from the same variable of HM in (3) for the probability of HMCR. Otherwise, generate the harmony set from the uniform distribution as in (2).(3)Adjust each variable of the improvised new set () with the probability of the pitch adjusting rate (PAR) as where is generated from . Here, is the arbitrary distance bandwidth. If is large, the variability of the adjusted value is also large.(4)Update the HM by replacing the worst harmony corresponding to the worst objective function with the improvised and adjusted new set.(5)Repeat steps to until the termination criterion has been met.
Following the empirically based HS parameter range is recommended to produce a sufficient solution of 0.7–0.95 for HMCR, 0.2–0.5 for PAR, and 10–50 for HMS .
2.2.3. Optimization Function
The derivative-based method usually uses the integral of the square of the error (ISE) as an optimization function to find a global solution because a derivative of ISE can be obtained relatively well. The metaheuristic approaches use a stochastic random search instead of a derivative search to find global optimization, so that various forms of the optimization function based on the integral of the absolute magnitude of the error (IAE), the integral of the time-absolute error (ITAE), and ISE are applied.
In this study, we use two metaheuristic approaches to estimate the parameters of PDMs. To attain the optimum result, an equivalent objective function (or target function) is used with two metaheuristic approaches. The objective function in this study is constructed using ISE with observed and estimated values and is expressed by where is an objective function; is the th ordered observation value; and is the estimated value from the corresponding th empirical cumulative probability of the selected PDM.
2.3. Statistical Test Criteria
Three test criteria are used to assess the adequacy of the proposed parameter estimation method: the correlation coefficient (CC), coefficient of efficiency (CE), and root mean square error (RMSE) [27, 28]. The mathematical forms of these criteria are as follows: where and are the mean of the observed data and the mean computed data from the fitted PDM, respectively. The range of CC is . Here, the CC is also equivalent to the Filliben Q-Q correlation test, which was proposed by Filliben  as a test of normality. Vogel  extended this approach to lognormal and Gumbel distributions for the goodness-of-fit test and GEV :
Note that higher CC and CE values and lower RMSE values represent better performance. Estimated statistics are described with boxplots. Boxes indicate the interquartile range (IQR), and whiskers extend to 1.5 IQR. The horizontal lines inside the boxes depict the median of the data.
3. Simulation Study
In the simulation study, we employ the GEV distribution model and estimate the parameters (i.e., scale (), shape (), and location ()) using the GA and HS techniques. For the simulation study, we produce 100 data sets using a GEV random number. An individual data set with a record length of 100 has the same parameters, which are , , and . The strategies for the GA and HS are shown in Table 1.
One hundred series are simulated for the parameter estimation of the GEV distribution model. The results of the parameter estimation based on the GA and HS methods are compared using boxplots, as shown in Figure 2. As shown in Figures 2(a) and 2(b), the and derived from the HS indicate a better performance than those of the GA. In addition, the variability of and derived from the GA is greater than that of the HS. However, as shown in Figure 2(c), the derived from the HS and GA shows similar results.
The statistical test criteria in (7) and (8) were computed using the estimated parameters. In Figure 3, the test criteria for the proposed metaheuristic approaches are shown. Recall that higher CC and CE values and lower RMSE values represent better performance. As shown in Figure 3(a), the CCs for the HS are higher, while the variability is narrower than the CCs for the GA. The CE shows a similar pattern to the CC, as shown in Figure 3(b). The RMSE for the HS shows lower values than that of the GA, as shown in Figure 3(c). The RMSE indicates that the HS results have more accurate values than the GA results.
Although the metaheuristic approaches (e.g., the GA and ACO) are reasonable alternatives to solve hydrological problems, significant computation time is an obvious disadvantage . Therefore, we investigate the computation time of two metaheuristic approaches. The computation time for GEV parameter estimation using 100 simulations based on the GA and HS is compared using boxplots, and the results are shown in Figure 4. The computation time for the GA is in the range of 10 to 20 minutes. However, the computation time for the HS is very short and is less than 1 minute. To determine computation time by varying conditions, the population size for the GA is changed to 100, 500, and 1000, while the harmony size for HS is also changed to 100, 500, and 1000. As shown in Figure 5, the computation time for the GA rapidly increases as the population size increases. Meanwhile, the fitness values of the objective function are only slightly improved. In contrast, the computation time for the HS rarely increases despite increasing harmony size, and the fitness values of the objective function remain consistent. Note that the fitness values of the objective function for HS with a small harmony size are better than the fitness values of the GA. The HS approach for estimation of the GEV parameter produces more reliable results than the GA approach and uses less computation time.
In addition, three PDMs (i.e., LN2, GAM2, and GUM) are used in a simulation study. The same procedure as the simulation study using GEV is employed for the parameter estimation of each PDM. The target values for the parameters of the applied PDMs and the estimated parameters by the GA and HS methods are reported in Table 2. In addition, the results of the three test criteria and the computation time for each PDM are summarized in Table 3. The results of additional simulation studies show that the two metaheuristic approaches are appropriate methods for the parameter estimation of PDMs. Furthermore, the difference between the GA and HS methods is that the HS method requires less computation time than the GA method while still providing reliable results.
4. Case Study
In the current case study, we carried out precipitation frequency analysis for design storm with annual maximum hourly rainfall data recorded at 74 rainfall gauges in South Korea. The annual maximum hourly rainfall data were extracted from the Korea Meteorological Administration website, http://www.kma.go.kr/. The record lengths of extracted rainfall data range from 20 to 100 years (average record length is approximately 40 years), and the locations of the 74 rainfall gauges are presented in Figure 6.
Four PDMs (i.e., LN2, GAM2, GEV, and GUM) were applied to the annual maximum hourly rainfall data. Parameters corresponding to the four PDMs were estimated using the three parameter estimation approaches: ML, GA, and HS. To compare the three parameter estimation approaches, the quantiles for different return periods , 25, 50, and 100 years) were estimated at the 74 rain gauges in South Korea. The variability of the quantiles for each PDM is shown in Figure 7. Note that quantiles imply the statistically estimated design storm at a given return period.
As shown in Figure 7, the results of frequency analysis for the employed rainfall data show that the quantiles estimated by the two metaheuristic approaches present similar distributions, except for the quantiles of , 100 for LN2, as shown in Figure 7(a). However, the quantiles based on the ML method show a slightly different distribution when compared with the two metaheuristic approaches.
Table 4 summarizes the results of test criteria derived from the three parameter estimation approaches. The CCs for ML, GA, and HS represent similar values, and the CE represents a similar pattern to that of CC. However, RMSE for GA and HS shows lower values than the results of ML. The RMSE indicates that the two metaheuristic approaches for the parameter estimation of PDMs have a more accurate performance than ML.
In addition, the results of parameter estimation for Seoul, Daejeon, Daegu, and Busan, which are four major cities of South Korea, are summarized in Table 5. Furthermore, values of the quantiles for the four cities are estimated, and the results are summarized in Table 6.
5. Summary and Conclusion
The HS was developed by Geem et al.  and has been applied to solve a variety of engineering problems in a number of previous studies. Because the HS adopts a stochastic random search to find the optimized best solution, initial value settings of decision variables and derivative information to reach global optimum are not required. Furthermore, after considering all of the existing vectors based on the HMCR and PAR, the HS generates a new vector, whereas the GA only considers two vectors. These features increase the flexibility of the HS and produce a better solution. Therefore, we applied the HS method to enhance design storm estimation in the current study.
The results of the simulation study based on the four PDMs (i.e., LN2, GAM2, GEV, and GUM) showed that the HS approach for the parameter estimation of PDMs produced reliable results when measuring test criteria (i.e., CC, CE, and RMSE). Furthermore, the fitness values of the objective function for HS were better than the fitness values of the GA with a small harmony size. Additionally, the computation time for the HS approach was less than that of the GA method.
Precipitation frequency analysis for design storm was conducted to assess the performance of the proposed method with the four PDMs (i.e., LN2, GAM2, GEV, and GUM) and the annual maximum hourly rainfall data recorded at 74 rainfall gauges in South Korea. The results of precipitation frequency analysis were compared with those of the ML and GA methods. The results in this study suggested that performances of the GA and HS were similar and presented more accurate results than the ML in precipitation frequency analysis for annual maximum hourly rainfall data. Accordingly, we conclude that the proposed parameter estimation method based on the HS approach is a useful alternative for the parameter estimation of PDMs, particularly when conventional methods cannot be applied to estimate the parameters of PDMs.
Probability Distribution Models
A. Two-Parameter Lognormal (LN2) Distribution
The transformation of random variable is where is the base of the logarithm.
Assuming that the mean and variance of are and , respectively, then is the probability density function (PDF, i.e., ) of the LN2. The cumulative distribution function (CDF, i.e., ) of the LN2 is
Note that if (base- logarithm) and if (base-10 logarithm). In relation to the random variable , controls the scale and is called the scale parameter, while controls the skewness and is regarded as a shape parameter. Additionally, note that in relation to the variable , is the location parameter, and is the scale parameter.
The quantile function for LN2 corresponding to the nonexceedance probability is where is a quantile of return period for LN2 and is the standard normal variate corresponding to the non-exceedance probability .
B. Two-Parameter Gamma (GAM2) Distribution
The PDF of the GAM2 is where for and for . The parameters and are the scale and shape parameters. The shape parameter is restricted to , while may be positive or negative. is the complete gamma function, which is the integral function. From (B.1), the CDF of GAM2 is derived to be
The function in (B.2) is the incomplete gamma function. Therefore, the quantile for GAM2 must be estimated numerically. The quantile of return period for GAM2 corresponding to the nonexceedance probability is alternatively estimated using the following: where and are the mean and standard deviation, respectively. is the frequency factor, which is a function of return period and the skewness coefficient .
C. Generalized Extreme Value (GEV) and Gumbel (GUM) Distribution
The PDF and CDF of the GEV for a random variable are expressed as follows: where , , and are the scale, shape, and location parameter, respectively. plays an important role such that if , the distribution tends to resemble a type-1 or Gumbel distribution; if , the resulting distribution is type-2 or Log-Gumbel distribution; and if , it is a type-3 distribution or Weibull distribution. The quantile functions for GEV and Gumbel corresponding to the non-exceedance probability are given in the following: where is a quantile of return period for GEV and Gumbel.
The authors acknowledge that this work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MEST) (2013-0362).
C. T. Haan, Statistical Methods in Hydrology, Iowa State Press, Ames, Iowa, USA, 2002.
J. D. Salas, R. A. Smith, G. Q. Tabios, and J. H. Heo, in Statistical Computing Techniques in Water Resources and Environmental Engineering, Lecture Note, Department of Civil and Environmental Engineering, Colorado State University, Fort Collins, Colo, USA, 2008.
J. R. M. Hosking, “The theory of probability weighted moments,” Research Report RC12210, IBM Thomas J. Watson Research Center, New York, NY, USA, 1986.View at: Google Scholar
A. R. Rao and K. H. Hamed, Flood Frequency Analysis, CRC Press, New York, NY, USA, 2000.
N. T. Kottegoda and R. Rosso, Applied Statistics for Civil and Environmental Engineers, Wiley-Blackwell, West Sussex, UK, 2008.
J. R. Stedinger, R. M. Vogel, and E. Foufoula-Georious, “Frequency analysis of extream events,” in Handbook of Hydrology, chapter 18, McGraw-Hill, New York, NY, USA, 1994.View at: Google Scholar
W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, Cambridge, Mass, USA, 1986.View at: MathSciNet
J. H. Holland, Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, Mich, USA, 1975.View at: MathSciNet
D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, New York, NY, USA, 1989.
M. Dorigo and T. Stützle, Ant Colony Optimization, MIT Press, Hong Kong, 2004.
H. R. Maier, A. R. Simpson, W. K. Foong, K. Y. Phang, H. Y. Seah, and C. L. Tan, “Ant colony optimization for the design of water distribution systems,” in Bridging the Gap: Meeting the World's Water and Environmental Resources Challenges, vol. 111, American Society of Civil Engineers, 2001.View at: Publisher Site | Google Scholar
A. C. Zecchin, H. R. Maier, A. R. Simpson, M. Leonard, and J. B. Nixon, “Ant colony optimization applied to water distribution system design: comparative study of five algorithms,” Journal of Water Resources Planning and Management, vol. 133, no. 1, pp. 87–92, 2007.View at: Publisher Site | Google Scholar
Z. W. Geem, J. H. Kim, and G. V. Loganathan, “A new heuristic optimization algorithm: harmony search,” Simulation, vol. 76, no. 2, pp. 60–68, 2001.View at: Google Scholar
A. M. S. Zalzala and P. J. Fleming, Genetic Algorithms in Engineering Systems, The Institution of Engineering and Technology, London, UK, 1999.
M. Gen and R. Cheng, Genetic Algorithms and Engineering Optimization, John Wiley & Sons, New York, NY, USA, 1999.
J. E. Nash and J. V. Sutcliffe, “River flow forecasting through conceptual models—part I: a discussion of principles,” Journal of Hydrology, vol. 10, no. 3, pp. 282–290, 1970.View at: Google Scholar
J. J. Filliben, “Probability plot correlation coefficient test for normality,” Technometrics, vol. 17, no. 1, pp. 111–117, 1975.View at: Google Scholar
R. M. Vogel, “The probability plot correlation coefficient test for the normal, lognormal, and Gumbel distributional hypotheses,” Water Resources Research, vol. 22, no. 4, pp. 587–590, 1986.View at: Google Scholar