#### Abstract

This paper presents an experimental study that aims to compare the practical performance of well-known metaheuristics for solving the parameter estimation problem in a dynamic systems context. The metaheuristics produce good quality approximations to the global solution of a finite small-dimensional nonlinear programming problem that emerges from the application of the sequential numerical direct method to the parameter estimation problem. Using statistical hypotheses testing, significant differences in the performance of the metaheuristics, in terms of the average objective function values and average CPU time, are determined. Furthermore, the best obtained solutions are graphically compared in relative terms by means of the performance profiles. The numerical comparisons with other results in the literature show that the tested metaheuristics are effective in achieving good quality solutions with a reduced computational effort.

#### 1. Introduction

The present paper addresses the problem of finding a set of parameter values in a dynamic system model that calibrates the model so that it can reproduce the existing experimental data in the best possible way [1]. This is performed by minimizing an objective function that measures the goodness of the fit. Thus, an objective function, which gives the sum of the squared errors between the model predicted state values and the observed values (at certain time instants of a fixed interval), is minimized, subject to a system of ordinary differential equations (ODE). The dynamic model defined by the system of ODE simulates the time varying processes that take place in a certain time interval. This problem is known in the literature as the dynamic model based parameter estimation (DMbPE) process. Solving this problem is crucial in systems biology and medicine, with a great impact on both pharmaceutical and biotechnological industries [2]. The problem is also very common in the chemical engineering area [3] and it has been extensively used to describe physical phenomena [4]. The DMbPE problem may involve nonlinear differential-algebraic equations and a multiplicity of local solutions may exist. Nonconvexity and ill-conditioning issues of the problem may be addressed with efficient global optimization (GO) methods and proper regularization techniques [5]. The problem is even more difficult than that of an algebraic model. The nonlinear dynamic behavior makes the analytical approach rather complicated, if not impossible, for most of the real phenomena. Numerical direct methods are therefore good alternatives to solve the DMbPE problem. Like any parameter estimation problem, the main advantage is that the intended global minimum is known in advance. The most well-known methods, like Levenberg-Marquardt, gradient descent, and the Nelder-Mead method, which have been extensively used in parameter estimation problems, are local optimization methods and cannot guarantee convergence to a global solution. They are able to exploit a specified search region around a given initial value, but the global solution is difficult to find when the initial value is far from the required solution. Algorithms for GO can be roughly divided into two categories: exact algorithms and heuristics [6].

Sörensen and Glover [7] have defined metaheuristic as follows: “A metaheuristic is a high-level problem-independent algorithmic* framework* that provides a set of guidelines or strategies to develop heuristic optimization algorithms.” Further, an implementation of a heuristic optimization algorithm that follows the strategies laid out in the metaheuristic framework is also referred to as a metaheuristic. This term appeared for the first time in a publication by Glover [8]. Metaheuristics are approximate methods or heuristics that are designed to search for good solutions with less computational effort and time than the more classical algorithms. While heuristics are designed to solve a specific problem, metaheuristics are general-purpose algorithms that can be applied to solve almost any optimization problem. The metaheuristics evaluate a set of potential solutions of an optimization problem and perform a series of operations on those solutions in order to find different and hopefully better solutions. Depending on the way these operations are carried out, a metaheuristic can be classified as a local search, constructive, or population-based metaheuristic. For details, the reader is referred to [7].

Most metaheuristics use random procedures that invoke artificial intelligence tools and simulate nature behaviors and their performance does not depend on the properties of the problem to be solved. They are alternative methods to find good approximations to optimal solutions of GO problems. In addition, they are derivative-free and easy to implement [9]. Metaheuristics can locate the vicinity of global solutions with relative efficiency although global optimality cannot be guaranteed in a finite number of iterations. On the other hand, exact methods for GO can guarantee global optimality for certain problems but the required computational effort increases often exponentially with the problem size [10, 11].

Previous use of metaheuristics, mainly those that use metaphors based on natural evolution and on behavior of animal swarms, to solve some DMbPE problems showed that they are able to provide good quality solutions when real experimental data and noisy artificial data are considered. In [15], a hybrid metaheuristic algorithm that introduces evolutionary operations, namely, mutation and crossover, into the firefly algorithm (FA), is presented. Two variants of the differential evolution (DE) method—a trigonometric and a modified version—have been implemented in [16, 17]. The authors in [2] developed a new procedure based on the scatter search (SS) methodology. A slightly different version of the SS, which uses the fmincon solver from Matlab (Matlab is a registered trademark of MathWorks, Inc.) as an improvement method, is tested in [12] to solve three well-known DMbPE problems, including the chemical isomerization of -pinene. This latter problem is analyzed and solved by the FA in [13]. Alternatively, exact methods for GO have also been used to solve DMbPE problems. These include the spatial branch and bound (BB) algorithm to compute a global solution of the partially discretized DMbPE problem [18], the -BB method to obtain a global solution within an -precision of the completely discretized DMbPE problem [10], and a method based on interval analysis and on the partially discretized DMbPE [19]. In [14], a deterministic outer approximation approach is applied to the reformulation of the DMbPE problem as a finite NLP by applying a complete discretization using orthogonal collocation on finite elements.

Since the problem of estimating the parameters of a dynamic model is important, the contribution of this study is concerned with the implementation and practical comparison of five very simple and easy to implement metaheuristics, hybridized with a local intensification phase, when solving the finite nonlinear programming (NLP) problem that arises when a numerical direct method of a sequential type is used to locate a global optimal solution to the DMbPE problem. The unknown parameters are the decision variables of the NLP problem.

The selected metaheuristics are very popular and have been used to solve a variety of real-world applications. The selection includes the FA [20], the Harmony Search (HS) algorithm [21], the DE algorithm [22], the Artificial Bee Colony (ABC) algorithm [23], and the Directed Tabu Search (DTS) algorithm [24]. During the local intensification phase, the well-known Hooke-and-Jeeves (HJ) local search [25] is implemented with all the metaheuristics. These local search enhancements lead to much faster convergence and provide good quality solutions. The practical comparison carried out in the present study aims to analyze the performance of each metaheuristic in terms of the quality of the obtained solutions. To test the five metaheuristics, nine DMbPE problems were selected from the literature [4, 10, 13, 19, 26], yielding 12 instances due to different experimental data. The problems are described in the Appendix.

The remainder of the paper proceeds as follows. Section 2 addresses the DMbPE problem and the sequential numerical direct method, and Section 3 introduces the selected metaheuristics. Section 4 contains the results of all the numerical experiments and the conclusions are summarized in Section 5.

#### 2. Sequential Direct Method to Solve the DMbPE Problem

To solve the DMbPE problem, the sequential numerical direct method is used. For completeness, the DMbPE problem reads as follows. Find such thatwhere represents the objective function that depends on , a vector of the decision variables (the parameters that need to be estimated); is the vector (with length ) of the state variables that depend on (the independent variable); and represents the vector with the derivatives of with respect to . represents the number of experimental data available for each ; , , , are the experimentally observed values; is the system of first-order ODE; is the vector of initial values to the variables , ; and and represent the vectors with the lower and upper bounds on the parameters, respectively. and are the initial and final values, respectively, of the independent variable .

To solve problem (1), there are indirect methods and direct methods [27]. Indirect methods use the first-order necessary conditions from Pontryagin’s minimum principle to reformulate the problem as a two-point boundary value problem. The latter may become difficult to solve especially if the problem contains state variable constraints.

In direct methods, the optimization present in (1) is performed directly. This type of method discretizes the problem and applies NLP techniques to the resulting finite-dimensional optimization problem. There are two types of direct methods. In the* sequential direct method*, the DMbPE problem is transcribed into a small finite-dimensional optimization problem through the discretization of the decision variables only. Hence, the optimization is carried out in the space of the decision variables only. Given a set of values for the decision variables, the system of ODE is accurately integrated (over the entire time interval) using a specific numerical integration formula (with error control mechanisms to enforce the accuracy of the state variable values), so that the objective function value can be evaluated [3]. Thus, the ODE are satisfied at each iteration of the NLP solver. The method is called sequential, since the processes of minimizing the sum of the squared errors and solving the ODE are done in a sequential manner. This method may lead to a slow convergence process since the system of ODE is solved again and again each time an objective function value is required. In the* simultaneous direct method*, approximations to the solution of the system of ODE are computed instead. The optimization is carried out in the full space of the discretized decision and state variables, which may lead to very-large-dimensional finite NLP problem with equality constraints (emerging from the discretization of the system of ODE), in particular if a fine grid of points is required to obtain a high level of integration accuracy. Thus, very efficient NLP solvers are required when solving the finite problem.

We use the sequential direct method and the system of ODE is numerically integrated by the Matlab function ode15s. The default value for the scalar relative error tolerance is set to and all the components of the vector of absolute error tolerance are set to . The sequentially discretized NLP problem is then solved by a stochastic metaheuristic. This is further elaborated in the next section.

#### 3. Stochastic Metaheuristics

To compute global optimal solutions to NLP problems, stochastic or deterministic methods are available. Stochastic GO methods are able to provide a near-optimal solution in a short CPU time, although it may not be globally optimal. In contrast, deterministic GO methods provide an interval within which the global optimal solution falls, although they require very large computational efforts [14]. Stochastic metaheuristics, such as FA, HS, DE, ABC, and DTS, include randomly generated numbers from probability distributions to define a set of solutions on the search region, as well as to move those solutions to hopefully better positions. They ensure convergence to a global solution in probability, while deterministic methods tend to guarantee asymptotic convergence. Global search techniques use exploration and exploitation search procedures that aim to diversify the search so that a global optimal solution is located and to intensify the search so that a good approximation is computed in a promising area of the search space.

Most stochastic metaheuristics are classified in terms of source of inspiration as nature-inspired algorithms. The well-known swarm-intelligence-based algorithms belong to a wider class called the bioinspired algorithms, and these are a subclass of nature-inspired algorithms [28]. The most popular are the algorithms based on swarm intelligence, with the FA and the ABC algorithm being two examples. On the other hand, the HS is not a bioinspired algorithm and has its inspiration from music. Although the DE algorithm is not based on any biological behavior, it can be classified as a bioinspired algorithm due to the keyword “evolution,” but it is not swarm-intelligence-based [28]. Finally, the DTS algorithm is not a nature-inspired algorithm, but it has the particularity of using past information of the search to record it in memory. The majority of the metaheuristics define a set of approximate solutions at each iteration, called population, and generate new trial solutions that may be accepted as the solutions for the next iteration, if some improvement is detected relative to the current set of solutions.

We use the notation to represent the th point of a population of points, possible solutions of problem (1). The set defines the bound constraints of the problem.

##### 3.1. The FA

The FA is a bioinspired metaheuristic algorithm that is capable of converging to a global solution of an optimization problem. It is inspired by the flashing behavior of fireflies at night [20, 29]. The main rules used to construct the FA are as follows: (i) all fireflies are unisex, meaning that any firefly can be attracted to any other brighter one; (ii) the brightness of a firefly is determined from the encoded objective function; (iii) attractiveness is directly proportional to brightness but decreases with distance.

Table 1 lists the most relevant nomenclature used in the basic FA. At the beginning, the positions of the fireflies in the search space are randomly generated. Then, they are evaluated by computing the corresponding objective function value and the points are ranked in ascending order; that is, is the brightest firefly, is the second brightest, and so on. In the classical FA, a firefly () is moved towards the brighter fireflies as follows:where the second term on the right-hand side of (2) is due to the attraction while the third term is due to randomization with , is the attraction parameter when the distance is zero, is a number uniformly distributed in , and is a vector that aims to scale the movement to the set . The parameter is crucial to speed the convergence of the algorithm and it can take any value in the set . When , the attractiveness tends to the minimum constant value . To accelerate convergence and at the same time prevent premature convergence, the sequence of values is made to slowly decrease along the iterative process. If a component of the final position of a firefly falls outside the set , it is shifted onto the boundaries.

##### 3.2. The HS Algorithm

The HS algorithm was developed to solve GO problems in an analogy with the music improvisation process where music players improvise the pitches of their instruments to obtain better harmony [21, 30]. At each iteration , the basic HS algorithm provides a set of solution vectors, available in the harmony memory (HM), from which the best and the worst solutions, in terms of objective function values, are identified. Table 2 lists the most relevant nomenclature used in the HS algorithm.

The HM contains solution vectors that are maintained in memory throughout the iterative process. First, the solutions in the HM are randomly generated in the search space , , . Then, they are evaluated and the best harmony, , and the worst, , in terms of objective function value are identified. Hereafter, at each iteration, a new harmony is improvised; that is, a new vector is generated, using three improvisation operators. When the* HM operator* is used, the component of is chosen from the HM with probability ; otherwise, the* random selection operator* is used and the component is randomly generated in :for , where is a random number uniformly distributed in . Finally, the* pitch adjustment operator* is subsequently applied with a probability , which varies with the iteration counter , to refine only the components produced by the* HM operator*, as follows:where is the distance bandwidth that depends also on [31]. Finally, the components of are checked against the bounds and projected onto the boundaries if they fall outside. In the final stage, if , the HM is updated since the new harmony is included in the HM, replacing the worst one.

##### 3.3. The DE Algorithm

The DE is a bioinspired population-based algorithm that relies on three strategies—*mutation*,* crossover*, and* selection*—to define the solutions/points for the next iteration [22]. The most important nomenclature and parameters are shown in Table 3.

The initial population of points, , , is randomly generated in the search space . The most commonly used* mutation* strategy defines the mutant point , , as follows:with uniformly chosen random indices , , and from the set , mutually different, and is a real parameter in which controls the amplification of the differential variation, . The indices , , and are also chosen to be different from the index . is called the base point.

The components of the mutant vector are then mixed with components of the vector to generate the trial point, . This strategy is referred to as* crossover* and can be described as follows:for , where denotes a random number uniformly generated within the interval and aims to perform the mixing of the component of the points, , and , an index randomly selected from , aims to ensure that gets at least one component from . Finally, the components of are projected onto the boundaries of if they fall outside. Then, a* selection* strategy compares each trial point with in terms of objective function values and the best one is selected for the population of the next iteration. The variant described above is referred to as DE/rand/1/bin, where “rand” specifies the vector that is mutated, “1” defines the number of difference vectors, and “bin” means that crossover is based on independent binomial experiments [22]. There are other frequently used DE variants, for instance, the DE/best/1/bin which uses the best point of the population as the base point to define the mutant point and the DE/rand-to-best/1/bin with two differential variations:where is the best point in the current population [32].

##### 3.4. The ABC Algorithm

The ABC algorithm is an optimization algorithm based on the intelligent behavior of honeybee swarms [23]. The colony of artificial bees of size includes the employed bees, the onlooker bees, and the scout bees. The first half of the colony consists of employed bees and the second half constitutes the onlookers and scouts. The position of a food source represents a possible solution of the optimization problem and the amount of nectar in the food source gives the quality of that solution. The number of food sources, , is taken to be equal to the number of employed bees. The most important nomenclature and parameters in the ABC algorithm are shown in Table 4.

At the initial stage, a set of food source positions are randomly selected by the bees; that is, the positions are randomly generated in the search space , and their nectar amounts are determined in terms of fitness values. During the employed bee phase, new candidate food positions, called mutant solutions, are produced as follows:for , where (different from ) and are randomly chosen indexes from the sets and , respectively, and is a random number uniformly generated within the interval . If the mutant component falls outside , it is shifted to the boundaries. Afterwards, the mutant solution is compared with and a greedy selection is applied to choose the one with better nectar, that is, with better fitness, as described in (9). If the current solution is maintained (meaning that it has not been improved), its trial counter is increased.

On the other hand, during the onlooker bee phase, the food sources are randomly chosen according to probability values, (associated with the food sources), that depend on their fitness evaluated byThe mutant solution is produced from the chosen old one as previously shown in (8) and the greedy selection is applied between the current and its mutant. Finally, in the scout bee phase, if the trial counter of the solution that has not been improved mostly exceeds a predetermined value, called “limit,” that solution is abandoned and replaced by a position randomly generated in the search space [33].

##### 3.5. The DTS Algorithm

The tabu search (TS) algorithm, introduced to continuous optimization in the paper [34], is capable of guiding the search out of local optima and exploring new regions for the global optimum. This is a point-by-point iterative procedure that maintains a list of the most recent movements, denoted by “tabu list” (TL), so that the movements that lead to solutions previously visited are avoided.

The DTS method developed in [24] is hybridization of the TS with a direct search method that aims to search the neighborhood of a local minimum. The method comprises three main search procedures: exploration, diversification, and intensification. The main loop consists of the exploration and the diversification search procedures. The exploration search aims to explore the search space and uses direct search methods to be able to stabilize the search, in particular in the vicinity of a local minimum. Cycling is prevented by the TL and by another four TS memory elements: the multiranked tabu list, the tabu region, the semitabu region, and the visited region list (VRL). The VRL works as a diversification tool and is used to direct the search towards regions that have not been visited in the search space. The most important nomenclature and parameters of the DTS algorithm are listed in Table 5.

During the exploration procedure, DTS uses an adaptive pattern search strategy to generate an approximate descent direction (ADD) for the objective function at the current approximation. First, based on an approximation , pattern directions parallel to the coordinate axes are constructed and trial points , , are generated, along these directions with a given step length. Thus, the ADD is obtained bySecond, two local trial points and are then generated along with two different step lengths aiming to explore the region along [24]. The diversification procedure aims to randomly generate a new trial point outside the previously visited regions. The VRL contains the centers of the visited regions and the frequency with which these regions are visited. If the shortest distance of to a region’s center, weighted by its frequency, exceeds the predefined region radius, given by , the point is accepted; otherwise, a new trial point is generated. When one of the best obtained trial solutions is sufficiently close to a global minimum, or its value has not been changed for a specified number of iterations, the DTS algorithm leaves the exploration and diversification search procedures and invokes an intensification procedure. Here, the HJ local search is used to compute a solution still closer to the global minimum (see the next section).

##### 3.6. Hooke-and-Jeeves Local Search

A pattern search method directs the search towards a minimizer using a pattern of specific number points. At least points must be provided by the pattern, where is the number of variables. Based on a current approximation, the HJ method uses a pattern search strategy to define a trial solution that gives an improvement in the objective function value, when compared with the current point [25]. Let the current approximation be denoted by , where represents the iteration counter in this iterative process. Then, a trial point, , is generated along a search direction (starting from the current point) with a certain step size as follows:where is the search direction chosen from a finite set of positive spanning directions in . The most used set contains the coordinate directions, defined as the positive and negative unit coordinate vectors . The most important property of this set is that at least one of the coordinate directions is a descent direction for the objective , when is not a stationary point of . When the search fails to generate a trial point that is better than , the iteration is called* unsuccessful*, the step size is halved so that a refined search can be carried out, and . If falls below a given stopping tolerance, the algorithm terminates and the current is considered an approximate minimizer of . However, when at the end of each iteration the objective function value has reduced, then the iteration is called* successful*, is not changed, and .

For a fair comparison, and to improve the quality of the produced solutions, we propose the implementation of the HJ intensification phase with all the above-mentioned metaheuristics. Based on the final solution provided by the metaheuristic, the HJ local search algorithm is invoked and allowed to run for iterations. The number of function evaluations required by the HJ local search is added to the function evaluations of the metaheuristic, herein denoted by , to produce the total number of evaluations of the run.

#### 4. Numerical Experiments

This section aims to present and analyze the statistical significance of the numerical results that were obtained when the metaheuristics are used to solve the DMbPE problems in the sequential direct method context. The dynamic system models were coded in the Matlab programming language and the computational application of the sequential direct method with the FA, HS, DE, ABC, and DTS codes was developed in Matlab programming environment. The computational tests were performed on a PC with a 2.2 GHz Core i7-2670QM and 8 GB of RAM. The parameter values for the tested algorithms are set as shown in Table 6. Other values have been tested, in particular for the parameter in the FA, DE, and ABC and the parameter in HS, but those used seem adequate with reduced computational effort.

Nine case study problems have been selected to analyze the performance of the described stochastic algorithms. A comparison is made considering the quality of the solutions and the time spent to reach the solution after a threshold number of function evaluations. The problems and the experimental data can be found in the Appendix.

Table 7 shows (the average of the obtained final solutions), St.D. (the standard deviations of the solutions), and (the average time in seconds) after running each instance (inst.) 10 times with each metaheuristic (MH). For these experiments, each algorithm is terminated solely with the condition , where we have tested three values of : , , and . The goal here is to analyze the quality of the obtained solutions. Since the target objective value is 0, the lower the value of the better.

To analyze the statistical significance of the results, we use the Matlab function friedman that performs a nonparametric statistical test for multiple comparisons. The Friedman test aims to determine significant differences in the mean for one independent variable, for instance, or , with different levels that correspond to the five metaheuristics, and a dependent variable (corresponding to matched groups, herein taken as the 12 instances) [35]. The null hypothesis in this test is that the mean ranks assigned to the results of the metaheuristics under testing are not statistically different. Let be the number of metaheuristics to be ranked and be the number of groups. We note here that the distribution of the Friedman* statistic*, here denoted by , approaches the ordinary distribution as increases. The exact distribution of has been obtained for special cases of and (the reader is referred to [35] and the references cited in [36]). It has been concluded in [35] that “when the number of groups is moderately large (say greater than 5 for four or more ranks) the significance of the Friedman* statistic* can be tested by reference to the available tables.” However, in [36], it is argued that “the usual approximation is grossly inaccurate for most commonly used combinations of and .” Thus, hereafter, we use another* statistic* that has been recommended in [36, 37] and depends on , as well as on both and :which is distributed according to the -distribution with and degrees of freedom.

When applied to for , and the value of is 12.229. With and , is distributed according to the -distribution with 4 and 44 degrees of freedom. For a significance level of 5%, the critical value is 2.59 (computed by linear interpolation between and ). Since , we have enough evidence to reject the null hypothesis of “no significant differences on mean ranks”; that is, the observed differences between the five distributions of values are statistically significant. Pairwise comparisons may be carried out to determine which mean ranks are significantly different. The Matlab function multcompare is applied. The estimates of the 95% confidence intervals are shown in Figure 1(a) for each case under testing. The distributions of are significantly different if their intervals are disjoint and are not significantly different if their intervals overlap. Hence, we conclude that the mean rank produced by DE is significantly different from that of ABC. For the other pairs of comparisons, there are no significant differences on the mean ranks.

**(a)**metric

**(b)**metricWhen the test is applied to for , and which exceeds the value of , indicating that the null hypothesis is rejected at a significance level of 5%, and we conclude that the distributions of values have statistically significant differences. Figure 1(b) shows the estimates of the 95% confidence intervals. We conclude that the mean rank of ABC is significantly different from those of DE and DTS.

A similar analysis is made for and , when . We obtain for the statistics with and with , respectively. Hence, we may conclude that there is enough evidence to reject the null hypothesis of “no significant differences on mean ranks” of the distributions of , at a significance level of 5%. From Figure 2(a), we conclude that the mean ranks of (the distribution of values) ABC and DTS are significantly different from the mean ranks of DE. For the distributions, we conclude that there is not enough evidence to reject the hypothesis of “no significant differences on mean ranks,” at a significance level of 5%. Figure 2(b) shows the corresponding estimates of the 95% confidence intervals and it is noticed that all pairs have their intervals overlapping.

**(a)**metric

**(b)**metricWhen the statistical analysis is extended to the distributions of and values, for , the statistic gives 16.34 (with ) and 10.64 (with ), respectively. Hence, we have evidence to reject the null hypothesis relative to the values, at a significance level of 5%. Figure 3(a) shows the corresponding estimates of the 95% confidence intervals (ABC and DTS have mean ranks significantly different from DE). As far as the distributions of values are concerned, we conclude that there is enough evidence to reject the hypothesis of “no significant differences on mean ranks” at a significance level of 5%. From Figure 3(b), we conclude that the differences are on the mean ranks of DE and DTS. We note that, at a significance level of 1%, a smaller probability of making a wrong decision, where the critical value is 3.79 (computed by linear interpolation between and ), there is no evidence to reject the null hypothesis.

**(a)**metric

**(b)**metricWe now show in Table 8 the best solution obtained by each metaheuristic, (after the 10 runs), and the time required to reach that value, , for the three values of . Based on the metrics and , we use a graphical procedure to visualize the performance differences among the results produced by the five metaheuristics, in relative terms on the 12 instances, known as performance profiles [38]. Each plot reports (on the vertical axis) the percentage of problems solved with each metaheuristic that is within a certain threshold, (on the horizontal axis), of the best result. The performance profiles of the five distributions of values for are shown in Figure 4(a). Figure 4(b) corresponds to the metric .

**(a)**metric

**(b)**metricThe higher the percentage, the better. A higher value for means that the metaheuristic achieves the lowest value (or the smallest ) mostly.

Therefore, the metaheuristic DE is the most successful since it has the highest probability of achieving the lowest value, when . From the plot, the probability that DE wins on a given instance is 0.5 since it produces the lowest in 6 of the 12 instances. It is followed by HS that has a probability of winning (prob_w) of 0.25, by DTS with prob_w = 0.17, and by FA with prob_w = 0.08. We note that the differences on the produced are so small that the plots of the cumulative distribution are packed around . From Figure 4(b), it is possible to conclude that the fastest metaheuristic is ABC since the probability of achieving the smallest value of (time in seconds) is 0.58.

Similarly, Figures 5(a) and 5(b) contain the performance profiles of the five distributions of and , respectively, when . From the figure on the left, we may conclude that DE is the most successful since it has prob_w = 0.58, followed by FA with prob_w = 0.25 (achieving the best results in 3 out of 12 instances), by HS with prob_w = 0.17, and by ABC and DTS with prob_w = 0.08. As far as is concerned, it is possible to conclude that the metaheuristic with the highest probability of requiring the lowest time is ABC followed by DE.

**(a)**metric

**(b)**metricFinally, Figures 6(a) and 6(b) display the performance profiles of the metrics and , respectively, when . From these comparisons, we may conclude that DE is the most successful in reaching the lowest , with prob_w = 0.5, followed by HS and ABC with prob_w = 0.25 and finally the metaheuristics FA and DTS with prob_w = 0.08. When the profiles of are analyzed, it is possible to conclude that the metaheuristic DE is in general faster than the others but for some problems is outstripped by FA and HS.

**(a)**metric

**(b)**metricTables 9–19 contain comparative results relative to 11 instances. (The results obtained for instance VOL_A are not used in the comparison since the experimental data have been herein generated for this study.) Tables 9–19 show the best and the average values (computed from the 10 runs), as well as the average CPU time and function evaluations, and , respectively. Our results always appear along the first row of each table and are taken from the metaheuristic that produced the lowest with the least function evaluations. This is identified in the table caption. (We note that the implemented metaheuristics have been enhanced with the HJ local search, and the reported considers the sum of with the function evaluations required by the HJ search.) In the tables, “–” means that the information is not available in the cited paper and “n.a.” means “not applicable.” The comparisons use the results reported in the papers [2, 4, 10, 12–14, 16–19, 26]. Results in [16] are based on a trigonometric version of the DE, and in [17], a modified version of the DE is implemented. The authors in [12] use a modified version of the metaheuristic scatter search (SSm) with a local search as an improvement method. The results shown in the table use the function fmincon from Matlab. SSm uses a population of 100 points and a penalty technique with a weight of 1000 to penalize infeasible solutions. In [19], the sequential approach is implemented with an interval-based method for the NLP optimization. The therein obtained solution is an -global solution. Global solutions are produced by the method of orthogonal collocation on finite elements presented in [26]. This strategy is implemented within Matlab code Dynopt after 100 runs of a multistart approach. (We note that a direct comparison of the CPU time cannot be done since the computational platforms are different.)

Overall, after all the numerical comparisons, it is possible to conclude that the selected metaheuristics when enhanced with the local intensification phase are effective in achieving good quality solutions with a reduced computational effort. Furthermore, we have also shown that the sequential direct method combined with the selected metaheuristics competes very favorably with exact methods and other metaheuristics available in the literature.

#### 5. Conclusions

In this paper, we have analyzed the performance of five well-known metaheuristics when solving parameter estimation problems in dynamic system models. The sequential numerical direct method is applied to the DMbPE problem and the resulting optimization is performed directly making use of a numerical integration formula for solving the system of ODE. Using nine DMbPE problems and different experimental data, error-free and random error-added data, a total of 12 instances have been used in the comparative experiments. The solutions produced by the metaheuristics have been analyzed in terms of quality, by stopping the algorithms after a specified number of function evaluations, and compared using statistical hypotheses testing. The statistical tests show that the average obtained solutions and the average CPU time are considered to be mostly significantly different, at a significance level of 5%. The best solutions produced by the metaheuristics are analyzed by means of the performance profiles. After the graphical comparisons for different numbers of allowed function evaluations, we are able to conclude that the DE is the metaheuristic that has the highest probability of giving the lowest value of the objective function mostly. It is followed by HS. The FA, ABC, and DTS metaheuristics are, in this sequence, the least effective. Thus, our recommendation for a reader that favors good quality solutions falls in the DE and HS metaheuristics. On the other hand, ABC and DE are in general the metaheuristics that require the least computational effort, meaning that, for a limited computational budget, our recommendation is to choose one of them. Finally, from the comparisons with other results in the literature, involving both sequential and simultaneous direct methods and stochastic as well as deterministic global optimization methods, we state that the sequential direct method combined with one of the above recommended metaheuristics is a good alternative to solve DMbPE problems. Good quality solutions with a reduced computational effort are thus provided.

#### Appendix

#### DMbPE Problems

*-Pinene First-Order Kinetics (**α*-PIN*)*. A system of five ODE with five parameters is given in [13]for , with initial conditions , , , , and . The lower and upper bounds on the parameters are , . Experimental data are available in Table 20.

*Bellman’s Problem (**BEL**)*. This ODE has one dependent variable and two parameters (see Section 7.4 in [10]). The herein tested version uses the exponential transformation: for , with initial condition . The following loose lower and upper bounds on the parameters , , can be considered defining the instance BEL_A. A second experiment was carried out with this problem using the following tight lower and upper bounds and , giving the instance BEL_B. Experimental data have been generated using and and with a small amount of random error added [10] (see Table 21).

*Catalytic Cracking of Gas Oil (**CAT**)*. This ODE has two dependent variables and three parameters (see Section 6.3 in [19] and Section 7.3 in [10]): for , with initial conditions and . Lower and upper bounds on the parameters are , , and the experimental data in Table 22 have been generated using the following twenty time instants: 0.025, 0.05, 0.075, 0.1, 0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.300, 0.350, 0.400, 0.450, 0.500, 0.550, 0.650, 0.750, 0.850, and 0.950, and , , and with a small amount of random error added [10].

*First-Order Irreversible Series Reaction (**IRR1**)*. A system of ODE with two dependent variables and two parameters (Section 6.1 in [19] and Section 7.1 in [10]) is given: for , with initial conditions . Lower and upper bounds on the parameters are , , and the experimental data in Table 23 have been generated using and , with no added error [10].

*First-Order Irreversible Series Reaction (**IRR2**)*. This is the system of ODE of the previous example with different initial conditions, different upper bounds, and a different time interval (Example 1 in [26]): for , with the initial conditions . The lower and upper bounds on the parameters are , . Experimental data have been generated using the time instants and and with no added error [26] (see Table 24).

*First-Order Reversible Series Reaction (**REV**)*. This is a system of three ODE with four parameters available in Section 7.2 of [10]:for , with initial conditions , , and . Lower and upper bounds on the parameters are , , and , . Two sets of data have been generated using and , , , and [10]. The first set has no added error (see Table 25) and yields the instance denoted by REV_A. The second set of data shown in Table 26 has been obtained by adding a small amount of random errors [10] and yields the denoted instance REV_B.

*Theoretical Kinetic Model Described by Three Reactions and Based on Three Parameters (**THEO1**)*. This system of five ODE depends on three parameters (Example 2 in [26]): for , with initial conditions , , , , and . Lower and upper bounds on the parameters are , . Only the concentration of component (=) was measured. Thus, data for component alone for 22 time instants, shown in Table 27, are available in [26].

*Theoretical Kinetic Model Based on Two Parameters (**THEO2**)*. The following system has five dependent variables and two parameters (see Example 3 in [26]): for , with initial conditions , , , , and . The lower and upper bounds on the parameters are , . Experimental data, shown in Table 28, have been generated using and and with added small error and are available in [26]. Only the concentrations of components (=) and (=) were measured.

*Lotka-Volterra Predator-Prey Model (**VOL**)*. This is a system of ODE with two dependent variables and two parameters (see Section 6.4 in [19] and Section 7.6 in [10]): for , with initial conditions and and lower and upper bounds on the parameters: , . Experimental data in Table 29 have been generated using and and with a small amount of random errors added, according to the normal distribution , yielding the instance VOL_A. A second set of experimental data, shown in Table 30, is available in [10] and has also been tested, corresponding to the herein denoted instance VOL_B.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The authors would like to acknowledge the financial support of CIDEM, R&D Unit, funded by the Portuguese Foundation for the Development of Science and Technology (FCT), Ministry of Science, Technology and Higher Education, under the Project UID/EMS/0615/2016, and of COMPETE: POCI-01-0145-FEDER-007043 and FCT within the Projects UID/CEC/00319/2013 and UID/MAT/00013/2013.