Mathematical Modeling, Analysis, and Advanced Control of Complex Dynamical SystemsView this Special Issue
Research Article | Open Access
A Cooperative Harmony Search Algorithm for Function Optimization
Harmony search algorithm (HS) is a new metaheuristic algorithm which is inspired by a process involving musical improvisation. HS is a stochastic optimization technique that is similar to genetic algorithms (GAs) and particle swarm optimizers (PSOs). It has been widely applied in order to solve many complex optimization problems, including continuous and discrete problems, such as structure design, and function optimization. A cooperative harmony search algorithm (CHS) is developed in this paper, with cooperative behavior being employed as a significant improvement to the performance of the original algorithm. Standard HS just uses one harmony memory and all the variables of the object function are improvised within the harmony memory, while the proposed algorithm CHS uses multiple harmony memories, so that each harmony memory can optimize different components of the solution vector. The CHS was then applied to function optimization problems. The results of the experiment show that CHS is capable of finding better solutions when compared to HS and a number of other algorithms, especially in high-dimensional problems.
Optimization is a very ancient problem which has been researched by numerous mathematicians since time immemorial, including Newton, Gauss, and John von Neumann. They developed numerous mathematical theories and methods that made a considerable contribution to optimization. However, almost all mathematical methods require first-order derivatives or second-order derivatives, or even higher derivatives. When the object function is not too complex, we can compute its derivatives easily, but unfortunately most object functions in the real world are so complex that we cannot compute the derivatives, and even worse, they may have no derivatives at all. In this case, it is very difficult to implement mathematical methods.
What should we do, then, to solve complex, nonlinear, nondifferentiable, and multimodal problems? We can perhaps learn from the nature. Many natural phenomena as well as the activities of animals provide us with inspiration and we are able to imitate these phenomena or activities in order to solve complex problems. Over the last four decades, a large number of methods inspired by nature have been developed in order to solve very complex optimization problems , and these were called metaheuristic algorithms. All these metaheuristic algorithms imitate natural phenomena, such as evolutionary algorithms (EAs), which include genetic algorithms (GAs) , evolutionary programming (EP) , and evolution strategy (ES) , all of which simulate biological evolution. PSO and ant colony optimization (ACO), for instance, mimic the foraging behavior of animals [5, 6], and simulated annealing (SA) simulates physical annealing process . As a metaheuristic algorithm, HS is no exception, it is inspired by the improvisation process of musical performers . Although the algorithms mentioned above imitate different phenomena, they have some common factors: they all have a random process; the solutions that they show us are just approximate results; they all suffer from the “curse of dimensionality” , meaning that the search space will increase exponentially when the number of the dimensions increases. The probabilities of finding the optimality region will thus decrease. One way of overcoming this drawback is to partition higher dimensional search space into lower dimensional subspaces.
Potter and De Jong [10, 11] have proposed a general framework for cooperative coevolutionary algorithms and then they applied the method to the GA (CCGA). They suggested that the higher dimensional solution vectors should be split into smaller vectors, with each vector just being a part of a complete solution, so that they must cooperate with each other to constitute a potential solution. In this paper, we apply Potter and De Jong’s cooperative method to the HS to enhance the performance of standard HS.
The rest of this paper is organized as follows. Section 2 presents an overview of the standard HS and some other improved HS algorithms and Section 3 demonstrates the convergence of HS and explains why the improved HS performs better than the standard HS. In Section 4, the cooperative method is briefly introduced, and then the proposed CHS algorithm is elaborated. Section 5 describes the benchmark functions used to test the new algorithm and the experimental results can be found in Section 6. Finally, the conclusions are given in Section 7.
2. Harmony Search Algorithms
The HS algorithm, originally conceived by Geem et al. in 2001 , was inspired by musical improvisation. There are always three ways open to a musician , when he or she is improvising. The first is when he or she plays a piece of music that he or she remembers exactly; the second is when a musician plays something that is similar to what he or she remembers exactly, the musician possibly being engaged in improvisation based on the original harmony by adjusting the pitch slightly. The last one involves a composition that is new. The process employed by the musicians in order to find the best harmony is likely to be the process of optimization. In fact, the HS algorithm mimics the process used to solve optimization problems, with this algorithm being widely applied in order to solve optimization problems, including water network design [13, 14], PID controller design , Cluster analysis, and function optimization [16, 17]. Several approaches have been taken in order to improve the performance of the standard HS, some of which are discussed in the following subsections.
2.1. The Standard HS (SHS)
When we use an algorithm to solve problems, firstly we must know what the problems are. Assuming that there is an unconstrained optimization problem which can be described as follows: where is the object function, is the set of each decision variable, is the th decision variable, and are the lower and upper bounds of the th decision variable, and is the number of decision variables. To apply SHS to solve the optimization problem mentioned above, five steps should be taken  as follows.
Step 1. Initialize the problem algorithm parameters. The problem parameters like , , should be initialized in this step. In addition, four algorithm parameters should be initialized, including harmony memory size (HMS), harmony memory considering rate (HMCR), pitch adjusting rate (PAR), and the maximum number of improvisation (), or stopping criterion.
Step 2. Initialize the harmony memory. The SHS is similar to GAs. GAs are population based optimization algorithms, but the “population” in SHS is referred to as harmony memory (HM), which is composed of solution vectors. The structure of HM is as follows: where is the th decision variable of the th solution vector. Generally, we initialize the HM randomly. For , it can be generated by using the following equation: where is a random number between 0 and 1.
Step 3. Improvise a new harmony. This step is the core step of SHS. A new harmony vector is improvised according to these three rules: harmony memory consideration; pitch adjustment; randomization. The probabilities of harmony consideration and pitch adjustment are dependent on HMCR and PAR. For instance, the th variable of the new harmony vector can be improvised as follows:
In fact, before the variable is generated, we should generate a random number between 0 and 1, then we compare with HMCR, if ; (4) is used to improvise the variable; otherwise, the variable will be improvised by using (5). For example, if , then the HS algorithm will choose a decision variable from the HM with a 95% probability. And if the variable is chosen from the HM, then it will be adjusted with probability as follows: where is an arbitrary distance bandwidth and is a random number between 0 and 1.
Step 4. Update the HM. If the new improvised harmony vector is better than the worst harmony vector in the HM in terms of the object function value, the worst harmony in the HM is superseded by the new harmony vector.
The pseudocode of SHS is shown in Pseudocode 1.
2.2. Improved HSs
The SHS algorithm was introduced in the last subsection, and in this subsection several improved HSs are introduced. PAR and bw are two important parameters of HS, deciding the accuracy of the solution. As the number of iteration increases, the HM becomes better and the solution is closer to the global optimal position. We should use a smaller bw to adjust the pitches, and this adjustment should be with a higher probability, but all parameters are fixed within the SHS algorithm, meaning that they cannot change. If we choose a low PAR and a narrow bw, the SHS will converge slowly, but on the other hand, if we choose a very high PAR and a wide bw, the SHS will converge fast, although the solution may scatter around some potential optimals as a random search. A dynamic adjustment strategy for parameters, especially for PAR and bw, is therefore very necessary.
A dynamic adjustment strategy for PAR and bw was proposed by Mahdavi et al. in 2007 . They suggested that PAR should increase linearly and bw should index decrease. They can change with generation number as shown in Figure 1 by using the following equations: where and are the pitch adjusting rate and bandwidth for each generation, and are the minimum pitch adjusting rate and bandwidth, and are the maximum pitch adjusting rate and bandwidth, and is the generation number. The drawback of the IHS is that we have to specify the , and , which is essentially very difficult, and we are unable to guess without experience.
The IHS algorithm merely changes the parameters dynamically, while some other improved HSs are capable of changing the search strategy of SHS, such as the global best HS (GHS) proposed by Omran and Mahdavi . The GHS is inspired by the concept of swarm intelligence as proposed in PSO. In PSO, the position of each particle presents a candidate solution, so that, when a particle flies through the search space, the position of the particle is influenced by the best position itself has visited so far and the position of the best particle among the swarm, in other words, there are some similarities between the new position and the best position. In GHS, the new harmony vector can mimic the best harmony in the HM. The difference between GHS and SHS is the pitch adjustment strategy, with the variable being adjusted in SHS using (6), while in GHS it is adjusted using the following equation: where is the index of the best harmony in the HM and .
The results of the experiment show that these two improved harmony search algorithms can find better solutions when compared with SHS.
3. Convergence of HS Algorithm
As a metaheuristic algorithm, HS is capable of finding local optima and the global optimum, but why can it converge on local or global optima, and which operator or parameter may have effects on the speed of convergence? All of these problems are unsolved. In this section, we endeavor to solve the problems.
As we noted in the last section, HS has an operator referred to as “HM updating,” the fourth step of HS. This is an indispensable step for HS, because this operator guarantees the convergence of HS, without it HS may not find even a local minimum. In order to explain this, some definitions and theorems are necessary.
Definition 1 (monotone sequences). A sequence is said to be monotonically increasing provided that
for every natural number .
A sequence is said to be monotonically decreasing provided that for every natural number .
A sequence is called monotone if it is either monotonically increasing or monotonically decreasing.
Theorem 2 (the monotone convergence theorem). A monotone sequence converges if and only if it is bounded.
Proof. Firstly, let us suppose that the sequence is a monotonically increasing sequence. Then we can define , assuming that is bounded above. According to the Completeness Axiom, has a least upper bound, we can call it . We claim that the sequence converges to . To claim this, we must prove that for all integers, and provided that
Since is the least upper bound for the set , so
Furthermore, is not an upper bound for set , so there must be a natural number such that , since the sequence is monotonically increasing, so
We have now proved that a monotonically increasing sequence must converge to its least upper bound and, in the same way, we can also claim that a monotonically decreasing sequence must converge to its greatest lower bound.
To return to the HS algorithm mentioned in the previous section: when HS finds a solution which is better than the worst harmony in the current HM, then the operator “HM updating” is implemented, which means that for the current iteration, the algorithm finds a solution that is not worse than the one that was found by the algorithm in the last iteration. Suppose that the problem to be solved is the minimum problem described as (1), the current solution offered is and the last found solution is , then we have , and if we let the best solution in HM for each iteration constitute a sequence , the sequence must be a monotonically decreasing sequence as in Definition 1, and as we have proved, if the sequence has a greatest lower bound, the sequence must converge to the greatest lower bound.
In fact, for a continuous function, suppose that the function is named , and if , where is the dimension of , then there must be a number and a number (assuming that ) satisfying the following inequality:
So in the case that the object function is continuous and , where and , then the sequence which is composed of the best solution in the HM for the th iteration is bounded and , as we have mentioned above for a minimum problem, the sequence is monotonically decreasing, so must converge.
We have explained that HS must converge, but can it converge to ? In fact, is the global optimum of the object function. From Theorem 2, we know that if is the greatest lower bound, must converge to . The problem is whether is the greatest lower bound for . This is, in fact, difficult to solve. HS is a stochastic algorithm and for each iteration, the best solution in HM is random so that it is not a fixed predictable number, but we are able to calculate the probability that the HM is updated with. An example would be as follows: suppose that the object function is a minimum problem, the expression is which has only one dimension, and , and , for the th iteration, the best solution is (suppose that ), so the HM is updated with probability when or when . The higher the probability is, the more quickly the HM is updated. From the expression, it is clear that the probability decreases by , so that as the iteration number increases, the probability decreases sharply, and the convergence curve becomes flatter.
This might also explain why IHS performs better than SHS. The two parameters and are very important for HS. In SHS, the two parameters are fixed while in IHS, increases linearly and the index decreases, so that, from the expression of the HM updated probability, we can see that this is very reasonable, owing to the fact that this strategy is capable of making the probability decrease slowly, with IHS generally performing better. The results of this experiment are demonstrated in Section 6.
4. CHS Algorithm
4.1. A Brief Overview of Cooperative Method
The natural world is a very complex system, and evolution is one element within it. EAs imitate the evolution of species, but they are just a simple simulation and almost all EAs just mimic the evolution of one species. There are a large number of species on the earth, with some of them being closely associated . In order to survive, different individuals from different species or the same species may compete or cooperate with each other. Cooperative coevolutionary algorithms just act as models of symbiotic coevolution. The individuals are from different species or subpopulations, and they have to cooperate with some others. The difficulty involved in cooperative coevolutationary algorithms is how to split the fitness that is achieved by collective effort definitively [21, 22].
Potter and De Jong applied the method to GA, coming up with CCGA. They evolved the solutions in different subpopulations, and each subpopulation has just one dimension of the solution vector. The pseudocode for the CCGA is given in Pseudocode 2. The initial fitness of each subpopulation member is computed by combining it with a random individual from each of the other subpopulations and then using the object function to evaluate the complete solution. After initializing, one individual of a subpopulation cooperates only with the best individual from each of the other subpopulations.
Potter and De Jong also found that the cooperative approach does not perform well when problem parameters are strongly interdependent, due to the greediness of the credit assignment approach . To reduce greediness, Potter and De Jong suggested that random collaboration should be used to generate a complete solution. An individual from one subpopulation can collaborate with the best individual of each of the other subpopulations and then constitute one complete solution, or the individual can cooperate with one individual of each of the other subpopulations randomly and then generate another solution. The fitness of the individual is equal to the better fitness between the two solutions.
4.2. The CHS
In the CHS algorithm, to reduce the complexity, we use HMs instead of HMs, where is the number of decision variables. The pseudocode for CHS algorithm is shown in Pseudocode 3. Each HM may thus contain more than one decision variables, so that, for each HM, we can use the HS operators including harmony consideration and pitch adjustment to generate a new harmony vector, but the vector is just a part of a complete solution vector, and it must collaborate with harmony vectors from every other HM, and as we have mentioned above, when initializing HMs, the cooperation is random. When calculating a new harmony vector which is improvised by using HS operators, it cooperates only with the best ones of each HM. This is similar to several musicians working together to find the best harmony, with each musician being in charge of a part of the whole symphony, assuming that all of them are selfless. When they work together, each one shares the best harmony that he or she has found by himself or herself. In our daily life, teamwork is very universal, and generally it is more effective than work being carried out alone.
We should ask why it brings better solutions when using cooperative method. There are few mathematical foundations for the method, but this can be explained by using an example. HM updating is very important for the HS algorithm, when the HM is updated, it indicates that a better solution may be found. If the HM is updated with a higher probability or in other words, the algorithm has a higher probability of finding better solutions, it will converge faster. For instance, suppose that the object function is , it has two decision variables and , and , . It is quite obvious that the origin is the global optimum. Assume that the worst solution vector in the initial HM is , which is called point and is shown in Figure 2. If we use the CHS algorithm, two HMs will be used, one HM representing and another representing . In one cycle, the HMs are used one by one and when is used to generate a new harmony vector, this new vector must cooperate with the best one in in order to constitute a complete solution vector, meaning that is fixed when the algorithm searches in the subspace , and it is the same for .
When is fixed, the CHS algorithm is able to find a better solution with probability , and when is fixed, the probability is , so after one cycle, the CHS algorithm has the probability of updating the HM, where , but for SHS algorithm, if the HM is updated, the algorithm must find a solution which is located in the circular region, so the HM is updated with probability , where . Also, , because of , so , which means that the CHS algorithm has a higher probability of updating HM than the SHS algorithm, meaning that the CHS is able to find better solutions than the SHS algorithm.
The cooperative method also has disadvantages, however, with one being that the CHS algorithm is easier to trap in a local minimum. This phenomenon is shown in Figure 3, in which point is a local minimum point and point or the origin is the global minimum point; the area where is located in is local minimum area and the global minimum region is where is located. If the algorithm is trapped in point , when we use the CHS algorithm in every iteration, only one subspace is searched. Assuming that is fixed and is searched, irrespective of the value of , the CHS algorithm has no way of reaching the global minimum region, and it is the same for . In this case, the CHS algorithm will never escape from the local minimum. For the SHS algorithm, however, it is possible to reach the global minimum area, although the probability is low.
5. Experimental Setup
5.1. Benchmark Functions
Ten benchmark functions (five unrotated and five rotated) have been chosen to test the performance of the CHS algorithm, and SHS, his, and GHS algorithms are also tested for the sake of comparison. The unrotated functions are as follows.
The Quadric function is where , . This function is a nonseparable unimodal function, with the global minimum point being .
Ackley’s function is where , . Ackley’s function is a separable multimodal function, with the global optimum being .
The Generalized Rastrigin function is where , . The Rastrigin function is a separable multimodal function, with the global optimum being .
The Generalized Griewank function is where , . This function is a very complex, nonlinear, separable, and multimodal function and is very difficult for optimization algorithms. The global optimum is .
Rosenbrock’s function (or Banana-valley function) is where , . Rosenbrock’s function is a naturally nonseparable function, the global optimum being which is located in a narrow valley; it is very hard for algorithms to find the correct direction and reach to the global minimum area.
The parameters of rotated functions are the same as unrotated functions and all the orthogonal matrices are fixed, meaning that all functions are rotated at a fixed angle.
5.2. Configuration of Algorithms
The iteration number of each experiment is , with all the experiments being ran independently 30 times. The results reported are averages, and the best solutions have been calculated from all 30 runs. The parameters of the algorithms are as follows.(i)Parameters of the SHS algorithm: and . This was suggested by Yang . and . The values of the last two parameters (, ) were suggested by Omran and Mahdavi .(ii)Parameters of the IHS algorithm: and , with increasing linearly over time (by using (7)), and , which were suggested by Omran and Mahdavi . decreases by using (6), and , .(iii)Parameters of the GHS algorithm: , , , and .(iv)Parameters of the CHS algorithm: , , , , , , and the number of groups .
This section shows the experimental results gathered by allowing all of the methods tested to run for a fixed number of function evaluations, that is, . All the results are shown in Table 1 to Table 10, for each table, the first column represents the algorithms used, the second lists the mean error and 95% confidence interval after times iteration, and the third lists the best solution found by the algorithm after 30 runs. We should not forget that all the functions have a minimum function value of 0.
Table 1 shows the experimental results of the unrotated Quadric function; this is a nonseparable function, and it is hard to optimize for HS algorithms, but GHS is better able to solve the problem during the 30 runs; also finds a satisfying solution. Table 2 shows the experimental results of the rotated Quadric function. The results are very different from those shown in Table 1.
The results of the experiment show that the rotated Quadric function is more easily optimized by HS algorithms and that the CHS can always find more satisfying solutions than other HS algorithms. As the number of groups increases, the solution becomes better and better, but at the same time, the algorithm becomes more and more complex. Figure 4 shows the best function value profile of each algorithm both for unrotated and rotated Quadric functions. This figure is a more visual representation. It shows the best results among 30 independent runs for each algorithm. From Figure 4, we can deduce that in unrotated cases, all the algorithms converge very sharply, while in rotated cases, the solutions found by CHS are better.
The results of Ackley’s function are very interesting and very different from the results of the Quadric function. Table 3 shows the experimental results of unrotated Ackley’s function; this is a multimodal function and is easily optimized by HS algorithms. Almost all the algorithms are capable of finding a satisfactory solution, but among the tested algorithms, CHS performs better with both the mean and the best results than the other algorithms, and the higher is, the better the solution is. However, in terms of the rotated Ackley’s function, CHS does not perform as well as it is illustrated in Table 4. In terms of the rotated Ackley’s function, IHS and GHS perform better, especially the IHS algorithm. This can be deduced from Figure 5. When the search space is rotated, all the algorithms perform worse, especially CHS, the performance of which is as bad as SHS. GHS converges very fast to a local optimum, although it is capable of finding a better solution. This is because GHS resembles the standard PSO, which is not good at dealing with multimodal problems.
The Rastrigin function is also a multimodal function with many local optima. In an unrotated case, CHS performs relatively well, especially . It is the performance leader, as shown in Table 5. The results of the experiment are very similar to those observed with unrotated Ackley’s function: the higher is, the better the solution found by CHS will be. In the rotated case, however, none of the algorithms performs so well, as shown in Table 6. For CHS, with the increase of , the mean of the solutions becomes worse, which is the reverse of the unrotated case. Figure 6 shows the best function value profile. From Figure 6(a), we can find that, generally, CHS converges, offering a better solution faster, while Figure 6(b) shows a very different result.
Tables 7 and 8 show the results of the Griewank function, which is a nonlinear multimodal and complex function. In unrotated cases, CHS performs better than both SHS and IHS, but not so well as GHS. Furthermore, when increases, the performance of CHS is very similar. This is very different from the results of unrotated Ackley’s function and unrotated Rastrigin function, which are shown from Table 3 to Table 6. In rotated cases, the mean for each algorithm is very similar, but the best solutions for each algorithm are distinct from each other, as shown in Table 8. We might also observe that the higher is, the better the solution found by CHS. Figure 7 shows the best function value profile for each algorithm. For both the unrotated Griewank function and the rotated Griewank function, SHS and IHS trap in a local optimum easily, but GHS is capable of finding a better solution.
The results of the Rosenbrock function are shown in Tables 9 and 10. The global optimum of the Rosenbrock function is located in a narrow valley so that it is very difficult for the algorithms to find the correct direction in order to reach the global region, and thus, none of the algorithms is satisfactory, especially in rotated cases. This is shown more visually in Figure 8.