Special Issue: Artificial Intelligence and Its Applications
Research Article | Open Access
Wolf Pack Algorithm for Unconstrained Global Optimization
In the Tibetan Plateau, the wolf pack unites and cooperates closely to hunt prey, showing remarkable skill and strategy. Inspired by this hunting behavior and the way the prey is distributed, we abstract three intelligent behaviors (scouting, calling, and besieging) and two intelligent rules (the winner-take-all generating rule for the lead wolf and the stronger-survive renewing rule for the wolf pack), and on this basis propose a new heuristic swarm intelligence method named the wolf pack algorithm (WPA). Experiments are conducted on a suite of benchmark functions with different characteristics (unimodal/multimodal, separable/nonseparable), and the impact of several distance measurements and parameters on WPA is discussed. Moreover, simulation experiments comparing WPA with five other typical intelligent algorithms (genetic algorithm, particle swarm optimization algorithm, artificial fish swarm algorithm, artificial bee colony algorithm, and firefly algorithm) show that WPA has better convergence and robustness, especially for high-dimensional functions.
Global optimization is a hot topic with applications in many areas, such as science, economy, and engineering. Generally, unconstrained global optimization problems can be formulated as follows:
$$\min f(X), \quad X = (x_1, x_2, \ldots, x_n), \tag{1}$$
where $f: \mathbb{R}^n \to \mathbb{R}$ is a real-valued objective function, $X \in \mathbb{R}^n$, and $n$ is the number of parameters to be optimized.
As many real-world problems become increasingly complex, global optimization, especially with traditional methods, is becoming a challenging task. Because of their large search spaces, high-dimensional global optimization problems are even more difficult. Fortunately, many algorithms inspired by nature have become powerful tools for these problems [3–5]. Over a long period of biological evolution and natural selection, many marvelous swarm intelligence phenomena have emerged in nature, and they offer endless inspiration. The remarkable swarm behavior of animals such as swarming ants, schooling fish, and flocking birds has long captivated the attention of naturalists and scientists. In recent decades, people have developed many intelligent optimization methods to solve complex global problems. In 1995, inspired by the social behavior and movement dynamics of birds, Kennedy proposed the particle swarm optimization algorithm (PSO). In 1996, inspired by the social division and foraging behavior of ant colonies, Dorigo proposed the ant colony optimization algorithm (ACO). In 2002, inspired by the foraging behavior of fish schools, Li proposed the artificial fish swarm algorithm (AFSA). In 2005, motivated by the intelligent foraging behavior of honeybee swarms, Karaboga proposed the artificial bee colony (ABC) algorithm. In 2008, based on the flashing behavior of fireflies, Yang proposed the firefly algorithm (FA). Researchers have even put forward concepts for swarm intelligence algorithms such as the rat herd algorithm, the mosquito swarm algorithm, and the dolphin herd algorithm. Birds, fish, ants, and bees do not possess complex human intelligence such as logical reasoning and synthetic judgment, but in pursuit of a common aim, food, they exhibit powerful swarm intelligence through constant adaptation to the environment and mutual cooperation, which gives us many new ideas for solving complex problems.
The wolf pack is marvelous. A harsh living environment and centuries of constant evolution have created its rigorous organization system and subtle hunting behavior. The wolf tactics of the Mongolian cavalry in the Genghis Khan period, the submarine tactics of Nazi Admiral Doenitz in World War II, and the U.S. military's wolf attack system for electronic countermeasures all highlight the great charm of its swarm intelligence. A wolf colony algorithm (WCA) has been proposed to solve the optimization problem. But the accuracy and efficiency of WCA are not good enough, and it easily falls into local optima, especially for high-dimensional functions. So, in this paper, we reanalyze the collaborative predation behavior and prey distribution mode of wolves and propose a new swarm intelligence algorithm, called the wolf pack algorithm (WPA). Moreover, the efficiency and robustness of the new algorithm are tested by comparative experiments.
The remainder of this paper is structured as follows. In Section 2, the predation behaviors and prey distribution of wolves are analyzed. In Section 3, WPA is described. Section 4 describes the experimental setup, followed by experimental results and analysis. Finally, conclusion and future work are presented in Section 5.
2. System Analysis of the Wolf Pack
Wolves are gregarious animals with a clear social division of labor. A wolf pack has a lead wolf, some elite wolves that act as scouts, and some ferocious wolves. They cooperate well with each other and take their respective responsibilities for the survival and thriving of the pack.
Firstly, the lead wolf, as a leader under the law of the jungle, is always the smartest and most ferocious one. It is responsible for commanding the wolves and constantly making decisions by evaluating the surrounding situation and perceiving information from other wolves. This keeps the wolves out of danger and lets the pack capture prey as smoothly and as soon as possible.
Secondly, the lead wolf sends some elite wolves to hunt around and look for prey in the probable scope. Those elite wolves are scouts. They walk around and independently make decisions according to the concentration of smell left by the prey; a higher concentration means the prey is closer to the wolves. So they always move in the direction in which the smell gets stronger.
Thirdly, once a scout wolf finds the trace of prey, it howls and reports that to the lead wolf. The lead wolf then evaluates the situation and decides whether or not to summon the ferocious wolves to round up the prey. If they are summoned, the ferocious wolves move fast towards the scout wolf.
Fourthly, after the prey is captured, it is not distributed equitably, but in order from the strong to the weak. That is to say, the stronger the wolf is, the more food it gets. Although this distribution rule causes some weak wolves to die for lack of food, it makes sure that the wolves able to capture prey get more food, so that they stay strong and can capture more prey next time. The rule prevents the whole pack from starving to death and ensures its continuance and proliferation. In what follows, we describe and implement the above intelligent behaviors and rules in detail.
3. Wolf Pack Algorithm
3.1. Some Definitions
Let the predatory space of the artificial wolves be an $N \times D$ Euclidean space, where $N$ is the number of wolves and $D$ is the number of variables. The position of wolf $i$ is a vector $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, where $x_{id}$ is the $d$th variable value of the $i$th artificial wolf. $Y = f(X)$ represents the concentration of the prey's smell perceived by artificial wolves, which is also the objective function value.
The distance between two wolves $p$ and $q$ is described as $L(p, q)$. Several distance measurements can be selected according to the specific problem. For example, Hamming distance can be used in WPA for 0-1 discrete optimization, while Manhattan distance (MD) and Euclidean distance (ED) can be used in WPA for continuous numerical function optimization. In this paper, we mainly discuss the latter problem, and the selection of distance measurements is discussed in Section 4.2.1. Moreover, because maximization and minimization problems can be converted into each other, only the maximization problem is discussed in what follows.
3.2. The Description of Intelligent Behaviors and Rules
The cooperation between the lead wolf, scout wolves, and ferocious wolves makes for nearly perfect predation, while prey distribution from the strong to the weak makes the wolf pack thrive in the direction of the prey that it is most likely to capture. The whole predation behavior of the wolf pack is abstracted into three intelligent behaviors (scouting, calling, and besieging) and two intelligent rules (the winner-take-all generating rule for the lead wolf and the stronger-survive renewing rule for the wolf pack).
The winner-take-all generating rule for the lead wolf: the artificial wolf with the best objective function value is the lead wolf. During each iteration, the function value of the lead wolf is compared with the best value among the other wolves; if the value of the lead wolf is not better, it is replaced, and the best wolf becomes the lead wolf. Rather than performing the three intelligent behaviors, the lead wolf goes directly into the next iteration until it is replaced by a better wolf.
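The winner-take-all rule amounts to an argmax over the pack. The following is a minimal sketch for the maximization setting used in this paper; the function name `select_lead` and the toy objective are illustrative assumptions, not part of the original method description.

```python
def select_lead(positions, objective):
    """Return (index, value) of the wolf with the largest objective value."""
    values = [objective(x) for x in positions]
    lead = max(range(len(values)), key=values.__getitem__)
    return lead, values[lead]


# Toy usage: maximise the negated sphere function, whose optimum is the origin.
def neg_sphere(x):
    return -sum(v * v for v in x)


pack = [[1.0, 2.0], [0.1, -0.2], [3.0, 0.0]]
lead_index, lead_value = select_lead(pack, neg_sphere)
```

Here the second wolf is closest to the origin, so it is the one selected as lead wolf.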
Scouting behavior: S_num elite wolves except the lead wolf are considered scout wolves; they search for the solution in the predatory space. $Y_i$ is the concentration of prey smell perceived by scout wolf $i$, and $Y_{lead}$ is the concentration of prey smell perceived by the lead wolf.
If $Y_i > Y_{lead}$, scout wolf $i$ is nearer to the prey and will probably capture it, so scout wolf $i$ becomes the lead wolf and $Y_{lead} = Y_i$.
If $Y_i < Y_{lead}$, scout wolf $i$ takes a step towards each of $h$ different directions in turn; the step length is $step_a$. After taking a step towards the $p$th direction, the state of scout wolf $i$ is formulated below:
It should be noted that $h$ is different for each wolf because of their different seeking ways, so $h$ is an integer selected at random from a prescribed interval. $Y_{i0}$ is the concentration of prey smell perceived by scout wolf $i$ before the step, and $Y_{ip}$ is the concentration after it takes a step towards the $p$th direction. If $Y_{ip} > Y_{i0}$, the wolf steps forward and its position $X_i$ is updated. The above is repeated until $Y_i > Y_{lead}$ or the maximum number of repetitions $T_{max}$ is reached.
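The scouting loop can be sketched as follows. The displayed update formula is not reproduced in this text, so the sinusoidal offset `sin(2*pi*p/h) * step_a`, the use of the same offset in every variable, the hypothetical bounds `h_min` and `h_max` on the number of directions, and the choice to move to the best of the probed directions are all assumptions made for illustration.

```python
import math
import random


def scout(x, objective, y_lead, step_a, t_max, h_min=4, h_max=10, rng=random):
    """Scouting sketch for maximisation; returns (position, value)."""
    x = list(x)
    y = objective(x)
    for _ in range(t_max):
        if y > y_lead:                      # scout already beats the lead wolf
            break
        h = rng.randint(h_min, h_max)       # integer number of probe directions
        best_x, best_y = x, y
        for p in range(1, h + 1):
            # Assumed form of the probe step towards the p-th direction.
            offset = math.sin(2 * math.pi * p / h) * step_a
            trial = [v + offset for v in x]
            y_trial = objective(trial)
            if y_trial > best_y:            # keep the strongest smell found
                best_x, best_y = trial, y_trial
        x, y = best_x, best_y               # unchanged if no direction improved
    return x, y


def neg_sphere(x):
    return -sum(v * v for v in x)


scout_x, scout_y = scout([2.0, 2.0], neg_sphere, y_lead=0.0, step_a=0.1,
                         t_max=20, rng=random.Random(1))
```

Because a move is accepted only when the perceived smell increases, the scout's objective value never decreases during scouting.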
Calling behavior: the lead wolf howls and summons the ferocious wolves to gather around the prey. Here, the position of the lead wolf is taken as the position of the prey, so that the ferocious wolves aggregate towards the position of the lead wolf. $step_b$ is the step length; $g_d^k$ is the position of the artificial lead wolf in the $d$th variable space at the $k$th iteration. The position of ferocious wolf $i$ in the $k$th iterative calculation is updated according to the following equation:
This formula consists of two parts; the former is the current position of wolf $i$, which represents the foundation for prey hunting; the latter represents the aggregate tendency of the other wolves towards the lead wolf, which shows the lead wolf's leadership of the wolf pack.
If $Y_i > Y_{lead}$, ferocious wolf $i$ becomes the lead wolf and $Y_{lead} = Y_i$; wolf $i$ then takes the calling behavior. If $Y_i < Y_{lead}$, ferocious wolf $i$ keeps aggregating towards the lead wolf at high speed until $L(i, l) < L_{near}$, at which point it takes the besieging behavior. $L(i, l)$ is the distance between wolf $i$ and the lead wolf $l$; $L_{near}$ is the distance determinant coefficient, which serves as the judging condition that determines whether wolf $i$ changes state from aggregating towards the lead wolf to besieging. Different values of $L_{near}$ affect the convergence rate of the algorithm; this is discussed in Section 4.2.2.
Calling behavior shows information transferring and sharing mechanism in wolf pack and blends the idea of social cognition.
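The calling behavior can be sketched in the same spirit. The update equation itself is not reproduced in this text; the sketch assumes the two-part form described above, in which each coordinate moves from its current value by `step_b` in the direction of the lead wolf. The function names, the use of Manhattan distance, and the `max_moves` safety cap are illustrative assumptions.

```python
def call_step(x, lead, step_b):
    """Move every coordinate by step_b towards the lead wolf (assumed form)."""
    new_x = []
    for xd, gd in zip(x, lead):
        diff = gd - xd
        if diff == 0:
            new_x.append(xd)                # already aligned in this variable
        else:
            new_x.append(xd + step_b * diff / abs(diff))
    return new_x


def manhattan(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))


def summon(x, lead, objective, y_lead, step_b, l_near, max_moves=1000):
    """Run towards the lead wolf; returns (position, value, became_lead)."""
    x = list(x)
    y = objective(x)
    for _ in range(max_moves):              # cap guards against overshoot loops
        if y > y_lead:                      # wolf found a better spot en route
            return x, y, True
        if manhattan(x, lead) < l_near:     # close enough: switch to besieging
            break
        x = call_step(x, lead, step_b)
        y = objective(x)
    return x, y, y > y_lead
```

With a fixed step the wolf can overshoot the lead wolf when it is closer than `step_b` in some variable, which is one reason the threshold below which it switches to besieging matters in practice.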
Besieging behavior: after running towards the lead wolf in large steps, the wolves are close to the prey, and all wolves except the lead wolf take the besieging behavior to capture it. Now, the position of the lead wolf is taken as the position of the prey. In particular, $G_d^k$ represents the position of the prey in the $d$th variable space at the $k$th iteration. The position of wolf $i$ is updated according to the following equation:
$\lambda$ is a random number uniformly distributed over a prescribed interval; $step_c$ is the step length of wolf $i$ when it takes the besieging behavior. $Y_{i0}$ is the concentration of prey smell perceived by wolf $i$ before the move, and $Y_{ik}$ is the concentration after it takes this behavior. If $Y_{i0} < Y_{ik}$, the position $X_i$ is updated; otherwise it is not changed.
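A besieging move with greedy acceptance might look like the sketch below. Since the update equation and the interval of the random factor are not reproduced in this text, the form `x + lam * step_c * |prey - x|` and the interval [-1, 1] for `lam` are assumptions; only the accept-if-better rule is taken directly from the description above.

```python
import random


def besiege(x, prey, objective, step_c, rng=random):
    """One besieging move; the new position is kept only if it is better."""
    lam = rng.uniform(-1.0, 1.0)            # assumed interval for the factor
    trial = [xd + lam * step_c * abs(gd - xd) for xd, gd in zip(x, prey)]
    if objective(trial) > objective(x):     # greedy acceptance (maximisation)
        return trial
    return list(x)
```

Because the move is scaled by the distance to the prey, the search becomes finer as the wolf closes in on the lead wolf's position.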
There are three step lengths, $step_a$, $step_b$, and $step_c$, in the three intelligent behaviors, and the three step lengths in the $d$th variable space should have the following relationship:
$S$ is the step coefficient and represents the fineness with which the artificial wolves search for prey in the solution space.
The stronger-survive renewing rule for the wolf pack: the prey is distributed from the strong to the weak, which results in the death of some weak wolves. The algorithm generates $R$ new wolves while deleting the $R$ wolves with the worst objective function values. Specifically, with the help of the lead wolf's hunting experience, in the $d$th variable space, the position of the $i$th of the $R$ wolves is defined as follows:
$g_d$ is the position of the artificial lead wolf in the $d$th variable space, and rand is a random number uniformly distributed over a prescribed interval.
When the value of $R$ is larger, it is better for sustaining the wolves' diversity and giving the algorithm the ability to open up new solution space. But if $R$ is too large, the algorithm becomes nearly a random search. Because the number and scale of prey captured by wolves differ in the natural world, the number of weak wolves that die also differs. $R$ is therefore an integer selected at random from an interval determined by $\beta$, the population renewing proportional coefficient.
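The renewing rule can be sketched as follows. Neither the generating formula nor the intervals are reproduced in this text, so three things here are assumptions: new wolves are placed near the lead wolf by scaling each lead coordinate with a factor `1 + r`, where `r` is uniform on a hypothetical `[-spread, spread]`; the number of replaced wolves is drawn from `[n/(2*beta), n/beta]`; and the function name `renew_pack` is illustrative.

```python
import random


def renew_pack(positions, values, lead, objective, beta, spread=0.1, rng=random):
    """Replace the worst wolves in place; returns how many were replaced."""
    n = len(positions)
    low = max(1, int(n / (2 * beta)))
    high = max(low, int(n / beta))
    r = rng.randint(low, high)              # assumed range for the count
    worst = sorted(range(n), key=values.__getitem__)[:r]
    for i in worst:
        # Assumed form: a small random scaling of the lead wolf's position.
        positions[i] = [gd * (1 + rng.uniform(-spread, spread)) for gd in lead]
        values[i] = objective(positions[i])
    return r
```

A smaller `beta` replaces more wolves per iteration, which favors diversity at the cost of moving the search closer to random sampling around the lead wolf.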
3.3. Algorithm Description
As described in the previous section, WPA has three artificial intelligent behaviors and two intelligent rules: the scouting, calling, and besieging behaviors, the winner-take-all rule for generating the lead wolf, and the stronger-survive renewing rule for the wolf pack.
Firstly, the scouting behavior increases the likelihood that WPA fully traverses the solution space. Secondly, the winner-take-all rule for generating the lead wolf and the calling behavior make the wolves move towards the lead wolf, whose position is the nearest to the prey and which is most likely to capture it. The winner-take-all rule and calling behavior also make the wolves arrive in the neighborhood of the global optimum after only a few iterations, since the step of the wolves in the calling behavior is the largest one. Thirdly, with a small step, $step_c$, the besieging behavior gives WPA the ability to open up new solution space and to search carefully for the global optimum within a good solution area. Fourthly, with the help of the stronger-survive renewing rule for the wolf pack, the algorithm obtains several new wolves whose positions are near the best wolf, the lead wolf, which allows more latitude in the search space to anchor the global optimum while keeping population diversity in each iteration.
All of the above give WPA superior accuracy and robustness, as will be seen in Section 4.
Having discussed all the components of WPA, the important computation steps are detailed below.
Step 1 (initialization). Initialize the following parameters: the initial position of each artificial wolf $X_i$, the number of wolves $N$, the maximum number of iterations $k_{max}$, the step coefficient $S$, the distance determinant coefficient $L_{near}$, the maximum number of repetitions in scouting behavior $T_{max}$, and the population renewing proportional coefficient $\beta$.
Step 2. The wolf with the best function value is taken as the lead wolf. In practical computation, every wolf except the lead wolf takes on different behaviors in different states. So, here, all wolves except the lead wolf first act as artificial scout wolves and take the scouting behavior according to formula (2) until $Y_i > Y_{lead}$ or the maximum number of repetitions $T_{max}$ is reached, and then go to Step 3.
Step 3. All wolves except the lead wolf next act as artificial ferocious wolves and gather towards the lead wolf according to (3); $Y_i$ is the smell concentration of prey perceived by wolf $i$; if $Y_i > Y_{lead}$, go to Step 2; otherwise wolf $i$ continues running until $L(i, l) < L_{near}$; then go to Step 4.
Step 4. The position of artificial wolves who take besieging behavior is updated according to (4).
Step 5. Update the position of lead wolf under the winner-take-all generating rule and update the wolf pack under the population renewing rule according to (6).
Step 6. If the program reaches the precision requirement or the maximum number of iterations, the position and function value of the lead wolf, that is, the optimal solution of the problem, are output; otherwise go to Step 2.
The flow chart of WPA is shown in Figure 1.
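Steps 1 to 6 can be put together in a compact sketch. This is an illustration under stated assumptions rather than the reference implementation: the update formulas, the step relationship (`step_a = step`, `step_b = 2 * step`, `step_c = step / 2`), the range of `h`, the interval of the besieging factor, and the renewing formula are all assumed, and the calling move is clamped so that a wolf cannot overshoot the lead wolf. Only the iteration limit is used as the stopping rule.

```python
import math
import random


def wpa_maximize(objective, bounds, n_wolves=20, k_max=60, step=0.2,
                 l_near=0.5, t_max=5, beta=4, seed=0):
    """Compact WPA-style maximiser; returns (best_position, best_value)."""
    rng = random.Random(seed)
    step_a, step_b, step_c = step, 2 * step, step / 2  # assumed relationship
    pack = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    vals = [objective(x) for x in pack]

    def gap(i, j):  # Manhattan distance between wolves i and j
        return sum(abs(a - b) for a, b in zip(pack[i], pack[j]))

    for _k in range(k_max):
        lead = max(range(n_wolves), key=vals.__getitem__)  # winner-take-all
        for i in range(n_wolves):
            if i == lead:
                continue  # the lead wolf skips the three behaviours
            # Step 2, scouting: probe h directions, keep the best improvement.
            for _t in range(t_max):
                if vals[i] > vals[lead]:
                    break
                h = rng.randint(4, 10)  # hypothetical range for h
                best, best_y = pack[i], vals[i]
                for p in range(1, h + 1):
                    off = math.sin(2 * math.pi * p / h) * step_a
                    trial = [v + off for v in pack[i]]
                    y = objective(trial)
                    if y > best_y:
                        best, best_y = trial, y
                pack[i], vals[i] = best, best_y
            # Step 3, calling: run towards the lead wolf (clamped move).
            while vals[i] <= vals[lead] and gap(i, lead) >= l_near:
                pack[i] = [v + max(-step_b, min(step_b, g - v))
                           for v, g in zip(pack[i], pack[lead])]
                vals[i] = objective(pack[i])
            if vals[i] > vals[lead]:
                lead = i  # this wolf takes over as lead wolf
                continue
            # Step 4, besieging: small random move, accepted only if better.
            lam = rng.uniform(-1.0, 1.0)
            trial = [v + lam * step_c * abs(g - v)
                     for v, g in zip(pack[i], pack[lead])]
            y = objective(trial)
            if y > vals[i]:
                pack[i], vals[i] = trial, y
        # Step 5: update the lead wolf, then renew the weakest wolves.
        lead = max(range(n_wolves), key=vals.__getitem__)
        r = rng.randint(max(1, n_wolves // (2 * beta)), max(1, n_wolves // beta))
        others = sorted((j for j in range(n_wolves) if j != lead),
                        key=vals.__getitem__)
        for j in others[:r]:
            pack[j] = [g * (1 + rng.uniform(-0.1, 0.1)) for g in pack[lead]]
            vals[j] = objective(pack[j])
    lead = max(range(n_wolves), key=vals.__getitem__)
    return pack[lead], vals[lead]
```

In this sketch the lead wolf's position is never modified except when a better wolf replaces it, so the best objective value cannot decrease from one iteration to the next.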
4. Experimental Results
The ingredients of the WPA method have been described in Section 3. In this section, the design of the experiments is explained, the sensitivity of WPA to its parameters is explored, and the empirical results are reported, comparing the WPA approach with GA, PSO, AFSA, ABC, and FA.
4.1. Design of the Experiments
4.1.1. Benchmark Functions
In order to evaluate the performance of these algorithms, eight classical benchmark functions are presented in Table 1. Though only eight functions are used in this test, they are enough to include some different kinds of problems such as unimodal, multimodal, regular, irregular, separable, nonseparable and multidimensional.
D: dimension; C: characteristic; U: unimodal; M: multimodal; S: separable; N: nonseparable.
If a function has more than one local optimum, it is called multimodal. Multimodal functions are used to test the ability of algorithms to escape local minima. Another grouping of test problems is into separable and nonseparable functions. A separable function of several variables can be expressed as the sum of functions of one variable, as with Sumsquares and Rastrigin. Nonseparable functions, such as Bridge, Rosenbrock, Ackley, and Griewank, cannot be written in this form. Because the variables of nonseparable functions are interrelated, these functions are more difficult than the separable functions.
In Table 1, the characteristics of each function are given in the column titled C. In this column, M means that the function is multimodal, while U means that the function is unimodal. If the function is separable, the abbreviation S is used; the letter N indicates that the function is nonseparable. As seen from Table 1, 4 functions are multimodal, 4 functions are unimodal, 3 functions are separable, and 5 functions are nonseparable.
The variety of function forms and dimensions makes it possible to assess the robustness of the proposed algorithm fairly within a limited number of iterations. Many of these functions allow a choice of dimension, and input dimensions ranging from 2 to 200 are used for the test functions. The dimensions of the problems that we used can be found in the column titled D. Initial ranges, formulas, and global optimum values of these functions are also given in Table 1.
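The distinction between separable and nonseparable functions can be illustrated with standard textbook forms of three of the benchmarks named above. These definitions are assumptions in the sense that the exact formulas and ranges in Table 1 are not reproduced here and may differ in detail.

```python
import math


def sumsquares(x):
    """Separable and unimodal: a weighted sum of one-variable terms."""
    return sum((i + 1) * v * v for i, v in enumerate(x))


def rastrigin(x):
    """Separable and multimodal: each term depends on a single variable."""
    return sum(v * v - 10 * math.cos(2 * math.pi * v) + 10 for v in x)


def rosenbrock(x):
    """Nonseparable: every term couples two neighbouring variables."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1) ** 2
               for i in range(len(x) - 1))


# Separability check: the value equals the sum over one-variable problems.
point = [0.3, -1.2, 2.5]
by_parts = sum(rastrigin([v]) for v in point)
```

For Rastrigin, `by_parts` matches `rastrigin(point)` up to rounding, whereas no such decomposition exists for Rosenbrock because each term mixes `x[i]` and `x[i + 1]`.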
4.1.2. Experimental Settings
In this subsection, experimental settings are given. Firstly, in order to fully compare the performance of different algorithms, we take the simulation under the same situation. So the values of the common parameters used in each algorithm such as population size and evaluation number were chosen to be the same. Population size was 100 and the maximum evaluation number was 2000 for all algorithms on all functions. Additionally, we follow the parameter settings in the original paper of GA, PSO, AFSA, ABC, and FA; see Table 2.
For each experiment, 50 independent runs were conducted with different initial random seeds. To evaluate the performance of these algorithms, six criteria are given in Table 3.
Accelerating convergence and avoiding local optima have become two important and appealing goals in swarm intelligence search algorithms. So, as seen in Table 3, we adopt the criteria Best, Mean, and standard deviation to evaluate the efficiency and accuracy of the algorithms, and the criteria ARt, Worst, and SR to evaluate the convergence speed, effectiveness, and robustness of the six algorithms.
Specifically, SR provides very useful information about how stable an algorithm is. Success is claimed if an algorithm obtains a solution below a prespecified threshold value within the maximum number of function evaluations. So, to calculate the success rate, an error accuracy level must be set. We compare the result with the known analytical optimum and consider a run "successful" if the following inequality holds:
The SR is a percentage value that is calculated as
$$SR = \frac{\text{number of successful runs}}{\text{total number of runs}} \times 100\%. \tag{8}$$
ARt is the average time an algorithm takes to reach a solution satisfying formula (7) over the 50 runs. ARt also provides very useful information about how fast an algorithm converges to a certain accuracy or under the same termination criterion, which has important practical significance.
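The six criteria can be computed from per-run results as in the sketch below. The success inequality is not reproduced in this text, so the test `abs(value - optimum) < eps` is an assumption, as are the function name `summarize_runs`, the use of the population standard deviation, and the choice to average the time over successful runs only.

```python
import statistics


def summarize_runs(best_values, run_times, optimum, eps):
    """Summarise independent runs with the six criteria of Table 3."""
    successes = [abs(v - optimum) < eps for v in best_values]  # assumed test
    ok_times = [t for t, ok in zip(run_times, successes) if ok]
    return {
        "Best": min(best_values, key=lambda v: abs(v - optimum)),
        "Worst": max(best_values, key=lambda v: abs(v - optimum)),
        "Mean": statistics.mean(best_values),
        "StdDev": statistics.pstdev(best_values),
        "SR": 100.0 * sum(successes) / len(best_values),
        "ARt": statistics.mean(ok_times) if ok_times else None,
    }


# Toy usage with four hypothetical runs on a problem whose optimum is 0.
summary = summarize_runs([0.0, 0.5, 0.001, 2.0], [1, 2, 3, 4],
                         optimum=0.0, eps=0.01)
```

In the toy usage two of the four runs fall within the tolerance, so SR is 50% and ARt is the mean time of those two runs.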
All algorithms were tested in Matlab 2008a on the same Lenovo A4600R computer with a dual-core 2.60 GHz processor and 1.99 GB of memory, running the Windows XP operating system.
4.2. Experiments 1: Effect of Distance Measurements and Four Parameters on WPA
In order to study the effect of the two distance measures and four parameters on WPA, different measures and parameter values were tested on the typical functions listed in Table 1. In each experiment, WPA is run 50 times on each function, and several of the criteria described in Section 4.1.2 are used. The experiment is conducted with the original coefficients shown in Table 9.
4.2.1. Effect of Distance Measurements on the Performance of WPA
This subsection will investigate the performance of different distance measurements using functions with different characteristics. As is known to all, Euclidean distance (ED) and Manhattan distance (MD) are the two most common distance metrics in practical continuous optimization. In the proposed WPA, MD or ED can be adopted to measure the distance between two wolves in the candidate solution space. Therefore, a discussion about their impacts on the performance of WPA is needed.
Consider two wolves, where $X_p = (x_{p1}, x_{p2}, \ldots, x_{pD})$ is the position of wolf $p$ and $X_q = (x_{q1}, x_{q2}, \ldots, x_{qD})$ is the position of wolf $q$. The ED and MD between them can be calculated, respectively, as in formula (9), where $D$ is the dimension of the solution space:
$$L_{ED}(p, q) = \sqrt{\sum_{d=1}^{D} (x_{pd} - x_{qd})^2}, \qquad L_{MD}(p, q) = \sum_{d=1}^{D} |x_{pd} - x_{qd}|. \tag{9}$$
The statistical results obtained by WPA after 50 runs are shown in Table 4. Firstly, we note that WPA with Euclidean distance (WPA_ED) does not reach a 100% success rate on the Colville and Griewank functions, while WPA with Manhattan distance (WPA_MD) does not reach a 100% success rate on the Griewank function, which means that WPA_ED and WPA_MD with the original coefficients still carry a risk of premature convergence to local optima.
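The two distance measures follow their standard definitions and can be written directly; the function names are illustrative.

```python
import math


def euclidean(p, q):
    """Euclidean distance (ED): square root of the summed squared gaps."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Manhattan distance (MD): sum of absolute coordinate gaps."""
    return sum(abs(a - b) for a, b in zip(p, q))


wolf_p, wolf_q = [0.0, 0.0], [3.0, 4.0]
ed = euclidean(wolf_p, wolf_q)
md = manhattan(wolf_p, wolf_q)
```

MD needs only subtractions and absolute values per variable, whereas ED also needs a multiplication per variable and a square root, so MD is the cheaper of the two to evaluate.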
As seen from Table 4, WPA is not very sensitive to two distance measurements on most functions (Rosenbrock, Sphere, Sumsquares, Booth, and Ackley), and no matter which metric is used, WPA can always get a good result with SR = 100%. But, for these functions, comparing the results between WPA_MD and WPA_ED in detail, we can find that WPA_MD has shorter average reaching time (ARt), which means faster convergence speed to a certain accuracy. The reason may be that ED has the higher computational complexity. Meanwhile, WPA_MD has better performance on other four criteria (best, worst, mean, and StdDev), which means better solution accuracy and robustness.
Naturally, because of its better efficiency, precision, and robustness, MD is more suitable for WPA. So the WPA algorithm used in what follows is WPA_MD.
4.2.2. Effect of Four Parameters on the Performance of WPA
In this subsection, we investigate the impact of the parameters $S$, $L_{near}$, $T_{max}$, and $\beta$ on the new algorithm. $S$ is the step coefficient, $L_{near}$ is the distance determinant coefficient, $T_{max}$ is the maximum number of repetitions in scouting behavior, and $\beta$ is the population renewing proportional coefficient. The parameter selection procedure is performed in a one-factor-at-a-time manner. For each sensitivity analysis in this section, only one parameter is varied at a time, and the remaining parameters are kept at the values suggested by the original estimate listed in Table 9. Interactions between parameters are assumed to be unimportant.
Each time, one of the WPA parameters is varied over a certain interval to see which value within this interval results in the best performance. As before, the WPA algorithm is run 50 times for each case.
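The one-factor-at-a-time procedure can be sketched generically as follows. The helper name `one_factor_at_a_time`, the parameter names, the baseline values, and the toy `evaluate` function are hypothetical; in practice `evaluate` would run WPA 50 times with the given parameters and return a summary statistic.

```python
def one_factor_at_a_time(evaluate, baseline, grids):
    """Vary one parameter over its grid while the others stay at baseline."""
    results = {}
    for name, values in grids.items():
        for value in values:
            params = dict(baseline)         # fresh copy of the baseline setting
            params[name] = value            # change exactly one parameter
            results[(name, value)] = evaluate(**params)
    return results


# Toy usage with a stand-in for "mean result over repeated WPA runs".
def toy_evaluate(step, beta):
    return step * 10 + beta


table = one_factor_at_a_time(toy_evaluate,
                             baseline={"step": 0.1, "beta": 2},
                             grids={"step": [0.1, 0.2], "beta": [2, 4]})
```

Because only one parameter changes per evaluation, the number of settings grows with the sum of the grid sizes rather than their product, at the price of ignoring interactions between parameters.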
Table 5 shows the sensitivity analysis of the step coefficient $S$. All results are shown in the form Mean ± Std (SR/%). The choice of interval used in this analysis was motivated by the original Nelder-Mead simplex search procedure, where a step coefficient greater than 0.04 was suggested for general usage.
Based on a detailed comparison of the results, WPA is not sensitive to the step coefficient on the Rosenbrock, Sphere, and Bridge functions, and for the Booth function there is a tendency towards better results with larger $S$. From Table 5, a step coefficient of 0.12 returns the best result, with a better Mean, a small Std, and SR = 100% for all functions.
Tables 6–8 analyze the sensitivity of $L_{near}$, $T_{max}$, and $\beta$. Generally speaking, WPA is not sensitive to $L_{near}$, $T_{max}$, and $\beta$ on most functions. The exception is the Griewank function, which is not only high-dimensional, with 100 parameters, but also has a very large search space (see Table 1), which makes it hard to optimize.