Research Article  Open Access
Hierarchical Swarm Model: A New Approach to Optimization
Abstract
This paper presents a novel optimization model called hierarchical swarm optimization (HSO), which simulates natural hierarchical complex systems, from which more complex intelligence can emerge for complex problem solving. The proposed model is intended to suggest ways in which the performance of HSO-based algorithms on complex optimization problems can be significantly improved. This improvement is obtained by constructing HSO hierarchies: an agent in a higher-level swarm can be composed of a swarm of agents from the lower level, and different swarms at different levels evolve on different spatiotemporal scales. A novel optimization algorithm (named PS^{2}O), based on the HSO model, is instantiated and tested to illustrate the ideas of the HSO model clearly. Experiments were conducted on a set of 17 benchmark optimization problems including both continuous and discrete cases. The results demonstrate remarkable performance of the algorithm on all chosen benchmark functions when compared to several successful swarm intelligence and evolutionary algorithms.
1. Introduction
Swarm intelligence (SI), which is inspired by the “swarm behaviors” of social animals [1], is an innovative artificial intelligence technique for solving hard optimization problems. In an SI system, there are many simple individuals that can interact locally with one another and with their environment. Although such systems are decentralized, local interactions between individuals lead to the emergence of global behaviors or global properties. For instance, flocks of birds and schools of fish form spatially self-organized patterns through social foraging [2]. Similar phenomena can also be observed in colonies of single-cell bacteria, social insects like ants and bees, as well as multicellular vertebrates, which all display collective intelligence [3].
As a problem-solving technique, many algorithmic SI methods were designed to deal with practical problems. In 1991, Dorigo proposed ant colony optimization (ACO) [4, 5] based on the foraging behaviors of ant colonies. ACO has been successfully used to solve discrete optimization problems, like the traveling salesman problem (TSP) [6]. After that, another SI algorithm, namely, particle swarm optimization (PSO), was proposed by Kennedy and Eberhart [7], which gleaned ideas from the social behavior of bird flocking and fish schooling [8–10]. PSO is primarily concerned with continuous optimization problems. In 2001, Passino proposed a technique known as bacterial foraging optimization (BFO), inspired by bacterial foraging behaviors [11]. Other swarm optimization methods have been developed, like artificial immune systems (AIS) [12], which are based on the metaphor of the immune system as a collective intelligence process [13]. Recently, Karaboga described a bee swarm algorithm called the artificial bee colony (ABC) algorithm [14], and Basturk and Karaboga compared the performance of the ABC algorithm with that of the genetic algorithm (GA) in [15]. These SI paradigms have already come to be widely used in many areas [8, 16–22].
In current artificial SI systems, however, researchers take into account only the collective behaviors of one level of individuals and ignore the hierarchical nature [23] of real-world systems and animal societies. In fact, for most social insects and animals, the organizational structure is not flat. They can form complex hierarchical (or multilevel) system structures through self-organization and division of labor [24]. In other words, in a hierarchical system, a swarm of lower-level individuals can be the infrastructure of a single individual at the higher level [25, 26]. Here the term “swarm” is used in a general sense to refer to any collection of interacting agents. In most natural hierarchically complex systems, swarms of lower-level agents interact with each other to constitute the constituent agents of more complex higher-level swarms, repeatedly, until very complex structures with greatly enhanced macroscopic intelligence emerge. This phenomenon is so common in the natural world that it guides us to design a multilevel algorithmic model to mimic the hierarchical emergence of natural societies.
First, this paper extends the traditional SI framework from flat (one level) to hierarchical (multiple levels) by proposing a novel optimization model called hierarchical swarm optimization (HSO). In the HSO model, collective behaviors at multiple levels are taken into account to solve complex problems. Then some initial insights into this method are provided by designing a two-level HSO algorithm (named PS^{2}O) based on the canonical PSO model. Four versions of PS^{2}O are realized according to the different structures of cooperation and interaction types at each level. In order to evaluate the performance of PS^{2}O, extensive studies based on a set of 17 benchmark functions (including both continuous and discrete cases) have been carried out. For comparison purposes, we also implemented the genetic algorithm (GA), the covariance matrix adaptation evolution strategy (CMAES), the artificial bee colony (ABC) algorithm, and four state-of-the-art PSO variants on these functions. The experimental results are encouraging: the PS^{2}O algorithm achieved remarkable search performance in terms of accuracy, robustness, and convergence speed on all benchmark functions.
The rest of the paper is organized as follows. Section 2 describes the proposed hierarchical swarm optimization model. In Section 3, a novel HSObased optimization algorithm, namely, PS^{2}O, is given. Section 4 tests the algorithm on the benchmarks and illustrates the results. Finally, Section 5 outlines the conclusions.
2. Hierarchical Swarm Optimization
2.1. From Flat Swarm to Hierarchical Swarm
In [3], Bonabeau et al. define swarm intelligence as “the emergent collective intelligence of groups of simple agents”. From this perspective, artificial SI systems designed for complex problem solving maintain a swarm made up of many isomorphic and relatively simple individuals that often share the same set of states and behaviors. In such a swarm, all individuals have absolutely equal status over the whole life cycle. The interaction relations between these individuals are symmetrical and operate on the same spatiotemporal scale. One individual can be substituted by another while the function of the swarm remains steady. That is, the architecture and functionality of classical SI are flat (Figure 1(a)).
(a) Flat structure
(b) Hierarchical structure
However, swarm intelligence explains only part of the mechanisms of collective behavior in biology. The natural cases can be more complex: in an ant colony, for example, beyond their individual tasks, groups of workers form units that lie at a hierarchical level between an individual ant and the colony as a whole, and thus constitute what might be called “intermediate-level parts” [27]. Now consider two basic types of systems: hierarchical and nonhierarchical (flat). Flat systems can be regarded as a group of undifferentiated particles, such as traditional SI systems. Hierarchical swarm systems must have a structure with a minimum of two hierarchical levels (Figure 1(b)). In Figure 1(b), a particle is the minimum unit of the system, while an agent, which is composed of a number of particles, constitutes the intermediate hierarchical level. From this perspective, it is evident that the complexity of the individual “agents” of SI systems is dramatically simplified to particles, the minimum units of the systems. Hence, the hierarchical nature of swarms [23] is ignored in traditional artificial SI systems, such as PSO and ACO.
Hierarchy is common in the real world. For example, immune-system antibodies continuously self-organize and evolve while being part of the many “organism agents” of a bird; a bird is in turn an agent in the formation of a flock of birds; and the flock of birds is in turn an agent that is part of a particular ecosystem niche [28]. Genes, the elementary biochemical coding units, are complicated macromolecular strings, as are the metabolic units, the proteins. Neurons, the basic elements of cognitive networks, are themselves cells. In any of these examples, it is evident that the interactions of the agents lead to a coherent structure at a higher level [29]. That is, the emergent characteristics of a particular lower-level system frequently form an individual agent at a higher level of the hierarchical system. This aspect has been emphasized by many researchers in artificial intelligence and complex systems [23, 25, 29–32].
Hence, this paper strives to extend the traditional SI framework from flat to hierarchical by proposing the hierarchical swarm optimization model. By incorporating these new degrees of complexity, HSO-based optimization algorithms gain considerable potential for solving more complex problems.
2.2. HSO Model Description
The HSO model accommodates a hierarchical multiagent system, in which an agent can itself be a swarm of other agents:
(1) HSO is composed of a number of levels, each of which is a multiagent system composed of several swarms of agents.
(2) Each swarm of lower-level agents is aggregated into a single agent at the next higher level.
(3) The behavior of the top level emerges from the organization of all the levels below it.
HSO naturally admits a description in terms of higher and lower levels, where the lower level is nested within the higher level. Any agent at any level is both a component of a given swarm at its own level and a subsystem decomposable into a swarm of other agents at the adjacent lower level of HSO (as shown in Figure 2). Note that the agents at the lowest level are the particles, the minimum, indecomposable units of this hierarchical system. HSO is a heterogeneous system in which each swarm at each level evolves in its own population and adapts to the environment through the application of any SI method at hand. The interaction topology of HSO can also be a heterogeneous hierarchical structure. Namely, the evolution rules and the interaction topology of distinct swarms can differ, and these different SI paradigms hierarchically construct the HSO model and lead to the hierarchical emergence of intelligence. In mathematical terms, the HSO model can be defined as in Table 1.
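The recursive agent-of-swarms structure described above can be illustrated as a recursive data type. This is a minimal sketch, not part of the model's formal definition in Table 1; the `Agent` and `build_hso` names and the uniform branching factor are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Agent:
    """An HSO agent: either a bottom-level particle or a swarm of
    lower-level agents (an empty member list marks a particle)."""
    level: int
    members: List["Agent"] = field(default_factory=list)

    def is_particle(self) -> bool:
        return not self.members

def build_hso(levels: int, branching: int) -> Agent:
    """Recursively build an HSO hierarchy: each agent at level l > 1 is
    composed of `branching` agents of level l - 1; level-1 agents are
    the indecomposable particles."""
    if levels == 1:
        return Agent(level=1)
    return Agent(level=levels,
                 members=[build_hso(levels - 1, branching)
                          for _ in range(branching)])

def count_particles(agent: Agent) -> int:
    """Total number of bottom-level particles beneath an agent."""
    if agent.is_particle():
        return 1
    return sum(count_particles(m) for m in agent.members)

root = build_hso(levels=3, branching=4)  # a 3-level hierarchy, 4 agents per swarm
assert count_particles(root) == 16       # 4 x 4 bottom-level particles
```

In this sketch each swarm would evolve by its own SI rule, matching the heterogeneity the model allows.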

Figure 3 gives a general description of HSO containing four main functional blocks. In the first block of Figure 3, we show that under external environmental pressure (defined by the objective function), each agent in the HSO model evolves and adapts as a consequence of internal and external hierarchical interactions. At both the higher and lower levels, the swarms can be manipulated by different SI algorithms (as shown in blocks 2 and 3 of Figure 3). In principle, any SI algorithm (such as PSO, ACO, BFO, or ABC) can be used by any swarm at any level, and we have first-hand experience constructing HSO paradigms using PSO and BFO [33–35]. Interactions that occur within one level (where each entity of the interaction operates on the same spatiotemporal scale) are called symmetrical relations.
On the other hand, asymmetrical relationships occurring between different levels are called “constraints” [32]. The fourth block describes the constraint through which the higher level affects elements of the lower level. When an agent at the higher level transmits information to its constituent swarm of lower-level agents, the effect is the corresponding evolutionary action of that swarm of constituent agents.
3. Case Study: The PS^{2}O Algorithm
In this section, we implement a simple two-level HSO algorithm that employs the PSO method in each swarm of each level, and hence we name it PS^{2}O. Here the agents (particles) in the lower level (level 1) are analogous to individuals in a biological population (species), and the agents in the higher level (level 2) are analogous to species. Mirroring the hierarchical interactions that occur in real ecosystems, from the macro view, dissimilar species establish symbiotic relationships to improve their survivability in level 2 of PS^{2}O (i.e., interspecies cooperation); from the micro view, the members of a species (the particles) cooperatively interact with each other in level 1 of PS^{2}O (i.e., intraspecies cooperation).
3.1. Levels Detail of PS^{2}O
Here the basic goal is to find the minimum of an objective function f. We create an ecosystem whose level 2 contains a species set {S_{1}, ..., S_{M}}, and each species S_{k} possesses a member set {x_{k,1}, ..., x_{k,N}} in level 1. The ith member of the kth species is characterized by the position vector x_{k,i}. In each generation t, the evolution process at each level is detailed as follows.
3.1.1. Level 1
Level-1 agents are clustered into M swarms, each of which possesses N agents. Each swarm constitutes an agent of level 2. Each swarm of level 1 evolves within its own separate population via a separate PSO algorithm; that is, there are M parallel PSO paradigms evolving separately in level 1. This process addresses the cooperation between individuals of the same species: within species k, one or more members in the neighborhood of x_{k,i} contribute their experience to x_{k,i}, which in turn shares its knowledge with its neighbors. Then x_{k,i} accelerates towards its personal best position and the best position found by its species members in its neighborhood:
a^{1}_{k,i} = c_{1}r_{1}(p_{k,i} − x_{k,i}) + c_{2}r_{2}(s_{k,i} − x_{k,i}),
where a^{1}_{k,i} is the social acceleration vector of x_{k,i}, p_{k,i} is the personal best position found so far by x_{k,i}, s_{k,i} is the best position found so far by its neighbors within species k, c_{1} is the individual learning rate, c_{2} is the social learning rate, and r_{1}, r_{2} are two random vectors uniformly distributed in [0, 1].
3.1.2. Level 2
All level-2 agents aggregate into a single swarm. This swarm of distinct symbiotic species coevolves via the social-only version of PSO [36], as the cognitive processes have already been taken care of by the level-1 swarms. From the coevolution perspective, species k accelerates towards the best position that its symbiotic partners have found:
a^{2}_{k} = c_{3}r_{3}(c_{k} − x_{k}),
where a^{2}_{k} is the symbiotic acceleration vector of species k, c_{k} is the best position found so far by the symbiotic partners of the kth species, c_{3} is the “symbiotic learning rate”, and r_{3} is a uniform random sequence in the range [0, 1].
3.1.3. Constraints
When species k in level 2 accelerates towards the best position, c_{k}, found by its more successful symbiotic partners, the corresponding evolutionary action of this agent's swarm of constituent level-1 agents is that all the members of species k accelerate towards c_{k} too:
a^{2}_{k,i} = c_{3}r_{3}(c_{k} − x_{k,i}),
where a^{2}_{k,i} is the symbiotic acceleration vector of x_{k,i}.
Then the velocity and position of each member x_{k,i} of species k are updated according to
v_{k,i}(t+1) = χ[v_{k,i}(t) + c_{1}r_{1}(p_{k,i} − x_{k,i}) + c_{2}r_{2}(s_{k,i} − x_{k,i}) + c_{3}r_{3}(c_{k} − x_{k,i})],
x_{k,i}(t+1) = x_{k,i}(t) + v_{k,i}(t+1),
where χ is known as the constriction coefficient [37].
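The velocity and position update described above can be sketched for a single member as follows. This is a minimal sketch under the assumption of a standard constricted PSO form with the three acceleration terms (personal best, species best, community best) named in Sections 3.1.1–3.1.3; the function and parameter names are illustrative, and the defaults mirror the φ = 4.1 setting used in Section 4.

```python
import random

def ps2o_update(x, v, p_best, s_best, c_best,
                c1=1.367, c2=1.367, c3=1.367, chi=0.729):
    """One velocity/position update for member x_{k,i} of species k.
    p_best: its personal best; s_best: the best of its neighbors within
    the species; c_best: the best found by the species' symbiotic partners."""
    new_v, new_x = [], []
    for d in range(len(x)):
        r1, r2, r3 = (random.random() for _ in range(3))
        vd = chi * (v[d]
                    + c1 * r1 * (p_best[d] - x[d])   # cognitive term
                    + c2 * r2 * (s_best[d] - x[d])   # intraspecies (social) term
                    + c3 * r3 * (c_best[d] - x[d]))  # interspecies (symbiotic) term
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

When all three attractors coincide with the current position, the update reduces to a pure constriction of the velocity, which is one quick sanity check of the form.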
3.2. Hierarchical Interaction Topologies
Systems of interacting agents—like many natural and social systems—are typically depicted by scientists as graphs or networks, in which individuals can be connected to one another according to a great number of schemes [38]. In PSO, since the original particle swarm model is a simulation of the social environment, a neighborhood, structured as an interaction topology graph, is defined for each particle as the subset of particles it is able to communicate with. Four classical interaction topologies are shown in Figure 4.
Most particle swarm implementations use one of two simple interaction topologies. The first, the fully connected topology (see Figure 4(a)), conceptually connects all members of the population to one another. The effect of this topology is that each particle is influenced by the very best performance of any member of the entire population. This means faster convergence, which implies a higher risk of converging to a local minimum. Experiments show that the fully connected topology is faster than the other neighborhoods, but it reaches the optimum fewer times than any other. The second, called the ring topology (see Figure 4(b)), creates a neighborhood for each individual comprising itself and its two nearest neighbors in the population. The ring neighborhood is more robust when the maximum number of iterations is increased, but much slower. However, experiments show that the ring neighborhood cannot meet the required precision for many complex problems. That is, it promotes exploration but unfortunately fails to provide exploitation.
In our model, the interaction of agents occurs in a two-level hierarchical topology. By employing two simple topologies—the ring and the fully connected topologies—for the swarms at different levels, four hierarchically nested interaction topologies are obtained. As shown in Figure 5, each hierarchical topology comprises 4 swarms in level 2, and each swarm possesses 4 agents from level 1. The first two topologies have a homogeneous hierarchical structure (employing the ring or the fully connected topology at both levels), and the other two have heterogeneous hierarchical structures (employing the ring and fully connected topologies at different levels, resp.). Four variant versions of the PS^{2}O algorithm are studied in this paper according to these four interaction topologies.
(i) PS^{2}OS: in level 1, agents interact with every other agent in their swarm. In level 2, each agent is influenced by the performance of all the other agents. That is, the swarms at both levels are configured into the fully connected topology (Figure 5(a)).
(ii) PS^{2}OR: in level 1, each agent interacts with the 2 immediate agents in its neighborhood. In level 2, each agent is influenced by the performance of its two symbiotic partners only. That is, both levels are configured into the ring topology (Figure 5(b)).
(iii) PS^{2}O: in level 1, agents interact with every other agent in their swarm. In level 2, each agent is influenced by the performance of its two symbiotic partners only. That is, each swarm of level 1 is configured into the fully connected topology while level 2 is configured into the ring topology (Figure 5(c)).
(iv) PS^{2}O: in level 1, each agent interacts with the 2 immediate agents in its neighborhood. In level 2, each agent is influenced by the performance of all the other agents. That is, each swarm of level 1 is configured into the ring topology while level 2 is configured into the fully connected topology (Figure 5(d)).
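The two building-block topologies can be expressed as simple neighbor-index functions, from which any of the four nested variants is assembled by choosing one function per level. This is an illustrative sketch; the function names are not from the paper.

```python
def ring_neighbors(i: int, n: int) -> list:
    """Ring topology: an individual's neighborhood is itself and its
    two nearest neighbors in a population of size n."""
    return [(i - 1) % n, i, (i + 1) % n]

def fully_connected_neighbors(i: int, n: int) -> list:
    """Fully connected topology: every individual is a neighbor."""
    return list(range(n))

# A hierarchical topology pairs one scheme per level, e.g. the variant
# with a ring inside each level-1 swarm and a fully connected graph
# between the 4 level-2 agents:
level1_topology = ring_neighbors
level2_topology = fully_connected_neighbors

assert ring_neighbors(0, 4) == [3, 0, 1]
assert fully_connected_neighbors(2, 4) == [0, 1, 2, 3]
```

Swapping the two assignments yields the other heterogeneous variant, and using the same function at both levels yields the two homogeneous ones.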
3.3. Matrix Representation
A multidimensional array representation of the PS^{2}O algorithm is proposed in this section. PS^{2}O randomly initializes M species, each possessing N members, to represent a biological community in a natural ecosystem. Then the positions X, velocities V, and personal best locations P of the biological community are all specified as three-dimensional (3D) matrices (as shown in Figures 6(a)–6(c)), where the first matrix dimension—the species number—is the number of species in level 2, the second matrix dimension—the swarm size—is the number of agents in each swarm of level 1, and the third matrix dimension—the dimension—is the number of dimensions of the objective problem.
In the PS^{2}O model, in order to update the velocity and position matrices, every agent in level 1 must accelerate towards three factors: the previous best position of the agent itself (called the “personal best”), the previous best position of the other members in its neighborhood (called the “species best”), and the previous best position found by the other species (agents of level 2) that have a cooperative symbiotic relation to the species this agent belongs to (called the “community best”). The species best is represented by a 2D matrix S, shown in Figure 6(d), left, and the community best is recorded in a 1D matrix C, shown in Figure 6(d), right.
The X, V, P, S, and C matrices together record all of the update information required by the PS^{2}O algorithm. These matrices are elegantly updated in successive iterations to numerically model the hierarchical emergence. The velocity and position matrices must be updated element by element in each generation to obtain the intended behaviors. Note that the update terms are exactly those described in the previous section: the term weighted by c_{1} is associated with each individual's own cognition, the term weighted by c_{2} is associated with cooperative coevolution within each swarm in level 1, and the term weighted by c_{3} is associated with symbiotic coevolution between dissimilar species in level 2.
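Assuming NumPy-style arrays, the element-by-element update described above can be written as a single broadcast operation over the 3D matrices. This is a sketch only: the placeholder initializations of S and C stand in for the real best-position bookkeeping, and the array names simply follow the matrix names in the text.

```python
import numpy as np

M, N, D = 4, 5, 3  # species number, swarm size, problem dimension
rng = np.random.default_rng(0)

X = rng.uniform(-1.0, 1.0, (M, N, D))  # positions, 3D matrix
V = np.zeros((M, N, D))                # velocities
P = X.copy()                           # personal bests (M, N, D)
S = P[:, 0, :]                         # species bests (M, D) -- placeholder values
C = S[0]                               # community best (D,)  -- placeholder values

c1 = c2 = c3 = 1.367
chi = 0.729
r1, r2, r3 = (rng.uniform(size=(M, N, D)) for _ in range(3))

# Matrix form of the update: S broadcasts over the member axis and C
# over both the species and member axes.
V = chi * (V + c1 * r1 * (P - X)
             + c2 * r2 * (S[:, None, :] - X)
             + c3 * r3 * (C[None, None, :] - X))
X = X + V
assert X.shape == (M, N, D)
```

The broadcast replaces the three nested loops over species, members, and dimensions, which is the practical payoff of the matrix representation.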
The main differences between PS^{2}O and PSO are the matrix implementation and the modified velocity-update equation; that is, the complexity of this new HSO algorithm is similar to that of the original PSO. The flowchart of the PS^{2}O algorithm is presented in Figure 7, and the corresponding variables used in PS^{2}O are summarized in Table 2.

4. Experimental Result and Discussion
In the experimental studies, following the no free lunch (NFL) theorem [39], a set of 17 benchmark functions (with both continuous and discrete characteristics), which are listed in the appendix, was employed to fully evaluate the performance of the PS^{2}O algorithm without a conclusion biased towards some chosen problems.
4.1. Experimental Setting
Experiments were conducted with the four variations of PS^{2}O (PS^{2}Os) according to the four hierarchical interaction topologies. To fully evaluate the performance of the proposed PS^{2}O, seven successful EA and SI algorithms were used for comparison:
(i) canonical PSO with constriction factor (PSO) [37],
(ii) fully informed particle swarm (FIPS) [40],
(iii) unified particle swarm (UPSO) [41],
(iv) fitness-distance-ratio-based PSO (FDRPSO) [42],
(v) standard genetic algorithm (GA) [43],
(vi) covariance matrix adaptation evolution strategy (CMAES) [44],
(vii) artificial bee colony algorithm (ABC) [15].
Among these optimization tools, GA is the classical search technique that enables the fittest candidates among discrete strings to survive and reproduce based on random information search and exchange, imitating natural biological selection; the underlying idea of CMAES is to gather information about successful search steps and to use that information to modify the covariance matrix of the mutation distribution in a goal-directed, derandomized fashion; ABC is a recently developed SI paradigm simulating the foraging behavior of bees; UPSO combines the global and local versions of PSO to construct a unified particle swarm optimizer; FIPS uses the knowledge of all of a particle's neighbors to update its velocity; and, when updating each velocity dimension, FDRPSO selects one other particle that has a higher fitness value and is nearer to the particle being updated.
In all experiments in this section, the values of the common parameters used in each algorithm such as population size and total generation number were chosen to be the same. Population size was 150 and the maximum evaluation number was 10000 for continuous functions and 1000 for discrete functions.
According to Clerc's method [37], when the constriction factor χ is implemented as in the canonical PSO algorithm, χ is calculated from the values of the acceleration coefficients (i.e., the learning rates) c_{1} and c_{2}; importantly, it is the sum of these coefficients that determines which χ to use. This fact implies that the particle's velocity can be adjusted by any number of terms, as long as the acceleration coefficients sum to an appropriate value. Thus, the constriction factor in the velocity formula of PS^{2}O can be calculated by
χ = 2/|2 − φ − sqrt(φ^{2} − 4φ)|, (4.1)
where φ = c_{1} + c_{2} + c_{3} and φ > 4. Then the algorithm will behave properly, at least as far as its convergence and explosion characteristics are concerned, whether all of φ is allocated to one term or it is divided into thirds, fourths, and so forth. Hence, for each PS^{2}O variant, except when different interaction topologies are used, the parameters were set to the values c_{1} = c_{2} = c_{3} = φ/3 with φ = 4.1, and then χ = 0.729, which is calculated by (4.1).
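The constriction computation can be checked numerically. This small sketch implements Clerc's formula as used here, with φ = 4.1; the function name is illustrative.

```python
from math import sqrt

def constriction(phi: float) -> float:
    """Clerc's constriction coefficient; requires phi > 4 so that the
    square root is real and the denominator is nonzero."""
    if phi <= 4.0:
        raise ValueError("phi must exceed 4")
    return 2.0 / abs(2.0 - phi - sqrt(phi * phi - 4.0 * phi))

# With phi = 4.1 split evenly into thirds (c1 = c2 = c3 = phi / 3),
# the coefficient comes out to approximately 0.729.
chi = constriction(4.1)
assert abs(chi - 0.729) < 1e-3
```

As the text notes, only the sum φ matters for χ, so dividing it among two, three, or more acceleration terms leaves the convergence behavior intact.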
All the control parameters for the other algorithms were set to the defaults of their original literature. In the continuous optimization experiments, for CMAES, the initialization conditions were the same as in [44], with the number of offspring candidate solutions generated per time step also chosen as in [44]; for ABC, the limit parameter was set as in [15], in terms of the dimension of the problem and the number of employed bees; for canonical PSO and UPSO, the learning rates c_{1} and c_{2} were both 2.05 and the constriction factor χ = 0.729; for FIPS, the constriction factor χ equals 0.729 and the U-ring topology, which achieved the highest success rate, was used; for FDRPSO, the inertia weight ω started at 0.9 and ended at 0.5, and the acceleration-coefficient settings recommended in [42] were adopted.
Since there is no literature using CMAES, ABC, UPSO, FIPS, or FDRPSO for discrete optimization so far, the discrete optimization experiments compare the PS^{2}Os only with the binary version of canonical PSO and standard GA. For GA, single-point crossover with a rate of 0.8 was employed and the mutation rate was set to 0.01. For discrete PSO and the PS^{2}O variants, the learning rates were kept as in the continuous case and the constriction factor was set to χ = 1. The sigmoid function S(v) = 1/(1 + e^{−v}) was used as the transfer function to discretize the positions X of PSO and the PS^{2}O variants [45]. The velocity update equation then remains unchanged, while the position update equation for discrete problems is defined by
x_{k,i,d} = 1 if ρ < S(v_{k,i,d}), and x_{k,i,d} = 0 otherwise, (4.2)
where ρ is a uniform random number in [0, 1].
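The sigmoid transfer step for the discrete variants can be sketched as follows. Names are illustrative; each bit is resampled to 1 with probability S(v), which is the standard binary-PSO discretization the text cites [45].

```python
import math
import random

def sigmoid(v: float) -> float:
    """Transfer function mapping a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def discrete_position_update(v_row):
    """Binary position update: each bit becomes 1 with probability S(v_d),
    0 otherwise; the velocity update itself is unchanged."""
    return [1 if random.random() < sigmoid(vd) else 0 for vd in v_row]

bits = discrete_position_update([-2.0, 0.0, 2.0])
assert all(b in (0, 1) for b in bits)
```

Large positive velocities drive a bit towards 1 and large negative velocities towards 0, while a zero velocity leaves the bit uniformly random.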
The number of agents (species) in level 2 (i.e., the swarm number M of the level-2 swarm) needs to be tuned. Six benchmark functions—Sphere 10D, Rosenbrock 10D, Rastrigrin 10D, Goldberg 120D, Bipolar 60D, and the Discrete multimodal problem 100D—were used to investigate the impact of this parameter. Experiments were executed with PS^{2}OR on Sphere, PS^{2}O on Rosenbrock, PS^{2}OS on Rastrigrin, PS^{2}OR on Goldberg, PS^{2}O on Bipolar, and PS^{2}OS on the Discrete multimodal problem by changing the number of swarms M and fixing each swarm size at 10. The average test results obtained from 30 runs are plotted in Figure 8. For continuous problems, the performance measure is the average best-so-far fitness value, while for discrete cases, the performance measure is the mean iteration at which the function minimum 0 is reached. From Figure 8, we can observe that the performance of the PS^{2}Os is sensitive to the number of agents in level 2. As M increased, we obtained faster convergence and better results on all test functions. However, the performance improvement is no longer evident once M exceeds 15 for most test functions. Thus, in our experiments, the parameter M of the PS^{2}Os is set to 15 for all test functions (i.e., each of the 15 level-2 swarms possesses 10 agents of level 1).
Each algorithm was run 30 times on each benchmark function. The number of generations was set to 10000 for the 10 continuous benchmark functions and 1000 for the 7 discrete functions.
4.2. Continuous Unimodal Functions
Unimodal problems have been adopted to assess the convergence rates of optimization algorithms. We tested the four PS^{2}O variants on the set of unimodal functions in comparison with the CMAES, ABC, PSO, FIPS, UPSO, and FDRPSO algorithms. Table 3 lists the experimental results (i.e., the means and standard deviations of the function values found in 30 runs) for each algorithm on these functions. Figure 9 shows the search progress of the average values found by the eight algorithms over 30 runs.

From Table 3 and Figure 9, the four PS^{2}O variants converged much faster to significantly better results than all the other algorithms. The PS^{2}O variant with the heterogeneous hierarchical structure was the fastest at finding good results within relatively few generations. All PS^{2}O variants were able to consistently find the minimum of three of the unimodal functions within 10000 generations.
From the comparison between PS^{2}O and the other algorithms, we can see that, statistically, PS^{2}O has significantly better performance on the continuous unimodal functions. From the rank values presented in Table 3, the search performance of the algorithms tested here is ordered as PS^{2}OS > PS^{2}OR > CMAES > PS^{2}O > PS^{2}O > UPSO > FDRPSO > ABC > FIPS > PSO.
4.3. Continuous Multimodal Functions
The first four multimodal functions are regarded as the most difficult to optimize, since the number of local minima increases exponentially with the function dimension. According to the results reported in [22], CLPSO, PSO, CMAES, G3PCX, DE, and the other algorithms used for comparison all failed to find the minimum of the six composition functions designed by Liang. Since these methods have demonstrated excellent performance on standard benchmark functions, the six composition functions are clearly very complex. In this paper, we test PS^{2}O only on the first composition function; tests on the other five composition functions will be studied in future work. The means and standard deviations of the function values found in 30 runs for each algorithm on each function are listed in Table 4. Figure 10 shows the search progress of the average values found by the ten algorithms over 30 runs for the multimodal functions.

From Table 4 and Figure 10, it is clear that for most of the tested continuous benchmark functions, all the PS^{2}O algorithms except one markedly outperformed the other algorithms. For example, PS^{2}OR and PS^{2}O found the global minimum in every run on one of the multimodal functions, and PS^{2}OR also consistently found the minimum of another within relatively few generations, while the other algorithms generated poorer results on them. On two of the functions, the four PS^{2}O algorithms yielded results similar to those of the other algorithms. From the rank values presented in Table 4, the search performance of the algorithms tested here is ordered as PS^{2}OR > PS^{2}O > ABC > PS^{2}OS > FIPS > CMAES > FDRPSO > PSO > UPSO > PS^{2}O.
It should be mentioned that the PS^{2}O variants were the only algorithms reported in the literature so far that consistently found the minimum of the composition function.
4.4. Discrete Functions
In binary optimization, it is very easy to design algorithms that are extremely good on some benchmarks (and extremely bad on others) [46]. In order to fully evaluate the performance of PS^{2}O on discrete problems, we employed a carefully chosen set of discrete benchmark functions