Journal of Artificial Evolution and Applications 
Volume 2008 (2008), Article ID 143624, 14 pages
doi:10.1155/2008/143624
Research Article

Geometric Particle Swarm Optimization

Alberto Moraglio,1 Cecilia Di Chio,2 Julian Togelius,3 and Riccardo Poli2

1Centre for Informatics and Systems of the University of Coimbra, Polo II - University of Coimbra, Coimbra 3030-290, Portugal
2Department of Computing and Electronic Systems, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
3Dalle Molle Institute for Artificial Intelligence (IDSIA), Galleria 2, Manno-Lugano 6928, Switzerland

Received 21 July 2007; Accepted 4 December 2007

Recommended by T. Blackwell

Abstract

Using a geometric framework for the interpretation of crossover of recent introduction, we show an intimate connection between particle swarm optimisation (PSO) and evolutionary algorithms. This connection enables us to generalise PSO to virtually any solution representation in a natural and straightforward way. The new Geometric PSO (GPSO) applies naturally to both continuous and combinatorial spaces. We demonstrate this for the cases of Euclidean, Manhattan and Hamming spaces and report extensive experimental results. We also demonstrate the applicability of GPSO to more challenging combinatorial spaces. The Sudoku puzzle is a perfect candidate to test new algorithmic ideas because it is entertaining and instructive as well as being a nontrivial constrained combinatorial problem. We apply GPSO to solve the Sudoku puzzle.

1. Introduction

Particle swarm optimization (PSO) is a relatively recently devised population-based stochastic global optimization algorithm [1]. PSO has many similarities with evolutionary algorithms, and has also proven to have robust performance over a variety of difficult optimization problems. However, the original formulation of PSO requires the search space to be continuous and the individuals to be represented as vectors of real numbers.

There is a number of extensions of PSO to combinatorial spaces with various degrees of success [2, 3]. (Notice that applications of traditional PSO to combinatorial optimization problems cast as continuous optimization problems are not extensions of the PSO algorithm.) However, every time a new solution representation is considered, the PSO algorithm needs to be rethought and adapted to the new representation. In this article, we extend PSO to richer spaces by making use of a rigorous mathematical generalization of the notion (and motion) of particles to a general class of spaces. This approach has the advantage that a PSO can be derived in a principled way for any search space belonging to the given class.

In particular, we show formally how a general form of PSO (without the inertia term) can be obtained by using theoretical tools developed for evolutionary algorithms with geometric crossover and geometric mutation. These are representation-independent operators that generalize many pre-existing search operators for the major representations, such as binary strings [4], real vectors [4], permutations [5], syntactic trees [6], and sequences [7]. (The inertia weight was not part of the original proposal of PSO, it was later introduced by Shi and Eberhart [8].)

Firstly, we formally derive geometric PSOs (GPSOs) for Euclidean, Manhattan, and Hamming spaces and discuss how to derive GPSOs for virtually any representation in a similar way. Then, we test the GPSO theory experimentally: we implement the specific GPSO for Euclidean, Manhattan, and Hamming spaces and report extensive experimental results showing that GPSOs perform very well.

Finally, we also demonstrate that GPSO can be specialized easily to nontrivial combinatorial spaces. In previous work [9], we have used the geometric framework to design an evolutionary algorithm to solve the Sudoku puzzle and obtained very good experimental results. Here, we apply GPSO to solve the Sudoku puzzle.

In Section 2, we introduce the geometric framework and introduce the notion of multiparental geometric crossover. In Section 3, we recast PSO in geometric terms and generalize it to generic metric spaces. In Section 4, we apply these notions to the Euclidean, Manhattan, and Hamming spaces. In Section 5, we discuss how to specialize the general PSO automatically to virtually any solution representation using geometric crossover. Then, in Section 6, we report experimental results with the GPSOs for Euclidean, Manhattan, and Hamming spaces, and we compare them with a traditional PSO. In Section 7, we apply GPSO to Sudoku, and we describe the results in Section 8. Finally, in Section 9, we present conclusions and future work.

2. Geometric Framework

Geometric operators are defined in geometric terms using the notions of line segment and ball. These notions and the corresponding genetic operators are well defined once a notion of distance in the search space is defined. Defining search operators as functions of the search space is opposite to the standard way [10] in which the search space is seen as a function of the search operators employed.

2.1. Geometric Preliminaries

In the following, we give necessary preliminary geometric definitions and extend those introduced in [4]. For more details on these definitions, see [11].

The terms distance and metric denote any real valued function that conforms to the axioms of identity, symmetry, and triangular inequality. A simple connected graph is naturally associated to a metric space via its path metric: the distance between two nodes in the graph is the length of a shortest path between the nodes. Distances arising from graphs via their path metric are called graphic distances. Similarly, an edge-weighted graph with strictly positive weights is naturally associated to a metric space via a weighted path metric.

In a metric space , a closed ball is a set of the form , where and is a positive real number called the radius of the ball. A line segment is a set of the form , where are called extremes of the segment. Metric ball and metric segment generalize the familiar notions of ball and segment in the Euclidean space to any metric space through distance redefinition. In general, there may be more than one shortest path (geodesic) connecting the extremes of a metric segment: the metric segment is the union of all geodesics.

We assign a structure to the solution set by endowing it with a notion of distance . is, therefore, a solution space and , where is the fitness function, is the corresponding fitness landscape.

2.2. Geometric Crossover

Definition 1 (geometric crossover). A binary operator is a geometric crossover under the metric if all offsprings are in the segment between its parents.

The definition is representation-independent and, therefore, crossover is well defined for any representation. Being based on the notion of metric segment, crossover is a function only of the metric associated with the search space.

The class of geometric crossover operators is very broad. For vectors of reals, various types of blend or line crossovers, box recombinations, and discrete recombinations are geometric crossovers [4]. For binary and multary strings, all homologous crossovers are geometric [4, 12]. For permutations, PMX, Cycle crossover, merge crossover, and others are geometric crossovers [5]. We describe this in more detail in Section 2.3 since we will use the permutation representation in this paper. For syntactic trees, the family of homologous crossovers are geometric [6]. Recombinations for several more complex representations are also geometric [4, 5, 7, 13].

2.3. Geometric Crossover for Permutations

In previous work, we have studied various crossovers for permutations, revealing that PMX [14], a well-known crossover for permutations, is geometric under swap distance. Also, we found that Cycle crossover [14], another traditional crossover for permutations, is geometric under swap distance and under Hamming distance (geometricity under Hamming distance for permutations implies geometricity under swap distance, but not vice versa). Finally, we showed that geometric crossovers for permutations based on edit moves are naturally associated with sorting algorithms: picking offspring on a minimum path between two parents corresponds to picking partially sorted permutations on the minimal sorting trajectory between the parents.

2.4. Geometric Crossover Landscape

Geometric operators are defined as functions of the distance associated with the search space. However, the search space does not come with the problem itself. The problem consists of a fitness function to optimize and a solution set, but not a neighbourhood relationship. The act of putting a structure over the solution set is part of the search algorithm design and it is a designer's choice. A fitness landscape is the fitness function plus a structure over the solution space. So for each problem, there is one fitness function but as many fitness landscapes as the number of possible different structures over the solution set. In principle, the designer could choose the structure to assign to the solution set completely independently from the problem at hand. However, because the search operators are defined over such a structure, doing so would make them decoupled from the problem at hand, hence turning the search into something very close to random search.

In order to avoid this, one can exploit problem knowledge in the search. This can be achieved by carefully designing the connectivity structure of the fitness landscape. For example, one can study the objective function of the problem and select a neighbourhood structure that couples the distance between solutions and their fitness values. Once this is done, the problem knowledge can be exploited by search operators to perform better than random search, even if the search operators are problem independent (as in the case of geometric crossover and mutation).

Under which conditions is a landscape well searchable by geometric operators? As a rule of thumb, geometric mutation and geometric crossover work well on landscapes where the closer pairs of solutions are, the more correlated their fitness values. Of course this is no surprise: the importance of landscape smoothness has been advocated in many different contexts and has been confirmed in uncountable empirical studies with many neighborhood search metaheuristics [15, 16]. To summarize, consider the following.(i) Rule of thumb 1: If we have a good distance for the problem at hand, then we have good geometric mutation and good geometric crossover.(ii) Rule of thumb 2: A good distance for the problem at hand is a distance that makes the landscape “smooth.”

2.5. Product Geometric Crossover

In recent work [12], we have introduced the notion of product geometric crossover.

Theorem 1. Cartesian product of geometric crossover is geometric under the sum of distances.

This theorem is very useful because it allows one to build new geometric crossovers by combining crossovers that are known to be geometric. In particular, this applies to crossovers for mixed representations. The elementary geometric crossovers do not need to be independent, to form a valid product geometric crossover.

2.6. Multiparental Geometric Crossover

To extend geometric crossover to the case of multiple parents, we need the following definitions [17].

Definition 2. A family of subsets of a set is called convexity on if(C1)the empty set and the universal set are in ,(C2) is nonempty, then , and(C3) is nonempty and totally ordered by inclusion, then .

The pair is called convex structure. The members of are called convex sets. By the axiom (C1), a subset of of the convex structure is included in at least one convex set, namely, .

From axiom (C2), is included in a smallest convex set, the convex hull of : (1) The convex hull of a finite set is called a polytope.

The axiom (C3) requires domain finiteness of the convex hull operator: a set is convex if it includes for each finite subset of .

The convex hull operator applied to a set of cardinality two is called segment operator. Given a metric space , the segment between and is the set . The abstract geodetic convexity on induced by is obtained as follows: a subset of is geodetically convex, provided for all , in . If denotes the convex hull operator of , then for all . The two operators need not to be equal: there are metric spaces in which metric segments are not all convex.

We can now provide the following extension.

Definition 3 (multiparental geometric crossover). In a multiparental geometric crossover, given parents , their offspring are contained in the metric convex hull of the parents for some metric .

Theorem 2 (decomposable three-parent recombination). Every multiparental recombination that can be decomposed as a sequence of 2-parental geometric crossovers under the same metric and , so that , is a three-parental geometric crossover.

Proof. Let be the set of parents and their metric convex hull. By definition of metric convex hull, for any two points , their offspring are in the convex hull . Since , any two parents have offspring . Then, any other parent , when recombined with , produces offspring in the convex hull . So the three-parental recombination equivalent to the sequence of geometric crossover and is a multiparental geometric crossover.

3. Geometric PSO

3.1. Canonical PSO Algorithm and Geometric Crossover

Consider the canonical PSO in Algorithm 1. It is well known [18] that one can write the equation of motion of the particle without making explicit use of its velocity.

Algorithm 1: Standard PSO algorithm.

Let be the position of a particle and its velocity. Let be the current best position of the particle and let be the global best. Let and be the velocity of the particle and and its position at the next two time ticks. The equation of velocity update is the linear combination: , where , , and are scalar coefficients. To eliminate velocities, we substitute the identities and in the equation of velocity update and rearrange it to obtain an equation that expresses as function of and as follows: (2)

If we set (i.e., the particle has no inertia), becomes independent on its position two time ticks earlier. If we call , , and , the equation of motion becomes (3)

In these conditions, the main feature that allows the motion of particles is the ability to perform linear combinations of points in the search space. As we will see in the next section, we can achieve this same ability by using multiple (geometric) crossover operations. This makes it possible to obtain a generalization of PSO to generic search spaces.

In the following, we illustrate the parallel between an evolutionary algorithm with geometric crossover and the motion of a particle in PSO (see Figure 1). Geometric crossover picks offspring on a line segment between parents and . Geometric crossover can be interpreted as a motion of a particle: consider a particle that moves in the direction of a point reaching, in the next time step, position . If one equates parent with the particle and parent with the direction point , the offspring is, therefore, the particle at the next time step . The distance between parent and offspring is the magnitude of the velocity of the particle . Notice that the particle moves from to : this means that the particle is replaced by the particle in the next time step. In other words, the new position of the particle replaces the previous position. Coming back to the evolutionary algorithm with geometric crossover, this means that the offspring replaces its parent in the new population. The fact that at a given time all particles move is equivalent to say that each particle is selected for “mating.” Mating is a weighted multirecombination involving the memory of the particle and the best in the current population.

Figure 1: Geometric crossover and particle motion.

In the standard PSO, weights represent the propensity of a particle towards memory, sociality, and cognition. In the GPSO, they represent the attractions towards the particle's previous position, the particle's best position, and the swarm's best position.

Naturally, particle motion based on geometric crossover leads to a form of search that cannot extend beyond the convex hull of the initial population. Mutation can be used to allow nonconvex search. We explain these ideas in detail in the following sections.

3.2. Geometric Interpretation of Linear Combinations

Definition 4. A convex combination of vectors is a linear combination , where all coefficients are nonnegative and add up to 1.

It is called “convex combination” because when vectors represent points in space, the set of all convex combinations constitutes the convex hull.

A special case is , where a point formed by the convex combination will lie on a straight line between two points. For three points, their convex hull is the triangle with the points as vertices.

Theorem 3. In a PSO with no inertia ( ) and where acceleration coefficients are such that , the next position of a particle is within the convex hull formed by its current position , its local best , and the swarm best .

Proof. As we have seen in Section 3.1, when , a particle's update equation becomes the linear combination in (3). Notice that this is an affine combination since the coefficients of , , and add up to 1. Interestingly, this means that the new position of the particle is coplanar with , , and . If we restrict and to be positive and their sum to be less than 1, (3) becomes a convex combination. Geometrically, this means that the new position of the particle is in the convex hull formed by (or more informally, is between) its previous position, its local best, and the swarm best.

In the next section, we generalize this simplified form of PSO from real vectors to generic metric spaces. As mentioned above, mutation will be required to extend the search beyond the convex hull.

3.3. Convex Combinations in Metric Spaces

Linear combinations are well defined for vector spaces: algebraic structures endowed with scalar product and vectorial sum. A metric space is a set endowed with a notion of distance. The set underlying a metric space does not normally come with well-defined notions of scalar product and sum among its elements. Therefore, a linear combination of its elements is not defined. How can we then define a convex combination in a metric space? Vectors in a vector space can easily be understood as points in a metric space. However, the interpretation of scalars is not as straightforward: what do the scalar weights in a convex combination mean in a metric space?

As seen in Section 3.2, a convex combination is an algebraic description of a convex hull. However, even if the notion of convex combination is not defined for metric spaces, convexity in metric spaces is still well defined through the notion of metric convex set that is a straightforward generalization of traditional convex set. Since convexity is well defined for metric spaces, we still have hope to generalize the scalar weights of a convex combination trying to make sense of them in terms of distance.

The weight of a point in a convex combination can be seen as a measure of relative linear “attraction” toward its corresponding point, versus attractions toward the other points of the combination. The closer a weight to 1, the stronger the attraction to the corresponding point. The point resulting from a convex combination can be seen as the equilibrium point of all the attraction forces. The distance between the equilibrium point and a point of the convex combination is, therefore, a decreasing function of the level of attraction (weight) of the point: the stronger the attraction, the smaller its distance to the equilibrium point. This observation can be used to reinterpret the weights of a convex combination in a metric space as follows: with , , and greater than zero and is generalized to mean that is a point such that , , and , where is a decreasing function.

This definition is formal and valid for all metric spaces, but it is nonconstructive. In contrast, a convex combination not only defines a convex hull, but also tells how to reach all its points. So how can we actually pick a point in the convex hull respecting the above distance requirements? Geometric crossover will help us with this, as we show in the next section.

To summarize, the requirements for a convex combination in a metric space are as follows.(1)Convex weights: the weights respect the form of a convex combination: and .(2)Convexity: the convex combination operator combines , , and and returns a point in their metric convex hull (or simply triangle) under the metric of the space considered.(3)Coherence between weights and distances: the distances to the equilibrium point are decreasing functions of their weights.(4)Symmetry: the same value assigned to , , or has the same effect (e.g., in a equilateral triangle, if the coefficients have all the same value, the distances to the equilibrium point are the same).

3.4. Geometric PSO Algorithm

The generic GPSO algorithm is illustrated in Algorithm 2. This differs from the standard PSO (Algorithm 1), in that:(i) there is no velocity,(ii)the equation of position update is the convex combination,(iii)there is mutation, and(iv)the parameters , , and are nonnegative and add up to one.

Algorithm 2: Geometric PSO algorithm.

The specific PSOs for the Euclidean, Manhattan, and Hamming spaces use the randomized convex combination operators described in Section 4 and space-specific mutations. The randomization introduced by the randomized convex combination and by the mutation are of different types. The former allows us to pick points at random exclusively within the convex hull. The latter, as mentioned in Section 3.1, allows us to pick points outside the convex hull.

4. Geometric PSO for Specific Spaces

4.1. Euclidean Space

The GPSO for the Euclidean space is not an extension of the traditional PSO. We include it to show how the general notions introduced in the previous section materialize in a familiar context. The convex combination operator for the Euclidean space is the traditional convex combination that produces points in the traditional convex hull.

In Section 3.3, we have mentioned how to interpret the weights in a convex combination in terms of distances. In the following, we show analytically how the weights of a convex combination affect the relative distances to the equilibrium point. In particular, we show that the relative distances are decreasing functions of the corresponding weights.

Theorem 4. In a convex combination, the distances to the equilibrium point are decreasing functions of the corresponding weights.

Proof. Let , , and be three points in and let be a convex combination. Let us now decrease to such that , , and still form a convex combination and that the relative proportions of and remain unchanged: . This requires and to be and . The equilibrium point for the new convex combination is (4) The distance between and is (5) and the distance between and the new equilibrium point is (6) So when decreases () and and maintain the same relative proportions, the distance between the point and the equilibrium point increases (). Hence, the distance between and the equilibrium point is a decreasing function of . For symmetry, this applies to the distances between and and the equilibrium point: they are decreasing functions of their corresponding weights and , respectively.

The traditional convex combination in the Euclidean space respects the four requirements for a convex combination presented in Section 3.3.

4.2. Manhattan Space

In the following, we first define a multiparental recombination for the Manhattan space and then prove that it respects the four requirements for being a convex combination presented in Section 3.3.

Definition 5 (box recombination family). Given two parents and in , a box recombination operator returns offspring such that for .

Theorem 5 (geometricity of box recombination). Any box recombination is a geometric crossover under Manhattan distance.

Proof. Theorem 5 is an immediate consequence of the product geometric crossover (Theorem 1).

Definition 6 (three-parent box recombination family). Given three parents , , and in , a box recombination operator returns offspring such that for .

Theorem 6 (geometricity of a three-parent box recombination). Any three-parent box recombination is a geometric crossover under Manhattan distance.

Proof. We prove it by showing that any multiparent box recombination can be decomposed as a sequence of two simple box recombinations. Since the simple box recombination is geometric (Theorem 5), this theorem is a simple corollary of the multiparental geometric decomposition theorem (Theorem 2).

We will show that followed by can reach any offspring . For each , we have . Notice that . We have two cases: (i) in which case is reachable by the sequence ; (ii) , then it must be in in which case is reachable by the sequence .

Definition 7 (weighted multiparent box recombination). Given three parents , , and in and weights , , and , a weighted box recombination operator returns offspring such that for , where , , and are a convex combination of randomly perturbed weights with expected values , , and .

The difference between box recombination and linear recombination (Euclidean space) is that in the latter, the weights , , and are randomly perturbed only once and the same weights are used for all the dimensions, whereas the former one has a different randomly perturbed version of the weights for each dimension.

The weighted multiparent box recombination belongs to the family of multiparent box recombination because for , hence, it is geometric.

Theorem 7 (coherence between weights and distances). In weighted multiparent box recombination, the distances of the parents to the expected offspring are decreasing functions of the corresponding weights.

Proof. The proof of Theorem 7 is a simple variation of that of Theorem 4.

In summary, in this section, we have introduced the weighted multiparent box recombination and shown that it is a convex combination operator satisfying the four requirements of a metric convex combination for the Manhattan space: convex weights (Definition 6), convexity (geometricity, Theorem 6), coherence (Theorem 7), and symmetry (self evident).

4.3. Hamming Space

In this section, we first define a multiparental recombination for binary strings, that is, a straightforward generalization of mask-based crossover with two parents and then prove that it respects the four requirements for being a convex combination in the Hamming space presented in Section 3.3.

Definition 8 (three-parent mask-based crossover family). Given three parents , , and in , generate randomly a crossover mask of length with symbols from the alphabet . Build the offspring filling each position with the bit from the parent appearing in the crossover mask at the corresponding position.

The weights , , and of the convex combination indicate, for each position in the crossover mask, the probability of having the symbols , , or .

Theorem 8 (geometricity of three-parent mask-based crossover). Any three-parent mask-based crossover is a geometric crossover under Hamming distance.

Proof. We prove it by showing that any three-parent mask-based crossover can be decomposed as a sequence of two simple mask-based crossovers. Since the simple mask-based crossover is geometric, this theorem is a simple corollary of the multiparental geometric decomposition theorem (Theorem 2).

Let be the mask to recombine , , and , producing the offspring . Let be the mask obtained by substituting all occurrences of in with , and let be the mask obtained by substituting all occurrences of in with . First, recombine and using obtaining . Then, recombine and using where the 's in the mask stand for alleles in . The offspring produced by the second crossover is , so the sequence of the two simple crossovers is equivalent to the three-parent crossover. This is because the first crossover passes to the offspring all genes it needs to take from according to , and the rest of the genes are all from ; the second crossover corrects those genes that should have been taken from parent according to , but were taken from instead.

Theorem 9 (coherence between weights and distances). In the weighted three-parent mask-based crossover, the distances of the parents to the expected offspring are decreasing functions of the corresponding weights.

Proof. We want to know the expected distance from parent , , and to their expected offspring as a function of the weights , , and . To do so, we first determine, for each position in the offspring, the probability of it being the same as . Then from that, we can easily compute the expected distance between and . We have that (7) where is the probability of a bit of at a certain position to be the same as the bit of at the same position; , , and are the probabilities that a bit in is taken from parents , , and , respectively (these coincide with the weights of the convex combination , , and ); and are the probabilities that a bit taken from or coincides with the one in at the same location. These last two probabilities equal the number of common bits in and (and and ) over the length of the strings . So and , where is the Hamming distance. So (7) becomes (8) Hence, the expected distance between the parent and the offspring is .Notice that this is a decreasing function of because increasing forces or to decrease since the sum of the weights is constant, hence, decreases. Analogously, and are decreasing functions of their weights and , respectively.

In summary, in this section, we have introduced the weighted multiparent mask-based crossover and shown that it is a convex combination operator satisfying the four requirements of a metric convex combination for the Hamming space: convex weights (Definition 8), convexity (geometricity, Theorem 8), coherence (Theorem 9), and symmetry (self evident).

5. Geometric PSO for Other Representations

Before looking into how we can extend GPSO to other solution representations, we will discuss the relation between the three-parental geometric crossover and the symmetry requirement for a convex combination.

For each of the spaces considered in Section 4, we have first considered (or defined) a three-parental recombination and then proven that it is a three-parental geometric crossover by showing that it can actually be decomposed into two sequential applications of a geometric crossover for the specific space.

However, we could have skipped the explicit definition of a three-parental recombination altogether. In fact, to obtain the three-parental recombination, we could have used two sequential applications of a known two-parental geometric crossover for the specific space. This composition is indeed a three-parental recombination (it combines three parents) and it is decomposable by construction. Hence, it is a three-parental geometric crossover. This, indeed, would have been simpler than the route we took.

The reason we preferred to define explicitly a three-parental recombination is that the requirement of symmetry of the convex combination is true by construction: if the roles of any two parents are swapped by exchanging in the three-parental recombination both positions and the respective recombination weights, the resulting recombination operator is equivalent to the original operator.

The symmetry requirement becomes harder to enforce and prove for a three-parental geometric crossover obtained by two sequential applications of a two-parental geometric crossover. We illustrate this in the following.

Let us consider three parents , , and with positive weights , , and which add up to one. If we have a symmetric three-parental weighted geometric crossover , the symmetry of the recombination is guaranteed by the symmetry of the operator. So is equivalent to . Hence, the requirement of symmetry on the weights of the convex combination holds. If we consider a three-parental recombination defined by using twice a two-parental genetic crossover , we have (9) with the constraint that and are positive and add up to one, and and are positive and add up to one. Notice the inherent asymmetry in this expression: the weights and are not directly comparable with because they are relative weights between and . Moreover, there is the extra weight . This asymmetry makes the requirement of symmetry problematic to meet: given the desired , , and , what values of , , , and should we choose to obtain an equivalent symmetric three-parental weighted recombination expressed as a sequence of two two-parental geometric crossovers?

For the Euclidean space, it is easy to answer this question using simple algebra as follows: (10) Since the convex combination of two points in the Euclidean space is , and and , then (11)

However, the question may be less straightforward to answer for other spaces, although, we could use the equation above as a rule-of-thumb to map the weights of and the weights in the sequential decomposition to obtain a nearly symmetric convex combination.

Where does this discussion leave us in relation to the extension of GPSO to other representations? We have seen that there are two alternative ways to produce a convex combination for a new representation: (i) explicitly define a symmetric three-parental recombination for the new representation and then prove its geometricity by showing that it is decomposable into a sequence of two two-parental geometric crossovers (explicit definition), or (ii) use twice the simple geometric crossover to produce a symmetric or nearly symmetric three-parental recombination (implicit definition). The second option is also very interesting because it allows us to extended automatically GPSO to all representations we have geometric crossovers for (such as permutations, GP trees, and variable-length sequences, to mention but a few), and virtually to any other complex solution representation.

6. Experimental Results for Euclidean, Manhattan, and Hamming Spaces

We have run two groups of experiments: one for the continuous version of the GPSO (EucliadeanPSO, or EPSO for short, and ManhattanPSO, or MPSO), and one for the binary version (HammingPSO, or HPSO).

For the Euclidean and Manhattan versions, we have compared the performances with those of a continuous PSO (BasePSO, or BPSO) with constriction and inertia, whose parameters are as in Table 1. We have run the experiments for dimensions 2, 10, and 30 on the following five-benchmark functions: F1C Sphere [19], F2C Rosenbrock [19], F3C Ackley [20], F4C Griewangk [21], and F5C Rastrigin [22]. The Hamming version has been tested on the De Jong's test suite [19]: F1 Sphere (30), F2 Rosenbrock (24), F3 Step (50), F4 Quartic (240), and F5 Shekel (34), where the numbers in brackets are the dimensions of the problems in bits. All functions in the test bed have global optimum 0 and are to be maximized.

Table 1: Parameters for BPSO.

Since there is no equivalent GPSO with (, which does not respect the conditions in Theorem 3), we have decided to set , , and proportional to , , and , respectively, and summing up to one (see Table 2).

Table 2: Parameters for EPSO and MPSO.

For the binary version, the parameters of population size, number of iterations, and , , and have been tuned on the sphere function and are as in Table 3. From the parameters tuning, it appears that there is a preference for values of close to zero. This means that there is a bias towards the swarm and particle bests, and less attraction towards the current particle position.

Table 3: Selected parameters for HPSO.

For each set up, we performed 20 independent runs. Table 4 shows the best and the mean fitness value (i.e., the fitness value at the position where the population converges) found by the swarm when exploring continuous spaces. This table summarizes the results for the three algorithms presented, over the five test functions, for the two choices of population size, giving an immediate comparison of the performances. Overall the GPSOs (EPSO and MPSO), compare very favourably with BPSO, outperforming it in many cases. This is particularly interesting since it suggests that the inertia term (not present in GPSO) is not necessary for good performance.

Table 4: Test results for continuous versions: best and mean fitness values found by the swarm over 20 runs at last iteration (iteration 200).

In two dimensions, the results for all the functions (for PSOs both with 20 and 50 particles) are nearly identical, with BPSO and the two GPSOs both performing equally well. In higher dimensions, it is interesting to see how dimensionality does not seem to affect the quality of the results of GPSOs (while there is a fairly obvious decline in the performance of BPSO when dimension increases).

Also, EPSO's and MPSO's results are virtually identical. Let us recall from Section 4.2 the difference between Euclidean and Manhattan spaces, that is, “the difference between box recombination and linear recombination (Euclidean space) is that in the latter, the weights are randomly perturbed only once and the same weights are used for all the dimensions, whereas the former one has a different randomly perturbed version of the weights for each dimension.” The results obtained show, therefore, that at least in this context, randomly perturbing the weights once for all dimensions, or perturbing them for each dimension, does not seem to affect the end result.

Table 5 shows the mean of the best fitness value and the best fitness value over the whole population for the binary version of the algorithm. The algorithm compares well with results reported in the literature, with HPSO obtaining near optimal results on all functions. Interestingly, the algorithm works at its best when , the weight for (the particle position), is zero. This corresponds to a degenerated PSO that makes decisions without considering the current position of the particle.

Table 5: Test results for HPSO with selected parameters for De Jong's test suite.

7. Geometric PSO for Sudoku

In this section, we will put into practice the ideas discussed in Section 5 and propose a GPSO to solve the Sudoku puzzle. In Section 7.1, we introduce the Sudoku puzzle. In Section 7.2, we present a geometric crossover for Sudoku. In Section 7.3, we present a three-parental crossover for Sudoku and show that it is a convex combination.

7.1. Description of Sudoku

Sudoku is a logic-based placement puzzle. The aim of the puzzle is to enter a digit from 1 through 9 in each cell of a grid made up of subgrids (called “regions”), starting with various digits given in some cells (the “givens”). Each row, column, and region must contain only one instance of each digit. In Figure 2, we show an example of Sudoku puzzle. Sudoku puzzles with a unique solution are called proper Sudoku, and the majority of published grids are of this type.

Figure 2: Example of Sudoku puzzle.

Published puzzles often are ranked in terms of difficulty. Perhaps surprisingly, the number of “givens” has little or no bearing on a puzzle's difficulty. This is based on the relevance and the positioning of the numbers rather than the quantity of the numbers.

The Sudoku puzzle of any difficulty can be solved very quickly by a computer. The simplest way is to use some brute force trial-and-error search employing back tracking. Constraint programming is a more efficient method that propagates the constraints successively to narrow down the solution space until a solution is found or until alternate values cannot otherwise be excluded, in which case, backtracking is applied. A highly efficient way of solving such constraint problems is the Dancing Links Algorithm by Donald Knuth [23].

The general problem of solving Sudoku puzzles on boards of blocks is known to be NP complete [24]. This means that, unless , the exact solution methods that solve very quickly the boards take exponential time in the board size in the worst case. However, it is unknown whether the general Sudoku problem restricted to puzzles with unique solutions remains NP complete or becomes polynomial.

Solving Sudoku puzzles can be expressed as a graph coloring problem. The aim of the puzzle in its standard form is to construct a proper 9 coloring of a particular graph, given a partial 9 coloring.

A valid Sudoku solution grid is also a Latin square. Sudoku imposes the additional regional constraint. Latin square completion is known to be NP complete. A further relaxation of the problem allowing repetitions on columns (or rows) makes it polynomially solvable.

Admittedly, evolutionary algorithms and meta-heuristics in general are not the best techniques to solve Sudoku because they do not exploit systematically the problem constraints to narrow down the search. However, Sudoku is an interesting study case because it is a relatively simple problem but not trivial since is NP complete, and the different types of constraints make Sudoku an interesting playground for search operator design.

7.2. Geometric Crossover for Sudoku

Sudoku is a constraint satisfaction problem with four types of constraints:(1) fixed elements,(2)rows are permutations,(3)columns are permutations,(4) and boxes are permutations.

It can be cast as an optimization problem by choosing some of the constraints as hard constraints that all solutions have to respect, and the remaining constraints as soft constraints that can be only partially fulfilled, and the level of fulfilment is the fitness of the solution. We consider a space with the following characteristics:(i)hard constraints: fixed positions and permutations on rows,(ii)soft constraints: permutations on columns and boxes,(iii)distance: sum of swap distances between paired rows (row-swap distance),(iv)feasible geometric mutation: swap two nonfixed elements in a row, and(v)feasible geometric crossover: row-wise PMX and row-wise cycle crossover.

The chosen mutation preserves both fixed positions and permutations on rows (hard constraints) because swapping elements within a row, which is a permutation, returns a permutation. The mutation is 1-geometric under row-swap distance. Row-wise PMX and row-wise cycle crossover recombine parent grids applying, respectively, PMX and cycle crossover to each pair of corresponding rows. In case of PMX, the crossover points can be selected to be the same for all rows, or be random for each row. In terms of offspring that can be generated, the second version of row-wise PMX includes all the offspring of the first version.

Simple PMX and simple cycle crossover applied to parent permutations always return permutations. They also preserve fixed positions. This is because both are geometric under swap distance and, in order to generate offspring on a minimal sorting path between parents using swaps (sorting one parent into the order of the other parent), they have to avoid swaps that change common elements in both parents (elements that are already sorted). Therefore, also row-wise PMX and row-wise cycle crossover preserve both hard constraints.

Using the product geometric crossover Theorem 1, it is immediate to see that both row-wise PMX and row-wise cycle crossover are geometric under row-swap distance since simple PMX and simple cycle crossover are geometric under swap distance. Since simple cycle crossover is also geometric under Hamming distance (restricted to permutations), row-wise cycle crossover is also geometric under Hamming distance.

To restrict the search to the space of grids with fixed positions and permutations on rows, the initial population must be seeded with feasible random solutions taken from this space. Generating such solutions can be done still very efficiently.

The fitness function (to maximize) is the sum of the number of unique elements in each row plus the sum of the number of unique elements in each column plus the sum of the number of unique elements in each box. So for a grid, we have a maximum fitness of for a completely correct Sudoku grid, and a minimum fitness little more than because for each row, column, and square, there is at least one unique element type.

It is possible to show that the fitness landscapes associated with this space is smooth, making the search operators proposed a good choice for Sudoku.

7.3. Convex Combination for Sudoku

In this section, we first define a multiparental recombination for permutations and then prove that it respects the four requirements for being a convex combination presented in Section 3.3.

Let us consider the example in Figure 3 to illustrate how the multiparental sorting crossover works.

Figure 3: Example of multiparental sorting crossover.

The mask is generated at random and is a vector of the same length of the parents. The number of 's, 's, and 's in the mask is proportional to the recombination weights , , and of the parents. Every entry of the mask indicates to which parent the other two parents need to be equal to for that specific position. In a parent, the content of a position is changed by swapping it with the content of another position in the parent. The recombination proceeds as follows. The mask is scanned from the left to the right. In position 1, the mask has a . This means that at position 1, parent and parent have to become equal to parent . This is done by swapping the elements 1 and 3 in parent and the elements 1 and 3 in parent . The recombination now continues on the updated parents: parent is left unchanged and the current parent and parent are the original parents and after the swap. At position 2, the mask has . This means that at position 2, the current parent and current parent have to become equal to the current parent . So at position 2, parent and parent have to get 5. To achieve this, in parent , we need to swap elements 2 and 5, and in parent , we need to swap elements 2 and 5. The recombination continues on the updated parents for position 3 and so on, up to the last position in the mask. At this point, the three parents are now equal because at each position, one element of the permutation has been fixed in that position and it is automatically not involved in any further swap. Therefore, after all positions have been considered, all elements are fixed. The permutation to which the three parents converged is the offspring permutation. This recombination sorts by swaps the three parents towards each other according to the contents of the crossover mask, and the offspring is the result of this multiple sorting. This recombination can be easily generalized to any number of parents.

Theorem 10 (geometricity of three-parental sorting crossover). Three-parental sorting crossover is a geometric crossover under swap distance.

Proof. A three-parental sorting crossover with recombination mask is equivalent to a sequence of two two-parental sorting crossovers: the first is between parents and with recombination mask obtained by substituting all 3's with 2's in . The offspring so obtained is recombined with with recombination mask obtained by substituting all 1's with 2's in . So for Theorem 2, the three-parental sorting crossover is geometric.

Theorem 11 (coherence between weights and distances). In a weighted multiparent sorting crossover, the swap distances of the parents to the expected offspring are decreasing functions of the corresponding weights.

Proof. The weights associated to the parents are proportional to their frequencies in the recombination mask. The more occurrences of a parent in the recombination mask, the smaller the swap distance between this parent and the offspring. This is because the mask tells the parent to copy at each position. So the higher the weight of a parent, the smaller its distance to the offspring.

The weighted multiparental sorting crossover is a convex combination operator satisfying the four requirements of a metric convex combination for the swap space: convex weights sum to 1 by definition, convexity (geometricity, Theorem 10), coherence (Theorem 11), and symmetry (self evident).

The solution representation for Sudoku is a vector of permutations. For the product geometric crossover theorem, the compound crossover over the vector of permutations that applies a geometric crossover to each permutation in the vector is a geometric crossover. Theorem 12 extends to the case of a multiparent geometric crossover.

Theorem 12 (product geometric crossover for convex combinations). A convex combination operator, applied to each entry of a vector, results in a convex combination operator for the entire vector.

Proof. The product geometric crossover theorem (Theorem 1) is true because the segment of a product space is the Cartesian product of the segments of its projections. A segment is the convex hull of two points (parents). More generally, it holds that the convex hull (of any number of points) of a product space is the Cartesian product of the convex hulls of its projections [17]. The product geometric crossover then naturally generalizes to the multiparent case.

8. Experimental Results for Sudoku

In order to test the efficacy of the GPSO algorithm on the Sudoku problem, we ran several experiments in order to thoroughly explore the parameter space and variations of the algorithm. The algorithm in itself is a straightforward implementation of the GPSO algorithm given in Section 3.4 with the search operators for Sudoku presented in Section 7.3.

The parameters we varied were swarm sociality () and memory (), each of which were in turn set to 0, 0.2, 0.4, 0.6, 0.8, and 1.0. Since the attraction to each particle's position is defined as , the space of this parameter was implicitly explored. Likewise, mutation probability was set to either 0, 0.3, 0.7, or 1.0. The swarm size was set to be either 20, 100, or 500 particles, but the number of updates was set so that each run of the algorithm resulted in exactly 100 000 fitness evaluations (thus performing 5000, 1000, or 200 updates). Further, each combination was tried with ring topology, von Neumann topology (or lattice topology), and global topology, which are the most common topologies.

As explained in Section 5, there are two alternative ways of producing a convex combination: either using a convex combination operator or simply applying twice a two-parental weighted recombination with appropriate weights to obtain the convex combination. Both ways to produce convex combination operators, explicit and implicit, were tried on preliminary runs and turned out to produce indistinguishable results. In the end, we used the convex combination operator.

8.1. Effects of Varying Coefficients

The best population size is 100. The other two sizes we studied (20 and 500) were considerably worse. The best topology is the lattice (von Neumann) topology. The other two topologies we studied were worse (see Table 9).

From Tables 68, we can see that mutation rates of 0.3 and 0.7 perform better than no mutation at all. We can also see that parameter settings with (i.e., the attraction of the particle's previous position) set to more than 0.4 generally perform badly. The best configurations generally have (i.e., sociality) set to 0.2 or 0.4, (i.e., memory) set to 0.4 or 0.6, and set to 0 or 0.2. This gives us some indication of the importance of the various types of recombinations in GPSO as applied at least to this particular problem. Surprisingly, the algorithm works at its best when the weight of the particle position () is zero or nearly zero. In the case of set to 0, GPSO, in fact, degenerates to a type of evolutionary algorithm with deterministic uniform selection, mating with the population best with local replacement between parents and offspring.

Table 6: Average of bests of 50 runs with population size 100, lattice topology, and mutation 0.0, varying sociality (vertical) and memory (horizontal).
Table 7: Average of bests of 50 runs with population size 100, lattice topology, and mutation 0.3, varying sociality (vertical) and memory (horizontal).
Table 8: Average of bests of 50 runs with population size 100, lattice topology and mutation 0.7, varying sociality (vertical) and memory (horizontal).
Table 9: Success rate of various methods.
8.2. PSO versus EA

Table 9 compares the success rate of the best configurations of various methods we have tried. Success is here defined as the number of runs (out of 50) where the global optimum (243) is reached. All the methods were allotted the same number of function evaluations per run.

From the table, we can see that the von Neumann topology clearly outperforms the other topologies we tested, and that a GPSO with this topology can achieve a respectable success rate on this tricky noncontinuous problem. However, the best genetic algorithm still significantly outperforms the best GPSO we have found so far. (Notice that this is true only when considering GPSO as an optimizer. The approximation behavior of the GPSO is very good: with the right parameter setting, the GPSO reaches on average a fitness of 242 out of 243 (see Tables 68).) We believe this to be at least partly the effect of the even more extensive tuning of parameters and operators undertaken in our GA experiments.

9. Conclusions and Future Work

We have extended the geometric framework with the notion of multiparent geometric crossover, that is, a natural generalization of two-parental geometric crossover: offspring are in the convex hull of the parents. Then, using the geometric framework, we have shown an intimate relation between a simplified form of PSO (without the inertia term) and evolutionary algorithms. This has enabled us to generalize, in a natural, rigorous, and automatic way, PSO for any type of search space for which a geometric crossover is known.

We specialized the general PSO to Euclidean, Manhattan, and Hamming spaces, obtaining three instances of the general PSO for these spaces, EPSO, MPSO, and HPSO, respectively. We have performed extensive experiments with these new GPSOs. In particular, we applied EPSO, MPSO, and HPSO to standard sets of benchmark functions and obtained a few surprising results. Firstly, the GPSOs have performed really well, beating the canonical PSO with standard parameters most of the time. Secondly, they have done so right out of the box. That is, unlike the early versions of PSO which required considerable effort before a good general set of parameters could be found, with GPSO, we have done very limited preliminary testing and parameter tuning, and yet the new PSOs have worked well. This suggests that they may be quite robust optimisers. This will need to be verified in future research. Thirdly, HPSO works at its best with only weak attraction toward the current position of the particle. With this configuration, GPSO almost degenerates to a type of genetic algorithm.

An important feature of the GPSO algorithm is that it allows one to automatically define PSOs for all spaces for which a geometric crossover is known. Since geometric crossovers are defined for all of the most frequently used representations and many variations and combinations of those, our geometric framework makes it possible to derive PSOs for all such representations. GPSO is rigorous generalization of the classical PSO to general metric spaces. In particular, it applies to combinatorial spaces.

We have demonstrated how simple it is to specify the general GPSO algorithm to the space of Sudoku grids (vectors of permutations), using both an explicit and an implicit definitions of convex combination. We have tested the new GPSO on Sudoku and have found that (i) the communication topology makes a huge difference and that the lattice topology is by far the best; (ii) as for HPSO, the GPSO on Sudoku works better with weak attraction toward the current position of the particle; (iii) the GSPO on Sudoku finds easily near-optimal solutions but it does not always find the optimum. Admittedly, GPSO is not the best algorithm for the Sudoku puzzle where the aim is to obtain the correct solution all the times, not a nearly correct one. This suggests that GPSO would be much more profitably applied to combinatorial problems for which one would be happy to find near-optimal solutions quickly.

In summary, we presented a PSO algorithm that has been quite successfully applied to a nontrivial combinatorial space. This shows that GPSO is indeed a natural and promising generalization of classical PSO. In future work, we will consider GPSO for even more challenging combinatorial spaces such as the space of genetic programs. Also, since the inertia term is a very important feature of classical PSO, we want to generalize it and test the GPSO with the inertia term on combinatorial spaces.

Acknowledgment

The second and fourth authors would like to thank EPSRC, Grant no. GR/T11234/01 “Extended Particle Swarms,” for the financial support.

References

  1. T. Bäck, Evolutinary Algorithms in Theory and in Practice, Oxford University Press, Oxford, UK, 1996.
  2. M. Clerc, “Discrete particle swarm optimization, Illustrated by the traveling salesman problem,” in New Optimization Techniques in Engineering, Springer, New York, NY, USA, 2004.
  3. M. Clerc and J. Kennedy, “The particle swarm-explosion, stability, and convergence in a multidimensional complex space,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 1, pp. 58–73, 2002.
  4. M. Clerc, “When nearer is better,” 2007, Hal Open Archive.
  5. M. Deza and M. Laurent, Geometry of Cuts and Metrics, Springer, New York, NY, USA, 1991.
  6. K. De Jong, An analysis of the behaviour of a class of genetic adaptive systems, Ph.D. thesis, University of Michigan, Ann Arbor, Mich, USA, 1975.
  7. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, Mass, USA, 1989.
  8. T. Jones, Evolutionary algorithms, fitness landscapes and search, Ph.D. thesis, University of New Mexico, Albuquerque, NM, USA, 1995.
  9. J. Kennedy and R. C. Eberhart, “A discrete binary version of the particle swarm algorithm,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (ICSMC '97), vol. 5, pp. 4104–4108, Orlando, Fla, USA, October 1997.
  10. J. Kennedy and R. C. Eberhart, Swarm Intelligence, Morgan Kaufmann, San Francisco, Calif, USA, 2001.
  11. D. E. Knuth, “Dancing links,” 2000, Preprint P159, Stanford University.
  12. A. Moraglio, J. Togelius, and S. Lucas, “Product geometric crossover for the Sudoku puzzle,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '06), pp. 470–476, Vancouver, BC, Canada, July 2006.
  13. A. Moraglio and R. Poli, “Topological interpretation of crossover,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '04), vol. 3102, pp. 1377–1388, Seattle, Wash, USA, June 2004.
  14. A. Moraglio and R. Poli, “Topological crossover for the permutation representation,” in Proceedings of the Workshops on Genetic and Evolutionary Computation (GECCO '05), pp. 332–338, Washington, DC, USA, June 2005.
  15. A. Moraglio and R. Poli, “Product geometric crossover,” in Proceedings of the 9th International Conference on Parallel Problem Solving from Nature (PPSN '06), pp. 1018–1027, Reykjavik, Iceland, September 2006.
  16. A. Moraglio and R. Poli, “Geometric crossover for sets, multisets and partitions,” in Proceedings of the 9th International Conference on Parallel Problem Solving from Nature (PPSN '06), pp. 1038–1047, Reykjavik, Iceland, September 2006.
  17. A. Moraglio and R. Poli, “Geometric landscape of homologous crossover for syntactic trees,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '05), pp. 427–434, Edinburgh, UK, September 2005.
  18. A. Moraglio, R. Poli, and R. Seehuus, “Geometric crossover for biological sequences,” in Proceedings of the 9th European Conference on Genetic Programming, pp. 121–132, Budapest, Hungary, April 2006.
  19. H. Müehlenbein, M. Schomisch, and J. Born, “The Parallel genetic algorithm as function optimizer,” Parallel Computing, vol. 17, pp. 619–632, 1991.
  20. P. M. Pardalos and M. G. C. Resende, Handbook of Applied Optimization, Oxford University Press, Oxford, UK, 2002.
  21. Y. Shi and R. C. Eberhart, “A modified particle swarm optimizer,” in Proceedings of the IEEE Congress on Evolutionary Computation, pp. 69–73, Anchorage, Alaska, USA, May 1998.
  22. A. Törn and A. Zilinskas, Global Optimization, vol. 350 of Lecture Notes in Computer Science, Springer, Berlin, Germany, 1989.
  23. T. Yato and T. Seta, “Complexity and completeness of finding another solution and its application to puzzles,” 2005, Preprint, University of Tokyo.
  24. M. L. J. van de Vel, Theory of Convex Structures, North-Holland, Amsterdam, The Netherlands, 1993.