Journal of Artificial Evolution and Applications 
Volume 2008 (2008), Article ID 761459, 10 pages
doi:10.1155/2008/761459
Research Article

Dynamics and Stability of the Sampling Distribution of Particle Swarm Optimisers via Moment Analysis

Riccardo Poli

Department of Computing and Electronic Systems, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK

Received 23 July 2007; Accepted 3 December 2007

Recommended by T. Blackwell

Abstract

For stochastic optimisation algorithms, knowing the probability distribution with which an algorithm allocates new samples in the search space is very important, since this explains how the algorithm really works and is a prerequisite to being able to match algorithms to problems. This is the only way to beat the limitations highlighted by the no-free lunch theory. Yet, the sampling distribution for velocity-based particle swarm optimisers has remained a mystery for the whole of the first decade of PSO research. In this paper, a method is presented that allows one to exactly determine all the characteristics of a PSO's sampling distribution and explain how it changes over time during stagnation (i.e., while particles are in search for a better personal best) for a large class of PSO's.

1. Introduction

Let us consider the basic form of PSO with inertia weight, which is controlled by the following two equations:

(1) where and represent vectors of random numbers uniformly distributed in and , respectively, ( and are nonnegative constants), is component wise multiplication, is the position of the th particle's personal best, is the position of the particle's neighbourhood best, and is a constant in . Following Kennedy's graphical examinations of the trajectories of individual particles and their responses to variations in key parameters [1], traditionally, this model (and the related PSO with constriction [2], which is, provably, mathematically equivalent) has been studied theoretically under strong simplifying assumptions such as isolated single individuals, search stagnation (i.e., no improved solutions are found) and absence of randomness [212]. While these assumptions make it possible to make theoretical progress, they also make the results of the analyses difficult to carry over to the real PSO (where stochasticity is present).

Only very few attempts to understand the behaviour of the PSO in the presence of stochasticity have been made. For example, Clerc [13] analysed the distribution of velocities of one particle controlled by the standard PSO update rule with inertia and stochastic forces and was able to show that a particle's new velocity is the sum of three components: a forward force, a backward force, and noise. Kadirkamanathan et al. [14] were able to study the stability of particles in the presence of stochasticity by using Lyapunov stability analysis.

These are good and important first steps in the direction of modelling the real PSO. However, they only marginally scratch the surface in relation to one of the most important open problems in PSO research: the characterisation of the PSO's sampling distribution and its changes over time. Why is this question so fundamental? For two reasons. Firstly, this scientific knowledge gives us much greater understanding of an algorithm's behaviour. Secondly, on a more practical level, the only way to beat the limitations implied by NFL [15] is to match algorithms to problems. Naturally, at least in part, this can be achieved by a trial-and-error approach involving testing different algorithms and parameter settings on a particular problem or class of problems until a suitable match is found. A more analytic approach to this problem, however, requires two things: (a) knowing the search distribution of an algorithm and (b) checking whether this is well matched to the features of a problem's fitness landscape. Unfortunately, the sampling distribution of the PSO has been unavailable during the whole first decade (and beyond) of PSO research.

In recent work [16], we introduced a novel method, which allowed us to exactly determine the mean and standard deviation of the sampling distribution of the canonical PSO as well as their changes over any number of generations. The only assumption we made was stagnation, that is, we studied the sampling distribution produced by particles in search for a better personal best. In this paper, we generalise the technique and show how it can be used to determine any number of moments of the sampling distribution of PSO's.

The paper is organised as follows. In Section 2, we provide a summary of the results presented in [16]. In Section 3, we generalised the method to moments of any order. In Section 4, we apply the technique to compute the skewness and kurtosis of the canonical PSO's sampling distribution. In Section 5, we further extend the generality of our technique so as to cover a variety of PSO's, not just the one in (1). This allows us to compare, on a theoretical basis, different forms of PSO. In Section 6, we then use the moments of the PSO's sampling distribution to approximately reconstruct the distribution itself, making it possible, for the first time, to understand the search strategy implemented by the canonical PSO, while searching for new improvements. We discuss the results and conclude in Section 7.

2. Dynamics of First- and Second Order Moments of the PSO's Sampling Distribution

2.1. Derivation of Dynamic Equations for the Moments

During stagnation, each particle behaves independently. Also, each dimension is treated independently. So we can analyse each particle's behaviour in isolation. Therefore, we can drop the superscript in (1), and rewrite them as

(2) where we used the relation . Note that on the r.h.s. of this equation, while , , and are constants, , , , and are stochastic variables. As a consequence, on the l.h.s. of the equation is a stochastic variable too (in fact, it is a stochastic function of the stochastic variables appearing in the r.h.s.).

In [16], we applied the expectation operator to both sides of the equation obtaining

(3) where we performed the substitution because of the statistical independence between and . Assuming that , that is, the and are both uniformly distributed in , we have , and so

(4) where we renamed .

If we square both sides of (2), we obtain

(5) Expanding the r.h.s., we obtain an equation expressing the stochastic variable as a function of other stochastic variables, that is, , , , and so forth. By applying the expectation operator to both sides of this equation, one obtains

(6) where we set and , for brevity.

If, instead, we multiply both sides of (2) by and apply the expectation operator, we obtain

(7)

Equations (4), (6), and (7) form a system of coupled second-order difference equations using which, given appropriate initial conditions, one can compute , and for any .

2.2. Analysis of the Dynamics

Equations (4), (6), and (7) can also be written in matrix notation as the extended first order system

(8) where

(9) where , , .

It is then trivial to verify under what conditions , , and will converge to stable fixed points. We need to have that all eigenvalues of must be within the unit circle, that is, . When this happens, we will say that the PSO is order-2 stable.

Naturally, when , in principle, we could derive (either symbolically or numerically) the fixed point for the system, which we will denote as . This would be simply given by

(10) When the system is order-2 stable, by the simple change of variables we can represent the dynamics of the system via the following linear homogeneous equation:

(11) which can trivially be integrated to obtain the following explicit solution for the dynamics of the first two moments:

(12)

3. Higher-Order Moments

In [16], we obtained recursions which describe the dynamics of first- and second-order moments of the sampling distribution of a standard PSO during stagnation, as summarised in the previous section. One may then wonder whether it would be possible to follow a similar approach to study the dynamics of higher-order moments.

3.1. Derivation of Dynamic Equations Forhigher-Order Moments

The fundamental question is “what quantities would we have to deal with if we took higher powers of both sides of (2) as we did to derive (5)?” Generally, the r.h.s. would be a sum of terms of the form

(13) where are suitable constants. Naturally, taking powers of (2) and then multiplying both sides by some power of would also lead to equations involving terms such as those in (13). That is, for any choice of and ,

(14) where are suitable constants. If we then take expectations for both sides, we obtain

(15) where we used the independence of , , , and, of course, their powers. If and are uniformly distributed in the same range, namely, , it is easy to verify that

(16) So

(17) where

(18) It is important to note here that because (2) is linear in and , all the terms on the r.h.s. of (14) and (17) respect the relation . This implies that it is possible to construct recursions for moments of arbitrary order.

3.2. Complexity of the Calculations

If one wanted to push the analysis developed in Section 2 up to order 3, one would need to instantiate (17) for , , and add the resulting equations to (4), (6), and (7). If one wanted to go to order 4, an additional set of four equations (for , , , and ) would be needed, bringing the total to 10.

More generally, in order to compute statistics of order , one needs to construct and iterate

(19) second-order difference equations. Since, after expansion, the r.h.s. of (2) contains 7 atomic terms of the form in (13), the r.h.s. of (17) contains such terms (Note that the exponent does not influence the number of terms because the recursion for is obtained as follows: (a) we compute , which is given by an expression containing terms; (b) we multiply each term by , which changes the exponents but does not alter the number of terms; (c) we apply the expectation operator, which again does not modify the number of terms.) So the total number of terms one needs to compute to construct the equations for order- statistics is (for the order 1 equations) plus (for the order 2 equations) plus (for the order 3 equations), and so forth. This gives us a total of

(20) terms. Note that grows exponentially approximately as . So although the number of equations one needs to deal with grows quadratically, the computational effort required to instantiate them is exponential. For instance, , , , , and . The growth in number of terms can be reduced if one makes explicit use of (i.e., by adding the factor in (13)). Then . Either way, manually deriving equations for moments of order 3 is already very laborious. The process, however, is clearly mechanisable. This can be done using computer algebra systems, or by explicitly representing and manipulating the 's in each equation (this is what we did). As a result of mechanisation, computing the equations for up to order 6 or 7 is feasible with an ordinary personal computer.

Some of the 's in (17) present the same pattern of exponents for , , , and , so terms can be collected leading to more compact equations (e.g., compare (5) and (6)). Also, given their size, one will normally want to study (e.g., integrate) (17) numerically. In this case, , , , and are all numerical parameters. So the 's become constants and, after collecting terms, each equation contains at most terms, which, as we know, is quadratic in the order . As a result, although the complexity of the construction of the motion equations for the moments is exponential in the order of the moments, their numerical integration is only of order .

Naturally, the system of second-order difference equations necessary to predict the dynamics of moments of order 1 to can be turned into a system of order 1 of the form in (8), via the choice

(21)

This effectively means adding artificial update equations of the form for , bringing the total to . The transition matrix for the system is, therefore, of size . We will denote this with . For example, for , which would allow one to study the mean, variance, skewness, and kurtosis of the sampling distribution as a function of , is merely a matrix.

Interestingly, grows so slowly that one can perform an eigenvalue analysis for any that one is able to compute. That is, the expensive part of the process is the construction of . Once this is done, iterating the system, establishing its stability, or finding its fixed points is a trivial matter.

In the next section, we provide results for statistics of order 3 and 4, that is, , a value of for which computing takes only a few seconds. However, before we do this, we need to consider the initial conditions for the system.

3.3. Initial Conditions

In order to provide initial conditions for the moments' dynamics, we need to compute and for generic and .

If we are at the very first iteration of the PSO algorithm, the initial conditions for the dynamics of the moments are related to the ranges used to initialise positions and velocities in the PSO. Under the assumption that a particle's initial position, , is chosen uniformly at random in a symmetric range , we have

(22) In order to compute , we need to consider the equation

(23) where a particle's initial velocity, , is a stochastic variable uniformly distributed the range (often ). By taking the th power of both sides of the equation, multiplying by , and taking expectations, as we did to construct (17), one obtains the desired expressions for . Like for (17), these expressions contain a number of terms that grows exponentially with . However, this process, too, can be trivially mechanised.

If the PSO is not at its first iteration, and a new improvement has just been found to a particle's personal best or neighbourhood best, then is not a stochastic variable, but a constant determined by the position where the particle found the last improvement.

4. Skewness and Kurtosis of the PSO's Sampling Distribution

We constructed the recursions for moments of up to order 4 as described in the previous section for the canonical PSO. Naturally, for higher-order moments, we can do exactly the same type of eigenvalue analysis we did in [16] for the mean and standard deviation of the sampling distribution. For example, Figure 1 shows the magnitude of the largest eigenvalue, , of as a function of the parameters and . The system is order-4 stable, that is, mean, variance, skewness, and kurtosis have a stable fixed point, whenever , that is, within the curved region enclosed by the contour shown in the figure.

Figure 1: Magnitude of the largest eigenvalue of as a function of the parameters and . The curved line on the surface encloses the order-4 stable region.

For easier comparison, we show the lines where for , , , and in Figure 2 (ordered from top to bottom, resp.). The regions of order-1, -2, -3, and -4 stability are nested. Note how the lines for and coincide for many values of . Note also that the standard setting, and , lays within the narrow region of order-3 stability. This implies that while mean, variance, and skewness of the standard PSO tend to a fixed point, kurtosis is unstable and will tend to grow indefinitely. Interestingly, a growth in the kurtosis of samples was observed by Kennedy [17], although this was effectively computed under the assumption that the sampling distribution is time independent. So the values of recorded in a run at grows were treated as different samples from the same distribution, while we now know this is incorrect.

Figure 2: Plot of the regions of order-1, -2, -3, and -4 stability for the canonical PSO.

That the predictions of the model are exact is also confirmed by the comparison of the dynamics of predicted and recorded higher-order moments. Figure 3(a) shows a comparison between the skewness computed using our model and the average positions of the particle recorded in one billion (1 000 000 000) real runs in the first 30 iterations for the case , , , , and . As one can see, there is a very good match between the model's predictions and the stagnation behaviour of particles in real runs. Only after about 20–25 generations, the sampling errors start accumulating enough to show significant differences. We know that these differences are due to sampling errors because before performing 1 000 000 000 runs, we attempted to use much smaller sets of runs. With fewer runs, we observed that errors were visible well before generation 25. For example, with 1 000 000 runs, there is a good match up to around iteration 15.

Figure 3: Comparison between predicted and experimental skewness and (excess) kurtosis of the PSO sampling distribution for , , , , and . Kurtosis grows exponentially, and so it is plotted on a logarithmic scale. The first point is not plotted because the excess kurtosis was negative (−1.2).

As shown in Figure 3(b), the model also predicts very well the behaviour of the (excess) kurtosis of the sampling distribution. (Following standard practice, in this paper whenever we use the term “kurtosis,” we will refer to the excess kurtosis . The excess kurtosis of the normal distribution to 0.) Note that for and , the system is order-3 stable, and so, although the oscillations of the skewness shown in Figure 3(a) appear to grow bigger and bigger, suggesting instability, this is actually only a transient effect, as shown in Figure 4 where we integrate the equations over 200 generations instead of 30.

Figure 4: Predicted dynamics of the skewness of the PSO sampling distribution over 200 generations for , , , , and .

5. Comparison between PSO's

In the previous sections, we studied the canonical PSO with the restriction that the acceleration coefficients, and , were identical: . One may wonder, however, whether allowing such coefficients to differ would produce qualitatively very different dynamics. For example, what would happen if we set one of the 's to zero as in a purely cognitive or purely social PSO model? This effectively would reduce to one the sources of random influences on a particle's dynamics. Conversely, one might wonder what would happen if we increased the number of such sources of influence, as is done, for example, in the fully informed particle swarm (FIPS) [18, 19].

5.1. Moments of Generalised PSO's

To answer these (and other) important questions on the sampling distribution of different PSO models, we adopt a FIPS-like general class of PSO's described by the following difference equation:

(24) where the 's are stochastic variables uniformly distributed in the range , are constants, and the 's are the personal best positions of neighbours of the particle (the particle itself may be included in its own neighbourhood). Naturally, this equation can be converted into the following:

(25) which is a generalisation of (2). All of the steps we performed in Section 3 can be repeated for (25). These lead to recursions of the form in (17) with the only difference that the coefficients take the more general form

(26) where , , , are appropriate constants.

Because (25) contains terms, the complexity of the expansion now grows as , where is the order of the moments we are interested in. So the larger , the heavier the computation load required to compute . Once the transition matrices are computed, however, they are exactly of the same size for all PSO models within the class defined by (24). Initial conditions can be found following the approach described in Section 3. Calculations are expensive but can be mechanised. We did this for the examples described below.

5.2. Three PSO's We Want to Compare

An extensive comparison of different PSO's is beyond the scope of this paper. However, as an example of the kind of comparisons one can make using our approach, we considered the PSO's in (24) with . Within this class of PSO's, we considered three variants as follows:

(a) a purely social variant of PSO, which we will call social PSO for brevity, where and (due to symmetries, the behaviour of a purely cognitive PSO where and is effectively identical to that of this social PSO); (b) the canonical PSO we have studied so far in the paper, which is obtained by setting and ; (c) the simplest version of FIPS with a neighbourhood of three individuals, for example, obtained using an lbest topology and an interaction radius of 1, where . We will call this version FIPS3.

In order to perform a fair comparison of the stability properties of these PSO variants, we study them in conditions where the sum of the amplitudes of the random components, , is identical across models. That is, we set , and compare models with the same value. Again, we analyse eigenvalues.

5.3. Results of the Comparison

Figures 58 show the lines in the plane where the magnitude of the largest eigenvalue of , is 1 for and for the three PSO variants mentioned above. Each figure represents a different moment. In the plot for the th moment, the region below each line represents the set of all pairs for which the th moment of the distribution of the PSO corresponding to that line is stable. Let us analyse these figures in detail.

Figure 5: Lines below which the mean of the sampling distributions for a social PSO, a canonical PSO, and FIPS with a neighbourhood of three individuals have a fixed point (order-1 stability). The three lines coincide.
Figure 6: Lines below which the variance of the sampling distributions for a social PSO (bottom), a canonical PSO (middle), and FIPS with a neighbourhood of three individuals (top) have a fixed point (order-2 stability).
Figure 7: Lines below which a social PSO, a canonical PSO, and FIPS3 are order-3 stable (from bottom to top, resp.).
Figure 8: Lines below which the kurtosis of the sampling distributions for a social PSO (bottom), a canonical PSO (middle), and FIPS3 (top) have a fixed point (order-4 stability).

Firstly, we should note that the regions of order-1 stability for the three models are identical. This is because the dynamics of the mean of the three models is governed by equations of the same form, namely,

(27) where the constant term may differ in different PSO variants. (This is irrelevant for the stability of the system, since stability is determined by the homogeneous part of the equation.) Note also that the rightmost point in each plot is an artifact due to the fact that, at , for irrespective of the value of .

The regions where the variance is stable for the three models, instead, are different, with FIPS3 having the largest region of order-2 stability, followed by the canonical PSO, and finally, by the social PSO. Exactly the same happens with skewness (Figure 7) and kurtosis (Figure 8), with the order-3 stability region largely coinciding with the order-2 ones also for FIPS3 and the social PSO.

These results are counter intuitive. One would expect that the more sources of randomness, the 's, there are, the more a PSO should be unstable. However, the exact opposite happens. The social PSO, where the only influence is , is the least stable of all models, while FIPS3, which has three sources of randomness, is the most stable. What are the reasons for this behaviour?

We can understand this by rewriting (25) as follows:

(28) where and . Both and are the sum of independent and uniformly distributed variables: the variables in the case of and the variables in the case of .

We know that . To simplify our treatment, let us further assume that the 's are i.i.d., that is, that all are identical, and so . We can then apply the central limit theorem to . This predicts that for sufficiently large , the distribution of is approximately Gaussian with mean and variance . So the larger , the smaller the variance of and the stochasticity of (28).

In the case of the stochastic variable , the quantities are not identically distributed even if all are identical. This is because, in principle, each may be different. This prevents the use of the standard central limit theorem. We can, however, apply Lyapunov's central limit theorem to . The conditions for its application are as follows:

(1) the variables must have finite mean, which is the case since , (2) the variables must have finite variance, which, again, is the case since , (3) the variables must have finite third central moment, which is satisfied since , and finally, (4) the Lyapunov condition, , must be satisfied, which, again, is the case since all .

Then for sufficiently large , also the distribution of is approximately Gaussian with mean and variance . Note that and are the mean and the mean , respectively. So these are finite quantities if, as is normally the case, all are finite. (PSO search is normally confined to a prefixed, finite region of , and so all must be finite.) So like for , the larger , the smaller the variance of , and consequently, the less the stochasticity of (28).

Effectively, the larger , the more and become deterministic and approach constant values. This explains why adding more and more sources of randomness—while keeping constant—produces progressively more and more stable PSO's.

6. The Density Function of the Canonical PSO's Sampling Distribution

The technique described in this paper, in principle, would allow one to determine all the moments of the sampling distribution of the PSO at all times. The question then is “could we derive the PSO sampling distribution itself ?” The answer is of course in the positive since knowing all the moments of a distribution implies knowing its moment generating function. This, in turn, allows one to obtain the density function of the distribution via inverse Laplace transform.

In practice, however, it is impossible to compute all the moments of the PSO sampling distribution. This is for two reasons. Firstly, there are infinitely many such moments. Secondly, as we have seen in the previous sections, the cost of computing moments is exponential in the order of the moments. The next question is then “to what extent can we still reconstruct the PSO's density function from a finite number of moments?” This is an instance of the well-known truncated moment problem, a difficult, inverse ill-posed problem for which many approaches have been proposed. Here, we consider only one such approach.

A particularly simple idea is to consider a family of density functions with parameters , , and so forth, with sufficient expressive power to represent distributions with widely different shapes, with more or less asymmetries, with tails of different characteristics, and so on. Then one can use optimisation techniques to identify the parameters of the distribution which minimise the difference between the moments of and the moments of the PSO's sampling distribution. This is called the moment-matching method. Once the parameters , , and so forth, are identified, can be used as an approximation of the true PSO sampling distribution. This approach to reconstructing probability distributions from moments was proposed [20] (see also [21, 22]), where a generalised lambda distribution (GLD) was used. We adopt this same approach here.

GLD is a four-parameter distribution defined via its quantile function as follows:

(29) where . Its density function is given by

(30) where .

The GLD is enormously flexible in terms of the shape of the distribution. For example, as shown in Figure 9, the uniform, Gaussian, exponential, and Gamma distributions are all special cases of GLD. Effectively, determines the location of the distribution, determines its scale, while and determine other shape characteristics. In particular, only if , the distribution is symmetric.

Figure 9: Examples of GLD density functions.

Because GLD has 4 parameters, all we need is four moments—the mean, variance, skewness, and kurtosis—of the PSO's sampling distribution in order to identify such parameters with the moment-matching method described above.

As an illustration, we apply this technique to reconstruct the sampling distribution during stagnation of a canonical PSO with parameters , , , , and . With these parameters, . In Figure 10, we show snapshots at times and 24 of the theoretical sampling distribution together with estimates of the distribution based on 1 000 000 actual runs. In all cases, the match between the moments of the GLD and those of the PSO sampling distribution was exact (within experimental errors). Also, there is considerable agreement between the theoretical lines and histograms obtained in real runs. Note how widely the mean of the density function oscillates around 5 in the first few generations. Also note the asymmetry in the distributions due to the oscillations of the skewness.

Figure 10: Estimates of the sampling distribution of a canonical PSO with parameters , , , , and during stagnation, reconstructed via GLD best fitting vers. histograms over 1 000 000 real runs. Snapshots at times , 2, 4, 12, and 24 are shown. For each theoretical sampling distribution, we report the parameters of the corresponding GLD.

7. Discussion and Conclusions

Several theoretical analyses of the dynamics of particle swarms have been offered in the literature over the last decade. These have been very illuminating. However, virtually, all have relied on substantial simplifications and on the assumption that the particles are deterministic. Naturally, these simplifications make it impossible to derive an exact characterisation of the sampling distribution of the PSO.

In previous work [16], by using surprisingly simple techniques, we started by exactly determining, perhaps, the most important characteristic of a PSO's sampling distribution, its variance, and we have been able to explain how it changes over any number of generations. The only assumption we made is stagnation. Here, we extended this technique to the study of higher-order statistics. In particular, we analysed, in detail, the skewness and kurtosis of the distribution. Because of the complexity of the calculations involved, this required mechanising the derivation of the recursions for these moments.

We applied the analysis to the PSO with inertia weight, but the analysis is also valid for the PSO with constriction, because of the well-known equivalence of these two models (via a simple parameter mapping). We also generalised our model so as to include FIPS. This made it possible to explicitly compare the stability of different forms of PSO, leading to a deeper understanding of their properties. In particular, we showed that while FIPS and standard forms of PSO present exactly the same order-1 stability, in FIPS, higher-order moments are more stable than in the other PSO's, and we were able to explain why this is the case, using two forms of central limit theorem (Section 5). Elsewhere [23], we also applied the same type of analysis to derive the stability regions of a simpler form of velocity-based PSO.

Finally, with all these tools in hand, we went in search for the “holy grail”—the actual PSO sampling density function. We treated the problem as an ill-posed inverse problem, which we regularised thanks to the use (and best fit) of a family of distributions—the generalised lambda distribution (GLD). All empirical evidence we have suggests that this distribution approximates very closely the sampling behaviour of PSO's (naturally, with parameters that are functions of time, i.e., ). We will attempt to characterise the PSO's sampling distribution in more depth in future research.

Whether or not GLD is the exact PSO sampling distribution or just a very good approximation, if one could determine (again, either exactly or approximately) how the 's are affected by the parameters , , , and and by the initial conditions and , it would then be possible to accurately simulate the behaviour of the PSO by sampling from . This is easily done since GLD deviates can trivially be produced by picking uniformly at random in the interval and applying (29), that is, is generalised lambda distributed. We will study this more sophisticated form of bare-bones PSO [17] in future research.

Bare-bone PSO's have some resemblance with estimation of distribution algorithms (EDA's). EDA's are powerful population-based searchers where the variation operations traditionally implemented via crossover and mutation in evolutionary algorithms are replaced by the process of sampling from a distribution (a review of EDA's is available in [24]). The distribution is modified generation after generation, by using information obtained from the more fit individuals in the population. The objective of these changes is to increase the probability of generating individuals with high fitness. As one can clearly see from this description, a bare-bone PSO performs a very similar form of search to an EDA. However, there is a difference between the two. While in a typical EDA the whole new population is generated from a single distribution, in a bare-bone PSO, each individual has a personal sampling distribution from which only one sample is drawn at each iteration. (The sampling distributions of different particles may share some characteristics, since the sampling distribution of a particle is determined by both personal best and neighbourhood best, and the neighbourhood best can easily be the same for different particles.) Nonetheless, one could still interpret a bare-bone PSO as an unusual type of EDE in which multiple distributions are used and updated based on fitness during the search.

Acknowledgments

The author would like to acknowledge the support by EPSRC (Grant no. GR/T11234/01, “Extended Particle Swarms (XPS)”) and the help and comments from Maurice Clerc, David Broomhead, Susanna Wau Men Au-Yeung, and Roberto Renó. The anonymous reviewers and the editors are also thanked for their useful feedback.

References

  1. J. Kennedy, “The behavior of particles,” in Proceedings of the 7th International Conference on Evolutionary Programming (EP '98), V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, Eds., p. 581, Springer, San Diego, Calif, USA, March 1998.
  2. M. Clerc and J. Kennedy, “The particle swarm—explosion, stability, and convergence in a multidimensional complex space,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 1, 58 pages, 2002.
  3. E. Ozcan and C. K. Mohan, “Analysis of a simple particle swarm optimization system,” in Intelligent Engineering Systems Through Artificial Neural Networks, vol. 8, p. 253, ASME Press, St. Louis, Mo, USA, 1998.
  4. E. Ozcan and C. K. Mohan, “Particle swarm optimization: surfing the waves,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '99), vol. 3, p. 1939, IEEE Service Center, Washington, DC, USA, July 1999.
  5. F. van den Bergh, An analysis of particle swarm optimizers, Ph.D. thesis, Department of Computer Science, University of Pretoria, Pretoria, South Africa, 2002.
  6. A. P. Engelbrecht, Fundamentals of Computational Swarm Intelligence, John Wiley & Sons, New York, NY, USA, 2005.
  7. K. Yasuda, A. Ide, and N. Iwasaki, “Adaptive particle swarm optimization,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 2, p. 1554, Washington, DC, USA, October 2003.
  8. T. M. Blackwell, “Particle swarms and population diversity,” Soft Computing, vol. 9, no. 11, 793 pages, 2005.
  9. B. Brandstatter and U. Baumgartner, “Particle swarm optimization—mass-spring system analogon,” IEEE Transactions on Magnetics, vol. 38, no. 2, 997 pages, 2002.
  10. I. C. Trelea, “The particle swarm optimization algorithm: convergence analysis and parameter selection,” Information Processing Letters, vol. 85, no. 6, 317 pages, 2003.
  11. E. F. Campana, G. Fasano, and A. Pinto, “Dynamic system analysis and initial particles position in particle swarm optimization,” in Proceedings of the IEEE Swarm Intelligence Symposium (SIS '06), Indianapolis, Ind, USA, May 2006.
  12. E. F. Campana, G. Fasano, D. Peri, and A. Pinto, “Particle swarm optimization: efficient globally convergent modifications,” in Proceedings of the 3rd European Conference on Computational Mechanics, Solids, Structures and Coupled Problems in Engineering (ECCM '06), C. A. Mota Soares, J. A. C. Martins, H. C. Rodrigues, and J. A. C. Ambrosio, Eds., Lisbon, Portugal, June 2006.
  13. M. Clerc, “Stagnation analysis in particle swarm optimisation or what happens when nothing happens,” Tech. Rep. CSM-460, Department of Computer Science, University of Essex, Colchester, UK, August 2006.
  14. V. Kadirkamanathan, K. Selvarajah, and P. J. Fleming, “Stability analysis of the particle dynamics in particle swarm optimizer,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, 245 pages, 2006.
  15. D. H. Wolpert and W. G. Macready, “No free lunch theorems for search,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, 67 pages, 1997.
  16. R. Poli and D. Broomhead, “Exact analysis of the sampling distribution for the canonical particle swarm optimiser and its convergence during stagnation,” in Proceedings of the 9th Genetic and Evolutionary Computation Conference (GECCO '07), p. 134, ACM Press, London, UK, July 2007.
  17. J. Kennedy, “Bare bones particle swarms,” in Proceedings of the IEEE Swarm Intelligence Symposium (SIS '03), p. 80, Indianapolis, Ind, USA, April 2003.
  18. J. Kennedy and R. Mendes, “Population structure and particle swarm performance,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '02), vol. 2, p. 1671, Honolulu, Hawaii, USA, May 2002.
  19. R. Mendes, Population topologies and their influence in particle swarm performance, Ph.D. thesis, Departamento de Informatica, Escola de Engenharia, Universidade do Minho, Braga, Portugal, 2004.
  20. J. S. Ramberg, P. R. Tadikamalla, E. J. Dudewicz, and E. F. Mykytka, “A probability distribution and its uses in fitting data,” Technometrics, vol. 21, no. 2, 201 pages, 1979.
  21. A. Lakhany and H. Mausser, “Estimating the parameters of the generalized lambda distribution,” Algo Research Quarterly, vol. 3, no. 3, 47 pages, 2000.
  22. S. W. M. AuYeung, Finding probability distributions from moments, M.S. thesis, Imperial College, London, UK, 2003.
  23. R. Poli, D. Bratton, T. M. Blackwell, and J. Kennedy, “Theoretical derivation, analysis and empirical evaluation of a simpler particle swarm optimiser,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '07), IEEE Press, Singapore, September 2007.
  24. P. Larrañaga and J. A. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Kluwer Academic Publishers, Boston, Mass, USA, 2002.