Department of Computing and Electronic Systems, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
Recommended by T. Blackwell
Abstract
For stochastic optimisation algorithms, knowing the probability distribution with which
an algorithm allocates new samples in the search space is very important, since this explains
how the algorithm really works and is a prerequisite to being able to match algorithms to
problems. This is the only way to beat the limitations highlighted by the no-free lunch theory.
Yet, the sampling distribution for velocity-based particle swarm optimisers has remained a
mystery for the whole of the first decade of PSO research. In this paper, a method is presented
that allows one to exactly determine all the characteristics of a PSO's sampling distribution
and explain how it changes over time during stagnation (i.e., while particles are in search for
a better personal best) for a large class of PSO's.
1. Introduction
Let us consider the basic form of PSO with inertia
weight, which is controlled by the following two equations:
(1)
where
and
represent
vectors of random numbers uniformly distributed in
and
, respectively, (
and
are nonnegative constants),
is
component wise multiplication,
is the position
of the
th particle's
personal best,
is the position
of the particle's neighbourhood best, and
is a constant
in
. Following Kennedy's graphical examinations of the
trajectories of individual particles and their responses to variations in key
parameters [1], traditionally, this model (and the related PSO with
constriction [2], which is, provably, mathematically equivalent) has been
studied theoretically under strong simplifying assumptions such as isolated
single individuals, search stagnation (i.e., no improved solutions are found)
and absence of randomness [2–12]. While these assumptions make it possible to make
theoretical progress, they also make the results of the analyses difficult to
carry over to the real PSO (where stochasticity is present).
Only very few attempts to understand the behaviour of
the PSO in the presence of
stochasticity have been made. For example, Clerc [13] analysed the
distribution of velocities of one particle controlled by the standard PSO
update rule with inertia and stochastic forces and was able to show that a
particle's new velocity is the sum of three components: a forward force, a
backward force, and noise. Kadirkamanathan et al. [14] were able to study the
stability of particles in the presence of stochasticity by using Lyapunov
stability analysis.
These are good and important first steps in the
direction of modelling the real PSO. However, they only marginally scratch the
surface in relation to one of the most important open problems in PSO research:
the characterisation of the PSO's sampling distribution and its changes over
time. Why is this question so fundamental? For two reasons. Firstly, this
scientific knowledge gives us much greater understanding of an algorithm's
behaviour. Secondly, on a more practical level, the only way to beat the
limitations implied by NFL [15] is to match algorithms to problems. Naturally,
at least in part, this can be achieved by a trial-and-error approach involving
testing different algorithms and parameter settings on a particular problem or
class of problems until a suitable match is found. A more analytic approach to
this problem, however, requires two things: (a) knowing the search distribution
of an algorithm and (b) checking whether this is well matched to the features
of a problem's fitness landscape. Unfortunately, the sampling distribution of
the PSO has been unavailable during the whole first decade (and beyond) of PSO
research.
In recent work [16], we introduced a novel method,
which allowed us to exactly determine the mean and standard deviation of the
sampling distribution of the canonical PSO as well as their changes over any
number of generations. The only assumption we made was stagnation, that is, we
studied the sampling distribution produced by particles in search for a better
personal best. In this paper, we generalise the technique and show how it can
be used to determine any number of moments of the sampling distribution of
PSO's.
The paper is organised as follows. In Section 2, we
provide a summary of the results presented in [16]. In Section 3, we
generalised the method to moments of any order. In Section 4, we apply the
technique to compute the skewness and kurtosis of the canonical PSO's sampling
distribution. In Section 5, we further extend the generality of our technique
so as to cover a variety of PSO's, not just the one in (1). This allows us
to compare, on a theoretical basis, different forms of PSO. In Section 6, we
then use the moments of the PSO's sampling distribution to approximately
reconstruct the distribution itself, making it possible, for the first time, to
understand the search strategy implemented by the canonical PSO, while
searching for new improvements. We discuss the results and conclude in Section
7.
2. Dynamics of First- and Second Order Moments of the PSO's Sampling Distribution
2.1. Derivation of Dynamic Equations for
the Moments
During stagnation, each particle behaves
independently. Also, each dimension is treated independently. So we can analyse
each particle's behaviour in isolation. Therefore, we can drop the superscript
in (1),
and rewrite them as
(2)
where we used
the relation
. Note that on the r.h.s. of this equation, while
,
, and
are constants,
,
,
, and
are stochastic
variables. As a consequence,
on the
l.h.s. of the equation is a
stochastic variable too (in fact, it is a stochastic function of the stochastic
variables appearing in the r.h.s.).
In [16], we applied the expectation operator to both
sides of the equation obtaining
(3)
where we
performed the substitution
because of the
statistical independence between
and
. Assuming that
, that is, the
and
are both
uniformly distributed in
, we have
, and so
(4)
where we
renamed
.
If we square both sides of (2), we obtain
(5)
Expanding the
r.h.s., we obtain an equation expressing the stochastic variable
as a function
of other stochastic variables, that is,
,
,
, and so forth. By applying the expectation operator
to both sides of this equation, one obtains
(6)
where we set
and
, for brevity.
If, instead, we multiply both sides of (2) by
and apply the
expectation operator, we obtain
(7)
Equations (4), (6), and (7) form a system of coupled
second-order difference equations using which, given appropriate initial
conditions, one can compute
,
and
for any
.
2.2. Analysis of the Dynamics
Equations (4), (6), and (7) can also be written in
matrix notation as the extended first order system
(8)
where
(9)
where
,
,
.
It is then trivial to verify under what conditions
,
, and
will converge
to stable fixed points. We need to have that all eigenvalues of
must be within
the unit circle, that is,
. When this happens, we will say that the PSO is order-2 stable.
Naturally, when
, in principle, we could derive (either symbolically
or numerically) the fixed point for the system, which we will denote as
. This would be simply given by
(10)
When the system is order-2 stable, by the simple
change of variables
we can represent the dynamics of the system via the
following linear homogeneous equation:
(11)
which can trivially be integrated to obtain the
following explicit solution for the dynamics of the first two moments:
(12)
3. Higher-Order Moments
In [16], we obtained recursions which describe the
dynamics of first- and second-order moments of the sampling distribution of a
standard PSO during stagnation, as summarised in the previous section. One may
then wonder whether it would be possible to follow a similar approach to study
the dynamics of higher-order moments.
3.1. Derivation of Dynamic Equations Forhigher-Order Moments
The fundamental question is “what quantities
would we have to deal with if we took higher powers of both sides of (2) as we
did to derive (5)?” Generally, the r.h.s. would be a sum of terms of the
form
(13)
where
are suitable
constants. Naturally, taking powers of (2) and then multiplying both sides by
some power of
would also lead
to equations involving terms such as those in (13). That is, for any choice of
and
,
(14)
where
are suitable
constants. If we then take expectations for both sides, we obtain
(15)
where we used
the independence of
,
,
, and, of course, their powers. If
and
are uniformly
distributed in the same range, namely,
, it is easy to verify that
(16)
So
(17)
where
(18)
It is important
to note here that because (2) is linear in
and
, all the terms on the r.h.s. of (14) and (17) respect
the relation
. This implies that it is possible to construct recursions for
moments of arbitrary order.
3.2. Complexity of the Calculations
If one wanted to push the analysis developed in
Section 2 up to order 3, one would need to instantiate (17) for
,
,
and add the
resulting equations to (4), (6), and (7). If one wanted to go to order 4, an
additional set of four equations (for
,
,
, and
) would be
needed, bringing the total to 10.
More generally, in order to compute statistics of
order
, one needs to construct and iterate
(19)
second-order
difference equations. Since, after expansion, the r.h.s. of (2) contains 7
atomic terms of the form in (13), the r.h.s. of (17) contains
such
terms (Note that the exponent
does not influence
the number of terms because the recursion for
is
obtained as follows: (a) we compute
, which is given by an expression containing
terms; (b) we
multiply each term by
, which changes the exponents
but does not
alter the number of terms; (c) we apply the expectation operator, which again
does not modify the number of terms.) So the total number of terms one needs to
compute to construct the equations for order-
statistics is
(for the order
1 equations) plus
(for the order
2 equations) plus
(for the order
3 equations), and so forth. This gives us a total of
(20)
terms. Note that
grows
exponentially approximately as
. So although the number of equations one needs to
deal with grows quadratically, the computational effort required to instantiate
them is exponential. For instance,
,
,
,
, and
. The growth in number of terms can be reduced if one
makes explicit use of
(i.e., by
adding the factor
in (13)). Then
. Either way, manually deriving equations for moments
of order 3 is already very laborious. The process, however, is clearly
mechanisable. This can be done using computer algebra systems, or by explicitly
representing and manipulating the
's in each
equation (this is what we did). As a result of mechanisation, computing the
equations for up to order 6 or 7 is feasible with an ordinary personal
computer.
Some of the
's in (17)
present the same pattern of exponents for
,
,
, and
, so terms can be collected leading to more compact
equations (e.g., compare (5) and (6)). Also, given their size, one will
normally want to study (e.g., integrate) (17) numerically. In this case,
,
,
, and
are all
numerical parameters. So the
's become
constants and, after collecting terms, each equation contains at most
terms, which,
as we know, is quadratic in the order
. As a result, although the complexity of the
construction of the motion equations for the moments is exponential in the
order of the moments, their numerical integration is only of order
.
Naturally, the system of
second-order
difference equations necessary to predict the dynamics of moments of order 1 to
can be turned
into a system of order 1 of the form in (8), via the choice
(21)
This effectively means adding artificial update
equations of the form
for
, bringing the total to
. The transition matrix for the system is, therefore,
of size
. We will denote this with
. For example, for
, which would allow one to study the mean, variance,
skewness, and kurtosis of the sampling distribution as a function of
,
is merely a
matrix.
Interestingly,
grows so slowly
that one can perform an eigenvalue analysis for any
that one is
able to compute. That is, the expensive part of the process is the construction
of
. Once this is done, iterating the system,
establishing its stability, or finding its fixed points is a trivial matter.
In the next section, we provide results for statistics
of order 3 and 4, that is,
, a value of
for which
computing
takes only a
few seconds. However, before we do this, we need to consider the initial
conditions for the system.
3.3. Initial Conditions
In order to provide initial conditions for the
moments' dynamics, we need to compute
and
for generic
and
.
If we are at the very first iteration of the PSO
algorithm, the initial conditions for the dynamics of the moments are related
to the ranges used to initialise positions and velocities in the PSO. Under the
assumption that a particle's initial position,
, is chosen uniformly at random in a symmetric range
, we have
(22)
In order to
compute
, we need to consider the equation
(23)
where a
particle's initial velocity,
, is a stochastic variable uniformly distributed the
range
(often
). By taking
the
th power of
both sides of the equation, multiplying by
, and taking expectations, as we did to construct
(17), one obtains the desired expressions for
. Like for (17), these expressions contain a number of
terms that grows exponentially with
. However, this process, too, can be trivially
mechanised.
If the PSO is not at its first iteration, and a new
improvement has just been found to a particle's personal best or neighbourhood
best, then
is not a
stochastic variable, but a constant determined by the position where the
particle found the last improvement.
4. Skewness and Kurtosis of the PSO's Sampling Distribution
We constructed the recursions for moments of up to
order 4 as described in the previous section for the canonical PSO. Naturally,
for higher-order moments, we can do exactly the same type of eigenvalue
analysis we did in [16] for the mean and standard deviation of the sampling
distribution. For example, Figure 1 shows the magnitude of the largest
eigenvalue,
, of
as a function
of the parameters
and
. The system is order-4 stable, that is, mean,
variance, skewness, and kurtosis have a stable fixed point, whenever
, that is, within the curved region enclosed by the
contour shown in the figure.
Figure 1: Magnitude of the largest
eigenvalue of

as a function
of the parameters

and

. The curved line on the surface encloses the order-4
stable region.
For easier comparison, we show the lines where
for
,
,
, and
in Figure 2
(ordered from top to bottom, resp.). The regions of order-1, -2, -3, and
-4 stability are nested. Note how the
lines for
and
coincide for
many values of
. Note also that the standard setting,
and
, lays within the narrow region of order-3 stability.
This implies that while mean, variance, and skewness of the standard PSO tend
to a fixed point, kurtosis is unstable and will tend to grow indefinitely.
Interestingly, a growth in the kurtosis of samples was observed by Kennedy
[17], although this was effectively computed under the assumption that the
sampling distribution is time independent. So the values of
recorded in a
run at
grows were
treated as different samples from the same distribution, while we now know this
is incorrect.
Figure 2: Plot of the regions
of order-1, -2, -3, and -4 stability for the canonical PSO.
That the predictions of the model are exact is also
confirmed by the comparison of the dynamics of predicted and recorded
higher-order moments. Figure 3(a) shows a comparison between the skewness
computed using
our model and the average positions of the particle recorded in one billion (1
000 000 000) real runs in the first 30 iterations for the case
,
,
,
, and
. As one can see, there is a very good match between
the model's predictions and the stagnation behaviour of particles in real runs.
Only after about 20–25 generations, the sampling errors start accumulating
enough to show significant differences. We know that these differences are due
to sampling errors because before performing 1 000 000 000 runs, we attempted
to use much smaller sets of runs. With fewer runs, we observed that errors were
visible well before generation 25. For example, with 1 000 000 runs, there is a
good match up to around iteration 15.
Figure 3: Comparison between predicted
and experimental skewness and (excess) kurtosis of the PSO sampling
distribution for

,

,

,

, and

. Kurtosis grows exponentially, and so it is plotted
on a logarithmic scale. The first point is not plotted because the excess
kurtosis was negative (−1.2).
As shown in Figure 3(b), the model also predicts
very well the behaviour of the (excess) kurtosis
of the sampling
distribution. (Following standard practice, in this paper whenever we use the
term “kurtosis,” we will refer to the excess kurtosis
. The excess kurtosis of the normal distribution to
0.) Note that for
and
, the system is order-3 stable, and so, although the
oscillations of the skewness shown in Figure 3(a) appear to grow bigger and
bigger, suggesting instability, this is actually only a transient effect, as
shown in Figure 4 where we integrate the equations over 200 generations instead
of 30.
Figure 4: Predicted dynamics of
the skewness of the PSO sampling distribution over 200 generations for

,

,

,

, and

.
5. Comparison between PSO's
In the previous sections, we studied the canonical PSO
with the restriction that the acceleration coefficients,
and
, were identical:
. One may wonder, however, whether allowing such
coefficients to differ would produce qualitatively very different dynamics. For
example, what would happen if we set one of the
's
to
zero as in a purely cognitive or purely social PSO model? This effectively
would reduce to one the sources of random influences on a particle's dynamics.
Conversely, one might wonder what would happen if we increased the number of
such sources of influence, as is done, for example, in the fully informed
particle swarm (FIPS) [18, 19].
5.1. Moments of Generalised PSO's
To answer these (and other) important questions on the
sampling distribution of different PSO models, we adopt a FIPS-like general
class of PSO's described by the following difference equation:
(24)
where the
's are
stochastic variables uniformly distributed in the range
,
are constants, and
the
's are the
personal best positions of neighbours of the particle (the particle itself may
be included in its own neighbourhood). Naturally, this equation can be
converted into the following:
(25)
which is a generalisation
of (2). All of the steps we performed in Section 3 can be repeated for (25).
These lead to recursions of the form in (17) with the only difference that the
coefficients
take the more
general form
(26)
where
,
,
,
are appropriate
constants.
Because (25) contains
terms, the
complexity of the expansion now grows as
, where
is the order of
the moments we are interested in. So the larger
, the heavier the computation load required to compute
. Once the transition matrices
are computed,
however, they are exactly of the same size for all PSO models within the class
defined by (24). Initial conditions can be found following the approach
described in Section 3. Calculations are expensive but can be mechanised. We
did this for the examples described below.
5.2. Three PSO's We Want to Compare
An extensive comparison of different PSO's is beyond
the scope of this paper. However, as an example of the kind of comparisons one
can make using our approach, we considered the PSO's in (24) with
. Within this class of PSO's, we considered three
variants as follows:
(a)
a purely social
variant of PSO, which we will call social PSO for brevity, where
and
(due to
symmetries, the behaviour of a purely cognitive PSO where
and
is effectively
identical to that of this social PSO);
(b)
the canonical PSO we have studied so far in
the paper, which is obtained by setting
and
;
(c)
the simplest
version of FIPS with a neighbourhood of three individuals, for example,
obtained using an lbest topology
and an interaction radius of 1, where
. We will call this version FIPS3.
In order to perform a fair comparison of the stability
properties of these PSO variants, we study them in conditions where the sum of
the amplitudes of the random components,
, is identical across models. That is, we set
, and compare models with
the same
value. Again,
we analyse eigenvalues.
5.3. Results of the Comparison
Figures 5–8 show the lines in the
plane where the
magnitude of the largest eigenvalue of
,
is 1 for
and for the
three PSO variants mentioned above. Each figure represents a different moment.
In the plot for the
th moment, the
region below each line represents the set of all
pairs for which
the
th moment of
the distribution of the PSO corresponding to that line is stable. Let us
analyse these figures in detail.
Figure 5: Lines below which the
mean of the sampling distributions for a social PSO, a canonical PSO, and FIPS
with a neighbourhood of three individuals have a fixed point (order-1
stability). The three lines coincide.
Figure 6: Lines below which
the variance of the sampling distributions for a social PSO (bottom), a
canonical PSO (middle), and FIPS with a neighbourhood of three individuals
(top) have a fixed point (order-2 stability).
Figure 7: Lines below which a
social PSO, a canonical PSO, and FIPS3 are order-3 stable (from bottom to top,
resp.).
Figure 8: Lines below which
the kurtosis of the sampling distributions for a social PSO (bottom), a
canonical PSO (middle), and FIPS3 (top) have a fixed point (order-4 stability).
Firstly, we should note that the regions of order-1
stability for the three models are identical. This is because the dynamics of
the mean of the three models is governed by equations of the same form, namely,
(27)
where the
constant term may differ in different PSO variants. (This is irrelevant for the
stability of the system, since stability is determined by the homogeneous part
of the equation.) Note also that the rightmost point in each plot is an
artifact due to the fact that, at
,
for
irrespective of
the value of
.
The regions where the variance is stable for the three
models, instead, are different, with FIPS3 having the largest region of order-2
stability, followed by the canonical PSO, and finally, by the social PSO.
Exactly the same happens with skewness (Figure 7) and kurtosis (Figure 8), with
the order-3 stability region largely coinciding with the order-2 ones also for
FIPS3 and the social PSO.
These results are counter intuitive. One would expect
that the more sources of randomness, the
's, there are,
the more a PSO should be unstable. However, the exact opposite happens. The
social PSO, where the only influence is
, is the least stable of all models, while FIPS3,
which has three sources of randomness, is the most stable. What are the reasons
for this behaviour?
We can understand this by rewriting (25) as follows:
(28)
where
and
. Both
and
are the sum of
independent and uniformly distributed variables: the variables
in the case of
and the
variables
in the case of
.
We know that
. To simplify our treatment, let us further assume
that the
's are i.i.d.,
that is, that all
are identical,
and so
. We can then apply the central limit theorem to
. This predicts that for sufficiently large
, the distribution of
is
approximately Gaussian with mean
and variance
. So the larger
, the smaller the variance of
and the
stochasticity of (28).
In the case of the stochastic variable
, the quantities
are not
identically distributed even if all
are identical.
This is because, in principle, each
may be
different. This prevents the use of the standard central limit theorem. We can,
however, apply Lyapunov's central limit theorem to
. The conditions for its application are as
follows:
(1)
the variables
must have
finite mean, which is the case since
,
(2)
the variables
must have
finite variance, which, again, is the case since
,
(3)
the variables
must have
finite third central moment, which is satisfied since
, and finally,
(4)
the Lyapunov
condition,
, must be satisfied, which, again, is the case since
all
.
Then for
sufficiently large
, also the distribution of
is
approximately Gaussian with mean
and variance
. Note that
and
are the mean
and the mean
, respectively. So these are finite quantities if, as
is normally the case, all
are finite.
(PSO search is normally confined to a prefixed, finite region of
, and so all
must be
finite.) So like for
, the larger
, the smaller the variance of
, and consequently, the less the stochasticity of
(28).
Effectively, the larger
, the more
and
become
deterministic and approach constant values. This explains why adding more and
more sources of randomness—while keeping
constant—produces progressively more and more stable PSO's.
6. The Density Function of the Canonical PSO's Sampling Distribution
The technique described in this paper, in principle,
would allow one to determine all the moments of the sampling distribution of
the PSO at all times. The question then is “could we derive the PSO
sampling distribution itself ?” The answer is of course in the positive
since knowing all the moments of a distribution implies knowing its moment
generating function. This, in turn, allows one to obtain the density function
of the distribution via inverse Laplace transform.
In practice, however, it is impossible to compute all
the moments of the PSO sampling distribution. This is for two reasons. Firstly,
there are infinitely many such moments. Secondly, as we have seen in the
previous sections, the cost of computing moments is exponential in the order of
the moments. The next question is then “to what extent can we still
reconstruct the PSO's density function from a finite number of moments?”
This is an instance of the well-known truncated moment problem, a difficult,
inverse ill-posed problem for which many approaches have been proposed. Here,
we consider only one such approach.
A particularly simple idea is to consider a family of
density functions
with parameters
,
, and so forth,
with sufficient expressive power to represent distributions with widely
different shapes, with more or less asymmetries, with tails of different
characteristics, and so on. Then one can use optimisation techniques to
identify the parameters of the distribution
which minimise
the difference between the moments of
and the moments
of the PSO's sampling distribution. This is called the moment-matching method. Once the
parameters
,
, and so forth, are identified,
can be used as
an approximation of the true PSO sampling distribution. This approach to
reconstructing probability distributions from moments was proposed [20] (see
also [21, 22]), where a generalised
lambda distribution (GLD) was used. We adopt this same approach here.
GLD is a four-parameter distribution defined via its
quantile function as follows:
(29)
where
. Its density function is given by
(30)
where
.
The GLD is enormously flexible in terms of the shape
of the distribution. For example, as shown in Figure 9, the uniform, Gaussian,
exponential, and Gamma distributions are all special
cases of GLD. Effectively,
determines the
location of the distribution,
determines its
scale, while
and
determine other
shape characteristics. In particular, only if
, the distribution is symmetric.
Figure 9: Examples of GLD density functions.
Because GLD has 4 parameters, all we need
is four moments—the mean, variance, skewness,
and kurtosis—of the PSO's sampling distribution in order to identify such
parameters with the moment-matching method described above.
As an illustration, we apply this technique to
reconstruct the sampling distribution during stagnation of a canonical PSO with
parameters
,
,
,
, and
. With these parameters,
. In Figure 10, we show snapshots at times
and 24 of the theoretical sampling
distribution together with estimates of the distribution based on 1 000 000
actual runs. In all cases, the match between the moments of the GLD and those
of the PSO sampling distribution was exact (within experimental errors). Also,
there is considerable agreement between the theoretical lines and histograms
obtained in real runs. Note how widely the mean of the density function
oscillates around 5 in the first few generations. Also note the asymmetry in
the distributions due to the oscillations of the skewness.
Figure 10: Estimates of the
sampling distribution of a canonical PSO with parameters

,

,

,

, and

during
stagnation, reconstructed via GLD best fitting vers. histograms over 1 000 000
real runs. Snapshots at times

, 2, 4, 12, and 24 are shown. For each theoretical
sampling distribution, we report the parameters of the corresponding GLD.
7. Discussion and Conclusions
Several theoretical analyses of the dynamics of
particle swarms have been offered in the literature over the last decade. These
have been very illuminating. However, virtually, all have relied on substantial
simplifications and on the assumption that the particles are deterministic.
Naturally, these simplifications make it impossible to derive an exact characterisation
of the sampling distribution of the PSO.
In previous work [16], by using surprisingly simple
techniques, we started by exactly determining, perhaps, the most important
characteristic of a PSO's sampling distribution, its variance, and we have been
able to explain how it changes over any number of generations. The only
assumption we made is stagnation. Here, we extended this technique to the study
of higher-order statistics. In particular, we analysed, in detail, the skewness
and kurtosis of the distribution. Because of the complexity of the calculations
involved, this required mechanising the derivation of the recursions for these
moments.
We applied the analysis to the PSO with inertia
weight, but the analysis is also valid for the PSO with constriction, because
of the well-known equivalence of these two models (via a simple parameter
mapping). We also generalised our model so as to include FIPS. This made it
possible to explicitly compare the stability of different forms of PSO, leading
to a deeper understanding of their properties. In particular, we showed that
while FIPS and standard forms of PSO present exactly the same order-1
stability, in FIPS, higher-order moments are more stable than in the other
PSO's, and we were able to explain why this is the case, using two forms of
central limit theorem (Section 5). Elsewhere [23], we also applied the same
type of analysis to derive the stability regions of a simpler form of
velocity-based PSO.
Finally, with all these tools in hand, we went in
search for the “holy grail”—the actual PSO sampling density function. We
treated the problem as an ill-posed inverse problem, which we regularised
thanks to the use (and best fit) of a family of distributions—the
generalised lambda distribution (GLD). All empirical evidence we have suggests
that this distribution approximates very closely the sampling behaviour of
PSO's (naturally, with parameters that are functions of time, i.e.,
). We will
attempt to characterise the PSO's sampling distribution in more depth in future
research.
Whether or not GLD is the exact PSO sampling
distribution or just a very good approximation, if one could determine (again,
either exactly or approximately) how the
's are affected
by the parameters
,
,
, and
and by the
initial conditions
and
, it would then be possible to accurately simulate the
behaviour of the PSO by sampling from
. This is easily done since GLD deviates can trivially
be produced by picking
uniformly at
random in the interval
and applying
(29), that is,
is generalised
lambda distributed. We will study this more sophisticated form of bare-bones
PSO [17] in future research.
Bare-bone PSO's have some resemblance with estimation
of distribution algorithms (EDA's). EDA's are powerful population-based searchers
where the variation operations traditionally implemented via crossover and
mutation in evolutionary algorithms are replaced by the process of sampling
from a distribution (a review of EDA's is available in [24]). The distribution
is modified generation after generation, by using information obtained from the
more fit individuals in the population. The objective of these changes is to
increase the probability of generating individuals with high fitness. As one
can clearly see from this description, a bare-bone PSO performs a very similar
form of search to an EDA. However, there is a difference between the two. While
in a typical EDA the whole new population is generated from a single
distribution, in a bare-bone PSO, each individual has a personal sampling
distribution from which only one sample is drawn at each iteration. (The
sampling distributions of different particles may share some characteristics,
since the sampling distribution of a particle is determined by both personal
best and neighbourhood best, and the neighbourhood best can easily be the same
for different particles.) Nonetheless, one could still interpret a bare-bone
PSO as an unusual type of EDE in which multiple distributions are used and
updated based on fitness during the search.
Acknowledgments
The author would like to acknowledge the support by
EPSRC (Grant no. GR/T11234/01, “Extended Particle Swarms (XPS)”) and the help
and comments from Maurice Clerc, David Broomhead, Susanna Wau Men Au-Yeung, and
Roberto Renó. The anonymous reviewers and the editors are also thanked for
their useful feedback.
References
- J. Kennedy, “The behavior of particles,” in Proceedings of the 7th International Conference on Evolutionary Programming (EP '98), V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, Eds., p. 581, Springer, San Diego, Calif, USA, March 1998.
- M. Clerc and J. Kennedy, “The particle swarm—explosion, stability, and convergence in a multidimensional complex space,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 1, 58 pages, 2002.
- E. Ozcan and C. K. Mohan, “Analysis of a simple particle swarm optimization system,” in Intelligent Engineering Systems Through Artificial Neural Networks, vol. 8, p. 253, ASME Press, St. Louis, Mo, USA, 1998.
- E. Ozcan and C. K. Mohan, “Particle swarm optimization: surfing the waves,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '99), vol. 3, p. 1939, IEEE Service Center, Washington, DC, USA, July 1999.
- F. van den Bergh, An analysis of particle swarm optimizers, Ph.D. thesis, Department of Computer Science, University of Pretoria, Pretoria, South Africa, 2002.
- A. P. Engelbrecht, Fundamentals of Computational Swarm Intelligence, John Wiley & Sons, New York, NY, USA, 2005.
- K. Yasuda, A. Ide, and N. Iwasaki, “Adaptive particle swarm optimization,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 2, p. 1554, Washington, DC, USA, October 2003.
- T. M. Blackwell, “Particle swarms and population diversity,” Soft Computing, vol. 9, no. 11, 793 pages, 2005.
- B. Brandstatter and U. Baumgartner, “Particle swarm optimization—mass-spring system analogon,” IEEE Transactions on Magnetics, vol. 38, no. 2, 997 pages, 2002.
- I. C. Trelea, “The particle swarm optimization algorithm: convergence analysis and parameter selection,” Information Processing Letters, vol. 85, no. 6, 317 pages, 2003.
- E. F. Campana, G. Fasano, and A. Pinto, “Dynamic system analysis and initial particles position in particle swarm optimization,” in Proceedings of the IEEE Swarm Intelligence Symposium (SIS '06), Indianapolis, Ind, USA, May 2006.
- E. F. Campana, G. Fasano, D. Peri, and A. Pinto, “Particle swarm optimization: efficient globally convergent modifications,” in Proceedings of the 3rd European Conference on Computational Mechanics, Solids, Structures and Coupled Problems in Engineering (ECCM '06), C. A. Mota Soares, J. A. C. Martins, H. C. Rodrigues, and J. A. C. Ambrosio, Eds., Lisbon, Portugal, June 2006.
- M. Clerc, “Stagnation analysis in particle swarm optimisation or what happens when nothing happens,” Tech. Rep. CSM-460, Department of Computer Science, University of Essex, Colchester, UK, August 2006.
- V. Kadirkamanathan, K. Selvarajah, and P. J. Fleming, “Stability analysis of the particle dynamics in particle swarm optimizer,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, 245 pages, 2006.
- D. H. Wolpert and W. G. Macready, “No free lunch theorems for search,” IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, 67 pages, 1997.
- R. Poli and D. Broomhead, “Exact analysis of the sampling distribution for the canonical particle swarm optimiser and its convergence during stagnation,” in Proceedings of the 9th Genetic and Evolutionary Computation Conference (GECCO '07), p. 134, ACM Press, London, UK, July 2007.
- J. Kennedy, “Bare bones particle swarms,” in Proceedings of the IEEE Swarm Intelligence Symposium (SIS '03), p. 80, Indianapolis, Ind, USA, April 2003.
- J. Kennedy and R. Mendes, “Population structure and particle swarm performance,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '02), vol. 2, p. 1671, Honolulu, Hawaii, USA, May 2002.
- R. Mendes, Population topologies and their influence in particle swarm performance, Ph.D. thesis, Departamento de Informatica, Escola de Engenharia, Universidade do Minho, Braga, Portugal, 2004.
- J. S. Ramberg, P. R. Tadikamalla, E. J. Dudewicz, and E. F. Mykytka, “A probability distribution and its uses in fitting data,” Technometrics, vol. 21, no. 2, 201 pages, 1979.
- A. Lakhany and H. Mausser, “Estimating the parameters of the generalized lambda distribution,” Algo Research Quarterly, vol. 3, no. 3, 47 pages, 2000.
- S. W. M. AuYeung, Finding probability distributions from moments, M.S. thesis, Imperial College, London, UK, 2003.
- R. Poli, D. Bratton, T. M. Blackwell, and J. Kennedy, “Theoretical derivation, analysis and empirical evaluation of a simpler particle swarm optimiser,” in Proceedings of the IEEE Congress on Evolutionary Computation (CEC '07), IEEE Press, Singapore, September 2007.
- P. Larrañaga and J. A. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Kluwer Academic Publishers, Boston, Mass, USA, 2002.