Mathematical Problems in Engineering

Volume 2014 (2014), Article ID 147457, 11 pages

http://dx.doi.org/10.1155/2014/147457

## On the Convergence of Biogeography-Based Optimization for Binary Problems

^{1}Department of Electrical Engineering, Shaoxing University, Shaoxing, Zhejiang, China

^{2}Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai, China

^{3}Department of Electrical and Computer Engineering, Cleveland State University, Cleveland, OH 44115, USA

Received 28 January 2014; Accepted 1 May 2014; Published 22 May 2014

Academic Editor: Erik Cuevas

Copyright © 2014 Haiping Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Biogeography-based optimization (BBO) is an evolutionary algorithm inspired by biogeography, which is the study of the migration of species between habitats. A finite Markov chain model of BBO for binary problems was derived in earlier work, and some significant theoretical results were obtained. This paper analyzes the convergence properties of BBO on binary problems based on the previously derived BBO Markov chain model. Analysis reveals that BBO with only migration and mutation never converges to the global optimum. However, BBO with elitism, which maintains the best candidate in the population from one generation to the next, converges to the global optimum. In spite of previously published differences between genetic algorithms (GAs) and BBO, this paper shows that the convergence properties of BBO are similar to those of the canonical GA. In addition, the convergence rate estimate of BBO with elitism is obtained in this paper and is confirmed by simulations for some simple representative problems.

#### 1. Introduction

Mathematical models of biogeography describe the immigration and emigration of species between habitats. Biogeography-based optimization (BBO) was first presented in 2008 [1] and is an extension of biogeography theory to evolutionary algorithms (EAs). BBO is modeled after the immigration and emigration of species between habitats. One distinctive feature of BBO is that it uses the fitness of each candidate solution to determine its immigration and emigration rate. The immigration rate determines how likely a candidate solution is to change its decision variables, and the emigration rate determines how likely a candidate solution is to share its decision variables with other candidate solutions. Specifically, a candidate solution’s emigration rate increases with fitness, and its immigration rate decreases with fitness.

Although BBO is a relatively new EA, it has demonstrated good performance on various unconstrained and constrained benchmark functions [2–5] and on real-world optimization problems such as sensor selection [1], economic load dispatch [6], robot controller tuning [7], satellite image classification [8], and power system optimization [9]. In addition, Markov models have been derived for BBO on binary problems [10, 11]. Reference [12] discusses the conceptual, algorithmic, and performance differences between BBO and GAs using both Markov model comparisons and benchmark simulation results. These simulation and theoretical results confirm that BBO is a competitive evolutionary algorithm. But until now there have not been any theoretical results concerning its convergence properties.

We say that an optimization algorithm converges to the global optimum if the value of at least one of its candidate solutions, in the limit as the generation count approaches infinity, is equal to the global optimum of the optimization problem. Several mathematical tools have been used to analyze EA convergence [13–17]. Recent work includes the analysis of EA convergence using Markov’s inequality, Chebyshev bounds, Chernoff bounds, and martingales for minimum spanning tree problems, maximum matching problems, scheduling problems, shortest path problems, Eulerian cycle problems, multiobjective problems, and others [18–20].

Markov chain models are still some of the most frequently used methods for the analysis of EA convergence. They have been widely used in a variety of EAs, including genetic algorithms (GAs) [21–25] and simulated annealing [4, 26], to prove probabilistic convergence to the global optimum. A Markov chain is a random process which has a discrete set of possible states $s_i$ ($i = 1, \ldots, T$). The probability that the system transitions from state $s_i$ to $s_j$ is given by $p_{ij}$, which is called a transition probability. The $T \times T$ matrix $P = [p_{ij}]$ is called the transition matrix, where $T$ is the total number of possible population distributions. A population distribution is a specific multiset of individuals with a cardinality that is equal to the population size. A Markov state in [11] represents a BBO population distribution, that is, how many individuals at each point of the search space there are in the population. Probability $p_{ij}$ is the probability that the population transitions from the $i$th population distribution to the $j$th population distribution in one generation.

This paper analyzes the global convergence properties of BBO as applied to optimization problems with binary search spaces, based on a previously derived BBO Markov model, and obtains the convergence rate estimate using homogeneous finite Markov chain properties. Section 2 gives a brief review of BBO and its Markov transition probabilities. Section 3 gives some basic definitions, obtains BBO's convergence properties, and obtains the convergence rate estimate. Section 4 confirms the theory using simple numerical simulation. The convergence properties and convergence rates derived here are not surprising in view of previous EA convergence results, but this paper represents the first time that such results have been formalized for BBO. Finally, Section 5 presents some concluding remarks and directions for future work.

#### 2. Biogeography-Based Optimization (BBO)

This section presents a review of the biogeography-based optimization (BBO) algorithm with migration and mutation (Section 2.1) and then provides a review of the Markov transition probability of BBO populations (Section 2.2).

##### 2.1. Overview of Biogeography-Based Optimization

This section provides an overview of BBO. The review in this section is very general because it applies to optimization problems with either real domains, integer domains, binary domains, or combinations thereof.

BBO is a new optimization approach, inspired by biogeography theory, to solve general optimization problems. A biogeography habitat corresponds to a *candidate solution* to the optimization problem. A multiset of biogeography habitats corresponds to a *population* of candidate solutions. Habitat suitability index (HSI) in biogeography corresponds to the goodness of a candidate solution, which is also called *fitness* in standard EA notation. Like other EAs [27], BBO probabilistically shares information between candidate solutions to improve candidate solution fitness. In BBO, each candidate solution is comprised of a set of features, which are also called independent variables or decision variables in the optimization literature. Note that the decision variables can be taken from sets of real numbers, integers, binary numbers, or combinations thereof, depending on the problem. Each candidate solution immigrates features from other candidate solutions based on its immigration rate and emigrates features to other candidate solutions based on its emigration rate. BBO consists of two main steps: migration and mutation.

*Migration*. Migration is a probabilistic operator that is intended to improve a candidate solution $y_k$. For each feature of a given candidate solution $y_k$, the candidate solution's immigration rate $\lambda_k$ is used to probabilistically decide whether or not to immigrate. If immigration occurs, then the emigrating candidate solution $y_j$ is probabilistically chosen based on the emigration rate $\mu_j$. Migration is written as follows:

$$y_k(s) \leftarrow y_j(s), \tag{1}$$

where $s$ is a candidate solution feature. In BBO, each candidate solution $y_k$ has its own immigration rate $\lambda_k$ and emigration rate $\mu_k$. A good candidate solution has relatively high $\mu$ and low $\lambda$, while the converse is true for a poor candidate solution. According to [1], the immigration rate $\lambda_k$ and emigration rate $\mu_k$ of the candidate solution $y_k$ are based on linear functions and are calculated as

$$\lambda_k = 1 - \text{fitness}(y_k), \qquad \mu_k = \text{fitness}(y_k), \tag{2}$$

where fitness denotes candidate solution fitness value, which is normalized to the range [0, 1]. The probabilities of immigrating to $y_k$ and of emigrating from $y_j$ are calculated as

$$\Pr(\text{immigration to } y_k) = \lambda_k, \qquad \Pr(\text{emigration from } y_j) = \frac{\mu_j}{\sum_{i=1}^{N} \mu_i}, \tag{3}$$

where $N$ is the population size.
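The migration step can be sketched in Python as follows. This is a minimal illustration, assuming fitness values already normalized to [0, 1], the linear rates $\lambda_k = 1 - \text{fitness}(y_k)$ and $\mu_k = \text{fitness}(y_k)$, and roulette-wheel selection of the emigrating solution. The function names `migration_rates` and `migrate` are hypothetical and do not come from [1].

```python
import random


def migration_rates(fitness):
    """Linear rates (assumed): immigration = 1 - fitness, emigration = fitness."""
    lam = [1.0 - f for f in fitness]
    mu = [f for f in fitness]
    return lam, mu


def migrate(population, fitness, rng=random):
    """Return a new population after one migration pass.

    population: list of candidate solutions (each a list of features)
    fitness:    list of fitness values normalized to [0, 1]
    """
    lam, mu = migration_rates(fitness)
    total_mu = sum(mu)
    new_pop = [list(ind) for ind in population]
    for k, ind in enumerate(population):
        for s in range(len(ind)):
            # Immigrate feature s into solution k with probability lam[k].
            if total_mu > 0 and rng.random() < lam[k]:
                # Pick the emigrating solution j with probability mu[j] / sum(mu).
                j = rng.choices(range(len(population)), weights=mu)[0]
                new_pop[k][s] = population[j][s]
    return new_pop
```

With fitness values `[1.0, 0.0]`, the first solution never immigrates and the second always copies its features from the first, which matches the intent that good solutions share features and poor solutions accept them.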

*Mutation.* Mutation is a probabilistic operator that randomly modifies a candidate solution feature. The purpose of mutation is to increase diversity among the population, just as in other EAs. Mutation of the $k$th candidate solution is implemented as shown in Algorithm 1.

In the mutation logic (Algorithm 1), rand($a$, $b$) is a uniformly distributed random number between $a$ and $b$, $p_m$ is the mutation rate, and $L_s$ and $U_s$ are the lower and upper search bounds of the $s$th independent variable. The above logic mutates each independent variable with a probability of $p_m$. If mutation occurs for a given independent variable, then that independent variable is replaced with a random number within its search domain.
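Since the listing of Algorithm 1 is not reproduced here, the mutation logic described above might look like the following Python sketch. It assumes real-valued variables and uniform resampling within the bounds; the name `mutate` and its parameters are illustrative choices, not the authors' code.

```python
import random


def mutate(individual, p_m, lower, upper, rng=random):
    """Replace each variable with a uniform random value with probability p_m.

    lower[s] and upper[s] are the search bounds of variable s (assumed inputs).
    """
    mutated = list(individual)
    for s in range(len(mutated)):
        if rng.random() < p_m:
            # Resample this variable uniformly inside its search domain.
            mutated[s] = rng.uniform(lower[s], upper[s])
    return mutated
```

A mutation rate of 0 leaves the candidate solution unchanged, while a rate of 1 resamples every variable.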

A description of one generation of BBO is given in Algorithm 2.

##### 2.2. Transition Probability of Biogeography-Based Optimization

Two main steps in BBO are significant: migration and mutation. The transition probability of one generation includes migration probability and mutation probability. Consider a problem whose search space is binary. The search space is the set of all bit strings $x_i$ consisting of $q$ bits each. Therefore, the cardinality of the search space is $n = 2^q$. Suppose that BBO is currently in the $t$th generation. Based on the previously derived Markov chain model for BBO [11], the probability that migration results in the $k$th candidate solution $y_k$ being equal to $x_i$ at generation $t + 1$ is given by

$$\Pr(y_{k,t+1} = x_i) = \prod_{s=1}^{q} \left[ (1 - \lambda_k)\, \mathbf{1}_0\big(y_k(s) - x_i(s)\big) + \lambda_k \frac{\sum_{j \in \Im_i(s)} v_j \mu_j}{\sum_{j=1}^{n} v_j \mu_j} \right], \tag{4}$$

where $\mathbf{1}_0$ is the indicator function on the set $\{0\}$ (i.e., $\mathbf{1}_0(a) = 1$ if $a = 0$, and $\mathbf{1}_0(a) = 0$ if $a \neq 0$), $s$ denotes the index of the candidate solution feature (i.e., the bit number), $\lambda_k$ denotes the immigration rate of candidate solution $y_k$, $\mu_j$ denotes the emigration rate of candidate solution $x_j$, and $v_j$ denotes the number of $x_j$ individuals in the population. The notation $\Im_i(s)$ in (4) denotes the set of search space indices $j$ such that the $s$th bit of $x_j$ is equal to the $s$th bit of $x_i$. That is, $\Im_i(s) = \{ j : x_j(s) = x_i(s) \}$. Note that the first term in the product on the right side of (4) denotes the probability that $y_{k,t+1}(s) = x_i(s)$ if immigration of the $s$th candidate solution feature does not occur, and the second term denotes the probability if immigration of the $s$th candidate solution feature does occur.

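*Example 1.* To clarify the notation in (4), an example is presented. We use the notation $v_i$ to denote the number of $x_i$ individuals in the population. Suppose we have a two-bit problem ($q = 2$, $n = 4$) with a population size $N = 3$. The search space consists of the bit strings $\{x_1, x_2, x_3, x_4\} = \{00, 01, 10, 11\}$. Suppose that the individuals in the current population are $\{y_1, y_2, y_3\} = \{01, 11, 11\}$. Then we have $v_1 = 0$, $v_2 = 1$, $v_3 = 0$, and $v_4 = 2$. To clarify the notation $\Im_i(s)$, we now explain how to calculate $\Im_1(1)$. We arbitrarily number bits from left to right; that is, in any given bit string, bit 1 is the leftmost bit and bit 2 is the rightmost bit. From the definition of $\Im_i(s)$ we see that

$$\Im_1(1) = \{ j : x_j(1) = x_1(1) \}. \tag{5}$$

Since $x_1 = 00$, we see that $x_1(1) = 0$ (i.e., the leftmost bit is 0). Then (5) can be written as

$$\Im_1(1) = \{ j : x_j(1) = 0 \}. \tag{6}$$

But $x_j(1) = 0$ for $j \in \{1, 2\}$, which in turn indicates that $x_j(1) = x_1(1)$ for $j \in \{1, 2\}$; therefore, $\Im_1(1) = \{1, 2\}$. Continuing this process, we see that

$$\begin{aligned} \Im_1(1) &= \{1, 2\}, & \Im_1(2) &= \{1, 3\}, & \Im_2(1) &= \{1, 2\}, & \Im_2(2) &= \{2, 4\}, \\ \Im_3(1) &= \{3, 4\}, & \Im_3(2) &= \{1, 3\}, & \Im_4(1) &= \{3, 4\}, & \Im_4(2) &= \{2, 4\}. \end{aligned} \tag{7}$$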
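The bookkeeping in Example 1 can be checked with a short Python sketch. It assumes the index set is defined as the search-space indices whose bit $s$ matches bit $s$ of $x_i$, and that the migration probability is the per-bit product of a "no immigration" term and an "immigration" term as described for (4). The function names and the emigration rates `mu` are hypothetical values chosen only for illustration.

```python
def matching_indices(search_space, i, s):
    """1-based indices j whose bit s equals bit s of x_i (bits numbered from the left)."""
    target_bit = search_space[i - 1][s - 1]
    return {j for j, x in enumerate(search_space, start=1) if x[s - 1] == target_bit}


def migration_probability(search_space, v, mu, lam_k, y_k, i):
    """Probability that migration turns candidate y_k into bit string x_i."""
    denom = sum(v_j * mu_j for v_j, mu_j in zip(v, mu))
    prob = 1.0
    for s in range(1, len(y_k) + 1):
        # Bit s is kept (no immigration) and already equals bit s of x_i.
        same_bit = y_k[s - 1] == search_space[i - 1][s - 1]
        no_immigration = (1.0 - lam_k) if same_bit else 0.0
        # Bit s is immigrated from a population member whose bit s matches x_i.
        donors = matching_indices(search_space, i, s)
        immigration = lam_k * sum(v[j - 1] * mu[j - 1] for j in donors) / denom
        prob *= no_immigration + immigration
    return prob


search_space = ["00", "01", "10", "11"]
population = ["01", "11", "11"]
v = [population.count(x) for x in search_space]   # counts of each bit string
mu = [0.2, 0.4, 0.6, 0.8]                         # hypothetical emigration rates
```

Here `v` evaluates to `[0, 1, 0, 2]` and `matching_indices(search_space, 1, 1)` to `{1, 2}`, in agreement with the example. Summing `migration_probability` over all four target strings gives 1, since each per-bit factor is a probability distribution over the two bit values.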

*Mutation.* Mutation operates independently on each candidate solution by probabilistically reversing each bit in each candidate solution. Suppose that the event that each bit of a candidate solution is flipped is stochastically independent and occurs with probability $p_m \in (0, 1)$. Then the probability that candidate solution $x_i$ mutates to become $x_j$ can be written as

$$\Pr(x_i \to x_j) = p_m^{H_{ij}} (1 - p_m)^{q - H_{ij}}, \tag{8}$$

where $q$ is the number of bits in each candidate solution and $H_{ij}$ represents the Hamming distance between bit strings $x_i$ and $x_j$.
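As an illustration, the full matrix of bit-flip mutation probabilities can be built directly from Hamming distances. The sketch below assumes independent bit flips with probability `p_m` and lists bit strings in binary order; `hamming` and `mutation_matrix` are hypothetical helper names.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mutation_matrix(q, p_m):
    """Matrix with entry [i][j] = p_m^H * (1 - p_m)^(q - H), H = Hamming(x_i, x_j)."""
    space = ["".join(bits) for bits in product("01", repeat=q)]
    return [
        [p_m ** hamming(xi, xj) * (1 - p_m) ** (q - hamming(xi, xj)) for xj in space]
        for xi in space
    ]
```

Because the Hamming distance is symmetric, the matrix is symmetric, and every row and every column sums to 1. For any `p_m` strictly between 0 and 1, every entry is positive, which is the property the convergence analysis relies on.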

#### 3. Convergence of Biogeography-Based Optimization

The previous section reviewed the BBO algorithm and its Markov model. In this section, which comprises the main contribution of this paper, we use the results of Section 2 to analyze the convergence behavior of BBO. Section 3.1 gives some basic foundations of Markov transition matrices, including notation and basic theorems that we will need later. Section 3.2 reviews previously published Markov theory as it relates to BBO. This leads to Section 3.3, which obtains some important properties of the BBO transition matrix. Section 3.4 uses transition matrix properties to analyze BBO convergence to the solution of a global optimization problem. This leads to Section 3.5, which uses the BBO convergence analysis to obtain an estimate of the convergence rate.

##### 3.1. Preliminary Foundations of Markov Theory

A finite Markov chain is a random process which has a finite number of possible state values $S = \{s_i\}$ ($i = 1, \ldots, T$), where $T$ is the total number of states, which is the cardinality $|S|$. The probability that the system transitions from state $s_i$ to $s_j$ at time step $t$ is given by $p_{ij}(t)$, which is called the transition probability. The $T \times T$ matrix $P(t) = (p_{ij}(t))$ is called the transition matrix, where $p_{ij}(t) \in [0, 1]$ for $i, j \in [1, T]$, and $\sum_{j=1}^{T} p_{ij}(t) = 1$ for all $i$. The matrix $P(t)$ is called *stochastic* because the elements in each row sum to 1. If the transition probability is independent of $t$, that is, $p_{ij}(t_1) = p_{ij}(t_2)$ for all $i, j \in [1, T]$ and for all $t_1$ and $t_2$, then the Markov chain is said to be *homogeneous*. Given an initial probability distribution of states $\pi(0)$ as a row vector, the probability distribution of the Markov chain after $t$ steps is given by $\pi(t) = \pi(0) P^t$. Therefore, a homogeneous finite Markov chain is completely specified by $\pi(0)$ and $P$, and the limiting distribution as $t \to \infty$ depends on the structure of $P$. For homogeneous finite Markov chains, we have the following two theorems [28, 29].

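Theorem 2 (see [28, page 123]). *Let $P$ be a primitive stochastic matrix of order $T$; that is, all of the elements of $P^k$ are positive for some integer $k$. Then $P^k$ converges as $k \to \infty$ to a $T \times T$ stochastic matrix which has all nonzero entries. That is, for all $i, j \in [1, T]$,*

$$\lim_{k \to \infty} \left( P^k \right)_{ij} = \pi_j, \tag{9}$$

*where $\pi = (\pi_1, \ldots, \pi_T)$, and $\pi_j > 0$ for $1 \le j \le T$.*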

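Theorem 3 (see [28, page 126]). *Let $P$ be a stochastic matrix of order $T$ with the structure $P = \begin{bmatrix} C & 0 \\ R & Q \end{bmatrix}$, where $C$ is a primitive stochastic matrix of order $m$ and $R, Q \neq 0$. Then $P^k$ converges as $k \to \infty$ to a $T \times T$ stochastic matrix. That is,*

$$P^\infty = \lim_{k \to \infty} P^k = \begin{bmatrix} C^\infty & 0 \\ R^\infty & 0 \end{bmatrix} = \begin{bmatrix} \pi_1 & \cdots & \pi_m & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ \pi_1 & \cdots & \pi_m & 0 & \cdots & 0 \end{bmatrix}, \tag{10}$$

*where $\pi = (\pi_1, \ldots, \pi_m, 0, \ldots, 0)$, and $\pi_j > 0$ for $1 \le j \le m$.*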

We will use these theorems in Section 3.3 to derive important properties of the BBO transition matrix and in Section 3.4 to derive BBO convergence properties.
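The limiting behavior described by the two theorems can be observed numerically by raising small stochastic matrices to a high power. The matrices below are hypothetical toy examples, not BBO transition matrices: `primitive` is positive, and `reducible` has the block structure with a primitive upper-left block and a zero upper-right block.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def matrix_power(P, k):
    """Compute P^k by repeated multiplication (fine for small toy matrices)."""
    n = len(P)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, P)
    return result


# Positive (hence primitive) stochastic matrix; limiting distribution (5/6, 1/6).
primitive = [[0.9, 0.1],
             [0.5, 0.5]]

# Block structure [[C, 0], [R, Q]] with C = primitive, R = [0.2, 0.3], Q = [0.5].
reducible = [[0.9, 0.1, 0.0],
             [0.5, 0.5, 0.0],
             [0.2, 0.3, 0.5]]
```

Every row of `matrix_power(primitive, 200)` is numerically close to (5/6, 1/6), with all entries nonzero. Every row of `matrix_power(reducible, 200)` is close to (5/6, 1/6, 0): the probability mass of the third state vanishes in the limit.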

##### 3.2. BBO Markov Theory

In previous work [11], the transition probability of BBO with migration and mutation was obtained. This provides us with the probability of transitioning in one generation from population vector $v = [v_1, \ldots, v_n]$, where $v_i$ is the number of $x_i$ candidate solutions in the population and $n$ is the size of the search space, to population vector $u = [u_1, \ldots, u_n]$. BBO can be described as a homogeneous finite Markov chain: the state of BBO is defined as the population vector, so the element $p_{ij}$ of the state transition matrix $P$ is obtained by computing $\Pr(u \mid v)$ for each possible $v$ and each possible $u$. Namely, $p_{ij}$ denotes the probability that the $i$th population vector transitions to the $j$th population vector, where $i, j \in [1, T]$. Note that the cardinality of the state space is $|S| = T$, where $T$ is the total number of possible populations; that is, $T$ is the number of possible $v$ vectors and the number of possible $u$ vectors. The number $T$ can be calculated in several different ways, as discussed in [11].
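One standard way to count the population vectors is as multisets of size $N$ drawn from $n$ search-space points, which gives a binomial coefficient. The sketch below uses that multiset count (the source defers the details to [11]) and also enumerates the vectors explicitly; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def num_population_vectors(n, N):
    """Number of multisets of size N over n search-space points."""
    return comb(n + N - 1, N)


def population_vectors(n, N):
    """Enumerate all count vectors (v_1, ..., v_n) with v_1 + ... + v_n = N."""
    vectors = []
    for combo in combinations_with_replacement(range(n), N):
        v = [0] * n
        for idx in combo:
            v[idx] += 1
        vectors.append(tuple(v))
    return vectors
```

For the two-bit example with $n = 4$ and $N = 3$ this gives 20 states, and for a three-bit problem with a population of four it gives 330, which shows how quickly the transition matrix grows.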

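Let the $N \times n$ matrix $M$ and the $n \times n$ matrix $U$ be intermediate transition matrices corresponding to only migration and only mutation, respectively, where $N$ is the population size and $n$ is the cardinality of the search space. Note that $M_{kj} = \Pr(y_{k,t+1} = x_j)$ and $U_{ij} = \Pr(x_j \to x_i)$. That is, $M_{kj}$ is the probability that the $k$th individual in the population transitions to the $j$th individual in the search space when only migration is considered, and $U_{ij}$ is the probability that the $j$th individual in the search space transitions to the $i$th individual in the search space when only mutation is considered. We can use [11] to obtain the transition probability of the $i$th population state vector to the $j$th population state vector as

$$\Pr(u \mid v) = \sum_{J \in Y} \prod_{k=1}^{N} \prod_{i=1}^{n} \left[ P^{(2)}_{ki}(v) \right]^{J_{ki}}, \tag{11}$$

where $Y$ is the set of $N \times n$ matrices $J$ with entries in $\{0, 1\}$ whose rows each sum to 1 and whose $i$th column sums to $u_i$, and $P^{(2)}_{ki}(v)$ is a single element of the product of $M$ and $U^T$. The matrix composed of the elements $P^{(2)}_{ki}(v)$ can be represented as $[P^{(2)}_{ki}(v)] = M U^T$, where $k \in [1, N]$ and $i \in [1, n]$. $P^{(2)}_{ki}(v)$ denotes the probability that the $k$th migration trial followed by mutation results in candidate solution $x_i$. Note that $\Pr(u \mid v)$ is a scalar, and transition matrix $P = (p_{ij})$ is a $T \times T$ matrix, each element of which can be obtained by (11).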

*Example 4. *Here we use a simple example based on [11] to clarify (11). Consider a simple BBO experiment in which a trial of migration and mutation can result in one of four possible outcomes , , , and with probabilities , , , and , respectively. Index refers to the migration trial number (i.e., the “For each ” loop in Algorithm 2). Assume that the total number of trials (i.e., the population size ) is equal to 2. Suppose that the probabilities are given as follows: Trial 1: , , , ; Trial 2: , , , .In this example, we calculate the probability that and occur after two migration trials. In order to calculate this probability, let denote a vector of random variables, where is the total number of times that occurs after two migration trials. Based on (11), we use , , , and to obtain
where
According to (13), belongs to if it satisfies the following conditions: (1) is a matrix; (2) each element of is either 0 or 1; (3) the elements in each row of add up to 1; and (4) the elements in the th column of add up to .

Note from [11] that the cardinality of is The number of matrices that satisfy these conditions is calculated as , and the matrices are found as follows: Substituting these matrices into (12) gives
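The calculation in this example can be mirrored by brute-force enumeration: each way of assigning one outcome to each trial corresponds to one 0/1 matrix whose rows sum to 1, and only the assignments with the requested outcome counts contribute. The following Python sketch makes that concrete. The per-trial probabilities in `trial_probs` are hypothetical illustration values, and `outcome_count_probability` is not a name from [11].

```python
from itertools import product


def outcome_count_probability(trial_probs, u):
    """Probability that outcome i occurs u[i] times over independent trials.

    trial_probs[k][i] is the probability that trial k produces outcome i.
    """
    n = len(u)
    total = 0.0
    # Each assignment picks one outcome per trial (one row-stochastic 0/1 matrix).
    for assignment in product(range(n), repeat=len(trial_probs)):
        counts = [assignment.count(i) for i in range(n)]
        if counts == list(u):
            p = 1.0
            for k, i in enumerate(assignment):
                p *= trial_probs[k][i]
            total += p
    return total


# Hypothetical per-trial outcome probabilities for two trials and four outcomes.
trial_probs = [[0.1, 0.3, 0.5, 0.1],
               [0.1, 0.1, 0.1, 0.7]]
```

With these values, the probability that the first and fourth outcomes each occur once is 0.1 × 0.7 + 0.1 × 0.1 = 0.08, where the two terms correspond to the two admissible assignment matrices. The enumeration costs $n^N$ evaluations, so it is only practical for very small problems.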

##### 3.3. BBO Transition Matrix Properties

Recall that the migration probability and mutation probability can be calculated by (3) and (8), respectively: $M_{kj} = \Pr(y_{k,t+1} = x_j) \ge 0$ and $U_{ij} = \Pr(x_j \to x_i) > 0$. Therefore, $M$ is a nonnegative stochastic matrix; although it is not a transition matrix since it is not square, each row sums to 1. We also see that $U$ is a positive left stochastic matrix; that is, each of its columns sums to 1. We now present two theorems that show that there is a nonzero probability of obtaining any individual in the search space from any individual in a BBO population after migration and mutation. This means that there is a nonzero probability of transitioning from any population vector to any other population vector in one generation, which means that the BBO transition matrix is primitive.

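Theorem 5. *If $M$ is a positive stochastic matrix and $U$ is a positive left stochastic matrix, then the product $M U^T$ is positive.*

*Proof.* If $M$ is positive and stochastic, then every entry of $M$ is positive; that is, $M_{kj} > 0$ for $k \in [1, N]$ and $j \in [1, n]$, and $\sum_{j=1}^{n} M_{kj} = 1$ for all $k$. Similarly, if $U$ is positive, then every entry of $U$ is positive; that is, $U_{ij} > 0$ for $i, j \in [1, n]$. Therefore, by matrix multiplication, $(M U^T)_{ki} = \sum_{j=1}^{n} M_{kj} U_{ij} > 0$ for $k \in [1, N]$ and $i \in [1, n]$. The same conclusion holds when $M$ is only nonnegative, because each row of a stochastic matrix contains at least one positive entry, and that entry multiplies a positive entry of $U$.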

Theorem 6. *The transition matrix of BBO with migration and mutation is primitive.*

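*Proof.* From (11) we know that if $P^{(2)}_{ki}(v) = (M U^T)_{ki} > 0$ for all $k \in [1, N]$ and $i \in [1, n]$, then $p_{ij} > 0$ for $i, j \in [1, T]$, where $p_{ij}$ is given in (11). So the transition matrix $P$ of BBO is positive. Therefore, $P$ is primitive since every positive transition matrix is primitive.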

Corollary 7. *There exists a unique limiting distribution for the states of the BBO Markov chain. Also, the probability that the Markov chain is in the $i$th state at any time is nonzero for all $i \in [1, T]$.*

*Proof.* Corollary 7 is an immediate consequence of Theorems 2 and 6.

##### 3.4. Convergence Properties of BBO

Before we obtain the convergence properties of BBO, some precise definitions of the term* convergence* are required [15]. Assume that the search space of a global optimization problem is with cardinality . Further assume that the BBO algorithm with population size consists of both migration and mutation, as shown in Algorithm 2.

*Definition 8. *Let be the population at generation , where is the population size, denotes an individual representing a candidate solution in the search space , and may contain duplicate elements; denotes a fitness function assigning real values to individuals; is a subset in the search space, each member of which has the globally maximum fitness; and the best individuals in the population at generation are , where for all and for all .

We use the notation to denote an arbitrary element of (i.e., one of the best individuals in the population at generation ). Because of migration and mutation, and its fitness will change randomly over time. As , the convergence, or lack of convergence, of to the subset indicates whether or not the BBO algorithm is globally convergent. That is, BBO is said to converge if Note that is not necessarily unique. However, Definition 8 states that the BBO algorithm is globally convergent if and only if (17) holds for every . Clearly, the evolution of is a homogeneous finite Markov chain, which we call an -chain.

Now we sort all the states of in order of descending fitness; that is, , and . We define as the set of indices of ; that is, . Further, we define as the elements of such that ; that is, for all . This leads to the following definition.

*Definition 9. *Let be the transition matrix of an -chain, where for is the probability that transitions to . The BBO algorithm converges to a global optimum if and only if transitions from any state to as with probability one, that is, if
As noted earlier, there may be more than one -chain since more than one element of the search space may have a globally maximum fitness. Definition 9 states that the BBO algorithm converges to a global optimum if and only if (18) holds for every -chain. Also note that depends on the other individuals in the population at generation . Definition 9 states that the BBO algorithm converges to a global optimum if and only if (18) holds for every possible transition matrix for every -chain.

Theorem 10. *If the transition matrix of an -chain is a positive stochastic matrix, then BBO with migration and mutation does not converge to any of the global optima.*

*Proof. *Since every positive matrix is also a primitive one, it follows by Theorem 2 that the limiting distribution of is unique with all nonzero entries. Therefore, for any ,
where we use the notation to denote all elements of that do not belong to . We see that (18) is not satisfied, which completes the proof.

Theorem 11. *If the transition matrix of an -chain is a stochastic matrix with the structure , where is a positive stochastic matrix of order , and , , then the BBO algorithm converges to one or more of the global optima.*

*Proof. *From Theorem 3, we see that, for all , ,
where , for , and . It follows directly that, for any ,
We see that (18) is satisfied which completes the proof.

Theorems 10 and 11 can be applied directly to determine the global convergence of BBO if the structure of the transition matrix of the Markov chain can be determined, as we will show in the remainder of this section. In particular, we will formalize the observation that the transition matrix of BBO without elitism satisfies the conditions of Theorem 10 (as stated in Theorem 6). We will further show that the transition matrix of the -chain of BBO with elitism satisfies the conditions of Theorem 11.

*Elitism*. We now discuss a modified BBO which uses *elitism*, an idea which is also implemented in many EAs. There are many ways to implement elitism, but here we define elitism as the preservation of the best individual at each generation in a separate partition of the population space. This enlarges the population size by one individual; the elite individual increases the population size from $N$ to $N + 1$. However, note that the population size is still constant (i.e., equal to $N + 1$) from one generation to the next. The elite individual does not take part in recombination or mutation but is maintained separately from the other $N$ members of the population. At each generation, if an individual in the $N$-member main population is better than the elite individual, then the elite individual is replaced with a copy of the better individual.
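The elitism rule described above can be sketched as a thin wrapper around one generation of the main population. In this Python sketch, `evolve` is a hypothetical callable standing in for migration followed by mutation, `fitness` is any function to be maximized, and the name `elite_generation` is illustrative.

```python
import copy


def elite_generation(population, elite, fitness, evolve):
    """One generation of elite BBO for a maximization problem.

    The elite individual is kept outside the main population and is only
    replaced by a strictly better member of the new main population.
    """
    new_population = evolve(population)        # migration + mutation (assumed)
    best = max(new_population, key=fitness)
    if fitness(best) > fitness(elite):
        elite = copy.copy(best)                # keep a copy of the better individual
    return new_population, elite
```

Because the elite individual is never passed to `evolve`, its fitness is nondecreasing over generations, which is the property used in the convergence argument.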

Relative to a standard $N$-member BBO population, elite BBO increases the number of possible population distributions by a factor of $n$, which is the search space size. That is, each possible population distribution of the $N$-member main population could also include one of $n$ elite individuals. The number of possible population distributions increases by a factor of $n$, from $T$ (see Section 3.1) to $nT$. We order these new states so that each group of $T$ states has the same elite individual. Also, the elite individual in the $i$th group of states is the $i$th best individual in the search space, for $i = 1, \ldots, n$.

The elitist-preserving process can be represented by an upgrade transition matrix , which contains the probabilities that each population distribution of the ()-member population transitions to some other population distribution after the elitist-preserving step. That is, the element in the th row and th column of , denoted as , is the probability that the th population distribution transitions to the th population distribution after the step in which the elite individual is replaced with the best individual from the -member main population. The upgrade matrix is similar to the one in [29]; it does not include the effects of migration or mutation but only includes the elitism-preserving step. The upgrade matrix only includes the probability of changing the elite individual; it does not include the probability of changing the -member main population, since it does not include migration or mutation. If there are no individuals in the -member main population that are better than the elite individual, then the elite individual does not change. The structure of the upgrade matrix can be written as where each matrix is , where is the number of population distributions in an EA with a population size of and search space cardinality of . is the identity matrix since the first population distributions have the global optimum as their elite individual, and the elite individual can never be improved from the global optimum. Matrices with are diagonal matrices composed of all zeros and ones. Since the population distributions are ordered by grouping common elite individuals and since elite individuals in the population distribution ordering are in order of decreasing fitness, the super block diagonals in are zero matrices as shown in (22); that is, there is zero probability that the th population distribution transitions to the th population distribution if . So the Markov chain of elite BBO can be described as where is the transition matrix described in Section 3.1.

*Example 12. *To explain the update matrix described in (22), a simple example is presented. Suppose there exists a search space consisting of individuals which are where the fitness of is the lowest and the fitness of is the highest. Suppose the main population size is , so the elitist population size is . So there are nine possible populations before the elitist-preserving step, and they are = , , , , , , , , . Note that the first element in each population is the elite individual and the last element ( in this example) is the main population. Also note that the populations are ordered in such a way that the first three have the most fit individual as their elite individual, the next three have the second most fit individual as their elite individual, and the last three have the least fit individual as their elite individual. The update matrix is a matrix.

The population transitions to the population with probability 1; that is, = 1. Population cannot transition to any other population ; that is, (1,) = 0 for . Similarly, population transitions to with probability 1 since the elite is better than the main-member population ; therefore, , and for . Continuing with this reasoning, we obtain the matrix as follows: where each matrix is and the blank elements are 0.
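The structure of the update matrix in this example can be reproduced with a few lines of Python. The sketch assumes a one-member main population, individuals identified by fitness rank (0 is the best), and states ordered by elite rank first and main-population member second. The ordering within each group and the name `upgrade_matrix` are assumptions made for illustration.

```python
def upgrade_matrix(n):
    """Upgrade matrix for n ranked individuals and a one-member main population.

    State (e, m) has index e * n + m, where e is the rank of the elite
    individual and m is the rank of the main-population member (0 = best).
    """
    size = n * n
    matrix = [[0] * size for _ in range(size)]
    for e in range(n):
        for m in range(n):
            # The elite is replaced only if the main member is strictly better.
            new_e = min(e, m)
            matrix[e * n + m][new_e * n + m] = 1
    return matrix
```

For `n = 3` this yields a 9 × 9 matrix of zeros and ones in which every row sums to 1, the upper-left 3 × 3 block is the identity, and all blocks above the block diagonal are zero, since the elite individual can never get worse.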

Now we consider the convergence of the -chain, which is the sequence of elite individuals in the elite BBO algorithm. If the elite individual is equal to the global optimum, we call this an absorbing state of the -chain. Recall that the elite individual in elite BBO can only be replaced by one with better fitness. Therefore, the -chain of elite BBO contains three classes of states: (1) at least one absorbing state, (2) nonabsorbing states which transition to absorbing states in one step, and (3) nonabsorbing states which transition to nonabsorbing states in one step. So the transition matrix of the -chain, which we introduced in (18)–(21), can be written as where is a unit matrix corresponding to optimal individuals ( is the number of optima), is a matrix of order corresponding to nonabsorbing states that transition to absorbing states ( is the cardinality of the state space , so is the number of nonabsorbing states), and is a matrix of order corresponding to nonabsorbing states that transition to nonabsorbing states. The matrix of (25) has the same structure as the matrix in Theorem 11. It follows from Theorem 11 that the -chain of elite BBO is globally convergent.

These results are similar to the canonical GA [29], which is proven to never converge to the global optimum, but elitist variants of which are proven to converge to the global optimum. We sum up these results in the following corollary.

Corollary 13. *BBO with migration and mutation does not converge to any of the global optima, but elite BBO, which preserves the best individual at each generation, converges to the global optimum.*

*Proof.* This is an immediate consequence of Theorems 6 and 10 (the nonconvergence of BBO without elitism), Theorem 11 (the convergence of BBO with elitism), and the discussion above.

##### 3.5. Convergence Rate

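The previous subsection analyzed the convergence properties of elite BBO, and this subsection discusses its convergence rate. The transition matrix of elite BBO after $t$ steps can be found from (25) as follows:

$$P^t = \begin{bmatrix} I & 0 \\ N_t R & Q^t \end{bmatrix}, \tag{26}$$

where $N_t = I + Q + \cdots + Q^{t-1}$. If $\|Q\| < 1$, the limiting distribution of the Markov chain of BBO can be found from $P^\infty$, which can be written as

$$P^\infty = \lim_{t \to \infty} P^t = \begin{bmatrix} I & 0 \\ (I - Q)^{-1} R & 0 \end{bmatrix}. \tag{27}$$

Modified BBO with elitism has been proven to converge to a global optimum in the previous subsection, and there exists a limiting distribution $\pi^* = \pi(0) P^\infty$, where $\pi^* = [\pi^*_1, \ldots, \pi^*_m, 0, \ldots, 0]$ (recall that $m$ is the number of global optima). The convergence rate estimate of elite BBO can be obtained as follows.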

Theorem 14. *If $\|Q\| = \delta < 1$, the convergence rate of elite BBO satisfies $\|\pi(t) - \pi^*\| \le O(\delta^t)$.*

*Proof. *Consider
Note that elite BBO is guaranteed to converge to a global optimum regardless of the initial state. In addition, note that we can improve the convergence rate bound by decreasing the parameter $\delta$. That is, reducing the number of nonabsorbing states which transition to other nonabsorbing states can accelerate the convergence of elite BBO. In spite of differences between GAs and BBO [12], we see from Theorem 14 that the convergence rate of BBO with elitism is very similar to that of GAs [22, Theorem 5].

#### 4. Simulation Results

Theorem 14 gives the upper bound of the convergence rate estimate of elite BBO. In this section, we use simulation experiments to confirm this theorem. Note that in (28) the parameter is a norm: . Here we define as the infinity norm ; that is, , where is the element in the th row and the th column of matrix . Now note that the transition matrix in (25) can be obtained from (11) and (23) using elementary matrix transformations. We can thus use Theorem 11 to check for BBO convergence, and we can use Theorem 14 to estimate the convergence rate of BBO. That is, we define as the error between a BBO population distribution and a distribution that includes at least one optimal solution. We then define the convergence criteria as an arbitrarily small error (e.g., ). We can then estimate the time to convergence from (28) as follows: Test functions are limited to three-bit problems with a search space cardinality of eight and a population size of four. The three fitness functions that we examine are where is a unimodal one-max problem, is a multimodal problem, and is a deceptive problem. Note that all three problems are to be maximized. Fitness values are listed in binary order, so the first element of each fitness function corresponds to the bit string 000, the second element corresponds to the bit string 001, and so on. For the BBO parameters, we use a maximum immigration rate and maximum emigration rate of 1, and we use linear migration curves as described in (2). We test elite BBO with three different mutation rates which are applied to each bit in each individual at each generation: 0.1, 0.01, and 0.001. Note that we do not test with a zero-mutation rate because the theory in this paper requires that the mutation rate be positive (see Theorem 5). Convergence is not guaranteed unless the mutation rate is positive.
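The convergence-time estimate described above can be sketched as follows. The sketch assumes that the bound has the geometric form $\delta^t \le \varepsilon$ with a unit constant, where $\delta$ is the infinity norm of the nonabsorbing block of the transition matrix, so the estimated number of generations is the smallest integer $t$ with $t \ge \ln \varepsilon / \ln \delta$. The function names and the toy matrix in the usage note are hypothetical.

```python
import math


def infinity_norm(Q):
    """Maximum absolute row sum of a matrix given as a list of rows."""
    return max(sum(abs(x) for x in row) for row in Q)


def generations_to_tolerance(Q, eps):
    """Smallest t with delta**t <= eps, where delta is the infinity norm of Q.

    Assumes a geometric bound with unit constant; requires 0 < delta < 1.
    """
    delta = infinity_norm(Q)
    if not 0.0 < delta < 1.0:
        raise ValueError("the bound requires 0 < ||Q|| < 1")
    return math.ceil(math.log(eps) / math.log(delta))
```

For example, a nonabsorbing block such as `[[0.2, 0.3], [0.1, 0.1]]` has infinity norm 0.5, and `generations_to_tolerance` returns 20 for a tolerance of `1e-6`. A smaller norm gives a smaller estimate, in line with the remark that decreasing the norm of the nonabsorbing block accelerates convergence.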

Numerical calculations show that the transition matrices for these three problems satisfy the convergence conditions of Theorem 11, which indicates that the BBO algorithm converges to one or more of the global optima. As a heuristic test of Theorem 14, we use simulations to record the generation number at which a population in which all individuals are optimal is first obtained. Tables 1, 2, and 3 compare the theoretical convergence time, the corresponding parameter in (28), and the generation number of first finding an all-optimal population, averaged over 25 independent simulations.
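The simulation procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the fitness function is a one-max variant (number of ones plus one, so that every fitness is positive), the linear migration curves are taken as emigration rate = fitness / maximum fitness and immigration rate = 1 minus emigration rate, and elitism replaces the worst new individual with the previous best. The paper's exact fitness values and its equation (2) are not reproduced here, and all function names are hypothetical.

```python
import random

N_BITS = 3      # three-bit problems, search space cardinality 8
POP_SIZE = 4    # population size used in the experiments


def fitness(individual):
    # ASSUMED one-max variant: number of ones plus 1 (always positive).
    return sum(individual) + 1


def first_all_optimal_generation(mutation_rate, rng, max_generations=10_000):
    """Generation at which every individual is optimal for the first time.

    Returns max_generations if that never happens within the cap.
    """
    best_possible = N_BITS + 1
    population = [[rng.randint(0, 1) for _ in range(N_BITS)]
                  for _ in range(POP_SIZE)]
    for generation in range(max_generations):
        fits = [fitness(ind) for ind in population]
        if all(f == best_possible for f in fits):
            return generation
        elite = list(max(population, key=fitness))
        # ASSUMED linear migration curves with maximum rates of 1.
        emigration = [f / best_possible for f in fits]
        immigration = [1.0 - mu for mu in emigration]
        new_population = []
        for i in range(POP_SIZE):
            child = list(population[i])
            for s in range(N_BITS):
                # Migration: copy bit s from a donor chosen by emigration rate.
                if rng.random() < immigration[i]:
                    donor = rng.choices(population, weights=emigration, k=1)[0]
                    child[s] = donor[s]
                # Mutation: flip each bit independently.
                if rng.random() < mutation_rate:
                    child[s] ^= 1
            new_population.append(child)
        # Elitism: the previous best replaces the worst new individual.
        worst = min(range(POP_SIZE), key=lambda k: fitness(new_population[k]))
        if fitness(new_population[worst]) < fitness(elite):
            new_population[worst] = elite
        population = new_population
    return max_generations


def average_over_runs(mutation_rate, runs=25, seed=0):
    """Mean first all-optimal generation over independent runs."""
    rng = random.Random(seed)
    total = sum(first_all_optimal_generation(mutation_rate, rng)
                for _ in range(runs))
    return total / runs


for rate in (0.1, 0.01, 0.001):
    print(rate, average_over_runs(rate))
```

Because the fitness values and migration curves are assumptions, the printed averages are not expected to match the entries of Tables 1 to 3; the sketch only shows the shape of the experiment.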

Tables 1–3 show the time to convergence and the time to finding an optimum for both BBO and GA. The tables confirm the statement following Theorem 14 that the convergence behavior of BBO is similar to that of GA. The tables show that GA converges slightly faster than BBO for high mutation rates, but BBO converges slightly faster for low mutation rates. This latter behavior is more important in practice because low mutation rates provide faster convergence.

Several things are notable about the results in Tables 1–3. First, the mutation rate affects the convergence rate of BBO. For all test problems, the convergence rate improves when the mutation rate decreases, so we can accelerate the convergence of BBO by decreasing the mutation rate. This may provide practical guidance for BBO tuning for real-world problems. Second, by analyzing the relationship between the parameter in (28) and the convergence time in Tables 1–3, we see that the convergence time is exponentially related to this parameter, as predicted by Theorem 14. Third, the theoretical results and simulation results match well for most of the test problems, which confirms the convergence rate estimate provided by Theorem 14.

The three-bit problems analyzed above are small, and the results do not tell us how convergence rates scale with the problem dimension. Also, the transition matrix grows faster than exponentially with the problem dimension and the population size [11], so realistically sized problems cannot be directly analyzed with the methods in this paper. However, our methods could be used to study the effect of BBO tuning parameters on small problems, which could provide guidance for larger, real-world problems. Also, similar population distributions could be grouped into the same Markov model state to reduce the transition matrix dimension of large problems to a manageable size [16], which could make the methods in this paper practical for realistically sized problems.
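To give a feel for this growth, the short Python sketch below counts Markov states under the assumption that each state is an unordered population, that is, a multiset of N individuals drawn from the n = 2^q points of a q-bit search space. The number of such multisets is the binomial coefficient C(n + N - 1, N). Whether this matches the exact state definition of the model in [11] is an assumption of the sketch, and the function name is hypothetical.

```python
import math


def count_population_states(n_bits, pop_size):
    """Number of unordered populations (multisets) of pop_size individuals
    drawn from the 2**n_bits points of a binary search space.

    ASSUMPTION: one Markov state per multiset of individuals.
    """
    search_space = 2 ** n_bits
    return math.comb(search_space + pop_size - 1, pop_size)


# Three-bit problems with a population of four, as in the experiments above.
states = count_population_states(3, 4)
print(states, states ** 2)   # number of states and transition-matrix entries

# A few larger settings, to show how quickly the count grows.
for n_bits, pop_size in [(4, 4), (6, 8), (10, 10)]:
    print(n_bits, pop_size, count_population_states(n_bits, pop_size))
```

For the three-bit, four-individual setting this gives 330 states, so the transition matrix already has 108,900 entries.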

#### 5. Conclusion

In this paper we modeled BBO as a homogeneous finite Markov chain to study convergence, and we obtained new theoretical results for BBO. The analysis revealed that BBO with only migration and mutation does not converge to the global optimum. However, an elite version of BBO that maintains the best solution in the population from one generation to the next converges to a population subset containing at least one globally optimal solution. In other words, BBO with elitism will converge to the global optimum in any binary optimization problem.

In addition, an upper bound for the BBO convergence rate was obtained in Theorem 14. We used simulations to confirm this theorem for a unimodal one-max problem, a multimodal problem, and a deceptive problem. The results in this paper are similar to those of the canonical GA [29], and so our results are not surprising, but this paper represents the first time that such results have been formalized for BBO.

The results in this paper are limited to binary problems but can easily be extended to discrete problems with any finite alphabet. This is due to the simple fact that any discrete problem can be reduced to a binary problem.
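One simple way to realize such a reduction is a fixed-width binary encoding of each symbol, sketched below in Python. This is only an illustration under assumptions: symbols are taken to be the integers 0 to k - 1, the function names are hypothetical, and bit patterns that do not correspond to any symbol (when k is not a power of two) would need separate handling that the sketch does not address.

```python
def bits_per_symbol(alphabet_size):
    """Number of bits needed to represent symbols 0 .. alphabet_size - 1."""
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be at least 1")
    return max(1, (alphabet_size - 1).bit_length())


def encode(symbols, alphabet_size):
    """Map a sequence of symbols to a flat list of bits (most significant first)."""
    width = bits_per_symbol(alphabet_size)
    bits = []
    for symbol in symbols:
        if not 0 <= symbol < alphabet_size:
            raise ValueError("symbol outside the alphabet")
        bits.extend((symbol >> shift) & 1 for shift in range(width - 1, -1, -1))
    return bits


def decode(bits, alphabet_size):
    """Inverse of encode: rebuild the symbols from fixed-width bit groups."""
    width = bits_per_symbol(alphabet_size)
    symbols = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        symbols.append(value)
    return symbols


# A candidate solution over a five-symbol alphabet becomes a nine-bit string.
candidate = [0, 2, 4]
encoded = encode(candidate, alphabet_size=5)
print(encoded)
print(decode(encoded, alphabet_size=5))
```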

For future work there are several important directions. First, it is of interest to study how to improve BBO convergence time and robustness based on these results. The second important direction is to study the asymptotic convergence of variations of BBO, including partial emigration-based BBO, total immigration-based BBO, and total emigration-based BBO [12]. The theorems in this paper provide the foundation to study these variations, so we do not need additional theoretical tools to analyze their convergence. The third important direction is to develop hybrid BBO algorithms, which combine BBO with other EAs, and to study their convergence behavior using the theory presented here. Finally, it would be of interest to extend these results to continuous optimization problems, which are the types of problems to which real-world implementations of BBO are typically applied.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This material was supported in part by the National Science Foundation under Grant no. 0826124, the National Natural Science Foundation of China under Grant nos. 61305078 and 61179041, and the Shaoxing City Public Technology Applied Research Project under Grant no. 2013B70004. The authors are grateful to the reviewers for suggesting improvements to the original version of this paper.

#### References

1. D. Simon, “Biogeography-based optimization,” *IEEE Transactions on Evolutionary Computation*, vol. 12, no. 6, pp. 702–713, 2008.
2. D. Du, D. Simon, and M. Ergezer, “Biogeography-based optimization combined with evolutionary strategy and immigration refusal,” in *Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09)*, pp. 1023–1002, San Antonio, Tex, USA, October 2009.
3. M. Ergezer, D. Simon, and D. Du, “Oppositional biogeography-based optimization,” in *Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09)*, pp. 1035–1040, San Antonio, Tex, USA, October 2009.
4. M. Lundy and A. Mees, “Convergence of an annealing algorithm,” *Mathematical Programming*, vol. 34, no. 1, pp. 111–124, 1986.
5. H. Ma and D. Simon, “Blended biogeography-based optimization for constrained optimization,” *Engineering Applications of Artificial Intelligence*, vol. 24, no. 3, pp. 517–525, 2011.
6. A. Bhattacharya and P. K. Chattopadhyay, “Biogeography-based optimization for different economic load dispatch problems,” *IEEE Transactions on Power Systems*, vol. 25, no. 2, pp. 1064–1077, 2010.
7. P. Lozovyy, G. Thomas, and D. Simon, “Biogeography-based optimization for robot controller tuning,” in *Computational Modeling and Simulation of Intellect: Current State and Future Perspectives*, B. Igelnik, Ed., pp. 162–181, IGI Global, Hershey, Pa, USA, 2011.
8. V. Panchal, P. Singh, N. Kaur, and H. Kundra, “Biogeography based satellite image classification,” *International Journal of Computer Science and Information Security*, vol. 6, no. 2, pp. 269–274, 2009.
9. R. Rarick, D. Simon, F. E. Villaseca, and B. Vyakaranam, “Biogeography-based optimization and the solution of the power flow problem,” in *Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09)*, pp. 1029–1034, San Antonio, Tex, USA, October 2009.
10. D. Simon, M. Ergezer, and D. Du, “Population distributions in biogeography-based optimization algorithms with elitism,” in *Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '09)*, pp. 1017–1022, San Antonio, Tex, USA, October 2009.
11. D. Simon, M. Ergezer, D. Du, and R. Rarick, “Markov models for biogeography-based optimization,” *IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics*, vol. 41, no. 1, pp. 299–306, 2011.
12. D. Simon, R. Rarick, M. Ergezer, and D. Du, “Analytical and numerical comparisons of biogeography-based optimization and genetic algorithms,” *Information Sciences*, vol. 181, no. 7, pp. 1224–1248, 2011.
13. A. E. Eiben, E. H. L. Aarts, and K. M. Van Hee, “Global convergence of genetic algorithms: a Markov chain analysis,” in *Parallel Problem Solving from Nature (Dortmund, 1990)*, vol. 496 of *Lecture Notes in Computer Science*, pp. 4–12, Springer, New York, NY, USA, 1991.
14. C. Grinstead and J. Snell, *Introduction to Probability*, American Mathematical Society, Providence, RI, USA, 1997.
15. G. Guo and S. Yu, “The unified method analyzing convergence of genetic algorithms,” *Control Theory & Applications*, vol. 18, no. 3, pp. 443–446, 2001.
16. C. Reeves and J. Rowe, *Genetic Algorithms: Principles and Perspectives*, vol. 20, Kluwer Academic Publishers, Norwell, Mass, USA, 2003.
17. L. Wang, *Intelligent Optimization Algorithms with Applications*, Springer, New York, NY, USA, 2001.
18. A. Auger and B. Doerr, *Theory of Randomized Search Heuristics*, World Scientific, Singapore, 2011.
19. T. Jansen, *Analyzing Evolutionary Algorithms*, Springer, New York, NY, USA, 2013.
20. F. Neumann and C. Witt, *Bioinspired Computation in Combinatorial Optimization*, Springer, New York, NY, USA, 2010.
21. T. Davis and J. Principe, “A Markov chain framework for the simple genetic algorithm,” *Evolutionary Computation*, vol. 1, no. 3, pp. 269–288, 1993.
22. J. He and L. Kang, “On the convergence rates of genetic algorithms,” *Theoretical Computer Science*, vol. 229, no. 1-2, pp. 23–39, 1999.
23. J. Horn, “Finite Markov chain analysis of genetic algorithms with niching,” in *Proceedings of the International Conference on Genetic Algorithms*, pp. 1–13, San Francisco, Calif, USA, 1993.
24. A. Nix and M. Vose, “Modeling genetic algorithms with Markov chains,” *Annals of Mathematics and Artificial Intelligence*, vol. 5, no. 1, pp. 79–88, 1992.
25. J. Suzuki, “Markov chain analysis on simple genetic algorithms,” *IEEE Transactions on Systems, Man and Cybernetics*, vol. 25, no. 4, pp. 655–659, 1995.
26. J. Suzuki, “A further result on the Markov chain model of genetic algorithms and its application to a simulated annealing-like strategy,” *IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics*, vol. 28, no. 1, pp. 95–102, 1998.
27. X. Yao, Y. Liu, and G. Lin, “Evolutionary programming made faster,” *IEEE Transactions on Evolutionary Computation*, vol. 3, no. 2, pp. 82–102, 1999.
28. M. Iosifescu, *Finite Markov Processes and their Applications*, John Wiley & Sons, New York, NY, USA, 1980.
29. G. Rudolph, “Convergence analysis of canonical genetic algorithms,” *IEEE Transactions on Neural Networks*, vol. 5, no. 1, pp. 96–101, 1994.