Abstract

The Moran model is a discrete-time birth and death Markov chain describing the evolution of the number of type 1 alleles in a haploid population with two alleles whose total size is preserved during the course of evolution. Bias mechanisms such as mutations or selection can affect its neutral dynamics. For the ergodic Moran model with mutations, we get interested in the fixation probabilities of a mutant, the growth rate of fluctuations, the first hitting time of the equilibrium state starting from state , the first return time to the equilibrium state, and the first hitting time of starting from , together with the time needed for the walker to reach its invariant measure, again starting from . For the last point, an appeal to the notion of Siegmund duality is necessary, and a cutoff phenomenon will be made explicit. We are interested in these problems in the large population size limit . The Moran model with mutations includes the heat exchange models of Ehrenfest and Bernoulli-Laplace as particular cases; these were studied from the point of view of the controversy concerning irreversibility (-theorem) and the recurrence of states.

1. Introduction

In Section 2, we start with generalities on discrete-time birth and death Markov chains with finite state space, whose transition matrix is of Jacobi type: spectral representation, reversibility versus -theorem, large deviations of the empirical average, invariant measure, conditions of transience/recurrence, spectral positivity and stochastic monotonicity, distribution of hitting and first return times, and Greens function. The scale or harmonic function is shown to be of interest to determine the excursions heights would the chain be ergodic. After having fixed the background, we then proceed with the study of specific Moran chains presenting various bias mechanisms.

The Moran chain model is a particular instance of a discrete-time birth and death Markov chain, describing the evolution of the number of type alleles in an haploid population with two alleles whose total size is preserved during the course of evolution. Bias mechanisms such as mutations or selection can affect its neutral dynamics (there are alternative domains of science where this model makes sense, namely, economy and behavioral sciences. Here, the Markov state takes on the interpretation of the number of agents that belong to the first strategy (say right- or left-wing voters) at time . See [1, 2] and references therein for a recent study with this physical image in mind.).

For the ergodic Moran model with mutations [3], we get interested (in Sections 3 and 5) into the fixation probabilities of a mutant, the growth of fluctuations, the first hitting time of the equilibrium state starting from , the first return time to the equilibrium state, and the first hitting time of starting from , together with the time needed for the Moran walker to reach its invariant measure, again starting from . The detailed growth rates of these quantities are shown to be exactly determined in terms of the mutation probabilities.(i)For the first fixation probability point, a sharp threshold phenomenon is exhibited in the following sense: would the mutation probability favoring allele exceed the one favoring allele , the fixation probability of a type mutant tends to as grows large with a correction. On the contrary, this fixation probability tends to geometrically fast as grows large. In the critical case when the two mutation probabilities are egal, this fixation probability goes to from below.Under the additional presence of selection, the fixation condition can be achieved with probability even if the mutation probability favoring allele exceeds the one favoring allele . The condition is that the relative fitness of allele should be large enough with bound given in terms of a ratio of mutation probabilities, translating a mutation-selection balance.(ii) For the last time to stationarity point, an appeal to the notion of Siegmund duality (requiring stochastic monotonicity) is necessary, and a cut-off phenomenon will be made explicit in Section 4, following the general stream of ideas developed in [47]. The analysis is made simple because the spectral representation of the mutation Moran model is exactly known.We are interested into these problems in the large population size limit (i.e., in the thermodynamic limit). From our analysis, it follows that the mutation Moran walker started at reaches stationarity in about time, far below the time needed to first return to the origin (or excursion’s length) which is geometrically large with . This is because the mutational drift pushes the walker inside the bulk of the state space, making the reverse move to any of the two boundaries extremely costly.

2. Discrete-Time Birth and Death Processes

We start with generalities on discrete birth and death (BD) processes with finite state space, before particularizing the study to the Moran model under concern which is in this class.

2.1. Generalities on BD Processes with Finite State Space

Let be a tridiagonal (Jacobi) irreducible stochastic matrix with , , and , corresponding to the transition probability matrix of some ergodic discrete-time Markov chain on the state space . With a column vector of s, is stochastic if .

The left (row-) eigenvector associated to the eigenvalue satisfies , where   denotes transposition. Its components are given by and , , with chosen so as . It corresponds to the invariant probability measure of the chain.

The matrix is diagonally equivalent to the symmetric matrix so with real eigenvalues. Indeed, with , we have . As a symmetric matrix, is diagonalizable by an orthogonal transformation, and so, is diagonalizable.

Such nearest-neighbors random walks models are well known to be reversible (detailed balance holds). Indeed, , where is the transition matrix of the time-reversed process, given by (with denoting the transpose of ). The transition matrix of reversible Markov chains has real eigenvalues.

Under our assumptions, irreducible chains governed by are positive recurrent, meaning that every state recurs with probability , the mean time between recurrence to state being finite and equal to .

2.2. The Karlin-McGregor (KMG) Spectral Theory

Consider the polynomials in the variable , determined by and the 3-term recurrences: therefore, with ,, . The polynomials satisfy , and are of degree in . They are often called the random walk polynomials. Let , be the zeroes of the polynomial of degree , namely, with . The set constitutes the spectrum of (see [8, page ]). The quantity is the spectral gap.

Let be the spectral probability measure on with respect to which are orthogonal, namely, where , are the potential coefficients.

These polynomials are important in view of the KMG spectral representation theorem [8]. Indeed, with , we have and therefore, . Selecting the row , multiplying by , integrating with respect to , and applying the orthogonality relations (5), we get

Necessarily, because by the ergodic theorem, for all, as . From (7), because , we have , and so the generating function of is the Stieltjes transform of the spectral measure, satisfying as a result of . On the other side, let be the probability generating function (pgf) of the first return time to zero probability. As it can easily be checked by renewal arguments (see [9, page  330]), , showing that

We have and .

More generally, with , let be the Green potential function of the chain. We let , the expected sojourn time at , starting from . Note that and .(i)With the first hitting time of starting from , by renewal arguments, the pgf of reads with . We have With , we have (ii)With the first return time to , we have with

The probability that the first return time to is finite is if and only if (a recurrence condition), and this happens if and only if (stochastic) is irreducible, in which case all states are in the same recurrence class. Else, if for some (a transience condition for ), this probability is smaller than , and one possible reason is that absorption occurs in at least one state ; all states then but the absorbing ones (typically here, either or , or both endpoints) are in the same transience class, and because the walker may or not return to before hitting one of the absorbing states. So here is no longer irreducible, and this is a particular reducibility case where transience pops in. In the latter transient case however, by removing the rows and columns associated to the absorbing states, we are left with an irreducible substochastic transition matrix of smaller dimension to which the Karlin-McGregor spectral representation applies as well (see [8]). The dominant eigenvalue of the restricted matrix is now strictly smaller than .

In view of and in case of recurrence,

Equations (6) and (7) suggest that, with the vectors and can be chosen to be the right (column) and left (row) eigenvectors of associated to the eigenvalue for which

In particular, and , as required. We have because , with the right normalization. Equation (19) yields another orthogonality relation which is dual to the one in (5), namely,

With , we now get the decomposition of into orthogonal projectors which is (7):

Note the biorthogonality relations: from which it follows that

There is a matrix version of all this. Calling (resp., ) the matrix whose column (row) is (resp., ), (18) reads and where   diag, . Since by (19),   diag, we have , which is (22). Since , . If we let be the matrix whose entry is , then and is the matrix formulation of (17). Let us finally check that and (i.e., (17)) are indeed admissible left and right eigen matrices. The relation is also which is which holds from (6). The relation is also or which is or again , because by the reversibility property, . If ( is involutive), then which means . If in addition , then : the duality relations (5) and (20) coincide, and this is an exceptional case of self duality.

The KMG spectral theory therefore gives a representation of the right and left eigenvectors of associated to its spectrum , in terms of the orthogonal polynomials evaluated at (the -matrix).

The interpretation of the right eigenvectors of is as follows. We have . Fixing the row , the left-hand-side of the last equality is . Therefore,

Suppose that we want to compute for some test function . Let then be the decomposition of in the basis . We have .

Thus, and

Whenever the spectral structure of the chain is known, the solution to the backward Kolmogorov equations are thus computable exactly. Let us now investigate some special classes of random walks (RWs).

2.3. Symmetric RW

Suppose that is even with . Assume that a random walk on is given so that is the transition matrix of some symmetric random walk reflected at the boundaries . The KMG theory is also relevant for such transition matrices . Let ,  be the zeroes of the polynomial of degree , namely, with . Then, is the probability measure on with respect to which the polynomial associated to is orthogonal. This measure is symmetric on . In particular, and , are eigenvalues of such s. When is odd, say with , the spectral measure is again symmetric, but no longer is an eigenvalue, and .

Example 1. The simplest such example is the simple gambler random walk for which , , () with a pure reflection at the endpoints (). The invariant measure is a truncated geometric distribution. It follows from [9, page ] that ,,,. The spectral gap tends to as if . The mass of the spectral measure can be seen to be equal to , , (with when ). It exhibits boundary effects causing deviation from the uniform measure on . The orthogonal polynomials involve two sine functions. A closed form expression of the generating functions of and follows.
Additional Comment. Let be the tridiagonal transition matrix associated to this RW . Let be the transition matrix associated to . Then, has the same structure as except that and are interchanged (a drift reversal property). is obtained from by applying a permutation reversing the order of the rows and then of the columns. So and share the same spectrum, and this is why is a symmetric function of and . However, the mass of the spectral measure of is different with: .

2.4. Special Cases: Spectral Positivity

Let be a subdiagonal matrix. It is the transpose of a superdiagonal matrix . Then, with diag

Thus, is the sum of a diagonal matrix and a symmetric positive definite matrix. From this, we conclude that if the holding probabilities , for all , then for all , and so and then are positive definite with real positive eigenvalues. The Jacobi matrix then has all its principal minors nonnegative and therefore is oscillatory (i.e., totally nonnegative and such that is totally positive) with all its minors nonnegative (see e.g., [10, page  99]). Oscillatory stochastic matrices have distinct positive eigenvalues, with: . As a sequence of numbers, the right row eigenvector associated to has exactly sign changes (see [10, page  101]). And so do the numbers: . The above condition is a simple sufficient condition allowing to exhibit examples of that are oscillatory but of course, it is not necessary; the full condition under our assumptions being for all or

Equivalently, if are the positive eigenvalues of , we need to check for all .

For a simple random walk whose -transition matrix is spectrally nonnegative (resp., spectrally positive), there exists a symmetric random walk on (resp., on ), reflected at the endpoints, started at an even integer (resp., odd integer ), such that , (resp., ). This follows simply from adapting [11, Theorem  2.1] to our finite-dimensional case. As noted just before, the spectral measure of the random walk is symmetric on , and by passing to , the spectrum is being folded: If (resp., ) is the symmetric spectral measure of with (resp., ), then is the spectral measure of . Let and be the up and down probabilities that in one step given is in state different from the endpoints, , then

This, together with and (resp., ), allows to determine recursively the transition matrix of from the one of . From these facts, we get the following proposition.

Proposition 2. If a BD chain is spectrally nonnegative, then it is stochastically monotone.

Proof. Under our hypothesis indeed, because this is also . This condition means that is stochastically monotone (see below the construction of the Siegmund dual of an RW where stochastic monotonicity is needed). See also [6, Lemma 2.4] for a different proof.

Remark 3. Suppose a BD chain with transition matrix is not spectrally nonnegative, so has strictly negative eigenvalues (). With , consider the lazy modified stochastic matrix

The eigenvalues of are , with the same left and right eigenvectors as the ones of . One can always choose small enough so that . For this , is now spectrally nonnegative, and so governs a stochastically monotone chain . The RW , with shifted spectrum, clearly has the same invariant measure as the one of .

For example, solving (34) with the boundary conditions and gives the forward and backward recurrences of the unknown odd and even probability terms in the bulk:

As a conclusion, it is easy to construct spectrally positive RWs by “squaring” two symmetric random walks, but given a nonsymmetric RW, it may be difficult to decide whether it is or not spectrally positive because one needs conversely to check whether all the above s are probabilities. Spectral positivity determines stochastic monotonicity.

2.5. Scale Function and Excursion Height of an RW

Assume the height for some excursion (sample paths of between two consecutive visits to state ). This event will be realized if and only if (i) downward paths started from hit state before hitting state and (ii) upward paths started at first reach (with probability ) and then, paths started at hit without returning to again in the intervening time. These two events are independent. Therefore, with , the first hitting time of starting from

Assume . Let denote the random walk stopped when it first hits . Its transition matrix is except for its first row which is , translating that state is now absorbing. Define the scale (or harmonic) function of this random walk as the function which makes a martingale. The function is important because, as is well-known, for all , with , the first hitting time of starting from

Using this remark, we get

Note that if . The event may also occur, with probability .

The interpretation the event takes on is twofold: either it is the maximal height of an excursion if the state remains reflecting, either, if state is itself assumed absorbing, constitutes the probability of a fixation event at , starting from state . We can thus compute the probability of a fixation before extinction at occurs. It remains to compute (defined up to an arbitrary multiplicative constant). We wish to have , leading to and where , . Imposing (fixing the arbitrary multiplicative constant), the searched “harmonic” function is where satisfies . Thus, and

Equations (39) and (41) characterize the law of the excursion height of the random walk . We note that increases with . From (38) and (41), we get an explicit expression of the probability of the event : where is the first hitting time of starting from , with . In particular, if with also absorbing, is the fixation probability starting from , whereas is the extinction probability.

3. A BD Example: The Moran Model

The example we have in mind is the -allele Moran model with bias mechanism . Let be continuous and .

The Moran model is characterized by the following transition probabilities, given the walker is in state :

For a Moran model is the amount of say allele or type individuals in a population of size , and the reproduction law is obtained while choosing two individuals at random among , one dying and the other one passing to the next generation while giving birth to an additional individual so as to keep constant the full population size over the generations. Given then , the transition () occurs with probability () translating the fact that the individual bound to die is chosen among type (type ) individuals with probability    , and the one which splits into two new born is chosen among the other type, with probability (resp., ), after deforming the neutral frequency by the bias mechanism . These bias take into account various evolutionary forces which are the causes of deviation to neutrality (see [1214] for a discussion on various models of utmost interest in population genetics, among which selection and mutations). The classical neutral Moran model is obtained when (no external deformation). Extended Moran models are studied in [15].

Assuming and , with , the corresponding Markov chain is ergodic with invariant distribution: where

We have the following obvious facts.

Proposition 4. Let be a Moran process with transition probabilities (44) for some bias . Then, is again a Moran process with transition probabilities of the type (44) but with the new bias , namely, , . The spectrum of the transition matrix of is the same as the one of . If the bias is such that , then the transition matrix of coincides with the one of .

Proof. We have , and is obtained from applying the permutation reversing the order of the rows and then of the columns.

3.1. Fixation Probability

Forcing the states to be absorbing, the scale function reads with being the fixation probability, whereas is the extinction probability, starting from (for the neutral model with , the scale function is , and the fixation probability is trivially ). The probability of a fixation before extinction occurs in an excursion reads

We have where is the fixation probability of a mutant-type allele , for example, the probability that, starting from , the RW hits state (fixation of the mutant-type allele ) before hitting state (extinction of allele ).

3.2. Examples of Bias Mechanisms

We now give three examples of bias that we will deal with in the sequel.

Moran Model with Mutations. A basic bias example is the mutation mechanism where are mutation probabilities in . We  let  ,   , , and  .

Schematically, type alleles are being produced whenever type alleles do not mutate to type (with probability ) or type alleles mutate to type (with probability )

The transition matrix of this model has therefore the particular BD structure with , . When , we introduce the auxiliary parameters with   and.

The drift of this RW is

The deterministic part of the dynamics of has a stable point at , cancelling the drift. When is nondecreasing, we have . We note that in the expression of , the roles of and are interchanged.   So, if and only if (symmetric mutations).

If , we recover if is even the heat-exchange Bernoulli-Laplace model [9] as a borderline example, but is strictly decreasing in this case. If , then and we obtain an aperiodic model amenable (through a suitable time substitution) to the Ehrenfest urn model provided is even.

For one-way mutations, or leading to the simpler forms and of, but with and , respectively, corresponding to (resp., ) being absorbing. We avoid this situation because it deserves another treatment due to transience.

Moran Model with Mutations and Selection. Another example could be which is the composition of the selection bias mechanism with the mutation mechanism (50). When considering the fixation probability for the Moran model with this particular mechanism, we obtain a closed-form formula for the fixation probability of a mutant which of course is an interesting issue in genetics.

Cases with Positive Eigenvalues. We may look for conditions on the mechanism leading to in which case the RW is spectrally positive. Assume is continuous and nondecreasing, with .

Then, as can easily be checked: for all if and only if with . Indeed, imposing for all leads to if and if . So, if is nondecreasing with , then .(i)Take, for example, the mutation mechanism now satisfying . The condition leads to (fair mutation). In this case, with . If , then and , , and .  For example, if , has three distinct real nonnegative eigenvalues ordered in decreasing size. We note however that it is not necessary that for to be spectrally positive; indeed, for the full Moran model with mutations , under the sole condition ( is nondecreasing), the eigenvalues can be seen to be nonnegative. Still, we were not able to find a sharp general condition on that would lead to a spectrally nonnegative Moran chain.(ii)For the mechanism of (55) combining mutations with selection, assuming to guarantee that is nondecreasing, the condition relates to give , where both . In this case, the model is spectrally positive.

3.3. Fixation Probabilities for the Moran Model with Mutations and/or Selection

We now study the particular Moran model with mutations and/or selection, starting with fixation probabilities.

Moran with Mutations (sharp threshold phenomenon). We need to evaluate for large and then will follow.(i) Suppose first that and let .

Proposition 5. The fixation probability satisfies the sharp threshold property: (i)if (), then ; (ii)if (), then ; (iii)if (), then .

Proof. With , we first have the identity for sums of inverse binomial coefficients (see e.g., [16, page  197])
There is therefore a closed-form expression of the fixation probability . Observing now
we get(i)  (): . So: ;(ii) (): . So: ;(iii) (): . So: .
If , because if : and is very close to . The fixation probability tends to from below.
If , + + , and is very close to if and very close to up to a factor if .
When , the probability, starting from to hit the state (fixation) starting from before returning to , tends to geometrically fast. On the contrary, would , this probability would converge to ; mutations producing type individuals are sufficiently large to enhance fixation of a mutant allele with probability . If , the fixation probability approaches .

(ii)Suppose now that . With , , Proceeding as before, we have showing that . The dominating term in is thus either the one corresponding to or the one corresponding to .

With , by Stirling formula, we get

Now, whenever , regardless of whether or ,

Indeed, let . The condition is also

If , then if and only if (or ) and then .

If , then if and only if (or ) and then .

We obtained the following proposition.

Proposition 6. When , the fixation probability also satisfies the following sharp threshold property.(i)If , the fixation probability tends to 0 geometrically fast because , leading to . The fixation probability of a mutant tends to 0, algebraically fast at rate given by (63). (ii)If , regardless of whether or , . Thus, leading to from below.(iii)If (balanced mutations), , and therefore, : the fixation probability tends to from below.

(i) For the Moran model with selection mechanism as in (56), gene (resp., ) has fitness (resp., ) so that, with , the mutant relative fitness showing that:. Whenever (), the mutant gene is selectively advantageous (deleterious) compared to . When (), tends to , whereas when (), this probability tends to exponentially fast like . When (neutral case), tends to much slower.

In this pure selection case, these results are well known (see, e.g., (3.66) page 109 of [14] and also [1] in [17]).

(ii) For the Moran model with bias mechanism (55) involving both mutation and selection, where, with ,, , .

From the preceding analysis, we conclude that with

We thus obtained the following proposition.

Proposition 7. For the Moran model with bias mechanism (55):(i)if , the fixation probability tends to   geometrically fast because ,   leading to .   The fixation probability of a mutant tends to , algebraically fast at rate   now given by (67); (ii)if  ,    leading to   from below; (iii)in the critical regime when    and then  , the fixation probability tends again to  .

Proof. Regardless of whether or , the extinction condition is achieved whenever
By the above arguments, this leads to the following mutation-selection balance condition: which is also
The selective advantage of allele (its relative fitness) should not be too large, and the upper bound is given by the ratio of the mutation probabilities . Note that (resp., ) (resp., ). In the presence of selection, the extinction condition can be achieved even if , provided that is small enough. When , the extinction of the mutant allele occurs if it is selectively deleterious: .

To the best of the author’s knowledge, the results displayed in Propositions 57 seem to be new.

3.4. Spectral Representation and Invariant Measure of the Moran Model with Mutations

Except for some exceptional special cases, the spectral measure associated to the Moran model with general bias mechanism as in (44) is not known. One of the notable exceptions is the Moran model with positive mutation probabilities ; see [1820]. The transition matrix of this model has therefore the BD structure (52).

For the Moran model with mutations, the orthogonal polynomials coincide with the dual Hahn polynomials , where, with , are the classical Hahn or Hahn-Eberlein polynomials. We note that (Hahn case) with () and (), whereas (Hahn-Eberlein case) with () and ().

The case () corresponds to a weak (strong) mutation regime.

As a truncated hypergeometric series, the polynomials are of degree in . In this case, thanks to the -term recurrence satisfied by the , the eigenvalues of the transition matrix (52) are depending only on the total mutation pressure . In particular, is the spectral gap, and is the smallest eigenvalue (with if ).(i)In the weak mutation regime , the RW is spectrally positive with (ii)In the strong mutation regime , the RW is not spectrally positive with

When gets large, there is a fixed (independent of ) number of negative eigenvalues. Only when do we have a number of negative eigenvalues of order .

Invariant Measure. Furthermore, with the rising factorial of , the invariant probability measure is the generalized bivariate hypergeometric distribution: which may also be recast as

with the convention , , the falling factorials of with .

The mean and variance of are

Upon scaling, the law of is very much peaked around its mean , with density where and, by Stirling formula,

We have at , and .

A saddle point estimate gives

(i) When both (), the distribution (74) is the negative hypergeometric distribution which can be obtained as a mixture of the binomial distribution with parameter , namely, with (by Stirling formula) displaying exponential decay . Note that, with it holds that

Note that both and because, with , and .

Furthermore, the rate (63) appearing in the fixation probability is .

The case of symmetric mutations (with ) is obtained while setting .

(ii) When both (), the distribution in (75) constitutes the classical (generalized) hypergeometric distribution arising in sampling without replacement.

By Stirling formula, for all (exponential growth),

This shows that, with and ,

Note that again with and because, with , and .

(iii) When both (), we get the standard hypergeometric distribution , with .

The case of symmetric mutations (with ) is obtained while setting .

(iv) When (Bernoulli-Laplace), the transition probabilities read   . Here, (the standard hypergeometric distribution),   and .

The expected return time to is , which is huge when is large, whereas the expected return time to is of order , much smaller.

(v) On the critical line , is constant and so the transition probabilities become affine functions of the state: and. Here, , are binomial bindistributed and self-dual, and , independent of (this RW is spectrally nonnegative). The Hahn polynomials boil down to , , a class of Krawtchouk polynomials. When (the lazy Ehrenfest urn), the holding probabilities are , and both and are symmetric bin-distributed.

Spectral Measure When   . The spectral probability measure is and the orthogonality relation reads

Notice that, in our notations: . In particular, where

Evolution of the Conditional Mean and Variance of given .    and being right eigenvectors of , leading, by virtue of (25), to the following peoposition.

Proposition 8. For the Moran model with mutations

This shows how the mean and variance of vary, during the course of evolution, as a function of and the initial condition before reaching the mean and variance of the limit law. Concerning the mean value and depending on or , approaches monotonically the limiting value   from above or from below. Note however that for all initial conditions ,

Let us now consider the variance. Noting that , rearranging the terms, with if ( and ) or ( and ) and . The corrective term to the limiting variance is a linear combination of three terms with geometric decay at different rates. For large , the discrepancy between and is indeed controlled by the slower term. In general, the variance does not increase monotonically towards its limiting value, starting from .

Proposition 9. If for instance starting from (which is ), depending on or not, approaches its limiting value from below or from above, respectively. When , there is a time at which the fluctuation is maximum, and this maximum exceeds . The conclusions should be reversed would (or ).

It can also be shown [21, page  47] that, for , meaning a geometric asymptotic decay of the autocovariance also controlled by the second largest eigenvalue .

Evolution of Heterozygosity. Let denote the heterozygosity of the chain. This function is the largest when , that is, when both type population sizes and are the closest. The exact evolution of and can be derived in a similar way using (25). For the variance , , as a degree polynomial in , is needed. It can be shown that tends monotonically to , geometrically fast at rate , as . The variance tends to geometrically fast as , and it is maximal at some intermediate time .

Note that, with denoting the population gap, and this function, just like , is not always (i.e., independently of ) decreasing with .

Increase of the Conditional Entropy Given   (-Theorem). Let , . For any initial distribution , with the Shannon conditional entropy of the chain, we have [22]

The first inequality shows that is nondecreasing. It converges to its limiting maximal value, which is the Shannon entropy of the invariant measure. The second inequality shows that is concave: the speed of increase of entropy decreases as time passes by. In particular, letting , for all fixed starting point and

The increase of entropy is a manifestation of irreversibility apparently in conflict with the reversibility property of all ergodic BD chains and their recurrence of states.

Empirical Average and Large Deviation [23]. Given , let denote the empirical average of some bounded measurable output function of the (ergodic) chain. Examples of interesting s are in which case is directly the empirical average of the s, but also in which case is the fraction of time spent by the walker at state (its local time). With the row vector which is everywhere but at position where it is and a column vector of s, we have where, with a column weight-vector with entry , . By Perron-Frobenius theorem, we have where is the dominant principal eigenvalue of and a concave function of , with . Thus, where is the concave Legendre transform of the pressure function , giving the large deviation rate function of . Clearly, which is the value where is maximum () equal to .

4. A Detailed Study of the Siegmund Dual of BD Chains with an Application to the Mutation Moran Model

In this Section, we will illustrate the power of the duality/intertwining relationship by considering the simplest Siegmund dual of a birth and death chain, with the mutation Moran birth and death example in mind. We also address here the question of computing the strong stationary time distribution that helps quantifying the “distance” to equilibrium of the original positive recurrent BD process.

Definition 10 (see [24]). Two discrete-time Markov processes , with state spaces , possibly with substochastic transition kernels, are said to be dual with respect to some nonsingular duality kernel on the product space if

When state spaces are finite and identical, the duality kernel is a square-matrix, and the transition matrix of the dual process , say is obtained from the one of the direct process by where stands for the transpose of . Note that if is an dual to , then is an dual to . If , is an dual to , but also is an dual to .

The Siegmund Duality Kernel. (For nonneutral population genetics models, other duality kernels have recently been shown to be of interest, see [25, 26]). The Siegmund duality kernel is . If, for a given process , a process exists satisfying the above condition, is called the Siegmund dual of , see [27]. Clearly, in the BD case for , the condition for the Siegmund dual to exist is that should be stochastically monotone in that, for all and should be increasing with .

For positive recurrent birth and death processes, and for the Siegmund kernel, the transition matrix of the dual process reads where ,    (and ,   ,, ). It is again the one of a BD process (but not of a Moran BD process if is a Moran transition matrix). Indeed, in this case, is an upper-right triangular matrix with nonzero entries , whereas the nonnull entries of are the diagonal (with entries ) and the upper diagonal with entries . The structure of follows from this, and the duality relation: .

For this dual to exist, we need to ensure that for which is a necessary and sufficient condition to guarantee the stochastic monotonicity of . This condition also reads: . The drift of at is where is the drift of at . We have

We already know that if is a spectrally nonnegative BD matrix, the chain is stochastically monotone. Here is another sufficient condition relative to the specific ergodic Moran case.

Proposition 11. Consider the Moran model with bias satisfying , . If is nondecreasing, the condition is fulfilled (the Moran chain is stochastically monotone), and so the Siegmund dual exists.

Proof. We first need to guarantee that corresponding to the row of , but this is true because . Next, when , the inequality clearly holds because . Finally, for , because

For the mutation Moran model in the weak mutation regime , is nondecreasing, and therefore, this chain is stochastically monotone. Although the condition that is nondecreasing is a sufficient condition to guarantee that the chain is stochastically monotone, it is not necessary.

Proposition 12. Consider the Moran model with mutation bias in the strong mutation regime . Although is not nondecreasing, the condition is still fulfilled would be large enough (in which case the Moran chain with strong mutations is still stochastically monotone). And then, the Siegmund dual exists.

Proof. When, is monotone decreasing. We need to check that in that case also when varies from to , as soon as is large enough. After some elementary algebra,
The minimum if this convex degree polynomial is attained at the point so, when is large, to the left (right) of if (if ) with a negative value of this minimum. The maximum of is thus attained either at (at ), with (resp., ). These maxima are nonpositive for all sufficiently large so that . Then, given a Moran model with , for all large enough, both and .
When  , and is a linear function of . The maximum of is thus attained either at (at ), with (resp., )), both nonpositive. So for all and the Siegmund dual also exists when , whatever the value of .

Remark 13. Whenever , therefore, the Moran model with mutations is or can be made stochastically monotone. When , except at least at both and where . The Bernoulli-Laplace chain fails to be stochastically monotone.
From the structure of , it is apparent that the dual process loses mass at and is absorbed at . Let us therefore add a coffin state and consider the enlarged stochastic matrix which we will call :
The corresponding proper BD chain, call it , now has two absorbing states, one at and one at . Let now , be the scale function of , solving , forcing . We can easily check that so that the scale function of expresses in terms of the cumulative distribution of the invariant measure of the original process.
Let be the infimum of the first hitting time of and starting from . We have
Doob Transform. Define a new transition matrix by
We have where state becomes isolated and disconnected. Deleting the row and column associated to the entry of , we get a stochastic matrix, call it , of a process on which corresponds to conditioned to first hit state before state . The state of this conditioned BD process now is partially reflecting whereas the remaining absorbing state, say , is .
We will also need subsequently to introduce , , which is the new scale function of . We have because , and , and and and are the new transition probabilities of on which can now be read from .
We now show that and are intertwined through a stochastic link.

Proposition 14. (i) The matrices and are similar (with the same eigenvalues); that is,
The link is given by   , corresponding to the entries of a lower-triangular stochastic matrix. In other words, for all , and , where and .
(ii) The link satisfies
(iii)   are admissible initial distributions of the chains and , satisfying
(iv)   is -dual to where .

Proof. (i) From (120), The random walk being reversible, . Thus, As a result, with the right entries   , satisfying .
(ii) The last row of is given by so that once hits state , the law of is .
(iii) The first row of is so that .
(iv) Indeed, with with the right entries.

So (as the composition of the Siegmund dual of with a Doob-transform) can be obtained from either from through a stochastic link or from through the duality kernel . This is because BD chains such as the Moran model are reversible.

Strong Stationary Time. The intertwining construction shows that the original positive recurrent BD chain with transition matrix may also be viewed as the output (through the link ) of an intertwined hidden Markov chain with transition matrix . Once hits its absorbing state , the RW is distributed like , provided both and were started at . Furthermore, there exists a bivariate Markov chain with transition kernel where . We have if and only if . We can check that as required for a stochastic matrix.

With , we will let be the first hitting time of of , starting from state . The random time gives some information on the speed of convergence of the law of the original process to its invariant measure (is a strong stationary time in the sense of Diaconis and Fill [5]). The facts (123), (124), (125), and (126) indeed guarantee that is a strong stationary time of in the sense that and is independent of (see [5, Theorems  2.4 and  2.17] or [6, Theorem  2.1]). Equivalently (see [4, Proposition]), it holds that where is the law of started at , its invariant measure. In (132), the separation discrepancy is defined by . It satisfies , where is the total variation distance between and .

Furthermore, from (123), (126), there is a unique “witness” state say such that either or showing that this random time is stochastically the smallest since the first inequality in (132) turns out to be an equality (see [5, Remark  2.39] and Proposition 15 below).

In the context of BD chains absorbed at , the probability generating function (pgf) of is [6, 28]: where , are the distinct eigenvalues of both and , avoiding . The formula (133) also reads where

Thus, , and has geometric tails with exponent . We have

Note that, since is the dominant eigenvalue, When the eigenvalues are nonnegative, then where the s are independent with geom the geometric distribution with success parameter on . When the eigenvalues are not all positive, it is not obvious that the previous expression (133) of is indeed a pgf but it turns out that this is the case. Assuming , (133) takes on the interpretation of where bernoulli geom, and are all mutually independent. We can summarize these results that should mainly be attributed to [47] as follows.

Proposition 15. Suppose a Siegmund dual exists for a finite state space ergodic BD chain . Then, there exists a Markov chain , intertwined with , with as an absorbing state and fully described in Proposition 14. The random time is a fastest strong stationary time for whose law is characterized either by (133) or (136) involving the spectrum of either or , the transition matrices governing the two processes.

Remark 16. When the RW is symmetric, except for and maybe , the eigenvalues can be paired, because if is an eigenvalue, then is one also. The pgf reads depending on whether is or is not an eigenvalue.
Computing the Mean and Variance of . Suppose that the are known explicitly. In this case, it is possible to compute and and find conditions under which If this is the case, then in probability and is expected to be a cut-off time for started at .
We now apply this construction to the particular Moran model under study.
The Case of the Moran Model with Mutations. The eigenvalues are known, leading to: . If we scale the characteristic time to , the summations for large can be replaced by integrals, leading to the estimate and similarly for . From this, we easily get the following proposition.

Proposition 17. For the Moran model with mutations with , whenever the Siegmund dual exists, showing that .

We have and the tail distribution of is

Remark 18. When , the Bernoulli-Laplace RW fails to be stochastically monotone, and so the above construction also fails as the chain has no Siegmund dual. However, as was shown in [29], similar conclusions on the fastest strong stationary time hold for the embedded continuous-time version of the Bernoulli-Laplace chain.

Proposition 19. We have the Gumbel limit law

Proof. Indeed, using to the dominant order in , which is the Fourier transform of ,

With , then where and . The expected mixing time is , whereas the spectral gap is , the product of the of which tends to . Recalling , , the condition is a sufficient condition for . If this holds, the contribution of to dominates the lead term (see [30, 31] for recent developments and precisions).

Consider the ergodic Moran birth and death Markov chains with mutations on the state space . By (16), the mean return time to state is . Thus, with , regardless of or

When ( if ).

If , .

We wish now to estimate the expected time it takes for to move from one end of the state space to the other, that is, from state to . Let be the random time of such a sweep (note that the expected fixation time of a wild-type allele is given by as a result of , the sum of and an independent geometrically distributed rv with success probability ). We will prove the following estimation of its mean value growing geometrically fast.

Proposition 20. With , regardless of or , it holds that
When , ( in the lazy Ehrenfest case: ). When , .

Proof. Let indeed be the random time to first hit the state starting from the state . Depending on whether the move starting in is up, down, or no move, is either or (with a statistical copy of ) or . If we let be the mean value of , we thus get , . This leads to the recurrence ():
This recurrence can be solved explicitly to give
Thus,
Looking at this sum formula, one expects that its leading term is because this is where is the largest and the smallest. Observing and , this estimation leads, recalling , to and this is indeed true because the remaining terms do not contribute significantly, see [32]. When ,
When (Laplace-Bernoulli), .

Remark 21. Regardless of whether or , if and only if . In this case, the limiting growth rate of the expected value of dominates the one of . On the contrary, .

Note that if (symmetric Moran model), and as , whereas ( if , if ).

The physical image of the situation is the following: it takes a small amount of time to move from to the stable equilibrium state because the drift is favorable to this displacement. Once at state returns rapidly and very often to (it takes a time to do so), and coming back either to or to is hard because the drift now pushes inside the domain As a result, both and are, comparatively, geometrically very large, with rates computed just before. Recalling it takes about steps for the chain to reach maximum entropy under the “sep-distance,” starting from , although the return to will eventually occur, the time it takes is so huge that such returns will not be observed. In other words, the chain will reach rapidly the invariant measure inside a single excursion. Let us finally check these last statements.

First Return Time to Equilibrium State. Indeed, from (75), using (86),

Thus, by (16), , and the return time to the equilibrium state is relatively small.

First Hitting Time of the Equilibrium State, Starting from . In this case, we have

If we scale this characteristic time to , the summations for large can be replaced by integrals, leading to the estimate

Recalling two successive appeals to the saddle point method show that , which is indeed .

Concerning the related first hitting time of state , starting from (one step below), we have showing that is also , in accordance with .

Acknowledgments

Thierry E. Huillet acknowledges partial support from the ANR Modélisation Aléatoire en Écologie, Génétique et Évolution (ANR-Manège-09-BLAN-0215 project) and from the MME-DII Center of Excellence (Modèles Mathématiques et Économiques de la Dynamique, de l’Incertitude et des Intéractions, ANR-11-LABX-0023-01 project).