
International Journal of Stochastic Analysis

Volume 2012 (2012), Article ID 569081, 20 pages

http://dx.doi.org/10.1155/2012/569081

## Birth and Death Processes with Neutral Mutations

^{1}IECN, Université de Lorraine, Campus Scientifique, B.P. 70239, 54506 Vandœuvre-lès-Nancy Cedex, France

^{2}Inria, 54600 Villers-lès-Nancy, France

^{3}Laboratoire de Probabilités et Modèles Aléatoires, UMR 7599 CNRS and UPMC Université Paris 06, Case courrier 188, 4 Place Jussieu, 75252 Paris Cedex 05, France

^{4}CMAP, Ecole Polytechnique, Route de Saclay, 91128 Palaiseau Cedex, France

Received 27 September 2012; Accepted 28 November 2012

Academic Editor: Fima Klebaner

Copyright © 2012 Nicolas Champagnat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We review our recent results on branching processes with general lifetimes and neutral mutations, under the infinitely many alleles model, where mutations can occur either at the birth of particles or at a constant rate during their lives. In both models, we study the allelic partition of the population at time $t$. We give closed-form formulae for the expected frequency spectrum at $t$ and prove a pathwise convergence to an explicit limit, as $t\to\infty$, of the relative numbers of types younger than some given age and carried by a given number of particles (small families). We also provide the convergence in distribution of the sizes or ages of the largest families and of the oldest families. In the case of exponential lifetimes, population dynamics are given by linear birth and death processes, and in most cases we can formulate our results in a way that unifies both models.

#### 1. Introduction

We consider a general branching model, where particles have i.i.d. (not necessarily exponential) life lengths and give birth at a constant rate during their lives to independent copies of themselves. The genealogical tree thus produced is called a *splitting tree* [1–3]. The process that counts the number of living particles through time is a *Crump-Mode-Jagers process* (or general branching process) [4] which is binary (births occur singly) and homogeneous (constant birth rate).

We enrich this genealogical model with mutations. In Model I, each child is a clone of her mother with probability $1-p$ and a mutant with probability $p$. In Model II, independently of other particles, each particle undergoes mutations during her life at constant rate $\theta$ (and births are always clonal). For both models, we are working under the infinitely many alleles model; that is, a mutation yields a type, also called *allele*, which was never encountered before. Moreover, mutations are supposed to be neutral; that is, they do not modify the way particles die and reproduce. For any type and any time $t$, we call *family* the set of all particles that share this type at time $t$.

Branching processes (and especially birth and death processes) with mutations have many applications in biology. In carcinogenesis [5–10], they can model the evolution of cancerous cells. In [11], Kendall modeled carcinogenesis by a birth and death process where mutations occur during life according to an inhomogeneous Poisson process. In [8, 10], cancerous cells are modeled by a multitype branching process where a cell is of type $k$ if it has undergone $k$ mutations and where the more mutations a cell has undergone, the faster it grows. The object of this study is the time of appearance of the first cell of type $k$. In [7], the authors study the arrival time of the first resistant cell and the number of resistant cells, in a model of cancerous cells undergoing a medical treatment and becoming resistant after having experienced a certain number of mutations.

Branching processes with mutations are also used in epidemiology. Epidemics, and especially their onset, can be modeled by birth and death processes, where particles are infected hosts, births are disease transmissions, and deaths are recoveries or actual deaths. In [12], Stadler provides a statistical method for the inference of transmission rates and of the reproductive value of epidemics in a birth and death model with mutations. In [13], Lambert and Trapman enriched the transmission tree with Poissonian marks modeling detection events of hospital patients infected by an antibiotic-resistant pathogen. They provided an inference method based on the knowledge of times spent by patients at the hospital at the detection of the outbreak.

Let us also mention the existence of models, for example, [14], of phage reproduction within a bacterium by a (possibly time inhomogeneous) birth and death process with Poissonian mutations, where particles model phage in the vegetative phase (DNA strands in the bacterium without protein coating) and death is interpreted as phage maturing (reception of protein coating).

In ecology, the neutral theory of biodiversity [15] gives a prediction of the diversity patterns, in terms of species abundance distributions, that are generated by individual-based models where speciation is caused by mutation or by immigration from mainland. Usually, the underlying genealogical models are assumed to keep the population size constant through time, as in the Moran or Wright-Fisher models, and so have the same well-known properties as models in mathematical population genetics (e.g., Ewens sampling formula), with a different interpretation. See [16, 17] for cases where this assumption is relaxed in favor of the branching property.

In this paper, we are first interested in the allelic partition of the population and more precisely in properties of the *frequency spectrum* $(M_t^{i,a},\ i\ge 1)$, where $M_t^{i,a}$ is the number of distinct types younger than $a$ (i.e., whose original mutation appeared after $t-a$) carried by exactly $i$ particles at time $t$. This kind of question was first studied by Ewens [18] who discovered the well-known “sampling formula” named after him and which describes the law of the allelic partition for a Wright-Fisher model with neutral mutations.

In our models, it is not possible to obtain a counterpart of Ewens sampling formula but we obtain different kinds of results concerning the frequency spectrum $(M_t^{i,a},\ i\ge 1)$. First, we get a closed-form formula for the expected frequency spectrum, even in the non-Markovian cases. Second, we get pathwise convergence results, as $t\to\infty$ on the survival event, of the relative abundances of types. Third, we investigate the order of magnitude of the sizes of the largest families at time $t$ and of the ages of the oldest types at time $t$, as $t\to\infty$, and show the convergence in distribution of these quantities properly rescaled. Several regimes appear, depending on whether the *clonal process*, which is the process counting particles of a same type, is subcritical, critical, or supercritical.

We do not know any previous mathematical studies, other than ours, on branching processes with Poissonian mutations, but there are several existing mathematical results on branching models with mutations at birth that we now briefly review.

In discrete time, Griffiths and Pakes [19] studied the case of a Bienaymé-Galton-Watson (BGW) process where at each generation, all particles mutate independently with some probability . The authors obtained properties about the number of alleles/types in the population, about the time of last mutation in the (sub)critical case and about the expected frequency spectrum. In [20, 21], Bertoin considers an infinite alleles model with neutral mutations in a subcritical or critical BGW process where particles independently give birth to a random number of clonal and mutant children according to the same joint distribution. In [20], the tree of alleles is studied, where all particles of a common type are gathered in clusters and the law of the allelic partition of the total population is given by describing the joint law of the sizes of the clusters and of the numbers of their mutant children. In [21], Bertoin obtains the joint convergence of the sizes of allelic families in the limit of a large initial population size and a small mutation rate.

In continuous time, Pakes [22] studied Markovian branching processes and gave the counterpart, in the time-continuous setting, of properties found in the previously cited paper [19]. In particular, his results about the frequency spectrum and the “limiting frequency spectrum” are similar to ours, stated in Section 3. Recently, Maruvka et al. [23, 24] have considered the linear birth and death process with Poissonian mutations. Actually, they rather studied a PDE satisfied by a concentration which can be seen as (but is not proved to be) a deterministic approximation to the number of families of a given size at a given time. It is remarkable that this PDE has a steady concentration, whose behavior for large sizes is comparable to the asymptotic behavior of the relative numbers of families of large size in the discrete model studied here and in [19]. In the monograph [25], Taïb is interested in general branching processes known as Crump-Mode-Jagers processes (see [4, 26] and references therein) where mutations still occur at birth but with a probability that may depend, for example, on the age of the mother. He obtained limit theorems about the frequency spectrum by using random characteristics techniques but in most cases, limits cannot be explicitly computed. Some of our results in Model I are applications of Taïb’s, but use techniques specific to splitting trees to yield explicit formulae. We have refrained from applying results of Taïb on the convergence in distribution of properly rescaled sizes of the largest families, on the validity of which we have doubts in the case of supercritical clonal processes (see last section).

The paper is organized as follows. In Section 2, we define the models and give some of their properties that will be useful to state the main results. Section 3 is devoted to the study of the frequency spectrum (small families). Finally, in Section 4, we give the results about ages of the oldest families and about sizes of the largest ones.

Notice that in this paper, most of the results are stated for linear birth and death processes in order to simplify the notation. Most of them are also true with general life length distributions and are proved in Chapter 3 of the Ph.D. thesis [27] for Model I and in [28, 29] for Model II. Particular effort has been made to find a unifying formulation of our results wherever this seemed possible.

#### 2. The Models

##### 2.1. Model without Mutations

We first define the model without mutations and give some of its properties. Afterwards, we will explain the two mutation mechanisms that we consider in this paper.

As a population model, we consider *splitting trees* [1–3]; that is,

- (i) at time $0$, the population starts with one progenitor;
- (ii) all particles have i.i.d. reproduction behaviors;
- (iii) conditional on her birth date $\sigma$ and her life length $\zeta$, each particle gives birth at a constant rate $b$ during $(\sigma, \sigma+\zeta)$, to a single particle at each birth event.

It is important to notice that the common law of life lengths can be as general as possible. Let $N = (N_t,\ t\ge 0)$ be the process counting the number of extant particles through time. We denote the lifespan distribution by $\Lambda(\cdot)/b$, where $\Lambda$ is a finite positive measure on $(0,\infty]$ with total mass $b$ and is called a *lifespan measure* [3].

The total population process $N$ belongs to a large class of branching processes called *Crump-Mode-Jagers* or *CMJ processes*. In these processes, also called general branching processes [4, 26], one associates with each particle $x$ in the population a non-negative r.v. $\zeta_x$ (her life length) and a point process $\xi_x$ called birth point process. One assumes that the sequence $(\zeta_x, \xi_x)_x$ is i.i.d. but $\zeta_x$ and $\xi_x$ are not necessarily independent. Then, the CMJ process is defined as

$$N_t = \sum_x \mathbf{1}_{\{\sigma_x \le t < \sigma_x + \zeta_x\}}, \qquad t\ge 0, \tag{2.1}$$

where for any particle $x$ in the population, $\sigma_x$ is her birth time.

In our particular case, the common distribution of lifespans is $\Lambda(\cdot)/b$ and conditional on her lifespan, the birth point process of a particle is distributed as a Poisson point process with rate $b$ during her life. We can say that the CMJ process $N$ is *homogeneous* (constant birth rate) and *binary* (births occur singly). We will say that $N$ is *subcritical, critical, or supercritical* according to whether the mean number of children per particle

$$m := \int_{(0,\infty]} r\,\Lambda(dr) \tag{2.2}$$

is less than, equal to, or greater than 1.

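The advantage of homogeneous, binary CMJ processes is that they allow for explicit computations, for example, about one-dimensional marginals of $N$ (see forthcoming Proposition 2.1). More precisely, for $\lambda\ge 0$, define

$$\psi(\lambda) := \lambda - \int_{(0,\infty]} \left(1 - e^{-\lambda r}\right)\Lambda(dr), \tag{2.3}$$

and let $\alpha$ be the greatest root of $\psi$. Notice that $\psi$ is convex, $\psi(0)=0$, and $\psi'(0+) = 1-m$. As a consequence,

$$\alpha > 0 \iff m > 1. \tag{2.4}$$

Let $W$ be the so-called *scale function* [30, page 194] associated with $\psi$, that is, the unique increasing continuous function $W:[0,\infty)\to[0,\infty)$ satisfying

$$\int_0^\infty W(x)\,e^{-\lambda x}\,dx = \frac{1}{\psi(\lambda)}, \qquad \lambda > \alpha. \tag{2.5}$$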

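Proposition 2.1 (Lambert [3, 17]). *The one-dimensional marginals of $N$ are given by*

$$\mathbb{P}(N_t = 0) = 1 - \frac{W'(t)}{b\,W(t)}, \tag{2.6}$$

*and for $n\ge 1$,*

$$\mathbb{P}(N_t = n) = \left(1 - \frac{1}{W(t)}\right)^{n-1} \frac{W'(t)}{b\,W(t)^2}. \tag{2.7}$$

*In other words, conditional on being nonzero, $N_t$ is distributed as a geometric r.v. with success probability $1/W(t)$.*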

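If $\mathrm{Ext}$ denotes the extinction event of $N$, according to [3], as a consequence of the last proposition,

$$\mathbb{P}(\mathrm{Ext}) = 1 - \frac{\alpha}{b}. \tag{2.8}$$

Thus, thanks to (2.4), extinction occurs a.s. when $N$ is (sub)critical and with probability less than 1 when it is supercritical.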

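The following proposition justifies the fact that $\alpha$ is called the *Malthusian parameter* of the population in the supercritical case.

Proposition 2.2 (Lambert [3]). *If $m>1$, conditional on the survival event $\mathrm{Ext}^c$,*

$$e^{-\alpha t} N_t \xrightarrow[t\to\infty]{} \mathcal{E} \quad \text{a.s.}, \tag{2.9}$$

*where $\mathcal{E}$ is exponential with parameter $\psi'(\alpha)$.*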

In fact, convergence in distribution is proved in [3] and a.s. convergence holds according to [31] (see [32, page 285]).
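As a hedged numerical illustration of the criticality classification and of the Malthusian parameter, the sketch below assumes exponential lifetimes with death rate `d` and birth rate `b`, and the standard convex function of splitting trees, $\psi(\lambda) = \lambda - \int (1-e^{-\lambda r})\Lambda(dr)$, which for $\Lambda(dr) = b\,d\,e^{-dr}dr$ reduces to $\lambda - b\lambda/(\lambda+d)$. The helper names `mean_offspring`, `psi_exponential`, and `greatest_root` are hypothetical and not taken from the paper.

```python
def mean_offspring(b, d):
    """Mean number of children per particle: birth rate b times mean lifetime 1/d."""
    return b / d


def psi_exponential(lam, b, d):
    """psi(lam) for exponential lifetimes (assumed form: lam - b*lam/(lam + d))."""
    return lam - b * lam / (lam + d)


def greatest_root(psi, upper, tol=1e-12):
    """Greatest root of a convex psi with psi(0) = 0, by bisection on [0, upper].

    Requires psi(upper) > 0. By convexity, psi <= 0 on [0, root] and psi > 0
    beyond it, so bisection on the sign of psi converges to the greatest root
    (which is 0 in the subcritical and critical cases).
    """
    if psi(upper) <= 0:
        raise ValueError("psi(upper) must be positive; increase `upper`")
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0:
            hi = mid
        else:
            lo = mid
    return hi


b, d = 2.0, 1.0  # illustrative rates, supercritical since b/d > 1
alpha = greatest_root(lambda lam: psi_exponential(lam, b, d), upper=10.0)
print("m =", mean_offspring(b, d), "alpha ~", alpha)  # alpha should be close to b - d
```

For a general lifespan measure, only `psi_exponential` would need to be replaced (for instance by a numerical quadrature of the integral); the bisection relies only on convexity and $\psi(0)=0$.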

##### 2.2. Two Mutation Models I and II

We now assume that particles in the population carry types, also called *alleles*. We consider two population models where mutations appear in different ways. In each case, we will make the assumption of *infinitely many alleles*; that is, to every mutation event is associated a different type, so that every type appears only once. We will also assume that mutations are *neutral*; that is, they do not change the way particles die and reproduce.

In Model I, mutations occur at birth. More precisely, there is some probability $p$ such that at each birth event, independently of all other particles, the newborn is a clone of her mother with probability $1-p$ and a mutant with probability $p$. An illustration is given in Figure 1.

In Model II, particles independently experience mutations during their lives at constant rate $\theta$. In particular, in contrast with Model I, particles can change type several times during their lifetime, but always bear at birth the same type as their mother at this very time. An illustration is given in Figure 2.
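The two mutation mechanisms can be sketched in a few lines of simulation code. The sketch below is an illustration only: it assumes exponential lifetimes (so that a Gillespie-type simulation applies), and the function name `simulate_types` as well as the parameter names `p` (mutation probability at birth, Model I) and `theta` (mutation rate during life, Model II) are choices made here.

```python
import random


def simulate_types(b, d, t_max, p=0.0, theta=0.0, seed=None, max_size=100_000):
    """Simulate a linear birth and death process with neutral mutations.

    b, d   : per-particle birth and death rates (b + d + theta must be > 0)
    p      : probability that a newborn is a mutant (Model I when theta == 0)
    theta  : per-particle mutation rate during life (Model II when p == 0)
    Returns the list of types carried by the particles alive at time t_max
    (an empty list if the population went extinct). Type 0 is the progenitor's.
    """
    rng = random.Random(seed)
    types = [0]        # one progenitor of type 0 at time 0
    next_type = 1      # infinitely many alleles: each mutation gets a fresh label
    t = 0.0
    per_capita = b + d + theta
    while types:
        n = len(types)
        t += rng.expovariate(n * per_capita)   # waiting time to the next event
        if t > t_max or n >= max_size:
            break
        i = rng.randrange(n)                   # particle affected by the event
        u = rng.random() * per_capita
        if u < b:                              # birth
            if rng.random() < p:               # mutant child (Model I)
                types.append(next_type)
                next_type += 1
            else:                              # clonal child
                types.append(types[i])
        elif u < b + d:                        # death: swap-remove particle i
            types[i] = types[-1]
            types.pop()
        else:                                  # mutation during life (Model II)
            types[i] = next_type
            next_type += 1
    return types


model_1 = simulate_types(b=1.0, d=0.5, t_max=5.0, p=0.2, seed=42)
model_2 = simulate_types(b=1.0, d=0.5, t_max=5.0, theta=0.2, seed=42)
print(len(model_1), len(set(model_1)), len(model_2), len(set(model_2)))
```

Because mutations are neutral, the population size has the same law in both calls; only the labelling of particles by types differs.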

In what follows, an important role will be played by the *clonal process*, generically denoted $Z_\star$, counting, as time passes, the number of particles bearing the same type as the progenitor of the population at time 0. It can easily be seen that the genealogy of a clonal population is again a splitting tree, so that $Z_\star$ is also a homogeneous, binary CMJ process. We denote by $b_\star$ its birth rate, by $\psi_\star$ the associated convex function as in (2.3), and by $W_\star$ the nonnegative function with Laplace transform $1/\psi_\star$. Furthermore, when the clonal population is supercritical, that is, when $\psi_\star'(0+)<0$, we denote by $\alpha_\star$ its Malthusian parameter, which is the only nonzero root of $\psi_\star$. We will sometimes need to have this generic notation depending on the model considered: $b_p$, $\psi_p$, $W_p$, and $\alpha_p$ for Model I and $b_\theta$, $\psi_\theta$, $W_\theta$, and $\alpha_\theta$ for Model II.

Concerning Model I, it can be seen [27] that the clonal splitting tree has the same life lengths as the original splitting tree and birth rate $b(1-p)$, so that its lifespan measure is $(1-p)\Lambda$ and

$$\psi_p(\lambda) = \lambda - (1-p)\int_{(0,\infty]}\left(1-e^{-\lambda r}\right)\Lambda(dr) = p\lambda + (1-p)\,\psi(\lambda).$$

In particular, as in (2.2), the clonal population is subcritical, critical, or supercritical according to whether $(1-p)\,m$ is less than, equal to, or greater than 1. It should be noted that there is no closed-form formula for $W_p$.

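Concerning Model II, it can be seen [28] that the clonal splitting tree has birth rate $b$ and life lengths distributed as $\min(V, \mathbf{e})$ where $V$ has the probability distribution $\Lambda(\cdot)/b$ and $\mathbf{e}$ is an independent exponential r.v. with parameter $\theta$. Then we get

$$\psi_\theta(\lambda) = \frac{\lambda\,\psi(\lambda+\theta)}{\lambda+\theta}.$$

In particular, $\alpha_\theta = \alpha-\theta$ when $\alpha>\theta$, and the clonal population is subcritical, critical, or supercritical according to whether $\alpha$ is less than, equal to, or greater than $\theta$. It can also be proved that $W$ and $W_\theta$ are differentiable and that their derivatives are related via

$$W_\theta'(x) = e^{-\theta x}\,W'(x), \qquad x\ge 0,$$

with the requirement that $W_\theta(0)=1$.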

##### 2.3. Exponential Case

An interesting case that we will focus on is the *exponential (or Markovian) case*, when the common distribution of life lengths is exponential with parameter $d\ge 0$ (with the convention that lifespans are a.s. infinite if $d=0$), that is, $\Lambda(dr) = b\,d\,e^{-dr}\,dr$ or $\Lambda = b\,\delta_\infty$. In that case, $N$ is, respectively, a linear birth and death process with birth rate $b$ and death rate $d$ or a pure birth process (or Yule process) with parameter $b$.

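In this case, $N$ and $Z_\star$ are Markov processes and the quantities defined in Section 2.1 are computable. Indeed, we have

$$m = \frac{b}{d}, \qquad \psi(\lambda) = \frac{\lambda(\lambda - b + d)}{\lambda + d}, \qquad \alpha = b-d \ \text{ if } b>d.$$

It is also possible to compute the function $W$, defined by (2.5), while it is generally unknown. From [3, page 393], we have

$$W(x) = \begin{cases} \dfrac{b\,e^{(b-d)x} - d}{b-d} & \text{if } b\neq d,\\[2mm] 1 + bx & \text{if } b = d,\end{cases}$$

and in all cases

$$W'(x) = b\,e^{(b-d)x}.$$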

The same results hold for , by, respectively, replacing , , and by in Model I and by in Model II.
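In the exponential case, the one-dimensional marginals of Proposition 2.1 can be evaluated directly. The sketch below assumes the closed forms $W(t) = (b\,e^{(b-d)t}-d)/(b-d)$ for $b\neq d$, $W(t)=1+bt$ for $b=d$, and $W'(t)=b\,e^{(b-d)t}$, together with the marginal formulas $\mathbb{P}(N_t=0)=1-W'(t)/(bW(t))$ and $\mathbb{P}(N_t=n)=(1-1/W(t))^{n-1}W'(t)/(bW(t)^2)$; the function names are hypothetical.

```python
import math


def scale_function(t, b, d):
    """Scale function W(t) for exponential lifetimes (assumed closed form)."""
    if b == d:
        return 1.0 + b * t
    r = b - d
    return (b * math.exp(r * t) - d) / r


def scale_derivative(t, b, d):
    """Derivative W'(t) = b * exp((b - d) * t), valid in all cases."""
    return b * math.exp((b - d) * t)


def marginal(n, t, b, d):
    """P(N_t = n) for the linear birth and death process started from one particle."""
    w = scale_function(t, b, d)
    dw = scale_derivative(t, b, d)
    if n == 0:
        return 1.0 - dw / (b * w)
    # Conditional on being nonzero, N_t is geometric with success probability 1/W(t).
    return (1.0 - 1.0 / w) ** (n - 1) * dw / (b * w * w)


b, d, t = 2.0, 1.0, 1.0  # illustrative values
total = sum(marginal(n, t, b, d) for n in range(2000))
print("P(N_t = 0) =", marginal(0, t, b, d), " total mass ~", total)
```

A quick sanity check is that the probabilities sum to one and that $\mathbb{P}(N_t=0)$ agrees with the classical expression $d(e^{(b-d)t}-1)/(b\,e^{(b-d)t}-d)$ for the linear birth and death process.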

We will sometimes state results in the total generality of splitting trees, in which case an equation numbered (-a) (resp. (-b)) refers to Model I (resp. Model II), as done previously. However, we will most of the time focus on the exponential case, in which we will, whenever possible, use the unified notation with $\star$ subscripts. We will indicate when the results can be generalized and will give precise references.

*Remark 2.3. *In the exponential case, notice that Models I and II are two (incompatible) cases of a more general class of linear birth and death processes with mutations, where particles mutate spontaneously at rate , die at rate , and give birth at rate and at each birth event: with probability , the mother and the daughter both mutate (and bear either the same new type or two different new types); with probability , the daughter (only) mutates; with probability , none of them mutates. Then Model I corresponds to the case when and Model II to the case when . The case studied by Pakes in [22] corresponds to , , , and . It is still an open question to check whether, when our results hold for both Models I and II with the unified notation, they hold for all linear birth and death processes with mutations.

#### 3. Small Families

Recall that a *family* is a maximal set of particles bearing the same type at the same given time. In this section, we are interested in results about *small families*, that is, families whose sizes and ages are fixed, in contrast to those of Section 4, which concern asymptotic properties of the largest and oldest ones.

More precisely, we give properties of the allelic partition of the entire population by studying the *frequency spectrum* $(M_t^{i,a},\ i\ge 1)$, where $M_t^{i,a}$ denotes the number of distinct types, whose ages are less than $a$ at time $t$, carried by exactly $i$ particles at time $t$. Notice that, when $a>t$, $M_t^{i,a}$ is simply the number of alleles carried by $i$ particles at time $t$ (regardless of their ages).

For instance, in Figure 1, the frequency spectrum (, ) is because three alleles (, , and ) are carried by one particle, and are carried by two particles, and is the only allele carried by three particles. Moreover, if we only consider families with ages less than , (, ) equals because alleles and appear in the population before time . Similarly, in Figure 2, the frequency spectrum in Model II is .
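Computing a frequency spectrum from a list of types is a simple counting exercise, sketched below. The function name `frequency_spectrum`, the type labels `"A"` to `"F"`, and the mutation times are hypothetical; they are chosen only so that three types are carried by one particle, two by two particles, and one by three particles, and do not reproduce the labels of Figure 1.

```python
from collections import Counter


def frequency_spectrum(types, type_birth, t, a=float("inf")):
    """Return {i: number of types of age < a carried by exactly i particles}.

    types      : list with the type of each particle alive at time t
    type_birth : dict mapping each type to the time its mutation appeared
    a          : age threshold; the default keeps all types
    """
    family_sizes = Counter(types)              # type -> number of carriers
    spectrum = Counter()
    for typ, size in family_sizes.items():
        if t - type_birth[typ] < a:            # age of the type at time t
            spectrum[size] += 1
    return dict(spectrum)


# Hypothetical population at time t = 10.
types = ["A", "B", "C", "D", "D", "E", "E", "F", "F", "F"]
type_birth = {"A": 9.0, "B": 8.0, "C": 2.0, "D": 7.0, "E": 1.0, "F": 0.0}

print(frequency_spectrum(types, type_birth, t=10.0))         # all ages
print(frequency_spectrum(types, type_birth, t=10.0, a=5.0))  # types younger than 5
```

In this toy example, restricting to ages less than 5 keeps only the types `"A"`, `"B"`, and `"D"`, so the restricted spectrum has two families of size one and one family of size two.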

In the case of branching processes, there is no closed-form formula available for the law of the frequency spectrum, as there is for the Wright-Fisher model thanks to Ewens sampling formula [18]. Nevertheless, we obtained for both mutation models an exact computation of the expected frequency spectrum and an almost sure asymptotic behavior of this frequency spectrum as $t\to\infty$.

##### 3.1. Expected Frequency Spectrum

We first give an exact expression of the expected frequency spectrum at any time .

For and , we denote by the number of types carried by particles at time and with ages in . The following proposition yields its expected value.

Proposition 3.1. *For and , one has**In the exponential case, both expressions are read as
*

In [27], (3.1a) is proved in the general case. Its proof uses the branching property and basic properties about Poisson processes. The main argument is that conditional on , is the sum of independent r.v. distributed as the number of mutants that appear in the population in a time interval and with clonal alive descendants at time . The proof of the general case of (3.1b) in [28] is based on coalescent point processes.

The expected frequency spectra can be obtained by integrating (3.1a) and (3.1b) over ages. Taking into account the contribution of the type of the progenitor, we can prove the following result.

Corollary 3.2. *For and ,**In the exponential case,
*

The second terms that appear in the r.h.s. correspond to the probabilities that the progenitor has alive clonal descendants at time . In the exponential case, we left this probability as such, since its expression depends on the model. It is also possible to get similar equations for the number of families with ages less than (resp. with size ) by summing over (resp. by taking ) in the last expressions.

*Remark 3.3. *In the exponential case, when the process is critical, that is, when , for ,
which is reminiscent of Fisher log-series of species abundances [17]. Surprisingly, this expression is independent of .

From Corollary 3.2, we deduce the asymptotic behavior of in the supercritical case.

Proposition 3.4. *We suppose that . In the general case,
**
where, for Model I,**
and, for Model II,
**In the exponential case, one gets the simpler formula:
*

Notice that grows exponentially with parameter , as does on its survival event.

##### 3.2. Convergence Results

In this section and in all following ones, we are interested in long-time behaviors in the two models we consider. Hence, from now on, *we assume that the process $N$ is supercritical.*

This paragraph improves on the convergence result (3.6) for the expected frequency spectrum. The following results yield the asymptotic behavior as $t\to\infty$ of the frequency spectrum $(M_t^{i,a},\ i\ge 1)$, conditional on the survival event.

The main technique we use to prove them is CMJ processes counted with random characteristics (see [4] and Appendix A in [25]). It enables us to obtain several pathwise convergence results regarding some processes embedded in the supercritical splitting tree.

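A characteristic is a random nonnegative function on $[0,\infty)$. To each particle $x$ in the population is associated a characteristic $\chi_x$, which can be viewed as a score or a weight. It must satisfy that $(\zeta_x, \xi_x, \chi_x)_x$ is an i.i.d. sequence, where we recall that $\zeta_x$ is the life length of $x$ and $\xi_x$ its birth process. Then, the process counted with the characteristic $\chi$ is defined as

$$Z_t^{\chi} := \sum_x \chi_x(t - \sigma_x), \qquad t\ge 0. \tag{3.9}$$

For instance, if $\chi_x(s) = \mathbf{1}_{\{s<\zeta_x\}}$, $Z_t^{\chi}$ equals $N_t$ and if $\chi_x(s) = \mathbf{1}_{\{s<a\wedge\zeta_x\}}$, $Z_t^{\chi}$ is the number of extant particles at time $t$ with ages less than $a$. Then, provided technical conditions about $\chi$ are satisfied, the convergences of $e^{-\alpha t}Z_t^{\chi}$ and of $Z_t^{\chi}/N_t$ as $t\to\infty$ hold a.s. on the survival event. In our case, when $\chi$ is appropriately chosen, we can use this result to obtain the following statements.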
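The definition of a process counted with a characteristic can be illustrated on a fixed, finite list of particles. The sketch below is a simplification: it assumes that the characteristic of a particle is a deterministic function of her age and of her life length only, and all names (`counted_process`, `chi_alive`, `make_chi_young`) are hypothetical.

```python
def counted_process(particles, chi, t):
    """Sum of chi(age at time t, life length) over all particles born by time t.

    particles : list of (birth_time, life_length) pairs
    chi       : function (age, life_length) -> nonnegative score
    """
    return sum(chi(t - birth, life) for birth, life in particles if birth <= t)


def chi_alive(age, life):
    """Score 1 while the particle is alive: the counted process is N_t."""
    return 1 if 0 <= age < life else 0


def make_chi_young(a):
    """Score 1 while the particle is alive and younger than a."""
    def chi(age, life):
        return 1 if 0 <= age < min(a, life) else 0
    return chi


# Hypothetical particles given as (birth time, life length).
particles = [(0.0, 5.0), (1.0, 1.5), (3.0, 10.0)]
print(counted_process(particles, chi_alive, t=4.0))            # particles alive at time 4
print(counted_process(particles, make_chi_young(2.0), t=4.0))  # alive and younger than 2
```

Here two particles are alive at time 4 (the first and the third), and only the third one is younger than 2.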

Proposition 3.5. *Let be the number of extant types at time . Almost surely, on the survival event of ,
**
where in Model I,**
while in Model II,
**and where is the r.v. defined by (2.9). **In the exponential case, one has
*

Notice that (3.10) is consistent with (3.6) since and . Moreover, (3.10) still holds after is replaced by and by .

##### 3.3. Asymptotic Behavior of the Limiting Frequency Spectrum

Thanks to Proposition 3.5, the proportion of types carried by particles and with ages less than converges a.s. to as . This limit is called “the limiting frequency spectrum” by Pakes in [22]. This paragraph is devoted to the asymptotic behavior, as , of , obtained by taking in (3.7a) and (3.7b). In the exponential case,

###### 3.3.1. Supercritical Case

In this paragraph, we only treat the exponential case. Let us assume that the clonal process is supercritical, that is, . Define We have in Model I and in Model II. Recall that is the proportion of types carried by particles in the large time asymptotic.

Proposition 3.6. *In the exponential case, one has for both models
*

Notice that this result is consistent with [24] where Maruvka et al. use an approximation of the frequency spectrum by a concentration driven by a PDE and with [22] where Pakes considers Markov branching processes with multiple simultaneous births, binomial mutations at birth, and no Poissonian mutations.

*Remark 3.7. *The following proof of Proposition 3.6 easily extends to any life length distributions since it is based on Proposition 2.2 which holds in the general case.

*Proof of Proposition 3.6. *Since for , the sequence is positive and non-increasing. Then, according to a Tauberian theorem about series, to prove Proposition 3.6, it is sufficient to prove that is equivalent to as .

Recalling (3.13), we have
and from now on, we follow the proof of [22, Theorem ]. Let be such that . Then and
Using Proposition 2.2 and (2.13), , where is a nonnegative r.v. such that
and conditional on , is an exponential r.v. with parameter . Moreover, using Markov inequality, for ,
using again the a.s. convergence in Proposition 2.2. Then, for and , we have
and thanks to the dominated convergence theorem,
The change of variables in the last integral leads to
which completes the proof.

###### 3.3.2. Critical Case

We want to obtain a similar result to Proposition 3.6 when the clonal population is critical. It seems that this is not possible in a general setting due to the non explicit expression of the functions and . However, in the exponential and critical case, we have the simpler expression and . Then, we have

Proposition 3.8. *In the exponential case, one has
**
where one recalls that here and one has set for . *

*Proof. * By a change of variables, we have set
where is known as a confluent hypergeometric function (see [33, Chapter 13]). Then, using [22, Theorem ] with , we have the result.

#### 4. Asymptotic Results about Large and Old Families

We now state results about ages of the oldest families and about sizes of the largest ones. We mainly focus on the case when clonal populations are subcritical. Then, in Section 4.3, we explain which results hold in the critical and supercritical cases.

We need some notation. For , (i)for , let be the number of extant families at time , with ages greater than ( for “old”); for convenience, we set if , (ii)for , let be the number of families with sizes greater than at time ( for “large”).

In this section, we are interested in finding the orders of magnitudes of the ages and of the sizes of the families; that is, in finding numbers and such that and converge to positive and finite real numbers as .

##### 4.1. Ages of Old Families in the Subcritical Case

In this section, we suppose that the clonal processes are subcritical and we are interested in ages of old families. Although we only state the results in the exponential case, they also hold in the general case and are proved in [27, Chapter 3] and [29]. However, to obtain the general results in Model I, additional assumptions about the lifespan measure are required, which are easily satisfied in the exponential case (for instance, we need the existence of a negative root of , which, with easy computations, is in the exponential case).

In the first result, which is a result in expectation, we show that in both models, the ages are of order of magnitude

Proposition 4.1 (see [27, 29]). *One supposes that is subcritical. For , one has
*

This result is a consequence of the expected spectrum formula (3.2), summed over and integrated on . We also obtain a more precise result about the convergence in distribution of as .

Proposition 4.2 (see [27, 29]). *With the same assumptions as in Proposition 4.1, for , conditional on the survival event, as , converges in distribution to an r.v. , distributed as a mixed Poisson r.v. whose parameter of mixture is
**
where is an exponential r.v. with mean 1. Equivalently, is geometric on with success probability
*
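The equivalence between the mixed Poisson and the geometric descriptions is a standard computation. As a sketch, write the mixture parameter as $c\,\mathcal{E}$, where $\mathcal{E}$ is exponential with mean 1 and $c>0$ is a generic constant (a placeholder symbol introduced here for the deterministic factor of the mixture parameter), and let $X$ denote the mixed Poisson r.v.

```latex
\begin{aligned}
\mathbb{P}(X = k)
  &= \int_0^\infty e^{-cx}\,\frac{(cx)^k}{k!}\; e^{-x}\,dx
   = \frac{c^k}{k!}\int_0^\infty x^k e^{-(1+c)x}\,dx
   = \frac{c^k}{k!}\cdot\frac{k!}{(1+c)^{k+1}} \\
  &= \frac{1}{1+c}\left(\frac{c}{1+c}\right)^{k}, \qquad k = 0, 1, 2, \dots
\end{aligned}
```

Under this assumption, $X$ is geometric on $\{0,1,2,\dots\}$ with success probability $1/(1+c)$.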

The proof of this proposition in the general case and for Model I, given in [27], follows arguments of Taïb in [25] and uses the notion of CMJ processes counted with *time-dependent* random characteristics developed by Jagers and Nerman in [26, 34]. The difference with (3.9) is that here the characteristics are allowed to depend on time. This theory provides convergences in distribution, as , of quantities of the form
under technical conditions about the family of characteristics (, ). The proof of Proposition 4.2 for Model II is given in [29] and does not make use of random characteristics.

The last result deals with the convergence in distribution of the sequence of the ranked ages of extant families. Let be the set of nonnegative -finite measures on and finite on , equipped with the *left-vague topology* induced by the maps for all bounded continuous functions such that there exists satisfying for all , .

Theorem 4.3 (see [27, 29]). *With the same assumptions as previously, let be the point process defined by
**
where is the decreasing sequence of ages of alive families at . Then, conditional on the survival event, converges as in equipped with the left-vague topology to a mixed Poisson point process with an intensity measure
**
where is an exponential r.v. with mean 1. *

##### 4.2. Sizes of the Largest Families in the Subcritical Case of Model II

In this paragraph, we still suppose that the clonal process is subcritical and we are interested in similar results as those of Section 4.1 about the sizes of the largest families. The aim is to find a number such that converges to a finite and positive limit as .

Concerning Model I, this problem is still open. On the contrary, it is possible to obtain in Model II the sizes of the largest families. In [29], they are given for any life length distribution but to simplify the results, we only state them in the exponential case. The following result is a consequence of (3.3b) applied with and summed over . Recall that the clonal process is assumed to be subcritical, so that .

Proposition 4.4 (see [29]). *One sets
**
Then, for ,
**
where denotes the fractional part of a real number and where is an explicit constant that only depends on , , and . *

For and , we denote by the size of the th largest family in the whole population at time . Let be the point measure of the renormalized sizes of the population. To get rid of fractional parts, the following theorem gives the convergence in distribution of and along a subsequence. More precisely, for , let be such that ; this equation has a unique solution for any greater than some integer . It satisfies We now state the convergence of the sequence (, ).

Theorem 4.5 (see [29]). *Conditional on the survival event, the sequence (, ) of point processes on converges as on the set equipped with the left-vague topology to a mixed Poisson point measure on with an intensity measure
**
where the mixture coefficient is an exponential r.v. with mean 1. *

##### 4.3. Other Results

###### 4.3.1. Critical Case in Model I

The case of a critical clonal process for a general supercritical splitting tree is treated in Section of [27] where the counterparts of Propositions 4.1 and 4.2 and Theorem 4.3 are proved.

If , provided that the second moment is finite and that a condition about the tail distribution of holds, ages of oldest families are of order Notice that these conditions about are trivially satisfied in the exponential case. These results were also proved in [25, Chapter 4] for any CMJ process , that is, with a birth point process as general as possible, but in that case, limits were not explicit.

Similarly to the subcritical case, the problem of sizes of the largest families is still open. Nevertheless, we can state the following conjecture about their order of magnitude.

Conjecture 4.6. *If
as , on the survival event, converges in distribution to a nondegenerate geometric r.v. *

###### 4.3.2. Critical Case in Model II

The general case when is critical () can be found in Sections 3.4 and 5 in [29]. For both ages and sizes, the counterparts of the results of Sections 4.1 and 4.2 hold.

As in Model I, ages of the oldest families are of order . Moreover, sizes of the largest ones are of order and the point measure converges to a mixed Poisson measure as but, contrary to Theorem 4.5, this convergence is not restricted to a subsequence.

###### 4.3.3. Sizes of the Largest Families in Supercritical Cases

In [27, Chapter 3], general splitting trees in Model I are considered. When the clonal process is supercritical, that is, when , a result about the sizes of the largest families is proved. First notice that, as in (2.9), and on , a.s. converges as to an exponential random variable. Hence, the sizes of alive families at time must be of order as . We proved this in [27] by showing that converges as to an explicit limit.
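In the exponential case, an exponential limit of this kind can be checked from the classical closed-form law of a linear birth and death process started from one particle. The sketch below is only an illustration under stated assumptions: `birth` and `death` are hypothetical per-capita rates with `birth > death` (which rates play this role depends on the process under consideration), and the conditioning is on being alive at time t rather than on survival forever, which makes no difference in the limit. Given that the population N_t is alive at time t, N_t is geometric, so the tail of the rescaled size exp(-(birth - death) t) N_t is explicit and can be compared with the tail of an exponential variable with mean birth / (birth - death).

```python
import math


def conditional_tail(x, t, birth, death):
    """P(exp(-r t) N_t > x | N_t > 0) with r = birth - death, one ancestor at time 0."""
    r = birth - death
    growth = math.exp(r * t)
    beta = birth * (growth - 1.0) / (birth * growth - death)
    # Given N_t > 0, N_t is geometric on {1, 2, ...}: P(N_t > n | N_t > 0) = beta**n.
    return beta ** math.floor(x * growth)


def exponential_tail(x, birth, death):
    """Tail of the exponential limit with mean birth / (birth - death)."""
    return math.exp(-x * (birth - death) / birth)


if __name__ == "__main__":
    birth, death = 2.0, 1.0  # hypothetical rates chosen for illustration
    for t in (1.0, 4.0, 12.0):
        print(t, conditional_tail(1.0, t, birth, death), exponential_tail(1.0, birth, death))
```

As t grows, the first printed column approaches the second, which is the exponential tail evaluated at x = 1.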

Notice that we cannot obtain results similar to Proposition 4.2 and Theorem 4.3 concerning the convergence in distribution of and the convergence of the associated point measure of the decreasing sequence of family sizes.

In [25], Taïb considers a more general model than our Model I; the mutation mechanism is the same, but can be any supercritical CMJ process. In his Theorem 4.6, using a time-dependent characteristic argument, he proves the convergence in distribution of (to a nonexplicit random variable). However, we have doubts about the application of Theorem A.7, since the technical requirements of this theorem do not seem to hold in his case. These requirements are proved to hold neither in [25] nor in [34].

In Model II, for a general supercritical splitting tree, if is supercritical, that is, , asymptotically grows like . In [29, Proposition 3.2], it is proved that converges as , but we were unable to obtain any convergence in distribution in that case.
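For readers who wish to explore the sizes of the largest families numerically in the exponential case, the following Python sketch simulates a linear birth and death process with neutral mutations under the infinitely many alleles model. It is a minimal illustration, not the construction used in [27] or [29]. All parameter names are assumptions made here: `birth` and `death` are per-capita rates, `p_birth_mut` is the probability that a newborn carries a new type (mutations at birth), `life_mut_rate` is the rate at which a living particle changes to a new type (mutations during life), and `max_pop` is a safety cap that stops the simulation early.

```python
import random
from collections import Counter


def simulate_families(t_max, birth, death, p_birth_mut=0.0, life_mut_rate=0.0,
                      seed=None, max_pop=100_000):
    """Simulate the allelic partition and return family sizes, largest first.

    The process starts from one particle of type 0 and runs until time `t_max`,
    extinction, or the population cap `max_pop`, whichever comes first.
    """
    rng = random.Random(seed)
    types = [0]      # types[i] is the allele carried by particle i
    next_type = 1    # every mutation creates a label never seen before
    clock = 0.0
    per_capita = birth + death + life_mut_rate
    while types and len(types) < max_pop:
        # Time to the next event among len(types) independent particles.
        clock += rng.expovariate(per_capita * len(types))
        if clock > t_max:
            break
        i = rng.randrange(len(types))        # particle affected by the event
        u = rng.random() * per_capita        # which kind of event occurs
        if u < birth:
            if rng.random() < p_birth_mut:   # mutant newborn: new type
                types.append(next_type)
                next_type += 1
            else:                            # clonal newborn: mother's type
                types.append(types[i])
        elif u < birth + death:
            types[i] = types[-1]             # swap-remove the dying particle
            types.pop()
        else:
            types[i] = next_type             # mutation during life
            next_type += 1
    return sorted(Counter(types).values(), reverse=True)


if __name__ == "__main__":
    sizes = simulate_families(6.0, birth=2.0, death=1.0, life_mut_rate=0.5, seed=1)
    print("population:", sum(sizes), "families:", len(sizes), "largest:", sizes[:5])
```

The returned list is empty when the population is extinct at time `t_max`. Setting `life_mut_rate=0` gives mutations at birth only, and setting `p_birth_mut=0` gives mutations during life only.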

#### Acknowledgments

This work was supported by project MANEGE ANR-09-BLAN-0215 (French National Research Agency). The authors want to thank an anonymous referee for his/her careful check of this paper.

#### References

1. J. Geiger, “Size-biased and conditioned random splitting trees,” *Stochastic Processes and their Applications*, vol. 65, no. 2, pp. 187–207, 1996.
2. J. Geiger and G. Kersting, “Depth-first search of random trees, and Poisson point processes,” in *Classical and Modern Branching Processes (Minneapolis, MN, 1994)*, vol. 84 of *IMA Volumes in Mathematics and its Applications*, pp. 111–126, Springer, New York, NY, USA, 1997.
3. A. Lambert, “The contour of splitting trees is a Lévy process,” *The Annals of Probability*, vol. 38, no. 1, pp. 348–395, 2010.
4. P. Jagers, *Branching Processes with Biological Applications*, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, Wiley-Interscience, London, UK, 1975.
5. M. A. Nowak, F. Michor, and Y. Iwasa, “The linear process of somatic evolution,” *Proceedings of the National Academy of Sciences*, vol. 100, no. 25, pp. 14966–14969, 2003.
6. Y. Iwasa, M. A. Nowak, and F. Michor, “Evolution of resistance during clonal expansion,” *Genetics*, vol. 172, no. 4, pp. 2557–2566, 2006.
7. S. Sagitov and M. C. Serra, “Multitype Bienaymé-Galton-Watson processes escaping extinction,” *Advances in Applied Probability*, vol. 41, no. 1, pp. 225–246, 2009.
8. R. Durrett and S. Moseley, “Evolution of resistance and progression to disease during clonal expansion of cancer,” *Theoretical Population Biology*, vol. 77, no. 1, pp. 42–48, 2010.
9. R. Durrett and J. Mayberry, “Traveling waves of selective sweeps,” *The Annals of Applied Probability*, vol. 21, no. 2, pp. 699–744, 2011.
10. K. Danesh, R. Durrett, L. J. Havrilesky, and E. Myers, “A branching process model of ovarian cancer,” *Journal of Theoretical Biology*, vol. 314, pp. 10–15, 2012.
11. D. G. Kendall, “Birth-and-death processes, and the theory of carcinogenesis,” *Biometrika*, vol. 47, pp. 13–21, 1960.
12. T. Stadler, “Inferring epidemiological parameters based on allele frequencies,” *Genetics*, vol. 188, no. 3, pp. 663–672, 2011.
13. A. Lambert and P. Trapman, “Splitting trees stopped when the first clock rings and Vervaat's transformation,” *Journal of Applied Probability*. In press.
14. J. Gani and G. F. Yeo, “Some birth-death and mutation models for phage reproduction,” *Journal of Applied Probability*, vol. 2, pp. 150–161, 1965.
15. S. P. Hubbell, *The Unified Neutral Theory of Biodiversity and Biogeography*, Princeton University Press, Princeton, NJ, USA, 2001.
16. B. Haegeman and R. S. Etienne, “Relaxing the zero-sum assumption in neutral biodiversity theory,” *Journal of Theoretical Biology*, vol. 252, no. 2, pp. 288–294, 2008.
17. A. Lambert, “Species abundance distributions in neutral models with immigration or mutation and general lifetimes,” *Journal of Mathematical Biology*, vol. 63, no. 1, pp. 57–72, 2011.
18. W. J. Ewens, “The sampling theory of selectively neutral alleles,” *Theoretical Population Biology*, vol. 3, pp. 87–112, 1972; erratum, ibid. vol. 3, p. 240, 1972; erratum, ibid. vol. 3, p. 376, 1972.
19. R. C. Griffiths and A. G. Pakes, “An infinite-alleles version of the simple branching process,” *Advances in Applied Probability*, vol. 20, no. 3, pp. 489–524, 1988.
20. J. Bertoin, “The structure of the allelic partition of the total population for Galton-Watson processes with neutral mutations,” *The Annals of Probability*, vol. 37, no. 4, pp. 1502–1523, 2009.
21. J. Bertoin, “A limit theorem for trees of alleles in branching processes with rare neutral mutations,” *Stochastic Processes and their Applications*, vol. 120, no. 5, pp. 678–697, 2010.
22. A. G. Pakes, “An infinite alleles version of the Markov branching process,” *Journal of the Australian Mathematical Society A*, vol. 46, no. 1, pp. 146–169, 1989.
23. Y. E. Maruvka, N. M. Shnerb, and D. A. Kessler, “Universal features of surname distribution in a subsample of a growing population,” *Journal of Theoretical Biology*, vol. 262, no. 2, pp. 245–256, 2010.
24. Y. E. Maruvka, D. A. Kessler, and N. M. Shnerb, “The birth-death-mutation process: a new paradigm for fat tailed distributions,” *PLoS ONE*, vol. 6, no. 11, article e26480, 2011.
25. Z. Taïb, *Branching Processes and Neutral Evolution*, vol. 93 of *Lecture Notes in Biomathematics*, Springer, Berlin, Germany, 1992.
26. P. Jagers and O. Nerman, “The growth and composition of branching populations,” *Advances in Applied Probability*, vol. 16, no. 2, pp. 221–259, 1984.
27. M. Richard, *Arbres, Processus de branchement non markoviens et Processus de Lévy* [Ph.D. thesis], UPMC, Paris, France, 2011.
28. N. Champagnat and A. Lambert, “Splitting trees with neutral Poissonian mutations I: small families,” *Stochastic Processes and their Applications*, vol. 122, no. 3, pp. 1003–1033, 2012.
29. N. Champagnat and A. Lambert, “Splitting trees with neutral Poissonian mutations II: largest and oldest families.” In press, http://arxiv.org/abs/1108.4812.
30. J. Bertoin, *Lévy Processes*, vol. 121 of *Cambridge Tracts in Mathematics*, Cambridge University Press, Cambridge, UK, 1996.
31. O. Nerman, “On the convergence of supercritical general (C-M-J) branching processes,” *Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete*, vol. 57, no. 3, pp. 365–395, 1981.
32. M. Richard, “Limit theorems for supercritical age-dependent branching processes with neutral immigration,” *Advances in Applied Probability*, vol. 43, no. 1, pp. 276–300, 2011.
33. M. Abramowitz and I. A. Stegun, *Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables*, vol. 55 of *National Bureau of Standards Applied Mathematics Series*, Superintendent of Documents, U.S. Government Printing Office, Washington, DC, USA, 1964.
34. P. Jagers and O. Nerman, “Limit theorems for sums determined by branching and other exponentially growing processes,” *Stochastic Processes and their Applications*, vol. 17, no. 1, pp. 47–71, 1984.