#### Abstract

We present a stochastic methodology to study the decay phase of an epidemic. It is based on a general stochastic epidemic process with memory, suitable to model the spread in a large open population with births of any rare transmissible disease with a random incubation period and a Reed-Frost type infection. This model, which belongs to the class of multitype branching processes in discrete time, enables us to predict the incidences of cases and to derive the probability distributions of the extinction time and of the future epidemic size. We also study the epidemic evolution in the worst-case scenario of a very late extinction time, making use of the *Q*-process. We provide in addition an estimator of the key parameter of the epidemic model quantifying the infection and finally illustrate this methodology with the study of the Bovine Spongiform Encephalopathy epidemic in Great Britain after the 1988 feed ban law.

#### 1. Introduction

Outbreaks of infectious diseases of animals or humans are subject, when possible, to control measures aiming at curbing their spread. Effective measures should force the epidemic to enter its decay phase and to reach extinction. The decay phase can then be simply detected by a decrease of the number of cases, when this decrease is obvious. However this is not always the case, and this rough qualitative information might not be sufficient to evaluate accurately the effectiveness of the proposed measures to reduce the final size and duration of the outbreak. The goal of this paper is to present a stochastic methodology in discrete time to study more accurately the decay phase of an epidemic. Our framework is the spread, in a large open population, of a rare transmissible disease such that the infection process may be assumed to follow a Reed-Frost type model, with a probability for a susceptible to become infected by a given dose of pathogens inversely proportional to the total population size. Moreover the latent period (during which an individual is infected but not yet infectious) may be random and long compared to the generation time. Questions about the decay phase include the following: which quantitative criteria can ensure that the disease has entered an extinction phase? What is the probability distribution of the epidemic extinction time, of the epidemic final size, and of the incidence of infected individuals? Finally, what would be the evolution of the epidemic in the event of a very late extinction of the disease?

From a practical point of view, it is generally impossible to observe all infections. Susceptible and infected but not yet infectious individuals are most often not distinguishable, being both apparently healthy. This leads to the fact that the only available observations correspond to the incidence of individuals with clinical symptoms. One way to deal with this lack of information was proposed in [1] by Panaretos, who used a model taking into account two types of infected individuals, the observed and the unobserved. In order to answer the previous questions, we choose here a different approach, considering a stochastic model depending on the sole incidences of infectives at each time. We assume that an infective can transmit the disease during one given time unit at most. Therefore the incidence of infectives corresponds to the incidence of cases. The process then describes in a recursive way how one single infective can indirectly generate new infectives (so-called “secondary cases”) time units later, where . We assume that this number of secondary cases follows a Poisson probability distribution with parameter . The recursive formula defining is then the following: where the variable is the incidence of secondary cases produced at time with a delay (latent period) by individual infectious at time . The are assumed independent given , and the are assumed i.i.d. (identically and independently distributed) given , with a common Poisson distribution with parameter . This model is therefore time homogeneous and is in this sense less general than the one introduced in [2], which describes the spread of infectious animal diseases in a varying environment. However since we focus on the extinction phase only, the assumption of a constant environment with no new control measure is well founded and enables us to describe more accurately the decay phase. This process is the generalization of the well-known single-type BGW (Bienaymé-Galton-Watson) branching process, which is the limit, as the total population size tends to infinity, of the process describing the spread in a closed population of an infectious disease with a negligible latent period and a probability to become infected following a Reed-Frost model (see, e.g., [3, 4] and citations therein).

The core of the paper lies in Section 2, where the whole methodology is presented. We first formulate the epidemic model as a multitype branching process with Poissonian transitions, the types representing the memory of the process. This formulation provides useful analytical results such as the extinction criteria, and the distributions of the extinction time and of the epidemic size (Sections 2.1–2.3). Then, in order to investigate the worst-case scenario of an extreme late extinction of the epidemic, we introduce in Section 2.4 the -process , obtained by conditioning on a very late extinction. Using this process, we focus the study on the early behavior of the decay phase in the worst-case scenario, rather than on its long range behavior, which would have little meaning in our setting. Motivated by practical applications to real epidemics, for which we want to predict the processes and , as well as the derived distributions above, we need to know the values of the parameters . We may write , where is the mean number of individuals infected at age by an infective by direct or indirect transmission, is the largest survival age, is the probability for the individual aged at infection to have a latent period equal to (given his survival), and is the probability to be aged at the end of the latent period. Parameters are the key quantities for the spread of the disease and can be subject to changes due to control measures during the epidemic. We assume here that , where is constant over time, while may change with control measures. A typical example is when is the mean number of individuals infected by an infective by horizontal route at age , assumed independent of , and represents the maternal transmission probability . In this case So we assume here that, except for , the other parameters of the are constant over time and are known (generally estimated) from previous experiences or from the study of the whole epidemic evolution, in particular its growth phase. We moreover assume that each depends affinely on (see Section 3.1). In Sections 2.5 and 2.6, we provide optimal WCLSE (Weighted Conditional Least Squares Estimators) of in the decay phase, in the frame of as well as in the frame of the associated conditioned process , and prove the strong consistency and the asymptotic normality of these estimators.

The final Section 3 is devoted to the application of this method to real epidemics. We first present in Section 3.1 some general conditions under which the spread of a SEIR disease (susceptible, exposed (latent), infectious, removed) can be approximated by our epidemic process defined by (1.1) and give an explicit derivation of the parameters . We then illustrate the methodology in Section 3.2 with the decay phase of the BSE (Bovine Spongiform Encephalopathy) epidemic in Great Britain. According to the available data [5], the epidemic is obviously fading out. We assume that the satisfy (1.2). Then thanks to the stochastic tools developed here, we provide in addition to this rough information, short- and long-term predictions about the future spread of the disease as well as an estimation of a potentially remaining horizontal infection route after the 1988 feed ban law.

#### 2. Methodology for the Study of an Epidemic Decay Phase

In this section we present a general methodology to study the decay phase of a SEIR disease in a large population, modeled by the process (1.1) defined in Section 1. Our main goal is to provide analytical tools to evaluate the efficiency of the last control measures taken prior to the considered time period. Most of our results are derived from the fact that this epidemic model can be seen as a multitype branching process. Indeed, defined by (1.1) is a Markovian process of order . Consequently, the -dimensional process defined by is Markovian of order 1, and it stems directly from (1.1) that is a multitype Bienaymé-Galton-Watson (BGW) process with types (see, e.g., [6]). Note that the types in this branching process do not correspond to any attribute of the individuals in the population, which is usually the case in mathematical biology (see, e.g., [7]), but simply correspond to the memory of the process . The information provided by the -dimensional Markovian process is therefore the same as the one given by the 1-dimensional -Markovian process , but the multitype branching process setting gives us powerful mathematical tools and results stemming from the branching processes theory [6]. The first basic tool is the generating function of the offspring distribution of , , defined on by , where denotes the th basis vector of and for . For all , we have here The second basic tool is the mean matrix defined by , which is here Let us notice that, since for each , then is nonsingular, positive regular (see [6]) and satisfies the condition, where denotes the sup norm in .

##### 2.1. Extinction of the Epidemic

###### 2.1.1. Almost Sure Extinction

Since the single-type process has a memory of size , it becomes extinct when it is null at successive times, or equivalently as soon as the -dimensional process reaches the -dimensional null vector . According to the theory of multitype positive regular and nonsingular BGW processes [6], the extinction of the process occurs almost surely (a.s.), if and only if , where is the dominant eigenvalue (also called the Perron's root) of the mean matrix . Thus is solution of . In general for , has no explicit expression. However, leads directly to the following explicit threshold criteria.

Proposition 2.1. *The epidemic becomes extinct almost surely if and only if , where is the total mean number of secondary cases generated by one infective in a disease. We call the basic reproduction number.**Moreover, when , then with equality if and only if either or , and when , then with equality if and only if .*

Note that when , only provides information about the threshold level, whereas provides an additional information about the speed of extinction of the process, as shown in the next two paragraphs.

###### 2.1.2. Speed of Extinction

Thanks to well-known results in the literature about multitype branching processes and more particularly to the Perron-Frobenius theorem (see, e.g., [6]), we can deduce the expected incidence of infectives in the population at time , for large. Denoting by and the right and left eigenvectors of associated to the Perron's root ; that is, and , with the normalization convention , where stands for the usual scalar product in and where the superscript denotes the transposition, then . The first coordinate in the latter formula becomes for the epidemic process . Computing explicitly and , we obtain that for all , which leads to the following asymptotic result: Hence if , the mean number of infectives decreases exponentially at the rate . In the following section, we provide a much finer result on the estimation of the disease extinction time in the population.

###### 2.1.3. Extinction Time of the Epidemic

The extinction time distribution can be derived as a function of the offspring generating function. As usual in stochastic processes, this quantity is calculated conditionally on the initial value , but for the sake of simplicity we do not let it appear in the notations. Note that since we are building tools for the prediction of the spread of the disease, the time origin corresponds here to the time of the last available data (generally the current date). Let denote the extinction time of the process , and let be the th iterate of the generating function given by (2.2). We denote . Then, by the branching property of the process [6], the probability of extinction of the epidemic before time is immediately given by , that is to say, It can be immediately deduced from convergence results for as [8], that if , , while if , , for some constants . As a consequence, the closer is to unity, the longer the time to extinction will be in most realizations. More specifically, (2.7) enables the exact computation (resp., estimation) of for any by the iterative computation of , being given, when the parameters of (1.1) are known (resp., estimated). Moreover, since for the epidemic becomes extinct in an a.s. finite time and , then for any given probability there exists such that . So in practice, for any , (2.7) enables us to compute the -quantile of the extinction time,

##### 2.2. Total Size of the Epidemic

Under the assumption and the independence of the (we previously assumed the independence of the , for each ), we derive the distribution of the future total size of the epidemic until its extinction, that is, the future total number of infectives until the extinction of the disease. It can be shown [9] that, given the initial value , the time origin being the same as in Section 2.1, the probability distribution of is where denotes the equality in distribution, an empty sum is by convention , the are independent, and the and the are i.i.d. with that is, for each , . Consequently, under the convention that an empty product is 1, the probability distribution of is, for any , which may be calculated (resp., estimated), replacing the by their values (resp., estimations). In practice, for any , (2.11) enables to compute the -quantile of the total epidemic size, We obtain moreover an explicit formula for the mean value and variance of the size of the epidemic,

##### 2.3. Exposed Population

Depending on the disease, it might also be crucial to study and predict the evolution of the incidence of exposed individuals in the population, which is generally unobservable. We assume that this information is given by the process defined by the conditional distribution , where is the mean number of individuals infected at time by an infective of this time (see Section 3.1). This property enables on the one hand to reconstruct the whole past epidemic (i.e., the incidence of infectives as well as of exposed individuals) thanks to the observable data. On the other hand, it allows to simulate the evolution of the incidence of exposed individuals in the future, based on predictions of the evolution of the epidemic process .

##### 2.4. Worst-Case Scenario: Very Late Extinction of the Epidemic

Even in the case when the epidemic dies out almost surely (), and although one can provide the -quantile of the extinction time with the probability as large as wanted (see (2.8)), the epidemic might become extinct after this time with a small but nonnull probability of order . This raises the following question: how would the incidences of infectious and exposed individuals evolve in the (unlikely) case of a very late extinction? In terms of risk analysis, this issue appears to be crucial to evaluate the risks associated with this worst-case scenario. The tools developed in the previous subsections allow to evaluate the probability of all possible outcomes. But since the worst ones, typically a very late extinction, have a negligible probability, these tools do not bring any information in these worst cases and in particular do not inform on the evolution at each time step of the spread of the disease (would it decrease extremely slowly, stay at a constant rate for a very long time, present several peaks in its evolution, etc.?). In order to study the propagation of the epidemic in the decay phase, assuming that extinction occurs very late, we are interested in the distribution of the process conditionally on the event that the epidemic has not become extinct at time , where is very large. We therefore consider for any and any the conditioned probability , where the subscript denotes the initial value. If is finite this distribution cannot be easily handled due to its time inhomogeneity. However, when , it is known ([10]) that this conditioned distribution converges, as , to the distribution of a -dimensional Markov process : We will further discuss in Proposition 2.5 the relevancy of approximating the conditioned probability for fixed by the limiting object (2.14). The conditioned process defined by (2.14) is known in the literature as the -process associated with , also described as the process conditioned on “not being extinct in the distant future.” It has the following transition probability [10]: for every , , , where is the normalized right eigenvector of associated to the Perron's root as introduced in Section 2.1, and computed explicitly in (2.5). In the same way as for the process , we define the 1-dimensional process . By construction we then have , for each and each .

Proposition 2.2. *The stochastic process is, conditionally on its past, distributed as the sum of two independent Poisson and Bernoulli random variables:
**
where , is the convolution product symbol, and
*

*Proof. *Applying (1.1) and (2.15), we obtain that for all ,
The equality implies that for all , , and that . Consequently, , and thus

*Remark 2.3. *Note that if one compares (2.16) with the transition probability of the unconditioned process , it appears that behaves at each time step like , according to a Poisson distribution, except that it has the possibility at each time step to add one unit or not, according to a Bernoulli random variable. Moreover, if , then according to (2.17), , which implies that at time , the probability to add one unit is equal to one, thus preventing the extinction of the process.

Proposition 2.4. *The process admits a stationary probability measure with finite first- and second-order moments.*

*Proof. *Since the multitype branching process satisfies property (2.4), it is known [10] that the -process is positive recurrent with a stationary probability measure given by
where is the Yaglom distribution of the process , uniquely defined by the following property: for all , . In the literature, this stationary measure for the conditioned process is also referred to as the doubly limiting conditional probability. Moreover, by Proposition 2.2,
which implies that . We consequently obtain by means of Fatou's lemma that, for every ,
We similarly prove that has finite second-order moments by writing

Let us discuss the relevancy of approximating the epidemic process conditioned on nonextinction at some finite time , for large, by the -process obtained by letting . When considering the case of late extinction, one works under an hypothetical assumption based on the unknown future, hence in practice one does not focus on a specific value for the survival of the disease in the population. We therefore might consider that is chosen large enough such that the approximation of the process conditioned on the event by the process is valid. Of course, the order of magnitude of such will depend on the rate of convergence of the conditioned process to .

Proposition 2.5. *Let and . Then the difference decreases, as , with , where is an eigenvalue of such that , with the being the other eigenvalues of . In case we stipulate that the multiplicity of is at least as great as the multiplicity of .*

*Proof. *Thanks to (2.15) and to the Markov property together with the fact that , we have
The right term of (2.24) is known to converge to 0, as , thanks to the property that , for some , where (see [8]). This stems from two convergences, namely, , where , and . Let us write , where . Since and , it comes
It thus appears that the rate of convergence of to is of the same order of magnitude as the one of to . Let us determine this rate in an accurate way. We use the following inequality produced by Joffe and Spitzer in [8]: for each ,
where we are going to replace and by some explicit formulae function of and . For this purpose, we use a detailed asymptotic behavior of , as , presented for instance in [11]; we have , where . For the sake of clarity the symbol will denote either a scalar or a matrix with all the entries satisfying the associated property. This implies the existence of some constant such that, for all , , where . Moreover, following [8], let us write, for all , , where , and as . Then , which implies the existence of some constant such that, for all , , with . We thus have provided an explicit formula for the sequences and introduced by Joffe and Spitzer in [8]. Finally let us apply (2.26) to and replace in this inequality and by their explicit expressions that we got. We obtain that, for all ,
Consequently, the right member of (2.24) will decrease with , as .

Hence the concept of the -process will have most practical relevance to approximate the very late extinction case if is near to zero and if is small compared with . Note however that the very late extinction scenario is more likely to happen if is near to unity because the time to extinction in most realizations will then be long (see Section 2.1).

##### 2.5. Estimation of the Infection Parameter

We assume for this subsection that the parameters of the epidemic model (1.1)–(2.1) are not entirely known. More precisely, we assume that the are of the form, for all , where and are constants, and is an unknown real parameter. We will write in what follows , where and so forth. This general assumption corresponds in particular to the case where the are of the form (1.2).

We estimate by the following WCLSE in model (1.1)–(2.1). This estimator generalizes the well-known Harris estimator [12] in a BGW process. Let , , such that . The WCLSE is based on the normalized process and is defined by We easily derive the following explicit form: The normalization of the process by appears to be the most natural and suitable for the following reasons. First, this normalization generalizes the normalization in the monotype case, which is the one leading to the Harris estimator of since we have, for , . It also corresponds, in the linear case , to the maximum likelihood estimator of . In addition, defining for any vector , and , we have hence the conditional variance of the error term in the stochastic regression equation is invariant under multiplication of the whole process, and bounded respectively to , leading to the quasi-optimality of at finite and , in the sense of [13].

Let us provide asymptotic results for the estimator defined by (2.30), as the initial population size tends to infinity. We denote by the th entry in the th power of the matrix given by (2.3).

Theorem 2.6. *Let us assume that, for each , there exists some such that . Then is strongly consistent, that is, , and is asymptotically normally distributed:
**
where
*

*Proof. *Let us first prove that, for each and each ,
Using the branching property of the process ([6]) we write
where, for all and , is the th coordinate of a -type branching process at time initialized by a single particle of type . For , , and fixed the random variables are i.i.d. with mean value . According to the strong law of large numbers and under the theorem assumption, we have, for every such that ,
which together with the theorem assumption leads to (2.34).

To prove the consistency of we apply (2.34) to (2.30), using the fact that and , and obtain
By definition,
hence (2.37) immediately leads to the strong consistency.

We are now interested in the asymptotic distribution of . We derive from (2.30) that
By (1.1),
where the are i.i.d. given , following a Poisson distribution with parameter , and the are independent given . Renumbering the we then obtain
Applying a central limit theorem for the sum of a random number of independent random variables (see, e.g., [14]), we obtain that, for all ,
We have used the fact that is a real positive sequence growing to infinity, and is a sequence of integer-valued random variables such that converges in probability to a finite random variable. In our case the limit is actually deterministic, since we have shown in (2.34) that
Using (2.41) in (2.39), we write
Using again (2.34),
which, combined to (2.42) and (2.44), implies by Slutsky's theorem that
By (2.33) and the strong consistency, , from which we finally deduce (2.32).

##### 2.6. Estimation of the Infection Parameter in the Worst-Case Scenario

In order to make predictions of the evolution of the epidemic in case of a very late extinction, that is, in order to make predictions of the behavior of the conditioned process introduced in Section 2.4, we need to estimate the parameter in the setting of this conditioned process. We point out that does not play the same role in the conditioned process and in the unconditioned process , since, as shown in Proposition 2.2, this parameter interferes not only in the Poisson random variable but also in the Bernoulli one. It would thus be irrelevant to estimate with an estimator aimed for the unconditioned process, such as . Let us notice that, according to (2.16), the process could be written as a multitype branching process with state dependent immigration. Because of this state-dependence, and since the parameter acts in a nonlinear way in the immigration, the methods developed in estimation theory for branching processes with immigration (see, e.g., [15]) cannot be directly applied here. Similarly as in Section 2.5 we consider the WCLSE based on the process , namely, where is defined in Section 2.5, and where Let be the error term between the normalized process and its conditional expectation. We obtain that which implies In what follows, we denote by the derivative of with respect to , and similarly for the other quantities depending on .

Theorem 2.7. *The estimator is strongly consistent, that is, , and has the following asymptotic distribution:
*

*Remark 2.8. *Note that (2.51) involves the function and thus requires the knowledge of the derivative of the function given in (2.5), which is not an explicit function of since is not either. However, satisfies ; hence is known as soon as can be computed. Consequently, denoting by when is the parameter of the model, (2.51) can be used as soon as is known; for this purpose one can for instance numerically approximate the largest solution of .

*Proof. *The proof heavily relies on a strong law of large numbers for homogeneous irreducible positive recurrent Markov chains applied to the conditioned process and its stationary distribution (given by Proposition 2.4), which states [16]: for every -integrable function ,
Note that the Perron's root of and the associated right normalized eigenvector are -functions of .

In order to prove the strong consistency of as , since is not linear, we cannot use the general standard method based on the first-order expansion of at , and on the strong law of large numbers for martingales applied to correctly normalized. We consequently use the following conditions given in [17]:(i) is Lipschitz on ; that is, there exists a nonnegative -measurable function (where ), satisfying, for all , , a.s., (ii),(iii). Let us note that, in the frame of a general model , since is compact, (i) is satisfied as soon as has a first derivative in with , (ii) is satisfied for any optimal estimator in the sense of [13] and moreover could be weakened (see [17]), and (iii) is a necessary condition. First, for all and , , , where denotes the derivative of with respect to , which thanks to (2.17) is bounded on . Condition (i) is thus satisfied. Condition (ii) follows from (2.50). Condition (iii) comes from the fact that, for every and every such that , applying the mean value theorem to the -function ,
Let us show that the function is -integrable. For every , and , denoting (by continuity of on ), and ,
Hence, for all , , and applying (2.52) together with (2.22) we obtain that
Let fixed. Since, for all , , the extreme value theorem implies that . Hence the right term in (2.55) is strictly positive, which together with (2.55) leads to (iii).

Let us now consider the asymptotic distribution of . For this purpose, we follow the steps of the proof of Proposition 6.1 in [17]. Writing the Taylor expansion of at we obtain that , for some , with . Since , we can write
where . Let us first show that
This is an application of (2.52) and (2.22), since, for all , ,
In view of (2.56), we now prove that
Computing thanks to the formula , it appears that (2.59) is true, as soon as the following holds:
Let us prove (2.60)–(2.62). First, (2.60) is given by a strong law of large numbers proved in [17, Proposition 5.1]. The latter can be indeed applied since (as an immediate consequence of the stronger result (2.57)), and since fulfills the required Lipschitz condition. Indeed we proved earlier that is continuous on the compact set and is thus bounded on , which implies that is bounded by a -measurable function. In view of (2.61), we consider the function and its derivative . Similarly as for (2.54), one can show that there exists a constant such that for all , and all , . This implies
Consequently,
which by (2.57) and the strong consistency of almost surely tends to . Writing
this implies (2.61). It now remains to prove (2.62). We write