Model Selection and Evaluation Based on Emerging Infectious Disease Data Sets including A/H1N1 and Ebola

Liu, Wendi; Tang, Sanyi; Xiao, Yanni

doi:https://doi.org/10.1155/2015/207105

Computational and Mathematical Methods in Medicine

Research Article | Open Access

Volume 2015 | Article ID 207105 | https://doi.org/10.1155/2015/207105

Academic Editor: Chung-Min Liao

Received 26 Jun 2015

Revised 19 Aug 2015

Accepted 27 Aug 2015

Published 15 Sept 2015

Abstract

The aim of the present study is to apply simple ODE models to the spread of emerging infectious diseases and to show the importance of model selection in estimating parameters, the basic reproduction number, the turning point, and the final size. To quantify the plausibility of each model, given the data and a set of four models (the Logistic, Gompertz, Rosenzweig, and Richards models), the Bayes factors are calculated, and precise estimates of the best fitted model parameters and key epidemic characteristics are obtained. In particular, for Ebola the basic reproduction numbers are 1.3522 (95% CI (1.3506, 1.3537)), 1.2101 (95% CI (1.2084, 1.2119)), 3.0234 (95% CI (2.6063, 3.4881)), and 1.9018 (95% CI (1.8565, 1.9478)), the turning points are November 7, November 17, October 2, and November 3, 2014, and the final sizes until December 2015 are 25794 (95% CI (25630, 25958)), 3916 (95% CI (3865, 3967)), 9886 (95% CI (9740, 10031)), and 12633 (95% CI (12515, 12750)) for West Africa, Guinea, Liberia, and Sierra Leone, respectively. The main results confirm that model selection is crucial in evaluating and predicting the important quantities describing emerging infectious diseases, and that arbitrarily picking a model without any consideration of alternatives is problematic.

1. Introduction

Emerging and reemerging infectious diseases such as severe acute respiratory syndrome (SARS) in 2003 [1, 2], the novel influenza (A/H1N1) pandemic in 2009 [3], and the Ebola outbreak in West Africa in 2014 significantly affect public health, economic activity, and population movements. In particular, the 2014 Ebola outbreak in West Africa represents the largest outbreak of Ebola virus to date. Public health interventions have been introduced in all affected countries and have had a marked effect on the infection. However, the number of Ebola cases showed a trend of bouncing back after declining in February 2015. This indicates that the outbreak patterns of Ebola in West Africa have become increasingly complex, and hence it is important to determine the best model by employing mathematical models and model selection methods, which can then be used to estimate and predict the characteristics of emerging infectious diseases.

Although susceptible-infective-removal (SIR) compartmental model is commonly used to describe the transmission dynamics of an infectious disease, it cannot be used when we consider only the cumulative infected population and capture the temporal variations of an outbreak, such as the turning point that is the point in time at which the rate of accumulation changes from increasing to decreasing. Several models have been proposed to estimate basic reproduction number, turning point, and final size by cumulated cases; some of them are based on purely empirical relationship, while others have a theoretical basis and are realized by differential equations. The simplest and commonly applied model among all the infectious disease models is the Richards model [3–5]. By employing Richards model, Hsieh et al. investigated the characteristics including basic reproduction number, turning point, and final size for influenza such as H1N1 [3], SARS [4], and Dengue [5] by fitting Richards model to the reported cumulative cases.

The most common approach in infectious disease data analyses with simple ODE models is to select one model, usually the Richards model, based on the shape of the desired curve and on biological assumptions. A single wave of infections, consisting of a single peak of high incidence, an S-shaped cumulative epidemic curve, and a single turning point, can then be fitted to the data using the selected model. Inference and estimation of parameters and their precision are based on the fitted model. Therefore, the interesting questions are as follows: Can the Richards model effectively predict the growth of the cumulative infected population? How should the best model for fitting emerging infectious disease data be selected? Is it possible to predict the turning point and final size and to effectively estimate the basic reproduction number, all of which are important in disease control and management?

The traditional approaches of hypothesis testing, when applied to model selection, have often been found to be mediocre [6, 7]. The adjusted coefficient of multiple determination, which is often used in model selection, was found to be a very poor approach [8]. Posada and Buckley [9] pointed out that the Bayesian and Akaike's information criterion (AIC) approaches present several important advantages over other model selection methods. Therefore, in the present work we employ Bayes factors to select, from a set of competing models, the one that best captures the underlying disease outbreak, and we further confirm the selection by calculating AIC values. The basis of the Bayes factor approach to model selection is quantifying the plausibility of each model when the data and the set of candidate models are given. The Bayes factor is a measure of the change from prior model odds to posterior model odds brought about by the observed data. In this study, we calculate the Bayes factor from the ratio of the numbers of times the different models are selected: we sample from the joint space formed by the product of the model indicator and the parameters of each model and then estimate the posterior probability of each model using Metropolis-Hastings (MH) algorithms.

In Section 2 we initially present the data sources and the important quantities describing the emerging infectious diseases for this study and then briefly provide the approaches of Bayesian model selection and realization algorithms. In Section 3 we verify the validity of the model selection algorithm introduced in Section 2. In Section 4 based on the real data sets for 2009 A/H1N1 in Shaanxi Province of mainland China and the data sets for current Ebola infection in West Africa, we select the optimal model and examine the specifics of the corresponding diseases. In particular, we focus on estimating basic reproduction number, turning point, and final size of A/H1N1 and Ebola and then explain some important issues related to the emerging infection disease control. Finally, we conclude by summarizing important conclusions and emphasizing the importance of model selection.

2. Data and Methods

2.1. Data Sources, Basic Reproduction Number, Turning Point, and Final Size

We employ the data on laboratory-confirmed cases of pandemic A/H1N1 influenza admitted to the 8th Hospital of Xi’an, the Province’s Public Health Information System [11, 13], in 2009. The data included information on the daily number of hospital notifications and the number of newly reported hospital notifications (local/imported cases in mainland China or community/sporadic cases in Shaanxi Province). For the Ebola data sets, we use the data from the WHO website for the most serious regions including Guinea, Liberia, and Sierra Leone from March 25, 2014, to May 3, 2015. Note that the data for Ebola are the sum of confirmed, probable, and suspected cases.

As mentioned in Section 1, the main purpose is to choose the best model from several single species growth models, which will help us to evaluate the characteristics of emerging infectious diseases including A/H1N1 and Ebola. In particular, the basic reproduction number, turning point, and final size are the most important quantities describing emerging infectious diseases. Thus, we first estimate the parameter values for each candidate model based on the data sets and then determine the best fitted model to calculate the basic reproduction number for A/H1N1 in China and Ebola in different regions of West Africa. The basic reproduction number $R_0$ can be obtained from the formula $R_0 = \exp(rT)$ [3, 4], where $r$ denotes the intrinsic growth rate and $T$ is the generation time of disease transmission.
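As a minimal sketch of this calculation, assuming the relation $R_0 = \exp(rT)$ used with the Richards model in [3, 4], the snippet below evaluates $R_0$ for a given growth rate and generation time. The function name and the numerical inputs are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def basic_reproduction_number(r: float, T: float) -> float:
    """Return R0 = exp(r * T).

    r: intrinsic growth rate (per day); T: generation time (days).
    """
    return math.exp(r * T)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    r_example = 0.05  # per day
    T_example = 10.0  # days
    print(basic_reproduction_number(r_example, T_example))
```

Because $R_0$ depends exponentially on the product $rT$, uncertainty in the generation time translates directly into uncertainty in $R_0$.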

Secondly, the turning point (or the inflection point of the cumulative case curve), defined as the time when the rate of case accumulation changes from increasing to decreasing (or vice versa), will be estimated for A/H1N1 in China and Ebola in different regions of West Africa. The turning point marks the moment at which the change in the rate of case accumulation switches from positive to negative, that is, the moment at which new cases begin to decline. Precisely estimating this point allows us to determine either the beginning of a new epidemic phase or the peak of the current epidemic phase, representing the point at which disease control activities take effect or the point at which an epidemic begins to wane naturally, as defined by Hsieh et al. [14].

The final size of an emerging infectious disease is another important quantity for public health, which is the likely magnitude of the outbreak, and it is often called the expected final size of the epidemic [14, 15].
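To make the notions of turning point and final size concrete, the following sketch assumes a Richards-type cumulative curve satisfying $dN/dt = rN[1 - (N/K)^a]$ with $N(0) = N_0$. Under that assumption the curve has a closed form, the final size is $K$, and the inflection occurs where $N = K(1+a)^{-1/a}$. The function names and parameter values are hypothetical.

```python
import math


def richards_curve(t, r, K, a, N0):
    """Closed-form solution of dN/dt = r*N*(1 - (N/K)**a) with N(0) = N0."""
    v0 = (K / N0) ** a
    return K * (1.0 + (v0 - 1.0) * math.exp(-a * r * t)) ** (-1.0 / a)


def richards_turning_point(r, K, a, N0):
    """Time of the inflection point of the cumulative curve.

    Meaningful when N0 lies below the inflection level K*(1+a)**(-1/a).
    """
    v0 = (K / N0) ** a
    return math.log((v0 - 1.0) / a) / (a * r)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    r, K, a, N0 = 0.2, 1000.0, 0.5, 10.0
    t_turn = richards_turning_point(r, K, a, N0)
    print(t_turn, richards_curve(t_turn, r, K, a, N0))
```

With $a = 1$ the curve reduces to the logistic case, where the turning point is reached at half of the final size.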

2.2. Metropolis-Hastings Algorithm and Bayesian Model Selection

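The principle of MCMC methods can be briefly described as follows: build a transition kernel $K$ with stationary distribution $f$ (the target density) and then generate a Markov chain $(X^{(t)})$ using this kernel such that the limiting distribution of $(X^{(t)})$ is $f$. The integral $\int h(x) f(x)\,dx$ can then be approximated by the standard average $\frac{1}{n}\sum_{t=1}^{n} h(X^{(t)})$. The Metropolis-Hastings (MH) algorithm is one of the methods to realize an MCMC algorithm; it produces a Markov chain with objective density $f$ and transition probability $q(y \mid x)$. The algorithm is as follows.

Given $x^{(t)}$, (1) move the chain to a new value $Y_t$ generated from the density $q(y \mid x^{(t)})$. (2) Take
$$X^{(t+1)} = \begin{cases} Y_t & \text{with probability } \rho(x^{(t)}, Y_t), \\ x^{(t)} & \text{with probability } 1 - \rho(x^{(t)}, Y_t), \end{cases}$$
where
$$\rho(x, y) = \min\left\{ \frac{f(y)}{f(x)} \frac{q(x \mid y)}{q(y \mid x)},\ 1 \right\}.$$
The distribution $q$ is called the instrumental (or proposal or candidate) distribution, and the probability $\rho(x, y)$ is the Metropolis-Hastings acceptance probability [16].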
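The two steps above can be sketched with a random-walk proposal. This is a minimal illustration and not the adaptive MH algorithm used in the study: it assumes a symmetric Gaussian proposal (so the proposal ratio cancels in the acceptance probability) and, purely as a test case, a standard normal target. All names are hypothetical.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for a one-dimensional target.

    log_target: log of the (unnormalized) target density f.
    The Gaussian proposal is symmetric, so rho = min(1, f(y) / f(x)).
    """
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_samples):
        y = x + rng.gauss(0.0, step)                 # step (1): propose a candidate
        log_rho = min(0.0, log_target(y) - log_target(x))
        if rng.random() < math.exp(log_rho):         # step (2): accept or stay
            x = y
        chain.append(x)
    return chain


if __name__ == "__main__":
    # Assumed test target: standard normal, log f(x) = -x^2 / 2 up to a constant.
    samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(mean, var)
```

Working with log densities avoids numerical underflow when the target is a product of many likelihood terms.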

Suppose that the observed data $y$ is generated by a model $M \in \mathcal{M}$, where $\mathcal{M}$ is the finite set of competing models. Corresponding to model $M = j$, there is a distinct unknown parameter vector $\theta_j$ of dimension $n_j$ and a prior model probability $\pi_j = p(M = j)$ with $\sum_{j \in \mathcal{M}} \pi_j = 1$. Let $\Theta_j$ be the set of all possible values for $\theta_j$; then $\theta_j \in \Theta_j$; and let $\theta = \{\theta_j, j \in \mathcal{M}\}$ be the collection of all model-specific $\theta_j$'s. Our interest lies in obtaining the posterior probabilities of the various models and then in determining the best model.

A slightly more direct (and more common) approach to estimating posterior model probabilities using MCMC is to include the model indicator $M$ as a parameter in the sampling scheme. As a result, most model settings require that the MCMC searches over the model and parameter spaces jointly. That is, the joint sampling space is
$$\mathcal{M} \times \prod_{j \in \mathcal{M}} \Theta_j.$$

Besides the marginal posterior model probabilities $p(M = j \mid y)$, this joint search also permits posterior estimation of the parameters under each model, $p(\theta_j \mid M = j, y)$. Assume that, corresponding to model $j$, the likelihood function is $f(y \mid \theta_j, M = j)$, the prior probability is $p(\theta_j \mid M = j)$, $M$ is merely an indicator of which $\theta_j$ is relevant to $y$, and $y$ is independent of $\{\theta_i, i \neq j\}$ given the model indicator $M = j$ [17]. The following four models are employed in the present work.

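Logistic model is as follows:
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right).$$

Gompertz model is as follows:
$$\frac{dN}{dt} = rN \ln\left(\frac{K}{N}\right).$$

Rosenzweig model is as follows:
$$\frac{dN}{dt} = rN\left[\left(\frac{K}{N}\right)^{a} - 1\right].$$

Richards model (the reverse Rosenzweig model) is as follows:
$$\frac{dN}{dt} = rN\left[1 - \left(\frac{N}{K}\right)^{a}\right].$$

For convenience, we denote the above four models as $M = 1$, $2$, $3$, and $4$, respectively, so $\mathcal{M} = \{1, 2, 3, 4\}$. Here the positive parameter $r$ denotes the intrinsic growth rate, $K$ represents the carrying capacity, and $a$ is the exponent of deviation. The above four models are widely used single species models which can be solved analytically and thus can be easily employed to fit the data and estimate the unknown parameters.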
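The four growth laws can be compared numerically with a short sketch. The right-hand sides below are the standard textbook forms of the Logistic, Gompertz, Rosenzweig, and Richards models, written with growth rate `r`, carrying capacity `K`, and exponent `a`; these forms and symbols are assumptions, as are the parameter values. A hand-written fourth-order Runge-Kutta integrator is used so that the example needs only the standard library.

```python
import math


def logistic(N, r, K):
    return r * N * (1.0 - N / K)


def gompertz(N, r, K):
    return r * N * math.log(K / N)


def rosenzweig(N, r, K, a):
    return r * N * ((K / N) ** a - 1.0)


def richards(N, r, K, a):
    return r * N * (1.0 - (N / K) ** a)


def integrate(rhs, N0, t_end, steps, *params):
    """Classical fourth-order Runge-Kutta for dN/dt = rhs(N, *params)."""
    h = t_end / steps
    N = N0
    for _ in range(steps):
        k1 = rhs(N, *params)
        k2 = rhs(N + 0.5 * h * k1, *params)
        k3 = rhs(N + 0.5 * h * k2, *params)
        k4 = rhs(N + h * k3, *params)
        N += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return N


if __name__ == "__main__":
    # Hypothetical parameters: all four curves level off at K.
    r, K, a, N0 = 0.2, 1000.0, 0.5, 10.0
    print(integrate(logistic, N0, 200.0, 2000, r, K))
    print(integrate(gompertz, N0, 200.0, 2000, r, K))
    print(integrate(rosenzweig, N0, 200.0, 2000, r, K, a))
    print(integrate(richards, N0, 200.0, 2000, r, K, a))
```

All four models share the same final size $K$ and differ in how quickly, and how symmetrically, the cumulative curve approaches it, which is why their estimates of the turning point and growth rate can disagree on the same data.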

Let $N(t_i)$ be the unknown true cumulative number of cases at time $t_i$, with $i = 1, \ldots, n$, and let $y_i$ denote the reported cumulative cases of the emerging infectious disease at the same times. Because the reported cases are subject to statistical error and are therefore inaccurate, we assume that the reported cases follow a Poisson process. Thus, if the real cumulative number of cases at a given time $t_i$ is $N(t_i)$, the probability of the number of cases reported is
$$P(y_i \mid N(t_i)) = \frac{N(t_i)^{y_i} e^{-N(t_i)}}{y_i!}.$$

Further, we assume that the parameter vector is $\theta = (r, K, a)$, in which the parameters are independent of each other. In particular, $\theta = (r, K)$ for the Logistic and Gompertz models. For simplicity, we select a noninformative prior distribution; that is, $p(\theta)$ is constant; thus the posterior distribution reads
$$p(\theta \mid y) \propto \prod_{i=1}^{n} \frac{N(t_i; \theta)^{y_i} e^{-N(t_i; \theta)}}{y_i!}.$$
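Under the Poisson assumption for the reported counts, and with a constant prior, the log posterior equals the Poisson log-likelihood up to an additive constant. The sketch below evaluates that log-likelihood for a list of reported counts and the corresponding model values; the function name and the toy numbers are hypothetical.

```python
import math


def poisson_log_likelihood(observed, expected):
    """Sum of log Poisson probabilities log P(y_i | mu_i).

    observed: reported cumulative counts y_i (non-negative integers).
    expected: model-predicted cumulative counts mu_i (positive floats).
    """
    total = 0.0
    for y, mu in zip(observed, expected):
        # log(mu^y * exp(-mu) / y!) = y*log(mu) - mu - log(y!)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


if __name__ == "__main__":
    # Toy data for illustration only.
    reported = [12, 30, 55, 80]
    model_values = [10.0, 32.0, 57.0, 78.0]
    print(poisson_log_likelihood(reported, model_values))
```

This function is what a sampler such as MH would evaluate at each proposed parameter vector, after solving the chosen growth model for the predicted cumulative counts.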

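The step of model selection with the Metropolis-Hastings algorithm is based on a proposal for a move from model $j$ to model $j'$, followed by acceptance or rejection of this proposal. Assume that the selection probability of model $j'$ is $h(j, j')$. The procedure given by Han and Carlin [17] is as follows: (1) Let the initial value be $(j, \theta_j)$. (2) Propose a new model $j'$ with probability $h(j, j')$. (3) Accept the proposed move (from $j$ to $j'$) with probability
$$\alpha = \min\left\{1,\ \frac{f(y \mid \theta_{j'}, M = j')\, p(\theta_{j'} \mid M = j')\, \pi_{j'}\, h(j', j)}{f(y \mid \theta_j, M = j)\, p(\theta_j \mid M = j)\, \pi_j\, h(j, j')}\right\}.$$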

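Under the usual regularity conditions, this MH algorithm will produce samples $\{M^{(g)}, g = 1, \ldots, G\}$ of the model indicator. Provided that the sampling chain for the model indicator mixes sufficiently well, the posterior probability of model $j$ can be estimated by
$$\hat{p}(M = j \mid y) = \frac{\text{number of } M^{(g)} = j}{\text{total number of } M^{(g)}},$$
which can in turn be used to estimate a Bayes factor as
$$\hat{B}_{jj'} = \frac{\hat{p}(M = j \mid y) / \hat{p}(M = j' \mid y)}{p(M = j) / p(M = j')}.$$
The criterion of model selection based on the Bayes factor is shown in Table 1.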
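The estimate of the posterior model probabilities and the Bayes factor can be sketched as a simple count over the sampled model indicators. The snippet assumes that models are labeled 1 to 4 in the order Logistic, Gompertz, Rosenzweig, Richards, and that the prior model probabilities are equal; the function names are hypothetical. As input it uses the selection counts reported for the last 2000 runs of the A/H1N1 chain (1102, 0, 0, and 898), so the resulting values illustrate the calculation only and need not match Table 3.

```python
from collections import Counter


def posterior_model_probabilities(indicator_chain, n_models):
    """Fraction of draws in which each model (labeled 1..n_models) was selected."""
    counts = Counter(indicator_chain)
    total = len(indicator_chain)
    return [counts.get(j, 0) / total for j in range(1, n_models + 1)]


def bayes_factor(post, prior, i, j):
    """Posterior odds of model i against model j divided by the prior odds."""
    prior_odds = prior[i - 1] / prior[j - 1]
    if post[j - 1] == 0.0:
        # Model j was never visited: infinite if model i was visited, undefined otherwise.
        return float("inf") if post[i - 1] > 0.0 else float("nan")
    return (post[i - 1] / post[j - 1]) / prior_odds


if __name__ == "__main__":
    chain = [1] * 1102 + [4] * 898          # counts quoted for the A/H1N1 data
    post = posterior_model_probabilities(chain, 4)
    prior = [0.25] * 4                       # assumed equal prior model probabilities
    print(bayes_factor(post, prior, 1, 4))   # Logistic against Richards
    print(bayes_factor(post, prior, 1, 2))   # Logistic against Gompertz
```

A Bayes factor close to 1 between two models, as for the Logistic and Richards models here, means that the data do not discriminate between them, while an infinite value arises when a competing model is never selected.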

Based on above procedures, we realize our model selection as follows. Firstly, we obtain the Markov chains having 500000 samplers for each parameter of each model, respectively, carrying out the MCMC procedure by using an adaptive MH algorithm. Then the best model can be selected dynamically with the Markov chains of all parameters as follows.(1)Let the initial value be , where is of dimension .(2)Generate a new model from the discrete uniform distribution , and , . Let and , ; when , let .(3)Repeat for .(4)Evaluate the acceptance probability of the move (from to ) by with .(5)Let , and then we have (6)Return the values ; then we have The estimation of the corresponding Bayes factor is

3. Validation of Model Selection Algorithm

In order to validate the proposed model selection algorithm, we generate the time series from a given model with known parameter values. To do this, we fix all parameter values of Richards model as and of Gompertz model as and the initial value . Solving, respectively, the two models from to , we get forty time points of each model, denoted by and respectively.

Using the simulated data set generated from the Richards model, we carry out the model selection based on the algorithm introduced in the previous section, as shown in the first line of Table 2. Here the Bayes factors in favor of the Richards model all exceed 100, and some are infinite. Thus, the evidence for selecting the Richards model is decisive based on the criterion shown in Table 1. To further validate the proposed method, we calculate the AIC value of each model; the values are 260, 241, 245, and 231 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. The AIC value for the Richards model is the smallest, so the best model is the Richards model, which is consistent with the result obtained using the Bayes factor. The estimated parameter values for the Richards model are very close to the real values, as shown in the third line of Table 2.
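The AIC comparison can be sketched as follows, assuming the usual definition $\mathrm{AIC} = 2k - 2\ln L$, where $k$ is the number of fitted parameters and $L$ is the maximized likelihood. The helper names are hypothetical; the AIC values are the ones quoted above for the data simulated from the Richards model.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_by_aic(aic_values):
    """Return the model name with the smallest AIC value."""
    return min(aic_values, key=aic_values.get)


if __name__ == "__main__":
    # AIC values quoted in the text for the Richards-generated series.
    reported_aic = {
        "Logistic": 260,
        "Gompertz": 241,
        "Rosenzweig": 245,
        "Richards": 231,
    }
    print(best_by_aic(reported_aic))
```

Because AIC penalizes each extra parameter by 2, a three-parameter model such as Richards is preferred over a two-parameter model only when it improves the log-likelihood by more than 1.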

Repeating the above procedure with the simulated data generated from the Gompertz model shows that the evidence for the Gompertz model is decisive, so it is then the “best” model. The estimated parameter values for the Gompertz model are very close to the real values, as shown in the last line of Table 2.

The above results show that the proposed model selection methods based on the Bayes factor and the MCMC method can help us to choose the optimal model. In Figure 1, we plot the fitting results for the four models based on the simulated time series generated from the Richards model and the Gompertz model. Although the other three models can also fit the simulated data well, the Richards model clearly gives the best fit to the time series generated from the Richards model, as shown in Figures 1(a) and 1(b), and the Gompertz model gives the best fit to the time series generated from the Gompertz model, as shown in Figures 1(c) and 1(d).

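Figure 1: Fits of the four models to the simulated time series generated from the Richards model ((a) and (b)) and from the Gompertz model ((c) and (d)).
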
4. Real Data Driven Model Selection and Results

4.1. A/H1N1 Data and Results

The 2009 influenza A/H1N1 pandemic outbreaks in Shaanxi Province of mainland China started from the 3rd of September. The majority of reported A/H1N1 cases were initially diagnosed in colleges and universities in early September 2009 when the universities began their fall semester and then spread to the communities in the middle of October 2009. The epidemic curve in Shaanxi Province exhibited the bimodality, where the first and small wave started around 3 September till 21 September and the second and large wave followed [11, 13]. In order to evaluate the effectiveness of control measures on A/H1N1, Tang et al. [11, 13] proposed the compartment epidemic models and then employed the A/H1N1 data sets to estimate the unknown parameters.

In this subsection, we carry out the model selection procedures using the published cumulative case numbers of A/H1N1 from the 8th Hospital of Xi’an, where the majority of the confirmed cases in the province of Shaanxi in early September 2009 were isolated. The selection results are given in the first line of Table 3 and Figure 2(a). It follows from Table 3 that the Bayes factors of the Logistic model against the Gompertz and Rosenzweig models are infinite (>100), which confirms that there exists decisive evidence for the Logistic model compared with these two models. Moreover, the Bayes factors between the Logistic model and the Richards model, in both directions, mean that the selection of the Logistic model and the Richards model is uniform and alternating. To confirm the model selection results on the A/H1N1 data set, we further calculate the AIC values, which are 249, 362, 592, and 254 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. The AIC values support choosing the Logistic model and the Richards model, which are the best models for fitting the A/H1N1 data.

Figure 2

(a) Model selection based on the cumulative case data from the 8th Hospital of Xi’an from 3 September to 21 September with the last 2000 groups of parameters of the Markov chain; (b) model fitting of A/H1N1 data in Xi’an, 2009. The curves represent the fits of the four models to the data. The grey areas are the 95% confidence intervals of each curve. Here, the cyan curve represents the Logistic model; the blue curve represents the Gompertz model; the red curve represents the Rosenzweig model; the black curve represents the Richards model. Note that the cyan curve and the black curve almost coincide.

To show the results of model selection intuitively, Figure 2(a) gives the selection results for the last 2000 groups of all estimated parameters from the Markov chains. In Figure 2(a), the numbers of times the four models are selected in the last 2000 runs are 1102, 0, 0, and 898 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. The probabilities for the Logistic model and the Richards model are almost the same, which further confirms that the Logistic model and the Richards model are the best models for the A/H1N1 data set. Fitting each of the four models to the cumulative A/H1N1 case data, we obtain the model fits for the initial outbreak from September 3 to September 21, shown in Figure 2(b). The best fitted theoretical models are the Logistic model and the Richards model, and their solution curves coincide.

The estimates of the basic reproduction number and turning point are shown in the first line of Table 5, and the parameters for the Logistic model are shown in the first line of Table 4. For the purpose of computing $R_0$, we employ the mean estimated generation interval given by Tang et al. [13], which results in an estimate of $R_0$ based on the Logistic model with 95% CI (1.8869, 1.9142). The likelihood-based and compartment model-based estimates of $R_0$ are 1.663 (95% CI (1.273, 2.053)) [13] and 1.682 (95% CI (1.446, 1.918)) [11] for the period from 3 September to 21 September with a generation time of four days. These results show that, in order to evaluate an emerging infectious disease, we could employ the simplest model, because it allows us to identify the model parameter values more quickly and from a small number of data points, which is quite important for public health. The estimated turning point for Xi’an indicates that the turning point occurred during 25–27 September, 2009. The estimated final size of the first wave is 1013 (95% CI (996, 1030)), but it could not be reached because of the beginning of the second wave.

4.2. Ebola Data and Results

On June 18, 2014, an Ebola outbreak emerged in Africa. The outbreak, first reported in Guinea in December, 2013, has spread to neighboring Sierra Leone and Liberia. Ebola, characterized by fever, severe diarrhea, and vomiting, has a high fatality rate, which meets the World Health Organization (WHO) criteria for a serious disease. Therefore, the main purpose of this subsection is to use the data sets reported by the WHO for the most seriously affected regions, including Guinea, Liberia, and Sierra Leone, from March 25, 2014, to May 3, 2015, in order to carry out model selection and parameter estimation and then to obtain estimates of $R_0$, the turning point, and the final size for Guinea, Liberia, Sierra Leone, and West Africa, respectively. Note that the sum of the data from those three countries has been used for West Africa.

The selection results are shown in Table 3 and Figure 3. In Table 3, we compute the relevant Bayes factors and AICs for the four candidate models. Figure 3 gives the selection results for the last 2000 groups of all estimated parameters from the Markov chains based on the Ebola cases of West Africa, Guinea, Liberia, and Sierra Leone. The estimates of the model parameters and of $R_0$, the turning point, and the final size are shown in Tables 4 and 5, respectively. In Table 6, we compare the reported cases and the model predicted cases of Ebola on June 14, 2015. In Figure 4, the model fitting results for the four models and data sets are also provided. Note that the data points from March 25, 2014, to May 3, 2015, for West Africa and Guinea and the data points from May 27, 2014, to May 3, 2015, for Liberia and Sierra Leone have been used to fit the models. In Figure 5, we show $R_0$ and the relevant 95% confidence interval when the generation time changes from 10 days to 18 days.

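Figure 3: Model selection results for the last 2000 groups of estimated parameters from the Markov chains for (a) West Africa, (b) Guinea, (c) Liberia, and (d) Sierra Leone.

Figure 4: Model fitting results of the four models for (a) West Africa, (b) Guinea, (c) Liberia, and (d) Sierra Leone.
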
In particular, for West Africa, the selection results are given in the second line of Table 3 and Figure 3(a). It follows from Table 3 that the Bayes factors of the Logistic model against the Gompertz and Rosenzweig models are infinite (>100), which indicates that there exists decisive evidence for the Logistic model compared with these two models. Moreover, the Bayes factors between the Logistic model and the Richards model, in both directions, mean that the selection of the Logistic model and the Richards model is uniform and alternating. To further confirm the model selection results for West Africa, the AIC values are calculated and given by 5200, 49500, 1872800, and 5400 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. The AIC values for both the Logistic model and the Richards model further indicate that these two models are the best. In Figure 3(a), the numbers of times the four models are selected in the last 2000 runs are 1168, 0, 0, and 832 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. The probability for the Logistic model is the largest and that for the Richards model is the second largest, which further confirms that the Logistic model is the best model for the West Africa data set. In Figure 4(a), the best fitted theoretical models are the Logistic model and the Richards model, and their solution curves almost coincide.

The estimates of the parameters for the Logistic model are shown in the second line of Table 4, and the basic reproduction number and turning point based on the West Africa data are shown in the second line of Table 5. The mean estimated generation interval given by Chowell and Nishiura [12] is used to calculate the basic reproduction number $R_0$. Based on the Logistic model, $R_0$ is estimated to be 1.3522 (95% CI (1.3506, 1.3537)), and the variation in $R_0$ with different generation intervals is shown in Figure 5. The turning point is 227 days (95% CI (226, 228)), which indicates that the turning point occurred during 6–8 November, 2014, for West Africa. The estimated final size is 25794 (95% CI (25630, 25958)), which could be reached during 13–17 September, 2015. On June 14, 2015, the reported cases are 27305 (the sum of the three countries) and the model predicted cases are 25693, so the model underestimates the reported count by 5.9%.
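The conversion from a turning point expressed in days to a calendar date can be sketched as below. It assumes that day 0 corresponds to the first data point used for West Africa, March 25, 2014; this convention is an assumption, and the helper name is hypothetical.

```python
from datetime import date, timedelta


def day_to_date(day_index, start):
    """Calendar date that lies day_index days after the start date."""
    return start + timedelta(days=day_index)


if __name__ == "__main__":
    start_west_africa = date(2014, 3, 25)   # assumed day 0 of the fitted series
    for day in (226, 227, 228):             # turning point and its 95% CI bounds
        print(day, day_to_date(day, start_west_africa).isoformat())
```

Under this convention, days 226 to 228 fall on 6 to 8 November, 2014, in agreement with the interval quoted for West Africa.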

For Guinea, the selection results are given in the third line of Table 3 and Figure 3(b). From Table 3, the Bayes factors of the Logistic model against the Gompertz and Rosenzweig models are infinite (>100), which confirms that there exists decisive evidence for the Logistic model compared with these two models. Moreover, the Bayes factors between the Logistic model and the Richards model, in both directions, mean that the selection of the Logistic model and the Richards model is uniform and alternating. The AIC values are 1991, 3427, 18476, and 1998 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. The AIC values for both the Logistic model and the Richards model further indicate that these two models are the best for fitting the Guinea data. In Figure 3(b), the numbers of times the four models are selected in the last 2000 runs are 1028, 0, 0, and 972 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively, which further shows that the Logistic model and the Richards model are the best models for the Guinea data set, as shown in Figure 4(b). Compared with the selection results for West Africa, we conclude that the outbreak pattern of West Africa follows that of Guinea.

The estimates of the parameters for the Logistic model are shown in the third line of Table 4, and the basic reproduction number and turning point based on the Guinea data are shown in the third line of Table 5. The estimate of $R_0$ based on the Logistic model is 1.2101 (95% CI (1.2084, 1.2119)) for the assumed generation interval, and the estimates for different generation intervals are shown in Figure 5. The estimated turning point (95% CI (237, 241)) indicates that the turning point occurred during 15–19 November, 2014. The estimated final size is 3916 (95% CI (3865, 3967)), which could be reached during 24–31 December, 2015. On June 14, 2015, the reported cases are 3674 and the model predicted cases are 3778, so the model overestimates the reported count by 2.8%.

For Liberia, the selection results are given in the fourth line of Table 3 and Figure 3(c). It follows from Table 3 that the Bayes factors of the Richards model against the Gompertz and Rosenzweig models are infinite (>100), which suggests that there exists decisive evidence for the Richards model compared with these two models. Moreover, the remaining Bayes factors indicate that the evidence for the selection of the Richards model is decisive. Meanwhile, we calculate the AIC values, which are 6308, 6547, 7980, and 2559 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. They support choosing the Richards model, which is the best model for fitting the Liberia data. In Figure 3(c), the numbers of times the four models are selected in the last 2000 runs are 1, 0, 0, and 1999 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively, which further confirms that the Richards model is the best model for the Liberia data set, as shown in Figure 4(c).

The estimate of $R_0$ based on the Richards model is 3.0234 (95% CI (2.6063, 3.4881)), shown in Table 5, and the variation in $R_0$ with different values of the generation interval is shown in Figure 5. The estimated turning point (95% CI (121, 149)) indicates that the turning point occurred during 23 September–21 October, 2014. The estimated final size is 9886 (95% CI (9740, 10031)), which could be reached around 23 September, 2015. On June 14, 2015, the reported cases are 10666 and the model predicted cases are 9843, so the model underestimates the reported count by 7.7%.

For Sierra Leone, it follows from Table 3 that the Bayes factors of the Gompertz, Rosenzweig, and Richards models against the Logistic model imply that there exists decisive evidence for these three models compared with the Logistic model. Moreover, the Bayes factors of the Richards model against the Gompertz and Rosenzweig models mean that the evidence for the selection of the Richards model over these two models ranges from substantial to strong. To further confirm the model selection results, we calculate the AIC values to be 15432, 6251, 7038, and 5400 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively. The AIC value for the Richards model supports choosing the Richards model, which is the best model for fitting the Sierra Leone data. In Figure 3(d), the numbers of times the four models are selected in the last 2000 runs are 2, 408, 205, and 1385 for the Logistic, Gompertz, Rosenzweig, and Richards models, respectively, which further confirms that the Richards model is the best model for the Sierra Leone data set, as shown in Figure 4(d).

The estimate of $R_0$ based on the Richards model is 1.9018 (95% CI (1.8565, 1.9478)) for the assumed generation interval, and the variation in $R_0$ with different values of the generation interval is shown in Figure 5. The estimated turning point (95% CI (157, 174)) indicates that the turning point occurred during 27 October–12 November, 2014. The estimated final size is 12633 (95% CI (12515, 12750)), which could be reached during 15–22 December, 2015. On June 14, 2015, the reported cases are 12965 and the model predicted cases are 12515, so the model underestimates the reported count by 3.5%.

Comparing the actual reported cases and the model predicted cases on 14 June, 2015, the relative errors of the model are −5.9%, +2.8%, −7.7%, and −3.5% for West Africa, Guinea, Liberia, and Sierra Leone, respectively (negative values indicate underestimation), as shown in Table 6. In Liberia, the underestimation is larger than elsewhere because the data changed due to ongoing reclassification, retrospective investigation, and availability of laboratory results, so the data for Liberia underwent significant adjustments. This is also why $R_0$ for Liberia is the largest. Note that the reported cumulative cases, including confirmed, probable, and suspected cases, in Liberia have been revised substantially; for example, there are 4665, 6535, and 6525 cases on October 23, October 27, and November 2, 2014, respectively. Similarly, the reported cumulative cases in Sierra Leone are 3896, 5235, and 4759 on October 23, October 27, and November 2, 2014, respectively. Such large differences could result in large variances in estimating and predicting the outbreaks of Ebola in West Africa. Therefore, the more precise the data sets are, the more accurate the estimation and prediction are.
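The relative errors quoted above can be reproduced with a short calculation. The sketch below uses the reported and predicted counts for June 14, 2015, given in the text for the three countries, and assumes that the West Africa reported total is the sum of the three country counts, as stated for the data. The function name is hypothetical.

```python
def relative_error_percent(predicted, reported):
    """Signed relative error in percent; negative means the model underestimates."""
    return 100.0 * (predicted - reported) / reported


if __name__ == "__main__":
    reported = {"Guinea": 3674, "Liberia": 10666, "Sierra Leone": 12965}
    predicted = {"Guinea": 3778, "Liberia": 9843, "Sierra Leone": 12515}

    # Assumption: West Africa is the sum of the three countries.
    reported["West Africa"] = sum(reported.values())
    predicted["West Africa"] = 25693  # model prediction quoted for West Africa

    for region in ("West Africa", "Guinea", "Liberia", "Sierra Leone"):
        err = relative_error_percent(predicted[region], reported[region])
        print(region, round(err, 1))
```

Keeping the sign of the error makes it immediately visible which fits overshoot and which fall short of the later reports.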

5. Discussion and Conclusions

On the basis of the four simplest single species models, model selection, and the MCMC method, we choose the best model to fit the A/H1N1 data set in China and the Ebola data sets in West Africa. This allows us to estimate the basic reproduction number, the turning point, and the final size of an emerging infectious disease more quickly and accurately than with some complex compartment models.

Our estimate of $R_0$ (95% CI (1.8869, 1.9142)) for A/H1N1 is quite similar to that obtained from the data for Shaanxi Province by Tang et al. [11, 13], with small differences that could well be associated with differences in methodology. Further, many factors such as differences in population densities, implementation of control measures, and mobility of the population among regions lead to a wide range of reproduction numbers. Our estimated reproduction numbers from the hospital notifications are in broad agreement with those obtained in studies on data from Mexico (95% CI (1.2, 1.6)) [18], the United States of America (95% CI (1.7, 1.8)) [19], and New Zealand (95% CI (1.80, 2.15)) [20]. Thus, we believe that the best model (i.e., Richards model) can be used for rapid epidemic modeling in the face of a public health crisis.

When we fit the data sets for Ebola cases in West Africa, the most appropriate model is the Logistic model or the Richards model. The reproduction numbers are 1.3522 (95% CI (1.3506, 1.3537)) for West Africa, 3.0234 (95% CI (2.6063, 3.4881)) for Liberia, 1.2101 (95% CI (1.2084, 1.2119)) for Guinea, and 1.9018 (95% CI (1.8565, 1.9478)) for Sierra Leone. Using the early phase of the Ebola outbreaks in West Africa in 2014, Chowell and Nishiura [12] estimated $R_0$ for those three countries, with values including 1.96 (95% CI (1.92, 2.01)) for Liberia and 3.07 (95% CI (2.85, 3.32)) for Sierra Leone. Althaus [21] formulated a susceptible-exposed-infectious-removed (SEIR) model and employed the data from March 22, 2014, to August 20, 2014, to get the maximum likelihood estimates of $R_0$, obtaining a 95% CI of (1.50, 1.52) for Guinea and estimates of 2.53 (95% CI (2.41, 2.67)) for Sierra Leone and 1.59 (95% CI (1.57, 1.60)) for Liberia. The WHO Ebola Response Team [22] employed the data up to September 14, 2014, and obtained a 95% CI of (1.41, 1.60) for Liberia and estimates of 1.81 (95% CI (1.60, 2.03)) for Guinea and 1.38 (95% CI (1.27, 1.51)) for Sierra Leone. It is worth noting that the estimates of $R_0$ obtained from data points in different periods are quite different, and any differences could well be associated with variations in methodology and with differences in the time periods or stages considered.

For the previous Ebola outbreaks in Central Africa, Chowell et al. [23] developed a homogeneous mixing SEIR model and estimated $R_0$ for Congo in 1995 and for Uganda in 2000 (1.34). However, the estimates in the present paper show that $R_0$ for Liberia is the largest, followed by that for Sierra Leone, and that for Guinea is the smallest. As mentioned before, the main reason why $R_0$ for Liberia is larger than the others is that ongoing reclassification, retrospective investigation, and the availability of laboratory results led to significant adjustments of the Liberian data. Moreover, the number of suspected cases increased significantly, while the number of confirmed cases increased slowly relative to the suspected cases.

The turning point and final size have also been estimated. For example, the turning point for West Africa was at day 227, which corresponds to 5 November 2014, and the turning points for Guinea, Liberia, and Sierra Leone fell in October or November 2014. Further, the estimated end of the outbreak is September or December 2015, with final sizes of 3916 (95% CI (3865, 3967)) for Guinea, 9886 (95% CI (9740, 10031)) for Liberia, 12633 (95% CI (12515, 12750)) for Sierra Leone, and 25794 (95% CI (25630, 25958)) for West Africa, as shown in Table 3. Note that the Ebola outbreak in Liberia was declared over on 9 May 2015, after 42 complete days had elapsed since the burial of the last confirmed case, but the estimated final time for Liberia was September 23, 2015, because the cumulative reported number of suspected cases was still increasing. This is why the country entered a 3-month period of heightened vigilance from May 9, 2015, and WHO will maintain an enhanced presence in the country until the end of 2015, with a particular focus on areas that border Guinea and Sierra Leone (http://apps.who.int/ebola/en/current-situation/ebola-situation-report-13-may-2015).

For the results of model selection, the most appropriate model is the Logistic model or the Richards model, each of which requires only cumulative case data from an epidemic curve (Table 7). Note that for the earlier stages of an epidemic, such as Ebola in Guinea, the Logistic model cannot fit the data well [12]. However, our main results show that the Logistic model can be a candidate when the data contain more time points. Together, these observations indicate that model selection depends on the length of the time series. Moreover, the Logistic model is a special case of the Richards model in which the exponent of deviation equals 1. Therefore, we conclude that the Richards model could be the first choice when estimating $R_0$, a task that can otherwise require more extensive and detailed data [24, 25]. E. Tjørve and K. M. C. Tjørve [26] indicated that the Gompertz model is also a special case of the Richards model, but our results indicate that the Gompertz model may not be a suitable candidate for describing the data of emerging infectious diseases.
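The relation between the two selected models can be checked numerically. The sketch below assumes a common parametrization of the Richards curve, $C(t) = K[1 + a e^{-ar(t - t_i)}]^{-1/a}$, with final size $K$, growth rate $r$, exponent of deviation $a$, and turning point $t_i$; the parametrization used elsewhere in this paper may differ, and all parameter values are hypothetical.

```python
import math


def richards(t, K, r, a, t_i):
    """Cumulative cases under the Richards model (assumed parametrization)."""
    return K * (1.0 + a * math.exp(-a * r * (t - t_i))) ** (-1.0 / a)


def logistic(t, K, r, t_i):
    """Cumulative cases under the Logistic model."""
    return K / (1.0 + math.exp(-r * (t - t_i)))


# Hypothetical parameter values for illustration only.
K, r, t_i = 10000.0, 0.05, 200.0

# With a = 1 the Richards curve coincides with the Logistic curve.
for day in (0, 100, 200, 300, 400):
    assert math.isclose(richards(day, K, r, 1.0, t_i), logistic(day, K, r, t_i))

# At the turning point the Richards curve has reached K * (1 + a) ** (-1 / a),
# which equals K / 2 only when a = 1.
print(richards(t_i, K, r, 1.0, t_i))
print(richards(t_i, K, r, 0.5, t_i))
```

Under this parametrization, an exponent of deviation different from 1 shifts the share of the final size reached at the turning point, which is the extra flexibility the Richards model has over the Logistic model.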

In Figure 4, we fit the data sets for Guinea, Liberia, Sierra Leone, and West Africa with the four candidate models, and our results show that the best model differs between data sets. In particular, the "best" model is the Richards model for Liberia and Sierra Leone; $R_0$ could be underestimated if we chose the Logistic model for these two countries, while the turning point could be underestimated if we chose the Gompertz model, as shown in Table 8 and Figure 6. The error is too large when fitting the data of Liberia and Sierra Leone with the Rosenzweig model, so we compare only the estimates from the Logistic, Gompertz, and Richards models.

[Figure panels: (a) and (b) Liberia; (c) and (d) Sierra Leone.]

The analysis of the Ebola data shows that model selection uncertainty inflates the standard error of the estimated quantities, so model selection is necessary when fitting a model to specific data. That is to say, adopting a poor model would probably cause overestimation or underestimation of the parameters, the basic reproduction number, and the final size. Thus, it has to be emphasized that model selection is essential for investigating the dynamics of an emerging infectious disease based on the available data set, and arbitrarily picking a model without any consideration of alternatives is inadvisable.
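To make the comparison of candidate models concrete, the sketch below turns log marginal likelihoods into Bayes factors and posterior model probabilities under equal prior model probabilities. The function names and numerical values are hypothetical placeholders, not results from this paper, and estimating the marginal likelihoods themselves (for example by MCMC [17]) is outside the sketch.

```python
import math


def posterior_model_probs(log_marginals):
    """Posterior model probabilities from log marginal likelihoods,
    assuming equal prior probability for every model."""
    top = max(log_marginals.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {m: math.exp(v - top) for m, v in log_marginals.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def bayes_factor(log_marginals, model_a, model_b):
    """Bayes factor of model_a against model_b."""
    return math.exp(log_marginals[model_a] - log_marginals[model_b])


# Hypothetical log marginal likelihoods for the four candidate models.
log_ml = {
    "Logistic": -512.3,
    "Gompertz": -530.8,
    "Rosenzweig": -541.0,
    "Richards": -509.9,
}

probs = posterior_model_probs(log_ml)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
print(bayes_factor(log_ml, "Richards", "Logistic"))
```

Reporting the full set of posterior probabilities, rather than only the winning model, shows how much support the alternatives retain.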

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (NSFCs, 11171199, 11471201, and 11171268) and by the Fundamental Research Funds for the Central Universities (GK201305010 and GK201401004).

References

1. M. Lipsitch, T. Cohen, B. Cooper et al., “Transmission dynamics and control of severe acute respiratory syndrome,” Science, vol. 300, no. 5627, pp. 1966–1970, 2003.
2. S. Riley, C. Fraser, C. A. Donnelly et al., “Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions,” Science, vol. 300, no. 5627, pp. 1961–1966, 2003.
3. Y.-H. Hsieh, S. Ma, J. X. Velasco Hernandez, V. J. Lee, and W. Y. Lim, “Early outbreak of 2009 influenza A (H1N1) in Mexico prior to identification of pH1N1 virus,” PLoS ONE, vol. 6, no. 8, Article ID e23853, 2011.
4. Y.-H. Hsieh, J.-Y. Lee, and H. L. Chang, “SARS epidemiology, logistic-type model, and cumulative case number,” Emerging Infectious Diseases, vol. 10, pp. 1165–1167, 2004.
5. Y.-H. Hsieh and S. Ma, “Intervention measures, turning point, and reproduction number for dengue, Singapore,” The American Journal of Tropical Medicine and Hygiene, vol. 80, pp. 66–71, 2009.
6. H. Akaike, “Likelihood of a model and information criteria,” Journal of Econometrics, vol. 16, no. 1, pp. 3–14, 1981.
7. K. P. Burnham and D. R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, Springer, New York, NY, USA, 2002.
8. A. D. R. McQuarrie and C.-L. Tsai, Regression and Time Series Model Selection, World Scientific Publishing Company, Singapore, 1998.
9. D. Posada and T. R. Buckley, “Model selection and model averaging in phylogenetics: advantages of Akaike Information Criterion and Bayesian approaches over likelihood ratio tests,” Systematic Biology, vol. 53, no. 5, pp. 793–808, 2004.
10. H. Jeffreys, Theory of Probability, Clarendon Press, Oxford, UK, 3rd edition, 1961.
11. S. Y. Tang, Y. N. Xiao, Y. P. Yang, Y. Zhou, J. Wu, and Z. Ma, “Community-based measures for mitigating the 2009 H1N1 pandemic in China,” PLoS ONE, vol. 5, no. 6, Article ID e10911, 11 pages, 2010.
12. G. Chowell and H. Nishiura, “Transmission dynamics and control of Ebola virus disease (EVD): a review,” BMC Medicine, vol. 12, pp. 196–212, 2014.
13. S. Y. Tang, Y. N. Xiao, L. Yuan, R. A. Cheke, and J. Wu, “Campus quarantine (Fengxiao) for curbing emergent infectious diseases: lessons from mitigating A/H1N1 in Xi'an, China,” Journal of Theoretical Biology, vol. 295, pp. 47–58, 2012.
14. Y.-H. Hsieh, D. N. Fisman, and J. H. Wu, “On epidemic modeling in real time: an application to the 2009 Novel A (H1N1) influenza outbreak in Canada,” BMC Research Notes, vol. 3, article 283, 2010.
15. J. L. Ma and D. J. E. Earn, “Generality of the final size formula for an epidemic of a newly invading infectious disease,” Bulletin of Mathematical Biology, vol. 68, no. 3, pp. 679–702, 2006.
16. C. P. Robert and G. Casella, Introducing Monte Carlo Methods with R, Springer, New York, NY, USA, 2010.
17. C. Han and B. P. Carlin, “Markov Chain Monte Carlo Methods for computing Bayes factors: a comparative review,” Journal of the American Statistical Association, vol. 96, no. 455, pp. 1122–1132, 2001.
18. C. Fraser, C. A. Donnelly, S. Cauchemez et al., “Pandemic potential of a strain of influenza A (H1N1): early findings,” Science, vol. 324, no. 5934, pp. 1557–1561, 2009.
19. L. F. White, J. Wallinga, L. Finelli et al., “Estimation of the reproductive number and the serial interval in early phase of the 2009 influenza A/H1N1 pandemic in the USA,” Influenza and Other Respiratory Viruses, vol. 3, no. 6, pp. 267–276, 2009.
20. H. Nishiura, N. Wilson, and M. G. Baker, “Estimating the reproduction number of the novel influenza A virus (H1N1) in a Southern Hemisphere setting: preliminary estimate in New Zealand,” The New Zealand Medical Journal, vol. 122, no. 1299, pp. 73–77, 2009.
21. C. L. Althaus, “Estimating the reproduction number of Zaire ebolavirus (EBOV) during the 2014 outbreak in West Africa,” PLoS Currents Outbreaks, vol. 10, pp. 1371–1380, 2014.
22. WHO Ebola Response Team, “Ebola virus disease in West Africa—the first 9 months of the epidemic and forward projections,” The New England Journal of Medicine, vol. 371, pp. 1481–1495, 2014.
23. G. Chowell, N. W. Hengartner, C. Castillo-Chavez, P. W. Fenimore, and J. M. Hyman, “The basic reproductive number of Ebola and the effects of public health measures: the cases of Congo and Uganda,” Journal of Theoretical Biology, vol. 229, no. 1, pp. 119–126, 2004.
24. J. Wallinga and M. Lipsitch, “How generation intervals shape the relationship between growth rates and reproductive numbers,” Proceedings—Biological Sciences, vol. 274, no. 1609, pp. 599–604, 2007.
25. G. Chowell, M. A. Miller, and C. Viboud, “Seasonal influenza in the United States, France, and Australia: transmission and prospects for control,” Epidemiology and Infection, vol. 136, no. 6, pp. 852–864, 2008.
26. E. Tjørve and K. M. C. Tjørve, “A unified approach to the Richards-model family for use in growth analyses: why we need only two model forms,” Journal of Theoretical Biology, vol. 267, no. 3, pp. 417–425, 2010.

Copyright

Copyright © 2015 Wendi Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
