Abstract

This article presents statistical inference methodology based on maximum likelihood for delay differential equation models in the univariate setting. Maximum likelihood inference is obtained for single and multiple unknown delay parameters as well as for the other parameters of interest that govern the trajectories of the delay differential equation models. The maximum likelihood estimator is obtained using adaptive grid and Newton-Raphson algorithms. Our methodology correctly estimates the delay parameters as well as the other unknown parameters (such as the initial starting values) of the dynamical system based on simulated data. We also develop methodology to compute the information matrix and confidence intervals for all unknown parameters within the likelihood inferential framework. We present three illustrative examples related to biological systems. The computations have been carried out with the help of mathematical software: MATLAB® 8.0 R2014b.

1. Introduction

Delay differential equations (DDEs) are widely used to model many real-life phenomena, especially in science and engineering. Examples include the modeling of the spread of infectious diseases, tumor growth, the growth of blood clots in the brain, population dynamics, traffic monitoring, and price fluctuations of commodities in economics; see [14]. A univariate delay differential equation model (DDEM) with multiple delays treats the real-valued observations, y_i, as noisy realizations from an underlying DDE:

y_i = x(t_i) + ε_i,  i = 1, …, n, (1)

x′(t) = f(x(t), x(t − τ_1), …, x(t − τ_m); θ), (2)

where the ε_i's are errors assumed to arise from a noise distribution with zero mean and unknown standard deviation σ. In (1), x(t_i) is the solution, x(t), of the DDE (2) evaluated at the time points t_i; in (2), x(t − τ_k) is the kth delay term with delay parameter τ_k, and θ is a vector of other parameters of interest that govern the trajectories of the underlying DDE in (2). Equations (1) and (2) constitute a univariate DDEM in its most general form. In a DDEM, the parameters τ = (τ_1, …, τ_m) and θ are often unknown and have to be estimated based on the observations y_1, …, y_n.

Few methods appear in the statistical literature on parameter estimation and inference for DDEMs. Among the statistical approaches that have been suggested, many impose restrictions on the form of the DDEMs being investigated. When such restrictions are relaxed, high computational costs and challenges arise. Typically, further inferential procedures, such as obtaining standard errors and confidence intervals for the parameter estimates, add yet more computational cost. We briefly review these works in the following paragraph.

Ellner et al. [5] estimate the derivative of a univariate DDEM, which is assumed to be in an additive form, using nonparametric smoothing. Subsequently, they infer the constant (single) delay parameter, τ, by fitting a generalized additive model. Ellner's technique, although it unifies previous works, can thus be applied only to DDEMs that satisfy the assumed additive form. Wood [6] developed spline-based model fitting techniques for the case when the DDEMs are partially specified. The spline-based method involves high computational costs, as cross-validation is used to select the smoothing coefficients associated with the penalty term as well as the unknown parameter estimates. A penalized semiparametric method is proposed by Wang and Cao [7], which maximizes an objective function consisting of two terms: a likelihood term and a penalty term that measures the discrepancy between an estimate of the derivative, x′(t), and the right-hand side of the DDE in (2). The selection of smoothing coefficients is done, as in [6], via cross-validation, whereas standard errors of parameter estimates are obtained by bootstrapping. It follows that the method of [7], like that of [6], involves high computational costs. Further, Wang and Cao consider only univariate DDEMs with a single delay parameter. An estimation method based on Least Squares Support Vector Machines (LS-SVMs) for approximating constant as well as time-varying parameters in deterministic parameter-affine DDEMs is presented by Mehrkanoon et al. [8]. We note that Mehrkanoon et al. perform parameter estimation only; no standard errors of estimates or confidence intervals are reported. Further, only single delays (either constant or time-varying) are considered in [8].

In this paper, we consider parameter estimation and inference for univariate DDEMs with multiple delays based on maximum likelihood. The method of maximum likelihood, as advocated by Fisher in his seminal papers [9, 10], has become one of the most significant tools for estimation and inference available to statisticians. Maximum likelihood estimators (MLEs) are well defined once a distributional model is specified for the observations. MLEs have well-behaved and well-understood properties: Huber [11] presents general conditions under which the MLE is consistent for the true value of the unknown parameters for large sample sizes. Wald [12] and Akaike [13] observed that the maximum likelihood estimator is a natural estimator for the parameters even when the true distribution is unknown. The large sample theory and distributional properties of MLEs can be used to perform subsequent inference procedures, such as obtaining standard errors and confidence intervals and performing tests of hypotheses, at minimal additional computational cost. MLEs are also the basic estimators used in subsequent statistical inferential procedures such as model selection using the Akaike Information Criterion (AIC), the Bayes Information Criterion (BIC), and other model selection criteria. Model selection is an important issue in DDEMs, such as for the partially specified DDEMs in [6], where several models can be elicited for an observed physical process, but one model needs to be selected which both fits the observed data and is simple enough to interpret (Occam's razor principle).

The MLE can be developed for a large variety of estimation situations and is asymptotically efficient, meaning that for large samples it produces the most precise estimates among competing methods (such as [8]). These are the reasons why we prefer the MLE over other estimators for DDEMs in this paper.

The remainder of this paper is organized as follows: we define univariate DDEMs in Section 2. In Section 3, the MLE approach for DDEMs is outlined, and the MLE is obtained computationally using an adaptive grid procedure followed by a gradient-based algorithm. We also develop algorithms for obtaining the information matrix and for constructing standard errors and confidence intervals for the unknown parameters. Three examples of univariate DDEMs related to biological systems are presented, and numerical results based on the proposed methodology and simulated data are provided in Section 4.

2. General Model Formulation

Recall the DDEM defined by (1) and (2). The observation y_i is obtained at the ith sampled time point, t_i, with t_0 ≤ t_1 < t_2 < ⋯ < t_n ≤ T. In the remainder of this paper, the errors are assumed to be independent and identically distributed according to a normal distribution with mean zero and unknown standard deviation σ, that is, ε_i ∼ N(0, σ²). The underlying dynamical system x(t), t ∈ [t_0, T], is expressed implicitly in terms of the DDE. The general form of a DDE with multiple delays for x(t) is given by (2) as x′(t) = f(x(t), x(t − τ_1), …, x(t − τ_m); θ), where f is a 1-dimensional function and x′(t) denotes the first derivative of x with respect to time t. The quantities θ and τ are unknown parameters of the DDEM, where θ is a vector of unknown parameters of dimension p and τ = (τ_1, …, τ_m) is a vector of time delays of dimension m. The complete trajectory of the function x on [t_0, T] is determined by (2) and the initial condition function, which we take to be the constant x(t) = x_0 for all t ≤ t_0; x_0 is also unknown, in addition to the unknown parameters θ and τ. For given values of θ, τ, and x_0, the solution x(t), t ∈ [t_0, T], is a function of θ, τ, and x_0; thus, we write x(t; θ, τ, x_0) to make this dependence explicit in the notation. The observations are collected at the sampled time points t_1, …, t_n based on the observational model (1). Our goal is to estimate θ, τ, and x_0 based on the observations y_1, …, y_n. Here, note that σ is also unknown and thus appears as a nuisance parameter, since the properties of the dynamical system are governed by θ, τ, and x_0 and not by σ.

3. The Maximum Likelihood Estimation Approach for DDEMs

The likelihood of the DDEM for parameters θ, τ, and x_0, given observations y = (y_1, …, y_n), is

L(θ, τ, x_0) = ∏_{i=1}^{n} p(y_i; θ, τ, x_0), (3)

where y is the collection of all observations on x with density

p(y_i; θ, τ, x_0) = (1/√(2πσ²)) exp{−(y_i − x(t_i; θ, τ, x_0))² / (2σ²)}, (4)

based on the normality assumption on the ε_i's in (1). The above expression for the likelihood can be simplified to

L(θ, τ, x_0) = (2πσ²)^{−n/2} exp{−(1/(2σ²)) ∑_{i=1}^{n} (y_i − x(t_i; θ, τ, x_0))²}. (5)

We assume for the moment that σ is fixed and known; the case when σ is unknown is dealt with later. Thus, the above likelihood is taken to be a function of (θ, τ, x_0) for now. The usual practice for statistical inference is to use the natural logarithm of the likelihood function, namely, the log-likelihood function, which is given by

ℓ(θ, τ, x_0) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{n} (y_i − x(t_i; θ, τ, x_0))². (6)

Expressions for the log-likelihood are often simpler than the likelihood function L, since they are easier to differentiate and the results are more stable computationally.

Since log is a monotonically increasing function of its argument, the maximization of (3) and of (6) is equivalent in that the same optimized parameter is found. We denote the MLE of η = (θ, τ, x_0) as

η̂ = arg max_η ℓ(η). (7)

The MLE is a point estimate and can be viewed as a random vector depending on the distribution of the data y. We now consider the case when σ is unknown. The MLE of σ is denoted by σ̂. After finding η̂ as in (7), the log-likelihood in (6) is maximized as a function of σ. The resulting estimate is available in closed form and is given by

σ̂² = (1/n) ∑_{i=1}^{n} (y_i − x(t_i; η̂))². (8)
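The closed-form estimate above is simply the root mean squared residual with divisor n. A minimal sketch (in Python rather than the paper's MATLAB, with made-up numbers):

```python
import numpy as np

def sigma_mle(y, x_fit):
    """MLE of sigma: square root of the mean squared residual (divisor n, not n - 1)."""
    resid = np.asarray(y, dtype=float) - np.asarray(x_fit, dtype=float)
    return float(np.sqrt(np.mean(resid ** 2)))

# Made-up observations and fitted trajectory values x(t_i; eta_hat):
y = [1.0, 2.1, 2.9, 4.2]
x_fit = [1.0, 2.0, 3.0, 4.0]
sigma_hat = sigma_mle(y, x_fit)
```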

3.1. A Two-Stage Numerical Procedure for Finding the MLE

To find the MLE numerically, we develop a two-stage numerical procedure consisting of an adaptive grid procedure followed by a gradient-based algorithm. Two stages are needed because we wish to exploit the advantages of each algorithm while avoiding its drawbacks. Grid algorithms are able to find the global maximum of a function over a grid space: the function is evaluated at every point of the grid, and the grid point with the largest value is selected. Provided the grid is fine enough, this grid point will be close to the domain value that actually attains the global maximum. So by gridding, we ensure that we are close to the global maximum. The adaptive grid algorithm enhances the basic gridding algorithm so that successive iterations move closer and closer to the global maximum. However, the main drawback of any grid (and adaptive grid) algorithm is its slow convergence.

On the other hand, gradient-based algorithms converge to a maximum of a function quickly. Their main drawback is that they find the local maximum nearest to the starting point. So, if the starting point is not close to the global maximum, a gradient-based algorithm does not guarantee that the global maximum is found, since it may get “stuck” at a local maximum.

The function to maximize in our case is the log-likelihood in (6), and the domain value attaining this maximum is the MLE. Thus, our two-stage method uses the adaptive grid in the first stage to ensure that we are close to the MLE and then switches to a quasi-Newton algorithm to ensure rapid convergence to the MLE.
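The two stages can be sketched as follows. This is an illustrative Python analogue using a hypothetical sinusoidal regression model (not one of the paper's DDEs), with scipy's BFGS standing in for the quasi-Newton step:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sinusoidal regression y = a*sin(t - tau) + noise stands in for
# the DDE likelihood surface; the two-stage logic is the point, not the model.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
y = 2.0 * np.sin(t - 1.0) + rng.normal(0.0, 0.1, t.size)

def neg_loglik(p):
    a, tau = p
    return np.sum((y - a * np.sin(t - tau)) ** 2)  # negative log-likelihood up to constants

# Stage 1: coarse grid search to land near the global maximum of the likelihood.
grid_a = np.linspace(0.5, 4.0, 15)
grid_tau = np.linspace(0.0, 3.0, 15)
_, start = min(((neg_loglik((a, tau)), (a, tau))
                for a in grid_a for tau in grid_tau), key=lambda z: z[0])

# Stage 2: quasi-Newton (BFGS) refinement from the best grid point.
fit = minimize(neg_loglik, start, method="BFGS")
a_hat, tau_hat = fit.x
```

The grid stage rules out spurious local maxima (here, e.g., sign/phase ambiguities of the sine), and BFGS then converges rapidly from a good starting point.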

3.1.1. Grid Procedure

To find τ̂ and θ̂ as in (7), the values with the largest (log-)likelihood should be chosen. This can be done by an adaptive grid procedure. The gridding is carried out over τ and θ, and for each pair (τ, θ) in the grid space, a Newton-Raphson numerical procedure is used to find the value of x_0 that maximizes the log-likelihood ℓ(θ, τ, x_0) defined in (6).

We use a grid space G covering plausible values of (τ, θ). For every fixed value of τ and θ in G, we find the MLE of x_0, denoted x̂_0(τ, θ), by maximizing the log-likelihood above. Since x̂_0 satisfies the score equation ∂ℓ/∂x_0 = 0, (11) the numerical problem is solved by using the Newton-Raphson method:

x_0^{(k+1)} = x_0^{(k)} − ℓ′(x_0^{(k)}) / ℓ″(x_0^{(k)}), (12)

where ℓ′ and ℓ″ in (12) and (13) are, respectively, the first and second partial derivatives of ℓ with respect to x_0, evaluated at the current iterate x_0^{(k)}. As seen from (12) and (13), we need to calculate ∂x(t_i)/∂x_0 and ∂²x(t_i)/∂x_0² at each t_i. This is done recursively as follows.
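The inner Newton-Raphson update can be sketched on a toy problem in which the solution is linear in the initial value, x(t; x_0) = x_0 · g(t) (as holds for the exponential DDE of Example 1), so that the derivatives of the log-likelihood are available in closed form; g and y below are made-up values:

```python
import numpy as np

# Toy inner problem: when the solution is linear in the initial value,
# x(t; x0) = x0 * g(t), the Newton-Raphson update (12) uses closed-form
# first and second derivatives of the log-likelihood with respect to x0.
g = np.array([1.0, 1.5, 2.3, 3.4])  # solved trajectory with x0 = 1
y = np.array([2.1, 2.9, 4.7, 6.7])  # observations generated near x0 = 2
sigma = 0.1

def d1(x0):  # first derivative of the log-likelihood in x0
    return np.sum((y - x0 * g) * g) / sigma ** 2

def d2(x0):  # second derivative (constant here, since the model is linear in x0)
    return -np.sum(g ** 2) / sigma ** 2

x0 = 1.0  # starting value, e.g. from the grid stage
for _ in range(20):
    step = d1(x0) / d2(x0)
    x0 -= step
    if abs(step) < 1e-10:
        break
```

In this linear case the iteration converges in a single step to the least-squares value ∑ y_i g_i / ∑ g_i²; for nonlinear DDEs the derivatives of x must be supplied by the derivative-process DDEs described next.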

The first derivative process u(t) = ∂x(t)/∂x_0 is obtained by differentiating (2) with respect to x_0. Since θ and τ do not depend on x_0, differentiating (2) with respect to x_0 gives

u′(t) = (∂f/∂x) u(t) + ∑_{k=1}^{m} (∂f/∂x_k) u(t − τ_k), (14)

where x_k denotes the kth delayed argument x(t − τ_k) of f; that is, the first derivative process satisfies another DDE, given by (14).

Similarly, the second derivative process v(t) = ∂²x(t)/∂x_0² is obtained by differentiating (14) with respect to x_0 once more, yielding a DDE (15) for v whose right-hand side depends on the first derivative process u. The above two DDEs can be solved numerically based on initial conditions that are specified below.

To obtain the initial conditions of the first and second derivative processes, we note that x(t) = x_0 for t ≤ t_0. Thus, u(t) = 1 and v(t) = 0 for t ≤ t_0. Subsequently, we can obtain the values of u and v numerically at every sampled time point t_i by numerically solving the DDEs (14) and (15), for use in (12) and (13). For the numerical solution, we divide each delay interval into S equal segments, where S is a natural number.
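As an illustration of the derivative process, for the exponential DDE x′(t) = a x(t − τ) the process u(t) = ∂x(t)/∂x_0 satisfies the same DDE with history u(t) = 1 for t ≤ 0. A forward-Euler method-of-steps sketch in Python (illustrative values; a dde23-style solver would be more accurate in practice):

```python
import numpy as np

# For the exponential DDE x'(t) = a*x(t - tau), the derivative process
# u(t) = dx/dx0 satisfies the same DDE with history u(t) = 1 for t <= 0.
a, tau, x0 = 0.5, 1.0, 2.0
S = 100                      # segments per delay interval
h = tau / S                  # step size; the lag is then exactly S steps
n_steps = 3 * S              # integrate up to t = 3*tau
x = np.empty(n_steps + 1)
u = np.empty(n_steps + 1)
x[0], u[0] = x0, 1.0
for k in range(n_steps):
    x_del = x0 if k < S else x[k - S]   # x(t_k - tau); history is the constant x0
    u_del = 1.0 if k < S else u[k - S]  # u(t_k - tau); history is the constant 1
    x[k + 1] = x[k] + h * a * x_del
    u[k + 1] = u[k] + h * a * u_del
```

Because this DDE is linear in x, the identity x(t) = x_0 · u(t) holds exactly, which serves as a convenient check on the sensitivity solver.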

The grid algorithm operates on the grid space G of (τ, θ) values, and the Newton-Raphson procedure is nested within the adaptive grid algorithm. Thus, for every grid value pair (τ, θ), Newton-Raphson uses these values to find x̂_0(τ, θ) via (11). On convergence of the Newton-Raphson method, we obtain the MLE of x_0 for each grid point in G. The log-likelihood is then calculated from (6) using x̂_0(τ, θ), and the grid-based maximum is found by maximizing over (τ, θ). We denote the MLE obtained from the gridding algorithm by η̂_G, as defined in (17).

3.1.2. The Adaptive Grid Procedure

The adaptive grid (AG) algorithm is a repeated application of the generic grid procedure over increasingly finer intervals for (τ, θ). The AG algorithm is as follows:

(1) Choose an initial grid space G_0 consisting of grid points for τ and θ.

(2) Maximize the log-likelihood with respect to τ, θ, and x_0 as described in the grid procedure above.

(3) Obtain the grid-based MLE as in (17).

(4) Refine the grid: take the grid-based MLE from step (3) and form a new grid space whose lower and upper grid points for τ, and correspondingly for θ, are the neighbouring grid points bracketing it. If either the lower or upper bound is not found, the original grid space is enlarged so that the MLE occurs in the interior of the grid.

(5) Repeat steps (2)–(4) based on the generic grid procedure to generate a sequence of grid-based MLEs. Stop at iteration j when the change between successive estimates falls below δ, a prespecified threshold.

(6) The final MLE based on the adaptive grid technique is the last element of this sequence.
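A generic sketch of the AG loop in Python, with a toy concave surface standing in for the log-likelihood over (τ, θ); the box-shrinking rule below is one simple choice of refinement consistent with step (4):

```python
import numpy as np

def adaptive_grid(f, lo, hi, n=11, delta=1e-4, max_rounds=50):
    """Maximize f over the box [lo, hi] by repeated gridding: each round
    re-grids a box of one grid cell on either side of the incumbent best
    point, stopping when the best point moves less than delta."""
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    prev = None
    best = None
    for _ in range(max_rounds):
        axes = [np.linspace(a, b, n) for a, b in zip(lo, hi)]
        pts = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, lo.size)
        best = pts[np.argmax([f(p) for p in pts])]
        if prev is not None and np.max(np.abs(best - prev)) < delta:
            break
        prev = best
        span = (hi - lo) / (n - 1)          # current grid-cell width per axis
        lo, hi = best - span, best + span   # refine around the incumbent
    return best

# Toy concave surface standing in for the log-likelihood over (tau, theta).
peak = adaptive_grid(lambda p: -(p[0] - 1.3) ** 2 - (p[1] - 0.7) ** 2,
                     lo=[0.0, 0.0], hi=[3.0, 3.0])
```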

Remark 1. In step (1), the initial grid space is chosen to be a large domain that is likely to contain the MLE. In our simulation experiments, since the true values of the parameters are known, the domain is selected around these true values. In practice, we need to carry out an exhaustive search within upper and lower bounds for τ and θ. If the parameters are positive, as is usually the case, the lower bounds can be taken to be zero. Next, we can consider large positive upper bounds and construct the grid in G_0 using equidistant marginal grid points. The number of grid points need not be too large, since we only aim to explore the log-likelihood profile. The log-likelihood can be evaluated at these grid points and plotted to visualize properties of the resulting surface. Depending on this plot, we can choose either to keep or to enlarge the upper bounds until we are certain that the MLE is within the selected domain.

After obtaining the first-step approximation to the MLE by the adaptive grid procedure above, we use the MATLAB function fminunc to obtain the final MLE by minimizing the negative log-likelihood function viewed as a function of the unknown parameter vector η = (τ, θ, x_0). The final-step MLE is defined as η̂ = arg min_η {−ℓ(η)}. We are required to supply the gradient vector, given in (20), to this MATLAB function; the explicit expression for each entry of the gradient is provided in Appendix A. The MATLAB function fminunc uses, as an option, a quasi-Newton procedure that does not require the calculation of second derivatives and hence saves computational time.
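An open-source analogue of this step: scipy's minimize with method="BFGS" likewise accepts a user-supplied gradient (jac) and needs no second derivatives. A toy quadratic negative log-likelihood (a normal location model, not the DDE likelihood itself) for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# scipy analogue of fminunc: BFGS accepts an analytic gradient via jac= and,
# like the quasi-Newton option in the text, avoids second derivatives.
y = np.array([1.2, 0.8, 1.1, 0.9])

def negloglik(p):
    return 0.5 * np.sum((y - p[0]) ** 2)

def grad(p):
    return np.array([-np.sum(y - p[0])])  # analytic gradient supplied to the optimizer

fit = minimize(negloglik, x0=np.array([0.0]), jac=grad, method="BFGS")
mu_hat = fit.x[0]  # minimizer equals the sample mean
```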

3.2. Statistical Inference Based on MLEs
3.2.1. Information Matrix

Now we incorporate σ into the estimation procedure as well. Once η̂ = (τ̂, θ̂, x̂_0) is obtained by the above two-stage procedure, the MLE of σ is obtained analytically as in (8). Let ψ = (η, σ) denote the vector of all unknown parameters (including σ), of dimension d = m + p + 2. Subsequent inference based on the MLEs requires the computation of the Fisher information [14, 15]. The Fisher information matrix I(ψ) is the symmetric d × d matrix whose (j, k)th element is the covariance between the jth and kth first partial derivatives of the log-likelihood:

I(ψ)_{jk} = Cov(∂ℓ/∂ψ_j, ∂ℓ/∂ψ_k). (22)

In terms of the expected values of the second partial derivatives, the Fisher information matrix in (22) is equivalently

I(ψ)_{jk} = −E[∂²ℓ/∂ψ_j ∂ψ_k]. (23)

The observed Fisher information matrix is simply I(ψ̂), the information matrix evaluated at the maximum likelihood estimate, ψ̂, of ψ. Further, its inverse evaluated at the MLE is an estimate of the asymptotic covariance matrix of ψ̂, which is given by

Σ̂ = I(ψ̂)^{−1}. (24)

Since the log-likelihood function is given by

ℓ(ψ) = −(n/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{n} (y_i − x(t_i; η))², (25)

the first-order partial derivative of ℓ with respect to each element η_j of η is

∂ℓ/∂η_j = (1/σ²) ∑_{i=1}^{n} (y_i − x(t_i; η)) ∂x(t_i; η)/∂η_j,

for j = 1, …, m + p + 1, and with respect to σ it is ∂ℓ/∂σ = −n/σ + (1/σ³) ∑_{i=1}^{n} (y_i − x(t_i; η))². Taking expectations of the negatives of the second-order partial derivatives, and using the fact that E(y_i) = x(t_i; η) so that terms involving second derivatives of x vanish, we get from (23)

I(ψ)_{jk} = (1/σ²) ∑_{i=1}^{n} (∂x(t_i; η)/∂η_j)(∂x(t_i; η)/∂η_k), for j, k = 1, …, m + p + 1,

with I(ψ)_{jσ} = 0 and I(ψ)_{σσ} = 2n/σ². We compute each element of the matrix in (23) for DDEMs with single and multiple delays; the explicit expressions are given in Appendices A and B.
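When analytic second derivatives are inconvenient, the observed information can also be approximated numerically as the negative Hessian of the log-likelihood at the MLE. A finite-difference sketch in Python on a plain normal model (not the DDE likelihood), whose information matrix is known in closed form:

```python
import numpy as np

# Observed information = negative Hessian of the log-likelihood at the MLE,
# approximated here by central finite differences (a generic numerical sketch).
rng = np.random.default_rng(1)
y = rng.normal(2.0, 0.5, size=200)

def loglik(p):
    mu, sigma = p  # additive constant -(n/2)*log(2*pi) dropped; Hessian unaffected
    return -y.size * np.log(sigma) - np.sum((y - mu) ** 2) / (2.0 * sigma ** 2)

mle = np.array([y.mean(), y.std()])  # closed-form MLE for the normal model

def observed_info(f, p, h=1e-5):
    d = p.size
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.eye(d)[i] * h
            ej = np.eye(d)[j] * h
            H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                       - f(p - ei + ej) + f(p - ei - ej)) / (4.0 * h * h)
    return -H

cov = np.linalg.inv(observed_info(loglik, mle))  # estimate of the covariance matrix
se = np.sqrt(np.diag(cov))                       # standard errors for (mu, sigma)
```

For the normal model the observed information at the MLE is diag(n/σ̂², 2n/σ̂²), so the standard errors should come out close to σ̂/√n and σ̂/√(2n).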

3.2.2. Confidence Intervals

A confidence interval for an unknown parameter gives a range of values that covers the true value of the parameter with a specified probability. The standard form of a confidence interval is estimate ± margin of error. To construct a level C confidence interval for any element of ψ, say ψ_j, we need an estimate of the margin of error. First, the estimated standard error of the maximum likelihood estimate, ψ̂_j, of ψ_j is

se(ψ̂_j) = √(Σ̂_{jj}),

where Σ̂ is the covariance matrix given in (24). The explicit terms of the covariance matrix are obtained by substituting the information matrix entries into (24). The confidence interval for ψ_j is

ψ̂_j ± z* · se(ψ̂_j), (35)

where z* = norminv(1 − α/2), α = 1 − C is the desired significance level, and z* · se(ψ̂_j) is the margin of error. We can find confidence intervals for all components of ψ in this way. In some cases, the estimated confidence interval in (35) may include negative values, which is unreasonable for a parameter that is known to be positive. In this case, we perform a logarithmic transformation of the parameter, construct the confidence interval for the log-transformed parameter, and then transform the confidence interval back to the original parameter space. By the delta method, the standard error of log ψ̂_j is se(ψ̂_j)/ψ̂_j, so the confidence interval for ψ_j based on this log-transformation procedure is

(ψ̂_j exp(−z* · se(ψ̂_j)/ψ̂_j), ψ̂_j exp(z* · se(ψ̂_j)/ψ̂_j)). (36)
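A sketch of both interval constructions in Python, with hypothetical values for the estimate and its standard error:

```python
import numpy as np
from scipy.stats import norm

# 95% Wald interval and its log-transformed version for a positive parameter.
# theta_hat and se are hypothetical; in the paper, se comes from the inverse
# information matrix (24).
theta_hat, se = 0.30, 0.18
z = norm.ppf(0.975)  # the norminv(1 - alpha/2) multiplier, alpha = 0.05

plain = (theta_hat - z * se, theta_hat + z * se)  # may dip below zero

# Delta method: the standard error of log(theta_hat) is se / theta_hat;
# exponentiating the interval for log(theta) keeps both endpoints positive.
se_log = se / theta_hat
log_ci = (theta_hat * np.exp(-z * se_log), theta_hat * np.exp(z * se_log))
```

With these illustrative numbers the plain interval has a negative lower endpoint, while the log-transformed interval stays within the positive parameter space.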

4. Examples

We present three examples of DDEMs in the univariate case: two models with a single delay and a third with two delays.

4.1. Example  1

We consider the exponential delay differential equation model (EDDEM) with a single delay (i.e., m = 1), for which the mean trajectory is the solution to the DDE

x′(t) = a x(t − τ). (37)

The EDDEM in (37) is a model for ideal population growth under infinite resources and no deaths, such as a protozoan or bacterial culture dividing under constant environmental conditions. The delay parameter τ can be taken to represent the gestation or maturity period, that is, the time taken for individuals to be ready for division. The parameter a represents the growth rate of the population. We numerically solve the DDE in (37) using the MATLAB function dde23 with fixed parameter values. Sampled observations from the DDEM as in (1) were obtained at discrete time intervals of fixed width starting from t_0. The aim is to estimate a, τ, x_0, and σ based on the observations. Figure 1(a) illustrates the different behaviour of the solution under different parameter specifications. Figure 1(b) shows the underlying trajectory of the solution from the DDE model (37) together with the sampled observations.
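For this linear DDE the method of steps yields a finite series solution, which can be used to simulate observations as in (1). A Python sketch with illustrative parameter values (the paper's actual settings appear in its figures and tables):

```python
import math
import numpy as np

def edde_solution(t, a, tau, x0):
    """Solution of x'(t) = a*x(t - tau) with constant history x(t) = x0 for
    t <= 0, via the method-of-steps series
    x(t) = x0 * sum_k a^k * max(t - (k-1)*tau, 0)^k / k!."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    total = np.zeros_like(t)
    n_terms = int(np.floor(t.max() / tau)) + 2  # enough terms up to t.max()
    for k in range(n_terms + 1):
        total += a ** k * np.clip(t - (k - 1) * tau, 0.0, None) ** k / math.factorial(k)
    return x0 * total

# Noisy observations y_i = x(t_i) + eps_i as in (1); values are illustrative.
rng = np.random.default_rng(2)
t_obs = np.arange(0.0, 5.0, 0.1)
y = edde_solution(t_obs, a=0.8, tau=1.0, x0=2.0) + rng.normal(0.0, 0.05, t_obs.size)
```

Each series term switches on only for t ≥ (k − 1)τ, so differentiating the series reproduces x′(t) = a x(t − τ) exactly, which makes the closed form a useful check against a numerical solver.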

The initial grid space for the adaptive grid procedure was taken to be a rectangle containing the true values of (a, τ); see the remark given towards the end of this section regarding the selection of the grid for this example as well as for the rest. The stopping criterion threshold δ was chosen to be 0.0001. The adaptive grid and Newton-Raphson procedures were run; the results are given in Tables 1 and 2. Recall that S is the number of subdivisions of each delay interval needed for calculating the derivative processes; see Section 3.1.1. As S increases, the MLE estimates become more accurate, but at the cost of increased computational time. We note that moderate values of S already achieve satisfactory results in terms of closeness to the true parameter values. Such a value of S is subsequently used for finding the maximum of the log-likelihood function, for finding the information matrix, and for computing the confidence intervals of the parameters.

The Fisher information matrix and the variances of the estimates at the MLE were computed for the EDDEM with a single delay. The resulting confidence intervals for the parameters are shown in Table 3.

4.2. Example  2

A delay differential equation in population ecology given by

x′(t) = r x(t) [1 − x(t − τ)/K] (40)

is known as Hutchinson's equation [16], where x(t) is the population at time t, r is the intrinsic growth rate, and K is the carrying capacity of the population. Both r and K are positive constants, and τ is a positive constant delay parameter.

Rescaling the population by the carrying capacity, define u(t) = x(t)/K; then (40) can be rewritten as

u′(t) = r u(t) [1 − u(t − τ)]. (41)

The observations are collected at the sampled time points t_1, …, t_n based on the observational model (1), and the aim is to estimate r, τ, the initial value, and σ from the observations. The DDE in (41) is solved numerically using the MATLAB function dde23 with fixed parameter values. Figure 2(a) shows the underlying trajectories (mean function) of the solution of the DDE model (41) and illustrates the different behaviour of the solution under different parameter specifications, reflecting both stable and unstable solutions. Subsequently, we fix one parameter specification. Sampled observations from the DDEM as in (1) were obtained at discrete time intervals of fixed width starting from t_0. Figure 2(b) shows the underlying trajectory of the solution of the DDE model (41) together with the sampled observations for the time range selected.

The initial grid space for the adaptive grid procedure was taken to be a rectangle containing the true values of (r, τ). The stopping criterion threshold δ was chosen to be 0.0001. The adaptive grid and Newton-Raphson procedures were run; the results are given in Tables 4 and 5. As in the previous example, as S increases, the MLE becomes more accurate and very close to the true value, but at the cost of increased computational time.

The Fisher information matrix and the variances of the estimates at the MLE were computed for the delayed logistic model with a single delay. The resulting confidence intervals for the parameters are shown in Table 6.

4.3. Example  3

The delayed logistic differential equation model (DLDEM) with two delays proposed by Braddock and van den Driessche [17] has mean trajectory given by the solution to the DDE

x′(t) = r x(t) [1 − a_1 x(t − τ_1) − a_2 x(t − τ_2)], (44)

where r, a_1, a_2, τ_1, and τ_2 are positive constants. DDEs with two delays appear in many applications such as epidemiological models [18], physiological models [19], neurological models [20], and medical models [21]. In such equations [22, 23], very rich dynamics have been observed. Collecting the evolution parameters as θ = (r, a_1, a_2) and the delays as τ = (τ_1, τ_2), the model takes the general DDEM form of (2). Using the MATLAB function dde23 with fixed parameter values, we obtain the trajectories of the solution x(t). As in Example 1, we note the different characteristics of the solution depending on the parameter specification, as shown in Figure 3(a). Subsequently, the parameters are fixed, and observations are collected at the sampled time points t_1, …, t_n based on the observational model (1) at discrete time intervals of fixed width starting from t_0. Figure 3(b) shows the underlying trajectory of the solution of the DDE model (44) together with the sampled observations.

The initial grid space for the adaptive grid procedure was taken to be a region containing the true parameter values. The stopping criterion threshold δ was chosen as before. The adaptive grid and Newton-Raphson procedures were run; the results are given in Tables 7 and 8.

The Fisher information matrix and the variances of the estimates at the MLE were computed for the DLDEM with two delays. The resulting confidence intervals for the parameters are shown in Table 9.

Remark 2. As mentioned earlier, the adaptive grid procedure in the first stage of our two-stage procedure requires selecting a sufficiently large domain that is likely to contain the MLE. The MLE should be close to the true parameter values that generated the data, as standard MLE theory [11–13] dictates; this has also been borne out in the three examples considered. Hence, since the true parameter values are known in our simulation examples, we selected the initial domain of the grid procedure to contain these true values in its interior. The grid is thus ensured to contain the true parameter values through the selection of its lower and upper bounds in all the examples. Other than this consideration, the true values used in the simulations were selected rather arbitrarily, chosen only so as to be representative parameter values that exhibit the typical nature of the trajectories of the underlying DDEs, as shown in the figures.

5. Conclusion

In this paper, we presented the method of maximum likelihood for estimating parameters in delay differential equation models. As examples, we considered the exponential delay differential equation model, the delayed logistic differential equation model with a single delay, and the delayed logistic differential equation model with two delays, and we estimated the unknown parameters in these models. A two-stage approach using an adaptive grid followed by a quasi-Newton procedure was proposed. Our methodology correctly estimates the delay parameters as well as the initial starting value of the dynamical system based on simulated data. The information matrix and confidence intervals based on maximum likelihood were obtained, and the confidence intervals were found to contain the true values of the parameters in the simulations.

In this paper, we took the initial value function of the DDE to be an unknown constant, x_0. However, it is possible to extend the constant initial value assumption to a more general linear or nonlinear function, say g(t), for t ≤ t_0. Two complications arise here. First, we need additional unknown parameters to represent g; for example, if g is chosen to be linear, we have to estimate a slope parameter in addition to an intercept. Higher-order functions offer greater flexibility in modeling the initial function, but at the expense of estimating extra parameters and slowing down the computational procedure. A second issue that follows from the first is the selection of a "best" initial value function: constant, linear, quadratic, or other. Thus, further research is required to address this concern, and we hope to report results in this direction in the future.

Appendix

A. Quasi-Newton Procedure

As input to the quasi-Newton procedure, we are required to compute the gradient vector given in (20), which consists of the partial derivatives of the log-likelihood function with respect to the entries of η = (τ, θ, x_0). Recall that the DDEM has multiple delay parameters τ = (τ_1, …, τ_m), evolution parameters θ = (θ_1, …, θ_p), and unknown initial condition x_0. The first-order partial derivatives of the log-likelihood each take the form (1/σ²) ∑_i (y_i − x(t_i; η)) multiplied by the corresponding derivative of x(t_i; η). These expressions involve derivatives of x(t) with respect to the parameters. Each derivative process of x with respect to τ_k, θ_j, and x_0, for k = 1, …, m and j = 1, …, p, can be obtained numerically from a respective DDE derived from the initial model in (2) by differentiating it with respect to the quantity of interest. In the case of τ_k, differentiating (2) with respect to τ_k gives a new DDE, (A.4), for the derivative process ∂x(t)/∂τ_k, in which the partial derivatives of f appear together with the delayed versions of the derivative process. The initial condition for the DDE in (A.4) is zero, since the derivative of the initial value x_0 with respect to τ_k is 0. Based on this initial condition, the DDE model can be numerically solved and the values of ∂x(t_i)/∂τ_k obtained for each i.

In a similar way, each value of ∂x(t_i)/∂θ_j can be determined by differentiating (2) with respect to θ_j. The new DDE for this derivative process is (A.6), in which the partial derivatives of f again appear together with the delayed versions of the derivative process. The initial condition for the DDE in (A.6) is zero since, again, the derivative of the initial value x_0 with respect to θ_j is 0. Based on this initial condition, the DDE model can be numerically solved and the values of ∂x(t_i)/∂θ_j obtained for each i. The case of x_0 is similar and has been discussed in the main text when presenting the Newton-Raphson procedure.

The expressions in (A.3) also involve ∂x̂_0/∂τ_k and ∂x̂_0/∂θ_j. These can be obtained by differentiating the score equation satisfied by x̂_0 in (10) for every pair (τ, θ). Differentiating (10) with respect to τ_k and solving gives ∂x̂_0/∂τ_k for k = 1, …, m; similarly, differentiating (10) with respect to θ_j and solving gives ∂x̂_0/∂θ_j for j = 1, …, p. We give the explicit expressions for the second-order derivatives of x with respect to its arguments in Appendix B (as given by (B.6), (B.7), and (B.11) in Appendix B).

B. Information Matrix

For the DDE in (2) with multiple delays, recall that η = (τ, θ, x_0) and ψ = (η, σ). The first-order partial derivatives of the log-likelihood with respect to the elements of η each involve the corresponding first derivative process of x. As mentioned in Appendix A, these partial derivatives of x can be evaluated numerically from the derivative processes of x with respect to τ_k, θ_j, and x_0, for k = 1, …, m and j = 1, …, p, since each of them satisfies an additional DDE derived from (2) by differentiating it with respect to the quantity of interest. Differentiating the first-order expressions again with respect to the arguments of ℓ gives the general forms of the second-order partial derivatives. Equations (B.6), (B.7), and (B.10) are required for the quasi-Newton procedure in Appendix A.

To obtain the information matrix, we need to take the expectation of the negative of the second-order derivatives of ℓ with respect to its arguments. Taking negatives on the left-hand sides of (B.3)–(B.12), followed by expectations under the sampling distribution of each y_i, we note that the general terms of the form E[(y_i − x(t_i; η)) ∂²x(t_i; η)/∂ψ_j ∂ψ_k] on the right-hand sides become zero, since the expectation of y_i − x(t_i; η) equals 0. Hence the entries of the information matrix reduce to sums of products of the first derivative processes, as in (30). Finally, taking the expectation in (B.12) gives the entry of the information matrix corresponding to σ.

Conflicts of Interest

The authors declare that there are no conflicts of interest with respect to the research, authorship, and/or publication of this paper.

Acknowledgments

The authors would like to thank Universiti Teknologi PETRONAS for the financial assistance under ERGS and HiCOE grants from the Ministry of Education Malaysia, as well as resource facilities provided by Center for Intelligent Signal and Imaging Research (CISIR).