Complexity

Volume 2017 (2017), Article ID 6148934, 14 pages

https://doi.org/10.1155/2017/6148934

## Maximum Likelihood Inference for Univariate Delay Differential Equation Models with Multiple Delays

^{1}Fundamental and Applied Sciences Department, Faculty of Science and Information Technology, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

^{2}Department of Electrical and Electronic Engineering, Center for Intelligent Signal and Imaging Research (CISIR), Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia

Correspondence should be addressed to Ahmed A. Mahmoud

Received 11 May 2017; Revised 19 August 2017; Accepted 23 August 2017; Published 12 October 2017

Academic Editor: Fathalla A. Rihan

Copyright © 2017 Ahmed A. Mahmoud et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This article presents statistical inference methodology based on maximum likelihood for delay differential equation models in the univariate setting. Maximum likelihood inference is obtained for single and multiple unknown delay parameters as well as other parameters of interest that govern the trajectories of the delay differential equation models. The maximum likelihood estimator is obtained based on adaptive grid and Newton-Raphson algorithms. Our methodology correctly estimates the delay parameters as well as other unknown parameters (such as the initial starting values) of the dynamical system based on simulation data. We also develop methodology to compute the information matrix and confidence intervals for all unknown parameters based on the likelihood inferential framework. We present three illustrative examples related to biological systems. The computations have been carried out with the help of the mathematical software MATLAB® 8.0 R2014b.

#### 1. Introduction

Delay differential equations (DDEs) are widely used to model many real life phenomena, especially in science and engineering. Examples include the modeling of the spread of infectious diseases, tumor growth and the growth of blood clots in the brain, population dynamics, traffic monitoring, and price fluctuations of commodities in economics; see [1–4]. A univariate delay differential equation model (DDEM) with multiple delays treats the real-valued observations, $y_1, \ldots, y_n$, as noisy realizations from an underlying DDE:
$$y_i = x(t_i) + \epsilon_i, \quad i = 1, 2, \ldots, n, \tag{1}$$
where the $\epsilon_i$'s are errors assumed to arise from a noise distribution with zero mean and unknown standard deviation $\sigma$. In (1), $x(t_i)$ is the solution, $x(t)$, of the DDE
$$x'(t) = f\big(t, x(t), x(t-\tau_1), \ldots, x(t-\tau_m); \boldsymbol{\theta}\big), \tag{2}$$
evaluated at the time points $t_i$; in (2), $x(t-\tau_j)$, $j = 1, \ldots, m$, is the $j$th delay term with delay parameter $\tau_j$, and $\boldsymbol{\theta}$ is a vector of other parameters of interest that govern the trajectories of the underlying DDE in (2). Equations (1) and (2) constitute a univariate DDEM in its most general form. In a DDEM, the parameters $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_m)$ and $\boldsymbol{\theta}$ are often unknown and have to be estimated based on the observations $y_1, \ldots, y_n$.

Not many methods appear in the statistical literature on parameter estimation and inference for DDEMs. Among the statistical approaches that have been suggested, many involve restrictions on the form of DDEMs that are being investigated. When such restrictions are relaxed, high computational costs and challenges arise. Typically, further inferential procedures such as obtaining standard errors and confidence intervals associated with parameter estimates involve further computational costs and challenges. We give a brief review of these works and approaches that have been reported in the literature in the following paragraph.

Ellner et al. [5] estimate the derivative of a univariate DDEM, which is assumed to be in an additive form, using nonparametric smoothing. Subsequently, they infer the constant (single) delay parameter, $\tau$, by fitting a generalized additive model. Ellner's technique, although it unifies previous works, can thus be applied only to DDEMs which satisfy the assumed additive form. Wood [6] developed spline based model fitting techniques for the case when the DDEMs are partially specified. The spline based method involves high computational costs, as cross-validation is used to select the smoothing coefficients associated with the penalty term as well as the unknown parameter estimates. A penalized semiparametric method is proposed by Wang and Cao [7] which involves maximizing an objective function consisting of two terms: a likelihood term and a penalty term which measures the discrepancy between an estimate of the derivative, $x'(t)$, and the right hand side of the DDE in (2). The selection of smoothing coefficients is done, similar to [6], via cross-validation, whereas standard errors of parameter estimates are obtained by bootstrapping. It follows that the method of [7], like [6], involves high computational costs. Further, Wang and Cao consider only univariate DDEMs with a single delay parameter. An estimation method based on Least Squares Support Vector Machines (LS-SVMs) for approximating constant as well as time-varying parameters in deterministic parameter-affine DDEMs is presented by Mehrkanoon et al. [8]. We note that Mehrkanoon et al. perform parameter estimation only; no standard errors of estimates or confidence intervals are reported. Further, only single delays (either constant or time varying) are considered in [8].

In this paper, we consider parameter estimation and inference for univariate DDEMs with multiple delays based on maximum likelihood. The method of maximum likelihood, as advocated by Fisher in his important papers [9, 10], has become one of the most significant tools for estimation and inference available to statisticians. Maximum likelihood estimators (MLEs) are well defined once a distributional model is specified for the observations. MLEs have well-behaved and well-understood properties: Huber [11] presents general conditions under which the MLE is consistent for the true value of the unknown parameters for large sample sizes. Wald [12] and Akaike [13] observed that the maximum likelihood estimator is a natural estimator for the parameters even when the true distribution is unknown. The large sample theory and distributional properties of MLEs can be used to perform subsequent inference procedures, such as obtaining standard errors and confidence intervals and performing tests of hypotheses, at minimal additional computational cost. MLEs are also the basic estimators used in subsequent statistical inferential procedures such as model selection using the Akaike Information Criterion (AIC), the Bayes Information Criterion (BIC), and other model selection criteria. Model selection is an important issue in DDEMs, such as for the partially specified DDEMs in [6], where several models can be elicited for an observed physical process, but one model must be selected that both fits the observed data and is simple enough to understand (Occam's razor principle).

MLE can be developed for a large variety of estimation situations and is asymptotically efficient, which means that for large samples it produces the most precise estimates compared with non-MLE based methods (such as [8]). These are the reasons why we prefer MLE over other estimators for DDEMs in this paper.

The remainder of this paper is organized as follows: we define univariate DDEMs in Section 2. In Section 3, the MLE approach for DDEMs is outlined and the MLE is obtained computationally using an adaptive grid procedure followed by a gradient descent algorithm. We also develop algorithms for obtaining the information matrix and construct standard errors and confidence intervals for the unknown parameters. Three examples of univariate DDEMs related to biological systems are presented, and the numerical solutions and results based on the proposed methodology are provided based on simulation in Section 4.

#### 2. General Model Formulation

Recall the DDEM defined by (1) and (2). The observation $y_i$ is obtained at the $i$th sampled time point, $t_i$, for $i = 1, \ldots, n$, with $0 \le t_1 < t_2 < \cdots < t_n \le T$. In the remainder of this paper, the errors are assumed to be independent and identically distributed according to a normal distribution with mean zero and unknown standard deviation $\sigma$, that is, $\epsilon_i \sim N(0, \sigma^2)$. The underlying dynamical system $x(t)$, $t \in [0, T]$, is expressed implicitly in terms of the DDE. The general form of a DDE with multiple delays for $x(t)$ is given by (2) as $x'(t) = f(t, x(t), x(t-\tau_1), \ldots, x(t-\tau_m); \boldsymbol{\theta})$, where $f$ is a 1-dimensional function and $x'(t)$ denotes the first derivative of $x$ with respect to time $t$. The quantities $\boldsymbol{\theta}$ and $\boldsymbol{\tau}$ are unknown parameters of the DDEM, where $\boldsymbol{\theta}$ is a vector of unknown parameters of dimension $p$ and $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_m)$ is a vector of time delays of dimension $m$. The complete trajectory of the function $x$ on $[0, T]$ is determined by (2) and an initial condition function $x_0$, where $x(t) = x_0(t)$ for all $t \le 0$; $x_0$ is also unknown in addition to the unknown parameters $\boldsymbol{\theta}$ and $\boldsymbol{\tau}$. For given values of $\boldsymbol{\theta}$, $\boldsymbol{\tau}$, and $x_0$, we note that the solution $x(t)$, $t \in [0, T]$, is a function of $\boldsymbol{\theta}$, $\boldsymbol{\tau}$, and $x_0$; thus, we make this dependence explicit in the notation by writing $x(t; \boldsymbol{\theta}, \boldsymbol{\tau}, x_0)$. The observations $y_1, \ldots, y_n$ are collected at the sampled time points $t_1, \ldots, t_n$ based on the observational model (1). Our goal is to estimate $\boldsymbol{\theta}$, $\boldsymbol{\tau}$, and $x_0$ based on the observations $y_1, \ldots, y_n$. Here, note that $\sigma$ is also unknown, but it appears as a nuisance parameter, since the properties of the dynamical system are governed by $\boldsymbol{\theta}$, $\boldsymbol{\tau}$, and $x_0$, and not $\sigma$.
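To make the observational model (1)–(2) concrete, the following sketch simulates noisy observations from a single-delay DDEM with a constant history. It is illustrative only: the paper solves DDEs with MATLAB's dde23, whereas here a simple Euler method-of-steps is used, and the function names (`solve_dde_euler`, `sample_observations`), the step size `h`, and the constant-history assumption are all our own choices.

```python
import random

def solve_dde_euler(f, tau, theta, x0, t_end, h):
    """Euler method-of-steps for x'(t) = f(x(t), x(t - tau); theta)
    with constant history x(t) = x0 for t <= 0."""
    n_steps = int(round(t_end / h))
    lag = int(round(tau / h))            # delay expressed in grid steps
    xs = [x0] * (n_steps + 1)            # xs[k] approximates x(k * h)
    for k in range(n_steps):
        x_lag = x0 if k - lag < 0 else xs[k - lag]
        xs[k + 1] = xs[k] + h * f(xs[k], x_lag, theta)
    return xs

def sample_observations(xs, h, dt, sigma, rng):
    """Noisy observations y_i = x(t_i) + eps_i at spacing dt, as in (1)."""
    stride = int(round(dt / h))
    return [xs[k] + rng.gauss(0.0, sigma) for k in range(0, len(xs), stride)]
```

With `sigma = 0` the sampled values coincide with the underlying trajectory, which is a convenient sanity check before adding noise.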

#### 3. The Maximum Likelihood Estimation Approach for DDEMs

The likelihood of the DDEM for parameters $\boldsymbol{\theta}$, $\boldsymbol{\tau}$, and $x_0$, given observations $\mathbf{y} = (y_1, \ldots, y_n)$, is
$$L(\boldsymbol{\theta}, \boldsymbol{\tau}, x_0 \mid \mathbf{y}) = \prod_{i=1}^{n} g(y_i \mid \boldsymbol{\theta}, \boldsymbol{\tau}, x_0), \tag{3}$$
where $\mathbf{y}$ is the collection of all observations on $[0, T]$ with density
$$g(y_i \mid \boldsymbol{\theta}, \boldsymbol{\tau}, x_0) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{\big(y_i - x(t_i; \boldsymbol{\theta}, \boldsymbol{\tau}, x_0)\big)^2}{2\sigma^2}\right\} \tag{4}$$
based on the normality assumption on the $\epsilon_i$'s in (1). The above expression for the likelihood can be simplified to
$$L(\boldsymbol{\theta}, \boldsymbol{\tau}, x_0 \mid \mathbf{y}) = \big(2\pi\sigma^2\big)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(y_i - x(t_i; \boldsymbol{\theta}, \boldsymbol{\tau}, x_0)\big)^2\right\}. \tag{5}$$
We assume for the moment that $\sigma$ is fixed and known; the case when $\sigma$ is unknown is dealt with later. Thus, the above likelihood is taken to be a function of $(\boldsymbol{\theta}, \boldsymbol{\tau}, x_0)$ for now. The usual practice for statistical inference is to use the natural logarithm of the likelihood function, namely, the log-likelihood function, which is given by
$$\ell(\boldsymbol{\theta}, \boldsymbol{\tau}, x_0) = -\frac{n}{2}\log\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(y_i - x(t_i; \boldsymbol{\theta}, \boldsymbol{\tau}, x_0)\big)^2. \tag{6}$$
Expressions of the log-likelihood are often simpler to work with than the likelihood function, $L$, since they are easier to differentiate and the results are more stable computationally.

Since $\log$ is a monotonically increasing function, it follows that the maximization of (3) and (6) is equivalent in that the same optimized parameter is found. We write $\boldsymbol{\psi} = (\boldsymbol{\theta}, \boldsymbol{\tau}, x_0)$ and denote the MLE of $\boldsymbol{\psi}$ as $\hat{\boldsymbol{\psi}}$. The MLE is a point estimate such that
$$\hat{\boldsymbol{\psi}} = \arg\max_{\boldsymbol{\psi}} \ell(\boldsymbol{\psi}) \tag{7}$$
and can be viewed as a random vector depending on the distribution of the data, $\mathbf{y}$. We now consider the case when $\sigma$ is unknown. The MLE of $\sigma$ is denoted by $\hat{\sigma}$. After finding $\hat{\boldsymbol{\psi}}$ as in (7), the log-likelihood equation in (6) is maximized as a function of $\sigma$. The resulting estimate is available in closed form and is given by
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - x(t_i; \hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\tau}}, \hat{x}_0)\big)^2. \tag{8}$$
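As a small illustration of (6) and (8), the sketch below evaluates the Gaussian log-likelihood of observations about a fitted trajectory and the closed-form MLE of $\sigma$. The function names are hypothetical, and the trajectory `x_hat` is assumed to be supplied by some external DDE solver.

```python
import math

def log_likelihood(y, x_hat, sigma):
    """Gaussian log-likelihood (6) of observations y about trajectory x_hat."""
    n = len(y)
    rss = sum((yi - xi) ** 2 for yi, xi in zip(y, x_hat))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)

def sigma_mle(y, x_hat):
    """Closed-form MLE of sigma, as in (8), given the fitted trajectory."""
    n = len(y)
    rss = sum((yi - xi) ** 2 for yi, xi in zip(y, x_hat))
    return math.sqrt(rss / n)
```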

##### 3.1. A Two-Stage Numerical Procedure for Finding the MLE

To find the MLE numerically, we develop a two-stage numerical procedure consisting of an adaptive grid procedure followed by a gradient descent algorithm. Two stages are needed because we wish to utilize the advantages of each algorithm while avoiding the drawbacks of the other. Grid algorithms are able to find the global maximum of a function over a grid space: the function is first evaluated at every point of the grid, and the grid point corresponding to the maximum is then selected. Provided the grid is fine enough, this grid point will be close to the domain value that actually attains the global maximum. So by gridding, we are able to ensure that we are close to the global maximum. The adaptive grid algorithm enhances the basic gridding algorithm so that we move closer and closer to the global maximum. However, the main drawback of any grid (and adaptive grid) algorithm is its slow convergence.

On the other hand, gradient descent algorithms converge to a maximum of a function quickly. Their main drawback is that they find the local maximum nearest to the starting point. So, if the starting point is not close to the global maximum, a gradient descent algorithm does not guarantee that the global maximum is found, since it might get “stuck” at a local maximum.

The function to maximize in our case is the log-likelihood in (6), and the domain value corresponding to this maximum is the MLE. Thus, our two-stage method uses the adaptive grid in the first stage to ensure that we are close to the MLE and then switches to a quasi-Newton algorithm to ensure rapid convergence to the MLE.

###### 3.1.1. Grid Procedure

To find $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\tau}}$ as in (7), the parameter value with the largest (log-)likelihood should be chosen. This can be done by an adaptive grid procedure. The gridding is carried out for $\boldsymbol{\tau}$ and $x_0$, and for each pair $(\boldsymbol{\tau}, x_0)$ in the grid space, a Newton-Raphson numerical procedure is used to find the maximum value of the log-likelihood, which, regarded as a function of the unknown parameters $\boldsymbol{\psi} = (\boldsymbol{\theta}, \boldsymbol{\tau}, x_0)$, is defined as
$$\ell(\boldsymbol{\psi}) = -\frac{n}{2}\log\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(y_i - x(t_i; \boldsymbol{\psi})\big)^2. \tag{9}$$

We use the grid space $G$, which covers plausible values of $(\boldsymbol{\tau}, x_0)$. For every fixed value of $\boldsymbol{\tau}$ and $x_0$ in $G$, we find the MLE of $\boldsymbol{\theta}$, denoted $\hat{\boldsymbol{\theta}}(\boldsymbol{\tau}, x_0)$, by maximizing the log-likelihood above. Since $\hat{\boldsymbol{\theta}}$ satisfies the score equation
$$\frac{\partial \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\bigg|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} = \mathbf{0}, \tag{11}$$
the numerical problem is solved by using the Newton-Raphson method:
$$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} - \left[\frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\, \partial \boldsymbol{\theta}^T}\right]^{-1} \frac{\partial \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\Bigg|_{\boldsymbol{\theta} = \boldsymbol{\theta}^{(k)}}, \tag{12}$$
wherein (12) and (13), $\partial \ell/\partial \boldsymbol{\theta}$ and $\partial^2 \ell/\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T$ are, respectively, the first and second partial derivatives of $\ell$ with respect to $\boldsymbol{\theta}$, evaluated at $\boldsymbol{\theta}^{(k)}$ for $k = 0, 1, 2, \ldots$:
$$\frac{\partial \ell}{\partial \boldsymbol{\theta}} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\big(y_i - x(t_i; \cdot)\big)\frac{\partial x(t_i; \cdot)}{\partial \boldsymbol{\theta}}, \qquad \frac{\partial^2 \ell}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left[\big(y_i - x(t_i; \cdot)\big)\frac{\partial^2 x(t_i; \cdot)}{\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T} - \frac{\partial x(t_i; \cdot)}{\partial \boldsymbol{\theta}}\frac{\partial x(t_i; \cdot)}{\partial \boldsymbol{\theta}^T}\right]. \tag{13}$$
As seen from (12) and (13), we need to calculate $\partial x(t_i; \cdot)/\partial \boldsymbol{\theta}$ and $\partial^2 x(t_i; \cdot)/\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T$ at each $t_i$. This is done recursively as follows.
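The nesting described above, an outer grid over $(\boldsymbol{\tau}, x_0)$ with an inner Newton-Raphson step for $\boldsymbol{\theta}$, can be sketched as follows. This is a schematic with hypothetical names, a scalar $\boldsymbol{\theta}$, and user-supplied score and Hessian functions, not the paper's MATLAB implementation.

```python
def newton_raphson(score, hessian, theta0, tol=1e-8, max_iter=50):
    """Scalar Newton-Raphson for the score equation (11):
    theta <- theta - score(theta) / hessian(theta)."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hessian(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

def grid_search(loglik_profile, tau_grid, x0_grid):
    """Evaluate the profiled log-likelihood on a (tau, x0) grid and
    return (best value, best tau, best x0).  The inner maximization
    over theta is hidden inside loglik_profile."""
    best = None
    for tau in tau_grid:
        for x0 in x0_grid:
            ll = loglik_profile(tau, x0)
            if best is None or ll > best[0]:
                best = (ll, tau, x0)
    return best
```

For a concave quadratic, the Newton-Raphson step lands on the maximizer in a single iteration, which makes a convenient check of the update rule.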

The first derivative process of $x(t)$ is obtained by differentiating (2) with respect to $\boldsymbol{\theta}$. Differentiating (2) with respect to $\boldsymbol{\theta}$, where $\boldsymbol{\theta}$ and $\boldsymbol{\tau}$ are independent of each other, gives
$$\frac{d}{dt}\left(\frac{\partial x(t)}{\partial \boldsymbol{\theta}}\right) = \frac{\partial f}{\partial x(t)}\frac{\partial x(t)}{\partial \boldsymbol{\theta}} + \sum_{j=1}^{m}\frac{\partial f}{\partial x(t - \tau_j)}\frac{\partial x(t - \tau_j)}{\partial \boldsymbol{\theta}} + \frac{\partial f}{\partial \boldsymbol{\theta}}, \tag{14}$$
which implies that the first derivative process $\partial x(t)/\partial \boldsymbol{\theta}$ satisfies another DDE, which is given by (14).

Similarly, the second derivative process $\partial^2 x(t)/\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T$ is obtained by differentiating (14) with respect to $\boldsymbol{\theta}$, which yields a further DDE, (15), implying that $\partial^2 x(t)/\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T$ satisfies a DDE depending on $\partial x(t)/\partial \boldsymbol{\theta}$. The above two DDEs can be solved numerically based on initial conditions that are specified below.

To obtain the initial conditions of the first and second derivative processes, we note that $x(t; \boldsymbol{\theta}, \boldsymbol{\tau}, x_0) = x_0(t)$ for $t \le 0$ does not depend on $\boldsymbol{\theta}$. Thus, $\partial x(t)/\partial \boldsymbol{\theta} = \mathbf{0}$ and $\partial^2 x(t)/\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T = \mathbf{0}$ for $t \le 0$. Subsequently, we can obtain the values of $\partial x(t_i)/\partial \boldsymbol{\theta}$ and $\partial^2 x(t_i)/\partial \boldsymbol{\theta}\,\partial \boldsymbol{\theta}^T$ numerically at every value of $t_i$ by solving the DDEs (14) and (15) and then using them in (12) and (13). For the numerical solution, we divide each interval between consecutive time points into $N$ equal segments, where $N$ is a natural number.

The grid algorithm operates on the grid space $G$ of $(\boldsymbol{\tau}, x_0)$ values, and the Newton-Raphson procedure is nested within the adaptive grid algorithm. Thus, for every grid value pair $(\boldsymbol{\tau}, x_0)$, the Newton-Raphson procedure uses these values to find $\hat{\boldsymbol{\theta}}(\boldsymbol{\tau}, x_0)$ via (11). On convergence of the Newton-Raphson method, we obtain the MLE of $\boldsymbol{\theta}$, $\hat{\boldsymbol{\theta}}(\boldsymbol{\tau}, x_0)$, for each grid point in $G$. The log-likelihood is calculated based on (6) using $\hat{\boldsymbol{\theta}}(\boldsymbol{\tau}, x_0)$. Then the grid-based maximum is found by maximizing this profiled log-likelihood as a function of $(\boldsymbol{\tau}, x_0)$. We define the MLE obtained from the gridding algorithm as
$$\hat{\boldsymbol{\psi}}_{G} = \arg\max_{(\boldsymbol{\tau}, x_0) \in G} \ell\big(\hat{\boldsymbol{\theta}}(\boldsymbol{\tau}, x_0), \boldsymbol{\tau}, x_0\big). \tag{17}$$

###### 3.1.2. The Adaptive Grid Procedure

The adaptive grid (AG) algorithm is a repeated application of the generic grid procedure over increasingly finer intervals for $(\boldsymbol{\tau}, x_0)$. The AG algorithm is as follows:

(1) Choose an initial grid space $G_1$ consisting of equally spaced grid points for $\boldsymbol{\tau}$ and $x_0$.

(2) Maximize $\ell$ with respect to $\boldsymbol{\theta}$, $\boldsymbol{\tau}$, and $x_0$ as described in the grid procedure above.

(3) Obtain $\hat{\boldsymbol{\psi}}_{G_1}$ as in (17).

(4) Refine the grid: suppose $\hat{\boldsymbol{\psi}}_{G_1}$ is obtained as in step (3). The new grid space $G_2$ has lower and upper $\boldsymbol{\tau}$-grid points given by the grid neighbours of the maximizing $\boldsymbol{\tau}$ value; the corresponding lower and upper $x_0$-grid points are the grid neighbours of the maximizing $x_0$ value. If either the lower or upper bounds are not found, then the original grid space is enlarged so that the MLE occurs in the interior of the grid.

(5) Repeat steps (2)–(4) to obtain $\hat{\boldsymbol{\psi}}_{G_2}$ based on the generic grid procedure. Repeat to generate the sequence $\hat{\boldsymbol{\psi}}_{G_r}$, $r = 1, 2, \ldots$. Stop at iteration $R$ when the change in successive estimates falls below a prespecified threshold.

(6) The final MLE based on the adaptive grid technique is $\hat{\boldsymbol{\psi}}_{AG} = \hat{\boldsymbol{\psi}}_{G_R}$.
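A one-dimensional version of the refinement idea can be sketched as below. The interface (`adaptive_grid`, the shrink-to-neighbours rule, and the stopping tolerance on the objective) is our own simplification of the multiparameter AG algorithm.

```python
def adaptive_grid(obj, lo, hi, n=11, rounds=6, tol=1e-6):
    """1-D adaptive grid: repeatedly grid [lo, hi], keep the best point,
    then shrink the interval to that point's neighbouring grid points."""
    prev = None
    for _ in range(rounds):
        step = (hi - lo) / (n - 1)
        pts = [lo + k * step for k in range(n)]
        best = max(pts, key=obj)                      # coarse maximizer
        if prev is not None and abs(obj(best) - obj(prev)) < tol:
            break                                     # objective has stabilized
        prev = best
        lo, hi = max(lo, best - step), min(hi, best + step)
    return best
```

Each round shrinks the search interval by roughly a factor of $2/(n-1)$, so a handful of rounds already localizes a smooth unimodal maximum quite precisely.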

*Remark 1.* In step (1), the initial grid space is chosen to be a large domain that is likely to contain the MLE. In our simulation experiments, since the true values of the parameters are known, the domain is selected around these true values. In practice, we need to carry out an exhaustive search within upper and lower bounds for $\boldsymbol{\tau}$ and $x_0$. If the parameters are positive, as is usually the case, the lower bounds can be taken to be zero. Next, we can choose large positive upper bounds and construct the grid consisting of equidistant marginal grid points. The number of grid points need not be too large, since we only aim to explore the log-likelihood profile. The log-likelihood can be evaluated at these grid points and plotted to visualize properties of the resulting surface. Depending on this plot, we can choose either to fix or to increase the upper bounds until we are certain that the MLE lies within the selected domain.

After obtaining the first-stage approximation to the MLE by the adaptive grid procedure above, we use the MATLAB function fminunc to obtain the final MLE by minimizing the negative log-likelihood function viewed as a function of the unknown parameter vector $\boldsymbol{\psi} = (\boldsymbol{\theta}, \boldsymbol{\tau}, x_0)$. Starting from $\hat{\boldsymbol{\psi}}_{AG}$, the final-step MLE is defined as
$$\hat{\boldsymbol{\psi}} = \arg\min_{\boldsymbol{\psi}} \big[-\ell(\boldsymbol{\psi})\big],$$
with $\ell(\boldsymbol{\psi})$ defined in (9). This MATLAB function requires the gradient vector $\partial \ell/\partial \boldsymbol{\psi}$ as an input; the explicit expression of each entry of the gradient is provided in Appendix A. As an option, fminunc uses a quasi-Newton procedure that does not require the calculation of second derivatives and hence saves computational time.

##### 3.2. Statistical Inference Based on MLEs

###### 3.2.1. Information Matrix

Now we incorporate $\sigma$ into the estimation procedure as well. Once $\hat{\boldsymbol{\psi}}$ is obtained by the above two-stage procedure, the MLE of $\sigma^2$ is obtained analytically as in (8). Let $\boldsymbol{\eta} = (\boldsymbol{\psi}, \sigma)$ denote the vector of all unknown parameters (including $\sigma$), of dimension $q$. Subsequent inference based on the MLEs requires the computation of the Fisher information [14, 15]. The Fisher information matrix $I(\boldsymbol{\eta})$ is the symmetric $q \times q$ matrix whose $(u, v)$th element is the covariance between the $u$th and $v$th first partial derivatives of the log-likelihood:
$$I_{uv}(\boldsymbol{\eta}) = \operatorname{Cov}\left(\frac{\partial \ell}{\partial \eta_u}, \frac{\partial \ell}{\partial \eta_v}\right). \tag{22}$$
Based on the expected values of the second partial derivatives, the Fisher information matrix in (22) is equivalent to
$$I_{uv}(\boldsymbol{\eta}) = -E\left[\frac{\partial^2 \ell}{\partial \eta_u\, \partial \eta_v}\right]. \tag{23}$$
The observed Fisher information matrix is simply $I(\hat{\boldsymbol{\eta}})$, the information matrix evaluated at the maximum likelihood estimate, $\hat{\boldsymbol{\eta}}$, of $\boldsymbol{\eta}$. Further, its inverse evaluated at the MLE is an estimate of the asymptotic covariance matrix for $\hat{\boldsymbol{\eta}}$, which is given by
$$\widehat{\operatorname{Cov}}(\hat{\boldsymbol{\eta}}) = I(\hat{\boldsymbol{\eta}})^{-1}. \tag{24}$$
Since the log-likelihood function is given by (6), the first-order partial derivative of $\ell$ with respect to each element $\psi_u$ of $\boldsymbol{\psi}$ is
$$\frac{\partial \ell}{\partial \psi_u} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \big(y_i - x(t_i; \boldsymbol{\psi})\big) \frac{\partial x(t_i; \boldsymbol{\psi})}{\partial \psi_u}$$
for $u = 1, \ldots, q - 1$, while $\partial \ell/\partial \sigma = -n/\sigma + \sigma^{-3} \sum_{i=1}^{n} \big(y_i - x(t_i; \boldsymbol{\psi})\big)^2$. The second-order partial derivative is
$$\frac{\partial^2 \ell}{\partial \psi_u\, \partial \psi_v} = \frac{1}{\sigma^2} \sum_{i=1}^{n} \left[\big(y_i - x(t_i; \boldsymbol{\psi})\big) \frac{\partial^2 x(t_i; \boldsymbol{\psi})}{\partial \psi_u\, \partial \psi_v} - \frac{\partial x(t_i; \boldsymbol{\psi})}{\partial \psi_u} \frac{\partial x(t_i; \boldsymbol{\psi})}{\partial \psi_v}\right]$$
for $u, v = 1, \ldots, q - 1$, with
$$\frac{\partial^2 \ell}{\partial \psi_u\, \partial \sigma} = -\frac{2}{\sigma^3} \sum_{i=1}^{n} \big(y_i - x(t_i; \boldsymbol{\psi})\big) \frac{\partial x(t_i; \boldsymbol{\psi})}{\partial \psi_u}, \qquad \frac{\partial^2 \ell}{\partial \sigma^2} = \frac{n}{\sigma^2} - \frac{3}{\sigma^4} \sum_{i=1}^{n} \big(y_i - x(t_i; \boldsymbol{\psi})\big)^2.$$
Taking expectations on both sides of the above equations, using $E[y_i - x(t_i; \boldsymbol{\psi})] = 0$ and $E[(y_i - x(t_i; \boldsymbol{\psi}))^2] = \sigma^2$, and from (23), we get
$$I_{uv}(\boldsymbol{\eta}) = \frac{1}{\sigma^2} \sum_{i=1}^{n} \frac{\partial x(t_i; \boldsymbol{\psi})}{\partial \psi_u} \frac{\partial x(t_i; \boldsymbol{\psi})}{\partial \psi_v}$$
for $u, v = 1, \ldots, q - 1$, $I_{u\sigma}(\boldsymbol{\eta}) = 0$ for $u = 1, \ldots, q - 1$, and $I_{\sigma\sigma}(\boldsymbol{\eta}) = 2n/\sigma^2$. We compute each element of the matrix in (23) for DDEMs with single and multiple delays; the explicit expressions are given in Appendices A and B.
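When analytic second derivatives are inconvenient, the observed information can be approximated by finite differences of the negative log-likelihood, as in the following sketch; the central-difference scheme, the step size `h`, and the function names are our own illustrative choices, not part of the paper's procedure.

```python
def observed_information(neg_loglik, psi_hat, h=1e-5):
    """Observed Fisher information: Hessian of the negative log-likelihood
    at the MLE psi_hat, by central finite differences."""
    p = len(psi_hat)
    H = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # four perturbed points for the (i, j) mixed difference
            pts = [list(psi_hat) for _ in range(4)]
            pts[0][i] += h; pts[0][j] += h
            pts[1][i] += h; pts[1][j] -= h
            pts[2][i] -= h; pts[2][j] += h
            pts[3][i] -= h; pts[3][j] -= h
            H[i][j] = (neg_loglik(pts[0]) - neg_loglik(pts[1])
                       - neg_loglik(pts[2]) + neg_loglik(pts[3])) / (4 * h * h)
    return H
```

Inverting this matrix then gives an estimate of the asymptotic covariance matrix as in (24).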

###### 3.2.2. Confidence Intervals

A confidence interval for an unknown parameter gives the range of values most likely to cover the true value of the parameter with high probability. The standard form of a confidence interval is the point estimate plus or minus a margin of error. To construct a level $C = 1 - \alpha$ confidence interval for any element of $\boldsymbol{\eta}$, say $\eta_u$, for $u = 1, \ldots, q$, we need to find an estimate of the margin of error. First, the estimated standard error of the maximum likelihood estimate, $\hat{\eta}_u$, of $\eta_u$ is given by
$$\widehat{\operatorname{se}}(\hat{\eta}_u) = \sqrt{\big[\widehat{\operatorname{Cov}}(\hat{\boldsymbol{\eta}})\big]_{uu}},$$
where $\widehat{\operatorname{Cov}}(\hat{\boldsymbol{\eta}})$ is the covariance matrix as given in (24). The explicit terms of the covariance matrix can be obtained by substituting (30) into (24). The confidence interval for $\eta_u$ is
$$\hat{\eta}_u \pm z_{1 - \alpha/2}\, \widehat{\operatorname{se}}(\hat{\eta}_u), \tag{35}$$
where $z_{1-\alpha/2} = \text{norminv}(1 - \alpha/2)$, $\alpha$ is the desired significance level, and $z_{1-\alpha/2}\, \widehat{\operatorname{se}}(\hat{\eta}_u)$ is the margin of error. We can find confidence intervals for all components of $\boldsymbol{\eta}$ in this way. In some cases, the estimated confidence interval in (35) may include negative values, which is unreasonable for a parameter that is known to be positive. In this case, we perform a logarithmic transformation of the parameter, construct the confidence interval for the log-transformed parameter, and then transform the confidence interval back to the original parameter space. The confidence interval for $\eta_u$ based on this log-transformation procedure is
$$\exp\left\{\log \hat{\eta}_u \pm z_{1-\alpha/2}\, \frac{\widehat{\operatorname{se}}(\hat{\eta}_u)}{\hat{\eta}_u}\right\},$$
where $\widehat{\operatorname{se}}(\hat{\eta}_u)/\hat{\eta}_u$ is the delta-method standard error of $\log \hat{\eta}_u$.
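The Wald interval (35) and its log-transformed variant can be sketched as follows, using Python's `statistics.NormalDist` in place of MATLAB's norminv; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def wald_ci(est, se, level=0.95):
    """Wald interval est +/- z_{1 - alpha/2} * se, as in (35)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # norminv equivalent
    return est - z * se, est + z * se

def log_wald_ci(est, se, level=0.95):
    """Interval for a positive parameter built on the log scale and
    mapped back; se is the standard error on the original scale."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se_log = se / est                                # delta-method SE of log(est)
    return est * math.exp(-z * se_log), est * math.exp(z * se_log)
```

The log-scale interval is always contained in the positive half-line, which is the point of the transformation.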

#### 4. Examples

We present three examples of DDEMs in the univariate case: two models with a single delay and a third with two delays.

##### 4.1. Example 1

We consider the exponential delay differential equation model (EDDEM) with a single delay (i.e., $m = 1$), in which the underlying trajectory is the solution to the DDE
$$x'(t) = \theta\, x(t - \tau), \quad x(t) = x_0 \text{ for } t \le 0. \tag{37}$$
The EDDEM in (37) is a model for ideal population growth under infinite resources and no deaths, such as a protozoan or bacterial culture dividing under constant environmental conditions. The delay parameter $\tau$ can be taken to represent the gestation or maturity period, that is, the time taken for individuals to be ready for division. The parameter $\theta$ represents the growth rate of the population. We numerically solve the DDE in (37) using the MATLAB function dde23 with fixed parameter values. Sampled observations from the DDEM as in (1) were obtained at equally spaced discrete time points up to a fixed endpoint, giving $n$ observations. The aim is to estimate $\theta$, $\tau$, $x_0$, and $\sigma$ based on $y_1, \ldots, y_n$. Figure 1(a) illustrates the different behaviour of the solution based on different parameter specifications. Figure 1(b) shows the underlying trajectory of the solution of the DDE model (37) and the sampled observations.
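For this example, a minimal sketch of the EDDEM trajectory under a constant history might look like the following; it uses an Euler method-of-steps rather than the dde23 solver used in the paper, and the function name and step size are illustrative.

```python
def eddem_trajectory(theta, tau, x0, t_end, h):
    """Euler method-of-steps for x'(t) = theta * x(t - tau),
    with constant history x(t) = x0 for t <= 0."""
    lag = int(round(tau / h))                 # delay in grid steps
    xs = [x0]
    for k in range(int(round(t_end / h))):
        x_lag = x0 if k - lag < 0 else xs[k - lag]
        xs.append(xs[-1] + h * theta * x_lag)
    return xs
```

On the first delay interval $[0, \tau]$ the derivative is the constant $\theta x_0$, so the trajectory there is linear; beyond $\tau$ the delayed feedback produces the characteristic lagged exponential growth.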