Research Article | Open Access

# Separable Nonlinear Least-Squares Parameter Estimation for Complex Dynamic Systems

**Academic Editor:**Honglei Xu

#### Abstract

Nonlinear dynamic models are widely used for characterizing processes that govern complex biological pathway systems. Over the past decade, validation and further development of these models became possible due to data collected via high-throughput experiments using methods from molecular biology. While these data are very beneficial, they are typically incomplete and noisy, which renders the inference of parameter values for complex dynamic models challenging. Fortunately, many biological systems have embedded linear mathematical features, which may be exploited, thereby improving fits and leading to better convergence of optimization algorithms. In this paper, we explore options of inference for dynamic models using a novel method of *separable nonlinear least-squares optimization* and compare its performance to the traditional nonlinear least-squares method. The numerical results from extensive simulations suggest that the proposed approach is at least as accurate as the traditional nonlinear least-squares, but usually superior, while also enjoying a substantial reduction in computational time.

#### 1. Introduction

Nonlinear dynamic models are widely used for characterizing the processes that govern complex biological pathway systems. Of particular interest in this context are so-called canonical formats, which are very flexible in their possible responses, yet involve a very restricted domain of functional forms. Outside linear systems, the best-known canonical formats are Lotka–Volterra (LV) models [1–4], which use binomial terms, and power-law systems within the framework of Biochemical Systems Theory (BST), which exclusively use products of power functions. BST was originally devised for the analysis of biochemical and gene regulatory systems, but has subsequently found much wider application in various biomedical and other areas [5, 6]. Whereas it is easy to set up an LV or BST model for a complex biological system in a symbolic format, the identification of optimal parameter values continues to be a significant challenge. As a consequence, estimating parameters of systems of ordinary differential equations (ODEs) remains to be an active research area that attracts contributions from a variety of scientific fields (e.g., [7–12]. Indeed, numerous optimization methods for ODE models have been proposed in recent years, but none works exceptionally well throughout a wide range of applications, with reasons spanning the entire spectrum from intrinsic problems with biological data (sparseness, uncertainties, noise, …) to technical and computational issues (numerous local minima, unidentifiability, sloppiness, …). Methods like slope-based estimation (e.g., [13]) and dynamic flux estimation [14–16] alleviate these problems but are not panacea.

Here, we revisit, and bring to fruition, early ideas [17] of separating estimation tasks into linear and nonlinear aspects. However, our main focus is *not* really a new estimation method *per se*. Instead, we are interested in a more general and higher-level point of view regarding parameter estimation than that typically presented in technical articles. Specifically, this article addresses parameter estimation for dynamic models whose mathematical format contains linear features that allow a natural separation of parameters and system states. A trivial example is a linear ODE where the vector field is linear in the parameter *θ*, with denoting the derivative of with respect to *t*. As a more interesting example, the ODE vector field may be partially linear in the parameters, as it is the case for so-called S-system models in BST [5].

*Example 1. *An S-system [18] is defined asHere are non‐negative rate constants, while are real-valued kinetic orders that reflect the strength and directionality of the effect that a variable has on a given influx or efflux. Informally, one can view this system as a regression equation, where the “covariates” are the variables on the right-hand side, whereas the “response” variables are the derivatives on the left-hand side. Note that the vector field is linear in the rate constants , but nonlinear in the kinetic orders .

Estimation methods that exploit separability of parameters and system states in dynamic models have a long history; see [19] for a special case. However, a rigorous statistical analysis of such a method has been achieved only recently [20]. In a classical paper on the inference for dynamic models, Varah [17] mentioned in passing that “one can use the idea of separability or variable projection (see [21] or [22]), in which the linear parameters are implicitly solved for, the resulting (fully) nonlinear least-squares problem is solved for the nonlinear parameters, and then the linear parameters are obtained using their representation in terms of the nonlinear parameters. Since this reduces the size of the nonlinear least-squares problem to be solved, it is worthwhile.” Somewhat surprisingly, given that parameter estimation for ODEs is commonly thought as a computational bottleneck in modeling dynamic processes, Varah’s suggestion has not been widely followed in practice. In fact, in the vast literature dedicated to parameter fitting techniques for dynamic models, we are aware of only two relevant references: using a direct integral approach, Dattner et al. [23] applied a separable nonlinear least-squares technique to the inference of parameters in a predator-prey system acting in a heterogeneous environment, while Wu et al. [24] used separability to estimate parameters of high-dimensional linear ODE systems. Moreover, Varah’s idea of exploiting separability for estimating ODE parameters has been implemented only recently in a publicly available software package [25]. Pertinent details of this software will be discussed in a later section.

The analysis in this paper is hoped to convince the reader that Varah’s idea is indeed worth pursuing. To support this claim, we explore and compare two general data fitting approaches for dynamic models: the traditional nonlinear least-squares method (NLS) and the proposed separable nonlinear least-squares method (SLS). Through extensive Monte-Carlo simulations of representative complex models, we identify and quantify significant statistical and computational gains obtained with this separation method. We will ultimately come to the conclusion that model separability can be very beneficial and that the SLS approach should be considered for any complex dynamic system that possess significant linear features.

The paper is organized as follows. In Section 2, we present details of the SLS methodology in the context of dynamic models. Section 3 describes the simulation setup, quantifies the statistical measures we use in order to compare the performance of SLS and NLS, and presents numerical results. In Section 4, we point out future research directions, while conclusions are provided in Section 5.

#### 2. Separable Nonlinear Least-Squares (SLS) and Varah’s Idea

##### 2.1. Generalities

Following Varah’s original idea within the context of inference in dynamic models, the main advantages of exploiting separability for parameter estimation are the following [26]:(i)Fewer initial guesses are required for optimization(ii)The optimization problem is better conditioned(iii)Convergence is faster

These advantages have been convincingly demonstrated in several publications. For example, see Mullen [27] for an implementation and applications in physics and chemistry; Chung & Nagy [28] for a high-dimensional case, where the number of parameters is substantially larger than the number of observations; Gan et al. [29] who compared the performance of several algorithms for SLS problems; and Erichson et al. [30] who studied sparse principal component analysis via variable projection. Separable models are of broad practical applicability, and as such form a subject of active theoretical and applied research. For instance, when analyzing the “reduced” nonlinear optimization problem of a separable structure, simplified conditions are required for establishing a variety of theoretical results concerning numerical and statistical properties of the resulting estimators, compared to the original NLS problem (e.g., [20, 31]).

In the following, we focus on complex dynamic models that are formulated as systems of ordinary differential equations (e.g., [32]). Specifically, consider a system of equations given bywhere takes values in , , and . For our purposes, we explicitly separate linear components from nonlinear ones in the function *F* by settingwhere , and the symbol stands for the matrix transpose (cf., [20]). Here , a vector of size , stands for the “nonlinear” parameters that are not separable from the state variables *x*, while , a vector of size , contains the “linear” parameters; note that . As the vector field in (3) is separable in the linear parameter vector , we refer to the corresponding ODE system as *linear in the parameter * (cf. the case of a linear regression model), although the *solution* to the system might be highly nonlinear in θ, or even implicit.

*Example 2. *Let

Then, one sees that equation (1) is a special case of (2)–(3).

##### 2.2. Solution Strategy

Let , , be the solution of the initial value problem (2). We assume that noisy measurements on the system are collected at time points . A common statistical formulation of this situation is

Here the random variables are unobservable, independent measurement errors (not necessarily Gaussian) with zero mean and finite variance.

Varah’s approach to parameter estimation in ODE models works as follows. Let stand for a smoother of the data, obtained, e.g., using splines or local polynomials (see e.g., [33, 34] and [35] for a treatment of various smoothing methods and an extensive bibliography). This smoother approximates the solution to the ODE (2). Varah suggests to insert the smoother into equation (2), which will now be satisfied only approximately, and to minimize the resulting discrepancy over the parameters *ξ* and . A minimizer is then an estimator of . This idea was put on a solid statistical foundation in Brunel [36] and Gugushvili and Klaassen [37]. Varah’s original approach requires the use of the derivative as an estimator of , which is a disadvantage, as it is well known that estimating derivatives from noisy and sparse data may be rather inaccurate; see e.g., Vilela et al. [38] and Chou and Voit [7] or more generally Fan and Gijbels [33]. Recent research [20, 23, 39–46] has shown that it is more fruitful to transplant Varah’s idea to the solution level of equation (2). To accomplish this shift, we define an integral criterion functionas it is typical in estimation approaches based on integrals (see references above). Here, is the Euclidean norm. A minimizer of (5) over yields a parameter estimator that typically has slightly different features than an estimator based on the differential equations themselves. In practice, the integral is discretized and replaced by a sum, so that minimization can be performed using a typical nonlinear least-squares method, such as *fminsearch* in Matlab. The discretized format is

The NLS solution does not take into account the specific linear form of the ODEs in (3), but uses the general form in (2).

It is at this stage that Varah suggested to utilize separability, without actually investigating such an approach. Here, we provide the necessary details (cf. [20]). Denote

Then, with kept fixed, a minimizer of (5) is given by

where denotes the identity matrix. The notation and emphasizes the dependence of the solution on the nonlinear parameters . This solution is plugged back into (5), yielding the reduced integral criterion function (cf. [23]):

Once is minimized over and a solution

is obtained, estimators for *ξ* and *θ* follow immediately and are given (with mild abuse of the matrix transpose notation) byrespectively. Equations (7) and (8) are driven by Varah’s [17] suggestion discussed above. Indeed, note that the nonlinear optimization is applied only for estimating the nonlinear parameters , which, in comparison to the NLS approach, can substantially reduce the dimension of the nonlinear optimization problem.

From the above derivation, it is clear that SLS problems constitute a special class of NLS problems, with linear and nonlinear objective functions for different sets of variables. While the idea of using separability for improving parameter estimation was presented already in Lawton and Sylvestre [47], it seems that much of the subsequent literature is based on the variable projection method proposed by Golub and Pereyra [21]. Golub and Pereyra [26] reviewed 30 years of research into this problem.

#### 3. Simulation Framework and Results

In order to investigate the relative performance of SLS and NLS, we designed and performed a large Monte-Carlo simulation, whose results are presented in this section.

All computations were carried out in *R* on an Amazon EC2 m5a.4xlarge instance using the *simode* package of Yaari and Dattner [25] (Separable Integral Matching for Ordinary Differential Equations). The statistical methodologies applied in the package use smoothing and minimization of an integral-matching criterion function, taking advantage of the mathematical structure of the differential equations like separability of parameters from equations. Application of smoothing and integral-based methods to parameter estimation of ordinary differential equations was shown to yield more accurate and stable results comparing to derivative based ones [20]. Here, we used default smoothing and optimization settings in *simode*, and in that respect, both SLS and NLS received equal treatment. Specifically, *simode* uses cross validation (see, e.g., [35]) to determine the optimal amount of smoothing. A detailed guide for using the package can be found in Yaari and Dattner [25]. The code to reproduce our numerical results can be accessed on GitHub (see https://github.com/haroldship/complexity-2019-code/tree/master/Final Code First Submission). For plotting, we relied on the *ggplot2* package in *R*, see Wickham [48].

##### 3.1. Monte-Carlo Study Design

We chose several representative and challenging ODE models arising in a variety of scientific disciplines. Those were(i)An SIR model for simulating the spread of an infectious disease(ii)A Lotka–Volterra population model with sinusoidal seasonal adjustment(iii)A Generalised Mass Action (GMA) system within BST, e.g., for metabolic pathway systems(iv)A FitzHugh–Nagumo system of action potentials along neuronal axons

Further mathematical details on these systems and the specific experimental setups we used are given below.

In each case, we generated observations by numerically integrating the system and then adding independent Gaussian noise to the time courses, as in (4). We considered various parameter setups, sample sizes, and noise levels, as specified below. The ODE parameters were estimated via both NLS and SLS, as defined in equations (6) and (8), respectively.

As performance criteria, the time required to perform optimization and the accuracy of the resulting parameter estimates were used. While comparing computation times is trivial, numerous options are available for comparing accuracy. We focused on the main difference between the two optimization schemes, namely the way they deal with the estimation of linear parameters. SLS does not require initial guesses for these parameters. By contrast, NLS does require a good initial guess for each linear parameter; otherwise, it might diverge or get stuck in a local minimum. Thus, finding “good” solutions to nonlinear optimization problems often requires “good” initial guesses in the parameter space. Clearly, some “prior information” regarding these parameters is of crucial importance for optimization purposes. The key insight is that this prior information is encapsulated in the mathematical form of the ODEs themselves, such as (3). Importantly, while NLS does not take into account the special mathematical features of the ODEs and treats all the parameters in a uniform manner, this is not the case for SLS. Thus, one might *a priori* expect SLS to be more efficient and possibly more accurate than NLS, when prior information regarding the linear parameters is of low quality. On the other hand, when one has high-quality prior information regarding the linear parameters, we expect that SLS and NLS will perform similarly. One might note that the nonlinear parameters in almost all GMA and S-systems are very tightly bounded, usually between −1 and +2, and that their sign is often known, whereas the linear parameters are unbounded in GMA systems and nonnegative in S-systems, and nothing is known about their magnitudes (see Chapter 5 of [18]). Thus, not needing prior information on the linear parameters in SLS can be a tremendous advantage.

For the Monte-Carlo study, we varied the prior information by using high-, medium-, and low-quality initial guesses for the parameter values. Here, higher quality means that the initial guesses were closer to the truth. To be more specific, the initial guesses for the linear parameters used by NLS were Gaussian random variables centered on the true parameter values and having standard deviations equal to the true parameter multiplied by a prior information value (in other words, the prior information value can also be understood as the coefficient of variation of the “prior distribution”). The specific quantification of “high,” “medium,” and “low” is admittedly somewhat subjective and varies across the different ODE models, as specified below. For the sake of better and faster convergence of the optimization algorithms (especially NLS), the nonlinear parameters were constrained to a given range, and this range was the same no matter how we varied the prior information on linear parameters. Further, in each Monte-Carlo iteration, we used exactly the same (pseudorandom) initial guess for nonlinear parameters for both NLS and SLS. Thus, as far as the information on nonlinear parameters is concerned, this was kept invariant for each benchmark model, irrespective of the prior on linear parameters. Consequently, both algorithms received the same prior information regarding nonlinear parameters, and neither one was treated preferentially.

The noise level (signal-to-noise ratio, SNR) we used is defined as follows. For a given solution of an ODE equation, we calculate the standard deviation . Then SNR of, say, and is given by , and , respectively, where *σ* is the standard deviation of a Gaussian measurement error ϵ as defined in equation (4). We will refer to these SNRs as “low noise” and “high noise,” respectively (cf. [49]; albeit in a different context). We then compared the mean square errors (MSE) of the resulting parameter estimates, which leads to a valid comparison in statistically identifiable ODE models (see e.g., [20] for relevant definitions and results). As another accuracy measure, we used the criteria (5) and (7) evaluated at optimal parameter values. The two criteria we propose, though reasonable, are different. Hence, they are not expected to be in agreement in every experimental setup. However, the global conclusions reached with them in Section 5 are coherent and favor SLS.

We now provide the mathematical details on the models and the experimental setups.

###### 3.1.1. Age-Group SIR

The system of interest is an epidemiological model of SIR-type (Susceptible—Infected—Recovered) and includes age groups and seasonal components (e.g., [50]). The infectious process in each age group and each season is described using two equations for the proportion of susceptible (*S*) and infected (*I*) individuals within the population (the proportion of recovered individuals is given by ):

The parameters of the model are the transmission matrix *β*, the recovery rate *γ*, and , which signify the relative infectivity of, e.g., influenza virus strains circulating in seasons compared to season 1 ( is used as a reference and fixed at 1). As shown in Yaari et al. [46], taking into account separability characteristics of this model is advantageous for data fitting purposes. Specifically, (9) is nonlinear in the initial value , which are typically unknown and have to be estimated. For our purposes, it suffices to consider a model with one age group and one season. The following parameter setup was used: . We considered two sample sizes, 18 and 36, and two noise levels, and . The prior information used was , corresponding to high, medium, and low quality, respectively. The size of the Monte-Carlo study was 500 simulations.

###### 3.1.2. Lotka–Volterra with Seasonal Forcing

As another benchmark we considered an extension of a classical predator-prey model, namely, a Lotka–Volterra model including seasonal forcing of the predation rate, using two additional parameters that control the amplitude () and phase (*ω*) of the forcing:

The nonlinear parameters are *ϵ* and *ω*. We considered the dynamics within the time interval . The parameter setup is given by

and initial values are . Four experimental scenarios were studied, corresponding to sample sizes of 100 and 200, and SNRs of and . The prior information values were , corresponding to high, medium, and low quality, respectively. The size of the Monte-Carlo study was 500 simulations.

###### 3.1.3. GMA System

The GMA system we analyzed consists of three differential equations in three variables ([18]; pp. 84–85). They are

Here the linear parameters are the rate constants *γ*, while the nonlinear ones are the indexed kinetic orders *f*. Note that the parameters *f* are allowed to become negative and their sign might or might not be known. We considered the dynamics of the system within the time interval . The parameter setup is the one presented in Voit [18]; namely,

and initial values are . Four experimental scenarios were studied: sample sizes of 100 and 200, with SNRs of and . The prior information values were , corresponding to high, medium, and low quality, respectively. The size of the Monte-Carlo study was 500 simulations. Parameter estimation for GMA systems is considered to be a challenging numerical task [18].

###### 3.1.4. FitzHugh–Nagumo System

The FitzHugh–Nagumo (FHN) system [51–53] models spiked action potentials in neuron transmission. It is given by

This system with two state variables was proposed as a simplification of a more complicated model presented in Hodgkin and Huxley’s study [54] for studying and simulating nerve function in giant squid axons. The FHN model is used in neurophysiology as an approximation of the observed action potential.

The system (10) is linear in parameters , and *b*, but nonlinear in *c*. We considered two sample sizes, and , and two SNRs of and . The parameters were set to . The initial values were . The true solutions were obtained over the time interval . The prior information used here was , corresponding to high, medium, and low quality, respectively (the initial guesses for parameters were assured to be positive). The size of the Monte-Carlo study was 500 simulations. Many researchers studied the problem of parameter estimation for the FHN model. In particular, Ramsay et al. [55], Campbell and Steele [56], and Ramsay and Hooker [11] pointed out several difficulties in estimating the parameters for this ODE system.

##### 3.2. Results of the Monte-Carlo Analysis

Our findings are presented through charts and tables. The primary summaries are Tables 1 and 2, where we report the ratios of the mean square errors (square errors averaged over Monte-Carlo simulations) for estimates of linear parameters (for nonlinear parameters, see the discussion at the end of this section). Several conclusions can be gleaned from the tables.(i)Given high-quality prior information, the accuracy of NLS and SLS is comparable, and neither is superior throughout the variety of experimental setups (at least some of the differences that one sees from the raw numbers in the tables are plausibly attributable to the Monte-Carlo simulation error and as such appear to be insignificant).(ii)When the quality of prior information degrades to medium or low, the performance of SLS becomes in most cases significantly better than that of NLS (with an extent depending on the specific experimental setup).(iii)For a fixed noise level, as the sample size increases, the advantage of SLS becomes more pronounced.(iv)For a fixed sample size, as the noise level increases, the SLS is still better than NLS, but to a lesser extent.

| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

The MSE ratios (computed by averaging square errors over Monte-Carlo simulation runs) of NLS and SLS for estimating the linear parameters in various benchmark models and under different experimental setups are displayed (see Section 3.1 for detailed specifications). To identify model names, self-explanatory abbreviations are used. The values in the table are rounded off to one significant digit. The sample size is for the GMA and Lotka–Volterra models, for the FitzHugh–Nagumo system, and for SIR model. The noise levels are and . Values larger than 1 in the table correspond to the cases where SLS performs better than NLS. Note the decreasing pattern in the columns, reflecting the effect of the quality of prior information on the performance of NLS. |

| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

These results can also be visualized through a combination of simple statistical charts. Thus, Figure 1 displays the line graphs that compare MSEs of the two methods under several experimental setups. Whereas the numbers in Tables 1 and 2 are ratios of MSEs, the figures here present absolute MSE values. From the graphs, an advantage of SLS over NLS is apparent for less than ideal prior information. Note that, in this specific setting, SLS performed worse than NLS for high-quality prior information. A plausible explanation may be the following: while, under our experimental setup, the amount of information used by SLS via (3) is fixed throughout simulations, NLS can in principle receive arbitrarily precise initial guesses on linear parameters. One may therefore envision a threshold, where using the latter kind of information outweighs the benefits of using the structural relationship (3). However, a precise quantification of the phenomenon is hardly possible beyond an observation that it appears to manifest itself in scenarios with excellent knowledge on likely parameter values. In reality, such ideal prior information is rare.

**(a)**

**(b)**

Panel (a) of Figure 1 further suggests that in the specific scenarios we report, SLS improves when the noise level decreases, which is different from NLS in the same figure.

Figure 2 is a scatterplot of NLS and SLS losses (5) and (7) (on a log scale) evaluated at optimal parameter estimates. The figure highlights in yet another manner the importance of prior information for NLS: it is evident that the performance of the latter is strongly affected by the quality of initial parameter guesses. Again, NLS and SLS perform similarly when the prior information is of high quality. However, when the quality of prior information is less than ideal, as it is in most applications, NLS becomes substantially worse than SLS. The scatterplot also gives a quick impression of the variability of estimation results.

The conclusions that we drew from Figure 2 are confirmed by the panel (a) of Figure 3, which presents boxplots of NLS and SLS losses (on a log scale) measured according to criteria (5) and (7). The pattern is clear: SLS performs better than NLS, and the inferiority for NLS becomes more dramatic with degrading prior information.

Panel (b) of Figure 3 summarizes computation times. SLS is in general much faster. The execution time of NLS is affected by the quality of prior information and, interestingly, increases with this quality. The results for all other models (and noise levels) were similar and are therefore omitted.

| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

The table displays the MSE ratios (computed through squared errors averaged over Monte-Carlo simulations) of NLS and SLS for estimating the nonlinear parameters. The experimental setup is as in Table 1. Values larger than 1 in the table correspond to the cases where SLS performs better than NLS. Since the prior information regarding nonlinear parameters stays invariant (see Section 3.1 for details), the table in particular shows the effects that the quality of initial guesses for linear parameters has on the estimation accuracy of NLS in the case of nonlinear ones. The results suggest that, in some settings, vague prior knowledge regarding linear parameters may have an adversary effect on the accuracy of NLS with respect to the nonlinear parameters. |

Finally, Tables 3 and 4 provide information regarding the nonlinear parameters. In the case of NLS, one can observe how prior knowledge regarding linear parameters propagates into the estimation accuracy for nonlinear parameters. In particular, for less than ideal prior information on the linear parameters, SLS holds a pronounced edge over NLS, even in the case of nonlinear parameters.

#### 4. Outlook

Data fitting in complex dynamical systems remains a challenging problem that cannot be treated in a cavalier fashion, even if one takes advantage of separability. For instance, in order to uncover the patterns in Section 3 of this work, we had to carefully design the experimental study, because otherwise simulations might not have converged or might have converged to poor solutions. This was true for both NLS and SLS, but whenever they were observed, convergence issues were much more severe for NLS; they were especially sensitive in the case of the FitzHugh–Nagumo system. This result highlights the crucial role of *prior information* regarding the parameters or, expressed differently, the quality of the initial parameter guesses used for the optimization. We focused here primarily on the effects of the prior information regarding the linear parameters. However, it also became clear that prior information on the nonlinear parameters has an equally crucial role for optimization purposes, and this was true for both NLS and SLS (data not shown).

As a result of our exploratory work, we envision the following promising research directions for the future.

##### 4.1. Numerical Implementation of SLS for Dynamic Systems

All computations in our analysis were done in *R* using the *simode* package of Yaari and Dattner [25]. However, the idea of using separability properties of ODEs is independent of a particular programming language and can be implemented within other software packages quite as well. Indeed, much work has been done in the context of the variable projection method since it was first introduced by Golub and Pereyra [21]. In the context of nonlinear regression, the variable projection method of Golub and Pereyra [21] is implemented in *R* in the *nls* command; see Venables and Ripley [58] and pp. 218–220 for an example of its application. In addition, we are aware of the *TIMP* package of Mullen and van Stokkum [59]; which implements the variable projection method. Thus, a next step could be to combine the strengths of different packages, e.g., *simode* and *TIMP*, in order to develop advanced software for variable projection in the context of dynamic systems.

##### 4.2. Customized Algorithms for Specific Classes of Complex Dynamical Systems

It is well known that the performance of an optimization scheme depends crucially on the underlying mathematical model used for description of the data. Thus, it appears that different classes of dynamic models require specific algorithms tailored to their peculiarities. For instance, parameter estimation for GMA systems has different challenges than those encountered when working with SIR (see Section 3). We expect that there is much to gain from focusing future research on specific classes of models and developing stable algorithms for their parameter estimation.

##### 4.3. Theoretical Properties of SLS in the Context of Dynamic Systems

Gugushvili and Klaassen [37] studied the statistical properties of NLS in the general context of smoothing, while Dattner and Klaassen [20] specifically addressed ODE systems that are linear in (functions of) the parameters. One might expect that some assumptions used in Gugushvili and Klaassen [37] can be relaxed when the problem is closer to the one considered in Dattner and Klaassen [20].

##### 4.4. Extensions to Partially Observed, High-Dimensional, and Misspecified Dynamic Systems

Recent work dealing with inference in high-dimensional ODE models suggests that exploiting linearity in parameters is crucial for developing a successful estimation methodology (see e.g., [24, 39]). More generally, it would be interesting to use the variable projection method to study cases of partially observed, high-dimensional, and possibly misspecified dynamic systems. This work might additionally require the use of high-dimensional regularization techniques (e.g., [39]) for balancing data and model, and specifically take into account a potential model misspecification (see [55]).

#### 5. Conclusions

In this work, we designed an extensive simulation study to explore the relative statistical and computational performance of two optimization schemes for inference in dynamic systems: the typical nonlinear least-squares (NLS) method and a novel, separable nonlinear least-squares (SLS) approach. As benchmarks, we considered several widely used ODE models arising in a variety of biological fields. We measured statistical performance of the two methods by the mean square error (MSE) of the estimates. As another performance criterion, we employed the loss function values at the optimal parameter estimates. Computational performance of the methods was also compared by the execution times required to complete each optimization.

Our overall recommendation is the following: whenever a complex dynamic system contains an appreciable number of linear parameters, estimation of its parameters should be addressed with the separable nonlinear least-squares method, rather than the more commonly used, generic nonlinear least-squares method. The general pattern emerging from our study is that SLS performs at least as well as, and frequently better than, NLS, especially if the prior information regarding the system is not ideal, which is typically the case in practice. This statement was found to be uniformly true over all models tested.

#### Data Availability

The code to reproduce our numerical results can be accessed at https://github.com/haroldship/complexity-2019-code/tree/master/Final_Code_First_Submission.

#### Disclosure

A preprint of the article is available on ArXiv at https://arxiv.org/abs/1908.03717.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

Itai Dattner was supported by the Israeli Science Foundation grant number 387/15. Eberhard Voit was supported by grants NSF-MCB 1615373 (PI: Diana Downs) and NIH 2P30ES019776-05 (PI: Carmen Marsit). The funding agencies are not responsible for the content of this article.

#### References

- A. J. Lotka,
*Elements of Mathematical Biology*, Dover Publications Inc., New York, NY, USA, 1956. - R. M. May,
*Stability and Complexity in Model Ecosystems. With a New Introduction by the Author*, Princeton University Press, Princeton, NJ, USA, 2nd edition, 2001. - M. Peschel and W. Mende,
*The Predator-Prey Model: Do We Live in a Volterra World?*Springer, New York, NY, USA, 1986. - V. Volterra, “Fluctuations in the abundance of a species considered mathematically,”
*Nature*, vol. 118, no. 2972, pp. 558–560, 1926. View at: Publisher Site | Google Scholar - M. A. Savageau, “Biochemical systems analysis,” in
*A Study of Function and Design in Molecular Biology, Vol. 6739 of Advanced Book Program*, Addison-Wesley Publishing Company, Boston, MA, USA, 1976. View at: Google Scholar - E. O. Voit, “Biochemical systems theory: a review,”
*ISRN Biomathematics*, vol. 2013, Article ID 897658, 53 pages, 2013. View at: Publisher Site | Google Scholar - I.-C. Chou and E. O. Voit, “Recent developments in parameter estimation and structure identification of biochemical and genomic systems,”
*Mathematical Biosciences*, vol. 219, no. 2, pp. 57–83, 2009. View at: Publisher Site | Google Scholar - P. Gennemark and D. Wedelin, “Efficient algorithms for ordinary differential equation model identification of biological systems,”
*IET Systems Biology*, vol. 1, no. 2, pp. 120–129, 2007. View at: Publisher Site | Google Scholar - M. Girolami and B. Calderhead, “Riemann manifold Langevin and Hamiltonian Monte Carlo methods,”
*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, vol. 73, no. 2, pp. 123–214, 2011. View at: Publisher Site | Google Scholar - K. McGoff, S. Mukherjee, and N. Pillai, “Statistical inference for dynamical systems: a review,”
*Statistics Surveys*, vol. 9, pp. 209–252, 2015. View at: Publisher Site | Google Scholar - J. Ramsay and G. Hooker,
*Dynamic Data Analysis. Modeling Data with Differential Equations*, Springer, New York, NY, USA, 2017. - K. Schittkowski,
*Numerical Data Fitting in Dynamical Systems. A Practical Introduction with Applications and Software*, Kluwer Academic Publishers, Dordrecht, Netherlands, 2002. - E. O. Voit and J. Almeida, “Decoupling dynamical systems for pathway identification from metabolic profiles,”
*Bioinformatics*, vol. 20, no. 11, pp. 1670–1681, 2004. View at: Publisher Site | Google Scholar - S. Dolatshahi and E. O. Voit, “Identification of metabolic pathway systems,”
*Frontiers in Genetics*, vol. 7, p. 6, 2016. View at: Publisher Site | Google Scholar - M. Faraji and E. O. Voit, “Stepwise inference of likely dynamic flux distributions from metabolic time series data,”
*Bioinformatics*, vol. 33, no. 14, pp. 2165–2172, 2017. View at: Publisher Site | Google Scholar - G. Goel, I.-C. Chou, and E. O. Voit, “System estimation from metabolic time-series data,”
*Bioinformatics*, vol. 24, no. 21, pp. 2505–2511, 2008. View at: Publisher Site | Google Scholar - J. M. Varah, “A spline least squares method for numerical parameter estimation in differential equations,”
*SIAM Journal on Scientific and Statistical Computing*, vol. 3, no. 1, pp. 28–46, 1982. View at: Publisher Site | Google Scholar - E. O. Voit,
*Computational Analysis of Biochemical Systems: A Practical Guide for Biochemists and Molecular Biologists*, Cambridge University Press, Cambridge, UK, 2000. - D. M. Himmelblau, C. R. Jones, and K. B. Bischoff, “Determination of rate constants for complex kinetics models,”
*Industrial & Engineering Chemistry Fundamentals*, vol. 6, no. 4, pp. 539–543, 1967. View at: Publisher Site | Google Scholar - I. Dattner and C. A. J. Klaassen, “Optimal rate of direct estimators in systems of ordinary differential equations linear in functions of the parameters,”
*Electronic Journal of Statistics*, vol. 9, no. 2, pp. 1939–1973, 2015. View at: Publisher Site | Google Scholar - G. H. Golub and V. Pereyra, “The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate,”
*SIAM Journal on Numerical Analysis*, vol. 10, no. 2, pp. 413–432, 1973. View at: Publisher Site | Google Scholar - A. Ruhe and P. Å. Wedin, “Algorithms for separable nonlinear least squares problems,”
*SIAM Review*, vol. 22, no. 3, pp. 318–337, 1980. View at: Publisher Site | Google Scholar - I. Dattner, E. Miller, M. Petrenko, D. E. Kadouri, E. Jurkevitch, and A. Huppert, “Modelling and parameter inference of predator-prey dynamics in heterogeneous environments using the direct integral approach,”
*Journal of The Royal Society Interface*, vol. 14, no. 126, Article ID 20160525, 2017. View at: Publisher Site | Google Scholar - L. Wu, X. Qiu, Y.-X. Yuan, and H. Wu, “Parameter estimation and variable selection for big systems of linear ordinary differential equations: a matrix-based approach,”
*Journal of the American Statistical Association*, vol. 114, no. 526, pp. 657–667, 2019. View at: Publisher Site | Google Scholar - R. Yaari and I. Dattner, “Simode: statistical inference for systems of ordinary differential equations using separable integral-matching, R package version 1.1.4,” January 2019, https://CRAN.R-project.org/package=simode. View at: Google Scholar
- G. Golub and V. Pereyra, “Separable nonlinear least squares: the variable projection method and its applications,”
*Inverse Problems*, vol. 19, no. 2, pp. 1–26, 2003. View at: Publisher Site | Google Scholar - K. M. Mullen,
*Separable nonlinear models: theory, implementation and applications in physics and chemistry*, Vrije Universiteit, Amsterdam, Netherlands, 2008, Ph.D. thesis. - J. Chung and J. G. Nagy, “An efficient iterative approach for large-scale separable nonlinear inverse problems,”
*SIAM Journal on Scientific Computing*, vol. 31, no. 6, pp. 4654–4674, 2010. View at: Publisher Site | Google Scholar - M. Gan, C. L. P. Chen, G.-Y. Chen, and L. Chen, “On some separated algorithms for separable nonlinear least squares problems,”
*IEEE Transactions on Cybernetics*, vol. 48, no. 10, pp. 2866–2874, 2018. View at: Publisher Site | Google Scholar - N. B. Erichson, P. Zheng, K. Manohar, S. L. Brunton, J. N. Kutz, and A. Y. Aravkin, “Sparse principal component analysis via variable projection,” 2018, https://arxiv.org/abs/1804.00341. View at: Google Scholar
- S. Basu and Y. Bresler, “The stability of nonlinear least squares problems and the Cramér-Rao bound,”
*IEEE Transactions on Signal Processing*, vol. 48, no. 12, pp. 3426–3436, 2000. View at: Publisher Site | Google Scholar - M. W. Hirsch, R. L. Devaney, and S. Smale,
*Differential Equations, Dynamical Systems, and Linear Algebra*, vol. 60, Academic Press, Cambridge, MA, USA, 1974. - J. Fan and I. Gijbels,
*Local Polynomial Modelling and its Applications*, Chapman & Hall, London, UK, 1996, vol. 66 of Monographs on Statistics and Applied Probability. - P. J. Green and B. W. Silverman,
*Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach*, Chapman & Hall, London, UK, 1994, vol. 58 of Monographs on Statistics and Applied Probability. - L. Wasserman,
*All of Nonparametric Statistics*, Springer, New York, NY, USA, 2006. - N. J.-B. Brunel, “Parameter estimation of ODEs via nonparametric estimators,”
*Electronic Journal of Statistics*, vol. 2, pp. 1242–1267, 2008. View at: Publisher Site | Google Scholar - S. Gugushvili and C. A. J. Klaassen, “-consistent parameter estimation for systems of ordinary differential equations: bypassing numerical integration via smoothing,”
*Bernoulli*, vol. 18, no. 3, pp. 1061–1098, 2012. View at: Publisher Site | Google Scholar - M. Vilela, C. C. H. Borges, S. Vinga et al., “Automated smoother for the numerical decoupling of dynamics models,”
*BMC Bioinformatics*, vol. 8, no. 1, p. 305, 2007. View at: Publisher Site | Google Scholar - S. Chen, A. Shojaie, and D. M. Witten, “Network reconstruction from high-dimensional ordinary differential equations,”
*Journal of the American Statistical Association*, vol. 112, no. 520, pp. 1697–1707, 2017. View at: Publisher Site | Google Scholar - I. Dattner, “A model-based initial guess for estimating parameters in systems of ordinary differential equations,”
*Biometrics*, vol. 71, no. 4, pp. 1176–1184, 2015. View at: Publisher Site | Google Scholar - I. Dattner and S. Gugushvili, “Application of one-step method to parameter estimation in ODE models,”
*Statistica Neerlandica*, vol. 72, no. 2, pp. 126–156, 2018. View at: Publisher Site | Google Scholar - I. Dattner and A. Huppert, “Modern statistical tools for inference and prediction of infectious diseases using mathematical models,”
*Statistical Methods in Medical Research*, vol. 27, no. 7, pp. 1927–1929, 2018. View at: Publisher Site | Google Scholar - F. Vissing Mikkelsen and N. R. Hansen, “Learning large scale ordinary differential equation systems,” 2017, https://arxiv.org/abs/1710.09308. View at: Google Scholar
- I. Vujačić, I. Dattner, J. González, and E. Wit, “Time-course window estimator for ordinary differential equations linear in the parameters,”
*Statistics and Computing*, vol. 25, no. 6, pp. 1057–1070, 2015. View at: Publisher Site | Google Scholar - R. Yaari and I. Dattner, “Simode: R package for statistical inference of ordinary differential equations using separable integral-matching,” 2018, https://joss.theoj.org/papers/10.21105/joss.01850. View at: Google Scholar
- R. Yaari, I. Dattner, and A. Huppert, “A two-stage approach for estimating the parameters of an age-group epidemic model from incidence data,”
*Statistical Methods in Medical Research*, vol. 27, no. 7, pp. 1999–2014, 2018. View at: Publisher Site | Google Scholar - W. H. Lawton and E. A. Sylvestre, “Elimination of linear parameters in nonlinear regression,”
*Technometrics*, vol. 13, no. 3, pp. 461–467, 1971. View at: Publisher Site | Google Scholar - H. Wickham,
*Ggplot2. Elegant Graphics for Data Analysis*, Springer, New York, NY, USA, 2009. - I. M. Johnstone and B. W. Silverman, “Empirical Bayes selection of wavelet thresholds,”
*The Annals of Statistics*, vol. 33, no. 4, pp. 1700–1752, 2005. View at: Publisher Site | Google Scholar - H. W. Hethcote, “The mathematics of infectious diseases,”
*SIAM Review*, vol. 42, no. 4, pp. 599–653, 2000. View at: Publisher Site | Google Scholar - R. FitzHugh, “Impulses and physiological states in theoretical models of nerve membrane,”
*Biophysical Journal*, vol. 1, no. 6, pp. 445–466, 1961. View at: Publisher Site | Google Scholar - R. FitzHugh, “Mathematical models of excitation and propagation in nerve,” in
*Biological Engineering*, H. P. Schwan, Ed., pp. 1–85, McGraw-Hill, New York, NY, USA, 1969, vol. 9 of Inter-university electronics series. View at: Google Scholar - J. Nagumo, S. Arimoto, and S. Yoshizawa, “An active pulse transmission line simulating nerve axon,”
*Proceedings of the IRE*, vol. 50, no. 10, pp. 2061–2070, 1962. View at: Publisher Site | Google Scholar - A. L. Hodgkin and A. F. Huxley, “A quantitative description of membrane current and its application to conduction and excitation in nerve,”
*The Journal of Physiology*, vol. 117, no. 4, pp. 500–544, 1952. View at: Publisher Site | Google Scholar - J. O. Ramsay, G. Hooker, D. Campbell, and J. Cao, “Parameter estimation for differential equations: a generalized smoothing approach,”
*Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, vol. 69, no. 5, pp. 741–796, 2007. View at: Publisher Site | Google Scholar - D. Campbell and R. J. Steele, “Smooth functional tempering for nonlinear differential equation models,”
*Statistics and Computing*, vol. 22, no. 2, pp. 429–443, 2012. View at: Publisher Site | Google Scholar - E. R. Tufte,
*The Visual Display of Quantitative Information*, Graphics Press, Cheshire, CT, USA, 2nd edition, 2001. - W. N. Venables and B. D. Ripley,
*Modern Applied Statistics with S. Statistics and Computing*, Springer, New York, NY, USA, 4th edition, 2002. - K. Mullen and I. van Stokkum, “TIMP: an R package for modeling multi-way spectroscopic measurements,”
*Journal of Statistical Software*, vol. 18, no. 3, pp. 1–46, 2007. View at: Publisher Site | Google Scholar

#### Copyright

Copyright © 2020 Itai Dattner et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.