Computational Methods for Identification and Modelling of Complex Biological SystemsView this Special Issue
Research Article | Open Access
A Parameter-Free Model Comparison Test Using Differential Algebra
We present a method for rejecting competing models from noisy time-course data that does not rely on parameter inference. First we characterize ordinary differential equation models in only measurable variables using differential-algebra elimination. This procedure gives input-output equations, which serve as invariants for time series data. We develop a model comparison test using linear algebra and statistics to reject incorrect models from their invariants. This algorithm exploits the dynamic properties that are encoded in the structure of the model equations without recourse to parameter values, and, in this sense, the approach is parameter-free. We demonstrate this method by discriminating between different models from mathematical biology.
Given competing mathematical models to describe a process, we wish to know whether our data are compatible with the candidate models. Often comparing models requires optimization and fitting time-course data to estimate parameter values and then apply an information criterion to select a “best” model . However sometimes it is not feasible to estimate the value of these unknown parameters (e.g., large parameter space, nonlinear objective function, nonidentifiability, etc.). In this paper, we compare candidate models with time-course data while avoiding the parameter estimation problem by considering a “parameter-free” approach.
The parameter problem has motivated the growth of fields that embrace a parameter-free flavor such as chemical reaction network theory and stoichiometric theory [2–4]. However many of these approaches are limited to comparing the behavior of models at steady-state [5–7]. Inspired by techniques commonly used in applied algebraic geometry  and algebraic statistics , methods for discriminating between possible models without estimating parameters have been developed for steady-state data [10, 11]. These approaches characterize a model in only observable variables—called a steady-state invariant —using techniques from computational algebraic geometry and determine whether the noisy steady-state data are compatible with this steady-state invariant via a statistical test. However, unlike other Bayesian and parameter estimation approaches, it does not select models; it can only rule them out. Notably the method does not require parameter estimation, hence there is the term parameter-free.
Extending the method developed in , we present a method for comparing models using time-course data instead of steady-state data. In this approach we compute input-output equations, which we refer to as input-output invariants for time series data. We consider state-space ordinary differential equations (ODE) models of the form and where are species variables, is a known input into the system, , is a known output (measurement) from the system, , is the unknown dimensional parameter vector, and the functions are rational functions of their arguments. The dynamics of the model can be observed in terms of a time series where is the input at discrete points and is the output.
In this setting, we aim to characterize our ODE models by eliminating variables that we cannot measure using differential elimination from differential algebra . From the elimination, we obtain a system of equations in 0, 1, and higher order derivatives forming the input-output invariants: , . Importantly, the coefficients of these equations are rational functions of the parameters. We will see shortly that, in the linear case, is a linear differential equation. For nonlinear models, is nonlinear. Computing input-output invariants is described in Section 2.
In order to test model compatibility, we substitute the data into the input-output invariant, which is given in the form of evaluated at given time points. This results in a linear system of equations, , where each row of and corresponds to the input-output invariant evaluated at a different time point. The components of are the coefficient functions of the parameters in the input-output invariants. The set-up of the model compatibility test is given in Section 3.
Then we ask: does there exist a such that . If , of course we are guaranteed a zero trivial solution and the nontrivial solution can be determined via a rank test (e.g., singular value decomposition, or SVD). Since data may be imperfect, we can perform the statistical criterion developed in  with the bound improved in  to determine whether or not to reject the model. One of the key differences in adapting this method to time-course data is considering when . For , there may be no solutions. Thus, we must check if the linear system of equations is consistent, i.e., has one or infinitely many solutions. We present a rank test, based on the SVD, for determining the compatibility of the data with input-output invariants from various (potentially incorrect) models. The linear solvability test is described in Section 4. We assume our data have Gaussian measurement noise. In Section 5, we derive a statistical cut-off for when the model is incompatible with the data.
Another key difference in this approach than previous parameter-free model discrimination methods is the occurrence of higher order derivatives of the input and output variables in the input-output invariants, requiring them to be known at various time instances. Often one does not have data points for the higher order derivatives, then these need to be estimated. Unlike numerical estimation or splines, which assume a specific functional form, one can use Gaussian Process Regression (GPR) to estimate the higher order derivatives from time-course data. In Section 6, we present such a method, which has previously been done for first and second derivatives of biological data . Bounding error of derivative estimates is a difficult problem, which requires us to remove certain data points; however, the advantage of GPR is that one can consider a family of functions, which  points out to be able to capture many more temporal trends in the data than any one equation. This enables us to substitute the newly estimated derivative data into the input-output invariant and test model compatibility using the solvability test with the statistical cut-off that we present in Sections 4 and 5.
2. Differential Elimination
We now give some background on differential algebra since a crucial step in our algorithm is to perform differential elimination to obtain equations purely in terms of input variables, output variables, and parameters. For this reason, we will only give background on the ideas from differential algebra required to understand the differential elimination process. For a more detailed description of differential algebra and the algorithms listed below, see [12, 14, 15]. In what follows, we assume the reader is familiar with concepts such as rings and ideals, which are covered in great detail in .
Definition 1. A ring is said to be a differential ring if there is a derivative defined on and is closed under differentiation. A differential ideal is an ideal which is closed under differentiation.
Let our differential ideal be equipped with a ranking, i.e., a total ordering, denoted , among the variables and their derivatives. Let and be arbitrary derivatives. Then the ranking should be such that, for arbitrary positive integer :
Let be the leader of a polynomial , which is the highest ranking derivative of the variables appearing in that polynomial. A polynomial is said to be of lower rank than if the order of is less than the order of or, whenever , the highest algebraic degree of any term containing the leader of is less than the highest algebraic degree of any term containing the leader of . A polynomial is reduced with respect to a polynomial if contains neither the leader of with equal or greater algebraic degree, nor its derivatives. If is not reduced with respect to , it can be reduced by using the pseudodivision algorithm in Section 2.1. A set of differential polynomials that are all reduced with respect to each other is called an auto-reduced set.
Two auto-reduced sets, and ordered in increasing rank so that , are ranked according to the following principle: if there is an integer , such that rank = rank , , rank rank , then is said to be lower rank than . If and rank = rank , , then is also said to be of lower rank than .
A useful description of a differential ideal is called a differential characteristic set, which is a finite description of a possibly infinite set of differential polynomials. We give the technical definition from .
Definition 2. Let be a set of differential polynomials, not necessarily finite. If is an auto-reduced set, such that no lower ranked auto-reduced set can be formed in , then is called a differential characteristic set.
A well-known fact in differential algebra is that differential ideals need not be finitely generated [12, 15]. However, a radical differential ideal is finitely generated by the Ritt-Raudenbush basis theorem . This result gives rise to Ritt’s pseudodivision algorithm (see below), allowing us to compute the differential characteristic set of a radical differential ideal. We now describe various methods to find a differential characteristic set and other related notions, and we describe why they are relevant to our problem; namely, they can be used to find the input-output equations.
In what follows, we will be considering the differential ring , where is the field of rational functions in the parameter vector . The variables in this differential ring are the states, the inputs, the outputs, and possibly their derivatives.
Consider an ODE system of the form and for with and rational functions of their arguments. Let our differential ideal be generated by the differential polynomials obtained by subtracting the right-hand-side from the ODE system to obtain and for . In what follows, we use the ranking in , which is given byNote that the notation reflects the fact that the ordering among the components of and is immaterial, since these are known variables, whereas different ordering of the components of may lead to different characteristic sets . With respect to this ordering, a differential characteristic set is of the form :where are differential polynomials. Note that the resulting system is not necessarily auto-reduced in , namely, may not be auto-reduced. The first terms of the differential characteristic set, , are those terms independent of the state variables and when set to zero form the input-output equations:Specifically, the input-output equations are polynomial equations in the variables with rational coefficients in the parameter vector . Note that the differential characteristic set is in general non-unique, but the coefficients of the input-output equations can be fixed uniquely by normalizing the equations to make them monic.
We now discuss several methods to find the input-output equations. The first method (Ritt’s pseudodivision algorithm) can be used to find a differential characteristic set for a radical differential ideal. The second method (RosenfeldGroebner) gives a representation of the radical of the differential ideal as an intersection of regular differential ideals and can also be used to find a differential characteristic set under certain conditions [18, 19]. Finally, we discuss Gröbner basis methods to find the input-output equations.
2.1. Ritt’s Pseudodivision Algorithm
An algorithm to find a differential characteristic set of a radical (in particular, prime) differential ideal generated by a finite set of differential polynomals is called Ritt’s pseudodivision algorithm. We describe the process in detail below, which comes from the description in . Note that our differential ideal as described above is a prime differential ideal [12, 20]. Let and be differential polynomials.(1)If contains the derivative of the leader of , is differentiated times so its leader becomes .(2)Multiply the polynomial by the coefficient of the highest power of ; let be the remainder of the division of this new polynomial by with respect to the variable . Then is reduced with respect to . The polynomial is called the pseudoremainder of the pseudodivision.(3)The polynomial is replaced by the pseudoremainder and the process is iterated using in place of and so on, until the pseudoremainder is reduced with respect to .
This algorithm is applied to a set of differential polynomials, such that each polynomial is reduced with respect to each other, to form an auto-reduced set. The result is a differential characteristic set. Note that the multiplication mentioned in Step (2) above may yield a nonequivalent system if that coefficient happens to belong to the ideal. However, in practice, this does not occur for the ODE systems studied .
Using the DifferentialAlgebra package in Maple, one can find a representation of the radical of a differential ideal generated by some equations, as an intersection of radical differential ideals with respect to a given ranking . Specifically, the RosenfeldGroebner command in Maple takes two arguments: sys and R, where sys is a list of set of differential equations or inequations which are all rational in the independent and dependent variables and their derivatives and R is a differential polynomial ring built by the command DifferentialRing specifying the independent and dependent variables and a ranking for them . Then RosenfeldGroebner returns a representation of the radical of the differential ideal generated by sys, as an intersection of radical differential ideals saturated by the multiplicative family generated by the inequations found in sys. This representation consists of a list of regular differential chains with respect to the ranking of R. Note that RosenfeldGroebner returns a differential characteristic set if the differential ideal is prime .
2.3. Gröbner Basis Methods
Finally, both algebraic and differential Gröbner bases can be employed to find the input-output equations. To use an algebraic Gröbner basis, one can take a sufficient number of derivatives of the model equations and then treat the derivatives of the variables as indeterminates in the polynomial ring in , , ,..., , , ,..., , , ,..., etc. Then a Gröbner basis of the ideal generated by this full system of (differential) equations with an elimination ordering where the state variables and their derivatives are eliminated first can be found. Details of this approach can be found in . Differential Gröbner bases have been developed by Carrà Ferro , Ollivier , and Mansfield , but currently there are no implementations in computer algebra systems .
3. Model Rejection Using Input-Output Invariants
We now discuss how to use the input-output invariants obtained from differential elimination (using Ritt’s pseudodivision, differential Gröbner bases, or some other method) for model selection/rejection.
We can write our input-output relations in (4), or input-output invariants, in the form:The functions are differential monomials, i.e., monomials in the input/output variables , , , etc., and the functions are rational functions in the unknown parameter vector . In order to uniquely fix the rational coefficients to the differential monomials , we normalize each input/output equation to make it monic. In other words, we can rewrite our input-output relations asHere is a differential monomial in the input/output variables , , , etc. If the values of ,, , etc., were known at a sufficient number of time instances , then one could substitute in values of and at each of these time instances to obtain a linear system of equations in the variables .
First consider the case of a single input-output equation. If there are unknown coefficients , we obtain the system:
We write this linear system as , where is an by matrix of the form: is the vector of unknown coefficients , and is of the form .
For the case of multiple input-output equations, we get the following block diagonal system of equations :where is a by matrix.
In the symbolic setting, given a general input function and generic initial conditions and parameters, this system should have a unique solution for , due to the persistence of excitation conditions . In other words, we assume that the vectors of differential monomials at various time points are linearly independent. This means the coefficients of the input-output equations can be uniquely determined in the generic setting . Note that we have assumed that the parameters are all unknown and we have not taken any possible algebraic dependencies among the coefficients into account.
The main idea of this paper is to translate the symbolic setting to the numerical setting and can be described as follows. Assume we have perfect data; i.e., we know values of , etc., at many time instances , perfectly. Given a set of candidate models, we find their associated input-output invariants and then substitute in our values of , etc., at time instances , thus setting up the linear system for each model. With perfect data and assuming the persistence of excitation conditions mentioned above, the solution to should be unique for the correct model, but there should, in theory, be no solution for each of the incorrect models. Thus under ideal circumstances, one should be able to select the correct model since the input/output data corresponding to that model should satisfy its input-output invariant. Likewise, one should be able to reject the incorrect models since the input/output data should not satisfy their input-output invariants.
However, with imperfect data, there could be no solution to even for the correct model, and likewise there may or may not be a solution to for an incorrect model. Thus, with imperfect data, one may be unable to select the correct model. On the other hand, if there is no solution to for each of the candidate models, then the goal is to determine how “badly” each of the models fails and rejects models accordingly.
A subtle point regarding this approach is that this model rejection technique works best if the models under consideration are in the simplest possible form. This means that, ideally, redundant parameters have been eliminated from the model so that the input-output equations are as reduced as possible; i.e., there are not extra columns in when considering the linear system . Extra columns generically mean more possible solutions, which can make it harder for our algorithm to reject incorrect models. However, redundant parameters, and the related notion of parameter unidentifiability, do not necessarily yield more coefficients in the input-output equations, as can be seen from the structure of input-output equations for linear compartment models as discussed in Section 8.
A related question to model compatibility is that of structural indistinguishability. Two models are structurally indistinguishable if for any choice of parameters in the first model there is a choice of parameters in the second model that will yield the same dynamics in both models, and vice versa . One way to test for structural indistinguishability of two models is to find the associated input-output equations and then equate their coefficient functions and attempt to solve for one set of parameters in terms of the other set of parameters, and vice versa . A necessary condition for models to be structurally indistinguishable is to have input-output equations with the same differential monomial terms. Since our approach only considers the structure of the input-output equations and not the specific coefficient functions, it is possible to have several different models, all with the same structure of their input-output equations, to be compatible using our model compatibility test. Thus, if a given model is found to be compatible, then any model that is structurally indistinguishable from that model is also compatible and thus our approach and structural indistinguishability testing can be applied in parallel. For more on structural indistinguishability, see [28–30]. The specific form of the coefficients of the input-output equations is considered in Section 8.
We now describe criteria to reject models.
4. Linear Solvability
Let and consider the linear systemHere, we study the solvability of (10) under noisy perturbation of both and . Let and denote the perturbed versions of and , respectively, and assume that and depend only on and , respectively (see Section 5). Our goal is to infer the unsolvability of the unperturbed system (10) from observation of and only.
Our method is based on detecting the rank of an augmented matrix, but first let us introduce some notation. The singular values of a matrix will be denoted by (Note that we have trivially extended the number of singular values of from to .) The rank of is written . The range of is denoted . Throughout, refers to the Euclidean norm.
The basic strategy will be to assume as a null hypothesis that (10) has a solution, i.e., , and then to derive its consequences in terms of and . If these consequences are not met, then we conclude by contradiction that (10) is unsolvable. In other words, we will provide sufficient but not necessary conditions for (10) to have no solution; i.e., we can only reject (but not confirm) the null hypothesis. We will refer to this procedure as testing the null hypothesis.
We first collect some useful results.
Theorem 3 (Weyl’s inequality). Let . Then
Corollary 4. Let and assume that . Then
Therefore, if (13) is not satisfied, then .
4.2. Augmented Matrix
Assume the null hypothesis. Then , so . Therefore, . But we do not have access to and so must consider instead the perturbed augmented matrix .
Theorem 5. Under the null hypothesis,
Proof. Apply Corollary 4.
Remark 6. This approach can fail to correctly reject the null hypothesis if is (numerically) low-rank.
Remark 7. In principle, we should test directly the assertion that . However, we can only establish lower bounds on the matrix rank (we can only tell if a singular value is “too large”), so this is not feasible in practice. An alternative approach is to consider only numerical ranks obtained by thresholding. How to choose such a threshold, however, is not at all clear and can be a very delicate matter especially if the data have high dynamic range.
Remark 8. The theorem is uninformative if since then trivially. However, this is not a significant disadvantage beyond that described above since if is full-rank, then it must be true that (10) is solvable.
4.3. Example: Perfect Data
As a proof of principle, we first apply Theorem 5 to a simple linear model. We start by taking perfect input and output data and then add a specific amount of noise to the output data and attempt to reject the incorrect model. In the subsequent sections, we will see how to interpret Theorem 5 statistically under a particular “noise” model for the perturbations.
Here, we take data from a linear 3-compartment model, add noise, and try to reject the general form of the linear 2-compartment model with the same input/output compartments. Linear compartment models are defined in Section 8. In practice, one would like to compare models with the same input and output compartments, as the number of compartments involved may not be known, but the injection and measurement compartments would be known. Thus, in this particular example, we are assuming the linear 3-compartment model is the “true model” and want to reject a competing model, but as our method concerns model rejection and not model selection, this notion of a “true model” is not a requirement for our method to work.
Example 9. Let our model be a 3-compartment model of the following form:Here we have an input to the first compartment of the form and the first compartment is measured, so that represents the output. Note that we have chosen a smooth, persistently exciting input function  so that derivatives can be taken and the coefficients of the input-output equation can be uniquely determined, as required. The solution to this system of ODEs can be easily found of the form:
so that .
The input-output equation for a 3-compartment model with a single input/output to the first compartment has the form:where are the coefficients of the characteristic polynomial of the matrix and are the coefficients of the characteristic polynomial of the matrix which has the first row and first column of removed .
We now substitute values of at time instances into our input-output equation and solve the resulting linear system of equations for . We get that , which agrees with the coefficients of the characteristic polynomials of and .
We now attempt to reject the 2-compartment model using 3-compartment model data. We find the input-output equations for a 2-compartment model with a single input/output to the first compartment, which has the form:where again are the coefficients of the characteristic polynomial of the matrix and is the coefficient of the characteristic polynomial of the matrix which has the first row and first column of removed.
We substitute values of at time instances into our input-output equation and attempt to solve the resulting linear system of equations for .
The singular values for the matrix with the substituted values of at time instances areThe singular values of the matrix with the substituted values of at time instances areSince the smallest singular value is greater than zero (or order machine precision), it is evident that the 2-compartment model can be rejected.
We now add noise to our matrix in the following way. To each entry , and , we add where is a random real number between and , and . Then the noisy matrix has the following singular values:We add noise to our vector in a similar way. To each entry , we add where is again a random real number between and . Then the noisy matrix has the following singular values:To determine whether the noisy data are compatible, we need to compute . Due to the specific noise model chosen, this can be bounded independently of the true unobservable data as , where is a matrix of all ones of the appropriate size (the actual norm is ). Since this norm is less than the smallest singular value , we can reject this model. Thus, using noisy 3-compartment model data, we are able to reject the 2-compartment model.
5. Statistical Inference
We now consider the statistical inference of the solvability of (10). First, we need a noise model.
5.1. Noise Model
If the perturbations and are bounded, e.g., and for some (representing a relative accuracy of in the “measurements” and ), then Theorem 5 can be used at once. However, it is customary to model such perturbations as normal random variables, which are not bounded. Here, we will assume a noise model of the form where is a (computable) matrix that depends only on and similarly with , denotes the Hadamard (entrywise) matrix product , and is a matrix-valued random variable whose entries are independent standard normals.
In our application of interest, the entries of depend on those of as follows. Let for some input vector , but suppose that we can only observe the “noisy” vector . Then the corresponding perturbed matrix entries are By the additivity formulafor standard Gaussians, Therefore, so, to first order in , An analogous derivation holds for .
The basic strategy is now as follows. Let be a test statistic, i.e., in Section 4.2. Then since where we have made explicit the dependence of both sides on the same underlying random mechanism , the (cumulative) distribution function of must dominate that of , i.e., Thus,
Using (31a)–(31d), we can associate a -value to any given realization of by referencing upper tail bounds for quantities of the form . Recall that under the null hypothesis. In a classical statistical hypothesis testing framework, we may therefore reject the null hypothesis if (31d) is at most , where is the desired significance level (e.g., ).
5.2. Hadamard Tail Bounds
We now turn to bounding , where we will assume that . This can be done in several ways.
One easy way is to recognize thatwhere is the Frobenius norm, so But has a chi distribution with degrees of freedom. Therefore, However, each inequality in (32) can be quite loose; a slightly better approach is to use the inequality  where and denote the th row and th column, respectively, of . The term can then be handled using a chi distribution via as above or directly using a concentration bound (see below). Variations on this undoubtedly exist.
Theorem 10. Let , where each . Then for any ,
5.3. Test Statistic Tail Bounds
The bound (31d) for can then be computed as follows. Let so that . Then by Theorem 10, where and are the “variance” parameters in the theorem for and , respectively. This simplifies to on completing the square. Now set so that the integral becomes The variable substitution then gives where is the standard normal distribution function. Thus,A similar analysis yieldsEquations (44) and (45) together comprise the probability bound on the null hypothesis that we will use hereafter.
6. Gaussian Processes to Estimate Derivatives
We next present a method for estimating higher order derivatives and the estimation error using Gaussian Process Regression and then apply the input-output invariant method to both linear and nonlinear models in the subsequent sections.
A Gaussian process (GP) is a stochastic process , where is a mean function and a covariance function. GPs are often used for regression/prediction as follows.
Suppose that there is an underlying deterministic function that we can only observe with some measurement noise as , where for the Dirac delta. We consider the problem of finding in a Bayesian setting by assuming it to be a GP with prior mean and covariance functions and , respectively. Then the joint distribution of at the observation points and at the prediction points isThe conditional distribution of given is also Gaussian:where are the posterior mean and covariance, respectively. This allows us to infer on the basis of observing . The diagonal entries of are the posterior variances and quantify the uncertainty associated with this inference procedure. In particular, the square roots of these variances give us estimates on the term in the assumed noise model in Section 5.
6.1. Estimating Derivatives
Equation (48) provides an estimate for the function values . What if we want to estimate its derivatives? Let for some covariance function . Then by linearity of differentiation. Thus, where is the prior mean for and . This joint distribution is exactly of the form (47). An analogous application of (48) then yields the posterior estimate of for all .
Alternatively, if we are interested only in the posterior variances of each , then it suffices to consider each block independently: The cost of computing can clearly be amortized over all .
6.2. Formulae for Squared Exponential Covariance Functions
We now consider the specific case of the squared exponential (SE) covariance function where is the signal variance and is a length scale. The SE function is one of the most widely used covariance functions in practice. Its derivatives can be expressed in terms of the (probabilists’) Hermite polynomials (these are also sometimes denoted ). The first few Hermite polynomials are , , and .
We need to compute the derivatives . Let so that . Then and . Therefore,
The GP regression requires us to have the values of the hyperparameters , , and . In practice, however, these are hardly ever known. In the examples below, we deal with this by estimating the hyperparameters from the data by maximizing the likelihood. We do this by using a nonlinear conjugate gradient algorithm, which can be quite sensitive to the initial starting point, so we initialize multiple runs over a small grid in hyperparameter space and return the best estimate found. This increases the quality of the estimated hyperparameters but can still sometimes fail.
We showcase our method on competing models: linear compartment models (2 and 3 species), Lotka-Volterra models (2 and 3 species) and Lorenz. We compute the input-output invariants of the Lotka-Volterra and Lorenz using RosenfeldGroebner. The method to compute the linear compartment input-output invariants is presented in the following section. We simulate each of these models to generate time-course data, add varying levels of noise, and estimate the necessary higher order derivatives using GP regression. Using the estimated GP regression data, we test each of the models using the input-output invariant method on other models.
Example 11. The two species Lotka-Volterra model is where and are variables and are parameters. We assume only is observable and perform differential elimination and obtain our input-output invariant in terms of only :
Example 12. By including an additional variable , the three species Lotka-Volterra model is assuming only is observable. After differential elimination, the input-output invariant is