Complexity

Volume 2018, Article ID 2380650, 10 pages

https://doi.org/10.1155/2018/2380650

## The Union between Structural and Practical Identifiability Makes Strength in Reducing Oncological Model Complexity: A Case Study

¹Department of Information Engineering, University of Padova, Via Gradenigo 6/b, 35131 Padova, Italy
²IEIIT-CNR, c/o Department of Information Engineering, University of Padova, Via Gradenigo 6/a, 35131 Padova, Italy

Correspondence should be addressed to Maria Pia Saccomani; pia@dei.unipd.it

Received 9 September 2017; Accepted 14 January 2018; Published 11 February 2018

Academic Editor: Peter Giesl

Copyright © 2018 Maria Pia Saccomani and Karl Thomaseth. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Mathematical models are increasingly proposed to describe a tumor’s dynamic response to treatment, with the aim of improving efficacy. The most widely used are nonlinear ODE models, whose identification is often difficult due to experimental limitations. We focus on the issue of parameter estimation in model-based oncological studies. Given their complexity, many of these models are unidentifiable, having an infinite number of parameter solutions. These solutions describe the experimental data equally well but are associated with different dynamic evolutions of the unmeasurable variables. We propose a joint use of two different identifiability methodologies, *structural identifiability* and *practical identifiability*, which are traditionally regarded as disjoint. This new methodology provides the number of parameter solutions, the analytic relations between the unidentifiable parameters useful to reduce model complexity, a ranking of the parameters revealing the most reliable estimates, and a way to disentangle the various causes of nonidentifiability. It can be implemented with available differential algebra software and statistical packages. This methodology can constitute a powerful tool for the oncologist to discover the behavior of inaccessible variables of clinical interest and to correctly address the experimental design. The approach is illustrated on a complex model of the “in vivo” antitumor activity of interleukin-21 on tumor eradication in different cancers in mice.

#### 1. Introduction

Many mathematical models of tumor growth, at different levels from gene expression to macroscopic tumor development, have been formulated [1–11]. Recently, mathematical models of tumor-immune interactions have also been considered to evaluate the efficacy of immunotherapy in the context of tumor challenge. Models of cancer treatments, involving chemotherapy and/or immunotherapy, have also been widely employed. Furthermore, pharmacokinetic-pharmacodynamic (PK-PD) models have been developed to describe the interaction between tumor growth, drug absorption, and drug effect in terms of patients’ response to therapies. Among the different mathematical frameworks used to describe these models, the most widely used in *mathematical oncology* are based on nonlinear ordinary differential equations (ODE). These models are being intensely studied to describe the complex processes of tumorigenesis and to produce an integrated mathematical view of tumorigenesis, cancer progression, and the evaluation of anticancer agents under different oncological settings. Often, mathematical models are combined with optimal control techniques for quantitatively describing tumor progression and for optimal treatment planning. Clinically validated mathematical models have been proposed for the development of the so-called “virtual patient” [12] to accurately predict the efficacy and toxicity of various oncological therapeutic combinations in individuals and in populations. Many of these models have been successfully simulated and validated against clinical observations, promoting modeling and optimal control as a therapy planning tool in the clinic.

Given the increased model complexity required to describe the increasingly available data, many of the models employed in oncological studies are unidentifiable; that is, they have an infinite number of parameter solutions. However, this problem is not always recognized. Typically, these solutions describe the experimental data equally well but are associated with different dynamic evolutions of the not directly measurable variables. Such a situation is undesirable and frustrates one of the most useful aspects of mathematical models, that is, providing a means to infer *unobservable* quantities and time-varying phenomena.

Starting from these observations, in this paper we focus on unidentifiable models, in particular on the mathematical issue of parameter estimation [13] in model-based oncological studies. This issue will also become crucial for increasing the precision of recently proposed treatment personalization methods.

To guarantee the goodness and reliability of parameter estimation results, we propose a new joint use of two different identifiability methodologies, namely, *structural identifiability* and *practical identifiability*, which are traditionally regarded as disjoint because they are based, respectively, on differential-algebraic manipulation and on numerical simulation of the system equations. Nevertheless, the structural analysis can also provide useful practical information if applied at a particular point of clinical interest in the admissible parameter space.

We first propose an algorithmic method to count the number of parameter solutions of a model, technically speaking, to check the *structural identifiability* of the model [14, 15]. In particular, if the model parameterization is unique (a *globally identifiable* model), the numerical estimate of the unknown parameter vector provided by any optimization algorithm is correct and allows arriving at reliable conclusions.

From a methodological point of view, testing structural identifiability before collecting experimental data is an essential prerequisite for assessing whether the experimental design is adequate for a hypothesized model and whether the parameter estimation problem is well posed. In the model-based oncological literature, however, the identifiability issue is still neglected, and the collection of experimental data precedes the formulation of mathematical models, which is often carried out by trial and error by fitting different model structures to the acquired data. This approach is surely dictated by the fact that there are many software tools to perform parameter estimation, while checking identifiability can in some cases be prohibitively complex, for example, for large models containing many states and parameters. However, if the postulated model has an infinite number of parameter solutions (an *unidentifiable* model), the parameter estimates that could still be obtained by some numerical optimization algorithm would be unreliable and would vary randomly depending, for instance, on the initialization of the algorithm [15]. Vice versa, in the case of nonidentifiability, the outcome of the structural identifiability test, that is, the Gröbner basis of the exhaustive summary, provides algebraic nonlinear equations which define the relations between the unidentifiable parameters [16, 17]. These equations describe the equivalence class of parameters with respect to their ability to describe the output function. Just by inspection, one can know the degrees of freedom of the system and thus which parameters of the model are redundant. In particular, the analytic expressions of the dependencies among parameters can be included in the original model in order to reduce its number of parameters and to define an equivalent identifiable model of reduced complexity.

Although necessary, structural identifiability is not sufficient to guarantee an accurate identification of the model parameters from real, possibly noisy, input/output data. The parameter estimates obtained by standard algorithms, even for a structurally identifiable model, may be very sensitive to noise, and a measure of this sensitivity can be important in applications. Thus, there is a need to perform, besides structural identifiability tests, a *practical identifiability* analysis. In many studies, to gauge how much the model outputs are influenced by the parameter values, the sensitivity of the output functions with respect to each parameter is calculated. Usually the parameter correlations are also calculated to identify, by trial and error, which parameters to fix in order to calculate the others. In this paper, we show how to analytically calculate the relations between the unidentifiable parameters. These relations remain completely hidden from the investigator when only correlations are calculated. From these relations, it is immediate to know which parameter to fix in order to analytically calculate all the correlated parameters.

In principle, the whole proposed methodology can be checked by suitable mathematical procedures directly on the model, without the need to collect experimental data. This may avoid wasting resources on uninformative experiments, given the high costs, not only in economic terms, of oncological experiments. Only structural identifiability can be tested without assuming prior knowledge of parameter values, whereas practical identifiability, based on sensitivity analysis, requires “nominal” parameter values for numerical simulation [18].

We choose to illustrate our procedure and to show our results on a simple linear model.

Finally, this paper aims to demonstrate that, in oncological studies, when a model is formulated and its parameters need to be estimated from available measurements, checking the uniqueness of the parameter solution is crucial. Since the conclusions of model-based oncological studies are generally founded on the numerical estimates of the unknown parameters from experimental data, by neglecting all the solutions of an unidentifiable parameter except the one returned by an optimization algorithm, the investigator can arrive at totally erroneous conclusions. Furthermore, the calculation of the analytic relations between the unidentifiable parameters is an effective approach to discover the behavior of nonaccessible variables of clinical interest, as well as to schedule cancer therapy while guaranteeing the reliability of the results. To this end, we will apply our methodology to a relevant benchmark model: a mathematical model to study and evaluate the “in vivo” antitumor activity of interleukin-21 (IL-21) on tumor eradication in different cancers in mice [4, 8].

#### 2. Mathematical Background

##### 2.1. Definitions

This section provides the reader with the definitions necessary to set the notation used in the paper. Consider a nonlinear dynamic system described in state-space form as
$$\dot{x}(t) = f(x(t), u(t), p), \qquad y(t) = h(x(t), p), \tag{1}$$
with state $x \in \mathbb{R}^n$, input $u$ ranging on some vector space of piecewise smooth (infinitely differentiable) functions, output $y \in \mathbb{R}^m$, and constant unknown parameter vector $p$ belonging to some open subset $\mathcal{P} \subseteq \mathbb{R}^q$. Whenever initial conditions are specified, the relevant equation $x(0) = x_0$ is added to the system. The essential assumption here is that the functions $f$ and $h$ are vectors of *rational functions* of $x$ and $p$. We also assume that there is no feedback, so that $u$ is a free variable not depending on $y$.

In the following, we will assume that the input-output map of system (1) started at the initial state $x_0$ exists, and we denote it by
$$y(t) = \phi(t; x_0, p, u). \tag{2}$$

##### 2.2. Structural Identifiability via Differential Algebra

In general, the assessment of parameter values of ODE models can only be approached indirectly, as a parameter estimation problem starting from external, input-output measurements. A basic question is whether the parameters of the model can be determined (uniquely) from input-output measurements, at least for suitable input functions, assuming that all observable variables are error-free. This is a mathematical property called a priori or *structural identifiability* of the model. It is a *property of the model alone* and of course it depends on how the model is parameterized. Structural identifiability can (and should) in principle be checked before collecting experimental data.

We adopt the definitions of structural identifiability used in Saccomani et al.’s work [16].

*Definition 1.* System (1) is a priori *globally (or uniquely) identifiable from input-output data* if, for at least a generic set of points $p^* \in \mathcal{P}$, there exists (at least) one input function $u$ such that the equation
$$\phi(t; x_0, p, u) = \phi(t; x_0, p^*, u) \tag{3}$$
has only one solution $p = p^*$ for almost all initial states $x_0$.

If (3) generically has an infinite number of solutions $p$ for all input functions $u$, system (1) is *unidentifiable*.

We will apply structural identifiability analysis based on differential algebra and on the free dedicated software DAISY (Differential Algebra for Identifiability of SYstems) [19]. The reader is referred to Audoly et al. [14] and Saccomani et al. [16] for a detailed explanation of the theory behind the software tool and to Bellu et al. [19] for the algorithm.

Briefly, this algorithm permits eliminating the unobserved state variables from system (1) and finding the *input-output relation*: a set of polynomial differential equations involving only the variables $u$, $y$, and their time derivatives, describing all input-output pairs satisfying the original dynamic system. The coefficients of the input-output relation provide a set of (nonlinear) algebraic functions of the unknown parameter $p$ of the original model. These functions form the *exhaustive summary* of the model. They appear linearly in the input-output relation, so that they can be easily extracted. Identifiability is tested by checking the injectivity of the exhaustive summary function with respect to the parameter $p$. By applying Buchberger’s computer algebra algorithm [20], it is possible to compute a Gröbner basis of the system. This algorithm is a common generalization, to nonlinear equations and to more variables, of the Gaussian and Euclidean algorithms, respectively. In particular, the Gröbner basis allows counting the number of solutions of the unknown parameter and shows whether the parameters satisfy algebraic relations or instead have a one-to-one relation with the exhaustive summary, in which case the model is *globally identifiable*.
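As a toy illustration of this step, the following sketch (our own construction, not DAISY’s actual workflow) uses SymPy’s `groebner` on a hypothetical exhaustive summary whose coefficients, evaluated at the pseudo-random numerical point $(a, b, c) = (2, 3, 5)$, are $ab$ and $b + c$:

```python
import sympy as sp

a, b, c = sp.symbols('a b c')

# Hypothetical exhaustive summary: the input-output coefficients a*b and
# b + c, equated to their values at the numerical point (a, b, c) = (2, 3, 5).
exhaustive_summary = [a*b - 6, b + c - 8]

# Lexicographic Groebner basis of the exhaustive-summary equations.
G = sp.groebner(exhaustive_summary, a, b, c, order='lex')
print(G.exprs)
```

The basis contains a polynomial coupling $a$ and $c$ rather than polynomials of the form $p_i - \hat{p}_i$, so this toy model is unidentifiable with one degree of freedom: fixing $c$ analytically determines $a = 6/(8 - c)$ and $b = 8 - c$.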

DAISY automatically ranks the input, output, and state variables and their derivatives, starts the pseudodivision algorithm, and calculates the differential polynomials which form the *input-output relation* of the model. Buchberger’s algorithm is then applied to the (nonlinear) algebraic equations obtained after equating the coefficients of the input-output relation to a set of pseudo-randomly chosen numerical points in their range set. DAISY calculates the Gröbner basis of this algebraic nonlinear system and provides identifiability results holding in the whole parameter space.

In general, a Gröbner basis can be represented as
$$G(p) = \{g_1(p), g_2(p), \ldots, g_r(p)\}, \tag{4}$$
where the $g_i(p)$ are algebraic nonlinear polynomials.

The possibly finite or infinite multiple solutions of the system of equations
$$g_1(p) = 0, \quad g_2(p) = 0, \quad \ldots, \quad g_r(p) = 0 \tag{5}$$
in the unknowns $p_1, \ldots, p_q$ provide a parametrization of system (1) which satisfies (3).

In the case of unique identifiability, the Gröbner basis functions become simply
$$g_i(p) = p_i - \hat{p}_i, \quad i = 1, \ldots, q. \tag{6}$$
In the case of an unidentifiable model, that is, when at least one parameter is unidentifiable, the Gröbner basis (4) provides the analytic relations between the unidentifiable parameters which hold in the whole admissible parameter space, not only around a given parameter value [16]. The solutions of (5) provide a uniquely identifiable parametrization of system (1) as a function of the known parameters, with the remaining unknowns being unidentifiable. These unknowns are free parameters that can be assigned arbitrary values without affecting the input-output relationship (3).

##### 2.3. Practical Identifiability via Sensitivity Analysis

In the literature, *practical identifiability* [21, 22] is generally understood as the study of the sensitivity of some criterion function, for example, the likelihood, with respect to the parameters to be estimated, in particular with the purpose of detecting sensitive or nonunique minima. This can be done on more realistic models which explicitly involve noise in the measurements and may use actual measurement data subject to disturbances of various natures. Checking practical identifiability by data-based (or simulation-based) procedures cannot, however, provide a mathematically rigorous answer to the uniqueness problem.

However, since for a fixed input function the parameter estimates must minimize a criterion function which depends, besides the parameter vector, on the actual output function, the nonuniqueness of minima can also be studied via the sensitivity of the output with respect to parameter variations. It should be evident from the very problem setting that this sensitivity plays a key role in identifiability analysis: a model whose output has zero sensitivity with respect to some parameter variation is clearly nonidentifiable. But the role goes much further, since likelihood optimization generally requires the calculation of the gradient of the cost function, which in turn depends on the output sensitivities.

The choice of a particular approach for testing practical identifiability can lead, however, to inconclusive or only qualitative results, particularly if random noise is added to numerical simulations in an attempt to make them more realistic. Large unpredictable errors can also occur if model output sensitivities are determined numerically by finite-difference approximation, which is easily affected by roundoff errors when sensitivities with respect to parameters are very small. This latter kind of error can be avoided by algorithmic differentiation. Still, random noise in sensitivity analysis may obfuscate deterministic relationships among parameters that can be assessed only through analytic mathematical approaches.

The practical parameter identification framework considered is based on (simulated) noisy measurements of the output of the dynamic system (1), taken over a finite horizon at discrete time points $t_1, \ldots, t_N$; that is,
$$z_k = y(t_k, p) + v_k, \quad k = 1, \ldots, N. \tag{7}$$
For notational simplicity, it is assumed that $m = 1$. To check *practical identifiability* from a finite set of input-output measurements, one can form the average weighted squared *prediction error*
$$V_N(p) = \frac{1}{N} \sum_{k=1}^{N} w_k \bigl(z_k - \hat{y}(t_k, p)\bigr)^2, \tag{8}$$
where $\hat{y}(t_k, p)$ is the output predictor based on a generic parameter value $p$. Assume that $V_N(p)$ has only one absolute minimum,
$$\hat{p} = \arg\min_{p \in \mathcal{P}} V_N(p); \tag{9}$$
compare Ljung [13]. According to Raue et al. [23], the $i$th component of the parameter $p$ is *practically unidentifiable* if the one-dimensional confidence region about $\hat{p}_i$ extends to infinity. Naturally, this statement cannot be checked exactly with real data and needs to be interpreted as an asymptotic statement for sample sizes $N \to \infty$, when $\hat{p}$ has an asymptotic distribution of Gaussian type. Approximate confidence regions of parameter estimates can, however, be calculated a priori from the Fisher information matrix or simply from the rank of the $N \times q$ sensitivity matrix formed as
$$S = [s_{kj}], \quad s_{kj} = \frac{\partial y(t_k, p)}{\partial p_j}, \tag{10}$$
where the $s_{kj}$ are the sensitivities of the model outputs at the sampling times $t_k$ with respect to the components $p_j$ of the parameter vector. Without loss of generality, but not without side effects because sensitivity analysis is susceptible to parameter scaling, (10) can be thought of as normalized sensitivities according to various possible definitions. For instance, with heteroscedastic measurement noise, for example, with weights $w_k$ inversely proportional to the noise variance in (8), to reduce the effect of parameter scaling and without assumptions on measurement noise, we consider the sensitivities as derivatives of (unnormalized) model outputs with respect to *fractional* parameter variations, or *logarithmic* derivatives; that is, $\tilde{s}_{kj} = \partial y(t_k, p)/\partial \ln p_j = p_j\, \partial y(t_k, p)/\partial p_j$.
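As a minimal sketch of this computation (using a hypothetical one-compartment output $y(t, p) = p_0 e^{-p_1 t}$, not one of the paper’s models), logarithmic sensitivities can be approximated by central differences on fractional parameter perturbations:

```python
import numpy as np

# Hypothetical scalar-output model: y(t, p) = p0 * exp(-p1 * t).
def model_output(t, p):
    return p[0] * np.exp(-p[1] * t)

def log_sensitivity_matrix(t, p, rel_step=1e-6):
    """Logarithmic sensitivities s~_kj = p_j * dy(t_k)/dp_j,
    approximated by central differences on fractional perturbations."""
    S = np.empty((t.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = rel_step * p[j]          # fractional perturbation of p_j
        # Dividing the central difference by 2*rel_step (not 2*dp[j])
        # directly yields p_j * dy/dp_j, the logarithmic sensitivity.
        S[:, j] = (model_output(t, p + dp) - model_output(t, p - dp)) / (2 * rel_step)
    return S

t = np.linspace(0.1, 5.0, 20)
p = np.array([2.0, 0.7])                 # illustrative nominal values
S = log_sensitivity_matrix(t, p)
print(S.shape)                            # (20, 2)
```

For this model the columns can be checked analytically: $p_0\,\partial y/\partial p_0 = y$ and $p_1\,\partial y/\partial p_1 = -p_1 t\, y$.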

The above theory is implemented in almost all statistical model-fitting software, usually based on the quadratic approximation of the likelihood function or also, for example, on Monte Carlo simulation. The reader is referred to AMIGO [24], PLE [23], and COPASI [25] for biological and physiological models and to NONMEM [26] and ADAPT [27] for population pharmacokinetic and pharmacokinetic-pharmacodynamic modeling.

*Practical identifiability* is then tested here by long-standing Principal Component Analysis; see, for example, Vajda et al. [28]. We can finally give the following formal definition.

*Definition 2.* The system described by (1) and (7), identified by nonlinear least squares, is *practically identifiable* if the sensitivity matrix $S$ has full rank.

This can be ascertained through Singular Value Decomposition (SVD), which provides the factorization
$$S = U \Sigma V^T, \tag{11}$$
where $U$ and $V$ are the orthonormal eigenvector matrices of $SS^T$ and $S^TS$, respectively, and $\Sigma$ is diagonal (referring to the top $q \times q$ submatrix) with sorted *singular values* $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_q \ge 0$, which are also the square roots of the eigenvalues of the positive-semidefinite matrix $S^TS$. The theoretical (practical) rank of $S$ is defined as the smallest $r$ at which $\sigma_{r+1} = 0$ ($\sigma_{r+1} < \delta$, with $\delta$ being a user-defined threshold). A known application of SVD consists in representing estimated parameter vectors as linear combinations of the first $r$ eigenvectors of $V$, with *significance ranking* given by the singular values [18].
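A minimal numerical sketch of this rank determination (with a synthetic, deliberately rank-deficient sensitivity matrix of our own making, and a relative rather than absolute threshold) could look as follows:

```python
import numpy as np

def practical_rank(S, threshold=1e-8):
    """Practical rank of a sensitivity matrix: the number of singular
    values above a user-defined threshold (here relative to the largest)."""
    sigma = np.linalg.svd(S, compute_uv=False)   # sorted in descending order
    return int(np.sum(sigma > threshold * sigma[0])), sigma

# Rank-deficient example: the third column is an exact linear combination
# of the first two, mimicking structurally correlated parameter sensitivities.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2))
S = np.column_stack([A, A @ np.array([1.0, -2.0])])

r, sigma = practical_rank(S)
print(r)   # 2
```

Only two singular values are significantly nonzero, so two parameter combinations at most can be resolved from these sensitivities.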

In contrast to this common application, in this paper we aim to exploit the results of the SVD at the lower end of the singular values to provide a ranking among unidentifiable parameters, possibly to distinguish between *strictly* structurally and *loosely* practically unidentifiable parameters. The working hypothesis is that structurally unidentifiable parameters should be associated with exactly zero singular values, whereas practical nonidentifiability is more vaguely defined.

The SVD algorithm is implemented in all general-purpose software, such as MATLAB (MathWorks, Inc., USA) or R [29], normally through standard linear algebra packages, for example, LAPACK [30].

#### 3. New Perspectives on the Joint Use of Structural and Practical Identifiability Analysis

Structural identifiability analysis can provide, compared to practical identifiability analysis based on sensitivities, a much deeper insight into the properties of a given model and can indicate how to reparameterize a model that turns out to be unidentifiable. However, the analytic methods can also be usefully employed a posteriori, after some preliminary model fitting to data has been performed. For this purpose, there is in fact no need to have all parameters identifiable, because tuning a model to experimental data is feasible even with overparameterized unidentifiable models [18]. In this context, the structural result may indicate the identifiable subset of the parameter vector.

The aim of the present paper is to exploit properties and results of structural identifiability to provide an analytic approach for interpreting parameter estimation results. For this purpose, we summarize already mentioned properties of Gröbner bases.

Practical identifiability techniques are essentially based on simulations and on the study of the level curves of a cost function, typically the likelihood function. Assuming that the minimization yields a* unique* parameter value, the level curves around the minimum define the confidence region. Nonidentifiability is defined in terms of diverging confidence regions in some direction above given thresholds. Instead, structural identifiability provides a dichotomous answer that does not depend on parameter values.

In the case of global identifiability, sensitivity analysis proceeds in the classical way to show whether, for a given experimental design, the parameters also remain identifiable in the practical situation around a nominal point (local identifiability). If a parameter turns out to be practically unidentifiable, only if the structural identifiability of the model has first been tested is it possible to know whether the problem lies with the experimental design or with the model structure; these problems must be solved differently in the two cases to reach identifiability.

Here we show how practical identifiability analysis, based on model output sensitivities, can take advantage of information provided by structural identifiability analysis based on differential algebra by applying the following line of reasoning.

A practical numerical approach useful to assess (non)identifiability around a nominal point (locally) is to consider the linear approximation of (5) and to evaluate whether small admissible perturbations $\delta p$ in the parameter exist. That is, if the expression
$$\sum_{j=1}^{q} \frac{\partial g_i(p)}{\partial p_j}\, \delta p_j = 0, \quad i = 1, \ldots, r, \tag{12}$$
is satisfied, locally, the perturbation $\delta p$ belongs to the null-space of the Jacobian of the Gröbner basis polynomials or, geometrically, lies on the tangent plane to the constraint surface (5).
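This null-space check can be sketched numerically. In the toy constraint below (our own assumption: a model in which only the product $p_1 p_2$ is identifiable, so the single Gröbner-basis polynomial is $g(p) = p_1 p_2 - c$), a small step along the null direction of the Jacobian leaves the constraint satisfied to first order:

```python
import numpy as np

# Hypothetical Groebner-basis constraint from an unidentifiable model:
# only the product p1*p2 is identifiable, i.e. g(p) = p1*p2 - c = 0.
def g(p, c=6.0):
    return np.array([p[0] * p[1] - c])

def jacobian(p):
    # Gradient of g with respect to (p1, p2).
    return np.array([[p[1], p[0]]])

p0 = np.array([2.0, 3.0])                 # nominal point, g(p0) = 0

# Null-space of the Jacobian = admissible perturbation directions (12).
_, _, Vt = np.linalg.svd(jacobian(p0))
null_dir = Vt[-1]                          # right singular vector with zero singular value

# A small step along the null direction stays on the constraint surface
# up to second order in eps, confirming local nonidentifiability.
eps = 1e-4
print(abs(g(p0 + eps * null_dir)[0]))      # second order in eps
```

By contrast, a perturbation of the same size along a non-null direction (for example, along $p_1$ alone) violates the constraint at first order.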

In particular, by exploiting the results of SVD, we will see that this joint use of structural and practical identifiability analysis allows the following:

(1) Distinguishing between identifiable and unidentifiable parameters.

(2) Providing a uniquely identifiable parameterization to reduce the complexity of the model and to correctly proceed with optimization techniques.

(3) Exploiting the analytic relations among unidentifiable parameters described by the Gröbner basis: the investigator can thus choose which parameter is most convenient to fix, on the basis of a priori knowledge, to analytically derive the related ones.

(4) Ordering the parameters with respect to their ability to influence the output function: this provides a useful suggestion to the oncologist by indicating which parameters will be estimated more precisely than others from the experimental data.

#### 4. A Simple Example

The usefulness of the joint use of structural and practical identifiability analysis to analytically calculate the relations between correlated parameters is illustrated on a typical example of an unidentifiable model: the two-compartment, single-input single-output model with one accessible pool and elimination from both compartments. The model is linear in the input-output relationship but nonlinear in the parameters, which therefore does not lessen the significance of the example. The model is described by the following equations:
$$\dot{x}_1(t) = -(k_{01} + k_{21})\,x_1(t) + k_{12}\,x_2(t) + u(t),$$
$$\dot{x}_2(t) = k_{21}\,x_1(t) - (k_{02} + k_{12})\,x_2(t), \tag{13}$$
$$y(t) = x_1(t),$$
where $x_1$ is the accessible pool. To carry out numerical calculations, arbitrary nominal values are assumed for the rate constants $k_{01}$, $k_{21}$, $k_{12}$, and $k_{02}$. The input $u$ is modeled as a triangular-shaped profile with unit area.

We first perform the *practical identifiability* analysis, based on the sensitivities of the model output trajectory with respect to the parameters, calculated at the nominal values. Results are shown in Figure 1, which does not reveal the fact that the sensitivities are correlated. This information is provided by the SVD: the numerically computed singular values clearly support the conclusion that the sensitivity matrix has reduced rank, revealing that the model is overparameterized. Thus it should be simplified by reducing its degrees of freedom.
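A numerical sketch of this analysis (our own reconstruction of the example: a unit impulse input in place of the triangular profile, hypothetical nominal rate constants, and the accessible pool taken as the output) can be written as:

```python
import numpy as np
from scipy.linalg import expm

# Two-compartment model with elimination from both pools; the impulse
# input and the nominal rate constants below are illustrative assumptions.
def output(p, t_grid):
    k01, k21, k12, k02 = p
    A = np.array([[-(k01 + k21), k12],
                  [k21, -(k02 + k12)]])
    x0 = np.array([1.0, 0.0])             # unit impulse dose into pool 1
    return np.array([(expm(A * t) @ x0)[0] for t in t_grid])

p0 = np.array([0.2, 0.5, 0.3, 0.1])       # hypothetical nominal values
t_grid = np.linspace(0.5, 10.0, 30)

# Logarithmic output sensitivities by central differences.
h = 1e-6
S = np.empty((t_grid.size, p0.size))
for j in range(p0.size):
    dp = np.zeros_like(p0)
    dp[j] = h * p0[j]
    S[:, j] = (output(p0 + dp, t_grid) - output(p0 - dp, t_grid)) / (2 * h)

sigma = np.linalg.svd(S, compute_uv=False)
print(sigma / sigma[0])   # smallest normalized singular value is ~0
```

For this sketch the smallest singular value is zero up to numerical error, consistent with the one degree of freedom left by the three identifiable parameter combinations $k_{01} + k_{21}$, $k_{02} + k_{12}$, and $k_{12}\,k_{21}$.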