BioMed Research International

Volume 2015 (2015), Article ID 454765, 21 pages

http://dx.doi.org/10.1155/2015/454765

## Simultaneous Parameters Identifiability and Estimation of an *E. coli* Metabolic Network Model

^{1}Programa de Engenharia Química-COPPE, Universidade Federal do Rio de Janeiro, Cidade Universitária, 21941-972 Rio de Janeiro, BR, Brazil

^{2}Instituto de Química, Universidade do Estado do Rio de Janeiro, São Francisco Xavier 524, 20550-900 Rio de Janeiro, BR, Brazil

^{3}Planta Piloto de Ingeniería Química-CONICET, Universidad Nacional del Sur, Camino La Carrindanga, Km 7, 8000 Bahía Blanca, Argentina

Received 31 May 2014; Revised 29 August 2014; Accepted 5 September 2014

Academic Editor: Eugénio Ferreira

Copyright © 2015 Kese Pontes Freitas Alberton et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

This work proposes a procedure for simultaneous parameters identifiability and estimation in metabolic networks in order to overcome difficulties associated with the lack of experimental data and the large number of parameters, a common scenario in the modeling of such systems. As a case study, the complex real problem of parameters identifiability of the *Escherichia coli* K-12 W3110 dynamic model was investigated; the model is composed of 18 ordinary differential equations and 35 kinetic rates and contains 125 parameters. With the procedure, model fit was improved for most of the measured metabolites, with 58 parameters estimated, including 5 unknown initial conditions. The results indicate that the simultaneous parameters identifiability and estimation approach is appealing for metabolic networks, since the model could be fitted to most of the measured metabolites even when important measurements of intracellular metabolites and good initial estimates of parameters are not available.

#### 1. Introduction

The development of mathematical models for metabolic networks has been severely hampered by the lack of kinetic information [1–4]. Usually, available experimental data are obtained under different conditions using heterogeneous techniques, which must be chosen according to the specific phenomenon of interest in the pathways [2, 4–6]. In such systems, the type of experiment, the sampling method, and the mathematical interpretation of the data depend on the desired experimental information [5]. However, as pointed out by Costa et al. [4], the kinetic information presented in the literature about metabolic network models is scarce and often confusing; thus, other strategies are adopted to the detriment of the dynamic simulation of such systems.

Mathematically, metabolic networks are described by complex dynamic models, whose structure is composed of ordinary differential equations that represent the mass balances of the substrate, biomass, products, and intracellular metabolites that are crucial in the pathways, together with numerous reaction rates associated with the pathways. Such a mathematical structure presents a large number of parameters, whose estimation demands a considerable amount of experimental data. Since metabolic networks are only partially observed experimentally, only a fraction of the intracellular metabolites considered in the mathematical model can be directly measured, and thus initial conditions should also be estimated. Unfortunately, in metabolic network systems, lack of experimental data is almost unavoidable, which compromises the reliability of the proposed reaction rates and makes the estimation of all parameters unfeasible. Thus, such problems often require the use of parameters identifiability procedures.

Parameters identifiability procedures deal with ill-posed parameters estimation problems, selecting a subset of parameters that can be estimated when the estimation of all parameters is not possible [7]. In most procedures, parameters are ranked from most estimable to least estimable based on the structure of the model, the experimental measurements and their uncertainties, and the uncertainties of the initial estimates [8].

Several studies reported in the literature that address parameters identifiability in metabolic networks concentrate exclusively on the model structure, which is called *structural identifiability* (e.g., Davidescu and Jørgensen [9]; Roper et al. [10]; Nikerel et al. [11]), and do not take into account the available experimental data. On the other hand, *practical identifiability* (e.g., Srinath and Gunawan [12]) investigates whether the available experimental data are appropriate and sufficient for achieving a reliable estimation of the model parameters.

Although structural identifiability is a necessary condition, practical identifiability must overcome additional difficulties, including the selection of parameters with low sensitivity on the model predictions and the correlation among parameters [13]. Unfortunately, in complex models such analyses depend on the values of the parameters [13], which are generally unavailable [8].

Since sensitivity analysis is a key tool of identifiability procedures, procedures that evaluate identifiability based only on the initial parameters values can lead to a subset of selected parameters whose estimation is still an ill-posed problem [7]. A strategy to mitigate this problem is to perform simultaneous parameters selection and estimation, which ensures the estimation of the selected parameters (e.g., Secchi et al. [14]; Wu et al. [15]; Wu et al. [16]; McLean et al. [17]; Alberton et al. [18]).

Several procedures adopt the singularity of the Fisher information matrix (FIM) as the stop criterion (e.g., Weijers and Vanrolleghem [19]; Sandink et al. [20]; Li et al. [21]; Secchi et al. [14]; Lund and Foss [22]; Thompson et al. [8]; Alberton et al. [18]). The singularity of the FIM, calculated only with the selected parameters, indicates the point at which the estimation problem becomes ill-posed. When this point is reached, the parameters selection is stopped and the remaining parameters are regarded as nonidentifiable. Particularly when the identifiability procedure performs simultaneous parameters selection and estimation (e.g., Secchi et al. [14]; Wu et al. [15]; Wu et al. [16]; McLean et al. [17]; Alberton et al. [18]), it is not desirable to declare the remaining parameters nonidentifiable without evaluating their estimation potential, because the estimation problem changes each time a parameter is selected.
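The stop criterion described above can be illustrated with a minimal sketch. It is not the algorithm of any of the cited works: the function name `select_until_singular`, the tolerance `rel_tol`, and the toy sensitivity matrix are hypothetical, and unit measurement variances are assumed so that the FIM of a subset reduces to the product of the selected sensitivity columns with themselves. Under that assumption the FIM is singular exactly when the selected columns are linearly dependent, which the sketch checks with a Gram-Schmidt residual instead of forming the FIM explicitly.

```python
def select_until_singular(B, order, rel_tol=1e-8):
    """Add parameters in `order` until the FIM of the selected subset turns singular.

    B is a sensitivity matrix given as a list of rows (one row per measurement,
    one column per parameter). Assumption: unit measurement variances, so the
    FIM of a subset is Bs^T Bs, singular exactly when the selected columns are
    linearly dependent. The dependence test uses Gram-Schmidt residuals.
    """
    basis = []      # orthonormal basis spanning the selected columns
    selected = []
    for j in order:
        column = [row[j] for row in B]
        column_norm = sum(x * x for x in column) ** 0.5
        residual = column[:]
        for q in basis:
            projection = sum(r * qi for r, qi in zip(residual, q))
            residual = [r - projection * qi for r, qi in zip(residual, q)]
        residual_norm = sum(x * x for x in residual) ** 0.5
        if column_norm == 0.0 or residual_norm <= rel_tol * column_norm:
            break   # stop criterion: adding parameter j makes the FIM singular
        basis.append([x / residual_norm for x in residual])
        selected.append(j)
    # Everything not selected when the loop stops is declared nonidentifiable,
    # including parameters that were never actually evaluated.
    not_selected = [j for j in order if j not in selected]
    return selected, not_selected


# Hypothetical sensitivities: column 2 is exactly twice column 0.
B = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [1.0, 1.0, 2.0]]

first = select_until_singular(B, order=[0, 1, 2])
second = select_until_singular(B, order=[0, 2, 1])
```

With the ordering `[0, 1, 2]` the sketch selects parameters 0 and 1 and rejects parameter 2. With the ordering `[0, 2, 1]` it stops at parameter 2 and leaves parameter 1 unselected without ever testing it, although its column is independent of column 0. This is the behavior the paragraph above describes as undesirable.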

A great challenge to be overcome in identifiability procedures, even in those that include simultaneous estimation, is that the nonselected parameters are evaluated based on their initial estimates, which are probably inadequate. The literature addresses Monte Carlo techniques [23] and simultaneous parameters reestimation for ensuring a well-posed estimation of the selected parameters (e.g., Secchi et al. [14]; Wu et al. [15]; Wu et al. [16]; McLean et al. [17]; Alberton et al. [18]). More properly, the evaluation of the subsequent parameters to be selected should be based on the reestimated values of the selected parameters [17, 18], which reduces the dependence on the initial parameters estimates. In an interesting work, McLean et al. [17] developed an algorithm that allows the identifiability of all model parameters to be evaluated; this procedure reestimates the selected parameters and uses the reestimated values in the selection of the subsequent parameters to be evaluated, at the cost of intensive computational effort. In the work of Alberton et al. [18], the reestimated values are also used in the selection of the subsequent parameters, but the numerical effort is significantly reduced by a binary-search-based algorithm.

Another challenge is that, even when good initial estimates of parameters values are available, checking a complex model for identifiability problems (e.g., nonsignificant parameters or parameters correlation derived from the experimental design) is a conceptually and numerically arduous task.

In such a scenario, an important question is how to reduce the dependence of the identifiability procedure on the initial estimates of parameters values and on the selection criteria adopted. In this context, this work presents a numerical procedure for treating the estimation problems found in metabolic networks, based on an intensive parameters evaluation that includes simultaneous parameters selection and estimation. As its main characteristic, the numerical procedure is able to investigate the identifiability of all parameters of the mathematical model, even in ill-posed estimation problems. As in Alberton et al. [18], the numerical procedure can be adapted to procedures proposed in the literature for ranking parameters according to their estimability (e.g., Weijers and Vanrolleghem [19]; Sandink et al. [20]; Brun et al. [24]; Yao et al. [25]; Li et al. [21]; Secchi et al. [14]; Chu and Hahn [23]; Sun and Hahn [26]; Lund and Foss [22]; Chu et al. [27]). A complex dynamic model of the metabolic pathways of the microorganism *Escherichia coli* K-12 W3110 [2] illustrates the performance of the proposed numerical procedure in applications of interest. This microorganism is very important in bioengineering and industrial microbiology and is widely employed in the production of recombinant proteins.

#### 2. Theoretical Backgrounds

A brief description of parameters estimation and identifiability procedures is given below.

##### 2.1. Parameters Estimation

Parameters estimation is achieved by minimizing an objective function, which measures the difference between the predicted model outputs and the experimental measurements. Parameters values can be obtained according to the maximum likelihood principle, as extensively described in the literature [28, 29]. Assuming that the model is perfect, experiments are well done, experimental errors follow a normal distribution, and independent variables are known with high accuracy, the parameters can be estimated, according to the maximum likelihood principle, by minimizing the following objective function [28, 29]:

$$S(\boldsymbol{\theta}) = \left(\mathbf{y}^{e} - \mathbf{y}^{m}\right)^{T} \mathbf{V}_{y}^{-1} \left(\mathbf{y}^{e} - \mathbf{y}^{m}\right),$$

in which $\mathbf{V}_{y}$ represents the experimental error covariance matrix, $\mathbf{y}^{e}$ is the vector of experimental data, and $\mathbf{y}^{m}$ is the vector of model predicted values. Generally, only the diagonal terms of $\mathbf{V}_{y}$ are considered, owing to the difficulty of characterizing the experimental errors; the objective function then becomes the weighted least squares function.
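As a small illustration of the weighted least squares form mentioned above, the following sketch evaluates the objective for a diagonal experimental error covariance matrix, that is, each squared residual divided by the variance of the corresponding measurement. The function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def weighted_least_squares(y_exp, y_model, variances):
    """Sum of squared residuals, each divided by its measurement variance.

    Corresponds to the maximum likelihood objective when the experimental
    error covariance matrix is diagonal (assumption of this sketch).
    """
    if not (len(y_exp) == len(y_model) == len(variances)):
        raise ValueError("all inputs must have the same length")
    return sum((ye - ym) ** 2 / v
               for ye, ym, v in zip(y_exp, y_model, variances))


# Hypothetical data: three measurements, model predictions, and variances.
y_exp = [1.0, 2.0, 4.0]
y_model = [1.5, 2.0, 3.0]
variances = [0.25, 1.0, 0.5]

s = weighted_least_squares(y_exp, y_model, variances)
print(s)  # 0.25/0.25 + 0/1.0 + 1.0/0.5 = 3.0
```

Measurements with a small variance contribute more to the objective, so the fit is pulled toward the data that are known more precisely.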

Once the parameters have been obtained, one can determine the uncertainties in the parameters and in the predictions. Usually, the parameters uncertainty is based on the parameters covariance matrix $\mathbf{V}_{\theta}$, which under some simplifying assumptions contains the geometric characteristics of the confidence region of the parameters. The terms along the diagonal of the parameters covariance matrix represent the variability of the parameters estimates, and the off-diagonal terms indicate the interactions among the parameters. In the parameters estimation procedure, first the Fisher information matrix (FIM) is computed and, subsequently, $\mathbf{V}_{\theta}$, as follows [28, 29]:

$$\mathbf{FIM} = \mathbf{B}^{T} \mathbf{V}_{y}^{-1} \mathbf{B}, \qquad \mathbf{V}_{\theta} = \mathbf{FIM}^{-1},$$

in which $\mathbf{B}$ represents the local sensitivity matrix, with elements $B_{ij} = \partial y^{m}_{i} / \partial \theta_{j}$, and $\mathbf{V}_{y}$ is the experimental error covariance matrix [28, 29].
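The two steps above, computing the FIM from the local sensitivities and then inverting it to obtain the parameters covariance matrix, can be sketched as follows. This is a minimal sketch under stated assumptions: the first-order decay model, the parameter values, the sampling times, and all function names are hypothetical; sensitivities are approximated by central finite differences; the experimental error covariance matrix is taken as diagonal; and the inversion is written only for two parameters.

```python
import math


def decay_model(theta, t):
    """Hypothetical two-parameter model: y = theta0 * exp(-theta1 * t)."""
    return theta[0] * math.exp(-theta[1] * t)


def sensitivity_matrix(model, theta, times, rel_step=1e-6):
    """Central finite-difference approximation of B[i][j] = dy(t_i)/dtheta_j."""
    B = [[0.0] * len(theta) for _ in times]
    for j, value in enumerate(theta):
        h = rel_step * max(abs(value), 1.0)
        up = list(theta)
        up[j] = value + h
        down = list(theta)
        down[j] = value - h
        for i, t in enumerate(times):
            B[i][j] = (model(up, t) - model(down, t)) / (2.0 * h)
    return B


def fisher_information(B, variances):
    """FIM = B^T Vy^{-1} B for a diagonal Vy given as a list of variances."""
    n = len(B[0])
    return [[sum(row[j] * row[k] / v for row, v in zip(B, variances))
             for k in range(n)] for j in range(n)]


def invert_2x2(M):
    """Closed-form inverse of a 2x2 matrix; raises if it is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("FIM is numerically singular: ill-posed estimation")
    return [[d / det, -b / det], [-c / det, a / det]]


theta = [2.0, 0.5]                # hypothetical parameter estimates
times = [0.0, 1.0, 2.0, 4.0]      # hypothetical sampling times
variances = [0.01] * len(times)   # assumed measurement variances

B = sensitivity_matrix(decay_model, theta, times)
fim = fisher_information(B, variances)
v_theta = invert_2x2(fim)
std_errors = [v_theta[j][j] ** 0.5 for j in range(2)]
```

The square roots of the diagonal of `v_theta` give approximate standard errors of the estimates, and a large off-diagonal term relative to the diagonal signals correlated parameters. When the FIM is singular the inversion fails, which is the situation that motivates the identifiability procedures discussed in the text.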

##### 2.2. Parameters Identifiability

Estimation of all parameters values may not be possible when the quantity and/or quality of the available experimental data is unsatisfactory, or when the model structure is poor and/or the experiments were inadequately designed, which leads to parameters that are nonsignificant or highly correlated in their influence on the model predictions.

A common approach to overcome this problem is the use of parameters identifiability, also known as parameters estimability [7, 8]. Based on the model structure and the available experimental data, parameters identifiability procedures partition the original set of parameters into two subsets: (i) the parameters that can be estimated, called identifiable parameters, and (ii) the parameters that cannot be estimated, called nonidentifiable parameters. In most procedures, the identifiable parameters are ranked from most estimable to least estimable and are then estimated, while the nonidentifiable parameters are kept at their initial estimates. Thus, the effect of the identifiability procedure is assessed by comparing the model fit before and after the reestimation of the selected parameters.
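The ranking and partition just described can be sketched with one simple criterion. This is an illustrative assumption, not necessarily the criterion used in the cited procedures: each sensitivity is scaled by the uncertainty of the initial parameter estimate and by the measurement uncertainty, and parameters are ranked by the norm of their scaled sensitivity column. The function names, the number of parameters kept, and all numerical values are hypothetical.

```python
def rank_by_scaled_sensitivity(B, theta_uncertainty, y_uncertainty):
    """Rank parameters by the norm of their scaled sensitivity column.

    Each entry B[i][j] is scaled by theta_uncertainty[j] / y_uncertainty[i].
    Illustrative criterion only (assumption of this sketch).
    """
    n_par = len(B[0])
    scores = []
    for j in range(n_par):
        squared = sum((row[j] * theta_uncertainty[j] / s_y) ** 2
                      for row, s_y in zip(B, y_uncertainty))
        scores.append(squared ** 0.5)
    order = sorted(range(n_par), key=lambda j: scores[j], reverse=True)
    return order, scores


def partition(order, n_identifiable):
    """Split a ranking into identifiable and nonidentifiable parameter indices."""
    return order[:n_identifiable], order[n_identifiable:]


# Hypothetical sensitivities: two measurements, three parameters.
B = [[1.0, 0.1, 0.0],
     [2.0, 0.2, 0.5]]
theta_uncertainty = [1.0, 20.0, 1.0]   # parameter 1 has a very uncertain initial guess
y_uncertainty = [1.0, 1.0]

order, scores = rank_by_scaled_sensitivity(B, theta_uncertainty, y_uncertainty)
identifiable, nonidentifiable = partition(order, n_identifiable=2)
```

In this toy case parameter 1 has the smallest raw sensitivities among the first two parameters, yet it is ranked first because its initial estimate is very uncertain. The nonidentifiable parameter (index 2) would be kept at its initial estimate while the other two are reestimated.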

A classical scheme employed by parameters identifiability procedures is shown in Figure 1. Note that in the classical scheme the parameters estimation is carried out after the selection procedure; thus, the quality of the initial estimates of parameters values is fundamental for a suitable selection [18].