Research Article  Open Access
Simultaneous Parameters Identifiability and Estimation of an E. coli Metabolic Network Model
Abstract
This work proposes a procedure for simultaneous parameters identifiability and estimation in metabolic networks in order to overcome difficulties associated with lack of experimental data and large number of parameters, a common scenario in the modeling of such systems. As case study, the complex real problem of parameters identifiability of the Escherichia coli K12 W3110 dynamic model was investigated, composed by 18 differential ordinary equations and 35 kinetic rates, containing 125 parameters. With the procedure, model fit was improved for most of the measured metabolites, achieving 58 parameters estimated, including 5 unknown initial conditions. The results indicate that simultaneous parameters identifiability and estimation approach in metabolic networks is appealing, since model fit to the most of measured metabolites was possible even when important measures of intracellular metabolites and good initial estimates of parameters are not available.
1. Introduction
The development of mathematical model for metabolic networks has been severely hampered by the lack of kinetic information [1–4]. Usually, available experimental data are obtained under different conditions using heterogeneous techniques, whose choice must be done according to the observation of a specific phenomenon of interest on the pathways [2, 4–6]. In such systems, the type of experiment, sampling method, and the mathematical interpretation of the data depend on the desired experimental information [5]. However, as pointed out by Costa et al. [4], kinetic information presented in the literature about metabolic network models is scarce and often confuse; thus, other strategies are adopted in detriment to the dynamic simulation of such systems.
Mathematically, metabolic networks are described by complex dynamics models, whose structure is composed by ordinary differential equations that represent mass balance of the substrate, biomass, products and intracellular metabolites crucial on the pathways, and numerous reaction rates regarding to the pathways. Such mathematical structure presents a large number of parameters, for which the estimation procedure demands a considerable number of experimental data. Since experimentally metabolic networks are only partially observed, only a fraction of the intracellular metabolites considered in the mathematical model can be directly measured and thus initial conditions should also be estimated. Unfortunately, in metabolic network systems, lack of experimental data is almost unavoidable, which compromises the reliability of reactions rates proposition and makes the estimation of all parameters unfeasible. Thus, such problems often require the use of parameters identifiability procedures.
Parameters identifiability procedures deal with illposed parameters estimation problems, selecting a subset of parameters that can be estimated when the estimation of all parameters is not possible [7]. In most procedures, parameters are ranked from most estimable to least estimable based on the structure of the model, the experimental measurements and their uncertainties, and the uncertainties of the initial estimates [8].
Several studies reported in the literature addressing the parameters identifiability in metabolic networks are exclusively concentrated in the model structure, called structural identifiability (e.g., Davidescu and Jørgensen [9]; Roper et al. [10]; Nikerel et al. [11]), and do not take into account the available experimental data. On other side, the practical identifiability (e.g., Srinath and Gunawan [12]) investigates if the available experimental data are appropriate and sufficient for achieving a reliably estimation of the model parameters.
Although the structural identifiability is a necessary condition, the practical identifiability must overcome additional difficulties, including the selection of parameters with low sensitivity on the model predictions and correlation among parameters [13]. Unfortunately, such analyses in complex models depend on the values of parameters [13], generally unavailable [8].
Since sensitivity analysis is a key tool of identifiability procedures, procedures that evaluate the identifiability only based on initial parameters values can lead to a subset of selected parameters whose estimation may lead to an illposed problem [7]. A strategy to soften this problem is to perform simultaneous parameters selection and estimation, which ensures the estimation of the selected parameters (e.g., Secchi et al. [14], Wu et al. [15]; Wu et al. [16]; McLean et al. [17]; Alberton et al. [18]).
Several procedures adopt as stop criterion the singularity of FIM (Fisher Information Matrix) (e.g., Weijers and Vanrolleghem [19]; Sandink et al. [20]; Li et al. [21]; Secchi et al. [14]; Lund and Foss [22]; Thompson et al. [8]; Alberton et al. [18]). The singularity of the FIM matrix, calculated only with the selected parameters, indicates the point in which estimation problem becomes illposed. When such point is reached, the parameters selection is stopped and the remaining parameters are admitted as nonidentifiable parameters. Particularly when identifiability performs simultaneous parameters selection and estimation (e.g., Secchi et al. [14]; Wu et al. [15]; Wu et al. [16]; McLean et al. [17]; Alberton et al. [18]), it is not desirable to keep the remaining parameters as nonidentifiable without evaluation of their estimation potential, because the estimation problem is modified at each selected parameter.
A great challenge to be overcame in identifiability procedures, even those which include simultaneous estimation, is that the nonselected parameters are evaluated based on their initial estimates, which are probably inadequate. The literature addresses Monte Carlo techniques [23] and simultaneous parameters reestimation for assuring well posed estimation of the selected parameters (e.g., Secchi et al. [14], Wu et al. [15]; Wu et al. [16]; McLean et al. [17]; Alberton et al. [18]). As more proper, the evaluation of subsequent parameters to be selected should be done based on the reestimated selected parameters values [17, 18], reducing the dependence on the initial parameters estimates. In an interesting work, McLean et al. [17] developed an algorithm which allows evaluating the identifiability of all parameters of the model; such procedure reestimate selected parameters and use these reestimated values in the selection of subsequent parameters to be evaluated, with an intensive computational efforts. Also, in the work of Alberton et al. [18], the reestimated values are used in the selection of subsequent parameters to be evaluated, but in such procedure the numerical efforts are significantly reduced using a binary search based algorithm.
Another challenge is that, even when good initial estimates of parameters values are available, in complex models the verification for identifiability problems (e.g., nonsignificant parameters or parameters correlation derived from experimental design) is a conceptual and numerical arduous task.
In such scenario, an important question to be answered is how to reduce the dependence of identifiability procedure with the initial estimates of parameters values and the selection criteria adopted? In this context, this work presents a numerical procedure for treating estimation problems present in metabolic networks based on intensive parameters evaluation that includes simultaneous parameters selection and estimation. As the main characteristic, the numerical procedure is able to investigate the identifiability of all parameters of the mathematical model, even in illposed estimation problem. As in Alberton et al. [18], the numerical procedure could be adapted to procedures proposed in literature for ranking parameters according to their estimability (e.g., Weijers and Vanrolleghem [19]; Sandink et al. [20]; Brun et al. [24]; Yao et al. [25]; Li et al. [21]; Secchi et al. [14]; Chu and Hahn [23]; Sun and Hahn [26]; Lund and Foss [22]; Chu et al. [27]). A complex dynamic model of the microorganism Escherichia coli K12 W3110 metabolic pathways [2] illustrates the performance of the proposed numerical procedure in applications of interest. Such microorganisms are very important in bioengineering and industrial microbiology, being widely employed in processes of recombinant proteins production.
2. Theoretical Backgrounds
A brief description of parameters estimation and identifiability procedures is given below.
2.1. Parameters Estimation
Parameters estimation is achieved by minimizing an objective function, which is a measure between the difference of the predicted model outputs and experimental measurements. Parameters values can be obtained according to maximum likelihood principle, as extensively described in literature [28, 29]. Assuming that the model is perfect, experiments are well done, experimental errors follow normal distribution, and independent variables are known with high accuracy, then the parameters can be estimated, according to the maximum likelihood principle, by minimizing the following objective function [28, 29]: in which represents the experimental error covariance matrix, is the vector of experimental data, and is the vector of model predicted values. Generally, only the terms of the diagonal of the matrix are considered, due to the difficulties to characterize experimental errors; thus, the objective function becomes the weighted least square function.
Once the parameters have been obtained, one can determine the uncertainties in the parameters and prediction. Usually the parameters uncertainty is based on the parameters covariance matrix , which under some simplifying assumptions contains geometrics characteristics of the confidence region of the parameters. The terms along the diagonal of the parameters covariance matrix represent the variability of the parameters estimates, and offdiagonal terms indicate the interactions among the parameters. In the parameters estimation procedure, first the Fisher information matrix (FIM) is computed and, subsequently, as follows [28, 29]: in which represents the local sensitivity matrix [28, 29].
2.2. Parameters Identifiability
Estimation of all parameters values may not be possible when unsatisfactory quantity and/or quality of experimental data are available or when bad model structure and/or inadequate design of experiments were built, leading to nonsignificant or highcorrelated parameters with influence on model prediction.
A common approach to overcome this problem is the use of parameters identifiability, also known as parameters estimability [7, 8]. Based on structural model and available experimental data, parameters identifiability procedures partition the original set of parameters into two subsets: (i) the parameters that can be estimated, called identifiable parameters, and (ii) the parameters that cannot be estimated, called nonidentifiable parameters. In most procedures, the identifiable parameters are ranked from most estimable to least estimable and such parameters are estimated, while the nonidentifiable parameters are kept at their initial estimates. Thus, the comparison with the model fit before and after applying the identifiability procedure is verified by the improvement achieved with the selected parameters reestimation.
A classical scheme employed by parameters identifiability procedures is showed in Figure 1. Note that in the classical scheme, the parameters estimation is carried out after the procedure; thus the quality of the initial estimates of parameters values is fundamental for a suitable selection [18].
3. Numerical Procedure: Intensive Parameters Evaluation
An important aspect of the classical identifiability procedures is to keep the nonidentifiable parameters in their initial estimates while the identifiable parameters are estimated. Since estimation of all parameters is not possible, the estimation of the most identifiable parameters can both regularize an illposed estimation problem and simplify the associated optimization problem [7]. Nonetheless, when parameters selection in the classical procedure reached the stop criteria, the remaining parameters are admitted as nonidentifiable. However, it should be important to verify if the inclusion of any other parameter is possible, in order to give the chance of all parameters to be tested. It is especially important because the parameters reestimation of the selected subset may change all parameters values. Thus, changing parameters values may allow the inclusion of other parameters, even those that has already been tested, without success. Thus, identifiability procedures with simultaneous parameters reestimation will require a high computational cost. For example, in the procedure proposed by McLean et al. [17] evaluations are necessary.
The proposed numerical procedure performs intensive parameters evaluation regarding the identifiability and performs the simultaneous estimation approach, based on the onebyone selection, as presented in Figure 2. The intensive parameters evaluation presents the following basics steps: (i) rank all parameters of the model, (ii) select the most identifiable parameter, and (iii) reestimate the set of selected parameters. If step (iii) is not successfully performed, an additional step is introduced: (iv) remove the last evaluated parameter and repeat steps (ii) and (iii) taking the next most identifiable parameter, until step (iii) is successfully performed; such evaluation is stopped only when all parameters have been evaluated.
From of description in Figure 2, essentially three sets are created: selected parameters , nonselected parameters , and evaluated parameters . Initially, all the parameters are included in the set of nonselected parameters . Thus, this set of parameters is ranked according to their apparent identifiability; in this paper, according to Yao et al. [25], the apparent most estimable parameter is included in the set of selected parameters and removed from the set of nonselected parameters. A reestimation step for the selected parameters is then performed. If the estimation is successfully performed, then the nonselected parameters are ranked again and the procedure is repeated. Otherwise, if the estimation failed, or some numerical problem occurs, the last included parameter must be removed from the set of selected parameters and transferred to the set of evaluated parameters . The set of evaluated parameters contains the parameters that were not successfully performed in the set of selected parameters. The procedure seeks to include the next apparent most estimable parameter from . If some parameter is successfully included, then after the reestimation step, with changed values of the selected parameters, the parameters may now become estimable; therefore, in case of succeed reestimation, all parameters in the set are transferred to the set to be reevaluated in next iterations. The procedure stops when is empty; that is, there is no more parameter to be included in the set of selected parameters. All the parameters have also been included in the set of or . Seeking to ensure evaluation of all parameters of the model, the proposed procedure is based on the onebyone parameters selection that has a computational effort in the worst case given as .
The proposed numerical procedure for parameters identifiability is implemented in computational code Fortran 95. In this numerical procedure, two powerful packages of literature are employed: Dassl [30], used for solving algebraicdifferential equations, and Estima and Planeja [31], used for estimate the parameters of the model; this last one adapted by Alberton et al. [18] to deal complex models with scarce experimental data.
In order to clarify the proposed procedure, Figure 3 illustrates a case with three parameters. The parameters are ranked according to the adopted identifiability criteria (e.g., Weijers and Vanrolleghem [19]; Sandink et al. [20]; Brun et al. [24]; Yao et al. [25]; Li et al. [21]; Secchi et al. [14]; Chu and Hahn [23]; Sun and Hahn [26]; Lund and Foss [22]; Chu et al. [27]). According to the proposed procedure, the first of the most relevant parameter of the rank is included in the set of selected parameters, represented by the filled squares in Figure 3. When adding such parameter to the set of selected parameters two cases can arise: (i) the set of selected parameters can be estimated, the selected parameter was included with success; thus, the nonselected parameters are reranked and the most relevant parameter of the rank should be included in the set of selected parameters or (ii) the set of selected parameters cannot be estimated simultaneously, the last selected parameter is removed from the set of selected parameters and added to the set of evaluated parameters, represented by plaid squares in Figure 3; thus the next parameter in the identifiability rank should be selected. Since the total number of parameters is equal to 3, in the worst case the number of required evaluations is 6.
Although the proposed procedure may still be affected by the initial estimates, this dependency is strongly reduced due to the steps of parameters reestimation [18, 23]. According to Alberton et al. [18], a possible and natural alternative would be the use of nondeterministic optimization methods before or even during the identifiability procedures, such as Particle Swarm Optimization (PSO) [32] or Genetic Algorithm (GA) [33], to improve the initial parameters estimates. However, the use of nondeterministic methods may not be advantageous for complex metabolic network models, because the range of parameters values must be carefully chosen, otherwise, numerical problems associated with parameters values very often do not allow the numerical simulation of the model. Thus, for many random parameters values generated by PSO or GA, parameters estimation could not be successfully evaluated. For example, in this work, the PSO as a previous step of parameters identifiability of the E. coli K12 W3110 metabolic network model was investigated, but the numerical problems associated with the model simulation did not lead to any further significant improvement regarding the initial estimates. Besides, such methods require too many function evaluations of the metabolic network, since the number of parameters to be estimated is high (131 parameters in the case study).
The Yao et al. [25] methodology was adopted as criteria for ranking the parameters according to their identifiability potential; however, it is important to emphasize that other ranking methodologies can be adopted. The Yao et al. [25] identifiability is based on the sensitivity matrix and two criteria for parameters selection: (i) parameters influence on model prediction, length of sensitivity vector by Euclidian norm, and (ii) parameters correlations, linear dependence among sensitivity vectors by GramSchmidt orthogonalization method. Each column of matrix can be understood as a sensitivity vector regarding each parameter. Generally, normalized sensitivity matrix as shown in (3) is employed to avoid the influence of different magnitudes of variables and parameters on the sensitivity analysis.
According to Yao et al. [25], the most identifiable parameter is the one with the highest Euclidian norm of the sensitivity vector, that is, . Thus, this procedure proposes to include, in the set of selected parameters, one parameter each time, according to the decreasing rank of the sensitivity vectors norms. Moreover, since the vectors can be linearly dependent, for every parameter included in the set of selected parameters, an orthogonalization procedure (GramSchmidt method) was performed over the matrix , in order to discount the influence of linear dependence with the selected parameters
As a previous analysis, the collinearity angle was evaluated among sensitivity vectors and , for all parameters of the model, that is, , as presented in the following equation [33]:
As wellknown from linear algebra, collinearity angles values near 90° indicate linear independence between sensitivity vectors while values near 0° or 180° indicate the opposite. It is important to emphasize that is not a rigorous analysis; correlation among parameters is only verified in parameters pars. However, this analysis can be used in a simple way to verify if correlation among pairs of parameters is expected to occur.
4. Application: Metabolic Networks in Large Scale Modeling
Metabolic networks models are employed for describing enzymatic activity of microorganisms. In such processes, series of sequential and parallel reactions take place, producing metabolites. Palsson [3] and Steuer and Junker [1] present detailed descriptions about the development of metabolic networks models.
Despite the great variety of computational tools available for assisting the development of such models, significant challenges are encountered in modeling living organisms and their mechanisms [34]. The development of metabolic networks models includes several steps which must interconnect with each other, related to biological knowledge, experimental data acquirement, mathematical modeling, parameters estimation, and model evaluation. A simplified scheme for construction of metabolic networks models is presented in Box 1.

Regarding step , Steuer and Junker [1] present several databases that can be consulted for selecting all the reactions that will be considered. Moreover, Copeland et al. [34] present computational tools employed for this step.
In step , there are difficulties in collecting experimental data from the literature. Besides, the experiments are time and money consuming. In fact, in many metabolic reactions, especially in the catabolic reactions and the reactions for cells energy production, turnover rates are in the range of 1.5–2.0 s^{−1}. Such fast reactions make experiments with manual operation unreliable to study the dynamics of intracellular metabolite concentrations [5, 6]. This characteristic of the microsystem seriously impairs the availability of experimental data, since the choice of sampling times is crucial to the quality of information that can be obtained from experimental data.
In step , it is necessary to propose the set of possible reactions that may occur. The application of Topological Analysis (TA) and Flux Balance Analysis (FBA), together with experimental data [35], will lead to a stoichiometric matrix for the most relevant reactions that seems to occur. Although computational tools are available for this step [34], there are some arbitrary choices, which leads to a nonunique results.
Regarding steps and , it is important to emphasize the possible use of databases, containing both models and initial parameters values. Databases sources are presented by Steuer and Junker [1] and Copeland et al. [34]. Nevertheless, such steps are the most difficult parts of the work, since, when not found, initial estimates of parameters may become completely arbitrary. Gerdtzen et al. [36] address the complexity at the pathway level and alert for the use of default models for biochemical processes such as the Monod/MichaelisMenten rates with their generalization toward several substrates, reversibility, and different mechanisms of inhibition. Three major problems affect the parameters estimation of metabolic networks: (i) when experimental data do not present measures for all metabolites of interest, (ii) when measurements of some metabolites are not synchronized with other metabolic measures over the sampling time, and (iii) uncertain initial estimates of the parameters values; in many cases, good initial estimates of parameters values (or even possible ranges) are not known; thus, such values are usually arbitrarily adopted [5].
Step indicates the use of parameters identifiability techniques. Since generally the experimental evidence is insufficient, the estimation of all parameters seldom will be possible [37]. Nevertheless, if parameters are set initially in reasonable values, one can identify the most influent set of parameters. It is important to emphasize that techniques allowing parameters reestimation should be preferable, as discussed in this work.
Steps and can be performed together. Since estimation of all parameters will generally not be possible, one interesting stop criterion is the experimental demand for obtaining additional information. Using just simulated results, one can verify if, for the optimal experimental design conditions, the expected information to be obtained will justify performing additional experiments. If it seems reasonable to perform more experiments, one should return to step , performing experiments at the condition indicated by the optimal experimental design techniques [38, 39].
From the uncertainties sources described above, discrepancies are expected to occur between model predictions and experimental results. Even so, the use of models can help to choose experimental regions for performing experimental tests where one can expect to achieve more information, thus, saving experimental efforts which are time and money consuming. As shown in this paper, it can be well performed for large scale metabolic networks.
4.1. Case Study: Mathematical Model of Metabolic Networks in Large Scale—Escherichia coli K12 W3110
Possibly, Escherichia coli is the most studied microorganism in the literature. Even so, its metabolic networks are not completely observable. For different experimental conditions, there are several infrequent or absent measures of intracellular metabolites. Thus, some studies have focused on the development of kinetic model for metabolic network for this microorganism (e.g., Chassagnole et al. [2]; di Maggio et al. [40]; Degenring [41]; Usuda et al. [42]).
Particularly, E. coli K12 W3110 to be wild type (wt) corresponds to standard strain that can be genetically manipulated allowing a large range of applications, such as: (i) pharmaceutical, recombinants proteins, vaccines, and serums, (ii) genetic medicine, (iii) environmental, biomarkers and pollutionfighting, and (v) energetic, biodiesel, oils, and others. E. coli is able to metabolize a large variety of components (e.g., carbohydrates, proteins, amino acids, lipids, and organics acids), produce catalase, and also utilize a variety of sources (e.g., glucose, ammonia, and nitrogen). Besides, E. coli grows in large concentrations allowing fermentative processes of high yield [43].
Regarding parameters identifiability in E. coli, di Maggio et al. [40] used global sensitivity analysis proposed by Sobol’ [44] to determine the most influential parameters on the dynamics of the central carbon metabolism of this bacterium, based on the model proposed by Chassagnole et al. [2]. di Maggio et al. [40] concluded that twelve kinetic parameters were the most influential for the model predictions. These parameters represent maximum reaction rates, inhibition, and halfsaturation constants. Their identification and later estimation provide the starting points for the manipulation of certain enzyme properties [40].
Large scale metabolic network of the glycolysis, the pentosephosphatepathway, and the phosphotransferase system of E. coli K12 W3110 (Figure 4) [40] consists of a complex mathematical structure with 18 dynamic mass balance equations, (5) to (22) presented below, 7 additional algebraic equations, (A.1) to (A.7), and 30 kinetic rate expressions, (A.8) to (A.37) [2]. Some enzyme kinetics modifications were introduced by di Maggio et al. [40] as the kinetic expression for the activity of phosphofructokinase was taken from Ricci [45] (see (A.12)) and the activity of glucose6phosphate dehydrogenase was modeled by the expression proposed by Ratushny et al. [46] (see (A.13)).
Mass Balances to the Substrate (See (5)) and Intracellular Metabolites (See (6)–(22)) glucose: glc glucose6phosfato: g6p fructose6phosphate: f6p fructose1,6biphosphate: fdp glyceraldehyde3phosphate: gap dihydroxyacetonephosphate: dhap 1,3diphosphoglycerate: pgp 3phosphoglycerate: 3pg 2phosphoglycerate: 2pg phosphoenolpyruvate: pep pyruvate: pyr 6phosphogluconate: 6pg ribulose5phosphate: ribu5p xylulose5phosphate: xyl5p sedoheptulose7phosphate: sed7p ribose5phosphate: rib5p erythrose4phosphate: e4p glucose1phosphate: g1p
As usual, the values of initial estimates used here were found in the literature [40]. Experimental data employed in this work were obtained in Hoque et al. [43] throughout a time horizon of 300 seconds. Temporal profiles for glucose (glc), dihydroxyacetonephosphate (dhap), erythrose4phosphate (e4p), pyruvate (pyr), fructose1,6diphosphate (fdp), ribose5phosphate (rib5p), ribulose5phosphate (ribu5p), 2phosphoglycerate (2pg), phosphoenolpyruvate (pep), glyceraldehyde3phosphate (gap), glucose6phosphate (g6p), fructose6phosphate (f6p), and 6phosphogluconate (6pg) were used for parameters identifiability.
As a previous analysis, the collinearity angle among sensitivity vectors for all parameters of the model were calculated, and the frequency of each angle interval is shown in Figure 5. The results indicate high frequency of parameters pars linearly independents (collinearity angles values approximately 90°), in this way Table 1 presents few parameters pars with critical collinearity angles values that do not expect to be selected until a first successfully performed selection.

Fifty eight parameters have been identified, together with five initial conditions of metabolites concentrations which were not known from experimental data (, , , , and ). In metabolic networks, the initial conditions of intracellular metabolites are very important. It is not recommended to keep initial conditions of unknown metabolites at zero, because numerical problems in model integration can occur or predicted values can become unreliable. In such situations, the initial concentrations of unknown metabolites must be estimated together with the parameters. In order to obtain a first initial estimates of nonmeasured intracellular metabolites, the following procedures were employed in this paper: (i) initially, keep the initial concentrations of nonmeasured intracellular metabolites at zero and simulate the model until next integration point (first numerical integration step must be small), and (ii) take the calculated values for the nonmeasured intracellular metabolites concentrations as initial conditions; the information obtained in (ii) is given as initial estimates of concentrations of nonmeasured intracellular metabolites in the identifiability procedure.
Numerical results for the proposed procedure are presented in Figure 6 and Table 2, comparing the model simulation with initial parameters estimates from the literature [40] and reestimated parameters according to the intensive parameters evaluation.

(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
(j)
Figure 6 demonstrates the improvement of the model prediction by using the proposed procedure of parameters identifiability. Unfortunately, model limitations and few experimental data impose some insurmountable barriers to identifiability procedures. Thus, some metabolites did not change after applying the identifiability procedure. In this way, the 10 metabolites profiles that were significantly improved after the identifiability procedure are presented. Since the model fits reasonably the experimental data, it can be concluded that the parameters selection was performed with relatively good success. A special attention should be given to Figures 6(b), 6(d), and 6(f) for which the mathematical model behavior using initial estimates of parameters values is not able to follow the tendency of the experimental data. For these cases, the initial estimates of parameters values showed to be inadequate.
Table 2 presents the estimated parameters together with the initial estimates and their normalized standard deviations, calculated as the ratio between the standard deviation of the parameter and its estimated value . As important information of the identifiable parameters, the low normalized relative deviations obtained after the numerical procedure indicates a good quality of the estimated values. Analyzing the parameters estimated values, it is found that initial estimates of the parameters were significantly improved. It is important to emphasize that the estimates of the selected parameters can be significantly influenced by the initial estimates of the nonselected parameters. Thus, more adequate parameters values can be obtained when a large experimental dataset is available, allowing estimating all parameters of the mathematical model. Using the proposed numerical procedure, the objective function was also improved in two orders of magnitude relative to the initial estimates. The results are also dependent on the assumed behavior of experimental uncertainties. It is usual to assume the standard deviation of the dependent variable () to be proportional to its value () in the form where and are generally arbitrarily chosen. For example, for and , the initial value of the objective function was and reduced to after reestimation. For and , the initial value of the objective function was and reduced to after parameters reestimation. The set of selected parameters was not the same in both cases; nevertheless, the prediction was similar. The dependence of parameters estimation performance on the assumed experimental uncertainties is expected even for well posed problems. It is important information, but generally it is neglected and arbitrarily chosen due to the difficulties involved in characterizing the experimental errors.
Even with so much uncertainties, associated to experimental data, modeling development and some unknown initial conditions, it has been shown that it is possible to fit and/or improve model predictions with parameters identifiability procedures. The model can now be used for predicting the experimental behavior of the system. Besides its uncertainties, it can help to give us an idea about what we should expect in experimental regions, delimiting experimental design for further investigations, among other purposes. Naturally, for more accurate results, some predictions should be confirmed by additional experiments in the experimental regions of greater interest.
Comparing the performance of the proposed numerical procedure with similar procedure but stopping the selection when the last selected parameter was not successfully included, only 63 parameters have been selected, and the objective function was around 5 times greater compared with the results obtained with the algorithm presented in Figure 2. It emphasizes the necessity to investigate all parameters in order to improve final results and not to stop the procedure when the first parameter have failed to be included.
In the procedure proposed in previous work [18], subsets of parameters were included at once, and if the estimation of such subset was possible, this subset could not leave the set of selected parameters. Thus, if a set of parameters has been successfully estimated, the orthogonalization step (for discount parameters influence one each other) was not performed between such parameters. In the procedure presented in this work, one parameter is selected each time, and orthogonalization step between the selected parameters and yet unselected parameters is performed much more times than in previous work [18]. Although not assured by the model nonlinearities, one can expect that the present procedure should lead to better results, despite being more time consuming, since it performs a more thorough investigation regarding the parameters correlation. In fact, the objective function obtained in this work divided by the objective function of previous work [18] was 0.15, indicating a good improvement in prediction performance.
It is important to emphasize that, for this application, the improvement in the prediction quality was observed with estimation of only half of the model parameters, being the numerical procedure performed using unknown values for some initial conditions of the dependent variables. Thus, in similar scenarios, a great merit of the parameters identifiability procedure is to select only the most influential parameters on the mathematical model that can be estimated with the available experimental data.
5. Conclusion
In this work, a large scale illposed problem of parameters estimation in metabolic networks was investigated, where experimental data are scarce and concentrations of intracellular metabolites are not completely known. A proposed procedure of intensive parameters evaluation that presents simultaneous parameters selection and estimation was successfully applied to solve this problem. Compared in pairs, few parameters seem to be correlated, as revealed by collinearity angles. From the initial scenario, containing 131 parameters and 5 unknown initial conditions of intracellular metabolites concentrations, the procedure was able to identify 58 parameters, together with the 5 initial conditions of intracellular metabolites concentrations. The simultaneous reestimation step reduced the dependence on the initial parameters estimates, allowing a good fit of the model. The robustness of the applied procedure is certainly an appealing feature for metabolic networks problems.