Abstract

A metabolic system consists of a number of reactions transforming molecules of one kind into another to provide the energy that living cells need. Based on the biochemical reaction principles, dynamic metabolic systems can be modeled by a group of coupled differential equations which consists of parameters, states (concentration of molecules involved), and reaction rates. Reaction rates are typically either polynomials or rational functions in states and constant parameters. As a result, dynamic metabolic systems are a group of differential equations nonlinear and coupled in both parameters and states. Therefore, it is challenging to estimate parameters in complex dynamic metabolic systems. In this paper, we propose a method to analyze the complexity of dynamic metabolic systems for parameter estimation. As a result, the estimation of parameters in dynamic metabolic systems is reduced to the estimation of parameters in a group of decoupled rational functions plus polynomials (which we call improper rational functions) or in polynomials. Furthermore, by taking its special structure of improper rational functions, we develop an efficient algorithm to estimate parameters in improper rational functions. The proposed method is applied to the estimation of parameters in a dynamic metabolic system. The simulation results show the superior performance of the proposed method.

1. Introduction

Living cells require energy and material for maintaining their essential biological processes through metabolism, which is a highly organized process. Metabolic systems are defined by the enzymes dynamically converting molecules of one type into molecules of another type in a reversible or irreversible manner. Modeling and parameter estimation in dynamic metabolic systems provide new approaches towards the analysis of experimental data and properties of the systems, ultimately leading to a great understanding of the language of living cells and organisms. Moreover, these approaches can also provide systematic strategies for key issues in medicine, pharmaceutical, and biotechnological industries [1]. The formulation and identification of metabolic systems generally includes the building of the mathematical model of biological process and the estimating of system parameters. Because the components of a pathway interact not only with each other in the same pathway but also with those in different pathways, most (if not all) of mathematical models of metabolic systems are highly complex and nonlinear. The widely used approaches for modeling inter- and intracellular dynamic processes are based on mass action law [14]. By mass action law, the reaction rates are generally polynomials in concentrations of metabolites with reaction constants or rational functions which are a fraction and whose denominator and numerators are polynomials in concentrations of metabolites with reaction constants [14]. As a result, the mathematical model is nonlinear not only in the states but also in the parameters. Estimation of these parameters is crucial to construct a whole metabolic system [57].

In general, all algorithms for nonlinear parameter estimation can be used to estimate parameters in metabolic systems, for example, Gauss-Newton iteration method, and its variants such as Box-Kanemasu interpolation method, Levenberg damped least squares methods and Marquardt’s method [8, 9]. However, these iteration methods are initial-sensitive. Another main shortcoming is that these methods may converge to the local minimum of the least squares cost function and thus cannot find the real values of parameters. Furthermore, because of their highly complexity and nonlinearity, Gauss-Newton iteration method and its variants cannot efficiently and accurately estimate the parameters in metabolic systems [57, 10, 11].

In this paper, we propose a systematic method for estimating parameters in dynamic metabolic systems. Typically mathematical model of dynamic metabolic systems consists of a group of nonlinear differential equations, some of which contains several rational functions in which parameters are nonlinear. In Section 2, we propose a method for model complexity analysis via the stoichiometric matrix. As a result, we obtain a group of equations, each of which contains only one-rational function plus polynomial functions which we called an improper rational function. Then, based on the observation that in the improper rational functions both the denominator and numerator are linear in parameters while polynomials are also linear in parameters, we develop an iterative linear least squares method for estimating parameters in dynamic metabolic systems in Section 3. The basic idea is to transfer optimizing a nonlinear least squares objective function into iteratively solving a sequence of linear least squares problems. In Section 4, we apply our developed method to estimate parameters in a metabolism system. Finally we give conclusions and some directions of future work along with this study in Section 5.

2. Model Complexity Analysis for Parameter Estimation

A dynamic metabolic system consists of substances (molecules), and reactions can be described by a system of differential equations as follows: where represents the concentrations of molecule , represents the reaction rate , and represents the stoichiometric coefficient of molecule in reaction . The mass action law in biochemical kinetics [24, 12] states that the reaction rate is proportional to the probability of a collision of the reactants. This probability is in turn proportional to the concentration of reactants. Therefore, reaction rate is a function of the concentrations of molecules involved in reaction and proportion constants.

The stoichiometric coefficient assigned to molecule and reaction can be put into a so-called stoichiometric matrix . Let and , and let represent the vector consisting of all independent proportion constants, and then (1) can be rewritten in the following vector-matrix format:

In principle, the stoichiometric coefficient in matrix is a constant integer and can be decided according to how molecule is involved in reaction . According to mass action law, the expression of reaction rates can be determined to be polynomials or rational functions with reaction constants [24, 12]. The challenge to build up the mathematic model of dynamic metabolic system (2) is to estimate the parameter vector , especially when some reaction rates are in the form of rational functions in which parameters are nonlinear.

If each differential equation in (2) contains one-rational function without or with polynomial functions, the parameters in model (2) can be estimated by algorithms in [13, 14] or a new algorithm proposed in the next section of this paper. Unfortunately, each differential equation contains a linear combination of several rational functions, which makes the parameter estimation in those coupled differential equations more difficult. The stoichiometric matrix contains very important information about the structure of the metabolic systems and is widely used to analyze the steady state and flux balance of metabolic systems [24]. In this paper, via the stoichiometric matrix, we propose a systematic method to transfer a system of differential equations (2) into another system of differential equations, in which each differential equation contains at most one-rational function.

Running Example. To illustrate the proposed method, we use the upper part of glycolysis system as a running example, showing how the method is applied to this system step after step. The schematic representation of this system is shown in Figure 1. The model for this metabolic system is described by the system of differential equations (2) as follows:

Based on the mass action law, the individual reaction rates can be expressed as

Model (3) has six ordinary differential equations (ODEs) and 15 parameters contained in eight reaction rates, three out of which are rational functions. Some ODEs contain more than one rational reaction rates, which makes the parameter more difficult.

Comparing (3) to (2) we have the state vector: = [Gluc6P; Fruc6P; Fruc1,6P2; ATP, ADP, AMP] and stoichiometric matrix:

In the following, we describe our proposed method to analyze the complexity of model (2) through the running example.

Step 1. Collect the columns in the stoichiometric matrix corresponding to the rational reaction rates in model (2) to construct a submatrix and collect other columns (corresponding to polynomial reaction rates) to construct a submatrix . Therefore, we have
where is the subvector of and consists of all rational reaction rates while is another subvector of and consists of all polynomial reaction rates. In this step, we should make sure that the rank of matrix equals the number of rational reaction rates. If the rank of matrix does not equal the number of rational reaction rates, it means that some rational reaction rates are not independent. Then we combine dependent rational reaction rates together to create a new reaction rate such that all resulted rational reaction rates should be linearly independent [14]. As a result, the rank of matrix will equal the number of rational reaction rates.
For the running example, we have
and and . The rank of matrix equals 3, which is the number of rational reaction rates.

Step 2. Calculate the left inverse matrix of . That is, calculate such that
As matrix has the column full rank, matrix satisfying (8) exists although it is typically not unique. For a given matrix , can be easily found by solving (8) which is a linear algebraic system. If it is not unique, any matrix satisfying (8) works for our proposed method.
For the running example, we can have

Step 3. Multiply (6) by matrix from the left to obtain or
From its expression, each differential equation in the system (11) contains only one-rational reaction rates plus a linear combination of polynomial reaction rates.
For the running example, we have

Step 4. Calculate matrix such that
where has the full row rank and = the  number  of  rows  in  . Note that can be easily found by solving (13), which is a homogenous linear algebraic system. Again if it is not unique, any matrix satisfying (13) works for our proposed method.
Then multiply (6) by matrix from the left to obtain
or
For the running example, we can have

Step 5. Let . If ≥ the  number  of  columns, then solving (15) yields
If < the  number  of  columns, it means that some polynomial reaction rates in (15) are linearly dependent. Then combine the linearly dependent rates and construct a new reaction rate vector and full column rank matrix such that
and then solving (18) yields

For the running example, we have < the  number  of  columns. As the first four columns are linearly dependent, we can have a new reaction rates . Therefore, we have and furthermore, noting that , from (19) we have

After these five steps, dynamic metabolic system (2) is transferred into a system of differential equations, in which each differential equation contains one-rational function plus polynomial functions ((11) or (12)) or only polynomial function ((19) or (21)). Parameters in (19) can be analytically estimated by well-known least squares methods. In the next section, we describe an algorithm to estimate parameters in (11).

3. Parameter Estimation Algorithm

After its complexity analysis, estimating parameters in dynamic metabolic system is reduced to mainly estimating parameters in a rational function plus polynomial, which we call the improper rational function. These functions are nonlinear in both parameters and state variables. Therefore, estimation of parameters in these models is a nonlinear estimation problem. In general, all algorithms for nonlinear parameter estimation can be used to estimate parameters in the improper rational functions, for example, Gauss-Newton iteration method and its variants such as Box-Kanemasu interpolation method, Levenberg damped least squares methods, Marquardt’s method [912, 15], and more sophisticated methods [16]. However, these iteration methods are initial sensitive. Another main shortcoming is that most of these methods may converge to the local minimum of the least squares cost function and thus cannot find the real values of parameters. In the following, we describe an iterative linear least squares method to estimate parameters in the improper rational functions. The basic idea is to transfer optimizing a nonlinear least squares objective function into iteratively solving a sequence of linear least squares problems.

Consider the general form of the following improper rational functions: where the vector consists of the state variables and the -dimensional vector consists of all parameters in the improper rational function (22), which can naturally be divided into three groups: those in the numerator of the rational functions , those in the denominator of the rational function , and those in the polynomial , where we have that . , , and are the known functions nonlinear in the state variable and do not contain any unknown parameters. Either or must be nonzero, and otherwise from sensitivity analysis [9, 16] the parameters in model (22) cannot be uniquely identified.

If there is no polynomial part, model (22) is reduced to a rational function. Recently, several methods have been proposed for estimating parameters in rational functions [5, 6, 13, 14]. The authors in [5, 6] have employed general nonlinear parameter estimation methods to estimate parameters in rational functions. As shown in their results, the estimation error is fairly large. We have observed that in rational functions both the denominator and numerator are linear in the parameters. Based on this observation, we have developed iterative linear least squares methods in [13, 14] for estimating parameters in rational functions. Mathematically, improper rational function (22) can be rewritten as the following rational function:

However, in the numerator of the model above, there are coefficients while there are unknown parameters. When , the number of parameters is equal to the numbers of coefficients, and the methods developed in [13, 14] can be applied. However, when , those methods are not applicable as the number of parameters is less than the number of coefficients in the numerator.

In order to describe an algorithm to estimate parameters in the improper rational function (22) for given groups of observation data and , we introduce the following notation:

To estimate parameters in the improper rational function (22), as in [11], we form a sum of the weighted squared errors (the cost function) with the notions above as follows:

Minimizing with respect to can give the nonlinear least squares estimation of parameters , , and . We rewrite the objective function (22) as follows:

In the objective function (26), for a given parameters in the first term, we have where

Then for given parameters , we can estimate the parameters by linear least squares method as follows:

Based on the above discussion, we propose the following iterative linear least squares method.

Step 1. Choose the initial guess for .

Step 2. Iteratively construct matrix and vector by (28) and (29), respectively, and then solve the linear least squares problem:
which gives the solution until the stopping criterion is met, where is the estimation of parameters at step .

From (31), if the estimation sequence is converged to , the objective function (26) reaches its minimum value at . That is, is the estimation of parameters in model (22).

There are several ways to set up a stopping criterion. In this paper the stopping criteria are chosen as

where is the Euclidean norm of the vector and is a preset small positive number, for example, .

4. Application

To investigate the method developed in previous sections, this study generates artificial data from the dynamic metabolic system in the running example with the biochemically plausible parameter values [4] listed in column 2 of Table 1 and initial values: Gluc6P(0) = 1 mM, Fruc6P(0) = 0 mM, Fruc1,6P2(0) = 0 mM, ATP(0) = 2.1 mM, ADP(0) = 1.4 mM, and AMP (0) = 0.1 mM. The trajectory of this system is depicted in Figure 2. From Figure 2, the concentrations of all molecules except for Frucose-1,6-biphosphate reach their its steady states after about 0.1 minutes while Frucose-1,6-biphosphate after 0.5 minutes. Therefore, we do not use the data simulated after 0.5 minutes.

Although no noise is added to the artificial data in the simulation, noises are introduced in numerically calculating the derivatives by finite difference formulas. In general, the higher the sampling frequency and more data points are used, the more accurate the numerical derivatives are. On the other hand, we may not obtain data with the high frequency because of experimental limitations in practice. In this study, the sampling frequency is 100 data points per minute. In numerically calculating the concentration change rate at each time point from concentration , we adopt the five-point central finite difference formula as follows:

The estimation accuracy of the proposed method is investigated in terms of relative estimation error which is defined as

As all parameters to be estimated are nonnegative, initial values are chosen as 0 or 1 in this study. The experimental results are listed in columns 3 and 4 in Table 1. From column 3 in Table 1, the estimated parameter values are very close to the corresponding true values. Actually the relative estimation errors calculated from (29) for all estimated parameters except for two are less than 1%. This indicates that the proposed method can accurately estimate the parameters in this system.

5. Conclusions and Future Work

In this study, we have first described a method to analyze the complexity of metabolic systems for parameter estimation, based on the stoichiometric matrix of the metabolic systems. As a result, the estimation of parameters in the metabolic systems has been reduced to the estimation of parameters in the improper rational functions or polynomial functions. Then we have developed an iterative linear least squares method for estimating parameters in the improper rational models. The results from its application to a metabolism system have shown that the proposed method can accurately estimate the parameters in metabolic systems.

We do not consider the noises in the data except those introduced by numerical derivatives in this study. One direction of future work is to investigate the influence of noises in the data to the estimation accuracy. In addition, low sampling frequency is expected, particularly for molecular biological systems as in practice measurements from them may be very expensive or it is impossible to sample measurements with high frequencies. Another direction of future work is to improve the estimation accuracy of the proposed method with low sampling frequencies.

Acknowledgments

This work was supported by the Special Fund of Ministry of Education of Beijing for Distinguishing Professors and Science and Technology Funds of Beijing Ministry of Education (SQKM201210037001) to Li-Ping Tian, by National Natural Science Foundation of China (NSFC 61134004) to Zhong-Ke Shi, and by Natural Sciences and Engineering Research Council of Canada (NSERC) to Fang-Xiang Wu.