Abstract

A distribution-based identification procedure for estimation of yield coefficients in a baker’s yeast bioprocess is proposed. This procedure transforms a system of differential equations to a system of algebraic equations with respect to unknown parameters. The relation between the state variables is represented by functionals using techniques from distribution theory. A hierarchical structure of identification is used, which allows obtaining a linear algebraic system of equations in the unknown parameters. The coefficients of this algebraic system are functionals depending on the input and state variables evaluated through some test functions from distribution theory. First, only some state equations are evaluated throughout test functions to obtain a set of linear equations in parameters. The results of this first stage of identification are used to express other parameters by linear equations. The process is repeated until all parameters are identified. The performances of the method are analyzed by numerical simulations.

1. Introduction

A process carried out in a bioreactor can be defined as a set of biochemical reactions involving components (with ). The global dynamics can be represented by following the dynamical state-space model [1]: where .

This model describes the behaviour of an entire class of biotechnological processes and is referred to as the general dynamical state-space model of this class of bioprocesses [1, 2]. In (1.1), the term is the rate of consumption and/or production of the components in the reactor, that is, the reaction kinetics and the term represents the exchange with the environment, that is, the dynamics of the component transportation through the bioreactor. The strongly nonlinear character of the model (1.1) is given by the reaction kinetics. In many situations, the yield coefficients, the structure, and the parameters of the reaction rates are partially known or unknown. Many of the evolved control methods for these kind of systems—like model predictive control [3], robust or adaptive control [4, 5]—are based on good initial estimates of the yield parameters. This paper deals with the yield coefficients identification of baker’s yeast bioprocess.

In recent years, a progress has been made in the area of continuous-time system identification. Even if the most physical systems are naturally continuous, a much more attention has been paid to parameter estimation of discrete-time systems, mainly because they are better suited for numerical implementations. Continuous-time identification makes possible a more direct link to the physical properties and operation of the underlying systems [6, 7] and the direct estimation of physical parameters which have a clear significance [8]. One important direction in continuous-time system identification is to transform the system of differential equations to an algebraic system that reveals the unknown parameters. By using some measures, the direct computation of the input-output data derivatives can be completely avoided. For linear system identification, several methods are reported on this direction: identification based on the Laplace transformation and then use the Laguerre filter [9] or transforming the continuous-time system to the frequency domain [10].

A novel approach to identify the continuous-time system is the distribution-based approach, using deterministic distributions [11] or random distributions [12]. Through these approaches, derivatives are described in the sense of distribution theory and one constructs the input-output algebraic relationships using differential information produced in the distribution sense.

Identification of nonlinear continuous-time systems is far away more complicated. The traditional procedures are based on the Volterra functional series, expressed in time domain or frequency domain [7]. The parameter identification of deterministic nonlinear continuous-time systems (NCTSs), modelled by polynomial-type differential equations has been considered by numerous authors [13, 14]. They use the modulating functions as a sum of sinusoids, whose fundamental period has a fixed length, to convert differential equations into an algebraic form in the frequency domain. These methods can be applied only if the NCTS is exactly integrable. This means that all the terms can be written as pure derivatives of some computable function of the measured signals. The computational burden in applying the method of modulating functions is overtaken by using Fast Fourier Transform algorithms.

In this paper we propose an identification method for a class of NCTS considering that the unknown parameters can appear in rational relations with measured variables. Using techniques utilized in distribution approach the measurable functions and their derivatives are represented by functionals on a fundamental space of testing functions. Such systems are common in biotechnology [1, 15]. It is supposed that the system is described by state equations and all state variables are measurable (or they can be estimated using some state estimators [16]).

The most important approaches for the yield coefficients estimation make use of the state transformations based on the general structure [1719]. The main idea of this paper is to use a hierarchical structure of identification to estimate the yield coefficients in the baker’s yeast bioprocess. First, some state equations are utilized to obtain a set of linear equations in some parameters. The results of this first stage of identification are used for expressing other parameters by linear equations. This process is repeated until all parameters are identified.

The paper is organized in the following way. The nonlinear dynamical model of a baker’s yeast process is given in Section 2. Section 3 presents the problem statement of continuous-time systems identification based on distribution approach and the hierarchical structure of yield coefficients identification. Section 4 is dedicated to the analysis of properties of the identification algorithm. Some experimental results are presented in Section 5, and conclusions in Section 6.

2. Nonlinear Dynamical Model of Baker’s Yeast Process

The yeast growth has already been the subject of numerous studies and papers. The reaction scheme of baker’s yeast bioprocess is [20]

In the reaction schemes (2.1)–(2.3), is the glucose, the biomass, the dissolved oxygen, the ethanol, and the dissolved carbon dioxide. The first reaction scheme represents the respiratory growth on glucose; the second reaction scheme represents the fermentative growth on glucose, and finally the third reaction represents the respiratory growth on ethanol. , and are three specific growth rates that reflect the capacity of the baker’s yeast to exploit three catabolic pathways for energy and material sources.

The dynamical model of the fed-batch bioprocess with the reaction schemes (2.1)–(2.3) can be obtained by using the mass balance of the components, considering that the bioreactor is well mixed, the yield coefficients, are constant and the dynamic of the gas phase can be neglected [20]:

The dynamical model is expressed as a set of differential equations, in terms of the concentrations of components, and with the additional (2.9), which represents the dynamics of the solution volume in the bioreactor. In (2.4)–(2.9), are the positive yield coefficients, is the glucose concentration on the feed, is the oxygen rate, defined as (where is the mass transfer coefficient and the equilibrium concentration of the dissolved oxygen), is the carbon dioxide transfer rate (defined as , with a transfer coefficient), is the input feed rate, and is the so-called dilution rate.

The model (2.4)–(2.8) can be expressed in the matrix form:

The next notations will be used: the state vector is , the vector of reaction rates (the reaction kinetics) is ; is the matrix of yield coefficients, is the vector of feeding rates, and is the vector of rates of removal of the components in gaseous form.

The most difficult task for the construction of the dynamical model is the modelling of the reaction kinetics. The form of kinetics is complex, nonlinear, and in many cases unknown. The reaction rates can be expressed as where is a state-dependent diagonal matrix, whose elements correspond to the reactions’ reactants.

The specific growth rates are considered as a vector of time-varying parameters. For the baker’s yeast bioprocess the form of matrix is very simple:

Then the model (2.10) becomes

For the reaction rates of the baker’s yeast bioprocess there are many possible models [21, 22], and so forth. However, the process of yeast growth on glucose with ethanol production is generally described by the following three metabolic reactions, widely analyzed by Sonnleitner and Kappeli [20].

First, the reaction rate of the respiratory growth on glucose and the associated specific growth rate are where the kinetic terms associated with the glucose consumption and with the respiratory capacity are modelled by Monod-type laws: , , with and the maximal values of the specific growth rates of glucose and oxygen, and and the saturation parameters for glucose and oxygen uptake, respectively.

Second, the reaction rate of the fermentative growth on glucose and the associated specific growth rate are

Finally, if the oxidation capacity is sufficiently high to oxidize both ethanol and glucose, then their consumption is possible. Then, the reaction rate of the respiratory growth on ethanol and the associated specific growth rate are where is the potential ethanol oxidative rate, modelled by Monod-type law: , with the maximal value of the specific growth rates of ethanol, and the saturation parameter for growth on ethanol.

The above kinetic model is based on the well-known bottleneck hypothesis developed by Sonnleitner and Kappeli [20]. This model assumes a limited capacity of yeast, leading to the production of ethanol under conditions of oxygen limitation and/or high glucose concentration. The glucose concentration that corresponds to the oxidative capacity is reached when , and it is called critical concentration. Depending on this critical concentration, two operating regimes can be observed. At glucose concentration lower than its critical value, the system is in the respirative regime. The glucose consumption rate is smaller than the maximal respiratory capacity and the rate of the oxidative glucose metabolism is given by the glucose consumption rate. Ethanol can be oxidized in parallel with glucose and the rate of oxidative ethanol metabolism is determined by the excess of respiratory capacity and the available ethanol. At glucose concentration higher than its critical value, the system is in the so-called respirofermentative regime. Then the glucose consumption rate is larger than the maximal respiratory capacity and the respiratory capacity establishes the rate of the oxidative glucose metabolism. The excess of glucose is processed by the fermentative metabolism. Under lack of oxygen conditions, the fermentative pathway is predominating.

Since there are three unknown kinetics, the implementation of some state observers (such as asymptotic observers) require the online measurements of three state variables. Also, the online estimation algorithms for kinetic parameters (e.g., observer-based techniques) necessitate the knowledge of minimum three states. From the technological point of view, , , and are the online available state variables, so it is important to design the estimators by using these measurements. Unfortunately, according to the yield coefficient values, , , and are linearly dependent, with the corresponding yield matrix being ill conditioned [23]. This would result in sensitive algorithms to numerical errors, thus having performances degraded. To solve this problem, Pomerleau and Perrier [23] suggested a reformulation of model (2.10). This reformulation is based on the division of the complete process model (2.10) into two partial models, corresponding to the above-described mechanisms: the respirofermentative partial model for the ethanol production state of the process and the respirative partial model, corresponding to the ethanol consumption state of the process.

The fermentative partial model is stated as

The respirative partial model is stated as

3. Distribution-Based Identification of Yield Coefficients

3.1. Problem Statement

In this approach the set of nonlinear differential equations describing the state evolution is mapped into a set of linear algebraic equations with respect to the model parameters. Using techniques utilized in distribution approach, the measurable functions and their derivatives are represented by functionals on the fundamental space of testing functions [14]. The main advantages of this method are that a set of algebraic equations with real coefficients results and the formulations are free from boundary conditions. The idea of utilizing test functions in system identification was proposed by Pearson and Lee [13] in terms of modulating functions in order to ameliorate the noise handling for deterministic least-squares identification based on time-limited data.

Let us denote by the fundamental space from the distribution theory [24] of the real functions with compact support , having continuous derivatives at least up to the order . Let be a function which admits a Riemann integral on any compact interval from . Using this function, a unique distribution (or generalized function) can be built by the relation:

In distribution theory, the notion of -order derivative is introduced. If , then its -order derivative is a new distribution uniquely defined by the relations: where is the -order time derivative of the test function.

When , then that means the -order derivative of a distribution generated by a function equals the distribution generated by the -order time derivative of the function . So, in place of the states and their time derivatives of a system one utilize the corresponding distributions and, in some particular cases, it is possible to obtain a system of equations linear in parameters. If the system is compatible the model parameters are structurally identifiable.

In our study, it were utilized three types of test functions characterized by a bounded support , all of these accomplishing the condition:

The nonzero restriction is one of the following three types.(1)Exponential: (2)Sinusoidal: (3)Polynomial: where is an integer.

Figures 1 and 2 present the exponential type test function and its first-order derivative for .

3.2. The Hierarchical Structure of the Identification Algorithm

Consider all state variables accessible for measurements. This assumption can be considered unrealistic but the unknown states can be estimated using different approaches. From the multitude of proposed solutions one can mention methods based on balance equations, observers (asymptotic observers, high-gain observers) [5], Kalman filters [25], neural networks [26], and fuzzy reasoning [27].

The dynamical systems (2.17) and (2.18) contain rational dependences between parameters and measured variables. To obtain linear equations in unknown parameters, the identification problem is split in several simpler problems [28]. Based on the specific structure of these systems, it is possible to group the state equations in such way to determine some interconnected relations. They are organized in a hierarchical structure. In the first stage some state equations are utilized to obtain a set of linear equations in some parameters. If these parameters are identifiable then they can be used as known parameters in the following stages. This process is repeated in the other stages until all the parameters are identified.

Because the dilution rate can be externally modified, it will be considered the first component of the input vector and the other component of is the concentration .

From the third and fourth equations of the fermentative partial model (2.17) one gets the following relations for the reaction rates:

These relations are replaced in the other three equations to obtain linear equations with respect to yield coefficients. So, one can split the identification problem in the following steps.

Step 1 (identification of and ). Replacing (3.10) in the first equation of the system (2.17) one get where and .
One denotes
Multiplying these functions with a test function and integrating over one gets the following distributions:
As one must identify two parameters it is enough to choose a finite number of test functions and to build an algebraic system of equations, where, is the vector of unknown parameters, is a matrix of real numbers: and denotes an -column real vector build from distributions calculated as in (3.13): If then a unique solution is obtained: One obtains and .

Step 2 (identification of and ). Considering known and from Step 1 and substituting (3.10) in the second equation of (2.17) one gets where and .

The procedure of identification from Step 1 is repeated and one obtains and .

Step 3 (identification of and ). Considering known and from Step 1 and substituting (3.10) in the last equation of (2.17) one gets where and .

The procedure of identification from Step 1 is repeated and one obtains and .

Since the respirative partial model (2.18) has a similar structure with the fermentative partial model (2.17), in a similar manner one obtains the parameters , , and .

4. Analysis of the Algorithm Properties

In the following one presents some consistency and numerical aspects related to the presented algorithm and a short overview of the classical identification procedure widely used for this kind of biotechnological systems.

4.1. Identifiability

Identifiability is a necessary prerequisite for model identification; it concerns uniqueness of the model parameters determined from the input-output data, under ideal conditions of noise-free observations and error-free model structure. A remarkable feature of distribution-based identification procedure is that it provides a linear reparameterization of the input-output relation of the nonlinear system. This reparametrization of the system involves a very simple identifiability condition to be accomplished, that is, the existence of the matrix or, equivalently, is of full rank.

4.2. Consistency

Obviously, consistency of the estimates is directly influenced by the precision of numerical integration used to compute the value of the distributions. There are available numerous integration methods (often called numerical quadrature) with various degrees of precision. One presents shortly the techniques used in the simulations to the approximate calculation of a definite integral where is a given function and a finite interval. Interpolatory quadrature formulas, where the nodes are constrained to be equally spaced, are called Newton-Cotes formulas. These are especially suited for integrating a tabulated function (such is our case). Newton-Cotes numerical integration rule is a weighted sum of function values: , where coefficients are derived from interpolating polynomial fitted for points . A Newton-Cotes formula of any degree can be constructed. One of the simplest integration methods is the trapezoidal rule. The trapezoidal rule is based on linear interpolation of at and , that is, is approximated by The integral of equals the of trapezoid with base times the average height hence To increase the accuracy one subdivides the interval and assumes that is known on a grid of equidistant points , where is the step length. The trapezoidal approximation for the th subinterval is where Summing the contributions for each subinterval , gives The global truncation error is This shows that by choosing small enough we can make the truncation error arbitrarily small. In other words, we have asymptotic convergence when .

In the composite Simpson’s rule one divides the interval into an even number  m steps of length and uses the formula where are sums over odd and even indices, respectively. The remainder is

This shows that one have gained two orders of accuracy compared to the trapezoidal rule, without using more function evaluations.

Let us study properties of Newton-Cotes formulas in frequency domain (or spectral properties). Newton-Cotes rules are symmetric (hence linear phase) digital filters with finite impulse response. One of the most important characteristic of digital filter is magnitude/frequency response-function which shows how much filter damps or amplifies magnitude of particular frequency contained in input data.

So, for trapezoidal rule and Simpson’s rule one obtains

These transfer functions have the amplitude-frequency Bode characteristics from Figure 3.

As one can see classical Newton-Cotes formulas suppress high frequencies (noise) in the input data. In that sense they can be considered low-pass filters. From Figure 3 one observes that trapezoidal rule offers a better suppression of noise than the Simpson’s rule, so in the algorithm implementation one used the trapezoidal rule for numerical integration.

4.3. “Classical” Identification Procedure

The “classical” identification approach of yield coefficients is a two-step procedure under the assumption that full state measurements are available [1]. This method is based on a state transformation that allows reformulating the dynamical model into separate submodels. The first submodel depends only on the reaction structure and is independent of the kinetics. It can be linearly reparametrized and used for the identification of the yield coefficients by means of linear regressions, provided suitable identifiability conditions are satisfied. We present briefly this method for estimating the yield coefficients and we use it for comparison in the next section.

The general dynamical model given in (2.1) represents a particular class of nonlinear state-space models. The nonlinearity lies in the reaction rates that are (nonlinear) functions of the state variables. These functions enter the model in the form (where is a constant matrix), which is a set of linear combinations of the same nonlinear functions . This particular feature can be exploited to separate the nonlinear part from the linear part of the model by an adequate linear state transformation. More precisely, one chooses a nonsingular partition with of full row rank matrix (i.e., and a permutation matrix. The induced partitions of the vectors and are (2.1) is then partitioned into two submodels: Then with the state transformation one transform the initial model into where the matrix is the unique solution of that is, where is a generalized or pseudoinverse of . This subsystem (4.16) can be augmented with an equation derived from (2.7) as follows:

It can be considered as a linear time-varying (if varies in the course of time) model with state z, input , and output . It is nonlinearly parametrized by the yield coefficients but linearly reparametrized by the nonzero entries of . When data of the signals , , , and are available, the auxiliary model (4.19) can be used to identify the yield coefficients independently of the knowledge of the reaction rates. The model (4.19) can be used to perform the identification of the nonzero entries of C by a linear regression technique, with the yield coefficients recovered afterwards from (4.17). The identification makes sense only if the yield coefficients are structurally uniquely identifiable with the auxiliary model (4.19).

5. Simulation Results

The efficacy of our distribution-based approach is shown by simulations, comparing it with this “classical” approach reviewed in the previous section. Also, the influence of sampling period, test functions type, initial conditions, and input type is analysed. To compare statistical performances of the different approaches and simulation conditions the empirical normalized mean square error (NMSE) was used, which is defined as where is the number of estimated parameters, is the element of the estimated parameter vector, while the “*” superscript denotes the true value of the parameter.

The model given by (2.17) and (2.18) was integrated using a fourth-order Runge-Kutta routine with a sampling period of 1 minute.

In order to study the sensitivity of the estimation method to the sampling period , to the initial conditions, to the test functions type, to the input type, and to the noise, the following parameters were used:(a)sampling period: ,(b)input signals: ,(c)initial conditions: IC1: [1 0.2 0.15 0.0066 0.008] IC2: [0.1 0.3 0 0.002 0.01 2] IC3: [0.5 0.5 3 0.01 0.001 1],(d)test functions: Type1: exponential Type2: sinusoidal Type3: polynomial,(e)noise: {noise-free, zero-mean white (SNR = 50 dB, 40 dB, 30 dB, 20 dB, 10 dB)}.

Some of the signals used in the identification algorithm are presented in Figures 46. Figure 4 presents the noise-free system time response for the fermentative partial model (2.17). The measured states are represented in percent with respect to their maximal values. Three test function of sinusoidal type are presented in Figure 5. In Figure 6 some noisy measurements are presented.

The simulation results presented in Tables 15 leads to the following remarks.(1)Sensitivity to the sampling period: results given in Table 1 show that the performance of the algorithm deteriorates with the increasing sampling period.(2)Sensitivity to the inputs type: results given in Table 2 conclude that the algorithm is not too sensitive to the type of input. However, as was expected, the best performances are obtained for a sinusoidal input type.(3)Sensitivity to the initial conditions and test function type: as shown in Tables 3 and 4, initial conditions and type of the test functions have very little influence on the accuracy of the estimates.(4)Sensitivity to the noise: Tables 5 and 6 present the simulation results for distribution-based method and for “classical” method, respectively. The results are still very good in the distribution-based case even for the high level of noise (SNR = 10 dB), while the “classical” approach is less robust to increasing noise level.

6. Conclusions

The paper presents a distribution-based identification procedure for offline estimation of yield coefficients in a baker’s yeast bioprocess. This procedure is a functional type method, which transforms a differential system of equations to an algebraic system in unknown parameters. The relation between the state variables of the system is represented by functionals using techniques from distribution theory based on test functions from a finite-dimensional fundamental space. The identification algorithm has a hierarchical structure, which allows obtaining a linear algebraic system of equations in the unknown parameters. The coefficients of this algebraic system are functionals depending on the input and state variables and are evaluated through some testing functions from distribution theory.

According to the proposed procedure, first, only some state equations are evaluated throughout testing functions to obtain a set of linear equations in some parameters. The results of this first stage of identification are utilized for expressing other parameters by linear equations. This process is repeated until all parameters are identified.

The influence of the sampling period, initial conditions, test functions type, input type, and noise on the parameters estimates was empirically analyzed.

The proposed method presents two main advantages: there is necessary no information about the parameters of the reaction rates to identify the yield coefficients of the baker’s yeast bioprocess and the algorithm provides very good results even the measurements are noise-contaminated because the evaluation of states derivatives is completely avoided.

Nomenclature

:Dissolved oxygen concentration (g/L)
:Equilibrium concentration of the dissolved oxygen (g/L)
:Carbon dioxide transfer rate (h−1g/L)
:Dilution rate (h−1)
:Ethanol concentration (g/L)
:Vector of feeding rates
:Input feed rate (Lh−1)
:Dissolved carbon dioxide concentration (g/L)
:Matrix of yield coefficients
:Yield coefficients
:Mass transfer coefficient (h−1)
:Saturation parameter for growth on ethanol (g/L)
, :Saturation parameters for glucose/oxygen uptake, respectively (g/L)
:Transfer coefficient
:Oxygen transfer rate (h−1g/L)
:Vector of rates of removal of the components in gaseous form
:Maximal oxygen specific growth rate ()
:Maximal ethanol specific growth rate ()
:Maximal glucose specific growth rate ()
:Glucose concentration (g/L)
:Glucose concentration on the feed (g/L)
:Culture volume in the reactor (L)
:Biomass concentration (g/L)
:Vector of reaction rates (reaction kinetics)
, :Specific growth rates of baker’s yeast bioprocess (h−1)
:Vector of specific growth rates
:Kinetic term associated with the respiratory capacity
:Potential ethanol oxidative rate
:Kinetic term associated with the glucose consumption
:State vector
:The fundamental space from the distribution theory
:The set of real numbers
:Test function
:Distribution (generalized function)
:The vector of unknown parameters
:The unknown parameters.

Acknowledgment

This work was supported by CNCSIS-UEFISCSU, Project number PN II-RU TE 106, 9/04.08.2010.