New Trends in Networked Control of Complex Dynamic Systems: Theories and Applications
Research Article | Open Access
Identification of LTI Time-Delay Systems with Missing Output Data Using GEM Algorithm
Abstract
This paper considers parameter estimation for linear time-invariant (LTI) systems in an input-output setting with an output error (OE) time-delay model structure. Missing data are commonly encountered in industry because of irregular sampling, sensor failure, data deletion during preprocessing, network transmission faults, and so forth. To identify LTI systems with time-delay from incomplete data, the generalized expectation-maximization (GEM) algorithm is adopted to estimate the model parameters and the time-delay simultaneously. Numerical examples are provided to demonstrate the effectiveness of the proposed method.
1. Introduction
Advanced process control theories have developed rapidly over the past several decades to meet growing demands on closed-loop system performance, such as improved process safety and efficiency of plant operation, consistent product quality, and economic optimization [1]. These control strategies have improved process automation and stability by providing control solutions for processes operated under abnormal working conditions, such as process faults [2–5], network transmission delay [6], data packet dropouts [7], and modeling error. Generally, the implementation of these control strategies relies on an understanding of the process dynamics and the availability of an accurate mathematical model of the process. In view of the difficulties and complexities of first-principle modeling, data-driven modeling, in which the process model is retrieved from process data, has become a main modeling method.
Typically, the process data used in process modeling are generated by performing an identification experiment, in which a test signal is designed and used to excite the process. Most conventional parameter estimation methods, such as the prediction error method (PEM), the instrumental variable (IV) method, and the subspace method, assume that the identification data are sampled regularly and recorded properly. However, this is not always true in industrial practice. For example, in the development of an inferential model for the sulfur content in the gas oil product, the sulfur concentration cannot be measured directly and requires lab analysis, which takes a long time. The process variables can be sampled every minute, but the sulfur concentration is available only every twelve hours. Another example is an industrial process with data transmission through a network, where the recorded process data are corrupted by network-induced problems such as transmission delay and packet dropout. Parameter estimation with irregular data is therefore of practical importance, yet it has not been extensively investigated in the literature.
Time-delays are commonly encountered in various engineering systems, such as chemical processes, mechanical systems, network control systems, transmission lines, and economic systems [8]. Since time-delay usually degrades the performance of the inferential model and is frequently a source of instability of the closed-loop system, it should be handled carefully in the modeling process. Common methods for estimating time-delays are nonparametric methods (e.g., step tests or correlation analysis) and the grid searching method. For example, Wang and Zhang [9] considered the robust identification of linear continuous time-delay systems from step responses. A linear regression equation was derived from the solution of the output time response and its various-order integrals and solved by using the IV-least squares (LS) method. The parameters of the transfer function were then recovered from the LS solution. Weyer [10] built a model for an open water channel. The model parameters were estimated using the grid searching method, in which one model was established for each time-delay in a range, and the final model was the one that gave the best prediction performance on a validation data set. The methods mentioned above estimate the time-delay and the model parameters separately.
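The grid searching idea described above can be illustrated with a minimal sketch. It is not the model used by Weyer [10]: a first-order ARX model `y[t] = -a*y[t-1] + b*u[t-d-1]` fitted by least squares is assumed here purely for brevity, and the function names `fit_arx1`, `validation_mse`, and `grid_search_delay` are hypothetical.

```python
def fit_arx1(u, y, d):
    """Least-squares fit of y[t] = -a*y[t-1] + b*u[t-d-1] for a fixed delay d."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(d + 1, len(y)):
        p1, p2 = -y[t - 1], u[t - d - 1]      # regressors for a and b
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y[t]
        r2 += p2 * y[t]
    det = s11 * s22 - s12 * s12               # 2x2 normal equations
    a = (s22 * r1 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def validation_mse(u, y, d, a, b):
    """Mean squared one-step-ahead prediction error on a data set."""
    errs = [(y[t] + a * y[t - 1] - b * u[t - d - 1]) ** 2
            for t in range(d + 1, len(y))]
    return sum(errs) / len(errs)


def grid_search_delay(u_train, y_train, u_val, y_val, delays):
    """Fit one model per candidate delay; keep the best on validation data."""
    best = None
    for d in delays:
        a, b = fit_arx1(u_train, y_train, d)
        mse = validation_mse(u_val, y_val, d, a, b)
        if best is None or mse < best[0]:
            best = (mse, d, a, b)
    return best[1], best[2], best[3]
```

Note that the delay is chosen outside the parameter fit, which is the separation between delay estimation and parameter estimation that the paragraph above points out.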
The missing data problem is very common in the process industry. A special example is the irregularly sampled system. Many critical parameters, such as product concentration, steam quality, and boiling point, cannot be measured directly by sensors. These parameters are measured through lab analysis, so only slow-rate data are available. However, process variables such as temperature, pressure, and flow rate can be measured online at a fast rate by sensors. Therefore, the samples between the slow-rate data can be treated as missing data. Another example is the network control system in which data are transmitted via a wireless network or the internet, where data are lost because of packet dropout. Other causes of missing data are sensor faults, data recording system malfunctions, and so forth. Several methods have been reported in the literature to handle the missing data problem in system identification. For example, F. Ding and J. Ding [11] proposed an auxiliary model-based approach to cope with parameter estimation and output estimation with irregularly missing output data using the PEM method. The outputs of the auxiliary model were used in the identification process. Zhu et al. [12] considered the identification of systems with slowly and irregularly sampled output data. The output error method was employed to estimate the fast-rate model based on the fast input and slow output data. However, the methods mentioned above used only part of the process data, which may lead to a loss of information. Moreover, these methods do not provide the statistical properties of the model parameters and the process noise.
The work introduced in this paper aims at the identification of LTI systems with missing output data in the presence of time-delay. The identification problem is formulated under the scheme of the generalized expectation-maximization (GEM) algorithm, and the time-delay and missing output data are handled simultaneously. The GEM algorithm consists of an expectation step (E-step) and a maximization step (M-step). In the M-step, the maximization problem is transformed into an equivalent minimization problem, which is solved by using a general numerical optimization algorithm.
The rest of this paper is organized as follows. The problem statement is presented in Section 2. A brief revisit of the GEM algorithm and the mathematical formulation of the identification of LTI time-delay systems with an incomplete data set are given in Section 3. Numerical examples are presented in Section 4 to show the effectiveness of the proposed method. The conclusions are given in Section 5.
2. Problem Statement
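Consider the LTI system described by the following output error (OE) time-delay model:
$$y_t = G(q^{-1})\,u_{t-d} + e_t, \qquad (1)$$
where $d$ is the time-delay, which is assumed to be an integer multiple of the sampling period, $e_t$ is Gaussian white noise with zero mean and variance $\sigma^2$, and $y_t$ and $u_t$ are the output and input, respectively. The transfer function $G(q^{-1})$ has the following form:
$$G(q^{-1}) = \frac{B(q^{-1})}{A(q^{-1})}, \qquad (2)$$
where $A(q^{-1})$ and $B(q^{-1})$ are polynomials in the backward shift operator $q^{-1}$. Here, we assume that the model orders $n_a$ and $n_b$ are known a priori and that the time-delay $d$ is uniformly distributed in a known range.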
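To make the model structure concrete, the following minimal sketch simulates an OE model with an integer delay. The coefficient layout is an assumption made for illustration: the denominator is taken as `1 + a_1 q^-1 + ... + a_na q^-na` and the numerator as `b_1 q^-1 + ... + b_nb q^-nb`; the function name `simulate_oe_delay` is hypothetical.

```python
import random


def simulate_oe_delay(u, a, b, d, sigma, seed=0):
    """Simulate x_t = -sum_i a_i x_{t-i} + sum_j b_j u_{t-d-j}, y_t = x_t + e_t.

    a = [a_1, ..., a_na], b = [b_1, ..., b_nb] (assumed coefficient layout),
    d is the delay in samples, sigma is the noise standard deviation.
    Returns (x, y): the noise-free output and the noisy output.
    """
    rng = random.Random(seed)
    n = len(u)
    x = [0.0] * n
    for t in range(n):
        acc = 0.0
        for i, a_i in enumerate(a, start=1):
            if t - i >= 0:
                acc -= a_i * x[t - i]         # autoregressive part on x, not y
        for j, b_j in enumerate(b, start=1):
            k = t - d - j
            if k >= 0:
                acc += b_j * u[k]             # delayed input part
        x[t] = acc
    # Output error structure: white noise is added to the output only.
    y = [x_t + rng.gauss(0.0, sigma) for x_t in x]
    return x, y
```

The recursion runs on the noise-free output `x`, not on the measured `y`; this is what distinguishes the OE structure from an equation-error (ARX) structure.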
The identification data $\{u_t, y_t\}$, $t = 1, \ldots, N$, are collected. We denote $\{y_1, \ldots, y_N\}$ as $Y$ and $\{u_1, \ldots, u_N\}$ as $U$. Since part of the output data are missing completely at random (MCAR), the output data set $Y$ can be divided into $Y_{obs}$ and $Y_{mis}$. Therefore, the identification problem is to estimate the parameters $\theta$, the noise variance $\sigma^2$, and the time-delay $d$ based on the identification data $U$ and $Y_{obs}$.
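The split of the output record into observed and missing parts can be sketched as follows. The helper `split_mcar` and its arguments are hypothetical; the only assumption is that each sample is equally likely to be missing, independent of its value, which is what MCAR means.

```python
import random


def split_mcar(y, missing_fraction, seed=0):
    """Drop a fraction of outputs completely at random.

    Returns (y_obs, mis_times): a dict {t: y[t]} of observed samples and the
    sorted list of time indices whose outputs are treated as missing.
    """
    rng = random.Random(seed)
    n = len(y)
    n_missing = round(missing_fraction * n)
    # Sampling indices without looking at y makes the pattern MCAR.
    missing = set(rng.sample(range(n), n_missing))
    y_obs = {t: y[t] for t in range(n) if t not in missing}
    return y_obs, sorted(missing)
```

The input record is kept in full; only outputs are removed, matching the problem statement above.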
3. Parameter Estimation Using the GEM Algorithm
3.1. GEM Algorithm Revisit
The GEM algorithm is a general-purpose iterative optimization algorithm for deriving the maximum likelihood (ML) estimate, and it has attracted great attention from researchers because of its flexibility in handling missing data or hidden states [13]. Denote the missing data set by $C_{mis}$ and the observed data set by $C_{obs}$. The main idea of the GEM algorithm is that, instead of optimizing the likelihood of the observed data $C_{obs}$ directly, the conditional expectation of the complete data log-likelihood function with respect to the missing data set is calculated in the E-step and the maximization problem is solved in the M-step. The procedure of the GEM algorithm for calculating the ML estimate can be described as follows [13]:
E-step: given $C_{obs}$ and the parameter estimate $\Theta^{k}$ from the previous iteration, the $Q$ function can be calculated by
$$Q(\Theta \mid \Theta^{k}) = \mathrm{E}_{C_{mis} \mid C_{obs}, \Theta^{k}} \{ \log p(C_{obs}, C_{mis} \mid \Theta) \}. \qquad (3)$$
M-step: find $\Theta^{k+1}$ to increase $Q$ over its value at $\Theta^{k}$; that is,
$$Q(\Theta^{k+1} \mid \Theta^{k}) \geq Q(\Theta^{k} \mid \Theta^{k}). \qquad (4)$$
The E-step and M-step alternate until the relative change of the parameter estimate between successive iterations is smaller than a prespecified small tolerance or the maximal number of iterations is reached.
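The alternation and stopping rule described above can be sketched as a generic loop. This is only a skeleton under stated assumptions: `e_step` and `m_step` are placeholder callables supplied by the caller, parameters are held in a flat list, and the relative change is measured with the Euclidean norm.

```python
def run_gem(theta0, e_step, m_step, tol=1e-6, max_iter=100):
    """Alternate E-step and M-step until the relative parameter change is small.

    e_step(theta) returns whatever expected statistics the M-step needs;
    m_step(stats, theta) returns an updated parameter list that should not
    decrease the Q function. Both are hypothetical placeholders.
    """
    theta = list(theta0)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        stats = e_step(theta)
        theta_new = list(m_step(stats, theta))
        change = sum((p - q) ** 2 for p, q in zip(theta_new, theta)) ** 0.5
        scale = sum(q ** 2 for q in theta) ** 0.5
        theta = theta_new
        # Stop on a small relative change; the floor guards against scale == 0.
        if change <= tol * max(scale, 1e-12):
            break
    return theta, n_iter
```

As a toy check, estimating a mean from three observed values with two missing ones, where the E-step fills each missing value with the current mean and the M-step averages the completed data, drives the estimate to the mean of the observed values.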
3.2. LTI Time-Delay System Identification with Missing Output Using GEM Algorithm
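Here, we treat the time-delay $d$ as a hidden state variable. With $Y = \{Y_{obs}, Y_{mis}\}$ denoting the $N$ output samples and $U$ the input samples, the observed data set is constructed as $C_{obs} = \{Y_{obs}, U\}$ and the missing data set is constructed as $C_{mis} = \{Y_{mis}, d\}$. The parameter vector is constructed as $\Theta = \{\theta, \sigma^2\}$.
Based on the Bayesian property (the chain rule of probability), the likelihood function of the complete data set can be decomposed into
$$p(C_{obs}, C_{mis} \mid \Theta) = p(Y, U, d \mid \Theta) = p(Y \mid U, d, \Theta)\, p(d \mid U, \Theta)\, p(U \mid \Theta). \qquad (5)$$
The term $p(Y \mid U, d, \Theta)$ can be further decomposed into
$$p(Y \mid U, d, \Theta) = \prod_{t=1}^{N} p(y_t \mid y_{t-1:1}, U, d, \Theta). \qquad (6)$$
Based on (1) and (2), $y_t$ depends only on the previous input sequence $u_{t-d:1}$, the time-delay $d$, and the parameter vector $\Theta$. Therefore, (6) can be rewritten as
$$p(Y \mid U, d, \Theta) = \prod_{t=1}^{N} p(y_t \mid u_{t-d:1}, d, \Theta). \qquad (7)$$
Since the time-delay $d$ is uniformly distributed in the known range, the probability of $d$ taking any value in this range is a constant. Since the input $U$ is measured data and is independent of the parameter vector $\Theta$, the term $p(U \mid \Theta)$ is a constant. Therefore, the last two terms of (5) play no role in the following derivations. The complete data likelihood function can be further written as
$$p(C_{obs}, C_{mis} \mid \Theta) = C \prod_{t=1}^{N} p(y_t \mid u_{t-d:1}, d, \Theta), \qquad (8)$$
where $C = p(d \mid U, \Theta)\, p(U \mid \Theta)$.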
Therefore, the conditional expectation of the log complete data density in (3) is evaluated in two stages. The expectation is first taken with respect to the discrete variable $d$ and then with respect to the continuous variable $Y_{mis}$. In order to calculate the $Q$ function, the unknown terms that arise in these expectations should be calculated first. The $Q$ function can then be rewritten in the form referred to below as (13).
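One way to carry out the two expectations for a Gaussian OE model is sketched below. This is a reconstruction under assumed notation, not necessarily the authors' exact expressions: $x_t(\theta, d)$ is the noise-free model output for delay $d$, $\Theta^k = \{\theta^k, \sigma_k^2\}$ is the previous estimate, $w_d^k$ is the posterior weight of delay $d$, $\mathcal{D}$ is the set of candidate delays, and $T_{obs}$ and $T_{mis}$ are the time indices of observed and missing outputs.

```latex
% Sketch of the E-step; every symbol here is assumed notation.
\begin{align}
% expectation over the discrete delay, then over the missing outputs
Q(\Theta \mid \Theta^k)
  &= \sum_{d \in \mathcal{D}} w_d^k \Big[ \sum_{t \in T_{obs}} \log p(y_t \mid d, \Theta)
     + \sum_{t \in T_{mis}} \mathrm{E}\{ \log p(y_t \mid d, \Theta) \mid d, \Theta^k \} \Big]
     + \text{const}, \\
% posterior weight of each candidate delay (the uniform prior cancels)
w_d^k &= p(d \mid C_{obs}, \Theta^k)
  = \frac{\prod_{t \in T_{obs}} p(y_t \mid d, \Theta^k)}
         {\sum_{d' \in \mathcal{D}} \prod_{t \in T_{obs}} p(y_t \mid d', \Theta^k)}, \\
% Gaussian output density: y_t | d, Theta ~ N(x_t(theta, d), sigma^2)
\log p(y_t \mid d, \Theta)
  &= -\tfrac{1}{2} \log(2\pi\sigma^2) - \frac{(y_t - x_t(\theta, d))^2}{2\sigma^2}, \\
% second moment needed for the missing outputs
\mathrm{E}\{ (y_t - x_t(\theta, d))^2 \mid d, \Theta^k \}
  &= (x_t(\theta^k, d) - x_t(\theta, d))^2 + \sigma_k^2, \qquad t \in T_{mis}.
\end{align}
```

The last line uses the fact that, under the previous estimate, a missing output is Gaussian with mean $x_t(\theta^k, d)$ and variance $\sigma_k^2$, so its expected squared deviation from $x_t(\theta, d)$ is the squared mean offset plus the variance.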
In the M-step of the GEM algorithm, the unknown parameters should be estimated to increase the $Q$ function by solving an optimization problem. Taking the gradient of the $Q$ function (13) with respect to $\sigma^2$ and setting it to zero gives the estimate of the noise variance. Substituting this estimate back into the $Q$ function (13) and using the monotonicity of the log function, the problem is transformed into optimizing a cost function in the remaining model parameters.
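The elimination of the noise variance can be worked through as follows. The sketch assumes the $Q$ function has the usual Gaussian form, with $N$ the number of samples and $S(\theta)$ the expected sum of squared residuals; both symbols are assumed notation for illustration, not taken from the source.

```latex
% Concentrating out sigma^2 from a Gaussian Q function (assumed form).
\begin{align}
Q(\theta, \sigma^2) &= -\frac{N}{2} \log \sigma^2 - \frac{S(\theta)}{2\sigma^2} + \text{const}, \\
\frac{\partial Q}{\partial \sigma^2} &= -\frac{N}{2\sigma^2} + \frac{S(\theta)}{2\sigma^4} = 0
  \;\;\Rightarrow\;\; \hat{\sigma}^2 = \frac{S(\theta)}{N}, \\
Q(\theta, \hat{\sigma}^2) &= -\frac{N}{2} \log \frac{S(\theta)}{N} - \frac{N}{2} + \text{const}, \\
\arg\max_{\theta} Q(\theta, \hat{\sigma}^2) &= \arg\min_{\theta} S(\theta).
\end{align}
```

Because the log is monotonically increasing, maximizing the concentrated $Q$ function is equivalent to minimizing $S(\theta)$, which is how the maximization problem becomes a minimization problem.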
Here, we introduce a variable denoting the noise-free output with time-delay $d$, which follows from (1) and (2), and the cost function is rewritten in terms of this variable to give (18). However, the cost function (18) cannot be optimized directly because the noise-free output is unmeasurable. Here, we adopt the auxiliary model principle, and the auxiliary model is constructed based on the estimates obtained in the previous iteration. Therefore, the cost function (18), with the noise-free output substituted by the output of the auxiliary model, can be optimized by using the damped Newton algorithm, in which the damping factor is a constant.
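A damped iteration of this kind can be sketched for a first-order model. Several assumptions are made and should be read as such: the model form `x[t] = -a*x[t-1] + b*u[t-d-1]` is hypothetical, the Hessian is replaced by its Gauss-Newton approximation (so this is a damped Gauss-Newton step, swapped in for the damped Newton step named in the text), the delay `d` is held fixed, and only observed outputs enter the cost. The simulated output plays the role of the auxiliary model output.

```python
def simulate_first_order(theta, u, d):
    """Noise-free output and its sensitivities for x[t] = -a*x[t-1] + b*u[t-d-1]."""
    a, b = theta
    n = len(u)
    x = [0.0] * n
    dx_da = [0.0] * n
    dx_db = [0.0] * n
    for t in range(1, n):
        k = t - d - 1
        u_k = u[k] if k >= 0 else 0.0
        x[t] = -a * x[t - 1] + b * u_k
        dx_da[t] = -x[t - 1] - a * dx_da[t - 1]   # derivative of x[t] w.r.t. a
        dx_db[t] = u_k - a * dx_db[t - 1]         # derivative of x[t] w.r.t. b
    return x, dx_da, dx_db


def damped_gauss_newton(y_obs, u, d, theta0, mu=0.5, n_iter=60):
    """Minimise the sum over observed t of (y[t] - x[t])^2 with step factor mu."""
    a, b = theta0
    for _ in range(n_iter):
        x, dx_da, dx_db = simulate_first_order((a, b), u, d)
        h_aa = h_ab = h_bb = g_a = g_b = 0.0
        for t, y_t in y_obs.items():
            r = y_t - x[t]                        # residual at an observed sample
            h_aa += dx_da[t] ** 2
            h_ab += dx_da[t] * dx_db[t]
            h_bb += dx_db[t] ** 2
            g_a += dx_da[t] * r
            g_b += dx_db[t] * r
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:                      # ill-conditioned; stop updating
            break
        a += mu * (h_bb * g_a - h_ab * g_b) / det
        b += mu * (h_aa * g_b - h_ab * g_a) / det
    return a, b
```

A constant `mu` below one shortens every step, which trades convergence speed for robustness when the local quadratic approximation is poor.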
The time-delay $d$ can be selected as the delay in the known range with maximal posterior probability. The E-step and M-step alternate until the convergence condition of the GEM algorithm is met.
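Selecting the delay with maximal posterior probability can be sketched as follows. The sketch assumes Gaussian output noise with a common variance and a uniform prior over the candidate delays, so that the posterior is proportional to the likelihood of the observed outputs; the function names and the `sse_by_delay` input are hypothetical.

```python
import math


def delay_posterior(sse_by_delay, sigma2):
    """Posterior over candidate delays under a uniform prior.

    sse_by_delay maps each candidate delay d to the sum of squared residuals
    between the observed outputs and the model output simulated with delay d.
    The Gaussian normalising constant is the same for every d and cancels.
    """
    log_w = {d: -0.5 * sse / sigma2 for d, sse in sse_by_delay.items()}
    shift = max(log_w.values())                   # subtract max for stability
    w = {d: math.exp(v - shift) for d, v in log_w.items()}
    total = sum(w.values())
    return {d: w_d / total for d, w_d in w.items()}


def map_delay(posterior):
    """Return the delay with the largest posterior probability."""
    return max(posterior, key=posterior.get)
```

For example, `map_delay(delay_posterior({1: 12.0, 2: 2.0, 3: 10.0}, 1.0))` selects the delay with the smallest residual sum, since all candidates share the same prior weight.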
4. Simulation Examples
4.1. A Numerical Simulation Example
Consider an LTI time-delay system described by the OE time-delay model. The input and output data are generated by simulation, and noise with zero mean and a specified variance is added to the output. The input and output data are shown in Figure 1. In the simulation, part of the output data is randomly missing, and the time-delay is restricted to a prespecified range. The method proposed in this paper is used to estimate the parameters and the time-delay. The estimate trajectories of the model parameters and the noise variance are shown in Figures 2 and 3, respectively. The estimated time-delay is consistent with the true time-delay. To further verify the effectiveness of the proposed method, the simulations are also performed with two other proportions of missing output data. The estimated parameters after 13 iterations are summarized in Table 1. These figures and the table show that the proposed GEM algorithm achieves good identification performance.

4.2. The Continuous Stirred Tank Reactor
The Continuous Stirred Tank Reactor (CSTR) is a benchmark example used to test the performance of different modeling and control algorithms. In the first principle model of the CSTR [14], the product concentration and the temperature are the output variables and the coolant flow rate is the input variable. The steady state values of the process variables can be found in Gopaluni [14]. In this simulation, the CSTR is operated at a steady state working point defined by the coolant flow rate (L/min), the product concentration (mol/L), and the temperature (K). The task here is to build a first-order model between the coolant flow rate and the product concentration. The input and output data are generated through simulation, and noise with zero mean and a specified variance is added to the output data. Since time is needed to measure the concentration, a measurement delay (in minutes) is also added to the output data. The input and output data are shown in Figure 4. In this simulation, part of the output data is randomly missing and the time-delay is restricted to a prespecified range. The proposed GEM algorithm is used to estimate the unknown parameters. The self-validation and cross-validation results are shown in Figures 5 and 6. These results show that the proposed method achieves good identification performance and that the estimated model can capture the dynamic behavior of the CSTR.
5. Conclusion
This paper considers the identification of LTI systems from irregular data sets. Time-delay and missing data are commonly encountered in the process industry, and their presence makes process modeling a challenging task. The identification problem with an incomplete data set in the presence of time-delay is formulated under the scheme of the GEM algorithm, and the model parameters and the time-delay are estimated simultaneously. Numerical examples are presented to demonstrate the efficacy of the proposed method.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] S. Yin, H. Luo, and S. X. Ding, “Real-time implementation of fault-tolerant control systems with performance optimization,” IEEE Transactions on Industrial Electronics, vol. 61, no. 5, pp. 2402–2411, 2014.
[2] S. Yin, S. X. Ding, A. Haghani, and P. Zhang, “A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process,” Journal of Process Control, vol. 22, no. 9, pp. 1567–1581, 2012.
[3] S. Yin, X. Yang, and H. R. Karimi, “Data-driven adaptive observer for fault diagnosis,” Mathematical Problems in Engineering, vol. 2012, Article ID 832836, 21 pages, 2012.
[4] S. Yin, S. X. Ding, A. H. A. Sari, and H. Hao, “Data-driven monitoring for stochastic systems and its application on batch process,” International Journal of Systems Science, vol. 44, no. 7, pp. 1366–1376, 2013.
[5] S. Yin, G. Wang, and H. Karimi, “Data-driven design of robust fault detection system for wind turbines,” Mechatronics, 2013.
[6] H. Dong, Z. Wang, and H. Gao, “Distributed ${H}_{\infty}$ filtering for a class of Markovian jump nonlinear time-delay systems over lossy sensor networks,” IEEE Transactions on Industrial Electronics, vol. 60, no. 10, pp. 4665–4672, 2013.
[7] H. Dong, Z. Wang, J. Lam, and H. Gao, “Fuzzy-model-based robust fault detection with stochastic mixed time delays and successive packet dropouts,” IEEE Transactions on Systems, Man, and Cybernetics B, vol. 42, no. 2, pp. 365–376, 2012.
[8] J. P. Richard, “Time-delay systems: an overview of some recent advances and open problems,” Automatica, vol. 39, no. 10, pp. 1667–1694, 2003.
[9] Q. G. Wang and Y. Zhang, “Robust identification of continuous systems with dead-time from step responses,” Automatica, vol. 37, no. 3, pp. 377–390, 2001.
[10] E. Weyer, “System identification of an open water channel,” Control Engineering Practice, vol. 9, no. 12, pp. 1289–1299, 2001.
[11] F. Ding and J. Ding, “Least-squares parameter estimation for systems with irregularly missing data,” International Journal of Adaptive Control and Signal Processing, vol. 24, no. 7, pp. 540–553, 2010.
[12] Y. C. Zhu, H. Telkamp, J. H. Wang, and Q. L. Fu, “System identification using slow and irregular output samples,” Journal of Process Control, vol. 19, no. 1, pp. 58–67, 2009.
[13] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, John Wiley & Sons, New York, NY, USA, 2007.
[14] R. B. Gopaluni, “A particle filter approach to identification of nonlinear processes under missing observations,” The Canadian Journal of Chemical Engineering, vol. 86, no. 6, pp. 1081–1092, 2008.
Copyright
Copyright © 2014 Xianqiang Yang and Hamid Reza Karimi. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.