#### Abstract

This paper considers the parameter estimation for linear time-invariant (LTI) systems in an input-output setting with output error (OE) time-delay model structure. The problem of missing data is commonly experienced in industry due to irregular sampling, sensor failure, data deletion in data preprocessing, network transmission fault, and so forth; to deal with the identification of LTI systems with time-delay in incomplete-data problem, the generalized expectation-maximization (GEM) algorithm is adopted to estimate the model parameters and the time-delay simultaneously. Numerical examples are provided to demonstrate the effectiveness of the proposed method.

#### 1. Introduction

The advanced process control theories have enjoyed rapid development in the past several decades to meet the growing demands of closed-loop system performances, such as improved process safety and efficiency of plant operation, consistent product quality, and economic optimization [1]. These control strategies have improved the process automation and stability through providing control solutions for the process operated under the abnormal working conditions, such as process fault [2–5], network transmission delay [6], data packet dropouts [7], and modeling error. Generally, the implementation of these control strategies relies on the understanding of the process dynamics and the availability of an accurate mathematical model of the process. In view of the difficulties and complexities imposed by modeling using first principle method, the data-driven modeling method, in which the process model is retrieved from the process data, has become a main modeling method.

Typically, the process data used in process modeling are generated by performing an identification experiment, in which a testing signal is designed and utilized to excite the process. Most of the conventional parameter estimation methods, such as prediction error method (PEM), instrumental variable (IV) method, and subspace method, assume that the identification data are sampled regularly and recorded properly. However, this is not always true in practical industry. For example, in the development of an inferential model for the sulfur content in the gas oil product, the sulfur concentration cannot be measured directly and the lab analysis is required which takes a long time. The process variable can be sampled in every minute, but the sulfur concentration is only available in every twelve hours. Another example is the industrial process with data transmission through the network. The recorded process data are corrupted by many network-induced problems, such as transmission delay and packet dropout or missing. Therefore, parameter estimation with irregular data has not been extensively investigated in the literature.

Time-delays are commonly encountered in various engineering systems, such as chemical processes, mechanical systems, network control systems, transmission line, and economic systems [8]. Since the existence of time-delay usually causes performance degradation of the inferential model and is frequently a source of instability of the closed-loop system, it should be handled carefully in the modeling process. Common methods to estimate the time-delays are nonparametric methods (e.g., step test or correlation analysis) and grid searching method. For example, Wang and Zhang [9] considered the robust identification problems of linear continuous time-delay systems from step responses. A linear regression equation was derived from the solution of the output time response and its various-order integrals and solved by using IV-least squares (LS) method. The parameters of the transfer function were then recovered from the LS solution. Weyer [10] considered to build a model for the open water channel. The model parameters were estimated using the grid searching method in which one model was established for each time-delay in a range. The final model was selected as the model which gave the best prediction performance for a validation data set. The time-delay estimation methods mentioned above are to estimate the time-delay and model parameters in a separate way.

Missing data problem is very common in process industry. A special example is the irregularly sampling system. Many critical parameters, such as the product concentration, steam quality, and boiling point, cannot be measured directly by using the sensors. These parameters are measured through lab analysis, so only the slow rate data are available. However, the process variables, such as the temperature, pressure, and flow rate, can be measured on-line in fast rate by using the sensors. Therefore, we can treat the data samples between the slow rate data as missing data. Another example is the network control system in which data transmit via the wireless network or internet. Data missing occurs due to data packet dropout or missing. Other reasons for missing data are sensor fault, data recording system malfunction, and so forth. Several methods have been reported in the literature to handle missing data problem in system identification. For example, F. Ding and J. Ding [11] proposed an auxiliary model-based approach to cope with the problems of parameter estimation and output estimation with irregularly missing output data using the PEM method. The outputs of the auxiliary model were used in the identification process. Zhu et al. [12] considered the identification of systems with slowly and irregularly sampled output data. The output error method was employed to estimate the fast rate model based on the fast input and slow output data. However, the methods mentioned above just used part of the process data, which may lead to information missing. Moreover, the statistical properties of the model parameters and the process noise cannot be given in these methods.

The work introduced in this paper aims at handling the identification problem of the LTI systems with missing output data in the presence of time-delay. The identification problem is formulated under the scheme of the generalized expectation-maximization (GEM) algorithm and the time-delay and missing output data are handled simultaneously. The GEM algorithm consists of expectation step (E-step) and maximization step (M-step). In the M-step, the maximization problem is transformed into an equivalent minimization problem and this problem is solved by using a general numerical optimization algorithm.

The rest of this paper is organized as follows. The problem statement is presented in Section 2. A brief revisit of the GEM algorithm and the mathematical formulation of the identification of LTI time-delay systems with incomplete data set are given in Section 3. Numerical examples are presented in Section 4 to show the effectiveness of the proposed method. The conclusions are given in Section 5.

#### 2. Problem Statement

Consider the LTI system described by the following output error (OE) time-delay model: where is the time-delay which is assumed to be integer multiples of the sampling period, is the Gaussian white noise with zero mean and variance , and and are the output and input, respectively. The transfer function has the following form: Here, we assume that the model orders and are known a priori and the time-delay is uniformly distributed in a known range of .

The identification data are collected. We denote as and as . Since part of the output data are missing completely at random (MCAR), the output data set can be divided into and . Therefore, the identification problem is to estimate the parameters , the noise variance , and the time-delay based on the identification data and .

#### 3. Parameter Estimation Using the GEM Algorithm

##### 3.1. GEM Algorithm Revisit

The GEM algorithm is a general-purpose iterative optimization algorithm to derive the maximum likelihood (ML) estimate and it has attracted great attentions of the researcher due to its flexibility in handling the missing data or hidden state [13]. Denote the missing data set by and the observed data set by . The main idea of the GEM algorithm is that, instead of optimizing the likelihood of the observed , the conditional expectation of the complete data likelihood function with respect to the missing data set is calculated in the E-step and the maximization problem is solved in the M-step. The procedures of the GEM algorithm to calculate the ML estimate can be described as follows [13]:

E-step: given the and the parameter estimate in previous iteration, the -function can be calculated by

M-step: find the to increase over its value at ; that is,

The E-step and M-step alternate until the relative change of the parameter estimate between neighboring iterations is smaller than a prespecified arbitrary small constant or the maximal iteration number is achieved.

##### 3.2. LTI Time-Delay System Identification with Missing Output Using GEM Algorithm

Here, we treat the time-delay as a hidden state variable. The observed data set is constructed as and the missing data set is constructed as . The parameter vector is constructed as .

Based on the Bayesian property, the likelihood function of the complete data set can be decomposed into

The term can be further decomposed into Based on (1) and (2), depends only on the previous input sequence , the time-delay , and the parameter vector . Therefore, (6) can be rewritten as

Since the time-delay is uniformly distributed in the range , the probability of taking any value in this range is a constant. Since the input is measurable data and it is independent of the parameter vector , the term is a constant. Therefore, the last two terms of (5) will not play a role in the following derivations. The complete data likelihood function can be further written as where .

Therefore, the conditional expectation of the log complete data density in (3) can be written as

The expectation is firstly taken with respect to the discrete variable ; then we have

The expectation is then taken with respect to the continuous variable , so we have

In order to calculate , the unknown terms should be calculated firstly. Consider

Therefore, the -function can be rewritten as

In the M-step of the GEM algorithm, the unknown parameters should be estimated to increase the -function by solving an optimization problem. Taking the gradient of the -function (13) with respect to the and setting it to zeros, we have Substituting into the -function (13), we get Based on the monotonicity of the log function, the problem is transformed into optimizing the following cost function:

Here, we introduce the variable denoting the noise-free output with time-delay . Based on (1) and (2), we have where . Therefore, the cost function can be rewritten as where . However, the cost function (18) cannot be optimized directly due to the unmeasurable . Here, we adopt the auxiliary model principle and the auxiliary model can be constructed based on the estimates obtained in the previous iteration. That is, where . Therefore, the cost function (18) with substituted by can be optimized by using the damped Newton algorithm, where where is a constant.

The time-delay can be selected as the delay in the range with maximal posterior probability. That is, The E-step and M-step alternate until the convergence condition of the GEM algorithm is met.

#### 4. Simulation Examples

##### 4.1. A Numerical Simulation Example

Consider the following LTI time-delay system described by the OE time-delay model: The input data and output data are generated by simulation and the noise with zero mean and variance is added to the output. The input and output data are shown in Figure 1. In the simulation, output data are randomly missing. The parameter range of the time-delay is set to . The method proposed in this paper is used to estimate the parameters and the time-delay. The parameter estimate trajectories of the model parameters and the noise variance are shown in Figures 2 and 3, respectively. The estimated time-delay is which is consistent with the true time-delay. To further verify the effectiveness of the proposed method, the simulations are also performed with output data missing and output data missing. The estimated parameters after 13 iterations are summarized in Table 1. It can be seen from these figures and the table that the proposed GEM algorithm has a good identification performance.

**(a)**

**(b)**

##### 4.2. The Continuous Stirred Tank Reactor

The Continuous Stirred Tank Reactor (CSTR) is a benchmark example used to test the performances of different modeling and control algorithms and the first principle model of the CSTR is described as [14] where the product concentration and the temperature are output variables and the coolant flow rate is the input variable. The steady state values of the process variables can be found in Gopaluni [14]. In this simulation, the CSTR is operated at a steady state working point which is at L/min, mol/L, and K. The task here is to build a first-order model between and . The input and output data are generated through simulation and the noise with zero mean and variance is added to the output data. Since time is needed to measure the concentration , so the measurement delay with minutes is also added to the output data. The input and output data is shown in Figure 4. In this simulation, output data are randomly missing and the parameter range of the time-delay is set to . The proposed GEM algorithm is used to estimate the unknown parameters. The estimated parameters are , , , and . The self-validation and the cross-validation results are shown in Figures 5 and 6. It can be seen from these results that the proposed method has a good identification performance and the estimated model can capture the dynamic behavior of the CSTR.

**(a)**

**(b)**

#### 5. Conclusion

This paper considers the identification problem of LTI systems with irregular data set. The time-delay and the missing data are commonly encountered problems in process industry and the existence of these problems makes the process modeling a challenging task. The identification problem with incomplete data set in the presence of time-delay is formulated under the scheme of the GEM algorithm and the model parameters and the time-delay are estimated simultaneously in this algorithm. Numerical examples are presented to demonstrate the efficacy of the proposed method.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.