Recent Advances on Modeling, Control, and Optimization for Complex Engineering SystemsView this Special Issue
Research Article | Open Access
Christophe Corbier, Jean-Claude Carmona, "Mixed Estimators Variety for Model Order Reduction in Control Oriented System Identification", Mathematical Problems in Engineering, vol. 2015, Article ID 349070, 18 pages, 2015. https://doi.org/10.1155/2015/349070
Mixed Estimators Variety for Model Order Reduction in Control Oriented System Identification
A new family of MLE type estimators for model order reduction in dynamical systems identification is presented in this paper. A family of distributions proposed in this work combines () and () distributions which are quantified by four parameters. The main purpose is to show that these parameters add degrees of freedom (DOF) in the estimation criterion and reduce the estimated model complexity. Convergence consistency properties of the estimator are analysed and the model order reduction is established. Experimental results are presented and discussed on a real vibration complex dynamical system and pseudo-linear models are considered.
The choice of the norm is a fundamental problem in system identification. For some special cases, the least squares (LS) (), the least absolute deviation (LAD) (), and the Chebyshev estimator () methods have been proposed by many authors [1, 2]. It is well known that the norm estimators are efficient when the noise follows the Laplace distribution, the norm estimators are efficient when the noise follows the Gaussian distribution, and the norm estimators are efficient when the noise follows the uniform distribution. Furthermore, it has been shown that the norm estimator is the MLE if noise is a generalized -Gaussian . Thus, we can always find a suitable exponent for the norm method. In the reference paper , the authors discuss smooth and sensitive norms for prediction error system identification when disturbances are magnitude-bounded. They show that a necessary condition for norm to be statistically robust with respect to the family of the -contaminated distribution model with support for some arbitrary is that its second derivative does not vanish on the support and must be strictly positive with respect to . This latter condition is fundamental in our framework by ensuring the parameter estimate variance convergence rate (see Proposition 4.2 and Corollary 4.3 ). However, many drawbacks come from the use of the norm. Firstly, is the unique DOF of the norm. Secondly, the support is a restricted condition to deal with the outliers . Thirdly, the choice of may prevent robustness to the large data. Finally, most of the time, norm is used to the model reduction for robust control  with the restricted case where the model parameters number is limited. The norms are widely used in many domains [7, 8]. Even though the formal framework presents some difficulties, previous works show the interest to use these estimators. In system identification, Chen et al. in  propose an ARMA robust system identification using a generalized norm estimation algorithm. The authors use this norm to identify a system with a non-Gaussian noise, since the classical LSE is related to the situation when the noise distribution is white-Gaussian. Likewise, these authors in  propose the parameter estimation of linear systems with input-output noisy data, supposed to be corrupted by measurement noise unknown distributions.
Here, we deal with the problem of the estimation in system identification in the presence of atypical data (outliers) in the vibration dynamical system output signal, in order to reduce the model order for the controller design. In such a vibration dynamical system, natural outliers occur in the output signal, increasing considerably the model order in the estimation procedure.
To avoid this problem, a new variety of MLE type estimators are considered. We define a mixed distribution with multiple DOF containing and , parametrized by a threshold named scaling factor, denoted by . To ensure both the parameter estimate variance convergence and the model order reduction and apply conditions in  related to the second derivative of the norm on the support with respect to , we show that there exists a tradeoff between , , and . This reduction is efficient only with and , involving .
Model reduction has been subject to considerable interest in many domains. In , a method of the model reduction of the nonlinear complex Ginzburg-Landau equation is presented. In  the authors propose dimension reduction and dynamics of a spiking neural network model for decision making under neuromodulation. The model order reduction is a fundamental step, espacially in control design. To estimate a low order model of a system, several possibilities exist. The most obvious one is to directly estimate a lower order model. As known from, for example, Ljung , the prediction/output error estimate automatically gives models that are approximations of the true system in frequency-weight norm, determined by the input spectrum and noise model. See [14, chapter 8] for more details. A second possibility is to estimate a high order model which is then subjected to model reduction to the desired order. Indeed, some contributions that take into account that the high order model is obtained through an identification experiment when performing model reduction are Porat and Friedlander , Porat , Söderström et al. , Stoica and Söderström , Zhu and Backx [19, chapter 7] and Tjärnström and Ljung , and Tjärnström . Porat and Friedlander study an ARMA parameter estimation via covariance estimates. Recently, in time domain,  presents a relevant paper on a new model order reduction algorithm based on general orthogonal polynomials. The following contributors deal with models having input signals. Söderström et al.  focus on model structures that can be embedded in larger structures which are easy to estimate, such as ARX structures. After estimating the high order structure, they reduce the estimate to the low order structure in a weighted nonlinear least-squares sense. The method is called indirect prediction error method. In , the authors focus on model reduction. They study FIR and OE models. The former is estimated directly from data and the latter is computed by reducing a high order model, by model reduction. For OE models, they show that the reduced model has the same variance as the directly estimated one, if the reduced model class used contains the true system. Recently, in , the authors illustrate procedures to identify a state-space representation of a lossless or dissipative system from a given noise-free trajectory. The idea is to perform model reduction by obtaining a balanced realization directly from data and truncating it to obtain a reduced order model. However, these different model order reduction methods are performed after estimation phases with high model orders. On the other hand, this reduction is made in favorable conditions, where no outliers are present in the dataset and the prediction errors. In practice, we all know that real conditions lead to model complexity strongly increased. The lack of robustness of these methods damages the estimation and the model order. Different works have been performed, using robust estimators. A first alternative is the use of the norm [24–26], as a possibility to reduce the model complexity to identify a complex dynamical system, for example, an acoustic duct for active noise control (ANC). The authors propose relevant OE (output error) models with complexity equal to . The second one is to use a symmetrix convex Huber’s function . Recently, in [28, 29], the identification of the same plant, using estimators with low values of the scaling factor in Huber’s function, proposes OE- models with complexity equal to .
Here, we will tackle the distribution approach and, from this, we will propose a parameterized robust estimation criterion (PREC) using a symmetric convex function, offering to the user more flexibility in terms of an extra DOF, in order to improve the estimation balance robustness/performances. We will show that the number of DOF in this function facilitates the model order reduction. We will see that the reduced order models are provided by the estimation itself, where classical approaches separate the phases, estimation and then reduction. Indeed, we will demonstrate that the model order thus obtained remains lower than the corresponding estimate. Finally, experimental results on the acoustic duct will confirm these results.
We start in Section 2 by introducing the mathematical background related to definition and properties of the estimator. We develop the consistency and convergence properties of this estimator. The application to the system identification is presented in Section 3. Section 4 proposes the main results concerning the model order reduction. In Section 5, the improvement of the model order reduction with respect to the number of DOF is presented. Experimental results based on a vibration dynamical system are presented and discussed in Section 6. Conclusions and perspectives are drawn in Section 7.
2. Definition and Properties of the Estimators
The new family of -distributions proposed in this paper combines and distributions, quantified by four parameters, (), (), (), and (). Hence, it can be viewed as a generalization of the redescending mixed distributions, which corresponds to the M-estimates and (see [27, chapter 4, page 84-85]). A formal definition of a probability density function (pdf) of a random variable is as follows.
Definition 1. Consider a function , such thatwhere withwhere and are, respectively, the complete and incomplete Euler’s gamma functions. We can easily verify that, for all , and , which ensure that is a probability density. The parameter is defined as a threshold named scaling factor. Its value depends on the degree of the contamination of the random variables . Moreover, a low value will ensure a good robustness and a high value a good efficiency of the estimation.
The pdf belongs to the maximum likelihood estimators (MLE) class. Then, there exists a symmetric function variety such that .
Definition 2. Let one define a new estimator variety as a estimator, denoted by . Let be a probability space and let be a sequence of i.i.d.r.v’s with values in . Let be a Borel subset in . Let be a symmetric function such that is measurable for each . The estimator is defined by a minimum of the formwherewith and for all .
In robust statistics, continuity conditions are necessary to ensure good estimations [27, chapter 2, page 24-25]. Therefore, these conditions must be applied for both and its first derivative. This latter must be a bounded continuous function. From these considerations, we define the dimensional continuity conditions (DCC) as follows:where and .
The dimensional is given from DCC and, after straightforward calculations, we obtain Equation (6) allows to show that, for the conditions and , we get . We now define the gross error model (GEM) in the mixed framework as a generalization of the GEM given in [27, chapter 1, page 12].
Definition 3. Let be a fixed probability distribution having a twice differentiable density , such that is convex on the convex support of . Let be the level of contamination and let be the set of all probability distributions arising from through the -contamination modelHere, is the set of all probability measures on the real line. The corresponding pdf of , contained in , is given by (1), and it puts all contamination outside . A suitable value of should efficiently reduce the propagation of outlier in the prediction errors. This is done by . Accordingly, the treatment of the estimated residuals is focused on the outlier occurrence, so the following errors rapidly decrease and they are advantageously treated by .
Now, our goal is to provide the relation between the level of contamination and the scaling factor . In practice, the robustness is defined by a suitable value of that fixes the -contamination model.
Theorem 4. If one adjusts the scaling factor in , such that , from the condition , and are linked bywith where , , is incomplete Euler’s gamma function.
Figure 1 shows an example of the level of contamination as a function of , , parameterized by , with . A large value of the scaling factor means that the probability distribution of the prediction errors is weakly disturbed, corresponding to (see Figure 1). Conversely, a small value corresponds to a pdf more disturbed. For , the curves have a minimum, meaning that a particular value of the scaling factor renders the corresponding pdf weakly contaminated.
Definition 5. Let be a symmetric function given bywhere is the unit function with the condition . Let us introduce two index subsets and , respectively, related to the and contributions, defined by and . From this, the -PREC to be minimized is given bywhere is the estimated modeling error and is the sign function such that for , for , and for .
In order to prove the consistency and convergence of the robust estimator , we need the following lemma related to the moment of order of the random variable .
Lemma 6. For the considered pdf , the moments of order of a random variable arewith
See Appendix B for the proof.
Theorem 7. Let be a compact subset of and let be the estimator obtained as a solution of the problemLet be the global minimum obtained as a solution of the problemand then is a consistent estimate of ; that is,
In order to do so, we use the (MLE) consistency result by Newey and McFadden  and show that all the assumptions of their theorem hold.
Theorem 8. Suppose that are iid rv’s with pdf and(i);(ii), which is compact;(iii) is continuous at each w.p.1;(iv).Then .
See Appendix C for the proof.
Remark 9. Before presenting the theorem related to the asymptotic distribution of the estimator, let us notice that the standard asymptotic normality results for MLE requires that the PREC be twice continuously differentiable, which is not the case here. There exist, however, asymptotic normality results for nonsmooth functions and we will hereafter use the one proposed by Newey and McFadden  and Andrews . The basic insight of their approaches is that the smoothness condition of the PREC, , can be replaced by a smoothness of its limit, which in the standard maximum likelohood case corresponds to the expectation , with the requirement that certain remainder terms are small. Hence, the standard differentiability assumption is replaced by a stochastic differentiability condition, which can then be used to show that the MLE is consistent and asymptotically normal. Let us consider the -function as the derivative with respect to of denoted by . If this function is differentiable in , one can establish the asymptotic normality of by expanding about using element by element mean value expansions. This is the standard way of establishing asymptotic normality of the estimator. In a variety of applications, however, is not differentiable in , or not even continuous, due to the appearence of a sign function. In such a case, one can still establish asymptotic normality of the estimator provided is differentiable in . Since the expectation operator is a smoothing operator, is often differentiable in even though is not.
Let us consider the following theorem establishing the asymptotic distribution of the estimator.
Theorem 10. Let be iid rv’s from the pdf with an unknown parameter , with a compactness and interior of . Then the MLE of is consistent and asymptotically normal:where is the asymptotic covariance matrix.
In order to do so, we use Theorem 7.1 given by Newey and McFadden  and show that all the assumptions of their theorem hold.
Theorem 11. Suppose , , and(i) is maximized on at ;(ii) is an interior point of ;(iii) is twice differentiable at with nonsingular second derivative ;(iv);(v)for any , with the remainderand then .
The proof is given in Appendix D.
3. Application to the System Identification
Having defined the mixed framework, our purpose is now to apply this approach to the system identification by introducing the -PREC and its first and second derivatives, to express the asymptotic covariance matrix of the estimator. Before this, let us recall the prediction error context.
3.1. Prediction Error Approach
Throughout the paper, we denote the input signal by , the output signal by , and the disturbance signal by as a random variable sequence with zero mean and variances , and is the total number of measured data. We assume that is generated according towhere is a linear time-invariant system, usually referred to as the true system and is the noise filter. In the case of OE models, the model set is parameterized by a -dimensional real-valued parameter vector ; that is,with , , and . Moreover, is the lag operator such that , . The prediction error (also named estimated residuals) is with the prediction model, the regressor, and the gradient of .
3.2. Limit Hessian and -Matrix
The main purpose of this section is to express the asymptotic covariance matrix of the estimator for prediction error estimates in case, essentially, a true description of the system is available in the model set. The condition then becomes , where is a model structure, a mapping from a parameter space to a set of models. Let us consider the convergence domain defined by . We put ourselves in the case where the global minimum is equal to the true parameter vector ; in this case, . To establish this asymptotic covariance matrix, we need the limit Hessian and the -matrix. Using Theorem 10, the covariance matrix is given bywith the limit Hessian and the -matrix yielded, respectively, bywhere , . See Appendix E for the proof.
Remark 12. It is easy to show that the second derivative of is given by withSince , the second term on the right-hand side of (23) is negative. Accordingly, there exists a limit value of denoted by such that (23) is positive and then verifies Proposition 4.2 and Corollary 4.3 in .
3.3. Robust Final Prediction Error
Similar to the RFPE in Huber’s framework , we now establish a quality measure for a given model. Let be the robust model to be evaluated. It is estimated within the model structure . Let us denote , this quality measure that depends on the actual data record . We classically havewhere is the limit criterion. Let be the argument of the minimum of this limit criterion and let us expand around , and we obtainSimilarly, since ,From the Hessian uniform convergence theorem and the expectations of (25) and (26), we obtain for the -RFPEIf a single run is taken into account, then we can classically replace the expectation of with only one observation available. Therefore,Let us denote and . We obtainwhere is the identity matrix. Now, in order to completely define the expression of the -RFPE, we need the following result due to the Schur complement.
Lemma 13. Let be a symmetric matrix of the form , where , , and are real and positive definite matrices. If is invertible, then the following property holds:
In the mixed context, the estimation procedure is mainly a estimation since the impulsional effect of the outlier is rapidly treated by the . Accordingly, . Moreover, , , are positive definite matrices. Using Lemma 13 with , , and , we havewhere is the model complexity and . Therefore, and the trace of becomeswith being the model complexity. The -RFPE becomes
4. Model Order Reduction
Let two model classes and be given, respectively, related to Huber’s and estimators. To avoid some drawbacks with the notation of the parameter vectors, model classes and are parameterized by and , respectively. Recall that the PREC in Huber’s approach is defined as follows:See [29, chapter 4, page 50]. Let us consider the following conditions C1:(1) and ;(2) and ;(3) with ;(4).
Theorem 14. Assume that the conditions C1 are verified. Let be the innovation outlier; then
See the proof in Appendix F. Let be Huber’s estimator. Using in [29, chapter 7, page 125] and Lemma 13, since and , then From (33) and conditions C1, since and , we obtainFrom the curves of and using the conditions C1, thenLet us denote , where and are the DOF, respectively, related to Huber’s and symmetric functions.
Theorem 15. If and , then
The proof is given in Appendix G. This tuning offers to the user the necessary DOF to estimate and validate the reduced order models, as described in the following section.
5. Effect of DOF in the Symmetric Function to the Model Order Reduction
Here, we focus on a new vision on the link between the model reduction objective and its link to the estimation chosen, in particular the number of DOF of this symmetric function. Let be the number of DOF. We present in Table 1 different symmetric function related to the MLE context and the corresponding with their associated tuning parameters.
First, the reason for which the LSE and LSAD estimators do not allow the model order reduction seems to be due to the number of DOF; here, . From the comparison between Akaike’s FPE and the -FPE , we obtain . The condition is ensured only by the value of the variance of the estimation. Sometimes, this one is obtained with difficulty, especially in the presence of some outliers in the prediction errors. Since , the model complexity reduction remains difficult. From our point of view, the introduction of DOF in the estimation symmetric function facilitates the model complexity reduction. For example, the DOF in Huber’s symmetric function may ensure both the robustness and the model order reduction. Therefore, comparing (36) and the -FPE, we have . Consequently, the model order reduction may be obtained by tuning in Huber’s function to have the condition . By extention, with three DOF, the condition in (G.2) may be obtained by tuning (through ), , and . Thus, the flexibility of the symmetric function ensures the model order reduction. Table 2 summarizes these main results related to the effect of the norm’s DOF on the model order reduction.
6. Experimental Results
This section presents different results related to the identification of a real system, namely, an acoustic duct plant, in order to estimate and validate reduced order models, using estimators. We compare the estimation results with those given by Huber’s [28, 29], LSAD, LSE , and estimators . As shown in Figure 3, we have to find discrete-time dynamical models with reduced orders of the plant including the secondary loudspeaker, its amplifier, the acoustic secondary path, the measurement microphone, and its amplifier/antialiasing filter. For more details, the reader could refer to . The identification has been realized from a PRBS (pseudrandom binary sequence) as excitation input signal, with a length and level , approximating a white noise upon the frequency range . The sampling period is μs and the number of data is . The pure delay of the duct is given as an integer of the sampling period. Here, . Figure 2 shows the output signal given by the microphone during the recording phase of the data set of the process. An estimations campaign has been conducted to get the value of ensuring Proposition 4.2 and Corollary 4.3 in . was varied over the range with an incremental step of . The convergence is ensured for . We deduce that . In the sequel, for the estimation context, , , and , and we will denote all the models by .
6.2. Reduced Order Models
Table 3 presents the main results from , LSE, LSAD, , and estimators. The model order reduction is clearly shown and we can notice that the complexity reduction between LSE and the low dimension of the new mixed estimator is . In Table 4, the parameters of the - and - models are given. Figures 4 and 5 show the frequency responses of the - and - models, respectively, versus the spectral estimation of the process. Even though the model presents a lower complexity than Huber’s model, its frequency response especially in low frequencies particularly interesting in controller design is good. Remark that, for estimator, the fit is low and the model order increases considerably with .
In this paper, a new robust estimator based on a mixed distribution has been proposed in order to reduce the estimated model complexity. The main consistency and convergence properties of the robust estimator have been established and it has been shown that the tuning of DOF in the symmetric function improves the model order reduction. In effect, the additional DOF in the symmetric function of the estimation criterion present the advantage to tune its parameters. This flexibility offers to the user the choice to estimate and reduce the estimated model complexity in the same procedure. We presented experimental results from a vibration real process and we showed the effective reductions. Many aspects of these studies are opened to further researches. First, from a formal point of view, we should study the generalization of multiple DOF in a mixed symmetric function by investigating the link between the number of DOF and the model reduction. Then, it would be relevant to apply our framework to a process with a level of contamination of the data more significant.
A. Proof of Theorem 4
For all , the pdf has a total mass 1. Hence,Since is a symmetric function, then (A.1) becomeswith
Case 1 (for ). With , we get . Moreover, and . We obtainThe integral in (A.4) is equal to , which proves .
Case 2 (for ). From straightforward calculations, expression only necessitates the result of the integral . With