Mathematical Problems in Engineering

Volume 2015 (2015), Article ID 465372, 12 pages

http://dx.doi.org/10.1155/2015/465372

## Multimode Process Monitoring Based on Sparse Principal Component Selection and Bayesian Inference-Based Probability

^{1}Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai 200237, China

^{2}Software Engineering Institute, East China Normal University, Shanghai 200062, China

Received 6 May 2015; Revised 27 July 2015; Accepted 28 July 2015

Academic Editor: Jean J. Loiseau

Copyright © 2015 Xiaodong Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

According to the demand for diversified products, modern industrial
processes typically have multiple operating modes. At the same time,
variables within the same mode often follow a mixture of Gaussian
distributions. In this paper, a novel algorithm based on sparse
principal component selection (SPCS) and Bayesian inference-based
probability (BIP) is proposed for multimode process monitoring. SPCS
can be formulated as a just-in-time regression between all PCs and
each sample. SPCS selects PCs according to the nonzero regression
coefficients which indicate the compact expression of the sample.
This expression is necessarily *discriminative*: among all
subsets of PCs, SPCS selects the PCs which most compactly express the
sample and rejects all other possible but less compact expressions.
BIP is utilized to compute the posterior probabilities of each monitored
sample belonging to the multiple components and derive an integrated
global probabilistic index for fault detection of multimode processes.
Finally, to verify its superiority, the SPCS-BIP algorithm is applied
to the Tennessee Eastman (TE) benchmark process and a continuous stirred-tank
reactor (CSTR) process.

#### 1. Introduction

Over the past two decades, with the development of complex chemical processes and the growing demand for plant safety and stable product quality, timely process monitoring has gained importance. Because large amounts of data can be gathered by distributed control systems (DCSs), multivariate statistical process monitoring (MSPM) algorithms have received great attention. Among these algorithms, principal component analysis (PCA) and partial least squares (PLS) are the most widely used [1–8]. Both algorithms project high-dimensional data onto lower dimensional subspaces. Normal and abnormal process conditions can then be separated by means of Hotelling's $T^2$ or the squared prediction error (SPE) [9–13]. Other complementary MSPM algorithms, including independent component analysis (ICA), Fisher discriminant analysis (FDA), and canonical variate analysis (CVA), are used to overcome some limitations of PCA/PLS-based monitoring schemes [14–17]. However, most MSPM algorithms rely on the assumption that the system is in a single operating region and that the data follow a Gaussian distribution. In chemical processes, operating condition shifts are often encountered due to changes in factors such as feedstock, product specification, set points, and manufacturing strategy. When a process is running under substantially different operating conditions, only a small number of variables actually follow a Gaussian distribution [18]. As a result, the multimodality of the data distribution can make conventional MSPM algorithms unsuitable for monitoring. To address these problems, it is necessary to develop new algorithms.

In the literature, multiple models can be built to fit the individual operating modes, but there are two essential issues that need to be addressed. One is how to divide the training data correctly into multiple subsets corresponding to different operating modes. To solve this issue, many clustering algorithms have been applied. Among the traditional approaches, Ge and Song [19] used the fuzzy C-means clustering algorithm to separate the training data set according to the unique characteristics of each mode. He et al. [20] applied the $k$-nearest-neighbor method. Srinivasan et al. [21, 22] identified the different operating modes by evaluating the Euclidean distances between samples in a constructed data window and then applied dynamic PCA-based similarity measures to cluster the samples. Liu and Chen [23] developed a method using Bayesian classification for selecting multiple regions from a training data set. Zhao et al. [24] presented a multiple principal component analysis (MPCA) algorithm that selects one suitable model to monitor multimode processes. The other issue is how to determine the final results. A proper measure should be employed to determine which model is the most suitable one for monitoring at the current moment. Ng and Srinivasan [22] identified the most suitable PCA model through a minimized distance reflecting both the $T^2$ and SPE values. Zhao et al. [24] chose the local PCA model with the minimum SPE value. Natarajan and Srinivasan [21] used the distance between the sample and the center of the local models as a criterion. Yu and Qin [25] performed Bayesian inference on the posterior probabilities calculated by the Gaussian mixture model (GMM) or the nonlinear kernel GMM. Meanwhile, Ge and coworkers [26, 27] took advantage of Bayesian inference to softly combine the monitoring results computed by local models built by means of probabilistic PCA (PPCA), factor analysis (FA), or subspace algorithms.

To date, the problem of how to correctly divide the training data into multiple subsets can be solved successfully by many of the algorithms mentioned in the previous paragraph. However, some issues remain to be resolved; the most important one is how to select the key principal components (PCs) when using one suitable model for process monitoring. Many algorithms for selecting PCs have been proposed, such as cumulative percent variance (CPV) [28], variance of reconstruction error (VRE) [29], and cross validation (CV) [30]. Generally, most of the classical algorithms take only normal operational observations into account and select the first several PCs with larger variance. However, PCs with larger variance in normal data cannot guarantee the capture of the largest variations in online fault data. Jolliffe [31] suggested that the last PCs may be as important as those with large variance. Togkalidou et al. [32] noted that the PCs with larger variance do not always contain much information for prediction. Nevertheless, this issue has been insufficiently discussed in PCA-based process monitoring, and a standard PC selection rule has still not been established.

Fortunately, many researchers have become aware of the inherent defects of the classical PCA algorithm. Many have sought a subspace spanned by the key PCs, which contains the most important information for process monitoring. Peng et al. [33] suggested a new feature selection algorithm, named the minimal-redundancy-maximal-relevance criterion (mRMR). It is based on mutual information and selects the features with the highest relevance to the target class. Jiang et al. [34] put forward sensitive principal component analysis for fault detection and diagnosis in chemical processes. They pointed out that the PCs selected by the PCA algorithm are not always the key PCs for fault detection. Their task was to find the sensitive PCs which are related to fault information. Arbel et al. [35] proposed that the process variables that are preponderant in achieving specific objectives need to be selected.

In this paper, a process monitoring algorithm using multisubspace sparse principal component analysis with the BIP algorithm is put forward. First, variables are divided into different subblocks corresponding to different units or pieces of equipment to reduce the complexity of process analysis. Second, by using the BIP algorithm, multimode data in each subblock are divided into multiple subgroups. BIP can compute the posterior probabilities of each monitored sample belonging to the multiple components and derive an integrated global probabilistic index for fault detection of multimode processes. The PCs with larger variances selected by the PCA algorithm are not always related to fault information. Sparse principal component selection (SPCS) takes the information of both normal and abnormal observations into account. The algorithm is formulated in a just-in-time form that constructs an elastic net regression between all PCs and each sample. SPCS selects the PCs corresponding to the nonzero regression coefficients, which indicate the compact expression of the sample. This expression is necessarily *discriminative*: among all subsets of PCs, SPCS selects the PCs which most compactly express the sample and rejects all other possible but less compact expressions. Third, the key PCs are selected by SPCS in each subgroup to solve the problem of fault information loss. It should be stressed that the subspace spanned by the selected key PCs is the feature subspace. Finally, in order to verify the superiority of the SPCS-BIP algorithm, it is applied to the Tennessee Eastman (TE) benchmark problem and a continuous stirred-tank reactor (CSTR) process.

#### 2. Preliminaries

##### 2.1. Principal Component Analysis

Principal component analysis is a multivariate statistical technique that is widely used in chemical process monitoring, fault detection, and so forth [36–38]. Let $x \in \mathbb{R}^{m}$ represent an $m$-dimensional sample vector and $X \in \mathbb{R}^{n \times m}$ denote a data matrix with zero mean and unit variance, where $n$ is the number of samples and $m$ is the number of variables in the process. From the statistical viewpoint, the PCA decomposition can be obtained by singular value decomposition (SVD) [28, 34]:

$$X = T P^{T} + E, \tag{1}$$

where $T \in \mathbb{R}^{n \times k}$ and $P \in \mathbb{R}^{m \times k}$ are the score matrix and the loading matrix, respectively, $E$ is the residual matrix, and $k$ is the number of principal components retained. The loading matrix can be obtained by eigenvalue decomposition of the covariance matrix as follows:

$$\frac{1}{n-1} X^{T} X = [P \;\; \tilde{P}] \, \Lambda \, [P \;\; \tilde{P}]^{T}, \tag{2}$$

where $\Lambda$ denotes the eigenvalue matrix and $[P \;\; \tilde{P}]$ contains the loading matrices of the principal component subspace and the residual subspace, respectively.
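As a minimal sketch of the decomposition described above, the following NumPy snippet autoscales a data matrix, computes the loadings by SVD, and splits them into a retained part and a residual part. The function name `pca_svd`, the synthetic data, and the choice of two retained PCs are assumptions made only for illustration; they are not taken from the paper.

```python
import numpy as np

def pca_svd(X_raw, k):
    """PCA of autoscaled data via SVD; k is the number of retained PCs."""
    # Autoscale each variable to zero mean and unit variance.
    X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
    n = X.shape[0]
    # Thin SVD, X = U diag(s) Vt: the rows of Vt are the loading vectors.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / (n - 1)     # eigenvalues of the covariance matrix
    P = Vt[:k].T                 # loadings of the principal component subspace
    P_res = Vt[k:].T             # loadings of the residual subspace
    T = X @ P                    # score matrix
    E = X - T @ P.T              # residual matrix
    return X, T, P, P_res, eigvals, E

# Synthetic data (illustrative only): 5 variables driven by 2 latent factors.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
X_raw += 0.05 * rng.normal(size=(200, 5))

X, T, P, P_res, eigvals, E = pca_svd(X_raw, k=2)
print("eigenvalues:", np.round(eigvals, 3))
```

Because the data are autoscaled, the eigenvalues sum to the number of variables, which is the quantity that variance-based selection rules such as CPV work with.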

##### 2.2. Construction of Finite Gaussian Mixture Model Based on EM

For a process running under multiple operating conditions, the assumption of a multivariate Gaussian distribution becomes invalid owing to mean shifts or covariance changes [21, 22]. In this situation, a local Gaussian distribution is still appropriate to characterize each subset of measurement data from the same operating condition. Therefore, the finite Gaussian mixture model (FGMM) is well suited to represent data sources driven by different operating modes [13, 24, 25].

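To construct an FGMM, given a set of training samples $X = \{x_1, x_2, \ldots, x_n\}$, the log-likelihood function can be expressed as

$$\log L(X, \Theta) = \sum_{j=1}^{n} \log \left( \sum_{i=1}^{K} \omega_i \, g(x_j \mid \mu_i, \Sigma_i) \right), \tag{3}$$

and the parameter estimation problem is formulated as

$$\hat{\Theta} = \arg\max_{\Theta} \, \log L(X, \Theta), \tag{4}$$

where $\Theta = \{\omega_1, \ldots, \omega_K, \mu_1, \ldots, \mu_K, \Sigma_1, \ldots, \Sigma_K\}$, $\omega_i$ is the prior probability of the $i$th component, and $K$ is the number of Gaussian components included in the FGMM. $\mu_i$ is the mean vector and $\Sigma_i$ is the covariance matrix of the $i$th component, whose multivariate Gaussian density is denoted by $g(x \mid \mu_i, \Sigma_i)$.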

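Many learning algorithms, such as maximum likelihood estimation (MLE), expectation-maximization (EM), and the Figueiredo-Jain (F-J) algorithm, have been put forward for mixture model estimation [39, 40]. As a more tractable numerical strategy, the EM algorithm has been widely used in practice to obtain the maximum likelihood estimates of the distribution parameters [39]. The EM algorithm is implemented iteratively by repeating the expectation step (E-step) and the maximization step (M-step) to calculate the posterior probabilities and then the corresponding distribution parameters until a convergence criterion on the log-likelihood function is satisfied. Given the training data $X$ and an initial estimate $\Theta^{(0)} = \{\omega_i^{(0)}, \mu_i^{(0)}, \Sigma_i^{(0)}\}_{i=1}^{K}$, the iterative E-step and M-step are expressed as follows:

(i) E-step:

$$P^{(s)}(C_i \mid x_j) = \frac{\omega_i^{(s)} \, g\big(x_j \mid \mu_i^{(s)}, \Sigma_i^{(s)}\big)}{\sum_{l=1}^{K} \omega_l^{(s)} \, g\big(x_j \mid \mu_l^{(s)}, \Sigma_l^{(s)}\big)}, \tag{5}$$

where $P^{(s)}(C_i \mid x_j)$ denotes the posterior probability of the $j$th training sample within the $i$th Gaussian component $C_i$ at the $s$th iteration;

(ii) M-step:

$$\begin{aligned}
\mu_i^{(s+1)} &= \frac{\sum_{j=1}^{n} P^{(s)}(C_i \mid x_j) \, x_j}{\sum_{j=1}^{n} P^{(s)}(C_i \mid x_j)}, \\
\Sigma_i^{(s+1)} &= \frac{\sum_{j=1}^{n} P^{(s)}(C_i \mid x_j) \big(x_j - \mu_i^{(s+1)}\big)\big(x_j - \mu_i^{(s+1)}\big)^{T}}{\sum_{j=1}^{n} P^{(s)}(C_i \mid x_j)}, \\
\omega_i^{(s+1)} &= \frac{1}{n} \sum_{j=1}^{n} P^{(s)}(C_i \mid x_j),
\end{aligned} \tag{6}$$

where $\mu_i^{(s+1)}$, $\Sigma_i^{(s+1)}$, and $\omega_i^{(s+1)}$ are the mean, covariance, and prior probability of the $i$th Gaussian component at the $(s+1)$th iteration, respectively.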
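The E-step and M-step above can be sketched in a few lines of NumPy. This is only an illustrative implementation under simple assumptions: the function names `gaussian_pdf` and `em_fgmm`, the two-cluster synthetic data, the initial estimate, and the stopping tolerance are all hypothetical, and no safeguards against degenerate covariance matrices are included.

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate Gaussian density g(x | mu, Sigma) at each row of X."""
    m = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)            # whitened deviations
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return np.exp(-0.5 * (m * np.log(2.0 * np.pi) + log_det + (z**2).sum(axis=0)))

def em_fgmm(X, omega, mu, Sigma, max_iter=200, tol=1e-8):
    """EM iterations for a finite Gaussian mixture model (sketch)."""
    n, K = X.shape[0], len(omega)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each sample within each component.
        dens = np.column_stack(
            [omega[i] * gaussian_pdf(X, mu[i], Sigma[i]) for i in range(K)])
        post = dens / dens.sum(axis=1, keepdims=True)
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:                    # log-likelihood has converged
            break
        prev_ll = ll
        # M-step: update means, covariances, and prior probabilities.
        Nk = post.sum(axis=0)
        mu = (post.T @ X) / Nk[:, None]
        Sigma = np.stack([((X - mu[i]).T * post[:, i]) @ (X - mu[i]) / Nk[i]
                          for i in range(K)])
        omega = Nk / n
    return omega, mu, Sigma, post

# Two synthetic operating modes (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 2)),
               rng.normal(5.0, 1.0, size=(150, 2))])
omega0 = np.array([0.5, 0.5])
mu0 = np.array([[1.0, 1.0], [4.0, 4.0]])
Sigma0 = np.stack([np.eye(2), np.eye(2)])

omega, mu, Sigma, post = em_fgmm(X, omega0, mu0, Sigma0)
print("priors:", np.round(omega, 3))
print("means:\n", np.round(mu, 2))
```

Working with densities directly is adequate for this small example; for higher-dimensional data a log-domain formulation would be the more robust choice.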

#### 3. Fault Detection with Sparse Principal Component Selection and Bayesian Inference-Based Probability

In this section, the SPCS-BIP algorithm for multimode process monitoring is described in detail. We first introduce the Bayesian inference-based probability, which derives the confidence boundary around the normal operating regions for process monitoring and fault detection. Then, sparse principal component selection is introduced for selecting the key PCs related to fault information. Finally, the steps of the algorithm are given.

##### 3.1. Bayesian Inference-Based Probability

In the previous section, the FGMM was constructed; it is now essential to derive the confidence boundary around the normal operating regions for process monitoring and fault detection. Due to the multimodality of the mixture distribution, it is difficult to obtain an analytical boundary of the density function at a given confidence level.

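In the proposed monitoring approach, given an arbitrary monitored sample $x_t$, the Bayesian inference strategy is used to calculate the posterior probability of $x_t$ belonging to each Gaussian component $C_i$ as follows:

$$P(C_i \mid x_t) = \frac{P(C_i) \, p(x_t \mid C_i)}{p(x_t)}, \tag{7}$$

which can also be formulated as

$$P(C_i \mid x_t) = \frac{\omega_i \, g(x_t \mid \mu_i, \Sigma_i)}{\sum_{l=1}^{K} \omega_l \, g(x_t \mid \mu_l, \Sigma_l)}, \tag{8}$$

where $\omega_i$, $\mu_i$, and $\Sigma_i$ are the prior probability, mean, and covariance of component $C_i$, $g(\cdot)$ is the multivariate Gaussian density, and $K$ is the number of components.

Given that each component $C_i$ follows a unimodal Gaussian distribution, the squared Mahalanobis distance of $x_t$ from the center of $C_i$ follows a $\chi^2$ distribution, provided that $x_t$ belongs to $C_i$:

$$D\big((x_t, C_i) \mid x_t \in C_i\big) = (x_t - \mu_i)^{T} \Sigma_i^{-1} (x_t - \mu_i) \sim \chi^2_m. \tag{9}$$

Here $D\big((x_t, C_i) \mid x_t \in C_i\big)$ denotes the squared Mahalanobis distance between $x_t$ and the mean center of $C_i$ under the assumption that $x_t \in C_i$, and $\chi^2_m$ has $m$ degrees of freedom. Owing to collinearity, $\Sigma_i$ is usually ill-conditioned, and the following regularized Mahalanobis distance is utilized instead to avoid overly wide confidence regions:

$$D_r\big((x_t, C_i) \mid x_t \in C_i\big) = (x_t - \mu_i)^{T} (\Sigma_i + \epsilon I)^{-1} (x_t - \mu_i), \tag{10}$$

where the role of $\epsilon$ is to remove the ill-conditioning of the covariance matrix by adding a small positive number to all the diagonal entries.

For the monitored sample $x_t$, a local Mahalanobis distance-based probability index relative to each Gaussian component $C_i$ can be defined as

$$P_L^{(i)}(x_t) = \Pr\big\{ D_r\big((x, C_i) \mid x \in C_i\big) \le D_r\big((x_t, C_i) \mid x_t \in C_i\big) \big\}, \tag{11}$$

or

$$P_L^{(i)}(x_t) = \Pr\big\{ \chi^2_m \le D_r\big((x_t, C_i) \mid x_t \in C_i\big) \big\}. \tag{12}$$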

Given the appropriate degrees of freedom, the local index $P_L^{(i)}(x_t)$ can be computed by integrating the $\chi^2$ probability density function. Under a given confidence level, this index indicates whether the monitored sample $x_t$ is normal or abnormal, provided that it belongs to the corresponding Gaussian component $C_i$. Because each monitored sample may come from multiple Gaussian components with the corresponding posterior probabilities, a global BIP index is proposed to combine the local probability metrics across all $K$ Gaussian clusters. The BIP index for the monitored sample $x_t$ is given by

$$\mathrm{BIP}(x_t) = \sum_{i=1}^{K} P(C_i \mid x_t) \, P_L^{(i)}(x_t), \tag{13}$$

where the posterior probability $P(C_i \mid x_t)$ is used to incorporate the contribution of each local Gaussian component to the overall probabilistic index. As $0 \le P_L^{(i)}(x_t) \le 1$, we have

$$0 \le \mathrm{BIP}(x_t) \le \sum_{i=1}^{K} P(C_i \mid x_t) = 1. \tag{14}$$

Under the preset confidence level $(1 - \alpha) \times 100\%$, the process is determined to be within normal operation if

$$\mathrm{BIP}(x_t) \le 1 - \alpha. \tag{15}$$

Otherwise, the process operation is treated as out of control.
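To make the decision rule concrete, the sketch below computes the posterior probabilities, the regularized Mahalanobis distances, the local chi-square probabilities, and the resulting BIP index for a single sample. The mixture parameters, the test samples, the regularization value `eps`, and the helper names `chi2_cdf` and `bip_index` are assumptions for illustration only. The chi-square CDF is evaluated with a plain series expansion to keep the example dependency-free; a library routine such as `scipy.stats.chi2.cdf` would normally be preferable.

```python
import math
import numpy as np

def chi2_cdf(x, dof):
    """Chi-square CDF via the series of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    if z > 700.0:
        return 1.0   # far tail for the modest dof assumed here; avoids overflow
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))

def bip_index(x_t, omega, mu, Sigma, eps=1e-6):
    """Return the BIP index, the posteriors, and the local probability indices."""
    m, K = x_t.size, len(omega)
    log_w = np.empty(K)      # log of omega_i * g(x_t | mu_i, Sigma_i)
    local = np.empty(K)      # local Mahalanobis distance-based probabilities
    for i in range(K):
        S = Sigma[i] + eps * np.eye(m)            # regularized covariance
        diff = x_t - mu[i]
        d2 = float(diff @ np.linalg.solve(S, diff))
        _, log_det = np.linalg.slogdet(S)
        log_w[i] = math.log(omega[i]) - 0.5 * (m * math.log(2.0 * math.pi) + log_det + d2)
        local[i] = chi2_cdf(d2, m)
    post = np.exp(log_w - log_w.max())
    post /= post.sum()                            # Bayesian posterior probabilities
    return float(post @ local), post, local

# Hypothetical two-mode model and two test samples.
omega = np.array([0.5, 0.5])
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
Sigma = np.stack([np.eye(2), np.eye(2)])
alpha = 0.01

bip_normal, _, _ = bip_index(np.array([0.5, -0.3]), omega, mu, Sigma)
bip_fault, post_fault, _ = bip_index(np.array([2.5, 9.0]), omega, mu, Sigma)
for name, value in [("normal", bip_normal), ("fault", bip_fault)]:
    status = "in control" if value <= 1.0 - alpha else "out of control"
    print(f"{name}: BIP = {value:.4f} -> {status}")
```

The first sample lies close to the center of one mode and yields a small BIP value, whereas the second lies far from both centers and exceeds the $1-\alpha$ limit.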

##### 3.2. Sparse Principal Component Selection

Sparse representation has proven to be an extremely powerful tool for acquiring, representing, and compressing high-dimensional data [41–43]. This success is mainly due to the fact that the important reconstruction information of data such as process data and time series data has naturally sparse representations with respect to fixed bases, or concatenations of such bases. Qiao et al. [44] proposed that graphs constructed with the $\ell_1$-norm have the advantages of greater robustness to data noise, automatic sparsity, and an adaptive neighborhood for each individual datum. Moreover, sparse representation has potential discriminative ability, since most nonzero elements are located on the samples in the same class as the represented sample.

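Given the training samples $a_1, a_2, \ldots, a_n$ collected as the columns of a matrix $A \in \mathbb{R}^{m \times n}$ and a test sample $y \in \mathbb{R}^{m}$, the solution to the sparse representation problem can be obtained by solving the following $\ell_0$-minimization problem:

$$\hat{\beta} = \arg\min_{\beta} \|\beta\|_0 \quad \text{subject to} \quad y = A\beta, \tag{16}$$

where $\beta$ are the sparse representation coefficients and $\|\beta\|_0$ denotes the number of nonzero elements of $\beta$. From the perspective of statistics, relaxing the $\ell_0$-norm in formula (16) to the $\ell_1$-norm leads to the Lasso criterion. Lasso is a penalized least squares algorithm, originally solved by quadratic programming, which imposes a constraint on the $\ell_1$-norm of the regression coefficients. Thus, the Lasso estimates are obtained by minimizing the Lasso criterion:

$$\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \; \|y - A\beta\|_2^2 + \lambda \|\beta\|_1, \tag{17}$$

where $\lambda$ is nonnegative. However, using only the $\ell_1$-norm penalty in Lasso has its limitations. Zou et al. [45] pointed out that if there is a group of variables among which the pairwise correlations are very high, Lasso tends to select only one variable from the group and does not care which one is selected. Fortunately, the elastic net put forward by Zou et al. overcomes the limitation of using only the $\ell_1$-norm penalty. It is known that combining the $\ell_1$-norm and $\ell_2$-norm penalties results in a grouping effect in regression and thus enhances the prediction accuracy. For any nonnegative $\lambda_1$ and $\lambda_2$, the elastic net estimates are given by

$$\hat{\beta}_{\text{en}} = (1 + \lambda_2) \, \arg\min_{\beta} \left\{ \|y - A\beta\|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1 \right\}. \tag{18}$$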

In brief, it is expected that the elastic net is used to group a set of sparse coefficients to construct the sparse alignment matrices, in which the sparse representation information or the potential discriminative information is encoded to enhance the discriminative ability in an unsupervised manner.
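The difference between the Lasso and the elastic net on highly correlated predictors can be illustrated with a small coordinate descent solver. The sketch below minimizes $\|y - A\beta\|_2^2 + \lambda_2\|\beta\|_2^2 + \lambda_1\|\beta\|_1$; setting $\lambda_2 = 0$ gives the Lasso. The function names, penalty values, and synthetic design (an exactly duplicated column as an extreme case of pairwise correlation, plus one irrelevant column) are assumptions for illustration. Some formulations multiply the solution by $(1+\lambda_2)$; that rescaling is omitted here because it does not change which coefficients are nonzero.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding operator sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def elastic_net_cd(A, y, lam1, lam2, max_sweeps=1000, tol=1e-10):
    """Coordinate descent for ||y - A b||^2 + lam2 ||b||^2 + lam1 ||b||_1."""
    p = A.shape[1]
    beta = np.zeros(p)
    col_sq = (A**2).sum(axis=0)
    resid = y.astype(float).copy()               # current residual y - A beta
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (column j removed).
            rho = A[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam1 / 2.0) / (col_sq[j] + lam2)
            delta = new - beta[j]
            if delta != 0.0:
                resid -= A[:, j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

# Synthetic design: columns 0 and 1 are identical, column 2 is irrelevant.
rng = np.random.default_rng(2)
a1 = rng.normal(size=50)
a3 = rng.normal(size=50)
a3 -= (a3 @ a1) / (a1 @ a1) * a1                 # make column 2 orthogonal to column 0
A = np.column_stack([a1, a1.copy(), a3])
y = A[:, 0] + A[:, 1]

lam1, lam2 = 5.0, 10.0
beta_lasso = elastic_net_cd(A, y, lam1, 0.0)     # picks one of the twin columns
beta_en = elastic_net_cd(A, y, lam1, lam2)       # shares the weight between them
print("lasso      :", np.round(beta_lasso, 4))
print("elastic net:", np.round(beta_en, 4))
```

In this example the Lasso path followed by coordinate descent keeps only the first of the two identical columns, while the elastic net assigns them equal coefficients, which is the grouping behavior described above.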

##### 3.3. Fault Detection with SPCS and BIP

The key problem in monitoring a multimode process is to select a suitable model and to choose the subspace spanned by the key PCs. As noted in the Introduction, the subspace spanned by the first several PCs with the largest explained variance does not always contain fault information.

In the following, a novel multimode process monitoring approach based on SPCS and BIP is proposed. The approach takes a just-in-time form. For each sample, an elastic net regression between all PCs and the sample is constructed and solved. The PCs which have nonzero regression coefficients are retained, while the other PCs are rejected. That is, for each sample, the most discriminative bases are picked out and the coefficients of the others are set to zero. The concrete calculation steps are summarized in Figure 1.
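The per-sample selection step can be sketched as follows, under one specific and possibly simplifying assumption: the regressors are the columns of the full orthonormal loading matrix and the response is the autoscaled sample. In that case the elastic net problem decouples, and each coefficient is a soft-thresholded and shrunk score, so no iterative solver is needed. The function names `fit_loadings` and `select_pcs`, the penalty values, and the synthetic data are hypothetical; the procedure in Figure 1 may differ in its details.

```python
import numpy as np

def fit_loadings(X_train):
    """Autoscaling parameters and the full loading matrix from normal data."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt.T                       # columns of Vt.T are the PCs

def select_pcs(x, mean, std, P, lam1, lam2):
    """Elastic net regression of one sample on all PCs (orthonormal case)."""
    z = (x - mean) / std
    scores = P.T @ z                             # least squares coefficients
    # Closed-form minimizer of ||z - P b||^2 + lam2 ||b||^2 + lam1 ||b||_1
    # when P is square with orthonormal columns.
    beta = np.sign(scores) * np.maximum(np.abs(scores) - lam1 / 2.0, 0.0) / (1.0 + lam2)
    return np.flatnonzero(beta), beta

# Synthetic normal operating data (illustrative only): 4 correlated variables.
rng = np.random.default_rng(3)
X_train = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 4))
X_train += 0.3 * rng.normal(size=(300, 4))
mean, std, P = fit_loadings(X_train)

# Hypothetical fault sample: a deviation along the last (smallest-variance) PC.
x_fault = mean + std * (3.0 * P[:, -1])
lam1, lam2 = 2.0, 0.5
selected, beta = select_pcs(x_fault, mean, std, P, lam1, lam2)
print("selected PC indices:", selected)
print("coefficients:", np.round(beta, 4))
```

Only the last PC receives a nonzero coefficient for this sample, which illustrates why a variance-based rule that keeps just the leading PCs could miss such a deviation.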