#### Abstract

According to the demand for diversified products, modern industrial
processes typically have multiple operating modes. At the same time,
variables within the same mode often follow a mixture of Gaussian
distributions. In this paper, a novel algorithm based on sparse
principal component selection (SPCS) and Bayesian inference-based
probability (BIP) is proposed for multimode process monitoring. SPCS
can be formulated as a just-in-time regression between all PCs and
each sample. SPCS selects PCs according to the nonzero regression
coefficients which indicate the compact expression of the sample.
This expression is necessarily *discriminative*: amongst all
subset of PCs, SPCS selects the PCs which most compactly express the
sample and rejects all other possible but less compact expressions.
BIP is utilized to compute the posterior probabilities of each monitored
sample belonging to the multiple components and derive an integrated
global probabilistic index for fault detection of multimode processes.
Finally, to verify its superiority, the SPCS-BIP algorithm is applied
to the Tennessee Eastman (TE) benchmark process and a continuous stirred-tank
reactor (CSTR) process.

#### 1. Introduction

Over the past two decades, with the development of complex chemical processes and the growing demand of plant safety and stable product quality, timely process monitoring is gaining importance. Because large amounts of data can be gathered by the use of distributed control systems (DCSs), multivariate statistical process monitoring (MSPM) algorithms have received great attention. Among these algorithms, principal component analysis (PCA) and partial least squares (PLS) are the most widely used algorithms [1–8]. Both algorithms project high-dimensional data onto lower dimensional subspaces. Process normal and abnormal conditions can be isolated by the use of Hotelling’s or squared predicted error (SPE) [9–13]. Other complementary MSPM algorithms, including independent components analysis (ICA), Fisher discriminant analysis (FDA), and canonical variate analysis (CVA), are used to overcome some limitations in PCA/PLS-based monitoring schemes [14–17]. However, most of MSPM algorithms rely on the assumption that the system is in a single operating region and that the data follow a Gaussian distribution. In chemical processes, operating condition shifts are often encountered due to the changes of various factors such as feedstock, product specification, set points, and manufacturing strategy. When a process is running under substantially different operating conditions, only a small number of variables actually follow Gaussian distribution [18]. As a result, the multimodality of data distribution might lead to unseemliness for the monitoring of conventional MSPM algorithms. To address these problems, it is necessary to develop new algorithms.

In literature, multiple models can be built to fit each individual operating mode, but these are two essential issues that need to be addressed. One is how to divide the training data into multiple subsets correctly, corresponding to different operating modes. In order to solve this issue, many clustering algorithms are applied. In terms of the traditional approaches, Ge and Song [19] used fuzzy C-means clustering algorithm to separate the training data set according to the unique characteristics of each mode. He et al. [20] applied the -nearest-neighbor method. Srinivasan et al. [21, 22] identified the different operating modes by evaluating the Euclidean distances between samples in a constructed data window and then applied dynamic PCA-based similarity measures to cluster the samples. Liu and Chen [23] developed a method using Bayesian classification for selecting multiple regions from a training data set. Zhao et al. [24] presented a multiple principal component analysis (MPCA) algorithm that selects one suitable model to monitor multimode processes. The other issue is how to determine the final results. A proper measurement should be employed to determine which model is the most suitable one for monitoring at the current moment. Ng and Srinivasan [22] exploited the most suitable PCA model through a minimized distance reflecting both the and SPE values. Zhao et al. [24] close the local PCA model with the minimum SPE value. Natarajan and Srinivasan [21] used the distance between the sample and the center of local models as a criterion. Yu and Qin [25] performed Bayesian inference on the postprobabilities calculated by the Gaussian mixture model (GMM) or the nonlinear kernel GMM. Meanwhile, Ge and coworkers [26, 27] took advantage of Bayesian inference to softly combine the monitoring results computed by local models built by means of probabilistic PCA (PPCA), factor analysis (FA), or subspace algorithms.

To date, the problem of how to correctly divide the training data into multiple subset can successfully be solved by many algorithms mentioned in the previous paragraph. However, there are still some issues that need to be resolved; the most important one is how to select the key principal components (PCs) when using one suitable model for process monitoring. Many algorithms for selecting PCs have been proposed, such as cumulative percent variance (CPV) [28], variance of reconstruction error (VRE) [29], and cross validation (CV) [30]. Generally, most of the classical algorithms just take normal operational observations into account and select the first several PCs with larger variance. While PCs with larger variance of normal data cannot guarantee the capture of the largest variations in fault data online. Jolliffe [31] suggested that the last PCs may be as important as those with large variance. Togkalidou et al. [32] noted that the PCs with larger variance do not always contain much information for prediction. However, this issue is insufficiently discussed in PCA-based process monitoring, and the standard PC selection is still not established.

Fortunately, many researches have been aware of the inherent defects of classical PCA algorithm. A lot of workers tried to seek a subspace spanned by key PCs, which contains the most important information for process monitoring. Peng et al. [33] suggested a new feature selection algorithm, named minimal-redundancy-maximal-relevance criterion (mRMR). It is based on mutual information and selects the features with highest relevance to the target class. Jiang et al. [34] put forward the sensitive principal component analysis for fault detection and diagnosis in chemical processes. They pointed out that PCs selected by PCA algorithm are not always the key PCs for fault detection. Their task was to find the sensitive PCs which have relationship with fault information. Arbel et al. [35] proposed that the process variables that are preponderant in achieving specific objectives need to be selected.

In this paper, a process monitoring algorithm using multisubspace sparse principal component analysis with the BIP algorithm is put forward. First, variables are divided into different subblocks corresponding to different units or pieces of equipment to reduce the complexity of process analysis. By using BIP algorithm, multimode data in each subblock are divided into multiple subgroups. BIP can compute the posterior probabilities of each monitored sample belonging to the multiple components and derive an integrated global probabilistic index for fault detection of multimode processes. The PCs selected by PCA algorithm with larger variances do not always have relationship with fault information. Sparse principal component selection (SPCS) takes the information of both normal and abnormal observations into account. The algorithm is formulated as a just-in-time form that constructs an elastic net regression between all PCs and each sample. SPCS selects PCs corresponding to the nonzero regression coefficients which indicate the compact expression of the sample. This expression is necessarily* discriminative*: amongst all subset of PCs, SPCS selects the PCs which most compactly express the sample and rejects all other possible but less compact expression. Third, the key PCs are selected by SPCS in each subgroup to solve the problem of fault information loss. It needs to be stressed that the subspace spanned by the key PCs selected is the feature subspace. Finally, in order to verify the superiority of the SPCS-BIP algorithm, it is applied to the Tennessee Eastman (TE) benchmark problem and a continuous stirred-tank reactor (CSTR) process.

#### 2. Preliminaries

##### 2.1. Principal Component Analysis

Principal component analysis is a multivariate statistical analysis which is widely used in chemical process monitoring, fault detection, and so forth [36–38]. Let represent an -dimensional sample vector and denote a data matrix with zero mean and unit variance, where is the number of samples and is the number of variables in the process. From the statistical viewpoint, the PCA algorithm could be obtained by singular value decomposition (SVD) [28, 34]:where and are the score matrix and the loading matrix, respectively. is the principal components retained number. The loading matrix can be obtained by eigenvalue decomposition on the covariance matrix as follows:where denotes the eigenvalue matrix and contains the loading matrices of component subspace and residual subspace, respectively.

##### 2.2. Construction of Finite Gaussian Mixture Model Based on EM

For the process running at multiple operating condition, owing to the mean shifts or covariance changes, the assumption of multivariate Gaussian distribution becomes invalid [21, 22]. In this situation, the local Gaussian distribution is still appropriate to characterize each subset of measurement data from the same operating conditions. Therefore, the finite Gaussian mixture model is prime suited to represent the data sources driven by different operating modes [13, 24, 25].

To construct a FGMM, given a set of training samples , the log-likelihood function can be expressed asand the parameter estimation problem is formulated aswhere , is the prior probabilities, and is the number of Gaussian components included in FGMM. is the mean vector and is the covariance matrix.

There are a lot of learning algorithms, such as maximum likelihood estimation (MLE), EM, and the F-J algorithm, that have been put forward for mixture model estimation [39, 40]. As a more tractable numerical strategy, the EM algorithm has been well used in practice to estimate the maximum likelihood distribution parameter [39]. EM algorithm is implemented iteratively by means of repeating the expectation step (E-step) and maximization step (M-step) to calculate the posterior probabilities and then the corresponding distribution parameters until a convergence criterion of the log-likelihood function is satisfied. Given the training data and an initial estimate , the iterative E-step and M-step are expressed as follows:(i)E-step:where denotes the posterior probability of the training sample within the Gaussian component at the iteration; (ii)M-step:where , , and are the mean, covariance, and prior probability of the Gaussian component at the iteration, respectively.

#### 3. Fault Detection with Sparse Principal Component Selection and Bayesian Inference-Based Probability

In this section, the idea of SPCS-BIP algorithm for multimode process monitoring is demonstrated in detail. We first introduce the Bayesian inference-based probability which can derive the confidence boundary around the normal operating regions for process monitoring and fault detection. Then, the sparse principal component selection was introduced for selecting the key Pcs related with fault information. Finally, the steps of this algorithm were given.

##### 3.1. Bayesian Inference-Based Probability

In the previous section, the FGMM has been constructed, and it is essential to further derive the confidence boundary around the normal operating regions for process monitoring and fault detection. Due to the multimodality of mixture distribution, it is really difficult to capture the analytical boundary of the density function in a certain confidence level.

In the proposed monitoring approach, given an arbitrary monitored sample belonging to each Gaussian component, Bayesian inference strategy is used to calculate the posterior probability as follows:which can also be formulated as

Given that each component follows a unimodal Gaussian distribution, the squared Mahalanobis distance of from the center of follows distribution, provided that belongs to ,

Under the assumption that and has degree of freedom, denotes the squared Mahalanobis distance between and the mean center of . Owing to colinearity, is usually ill-conditioned, and the following regularized Mahalanobis distance is utilized instead to avoid too wide confidence regions:where the function of is to remove the ill condition of covariance matrix by adding a positive number to all the diagonal entries.

For the monitored sample , a local Mahalanobis distance-based probability index relative to each Gaussian component can be defined asor

Given the appropriate degree of freedom, can be computed by integrating the probability density function. Under a given confidence level, this index has the function of indicating whether the monitored sample is normal or abnormal provided that it belongs to the corresponding Gaussian component. A global BIP index is proposed to combine the local probability metrics across all the Gaussian clusters because the random characteristic of each monitored sample may come from multiple Gaussian components with the corresponding posterior probabilities. The formulation of BIP index for the monitored sample is given bywhere the posterior probability is used to incorporate the contribution of each local Gaussian component to the overall probabilistic index. As , we have

Under the preset confidence level 100%, the process is determined within normal operation if

Otherwise, the process operation is treated out of control.

##### 3.2. Sparse Principal Component Selection

Sparse representation has proven to be an extremely powerful tool for acquiring, representing, and compressing high-dimensional data [41–43]. This success is mainly because of the fact that the important reconstruction information of data such as process data and time series data has naturally sparse representations with respect to fixed bases, or concatenations of such bases. Qiao et al. [44] proposed that the graphs constructed by the -norm have the advantage of greater robustness to data noise, automatic sparsity, and adaptive neighborhood for individual datum. What is more another important advantage is that sparse representation has the potential discriminative ability since most nonzero elements are located on the samples in the same class as the represented sample.

Given the training sample , a test sample , the solution to the sparse representation problem can be obtained by solving the following -minimization problem:where are the sparse representation coefficients and denotes the number of nonzero elements of . From the perspective of statistics, formula (16) can be named the Lasso criterion. Lasso is a penalized least squares algorithm which was originally by quadratic programming imposing a constraint on the norm of the regression coefficients. Thus, the Lasso estimates are obtained by minimizing the Lasso criterion:where is nonnegative. However, only using the -norm penalty in Lasso has its limitation. Zou et al. [45] proposed that if there is a group of variables among which the pairwise correlations are very high, lasso tends to select any variable from the group and does not consider which one is selected. Fortunately, elastic net was put forward by Zou et al. which overcomes the limitation of only using the -norm penalty. It is known that combining the -norm and -norm penalty can result in grouping effectiveness in regression and thus enhance the prediction accuracy. For any nonnegative and , the elastic net estimates are given by

In brief, it is expected that the elastic net is used to group a set of sparse coefficients to construct the sparse alignment matrices, in which the sparse representation information or the potential discriminative information is encoded to enhance the discriminative ability in an unsupervised manner.

##### 3.3. Fault Detection with SPCS and BIP

The key problem for monitoring the multimode process is to select a suitable model and choose the subspace spanned by key PCs. In the Introduction, we had put forward the fact that the subspace spanned by the first several PCs with largest explained variance does not always have fault information.

In the following part, a novel multimode process monitoring approach based on SPCS and BIP is proposed. This approach is in a just-in-time form. For each sample, an elastic net regression between all PCs and the sample is constructed and solved. The PCs which have nonzero regression coefficients are retained, while other PCs are rejected. That means, for each sample, we can pick out the most discriminative bases and the others are set to zero. Its concrete calculating steps are summarized in Figure 1.

*Offline Modeling*(1)Collect a set of historical training data under all possible operating conditions.(2)Use the EM algorithm to learn the Gaussian mixture model and estimate the model parameter set based on the iterative steps.(3)For each submodel, get a normal operational observation set , where is the number of samples and is the number of variables. This set is denoted as the training set for threshold determining. A testing set with both normal and abnormal observations is given for testing.(4)Normalize the training data through the mean value and variance of each variable.(5)Obtain all principal components using SVD decomposition. The training data is reconstructed by , where is the score vector and is the loading vector.(6)For training sample , construct an elastic net regression between each observation value of training data and loading vector made of PCs in step , according to .(7)Corresponding to the nonzero representation coefficients , construct a new loading vector .(8)Specify a confidence 100% and constant control limit .

*Online Monitoring*(1)Normalize the current time point data by using mean values and variance of the training data.(2)Obtain the loading vector from offline modeling.(3)When a test sample is coming, construct an elastic net regression between the sample and loading vector made of PCs in step , according to .(4)Corresponding to the nonzero representation coefficients , construct a new loading vector .(5)Generate the BIP control chart with the calculated BIP index values for all the monitored samples. If the BIP index of a test sample is lower than the control limit, which means the sample is normal, go to step . Else, there is a fault in the process.

#### 4. Case Studies on the TE and CSTR Process

In this case study, the TE benchmark and CSTR process are introduced to verify the effectiveness of the SPCS-BIP algorithm. PCA-GMM is the classic algorithm for multimode processing monitoring. And the fault detection index (FDI) is similar to Bayesian inference probability (BIP). So here, a comparison was made between SPCS-BIP and PCA-GMM. In addition, to verify the improvements of SPCS algorithm, which can select sparse PCs, a comparison was performed between the SPCS-BIP algorithm and the MPPCA algorithm.

##### 4.1. Tennessee Eastman Process

As a well-known benchmark process, the Tennessee Eastman process, which was presented by Downs and Vogel, has been widely applied to evaluate and compare the efficiency of process monitoring techniques [46, 47]. The schematic diagram of the process is illustrated in Figure 2. This process consists of five major unit operations: a reactor, a product condenser, a vapor-liquid separator, a recycle compressor, and a product stripper. In addition, there are six modes of process operation, as listed in Table 1. The variables can be divided into three categories: composition variables, continuous process variables, and manipulated variables. In our study, only modes 1 and 3 were simulated through the Simulink programs developed on the basis of the decentralized control strategy designed by Ricker [48]. The Simulink programs can be downloaded from http://depts.washington.edu/control/LARRY/TE/download.html. The 31 selected monitoring variables contained 9 manipulated variables and 22 continuous process variables. Thus, these variables were divided into five subblocks according to five units. However, given that only two variables were allocated to each the compressor unit and the condenser unit, there were four variables assigned to the other three related subblocks. As a result, the total of 31 variables was divided into three subblocks.

There are 20 faults in the multimode TE process, which are listed in Table 2. Among these faults, the root causes of the faults 16–20 are unknown [46, 47]. What is more, to simplify interpretation, the amplitudes of faults 3, 9, and 15 are so small. It is difficult to detect, so only the remaining 12 faults were considered in this study. In the modeling stage, 2000 normal samples, which include 1000 mode 1 samples and 1000 mode 3 samples, were collected as the training data set. In the testing stage, 1000 samples of mode 1 were tested first, and then the process switches to mode 3. As a result, the test data set consists of 1000 samples of mode 1 and 1000 samples of mode 3. And faults occurred from the 1200th sample. A set of 20 faults in multimode TE process, which are listed in Table 2, are simulated and the corresponding process data are collected for testing. The following simulations are run in MATLAB 8.3.0 (2014a) environment. Here, two indicators, which are FR (FR) and MR (MR), are often introduced to measure the result of process monitoring. FR is the rate of normal data classified as fault data. MR is the rate of fault data classified as normal rate.

In the MPPCA algorithm and PCA-GMM algorithm, when the variance contribution was selected as 85%, the dimension of feature space in MPPCA and the number of PCs in PCA-GMM were each selected as 18. In order to compare the monitoring performances of these algorithms in the same situation, the selected sparse PCs of each mode in SPCS-BIP were selected as 18. The 99% control limit was assigned to all three algorithms.

First, Figure 3 shows that the different submodes can be successfully divided by the EM algorithm used in this paper. And by using other algorithms, the modes also can be divided correctly. In other words, how to correctly divide the training data into multiple subset is not a problem by many related algorithms.

The normal process was tested by different algorithm, and the results are shown in Figure 4. In this figure, it is hard to figure out which algorithm’s FR is lower. In the figure, most samples of each algorithm are lower than the control limit. And by calculation, the FR of these algorithms are 0.333%, 0.08%, 0.25%, and 1.08%, respectively, corresponding to Figures 4(a), 4(b), 4(c), and 4(d). The monitoring performances of these three algorithms suggest that the FR are acceptable. Next, the data sets of 12 faults in mode 3 were tested, and the MR of these three algorithms are listed in Table 3, with the smallest MR shown in bold.

**(a)**MPPCA-**(b) MPPCA-SPE**

**(c) PCA-GMM**

**(d) SPCS-BIP**

From Table 3, we observe that the monitoring performance of SPCS-BIP is the best, compared to the MPPCA and PCA-GMM algorithms for all 12 faults. Here, we take the further analysis. In comparison with MPPCA and PCA-GMM algorithms, the SPCS-BIP algorithm can exactly divide the process data into subgroups corresponding to different modes by using the E-M algorithm, and, in each submode, SPCS-BIP can select the most important PCs that have most relation with the fault. Due to the fact that the subspace spanned by the PCs was monitored by BIP, most of the PCs are related to the main process of chemical industrial process, and only little PCs are related to the fault process. SPCS are discriminative by constructing an elastic net regression between all PCs and each sample. So, in Table 3, we observe that the results of SPCS-BIP are better than the results of MPPCA and PCA-GMM.

Figure 5 shows the monitoring performances of fault 10. It is easy to see that the FDI of MPPCA- and the BIP of PCA-GMM cannot detect the fault effectively in Figures 4(a) and 4(c). In the figure, more than half of the fault samples were regarded as the normal samples, while, compared to the performances of MPPCA- and PCA-GMM, the FDI of MPPCA-SPE shows some improvements. However, the monitoring performance of MPPCA-SPE does not match the performance of SPCS-BIP. We can find this point both in Figure 4 and Table 3

**(a)**MPPCA-**(b) MPPCA-SPE**

**(c) PCA-GMM**

**(d) SPCS-BIP**

##### 4.2. CSTR

This study simulated the CSTR process described by Yoon and MacGregor [49]. The diagram of the process is presented in Figure 6. Due to the fact that the CSTR process consisted of only one operating unit, the number of subblocks was selected as 1.

In the modeling stage, 1000 samples which include 500 mode 1 samples and 500 mode 2 samples were collected as the training data set. In the testing stage, 1000 samples of mode 2 were tested, and two faults were introduced to the process as follows.

*Case 1. *A step of 1 K was added in the cooling water temperature from the 500th sample.

*Case 2. *A 2 kmol/ step was added in the inlet solute concentration from the 500th sample.

In the MMPCA algorithm, when the variance contribution was selected as 85%, the dimension of feature space in MPPCA is 10. So, in order to compare the monitoring performances of these algorithms in the same situation, the number of PCs in PCA-GMM and the selected sparse PCs in SPCA-BIP were both selected as 10. The 99% control limit was assigned to all three algorithms.

The same as TEP, the FR of these algorithms are 0.4%, 0, 1.2%, and 1%, respectively. In an industry process, FR lower than 0.05 is acceptable [28].

The data sets of two faults in mode 2 were tested, and the MR were listed in Table 4. In the table, the smallest missed detection rates are shown in bold.

As shown in Table 4, the SPCS-BIP algorithm has shown the best performance for these two faults compared with other algorithms listed in the table. It is obvious that neither MPPCA nor PCA-GMM algorithms can detect the fault because their missed detection rates were high. In those four algorithms, only the SPCS-BIP was based on the selection PCs, so the improvements in the proposed sparse principal components selection can be demonstrated through the better monitoring performance of the SPCS-BIP algorithm.

Fault 1 is a bias in cooling water temperature . Due to the control loop in the CSTR process, these would be a bias in outlet temperature , and then the cooling water flow rate would increase. In Figure 7, both the MMPCA and PCA-GMM algorithms could not detect fault 1 effectively according to the performances of those shown in Figures 7(a), 7(b), and 7(c). In Figure 7(d), it is obvious that the monitoring performance of SPCS-BIP is much better than the others. The reason is that the correct classification for each subgroup by using E-M algorithm and the PCs selected by SPCS are discriminative and could construct the subspace that contains the important fault information for abnormal data.

**(a)**MPPCA-**(b) MPPCA-SPE**

**(c) PCA-GMM**

**(d) SPCS-BIP**

Fault 2 is a bias in inlet solute concentration . Then, due to the control loop in the CSTR process, there would be biases in outlet concentration and outlet temperature . According to the performances shown in Figure 8(b), the FDI of MPPCA-SPE could not detect fault 2. Compared to the MPPCA-SPE, the FDI of MPPCA- showed some improvements. Both PCA-GMM and the proposed SPCS are all using BIP. In Figures 8(c) and 8(d), we could hardly see which algorithm is better. However, in Table 4, we could obviously find that the SPCS-BIP is better. Even, compared to MPPCA-, the proposed algorithm has a little advantage than MPPCA-.

**(a)**MPPCA-**(b) MPPCA-SPE**

**(c) PCA-GMM**

**(d) SPCS-BIP**

#### 5. Conclusions

An algorithm using sparse principal component selection and Bayesian inference-based probability (SPCS-BIP) was proposed in this study. Given that the modern industrial processes typically have multiple operating modes, BIP is utilized to compute the posterior probabilities of each monitored sample belonging to the multiple components and derive an integrated global probabilistic index for fault detection of multimode processes. In each submode, we use the sparse principal component selection to select the key PCs that have the best relation with fault. This algorithm constructs an elastic net regression between all PCs and each sample and then selects PCs according to the nonzero regression coefficients which indicate the discriminative expression of the sample. Finally, the TE and CSTR processes were employed to verify the superiority of the SPCS-BIP algorithm. The monitoring performances of MPPCA, PCA-GMM, and SPCS-BIP methods are discussed compared to those of the MPPCA and PCA-GMM algorithms, and the monitoring performances of the SPCS-BIP algorithm were found to be the best ones among the three algorithms.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant no. 61375007) and Shanghai Science and Research Projects (Grant nos. 15JC1400600, 15JC1401700).