Abstract

In the present study, we introduce a new approach for nonlinear process monitoring based on kernel entropy principal component analysis (KEPCA) and the notion of inertia. KEPCA plays a double role: it reduces the data in the high-dimensional space, and it constructs the model. Before data reduction, KEPCA maps the input data into a high-dimensional feature space through a nonlinear kernel function and automatically determines the number of principal components (PCs) from the computed inertia. The retained PCs express the maximum inertia entropy of the data in the feature space. We then use the Parzen window estimator, instead of the Gaussian assumption, to compute the upper control limit (UCL) for inertia-based KEPCA. Our second contribution is a new combined index based on the monitoring indices $T^2$ and SPE, which simplifies the fault detection task and prevents confusion. The proposed approaches have been applied to fault detection and diagnosis on the well-known Tennessee Eastman (TE) benchmark process. In terms of detection rate and detection time, the proposed index compares favourably with an existing unified index.

1. Introduction

Since faults occur in the operation of most industrial processes, fault detection and identification is essential for monitoring and diagnosing these processes, in order to ensure safe operation and improve productivity. In most industrial systems, including chemical processes, the collected data are large, highly correlated, and subject to noise. Data preprocessing is therefore required to remove redundant data and extract only the significant information. This need has led to the development of multivariate statistical methods, such as principal component analysis (PCA) [1, 2], recursive PCA [3], entropy PCA [4], kernel principal component analysis (KPCA) [5–7], and modified kernel PCA [8]. However, PCA is a linear projection method, which is a weakness for chemical processes with nonlinear characteristics; in such cases, PCA does not handle fault detection well [9]. Recently, KPCA has gained attention in different research fields, particularly in chemical process fault detection and diagnosis [10–12]. KPCA improves computational efficiency by selecting only some features from a higher-dimensional feature space, which reduces the computational complexity associated with the kernel matrix [13]. Although KPCA is known to perform better than PCA in monitoring nonlinear industrial processes [14, 15], it offers no automatic way to determine and select the optimal number of features in the feature space. In fact, the variance to be explained by the retained PCs must be supplied as an input. To eliminate this deficiency, we apply our method, inertia-based KPCA combined with the Shannon entropy [16, 17], for the first time to monitor the Tennessee Eastman chemical process.
Indeed, by incorporating the concept of entropy into inertia-based KPCA, we obtain inertia-based KEPCA, which ensures that the extracted PCs carry the most valuable information. Each sample contributes to the inertia, which is treated as a probability to which the Shannon entropy can be applied. Despite the widespread use of Hotelling's $T^2$ and the squared prediction error (SPE) statistics in monitoring, especially in chemical processes [2, 3, 8–10], the two statistics very often signal a fault simultaneously, which is confusing for process operators conducting the fault analysis. Hence, the second part of our contribution addresses this issue by combining the $T^2$ and SPE statistics into a single index. The combined index is constructed by integrating the information gained from the principal and the residual spaces. Fault detection with our proposed new combined index is compared with that of another index developed in [10]. The simulation study has been carried out on the Tennessee Eastman chemical process. We demonstrate that the proposed inertia-based KEPCA method resolves the shortcoming of the standard KPCA method, both in terms of selecting the features automatically and in terms of fault detection.

The present work is organized as follows. Section 2 presents the inertia-based KEPCA approach for fault detection and identification, together with the proposed new combined index. Section 3 presents the application of inertia-based KEPCA and the new combined index to the monitoring of the nonlinear Tennessee Eastman process. Section 4 gives the conclusions drawn from this study.

2. Methods and Algorithms

2.1. Introduction to the Inertia-Based KEPCA Method

KEPCA is an unsupervised learning technique, as we documented in [16], and an enhancement of the existing linear method EPCA [18]. In fact, it is the combination of EPCA and kernel mapping. The input data are implicitly mapped, using a kernel function, into a high-dimensional feature space, and linear EPCA is then executed on the mapped features. The reproducing kernel Hilbert space (RKHS) is constructed using the Gaussian function. Based on the concept of inertia, each mapped sample contributes to both the explained and the residual inertia, which are treated as a probability distribution. The maximum of the Shannon entropy is then applied to this probability distribution to extract the features expected to preserve the most valuable information among all features. Furthermore, inertia-based KEPCA is computed without any nonlinear optimization, and strong nonlinearities can be handled with it. The related KEPCA algorithm is provided in detail in [16]. It is worth mentioning that monitoring based on inertia-based KEPCA has the same construction steps as the standard KPCA [14, 15]; the main difference lies in applying the entropy to inertia-based PCA instead of applying PCA alone.

The Gaussian function used for the data mapping step in nonlinear process monitoring is given by

$$k(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{c}\right), \tag{1}$$

where $c$ is the kernel bandwidth, predetermined by the user and proportional to the number of variables in the process at hand.

2.2. Parzen Window Estimator for Probability Density Function (PDF)

A great number of estimators exist in the literature for probability density estimation. Among the nonparametric estimation methods, we find the Parzen window estimator, the kNN estimator, the adaptive estimator, the variable estimator (a combination of the Parzen and kNN estimators), and others [19]. Of these, the Parzen method remains the most widely used for estimating the PDF. In fact, the Parzen window estimator is a practical tool that extracts the density distribution empirically and provides a smooth PDF for the population under consideration. The estimator is given as follows [20, 21]:

$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right), \tag{2}$$

where $n$ is the number of samples, $K$ is the kernel function (Gaussian in our case), $h$ is the bandwidth, and $x - x_i$ is the distance separating the sample $x$ from the samples $x_i$. However, choosing the bandwidth of the Parzen estimator is delicate. In this regard, a variety of selection criteria have been studied, including the mean integrated squared error (MISE) [22, 23], cross-validation, and others [19]. Here, we opt for the asymptotic MISE (AMISE) method cited in [24] to select the optimal bandwidth, as it gives effective results in smoothing parameter selection, especially for univariate random data.
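As an illustration of how a control limit can be read off a Parzen estimate, the following minimal Python sketch evaluates a Gaussian-kernel estimate and searches for the point where its cumulative integral reaches $1 - \alpha$. The function names, the normal-reference bandwidth rule (used here only as a simple stand-in for the AMISE selector of [24]), and the bisection search are assumptions made for this sketch; it is not the implementation used in this work.

```python
import math


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth h = 1.06 * s * n^(-1/5).

    Assumption: a simple stand-in for the AMISE-based selector of [24].
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return 1.06 * math.sqrt(var) * n ** (-0.2)


def parzen_pdf(x, samples, h):
    """Parzen window density estimate at x with a Gaussian kernel."""
    n = len(samples)
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def parzen_cdf(x, samples, h):
    """Integral of the Parzen estimate from -infinity to x.

    For a Gaussian kernel this has a closed form based on erf.
    """
    n = len(samples)
    scale = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in samples) / n


def upper_control_limit(samples, alpha=0.01, h=None, tol=1e-8):
    """Point where the estimated CDF reaches 1 - alpha, found by bisection."""
    if h is None:
        h = normal_reference_bandwidth(samples)
    lo = min(samples) - 10.0 * h  # estimated CDF is essentially 0 here
    hi = max(samples) + 10.0 * h  # estimated CDF is essentially 1 here
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if parzen_cdf(mid, samples, h) < target:
            lo = mid
        else:
            hi = mid
    return hi
```

For a 99% limit, `upper_control_limit(values, alpha=0.01)` would be called on the training values of the monitored statistic.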

2.3. Fault Detection Metrics

The fault detection phase is essential in monitoring nonlinear processes, and a variety of indices have been developed for it [25]. The most widely used among them are the $T^2$ and SPE statistics. They measure the variation within the feature space built by a multivariate statistical method such as KPCA. Indeed, $T^2$ is the sum of the normalized squared scores in the principal space, while the SPE is the sum of the squared errors in the residual space. In the present work, we use the same expressions for $T^2$ and SPE as previous research studies [2, 3, 8–10] and combine them in order to construct one index.

2.4. Proposed New Combined Index (NCI)

In practice, monitoring a process with one index is much easier than with two. Since $T^2$ and SPE behave in a complementary way, they can be combined into one index. In this regard, Yue and Qin [26] proposed a combined index based on the $T^2$ and SPE statistics. Thereafter, Choi et al. [10] suggested another unified index, with a regularization parameter. The main difference between our NCI and the index in [26] lies in the use of the probability density function (PDF), estimated through the Parzen window estimator, to derive the upper control limits instead of the Gaussian assumption. This is because a UCL based on the Gaussian assumption has been shown not to be valid in general and may give false results, since the data can be non-Gaussian [15]. Therefore, our proposed NCI will be as follows:

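The limits $T^2_{UCL}$ and $SPE_{UCL}$ are obtained by integrating the density of the training samples in the feature space, estimated through the Parzen window estimator (equation (2)). Let $\alpha$ be the significance level and $\hat{f}$ the estimated PDF. Thus, the expressions are as follows:

$$\int_{-\infty}^{T^2_{UCL}} \hat{f}(T^2)\, dT^2 = 1 - \alpha, \qquad \int_{-\infty}^{SPE_{UCL}} \hat{f}(SPE)\, dSPE = 1 - \alpha.$$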

The NCI signals a fault if the following condition is satisfied:

$$\mathrm{NCI} > \mathrm{NCI}_{UCL},$$

where $\mathrm{NCI}_{UCL}$ is the UCL of the NCI.
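To make the detection logic concrete, the short Python sketch below assumes that the combined index takes the ratio-sum form of Yue and Qin [26], with each statistic scaled by its own control limit. This form, the function names, and the argument names are assumptions made for illustration only; the limits would come from the Parzen-based estimates described above.

```python
def combined_index(t2, spe, t2_ucl, spe_ucl):
    """Assumed ratio-sum form: each statistic is scaled by its control limit."""
    return t2 / t2_ucl + spe / spe_ucl


def is_faulty(t2, spe, t2_ucl, spe_ucl, nci_ucl):
    """Flag a sample when the combined index exceeds its own control limit."""
    return combined_index(t2, spe, t2_ucl, spe_ucl) > nci_ucl
```

With this form, a sample is flagged through a single comparison even when only one of the two statistics is far above its limit.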

2.5. Faulty Variable Identification

Once a fault is detected, it is necessary to know which variables are at its root. In this context, several works have proposed localization methods based on calculating the contribution of each variable to the monitoring indices [8, 11]. In these approaches, the variable with the greatest contribution is considered to be the one that causes the fault. The present work uses the sensitivity analysis method [8, 27] to identify the faulty variables. It is founded on the fact that the parameters causing the problem can be detected through a rate of change in the output variables of the system. The contributions of the $i$th variable of a given vector $x$, with $m$ variables, to each index are

Accordingly, the contribution of the same variable to our proposed index will be

2.6. Offline and Online Inertia-Based KEPCA Fault Detection Algorithm

In the offline training phase, the normal operating data are used to determine several quantities: the $T^2$ and SPE statistics and the resulting NCI, the significance level, and the UCL of the NCI. In the online monitoring phase, the detection indices $T^2$ and SPE are computed for each new measurement vector. The process is judged to be under abnormal conditions if the NCI of the new sample surpasses the UCL of the NCI.

The following lines give the steps of the offline and online phases of the inertia-based KEPCA fault detection algorithm.

Phase 1: offline training.
Step 1: scale the input data obtained under normal operating conditions (NOC)
Step 2: choose the type of kernel function for mapping
Step 3: compute the kernel matrix of the input data under NOC and centre it
Step 4: perform the eigen-decomposition and sort the eigenvalues, with their corresponding eigenvectors, in descending order
Step 5: orthonormalize the eigenvectors
Step 6: compute the explained and residual inertia of each PC
Step 7: compute the sum of the Shannon entropy of each inertia
Step 8: select the optimal number of PCs, which corresponds to the maximum of the sum of entropy
Step 9: calculate the NCI using the inertia-based KEPCA method
Step 10: determine the UCL of the NCI

Phase 2: online monitoring.
Step 1: obtain the test sample x and normalize it as in the offline phase
Step 2: compute the kernel vector of the test sample x and centre it
Step 3: select the PCs of the test sample that have the maximum Shannon entropy of the inertia
Step 4: compare the NCI of the test sample with its upper control limit provided by the model development phase; the process is in control if the NCI is less than this limit. Otherwise, the process is out of control.
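The kernel-related steps above (offline steps 1 to 5 and online steps 1 and 2), followed by the computation of $T^2$ and SPE, can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions: the kernel is taken as exp(-||a - b||^2 / c), the scores are normalized by the eigenvalues divided by the number of training samples, SPE is taken as the difference between the full and the retained squared scores as is common in KPCA monitoring, and the number of retained PCs is passed in as `n_pcs` rather than selected by the entropy criterion detailed in [16]. All function and variable names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, c):
    """k(a, b) = exp(-||a - b||^2 / c) for every pair of rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)


def fit_kpca(X, c, n_pcs):
    """Offline steps 1-5. n_pcs is supplied by the caller in this sketch."""
    mean, std = X.mean(0), X.std(0, ddof=1)
    Z = (X - mean) / std                       # step 1: scale NOC data
    n = Z.shape[0]
    K = gaussian_kernel(Z, Z, c)               # steps 2-3: kernel matrix
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J         # step 3: centre it
    lam, vec = np.linalg.eigh(Kc)              # step 4: eigen-decomposition
    order = np.argsort(lam)[::-1]              # descending eigenvalues
    lam, vec = lam[order], vec[:, order]
    keep = lam > 1e-10 * lam[0]                # drop numerically zero modes
    lam, vec = lam[keep], vec[:, keep]
    alphas = vec / np.sqrt(lam)                # step 5: unit-norm feature-space axes
    return {"mean": mean, "std": std, "Z": Z, "K": K, "c": c,
            "lam": lam, "alphas": alphas, "n_pcs": n_pcs}


def monitor(model, x_new):
    """Online steps 1-2, then T2 and SPE for one test sample."""
    Z, K = model["Z"], model["K"]
    n = Z.shape[0]
    z = (np.asarray(x_new, dtype=float) - model["mean"]) / model["std"]
    k = gaussian_kernel(z[None, :], Z, model["c"])     # 1 x n kernel vector
    one = np.full((1, n), 1.0 / n)
    J = np.full((n, n), 1.0 / n)
    kc = k - one @ K - k @ J + one @ K @ J             # centred kernel vector
    t = (kc @ model["alphas"]).ravel()                 # scores on all kept axes
    p = model["n_pcs"]
    t2 = float(np.sum(t[:p] ** 2 / (model["lam"][:p] / n)))
    spe = float(np.sum(t**2) - np.sum(t[:p] ** 2))
    return t2, spe
```

Calling `monitor` on the training samples yields the reference values of $T^2$ and SPE from which control limits can then be estimated.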

3. Application: Tennessee Eastman Process

To demonstrate our proposed inertia-based KEPCA monitoring approach, we consider the well-known chemical benchmark, the Tennessee Eastman process (TEP). Figure 1 shows the control structure of the process. It is a nonlinear process, modelled on a real industrial plant, with dynamic behaviour. It comprises five main units (a compressor, a reactor, a condenser, a stripper, and a separator) and is widely adopted as a standard industrial benchmark for evaluating monitoring approaches [28]. TEP comprises 22 continuous variables (in blue), 12 manipulated variables (in orange), and 19 composition measurements (in green). The sampling interval was set at 3 min, and the fault was injected at sample 160. More details and the operating conditions are described in [28, 29]. Tables 1 and 2 describe the variables and the studied faults of the TE process, respectively.

3.1. Fault Detection Rule

Concerning fault detection, some works consider a fault detected the first time the monitoring index exceeds its control limit [9, 10], while other researchers define the detection time as the first sample number after eight sequential samples have exceeded the control limit [8]. In [16], a fault is counted as detected when a monitoring index exceeds the control limit on at least two consecutive occasions. To make the comparison more accurate, in our study we consider a fault successfully detected when the monitoring index surpasses the control limit at least two consecutive times, as in [16]. We then evaluate the monitoring performance through the computation time (CT), the missed detection rate (MDR), and the false alarm rate (FAR), as in [25], based on the fault detection rule mentioned above.
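The consecutive-exceedance rule can be sketched in a few lines of Python. The function name is hypothetical, and reporting the first sample of the qualifying run (rather than the last) is an assumption of this sketch, since the rule as stated leaves that choice open.

```python
def first_detection(index_values, limit, consecutive=2):
    """Return the 1-based sample number at which a run of `consecutive`
    exceedances of `limit` starts, or None if no such run occurs.

    Assumption: the detection sample is the first sample of the run.
    """
    run = 0
    for i, value in enumerate(index_values, start=1):
        run = run + 1 if value > limit else 0
        if run >= consecutive:
            return i - consecutive + 1
    return None
```

Setting `consecutive=1` recovers the first-exceedance rule of [9, 10].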

3.2. Monitoring Performance Metrics

To evaluate the fault detection performance, three metrics are generally used. The computation time is defined as the period between the time when a fault is introduced and the time of its detection. The false alarm rate (FAR) expresses the percentage of fault-free samples that exceed the detection limit, relative to all the fault-free data. It can be computed as follows [25]:

$$\mathrm{FAR} = \frac{\text{number of fault-free samples exceeding the limit}}{\text{total number of fault-free samples}} \times 100.$$

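The missed detection rate (MDR) is the percentage of faulty samples that do not surpass the detection limit, relative to all the faulty data:

$$\mathrm{MDR} = \frac{\text{number of faulty samples not exceeding the limit}}{\text{total number of faulty samples}} \times 100.$$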
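As a small worked illustration of the two rates, the Python sketch below computes FAR and MDR from a sequence of monitoring-index values. The function name and the `n_normal` argument (the number of fault-free samples that precede the fault) are assumptions of this sketch.

```python
def far_mdr(index_values, limit, n_normal):
    """Return (FAR, MDR) in percent.

    The first `n_normal` values are treated as fault-free and the remaining
    values as faulty (an assumption about how the test record is laid out).
    """
    normal = index_values[:n_normal]
    faulty = index_values[n_normal:]
    false_alarms = sum(1 for v in normal if v > limit)
    missed = sum(1 for v in faulty if v <= limit)
    far = 100.0 * false_alarms / len(normal)
    mdr = 100.0 * missed / len(faulty)
    return far, mdr
```

For example, with one exceedance among four fault-free samples and one non-exceedance among four faulty samples, both rates are 25%.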

3.3. Experimental Result

In this section, we demonstrate the efficiency of our proposed inertia-based KEPCA approach using faults 1 and 14 of the nonlinear TE process. Fault 1 is a change in the A/C feed ratio with the B composition held constant, while fault 14 is related to the reactor cooling water valve. To construct the model, we used all the aforementioned variables apart from the 12th manipulated variable (the agitation speed of the reactor's stirrer), which is constant. All 500 samples gathered under NOC were used as the training set, while 960 samples gathered under faulty operating conditions were used as test data, each with 52 variables. The number of automatically selected PCs is 17, expressing the maximum inertia entropy. The parameter of the Gaussian function, used for mapping in model development, was set as $c = r m \sigma^2$, where $m$ is the number of studied variables, $\sigma^2$ is the variance of the input data, and $r$ is a constant depending on the process being examined. For the TE process, $r$ is equal to 40, as set in [15].

Figure 2 shows the automatic determination of the number of retained PCs. The retained PCs are necessary and sufficient in terms of information content (Shannon entropy). Their number is estimated to be 17, considering all 52 variables of the TE process. In contrast, in [15] the number of retained PCs was specified indirectly, through the cumulative percent eigenvalue (CPE) supplied as an input parameter. Moreover, that work examines only 33 variables without a clear justification, which may leave out valuable information from the data.

It is worth noting that we obtained the same number of selected PCs, namely 17, for all studied faults.

In Figures 3 and 4, the monitoring index is plotted as a solid curve, and the horizontal dashed line represents the 99% confidence limit. The vertical dashed line marks the 160th sample, at which the fault was introduced. The NCI of the inertia-based KEPCA model quickly detects fault 1 at the 163rd sample and fault 14 at the 161st sample. Therefore, inertia-based KEPCA monitoring with our NCI reveals the presence of fault 1 and fault 14 throughout the faulty operation period.

Figure 5 shows the variable contributions to fault 1. The largest contributions come from variable number 1 and variable number 44. According to Table 1, they correspond to the A feed and the A feed flow valve, respectively. Evidently, these two variables are closely connected to the component A feed. Figure 6 shows the variables with large contributions to fault 14. Variable numbers 9, 21, and 51 are identified as the cause of this fault. Referring to Table 1, variable number 9 is the reactor temperature, variable number 21 is the reactor cooling water inlet temperature, and variable number 51 is the reactor cooling water flow valve. All of these variables are clearly connected to the fault of the reactor cooling water valve.

Tables 3 and 4 provide the detection rates and computation times, respectively, for all 21 faults using inertia-based KEPCA with our proposed NCI, compared with the index proposed by Choi et al. [10], named the unified index. The results on the TE process show that the choice of the monitoring index affects the process monitoring result. In terms of fault detection rates (FDRs), our NCI-Parzen achieves overall higher rates than its counterpart, the unified index. Moreover, comparing our results with those given in [15], named KPCA-KDE, we can see that our method is competitive with KPCA-KDE. Both inertia-based KEPCA and KPCA-KDE provide good detection rates for faults 1, 2, 4, 6, 7, 8, 12, 13, 14, and 17. However, for the faults that have proved difficult to detect (faults 3, 9, and 15), our inertia-based KEPCA method performs better than KPCA-KDE. In addition, Table 4 shows that the elapsed time before a fault is detected with our NCI is less than or equal to that with the unified index, whereas the computation times of KPCA-KDE and inertia-based KEPCA remain almost the same. Therefore, integrating our NCI into the inertia-based KEPCA technique gives better monitoring results than combining the unified index with inertia-based KEPCA.

4. Conclusion

The present study investigated our proposed inertia-based KEPCA method for automatically selecting a sufficient number of PCs for model development. The model was constructed under NOC, and only 17 PCs were automatically estimated to be sufficient, expressing the maximum inertia entropy. Hence, this approach can be integrated into multivariate statistical methods. In addition, our new combined index based on $T^2$ and SPE integrates the information gained from the principal and the residual spaces. The NCI detects more faults, and detects them earlier, than the unified index, as measured by the fault detection rate and the computation time. Thus, it can be extended to nonlinear process fault detection in order to simplify the detection task and avoid confusion. The UCL of our NCI was derived directly from its PDF, estimated using the Parzen window estimator. Overall, the results show that our approaches improve dimensionality reduction, fault detection, and fault identification when applied to the nonlinear Tennessee Eastman benchmark process.

Data Availability

The Tennessee Eastman data used to support the findings of this study are available at http://depts.washington.edu/control/LARRY/TE/download.html.

Conflicts of Interest

The authors declare that there are no conflicts of interest.