Abstract

Due to its simplicity and ease of implementation, partial least squares (PLS) is an efficient approach for large-scale industrial processes. However, like many data-based methods, PLS is quite sensitive to outliers, a common abnormal characteristic of measured process data that can significantly degrade its monitoring performance. To develop a robust prediction and fault detection method, this paper employs partial robust M-regression (PRM) to deal with outliers. Moreover, to eliminate variations that are useless for prediction, an orthogonal decomposition is performed on the measurable variables space, which allows the new method to serve as a powerful tool for quality-related prediction and fault detection. Finally, the proposed method is applied to the Tennessee Eastman (TE) process.

1. Introduction

With the rapid development of modern science and technology, industrial production processes have become more automated and more complicated. As a result, the safety and reliability of such processes have become critical concerns in process design [1, 2], and much effort has been devoted to them in both industry and academia. If a precise analytical model of the process is known a priori, well-developed model-based diagnosis approaches can be successfully applied to online process monitoring [3–8]. However, owing to the limited understanding of the underlying process, a precise model is usually very difficult to obtain, so model-based techniques often cannot be applied in practice.

Unlike model-based approaches, data-driven techniques do not require a model of the complex process, and many efficient data-driven methods have been developed in recent years [9–13]. Owing to its simplicity and ease of implementation, partial least squares [14, 15] has quickly become one of the most popular of them. By identifying the regression coefficient between the measurable variables space and the prediction variables space, PLS can easily be applied to predict quality-related indicators [16, 17]. Successful applications of PLS in fault detection have also been reported in the literature [9, 16, 18]. However, PLS is very sensitive to abnormal characteristics of the measured process data, such as outliers, which may be caused by formatting errors, hardware failures, nonrepresentative sampling, and so forth. A single outlier may seriously degrade the performance of PLS. In a statistical sense, outliers are samples with extreme values located far from the data majority. Outliers in the measurable variables space and in the prediction variables space are called high leverage points and high residual points, respectively. To overcome this drawback of classical PLS, many robust versions of PLS have been proposed [19–21]. Nevertheless, these methods are either not robust to high leverage points or not efficient enough. To obtain a method that is both robust and efficient, Serneels et al. [22] proposed partial robust M-regression (PRM), which weakens the effect of outliers through a suitable weighting scheme at a relatively low computational cost. PRM has since become a popular method, and a MATLAB toolbox implementing it has been developed.

On the other hand, modern industrial processes pursue high product quality, not just high output. Ensuring high product quality is essential for enterprises to survive the fierce competition of the worldwide market. Nowadays, quality-related prediction and diagnosis play a critical role in practical production and have a wide range of applications [23, 24]. It is therefore of practical significance to develop a robust, data-driven, quality-related prediction and diagnosis method that can deal with outliers, and this is the main purpose of this paper. Based on partial robust M-regression, this paper first realizes a PRM-based prediction approach. Then, by performing an orthogonal decomposition [16] on the measurable variables space, it develops a quality-related prediction and diagnosis method.

The rest of this paper is organized as follows. Section 2 reviews the basic algorithm of partial robust M-regression and then proposes the new method. Section 3 briefly introduces the Tennessee Eastman (TE) industrial benchmark. Section 4 presents the simulation results, and Section 5 concludes the paper.

2. Preliminaries and the New Approach

2.1. Partial Robust M-Regression

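PRM is a robust version of PLS that weakens the effect of outliers through a suitable weighting scheme. We first review the classical PLS algorithm. Consider a measurement data matrix $X = [x_1, \ldots, x_n]^T \in \mathbb{R}^{n \times m}$ (the measurable variables space), which records $n$ observations of $m$ measurable variables, and a quality variable vector $y = [y_1, \ldots, y_n]^T \in \mathbb{R}^{n}$, which contains $n$ observations of one prediction variable (only a univariate output is considered here). By projecting $X$ and $y$ onto the latent variables space, we obtain the PLS model
$$X = TP^T + E,$$
$$y = Tq + f,$$
$$\hat{y} = Tq = Xb, \tag{5}$$
where $T \in \mathbb{R}^{n \times A}$ is the score matrix, $A$ is the number of latent variables, and $P \in \mathbb{R}^{m \times A}$ and $q \in \mathbb{R}^{A}$ are the loadings of $X$ and $y$, respectively. $\hat{y}$ is the predicted output and $b \in \mathbb{R}^{m}$ is the regression coefficient between the measurable variables space and the prediction variables space; $E$ and $f$ are the residuals of $X$ and $y$, respectively. In practice, the PLS model is computed by an iterative algorithm such as NIPALS [25] or SIMPLS [15]. Taking SIMPLS as an example, the computation is summarized in Algorithm 1.

S1: Normalize $X$ and $y$ to zero mean and unit variance.
S2: Set $S = X^T y$ and $V = [\,]$; for $i = 1, \ldots, A$:
 2.1 compute $r_i$, the dominant eigenvector of $SS^T$;
 2.2 $t_i = X r_i$;
 2.3 $r_i \leftarrow r_i / \lVert t_i \rVert$, $t_i \leftarrow t_i / \lVert t_i \rVert$;
 2.4 $p_i = X^T t_i$;
 2.5 $q_i = y^T t_i$;
 2.6 $v_i = p_i - VV^T p_i$, with $V = [v_1, \ldots, v_{i-1}]$;
 2.7 $v_i \leftarrow v_i / \lVert v_i \rVert$;
 2.8 $S \leftarrow S - v_i v_i^T S$.
S3: $T = [t_1, \ldots, t_A]$, $P = [p_1, \ldots, p_A]$, $R = [r_1, \ldots, r_A]$,
 $q = [q_1, \ldots, q_A]^T$, $b = Rq$, $\hat{y} = Xb$.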
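To make Algorithm 1 concrete, here is a minimal NumPy sketch of the standard SIMPLS recursion for a single output. It assumes $X$ and $y$ have already been normalized (step S1); the function name `simpls` and its return order are illustrative choices, not part of any library.

```python
import numpy as np

def simpls(X, y, n_components):
    """Univariate SIMPLS sketch (de Jong's recursion).

    X: (n, m) array and y: (n,) array, assumed already normalized.
    Returns scores T, X-loadings P, y-loadings q, weights R and coefficients b.
    """
    n, m = X.shape
    S = X.T @ y                        # cross-covariance X^T y, shape (m,)
    T = np.zeros((n, n_components))
    P = np.zeros((m, n_components))
    R = np.zeros((m, n_components))
    V = np.zeros((m, n_components))
    q = np.zeros(n_components)
    for i in range(n_components):
        # For a single output, the dominant eigenvector of S S^T is S / ||S||.
        r = S / np.linalg.norm(S)
        t = X @ r                      # score vector
        norm_t = np.linalg.norm(t)
        r, t = r / norm_t, t / norm_t  # rescale so that ||t|| = 1 and t = X r
        p = X.T @ t                    # X-loading
        q[i] = y @ t                   # y-loading
        # Orthonormalize p against the previous basis vectors v_1..v_{i-1}.
        v = p - V[:, :i] @ (V[:, :i].T @ p)
        v /= np.linalg.norm(v)
        S = S - v * (v @ S)            # deflate the cross-covariance
        T[:, i], P[:, i], R[:, i], V[:, i] = t, p, r, v
    b = R @ q                          # regression coefficients, y_hat = X b
    return T, P, q, R, b
```

With as many components as variables, SIMPLS reduces to ordinary least squares, which is a convenient sanity check for an implementation like this one.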

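As mentioned previously, outliers can occur in both the measurable variables space and the prediction variables space. To weaken their influence, PRM uses two types of weights, leverage weights $w_i^x$ and residual weights $w_i^r$, computed as
$$w_i^x = f\left(\frac{\lVert t_i - \mathrm{med}_{L1}(T) \rVert}{\mathrm{med}_j \lVert t_j - \mathrm{med}_{L1}(T) \rVert}, c\right), \tag{6}$$
$$w_i^r = f\left(\frac{r_i}{\hat{\sigma}}, c\right), \tag{7}$$
with
$$r_i = y_i - t_i^T q, \tag{8}$$
$$\hat{\sigma} = \mathrm{med}_i \left| r_i - \mathrm{med}_j (r_j) \right|, \tag{9}$$
$$f(z, c) = \frac{1}{\left(1 + \left| z/c \right|\right)^2}, \tag{10}$$
where $t_i^T$ is the $i$th row of the score matrix $T$, $q$ is the $y$-loading vector, and $\mathrm{med}$ and $\mathrm{med}_{L1}$ denote the median and the $L_1$-median, respectively. $f$ is the “fair” function and $c$ is a tuning constant [22]. The global weight of the $i$th observation is then
$$w_i = w_i^r w_i^x. \tag{11}$$
PRM is solved by an iteratively reweighted partial least squares algorithm: in each iteration, the $i$th observation $(x_i, y_i)$ is first multiplied by $\sqrt{w_i}$, and PLS regression is then performed on the reweighted data. The detailed steps of PRM are summarized in Algorithm 2.

S1: Calculate $w_i^r$ using (7) with initial residuals $r_i = y_i - \mathrm{med}_j(y_j)$, calculate $w_i^x$ using (6) with $t_i$ replaced by $x_i$,
 and then compute $w_i$ using (11);
S2: Multiply each row of $X$ and $y$ by $\sqrt{w_i}$, then perform
 PLS regression (see Algorithm 1) on the reweighted data. Divide each row of the resulting score matrix $T$ by $\sqrt{w_i}$;
S3: Calculate $r_i$ using (8), and update $w_i^x$, $w_i^r$, and $w_i$ using (6), (7), and (11);
S4: If the relative difference in norm between two consecutive approximations of the regression coefficients is smaller than
 a preset threshold, continue to the next step; otherwise, go back to S2;
S5: Take $T$, $P$, $q$, and $b$ from the PLS step of the final iteration.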
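The reweighting loop of Algorithm 2 can be sketched as follows. This is an illustrative NumPy implementation under several assumptions not stated in the text: the fair-function constant defaults to $c = 4$, the reweighted data are re-centered by their weighted means in each iteration, the $L_1$-median is approximated with Weiszfeld iterations, and the stopping tolerance is an arbitrary choice. All function names are hypothetical, and the SIMPLS and weighting helpers are repeated in compact form so the block runs on its own.

```python
import numpy as np

def simpls_coef(X, y, A):
    """Univariate SIMPLS on centered data; returns b, the weights R and y-loadings q."""
    m = X.shape[1]
    S, R, V, q = X.T @ y, np.zeros((m, A)), np.zeros((m, A)), np.zeros(A)
    for i in range(A):
        r = S / np.linalg.norm(S)                  # dominant eigenvector of S S^T
        t = X @ r
        r, t = r / np.linalg.norm(t), t / np.linalg.norm(t)
        p = X.T @ t
        q[i] = y @ t
        v = p - V[:, :i] @ (V[:, :i].T @ p)
        v /= np.linalg.norm(v)
        S = S - v * (v @ S)                        # deflate the cross-covariance
        R[:, i], V[:, i] = r, v
    return R @ q, R, q

def fair(z, c):
    """'Fair' weight function f(z, c) = 1 / (1 + |z / c|)^2."""
    return 1.0 / (1.0 + np.abs(z / c)) ** 2

def l1_median(Z, n_iter=200):
    """Approximate L1 (spatial) median of the rows of Z with Weiszfeld iterations."""
    mu = np.median(Z, axis=0)                      # coordinatewise median as start value
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(Z - mu, axis=1), 1e-12)  # guard against d = 0
        mu = (Z / d[:, None]).sum(axis=0) / (1.0 / d).sum()
    return mu

def prm_weights(Z, r, c):
    """Global weights w_r * w_x from scores (or raw rows) Z and residuals r."""
    dist = np.linalg.norm(Z - l1_median(Z), axis=1)
    w_x = fair(dist / np.median(dist), c)                   # leverage weights, cf. (6)
    w_r = fair(r / np.median(np.abs(r - np.median(r))), c)  # residual weights, cf. (7)
    return w_r * w_x                                        # global weights, cf. (11)

def prm(X, y, A, c=4.0, tol=1e-4, max_iter=100):
    """Sketch of Algorithm 2; returns coefficients b, intercept b0 and final weights w."""
    w = prm_weights(X, y - np.median(y), c)        # S1: x_i in place of t_i
    b_old = None
    for _ in range(max_iter):
        sw = np.sqrt(w)
        mx, my = w @ X / w.sum(), w @ y / w.sum()  # weighted centering (assumption)
        # S2: PLS regression on rows scaled by sqrt(w_i).
        b, R, q = simpls_coef(sw[:, None] * (X - mx), sw * (y - my), A)
        T = (X - mx) @ R          # equals the weighted scores divided by sqrt(w_i)
        r = (y - my) - T @ q      # S3: residuals, cf. (8)
        w = prm_weights(T, r, c)
        # S4: stop once consecutive coefficient vectors agree to within tol.
        if b_old is not None and np.linalg.norm(b - b_old) <= tol * np.linalg.norm(b_old):
            break
        b_old = b
    return b, my - mx @ b, w
```

In this sketch, a new sample `x` would be predicted as `b0 + x @ b`, where the intercept accounts for the weighted centering.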

2.2. The Proposed Prediction and Diagnosis Approach

Based on the PRM algorithm (see Algorithm 2) and (5), online prediction of the quality-related indicator is straightforward: for a new sample $x$, $\hat{y} = x^T b$. Next, we develop a PRM-based fault detection scheme. For most existing schemes, detecting all faults promptly and accurately is the most important evaluation criterion. In practice, however, not all faults cause serious damage; for example, faults unrelated to the quality-related indicators are harmless to production. Therefore, when the nature of a fault is known in advance, reducing the false alarm rate for faults unrelated to the quality-related indicators is another important evaluation criterion [16]. Zhou et al. [26] proposed such a criterion, which classifies faults into two categories: faults that affect $\hat{y}$ and faults that have no influence on $\hat{y}$. Based on this criterion, test statistics and the corresponding thresholds should be designed separately in the two subspaces $\mathcal{S}_{\hat{x}}$ and $\mathcal{S}_{\tilde{x}}$. Following the idea of [16], an orthogonal decomposition is employed in the new fault detection scheme.

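First, perform a singular value decomposition (SVD) of the matrix $bb^T$, where $b \in \mathbb{R}^{m}$ is the PRM regression coefficient:
$$bb^T = \begin{bmatrix} \Gamma & \Gamma^{\perp} \end{bmatrix} \begin{bmatrix} \Lambda & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \Gamma^T \\ (\Gamma^{\perp})^T \end{bmatrix}, \tag{12}$$
where $\Gamma \in \mathbb{R}^{m \times 1}$, $\Gamma^{\perp} \in \mathbb{R}^{m \times (m-1)}$, and $\Lambda \in \mathbb{R}^{1 \times 1}$; $bb^T$ has rank one because the output is univariate.

Then, construct the projectors onto the orthogonal subspaces $\mathcal{S}_{\hat{x}} = \mathrm{span}(\Gamma)$ and $\mathcal{S}_{\tilde{x}} = \mathrm{span}(\Gamma^{\perp})$:
$$\Pi_{\hat{x}} = \Gamma\Gamma^T, \qquad \Pi_{\tilde{x}} = \Gamma^{\perp}(\Gamma^{\perp})^T = I_m - \Gamma\Gamma^T. \tag{13}$$

Last, project a sample $x \in \mathbb{R}^{m}$ onto $\mathcal{S}_{\hat{x}}$ and $\mathcal{S}_{\tilde{x}}$:
$$\hat{x} = \Pi_{\hat{x}} x, \qquad \tilde{x} = \Pi_{\tilde{x}} x, \qquad x = \hat{x} + \tilde{x}. \tag{14}$$
Since $\Gamma$ is parallel to $b$, $b^T\tilde{x} = 0$, and therefore $\hat{y} = b^T x = b^T\hat{x}$: the predicted output depends only on $\hat{x}$.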
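A short NumPy check of the SVD-based orthogonal decomposition described above: the SVD of $bb^T$ splits the measurement space into the direction of $b$ and its complement, and the component $\tilde{x}$ is invisible to the linear predictor. The function name `orthogonal_projectors` and the numbers used are hypothetical.

```python
import numpy as np

def orthogonal_projectors(b):
    """Projectors onto span(b) and its orthogonal complement, via the SVD of b b^T."""
    b = np.asarray(b, dtype=float).reshape(-1, 1)   # coefficient as an (m, 1) column
    U, _, _ = np.linalg.svd(b @ b.T)                 # b b^T is symmetric with rank one
    Gamma, Gamma_perp = U[:, :1], U[:, 1:]           # leading singular vector / the rest
    Pi_hat = Gamma @ Gamma.T                         # projector that yields x_hat
    Pi_tilde = Gamma_perp @ Gamma_perp.T             # projector that yields x_tilde
    return Pi_hat, Pi_tilde

# Usage: split one sample and check that x_tilde does not move the prediction.
b = np.array([1.0, -2.0, 0.5])
Pi_hat, Pi_tilde = orthogonal_projectors(b)
x = np.array([0.3, 1.2, -0.7])
x_hat, x_tilde = Pi_hat @ x, Pi_tilde @ x
```

For a single output, `Pi_hat` could equally be formed as `np.outer(b, b) / (b @ b)`; the SVD route mirrors the text and generalizes to a coefficient matrix with several outputs.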

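After obtaining $\hat{x}$ and $\tilde{x}$, test statistics and thresholds are designed in the two subspaces. First, the $T^2$ statistic is used to monitor $\mathcal{S}_{\hat{x}}$:
$$T^2_{\hat{x}} = \hat{x}^T \left(\frac{\hat{X}^T\hat{X}}{n-1}\right)^{+} \hat{x}, \tag{15}$$
and the corresponding threshold is
$$J_{\mathrm{th},\hat{x}} = \chi^2_{\alpha}(1), \tag{16}$$
where $\hat{X}$ is the matrix whose rows are the projected training samples $\hat{x}_i^T$, $i = 1, \ldots, n$, $(\cdot)^{+}$ denotes the Moore–Penrose pseudo-inverse, and $\chi^2_{\alpha}(l)$ is the critical value of the $\chi^2$ distribution with $l$ degrees of freedom at significance level $\alpha$. If $T^2_{\hat{x}} > J_{\mathrm{th},\hat{x}}$, a fault that affects $\hat{y}$ has occurred; otherwise, the process is fault free.

Similarly, $\mathcal{S}_{\tilde{x}}$ is monitored with the statistic and threshold
$$T^2_{\tilde{x}} = \tilde{x}^T \left(\frac{\tilde{X}^T\tilde{X}}{n-1}\right)^{+} \tilde{x}, \tag{17}$$
$$J_{\mathrm{th},\tilde{x}} = \chi^2_{\alpha}(m-1), \tag{18}$$
where the rows of $\tilde{X}$ are the projected training samples $\tilde{x}_i^T$. The degrees of freedom in (16) and (18) equal the dimensions of the two subspaces, that is, one and $m-1$ for a univariate output. If $T^2_{\tilde{x}} > J_{\mathrm{th},\tilde{x}}$, a fault that has no influence on $\hat{y}$ has occurred; otherwise, the process is fault free. The main steps of the new method are summarized in Algorithm 3.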

S1: Compute the regression coefficient $b$ using the PRM algorithm (see Algorithm 2).
S2: Realize online prediction using (5).
S3: Perform SVD on $bb^T$.
S4: Monitor subspace $\mathcal{S}_{\hat{x}}$ using (15) and (16).
S5: Monitor subspace $\mathcal{S}_{\tilde{x}}$ using (17) and (18).
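The monitoring steps S4 and S5 can be sketched as below. This sketch assumes the training matrix is normalized in the same way as the data used to fit $b$, builds the projector onto the direction of $b$ directly as $bb^T/(b^Tb)$, and uses pseudo-inverses with an explicitly chosen cutoff because both projected covariance matrices are rank deficient. The function names are hypothetical, and the thresholds are left to the caller; for example, `scipy.stats.chi2.ppf(1 - alpha, df)` returns a $\chi^2$ critical value.

```python
import numpy as np

def make_t2_monitor(X_train, b, rcond=1e-10):
    """Return a function computing (T2_hat, T2_tilde) for the rows of X_new.

    X_train: (n, m) normal-operation data, normalized like the data used to fit b.
    b: (m,) regression coefficient vector (e.g., from PRM).
    """
    b = np.asarray(b, dtype=float)
    m = b.size
    Pi_hat = np.outer(b, b) / (b @ b)        # projector onto the direction of b
    Pi_tilde = np.eye(m) - Pi_hat            # projector onto its orthogonal complement
    n = X_train.shape[0]
    X_hat, X_tilde = X_train @ Pi_hat, X_train @ Pi_tilde
    # Both covariance matrices are rank deficient, hence the pseudo-inverses.
    S_hat = np.linalg.pinv(X_hat.T @ X_hat / (n - 1), rcond=rcond)
    S_tilde = np.linalg.pinv(X_tilde.T @ X_tilde / (n - 1), rcond=rcond)

    def statistics(X_new):
        X_new = np.atleast_2d(X_new)
        xh, xt = X_new @ Pi_hat, X_new @ Pi_tilde   # projected samples, one per row
        t2_hat = np.einsum("ij,jk,ik->i", xh, S_hat, xh)
        t2_tilde = np.einsum("ij,jk,ik->i", xt, S_tilde, xt)
        return t2_hat, t2_tilde

    return statistics
```

The degrees of freedom for the two thresholds equal the ranks of the two projected covariance matrices, which is why a sample moving only along $b$ raises the first statistic while leaving the second near zero.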

3. Description of Tennessee Eastman Benchmark

The Tennessee Eastman process is a chemical plant simulator with 53 available variables: 12 manipulated variables (XMV(1–12)) and 41 process variables (XMEAS(1–41)). It was developed by the Eastman Chemical Company as a benchmark for research purposes and can be downloaded from http://brahms.scs.uiuc.edu. Figure 1 shows the schematic diagram of the TE process. The process contains five units: a reactor, a condenser, a vapor-liquid separator, a product stripper, and a recycle compressor. It produces two products from four reactants; an inert and a by-product are also present, giving a total of eight components, named A, B, C, D, E, F, G, and H [10]. In addition, 21 faults (IDV(1–21)) are predefined in the benchmark for monitoring studies, as listed in Table 1. The effectiveness of the proposed approach is verified on this benchmark.

4. Simulation Results

In this section, the proposed scheme is applied to the TE benchmark. The simulation involves two tasks: quality-related prediction and fault detection. First, the input and output variables are determined. Of the 53 available variables, we choose 22 process measurements (XMEAS(1–22)) and 11 manipulated variables (XMV(2–12)) as the input variables. The analyzer for component G (XMEAS(40)) is used for the final product analysis, so we choose it as the output variable, that is, the quality indicator. A total of 480 samples are obtained from normal process operation and used to design the scheme. In addition, a certain percentage of outliers is artificially added to the normal samples. Four prediction experiments are carried out, with 0%, 5%, 10%, and 15% outliers mixed into the normal samples, respectively; Figures 2, 3, 4, and 5 show the results. As these figures show, the classical PLS method has an obvious prediction bias caused by the outliers, especially in Figure 5. In contrast, the PRM-based method provides a more accurate prediction. These results illustrate the lack of robustness of PLS and confirm the robustness of PRM.

Next, we apply the PRM-based method to quality-related fault detection. As explained previously, in the sense of the quality-related classification of faults, the PRM-based method should distinguish whether or not a fault affects the predicted output. To illustrate this, we detect the faults IDV, IDV, IDV, and IDV using the PRM-based fault detection method. The statistics in the orthogonal subspaces $\mathcal{S}_{\hat{x}}$ and $\mathcal{S}_{\tilde{x}}$ are shown in Figures 6(a)–6(d), labeled “T2h” and “T2t”, respectively. According to [16], $\hat{x}$ and $\tilde{x}$ are completely unrelated. As Figures 6(a)–6(d) show, faults that affect the predicted output and faults that have no effect on it are clearly distinguished. For example, in Figure 6(b), both subspaces indicate a fault between 160 s and 350 s, but the fault in one of them disappears after 350 s. In Figure 6(d), between 250 s and 400 s, a fault is present in only one of the two subspaces. In Figures 6(a) and 6(c), the two subspaces indicate the presence or absence of a fault synchronously. Taken together, these simulation results demonstrate the effectiveness of the proposed approach.

5. Conclusion

To address the lack of robustness of PLS to missing values and outliers, this paper presents a PRM-based quality-related prediction and fault detection scheme. A prediction method is first implemented on the basis of partial robust M-regression. Then, following the idea of orthogonal projection, separate test statistics are designed in the two orthogonal subspaces. In this way, quality-related fault detection is realized: faults that affect the quality indicator are distinguished from those that do not, which reduces the false alarm rate for faults unrelated to the quality indicator. The effectiveness of the new approach is demonstrated on the Tennessee Eastman benchmark process.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The research leading to these results has received funding from the Polish-Norwegian Research Program operated by the National Centre for Research and Development under the Norwegian Financial Mechanism 2009-2014 in the frame of Project Contract no. Pol-Nor/200957/47/2013.