Abstract

A novel fault detection method is proposed for processes with nonlinear and multimodal batch behaviors. By calculating the Mahalanobis distances between samples, data with similar characteristics are replaced by their mean, so that the number of training samples is easily reduced. Moreover, super-ball regions around the mean and the variance of the training data are introduced, which not only retain the statistical properties of the original training data but also prevent unlimited reduction of the data. To identify faults accurately, two control limits are determined by investigating the distributions of the distances and angles from training samples to their nearest neighboring samples in the reduced database; in this way, the traditional k-nearest-neighbor fault detection (FD-kNN) method, which considers only distances, is extended. Another feature of the proposed method is that the control limits vary as the database is updated, so that an adaptive fault detection technique is obtained. Finally, numerical examples and a case study are given to illustrate the effectiveness and advantages of the proposed method.

1. Introduction

Fault detection has been a focus of recent research efforts because of the growing need for quality monitoring and safe operation in practical process engineering [1–4]. Dynamic changes, multiple modes, and nonlinearity pose serious challenges for fault detection in most process engineering applications, such as semiconductor manufacturing processes [5–8]. Hence, an effective and adaptive fault detection technique that can deal with these obstacles is worth investigating.

Note that nonlinear PCA [9] and dynamic PCA [10] have been reported for tackling dynamic and nonlinear processes. Following them, [11] investigated fault detection for nonlinear systems based on T-S fuzzy modeling theory. Reference [12] investigated nonlinear system modeling and fault detection for electric power systems. However, the aforementioned methods fail to work well for dynamic systems that are both nonlinear and multimodal. Recently, [5, 6, 13–15] proposed detection techniques that jointly address the nonlinear, multimodal, and dynamic behaviors of systems. References [5, 6] applied the kNN rule and an improved PCA-kNN to fault detection for semiconductor manufacturing processes with nonlinear and multimode behaviors. Reference [14] proposed an adaptive local-model-based monitoring approach for online monitoring of nonlinear, multimode processes with non-Gaussian information. Reference [15] proposed a data-based just-in-time (JIT) SPC detection and identification technique, in which the distance was calculated and checked every time fault detection was conducted. Reference [13] reduced and updated the training database and presented a JIT fault detection method.

Note that how to determine the scale of the reduced database is key to precise fault detection. However, to the best of the authors' knowledge, how to reduce and update the training sample set so as to lighten the computational load while achieving high detection performance has not been fully investigated to date. Moreover, data drift and shift as well as environmental disturbances arise in practical engineering applications, so that originally normal data may be mistaken for faults, or vice versa. Designing time-varying control limits is a potential approach to overcoming these negative factors, but few results have been available in the literature so far, which motivates the present study.

This paper is concerned with the design of two time-varying control limits for online fault detection in multimode and nonlinear processes. The key idea is that the Mahalanobis distances among samples and the super-ball domains of the mean and variance of the samples are first computed in a JIT fashion to reduce and update the training data set as queries are detected. Then, two control limits are computed in terms of both the kNN distance rule and the kNN angle rule, so that we can accurately identify online whether the current data point is normal. It is worth pointing out that the two control limits vary with the updated database, so that an adaptive fault detection technique is obtained which can effectively eliminate the impact of data drift and shift on detection performance. Several distinguishing differences from the existing solutions for fault detection in industrial processes with nonlinear and multimode behaviors are given below.
(1) Compared with [5, 6], the FD-kNN method is improved in the sense that the stochastic characteristic (mean) of the angles from training samples to their k-nearest neighboring training samples is investigated to calculate a control limit. Thus, two control limits are derived, which is a significant contribution of this paper.
(2) Different from [15], we propose a new fault detection framework that reduces and updates the database as well as varies the control limits. Note that [5, 6] do not focus on this framework either.
(3) There are two significant differences from [13]. The first is the method of reducing the training database: here, two thresholds are proposed to control the reduction of the data. The second is that two time-varying control limits are presented for detecting faults, whereas only one time-varying control limit is derived in [13].

This paper is organized as follows. An algorithm for reducing the training database is presented in Section 2. Section 3 describes the online fault detection method. Section 4 presents numerical examples, and Section 5 gives a case study of an industrial etch process. Conclusions are stated in Section 6.

2. Reducing the Training Data Set

In this section, we describe a technique for reducing the training data set.

The need to reduce the number of training samples in the database originates from the need to lower the computational load and cost of fault detection. The key, however, is how to control the degree of reduction while guaranteeing high detection quality. Here, we exploit the property that the closer the Mahalanobis distance between two samples, the more similar their basic features. The basic idea is that the two samples with the smallest Mahalanobis distance in the database are found and replaced by their mean [16], and this process is repeated until the mean or the variance of the samples deviates from that of the original data by more than a specified threshold. The training data matrix stores the samples as rows and the variables as columns and also represents the raw data set. The detailed algorithm is given as follows.

Algorithm 1. One has the following.
Step  1. Compute the covariance matrix of the training data set, treating each variable (column) as a stochastic variable with its corresponding mean.
Step  2. Calculate the mean vector and the variance vector of the samples, where each entry of the variance vector is the sample variance of the corresponding stochastic variable.
Step  3. For each sample, calculate the Mahalanobis distances between it and all the other samples stored in the data set, and arrange them in a Mahalanobis distance matrix whose (i, j) entry is the distance between sample i and sample j.
Step  4. Search for the minimum nonzero element in each row of the distance matrix; these row minima form one row vector, and the column index of each row minimum is recorded in a second row vector. Then find the smallest value among the row minima; if it occurs in row i and its recorded column index is j, it is the minimum of the whole matrix, which means that the Mahalanobis distance between sample i and sample j is the smallest in the training data set.
Step  5. Replace sample i by the mean of sample i and sample j, and delete sample j; the data matrix is thus reduced by one row. As in Step 2, recalculate the mean and variance of the samples. Given a threshold, if the new mean and variance lie inside the super-ball domains centered at the mean and variance of the original samples (i.e., their deviations do not exceed the threshold), return to Step 3; otherwise, the current data set is taken as the simplified data set, and the algorithm terminates. A code-level sketch of this procedure is given below.
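This sketch is an illustration rather than the authors' implementation: the names reduce_training_set and eps are assumptions, a single radius eps bounds both the mean and the variance super balls, the covariance used for the Mahalanobis distance is fixed at that of the original data, and the merge that would first violate the super-ball condition is discarded, which is one reading of Step 5.

```python
import numpy as np


def reduce_training_set(X, eps):
    """Merge the closest pair of samples (Mahalanobis distance) until the
    mean or variance of the reduced set leaves the eps-ball around the
    original mean or variance."""
    X = np.asarray(X, dtype=float)
    mean0, var0 = X.mean(axis=0), X.var(axis=0, ddof=1)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # fixed for all iterations

    while len(X) > 2:
        # Steps 3-4: pairwise squared Mahalanobis distances and the closest pair.
        diff = X[:, None, :] - X[None, :, :]
        d2 = np.einsum('ijk,kl,ijl->ij', diff, cov_inv, diff)
        np.fill_diagonal(d2, np.inf)
        i, j = np.unravel_index(np.argmin(d2), d2.shape)

        # Step 5: replace sample i by the pair mean and delete sample j.
        candidate = np.delete(X, j, axis=0)
        candidate[i if i < j else i - 1] = 0.5 * (X[i] + X[j])

        mean_c = candidate.mean(axis=0)
        var_c = candidate.var(axis=0, ddof=1)
        # Stop once the reduced set would leave the super-ball regions.
        if (np.linalg.norm(mean_c - mean0) > eps or
                np.linalg.norm(var_c - var0) > eps):
            break
        X = candidate
    return X
```

In line with Remark 2, a smaller eps keeps the reduced set statistically closer to the raw data, at the price of deleting fewer samples.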

Remark 2. Note that two similar samples are replaced by their mean on the basis of the Mahalanobis distance; nevertheless, the statistical characteristics of the samples remain essentially unchanged, because the threshold bounding the super-ball domains centered at the mean and the variance limits the degree to which the training data are reduced. Obviously, the smaller the threshold, the fewer samples are deleted, and good detection results can then be expected since the mean and variance of the raw data change less. However, too much data places a heavy burden on both storage and computation. Therefore, we advise that a suitably small threshold be chosen as a compromise between lower cost and higher detection performance. Algorithm 1 is thus a logical and promising way of reducing the training data.

Remark 3. In fact, our approach extends the database-reduction methods of [13, 16] in the sense that the changes in the mean and variance of the raw data set are confined to two specific super-ball domains. Small changes in the mean and variance of the raw data set guarantee that its statistical characteristics are preserved to a certain degree.

3. Detection Method

The basic principle of the proposed fault detection method is that the trajectory of an incoming normal sample is similar to the trajectories of the training samples, which consist of normal data, whereas the trajectory of an incoming fault sample must exhibit some deviation from the trajectories of the normal training samples [5, 6]. In other words, the distance between a fault sample and its nearest neighboring training samples must be greater than a normal sample's distance to its nearest neighboring training samples, and a fault sample's angle with its nearest neighboring training samples must likewise be greater than a normal sample's angle with its nearest neighboring training samples. Therefore, if we can determine the distribution of the training samples' distances to their nearest neighboring training samples and the distribution of the training samples' angles with their nearest neighboring training samples, we can define two control limits for given confidence levels. A query is considered abnormal if its mean distance to its nearest neighboring training samples exceeds the distance control limit or its mean angle with them exceeds the angle control limit; otherwise, the query is normal.
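The two statistics underlying this principle can be sketched in a few lines of Python: the mean of the squared distances from a query to its nearest-distance neighbors and the mean of its smallest angles to the training samples. The text above does not fully pin down the angle definition, so the sketch below measures angles after centering at the training-data mean, which is only one plausible interpretation; the name knn_distance_angle_stats and the parameters k1 and k2 are illustrative, not the paper's notation.

```python
import numpy as np


def knn_distance_angle_stats(query, X, k1, k2):
    """Mean squared distance to the k1 nearest training samples and mean of
    the k2 smallest angles to training samples, with angles measured from
    the training-data mean (one plausible reading of the angle criterion)."""
    X = np.asarray(X, dtype=float)
    query = np.asarray(query, dtype=float)
    center = X.mean(axis=0)

    # Mean of the k1 smallest squared Euclidean distances.
    d2 = np.sum((X - query) ** 2, axis=1)
    mean_sq_dist = np.sort(d2)[:k1].mean()

    # Mean of the k2 smallest angles, with vectors taken from the data mean.
    u = query - center
    V = X - center
    cos = (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u) + 1e-12)
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    mean_angle = np.sort(angles)[:k2].mean()

    return mean_sq_dist, mean_angle
```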

In this section, we present two fault detection schemes. In the first, queries are identified with fixed control limits, as shown in Figure 1. In the second, shown in Figure 2, the control limits are updated as normal queries are added to the database.

3.1. Fault Detection with Fixed Control Limits

(A) Offline Model Building

Algorithm 4. One has the following.
Step  1. Choose two positive integers specifying, respectively, the numbers of nearest-distance neighbors and smallest-angle neighbors to be used, and start from the first sample of the reduced database.
Step  2. For the current sample, find its nearest-distance neighbors and its smallest-angle neighbors in the reduced database.
Step  3. Calculate the squared distances between the current sample and its nearest-distance neighbors and the angles between the current sample and its smallest-angle neighbors.
Step  4. Calculate the mean of these squared distances and the mean of these angles.
Step  5. Move to the next sample. If samples remain in the reduced database, go to Step 2; otherwise, go to Step 6.
Step  6. Estimate the cumulative distribution functions of the mean squared distances and of the mean angles to obtain the distance control limit and the angle control limit at the given confidence levels.
A code-level sketch of this offline stage follows.
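The sketch builds on the knn_distance_angle_stats helper from the opening of this section: the per-sample statistics are collected in a leave-one-out fashion over the reduced training set, and the two control limits are taken as empirical percentiles of those statistics at the chosen confidence levels. The names offline_control_limits, alpha_d, and alpha_a are assumptions, and the empirical percentile merely stands in for whichever distribution estimate is actually used.

```python
import numpy as np


def offline_control_limits(X_reduced, k1, k2, alpha_d=99.0, alpha_a=99.0):
    """Leave-one-out kNN statistics over the reduced training set and
    percentile-based control limits (Steps 2-6 of Algorithm 4)."""
    dist_means, angle_means = [], []
    for i in range(len(X_reduced)):
        others = np.delete(X_reduced, i, axis=0)      # leave the sample itself out
        d_mean, a_mean = knn_distance_angle_stats(X_reduced[i], others, k1, k2)
        dist_means.append(d_mean)
        angle_means.append(a_mean)

    # Step 6: empirical distributions -> control limits at the confidence levels.
    dist_limit = np.percentile(dist_means, alpha_d)
    angle_limit = np.percentile(angle_means, alpha_a)
    return dist_limit, angle_limit
```

With confidence levels of 99% and 85%, as in Example 1, alpha_d and alpha_a would be set to 99 and 85, respectively.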

Remark 5. In Step 2, the Euclidean distance is used because it is simple, but any other distance measure is also suitable for the proposed method. For the choice of the two neighbor numbers, an alternative approach is to try several different values on historical data and to choose the values that give the best cross-validation performance [5, 6].

Remark 6. The two control limits proposed in this paper are determined from cumulative distribution functions at given confidence levels, which means that the mean kNN squared distances and mean kNN angles of the vast majority of normal samples do not exceed them. For example, a 95% control limit is a value within which 95% of the normal-operation statistics (the calculated mean values) fall; here, 95% is the confidence level in the sense of probability and statistics.

(B) Online Detection

Algorithm 7. One has the following.
Step  1. Calculate the squared distances between the query and its nearest-distance neighbors and the angles between the query and its smallest-angle neighbors in the reduced data set.
Step  2. Calculate the means of the above squared distances and angles.
Step  3. The query is abnormal if either mean exceeds its corresponding control limit; otherwise, the query is normal. Move to the next query and return to Step 1.
A code-level illustration of this check follows.
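The illustration reuses the knn_distance_angle_stats helper and the limits from the offline stage sketched above; is_fault is an illustrative name, not the paper's notation.

```python
def is_fault(query, X_reduced, k1, k2, dist_limit, angle_limit):
    # Steps 1-2: mean kNN squared distance and mean kNN angle of the query.
    d_mean, a_mean = knn_distance_angle_stats(query, X_reduced, k1, k2)
    # Step 3: flag a fault if either statistic exceeds its control limit.
    return d_mean > dist_limit or a_mean > angle_limit
```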

3.2. Fault Detection with Varying Control Limits

First, we perform Algorithm 4 to obtain the two control limits based on the reduced training data. Next, we carry out Algorithm 8, which is obtained by rewriting Algorithm 7.

Algorithm 8. One has the following.
Step  1. Calculate the squared distances between the query and its nearest-distance neighbors and the angles between the query and its smallest-angle neighbors in the reduced data set.
Step  2. Calculate the means of the above squared distances and angles.
Step  3. The query is abnormal if either mean exceeds its corresponding control limit; in that case, move to the next query and return to Step 1. Otherwise, the query is normal and is added to the reduced database so as to update it.
Step  4. Perform Algorithm 1 and Algorithm 4 again to recalculate the two control limits; then move to the next query and return to Step 1.
A more detailed implementation of Algorithm 8 is shown in Figure 3, and a code-level composition of the earlier sketches follows.
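In this composition, each query identified as normal is appended to the database, the database is re-reduced with the Algorithm 1 sketch, and both control limits are recomputed, so that the limits track the updated database. All names and the single reduction threshold eps are illustrative assumptions carried over from the earlier sketches.

```python
import numpy as np


def detect_adaptive(queries, X_reduced, k1, k2, eps, alpha_d=99.0, alpha_a=99.0):
    # Initial limits from the current reduced database (Algorithm 4 sketch).
    dist_limit, angle_limit = offline_control_limits(X_reduced, k1, k2, alpha_d, alpha_a)
    labels = []
    for q in queries:
        if is_fault(q, X_reduced, k1, k2, dist_limit, angle_limit):
            labels.append("fault")
        else:
            labels.append("normal")
            # Add the normal query, re-reduce the database (Algorithm 1 sketch),
            # and recompute both control limits (Algorithm 4 sketch).
            X_reduced = reduce_training_set(np.vstack([X_reduced, q]), eps)
            dist_limit, angle_limit = offline_control_limits(
                X_reduced, k1, k2, alpha_d, alpha_a)
    return labels
```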

Remark 9. Obviously, the time-varying control limits, obtained by recalculating the control limits whenever newly detected normal data are added to the database, can reduce the effect of drift, shift, and environmental disturbance on fault detection, compared with fixed control limits [5, 6, 15]. However, they may also increase the computational complexity and cost. It should be pointed out that, as shown in Algorithm 8, Algorithm 1 is executed each time the online detection of a query is completed; the updated database is therefore also reduced, so that low storage cost and high fault detection quality are guaranteed simultaneously.

Remark 10. Compared with [5, 6, 9–13, 15], the technique that reduces and updates the training data set is a main contribution of this paper. More importantly, as shown in Figure 3, varying control limits can be derived, so that the adaptive fault detection can eliminate, to some extent, the impact of data drift and shift on detection quality. Moreover, we use two control limits to identify faults, so it is natural that the detection performance of the proposed method is better than that obtained in [5, 6].

Remark 11. Note that in [14] the database is also updated online and the confidence limits of the statistics used for detecting faults are also time-varying. For comparison, we point out two differences. One is that, in our method, the database is not only updated but also reduced during the detection process. The other is that no system model is constructed; instead, the two control limits are obtained by investigating the statistical characteristics of the data.

Remark 12. In fact, the difficulties that nonlinearity, dynamic changes, and multiple modes of the process pose for fault detection are addressed explicitly by the proposed detection method, which comes as no great surprise, since the kNN technique (handling nonlinearity and multiple modes) and the online detection and update scheme (adapting to dynamic changes) are integrated.

4. Numerical Examples

In this section, two examples are given to show the effectiveness of the proposed fault detection technique. In Example 1, we first give the results of reducing the training data set under different thresholds, the main aim being to show the effect of the threshold on the amount of data removed; second, we verify the effectiveness of the proposed detection method for a nonlinear process and illustrate that faults are identified better with two control limits than with one; in addition, comparative results are given to show the advantages of this paper. In Example 2, we verify the effectiveness of the proposed detection technique under multiple modes; more importantly, the advantage of the dynamic, varying control limits is shown.

Example 1. Consider the following nonlinear process with a single operating mode:
(A) Reduction of Training Data under Different Thresholds. Sixty normal runs are operated to verify the method for reducing the training data set. Three thresholds are set, and Table 1 gives the corresponding numbers of samples remaining in the reduced data set, together with the upper bounds of the deviations of the mean and variance of the remaining data from those of the training data. Figure 4 shows the training data set and the reduced data sets under the different thresholds.

As shown in Table 1 and Figure 4, fewer and fewer data remain, and the deviations of the mean and variance increase, as the threshold increases. A small threshold is therefore favorable for retaining the characteristics of the training data and obtaining better detection results. To describe the reduction process clearly, Figures 5–7 are given. In Figure 5, the red star denotes the mean of the training data, and the blue stars connected by a dashed line describe the changes of the sample mean during the reduction process under the given threshold. Similarly, Figures 6 and 7 show the changes of the sample variance under the same threshold. Moreover, circular regions (super-ball regions when the sample dimension exceeds 3) with a radius of 0.02 are shown in Figures 5 and 7, respectively.

(B) Verification of the Proposed Detection Method and Comparison with Relevant Results. Continuing to operate system (1), we obtain 300 normal data points used for training, 5 normal runs used for validation, and 10 introduced faults; all of these data are shown in Figure 8. The parameters used in Algorithms 1 and 4 and the results obtained are given in Table 2. Here, two confidence levels, 99% and 85%, are chosen to obtain the distance and angle control limits. The detection result of the proposed method is presented in polar coordinates in Figure 9. Clearly, normal data should lie in the region bounded by the polar axis, the ray at the polar angle given by the angle control limit, and the arc at the polar radius given by the distance control limit from Algorithm 7, and data outside this region are faults. It should be pointed out that faults not appearing in Figure 9 lie beyond the display range. From Figure 9, all of the faults are accurately identified. The simulation results show that the detection performance does not degrade when the reduced data set is used, which contributes to saving storage space and reducing computational complexity. Figures 10 and 11 show the detection results of the methods proposed in [5, 13], respectively. In Figure 10, fault 3 is identified as normal, and in Figure 11 faults 1, 3, and 4 are mistaken for normal data. A single control limit based on the nearest distances is used in [5, 13], whereas two control limits obtained from the distributions of the k-nearest distances and the smallest angles are used in the detection process of this paper. In comparison with the methods of [5, 13], the advantage of the detection technique with two control limits over one control limit is obvious.

Example 2. Consider the following bimodal cases [5, 6]:
Each of the two cases is operated to produce 200 normal data points and then operated further to produce 10 faults used for detection. Moreover, the first case is operated to produce 10 normal data points and the second to produce 50, giving a total of 60 normal data points used for validation. All data are shown in Figure 12. As in Example 1, both neighbor numbers are set to 10, and confidence levels of 99% and 90% are chosen to calculate the distance and angle control limits, respectively; a reduction threshold is also set. As the validation data are detected, the data set is updated and reduced, and the control limits are also updated. In the end, 299 normal samples remain for use in fault detection. The detection results of the proposed method are presented in Figures 13 and 14. As shown there, the two control limits are updated as the validation data are identified, and all of the faults are identified correctly. Note that normal validation sample 13 is correctly identified, whereas it is mistaken for a fault under the fixed control limit (similar to [15]), which illustrates the advantage of the varying control limits.

5. Case Study

In this section, all of the data used for training and validation are produced from an Al stack etch process performed on a commercial-scale Lam 9600 plasma etch tool at Texas Instruments, Inc. [17]. It is well known that the Al stack etch process is characterized by multiple modes and nonlinearity and is usually accompanied by data drift and shift. The two control limits are calculated by Algorithm 8, and faults are obtained by altering the training data. Figure 15 shows the detection results of the proposed method. One can clearly see that almost all of the validation data are identified as normal and all of the faults exceed the control limits. This case illustrates the effectiveness of the proposed method.

6. Conclusions

This paper studies an adaptive fault detection method for process engineering with nonlinear and multimodal behaviors. The main idea is as follows: first, the training database is reduced and updated online to lighten the storage load and obtain varying control limits; next, two control limits are determined by investigating the distributions of the kNN squared distances and kNN angles of normal samples, to guarantee high detection quality. The developed FD-kNN method, based on local neighborhoods, naturally handles process nonlinearity and multimodal environments. We highlight that the two control limits are adjusted actively online to overcome the effect of drift and shift on detection quality, so that queries can be identified as correctly as possible. Finally, numerical examples and a case study are given to illustrate the effectiveness and advantages of the proposed method. With the development of signal estimation technology for networked nonlinear stochastic systems [18–20], online and adaptive fault detection methods for these systems based on updated databases will be discussed in the future.

Conflict of Interests

The authors do not have any conflict of interests regarding the content of the paper.

Acknowledgments

The authors would like to acknowledge the National Natural Science Foundation of China under Grants 61104093, 61174119, 61174026, 61104003, 60774070, and 61074029, the Special Program for Key Basic Research Founded by MOST under Grant 2010CB334705, the National High Technology Research and Development Program of China (863 Program) under Grant 2011AA040101, and the Scientific Research Project of Liaoning Province of China under Grants L2012141, L2011064. The authors also would like to acknowledge the Opening Project of Key Laboratory of Networked Control Systems, Chinese Academy of Sciences.