`Journal of Applied Mathematics, Volume 2014 (2014), Article ID 729763, 9 pages, http://dx.doi.org/10.1155/2014/729763`
Research Article

## Nonlinear Fault Separation for Redundancy Process Variables Based on FNN in MKFDA Subspace

1College of Automation, Chongqing University, Chongqing 400044, China
2School of Electrical and Information Engineering, Chongqing University of Science and Technology, Chongqing 401331, China
3School of Safety Engineering, Chongqing University of Science and Technology, Chongqing 401331, China
4College of Mechanical and Power Engineering, Chongqing University of Science and Technology, Chongqing 401331, China

Received 19 July 2013; Revised 22 October 2013; Accepted 23 October 2013; Published 4 February 2014

Copyright © 2014 Ying-ying Su et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Nonlinear faults are difficult to separate when a process contains a large number of redundant process variables, as is common in the process industry. This paper introduces an improved kernel Fisher discriminant analysis (KFDA) method. All the original process variables with faults are first optimally classified in a multikernel KFDA (MKFDA) subspace to obtain Fisher criterion values; multiple kernels are used to account for the different distributions of the variables. Each variable is then eliminated once from the original set, and a new projection is computed with the same MKFDA direction. The differences between the new Fisher criterion values and the original ones are then tested. If the value changes markedly, the eliminated variable has an important effect on the faults; such a variable is called a false nearest neighbor (FNN). The same test is applied to the remaining variables in turn. Two overlapping nonlinear faults in the Tennessee Eastman process are separated with fewer observation variables as a case study. The results show that the proposed method can eliminate redundant and irrelevant nonlinear process variables and also improve classification accuracy.

#### 1. Introduction

With the development of the modern process industry, multivariate sensor measurements exhibit multicollinearity, nonlinear correlative coupling, time delay, and redundancy. As a result, the complexity of fault separation and diagnosis grows exponentially, which is known as the “curse of dimensionality” [1, 2]. In addition, the correct classification rate of faults decreases as redundant process variables accumulate. Much attention has therefore been paid to two approaches: variable selection and dimension reduction [3, 4].

Existing variable selection methods can be broadly classified into three categories: random search techniques, measure-based methods, and intelligent computation. In random search, each process variable is in turn deleted from or added to the classification model in order to find the most suitable input set under a certain criterion; forward selection, backward selection, and stepwise selection are simple and easily implemented examples [5]. However, Mason and Gunst [6] showed that these methods give misleading results when the variable set exhibits multicollinearity. Measure-based methods select variables by computing the relevance among all variables as well as between variables and labels; the variables with the most similar characteristics are grouped together. Depending on the definition of relevance, the K-L information measure, minimum description length, and mutual information are used [7–9]. Intelligent computation addresses nonlinear variable selection, for example, with neural networks, which have been used for nonlinear modeling, although their selection criterion is uncertain [10].

Dimension reduction differs from variable selection in that it relies on transforming the original variable matrix and extracting information from it. It projects the original variables through a certain mapping into a new subspace and extracts information in a lower dimension; examples are principal component analysis (PCA) [11] and partial least squares (PLS) [12]. Linearly correlated process variables are projected linearly along the directions of maximum variance of the covariance matrix, so that as much of the original information as possible is retained. For PCA, the contribution chart method calculates the contribution of each variable to a certain fault using the monitoring statistics and the squared prediction error (SPE) [13, 14]. These linear methods have been extended to nonlinear ones since the kernel method was introduced [15–20], giving kernel principal component analysis (KPCA), kernel partial least squares (KPLS), and kernel Fisher discriminant analysis (KFDA). The kernel method converts a linear classification learning algorithm into a nonlinear one by mapping the original observations into a higher-dimensional space, so that a linear classifier in the new space is equivalent to a nonlinear classifier in the original space.

However, nonlinear information projected into the new feature space has a higher dimension, and the data matrix loses the physical meaning it had in the original sample space. If overlapping nonlinear faults are separated in the original space, the dimension of a kernel-based classifier becomes huge, while the correct classification rate decreases because of redundant and multicollinear variables.

The objective of this paper is to extend dimension reduction to these problems by combining it with a measure-based variable selection method; the resulting approach is called MKFDA-FNN. Nonlinear process variables are projected into a higher-dimensional space with MKFDA. The discriminant vector and its corresponding feature vector with maximum separation are computed to cluster the original variables with the highest similarity. As the embedding dimension increases, false nearest neighbors (FNN) with high similarity can be removed in turn. Nonlinear redundant and multicollinear process variables can thus be removed from the input set of the nonlinear classifier. Finally, a practical fault separation problem in the classical Tennessee Eastman (TE) chemical process is studied.

#### 2. Problem Description

The fault separation problem presented above amounts to screening the original process variables so that those most closely related to particular faults are retained. The multivariate data matrix considered initially, which contains both normal and fault information, is described in Figure 1. Its columns are the multidimensional process/control variables together with their time-delayed copies at different sample times. Each process/control variable has its own maximum delay order, and the rows run over the sample times up to the current one, so that the number of rows is given by the sample length.

Figure 1: The fault diagnosis with multivariate.
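The construction of such a data matrix can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `build_delay_matrix`, the argument `max_delays`, and the column ordering (current value first, then increasing delays) are assumptions made here.

```python
from typing import List, Sequence


def build_delay_matrix(series: Sequence[Sequence[float]],
                       max_delays: Sequence[int]) -> List[List[float]]:
    """Stack each variable with its delayed copies, one row per sample time.

    series[i][k] is variable i at sample time k; max_delays[i] is the
    (assumed) maximum delay order chosen for variable i.
    """
    length = len(series[0])
    start = max(max_delays)  # first time index at which every delay exists
    rows = []
    for k in range(start, length):
        row: List[float] = []
        for values, delay in zip(series, max_delays):
            # current value followed by the values at k-1, ..., k-delay
            row.extend(values[k - lag] for lag in range(delay + 1))
        rows.append(row)
    return rows


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [10.0, 20.0, 30.0, 40.0, 50.0]
matrix = build_delay_matrix([x1, x2], max_delays=[2, 1])
```

Here the first variable contributes three columns (delays 0 to 2) and the second contributes two, so every row has five entries and the first two sample times are dropped because their delayed values do not exist.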

#### 3. Multivariate Fault Separation Based on MKFDA-FNN

For the fault separation problem with nonlinear redundant process/control variables, the approach shown in Figure 2 is proposed. Correlated nonlinear variables are first projected into a higher-dimensional MKFDA subspace. Then, in order to find the useful variables, the importance of each input is measured in the subspace with a distance measure inspired by FNN, so that redundant variables are identified. This makes overlapping faults easier to separate.

Figure 2: Nonlinear fault diagnosis with redundancy process variables based on FNN in MKFDA subspace.
##### 3.1. False Nearest Neighbors

FNN is a feature selection method based on phase space reconstruction (PSR) in a high-dimensional data space [21]. As the embedding dimension increases, the trajectory unfolds, and false nearest neighbors with high similarity can be removed in turn, which restores the trajectory of the chaotic system. The algorithm is as follows.

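In the phase space of a given embedding dimension, which includes the original variables and their time delays, each phase vector has one nearest neighbor, and the distance between the two is measured with the 2-norm.

When the embedding dimension is increased by one, each phase vector gains one more component, and the distance between the same two vectors is recomputed in the enlarged space.

If the new distance is much larger than the old one, the two vectors were close in the lower dimension only because two nonneighboring points of the higher-dimensional space were projected onto nearby positions. The two vectors are then false nearest neighbors.

The test is made on the ratio of the distance added by the extra component to the distance in the lower dimension. If this ratio is larger than a threshold, the neighbor is a false nearest neighbor of the phase vector. The threshold is chosen in the interval (10, 50). When the process data contain noise, a second criterion is added: the neighbor is also judged to be false if the distance in the enlarged space is large relative to the overall spread of the data.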
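The two criteria above can be sketched in code for a scalar time series. This follows the standard false-nearest-neighbor test from the phase space reconstruction literature and is not taken from the paper's own equations, which did not survive in this copy. The names `embed` and `false_neighbor_fraction`, the ratio threshold `r_tol = 10`, the noise threshold `a_tol = 2`, and the use of the standard deviation as the data spread are all assumptions.

```python
import math
from typing import List, Sequence


def embed(x: Sequence[float], dim: int, tau: int = 1) -> List[List[float]]:
    """Delay vectors [x(i), x(i + tau), ..., x(i + (dim - 1) * tau)]."""
    count = len(x) - (dim - 1) * tau
    return [[x[i + k * tau] for k in range(dim)] for i in range(count)]


def false_neighbor_fraction(x: Sequence[float], dim: int, tau: int = 1,
                            r_tol: float = 10.0, a_tol: float = 2.0) -> float:
    """Fraction of points whose nearest neighbor in `dim` dimensions is false."""
    mean = sum(x) / len(x)
    spread = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    usable = len(x) - dim * tau  # points that also exist in dimension dim + 1
    vectors = embed(x, dim, tau)[:usable]
    false_count = 0
    for i, v in enumerate(vectors):
        # Nearest neighbor of v in the current dimension (2-norm).
        j = min((k for k in range(usable) if k != i),
                key=lambda k: math.dist(v, vectors[k]))
        r_old = math.dist(v, vectors[j])
        extra = abs(x[i + dim * tau] - x[j + dim * tau])  # added component
        r_new = math.hypot(r_old, extra)                  # distance in dim + 1
        ratio_fails = extra > r_tol * r_old    # criterion 1: ratio test
        size_fails = r_new >= a_tol * spread   # criterion 2: noise test
        if ratio_fails or size_fails:
            false_count += 1
    return false_count / usable


signal = [math.sin(0.3 * i) for i in range(200)]
fraction_dim1 = false_neighbor_fraction(signal, dim=1)
fraction_dim2 = false_neighbor_fraction(signal, dim=2)
```

Writing the ratio test as `extra > r_tol * r_old` avoids a division by zero when two phase vectors coincide. For the sine wave used here, two dimensions are enough to unfold the trajectory, so the fraction of false neighbors falls when going from one dimension to two.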

The distance between vectors can therefore express the similarity of false nearest neighbors, as in (6). Consider a data space of multidimensional variables and one sample vector in it. Setting one variable to zero gives the sample vector without that variable, as illustrated in Figure 3.

Figure 3: Data space with multidimensional variables.

The similarity between the sample vector and its reduced counterpart is measured by the distance between them.

If this distance is small, the two vectors are highly similar. That is, the removed variable has little impact on the nonlinear pattern, and the process variable has low explanatory power. If the distance is large, the reduced vector differs greatly from the original one, and the process variable is important for explaining the nonlinear pattern; the reduced vector is then a false nearest neighbor of the original one.
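As a small illustration of this idea, the distance between a sample and its reduced copy can be evaluated in a kernel feature space using the identity ‖φ(a) − φ(b)‖² = k(a, a) − 2k(a, b) + k(b, b). This is a simplification made here for illustration: the paper measures the effect in the MKFDA subspace, and the Gaussian kernel, its width `sigma = 1`, and the function names are assumptions.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def gaussian_kernel(x: Sequence[float], y: Sequence[float],
                    sigma: float = 1.0) -> float:
    """Gaussian kernel with an assumed width parameter sigma."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * sigma ** 2))


def removal_distance(sample: Sequence[float], j: int, kernel: Kernel) -> float:
    """Feature-space distance between a sample and its copy with variable j zeroed."""
    reduced = list(sample)
    reduced[j] = 0.0
    squared = (kernel(sample, sample) - 2.0 * kernel(sample, reduced)
               + kernel(reduced, reduced))
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off


sample = [0.0, 3.0, 0.1]
distances: List[float] = [removal_distance(sample, j, gaussian_kernel)
                          for j in range(len(sample))]
```

In this toy sample, zeroing the large second component moves the vector much further than zeroing the small third component, and zeroing the first component, which is already zero, does not move it at all.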

##### 3.2. Kernel Fisher Discriminant Analysis

KFDA is widely used for nonlinear classification problems [22]. A nonlinear discriminant vector in the original space is obtained as a linear optimal discriminant vector in a high-dimensional feature space with conventional Fisher discriminant analysis (FDA). Since the dimension of the feature space is very high, it is hard to determine the nonlinear mapping function from the original space to the feature space directly. Reproducing-kernel methods, widely developed in machine learning (ML), achieve this goal: the nonlinear mapping is found indirectly through a kernel function, which gives the inner products of mapped samples in Gram space [23].

Conventional kernel functions can be selected as follows [6].

(i) Polynomial kernel function, with a constant parameter.

(ii) Gaussian kernel function, with a width parameter.

(iii) Sigmoid kernel function.
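The three kernel families can be written in their commonly used textbook forms as below. The exact parameterizations used by the authors are not recoverable from this copy, so the parameter names (`c`, `d`, `sigma`, `beta`, `theta`), their default values, and the factor of 2 in the Gaussian exponent are assumptions.

```python
import math
from typing import Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      c: float = 1.0, d: int = 2) -> float:
    """(i) Polynomial kernel: (<x, y> + c) ** d, with constant c and degree d."""
    return (dot(x, y) + c) ** d


def gaussian_kernel(x: Sequence[float], y: Sequence[float],
                    sigma: float = 1.0) -> float:
    """(ii) Gaussian kernel: exp(-||x - y||^2 / (2 sigma^2)), width sigma."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * sigma ** 2))


def sigmoid_kernel(x: Sequence[float], y: Sequence[float],
                   beta: float = 1.0, theta: float = 0.0) -> float:
    """(iii) Sigmoid kernel: tanh(beta * <x, y> + theta)."""
    return math.tanh(beta * dot(x, y) + theta)
```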

Assume that the original sample set contains samples of several classes, each class having its own number of samples. A nonlinear mapping function transforms the nonlinear original sample space into a high-dimensional feature space in which the classes can be separated linearly. In the feature space, the within-class scatter and the between-class scatter of the training data are given in (7) and (9), respectively, and both are built from the mean of each class in the feature space. KFDA seeks a projection direction with two properties: data with similar characteristics should be gathered as closely as possible, and data with different characteristics should be placed as far apart as possible. The key is therefore to find the projection direction and its corresponding discriminant function. As in linear FDA, the optimal projection direction is the vector that maximizes the Fisher criterion function (10), that is, the ratio of between-class scatter to within-class scatter along the projection direction.

Since the dimension of the feature space is usually high and the mapping function is known only indirectly, the discriminant vector is hard to compute directly. According to the kernel method, each solution is therefore expressed as a linear combination of the mapped samples, as in (11).

The projection of a mapped sample onto this direction in the feature space can then be written with kernel function values only.

From (11), the projection of each class mean vector onto the direction in the feature space is obtained in the same way.

From (12) and (13), the between-class scatter and the within-class scatter along the direction are expressed through kernel matrices.

Maximizing the Fisher criterion function with (15) then leads to the Fisher criterion (16), written in terms of the coefficient vector of the linear combination [24].

The optimal coefficient vector, and with it the optimal projection direction, can then be solved [25].

The corresponding kernel Fisher discriminant function is thus obtained from the optimal coefficients and the kernel function values between a new sample and the training samples.
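For orientation, the derivation outlined above can be written in the standard KFDA notation of the kernel literature (see, e.g., [23]). The symbols below are chosen here for illustration and need not match the authors' equations (7) to (18), which are not reproduced in this copy: $\phi$ is the mapping, $X_i$ the set of $n_i$ samples of class $i$, $n$ the total number of samples, and $k(x, y) = \langle \phi(x), \phi(y) \rangle$.

```latex
\begin{align*}
  m_i^{\phi} &= \frac{1}{n_i} \sum_{x \in X_i} \phi(x), \qquad
  m^{\phi} = \frac{1}{n} \sum_{j=1}^{n} \phi(x_j), \\
  S_w^{\phi} &= \sum_{i=1}^{c} \sum_{x \in X_i}
      \bigl(\phi(x) - m_i^{\phi}\bigr)\bigl(\phi(x) - m_i^{\phi}\bigr)^{T}, \qquad
  S_b^{\phi} = \sum_{i=1}^{c} n_i
      \bigl(m_i^{\phi} - m^{\phi}\bigr)\bigl(m_i^{\phi} - m^{\phi}\bigr)^{T}, \\
  J(w) &= \frac{w^{T} S_b^{\phi} w}{w^{T} S_w^{\phi} w}, \qquad
  w = \sum_{j=1}^{n} \alpha_j \phi(x_j), \\
  w^{T} \phi(x) &= \sum_{j=1}^{n} \alpha_j k(x_j, x) = \alpha^{T} \xi_x, \qquad
  \xi_x = \bigl(k(x_1, x), \dots, k(x_n, x)\bigr)^{T}, \\
  w^{T} m_i^{\phi} &= \alpha^{T} \mu_i, \qquad
  \mu_i = \frac{1}{n_i} \sum_{x \in X_i} \xi_x, \qquad
  \mu = \frac{1}{n} \sum_{j=1}^{n} \xi_{x_j}, \\
  K_w &= \sum_{i=1}^{c} \sum_{x \in X_i} (\xi_x - \mu_i)(\xi_x - \mu_i)^{T}, \qquad
  K_b = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^{T}, \\
  J(\alpha) &= \frac{\alpha^{T} K_b \alpha}{\alpha^{T} K_w \alpha}, \qquad
  K_b \alpha = \lambda K_w \alpha, \qquad
  f(x) = \sum_{j=1}^{n} \alpha_j k(x_j, x).
\end{align*}
```

In this form the optimal coefficient vector $\alpha$ is the leading generalized eigenvector of the pair $(K_b, K_w)$. For two classes it reduces to $\alpha \propto K_w^{-1}(\mu_1 - \mu_2)$, usually with a small regularization term added to $K_w$ because it is singular.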

##### 3.3. Multikernel Fisher Discriminant Analysis

From Section 3.2, maximizing (15) is equivalent to maximizing (16). The optimal solution for classification is determined both by the kernel within-class scatter matrix and by the difference of the kernel mean vectors. Under the assumption of independent and identically distributed samples, the kernel mean of the samples is independent of the number of samples, so the difference of the kernel mean vectors is not affected by class imbalance. The solution is therefore determined mainly by the kernel within-class scatter matrix. If the variables follow different distributions, their contributions do not lie in a comparable range, and the solution obtained is no longer optimal. To avoid the influence of different sample distributions, we present a multikernel Fisher discriminant analysis method. It extends the kernel criterion function by combining kernel matrices through an adjustable parameter, where each kernel matrix is computed with a suitable kernel function from Section 3.2, (i), (ii), or (iii).

In this way, the influence of different sample distributions is handled by choosing suitable kernel functions.
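One plausible reading of the multikernel combination is a weighted sum of two Gram matrices, which the sketch below illustrates. The exact form of the authors' criterion is not recoverable from this copy, so the convex weighting, the parameter name `lam`, and the helper names are assumptions.

```python
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def gram_matrix(samples: Sequence[Sequence[float]],
                kernel: Kernel) -> List[List[float]]:
    """Kernel (Gram) matrix with entries kernel(x_i, x_j)."""
    return [[kernel(a, b) for b in samples] for a in samples]


def combine_gram(k1: Sequence[Sequence[float]], k2: Sequence[Sequence[float]],
                 lam: float = 0.5) -> List[List[float]]:
    """Weighted sum lam * K1 + (1 - lam) * K2 with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(k1, k2)]


def linear(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain inner product, used here only to build a second toy matrix."""
    return sum(a * b for a, b in zip(x, y))


identity = [[1.0, 0.0], [0.0, 1.0]]
other = gram_matrix([[1.0, 1.0], [1.0, -1.0]], linear)
mixed = combine_gram(identity, other, lam=0.5)
```

A convex combination of valid kernel matrices is again a valid (positive semidefinite) kernel matrix, which is why this form is a common choice when kernels with different characteristics are mixed.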

The algorithm of this paper is summarized in Table 1. In this way, the contribution of each original process variable to a certain fault is measured.

Table 1: Steps designed in this paper.
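Since the contents of Table 1 are not reproduced here, the following is a minimal sketch of the procedure as described in the abstract and in Section 3: fit one multikernel Fisher direction on all variables, then zero each variable in turn, project with the same direction, and record the change in the Fisher criterion value. It is restricted to two classes. The kernel mix, the regularization `reg`, all function names, and the toy data are assumptions, not the authors' settings.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def multikernel(x: Sequence[float], y: Sequence[float], lam: float = 0.5) -> float:
    """Assumed mix: lam * Gaussian (sigma = 1) + (1 - lam) * polynomial (degree 2)."""
    gauss = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
    poly = (sum(a * b for a, b in zip(x, y)) + 1.0) ** 2
    return lam * gauss + (1.0 - lam) * poly


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_direction(X, y, kernel: Kernel, reg: float = 1e-3) -> List[float]:
    """Coefficients alpha of a two-class kernel Fisher direction (labels 0 and 1)."""
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]
    N = [[reg if r == c else 0.0 for c in range(n)] for r in range(n)]
    means = []
    for label in (0, 1):
        idx = [i for i in range(n) if y[i] == label]
        mean = [sum(K[r][i] for i in idx) / len(idx) for r in range(n)]
        means.append(mean)
        for i in idx:  # accumulate the kernel within-class scatter
            diff = [K[r][i] - mean[r] for r in range(n)]
            for r in range(n):
                for c in range(n):
                    N[r][c] += diff[r] * diff[c]
    return solve(N, [m0 - m1 for m0, m1 in zip(means[0], means[1])])


def fisher_value(X_train, X_eval, y, alpha, kernel: Kernel) -> float:
    """Fisher criterion of X_eval projected on the fixed direction alpha."""
    proj = [sum(a * kernel(t, x) for a, t in zip(alpha, X_train)) for x in X_eval]
    groups = [[p for p, lab in zip(proj, y) if lab == label] for label in (0, 1)]
    centers = [sum(g) / len(g) for g in groups]
    within = sum((p - centers[k]) ** 2 for k, g in enumerate(groups) for p in g)
    return (centers[0] - centers[1]) ** 2 / within


def variable_contributions(X, y, kernel: Kernel = multikernel) -> List[float]:
    """Change in the Fisher value when each variable is zeroed in turn."""
    alpha = fit_direction(X, y, kernel)
    base = fisher_value(X, X, y, alpha, kernel)
    scores = []
    for j in range(len(X[0])):
        reduced = [[0.0 if c == j else v for c, v in enumerate(row)] for row in X]
        scores.append(abs(base - fisher_value(X, reduced, y, alpha, kernel)))
    return scores


# Toy data: variable 0 separates the classes, variable 1 does not.
X = [[0.0, 0.5], [0.2, 0.4], [0.1, 0.6], [1.0, 0.5], [1.2, 0.4], [1.1, 0.6]]
y = [0, 0, 0, 1, 1, 1]
scores = variable_contributions(X, y)
ranking = sorted(range(len(scores)), key=lambda j: -scores[j])
```

The direction is fitted once and reused for every reduced data set, which mirrors the statement that the new projection is computed "with the same MKFDA direction". The small diagonal term `reg` is added only because the kernel within-class scatter matrix is singular.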

#### 4. Fault Separation of Tennessee Eastman with Redundancy Variables

##### 4.1. Tennessee Eastman Chemical Process

Tennessee Eastman (TE) is a classical chemical process benchmark created by Eastman Chemical Company in 1993 [26]. The process flow is shown in Figure 4. There are four reactants (A, C, D, and E) and two products (G and H), as well as one inert component B and one byproduct F.

Figure 4: The technological process of Tennessee Eastman.

The dynamic TE model is composed of five major units: a reactor, a separator, a stripper, a condenser, and a compressor. Each unit is described by a set of equations; in total there are 148 algebraic equations and 30 differential equations. The model is therefore among the most complex benchmark models and is widely used to test algorithms for control, system monitoring, fault diagnosis, and so forth. Here, the TE process is taken as the study object to evaluate the fault separation ability of the proposed method.

##### 4.2. Nonlinear Fault Separation of Redundancy Variables

In the TE process, there are 41 observed variables and 12 manipulated variables from the controller, some of which are nonlinear redundant variables. There are also 20 types of classical faults, shown in Table 2. Since Fault9 and Fault11 overlap nonlinearly, as shown in Figure 5, their separation is taken as the study goal; at the same time, the 53 process variables must be screened for multicollinearity and nonlinear redundancy. Process data of TE are simulated at a one-minute sampling time with the MATLAB software from Downs [27]. All the measurements contain Gaussian noise. A total of 1000 samples are collected for training, of which 800 are for Fault9 and 200 for Fault11. In addition, 835 samples are used to test separation validity, with 644 for Fault9 and 171 for Fault11.

Table 2: State distribution in TE process.

Figure 5: The distribution of Fault9 and Fault11 in a two-dimensional diagram, with process variable Vab.13 on the x-axis and process variable Vab.21 on the y-axis.

##### 4.3. Results and Discussion

To distinguish Fault9 from Fault11, 53 variables have to be considered in all. We therefore compute the contribution of the 53 variables with the proposed method to assess the importance of each process variable to the faults. The multikernel function is composed of a Gaussian kernel and a polynomial kernel, each weighted 50%. The contributions of each variable to the faults, computed with the steps in Section 3.3, are shown in Figure 6 and Table 3. In descending order of importance, the 53 process variables are ranked as {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7, Vab.20, Vab.11, Vab.2, Vab.12, Vab.8, Vab.19, Vab.5, Vab.22, Vab.6, Vab.3, Vab.18, Vab.14, Vab.15, Vab.17, Vab.10, Vab.41, Vab.40, Vab.27, Vab.23, Vab.29, Vab.31, Vab.26, Vab.33, Vab.25, Vab.32, Vab.4, Vab.24, Vab.30, Vab.35, Vab.34, Vab.37, Vab.36, Vab.28, Vab.39, Vab.38, Vab.1, Vab.53, Vab.52, Vab.51, Vab.50, Vab.49, Vab.48, Vab.47, Vab.46, Vab.45, Vab.44, Vab.43, Vab.42}.

Table 3: The contributions of 53 process variables to fault separation.
Figure 6: Contribution of all the 53 process variables to distinguish Fault9 and Fault11.

In the following, the curves of the two most important variables, Vab.21 and Vab.13, in the TE process are given in Figures 7(a) and 7(b) and Figures 8(a) and 8(b), respectively. They show the strong variation of process variables Vab.21 and Vab.13.

Figure 7: Variation of process variable Vab.21 in the actual TE process.

Figure 8: Variation of process variable Vab.13 in the actual TE process.

Following the ranking of the process variables, nested feature sets are constructed as {Vab.21}, {Vab.21, Vab.13}, {Vab.21, Vab.13, Vab.9}, and so on. Nonlinear pattern classification of Fault9 and Fault11 is tested with a support vector machine (SVM), which is widely used in pattern recognition. The parameters of the SVM are optimized by cross-validation. With these variable sets, the accuracy of fault separation between Fault9 and Fault11 is tested in succession. The results are shown in Figure 9 and Table 4. They reveal that the separation accuracy decreases when too many variables are included.

Table 4: The accuracy with different feature sets with testing data.
Figure 9: The accuracy with different feature sets to identify Fault9 and Fault11 with testing data.
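The evaluation over nested feature sets can be sketched as below. To keep the example self-contained, a nearest-centroid classifier stands in for the SVM used in the paper, so the accuracies it produces are not comparable with Table 4; the function names and the toy data are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def nested_subsets(ranking: Sequence[int]) -> List[List[int]]:
    """[[r0], [r0, r1], [r0, r1, r2], ...] in order of decreasing importance."""
    return [list(ranking[:k]) for k in range(1, len(ranking) + 1)]


def centroid_accuracy(train_X, train_y, test_X, test_y,
                      features: Sequence[int]) -> float:
    """Test accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    centroids: Dict[int, List[float]] = {}
    for label in sorted(set(train_y)):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    hits = 0
    for x, lab in zip(test_X, test_y):
        guess = min(centroids, key=lambda c: sum(
            (x[f] - m) ** 2 for f, m in zip(features, centroids[c])))
        hits += guess == lab
    return hits / len(test_y)


def evaluate_subsets(ranking, train_X, train_y,
                     test_X, test_y) -> List[Tuple[List[int], float]]:
    """Accuracy of every nested feature set built from the ranking."""
    return [(subset, centroid_accuracy(train_X, train_y, test_X, test_y, subset))
            for subset in nested_subsets(ranking)]


# Toy data: variable 0 is informative, variable 1 is misleading on the test set.
train_X = [[0.0, 1.0], [0.2, 3.0], [1.0, -1.0], [1.2, -3.0]]
train_y = [0, 0, 1, 1]
test_X = [[0.3, -2.0], [0.9, 2.0]]
test_y = [0, 1]
results = evaluate_subsets([0, 1], train_X, train_y, test_X, test_y)
best_subset, best_accuracy = max(results, key=lambda item: item[1])
```

In this toy case adding the second variable lowers the test accuracy, which is the same qualitative effect as the drop in accuracy reported when redundant variables are included.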

From these results, we conclude that if all 53 process variables are used to separate Fault9 and Fault11, the correct classification rate is merely 72.12%. This indicates that not all of the variables are directly related to a particular fault. Redundant or irrelevant variables may decrease the classification accuracy and must be eliminated. If the first five process variables {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7} are selected as features, the accuracy rises to its highest value, 94.55%. These five process variables are therefore key to the fault separation. If the model is to be simplified as far as possible, the process variable {Vab.21} is the best single feature: Fault9 and Fault11 can be recognized from the variation of Vab.21.

On the other hand, Fault9 is a random disturbance in feed temperature, and Fault11 is a random disturbance in reactor cooling water inlet temperature. The variables {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7} are the reactor coolant temperature, product separator pressure, reactor temperature, stripper pressure, and reactor pressure, respectively, so the five selected variables are clearly closely related to Fault9 and Fault11. The simulation results are consistent with the physical process.

#### 5. Conclusions

Nonlinear redundant and multicollinear variables decrease classifier accuracy and must be eliminated. To address this problem, FNN in the MKFDA subspace is studied in this paper. Nonlinear variables are projected into a new, linear, higher-dimensional subspace with kernel Fisher discriminant analysis to obtain the optimal classification, in which samples within a class are as close as possible and samples of different classes are as far apart as possible. Conventional single-kernel KFDA is further extended to a multikernel method to handle process variables with different distribution functions. To reduce the higher dimension arising in the MKFDA subspace, FNN is used to assess the importance of each process variable to the faults. In the TE simulation, the original 53 variables are reduced to 5, and the tested correct classification rate between Fault9 and Fault11 reaches 94.55%, compared with 72.12% when all variables are used.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This research is supported by the National Natural Science Foundation of China (no. 51376204, no. 51075418, no. 50905194, and no. 61174015), the Natural Science Foundation Project (no. CQCSTC2012jjA40026, no. CQCSTC2012jjA90011), and the Research Foundation of Chongqing University of Science and Technology (no. CK2011Z01, no. CK2011B04).

#### References

1. D. H. Lim, S. H. Lee, and M. G. Na, “Smart soft-sensing for the feedwater flowrate at PWRs using a GMDH algorithm,” IEEE Transactions on Nuclear Science, vol. 57, no. 1, pp. 340–347, 2010.
2. M. J. Brusco and D. Steinley, “Exact and approximate algorithms for variable selection in linear discriminant analysis,” Computational Statistics and Data Analysis, vol. 55, no. 1, pp. 123–131, 2011.
3. F. A. Michelsen, B. F. Lund, and I. J. Halvorsen, “Selection of optimal, controlled variables for the TEALARC LNG process,” Industrial and Engineering Chemistry Research, vol. 49, no. 18, pp. 8624–8632, 2010.
4. F. Cipollini and G. M. Gallo, “Automated variable selection in vector multiplicative error models,” Computational Statistics and Data Analysis, vol. 54, no. 11, pp. 2470–2486, 2010.
5. A. J. Miller, Subset Selection in Regression, Chapman and Hall, London, UK, 2002.
6. R. L. Mason and R. F. Gunst, Statistical Design and Analysis of Experiments with Applications to Engineering and Science, John Wiley & Sons, Hoboken, NJ, USA, 2004.
7. Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” in Proceedings of the 14th International Conference on Machine Learning, pp. 412–420, 1997.
8. K. Kira and L. A. Rendell, “The feature selection problem: traditional methods and a new algorithm,” in Proceedings of the 9th National Conference on Artificial Intelligence (AAAI '92), pp. 129–134, July 1992.
9. B. Pfahringer, “Compression-based feature subset selection,” in Proceedings of the Workshop on Data Engineering for Inductive Learning (IJCAI '95), pp. 101–106, 1995.
10. J. C. Isaac, Kernel methods and component analysis for pattern recognition [Ph.D. thesis], 2007.
11. J. C. Huang, J. S. Zhao, W. Sun, and Y. K. Ding, “PCA-based early fault diagnosis of solid waste incinerator,” Chemical Industry and Engineering Progress, vol. 25, no. 12, pp. 1489–1492, 2006.
12. S. Wold, M. Sjöström, and L. Eriksson, “PLS-regression: a basic tool of chemometrics,” Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
13. J. D. Wu, P. H. Chiang, Y. W. Chang, and Y. J. Shiao, “An expert system for fault diagnosis in internal combustion engines using probability neural network,” Expert Systems with Applications, vol. 34, no. 4, pp. 2704–2713, 2008.
14. D. F. Wang, S. J. Wang, and J. He, “Maintaining and fault removing on hydraulic system of CAK6140,” Machinery Design and Manufacture, vol. 7, pp. 161–162, 2010.
15. J. H. Li and P. L. Cui, “Improved kernel fisher discriminant analysis for fault diagnosis,” Expert Systems With Applications, vol. 36, no. 2, pp. 1423–1432, 2009.
16. M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, “Generalized power method for sparse principal component analysis,” Journal of Machine Learning Research, vol. 11, pp. 517–553, 2010.
17. K. Kim, J.-M. Lee, and I.-B. Lee, “A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction,” Chemometrics and Intelligent Laboratory Systems, vol. 79, no. 1-2, pp. 22–30, 2005.
18. R. Jenssen, “Kernel entropy component analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 847–860, 2010.
19. N. Otopal, “Restricted kernel canonical correlation analysis,” Linear Algebra and Its Applications, vol. 437, no. 1, pp. 1–13, 2012.
20. M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
21. H. Y. Wang and Z. H. Sheng, “Choice of the parameters for the phase space reconstruction of chaotic time series,” Journal of Southeast University, vol. 30, no. 5, pp. 113–117, 2000.
22. Z. B. Zhu and Z. H. Song, “A novel fault diagnosis system using pattern classification on kernel FDA subspace,” Expert Systems with Applications, vol. 38, no. 6, pp. 6895–6905, 2011.
23. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, “Fisher discriminant analysis with kernels,” in Proceedings of the 9th IEEE Workshop on Neural Networks for Signal Processing (NNSP'99), pp. 41–48, August 1999.
24. B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
25. J. Y. Gan and Y. W. Zhang, “Generalized kernel fisher optimal discriminant in pattern recognition,” Pattern Recognition and Artificial Intelligence, vol. 15, no. 4, pp. 429–434, 2002.
26. J. J. Downs and E. F. Vogel, “A plant-wide industrial process control problem,” Computers and Chemical Engineering, vol. 17, no. 3, pp. 245–255, 1993.
27. N. Lawrence Ricker, “Tennessee Eastman,” 2013, http://www.cheme.washington.edu/facresearch/faculty/ricker.html.