Research Article  Open Access
Nonlinear Fault Separation for Redundancy Process Variables Based on FNN in MKFDA Subspace
Abstract
Nonlinear faults are difficult to separate when a process has a large number of redundant process variables. This paper introduces an improved kernel Fisher discriminant analysis (KFDA) method. All the original process variables, including the fault data, are first classified optimally in a multikernel KFDA (MKFDA) subspace to obtain Fisher criterion values. Multiple kernels are used to account for the different distributions of the variables. Each variable is then eliminated once from the original set, and a new projection is computed with the same MKFDA direction. The differences between the new Fisher criterion values and the original ones are then tested. If the value changes markedly, the eliminated variable has an important effect on the faults; this test is inspired by the false nearest neighbors (FNN) method. The same test is applied to the remaining variables in turn. Two overlapping nonlinear faults in the Tennessee Eastman process are separated with fewer observation variables as a case study. The results show that the method eliminates redundant and irrelevant nonlinear process variables and improves the classification accuracy.
1. Introduction
With the development of the modern process industry, multivariate sensor measurements exhibit multicollinearity, nonlinear coupling, time delay, and redundancy. These properties make the complexity of fault separation and diagnosis grow exponentially with the number of variables, a problem known as the "curse of dimensionality" [1, 2]. In addition, the fault classification accuracy decreases when many redundant process variables are included. Much attention has therefore been paid to two approaches: variable selection and dimension reduction [3, 4].
Existing variable selection methods can be broadly classified into three categories: random search techniques, measure-based methods, and intelligent computation. In random search, each process variable is deleted from or added to the classification model one at a time in order to find the most suitable input set under a certain criterion; forward selection, backward selection, and stepwise selection are simple and easily implemented examples [5]. However, Masion and Gunst [6] showed that these methods can give misleading results when the variable set is multicollinear. Measure-based methods select variables by computing the relevancy among all variables as well as between the variables and the labels; the variables with the most similar characteristics are grouped together. Depending on the definition of the measure, the KL information measure, minimum description length, and mutual information have been used [7–9]. Intelligent computation addresses nonlinear variable selection, for example with a neural network as the nonlinear model, although its selection criterion is uncertain [10].
Dimension reduction differs from variable selection in that it relies on transforming the original variable matrix and extracting information from it. It projects the original variables through a certain mapping to a new subspace and extracts information in a lower dimension; examples are principal component analysis (PCA) [11] and partial least squares (PLS) [12]. Linearly correlated process variables are projected linearly along the directions of maximum variance of the covariance matrix, so that as much of the original information as possible is retained. For PCA, the contribution chart method calculates the contribution of each variable to a certain fault from the monitoring statistics and SPE [13, 14]. These linear methods have been extended to nonlinear ones since the kernel method was introduced [15–20], giving kernel principal component analysis (KPCA), kernel partial least squares (KPLS), and kernel Fisher discriminant analysis (KFDA). The kernel method converts a linear classification learning algorithm into a nonlinear one by mapping the original observations into a higher-dimensional space, so that a linear classifier in the new space is equivalent to a nonlinear classifier in the original space.
However, the nonlinear information projected to the new feature space has a higher dimension, and the data matrix loses the physical meaning it had in the original sample space. If nonlinear faults that overlap in the original space are separated with a kernel method, the dimension of the classifier becomes very large, while the classification accuracy decreases because of redundant and multicollinear variables.
The objective of this paper is to extend dimension reduction to these problems by combining it with a measure-based variable selection method; the resulting method is called MKFDA-FNN. Nonlinear process variables are projected to a higher-dimensional space with MKFDA. The discriminant vector and its corresponding feature vector with maximum separation are computed to cluster the original variables with the highest similarity. As the embedding dimension increases, false nearest neighbors (FNN) with high similarity can be removed in turn. Nonlinear redundant and multicollinear process variables can thus be removed from the input set of the nonlinear classifier. Finally, a practical fault separation problem in the classical Tennessee Eastman (TE) chemical process is studied.
2. Problem Description
The fault separation problem presented above amounts to screening, as far as possible, the original process variables that are related to certain faults. The multivariate data matrix considered initially, which contains both normal and fault information, is described in Figure 1, where $x_1, x_2, \ldots, x_n$ are the $n$ process variables; $x_1(t-1), \ldots, x_1(t-d_1)$ are the time-delayed copies of $x_1$ at different sample times; $x_2(t-1), \ldots, x_2(t-d_2)$ are the time-delayed copies of $x_2$; and $x_n(t-1), \ldots, x_n(t-d_n)$ are the time-delayed copies of $x_n$. The original data matrix is thus composed of the $n$ process/control variables and their delayed copies, where $d_1$, $d_2$, and $d_n$ are the maximum delay orders of the process/control variables $x_1$, $x_2$, and $x_n$, $t$ is the current sample time, and $N$ is the sample length.
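The construction of such a delay-augmented data matrix can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_delay_matrix`, the column ordering, and the choice to drop the first rows (for which not all delayed values exist) are assumptions made here.

```python
import numpy as np

def build_delay_matrix(X, max_delays):
    """Augment each column of X with its time-delayed copies.

    X          : array of shape (N, n), one column per process variable.
    max_delays : n non-negative ints, the maximum delay order of each variable.
    Returns an array with N - max(max_delays) rows; for variable j the columns
    are x_j(t), x_j(t-1), ..., x_j(t-d_j).
    """
    X = np.asarray(X, dtype=float)
    N, n = X.shape
    if len(max_delays) != n:
        raise ValueError("one delay order per variable is required")
    d_max = max(max_delays)
    columns = []
    for j, d_j in enumerate(max_delays):
        for lag in range(d_j + 1):
            # Row r of the output corresponds to sample time t = d_max + r.
            columns.append(X[d_max - lag : N - lag, j])
    return np.column_stack(columns)

# Toy data: 5 samples of 2 variables, with delay orders 1 and 2.
X_demo = np.arange(10, dtype=float).reshape(5, 2)
Z_demo = build_delay_matrix(X_demo, [1, 2])
```

Each row of `Z_demo` holds the current values of both variables followed by their delayed values, so the number of columns grows with the delay orders, which is the source of the redundancy discussed above.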
3. Multivariate Fault Separation Based on MKFDA-FNN
For the fault separation problem with nonlinear redundant process/control variables, the approach shown in Figure 2 is proposed. Correlated nonlinear variables are first projected to a higher-dimensional MKFDA subspace. Then, to find the useful variables, the importance of each input is measured in the subspace with a distance measure inspired by FNN, and redundant variables are identified accordingly. This makes overlapping faults easier to separate.
3.1. False Nearest Neighbors
FNN is a feature selection method based on phase space reconstruction (PSR) in high-dimensional data space [21]. As the embedding dimension increases, the trajectory unfolds, and false nearest neighbors with high similarity can be removed in turn, which restores the trajectory of the chaotic system. The algorithm is as follows.
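In the $d$-dimensional phase space formed by the original variables and their time delays, each phase vector $X(i) = \{x(i), x(i+\tau), \ldots, x(i+(d-1)\tau)\}$ has a nearest neighbor $X^{NN}(i)$. Their 2-norm distance is
$$R_d(i) = \| X(i) - X^{NN}(i) \|_2 .$$
When the dimension is increased to $d+1$, the distance between the two phase vectors becomes $R_{d+1}(i)$, with
$$R_{d+1}^2(i) = R_d^2(i) + \| x(i+d\tau) - x^{NN}(i+d\tau) \|^2 .$$
If $R_{d+1}(i)$ is much larger than $R_d(i)$, the two vectors appear close only because two nonneighboring phase vectors were projected from the higher dimension to the lower one. The two neighbors are therefore false nearest neighbors.
Note that
$$a_1(i, d) = \frac{| x(i+d\tau) - x^{NN}(i+d\tau) |}{R_d(i)} .$$
If $a_1(i, d)$ is larger than the threshold $R_\tau$, $X^{NN}(i)$ is a false nearest neighbor of $X(i)$. The threshold $R_\tau$ is chosen in the interval (10, 50). When the process data contain noise, the following additional criterion should be applied: if $R_{d+1}(i)/R_A \ge 2$, $X^{NN}(i)$ is a false nearest neighbor of $X(i)$, where $R_A$ is
$$R_A = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(x(i) - \bar{x}\bigr)^2}, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x(i) .$$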
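The first (ratio) test of the FNN algorithm can be sketched for a single time series as follows. This is a minimal illustration under stated assumptions: the function name `false_neighbor_fraction`, the delay `tau`, and the threshold value 15 (picked from the interval (10, 50) mentioned above) are choices made here, and the additional noise criterion is omitted.

```python
import numpy as np

def false_neighbor_fraction(series, dim, tau=1, r_tol=15.0):
    """Fraction of nearest neighbors in dimension `dim` that are false,
    i.e. that move far apart when one more delay coordinate is added."""
    series = np.asarray(series, dtype=float)
    # Number of delay vectors that still have a (dim + 1)-th coordinate.
    n_vec = len(series) - dim * tau
    # Delay vectors X(i) = [x(i), x(i + tau), ..., x(i + (dim - 1) * tau)].
    emb = np.column_stack([series[k * tau : k * tau + n_vec] for k in range(dim)])
    # The coordinate added when the dimension grows to dim + 1.
    extra = series[dim * tau : dim * tau + n_vec]
    n_false = 0
    for i in range(n_vec):
        dist = np.linalg.norm(emb - emb[i], axis=1)
        dist[i] = np.inf                 # exclude the point itself
        j = int(np.argmin(dist))         # nearest neighbor in dimension dim
        r_d = dist[j]
        if r_d == 0.0:
            continue                     # identical vectors: ratio undefined, skip
        if abs(extra[i] - extra[j]) / r_d > r_tol:
            n_false += 1
    return n_false / n_vec

# A linear ramp is already unfolded in one dimension.
ramp_fraction = false_neighbor_fraction(np.arange(50.0), dim=1)
```

In practice the fraction is evaluated for increasing `dim`, and the embedding dimension is taken where the fraction of false neighbors becomes negligible.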
The similarity of false nearest neighbors can be interpreted through the distance measure between vectors in (6). Assume a data space of $n$ variables in which one sample vector is $X = (x_1, \ldots, x_j, \ldots, x_n)$. Setting variable $x_j$ to zero gives the vector without that variable, denoted $X_j = (x_1, \ldots, 0, \ldots, x_n)$, as shown in Figure 3.
The similarity between $X$ and $X_j$ is given by the distance measure (6).
If the distance measure is small, the vectors $X$ and $X_j$ are highly similar; that is, the removed variable $x_j$ has little impact on the nonlinear pattern and has low interpreting ability. Otherwise, if the distance is large, $X_j$ differs greatly from $X$, and the process variable $x_j$ is important for interpreting the nonlinear pattern; $X_j$ is then a false nearest neighbor of $X$.
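The zero-out test can be sketched as follows. This is only an illustration of the idea: the callable `project` is a hypothetical stand-in for the projection into the subspace, and the Euclidean distance averaged over samples is an assumption, not necessarily the measure used in (6).

```python
import numpy as np

def variable_importance(X, project):
    """Score each variable by how far the projected samples move
    when that variable is set to zero.

    X       : array of shape (N, n), samples by variables.
    project : callable mapping an (N, n) array to (N,) or (N, k) projections.
    """
    X = np.asarray(X, dtype=float)
    base = project(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_j = X.copy()
        X_j[:, j] = 0.0                              # remove variable j
        diff = (project(X_j) - base).reshape(len(X), -1)
        scores[j] = np.mean(np.linalg.norm(diff, axis=1))
    return scores

# Toy linear projection in which the second variable plays no role.
w_demo = np.array([2.0, 0.0, -1.0])
X_demo = np.array([[1.0, 5.0, 2.0], [3.0, -4.0, 1.0]])
scores_demo = variable_importance(X_demo, lambda A: A @ w_demo)
```

A score near zero marks a variable whose removal leaves the projected pattern almost unchanged, which corresponds to the redundant case described above.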
3.2. Kernel Fisher Discriminant Analysis
KFDA is well suited to nonlinear classification problems [22]. A nonlinear discriminant vector in the original space is obtained as a linear optimal discriminant vector in a high-dimensional feature space $F$ by conventional Fisher discriminant analysis (FDA). Because the dimension of $F$ is very high, it is hard to determine the nonlinear mapping function $\phi$ from the original space to the feature space directly. Reproducing-kernel methods, widely developed in machine learning (ML), achieve this goal: the nonlinear mapping is obtained indirectly through the kernel function $k(x_i, x_j)$ in the Gram space [23], where $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
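Conventional kernel functions can be selected as follows [6].
(i) Polynomial kernel function: $k(x, y) = (x \cdot y + c)^d$, $d \in \mathbb{N}$, where $c$ is a constant.
(ii) Gaussian kernel function: $k(x, y) = \exp\bigl(-\|x - y\|^2 / (2\sigma^2)\bigr)$, where $\sigma$ is the breadth (width) parameter.
(iii) Sigmoid kernel function: $k(x, y) = \tanh\bigl(a (x \cdot y) + b\bigr)$.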
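The three kernel families can be written as kernel-matrix functions, for example as below. This is a sketch using common textbook parameterizations; the parameter names (`degree`, `c`, `sigma`, `a`, `b`), their default values, and the $2\sigma^2$ scaling of the Gaussian kernel are assumptions made here.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=2, c=1.0):
    """K[i, j] = (x_i . y_j + c) ** degree."""
    return (X @ Y.T + c) ** degree

def gaussian_kernel(X, Y, sigma=1.0):
    """K[i, j] = exp(-||x_i - y_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def sigmoid_kernel(X, Y, a=1.0, b=0.0):
    """K[i, j] = tanh(a * (x_i . y_j) + b)."""
    return np.tanh(a * (X @ Y.T) + b)

A_demo = np.array([[1.0, 2.0], [0.0, 1.0]])
K_poly = polynomial_kernel(A_demo, A_demo)
K_gauss = gaussian_kernel(A_demo, A_demo)
K_sig = sigmoid_kernel(A_demo, A_demo)
```

Each function returns the full matrix of kernel values between the rows of `X` and the rows of `Y`, which is the form needed by kernel discriminant methods.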
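Assume that the original sample set is $X = \{x_1, x_2, \ldots, x_n\}$ with dimension $m$ and $n$ samples, where $x_j^i$ is the $j$th sample of the $i$th class, $i = 1, 2$, and $n = n_1 + n_2$. There exists a nonlinear mapping function $\phi$ that transforms the nonlinear problem in the original sample space into a linear classification problem in the high-dimensional feature space $F$; that is, $\phi: x \mapsto \phi(x) \in F$. In $F$, the within-class and between-class scatter of the training data are $S_w^\phi$ and $S_b^\phi$ in (7) and (9), respectively:
$$S_w^\phi = \sum_{i=1}^{2} \sum_{j=1}^{n_i} \bigl(\phi(x_j^i) - m_i^\phi\bigr)\bigl(\phi(x_j^i) - m_i^\phi\bigr)^T, \tag{7}$$
$$m_i^\phi = \frac{1}{n_i} \sum_{j=1}^{n_i} \phi(x_j^i), \tag{8}$$
$$S_b^\phi = \bigl(m_1^\phi - m_2^\phi\bigr)\bigl(m_1^\phi - m_2^\phi\bigr)^T, \tag{9}$$
where $m_i^\phi$ is the mean of the $i$th class in the feature space. KFDA seeks a projection direction $w$ with two properties: data with similar characteristics should be gathered as closely as possible, and data with different characteristics should be separated as far as possible. The key is therefore to find the projection direction $w$ and its corresponding discriminant function $f(x)$. As in linear FDA, the optimal projection direction is the vector that maximizes the Fisher criterion function (10), where $w^{*}$ is the optimal projection direction:
$$J(w) = \frac{w^T S_b^\phi w}{w^T S_w^\phi w}, \qquad w^{*} = \arg\max_{w} J(w). \tag{10}$$
Because the dimension of the feature space is usually high and $\phi$ is an implicit mapping function, the discriminant vector $w$ is hard to compute directly. Following the kernel method, each solution is therefore expressed as a linear combination of the mapped samples as in (11), where $\alpha = (\alpha_1, \ldots, \alpha_n)^T$:
$$w = \sum_{j=1}^{n} \alpha_j \phi(x_j). \tag{11}$$
Moreover, the mapped sample $\phi(x)$ is projected onto the direction $w$ in the feature space by
$$w^T \phi(x) = \sum_{j=1}^{n} \alpha_j k(x_j, x). \tag{12}$$
From (11), for $i = 1, 2$, the projection of the mean vector $m_i^\phi$ onto the direction $w$ in the feature space is
$$w^T m_i^\phi = \alpha^T M_i, \tag{13}$$
where $M_i = \bigl((M_i)_1, \ldots, (M_i)_n\bigr)^T$ and $(M_i)_j = \frac{1}{n_i} \sum_{k=1}^{n_i} k(x_j, x_k^i)$.
From (12) and (13), we have
$$w^T S_b^\phi w = \alpha^T K_b \alpha, \tag{14}$$
$$w^T S_w^\phi w = \alpha^T K_w \alpha, \tag{15}$$
where $K_b = (M_1 - M_2)(M_1 - M_2)^T$, $K_w = \sum_{i=1}^{2} K_i (I - \mathbf{1}_{n_i}) K_i^T$, $K_i$ is the $n \times n_i$ kernel matrix with entries $(K_i)_{jk} = k(x_j, x_k^i)$, $I$ is the identity matrix, and $\mathbf{1}_{n_i}$ is the $n_i \times n_i$ matrix with all entries equal to $1/n_i$.
Substituting (14) and (15) into (10), the optimal vector $\alpha$ is the one that maximizes the Fisher criterion (16) [24]:
$$J(\alpha) = \frac{\alpha^T K_b \alpha}{\alpha^T K_w \alpha}. \tag{16}$$
Furthermore, the optimal vector $\alpha$ and the corresponding eigenvalue $\lambda$ can be obtained [25] from
$$K_b \alpha = \lambda K_w \alpha .$$
The corresponding kernel Fisher discriminant function is thus obtained as
$$f(x) = w^T \phi(x) = \sum_{j=1}^{n} \alpha_j k(x_j, x) .$$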
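A two-class kernel Fisher discriminant can be sketched in a few lines. This follows the common formulation in which the coefficient vector is proportional to the inverse within-class kernel scatter matrix times the difference of the kernel mean vectors; the function names, the Gaussian kernel width, and the ridge term `reg` (added because the scatter matrix is usually singular) are assumptions made here, not values from the paper.

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kfda_fit(X1, X2, kernel, reg=1e-3):
    """Return the training samples and the expansion coefficients alpha."""
    X = np.vstack([X1, X2])
    n = len(X)
    means, scatter = [], np.zeros((n, n))
    for Xc in (X1, X2):
        Kc = kernel(X, Xc)                       # n x n_c kernel block of one class
        n_c = Kc.shape[1]
        means.append(Kc.mean(axis=1))            # kernel mean vector of the class
        center = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        scatter += Kc @ center @ Kc.T            # within-class kernel scatter
    alpha = np.linalg.solve(scatter + reg * np.eye(n), means[0] - means[1])
    return X, alpha

def kfda_project(X_train, alpha, kernel, X_new):
    """Discriminant value f(x) = sum_j alpha_j k(x_j, x) for each row of X_new."""
    return kernel(X_new, X_train) @ alpha

# Two concentric rings: not linearly separable in the original space.
theta = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X1_demo, X2_demo = 1.0 * ring, 3.0 * ring
X_tr, alpha_demo = kfda_fit(X1_demo, X2_demo, rbf)
z1 = kfda_project(X_tr, alpha_demo, rbf, X1_demo)
z2 = kfda_project(X_tr, alpha_demo, rbf, X2_demo)
```

On this toy data the one-dimensional projections `z1` and `z2` of the two rings do not overlap, so a single threshold on the discriminant value separates classes that no straight line separates in the original plane.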
3.3. Multikernel Fisher Discriminant Analysis
From Section 3.2, maximizing the Fisher criterion in the feature space is equivalent to maximizing the kernel criterion (16). Let $\alpha$ be the optimal solution for classification; $\alpha$ is determined both by the within-class kernel scatter matrix $K_w$ and by the difference of the kernel mean vectors $(M_1 - M_2)$. When the samples are independent and identically distributed, the kernel mean of the samples is independent of the number of samples, so the difference of the kernel mean vectors is unaffected by class imbalance. Hence $\alpha$ is determined only by the within-class kernel scatter matrix. If the variables follow different distributions, their contributions do not lie in similar intervals, and the solution $\alpha$ is then not optimal. To avoid the influence of different sample distributions, we present the multikernel Fisher discriminant analysis method. It extends the kernel criterion function to a multikernel form in which the kernel matrix is the weighted combination $K = \beta K_1 + (1 - \beta) K_2$, where $\beta$ is the adjustable parameter (set in MATLAB) and $K_1$ and $K_2$ are the kernel matrices computed with suitable kernel functions from Section 3.2 (i)/(ii)/(iii).
In this way, the influence of different sample distributions is accounted for through suitable kernel functions.
The proposed algorithm is summarized in Table 1. In this way, the contribution of each original process variable to a certain fault is measured.
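The elimination test can be sketched end to end as follows. This is an illustrative reading of the procedure, not the authors' code: the equal-weight sum of a Gaussian and a polynomial kernel, the ridge term `reg`, zeroing a column as the way to eliminate a variable, and the absolute change of the Fisher value as the importance score are all assumptions made here.

```python
import numpy as np

def multi_kernel(X, Y, weight=0.5, sigma=1.0, degree=2, c=1.0):
    """weight * Gaussian + (1 - weight) * polynomial kernel matrix."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    gauss = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))
    poly = (X @ Y.T + c) ** degree
    return weight * gauss + (1.0 - weight) * poly

def fit_direction(X, y, reg=1e-3):
    """Two-class kernel Fisher coefficients alpha; labels y take values 0 and 1."""
    n = len(X)
    K = multi_kernel(X, X)
    means, scatter = [], np.zeros((n, n))
    for label in (0, 1):
        Kc = K[:, y == label]
        n_c = Kc.shape[1]
        means.append(Kc.mean(axis=1))
        scatter += Kc @ (np.eye(n_c) - 1.0 / n_c) @ Kc.T
    return np.linalg.solve(scatter + reg * np.eye(n), means[0] - means[1])

def fisher_value(z, y):
    """Between-class over within-class scatter of 1-D projections z."""
    z0, z1 = z[y == 0], z[y == 1]
    between = (z0.mean() - z1.mean()) ** 2
    within = np.sum((z0 - z0.mean()) ** 2) + np.sum((z1 - z1.mean()) ** 2)
    return between / within

def importance_by_elimination(X, y):
    """Change of the Fisher value when each variable is zeroed in turn,
    with the direction alpha kept fixed."""
    alpha = fit_direction(X, y)
    j_full = fisher_value(multi_kernel(X, X) @ alpha, y)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_j = X.copy()
        X_j[:, j] = 0.0                               # eliminate variable j
        z_j = multi_kernel(X_j, X) @ alpha            # same direction, new projection
        scores[j] = abs(j_full - fisher_value(z_j, y))
    return scores

# Toy data: variable 0 separates the classes, variable 1 is noise,
# variable 2 is a constant (uninformative) channel.
rng = np.random.default_rng(0)
n_per = 30
y_demo = np.repeat([0, 1], n_per)
x0 = np.concatenate([rng.normal(-1.0, 0.3, n_per), rng.normal(1.0, 0.3, n_per)])
x1 = rng.normal(0.0, 0.3, 2 * n_per)
X_demo = np.column_stack([x0, x1, np.zeros(2 * n_per)])
scores_demo = importance_by_elimination(X_demo, y_demo)
```

Sorting the variables by `scores_demo` in descending order gives a ranking of the kind used in Section 4.3; the constant channel receives the lowest score because removing it changes nothing.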

4. Fault Separation of Tennessee Eastman with Redundancy Variables
4.1. Tennessee Eastman Chemical Process
The Tennessee Eastman (TE) process is a classical chemical process benchmark created by the Eastman Chemical Company in 1993 [26]. The process flowsheet is shown in Figure 4. There are four reactants (A, C, D, and E) and two products (G and H), as well as one inert component B and one byproduct F.
The dynamic TE model is composed of five major units: a reactor, a separator, a stripper, a condenser, and a compressor. Each unit is described by a set of equations, for a total of 148 algebraic equations and 30 differential equations. It is therefore one of the most complex benchmark models and is widely used to test algorithms for control, system monitoring, fault diagnosis, and so forth. Here, we take the TE process as the study object to evaluate the fault separation ability of our method.
4.2. Nonlinear Fault Separation of Redundancy Variables
In the TE process there are 41 observed variables and 12 manipulated variables from the controllers, some of which are nonlinear redundant variables. Moreover, the TE process has 20 classical fault types, listed in Table 2. Since Fault9 and Fault11 overlap nonlinearly, as shown in Figure 5, we take their separation as the study goal; the 53 process variables must be screened for multicollinearity and nonlinear redundancy. TE process data are simulated with a one-minute sampling time in MATLAB, using the software from Downs [27]. All the measurements contain Gaussian noise. A total of 1000 samples are collected for training, of which 800 are for Fault9 and 200 for Fault11. In addition, 835 samples are used to test the separation validity, with 644 for Fault9 and 171 for Fault11.

4.3. Results and Discussion
To distinguish Fault9 from Fault11, 53 variables must be considered in all. We therefore compute the contribution of each of the 53 variables with the proposed method to assess the importance of each process variable to the faults. The multikernel function combines a Gaussian kernel and a polynomial kernel, each weighted 50%. The contributions of each variable to the faults, computed with the steps in Section 3.3, are shown in Figure 6 and Table 3. In descending order of importance, the 53 process variables are ranked as {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7, Vab.20, Vab.11, Vab.2, Vab.12, Vab.8, Vab.19, Vab.5, Vab.22, Vab.6, Vab.3, Vab.18, Vab.14, Vab.15, Vab.17, Vab.10, Vab.41, Vab.40, Vab.27, Vab.23, Vab.29, Vab.31, Vab.26, Vab.33, Vab.25, Vab.32, Vab.4, Vab.24, Vab.30, Vab.35, Vab.34, Vab.37, Vab.36, Vab.28, Vab.39, Vab.38, Vab.1, Vab.53, Vab.52, Vab.51, Vab.50, Vab.49, Vab.48, Vab.47, Vab.46, Vab.45, Vab.44, Vab.43, Vab.42}.

The curves of the two most important variables, Vab.21 and Vab.13, in the TE process are given in Figures 7(a) and 7(b) and Figures 8(a) and 8(b), respectively. They show the strong variation of process variables Vab.21 and Vab.13.
Figure 7: Process variable Vab.21 (reactor coolant temperature): (a) in Fault9; (b) in Fault11.
Figure 8: Process variable Vab.13 (product separation pressure): (a) in Fault9; (b) in Fault11.
Following the ranking of the process variables, nested feature sets are constructed as {Vab.21}, {Vab.21, Vab.13}, {Vab.21, Vab.13, Vab.9}, and so on. Nonlinear pattern classification of Fault9 and Fault11 is tested with a support vector machine (SVM), which is widely used in pattern recognition. The parameters of the SVM are optimized with cross-validation. With these variable sets, the accuracy of fault separation between Fault9 and Fault11 is tested successively. The results are shown in Figure 9 and Table 4. They reveal that the separation accuracy tends to decrease as more variables are included.
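The evaluation over nested feature sets can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the SVM used in the paper; the function names and the four-sample toy data are assumptions made here for illustration only.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Test accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
    predicted = classes[np.argmin(dist, axis=1)]
    return float(np.mean(predicted == y_te))

def accuracy_curve(X_tr, y_tr, X_te, y_te, ranking):
    """Accuracy for the top-1, top-2, ... variables of a given ranking."""
    curve = []
    for k in range(1, len(ranking) + 1):
        cols = list(ranking[:k])                 # nested feature set of size k
        curve.append(nearest_centroid_accuracy(X_tr[:, cols], y_tr, X_te[:, cols], y_te))
    return curve

# Toy data: variable 0 separates the classes, variable 1 misleads the classifier.
X_train = np.array([[0.0, -1.0], [0.0, 1.0], [1.0, 3.0], [1.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.2, 5.0], [0.9, 4.0]])
y_test = np.array([0, 1])
curve_demo = accuracy_curve(X_train, y_train, X_test, y_test, ranking=[0, 1])
```

In this toy case the accuracy drops when the second variable is added, which mirrors the observation that including more variables can lower the separation accuracy.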

From the above results, if all 53 process variables are used to separate Fault9 and Fault11, the classification accuracy is merely 72.12%. This indicates that not all of the variables are directly related to a given fault; redundant or irrelevant variables can decrease the classification accuracy and must be eliminated. If the first five process variables {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7} are selected as features, the accuracy rises to its highest value, 94.55%, which means that these five process variables are key to the fault separation. If the model is to be simplified as much as possible, the single process variable {Vab.21} is the best feature variable: Fault9 and Fault11 can be recognized from the changes in Vab.21.
Moreover, Fault9 is a random disturbance in the feed temperature, and Fault11 is a random disturbance in the reactor cooling water inlet temperature. Since {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7} are the reactor coolant temperature, product separation pressure, reactor temperature, stripper pressure, and reactor pressure, respectively, it is easy to see that the five selected variables are closely related to Fault9 and Fault11. The simulation results agree with the physical process.
5. Conclusions
Nonlinear redundant and multicollinear variables decrease classifier accuracy and must be eliminated. To address this problem, FNN in the MKFDA subspace is studied in this paper. Nonlinear variables are projected to a new, linear, higher-dimensional subspace with single-kernel Fisher discriminant analysis to obtain the optimal classification, in which within-class samples are as close as possible and between-class samples are as far apart as possible. Conventional single-kernel KFDA is then extended to a multikernel method to handle process variables with different distribution functions. To reduce the high dimension that arises in the MKFDA subspace, FNN is used to assess the importance of each process variable to the faults. According to the simulation results for the TE process, the original variables are reduced to 5, and the test accuracy of the classifier between Fault9 and Fault11 reaches 94.55%, compared with 72.12% when all variables are used.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This research is supported by National Natural Science Foundation of China (no. 51376204, no. 51075418, no. 50905194, and no. 61174015), the Natural Science Foundation Project (no. CQCSTC2012jjA40026, no. CQCSTC2012jjA90011), and the Research Foundation of Chongqing University of Sci. and Tec. (no. CK2011Z01, no. CK2011B04).
References
[1] D. H. Lim, S. H. Lee, and M. G. Na, “Smart soft-sensing for the feedwater flowrate at PWRs using a GMDH algorithm,” IEEE Transactions on Nuclear Science, vol. 57, no. 1, pp. 340–347, 2010.
[2] M. J. Brusco and D. Steinley, “Exact and approximate algorithms for variable selection in linear discriminant analysis,” Computational Statistics and Data Analysis, vol. 55, no. 1, pp. 123–131, 2011.
[3] F. A. Michelsen, B. F. Lund, and I. J. Halvorsen, “Selection of optimal, controlled variables for the TEALARC LNG process,” Industrial and Engineering Chemistry Research, vol. 49, no. 18, pp. 8624–8632, 2010.
[4] F. Cipollini and G. M. Gallo, “Automated variable selection in vector multiplicative error models,” Computational Statistics and Data Analysis, vol. 54, no. 11, pp. 2470–2486, 2010.
[5] A. J. Miller, Subset Selection in Regression, Chapman and Hall, London, UK, 2002.
[6] R. L. Masion and R. F. Gunst, Statistical Design and Analysis of Experiments with Applications to Engineering and Science, John Wiley & Sons, Hoboken, NJ, USA, 2004.
[7] Y. Yang and J. O. Pederson, “A comparative study on feature selection in text categorization,” in Proceedings of the 14th International Conference on Machine Learning, pp. 412–420, 1997.
[8] K. Kira and L. A. Rendell, “The feature selection problem: traditional methods and a new algorithm,” in Proceedings of the 9th National Conference on Artificial Intelligence (AAAI '92), pp. 129–134, July 1992.
[9] B. Pfahringer, “Compression-based feature subset selection,” in Proceedings of the Workshop on Data Engineering for Inductive Learning (IJCAI '95), pp. 101–106, 1995.
[10] J. C. Isaac, Kernel methods and component analysis for pattern recognition [Ph.D. thesis], 2007.
[11] J. C. Huang, J. S. Zhao, W. Sun, and Y. K. Ding, “PCA-based early fault diagnosis of solid waste incinerator,” Chemical Industry and Engineering Progress, vol. 25, no. 12, pp. 1489–1492, 2006.
[12] S. Wold, M. Sjöström, and L. Eriksson, “PLS-regression: a basic tool of chemometrics,” Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
[13] J. D. Wu, P. H. Chiang, Y. W. Chang, and Y. J. Shiao, “An expert system for fault diagnosis in internal combustion engines using probability neural network,” Expert Systems with Applications, vol. 34, no. 4, pp. 2704–2713, 2008.
[14] D. F. Wang, S. J. Wang, and J. He, “Maintaining and fault removing on hydraulic system of CAK6140,” Machinery Design and Manufacture, vol. 7, pp. 161–162, 2010.
[15] J. H. Li and P. L. Cui, “Improved kernel fisher discriminant analysis for fault diagnosis,” Expert Systems with Applications, vol. 36, no. 2, pp. 1423–1432, 2009.
[16] M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, “Generalized power method for sparse principal component analysis,” Journal of Machine Learning Research, vol. 11, pp. 517–553, 2010.
[17] K. Kim, J.-M. Lee, and I.-B. Lee, “A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction,” Chemometrics and Intelligent Laboratory Systems, vol. 79, no. 1-2, pp. 22–30, 2005.
[18] R. Jenssen, “Kernel entropy component analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 847–860, 2010.
[19] N. Otopal, “Restricted kernel canonical correlation analysis,” Linear Algebra and Its Applications, vol. 437, no. 1, pp. 1–13, 2012.
[20] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality reduction and data representation,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
[21] H. Y. Wang and Z. H. Sheng, “Choice of the parameters for the phase space reconstruction of chaotic time series,” Journal of Southeast University, vol. 30, no. 5, pp. 113–117, 2000.
[22] Z. B. Zhu and Z. H. Song, “A novel fault diagnosis system using pattern classification on kernel FDA subspace,” Expert Systems with Applications, vol. 38, no. 6, pp. 6895–6905, 2011.
[23] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K.-R. Muller, “Fisher discriminant analysis with kernels,” in Proceedings of the 9th IEEE Workshop on Neural Networks for Signal Processing (NNSP'99), pp. 41–48, August 1999.
[24] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
[25] J. Y. Gan and Y. W. Zhang, “Generalized kernel fisher optimal discriminant in pattern recognition,” Pattern Recognition and Artificial Intelligence, vol. 15, no. 4, pp. 429–434, 2002.
[26] J. J. Downs and E. F. Vogel, “A plant-wide industrial process control problem,” Computers and Chemical Engineering, vol. 17, no. 3, pp. 245–255, 1993.
[27] N. Lawrence Ricker, “Tennessee eastman,” 2013, http://www.cheme.washington.edu/facresearch/faculty/ricker.html.
Copyright
Copyright © 2014 Yingying Su et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.