Journal of Applied Mathematics

Special Issue: Mathematical Approaches in Advanced Control Theories 2013

Research Article | Open Access

Volume 2014 | Article ID 729763 | 9 pages

Nonlinear Fault Separation for Redundancy Process Variables Based on FNN in MKFDA Subspace

Academic Editor: Xianxia Zhang
Received: 19 Jul 2013
Revised: 22 Oct 2013
Accepted: 23 Oct 2013
Published: 04 Feb 2014


Nonlinear faults are difficult to separate when large numbers of redundant process variables are present in the process industry. This paper introduces an improved kernel Fisher discriminant analysis (KFDA) method. All the original process variables carrying fault information are first optimally classified in a multikernel KFDA (MKFDA) subspace to obtain Fisher criterion values; multiple kernels are used to account for the different distributions of the variables. Each variable is then eliminated in turn from the original set, and a new projection is computed along the same MKFDA direction. The difference between the new Fisher criterion value and the original one is tested: if it changes markedly, the eliminated variable has an important effect on the faults, by analogy with false nearest neighbors (FNN). The same test is applied to the remaining variables in turn. As a case study, two nonlinear faults in the Tennessee Eastman process that overlap in the original space are separated with fewer observed variables. The results show that the proposed method eliminates redundant and irrelevant nonlinear process variables while improving classification accuracy.

1. Introduction

With the development of modern process industry, multivariate measurements from sensors exhibit multicollinearity, nonlinear correlative coupling, time delay, and redundancy. This makes the complexity of fault separation and diagnosis grow exponentially, the so-called "Curse of Dimensionality" [1, 2]. On the other hand, the correct classification ratio of faults decreases as the number of redundant process variables grows. Therefore, much attention has been paid to two points of view: variable selection and dimension reduction [3, 4].

Among studies of variable selection, existing methods can be broadly classified into three categories: random search techniques, measure-based methods, and intelligent computation. In random search, each process variable is directly deleted from or added to the classification model one at a time to search for the most suitable input set under a certain criterion; forward selection, backward selection, and stepwise selection are simple and easily implemented examples [5]. However, Mason and Gunst [6] showed that these methods can give mistaken results when the variable set exhibits multicollinearity. Measure-based methods select variables by computing the relevancy among all variables, as well as between variables and labels; the variables with the most similar characteristics are gathered into one group. Depending on the definition used, the K-L information measure, minimum description length, and mutual information have been applied [7–9]. Intelligent computation goes further to solve the nonlinear variable selection problem; for example, neural networks have been used for nonlinear models, although their selection criterion is uncertain [10].

Dimension reduction differs from variable selection in that it mainly depends on transformation and information extraction from the original variable matrix. It projects the original variables through a certain mapping into a new subspace and extracts information in a lower dimension; examples are principal component analysis (PCA) [11] and partial least squares (PLS) [12]. Linearly correlated original variables are projected along the directions of maximum variance of the covariance matrix, so that as much of the original information as possible is retained. For PCA, the contribution chart method calculates the contribution of each variable to a certain fault using the T² statistic and SPE [13, 14]. These linear methods were extended to nonlinear ones after kernel methods appeared [15–20], such as kernel principal component analysis (KPCA), kernel partial least squares (KPLS), and kernel Fisher discriminant analysis (KFDA). A kernel method converts a linear classification algorithm into a nonlinear one by mapping the original observations into a higher-dimensional space, so that a linear classifier in the new space is equivalent to a nonlinear classifier in the original space.

However, nonlinear information projected into the new feature space has a higher dimension, and the data matrix loses the physical meaning it had in the original sample space. If nonlinear faults that overlap in the original space are separated directly, the dimension of a kernel-based classifier becomes huge, while the correct classification ratio decreases because of redundant and multicollinear variables.

The objective of this paper is to extend dimension reduction for the above problems with a measure-based variable selection method, called MKFDA-FNN. Nonlinear process variables are projected into a higher-dimensional space with MKFDA. The discriminant vector with maximum class separation and its corresponding feature vector are computed to cluster the original variables with the highest similarity. As the embedding dimension increases, false nearest neighbors (FNN) with high similarity can be removed in turn. Thus, nonlinear redundant and multicollinear process variables can be removed from the input set of the nonlinear classifier. Finally, an actual fault separation problem in the classical Tennessee Eastman (TE) chemical process is given for further study.

2. Problem Description

The fault separation problem presented above is equivalent to screening the original process variables related to a certain fault as completely as possible. The multivariate data matrix initially considered, containing normal and fault information, is described in Figure 1: it is composed of the process variables, the control variables, and their time-delay terms at different sample times. In this way, the original data matrix collects the process/control variables together with their delayed versions up to the maximum delay order of each variable, over the whole sample length.

3. Multivariate Fault Separation Based on MKFDA-FNN

For the fault separation problem with nonlinear redundant process/control variables, the approach in Figure 2 is proposed. The correlated nonlinear variables are first projected into a higher-dimensional MKFDA subspace. Then, in order to find the truly useful variables, the importance of each input is measured in that subspace with a distance measure inspired by FNN. Accordingly, redundant variables are recognized, which makes the separation of overlapping faults easier.

3.1. False Nearest Neighbors

FNN is a feature selection method based on phase space reconstruction (PSR) in a high-dimensional data space [21]. As the embedding dimension increases, the movement locus unfolds, and false nearest neighbors with high similarity can be removed in turn; this restores the locus of the chaotic attractor. The algorithm is as follows.

In the d-dimensional phase space, which includes the original variables and their time delays, each phase vector X_d(i) has one nearest neighbor X_d^NN(i). Their 2-norm distance is

  R_d(i) = ||X_d(i) − X_d^NN(i)||₂.

When the dimension d is increased to d + 1, the above phase vector is extended to a new one, noted as X_{d+1}(i), with distance

  R_{d+1}(i) = ||X_{d+1}(i) − X_{d+1}^NN(i)||₂.

If R_{d+1}(i) is much bigger than R_d(i), the two vectors were neighbors only as a projection from the higher dimension to the lower one, so the two neighbors are false nearest neighbors.

Note the criterion

  sqrt(R²_{d+1}(i) − R²_d(i)) / R_d(i) > R_tol.

If the left-hand side is larger than the threshold R_tol, X_d^NN(i) is a false nearest neighbor of X_d(i). The threshold R_tol is typically determined in the interval (10, 50). Once noise appears in the process data, the following additional criterion should be involved: if R_{d+1}(i) exceeds a second threshold proportional to the size of the attractor, X_d^NN(i) is likewise judged a false nearest neighbor of X_d(i).
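The FNN test above can be sketched in a few lines (a minimal illustration assuming Euclidean distances; the function name and the default r_tol are illustrative, not from the paper):

```python
import numpy as np

def is_false_neighbor(x_d, nn_d, x_d1, nn_d1, r_tol=15.0):
    """Standard FNN test: compare the neighbor distance before and after
    raising the embedding dimension from d to d+1 (r_tol typically in (10, 50))."""
    r_d = np.linalg.norm(x_d - nn_d)      # distance in d dimensions
    r_d1 = np.linalg.norm(x_d1 - nn_d1)   # distance in d+1 dimensions
    # If the extra coordinate stretches the pair far apart, the two points
    # were close only as a projection, i.e. they are false neighbors.
    return np.sqrt(max(r_d1**2 - r_d**2, 0.0)) / r_d > r_tol
```

A pair that is close in d dimensions but far apart once the extra coordinate is added is flagged; a pair that stays close is kept as a true neighbor.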

The distance measure between vectors can faithfully capture the similarity of false nearest neighbors. Assume a data space of m-dimensional samples, and let one sample vector be x = (x₁, …, x_m). Setting variable x_i to zero gives a vector without variable i, noted as x^(−i) in Figure 3.

The similarity between x and x^(−i) is measured by the distance

  D_i = ||x − x^(−i)||₂.

If the distance measure D_i is small, the vectors x and x^(−i) are highly similar; that is, the removed variable has little impact on the nonlinear pattern, and process variable x_i has low interpreting ability. Otherwise, if D_i is much bigger, x^(−i) differs greatly from x; process variable x_i is then important to the interpretation of the nonlinear pattern, and x^(−i) is a false nearest neighbor of x.
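The variable-elimination measure can be sketched as follows (a hypothetical helper; the optional feature map phi stands in for the kernel-subspace projection, and defaults to the identity):

```python
import numpy as np

def variable_importance(x, phi=lambda v: v):
    """Distance between a sample and its copy with one variable zeroed,
    measured after an (optional) feature map phi. A larger distance means
    the zeroed variable matters more to the nonlinear pattern."""
    scores = []
    for i in range(len(x)):
        x_removed = x.copy()
        x_removed[i] = 0.0                 # eliminate variable i
        scores.append(np.linalg.norm(phi(x) - phi(x_removed)))
    return np.array(scores)
```

With the identity map, a variable that is already zero contributes nothing, while large-magnitude variables score highest; in the paper the same comparison is made after projection into the MKFDA subspace.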

3.2. Kernel Fisher Discriminant Analysis

KFDA is very useful for nonlinear classification problems [22]. A nonlinear discriminant vector in the original space corresponds to a linear optimal discriminant vector w in a high-dimensional feature space F, found with conventional Fisher discriminant analysis (FDA). Since the dimension of F is very high, it is hard to confirm the nonlinear mapping function Φ from the original space to the feature space directly. The reproducing-kernel method, widely developed in machine learning (ML), achieves this goal: the nonlinear mapping is found indirectly through the Gram matrix [23], where k(x, y) = ⟨Φ(x), Φ(y)⟩.

Conventional kernel functions can be selected as follows [6]: (i) the polynomial kernel k(x, y) = (⟨x, y⟩ + c)^d, where c ≥ 0 is a constant and d is the degree; (ii) the Gaussian kernel k(x, y) = exp(−||x − y||²/σ²), where σ is the breadth parameter; (iii) the sigmoid kernel k(x, y) = tanh(a⟨x, y⟩ + b).
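The three kernels can be written directly (a sketch; the parameter names c, d, sigma, a, b follow the conventional forms above):

```python
import numpy as np

def polynomial_kernel(x, y, d=2, c=1.0):
    """(x·y + c)^d, c >= 0 constant, d the degree."""
    return (np.dot(x, y) + c) ** d

def gaussian_kernel(x, y, sigma=1.0):
    """exp(-||x - y||^2 / sigma^2), sigma the breadth parameter."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / sigma ** 2)

def sigmoid_kernel(x, y, a=1.0, b=0.0):
    """tanh(a * x·y + b)."""
    return np.tanh(a * np.dot(x, y) + b)
```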

Assume the original sample set X = {x₁, …, x_n} has m dimensions and n samples, divided into classes. There exists a nonlinear mapping function Φ that transforms the nonlinear original sample space into a linearly classifiable high-dimensional data space F; that is, Φ: x → Φ(x) ∈ F. In the space F, the within-class and between-class scatter matrices of the training data are S_w^Φ and S_b^Φ, respectively, where m_i^Φ is the mean of the i-th class in the feature space. KFDA seeks a projection direction w that meets two properties: data with similar characteristics should be gathered as closely as possible, and data with different characteristics should be separated as far as possible. So the key is to search for the projection direction w and its corresponding discriminant function. As in linear FDA, the optimal projection direction is the vector w that maximizes the Fisher criterion

  J(w) = (wᵀ S_b^Φ w) / (wᵀ S_w^Φ w).

Since the dimension of the feature space is usually high and Φ is an indirect mapping, the discriminant vector w is hard to compute directly. According to the kernel-based method, each solution can be expressed as a linear combination of the mapped samples:

  w = Σ_{j=1}^{n} α_j Φ(x_j).

Moreover, the nonlinear transformation of a sample can be projected onto the direction w in the feature space as

  ⟨w, Φ(x)⟩ = Σ_{j=1}^{n} α_j k(x_j, x).

Likewise, the projection of a class mean vector onto w can be written in terms of the kernel mean vectors, that is, the column means of the kernel matrix K taken over each class.

From these expressions, the Fisher criterion can be rewritten in terms of α as

  J(α) = (αᵀ M α) / (αᵀ N α),

where M is the between-class kernel scatter matrix built from the kernel mean vectors and N is the within-class kernel scatter matrix [24].

Furthermore, the optimal vector α can be solved [25] as the leading eigenvector of N⁻¹M (regularizing N if necessary).

Thus, the corresponding kernel Fisher discriminant function is obtained as

  f(x) = Σ_{j=1}^{n} α_j k(x_j, x).
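The derivation above can be sketched numerically for the two-class case (a minimal illustration following the Mika et al. formulation cited in [23]; the regularization term reg and the function names are assumptions, not from the paper):

```python
import numpy as np

def kfda_alpha(K, y, reg=1e-3):
    """Two-class kernel Fisher discriminant.
    K: (n, n) kernel matrix over training samples; y: labels in {0, 1}.
    Returns the coefficients alpha of w = sum_j alpha_j * Phi(x_j)."""
    n = K.shape[0]
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    m0 = K[:, idx0].mean(axis=1)           # kernel mean vector, class 0
    m1 = K[:, idx1].mean(axis=1)           # kernel mean vector, class 1
    # Within-class kernel scatter N = sum_c K_c (I - 1/l_c) K_c^T
    N = np.zeros((n, n))
    for idx in (idx0, idx1):
        Kc = K[:, idx]
        l = len(idx)
        N += Kc @ (np.eye(l) - np.ones((l, l)) / l) @ Kc.T
    # For two classes, maximizing J(alpha) reduces to
    # alpha ∝ N^{-1} (m1 - m0); reg keeps the solve well-posed.
    return np.linalg.solve(N + reg * np.eye(n), m1 - m0)

def kfda_project(alpha, K_test_train):
    """Discriminant function f(x) = sum_j alpha_j k(x_j, x)."""
    return K_test_train @ alpha
```

On linearly separable toy data with a linear kernel, the projections of the two classes end up on opposite sides of the discriminant direction.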

3.3. Multikernel Fisher Discriminant Analysis

From Section 3.2, maximizing the Fisher criterion J(w) in the feature space is equivalent to maximizing its kernel form J(α). Assume α* is the optimal solution for the classification effect; α* is determined both by the within-class kernel scatter matrix and by the difference of the kernel mean vectors. Under the assumption of independent and identically distributed samples, the kernel mean is independent of the number of samples, which indicates that the difference of the kernel mean vectors is unaffected by sample imbalance. So α* is mainly determined by the within-class kernel scatter matrix. If the distributions of different variables differ, their contributions will not lie in a similar interval, and the solution α* is no longer optimal. Hence, in order to avoid the influence of different sample distributions, we present the multikernel Fisher discriminant analysis method. It replaces the single kernel matrix in the criterion with the combination

  K = λ K₁ + (1 − λ) K₂,

where λ is an adjustable weighting parameter and K₁ and K₂ are the kernel matrices computed with the suitable kernel functions from Section 3.2 (i)/(ii)/(iii).

In this way, the influence of different sample distributions is handled by assigning each a suitable kernel function.
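The kernel combination can be sketched as follows (an illustrative helper; the Gaussian/polynomial pairing follows the choice reported in Section 4.3, and the weighting parameter lam is an assumption):

```python
import numpy as np

def multikernel_matrix(X, lam=0.5, sigma=1.0, degree=2, c=1.0):
    """Weighted combination K = lam * K_gauss + (1 - lam) * K_poly,
    so variables with different distributions can each be served by a
    suitable kernel; lam is the adjustable weighting parameter."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    K_gauss = np.exp(-d2 / sigma**2)
    K_poly = (X @ X.T + c) ** degree
    return lam * K_gauss + (1 - lam) * K_poly
```

Because each component kernel is symmetric positive semidefinite, any convex combination with 0 ≤ λ ≤ 1 is again a valid kernel.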

The overall algorithm of this paper is summarized in Table 1. In this way, the contribution of each original process variable to a certain fault is measured.

Inputs: training samples, class labels, kernel types, and kernel parameters

Step 1. Initialize and compute the kernel matrices
Step 2. Select a suitable multikernel function
Step 3. Compute the kernel mean vector of each of the two classes
Step 4. Compute the within-class kernel scatter matrix
Step 5. Compute the between-class and within-class kernel scatter matrices M and N
Step 6. Get the optimal solution α of the Fisher criterion
Step 7. Set the inspected process variable to zero in the original samples
Step 8. Project the new samples into the feature space
Step 9. Compute the contribution of one variable at a time with FNN in the MKFDA subspace
Step 10. Repeat the above course for the remaining variables

Outputs: the distance measure of each original variable

4. Fault Separation of Tennessee Eastman with Redundancy Variables

4.1. Tennessee Eastman Chemical Process

Tennessee Eastman (TE) is a classical chemical process published by the Eastman Chemical Company in 1993 [26]. Its technological flowsheet is shown in Figure 4. There are four reactants (A, C, D, and E) and two products (G, H); besides these, there is one inert material B and one byproduct F.

The dynamic TE model is composed of five major units: a reactor, a separator, a stripper, a condenser, and a compressor. Each unit can be expressed with a set of equations, 148 algebraic equations and 30 differential equations in all. It is therefore one of the most complex benchmark models and is widely used to test algorithms for control, system monitoring, fault diagnosis, and so forth. Here, we take the Tennessee Eastman process as the study object to measure the fault separation ability of our method.

4.2. Nonlinear Fault Separation of Redundancy Variables

In the TE process, there are 41 observed variables and 12 manipulated variables from the controller, some of which are nonlinear redundant variables. Moreover, there are 20 types of classical faults in the TE process, shown in Table 2. Since Fault9 and Fault11 are nonlinearly overlapped, as shown in Figure 5, we take their separation as the study goal; meanwhile, the 53 process variables must be screened for multicollinearity and nonlinear redundancy. Process data of TE is simulated at a one-minute sampling time in MATLAB [27]. All the measurements contain Gaussian noise. A total of 1000 samples are collected for training, 800 for Fault9 and 200 for Fault11. In addition, 835 samples are applied to test separation validity, 644 for Fault9 and 171 for Fault11.


Fault | Disturbance | Type
1 | A/C feed ratio, B composition constant | Step
2 | B composition, A/C ratio constant | Step
3 | D feed temperature | Step
4 | Reactor cooling water inlet temperature | Step
5 | Condenser cooling water inlet temperature | Step
6 | A feed loss | Step
7 | C header pressure loss, reduced availability | Step
8 | A, B, C feed composition | Random
9 | D feed temperature | Random
10 | C feed temperature | Random
11 | Reactor cooling water inlet temperature | Random
12 | Condenser cooling water inlet temperature | Random
13 | Reaction kinetics | Slow drift
14 | Reactor cooling water valve | Sticking
15 | Condenser cooling water valve | Sticking

4.3. Results and Discussion

To distinguish Fault9 from Fault11, 53 variables must be considered in all. Therefore, we compute the contribution of the 53 variables with the proposed method to assess the importance of each process variable to the faults. The multikernel function is selected as a Gaussian kernel and a polynomial kernel, each weighted 50%. The contributions of each variable to the faults, computed with the steps in Section 3.3, are shown in Figure 6 and Table 3. From large to small, the importance of all 53 process variables is ordered as {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7, Vab.20, Vab.11, Vab.2, Vab.12, Vab.8, Vab.19, Vab.5, Vab.22, Vab.6, Vab.3, Vab.18, Vab.14, Vab.15, Vab.17, Vab.10, Vab.41, Vab.40, Vab.27, Vab.23, Vab.29, Vab.31, Vab.26, Vab.33, Vab.25, Vab.32, Vab.4, Vab.24, Vab.30, Vab.35, Vab.34, Vab.37, Vab.36, Vab.28, Vab.39, Vab.38, Vab.1, Vab.53, Vab.52, Vab.51, Vab.50, Vab.49, Vab.48, Vab.47, Vab.46, Vab.45, Vab.44, Vab.43, Vab.42}.


In the following, the curves of the two most important variables, Vab.21 and Vab.13, in the TE process are given in Figures 7(a) and 7(b) and Figures 8(a) and 8(b), respectively. They show the strong variation of process variables Vab.21 and Vab.13.

According to the importance sequence of the process variables, nested feature sets are constructed as {Vab.21}, {Vab.21, Vab.13}, {Vab.21, Vab.13, Vab.9}, and so on. Nonlinear pattern classification of Fault9 and Fault11 is tested with a support vector machine (SVM), which is widely used in pattern recognition. The parameters of the SVM are optimized with cross-validation. With the above variable sets, the accuracy of fault separation between Fault9 and Fault11 is tested in turn. The results are shown in Figure 9 and Table 4. They reveal that the separation accuracy first rises and then falls as more variables are included.
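The nested feature-set evaluation can be sketched as follows (an illustrative stand-in: a nearest-centroid classifier replaces the SVM so the example stays self-contained, and all names and data are hypothetical):

```python
import numpy as np

def nested_set_accuracy(Xtr, ytr, Xte, yte, order):
    """Evaluate nested feature sets {v1}, {v1, v2}, ... built from the
    importance ranking `order`, scoring each set on held-out data with
    a simple nearest-centroid classifier (stand-in for the SVM)."""
    accs = []
    for k in range(1, len(order) + 1):
        cols = order[:k]
        c0 = Xtr[ytr == 0][:, cols].mean(axis=0)   # class-0 centroid
        c1 = Xtr[ytr == 1][:, cols].mean(axis=0)   # class-1 centroid
        d0 = np.linalg.norm(Xte[:, cols] - c0, axis=1)
        d1 = np.linalg.norm(Xte[:, cols] - c1, axis=1)
        pred = (d1 < d0).astype(int)
        accs.append(np.mean(pred == yte))
    return accs
```

As in Table 4, the returned accuracies can be compared across the nested sets to locate the point where adding further (redundant) variables stops helping.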

Feature set | Combination of variables | Accuracy
Set2 | Vab.21, Vab.13 | 85.731%
Set3 | Vab.21, Vab.13, Vab.9 | 89.652%
Set4 | Vab.21, Vab.13, Vab.9, Vab.16 | 92.123%
Set5 | Vab.21, Vab.13, Vab.9, Vab.16, Vab.7 | 94.547%
Set6 | Vab.21, Vab.13, Vab.9, Vab.16, Vab.7, Vab.20 | 92.532%
Set53 | Vab.21, Vab.13, Vab.9, Vab.16, Vab.7, Vab.20, … Vab.42 | 72.119%

From the above results, we conclude that if all 53 process variables were used to separate Fault9 and Fault11, the correct classification ratio would be merely 72.12%. This indicates that not all of the variables are directly related to a certain fault; some redundant or irrelevant variables decrease the classification accuracy and must be eliminated. If the feature set is chosen as the first five process variables {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7}, the accuracy increases to its highest value, 94.55%. This means the above five process variables are the key to the fault separation. If the model were to be simplified as much as possible, the single process variable {Vab.21} would be the best feature; Fault9 and Fault11 can be recognized according to the changes of Vab.21.

On the other hand, Fault9 stands for a random disturbance to the D feed temperature, and Fault11 is a random disturbance to the reactor cooling water inlet temperature. Since {Vab.21, Vab.13, Vab.9, Vab.16, Vab.7} are the reactor coolant temperature, product separator pressure, reactor temperature, stripper pressure, and reactor pressure, respectively, it is easy to see that the five selected variables are closely related to Fault9 and Fault11. The simulation results agree with physical reality.

5. Conclusions

Nonlinear redundant and multicollinear variables decrease the accuracy of a classifier and must be eliminated. For this problem, FNN in the MKFDA subspace is studied in this paper. Nonlinear variables are projected into a new, linear, higher-dimensional subspace with kernel Fisher discriminant analysis to obtain an optimal classification that makes the within-class distances as small, and the between-class distances as large, as possible. Furthermore, the conventional single-kernel KFDA is expanded to a multikernel method to handle process variables with different distribution functions. In order to reduce the higher dimension emerging in the multi-KFDA subspace, FNN is incorporated to recognize the importance of each process variable to the faults. According to the simulation results on the TE process, the original variables are reduced to 5, and the tested accuracy reaches 94.55%, compared with 72.12% when all 53 variables are used in the classifier between Fault9 and Fault11.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This research is supported by the National Natural Science Foundation of China (nos. 51376204, 51075418, 50905194, and 61174015), the Natural Science Foundation Project (nos. CQCSTC2012jjA40026 and CQCSTC2012jjA90011), and the Research Foundation of Chongqing University of Sci. and Tec. (nos. CK2011Z01 and CK2011B04).


References

1. D. H. Lim, S. H. Lee, and M. G. Na, "Smart soft-sensing for the feedwater flowrate at PWRs using a GMDH algorithm," IEEE Transactions on Nuclear Science, vol. 57, no. 1, pp. 340–347, 2010.
2. M. J. Brusco and D. Steinley, "Exact and approximate algorithms for variable selection in linear discriminant analysis," Computational Statistics and Data Analysis, vol. 55, no. 1, pp. 123–131, 2011.
3. F. A. Michelsen, B. F. Lund, and I. J. Halvorsen, "Selection of optimal, controlled variables for the TEALARC LNG process," Industrial and Engineering Chemistry Research, vol. 49, no. 18, pp. 8624–8632, 2010.
4. F. Cipollini and G. M. Gallo, "Automated variable selection in vector multiplicative error models," Computational Statistics and Data Analysis, vol. 54, no. 11, pp. 2470–2486, 2010.
5. A. J. Miller, Subset Selection in Regression, Chapman and Hall, London, UK, 2002.
6. R. L. Mason and R. F. Gunst, Statistical Design and Analysis of Experiments with Applications to Engineering and Science, John Wiley & Sons, Hoboken, NJ, USA, 2004.
7. Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning, pp. 412–420, 1997.
8. K. Kira and L. A. Rendell, "The feature selection problem: traditional methods and a new algorithm," in Proceedings of the 9th National Conference on Artificial Intelligence (AAAI '92), pp. 129–134, July 1992.
9. B. Pfahringer, "Compression-based feature subset selection," in Proceedings of the Workshop on Data Engineering for Inductive Learning (IJCAI '95), pp. 101–106, 1995.
10. J. C. Isaac, Kernel Methods and Component Analysis for Pattern Recognition [Ph.D. thesis], 2007.
11. J. C. Huang, J. S. Zhao, W. Sun, and Y. K. Ding, "PCA-based early fault diagnosis of solid waste incinerator," Chemical Industry and Engineering Progress, vol. 25, no. 12, pp. 1489–1492, 2006.
12. S. Wold, M. Sjöström, and L. Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, vol. 58, no. 2, pp. 109–130, 2001.
13. J. D. Wu, P. H. Chiang, Y. W. Chang, and Y. J. Shiao, "An expert system for fault diagnosis in internal combustion engines using probability neural network," Expert Systems with Applications, vol. 34, no. 4, pp. 2704–2713, 2008.
14. D. F. Wang, S. J. Wang, and J. He, "Maintaining and fault removing on hydraulic system of CAK6140," Machinery Design and Manufacture, vol. 7, pp. 161–162, 2010.
15. J. H. Li and P. L. Cui, "Improved kernel fisher discriminant analysis for fault diagnosis," Expert Systems with Applications, vol. 36, no. 2, pp. 1423–1432, 2009.
16. M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, "Generalized power method for sparse principal component analysis," Journal of Machine Learning Research, vol. 11, pp. 517–553, 2010.
17. K. Kim, J.-M. Lee, and I.-B. Lee, "A novel multivariate regression approach based on kernel partial least squares with orthogonal signal correction," Chemometrics and Intelligent Laboratory Systems, vol. 79, no. 1-2, pp. 22–30, 2005.
18. R. Jenssen, "Kernel entropy component analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 5, pp. 847–860, 2010.
19. N. Otopal, "Restricted kernel canonical correlation analysis," Linear Algebra and Its Applications, vol. 437, no. 1, pp. 1–13, 2012.
20. M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
21. H. Y. Wang and Z. H. Sheng, "Choice of the parameters for the phase space reconstruction of chaotic time series," Journal of Southeast University, vol. 30, no. 5, pp. 113–117, 2000.
22. Z. B. Zhu and Z. H. Song, "A novel fault diagnosis system using pattern classification on kernel FDA subspace," Expert Systems with Applications, vol. 38, no. 6, pp. 6895–6905, 2011.
23. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, "Fisher discriminant analysis with kernels," in Proceedings of the 9th IEEE Workshop on Neural Networks for Signal Processing (NNSP '99), pp. 41–48, August 1999.
24. B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
25. J. Y. Gan and Y. W. Zhang, "Generalized kernel fisher optimal discriminant in pattern recognition," Pattern Recognition and Artificial Intelligence, vol. 15, no. 4, pp. 429–434, 2002.
26. J. J. Downs and E. F. Vogel, "A plant-wide industrial process control problem," Computers and Chemical Engineering, vol. 17, no. 3, pp. 245–255, 1993.
27. N. L. Ricker, "Tennessee Eastman," 2013.

Copyright © 2014 Ying-ying Su et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
