A Combined Markov Chain Model and Generalized Projection Nonnegative Matrix Factorization Approach for Fault Diagnosis

Yuguang, Niu; Shilin, Wang; Ming, Du

doi:https://doi.org/10.1155/2017/7067025

Mathematical Problems in Engineering

On this page

Abstract Introduction Conclusions Conflicts of Interest References Copyright Related Articles

Special Issue

Stochastic Systems and Control: Theory and Applications

View this Special Issue

Research Article | Open Access

Volume 2017 | Article ID 7067025 | https://doi.org/10.1155/2017/7067025

A Combined Markov Chain Model and Generalized Projection Nonnegative Matrix Factorization Approach for Fault Diagnosis

Niu Yuguang,¹Wang Shilin,²and Du Ming²

Academic Editor: Honglei Xu

Received09 Mar 2017

Revised09 May 2017

Accepted22 May 2017

Published13 Jun 2017

Abstract

The presence of sets of incomplete measurements is a significant issue in the real-world application of multivariate statistical process monitoring models for industrial process fault detection. Since the missing data in the incomplete measurements are usually correlated with some of the available variables, these measurements can be used if an efficient algorithm is presented. To resolve the problem, a novel method combining Markov chain model and generalized projection nonnegative matrix factorization (MCM-GPNMF) is proposed to detect and diagnose the faults in industrial process. The basic idea of the approach is to use MCM-GPNMF to extract the dominant variables from incomplete process data and to combine them with statistical process monitoring techniques. and statistics are defined as online monitoring quantities for fault detection and corresponding contribution plots are also considered for fault isolation. The proposed method is applied to a 1000 MW unit boiler process. The simulation results clearly illustrate the feasibility of the proposed method.

1. Introduction

As industrial process becomes more and more complex, process monitoring and diagnosis techniques are gaining importance for plant safety, maintenance cost, and profit margins. Multivariate statistical process monitoring (MSPM) techniques have been widely used to build statistical models for some unmeasured variables and for establishing online monitoring schemes for industrial process [1, 2]. These models extract a small number of latent variables, which in a manner better summarize the properties contained in the original variables. Monitoring and diagnosis using these latent variables are both simpler and more powerful than using the original variables [3].

It is well known that the traditional MSPM models usually require that the process data on all variables must be complete. In practice, however, one of the challenges in applying these models is to deal with the process data sets that contain some missing observations. Sometimes, more than 50% of industrial process data would contain the missing data [4]. Since the missing data in the incomplete measurements is usually correlated with some of the available variables, the conventional MSPM methods generally eliminate the sample data in the data matrix that contain them, but doing so will leave the corresponding nature of process unknown.

It is of great importance to determine how to use the incomplete process data sets to build the normal operating model. The well-known Markov chain model (MCM), a typical stochastic process model, is one of commonly used prediction approaches. The most basic feature of MCM is “Markov property,” also known as “no aftereffect.” The next process variable state is predicted by the transition probability matrix obtained using MCM to predict the mobility of measurements if the original time series satisfies the conditions of the Markov chain (MC) [5, 6]. The MC prediction method has been widely used in various fields. In Jeon and Lee’s research, MC-based prediction routing methods are proposed to select the optimal behavior nodes [7]. In order to keep balance of electric power system, Yoder et al. use the MCM to predict short-term wind power [8].

Nonnegative matrix factorization (NMF) is a novel multivariate data analysis and dimension reduction technique that has many applications in spectroscopy, data mining, and pattern recognition [9]. NMF and its variant methods are typically applied to high-dimensional data where each element has a non-negative value, and they find a low rank approximation from the historical process data sets. Unlike the traditional MSPM methods, the NMF-based algorithms do not have any assumption about the nature of the process variables except for nonnegativity. The nonnegativity restriction lets only additions in the factorization process. This property makes NMF obtain sparse and part-based subspace representations of the original data sets [10, 11]. Therefore, NMF-based methods have potential superiority to solve the monitoring and diagnosis problem of complex industrial process.

Normally, the NMF-based algorithms require that the measurement data be nonnegative. In practice, however, the process data of industrial process may be not fulfilling this constraint. Due to the difference in unit selection, the collected data is likely to contain negative numbers. Although it is possible by adjusting the unit to make negative data satisfy the nonnegative constraints, in order to make NMF method have a wider range of applications, we want to relax the nonnegative constraints on the original data sets. In this research, we will propose a new variant of NMF to solve the above problem. It can be called generalized projection nonnegative matrix factorization (GPNMF). Then, MCM and GPNMF are combined to extract useful information from incomplete process data and to combine them with statistical process monitoring techniques.

The rest of the article is organized as follows: Section 2 introduces MCM-based prediction method. Section 3 proposes the GPNMF method. In Section 4, the MCM-GPNMF-based process monitoring method is introduced. In Section 5, a case study in a 1000 MW unit boiler process is shown and discussed. Section 6 contains the conclusions.

2. MCM-Based Prediction Method

2.1. Markov Chain Model

Markov chain model is a stochastic process which can be used to estimate the transition probability between the discrete states of the system, and it can be expressed by some parameters. In the first-order Markov chain model, the state at a certain moment only depends on the state of its previous moment [12]. In the further research, the second-order or even higher-order Markov chain process was proposed [12, 13]. In these second-order or higher-order Markov chain processes, the state of a moment depends on two or more states before it. In this paper, we will use the second-order Markov chain model to estimate the missing values in the data set.

For a second-order Markov chain model, let be a stochastic process. The set of all possible states in the stochastic process is called state space. The state space is defined as .

For a given time series , its conditional probability distribution can be expressed as

Markov transition probability is defined as follows:where represents the current state of the moment and and represent the previous and next states of , respectively.

If a stochastic process contains states, the matrix consisting of all transition probabilities is called the state transition matrix of the second-order Markov chain. The state transition matrix can be expressed as follows:

The second-order Markovian transfer matrix has the following property. For any state , the sum of the probability of its transition to all states is 1. Consider

The maximum likelihood estimation of transfer probability for second-order Markov chains is defined as follows:where is the transition frequency that the state is transferred from the previous moment state and the current time state to the next moment state .

The cumulative distribution function (CDF) is also used in the process of generating the time series. The CDF of the second-order Markov chain model is calculated as follows:

The second-order Markov chain model is used to generate the process variable time series, where the initial state is completely random. Generate a random number between 0 and 1 by the random number generator. For the second-order Markov chain model, the current time state and the previous time state are known. If satisfies , then the next time state is .

Only a time series of state is obtained by the above steps. Next, it is necessary to transform the resulting time series into the time series of actual process variable. In addition to the fact that the first and last states do not need to be transformed, all other states need to use formula (7) to transform.where and represent the upper and lower limits of a state, respectively. represents a random number evenly distributed between 0 and 1. represents the actual value of the process variable.

2.2. Markov Property Test

When using the Markov chain to predict the value of a variable, it is not necessary to consider the change of the variable in the past. If the current state of the variable is known, the state of the next moment can be predicted. The Markov property test of the original time series should be carried out before the establishment of the Markov chain model. The specific method of Markov property test is given as follows: we assume that all states of the original time series are . The marginal probability is defined as follows:where stands for the frequency that the variable transitions from state to state .

When the amount of data is large enough, the statistic obeys the distribution and the degree of freedom of distribution is . The statistic can be calculated as follows:where stands for the probability that the variable transitions from state to state .

When the significance level α is given, the value of can be obtained by looking up the table. The Markov chain model can be used to process the variable if is satisfied.

2.3. Numerical Example

In this section, the active power of a 1000 MW unit is chosen for numerical experiment. Under the influence of random noise, the power measurement is randomly fluctuating near the set point. Therefore, the active power can be regarded as a stochastic process. The normal operation data of the active power is used as experiment data vector with 500 samples. The maximum and minimum values of power in all samples are 743.86 MW and 749.94 MW, respectively. Therefore, the stochastic process is divided into 16 states. In these states, the power values 743 MW and 750 MW are defined as state 1 and state 16, respectively. The other states are evenly distributed between state 1 and state 16, and the power interval between each two states is 500 kW. The significance level is given by 0.05 in the present work. The conclusion can be easily obtained through the calculation; that is, . Therefore, The Markov chain model can be used to process this time series.

Next, we will select 50 samples from the experimental data randomly and their values will be set to zero. These 50 samples represent the missing data in experiment data vector. Then, the second-order Markov chain model is used to deal with the missing data in experiment data vector. The incomplete data can be replaced by the predicted value, which is shown in Figure 1.

As seen in Figure 1, the predicted data are very close to the corresponding original data. In other words, the prediction value by second-order MCM is very effective. Therefore, this example can illustrate the feasibility of the second-order MCM algorithm.

3. Generalized Projection Nonnegative Matrix Factorization

3.1. NMF Algorithm

The mathematical formulation of NMF is expressed as follows. Given a process data matrix (each column of is a sample vector), where all elements are nonnegative and a natural number , NMF aims to find two low rank nonnegative matrices and such that . Here, each column of is called basis vector. Each column of is called an encoding and is in one-to-one correspondence with a sample vector in [10]. In other words, each sample vector is approximated by a linear combination of the columns of , weighted by the components of [11].

The approximate factorization problem can be formulated as an optimization problem that uses a cost function to measure the quality of the approximation. Lee and Seung constructed a simple objective function which utilizes the square of the Euclidean to measure the distance between and [11]. The objective function of the optimization problem can be expressed as follows:

It is well known that the function is convex in only or only but is not convex in both variables meanwhile. Therefore, it is realistic to find local minima by solving the above optimization problem. Lee and Seung presented an iterative algorithm to obtain the local minima. The objective function is monotonically nonincreasing under the following rules [11]:

Heretofore, the multiplicative iterative algorithm is the most classic and widely used monotone algorithm. The contradiction of the optimization algorithm between convergence speed and easy use can be better coordinated by using above iterative rules.

3.2. GPNMF Algorithm

Many improvements of NMF have been presented based on the objective function of the basic NMF algorithm by adding different regularization terms so as to increase the constraint conditions to NMF from different perspectives [14, 15]. On the contrary, Yuan and Oja proposed an improved NMF algorithm based on linear projection [16]. It can be called projective nonnegative matrix factorization (PNMF). However, the PNMF algorithm does not guarantee that the objective function is monotonically decreasing during the iteration, and there may be oscillations. In order to solve this problem and make the NMF algorithm better adapt to the actual industrial process, we propose a new variation of NMF method named generalized projection nonnegative matrix factorization (GPNMF). This method will not only guarantee the objective function monotone nonincreasing under the new iterative rule but also make the process data not necessarily constrained to be nonnegative. In this approach, the approximate factorization will be rewritten as and the optimization problem can be expressed as follows:

The above constrained optimization problem can be regarded as a typical application of regularized least squares problem proposed by Tikhonov in 1963 [17]. The objective function of (12) can be expanded to a function about the variable which ignores the constant term . Hence, (12) can be rewritten as follows:

The gradient matrix of (13) can be obtained by using matrix differential:

When the value of (14) is equal to 0, the solution of the least squares problem in (12) is given by

Since the nonnegative constraint for the original data matrix is relaxed in the GPNMF algorithm, now consider the case where contains positive and negative elements. The factors and are defined as the absolute values of all positive elements and negative elements in , respectively. and are calculated separately as follows:where represents the absolute value for all the elements of the matrix . Then the original data matrix can be expressed as and (15) can be adapted as follows:

It can be seen from (13) that the objective function is a quartic function with respect to the coefficient matrix . Here, a new iterative rule for the optimization problem in (12) is presented as follows:

The objective function in (12) is invariant under this update if and only if is at local minima of the constrained optimization problem.

4. Fault Diagnosis Algorithm Based on MCM-GPNMF

4.1. Initialization of GPNMF Algorithm

For the moment, NMF and its improved algorithms are solved by iteration. It is well known that a good iteration initial value can improve the convergence speed and accuracy of the NMF-based algorithm. Although many researchers have pointed out the importance of a good initial value for the NMF algorithm in their article, most people still use the random method to initialize the NMF algorithm in the practical application. Since the NMF can only converge to the local optimal solution, the different initial values will result in different results. Langville et al. compared several commonly used initialization methods [18]. For the presented GPNMF algorithm, there is only one unknown factor, that is, coefficient matrix . Besides, the approximate factorization can clearly indicate that the coefficient matrix is calculated to satisfy , where is identity matrix of size . In this research, therefore, the coefficient matrix is initialized with a -by--dimensional matrix with ones on the diagonal and zeros elsewhere.

4.2. Statistical Monitoring Model Based on GPNMF

The GPNMF-based statistical process monitoring model can be described aswhere the matrix and represent the eigensubspace and the residual subspace, respectively. The matrix is the reconstructed value of the coefficient matrix . It reflects the changes in process variables. Due to the fact that the eigensubspace and the residual subspace of GPNMF algorithm are similar to the principal component subspace and residual subspace of PCA-based method, according to the definition of monitoring statistic and SPE in PCA method, we construct two new monitoring statistics based on GPNMF statistical monitoring model to monitor the change of eigensubspace and residual subspace. They are calculated as follows:where the vector and represent the th column of matrices and , respectively. The column vector is the reconstructed value of . It can be calculated as follows:

For the two new monitoring statistics proposed in this section, it is obviously unreasonable to assume that these two metrics are subject to a particular distribution and calculate their upper control limits like traditional monitoring algorithm. Therefore, we use kernel density estimation (KDE) which is a more general method to calculate the actual distribution information of the monitoring statistics and then determine their upper control limits. The confidence interval is given by 99% during the calculation process.

4.3. Contribution Plot Method

It is well known that contribution plot method has been widely used in fault isolation research field. Once a fault is detected by the fault detection method, this method will be used to isolate the variables that are most likely to cause the failure. Contribution plot method is a commonly used fault separation method in PCA-based fault diagnosis algorithm. The monitoring statistics and SPE are often used in the contribution plot method. The corresponding contributions to the statistic and SPE statistic can be expressed using the following equations, respectively:where and . The vector represents the th column of unit matrix . The integer represents the number of process variables in PCA. The parameters and represent the contribution of each process variable to the monitoring statistics and SPE, respectively.

Alcala and Qin made a detailed analysis and summary of the contribution plot method [19]. Besides, a basic framework to construct contribution graph method was given in their research. Based on this framework, the corresponding contributions to the statistic and statistic can be designed as follows:where the parameters and represent the contribution of each process variable to the monitoring statistics and , respectively. The vector represents the th column of unit matrix . The natural number represents the number of process variables in GPNMF method.

5. Monitoring Performance

5.1. Fault Detection

In this section, the proposed method is applied to a 1000 MW unit boiler process. The boiler process for monitoring experiment contains three control systems: main steam temperature control system, main steam pressure control system, and feedwater control system. This process mainly consists of 33 consecutive process measurements variables, including 14 temperature signals, 9 pressure signals, 9 flow signals, and 1 power signal. All the 33 process variables are used for fault detection in this experiment. The normal operation history data of the boiler process is used as training data set which has 500 samples. For the fault data, the corresponding testing data set consists of 500 observations. Remarkably, the testing data set begins with normal operation data and the abnormal data is introduced from sample 51. All these process data are sampled every 3 sec.

In order to display the monitoring performance of MCM-GPNMF-based method, a bias fault in the main steam temperature is taken into account. The magnitude of the fault is 1% of the actual measured value. Next, we consider the condition that 10% of the measurement values in testing matrix are missing randomly. The testing matrix contains 33 variables and each variable has 500 samples. In other words, 1650 values of testing matrix are missing. The monitoring result of bias fault when 10% of testing data are missing is shown in Figure 2.

(a)

(b)

(c)

(d)

(e)

(f)

Figures 2(a) and 2(d) show the monitoring result based on the GPNMF algorithm when the testing data set is complete. Figures 2(b) and 2(e) represent the detection result when the same method is used and 10% of the measurement values in testing matrix are missing. Figures 2(c) and 2(f) represent the monitoring result of MCM-GPNMF-based algorithm when 10% of the measurement values in testing matrix are missing. As shown in Figure 2, the GPNMF-based monitoring algorithm has an excellent monitoring performance, if all the process data is complete. However, the monitoring performance of the method has a serious decline when the testing data set contains missing data. When the fault occurs, the localized features of missing values will have a little change. So it is a huge challenge for statistic of GPNMF, which is shown in Figure 2. Many fault samples are error-detected by statistic. For the statistic, although it can detect almost all of the faulty samples, many normal samples are error-detected. On the contrary, the statistic and statistic of MCM-GPNMF-based method can detect almost all the faulty samples (over 98% actually). Meanwhile, the MCM-GPNMF-based method has the lower false alarm rate of statistic than GPNMF, when the testing matrix is incomplete. However, in all three cases, it should be mentioned that several normal samples are error-detected, but the monitoring result of normal samples is still acceptable. In summary, the presented MCM-GPNMF-based monitoring method can deal with the fault detection problem, which contains incomplete data.

5.2. Fault Isolation

Based on the monitoring result of the previous section, in this part, we will use the contribution plot based on statistic and statistic to identify the variable that most likely leads to the fault. Due to the fact that the accuracy of fault variable identification is closely related to the accuracy of fault detection, it is meaningful to calculate the contribution value of each variable after the system fault is detected accurately. The most obvious change when bias fault occurs is that the value of main steam temperature suddenly changes by the reason of a step temperature change at itself. The main steam temperature is the 13th variable of the chosen boiler process. Both of the contribution plots corresponding to statistic and statistic, shown in Figure 3, identify this variable accurately. It could illustrate the effectiveness of the presented algorithm.

6. Conclusions

In recent years, the NMF-based algorithm, which is still under development, has shown great potential in the field of industrial process fault detection. However, there are still some weaknesses for these methods in dealing with the problem of missing data. This paper proposed a new variant of NMF named generalized projection nonnegative matrix factorization which combines with the second-order Markov chain model for the situation that the process data contain missing data, which significantly improves the detection rate of industrial process. Meanwhile, two types of monitoring indices, statistic and statistic, and the corresponding contribution plots were designed, respectively. As a result, the simulation in a 1000 MW unit boiler process showed that the proposed method yielded better results for fault detection and fault isolation. In other words, the monitoring method based on MCM-GPNMF proposed in this work is valuable for research and application.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

References

M. Kano and Y. Nakagawa, “Data-based process monitoring, process control, and quality improvement: recent developments and applications in steel industry,” Computers & Chemical Engineering, vol. 32, no. 1, pp. 12–24, 2008.
View at: Publisher Site | Google Scholar
S. J. Qin, “Survey on data-driven industrial process monitoring and diagnosis,” Annual Reviews in Control, vol. 36, no. 2, pp. 220–234, 2012.
View at: Publisher Site | Google Scholar
P. R. C. Nelson, J. F. MacGregor, and P. A. Taylor, “The impact of missing measurements on PCA and PLS prediction and monitoring applications,” Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 1, pp. 1–12, 2006.
View at: Publisher Site | Google Scholar
K. Lakshminarayan, S. A. Harp, and T. Samad, “Imputation of missing data in industrial databases,” Applied Intelligence, vol. 11, no. 3, pp. 259–275, 1999.
View at: Publisher Site | Google Scholar
A. A. Nudel'man, The Markov Moment Problem And Extremal Problems: Ideas And Problems of PL Čebyšev And AA Markov And Their Further Development, American Mathematical Society, 1977.
L. E. Baum and J. A. Eagon, “An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology,” Bulletin of the American Mathematical Society, vol. 73, pp. 360–363, 1967.
View at: Publisher Site | Google Scholar | MathSciNet
I.-K. Jeon and K.-W. Lee, “A dynamic Markov chain prediction model for delay-tolerant networks,” International Journal of Distributed Sensor Networks, vol. 12, no. 9, pp. 1–7, 2016.
View at: Publisher Site | Google Scholar
M. Yoder, A. S. Hering, W. C. Navidi, and K. Larson, “Short-term forecasting of categorical changes in wind power with Markov chain models,” Wind Energy, vol. 17, no. 9, pp. 1425–1439, 2014.
View at: Publisher Site | Google Scholar
Z. Li, X. Wu, and H. Peng, “Nonnegative Matrix Factorization on Orthogonal Subspace,” Pattern Recognition Letters, vol. 31, no. 9, pp. 905–911, 2010.
View at: Publisher Site | Google Scholar
D. D. Lee and H. S. Seung, “Learning the parts of objects by non-negative matrix factorization,” Nature, vol. 401, no. 6755, pp. 788–791, 1999.
View at: Publisher Site | Google Scholar
D. D. Lee and H. S. Seung, Algorithms for Non-Negative Matrix Factorization, Advances in Neural Information Processing Systems, 2001.
G. D'Amico, F. Petroni, and F. Prattico, “First and second order semi-Markov chains for wind speed modeling,” Physica A. Statistical Mechanics and its Applications, vol. 392, no. 5, pp. 1194–1201, 2013.
View at: Publisher Site | Google Scholar | MathSciNet
A. E. Raftery, “A model for high-order Markov chains,” Journal of the Royal Statistical Society, Series B: Methodological, pp. 528–539, 1985.
View at: Google Scholar
D. Cai, X. He, J. Han, and T. S. Huang, “Graph regularized nonnegative matrix factorization for data representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1548–1560, 2011.
View at: Publisher Site | Google Scholar
H. Lee, J. Yoo, and S. Choi, “Semi-supervised nonnegative matrix factorization,” IEEE Signal Processing Letters, vol. 17, no. 1, pp. 4–7, 2010.
View at: Publisher Site | Google Scholar
Z. Yuan and E. Oja, “Projective nonnegative matrix factorization for image compression and feature extraction,” in Proceedings of the Scandinavian Conference on Image Analysis, Springer, Berlin, Germany, 2005.
View at: Google Scholar
A. Tikhonov, “Solution of incorrectly formulated problems and the regularization method,” Soviet Mathematics Doklady, vol. 5, 1963.
View at: Google Scholar
A. N. Langville et al., “Algorithms, initializations, and convergence for the nonnegative matrix factorization,” https://arxiv.org/abs/1407.7299.
View at: Google Scholar
C. F. Alcala and S. J. Qin, “Reconstruction-based contribution for process monitoring,” Automatica. A Journal of IFAC, the International Federation of Automatic Control, vol. 45, no. 7, pp. 1593–1600, 2009.
View at: Publisher Site | Google Scholar | MathSciNet

Copyright

Copyright © 2017 Niu Yuguang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Views

1939

Downloads

743

Citations