Shock and Vibration

Volume 2018, Article ID 5794513, 16 pages

https://doi.org/10.1155/2018/5794513

## Nonlinear Model for Condition Monitoring and Fault Detection Based on Nonlocal Kernel Orthogonal Preserving Embedding

Department of Weaponry Engineering, Naval University of Engineering, Wuhan 430000, China

Correspondence should be addressed to Fuqing Tian; tianqf001@126.com

Received 30 November 2017; Revised 28 April 2018; Accepted 5 May 2018; Published 11 June 2018

Academic Editor: Juan C. G. Prada

Copyright © 2018 Bo She et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Dimension reduction methods have proved powerful and practical for extracting latent features from signals for process monitoring. A linear dimension reduction method called nonlocal orthogonal preserving embedding (NLOPE) and its nonlinear form, nonlocal kernel orthogonal preserving embedding (NLKOPE), are proposed and applied to condition monitoring and fault detection. Different from kernel orthogonal neighborhood preserving embedding (KONPE) and kernel principal component analysis (KPCA), the NLOPE and NLKOPE models aim at preserving the global and local data structures simultaneously by constructing a dual-objective optimization function. In order to adjust the trade-off between the global and local data structures, a weighted parameter is introduced to balance the objective function. NLKOPE combines the advantages of both KONPE and KPCA and is also more powerful than NLOPE in extracting potentially useful features from nonlinear data sets. For the purpose of condition monitoring and fault detection, monitoring statistics are constructed in the feature space. Finally, three case studies on gearbox and bearing test rigs are carried out to demonstrate the effectiveness of the proposed nonlinear fault detection method.

#### 1. Introduction

Mechanical equipment is widely used in modern industrial production, but it often suffers damage during long-term operation, such as the fracture of bearings and broken gear teeth; defects in these parts may degrade the performance of the machine or even cause safety accidents. Therefore, the fault detection of mechanical equipment is of great significance for ensuring the safety and economic benefits of the industrial production process. In recent years, multivariate statistical process monitoring (MSPM) techniques have been developed and used to detect faults in industrial production processes, such as principal component analysis (PCA) [1], partial least squares (PLS) [2], and independent component analysis (ICA) [3]. These classical monitoring methods perform dimension reduction on the process data and extract a few components to construct monitoring statistics that reflect the characteristics of the original data; consequently, the performance of the dimension reduction affects the monitoring result.

The multivariate data-driven statistical PCA-based monitoring framework is the most frequently employed method in the condition monitoring and fault detection field. To overcome the weakness that linear monitoring methods may perform poorly on nonlinear monitoring processes, KPCA-based monitoring methods have been widely investigated and used to detect faults [4, 5]. Although the improved PCA-based monitoring methods can retain latent features of the raw data, they capture only the global structure of the data, while the local structure characteristics are ignored. However, the features extracted from the local structure of the data can also represent different aspects of the data, and the loss of this important information may degrade the dimension reduction and monitoring results [6].

As opposed to the global structure preserving dimension reduction techniques, manifold learning methods have been developed to preserve the local data structure characteristics, represented by Laplacian eigenmap (LE) [7], locality preserving projections (LPP) [8], locally linear embedding (LLE) [9], and neighborhood preserving embedding (NPE) [10]. LPP and NPE are both linear projection methods that can process testing data conveniently, and manifold learning based monitoring methods can overcome some limits of the PCA-based monitoring method. However, these manifold learning methods consider only neighborhood relationships to preserve local properties among samples and thus may lose crucial information contained in the global data structure. In order to take both global and local data structure characteristics into account, methods that unify LPP and PCA have been proposed, and their fault detection performance has proven better than that of LPP or PCA alone [11, 12]. But these approaches are still linear; when they are employed to process nonlinear process data, they have limitations and may deliver poor monitoring performance.

On the other hand, kernel functions are commonly used to extend linear methods to nonlinear ones by mapping the original data from the input space into a high dimensional feature space and then performing the linear method there. For the purpose of taking full advantage of the global and local data structures and processing nonlinear monitoring problems efficiently, a kernel global-local preserving projections (KGLPP) method [13] based on KLPP and KPCA has been proposed, and the results show that it outperforms the linear global-local preserving projections (GLPP) method [14]. Orthogonal neighborhood preserving embedding (ONPE) is an orthogonal form of the conventional NPE algorithm, which adds an additional orthogonality constraint on the projection vectors [15]; thus, ONPE not only inherits the local structure preserving property, but also avoids the distortion defects of NPE [16]. Moreover, the orthogonality property is also an advantage for fault detection and fault diagnosis. Firstly, orthogonal transformations can enhance the locality preserving power, which is effective in data reconstruction and in computing the reconstruction error; this is useful for fault detection. Secondly, dimension reduction methods with an orthogonality constraint can improve identification performance, which helps to detect faults effectively [17].

In this paper, a new nonlinear dimension reduction method named nonlocal kernel orthogonal preserving embedding (NLKOPE) is proposed on the basis of a linear dimension reduction method named nonlocal orthogonal preserving embedding (NLOPE). NLOPE combines the advantages of ONPE and PCA, and NLKOPE is its nonlinear extension. The exponentially weighted moving average (EWMA) statistic is built for condition monitoring and fault detection. To verify the effectiveness of the proposed methods, they are employed to detect gearbox faults and to evaluate the performance degradation of a bearing. In order to diagnose the fault type of the bearing, the dual-tree complex wavelet packet transform (DTCWPT) is used for noise reduction, and the Hilbert transform envelope algorithm is employed to extract the fault characteristic frequency.

The rest of the paper is organized as follows. KPCA, ONPE, and KONPE are reviewed and analyzed in Section 2. The proposed NLOPE-based monitoring method is developed in Section 3. The proposed NLKOPE-based monitoring method is developed in Section 4. In Section 5, three cases are used to demonstrate the effectiveness of the proposed methods. Finally, conclusions are drawn in Section 6.

#### 2. Background Techniques

##### 2.1. Kernel Principal Component Analysis

As a multivariate method, PCA is widely used for process monitoring. However, for complicated industrial processes with nonlinear characteristics, PCA performs poorly because it treats the process data as linear, and useful nonlinear features may be lost when PCA is used to reduce the dimension and extract features. KPCA performs a nonlinear PCA that constructs a nonlinear mapping from the input space to the feature space through the kernel function. Given a data set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{m \times N}$, where $N$ is the number of samples and $m$ is the number of variables, the samples in the input space are extended into the feature space by using a nonlinear mapping $\phi(\cdot)$; the covariance matrix in the feature space can be expressed as

$$C^F = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i)\,\phi(x_i)^T, \quad (1)$$

where it is assumed that the data set in the feature space is centered, $\sum_{i=1}^{N} \phi(x_i) = 0$. The principal components can be calculated by solving the eigenvalue problem in the feature space

$$\lambda v = C^F v, \quad (2)$$

where $\langle x, y \rangle$ denotes the dot product between $x$ and $y$, $\lambda$ denotes an eigenvalue, and $v$ denotes the corresponding eigenvector. For $\lambda \neq 0$, the eigenvector $v$ can be regarded as a linear combination of $\phi(x_1), \ldots, \phi(x_N)$ with the coefficients $\alpha_i$:

$$v = \sum_{i=1}^{N} \alpha_i\, \phi(x_i). \quad (3)$$

Multiplying $\phi(x_k)$ at both sides of (2), $k = 1, 2, \ldots, N$, we obtain

$$\lambda \sum_{i=1}^{N} \alpha_i \left\langle \phi(x_k), \phi(x_i) \right\rangle = \frac{1}{N} \sum_{i=1}^{N} \alpha_i \sum_{j=1}^{N} \left\langle \phi(x_k), \phi(x_j) \right\rangle \left\langle \phi(x_j), \phi(x_i) \right\rangle. \quad (4)$$

Defining a kernel matrix $K \in \mathbb{R}^{N \times N}$ with $K_{ij} = \left\langle \phi(x_i), \phi(x_j) \right\rangle$, (4) can be expressed as

$$K \alpha = \tilde{\lambda} \alpha, \quad (5)$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$ and $\tilde{\lambda} = N\lambda$. Solving the eigenvalue problem of (5) yields eigenvectors $\alpha^1, \alpha^2, \ldots, \alpha^N$ with the eigenvalues $\tilde{\lambda}_1 \geq \tilde{\lambda}_2 \geq \cdots \geq \tilde{\lambda}_N$. The coefficients $\alpha^k$ are normalized to satisfy $\tilde{\lambda}_k \left\langle \alpha^k, \alpha^k \right\rangle = 1$ to ensure $\left\langle v^k, v^k \right\rangle = 1$. The projection of a test sample $x$ onto the $k$th principal component is obtained as follows:

$$t_k = \left\langle v^k, \phi(x) \right\rangle = \sum_{i=1}^{N} \alpha_i^k \left\langle \phi(x_i), \phi(x) \right\rangle. \quad (6)$$
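The fitting and projection steps above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation; the Gaussian kernel and its width `gamma` are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kpca_fit(X, n_components, gamma=1.0):
    """Center the kernel matrix, solve Kc @ alpha = lam * alpha, and scale
    each alpha so the corresponding feature-space eigenvector has unit norm."""
    N = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one
    lam, A = np.linalg.eigh(Kc)                        # ascending order
    lam, A = lam[::-1][:n_components], A[:, ::-1][:, :n_components]
    A = A / np.sqrt(lam)                               # lam_k * <alpha_k, alpha_k> = 1
    return {"X": X, "K": K, "one": one, "A": A, "gamma": gamma}

def kpca_transform(model, Xt):
    """Scores of test samples: t_k = sum_i alpha_ik * k(x_i, x_t), with centering."""
    Kt = rbf_kernel(Xt, model["X"], model["gamma"])
    N = model["X"].shape[0]
    onet = np.full((Xt.shape[0], N), 1.0 / N)
    Ktc = Kt - onet @ model["K"] - Kt @ model["one"] + onet @ model["K"] @ model["one"]
    return Ktc @ model["A"]
```

Applying `kpca_transform` to the training set itself reproduces the training scores, and the score columns come out mutually orthogonal, as the eigendecomposition guarantees.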

##### 2.2. Kernel Orthogonal Neighborhood Preserving Embedding

###### 2.2.1. Orthogonal Neighborhood Preserving Embedding

Given a high dimensional data set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{m \times N}$, as a linear dimension reduction method, ONPE is used to find a transformation matrix $A$ that maps the high dimensional data set $X$ to the low dimensional data set $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{d \times N}$ ($d < m$), that is, $y_i = A^T x_i$. In the NPE algorithm, in order to preserve the local geometric structure, an adjacency graph is built to reflect the relationship between samples; each sample can be reconstructed by a linear combination of its neighbors with the corresponding weight coefficients. The weight coefficient matrix $W$ is computed by minimizing the following objective function:

$$\min_{W} \sum_{i=1}^{N} \left\| x_i - \sum_{j=1}^{N} w_{ij}\, x_j \right\|^2, \quad \text{s.t. } \sum_{j=1}^{N} w_{ij} = 1. \quad (7)$$

The reconstruction weight coefficients $w_{ij}$ between $x_i$ and its neighbors are preserved so that $y_i$ is reconstructed by its corresponding neighbors. Then, the low dimensional embedding is obtained by optimizing the error function

$$\min_{Y} \sum_{i=1}^{N} \left\| y_i - \sum_{j=1}^{N} w_{ij}\, y_j \right\|^2 = \min \operatorname{tr}\!\left( Y M Y^T \right), \quad (8)$$

where $y_i = A^T x_i$, $M = (I - W)^T (I - W)$, and $k$ is the number of neighbors of $x_i$. If $x_j$ is not a neighbor of $x_i$, $w_{ij} = 0$. In ONPE, any high dimensional sample can be mapped into the reduced space by the orthogonal projection matrix $A = [a_1, a_2, \ldots, a_d]$; according to (7)-(8), the matrix $A$ is obtained by the following formulation:

$$\min_{A} \operatorname{tr}\!\left( A^T X M X^T A \right), \quad \text{s.t. } a_i^T a_j = \delta_{ij}. \quad (9)$$

Using the Lagrange multiplier method, the orthogonal vectors $a_1, \ldots, a_d$ are calculated iteratively as follows:

(1) $a_1$ is the eigenvector corresponding to the smallest eigenvalue of matrix $(X X^T)^{-1} X M X^T$.

(2) $a_k$ is the eigenvector corresponding to the smallest eigenvalue of matrix

$$M^{(k)} = \left[ I - (X X^T)^{-1} A^{(k-1)} \left( A^{(k-1)T} (X X^T)^{-1} A^{(k-1)} \right)^{-1} A^{(k-1)T} \right] (X X^T)^{-1} X M X^T, \quad (10)$$

where $A^{(k-1)} = [a_1, a_2, \ldots, a_{k-1}]$.
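The weight matrix $W$ of (7) can be computed per sample by solving a small regularized linear system over the local Gram matrix and rescaling so the weights sum to one. A sketch follows; the neighbor count `k` and the regularization strength are assumptions, not values prescribed by the text.

```python
import numpy as np

def npe_weights(X, k=5, reg=1e-3):
    """Row i of W holds the weights that best reconstruct sample x_i (a row
    of X) from its k nearest neighbors, with the sum-to-one constraint
    enforced by rescaling the solved weights."""
    N = X.shape[0]
    dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist2[i])[1:k + 1]   # skip the sample itself
        Z = X[nbrs] - X[i]                     # neighbors shifted to the origin
        G = Z @ Z.T                            # local Gram matrix
        G = G + reg * (np.trace(G) + 1e-12) * np.eye(k)  # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()               # enforce sum_j w_ij = 1
    return W
```

The regularization keeps the local Gram matrix invertible when the number of neighbors exceeds the local intrinsic dimension, which is the usual failure mode of the unregularized system.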

###### 2.2.2. Kernel Orthogonal Neighborhood Preserving Embedding

The ONPE algorithm has some ability to process nonlinear data, but it is essentially a linear method, and its limitations become obvious when it comes to extracting the nonlinear features of data. KONPE is a nonlinear extension of ONPE: the original data are mapped into the kernel space, where it becomes feasible to use ONPE to obtain the low dimensional intrinsic characteristics.

Given a data set $X = [x_1, x_2, \ldots, x_N]$, the data are projected onto the high dimensional feature space by using the nonlinear mapping function $\phi(\cdot)$, assuming that the mapped data are centered and satisfy $\sum_{i=1}^{N} \phi(x_i) = 0$. Defining $A_\phi$ to be the linear mapping matrix that projects the data from the feature space to the low dimensional space, the low dimensional embeddings are obtained by using the matrix $A_\phi$, that is,

$$y_i = A_\phi^T\, \phi(x_i). \quad (12)$$

The matrix $A_\phi$ can be expressed as a linear combination of $\phi(x_1), \ldots, \phi(x_N)$ with the coefficient matrix $P = [p_1, p_2, \ldots, p_d]$:

$$A_\phi = \Phi P, \quad (13)$$

where $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]$. According to (8), (12), and (13), the formulations for calculating the matrix $P$ are as follows:

$$\min_{P} \operatorname{tr}\!\left( P^T K M K P \right), \quad (14)$$

$$\text{s.t. } A_\phi^T A_\phi = P^T K P = I, \quad (15)$$

where $K = \Phi^T \Phi$ is the kernel matrix and $M = (I - W)^T (I - W)$. The problem of computing $A_\phi$ can be converted into solving for the coefficient vectors $p_k$ based on the Lagrange multiplier method, which yields the eigenvalue problem

$$K M K p = \lambda K p \quad (16)$$

for the first vector, and, after adding the orthogonality constraints

$$p_k^T K p_1 = p_k^T K p_2 = \cdots = p_k^T K p_{k-1} = 0, \quad (17)$$

a deflated eigenvalue problem

$$M^{(k)} p = \lambda p \quad (18)$$

for the subsequent vectors. According to (16) and (18), the specific steps are as follows:

(1) $p_1$ is the eigenvector corresponding to the smallest eigenvalue of matrix $K^{-1} K M K$.

(2) $p_k$ is the eigenvector corresponding to the smallest eigenvalue of matrix

$$M^{(k)} = \left[ I - K^{-1} P^{(k-1)} \left( B^{(k-1)} \right)^{-1} P^{(k-1)T} \right] K^{-1} K M K, \quad (19)$$

$$B^{(k-1)} = P^{(k-1)T} K^{-1} P^{(k-1)}, \quad (20)$$

where $P^{(k-1)} = [p_1, p_2, \ldots, p_{k-1}]$. In practice, the kernel matrix $K$ should be centered by

$$\bar{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N, \quad (21)$$

where $\mathbf{1}_N$ denotes the $N \times N$ matrix whose elements are all equal to $1/N$.

Given a test sample $x_t$, the low dimensional sample $y_t$ is obtained by mapping $x_t$ into the vector $\phi(x_t)$ in the feature space and projecting it:

$$y_t = A_\phi^T\, \phi(x_t) = P^T \bar{k}_t, \quad (22)$$

where $\bar{k}_t$ is the centered test kernel vector of $x_t$.

The test kernel vector should also be centered as follows:

$$\bar{k}_t = k_t - \mathbf{1}_N k_t - K \mathbf{1}_t + \mathbf{1}_N K \mathbf{1}_t, \quad (23)$$

where $k_t = [k(x_1, x_t), k(x_2, x_t), \ldots, k(x_N, x_t)]^T$, $\mathbf{1}_N$ is the $N \times N$ matrix with all elements $1/N$, and $\mathbf{1}_t = \frac{1}{N}[1, 1, \ldots, 1]^T \in \mathbb{R}^{N}$.

#### 3. Nonlocal Orthogonal Preserving Embedding

##### 3.1. Algorithm Description

In order to preserve both the local and global data structures, the NLOPE algorithm is proposed to unify the advantages of PCA and ONPE. Given a data set $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{m \times N}$, the objective function of NLOPE is as follows:

$$\min_{a} \; a^T \left[ (1 - \eta)\, X M X^T - \eta\, X X^T \right] a, \quad \text{s.t. } a^T a = 1, \quad (24)$$

where $M = (I - W)^T (I - W)$ is defined as in (8) and $\eta \in [0, 1]$ is a weighted parameter that adjusts the trade-off between the global term $X X^T$ (the PCA part) and the local term $X M X^T$ (the ONPE part).

Using the Lagrange multiplier method, the projection vectors can be calculated by solving the following eigenvector problems:

(1) $a_1$ is the eigenvector corresponding to the smallest eigenvalue of matrix

$$G = (1 - \eta)\, X M X^T - \eta\, X X^T. \quad (25)$$

(2) $a_k$ ($2 \leq k \leq d$) is the eigenvector corresponding to the smallest eigenvalue of matrix

$$M^{(k)} = \left[ I - A^{(k-1)} \left( A^{(k-1)T} A^{(k-1)} \right)^{-1} A^{(k-1)T} \right] G, \quad (26)$$

where $A^{(k-1)} = [a_1, a_2, \ldots, a_{k-1}]$, $d$ is the dimension of samples in the NLOPE space, and the projection matrix is $A = [a_1, a_2, \ldots, a_d]$. A strict mathematical proof of the projection vectors is given in Appendix A.
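The iterative extraction of mutually orthogonal minimizing directions can be implemented generically for any symmetric objective matrix `G` by restricting each eigenproblem to the orthogonal complement of the vectors already found. This is a sketch of the idea, not the paper's exact deflated-matrix formula:

```python
import numpy as np

def orthogonal_min_eigvectors(G, d):
    """Extract d orthonormal vectors a_1..a_d, each minimizing a^T G a over
    the orthogonal complement of the previously found vectors."""
    m = G.shape[0]
    A = np.zeros((m, 0))
    for _ in range(d):
        # Orthonormal basis Q of the complement of span(A)
        U, s, _ = np.linalg.svd(np.eye(m) - A @ A.T)
        Q = U[:, s > 0.5]
        # Smallest eigenvector of G restricted to that complement
        w, V = np.linalg.eigh(Q.T @ G @ Q)      # eigenvalues in ascending order
        a = Q @ V[:, 0]
        A = np.hstack([A, a[:, None]])
    return A
```

For a diagonal `G`, the routine recovers the coordinate axes in order of increasing diagonal entry, which is a convenient sanity check of the deflation.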

##### 3.2. Selection of Parameter η

The parameter $\eta$ describes the different roles of global and local data structure preserving in constructing the NLOPE model; it is important to choose an appropriate value of $\eta$, as it affects the extraction of latent variables. Since we need to solve a dual-objective optimization problem, it is usually hard to find an absolutely optimal solution that simultaneously optimizes the two subobjectives. However, it is possible to obtain a relatively optimal solution by balancing them. The parameter $\eta$ is used to balance the matrices $X X^T$ and $X M X^T$ in (24), and it can be regarded as balancing their energy variations. Thus, we choose the spectral radius of each matrix to estimate the value of $\eta$.

To balance the global and local structures of the data, $\eta$ can be selected as follows [6]:

$$\eta = \frac{\varsigma_L}{\varsigma_G + \varsigma_L}, \quad (27)$$

where $\varsigma_G$ and $\varsigma_L$ denote the energy variations of $X X^T$ and $X M X^T$, estimated by the spectral radius $\rho(\cdot)$ of each matrix; $X X^T$ and $X M X^T$ are defined in (24). Thus, $\eta$ is computed by

$$\eta = \frac{\rho(X M X^T)}{\rho(X X^T) + \rho(X M X^T)}. \quad (28)$$
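A sketch of the spectral-radius balancing rule, under the assumption that $\eta$ equalizes the weighted energies of the two terms, i.e., $\eta = \rho(X M X^T) / (\rho(X X^T) + \rho(X M X^T))$:

```python
import numpy as np

def balance_weight(X, M):
    """Trade-off parameter eta from the spectral radii of the global scatter
    X X^T and the local scatter X M X^T; X has one column per sample."""
    G = X @ X.T
    L = X @ M @ X.T
    rho_g = np.linalg.eigvalsh(G).max()   # both matrices are symmetric PSD,
    rho_l = np.linalg.eigvalsh(L).max()   # so the largest eigenvalue is the radius
    return rho_l / (rho_g + rho_l)
```

Because both scatter matrices are positive semidefinite, the result always lies in $[0, 1]$, as the trade-off interpretation requires.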

##### 3.3. Monitoring Model

In the PCA-based monitoring method, Hotelling's $T^2$ statistic and the squared prediction error (SPE) statistic are often used for fault detection. Similarly, the two monitoring statistics are applied in the NLOPE-based model. Hotelling's $T^2$ is used to measure the variation in the latent variable space and flags a new sample when the variation in the latent variables is greater than the variation explained by the model; it is computed as

$$T^2 = y^T \Lambda^{-1} y, \quad (29)$$

where $y = A^T x$ and $\Lambda$ is the covariance matrix of the projection vectors of the training samples in the NLOPE subspace.

The SPE statistic is a measurement of the variation in the residual space and is used to measure the goodness of fit of a new sample to the model; it is defined as follows [20]:

$$\mathrm{SPE} = \left\| x - A y \right\|^2 = \left\| x - A A^T x \right\|^2, \quad (30)$$

where $y = A^T x$ is the embedding of the input sample $x$ in the NLOPE subspace.
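The two monitoring statistics reduce to a quadratic form and a residual norm. A minimal sketch, assuming `Lambda_inv` has been estimated from the training scores and `A` has orthonormal columns:

```python
import numpy as np

def t2_statistic(y, Lambda_inv):
    """Hotelling's T^2 = y^T Lambda^{-1} y for a score vector y."""
    return float(y @ Lambda_inv @ y)

def spe_statistic(x, A):
    """SPE = ||x - A A^T x||^2 for an orthonormal projection matrix A."""
    r = x - A @ (A.T @ x)
    return float(r @ r)
```

For example, projecting onto the first axis of a 2-D space leaves the second coordinate as the residual, so `spe_statistic([3, 4], e1)` returns 16.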

As it is hard to estimate the condition of a machine from the raw vibration signal alone, some features need to be constructed. Time-domain and frequency-domain features can be generated from vibration data and are widely used to characterize the state of machinery: time-domain features such as kurtosis, crest factor, and impulse factor are sensitive to impulsive oscillation, while frequency-domain features can reveal information that cannot be characterized in the time domain. In this study, 11 time-domain features and 13 frequency-domain features [21] were extracted from each sample to construct the high dimensional feature sample. For the purpose of condition monitoring and fault detection, it is critical to extract the most useful information hidden in the current machine state; therefore, the dimension reduction methods can be employed to extract latent features effectively.
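A few of the time-domain features named above can be computed directly from a vibration frame. This sketch covers only a representative subset of the paper's 11 time-domain features:

```python
import numpy as np

def time_domain_features(x):
    """Kurtosis, crest factor, and impulse factor of a vibration frame."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    return {
        "rms": rms,
        "kurtosis": np.mean((x - x.mean()) ** 4) / np.var(x) ** 2,
        "crest_factor": peak / rms,
        "impulse_factor": peak / np.mean(np.abs(x)),
    }
```

As a sanity check, a pure sine wave has a crest factor of √2 and a kurtosis of 1.5, while an impulsive signal drives both well above those baselines.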

In order to detect the incipient faults of mechanical equipment more accurately and reliably, the exponentially weighted moving average (EWMA) statistic based on a combined index of the $T^2$ and SPE statistics is developed to detect the faults of the mechanical equipment. The combined index $\varphi$ is a summation of the $T^2$ and SPE statistics as follows:

$$\varphi = \frac{T^2}{T_{\lim}^2} + \frac{\mathrm{SPE}}{\mathrm{SPE}_{\lim}}, \quad (31)$$

where $T_{\lim}^2$ and $\mathrm{SPE}_{\lim}$ are the control limits of the $T^2$ and SPE statistics, computed by the kernel density estimation (KDE) algorithm. The values of the $T^2$ statistic should be normalized between 0 and 1 by using the maximal and minimal values of $T^2$, and the values of the SPE statistic should be normalized in the same way.
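The KDE-based control limits can be obtained by smoothing the training statistic and reading off a high quantile. This is a sketch using a Gaussian kernel with Silverman's bandwidth; the 99% confidence level is an assumption, not a value stated in the text:

```python
import numpy as np

def kde_control_limit(stat, alpha=0.99, grid=2048):
    """Control limit = alpha-quantile of the KDE-smoothed distribution of a
    monitoring statistic computed on healthy training data."""
    stat = np.asarray(stat, dtype=float)
    n = stat.size
    h = 1.06 * stat.std() * n ** (-0.2)            # Silverman's rule of thumb
    xs = np.linspace(stat.min() - 3 * h, stat.max() + 3 * h, grid)
    # Gaussian-kernel density evaluated on the grid (constants cancel below)
    pdf = np.exp(-0.5 * ((xs[:, None] - stat[None, :]) / h) ** 2).sum(axis=1)
    cdf = np.cumsum(pdf)
    return float(xs[np.searchsorted(cdf, alpha * cdf[-1])])
```

On a healthy reference set, roughly $1 - \alpha$ of the training samples should fall above the returned limit, which is the usual check that the estimate is sensible.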

The EWMA statistic is computed as follows:

$$z_i = \theta \varphi_i + (1 - \theta)\, z_{i-1}, \quad (32)$$

where $z_0$ is calculated as the average of the preliminary data and $\theta$ is a smoothing constant between 0 and 1. When $\theta$ is large, the value of $z_i$ puts more weight on the current statistic than on the historic statistics. The control limit $z_{\lim}$ for the EWMA statistic is also calculated by the KDE method. In this study, the value of $\theta$ is set to 0.2.
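The combined index and its EWMA filter are only a few lines. A sketch, assuming the index is normalized by dividing by the control limits and $z_0$ defaults to the mean of the preliminary data:

```python
import numpy as np

def combined_index(t2, spe, t2_lim, spe_lim):
    """phi = T^2 / T^2_lim + SPE / SPE_lim, elementwise over a batch."""
    return np.asarray(t2) / t2_lim + np.asarray(spe) / spe_lim

def ewma(phi, theta=0.2, z0=None):
    """z_i = theta * phi_i + (1 - theta) * z_{i-1}."""
    phi = np.asarray(phi, dtype=float)
    z = np.empty_like(phi)
    prev = phi.mean() if z0 is None else z0
    for i, p in enumerate(phi):
        prev = theta * p + (1 - theta) * prev
        z[i] = prev
    return z
```

A constant input passes through the filter unchanged, while an isolated spike is damped by the factor $\theta$, which is what makes the EWMA robust to noise when detecting incipient faults.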

The offline modeling procedure is listed as follows:

(1) Take the healthy samples as the training samples, convert each original training sample into a high dimensional feature sample, and then normalize the high dimensional feature samples to zero mean and unit variance.

(2) Use (26) to calculate the projection matrix $A$, and calculate the projected vectors of the training samples in the NLOPE subspace.

(3) Compute the $T^2$ and SPE statistics of all training samples, calculate the control limits $T_{\lim}^2$ and $\mathrm{SPE}_{\lim}$, and then obtain the EWMA statistics and the control limit $z_{\lim}$.

The online monitoring procedure is listed as follows:

(1) Convert each testing sample into a high dimensional feature sample, and normalize it with the mean and variance of the training feature samples.

(2) Calculate the projected vectors of the testing samples as $y_t = A^T x_t$.

(3) Compute the EWMA statistics associated with $\varphi$, and monitor whether they exceed the control limit $z_{\lim}$.

#### 4. Nonlocal Kernel Orthogonal Preserving Embedding

##### 4.1. Algorithm Description

The NLKOPE algorithm performs a nonlinear NLOPE by using the kernel trick. Given a data set $X = [x_1, x_2, \ldots, x_N]$, the nonlinear mapping $\phi(\cdot)$ is used to project $X$ onto the feature space; the data set in the feature space is assumed to be centered, $\sum_{i=1}^{N} \phi(x_i) = 0$. The objective function of the NLKOPE algorithm is computed as follows:

$$\min_{p} \; p^T \left[ (1 - \eta)\, K M K - \eta\, K K \right] p, \quad \text{s.t. } p^T K p = 1. \quad (33)$$

According to (33), computing the feature-space projection matrix $A_\phi = \Phi P$ is reduced to obtaining the coefficient vectors $p_k$; using the Lagrange multiplier method, the formulation is converted as follows:

$$L(p, \lambda) = p^T \left[ (1 - \eta)\, K M K - \eta\, K K \right] p - \lambda \left( p^T K p - 1 \right), \quad (34)$$

$$\left[ (1 - \eta)\, K M K - \eta\, K K \right] p = \lambda K p. \quad (35)$$

The coefficient vectors are obtained as follows:

(1) $p_1$ is the eigenvector corresponding to the smallest eigenvalue of matrix

$$G_\phi = K^{-1} \left[ (1 - \eta)\, K M K - \eta\, K K \right]. \quad (36)$$

(2) $p_k$ ($2 \leq k \leq d$) is the eigenvector corresponding to the smallest eigenvalue of matrix

$$M^{(k)} = \left[ I - K^{-1} P^{(k-1)} \left( P^{(k-1)T} K^{-1} P^{(k-1)} \right)^{-1} P^{(k-1)T} \right] G_\phi, \quad (37)$$

where $P^{(k-1)} = [p_1, p_2, \ldots, p_{k-1}]$, $d$ is the dimension of samples in the NLKOPE space, $M = (I - W)^T (I - W)$, and $K$ is the centered kernel matrix of $X$ obtained by (21). A strict mathematical proof of the coefficient vectors is given in Appendix B.

##### 4.2. Selection of Parameter η

The method used to choose the parameter $\eta$ in the NLKOPE model is the same as in the NLOPE model, while the values of the spectral radius are different; $\eta$ is set as

$$\eta = \frac{\rho(K M K)}{\rho(K K) + \rho(K M K)}, \quad (38)$$

where $\rho(K K)$ and $\rho(K M K)$ denote the spectral radii of the global and local terms in (33).

##### 4.3. Monitoring Model

Hotelling's $T^2$ statistic and the SPE statistic are also used in the NLKOPE-based model to monitor abnormal variations. The $T^2$ statistic is defined as

$$T^2 = y_t^T \Lambda^{-1} y_t, \quad (39)$$

where $\Lambda$ is the covariance matrix of the projection vectors of the training samples.

The SPE statistic is defined as [22]

$$\mathrm{SPE} = \sum_{j=1}^{N} t_j^2 - \sum_{j=1}^{d} t_j^2, \quad (40)$$

where $t_j = p_j^T \bar{k}_t$, and $\bar{k}_t$ is the centered kernel vector of the test sample $x_t$ obtained via (23). The projected vector of the test sample is

$$y_t = P^T \bar{k}_t = [t_1, t_2, \ldots, t_d]^T. \quad (41)$$

The offline modeling procedure is listed as follows:

(1) Take the healthy samples as the training samples, convert each original training sample into a high dimensional feature sample, and then normalize the high dimensional feature samples to zero mean and unit variance.

(2) Compute the kernel matrix $K$ by selecting a kernel function, and center the kernel matrix via (21).

(3) Obtain the projection matrix $P$ by solving the eigenvector problems of (36)-(37).

(4) Compute the $T^2$ and SPE statistics of all training samples, calculate the control limits $T_{\lim}^2$ and $\mathrm{SPE}_{\lim}$, and then obtain the EWMA statistics and the control limit $z_{\lim}$.

The online monitoring procedure is listed as follows:

(1) Convert each testing sample into a high dimensional feature sample, and normalize it with the mean and variance of the training feature samples.

(2) Compute the kernel vector $k_t$ and center it to get $\bar{k}_t$ via (23).

(3) Calculate the projected vectors of the testing samples via (41).

(4) Compute the EWMA statistics associated with $\varphi$, and judge whether they exceed the control limit $z_{\lim}$.

The procedure of condition monitoring and fault detection by the method of NLKOPE is shown in Figure 1. The healthy vibration signals are collected to implement NLKOPE and construct the offline model, and then the model will be employed to implement online condition monitoring, fault detection, and performance degradation assessment.