Journal of Control Science and Engineering

Volume 2018, Article ID 1025353, 9 pages

https://doi.org/10.1155/2018/1025353

## Fault Diagnosis Method Based on Gap Metric Data Preprocessing and Principal Component Analysis

School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China

Correspondence should be addressed to Chenglin Wen; wencl@hdu.edu.cn

Received 22 February 2018; Accepted 11 April 2018; Published 17 May 2018

Academic Editor: Zhijie Zhou

Copyright © 2018 Zihan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Principal component analysis (PCA) is widely used in fault diagnosis. Because traditional data preprocessing ignores the correlation between different variables in the system, feature extraction is inaccurate. To address this problem, this paper proposes a data preprocessing method based on the gap metric to improve the performance of PCA in fault diagnosis. For different types of faults, transforming the original dataset through the gap metric reflects the correlation between the system variables in a high-dimensional space, so that the system is modeled more accurately. Finally, the feasibility and effectiveness of the proposed method are verified through simulation.

#### 1. Introduction

As the complexity of industrial manufacturing systems increases, the correlations between the variables in the system become more complex, and these variables contain important information about the status of the system. Detecting and diagnosing system faults from the information carried by these variables is therefore an important problem.

However, because the variables of an industrial manufacturing system have different dimensions, the data usually must be preprocessed to standardize them. Traditional data preprocessing methods ignore the influence of dimension on the correlation between system variables, so part of this correlation is lost after preprocessing, which makes it difficult to extract representative principal components. Maintaining the correlation between system variables is therefore the key to data preprocessing.

Many studies have addressed this problem. Wen et al. proposed a method called Relative Principal Component Analysis (RPCA) [1]; it uses prior information about the system to analyze and determine the importance of each component, assigns a corresponding weight to each component of the system, and establishes a relative principal component model. Literature [2] proposed a fault diagnosis method based on the information incremental matrix. Building on this, Yuan et al. proposed a fault diagnosis method based on a relative transformation of the information incremental matrix [3], which can effectively detect variables that play an important role in the system. These variables have small absolute values and small absolute changes, yet small changes in them usually play a crucial role in the system. Xu and Wen proposed a fault diagnosis method based on information entropy and relative principal component analysis [4]. In a high-dimensional system, the high correlation between system variables prevents the model from selecting representative principal components. Their approach uses information entropy to measure the uncertainty of the variables and to calculate the information gain of each variable; the data are then relatively transformed according to the importance of each variable to obtain a more accurate data model. Jiao et al. proposed a method for simulation model validation based on Theil's inequality coefficient (TIC) and principal component analysis [5]. Starting from the TIC model, it models the differences in position and in trend between the simulated output and the reference output; because these two differences are correlated, PCA is used to obtain the validation result. Kangling et al. proposed a fault diagnosis method based on adaptive partition PCA [6]. To solve the problem of inaccurate modeling, the diagnosis model is automatically updated and adjusted, which improves both the model matching and the accuracy of the diagnosis results.

The gap metric has been proved to be more suitable than norm-based metrics for measuring the distance between two linear systems [7, 8], and the effect of dimension on each variable can be reflected in the Riemannian space when data preprocessing is performed. The gap metric is widely used in the study of the uncertainty and robustness of feedback systems. Tryphon proposed a method for easily calculating the gap metric [9], which improves its practicality. Literature [10] proposed the concept of the ν-gap metric in the frequency domain, which was then extended to nonlinear systems [11]. Ebadollahi and Saki proposed that, in multimodel predictive control, the gap metric be used to partition the entire partial-load region of the system into corresponding linear models, so that the maximum power point can be tracked without losing control performance. This method ensures the stability of the original closed-loop system [12]. Konghuayrob and Kaitwanidvilai used the ν-gap to measure the distance between two linear systems [13]. They replaced traditional high-order, complex controllers with low-order controllers that have similar dynamic characteristics and robustness. For multilinear model control of nonlinear systems, Du proposed a weighting method based on the gap metric [14], in which the gap metric is used to calculate the weighting functions for combining the local controllers. The validity of the method was verified on a CSTR system.

In PCA based on traditional data preprocessing, the dimension of the original data is removed by preprocessing methods that all rely on the Euclidean metric. As a result, some important information in the data is ignored, and this information often belongs to important variables that carry slowly changing fault information. In the method proposed in this article, the gap metric projects the data onto the Riemann sphere in Riemann space and highlights information that is easily ignored in Euclidean space. After the gap metric data preprocessing is applied, eigenvalue decomposition is performed on the processed data matrix, and the principal component space is constructed according to the cumulative percent variance criterion. Because the gap metric highlights variables whose absolute change is small but whose relative change is large, the principal components and the principal component vectors can be extracted when constructing the principal component space. By calculating the $T^2$ statistical limit and the SPE statistical limit of the normal system dataset, we detect whether the $T^2$ and SPE statistics of the test dataset exceed these limits to judge whether the system is faulty, and the fault variables are then isolated by the contribution of the system variables to the fault samples.

The rest of this article is organized as follows. The second section briefly reviews the PCA approach. The third section proposes an improved data preprocessing method for PCA, based on the gap metric. The fourth section sets up a system model and tests the feasibility and effectiveness of the proposed method on different types of faults. The fifth section gives a summary and directions for future research.

#### 2. PCA Based on Traditional Data Preprocessing

The basic idea of PCA is to use the historical data of the process variables to decompose the multivariable sample space into a lower-dimensional principal component subspace, spanned by the principal component variables, and a residual subspace. Statistics that reflect the changes in each space are constructed in these two subspaces. The sample vectors are then projected onto the two subspaces, and the distance from each sample to the subspace is computed. Process monitoring and fault detection are performed by comparing these distances with the limits of the corresponding statistics.

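First, a PCA model of the variable space is built. Select a set of measurements under normal conditions as the original data. Let $x \in \mathbb{R}^{m}$ be a test sample that contains $m$ variables, where each variable has $n$ independent samples. Construct the original measurement data matrix:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} \in \mathbb{R}^{n \times m}$$

Here, each column of $X$ represents a variable, and each row represents a sample. Because the dimensions of the measured variables are different, each column of the data matrix is normalized. The normalized measurement data matrix is

$$X^{*} = \left( X - \mathbf{1}_{n} \bar{x}^{T} \right) D_{\sigma}^{-1/2}$$

Here, $\mathbf{1}_{n} = [1, 1, \ldots, 1]^{T} \in \mathbb{R}^{n}$, $\bar{x} = [\bar{x}_{1}, \bar{x}_{2}, \ldots, \bar{x}_{m}]^{T}$, $\bar{x}_{j}$ is the average of column $j$, $j = 1, 2, \ldots, m$. $D_{\sigma} = \mathrm{diag}(\sigma_{1}^{2}, \sigma_{2}^{2}, \ldots, \sigma_{m}^{2})$ is the variance matrix of $X$.

The covariance matrix of $X^{*}$ is

$$S = \frac{1}{n-1} X^{*T} X^{*}$$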
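The normalization and covariance steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes each column is centered by its mean and scaled by its sample standard deviation (divisor $n-1$), and the function names and the data matrix are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit sample variance (assumed convention)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)            # column averages
    std = X.std(axis=0, ddof=1)      # sample standard deviations; must be nonzero
    return (X - mean) / std, mean, std

def covariance(X_std):
    """Covariance matrix of already-centered data, using the n - 1 divisor."""
    n = X_std.shape[0]
    return X_std.T @ X_std / (n - 1)

# Hypothetical data: 5 samples (rows) of 3 variables (columns) on very different scales.
X = np.array([
    [1.0, 200.0, 0.010],
    [2.0, 180.0, 0.012],
    [3.0, 260.0, 0.009],
    [4.0, 240.0, 0.015],
    [5.0, 300.0, 0.011],
])
X_std, mean, std = standardize(X)
S = covariance(X_std)
print(np.round(S, 3))
```

Under these assumptions every normalized column has unit variance, so the diagonal of the covariance matrix is 1 and the matrix coincides with the correlation matrix of the original data.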

The covariance matrix $S$ is generally processed by eigenvalue decomposition, with the eigenvalues arranged in descending order, $\lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_{m}$. The PCA model decomposes the normalized data matrix $X^{*}$ as follows:

$$X^{*} = \hat{X} + E = T P^{T} + E, \qquad T = X^{*} P$$

where $\hat{X}$ is the projection in the principal component space, $E$ is the projection in the residual space, and $P \in \mathbb{R}^{m \times k}$ is the load matrix, which consists of the first $k$ eigenvectors of $S$. $T \in \mathbb{R}^{n \times k}$ is the scoring matrix, the elements of $T$ are called the principal variables, and $k$ is the number of principal components. The principal component space is the modeled part, and the residual space is the unmodeled part, which represents the noise and fault information in the data.
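The decomposition into a principal component part and a residual part can be illustrated as follows. This is a sketch under stated assumptions: the data are already normalized, the covariance uses the $n-1$ divisor, and the function name `pca_decompose`, the random test data, and the choice of two components are hypothetical.

```python
import numpy as np

def pca_decompose(X_std, k):
    """Split normalized data into a principal component part and a residual part."""
    n = X_std.shape[0]
    S = X_std.T @ X_std / (n - 1)              # covariance of normalized data
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :k]                         # load matrix: first k eigenvectors
    T = X_std @ P                              # scoring matrix
    X_hat = T @ P.T                            # projection in principal component space
    E = X_std - X_hat                          # projection in residual space
    return eigvals, P, T, X_hat, E

# Hypothetical data: 50 samples of 4 variables, two of them strongly correlated.
rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 4))
raw[:, 3] = raw[:, 0] + 0.1 * raw[:, 3]
X_std = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

eigvals, P, T, X_hat, E = pca_decompose(X_std, k=2)
print("eigenvalues:", np.round(eigvals, 3))
print("residual norm:", round(float(np.linalg.norm(E)), 3))
```

`numpy.linalg.eigh` is used here because the covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are reversed to match the descending order used in the text.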

The number of principal components is selected by the Cumulative Percent Variance (CPV) criterion, which determines the number of principal components from the cumulative sum of the variance percentages of the principal components. The CPV represents the ratio of the data variation explained by the first $k$ principal components to the total data variation. The cumulative contribution rate of the first $k$ principal components can therefore be expressed as

$$\mathrm{CPV}(k) = \frac{\sum_{i=1}^{k} \lambda_{i}}{\sum_{i=1}^{m} \lambda_{i}} \times 100\%$$

where $\lambda_{i}$ is the $i$th eigenvalue of the covariance matrix $S$. In general, when the cumulative contribution rate reaches 85% or more, the first $k$ principal components are considered to contain enough information about the original data.
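The CPV criterion amounts to accumulating the sorted eigenvalues until their share of the total reaches the threshold. A minimal sketch follows; the function name `select_num_components` and the eigenvalues are hypothetical, and the 85% threshold is the one quoted in the text.

```python
def select_num_components(eigenvalues, threshold=0.85):
    """Return the smallest k whose cumulative variance share reaches the threshold."""
    lam = sorted(eigenvalues, reverse=True)   # descending order
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running / total >= threshold:
            return k, running / total
    return len(lam), 1.0                      # fallback: keep every component

# Hypothetical eigenvalues of a 4-variable covariance matrix.
eigenvalues = [2.5, 1.0, 0.3, 0.2]
k, cpv = select_num_components(eigenvalues)
print(k, cpv)   # 2 0.875
```

With these values the first component explains 62.5% of the variation and the first two explain 87.5%, so two components are retained.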

#### 3. Improved Data Preprocessing Method

In this section, we propose a data preprocessing method based on the gap metric.

##### 3.1. Gap Metric

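In the Riemann space, let $\phi(s_{1})$ and $\phi(s_{2})$ denote the spherical projections of the complex numbers $s_{1}$ and $s_{2}$ onto a three-dimensional Riemann sphere with diameter 1, and let the chord between $\phi(s_{1})$ and $\phi(s_{2})$ be denoted by $\delta(s_{1}, s_{2})$; then $\delta(s_{1}, s_{2})$ is defined by

$$\delta(s_{1}, s_{2}) = \frac{|s_{1} - s_{2}|}{\sqrt{1 + |s_{1}|^{2}} \sqrt{1 + |s_{2}|^{2}}}$$

$\theta(s_{1}, s_{2})$ is used to express the spherical distance between $\phi(s_{1})$ and $\phi(s_{2})$, that is, the arc length connecting $\phi(s_{1})$ and $\phi(s_{2})$ on the Riemann sphere; then

$$\theta(s_{1}, s_{2}) = \arcsin \delta(s_{1}, s_{2})$$

As can be seen from Figure 1, the shortest arc lies on the circle obtained by cutting the Riemann sphere with the plane determined by three points: the center of the sphere, $\phi(s_{1})$, and $\phi(s_{2})$.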
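The chord and arc length between two projected points can be computed directly from the two numbers. The sketch below assumes the standard chordal-distance formula for a Riemann sphere of diameter 1, for which the arc length is the arcsine of the chord; the function names and sample values are hypothetical.

```python
import math

def chordal_distance(s1, s2):
    """Chord length between the projections of s1 and s2 on a sphere of diameter 1."""
    return abs(s1 - s2) / (math.sqrt(1 + abs(s1) ** 2) * math.sqrt(1 + abs(s2) ** 2))

def spherical_distance(s1, s2):
    """Arc length on the same sphere (radius 1/2), i.e. arcsin of the chord."""
    # min() guards against rounding slightly above 1 before calling asin.
    return math.asin(min(1.0, chordal_distance(s1, s2)))

# The points 0 and 1 project to the pole and the equator: a quarter of a great circle.
print(spherical_distance(0, 1), math.pi / 4)

# The same absolute change of 0.1 at two different magnitudes (hypothetical values).
print(chordal_distance(0.1, 0.2))    # about 0.098
print(chordal_distance(10.1, 10.2))  # about 0.001

# Complex arguments work as well.
print(chordal_distance(1 + 1j, 1 - 1j))
```

The second comparison shows why this distance suits variables with small absolute values: an identical absolute change produces a much larger distance near the origin than far from it, and the chord never exceeds 1.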