Journal of Control Science and Engineering

Volume 2017 (2017), Article ID 2697297, 8 pages

https://doi.org/10.1155/2017/2697297

## Fault Diagnosis Method Based on Information Entropy and Relative Principal Component Analysis

School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China

Correspondence should be addressed to Chenglin Wen; wencl@hdu.edu.cn

Received 19 December 2016; Accepted 22 January 2017; Published 20 February 2017

Academic Editor: Xiao He

Copyright © 2017 Xiaoming Xu and Chenglin Wen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In traditional principal component analysis (PCA), the influence of the different variables' dimensions (units) is neglected, so the selected principal components (PCs) often fail to be representative. Relative transformation PCA can solve this problem, but it is difficult to calculate the weight for each characteristic variable. To address this, this paper proposes a fault diagnosis method based on information entropy and Relative Principal Component Analysis. First, the algorithm calculates the information entropy of each characteristic variable in the original dataset using the information gain algorithm. Second, it standardizes the dimension of every variable in the dataset. Then, according to the information entropy, it allocates a weight to each standardized characteristic variable. Finally, it uses the resulting relative-principal-component model for fault diagnosis. Simulation experiments on the Tennessee Eastman process and the Wine dataset demonstrate the feasibility and effectiveness of the new method.

#### 1. Introduction

In industrial manufacturing processes, there is a large number of highly correlated variables; these variables contain essential information that helps to judge the status of the system. It is therefore an important problem to detect and predict faults from this information so that the equipment always works safely and reliably [1, 2].

However, the characteristic variables collected during industrial manufacturing have different units, so different results may be obtained purely because of the unit difference; hence, the units have to be standardized. Moreover, standardization inevitably suppresses the diversity among different variables and, geometrically, makes their distribution uniform, which makes it hard to extract principal components for compression and diagnosis. To overcome these problems, several methods have been proposed recently [3–8]. Shi et al. use the Mahalanobis distance as a relative transformation to reduce the effect of dimension standardization [4]. Tang et al. propose a relative transformation principal component analysis to reduce data noise for transformer oil breakdown voltage prediction [5]. Yi et al. introduce a relative transformation operator that changes the spatial distribution of the original variables and the eigenvalues of the covariance matrix in the feature space [6]. Wen et al. propose a method called Relative Principal Component Analysis (RPCA), which weights each variable based on prior information about the system to eliminate the false information introduced by standardizing the variable units [7, 8]; the shortcoming of this method is that it needs a large amount of prior information about the system, which is hard to obtain in real engineering applications.

To solve this problem, this paper introduces the concept of information entropy and proposes a new fault diagnosis method that combines information entropy with relative transformation PCA, called information entropy relative transformation PCA (InEnRPCA). Information entropy was put forward by Shannon in 1948 [9]; it indicates that redundancy exists in any information and can be measured from the symbols that carry the information, such as numbers, letters, and words. With the development of information theory, information entropy has become an effective way to measure the importance of each feature in a sample and has been widely applied in many areas. For instance, when using decision trees for classification, Hu et al. use information entropy to calculate the significance of each feature and then prune the decision tree to reduce false alarms [10]. Y. Y. Chen and Y. M. Chen use information entropy to measure the uncertainty of the data in an attribute reduction algorithm [11]. Wang et al. use information entropy to balance the weight of each sememe in natural language processing [12].
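As a minimal illustration of the quantities involved, the following sketch computes Shannon entropy and the information gain of a discrete feature with respect to the class labels; the variable names and the toy data are illustrative only, not from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) = -sum p * log2(p) over class frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

labels  = ["fault", "fault", "normal", "normal"]
feature = ["high", "high", "low", "low"]
print(entropy(labels))                     # 1.0 (two equally likely classes)
print(information_gain(feature, labels))   # 1.0 (feature fully determines the class)
```

A feature whose values split the samples into pure classes attains the maximum gain, which is why information gain can serve as a measure of feature importance.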

To address the problem at hand, we start from the perspective of information theory: we use the information gain algorithm to extract information entropy from the original dataset as heuristic knowledge, use it to calculate the relative transformation factor and allocate a weight to each standardized characteristic variable, and finally apply the corresponding RPCA method for fault diagnosis.
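The standardize-weight-project pipeline can be sketched as follows. This is a simplified illustration assuming the entropy-derived weights have already been computed and are normalized to sum to one; the paper's exact definition of the relative transformation factor may differ.

```python
import numpy as np

def entropy_weighted_pca(X, weights, n_components=2):
    """Standardize each column, scale it by its entropy-derived weight
    (the relative transformation), then extract principal components
    of the weighted data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # zero mean, unit variance
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize weights (assumed convention)
    Zw = Z * w                                        # relative transformation
    cov = np.cov(Zw, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return Zw @ eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * [1.0, 10.0, 0.1, 5.0]  # variables with mixed scales
scores, variances = entropy_weighted_pca(X, weights=[0.4, 0.3, 0.2, 0.1])
print(scores.shape)  # (100, 2)
```

After standardization, every variable has unit variance; the entropy-based weights reintroduce a notion of relative importance so that the extracted PCs are not dominated by the artificial uniformity of the scaled data.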

The rest of this paper is organized as follows. Section 2 reviews the definitions and algorithms of information entropy and information gain. The original relative transformation PCA method is given in Section 3. Our simulation experiments on the Tennessee Eastman process and the Wine dataset from UCI are reported in Section 4, where we compare PCA, USPCA, and the improved InEnRPCA on thirteen datasets to demonstrate the effectiveness of the new method. Finally, Section 5 gives conclusions and some discussion.

#### 2. Overview of Our Approach

A brief overview of our fault diagnosis approach is given in this section. As the framework in Figure 1 shows, the proposed approach consists of two parts: calculating the relative transformation operators based on information entropy, and fault diagnosis based on RPCA-*k*NN.
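The second part, diagnosis in the relative-principal-component space, can be sketched with a plain *k*-nearest-neighbor vote. This is an illustrative stand-in under the assumption that samples have already been projected onto the relative PCs; the function name and toy data are hypothetical, not the paper's exact procedure.

```python
import numpy as np
from collections import Counter

def knn_diagnose(train_scores, train_labels, test_scores, k=3):
    """Label each test sample by majority vote among its k nearest
    training neighbors (Euclidean distance) in the PC-score space."""
    predictions = []
    for t in test_scores:
        dists = np.linalg.norm(train_scores - t, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# toy example: two well-separated clusters in a 2-D PC-score space
train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = ["normal", "normal", "fault", "fault"]
tests = np.array([[0.05, 0.0], [5.05, 5.0]])
print(knn_diagnose(train, labels, tests, k=3))  # ['normal', 'fault']
```

Because the relative transformation reshapes the score space before this step, neighbors are found under the entropy-weighted geometry rather than the raw standardized one.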