Shock and Vibration

Volume 2018 (2018), Article ID 6024874, 12 pages

https://doi.org/10.1155/2018/6024874

## An Enhancement Deep Feature Extraction Method for Bearing Fault Diagnosis Based on Kernel Function and Autoencoder

School of Mechanical Engineering, Dalian University of Technology, Dalian 116024, China

Correspondence should be addressed to Fengtao Wang

Received 22 November 2017; Accepted 17 January 2018; Published 27 February 2018

Academic Editor: Murat Inalpolat

Copyright © 2018 Fengtao Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Rotating machinery vibration signals are nonstationary and nonlinear under complicated operating conditions. It is therefore important to extract optimal features from raw signals to obtain accurate fault diagnosis results. To address the nonlinearity problem, an enhancement deep feature extraction method based on the Gaussian radial basis kernel function and the autoencoder (AE) is proposed. First, a kernel function is employed to enhance the feature learning capability, and a new AE, termed the kernel AE (KAE), is designed. Subsequently, a deep neural network is constructed from one KAE and multiple AEs to extract inherent features layer by layer. Finally, softmax is adopted as the classifier to accurately identify different bearing faults, and the error backpropagation algorithm is used to fine-tune the model parameters. Vibration data from an aircraft engine intershaft bearing are used to verify the method. The results confirm that the proposed method has a better feature extraction capability, requires fewer iterations, and achieves a higher accuracy than standard methods based on a stacked AE.

#### 1. Introduction

Effective health diagnosis of rolling bearings is an important task in today's industry. Under harsh working conditions such as heavy loads, strong impacts, and high speeds, the bearings of rotating machinery inevitably develop various faults [1]. If these faults are not detected in time, they may lead to serious casualties. Therefore, it is crucial to diagnose the different faults accurately and automatically before they cause serious damage.

Methods based on vibration signals have been widely studied and applied because vibration signals usually carry rich information [2], and intelligent diagnosis methods have received particular attention in recent years. Intelligent fault diagnosis of rotating machinery is a pattern recognition problem consisting of three steps: data preprocessing, feature extraction and selection, and fault classification [3, 4]. First, the raw data collected by the sensors are preprocessed. Then time-domain, frequency-domain, and time-frequency-domain features are extracted and selected manually. Finally, a classifier uses these features to provide a fault diagnosis. The rapid development of artificial intelligence has led to increasing use of machine learning methods for bearing fault diagnosis, such as artificial neural networks (ANNs) and support vector machines (SVMs). Bin et al. [5] used wavelet packet-empirical mode decomposition to decompose the original signal and fed the extracted statistical features to a multilayer perceptron network for fault classification. Zhang et al. [6] extracted nineteen statistical features from the measured vibration signals as inputs to an SVM to recognize the operating conditions of roller bearings. In [7], the signal is first filtered by a morphological filter and then decomposed by the empirical mode decomposition (EMD) method; the extracted features are then mapped by local tangent space alignment (LTSA) to obtain characteristic features, which are used as inputs to an SVM for diagnosis. However, these traditional methods require manual feature selection, which demands considerable theoretical knowledge and practical experience. Moreover, the widely used ANN and SVM methods are supervised learning models with a shallow structure, which cannot sufficiently represent the fault features and require a large amount of labeled data [8].

Deep learning is a novel pattern recognition approach proposed by Hinton and Salakhutdinov in 2006 [9], and it has developed rapidly in recent years. In addition to its use in image and speech recognition, it has led to breakthroughs in bearing fault diagnosis. Owing to its multilayer structure, a deep learning model can derive fault information about bearings from historical data and provide an accurate assessment. Jia et al. [10] used a denoising AE to construct a deep learning model, selected the Fourier coefficients of the original signal as inputs, and achieved good results in fault diagnosis tests of rolling bearings and gears. Shao et al. [11] combined a contractive autoencoder with a denoising autoencoder, adopted the locality preserving projection algorithm for feature fusion, and achieved good performance. Chen and Li [12] used a sparse AE to fuse the features of multiple sensors and developed a deep belief network (DBN) to diagnose rolling bearing faults. AEs are widely used deep learning models capable of extracting deep features from unlabeled data through a multilayer coding process. However, in many applications the models are difficult to train, and their performance depends on the structure of the hidden layers and the number of iterations. Moreover, manual signal processing or feature selection is still required for bearing fault diagnosis [13, 14], which reduces the applicability of deep learning. To further improve the diagnostic performance of deep learning networks and the applicability of the model, a kernel function method is applied to the AE.

Kernel functions are an effective tool for solving nonlinear problems in machine learning; examples include the SVM [15] and the radial basis function (RBF) neural network [16]. The input space is mapped to a high-dimensional feature space by a nonlinear transform, and the inner products required in that space are computed directly from the input vectors by a kernel function, so the transform never has to be evaluated explicitly; this reduces the computational complexity. Several conventional methods have been extended with kernel functions, including kernel principal component analysis (KPCA) [17], kernel independent component analysis (KICA) [18], and kernel discriminant analysis (KDA) [19]. Hence, a novel KAE network combining a kernel function with an AE is proposed.
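The kernel trick can be checked numerically. The minimal sketch below uses a homogeneous degree-2 polynomial kernel rather than the Gaussian kernel adopted in this paper, because its feature map is finite and can be written out explicitly; the function names are illustrative only. The inner product of the explicitly mapped 3-D vectors equals the kernel evaluated directly on the 2-D inputs.

```python
import math

def explicit_map(x):
    """Explicit degree-2 feature map for a 2-D input: (x1^2, x2^2, sqrt(2) x1 x2)."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, y):
    """K(x, y) = (x . y)^2, evaluated in the 2-D input space without any mapping."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
via_feature_space = dot(explicit_map(x), explicit_map(y))  # inner product in 3-D space
via_kernel = poly2_kernel(x, y)                            # same value from the 2-D inputs
print(via_feature_space, via_kernel)
```

For the Gaussian kernel, the corresponding feature space is infinite-dimensional, so only the kernel route is available in practice.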

In this study, an enhancement deep feature extraction method is developed with one KAE and *N* AEs: the KAE maps the input data to a high-dimensional space and encodes the high-dimensional data with a coding network, and the *N* AEs then extract deep features layer by layer. Three experiments on an aircraft engine intershaft bearing test rig were conducted to validate the proposed method. The results show that the proposed method extracts the features of bearing faults with fewer iterations and a higher accuracy than standard methods.

The remainder of this paper is organized as follows. Section 2 presents the fundamental theory of the autoencoder and the stacked autoencoder network. Section 3 presents the proposed method and its procedure. Section 4 discusses the fault diagnosis results of three experiments and the feature extraction capability, which is visualized by principal component analysis. The conclusions are given in Section 5.

#### 2. The Fundamental Theory

##### 2.1. Autoencoder

A common AE is a three-layer network consisting of an encoder network and a decoder network. The encoder network connects the input layer and the hidden layer and obtains the features of the original data. The decoder network connects the hidden layer and the output layer and reconstructs, from the low-dimensional coding data, an output that approximates the input.

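The encoder network is defined as an encoding function denoted by $f_{\theta}$ [20]. For a training sample $\mathbf{x}_i \in \mathbb{R}^{d_x}$, the encoder nonlinearly maps the input vector $\mathbf{x}_i$ to a hidden representation $\mathbf{h}_i \in \mathbb{R}^{d_h}$ through

$$\mathbf{h}_i = f_{\theta}(\mathbf{x}_i) = s_f(\mathbf{W}\mathbf{x}_i + \mathbf{b}), \tag{1}$$

where $s_f$ is the activation function of the encoder and $\theta = \{\mathbf{W}, \mathbf{b}\}$ is the parameter set of the encoder, in which $\mathbf{W} \in \mathbb{R}^{d_h \times d_x}$ is the weight matrix and $\mathbf{b} \in \mathbb{R}^{d_h}$ is the bias vector.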

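The decoder network is defined as a reconstruction function denoted by $g_{\theta'}$. It maps $\mathbf{h}_i$ back into a reconstruction vector $\hat{\mathbf{x}}_i \in \mathbb{R}^{d_x}$:

$$\hat{\mathbf{x}}_i = g_{\theta'}(\mathbf{h}_i) = s_g(\mathbf{W}'\mathbf{h}_i + \mathbf{b}'), \tag{2}$$

where $s_g$ is the activation function of the decoder and $\theta' = \{\mathbf{W}', \mathbf{b}'\}$ is the parameter set of the decoder, in which $\mathbf{W}' \in \mathbb{R}^{d_x \times d_h}$ is the weight matrix and $\mathbf{b}' \in \mathbb{R}^{d_x}$ is the bias vector.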

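The parameters of the autoencoder are optimized during training to minimize the reconstruction error:

$$\{\theta^{*}, \theta^{\prime *}\} = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\big(\mathbf{x}_i, g_{\theta'}(f_{\theta}(\mathbf{x}_i))\big), \tag{3}$$

where $n$ is the number of training samples and $L$ is the loss function measuring the discrepancy between $\mathbf{x}_i$ and $\hat{\mathbf{x}}_i$.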
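Equations (1) to (3) can be sketched in plain Python as follows. The sketch assumes a sigmoid encoder activation $s_f$, an identity decoder activation $s_g$, and a squared-error loss $L$; the text does not fix these choices, and all function names and weights are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(M, v, c):
    """Return M v + c for a matrix M stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) + ci for row, ci in zip(M, c)]

def encode(x, W, b):
    return [sigmoid(z) for z in affine(W, x, b)]   # h = s_f(W x + b), eq. (1)

def decode(h, W_dec, b_dec):
    return affine(W_dec, h, b_dec)                 # x_hat = s_g(W' h + b') with s_g = identity, eq. (2)

def reconstruction_error(X, W, b, W_dec, b_dec):
    """Mean over samples of L(x, x_hat) = ||x - x_hat||^2, the objective in eq. (3)."""
    total = 0.0
    for x in X:
        x_hat = decode(encode(x, W, b), W_dec, b_dec)
        total += sum((xi - ri) ** 2 for xi, ri in zip(x, x_hat))
    return total / len(X)

X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]        # n = 2 hypothetical samples, d_x = 3
W = [[0.2, -0.1, 0.3], [0.0, 0.5, -0.2]]      # d_h = 2
b = [0.0, 0.0]
W_dec = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # an untrained, all-zero decoder
b_dec = [0.0, 0.0, 0.0]
print(reconstruction_error(X, W, b, W_dec, b_dec))  # 3.5, the mean of ||x||^2
```

With an all-zero decoder, every reconstruction is the zero vector, so the error is just the mean squared norm of the inputs; training drives this value down.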

When the loss is sufficiently small, the coding vector can be assumed to reconstruct the original input vector; that is, most of the information in the original data is retained in the coding vector. Because the coding vector has fewer dimensions than the input vector, the AE is also a nonlinear dimensionality reduction method.
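The training process behind (3) can be illustrated with full-batch gradient descent. This is a minimal sketch, assuming a sigmoid encoder, a linear decoder, a squared-error loss, and hand-derived gradients; the paper does not specify its optimizer here, and `train_autoencoder` and the toy data are hypothetical. The network compresses 4-D inputs into a 2-D code, so a falling reconstruction error reflects a nonlinear dimensionality reduction.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_autoencoder(X, d_h, lr=0.1, epochs=300, seed=0):
    rng = random.Random(seed)
    n, d_x = len(X), len(X[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d_x)] for _ in range(d_h)]  # encoder W (d_h x d_x)
    b = [0.0] * d_h                                                         # encoder bias b
    V = [[rng.uniform(-0.5, 0.5) for _ in range(d_h)] for _ in range(d_x)]  # decoder W' (d_x x d_h)
    c = [0.0] * d_x                                                         # decoder bias b'
    history = []
    for _ in range(epochs):
        gW = [[0.0] * d_x for _ in range(d_h)]
        gb = [0.0] * d_h
        gV = [[0.0] * d_h for _ in range(d_x)]
        gc = [0.0] * d_x
        loss = 0.0
        for x in X:
            # Forward pass: h = sigmoid(W x + b), x_hat = W' h + b'.
            h = [sigmoid(sum(W[j][k] * x[k] for k in range(d_x)) + b[j]) for j in range(d_h)]
            x_hat = [sum(V[k][j] * h[j] for j in range(d_h)) + c[k] for k in range(d_x)]
            e = [x_hat[k] - x[k] for k in range(d_x)]
            loss += sum(ek * ek for ek in e)
            # Backward pass: dL/dx_hat = 2e, then the chain rule through decoder and sigmoid.
            for k in range(d_x):
                gc[k] += 2.0 * e[k]
                for j in range(d_h):
                    gV[k][j] += 2.0 * e[k] * h[j]
            for j in range(d_h):
                dz = sum(2.0 * e[k] * V[k][j] for k in range(d_x)) * h[j] * (1.0 - h[j])
                gb[j] += dz
                for k in range(d_x):
                    gW[j][k] += dz * x[k]
        history.append(loss / n)  # mean reconstruction error before this epoch's update
        # Gradient-descent step using batch-averaged gradients.
        for j in range(d_h):
            b[j] -= lr * gb[j] / n
            for k in range(d_x):
                W[j][k] -= lr * gW[j][k] / n
        for k in range(d_x):
            c[k] -= lr * gc[k] / n
            for j in range(d_h):
                V[k][j] -= lr * gV[k][j] / n
    return (W, b, V, c), history

X = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [1.0, 1.0, 1.0, 1.0],
     [0.0, 0.0, 0.0, 0.0]]  # hypothetical 4-D samples with 2-D structure
params, history = train_autoencoder(X, d_h=2)
print(f"initial loss {history[0]:.3f}, final loss {history[-1]:.3f}")
```

A practical implementation would use an array library and automatic differentiation; the explicit loops are kept here only to make each gradient term visible.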

##### 2.2. Stacked Autoencoder Network

An AE is an unsupervised three-layer learning network, but its information extraction ability is limited and its structure is too shallow to represent the deep characteristics of the signal. A stacked autoencoder (SAE) stacks multiple AEs to obtain more hidden layers; each AE performs a nonlinear transformation of the representation produced by the preceding layer and passes the result to the following one. During training, the hidden-layer output of each AE is used as the input to the next AE, and the network is trained layer by layer with an unsupervised learning algorithm to extract features from the input data.

The input layer and the first hidden layer of the SAE are regarded as the encoder network of the first AE. After the first AE is trained by minimizing the reconstruction error in (3), the first coding vector of $\mathbf{x}_i$ is calculated as follows:

$$\mathbf{h}_i^{(1)} = f_{\theta_1}(\mathbf{x}_i), \tag{4}$$

where $\theta_1$ is the parameter set of the first AE.

The coding vector $\mathbf{h}_i^{(1)}$ is then treated as input data, and the first and second hidden layers of the SAE are regarded as the encoder network of the second AE. This process is repeated in sequence until the $N$th AE is trained, which initializes the final hidden layer of the SAE. The $N$th coding vector is calculated as

$$\mathbf{h}_i^{(N)} = f_{\theta_N}\big(\mathbf{h}_i^{(N-1)}\big), \tag{5}$$

where $\theta_N$ is the parameter set of the $N$th AE.
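Equations (4) and (5) amount to composing the trained encoders. The sketch below shows only this forward pass, assuming sigmoid activations; in greedy layer-wise training, each AE would first be trained on the codes produced by the previous one. The weights are hypothetical placeholders, not trained values, and `stacked_encode` is an illustrative name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, W, b):
    """One AE encoder f_theta: h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]

def stacked_encode(x, encoders):
    """Apply encoders in order: h^(1) = f_theta1(x), ..., h^(N) = f_thetaN(h^(N-1))."""
    h = x
    for W, b in encoders:
        h = encode(h, W, b)
    return h

# Hypothetical parameters for N = 2 AEs: 4 -> 3 -> 2 dimensions.
W1 = [[0.1, -0.2, 0.3, 0.0], [0.2, 0.1, -0.1, 0.4], [-0.3, 0.0, 0.2, 0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.5, -0.5, 0.2], [0.1, 0.3, -0.4]]
b2 = [0.0, 0.0]
x = [1.0, 0.0, 0.5, 2.0]
code = stacked_encode(x, [(W1, b1), (W2, b2)])
print(code)  # the 2-D deep feature h^(2)
```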

Subsequently, the backpropagation (BP) algorithm is used to fine-tune the network parameters in a supervised manner. The SAE is thus a type of deep neural network (DNN) that combines unsupervised layer-wise pretraining with supervised fine-tuning [21, 22].
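During fine-tuning, the deepest code feeds a classifier (softmax, according to the abstract), and the classification error is backpropagated through all layers. The sketch below shows only the output layer and assumes a cross-entropy loss, which this section does not state; for softmax with cross-entropy, the error signal at the logits is simply $\mathbf{p} - \mathbf{y}$, where $\mathbf{y}$ is the one-hot label. The function names and logits are illustrative.

```python
import math

def softmax(z):
    m = max(z)  # shift by the largest logit for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_and_grad(z, label):
    """Cross-entropy of softmax(z) for an integer class label, and its gradient
    with respect to the logits z, which equals softmax(z) - one_hot(label)."""
    p = softmax(z)
    loss = -math.log(p[label])
    grad = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    return loss, grad

logits = [1.0, 2.0, 0.5]  # hypothetical logits for three fault classes
loss, grad = cross_entropy_and_grad(logits, label=1)
print(loss, grad)
```

In full BP fine-tuning, `grad` would be multiplied back through the output weights and each encoder's sigmoid derivative, in the same way as the decoder-to-encoder step of an AE.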

#### 3. The Proposed Method

##### 3.1. Autoencoder Based on a Kernel Function

The encoding process of the AE is a nonlinear calculation, but for low-dimensional raw data, accurate classification requires a large number of iterations and a long calculation time, and misclassifications remain likely. To solve this problem, the kernel function method is combined with the AE.

Let $\phi$ be a nonlinear mapping from the input space $X$ to the feature space $F$. A kernel function $K$ satisfies, for all $\mathbf{x}_i, \mathbf{x}_j \in X$, $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$. Because the required calculations in the feature space (inner products) are provided directly by the kernel function, the mapping $\phi$ does not have to be defined explicitly, which greatly reduces the computational complexity of the problem.
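For the Gaussian radial basis kernel named in the abstract, kernel values and the Gram matrix $[K(\mathbf{x}_i, \mathbf{x}_j)]$ over the training samples can be computed as follows. The sketch assumes the common form $K(\mathbf{x}_i,\mathbf{x}_j)=\exp\left(-\lVert\mathbf{x}_i-\mathbf{x}_j\rVert^2/(2\sigma^2)\right)$ with a hypothetical width parameter `sigma`; the paper's exact parameterization and width are not given in this section.

```python
import math

def gaussian_rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(X, sigma=1.0):
    """n x n Gram matrix with entries G[i][j] = K(x_i, x_j) over the training samples."""
    return [[gaussian_rbf(xi, xj, sigma) for xj in X] for xi in X]

X_train = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]  # hypothetical 2-D training samples
G = gram_matrix(X_train)
for row in G:
    print([round(v, 4) for v in row])
```

The matrix is symmetric with ones on the diagonal, and every entry lies in (0, 1], shrinking as samples move apart.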

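Based on the above theory, an improved method, the KAE, which combines the kernel function with the AE, is proposed. First, the Gram matrix of the kernel function is calculated and used as the input to the new autoencoder; the coding process then becomes

$$\mathbf{h}_i = f_{\theta}\big(K(\mathbf{x}_i, \mathbf{x}_j)\big) = s_f\big(\mathbf{W} K(\mathbf{x}_i, \mathbf{x}_j) + \mathbf{b}\big), \tag{6}$$

where $\mathbf{x}_i$ and $\mathbf{x}_j$ are any two samples of the training data.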

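Correspondingly, the decoding function is changed to

$$\hat{K}(\mathbf{x}_i, \mathbf{x}_j) = g_{\theta'}(\mathbf{h}_i) = s_g(\mathbf{W}'\mathbf{h}_i + \mathbf{b}'). \tag{7}$$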
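A minimal sketch of the KAE coding and decoding in (6) and (7) follows. It assumes that each sample is represented by its row of kernel values against the $n$ training samples, so the KAE input dimension equals $n$; that an unseen sample would be handled the same way, by evaluating the kernel against the training set; and sigmoid activations for both $s_f$ and $s_g$ (a sigmoid output suits Gaussian kernel values, which lie in (0, 1]). These are illustrative assumptions rather than details confirmed by the text, and all names and weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gaussian_rbf(x, y, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))

def kernel_row(x, X_train, sigma=1.0):
    """[K(x, x_1), ..., K(x, x_n)]: the Gram-matrix row of x when x is a training sample."""
    return [gaussian_rbf(x, xj, sigma) for xj in X_train]

def affine(M, v, c):
    """Return M v + c for a matrix M stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) + ci for row, ci in zip(M, c)]

def kae_encode(x, X_train, W, b, sigma=1.0):
    """KAE coding: h = s_f(W k(x) + b), with the kernel row k(x) as the input."""
    return [sigmoid(z) for z in affine(W, kernel_row(x, X_train, sigma), b)]

def kae_decode(h, W_dec, b_dec):
    """KAE decoding: reconstruct the kernel row as s_g(W' h + b')."""
    return [sigmoid(z) for z in affine(W_dec, h, b_dec)]

X_train = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]    # n = 3 hypothetical samples
W = [[0.4, -0.3, 0.2], [-0.1, 0.5, 0.3]]          # d_h = 2, input dimension n = 3
b = [0.0, 0.0]
W_dec = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]    # maps the 2-D code back to 3 kernel values
b_dec = [0.0, 0.0, 0.0]
h = kae_encode(X_train[1], X_train, W, b)
k_hat = kae_decode(h, W_dec, b_dec)
print(h, k_hat)
```

In this sketch, training would minimize the discrepancy between each kernel row and its reconstruction, exactly as (3) does for the raw inputs of a standard AE.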

The improved AE network first maps the data to a high-dimensional space and then encodes the high-dimensional data to obtain nonlinear low-dimensional features. By adding the kernel function, the original signal components are mapped to a high-dimensional space, which speeds up the coding process and improves both the efficiency of signal feature extraction and the classification accuracy. The algorithm structure diagram is shown in Figure 1.