Abstract

Vibration signals from rotating machinery are usually nonstationary and nonlinear and contain high-frequency shocks, and redundant features degrade the performance of fault diagnosis methods. To deal with these problems, a novel fault diagnosis approach for rotating machinery is presented that combines an improved local mean decomposition (LMD) with support vector machine–recursive feature elimination with minimum redundancy maximum relevance (SVM-RFE-MRMR). First, an improved LMD method is developed to decompose vibration signals into a set of amplitude modulation/frequency modulation (AM-FM) product functions (PFs). Then, time and frequency domain features are extracted from the selected PFs so that complicated faults can be identified efficiently. Because redundant features degrade fault diagnosis methods, a novel feature selection method combining SVM-RFE with MRMR is proposed to select salient features and thereby improve the performance of the fault diagnosis approach. Experimental results on a reducer test platform demonstrate that the proposed method can reveal the relations between features and faults and provide insight into fault mechanisms.

1. Introduction

Rotating machines are at the heart of modern equipment such as aircraft engines, gas turbines, and reducers, and they are a common source of machine faults. However, most vibration signals are nonstationary, nonlinear, and contaminated by heavy background noise, and the machine components are strongly coupled, which makes it challenging to detect and identify specific rotating machine faults accurately [1, 2]. A number of techniques have been developed in recent years for accurate diagnosis of rotating machine faults [3–5].

Generally, vibration signal analysis is the most effective and common way to identify the states and faults of rotating machines [3]. According to the domain to which the extracted features belong, machinery fault diagnosis methods can be categorized into time domain-based, frequency domain-based, and time-frequency domain-based methods [4]. Time domain features have intuitive physical meaning and low computational complexity, but the type, location, and severity of a fault cannot be identified from them. Frequency domain methods such as the Fourier transform and its variants detect faults from the spectrum of the vibration signal; however, the Fourier transform cannot deal with nonstationary and nonlinear signals. Time-frequency domain methods decompose a signal into multiscale components that reveal its local characteristics. Because these components are approximately stationary and linear, time or frequency features can then be extracted from them to detect machinery faults.

Owing to harsh working environments, fault features are usually immersed in heavy background noise; for incipient faults in particular, the features are too weak to be extracted easily [5]. Time-frequency analysis techniques such as the Wigner-Ville distribution [6], the wavelet transform [7], and empirical mode decomposition (EMD) [8] are effective in suppressing signal noise. However, because the Wigner-Ville distribution is quadratic, cross-term interference among multicomponent signals impairs the fault features. Because the wavelet basis function has limited length, energy leakage may occur in the wavelet transform, degrading resolution in both the time and frequency domains. EMD suffers from mode mixing and end effect. Smith [9] proposed LMD in 2005 to compute the instantaneous frequency and the time-frequency energy distribution of electroencephalogram signals. The method suppresses end effect, eliminates over-envelope and under-envelope, and reduces computational complexity. Compared with EMD, LMD obtains the instantaneous amplitude and frequency without the Hilbert transform and without fitting upper and lower envelopes. Therefore, LMD not only overcomes the problems of EMD but also concentrates more useful information in a few decomposition components, which makes it suitable for analyzing nonlinear and nonstationary signals [10]. LMD has since been combined with machine learning methods to enhance fault features and identify fault patterns [11–14].

Fault feature extraction from vibration signals is a critical step in fault diagnosis. In many real-world applications, however, high-dimensional data contain redundant features that reduce the generalization performance of the fault detection model. Removing redundant features from high-dimensional data has therefore become an effective way to enhance the performance of fault diagnosis models [15, 16]. To address the insufficient stability of variable predictive model-based class discrimination (VPMCD) on small samples or in multicorrelated feature spaces, Tang et al. [15] proposed the ARSFS feature selection method based on affinity propagation (AP) clustering, RReliefF, and sequential forward search; experimental results showed that ARSFS effectively identified rolling bearing faults. Intelligent optimization methods have been commonly applied to feature selection and achieve better identification accuracy than most existing feature selection approaches [16]. Support vector machine–recursive feature elimination (SVM-RFE) based on a one-against-one or one-against-all strategy provides a feasible way to identify multiple machinery faults [17–19]. Tang et al. [18] proposed a two-stage SVM-RFE in which feature extraction and feature subset selection are conducted sequentially. However, SVM-RFE considers only the effect of each feature on the classifier and ignores redundancy among features. As a result, in some applications the classification accuracy on a subset of M features drawn from N (M < N) salient features may be higher than that on the full subset of N salient features.

To address this problem, the minimum redundancy maximum relevance (MRMR) algorithm is introduced to evaluate the feature subsets selected by SVM-RFE, which yields the optimal feature subset. MRMR uses mutual information to measure the relevance and redundancy of features, and the feature subset is obtained through two cost functions called information difference and information entropy [20]. The combined algorithm is called support vector machine–recursive feature elimination–minimum redundancy maximum relevance (SVM-RFE-MRMR). As shown later, the new method effectively overcomes this limitation of traditional feature selection algorithms.

The rest of this paper is organized as follows. Section 2 briefly introduces the improved LMD method and illustrates the elimination of end effect. Section 3 proposes the SVM-RFE-MRMR algorithm and gives its details. Section 4 verifies the effectiveness of the proposed method through experiments on a reducer test system. Finally, Section 5 draws conclusions.

2. Improved Local Mean Decomposition

LMD is an adaptive time-frequency analysis method similar to EMD. Both methods decompose a complex signal into several single-component signals with physical meaning. Compared with EMD, LMD diminishes end effect and eliminates over-envelope and under-envelope, with lower computational complexity and less information loss [21].

LMD decomposes a complicated signal into a set of product functions (PFs). Each PF is the product of an envelope signal and a purely frequency modulated signal, from which the instantaneous frequency is obtained. The time-frequency distribution of the original signal is obtained by combining the instantaneous amplitudes and instantaneous frequencies of all PF components. The schematic of LMD is shown in Figure 1.
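The core LMD step can be sketched in a few lines. The following is a minimal pure-Python sketch of one sifting step, assuming Smith's standard construction: local means and local magnitudes are taken from successive extrema, held piecewise constant, and smoothed by a moving average. The function names, the window `w=5`, and treating both endpoints as extrema are illustrative assumptions rather than details of this paper; a full LMD would iterate the step until the frequency-modulated signal is flat and then peel off successive PFs.

```python
import math


def extrema_indices(x):
    """Indices of strict local extrema, with both endpoints appended as pseudo-extrema."""
    idx = [0]
    for i in range(1, len(x) - 1):
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0:
            idx.append(i)
    idx.append(len(x) - 1)
    return idx


def moving_average(v, w):
    """Centered moving average; the window shrinks near the edges."""
    h = w // 2
    out = []
    for i in range(len(v)):
        lo, hi = max(0, i - h), min(len(v), i + h + 1)
        out.append(sum(v[lo:hi]) / (hi - lo))
    return out


def lmd_step(x, w=5):
    """One LMD sifting step: smoothed local mean m, envelope a, and FM signal s."""
    ext = extrema_indices(x)
    m, a = [0.0] * len(x), [0.0] * len(x)
    for k in range(len(ext) - 1):
        i, j = ext[k], ext[k + 1]
        mean_k = (x[i] + x[j]) / 2        # local mean of two successive extrema
        mag_k = abs(x[i] - x[j]) / 2      # local magnitude (half the swing)
        for t in range(i, j + 1):         # hold constant between the extrema
            m[t], a[t] = mean_k, mag_k
    m, a = moving_average(m, w), moving_average(a, w)
    h = [xi - mi for xi, mi in zip(x, m)]                         # remove local mean
    s = [hv / av if av > 1e-12 else 0.0 for hv, av in zip(h, a)]  # demodulate
    return m, a, s


# Usage: a sine of amplitude 2 sampled at 40 points per period.
x = [2 * math.sin(2 * math.pi * t / 40) for t in range(200)]
m, a, s = lmd_step(x)
```

In the interior, `m` is about 0 and `a` about 2, so `s` recovers the unit sine; near the two ends the pseudo-extrema bias `m` and `a` (for example, `m[0]` is 1), which is the end effect that the SVR-based extension is intended to reduce.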

Local extrema are the premise for evaluating product functions. Because a signal has finite length, its endpoints may not be extreme points. Spurious components then gradually “pollute” the whole signal inward from both endpoints, causing the envelope function to diverge and degrading the decomposition results; this is the end effect. To deal with the problem, an improved LMD method is proposed: first, a support vector regression (SVR) model is used to extend both ends of the original signal, and then the extended signal is decomposed into a set of PFs via LMD.

Given a data set $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, and $l$ is the number of samples, an SVR model is formulated as
$$f(x) = w^{\mathrm{T}}\varphi(x) + b,$$
where $\varphi(\cdot)$ is a nonlinear mapping function that maps the sample data into a high-dimensional feature space, $w$ is the weight vector, and $b$ is the offset.

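Introducing the penalty parameter $C$ and the $\varepsilon$-insensitive loss function, the SVR model is obtained by solving the convex optimization problem
$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\left(\xi_i + \xi_i^*\right)$$
$$\text{s.t.}\quad y_i - w^{\mathrm{T}}\varphi(x_i) - b \le \varepsilon + \xi_i,\quad w^{\mathrm{T}}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\quad \xi_i,\ \xi_i^* \ge 0,\quad i = 1, \dots, l,$$
where $\xi_i$ and $\xi_i^*$ are slack variables.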

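The above optimization problem can be transformed into its dual by introducing the kernel trick; i.e.,
$$\max_{\alpha,\,\alpha^*}\ -\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right)K(x_i, x_j) - \varepsilon\sum_{i=1}^{l}\left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{l}y_i\left(\alpha_i - \alpha_i^*\right)$$
$$\text{s.t.}\quad \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right) = 0,\quad 0 \le \alpha_i,\ \alpha_i^* \le C,$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers and $K(x_i, x_j) = \varphi(x_i)^{\mathrm{T}}\varphi(x_j)$ is the kernel function.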

Finally, the SVR model is obtained:
$$f(x) = \sum_{i=1}^{l}\left(\alpha_i - \alpha_i^*\right)K(x_i, x) + b.$$

Here, the radial basis function (RBF) is selected as the kernel function:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right),$$
where $\sigma > 0$ is the kernel width and must be prespecified.
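To make the extension step concrete, here is a minimal pure-Python sketch of predicting samples beyond both ends of a signal with an RBF-kernel regressor trained on sliding windows. To stay dependency-free it swaps in kernel ridge regression (squared loss with ridge penalty `lam`) for the ε-insensitive SVR described above; both use the RBF kernel, written here as exp(-gamma * ||u - v||^2), i.e. gamma = 1/(2σ²). The window length `m`, `gamma`, `lam`, and all function names are illustrative assumptions; in this paper the SVR parameters are tuned by PSO.

```python
import math


def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(u, v)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_predictor(series, m, gamma, lam):
    """Kernel ridge regression mapping each window of m samples to the next sample."""
    X = [series[i:i + m] for i in range(len(series) - m)]
    y = [series[i + m] for i in range(len(series) - m)]
    K = [[rbf(u, v, gamma) + (lam if i == j else 0.0) for j, v in enumerate(X)]
         for i, u in enumerate(X)]
    alpha = solve(K, y)
    return lambda window: sum(al * rbf(u, window, gamma) for al, u in zip(alpha, X))


def extend_right(series, steps, m=3, gamma=2.0, lam=1e-4):
    """Predict `steps` samples past the right end, feeding predictions back in."""
    predict = fit_predictor(series, m, gamma, lam)
    out = list(series)
    for _ in range(steps):
        out.append(predict(out[-m:]))
    return out[len(series):]


def extend_both_ends(series, steps, **kw):
    """Right end: forward prediction; left end: forward prediction on the reversed signal."""
    right = extend_right(series, steps, **kw)
    left = extend_right(series[::-1], steps, **kw)[::-1]
    return left + list(series) + right


# Usage: a sine with 20 samples per period, extended by 5 samples at each end.
x = [math.sin(2 * math.pi * t / 20) for t in range(60)]
ext = extend_both_ends(x, 5)
```

The extended signal `ext` (length 70) would then be decomposed by LMD and the added samples discarded afterwards, so that the spurious endpoint behavior falls outside the original record.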

The penalty parameter $C$ and the kernel parameter $\sigma$ play a significant role in model generalization. The particle swarm optimization (PSO) algorithm is therefore introduced to find their optimal values. The hybrid PSO-SVR method treats each candidate parameter pair $(C, \sigma)$ as the position of a particle, and the velocity and position of each particle are updated iteratively.

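In this 2D search space, the population consists of $n$ particles, where the $i$th particle is expressed as the 2D vector $X_i = (x_{i1}, x_{i2})$.

Let $V_i = (v_{i1}, v_{i2})$ be the velocity of the $i$th particle, and denote its individual best position and the global best position by $P_i = (p_{i1}, p_{i2})$ and $P_g = (p_{g1}, p_{g2})$, respectively. In each iteration, the velocity and position of each particle are updated according to
$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1\left(p_{id} - x_{id}^{k}\right) + c_2 r_2\left(p_{gd} - x_{id}^{k}\right),$$
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1},$$
where $\omega$ is the inertia factor, $d = 1, 2$ indexes the dimensions of the search space, $k$ is the current iteration number, $v_{id}^{k}$ is the velocity of the $i$th particle, $c_1$ and $c_2$ are acceleration factors, and $r_1$ and $r_2$ are random numbers in $[0, 1]$.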

The process of optimized PSO-SVR is illustrated in Figure 2.

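The fitness function is the mean square error (MSE):
$$F_i = \frac{1}{N}\sum_{j=1}^{N}\left(\hat{y}_j - y_j\right)^2,$$
where $F_i$ is the fitness of the $i$th particle, $\hat{y}_j$ is the output of the SVR with input $x_j$, $y_j$ is the expected output, and $N$ is the number of samples.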
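The PSO update rules and the MSE fitness can be sketched as follows. This is a minimal sketch under stated assumptions: the inertia and acceleration values (0.729 and 1.49445, a commonly used setting), the swarm size, the iteration count, and clipping positions to the search bounds are illustrative choices, and a toy line-fitting MSE stands in for the SVR error over $(C, \sigma)$.

```python
import random


def mse(pred, target):
    """Mean square error used as the PSO fitness."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pso(fitness, bounds, n_particles=30, n_iter=200,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimize `fitness` over a box given as [(lo, hi), ...] with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # individual best positions
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]      # global best position
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


# Usage: recover (slope, intercept) = (2, -1) of a line by minimizing MSE.
xs = [k / 10 for k in range(20)]
ys = [2 * v - 1 for v in xs]
best, best_f = pso(lambda p: mse([p[0] * v + p[1] for v in xs], ys),
                   bounds=[(-5, 5), (-5, 5)])
```

For PSO-SVR, `fitness` would train an SVR with the candidate `(C, sigma)` and return its MSE on validation data; the loop itself is unchanged.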

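To demonstrate the efficiency of the improved LMD method, an end effect evaluation index is introduced, calculated as
$$\theta = \frac{\left|\sqrt{\sum_{i=1}^{k}\mathrm{RMS}_i^2 + \mathrm{RMS}_u^2} - \mathrm{RMS}_x\right|}{\mathrm{RMS}_x},$$
where $\mathrm{RMS}_x$ is the effective (root mean square) value of the original signal, $\mathrm{RMS}_i$ is the effective value of the $i$th PF, $\mathrm{RMS}_u$ is the effective value of the residual component, and
$$\mathrm{RMS}_z = \sqrt{\frac{1}{N}\sum_{t=1}^{N} z^2(t)},$$
where $z$ is the signal being evaluated. The larger $\theta$ is, the greater the influence of end effect.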
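The index can be computed directly from the decomposition output. This minimal sketch assumes the energy-based form θ = |sqrt(Σ RMS_i² + RMS_u²) − RMS_x| / RMS_x, which compares the effective value of the original signal with the effective value implied by the PFs and the residual; the function names are hypothetical.

```python
import math


def rms(z):
    """Effective (root mean square) value of a sequence."""
    return math.sqrt(sum(v * v for v in z) / len(z))


def end_effect_index(signal, pfs, residual):
    """theta = |sqrt(sum RMS_i^2 + RMS_u^2) - RMS_x| / RMS_x; larger means stronger end effect."""
    rms_x = rms(signal)
    rms_parts = math.sqrt(sum(rms(pf) ** 2 for pf in pfs) + rms(residual) ** 2)
    return abs(rms_parts - rms_x) / rms_x


# Usage: two orthogonal tones over whole periods decompose without energy error.
N = 100
pf1 = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
pf2 = [0.5 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
x = [p + q for p, q in zip(pf1, pf2)]
theta_clean = end_effect_index(x, [pf1, pf2], [0.0] * N)
theta_distorted = end_effect_index(x, [[2 * v for v in pf1], pf2], [0.0] * N)
```

A perfect, energy-preserving decomposition gives θ close to 0, while a PF inflated by endpoint distortion (simulated here by doubling `pf1`) pushes θ well above 0.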

Based on the above, a synthetic signal is generated as

The sampling frequency is 500 Hz, the time interval is , and the signal contains 300 points in total. The optimal parameters found by PSO with generations are shown in Figure 3. The optimized penalty parameter is , and the kernel function parameter is .

The extremum mirror extension method and the SVR-based extension method are applied to the simulated signal. The LMD results of the two methods are shown in Figures 4 and 5, and the corresponding end effect evaluation indexes are and , respectively. As Figure 4 shows, PF2 is clearly distorted at its left and right endpoints, so the mirror extension method cannot characterize the trend of the original signal. The comparison shows that the SVR-based extension approach outperforms the mirror extension method.

3. Proposed SVM-RFE-MRMR Method

3.1. Support Vector Machine–Recursive Feature Elimination (SVM-RFE)

SVM-RFE was first introduced to rank genes from gene expression data for cancer classification [22]. SVM-RFE performs backward feature elimination: all features are first sorted by their ranking scores, and the feature with the smallest ranking score is removed from the candidate feature subset. The SVM is then retrained on the new candidate subset and the remaining features are re-ranked, iteratively, until only one feature is left in the candidate subset.

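More specifically, given a training dataset $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^n$ is the $i$th training sample, $y_i \in \{-1, +1\}$ is the class label of $x_i$, $l$ is the number of training samples, and $n$ is the feature dimension, the decision function of the SVM is $f(x) = w^{\mathrm{T}}x + b$, where $w$ is a weight vector and $b$ is a scalar. With the kernel trick, the dual optimization problem of the SVM can be written as
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{l}\alpha_i \quad \text{s.t.}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C, \tag{12}$$
where $C$ is a trade-off between training accuracy and model complexity.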

In the SVM, the importance of each feature is measured by its weight coefficient:
$$w = \sum_{i=1}^{l}\alpha_i y_i x_i, \tag{13}$$
where $\alpha_i$ is obtained by solving the optimization problem in (12).

In SVM-RFE, the ranking score of the $i$th feature is defined as
$$c_i = w_i^2. \tag{14}$$

The detailed algorithm of SVM-RFE is shown in Algorithm 1.

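Input: Training samples $X_0 = [x_1, x_2, \dots, x_l]^{\mathrm{T}}$, class labels $y = [y_1, y_2, \dots, y_l]^{\mathrm{T}}$.
Output: Ranked feature list $r$.
(A) Initialize: the original feature subset $s = [1, 2, \dots, n]$ and the ranked feature list $r = [\ ]$.
(B) Loop the following procedure until $s = [\ ]$:
(1) while $s \neq [\ ]$ do
(2) Train an SVM with the features in $s$
(3) for all $i$th features in $s$ do
(4) Compute $c_i$ using equation (14)
(5) end for
(6) $f = \arg\min_i c_i$
(7) $r = [s(f), r]$
(8) $s = s \setminus s(f)$
(9) end while
(10) return $r$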
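Algorithm 1 can be sketched in pure Python as follows. To stay self-contained, this sketch trains a binary linear SVM by full-batch subgradient descent on the primal hinge loss instead of solving the dual problem (12); this is a swapped-in, assumption-level stand-in, and the learning rate, regularization `lam`, and epoch count are illustrative. The ranking score is $c_i = w_i^2$, and the feature eliminated last ends up first in the ranked list.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=300, lr=0.1):
    """Binary linear SVM (labels +1/-1) by subgradient descent on the primal hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        eta = lr / t ** 0.5
        gw, gb = [lam * wj for wj in w], 0.0      # gradient of (lam/2)||w||^2
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1:  # margin violated
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - eta * g for wj, g in zip(w, gw)]
        b -= eta * gb
    return w, b


def svm_rfe(X, y, **svm_kw):
    """Return feature indices ranked from most to least important (Algorithm 1)."""
    surviving = list(range(len(X[0])))
    ranked = []
    while surviving:
        Xs = [[row[j] for j in surviving] for row in X]
        w, _ = train_linear_svm(Xs, y, **svm_kw)
        scores = [wj ** 2 for wj in w]                    # c_i = w_i^2
        worst = min(range(len(surviving)), key=lambda k: scores[k])
        ranked.insert(0, surviving.pop(worst))            # r = [s(f), r]
    return ranked


# Usage: feature 0 carries the label, features 1 and 2 are pure noise.
rng = random.Random(1)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[yi + 0.2 * rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
ranked = svm_rfe(X, y)
```

For the four-state reducer problem, the weights of the six one-against-one classifiers would have to be aggregated (for example, by summing $w_i^2$ over classifiers); that aggregation is not shown here.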

Algorithm 1 shows that the ranking criterion measures the correlation between features and decisions through the feature weights. Although SVM-RFE can eliminate unimportant features one by one, it does not take the redundancy among features into account. To address this issue, MRMR is introduced in this paper.

3.2. Minimum Redundancy Maximum Relevance (MRMR) Method

Ding and Peng first introduced a criterion, called minimum redundancy maximum relevance (MRMR), that uses mutual information to measure the relevance and redundancy of features [23]. MRMR defines a maximum relevance criterion and a minimum redundancy criterion, which respectively capture the maximal information that features carry about the class and the minimal redundancy among features, and it yields a ranking score for each feature. Specifically, given two feature vectors $x$ and $y$ with probability densities $p(x)$ and $p(y)$, let $p(x, y)$ be their joint probability density. Their mutual information (MI), which measures the redundancy between features, is defined as
$$I(x; y) = \iint p(x, y)\log\frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy.$$

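The maximum relevance and minimum redundancy are calculated as
$$\max D(S, c),\quad D = \frac{1}{|S|}\sum_{f_i \in S} I(f_i; c),$$
$$\min R(S),\quad R = \frac{1}{|S|^2}\sum_{f_i, f_j \in S} I(f_i; f_j),$$
where $S$ and $|S|$ represent the feature set and the number of features in $S$, respectively, $c$ is the class label, $I(f_i; c)$ is the MI between feature $f_i$ and class label $c$, $I(f_i; f_j)$ is the MI between features $f_i$ and $f_j$, $D$ is the mean MI between the features and the class label, and $R$ is the mean MI among the features.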

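MRMR seeks features with minimum redundancy and maximum relevance through the following two criteria, which combine $D$ and $R$ as a difference and as a quotient, respectively:
$$\max_{S}\ \Phi_1(D, R),\ \ \Phi_1 = D - R; \qquad \max_{S}\ \Phi_2(D, R),\ \ \Phi_2 = \frac{D}{R}.$$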
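The MI estimate and the two MRMR criteria can be sketched for discrete (already quantized) features as below. This is a minimal sketch, assuming plug-in probability estimates from counts, natural logarithms, and the usual greedy incremental search in which each new feature maximizes its relevance minus ("MID") or divided by ("MIQ") its mean MI with the features already chosen. The continuous vibration features in this paper would first need to be discretized, and the `1e-12` guard in the quotient form is an arbitrary choice.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Plug-in MI estimate (nats) between two discrete sequences of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log((c / n) / ((pa[u] / n) * (pb[v] / n)))
               for (u, v), c in pab.items())


def mrmr(features, labels, k, scheme="MID"):
    """Greedy MRMR: start from the most relevant feature, then add the best D-R or D/R."""
    relevance = [mutual_information(f, labels) for f in features]
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        best, best_score = None, -math.inf
        for i in range(len(features)):
            if i in selected:
                continue
            red = sum(mutual_information(features[i], features[j])
                      for j in selected) / len(selected)
            score = relevance[i] - red if scheme == "MID" else relevance[i] / (red + 1e-12)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


# Usage: the class is c = 2u + v; f0 = u, f1 = u (an exact duplicate), f2 = v.
u = [0, 0, 1, 1] * 2
v = [0, 1, 0, 1] * 2
c = [2 * p + q for p, q in zip(u, v)]
picked_mid = mrmr([u, u[:], v], c, 2, "MID")
picked_miq = mrmr([u, u[:], v], c, 2, "MIQ")
```

All three features are equally relevant to `c`, but the duplicate `f1` adds no new information once `f0` is chosen, so both criteria pick `f2` second.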

3.3. Support Vector Machine–Recursive Feature Elimination–Minimum Redundancy Maximum Relevance (SVM-RFE-MRMR)

As described above, SVM-RFE considers the correlation between features and decisions but ignores the relations among features, whereas MRMR yields features with minimum redundancy and maximum relevance. SVM-RFE-MRMR is therefore proposed to select a feature subset with minimum redundancy and maximum relevance. In addition to the ranking criterion defined by SVM-RFE, SVM-RFE-MRMR defines a new criterion for feature $f_i$:
$$m_i = I(f_i; c) - \frac{1}{|S| - 1}\sum_{f_j \in S,\ j \neq i} I(f_i; f_j), \tag{19}$$
where $S$ is the set of features that remain in each iteration. A ranking score is then defined by normalizing the SVM-RFE criterion and the MRMR criterion of each feature; i.e., where , .

The overall algorithm of SVM-RFE-MRMR is described in Algorithm 2.

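Input: Training samples $X_0 = [x_1, x_2, \dots, x_l]^{\mathrm{T}}$, class labels $y = [y_1, y_2, \dots, y_l]^{\mathrm{T}}$.
Output: Ranked feature list $r$.
(A) Initialize: the original feature subset $s = [1, 2, \dots, n]$ and the ranked feature list $r = [\ ]$.
(B) Loop the following procedure until $s = [\ ]$:
(1) while $s \neq [\ ]$ do
(2) Train an SVM with the features in $s$
(3) for all $i$th features in $s$ do
(4) Compute $c_i$ using equation (14)
(5) Compute $m_i$ using equation (19)
(6) end for
(7) Find and using equations (20) and (21), respectively
(8) for all $i$th features in $s$ do
(9) Compute the ranking score $R_i$
(10) end for
(11) $f = \arg\min_i R_i$
(12) $r = [s(f), r]$
(13) $s = s \setminus s(f)$
(14) end while
(15) return $r$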
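The elimination loop of Algorithm 2 can be sketched as follows. The exact normalization of the ranking score is specific to the proposed method, so this sketch assumes a hypothetical combination $R_i = \beta c_i / \max|c| + (1 - \beta) m_i / \max|m|$ with $\beta \in [0, 1]$, and an MRMR criterion $m_i$ equal to the relevance $I(f_i; c)$ minus the mean MI with the other surviving features. The SVM trainer is passed in as a callable (`svm_weights`, hypothetical) so that the loop does not depend on any SVM library.

```python
def combined_scores(c, m, beta=0.5):
    """Hypothetical ranking score: beta * c_i / max|c| + (1 - beta) * m_i / max|m|."""
    cmax = max(abs(v) for v in c) or 1.0
    mmax = max(abs(v) for v in m) or 1.0
    return [beta * ci / cmax + (1 - beta) * mi / mmax for ci, mi in zip(c, m)]


def svm_rfe_mrmr(n_features, svm_weights, relevance, mi, beta=0.5):
    """Backward elimination by the combined score (most important feature first).

    svm_weights(subset) -> weights of an SVM trained on `subset` (caller-supplied);
    relevance[i] = I(f_i; c); mi[i][j] = I(f_i; f_j).
    """
    surviving = list(range(n_features))
    ranked = []
    while surviving:
        w = svm_weights(surviving)
        c = [wi ** 2 for wi in w]                          # SVM-RFE criterion c_i
        if len(surviving) > 1:                             # MRMR criterion m_i
            m = [relevance[i] - sum(mi[i][j] for j in surviving if j != i)
                 / (len(surviving) - 1) for i in surviving]
        else:
            m = [relevance[surviving[0]]]
        r = combined_scores(c, m, beta)                    # ranking score R_i
        worst = min(range(len(surviving)), key=lambda k: r[k])
        ranked.insert(0, surviving.pop(worst))
    return ranked


# Usage with a stub "SVM" whose weights are fixed per feature:
# f0 and f1 are strongly redundant with each other, f2 is independent of both.
base = {0: 3.0, 1: 2.8, 2: 2.5}


def stub(subset):
    return [base[i] for i in subset]


relevance = [0.7, 0.7, 0.5]
mi = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
with_mrmr = svm_rfe_mrmr(3, stub, relevance, mi, beta=0.5)
svm_only = svm_rfe_mrmr(3, stub, relevance, mi, beta=1.0)
```

With `beta=1.0` the loop reduces to plain SVM-RFE and ranks the redundant `f1` second (`[0, 1, 2]`); with the MRMR term included, `f1` is eliminated first and the independent `f2` moves up (`[0, 2, 1]`).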

4. Fault Identification and Analysis for Reducer

4.1. Experimental Device

Fault diagnosis of a reducer relies on vibration data, but multiple fault signals are difficult to acquire in practice. Therefore, a reducer simulation platform is constructed to acquire fault signals. The mechanical system of the whole device is shown in Figure 6(a), and the numbering of its components is shown in Figure 6(b). The system mainly consists of a motor, a reducer, a magnetic powder brake, and a vibration sensor. The vibration sensor is installed at the 4# bearing, and the magnetic powder brake simulates the mechanical load through its adjustable voltage. In the experiment, a tooth of gear b, part of the inner circle of the 4# bearing, and part of the outer circle of the 4# bearing were cut off by wire cutting. Figure 7 shows the broken tooth fault, inner circle fault, and outer circle fault, respectively. The sampling frequency was 4 kHz, and the motor speed was 1420 r/min. The original waveforms of the various states are shown in Figure 8.

4.2. Fault Feature Generation

The fault diagnosis process for the reducer under different states is shown in Figure 9. After acquiring the original vibration waveforms, we select signal segments that reflect the faults. We then use the improved LMD algorithm to decompose the four kinds of vibration signals and obtain the corresponding PFs (Figures 10–13). Extracting effective feature parameters is particularly important. Common fault features include amplitude domain, time domain, frequency domain, and time-frequency features. Approximate entropy, a time domain feature, has good noise immunity, anti-interference ability, and robustness. Power spectral entropy, a frequency domain feature, represents the distribution of vibration energy in the frequency domain: the more uniform the energy distribution, the less concentrated the signal and the more complex it is. Therefore, in this paper we select the PF that contains the fault characteristic frequency and calculate its root mean square, kurtosis, approximate entropy, and power spectral entropy. All these parameters are combined to form the original feature set shown in Table 1; only part of the data is listed owing to limited space.
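The four features can be sketched in pure Python as below. This is a minimal sketch under stated assumptions: kurtosis is the non-excess form $m_4/m_2^2$, approximate entropy uses Pincus's definition with embedding dimension `m=2` and tolerance `r = 0.2 × standard deviation` (common defaults, not values given in this paper), and power spectral entropy is the Shannon entropy of the normalized one-sided power spectrum computed by a direct DFT (adequate for short segments only).

```python
import math


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """Non-excess kurtosis m4 / m2^2 (equals 3 for a Gaussian)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / m2 ** 2


def approximate_entropy(x, m=2, r=None):
    """Pincus ApEn(m, r); r defaults to 0.2 times the standard deviation."""
    n = len(x)
    if r is None:
        mu = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mu) ** 2 for v in x) / n)

    def phi(mm):
        templates = [x[i:i + mm] for i in range(n - mm + 1)]
        total = 0.0
        for a in templates:  # fraction within Chebyshev distance r, self-match included
            count = sum(1 for b in templates
                        if max(abs(p - q) for p, q in zip(a, b)) <= r)
            total += math.log(count / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


def power_spectral_entropy(x):
    """Shannon entropy (nats) of the normalized one-sided power spectrum via a direct DFT."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
        power.append(re * re + im * im)
    total = sum(power)
    return -sum(p / total * math.log(p / total) for p in power if p > 0)


# Usage: a single tone concentrates its energy in one bin; two equal tones split it.
tone = [math.cos(2 * math.pi * 5 * t / 64) for t in range(64)]
two_tones = [math.cos(2 * math.pi * 5 * t / 64) + math.cos(2 * math.pi * 9 * t / 64)
             for t in range(64)]
features = [rms(tone), kurtosis(tone), approximate_entropy(tone), power_spectral_entropy(tone)]
```

The spectral entropy of `tone` is close to 0 and that of `two_tones` is close to ln 2, matching the statement that more uniformly spread energy means a more complex signal.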

The feature parameters extracted from vibration signals under 4 states are arranged in sequence into a matrix of 120 rows and 8 columns, and each column of the matrix is normalized to .

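The normalization is expressed as
$$x'_{ij} = \frac{x_{ij} - x_{j,\min}}{x_{j,\max} - x_{j,\min}},$$
where $i = 1, 2, \dots, 120$, $j = 1, 2, \dots, 8$, $x_{j,\min}$ is the minimum value of the $j$th column $x_j$, and $x_{j,\max}$ is the maximum value of $x_j$.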
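Column-wise min-max scaling is short enough to show in full. This minimal sketch assumes a linear map of each column onto a target interval `[a, b]` (defaulting to [0, 1] as an assumption) and maps constant columns to the lower bound; `minmax_columns` is a hypothetical name.

```python
def minmax_columns(matrix, a=0.0, b=1.0):
    """Linearly map each column of `matrix` onto [a, b]; constant columns map to a."""
    cols = list(zip(*matrix))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [[a + (b - a) * (v - l) / (h - l) if h > l else a
             for v, l, h in zip(row, lo, hi)]
            for row in matrix]


# Usage on a small 3 x 2 matrix (the feature matrix in this paper is 120 x 8).
scaled = minmax_columns([[1.0, 10.0], [3.0, 20.0], [2.0, 15.0]])
```

In practice the minima and maxima would be computed on the training samples and reused for the test samples, so that both sets share the same scaling.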

4.3. Multifault Classification Model Training

There are 4 kinds of operating states in this experiment, so this is a multiclass classification problem. The multifault classification model is trained as follows:

First, we label the operating states as in Table 1: the normal state is labeled 1, the broken tooth fault 2, the inner circle fault 3, and the outer circle fault 4.

Second, 6 binary classifiers are constructed, namely (1v2), (1v3), (1v4), (2v3), (2v4), and (3v4). For each binary classifier, the corresponding samples are used as its training set, yielding 6 trained models, which are then evaluated on the corresponding test sets. The entire feature set contains 120 samples, each with 8 features. Rows 1–30 represent the normal state (label 1), rows 31–60 the broken tooth fault (label 2), rows 61–90 the inner circle fault (label 3), and rows 91–120 the outer circle fault (label 4). We randomly select two-thirds of the samples of each label as training samples and use the rest as test samples.
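The one-against-one setup and the per-label split can be sketched as follows. This is a minimal sketch, assuming majority voting over the 6 pairwise decisions with ties resolved toward the smaller label and a seeded random split; the function names and tie rule are illustrative assumptions, and the binary classifiers themselves are not included.

```python
import itertools
import random
from collections import Counter


def one_vs_one_pairs(labels):
    """All unordered class pairs: k classes give k(k-1)/2 binary classifiers."""
    return list(itertools.combinations(sorted(set(labels)), 2))


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Randomly draw `train_fraction` of each class for training; the rest is the test set."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        k = round(len(idx) * train_fraction)
        train += idx[:k]
        test += idx[k:]
    return sorted(train), sorted(test)


def ovo_vote(pair_winners):
    """Majority vote over the binary decisions for one sample; ties go to the smaller label."""
    votes = Counter(pair_winners.values())
    return max(sorted(votes), key=lambda lab: votes[lab])


# Usage with the layout above: rows 1-30, 31-60, 61-90, 91-120 carry labels 1-4.
labels = [1] * 30 + [2] * 30 + [3] * 30 + [4] * 30
pairs = one_vs_one_pairs(labels)
train_idx, test_idx = stratified_split(labels)
```

With 30 samples per label, each label contributes 20 training and 10 test samples, giving 80 and 40 in total.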

Finally, we use the test samples to evaluate the accuracy of the classification model.

Optimal model parameters must be searched for during classification training. Common parameter optimization methods include the grid search method, the genetic algorithm (GA), and PSO. The search processes of the three methods and the corresponding test results are shown in Figures 14, 15, and 16, respectively, and the classification accuracy of each method is listed in Table 2. As the table shows, the two model parameters found by the three methods are very close and their classification accuracies agree. The advantage of grid search is that it searches multiple parameters simultaneously; because the parameter pairs are independent, the search is easy to parallelize and takes little time when only a few parameters are optimized. Moreover, grid search finds the global optimum when the search interval is large enough and the step size is small enough [24]. GA has good global search performance, but its search is slow and its solution efficiency is low. PSO has strong local search performance and fast convergence, but it is prone to premature convergence and may fall into a local optimum [25, 26]. Therefore, the SVM model parameters are selected by grid search in this paper.
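A grid search over the penalty parameter and the RBF kernel parameter can be sketched as below. This is a minimal sketch, assuming an exponential grid borrowed from the LIBSVM practical guide's suggested coarse grid ($C = 2^{-5}, 2^{-3}, \dots, 2^{15}$, $\gamma = 2^{-15}, 2^{-13}, \dots, 2^{3}$, with $\gamma = 1/(2\sigma^2)$) and a caller-supplied `score` function; the toy score surface is purely illustrative.

```python
import math


def grid_search(score, log2_c=range(-5, 16, 2), log2_g=range(-15, 4, 2)):
    """Evaluate score(C, gamma) on an exponential grid; return (best_score, C, gamma)."""
    best = None
    for lc in log2_c:
        for lg in log2_g:
            C, gamma = 2.0 ** lc, 2.0 ** lg
            s = score(C, gamma)
            if best is None or s > best[0]:   # strict '>' keeps the first of tied points
                best = (s, C, gamma)
    return best


# Usage with a toy score surface peaked at C = 2**3, gamma = 2**-1; in practice
# `score` would be the cross-validated accuracy of an SVC trained with (C, gamma).
def toy(C, gamma):
    return -((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 1) ** 2)


best_score, best_C, best_gamma = grid_search(toy)
```

Each grid point is evaluated independently, which is why the search parallelizes easily; a finer grid around the best coarse point can then refine the result.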

4.4. Fault Feature Selection

The process of selecting the optimal feature subset via SVM-RFE-MRMR is shown in Figure 17. ① Train a support vector classifier (SVC) on the training set of the overall feature set to obtain a classification model. ② Use the classification model to calculate the ranking scores, and eliminate the feature with the minimum ranking score according to the ranking criterion. ③ Retrain the SVM on the remaining features to obtain new ranking scores, repeating until a complete feature ranking table is obtained. ④ Use the ranking table to define several nested feature subsets and train the SVM on each; the prediction accuracy of the SVM is used to evaluate these subsets and obtain the optimal feature subset. ⑤ Use two criteria, the leave-one-out cross-validation error rate on the training set (Loo Error Rate) and the error rate on the independent test set (Test Error Rate), to determine the optimal feature subset.
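Steps ④ and ⑤ can be sketched as building the nested subsets from a ranked list and keeping the one with the lowest leave-one-out error. To stay self-contained, this sketch swaps in a nearest-centroid classifier for the SVC and uses only the Loo Error Rate; resolving ties toward the smaller subset is an assumption consistent with preferring fewer features. All names are hypothetical.

```python
import math


def nested_subsets(ranked):
    """F1 ⊂ F2 ⊂ ... : the top-1, top-2, ... features of a ranked list."""
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def nearest_centroid(train_X, train_y, x):
    """Assign x to the class whose centroid is closest (stand-in for an SVC)."""
    best_lab, best_d = None, math.inf
    for lab in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == lab]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = sum((p - q) ** 2 for p, q in zip(x, centroid))
        if d < best_d:
            best_lab, best_d = lab, d
    return best_lab


def loo_error_rate(X, y, features):
    """Leave-one-out error rate using only the given feature indices."""
    proj = [[row[f] for f in features] for row in X]
    errors = 0
    for i in range(len(X)):
        rest_X = proj[:i] + proj[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        errors += nearest_centroid(rest_X, rest_y, proj[i]) != y[i]
    return errors / len(X)


def best_nested_subset(X, y, ranked):
    """Lowest LOO error; the strict '<' keeps the smaller subset on ties."""
    best = None
    for subset in nested_subsets(ranked):
        err = loo_error_rate(X, y, subset)
        if best is None or err < best[0]:
            best = (err, subset)
    return best


# Usage: feature 0 separates the two classes, feature 1 is large-amplitude noise.
X = [[0.0, 5.0], [0.1, -5.0], [0.2, 0.0], [1.0, -5.0], [1.1, 5.0], [1.2, 0.0]]
y = [0, 0, 0, 1, 1, 1]
result = best_nested_subset(X, y, ranked=[0, 1])
```

The same loop with an SVC and an independent test set in place of leave-one-out would give the Test Error Rate criterion.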

Following these steps, we use the training set designed in Section 4.3 to train the SVC and obtain the optimized parameters and . These parameters are then used to rank the influence of each feature; the result is shown in Table 3. The resulting nested feature subsets are shown in Table 4, where the relationship among the subsets is . Next, the classification accuracy is calculated from the optimized parameters and the nested feature subsets obtained in the first step, and the Loo Error Rate criterion is used to determine the optimal feature subset; the result is shown in Table 5. Finally, the nested feature subset is selected to train the SVC, yielding the optimized parameters , . These two parameters and the test set are used to calculate the classification accuracy and the Test Error Rate as before, which evaluate the performance of the predictive model. Table 6 shows that the Test Error Rate of feature subset is the lowest.

Comparing Tables 5 and 6, we conclude that (a) different combinations of features can achieve the same effect; (b) some fault features contain similar information; and (c) the optimal feature subset contains the fewest features.

5. Conclusions

This paper presents a fault diagnosis method that combines improved LMD with support vector machine–recursive feature elimination–minimum redundancy maximum relevance (SVM-RFE-MRMR) to address feature redundancy in reducer vibration signals. The approach reduces the dimensionality of the original feature set by discarding features that contribute little to classification or are insensitive to faults, and it retains the best feature subset formed by the optimal feature parameters. Experimental results verify that the proposed approach is reliable and achieves high classification accuracy, providing a useful reference for condition monitoring of rotating machinery. In addition, the method can be applied to multichannel fault signal processing and to the problem of high-dimensional data with small samples.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been supported in part by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, University-Industry Cooperation Research Project in Jiangsu Province under Grant BY2016026-02, and the State Key Laboratory of Integrated Services Networks under Grant ISN10-10.