Multiscale Permutation Entropy Based Rolling Bearing Fault Diagnosis
A new rolling bearing fault diagnosis approach based on multiscale permutation entropy (MPE), Laplacian score (LS), and support vector machines (SVMs) is proposed in this paper. Permutation entropy (PE) was recently proposed to measure the randomness and detect the dynamical changes of time series. However, owing to the complexity of mechanical systems, the randomness and dynamic changes of a vibration signal exist at different scales. Thus, MPE is introduced and employed to extract the nonlinear fault characteristics from the bearing vibration signal at different scales. The SVM is then used to classify the fault features so that the diagnostic procedure runs automatically. To avoid a high-dimensional feature vector, the LS is used to refine the features by ranking them according to their importance and their correlation with the main fault information. Finally, the rolling bearing fault diagnosis method based on MPE, LS, and SVM is applied to experimental data. The analysis results indicate that the proposed method identifies the fault categories effectively.
1. Introduction
The vibration signals of mechanical systems, especially of faulty ones, often show abrupt changes, nonlinearity, and nonstationarity because of impacts, sudden speed changes, structural changes, loading, and friction. Hence, it is crucial for mechanical fault diagnosis to extract the fault feature information from nonlinear and nonstationary signals. A primary method for dealing with such signals is time-frequency analysis, which has been widely applied in mechanical fault diagnosis for its ability to provide local information on vibration signals in both the time and frequency domains. However, time-frequency analysis methods such as the wavelet transform or the Hilbert-Huang transform [3, 4], which decompose the vibration signal into several stationary monocomponent signals, cannot effectively reflect the subtle dynamic changes of the vibration signal and therefore have inherent limitations.
With the development of nonlinear dynamic theories, a number of nonlinear parameters and methods, such as chaos theory, fractal dimension, and information entropy, have in recent years been applied to machine condition monitoring and fault diagnosis. For instance, Logan and Mathew elaborated the application of the correlation dimension to vibration fault diagnosis of rolling element bearings; Jiang et al. used the correlation dimension in gearbox condition monitoring. However, reliable estimation of the correlation dimension requires very long datasets, which may be difficult or even impossible to obtain, especially in online, real-time monitoring and diagnosis. More recently, approximate entropy (ApEn) was introduced as a tool for rolling bearing health monitoring by Yan and Gao. Unfortunately, the estimation of ApEn depends heavily on the data length; the estimated value is uniformly lower than the expected one, especially for short datasets, and it lacks relative consistency as well [8, 9]. To overcome the shortcomings of ApEn, sample entropy (SampEn) was proposed by Richman and Moorman [9, 10]. However, ApEn and SampEn both measure the complexity of a time series at a single scale. Based on SampEn, multiscale entropy (MSE) was introduced by Costa et al. as an enhanced approach to evaluate the complexity of time series at different scales [11, 12]. MSE has recently been used by Zhang et al. to extract fault feature information from rolling bearing vibration signals. However, the SampEn estimate is affected by the nonstationarity, outliers, and artifacts of the time series, which change its standard deviation and hence the similarity criterion, and this leads to a poor estimate of MSE. In addition, the computation of MSE is very time-consuming, especially for very long time series.
Recently, permutation entropy (PE) was proposed by Bandt and Pompe [14, 15] for measuring the randomness and detecting the dynamic changes of time series. Compared with the parameters mentioned above, PE is simple to compute, robust to noise, and suitable for online monitoring. Yan and Liu viewed PE as a tool for status characterization of rotary machines, and their research indicates that PE can effectively detect and amplify the dynamic changes of rolling bearing vibration signals. Nicolaou and Georgiou used PE and SVMs to detect epileptic seizures in electroencephalogram signals. Their findings indicate that the low computational complexity of PE makes it a highly favorable feature for a real-time automated seizure detection system.
However, like the traditional single-scale nonlinear dynamic parameters ApEn and SampEn, PE detects the dynamic changes and randomness of a time series only at a single scale. Multiscale permutation entropy (MPE) was recently introduced by Aziz and Arif to measure the complexity of time series at different scales. They compared MPE with MSE on physiological time series, and the results show that MPE is more robust than MSE in the presence of artifacts and white Gaussian noise.
As the vibration signals collected from a normal rolling bearing are random and irregular, the randomness and the dynamic behavior of the vibration signal change abruptly when the rolling bearing operates under a faulty condition. Owing to the complexity of the mechanical system, the vibration signal is highly complex and carries important information at different scales. Hence, MPE is employed to detect the dynamic changes and fault features of the rolling bearing vibration signal.
In this paper, the PE values at different scales first serve as the initial feature parameters that extract fault feature information from the bearing vibration signal. Because the feature vector includes MPE values at many scales, it has a high dimension and redundant information, and it is difficult to find the features that contain the main fault information. The LS proposed by He et al. is therefore employed to refine the feature vector and rank the features according to their importance. The several most important features then form the new feature vector for SVM training and testing. Finally, a multifault classifier is needed to carry out the diagnostic procedure automatically. As the support vector machine (SVM) is well suited to small-sample classification and trains quickly, it is adopted in this paper to construct the multifault classifier [19–21].
The rest of the paper is organized as follows. In the second section, the definitions of PE and MPE are introduced. In the third section, the Laplacian score (LS) is introduced first, and then a new rolling bearing fault diagnosis method based on MPE, LS, and SVM is proposed. In the fourth section, the proposed method is applied to rolling bearing experimental data and some comparisons are made. Finally, the fifth section concludes the paper.
2. Algorithms of PE and MPE
2.1. Algorithm of PE
Permutation entropy (PE) was introduced recently by Bandt and Pompe [14, 15] to detect dynamic changes of time series. It is based on the comparison of neighboring values and therefore has the advantages of simple computation and low computational cost. Besides, it has been verified that, similar to Lyapunov exponents, PE is particularly useful and robust in the presence of dynamic or observational noise. Its algorithm is described as follows.
Consider a time series $\{x(i),\ i = 1, 2, \ldots, N\}$ with length $N$, which can be reconstructed as

$$X(i) = \{x(i),\ x(i+\lambda),\ \ldots,\ x(i+(m-1)\lambda)\}, \quad i = 1, 2, \ldots, N-(m-1)\lambda,$$

where $m$ is the embedding dimension and $\lambda$ is the time delay. As described in [16, 22], for a given but arbitrary $i$, the $m$ real values contained in each $X(i)$ can be rearranged in increasing order as

$$x(i+(j_1-1)\lambda) \le x(i+(j_2-1)\lambda) \le \cdots \le x(i+(j_m-1)\lambda).$$

If two elements of $X(i)$ have the same value, for example, $x(i+(j_1-1)\lambda) = x(i+(j_2-1)\lambda)$, they are ordered according to the values of their corresponding $j$'s; namely, if $j_1 < j_2$, then $x(i+(j_1-1)\lambda) \le x(i+(j_2-1)\lambda)$ is written. Accordingly, any vector $X(i)$ can be mapped onto a group of symbols

$$S(l) = (j_1, j_2, \ldots, j_m),$$

where $l = 1, 2, \ldots, k$ and $k \le m!$. Here $m!$ is the largest number of distinct symbols, and $S(l)$ is one of the $m!$ permutations of the $m$ distinct symbols $(j_1, j_2, \ldots, j_m)$, which is mapped onto the $m$ number symbols $(1, 2, \ldots, m)$ in the $m$-dimensional embedding space. When each such permutation is considered as a symbol, the reconstructed trajectory in the $m$-dimensional space is represented by a symbol sequence $\{S(l)\}$.

Therefore, if the probability distribution of the distinct symbols is denoted by $P_1, P_2, \ldots, P_k$, where $k \le m!$, then the PE of the time series can be defined as the Shannon entropy of the $k$ distinct symbols:

$$H_p(m) = -\sum_{j=1}^{k} P_j \ln P_j.$$

$H_p(m)$ attains its maximum value, $\ln(m!)$, when $P_j = 1/m!$. For convenience, $H_p(m)$ can be normalized by $\ln(m!)$ as

$$H_p = \frac{H_p(m)}{\ln(m!)}.$$

Obviously, $0 \le H_p \le 1$. A smaller value of $H_p$ indicates that the time series is more regular, and the smallest value of $H_p$ (zero) means that the time series is very regular, as for a periodic signal. A larger $H_p$ means a more random time series, and the largest possible value of $H_p$ (one) is reached when all permutations have equal probability, as in the case of white noise. Therefore, PE is a very suitable tool for describing local order structure and amplifying the dynamic changes of time series.
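The definition above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' MATLAB implementation: the function name `permutation_entropy`, its argument names, and the default `delay=1` are assumptions made here, and ties are resolved by position through Python's stable sort, which mirrors the tie rule described above.

```python
import math
from collections import Counter


def permutation_entropy(x, m=6, delay=1, normalize=True):
    """Permutation entropy of a 1-D sequence (illustrative sketch).

    m is the embedding dimension and delay is the time delay; both
    defaults are assumptions chosen for this example.
    """
    n_vectors = len(x) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and delay")
    counts = Counter()
    for i in range(n_vectors):
        window = [x[i + j * delay] for j in range(m)]
        # Ordinal pattern: positions sorted by value. The sort is stable,
        # so equal values keep their original order (smaller index first).
        pattern = tuple(sorted(range(m), key=window.__getitem__))
        counts[pattern] += 1
    # Shannon entropy of the relative frequencies of the observed patterns.
    h = sum(-(c / n_vectors) * math.log(c / n_vectors) for c in counts.values())
    if normalize:
        h /= math.log(math.factorial(m))  # maximum entropy is ln(m!)
    return h
```

A strictly increasing sequence produces a single ordinal pattern and therefore a value of zero, while a sequence in which all patterns occur equally often gives a normalized value of one.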
Three parameters must be chosen in the calculation of PE, namely, the length of the time series $N$, the embedding dimension $m$, and the time delay $\lambda$. Bandt recommended $m = 3, \ldots, 7$. However, the following analysis suggests that $m = 6$ is the most suitable. To investigate the effect of $m$ and $N$ on the computation of PE, five Gaussian white noise signals with lengths 128, 256, 512, 1024, and 2048 are considered. For convenience, their PE values are denoted by PE1, PE2, PE3, PE4, and PE5, respectively. Figure 1 shows how these PE values vary with $m$ and $N$ for a fixed time delay $\lambda$.
As the Gaussian white noise signal is random, its estimated PE should be close to 1. Therefore, when $N$ is less than 2048, $m$ should be no more than 7 (where the estimated PE is smaller than 0.9). From Figure 1 it can be found that the difference between PE4 with length 1024 and PE3 with length 512 is only 0.0659 when . Hence when , is sufficient for PE calculation.
In addition, the time delay $\lambda$ has little effect on the estimation of PE. Take the Gaussian white noise signal with length 512 as an example. Its PE is shown in Figure 2 with $\lambda$ ranging from 1 to 6 for different embedding dimensions ($m$ ranging from 2 to 8). Figure 2 shows that the PE values differ very little between time delays. Therefore, a single fixed value of $\lambda$ is used throughout this paper.
2.2. Calculation of MPE
Multiscale permutation entropy (MPE) is defined as the set of PE values of a time series at different scales and is calculated as follows.
(1) Consider the time series $\{x(i),\ i = 1, 2, \ldots, N\}$, which can be divided into several coarse-grained time series of the form

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x(i), \quad 1 \le j \le \lfloor N/\tau \rfloor,$$

where $\tau$ is the scale factor. Obviously, when $\tau = 1$, $y^{(1)}$ is the original time series. When $\tau > 1$, $y^{(\tau)}$ is a coarse-grained time series with length $\lfloor N/\tau \rfloor$.
(2) Calculate the PE of each coarse-grained time series under the same parameters, and then plot these PE values as a function of the scale factor $\tau$. We call this procedure multiscale permutation entropy analysis.
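The two steps above can be sketched as follows. This is an illustrative standard-library Python version, not the authors' code: the coarse-graining is written as the usual non-overlapping window average, and the function names, the default `delay=1`, the series length 2048, and the choice `m=4` in the usage lines are assumptions made for the example.

```python
import math
import random
from collections import Counter


def permutation_entropy(x, m, delay=1):
    """Normalized PE; ties are broken by position via the stable sort."""
    n_vectors = len(x) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and delay")
    counts = Counter(
        tuple(sorted(range(m), key=lambda j: x[i + j * delay]))
        for i in range(n_vectors)
    )
    h = sum(-(c / n_vectors) * math.log(c / n_vectors) for c in counts.values())
    return h / math.log(math.factorial(m))


def coarse_grain(x, scale):
    """Average consecutive, non-overlapping windows of `scale` samples."""
    n_out = len(x) // scale
    return [sum(x[j * scale:(j + 1) * scale]) / scale for j in range(n_out)]


def multiscale_permutation_entropy(x, m, delay=1, max_scale=12):
    """PE of the coarse-grained series for scale factors 1..max_scale."""
    return [
        permutation_entropy(coarse_grain(x, s), m, delay)
        for s in range(1, max_scale + 1)
    ]


# Usage on synthetic Gaussian white noise (seeded for repeatability).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
mpe = multiscale_permutation_entropy(noise, m=4)
```

The result `mpe` holds one PE value per scale factor, so it can be plotted against the scale factor or used directly as a 12-dimensional feature vector.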
To select the best $m$ for MPE calculation, we take a Gaussian white noise signal of length $N$ as an example. The MPEs are calculated for embedding dimensions $m = 4$, 5, 6, and 7, with the maximal scale factor and the time delay held fixed. The corresponding computation times are 0.1880 s, 0.6710 s, 3.8290 s, and 27.6710 s on a desktop computer with a 2.0 GHz Pentium Dual-Core CPU, 2.0 GB RAM, and MATLAB (R2011a). The MPE is plotted as a function of the scale factor in Figure 3. From Figure 3 it can be concluded that when $m$ is less than 6 ($m = 4$ and 5), the PE values change very slowly as the scale factor $\tau$ increases, staying close to 1, and cannot reflect the dynamic changes sensitively. However, if $m$ is too large (e.g., $m = 7$), the calculation of PE takes much longer (27.6710 s for the data of length $N$) and the PE value is lower than the expected one. Since the Gaussian white noise signal should have an expected PE value close to 1, these considerations suggest that $m = 6$ may be the most suitable.
3. The Proposed Method
3.1. Laplacian Score (LS) for Feature Selection
Theoretically, the MPE features extracted at 12 scales are able to identify the fault categories. However, a high-dimensional feature vector makes fault diagnosis time-consuming and carries redundant information. It is therefore necessary to select, from the 12 features, the most important ones that contain the main fault information, which avoids the curse of dimensionality and improves the performance and efficiency of automatic rolling bearing fault diagnosis.
The Laplacian score (LS) is a popular ranking-based feature selection method founded on Laplacian eigenmaps and locality preserving projection. Its basic idea is to evaluate the features according to their locality preserving power. In the LS algorithm, the features with the lowest scores are chosen as the most important ones. LS has not been widely used for feature selection in rolling bearing fault diagnosis; in this paper it is employed to reduce the dimension of the initial fault features and select the most important features to represent the main fault information of the vibration signal.
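The paper does not spell out the LS computation, so the following is only a minimal sketch of the commonly cited formulation of He et al.: a symmetric k-nearest-neighbour graph with heat-kernel weights, and for each feature the ratio of its graph-Laplacian quadratic form to its degree-weighted variance. The function name `laplacian_scores`, the neighbour count `k`, and the kernel width `t` are assumptions chosen for illustration.

```python
import math


def laplacian_scores(X, k=5, t=1.0):
    """Laplacian score of each feature (column) of X; lower is better.

    X is a list of samples, each a list of feature values. The graph
    settings k and t are illustrative assumptions.
    """
    n, d = len(X), len(X[0])
    dist2 = [
        [sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
        for i in range(n)
    ]
    # Symmetric k-nearest-neighbour graph with heat-kernel edge weights.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist2[i][j])
        for j in others[:k]:
            S[i][j] = S[j][i] = math.exp(-dist2[i][j] / t)
    degree = [sum(row) for row in S]
    total_degree = sum(degree)
    scores = []
    for r in range(d):
        f = [X[i][r] for i in range(n)]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, degree)) / total_degree
        f = [fi - mean for fi in f]
        # f^T L f with L = D - S, written as a sum over weighted edges.
        num = 0.5 * sum(
            S[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n)
        )
        # f^T D f, the degree-weighted variance of the feature.
        den = sum(fi * fi * di for fi, di in zip(f, degree))
        scores.append(num / den if den > 0 else float("inf"))
    return scores
```

A feature that varies little between neighbouring samples but strongly across the whole dataset receives a low score, which is why the lowest-scoring features are kept.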
3.2. The Proposed Method
Based on the advantages of MPE, LS, and SVM, the proposed rolling bearing fault diagnosis method is described as follows.
(1) Calculate the MPE of each rolling bearing vibration signal with embedding dimension $m = 6$, the selected time delay $\lambda$ and data length $N$, and the maximum scale factor $\tau_{\max} = 12$.
(2) The MPEs obtained at all scales (i.e., 12 PEs) are taken as the initial feature vector representing the main fault information of the vibration signal.
(3) LS is employed to rank the 12 features from low to high score according to their importance and their relationship with the fault information.
(4) The first several features with the lowest scores are selected as the new feature vector.
(5) The new feature vectors are used to train and test the SVM-based multifault classifier, so that fault diagnosis is carried out automatically.
The proposed method can be described briefly as in Figure 4.
In step (4) of the proposed method, too many features increase the training time and introduce redundant information, while too few cannot fully reflect the fault information and lead to lower accuracy. The new feature vector in this paper is therefore constructed from the first five features with the lowest LSs to achieve an effective fault diagnosis.
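Step (4) amounts to sorting the scores and keeping the columns with the lowest values. The sketch below shows this selection in standard-library Python; the function names are hypothetical and the scores and feature values in the usage lines are made-up toy numbers, not results from the paper.

```python
def select_lowest_scores(scores, n_keep=5):
    """Indices of the n_keep features with the lowest scores, best first."""
    order = sorted(range(len(scores)), key=lambda r: scores[r])
    return order[:n_keep]


def reduce_features(vectors, keep):
    """Keep only the selected columns of each feature vector, in ranked order."""
    return [[v[r] for r in keep] for v in vectors]


# Toy usage with four features and made-up scores.
toy_scores = [0.4, 0.1, 0.9, 0.2]
keep = select_lowest_scores(toy_scores, n_keep=2)
reduced = reduce_features([[10, 11, 12, 13], [20, 21, 22, 23]], keep)
```

Note that the indices returned here are zero-based, whereas the scale factors in the text are counted from 1.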
4. Analysis of Experimental Data
4.1. Experimental Data Description
The rolling bearing experimental data analyzed in this paper were kindly provided by the Bearing Data Center, Case Western Reserve University [8, 23]. The tested bearing is a 6205-2RS JEM SKF deep groove ball bearing, with a motor load of about 2206.50 W and a motor speed of 1730 r/min. Three rolling bearings with outer race fault (ORF), inner race fault (IRF), and rolling element fault (REF) are considered; single point faults with a diameter of 0.5334 mm and a depth of 2.794 mm were introduced into the tested bearings using electrodischarge machining. The data collection system consists of a high-bandwidth amplifier designed for vibration signals and a data recorder with a sampling frequency of 12000 Hz per channel. A detailed description of the signal collection can be found in [8, 23].
4.2. Results and Discussions
The vibration signals of normal (NORM) bearing and bearings with fault (ORF, IRF, REF) are depicted in Figure 5.
It is difficult to tell the normal and faulty rolling bearings apart from the waveforms alone, especially to differentiate NORM from REF and IRF from ORF. Therefore, MPE is used to analyze these signals, and their MPEs are plotted as a function of the scale factor in Figure 6.
From Figure 6 it can be found that the MPE with scale factor $\tau = 1$, namely, the PE of the original vibration signal, can detect the dynamic changes of the system when the bearing works under a faulty condition. The PE of the original vibration signal of the normal rolling bearing is smaller than the PEs of the rolling bearings with faults, which agrees with the conclusion of Yan and Liu that the PE under normal conditions is smaller than the PEs of rolling bearings with a worn rolling element or a broken cage. When the rolling bearing is damaged, a dynamic change occurs, is detected and amplified by PE, and results in a larger PE than under normal conditions. However, the single-scale PE only discriminates the faulty rolling bearings from the normal one (with a threshold of about 0.73) and cannot clearly identify the fault category, that is, REF, IRF, or ORF. As the bearing vibration signals contain important fault information at other scales, it is essential to analyze the vibration signal with the MPE method.
If the MPEs extracted at all 12 scales from the vibration signal are used as the feature vector, the computational time and complexity increase, and the redundant information decreases the classification accuracy. However, it is difficult to determine which features contain the main fault information. In the literature, statistical features of the MSE have been used to reduce the dimension of feature vectors. However, statistical features ignore the inner relations between the features. Therefore, in this paper the LS is employed to select the most important features to represent the vibration signal.
In this paper, the normal condition and three fault types (REF, IRF, and ORF) of rolling bearings are considered. Each type has 30 samples, for a total of 120 samples. By extracting the MPE from each vibration signal, 120 initial feature vectors with 12 PEs each are obtained. For each type, 10 samples are randomly chosen for training and the remaining 20 samples are used for testing. Hence, a training dataset (with dimension $40 \times 12$) and a testing dataset (with dimension $80 \times 12$) are obtained.
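The random per-class split described above can be sketched as follows. This is an illustrative standard-library Python version with a hypothetical function name and a fixed seed for repeatability; the paper does not state how its random selection was implemented.

```python
import random


def split_per_class(labels, n_train=10, seed=0):
    """Randomly pick n_train sample indices per class for training;
    the remaining indices of each class go to the test set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test


# Four bearing conditions with 30 samples each, as in the text.
labels = [c for c in ("NORM", "IRF", "REF", "ORF") for _ in range(30)]
train_idx, test_idx = split_per_class(labels)
```

With 12 MPE features per sample, the 40 training indices and 80 testing indices correspond to the $40 \times 12$ and $80 \times 12$ datasets.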
Then, the LS is used to rank the 12 features according to their importance, and the results are as follows: $LS_1 < LS_2 < LS_9 < LS_{11} < LS_{10} < LS_{12} < LS_6 < LS_7 < LS_8 < LS_3 < LS_4 < LS_5$, where the subscript stands for the scale factor. Therefore, the MPEs with $\tau = 1$, 2, 9, 11, and 10 are adopted to compose the new feature vector.
Next, a multifault classifier consisting of three SVMs, that is, SVM1, SVM2, and SVM3, is trained, where SVM1 distinguishes the normal bearing from the faulty ones, SVM2 discriminates IRF from REF and ORF, and SVM3 discriminates REF from ORF. The structure of the multifault classifier is depicted in Figure 7.
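The decision structure of the three-stage classifier can be sketched as a simple cascade. In the sketch below the three binary deciders are passed in as plain callables that stand in for the trained SVM1, SVM2, and SVM3; the function name, the argument names, and the threshold rules in the usage lines are hypothetical and serve only to show the control flow.

```python
def classify_cascade(features, is_fault, is_irf, is_ref):
    """Three-stage binary cascade.

    is_fault, is_irf, and is_ref are callables returning True or False;
    they stand in for trained SVM1, SVM2, and SVM3, respectively.
    """
    if not is_fault(features):   # stage 1: normal vs. faulty
        return "NORM"
    if is_irf(features):         # stage 2: IRF vs. {REF, ORF}
        return "IRF"
    return "REF" if is_ref(features) else "ORF"  # stage 3: REF vs. ORF


# Toy usage with made-up threshold rules in place of trained classifiers.
toy_is_fault = lambda f: f[0] > 0.5
toy_is_irf = lambda f: f[1] > 0.5
toy_is_ref = lambda f: f[2] > 0.5
label = classify_cascade([0.9, 0.1, 0.8], toy_is_fault, toy_is_irf, toy_is_ref)
```

Each later stage is consulted only when the earlier stages have not already decided the class, so a normal sample is resolved by the first decider alone.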
After the SVM-classifier is trained with the 40 training feature vectors, the remaining 80 testing feature vectors are used to test it. The outputs of the multiclassifier are shown in Table 1, from which it can be concluded that the classification accuracy of the proposed method on the testing data reaches 100% with no samples misclassified, which indicates that the proposed method can identify the fault categories effectively.
For comparison, a multiclassifier based on a back propagation (BP) neural network [24–26] consisting of two layers, in which the node numbers of the input layer and output layer are 8 and 4, respectively, is applied to the same classification problem. For convenience, NORM is labeled 1, IRF is labeled 2 (the output of the BP-classifier plus 1), REF is labeled 3 (the output plus 2), and ORF is labeled 4 (the output plus 3). The training and testing samples are the same as for the SVM-classifier above. The classification results of the BP-classifier are given in Figure 8. The results indicate that the BP-classifier also recognizes the fault categories with an accuracy of 100%. However, the training time of the BP-classifier is much longer than that of the SVM-classifier. Moreover, the accuracy of BP networks can be limited by overfitting, slow convergence, and a tendency to become trapped in local extrema.
In addition, to verify the necessity of multiscale analysis using MPE, the PE value of the original vibration signal, namely, the MPE with scale factor $\tau = 1$, is taken as the feature vector. The SVM-classifier and the BP-classifier are then trained with the same 40 training samples. The outputs of the SVM-classifier are given in Table 2, and the outputs of the BP-classifier are depicted in Figure 9. Six testing samples are misclassified by both the SVM-classifier and the BP-classifier, giving a recognition rate of 92.5%. Therefore, the results in Table 2 and Figure 9 indicate that the single-scale PE of the original signal cannot fully reflect the fault information, and it is necessary to process the vibration signal with the MPE method to obtain more fault information.
To verify that refining the feature vectors with LS is necessary and advantageous, the MPEs at scales 1, 2, 3, 4, and 5 are, without loss of generality, taken as the feature vector to train and test the SVM-classifier. After training, the outputs for the testing data are given in Table 3. Table 3 shows that two testing samples are misclassified and the identification rate is 97.5%, which is lower than that of the proposed method (100%). Therefore, the result indicates that it is essential to optimize the features using LS.
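The recognition rates quoted in this section follow directly from the size of the testing set (80 samples) and the number of misclassified samples reported for each case. As a worked check, with $n_{\text{err}}$ denoting the number of misclassified testing samples (a symbol introduced here for illustration):

```latex
\text{accuracy} = \frac{80 - n_{\text{err}}}{80}: \qquad
\frac{80 - 6}{80} = 92.5\%, \qquad
\frac{80 - 2}{80} = 97.5\%, \qquad
\frac{80 - 0}{80} = 100\%.
```

The three values correspond to the single-scale PE feature, the first five MPE scales without LS, and the proposed LS-refined feature vector, respectively.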
5. Conclusions
In consideration of the nonlinearity and nonstationarity of rolling bearing vibration signals, a novel rolling bearing fault diagnosis method based on MPE, LS, and SVM is proposed. Permutation entropy (PE) is defined to detect the dynamic changes of time series. Owing to the complexity of the mechanical system, the vibration signal contains important fault information at different scales. Therefore, in this paper MPE is adopted to extract the nonlinear fault characteristics from the vibration signal. To carry out the fault diagnosis automatically, the SVM is used to construct the multifault classifier, and to refine the feature vector and select the most important features, the Laplacian score (LS) is employed for feature selection. The proposed method is applied to rolling bearing experimental data. The SVM-classifier is compared with the BP-classifier, and the single-scale PE is compared with MPE. The comparison indicates that the proposed method achieves higher identification accuracy than the single-scale PE feature and the MPE features without LS refinement, and confirms the necessity of analyzing the vibration signal with MPE and of selecting features by LS. The proposed method is aimed at rolling bearing fault diagnosis and has been verified as effective on experimental data. However, it also leaves some open problems, such as the number of features retained after LS refinement, the construction of the multiclassifier, and its generalization to other bearings or to gear fault diagnosis, which will be addressed in future work.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the Natural Science Foundation of Hunan Province, China (Grant no. 11JJ2026), the National Natural Science Foundation of China (Grants no. 51075131 and no. 51175158), and the Hunan Provincial Innovation Foundation for Postgraduate (Grant no. CX2013B144). The first author is grateful for the kind help of C. Bandt. The authors would like to thank the anonymous peer reviewers for their valuable suggestions.
References
E. Sejdić, I. Djurović, and J. Jiang, “Time-frequency feature representation using energy concentration: an overview of recent advances,” Digital Signal Processing, vol. 19, no. 1, pp. 153–183, 2009.
N. E. Huang, Z. Shen, S. R. Long et al., “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, no. 1971, pp. 903–995, 1998.
J. D. Jiang, J. Chen, and L. S. Qu, “The application of correlation dimension in gearbox condition monitoring,” Journal of Sound and Vibration, vol. 223, no. 4, pp. 529–541, 1999.
J. S. Richman and J. R. Moorman, “Physiological time-series analysis using approximate and sample entropy,” American Journal of Physiology-Heart and Circulatory Physiology, vol. 278, no. 6, pp. H2039–H2049, 2000.
M. Costa, A. L. Goldberger, and C.-K. Peng, “Multiscale entropy analysis of complex physiologic time series,” Physical Review Letters, vol. 89, no. 6, Article ID 068102, 4 pages, 2002.
M. Costa, A. L. Goldberger, and C.-K. Peng, “Multiscale entropy analysis of biological signals,” Physical Review E, vol. 71, Article ID 021906, 2005.
C. Bandt and B. Pompe, “Permutation entropy: a natural complexity measure for time series,” Physical Review Letters, vol. 88, no. 17, Article ID 174102, 4 pages, 2002.
X. He, D. Cai, and P. Niyogi, “Laplacian score for feature selection,” in Advances in Neural Information Processing System, MIT Press, Cambridge, Mass, USA, 2005.