#### Abstract

To address the poor distinguishability and strong fuzziness of the actual degradation states of rolling bearings, a degradation state identification method based on multidomain feature fusion, manifold-learning dimension reduction, and Gath-Geva (GG) clustering is proposed. First, the rolling bearing full-life data are preprocessed by local characteristic-scale decomposition (LCD), and six typical features are extracted to compose the original multidomain feature set: relative energy spectrum entropy (LREE), relative singular spectrum entropy (LRSE), two-element multiscale entropy (TMSE), standard deviation (STD), root mean square (RMS), and root-square amplitude (XR). Then, locality preserving projection (LPP) is used to reduce the dimension of the original fused feature set, and a genetic algorithm (GA) is applied to optimize the feature fusion process. Finally, fuzzy recognition of the rolling bearing degradation state is carried out by GG clustering and the principle of maximum membership degree, and the performance of the proposed method is validated by comparing the recognition accuracy of LPP and GA-LPP.

#### 1. Introduction

Rolling element bearings are among the most important components for carrying heavy loads and maintaining constant rotational speed in rotating machines [1]. During long periods of continuous operation, the performance condition of rolling bearings changes constantly, which directly affects the performance stability of the whole machine. Therefore, identifying the rolling bearing degradation state in real time is of practical significance for extending the service life of rotating machines.

Fault features extracted from vibration signals are analyzed to determine the bearing state [2], so fault feature extraction is the basis of rolling bearing degradation state recognition. Well-chosen degradation features can characterize the degradation degree of rolling elements accurately and stably. Degradation features are mainly selected from the time domain, the frequency domain, time-frequency analysis, and signal complexity measures. Because the actual vibration signal of rolling bearings is nonlinear and nonstationary, time-domain and frequency-domain statistics characterize different degradation states of the same bearing relatively poorly. For instance, kurtosis is insensitive to initial damage [3] and can hardly characterize the slight degradation state accurately. In recent years, information entropy theory has been widely used in signal processing and fault diagnosis and has developed into different forms of entropy with different properties, such as approximate entropy (ApEn), sample entropy (SampEn), multiscale entropy (MSE), spatial information entropy (SIE), and fuzzy entropy (FuzzyEn) [4, 5]. These entropy features draw on nonlinear dynamics theory, which distinguishes them from traditional time-domain indexes, and they have been applied successfully in health monitoring and fault identification. Compared with the recognition of preset fault patterns, recognition of the rolling bearing degradation state over its whole life is more ambiguous and complex. Moreover, a single feature of vibration signals can only reflect the fault characteristics of rotating machines at a certain fault degree, which can result in problems such as recognition inaccuracy, system instability, and ambiguous recognition results [6]. To address these problems, multidomain feature fusion is widely used in degradation state recognition and fault prediction of rotating machines [7, 8].

However, a high-dimensional feature vector composed of multidomain features inevitably suffers from information redundancy and conflicting characteristics, and the effective information is easily submerged in the high-dimensional data [9]. Moreover, the use of high-dimensional data sharply increases the amount of calculation, which is not conducive to real-time identification of the rolling bearing degradation state. Manifold learning theory can identify the low-dimensional nonlinear structure hidden in high-dimensional data; thus, in recent years, manifold learning algorithms including locally linear embedding (LLE), locality preserving projection (LPP), isometric feature mapping (IsoMap), and Laplacian eigenmaps (LE) have attracted wide attention [10]. Using the neighbor graph built from the high-dimensional features, the LPP algorithm obtains their projection in the low-dimensional space. In this way, fusion and reduction of high-dimensional data are achieved. Compared with IsoMap and LLE, LPP has the advantages of simple calculation and fast processing speed. The results of LPP are closely related to the nearest neighbor parameters, for which there is no definite selection criterion; therefore, the optimized parameters are usually obtained by repeated experiments. Reference [11] proposed a modified kernel distance measure sensitivity factor to measure the ability of fault features to characterize different fault patterns. In view of this, the LPP algorithm can be optimized by taking the sensitivity factor as the objective function. When the factor reaches its maximum, the effect of LPP feature fusion is best.

Because actual rolling bearing degradation states exhibit strong fuzziness and the boundaries between different degradation states are difficult to determine, fuzzy C-means (FCM) clustering [12] and Gustafson-Kessel (GK) clustering [13] are widely used in fault diagnosis. Gath-Geva (GG) clustering improves on the FCM and GK algorithms by using a distance norm based on fuzzy maximum likelihood estimation, and its clustering effect is better [14].

Based on the above analysis, this paper puts forward a rolling bearing degradation state identification method based on fusion and dimension reduction of multidomain features and GG clustering. Six features computed from information entropy and the time domain are fused by LPP optimized by a genetic algorithm (GA-LPP) in order to separate the training points of different degradation degrees as clearly as possible. Finally, degradation state recognition is realized by GG clustering and the principle of maximum membership.

#### 2. Multidomain Feature Extraction

##### 2.1. Time-Domain Features

To fully characterize the different degradation states of rolling bearings, multiple time-domain features need to be analyzed. Common time-domain indexes include mean, standard deviation (STD), root mean square (RMS), root-square amplitude, skewness, peak to peak, waveform index, pulse index, margin index, partial slope index, and kurtosis. These features are examined from three aspects: ability to follow the degradation trend, monotonicity, and data smoothness. Three features, STD, RMS, and root-square amplitude ($X_R$), are selected to compose a three-dimensional feature matrix as follows:

$$[\mathrm{STD},\ \mathrm{RMS},\ X_R]$$
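As a minimal sketch of the three selected time-domain indexes, the function below computes them for one segment of vibration samples. It assumes the usual textbook definitions (population standard deviation, and root-square amplitude as the squared mean of the square roots of the absolute values); the function name and the sample values are illustrative and do not come from the paper.

```python
import math

def time_domain_features(x):
    """Return (STD, RMS, root-square amplitude) for a sequence of samples."""
    n = len(x)
    mean = sum(x) / n
    # Population standard deviation (divides by n); some authors divide by n - 1.
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    # Root mean square of the raw samples.
    rms = math.sqrt(sum(v * v for v in x) / n)
    # Root-square amplitude: squared mean of sqrt(|x_i|).
    xr = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    return std, rms, xr

# Illustrative segment (hypothetical values, zero mean so STD equals RMS).
segment = [1.0, -1.0, 4.0, -4.0]
std, rms, xr = time_domain_features(segment)
print(std, rms, xr)
```

In practice each full-life record would be split into groups and one feature triple computed per group, followed by normalization.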

##### 2.2. Entropy Features

Combined with LCD theory [4], relative entropy theory, and multivariate multiscale entropy theory, the entropy features including LREE, LRSE, and TMSE are constructed below.

###### 2.2.1. LREE and LRSE

(1) According to the LCD noise reduction criterion guided by the mutual correlation coefficient [15], the vibration signal is decomposed and reconstructed. Suppose that samples in degradation state and a single sample in normal state are acquired from the reconstructed signal.

(2) As the rolling bearing degradation develops, the energy at the characteristic frequency and its multiples becomes larger in the frequency spectrum obtained by LCD and the Hilbert transform. For rolling bearings, different fault modes have different vibration characteristics. For a given fault mode such as inner ring pitting, the vibration characteristic frequency can be calculated by the following formula, and its frequency multiples are integer multiples of this value:

$$f_i = \frac{Z}{2}\left(1 + \frac{d}{D}\cos\alpha\right) f_r,$$

where $Z$ is the number of rollers of the bearing, $d$ is the roller diameter, $D$ represents the pitch diameter, $\alpha$ denotes the contact angle, and $f_r$ is the rotor frequency.

(3) The sum energy of all samples at the characteristic frequency is computed as follows:

(4) At the characteristic frequency , the energy proportion of and in the sum energy is and :

(5) The LREE between normal state and degradation state is defined as follows:

(6) The singular value spectrum of normal samples and degradation samples can be obtained by singular value decomposition (SVD):

(7) Combined with the relative entropy theory, the related probabilities are defined as and :where

(8) The LRSE between normal state and degradation state is defined as follows:
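The steps above can be illustrated with a rough sketch. The first function evaluates the standard inner-ring characteristic frequency from the bearing geometry described in step (2). The other two compute a relative entropy between the normalized singular value spectra of a degraded and a normal sample, in the spirit of steps (6) to (8). The exact LRSE definition is not reproduced here, so the Kullback-Leibler form, the trajectory-matrix window, the function names, and all numeric inputs are assumptions for illustration only.

```python
import numpy as np

def inner_race_frequency(n_rollers, roller_d, pitch_d, contact_angle, shaft_freq):
    """Standard inner-ring fault frequency, in the same units as shaft_freq."""
    ratio = roller_d / pitch_d * np.cos(contact_angle)
    return 0.5 * n_rollers * (1.0 + ratio) * shaft_freq

def singular_spectrum(signal, window):
    """Normalized singular values of the trajectory (Hankel-like) matrix."""
    x = np.asarray(signal, dtype=float)
    traj = np.lib.stride_tricks.sliding_window_view(x, window)
    s = np.linalg.svd(traj, compute_uv=False)
    return s / s.sum()

def relative_entropy(p, q, eps=1e-12):
    """Kullback-Leibler divergence sum(p * ln(p / q)), an assumed form."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical geometry and speed, not the values of the tested bearing.
f_i = inner_race_frequency(8, 1.0, 4.0, 0.0, 10.0)
harmonics = [k * f_i for k in (1, 2, 3)]

# Synthetic "normal" and "degraded" samples standing in for reconstructed signals.
rng = np.random.default_rng(0)
t = np.arange(512) / 512.0
normal = np.sin(2 * np.pi * 10 * t)
degraded = normal + 0.5 * rng.standard_normal(t.size)
p_deg = singular_spectrum(degraded, 16)
q_norm = singular_spectrum(normal, 16)
lrse_like = relative_entropy(p_deg, q_norm)
print(f_i, harmonics, lrse_like)
```

The same `relative_entropy` helper could be applied to energy proportions at the characteristic frequency to obtain an LREE-like quantity, again under the assumed definition.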

###### 2.2.2. TMSE

Through LCD, there is enough degradation state information in the first two signal components whose cross-correlation coefficient is higher than others. For the two components whose sequence length is , after coarse grain, two-element embedding reconstruction, composite delay vectors, and thresholds setting, assuming that the two composite delay vectors’ embedding dimensions are and , the conditional probabilities are, respectively, and when similar capacity limit is . TMSE can be expressed as the natural logarithm of the conditional probabilities’ ratio:where is embedding vector and is delay vector.

The above three kinds of entropy features constitute another three-dimensional feature matrix, which can be expressed as

Above all, entropy features and time-domain features constitute a six-dimensional multidomain feature matrix:

#### 3. Optimized LPP Based on GA

##### 3.1. The Principle of LPP

LPP algorithm can retain the nonlinear structure and local characteristics inside the data when it is applied for high-dimensional data reduction. The algorithm principle can be shown as below [16].

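For $n$ data samples $X = [x_1, x_2, \ldots, x_n]$ in the $m$-dimensional space $\mathbb{R}^m$, the matrix $Y = [y_1, y_2, \ldots, y_n]$ contains the corresponding low-dimensional samples, where $y_i$ is a $d$-dimensional vector ($d < m$). The similarity matrix $W$ can be defined by the following formula:

$$W_{ij} = \begin{cases} \exp\left(-\dfrac{\lVert x_i - x_j \rVert^2}{t}\right), & x_i \text{ and } x_j \text{ are nearest neighbors}, \\ 0, & \text{otherwise}, \end{cases}$$

where $x_i$ and $x_j$ are the nearest neighbor points and $t$ is a constant.

The LPP algorithm can be achieved by solving the following optimization problem:

$$\min_{A} \sum_{i,j} \lVert y_i - y_j \rVert^2 W_{ij} = \min_{A} \operatorname{tr}\left(A^{T} X L X^{T} A\right),$$

which needs to satisfy $A^{T} X D X^{T} A = I$, where $L = D - W$ is the Laplacian matrix. The diagonal matrix $D$, with $D_{ii} = \sum_j W_{ij}$, reflects the density of the data distribution. Then, the transform matrix $A$ can be calculated by solving the generalized eigenvalue decomposition problem:

$$X L X^{T} a = \lambda X D X^{T} a.$$

In the above formula, the matrix $X D X^{T}$ is sometimes singular. To handle this problem, the feature set is usually projected onto a PCA subspace first, which eliminates the singularity. The following linear mapping can then be obtained:

$$y_i = A^{T} x_i, \quad A = [a_1, a_2, \ldots, a_d],$$

where $a_1, \ldots, a_d$ are the eigenvectors corresponding to the $d$ smallest eigenvalues.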
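A compact sketch of the LPP procedure described in this subsection is given below. It is not the authors' implementation: samples are stored as rows (so the eigenproblem appears in transposed form), a small ridge term replaces the PCA preprocessing step, and the generalized eigenproblem is solved by Cholesky whitening. The function name `lpp` and the parameters `k`, `t`, and `reg` are illustrative choices.

```python
import numpy as np

def lpp(X, n_components, k=5, t=1.0, reg=1e-9):
    """Basic LPP for X of shape (n_samples, n_features); returns (Y, A)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Indices of the k nearest neighbors (column 0 is the sample itself).
    idx = np.argsort(sq, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    # Heat-kernel weights on neighbor pairs, zero elsewhere.
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / t)
    W = np.maximum(W, W.T)                      # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    left = X.T @ L @ X
    right = X.T @ D @ X + reg * np.eye(X.shape[1])  # ridge instead of PCA step
    # Solve left a = lambda right a by whitening with the Cholesky factor.
    C_inv = np.linalg.inv(np.linalg.cholesky(right))
    M = C_inv @ left @ C_inv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)   # eigenvalues in ascending order
    A = C_inv.T @ vecs[:, :n_components]        # smallest eigenvalues first
    return X @ A, A

# Synthetic six-dimensional features reduced to three dimensions.
rng = np.random.default_rng(0)
features = rng.standard_normal((60, 6))
Y, A = lpp(features, n_components=3, k=8, t=10.0)
print(Y.shape, A.shape)
```

The neighbor count `k` and heat-kernel constant `t` strongly affect the embedding, which is why the text treats their selection as an open tuning problem.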

##### 3.2. Kernel Space Measure Sensitivity Factor

In order to evaluate the distinction effect of different degradation states by training samples after fusion and dimension reduction, Zheyuan et al. [17] propose that distance between different types of samples in kernel space is taken as the basis of feature evaluation. However, in clustering analysis, clustering center selection not only depends on the distance, but also depends on the degree of aggregation of the same type of points. Therefore, reference [11] takes the ratio of different types’ distance and divergence of the same type as the measure factor in kernel space. And this factor is regarded as the distinguishing criterion for high accuracy. The Gaussian radial basis kernel function is selected to calculate the distance between and . The form is as follows:

Then, the distance between two points can be expressed asOn this basis, the average distance between training samples of type and type can be calculated as follows:where ; . is the number of sample categories. and are the number of samples of type and type .

The average distance between different sample categories is

The divergence of the same sample category can be expressed aswhere is average of training samples of category .

According to the definition, the kernel space measure sensitivity factor is
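To make the idea of this subsection concrete, the sketch below computes one plausible version of such a factor: the mean Gaussian-kernel distance between samples of different categories divided by the mean kernel distance of samples to their own category mean. The precise definition used in reference [11] is not reproduced here, so the averaging scheme, the kernel width `sigma`, and the function names are assumptions.

```python
import numpy as np

def kernel_distance(x, y, sigma=1.0):
    """Distance induced by a Gaussian RBF kernel: sqrt(2 - 2 k(x, y))."""
    k = np.exp(-np.sum((x - y) ** 2, axis=-1) / (2.0 * sigma ** 2))
    return np.sqrt(np.maximum(2.0 - 2.0 * k, 0.0))

def sensitivity_factor(X, labels, sigma=1.0):
    """Ratio of between-category distance to within-category divergence."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = [X[labels == c] for c in np.unique(labels)]
    between = []
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            # All pairwise kernel distances between category i and category j.
            d = kernel_distance(groups[i][:, None, :], groups[j][None, :, :], sigma)
            between.append(d.mean())
    # Mean kernel distance of each sample to its own category mean.
    within = [kernel_distance(g, g.mean(axis=0), sigma).mean() for g in groups]
    return float(np.mean(between) / np.mean(within))

# Synthetic check: compact, well-separated categories should score higher.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]])
labels = np.repeat([0, 1], 40)
tight = centers[labels] + 0.1 * rng.standard_normal((80, 3))
loose = centers[labels] + 1.0 * rng.standard_normal((80, 3))
factor_tight = sensitivity_factor(tight, labels)
factor_loose = sensitivity_factor(loose, labels)
print(factor_tight, factor_loose)
```

A larger value indicates categories that are both farther apart and more compact, which is the property the optimization in the next subsection tries to maximize.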

##### 3.3. Optimization Based on GA

To make the fused features obtained from LPP dimension reduction distinguish different degradation states better, a genetic algorithm (GA) is applied to optimize the kernel space containing the training samples of all categories. GA is a population-based heuristic algorithm for searching for an optimal solution. The GA process mainly includes population initialization, crossover, mutation, fitness calculation (individual evaluation), and selection (population replacement). The kernel space measure sensitivity factor is taken as the fitness function, and the optimal individual corresponds to the case in which the discrimination between different degradation states is highest.

Studies have shown that the clustering effect of LPP fusion features will change along with the changing kernel space. In the interest of finding the optimal kernel space, all training samples need to do affine transformation. Take 3D fusion features as an example, one training point is set as and affine transform angles are set as and . So the affine transformation matrix is

The new sample feature points after kernel space transformation can be computed by the following equation:

The two affine transform angles are encoded as the GA individual, and the individuals are randomly generated to complete initialization. Through the GA optimization process, the angles that give the best clustering effect of the training samples are found.
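The GA loop described in this subsection can be sketched as a small real-coded search over two angles. This is only an illustration under stated assumptions: the selection, crossover, and mutation operators, the population size, the mutation probability, and the toy fitness function are all hypothetical, and in the actual method the fitness would be the kernel space measure sensitivity factor of the transformed training samples.

```python
import numpy as np

def genetic_search(fitness, bounds, pop_size=40, generations=60,
                   p_cross=0.8, p_mut=0.1, seed=0):
    """Maximize fitness over a box; returns (best_individual, best_fitness)."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    # Population initialization: uniform random individuals inside the bounds.
    pop = rng.uniform(low, high, size=(pop_size, len(low)))
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argmax(fit)].copy()
        # Selection: binary tournament between randomly drawn individuals.
        a, b = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((fit[a] >= fit[b])[:, None], pop[a], pop[b])
        # Crossover: arithmetic blend of consecutive parents.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                w = rng.random()
                children[i] = w * parents[i] + (1 - w) * parents[i + 1]
                children[i + 1] = w * parents[i + 1] + (1 - w) * parents[i]
        # Mutation: Gaussian perturbation of selected genes, clipped to bounds.
        mask = rng.random(children.shape) < p_mut
        noise = rng.normal(0.0, 0.1 * (high - low), size=children.shape)
        children = np.clip(children + mask * noise, low, high)
        children[0] = elite          # elitism keeps the best individual so far
        pop = children
    fit = np.array([fitness(ind) for ind in pop])
    best = int(np.argmax(fit))
    return pop[best], float(fit[best])

# Toy fitness with a known maximum of 2 at angles (1.0, -0.5).
def toy_fitness(angles):
    return float(np.cos(angles[0] - 1.0) + np.cos(angles[1] + 0.5))

angle_bounds = [(-np.pi, np.pi), (-np.pi, np.pi)]
best_angles, best_value = genetic_search(toy_fitness, angle_bounds)
print(best_angles, best_value)
```

Because the search is stochastic, different seeds can give slightly different optima, which is consistent with the fluctuation noted in the conclusion.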

#### 4. GG Clustering Algorithm

For the training sample set $X = \{x_1, x_2, \ldots, x_N\}$, it is assumed that each sample is made up of $n$ characteristics: $x_k = [x_{k1}, x_{k2}, \ldots, x_{kn}]$. After initialization, all samples are divided into $c$ categories; namely, the number of clustering classifications is $c$. The clustering centers of all categories are $V = \{v_1, v_2, \ldots, v_c\}$ and the membership matrix is $U = [\mu_{ik}]_{c \times N}$. The element $\mu_{ik}$ represents the membership degree of the training sample $x_k$ to the degradation state $i$. In the GG algorithm, the following objective function reaches its minimum value through the iterative adjustment of $U$ and $V$:

$$J_m(X; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \mu_{ik}^{m} D_{ik}^{2},$$

where $m$ is the weighted index, generally taken to be 2.

Different from FCM clustering, $D_{ik}$ in GG clustering denotes a distance measure calculated from the covariance matrix. In that way, data samples with different directions and shapes can be handled effectively.
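A minimal sketch of GG clustering, under common textbook formulations, is shown below. The distance is the fuzzy maximum likelihood estimate built from each cluster's fuzzy covariance matrix and prior, evaluated in the log domain to avoid overflow. The FCM-based initialization, the ridge term, the stopping rule, and the function names are implementation assumptions rather than details from the paper.

```python
import numpy as np

def fcm_memberships(X, c, m=2.0, n_iter=50, seed=0):
    """A few fuzzy C-means iterations, used only to initialize GG."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=X.shape[0]).T          # shape (c, N)
    for _ in range(n_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)
        d2 = np.sum((X[None] - V[:, None]) ** 2, axis=-1) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return U

def gath_geva(X, c, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Return (centers, memberships) with memberships of shape (c, N)."""
    X = np.asarray(X, dtype=float)
    U = fcm_memberships(X, c, m, seed=seed)
    for _ in range(max_iter):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)            # cluster centers
        prior = U.mean(axis=1)                                # cluster priors
        log_d2 = np.empty_like(U)
        for i in range(c):
            diff = X - V[i]
            # Fuzzy covariance matrix of cluster i, with a small ridge.
            F = (Um[i, :, None] * diff).T @ diff / Um[i].sum()
            F += 1e-9 * np.eye(X.shape[1])
            maha = np.sum(diff @ np.linalg.inv(F) * diff, axis=1)
            _, logdet = np.linalg.slogdet(F)
            # Log of D^2 = sqrt(det F) / prior * exp(maha / 2).
            log_d2[i] = 0.5 * logdet - np.log(prior[i] + 1e-12) + 0.5 * maha
        # Membership update mu_ik proportional to (D_ik^2)^(-1 / (m - 1)).
        score = -log_d2 / (m - 1.0)
        score -= score.max(axis=0, keepdims=True)
        U_new = np.exp(score)
        U_new /= U_new.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

# Two synthetic, well-separated groups standing in for fused feature points.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centers, memberships = gath_geva(points, c=2)
labels = memberships.argmax(axis=0)      # principle of maximum membership
print(centers, labels)
```

GG clustering is sensitive to initialization, which is why a cheaper fuzzy partition is commonly used as the starting point.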

#### 5. The Process of Degradation State Identification

The original vibration signal is preprocessed by LCD. The time-domain features of STD, RMS, and root-square amplitude and the entropy features of LREE, LRSE, and TMSE are extracted from the selected signal components to compose the original characteristic set. The degradation state recognition processes are as shown in Figure 1.

The degradation state recognition algorithm mainly contains the following key steps:

(i) *LCD Pretreatment.* According to the cross-correlation coefficient between the LCD components and the original signal, the useful components can be chosen. Considering the amount of information contained in the components and the computation time, the first two components whose coefficients are higher than the others are selected for further analysis after many tests.

(ii) *Feature Extraction and Fusion.* The six-dimensional multidomain features are fused by the LPP algorithm. The intrinsic dimension is three according to maximum likelihood estimation, so the target dimension of feature fusion is set to three. On the basis of the maximum sensitivity factor principle, the fused features are optimized by GA to find the best kernel space for clustering analysis.

(iii) The clustering centers are determined by the GG algorithm, and rolling bearing degradation identification is achieved by the principle of maximum membership degree.

#### 6. Instance Verification

##### 6.1. Experimental Platform and Data Preprocessing

The bearing full-life data used in this paper come from the Hangzhou bearing test and research center [18]. As shown in Figure 2(a), the test platform mainly consists of an ABLT-1A bearing test machine, a signal acquisition module, and state monitoring equipment. As Figure 2(b) shows, four CA-YD-139 acceleration sensors are mounted on the four bearing test stations, one per station, and connected to a DH-5920 dynamic signal acquisition instrument. Four sets of rolling bearings can be tested at the same time, and multiple sets of full-life vibration data can be stored simultaneously. In addition, four thermal resistors and a YD-1 acceleration sensor are connected to a signal amplifier to monitor the operating parameters. When a monitored index exceeds the alarm threshold, the test machine stops working.


Deep groove ball bearings are widely used in rotating machinery, so taking the typical 6204 bearing as the test object is of practical engineering significance. The real bearing in the normal state is shown in Figure 3(a). The specific parameters are listed in Table 1.


When the test bench running time reaches 9600 minutes, the machine is shut down. Inner ring pitting occurs in the bearing at station number 4, which results in bearing failure (as shown in Figure 3(b)).

The collected 960 groups of vibration data record the whole process of rolling bearing from normal state to failure state. Figure 4 shows the real-time monitoring curves of average amplitude versus time which reflect different degradation states of rolling bearing clearly. According to the change of curve amplitude and curvature, the rolling bearing performance variation can be initially divided into four states: normal state, slight degradation, severe degradation, and failure state. The details are presented in Table 2.

The original signal is decomposed by LCD into 10 intrinsic scale components (ISCs), and the first 5 ISCs are shown in Figure 5. The cross-correlation coefficient between each component and the original signal is then calculated, and the value relation is as follows:

Moreover, only the first and the third ISCs have coefficients greater than 0.5 (0.6487 and 0.5395, respectively). Therefore, these two components are taken as the signal source for degradation feature extraction.
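The component selection rule can be sketched as follows, assuming the ISCs are already available as arrays (LCD itself is not implemented here). The function name, the synthetic components, and the use of the Pearson correlation coefficient as the cross-correlation coefficient are illustrative assumptions; only the 0.5 threshold comes from the text.

```python
import numpy as np

def select_components(signal, components, threshold=0.5):
    """Return indices of components whose correlation with signal exceeds threshold."""
    signal = np.asarray(signal, dtype=float)
    # Pearson correlation coefficient between each component and the signal.
    coeffs = np.array([np.corrcoef(signal, comp)[0, 1] for comp in components])
    keep = np.flatnonzero(coeffs > threshold)
    return keep, coeffs

# Synthetic stand-ins for decomposed components (not real LCD output).
t = np.arange(1000) / 1000.0
comp_1 = np.sin(2 * np.pi * 5 * t)
comp_2 = 0.1 * np.sin(2 * np.pi * 50 * t)
comp_3 = 0.8 * np.sin(2 * np.pi * 20 * t)
raw = comp_1 + comp_2 + comp_3
selected, coeffs = select_components(raw, [comp_1, comp_2, comp_3])
print(selected, np.round(coeffs, 3))
```

In this synthetic case the first and third components pass the threshold, mirroring the selection pattern reported for the measured signal.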

##### 6.2. Degradation Feature Fusion and Optimization

According to the degradation state division in Table 2, 100 groups of normal data, 100 groups of slight degradation data, 60 groups of severe degradation data, and 30 groups of failure data are selected as training samples. The characteristic indexes of the different degradation states are extracted and normalized. The 3D time-domain feature points are shown in Figure 6. As the bearing degrades from the normal state to the failure state, these three features increase monotonically, and the failure state is clearly distinguished. However, the points of the other three degradation states are severely mixed and cannot be distinguished clearly. Although time-domain features such as RMS are easy to obtain and characterize degradation states stably, literature [19] indicates that these time-domain features are not sensitive to early bearing faults, including slight and severe degradation, until bearing failure occurs. In addition, reference [20] points out that the vibration signals of rolling bearings present nonlinear characteristics and that these three traditional time-domain features are similar to one another, so they can hardly evaluate the early degradation states of the bearings accurately. These arguments explain why the three degradation states other than failure are severely mixed and cannot be distinguished by the 3D time-domain features.

Similarly, the 3D complexity feature points made up of the entropy indexes LREE, LRSE, and TMSE (with a scale factor of 15) are shown in Figure 7. On the whole, the entropy vector can distinguish the normal state, slight degradation, and severe degradation. Nevertheless, in the failure state, the clustering effect of the training samples is unsatisfactory. Reference [21] demonstrates that entropy indexes depend solely on the probability distribution of event occurrence in bearing fault signals. They are sensitive to changes in the degradation state but are also more susceptible to spurious vibrations. When the bearing reaches the failure state, the violent change in condition mixes many spurious components into the vibration signals, and the entropy features cannot stably characterize the failure state of the bearing. Therefore, the 3D entropy features in the failure state show strong dispersion in Figure 7.

In order to improve the discrimination effect of different degradation states, the above time-domain features and entropy features need to be fused. Therefore, the six-dimensional multidomain feature vectors are input to the LPP for feature fusion and dimension reduction. In order to ensure the information exchanging among the neighborhoods, the neighborhood number should not be too small; yet if is too large, the local features can be incomplete. Generally analyzed, the size of should be between and where is the intrinsic dimension and is the number of training samples in each category. In this paper, and . Thus, .

The clustering effect is better when that is presented in Figure 8. Compared with the time-domain features and the entropy features, the degradation state distinguishing ability of the LPP fusion features is better and the clustering effects of normal state, slight degradation, and failure state are satisfying. But the robustness of fusion features in severe degradation state is relatively poor and this results in the fact that the same severe degradation state is divided into two sample parts. Meanwhile, the sample class spacing is relatively small and the clustering effect is not good. So the process of feature fusion needs to be optimized.

The kernel space measure sensitive factor is taken as the objective function. According to formula (22) and formula (23), the kernel space is optimized by GA so that the factor has a maximum value. In order to improve the convergence speed and ensure the search quality, the population size is set as = 20~200. After several experiments, . The larger the crossover probability is, the higher the loss rate of excellent results is. But when the probability is too small, the search will be blocked. In general, crossover probability = 0.6~1.0 and here it is 0.8. Mutation probability generally should not be too large; otherwise GA will become a random search method and the precision and speed of convergence will be influenced. Therefore, the mutation probability .

As shown in Figure 9, after 26 iterations, the kernel space measure sensitivity factor tends to be stable and the maximum value is achieved. And the optimized affine transformation angles are and . Figure 10 presents the space distribution of optimized fusion feature points. In comparison with Figure 8, the optimized fusion features distinguish different degradation states better than the original features and especially the clustering effect of training samples in severe degradation state improves a lot. What is more, the different class distinctions are further widening. Thus, the optimization effect is obvious.

To further illustrate the performance of the proposed method, the sensitivity factors of the time-domain features, entropy features, LPP fusion features, and GA-LPP fusion features are calculated, and the result is shown in Figure 11. The kernel space measure sensitivity factor of the GA-LPP fusion features is the largest, which indicates that the fusion features have a strong ability to characterize different bearing degradation states after GA optimization.

##### 6.3. Degradation State Recognition Based on GG Method

According to the number of bearing degradation states, the number of clustering centers is determined as . The weighted factor is and the iterative stopping threshold value is . The matrix composed by GA-LPP fusion features is computed by GG clustering and the clustering center matrix is

In accordance with Table 2, 5 groups of data are chosen randomly from each degradation state as testing samples. The multidomain features of the selected 20 groups of data are optimized by GA-LPP at the same affine transformation angles. The fusion feature space distribution is shown in Figure 12, where the testing feature points are well distributed around the clustering centers and the spacing between testing samples of different states is large enough. This helps avoid identification misjudgment and improves the recognition accuracy.

The membership matrix is established based on grey correlation analysis. On this basis, bearing degradation state recognition is realized according to the principle of maximum membership value. Table 3 gives the membership matrix between the testing samples and each standard degradation state. For each sample point, the membership values for the different degradation states are compared, and the recognition result is the degradation state whose membership value is the largest. Two sets of LPP results are given, before and after GA optimization. Without GA optimization, the LPP fusion features misjudge the slight degradation state as the normal state and the severe degradation state as the failure state, and the accuracy of degradation state recognition is only 85%. In comparison, the GA-LPP fusion features have a better distinguishing ability: all 20 identification results are in complete agreement with the real degradation states, which verifies the performance of the proposed method.
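The maximum membership rule and the accuracy figure can be illustrated with a short sketch. The membership rows below are hypothetical and are not the values of Table 3; the function names are also illustrative. The last line only checks the arithmetic that 17 correct results out of 20 test samples corresponds to the reported 85%.

```python
STATES = ["normal", "slight degradation", "severe degradation", "failure"]

def recognize(memberships):
    """Return, for each sample, the index of the state with maximum membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]

def accuracy(predicted, actual):
    """Fraction of samples whose predicted state matches the actual state."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Hypothetical membership rows, one per test sample (each row sums to 1).
memberships = [
    [0.90, 0.07, 0.02, 0.01],
    [0.55, 0.40, 0.03, 0.02],   # slight degradation misjudged as normal
    [0.05, 0.10, 0.80, 0.05],
    [0.01, 0.02, 0.07, 0.90],
]
actual = [0, 1, 2, 3]
predicted = recognize(memberships)
print([STATES[i] for i in predicted], accuracy(predicted, actual))

# Arithmetic check: 17 correct out of 20 samples gives 85%.
rate_17_of_20 = accuracy([0] * 17 + [1] * 3, [0] * 20)
```

When two membership values are close, as in the second row, a small shift in the fused features can flip the decision, which is why larger class spacing improves recognition.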

#### 7. Conclusion

To improve degradation state recognition accuracy over the full life cycle of rolling bearings, this paper proposes a new degradation state identification method based on GA-LPP and GG clustering. From the processing and analysis of the actual signals, the following conclusions can be drawn.

(1) Compared with preset fault degrees, the different degradation states over the bearing life cycle are difficult to distinguish. Single-domain features usually measure degradation states from only one perspective, so their ability to characterize complex and fuzzy degradation states can be insufficient. Within manifold learning theory, the LPP algorithm can fuse multidomain features and reduce their dimension to improve the distinction between different degradation states.

(2) The kernel space measure sensitivity factor is taken as the optimization criterion. GA based on kernel space transformation is applied to optimize the LPP feature fusion process, which separates different degradation samples better. In this way, the clustering effect within the same degradation state is more satisfactory and the accuracy is higher.

(3) Combining GA-LPP multidomain feature fusion with GG clustering has engineering value in the field of degradation state recognition.

(4) The GA parameter setting affects the convergence speed and the calculation precision. Even if the parameters are the same in repeated experiments, the results can fluctuate. Therefore, future work will improve the applicability of the proposed method through parameter optimization and by enhancing the stability of the GA search.

#### Competing Interests

The authors declare that they have no competing interests.

#### Acknowledgments

This project is supported by the National Natural Science Foundation of China (Grant no. 51541506).