#### Abstract

Accurately identifying faults in rolling bearing systems by analyzing vibration signals, which are often nonstationary, is challenging. To address this issue, this paper proposes a new approach based on complementary ensemble empirical mode decomposition (CEEMD) and time series modeling. The approach identifies faults in a rolling bearing system using an appropriate autoregressive (AR) model established from the nonstationary vibration signal. First, vibration signals measured from a rolling bearing test system under different defect conditions are decomposed into a set of intrinsic mode functions (IMFs) by CEEMD. Second, the vibration signals are filtered using calculated filtering parameters. Third, the IMF most closely correlated with the filtered signal is selected according to the correlation coefficient between the filtered signal and each IMF, and the AR model of the selected IMF is established. The AR model parameters are then used as the input feature vectors, and hidden Markov models (HMMs) are used to identify the fault pattern of the rolling bearing. An experimental study on a bearing test system shows that the proposed approach can accurately identify faults in rolling bearings.

#### 1. Introduction

Rolling element bearing failure is one of the foremost causes of failure in rotating machinery, and it may result in costly production losses and catastrophic accidents. Early detection and diagnosis of bearing faults while the machine is still in operation can help avoid the progression of abnormal events and reduce productivity losses [1]. Because structural defects change the dynamic characteristics of a bearing, as manifested in its vibrations, vibration-based analysis has long been a commonly used technique for diagnosing bearing faults [2]. However, nonlinear factors such as clearance, friction, and stiffness increase the complexity of the vibration signals, which makes it difficult to evaluate the working condition of rolling bearings accurately through traditional time-domain or frequency-domain analysis alone [3].

To overcome the limitations of these traditional techniques, the autoregressive (AR) model has in recent years been successfully applied to extracting features from vibration signals for fault diagnosis [4–6]. The AR model is a time series analysis method whose parameters carry important information about the system condition, and an accurate AR model can reflect the characteristics of a dynamic system [7]. For example, an AR model was combined with a fuzzy classifier for fault diagnosis of vehicle transmission gears [8]. Three distinct autoregressive modeling techniques were compared for their performance and reliability on bearing signals of various lengths [9]. A diagnosis method based on the AR model and a continuous HMM has also been used to monitor and diagnose the working conditions of rolling bearings [10]. However, when the AR model is applied directly to nonstationary bearing vibration signals, the results are unsatisfactory, because the methods for estimating the autoregressive parameters assume a stationary signal. Since the vibration signal is nonstationary whereas the AR model is suited to stationary signals, the vibration signals must be preprocessed before the AR model is established.

Empirical mode decomposition (EMD) is an adaptive time-frequency signal processing method [11]. With EMD, a signal is decomposed into a series of intrinsic mode functions (IMFs) according to its own characteristics [12]. For example, a fault feature extraction approach based on EMD and the AR model was used to process the vibration signals of roller bearings [3]. However, when EMD is applied to nonstationary signals containing intermittent components, the original signal cannot be decomposed accurately because of mode mixing [13]. To alleviate mode mixing, Wu and Huang developed ensemble empirical mode decomposition (EEMD) as an improvement on EMD. By adding noise to the original signal and repeatedly averaging the resulting IMFs, EEMD decomposes signals more accurately and effectively [13]. Although EEMD effectively resolves the mode-mixing problem, computing a sufficiently large ensemble mean is time consuming, which greatly reduces the efficiency of the algorithm. To solve this problem, the complementary ensemble EMD (CEEMD) method was proposed [14]. In this approach, the residue of the added white noise is extracted from the mixtures of data and white noise via pairs of complementary ensemble IMFs obtained with positive and negative added white noise. CEEMD performs as well as EEMD while greatly improving computational efficiency.

In this paper, we combine the advantages of CEEMD and time series modeling and propose a new method for rolling bearing fault diagnosis based on CEEMD and the AR model. CEEMD is used as a preprocessing step to filter the signal and extract the IMF most closely correlated with the filtered signal, and the AR model of the selected IMF is then established. The AR model parameters serve as the feature vectors of a classifier in which hidden Markov models (HMMs) identify the fault pattern of the rolling bearing. The rest of this paper is organized as follows. Section 2 reviews fault diagnosis based on the AR model and presents the proposed method for rolling bearing fault diagnosis. Evaluations and experiments are presented in Section 3, and concluding remarks are drawn in Section 4.

#### 2. Theoretical Framework

##### 2.1. Time Series Modeling

The autoregressive moving average (ARMA) model is the representative time series model. It can be expressed in linear difference equation form as

$$x_t - \varphi_1 x_{t-1} - \cdots - \varphi_n x_{t-n} = a_t - \theta_1 a_{t-1} - \cdots - \theta_m a_{t-m}, \tag{1}$$

where $n$ and $m$ are the orders of the ARMA$(n, m)$ model, $\{x_t\}$ is a zero-mean stationary random sequence, $\{a_t\}$ is a white noise sequence, and $\varphi_i$ $(i = 1, \ldots, n)$ and $\theta_j$ $(j = 1, \ldots, m)$ are the model parameters to be estimated. Estimating $\varphi_i$ and $\theta_j$ from the observed sequence $x_t$ $(t = 1, 2, \ldots, N)$ is called time series modeling. If $n = 0$, the ARMA$(n, m)$ model in (1) reduces to an MA$(m)$ model of order $m$; if $m = 0$, it reduces to an AR$(n)$ model of order $n$. The AR model is stable, and its structure is simpler than that of the ARMA model. Therefore, provided its precision is sufficient to represent the system, the AR model is used to characterize the rolling bearing vibration signal:

$$x_t = \sum_{k=1}^{n} \varphi_k x_{t-k} + a_t, \tag{2}$$

where $t = n+1, \ldots, N$, $N$ is the length of the time series $\{x_t\}$, $n$ is the model order, and $a_t \sim \mathrm{NID}(0, \sigma_a^2)$. The residual $a_t$ is expressed as

$$a_t = x_t - \sum_{k=1}^{n} \varphi_k x_{t-k}. \tag{3}$$
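To make the AR model concrete, the sketch below fits $x_t = \varphi_1 x_{t-1} + \cdots + \varphi_n x_{t-n} + a_t$ by ordinary least squares and returns the residuals $a_t = x_t - \sum_k \varphi_k x_{t-k}$. It is a minimal illustration assuming a zero-mean (mean-removed) record and NumPy's `lstsq`; the function name `fit_ar_ls` and the simulated AR(2) coefficients are hypothetical, not taken from the paper.

```python
import numpy as np


def fit_ar_ls(x, n):
    """Least-squares estimate of phi_1..phi_n in x_t = sum_k phi_k x_{t-k} + a_t.

    Returns (phi, residuals), the residuals being a_t for t = n+1..N.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # the model assumes a zero-mean stationary sequence
    N = len(x)
    # Column k-1 holds the lagged values x_{t-k} for t = n+1..N
    X = np.column_stack([x[n - k:N - k] for k in range(1, n + 1)])
    y = x[n:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ phi  # a_t = x_t - sum_k phi_k x_{t-k}
    return phi, residuals


# Usage: simulate a stationary AR(2) process with hypothetical coefficients
rng = np.random.default_rng(0)
N = 5000
a = rng.standard_normal(N)
x = np.zeros(N)
for t in range(2, N):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + a[t]

phi, res = fit_ar_ls(x, 2)
```

With 5000 samples the estimates land close to the true coefficients, and the residual variance approximates the unit variance of the driving noise.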

Determining the order of the AR model is critical, because the chosen order affects not only the accuracy of system identification but also the stability of the identified model. To estimate the order of the AR model correctly, the FPE, BIC, and AIC criteria are commonly used [15]:

FPE criterion:
$$\mathrm{FPE}(n) = \hat{\sigma}_a^2 \, \frac{N + n}{N - n}, \tag{4}$$

BIC criterion:
$$\mathrm{BIC}(n) = N \ln \hat{\sigma}_a^2 + n \ln N, \tag{5}$$

AIC criterion:
$$\mathrm{AIC}(n) = N \ln \hat{\sigma}_a^2 + 2n, \tag{6}$$

where $\hat{\sigma}_a^2$ is the estimated variance of the residual $a_t$ of the AR model of order $n$, and the order minimizing the chosen criterion is selected. After the model order is determined, the least squares method is used to estimate the model parameters, and the AR model with the specific parameters is established.
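The order-selection step can be sketched by fitting AR models of increasing order and evaluating each criterion on the residual variance. The sketch assumes the common textbook forms FPE $= \hat\sigma_a^2 (N+n)/(N-n)$, BIC $= N\ln\hat\sigma_a^2 + n\ln N$, and AIC $= N\ln\hat\sigma_a^2 + 2n$; the helper names and the simulated AR(2) test signal are hypothetical.

```python
import numpy as np


def ar_residual_variance(x, n):
    """Residual variance (sigma_a^2 estimate) of a least-squares AR(n) fit."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    X = np.column_stack([x[n - k:N - k] for k in range(1, n + 1)])
    y = x[n:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ phi) ** 2)


def order_criteria(x, max_order):
    """FPE, BIC and AIC for AR orders 1..max_order (textbook forms, assumed)."""
    N = len(x)
    crit = {}
    for n in range(1, max_order + 1):
        s2 = ar_residual_variance(x, n)
        crit[n] = {
            "FPE": s2 * (N + n) / (N - n),
            "BIC": N * np.log(s2) + n * np.log(N),
            "AIC": N * np.log(s2) + 2 * n,
        }
    return crit


# Usage: a simulated AR(2) record; the criteria should favour a low order near 2
rng = np.random.default_rng(1)
N = 5000
a = rng.standard_normal(N)
x = np.zeros(N)
for t in range(2, N):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + a[t]

crit = order_criteria(x, max_order=6)
best_bic = min(crit, key=lambda n: crit[n]["BIC"])
```

BIC penalizes extra parameters more heavily than AIC or FPE for long records, so it tends to pick the most parsimonious order; the paper itself reads the order off the FPE curve.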

##### 2.2. Complementary Ensemble Empirical Mode Decomposition

Complementary ensemble empirical mode decomposition (CEEMD) is an improved algorithm based on empirical mode decomposition (EMD). Through the EMD process, any complex time series can be decomposed into a finite number of intrinsic mode functions (IMFs), each reflecting the dynamic characteristics of the original signal. An IMF must satisfy two conditions: (a) the number of extrema and the number of zero crossings must either be equal or differ by at most one; (b) the upper and lower envelopes must be locally symmetric about the time axis, that is, their mean must be zero at every point. The basic principle of EMD is to decompose the original signal into the form of (7) by repeatedly subtracting the mean of the upper and lower envelopes that connect the local maxima and minima of the signal [16]:

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t), \tag{7}$$

where $x(t)$ is the vibration signal, $c_i(t)$ is the $i$th IMF component, the IMFs covering frequency bands ordered from high to low, and $r_n(t)$ is the residue of the decomposition process, which represents the mean trend of $x(t)$.
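The sifting principle behind (7) can be illustrated with a minimal EMD sketch. It assumes cubic-spline envelopes built with SciPy's `CubicSpline`, pins both envelopes to the record's end points as a crude boundary rule, and uses a fixed number of sifting iterations instead of a formal IMF stopping criterion; it is not the implementation used in the paper, and the function names are hypothetical. Because each residue is updated by subtracting the extracted IMF, the IMFs and the final residue sum back to the input exactly.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _local_extrema(h):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(h)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope_mean(h, t):
    """Mean of the cubic-spline upper/lower envelopes, or None if too few extrema."""
    maxima, minima = _local_extrema(h)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    last = len(h) - 1
    # Crude boundary handling (assumption): pin both envelopes to the end points
    imax = np.r_[0, maxima, last]
    imin = np.r_[0, minima, last]
    upper = CubicSpline(t[imax], h[imax])(t)
    lower = CubicSpline(t[imin], h[imin])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=8, n_sift=10):
    """Return (imfs, residue) such that x == imfs.sum(axis=0) + residue."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    imfs, residue = [], x.copy()
    for _ in range(max_imfs):
        if _envelope_mean(residue, t) is None:
            break  # residue is (close to) monotonic: treat it as the trend
        h = residue.copy()
        for _ in range(n_sift):  # fixed sifting count instead of a stopping test
            m = _envelope_mean(h, t)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue


# Usage: two tones plus a linear trend
t = np.arange(1000) / 1000.0
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) + 0.2 * t
imfs, residue = emd(x)
```

The highest-frequency content is extracted first, so the first IMF mainly captures the 40 Hz tone and later IMFs the slower components, matching the high-to-low ordering in (7).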

EMD is an adaptive local analysis method, with each IMF highlighting the local features of the data. However, EMD results often suffer from mode mixing, which is defined as either a single IMF consisting of widely disparate scales or a signal of similar scale residing in different IMF components [17]. To illustrate this, consider a simulated signal consisting of a Gaussian-type impulse interference, a cosine component with a frequency of 500 Hz, and a trend term. The simulated signal, given by (8), is the sum of these three components.

The waveform of the simulated signal is shown in Figure 1, and the corresponding EMD results, which exhibit mode mixing, are shown in Figure 2.

To overcome mode mixing, ensemble empirical mode decomposition (EEMD) was proposed [18], in which finite-amplitude Gaussian white noise is added to the original signal throughout the decomposition process. Because white noise is distributed uniformly over the whole time-frequency space, the noise-added signal becomes continuous across time scales, with no missing scales. As a result, mode mixing is effectively eliminated by the EEMD process [18]. The EEMD decomposition result of the simulated signal is shown in Figure 3, where the amplitude of the added white noise is 0.25 times the standard deviation of the original signal and the ensemble size is 200.
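The ensemble averaging in EEMD can be sketched independently of any particular EMD code. The sketch assumes a caller-supplied routine, here the hypothetical parameter `decompose(signal, n_imfs)`, that returns a fixed number of IMF rows; the 0.25 noise ratio and 200 trials mirror the settings quoted above. A toy linear "decomposer" that returns the whole signal as one component shows that the ensemble mean still carries residual noise of roughly $0.25\,\sigma_x/\sqrt{200}$.

```python
import numpy as np


def eemd(x, decompose, n_imfs, noise_ratio=0.25, trials=200, seed=None):
    """Ensemble-mean IMFs over noise-added copies of x.

    `decompose(signal, n_imfs)` is a caller-supplied EMD routine (an assumption
    of this sketch) returning an array of shape (n_imfs, len(signal)).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma = noise_ratio * np.std(x)  # noise amplitude relative to the signal std
    acc = np.zeros((n_imfs, len(x)))
    for _ in range(trials):
        acc += decompose(x + sigma * rng.standard_normal(len(x)), n_imfs)
    return acc / trials


def identity(s, k):
    """Toy linear 'decomposer': the whole signal as a single component."""
    return s[np.newaxis, :]


x = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000.0)
residual = eemd(x, identity, n_imfs=1, seed=0)[0] - x
```

The residual shrinks only as the square root of the ensemble size, which is why EEMD needs many trials.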

During EEMD, each individual trial may produce a noisy result, and the effect of the added noise is suppressed only by averaging over a large ensemble, which is too time consuming in practice. An improved algorithm, complementary ensemble empirical mode decomposition (CEEMD), was suggested to improve computational efficiency. In this algorithm, the residue of the added white noise is extracted from the mixtures of data and white noise via pairs of complementary ensemble IMFs obtained with positive and negative added white noise. Although CEEMD yields IMFs with an RMS noise level similar to that of EEMD, it eliminates the residual noise in the IMFs and overcomes mode mixing much more efficiently [14]. CEEMD is implemented as follows:

(a) Two signals $x_1(t)$ and $x_2(t)$ are constructed by adding to $x(t)$ a pair of Gaussian white noise sequences with the same amplitude and opposite phase, $n(t)$ and $-n(t)$; that is, $x_1(t) = x(t) + n(t)$ and $x_2(t) = x(t) - n(t)$.

(b) $x_1(t)$ and $x_2(t)$ are each decomposed by EMD over only a small number of trials, and $\mathrm{IMF}_{x1}$ and $\mathrm{IMF}_{x2}$ are the ensemble means of the corresponding IMFs obtained in the trials.

(c) The CEEMD decomposition result is the average of the corresponding components of $\mathrm{IMF}_{x1}$ and $\mathrm{IMF}_{x2}$; that is,

$$\mathrm{IMF}_j(t) = \frac{1}{2}\left[\mathrm{IMF}_{x1,j}(t) + \mathrm{IMF}_{x2,j}(t)\right], \tag{9}$$

where $j$ indexes the IMF components.

The flow chart of CEEMD, including the number of decomposition trials, is shown in Figure 4.
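Steps (a) to (c) can be sketched in the same style. As before, `decompose(signal, n_imfs)` is a hypothetical caller-supplied EMD routine returning a fixed number of IMF rows, and the number of noise pairs is an illustrative default. With the toy linear decomposer, the positive and negative noise contributions cancel to machine precision, which is the property that lets CEEMD use far fewer trials than EEMD.

```python
import numpy as np


def ceemd(x, decompose, n_imfs, noise_ratio=0.25, pairs=10, seed=None):
    """CEEMD sketch: ensemble means for x + n and x - n, then their average.

    `decompose(signal, n_imfs)` is a caller-supplied EMD routine (assumed) that
    returns an array of shape (n_imfs, len(signal)).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma = noise_ratio * np.std(x)
    imf_x1 = np.zeros((n_imfs, len(x)))
    imf_x2 = np.zeros_like(imf_x1)
    for _ in range(pairs):
        n = sigma * rng.standard_normal(len(x))
        imf_x1 += decompose(x + n, n_imfs)  # x1(t) = x(t) + n(t)
        imf_x2 += decompose(x - n, n_imfs)  # x2(t) = x(t) - n(t)
    imf_x1 /= pairs  # ensemble means over the trials
    imf_x2 /= pairs
    return 0.5 * (imf_x1 + imf_x2)  # average of corresponding components


def identity(s, k):
    """Toy linear 'decomposer': the whole signal as a single component."""
    return s[np.newaxis, :]


x = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000.0)
residual = ceemd(x, identity, n_imfs=1, seed=0)[0] - x
```

With a real EMD, which is nonlinear, the cancellation is only approximate, so CEEMD still averages over a few trials per noise sign.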

Figure 5 shows the CEEMD decomposition of the simulated signal. Compared with the result in Figure 3, EEMD and CEEMD achieve consistent decomposition accuracy, but EEMD takes 1.62 s, whereas CEEMD needs only 0.13 s.

##### 2.3. Fault Diagnosis Based on CEEMD and Time Series Model

Based on CEEMD and time series modeling, a hybrid fault diagnosis approach can be designed. It combines the strength of CEEMD in decomposing nonstationary signals with the feature-extraction ability of time series modeling. The flow chart of the developed approach is shown in Figure 6.

The main steps are as follows.

*Step 1*. The rolling bearing vibration signal is sampled and then decomposed by CEEMD with the process shown in Figure 4.

*Step 2*. For each IMF, the product of its energy density and its average period, which according to [19] is constant for white noise, is calculated using (10), and a test parameter is calculated using (11). The signal is then filtered by comparing this parameter with a given threshold: when the parameter exceeds the threshold, the preceding IMFs are treated as noise and removed together with the trend term, and the remaining IMFs are summed to rebuild the filtered signal [19, 20]:

$$E_j \bar{T}_j = \frac{1}{N}\sum_{i=1}^{N}\left[A_j(i)\right]^2 \cdot \frac{2N}{O_j}, \tag{10}$$

where $E_j$ is the energy density of the $j$th IMF, $\bar{T}_j$ is the average period of the $j$th IMF, $N$ is the length of each IMF, $A_j$ is the amplitude of the $j$th IMF, and $O_j$ is the total number of extreme points of the $j$th IMF.
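The energy-density and average-period computation in Step 2 can be sketched as follows, taking the energy density as the mean of the squared IMF values and the average period as $2N$ divided by the number of local extrema. The sketch does not attempt to reproduce the exact test in (11): `noise_boundary` is a hypothetical stand-in that flags the first IMF whose product departs from the mean of the preceding products by more than a relative `threshold`. The filtered signal is rebuilt from the IMFs after that boundary, with the trend excluded.

```python
import numpy as np


def count_extrema(c):
    """Number of interior local maxima plus minima (O_j)."""
    d = np.diff(c)
    peaks = np.sum((d[:-1] > 0) & (d[1:] < 0))
    troughs = np.sum((d[:-1] < 0) & (d[1:] > 0))
    return int(peaks + troughs)


def energy_period_product(imfs):
    """E_j * T_j per IMF row, with E_j = mean(c_j**2) and T_j = 2N / O_j."""
    products = []
    for c in np.asarray(imfs, dtype=float):
        energy_density = np.mean(c ** 2)
        mean_period = 2.0 * len(c) / max(count_extrema(c), 1)  # guard O_j = 0
        products.append(energy_density * mean_period)
    return np.array(products)


def noise_boundary(products, threshold=1.0):
    """HYPOTHETICAL stand-in for the test in (11): first index j whose product
    deviates from the mean of the preceding products by more than `threshold`
    (relative). IMFs before j are treated as noise."""
    for j in range(1, len(products)):
        ref = np.mean(products[:j])
        if abs(products[j] - ref) / ref > threshold:
            return j
    return len(products)


def filter_signal(imfs, threshold=1.0):
    """Sum the IMFs from the noise boundary onward (trend/residue not included)."""
    imfs = np.asarray(imfs, dtype=float)
    k = noise_boundary(energy_period_product(imfs), threshold)
    return imfs[k:].sum(axis=0)


# Sanity check: a unit sine with a 100-sample period over 1000 samples
c = np.sin(2 * np.pi * np.arange(1000) / 100.0)
p = energy_period_product([c])[0]  # energy density 0.5, mean period 100
```

For white noise the products stay roughly level across the first IMFs, so a sharp jump marks where signal-dominated IMFs begin.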

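*Step 3*. The correlation coefficient between the filtered signal and each IMF is calculated using (12), and the IMF most closely correlated with the filtered signal is selected for AR modeling [21]:

$$\rho_j = \frac{\sum_{t=1}^{N}\left[y(t) - \bar{y}\right]\left[c_j(t) - \bar{c}_j\right]}{\sqrt{\sum_{t=1}^{N}\left[y(t) - \bar{y}\right]^2 \sum_{t=1}^{N}\left[c_j(t) - \bar{c}_j\right]^2}}, \tag{12}$$

where $y(t)$ is the filtered signal, $c_j(t)$ is the $j$th IMF, and $\bar{y}$ and $\bar{c}_j$ are their respective means.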
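The IMF selection in Step 3 amounts to computing the Pearson correlation between the filtered signal and every IMF and keeping the strongest one. The sketch below uses NumPy's `corrcoef`; ranking by the absolute value of the coefficient is an assumption, and `select_imf` and the toy IMFs are hypothetical.

```python
import numpy as np


def select_imf(filtered, imfs):
    """Pearson correlation of each IMF with the filtered signal.

    Returns (index of the IMF with the largest |rho|, array of coefficients);
    ranking by absolute value is an assumption of this sketch.
    """
    rho = np.array([np.corrcoef(filtered, c)[0, 1] for c in imfs])
    return int(np.argmax(np.abs(rho))), rho


# Usage: the filtered signal is dominated by the second toy IMF
t = np.arange(1000) / 1000.0
imfs = np.vstack([np.sin(2 * np.pi * 50 * t), np.sin(2 * np.pi * 5 * t), t])
filtered = 0.2 * imfs[0] + imfs[1]
best, rho = select_imf(filtered, imfs)
```

The selected index would then be passed to the AR fitting step.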

*Step 4*. The least squares method is used to estimate the parameter vector of the AR model established in Step 3, and this parameter vector is taken as the model feature vector.

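*Step 5*. After scalar quantization using the index formula of the Lloyd algorithm in (13) [22], the feature vectors are used to train an HMM for each bearing working condition:

$$\mathrm{index}(x) = \begin{cases} 0, & x \le \mathrm{partition}(1), \\ m, & \mathrm{partition}(m) < x \le \mathrm{partition}(m+1), \\ L-1, & \mathrm{partition}(L-1) < x, \end{cases} \tag{13}$$

where $L$ is the length of the codebook vector, $\mathrm{partition}(\cdot)$ is the partition vector of length $L-1$, and $x$ is the feature vector to be quantized.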
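The quantization in Step 5 can be sketched with a plain one-dimensional Lloyd iteration: partition boundaries sit at midpoints between codebook levels, and each level moves to the centroid of its cell. `quantize_index` applies the index rule of (13) in 0-based form via NumPy's `searchsorted`. Both function names, the linear initialization, and the stopping tolerance are assumptions of this sketch rather than details given in the paper.

```python
import numpy as np


def quantize_index(x, partition):
    """Index rule: 0 if x <= partition[0]; m if partition[m-1] < x <= partition[m];
    len(partition) if x > partition[-1] (0-based form of the 1-based rule)."""
    return np.searchsorted(partition, x, side="left")


def lloyd_1d(samples, n_levels, iters=100, tol=1e-10):
    """Plain Lloyd iteration for a 1-D scalar quantizer; returns (partition, codebook)."""
    x = np.sort(np.asarray(samples, dtype=float))
    codebook = np.linspace(x[0], x[-1], n_levels)  # initial levels spread over the range
    for _ in range(iters):
        partition = 0.5 * (codebook[:-1] + codebook[1:])  # boundaries at midpoints
        idx = quantize_index(x, partition)
        # Each level moves to the centroid of its cell (empty cells keep their level)
        new = np.array([x[idx == m].mean() if np.any(idx == m) else codebook[m]
                        for m in range(n_levels)])
        converged = np.max(np.abs(new - codebook)) < tol
        codebook = new
        if converged:
            break
    return 0.5 * (codebook[:-1] + codebook[1:]), codebook


# Usage: two well-separated clusters of feature values, two quantization levels
partition, codebook = lloyd_1d([-0.1, 0.0, 0.1, 9.9, 10.0, 10.1], n_levels=2)
symbols = quantize_index(np.array([0.05, 9.95]), partition)
```

The resulting integer symbols form the discrete observation sequences on which the HMMs are trained.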

*Step 6*. A test vibration signal is then acquired for diagnosis, and its model feature vector is extracted. After scalar quantization, the feature vector is fed into the trained HMMs, and the condition whose HMM gives the maximum probability is taken as the classification result [23].
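The decision rule in Step 6 can be sketched with the scaled forward algorithm for discrete HMMs: each trained model scores the quantized observation sequence, and the condition with the highest log-likelihood wins. Training (for example by Baum-Welch) is not shown; the two-state models, their probabilities, and the condition names below are hypothetical.

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM.

    pi: (S,) initial state probabilities, A: (S, S) transition matrix,
    B: (S, K) emission probabilities over K quantized symbols.
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    alpha = alpha / c
    logp = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by the emission
        c = alpha.sum()
        alpha = alpha / c  # rescale to avoid numerical underflow
        logp += np.log(c)
    return logp


def classify(obs, models):
    """Return the model name with the highest log-likelihood, plus all scores."""
    scores = {name: log_likelihood(obs, *m) for name, m in models.items()}
    return max(scores, key=scores.get), scores


# Usage with two hypothetical condition models over 3 quantized symbols
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B_h = np.array([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]])  # mostly emits symbol 0
B_f = np.array([[0.1, 0.1, 0.8], [0.1, 0.3, 0.6]])  # mostly emits symbol 2
models = {"healthy": (pi, A, B_h), "fault": (pi, A, B_f)}
label, scores = classify([0, 0, 1, 0, 0], models)
```

Working with scaled log-likelihoods keeps long observation sequences from underflowing to zero probability.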

#### 3. Evaluation of the Method Based on CEEMD and AR Model

##### 3.1. Evaluation Using Simulated Signals

To demonstrate the validity of the proposed method, three signals are simulated, as shown in Figure 7. The first signal consists of a Gaussian-type impulse interference, a cosine component with a frequency of 10 Hz, a trend term, and white noise. The second signal consists of a Gaussian-type impulse interference, a square wave with a 65% duty cycle, a trend term, and white noise. The third signal consists of a Gaussian-type impulse interference, a sawtooth wave with a frequency of 15 Hz, a trend term, and white noise.

Figure 8 shows the CEEMD results for the three simulated signals. The correlation coefficients between the filtered signal and each IMF are listed in Table 1.


As Table 1 shows, the IMF most closely correlated with the filtered signal is IMF5 for two of the signals and IMF6 for the remaining one. These IMFs are used to construct the AR models, and the corresponding feature vectors are estimated as shown in Table 2. After scalar quantization, the feature vectors are used to train the HMMs for signal classification.

A total of 90 feature vectors were collected from the three groups of signals using the proposed approach. For each signal type, one-third of the feature vectors were used to train the classifier and the rest were used for testing. The classification results are listed in Table 3.

The results in Table 3 indicate that the proposed method based on CEEMD and time series modeling can effectively distinguish the different signals, with an overall classification rate of 96.7%. For comparison, the classification rates of the method based on time series modeling alone and of the method based on EMD and time series modeling were also calculated, yielding 88.3% and 93.3%, respectively. The proposed method therefore achieves higher classification accuracy than both alternatives.

##### 3.2. Evaluation Using Experimental Data

To illustrate the practicability and effectiveness of the proposed method, a bearing fault data set from the electrical engineering laboratory of Case Western Reserve University is analyzed [24]. The data set was acquired from the test stand shown in Figure 9, which consists of a 2 hp motor, a torque transducer, a dynamometer, and control electronics. The test bearings, which support the motor shaft, are 6205-2RS JEM SKF deep groove ball bearings. Vibration data were collected at 12,000 samples per second using accelerometers attached to the housing with magnetic bases. The motor load level was controlled by the fan on the right side of Figure 9.

Figure 10 shows representative waveforms of the vibration signals measured from the test bearings under four conditions: (a) a healthy bearing, (b) a bearing with an inner ring defect, (c) a bearing with a rolling element defect, and (d) a bearing with an outer ring defect. These signals were measured at a 0 hp motor load and a motor speed of 1797 rpm. The IMFs obtained by decomposing these signals are shown in Figure 11.

**Figures 10 and 11:** (a) No defect; (b) Inner ring defect; (c) Rolling element defect; (d) Outer ring defect.

Correlation coefficients calculated between the filtered signal and each IMF are shown in Table 4.

The IMF most closely correlated with the filtered signal is IMF2 for signal (a) and IMF1 for signals (b), (c), and (d). These IMFs are used to construct the AR models. The model order estimation curves of the four conditions based on the FPE criterion are shown in Figure 12. When the model order reaches 6, the residual of each model levels off; the model order is therefore set to 6, and the estimated parameters are listed in Table 5.

**Figure 12:** (a) No defect; (b) Inner ring defect; (c) Rolling element defect; (d) Outer ring defect.

The parameters in Table 5 are quantized using the Lloyd algorithm index formula in (13) to form the feature vectors for training the HMMs of the different conditions. The quantization results are shown in Figure 13.

**Figure 13:** (a) No defect; (b) Inner ring defect; (c) Rolling element defect; (d) Outer ring defect.

A total of 160 feature vectors were collected from the four conditions; half were used to train the classifier and the other half were used for classification, with the results listed in Table 6. Of the 80 test feature vectors, only two were misclassified, giving an overall classification rate of 97.5%.

For comparison, Tables 7 and 8 list the classification results obtained by applying time series modeling directly to the measured signals and by the method based on EMD and time series modeling, respectively. The comparison shows that the proposed method is effective for rolling bearing fault diagnosis and achieves a higher overall classification rate than either of the other two methods.

#### 4. Conclusions

Aiming at diagnosing rolling bearing faults, this paper proposes a hybrid approach based on CEEMD and time series modeling. CEEMD decomposes a nonstationary signal into a series of IMFs at low computational cost. The AR model is an effective tool for extracting fault features from vibration signals, and the fault pattern can be identified directly from the extracted features without establishing a mathematical model of the system or studying its fault mechanism. In this paper, CEEMD is used as a preprocessing step that increases the accuracy of the AR model for the measured signal, and the AR model of the IMF most closely correlated with the filtered signal is established to extract the fault feature parameters. Compared with the EMD-AR approach and the direct modeling approach, in which raw signals are used directly as input for AR modeling, the new approach achieved a higher classification rate (96.7% for simulated signals and 97.5% for experimental data). We also anticipate that the proposed method can be used for incipient fault diagnosis of rolling bearings, although further experiments are needed to verify its accuracy. Because the approach presented in this study is generic in nature, it can be readily adapted to a broad range of machine fault diagnosis applications.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work has been supported in part by the National Natural Science Foundation of China (no. 61101163 and no. 51175080) and the Natural Science Foundation of Jiangsu Province of China (no. BK2012739).