#### Abstract

To address the nonstationary and non-Gaussian characteristics of vibration signals from faulty rolling bearings, this paper proposes a fault feature extraction method based on variational mode decomposition (VMD) and autoregressive (AR) model parameters. Firstly, VMD is applied to decompose the vibration signals into a series of stationary component signals. Secondly, an AR model is established for each component mode. Thirdly, the parameters and remnant variance of the AR models serve as fault characteristic vectors. Finally, a random forest (RF) classifier, which is novel in the field of rolling bearing fault diagnosis, is introduced for pattern recognition. The validity and superiority of the proposed method are verified on an experimental dataset. The analysis results show that the method based on the VMD-AR model extracts fault features accurately and that the RF classifier outperforms the comparative classifiers.

#### 1. Introduction

Rolling bearings are widely used in industry, so research on rolling bearing fault diagnosis methods is of great significance. The fault diagnosis process mainly includes two aspects: fault feature extraction, and pattern recognition and classification [1].

Research on fault feature extraction for rolling bearings has received widespread attention [2]. A considerable number of theoretical and experimental studies have verified that the autoregressive (AR) model is closely related to the characteristics of mechanical systems and that AR model parameters are very sensitive to condition changes [3–6]. Hence, the condition of a system can be evaluated by a feature vector constructed from AR model parameters. However, the vibration signal of a rolling bearing is typically nonstationary and non-Gaussian, whereas AR model analysis assumes a stationary random signal. To address this problem, some studies combined the AR model with empirical mode decomposition (EMD) [7, 8]. EMD was proposed by Huang et al. and is based on the local characteristic time scale of the signal. By applying EMD to the original signal, a series of stationary intrinsic mode functions (IMFs) can be acquired [9]. An AR model is then established for selected IMFs and the AR model parameters are treated as feature vectors. Nevertheless, owing to defects of the algorithm, mode mixing is an inevitable problem in EMD. When mode mixing arises, a single IMF contains widely different characteristic time scales, or a similar time scale appears in adjacent IMFs, so that an IMF no longer reflects a real physical meaning. To overcome this shortcoming, Wu et al. presented the ensemble empirical mode decomposition (EEMD) method using the statistical properties of white noise in 2004 [10, 11]. Dragomiretskiy and Zosso put forward variational mode decomposition (VMD) in 2013, which is entirely nonrecursive and extracts the modes concurrently [12]. Recently, some researchers have focused on applying VMD in engineering practice [13–16]. An and Zeng analyzed the pressure fluctuation signal from a hydroturbine using VMD; compared with EMD, VMD overcame mode mixing effectively [13]. Liu et al. applied VMD and EEMD to detect the monotonic component in the degradation signal from a wind turbine gearbox, and VMD performed better [14]. Wang et al. proposed a VMD-based method to detect multiple signatures caused by rotor-to-stator rubbing; VMD extracted all impact signatures successfully and proved superior to conventional EMD, EEMD, and the empirical wavelet transform (EWT) [15].

The other key aspect of rolling bearing fault diagnosis is pattern recognition and classification. Pattern recognition based on artificial neural network (ANN) classifiers has been widely studied in mechanical fault diagnosis because of its strong self-organizing, self-learning, and nonlinear pattern classification abilities [17]. However, an ANN needs a large number of typical fault samples and prior knowledge to ensure the accuracy of the network, which causes great difficulties in practical engineering applications [18]. Also, with high-dimensional samples, the results are often unsatisfactory if the dimension is not reduced or features are not preselected (e.g., with a genetic algorithm (GA)) [19]. The support vector machine (SVM) has stronger generalization ability than the neural network and gives better results on high-dimensional sample classification [19]. However, its classification accuracy is affected by structural parameters such as the penalty factor and the kernel function parameter [20]. To address this, scholars have studied a series of optimization methods. Both traditional approaches such as cross validation (CV) and heuristic methods such as GA and particle swarm optimization (PSO) can optimize the SVM parameters to some extent, but this process requires considerable computation and time [21–24].

In addition to ANN and SVM, random forest (RF), proposed by Breiman in 2001, performs excellently in pattern recognition [25]. Unlike the structural risk minimization principle of SVM, the learning method of RF is ensemble learning [25]. Essentially, RF contains multiple decision tree classifiers that are grown independently of one another. When test data are sent into the random forest, every decision tree makes its own classification decision, and the final result is the class that receives the majority of the votes [19]. Recently, owing to its good generalization ability, simple structural parameters, and high classification accuracy, RF has attracted increasing interest in fields such as the electronic tongue [26], digital soil mapping [27], land cover mapping [28, 29], corrosion monitoring [30], hyperspectral data [31], urban area classification [32], and fault diagnosis in spur gears [33]. In the field of rolling bearing fault diagnosis, however, there has been little research on the RF classifier.

The main contributions of this paper are as follows. (1) VMD, a recently proposed method, is used to decompose the vibration signal from a rolling bearing into a series of stationary component signals. An AR model is established for the selected component signals, and the parameters and remnant variance of the AR models serve as fault characteristic vectors. A fault feature extraction method based on the VMD-AR model is thus presented. (2) A random forest classifier is studied for the pattern recognition and classification of rolling bearing faults. To validate the superiority of RF, it is compared with some of the classification methods mentioned above, namely, SVM, GA-SVM, and PSO-SVM. (3) The effectiveness of the proposed VMD-AR-RF method is confirmed on the experimental dataset from the Case Western Reserve University Bearing Data Center. This dataset has become a benchmark for testing algorithms in the field of rolling bearing fault diagnosis [34].

The remainder of this paper is organized as follows. The VMD algorithm, the AR model principle, and the RF classifier are reviewed in Section 2. Section 3 presents the rolling bearing fault diagnosis method based on VMD-AR-RF. The experimental validation and the analysis of results are given in Section 4. Finally, conclusions are drawn in Section 5.

#### 2. Theoretical Background

##### 2.1. Variational Mode Decomposition

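The VMD method decomposes a real signal into a series of modes which have specific sparsity properties, and together these modes reconstruct the original signal. Each mode $u_k$ is assumed to be compact around a center pulsation $\omega_k$, which is determined along with the decomposition. Hence, the sparsity property of each mode is characterized by its bandwidth in the frequency spectrum. The bandwidth of every mode is assessed with the following scheme. (1) The Hilbert transform is applied to every mode to obtain a unilateral frequency spectrum. (2) Each mode is multiplied by an exponential tuned to its estimated center frequency, which shifts its frequency spectrum to "baseband." (3) The bandwidth of each mode is estimated through the Gaussian smoothness of the demodulated signal. The resulting constrained problem is defined as follows [12]:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k} u_k = f, \tag{1}$$

where $f$ is the original signal, $\{u_k\}$ are the modes, $\{\omega_k\}$ are their center frequencies, and $\delta$ is the Dirac distribution. To enforce the reconstruction constraint, a quadratic penalty term and Lagrangian multipliers, $\lambda$, are introduced. The problem can be rewritten in the following format [12]:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k} u_k(t) \right\rangle, \tag{2}$$

where $\alpha$ is the balancing parameter of the data-fidelity constraint. The problem in (2) can be solved by means of the alternate direction method of multipliers (ADMM). The ADMM algorithm is listed in the following steps [12].

(a) ADMM optimization algorithm for VMD. Initialize $\{u_k^1\}$, $\{\omega_k^1\}$, $\lambda^1$, and $n \leftarrow 0$. Repeat: set $n \leftarrow n + 1$; for $k = 1 : K$, update $u_k$:

$$u_k^{n+1} \leftarrow \arg\min_{u_k} L\left(\{u_{i<k}^{n+1}\}, \{u_{i \geq k}^{n}\}, \{\omega_i^{n}\}, \lambda^{n}\right); \tag{3}$$

for $k = 1 : K$, update $\omega_k$:

$$\omega_k^{n+1} \leftarrow \arg\min_{\omega_k} L\left(\{u_i^{n+1}\}, \{\omega_{i<k}^{n+1}\}, \{\omega_{i \geq k}^{n}\}, \lambda^{n}\right);$$

then perform the dual ascent:

$$\lambda^{n+1} \leftarrow \lambda^{n} + \tau \left( f - \sum_{k} u_k^{n+1} \right);$$

until convergence: $\sum_{k} \| u_k^{n+1} - u_k^{n} \|_2^2 / \| u_k^{n} \|_2^2 < \varepsilon$.

(b) Minimization with respect to $u_k$. In (3), $u_k$ should be updated. The subproblem is equivalent to the formulation below:

$$u_k^{n+1} = \arg\min_{u_k \in X} \left\{ \alpha \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{i} u_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \right\}.$$

The solution can be obtained by using the Parseval/Plancherel Fourier isometry under the $L^2$ norm:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha (\omega - \omega_k)^2}.$$

(c) Minimization with respect to $\omega_k$. The minimization with respect to $\omega_k$ can be given as follows:

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \, |\hat{u}_k(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k(\omega)|^2 \, d\omega}.$$

To summarize the above steps, the complete optimization algorithm for VMD is as follows. Initialize $\{\hat{u}_k^1\}$, $\{\omega_k^1\}$, $\hat{\lambda}^1$, and $n \leftarrow 0$. Repeat: set $n \leftarrow n + 1$; for $k = 1 : K$, update $\hat{u}_k$ for all $\omega \geq 0$:

$$\hat{u}_k^{n+1}(\omega) \leftarrow \frac{\hat{f}(\omega) - \sum_{i<k} \hat{u}_i^{n+1}(\omega) - \sum_{i>k} \hat{u}_i^{n}(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha (\omega - \omega_k^{n})^2},$$

and update $\omega_k$:

$$\omega_k^{n+1} \leftarrow \frac{\int_0^{\infty} \omega \, |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega};$$

then perform the dual ascent for all $\omega \geq 0$:

$$\hat{\lambda}^{n+1}(\omega) \leftarrow \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega) \right);$$

until convergence: $\sum_{k} \| \hat{u}_k^{n+1} - \hat{u}_k^{n} \|_2^2 / \| \hat{u}_k^{n} \|_2^2 < \varepsilon$.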
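The frequency-domain updates of the complete algorithm can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation of [12]: it works on the one-sided spectrum returned by `rfft`, omits the boundary mirroring commonly used in practice, and checks convergence on the aggregate relative change of all modes. The function name `vmd_sketch` and the default values of `alpha`, `tau`, `tol`, and `max_iter` are assumptions chosen for the example.

```python
import numpy as np

def vmd_sketch(f, K, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD using ADMM updates on the non-negative half-spectrum.

    Assumptions (not from the paper): no signal mirroring, uniformly spread
    initial center frequencies, aggregate convergence test.
    """
    f = np.asarray(f, dtype=float)
    T = f.size
    freqs = np.fft.rfftfreq(T)               # normalized frequencies in [0, 0.5]
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * (np.arange(K) + 0.5) / K   # initial center frequencies
    lam_hat = np.zeros(freqs.size, dtype=complex)

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter update of mode k, using the latest other modes
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency = centroid of the mode's power spectrum
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # Dual ascent on the reconstruction constraint (inactive when tau = 0)
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (
            np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=T, axis=1)  # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]
```

Calling `vmd_sketch(signal, K=4)` returns the `K` time-domain modes sorted by center frequency, together with those center frequencies in cycles per sample.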

The detailed theory and advantages of VMD are given in [12]. As an example, consider a mixed signal made up of two real components and additive noise, where the noise obeys a uniform distribution within [0, 0.1] and the mean is zero. EMD and VMD are each used to decompose the mixed signal, and the results are presented in Figure 1. EMD produces eight components and a residual, of which only the first six are shown. The other six components, on the right side, come from VMD. Severe mode mixing is clearly visible in the EMD modes. Compared with EMD, VMD handles mode mixing effectively: the two real components of the mixed signal are separated accurately.

**Figure 1:** Decomposition of the mixed signal by (a) EMD and (b) VMD.

##### 2.2. AR Model and Parameters

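For a system, assume that $w(n)$ is a white noise sequence that excites the system and $x(n)$ is the output. Equation (13) can be established:

$$x(n) = -\sum_{k=1}^{p} a_k x(n-k) + \sum_{k=1}^{q} b_k w(n-k) + w(n), \tag{13}$$

where $a_k$ and $b_k$ are the model parameters of this system and $p$ is the model order. If all $b_k$ are zero, the model is called an AR model, which can be written as

$$x(n) = -\sum_{k=1}^{p} a_k x(n-k) + w(n). \tag{14}$$

Multiply both sides of (14) by $x(n-m)$ and take the average. The result can be expressed as follows:

$$R_x(m) = -\sum_{k=1}^{p} a_k R_x(m-k) + R_{wx}(m),$$

where $R_x(m) = E[x(n)x(n-m)]$ is the autocorrelation function of $x(n)$ and $R_{wx}(m) = E[w(n)x(n-m)]$. According to the properties of white noise, (17) can be proved:

$$R_{wx}(m) = \begin{cases} \sigma^2, & m = 0, \\ 0, & m > 0, \end{cases} \tag{17}$$

where $\sigma^2$ is the variance of the input $w(n)$. The relation between the AR model parameters and the autocorrelation function of $x(n)$ can thus be established in the following form:

$$R_x(m) + \sum_{k=1}^{p} a_k R_x(m-k) = \sigma^2 \delta(m), \quad m = 0, 1, \ldots, p, \tag{18}$$

where $\delta(m)$ is the Kronecker delta function. Rewrite (18) into matrix form as follows:

$$\begin{bmatrix} R_x(0) & R_x(1) & \cdots & R_x(p) \\ R_x(1) & R_x(0) & \cdots & R_x(p-1) \\ \vdots & \vdots & \ddots & \vdots \\ R_x(p) & R_x(p-1) & \cdots & R_x(0) \end{bmatrix} \begin{bmatrix} 1 \\ a_1 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \sigma^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{19}$$

In this derivation, the even symmetry of the autocorrelation function is used: $R_x(m) = R_x(-m)$. Equations (18) and (19) are called the Yule-Walker equations of the AR model. It can be seen that an AR model of order $p$ has $p + 1$ parameters, $a_1, a_2, \ldots, a_p, \sigma^2$. When the autocorrelation values $R_x(0), R_x(1), \ldots, R_x(p)$ are known, the parameters of the AR model can be calculated. These parameters reflect the system condition sensitively [3], so they can be used to construct the feature vector of a rolling bearing system. The order of the AR model can be chosen with the Akaike information criterion (AIC). The AIC value of a model is given as follows:

$$\mathrm{AIC}(p) = N \ln \sigma_p^2 + 2p,$$

where $\sigma_p^2$ is the variance of the residuals of the model, $p$ is the number of model parameters, and $N$ is the length of the signal. The best model order is the one with the lowest AIC value. The detailed theory of AIC is explained in [6].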
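As an illustration of how the Yule-Walker equations and the AIC can be used together, the sketch below estimates the AR parameters from biased sample autocorrelations and then scans candidate orders. It assumes the sign convention $x(n) + \sum_k a_k x(n-k) = w(n)$ and the AIC form $N \ln \sigma_p^2 + 2p$; the function names `yule_walker` and `select_order_aic` are hypothetical helpers, not part of the original work.

```python
import numpy as np

def yule_walker(x, p):
    """Estimate (a, sigma2) of an AR(p) model x(n) + sum_k a_k x(n-k) = w(n)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    N = x.size
    # Biased sample autocorrelation R(0), ..., R(p)
    R = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(p + 1)])
    # Rows m = 1..p of the Yule-Walker system: sum_k a_k R(|m - k|) = -R(m)
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    a = np.linalg.solve(R[idx], -R[1:])
    # Row m = 0 gives the variance of the driving noise (remnant variance)
    sigma2 = R[0] + np.dot(a, R[1:])
    return a, sigma2

def select_order_aic(x, max_order):
    """Return the order with the lowest AIC and the AIC value of every order."""
    N = len(x)
    aic = {}
    for p in range(1, max_order + 1):
        _, sigma2 = yule_walker(x, p)
        aic[p] = N * np.log(sigma2) + 2 * p
    best = min(aic, key=aic.get)
    return best, aic
```

For a component mode `u`, `select_order_aic(u, 20)` suggests an order, and `yule_walker(u, order)` then returns the coefficients and the remnant variance that would enter the feature vector.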

##### 2.3. Random Forest (RF) Classifier

RF is an ensemble learning algorithm proposed by Breiman in 2001. It builds on the earlier decision tree and bagging methods [25]. In fact, RF is an ensemble of unpruned decision trees whose growing algorithm is the same as that of ordinary decision trees, except that, following the bagging idea, the best split at each node is chosen within a randomly selected feature subset. RF consists of $n_{\text{tree}}$ trees $\{T_1(X), T_2(X), \ldots, T_{n_{\text{tree}}}(X)\}$, where $X = \{x_1, x_2, \ldots, x_m\}$ is an $m$-dimensional feature vector composed of feature parameters of the signal. The outputs are $\{\hat{Y}_1 = T_1(X), \hat{Y}_2 = T_2(X), \ldots, \hat{Y}_{n_{\text{tree}}} = T_{n_{\text{tree}}}(X)\}$, where $\hat{Y}_i$, $i = 1, \ldots, n_{\text{tree}}$, is the prediction for a classified object by the $i$th tree, and the collection of all individual predictions makes the final classification decision. Figure 2 illustrates the workflow of the RF algorithm, which proceeds as follows [19, 25]:

(1) From a training dataset of $n$ samples and $m$ features, draw a bootstrap sample by random sampling with replacement. Each bootstrap sample also contains $n$ samples.

(2) For each bootstrap sample, grow a tree with the following modification: at each node, randomly select a subset of $m_{\text{try}}$ of the $m$ features and choose the best split within this subset. No pruning is conducted, and the tree is grown to its maximum size.

(3) Repeat the steps above until the number of grown trees reaches $n_{\text{tree}}$.

(4) Send the testing dataset into the RF and aggregate the outputs of the $n_{\text{tree}}$ trees. The classification result is determined by majority vote.
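The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than Breiman's implementation: splits are scored with the Gini impurity, a node becomes a majority-vote leaf when none of its randomly chosen features can split the data, class labels must not be tuples, and the names `grow_tree`, `train_forest`, and `predict_forest` are hypothetical.

```python
import math
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, mtry, rng):
    """Grow an unpruned tree; each node searches a random subset of mtry features."""
    if len(set(y)) == 1:
        return y[0]                                  # pure node becomes a leaf
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:                                 # chosen features are constant
        return Counter(y).most_common(1)[0][0]
    _, j, t, left, right = best
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], mtry, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], mtry, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):                   # internal node: (feature, threshold, left, right)
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node

def train_forest(X, y, ntree=50, mtry=None, seed=0):
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    mtry = mtry or max(1, int(math.sqrt(m)))
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: n draws with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, x):
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]                # majority vote
```

Here `train_forest(X_train, y_train, ntree=500)` grows the ensemble and `predict_forest(forest, x)` returns the class chosen by most trees for one feature vector `x`.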

In addition, each bootstrap sample leaves out some samples of the training dataset while repeating others. It can be calculated that only about 2/3 of the training samples are used to build the tree for each bootstrap; these are the InBag data. The remaining 1/3 or so are called Out-of-Bag (OOB) samples. Because the OOB samples have not been used to train the tree, they provide an independent estimate of the classification error rate of the corresponding tree [19, 25].
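The two-thirds figure follows from a short calculation, sketched here under the assumption that each bootstrap draws $n$ times uniformly with replacement from $n$ training samples. A given sample is missed by every draw with probability

$$P(\text{sample is OOB}) = \left(1 - \frac{1}{n}\right)^{n} \xrightarrow{\; n \to \infty \;} e^{-1} \approx 0.368,$$

so about 63.2% of the distinct training samples are expected to be InBag for each tree. For example, with $n = 100$ the OOB probability is $0.99^{100} \approx 0.366$.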

Generally speaking, RF has only two tuning parameters: the number of trees in the forest ($n_{\text{tree}}$) and the number of features randomly selected at each node ($m_{\text{try}}$). Furthermore, some studies have shown that the results are only weakly sensitive to these parameters [26], which is a significant advantage in engineering practice. Another point of interest is the randomly selected feature subset of size $m_{\text{try}}$. This randomness makes each tree less greedy and increases the chance that weak features enter a tree and combine with other features. Thus, the local characteristics of each sample can be magnified and the probability of a wrong judgement caused by information loss can be reduced. The votes of all trees together give a comprehensive assessment of a sample [30]. RF is a powerful statistical classifier that is well established in other domains but is still relatively unknown in the field of rolling bearing fault diagnosis [35].

#### 3. Rolling Bearing Fault Diagnosis Method Based on VMD-AR Model and RF Classifier

Based on the methods and theory described above, a novel rolling bearing fault diagnosis method based on VMD-AR and the RF classifier is proposed. The working process is shown in Figure 3. The main steps are as follows:

(1) Acquire the vibration signals of the rolling bearing under different conditions. For the fault location problem, the conditions include normal, inner race fault, outer race fault, and ball fault. For the fault severity problem, the conditions correspond to different crack widths or depths. Divide the data of each condition randomly into two groups: a training dataset and a testing dataset.

(2) Decompose each vibration signal into several stationary component modes, $u_1, u_2, \ldots, u_K$. The exact number $K$ depends on the specific characteristics of the signal. Usually, mode mixing arises when the number of modes decomposed by VMD is too small, whereas too many modes cost considerable computing time.

(3) Establish an AR model for each component mode, with the order determined by AIC. The parameters and remnant variance serve as the variables of the feature vector of the $k$th mode:

$$\mathbf{F}_k = \left[ a_{k1}, a_{k2}, \ldots, a_{kp}, \sigma_k^2 \right], \quad k = 1, 2, \ldots, K,$$

where $K$ is the number of component modes.

(4) The feature vector for a sample is constructed as follows:

$$\mathbf{F} = \left[ \mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_K \right].$$

(5) Train the RF classifier with the feature vectors of all conditions. According to [25], $n_{\text{tree}}$, the number of trees in the forest, is set to 500. The other parameter of the RF classifier, $m_{\text{try}}$, the number of features randomly selected at each node, is set to the value recommended in [19, 25]:

$$m_{\text{try}} = \sqrt{M},$$

where $M$ is the total number of features. Finally, evaluate the performance of the trained RF classifier on the testing dataset.
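Steps (3) to (5) can be illustrated with a small sketch of how the per-mode AR results are assembled and how the recommended subset size follows from the feature count. The helper names `build_feature_vector` and `recommended_mtry` are hypothetical, and rounding $\sqrt{M}$ down to an integer is an assumption, since the text does not state the rounding rule.

```python
import math

def build_feature_vector(mode_params):
    """Concatenate (AR coefficients, remnant variance) of every VMD mode.

    mode_params: list with one (coefficients, variance) pair per mode.
    """
    features = []
    for coeffs, variance in mode_params:
        features.extend(coeffs)      # AR parameters of this mode
        features.append(variance)    # remnant variance of this mode
    return features

def recommended_mtry(num_features):
    """Integer part of sqrt(M); the floor is an assumed rounding rule."""
    return max(1, math.isqrt(num_features))
```

With six AR parameters plus the remnant variance per mode, as used in Section 4.1, four modes give $M = 28$ features and a subset size of 5 under this rounding.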

#### 4. Experimental Analysis

To validate the effectiveness of the proposed approach, the experimental dataset from the Case Western Reserve University (CWRU) Bearing Data Center is analyzed. This dataset is extensively used to test algorithms. The sampling frequency is 48 kHz and the rotational speed of the rotor is 1772 rpm. The bearing is a deep groove ball bearing (6205-2RS JEM SKF), and single point faults were introduced to the test bearings [34].

##### 4.1. Case 1: Results for Fault Location Problem

Firstly, a study is performed to distinguish different fault locations. The dataset contains 4 different conditions (normal, inner race fault, outer race fault, and ball fault). The fault diameter of each fault type is 0.18 mm. There are 100 samples under each condition, giving 400 samples in total. For every condition, some randomly selected samples serve as training data and the remaining samples are testing data. All sample signals are decomposed by the VMD algorithm, and an AR model is then established for every component mode. In fact, the system condition is mainly determined by the first several AR parameters and the remnant variance. In this paper, we select the first 6 AR parameters and the remnant variance to construct the feature vector. All training feature vectors are used to train the RF classifier. We test different numbers of modes decomposed by VMD, as well as different proportions of training samples to testing samples; the tested settings are listed in Table 1. Each test was repeated 10 times and the average accuracy is reported.

From Table 1, we can see that the diagnostic accuracies are close to 100%. The results show that the method based on the VMD-AR model and the RF classifier achieves high accuracy in the fault location classification problem. We can also observe that the number of component modes has almost no effect on the classification rates. It is worth mentioning that high diagnostic accuracies are obtained even when only 80 training samples are used to train the RF classifier, which indicates that the RF classifier has excellent generalization and self-learning abilities.

Furthermore, to verify the superiority of the RF classifier, the diagnosis results of an SVM classifier using the same fault feature extraction method and calculation process are presented in Table 2. In this step, the LibSVM software is adopted with all default structural parameters [36]. One can see that both the number of training samples and the number of modes have an obvious influence on the diagnostic results. For 4 component modes, the recognition rates are relatively low: all of them are below 94%. The diagnostic accuracy reaches only 88.03% when the number of training samples is 80 and increases to 93.13% when the number of training samples is 320. On the other hand, the diagnostic accuracies improve markedly when 5, 6, or 7 component modes are used. Compared with Table 2, the classification rates of RF in Table 1 are considerably higher than those of SVM.

##### 4.2. Case 2: Results for Fault Severity Level Problem

The second case study is the fault severity level problem. From the CWRU dataset, the inner race fault data are selected, with fault diameters of 0.18 mm, 0.36 mm, and 0.53 mm. Following the same procedure as above, the diagnostic results are shown in Table 3. From Table 3, we can observe that the classification rates are still high: the diagnostic accuracies exceed 98% in most cases. The results show that the proposed fault diagnosis method can also handle rolling bearing severity level classification effectively. Similarly, the accuracies show no evident change with different numbers of component modes or proportions of training samples to testing samples.

Similar tests are carried out with the SVM classifier and the results are given in Table 4. The diagnostic accuracies change with the number of modes. In particular, when the signal is decomposed into 4 component modes, the classification rates are low: with 80 training samples, the diagnostic accuracy is only 81.96%. The number of training samples also has a strong influence on the diagnostic accuracy. The superiority of RF is thus confirmed again.

##### 4.3. Discussion

From Table 2, it is clear that the classification rate of the outer race fault is only about 53% when 4 component modes and fewer training samples are used. Figure 4 shows the actual output for the 320 testing samples in a single test. A considerable number of outer race fault testing samples are incorrectly classified as inner race faults. From Table 4, we can also observe that the classification rates for the fault diameter of 0.36 mm are unsatisfactory with 4 component modes and fewer training samples. The actual output of a single test is displayed in Figure 5. On the one hand, it is plausible that mode mixing occurs when the signal is decomposed into only 4 modes, which makes classification more difficult for SVM. On the other hand, the classification success rate of the SVM classifier is severely affected by its structural parameters. In further tests, we use two optimization algorithms (GA and PSO) to select the optimal penalty factor and kernel function parameter for SVM. Referring to [22, 23], the maximum generation is 50 and the population size is 20 in GA-SVM. For PSO, the maximum generation count is 50 and the number of particles is 20, together with the chosen acceleration constants $c_1$ and $c_2$ and inertia weight $w$. Because the diagnostic accuracies are relatively low when 4 component modes are used, the datasets with 4 modes are adopted for this comparative study. The diagnostic accuracies and time consumption of the different classifiers are given in Tables 5 and 6. Clearly, the classification performance and generalization ability of SVM can be improved significantly by the GA and PSO optimization algorithms. Nevertheless, the optimization procedure requires a large amount of computation and time. The RF classifier has obvious advantages in both computation time and accuracy. It is worth mentioning that RF is insensitive to its classifier parameters: a quite satisfactory result is obtained with the recommended parameter values.

In general, the fault diagnosis success rates are all over 95% when the fault feature extraction method based on the VMD-AR model is combined with different classifiers (RF, GA-SVM, or PSO-SVM). This shows that the VMD-AR model method is suitable for extracting features from rolling bearing vibration signals, which are typically nonstationary. The comparative analysis indicates that RF has a better classification capacity than SVM for the rolling bearing fault diagnosis problem while costing little time. In this field, the ensemble learning strategy of RF is superior to the structural risk minimization principle of SVM.

#### 5. Conclusion

In this paper, a rolling bearing fault diagnosis method based on the VMD-AR model and the RF classifier is put forward. Firstly, the VMD-AR model method is developed for feature extraction. Secondly, an RF classifier is applied to pattern recognition. Finally, the method is validated on an experimental dataset. The analysis results show that the VMD-AR model method performs well in feature extraction for rolling bearings. Furthermore, the comparative analysis shows that the RF classifier combined with the VMD-AR model achieves a higher diagnosis success rate than SVM, GA-SVM, and PSO-SVM while consuming little time. This classifier is well suited to the rolling bearing fault classification problem.

In summary, the novel method based on the VMD-AR model and the RF classifier is valid for rolling bearing fault diagnosis and has broad application prospects in engineering practice.

#### Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

#### Acknowledgments

This research is supported by the National Natural Science Foundation of China (no. 11572167). The authors would like to express their sincere gratitude to postdoctoral researcher Chao Liu for his suggestions for improvement.