#### Abstract

Fault diagnosis of rolling element bearings (REBs) is commonly based on the analysis of vibration signals from a single sensor. However, the information provided by an individual sensor is limited, and the resulting diagnostic system has poor robustness. In this paper, a novel fault diagnosis method based on coaxial vibration signal feature fusion (CVSFF) is proposed to exploit the multisensor information of the system and build a more reliable diagnostic system. First, ensemble empirical mode decomposition (EEMD) is used to decompose the original vibration signal into a number of intrinsic mode functions (IMFs). Then, autocorrelation analysis is applied to reduce the random noise remaining in the IMFs. After that, the Rényi entropy is calculated as the bearing feature. Finally, the features of the coaxial vibration signals are fused by a multiple-kernel learning support vector machine (MKL-SVM) to classify bearing conditions. To verify the effectiveness of the CVSFF method in REB diagnosis, eight data sets from the Case Western Reserve University Bearing Data Center are selected. The fault classification results demonstrate that the proposed approach is a valuable tool for bearing fault detection and that the fused feature from coaxial sensors improves fault classification accuracy for REBs.

#### 1. Introduction

The state of the bearings is very important to the efficient operation of mechanical equipment. A faulty bearing causes periodic impacts in the vibration, which can lead to problems in other parts of the mechanical system. Therefore, it is of great significance to detect bearing faults in time and replace the bearing to avoid machine breakdown.

With the development of information technology, judging the bearing state by signal analysis has become an important trend in condition-based monitoring (CBM) [1]. In recent years, many methods have been applied to fault detection of REBs, such as vibration signal, oil, temperature, and acoustic emission analysis. Among these methods, vibration signal analysis is the most widely used and effective owing to the rich information that the vibration signal contains [2, 3]. A bearing operating with a local spall produces vibration impulses [4]. By analyzing the collected vibration signals, the fault impact characteristics can be obtained and used for bearing fault diagnosis. In the research on bearing fault diagnosis based on vibration signals, many papers focus on the signal collected from a single sensor. However, for a complex mechanical system, relying on the information of an individual sensor introduces uncertainty into fault diagnosis, which leads to unreliable diagnosis results in many cases.

Information fusion is a technology that merges data to obtain more consistent, informative, and accurate information than the original raw data that are mostly uncertain [5]. Some scholars have made achievements in bearing fault diagnosis based on multisensor information fusion technology. These research studies are mainly divided into multisource data fusion relating to the data from the same kind of sensors and different kinds of sensors. Owing to the advantages of vibration signals, many scholars adopt the method of multiple vibration sensor signal fusion in bearing fault diagnosis. These signal fusion studies can be classified as data-level fusion, feature-level fusion, and decision-level fusion.

Yan et al. [6] proposed a concept of space-time fragments, and vibration signals captured by multiple sensors are fused at data level. For feature-level fusion, Jiang et al. [7] developed a feature-level information fusion methodology of fault diagnosis in rotating machinery. Tao et al. [8] presented a novel bearing fault diagnosis method using multivibration signals and deep belief network (DBN). Banerjee and Das [9] suggested a hybrid model for an on board fault-tolerant control system by vibration data fusion. Wang et al. [10] fused vibration data features as the health indicator of bearing status and got a good representation of bearing defect conditions. Zhou et al. [11] addressed a feature fusion approach based on NCA and coupled hidden Markov model. Hong et al. [12] introduced a preprocessing model of bearing using wavelet packet-empirical mode decomposition (WP-EMD) for feature extraction of vertical and horizontal vibrations. For decision-level fusion, Hui et al. [13] proposed an automated bearing fault diagnosis model that employs SVM and the Dempster–Shafer evidence theory in classification.

In the aspect of different kinds of signals fusion, some scholars have carried out bearing fault diagnosis based on the fusion of vibration signals and other types of signals. Safizadeh and Latifi [14] presented a new method of bearing fault diagnosis using the fusion of two primary sensors: an accelerometer and a load cell. Lu et al. [15] designed a sound-aided vibration signal adaptive stochastic resonance (SAVASR) method for bearing fault detection. All those works indicate that multisensor information fusion methods have higher classification accuracy than single sensor data analysis in bearing fault diagnosis.

Nevertheless, the vibration signal sources in the above bearing fault diagnosis studies still have some problems. The authors of [6, 9, 10] do not specify the installation location of the vibration sensors. The experiments in [7, 9] use too many sensors, which makes the system more complex. The studies in [11, 12, 14, 15] only collect vibration signals in one or several directions from a single bearing. In experiment Case 2 of [11], the vibration sensors are installed only on the base of the test rig. In fact, the location of the vibration sensor is very important in fault diagnosis of mechanical equipment. Vibration data are composed of multiple vibration source signals and noise signals. The vibration must pass through multiple interfaces to reach an accelerometer, which inevitably causes energy dissipation. Because a shaft and the bearing inner race are rigidly connected, the shaft transmits vibration between coaxial bearings. The fault signals of coaxial bearings are therefore usually similar in the frequency domain. For this reason, the coaxial vibration signal feature fusion (CVSFF) algorithm proposed in this paper takes the vibration sensors of coaxial bearings as data sources. Sufficient information about REBs can be obtained from this limited multisource data. The features of this information are then fused to realize bearing fault diagnosis.

In the research on bearing fault diagnosis based on information fusion technology, feature-level fusion is widely used. Among the existing algorithms, the support vector machine (SVM) has been widely used because of its good classification performance. The authors of [7, 9, 13] used SVM as the classifier for fault classification and achieved good results. However, the classification performance of SVM is greatly affected by the kernel function, and the choice of kernel function depends on human experience [16]. To solve this problem, multiple kernel learning (MKL) methods have been proposed. MKL learns the kernel function and the classifier parameters simultaneously, which effectively addresses the problem of kernel function selection. Meanwhile, an SVM trained by MKL has more flexibility and higher classification accuracy. Many scholars have applied MKL to SVM model optimization and obtained good results [16, 17].

Based on the above analysis, a new method of fault diagnosis based on CVSFF is presented in this paper. First, EEMD is used to decompose the original vibration signals collected from the bearings at both ends of a shaft. Then, autocorrelation is applied to reduce the random noise in the IMF components. After that, the energy ratio of each IMF component is calculated to obtain the probability mass function (PMF). The Rényi entropy feature matrix of the coaxial sensors is computed from the PMF. Finally, the different states of REBs are classified by MKL-SVM.

This paper is organized as follows. The proposed CVSFF method is presented in Section 2. In Section 3, the algorithm is validated and analyzed with experimental data. To evaluate the effectiveness of CVSFF, it is compared with MKL-SVM classification based on an individual sensor and with SVM, genetic algorithm-optimized SVM (GA-SVM), and particle swarm optimization SVM (PSO-SVM) based on coaxial features. Section 4 presents the conclusion.

#### 2. Methods

##### 2.1. EEMD

There are many time-frequency analysis methods for decomposing nonlinear and nonstationary time series into a set of components, such as empirical mode decomposition (EMD) [18], ensemble empirical mode decomposition (EEMD) [19], variational mode decomposition (VMD) [20], and broadband mode decomposition (BMD) [21, 22]. Among these methods, EEMD is the most widely used in CBM of REBs. EEMD significantly improves the decomposition by reducing mode mixing. Its two important parameters are the ratio *k* of the standard deviation of the added white noise to the standard deviation of the signal and the total number *M* of EMD trials. Huang suggested that *k* is generally set to 0.2 [19]. Moreover, the amplitude of the white noise should be reduced appropriately for a signal mainly composed of high-frequency components and increased for a signal mainly composed of low-frequency components. Besides, Huang found that *M* follows the statistical law

$$e = \frac{k}{\sqrt{M}}, \tag{1}$$

where *e* is the maximum relative error of signal decomposition. In this paper, *M* is taken as 100, and *k* is taken as 0.2.
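As a quick check of the parameter choice, and assuming the commonly quoted EEMD relation between the noise ratio *k*, the ensemble size *M*, and the residual decomposition error *e* (the symbol *e* is a label chosen here), the values used in this paper give

$$e = \frac{k}{\sqrt{M}} = \frac{0.2}{\sqrt{100}} = 0.02,$$

that is, an expected relative error on the order of 2%. Halving this error under the same relation would require four times as many EMD trials.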

The specific decomposition steps of EEMD are as follows:

(1) Add random white noise to *x*(*t*):

$$x_m(t) = x(t) + k\sigma_x n_m(t), \tag{2}$$

where $n_m(t)$ is the Gaussian white noise of the *m*^{th} trial, with mean value 0 and standard deviation 1, and $\sigma_x$ is the standard deviation of the signal *x*(*t*).

(2) Decompose the signal $x_m(t)$ by EMD.

(3) Repeat Steps 1 and 2 *M* times.

(4) Average each IMF component over the *M* EMD decompositions to obtain the global average:

$$c_i(t) = \frac{1}{M}\sum_{m=1}^{M} c_{i,m}(t), \qquad r(t) = \frac{1}{M}\sum_{m=1}^{M} r_m(t), \tag{3}$$

where $c_{i,m}(t)$ and $r_m(t)$ are the *i*^{th} IMF and the residual component of the *m*^{th} trial, respectively.
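The ensemble part of these steps can be sketched as follows. This is only an illustration under stated assumptions: the EMD step is passed in as a caller-supplied function `emd` (a hypothetical interface that must return the same number of components on every call), and `toy_decomposer` is a moving-average stand-in used to make the sketch runnable, not a real EMD.

```python
import numpy as np


def eemd(x, emd, k=0.2, M=100, seed=0):
    """Average the output of `emd` over M noise-perturbed copies of x.

    `emd` (hypothetical interface) maps a 1-D signal to an array of shape
    (n_components, len(x)) whose last row is the residual.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sigma_x = x.std()
    total = None
    for _ in range(M):                          # Step 3: repeat M times
        noise = rng.standard_normal(x.size)     # zero mean, unit std
        x_m = x + k * sigma_x * noise           # Step 1: add scaled noise
        comps = np.asarray(emd(x_m))            # Step 2: decompose
        total = comps if total is None else total + comps
    return total / M                            # Step 4: ensemble mean


def toy_decomposer(x, width=25):
    """Stand-in for EMD (NOT real EMD): fast part and moving-average part."""
    kernel = np.ones(width) / width
    slow = np.convolve(x, kernel, mode="same")
    return np.vstack([x - slow, slow])
```

Because the added noise has zero mean, its contribution to the averaged components shrinks as *M* grows, which is why the ensemble mean is taken in the last step.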

##### 2.2. Autocorrelation

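An autocorrelation function describes the relationship of a signal with itself at different times. The autocorrelation function of a signal *x*(*t*) is defined as

$$R_x(\tau) = \lim_{T \to \infty} \frac{1}{T}\int_{0}^{T} x(t)\,x(t+\tau)\,dt, \tag{4}$$

where *τ* is the time delay.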

According to the properties of the autocorrelation function, the autocorrelation function of a periodic signal has the same period as the original signal. Furthermore, the autocorrelation function of random noise attenuates quickly and tends to zero as the time delay *τ* increases. Therefore, if a periodic signal contains random noise, the autocorrelation function can be used to reduce the noise. In this paper, the autocorrelation function is applied to the IMFs to retain their useful periodic content and reduce random white noise.
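A minimal sketch of this property, assuming a discrete, biased sample estimate of the autocorrelation with the mean removed (the paper does not state which estimator is used):

```python
import numpy as np


def autocorr(x):
    """Biased sample autocorrelation for non-negative lags 0 .. N-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the DC offset
    n = x.size
    full = np.correlate(x, x, mode="full")    # lags -(n-1) .. (n-1)
    return full[n - 1:] / n                   # zero lag sits at index n-1
```

For example, a 100 Hz sinusoid sampled at 12 kHz with unit-variance white noise added has a period of 120 samples. Its autocorrelation keeps a clear positive peak near lag 120 and a negative trough near lag 60, while the noise contributes mainly to lag 0.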

##### 2.3. Rényi Entropy

Self-information *I*(*x*_{i}) refers to the amount of information contained in an event *x*_{i} of a physical system. Different events in a physical system contain different amounts of information, so *I*(*x*_{i}) is a random variable and cannot be used as a measure of information about the whole system.

Shannon [23] defines the mathematical expectation of information as the information entropy, that is, the average amount of information in a source:

$$H = -\sum_{i=1}^{n} p_i \log p_i, \tag{5}$$

where *p*_{i} is the probability mass function (PMF).

Shannon entropy is a nonparametric measure, and it is well known that it is not very sensitive when dealing with noisy data. For this reason, the Rényi entropy [24] has been chosen as an alternative entropy measure, which is defined as follows:

$$H_{\alpha} = \frac{1}{1-\alpha}\log\left(\sum_{i=1}^{n} p_i^{\alpha}\right), \qquad \alpha \ge 0,\ \alpha \ne 1. \tag{6}$$

The parameter *α* in Rényi entropy can be used to make the entropy more or less sensitive to particular segments of the probability distributions. *α* ⟶ 0 causes the Rényi entropy to become highly sensitive to changes in the tails of the distribution. And for *α* ⟶ 1, it reduces to the Shannon entropy, and hence, the Rényi entropy becomes more sensitive in the regions where the bulk of PMF is located [25].

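To capture the change in the frequency distribution of the vibration signal energy of a rolling bearing, *p*_{i} is taken as the ratio of the energy of each IMF to the total energy:

$$p_i = \frac{E_i}{\sum_{i=1}^{n} E_i}, \tag{7}$$

where $E_i = \sum_{j=1}^{N} \lvert c_i(j)\rvert^{2}$ is the energy of IMF_{i}, *n* is the number of IMFs, *N* is the number of points of the signal $c_i(t)$, and $\sum_{i=1}^{n} p_i = 1$.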

The Rényi entropy characterizes the statistical properties of a random variable and reflects its randomness. For a normal rolling bearing, the energy of the vibration signal is distributed nearly uniformly over the frequency bands, so the energy distribution is highly uncertain and the entropy is relatively large. When a spall occurs, the energy is concentrated mainly around the resonance frequency. The uncertainty of the energy distribution is then reduced, so the entropy decreases. Therefore, the Rényi entropy is a sensitive feature for REB classification.
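The feature computation described above can be sketched as follows. The order `alpha=2.0` and the natural logarithm are assumptions made for illustration; the value of *α* and the logarithm base used in the paper are not stated in this section.

```python
import numpy as np


def energy_pmf(imfs):
    """Energy ratio of each IMF (rows of `imfs`) to the total energy."""
    imfs = np.asarray(imfs, dtype=float)
    energy = np.sum(imfs ** 2, axis=1)
    return energy / energy.sum()


def renyi_entropy(p, alpha=2.0):
    """Rényi entropy of a PMF; falls back to Shannon entropy at alpha = 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                              # zero-probability terms drop out
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))  # Shannon limit
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))
```

A uniform energy distribution over nine IMFs gives the maximum value ln 9, while energy concentrated in a single IMF gives zero, which matches the expectation that a spalled bearing yields a lower entropy than a normal one.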

##### 2.4. MKL-SVM

SVM is a machine learning method based on statistical learning theory and the structural risk minimization principle, developed by Vapnik and his group [26]. The optimal classification function is as follows:

$$f(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i^{*} y_i K(\mathbf{x}_i, \mathbf{x}) + b^{*}, \tag{8}$$

where *K*(·, ·) is the kernel function associated with a reproducing kernel Hilbert space (RKHS) *H*, **x**_{i} is the *i*^{th} training sample with label $y_i$, *l* is the number of training samples, **x** is the sample to be classified, and $\alpha_i^{*}$ and $b^{*}$ are the unknown coefficients.

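There are many kinds of kernel functions, such as the linear kernel function $K(\mathbf{x}_i, \mathbf{x}) = \mathbf{x}_i^{T}\mathbf{x}$; the polynomial kernel function $K(\mathbf{x}_i, \mathbf{x}) = (\mathbf{x}_i^{T}\mathbf{x} + c)^{d}$; the Gaussian radial basis kernel function $K(\mathbf{x}_i, \mathbf{x}) = \exp\left(-\lVert \mathbf{x}_i - \mathbf{x} \rVert^{2} / (2\sigma^{2})\right)$; and the sigmoid kernel function $K(\mathbf{x}_i, \mathbf{x}) = \tanh(\beta\,\mathbf{x}_i^{T}\mathbf{x} + \theta)$.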

The kernel function and the SVM model parameters directly affect the performance of the SVM classifier. However, for a complex classification problem, it is difficult to map the samples into a suitable high-dimensional feature space by using a single kernel function. In recent years, many scholars have carried out research on MKL-SVM. Instead of selecting one kernel function in advance, MKL-SVM learns a combination of different kernel functions and thus has more flexibility, better generalization ability, and stronger model interpretability.

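In the MKL framework, *K*(·, ·) is a convex linear combination of a set of basic kernels:

$$K(\mathbf{x}_i, \mathbf{x}) = \sum_{m=1}^{P} d_m K_m(\mathbf{x}_i, \mathbf{x}), \qquad d_m \ge 0,\ \sum_{m=1}^{P} d_m = 1, \tag{9}$$

where *P* is the number of basic kernels and *d*_{m} is the weight of *K*_{m}(·, ·) obtained by sample learning. The decision function of MKL is as follows:

$$f(\mathbf{x}) + b = \sum_{m=1}^{P} f_m(\mathbf{x}) + b, \tag{10}$$

where each function *f*_{m} belongs to a different RKHS *H*_{m} associated with a kernel *K*_{m}. Rakotomamonjy et al. [27] proposed a simple MKL method to learn both the coefficients $\alpha_i$ and the weights *d*_{m}. They adopted the gradient method to solve the MKL problem. The optimal solution is obtained by calculating the gradient of the objective function *J* with respect to *d*_{m}:

$$\frac{\partial J}{\partial d_m} = -\frac{1}{2}\sum_{i,j} \alpha_i^{*}\alpha_j^{*} y_i y_j K_m(\mathbf{x}_i, \mathbf{x}_j). \tag{11}$$

Finally, the decision functions of MKL-SVM are obtained as follows:

$$f(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i^{*} y_i \sum_{m=1}^{P} d_m K_m(\mathbf{x}_i, \mathbf{x}) + b^{*}. \tag{12}$$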
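To make the kernel combination concrete, the following sketch builds Gram matrices for a Gaussian and a polynomial kernel and mixes them with fixed convex weights. It does not learn the weights; in MKL they come from the optimization described above. The kernel forms (bandwidth `sigma`, offset `c = 1` in the polynomial kernel) are common conventions assumed here, not necessarily those of the paper.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gram matrix exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def polynomial_kernel(A, B, d, c=1.0):
    """Gram matrix (a . b + c)^d between rows of A and B."""
    return (A @ B.T + c) ** d


def combine_kernels(grams, weights):
    """Convex combination sum_m d_m K_m of precomputed Gram matrices."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(wi * G for wi, G in zip(w, grams))
```

Since each basic kernel is positive semidefinite and the weights are non-negative, the combined matrix is again a valid kernel and can be passed to any SVM solver that accepts a precomputed Gram matrix.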

##### 2.5. Structure of CVSFF

The procedure of the CVSFF method proposed in this paper is as follows:

- Step 1: collect signals from the coaxial vibration sensors
- Step 2: decompose the vibration data by EEMD and obtain the respective IMFs
- Step 3: denoise the IMFs by the autocorrelation function
- Step 4: extract the Rényi entropy of the IMF components
- Step 5: randomly select the training set and test set from the feature matrix
- Step 6: train the MKL-SVMs
- Step 7: input the test set into the MKL-SVM models, and output the classification results

The method flowchart is shown in Figure 1.

#### 3. Experiments Analysis and Discussion

##### 3.1. Data Set

The bearing data diagnosed in this paper were obtained from the Case Western Reserve University (CWRU) Bearing Data Center. These data sets have been considered as a benchmark and analyzed by many researchers. As shown in Figure 2, the test rig consists of a 2 hp motor, a torque transducer, and a dynamometer.

Two coaxial accelerometers are installed at both ends of the motor to measure vibration signal of the corresponding bearing. In this paper, these two coaxial vibration signals in the drive end and fan end are used for fault diagnosis.

Smith and Randall [28] used three established bearing diagnostic techniques to provide a benchmark analysis of these widely used data sets. According to the data analysis conclusion in [28], data of eight states that are difficult to diagnose by the benchmark method were selected for analysis in this paper, as shown in Table 1.

The last column in Table 1 gives the categorisation obtained with the benchmark method proposed in that paper. The diagnosis categories are as follows:

- Y1: data clearly diagnosable and showing classic characteristics for the given bearing fault in both the time and frequency domains
- Y2: data clearly diagnosable but showing nonclassic characteristics in either or both of the time and frequency domains
- P1: data probably diagnosable; e.g., the envelope spectrum shows discrete components at the expected fault frequencies, but they are not dominant in the spectrum
- P2: data potentially diagnosable; e.g., the envelope spectrum shows smeared components that appear to coincide with the expected fault frequencies
- N1: data not diagnosable for the specified bearing fault but with other identifiable problems (e.g., looseness)
- N2: data not diagnosable and virtually indistinguishable from noise, with the possible exception of shaft harmonics in the envelope spectrum

For the inner ring fault of the drive end at a 12 kHz sampling frequency, the benchmark method in [28] was applied to only four data sets, numbered 169, 170, 171, and 172. Data set 171 was selected from these four because of its poor diagnostic result. Data set 276 was chosen for the same reason. Except for these two data sets and the normal bearing data, the data sets in Table 1 are not diagnosable by the benchmark method in [28].

##### 3.2. EEMD Analysis

For each of the eight states, ten seconds of vibration data (120,000 points) from each of the two coaxial sensors were cut into segments, giving 1600 segments in total. Each state of bearing has 200 segments from the two coaxial sensors (each sensor has 100 segments; each segment has 1200 points). Every segment is decomposed into nine IMFs and a residual component by EEMD. The drive-end raw time waveforms and part of the decomposition results for the eight states are given in Figure 3. Only the original signal and the first seven IMFs are shown in the figure.
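The segmentation arithmetic can be sketched as below. Cutting the record into consecutive, non-overlapping segments is an assumption of this sketch; the section only states the counts (100 segments of 1200 points per sensor and state) and the 12 kHz sampling frequency.

```python
import numpy as np

FS = 12_000        # sampling frequency in Hz
SEG_LEN = 1_200    # points per segment (0.1 s at 12 kHz)
N_SEG = 100        # segments per sensor and state


def segment(record):
    """Cut the first 10 s of a 1-D record into a (100, 1200) array."""
    record = np.asarray(record, dtype=float)
    needed = SEG_LEN * N_SEG                  # 120,000 points
    if record.size < needed:
        raise ValueError("record is shorter than 10 s at 12 kHz")
    return record[:needed].reshape(N_SEG, SEG_LEN)
```

With eight states and two sensors, this yields 8 × 2 × 100 = 1600 segments, each spanning 0.1 s.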

[Figure 3: Drive-end raw time waveforms and the first seven IMFs for the eight bearing states, panels (a) to (h).]

No fault-related impulse component is visible in the time-domain waveforms of the IMFs. To examine the frequency distribution of each IMF, a fast Fourier transform (FFT) was performed on each component. Taking the first segment of each of the eight states as an example, the frequency-domain distribution of the IMFs is shown in Figure 4.

[Figure 4: Frequency-domain distribution of the IMFs for the first segment of each state, panels (a) to (p); (a) and (b) are the normal bearing coaxial sensor data, and (c) to (p) are the faulty bearing coaxial data.]

Figure 4 shows that the energy is mainly concentrated in IMF1. Owing to the characteristics of EEMD, the frequency content of the IMFs goes from high to low as the order increases. Moreover, EEMD separates the frequency bands successfully without mode mixing. Comparing Figures 4(a) and 4(b) also shows that the frequency distributions of the two coaxial sensors are very similar for the normal bearing. In Figures 4(c) to 4(p), which show the coaxial data of faulty bearings, the components other than IMF1 have similar frequency distributions too. Meanwhile, the red boxes in Figure 4 show that the frequency content of the IMF components is spread over many frequency bands. The random noise in the IMFs submerges part of the fault impulses and thus degrades the subsequent feature extraction. Therefore, autocorrelation noise reduction is needed to reduce the random noise in the IMF components.

##### 3.3. Autocorrelation Noise Reduction

To suppress the noise, the autocorrelation function of each IMF is computed in the time domain. Figure 5 shows the frequency distribution of the data in Figure 4 after autocorrelation denoising.

[Figure 5: Frequency-domain distribution of the IMFs in Figure 4 after autocorrelation denoising, panels (a) to (p).]

Comparing Figures 4 and 5 shows that the spectral amplitude marked by the red boxes in Figure 4 decreases significantly after autocorrelation noise reduction, while the frequency band in which each component has its largest amplitude does not change.

The frequency distribution of the IMFs is more concentrated, which means that the random noise in the components is effectively suppressed after autocorrelation. However, Res in Figures 5(b), 5(d), and 5(f) shows the same low-frequency peak at 10 Hz, marked by the red dotted box. We take Res of Figure 5(d) as an example to analyze this phenomenon. The time-domain waveform, autocorrelation function, and corresponding frequency-domain distribution of this component are shown in Figure 6.

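[Figure 6: Res component of Figure 5(d): time-domain waveform, autocorrelation function, and frequency-domain distribution, panels (a) to (c).]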

As can be seen from Figure 6, the signal becomes monotonic after autocorrelation. The 10 Hz peak in Figures 5(b), 5(d), and 5(f) therefore corresponds to the 0.1 s length of the segment and has no physical meaning. Since this paper uses only the IMF1 to IMF9 components to calculate the entropy value, the peak has no influence on the distribution of the feature matrix.
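The link between the 10 Hz peak and the segment length can be checked numerically. In this sketch, a linear ramp is used as a stand-in for the monotonic autocorrelation of the residual, and the mean is removed before the FFT; both are assumptions made for illustration.

```python
import numpy as np

fs, n = 12_000, 1_200                      # 12 kHz sampling, 0.1 s segment
freqs = np.fft.rfftfreq(n, d=1 / fs)       # FFT bin spacing is fs / n

trend = np.linspace(1.0, 0.0, n)           # monotonic stand-in signal
spectrum = np.abs(np.fft.rfft(trend - trend.mean()))
peak_hz = freqs[np.argmax(spectrum)]       # frequency of the largest bin
```

The bin spacing is fs / n = 12000 / 1200 = 10 Hz, and a monotonic trend puts most of its non-DC energy into the first bin, so the largest spectral line sits at 1 / (0.1 s) = 10 Hz regardless of the bearing state.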

##### 3.4. Entropy Feature Extraction

According to equations (6) and (7), we calculated the Rényi entropy of the IMFs. The dimension of the multisensor entropy feature matrix of each state is 100 × 2: the 100 rows correspond to the 100 segments of each state, and the 2 columns correspond to the 2 coaxial sensors. Table 2 shows the mean entropy of the eight states, calculated by averaging the 100 Rényi entropy values of every bearing state. The entropy values of the IMFs without and with denoising are compared in Table 2.

The variation in Table 2 is the rate of change of the mean entropy of the noise-reduced data relative to the original IMFs:

$$\text{Variation} = \frac{R_1 - R_2}{R_1} \times 100\%, \tag{13}$$

where *R*_{1} is the mean entropy of the data without denoising and *R*_{2} is the mean entropy of the data with autocorrelation denoising.

As can be seen from Table 2, the entropy at all measuring points decreases after autocorrelation. Entries with a large reduction (greater than 50%) are highlighted in bold. The maximum reduction is 81.53%. Table 2 also shows that the entropy of the data from the sensor located at the faulty bearing end generally drops the most. This may be because the vibration signal at the faulty bearing end is more susceptible to noise interference.

Figure 7 shows box plots of the Rényi entropy calculated from the vibration signals of the two coaxial sensors. Figure 7(a) is the Rényi entropy distribution for the drive-end sensor, and Figure 7(b) is the distribution for the fan-end sensor. The black part in Figure 7 is the entropy calculated from the raw data, while the red part is the entropy calculated from the IMFs after autocorrelation noise reduction. All entropy values clearly decrease after noise reduction. Furthermore, for both the drive-end and the fan-end sensor, the entropy distribution without noise reduction has a similar shape to that after autocorrelation denoising. Although autocorrelation denoising hardly changes the entropy trend across the bearing states, it increases the separation between the eight states. In addition, the entropy of the normal bearing is greater than that of all faulty bearings for both sensors, which is consistent with the characteristics of the Rényi entropy.

[Figure 7: Box plots of the Rényi entropy for the eight states: (a) drive-end sensor; (b) fan-end sensor.]

To examine the outliers in the box plots, we counted the number of outliers in Figure 7. Table 3 shows that the entropy distribution becomes more concentrated and the number of outliers decreases after noise reduction. The overall decrease in outliers is attributed to the bearing states I-F, O3-F, and O12-F. That is, the reduction in outliers comes mainly from the coaxial data of the bearings with fan-end faults.

The Rényi entropy values of the vibration signals at the drive end and the fan end constitute an 800 × 2 multisensor feature matrix. To compare the distribution of the feature matrix without and with noise reduction more clearly, we draw feature scatter plots. The distributions are shown in Figure 8.

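[Figure 8: Scatter plots of the coaxial entropy feature matrix: (a) without noise reduction; (b) with autocorrelation noise reduction.]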

In Figure 8(a), the features of all conditions except the normal bearing are concentrated in a small region of the feature space and overlap with one another. In Figure 8(b), although conditions B-F, O3-F, and O12-F still have some similar features, the overlap between the feature distributions of different states is significantly reduced after noise reduction. The separability of the features of the eight bearing states is thus increased.

##### 3.5. Result Comparison and Discussion

Through the signal preprocessing steps in Sections 3.2 to 3.4, we obtained the coaxial vibration feature matrix. MKL-SVM is used in this paper to classify the eight bearing states. Kernel functions are applied both to the feature of each single sensor and to the features of all sensors.

Common kernel functions are divided into two types: local kernel functions and global kernel functions. A local kernel function has local characteristics, strong learning ability, and weak generalization ability. Linear and Gaussian kernel functions are typical local kernel functions. In contrast, a global kernel function has global characteristics, strong generalization ability, and weak learning ability. Polynomial and sigmoid kernel functions are typical global kernel functions.

To give the model good classification performance on data with different characteristics, Gaussian and polynomial kernel functions with different parameters were selected in this paper. The detailed parameters are as follows:

(1) Five Gaussian kernel functions with different bandwidths: the bandwidth is uniformly sampled from the interval [0.01, 100] on a logarithmic scale.

(2) Three polynomial kernel functions of different orders: the orders *d* are 1, 2, and 3.

These eight kernel functions are applied to the feature of each single sensor (drive end and fan end) and to both features together. Therefore, there are 8 × 3 = 24 kernels in total.
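The kernel bank can be enumerated as in the sketch below. The column order of the feature matrix (0 for the drive end, 1 for the fan end) and the tuple layout are assumptions chosen for illustration.

```python
import numpy as np
from itertools import product

# Five bandwidths, uniformly spaced on a log scale over [0.01, 100].
bandwidths = np.logspace(-2, 2, 5)
orders = [1, 2, 3]                          # polynomial orders d

# Feature subsets each kernel acts on (column indices are hypothetical).
feature_sets = {"drive_end": [0], "fan_end": [1], "both": [0, 1]}

base_kernels = ([("gaussian", float(s)) for s in bandwidths]
                + [("polynomial", d) for d in orders])

# One entry per (feature subset, base kernel) pair.
kernel_bank = [
    (name, cols, kind, param)
    for (name, cols), (kind, param) in product(feature_sets.items(),
                                               base_kernels)
]
```

Sampling five points uniformly on a logarithmic scale over [0.01, 100] gives the bandwidths 0.01, 0.1, 1, 10, and 100, and the bank contains (5 + 3) × 3 = 24 entries.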

###### 3.5.1. Comparison and Discussion of Coaxial Signals with Individual Signal

To verify the sensitivity of the coaxial signal features, we also performed the same diagnosis based on a single sensor. To train the MKL-SVMs, 70% of the feature matrix is selected as training data. The *one against all* method is adopted to construct the multiclass model. To improve computational efficiency, we jointly optimized the eight binary classification problems. A total of eight MKL-SVMs and one combination of kernels are obtained for the coaxial signals and for each individual signal. The detailed parameters of the models are shown in Tables 4 and 5.
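The data preparation for this step can be sketched as follows. Splitting per state (a stratified split) is an assumption; the section only says that 70% of the feature matrix is selected for training. The label encoding with +1 and -1 is the usual convention for one-against-all SVMs.

```python
import numpy as np


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return train and test indices with train_frac of every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        cut = int(round(train_frac * idx.size))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return np.array(train), np.array(test)


def one_vs_all_targets(labels, n_classes):
    """Column j holds +1 for samples of class j and -1 for all others."""
    labels = np.asarray(labels)
    return np.where(labels[:, None] == np.arange(n_classes)[None, :], 1, -1)
```

With 100 segments per state and eight states, this gives 560 training and 240 test samples, and each of the eight binary problems has 70 positive training samples.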

As can be observed from Table 4, the kernel functions acting on the data of both coaxial sensors account for 91.41% of the total weight, which is the largest part of the MKL model. In other words, the fused two-dimensional coaxial features contribute far more than the feature of a single sensor. Table 4 also shows that the weights obtained by MKL realize not only the selection of kernel functions but also the selection of data. The kernel function parameters of the model based on the drive-end data in Table 5 are similar to those of the fan end. The selected kernel functions, sorted by weight, are as follows: Gaussian kernel with bandwidth 0.01, Gaussian kernel with bandwidth 0.1, and polynomial kernel with *d* = 3. In addition, a first-order polynomial kernel is included for the drive-end data, but its weight is very small. According to the previous formula, the first-order polynomial kernel is a linear kernel. The classification results of these three models based on different data are shown in Table 6. The classification accuracy of the CVSFF method is 97.50%. This is much higher than the accuracies based on a single signal, which are 66.67% and 60.42%, respectively. The results of the three models show that fusing the data from two coaxial sensors improves the accuracy and robustness of bearing fault detection.

The large difference in results is mainly due to the good separability of the two-dimensional features based on coaxial signals, whereas the one-dimensional entropy distributions of an individual sensor overlap, as shown in Figure 7. For the drive-end feature, I-F and O12-F overlap in the interval [0.6, 1.2], and I-D and O6-D overlap in the interval [0.3, 0.5]. For the fan-end feature, I-D and I-F overlap in the interval [1.1, 1.6], and O6-D, B-F, O3-F, and O12-F overlap in the interval [0.3, 0.7].

###### 3.5.2. Comparison and Discussion of MKL-SVM with SVM, GA-SVM, and PSO-SVM

To demonstrate the effectiveness of MKL-SVM, single-kernel SVM, genetic algorithm-optimized SVM (GA-SVM), and particle swarm optimization SVM (PSO-SVM) are evaluated in this section. All parameters of the SVM are left at their default values; that is, the kernel function is the Gaussian kernel, and its bandwidth is 1.

For comparison with CVSFF, we use the same feature matrix that is used for MKL-SVM to train and test the other three SVM models. The classification results are shown in Table 7. The classification accuracies are 97.50%, 95.00%, 95.83%, and 96.67% using MKL-SVM, the original SVM, GA-SVM, and PSO-SVM, respectively. The Gaussian kernel bandwidths obtained by GA and PSO optimization are 8.64 and 5.64, respectively. Although the accuracies of the two optimized algorithms are 0.83 and 1.67 percentage points higher than that of SVM, they are still lower than that of MKL-SVM.

Furthermore, the running time of the models is compared. The experiments are carried out on an Intel(R) Core(TM) i5-4210U CPU at 2.4 GHz with 4 GB RAM, running Windows 7 and MATLAB R2016b. Except for the SVM model, the models need to optimize their parameters, so their running time is much longer than that of SVM. Among the first, third, and fourth models, GA-SVM has the shortest running time but the lowest accuracy. The MKL-SVM model adopted in this paper has a higher classification accuracy, although its running time is 86.17 s longer than that of GA-SVM. Compared with PSO-SVM, the proposed method is both faster and more accurate.

To analyze the results of these four models in detail, we draw the confusion matrix corresponding to the classification results in Table 7, as shown in Figure 9.

[Figure 9: Confusion matrices corresponding to the classification results of the four models in Table 7, panels (a) to (d).]

The misclassifications are mainly concentrated in B-F, O3-F, and O12-F, and the misclassified samples are predicted as another of these same three states. This confirms the feature distribution shown in Figure 8(b). The lowest single-state classification accuracy of MKL-SVM is 90.00% (O3-F, 27/30), which is significantly higher than that of SVM (73.33%, B-F, 22/30). In addition, the classification accuracy of MKL-SVM for the first five states is 100%, while SVM, GA-SVM, and PSO-SVM have 1, 2, and 1 misclassified samples in the I-D state, respectively.

However, Figure 9 also shows that the proposed method misclassifies three segments of O3-F, while the other three models perform better on this state. The three segments misclassified by MKL-SVM are the 79^{th}, 88^{th}, and 97^{th} segments of O3-F. GA-SVM misclassified the 71^{st} segment of O3-F. PSO-SVM misclassified the 88^{th} segment of O3-F, which is also one of the segments misclassified by MKL-SVM. The 97^{th} segment is not close to the features of B-F, but it is still misclassified as B-F. The samples misclassified by MKL-SVM need further study. Nevertheless, MKL-SVM outperforms SVM, GA-SVM, and PSO-SVM in both overall accuracy and lowest single-state accuracy.

#### 4. Conclusion

This paper proposed a feature fusion method based on MKL-SVM using coaxial vibration signals to classify REB states. The accuracy obtained with coaxial signals is 97.50%, which is much higher than the results of a single sensor signal for the drive end (66.67%) and the fan end (60.42%). Polynomial kernels with global characteristics and Gaussian kernels with local characteristics are both selected, which greatly improves the generalization ability of the model. Feature fusion based on CVSFF, SVM, GA-SVM, and PSO-SVM yields accuracies of 97.50%, 95.00%, 95.83%, and 96.67%, respectively. This shows that the method proposed in this paper is more effective for REB fault classification.

#### Data Availability

The bearing data used to support the findings of this study are obtained from the Case Western Reserve University (CWRU) Bearing Data Center (http://csegroups.case.edu/bearingdatacenter/home).

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

The research was supported by the China Railway (grant number 2017J004-H).