Abstract

Identification of rolling bearing fault patterns, especially compound faults, has attracted notable attention and remains a challenge in fault diagnosis. In this paper, a novel method that combines multiscale feature extraction (MFE) with a multiclass support vector machine (MSVM) tuned by particle parameter adaptive (PPA) optimization is proposed. MFE preprocesses the process signals: the data are decomposed into intrinsic mode functions by empirical mode decomposition, and the instantaneous frequency of each decomposed component is obtained by the Hilbert transform. Then, statistical features and principal component analysis are used to extract the significant information from these components, yielding effective data for multiple faults. The MSVM with PPA parameter optimization then classifies the fault patterns. A case study on the rolling bearing fault data from Case Western Reserve University shows that the proposed intelligent method (MFE_PPA_MSVM) improves the classification recognition rate, that the accuracy declines as the number of fault patterns increases, and that prediction accuracy is best when the training set makes up 70% of the total sample set. These results indicate that the method is feasible and efficient for fault diagnosis.

1. Introduction

With the increasing complexity of modern industry, accurate and timely fault diagnosis plays an important role in industrial applications. Many fault diagnosis methods have been developed over the past two decades to identify faults accurately and automatically. They usually rely on basic measurements, such as vibration, acoustic, temperature, and wear debris analysis [1, 2]. Although these methods have been beneficial, the tests are still quite expensive and time-consuming. Fault diagnosis using big data is a goal that has not yet been fully realized. When a machine develops faults, the dynamic signals of the machine structure can be monitored in real time, and the most effective fault diagnosis is feature learning from this monitoring information. Numerous researchers focus on fault diagnosis by fault pattern recognition, but most studies are concerned with the recognition of single fault patterns [3–5]. In practice, however, a machinery fault may be compound, that is, influenced by two or three causes.

Compared with a single fault, compound faults lead to serious performance degradation and are more difficult to recognize, which makes the effective identification of multiple faults a challenging task. There are a few studies on multiple-fault pattern recognition [6–8]. Chen et al. integrate extreme-point symmetric mode decomposition with an extreme learning machine to recognize typical multiple-fault patterns [6]. Ranaee and Ebrahimzadeh use a back-propagation (BP) neural network to recognize multiple-fault patterns [7]. Lu et al. propose a hybrid system that uses independent component analysis (ICA) and a support vector machine (SVM) for recognizing mixture patterns. That method first applies ICA to obtain the independent components (ICs), and the ICs are then used as the inputs for the SVM classifier [8].

Feature extraction has become a major technique for multiple-fault pattern recognition. Numerous previous studies have reported on signal processing [9–13], such as the Fourier transform [9] and the wavelet transform [10]. These methods, however, require appropriate basis functions to be selected in advance, and it is difficult to obtain effective analysis results because data from real-world machines are nonstationary and nonlinear. Empirical mode decomposition (EMD), a powerful and effective time–frequency analysis method, is designed to analyze nonstationary signals and adaptively decomposes a complicated signal into intrinsic mode functions (IMFs) according to the inherent characteristics of the signal [11–13]. Feature extraction by EMD is appropriate for distinguishing different mechanical signals [14–16]. Wang et al. [14] propose a novel feature extraction method based on a nonnegative EMD manifold for machinery fault diagnosis. Saidi et al. [15] propose an EMD-based fault diagnosis module to detect incipient bearing faults from raw vibration signals. Ali et al. [16] use EMD for feature extraction, select the most important intrinsic mode functions, and classify bearing defects with an artificial neural network (ANN). Because EMD copes with the uncertainty and nonlinearity of the production process, EMD-based analysis can overcome the limitations above in multiple-fault diagnosis.

Traditionally, faults were examined and analyzed manually from measurement data. With the development of machine learning techniques, expert systems were employed for fault recognition in automatic process monitoring [17–20]. The support vector machine (SVM) has been widely used for recognizing multiple faults because of its excellent performance in practical applications. Unlike neural network methods, SVM generalizes well from small samples, since its model complexity does not depend on the number of features; it is therefore suitable for high-dimensional data [13, 21, 22].

The above discussion shows that, although some methods have shown promising results in improving multiple-fault diagnosis performance, none has been widely adopted and there is still room for improvement. In this paper, a multiple-fault classification method is proposed that combines empirical mode decomposition, PCA, and MSVM with PPA parameter optimization. Process signals are decomposed into IMFs by the EMD method, and the Hilbert transform is used to obtain the instantaneous frequency of the decomposed components. Then, the statistical features of the intrinsic mode functions and instantaneous frequencies are calculated. Principal component analysis is used to extract the significant information from the statistical features, yielding effective data for multiple faults. Finally, the MSVM with PPA parameter optimization classifies the fault modes.

2. Processing Model

The proposed fault pattern recognition model consists of three modules for effectively monitoring multiple faults (Figure 1).

(i) Feature extraction as the first stage: according to the characteristics of the machinery and equipment operating process, process signals are chosen and decomposed into IMFs by the EMD method, and the instantaneous frequency of the decomposed components is obtained by the Hilbert transform. Statistical and shape features are extracted from the EMD data. Then PCA is applied to reduce the feature dimension and the computational complexity.

(ii) Fault pattern classification by MSVM as the second stage: the selected features are used as the inputs, and the MSVM classifier should be designed properly to obtain satisfactory recognition performance.

(iii) Optimization module as the third stage: K-fold cross-validation and adaptive mutation particle swarm optimization are combined as the PPA parameter optimization method to select the parameters for the MSVM.

2.1. Features Extraction

(1) EMD Method Principle. Empirical mode decomposition (EMD) is an adaptive time–frequency analysis method. Its main principle is to decompose a complex signal self-adaptively into several IMFs according to the inherent characteristics of the signal, so that every IMF carries specific frequency information of the signal. EMD smooths the signal, gradually separates its fluctuations or trends at different scales, and avoids the problem of preselecting the filter center frequency and bandwidth in traditional envelope analysis. EMD is suitable for nonstationary and nonlinear signal analysis and has been widely used in many fields [11–14]. The steps of EMD are as follows.

First, the raw vibration signal is decomposed as

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t), \tag{1}$$

where $x(t)$ is the original vibration signal, $c_i(t)$ is the $i$th intrinsic mode function, and $r_n(t)$ is the residual function, which represents the overall trend of the signal. Here $i$ indexes the decomposition steps and $n$ is the total number of decompositions.

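The mode components are separated in order of instantaneous frequency from high to low, so from the filtering point of view EMD can be regarded as a bank of high-pass filters. The first few high-frequency IMF components effectively represent the signal characteristics; the remaining IMFs belong to the residual component, which is mainly low-frequency noise. Each selected high-frequency IMF component $c_i(t)$ is transformed by the Hilbert transform (see (2)), and its analytic signal is constructed as in (3):

$$H[c_i(t)] = \frac{1}{\pi}\,\mathrm{P.V.}\!\int_{-\infty}^{+\infty} \frac{c_i(\tau)}{t-\tau}\,d\tau, \tag{2}$$

$$z_i(t) = c_i(t) + jH[c_i(t)] = a_i(t)e^{j\varphi_i(t)}. \tag{3}$$

The amplitude function $a_i(t)$ and the phase function $\varphi_i(t)$ are obtained by

$$a_i(t) = \sqrt{c_i^2(t) + H^2[c_i(t)]}, \qquad \varphi_i(t) = \arctan\frac{H[c_i(t)]}{c_i(t)}, \tag{4}$$

$$\omega_i(t) = \frac{d\varphi_i(t)}{dt}. \tag{5}$$

Equation (5) is the derivative of the phase function, namely, the instantaneous frequency.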
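As a rough illustration of how an instantaneous frequency can be obtained from the phase of an analytic signal, the following sketch builds the analytic signal with a direct discrete Fourier transform and differentiates the unwrapped phase. It is only a minimal sketch under stated assumptions: the function names, the test tone, and the sampling rate `fs = 1200.0` are hypothetical and are not taken from the paper, and a real IMF would come from an EMD implementation rather than from a pure cosine.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal x + j*H[x], computed with a direct O(N^2) DFT."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC (and Nyquist when n is even), double positive bins, drop negative bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    return [sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def instantaneous_frequency(x, fs):
    """Finite-difference derivative of the analytic phase, in Hz."""
    phase = [cmath.phase(z) for z in analytic_signal(x)]
    freqs = []
    for t in range(1, len(phase)):
        step = phase[t] - phase[t - 1]
        # Unwrap the phase step into [-pi, pi) before scaling.
        step = (step + math.pi) % (2 * math.pi) - math.pi
        freqs.append(step * fs / (2 * math.pi))
    return freqs


# Hypothetical test tone: 50 Hz cosine, an integer number of cycles in the window.
fs = 1200.0
n = 240
tone = [math.cos(2 * math.pi * 50.0 * t / fs) for t in range(n)]
inst_freq = instantaneous_frequency(tone, fs)
```

The direct DFT keeps the example dependency-free; for signals of realistic length an FFT-based routine would be the practical choice.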

IMF components largely reflect the true characteristic information of the original signal. However, because EMD uses cubic spline interpolation, divergence appears at both ends of the signal. As the decomposition proceeds, this divergence spreads inward over the entire signal and produces mode aliasing. Recent studies show that the end effect can be reduced by selecting a longer signal and then truncating both ends of each subsequent IMF component.

In this paper, the IMFs and their corresponding instantaneous frequencies are selected as the characteristic variables. For each of them, eight statistical feature values are calculated to form the feature data set: mean, max, range, standard deviation, skewness, kurtosis, coefficient of variation, and sum of squares (see Table 1).
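The eight statistics can be sketched as follows. Since Table 1 is not reproduced here, the exact definitions are assumptions: the sketch uses the population standard deviation, the third and fourth standardized moments for skewness and kurtosis (non-excess kurtosis), the ratio of standard deviation to mean for the coefficient of variation, and the sum of squared values. The function name `statistical_features` is hypothetical.

```python
import math


def statistical_features(x):
    """Eight statistics of one component (definitions assumed, see lead-in)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    std = math.sqrt(sum(c * c for c in centered) / n)  # population std
    return {
        "mean": mean,
        "max": max(x),
        "range": max(x) - min(x),
        "std": std,
        "skewness": sum(c ** 3 for c in centered) / n / std ** 3,
        "kurtosis": sum(c ** 4 for c in centered) / n / std ** 4,
        "cv": std / mean,  # undefined when the mean is zero
        "sum_of_squares": sum(v * v for v in x),
    }


features = statistical_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

A constant segment (zero standard deviation) or a zero-mean segment would need special handling, which this sketch leaves out.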

(2) Principal Component Analysis (PCA). Multivariate statistical analysis is the most commonly used data-driven method for fault diagnosis. Principal component analysis (PCA), an efficient representative of this family, uses a linear transformation to obtain as few features as possible. The new features are uncorrelated with each other while retaining as much of the original information as possible. In this study, PCA is chosen as the second feature extraction step to eliminate duplicated information.

For the PCA algorithm, assume that we have a collection of $n$ unlabeled training data samples organized into the matrix $X = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^{n \times m}$. Each data sample is a (column) vector of dimension $m$. For example, sample $i$ can be denoted as $x_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^T$, where, for a matrix (or a vector) $A$, $A^T$ indicates its transpose. For simplicity of description, we assume each data sample has already been appropriately scaled and demeaned.

First, calculate the covariance matrix

$$S = \frac{1}{n-1} X^T X.$$

The matrix $S$ describes the overall scattering of the process data. It is symmetric and can be orthogonally diagonalized as

$$S = V \Lambda V^T,$$

where $V = [v_1, v_2, \ldots, v_m]$ is an orthonormal eigenvector matrix and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ is the diagonal matrix of eigenvalues, such that $S v_j = \lambda_j v_j$. Moreover, the eigenvalues are ordered such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$. The eigenvector $v_j$ is the principal component of the $j$th dimension.

Let $\eta_j = \lambda_j / \sum_{k=1}^{m} \lambda_k$ be the contribution rate of the $j$th principal component. Choose the smallest number $l$ of principal components such that

$$\sum_{j=1}^{l} \eta_j \ge \theta,$$

where the threshold $\theta$ is the desired percent of variance retained; for instance, $\theta$ can be chosen to be 0.99. The data set is then approximated using the first $l$ principal components organized into the matrix

$$P = [v_1, v_2, \ldots, v_l] \in \mathbb{R}^{m \times l}.$$

Specifically, for each data sample $x_i$, $i = 1, \ldots, n$, the extracted new feature is an $l$-dimension vector given by

$$y_i = P^T x_i.$$

The new data set is subsequently given by

$$Y = [y_1, y_2, \ldots, y_n]^T = XP.$$

Therefore, the original data set is

$$X = Y P^T + E,$$

where $E$ is the residual matrix, mainly caused by noise. Removing the residuals will not have a significant impact on the useful information.
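The rule for choosing the number of retained components can be sketched in a few lines. The function name `select_components` and the example eigenvalues are hypothetical; the sketch assumes the eigenvalues of the covariance matrix have already been computed by some eigendecomposition routine.

```python
def select_components(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative
    contribution rate reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, lam in enumerate(ordered, start=1):
        cumulative += lam / total  # contribution rate of this component
        if cumulative >= threshold:
            return count, cumulative
    return len(ordered), cumulative


# Hypothetical eigenvalues; contribution rates are 0.50, 0.30, 0.15, 0.05.
n_kept, retained = select_components([5.0, 3.0, 1.5, 0.5], threshold=0.9)
```

With these example values the first two components reach a cumulative rate of 0.80, so a third is needed to pass a 0.9 threshold.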

2.2. Support Vector Machine

The basic binary SVM was initially designed for two-class problems and is based on structural risk minimization theory. It seeks the best trade-off between model complexity and learning ability. It has since been extended to multiclass problems.

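The binary SVM classification method constructs an optimal separating hyperplane (OSH) that maximizes the margin between two classes of data points (see Figure 2). Suppose that the training set is $\{(x_i, y_i)\}_{i=1}^{n}$ with $x_i \in \mathbb{R}^m$ and $y_i \in \{-1, +1\}$. A binary SVM model with a nonlinear kernel finds the best classifier with parameters $w$, $b$, in the form of

$$f(x) = \operatorname{sgn}\left(w^T \phi(x) + b\right)$$

in order to

$$\min_{w,b,\xi}\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad y_i\left(w^T\phi(x_i)+b\right) \ge 1-\xi_i,\;\; \xi_i \ge 0,\;\; i=1,\ldots,n,$$

where the minimization is over all decision variables $w$, $b$, and $\xi_i$, $\phi(\cdot)$ is the feature map induced by the kernel, and $C$ is a penalty constant that controls the trade-off between minimizing the classification error and maximizing the margin.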

For a nonlinear decision boundary, the kernel function is applied to transform the input from a low-dimensional space into a higher dimensional feature space, so that an optimal linear separating hyperplane can be found. Although several types of kernel functions have been proposed, the radial basis function (RBF) is the most widely used for nonlinear problems in SVM. For $x_i, x_j \in \mathbb{R}^m$ it is defined as

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right),$$

where $\gamma > 0$ determines the width of the RBF, and $\|x\|$ indicates the Euclidean norm of $x$.
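A minimal sketch of the RBF kernel follows, assuming the common parameterization $K(x, z) = \exp(-\gamma \|x - z\|^2)$ with kernel parameter $\gamma$; the function name `rbf_kernel` is hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel value for two equal-length vectors (assumed gamma form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


same_point = rbf_kernel([0.5, 1.0], [0.5, 1.0], gamma=0.01)
unit_apart = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0)
```

A larger $\gamma$ makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood.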

To solve multiclass problems, an MSVM method is applied in the second (classifier) stage. Two kinds of MSVM methods are widely used: one-against-all (OAA) and one-against-one (OAO). In this paper, OAO is adopted for multiple-fault recognition. For $k$ classes, this method constructs $k(k-1)/2$ binary SVM classifiers, each trained to separate one class from another. A testing sample is assigned to the class that receives the most votes from these binary classifiers.
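The one-against-one voting scheme can be sketched independently of the underlying binary classifier. In the sketch below the pairwise decision is a hypothetical stand-in (nearest of two class centers on a line) rather than a trained SVM, and the class names and center values are illustrative only.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(classes, binary_decide, sample):
    """Let every class pair vote and return the majority class and the tally."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_decide(a, b, sample)] += 1
    winner = votes.most_common(1)[0][0]
    return winner, votes


# Hypothetical stand-in for the trained binary classifiers.
centers = {"healthy": 0.0, "inner_race": 1.0, "ball": 2.0, "outer_race": 3.0}


def nearest_center(a, b, sample):
    return a if abs(sample - centers[a]) <= abs(sample - centers[b]) else b


label, votes = one_against_one_predict(list(centers), nearest_center, 1.9)
n_pairs_ten_classes = len(list(combinations(range(10), 2)))
```

The pair count grows quadratically with the number of classes, which is the main cost of the one-against-one design.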

The main difficulty encountered in the MSVM is selecting the best penalty parameter $C$ and kernel function parameter $\gamma$. To alleviate this difficulty, the PPA parameter optimization method is used next to obtain the best values of $C$ and $\gamma$ in the MSVM classifier.

2.3. Parameters Optimization

MSVM is applied to classify the multiple-fault patterns in this paper, but the main difficulty is how to select the penalty parameter ($C$) and the kernel function parameter ($\gamma$) of the MSVM. In much of the literature, these two parameters are obtained by $K$-fold cross-validation.

The principle of $K$-fold cross-validation is to split the original sample randomly into $K$ subsamples, choose one subsample as the validation data, and use the remaining $K-1$ subsamples as the training data. This is repeated $K$ times so that every subsample is used exactly once as the validation data, and the $K$ results are averaged into a single estimate. The advantage of this method is that all data are used for both training and validation. The penalty parameter $C$ and kernel function parameter $\gamma$ with the highest $K$-fold cross-validation accuracy are selected as the final MSVM parameters; the model is then trained with these two parameters on the entire training set and finally used to classify the unknown testing samples.
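The splitting procedure described above can be sketched as follows. The function names `k_fold_indices` and `cross_val_score` are hypothetical, and `evaluate` stands for any user-supplied routine that trains on the training indices and returns the accuracy on the validation indices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists; each sample validates once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def cross_val_score(n_samples, k, evaluate, seed=0):
    """Average the score returned by evaluate(training, validation) over k folds."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


splits = list(k_fold_indices(10, 3))
# Stand-in evaluator: fraction of samples held out, just to exercise the loop.
mean_holdout = cross_val_score(10, 3, lambda train, val: len(val) / 10)
```

In the setting of this paper, `evaluate` would train the MSVM with a candidate pair of parameters and return its validation accuracy.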

The weakness of $K$-fold cross-validation is that the best parameters selected on the folds may not represent the entire training data, and the result is affected when only a small amount of training data is available. Therefore, adaptive mutation particle swarm optimization and $K$-fold cross-validation are combined to obtain the best parameters of the MSVM. The MSVM regularization parameter $C$ and kernel parameter $\gamma$ together define one particle, and the training accuracy under $K$-fold cross-validation is used as the fitness function; 3-fold cross-validation is applied. The steps of the proposed parameter optimization method are as follows.

Step 1. Set the PPA parameters, such as the population number, swarm size, maximum velocity, adaptive mutation probability, and parameter ranges (see Table 2).

Step 2. Randomly generate the initial particles and their velocities. For each particle, train the MSVM and take the training accuracy under 3-fold cross-validation as the fitness value; the number of folds (3) is chosen according to the proportion of training samples to testing samples [23]. Initialize each particle's best known position to its initial position.

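Step 3. Update the position and velocity of every particle, and then renew the best known position $p_i$ of each particle and the best group position $g$. Specifically, let $r_1$, $r_2$ be two uniformly distributed random numbers in the interval $[0, 1]$. The velocity $v_i^{k}$ and position $x_i^{k}$ of particle $i$ at iteration $k$ are updated using the current velocity and the distances from $p_i$ and $g$ as follows. Let $x_{\max,d}$ be the maximum allowed position in the $d$th dimension and $x_{\min,d}$ be the minimum allowed position in the $d$th dimension, and let $v_{\max,d}$ be the maximum velocity allowed in the $d$th dimension. Calculate

$$\tilde{v}_i^{k+1} = w v_i^{k} + c_1 r_1 (p_i - x_i^{k}) + c_2 r_2 (g - x_i^{k}), \tag{20}$$

where $w$ is the inertia weight and $c_1$, $c_2$ are the acceleration coefficients. Then calculate each component of $v_i^{k+1}$ by

$$v_{i,d}^{k+1} = \operatorname{sgn}\left(\tilde{v}_{i,d}^{k+1}\right) \min\left\{\left|\tilde{v}_{i,d}^{k+1}\right|, v_{\max,d}\right\}. \tag{21}$$

For a real number $a$, the notation $\operatorname{sgn}(a)$ indicates the usual signum function. Then the $i$th particle position is updated with

$$x_i^{k+1} = x_i^{k} + v_i^{k+1}. \tag{22}$$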

Step 4. Apply adaptive mutation. Standard PSO suffers from a “premature” convergence problem: the swarm easily falls into a local extremum, and the other particles quickly move to this local position during optimization. AMPSO addresses this problem by letting the algorithm escape from the local optimum and search for the best solution elsewhere in the space.

As can be seen in formula (22), the next position of a particle is determined by both its current position and its new velocity. The new velocity is determined by the immediately previous velocity, the individual best $p_i$, and the group best $g$, as shown in formula (20). If the algorithm converges prematurely, the group best $g$ is only a local optimum. If $g$ is changed, the search direction of the particles is redirected. Thus, the main idea of the AMPSO is to mutate $g$ in the hope that the search escapes the local optimum and explores new individual and group optima.

The mutation of the PSO is designed as a random operator with a certain probability $p_m$. Specifically, for a uniformly distributed random number $r \in [0, 1]$, a mutated new group optimum is obtained as follows:

Step 5. Repeat from Step 2 until a termination criterion is met, such as reaching the maximum number of iterations or the required accuracy.

Step 6. Find the global best position $g$.
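The steps above can be sketched as a small particle swarm loop. This is a minimal sketch under several assumptions that are not taken from the paper: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the velocity limit `v_frac`, and the mutation probability `p_mut` are illustrative values; the mutation simply re-samples one coordinate of the group best used to guide the swarm; and the fitness is a toy function standing in for the 3-fold cross-validation accuracy of the MSVM.

```python
import math
import random


def ampso_maximize(fitness, lower, upper, n_particles=20, n_iter=100,
                   w=0.7, c1=1.5, c2=1.5, v_frac=0.2, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    dim = len(lower)
    v_max = [v_frac * (upper[d] - lower[d]) for d in range(dim)]

    # Step 2: random initial positions and velocities, initial fitness values.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max[d], v_max[d]) for d in range(dim)]
           for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_fit = [fitness(p) for p in pos]
    top = max(range(n_particles), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[top][:], p_fit[top]

    for _ in range(n_iter):
        # Step 4 (assumed form): with probability p_mut, guide the swarm with
        # a mutated copy of the group best; the recorded best is kept as is.
        guide = g_best[:]
        if rng.random() < p_mut:
            d = rng.randrange(dim)
            guide[d] = rng.uniform(lower[d], upper[d])

        # Step 3: velocity update, velocity clamping, position update.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                raw = (w * vel[i][d]
                       + c1 * r1 * (p_best[i][d] - pos[i][d])
                       + c2 * r2 * (guide[d] - pos[i][d]))
                vel[i][d] = math.copysign(min(abs(raw), v_max[d]), raw)
                pos[i][d] = min(upper[d], max(lower[d], pos[i][d] + vel[i][d]))
            score = fitness(pos[i])
            if score > p_fit[i]:
                p_best[i], p_fit[i] = pos[i][:], score
            if score > g_fit:
                g_best, g_fit = pos[i][:], score

    # Step 6: return the best position found.
    return g_best, g_fit


def toy_fitness(p):
    # Stand-in for cross-validation accuracy, maximal at (3, -1).
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)


best_pos, best_fit = ampso_maximize(toy_fitness, [-5.0, -5.0], [5.0, 5.0])
```

Keeping the recorded best separate from the mutated guide means a mutation can redirect the search without ever discarding the best parameters found so far.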

With the optimum parameters found by the improved adaptive mutation particle swarm optimization algorithm and $K$-fold cross-validation, the MSVM is trained on the entire training set to obtain the PPA-MSVM classification model. The trained PPA-MSVM classifier is then used to classify the unknown testing set, so that the diagnosis of rolling bearing faults is carried out intelligently.

3. Case Analyses

To verify the feasibility and effectiveness of this method, the bearing dataset of the Case Western Reserve University Bearing Data Center is adopted in this paper. The experimental apparatus is shown in Figure 3; it consists of a 2 hp, three-phase induction motor (left), a torque transducer (middle), and a dynamometer load (right) [24, 25]. Four bearing conditions are simulated under four different loads: healthy, inner-race defect, rolling element defect, and outer-race defect. Typical waveforms of the four conditions are illustrated in Figure 4.

3.1. Data Descriptions

In this study, bearing data recorded at a rotating speed of 1797 r/min and a sampling frequency of 12 kHz for the four bearing conditions were selected to evaluate the proposed method. Single point faults with fault diameters of 0.178 mm, 0.356 mm, and 0.533 mm were tested. The three fault types at three fault diameters, together with the healthy condition, give ten classes; the specific data are shown in Table 3. Each sample contains 1000 vibration data points, and each class has 120 samples.

3.2. Features Extraction

(1) Empirical Mode Decomposition. First, the process vibration signals are decomposed into intrinsic mode functions by the EMD method. IMF1 to IMF4 are selected in this paper, and the instantaneous frequency of each is then obtained by the Hilbert transform (see Figure 5). This gives a total of eight components: four IMF components and their four corresponding instantaneous frequency sequences. To suppress the end effect of EMD, both ends of each component are cut off, and the remaining 800 data points of each IMF and instantaneous frequency sequence are kept.

(2) Statistical Features. Eight statistical features (mean, max, range, std, skewness, kurtosis, variation coefficient (VC), and sum of squares) are calculated for each of the eight components, giving a 64-dimensional feature vector per sample. The data for each pattern therefore form a 120 × 64 matrix.
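The bookkeeping behind the 64-dimensional vector can be sketched as follows. The sketch assumes that the 800 retained points are the central part of each 1000-point component (the paper does not state how many points are removed from each end), and `placeholder_features` is a hypothetical stand-in for the eight statistics; the names `trim_ends` and `build_feature_vector` are also hypothetical.

```python
def trim_ends(component, keep=800):
    """Keep the central `keep` points (symmetric trimming is an assumption)."""
    excess = len(component) - keep
    if excess < 0:
        raise ValueError("component is shorter than the requested length")
    start = excess // 2
    return component[start:start + keep]


def build_feature_vector(components, feature_fn, keep=800):
    """Concatenate the features of every trimmed component into one vector."""
    vector = []
    for component in components:
        vector.extend(feature_fn(trim_ends(component, keep)))
    return vector


def placeholder_features(segment):
    # Stand-in returning eight numbers; replace with the eight statistics.
    return [float(len(segment))] * 8


# Eight hypothetical components: four IMFs and four instantaneous frequencies.
components = [[float(t) for t in range(1000)] for _ in range(8)]
vector = build_feature_vector(components, placeholder_features)
```

Stacking the vectors of the 120 samples of one class row by row gives the 120 × 64 matrix mentioned above.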

The feature space box plots of the eight features in the 10 different classes are generated from the data after EMD (see Figure 6). The features take different values for different patterns, so each class can be separated relatively well from the others.

(3) Principal Component Analysis. Principal component analysis yields fewer variables while maintaining as much of the original information as possible. In this study, after choosing the statistical and shape features, we use PCA as the second feature extraction step to eliminate duplicated information. 70% of the samples are used for training and the remaining 30% for testing; the recognition accuracy is estimated on the testing samples. In PCA, the cumulative contribution threshold is set to 90%, so that the selected principal components retain 90% of the information in the data; the results are shown in Figure 7 and Table 4.
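The 70%/30% division can be sketched as a per-class random split. Whether the original split was drawn per class is not stated, so the stratification, the function name `stratified_split`, and the fixed seed are assumptions of this sketch.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Split sample indices per class into training and testing index lists."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Ten classes with 120 samples each, as in the data description.
labels = [c for c in range(10) for _ in range(120)]
train_idx, test_idx = stratified_split(labels, 0.7)
```

With 120 samples per class, this gives 84 training and 36 testing samples for every class.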

3.3. Result Analysis

(1) Performance of Recognizer in Optimization. After feature extraction of the bearing fault data, the MSVM with a radial basis function kernel is trained. First, the cross-validation (CV) method is used to obtain the regularization parameter $C$ and the kernel parameter $\gamma$; both are searched over the same range with an exponent step of 0.5. The relationship between the accuracy of the trained MSVM and $C$, $\gamma$ is shown in Figures 8 and 9.
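A grid search of this kind can be sketched as below. The base-2 exponent grid and the exponent range from -8 to 8 are assumptions made for illustration (the search range is not reproduced here), and `toy_score` is a hypothetical stand-in for the cross-validation accuracy of the MSVM.

```python
import math


def exponent_grid(low_exp, high_exp, step):
    """Values 2**e for e from low_exp to high_exp in increments of step."""
    count = int(round((high_exp - low_exp) / step)) + 1
    return [2.0 ** (low_exp + i * step) for i in range(count)]


def grid_search(c_values, gamma_values, score):
    """Return (best score, C, gamma) over the full parameter grid."""
    best = None
    for c in c_values:
        for gamma in gamma_values:
            s = score(c, gamma)
            if best is None or s > best[0]:
                best = (s, c, gamma)
    return best


def toy_score(c, gamma):
    # Stand-in for cross-validation accuracy, maximal at C = 8, gamma = 0.25.
    return -((math.log2(c) - 3.0) ** 2) - (math.log2(gamma) + 2.0) ** 2


grid = exponent_grid(-8, 8, 0.5)
best_score, best_c, best_gamma = grid_search(grid, grid, toy_score)
```

An exhaustive grid evaluates every pair of values, which is why the number of cross-validation runs grows with the square of the grid size.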

Figures 8 and 9 show that the MSVM classification recognition rate varies greatly as $C$ and $\gamma$ vary over a considerable range; the lowest classification rate is only around 18%. Accordingly, the particle parameter adaptive method is used to optimize the MSVM kernel parameter $\gamma$ and regularization parameter $C$. The parameters of PPA are listed in Table 5, and the fold number of the $K$-fold cross-validation is set to 3 (best cross-validation accuracy = 95.24%, best $C$ = 100, and best $\gamma$ = 0.01).

The average recognition accuracies of CV_MSVM and PPA_MSVM show that the proposed PPA_MSVM method plays a significant role in increasing the recognition accuracy. CV_MSVM depends heavily on the results of the individual folds of the training data rather than on the entire training set; the PPA algorithm mitigates this problem.

The simulations show that recognition with PCA is somewhat less accurate than recognition with the full set of EMD statistical features. However, PCA effectively reduces the dimension of the original feature set, which improves the efficiency of identification: only 15 principal components are needed to identify the fault type while still maintaining a high accuracy rate. Compared with the original feature set, PCA discards some information and thus reduces the recognition performance, but this negative effect on fault identification is small and acceptable.

(2) Performance of Recognizer in Different Fault Patterns. To verify the proposed method and analyze the classification accuracy for different fault patterns, the effects of the number of simulated fault patterns, PCA, and the parameter optimization method were examined through a full factorial experiment. It had the following three factors: (1) parameter optimization method with two levels (CV_MSVM and PPA_MSVM); (2) fault pattern number with three levels: four patterns (see Table 3, fault numbers 1–4), seven patterns (see Table 3, fault numbers 1–7), and ten patterns (see Table 3, fault numbers 1–10); and (3) PCA with two levels (with and without PCA).

Table 6 shows the results of the full factorial experiment. The prediction accuracy reaches 100% when there are only 4 fault patterns, which indicates that the proposed method is effective in classifying the fault patterns. However, the accuracy declines as the number of fault patterns increases, because the information of multiple fault patterns may be confused and is more difficult to recognize, which results in serious performance degradation. In addition, with 7 fault patterns, PCA plays an effective role under both parameter optimization methods. Finally, the results show that PPA parameter optimization is more effective than CV when combined with PCA and with different numbers of fault patterns.

(3) Performance of Recognizer in Different Training Samples. The performance of the optimization method has been compared with CV above to investigate the capability of the proposed multiple-fault pattern recognition method for rolling element bearings (REBs). To show the influence of the training sample size, we test the accuracy of the PCA_PPA_MSVM model, based on the proposed EMD and statistical feature extraction, when the percentage of training samples is 40%, 50%, 60%, 70%, 80%, and 90%. The testing results are presented in Table 7.

As shown by the experimental results, prediction accuracy is best when the training set makes up 70% of the total sample set, because the prediction accuracy is higher when the trained model obtains the best parameters. The model also shows signs of overfitting: the prediction accuracy is worse when the percentage of training samples is 80% or 90%.

(4) Performance of Recognizer in Different Feature Extraction Methods. Feature extraction can lead to faster training and higher efficiency in multiple-fault diagnosis. Thirteen statistical and shape features are utilized as the inputs in this paper. To demonstrate its effectiveness, MSVM classifiers are constructed with EMD, PCA, and MFE (the combination of EMD and PCA) as the feature extraction method. Table 8 shows the recognition accuracy of the three feature extraction methods.

The average prediction accuracies of EMD_MSVM (8.33%), PCA_MSVM (72.5%), and MFE_MSVM (94.50%) show that the feature extraction method plays an important role in improving the recognition accuracy. The results indicate that multiple faults are difficult to recognize because of their complex relations, but the result is much better with the multiscale feature extraction (MFE) method, which decomposes the data into intrinsic mode functions by empirical mode decomposition, obtains the instantaneous frequency of the decomposed components by the Hilbert transform, and then uses statistical features and principal component analysis to extract the significant information from these components.

4. Conclusion

The objective of this study is to propose a fusion approach for multiple-fault diagnosis with single and coupled faults. The approach relies on multiscale feature extraction, which integrates three signal processing methods (empirical mode decomposition, statistical feature extraction, and principal component analysis) covering the time domain, the frequency domain, and the time–frequency domain. The MSVM with particle parameter adaptive (PPA) parameter optimization then classifies the fault patterns. In the experiments, the proposed MFE_PPA_MSVM method produces the highest average classification accuracy compared with the other methods. In addition, we analyze how the prediction accuracy is influenced by different factors, such as the parameter optimization method, the number of fault patterns, PCA, and the training sample size. The proposed classification method achieves high precision in multiple-fault fusion diagnosis and is a promising diagnosis approach for handling the increasing number of characteristic parameters and the growing amount of feature information.

This multiple-fault diagnosis approach is feasible and, as the computational results show, quite effective in improving the diagnosis of compound rolling bearing faults. It is still immature, however: the data come from simulation and do not rely on significant real-time testing. Because field data for validating the approach are very difficult to obtain, we generate the simulated data from the original rolling bearing fault data of Case Western Reserve University and use them to analyze the multiple-fault pattern recognition problem.

Future work will focus on the following aspects: (1) comparing multiscale feature extraction (MFE) with other excellent feature extraction methods; (2) comparing particle parameter adaptive (PPA) optimization with other intelligent algorithms; (3) researching the fundamental principles of multiple-fault diagnosis; and (4) validating the approach with real data, which is our next research task.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work is financially supported by the Fundamental Research Funds for the Central Universities under Grant no. 2682016CX031 and National Natural Science Foundation of China (NSFC) under Grant no. 51175442.