#### Abstract

In order to enhance the performance of bearing fault diagnosis and classification, features extraction and features dimensionality reduction have become increasingly important. The original statistical feature set was calculated from single-branch reconstructed vibration signals obtained by using the maximal overlap discrete wavelet packet transform (MODWPT). In order to reduce redundant information in the original statistical feature set, features selection by adjusted rand index and sum of within-class mean deviations (FSASD) was proposed to select fault-sensitive features. Furthermore, a modified features dimensionality reduction method, supervised neighborhood preserving embedding with label information (SNPEL), was proposed to obtain low-dimensional representations of the high-dimensional feature space. Finally, vibration signals collected from two experimental test rigs were employed to evaluate the performance of the proposed procedure. The results demonstrate the effectiveness, adaptability, and superiority of the proposed procedure, which can serve as an intelligent bearing fault diagnosis system.

#### 1. Introduction

Bearings are one of the most crucial elements of rotating machinery [1, 2] and bearing faults can seriously affect safe and stable operations of the rotary mechanical equipment [3, 4]. If no effective actions are taken, device faults will inevitably occur, and such faults may lead to serious casualties and enormous pecuniary loss [5]. Thus, it is important to identify bearing faults in order to maintain the safety of the device and reduce maintenance costs. Vibration signals collected from rolling bearings usually carry rich information on machine operation conditions [6]. In recent years, with the rapid development of signal processing, data mining, and artificial intelligence technology, the data-driven methods are becoming more important in the fault diagnosis of rolling bearings. Four main steps are necessary for these methods based on vibration signals analysis: signal processing, features extraction, features reduction, and patterns recognition [7, 8]. The first three steps are the foundation of patterns recognition.

In the phase of signal processing and features extraction, due to the complexity of equipment structure and variety of operation conditions [5], the signals collected from rolling bearings often exhibit strong nonlinearity and nonstationarity. Therefore, pure time-domain and frequency-domain analysis approaches are often ineffective [9]. For these signals, time-frequency analysis can provide an effective way for features extraction. Representative and commonly used time-frequency analysis methods include empirical mode decomposition (EMD), short-time Fourier transform (STFT), Wigner-Ville distribution (WVD), and wavelet transform (WT) [10].

In recent years, various intelligent fault diagnosis systems based on EMD [11–15], STFT [16–18], and WVD [19–21] have been widely developed for monitoring the condition of bearings in rotating machines with varying degrees of success. However, for these time-frequency methods, some challenges exist in the application. EMD has some problems such as over envelope, end effects, and mode mixing [22–24]. The effectiveness of STFT is still hampered by the limitation of single triangular basis [25, 26]. WVD can produce interference terms on the time-frequency domain in a critical condition and high computational complexity [27]. Wavelet analysis is another important time-frequency analysis method, and it is outstanding in rotary machine diagnosis because its multiresolution merit is suitable for analyzing nonlinear and nonstationary signals [28]. Continuous wavelet transform (CWT) and discrete wavelet transform (DWT) are two categories of WT. They have perfect local properties in both time and frequency spaces and can be used as an effective method to preserve signal characteristics [27]. In [29], wavelet filtering to detect periodical impulse components from vibration signals was presented. In [30], the DWT for extracting the rotor bar faults feature was studied. In [31], the CWT and the wavelet coefficients of signals are used to process vibration signals. However, both CWT and DWT have drawbacks. CWT can generate redundant data. Therefore, it has a huge operand and requires a long time to use [31, 32]. Although DWT can overcome this drawback by using the decomposition of the original complex signal to several resolutions [33, 34], DWT requires the sample size to be exactly a power of 2 for the full transform because of the downsampling and has very poor frequency resolution at high frequencies [35–37]. In order to overcome these drawbacks, a new wavelet-based algorithm is developed, namely, maximal overlap discrete wavelet packet transform (MODWPT) [38]. 
It not only provides better frequency resolution, but also has no restriction on sample size [36, 38]. In [36], simulation signals and gear fault vibration signals collected from a test stand are decomposed into a set of monocomponent signals by MODWPT; then the corresponding Hilbert spectrum is applied for gear fault diagnosis; the simulation and practical application examples show that the Hilbert spectrum based on MODWPT is superior to EMD. However, the time-frequency analysis methods mentioned above can produce a high-dimensional feature vector, which can be a primary reason for fault classification accuracy degradation [39]. Thus features selection or dimensionality reduction is needed to find the most useful fault features that can keep intrinsic information about the defects.

Generally, the statistical properties of the signal in time, frequency, and time-frequency domain are extracted to represent features information, such as peak value (PV), root mean square (RMS), variance, skewness (Sw), and kurtosis. In [40], 21 time-domain statistical characteristics are extracted from different IMFs obtained by EMD as the feature vectors. Then, principal component analysis (PCA) was employed to extract the dominant components from statistical characteristics for gear faults detection. In [41], two time-domain and two frequency-spectrum statistical characteristics are selected as the features to train the SVM with a novel hybrid parameter optimization algorithm for fault diagnosis of the rolling element bearings. In [31], the statistical parameters of the wavelet coefficients in 1–64 scales were calculated for the vibration signal. In [42], 40 statistical features of wavelet packet coefficients were calculated for a single sample for each state of bearing condition. In [43], for each wavelet packet node, 10 statistical features are extracted from its associated wavelet packet coefficients and 10 statistical features are extracted from frequency spectra of its associated wavelet packet coefficients. However, considering the complex mapping relations between some bearing faults and their signs, it is often difficult to determine which statistical property is worthy of reflecting the fault nature from the feature space. If unsuitable features are used for fault diagnosis, it may lead to a decline in accuracy and efficiency of fault diagnosis [10, 44]. Therefore, how to select the fault sensitive statistical characteristics as the basis of subsequent fault analysis garners considerable attention and is further studied. In this paper, a features extraction method, features selection by adjusted rand index and sum of within-class mean deviations (FSASD), is proposed. 
FSASD combines the $k$-means method and sum of within-class mean deviations (SWD) of feature data, which can select the sensitive statistical characteristics for fault analysis.

For the high-dimensional statistical characteristics data, if these data are used directly in fault classification, it will lead to the very high computational complexity and fault classification accuracy degradation. Therefore, features dimensionality reduction is another crucial stage in the fault diagnosis process [21]. Up to now, dimension reduction algorithms for machinery fault diagnosis have been intensively investigated [46, 47] and many classical methods have been proposed [48]. Principal component analysis (PCA) and linear discriminant analysis (LDA), as two classical linear dimensionality reduction methods, have been widely used for linear data; when the distribution of a dataset is nonlinear, PCA and LDA may be invalid [49]. Therefore, recently, some nonlinear dimensionality reduction methods, kernel principal components analysis (KPCA), Isomap, Laplacian Eigenmaps (LE), and Local Linear Embedding (LLE), and so on, are presented to provide a valid solution for the dimensionality reduction of nonlinear data [12]. Although nonlinear dimensionality reduction methods have been successfully applied in many fields, they also have some problems in practical applications, such as the problem of “out-of-sample” that has no explicit mapping matrix [50], the problem of overlearning of locality [51], and high computational complexity. Inspired by nonlinear dimensionality reduction methods, a lot of linear unsupervised dimensionality reduction methods based on manifold learning are proposed [52], such as neighborhood preserving embedding (NPE) [53], orthogonal neighborhood preserving projection (ONPP) [54], and locality preserving projections (LPP) [55]. They are the representative ones, which preserve the local geometric structure on the data manifold using linear approximation to the nonlinear mappings [52]. In recent years, some other manifold learning-based dimensionality reduction methods are presented to provide valid solutions for dimensionality reduction. 
In [56], a novel supervised method, called locality preserving embedding (LPE), is proposed and gives a low-dimensional embedding for discriminative multiclass submanifolds and preserves principal structure information of the local submanifolds. In [57], maximal local interclass embedding (MLIE) is proposed. MLIE can be viewed as a linear method of a multimanifold-based learning framework, in which the information of neighborhood is integrated with the local interclass relationships [57]. In [52], a general sparse subspace learning framework, called sparse linear embedding (SLE), is proposed and can integrate the local geometric structure to obtain sparse projections. And the ONPP is taken as an example to design a novel sparse subspace learning framework [52]. In [58–62], some supervised and semisupervised dimensionality reduction methods based on NPE are proposed. NPE, as a manifold learning method, is a kind of linear approximation of LLE by replacing the nonlinear mapping relation to achieve dimensionality reduction [53, 63]. NPE aims at preserving the local neighborhood structure on the data manifold, and it can work well with multimodal data. In [63], the NPE is applied for bearing fault identification and classification and performs well in feature extraction. However, NPE could not utilize the label information in dimensionality reduction [57]. LDA is a supervised dimensionality reduction method and takes the label information into account in features reduction. Based on the respective attributes of NPE and LDA, supervised neighborhood preserving embedding with label information (SNPEL), a modified NPE, is proposed in this paper, where the fault label information is considered.

The contribution of this paper is the development of intelligent fault diagnosis system of rolling bearings based on multidomain features, systematically combining statistical analysis methods with artificial intelligence techniques. FSASD, a novel features extraction method, was proposed to select the fault sensitive statistical characteristics as the basis of subsequent fault analysis. A modified features reduction method, SNPEL, was proposed to excavate abundant and valuable information with low dimensionality. The execution of the proposed bearing fault diagnosis method is divided into four steps: signal processing, features extraction, features reduction, and fault patterns identification. In the first step, vibration signals collected from bearings are decomposed into different terminal nodes by MODWPT, and multidomain features were calculated from the reconstructed signal. In the second step, the adjusted rand index (ARI) criterion of the clustering method and SWD of samples were used to select fault sensitive statistical characteristics, which can represent the fault peculiarity under different working conditions. Furthermore, due to information redundancy and a high-dimensional dataset, in the third step, SNPEL was applied to obtain a new lower-dimensional space in which the new constructed features were obtained by transformations of the original higher-dimensional features such that certain properties were preserved. Finally, vibration signals collected from two test rigs were conducted to validate the effectiveness, adaptability, and superiority of the proposed method for the identification and classification of bearing faults. The first test rig is from Case Western Reserve University; four cases with 12 working conditions were employed to verify the performance of the proposed method. The second test rig is SQI-MFS test rig; two cases with 10 working conditions were employed to verify the performance of the proposed method. 
The analysis results for the vibration signals of roller bearing under different working conditions show the effectiveness, adaptability, and superiority of the proposed fault diagnosis approach.

The rest of this paper is organized as follows. In Section 2, a theoretical background of the LDA technique, NPE technique, and SVM is summarized. In Section 3, a description of the proposed diagnosis technique is given, and the system framework of the proposed method is illustrated. In Section 4, bearing faulty vibration signals collected from two experimental test rigs are employed to verify the proposed fault diagnosis method. Finally, some conclusions are drawn in Section 5.

#### 2. Theoretical Background

##### 2.1. Bearing Fault Effects on the Vibration in Frequency Domain

In a bearing, the inner race, the outer race, and the balls and cage placed in the space between the rings make rotation possible. However, due to inappropriate lubrication of the bearing rolling elements, inadequate bearing selection, improper mounting, indirect failure, material defects, and manufacturing errors, various defects can occur [21], such as surface fatigue damage, bonding, and wear. The most common of these faults is surface fatigue damage, which is further categorized as spalling, crack, or other abnormal conditions [64]. When a fault appears on the surface of a bearing, a cyclical impulsive vibration emerges. The frequency of the impulsive vibration is known as the fault symptom, and its value depends on the fault size, rotational speed, and damage location [65].

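For different bearing components (i.e., outer race, inner race, and ball, as shown in Figure 1), the main fault frequencies are the cage fault frequency (CFF), the inner raceway fault frequency (IRFF), the outer raceway fault frequency (ORFF), and the ball/roller fault frequency (BRFF). When the outer ring is fixed, the aforementioned fault frequencies are mathematically described as

$$
\begin{aligned}
\mathrm{CFF} &= \frac{f_r}{2}\left(1 - \frac{d}{D}\cos\alpha\right),\\
\mathrm{IRFF} &= \frac{N f_r}{2}\left(1 + \frac{d}{D}\cos\alpha\right),\\
\mathrm{ORFF} &= \frac{N f_r}{2}\left(1 - \frac{d}{D}\cos\alpha\right),\\
\mathrm{BRFF} &= \frac{D f_r}{2d}\left(1 - \frac{d^{2}}{D^{2}}\cos^{2}\alpha\right),
\end{aligned}
\tag{1}
$$

where $f_r$ is the motor driving frequency or rotational frequency of the shaft, $d$ is the ball/roller diameter, $D$ is the pitch diameter, $N$ is the number of rolling elements, and $\alpha$ is the ball contact angle (zero for rollers) [21]. Because each fault location has its own characteristic frequency, a lot of research work has been carried out on vibration-signal-based bearing fault analysis.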
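As a quick numerical illustration of these characteristic frequencies, the following minimal sketch evaluates them with the commonly used kinematic formulas for a fixed outer ring. The function and argument names are hypothetical, and the form used for the ball/roller frequency (the ball spin frequency, without a factor of two) is an assumption; some references use twice this value.

```python
import math


def bearing_fault_frequencies(f_r, d, D, n_elements, alpha=0.0):
    """Return (CFF, IRFF, ORFF, BRFF) in Hz, assuming a fixed outer ring.

    f_r        : shaft rotational frequency in Hz
    d          : ball/roller diameter
    D          : pitch diameter (same unit as d)
    n_elements : number of rolling elements
    alpha      : contact angle in radians (zero for rollers)
    """
    ratio = (d / D) * math.cos(alpha)
    cff = 0.5 * f_r * (1.0 - ratio)                # cage
    irff = 0.5 * n_elements * f_r * (1.0 + ratio)  # inner raceway
    orff = 0.5 * n_elements * f_r * (1.0 - ratio)  # outer raceway
    # Assumed form: ball spin frequency (no factor of two).
    brff = 0.5 * (D / d) * f_r * (1.0 - ratio ** 2)
    return cff, irff, orff, brff
```

Two identities follow directly from these expressions and are useful sanity checks: IRFF + ORFF equals the number of rolling elements times the shaft frequency, and ORFF equals the number of rolling elements times CFF.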

##### 2.2. Maximal Overlap Discrete Wavelet Packet Transform (MODWPT)

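WT can be treated as a fast-evolving mathematical and signal processing tool for dealing with nonstationary signals [66] and has been widely applied in many engineering fields for decomposing, denoising, and analyzing nonstationary signals [26, 42]. Continuous wavelet transform (CWT) and discrete wavelet transform (DWT) are two categories of WT. The CWT has some drawbacks; one of these is that it generates redundant data. Therefore, it has a heavy computational load and a long running time [31, 32]. The DWT can overcome this drawback by decomposing the original complex signal into several resolutions [33, 34]. Let $\mathbf{X}$ be a column vector containing a sequence $X_0, X_1, \ldots, X_{N-1}$, namely, $\mathbf{X} = [X_0, X_1, \ldots, X_{N-1}]^T$, where $N$ is a power of 2. The even-length scaling (low-pass) filter can be denoted by $\{g_l : l = 0, \ldots, L-1\}$ and the wavelet (high-pass) filter can be denoted by $\{h_l : l = 0, \ldots, L-1\}$. These low-pass filters satisfy

$$
\sum_{l=0}^{L-1} g_l^{2} = 1, \qquad \sum_{l=0}^{L-1} g_l\, g_{l+2n} = 0 \tag{2}
$$

for all nonzero integers $n$, with $g_l = 0$ for $l < 0$ and $l \ge L$. These high-pass filters are also required to satisfy (2). In addition, both low-pass filters and high-pass filters are chosen to be quadrature mirror filters satisfying

$$
g_l = (-1)^{l+1} h_{L-1-l}, \qquad h_l = (-1)^{l} g_{L-1-l}, \qquad l = 0, \ldots, L-1. \tag{3}
$$

With $V_{0,t} = X_t$, for $j = 1, \ldots, J$, the $j$th level wavelet and scaling coefficients are given by

$$
W_{j,t} = \sum_{l=0}^{L-1} h_l\, V_{j-1,\,(2t+1-l) \bmod N_{j-1}}, \qquad
V_{j,t} = \sum_{l=0}^{L-1} g_l\, V_{j-1,\,(2t+1-l) \bmod N_{j-1}}, \qquad t = 0, \ldots, N_j - 1, \tag{4}
$$

where $N_j = N/2^{j}$ and mod means modulus after division [35–37].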

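Although the DWT was developed to overcome the drawback of the CWT mentioned above [33, 34], it requires the sample size to be exactly a power of 2 for the full transform because of the downsampling step in the DWT [35]. In order to overcome this restriction, the maximal overlap discrete wavelet transform (MODWT) was developed [36]. The MODWT can be considered a revised version of the DWT. While the DWT of level $J$ restricts the sample size to an integer multiple of $2^{J}$, the MODWT of level $J$ is well defined for any sample size $N$ [35–37]. A rescaling of the defining filters is required to conserve energy; with $g_l$ and $h_l$ denoting the DWT scaling and wavelet filter coefficients, the MODWT filters are given by

$$
\tilde{g}_l = \frac{g_l}{\sqrt{2}}, \qquad \tilde{h}_l = \frac{h_l}{\sqrt{2}}. \tag{5}
$$

Thus, (2) becomes

$$
\sum_{l=0}^{L-1} \tilde{g}_l^{2} = \frac{1}{2}, \qquad \sum_{l=0}^{L-1} \tilde{g}_l\, \tilde{g}_{l+2n} = 0, \tag{6}
$$

and the filters are still quadrature mirror filters satisfying

$$
\tilde{g}_l = (-1)^{l+1} \tilde{h}_{L-1-l}, \qquad \tilde{h}_l = (-1)^{l} \tilde{g}_{L-1-l}. \tag{7}
$$

In order to avoid downsampling, the MODWT creates appropriate new filters at each stage $j$ by inserting $2^{j-1} - 1$ zeros between the elements of $\{\tilde{g}_l\}$ and $\{\tilde{h}_l\}$:

$$
\{\tilde{g}_0, \underbrace{0, \ldots, 0}_{2^{j-1}-1}, \tilde{g}_1, \underbrace{0, \ldots, 0}_{2^{j-1}-1}, \ldots, \tilde{g}_{L-1}\}, \qquad
\{\tilde{h}_0, \underbrace{0, \ldots, 0}_{2^{j-1}-1}, \tilde{h}_1, \underbrace{0, \ldots, 0}_{2^{j-1}-1}, \ldots, \tilde{h}_{L-1}\}. \tag{8}
$$

With $\tilde{V}_{0,t} = X_t$, for $j = 1, \ldots, J$, the $j$th level scaling coefficients and wavelet coefficients are given by

$$
\tilde{V}_{j,t} = \sum_{l=0}^{L-1} \tilde{g}_l\, \tilde{V}_{j-1,\,(t - 2^{j-1} l) \bmod N}, \qquad
\tilde{W}_{j,t} = \sum_{l=0}^{L-1} \tilde{h}_l\, \tilde{V}_{j-1,\,(t - 2^{j-1} l) \bmod N}. \tag{9}
$$

However, both the DWT and the MODWT have very poor frequency resolution at high frequencies [36]. To address this drawback, the maximal overlap discrete wavelet packet transform (MODWPT) further decomposes the high-frequency band, which is not decomposed in the DWT and the MODWT. Let $\{\tilde{W}_{j,n,t}\}$ be the sequence of MODWPT coefficients at the $j$th level and the frequency-index $n$. With $\tilde{W}_{0,0,t} = X_t$, given the series $\{\tilde{W}_{j-1,\lfloor n/2 \rfloor,t}\}$ of length $N$, then $\{\tilde{W}_{j,n,t}\}$ can be obtained by using

$$
\tilde{W}_{j,n,t} = \sum_{l=0}^{L-1} \tilde{f}_{n,l}\, \tilde{W}_{j-1,\,\lfloor n/2 \rfloor,\,(t - 2^{j-1} l) \bmod N}, \tag{10}
$$

where $\tilde{f}_{n,l} = \tilde{g}_l$ when $n \bmod 4 = 0$ or 3, and $\tilde{f}_{n,l} = \tilde{h}_l$ when $n \bmod 4 = 1$ or 2.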

Therefore, with the suitable decomposition scale and disjoint dyadic decomposition, the complicated signal could be decomposed into a number of components whose instantaneous amplitude and instantaneous frequency attain physical meaning [36, 37].
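The MODWPT recursion can be illustrated with a short NumPy sketch. It is a minimal example and not the authors' implementation: it assumes rescaled Haar filters by default, uses circular filtering with $2^{j-1}$-spaced taps, and returns the packet coefficients rather than the single-branch reconstructed signals used later in the paper. The function name `modwpt` and its parameters are hypothetical.

```python
import numpy as np


def modwpt(x, level, g=(0.5, 0.5), h=(0.5, -0.5)):
    """Return an array of shape (2**level, len(x)) of MODWPT coefficients.

    g, h are the rescaled scaling and wavelet filters (Haar by default,
    which is an assumption made for this sketch).
    """
    x = np.asarray(x, dtype=float)
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    nodes = [x]  # level 0 holds the signal itself
    for j in range(1, level + 1):
        shift = 2 ** (j - 1)  # spacing between filter taps at level j
        children = []
        for n in range(2 ** j):
            parent = nodes[n // 2]
            # Low-pass filter for n mod 4 in {0, 3}, high-pass otherwise.
            filt = g if n % 4 in (0, 3) else h
            child = np.zeros_like(x)
            for l, coeff in enumerate(filt):
                # np.roll(parent, s)[t] == parent[(t - s) mod N]
                child += coeff * np.roll(parent, shift * l)
            children.append(child)
        nodes = children
    return np.vstack(nodes)
```

Because nothing is downsampled, the signal length does not need to be a power of 2, and the total energy of the coefficients at any level equals the energy of the input signal.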

##### 2.3. Linear Discriminant Analysis (LDA) and Neighborhood Preserving Embedding (NPE)

###### 2.3.1. Linear Discriminant Analysis (LDA)

The LDA was proposed by Fisher [67] for dimension reduction; it finds an embedding transformation such that the between-class scatter is maximized and the within-class scatter is minimized [68–70]. The objective of the original Fisher’s LDA, namely, Fisher’s criterion, is to maximize the ratio of the between-class scatter matrix $S_b$ to the within-class scatter matrix $S_w$:

$$
J(\mathbf{w}) = \frac{\left|\mathbf{w}^T S_b \mathbf{w}\right|}{\left|\mathbf{w}^T S_w \mathbf{w}\right|}, \tag{11}
$$

where $\mathbf{w}$ is a vector, $\mathbf{w}^T S_b \mathbf{w}$ and $\mathbf{w}^T S_w \mathbf{w}$ are two scalars, and $|\cdot|$ is the absolute value operator. However, a large number of state classes are usually present for identification and classification of different bearing faults. Hence the multiclass LDA is more desired [21].

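Let $x_1, \ldots, x_n \in \mathbb{R}^{m}$ be $m$-dimensional samples and $y_1, \ldots, y_n \in \{1, \ldots, c\}$ be the associated class labels, where $n$ is the number of samples and $c$ is the total number of classes. Let $n_k$ be the number of samples in class $k$. When $d > 1$, where $d$ is the dimension of the embedding space, a projection matrix $A \in \mathbb{R}^{m \times d}$ is needed. Both $A^T S_b A$ and $A^T S_w A$ are $d$ by $d$ matrices, and the ratio of them cannot be computed directly. The determinant ratio is used:

$$
J(A) = \frac{\left|A^T S_b A\right|}{\left|A^T S_w A\right|}, \tag{12}
$$

where the definitions of the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are as follows:

$$
S_b = \sum_{k=1}^{c} n_k (\mu_k - \mu)(\mu_k - \mu)^T, \qquad
S_w = \sum_{k=1}^{c} \sum_{i:\, y_i = k} (x_i - \mu_k)(x_i - \mu_k)^T, \tag{13}
$$

where $\mu_k$ is the mean of the samples in class $k$ and $\mu$ is the mean of all samples:

$$
\mu_k = \frac{1}{n_k} \sum_{i:\, y_i = k} x_i, \qquad \mu = \frac{1}{n} \sum_{i=1}^{n} x_i. \tag{14}
$$

The between-class scatter matrix and within-class scatter matrix also have an equivalent form [71, 72]:

$$
S_b = X (D^{b} - W^{b}) X^T, \qquad S_w = X (D^{w} - W^{w}) X^T, \tag{15}
$$

where $X = [x_1, \ldots, x_n]$, $W^{b}$ and $W^{w}$ are weight matrices, and $D^{b}$ and $D^{w}$ are diagonal matrices. $D^{b}_{ii}$ is the $i$th diagonal element of $D^{b}$ and equals the sum of the elements of the $i$th row of $W^{b}$, and $D^{w}_{ii}$ is the $i$th diagonal element of $D^{w}$ and equals the sum of the elements of the $i$th row of $W^{w}$. The solution that minimizes the within-class scatter and maximizes the between-class scatter is obtained by an eigenvalue decomposition of $S_w^{-1} S_b$, taking the eigenvectors corresponding to the largest eigenvalues.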
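The multiclass LDA computation can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: samples are stored as rows (the transpose of the column convention used in the text), a pseudo-inverse is used in case the within-class scatter matrix is singular, and the function names are hypothetical.

```python
import numpy as np


def scatter_matrices(X, y):
    """Between-class and within-class scatter for X of shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    S_b = np.zeros((n_features, n_features))
    S_w = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_k = X[y == label]
        class_mean = X_k.mean(axis=0)
        diff = (class_mean - overall_mean)[:, None]
        S_b += X_k.shape[0] * (diff @ diff.T)   # weighted by class size
        centered = X_k - class_mean
        S_w += centered.T @ centered
    return S_b, S_w


def lda_projection(X, y, d):
    """Return a (n_features, d) projection from the leading eigenvectors."""
    S_b, S_w = scatter_matrices(X, y)
    # Pseudo-inverse is an assumption to keep the sketch robust.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-eigvals.real)[:d]
    return eigvecs[:, order].real
```

A useful check is that the two scatter matrices add up to the total scatter of the centered data. With $c$ classes, at most $c - 1$ eigenvalues are nonzero, so `d` is normally chosen no larger than $c - 1$.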

###### 2.3.2. Neighborhood Preserving Embedding (NPE)

NPE, which is proposed by He et al. [53] for dimension reduction, aims at preserving the local neighborhood structure on the data manifold and is a linear approximation of the LLE. NPE can avoid a disadvantage of LLE that is sensitive to outliers [63]. NPE not only seeks an embedding transformation such that the local manifold structure is preserved, but also can be performed in either supervised or unsupervised mode when the class information and a better weight matrix are available [53].

Given a dataset of $n$ samples assembled in a matrix $X = [x_1, \ldots, x_n]$, where the dimension of each sample is $m$, a transformation matrix *A* can be found that maps these samples to a dataset of $n$ samples assembled in $Y = [y_1, \ldots, y_n]$. The dimension of each sample $y_i$ is $d$, where the $i$th column vector of $Y$ corresponds to that of $X$. Thus, the transformation can be expressed by $Y = A^T X$. The specific procedure can be presented as follows [53, 63]:

(1) Constructing an adjacency graph: calculate the Euclidean distance between samples $x_i$ and $x_j$. The $k$-nearest neighbors (knn) are used to construct the adjacency graph $G$. The distance represents the edge connecting $x_i$ and $x_j$, as

$$
d(x_i, x_j) = \left\| x_i - x_j \right\|_2. \tag{17}
$$

(2) Computing the weights: in this step, the weights of the edges are computed. Let $W$ denote the weight matrix with $W_{ij}$ having the weight of the edge from node $i$ to node $j$ and let it be 0 if there is no such edge. The weights of the edges can be computed by minimizing the following objective function:

$$
\min_{W} \sum_{i} \Big\| x_i - \sum_{j} W_{ij} x_j \Big\|^{2} \tag{18}
$$

with constraints

$$
\sum_{j} W_{ij} = 1, \qquad i = 1, \ldots, n. \tag{19}
$$

A reasonable criterion for choosing an expected map is to minimize the following cost function [72]:

$$
\Phi(Y) = \sum_{i} \Big\| y_i - \sum_{j} W_{ij} y_j \Big\|^{2}. \tag{20}
$$

This optimization problem can be converted to the following expression:

$$
\min_{A}\ \operatorname{tr}\!\left(A^T X M X^T A\right) \quad \text{s.t.} \quad A^T X X^T A = I, \tag{21}
$$

where $M = (I - W)^T (I - W)$, $I = \operatorname{diag}(1, \ldots, 1)$, and tr is the trace of a matrix. $M$ is symmetric and positive semidefinite. The specific procedure of how to solve the above minimization problem can be seen in [72].
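The NPE procedure (neighbor graph, reconstruction weights, and linear projection from a generalized eigenvalue problem) can be sketched with NumPy as follows. This is a minimal illustration under assumptions: samples are stored as rows, the local Gram matrix is regularized by a hypothetical `reg` parameter, a small `ridge` term keeps the constraint matrix invertible, and the generalized problem is reduced to a symmetric one via a Cholesky factor. None of these choices come from the paper.

```python
import numpy as np


def npe_weights(X, k, reg=1e-3):
    """Reconstruction weights: each row sums to 1, nonzero only on k neighbors."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(dist[i])[:k]
        Z = X[neighbors] - X[i]               # neighbors centered on x_i
        G = Z @ Z.T                           # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[i, neighbors] = w / w.sum()         # enforce the sum-to-one constraint
    return W


def npe_projection(X, k, d, reg=1e-3, ridge=1e-8):
    """Return (A, eigenvalues, W); A has shape (n_features, d)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    W = npe_weights(X, k, reg)
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    lhs = X.T @ M @ X                         # X M X^T in column convention
    rhs = X.T @ X + ridge * np.eye(m)         # X X^T in column convention
    L = np.linalg.cholesky(rhs)
    L_inv = np.linalg.inv(L)
    C = L_inv @ lhs @ L_inv.T
    eigvals, eigvecs = np.linalg.eigh((C + C.T) / 2)  # ascending order
    A = L_inv.T @ eigvecs[:, :d]              # smallest eigenvalues first
    return A, eigvals[:d], W
```

With rows as samples, the embedded data are obtained as `X @ A`, which corresponds to $Y = A^T X$ in the column convention of the text.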

##### 2.4. Support Vector Machine (SVM)

The key concept of SVM [73], which was originally developed for binary classification problems, is to use a hyperplane to define decision boundaries between data points of different classes. The idea behind SVM is to construct an optimal separating hyperplane between the two patterns, where the hyperplane minimizes the upper bound of the generalization error by maximizing the margin between the separating hyperplane and the nearest sample points [24]. SVM is able to handle both simple linear classification tasks and the classification of complex and nonlinear multiclass data [12].

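Consider a dataset $\{(x_i, y_i)\}_{i=1}^{n}$ consisting of $n$ $m$-dimensional samples, where $x_i \in \mathbb{R}^{m}$ represents the attributes and the corresponding label $y_i \in \{-1, +1\}$ defines the type. In order to acquire a hyperplane to separate the two types of samples, a linear decision boundary, $f(x) = w^T x + b$, can be learned from the training samples, where $w$ is the normal direction of the separation plane and $b$ is the bias [12, 24]. Samples of each type can be classified through the following constraints:

$$
y_i \left(w^T x_i + b\right) \ge 1, \qquad i = 1, \ldots, n. \tag{24}
$$

The optimal hyperplane can be obtained by solving the following optimization problem:

$$
\min_{w,\,b}\ \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_i \left(w^T x_i + b\right) \ge 1, \quad i = 1, \ldots, n. \tag{25}
$$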

When the data are linearly separable, the formulations presented above can work accurately. However, they will be ineffective when the investigated samples are overlapping or nonlinear [12]. Thus, a slack parameter $\xi_i$ is adopted to make the classifier more robust, which allows a certain degree of misclassification for some points around the decision boundary. Furthermore, a penalty parameter $C$, imposing a trade-off between training error and generalization [24], is introduced to control the number of misclassified points and adjust the margin between different classes [12]. Therefore, the optimization problem to find the optimal decision can be described as follows:

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left(w^T x_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n. \tag{26}
$$

For the constrained optimization problem, by using the duality theory of optimization, the final decision function can be presented as [24]

$$
f(x) = \operatorname{sign}\!\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right), \tag{27}
$$

where $\alpha_i$ symbolizes the Lagrange multipliers and $K(x_i, x)$ is a kernel function, which is positive definite. Typical examples of kernel function [10] offer these choices: linear kernel, polynomial kernel, radial basis function (RBF) kernel, and sigmoid kernel.
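To make the kernel choices and the dual decision function concrete, the following minimal sketch writes them out in NumPy. The parameterization (`gamma`, `coef0`, `degree`) follows a common convention and is an assumption, as are the function names; the multipliers and bias are taken as given rather than trained.

```python
import numpy as np


def linear_kernel(x, z):
    return float(np.dot(x, z))


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return float((gamma * np.dot(x, z) + coef0) ** degree)


def rbf_kernel(x, z, gamma=1.0):
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    return float(np.tanh(gamma * np.dot(x, z) + coef0))


def svm_decision(x, support_vectors, alphas, labels, bias, kernel):
    """Sign of the kernel expansion over the support vectors (+1 or -1)."""
    score = sum(
        a * y * kernel(sv, x)
        for sv, a, y in zip(support_vectors, alphas, labels)
    )
    return 1 if score + bias >= 0 else -1
```

For example, with two support vectors at $(1, 0)$ and $(-1, 0)$ carrying labels $+1$ and $-1$, equal multipliers, zero bias, and the linear kernel, points with a positive first coordinate are assigned to the $+1$ class.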

For roller element bearings, the fault detection is a multiclass pattern recognition task, which can be generally solved by decomposing the multiclass problem into several binary class problems [74]. In [75], the multiclass patterns recognition was handled by the “one-against-one” approach. In this paper, we select the polynomial kernel to solve the multiclass pattern recognition task.

#### 3. Proposed Method and the System Framework

##### 3.1. Features Extraction Method FSASD (Features Selection by Adjusted Rand Index and Sum of Within-Class Mean Deviations)

In this paper, we suggest that the most sensitive statistical characteristics should be selected before the implementation of the fault patterns recognition technique. For this reason, the $k$-means method and SWD are applied to a dataset that includes different statistical characteristics for the case of bearing conditions. In FSASD, each kind of statistical characteristic is clustered by the $k$-means method, from which the clustering result adjusted rand index (ARI) becomes an evaluation index of each statistical characteristic. For each kind of statistical characteristic, we compute the SWD of characteristic samples in each bearing condition. The sum of SWD in all bearing conditions can be obtained. For each statistical characteristic, the higher the value of ARI is, the greater the characteristic class discriminative degree will be. The lower the value of SWD is, the greater the class cohesion of the characteristic will be. Therefore, the ratio of ARI and SWD is selected to indicate the sensitivity of statistical characteristic. The description of FSASD is summarized in the following steps.

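*Step 1. *In the training samples, there are $c$ kinds of bearing fault types, $M$ vibration signal samples in each type of bearing fault pattern, and $K$ kinds of statistical characteristics. By vibration signal processing, we can obtain $K$ original feature sets, $\{F_1, F_2, \ldots, F_K\}$, where $F_q$ can be expressed by

$$
F_q = \begin{bmatrix}
f^{q}_{1,1} & f^{q}_{1,2} & \cdots & f^{q}_{1,M} \\
f^{q}_{2,1} & f^{q}_{2,2} & \cdots & f^{q}_{2,M} \\
\vdots & \vdots & \ddots & \vdots \\
f^{q}_{c,1} & f^{q}_{c,2} & \cdots & f^{q}_{c,M}
\end{bmatrix}, \tag{28}
$$

where $f^{q}_{i,j}$ is the $q$th statistical characteristic of the $j$th sample in the $i$th kind of bearing fault type.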

Next, each feature set $F_q$ can be classified into $c$ clustering partitions using the $k$-means method. The ARI of the clustering partitions can be calculated to judge the accuracy of clustering results [76, 77].

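Consider a set of $n$ objects $S = \{O_1, O_2, \ldots, O_n\}$ and suppose $U = \{u_1, \ldots, u_R\}$ and $V = \{v_1, \ldots, v_T\}$ represent two different partitions of the objects in $S$ such that $\bigcup_{i=1}^{R} u_i = S = \bigcup_{j=1}^{T} v_j$ and $u_i \cap u_{i'} = \emptyset = v_j \cap v_{j'}$ for $1 \le i \ne i' \le R$ and $1 \le j \ne j' \le T$, where $u_i$ and $v_j$ represent subsets. The ARI is then defined as [78, 79]

$$
\mathrm{ARI} = \frac{\binom{n}{2}(a + d) - \left[(a + b)(a + c) + (c + d)(b + d)\right]}{\binom{n}{2}^{2} - \left[(a + b)(a + c) + (c + d)(b + d)\right]}, \tag{29}
$$

where $\binom{n}{2} = a + b + c + d$ is the total number of object pairs; $a$ is the number of pairs of objects placed in the same class in $U$ and in the same class in $V$; $b$ is the number of pairs placed in the same class in $U$ and in different classes in $V$; $c$ is the number of pairs placed in different classes in $U$ and in the same class in $V$; and $d$ is the number of pairs placed in different classes in $U$ and in different classes in $V$.

ARI can give a measure of the agreement between partitions $U$ and $V$ in classification problems [79]. When the ARI value is 1 (maximum), it indicates that the algorithm distinguishes the classes correctly [79]. Consequently, the greater the value of ARI is, the better the clustering performance will be. Therefore, the ARI can give us the characteristic’s discriminant power [79].

(3) Computing the projections (the final step of the NPE procedure in Section 2.3.2): in this step, the linear projections can be computed by solving the following generalized eigenvector problem:

$$
X M X^T a = \lambda X X^T a. \tag{22}
$$

The eigenvectors $a_0, a_1, \ldots, a_{d-1}$ are arranged according to their corresponding eigenvalues $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{d-1}$. Thus, the embedding is as follows:

$$
x_i \rightarrow y_i = A^T x_i, \qquad A = (a_0, a_1, \ldots, a_{d-1}), \tag{23}
$$

where $y_i$ is a $d$-dimensional vector and $A$ is an $m \times d$ matrix.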
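The pair-counting view of the ARI can be written directly in code. The sketch below is a minimal illustration that assumes the standard pair-count expression of the adjusted Rand index; the function names are hypothetical, and the quadratic loop over all pairs is chosen for clarity rather than speed.

```python
from itertools import combinations


def pair_counts(u, v):
    """Count object pairs by agreement between two labelings u and v."""
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u = u[i] == u[j]
        same_v = v[i] == v[j]
        if same_u and same_v:
            a += 1          # same class in both partitions
        elif same_u:
            b += 1          # same in U, different in V
        elif same_v:
            c += 1          # different in U, same in V
        else:
            d += 1          # different in both partitions
    return a, b, c, d


def adjusted_rand_index(u, v):
    a, b, c, d = pair_counts(u, v)
    total = a + b + c + d
    expected = (a + b) * (a + c) + (c + d) * (b + d)
    denominator = total * total - expected
    if denominator == 0:
        # Both partitions are trivial and identical (e.g., a single cluster).
        return 1.0
    return (total * (a + d) - expected) / denominator
```

The index depends only on which objects are grouped together, so relabeling the clusters of one partition leaves the value unchanged.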

Once clustering analysis is performed for the $K$ characteristic sets, $\{\mathrm{ARI}_1, \mathrm{ARI}_2, \ldots, \mathrm{ARI}_K\}$ can be obtained. In this paper, we presume that the greater the value of $\mathrm{ARI}_q$ is, the greater the characteristic class discriminative degree will be.

*Step 2. *The SWD of characteristic samples of a kind of statistical characteristic in each type of bearings conditions is calculated, that is, the SWD of the elements of the row of the matrix . Therefore, we can obtain SWD sets, , where can be expressed bywhereNext, we can obtain , which is the sum of the SWD of characteristic samples of the th statistical characteristic for all cases of bearing conditions, where can be expressed byIn this paper, we presume that the SWD can be used to express the cohesion of data. Thus, there is the standard deviation sequence , which becomes another evaluation index for features extraction. In this paper, we presume that the lower the value of , the greater the class cohesion of the characteristic.

*Step 3. *Obtain a new sequence, $\{\mathrm{ASD}_1, \mathrm{ASD}_2, \ldots, \mathrm{ASD}_K\}$, where the definition of $\mathrm{ASD}_q$ is as follows:

$$
\mathrm{ASD}_q = \frac{\mathrm{ARI}_q}{\mathrm{SWD}_q}, \qquad q = 1, \ldots, K. \tag{34}
$$

In this paper, we presume that the greater the value of $\mathrm{ASD}_q$, the better the sensitivity of the corresponding statistical characteristic. Therefore, the sorted ratio sequence of ARI and SWD (SASD) can be obtained by sorting the ASD in descending order.
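The ranking in Step 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the ARI of each characteristic has already been computed (for example, from a $k$-means clustering of that characteristic), and it assumes one plausible reading of SWD: the mean absolute deviation from the class mean, summed over all bearing conditions. The function names, the array layout, and the small `eps` guard are all hypothetical.

```python
import numpy as np


def swd_per_characteristic(features):
    """features has shape (K, C, M): K characteristics, C classes, M samples.

    Assumed definition: for each class, the mean absolute deviation of the
    samples from the class mean; SWD is the sum of these over the classes.
    """
    features = np.asarray(features, dtype=float)
    class_means = features.mean(axis=2, keepdims=True)
    mean_abs_dev = np.abs(features - class_means).mean(axis=2)  # shape (K, C)
    return mean_abs_dev.sum(axis=1)                             # shape (K,)


def rank_by_asd(ari, features, eps=1e-12):
    """Return (order, asd): characteristic indices sorted by descending ASD."""
    swd = swd_per_characteristic(features)
    asd = np.asarray(ari, dtype=float) / (swd + eps)  # eps avoids division by zero
    order = np.argsort(-asd)
    return order, asd
```

A characteristic is ranked highly when its clusters agree with the true fault labels (high ARI) and its samples are tightly grouped within each class (low SWD); the first entries of `order` would then be kept as the fault-sensitive characteristics.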

##### 3.2. Supervised Neighborhood Preserving Embedding with Label Information (SNPEL)

Although NPE can preserve the local neighborhood structure on the data manifold, it is mostly used as an unsupervised dimensionality reduction method, which does not take label information into account. However, the label information is useful for improving the dimensionality reduction performance and increasing the classification accuracy. Therefore, a novel dimensionality reduction method, SNPEL, is proposed. SNPEL naturally inherits the merits of NPE and LDA. The underlying idea is that the optimization objective of LDA can be integrated into NPE; that is, the between-class scatter is maximized and the within-class scatter is minimized while the local neighborhood structure is preserved.

Based on the description of NPE and LDA in Section 2, the optimization objective of SNPEL can be obtained by combining the optimization objectives of LDA and NPE. The objective function can be defined as follows:

According to (15), the above objective function can be expressed as follows:The above optimization problem can be converted to the trace ratio optimization problem, and according to (21), the objective function (35) can be simplified as follows:where the matrix and matrix need to be normalized. Thus, the final optimization objective function is presented as follows:where and represent the normalized matrix and the normalized matrix , respectively.

Finally, the dimensionality reduction projection matrix** A** can be formed by solving a generalized eigenvalue problem:where , , and are arranged according to their corresponding eigenvalues . The projection matrix is composed of the first eigenvectors; that is, . Therefore, given , the corresponding embedding projection can be obtained.

The detailed procedures of SNPEL are listed as follows.

*Step 1. *Compute the Euclidean distance between samples $x_i$ and $x_j$, and use the $k$-nearest neighbors (knn) to construct the adjacency graph $G$.

*Step 2. *Compute the weights on the edges. Let $W$ denote the weight matrix with $W_{ij}$ having the weight of the edge from node $i$ to node $j$, and let it be 0 if there is no such edge. The weights of the edges can be computed by minimizing the weighted objective function (18).

*Step 3. *Compute the $m$-dimensional mean vectors for the different classes of the dataset.

*Step 4. *Compute the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$.

*Step 5. *Compute the eigenvectors and corresponding eigenvalues for the matrix and the matrix . Thus eigenvectors and corresponding eigenvalues are obtained.

*Step 6. *Sort the eigenvectors by decreasing eigenvalues and choose the $d$ eigenvectors with the largest eigenvalues to form the $m \times d$ projection matrix $A$.

*Step 7. *Compute $Y = A^T X$. The $m$-dimensional samples are thereby transformed to $d$-dimensional samples, which completes the dimensionality reduction procedure.

Finally, SNPEL yields low-dimensional feature matrices for the training and testing datasets that carry more fault-sensitive and less redundant information for bearing fault identification and classification.

##### 3.3. System Framework

The implementation of the proposed method is shown in Figure 2, where statistical analysis and artificial intelligence approaches are systematically blended to detect and diagnose rolling element bearing faults. The whole fault diagnosis procedure is divided into four steps: signal processing, feature extraction, feature reduction, and pattern recognition.

In the first step, vibration signals collected from bearings are decomposed into different wavelet packet nodes by MODWPT. The single branch reconstruction signals of the terminal nodes are then used to generate statistical characteristics. Using the proposed FSASD, the most sensitive statistical characteristics are selected to construct the feature vectors for training the classifier, and the same selected characteristics are extracted directly for the testing samples. Then, for feature reduction, the low-dimensional training feature space is obtained by the proposed SNPEL, which generates a projection that is also used to reduce the dimensionality of the testing feature space, giving the low-dimensional testing feature space. Both the FSASD selection and the projection matrix are obtained from the training set alone and are applied unchanged to the testing set. In the last step, the low-dimensional training feature set, labeled by fault type, is used to train the classifier. The trained classifier then performs fault pattern recognition on the low-dimensional testing feature set. The output of the procedure is the fault identification and classification accuracy.
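The point that the feature selection and the projection are learned from the training set only and then reused on the testing set can be illustrated with a small fit/apply sketch. Everything specific here is a stand-in chosen for illustration: `scores` is a hypothetical vector of per-feature sensitivity scores in place of FSASD, a PCA projection computed by SVD takes the place of SNPEL, and the values of `sfn` and `d_out` are arbitrary.

```python
import numpy as np

def fit_reduction(X_train, scores, sfn, d_out):
    """Learn the selected feature indices and a projection from training data only."""
    selected = np.argsort(scores)[::-1][:sfn]      # top-sfn features by score
    Xs = X_train[:, selected]
    mean = Xs.mean(axis=0)
    # Stand-in projection: PCA via SVD (the paper uses SNPEL at this point).
    _, _, Vt = np.linalg.svd(Xs - mean, full_matrices=False)
    A = Vt[:d_out].T
    return selected, mean, A

def apply_reduction(X, selected, mean, A):
    """Reuse the training-derived selection and projection on any sample set."""
    return (X[:, selected] - mean) @ A

rng = np.random.default_rng(0)
X_train = rng.normal(size=(240, 192))              # synthetic placeholder features
X_test = rng.normal(size=(480, 192))
scores = rng.random(192)                           # hypothetical sensitivity scores

selected, mean, A = fit_reduction(X_train, scores, sfn=30, d_out=5)
Z_train = apply_reduction(X_train, selected, mean, A)
Z_test = apply_reduction(X_test, selected, mean, A)
```

Keeping the fit and apply stages separate makes it hard to leak testing information into the selection or the projection by accident.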

#### 4. Experiments and Analysis Results

##### 4.1. Experiments Based on Test Rig 1

###### 4.1.1. Experimental Setup and Cases

The vibration dataset is freely provided by the Bearing Data Center of Case Western Reserve University (CWRU) [45]. Figure 3 shows the system used for measuring the data, which includes an electric motor (left), a torque transducer/encoder (center), a dynamometer (right), and control circuitry (not shown). The bearings used in this work are deep groove ball bearings of type 6205-2RS JEM SKF, installed at the drive end (DE). Single faults (ball fault, inner race fault, and outer race fault) were seeded separately on normal bearings with different defect sizes (0.007 in, 0.014 in, 0.021 in, and 0.028 in) using electro-discharge machining [12]. The vibration signals were collected using accelerometers under different motor loads of 0–3 hp (motor speeds of 1730 to 1797 rpm).

In order to evaluate the effectiveness, adaptability, and robustness of the proposed bearing fault diagnosis method, vibration signals of different fault types and degrees were employed. The detailed information of the dataset is presented in Table 1, where ball and inner race faults each have four fault degrees, while the outer race fault has three. There is also a normal condition. Therefore, there are 12 working conditions, corresponding to 12 fault patterns. For each fault pattern, 60 samples are acquired from the vibration signals, and each sample contains 2000 continuous data points. The 60 samples of each fault pattern were collected from the bearing installed at the drive end of the motor housing, at a sampling frequency of 12 kHz. In order to verify the adaptability of the proposed diagnosis method, samples from one motor load are used for training and samples from a different motor load are used for testing. This experimental setup differs from the setups employed in previous research [5, 21, 80]. Two comparative cases are therefore employed in the experiments. In case 1, 40 random samples per fault pattern at 2 hp are selected as the testing samples. In case 2, 40 random samples per fault pattern at 3 hp are selected as the testing samples. Both cases use the same training samples, namely, the remaining 20 samples at 2 hp.
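The sample bookkeeping described above can be sketched with the standard library. The sketch assumes non-overlapping windows (the text only says each sample holds 2000 continuous points) and covers the case 1 split, where training and testing samples come from the same 2 hp recording; the function names and the random seed are illustrative.

```python
import random

N_PATTERNS = 12
SAMPLES_PER_PATTERN = 60
POINTS_PER_SAMPLE = 2000

def segment(signal, n_samples=SAMPLES_PER_PATTERN, length=POINTS_PER_SAMPLE):
    """Cut a recording into consecutive, non-overlapping samples (assumed layout)."""
    if len(signal) < n_samples * length:
        raise ValueError("signal too short for the requested segmentation")
    return [signal[i * length:(i + 1) * length] for i in range(n_samples)]

def split_indices(n_samples=SAMPLES_PER_PATTERN, n_test=40, seed=0):
    """Pick n_test random sample indices for testing; the rest are for training."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return sorted(idx[n_test:]), sorted(idx[:n_test])   # (train, test)

signal = [0.0] * (SAMPLES_PER_PATTERN * POINTS_PER_SAMPLE)  # placeholder recording
samples = segment(signal)
train_idx, test_idx = split_indices()

n_train_total = N_PATTERNS * len(train_idx)   # training samples over all patterns
n_test_total = N_PATTERNS * len(test_idx)     # testing samples over all patterns
```

With 12 fault patterns this gives 240 training samples and 480 testing samples per case. In case 2 the 40 testing samples per pattern would instead be drawn from the 3 hp recordings, while the 20 training samples at 2 hp stay the same.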

###### 4.1.2. Analysis Results

According to the system framework shown in Figure 2, the first step is signal processing, in which vibration signals collected from bearings are decomposed into different wavelet packet nodes by MODWPT. In this paper, the decomposition level is 4 and the discrete Meyer wavelet (“dmey”) is selected as the mother wavelet. One ball fault vibration signal sample from the 2 hp training set and the corresponding single branch reconstruction signals of the terminal nodes are presented in Figure 4.

The level-4 decomposition of the vibration signals yields 16 terminal nodes and the corresponding coefficients. From these, we obtain 16 single branch reconstruction signals of the terminal nodes and 16 corresponding Hilbert envelope spectra (HES). Applying the 6 statistical parameters shown in Table 2 to these 32 signals and spectra generates 192 statistical characteristics per sample. The class discriminative degree differs from one characteristic to another, as reflected in Figures 5 and 6. In this paper, we provide four examples, of which two are time-domain characteristics (energy and energy entropy) and two are HES statistical characteristics (standard deviation and kurtosis).
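As an illustration of the HES part of this step, the sketch below computes a Hilbert envelope spectrum with NumPy and evaluates two of the statistics named above on it. The analytic signal is built with the usual FFT construction. The kurtosis shown is the plain fourth standardized moment, which is an assumption, since the exact definitions in Table 2 are not reproduced here. The amplitude-modulated test signal is synthetic and stands in for a single branch reconstruction signal.

```python
import numpy as np

N_NODES, N_DOMAINS, N_PARAMS = 16, 2, 6            # level-4 nodes, time + HES, Table 2
N_FEATURES = N_NODES * N_DOMAINS * N_PARAMS        # 192 characteristics per sample

def envelope_spectrum(x, fs):
    """Return (frequencies, amplitude spectrum) of the Hilbert envelope of x."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)                                 # one-sided spectrum weights
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(spectrum * h))    # magnitude of analytic signal
    envelope = envelope - envelope.mean()           # drop the DC component
    return np.fft.rfftfreq(n, d=1.0 / fs), np.abs(np.fft.rfft(envelope)) / n

def kurtosis(v):
    """Fourth standardized moment (assumed definition, not excess kurtosis)."""
    c = np.asarray(v, dtype=float)
    c = c - c.mean()
    return (c ** 4).mean() / (c ** 2).mean() ** 2

# Synthetic signal: 3 kHz carrier modulated at 120 Hz, 2000 points at 12 kHz.
fs, n = 12_000, 2000
t = np.arange(n) / fs
x = (1 + 0.5 * np.cos(2 * np.pi * 120 * t)) * np.cos(2 * np.pi * 3000 * t)

freqs, spec = envelope_spectrum(x, fs)
peak_hz = freqs[np.argmax(spec)]                    # modulation frequency
hes_std, hes_kurt = spec.std(), kurtosis(spec)      # two HES characteristics
```

For this synthetic signal the envelope spectrum peaks at the 120 Hz modulation frequency, which is the kind of fault-related periodicity the HES characteristics are meant to capture.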

The original feature set is composed of 192 statistical characteristics. The FSASD is then employed to select the sensitive statistical characteristics as the input feature vectors for training the classifier. The ARI, SSWD, and ASD of the 192 statistical characteristics of the training samples are presented in Figures 7, 8, and 9, respectively. In Figure 7, the horizontal axis represents the index of the statistical characteristic. Indices 1–6, 7–12, …, 85–90, and 91–96 represent the time-domain characteristics of the single branch reconstruction signals of terminal wavelet packet nodes 1–16, respectively. Indices 97–102, 103–108, …, 181–186, and 187–192 represent the HES characteristics of the single branch reconstruction signals of terminal wavelet packet nodes 1–16, respectively.
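The index layout just described can be decoded with a few lines of Python. The helper name is hypothetical, and the parameter number it returns is only the position within each block of six (the ordering of the parameters follows Table 2, which is not reproduced here).

```python
def describe_feature(index, n_nodes=16, n_params=6):
    """Map a 1-based characteristic index to (domain, node, parameter position)."""
    per_domain = n_nodes * n_params                 # 96 characteristics per domain
    if not 1 <= index <= 2 * per_domain:
        raise ValueError("index must lie in 1..192 for the default layout")
    zero = index - 1
    domain = "time" if zero < per_domain else "HES"
    node = (zero % per_domain) // n_params + 1      # terminal node 1..16
    param = zero % n_params + 1                     # position 1..6 within the node
    return domain, node, param

first_hes = describe_feature(97)    # first HES characteristic of node 1
last_time = describe_feature(96)    # last time-domain characteristic of node 16
```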

In order to verify the effectiveness and adaptability of the proposed bearing fault diagnosis method, a series of comparative experiments are divided into two groups. The detailed descriptions of them are presented below. Furthermore, in order to verify the superiority of MODWPT, WPT is also applied for fault diagnosis, and the results are compared with those of MODWPT.

In the first group, the FSASD is not applied. The original feature set (OFS) contains 192 statistical characteristics, which are processed directly by several dimensionality reduction methods. OFS-SVM is an SVM-based diagnosis model in which the OFS is the input of the SVM. OFS-PCA/NPE/LDA/SNPEL-SVM are SVM-based diagnosis models that use PCA, NPE, LDA, and SNPEL, respectively. According to Tables 3–7, each model performs better with MODWPT than with WPT.

The detailed results of all models using MODWPT are presented below. For the testing set of case 1, all models achieve preferable performance: the accuracy of each model exceeds 96%, and the highest accuracy reaches 100%. For the testing set of case 2, OFS-PCA-SVM and OFS-NPE-SVM improve on the diagnosis accuracy of OFS-SVM. However, OFS-LDA-SVM and OFS-SNPEL-SVM perform better than OFS-SVM, OFS-PCA-SVM, and OFS-NPE-SVM, and the highest accuracy of OFS-SNPEL-SVM reaches 94.58%. Across both cases and all of the approaches tested, the experimental results show that the fault diagnosis model using SNPEL achieves preferable performance.

In the second group, the FSASD is applied to select the sensitive statistical characteristics before feature reduction and fault diagnosis. OFS-FSASD-SVM is an SVM-based diagnosis model in which the sensitive characteristics are selected from the OFS by FSASD. OFS-FSASD-PCA/NPE/LDA/SNPEL-SVM are SVM-based diagnosis models that use PCA, NPE, LDA, and SNPEL, respectively. According to Tables 8–12 and Figures 10–21, each model performs better with MODWPT than with WPT. The detailed results of all models using MODWPT are presented below.

Here, sfn denotes the number of selected characteristics. For the testing set of case 1, all models achieve preferable performance. For the testing set of case 2, the diagnosis accuracies of all models using FSASD improve on the results of the first group. OFS-FSASD-SNPEL-SVM and OFS-FSASD-LDA-SVM perform better than OFS-FSASD-SVM, OFS-FSASD-PCA-SVM, and OFS-FSASD-NPE-SVM, and of the two, OFS-FSASD-SNPEL-SVM performs better. For the testing set of case 1, the diagnosis accuracies of both OFS-FSASD-SNPEL-SVM and OFS-FSASD-LDA-SVM reach 100%. For the testing set of case 2, the maximum diagnosis accuracy of OFS-FSASD-SNPEL-SVM reaches 100%, whereas that of OFS-FSASD-LDA-SVM reaches only 97.92%. According to the experimental results of the second group, a suitable choice of the parameter sfn yields a desirable improvement in diagnosis accuracy. According to Figures 12–21, the fault diagnosis attains better performance over a relatively wide range of sfn; for example, the highest diagnosis accuracy of OFS-FSASD-SNPEL-SVM reaches 100%. Therefore, on the one hand, the validity of the design of the correlation parameter is verified; on the other hand, the proposed bearing fault diagnosis algorithm is shown to have great adaptability.
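To put the reported percentages in terms of samples, the short calculation below converts an accuracy into a count of correctly classified testing samples. It assumes that each reported accuracy is the fraction of correct predictions over the 480 testing samples of a case (12 fault patterns with 40 testing samples each) in a single run; the helper name is illustrative.

```python
N_TEST = 12 * 40   # testing samples per case on test rig 1

def correct_count(accuracy_percent, n_test=N_TEST):
    """Number of correctly classified samples implied by an accuracy in percent."""
    return round(accuracy_percent / 100 * n_test)

lda_best = correct_count(97.92)       # OFS-FSASD-LDA-SVM, case 2
snpel_no_fs = correct_count(94.58)    # OFS-SNPEL-SVM without FSASD, case 2
snpel_best = correct_count(100.0)     # OFS-FSASD-SNPEL-SVM, case 2

lda_errors = N_TEST - lda_best
snpel_no_fs_errors = N_TEST - snpel_no_fs
```

Under this assumption, 97.92% corresponds to 470 of 480 samples (10 misclassified) and 94.58% to 454 of 480 (26 misclassified), so the gap to 100% amounts to a few tens of samples at most.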

##### 4.2. Experiments Based on Test Rig 2

###### 4.2.1. Experimental Setup and Cases

In order to validate the adaptability of the proposed bearing fault diagnosis method, we collected vibration signals from the SQI-MFS test rig to conduct further experiments. Figure 22 shows the test rig, and Figure 23 shows the bearings used in this work, which are of type SER205. Single faults (ball fault, inner race fault, and outer race fault) were seeded separately on normal bearings with different defect sizes (0.05 mm, 0.1 mm, and 0.2 mm) using laser machining. The vibration signals were collected from the bearings using accelerometers at motor speeds of 1200 rpm and 1800 rpm, with a sampling frequency of 16 kHz.

The detailed information of the vibration dataset is presented in Table 13, where ball, inner race, and outer race faults each have three fault degrees, and there is also a normal condition. Therefore, there are 10 working conditions, corresponding to 10 fault patterns. For each fault pattern, 60 samples are acquired from the vibration signals, and each sample contains 5000 continuous data points. As for test rig 1, two cases are employed in the experiments. Samples from one motor speed are used for training and samples from a different motor speed are used for testing. In case 1, 40 random samples per fault pattern at 1800 rpm are selected as the testing samples. In case 2, 40 random samples per fault pattern at 1200 rpm are selected as the testing samples. Both cases use the same training samples, namely, the remaining 20 samples at 1800 rpm.

###### 4.2.2. Analysis Results

The bearing fault diagnosis procedure for the SQI-MFS test rig is the same as that for test rig 1. In the experiments, MODWPT is applied to process the vibration signals. For the 192 statistical characteristics, the class discriminative degree of each characteristic is reflected in Figures 24 and 25. We provide four examples, of which two are time-domain characteristics (energy and energy entropy) and two are HES statistical characteristics (standard deviation and kurtosis).

When the original feature set has been obtained, the FSASD is employed to select the sensitive statistical characteristics as the input feature vectors for the bearing fault diagnosis. Then, ARI, SSWD, and ASD of 192 statistical characteristics of training samples can be obtained. They are presented in Figures 26, 27, and 28, respectively.

In order to verify the effectiveness and adaptability of the proposed fault diagnosis method for the SQI-MFS test rig, a series of comparative experiments is divided into two groups. In the first group, the FSASD is not applied. The fault diagnosis results of OFS-SVM, OFS-PCA-SVM, OFS-NPE-SVM, OFS-LDA-SVM, and OFS-SNPEL-SVM are presented in Tables 14–18. For the testing set of case 1, all models achieve preferable performance: the accuracy of each model exceeds 95%, and the highest accuracy reaches 99.67%. For the testing set of case 2, none of the models achieves a desirable diagnosis accuracy. According to Tables 15–18, OFS-PCA-SVM and OFS-SNPEL-SVM perform better than OFS-NPE-SVM and OFS-LDA-SVM, which indicates that the choice of dimensionality reduction method affects the diagnosis accuracy.

In the OFS, different statistical characteristics have different fault sensitivity; some are beneficial to fault identification and classification, but some are not. The FSASD can evaluate the fault sensitivity of each statistical characteristic and select the sensitive ones. For the second group, the FSASD is applied before feature reduction and fault diagnosis. The fault diagnosis results of OFS-FSASD-SVM, OFS-FSASD-PCA-SVM, OFS-FSASD-NPE-SVM, OFS-FSASD-LDA-SVM, and OFS-FSASD-SNPEL-SVM are presented in Tables 19–23, and the corresponding curves are presented in Figures 29–33. For the testing set of case 1, all models achieve preferable performance: the accuracy of each model exceeds 97.50%, and the highest accuracy reaches 100%. OFS-FSASD-LDA-SVM and OFS-FSASD-SNPEL-SVM perform better than OFS-FSASD-SVM, OFS-FSASD-PCA-SVM, and OFS-FSASD-NPE-SVM. For the testing set of case 2, the maximum diagnosis accuracy of OFS-FSASD-SNPEL-SVM reaches 89.83%, whereas that of OFS-FSASD-LDA-SVM reaches only 83.17%. For comparison, the diagnosis results of OFS-FSASD-SVM, OFS-FSASD-PCA-SVM, OFS-FSASD-NPE-SVM, and OFS-FSASD-SNPEL-SVM are also presented in Figure 34.

Compared with the first group, the experimental results of the second group show, on the one hand, that the diagnosis model performs better when the FSASD is used, which indicates that the number of sensitive features affects the fault diagnosis accuracy. According to Figures 29–34, the fault diagnosis attains better performance when the parameter sfn lies within a certain range; for example, the highest diagnosis accuracy of OFS-FSASD-SNPEL-SVM reaches 89.83%, which verifies that a desirable improvement in diagnosis accuracy can be achieved when a suitable parameter sfn is selected.

On the other hand, the choice of dimensionality reduction method also affects the fault diagnosis accuracy, especially for the testing set of case 2. The proposed SNPEL preserves the local geometry of the data and works well with multimodal data, while also taking the label information into account during dimensionality reduction. Therefore, the low-dimensional feature space obtained by SNPEL is more beneficial to fault identification and classification. Through this series of comparative experiments, the effectiveness and adaptability of the proposed bearing fault diagnosis procedure for the SQI-MFS test rig are verified.

#### 5. Conclusions

This paper proposed a novel procedure to identify and classify different bearing fault conditions. The proposed procedure systematically blends statistical analysis with artificial intelligence. It uses MODWPT to generate multidomain features, the proposed FSASD to select the most sensitive features, the modified NPE (SNPEL) to reduce the feature dimensionality, and SVM as an automated fault pattern recognition system. The experimental data collected from two experimental test rigs cover different bearing fault conditions, namely, ball fault, inner race fault, and outer race fault at different defect sizes.

According to the experimental results, the proposed bearing fault diagnosis method has great potential to be an effective and adaptable tool for precise identification and classification of bearing faults under a variety of bearing conditions. For experimental test rig 1, two comparative cases are employed. Cases 1 and 2 use testing samples with different motor loads, 2 hp and 3 hp, respectively, and both use training samples with the same motor load (2 hp). The experimental results indicate that the maximum diagnosis accuracy of case 1 reaches 100%, and the diagnosis accuracy of case 2 exceeds 99% when the parameter sfn is in a relatively wide range. In order to verify the adaptability of the proposed procedure, vibration datasets collected from experimental test rig 2 (SQI-MFS) are employed. Cases 1 and 2 use testing samples with different motor speeds, 1800 rpm and 1200 rpm, respectively, and both use training samples with the same motor speed (1800 rpm). The experimental results also indicate that the diagnosis model using the proposed methods achieves preferable performance.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work is supported by the National Key Research and Development Program of China (no. 2017YFC0804400, no. 2017YFC0804401) and the National Key Basic Research Program of China (973 Program, no. 2014CB046300).