Shiyuan Liu, Xiao Yu, Xu Qian, Fei Dong, "Rolling Bearing Fault Diagnosis Based on Sensitive Feature Transfer Learning and Local Maximum Margin Criterion under Variable Working Condition", Shock and Vibration, vol. 2020, Article ID 8582732, 34 pages, 2020. https://doi.org/10.1155/2020/8582732
Rolling Bearing Fault Diagnosis Based on Sensitive Feature Transfer Learning and Local Maximum Margin Criterion under Variable Working Condition
Abstract
In real industrial scenarios, the working conditions of bearings are variable, so data-driven diagnosis methods based on conventional machine-learning techniques struggle to guarantee the desired diagnostic performance, because these models assume that the training and testing data follow the same distribution. To enhance the performance of bearing fault diagnosis under different working conditions, a novel diagnosis framework that combines feature extraction, transfer learning (TL), and feature dimensionality reduction is proposed in this work, and the dual-tree complex wavelet packet transform (DTCWPT) is used for signal processing. Additionally, transferable sensitive feature selection by ReliefF and the sum of mean deviation (TSFRS) is proposed to reduce the redundant information of the original feature set, to select sensitive features for fault diagnosis, and to reduce the difference between the marginal distributions of the training and testing feature sets. Furthermore, a modified feature reduction method, the local maximum margin criterion (LMMC), is proposed to acquire a low-dimensional mapping for high-dimensional feature spaces. Finally, bearing vibration signals collected from two test rigs are analyzed to demonstrate the adaptability, effectiveness, and practicability of the proposed diagnosis framework. The experimental results show that the proposed method can achieve high diagnosis accuracy and has significant potential benefits in industrial applications.
1. Introduction
Rolling element bearings (REBs) are one of the most common machine elements of rotating machinery equipment in modern industry and smart manufacturing [1, 2], and the health state of REBs can seriously affect the safe and stable operation of rotary mechanical equipment [2]. REBs often operate in harsh working environments, and their failure probability is therefore higher than that of other components [3, 4]. Thus, REB fault diagnosis is of great significance for the guarantee of equipment safety and the reduction of maintenance costs [4]. In the past decade, because the vibration signals usually carry rich information about the machine operating conditions, the vibration signals collected from REBs have been commonly used as the analytical signals in many intelligent machine fault diagnosis systems [5]. In recent years, with the rapid development of signal processing, data mining, and artificial intelligence technology, datadriven fault diagnosis has become a popular research topic [5]. Datadriven fault diagnosis consists of four steps, namely, signal collection and processing, feature extraction, feature reduction, and pattern recognition [5–8], among which feature extraction is the crucial step for the extraction of more useful information of the original vibration signals for fault pattern recognition. However, most existing datadriven intelligent diagnosis methods have two main limitations that hinder their applicability in real industrial scenarios [5, 6, 9]: (1) most existing feature extraction and fault classification models assume that the training and testing data have the same distributions. Due to the harsh working environment and different working requirements in an industry, the working conditions are not consistent; this can therefore lead to differences between the distributions of the training and testing data [6]. 
(2) The variability of working conditions and the diversity of the types of failure of rotating machines often lead to insufficient labeled target fault data. Therefore, diagnostic models based on conventional machine-learning techniques that are learned from training data cannot guarantee the desired diagnosis performance on testing data collected from industrial scenarios. To overcome these two limitations, it is necessary to use an improved fault diagnosis framework in which the training data are the labeled data under one working condition, and the resulting model can be applied to the unlabeled data under other working conditions.
Signal processing is the first step in datadriven REB fault diagnosis methods and has been carried out in many previous investigations by numerous scholars. Because vibration signals collected from REBs generally have nonlinear and nonstationary features, a timefrequency analysis can be effective for feature extraction [10]. Some commonly used and representative conventional timefrequency domain analysis approaches include empirical mode decomposition (EMD), shorttime Fourier transform (STFT), the Wigner–Ville distribution (WVD), and wavelet transform (WT) [11]. In addition, parameterized timefrequency transform (PTFT) methods [12, 13] have been proposed to achieve a more accurate extraction of the instantaneous rotation frequency (IRF) from strong nonstationary vibration signals. In the work by Wang and Xiang [14], splinekernelled chirplet transform (SCT), one of the PTFT methods, was employed to calculate the timefrequency distribution and extract the instantaneous rotation frequency for REB fault diagnosis under varying speed conditions. In the work by Wang et al. [15], polynomial chirplet transform (PCT), another PTFT method, was employed to estimate the IRF of REBs from the vibration signals for fault diagnosis.
EMD is a common and effective time-frequency approach in the fault diagnosis of rotating machinery and can automatically decompose nonstationary and nonlinear signals into multiple modal components [16, 17]. In some previous studies [18–21], EMD was employed to process the original signals and extract features for REB fault diagnosis. However, EMD has some limitations, such as over-enveloping, end effects, and mode mixing [17, 22]. STFT is also an effective time-frequency analysis approach that divides the entire time domain into numerous segments of the same length, each of which is treated as an approximately stationary process [23, 24]. Some researchers [25–28] have used STFT for fault diagnosis, but its effectiveness is still hampered by the limitation of its single triangular basis [17, 29]. The WVD is also a widely used nonlinear time-frequency distribution for signal processing due to its excellent resolution and localization in the time-frequency domain [30]; however, the presence of cross-terms when it is applied to multicomponent signals can result in misleading interpretations [31, 32]. WT, including continuous wavelet transform (CWT) and discrete wavelet transform (DWT), is an outstanding and powerful method in rotary machine diagnosis because its multiresolution capability is suitable for the analysis of nonlinear and nonstationary signals [33]. However, CWT generates redundant data, is computationally expensive, and is very time consuming [34, 35]. DWT can overcome these drawbacks of CWT, but its limitations of shift variance and frequency aliasing may lead to the loss of useful information [36]. To address these shortcomings of DWT, dual-tree complex wavelet transform (DTCWT) was proposed by Kingsbury [37, 38] and further investigated by Selesnick et al. [39] in the dyadic case. 
DTCWT possesses some advantageous properties [36, 37, 40], including its (1) near shift-invariance and reduced aliasing, (2) good directional selectivity, which overcomes the lack of directional selectivity of DWT, (3) limited redundancy and efficient order-N computation, and (4) ability to acquire amplitude information and achieve perfect reconstruction. These properties are all beneficial for feature extraction in the task of mechanical fault diagnosis [31]. Dual-tree complex wavelet packet transform (DTCWPT) is an extension of DTCWT and overcomes the foremost limitation of DTCWT, namely, that it cannot realize multiresolution analysis in the high-frequency band. In previous research [36, 37, 40, 41], DTCWPT has been employed to process signals and extract features for REB fault diagnosis. In this paper, DTCWPT is introduced to process the original vibration signals collected from REBs.
For the construction of a feature set for fault pattern recognition, the statistical properties of the signals in the time, frequency, and timefrequency domains can be extracted to represent feature information [9, 42–44], such as the peak value (PV), root mean square (RMS), variance (V), skewness (Sw), kurtosis (K), energy, and energy entropy. In [42], after the vibration signals were processed by wavelet analysis, singlebranch reconstruction signals and the corresponding HHT envelope spectrum (HES) were used to generate 192 statistical features using 6 statistical parameters for bearing fault diagnosis. In [43], vibration signals collected from REBs were decomposed into several different IMFs by EMD. The first four IMFs were selected to obtain the HHT marginal spectrum and HES, which were then used to calculate the original statistical properties. In [9], 29 statistical parameters were selected to extract 29 statistical features, which formed a highdimensional original feature dataset for REB fault diagnosis. In [44], more than 30 feature indicators of vibration signals were calculated for axle bearings under different conditions, and the features that could more effectively and representatively reflect the fault features were selected for fault detection. In the research [45], the RMS and K were used to calculate fault features for wind turbine bearing fault diagnosis. It is often difficult to determine which statistical property can best reflect the nature of a fault from the feature space because of the complex mapping relations between some bearing faults and their signals [42, 43]. Thus, when unsuitable statistical features are chosen for fault pattern recognition, it may lead to a decline in the accuracy and efficiency of fault diagnosis. According to some previous studies [11, 42, 43], the selection of a feature subset that is formed by faultsensitive features is a crucial step for the achievement of the expected diagnostic accuracy.
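As an illustration of the statistical properties listed above, the following minimal sketch computes the peak value, RMS, variance, skewness, kurtosis, and energy of a single signal, plus an energy entropy across several sub-band signals. The function names `statistical_features` and `energy_entropy` are hypothetical, and the exact definitions used in the cited works may differ (for example, in normalization); the sketch assumes population moments and a non-constant signal.

```python
import numpy as np

def statistical_features(signal):
    """Return common time-domain statistics of a 1-D signal (assumed non-constant)."""
    x = np.asarray(signal, dtype=float)
    mean = x.mean()
    std = x.std()                      # population standard deviation
    centered = x - mean
    return {
        "peak_value": float(np.max(np.abs(x))),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "variance": float(std ** 2),
        "skewness": float(np.mean(centered ** 3) / std ** 3),
        "kurtosis": float(np.mean(centered ** 4) / std ** 4),
        "energy": float(np.sum(x ** 2)),
    }

def energy_entropy(band_signals):
    """Shannon entropy of the energy distribution across sub-band signals."""
    energies = np.array([np.sum(np.asarray(b, dtype=float) ** 2) for b in band_signals])
    p = energies / energies.sum()
    p = p[p > 0]                       # skip empty bands to avoid log(0)
    return float(-np.sum(p * np.log(p)))

features = statistical_features([1.0, -1.0, 1.0, -1.0])
entropy = energy_entropy([[1.0, 1.0], [1.0, -1.0]])
```

In a pipeline of the kind described here, such a function would be applied to each reconstructed sub-band signal, and the results concatenated into one feature vector per sample.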
As discussed previously, the two foremost limitations of most existing data-driven intelligent diagnosis methods [5, 6] are that (1) they assume that the distributions of the training and testing data are the same and (2) the variability of working conditions and the diversity of failures often lead to insufficient labeled target fault data. Recently, these two problems have garnered considerable attention and have been further investigated by some researchers. An et al. [5] proposed a novel three-layer model inspired by a recurrent neural network (RNN) and TL for REB fault diagnosis under different working conditions. Ma et al. [9], aiming at overcoming the first limitation, proposed a transfer diagnosis framework based on domain adaptation for bearing fault diagnosis across diverse domains. In the work by Gao et al. [46], the finite element method (FEM) was employed to simulate samples with different faults to compensate for the lack of sufficient and complete fault samples. In a study by Liu et al. [47], aiming at the problem of the faulty samples of real-world running mechanical systems being difficult to obtain, a personalized fault diagnosis method for the detection of bearing faults was proposed for the activation of smart-sensor networks using FEM simulations. Some existing research has shown that TL [48] has broad application prospects and wide applicability in various fields [49–51]. Feature-based transfer, a mainstream branch of TL technology, has been used in image classification [52–54] and has inspired a novel idea for overcoming the two limitations of data-driven intelligent fault diagnosis. In this paper, the use of a novel feature extraction procedure, namely, transferable sensitive feature selection via ReliefF and the sum of mean deviation (TSFRS), is proposed. 
TSFRS has the following two aspects: (1) it is characterized by the selection of fault-sensitive features and combines the ReliefF algorithm and the sum of within-class mean deviations (SMD) of feature data; (2) a feature-based TL method, namely, transfer component analysis (TCA), is used to reduce the differences between the marginal distributions of the training and testing data.
After the steps of signal processing and feature extraction, a highdimensional feature set can usually be generated; if this feature set is used directly in fault pattern recognition, it will lead to very high computational complexity and the degradation of fault diagnosis accuracy [42]. Hence, dimensionality reduction is another key step that must be taken before fault pattern recognition. In fact, the dimensionality reduction of features can not only limit storage requirements and increase the algorithm speed, but can also improve the predictive accuracy of the classifier model by removing noisy and redundant features while retaining the most useful information regarding diverse bearing failures [55]. Dimensionality reduction methods can be classified into either linear or nonlinear methods. Principal component analysis (PCA) and linear discriminant analysis (LDA), as two classical linear dimensionality reduction methods, have been extensively used for linear data, but they may be invalid for nonlinear data [56]. Therefore, some nonlinear dimensionality reduction methods, namely, kernel principal components analysis (KPCA), Isomap, Laplacian eigenmaps (LE), and local linear embedding (LLE), among others, have presented valid solutions for the dimensionality reduction of nonlinear data [56]. However, nonlinear dimensionality reduction methods have some limitations in practical applications, such as the problem of “outofsample” that has no explicit mapping matrix [57], the problem of the overlearning of locality [58], and high computational complexity. In recent years, some unsupervised manifold learning methods that preserve the local geometric structure on the data manifold using the linear approximation of the nonlinear mappings have been proposed, and some representative methods include localitypreserving projections (LPP) [59], neighborhoodpreserving embedding (NPE) [60], and orthogonal neighborhoodpreserving projection [61]. 
Among these manifold learning methods, LPP has attracted attention in the fault diagnosis field [62–64], but it does not utilize the label information in dimensionality reduction. LDA is a supervised dimensionality reduction method that considers the label information in feature reduction, but it cannot be directly applied when the withinclass and betweenclass scatter matrixes are singular because of the small sample size (SSS) problem [65]. Based on the respective dominant attributes of LPP and LDA, a novel dimensionality reduction method, namely, local Fisher discriminant analysis (LFDA), was proposed by Sugiyama [66]. LFDA takes into account the label information of data while simultaneously preserving the local geometric structures of the feature data. However, LFDA only considers the neighbor relationships between samples of the same class while ignoring those between samples of different classes. Aiming at the alleviation of the SSS problem of LDA, the maximum margin criterion (MMC), a supervised dimensionality reduction method, was proposed [65]. Inspired by the attributes of LFDA and MMC, this paper proposes a novel feature reduction method, namely, the local maximum margin criterion (LMMC), an improved MMC in which both the neighbor relationships between samples of the same class and those between samples of different classes are considered.
Therefore, the contributions of this paper are summarized as follows. To solve the problem of fault diagnosis via vibration data that are variably distributed under different working conditions, a novel intelligent fault diagnosis framework of REBs based on multidomain features that systematically combine statistical feature extraction, featurebased TL, feature reduction, and pattern recognition is proposed. TSFRS, a novel feature extraction procedure, is proposed for the selection of the transferable faultsensitive statistical features as the basis of the subsequent fault analysis. LMMC, an improved feature reduction method, is proposed for the excavation of abundant and valuable information with low dimensionality, which is beneficial for fault diagnosis. The execution of the proposed fault diagnosis framework of REBs is divided into four steps, namely, signal processing, feature extraction, feature reduction, and fault pattern recognition. First, DTCWPT is performed on raw vibration signals collected from REBs, and different terminal nodes can be obtained. Multidomain statistical features are then extracted from the reconstructed signals of the terminal nodes to construct the original feature set. Secondly, based on the ReliefF algorithm and mean deviation, a new evaluation index, namely, the ratio of the feature weight value and the SMD, is employed to indicate the sensitivity of statistical features; the most sensitive features can be selected to form a feature subset that represents the fault peculiarity of REBs. Additionally, TCA is used to reduce the differences between the marginal distributions of feature datasets under different working conditions. Thirdly, LMMC is performed on the original highdimensional feature set to acquire a new lowerdimensional projection of it. 
Finally, vibration signals collected from two test rigs under different working conditions are employed to validate the effectiveness, adaptability, and superiority of the proposed method for the identification and classification of REB faults. The first test rig is from Case Western Reserve University, on which two cases with 12 fault types under different motor loads of 2 hp and 3 hp are employed for validation experiments. The second test rig is an SQI-MFS test rig, on which two cases with 10 fault types under different motor speeds of 1200 rpm and 1800 rpm are employed to further verify the adaptability of the proposed method.
The remainder of this paper is organized as follows. In Section 2, the theoretical backgrounds of the DTCWPT technique, TCA technique, and MMC are summarized. In Section 3, a description of the proposed diagnosis technique is provided, and the fault diagnosis framework of REBs is illustrated. In Section 4, REB fault vibration signals collected from two experimental test rigs are investigated to verify the performance of the proposed method. Finally, the conclusion of this work is presented in Section 5. Some acronyms used in this paper are presented in Table 1.

2. Theoretical Background
2.1. Dual-Tree Complex Wavelet Packet Transform (DTCWPT)
DTCWT, an enhancement of DWT, is characterized by some important properties, including near shift-invariance and the suppression of frequency aliasing components [36]. However, DTCWT cannot be used for multiresolution analysis in the high-frequency band, where useful fault feature information usually exists [67]. To address this limitation, DTCWPT, which is composed of two parallel discrete wavelet packet transforms with different low-pass and high-pass filters, provides a more precise frequency band partition over the entire analyzed frequency band [36, 40]. DTCWPT is divided into real-part and imaginary-part wavelet packet transforms, which can be regarded as the real and imaginary trees, respectively. The real tree decomposition and the corresponding coefficients can be expressed as follows [37]:

$$d^{\mathrm{Re}}_{l+1,2N}(k) = \sum_{m} h_0(m-2k)\, d^{\mathrm{Re}}_{l,N}(m), \qquad d^{\mathrm{Re}}_{l+1,2N+1}(k) = \sum_{m} h_1(m-2k)\, d^{\mathrm{Re}}_{l,N}(m), \tag{1}$$

where $d^{\mathrm{Re}}_{l,N}$ denotes the coefficients in the real tree at scale l and node N, and $h_0$ and $h_1$ are the low-pass filter and the high-pass filter, respectively. The imaginary tree decomposition and the corresponding coefficients can be expressed as follows:

$$d^{\mathrm{Im}}_{l+1,2N}(k) = \sum_{m} g_0(m-2k)\, d^{\mathrm{Im}}_{l,N}(m), \qquad d^{\mathrm{Im}}_{l+1,2N+1}(k) = \sum_{m} g_1(m-2k)\, d^{\mathrm{Im}}_{l,N}(m), \tag{2}$$

where $d^{\mathrm{Im}}_{l,N}$ denotes the coefficients in the imaginary tree at scale l and node N, and $g_0$ and $g_1$ are the low-pass filter and the high-pass filter, respectively. When the scale l is 0, the coefficients are both equal to the original signal $x(t)$, namely, $d^{\mathrm{Re}}_{0,0} = d^{\mathrm{Im}}_{0,0} = x(t)$. The decomposition coefficients of DTCWPT are composed of $d^{\mathrm{Re}}_{l,N}$ and $d^{\mathrm{Im}}_{l,N}$ and can be expressed as follows:

$$d_{l,N}(k) = d^{\mathrm{Re}}_{l,N}(k) + i\, d^{\mathrm{Im}}_{l,N}(k). \tag{3}$$
The reconstruction procedure of DTCWPT is as follows:where and are the wavelet packet reconstruction filters of the real and imaginary trees, respectively. DTCWPT is characterized by two prominent advantages: (1) it is beneficial to the detection of multiple harmonic signals and (2) it can help to extract the periodic impact features of signals. Therefore, in this work, DTCWPT is used to process the original vibration signals, and the corresponding singlebranch reconstruction signals of the terminal nodes are used to extract original features.
2.2. Maximum Margin Criterion (MMC) and Local Fisher Discriminant Analysis (LFDA)
Linear discriminant analysis (LDA), one of the most popular methods for dimension reduction in statistics research fields [68, 69], was proposed by Fisher [70] for the dimension reduction of binary classification problems and was further extended to multiclass cases by Rao [71]. However, LDA cannot be directly applied when the within-class and between-class scatter matrices are singular because of the SSS problem [65]. To address this drawback, Li et al. [65] and Song et al. [72] used the difference between the between-class and within-class scatter matrices as a discriminant criterion called the maximum margin criterion (MMC), which avoids the need to compute a matrix inverse. Thus, the SSS problem in traditional LDA is alleviated.
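Let $X = \{x_1, x_2, \ldots, x_N\}$ be the input data set and $Y = \{y_1, y_2, \ldots, y_N\}$, $y_i \in \{1, \ldots, c\}$, be the associated class label set, where $x_i \in \mathbb{R}^M$ is an M-dimensional sample, N is the number of samples, and c is the total number of classes. To reduce the dimensionality of a sample $x_i$, some measure is needed to assess similarity or dissimilarity. We want to find a linear transformation $W \in \mathbb{R}^{M \times d}$ that maps x from $\mathbb{R}^M$ to $\mathbb{R}^d$, where $d < M$. After the dimensionality reduction, the similarity or dissimilarity information should be preserved as much as possible. In the work by Li et al. [65], the Euclidean distance was applied to measure the dissimilarity, and the objective of the MMC is for a sample to be close to those in the same class but far from those in different classes. Thus, the MMC can be presented as follows:

$$J = \frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i\, p_j\, d(C_i, C_j), \tag{5}$$

where $p_i$ and $p_j$ are the prior probabilities of the classes $C_i$ and $C_j$, respectively. The distance $d(C_i, C_j)$ is defined as the distance between mean vectors, that is,

$$d(C_i, C_j) = d(m_i, m_j), \tag{6}$$

where $m_i$ and $m_j$ are the mean vectors of the classes $C_i$ and $C_j$, respectively. However, because (6) neglects the scatter of classes, it is not suitable: even if $d(m_i, m_j)$ is large, it is not easy to separate two classes that have a large spread and overlap with each other. To address this problem, the between-class distance can be redefined to take the scatter of classes into account:

$$d(C_i, C_j) = d(m_i, m_j) - s(C_i) - s(C_j), \tag{7}$$

where $s(C_i)$ is some measure of the scatter of the class $C_i$. The generalized variance or the overall variance trace, $\mathrm{tr}(S_i)$, is usually used to measure the scatter of data, where $S_i$ is the covariance matrix of the class $C_i$. Thus, based on (5) and (7), two new parts can be obtained by decomposing (5):

$$J = \frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i\, p_j\, d(m_i, m_j) - \frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i\, p_j \left[ \mathrm{tr}(S_i) + \mathrm{tr}(S_j) \right]. \tag{8}$$

By employing the Euclidean distance, the first part in (8) can be simplified as

$$\frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i\, p_j\, d(m_i, m_j) = \frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i\, p_j\, (m_i - m_j)^{T} (m_i - m_j). \tag{9}$$

Because $\sum_{j=1}^{c} p_j = 1$ and $\sum_{j=1}^{c} p_j (m - m_j) = 0$, where $m = \sum_{i=1}^{c} p_i m_i$ is the overall mean vector, (9) can be further simplified as

$$\frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i\, p_j\, (m_i - m_j)^{T} (m_i - m_j) = \sum_{i=1}^{c} p_i\, (m_i - m)^{T} (m_i - m) = \mathrm{tr}(S_b). \tag{10}$$

The second part in (8) can be simplified as

$$\frac{1}{2} \sum_{i=1}^{c} \sum_{j=1}^{c} p_i\, p_j \left[ \mathrm{tr}(S_i) + \mathrm{tr}(S_j) \right] = \sum_{i=1}^{c} p_i\, \mathrm{tr}(S_i) = \mathrm{tr}(S_w). \tag{11}$$

Equation (5) can be transformed to

$$J = \mathrm{tr}(S_b - S_w), \tag{12}$$

where $S_b = \sum_{i=1}^{c} p_i (m_i - m)(m_i - m)^{T}$ is the between-class scatter matrix, $S_w = \sum_{i=1}^{c} p_i S_i$ is the within-class scatter matrix, and $\mathrm{tr}(S_b)$ measures the between-class separation, while $\mathrm{tr}(S_w)$ measures the within-class cohesion.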
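The criterion in (12) can be sketched in a few lines of NumPy. The sketch below is illustrative only: the function name `mmc_projection` is hypothetical, class priors are assumed to be estimated from class frequencies, and the projection is taken as the leading eigenvectors of the difference between the between-class and within-class scatter matrices, which is the usual way this trace criterion is maximized under an orthonormality constraint.

```python
import numpy as np

def mmc_projection(X, y, n_components):
    """X: (n_samples, n_features), y: class labels.
    Returns a (n_features, n_components) projection matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    mean_all = X.mean(axis=0)
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        prior = Xc.shape[0] / n                     # p_i estimated from class frequency
        mean_c = Xc.mean(axis=0)
        diff = (mean_c - mean_all)[:, None]
        S_b += prior * (diff @ diff.T)              # between-class scatter
        centered = Xc - mean_c
        S_w += prior * (centered.T @ centered) / Xc.shape[0]  # p_i * class covariance
    # S_b - S_w is symmetric, so eigh applies and no matrix inverse is needed
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
y_demo = np.array([0, 0, 1, 1])
W_demo = mmc_projection(X_demo, y_demo, n_components=1)
```

In this toy example the two classes differ only along the first axis, so the selected direction aligns with that axis. Because no inverse of the within-class scatter matrix is formed, the computation remains well defined even when that matrix is singular.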
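Local Fisher discriminant analysis (LFDA), a linear supervised dimensionality reduction method, was proposed by Sugiyama [66]. LFDA not only maximizes between-class separability while preserving the within-class local manifold structure in a reduced dimensional space, but also inherits an excellent property from LDA; that is, it has an analytic form of the embedding matrix, and the solution can be easily computed by solving a generalized eigenvalue problem [66]. LFDA and LDA share the same optimization framework. Furthermore, LFDA incorporates local information into the definition of the weights. The objective of LDA is to maximize the ratio of the between-class scatter matrix $S_b$ to the within-class scatter matrix $S_w$:

$$T_{\mathrm{LDA}} = \arg\max_{T}\; \mathrm{tr}\left( (T^{T} S_w T)^{-1}\, T^{T} S_b T \right),$$

where $T$ is a projection matrix, and the definitions of $S_w$ and $S_b$ are as follows:

$$S_w = \sum_{l=1}^{c} \sum_{i:\, y_i = l} (x_i - \mu_l)(x_i - \mu_l)^{T}, \qquad S_b = \sum_{l=1}^{c} n_l\, (\mu_l - \mu)(\mu_l - \mu)^{T},$$

where $n_l$ is the number of samples in class l, $\mu_l$ is the mean of the samples in class l, and $\mu$ is the mean of all samples:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

where n is the number of samples. According to the literature [66, 73], $S_w$ and $S_b$ also have the equivalent form

$$S_w = X (D^{w} - W^{w}) X^{T}, \qquad S_b = X (D^{b} - W^{b}) X^{T},$$

where $X = [x_1, \ldots, x_n]$, $W^{w}$ and $W^{b}$ are the weight matrices, $D^{w}$ and $D^{b}$ are the diagonal matrices, $D^{w}_{ii}$ is the ith diagonal element of $D^{w}$ and equals the sum of the elements of the ith row of $W^{w}$, and $D^{b}_{ii}$ is the ith diagonal element of $D^{b}$ and equals the sum of the elements of the ith row of $W^{b}$. For LDA, $W^{w}_{ij} = 1/n_l$ if $y_i = y_j = l$ and 0 otherwise, and $W^{b}_{ij} = 1/n - 1/n_l$ if $y_i = y_j = l$ and $1/n$ otherwise.
LFDA incorporates local information into the definition of the weights. Thus, $W^{w}$ is replaced by $\tilde{W}^{w}$ and $W^{b}$ is replaced by $\tilde{W}^{b}$. The local scatter matrices $\tilde{S}_w$ and $\tilde{S}_b$ are presented as follows [66]:

$$\tilde{S}_w = X (\tilde{D}^{w} - \tilde{W}^{w}) X^{T}, \qquad \tilde{S}_b = X (\tilde{D}^{b} - \tilde{W}^{b}) X^{T},$$

where

$$\tilde{W}^{w}_{ij} = \begin{cases} A_{ij} / n_l, & y_i = y_j = l, \\ 0, & y_i \neq y_j, \end{cases} \qquad \tilde{W}^{b}_{ij} = \begin{cases} A_{ij} \left( 1/n - 1/n_l \right), & y_i = y_j = l, \\ 1/n, & y_i \neq y_j, \end{cases}$$

where $A_{ij}$ can be defined as follows:

$$A_{ij} = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{\sigma_i \sigma_j} \right), \tag{21}$$

where $\sigma_i$ is the local scaling around $x_i$, defined by $\sigma_i = \lVert x_i - x_i^{(k)} \rVert$, and $x_i^{(k)}$ is the kth nearest neighbor of $x_i$. If $x_i$ and $x_j$ are close to each other in the feature space, $A_{ij}$ is large; otherwise, it is small [66]. According to (21), sample pairs in the same class that are far apart receive small weights and therefore have less influence on $\tilde{S}_w$ and $\tilde{S}_b$. Furthermore, sample pairs in different classes are not weighted by the affinity [74].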
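To make the locality weighting concrete, the sketch below computes a local-scaling affinity matrix and the corresponding LFDA weight matrices in the form given by Sugiyama [66]. The function names `local_scaling_affinity` and `lfda_weights` are hypothetical, the neighbor index `k` is a free parameter, and the sketch assumes that no sample has k or more exact duplicates (otherwise a local scale would be zero).

```python
import numpy as np

def local_scaling_affinity(X, k=7):
    """Affinity A_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from x_i to its k-th nearest neighbor."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    dist = np.sqrt(d2)
    # After sorting each row, column 0 is the sample itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    sigma = np.sort(dist, axis=1)[:, k]
    return np.exp(-d2 / np.outer(sigma, sigma))

def lfda_weights(A, y):
    """Within-class and between-class weight matrices of LFDA."""
    y = np.asarray(y)
    n = y.shape[0]
    same = y[:, None] == y[None, :]
    n_l = np.array([np.sum(y == label) for label in y], dtype=float)  # class size per sample
    W_w = np.where(same, A / n_l[:, None], 0.0)
    W_b = np.where(same, A * (1.0 / n - 1.0 / n_l[:, None]), 1.0 / n)
    return W_w, W_b

X_demo = np.array([[0.0], [1.0], [3.0]])
y_demo = np.array([0, 0, 1])
A_demo = local_scaling_affinity(X_demo, k=1)
W_w_demo, W_b_demo = lfda_weights(A_demo, y_demo)
```

Note that pairs from different classes receive the constant weight 1/n regardless of their distance, which is the behavior that the LMMC proposed in this paper sets out to change.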
2.3. Transfer Component Analysis (TCA)
Transfer component analysis (TCA) [75] is a typical featurebased TL method. Given the source domain data that are the training dataset with corresponding labels and the target domain data that are the dataset without corresponding labels, TCA aims to reduce the difference between the marginal distributions of the different datasets by leveraging the transferable features or knowledge from the source domain [75].
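A domain $\mathcal{D}$ consists of a D-dimensional feature space $\mathcal{X}$, whose marginal probability distribution is $P(X)$, where $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$ is a training dataset, and the representation of the domain can be written as $\mathcal{D} = \{\mathcal{X}, P(X)\}$. A task $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$ consists of a label space $\mathcal{Y}$ and a predictive function $f(\cdot)$, where $y \in \mathcal{Y}$ is a training dataset label and $f(x) = P(y \mid x)$ represents the conditional probability distribution. There are two learning tasks, namely, task $\mathcal{T}_S$ of the source domain $\mathcal{D}_S$ and task $\mathcal{T}_T$ of the target domain $\mathcal{D}_T$. Feature transfer is employed to facilitate the learning process of the target predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ by using the knowledge and information in $\mathcal{D}_S$ and $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$ [75]. Given two datasets $X_S$ and $X_T$ with $P(X_S) \neq P(X_T)$, a transformation $\phi$ is assumed to exist such that $P(\phi(X_S)) \approx P(\phi(X_T))$ and $P(Y_S \mid \phi(X_S)) \approx P(Y_T \mid \phi(X_T))$, where $\phi$ is a nonlinear mapping function into a reproducing kernel Hilbert space $\mathcal{H}$. The learning objective of TCA is to find a domain-invariant feature space in which the marginal distribution distance between the source domain and the target domain is minimized. The distribution distance is measured using the maximum mean discrepancy (MMD) criterion, which is defined as follows [76]:

$$\mathrm{Dist}(X_S, X_T) = \left\lVert \frac{1}{n_1} \sum_{i=1}^{n_1} \phi(x_{S_i}) - \frac{1}{n_2} \sum_{j=1}^{n_2} \phi(x_{T_j}) \right\rVert_{\mathcal{H}}^{2} = \mathrm{tr}(KL), \tag{22}$$

where

$$K = \begin{bmatrix} K_{S,S} & K_{S,T} \\ K_{T,S} & K_{T,T} \end{bmatrix}. \tag{23}$$

In equation (22), $n_1$ and $n_2$ represent the numbers of source domain samples and target domain samples, respectively, $\mathrm{tr}(\cdot)$ represents the trace of the matrix, K is a kernel matrix, and $K_{S,S}$, $K_{S,T}$, and $K_{T,T}$ are the kernel matrices in the source domain, cross domain, and target domain, respectively. L can be calculated as

$$L_{ij} = \begin{cases} 1/n_1^{2}, & x_i, x_j \in X_S, \\ 1/n_2^{2}, & x_i, x_j \in X_T, \\ -1/(n_1 n_2), & \text{otherwise}. \end{cases} \tag{24}$$

TCA maps the features of the two domain datasets into the same kernel space through a unified kernel function. The resultant kernel matrix can be calculated as follows:

$$\tilde{K} = (K K^{-1/2} \tilde{W})(\tilde{W}^{T} K^{-1/2} K) = K W W^{T} K, \tag{25}$$

where $W = K^{-1/2} \tilde{W}$, and the distribution distance between the different domain datasets can be defined as

$$\mathrm{Dist}(X_S', X_T') = \mathrm{tr}\left( (K W W^{T} K) L \right) = \mathrm{tr}(W^{T} K L K W). \tag{26}$$

The complexity of W needs to be controlled by a regularization term $\mathrm{tr}(W^{T} W)$, which is employed to avoid the rank deficiency of the denominator. Thus, the objective function of TCA can be rewritten as

$$\min_{W}\; \mathrm{tr}(W^{T} K L K W) + \mu\, \mathrm{tr}(W^{T} W) \quad \text{s.t.} \quad W^{T} K H K W = I_m, \tag{27}$$

where $\mu$ is a tradeoff parameter that guarantees that the optimization objective is well defined, $I_m \in \mathbb{R}^{m \times m}$ represents an identity matrix, and $H = I_{n_1+n_2} - \frac{1}{n_1+n_2} \mathbf{1}\mathbf{1}^{T}$ is a centering matrix. The constraint $W^{T} K H K W = I_m$ avoids the trivial solution $W = 0$.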
In summary, the optimization objective of TCA is a latent space, spanned by the learned components, that preserves the variance of the data while minimizing the distance between the marginal distributions of the different domain datasets as much as possible. The optimization problem of equation (27) can be solved efficiently as a trace optimization problem.
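A compact sketch of TCA is given below for orientation. It assumes a linear kernel for simplicity (the choice of kernel is not fixed by the description above), and it uses the closed-form solution reported in the TCA literature [75], in which the transformation matrix is formed from the leading eigenvectors of $(KLK + \mu I)^{-1} KHK$. The function name `tca` and its default parameter values are hypothetical.

```python
import numpy as np

def tca(Xs, Xt, n_components=2, mu=1.0):
    """Linear-kernel TCA sketch. Returns the embedded source and target samples."""
    Xs = np.asarray(Xs, dtype=float)
    Xt = np.asarray(Xt, dtype=float)
    n1, n2 = Xs.shape[0], Xt.shape[0]
    n = n1 + n2
    X = np.vstack([Xs, Xt])
    K = X @ X.T                                   # linear kernel over both domains
    e = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])
    L = np.outer(e, e)                            # MMD coefficient matrix
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    # Leading eigenvectors of (K L K + mu I)^{-1} K H K give the transformation W
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real
    Z = K @ W                                     # embedded samples, one row per sample
    return Z[:n1], Z[n1:]

rng = np.random.default_rng(0)
Xs_demo = rng.normal(0.0, 1.0, size=(5, 3))
Xt_demo = rng.normal(2.0, 1.0, size=(4, 3))
Zs_demo, Zt_demo = tca(Xs_demo, Xt_demo, n_components=2, mu=1.0)
```

The embedded source samples would then be used to train a classifier, and the embedded target samples would be passed to it for prediction. The tradeoff parameter and the kernel would normally be tuned for the data at hand.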
3. Proposed Method and System Framework
3.1. Feature Extraction Procedure TSFRS (Transferable Sensitive Feature Selection by ReliefF and the Sum of Mean Deviation)
TSFRS has two components: (1) the selection of fault-sensitive features, which combines the ReliefF algorithm and the sum of within-class mean deviations (SMD) of feature data, and (2) the feature-based TL method, in which TCA is used to reduce the difference in the marginal distributions between the training and testing data.
In this paper, it is suggested that the sensitive statistical features be selected before the implementation of fault pattern recognition. Thus, the ReliefF algorithm [77] and the mean deviation (MD) are applied to a dataset that includes different statistical features under the various REB conditions. Each type of statistical feature is evaluated by the ReliefF algorithm to determine its weight value (WV). ReliefF, a supervised algorithm for feature ranking, is usually applied in data preprocessing as a feature subset selection method. The basic concept of ReliefF is to sample instances at random, find their nearest neighbors, and adjust a feature weighting vector to give more weight to features that discriminate the instances from the neighbors of different classes. For each kind of statistical feature, the MD of the feature data samples in each REB condition can be calculated, and the sum of the MD over all REB conditions can be further calculated. For the evaluation of each statistical feature, the higher the WV, the greater the discriminative degree of the feature between classes; the lower the value of MD, the greater the within-class cohesion of the feature. Therefore, the ratio of WV to SMD is selected to indicate the sensitivity of a statistical feature, based on which the sensitive feature subset can be selected from the original feature set.
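The weighting idea behind ReliefF can be sketched as follows. This is a simplified illustration rather than the reference implementation of [77]: the function name `relieff_weights` is hypothetical, features are min-max scaled, neighbors are found with the Manhattan distance, and the per-class contributions are averaged over however many neighbors are available.

```python
import numpy as np

def relieff_weights(X, y, n_neighbors=3, n_iterations=None, seed=0):
    """Simplified ReliefF: returns one weight per feature (higher = more discriminative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                          # guard against constant features
    Xn = (X - X.min(axis=0)) / span                # scale every feature to [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n))
    m = n if n_iterations is None else n_iterations
    rng = np.random.default_rng(seed)
    weights = np.zeros(d)
    for idx in rng.choice(n, size=m, replace=(m > n)):
        diffs = np.abs(Xn - Xn[idx])               # per-feature differences to all samples
        dist = diffs.sum(axis=1)                   # Manhattan distance
        for c in classes:
            members = np.flatnonzero(y == c)
            members = members[members != idx]      # never use the instance itself
            nearest = members[np.argsort(dist[members])[:n_neighbors]]
            if nearest.size == 0:
                continue
            contrib = diffs[nearest].mean(axis=0)
            if c == y[idx]:
                weights -= contrib / m             # nearest hits: penalize differences
            else:                                  # nearest misses: reward differences
                weights += priors[c] / (1.0 - priors[y[idx]]) * contrib / m
    return weights

X_demo = np.array([[0.0, 0.3], [0.1, 0.9], [0.05, 0.5],
                   [1.0, 0.4], [0.9, 0.8], [0.95, 0.1]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
wv_demo = relieff_weights(X_demo, y_demo, n_neighbors=2)
```

In the toy data, only the first feature separates the two classes, so it receives the larger weight.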
Furthermore, the variable working conditions of REBs in industry scenarios can lead to distribution differences between the training and testing data [6]. Therefore, after the construction of the sensitive feature subset, TCA is employed to reduce the difference between the distributions of the sensitive feature training and testing subsets. The description of TSFRS is summarized as the following steps.
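Step 1. In the training samples, there are M types of REB faults, N vibration signal samples in each type of REB fault pattern, and K types of statistical features. Via the processing of the vibration signals, K original feature sets $F_1, \ldots, F_K$ can be obtained, where $F_k$ can be expressed by

$$F_k = \begin{bmatrix} f^{k}_{1,1} & f^{k}_{1,2} & \cdots & f^{k}_{1,N} \\ f^{k}_{2,1} & f^{k}_{2,2} & \cdots & f^{k}_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ f^{k}_{M,1} & f^{k}_{M,2} & \cdots & f^{k}_{M,N} \end{bmatrix},$$

where $f^{k}_{i,j}$ is the kth statistical feature of the jth sample in the ith type of REB fault. Next, $F_k$ can be evaluated to obtain the corresponding feature weight value $\mathrm{WV}_k$ using the ReliefF algorithm, and the WV of each statistical feature can be used to evaluate the distinguishability of the feature. The higher the WV, the greater the discriminative degree of the feature between classes.
Step 2. The MD of the feature samples of a type of statistical feature in each type of REB condition is calculated, i.e., the MD of the elements of row i of $F_k$. Therefore, an MD set, $\{\mathrm{MD}_{i,k}\}$, can be obtained, where $\mathrm{MD}_{i,k}$ can be expressed by

$$\mathrm{MD}_{i,k} = \frac{1}{N} \sum_{j=1}^{N} \left| f^{k}_{i,j} - \bar{f}^{k}_{i} \right|,$$

where

$$\bar{f}^{k}_{i} = \frac{1}{N} \sum_{j=1}^{N} f^{k}_{i,j}.$$

Next, $\mathrm{SMD}_k$ can be obtained, that is, the sum of the MD of the feature samples of the kth statistical feature over all REB conditions; $\mathrm{SMD}_k$ can be expressed by

$$\mathrm{SMD}_k = \sum_{i=1}^{M} \mathrm{MD}_{i,k}.$$

In this paper, it is presumed that the MD can be used to express the cohesion of data. Thus, there is a mean deviation sequence $\{\mathrm{SMD}_1, \ldots, \mathrm{SMD}_K\}$, which becomes another evaluation index for sensitive feature selection. The lower the value of $\mathrm{SMD}_k$, the greater the within-class cohesion of the feature.
Step 3. A new sequence $\{\mathrm{WSD}_1, \ldots, \mathrm{WSD}_K\}$ is obtained, where the definition of $\mathrm{WSD}_k$ is as follows:

$$\mathrm{WSD}_k = \frac{\mathrm{WV}_k}{\mathrm{SMD}_k}.$$

In this paper, it is presumed that the greater the value of $\mathrm{WSD}_k$, the better the fault sensitivity of the corresponding feature. Therefore, the sorted ratio sequence of WV and SMD can be obtained by sorting the WSD values in descending order.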
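Steps 2 and 3 can be sketched as follows, assuming that the MD is the mean absolute deviation from the class mean and that the feature weights have already been computed (for instance by ReliefF). The function names `sum_of_mean_deviations` and `rank_by_wsd`, and the small constant `eps` that guards against division by zero, are hypothetical additions for illustration.

```python
import numpy as np

def sum_of_mean_deviations(X, y):
    """SMD per feature: sum over classes of the within-class mean absolute deviation.
    X: (n_samples, n_features), y: class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    smd = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        smd += np.mean(np.abs(Xc - Xc.mean(axis=0)), axis=0)
    return smd

def rank_by_wsd(weights, smd, eps=1e-12):
    """Return feature indices sorted by descending WV/SMD ratio, and the ratios."""
    wsd = np.asarray(weights, dtype=float) / (np.asarray(smd, dtype=float) + eps)
    order = np.argsort(wsd)[::-1]
    return order, wsd

X_demo = np.array([[1.0, 0.0], [3.0, 4.0], [5.0, 10.0], [9.0, 12.0]])
y_demo = np.array([0, 0, 1, 1])
smd_demo = sum_of_mean_deviations(X_demo, y_demo)
order_demo, wsd_demo = rank_by_wsd([0.6, 0.3], smd_demo)
```

The leading entries of the returned ordering would then define the sensitive feature subset; the number of features to keep is a design choice that this sketch leaves open.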
Step 4. For labeled training data under one working condition and unlabeled testing data under another working condition, the WSD sequence, sorted in descending order, is computed from the training data and is used to select the most sensitive statistical features, which construct a sensitive feature set (SFS). The same statistical features are then extracted directly from the testing data. Thus, two sensitive feature sets are obtained: the SFS of the training data and the SFS of the testing data. Both are used as the input of TCA, which generates a new feature set in which the difference between the marginal distributions of the training and testing SFSs is minimized.
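Steps 1–4 can be illustrated with a minimal sketch. It assumes that the ReliefF weights (WV) have already been computed and are passed in as an array, and that the MD is the mean absolute deviation from the class mean; the function name `wsd_ranking` and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def wsd_ranking(features, labels, wv):
    """Rank statistical features by WSD = WV / SMD, in descending order.

    features: array (n_samples, K), one column per statistical feature
    labels:   array (n_samples,), fault-type label of every sample
    wv:       array (K,), ReliefF weights (assumed to be precomputed)
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    wv = np.asarray(wv, dtype=float)
    smd = np.zeros(features.shape[1])
    for fault_type in np.unique(labels):
        block = features[labels == fault_type]      # samples of one fault type
        # Assumption: MD is the mean absolute deviation within the fault type.
        md = np.mean(np.abs(block - block.mean(axis=0)), axis=0)
        smd += md                                   # sum of MD over fault types
    wsd = wv / smd                                  # assumes every SMD is > 0
    order = np.argsort(-wsd, kind="stable")         # feature indices, best first
    return wsd, order

# Toy example: 2 fault types, 2 samples each, K = 2 features.
features = np.array([[1.0, 5.0],
                     [1.2, 9.0],
                     [3.0, 4.0],
                     [3.2, 8.0]])
labels = np.array([0, 0, 1, 1])
wv = np.array([0.8, 0.1])                           # hypothetical ReliefF weights

wsd, order = wsd_ranking(features, labels, wv)
sfn = 1                                             # number of selected features
selected = order[:sfn]                              # reuse these columns on test data
```

In this toy case the first feature has both a larger weight and a smaller spread inside each fault type, so it is ranked first. The column indices in `selected` would be applied unchanged to the testing feature matrix before TCA.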
3.2. Local Maximum Margin Criterion (LMMC)
Although the MMC can avoid the SSS problem of LDA, it may be invalid for nonlinear datasets because it does not consider the local structure of the dataset. LFDA considers the neighbor relationships between samples of the same class but ignores those between samples of different classes. To address this problem, and inspired by the attributes of the MMC and LFDA, this paper proposes a novel feature reduction method, LMMC, which is an improved MMC. The LMMC inherits the merits of the MMC and LFDA. The underlying idea is to integrate the optimization objective of LFDA into the MMC and, in addition, to take the neighbor relationships between samples of different classes into consideration.
Based on the descriptions of the MMC and LFDA provided in Section 2, the optimization objective of the LMMC can be obtained by combining the optimization objectives of the MMC and LFDA. In addition, the LMMC improves how local information enters the definition of the weights. The LMMC and MMC have the same optimization framework, but the scatter matrices $S_b$ and $S_w$ are, respectively, replaced by their local counterparts $S_{lb}$ and $S_{lw}$. The objective function can be presented as follows:
$$J = \operatorname{tr}\left(S_{lb} - S_{lw}\right). \tag{30}$$
According to equations (16) and (17), the local $S_b$ and the local $S_w$ are defined in terms of two weight matrices, one between-class and one within-class, together with the corresponding diagonal matrices. In the weight definitions, the between-class neighbor relation means that j is the nearest neighbor of i and that the two samples belong to different classes.
According to equations (30)–(37), the local structure of the dataset, including the neighbor relationships between samples of the same class and those between samples of different classes, can be incorporated into the dimensionality reduction by changing the weight values.
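Let $W \in \mathbb{R}^{d \times L}$ be a linear transformation that maps the high-dimensional dataset from $\mathbb{R}^{d}$ to $\mathbb{R}^{L}$, where $L < d$. Thus, in the lower-dimensional space, the local between-class and local within-class scatter matrices, $S_{lb}$ and $S_{lw}$, respectively, become $W^{T} S_{lb} W$ and $W^{T} S_{lw} W$, where W can be determined by maximizing
$$J(W) = \operatorname{tr}\left(W^{T}\left(S_{lb} - S_{lw}\right)W\right). \tag{38}$$
It is assumed that W is constituted by unit vectors, that is, $W = [w_1, w_2, \ldots, w_L]$ and $w_k^{T} w_k = 1$, $k = 1, 2, \ldots, L$. Thus, W can be obtained by solving the following constrained optimization:
$$\max \sum_{k=1}^{L} w_k^{T}\left(S_{lb} - S_{lw}\right)w_k \quad \text{s.t.} \quad w_k^{T} w_k = 1, \; k = 1, 2, \ldots, L. \tag{39}$$
The above constrained optimization can be transformed to the eigenvalue problem:
$$\left(S_{lb} - S_{lw}\right) w_k = \lambda_k w_k, \quad k = 1, 2, \ldots, L. \tag{40}$$
Thus,
$$J(W) = \sum_{k=1}^{L} w_k^{T}\left(S_{lb} - S_{lw}\right)w_k = \sum_{k=1}^{L} \lambda_k w_k^{T} w_k = \sum_{k=1}^{L} \lambda_k. \tag{41}$$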
According to equation (41), W is composed of the eigenvectors of the difference between the local between-class and local within-class scatter matrices that correspond to the first L largest nonnegative eigenvalues. Finally, by applying the LMMC, low-dimensional feature matrices of the training and testing datasets can be obtained that carry more sensitive and less redundant information for REB fault diagnosis.
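The eigenvalue step can be sketched as follows. The sketch takes the two local scatter matrices as given (their weight definitions are not reproduced here), so `s_lb` and `s_lw` below are hypothetical placeholder matrices, and `mmc_projection` is an illustrative name rather than the authors' implementation.

```python
import numpy as np

def mmc_projection(s_b, s_w, n_components):
    """Return unit eigenvectors of (s_b - s_w) for the largest nonnegative eigenvalues."""
    diff = s_b - s_w
    diff = (diff + diff.T) / 2.0                 # enforce symmetry against round-off
    eigvals, eigvecs = np.linalg.eigh(diff)      # ascending eigenvalues, orthonormal vectors
    order = np.argsort(eigvals)[::-1][:n_components]
    order = order[eigvals[order] >= 0]           # keep nonnegative eigenvalues only
    return eigvecs[:, order], eigvals[order]

# Placeholder scatter matrices (hypothetical values, 3-dimensional feature space).
s_lb = np.diag([5.0, 2.0, 1.0])
s_lw = np.diag([1.0, 1.0, 3.0])

W, top_eigvals = mmc_projection(s_lb, s_lw, n_components=2)
criterion = np.trace(W.T @ (s_lb - s_lw) @ W)    # equals the sum of kept eigenvalues

# The same W projects both training and testing feature matrices (rows = samples).
train_features = np.array([[1.0, 2.0, 3.0]])
low_dim_train = train_features @ W
```

Because the columns of W are orthonormal eigenvectors, the criterion value equals the sum of the retained eigenvalues, which is why the largest eigenvalues are kept.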
3.3. System Framework
The implementation of the proposed fault diagnosis framework is presented in Figure 1, in which the statistical analysis and artificial intelligence approaches are systematically blended to detect and diagnose REB faults under different working conditions. The entire fault diagnosis procedure is divided into four steps, namely, signal processing, feature extraction, feature reduction, and fault pattern recognition.
In the signal processing step, vibration signals collected from REBs under different working conditions are decomposed into different wavelet packet nodes by DTCWPT. The single-branch reconstruction signals of the terminal nodes are used to extract statistical features. In the feature extraction step, the proposed TSFRS selects the most sensitive statistical features based on the training dataset to construct a sensitive feature subset for training the classifier, and the same statistical features are extracted directly from the testing dataset. The sensitive feature subsets of the training and testing datasets are used as the source domain data and the target domain data, respectively. TCA is used to reduce the difference between the marginal distributions of the source domain data and the target domain data. In the feature reduction step, the low-dimensional training feature space is acquired by the proposed LMMC, which generates a projection that can be directly used for the dimensionality reduction of the testing feature dataset; thus, the low-dimensional testing feature dataset can be obtained. The WSD and the projection matrix W are obtained by processing the training dataset and can be directly used for the testing dataset. In the final step, the low-dimensional training feature dataset is used as the input to train the fault pattern classifier. In this paper, a support vector machine (SVM) is used as the fault pattern recognition classifier. The trained classifier is then used to conduct fault pattern recognition on the low-dimensional testing feature dataset. Finally, the procedure outputs the fault identification results and the classification accuracy.
4. Experiments and Analysis Results
4.1. Experiments Based on Experimental Test Rig 1
4.1.1. Experimental Setup and Cases
The REB vibration data from Case Western Reserve University (CWRU) [78], which reproduce several fault scenarios, were used to verify the effectiveness of the proposed methods. The experimental test rig is presented in Figure 2; it was composed of an electric motor (left), a torque transducer/encoder (center), a dynamometer (right), and control circuitry (not shown). An SKF 6205-2RS deep-groove REB was used in the test rig, and electro-discharge machining was employed to set single-point defects with different fault diameters, namely, 0.007, 0.014, 0.021, and 0.028 inches. The collected vibration signals of the REBs consisted of inner race fault signals, ball fault signals, outer race fault signals, and normal signals. The test rig supported motor loads of 0–3 horsepower (hp), and the corresponding motor speeds ranged from 1797 to 1730 rpm. Three accelerometers were placed at the 12 o’clock position. The sampling frequency was 12 kHz for the drive-end and fan-end bearings.
To verify the effectiveness, adaptability, and practical value of the proposed bearing fault diagnosis framework under different working conditions, vibration signals of different fault types and diameters under different motor loads are employed. The signal samples at 2 hp and 3 hp are used for the experiments, and there are four bearing conditions (normal, ball fault, inner race fault, and outer race fault). The ball fault and the inner race fault each have four fault diameters (0.007, 0.014, 0.021, and 0.028 inches). The outer race fault has three fault diameters (0.007, 0.014, and 0.021 inches). Therefore, there are 1 + 4 + 4 + 3 = 12 bearing conditions, which correspond to 12 patterns for fault diagnosis. The bearing vibration signals are divided into data segments, and each segment, which is used as a sample, has 2000 data points. Each bearing condition contains 60 samples, of which 40 randomly selected samples are used as testing samples and 20 randomly selected samples are used as training samples. Based on these samples, two groups of datasets are used in the experiments. The first group includes two cases (cases 1 and 2), in which the 2 hp samples are used as training samples. In case 1, the 2 hp samples are used as testing samples. In case 2, the 3 hp samples are used as testing samples. The second group includes two cases (cases 3 and 4), in which the 3 hp samples are used as training samples. In case 3, the 3 hp samples are used as testing samples. In case 4, the 2 hp samples are used as testing samples. The detailed information of the two groups of experimental datasets is shown in Tables 2 and 3, respectively.
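The sample construction described above can be sketched as follows. The sketch assumes consecutive, non-overlapping segments and a seeded random split; neither detail is stated in the text, and the helper names `segment` and `split` and the zero-valued placeholder signal are hypothetical.

```python
import random

def segment(signal, points_per_sample, n_samples):
    """Cut a 1-D signal into consecutive, non-overlapping samples (assumption)."""
    needed = points_per_sample * n_samples
    if len(signal) < needed:
        raise ValueError("signal too short for the requested number of samples")
    return [signal[k * points_per_sample:(k + 1) * points_per_sample]
            for k in range(n_samples)]

def split(samples, n_train, seed=0):
    """Randomly split samples into n_train training samples and the rest for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test

# Placeholder signal for one bearing condition: 60 samples of 2000 points each.
signal = [0.0] * (2000 * 60)
samples = segment(signal, points_per_sample=2000, n_samples=60)
train, test = split(samples, n_train=20)

# Number of bearing conditions: normal + ball (4) + inner race (4) + outer race (3).
n_conditions = 1 + 4 + 4 + 3
```

With these counts, each case uses 12 × 20 = 240 training samples and 12 × 40 = 480 testing samples.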


4.1.2. Analysis Results
According to the diagnosis framework shown in Figure 1, each sample is decomposed into wavelet packet nodes by DTCWPT with a decomposition level of 4. Thus, 16 terminal nodes, namely, subband signals, are obtained. The 16 single-branch reconstruction signals of the terminal nodes are then computed and used to generate 16 Hilbert envelope spectra (HES). Using the 6 statistical parameters shown in Table 4, each single-branch reconstruction signal yields 6 statistical features, and each HES yields 6 statistical features. Thus, the 16 single-branch reconstruction signals and the 16 HES generate 16 × 6 + 16 × 6 = 192 statistical features, which compose the original feature set (OFS). Then, TSFRS is performed to select the sensitive statistical features and to reduce the difference in distribution between the training and testing sensitive feature subsets. The WV, SSMD, and WSD of the 192 statistical features of the training samples (2 hp) are presented in Figures 3–5, respectively. In Figure 4, the horizontal axis represents the index of the statistical feature. Indices 1–6, 7–12, … , 85–90, and 91–96 represent the time-domain features of the single-branch reconstruction signals of terminal wavelet packet nodes 1–16, respectively. Indices 97–102, 103–108, … , 181–186, and 187–192 represent the HES features. After TSFRS, the feature reduction method LMMC is performed to obtain a low-dimensional feature set, which is used as the input of the SVM.
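The feature numbering described above can be made concrete with a small lookup. The sketch assumes that, within each group of six, the features follow the order of the statistical parameters in Table 4; the function name `locate_feature` is hypothetical.

```python
N_NODES = 16    # terminal wavelet packet nodes at decomposition level 4
N_PARAMS = 6    # statistical parameters per signal (Table 4)

def locate_feature(number):
    """Map a 1-based feature number (1..192) to (domain, node, parameter)."""
    total = 2 * N_NODES * N_PARAMS
    if not 1 <= number <= total:
        raise ValueError(f"feature number must be in 1..{total}")
    block, offset = divmod(number - 1, N_NODES * N_PARAMS)
    node, param = divmod(offset, N_PARAMS)
    domain = "time" if block == 0 else "HES"     # 1-96: time domain, 97-192: HES
    return domain, node + 1, param + 1           # node and parameter are 1-based

examples = {n: locate_feature(n) for n in (1, 7, 96, 97, 192)}
```

For instance, feature 7 is the first time-domain feature of node 2, and feature 97 is the first HES feature of node 1.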
 
Note to Table 4: the series of a dataset is indexed by i = 1, 2, …, n, where n is the number of data points; the table also uses the energy distribution of the signal.
To verify the effectiveness of the proposed TSFRS and LMMC, two groups of comparative experiments are performed. In addition, WPT is also used in the experiments, and its results are compared with those of DTCWPT to verify the superiority of DTCWPT. In this paper, the training dataset is used to train the fault diagnosis model, the testing dataset is used to test it, and the accuracy results presented in the tables and figures are the average diagnostic accuracy over the 12 bearing conditions. The experimental analysis below is therefore based on the average diagnostic accuracy.
In the first group of experiments, TSFRS is not applied. The OFS containing 192 statistical features is directly processed by several dimensionality reduction methods, namely, PCA, LDA, MMC, LFDA, and LMMC. OFS-SVM is a diagnosis model based on SVM, in which the OFS is used as the input of the SVM. OFS-PCA/LDA/LFDA/MMC/LMMC-SVM are also SVM-based diagnosis models that, respectively, use PCA, LDA, LFDA, MMC, and LMMC. According to the results shown in Tables 5–10, for cases 1 and 2, each model using DTCWPT performs better than the same model using WPT; for cases 3 and 4, the OFS-SVM, OFS-LFDA-SVM, and OFS-LMMC-SVM models using DTCWPT perform better than the same models using WPT. For the OFS-PCA-SVM, OFS-LDA-SVM, and OFS-MMC-SVM models using DTCWPT, the diagnosis results of case 4 are better than those of the models using WPT. In general, DTCWPT has more advantages than WPT.






The detailed experimental results of all models using DTCWPT are presented below. For the testing set of case 1, all models obtain high diagnosis accuracy. The maximum accuracy of each model exceeds 98%, and the highest accuracy, 100%, is obtained by OFS-LMMC-SVM. For the testing set of case 2, the working condition differs from that of the training set. The diagnosis accuracy of OFS-SVM reaches only 83.33%; compared with OFS-SVM, all other models improve the diagnosis accuracy. OFS-LMMC-SVM performs better than the other models, and its highest accuracy is 93.75% when the dimension size is 11. For the testing set of case 3, the maximum accuracy of each model exceeds 96%, and the highest accuracy, 100%, is obtained by OFS-LMMC-SVM. For the testing set of case 4, the diagnosis accuracy of OFS-SVM reaches only 78.54%, and the models using PCA, LDA, MMC, and LMMC each show an obvious improvement in diagnosis accuracy. The highest diagnosis accuracy, 92.08%, is obtained by OFS-LMMC-SVM. According to the experimental results of the four testing cases under the various models, the fault diagnosis model using LMMC achieves the best diagnosis performance.
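The reported percentages can be related to sample counts with simple arithmetic. The sketch below assumes that each accuracy is the fraction of correctly classified samples among the 12 × 40 = 480 testing samples of test rig 1 in a single run; the text does not state this explicitly, and the helper name `correct_count` is hypothetical.

```python
N_TEST = 12 * 40    # 12 bearing conditions x 40 testing samples each

def correct_count(accuracy_percent, n_test=N_TEST):
    """Return the number of correct samples whose rounded percentage matches, or None."""
    guess = round(accuracy_percent * n_test / 100)
    for count in (guess - 1, guess, guess + 1):
        if 0 <= count <= n_test and round(100 * count / n_test, 2) == accuracy_percent:
            return count
    return None

reported = [83.33, 93.75, 78.54, 92.08]             # accuracies quoted in the text
counts = {acc: correct_count(acc) for acc in reported}
errors = {acc: N_TEST - n for acc, n in counts.items()}
```

Under this assumption, 83.33% corresponds to 400 of 480 samples, 93.75% to 450, 78.54% to 377, and 92.08% to 442, so the gap between OFS-SVM and OFS-LMMC-SVM in case 2 amounts to 50 testing samples.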
In the second group of experiments, TSFRS is applied before feature reduction and fault pattern recognition. OFS-TSFRS-SVM is an SVM-based diagnosis model in which the most sensitive features are selected from the OFS according to the WSD; in addition, TCA is employed to reduce the difference in distribution between the training and testing sensitive feature subsets. OFS-TSFRS-PCA/LDA/LFDA/MMC/LMMC-SVM are also SVM-based diagnosis models that, respectively, use PCA, LDA, LFDA, MMC, and LMMC. The detailed experimental results of all models using DTCWPT, shown in Tables 11–16 and Figures 6–19, are presented in the following.

 
Table notes, in order of appearance: dimension size is 10, 11, 11, 11, and 11.
For the testing set of case 1, all models achieve high diagnosis accuracy. The maximum diagnosis accuracy of each model exceeds 98%, and the highest diagnosis accuracy, 100%, is obtained by both OFS-TSFRS-LFDA-SVM and OFS-TSFRS-LMMC-SVM. For the testing set of case 2, compared with the experimental results of the first group, the diagnosis accuracies of all models using TSFRS improve. Among these models, OFS-TSFRS-LFDA-SVM and OFS-TSFRS-LMMC-SVM perform better than OFS-TSFRS-SVM, OFS-TSFRS-PCA-SVM, OFS-TSFRS-LDA-SVM, and OFS-TSFRS-MMC-SVM. The maximum diagnosis accuracies of OFS-TSFRS-LMMC-SVM and OFS-TSFRS-LFDA-SVM both exceed 98%, and that of OFS-TSFRS-LMMC-SVM reaches 100%. For the testing set of case 3, the maximum accuracy of each model exceeds 96%, and OFS-LMMC-SVM achieves 100% fault diagnosis accuracy, which is higher than that of the other models. For the testing set of case 4, compared with the experimental results of the first group, the diagnosis accuracies of all models using TSFRS improve. OFS-TSFRS-LMMC-SVM performs better than the other models, and its highest diagnosis accuracy is 99.79% (when the selected feature number, sfn, is 140).
According to the experimental results of the second group and Figures 6–19, the fault diagnosis model attains better diagnosis performance when a suitable sfn is selected. For example, for the testing sets of cases 1, 2, and 3, the diagnosis accuracy of OFS-TSFRS-LMMC-SVM reaches 100% when the sfn is between 35 and 42, and for the testing set of case 4, it reaches 99.79% when the sfn is between 137 and 146. In summary, these results demonstrate the effectiveness of the proposed TSFRS and LMMC: a diagnosis model trained by the proposed framework on data from a single working condition can achieve a desirable accuracy on a testing set collected from bearings under a different working condition.
4.2. Experiments Based on the Experimental Test Rig 2
4.2.1. Experimental Setup and Cases
To further validate the effectiveness and adaptability of the proposed methods, the SQI-MFS test rig [42] is used to collect bearing vibration signals for fault diagnosis. The experimental test rig is shown in Figure 20, and the corresponding bearings with different fault types are shown in Figure 21. Laser machining is employed to set single-point defects with different fault diameters, namely, 0.05 mm, 0.1 mm, and 0.2 mm. The collected vibration signals of the bearing consist of inner race fault signals, ball fault signals, outer race fault signals, and normal signals. The test rig supports motor speeds of 1200 rpm and 1800 rpm. Two accelerometers are used to collect vibration signals. The sampling frequency is 16 kHz.
For the experimental data of the SQI-MFS test rig, the signal samples at 1200 rpm and 1800 rpm are used for the experiments, and there are four bearing conditions (normal, ball fault, inner race fault, and outer race fault). The ball fault, inner race fault, and outer race fault each have three fault diameters (0.05 mm, 0.1 mm, and 0.2 mm). Therefore, there are 1 + 3 × 3 = 10 bearing conditions, which correspond to 10 patterns for fault diagnosis. The bearing vibration signals are divided into data segments, and each segment, which is used as a sample, has 5000 data points. Each bearing condition contains 60 samples, of which 40 randomly selected samples are used as testing samples and 20 randomly selected samples are used as training samples. Based on these samples, two groups of datasets are used in the experiments. The first group includes cases 1 and 2, in which the 1800 rpm samples are used as training samples. In case 1, the 1800 rpm samples are used as testing samples. In case 2, the 1200 rpm samples are used as testing samples. The second group includes cases 3 and 4, in which the 1200 rpm samples are used as training samples. In case 3, the 1200 rpm samples are used as testing samples. In case 4, the 1800 rpm samples are used as testing samples. The detailed information of the two groups of experimental datasets is shown in Tables 17 and 18, respectively.
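The case design for test rig 2 can be written out programmatically as a quick check. The helper `build_cases` and the label strings are hypothetical; only the speeds, fault types, and diameters come from the text.

```python
from itertools import product

def build_cases(first_condition, second_condition):
    """Enumerate the four train/test cases for two working conditions."""
    cases = {}
    number = 1
    for train in (first_condition, second_condition):
        other = second_condition if train == first_condition else first_condition
        for test in (train, other):               # same condition first, then the other
            cases[number] = {"train": train, "test": test, "cross": train != test}
            number += 1
    return cases

rig2_cases = build_cases("1800 rpm", "1200 rpm")

# Bearing conditions: normal plus three fault types at three diameters each.
fault_types = ("ball", "inner race", "outer race")
diameters_mm = (0.05, 0.1, 0.2)
conditions = ["normal"] + [f"{fault} {d} mm" for fault, d in product(fault_types, diameters_mm)]
```

Cases 2 and 4 are the cross-condition cases, in which the training and testing samples come from different motor speeds.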


4.2.2. Analysis Results
The experimental procedure is the same as that of test rig 1. Each sample is decomposed into wavelet packet nodes by DTCWPT with a decomposition level of 4. Thus, 16 terminal nodes, namely, subband signals, are obtained. The 16 single-branch reconstruction signals of the terminal nodes are then computed and used to generate 16 HES. Using the 6 statistical parameters shown in Table 4, each single-branch reconstruction signal yields 6 statistical features, and each HES yields 6 statistical features. Thus, the 16 single-branch reconstruction signals and the 16 HES generate 192 statistical features, which compose the OFS. Then, TSFRS is performed to select the sensitive statistical features and to reduce the difference in distribution between the training and testing sensitive feature subsets. The WV, SSMD, and WSD of the 192 statistical features of the training samples are presented in Figures 22–24, respectively.
To verify the adaptability and effectiveness of the fault diagnosis model using the proposed methods, OFS-LMMC-SVM and OFS-TSFRS-LMMC-SVM are employed in the experiments. The experimental results of OFS-LMMC-SVM are presented in Table 19. For the testing sets of cases 1 and 3, the maximum accuracies are 99.67% and 99.17%, respectively. For the testing sets of cases 2 and 4, the maximum accuracies are 67.67% and 59.50%, respectively. When TSFRS is applied, the experimental results of OFS-TSFRS-LMMC-SVM shown in Table 20 indicate that the diagnosis accuracies improve. For the testing sets of cases 1, 2, 3, and 4, the maximum accuracies are 100.00%, 89.00%, 100.00%, and 84.33%, respectively. The experimental results of OFS-TSFRS-LMMC-SVM are plotted in Figure 25. For comparison, the diagnosis results of OFS-TSFRS-SVM, OFS-TSFRS-PCA-SVM, OFS-TSFRS-LDA-SVM, OFS-TSFRS-LFDA-SVM, and OFS-TSFRS-MMC-SVM are shown in Figure 26. According to the experimental results, especially for the testing sets of cases 2 and 4, OFS-TSFRS-LMMC-SVM performs better than the other models when a suitable sfn is selected. In summary, the effectiveness of the proposed TSFRS and LMMC is further demonstrated, and the adaptability of the proposed diagnosis framework is verified on four testing sets collected from different bearings under different working conditions.

 
Dimension size is 11. 
5. Conclusions
Due to the harsh environment and the variability of working conditions in real industrial scenarios, data-driven fault diagnosis models built with traditional machine-learning methods perform poorly under different working conditions, in which the distributions of the training data and the testing data differ. To address this problem, this paper proposed a novel intelligent fault diagnosis framework for REBs under different working conditions that systematically blends statistical analysis with artificial intelligence. In this framework, DTCWPT is used to process raw vibration signals and extract statistical features. A new feature extraction method, TSFRS, is used to select the most sensitive features to form a sensitive feature subset and to reduce the difference in distribution between the training and testing sensitive feature subsets. A modified MMC, namely, the LMMC, is used as a feature dimensionality reduction method, and its advantages over other dimensionality reduction methods are presented. SVM is used as an automated fault pattern recognition classifier. Finally, the experimental datasets collected from two experimental test rigs contain samples of different bearing fault conditions, such as ball fault, inner race fault, and outer race fault, at different defect diameters.
According to the experimental results, the proposed methods for REB fault diagnosis have great potential to be beneficial in industrial applications. For experimental test rig 1, a set of comparative cases, namely, cases 1, 2, 3, and 4, is employed. Cases 1 and 2 use the samples at a motor load of 2 hp as the training set and the samples at 2 hp and 3 hp, respectively, as the testing sets. Cases 3 and 4 use the samples at a motor load of 3 hp as the training set and the samples at 3 hp and 2 hp, respectively, as the testing sets. The experimental results indicate that the fault diagnosis model using the proposed methods achieves desirable performance when a suitable parameter sfn is selected: the maximum diagnosis accuracies of cases 1, 2, 3, and 4 are 100%, 100%, 100%, and 99.79%, respectively. Experimental test rig 2 is employed to further verify the adaptability and effectiveness of the diagnosis model using the proposed methods. The experimental results show that the proposed methods help the diagnosis model achieve high diagnosis accuracy and, at the same time, demonstrate the adaptability of the diagnosis model.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was funded by the Special Funds Project for Transforming Scientific and Technological Achievements in Jiangsu Province (BA2016017); the National Key R&D Program of China (2017YFC0804400 and 2017YFC0804401).
References
 X. Yan, Y. Liu, and M. Jia, “Research on an enhanced scale morphologicalhat product filtering in incipient fault detection of rolling element bearings,” Measurement, vol. 147, Article ID 106856, 2019. View at: Publisher Site  Google Scholar
 Y. Qin, “A new family of modelbased impulsive wavelets and their sparse representation for rolling bearing fault diagnosis,” IEEE Transactions on Industrial Electronics, vol. 65, no. 3, pp. 2716–2726, 2017. View at: Publisher Site  Google Scholar
 Z. Huo, Y. Zhang, P. Francq, L. Shu, and J. Huang, “Incipient fault diagnosis of roller bearing using optimized wavelet transform based multispeed vibration signatures,” IEEE Access, vol. 5, pp. 19442–19456, 2017. View at: Publisher Site  Google Scholar
 X. Li, W. Zhang, Q. Ding, and J.Q. Sun, “Multilayer domain adaptation method for rolling bearing fault diagnosis,” Signal Processing, vol. 157, pp. 180–197, 2019. View at: Publisher Site  Google Scholar
 Z. H. An, S. M. Li, Y. Xin, K. Xu, and H. J. Ma, “An intelligent fault diagnosis framework dealing with arbitrary length inputs under different working conditions,” Measurement Science and Technology, vol. 30, no. 12, Article ID 125107, 2019. View at: Publisher Site  Google Scholar
 H. Zheng, R. Wang, Y. Yang et al., “Crossdomain fault diagnosis using knowledge transfer strategy: a review,” IEEE Access, vol. 7, pp. 129260–129290, 2019. View at: Publisher Site  Google Scholar
 Z. Gao, C. Cecati, and S. X. Ding, “A survey of fault diagnosis and faulttolerant techniquespart I: fault diagnosis with modelbased and signalbased approaches,” IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3757–3767, 2015. View at: Publisher Site  Google Scholar
 Z. Gao, C. Cecati, and S. X. Ding, “A survey of fault diagnosis and faulttolerant techniquespart II: fault diagnosis with knowledgebased and hybrid/active approaches,” IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3768–3774, 2015. View at: Publisher Site  Google Scholar
 P. Ma, H. Zhang, W. Fan, and C. Wang, “A diagnosis framework based on domain adaptation for bearing fault diagnosis across diverse domains,” ISA Transactions, vol. 99, pp. 465–478, 2020. View at: Publisher Site  Google Scholar
 M. Kang, J. Kim, L. M. Wills, and J.M Kim, “Timevarying and multiresolution envelope analysis and discriminative feature analysis for bearing fault diagnosis,” IEEE Transactions on Industrial Electronics, vol. 62, no. 12, pp. 7749–7761, 2015. View at: Publisher Site  Google Scholar
 Z. Feng, M. Liang, and F. Chu, “Recent advances in timefrequency analysis methods for machinery fault diagnosis: a review with application examples,” Mechanical Systems and Signal Processing, vol. 38, no. 1, pp. 165–205, 2013. View at: Publisher Site  Google Scholar
 Z. K. Peng, G. Meng, F. L. Chu, Z. Q. Lang, W. M. Zhang, and Y. Yang, “Polynomial chirplet transform with application to instantaneous frequency estimation,” Ieee Transactions on Instrumentation and Measurement, vol. 60, no. 9, pp. 3222–3229, 2011. View at: Publisher Site  Google Scholar
 Y. Yang, Z. Peng, W. Zhang, and G. Meng, “Parameterised timefrequency analysis methods and their engineering applications: a review of recent advances,” Mechanical Systems and Signal Processing, vol. 119, pp. 182–221, 2019. View at: Publisher Site  Google Scholar
 L. Wang and J. Xiang, “A twostage method using splinekernelled chirplet transform and angle synchronous averaging to detect faults at variable speed,” IEEE Access, vol. 7, pp. 22471–22485, 2019. View at: Publisher Site  Google Scholar
 L. Wang, J. Xiang, and Y. Liu, “A timefrequencybased maximum correlated kurtosis deconvolution approach for detecting bearing faults under variable speed conditions,” Measurement Science and Technology, vol. 30, no. 12, Article ID 125005, 2019. View at: Publisher Site  Google Scholar
 P. Shi, S. An, P. Li, and D. Han, “Signal feature extraction based on cascaded multistable stochastic resonance denoising and EMD method,” Measurement, vol. 90, pp. 318–328, 2016. View at: Publisher Site  Google Scholar
 Y. Yang, X. J. Dong, Z. K. Peng, W. M. Zhang, and G Meng, “Vibration signal analysis using parameterized timefrequency method for features extraction of varyingspeed rotary machinery,” Journal of Sound and Vibration, vol. 335, no. 5, pp. 350–366, 2015. View at: Publisher Site  Google Scholar
 X. Liu, L. Bo, and H. Luo, “Bearing faults diagnostics based on hybrid LSSVM and EMD method,” Measurement, vol. 59, pp. 145–166, 2015. View at: Publisher Site  Google Scholar
 T. Guo and Z. Deng, “An improved EMD method based on the multiobjective optimization and its application to fault feature extraction of rolling bearing,” Applied Acoustics, vol. 127, pp. 46–62, 2017. View at: Publisher Site  Google Scholar
 K. Zhang, T. Lin, and X. Jin, “Low speed bearing fault diagnosis based on EMDCIIT histogram entropy and KFCM clustering,” Journal of Shanghai Jiaotong University (Science), vol. 24, no. 5, pp. 616–621, 2019. View at: Publisher Site  Google Scholar
 K. P. Shankar, K. L. Annamalai, and L. S. Kumar, “Selecting effective intrinsic mode functions of empirical mode decomposition and variational mode decomposition using dynamic time warping algorithm for rolling element bearing fault diagnosis,” Transactions of the Institute of Measurement and Control, vol. 41, no. 7, pp. 1923–1932, 2018. View at: Publisher Site  Google Scholar
 N. Tsakalozos, S. K. Drakakis, and S. Rickard, “A formal study of the nonlinearity and consistency of the empirical mode decomposition,” Signal Processing, vol. 92, no. 9, pp. 1961–1969, 2012. View at: Publisher Site  Google Scholar
 X. Jiao, B. Jing, Y. Huang, J. Li, and G. Xu, “Research on fault diagnosis of airborne fuel pump based on EMD and probabilistic neural networks,” Microelectronics Reliability, vol. 75, pp. 296–308, 2017. View at: Publisher Site  Google Scholar
 J. Zhong and Y. Huang, “Timefrequency representation based on an adaptive shorttime Fourier transform,” IEEE Transactions on Signal Processing, vol. 58, no. 10, pp. 5118–5128, 2010. View at: Publisher Site  Google Scholar
 J. BurrielValencia, R. PuchePanadero, J. MartinezRoman, A. SapenoBano, and M. PinedaSanchez, “Fault diagnosis of induction machines in a transient regime using current sensors with an optimized Slepian window,” Sensors, vol. 18, no. 2, p. 146, 2018. View at: Publisher Site  Google Scholar
 A. H. Boudinar, A. F. Aimer, M. E. Khodja, and N. Benouzza, “Induction motor’s bearing fault diagnosis using an improved short time Fourier transform,” in Advanced Control Engineering Methods in Electrical Engineering Systems, vol. 522, pp. 411–426, Springer, Berlin, Germany, 2019. View at: Google Scholar
 R. M. Ding, J. J. Shi, X. X. Jiang, and Z. K. Zhu, “Instantaneous frequency estimation via multiple ridge integration scheme for bearing fault diagnosis,” in Proceedings of the 2018 Prognostics and System Health Management Conference, pp. 609–613, Chongqing, China, October 2018. View at: Google Scholar
 Y. Xin and S. M. Li, “Novel data-driven short-frequency mutual information entropy threshold filtering and its application to bearing fault diagnosis,” Measurement Science and Technology, vol. 30, no. 11, Article ID 115006, 2019.
 I. Attoui, N. Fergani, N. Boutasseta, B. Oudjani, and A. Deliou, “A new time–frequency method for identification and classification of ball bearing faults,” Journal of Sound and Vibration, vol. 397, pp. 241–265, 2017.
 I. R. Quinde, J. C. Sumba, L. E. Ochoa, A. J. V. Guevara, and R. Morales-Menendez, “Bearing fault diagnosis based on optimal time-frequency representation method,” IFAC-PapersOnLine, vol. 52, no. 11, pp. 194–199, 2019.
 Y. Wang, G. Xu, L. Liang, and K. Jiang, “Detection of weak transient signals based on wavelet packet transform and manifold learning for rolling element bearing fault diagnosis,” Mechanical Systems and Signal Processing, vol. 54–55, pp. 259–276, 2015.
 H. Li, Q. Zhang, X. Qin, and Y. Sun, “K-SVD-based WVD enhancement algorithm for planetary gearbox fault diagnosis under a CNN framework,” Measurement Science and Technology, vol. 31, no. 2, Article ID 025003, 2020.
 H. H. Bafroui and A. Ohadi, “Application of wavelet energy and Shannon entropy for feature extraction in gearbox fault detection under varying speed conditions,” Neurocomputing, vol. 133, no. 8, pp. 437–445, 2014.
 C. Wang, M. Gan, and C. Zhu, “Intelligent fault diagnosis of rolling element bearings using sparse wavelet energy based on overcomplete DWT and basis pursuit,” Journal of Intelligent Manufacturing, vol. 28, no. 6, pp. 1377–1391, 2015.
 J.-D. Wu and C.-H. Liu, “An expert system for fault diagnosis in internal combustion engines using wavelet packet transform and neural network,” Expert Systems with Applications, vol. 36, no. 3, pp. 4278–4286, 2009.
 J. Qu, Z. Zhang, and T. Gong, “A novel intelligent method for mechanical fault diagnosis based on dual-tree complex wavelet packet transform and multiple classifier fusion,” Neurocomputing, vol. 171, no. C, pp. 837–853, 2016.
 G. Tang, X. Wang, and Y. He, “A novel method of fault diagnosis for rolling bearing based on dual tree complex wavelet packet transform and improved multiscale permutation entropy,” Mathematical Problems in Engineering, vol. 2016, Article ID 5432648, 13 pages, 2016.
 N. G. Kingsbury, “The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters,” in Proceedings of the 8th IEEE Digital Signal Processing Workshop, Salt Lake City, UT, USA, 1998.
 I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dual-tree complex wavelet transform,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 123–151, 2005.
 Q. Niu, Q. Tong, J. Cao, F. Liu, and Y. Zhang, “Feature extraction method for condition monitoring of rolling element bearings based on dual-tree complex wavelet packet transform and VMD,” Wireless Personal Communications, vol. 103, no. 1, pp. 831–845, 2018.
 H. Shao, H. Jiang, F. Wang, and Y. Wang, “Rolling bearing fault diagnosis using adaptive deep belief network with dual-tree complex wavelet packet,” ISA Transactions, vol. 69, pp. 187–201, 2017.
 D. Fei, Y. Xiao, D. Enjie, S. Wu, C. Fan, and Y. Huang, “Rolling bearing fault diagnosis using modified neighborhood preserving embedding and maximal overlap discrete wavelet packet transform with sensitive features selection,” Shock and Vibration, vol. 2018, Article ID 5063527, 29 pages, 2018.
 X. Yu, F. Dong, E. Ding, S. Wu, and C. Fan, “Rolling bearing fault diagnosis using modified LFDA and EMD with sensitive feature selection,” IEEE Access, vol. 6, pp. 3715–3730, 2018.
 Y. Li, X. Liang, J. Lin, Y. Chen, and J. Liu, “Train axle bearing fault detection using a feature selection scheme based multiscale morphological filter,” Mechanical Systems and Signal Processing, vol. 101, pp. 435–448, 2018.
 Y. Hu, S. Zhang, A. Jiang et al., “A new method of wind turbine bearing fault diagnosis based on multi-masking empirical mode decomposition and fuzzy C-means clustering,” Chinese Journal of Mechanical Engineering, vol. 32, no. 1, 2019.
 Y. Gao, X. Liu, and J. Xiang, “FEM simulation-based generative adversarial networks to detect bearing faults,” IEEE Transactions on Industrial Informatics, vol. 16, no. 7, pp. 4961–4971, 2020.
 X. Y. Liu, H. Z. Huang, and J. W. Xiang, “A personalized diagnosis method to detect faults in a bearing based on acceleration sensors and an FEM simulation driving support vector machine,” Sensors, vol. 20, no. 2, p. 420, 2020.
 S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
 A. Khatami, M. Babaie, H. R. Tizhoosh, A. Khosravi, T. Nguyen, and S. Nahavandi, “A sequential search-space shrinking using CNN transfer learning and a Radon projection pool for medical image retrieval,” Expert Systems with Applications, vol. 100, pp. 224–233, 2018.
 A. S. Qureshi, A. Khan, A. Zameer, and A. Usman, “Wind power prediction using deep neural network based meta regression and transfer learning,” Applied Soft Computing, vol. 58, pp. 742–755, 2017.
 S. Mun, M. Shin, S. Shon, W. Kim, D. K. Han, and H. Ko, “DNN transfer learning based nonlinear feature extraction for acoustic event classification,” IEICE Transactions on Information and Systems, vol. E100.D, no. 9, pp. 2249–2252, 2017.
 M. S. Long, J. M. Wang, G. G. Ding, J. G. Sun, and P. S. Yu, “Transfer feature learning with joint distribution adaptation,” in Proceedings of the IEEE International Conference on Computer Vision 2013, pp. 2200–2207, IEEE, Sydney, Australia, December 2013.
 M. Ghifary, W. B. Kleijn, M. J. Zhang, D. Balduzzi, and W. Li, “Deep reconstruction-classification networks for unsupervised domain adaptation,” in Proceedings of the 14th European Conference on Computer Vision 2016, pp. 597–613, Amsterdam, Netherlands, October 2016.
 K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, “Unsupervised pixel-level domain adaptation with generative adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, IEEE, Honolulu, HI, USA, July 2017.
 I. Attoui, B. Oudjani, N. Boutasseta, N. Fergani, M.-S. Bouakkaz, and A. Bouraiou, “Novel predictive features using a wrapper model for rolling bearing fault diagnosis based on vibration signal analysis,” The International Journal of Advanced Manufacturing Technology, vol. 106, no. 7–8, pp. 3409–3435, 2020.
 L. Lu, J. Yan, and C. W. de Silva, “Dominant feature selection for the fault diagnosis of rotary machines using modified genetic algorithm and empirical mode decomposition,” Journal of Sound and Vibration, vol. 344, pp. 464–483, 2015.
 Y. Bengio, P. Vincent, O. Delalleau et al., “Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and spectral clustering,” in Proceedings of the International Conference on Neural Information Processing Systems, Istanbul, Turkey, June 2003.
 N. Vlassis, Y. Motomura, and B. Kröse, “Supervised dimension reduction of intrinsically low-dimensional data,” Neural Computation, vol. 14, no. 1, pp. 191–215, 2002.
 X. He and P. Niyogi, “Locality preserving projections,” Advances in Neural Information Processing Systems, vol. 16, no. 1, pp. 186–197, 2004.
 X. He, D. Cai, S. Yan et al., “Neighborhood preserving embedding,” in Proceedings of the 10th IEEE International Conference on Computer Vision 2005, pp. 1208–1213, IEEE, Beijing, China, 2005.
 E. Kokiopoulou and Y. Saad, “Orthogonal neighborhood preserving projections,” in Proceedings of the IEEE International Conference on Data Mining 2005, IEEE, New Orleans, LA, USA, November 2005.
 J.-B. Yu, “Bearing performance degradation assessment using locality preserving projections,” Expert Systems with Applications, vol. 38, no. 6, pp. 7440–7450, 2011.
 B. S. J. Costa, P. P. Angelov, and L. A. Guedes, Fully Unsupervised Fault Detection and Identification Based on Recursive Density Estimation and Self-Evolving Cloud-Based Classifier, Elsevier Science Publishers B. V., Amsterdam, Netherlands, 2015.
 W. Guangbin, D. U. Moujun, H. Qingkai et al., “A bearing fault diagnosis method based on multiscale subband sample entropy and LPP,” Journal of Vibration and Shock, vol. 35, no. 20, pp. 71–76, 2016, in Chinese.
 H. Li, T. Jiang, and K. Zhang, “Efficient and robust feature extraction by maximum margin criterion,” IEEE Transactions on Neural Networks, vol. 17, no. 1, pp. 157–165, 2006.
 M. Sugiyama, “Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis,” 2007, http://jmlr.org/.
 R. Yu and A. Baradarani, “Sampled-data design of FIR dual filter banks for dual-tree complex wavelet transforms via LMI optimization,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3369–3375, 2008.
 C. Yao, Z. Lu, J. Li, Y. Xu, and J. Han, “A subset method for improving Linear Discriminant Analysis,” Neurocomputing, vol. 138, no. 11, pp. 310–315, 2014.
 A. K. Jain, P. W. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4–37, 2000.
 R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
 C. R. Rao, “The utilization of multiple measurements in problems of biological classification,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 10, no. 2, pp. 159–193, 1948.
 F. Song, D. Zhang, D. Mei et al., “A multiple maximum scatter difference discriminant criterion for facial feature extraction,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 33, no. 6, pp. 1566–1599, 2007.
 S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
 M. Van and H. J. Kang, “Bearing defect classification based on individual wavelet local Fisher discriminant analysis with particle swarm optimization,” IEEE Transactions on Industrial Informatics, vol. 12, no. 1, pp. 124–135, 2017.
 S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.
 S. J. Pan, J. T. Kwok, and Q. Yang, “Transfer learning via dimensionality reduction,” in Proceedings of the 23rd AAAI Conference on Artificial Intelligence, Chicago, IL, USA, July 2008.
 M. Robnik-Šikonja and I. Kononenko, “Theoretical and empirical analysis of ReliefF and RReliefF,” Machine Learning, vol. 53, no. 1–2, pp. 23–69, 2003.
 C.W.R.University, 2014, http://csegroups.case.edu/bearingdatacenter/pages/downloaddatafile.
Copyright
Copyright © 2020 Shiyuan Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.