Special Issue: Intelligent Feature Learning Methods for Machine Condition Monitoring
Research Article, Open Access
Bingru Yang, Qi Li, Liang Chen, Changqing Shen, "Bearing Fault Diagnosis Based on Multilayer Domain Adaptation", Shock and Vibration, vol. 2020, Article ID 8873960, 11 pages, 2020. https://doi.org/10.1155/2020/8873960
Bearing Fault Diagnosis Based on Multilayer Domain Adaptation
Abstract
Bearing fault diagnosis plays a vitally important role in practical industrial scenarios. Deep learning-based fault diagnosis methods usually assume that the training set and test set obey the same probability distribution, which is hard to satisfy under actual working conditions. This paper proposes a novel multilayer domain adaptation (MLDA) method, which can diagnose compound faults and single faults of multiple sizes simultaneously. A specially designed residual network for the fault diagnosis task is pretrained to extract domain-invariant features. The multikernel maximum mean discrepancy (MKMMD) and pseudolabel learning are adopted in multiple layers to take both marginal distributions and conditional distributions into consideration. A total of 12 transfer tasks are conducted to verify the performance of MLDA. Comparisons of different signal processing methods, different parameter settings, and different models show that the proposed MLDA model can effectively extract domain-invariant features and achieve satisfactory results.
1. Introduction
Rolling element bearings are of great importance for mechanical equipment. They usually need to run for a long time under harsh conditions, which will inevitably cause faults. The failure of the bearings will cause a lot of economic loss and even safety problems [1]. Therefore, the study of reliable and accurate bearing fault diagnosis methods is of great importance, which can monitor and diagnose the health conditions of the bearings, so as to guarantee the normal working condition of the mechanical equipment and reduce the risk of failure. With the development of intelligent manufacturing, higher requirements have been put forward for fault diagnosis in the industrial process.
Vibration signals are high-precision indicators that can provide information for detecting the status of mechanical equipment [2]. Most traditional fault diagnosis methods rely on signal processing to extract fault information from raw vibration signals, using techniques such as empirical mode decomposition (EMD) [3], wavelet packet transform (WPT) [4], and other time-frequency domain signal processing methods [5]. Yu et al. [6] adopted the EMD method to calculate the original statistical characteristics of the vibration signals through the intrinsic mode functions and combined it with a modified dimension-reduction method to accomplish bearing fault diagnosis. Liu [7] decomposed a vibration signal into subband signals via one-level stationary wavelet packet transform (one-level SWPT), which improved the ability to extract fault features. Signal processing methods require analysts to have expert knowledge to extract fault features accurately. However, because the equipment status can be very complicated in actual operation, such methods cannot achieve sufficient accuracy. Therefore, researchers introduced machine learning (ML) methods to make up for this deficiency. Chen et al. [8] combined rough set theory (RS) with the support vector machine (SVM) to propose a multisensor data fusion fault diagnosis method, which reduced the computing cost of the SVM while improving its effectiveness and accuracy. Yu et al. [9] applied WPT to extract fault features of the planetary gearbox; the features were discretized and used as the input of the flexible naive Bayesian classifier (FNBC). Fenineche et al. [10] studied the influence of parameter selection in the artificial neural network (ANN) to obtain the best fault diagnosis performance. However, the performance of ML models is often limited by manual feature extraction. When the fault signals are complex, it is difficult to achieve the expected diagnostic accuracy.
Deep learning (DL) [11] can automatically extract features from nonlinear bearing signals, which greatly reduces tedious signal preprocessing. Zhang et al. [12] applied the sparse autoencoder (SAE) to propose a new label generation method, which can identify samples that do not belong to known categories. Dong et al. [13] introduced the convolutional neural network (CNN) into a deep belief network (DBN) to propose a random convolutional deep belief network for mechanical fault diagnosis. By adding unsupervised components, the generalization ability of the model was improved. The novel hierarchical learning rate adaptive CNN presented by Guo et al. [14] was designed for diagnosing bearing faults and determining their severity. In order to overcome the shortcomings of traditional vibration signal processing methods, Jiao et al. [15] proposed a method based on multivariate encoder information to diagnose faults intelligently. The method presented in [16] can effectively diagnose compound faults by combining the features automatically extracted by the model with manually designed time-domain features. However, DL models rely on the assumption that the source domain and target domain (for example, training set and test set) obey the same distribution and share the same feature space. In many actual industrial scenarios, the distributions of source-domain and target-domain samples differ considerably, which degrades the diagnostic performance [17].
To tackle this challenge, the application of transfer learning (TL) to bearing fault diagnosis has emerged as a research direction in recent years. Its purpose is to reuse the knowledge learned from the source domain in a different but related target domain [18, 19]. Peng et al. [20] added the idea of residual learning to the model, which can effectively learn high-level and abstract features. Wen et al. [21] adopted a three-layer SAE as the feature extractor and calculated the maximum mean discrepancy (MMD) to minimize the difference between domains. The representation clustering algorithm proposed by Li et al. [22] can maximize the distance metric of interclass variations and minimize the distance metric of intraclass variations at the same time. Zhang et al. [23] improved the domain adaptation ability of the model by implementing the adaptive batch normalization method. The method presented in [24] matched the marginal distributions of the output of every convolution layer, improving the cross-domain testing performance. Regarding the distribution discrepancy between the source domain and target domain, some studies consider that the transferability of high-level features drops significantly [25], while others believe that low-level features may be more responsible for domain shift [26]. Moreover, most existing TL methods focus on marginal distributions and ignore the conditional distributions of different domains, although both affect domain adaptation in different ways.
In this paper, a novel multilayer domain adaptation (MLDA) method is proposed for TL-based intelligent bearing fault diagnosis. By calculating multikernel MMD (MKMMD) and considering conditional distributions in multiple layers, the model can extract effective domain-invariant features, which clearly contribute to transfer tasks. The main contributions of this work can be summarized as follows: (1) An MLDA method is proposed to diagnose unlabeled bearing fault signals from the target domain through the shared domain features extracted from the source domain. (2) The method can diagnose compound faults and single faults of multiple sizes at the same time. (3) A specially designed residual network based on the ResNet [27] framework is adopted as the feature extractor in the fault diagnosis task to extract features automatically without complex time-frequency domain analysis. (4) In order to minimize the distribution discrepancy between the source and the target domain, MKMMD and pseudolabel learning are adopted in multiple layers, considering both marginal distributions and conditional distributions.
The remaining parts of the paper are organized as follows. In Section 2, the domain adaptation problem is formulated, and MKMMD is introduced. The proposed MLDA method for bearing fault diagnosis is presented in Section 3. The comparisons of different signal processing methods, different parameter settings, and different methods are discussed in Section 4. Finally, the conclusions are drawn in Section 5.
2. Theoretical Background
2.1. Problem Formulation
Since bearings are affected by many factors during operation, such as load and running time, the distributions of samples in the source domain are different from those of samples in the target domain. The emergence of TL provides a new idea for solving this problem. There are two important concepts in TL named domain and task. The detailed description of TL is given as follows [28–30].
The domain, abbreviated by D, contains the data space X and its marginal distribution P (X), which can be described as D = {X, P (X)}.
The task, abbreviated by T, contains the label space Y and its predictive function f (·), which can be described as T = {Y, f (·)}. f (·) can also be described as a conditional distribution P (Y|X) from the perspective of probability.
The labeled sample space of D_{s} can be written as X_{s} = {(x_{i}^{s}, y_{i}^{s})}_{i=1}^{n_{s}} with a relevant task T_{s}, and the unlabeled sample space of D_{t} can be written as X_{t} = {x_{j}^{t}}_{j=1}^{n_{t}} with a relevant task T_{t}, where n_{s} and n_{t}, respectively, denote the numbers of samples in the corresponding domain.
TL aims to make full use of the knowledge learned from source domain D_{s} and source task T_{s} to find a target predictive function f (·) in target domain D_{t}, where D_{s} ≠ D_{t} or T_{s} ≠ T_{t}. The condition D_{s} ≠ D_{t} indicates that P_{s} (X) ≠ P_{t} (X) or (and) X_{s} ≠ X_{t}, and the condition T_{s} ≠ T_{t} indicates that P_{s} (Y|X) ≠ P_{t} (Y|X) or (and) Y_{s} ≠ Y_{t}.
Domain adaptation (DA) can be regarded as a specific setting in TL, as shown in Figure 1, which solves the problem of X_{s} ≠ X_{t}, but T_{s} = T_{t}.
2.2. Multikernel Maximum Mean Discrepancy
DA is a challenging problem when there are no (or limited) labeled data in the target domain. To address this problem, many existing methods focus on minimizing the difference between two domains by adopting a nonparametric distance measure called MMD, which can measure the discrepancy of marginal distributions. As stated in [31], compared with a single kernel, MKMMD can greatly improve the efficiency of domain adaptation.
H_{k} denotes the reproducing kernel Hilbert space (RKHS) with a characteristic kernel k. The MKMMD between distributions U and V is defined as the RKHS distance between the mean embeddings of U and V. The squared formulation of the MKMMD is given as
$$d_{k}^{2}(U, V) = \left\| E_{U}\left[\phi(x^{s})\right] - E_{V}\left[\phi(x^{t})\right] \right\|_{H_{k}}^{2}, \qquad (1)$$
where φ(·) is the feature map associated with the kernel, so that k(x^{s}, x^{t}) = ⟨φ(x^{s}), φ(x^{t})⟩.
The most important property is that d_{k}^{2}(U, V) = 0 if and only if U = V. The multikernel is calculated as
$$k(x^{s}, x^{t}) = \sum_{g=1}^{G} k_{\sigma_{g}}(x^{s}, x^{t}), \qquad (2)$$
where G is the number of kernels and k_{σ_g} is the Gaussian kernel with bandwidth σ_{g}. Gretton et al. [32] showed theoretically that the kernel used in the mean embedding of U and V is essential to reducing the test error. The multikernel can enhance the MMD test through different kernels, thus providing a method for optimal kernel selection.
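As a rough illustration of how such a discrepancy can be estimated from two batches of features, the following NumPy sketch computes a biased empirical estimate of the squared MKMMD. It is a minimal sketch under stated assumptions, not the authors' implementation: the kernels are combined by an unweighted sum, the Gaussian kernel is parameterized as exp(-||a - b||^2 / (2σ^2)), and the default bandwidths (4, 8, 16, 32, 64) are the ones reported later in the experiments. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * bandwidth**2))

def multi_kernel(a, b, bandwidths):
    """Unweighted sum of Gaussian kernels (assumed combination rule)."""
    return sum(gaussian_kernel(a, b, s) for s in bandwidths)

def mk_mmd2(source, target, bandwidths=(4, 8, 16, 32, 64)):
    """Biased empirical estimate of the squared multikernel MMD."""
    k_ss = multi_kernel(source, source, bandwidths).mean()
    k_tt = multi_kernel(target, target, bandwidths).mean()
    k_st = multi_kernel(source, target, bandwidths).mean()
    return k_ss + k_tt - 2.0 * k_st

# Synthetic features, not bearing data: two batches from the same
# distribution and one batch whose mean is shifted.
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(200, 8))
same_b = rng.normal(0.0, 1.0, size=(200, 8))
shifted = rng.normal(3.0, 1.0, size=(200, 8))

print(mk_mmd2(same_a, same_b))   # small: both batches share one distribution
print(mk_mmd2(same_a, shifted))  # clearly larger: the distributions differ
```

Summing several bandwidths means the estimate stays sensitive whether the two feature clouds differ at a fine or a coarse scale, which is the practical motivation for preferring a multikernel over a single fixed bandwidth.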
3. The Proposed Method
3.1. A Special Designed Residual Network
The ResNet has been proved to have strong feature extraction capability. Considering the size of the bearing fault dataset, ResNet18 is selected as the feature extractor of MLDA. The detailed information of ResNet18 is shown in Table 1. Convolutional layer, batch normalization, rectified linear unit, and fully connected layer are abbreviated as Conv, BN, ReLu, and FC, respectively.

ResNet18 contains 4 blocks, and the internal structure of the block is illustrated in Figure 2.
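The block can be represented as
$$x_{l+1} = h(y_{l}), \qquad (3)$$
$$y_{l} = x_{l} + F(x_{l}), \qquad (4)$$
where x_{l} is the input of the lth block and F is the residual function. Equation (4) represents the identity mapping, and h is the ReLu activation function. Based on equations (3) and (4), the deep features from low-level l to high-level L can be obtained:
$$x_{L} = x_{l} + \sum_{i=l}^{L-1} F(x_{i}). \qquad (5)$$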
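The idea of a residual block, adding the block input back to the output of the residual function before the activation, can be sketched in a few lines of NumPy. This is an illustrative toy under stated assumptions: dense layers stand in for the Conv and BN layers of a real ResNet block, and all names and shapes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_function(x, w1, w2):
    """Toy residual function F: two linear maps with a ReLU in between.
    Dense layers stand in for the Conv-BN layers of a real block."""
    return relu(x @ w1) @ w2

def residual_block(x, w1, w2):
    """Add the shortcut (identity) branch to F(x), then apply the activation."""
    return relu(x + residual_function(x, w1, w2))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))                 # 4 samples, 16 features
w1 = rng.normal(scale=0.1, size=(16, 16))
w2 = rng.normal(scale=0.1, size=(16, 16))

out = residual_block(x, w1, w2)
print(out.shape)  # (4, 16)

# With zero weights F(x) = 0, so the block reduces to relu(x):
zeros = np.zeros((16, 16))
print(np.allclose(residual_block(x, zeros, zeros), relu(x)))  # True
```

Because the shortcut passes the input through unchanged, a block only has to learn a correction to its input, which is what makes stacking many such blocks trainable.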
The original ResNet18 has achieved great success in the field of image recognition. In this paper, ResNet18 is adopted as a feature extractor. To suit the characteristics of the bearing signals, some modifications are made: (1) In order to match the input dimension of the bearing signals, the kernel size of Conv1 is changed to 3 × 3. (2) To retain as much fault information as possible, the max pooling layer is removed. (3) Since ResNet18 is only adopted as a feature extractor and its classification function is not required, the FC layer and softmax layer are removed. The modified ResNet18 can effectively extract domain-invariant features.
3.2. Network Architecture
The architecture of MLDA, which is designed to diagnose bearing faults under variable working conditions, is shown in Figure 3.
The data from D_{s} and D_{t} are used as the input of ResNet18 pretrained by labeled data from the source domain. For extracting domain-invariant features effectively, the discrepancy between the marginal distributions is minimized through calculating MKMMD in multiple layers. The MKMMD loss can be obtained by
$$L_{MKMMD} = \sum_{l=1}^{N^{l}} \sum_{k=1}^{K} d_{k}^{2}(U^{l}, V^{l}),$$
where N^{l} denotes the number of blocks for calculating MKMMD, K is the number of Gaussian kernels, U^{l} and V^{l} represent the distribution of D_{s} and D_{t} extracted from the lth block, and d_{k}^{2}(U^{l}, V^{l}) is the MKMMD calculated by equation (1) with kernel k.
In two domains with different working conditions, the classification categories are the same. Since the label of D_{s} is available, the classification loss can be minimized, and cross-entropy is used as the optimization objective:
$$L_{c} = -\frac{1}{M} \sum_{m=1}^{M} y_{m} \log \hat{y}_{m},$$
where M denotes the number of samples, y represents the true label, and \hat{y} represents the label output by the classifier.
MKMMD can bound the marginal distributions of the extracted features from D_{s} and D_{t}. However, unlabeled data from the target domain cannot be directly used in the training process because supervised information is not available. Pseudolabel learning [33] can be one of the solutions to this problem. The pseudolabel of the specific sample is determined by selecting the label with the maximum probability of prediction, which can be summarized into two steps: the predicted probability of labels and the conversion to the pseudolabel [34]. In MLDA, each block is followed by a matching classifier (FC layer). The predicted probability of labels given by the classifier and the softmax layer can be calculated as
$$p(c \mid y_{i}) = \frac{\exp(W_{c}^{T} y_{i})}{\sum_{j=1}^{C} \exp(W_{j}^{T} y_{i})},$$
where y_{i} is the ith sample, C is the number of categories, and W is the weight of the corresponding category. The conversion of pseudolabels can be expressed as
$$\tilde{y}_{i} = \arg\max_{c} \; p(c \mid y_{i}),$$
where \tilde{y}_{i} denotes the pseudolabel of the ith sample. The correctness of pseudolabels will be improved during the training process so that the conditional distributions could be more similar. The pseudolabel loss of each block can be calculated by cross-entropy:
$$L_{p}^{l} = -\frac{1}{n_{t}} \sum_{i=1}^{n_{t}} \log p(\tilde{y}_{i} \mid y_{i}).$$
The total pseudolabel loss can be obtained by
$$L_{p} = \sum_{l=1}^{N^{l}} L_{p}^{l}.$$
The loss of the overall model can be expressed as
$$L = L_{c} + \lambda_{1} L_{MKMMD} + \lambda_{2} L_{p},$$
where λ_{1} and λ_{2} are tradeoff parameters.
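The pseudolabel steps described above (softmax probabilities, conversion to the most probable label, cross-entropy against that label) and the weighted combination of the three losses can be sketched as follows. This is a hedged illustration only: the logits are random stand-ins for the outputs of the per-block classifiers, the MKMMD value is a placeholder, and the pairing of λ_{1} with the MKMMD term and λ_{2} with the pseudolabel term is an assumption, as are all function names.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability of the given integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def pseudo_labels(probs):
    """Pick the class with the highest predicted probability."""
    return probs.argmax(axis=1)

def total_loss(cls_loss, mkmmd_loss, pseudo_loss, lam1=1.0, lam2=0.01):
    # Assumed pairing: lam1 weights the MKMMD term, lam2 the pseudolabel term.
    return cls_loss + lam1 * mkmmd_loss + lam2 * pseudo_loss

rng = np.random.default_rng(0)
num_classes = 14
source_logits = rng.normal(size=(6, num_classes))
source_labels = rng.integers(0, num_classes, size=6)
# Hypothetical target-domain logits, one array per block classifier.
target_logits_per_block = [rng.normal(size=(6, num_classes)) for _ in range(4)]

# Supervised classification loss on the labeled source batch.
cls_loss = cross_entropy(softmax(source_logits), source_labels)

# Pseudolabel loss: sum of the per-block cross-entropies on the target batch.
pseudo_loss = 0.0
for logits in target_logits_per_block:
    probs = softmax(logits)
    pseudo_loss += cross_entropy(probs, pseudo_labels(probs))

mkmmd_loss = 0.3  # placeholder value; a real run would compute it from features
print(total_loss(cls_loss, mkmmd_loss, pseudo_loss))
```

Minimizing the cross-entropy against a sample's own pseudolabel pushes the classifier toward confident predictions on the target domain, which is why the quality of the pseudolabels matters and why they are expected to improve as training proceeds.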
3.3. Diagnosis Procedure
The flowchart of MLDA is shown in Figure 4.
First, the original vibration signals under different loads are collected from the bearing test platform. The frequency-domain signals are constructed via fast Fourier transform (FFT) and reshaped into two dimensions [35, 36]. Then, the data are divided into a supervised source domain and an unsupervised target domain and further separated into a training set and a test set. Furthermore, in order to accelerate the training process, ResNet18 is pretrained on source-domain data.
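The preprocessing step can be pictured with a short NumPy sketch. The text states only that an FFT is applied and the result is reshaped into two dimensions, so the details here are assumptions made for illustration: a one-sided magnitude spectrum, dropping the last bin so that 1024 values remain, a 32 × 32 layout, and scaling to [0, 1]. The sampling frequency and sample length are the values given in the dataset description; the input is a synthetic tone, not bearing data, and `to_frequency_image` is a hypothetical helper name.

```python
import numpy as np

def to_frequency_image(signal, side=32):
    """Magnitude spectrum of one vibration sample, reshaped to side x side."""
    spectrum = np.abs(np.fft.rfft(signal))          # 2048 points -> 1025 bins
    spectrum = spectrum[: side * side]              # keep the first 1024 bins
    spectrum = spectrum / (spectrum.max() + 1e-12)  # scale to [0, 1]
    return spectrum.reshape(side, side)

fs = 10_000   # sampling frequency in Hz (from the dataset description)
n = 2048      # points per sample (from the dataset description)
t = np.arange(n) / fs

# Synthetic test tone, chosen to fall exactly on frequency bin 100.
tone_hz = 100 * fs / n
signal = np.sin(2 * np.pi * tone_hz * t)

image = to_frequency_image(signal)
print(image.shape)       # (32, 32)
print(np.argmax(image))  # 100
```

With these settings the frequency resolution is fs / n, roughly 4.9 Hz per bin, and each row of the 32 × 32 array covers a contiguous band of the spectrum.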
Second, based on the particular fault diagnosis problem and the input dataset information, the diagnostic model is built and made ready for training. The data from the training set are fed into the pretrained ResNet18, which can extract domain-invariant features. The discrepancies of both marginal distributions and conditional distributions are minimized in multiple layers. On the final layer of the network, the FC layer is adopted to identify the bearing faults with the extracted domain-shared features. The optimization objective of the model (equation (14)) is minimized through the Adam method. By the end of the training process, the loss function of the overall method has generally converged.
Finally, after training, the test set from the target domain is input into the model to evaluate the model capability and output the fault diagnosis results.
4. Experimental Analysis
4.1. Dataset Description
The bearing fault dataset used to evaluate the effectiveness of the proposed MLDA method was collected from the bearing test platform shown in Figure 5. The drive motor, healthy bearing, and test bearing are fixed on the same motor shaft from left to right. The data were collected by an NI PXIe-1082 data acquisition system. The adjustable loading system is mounted in the radial direction of the motor shaft. An SGSF20K dynamometer is installed in the bolt-nut system to measure the load. The sampling frequency of the PCB 352C33 accelerometer is 10 kHz, and the motor speed is 896 rpm. During the bearing operation cycle, the accelerometer continuously collects bearing signal data.
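A quick calculation relates the sampling frequency and the motor speed to the number of data points recorded per shaft rotation. It uses only the figures stated for this test platform, together with the 2048-point sample length reported for the dataset; the variable names are illustrative.

```python
fs_hz = 10_000       # accelerometer sampling frequency
speed_rpm = 896      # motor speed
sample_points = 2048  # data points per sample

shaft_hz = speed_rpm / 60            # shaft rotations per second
samples_per_rev = fs_hz / shaft_hz   # data points recorded per rotation
revs_per_sample = sample_points / samples_per_rev

print(round(shaft_hz, 2))         # 14.93
print(round(samples_per_rev, 1))  # 669.6
print(round(revs_per_sample, 2))  # 3.06
```

Each sample therefore spans about three shaft rotations, which is long enough for the periodic impacts of a localized fault to repeat within a single sample.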
The 14 health conditions are gathered under four working conditions with different loads (0 kN, 1 kN, 2 kN, and 3 kN). Ten of them are the normal bearing (NO) and the single faults: inner race fault (IF), outer race fault (OF), and ball fault (BF), where each fault type covers three fault diameters. Furthermore, four kinds of compound faults are machined with a width of 0.2 mm: inner race and ball fault (IB), inner race and outer race fault (IO), outer race and ball fault (OB), and inner race, outer race, and ball fault (IOB). For the sake of clarity, all 14 fault patterns are summarized in Table 2. Each sample contains 2048 data points.
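The count of 14 classes follows from one normal condition, three single-fault types at three sizes each, and four compound faults. The sketch below enumerates them and assigns integer labels. The size suffixes "02", "03", and "04" are assumptions made for illustration (the text elsewhere mentions labels such as BF03 and BF04); the authoritative names and sizes are those of Table 2, and the label order is arbitrary.

```python
single_fault_types = ["IF", "OF", "BF"]
# Hypothetical size suffixes; the text only mentions labels such as BF03 and BF04.
size_suffixes = ["02", "03", "04"]
compound_faults = ["IB", "IO", "OB", "IOB"]

classes = ["NO"]
classes += [f"{fault}{size}" for fault in single_fault_types for size in size_suffixes]
classes += compound_faults

# Map each class name to an integer label for training.
label_of = {name: index for index, name in enumerate(classes)}

print(len(classes))     # 14
print(label_of["IOB"])  # 13
```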

When the bearing rotates at a constant speed, different fault patterns generate different vibration signals. The vibration signals under 0 kN load are shown in Figure 6. The vibration signal of the healthy bearing is relatively stable. For single faults, the periodicity of IF and OF is clearly visible. However, the vibration signal of the BF shows no obvious periodicity or amplitude change, which makes it difficult to identify. Compared with the single fault, the amplitude of the compound fault increases significantly and varies greatly. The complexity of compound fault signals makes feature extraction difficult and brings challenges to academic research [37].
Figure 6: Vibration signals under 0 kN load, panels (a) to (h).
4.2. Comparison of Different Signal Processing Methods
In order to extract features from fault-related bearing signals, traditional methods apply a variety of signal processing techniques. In this part, we carried out time, frequency, and time-frequency analysis to decide the best signal preprocessing method for the MLDA model. Three experiments are carried out: (1) input time-domain signals to the MLDA model, (2) input time-frequency domain signals obtained by empirical mode decomposition (EMD), and (3) input frequency-domain signals obtained by FFT.
In the experiment, the learning rate is set as 0.0001, and λ_{1} and λ_{2} are set as 1 and 0.01, respectively. The MKMMD adopts a mixture of 5 Gaussian kernels with bandwidths of 4, 8, 16, 32, and 64. A total of 12 transfer tasks are conducted. For example, transfer task 01 indicates that 0 kN is the source domain and 1 kN is the target domain; the other tasks are named in the same way.
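The 12 transfer tasks are simply all ordered pairs of distinct loads among the four working conditions. A short sketch enumerates them using the naming convention above (source load followed by target load) and gathers the stated hyperparameters in one place; the dictionary keys are illustrative names, not identifiers from the authors' code.

```python
from itertools import permutations

loads_kn = [0, 1, 2, 3]

# Every ordered (source, target) pair of distinct loads is one transfer task.
tasks = [f"{src}{tgt}" for src, tgt in permutations(loads_kn, 2)]

# Hyperparameters as stated in the text; the key names are hypothetical.
config = {
    "learning_rate": 1e-4,
    "lambda_1": 1.0,
    "lambda_2": 0.01,
    "kernel_bandwidths": [4, 8, 16, 32, 64],
}

print(len(tasks))  # 12
print(tasks[:3])   # ['01', '02', '03']
```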
The results are shown in Table 3. FFT + MLDA achieves the best results, which demonstrates the power of the deep neural network to model the fault-related nonlinear vibration signals. Hence, our method does not need complicated signal preprocessing techniques. Simply transforming the signal from the time domain to the frequency domain by FFT is sufficient for the MLDA model.

4.3. Comparison of Different Parameter Settings
Different parameter settings will bring different effects on the experimental results. For verifying the effect of multilayer domain adaptation, two groups of experiments are designed to apply MKMMD and pseudolabel learning only on block 1 and block 4, respectively, represented as MLDA1 and MLDA4. Furthermore, in order to prove the effect of MKMMD, only one Gaussian kernel with a bandwidth of 4 is set as the third group of comparisons, represented as MLDASK. The experimental results are shown in Table 4.

As shown in Table 4, MLDA achieves the best accuracy in almost all tasks, with an average of 99.14%. It can be seen from MLDA1 and MLDA4 that both low-level features and high-level features cause domain shift to a certain extent. Matching the discrepancy of high-level features yields better accuracy, which indicates that the transferability of high-level features is better than that of low-level features. Moreover, when the source domain is relatively different from the target domain, such as in the transfer between 0 kN and 3 kN, the advantages of applying domain adaptation in multiple layers are obvious. The comparison with MLDASK shows that a single-kernel MMD has a limited capability to narrow the gap between the marginal distributions. MLDA makes a great improvement in all 12 tasks, which proves the effect of mixed kernels. The radar diagram in Figure 7 shows the diagnostic results of different parameter settings. It can be seen intuitively that MLDA achieves the best results.
Figure 8 shows the classification results of the proposed method for the 03 transfer task. The fault characteristics of NO, IF, and OF are relatively obvious. All samples are diagnosed correctly. Misdiagnosis mostly occurs in BF04, IB, and IOB. The diagnostic results of BF04 are mainly the fault diameter recognition error, which is classified as BF03. IB is misclassified as BF04 and IOB. IOB is the category with the most misdiagnoses, and all the misclassified samples are identified as BF04. BF is easily misclassified because its fault characteristics are not obvious. Since the compound fault is a mixture of multiple fault types, the feature extraction is difficult with mixture features, especially the mixture of three fault types.
4.4. Comparison of Different Models
In order to further demonstrate the effect of MLDA, traditional TL methods are investigated, which include transfer component analysis (TCA) [38], joint distribution adaptation (JDA) [39], correlation alignment (CORAL) [40], and the pretraining model (ResNet18). The comparison results are illustrated in Table 5.

From the comparison results of different methods, three conclusions can be drawn: (1) The best performance among the 5 methods is clearly achieved by MLDA. Without domain adaptation, the target-domain data cannot be diagnosed effectively by the pretraining model (ResNet18). (2) The traditional TL methods can achieve good results when the discrepancy is relatively small, such as in the transfer between 0 kN and 1 kN. This can be explained by the fact that the transferability of the extracted features and the degree of domain shift depend on how much the working conditions differ. (3) The traditional methods transfer poorly when the working conditions change dramatically, which decreases the accuracy of fault diagnosis. Notably, the proposed MLDA method maintains satisfactory accuracy and generalizability.
Figure 9 illustrates the diagnostic results using different models. Clearly, the best performance in the various transfer tasks is achieved by the proposed MLDA method, which proves its superiority.
Although MLDA achieves satisfactory results, it remains unclear whether MLDA can extract domain-invariant features. The t-SNE method [41], which reduces the dimension of the features, is introduced to visualize the features extracted by each method. The results for transfer task 30 are shown in Figure 10. 20 feature points are sampled randomly in each category.
As a pretraining model, ResNet18 has strong feature extraction ability. However, the discrepancy between two domains can hardly be narrowed in the ResNet18 method. The other three traditional TL methods can narrow the distribution shift between different domain features to a certain extent, but the capability is limited. The proposed MLDA method learns the feature mapping from source and target domains to the shared feature space, decreasing domain shift and effectively using the knowledge learned from the shared feature extractor to diagnose the target domain through unsupervised learning. It can clearly extract domain-invariant features with high generalizability.
5. Conclusions
In summary, this paper develops an MLDA method for bearing fault diagnosis, which can diagnose compound faults and single faults of multiple sizes simultaneously. First, the modified ResNet18 is pretrained as a feature extractor. The MKMMD is calculated for the extracted features in multiple layers to narrow the gap between the marginal distributions. Second, the features extracted from each block are input into the matching classifier. The predicted probability is calculated through the softmax layer and converted into the pseudolabel to narrow the gap between the conditional distributions. Third, the Adam optimization method is adopted to optimize the overall model parameters and speed up the convergence of the model. Through the comparisons of different signal processing methods, different parameter settings, and different methods, the proposed MLDA method classifies the fault patterns precisely and achieves better transfer performance. The proposed method is meaningful to prognostics and health management (PHM) and can provide reliable fault diagnosis results for practical industrial scenarios.
Data Availability
The data can be obtained from the Institute of Industrial Measurement, Control and Equipment Diagnostics, School of Rail Transportation, Soochow University.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was financially supported by the National Natural Science Foundation of China (no. 51875375) and the Suzhou Science Foundation (no. SYG201802).
References
 D.T. Hoang and H.J. Kang, “A survey on Deep Learning based bearing fault diagnosis,” Neurocomputing, vol. 335, pp. 327–335, 2019. View at: Publisher Site  Google Scholar
 D.T. Hoang and H.J. Kang, “Rolling element bearing fault diagnosis using convolutional neural network and vibration image,” Cognitive Systems Research, vol. 53, pp. 42–50, 2019. View at: Publisher Site  Google Scholar
 S. Park, S. Kim, and J.H. Choi, “Gear fault diagnosis using transmission error and ensemble empirical mode decomposition,” Mechanical Systems and Signal Processing, vol. 108, pp. 262–275, 2018. View at: Publisher Site  Google Scholar
 C. Wang, M. Gan, and C. a. Zhu, “A supervised sparsitybased wavelet feature for bearing fault diagnosis,” Journal of Intelligent Manufacturing, vol. 30, no. 1, pp. 229–239, 2019. View at: Publisher Site  Google Scholar
 X. Jiang, C. Shen, J. Shi, and Z. Zhu, “Initial center frequencyguided VMD for fault diagnosis of rotating machines,” Journal of Sound and Vibration, vol. 435, pp. 36–55, 2018. View at: Publisher Site  Google Scholar
 X. Yu, F. Dong, E. Ding, S. Wu, and C. Fan, “Rolling bearing fault diagnosis using modified LFDA and EMD with sensitive feature selection,” IEEE Access, vol. 6, pp. 3715–3730, 2018. View at: Publisher Site  Google Scholar
 Y. Liu, “Onelevel stationary wavelet packet transform & hilbert transform based rolling bearing fault diagnosis,” in Proceedings of the 2018 IEEE International Conference on Information and Automation (ICIA), pp. 1475–1479, Fujian, China, August 2018. View at: Google Scholar
 G. Chen, Y. Wu, and L. Fu, “Fault diagnosis of fullhydraulic drilling rig based on RS – SVM data fusion method,” Journal of the Brazilian Society of Mechanical Sciences and Engineering, vol. 40, no. 3, pp. 1–11, 2018. View at: Publisher Site  Google Scholar
 J. Yu, M. Bai, G. Wang, and X. Shi, “Fault diagnosis of planetary gearbox using wavelet packet transform and flexible naive bayesian classifier,” in Proceedings of the 2017 36th Chinese Control Conference (CCC), pp. 7207–7211, IEEE, Dalian, China, July 2017. View at: Google Scholar
 H. Fenineche, A. Felkaoui, and A. Rezig, “A effect of input data on the neural networks performance applied in bearing fault diagnosis,” in Proceedings of the Signal Processing Applied to Rotating Machinery Diagnostics, (SIGPROMD’2017), pp. 34–43, Springer, Setif, Algeria, April 2017. View at: Google Scholar
 G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006. View at: Publisher Site  Google Scholar
 S. Zhang, M. Wang, W. Li, J. Luo, and Z. Lin, “Deep learning with emerging new labels for fault diagnosis,” IEEE Access, vol. 7, pp. 6279–6287, 2019. View at: Publisher Site  Google Scholar
 S. Dong, Z. Zhang, G. Wen et al., “Design and application of unsupervised deep belief networks for mechanical fault,” in Proceedings of the 2017 Prognostics and System Health Management Conference (PHMHarbin), pp. 1–7, IEEE, Harbin, China, July 2017. View at: Google Scholar
 X. Guo, L. Chen, and C. Shen, “Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis,” Measurement, vol. 93, pp. 490–502, 2016. View at: Publisher Site  Google Scholar
 J. Jiao, M. Zhao, J. Lin, and J. Zhao, “A multivariate encoder information based convolutional neural network for intelligent fault diagnosis of planetary gearboxes,” KnowledgeBased Systems, vol. 160, pp. 237–250, 2018. View at: Publisher Site  Google Scholar
 Y. Xue, D. Dou, and J. Yang, “Multifault diagnosis of rotating machinery based on deep convolution neural network and support vector machine,” Measurement, vol. 156, 2020. View at: Publisher Site  Google Scholar
 J. An, P. Ai, and D. Liu, “Deep domain adaptation model for bearing fault diagnosis with domain alignment and discriminative feature learning,” Shock and Vibration, vol. 2020, Article ID 4676701, 14 pages, 2020. View at: Google Scholar
 C. Tan, F. Sun, T. Kong et al., “A survey on deep transfer learning,” in Proceedings of the International Conference on Artificial Neural Networks, pp. 270–279, Munich, Germany, September 2018. View at: Google Scholar
 S. B. John, B. Koby, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” Machine Learning, vol. 79, no. 12, pp. 151–175, 2010. View at: Google Scholar
 D. Peng, Z. Liu, H. Wang, Y. Qin, and L. Jia, “A novel deeper one-dimensional CNN with residual learning for fault diagnosis of wheelset bearings in high-speed trains,” IEEE Access, vol. 7, pp. 10278–10293, 2019.
 L. Wen, L. Gao, and X. Li, “A new deep transfer learning based on sparse autoencoder for fault diagnosis,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 1, pp. 136–144, 2019.
 X. Li, W. Zhang, and Q. Ding, “A robust intelligent fault diagnosis method for rolling element bearings based on deep distance metric learning,” Neurocomputing, vol. 310, pp. 77–95, 2018.
 W. Zhang, G. Peng, C. Li, Y. Chen, and Z. Zhang, “A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals,” Sensors, vol. 17, no. 2, p. 425, 2017.
 X. Li, W. Zhang, Q. Ding, and J.-Q. Sun, “Multi-layer domain adaptation method for rolling bearing fault diagnosis,” Signal Processing, vol. 157, pp. 180–197, 2019.
 J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Proceedings of the 27th International Conference on Neural Information Processing Systems, pp. 3320–3328, Montreal, Canada, Advances in Neural Information Processing Systems, 2014.
 R. Aljundi and T. Tuytelaars, “Lightweight unsupervised domain adaptation by convolutional filter reconstruction,” in Proceedings of the European Conference on Computer Vision, pp. 508–515, Glasgow, UK, August 2016.
 K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, NV, USA, June 2016.
 S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
 K. Weiss, T. M. Khoshgoftaar, and D. Wang, “A survey of transfer learning,” Journal of Big Data, vol. 3, no. 1, 2016.
 Q. Li, C. Shen, L. Chen, and Z. Zhu, “Knowledge mapping-based adversarial domain adaptation: a novel fault diagnosis method with high generalizability under variable working conditions,” Mechanical Systems and Signal Processing, vol. 147, 2021.
 M. Long, Y. Cao, J. Wang, and M. I. Jordan, “Learning transferable features with deep adaptation networks,” in Proceedings of the International Conference on Machine Learning, PMLR, pp. 97–105, Lille, France, July 2015.
 A. Gretton, B. Sriperumbudur, D. Sejdinovic, H. Strathmann, and M. Pontil, “Optimal kernel choice for large-scale two-sample tests,” in Proceedings of the Neural Information Processing Systems, pp. 1205–1213, Lake Tahoe, NV, USA, December 2012.
 D.-H. Lee, “Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks,” in Proceedings of the International Conference on Machine Learning, ICML, pp. 2–8, Atlanta, GA, USA, June 2013.
 B. Yang, Y. Lei, F. Jia, and S. Xing, “An intelligent fault diagnosis approach based on transfer learning from laboratory bearings to locomotive bearings,” Mechanical Systems and Signal Processing, vol. 122, pp. 692–706, 2019.
 F. Immovilli, M. Cocconcelli, A. Bellini, and R. Rubini, “Detection of generalized-roughness bearing fault by spectral-kurtosis energy of vibration or current signals,” IEEE Transactions on Industrial Electronics, vol. 56, no. 11, pp. 4710–4717, 2009.
 C. Shen, Y. Qi, J. Wang, G. Cai, and Z. Zhu, “An automatic and robust features learning method for rotating machinery fault diagnosis based on contractive autoencoder,” Engineering Applications of Artificial Intelligence, vol. 76, pp. 170–184, 2018.
 X. Wang, X. Zhang, Z. Li, and J. Wu, “Ensemble extreme learning machines for compound-fault diagnosis of rotating machinery,” Knowledge-Based Systems, vol. 188, 2019.
 S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 1187–1192, 2010.
 M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, “Transfer feature learning with joint distribution adaptation,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 2200–2207, Sydney, Australia, December 2013.
 X. Wang, H. He, and L. Li, “A hierarchical deep domain adaptation approach for fault diagnosis of power plant thermal system,” IEEE Transactions on Industrial Informatics, vol. 15, no. 9, pp. 5139–5148, 2019.
 L. Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.
Copyright
Copyright © 2020 Bingru Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.