Abstract

Current studies on intelligent bearing fault diagnosis based on transfer learning have been fruitful. However, these methods mainly focus on the transfer fault diagnosis of bearings under different working conditions. In engineering practice, it is often difficult or even impossible to obtain a large amount of labeled data from some machines, and an intelligent diagnostic method trained by labeled data from one machine may not be able to classify unlabeled data from other machines, strongly hindering the application of these intelligent diagnostic methods in certain industries. In this study, a deep transfer learning method for bearing fault diagnosis, domain separation reconstruction adversarial networks (DSRAN), was proposed for transfer fault diagnosis between machines. In DSRAN, domain-difference and domain-invariant feature extractors are used to extract and separate domain-difference and domain-invariant features, respectively. Moreover, the idea of generative adversarial networks (GAN) was used to improve the network's learning of domain-invariant features. By using domain-invariant features, DSRAN can adapt to the distributions of the data in the source and target domains. Six transfer fault diagnosis experiments were performed to verify the effectiveness of the proposed method, and the average accuracy reached 89.68%. The results showed that a DSRAN model trained by labeled data obtained from one machine can be used to identify the health state of unlabeled data obtained from other machines.

1. Introduction

Intelligent fault diagnosis can be successful only when two conditions are met [1]. First, the model should be trained with a large amount of labeled fault data. Second, the training data and test data should have the same probability distribution. However, in practical applications, it is often difficult to meet both conditions. First, it is difficult to obtain labeled fault data from some machines [2]. Such machines must be kept from running to failure because unexpected failures may lead to catastrophic accidents and heavy losses. Moreover, it may take a long time for a machine to degrade from a healthy state to failure, so collecting fault data from such a machine is very time-consuming. Second, data are currently labeled manually in most cases, so labeling a large amount of data is expensive and time-consuming. Third, the probability distributions of data obtained from different machines are different, and the classification performance of intelligent fault diagnosis methods can be significantly weakened when the training and test data sets are obtained from different machines.

As a new machine learning method, transfer learning can make full use of the knowledge learned in the auxiliary domain to solve new but related tasks in the target domain [3], thereby solving problems without enough labeled data to train the model. Moreover, transfer learning brings different domains close to each other by learning domain-invariant features, thereby effectively reducing the differences in the data distribution between the source and target domains.

To date, transfer learning has been widely used in a variety of applications, and many intelligent fault diagnosis methods based on deep transfer learning have been proposed. Zhang et al. [4] enhanced the domain adaptation capability of a bearing fault diagnosis model using high-order Kullback–Leibler (KL) divergence, and the effectiveness of the model was verified under different working conditions. Wang et al. [5, 6] proposed an adaptive spectrum mode extraction-based fault diagnosis method. Che et al. [7] proposed a deep transfer learning method for rolling bearing fault diagnosis under variable operating conditions. By combining model-based transfer learning with feature-based transfer learning, the versatility of the convolutional neural network (CNN) under variable operating conditions was improved. Wen et al. [8] also developed a deep transfer learning method for fault diagnosis that was tested on bearing data sets collected under different loading conditions. In addition, to address the problem that the performance of traditional intelligent fault diagnosis algorithms degrades significantly when the workload changes, Guo et al. [9] proposed a transfer learning method and verified it through experiments on the fault diagnosis of a wind turbine gearbox.

Based on the abovementioned studies, currently available intelligent fault diagnosis methods based on transfer learning mainly focus on transfer between different working and loading conditions. However, in engineering practice, it is difficult to obtain a large amount of labeled data from a machine for model training. Hence, it is of great scientific and practical engineering significance to study transfer fault diagnosis between machines, so that a model trained by labeled data obtained from one machine can be extended to unlabeled data obtained from other machines.

In this study, a deep transfer learning method for bearing fault diagnosis based on domain separation and adversarial learning, domain separation reconstruction adversarial networks (DSRAN), was proposed. Specifically, domain-difference and domain-invariant feature extractors were used to extract domain-difference and domain-invariant features, respectively, of the source and target domains. To ensure the integrity of the features, the two types of features were integrated and reconstructed, and then the training was based on the idea of the generative adversarial network (GAN). When the classifier can correctly classify data in the source domain, the model can be used in cross-domain applications using invariant features.

The major contributions of this study are summarized as follows:

(1) A new deep transfer learning method for bearing fault diagnosis was proposed. By directly learning the nonlinear mapping relationship between the original vibration signals and the health state of the bearing, this method automatically extracts the fault features and identifies bearing faults end to end. Domain-difference and domain-invariant feature extractors are used to effectively extract and separate the domain-difference and domain-invariant features, respectively. Moreover, the idea of GAN was used to improve the network's learning of domain-invariant features. By using domain-invariant features, negative transfer can be avoided effectively.

(2) Transfer fault diagnosis of bearings between machines was carried out. Traditional intelligent fault diagnosis methods collect the training and test data sets from the same machine. In this study, the training and test data sets were collected from different yet related machines, with the data samples in the test set being unlabeled. This exploration of intermachine transfer can promote the application of intelligent fault diagnosis in engineering practice.

The structure of this paper is as follows: In Section 1, the research background of transfer fault diagnosis is introduced. Section 2 formulates the transfer diagnosis problem, and Section 3 presents the proposed deep transfer diagnosis method. Section 4 describes the experiments and results of transfer fault diagnosis on three bearing data sets. The conclusions are given in Section 5.

2. Transfer Diagnosis

To describe the transfer diagnosis problem in this study, certain definitions in transfer learning [10] need to be introduced. $S = \{(x_i, y_i)\}_{i=1}^{n}$ is a data set with $n$ samples, where sample $x_i$ is labeled as $y_i$. $x_i \in \mathcal{X}$, where $\mathcal{X}$ denotes the sample space, and $y_i \in \mathcal{Y}$, where $\mathcal{Y} = \{1, 2, \ldots, C\}$ refers to the label space and $C$ is the number of health states. In addition, since the sample data follow the marginal probability distribution $P(X)$, a specific domain in transfer learning is defined as $\mathcal{D} = \{\mathcal{X}, P(X)\}$. Traditional intelligent fault diagnosis methods obtain the training set and the test set from the same domain; therefore, the two data sets have the same feature space and probability distribution. However, transfer learning obtains the training set and the test set from the source and the target domains, respectively. The feature spaces of the two data sets can be the same or different, but the probability distributions are different.

Based on the above definitions of transfer learning, the transfer learning problem for bearing fault diagnosis between machines is described as follows:

(1) The source domain $\mathcal{D}_s$ consists of the sample space $\mathcal{X}_s$ of labeled data obtained from one machine and its marginal probability distribution $P_s(X_s)$, $\mathcal{D}_s = \{\mathcal{X}_s, P_s(X_s)\}$. $n_s$ samples with label information are collected from the source domain, $S_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$.

(2) The target domain $\mathcal{D}_t$ is composed of the sample space $\mathcal{X}_t$ of unlabeled data obtained from other machines and its marginal probability distribution $P_t(X_t)$, $\mathcal{D}_t = \{\mathcal{X}_t, P_t(X_t)\}$. $n_t$ samples whose health states are to be identified are extracted from the target domain $\mathcal{D}_t$, that is, $S_t = \{x_j^t\}_{j=1}^{n_t}$.

(3) There should be correlated fault information between the source and the target domains [10]. Moreover, the data in the source and the target domains share the same label space, $\mathcal{Y}_s = \mathcal{Y}_t$, while their probability distributions differ, $P_s(X_s) \neq P_t(X_t)$.

The data from the source domain are used for training, and a nonlinear mapping relationship between the sample space and the label space can be established, $f: \mathcal{X}_s \rightarrow \mathcal{Y}_s$. Since the data in the target and source domains have different distributions, the identification accuracy may be low if the fault diagnosis knowledge $f$ obtained from the source domain is directly used to identify the health state of the unlabeled samples in the target domain. Therefore, in this study, a deep transfer diagnosis model was constructed to reduce the data distribution differences between the source and the target domains by learning domain-invariant features, so that the fault diagnosis knowledge obtained from one machine can be used to identify the health state of the unlabeled data obtained from other machines.

3. Deep Transfer Diagnosis of Bearing Faults

The bearing fault data obtained from different machines share the same feature space. It is assumed that all of the domains consist of two types of features, namely, domain-invariant and domain-difference features. Domain-invariant features have the same or similar classification capabilities in different domains, while domain-difference features often have strong classification capabilities in one domain but poor classification performance in another domain. In transfer fault diagnosis between domains, if the domain-difference features are transferred, negative transfer will occur [11]. Thus, the DSRAN method [12] proposed in this study has two main functions: (1) extracting domain-invariant features in different domains and (2) conducting transfer fault diagnosis based on the extracted invariant features.

3.1. DSRAN

To represent different domain features, the DSRAN method adopts the domain-difference and the domain-invariant modules to extract the domain-difference and domain-invariant features in a domain. The goal of network training is to apply the domain-invariant features obtained by training to different domains and prevent the model from being affected by the distribution differences between the domains. As shown in Figure 1, the DSRAN method is composed of the following six parts:

3.1.1. Target-Difference Feature Extractor

The target-difference feature extractor that is used to extract the domain-difference features of the target domain adopts the structure of a deep CNN. The parameters of the extractor are shown in Table 1.

3.1.2. Source-Difference Feature Extractor

The source-difference feature extractor has the same network structure as that of the target-difference feature extractor.
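
To make the extractor structure concrete, the following is a minimal PyTorch sketch of a 1-D CNN feature extractor of the kind described above. The layer sizes, the wide first-layer kernel, and the class name are placeholder assumptions for illustration; the actual parameters are those listed in Table 1.

```python
import torch
from torch import nn

class DifferenceFeatureExtractor(nn.Module):
    """Minimal 1-D CNN feature extractor; layer sizes are placeholders,
    not the exact parameters of Table 1."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            # Wide first-layer kernel, a common choice for raw vibration signals
            nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28),
            nn.BatchNorm1d(16), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(4),
        )
        self.fc = nn.Linear(32 * 4, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, signal_length) raw vibration segments
        return self.fc(self.conv(x).flatten(1))
```

The target-difference and source-difference extractors would then be two independent instances of this class, each trained with its own weights.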

3.1.3. Domain-Invariant Feature Extractor

The domain-invariant feature extractor that is used to extract the domain-invariant features of the source and target domains has a network structure similar to that of the domain-difference feature extractors, as shown in Table 2.

3.1.4. Reconstructor

The reconstructor combines the domain-difference and domain-invariant features of the source or the target domain and then sends them into the convolutional autoencoder for decoding in order to reconstruct the original signals. Table 3 shows the network parameters of the reconstructor.

3.1.5. Discriminator

The discriminator determines whether the input sample is the original or reconstructed input. Table 4 shows the network parameters of the discriminator.

3.1.6. Classifier

The domain-invariant feature extractor extracts the shared invariant features between the source and target domains. Some of these domain-invariant features possess a strong ability to classify the fault data in the target domain, while others perform poorly in such classification. The classifier can help the network extract domain-invariant features with strong classification ability. During the training stage, the classifier is used to classify the samples in the source domain to ensure that the training proceeds in the expected direction. Once the training is completed, the classifier is used to classify the unlabeled data in the target domain.
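
As a rough sketch of how the six modules fit together, the composition below wires the extractor sketch above together with hypothetical reconstructor, discriminator, and classifier modules into a single forward pass. The Reconstructor stub is a fully connected placeholder rather than the convolutional autoencoder of Table 3, and all names and sizes are assumptions.

```python
class Reconstructor(nn.Module):
    """Placeholder decoder mapping concatenated [h_d, h_c] back to a signal.
    The paper's reconstructor is a convolutional autoencoder (Table 3)."""
    def __init__(self, feat_dim: int = 128, signal_len: int = 2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim, 512), nn.ReLU(),
                                 nn.Linear(512, signal_len))

    def forward(self, h):
        return self.net(h).unsqueeze(1)  # (batch, 1, signal_len)

class Discriminator(nn.Module):
    """Placeholder discriminator: original vs. reconstructed, returns logits."""
    def __init__(self, signal_len: int = 2048):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(signal_len, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

class DSRAN(nn.Module):
    """Sketch of the six-part composition described in Section 3.1."""
    def __init__(self, feat_dim: int = 128, n_classes: int = 3):
        super().__init__()
        self.tgt_diff = DifferenceFeatureExtractor(feat_dim)   # 3.1.1
        self.src_diff = DifferenceFeatureExtractor(feat_dim)   # 3.1.2
        self.invariant = DifferenceFeatureExtractor(feat_dim)  # 3.1.3, shared
        self.reconstructor = Reconstructor(feat_dim)           # 3.1.4
        self.discriminator = Discriminator()                   # 3.1.5
        self.classifier = nn.Sequential(                       # 3.1.6
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x_s, x_t):
        h_ds, h_cs = self.src_diff(x_s), self.invariant(x_s)  # source features
        h_dt, h_ct = self.tgt_diff(x_t), self.invariant(x_t)  # target features
        x_s_hat = self.reconstructor(torch.cat([h_ds, h_cs], dim=1))
        x_t_hat = self.reconstructor(torch.cat([h_dt, h_ct], dim=1))
        logits = self.classifier(h_cs)  # only source samples carry labels
        return (h_ds, h_cs, h_dt, h_ct), (x_s_hat, x_t_hat), logits
```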

3.2. Loss Functions

To train each module of the network effectively, five loss functions are established to constrain the training process.

3.2.1. Difference Loss

During training, sample $x_i^s$ from the source domain is fed into the source-difference and the domain-invariant feature extractors, from which the source-difference feature $h_d^s$ and the source-invariant feature $h_c^s$ are obtained. Similarly, after being processed by the two target feature extractors, the target-difference feature $h_d^t$ and the target-invariant feature $h_c^t$ are obtained. To ensure good results, the DSRAN method must effectively avoid negative transfer by completely separating the domain-difference and the domain-invariant features. Therefore, the difference loss function is proposed to constrain the two types of features as follows:

$$L_{diff} = \left\| {H_c^s}^{\top} H_d^s \right\|_F^2 + \left\| {H_c^t}^{\top} H_d^t \right\|_F^2,$$

where $H_c^s$, $H_d^s$, $H_c^t$, and $H_d^t$ are the matrices whose rows are the batch features $h_c^s$, $h_d^s$, $h_c^t$, and $h_d^t$, and the squared Frobenius norm $\|\cdot\|_F^2$ calculates the similarities between $h_c^s$ and $h_d^s$ as well as between $h_c^t$ and $h_d^t$. When $h_c^s = h_d^s$ and $h_c^t = h_d^t$, $L_{diff}$ reaches its maximum; when $h_c^s$ is orthogonal to $h_d^s$ and $h_c^t$ is orthogonal to $h_d^t$, $L_{diff}$ reaches its minimum. Therefore, by minimizing $L_{diff}$, $h_c^s$ and $h_d^s$ as well as $h_c^t$ and $h_d^t$ can be completely separated.
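
A batch-level sketch of this loss follows, with $H_c$ and $H_d$ the matrices whose rows are the invariant and difference features of a batch. The zero-mean and L2 normalization steps are a common stabilization borrowed from domain separation networks and are an assumption here, not something stated in the text.

```python
import torch
import torch.nn.functional as F

def difference_loss(h_c: torch.Tensor, h_d: torch.Tensor) -> torch.Tensor:
    """||H_c^T H_d||_F^2: squared Frobenius norm of the cross-correlation
    between invariant and difference features; zero when orthogonal."""
    h_c = F.normalize(h_c - h_c.mean(0, keepdim=True), p=2, dim=1)
    h_d = F.normalize(h_d - h_d.mean(0, keepdim=True), p=2, dim=1)
    corr = h_c.t() @ h_d           # (feat_dim, feat_dim)
    return (corr ** 2).sum()

# Total difference loss over both domains:
# l_diff = difference_loss(h_cs, h_ds) + difference_loss(h_ct, h_dt)
```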

3.2.2. Similarity Loss

Even when $h_c^s$ and $h_d^s$ as well as $h_c^t$ and $h_d^t$ are completely separated, $h_c^s$ and $h_c^t$ may not necessarily be transferable. Hence, the similarity loss function is proposed to improve the similarity between the two. The loss function of the domain-adversarial neural network (DANN) [13] is applied, as shown in the following equation:

$$L_{sim} = -\frac{1}{N} \sum_{i=1}^{N} \left[ d_i \log \hat{d}_i + \left(1 - d_i\right) \log\left(1 - \hat{d}_i\right) \right],$$

where $d_i$ denotes the real domain of the $i$-th input sample $x_i$ and determines whether $x_i$ belongs to the source domain ($d_i = 0$) or the target domain ($d_i = 1$), and $\hat{d}_i$ represents the predicted domain label of the $i$-th input sample $x_i$, $\hat{d}_i \in [0, 1]$.

The similarity loss measures the difference between $h_c^s$ and $h_c^t$. When $L_{sim}$ reaches its minimum, $h_c^s$ and $h_c^t$ are so similar to each other that they almost follow the same distribution pattern, and when the distributions of $h_c^s$ and $h_c^t$ are approximately the same, the features extracted by the domain-invariant feature extractor are domain-invariant. In this way, the classifier that is effective in the source domain can be transferred to the target domain.
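
One standard way to implement this DANN-style similarity loss is a gradient reversal layer in front of a small domain discriminator: the discriminator is trained to tell source from target, while the reversed gradient pushes the invariant extractor toward indistinguishable features. The sketch below follows that recipe; the hidden size and the reversal weight `lam` are assumptions.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; gradient multiplied by -lam backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainCritic(nn.Module):
    """Predicts the domain of an invariant feature (source = 0, target = 1)."""
    def __init__(self, feat_dim: int = 128, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, h_c: torch.Tensor) -> torch.Tensor:
        return self.net(GradReverse.apply(h_c, self.lam))

# Similarity loss over a source batch h_cs and a target batch h_ct:
# bce = nn.BCEWithLogitsLoss()
# l_sim = bce(critic(h_cs), torch.zeros(len(h_cs), 1)) \
#       + bce(critic(h_ct), torch.ones(len(h_ct), 1))
```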

3.2.3. Reconstruction Loss

Although the difference and the similarity loss functions can completely separate $h_c^s$ from $h_d^s$ and $h_c^t$ from $h_d^t$ and ensure that $h_c^s$ and $h_c^t$ follow the same or similar distribution patterns, the integrity of the data in the source and target domains cannot be guaranteed. However, the reconstructor can ensure the integrity and validity of the features. To form an adversarial relationship between the reconstructor and the discriminator, the reconstructor is updated, and the reconstruction loss is defined. The binary cross-entropy loss serves as the reconstruction loss, as shown in the following equation:

$$L_{rec} = -\frac{1}{N_r} \sum_{i=1}^{N_r} y_i \log \hat{p}_i,$$

where $N_r$ denotes the total number of reconstructed samples; $y_i$ refers to the labels corresponding to the reconstructed samples, all of which are set to 1; and $\hat{p}_i$ represents the probability assigned to the $i$-th reconstructed sample for label 1.
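
In code, this loss reduces to binary cross-entropy against an all-ones target; a minimal sketch, assuming the discriminator D returns logits:

```python
import torch
from torch import nn

bce = nn.BCEWithLogitsLoss()

def reconstruction_loss(D: nn.Module, x_hat: torch.Tensor) -> torch.Tensor:
    """Rewards the reconstructor when D scores its outputs as 'original' (1)."""
    ones = torch.ones(x_hat.size(0), 1, device=x_hat.device)
    return bce(D(x_hat), ones)
```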

3.2.4. Discrimination Loss

Discriminator $D$ is included to classify the original and reconstructed samples accurately, with the former labeled 1 and the latter labeled 0. The discrimination loss is calculated by the binary cross-entropy loss function as shown in the following equation:

$$L_D = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{p}_i + \left(1 - y_i\right) \log\left(1 - \hat{p}_i\right) \right],$$

where $N$ denotes the total number of input samples; $y_i$ stands for the real label of the $i$-th sample, which is 1 in the case of original samples and 0 in the case of reconstructed samples; and $\hat{p}_i$ represents the probability assigned to the $i$-th sample for label 1.

The reconstructor and the discriminator are optimized in two steps. The discriminator is optimized in the first step. For the original sample $x$ and the reconstructed sample $\hat{x}$, the discriminator optimization is as follows:

$$\max_{D} \; \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{\hat{x}}\left[\log\left(1 - D(\hat{x})\right)\right].$$

For the original sample $x$, the predicted result is supposed to be as close to 1 as possible, that is, the greater the value of $D(x)$, the better the result. For the reconstructed sample $\hat{x}$, the predicted result should be as close to 0 as possible, that is, the smaller the value of $D(\hat{x})$, the better the result.

Then, the reconstructor $R$ is optimized in the second step as follows:

$$\min_{R} \; \mathbb{E}_{\hat{x}}\left[\log\left(1 - D(\hat{x})\right)\right].$$

To reduce the difference between the reconstructed and original samples, the label of the reconstructed sample should be 1. At this point, $D(\hat{x})$ should be as large as possible. To unify the form with Equation (5), $\log\left(1 - D(\hat{x})\right)$ is minimized instead.

The following equation shows the combined optimization of the discriminator and the reconstructor:

$$\min_{R} \max_{D} \; \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{\hat{x}}\left[\log\left(1 - D(\hat{x})\right)\right].$$
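
A minimal sketch of this two-step alternating update, with separate optimizers for the discriminator and the reconstructor; `bce` is the criterion defined above, and the module and optimizer names are assumptions:

```python
def adversarial_step(D, R, x, h_d, h_c, opt_D, opt_R):
    ones = torch.ones(x.size(0), 1, device=x.device)
    zeros = torch.zeros(x.size(0), 1, device=x.device)

    # Step 1: optimize the discriminator -- originals toward 1,
    # (detached) reconstructions toward 0.
    x_hat = R(torch.cat([h_d, h_c], dim=1)).detach()
    loss_d = bce(D(x), ones) + bce(D(x_hat), zeros)
    opt_D.zero_grad(); loss_d.backward(); opt_D.step()

    # Step 2: optimize the reconstructor -- push D(x_hat) toward 1.
    # Features are detached so that only the reconstructor is updated here.
    x_hat = R(torch.cat([h_d.detach(), h_c.detach()], dim=1))
    loss_r = bce(D(x_hat), ones)
    opt_R.zero_grad(); loss_r.backward(); opt_R.step()
    return loss_d.item(), loss_r.item()
```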

3.2.5. Classification Loss

By classifying the data samples in the source domain, the training is monitored to ensure that the extracted domain-invariant features can accurately classify the data in both the source and the target domains. The following equation shows the loss function of the classifier:

$$L_c = -\frac{1}{n_s} \sum_{i=1}^{n_s} \mathbf{y}_i^s \cdot \log \hat{\mathbf{y}}_i^s,$$

where $n_s$ denotes the total number of samples in the source domain; $\mathbf{y}_i^s$ refers to the one-hot code of the label corresponding to the $i$-th sample in the source domain; and $\hat{\mathbf{y}}_i^s$ is the softmax output of the $i$-th sample in the source domain.

During the training process, the decreasing rates of $L_{diff}$, $L_{sim}$, $L_{rec}$, $L_D$, and $L_c$ may be inconsistent, causing the model to be dominated by a certain module [14]. Hence, weight coefficients need to be used, and for the entire network, the final loss function is given by

$$L = L_c + \alpha L_{diff} + \beta L_{sim} + \gamma L_{rec} + \delta L_D.$$

The ultimate optimization objective of the network is to minimize the abovementioned loss, where $\alpha$, $\beta$, $\gamma$, and $\delta$ denote the weight coefficients of the different loss functions; their function is to balance the decreasing rates of the various losses. A loss with a faster decreasing rate is assigned a smaller coefficient, while a loss with a slower decreasing rate is assigned a larger coefficient. The specific values are generally determined by experiments. After many tests, the final values were set to $\alpha = 0.3$, $\beta = 1$, $\gamma = 1$, and $\delta = 1$.
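
Putting the five terms together, the final objective corresponds to the small helper below; the loss variable names follow the earlier sketches, and the weight values are the ones reported above. In the two-step scheme of Section 3.2.4, $L_{rec}$ and $L_D$ are still stepped with their own optimizers; the combined form mirrors the equation.

```python
def total_loss(l_cls, l_diff, l_sim, l_rec, l_disc,
               alpha=0.3, beta=1.0, gamma=1.0, delta=1.0):
    """Weighted sum of the five losses with the weights reported in the text."""
    return l_cls + alpha * l_diff + beta * l_sim + gamma * l_rec + delta * l_disc
```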

4. Results and Analysis

4.1. Transfer Diagnosis Data Sets

Three data sets collected from three different rolling bearing experimental platforms were used as the bearing data in this study.

4.1.1. Case Western Reserve University (CWRU) Bearing Data Set

Collected by the Electrical Engineering Laboratory of CWRU in the United States [15], the CWRU bearing data set is a widely accepted standard data set for the fault diagnosis of rolling bearings. Figure 2 shows the CWRU rolling bearing test stand. The bearings used in the experiment included normal bearings, bearings with faults at the fan end, and bearings with faults at the drive end. The faults were located in the inner ring, the outer ring, or the rolling elements. Electrical discharge machining (EDM) was used to introduce single-point faults into the bearings, with fault diameters ranging from 0.007 inches to 0.040 inches. In the experiment, the vibration signals of the bearings were collected by an accelerometer at a sampling frequency of 12 kHz, and the bearing fault data at the drive end were also collected at a sampling frequency of 48 kHz.

Only the data sets of the normal bearings and the bearings at the drive end were used in this study. The bearings at the drive end were sampled at a frequency of 48 kHz, and the fault diameter was 0.014 inches. The motor load was 1 hp, and the speed was 1,772 r/min. The constructed data set included three health states: normal state (NR), inner ring fault (IR), and outer ring fault (OR).

4.1.2. Paderborn University Bearing Data Set

The Paderborn University bearing data set [16] was collected on a modularized experimental platform. The platform consisted of five modules: a motor, a torque-measuring shaft, a rolling bearing test module, a flywheel, and a load motor. The flywheel and the loading device were used to simulate the inertia and load of the drive device. The rated torque of the motor is 6 Nm (power: 1.7 kW).

The data set contained two types of bearing damage data, namely, artificial and real damages. In this study, only the artificial damage data were used, including three types of faults, that is, NR, IR, and OR.

4.1.3. Xi’an Jiaotong University (XJTU-SY) Bearing Data Set

The XJTU-SY rolling bearing accelerated life test data set [17] contains the life-cycle data of fifteen rolling bearings under three working conditions. It was collected by Sumyoung Technology Co., Ltd. (Changxing, Zhejiang Province, China) and a research team led by Professor Lei Yaguo of Xi'an Jiaotong University in a two-year rolling bearing accelerated life test. The experimental platform was mainly composed of an alternating-current (AC) motor, a hydraulic loading system, a digital force display, a revolving speed controller, accelerometers, and the tested bearings. In the experiment, the vibration signals were sampled every 1 min at a frequency of 25.6 kHz, and the duration of each sampling was 1.28 s. The observed faults of the tested bearings included worn inner rings, fractured outer rings, worn rolling elements, and cracked retainers.

The data used in this study were collected at a speed of 2250 r/min and a radial force of 11 kN. The data set included three different bearing states, that is, NR, IR, and OR. For each state, there were 30 sample files.

4.2. Transfer Diagnosis Experiment

Currently, the most challenging part of fault diagnosis is intelligent fault diagnosis of machines with unlabeled data. To overcome this challenge, in this study, the classifier was first trained with labeled data obtained from one machine and then was used to classify the unlabeled data obtained from other machines.

In this section, six experiments were carried out on three rolling bearing data sets obtained from three different but related machines to study the transfer learning method for bearing fault diagnosis between machines. The data were vibration signals collected from different machines under different operating conditions. The types of bearing faults included NR, IR, and OR. Table 5 shows the distribution of the data sets in each transfer diagnosis experiment.

The performance of the DSRAN method was evaluated based on the six transfer fault diagnosis experiments (Table 5). In each experiment, the data sets on the left and right sides of the arrow represent the data from the source and target domains, respectively. The training data set covered all labeled samples from the source domain and 70% of the unlabeled samples from the target domain, and the test data set contained the remaining 30% of the unlabeled samples from the target domain.
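
A sketch of this split protocol, with a hypothetical array of unlabeled target-domain samples:

```python
import numpy as np

# Hypothetical unlabeled target-domain samples (e.g., 900 segments of length 2048)
target_samples = np.random.randn(900, 2048)

rng = np.random.default_rng(0)
idx = rng.permutation(len(target_samples))
cut = int(0.7 * len(target_samples))
target_train = target_samples[idx[:cut]]  # 70% of target data, unlabeled, for training
target_test = target_samples[idx[cut:]]   # remaining 30%, held out for testing
# The full training set combines all labeled source samples with target_train.
```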

Figure 3 shows the accuracy of DSRAN in the six transfer diagnosis experiments. The lowest fault diagnosis accuracy, 85.06%, occurred in the transfer from the XJTU-SY bearing data set to the CWRU bearing data set, and the highest accuracy, 93.15%, was obtained in the transfer from the Paderborn bearing data set to the XJTU-SY bearing data set. The average accuracy of the six experiments was 89.68%, indicating that the DSRAN method proposed in this study can effectively train the classifier with the labeled data collected from one machine and then classify the unlabeled data collected from other machines.

4.3. Comparison of Different Methods

To further verify the effectiveness of the proposed DSRAN method, the same transfer diagnosis experiments were performed using five other methods, that is, support vector machine (SVM), deep convolutional neural networks with wide first-layer kernels (WDCNN) [18], deep domain confusion (DDC) [19], deep reconstruction-classification networks (DRCN) [20], and DANN [13]. For comparison purposes, these methods were divided into three types (Table 6).

(1) Traditional machine learning methods: The comparison aimed to demonstrate the differences between traditional machine learning methods that use manually extracted features and deep learning methods that extract features automatically. With the original vibration signal as the input, a deep learning method performs end-to-end fault diagnosis without manual feature extraction.

(2) Regular CNN methods: Regular CNN methods use only labeled samples from the source domain for training. In contrast, a transfer learning method can train the model with both the labeled samples from the source domain and some of the unlabeled samples from the target domain.

(3) Classical transfer learning methods: Currently available transfer learning methods have achieved good results in minimizing the distribution differences between the source and the target domains through maximum mean discrepancy and domain-adversarial training, so that networks can learn more and better domain-invariant features. With the integration of domain separation and adversarial learning, the DSRAN method can further ensure that the learned features are domain-invariant.

Figure 4 shows the classification accuracy of the compared methods in the six experiments. The proposed DSRAN method outperformed the other five methods in all six experiments.

Specifically, the classification comparison showed the following results:

(1) Compared with the traditional machine learning method, the deep learning algorithms showed higher accuracy, indicating that the automatic feature extraction of deep learning was superior to the manual feature extraction of machine learning. One exception was that the classification accuracy of WDCNN was lower than that of SVM, because WDCNN, which contains only a few simple convolutional layers, was unable to learn sufficiently deep features. Moreover, for the deep learning methods that directly used the original signal for training, the distribution differences in the learned features were reduced.

(2) The transfer learning methods generally achieved higher classification accuracy than the regular CNN method. This is largely because transfer learning methods can reduce the distribution differences between the data from the source domain and the data from the target domain, while regular CNN methods use only data from the source domain for training, without effectively utilizing the information collected from the target domain.

(3) Compared with the three widely used transfer learning methods, namely, DRCN, DDC, and DANN, the proposed DSRAN method achieved higher classification accuracy in the six transfer diagnosis experiments, indicating that DSRAN narrowed the distribution differences between the source- and target-domain data more effectively. Moreover, the relatively high accuracy verified the practicability of the proposed DSRAN method.

5. Conclusions

In this study, transfer learning was applied to intelligent bearing fault diagnosis between machines so that an intelligent fault diagnosis model trained by labeled data obtained from one machine can be used to identify the health state of unlabeled data obtained from other machines. The method addresses two problems: the difficulty of obtaining a large amount of labeled data from some machines for model training, and the inability of a fault diagnosis model trained with labeled data from one machine to classify unlabeled data from other machines. The deep transfer method for bearing fault diagnosis proposed in this study was verified in six transfer fault diagnosis experiments, and the following conclusions were drawn:

(1) By extracting the domain-invariant transfer fault features from data in the source and target domains, DSRAN can reduce the differences in data distribution between different domains, so that the fault diagnosis knowledge obtained from one machine can be used to identify the health state of unlabeled data from other machines.

(2) Compared with the other methods, the proposed DSRAN method classified bearing faults more accurately, suggesting that a DSRAN model trained by labeled data obtained from one machine can effectively classify unlabeled data from other machines. Hence, DSRAN can be applied to the fault diagnosis of machines with unlabeled data.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (Grant nos. 51905452 and 51775452), the Fundamental Research Funds for the Central Universities (Grant nos. 2682019CX35 and 2682017ZDPY09), the China Postdoctoral Science Foundation (Grant no. 2019M663549), the Local Development Foundation guided by the Central Government (Grant no. 2020ZYD012), and the planning project of the Science & Technology Department of Sichuan Province (Grant no. 2019YFG0353).