#### Abstract

Transfer learning methods for mechanical fault diagnosis have progressed considerably in recent years. However, existing methods usually rely on the maximum mean discrepancy (MMD) to measure the domain discrepancy, and MMD cannot guarantee that features from different domains are sufficiently similar. Inspired by generative adversarial networks (GAN) and domain adversarial training of neural networks (DANN), this study presents a novel deep adaptive adversarial network (DAAN). The DAAN comprises a condition recognition module and a domain adversarial learning module. The condition recognition module is constructed with a generator that extracts features and classifies the health condition of machinery automatically. The domain adversarial learning module is realized with a discriminator based on the Wasserstein distance to learn domain-invariant features. Spectral normalization (SN) is then employed to accelerate convergence. The effectiveness of DAAN is demonstrated through three transfer fault diagnosis experiments. The results show that the training loss of DAAN converges to zero after approximately 15 training epochs and that the average testing accuracy in every case exceeds 92%. The proposed DAAN is therefore expected to learn domain-invariant features that bridge the discrepancy between data from different working conditions.

#### 1. Introduction

Bearings and gears are widely used transmission parts in rotating machinery, and their failure directly affects the healthy operation of machinery and can even cause serious incidents. Therefore, monitoring and diagnosing the health condition of these transmission parts is crucial [1, 2]. In recent years, Internet of Things (IoT) based infrastructure has often been adopted for condition monitoring and analysis because it can directly handle massive monitoring data with minimal manual intervention [3, 4]. Lei et al. [5] developed an intelligent method based on sparse filtering for bearing fault diagnosis. Jia et al. [6] presented a stacked autoencoder (SAE) based network to diagnose faults of bearings and planetary gearboxes. Xu et al. [7] used a deep convolutional neural network (CNN) for bearing fault diagnosis under different working conditions. An et al. [8] adopted a recurrent neural network (RNN) to process variable-size sequences of bearing fault samples and achieved satisfactory performance. Xiao et al. [9] proposed a deep mutual information maximization (DMIM) method that uses a variational divergence estimation approach to maximize the mutual information between the input and output of a deep neural network, and applied it to motor fault diagnosis. Wang et al. [10] presented a capsule neural network for bearing fault diagnosis and obtained a high classification accuracy. Although these methods have achieved excellent diagnosis performance, they require plenty of labeled data, and the training and testing data must follow the same probability distribution. However, obtaining a considerable amount of labeled data is difficult for some machines, and the probability distribution of the fault samples constantly changes due to variable speeds and loads.

Transfer learning provides a promising way of addressing these problems [11, 12]. In recent years, various related methods have been investigated for fault diagnosis. Wen et al. [13] introduced maximum mean discrepancy (MMD) into SAE and achieved feature transfer learning under variable speeds. Lu et al. [14] developed a transfer learning-based model with domain adaptation for bearing fault diagnosis. Guo et al. [15] presented a deep convolutional transfer learning network (DCTLN) and used six cases to test its effectiveness. Yang et al. [16] offered a domain-shared CNN model to learn transferable features from bearings used in laboratory machines and real-case machines simultaneously. An et al. [17] proposed a multilayer multiple kernel variant of MMD, which introduced the kernel method to replace the high dimensional map of MMD and achieved bearing fault diagnosis under different working conditions. Zhang et al. [18] developed a novel sparse filtering based domain adaptation (SFDA) method for mechanical fault diagnosis, which applied the *l*1-norm and *l*2-norm to MMD to obtain high dimensional adaptive features. These studies utilized MMD to minimize the target loss by using the source loss and thereby learn cross-domain-invariant features. However, MMD measures the discrepancy using a high dimensional map based on a reproducing kernel Hilbert space (RKHS), which cannot guarantee that features from different domains are sufficiently close in the RKHS.

In recent years, adversarial learning represented by generative adversarial networks (GAN) [19] has drawn widespread attention. Various emerging GAN-based variants have remarkably improved the learning effect compared with traditional GANs [20–22]. In the field of fault diagnosis, GAN has also been successfully used for data augmentation to enrich training datasets. Wang et al. [23] utilized GAN to generate synthetic fault signals from frequency spectra to expand the training set and achieve effective fault diagnosis of a gearbox. Mao et al. [24] also used GAN to solve the imbalanced fault diagnosis problem and provided a comprehensive comparative study. Liu et al. [25] trained an autoencoder based on the adversarial training process of GAN to perform bearing fault diagnosis. In these applications, GAN aims to generate training samples, which differs from the aim of transfer learning. However, since a source domain and a target domain naturally exist in transfer learning, Ganin et al. [26] argued that the process of generating samples can be avoided and that the data in one of the domains (usually the target domain) can be directly treated as the generated samples. In this setting, the generator extracts features instead of generating new samples: it continuously learns the characteristics of the domain data until the discriminator can no longer distinguish the two domains. Thus, the original generator can also be referred to as the feature extractor. On this basis, the domain adversarial training of neural networks (DANN) was developed in [26]. However, the gradient of DANN is always unstable when training the discriminator. To overcome this limitation, the Wasserstein distance [27] is employed in the discriminator to evaluate the difference between the two distributions. The Wasserstein distance, also called the earth-mover distance, is a distance metric for measuring the discrepancy between the distributions of the two domains. It can improve the stability of the optimization process, and the Wasserstein distance-based domain adversarial method can directly extract domain-invariant features from the original signal. Furthermore, spectral normalization (SN) [28] is applied to the discriminator to stabilize the training process. SN controls the Lipschitz constant of the discriminator function by constraining the spectral norm of each layer, and it requires no intensive tuning because the Lipschitz constant is its only hyperparameter. In contrast, other normalization terms impose stronger constraints on the weight matrix than intended, which limits the ability of the discriminator to distinguish the generated distribution from the real one. Therefore, a novel deep adaptive adversarial network (DAAN) is developed in this study. The main contributions can be summarized as follows: (1) A Wasserstein distance-based domain adversarial method for transfer fault diagnosis is proposed. (2) A new discriminator is designed using the SN strategy to stabilize the training process and accelerate convergence.

The remainder of this paper is organized as follows. Section 2 introduces the theoretical background of the Wasserstein distance and spectral normalization. Section 3 details the proposed DAAN model. Section 4 presents the fault diagnosis experiments under different working conditions. Section 5 provides the conclusions.

#### 2. Theory Background

##### 2.1. Wasserstein Distance

The Wasserstein distance, also called the earth-mover distance, is a distance metric for comparing probability measures and distributions. The gradient of the DANN is always unstable when training the feature extractor. To alleviate the gradient vanishing problem, the Wasserstein distance is employed in the discriminator *D* as the distribution measurement function. For a source distribution $\mathbb{P}_s$ and a target distribution $\mathbb{P}_t$, it is defined as the minimum transport cost:

$$W(\mathbb{P}_s, \mathbb{P}_t) = \inf_{\gamma \in \Pi(\mathbb{P}_s, \mathbb{P}_t)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right],$$

where $\Pi(\mathbb{P}_s, \mathbb{P}_t)$ represents the set of all joint distributions *γ*(*x, y*) whose marginals are $\mathbb{P}_s$ and $\mathbb{P}_t$, respectively. Intuitively, *γ*(*x, y*) can be considered the amount of mass moved from *x* to *y* in order to transform $\mathbb{P}_s$ into $\mathbb{P}_t$. The Wasserstein distance has been used to solve the optimal transportation problem, so $W(\mathbb{P}_s, \mathbb{P}_t)$ is the minimum transport cost under optimal path planning.

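Therefore, the improved objective function can be obtained as follows:

$$W(\mathbb{P}_s, \mathbb{P}_t) = \sup_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim \mathbb{P}_s}[D(x)] - \mathbb{E}_{x \sim \mathbb{P}_t}[D(x)],$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions, and $\mathbb{P}_s$ and $\mathbb{P}_t$ denote the source and target distributions.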
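As a small illustration of the earth-mover interpretation, the sketch below computes the Wasserstein-1 distance between two equally sized one-dimensional samples, where the optimal transport plan simply matches sorted values. This closed form holds only in one dimension with equal sample counts. The function name `wasserstein_1d` and the sample values are assumptions made for illustration; the proposed method instead estimates the distance with a learned discriminator.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical distributions of equal size."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    # In one dimension the optimal plan pairs the i-th smallest x with the i-th smallest y.
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs_sorted, ys_sorted)) / len(xs)


# Hypothetical feature values: the "target" sample is the "source" sample shifted by 0.5.
source = [0.0, 1.0, 2.0, 3.0]
target = [x + 0.5 for x in source]

shifted_distance = wasserstein_1d(source, target)
same_distance = wasserstein_1d(source, list(reversed(source)))
```

A pure shift of the distribution moves every unit of mass by the same amount, so the distance equals the size of the shift, while reordering the same values leaves the distance at zero.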

##### 2.2. Spectral Normalization (SN)

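SN controls the Lipschitz constant of *D* by constraining the spectral norm of each network layer. Given a linear layer $g(h) = Wh$, its Lipschitz norm is

$$\lVert g \rVert_{\mathrm{Lip}} = \sup_{h} \sigma(\nabla g(h)) = \sup_{h} \sigma(W) = \sigma(W),$$

where $\sigma(W)$ is the spectral norm of *W*:

$$\sigma(W) = \max_{h \neq 0} \frac{\lVert Wh \rVert_2}{\lVert h \rVert_2} = \max_{\lVert h \rVert_2 \le 1} \lVert Wh \rVert_2,$$

which is equal to the largest singular value of *W*. If the Lipschitz norm of each activation function is equal to 1, then the inequality $\lVert g_1 \circ g_2 \rVert_{\mathrm{Lip}} \le \lVert g_1 \rVert_{\mathrm{Lip}} \cdot \lVert g_2 \rVert_{\mathrm{Lip}}$ can be used to obtain the following bound on $\lVert D \rVert_{\mathrm{Lip}}$ for a discriminator with weight matrices $W^{1}, \ldots, W^{L}$:

$$\lVert D \rVert_{\mathrm{Lip}} \le \prod_{l=1}^{L} \sigma(W^{l}).$$

SN normalizes each weight matrix *W* by its spectral norm so that it satisfies the Lipschitz constraint $\sigma(W) = 1$:

$$\bar{W}_{\mathrm{SN}}(W) = \frac{W}{\sigma(W)}.$$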
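The normalization can be sketched with power iteration, a common way to estimate the largest singular value without a full decomposition. The code below is a minimal pure-Python illustration, not the implementation used in the experiments: the helper names, the iteration count, and the example matrix are assumptions. SN implementations typically run only one power iteration per training step and reuse the vector between steps, whereas this sketch iterates to convergence for clarity.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(W, n_iter=200, seed=0):
    """Estimate sigma(W), the largest singular value of W, by power iteration."""
    rng = random.Random(seed)
    v = l2_normalize([rng.gauss(0.0, 1.0) for _ in W[0]])
    Wt = transpose(W)
    for _ in range(n_iter):
        u = l2_normalize(matvec(W, v))   # left singular vector estimate
        v = l2_normalize(matvec(Wt, u))  # right singular vector estimate
    return math.sqrt(sum(x * x for x in matvec(W, v)))


def spectral_normalize(W, n_iter=200):
    """Return W / sigma(W), whose spectral norm is 1."""
    sigma = spectral_norm(W, n_iter)
    return [[w / sigma for w in row] for row in W]


# Hypothetical weight matrix with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = spectral_normalize(W)
sigma_after = spectral_norm(W_sn)
```

Because every layer is rescaled to unit spectral norm, the product bound on the Lipschitz norm of the discriminator reduces to 1 when the activations are themselves 1-Lipschitz.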

#### 3. Proposed Framework

##### 3.1. Deep Adaptive Adversarial Network (DAAN)

As shown in Figure 1, the proposed DAAN includes a condition recognition module and a domain adversarial learning module. The condition recognition module contains a feature extraction network and a fault classification network. The feature extraction network automatically learns the fault features, and the fault classification network identifies health conditions according to the extracted features. The domain adversarial learning module is realized by a discriminator network that is connected to the feature extraction network to help learn the domain-invariant features.

(1) Condition recognition: a three-layer feedforward neural network (FFNN) is used as the feature extractor *F*, followed by a classifier *C* that recognizes the health condition. The objective is to train the feature extractor with parameter *θ*_{F} and the classifier *C* with parameter *θ*_{C}. The loss *L*_{C} is defined as the cross-entropy between the predicted softmax probability distribution and the corresponding labels of the $n_s$ labeled source samples $(x_i^s, y_i^s)$:

$$L_C(\theta_F, \theta_C) = -\frac{1}{n_s} \sum_{i=1}^{n_s} \sum_{k=1}^{K} \mathbb{1}[y_i^s = k] \, \log C(F(x_i^s))_k,$$

where $\mathbb{1}[\cdot]$ is the indicator function, $C(F(x_i^s))_k$ is the *k*th value of the predicted distribution, and *K* is the number of health conditions.

(2) Domain adversarial learning: the adversarial training strategy of the GAN is used to extract domain-invariant features. The discriminator *D* with parameter *θ*_{D} is optimized by maximizing the domain adversarial loss *L*_{D}, whereas the feature extractor parameter *θ*_{F} is optimized to minimize it and thereby reduce the distribution discrepancy between the two domains. With $n_t$ target samples $x_j^t$, *L*_{D} is defined as follows:

$$L_D(\theta_F, \theta_D) = \frac{1}{n_s} \sum_{i=1}^{n_s} D(F(x_i^s)) - \frac{1}{n_t} \sum_{j=1}^{n_t} D(F(x_j^t)).$$

By combining the two optimization objectives, the final loss function can be written as follows:

$$L(\theta_F, \theta_C, \theta_D) = L_C(\theta_F, \theta_C) + \lambda L_D(\theta_F, \theta_D),$$

where the hyperparameter *λ* determines the strength of the domain adversarial strategy.
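The two losses and their weighted combination can be sketched in a few lines. The code below is a hedged illustration rather than the authors' implementation: it assumes that *L*_{C} is the mean softmax cross-entropy over labeled source samples and that *L*_{D} is the mean discriminator output on source features minus the mean output on target features. The function names and the example numbers are hypothetical; the default *λ* = 0.005 is the penalty parameter reported in the experiments.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(logits_batch, labels):
    """Cross-entropy L_C averaged over labeled source samples."""
    losses = [-math.log(softmax(logits)[y]) for logits, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def domain_loss(critic_source, critic_target):
    """Assumed empirical L_D: mean discriminator output on source minus mean on target."""
    return sum(critic_source) / len(critic_source) - sum(critic_target) / len(critic_target)


def total_loss(logits_batch, labels, critic_source, critic_target, lam=0.005):
    """Combined objective L = L_C + lam * L_D."""
    return classification_loss(logits_batch, labels) + lam * domain_loss(critic_source, critic_target)


# Hypothetical classifier logits for two source samples with K = 5 health conditions.
logits = [[2.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]
labels = [0, 3]
# Hypothetical discriminator outputs on source and target features.
l_c = classification_loss(logits, labels)
l_d = domain_loss([1.0, 3.0], [0.5, 1.5])
l_total = total_loss(logits, labels, [1.0, 3.0], [0.5, 1.5])
```

With a small *λ*, the classification term dominates the combined value, and the adversarial term acts as a regularizer on the extracted features.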

##### 3.2. Training Strategy of DAAN

As displayed in Figure 2, once the optimization objective of the DAAN is built, the proposed method can be conveniently trained with the Adam algorithm. A gradient reversal layer [26] connects the feature extractor to the discriminator *D* during the training process. This layer ensures that the feature distributions of the two domains become indistinguishable to the discriminator *D*, so that domain-invariant features are obtained.

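Therefore, the loss can be rewritten as the following minimax problem:

$$\min_{\theta_F, \theta_C} \max_{\theta_D} \; L_C(\theta_F, \theta_C) + \lambda L_D(\theta_F, \theta_D).$$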

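Based on the above equations and the Adam algorithm, the parameters *θ*_{F}, *θ*_{C}, and *θ*_{D} are updated as follows:

$$\theta_F \leftarrow \theta_F - \mu \left( \frac{\partial L_C}{\partial \theta_F} + \lambda \frac{\partial L_D}{\partial \theta_F} \right), \quad \theta_C \leftarrow \theta_C - \mu \frac{\partial L_C}{\partial \theta_C}, \quad \theta_D \leftarrow \theta_D + \mu \lambda \frac{\partial L_D}{\partial \theta_D},$$

where $\mu$ is the learning rate. The discriminator parameter *θ*_{D} ascends *L*_{D} because the discriminator maximizes this loss, whereas *θ*_{F} and *θ*_{C} descend.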
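The opposing update directions produced by the gradient reversal layer can be illustrated on a scalar toy model. This is a sketch under stated assumptions, not the training code of the paper: plain gradient steps stand in for Adam, the classification loss and spectral normalization are omitted, and the model (feature f = w_f · x, discriminator output d = w_d · f), the class `GradientReversal`, and the function `adversarial_step` are hypothetical.

```python
class GradientReversal:
    """Identity in the forward pass; flips the sign of the gradient in the backward pass."""

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return -grad_output


def adversarial_step(w_f, w_d, xs_source, xs_target, lam, lr):
    """One plain gradient step on the toy model f = w_f * x, d = w_d * f."""
    grl = GradientReversal()
    gap = sum(xs_source) / len(xs_source) - sum(xs_target) / len(xs_target)
    # Toy domain loss: L_D = mean(d on source) - mean(d on target) = w_d * w_f * gap.
    # Every parameter descends on the single loss -lam * L_D.
    grad_w_d = -lam * w_f * gap            # d(-lam * L_D) / d w_d
    grad_w_f_raw = -lam * w_d * gap        # d(-lam * L_D) / d w_f before the reversal
    # The model is linear, so reversing the gradient that flows back into the
    # feature extractor is the same as reversing the gradient of w_f itself.
    grad_w_f = grl.backward(grad_w_f_raw)  # becomes +lam * dL_D / dw_f
    new_w_d = w_d - lr * grad_w_d          # discriminator ascends L_D
    new_w_f = w_f - lr * grad_w_f          # feature extractor descends L_D
    return new_w_f, new_w_d


# Hypothetical one-dimensional inputs whose means differ by 2.
new_w_f, new_w_d = adversarial_step(
    w_f=1.0, w_d=1.0, xs_source=[2.0, 4.0], xs_target=[1.0, 1.0], lam=1.0, lr=0.1
)
```

A single backward pass therefore moves the discriminator weight toward a larger domain gap and the feature weight toward a smaller one, which is the adversarial behavior the reversal layer is meant to provide.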

Once the network training is finished, the classifier can accurately identify the unlabeled samples in the target domain, provided that the domains are confused in the learned features, i.e., the discriminator can no longer tell them apart. In the testing process, the remaining target domain samples are used as the input of the DAAN, and the classifier outputs the classification result.

#### 4. Experiment Studies

##### 4.1. Case 1: Fault Diagnosis under Different Rotating Speeds

###### 4.1.1. Data Description

The bearing data are collected from the test rig displayed in Figure 3(a). The rig includes a motor, a driving belt, a shaft coupling, and a bearing seat. There are five bearing health conditions: normal condition (NC), inner ring fault (IF), outer ring fault (OF), roller fault (RF), and concurrent fault in the outer ring and roller (ORF). The four faulty bearings are depicted in Figure 3(b). Vibration signals are commonly utilized in condition monitoring and diagnosis because they carry rich and useful information at a high sampling frequency [29, 30]. All vibration acceleration data were measured under three different speeds of 1100 r/min (dataset A), 1300 r/min (dataset B), and 1500 r/min (dataset C). The sampling frequency of the accelerometer is 25.6 kHz, 200 samples are selected from each bearing health condition, and each sample contains 2400 data points. Hence, a total of 1000 samples are acquired at each speed. The raw signals are then transformed via FFT, and the resulting 1200 frequency-domain data points of each sample are used as the input of the DAAN model. In each experiment, all the source domain samples and half of the target domain samples are used for training. The remaining target domain samples are used for testing. The spectra of those three datasets and the transfer learning cases are displayed in Figure 2.
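The preprocessing step can be sketched as follows: a 2400-point time-domain sample is transformed and the first 1200 frequency bins are kept as the network input. This is an illustrative sketch only. Using the magnitude of each bin is an assumption, the sinusoidal test signal is synthetic rather than measured data, and the naive transform is written out only to keep the example dependency-free; in practice a library FFT such as `numpy.fft.rfft` would be used.

```python
import cmath
import math


def half_spectrum(signal):
    """Magnitudes of the first len(signal) // 2 DFT bins of a real-valued signal."""
    n = len(signal)
    # Precompute the n complex roots of unity; exp(-2*pi*i*k*t/n) depends only on (k*t) mod n.
    twiddle = [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]
    return [
        abs(sum(x * twiddle[(k * t) % n] for t, x in enumerate(signal)))
        for k in range(n // 2)
    ]


fs = 25600.0      # sampling frequency in Hz (25.6 kHz, from the text)
n_points = 2400   # time-domain points per sample (from the text)
tone_hz = 1600.0  # hypothetical test tone, not a real fault frequency

sample = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(n_points)]
features = half_spectrum(sample)

peak_bin = max(range(len(features)), key=lambda k: features[k])
peak_hz = peak_bin * fs / n_points  # each bin spans fs / n_points Hz
```

With these settings each frequency bin covers 25600 / 2400 ≈ 10.67 Hz, so the 1200 retained bins span the range up to the Nyquist frequency of 12.8 kHz.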


###### 4.1.2. Diagnostic Results

Figure 4 shows that the proposed DAAN is evaluated on six transfer learning cases: A ⟶ B, B ⟶ A, B ⟶ C, C ⟶ B, A ⟶ C, and C ⟶ A. In each case, the datasets before and after the arrow are the source domain and target domain, respectively. For example, in the case A ⟶ B, datasets A and B are the source domain and target domain, respectively. The structure of the condition recognition module is [1200, 600, 200, 100, 5], and that of the domain adversarial module is [1200, 600, 200, 100, 1]. The unit number of the input layer is determined by the dimension of the samples, the unit number of the output layer of the condition recognition module is determined by the number of health conditions, and the single output unit of the domain adversarial module gives the true-or-false domain decision. The unit numbers of the hidden layers are determined by the dimension reduction principle. The learning rate is 0.002, and the penalty parameter *λ* is 0.005. Each training batch includes 500 samples from each of the source and target domains, and the other 500 target domain samples are used for testing. In each experiment, a total of 15 trials were conducted to reduce the effects of randomness, and the number of training steps is 50. In case A ⟶ B, the curves of the training and testing accuracies are plotted in Figure 5. The training accuracy approaches 100% after approximately 15 training epochs, whereas the testing accuracy needs approximately 47 training epochs to reach this level. The classifier loss curve of DAAN is plotted in Figure 4, and the training loss of DAAN converges to zero after approximately 15 training epochs. For comparison, the loss curves of DANN and DAAN without SN are also plotted in Figure 4. DANN is much more difficult to converge, and DAAN without SN needs 25 training epochs to converge. These results indicate that the proposed DAAN has a strong domain-invariant feature extraction ability and converges quickly. The results of the six transfer cases are displayed in Table 1. All the testing accuracies are over 90%, and some are even over 98%. This high accuracy indicates that the DAAN can effectively identify the health condition of bearings in the absence of labeled target domain data.
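To make the reported layer structures concrete, the sketch below counts the trainable parameters of a fully connected stack with the listed unit numbers. It assumes dense layers with bias terms, which the text does not state explicitly, and the function name is hypothetical. Layers that the two modules share through the feature extractor are counted in both totals.

```python
def dense_parameter_count(layer_sizes, with_bias=True):
    """Number of trainable parameters in a fully connected stack of the given sizes."""
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    return sum(n_in * n_out + (n_out if with_bias else 0) for n_in, n_out in pairs)


recognition_sizes = [1200, 600, 200, 100, 5]   # condition recognition module (from the text)
adversarial_sizes = [1200, 600, 200, 100, 1]   # domain adversarial module (from the text)

recognition_params = dense_parameter_count(recognition_sizes)
adversarial_params = dense_parameter_count(adversarial_sizes)
```

Under these assumptions the first layer, with 1200 × 600 weights, accounts for most of the parameters in either stack.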

To further demonstrate the effectiveness of the DAAN, five methods are adopted for comparison on the six transfer learning cases: SAE [6], transfer component analysis (TCA) [31], MK-MMD [17], SFDA [18], and DANN. The subsequent classifier of SAE and TCA is softmax regression. SAE is trained only on the source domain data. TCA, MK-MMD, and SFDA are three representative examples of MMD-regularized subspace learning in the domain adaptation field. The testing accuracies on the six transfer learning cases are listed in Table 1. The DAAN achieves the highest accuracies and the lowest standard deviations among the given approaches. The average testing accuracy of SAE without transfer learning is only 60.20% because the target domain data do not participate in the model training. Compared with SAE, the transfer learning-based methods are therefore more effective in handling unlabeled data than traditional intelligent fault diagnosis. The traditional DANN without the Wasserstein distance and SN strategy achieves 86.88% accuracy. The average accuracies of TCA, MK-MMD, and SFDA are 81.53%, 94.90%, and 92.98%, respectively. These results are considerably better than those of SAE but still worse than those of the proposed method. Thus, it can be concluded from the comparison that the DAAN learns more robust domain-invariant features than the other transfer learning methods.

Furthermore, the t-SNE [32] algorithm is adopted to map the learned features into a 2D scatter diagram to offer visual insight into the two domains. Taking the case A⟶B as an example, the domain-invariant features learned by the DAAN are displayed in Figure 6(f), and the mapping results obtained using the other comparison methods are shown in Figures 6(a)-6(e). The source and target domains are denoted by *S* and *T*, respectively. Figure 6(a) demonstrates that although the SAE model obtains good clustering results, the distribution discrepancies between the two domains are substantial, except for the NC condition. Thus, it cannot effectively classify the unlabeled target samples when the model is trained using only the source samples. Figures 6(b) and 6(c) plot the mapped results of the transferred features learned by TCA and DANN, in which the cross-domain discrepancies are clearly reduced. However, some overlapping samples still exist between the IF and RF conditions, and the source and target domain samples of ORF are poorly clustered. Figures 6(d) and 6(e) plot the mapped results of the transferred features learned by MK-MMD and SFDA. The clustering performance is further improved, but some distance remains between the two domains. Figure 6(f) illustrates that the proposed DAAN method not only reduces the distribution discrepancy between the two domains but also enlarges the feature distance between different health conditions. This validates that the DAAN can extract considerably more robust transferable features than the other methods.


##### 4.2. Case 2: Fault Diagnosis under Different Loads

Another test bench for the transfer learning task under different loads is displayed in Figure 7. This experiment also has the five bearing health conditions NC, IF, OF, RF, and ORF. The rotating speed was fixed at 1800 r/min, and the sampling frequency was 12.8 kHz. The vibration signals were measured under three different loads of 20 N (dataset D), 40 N (dataset E), and 60 N (dataset F). As in Case 1, 200 samples were collected from each health condition under each load, and each sample contained 2400 data points. The frequency-domain samples were again used as the inputs of DAAN, and the other parameter settings were the same as in Case 1.

The results were compared with those of the five other methods, as displayed in Table 2. The DAAN again achieved the highest diagnosis accuracy in all cases among these six methods, with an average testing accuracy of 92.65%. The SAE method without the transfer learning strategy still performed the worst, with an average testing accuracy of 50.65%. The average testing accuracies of TCA and DANN are 72.78% and 81.46%, respectively, and those of MK-MMD and SFDA are 91.10% and 89.70%, respectively. These results demonstrate that the proposed DAAN method presents better transfer performance than the other methods.

Similarly, for the case D⟶E, the dimension reduction results of these methods are displayed in Figure 8. Figure 8(a) shows that the features learned via SAE still cluster samples of the same health condition under different loads poorly, which corresponds to a low classification accuracy of 55.28%. Figures 8(b) and 8(c) demonstrate that the transferable features learned through TCA and DANN have a smaller distribution discrepancy than those learned via SAE. However, the RF and ORF samples under different loads are still separated. Figures 8(d) and 8(e) show the results of MK-MMD and SFDA, in which the distributions of the transferred features from the two domains are closer than those of the features learned by TCA and DANN. Figure 8(f) displays the excellent clustering result obtained by the proposed DAAN. The source and target features of the same health condition are gathered remarkably closely, and samples of different health conditions are effectively separated. Consequently, the proposed DAAN method can learn domain-invariant features that reduce the discrepancy between different domains.


##### 4.3. Case 3: Fault Diagnosis Using CWRU Bearing Dataset

To test the proposed method under simultaneously varying loads and speeds, a bearing dataset offered by Case Western Reserve University (CWRU) [33] is used in this section. Four bearing health conditions are considered: (1) normal condition (NC); (2) inner ring fault (IF); (3) outer ring fault (OF); (4) roller fault (RF). There are three different severity levels (0.18, 0.36, and 0.53 mm) for the IF, OF, and RF cases. Therefore, there are 10 different bearing health conditions. The raw vibration data were collected under four different loads, i.e., 0, 1, 2, and 3 hp, which correspond to four different rotating speeds: 1790, 1772, 1750, and 1730 rpm, respectively. The four datasets are named Datasets G, H, I, and J. In this experiment, each health condition under one load includes 200 samples, and each sample contains 2400 data points, so there is a total of 2000 samples for each load.

The accuracies and the corresponding standard deviations of all transfer scenarios are shown in Table 3, which covers a total of 12 different transfer scenarios. The average testing accuracies of the proposed method exceed 98.71% in all scenarios, with standard deviations below 0.17%, which means that the proposed method can effectively and stably achieve transfer fault diagnosis under different loads and speeds. In addition, the other transfer learning-based methods also achieve good results, possibly because the difference between the working conditions is not large enough. The dimension reduction results of all the transfer learning-based methods are also basically the same. Therefore, only the results of G⟶H, I⟶H, and J⟶H are provided to show the effectiveness of the proposed method, and they are displayed in Figure 9. Almost all the transferable features of the same health condition are assembled in the corresponding cluster, and features of different health conditions are separated. This indicates that the proposed method can learn transferable features without being affected by the varying loads and speeds.
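The counts quoted in this case follow from simple enumeration, as the short sketch below shows. It lists every ordered source and target pair among the four datasets and recomputes the number of health conditions and samples per load from the figures given in the text. The variable names and the ASCII arrow in the scenario labels are illustrative choices.

```python
from itertools import permutations

datasets = ["G", "H", "I", "J"]  # one dataset per load (0, 1, 2, and 3 hp)

# Every ordered (source, target) pair with distinct datasets is one transfer scenario.
scenarios = [f"{src}->{tgt}" for src, tgt in permutations(datasets, 2)]

fault_types = ["IF", "OF", "RF"]
severity_levels_mm = [0.18, 0.36, 0.53]
# One normal condition plus one condition per (fault type, severity level) combination.
n_conditions = 1 + len(fault_types) * len(severity_levels_mm)

samples_per_condition = 200
samples_per_load = n_conditions * samples_per_condition
```

Each of the four datasets serves as the source for three targets, which gives the 12 scenarios reported in Table 3.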


#### 5. Conclusions

In this paper, a novel transfer learning method called DAAN is proposed for mechanical fault diagnosis under different working conditions. The Wasserstein distance stabilizes the domain adversarial training process, and the SN strategy accelerates convergence so that far fewer iteration steps are needed. Three bearing experiments show that the DAAN obtains over 92% average classification accuracy and converges within about 15 training epochs. Moreover, the proposed method presents better transfer performance than the other transfer learning methods. The DAAN can therefore promote the practical application of intelligent fault diagnosis under different working conditions. However, it still needs a considerable number of target domain samples for model training. The next challenge is to improve the method so that it works with fewer target domain training samples.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported by the China Postdoctoral Science Foundation (2019M662399) and the Project of Shandong Province Higher Educational Young Innovative Talent Introduction and Cultivation Team (Performance enhancement of deep coal mining equipment).