#### Abstract

Effective fault diagnosis methods can ensure the safe and reliable operation of machines. In recent years, deep learning has been applied to diagnose faults in various kinds of mechanical equipment. However, in real industries the data distribution often differs between working conditions, which seriously degrades diagnostic performance. To solve this issue, this study proposes a new deep convolutional domain adaptation network (DCDAN) method for bearing fault diagnosis. The method implements cross-domain fault diagnosis by using labeled source domain data and unlabeled target domain data as training data. In DCDAN, a convolutional neural network first extracts features from the source domain and target domain data. Then, the domain distribution discrepancy is reduced by minimizing the multiple-kernel maximum mean discrepancy (MK-MMD) between the feature distributions and maximizing the domain recognition error of a domain classifier. Finally, the source domain classification error is minimized. Extensive experiments on two rolling bearing datasets verify that the proposed method can implement accurate cross-domain fault diagnosis under different working conditions. The study may provide a promising tool for bearing fault diagnosis under different working conditions.

#### 1. Introduction

Rolling element bearings are an integral part of rotating mechanical systems and are widely used in many applications, such as gearboxes, locomotive wheels, and gas turbines. Bearing failures directly cause unexpected downtime, which leads to higher maintenance costs and even safety issues. Therefore, high-accuracy bearing fault diagnosis is of great significance. In recent years, with the development of big data technology, data-driven intelligent fault diagnosis has attracted wide attention because it can provide accurate diagnosis results without extensive expert knowledge or cumbersome manual feature extraction. In particular, many studies have introduced deep learning into bearing fault diagnosis and achieved good results [1–3].

Many studies neglect changes in working conditions and assume that the training data and testing data follow the same distribution. Zhang et al. [4] designed a deep belief network and verified its effectiveness on a turbofan engine degradation dataset. Lei et al. [5] proposed a two-stage learning method: unsupervised networks first extract features, and softmax regression then classifies the health conditions. Jia et al. [6] presented a local connection network constructed from a normalized sparse autoencoder and verified its performance on a gearbox dataset and a bearing dataset. Jian et al. [7] combined an adaptive one-dimensional convolutional neural network (CNN) with wide kernels and Dempster-Shafer evidence theory into a one-dimensional fusion neural network, and experiments on the bearing data from the Case Western Reserve University (CWRU) Bearing Data Center showed that the method achieves good diagnostic accuracy. Wen [8] converted signals into two-dimensional images to build a new CNN fault diagnosis model based on LeNet-5. Shao et al. [9] developed a method for intelligent fault diagnosis of rolling bearings using ensemble deep autoencoders. Wang et al. [10] used batch normalization to train a deep neural network, and experiments showed that the method extracts features quickly and effectively. Huang et al. [11] added a new layer in front of the convolution layer to construct a composite signal, and experiments verified the validity and necessity of the added layer. Zhang et al. [12] used residual learning to address vanishing or exploding gradients during backpropagation. With the rapid development of deep learning, many other deep-learning-based fault diagnosis methods have been proposed [13–15].

However, directly applying deep learning to fault diagnosis relies on two conditions: (1) accurate diagnosis requires a large amount of labeled training data, and (2) the training data and testing data must follow the same distribution. In some fields, such as locomotive or aerospace bearings, labeled data are difficult to obtain. In addition, because working conditions change constantly in real industries, the distributions of training and testing data often differ, which degrades the generalization ability of the model. Therefore, a model that can diagnose faults accurately under different working conditions is of great practical significance.

To address this issue, various signal processing methods have been proposed. Liu et al. [16] proposed a fault diagnosis technique for bearings under unknown time-varying speed based on multicurve extraction and selection, the Vold–Kalman filter, and generalized demodulation. Zheng et al. [17] developed a multiscale fuzzy entropy method for measuring the complexity of time series. However, these methods depend largely on the quality of manually extracted features and require domain knowledge and human intervention. We therefore consider whether the problem can be solved by a deep learning method that takes the raw vibration signal directly as input, and transfer learning offers an effective route to this. Transfer learning involves source domain data and target domain data, where the source domain data are labeled; depending on whether the target domain data are fully labeled, partially labeled, or unlabeled, transfer learning is divided into supervised, semisupervised, and unsupervised transfer learning. The purpose of transfer learning is to reduce the discrepancy between the feature distributions of the source domain and target domain data [18]. When the source and target distributions differ but the two tasks are the same, this special case of transfer learning is called domain adaptation. Deep learning algorithms based on domain adaptation have achieved important results in image recognition and speech recognition [19, 20].

In recent years, a variety of intelligent bearing fault diagnosis methods based on domain adaptation have been proposed [21–23]. In [24], a domain adaptation method for bearing fault diagnosis was proposed that applies the fast Fourier transform to the original signal. Ren et al. [25] used multiscale permutation entropy and time-domain features as network inputs, and experiments verified that the method can diagnose faults under different working conditions. However, the diagnostic performance of the methods in [24, 25] also depends on the quality of the manually extracted features. Deep learning can also be used to implement transfer learning. Lu et al. [26] proposed a domain adaptation model based on a deep neural network that shortens the distance between source domain and target domain features through maximum mean discrepancy (MMD). Adding MMD adaptation layers can significantly shorten the distance between the two domains and improve cross-domain diagnostic accuracy [27, 28]. However, the methods in [26–28] rely on MMD alone to reduce the distribution discrepancy between source and target domain features. Li et al. [29] proposed a cross-domain fault diagnosis method based on deep generative neural networks, and Guo et al. [30] established a deep convolutional transfer learning network based on condition recognition and domain adaptation. Inspired by [29, 30], to reduce the distribution discrepancy effectively and improve diagnostic accuracy, we consider both an MK-MMD loss and a domain classifier loss.

Fault diagnosis under different working conditions is common in practice, so a widely applicable method for this setting is of great significance. Based on the above analysis, a new DCDAN is proposed in this paper to accurately diagnose faults in unlabeled data under different working conditions. The main contributions of this paper are as follows. First, a new domain adaptation method is proposed that can accurately diagnose faults under various working conditions without labeled target domain data. Second, a new optimization objective is proposed that combines minimizing the source domain classification error, minimizing MK-MMD, and maximizing the domain recognition error. Finally, cross-domain fault diagnosis experiments are conducted on two datasets, and the superiority of the proposed method is demonstrated by comparison with existing methods.

The rest of this paper is organized as follows. Section 2 introduces domain adaptation and MMD. Section 3 presents the model framework and optimization objective of the proposed method. Section 4 reports two case studies with the proposed model. Section 5 concludes the paper.

#### 2. Previous Works and Preliminaries

##### 2.1. Domain Adaptation

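Domain adaptation learning can effectively address the mismatch between the probability distributions of training data and testing data. In general, let $\mathcal{D} = \{\mathcal{X}, P(X)\}$ denote a domain, where $\mathcal{X}$ is the feature space of the inputs, $P(X)$ is the marginal probability distribution of the inputs, and $X = \{x_1, \ldots, x_n\} \subset \mathcal{X}$ is a set of learning samples. Usually, $\mathcal{D}_s$ denotes the source domain and $\mathcal{D}_t$ the target domain. Given a labeled source domain $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and an unlabeled target domain $\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}$, suppose that the feature spaces are the same ($\mathcal{X}_s = \mathcal{X}_t$), the category spaces are the same ($\mathcal{Y}_s = \mathcal{Y}_t$), and the conditional probabilities are the same ($Q_s(y_s \mid x_s) = Q_t(y_t \mid x_t)$), while the marginal probability distributions differ ($P_s(x_s) \neq P_t(x_t)$). The goal of domain adaptation learning is to use the labeled data in $\mathcal{D}_s$ to learn a classifier $f: x_t \mapsto y_t$ that predicts the labels $y_t \in \mathcal{Y}_t$ of the target domain $\mathcal{D}_t$.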

In this paper, we address domain adaptation under different working conditions, that is, how to use labeled data from one working condition to diagnose faults in unlabeled data from other, unknown working conditions. Because the source domain and target domain data are vibration data collected under different working conditions, their marginal distributions differ. As shown in Figure 1(a), if a classifier learned from the source domain is applied directly to the target domain, the classification results are very poor. If domain-invariant features are extracted through domain adaptation learning, the classifier learned from the source domain can effectively classify the target domain data. Learning domain-invariant features is therefore a key step toward fault diagnosis under different working conditions.

##### 2.2. Maximum Mean Discrepancy

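MMD is a measure of the discrepancy between two domains and is the most frequently used nonparametric distance metric in domain adaptation learning. MMD is a kernel method that measures the distance between two distributions in a reproducing kernel Hilbert space (RKHS) [31]. Suppose that the source dataset $X_s = \{x_i^s\}_{i=1}^{n_s}$ and the target dataset $X_t = \{x_j^t\}_{j=1}^{n_t}$ are drawn independently and identically from the distributions $p$ and $q$, respectively, where $n_s$ and $n_t$ are the sizes of the two datasets. MMD is defined as

$$
\mathrm{MMD}_{\mathcal{H}}^2(X_s, X_t) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} \phi(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t} \phi(x_j^t) \right\|_{\mathcal{H}}^2, \tag{1}
$$

where $\mathcal{H}$ represents the RKHS, $\phi: \mathcal{X} \to \mathcal{H}$ maps the source and target samples into $\mathcal{H}$, and $\langle \phi(x_s), \phi(x_t) \rangle_{\mathcal{H}} = k(x_s, x_t)$, with $k(\cdot,\cdot)$ the feature kernel.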
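Expanding the squared RKHS norm in the MMD definition with the kernel trick gives an estimator that needs only kernel evaluations: the mean of $k(x^s, x^s)$ plus the mean of $k(x^t, x^t)$ minus twice the mean of $k(x^s, x^t)$. The NumPy sketch below computes this biased estimate, assuming a single Gaussian kernel with a hand-picked bandwidth `sigma`; the function names and the synthetic data are illustrative only and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 * sigma^2))."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values caused by round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    """Biased empirical estimate of the squared MMD between samples xs and xt."""
    k_ss = gaussian_kernel(xs, xs, sigma)  # source-source similarities
    k_tt = gaussian_kernel(xt, xt, sigma)  # target-target similarities
    k_st = gaussian_kernel(xs, xt, sigma)  # source-target similarities
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(200, 5))
xt_same = rng.normal(0.0, 1.0, size=(200, 5))     # drawn from the same distribution as xs
xt_shifted = rng.normal(2.0, 1.0, size=(200, 5))  # mean-shifted distribution
print(mmd2(xs, xt_same), mmd2(xs, xt_shifted))
```

The mean-shifted sample produces a clearly larger value than the sample drawn from the same distribution, which is the property a domain adaptation loss exploits.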

In a deep neural network, features become increasingly task-specific as the layers deepen, and the transferability of features in the higher layers decreases significantly. The choice of kernel is therefore crucial for the effectiveness of domain adaptation. In this paper, to enhance the transferability of the feature representation and better implement domain transfer learning, we focus on the multiple-kernel variant of MMD (MK-MMD). As described in [32], MK-MMD assumes that the optimal kernel can be obtained as a linear combination of multiple kernels:

$$
\mathcal{K} = \left\{ k = \sum_{u=1}^{U} \beta_u k_u \;:\; \sum_{u=1}^{U} \beta_u = 1,\ \beta_u \ge 0 \right\}, \tag{2}
$$

where $\beta_u$ are the coefficients and $k_u$ are the $U$ base kernels.
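The multiple-kernel combination can be sketched by replacing the single Gaussian kernel with a convex combination of Gaussian kernels of several bandwidths. This is a minimal NumPy sketch under stated assumptions: the bandwidths `sigmas` are hypothetical, and the weights `betas` default to equal values rather than being optimized as in [32]. Because the estimator is linear in the kernel, the MK-MMD equals the $\beta$-weighted sum of single-kernel MMD values, which the usage lines check.

```python
import numpy as np

def multi_kernel(a, b, sigmas, betas):
    """Convex combination of Gaussian kernels: sum_u beta_u * k_u(a, b)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)
    return sum(beta * np.exp(-sq_dists / (2.0 * s ** 2)) for s, beta in zip(sigmas, betas))

def mk_mmd2(xs, xt, sigmas=(0.5, 1.0, 2.0, 4.0, 8.0), betas=None):
    """Squared MMD under the combined kernel; equal weights beta_u = 1/U by default."""
    if betas is None:
        betas = np.full(len(sigmas), 1.0 / len(sigmas))
    betas = np.asarray(betas, dtype=float)
    # Constraints of the kernel family: non-negative weights that sum to one.
    if np.any(betas < 0) or not np.isclose(betas.sum(), 1.0):
        raise ValueError("betas must be non-negative and sum to 1")
    k_ss = multi_kernel(xs, xs, sigmas, betas)
    k_tt = multi_kernel(xt, xt, sigmas, betas)
    k_st = multi_kernel(xs, xt, sigmas, betas)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

rng = np.random.default_rng(1)
xs = rng.normal(0.0, 1.0, size=(100, 4))
xt = rng.normal(1.0, 1.0, size=(100, 4))
combined = mk_mmd2(xs, xt, sigmas=(1.0, 2.0), betas=[0.3, 0.7])
separate = 0.3 * mk_mmd2(xs, xt, (1.0,), [1.0]) + 0.7 * mk_mmd2(xs, xt, (2.0,), [1.0])
print(combined, separate)  # equal up to floating-point round-off
```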

#### 3. The Proposed Method

A novel DCDAN model is proposed to solve the domain adaptation problem in bearing fault diagnosis under changing working conditions. The method extracts features directly from the raw vibration signals of the source and target domains and can diagnose bearing faults in the unlabeled target domain without manual feature extraction. In this section, we introduce the details of the DCDAN model, including its structure, optimization objective, and training strategy. Because CNNs perform very well at image feature extraction, each raw vibration segment is reshaped into a two-dimensional image during data preprocessing.

##### 3.1. DCDAN

The architecture of the DCDAN model is shown in Figure 2. This model can be divided into three parts: feature extraction, domain adaptation, and fault recognition.

In the feature extraction part, the labeled source domain data and the unlabeled target domain data are fed into the CNN simultaneously, and features are extracted by four stacked convolution blocks. Each block performs four operations: convolution, batch normalization (BN), activation, and max pooling. In each convolution layer, filters are convolved with the input image, and zero padding is applied so that the convolution does not change the feature map dimensions. After the convolution, BN is applied to speed up training by normalizing the data distribution back to a standard normal distribution, which keeps the gradients from becoming too small. After BN, a nonlinear activation function is applied to enhance the learning ability of the network and alleviate the vanishing gradient problem; ReLU is commonly used and has proven to be an excellent activation function in neural networks [33]. Finally, the max-pooling layer retains the main features while reducing the parameters and computation of the next layer and helping to prevent overfitting. Previous studies have shown that the number and size of the convolutional filters have a great impact on the diagnosis results. The learned features are then flattened into a one-dimensional feature vector, which is used as the input of layers F1 and F4. Layer F1 is followed by layers F2 and F3, and layer F4 is followed by layer F5.
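The four operations of one convolution block can be sketched in NumPy to show how zero padding preserves the feature map size while pooling halves it. This is a simplified sketch, not the paper's network: it assumes a single input channel, one 3 × 3 filter per block, 2 × 2 max pooling with stride 2, and batch normalization without the learnable scale, shift, or running statistics. The actual filter counts and sizes are given in Table 2 and are not reproduced here.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded ('same') 2-D cross-correlation of a batch x (N, H, W) with an odd k x k filter w."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H and W unchanged
    out = np.zeros(x.shape)
    for i in range(x.shape[1]):
        for j in range(x.shape[2]):
            out[:, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w, axis=(1, 2))
    return out

def batch_norm(x, eps=1e-5):
    """Normalize to zero mean and unit variance over the batch (learnable scale/shift omitted)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2(x):
    """2 x 2 max pooling with stride 2; an odd trailing row/column is dropped."""
    n, h, w = x.shape
    h2, w2 = h // 2, w // 2
    blocks = x[:, :2 * h2, :2 * w2].reshape(n, h2, 2, w2, 2)
    return blocks.max(axis=(2, 4))

def conv_block(x, w):
    """One block: convolution -> BN -> ReLU -> max pooling."""
    return max_pool2(relu(batch_norm(conv2d_same(x, w))))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 28, 28))   # a batch of eight 28 x 28 inputs
for _ in range(4):                 # four stacked convolution blocks
    x = conv_block(x, rng.normal(size=(3, 3)))
    print(x.shape)                 # (8, 14, 14), (8, 7, 7), (8, 3, 3), (8, 1, 1)
```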

The feature distributions differ under different working conditions. In the domain adaptation part, to minimize the distribution discrepancy, we use MK-MMD to measure the discrepancy between the learned transferable features of the two domains. As shown in Figure 2, we embed the features of layers F1, F2, and F3 into an RKHS and use a linear combination of Gaussian kernels to align the distributions of the two domains. The multiple-kernel distribution discrepancy is then taken as an optimization goal, and the network parameters are trained by minimizing MK-MMD. In addition, a domain classifier is added as a further means of reducing the domain distribution discrepancy; its domain predictions are output by the final domain classifier layer.

In the fault recognition part, for the labeled source domain fault classification task, the final output layer uses softmax regression to output the predicted fault categories. Because domain adaptation learning draws the feature distributions of the target domain and source domain closer together, the network can also classify the unlabeled target domain data correctly. In deep learning, a model with too many parameters trained on too few samples is prone to overfitting. To prevent this, dropout is applied after each fully connected layer.

##### 3.2. Optimization Objective

The proposed optimization objective of DCDAN consists of three parts: (1) minimizing the classification error of the source domain data; (2) minimizing the MK-MMD between the distributions of the two domains; and (3) maximizing the domain classification error.

To make the fault categories predicted by the DCDAN model closer to the actual fault categories, cross-entropy is used as the loss function. Cross-entropy evaluates the difference between the predicted results and the real results, and reducing the cross-entropy loss improves the prediction accuracy of the model. The cross-entropy loss *L*_{c} of the source domain is

$$
L_c = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{K} \mathbb{1}\{y_i = j\}\,\log \hat{y}_{ij}, \tag{3}
$$

where $m$ is the batch size of the input data, $K$ is the total number of fault categories, $y_i$ is the real label of the $i$-th input sample, and $\hat{y}_{ij}$ is the softmax output for category $j$.
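The source-domain cross-entropy loss *L*_{c} can be sketched directly from the network's pre-softmax outputs. The NumPy sketch below assumes integer class labels and computes the log-softmax in a numerically stable way (subtracting the row maximum) instead of taking the logarithm of softmax probabilities; the function names are illustrative.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, labels):
    """Mean negative log-probability of the true class over a batch of m samples."""
    m = logits.shape[0]
    return -log_softmax(logits)[np.arange(m), labels].mean()

labels = np.array([0, 3, 7, 9])
uniform = np.zeros((4, 10))                 # K = 10 classes, no preference
print(cross_entropy(uniform, labels))       # log(10), about 2.3026
confident = np.full((4, 10), -10.0)
confident[np.arange(4), labels] = 10.0      # large logit on the true class
print(cross_entropy(confident, labels))     # close to 0
```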

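Although MMD has achieved good domain adaptation results in several studies, most of them minimize the distribution discrepancy only at the last fully connected layer. A single-layer MMD cannot completely eliminate the feature discrepancy, because feature transferability deteriorates across several top layers. Therefore, in this study, we embed the features learned in the fully connected layers F1, F2, and F3 into the RKHS and measure their discrepancy with MK-MMD. The second optimization objective of the DCDAN model is to minimize MK-MMD. Based on equation (2), the MK-MMD loss is calculated as follows [34]:

$$
L_{\text{MK-MMD}} = \sum_{l \in \{F1, F2, F3\}} d_k^2\big(\mathcal{D}_s^{l}, \mathcal{D}_t^{l}\big), \qquad d_k^2\big(\mathcal{D}_s^{l}, \mathcal{D}_t^{l}\big) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} \phi\big(h_i^{s,l}\big) - \frac{1}{n_t}\sum_{j=1}^{n_t} \phi\big(h_j^{t,l}\big) \right\|_{\mathcal{H}_k}^2, \tag{4}
$$

where $h^{s,l}$ and $h^{t,l}$ denote the layer-$l$ features of the source and target samples, and $k \in \mathcal{K}$ is the multiple kernel defined in (2).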

At the same time, to better learn domain-invariant features, we design a domain classifier. As shown in Figure 2, the domain classifier is connected to the last convolution layer. If the domain classifier cannot distinguish the data of the source domain from those of the target domain, the features of the last convolution layer are domain invariant. Therefore, the third optimization objective of the DCDAN model is to maximize the domain classification error. The domain classifier solves a binary classification problem, and its cross-entropy loss is

$$
L_{dc} = -\frac{1}{m}\sum_{i=1}^{m}\Big[d_i \log \hat{d}_i + (1 - d_i)\log\big(1 - \hat{d}_i\big)\Big], \tag{5}
$$

where $m$ is the batch size of the input data, $d_i$ is the real domain label of the $i$-th input sample, and $\hat{d}_i$ is the softmax output of the domain classifier.
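The domain classifier loss *L*_{dc} can be sketched as a binary cross-entropy over predicted domain probabilities. The sketch assumes the label convention 0 = source and 1 = target, which the text does not fix. It also illustrates why maximizing *L*_{dc} pushes toward domain invariance: when the two domains' features are indistinguishable and balanced, the best the classifier can do is output 0.5 for every sample, which gives *L*_{dc} = log 2.

```python
import numpy as np

def domain_loss(p_target, d):
    """Binary cross-entropy of the domain classifier.

    p_target: predicted probability that each sample comes from the target domain.
    d: true domain labels (assumed convention: 0 = source, 1 = target).
    """
    p = np.clip(p_target, 1e-12, 1.0 - 1e-12)  # avoid log(0)
    return -np.mean(d * np.log(p) + (1 - d) * np.log(1 - p))

d = np.array([0, 0, 1, 1])
print(domain_loss(np.array([0.01, 0.02, 0.98, 0.99]), d))  # small: domains easy to tell apart
print(domain_loss(np.full(4, 0.5), d))                     # log(2): chance-level classifier
```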

To maximize *L*_{dc} with respect to the feature extractor, a special gradient reversal layer (GRL) is introduced. The GRL passes its input through unchanged during forward propagation and reverses the sign of the gradient during backpropagation. It can be formulated as a pseudo-function $R(x)$:

$$
R(x) = x, \tag{6}
$$

$$
\frac{dR(x)}{dx} = -I, \tag{7}
$$

where $I$ is an identity matrix.
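The GRL behavior can be sketched with an explicit forward and backward pass, so that the sign flip is visible without an autograd framework. In PyTorch such a layer is usually written as a `torch.autograd.Function` subclass with static `forward` and `backward` methods; the NumPy class below only mirrors that idea. The gradient values in the toy chain are hypothetical, and the trade-off weight `lam` is assumed to be applied to the domain loss rather than inside the layer.

```python
import numpy as np

class GradientReversal:
    """Minimal gradient reversal layer with a manual backward pass."""

    def forward(self, x):
        # Identity mapping in the forward pass.
        return x

    def backward(self, grad_output):
        # The Jacobian is -I, so the incoming gradient is simply negated.
        return -grad_output

# Toy chain: features h -> GRL -> domain classifier.
grl = GradientReversal()
h = np.array([0.5, -1.0, 2.0])
out = grl.forward(h)                     # unchanged features reach the domain classifier
lam = 0.5
g = lam * np.array([0.2, -0.4, 0.1])     # hypothetical gradient of lam * L_dc w.r.t. the GRL output
print(grl.backward(g))                   # what reaches the feature extractor: -lam * dL_dc/dh
```

Because the domain classifier sits after the GRL, it still receives the ordinary gradient and learns to separate the domains, while the feature extractor receives the negated gradient and learns to confuse it.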

Combining equations (3)–(5), the final optimization objective can be written as

$$
L = L_c + \mu L_{\text{MK-MMD}} - \lambda L_{dc}, \tag{8}
$$

where $\mu$ and $\lambda$ are the trade-off parameters and $\mu + \lambda = 1$.

In this paper, the DCDAN model is trained with stochastic gradient descent (SGD). Let $\theta_f$, $\theta_c$, and $\theta_d$ denote the parameters of the feature extractor, the fault recognition part, and the domain classifier, respectively. The loss function (8) can then be written as

$$
L(\theta_f, \theta_c, \theta_d) = L_c(\theta_f, \theta_c) + \mu L_{\text{MK-MMD}}(\theta_f, \theta_c) - \lambda L_{dc}(\theta_f, \theta_d). \tag{9}
$$

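SGD is applied to update $\theta_f$, $\theta_c$, and $\theta_d$ as follows:

$$
\theta_f \leftarrow \theta_f - \eta\left(\frac{\partial L_c}{\partial \theta_f} + \mu \frac{\partial L_{\text{MK-MMD}}}{\partial \theta_f} - \lambda \frac{\partial L_{dc}}{\partial \theta_f}\right), \tag{10}
$$

$$
\theta_c \leftarrow \theta_c - \eta\left(\frac{\partial L_c}{\partial \theta_c} + \mu \frac{\partial L_{\text{MK-MMD}}}{\partial \theta_c}\right), \tag{11}
$$

$$
\theta_d \leftarrow \theta_d - \eta \lambda \frac{\partial L_{dc}}{\partial \theta_d}, \tag{12}
$$

where $\eta$ is the learning rate.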
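One SGD step on the saddle-point objective can be sketched as a plain function of precomputed partial derivatives. This sketch assumes the standard gradient-reversal form of the updates (feature extractor ascends *L*_{dc}, domain classifier descends it); the dictionary keys and all numeric values are hypothetical and are used only to make the signs of each term visible.

```python
def dcdan_sgd_step(theta_f, theta_c, theta_d, grads, mu, lam, eta):
    """One SGD step for feature extractor (f), fault recognizer (c), and domain classifier (d).

    grads holds hypothetical partial derivatives keyed as '<loss>/<params>', e.g.
    'c/f' = dL_c/dtheta_f, 'mmd/c' = dL_MK-MMD/dtheta_c, 'dc/d' = dL_dc/dtheta_d.
    """
    # Feature extractor: descend L_c and MK-MMD, ascend the domain loss (reversed gradient).
    new_f = theta_f - eta * (grads["c/f"] + mu * grads["mmd/f"] - lam * grads["dc/f"])
    # Fault recognizer (F1-F3): descend L_c and MK-MMD, which is measured on its layers.
    new_c = theta_c - eta * (grads["c/c"] + mu * grads["mmd/c"])
    # Domain classifier: still descends its own loss, i.e. tries to tell the domains apart.
    new_d = theta_d - eta * lam * grads["dc/d"]
    return new_f, new_c, new_d

# Only the domain-loss gradients are non-zero here, to isolate the adversarial terms.
grads = {"c/f": 0.0, "mmd/f": 0.0, "dc/f": 1.0, "c/c": 0.0, "mmd/c": 0.0, "dc/d": 1.0}
print(dcdan_sgd_step(1.0, 1.0, 1.0, grads, mu=0.5, lam=0.5, eta=0.1))
```

With only the domain-loss gradients set, the feature parameter moves up (increasing *L*_{dc}) while the domain classifier parameter moves down (decreasing *L*_{dc}), which is the adversarial behavior the GRL implements in practice.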

##### 3.3. Algorithm and Training Strategy

The overall framework of the DCDAN model is shown in Figure 3. First, the data from different working conditions are divided into labeled source domain data and unlabeled target domain data. Then, domain-invariant features are extracted by the convolution layers. During training, SGD is used to minimize the loss in (9) over each batch of data, and the loss generally converges after a certain number of epochs. Finally, the trained network parameters are saved, and the trained DCDAN model is used to classify the testing data and obtain the diagnosis results.

#### 4. Results

In this section, two rolling element bearing experiments are used to verify the effectiveness of the proposed model. Case 1 uses the CWRU bearing dataset. Case 2 uses rolling element bearing data collected on a bearing test rig built by the authors.

##### 4.1. Case 1: CWRU Experiment

###### 4.1.1. CWRU Experiment Description

The first experiment adopts the CWRU bearing dataset [34]. There are four bearing states: normal (N), inner-race fault (IF), outer-race fault (OF), and ball fault (BF). Each fault type has three damage diameters (0.007, 0.014, and 0.021 inches), which together with N gives a total of ten categories. The drive-end data, sampled at 12 kHz, are used. In this paper, overlapping slicing is used to expand the data; the principle is shown in Figure 4 [35]. A sliding window of size 784 is used to expand the CWRU bearing dataset, with the overlapping parameter set to 16. With this data augmentation, 2000 samples are obtained for each health condition, for a total of 20000 samples. Before being fed into the DCDAN model, each one-dimensional sample of length 784 is reshaped into a two-dimensional 28 × 28 array. The data of the ten states are collected at four rotating speeds (1797, 1772, 1750, and 1730 rpm), forming the datasets N1, N2, N3, and N4 under four different working conditions, as shown in Table 1. N1, N2, N3, and N4 are four different domains, and each domain contains ten health conditions (nine fault states and one normal state).
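Overlapping slicing and the reshape to a 28 × 28 input can be sketched with NumPy index arithmetic. This is a minimal sketch under stated assumptions: the window advances by a fixed `stride` (step between window starts), and the row-major reshape order is assumed. How the paper's "overlapping parameter 16" maps onto a stride is not stated here, so the stride in the example is arbitrary, and the signal is a synthetic stand-in for a vibration record.

```python
import numpy as np

def sliding_windows(signal, window=784, stride=64):
    """Cut overlapping segments of length `window`, starting every `stride` points."""
    n = (len(signal) - window) // stride + 1        # number of complete windows
    starts = stride * np.arange(n)
    idx = starts[:, None] + np.arange(window)[None, :]
    return signal[idx]

signal = np.arange(10_000, dtype=float)     # stand-in for a raw vibration record
segments = sliding_windows(signal, window=784, stride=100)
images = segments.reshape(-1, 28, 28)       # each 784-point segment -> 28 x 28 input
print(segments.shape, images.shape)         # (93, 784) (93, 28, 28)
```

A smaller stride yields more (and more strongly overlapping) samples from the same record, which is how a fixed-length recording can be expanded to 2000 samples per health condition.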

To demonstrate the benefits of the proposed method, we design 12 domain adaptation fault diagnosis tasks: N1 ⟶ N2, N1 ⟶ N3, N1 ⟶ N4, N2 ⟶ N1, N2 ⟶ N3, N2 ⟶ N4, N3 ⟶ N1, N3 ⟶ N2, N3 ⟶ N4, N4 ⟶ N1, N4 ⟶ N2, and N4 ⟶ N3. In each task, the domain before the arrow is the source domain and the domain after the arrow is the target domain. Source domain data are labeled, and target domain data are unlabeled. The training data consist of the source domain data and 50% of the target domain data, and the remaining 50% of the target domain data are used to evaluate the model.
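The 12 tasks are exactly the ordered pairs of distinct domains, and the target-domain split can be sketched with a random permutation of sample indices. The sketch assumes an unstratified random 50/50 split, which the text does not specify; `split_target` is a hypothetical helper, and the domain size of 20000 samples follows from ten health conditions with 2000 samples each.

```python
import itertools
import numpy as np

domains = ["N1", "N2", "N3", "N4"]
# Every ordered (source, target) pair of distinct domains: 4 * 3 = 12 tasks,
# generated in the same order as the task list above.
tasks = list(itertools.permutations(domains, 2))

def split_target(n_target, rng):
    """Randomly split target-domain indices into a 50% training half and a 50% test half."""
    order = rng.permutation(n_target)
    half = n_target // 2
    return order[:half], order[half:]

rng = np.random.default_rng(42)
train_idx, test_idx = split_target(2000 * 10, rng)  # 10 health conditions x 2000 samples
print(len(tasks), tasks[0], tasks[-1], len(train_idx), len(test_idx))
```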

###### 4.1.2. Implementation Details

The network parameters are also very important in fault diagnosis experiments. In this paper, the structures of the CNN convolution layers and fully connected layers were determined through extensive experiments; the network parameters are listed in Table 2.

In the domain adaptation part, the values of the two trade-off parameters affect the cross-domain diagnosis results of DCDAN. To find the optimal combination, *μ* and *λ* are each searched from {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1} subject to *μ* + *λ* = 1. Each setting was evaluated in 4 trials, and the mean value was calculated. The comparative experiments show that the best result is obtained with *µ* = 0.5 and *λ* = 0.5. The batch size is set to 128 and the number of training epochs to 1000. The initial learning rate is set to 0.01 and is reduced to 10% of its current value every 100 epochs. Each domain adaptation experiment is repeated 10 times, and the accuracies are averaged. All computations are run on a PC with an Intel i7-8700 CPU, 16 GB RAM, and an NVIDIA RTX 2070 GPU, and the models are implemented in PyTorch. Taking task N1 ⟶ N2 as an example, the training loss is shown in Figure 5; the training loss of the proposed DCDAN has converged by the end of the 1000 training epochs.
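The step-decay learning rate and the (*μ*, *λ*) search grid can be sketched in a few lines. The sketch assumes the decay is applied once per completed block of 100 epochs (epoch counting from 0). In PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)`, stepped once per epoch, produces the same schedule; the plain function below is only an illustration of the arithmetic.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.1, step=100):
    """Step decay: multiply the rate by `decay` after every `step` epochs."""
    return base_lr * decay ** (epoch // step)

# Candidate (mu, lambda) pairs on the 0.1 grid with mu + lambda = 1.
grid = [(round(0.1 * i, 1), round(1.0 - 0.1 * i, 1)) for i in range(11)]

print(learning_rate(0), learning_rate(150), learning_rate(999))
print(len(grid), grid[5])  # 11 (0.5, 0.5)
```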

###### 4.1.3. Comparison Methods

To show the superiority of the DCDAN model, it is compared with several common domain adaptation methods. All methods are applied to the same datasets and the same domain adaptation tasks. The following methods are compared:

1. CNN: the CNN model trained in the source domain is applied directly to fault diagnosis in the target domain, without domain adaptation. The network consists of four stacked convolution layers followed by fully connected layers. The parameters of the four convolution layers are the same as those of the four convolution modules of DCDAN, as shown in Table 2. Three fully connected layers with 256, 128, and 10 hidden units, respectively, are used. The learning rate is set to 0.0001. The optimization objective includes only the cross-entropy loss. Like DCDAN, CNN takes two-dimensional raw vibration data as input. The other parameter settings are the same as in DCDAN.
2. Transfer component analysis (TCA): a domain adaptation method based on handcrafted features [36]. The inputs of TCA are frequency spectrum data, and the 14 handcrafted features are the mean, RMS, kurtosis, variance, crest factor, waveform factor, and eight wavelet packet transform energy ratios [23]. The trade-off parameter is searched from {0.01, 0.1, 1.0, 10, 100}.
3. Deep domain confusion (DDC): an MMD loss term is added at the last fully connected layer of the CNN. The basic network structure and model parameters are the same as those of the proposed method, and DDC also takes two-dimensional raw vibration data as input.
4. DCDAN without domain classifier (DCDAN-WDC): the domain classifier is removed from DCDAN, and only the MK-MMD domain adaptation part is retained. For a fair comparison, the same DCDAN model structure is used without the domain classifier. The optimization objective includes the cross-entropy loss and the MK-MMD loss. The other parameter settings and input data are the same as in DCDAN.

###### 4.1.4. CWRU Diagnosis Results and Discussion

Table 3 gives the diagnostic results on the testing samples for the proposed DCDAN method and the compared approaches on the CWRU dataset. Without domain adaptation, the average diagnostic accuracy is only 82.88%. The two existing methods, TCA and DDC, improve diagnostic performance only to a limited extent, with average accuracies of 83.93% and 88.81%, respectively. The average diagnostic accuracy of DCDAN-WDC is 96.55%, which shows that adding MK-MMD can significantly improve cross-domain diagnostic performance. The average diagnostic accuracy of the proposed DCDAN reaches 99.35%. It can be concluded that the proposed method can implement accurate bearing fault diagnosis when labeled source domain data and unlabeled target domain data from different working conditions are used as training data.

To further demonstrate the superiority of the proposed method, t-distributed stochastic neighbor embedding (t-SNE) [37] is used to visualize the output features. Taking task N1 ⟶ N2 as an example, Figure 6 shows the t-SNE visualizations of the last-layer features learned by the comparison methods and DCDAN. Figures 6(a) and 6(b) show a remarkable discrepancy between the source domain and target domain feature distributions, and the target domain features are not well separated. Figures 6(c) and 6(d) show that only some of the source and target domain features of the same health condition are aggregated together. In contrast, Figure 6(e) clearly shows that the source and target domain features of all health conditions are well aggregated, and the target domain samples form well-separated clusters. This further verifies the superiority of the proposed method.

*Figure 6: t-SNE visualizations of the last-layer features for task N1 ⟶ N2, panels (a)–(e).*

##### 4.2. Case 2: Validation on Bearing Test Rig

###### 4.2.1. Bearing Test Rig Experiment Description

To further verify the proposed method, a bearing test rig was built, as shown in Figure 7. A rolling bearing is used as the test bearing. Five bearing health conditions are simulated: N, IF, OF, BF, and a compound fault (OF and BF). An accelerometer installed on the bearing housing collects the vibration data. The sampling frequency is 20 kHz, and the sampling time is 220 s. Data for the five health conditions are collected at rotating speeds of 600, 1200, and 1800 rpm. This experiment therefore studies the six domain adaptation tasks shown in Table 4. Each domain consists of five health conditions, and 1000 samples are taken from each health condition without data augmentation. The output dimension of layer F3 of the DCDAN network is set to 5. The other experimental settings are similar to those of the CWRU experiment.

###### 4.2.2. Bearing Test Rig Diagnosis Results and Discussion

In the CWRU dataset, the rotating speeds of the working conditions are relatively close, which does not reflect fault diagnosis in real industries well. Therefore, in this experiment, three working conditions with large speed differences are selected to better match real industrial applications, although this also makes cross-domain fault diagnosis more difficult. Table 5 shows the cross-domain diagnostic accuracy in this experiment. Without domain adaptation, the average diagnostic accuracy is only 39.94%, which shows that under our experimental settings the feature distributions of the five health conditions differ substantially between working conditions, and cross-domain fault diagnosis is harder than in the CWRU experiment. The proposed method achieves an average diagnostic accuracy above 86%, whereas the average accuracies of TCA, DDC, and DCDAN-WDC are 49.92%, 66.45%, and 72.62%, respectively. This shows that the proposed DCDAN can effectively implement cross-domain fault diagnosis even when the working conditions are very different. Notably, the average accuracy between T1 and T2 is 86.10%, between T2 and T3 is 91.93%, and between T1 and T3 is 82.87%. This suggests that cross-domain fault diagnosis becomes more difficult as the difference between working conditions grows.

Taking task T1 ⟶ T2 as an example, Figure 8 shows the t-SNE visualizations of the last-layer features learned by CNN, TCA, DDC, DCDAN-WDC, and DCDAN. Figures 8(a) and 8(b) show that the target domain features are not separated by the CNN and TCA methods. DDC and DCDAN-WDC align the features of some health conditions, but the target domain features are still not completely separated. In contrast, Figure 8(e) shows that the features of the five health conditions are well aggregated and the target domain features are basically separated. This indicates that the proposed method can achieve bearing fault diagnosis under different working conditions with good robustness and generalization ability.

*Figure 8: t-SNE visualizations of the last-layer features for task T1 ⟶ T2: (a) CNN, (b) TCA, (c) DDC, (d) DCDAN-WDC, (e) DCDAN.*

#### 5. Conclusion

Intelligent fault diagnosis in real industry suffers from degraded diagnostic performance caused by changing working conditions. To address this issue, this study proposes a novel domain adaptation method for bearing fault diagnosis under different working conditions. In our method, the MK-MMD between the two domains is minimized on three fully connected layers, and the recognition error of a domain classifier applied to the high-dimensional convolutional features is maximized, so as to better learn domain-invariant features. Extensive experiments on two datasets show that DCDAN outperforms the comparison methods. In the CWRU experiment, compared with CNN, TCA, DDC, DCDAN-WDC, and DCDAN improve the average accuracy by 1.05, 5.93, 13.67, and 16.47 percentage points, respectively, and the average accuracy of DCDAN reaches 99.35%. The visualizations of the output features verify that the proposed method achieves more accurate cross-domain feature distribution alignment and good clustering of the target domain. On the bearing test rig dataset, the proposed method achieves an average diagnostic accuracy above 86% even though the working conditions are very different. These comparisons further demonstrate the effectiveness and superiority of DCDAN, which suggests broad prospects for industrial application.

#### Data Availability

All data included in this study are available upon request by contacting the corresponding author.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This project was supported by the Fundamental Research Funds for the Central Universities (grant no. N180304018) and the National Key Research and Development Program of China (grant no. 2017YFB1103700).