Abstract

Bearing failure is the most common failure mode in rotating machinery and can result in large financial losses or even casualties. However, the complex structures around a bearing and variable real-world working conditions can lead to large differences in the distribution of vibration signals between a training set and a test set, which degrades the accuracy of fault diagnosis. Thus, efficiently improving the performance of bearing fault diagnosis under different working conditions remains a primary challenge. In this paper, a novel method for bearing fault diagnosis under different working conditions is proposed based on domain adaptation using transferable features (DATF). The datasets of normal and faulty bearings are obtained through the fast Fourier transformation (FFT) of raw vibration signals under different motor speeds and load conditions. We then reduce the differences in marginal and conditional distributions simultaneously across domains based on maximum mean discrepancy (MMD) in feature space by refining pseudo test labels, which are obtained from a nearest-neighbor (NN) classifier built on training data; a robust transferable feature representation for the training and test domains is achieved after several iterations. With the help of the NN classifier trained on transferable features, bearing fault categories are finally identified. Extensive experimental results show that the proposed method identifies bearing faults accurately under different working conditions and clearly outperforms competitive approaches.

1. Introduction

Bearings are among the most critical and widely used components in rotating machinery. Their health conditions, for example, the fault degree at different locations under different motor speeds and loads, may have a huge effect on the performance, reliability, and residual life of the equipment [1] and can even lead to heavy casualties [2-4]. Hence, it is important to diagnose bearings under different working conditions.

Cracks or spalls on the surfaces of the roller, outer race, or inner race are common failure modes in bearings [5]. The vibration signal is the most intuitive description of the operating state of a bearing. With vibration signals under different conditions being collected by sensors [6], many intelligent methods have already achieved significant success in the field of fault diagnosis. In [7], a genetic algorithm-based SVM (GA-SVM) model was presented, which achieved high accuracy and generalization ability by optimizing the parameters of the SVM. Saravanan et al. [8] proposed a fault diagnosis method based on DWT and ANN and showed that such an approach has the potential to diagnose various gearbox faults. Common intelligent fault diagnosis technologies have two key elements, namely, feature extraction and classification. Raw vibration signals collected by sensors abound in redundant information. Thus, it is important for fault diagnosis to obtain effective features [9]. Many signal processing approaches are applied to extract features from vibration signals, such as time-domain statistical analysis, frequency-domain analysis [10], and time-frequency domain analysis [2]. Dimensionality reduction is then conducted for the sake of computational efficiency, using methods such as principal component analysis (PCA) [11], locally linear embedding (LLE) [12], and linear discriminant analysis (LDA) [13]. Finally, with the help of a suitable classifier, such as nearest-neighbor (NN), support vector machine (SVM), or artificial neural network (ANN), the features acquired from the above process are used for defect classification.

In fact, most intelligent fault diagnosis methods work well only under a general assumption: the training and test data are drawn from the same distribution. However, in the operation of rotating machinery, because of complicated working conditions and complex sensor signals, the distribution of fault data is not consistent. Vibration signals sampled under different working conditions violate the above assumption and show large distribution differences between domains [9, 14], which leads to a dramatic drop in performance. More specifically, taking the roller bearing fault diagnosis problem as an example, a classifier is trained on data sampled under a certain motor speed and load; however, in the actual application it must recognize test data collected under another motor speed and load. Although the fault diameters and categories are unchanged, the distributions of the training data (training domain) and test data (test domain) differ as the working condition varies. As a direct result, the classifier can achieve high accuracy on the training domain while performing poorly on the test domain [14]. This is caused by the distribution differences between the two domains, since features extracted from one domain cannot represent the other. Of course, we can spend a lot of time and effort recollecting data to build a new classifier for effective fault diagnosis on the test domain. However, we cannot always replace the classifier by repeatedly recollecting data. Worse, it is often expensive or even impossible to rebuild the fault diagnosis model from scratch using newly collected training data for the actual task. Therefore, there is still plenty of room for improvement.

In order to avoid such recalibration effort, we might want to refine a fault diagnosis model trained in one condition (training domain) for a new working condition (test domain) or to refine the model trained on one rolling bearing (training domain) for a new rolling bearing (test domain). This leads to the research of domain adaptation (DA) [15, 16]. DA can be considered a particular setting of transfer learning [17, 18], which aims to leverage the knowledge learnt from a training domain for use in a different but related test domain by reducing distribution differences [18, 19]. In the field of DA, maximum mean discrepancy (MMD) [20-22] can be applied to evaluate distribution divergences.

In this paper, considering actual fault diagnosis applications, we propose a novel method for bearing fault diagnosis under different working conditions based on domain adaptation using transferable features (DATF). The datasets of normal and faulty bearings are obtained through the fast Fourier transformation (FFT) of raw vibration signals under different motor speeds and load conditions. A fault diagnosis model is built using a nearest-neighbor (NN) classifier in the training domain, and we then resort to the pseudo outputs of the NN classifier in the test domain to refine this model by repeatedly reducing the distribution differences between domains, so that a transferable feature representation can be learnt from the training and test domains. Finally, an NN classifier is built with the extracted transferable features, and bearing faults are identified.

The rest of this paper is organized as follows. Section 2 sketches out previous works and preliminaries, including domain adaptation and maximum mean discrepancy. Section 3 introduces fault diagnosis using transferable features, including feature space generation and transferable feature extraction and diagnosis. Section 4 presents the experimental evaluations. The conclusion is given in Section 5.

2. Previous Works and Preliminaries

2.1. Domain Adaptation

DA, as one branch of transfer learning, aims to make full use of information from both the training domain and the test domain during the learning process so as to adapt automatically [18, 19, 23]. Generally, a domain $\mathcal{D}$ is considered as consisting of a feature space $\mathcal{X}$ of inputs and a probability distribution $P(X)$ of inputs, where $X = \{x_1, \ldots, x_n\} \in \mathcal{X}$ is a series of learning samples. Note that the distributions of two domains differ when the source domain $\mathcal{D}_s$ and target domain $\mathcal{D}_t$ are different; that is, $\mathcal{X}_s \neq \mathcal{X}_t$ and $P(X_s) \neq P(X_t)$ [20, 24].

In our work, the objective of domain adaptation is to extract transferable features between two domains so as to realize bearing fault diagnosis under different working conditions. We denote the labeled training domain by $\mathcal{D}_s = \{(x_i, y_i)\}_{i=1}^{n_s}$, where $x_i \in \mathcal{X}$ is the input and $y_i \in \mathcal{Y}$ is the related class label. Similarly, let the unlabeled test domain be $\mathcal{D}_t = \{x_j\}_{j=n_s+1}^{n_s+n_t}$, where the input $x_j \in \mathcal{X}$. In terms of distributions, let $P(x_s)$ and $P(x_t)$ be the marginal distributions of $X_s$ and $X_t$ from the training and test domains, respectively. Similarly, let $Q(y_s \mid x_s)$ and $Q(y_t \mid x_t)$ be the conditional distributions of $X_s$ and $X_t$ from the training domain and test domain, respectively [20, 25, 26].

In this paper, we focus on the following settings. One training domain and one test domain share the same fault types and feature space. Domain adaptation in our work is unsupervised: the training domain is labeled while the test domain is fully unlabeled. The marginal distributions differ, $P(x_s) \neq P(x_t)$, and so do the conditional distributions, $Q(y_s \mid x_s) \neq Q(y_t \mid x_t)$. These settings are well suited to real-world fault diagnosis under variable working conditions. Our task is to predict the bearing fault types accurately in the unlabeled test domain, whose distribution is entirely different, by using the model built in the training domain.

2.2. Maximum Mean Discrepancy

The typical procedure of domain adaptation is to reduce the marginal distribution difference across domains. In our work, domain adaptation reduces both the marginal and conditional distribution differences simultaneously by explicitly minimizing an empirical distance measure, which is more suitable for bearing fault diagnosis under different working conditions. In order to avoid the expensive distribution calculation required by parametric criteria, a nonparametric distance metric, known as MMD, is employed for domain adaptation in our work. Taking data $X_s$ from the source domain and $X_t$ from the target domain, the MMD calculates the empirical estimate of the distance across domains in the $k$-dimensional embedding [20, 24]:

$$D(X_s, X_t) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} A^{T} x_i - \frac{1}{n_t} \sum_{j=n_s+1}^{n_s+n_t} A^{T} x_j \right\|^2, \tag{1}$$

where $D(X_s, X_t)$ is the distance of marginal distributions across domains, $A$ is the adaptation matrix, and $n_s$ and $n_t$ denote the number of source instances and target instances, respectively.
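As a minimal sketch of this empirical estimate, the following NumPy snippet computes the squared distance between the projected sample means of two sets. The function name `linear_mmd`, the row-per-sample layout, and the synthetic Gaussian data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def linear_mmd(Xs, Xt, A=None):
    """Squared distance between the (optionally projected) sample means.

    Xs: (n_s, d) source samples, Xt: (n_t, d) target samples, one row per sample.
    A:  optional (d, k) adaptation matrix; no projection is applied if omitted.
    """
    Xs = np.asarray(Xs, dtype=float)
    Xt = np.asarray(Xt, dtype=float)
    if A is not None:
        Xs, Xt = Xs @ A, Xt @ A
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(diff @ diff)

# Hypothetical data: identical distributions versus a mean-shifted target.
rng = np.random.default_rng(0)
same = linear_mmd(rng.normal(0, 1, (500, 4)), rng.normal(0, 1, (500, 4)))
shifted = linear_mmd(rng.normal(0, 1, (500, 4)), rng.normal(2, 1, (500, 4)))
```

In this toy setting `shifted` is much larger than `same`, which is the behaviour the distance is meant to capture: it grows as the two domains drift apart.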

3. Fault Diagnosis Using Transferable Features

As mentioned in Section 1, a large distribution difference between the training domain and test domain under different working conditions directly leads to poor performance of bearing fault diagnosis. In order to solve this problem, we need to learn the shift between the two domains and extract more robust transferable features for them. In this section, we present our novel bearing fault diagnosis method under variable working conditions. The framework of our method is illustrated in Figure 1. As shown in Figure 1, the fault diagnosis model built on labeled training data is iteratively revised according to pseudo labels, and the final diagnostic results are obtained from the revised model. Details of each part are elaborated in the following subsections.

3.1. Feature Space Generation

Raw time series vibration signals are readily available and abound in bearing information. Owing to the rotating nature of a bearing, periodic impulses appear in the measured vibration signals once a fault occurs. Thus, these fault impacts can generally be detected in the frequency domain.

In our work, we directly take FFT amplitudes from the raw time series vibration signals as samples, where all samples have the same dimension, and these samples are generated under different motor speeds and load conditions, as described in Figure 2.
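A minimal sketch of this sample generation, assuming the raw signal is cut into non-overlapping segments before the transform. The segment length of 4096 points is an assumption chosen because a real-input FFT of that length yields 4096/2 + 1 = 2049 amplitudes, and the sine wave is only a synthetic stand-in for a vibration record.

```python
import numpy as np

def fft_amplitude_samples(signal, segment_len=4096):
    """Split a 1-D signal into non-overlapping segments and return FFT amplitudes.

    Returns an array of shape (n_segments, segment_len // 2 + 1).
    Trailing points that do not fill a whole segment are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_segments = signal.size // segment_len
    segments = signal[: n_segments * segment_len].reshape(n_segments, segment_len)
    return np.abs(np.fft.rfft(segments, axis=1))

# Synthetic stand-in for a vibration record: a 1 kHz tone sampled at 12 kHz.
fs = 12_000
t = np.arange(4096 * 3) / fs
features = fft_amplitude_samples(np.sin(2 * np.pi * 1000 * t))
```

Each row of `features` is one sample of identical dimension, so rows from different recordings can be stacked directly into a data matrix.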

They are divided into two parts: labeled training data ($X_s$) and unlabeled test data ($X_t$). Then we use principal component analysis (PCA) to generate the feature space. The main steps of feature space generation are as follows.

Step 1. Take FFT amplitudes from raw time series vibration signals collected under different working conditions as samples $X$.

Step 2. Take one of the conditions with different fault types from $X$ as training samples $X_s$ with labels $Y_s$, and take another of the conditions with different fault types from $X$ as unlabeled test samples $X_t$.

Step 3. Denote $X = [X_s, X_t]$ with $n = n_s + n_t$ samples and $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}$, where $I$ denotes the identity matrix and $\mathbf{1}$ is the vector of ones. Then the $k$-dimensional representation is found by solving the optimization problem $\max_{A^{T}A = I} \operatorname{tr}(A^{T} X H X^{T} A)$, and the feature space is then created by $Z = A^{T} X$.
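Step 3 can be sketched as a standard PCA on the stacked data, assuming samples are stored as columns and the projection is formed from the leading eigenvectors of the centered scatter matrix. The function name `pca_feature_space` and the random test matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def pca_feature_space(X, k):
    """PCA on X of shape (d, n), samples as columns.

    Returns the projection A of shape (d, k) and the features A^T X of shape (k, n).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    scatter = X @ H @ X.T                       # (d, d), symmetric
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    A = eigvecs[:, ::-1][:, :k]                 # k leading eigenvectors
    return A, A.T @ X

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 40))               # hypothetical stacked data
A_demo, Z_demo = pca_feature_space(X_demo, k=2)
```

Because `np.linalg.eigh` returns orthonormal eigenvectors, the constraint that the columns of the projection are orthonormal holds by construction.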

3.2. Transferable Feature Extraction and Diagnosis

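In order to reduce the marginal distribution difference and extract robust features for the two domains, we resort to MMD as the distance measure between $P(x_s)$ and $P(x_t)$ to compare different distributions:

$$\left\| \frac{1}{n_s} \sum_{i=1}^{n_s} A^{T} x_i - \frac{1}{n_t} \sum_{j=n_s+1}^{n_s+n_t} A^{T} x_j \right\|^2 = \operatorname{tr}\left(A^{T} X M_0 X^{T} A\right), \tag{2}$$

where $M_0$ is the MMD matrix and is computed as follows [24, 26]:

$$(M_0)_{ij} = \begin{cases} \dfrac{1}{n_s n_s}, & x_i, x_j \in \mathcal{D}_s, \\ \dfrac{1}{n_t n_t}, & x_i, x_j \in \mathcal{D}_t, \\ \dfrac{-1}{n_s n_t}, & \text{otherwise}. \end{cases} \tag{3}$$

The marginal distributions of the training domain and test domain are brought closer under the new representation $Z = A^{T} X$ by minimizing (2).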
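The trace form can be checked numerically. The sketch below assumes the MMD matrix takes the usual outer-product form, with entries $1/n_s^2$ for two training samples, $1/n_t^2$ for two test samples, and $-1/(n_s n_t)$ for a mixed pair, and that training samples occupy the first columns of the data matrix. All names and the random data are illustrative.

```python
import numpy as np

def marginal_mmd_matrix(n_s, n_t):
    """MMD matrix of shape (n_s + n_t, n_s + n_t); training samples come first."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

# Check that the trace form equals the squared gap between projected means.
rng = np.random.default_rng(0)
n_s, n_t, d, k = 6, 4, 5, 2
X = rng.normal(size=(d, n_s + n_t))   # samples as columns
A = rng.normal(size=(d, k))           # arbitrary projection, only for the check
M0 = marginal_mmd_matrix(n_s, n_t)

trace_form = np.trace(A.T @ X @ M0 @ X.T @ A)
Z = A.T @ X
mean_gap = Z[:, :n_s].mean(axis=1) - Z[:, n_s:].mean(axis=1)
```

The two quantities agree because the matrix is the outer product of a signed averaging vector with itself, so multiplying the data by it reproduces the difference of the two domain means.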

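In theory, training and test data collected from sensors under different working conditions should have the same marginal and conditional distributions, but the reality is very different. To improve the performance of bearing fault diagnosis under different working conditions, the differences of conditional distribution between domains are also reduced in our work by mining the class-conditional distribution. Formally, the class-conditional distributions can be measured according to the modified MMD:

$$\left\| \frac{1}{n_s^{(c)}} \sum_{x_i \in \mathcal{D}_s^{(c)}} A^{T} x_i - \frac{1}{n_t^{(c)}} \sum_{x_j \in \mathcal{D}_t^{(c)}} A^{T} x_j \right\|^2 = \operatorname{tr}\left(A^{T} X M_c X^{T} A\right), \tag{4}$$

where $\mathcal{D}_s^{(c)}$ and $\mathcal{D}_t^{(c)}$ denote the training samples with label $c$ and the test samples with pseudo label $c$, $n_s^{(c)}$ and $n_t^{(c)}$ are their sizes, and $M_c$ is the MMD coefficient matrix that includes the class label $c$; it can be calculated according to [24, 26]

$$(M_c)_{ij} = \begin{cases} \dfrac{1}{n_s^{(c)} n_s^{(c)}}, & x_i, x_j \in \mathcal{D}_s^{(c)}, \\ \dfrac{1}{n_t^{(c)} n_t^{(c)}}, & x_i, x_j \in \mathcal{D}_t^{(c)}, \\ \dfrac{-1}{n_s^{(c)} n_t^{(c)}}, & x_i \in \mathcal{D}_s^{(c)}, x_j \in \mathcal{D}_t^{(c)} \text{ or } x_j \in \mathcal{D}_s^{(c)}, x_i \in \mathcal{D}_t^{(c)}, \\ 0, & \text{otherwise}. \end{cases} \tag{5}$$

The conditional distributions of the training and test domains are brought closer under the new representation by minimizing (4).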
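A class-wise coefficient matrix can be sketched in the same outer-product style, restricted to the samples of one class. The function name is hypothetical, and returning a zero matrix when a class is missing from either domain is an assumption of this sketch rather than something stated in the paper.

```python
import numpy as np

def conditional_mmd_matrix(y_s, y_t_pseudo, c):
    """Coefficient matrix for class c; training samples first, then test samples."""
    y_s = np.asarray(y_s)
    y_t_pseudo = np.asarray(y_t_pseudo)
    src = (y_s == c).astype(float)
    tgt = (y_t_pseudo == c).astype(float)
    if src.sum() == 0 or tgt.sum() == 0:
        # Class absent in one domain: contribute nothing (assumed convention).
        n = y_s.size + y_t_pseudo.size
        return np.zeros((n, n))
    e = np.concatenate([src / src.sum(), -tgt / tgt.sum()])
    return np.outer(e, e)

y_s = np.array([0, 0, 1, 1])        # true training labels (toy example)
y_t_pseudo = np.array([0, 1, 1])    # pseudo labels predicted for test samples
M1 = conditional_mmd_matrix(y_s, y_t_pseudo, c=1)
```

Entries that involve a sample outside class 1 are zero, so only the class-1 means of the two domains enter the distance.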

In order to obtain an effective and robust transferable feature representation and improve the quality of fault diagnosis, our work aims to reduce the impact of discrepancies in both the marginal and conditional distributions between the training and test domains by resorting to the pseudo labels of test data [26]. These pseudo labels can be obtained from a base classifier (NN classifier) built on the labeled training data and used to predict the fully unlabeled test data. Thus, the final optimization problem (6) in this paper combines (2) and (4):

$$\min_{A^{T} X H X^{T} A = I} \; \sum_{c=0}^{C} \operatorname{tr}\left(A^{T} X M_c X^{T} A\right) + \lambda \|A\|_F^2, \tag{6}$$

where $C$ is the number of fault classes, $H$ is the centering matrix, $\|\cdot\|_F$ is the Frobenius norm that guarantees the optimization problem to be well defined, and $\lambda$ is the regularization parameter [24] that trades off the impact of the regularization term on the transformation matrix $A$. The goal is to find the latent feature space created by a transformation matrix in which the discrepancies of both the marginal and conditional distributions between domains are significantly reduced. The Lagrange function for (6) is constructed as

$$L = \operatorname{tr}\left(A^{T}\left(X \sum_{c=0}^{C} M_c X^{T} + \lambda I\right) A\right) + \operatorname{tr}\left(\left(I - A^{T} X H X^{T} A\right) \Phi\right), \tag{7}$$

where $\Phi$ is the Lagrange multiplier. According to $\partial L / \partial A = 0$, the optimal solution of (6) can be acquired through the generalized eigendecomposition

$$\left(X \sum_{c=0}^{C} M_c X^{T} + \lambda I\right) A = X H X^{T} A \Phi. \tag{8}$$

Finally, the adaptation matrix $A$ is obtained by solving (8) for the $k$ smallest eigenvectors. The procedure of fault diagnosis using DATF is described in detail as follows.

Step 1. Take the training data $X_s$ with labels $Y_s$ and the unlabeled test data $X_t$ in the feature space as inputs.

Step 2. Construct the MMD matrix $M_0$ by (3). The adaptation matrix $A$, formed by the $k$ smallest eigenvectors, is acquired by solving (8), which follows from the Lagrange multiplier method. Then the robust representation for the two domains is obtained as $Z = A^{T} X$.

Step 3. Train the NN classifier on the projected training data, and then use the trained NN classifier to obtain pseudo test labels $\hat{Y}_t$, which stand in for the conditional distribution of the test domain.

Step 4. Update the MMD matrices $M_c$ by (5) according to $\hat{Y}_t$, and then obtain the updated adaptation matrix $A$ by solving (8). The updated robust representation for the two domains is obtained as $Z = A^{T} X$; then return to Step 3 until the iterations end.

Step 5. Finally, the test data labels are predicted by the adaptive NN classifier.
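The five steps can be sketched end to end as follows. This is only an illustrative reading of the procedure under stated assumptions: samples are stored as columns, the MMD matrices use the outer-product form, the generalized eigenproblem is reduced to a symmetric one through a Cholesky factor, and the defaults `k=2`, `lam=0.1`, and `n_iter=5` are hypothetical values, not the settings used in the experiments.

```python
import numpy as np

def nn_predict(Z_train, y_train, Z_test):
    """1-nearest-neighbour labels; samples are the columns of Z_train and Z_test."""
    d2 = ((Z_test.T[:, None, :] - Z_train.T[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]

def mmd_vector(mask_s, mask_t):
    """Signed averaging vector e, so that the MMD matrix is the outer product e e^T."""
    return np.concatenate([mask_s / mask_s.sum(), -mask_t / mask_t.sum()])

def solve_adaptation(X, M, k, lam):
    """Eigenvectors for the k smallest eigenvalues of
    (X M X^T + lam I) A = X H X^T A Phi, via a Cholesky reduction."""
    d, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n
    left = X @ M @ X.T + lam * np.eye(d)      # symmetric positive definite
    right = X @ H @ X.T                        # symmetric positive semidefinite
    L = np.linalg.cholesky(left)
    S = np.linalg.solve(L, np.linalg.solve(L, right).T)  # L^-1 right L^-T
    mu, U = np.linalg.eigh((S + S.T) / 2)      # mu = 1 / phi, ascending
    return np.linalg.solve(L.T, U[:, ::-1][:, :k])       # largest mu = smallest phi

def datf(Xs, ys, Xt, k=2, lam=0.1, n_iter=5):
    """Xs: (d, n_s), ys: (n_s,), Xt: (d, n_t). Returns predicted test labels."""
    X = np.hstack([Xs, Xt])
    n_s, n_t = Xs.shape[1], Xt.shape[1]
    e0 = mmd_vector(np.ones(n_s), np.ones(n_t))
    M0 = np.outer(e0, e0)                      # marginal term
    yt = None
    for _ in range(n_iter + 1):                # first pass uses the marginal term only
        M = M0.copy()
        if yt is not None:
            for c in np.unique(ys):            # add one conditional term per class
                ms, mt = (ys == c).astype(float), (yt == c).astype(float)
                if mt.sum() > 0:
                    e = mmd_vector(ms, mt)
                    M += np.outer(e, e)
        A = solve_adaptation(X, M, k, lam)
        Z = A.T @ X
        yt = nn_predict(Z[:, :n_s], ys, Z[:, n_s:])  # refined pseudo labels
    return yt

# Toy data: two well-separated classes, test domain shifted along one axis.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
ys = np.repeat([0, 1], 20)
Xs = (centers[ys] + rng.normal(size=(40, 3))).T
Xt = (centers[ys] + rng.normal(size=(40, 3)) + [0.0, 1.0, 0.0]).T
pred = datf(Xs, ys, Xt)
```

The Cholesky reduction is used here only to stay within NumPy; a dedicated generalized symmetric eigensolver would serve equally well.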

4. Experimental Evaluations

In order to demonstrate the effectiveness of the proposed fault diagnosis method, a large set of bearing vibration signals collected from a bearing test rig is used. The dataset is acquired from the bearing data center of Case Western Reserve University (CWRU) [27]. DATF is compared with the baseline approaches and several successful methods.

(a) Baseline: NN classifier with no projection and no adaptation is created. That is, original input is directly used for diagnosis.

(b) NN NA: NN classifier with no adaptation is created. Specifically, we use a new representation extracted from original input by PCA without domain adaptation.

(c) NN SA: NN classifier with projection and domain adaptation using subspace alignment that only reduces the marginal distribution [28].

(a) is a baseline method without projection and domain adaptation techniques, which is widely used in the field of fault diagnosis. (b) is a classical method without domain adaptation, which has achieved success in many fault diagnosis applications. (c) is one of the novel and efficient approaches in domain adaptation.

4.1. Experimental Setup and Dataset Preparation

The test-bed illustrated in Figure 3 consists of a driving motor, a 2 hp motor for loading, a torque sensor/encoder, a power meter, accelerometers, and an electronic control unit [27, 29]. The test bearings are located on the motor shaft. Through electrosparking, inner-race faults (IF), outer-race faults (OF), and ball faults (BF) of different sizes (0.007 in, 0.014 in, and 0.021 in) are introduced into the drive-end bearing of the motor [30]. The vibration signals are sampled with accelerometers mounted on the rack with magnetic bases.

The working conditions of rotating machinery are usually complex in the real world. To simulate the actual application and make the experimental results more persuasive, the dataset in our experiment, taken from the Drive-End Bearing Fault Data and sampled at a frequency of 12 kHz, is obtained under different working conditions. The dataset includes three fault degrees (0.007 in, 0.014 in, and 0.021 in). Each fault degree contains four bearing health types: NO (normal), IF, OF, and BF. The vibration data of each type are collected under four working conditions, i.e., L0 = 0 hp/1797 rpm, L1 = 1 hp/1772 rpm, L2 = 2 hp/1750 rpm, and L3 = 3 hp/1730 rpm. Each sample contains 2049 Fourier coefficients transformed from the raw vibration signals using FFT. Each domain in the dataset contains four fault types, and each fault type contains 200 samples. Under our experimental setup, it is impossible to find the optimal and via cross validation, since the labeled training data and unlabeled test data are sampled from different working conditions. Thus, an empirical search of the parameter space is used to find the optimal parameter settings, and details are described in Section 4.3. Finally, and are used in our work.

In order to verify the benefits of DATF, the comparison methods (a)-(c) are also carried out. In all experiments, a model is trained on labeled training data under one single load (training domain) to diagnose the unlabeled test data under another load (test domain). In all, 48 different transferring tests are conducted, and the detailed description of the experimental setup is shown in Table 1.
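The count of transferring tests can be reproduced with a short enumeration. The sketch assumes that the 48 tests are the three fault sizes combined with every ordered pair of the four load conditions, including pairs where the training and test loads coincide; the exact composition is given in Table 1, so this reading is an assumption. The dictionary layout and variable names are illustrative.

```python
from itertools import product

fault_sizes_in = [0.007, 0.014, 0.021]
# Load condition -> (motor load in hp, motor speed in rpm)
loads = {"L0": (0, 1797), "L1": (1, 1772), "L2": (2, 1750), "L3": (3, 1730)}

tasks = [
    {"fault_size_in": size, "train": src, "test": tgt}
    for size, src, tgt in product(fault_sizes_in, loads, loads)
]
cross_condition = [t for t in tasks if t["train"] != t["test"]]
samples_per_domain = 4 * 200   # four fault types x 200 samples each
```

Under this assumption, 36 of the tasks involve a genuine change of working condition and the remaining 12 serve as same-condition references.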

4.2. Diagnosis Results of the Proposed Method

The diagnostic results for fault size being 0.007in, 0.014in, and 0.021in are shown in Figures 4, 5, and 6. The average classification accuracies of four methods are described in Figure 7.

Each figure is composed of four subfigures, and the test domains in every figure are ordered clockwise from (a): L0, L1, L2, and L3. The left of the symbol "" in every subfigure represents the training domain and the right represents the test domain. For each set of bars in Figures 4, 5, and 6, the performances indicate transferring from the training domain to the test domain, which simulates fault diagnosis under different working conditions. The load and speed differ considerably between domains. For example, in Figure 4(a), the test domain is L0 (the motor load is 0 hp and the speed is 1797 rpm), and the training domains are L1 (the motor load is 1 hp and the speed is 1772 rpm), L2 (the motor load is 2 hp and the speed is 1750 rpm), and L3 (the motor load is 3 hp and the speed is 1730 rpm).

From the bearing fault diagnosis performances in Figures 4, 5, and 6, the highest accuracy is always achieved when the training domain is the same as the test domain, which is theoretically reasonable. The performances of the baseline method and NN NA are clearly very poor. For example, in Figures 6(a), 6(b), and 6(c), the accuracies are only about when we transfer L3 to L0, L1, and L2, respectively. In Figure 4 in particular, many accuracies of the baseline method and NN NA can not reach when we transfer L1 to L2. These results illustrate that traditional methods without domain adaptation cannot be applied to fault diagnosis under variable working conditions. The performances of NN SA are better than those of the first two methods. In Figures 5 and 6, the accuracies of NN SA for bearing fault diagnosis under variable working conditions are very high. However, in Figure 4(c), the performance when transferring between L1 and L2 is only about and the accuracy is about when we transfer L3 to L2. Similar phenomena also appear in Figure 4(a). These results indicate that NN SA also cannot be applied to bearing fault diagnosis under complex and variable working conditions. Notably, the proposed method is evidently superior to the other three compared methods in all cases, whatever the training domain and test domain are. Note that the accuracies of DATF all can achieve in Figures 4, 5, and 6. Even in Figure 4(a), DATF can still achieve a favorable accuracy () while the baseline method and NN NA just reach about and NN SA only achieves when transferring from L1 to L2. Compared with the other three methods, the average classification accuracy () of DATF is markedly improved. These results are all obtained from benchmark datasets of fault diagnosis research under relatively fair experimental conditions. From the above analysis, we can conclude that the proposed method has great potential for solving bearing fault diagnosis problems under different working conditions.

To further illustrate the influence of the extracted transferable features on the results, receiver operating characteristic (ROC) curves are applied for evaluation [32]. An ROC curve is generated by plotting the true positive rate against the false positive rate as the threshold level is varied. In this paper, ROC curves are obtained from different models based on the NN classifier, which are built on different extracted features. We only report ROC results for the transferring test from L1 to L2 with a fault size of 0.007 in, shown in Figure 8; similar trends are observed on all other tests. Before the iteration begins, in Figure 8(a), the performance of the model built on the extracted features is unsatisfactory. After one iteration, in Figure 8(b), the performance of the model built on the extracted transferable features improves dramatically, and the performance based on the extracted transferable features ultimately achieves perfect detection results.
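The threshold sweep behind an ROC curve can be sketched in a few lines. How the NN-based models produce a continuous score is not specified here, so the scores and labels below are hypothetical, and tied scores are treated as separate thresholds for simplicity.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold over the scores.

    labels: 1 for the positive class, 0 otherwise; a higher score means more positive.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")   # descending score
    labels = labels[order]
    tps = np.cumsum(labels)                      # true positives at each cut
    fps = np.cumsum(1 - labels)                  # false positives at each cut
    tpr = np.concatenate([[0.0], tps / max(tps[-1], 1)])
    fpr = np.concatenate([[0.0], fps / max(fps[-1], 1)])
    return fpr, tpr

fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.3, 0.2], [1, 1, 0, 1, 0])
# Area under the curve by the trapezoidal rule.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

A curve that reaches a true positive rate of 1 while the false positive rate is still 0 has an area of 1, which corresponds to the perfect detection case described above.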

4.3. Parameter Sensitivity

In this section, we investigate the influence of the parameter $\lambda$, which is the regularization parameter, during transferable feature extraction. Theoretically, larger values of $\lambda$ make shrinkage regularization more important in our work. When and , the optimization problem is ill-defined. Different values of $\lambda$ have different effects on classification accuracy. Figure 9 reports the results. From Figure 9, it is obvious that $\lambda$ has a great influence on the diagnostic results for fault sizes of 0.007 in and 0.021 in and little overall effect on the results for a fault size of 0.014 in. Notably, the results are little affected by $\lambda$ when the training domain and test domain are the same, and $\lambda \in [0.05, 0.5]$ can be optimal parameter values, which indicates that the proposed method can achieve stable and excellent performance under a wide range of parameter values.

4.4. Empirical Analysis of the Domain Discrepancy Effect

In many actual fault diagnosis and classification scenarios, the distribution of the training data domain is different from that of the test data domain, which leads to a drop in diagnostic accuracy. In fact, the data distribution differences between domains (training data domain and test data domain) reflect the differences of the data structures, which contain plenty of fault information. Extracting fault features from these data structures is a key point for fault diagnosis. In order to understand in depth the effect of distribution differences between two domains and explain why the proposed method works, we resort to the t-SNE technique [31] to visualize the high-dimensional representations of the methods in our experiment in a two-dimensional map.

Among all of the above-mentioned cases, the transferring test from L1 to L2 with a fault size of 0.007 in is taken as an example in Figure 10.

From Figure 10, it is clear that the distribution discrepancies between the training domain and test domain are much smaller for the transferable features extracted via DATF than for the compared methods, and the transferable features are much more separable than the others. These results verify that DATF can find a robust feature representation for the training domain and test domain, and that test samples can be discriminated well with an NN classifier built in the training domain by using the extracted transferable features.

4.5. Discussion

The proposed method provides a way of domain adaptation to extract robust fault features and classify fault types under different working conditions. Several remarks still need to be described.

This work presents a new point of view that uses domain adaptation to realize bearing fault diagnosis under different working conditions. Li [30] utilized spectrum images as features to conduct bearing fault diagnosis, applying two-dimensional principal component analysis (2DPCA) to the dimension reduction of the spectrum images of vibration signals and to feature extraction, and most accuracies were very high. Unfortunately, several instances still had lower accuracies. To solve this problem, we apply domain adaptation to this field, and transferable features for the training domain and test domain are extracted to classify fault types. Finally the accuracies all can reach . In this paper, our work considers more bearing conditions (a fault size of 0.007 in). Compared with the method in [30] in this situation, the advantages of our method are highlighted.

The extensive results indicate that the proposed method is suitable for effectively classifying mechanical health conditions under different working conditions. In [9], Deep Convolutional Neural Networks with Wide First-Layer Kernel (WDCNN) and AdaBN are applied to diagnose three datasets that contain 10 kinds of health conditions (BF, IF, and OF with fault sizes of 0.007 in, 0.014 in, and 0.021 in) under three load conditions (Load 1, Load 2, and Load 3), respectively, which are similar to L1, L2, and L3 in this paper. The average accuracy of this method in [9] is , whereas average accuracy of DATF is . The main reason is that the transferable features extracted based on domain adaptation take full advantage of the structure information of the training domain and test domain, and the distributions of the transferable features extracted from the training domain and test domain are very close after applying our method, as shown in Figure 10.

Note that our method is unsupervised and focuses on transfer fault diagnosis for the same fault diameter under different working conditions. In [14], a neural network method based on transferring parameters is proposed and successfully diagnoses two datasets that include 6 kinds of health conditions sampled from different fault diameters (BF, IF, and OF with fault sizes of 0.007 in and 0.021 in) with the same motor load and speed (L0); it focuses on fault diagnosis between two fault diameters under the same working conditions. In addition, that method needs a small amount of labeled data in the test domain when training the modified neural networks, while our method does not need labeled test data during training.

5. Conclusion

This paper presents a new way of solving bearing fault diagnosis under different working conditions. Although the baseline approaches and several successful methods are all capable of detecting bearing defects, the distributional difference of datasets sampled from different working conditions has a huge impact on these methods, and their shallow representations are not sensitive enough to distinguish different patterns under different working conditions. To tackle this problem, DATF extracts a transferable feature representation for the training and test domains by reducing the discrepancy between domains and strengthening the recognizable information in the raw vibration signal. To evaluate the proposed DATF method, bearing fault diagnosis experiments were carried out. Extensive experimental results show that DATF is capable of improving the performance of bearing fault diagnosis under different working conditions compared with the peer methods.

Data Availability

The data used in this paper were acquired from the bearing data center of Case Western Reserve University (CWRU), available at http://csegroups.case.edu/bearingdatacenter/home (accessed October 2015).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research is supported by the National Key R&D Program of China (2016YFC0802900), the National Natural Science Foundation of China (no. 51475455), and the Natural Science Foundation of Jiangsu Province (no. BK20160276).