Abstract

In real industrial scenarios, data-driven diagnosis models built with conventional machine learning techniques have a limitation: it is difficult to achieve desirable fault diagnosis performance because the training and testing datasets are assumed to have the same feature distributions. To address this problem, a novel bearing fault diagnosis framework based on domain adaptation and preferred feature selection is proposed, in which a model trained on labeled data collected under one working condition can be applied to diagnose new but related target data collected under other working conditions. In this framework, an improved domain adaptation method, transfer component analysis with preserving local manifold structure (TCAPLMS), is proposed to reduce the differences in data distributions between datasets from different domains and, at the same time, take the label information of the feature dataset and the local manifold structure of the feature data into consideration. Furthermore, preferred feature selection by fault sensitivity and feature correlation (PSFFC) is embedded into this framework to select features which are more beneficial to fault pattern recognition and to reduce the redundancy of the feature set. Finally, vibration datasets collected from two test platforms are used for experimental analysis. The experimental results validate that the proposed method can obviously improve diagnosis accuracy and has significant potential benefits for actual industrial scenarios.

1. Introduction

Among the various parts of rotating machinery, the failure probability of rolling element bearings (REBs) is often higher than that of other components when the machinery operates in harsh working environments [1, 2]. Therefore, bearing fault diagnosis features prominently in industrial applications, for ensuring reliability and reducing economic losses [3]. With the advent of the big data era, signal processing and data mining technologies are undergoing rapid development, and data-driven fault diagnosis methods are also developing rapidly [36]. However, regarding their applicability in practical industrial settings, conventional data-driven intelligent diagnosis methods have two main disadvantages [4, 5, 7]. (1) Feature extraction and fault classification models abide by a common premise that the training data and the test data have identical distributions. If this premise does not hold, the generalization ability of these intelligent diagnosis methods is greatly reduced, and the premise does not hold when the working conditions are inconsistent in real industrial scenarios. (2) Labeled target fault data are often insufficient, because working conditions are variable and the failure types of rotating machines are diverse. Therefore, conventional data-driven intelligent diagnosis methods cannot establish an accurate fault diagnosis model of the target bearing in real diagnosis scenarios. To overcome the above limitations, it is essential to construct an advanced fault diagnosis model that can perform accurate fault classification on a specific dataset and show significant generalization ability on unlabeled data from other working conditions.

In the process of fault diagnosis, the crucial steps for fault pattern recognition are signal processing and feature extraction. In current research on bearing fault diagnosis, the analyzed signals mostly come from the vibration signals of REBs, and these signals are generally analyzed by time-frequency domain methods [5–9]. Yu et al. [10] decomposed the raw signal with MODWPT and calculated the reconstructed signals to obtain the primitive statistical features representing the fault state in different frequency bands. For compound fault diagnosis of bearings, Shao et al. [11] proposed a new diagnosis method, adaptive DTCWPT, based on the high-order spectrum. In order to decompose and reconstruct the vibration signal faster and more accurately, Zhang et al. [12] proposed a lifting wavelet method, which outperforms the traditional wavelet and is combined with the morphological fractal dimension to recognize the rolling bearing's running status. By combining VMD time-frequency analysis and SVM, Zhou [13] proposed a new method to detect rolling bearing faults, in which the original signal is decomposed into an intrinsic mode function (IMF) set through VMD. For weak damage characteristic extraction, Xingxing et al. [8] proposed a new fault diagnosis method based on VMD guided by ICF. For the detection and isolation of multiple failures, Zhang [14] proposed a new fault detection technique for rolling bearings that combines FAWT with time-frequency analysis. To identify the health state of rolling bearings under varying conditions, Zhong et al. [15] proposed an intelligent method based on STFT and CNN. In order to combine the ability of WT to capture transient components with the adaptive ability of EMD to analyze time-varying or nonlinear signals, Merainani [16] proposed a new method that combines EWT with the S transform (ST).

After the raw vibration signal is processed, statistical features can be selected to represent characteristic information, such as PV, RMS, V, Sw, K, energy, and energy entropy. In [9], the raw signal was decomposed by wavelet decomposition to obtain the reconstructed signals, and then 192 statistical features were obtained from the corresponding HHT envelope spectra (HES) of the reconstructed signals. In [17], raw signals were decomposed into many distinct IMFs through EMD, the first four IMFs were selected to obtain the HES and the HHT marginal spectrum, and these were then used to calculate statistical features. Wang et al. [18] proposed a signal preprocessing technique based on wavelet packet denoising (WPD) and the random forest algorithm (RFs) and used the SNR and mean square error values to select the mother wavelet set for signal preprocessing. Shuangli et al. [19] combined wavelet analysis and entropy theory to decompose and reconstruct the collected signals and then input them into a GA-optimized SVM (GA-SVM). Considering the modulation characteristics of bearing fault signals and the disadvantages of selecting the resonant high-frequency band based on experience, Jing [20] proposed a bearing fault diagnosis technique that combines EMD and spectral kurtosis. Considering the effectiveness of the extracted statistical features over the entire life cycle, Xiaodong [21] proposed a feature extraction approach based on the optimal selection of statistical indicators. Qing-feng et al. [22] used the variance contribution to eliminate false components in the IMFs and combined wavelet packet decomposition to improve EMD. By using wavelets and the FFT to decompose signals, Seryasat et al. [23] extracted the energy and RMS of signals in distinct frequency bands, which can recognize bearing faults effectively.

High-dimensional feature sets can usually be obtained after signal processing and feature extraction. Taking into account the compound mapping relationship between a bearing fault and its symptoms, it is hard to decide which statistical attributes in the high-dimensional feature space best reflect the nature of the fault. High-dimensional feature sets are prone to contain redundant features, which reduces the accuracy and efficiency of troubleshooting. A key step of the classification process is selecting a subset of features, and previous studies indicate that feature selection is an important premise for attaining accurate diagnosis. Yu et al. [17] proposed a novel feature extraction method for selecting the most sensitive characteristics, which combines the STD of the feature data with the K-means method. Fei et al. [9] proposed a feature selection technique that determines fault-sensitive features by combining the ARI and the sum of the within-class MD. Considering that the stable distribution can extract features with high discriminating ability, Chouri et al. [24] combined alpha-stable distribution feature extraction with the weighted support vector machine (WSVM) to extract features efficiently. For the problem of nonsensitive characteristics in the feature set, Liu et al. [25] proposed a feature selection approach based on sensitive feature extraction and nonlinear feature fusion, which uses CDET to choose sensitive features and weight them and then uses locality preserving projections (LPP) to reduce the dimensionality of the weighted sensitive features and obtain more sensitive characteristics. Liu et al. [26] used the K-SVD approach to train a sparse dictionary from sample data and applied it to sparsely decompose the feature vector for fault classification and identification. Considering that actual working conditions are complex and changeable, Chen et al. [27] proposed a cross-domain feature selection approach with TCA. Sun et al. [28] proposed a feature extraction and diagnosis algorithm using CNN, which can automatically select features from time-domain vibration signals and discover distributed features of the data effectively. By analyzing vibration and current signals, BR et al. [29] selected 12 statistical time-domain characteristics, among which SSC, VAR, and STD were determined to be good characteristics. Therefore, how to select the statistical features which are more beneficial to fault pattern recognition is also an important step. In this paper, a new feature selection approach, preferred feature selection by fault sensitivity and feature correlation (PSFFC), is proposed.

According to the above discussion and aiming at the two main limitations of conventional data-driven fault diagnosis techniques, recent research has shown that transfer learning and domain adaptation methods [30] have vast application prospects and extensive applicability in many fields [31–33], and they inspire a novel idea for overcoming the existing limitations. To address the problem of insufficient fault information in important fault samples, Chen et al. [34] presented an early fault diagnosis model, which combines DNN with transfer learning and can select the fault features of a great number of fault samples and the unimportant fault features of other fault samples. With HKL and transfer learning, Qian [35] built a robust fault diagnosis network applicable to different working conditions. In order to overcome the problem that the training and test data have different distributions, and considering the valid and common diagnostic knowledge obtained from multiple related source domains, Zheng et al. [3] proposed a new intelligent fault recognition approach for multiple source domains. In order to overcome the problem that too few labeled samples are insufficient for accurate diagnosis, Zeng et al. [36] proposed a fault diagnosis method in which certain parameters unrelated to subsequent tasks are pretrained and a great amount of unlabeled data is exploited. Zhou et al. [37] proposed MSDCTL for fault diagnosis under new working conditions, which does not need new labeled data. To overcome the problem of unlabeled cross-domain diagnosis of bearings, Shao et al. [38] proposed an adversarial domain adaptation approach with deep transfer learning. The basic idea of the above studies is to learn fault detection knowledge from the training data and transfer that knowledge to the test data, which improves the generalization ability of these models on the target dataset. Unlike traditional machine learning techniques, which need to start from scratch to learn new tasks, transfer learning or domain adaptation learns fault detection knowledge from the source domain and applies it to the target domain, which is more suitable for cross-domain learning applications [3, 6]. Feature-based transfer learning is the main branch of these methods, and among them, TCA is a representative method [6]. To reduce the distance between the source domain and the target domain, the features are mapped into a higher-dimensional reproducing kernel Hilbert space by means of a nonlinear transformation [30]. Although TCA is very useful, it needs further optimization because it ignores the category information of the samples and the local manifold structure of the data. Therefore, based on TCA, transfer component analysis with preserving local manifold structure (TCAPLMS) is proposed in this paper; it reduces the distribution difference between the source domain and the target domain and improves the discriminability of the feature dataset.

Given the above discussion, a novel intelligent fault diagnosis method for bearings is proposed, which is based on domain adaptation and preferred feature selection. The contributions of this paper are summarized as follows.
(1) A new feature selection method, PSFFC, is proposed to select features which are more beneficial to fault pattern recognition and to reduce the redundancy of the feature set.
(2) An improved feature-based transfer learning approach, TCAPLMS, is proposed for domain adaptation. TCAPLMS can reduce the differences in marginal distributions between datasets from different domains and, at the same time, take the label information of the feature dataset and the local manifold structure of the feature data into consideration, so that both the domain adaptability and the discriminant performance are improved.
(3) A new intelligent bearing fault diagnosis framework is proposed to suit variable working conditions, which addresses the limitations of data-driven fault diagnosis methods and enhances the generalization ability of the fault diagnosis model in actual diagnosis scenarios.

The rest of this paper is organized as follows. Section 2 introduces the theoretical backgrounds of MODWPT, LFDA, and TCA. Section 3 presents the fault diagnosis framework, PSFFC, and TCAPLMS. Section 4 describes the experimental analysis used to validate the effectiveness and adaptability of the method. Finally, Section 5 concludes the paper. Furthermore, some acronyms are listed in Table 1.

2. Theoretical Background

2.1. Maximal Overlap Discrete Wavelet Packet Transform (MODWPT)

Discrete wavelet transform (DWT) is a typical time-frequency analysis method, but it has limitations; for example, in order to perform the transform fully, the algorithm requires that the sequence length of the analyzed signal be an integer power of 2 [9]. To overcome this limitation, a highly redundant nonorthogonal wavelet transform named MODWT was proposed, which places no restriction on the signal sequence length. MODWT can be further extended to MODWPT, which offers good frequency resolution in the high-frequency bands while preserving the good properties of MODWT [39].

DWT can be explained as follows: $\{x_t,\ t = 0, 1, \dots, N-1\}$ is a real-valued time sequence and $N$ is the signal sequence length. $\{g_l,\ l = 0, 1, \dots, L-1\}$ is the even-length low-pass filter of DWT, which outputs the low-frequency part of the input signal and filters out the high-frequency part, while the high-pass filter $\{h_l\}$ is the opposite, where $L$ is the filter's length. For all nonzero integers $n$, the low-pass and high-pass filters satisfy the following equations [9, 39]:

$$\sum_{l=0}^{L-1} g_l = \sqrt{2},\quad \sum_{l=0}^{L-1} g_l^2 = 1,\quad \sum_{l=0}^{L-1} g_l g_{l+2n} = 0,$$

$$\sum_{l=0}^{L-1} h_l = 0,\quad \sum_{l=0}^{L-1} h_l^2 = 1,\quad \sum_{l=0}^{L-1} h_l h_{l+2n} = 0.$$

Additionally, the low-pass and high-pass filters are chosen to be quadrature mirror filters, so $g_l$ and $h_l$ are related to each other as follows:

$$h_l = (-1)^{l} g_{L-1-l},\qquad g_l = (-1)^{l+1} h_{L-1-l}.$$

With $V_{0,t} = x_t$, the $j$th level scaling transform coefficient is $V_{j,t}$, where $t = 0, 1, \dots, N_j - 1$ and $N_j = N/2^j$. For the DWT pyramid algorithm, $V_{j-1,t}$ is the $j$th level input, so the $j$th level output consists of the $j$th level scaling transform coefficients and wavelet transform coefficients, which are presented as follows:

$$V_{j,t} = \sum_{l=0}^{L-1} g_l\, V_{j-1,\,(2t+1-l)\ \mathrm{mod}\ N_{j-1}},$$

$$W_{j,t} = \sum_{l=0}^{L-1} h_l\, V_{j-1,\,(2t+1-l)\ \mathrm{mod}\ N_{j-1}},$$

where mod represents the remainder after division.

DWT has the following limitations [9, 39, 40]: (1) DWT can only be fully performed when the length of the analyzed signal sequence is an integer power of 2; (2) when the signal is circularly shifted, the wavelet and scaling coefficients of DWT do not undergo the identical circular shift, i.e., DWT is not shift invariant; (3) the numbers of wavelet and scaling coefficients of DWT are halved as the decomposition level increases, which affects the statistical analysis of the coefficients.

The sample size of a $J$-level DWT is restricted to an integer power of 2, whereas MODWT imposes no such restriction on the sample size. Thus, MODWT can be regarded as an optimized DWT. The filters of MODWT are redefined to conserve energy:

$$\tilde{g}_l = \frac{g_l}{\sqrt{2}},\qquad \tilde{h}_l = \frac{h_l}{\sqrt{2}}.$$

At the same time, $\tilde{g}_l$ and $\tilde{h}_l$ now satisfy the following equations:

$$\sum_{l=0}^{L-1} \tilde{g}_l = 1,\quad \sum_{l=0}^{L-1} \tilde{g}_l^2 = \frac{1}{2},\quad \sum_{l=0}^{L-1} \tilde{g}_l \tilde{g}_{l+2n} = 0,$$

$$\sum_{l=0}^{L-1} \tilde{h}_l = 0,\quad \sum_{l=0}^{L-1} \tilde{h}_l^2 = \frac{1}{2},\quad \sum_{l=0}^{L-1} \tilde{h}_l \tilde{h}_{l+2n} = 0.$$

The MODWT algorithm performs a weighted average over all observation starting points, which reduces the deviation caused by circular shifts. To avoid halving the number of coefficients, MODWT rebuilds the filters at each level $j$ by inserting $2^{j-1}-1$ zeros between the elements of $\tilde{g}_l$ and $\tilde{h}_l$.

With $\tilde{V}_{0,t} = x_t$, the updated scaling coefficients and wavelet coefficients are presented as follows:

$$\tilde{V}_{j,t} = \sum_{l=0}^{L-1} \tilde{g}_l\, \tilde{V}_{j-1,\,(t-2^{j-1}l)\ \mathrm{mod}\ N},$$

$$\tilde{W}_{j,t} = \sum_{l=0}^{L-1} \tilde{h}_l\, \tilde{V}_{j-1,\,(t-2^{j-1}l)\ \mathrm{mod}\ N},\qquad t = 0, 1, \dots, N-1.$$

In order to further decompose the high-frequency bands, MODWPT can be used to process the signals. The coefficients of MODWPT at level $j$ and node $n$ are defined as follows:

$$\tilde{W}_{j,n,t} = \sum_{l=0}^{L-1} \tilde{u}_{n,l}\, \tilde{W}_{j-1,\lfloor n/2\rfloor,\,(t-2^{j-1}l)\ \mathrm{mod}\ N},$$

where $n = 0, 1, \dots, 2^{j}-1$ is the frequency band number, and $\tilde{u}_{n,l} = \tilde{g}_l$ when $n\ \mathrm{mod}\ 4 \in \{0, 3\}$ and $\tilde{u}_{n,l} = \tilde{h}_l$ when $n\ \mathrm{mod}\ 4 \in \{1, 2\}$.
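To make the recursion above concrete, the following minimal sketch computes the coefficients of a single MODWPT node with circular boundary handling. The Haar filters, the helper name modwpt_node, and the toy signal are illustrative assumptions (the experiments in Section 4 use the "dmey" wavelet at level 4); the low-pass/high-pass selection per node follows the standard frequency-ordered wavelet packet convention.

```python
import numpy as np

def modwpt_node(w_prev, g, h, j, n):
    # w_prev: level (j-1) coefficients of the parent node floor(n/2), length N
    # g, h:   DWT low-pass and high-pass filters (rescaled by 1/sqrt(2) below)
    # j, n:   decomposition level and frequency band number, n = 0, ..., 2**j - 1
    g_t = np.asarray(g, dtype=float) / np.sqrt(2.0)
    h_t = np.asarray(h, dtype=float) / np.sqrt(2.0)
    # frequency-ordered rule: low-pass when n mod 4 is 0 or 3, otherwise high-pass
    u = g_t if n % 4 in (0, 3) else h_t
    N = len(w_prev)
    step = 2 ** (j - 1)     # equivalent to inserting 2**(j-1) - 1 zeros into the filter
    out = np.zeros(N)
    for t in range(N):
        for l in range(len(u)):
            out[t] += u[l] * w_prev[(t - step * l) % N]   # circular indexing
    return out

# toy example with Haar filters; a 4-level decomposition repeats this node by node
# (16 terminal nodes at level 4)
g = np.array([1.0, 1.0]) / np.sqrt(2.0)
h = np.array([1.0, -1.0]) / np.sqrt(2.0)
x = np.sin(2 * np.pi * 50 * np.arange(2000) / 12000) + 0.1 * np.random.randn(2000)
low_band = modwpt_node(x, g, h, j=1, n=0)
high_band = modwpt_node(x, g, h, j=1, n=1)
```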

2.2. Local Fisher Discriminant Analysis (LFDA)

LFDA integrates the advantages of LDA and LPP [17, 41]; that is, LFDA aims to obtain the best separability between classes in the projected space while retaining the local structure within each class. On the basis of LDA, LFDA considers the proximity relationships between sample data of the same class, so that the data after dimensionality reduction are more conducive to classification [41].

LDA can be explained as follows: let $\{x_i \in \mathbb{R}^d\}_{i=1}^{n}$ be $d$-dimensional samples and $\{y_i \in \{1, 2, \dots, c\}\}_{i=1}^{n}$ be the associated class labels, where $n$ is the number of samples and $c$ is the number of classes. Let $n_l$ be the number of samples in class $l$. Let $S^{(w)}$ be the scatter matrix within the class and $S^{(b)}$ be the scatter matrix between classes [17, 41]:

$$S^{(w)} = \sum_{l=1}^{c}\ \sum_{i:\,y_i = l} (x_i - \mu_l)(x_i - \mu_l)^{\top},$$

$$S^{(b)} = \sum_{l=1}^{c} n_l\,(\mu_l - \mu)(\mu_l - \mu)^{\top},$$

where $\mu_l$ is the sample average of class $l$ and $\mu$ is the average of all samples.

Assuming $S^{(w)}$ has full rank and is invertible, the LDA transformation matrix $T_{\mathrm{LDA}}$ is defined as follows:

$$T_{\mathrm{LDA}} = \arg\max_{T \in \mathbb{R}^{d\times m}} \operatorname{tr}\!\left[\left(T^{\top} S^{(w)} T\right)^{-1} T^{\top} S^{(b)} T\right].$$

LPP can be explained as follows: $A$ is an affinity matrix and $A_{ij} \in [0, 1]$ is the affinity between $x_i$ and $x_j$, where $i, j = 1, 2, \dots, n$. The value of $A_{ij}$ depends on the proximity relationship between $x_i$ and $x_j$ in the feature space. $A_{ij}$ is defined as follows [40]:

$$A_{ij} = \exp\!\left(-\frac{\lVert x_i - x_j\rVert^{2}}{\sigma_i \sigma_j}\right),$$

where $\sigma_i$ is the local scaling of the data samples around $x_i$, which is defined as follows:

$$\sigma_i = \lVert x_i - x_i^{(K)}\rVert,$$

where $x_i^{(K)}$ is the $K$-th nearest neighbor of $x_i$ and $K$ is generally set to 7.

The LPP conversion matrix is represented as follows:

$$T_{\mathrm{LPP}} = \arg\min_{T}\ \frac{1}{2}\sum_{i,j=1}^{n} A_{ij}\,\lVert T^{\top}x_i - T^{\top}x_j\rVert^{2},$$

subject to

$$T^{\top} X D X^{\top} T = I,$$

where $X = (x_1, x_2, \dots, x_n)$, and $D$ is the $n$-dimensional diagonal matrix that satisfies the following equation:

$$D_{ii} = \sum_{j=1}^{n} A_{ij}.$$

The LFDA transformation matrix is defined as follows [41]:

$$T_{\mathrm{LFDA}} = \arg\max_{T} \operatorname{tr}\!\left[\left(T^{\top}\tilde{S}^{(w)}T\right)^{-1} T^{\top}\tilde{S}^{(b)}T\right],$$

where $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$ are the updated scatter matrices within the class and between classes, respectively:

$$\tilde{S}^{(w)} = \frac{1}{2}\sum_{i,j=1}^{n} \tilde{W}^{(w)}_{ij}\,(x_i - x_j)(x_i - x_j)^{\top},$$

$$\tilde{S}^{(b)} = \frac{1}{2}\sum_{i,j=1}^{n} \tilde{W}^{(b)}_{ij}\,(x_i - x_j)(x_i - x_j)^{\top},$$

where $\tilde{W}^{(w)}_{ij}$ and $\tilde{W}^{(b)}_{ij}$ represent the weight values for the sample pair $(x_i, x_j)$ in the identical class and in distinct classes, respectively, which can be defined as follows:

$$\tilde{W}^{(w)}_{ij} = \begin{cases} A_{ij}/n_l, & y_i = y_j = l,\\ 0, & y_i \neq y_j, \end{cases}$$

$$\tilde{W}^{(b)}_{ij} = \begin{cases} A_{ij}\left(\dfrac{1}{n} - \dfrac{1}{n_l}\right), & y_i = y_j = l,\\[4pt] \dfrac{1}{n}, & y_i \neq y_j. \end{cases}$$

It is necessary to weight the values of sample pairs in the identical class based on the affinity $A_{ij}$, so that samples which are far apart within the same class have less impact on $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$.
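As a concrete illustration of this weighting scheme, the sketch below builds the local scatter matrices from a local-scaling affinity matrix and solves the resulting generalized eigenvalue problem. The function names, the default neighbor index k = 7, and the small regularization term are assumptions made for this example, not part of the original formulation.

```python
import numpy as np
from scipy.linalg import eigh

def lfda_scatters(X, y, k=7):
    """Local within-class and between-class scatter matrices following the
    weighting scheme described above (a sketch)."""
    n, d = X.shape
    # pairwise distances and local-scaling affinity A_ij
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    sigma = np.maximum(np.sort(dist, axis=1)[:, min(k, n - 1)], 1e-12)
    A = np.exp(-dist ** 2 / np.outer(sigma, sigma))

    Ww = np.zeros((n, n))            # within-class pair weights
    Wb = np.full((n, n), 1.0 / n)    # between-class pair weights (1/n by default)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        nl = len(idx)
        Ww[np.ix_(idx, idx)] = A[np.ix_(idx, idx)] / nl
        Wb[np.ix_(idx, idx)] = A[np.ix_(idx, idx)] * (1.0 / n - 1.0 / nl)

    # 0.5 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T written in matrix (Laplacian) form
    def pairwise_scatter(W):
        D = np.diag(W.sum(axis=1))
        return X.T @ (D - W) @ X

    return pairwise_scatter(Ww), pairwise_scatter(Wb)

def lfda_transform(X, y, m=2):
    """LFDA directions: maximize tr[(T' Sw T)^-1 T' Sb T] via a generalized
    eigenproblem; a tiny ridge keeps Sw positive definite."""
    Sw, Sb = lfda_scatters(X, y)
    evals, evecs = eigh(Sb, Sw + 1e-9 * np.eye(X.shape[1]))
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:m]]       # columns are the LFDA projection directions
```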

2.3. Transfer Component Analysis (TCA)

TCA [42] is a typical feature-based transfer learning method, which aims to reduce the difference between the marginal distributions of different datasets. A domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ consists of a $D$-dimensional feature space $\mathcal{X}$ and its marginal probability distribution $P(X)$, where $X = \{x_1, x_2, \dots, x_n\} \subset \mathcal{X}$ is the training dataset. $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$ is the learning task, where $\mathcal{Y}$ is the label space and $f(\cdot)$ is the predictive function, which represents the conditional probability distribution $P(y\,|\,x)$. Given a source domain $\mathcal{D}_S$, a target domain $\mathcal{D}_T$, and the corresponding learning tasks $\mathcal{T}_S$ and $\mathcal{T}_T$, TCA aims to facilitate the predictive function $f_T(\cdot)$ in $\mathcal{D}_T$ by learning from the data of $\mathcal{D}_S$ and $\mathcal{D}_T$; in a real diagnostic procedure, $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$. A nonlinear mapping function $\phi(\cdot)$ in a reproducing kernel Hilbert space $H$ exists such that $P(\phi(X_S)) \approx P(\phi(X_T))$ and $P(Y_S\,|\,\phi(X_S)) \approx P(Y_T\,|\,\phi(X_T))$. The optimization objective of TCA is that the variance of the feature data is preserved in a latent space while the distance between the marginal distributions of the $X_S$ and $X_T$ datasets is minimized as much as possible. In TCA, the empirical maximum mean discrepancy (MMD) represents the distance between the two marginal distributions $P(X_S)$ and $P(X_T)$, and the definition of MMD is represented as follows [7, 42, 43]:

$$\mathrm{MMD}(X_S, X_T) = \left\lVert \frac{1}{n_S}\sum_{i=1}^{n_S}\phi(x_{S_i}) - \frac{1}{n_T}\sum_{j=1}^{n_T}\phi(x_{T_j}) \right\rVert_{H}^{2} = \operatorname{tr}(KL),$$

where $n_S$ and $n_T$ are the sample numbers of $X_S$ and $X_T$, respectively. $K$ represents a kernel matrix, as follows:

$$K = \begin{bmatrix} K_{S,S} & K_{S,T}\\ K_{T,S} & K_{T,T} \end{bmatrix} \in \mathbb{R}^{(n_S+n_T)\times(n_S+n_T)},$$

where $K_{S,S}$, $K_{S,T}$ ($K_{T,S}$), and $K_{T,T}$ are the kernel matrices in the source domain, across domains, and in the target domain, respectively. The expression of $L$ is shown as follows:

$$L_{ij} = \begin{cases} \dfrac{1}{n_S^{2}}, & x_i, x_j \in X_S,\\[4pt] \dfrac{1}{n_T^{2}}, & x_i, x_j \in X_T,\\[4pt] -\dfrac{1}{n_S n_T}, & \text{otherwise}. \end{cases}$$

Empirical kernel mapping can reduce the matrix dimension: it transforms the high-dimensional kernel features to $d$-dimensional data by using a matrix $\tilde{W} \in \mathbb{R}^{(n_S+n_T)\times d}$, which is embedded into $K$. The resultant kernel matrix is as follows [7]:

$$\tilde{K} = \left(K K^{-1/2}\tilde{W}\right)\left(\tilde{W}^{\top}K^{-1/2}K\right) = K W W^{\top} K,$$

where $W = K^{-1/2}\tilde{W}$, and the MMD distance can then be rewritten as

$$\mathrm{MMD}(X_S, X_T) = \operatorname{tr}\!\left(\tilde{K}L\right) = \operatorname{tr}\!\left(W^{\top} K L K W\right).$$

Therefore, the kernel learning problem (the objective function of TCA) can be rewritten as follows:

$$\min_{W}\ \operatorname{tr}\!\left(W^{\top} K L K W\right) + \mu\operatorname{tr}\!\left(W^{\top} W\right)\quad \text{s.t.}\quad W^{\top} K H K W = I_d,$$

where the regularization term $\mu\operatorname{tr}(W^{\top}W)$ is used to control the complexity of $W$ and, at the same time, to avoid the rank deficiency of the denominator; $\mu > 0$ is a trade-off parameter; $I_d \in \mathbb{R}^{d\times d}$ is an identity matrix; and $H = I_{n_S+n_T} - \frac{1}{n_S+n_T}\mathbf{1}\mathbf{1}^{\top}$ is the centering matrix, with $\mathbf{1}$ the column vector of all ones. Finally, this objective can be cast as a trace optimization problem and solved efficiently: $W$ consists of the $d$ leading eigenvectors of $\left(KLK + \mu I\right)^{-1} K H K$.
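The following sketch mirrors this formulation: it assembles K, L, and H and recovers W from the equivalent eigenproblem. The RBF kernel choice, the parameter defaults, and the function name tca_components are assumptions for illustration rather than the reference implementation.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel

def tca_components(Xs, Xt, d=5, mu=1.0, gamma=1.0):
    """Minimal TCA sketch: minimize tr(W'KLKW) + mu*tr(W'W) s.t. W'KHKW = I,
    solved as the d leading eigenvectors of (KLK + mu*I)^-1 KHK."""
    ns, nt = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt])
    n = ns + nt
    K = rbf_kernel(X, X, gamma=gamma)

    # MMD coefficient matrix L = e e^T with e_i = 1/ns (source), -1/nt (target)
    e = np.vstack([np.full((ns, 1), 1.0 / ns), np.full((nt, 1), -1.0 / nt)])
    L = e @ e.T
    # centering matrix H
    H = np.eye(n) - np.ones((n, n)) / n

    A = K @ L @ K + mu * np.eye(n)   # terms to minimize (plus regularizer)
    B = K @ H @ K                    # variance term kept in the constraint
    # generalized eigenproblem B w = lambda A w; keep the d largest eigenvalues
    evals, evecs = eigh(B, A)
    W = evecs[:, np.argsort(evals)[::-1][:d]]
    Z = K @ W                        # low-dimensional representations (source rows first)
    return Z[:ns], Z[ns:], W
```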

3. Proposed Method and System Framework

3.1. Preferred Feature Selection by Fault Sensitivity and Feature Correlation (PSFFC)

In order to select features which are more beneficial to fault pattern recognition and to reduce the redundancy of the feature set, we suggest in this paper that both the fault sensitivity of each statistical feature and the correlation between features should be considered when selecting preferred features. Therefore, there are two aspects:
(1) The K-means algorithm (KA) [44] and the SMD of the feature data are applied to indicate fault sensitivity. Each type of statistical feature is processed by KA, which yields an index, the adjusted rand index (ARI). For each kind of feature, the MD of the data samples under each condition can be calculated, and the SMD over all bearing conditions can then be obtained. The ARI and SMD indicate the class discriminative degree and the cohesiveness of the feature data, respectively. The ratio of ARI to SMD is used to evaluate the fault sensitivity of a feature: the higher the ratio, the greater the fault sensitivity.
(2) The PCC [45] is used to evaluate the correlation between features: the higher the PCC, the higher the correlation between features.

On the basis of the above two aspects, a new feature evaluation index, the feature priority selection degree (FPSD), is proposed and used to select preferred features for fault pattern recognition. The procedure of PSFFC is summarized as follows.

Step 1: given a raw vibration signal dataset, there are M fault types, and each fault type has N vibration signal samples. K types of statistical features can be obtained by vibration signal processing and original feature extraction. These features constitute the raw statistical feature set $F = \{F^{k},\ k = 1, 2, \dots, K\}$, where the expression of $F^{k}$ is as follows:

$$F^{k} = \left\{ f_{ij}^{k}\ \middle|\ i = 1, 2, \dots, M;\ j = 1, 2, \dots, N \right\},$$

where $f_{ij}^{k}$ is the $k$-th feature of the $j$-th sample of the $i$-th fault type. Then, by using KA, the ARI of the clustering partitions can be used to indicate the class discriminative power of the feature data [46, 47]. The definition of ARI is described as follows: given a set of $n$ objects $X = \{X_1, X_2, \dots, X_n\}$, $U$ and $V$ are supposed to represent two different partitions of the objects in $X$; the ARI is then defined as follows [46, 47]:

$$\mathrm{ARI} = \frac{2(ad - bc)}{(a+b)(b+d) + (a+c)(c+d)},$$

where $a$ is the number of object pairs belonging to the same classes in both $U$ and $V$; $d$ is the number of object pairs belonging to different classes in both $U$ and $V$; $b$ represents the number of object pairs belonging to the same class in $U$ and different classes in $V$; and $c$ represents the number of object pairs belonging to different classes in $U$ and the same class in $V$. The maximum of ARI is 1, which indicates that a correct classification between classes is achieved by KA [17, 47]. Therefore, the value of ARI can be used to indicate the clustering performance, which reflects the feature's discriminant power [47]. When the feature sets $F^{k}$ are analyzed by the K-means algorithm, the corresponding $\mathrm{ARI}(k)$ can be obtained. The higher the value of ARI, the greater the class discriminative degree of the feature.

Step 2: for each bearing condition (fault pattern) and each type of statistical feature, the MD of the feature data (the elements of the $i$-th row of $F^{k}$) is calculated. Thus, the corresponding MD set $\{MD_{i}^{k},\ i = 1, 2, \dots, M\}$ can be obtained. The expression of $MD_{i}^{k}$ is as follows:

$$MD_{i}^{k} = \frac{1}{N}\sum_{j=1}^{N}\left| f_{ij}^{k} - \bar{f}_{i}^{k}\right|,$$

where

$$\bar{f}_{i}^{k} = \frac{1}{N}\sum_{j=1}^{N} f_{ij}^{k}.$$

Then, for the $k$-th statistical feature over the M fault types, the SMD of the feature samples is calculated to obtain $\mathrm{SMD}(k)$. The expression of $\mathrm{SMD}(k)$ is as follows:

$$\mathrm{SMD}(k) = \sum_{i=1}^{M} MD_{i}^{k}.$$

Therefore, the K types of statistical features have a mean deviation sequence $\{\mathrm{SMD}(k),\ k = 1, 2, \dots, K\}$. We suppose that the MD can be used to indicate the cohesion of the feature data: the smaller the value of $\mathrm{SMD}(k)$, the greater the class cohesion of the feature.

Step 3: the evaluation index of fault sensitivity, FSD (fault sensitivity degree), can be obtained by calculating the ratio of ARI to SMD. For the K types of features, there is an FSD sequence $\{\mathrm{FSD}(k),\ k = 1, 2, \dots, K\}$, where $\mathrm{FSD}(k)$ is defined as follows:

$$\mathrm{FSD}(k) = \frac{\mathrm{ARI}(k)}{\mathrm{SMD}(k)}.$$

The higher the value of $\mathrm{FSD}(k)$, the better the fault sensitivity of the feature.

Step 4: the PCC between features is then calculated. For the raw feature set, which contains K types of statistical features, the PCC between each feature and the remaining K − 1 features should be calculated, so each feature has K − 1 PCCs. Then, the SPCC (sum of the K − 1 PCCs) can be obtained. Given two samples $X$ and $Y$, the PCC is defined as follows:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^{2}}},$$

where $\bar{X}$ and $\bar{Y}$ are the means of the samples, and $\sigma_X$ and $\sigma_Y$ are, respectively, the standard deviations of samples $X$ and $Y$. Next, there is an SPCC sequence $\{\mathrm{SPCC}(k),\ k = 1, 2, \dots, K\}$, and $\mathrm{SPCC}(k)$ is defined as follows:

$$\mathrm{SPCC}(k) = \sum_{i=1,\ i\neq k}^{K} \rho_{k,i},$$

where $\rho_{k,i}$ represents the PCC between the $k$-th type feature and the $i$-th type feature.
In this paper, we suppose that the higher the SPCC of a feature, the higher the redundancy that the feature brings to the raw feature set.

Step 5: a new feature evaluation index, FPSD, can be obtained by combining the FSD and SPCC through a balance factor. When the balance factor is 0, FPSD only takes feature correlation (SPCC) into account; on the contrary, FPSD only takes fault sensitivity (FSD) into account when the balance factor is 1. In this paper, it is presumed that the selection priority of a feature is higher when the value of FPSD is higher. Therefore, the sorted FPSD sequence can be obtained by sorting the FPSD values of the features in descending order. The sorted FPSD sequence is then used to select features for the implementation of the subsequent fault diagnosis process.
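A minimal sketch of the PSFFC ranking is given below, using scikit-learn's KMeans and adjusted_rand_score. Because the combination of FSD and SPCC is only described verbally above, the form FPSD = lambda * FSD − (1 − lambda) * SPCC after min-max normalization is an assumption of this sketch, as are the helper name psffc_scores and the default parameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def psffc_scores(F, y, lam=0.5, n_clusters=None, random_state=0):
    """Rank the K statistical features of F (samples x K) by an assumed FPSD
    combination of fault sensitivity (FSD) and feature correlation (SPCC)."""
    n, K = F.shape
    classes = np.unique(y)
    n_clusters = n_clusters or len(classes)

    ari = np.zeros(K)
    smd = np.zeros(K)
    for k in range(K):
        f = F[:, k].reshape(-1, 1)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=random_state).fit_predict(f)
        ari[k] = adjusted_rand_score(y, labels)        # class discriminative degree
        # SMD: sum over fault types of the mean deviation of the feature data
        smd[k] = sum(np.mean(np.abs(f[y == c] - f[y == c].mean())) for c in classes)

    fsd = ari / np.maximum(smd, 1e-12)                 # fault sensitivity degree
    corr = np.corrcoef(F, rowvar=False)                # PCC matrix between the K features
    spcc = corr.sum(axis=0) - 1.0                      # sum of the K-1 PCCs per feature

    def minmax(v):
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    fpsd = lam * minmax(fsd) - (1.0 - lam) * minmax(spcc)
    return np.argsort(fpsd)[::-1], fpsd                # feature indices by priority

# usage: order, _ = psffc_scores(F_train, y_train); F_sel = F_train[:, order[:psfn]]
```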

3.2. Transfer Component Analysis with Preserving Local Manifold Structure (TCAPLMS)

TCA can keep the variance of the data to the greatest extent and minimize the marginal distribution differences between datasets in different domains as much as possible [41]. However, TCA does not take the label information and the local manifold structure of feature data into consideration. Aiming at fault pattern recognition, the label information of the training feature dataset is beneficial to improve the discriminant performance of feature data and increase the classification accuracy [9, 17]. Furthermore, preserving the local manifold structure of data is beneficial to pattern recognition and classification of multimode feature data [2, 17]. Therefore, TCAPLMS, a novel feature-based transfer learning method, is proposed in this section. TCAPLMS naturally inherits the merits of TCA and LFDA, that is, the optimization goal of improved LFDA can be integrated into TCA, where the label information of feature data is considered and the local manifold structure of data is preserved.

Based on the introduction of TCA and LFDA in Section 2, the optimization goal of TCAPLMS can be defined by integrating the optimization goal of TCA with that of an improved LFDA; that is, the local within-class and between-class scatter terms are added to the MMD-based objective of TCA. The goal function of TCAPLMS is therefore built from $\hat{S}^{(w)}$ and $\hat{S}^{(b)}$, which are obtained by modifying the LFDA scatter matrices $\tilde{S}^{(w)}$ and $\tilde{S}^{(b)}$ defined in Section 2.2, together with the corresponding modified weight matrices $\hat{W}^{(w)}_{ij}$ and $\hat{W}^{(b)}_{ij}$.

In the modified between-class weights, $j \in N(i)$ denotes that $x_j$ is among the nearest neighbors of $x_i$. The reason for this modification is a shortcoming of LFDA: the neighbor relationships between samples of the same class are taken into account, whereas the neighbor relationships between samples of different classes are not. To address this problem, the between-class scatter matrix $\tilde{S}^{(b)}$ is modified to $\hat{S}^{(b)}$.

The solution of the TCAPLMS goal function can be transformed into a trace optimization problem. The Lagrange multipliers are collected in a diagonal matrix $\Phi$, and applying the Lagrangian technique to the goal function leads to a generalized eigenvalue problem, from which the matrix $W$ can be solved.

Finally, the eigenvalues and the corresponding eigenvectors can be obtained by solving the above problem; the first $d$ ($d < D$, where $D$ is the dimension of the input feature space of TCAPLMS) eigenvectors corresponding to the $d$ smallest nonnegative eigenvalues are selected to compose the transformation matrix $W$.

With the use of the proposed TCAPLMS, low-dimensional representations of the training and testing datasets can be obtained with a smaller difference between their marginal distributions, greater discriminant performance, and less redundant information.
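The eigendecomposition step itself is generic and can be sketched as below. The matrices M_min and M_con stand for the terms that the TCAPLMS goal function minimizes and constrains, respectively; their exact composition follows the paper's goal function and is not reproduced here, and the small ridge term is an assumption added to keep the solver stable.

```python
import numpy as np
from scipy.linalg import eigh

def smallest_nonneg_generalized_eigvecs(M_min, M_con, d):
    """Solve the generalized eigenvalue problem M_min w = lambda * M_con w and
    keep the eigenvectors of the d smallest nonnegative eigenvalues, as in the
    selection rule described above."""
    n = M_con.shape[0]
    evals, evecs = eigh(M_min, M_con + 1e-9 * np.eye(n))   # ridge keeps M_con positive definite
    keep = np.where(evals >= 0)[0]                          # nonnegative eigenvalues only
    order = keep[np.argsort(evals[keep])][:d]               # d smallest of them
    return evecs[:, order]                                   # columns form the matrix W
```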

3.3. System Framework

The structure block diagram of the proposed system framework for variable-condition bearing fault diagnosis is shown in Figure 1. According to the system framework, the entire procedure has four steps, namely, signal processing, feature extraction, feature transfer learning, and fault pattern recognition, organized into a training phase and a testing phase. First of all, the vibration signals collected for training and testing are, respectively, decomposed into different packet nodes by MODWPT. Then, the original statistical features are generated. In the training phase, the original feature set is processed by the proposed PSFFC to obtain the sorted FPSD sequence, and the most preferred features are selected for training the fault diagnosis model. In the testing phase, the sorted FPSD sequence obtained from the training phase is directly applied to select preferred features and construct the feature subset. Next, the labeled feature data from the training phase and the unlabeled feature data from the testing phase are chosen as the source domain and the target domain, respectively. The proposed TCAPLMS is employed to process the source and target domains, which yields the low-dimensional feature datasets. Finally, the low-dimensional feature dataset of the training phase is employed to train the pattern recognition classifier, and the trained classifier is then used to classify the low-dimensional testing feature dataset and output the fault diagnosis accuracy.
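Read as code, the framework amounts to the following schematic. It strings together hypothetical helpers that stand for the steps described above: extract_features for the MODWPT-based feature generation, psffc_scores for the PSFFC ranking of Section 3.1, and tcaplms_transform for the domain adaptation of Section 3.2; these names, the parameter values, and the SVM classifier choice are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

def diagnose(signals_train, y_train, signals_test, psfn=40, d=10):
    # 1) signal processing + original feature generation (MODWPT nodes -> statistics)
    F_train = np.array([extract_features(s) for s in signals_train])
    F_test = np.array([extract_features(s) for s in signals_test])

    # 2) preferred feature selection: the FPSD ranking is learned on the training
    #    (source) data only and then reused for the testing (target) data
    order, _ = psffc_scores(F_train, y_train)
    F_train, F_test = F_train[:, order[:psfn]], F_test[:, order[:psfn]]

    # 3) feature transfer learning: project both domains with TCAPLMS
    Z_train, Z_test = tcaplms_transform(F_train, F_test, y_train, d=d)

    # 4) fault pattern recognition: train on the source domain, predict the target
    clf = SVC(kernel="rbf").fit(Z_train, y_train)
    return clf.predict(Z_test)
```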

4. Experiments and Analysis Results

In this paper, bearing vibration datasets obtained from two experimental test platforms are used to validate the effectiveness of the proposed fault diagnosis framework toward real industrial scenes. The two experimental test rigs are introduced as follows:
(1) Test rig 1, as shown in Figure 2, is from Case Western Reserve University (CWRU) [48–50], and this test rig supports a motor load of 0–3 horsepower (hp). Three accelerometers are placed on the fan-end and drive-end bearings at the 12 o'clock position.
(2) Test rig 2 is the SQI-MFS test rig, as shown in Figure 3 [9, 17]; the different fault conditions are presented in Figure 4. SQI-MFS supports motor speeds of 1200–1800 rpm. Two accelerometers are placed on the fan-end and drive-end bearings to collect vibration signals, and a high-speed AD collector is used to collect the vibration data under different working conditions.

4.1. Experiments Based on Test Platform 1 (CWRU)
4.1.1. Experimental Setup and Cases

To verify the effectiveness of the proposed diagnosis framework under different working conditions, the vibration signals collected from test platform 1 under two motor loads (2 hp and 3 hp) are used for the experiments. We collect vibration data of four fault types, including the normal condition, ball fault, inner race fault, and outer race fault, giving 12 bearing conditions in total. For each bearing condition, we choose 40 random samples as testing samples and 20 random samples as training samples, and each sample has 2000 continuous data points. Two experimental cases (cases 1 and 2) are used. Cases 1 and 2 have the same training samples; that is, the samples under the 2 hp motor load are chosen as the training samples. However, the testing samples of cases 1 and 2 are different: the samples under the 2 hp and 3 hp motor loads are chosen as the testing samples in case 1 and case 2, respectively. The detailed introduction of the experimental data is shown in Table 2.

4.1.2. Experimental Analysis

First of all, the vibration signals are processed by MODWPT to obtain the wavelet packet nodes. In this section, "dmey" is selected as the mother wavelet and the decomposition level is 4. One normal sample, one ball fault sample, one inner race fault sample, and one outer race fault sample from the 3 hp training set are presented in Figures 5–8, respectively. Sixteen terminal nodes are obtained by the signal processing, and the HES of the reconstructed signal of each of the 16 terminal nodes is calculated. Therefore, 192 statistical features, which compose the raw feature set (RFS), can be generated by calculating the 6 statistical parameters of the 16 reconstructed signals and the 16 HES. Table 3 presents the 6 statistical parameters.
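A sketch of how one sample's 192-dimensional feature vector could be assembled is shown below. The six statistics used here (peak value, RMS, variance, skewness, kurtosis, energy) are illustrative stand-ins for the parameters actually listed in Table 3, and the HES is approximated as the FFT magnitude of the Hilbert envelope of each node signal.

```python
import numpy as np
from scipy.signal import hilbert
from scipy.stats import skew, kurtosis

def six_statistics(x):
    x = np.asarray(x, dtype=float)
    return np.array([np.max(np.abs(x)),          # peak value
                     np.sqrt(np.mean(x ** 2)),   # RMS
                     np.var(x),                  # variance
                     skew(x),                    # skewness
                     kurtosis(x),                # kurtosis
                     np.sum(x ** 2)])            # energy

def sample_features(reconstructed_nodes):
    """reconstructed_nodes: the 16 reconstructed terminal-node signals of one sample."""
    feats = []
    for node in reconstructed_nodes:
        env = np.abs(hilbert(node))              # Hilbert envelope of the node signal
        hes = np.abs(np.fft.rfft(env))           # envelope spectrum (HES)
        feats.append(six_statistics(node))
        feats.append(six_statistics(hes))
    return np.concatenate(feats)                 # (16 + 16) x 6 = 192 features
```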

For the RFS, the proposed feature selection method PSFFC is employed to evaluate the feature priority selection degree of each statistical feature of the training data. The ARI, SSMD, FSD, SPCC, and FPSD of the 192 features of the training samples are, respectively, shown in Figures 9–13, where the horizontal axis is the feature index. After the PSFFC procedure, a sorted FPSD sequence is obtained and the preferred feature subset is formed. Then, the proposed feature-based transfer learning method TCAPLMS is performed to reduce the marginal distribution differences between the training and testing feature subsets and to obtain a low-dimensional feature set with desirable discriminant performance. In this paper, the two trade-off parameters of TCAPLMS are set to 0.5 and 0.3, respectively. Finally, the low-dimensional feature set is employed for training the fault diagnosis model.

To verify the effectiveness of the proposed PSFFC and TCAPLMS, two groups of comparative experiments are performed. The training and testing datasets are employed for training and testing the fault diagnosis models, respectively. In the experiments, SVM and KNN are applied to construct the fault diagnosis models, and the resulting series of fault diagnosis models is presented in Table 4. For example, RFS-SVM is a diagnosis model based on SVM in which the RFS is used as the input of the SVM; embedding TCA into the RFS-SVM model gives the RFS-TCA-SVM model; and embedding PSFFC into the RFS-TCA-SVM model gives the RFS-PSFFC-TCA-SVM model. In this paper, the average diagnostic accuracy over the 12 bearing conditions is reported in the experimental analysis, and the detailed description is presented as follows.
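For the baseline models of Table 4, a minimal scikit-learn composition could look as follows; the standardization step and the hyperparameter values are assumptions, and the TCA/TCAPLMS variants would apply the corresponding transform from Sections 2.3 and 3.2 to the selected features before fitting the classifier.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# baseline models: the raw feature set (RFS) fed directly to the classifier
rfs_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
rfs_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# usage (X_*_rfs are the 192-dimensional RFS feature matrices):
# rfs_svm.fit(X_train_rfs, y_train)
# accuracy = rfs_svm.score(X_test_rfs, y_test)
```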

In the first group of experiments, PSFFC is not performed. The experimental results of the diagnosis models listed in Table 4 are shown in Tables 5–8. To verify the superiority of MODWPT, WPT is also used to construct diagnosis models, and the results of the RFS-SVM and RFS-KNN models using MODWPT and WPT are, respectively, presented in Table 5. According to the diagnosis accuracies in Table 5, it is evident that the models using MODWPT outperform the models using WPT. Therefore, the results and analysis of all models using MODWPT are introduced below. For the testing set of case 1, all models obtain desirable diagnosis accuracy. The maximum accuracies of RFS-SVM, RFS-LFDA-SVM, RFS-TCA-SVM, and RFS-TCAPLMS-SVM attain 98.54%, 99.79%, 91.67%, and 99.38%, respectively. The main reason is that the training and testing samples come from the same working condition, so their distributions are almost the same. For the testing set of case 2, the highest accuracies of RFS-SVM, RFS-LFDA-SVM, and RFS-TCA-SVM only attain 83.54%, 87.08%, and 87.29%. However, the highest accuracy of the RFS-TCAPLMS-SVM model attains 97.50%, which is obviously higher than that of the other models. The results of the KNN-based models are similar to those of the SVM-based models, and the highest accuracy of the RFS-TCAPLMS-KNN model attains 97.08%, which is obviously higher than that of the other models. A problem can be observed from the first group of results: for a conventional fault diagnosis model, it is not easy to guarantee preferable diagnosis performance when the distributions of the testing and training sets differ, whereas the use of TCAPLMS helps the diagnosis model attain desirable performance.

In the second group of experiments, PSFFC is performed before the feature transfer learning and pattern classification steps. The experimental results of the fault diagnosis models listed in Table 4 are shown in Tables 9–12 and Figures 14–22, where the dimension denotes the dimensionality of the transformed feature set. All diagnosis models achieve desirable results when the testing data are from case 1: the highest accuracy exceeds 99.5%, and the highest accuracies of RFS-PSFFC-SVM, RFS-PSFFC-LFDA-SVM, and RFS-PSFFC-TCAPLMS-SVM attain 100%. For the testing set of case 2, compared with RFS-SVM, the RFS-PSFFC-SVM, RFS-PSFFC-TCA-SVM, and RFS-PSFFC-TCAPLMS-SVM models achieve obviously better performance; their maximum diagnosis accuracies attain 95.63% (psfn = 90), 97.50% (psfn = 40), and 100% (psfn = 40, 70, 80), respectively. The results of the KNN-based models are similar to those of the SVM-based models, and for the testing set of case 2, the highest diagnosis accuracy of RFS-PSFFC-TCAPLMS-KNN attains 100%. According to the second group of results, it is obvious that under different working conditions the proposed PSFFC helps the diagnosis models improve their performance, and the combination of PSFFC and TCAPLMS attains ideal fault diagnosis accuracy when a good parameter psfn is used; for the testing set of case 2, the RFS-PSFFC-TCAPLMS-SVM model attains 100% accuracy when psfn is 40, 70, or 80. Therefore, the effectiveness and adaptability of PSFFC and TCAPLMS are verified.

4.2. Experiments Based on Test Platform 2 (SQI-MFS)
4.2.1. Experimental Setup and Cases

The vibration signals collected from the SQI-MFS test platform under different motor speeds are used for the experiments. The vibration signal samples at 1200 rpm and 1800 rpm are chosen for a series of experiments. There are 10 bearing conditions, covering four bearing fault types (normal, ball fault, inner race fault, and outer race fault). For each bearing condition, we choose 40 random samples as testing samples and 20 random samples as training samples, and each sample has 5000 continuous data points. Two experimental cases (cases 1 and 2) are used. Cases 1 and 2 have the same training samples; that is, the samples at the 1800 rpm motor speed are chosen as the training samples. However, the testing samples of cases 1 and 2 are different: the samples at 1800 rpm and 1200 rpm are, respectively, chosen as the testing samples in case 1 and case 2. The detailed introduction of the experimental dataset is shown in Table 13.

4.2.2. Experimental Analysis

The experimental procedure is the same as that for test rig 1. First of all, the vibration signals are processed by MODWPT to obtain the wavelet packet nodes. One normal, one ball fault, one inner race fault, and one outer race fault vibration signal sample from the 1800 rpm training set are presented in Figures 23–26, respectively. Then, from the decomposition of each vibration signal, and in the same way as in Section 4.1.2, the 192 statistical features that compose the RFS can be obtained.

For the RFS, the proposed feature selection method PSFFC is employed to evaluate the feature priority selection degrees of the 192 statistical features. The ARI, SSMD, FSD, SPCC, and FPSD of the 192 statistical features of the training samples are, respectively, shown in Figures 27–31. After the PSFFC procedure, a sorted FPSD sequence is obtained and the preferred feature subset is formed. Then, the proposed feature-based transfer learning method TCAPLMS is performed to process the training and testing feature subsets, which helps to obtain desirable discriminant performance. Finally, the low-dimensional feature set is employed for training the fault diagnosis model.

The verification of the effectiveness of PSFFC and TCAPLMS is presented as follows. RFS-SVM, RFS-KNN, RFS-TCAPLMS-SVM, RFS-TCAPLMS-KNN, RFS-PSFFC-TCAPLMS-SVM, and RFS-PSFFC-TCAPLMS-KNN are applied in the experiments. Table 14 presents the experimental results of the RFS-SVM and RFS-KNN models, Table 15 the results of RFS-TCAPLMS-SVM and RFS-TCAPLMS-KNN, and Table 16 the diagnosis results of RFS-PSFFC-TCAPLMS-SVM and RFS-PSFFC-TCAPLMS-KNN. When the testing dataset is from case 1, all models attain desirable diagnosis accuracy; the highest diagnosis accuracies of RFS-PSFFC-TCAPLMS-SVM (psfn of 110–150) and RFS-PSFFC-TCAPLMS-KNN (psfn of 120–150) attain 100%. When a good parameter psfn is used, the fault diagnosis models for the testing set of case 2 also attain desirable diagnosis results; for example, the highest testing accuracy of the RFS-PSFFC-TCAPLMS-SVM model attains 89.50% when psfn is 80, which is obviously higher than that of RFS-TCAPLMS-SVM. The curves of the testing accuracies of RFS-PSFFC-TCAPLMS-SVM and RFS-PSFFC-TCAPLMS-KNN are shown in Figures 32 and 33, and the testing accuracies of the models using PSFFC, TCA, LFDA, and TCAPLMS are presented in Figure 34. Therefore, the effectiveness and adaptability of PSFFC and TCAPLMS are further validated (Figures 32–34).

5. Conclusions

In real industrial scenarios, complex working conditions mean that data-driven diagnosis methods using conventional machine learning techniques often struggle to achieve desirable fault diagnosis performance, because the feature distributions of the training and testing data are assumed to be the same. Aiming at this problem, a novel intelligent bearing fault diagnosis framework is proposed for real industrial scenarios. In this framework, an improved domain adaptation method, transfer component analysis with preserving local manifold structure (TCAPLMS), is proposed to reduce the differences in marginal distributions between datasets from different domains and, at the same time, take the label information of the feature dataset and the local manifold structure of the feature data into consideration. Furthermore, preferred feature selection by fault sensitivity and feature correlation (PSFFC) is embedded into this framework to select features which are more beneficial to fault pattern recognition and to reduce the redundancy of the feature set. Finally, vibration signal datasets collected from two experimental test platforms are used for the experiments.

It is obvious that the proposed PSFFC and TCAPLMS have great potential to be beneficial in actual bearing fault diagnosis applications. In the experiments, two cases are selected as comparative cases; for experimental test rig 1, cases 1 and 2 have the same training samples, but their testing samples are different. The experimental results show that the diagnosis models using PSFFC and TCAPLMS attain desirable performance and improved generalization ability, and when a good parameter psfn is used, cases 1 and 2 can both attain 100% diagnosis accuracy. The experimental results from test rig 2 further demonstrate the effectiveness, adaptability, and great potential of the diagnosis models using PSFFC and TCAPLMS under variable working conditions.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (nos. 2017YFC0804400 and 2017YFC0804401), “Smart mine” key technology R&D open fund of China University of Mining and Technology and Zibo Mining Group Co., Ltd, and Special translation on innovation ability of scientific research base of China University of Mining and Technology (NO. 2018CXNL02).