Abstract

In this paper, an improved simultaneous fault diagnostic algorithm with cohesion-based feature selection and improved backpropagation multilabel learning (BP-MLL) classification is proposed to localize and diagnose different simultaneous faults on the gearbox and bearings of rotating machinery. The cohesion evaluation algorithm selects high-sensitivity feature parameters from the time and frequency domains in high-dimensional vectors to construct low-dimensional feature vectors. A BP-MLL neural network is then utilized for fault diagnosis by classifying these feature vectors. An effective global error function is proposed for the BP-MLL neural network by modifying its distance function, which improves both the generalization ability and the diagnostic ability in full-labeled and nonlabeled situations. To demonstrate the effectiveness of the proposed method, simultaneous fault diagnosis experiments are conducted on a wind turbine drivetrain diagnostics simulator (WTDDS). The experimental results show that the proposed method achieves better overall performance than the conventional BP-MLL algorithm and several other learning algorithms.

1. Introduction

Rotating machinery is a power transmission device in various kinds of mechanical equipment and an indispensable part of industrial applications. Components of rotating machinery, including the rotor, rotating shaft, bearings, and gearbox, operate under arduous conditions and are therefore subject to performance degradation and mechanical failure [1, 2]. Any failure of key components in rotating machinery can cause serious accidents with high economic losses [3]. Therefore, accurate detection of mechanical fault locations and types in rotating machinery is highly needed.

Currently, there are two types of quantitative fault diagnostic methods for rotating machinery: model-based and data-driven diagnosis [4]. Model-based methods implement dynamic process models in the form of mathematical formulas and parameters; however, describing models by mathematical structures can be difficult and inefficient because industrial processes are becoming increasingly complicated [5]. Compared with model-based fault diagnosis, data-driven methods transform diagnostic problems into pattern recognition problems. Data-driven methods are mainly composed of multivariate statistical analysis, such as regression [6] and principal component analysis [7], and machine learning methods, such as support vector machines (SVM) [8], random forests [9], neural networks [10–14], and transfer learning [15, 16].

Neural networks have been commonly used for intelligent fault diagnosis due to their powerful capabilities of pattern classification and function approximation [17]. On the basis of learning strategy, diagnostic algorithms based on neural networks can be classified into supervised and unsupervised learning. The backpropagation neural network is one of the most popular supervised learning strategies; in the 20th century, the studies in [18–20] all proved the effectiveness of backpropagation neural networks in fault diagnosis. Additionally, compared with SVM, neural networks achieve higher classification accuracy for fault diagnosis of rotor bearing systems [17]. Modifications to conventional backpropagation neural networks have also been recommended to address the problem of fault diagnosis. Meireles et al. pointed out that radial basis function (RBF) networks offer higher training speed and easier optimization of performance than conventional neural networks for fault diagnosis [21]. Wu and Chow developed an RBF network-based system to detect induction machine faults and proposed the cell-splitting grid algorithm so that the architecture of the RBF network is determined automatically [22]. The proposed system can detect unbalanced electrical and mechanical faults under different working environments.

Unsupervised networks have different architectures, such as self-organizing neural networks, whose structures are adaptively determined so that, when stable, all nodes in a neighborhood produce similar outputs for a given input. The method based on self-organizing maps (SOM) proposed in [23] is not only able to detect bearing faults but also locates them and evaluates the failure extent. Jounela et al. developed a process monitoring system based on SOM associated with heuristic rules to detect machine malfunctions [24]. Overall, neural networks are capable of classifying arbitrary regions in space, which makes them a good choice for fault diagnosis [25]. The recognition of simultaneous multiple faults is also discussed in [26]; the diagnostic performance of single-fault recognition techniques may be limited because (1) fault isolation can be difficult since noise in the measured signals can obscure a particular fault feature; (2) a large training set is required, which is difficult and time-consuming to collect; and (3) the choice of the most suitable classifier is still vague in engineering practice.

Multilabel learning methods are usually adopted for the detection and diagnosis of simultaneous faults in rotating machinery. The three main groups of multilabel learning strategies are data transformation, adaptation, and ensembles of classifiers [26]. The basic idea of data transformation methods is to turn multilabel problems into other known learning problems [27]; one representative algorithm is binary relevance, which converts the original multilabel dataset into binary datasets [28]. Adaptation methods improve conventional classification algorithms and directly employ the adapted algorithms for learning on multilabel data [29]. The kernel-dependent SVM in [8] is utilized to select features and to realize simultaneous fault detection in continuous processes. Zhang et al. extended k nearest neighbors (KNN) to a multilabel learning approach named ML-KNN [30]. In detail, the maximum a posteriori (MAP) principle is employed to determine the label set after the k nearest neighbors are recorded for each instance in the training set. In terms of ensembles of classifiers, Zhong et al. advanced a new probabilistic framework that combines multiple classifiers with a new ensemble method to realize simultaneous fault diagnosis trained only on single-fault data [31]. The first multilabel learning algorithm derived from the feedforward neural network, named backpropagation multilabel learning (BP-MLL), is proposed in [32]. This neural network is optimized by minimizing the differences between the actual outputs and desired outputs on each training example. One of the most popular error functions is the sum-of-squares error function, but BP-MLL applies a novel error function that performs an exponential operation on the differences between the outputs of labeled and unlabeled units to capture the characteristics of multilabel learning, i.e., to yield outputs of labeled units larger than those of unlabeled units; a threshold function is then used to determine the label set associated with each instance. The BP-MLL neural network has been applied to assist medical syndrome diagnosis [33, 34]. Multilabel text categorization systems based on BP-MLL neural networks have been developed to classify multilabel documents [32]. Moreover, the prediction model based on BP-MLL in [35] is applied to estimate the types of sustainable flood retention basins.

However, the computation in BP-MLL neural networks is complex; according to [32], the total training cost of BP-MLL is $O(W \times m \times n)$, where $W$ represents the total number of weights and biases, $m$ is the number of training instances, and $n$ is the total number of training epochs. Furthermore, the distance between relevant labels and irrelevant ones in conventional BP-MLL is represented by a simple subtraction, which may not be pronounced enough to discriminate the labels. Thus, two new distance functions that enhance pairwise label discrimination are proposed in [36] to improve the BP-MLL algorithm. Besides the problems mentioned above, the conventional BP-MLL algorithm is also not applicable to full-labeled or nonlabeled situations. Multilabel classification assigns each instance multiple categories that reflect the properties of a data point, such as the topics relevant to a document. A text might be about politics, education, specialties, or finance at the same time, or about none of these. Assume that a set of labels is organized and associated with each instance; if an instance is relevant to all the labels, then all labels in the set will be marked, the so-called full-labeled situation. Similarly, if an instance has no connection to any label in the set, it is considered a nonlabeled situation. Modifications to the BP-MLL algorithm made in [37] avoid failures in these two situations by taking the differences between the rank values and the thresholds into account; moreover, the modified algorithm shows better experimental performance on the same dataset as that in [32]. In the field of fault diagnosis, however, full-labeled and nonlabeled situations are rarely addressed specifically, which may cause serious problems in practical applications because computational errors can occur during the network learning process of current approaches. As a result, normally working rotating machinery could be misdiagnosed as faulty. Therefore, in this paper, we propose an improved BP-MLL algorithm with a novel global error cost function and a regularization term that enhances the generalization ability. Additionally, the cohesion evaluation algorithm based on standard deviation analysis is applied to obtain more comprehensive signal information and improve the adaptive ability to dynamic models.

Based on the above literature review and discussion, the main contributions of this paper are as follows: (1) a new global error function is proposed to deal with full-labeled and nonlabeled learning situations; (2) a fault diagnosis method based on the improved BP-MLL and cohesion evaluation is proposed; (3) the problem of multilabel gearbox and bearing fault diagnosis in rotating machinery under different working and environmental conditions is investigated.

The structure of this paper is as follows. Section 2 discusses the preliminaries and formulates the problem. In Section 3, the proposed method is introduced. In Section 4, hardware experiments and comparative studies are carried out to verify the effectiveness of the method. Section 5 concludes this paper.

2. Preliminaries and Problem Formulation

2.1. BP-MLL

Suppose the training set $D = \{(x_i, Y_i) \mid 1 \le i \le m\}$ is composed of $m$ multilabel instances, where each $x_i$ is a $d$-dimension feature vector and $Y_i \subseteq \mathcal{Y} = \{1, 2, \ldots, Q\}$ is the associated set of labels. The BP-MLL architecture is shown in Figure 1, where $d$ input neurons correspond to a feature vector, $Q$ output neurons represent the labels in $\mathcal{Y}$, and the hidden layer has a number of hidden units. Each layer is fully connected with the next layer through weights. The number of hidden layers may be more than one in different neural network structures.

The error function proposed in [32] is

$$E = \sum_{i=1}^{m} \frac{1}{|Y_i|\,|\bar{Y}_i|} \sum_{(k,l) \in Y_i \times \bar{Y}_i} \exp\bigl(-(c_k^i - c_l^i)\bigr). \quad (1)$$

This error cost function reflects the relationship between relevant labels and irrelevant ones by calculating the difference between them:

$$d = c_k^i - c_l^i. \quad (2)$$

Specifically, $\bar{Y}_i$ is the complementary set of $Y_i$, and $|\cdot|$ measures the cardinality of a set. $c_k^i$ represents the output of the network on one label belonging to the instance ($k \in Y_i$), and $c_l^i$ represents the output on one label not belonging to it ($l \in \bar{Y}_i$). Apparently, the larger the difference is, the smaller the value of the error function of the BP-MLL algorithm is, so that labels in $Y_i$ receive greater neural network outputs than those not in $Y_i$. Therefore, when the training set covers sufficient information to represent the learning problem, the trained neural network will eventually distinguish the relevant labels from irrelevant ones.
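For concreteness, the following Python sketch evaluates this pairwise exponential error for a single instance; the function name, variable names, and toy values are illustrative and not taken from [32].

```python
import numpy as np

def bpmll_instance_error(outputs, relevant):
    """Pairwise exponential error of BP-MLL for one instance.

    outputs  : network outputs c_j for all Q labels
    relevant : boolean mask, True for labels in Y_i
    """
    c_rel = outputs[relevant]          # outputs of relevant labels
    c_irr = outputs[~relevant]         # outputs of irrelevant labels
    # every (relevant, irrelevant) pair contributes exp(-(c_k - c_l))
    pair_err = np.exp(-(c_rel[:, None] - c_irr[None, :]))
    return pair_err.sum() / (len(c_rel) * len(c_irr))

# toy example: labels 0 and 2 are relevant
outputs = np.array([0.9, 0.1, 0.8, 0.2, 0.3])
relevant = np.array([True, False, True, False, False])
print(bpmll_instance_error(outputs, relevant))
```

Note that the normalizer vanishes whenever either label set is empty, which is precisely the degenerate case discussed in Section 2.2.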

2.2. Problem Formulation

Consider the following uncertain cases in the diagnostic system: (1) there are no labels for one instance, indicating that all components in the rotating machinery run perfectly, such that $Y_i = \varnothing$; (2) all the components are broken down, such that $\bar{Y}_i = \varnothing$, where $\bar{Y}_i$ is the complementary set of $Y_i$.

When the diagnostic system applies the error function in equation (1), either uncertain case causes mathematical failures. In the full-labeled case, $\bar{Y}_i = \varnothing$, so the denominator $|Y_i|\,|\bar{Y}_i| = 0$; furthermore, the value of the error term would tend toward infinity based on the property of the exponential function. Similarly, when there are no labels for a specific instance, $|Y_i| = 0$ also leads to an unreasonable denominator, and the error value approaches infinity as well.

According to the chain rule and the gradient descent rule for updating the weights, the mathematical formulations are shown below, where $\eta$ is the learning rate, $w$ represents the weights from the hidden layer to the output layer, $net_j$ represents the weighted sum, and $c_j$ is the actual output of the $j$-th output unit.

Apparently, nondifferentiability arises in Eq. (5) since the value of the error function tends to infinity. Therefore, our approach is to optimize the error function to develop a fault diagnostic system that is not affected by uncertain cases such as the nonlabeled and full-labeled situations. In the next section, the improved error function, which tolerates these uncertain cases with high generalization ability, is introduced.
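As a minimal illustration of the failure, feeding a full-labeled instance to the per-instance error sketch from Section 2.1 produces a zero normalizer; the values below are illustrative.

```python
import numpy as np

# full-labeled instance: every label is relevant, so the complementary set
# is empty and the normalizer |Y_i| * |Ybar_i| = 2 * 0 = 0
outputs = np.array([0.9, 0.8])
relevant = np.array([True, True])
c_rel, c_irr = outputs[relevant], outputs[~relevant]
pair_err = np.exp(-(c_rel[:, None] - c_irr[None, :]))    # empty 2 x 0 array
print(pair_err.sum() / (len(c_rel) * len(c_irr)))         # 0/0 -> nan (divide warning)
```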

3. Proposed Fault Diagnosis Method

Figure 2 illustrates the steps of fault diagnosis in rotating machinery. Firstly, time- and frequency-domain signals are collected from multiple channels under different working conditions. Secondly, to fully capture the characteristics of the signal and enhance the recognition ability of the fault diagnosis system, the cohesion evaluation algorithm is employed to pick out feature parameters with high sensitivity and form the sensitive feature vector. Finally, an improved BP-MLL neural network is trained and utilized to classify the constructed feature vectors so that the system adapts to dynamic models.

3.1. Feature Selection

Compared with conventional algorithms, the cohesion evaluation algorithm based on standard deviation analysis can combine multiple signals to obtain more comprehensive signal information and thereby improve the accuracy of fault diagnosis [10]. The distance assessment technique is described as follows.

Assume a set of $J$-dimensional feature vectors has $C$ different classes and the samples of each category are indexed by $i = 1, 2, \ldots, N_c$, where $C$, $N_c$, and $J$ are positive integers, and each entry reflects the $j$-th feature parameter of the $i$-th instance of the $c$-th category.

Table 1 lists the specific operational steps of the cohesion evaluation algorithm, where steps 1–3 reflect the intracategory standard deviation computation. Each characteristic parameter carries a different practical meaning and is associated with the weight of the corresponding position neuron. The classification can be improved by reducing the average intracategory standard deviation and the intracategory standard deviation. Steps 4–8 represent the standard deviation computation of the feature distance, where a larger standard deviation of the feature distance and a smaller imparity measure of the intercategory cohesion difference are more favorable for classification. Steps 9–10 determine the sensitivity of each feature parameter.

Figure 3 represents the cohesion evaluation process using the parameters in Table 1 as the horizontal and vertical coordinates, which respectively determine the size of the circle radius and the position of the circle center. In Figure 3, the intracategory standard deviations of class 3 and class 4 easily overlap and the distances between the points in each class are similar, resulting in a small average intercategory cohesion difference, which is not conducive to distinguishing the classes; in contrast, classes 1 and 2 belong to the easily classified feature parameter group. Overall, the cohesion evaluation algorithm can reflect the internal dispersion of the data and compare data differences in detail. According to the steps of the cohesion evaluation algorithm in Table 1, the sensitivity weighting factor can be calculated as in equation (6).

The sensitivity factor is then computed as in equation (7), where a proportional adjustment coefficient is introduced.

The input feature vector of the classification neural network is constructed by selecting the parameters with large sensitivity factors according to equation (7): the feature parameters are sorted in descending order of sensitivity, and the first $d$ highly sensitive parameters construct a $d$-dimension input feature vector.
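Since the exact steps live in Table 1 and equations (6) and (7), which are not reproduced here, the sketch below only illustrates the general idea under a simple assumption: sensitivity is taken as the ratio of intercategory spread to average intracategory standard deviation, and the top-$d$ features are kept. The function name and weighting are illustrative, not the paper's exact formulation.

```python
import numpy as np

def select_sensitive_features(samples, d):
    """Illustrative cohesion-style feature ranking (assumed form).

    samples : dict mapping class label -> array of shape (n_c, J)
    d       : number of high-sensitivity feature parameters to keep
    Returns the indices of the d selected feature parameters.
    """
    classes = list(samples)
    # intracategory standard deviation of each feature, averaged over classes
    intra = np.mean([samples[c].std(axis=0) for c in classes], axis=0)
    # intercategory spread: standard deviation of the class mean values
    centers = np.array([samples[c].mean(axis=0) for c in classes])
    inter = centers.std(axis=0)
    sensitivity = inter / (intra + 1e-12)      # assumed sensitivity factor
    return np.argsort(sensitivity)[::-1][:d]   # keep the d most sensitive

# toy data: 4 classes, 20 samples each, 8 candidate feature parameters
rng = np.random.default_rng(0)
data = {c: rng.normal(loc=c, scale=1.0, size=(20, 8)) for c in range(4)}
print(select_sensitive_features(data, d=3))
```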

3.2. Feature Classification

The conventional BP-MLL algorithm captures the correlation between relevant labels and irrelevant ones by using a distance function that calculates the difference between them. The error function accumulates these differences over each instance and then normalizes the summation by the total number of pairwise labels, i.e., $|Y_i|\,|\bar{Y}_i|$. As a result, as the distances increase, the value of the error function in equation (1) of the BP-MLL algorithm becomes smaller and smaller, which helps rank labels belonging to an instance higher than those not belonging to it, where $c_k^i$ represents the output of a relevant label of the $i$-th instance and $c_l^i$ is the output of an irrelevant one.

Nevertheless, as mentioned in Section 2, the conventional algorithm does not take full-labeled or nonlabeled situations into account. A full-labeled instance is shown in Figure 4, and the nonlabeled situation implies that an instance does not have any marked labels. If these two situations are not considered, mathematical failures such as an unreasonable denominator would occur during training, or the trained neural network could not attain an acceptable classification result on unseen cases. The most direct way is to modify the distance function so that the algorithm drives the labels as close to their targets as possible while preserving the characteristics of multilabel learning. Therefore, in this paper, modifications are made on the distance function to handle this problem. If an instance is not marked by any labels, that is, $Y_i = \varnothing$, then only irrelevant labels exist, so the distance function would be modified as follows:

Similarly, if all labels are marked in an instance, then these labels are all relevant so that the distance between 1 and these labels should be as small as possible:

The error function of the BP-MLL algorithm is visualized in Figure 5, along with the modifications marked as a red solid line and a blue dotted line. From a graphical perspective, the two new distance functions for the full-labeled and nonlabeled situations are derived from the original error function, which indicates that they do not generate conflicts among different types of training samples. Therefore, the improved error function of the BP-MLL algorithm preserves the ability to discriminate relevant labels from irrelevant ones; meanwhile, it is capable of dealing with full-labeled and nonlabeled situations. Apart from the modifications to the distance function, a regularization term is added to enhance the generalization ability. The main contribution of this paper is the improvement of the BP-MLL algorithm so that it concentrates on both the correlations between different labels and the occurrence of empty label sets through the following global error function, in which the regularization coefficient weights the added regularization term.

Remark 1. $c_k^i$ represents the actual output of the $k$-th output unit whose label belongs to this instance ($k \in Y_i$), while $c_l^i$ represents the output of one label that does not belong to this instance ($l \in \bar{Y}_i$). Due to the property of the exponential term, the bigger the difference between $c_k^i$ and $c_l^i$ is, the smaller the global error is.
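Because the displayed equations are not reproduced here, the sketch below only indicates one plausible per-instance form consistent with the description: the pairwise exponential term when both label sets are non-empty, an exponential distance to 1 (respectively 0) for full-labeled (respectively nonlabeled) instances, and an L2 regularization term on the weights. The exact expressions of the global error function in the paper may differ, and the coefficient value is an assumption.

```python
import numpy as np

def improved_instance_error(outputs, relevant):
    """Assumed per-instance error handling full-labeled / nonlabeled cases."""
    c_rel, c_irr = outputs[relevant], outputs[~relevant]
    if len(c_irr) == 0:                 # full-labeled: pull outputs toward 1
        return np.exp(1.0 - c_rel).mean()
    if len(c_rel) == 0:                 # nonlabeled: pull outputs toward 0
        return np.exp(c_irr).mean()
    pair = np.exp(-(c_rel[:, None] - c_irr[None, :]))
    return pair.sum() / (len(c_rel) * len(c_irr))

def global_error(all_outputs, all_relevant, weights, reg=1e-3):
    """Data term summed over instances plus an L2 penalty on all weight arrays."""
    data_term = sum(improved_instance_error(o, r)
                    for o, r in zip(all_outputs, all_relevant))
    return data_term + reg * sum((w ** 2).sum() for w in weights)
```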

In the conventional BP-MLL algorithm, saturation may occur due to the choice of activation function: sigmoid or hyperbolic tangent (tanh). Moreover, due to the exponential computation in the error function, using these two functions can also lead to high time complexity. Therefore, to avoid the vanishing gradient problem and to reduce the computational difficulty, the improved method proposed in this paper adopts the leaky rectified linear unit (Leaky ReLU) as the activation function:

$$f(x) = \begin{cases} x, & x > 0, \\ a x, & x \le 0, \end{cases}$$

where $0 < a < 1$.
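A minimal NumPy sketch of Leaky ReLU and the derivative used during backpropagation follows; the slope value 0.01 is an illustrative choice, not necessarily the one used in the experiments.

```python
import numpy as np

def leaky_relu(x, a=0.01):
    """f(x) = x for x > 0 and a*x otherwise, with 0 < a < 1."""
    return np.where(x > 0, x, a * x)

def leaky_relu_grad(x, a=0.01):
    """Derivative used in the backpropagation step: 1 for x > 0, a otherwise."""
    return np.where(x > 0, 1.0, a)
```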

Let $net_j$ represent the weighted sum; then the actual output of the $j$-th output unit is $c_j = f(net_j)$.

In this paper, the gradient descent rule is adopted to adjust the weights and biases until the global error converges to an acceptable value, so that the weights are updated as follows, where $\eta$ is the learning rate and the gradient term is defined below.

With the above definitions, we then have

Substituting equations (9) and (14) into (13):

So equation (12) can be rewritten as

A preset threshold is used for the classification of each instance and for fault detection:
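As an illustration, the sketch below converts the network outputs into a predicted label set using a preset threshold; the threshold value 0.5 is an assumption rather than the value used in the paper.

```python
import numpy as np

def predict_labels(outputs, threshold=0.5):
    """Mark label j as present when its output exceeds the preset threshold."""
    return (np.asarray(outputs) > threshold).astype(int)

# e.g. outputs for the 5 fault labels -> five-digit binary code such as 10100
print(predict_labels([0.91, 0.08, 0.77, 0.12, 0.30]))   # [1 0 1 0 0]
```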

The proposed algorithm for the multifault diagnosis of rotating machinery is summarized in Table 2, and its main contributions are as follows: (1) the BP-MLL algorithm is optimized with an improved error cost function and a regularization term; (2) a fault diagnostic algorithm based on the improved BP-MLL and cohesion evaluation is proposed; (3) an experimental study is performed on multilabel gearbox and bearing fault diagnosis in rotating machinery under various working and environmental conditions.

4. Experimental Results

4.1. Experimental Platform

In this paper, the wind turbine drivetrain diagnostics simulator (WTDDS) produced by SpectraQuest, USA, is used as the experimental platform. Figure 6 illustrates its operation diagram, in which label a represents the torque sensor set on the shaft, b and d are the vibration sensors fixed above and to the left of the parallel shaft gearbox, respectively, and c is the pressure sensor. These sensors are connected to a multichannel signal acquisition device that aggregates the signals, converts them into voltage signals, and sends them to a computer.

In Figure 7, reference number 4 is a single-phase motor that powers the entire system. Reference number 3 is a parallel shaft gearbox that transmits the kinetic energy of the motor to the planetary gearbox through a coupling with the motor. In the planetary gearbox (referred to by 2), four planetary gears rotate under the traction of the driving wheel and transmit kinetic energy to the load brake referred to by 1. The windmill (referred to by 6) is then driven by the next stage of the rotating shaft after the braking action of the load brake.

4.2. Experimental Setup

In reality, the motor frequency is affected by the mechanical structure and wind speed, and the load is related to the generator structure and voltage. To collect accurate and effective data, combinations of different motor frequencies and load voltages are used to simulate different working conditions. Through the LabVIEW software, the input voltages of the load controller and the speed controller are adjusted to realize manual control of the shaft load and motor speed. Table 3 summarizes the six operating conditions considered in this experiment.

The proposed method is applied to classify the five types of faults in the bearings and gears, which are ball bearing fault, outer bearing fault, inner bearing fault, chipped tooth, and missing tooth, as shown in Figure 8. A five-digit binary code serves as the label representing the expected output value. In the experiment, the length of the feature vector for each signal segment is 2048, the sampling time is 6.4 s, and the sampling frequency is 5120 Hz. The categories of simultaneous faults and the samples assigned to the training and testing procedures are shown in Table 4.
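For reference, the following sketch shows how a raw vibration channel could be split into 2048-point segments and paired with five-digit binary labels; the label ordering, helper names, and the random placeholder signal are illustrative assumptions, not the authors' preprocessing code.

```python
import numpy as np

SEGMENT_LEN = 2048          # feature-vector length per signal segment
FS = 5120                   # sampling frequency in Hz

def segment_signal(signal, seg_len=SEGMENT_LEN):
    """Split a 1-D vibration signal into non-overlapping segments."""
    n_seg = len(signal) // seg_len
    return signal[:n_seg * seg_len].reshape(n_seg, seg_len)

# assumed five-digit code order:
# [ball bearing, outer bearing, inner bearing, chipped tooth, missing tooth]
label_outer_chipped = np.array([0, 1, 0, 1, 0])

raw = np.random.randn(FS * 32)          # placeholder for a recorded channel
segments = segment_signal(raw)
labels = np.tile(label_outer_chipped, (len(segments), 1))
```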

4.3. Experimental Results
4.3.1. Classification of Different Simultaneous Faults

The multilabel algorithms (improved BP-MLL, BP-MLL, and ML-KNN) and a conventional classification technique (BP neural network) were applied to the same data sets. All experiments were performed on a computer with 8 GB RAM and an Intel® Core i5-7200U CPU @ 2.70 GHz. The improved BP-MLL neural network has two hidden layers; the numbers of neurons in the input, first hidden, second hidden, and output layers are 32, 72, 12, and 5, respectively. To approach the optimal solution and make the algorithm converge, a suitable learning rate is set for the improved BP-MLL neural network.
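As a sketch of the stated 32-72-12-5 topology, a PyTorch definition might look as follows; the Leaky ReLU slope and the sigmoid output layer are assumptions made here to produce per-label scores in [0, 1], not details reported in the paper.

```python
import torch.nn as nn

# 32-72-12-5 architecture with Leaky ReLU hidden activations; the sigmoid
# output layer is an assumed choice for per-label scores
model = nn.Sequential(
    nn.Linear(32, 72), nn.LeakyReLU(0.01),
    nn.Linear(72, 12), nn.LeakyReLU(0.01),
    nn.Linear(12, 5),  nn.Sigmoid(),
)
```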

Table 5 summarizes the classification accuracy for different types of simultaneous faults and the total training time of the various algorithms. The ML-KNN algorithm required the least training time and obtained relatively high classification accuracy, but the classification accuracy of certain cases, such as 00000 and 00101, cannot be guaranteed. Although the basic BP technique trained the neural network quickly, it is limited in discriminating the nonlabeled situation and fails to classify the second fault type. As mentioned in Section 2, the conventional BP-MLL algorithm is unable to deal with nonlabeled situations. Additionally, conventional BP-MLL spent more time than improved BP-MLL to train the same network. This is because the proposed method applies Leaky ReLU as the activation function, which reduces the computation required during the gradient descent process by first judging the weighted sum in error function equation (9). The fault on the outer bearing with a chipped tooth confuses BP, BP-MLL, and ML-KNN, whereas all compared algorithms can detect the fault on the outer bearing with a missing tooth. In general, the proposed method achieves higher classification accuracy with less training time than the conventional BP-MLL method. Although the training time of the proposed method is longer than that of the ML-KNN method, this is acceptable because the training process is conducted offline and the classification accuracy of the proposed method is consistently higher than that of ML-KNN.

As a nonlinear dimensionality reduction algorithm, the t-distributed stochastic neighbor embedding (t-SNE) technique proposed in [38] uses conditional probabilities to express the similarity between data points, which makes it well suited for dimensionality reduction. To visualize the classification results directly, the three-dimensional mapping results are shown in Figure 9. Although the data points of conventional BP-MLL have clear borders, the third type of simultaneous fault cannot be detected correctly owing to the missed classification of one specific fault. Additionally, there are only a few distinct points for ML-KNN; this is because the value of each output unit is the probability of each label. The predicted data instances of the improved BP-MLL algorithm are, after training, basically distributed around their class centers.
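A t-SNE mapping of this kind can be reproduced with scikit-learn as sketched below; the feature matrix is a random placeholder and the perplexity is an assumed setting.

```python
from sklearn.manifold import TSNE
import numpy as np

features = np.random.randn(200, 32)      # placeholder for test feature vectors
embedded = TSNE(n_components=3, perplexity=30,
                random_state=0).fit_transform(features)
print(embedded.shape)                    # (200, 3) points for the 3-D scatter plot
```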

4.3.2. Comparison on Different Algorithms

To compare the performance of the proposed method with the other conventional ones, the following six evaluation metrics are used to measure the classification results [39].

F1-score, also known as the balanced F score, is defined as the harmonic mean of precision and recall, computed from the predicted labels and the desired value of the $j$-th label in the $i$-th instance. Recall is the fraction of the actual labels that are correctly predicted, while precision is the fraction of the predicted positive labels that are correct, averaged over all instances.

Hamming loss investigates the misclassification of an instance on a single label, i.e., a relevant label does not appear in the predicted label set or an irrelevant label appears in the predicted label set; XOR (exclusive or) is used to count such disagreements. The smaller the value of Hamming loss is, the better the system performance is.

Ranking loss evaluates the fraction of pairs of labels that are misordered for an instance, computed from the size of the error set and the true output values before thresholding. The lower the value of this metric is, the better the performance is.

Average precision, also called classification accuracy or exact match ratio here, computes the percentage of instances whose predicted label sets are exactly the same as the actual sets of labels.

One error describes the possibility that the top-ranked label of an instance is not in the actual label set. The smaller the value of one error is, the better the system performance is.

Coverage measures how far, on average, one needs to go down the ranked label list to cover all the actual labels of an instance. The smaller the value is, the better the performance is.
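The sketch below implements three of these metrics (Hamming loss, exact match ratio, and instance-averaged F1) in NumPy using common textbook definitions; the exact formulations in [39] may differ in detail, and ranking loss, one error, and coverage are omitted for brevity.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of label positions where prediction and truth disagree (XOR)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true != y_pred)

def exact_match_ratio(y_true, y_pred):
    """Share of instances whose full predicted label set matches exactly."""
    return np.mean(np.all(np.asarray(y_true) == np.asarray(y_pred), axis=1))

def f1_per_instance(y_true, y_pred):
    """Harmonic mean of precision and recall, averaged over instances."""
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    tp = (y_true & y_pred).sum(axis=1)
    prec = tp / np.maximum(y_pred.sum(axis=1), 1)
    rec = tp / np.maximum(y_true.sum(axis=1), 1)
    return np.mean(2 * prec * rec / np.maximum(prec + rec, 1e-12))

y_true = np.array([[1, 0, 1, 0, 0], [0, 0, 0, 0, 0]])
y_pred = np.array([[1, 0, 1, 0, 0], [0, 1, 0, 0, 0]])
print(hamming_loss(y_true, y_pred), exact_match_ratio(y_true, y_pred))
```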

Table 6 summarizes the performance of each algorithm in terms of the evaluation metrics. All four algorithms obtain high F1-scores of 1, 0.8967, 0.9594, and 0.9783, respectively; in particular, improved BP-MLL obtains the highest one. The values of average precision are all above 0.9. Additionally, all these classifiers can recognize the relevant labels in each instance, which leads to a one error of 0. Improved BP-MLL and ML-KNN share the same figures for coverage and ranking loss. Nevertheless, the coverage values of the basic BP and conventional BP-MLL algorithms are twice as high as those of improved BP-MLL and ML-KNN. In comparison, the Hamming loss of improved BP-MLL is slightly lower than that of ML-KNN because ML-KNN misclassifies single labels in some instances. Although the values of the different metrics vary, the multilabel learning algorithms can predict simultaneous faults more effectively, among which improved BP-MLL outperforms the other compared methods.

5. Conclusion

In this paper, an improved fault diagnostic method based on cohesion evaluation and improved BP-MLL classification is proposed. In contrast to conventional single-fault diagnosis, the problem of simultaneous faults occurring on the gearbox and bearings of rotating machinery under different environmental conditions is investigated. On the basis of BP-MLL, this paper proposes a new global error function to deal with full-labeled and nonlabeled learning situations by modifying its distance function and enhancing the generalization ability. Experiments conducted on the WTDDS show that the proposed method is superior to conventional methods under six performance evaluation metrics. Although this paper has achieved good experimental results, there are still limitations, such as the current algorithm being supervised only. Therefore, further studies can focus on improvements such as semisupervised learning based on partial labels and transfer learning across different working conditions.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

An earlier version of this manuscript was presented at the 2020 IEEE 29th International Symposium on Industrial Electronics as "An Improved Simultaneous Fault Diagnosis Method Based on Cohesion Evaluation and BP-MLL for Rotating Machinery."

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by National Natural Science Foundation of China (61603223), Jiangsu Provincial Qinglan Project, Research Development Fund of XJTLU (RDF-18-02-30, RDF-20-01-18), Key Program Special Fund in XJTLU (KSF-E-34), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (20KJB520034).