Research Article  Open Access
Pengcheng Jiang, Hua Cong, Jing Wang, Dongsheng Zhang, "Fault Diagnosis of Gearbox in Multiple Conditions Based on Fine-Grained Classification CNN Algorithm", Shock and Vibration, vol. 2020, Article ID 9238908, 15 pages, 2020. https://doi.org/10.1155/2020/9238908
Fault Diagnosis of Gearbox in Multiple Conditions Based on Fine-Grained Classification CNN Algorithm
Abstract
The use of the convolutional neural network (CNN) for fault diagnosis has become a common research approach in recent years. Since this method can automatically extract fault features, it has performed well in a number of studies. However, it has a clear drawback: the signals are significantly affected by working conditions and sample size, and it is difficult to improve diagnostic accuracy by learning faults directly, regardless of working conditions. High-precision fault diagnosis under varying working conditions is therefore a research direction worth pursuing. In this article, using a fine-grained classification algorithm, the working conditions of the object system are treated as the coarse classes, and a specific fault under a specific working condition is treated as a fine class. Samples of different faults under different working conditions are learned uniformly, and common features are extracted by the convolutional layers, so that different faults under different working conditions can be identified simultaneously on the basis of the whole sample set. Experimental results show that the method effectively uses the whole variable-working-condition sample set to achieve dual recognition of faults and specific working conditions, and the recognition accuracy is significantly higher than that of the method which learns regardless of working conditions.
1. Introduction
Traditional fault diagnosis methods, whether based on time domain or frequency domain analysis, are highly dependent on physical experience. In recent years, since the convolutional neural network (CNN) was proposed [1], deep learning algorithms have developed rapidly; their powerful end-to-end learning ability allows feature extraction work that used to require experience to be completed independently by the CNN, which has become a new direction of fault diagnosis research. For rotating machinery, and especially for gearboxes, using a CNN to learn vibration signals has been the main fault diagnosis method [2–5]. However, most of these studies are based only on specific working conditions. Obviously, many systems exhibit very strong characteristic changes under different working conditions. Although some studies try to use the CNN's ability to directly model the multiple working conditions of the object, the range of working condition changes considered was very limited [6]. In fact, in some practical fault diagnosis problems, the influence of working conditions on signal characteristics is greater than that of fault types [7]. Therefore, extracting fault features directly without considering working conditions can seriously reduce classification accuracy [8, 9]. To solve this problem, some studies model each working condition separately [7, 9], which divides the whole problem into mutually independent subproblems and reduces the utilization efficiency of samples. This separated approach raises another problem: the sample size must be quite large. But fault samples are often difficult to collect, which means each separate model must be trained on a smaller sample set, so accuracy is limited. Transfer learning has been used to address this, either by adjusting a source model to adapt to new conditions [10, 11] or by using a source model to accelerate learning [12]; the premise is that a source model exists.
These methods have achieved excellent results, but they all rest on an important premise: that the sample size is large enough to support diagnostic modeling within a single working condition. In many problems, this is a difficult requirement. A neglected fact is that samples collected under different working conditions still come from the same target system and therefore share common characteristics. Because the influences of working conditions and faults on the system response interact, and the influence of working conditions is greater than that of faults, fault diagnosis across working conditions can be treated as a fine-grained classification problem, which requires a structure different from the traditional CNN. This structure is very effective for hierarchical problems [12–15] and has been applied to fault diagnosis [16], but only as a means of enriching information; it has not been used to solve the problem of fault diagnosis with limited samples under varying working conditions. Because the influence of working conditions is more significant than that of health states, the model is designed as shown in Figure 1.
In the traditional method of Figure 1(a), when the sample size is sufficient, good modeling results are achieved by modeling each working condition separately. In the traditional method of Figure 1(b), the coupling between working conditions and faults is ignored, which has a significant impact. Although the method of Figure 1(b) appears to use more samples to support the training of a single network, the modeling accuracy is often seriously reduced instead. This means that, while a more efficient way to use the samples is needed, the ability to distinguish working conditions during fault classification must be guaranteed at the same time.
The fine-grained method (Figure 1(c)) uses the same convolutional layers to extract common fault features. Through a two-level fully connected network and a two-stage loss function, the more salient working condition features are recognized as the coarse classes, and the fault features are subdivided into specific faults under different working conditions as the fine-grained classes. To increase recognition accuracy, a term that enlarges the inter-class spacing is designed and added to the loss. Together, these operations eliminate the impact of working condition changes on fault diagnosis. Our work has the following advantages:
(i) High utilization efficiency of the data: all samples are used uniformly to train a single feature extractor, so the sample utilization ratio is multiplied compared with the traditional per-condition modeling; the convolutional layers receive more adequate training and achieve a better feature extraction effect; and the working condition labels originally used only for manual partitioning are also exploited in learning.
(ii) Under the constraints of the fine-grained model, faults under different working conditions are distinguished automatically, so the influence of working conditions on fault diagnosis is resolved while all samples are trained uniformly.
Experiments on a gearbox, a system whose characteristics differ significantly across working conditions and whose effective fault samples are rare, show that the proposed method solves its fault diagnosis problem.
2. Research Object
The gearbox is an object with distinctive characteristics. Because different working conditions mean completely different meshing states, the working condition is a kind of information that cannot be ignored. The gearbox is also an object with a high cost of sample collection: due to the low failure rate, hundreds of hours of operation may yield only a single effective fault sample. This means the model cannot be formed from a large sample size, and an appropriate method must take the economic cost into account. In this paper, the planetary gearbox of a particular type of vehicle is used as the object. The gearbox transmission principle is shown in Figure 2. There are three planetary rows K1, K2, and K3 that need to be analyzed. The test bench and sensor arrangement are shown in Figure 3.
The health states corresponding to the three planetary rows include (a) normal, (b) K1 large planet gear tooth fault (hereinafter fault 1), (c) K1 small planet gear tooth fault (hereinafter fault 2), (d) K2 planet gear tooth fault (hereinafter fault 3), and (e) K3 sun gear tooth fault (hereinafter fault 4). There are 4 corresponding working conditions: gears 1 to 4. The signal characteristics of the same fault under different working conditions are not exactly the same. Moreover, the possible working conditions differ between health states: for example, normal, fault 3, and fault 4 may occur in gears 1/2/3/4/5 and reverse, while fault 1 and fault 2 may occur in gears 2 and 3. However, a conventional convolutional network can only label the 5 situations that need to be identified; it cannot connect the relationships between different working conditions or the changes that working conditions impose on fault signal characteristics. The variation of working conditions thus becomes a disruptive factor in fault diagnosis and reduces diagnostic accuracy. A fine-grained classification algorithm can distinguish the influences of working condition and health state in principle; its performance depends on the design of the loss function.
In [12, 13], a contrastive loss function is used. In [14, 15], a triplet loss function is used to learn features that maximize the inter-class distance while minimizing the intra-class distance and so improve the precision of fine-grained classification. However, when the amount of data is relatively large, the computation of these two approaches grows combinatorially when constructing the pairs or triplets. Experiments show that the quality of the pairs or triplets has a major influence on the final classification accuracy. Therefore, problems such as slow model convergence, heavy computation, increased training complexity, and increased uncertainty of results occur when using these methods. To solve these problems, a new loss function was proposed in [17], which improved the classification accuracy of fine-grained classification problems in the following two aspects: (1) a cascaded classification structure was designed to better describe the hierarchical relationship between fine-grained and coarse categories; (2) a large-margin loss was proposed which aims to minimize the intra-class distance and maximize the inter-class distance, and which requires the distance between fine-grained classes belonging to the same parent class to be smaller than the distance between fine-grained classes belonging to different parent classes. The algorithm in [18] is applied to gearbox fault diagnosis here to explore its effect.
3. Methodology
Before introducing our method, we give some definitions in order to explain it mathematically. Given a signal data set $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the $i$th input signal and $N$ is the total number of training samples, each label of the input signal has a hierarchical label structure $y_i=(y_i^1,\ldots,y_i^L)$, where $y_i^l\in\{1,\ldots,C_l\}$ is the class label of the $l$th level, $L$ is the number of label levels of the hierarchical label set, and $C_l$ is the number of classes of the $l$th level. We suppose that the first level denotes the fine-grained label, so $y_i^1$ is the fine-grained label corresponding to signal $x_i$, and $C_1$ is the total number of fine-grained classes. For each input signal $x_i$, we define the output of the penultimate layer of the CNN model as its feature vector, denoted as $f(x_i)$. For the fault signal diagnosis problem to be solved in this paper, the hierarchical label structure consists of two levels ($L=2$). The first level is the fine-grained class level, which identifies the specific source of failure ($C_1$ = the total number of health states across working conditions). The second level is the coarse class level ($C_2$), which denotes the different working conditions. After explaining these basic definitions, we introduce the two main parts of our method in turn.
3.1. Cascaded Softmax Loss
For the fine-grained classification problem with two hierarchical labels (working condition coarse class and health state fine-grained class), we divide the last classification layer of the CNN model into two fully connected layers and use a cascaded softmax loss for training. Figure 4 shows the whole framework of our method. The numbers of neurons in the fc6 layer (health state fine-grained classification layer) and the fc7 layer (working condition coarse classification layer) are $C_1$ and $C_2$, respectively. For each input signal $x_i$, the output of the fc5 layer is the feature vector $f(x_i)$ of the signal. The outputs of the fc6 layer and the fc7 layer are the probability scores $p^1(x_i)\in\mathbb{R}^{C_1}$ of the health state fine-grained classes and the probability scores $p^2(x_i)\in\mathbb{R}^{C_2}$ of the working condition coarse classes.
The purpose of the skip connection between the fc5 layer and the fc7 layer is to provide the fc7 layer not only with the features of the health state fine-grained level but also with its probability scores (the output of the fc6 layer). Intuitively, using these two different types of information for the working condition coarse-level classification is better than using only the health state fine-grained classification results, since the former not only exploits the semantic information of the fault signals (i.e., the learned features) but also learns the hierarchical label structure of the fault signals. Besides, in the iterative learning procedure, the error flows of the fc7 layer backpropagate to the fc6 layer, the fc5 layer, and the first few layers of the CNN model, which helps to improve the health state fine-grained classification accuracy.
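As a concrete illustration, the two classification heads and the skip connection can be sketched in numpy. This is a hedged sketch, not the paper's implementation: the 64-dimensional fc5 feature, the batch size of 8, and the random weights are placeholders, and feeding fc7 the concatenation of the fc5 features and the fc6 probabilities is one plausible realization of the skip connection described above.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Illustrative sizes: 64-dim fc5 feature, 20 fine classes, 4 coarse classes.
d_feat, C1, C2 = 64, 20, 4
W6 = rng.normal(0, 0.1, (d_feat, C1))        # fc6: fine-grained head
W7 = rng.normal(0, 0.1, (d_feat + C1, C2))   # fc7: coarse head, fed by the skip connection

f = rng.normal(size=(8, d_feat))             # a batch of 8 fc5 feature vectors
p_fine = softmax(f @ W6)                     # fine-grained probability scores
# Skip connection: fc7 sees both the fc5 features and the fc6 probabilities.
p_coarse = softmax(np.concatenate([f, p_fine], axis=1) @ W7)
```

With $C_1=20$ fine-grained classes and $C_2=4$ coarse classes, each head outputs a valid probability distribution per sample.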
To train the network shown in Figure 4, the cascaded softmax loss of the fc6 layer and the fc7 layer is

$$L_{\mathrm{cas}}(W)=-\frac{1}{N}\sum_{i=1}^{N}\sum_{l=1}^{L}\log p^{l}_{y_i^{l}}(x_i),\qquad(1)$$

where $W$ denotes the parameters of the whole network. For the fine-grained classification problem with two hierarchical labels (working condition coarse classification and health state fine-grained classification, $L=2$), the terms $-\log p^{1}_{y_i^{1}}(x_i)$ and $-\log p^{2}_{y_i^{2}}(x_i)$ are used to train the fc6 layer and the fc7 layer, respectively. In fact, the cascaded softmax loss can be seen as a multitask learning problem: one task is the health state fine-grained classification and the other is the working condition coarse classification. In the joint training procedure, the two tasks improve each other by sharing the feature representation.
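The two-term cascaded loss described above can be written as a short numpy function. This is a minimal sketch under our own conventions (batch averaging, natural logarithm, probabilities already computed by the two softmax heads); it is not the authors' code.

```python
import numpy as np

def cascaded_softmax_loss(p_fine, p_coarse, y_fine, y_coarse):
    """Cascaded softmax loss for two label levels: cross-entropy of the
    fine-grained head (fc6) plus cross-entropy of the coarse head (fc7),
    averaged over the batch."""
    n = len(y_fine)
    loss_fine = -np.log(p_fine[np.arange(n), y_fine]).mean()
    loss_coarse = -np.log(p_coarse[np.arange(n), y_coarse]).mean()
    return loss_fine + loss_coarse
```

Each term penalizes only the probability assigned to the true label at its level, so a perfect coarse head contributes zero loss regardless of the fine head.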
The whole loss function for training the CNN network in Figure 4 can be defined as

$$L(W)=L_{\mathrm{cas}}(W)+\lambda\,L_{\mathrm{GLM}}(F,Y),\qquad(2)$$

where $L_{\mathrm{cas}}$ is the cascaded softmax loss defined in formula (1), and $L_{\mathrm{GLM}}$ represents the GLM (generalized large-margin) loss, used to train the feature layer (fc5 layer) of the network. The inputs of the GLM loss are the training feature set $F=\{f(x_i)\}_{i=1}^{N}$ and the hierarchical label set $Y=\{y_i\}_{i=1}^{N}$. $\lambda$ is the hyperparameter which is used to balance the cascaded softmax loss and the GLM loss.
3.2. Generalized Large-Margin Loss
For each health state fine-grained class $c$, we define two groups $S_c$ and $D_c$ which together make up the remaining health state fine-grained classes. These two groups consist of the fine-grained classes which share and do not share, respectively, the same parent working condition coarse class with class $c$. The purpose of the GLM loss includes two aspects: (1) the distance between fine-grained class $c$ and the nearest fine-grained class in $S_c$ is larger than the intra-class distance of class $c$ by a predefined margin; (2) the distance between class $c$ and its nearest fine-grained class in $D_c$ is larger than the distance between class $c$ and its farthest fine-grained class in $S_c$ by a predefined margin. In the following, we first define the intra-class distance and the inter-class distance and then use these definitions to describe the GLM loss. The principle is shown in Figure 5.
The features belonging to health state fine-grained class $c$ in the training set are defined as

$$F_c=\{f(x_i)\mid i\in I_c\},\qquad(3)$$

where $I_c$ denotes the index set of the samples belonging to fine-grained class $c$ in the training data set. The mean vector of $F_c$ is defined as

$$\mu_c=\frac{1}{|I_c|}\sum_{i\in I_c} f(x_i),\qquad(4)$$

where $|I_c|$ is the number of samples in class $c$. The intra-class distance of $F_c$ is

$$d(F_c)=\frac{1}{|I_c|}\sum_{i\in I_c}\lVert f(x_i)-\mu_c\rVert_2^{2}.\qquad(5)$$
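The class mean and intra-class distance above translate directly into numpy; a small sketch with our own function name:

```python
import numpy as np

def intra_class_distance(F_c):
    """Mean squared Euclidean distance of class-c features to their class mean."""
    mu = F_c.mean(axis=0)                         # the mean vector of F_c
    return np.mean(np.sum((F_c - mu) ** 2, axis=1))
```

For example, two points at distance 2 from each other each lie 1 away from their mean, so the intra-class distance is 1.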
We define $F_a$ and $F_b$ as two feature sets:

$$F_a=\{f_1^{a},f_2^{a},\ldots,f_{n_a}^{a}\},\qquad(6)$$

$$F_b=\{f_1^{b},f_2^{b},\ldots,f_{n_b}^{b}\}.\qquad(7)$$
Then the inter-class distance between $F_a$ and $F_b$ can be defined as

$$d(F_a,F_b)=\frac{1}{k}\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}A_{ij}\,\lVert f_i^{a}-f_j^{b}\rVert_2^{2}=\frac{1}{k}\operatorname{tr}\!\left(XL_{ab}X^{\top}\right),\qquad(8)$$

where $A_{ij}$ is the $(i,j)$th item in the affinity matrix $A$ between $F_a$ and $F_b$, with $A_{ij}=1$ if $(f_i^{a},f_j^{b})\in P_k$ and $A_{ij}=0$ otherwise; $P_k$ is the set consisting of the $k$ nearest pairs of samples in the set of sample pairs $\{(f_i^{a},f_j^{b})\}$; $X$ is the matrix whose columns are the feature vectors of $F_a\cup F_b$; $L_{ab}$ is the Laplacian matrix of the symmetrized affinity, i.e., $L_{ab}=D-A$ with $D$ the diagonal degree matrix; and $\operatorname{tr}(\cdot)$ is the matrix trace.
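The k-nearest-pair inter-class distance and its Laplacian-trace form can be checked against each other numerically. This is a hedged sketch: the 1/k averaging and the symmetric affinity construction are our reading of the definition, not code from the paper.

```python
import numpy as np

def inter_class_distance(F_a, F_b, k):
    """Mean squared distance over the k nearest cross pairs between F_a and F_b."""
    D2 = ((F_a[:, None, :] - F_b[None, :, :]) ** 2).sum(-1)  # all pairwise squared distances
    flat = np.argsort(D2, axis=None)[:k]                     # indices of the k nearest pairs
    ii, jj = np.unravel_index(flat, D2.shape)
    return D2[ii, jj].mean(), (ii, jj)

def trace_form(F_a, F_b, ii, jj):
    """Same quantity via tr(X L X^T), with L the Laplacian of the pair affinity."""
    n_a = len(F_a)
    X = np.vstack([F_a, F_b]).T                  # columns = features of F_a U F_b
    A = np.zeros((X.shape[1], X.shape[1]))
    A[ii, n_a + jj] = A[n_a + jj, ii] = 1.0      # symmetric affinity on the selected pairs
    Lap = np.diag(A.sum(axis=1)) - A             # graph Laplacian L = D - A
    return np.trace(X @ Lap @ X.T) / len(ii)
```

The identity holds because tr(X L X^T) equals half the affinity-weighted sum of squared distances over ordered pairs, which is exactly the sum over the k selected pairs.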
After the definitions above, the two constraints in the GLM loss can be expressed as

$$d(F_c)+m_1\le d\!\left(F_c,F_{S_c}^{\mathrm{near}}\right),\qquad(9)$$

$$d\!\left(F_c,F_{S_c}^{\mathrm{far}}\right)+m_2\le d\!\left(F_c,F_{D_c}^{\mathrm{near}}\right),\qquad(10)$$

where $m_1$ and $m_2$ are two predefined margin values, and

$$F_{S_c}^{\mathrm{near}}=\arg\min_{F_{c'}:\,c'\in S_c} d(F_c,F_{c'}),\qquad(11)$$

$$F_{S_c}^{\mathrm{far}}=\arg\max_{F_{c'}:\,c'\in S_c} d(F_c,F_{c'}),\qquad(12)$$

$$F_{D_c}^{\mathrm{near}}=\arg\min_{F_{c''}:\,c''\in D_c} d(F_c,F_{c''}).\qquad(13)$$
In formulas (11)–(13), $S_c$ consists of the health state fine-grained classes that share the same parent (i.e., working condition coarse class) with fine-grained class $c$. $F_{S_c}^{\mathrm{near}}$ is the feature vector set of the fine-grained class in $S_c$ that is closest to class $c$, and $F_{S_c}^{\mathrm{far}}$ is the feature vector set of the fine-grained class in $S_c$ that is farthest from class $c$. Besides, $D_c$ consists of the fine-grained classes that do not share the same parent working condition coarse class with class $c$, and $F_{D_c}^{\mathrm{near}}$ is the feature vector set of the fine-grained class in $D_c$ that is closest to class $c$ (Figure 5).
Using the definitions above, the GLM loss, which contains the two-level label structure, can be described as

$$L_{\mathrm{GLM}}=\frac{1}{C_1}\sum_{c=1}^{C_1}\Big[\max\!\big(0,\,d(F_c)+m_1-d(F_c,F_{S_c}^{\mathrm{near}})\big)+\max\!\big(0,\,d(F_c,F_{S_c}^{\mathrm{far}})+m_2-d(F_c,F_{D_c}^{\mathrm{near}})\big)\Big].\qquad(14)$$
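Putting the pieces together, the GLM loss can be sketched as follows. For brevity this sketch measures the distance between two classes by the squared distance of their means rather than the k-nearest-pair distance, a simplification of our own; it also assumes every fine class has at least one sibling ($S_c$ nonempty) and one non-sibling ($D_c$ nonempty), as in the gearbox setting.

```python
import numpy as np

def glm_loss(feats, fine_labels, coarse_of, m1=0.5, m2=0.5):
    """feats: (n, d) features; fine_labels: (n,) fine class per sample;
    coarse_of[c]: parent coarse class of fine class c."""
    classes = np.unique(fine_labels)
    mu = {c: feats[fine_labels == c].mean(0) for c in classes}
    intra = {c: np.mean(np.sum((feats[fine_labels == c] - mu[c]) ** 2, 1)) for c in classes}
    between = lambda a, b: np.sum((mu[a] - mu[b]) ** 2)  # class-mean distance (simplified)
    total = 0.0
    for c in classes:
        S = [o for o in classes if o != c and coarse_of[o] == coarse_of[c]]
        D = [o for o in classes if coarse_of[o] != coarse_of[c]]
        d_S = [between(c, o) for o in S]
        d_D = [between(c, o) for o in D]
        total += max(0.0, intra[c] + m1 - min(d_S))   # margin constraint within the parent
        total += max(0.0, max(d_S) + m2 - min(d_D))   # margin constraint across parents
    return total / len(classes)
```

On well-separated clusters that satisfy both margin constraints the loss is exactly zero, and it grows as siblings drift closer together.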
3.3. Optimization
The training algorithm we use is the standard mini-batch back-propagation (BP) algorithm. The full loss function is defined in formula (2). By minimizing this objective function, the CNN model learns fault characteristics that better distinguish the health states under different working conditions, thus improving the fault classification accuracy. To do so, we need the gradients of the whole loss function with respect to the activations of all CNN layers, which are called the error flows of the corresponding layers. Since the gradient of the softmax loss is straightforward, we only give the gradient of the GLM loss with respect to the feature vectors. In this paper, a two-level hierarchical label structure describes the fault diagnosis problem, so the derivative of the GLM loss with respect to $f(x_i)$ can be computed as

$$\frac{\partial L_{\mathrm{GLM}}}{\partial f(x_i)}=\frac{1}{C_1}\sum_{c=1}^{C_1}\left[\mathbb{1}\!\left[h_c^{(1)}>0\right]\frac{\partial h_c^{(1)}}{\partial f(x_i)}+\mathbb{1}\!\left[h_c^{(2)}>0\right]\frac{\partial h_c^{(2)}}{\partial f(x_i)}\right],\qquad(15)$$

where $h_c^{(1)}=d(F_c)+m_1-d(F_c,F_{S_c}^{\mathrm{near}})$ and $h_c^{(2)}=d(F_c,F_{S_c}^{\mathrm{far}})+m_2-d(F_c,F_{D_c}^{\mathrm{near}})$ are the hinge arguments in formula (14), and $\mathbb{1}[\cdot]$ is an indicator function, which equals one if the condition is true and zero otherwise. For the intra-class term, $\partial d(F_c)/\partial f(x_i)=(2/|I_c|)\,(f(x_i)-\mu_c)$ for $i\in I_c$; the derivatives of the inter-class terms follow analogously from formula (8), where the $i$th column of the matrix $X$ corresponds to $f(x_i)$.
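The only nontrivial derivative is that of the distance terms. For the intra-class distance, differentiating $d(F_c)$ while remembering that $\mu_c$ itself depends on the features gives the closed form $(2/|I_c|)(f_j-\mu_c)$; this derivation is ours, and the sketch below verifies it against a central finite difference.

```python
import numpy as np

def intra(F):
    # intra-class distance of a feature matrix F (rows = samples)
    mu = F.mean(0)
    return np.mean(np.sum((F - mu) ** 2, 1))

def intra_grad(F):
    # analytic gradient: (2/n) (f_j - mu) for each row f_j
    return (2.0 / len(F)) * (F - F.mean(0))

rng = np.random.default_rng(1)
F = rng.normal(size=(5, 3))
g = intra_grad(F)

# central finite-difference check of one entry
eps = 1e-6
Fp, Fm = F.copy(), F.copy()
Fp[2, 1] += eps
Fm[2, 1] -= eps
num = (intra(Fp) - intra(Fm)) / (2 * eps)
```

Because the intra-class distance is translation-invariant, the per-sample gradients also sum to zero, which is a useful sanity check during training.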
Algorithm 1 describes the training algorithm based on the network framework shown in Figure 4 and the gradient in formula (15).

4. Experiments
4.1. Data Description
The experimental data used in this paper are collected on the planetary gearbox of a particular type of vehicle described in Section 2. The physical diagram of the fault simulation experimental platform is shown in Figure 6.
The planetary gearbox data samples are divided into five health states: normal state (normal), K1 large planet gear failure (fault 1), K1 small planet gear failure (fault 2), K2 planet gear failure (fault 3), and K3 sun gear failure (fault 4). The labels corresponding to the five health states are denoted as 0–4, respectively. Samples of each health state are collected under 4 working conditions, which correspond to gear positions 1 to 4. The data sampling frequency is 20 kHz and each working condition has four types of input speed: 600 r/min, 900 r/min, 1200 r/min, and 1500 r/min. The load torque is . A total of 2,400 samples are obtained for each health state, and 80% of the samples are used for training. Each sample contains 4 measurement points, which are vibration signals of the gearbox, and the length of each measurement point is 2000, which is longer than one rotation period, so the sample data length is 4 × 2000 = 8000. There are 12,000 samples in total. All the sample data are shown in Table 1.
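The bookkeeping of the data set can be made explicit in a few lines. This is a sketch consistent with the counts in this section (600 samples per health state per working condition, 4 measurement points of length 2000 per sample); the variable names are our own.

```python
import numpy as np

n_states, n_conditions, per_cell = 5, 4, 600   # 5 health states x 4 gears x 600 samples
n_total = n_states * n_conditions * per_cell
sample_len = 4 * 2000                          # 4 measurement points, 2000 points each

# One label pair per sample: health state 0-4 and working condition 0-3.
states = np.tile(np.repeat(np.arange(n_states), per_cell), n_conditions)
conditions = np.repeat(np.arange(n_conditions), n_states * per_cell)
```

Counting over the grid recovers the totals quoted in the text: 12,000 samples overall, 2,400 per health state, and 3,000 per working condition.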

For the fault diagnosis problem described in this paper, the traditional method (a) needs to establish a model for each independent working condition, so four models in total are required to cover the whole problem; each model has 5 states with 600 samples per state. The traditional method (b) ignores the working conditions and uses all the samples: a total of 5 states with 2,400 samples each. The fine-grained method used in this paper distinguishes faults per working condition, as method (a) does, while using all the working condition samples uniformly. Method (c) focuses on fault diagnosis while avoiding the loss of working condition information: the 4 working conditions form the coarse classes and the faults form the fine classes. For every method, the total number of samples is 12,000.
4.2. Experiment Settings
The CNN models used in this paper have the same network structure and hyperparameters except for the different output structures: 5 convolutional layers, each followed by a pooling layer and a BN layer, and 2 fully connected layers. The output layer is a softmax layer. The activation function of the convolutional layers is ReLU and that of the fully connected layers is Sigmoid. The cross-entropy loss is used as the loss function, and the optimization method is the Adam algorithm. The regularization is L2. The learning rate is 0.0001 and the batch size is 32. The hyperparameter $\lambda$ in the loss function of the proposed method (formula (2)) is 0.1.
4.2.1. Traditional CNN Fault Diagnosis Method Using Separate Modeling of Multiple Working Conditions
The framework of the traditional CNN model using separate modeling of multiple working conditions is illustrated in Figure 1(a); this method is abbreviated as CNN-S. It has four CNN models in total, and each model corresponds to one working condition. The output of each CNN model contains 5 health states, and the working condition of an input sample must be known at the testing stage. The 5 health states are K1 planetary row large planet gear failure (fault 1), K1 planetary row small planet gear failure (fault 2), K2 planet gear failure (fault 3), K3 sun gear failure (fault 4), and normal status. The composition of the data samples used in this method is shown in Table 2. It shows only the data used by one model; each model has the same data structure.

4.2.2. Traditional CNN Fault Diagnosis Method Using Simultaneous Modeling of Multiple Working Conditions
The framework of the traditional CNN model using simultaneous modeling of multiple working conditions is illustrated in Figure 1(b); this method is abbreviated as CNN-M. The output of this traditional CNN model contains 5 health states; it cannot recognize the working condition of the input sample. The 5 health states are K1 planetary row large planet gear failure (fault 1), K1 planetary row small planet gear failure (fault 2), K2 planet gear failure (fault 3), K3 sun gear failure (fault 4), and normal status. The composition of the data samples used in this method is shown in Table 3.

4.2.3. Fine-Grained Fault Classification Algorithm
The framework of the fine-grained fault classification algorithm is illustrated in Figure 1(c); our method is abbreviated as CNN-FG. Compared with the traditional CNN methods, the fine-grained classification algorithm uses a two-level hierarchical label structure consisting of a coarse class level and a fine-grained class level. It therefore has two classifiers (softmax layers) after the fully connected layers of the CNN: one recognizes the fine-grained class level and the other recognizes the coarse class level. The coarse class level corresponds to working conditions and the fine-grained class level represents health states. The working conditions are condition 1 to condition 4, corresponding to coarse class labels 0–3. The data samples under each working condition are divided into 5 health states, normal state and fault 1–fault 4, with state tags 0–4. So there are 20 fine-grained classes in total, corresponding to fine-grained class labels 0–19. The composition of the data samples used in the fine-grained classification model is shown in Table 4.
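The two-level labeling can be encoded as a simple mapping between the 20 fine-grained labels and the (working condition, health state) pairs. The paper does not spell out the exact encoding, so fine = condition × 5 + state below is one plausible scheme.

```python
N_STATES = 5  # normal plus fault 1 - fault 4

def to_fine(condition, state):
    """(coarse working condition 0-3, health state 0-4) -> fine-grained label 0-19."""
    return condition * N_STATES + state

def from_fine(fine):
    """fine-grained label 0-19 -> (coarse working condition, health state)."""
    return divmod(fine, N_STATES)
```

The mapping is a bijection, so the coarse label and the original health state tag can always be recovered from a fine-grained prediction.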

5. Results and Discussion
5.1. Experimental Results
5.1.1. The Results of CNN-S
CNN-S is the most common method. When the training set accuracy reaches 100% and the loss function no longer decreases, the fault diagnosis accuracies of the four working-condition models on the test set are only 95.5%, 88.5%, 94.5%, and 95.8%. This shows that the network structure is reasonable and its operation is preliminarily effective. However, due to the limited number of samples (each model has only 600 samples per health state), the network cannot be trained adequately, and typical overfitting appears. To examine the diagnosis results in more detail, Figure 7 shows the confusion matrices of the CNN-S models on each working condition test set together with the overall statistics. In the confusion matrix plots, the rows correspond to the predicted class and the columns correspond to the true class. Numbers 0∼4 are the health state labels defined in Table 2. The diagonal cells correspond to correctly classified observations. The number of observations and the percentage of the total number of observations are indicated in each cell. The column on the far right of the plot shows the precision and the false discovery rate, respectively; the row at the bottom shows the recall and the false negative rate, respectively. The cell in the bottom right shows the overall accuracy. Blocks (a)∼(d) show the confusion matrices of the separate conditions, and the statistics over the merged conditions are shown in block (e).
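The confusion matrix layout described above (rows = predicted class, columns = true class, precision along the right column, recall along the bottom row) can be reproduced with a small helper. This sketch uses our own function name and is not tied to any plotting library.

```python
import numpy as np

def confusion(y_true, y_pred, n_cls):
    """Confusion matrix with rows = predicted class, columns = true class."""
    M = np.zeros((n_cls, n_cls), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[p, t] += 1
    precision = np.diag(M) / np.maximum(M.sum(axis=1), 1)  # right column of the plot
    recall = np.diag(M) / np.maximum(M.sum(axis=0), 1)     # bottom row of the plot
    accuracy = np.trace(M) / M.sum()
    return M, precision, recall, accuracy
```

The false discovery rate and false negative rate shown in Figure 7 are simply 1 − precision and 1 − recall per class.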
In order to analyze the fault diagnosis results more intuitively, the t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to visualize the extracted features. The results are shown in Figure 8.
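A t-SNE projection of extracted features can be produced with scikit-learn, assuming it is available in the environment; the mock 16-dimensional features below merely stand in for fc5 outputs.

```python
import numpy as np
from sklearn.manifold import TSNE  # assumes scikit-learn is installed

rng = np.random.default_rng(0)
# Mock features: three well-separated classes of 20 samples each, 16-D.
feats = np.vstack([rng.normal(loc=5.0 * c, scale=0.5, size=(20, 16)) for c in range(3)])
labels = np.repeat(np.arange(3), 20)

# Project to 2-D for scatter plotting, as in Figure 8.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(feats)
```

The 2-D embedding can then be scattered with one color per class; well-trained features should form compact, well-separated clusters.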
Figure 8(a) shows the classification result of the features extracted from the working condition 1 data set. From the figure, we can see that the inter-class spacing between the scattered points of different health states is relatively small, and some scattered points even overlap, so two health states are easily confused, leading to misdiagnosis. For scattered points of the same health state, the intra-class distance is relatively large in some classes, and some points of a health state are even distributed across clusters, so the health state cannot be identified effectively, leading to missed diagnosis. These results show that the CNN-S model does not classify the health states under working condition 1 clearly and cannot diagnose the fault states accurately. The visualized classification results of the features extracted from the data sets under working conditions 2, 3, and 4 are shown in Figures 8(b)–8(d), respectively. The results for these three working conditions show similar problems: relatively small inter-class distances between different health states and relatively large intra-class distances within the same health state.
5.1.2. The Results of CNN-M
In order to demonstrate the influence of working conditions on gearbox fault diagnosis, and because the sample size is not sufficient to support CNN-S, the other common method, CNN-M, is used here. The final diagnostic accuracy of CNN-M on the test set is 79.9%. Although a single model is trained on data from all working conditions, so that the number of training samples increases, the final diagnostic accuracy is lower than that of any of CNN-S's four single-condition models.
Figure 9 shows the confusion matrix on the extended sample data set shown in Table 3. With the same CNN network structure and the same network parameters, the per-class diagnostic accuracy on the extended sample data set ranges from 66% to 93%, and the overall accuracy is only 79.9%, which is much lower than that of any of the four single working condition models of CNN-S. Further, the feature extraction visualization of CNN-M is shown in Figure 10. There is serious overlap between the scattered points of different health state features, more serious than in any of the cases in Figure 8. In addition, the intra-class spacing between the scattered points of the same health state feature becomes larger, which indicates that the fault diagnosis accuracy on the expanded sample data set is very low. The results show that the working condition is a key element of gearbox fault diagnosis that cannot be ignored; an appropriate global modeling method must distinguish all faults under the different working conditions.
5.1.3. The Results of CNN-FG
The final diagnostic accuracy of CNN-FG on the test set is 98.8%. Figure 11 shows the fault diagnosis accuracy of the proposed method for both the coarse class (working condition) and the fine-grained class (health state). It can be seen that both the specific working condition and the specific health state of the input signal are identified accurately. This reflects the effectiveness of the hierarchical label structure proposed in this paper: it combines the working condition information with the health state information, and the two complement each other, providing more reliable information for fault diagnosis.
The confusion matrix of CNN-FG in Figure 12 shows the ability of the CNN-FG model to recognize the coarse class of the data samples. For any data sample under working conditions 1∼4, the CNN-FG model identifies the working condition accurately. The numbers 0∼3 are the coarse class labels defined in Table 4.
Figure 13 shows the confusion matrix of fine-grained class recognition using the fine-grained classification model. It can be seen from the right column that all accuracies are almost 100%, which means that the model can accurately identify the 20 fine-grained classes.
In order to compare with the two traditional methods more intuitively, the diagnosis results of the CNN-FG method are merged according to working condition. The statistical results are shown as a confusion matrix in Figure 14. Obviously, this method performs much better than methods (a) and (b).
Figure 15 shows the t-SNE visualization of the overall feature extraction of the fine-grained classification model. It is obvious that the model clearly separates all 20 fine-grained classes in the joint feature distribution of states across working conditions. These results clearly show the superiority of the fine-grained classification algorithm for fault diagnosis under the coupling of working conditions and health states.
5.2. Discussion
The comparison of the traditional CNN methods (CNN-S and CNN-M) with the CNN-FG method proposed in this paper is shown in Figure 16. Figure 16(a) compares the test loss curves of the three methods. The proposed algorithm achieves a lower loss on the test set, indicating better generalization performance. As mentioned above, the network outputs of the CNN-S and CNN-M models have only five classes, while the network output of CNN-FG contains 20 fine-grained classes belonging to 4 coarse classes, so the latter's loss value is larger at the beginning. However, as training progresses, its loss decreases rapidly and falls below that of the traditional CNN methods, indicating that the working-condition-based fine-grained classification formulation is better suited to describing the fault diagnosis problem under variable working conditions. Figure 16(b) compares the fault diagnosis accuracy of the traditional CNN methods and the fine-grained method proposed in this paper. The method used in this paper is significantly more accurate, which indicates that the traditional CNN methods, which extract fault signal features directly without considering working conditions, have an obvious defect. Using the fine-grained classification algorithm to extract the common features of different health states under different working conditions better matches the problem addressed in this paper: when the health state features are significantly affected by the working conditions, accurately identifying the health states requires accurately identifying the working conditions at the same time, and the two should ultimately be recognized synchronously.
Table 5 shows the comparison of fault diagnosis accuracies between the traditional CNN models and the fine-grained CNN models. The advantage of the fine-grained classification method in diagnostic accuracy can be seen very intuitively from the table.

6. Conclusion
The innovations of the proposed algorithm are twofold: (1) the CNN structure is modified by introducing skip connections and training the network with a cascaded softmax loss, so that the CNN model can extract and exploit the characteristics common to the fault types and the working conditions; (2) the hierarchical label structure allows the CNN to identify the fault types and the working conditions simultaneously, which in turn improves the accuracy of fault-type diagnosis. Compared with traditional CNN methods, the fine-grained classification model not only effectively expands the number of usable samples by combining the samples of different working conditions but also unifies the coupling between working conditions and health states within a single CNN model, so that features are extracted from all samples to the greatest extent and feature extraction across working conditions is completed. Finally, the working conditions and the health states are decoupled, and the fault diagnosis accuracy is effectively improved. This solves both the sample-expansion problem of the planetary gearbox under limited samples and the problem that the coupling of working conditions and health states reduces diagnosis accuracy. The results show that this method is superior to existing methods.
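The hierarchical label structure described in innovation (2) amounts to a bijection between the 20 fine-grained labels and (working condition, fault type) pairs. A minimal sketch, assuming the fine label is laid out as condition-major (the helper names are hypothetical, not from the paper):

```python
N_CONDITIONS, N_FAULTS = 4, 5  # 4 working conditions x 5 health states = 20 fine labels

def encode(condition, fault):
    """Map a (working condition, fault type) pair to one of the 20 fine labels."""
    return condition * N_FAULTS + fault

def decode(fine_label):
    """Recover both the working condition and the fault type from one
    fine-grained prediction, giving the dual recognition described above."""
    return divmod(fine_label, N_FAULTS)

print(decode(encode(2, 3)))
```

Because `decode` is exact, a single fine-grained softmax output yields both diagnoses at once; no separate working-condition classifier is needed at inference time.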
It should be noted that this method is well suited to the gearbox, or to other systems whose characteristics are significantly affected by the working conditions and for which large-scale sampling is difficult. When the influence of working conditions is limited, the simple CNN-M method can be used; if the system is heavily affected by working conditions but a large sample size is available, CNN-S can achieve the same effect. However, when both difficulties exist, the method provided in this paper is irreplaceable.
Data Availability
The data used in this study come from military equipment and cannot be disclosed.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the National Natural Science Foundation of China under Grant no. 51875576.
References
[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[2] L. Wen, L. Gao, X. Li, L. Wang, and J. Zhu, “A jointed signal analysis and convolutional neural network method for fault diagnosis,” Procedia CIRP, vol. 72, pp. 1084–1087, 2018.
[3] G. Sheng, Y. Tao, G. Wei et al., “A novel fault diagnosis method for rotating machinery based on a convolutional neural network,” Sensors, vol. 18, no. 5, p. 1429, 2018.
[4] O. Janssens, V. Slavkovikj, B. Vervisch et al., “Convolutional neural network based fault detection for rotating machinery,” Journal of Sound and Vibration, vol. 377, pp. 331–345, 2016.
[5] D. T. Hoang and H. J. Kang, “Rolling element bearing fault diagnosis using convolutional neural network and vibration image,” Cognitive Systems Research, vol. 53, pp. 42–50, 2018.
[6] F. Jia, Y. Lei, J. Lin, X. Zhou, and N. Lu, “Deep neural networks: a promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data,” Mechanical Systems and Signal Processing, vol. 72–73, pp. 303–315, 2016.
[7] T. Wang, H. Wu, M. Ni et al., “An adaptive confidence limit for periodic non-steady conditions fault detection,” Mechanical Systems and Signal Processing, vol. 72–73, pp. 328–345, 2016.
[8] X. Ding and Q. He, “Energy-fluctuated multiscale feature learning with deep convnet for intelligent spindle bearing fault diagnosis,” IEEE Transactions on Instrumentation and Measurement, vol. 66, no. 8, pp. 1926–1935, 2017.
[9] M. Cerrada, G. Zurita, D. Cabrera et al., “Fault diagnosis in spur gears based on genetic algorithm and random forest,” Mechanical Systems and Signal Processing, vol. 70–71, pp. 87–103, 2015.
[10] K. Yan and D. Zhang, “Calibration transfer and drift compensation of e-noses via coupled task learning,” Sensors and Actuators B: Chemical, vol. 225, pp. 288–297, 2016.
[11] L. Wen, L. Gao, and X. Li, “A new deep transfer learning based on sparse auto-encoder for fault diagnosis,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 49, no. 1, pp. 136–144, 2017.
[12] S. Shao, S. McAleer, R. Yan et al., “Highly-accurate machine fault diagnosis using deep transfer learning,” IEEE Transactions on Industrial Informatics, vol. 15, no. 4, pp. 2446–2455, 2018.
[13] S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 539–546, San Diego, CA, USA, June 2005.
[14] Y. Sun, Y. Chen, X. Wang, and X. Tang, “Deep learning face representation by joint identification-verification,” Advances in Neural Information Processing Systems, vol. 27, pp. 1988–1996, 2014.
[15] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: a unified embedding for face recognition and clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 815–823, Boston, MA, USA, June 2015.
[16] L. Wen, X. Li, and L. Gao, “A new two-level hierarchical diagnosis network based on convolutional neural network,” IEEE Transactions on Instrumentation and Measurement, vol. 69, pp. 1–9, 2019.
[17] Y. Wen, K. Zhang, Z. Li, and Y. Qiao, “A discriminative feature learning approach for deep face recognition,” Computer Vision – ECCV 2016, pp. 499–515, 2016.
[18] S. Weiwei, G. Yihong, T. Xiaoyu et al., “Fine-grained image classification using modified DCNNs trained by cascaded softmax and generalized large-margin losses,” IEEE Transactions on Neural Networks and Learning Systems, vol. 99, pp. 1–12, 2018.
Copyright
Copyright © 2020 Pengcheng Jiang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.