#### Abstract

Gearbox vibration signals under different loads are sensitive to the presence of faults, and the vibration signals of composite faults are complex. Traditional fault diagnosis methods rely mostly on signal processing, which has difficulty separating useful information from such fault signals, so these methods struggle to identify the faults accurately. In this paper, an intelligent diagnosis method based on a one-dimensional convolutional neural network (1-D CNN) with an improved SoftMax function is proposed. Local mean decomposition (LMD) decomposes the signals into product functions (PFs). The PFs are evaluated with matrix sample entropy based on Euclidean distance (MESE), and the PFs that best reflect the fault characteristics are selected. Finally, the PFs selected by MESE are used to train the CNN to identify the faults of a parallel-shaft gearbox. Experiments show that MESE can quickly and accurately select the PFs with the most significant fault features. The 1-D CNN achieves a recognition rate of nearly 100% in less time, and the CNN with the improved SoftMax effectively eliminates the influence of the LMD endpoint effect. The method successfully identifies single faults, compound faults, and faults under different loads of the gearbox. Compared with other methods, it is efficient, accurate, and robust to interference. It can therefore handle the decomposition of complex gearbox fault signals and diagnose gearbox faults under different load conditions, which is of great significance for gearbox fault diagnosis in actual production.

#### 1. Introduction

The gearbox is one of the most important parts of mechanical equipment, with a wide range of applications and a high utilization rate. A gearbox fault can therefore cause serious safety problems and high maintenance costs, which makes gear fault diagnosis very important. However, the structure of the gearbox is complex, and the gears must bear heavy loads under complex and changeable operating conditions. The large gears, small gears, and bearings in the gearbox are the parts most prone to failure.

Gearbox diagnosis can draw on many kinds of signals, such as vibration, sound, current, and oil-based signals, all of which reflect the status of the gearbox. Vibration signal analysis is the most commonly used and the most effective. Traditional fault diagnosis methods apply signal processing experience and manual feature extraction [1, 2], so the signal analysis and feature extraction methods are the main factors affecting diagnostic accuracy. Time-domain, frequency-domain, and time-frequency-domain analysis and wavelet basis functions [3] are the most commonly used feature analysis methods in fault diagnosis [4, 5]. Rui et al. [6] designed a dynamic model of gear pairs with variable meshing stiffness to study the fault vibration characteristics of a spur gear pair with a local spalling defect. This method is severely limited because, in actual production, gear spalling is likely to cause other faults and the fault itself is already obvious. Brethee et al. [7] used a comprehensive dynamic model to analyze the effect of surface wear on the dynamic response of gears; the analysis relies on the fact that the amplitude of the meshing vibration and its sideband components increase significantly as wear progresses. Sanz et al. [8] proposed a multistage algorithm for monitoring the dynamic state of gears, in which the meshing dynamics are monitored by means of gear state information. However, gear state information is changeable, complex, and difficult to determine.

Because of noise and the complexity of the gearbox system, the vibration signal collected from the gearbox by the sensor is usually mixed with a large amount of noise. If the fault is complex, the noisy signal becomes even more complex. Recursive mode decomposition methods are therefore well suited to gearbox fault diagnosis [9, 10]. Wu et al. [11] proposed a fault diagnosis method for planetary gearboxes based on two-dimensional variational mode decomposition (2-D VMD) and full vector spectrum technology and verified its correctness and effectiveness. Li et al. [12] proposed a VMD method that is complementary to the DDTFA method. A comparison with the existing VMD and ensemble empirical mode decomposition shows that VMD-DDTFA is very effective in diagnosing gear cracks and broken teeth under variable working conditions. Xiao et al. [13] proposed a gear fault diagnosis method based on the kurtosis criterion, variational mode decomposition (VMD), and a self-organizing map (SOM) neural network. This method uses the VMD algorithm to decompose the gear vibration signal, extracts the kurtosis values of the IMFs to form a feature vector, and feeds the vector into the SOM for diagnosis, with good results. However, the SOM network is an unsupervised topological neural network whose final output size is difficult to determine, so the method is not universal. Most of these methods decompose vibration or other signals into suitable components with traditional wavelet, EMD [14], VMD, and other decomposition methods [15, 16], and the fault type is then inferred from the parameters of the signal components. These methods detect faults well, but they rely heavily on mechanical expertise and are inefficient because the actual diagnosis is performed manually. Therefore, the development of traditional fault diagnosis towards artificial intelligence is an inevitable trend.

With the development of artificial intelligence, machine learning is increasingly used in fault diagnosis. The CNN [17] has great advantages in feature classification and is currently used mostly for two-dimensional image recognition. Chen et al. [18] used two stacked CNNs to build a novel deep image saliency computing framework that highlights the objects of interest against a complex background while preserving details. Sergey et al. [19] developed a method based on partially observed guided policy search and a CNN to learn policies that map raw image observations directly to torques on the robot’s motors. A CNN can directly process large or multidimensional data samples, which benefits detailed local feature extraction and preserves the relative relationships within multidimensional data, leading to improved identification results. Deng et al. [20] proposed an improved quantum-inspired differential evolution algorithm (MSIQDE) with global optimization ability and used it to optimize the parameters of a deep belief network (DBN) and construct an optimal DBN model. Experimental results show that MSIQDE-DBN has higher classification accuracy. This work improves the deep belief network and shows that deep neural networks are suitable for fault classification. Chen et al. [21] proposed a convolutional neural network for gearbox fault identification and classification that is suitable for fault diagnosis of industrial reciprocating machinery. It also performs well in gearbox fault diagnosis, but selecting its parameters and hyperparameters is very difficult. John et al. [22] proposed a deep convolutional neural network (DCNN) based on the layer-wise relevance propagation (LRP) method for gearbox fault diagnosis. It converts the vibration time series into time-frequency images by wavelet transform and then classifies them with the DCNN.
Yao et al. [23] fed time-domain and frequency-domain signals as raw inputs to an end-to-end convolutional neural network for gear fault diagnosis. This method analyzes sound signals to identify gear pitting faults. Li et al. [24] proposed a method that combines a CNN with a gated recurrent unit (GRU) network and achieves an accuracy of more than 98% with fewer training samples. Li et al. [25] developed an improved deep neural network based on a domain-adaptive diagnosis model and used the original vibration signal for transfer learning, with particle swarm optimization and L2 regularization used to optimize the network. However, this method places higher demands on data collection and classification. Chen et al. [26] proposed a gearbox fault diagnosis method based on feature learning with a one-dimensional residual convolutional autoencoder. This unsupervised method applies a 1-D convolutional autoencoder for feature extraction and deconvolution for reconstructing the filtered signal. It performs well in signal denoising and feature extraction, but the repeated convolution and deconvolution lower the learning efficiency of the network, and the deeper network makes the parameters more complex and less adaptable.

The literature surveyed above shows that most intelligent diagnosis methods still rely heavily on traditional signal processing [27, 28] and are therefore still limited by it. Many deep learning methods use a two-dimensional CNN to analyze images of the vibration signal, so running time and storage cost grow with the amount of data. As the network becomes deeper, repeated feature extraction and optimization steps, such as feature image generation, convolution, and pooling, inevitably affect diagnostic accuracy, and the influence of irrelevant signals such as noise also gradually grows.

To solve these problems, local mean decomposition (LMD) and a one-dimensional CNN (1-D CNN) are combined [29]. Compared with EMD, LMD is more adaptive and has a lower error rate, and the endpoint effect, a drawback of decomposing finite-length vibration signals, is greatly reduced. Compared with a 2-D CNN, a 1-D CNN extracts features directly from the vibration data, which reduces computing time and improves accuracy. Compared with traditional methods, the approach is no longer limited by fault type or complexity, because different faults can be recognized by training on different fault signals. The distance measure used in sample entropy (SE) is improved to make it more suitable for the correlation analysis of PFs, so the components selected after the analysis are better suited to CNN training. The data are input as a 2-D matrix rather than a 1-D sequence to speed up the extraction and calculation of SE. The SoftMax activation function of the CNN is improved to reduce the impact of the LMD endpoint effect on the CNN. LMD is used for adaptive local decomposition of the fault signals, and the feature normalization and feature extraction abilities of the CNN enable the method to identify single faults, faults under different loads, and composite faults. The recognition accuracy of this method is nearly 98.8%.

The rest of this paper is organized as follows. Section 2 introduces the basic principle of LMD and the method for reducing the endpoint effect. Section 3 introduces the improved distance algorithm for SE and its fast matrix-based solution. Section 4 introduces the principles of the 1-D CNN and the improved SoftMax activation function. Section 5 presents experiments and results of gearbox fault diagnosis to verify the effectiveness of the method and compares the results with other methods. Finally, Section 6 gives the conclusion and directions for future work.

#### 2. LMD Principle Analysis and Endpoint Effect Compensation Method

LMD separates pure frequency-modulated signals and envelope signals from the original signal. Each PF component is the product of a frequency-modulated signal and an envelope signal. The separation is repeated until the separated component becomes monotonic or reaches a given threshold. Each PF signal has its own physical meaning [30]. The decomposition of any one-dimensional signal $x(t)$ proceeds as follows:

Find all the local extreme points $n_i$ of the signal $x(t)$. The local mean $m_i$ is the average of every two adjacent extreme points $n_i$ and $n_{i+1}$:

$$m_i = \frac{n_i + n_{i+1}}{2}$$

Connect all adjacent $m_i$ and $m_{i+1}$ with straight lines and then smooth the lines with the moving average method. The resulting local mean function is called $m_{11}(t)$.

Then, use the local extreme points to get the envelope estimate $a_i$:

$$a_i = \frac{\left| n_i - n_{i+1} \right|}{2}$$

Connect all adjacent $a_i$ and $a_{i+1}$. The envelope estimation function, $a_{11}(t)$, is obtained by smoothing with the moving average method.
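The local mean and envelope steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the moving-average width of 5, and the choice to hold each local mean and envelope value constant between successive extrema before smoothing are all assumptions.

```python
import numpy as np


def local_extrema(x):
    """Return indices of interior local maxima and minima of a 1-D signal."""
    dx = np.diff(x)
    # A sign change between consecutive first differences marks an extremum.
    return np.where(dx[:-1] * dx[1:] < 0)[0] + 1


def moving_average(y, width):
    """Smooth y with a centered moving average, holding the edge values."""
    pad = width // 2
    padded = np.pad(y, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")[: len(y)]


def local_mean_and_envelope(x, width=5):
    """One LMD step: smoothed local mean and envelope estimate of x."""
    x = np.asarray(x, dtype=float)
    idx = local_extrema(x)
    if idx.size < 2:
        raise ValueError("need at least two local extrema")
    n = x[idx]
    m = (n[:-1] + n[1:]) / 2.0          # mean of adjacent extrema
    a = np.abs(n[:-1] - n[1:]) / 2.0    # half the distance between them
    # Assumption: hold each value constant between successive extrema,
    # reusing the first and last values outside the outermost extrema.
    seg = np.searchsorted(idx, np.arange(len(x)), side="right") - 1
    seg = np.clip(seg, 0, len(m) - 1)
    return moving_average(m[seg], width), moving_average(a[seg], width)
```

For a pure cosine, adjacent extrema are +1 and -1, so the sketch returns a local mean close to 0 and an envelope close to 1. The samples outside the outermost extrema are exactly where the endpoint effect discussed next enters, because the values there are extrapolated rather than measured.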

LMD continuously separates high-frequency components, but the actual signal has finite length [31]. LMD therefore has to extrapolate values near the endpoints, which introduces severe errors into the local mean function and the envelope estimation function.

A section of the original signal is compared with the curve of its local mean function to analyze the influence of the endpoint effect and how to attenuate it. Figure 1 shows the envelope and local mean function of two actual vibration signals. LMD builds the envelope from the maximum and minimum points of the signal; because the signal beyond the endpoints is unknown, the envelope near the endpoints is also unknown. In the ideal case, where the endpoint value is exactly an extreme point, the local mean function and envelope function are accurate and the endpoint effect is almost eliminated. A cosine function is used to verify the validity of the method. Figure 2 shows the component signals after decomposition of the two vibration signals.

Figure 3 shows the first PF component and the original vibration signal. The decomposition results show that the LMD component of the vibration signal whose endpoint passes through the origin is almost unaffected by the endpoint effect and coincides with the original signal, whereas the other vibration signal differs considerably after decomposition. The vibration amplitudes of the PF component and the residual component differ by 4 orders of magnitude, so the residual component can be ignored. This confirms that when the endpoint of the original signal is an extreme point, the end effect is mostly eliminated. Truncating or shifting the original signal to a suitable endpoint is therefore an effective remedy. The reduced dimension time is defined as and the original vibration signal function is , and substitute from the LMD decomposition formula. Since is very small, it hardly affects the values of *b* and *l*. It can be assumed that *b* and *l* do not change as a result of the signal moving. Separating the local mean function from the shifted delta function , which has been gained can be expressed as

The vibration signal is approximately a locally periodic function. According to the basic property of periodic functions, when the function is shifted, the conclusion is

where is the deviation of the function from the coordinate axis, which is a fixed value for a given function. divided by the envelope estimation function demodulates to get :

Continue to iterate according to the LMD rule to the end. The process is as follows:

The termination condition for the iteration is expressed as

Multiply and the frequency modulation signal to get the first PF component which is reduced to the end effect:

Therefore,

Repeat the above algorithm:

where and *PF* (*t*) are very different in order of magnitude, and can be processed by taking an extreme value, and is the ratio of the vibration offset to the endpoint offset. The endpoint effect of LMD can be reduced by a small shift value.
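The observation that the endpoint effect nearly vanishes when the signal ends on an extreme point suggests a simple preprocessing step, sketched below. It truncates the record so that its first and last samples are local extrema. This is a simplified stand-in for the small time shift analyzed above, not the authors' compensation scheme, and the function name `trim_to_extrema` is hypothetical.

```python
import numpy as np


def trim_to_extrema(x):
    """Cut x so that its first and last samples are local extrema.

    Returns the trimmed signal and the index offset of its first sample.
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    # Interior extrema: the first difference changes sign.
    idx = np.where(dx[:-1] * dx[1:] < 0)[0] + 1
    if idx.size < 2:
        # Too few extrema to trim; return the signal unchanged.
        return x.copy(), 0
    return x[idx[0]: idx[-1] + 1], int(idx[0])
```

The cost of this variant is that the samples before the first extremum and after the last one are discarded, which is acceptable only when the record is long compared with one oscillation period.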

#### 3. Matrix Sample Entropy Based on Euclidean Distance

SE measures the complexity of a sequence by measuring the probability that a new pattern is generated in the signal: the greater this probability, the greater the complexity of the sequence. Each PF obtained by LMD is a high-frequency signal with physical significance. Because sample entropy is highly adaptive and consistent, it is a convenient way to judge whether the regularity of a PF component makes it suitable for the CNN [32]. The basic principle and improvement of sample entropy are as follows:

There is a time series, , which is composed by *N* data. A set of vector sequences of dimension *m*, , composed by sequence number, , in this sequence is . These vectors represent *m* consecutive *x* values starting from the *i*-th point. The PF components of length *m* are arranged and calculated according to the following matrix. Each *X* is a vector sequence of PF components:where is defined as the distance between the vector is the absolute value of the maximum difference in the corresponding elements. The feature space is an n-dimensional vector space . , and the distance of is defined as

The Euclidean distance when *p* is equal to 2:

If *p* is equal to infinity, each distance is the maximum, which is the original distance of SE:

Because the fault data are all displacement, velocity, and acceleration signals in the same coordinates, this method converts the one-dimensional distance characteristic into a high-dimensional distance. The algorithm makes the features of various directions equally important and reduces the difference value of each feature point of the same component. The method can normalize the feature of each direction and better identify the PF component which can best represent the gearbox fault:

Calculate the matrix distance of for each *i*:

The number of *j* is the whose distance between and should be less than or equal to *r*. The number is called . If is right, the definition of is expressed as

Increase the dimension of the matrix from *m* to *m* + 1 and then count the entries of the matrix whose distance is less than or equal to the threshold *r*.

The definition of is expressed as

When , it becomes

where is the probability of two sequences matching *m* points under similar tolerance *r*, and is the probability that two sequences match *m* + 1 points. So, the sample entropy is defined by the following formula:

The value of sample entropy depends on the values of *m* and *r*. For mechanical vibration signals, the Euclidean distance greatly reduces the difference between different dimensions of the same fault, and the matrix algorithm greatly speeds up the entropy calculation for large quantities of vibration data.
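To make the role of the distance measure concrete, the following sketch computes ordinary sample entropy with a selectable $L_p$ distance: `p=np.inf` gives the maximum-difference distance of the original SE, and `p=2` gives the Euclidean distance. It is not the authors' MESE; the function name, the tolerance convention `r = r_factor * std(x)`, and the default values are assumptions made for illustration. The pairwise distances are computed as one matrix by broadcasting, in the spirit of the matrix formulation.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2, p=np.inf):
    """Sample entropy of x with template length m and an L_p distance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)  # assumed tolerance convention

    def count_matches(length):
        # Use the same n - m template start points for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        diff = np.abs(templates[:, None, :] - templates[None, :, :])
        if np.isinf(p):
            dist = diff.max(axis=2)                       # Chebyshev
        else:
            dist = (diff ** p).sum(axis=2) ** (1.0 / p)   # e.g. Euclidean
        # Subtract the diagonal so that self-matches are not counted.
        return int(np.sum(dist <= r)) - len(templates)

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return float(-np.log(a / b))
```

A regular signal such as a sine wave yields a lower value than white noise under either distance. Note that the broadcasted distance array needs memory proportional to the square of the signal length, so long records would have to be processed in blocks.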

#### 4. 1-D CNN and the Improvement of SoftMax

##### 4.1. The Principle of One-Dimensional CNN

A 1-D CNN and a 2-D CNN share the same characteristics and processing methods; the key differences are the dimensionality of the data and the way the filter slides [33]. The working mode of the 1-D CNN is therefore explained by comparison with the 2-D CNN. The upper part of Figure 4 shows the working mode of a convolutional layer of a 1-D CNN. Each row represents one element; the convolution kernel has the same width as the input and slides along it with a stride of 1, so the convolution is carried out 8 times. The lower part of Figure 4 shows the convolution in a two-dimensional CNN. The convolution kernel is a freely defined 2 × 2 kernel with three layers corresponding to the RGB channels, and the stride is 1. In this case, the kernel sweeps back and forth along a given direction until it has covered the whole image.
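The sliding behaviour of a 1-D convolutional layer can be illustrated with a short NumPy sketch. The function name `conv1d_valid` and all array sizes are hypothetical; the point is only that a kernel spanning the full width of the input slides along one axis, so an input of `steps` rows and a kernel of height `height` give `(steps - height) // stride + 1` output rows.

```python
import numpy as np


def conv1d_valid(x, kernels, stride=1):
    """1-D convolution without padding.

    x:       array of shape (steps, channels)
    kernels: array of shape (n_filters, height, channels)
    returns: array of shape (n_out, n_filters)
    """
    steps, channels = x.shape
    n_filters, height, k_channels = kernels.shape
    assert channels == k_channels, "kernel must span the full input width"
    n_out = (steps - height) // stride + 1
    out = np.empty((n_out, n_filters))
    for i in range(n_out):
        window = x[i * stride: i * stride + height]
        # Each filter is multiplied elementwise with the window and summed.
        out[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out
```

With 10 input rows and a kernel of height 3, the kernel takes 8 positions, matching the count given for Figure 4. As a further illustration, an input of 80 time steps (an assumed length) with 100 filters of height 10 would give a 71 × 100 output.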

To reduce the risk of overfitting, the convolutional layers usually have fewer parameters than the fully connected layers. Figure 5 illustrates the training of the 1-D CNN model: the original data pass through three convolutional layers, three pooling layers, and three fully connected layers to yield the specific feature values.

##### 4.2. Weight Compensation SoftMax Algorithm

If the original data are shifted directly to compensate for the endpoint effect, every group of data must be processed and the offset point of each group cannot be determined. To make the method more general, SoftMax is improved instead. Adding an adaptive offset correction to the function cannot, in theory, completely eliminate the end effect, but the end effect will no longer affect the recognition of the neural network.

SoftMax is a numerical treatment of the output units for multiclass classification problems. Its basic definition is

$$S_i = \frac{e^{V_i}}{\sum_{j} e^{V_j}}$$

Because of the exponential operation, the computed values often overflow, so the elements are reduced by subtracting a value $D$, typically $D = \max(V)$, where *V* is the output value:

$$S_i = \frac{e^{V_i - D}}{\sum_{j} e^{V_j - D}}$$

For a linear classifier, the input *x* is multiplied by the weight matrix, and the linear score corresponding to the correct category is denoted , which is passed to the corresponding SoftMax output. Taking the logarithm does not change the characteristics of the function. After a series of manipulations, the result reduces to a simple SoftMax loss function, expressed as

where is usually the output of the fully connected layer in the CNN. Put the weight of the LMD endpoint compensation, , into the equation. In the formula, . So, the equation can be expressed as

In the iterative process, the centralization weight is . It changes the distance calculation through metric learning, converting to , which reduces the intra-class distance and increases the inter-class distance. In this way, the centering weight compensates for the endpoint offset in the distance. The range that can be preset is [1, 5] when a model network is trained. The parameters are tuned until the CNN is optimal.
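As a baseline for the discussion above, the standard SoftMax and the SoftMax loss can be sketched as follows. The sketch covers only the unmodified function with the usual maximum subtraction for numerical stability; the weight-compensated variant proposed in this section is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def softmax(v):
    """Numerically stable SoftMax along the last axis."""
    v = np.asarray(v, dtype=float)
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    shifted = v - np.max(v, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)


def softmax_loss(scores, label):
    """Negative log-probability of the correct class for one sample."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - np.max(scores)
    # -log(exp(s_y) / sum_j exp(s_j)) = log(sum_j exp(s_j)) - s_y
    return float(np.log(np.sum(np.exp(shifted))) - shifted[label])
```

Scores of 1000, 1001, and 1002 give the same probabilities as 0, 1, and 2, which shows that the shift changes nothing except the numerical range of the exponentials.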

#### 5. Experiment and Verification

##### 5.1. Data Collection

The experimental data are collected on the test rig shown in Figure 6. Figure 7 shows the sensor distribution and the internal structure of the gearbox. The test system consists of a first-stage planetary gearbox and a second-stage parallel-shaft spur gearbox. To diagnose the gear faults of the parallel-shaft gearbox accurately, the planetary gearbox is fitted with intact gears and is not sampled. The experiments are performed on the large gear and the small gear of the parallel-shaft gearbox, whose shaft is also the input shaft. The driving equipment includes a servo motor, a magnetic powder brake and controller, 4 magnetic accelerometers, and displacement sensors. The sensors are placed in the axial and radial directions of the gear housing on the input and output sides to pick up the radial and axial vibration signals of the bearing housing. The large and small gears of the first input shaft are the ones replaced. The large gear used in the experiment is an S45C gear with a module of 2 and 75 teeth, the pinion has 55 teeth, and the gears are immersion lubricated. Six conditions are recorded: broken gear teeth, gear pitting, pinion wear, broken gear teeth with pinion wear, large-gear pitting with pinion wear, and the normal condition. The sampling frequency is 5120 Hz. Because damage to one gear is likely to damage other parts, compound multigear faults are included in the fault classification, and compound fault signals are collected by the system. Each dataset includes 6656 vibration signals collected by 9 sensors, 2 speeds, and 3 loads under each working condition, collected 8 times in each state interval. There are 2,160 sets of data, of which 1296 are used as training samples and the other 864 as test samples. The gear fault patterns are shown in Figure 8, and Table 1 lists the experimental sampling details.

Table 2 describes the 6 fault conditions of the large and small gears on the input shaft of the gearbox, together with the load and speed during sampling. There are 21,600 groups of data collected by 8 sensors; 60% of the data are used for CNN training and 40% for testing.

##### 5.2. Local Mean Signal Decomposition

LMD is used to decompose the vibration signals in the time domain, as shown in Figure 9. The figure shows the time-domain waveforms of the normal state, broken teeth of the large gear, pitting of the large gear, wear of the pinion, broken teeth of the large gear with pinion wear, and pitting of the large gear with pinion wear, all under a load of 0.2 A at a measured speed of 1475 rpm. A comparison of these time-domain signals shows that the vibration interval changes constantly. The waveforms of the last two compound faults are very similar: both have small amplitudes and large random vibrations, and their average amplitudes are also very close. However, the entropy of large-gear pitting with pinion wear is larger than that of large-gear broken teeth with pinion wear. It is therefore impossible to distinguish different compound faults by time-domain and frequency-domain decomposition alone.

Time-domain signal decomposition is also performed for different loads and speeds of the same fault. The four waveforms in Figure 10 are the time-domain waveforms of large-gear pitting with pinion wear under loads of 0.05 A, 0.1 A, and 0.2 A at 880 rpm. Comparing the waveforms under different loads shows that, for a compound fault, increasing the load increases both the frequency of amplitude jumps and the vibration amplitude, while the average amplitude tends to decrease. The load thus amplifies the fault amplitude of the gear, but it also creates more classes for CNN training.

##### 5.3. Comparison of LMD and EMD

LMD and EMD have many similarities as signal decomposition methods, and both suffer from end effects [34, 35]. To compare their advantages and disadvantages, the fault signal is decomposed by EMD and LMD in turn. Figures 11–13 show the waveforms and envelope diagrams generated by the decomposition.

Both methods use small frequency ratio modal aliasing for amplitude correction. The vibration signals are decomposed at 1447 rpm and the same speed without load. LMD decomposes the signals into PF0, PF1, PF2, PF3, and a residual component. High-frequency components represent the characteristics of the signal better. After decomposition, PF0 and PF1 clearly differ greatly in value, and PF2, PF3, and the residual component also show different patterns of change. EMD separates the trends of different feature scales in the signal layer by layer and generates a series of components, IMF1 to IMF6, with different feature scales. Comparing the IMFs of the two compound faults shows that the endpoint effect of EMD is very serious [36]. Because of the end effect, the decomposition of a signal can become completely useless: false end components gradually contaminate the entire signal sequence from the endpoints inward. EMD is therefore not suitable for processing fault signals, and fault diagnosis with EMD has great disadvantages.

##### 5.4. Selecting the Most Useful PFs by MESE

The original signal standard deviation is set to 0.7. The entropy template vector length is set to 5 × 5, and *m* = 2 for sample entropy extraction. Figure 14 shows results of extraction for each state.

The sample entropy curve of PF0, which should fall steadily, shows jumps and reverse growth. PF2 and PF3, by comparison, are very similar and show no obvious pulsation. Therefore, sample entropy with a standard deviation of 0.3 is also computed, together with the original sample entropy of the same data, for further comparison. Table 3 lists the SE and MESE values of PF2 and PF3 with standard deviations of 0.3 and 0.7.

LMD separates signals successively from high frequency to low frequency, so PF2 should have higher sample entropy than PF3. Here, however, the entropy of PF3 is higher than that of PF2, but it is lower than that of a second-order sample. This is caused by the endpoint effect of LMD in the vicinity of PF2, so the entire PF3 data may be false. The analysis indicates that, for this set of data, PF2 is more suitable for CNN training than PF3. The comparison also shows that the PF values obtained by MESE differ more from one another than those obtained by the original SE; in other words, MESE separates the entropy values better than the original SE. The components with larger characteristic entropy are selected and used in CNN training, and the CNN trained on PFs selected by MESE has higher accuracy. Formula (26) ensures the MESE of a PF component should be monotonically decreasing. Formula (27) ensures the MESE of high-order PF component should be higher than the low-order:

##### 5.5. Construction of One-Dimensional CNN

The weight-optimized SoftMax is used as the activation function of the final classification layer, and the one-dimensional CNN is trained under different working conditions to classify complex fault types. During training, the Adam algorithm is used to optimize the model, and the data are read through a generator, which greatly reduces data loading time.
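The idea of reading data through a generator can be sketched as below. The function `batch_generator` is a hypothetical illustration, not the authors' data pipeline: it serves mini-batches from arrays already in memory, whereas a real pipeline would typically load each batch from disk inside the loop. The endless `while True` loop follows the common convention for generators consumed by a training loop that fixes the number of steps per epoch.

```python
import numpy as np


def batch_generator(signals, labels, batch_size, shuffle=True, seed=0):
    """Yield (signals, labels) mini-batches indefinitely."""
    rng = np.random.RandomState(seed)
    n = len(signals)
    while True:
        # Reshuffle once per pass so that batches differ between epochs.
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield signals[idx], labels[idx]
```

Because only one batch is materialized at a time, memory use stays bounded by the batch size rather than by the size of the dataset.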

The model is built with Keras in a Python environment. The first layer defines filters with a height of 10. Because a single filter would let the CNN learn only a single feature, 100 filters are defined in the first layer. The output of the first layer is a 71 × 100 matrix, and each column of the output matrix contains the weights of one filter.

A max pooling layer of size 4 is used, which reduces the pooled matrix to 1/4 of the original size [37] and thereby reduces the complexity of the next layer and the probability of overfitting. Repeated convolution and pooling bring out as many of the state and fault characteristics of the multisensor vibration data as possible.

After the repeated convolution and pooling, an average pooling layer is added to further reduce the probability of overfitting during training. The average pooling layer takes the average value of the two weights, and its output size is 512 × 1, so each feature detector is left with only one weight after this layer.
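The effect of the two pooling steps on the feature-map shape can be checked with a small NumPy sketch. The function names are hypothetical, and treating the average pooling layer as a global average over the time axis is an assumption made here for illustration.

```python
import numpy as np


def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis.

    x: array of shape (steps, filters); trailing steps that do not fill
    a complete window are dropped.
    """
    steps, filters = x.shape
    n_out = steps // size
    trimmed = x[: n_out * size]
    return trimmed.reshape(n_out, size, filters).max(axis=1)


def global_average_pool1d(x):
    """Average over the time axis, leaving one value per filter."""
    return x.mean(axis=0)
```

Applied to a 71 × 100 feature map, pooling with size 4 leaves 17 rows, roughly a quarter of the original, and the global average then reduces the map to a single value per feature detector.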

Because of the diversity of sensor sampling and the changes of load, the ReLU function is used for activation in the first three layers, with a loss rate ranging from 0.1 to 0.6 during training, and the last activation function is changed to sigmoid, which improves the generalization ability of the network. During training, dropout can be used to reduce the loss rate. The dropout layer randomly sets neurons to 0. Because there are many convolutions and multiple sensors sampled in different ways, the dropout rate is set to 0.8, which means that 80% of the neurons are given zero weight, so the network does not respond sensitively to small changes in the data. This layer does not change the size of the output matrix. A value of 1.0 was assigned to the of improved SoftMax function and it forms a classification layer. It is determined that = 2.6 is the most appropriate after a series of training. To control the number of training epochs and improve accuracy, early stopping is adopted, so that training stops in time according to the validation error. Table 4 shows the details of CNN training.

5 kinds of data are input into the one-dimensional convolutional neural network for training and testing. The parameters are adjusted repeatedly, with dropout finally set to 1.0 and the learning rate to 0.05, and SoftMax is selected as the activation function. The arithmetic mean of each per-category index is used as the evaluation standard. The equation is expressed as

where is the precision corresponding to the *i*-th category and is the recall corresponding to the *i*-th category.
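Averaging a per-category index over all categories is commonly called macro averaging. The sketch below computes macro-averaged precision and recall from a confusion matrix; the function name and the convention that rows are true classes and columns are predicted classes are assumptions, and the code presumes every class appears at least once among both the true and the predicted labels.

```python
import numpy as np


def macro_precision_recall(confusion):
    """Macro-averaged precision and recall.

    confusion[i, j] counts samples of true class i predicted as class j.
    """
    c = np.asarray(confusion, dtype=float)
    true_positive = np.diag(c)
    precision = true_positive / c.sum(axis=0)  # per predicted class
    recall = true_positive / c.sum(axis=1)     # per true class
    return precision.mean(), recall.mean()
```

Because every category contributes equally to the mean, a class with few samples influences the score as much as a class with many, which suits a dataset in which all fault states matter equally.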

The flow chart of the final program is shown in Figure 15.

##### 5.6. Comparison of Fault Recognition with LMD-MESE-CNN and Other Methods

When the raw data are input directly into the CNN for fault diagnosis, the result is as shown in Figure 16(a): the accuracy is only 0.71. A preliminary conclusion can be drawn: a CNN trained directly on the raw data has a low recognition rate and cannot be used for fault diagnosis. When all PF components are input directly into the CNN, the result is as shown in Figure 16(b). After LMD decomposition, the fault information is distributed across the PF components, and if all PF components are used for training, the accuracy of the CNN drops to only 10%. Therefore, neither input method is suitable for fault diagnosis.

**Figure 16:** CNN fault diagnosis results with (a) raw data as input and (b) all PF components as input.

The training curves are shown on the left of Figures 17–19, and the corresponding confusion matrices [38, 39], whose schematic is given in Figure 20, are shown on the right. The accuracy of the EMD-CNN model keeps increasing during training, but with some random drops, and it does not reach a satisfactory level. When the selected PF signals are input into the CNN for training, the network obtains high accuracy and a low loss value. However, the accuracy curve does not rise steadily as the amount of training data increases. It can also be seen from the confusion matrix that most conditions can be separated, but a small number of them are still confused. Because some false data are generated by the LMD endpoint effect, errors are introduced into the trained model. The location of these errors is difficult to estimate, so it is essential to find a way to reduce the loss caused by the endpoint effect [40]. The last figure shows the result when the PFs selected by MESE are input into the CNN with weighted SoftMax: the jumps in the curve disappear. The results show that the one-dimensional CNN with selected PFs can effectively reduce the impact of the endpoint effect and can accurately identify combined faults. Table 5 lists the accuracy and loss of all these methods.

**Figures 17–19:** (a) training curves and (b) confusion matrices of the compared models.
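For readers unfamiliar with confusion matrices, the following sketch shows how one is built from true and predicted labels and how the confused conditions can be read from its off-diagonal entries. The function names and the labels are hypothetical and do not reproduce the experimental results.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true condition, columns the predicted condition."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def misclassified_pairs(matrix):
    """List (true, predicted, count) for every nonzero off-diagonal entry."""
    return [
        (t, p, count)
        for t, row in enumerate(matrix)
        for p, count in enumerate(row)
        if t != p and count > 0
    ]


# Hypothetical labels for three gearbox conditions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)

print(cm)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(misclassified_pairs(cm))  # [(0, 1, 1), (2, 0, 1)]
```

A perfectly separated set of conditions gives a purely diagonal matrix, so the off-diagonal entries show directly which conditions are still confused.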

##### 5.7. Comparing the Proposed Method with Other Methods

Compared with the traditional 1-D CNN, the method based on LMD and MESE proposed in this paper not only reduces the number of network parameters and speeds up training but also obtains good results in the diagnosis of mixed faults. The proposed network structure is compared with other published methods in terms of accuracy and running time. As shown in Table 6, the test accuracy of the DCNN method based on LRP is 99.90%, which is very close to that of the proposed method. Table 6 also shows that deep learning methods achieve high accuracy for compound fault diagnosis of the gearbox. The proposed method is slightly better than the other deep learning methods on the testing set.

The number of training parameters and the training time of the four best methods are shown in Table 6. It can be seen from Figure 21 that the proposed 1-D CNN fault diagnosis method based on LMD and MESE has about 10% fewer training parameters than the traditional 1-D CNN, and far fewer than the LRP-DCNN and GRU-CNN methods. Although LRP-DCNN has high accuracy, its number of training parameters and its training time are much higher than those of the proposed method. The training time of the traditional CNN method is shorter, but its accuracy is much lower than that of the proposed method. The proposed method can therefore reduce both the number of training parameters and the training time while maintaining excellent diagnostic performance.
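To make the notion of training parameters concrete, the sketch below counts the trainable parameters of a stack of 1-D convolution layers using the standard formula (kernel size × input channels + 1 bias) × output channels. The layer sizes are hypothetical and are not the architecture evaluated in this paper.

```python
def conv1d_param_count(kernel_size, in_channels, out_channels):
    """Trainable parameters of one 1-D convolution layer with bias."""
    return (kernel_size * in_channels + 1) * out_channels


# Hypothetical layers: (kernel_size, in_channels, out_channels).
layers = [(16, 1, 32), (8, 32, 64)]
per_layer = [conv1d_param_count(*layer) for layer in layers]

print(per_layer)       # [544, 16448]
print(sum(per_layer))  # 16992
```

Pooling layers add no trainable parameters, so the count is driven by kernel sizes and channel widths; feeding fewer, pre-selected components into the network narrows the first layer and lowers the total.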

In future work, transfer learning will be applied to reduce the time spent on data tabulation and classification, in order to achieve better universality and higher efficiency.

#### 6. Conclusions

Because vibration signals extracted from a working gearbox are noisy, nonlinear, and nonstationary, a one-dimensional convolutional neural network (1-D CNN) with a weighted activation function is designed, exploiting the ability of deep CNNs to extract signal features. An improved matrix sample entropy based on Euclidean distance (MESE) is then proposed to improve the accuracy of component selection, speed up classification, and avoid identification errors caused by excessive differences between PF components. The LMD method is used to decompose the fault signals; its end effect is resolved, and the appropriate PF components are obtained. Gearbox fault data with 24 different loads were used to verify the effectiveness of the proposed method. The experimental results show that MESE identifies PF components several times faster than SE and yields PF components with greater differences. For the end effect of LMD, the improved 1-D CNN applies weight compensation, which eliminates the effect from the starting point of the signal and yields the correct PF components. For fault signals under different loads, four rounds of convolution and pooling, together with tuning of the network parameters, effectively extract the fault features. The use of a 1-D CNN greatly improves the speed and accuracy of convolution and pooling, avoids overfitting and an excessive number of parameters, and reduces the running time of the algorithm. Therefore, this method can resolve the end effect of LMD, accelerate the screening and classification of component differences, and quickly diagnose complex faults and faults under different loads. This study provides a good option for gearbox fault diagnosis under nonideal conditions. Fault diagnosis based on vibration signals is applicable not only to single faults but also to more complex and uncertain gearbox faults.

Because the CNN has more layers, more data are needed to recognize multiple fault types, so the method takes more time to train new models for new faults. An important goal for future work is to give the trained one-dimensional CNN model better versatility and transferability.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request. Data were collected from the parallel-shaft gearbox in the Fault Diagnosis Laboratory of Inner Mongolia University of Science and Technology. Because the data are valuable, the authors' supervisor prefers that the data and code not be uploaded to a public database. If necessary, readers can contact the corresponding author via email: [email protected].

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.