Abstract

As the most basic component of rotating machinery, the rolling bearing frequently works in harsh environments and under complex working conditions, and its health status seriously affects working efficiency. Monitoring the health status of rolling bearings can not only reduce equipment maintenance costs but also help prevent major accidents. Based on this, an adaptive diagnosis method that combines a deep gated recurrent unit (DGRU) with wavelet packet decomposition (WPD) and an extreme learning machine (ELM) is proposed for rolling bearings. Firstly, WPD is utilized to eliminate noise from the data. Secondly, DGRU is designed to extract representative features from the denoised data. Finally, ELM is utilized to output the diagnosis results. Extensive experimental results show that the proposed approach is more accurate and more robust than existing popular methods. Additionally, the proposed method exhibits strong antinoise ability.

1. Introduction

The health state of rotating parts directly affects the operational reliability of the whole mechanical system [1–3]. Once rotating parts fail, serious accidents can follow. Machinery and equipment are widely used in various industrial scenarios and electrified transmission systems, and this equipment sometimes runs under unfavorable conditions, such as high temperature, high humidity, and high load, which eventually lead to equipment failure and cause high maintenance costs, serious property loss, and safety hazards. The faults of mechanical equipment can usually be attributed to different types, including drive inverter faults, stator faults, rotor faults, and bearing faults. According to statistics, bearing faults are the most common type, with an incidence of 30% to 40% [4–6]. Since bearings are the most vulnerable parts of mechanical equipment, the accurate diagnosis of bearing faults has been studied by engineers and scientists over the past few decades. Therefore, an effective condition monitoring and fault identification system for rotating machinery is needed to ensure the safe operation of equipment and the safety of personnel. As the most basic component, bearings frequently work in harsh environments and under complex working conditions, and their health status seriously affects working efficiency [7–10]. Monitoring the health status of rolling bearings can not only reduce equipment maintenance costs but also help prevent major accidents [11, 12].

Fault diagnosis methods based on deep learning are booming. These methods are data-driven and integrate feature learning with intelligent recognition. Compared with traditional methods, they reduce the dependence on signal preprocessing and expert knowledge, especially when analyzing massive monitoring data. Bearing fault diagnosis has long been a hot research topic [13–15]. Deep learning methods have achieved a great deal by avoiding manual feature extraction [16–19]. However, most of these models perform well only under the premise that the training and test data follow the same distribution. Unfortunately, this premise is difficult or even unrealistic to meet given complicated operating conditions and the degradation of equipment performance [20–23]. The diagnosis performance of most deep learning models drops sharply when the premise does not hold. Some researchers use fine-tuning or retraining strategies to tackle this issue, but a few labeled target data still need to be provided. Collecting labeled data is expensive or even impossible in actual scenarios. Hence, it is necessary to explore promising methods that can apply knowledge from related domains to solve the problem. The generative adversarial network (GAN), innovatively designed by Goodfellow et al., uses the adversarial game between a generator and a discriminator to generate data with the same distribution as the raw data. However, the adversarial mechanism makes it difficult for the model to reach equilibrium. Hence, many scholars have proposed improvements to the GAN model. Radford et al. [24] proposed deep convolutional generative adversarial networks (DCGANs), which fuse CNN with GAN and help keep the learned data distribution from collapsing to a single mode.

Unlike DAE, DBN, CNN, and GAN, RNN is still in its infancy in the diagnosis field. The main reason is that the conventional RNN suffers from a non-negligible problem: gradient vanishing [25]. The gated recurrent unit (GRU) can solve this problem [26]. GRU, a recent variant of RNN, has achieved great success in fault diagnosis [27, 28]. Thus, in this paper, a GRU-based network is developed. However, vibration signals are always contaminated by noise, which heavily influences the diagnosis performance of the network [29]. Thus, wavelet packet decomposition (WPD), which has been recognized as an effective denoising method for vibration signals, is used to eliminate the noise [30–34].

An adaptive diagnosis method that combines a deep gated recurrent unit (DGRU) with wavelet packet decomposition (WPD) and an extreme learning machine (ELM) is proposed for rolling bearings. Firstly, WPD is utilized to eliminate noise from the data. Secondly, DGRU is designed to extract representative features from the denoised data. Finally, ELM is utilized to output the diagnosis results. Extensive experimental results show that the proposed method is more accurate and more robust than existing popular methods. Additionally, the proposed method exhibits strong antinoise ability.

The rest of this paper is organized as follows: Section 2 describes the basic theory. Section 3 introduces the proposed method in detail. Section 4 analyzes its effectiveness. Conclusions are summarized in Section 5.

2. The Brief Theory of Gated Recurrent Unit

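Similar to the long short-term memory (LSTM) neural network, the gated recurrent unit (GRU) was also proposed to solve the gradient-vanishing problem, but it is simpler than LSTM [25, 26]. GRU uses an update gate and a reset gate. These two gates together determine the output of GRU [35]. The specific structure is shown in Figure 1. In its standard form, the GRU is computed as

$$z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1}),$$

$$\tilde{h}_t = \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1})\big), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,$$

where $z_t$ and $r_t$ are the update gate and the reset gate, $\sigma$ and $\tanh$ denote the sigmoid and hyperbolic tangent activation functions, $W_z$, $U_z$, $W_r$, $U_r$, $W_h$, and $U_h$ are the weight matrices, and $\odot$ is element-wise multiplication. $h_t$ is the activation at time t, and $\tilde{h}_t$ is the candidate activation.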
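The gate computations described above can be illustrated with a minimal NumPy sketch of one GRU step in its standard form. The function names, the symbol names (W_z, U_z, W_r, U_r, W_h, U_h), the layer sizes, and the random inputs are assumptions made for illustration and are not taken from the paper; biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One standard GRU time step (biases omitted for brevity)."""
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)              # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)              # reset gate
    h_cand = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev))   # candidate activation
    return (1.0 - z_t) * h_prev + z_t * h_cand           # activation at time t

def init_params(input_dim, hidden_dim, rng):
    """Small random weights: one input matrix and one recurrent matrix per gate."""
    shapes = [(hidden_dim, input_dim), (hidden_dim, hidden_dim)] * 3
    return [rng.normal(0.0, 0.1, size=s) for s in shapes]

def gru_forward(sequence, params, hidden_dim):
    """Run the cell over a sequence and return the last hidden state."""
    h = np.zeros(hidden_dim)
    for x_t in sequence:
        h = gru_step(x_t, h, params)
    return h

# Hypothetical sizes and random inputs, used only to exercise the cell.
rng = np.random.default_rng(0)
params = init_params(input_dim=4, hidden_dim=8, rng=rng)
sequence = rng.normal(size=(16, 4))
h_last = gru_forward(sequence, params, hidden_dim=8)
```

Because each new activation is a convex combination of the previous activation and a tanh output, every entry of the hidden state stays inside (-1, 1) when the initial state is zero.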

3. The Proposed Method

Rotating machinery is applied in many fields. The rolling bearing is a necessary component for ensuring the normal operation of rotating machinery, and it has a direct impact on the accuracy and reliability of the equipment. Consequently, rolling bearing faults are among the most common causes of rotating machinery failures. Because rotating machinery operates for long periods under harsh and complex conditions, faults inevitably occur. Therefore, the state of the machinery must be monitored in time so that faults are diagnosed as early as possible. One of the key tasks is to determine whether the machine is rotating abnormally and to assess the severity of the abnormality. Owing to higher requirements for performance, safety, and reliability, fault diagnosis of rotating machinery has become both more important and more difficult. In recent decades, it has therefore received increasing attention and undergone considerable development. This paper develops a new rotating machinery fault diagnosis method that combines a deep gated recurrent unit (DGRU) with wavelet packet decomposition (WPD) and an extreme learning machine (ELM) to identify locomotive bearing fault conditions.

3.1. Wavelet Packet Decomposition Denoising

WPD is generally used to process nonstationary signals. It analyzes signals in both the time domain and the frequency domain and can characterize them locally. The ordinary wavelet transform mainly refines the low-frequency band and does not further decompose the high-frequency band, which carries a large amount of detailed information in signals such as rolling bearing vibration signals, remote sensing images, seismic signals, and biomedical signals. WPD is based on the idea of multiresolution analysis, that is, the signal can be decomposed and reconstructed in different frequency bands under a wavelet basis, which makes it suitable for discontinuous and nonstationary signals. WPD thus makes up for the shortcomings of the wavelet transform. It can compute the signal energy at different decomposition scales, and its multilevel division of the frequency band decomposes not only the low-frequency part but also the high-frequency part, making the division of the signal more precise. The decomposition process reflects the relationship between the wide-band signal and its finer sub-band signals. Through WPD, the nonstationary vibration signal can be resolved near the fault characteristic frequency of the system, yielding sub-band signals that contain stationary components. The principle of the decomposition algorithm is to compute the average of a pair of samples and the difference between the first sample and that average; the system fault can then be detected by analyzing the energy distribution across frequency bands. WPD has neither redundancy nor omission.

Vibration signals often contain noise that greatly influences diagnosis accuracy, so it is essential to eliminate the noise first. WPD is viewed as an effective method for vibration signal denoising [31]. At every decomposition level, WPD splits each node into two branches, a low-frequency one and a high-frequency one [36]. The three-layer binary tree of WPD is shown in Figure 2. The steps of signal denoising using WPD are listed in Table 1.
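As an illustration of the three-layer binary tree and of threshold-based denoising, the following is a minimal NumPy sketch. It is not the procedure of Table 1: it assumes the Haar wavelet (the average-and-difference scheme in orthonormal form), soft thresholding with the universal threshold, and a synthetic sine wave in place of real vibration data. The signal length must be divisible by 2 to the power of the decomposition level.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_split(x):
    """One Haar step: scaled pairwise averages (low band) and differences (high band)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2

def haar_merge(low, high):
    """Exact inverse of haar_split."""
    x = np.empty(2 * low.size)
    x[0::2] = (low + high) / SQRT2
    x[1::2] = (low - high) / SQRT2
    return x

def wpd(x, level):
    """Split every node at every level, giving 2**level leaf bands."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nodes = [band for node in nodes for band in haar_split(node)]
    return nodes

def iwpd(nodes):
    """Rebuild the signal by merging sibling bands level by level."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def wpd_denoise(x, level=3):
    """Soft-threshold every leaf except the lowest band, then reconstruct."""
    leaves = wpd(x, level)
    _, detail = haar_split(np.asarray(x, dtype=float))
    sigma = np.median(np.abs(detail)) / 0.6745       # robust noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))      # universal threshold
    kept = [leaves[0]]
    kept += [np.sign(c) * np.maximum(np.abs(c) - thr, 0.0) for c in leaves[1:]]
    return iwpd(kept)

# Synthetic placeholder signal, not bearing data.
rng = np.random.default_rng(0)
t = np.arange(1024)
clean = np.sin(2 * np.pi * t / 128.0)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = wpd_denoise(noisy, level=3)
```

Because both the low and the high branch are split at every level, a three-level tree yields eight leaf bands, and merging the untouched leaves returns the input exactly, which matches the statement that WPD has neither redundancy nor omission.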

3.2. Deep Gated Recurrent Unit Construction

The operating conditions of mechanical equipment keep changing, and the label information of the training data is unknown under most working conditions, which makes it difficult to train intelligent identification models effectively. The training process of the above methods still relies on a small amount of labeled data, so they cannot be used for health status recognition when the label information is unknown. Rotating machinery also plays a vital role in the coal industry.

The health state of rotating parts such as bearings and gears directly affects the operational reliability of the whole mechanical system. Once rotating parts fail, serious safety accidents and huge economic losses can result. Therefore, establishing an effective condition monitoring and fault identification system for rotating machinery is of great significance for ensuring the safe operation of equipment and the safety of personnel. Signal processing technology is an important part of rotating machinery fault diagnosis and has been widely used in various industrial fields. In addition, artificial intelligence technology, which is attracting more and more attention, has also been applied to rotating machinery fault diagnosis. Based on this, the DGRU is illustrated in Figure 3. X(s) denotes the denoised data processed by WPD. Y(s) and Z(s) represent the extracted first-layer and second-layer features, respectively.

The loss function is the cross-entropy loss, which measures the difference between the predicted label and the actual label:

$$L = -\sum_{i} y_i \log \hat{y}_i,$$

where $y_i$ denotes the actual label and $\hat{y}_i$ is the predicted label for class $i$.
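A small numerical sketch of the cross-entropy loss is given below. It assumes one-hot actual labels and softmax outputs as predicted labels; the function names, the averaging over samples, and the example logits are hypothetical and serve only to show how the loss behaves.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean over samples of -sum_i y_i * log(y_hat_i); eps guards against log(0)."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

# Hypothetical logits for two samples and three classes.
logits = np.array([[4.0, 0.5, 0.1],
                   [0.2, 0.3, 3.0]])
labels = np.eye(3)[[0, 2]]                            # one-hot actual labels

loss_good = cross_entropy(labels, softmax(logits))    # predictions agree with labels
loss_bad = cross_entropy(labels, softmax(-logits))    # predictions contradict labels
```

The loss is small when the predicted probabilities concentrate on the actual class and grows as they move away from it; a uniform prediction over three classes gives a loss of log 3.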

3.3. Extreme Learning Machine Classification

ELM is an improved training algorithm for single-hidden-layer feedforward neural networks [37]. The structure is shown in Figure 4. x and t are the input vectors and output labels, respectively; (W, b) are the weights and biases between the input layer and the hidden layer; and β is the weight matrix between the hidden layer and the output layer. Unlike a BP neural network trained by gradient descent, the extreme learning machine generates the input weights and biases randomly, and they need no adjustment after generation. The specific calculation formulas are shown in equations (4), (5), and (6), where h_i (i = 1, 2, …, n) is the hidden layer vector and n is the number of input samples. T is the matrix composed of the sample label vectors, and H+ is the generalized inverse matrix of H. Because the output weights are obtained by the least squares method, the whole process requires no iterative feedback adjustment.

As known from Reference [38], β can be represented by

$$\beta = H^{+} T,$$

where $H^{+}$ is the Moore–Penrose generalized inverse of $H$.
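The closed-form training of ELM can be sketched in a few lines of NumPy. The sketch assumes a sigmoid hidden layer, standard-normal random input weights, one-hot label vectors, and a synthetic three-class toy problem; none of these choices, nor the function names, come from the paper.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer matrix H for a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_fit(X, T, n_hidden, rng):
    """Draw (W, b) at random, then solve for beta in a single least-squares step."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T          # Moore-Penrose pseudo-inverse of H times T
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Pick the class with the largest output."""
    return np.argmax(hidden_output(X, W, b) @ beta, axis=1)

def make_blobs(n_per_class, rng):
    """Synthetic, well-separated 2-D clusters standing in for extracted features."""
    centers = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_blobs(60, rng)
X_test, y_test = make_blobs(20, rng)
W, b, beta = elm_fit(X_train, np.eye(3)[y_train], n_hidden=30, rng=rng)
accuracy = float(np.mean(elm_predict(X_test, W, b, beta) == y_test))
```

Only the output weights are fitted, and they are obtained in one step without any gradient iteration, which is the property that distinguishes ELM from a BP network.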

3.4. General Steps

An adaptive diagnosis method that combines DGRU with WPD and ELM is proposed. The overall procedure of our method is described in Figure 5 and consists of the following six steps:
(i) Step 1: measure data from rotating machinery
(ii) Step 2: eliminate the noise of the vibration signals by using WPD
(iii) Step 3: divide the denoised signals into training and testing samples
(iv) Step 4: construct the DGRU with ELM to diagnose railway locomotive bearing faults
(v) Step 5: use the constructed model to learn the features of the training samples and verify it on the features of the testing samples
(vi) Step 6: output the diagnosis result
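Step 3 can be pictured with a short NumPy sketch that cuts each signal into fixed-length samples and splits them at random into training and testing sets. The window length of 512, the 75% training ratio, the function names, and the random placeholder signals are all assumptions; the paper does not state how the samples are formed.

```python
import numpy as np

def segment(signal, window):
    """Cut a 1-D signal into non-overlapping windows; the ragged tail is dropped."""
    n = signal.size // window
    return signal[: n * window].reshape(n, window)

def split_train_test(samples, labels, train_ratio, rng):
    """Shuffle once, then split into training and testing subsets."""
    order = rng.permutation(len(samples))
    cut = int(train_ratio * len(samples))
    train, test = order[:cut], order[cut:]
    return samples[train], labels[train], samples[test], labels[test]

rng = np.random.default_rng(0)
# Placeholder data: two synthetic conditions standing in for denoised signals.
signals = [rng.standard_normal(8192) for _ in range(2)]
samples = np.vstack([segment(s, 512) for s in signals])
labels = np.repeat(np.arange(2), 8192 // 512)
x_train, y_train, x_test, y_test = split_train_test(samples, labels, 0.75, rng)
```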

4. Experimental Verification

4.1. Experimental Data Description

In this section, a representative simulated high-speed bearing dataset is selected to validate the feasibility of the proposed method. Various indicators are adopted to demonstrate the effectiveness of data augmentation. Simulated fault diagnosis experiments are also conducted with other approaches to corroborate the superiority of the algorithm in this paper. The simulated high-speed bearing dataset is a laboratory dataset collected from railway locomotive bearings. According to the different health conditions and three damage levels, 9 health categories sampled at 12.8 kHz are obtained to form the dataset. Figures 6 and 7 present the experimental platform and the specific faults of the rolling bearings.

4.2. Comparison with Traditional Methods

To evaluate the effectiveness of our approach on noisy signals, white Gaussian noise (WGN) is added to the collected data in this part, as shown in Figure 8. The noisy signals are described as

$$y(t) = x(t) + k\,n(t),$$

where $y(t)$ represents the noisy signals, $x(t)$ is the collected vibration signals, $n(t)$ is the WGN, and k is the coefficient. A larger k means heavier noise.
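The effect of the coefficient k can be checked with a brief NumPy sketch that forms the noisy signal as the clean signal plus k times the noise. It assumes unit-variance WGN and uses a 100 Hz sine tone as a stand-in for a measured vibration signal; the function names and the signal-to-noise ratio helper are illustrative additions.

```python
import numpy as np

def add_wgn(x, k, rng):
    """Return x + k * n, with n drawn from a standard normal distribution."""
    return x + k * rng.standard_normal(x.shape)

def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels, computed from the known clean signal."""
    noise = noisy - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
t = np.arange(8192) / 12800.0            # 8,192 points at a 12.8 kHz sampling rate
clean = np.sin(2 * np.pi * 100.0 * t)    # placeholder tone, not bearing data
light = add_wgn(clean, 0.1, rng)
heavy = add_wgn(clean, 0.4, rng)
```

With unit-variance noise, k equals the standard deviation of the added component, so a larger k lowers the signal-to-noise ratio.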

In this part, k is set to 0.4 to obtain the noisy signals. The noisy signals are one input, and the denoised signals processed by WPD are another input. Figure 9 shows the noisy data and the denoised data of each condition; each condition contains 8,192 data points. Two points should be noted: (1) For the proposed method and DGRU with the SVM classifier, the only input is the noisy signals. (2) SVM and ANN each have two inputs, the noisy signals and the denoised signals.

To demonstrate the superiority of our method, six methods, including the proposed one, are compared. More details about these methods are provided as follows. The optimization algorithm is Adam, and the learning rate is 0.0002 in all experiments. The other parameters of these methods are determined from the relevant literature and from experiments so that each method achieves its best recognition performance on the different diagnosis tasks. The results are depicted in Figure 10. The confusion matrix is illustrated in Figure 11. Table 2 shows the results of each method in all tasks.

Table 3 shows that the average accuracy of the proposed method is 94.98%, clearly higher than that of the other five methods, which are 78.64%, 55.47%, 73.95%, 44.61%, and 58.85%, respectively. Its standard deviation is only 1.10, clearly lower than those of the other five methods, which are 2.35, 3.28, 2.25, 3.96, and 3.22, respectively. The results show the following: (1) Comparing all the methods, DGRU, SVM, and ANN are all sensitive to noise. (2) Comparing Method 1 with Method 2, the denoised signals lead to much better diagnosis accuracy, which also demonstrates the necessity and effectiveness of processing the noisy signals with WPD. (3) Comparing Method 1, Method 4, and Method 6, the proposed method is much more accurate and robust than SVM and ANN. The main reason is that the deep architecture has a more powerful ability to learn functions. Therefore, it can automatically learn more appropriate internal fault characteristics from the inputs and provide more reliable conclusions.

4.3. The Antinoise Ability of the Proposed Method

This part investigates the influence of different noise levels and the antinoise ability of the proposed method. To reduce the influence of chance, each condition is run 5 times. The description of each condition and the average accuracy are shown in Table 4. The noisy signals are the signals that contain noise; the denoised signals are the noisy signals processed by WPD. The detailed diagnosis result of each trial is shown in Figure 12.

The average accuracy and standard deviation of each condition are shown in Table 4. For the DGRU with ELM classifier, the denoised signals lead to much higher accuracy and more robust performance than the noisy signals. As the noise in the vibration signals increases, the diagnosis accuracy becomes lower and increasingly unstable. No matter how powerful the denoising method is, the denoised signals cannot be better than the original vibration signals. It can also be found that, as the noise increases, the result of WPD on the noisy signals also gets worse. Nevertheless, the diagnosis accuracy remains higher than 90%. Based on the above analysis and the diagnosis results, it can be confirmed that the proposed method has a powerful antinoise ability.

Classifiers with identical parameters are employed for the same purpose. Ablation experiments are required for the CNN classifier to ensure that it is resistant to engineering noise interference. The settings and results of the ablation experiments are listed in Table 5. CNNs with different frameworks and parameters are applied to fault diagnosis on the dataset to select the best CNN diagnostic model. From Table 5, it is clear that framework A is more robust and gives better fault diagnosis than the other structures. Accordingly, a further ablation experiment is conducted on the key parameters of the CNN with framework A as the basis.

5. Conclusion

In this paper, an adaptive diagnosis method that combines a deep gated recurrent unit (DGRU) with wavelet packet decomposition (WPD) and an extreme learning machine (ELM) is proposed for rolling bearings. Firstly, WPD is utilized to eliminate noise from the data. Secondly, DGRU is designed to extract representative features from the denoised data. Finally, ELM is utilized to output the diagnosis results. Extensive experimental results show that the proposed method is more accurate and more robust than existing popular methods. Additionally, the proposed method exhibits strong antinoise ability.

In future research, we will further improve our model to address the challenge of transferring knowledge learned from experimental data to the fault diagnosis of practical engineering equipment [39].

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

All the authors contributed equally to the manuscript in “conceptualization, validation, formal analysis, investigation, and resources.” All authors have read and agreed to the published version of the manuscript.