Abstract

In real operating environments, mechanical systems exhibit complicated fault types, random fault locations, and weak signals from minor faults, all of which make accurate fault diagnosis difficult. This paper proposes an adaptive multisensor fault diagnosis method for bearing-gear systems based on Gramian angular fields and Markov transition fields (GAF/MTF) and the deep residual network (ResNet). First, we establish a multisensor signal acquisition system to monitor the running signals of a bearing-gearbox composite test bench in real time. The faulty parts cover composite faults of different sizes, different fault types, and different transmission stages. Second, using GAF/MTF, the multichannel time series collected by the acquisition system are converted into multichannel pictures, which are then fused and compressed into three-channel pictures. Finally, we input these pictures into ResNet for fault diagnosis. The experimental results show that the GAF/MTF-ResNet model achieves a recognition accuracy of 72.14% on a test set with 520 classification labels covering different motor speeds, different sampling times, and different types of faults. The accuracy for the motor speed and sampling time labels is close to 100%, and the accuracy for gearbox faults and bearing faults is 75.25% and 88.97%, respectively. The method thus provides a new approach to the composite fault diagnosis of mechanical systems under different working conditions and fault types and has theoretical guiding significance.

1. Introduction

With the progress of science and technology and the rapid development of the economy, mechanical equipment plays an increasingly irreplaceable role in daily life, which places more stringent requirements on the safety and stability of the equipment. Prognostics and health management (PHM), as a critical part of equipment operation and maintenance, has attracted widespread attention [1, 2].

Bearings and gearboxes are basic components of mechanical equipment. They are used in large numbers, must meet high precision requirements, and work in complex operating environments, so their failure directly affects the overall operating status of the equipment. They have therefore become a research hotspot in the field of PHM [3, 4]. However, research on early failures, composite failures, and multiple types of failures of these components is still insufficient [5, 6].

For the diagnosis of a single component failure, Sun and Jia proposed a new method based on experimental data-driven random fuzzy evidence acquisition and intuitionistic fuzzy set fusion [7]. Meng et al. calculated the time-varying mesh stiffness of gear teeth with different crack lengths, and their results show that the impulse factor is sensitive to fault characteristics [8]. Shao et al. used transfer learning for bearing fault diagnosis and achieved good results [9]. Although these studies perform well in the fault diagnosis of a single component, they do not consider the complexity of the mechanical system or the compound faults caused by the coupling of various components in actual operation.

Gearboxes have become the main platform for compound failure experiments because of their wide range of applications and simple structure [10]. Compared with the signals generated by the failure of a single component, the vibration, sound, current, torque, and other signals generated by a compound failure of the gearbox are more diverse and complex. These signals behave differently and require different sensors to collect, and each sensor is more sensitive to specific fault types. Therefore, the fusion of signals from multiple sensors is a key step in gearbox fault diagnosis [11]. Feature extraction and feature selection for the different types of signals have become two major difficulties in multisensor information fusion [12, 13].

The neural network has the advantages of self-adaptation and high precision and has been widely used in various fields [14, 15]. As a typical neural network structure, the convolutional neural network (CNN) [16] has been used in image recognition [17], pose estimation [18], and other fields. Deep convolutional neural networks (DCNNs) [19], obtained by increasing the number of layers of the CNN structure, were developed to meet higher precision requirements. Studies have found that DCNNs achieve better recognition accuracy than CNNs [20–22]. The deep residual network (ResNet), a kind of DCNN, solves the degradation problem of deep networks [23]. It can therefore be used to build very deep networks, for example, with 101 layers, to solve complex problems [24, 25].

In the field of mechanical fault diagnosis, DCNNs have also been applied successfully. Shao et al. used the wavelet transform to generate time-frequency distribution (TFD) images from multisensor signals and then used a DCNN to learn discriminative representations from the TFD images for the fault diagnosis of asynchronous motors [26]. Jing et al. proposed an adaptive multisensor data fusion method based on a DCNN for fault diagnosis, and their results demonstrate that the method can detect the conditions of a planetary gearbox effectively [27].

Although these studies have investigated the fault diagnosis of mechanical components in depth, the following problems remain [28]:
(1) Much research exists on the fault diagnosis of a single component of the mechanical system, but research on coupling faults is not in depth. In particular, coupling failures of components in different transmission stages have not been studied [26, 27].
(2) The operating conditions are uniform, the fault types are few, and the diagnostic output is limited to a single result [29], so the complex operating states encountered in actual operation cannot be simulated well.
(3) The classical neural network model suffers from vanishing or exploding gradients when dealing with long time series data, which reduces the accuracy of fault diagnosis.

In response to the abovementioned problems, we built a multisensor acquisition system on a bearing-gearbox composite test bench. The system includes three-way accelerometers, microphones, current sensors, torque sensors, and rotary encoders. The designed faults cover different sizes and types of faults, individual faults, composite faults in the same transmission stage, and composite faults in different transmission stages. The GAF/MTF algorithm is then used to fuse the multidimensional time series into two-dimensional RGB images, and these images are used to train ResNet in order to find the optimal model. Through the GAF/MTF algorithm, the complex multidimensional time series classification task is transformed into a two-dimensional image classification task suitable for neural networks, and the depth of ResNet allows it to process the transformed images well. Finally, multitype coupling fault diagnosis of the multisensor bearing-gear system is realized with high accuracy.

2. Time Series Imaging

2.1. Gramian Angular Field and Markov Transition Field

Deep learning performs well in computer vision and pattern recognition but comparatively poorly in processing time series. In response to this problem, Wang and Oates proposed the GAF/MTF method for encoding time series as pictures [30]. This method can encode any type of time series into Gramian angular summation field (GASF), Gramian angular difference field (GADF), and Markov transition field (MTF) pictures. The loss of information after encoding is very small.

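Given a time series $X = \{x_1, x_2, \ldots, x_n\}$ of length $n$, the values are scaled to the interval $[-1, 1]$ through a min-max scaler:

$$\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)}. \tag{1}$$

The standardized time series $\tilde{X}$ is recalibrated by the polar coordinate encoding of Formula (2), where the radian $\phi_i$ is the arc cosine of $\tilde{x}_i$, its interval range is $[0, \pi]$, the radius $r_i$ is determined by the timestamp $t_i$ corresponding to $\tilde{x}_i$ and the interval phase mapping, and $N$ is a constant factor that regularizes the span of the polar coordinate system and is related to the number of time stamps included in the time series:

$$\phi_i = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1, \quad \tilde{x}_i \in \tilde{X}; \qquad r_i = \frac{t_i}{N}, \quad t_i \in \mathbb{N}. \tag{2}$$

The coding of Formula (2) has the following advantages. The mapping from the time series to polar coordinates is bijective; that is, given a time series point $\tilde{x}_i$, there is only one point $(r_i, \phi_i)$ corresponding to it in the polar coordinate system. In contrast to the Cartesian coordinate system, the radius maintains the time dependence in the polar coordinate system, which prevents the loss of the time label.

Therefore, in polar coordinates, we can identify the time correlation in different time intervals by calculating the trigonometric sum or difference between each pair of points. We define GASF and GADF as follows:

$$\mathrm{GASF} = [\cos(\phi_i + \phi_j)] = \tilde{X}^{\mathsf{T}} \tilde{X} - \sqrt{I - \tilde{X}^2}^{\,\mathsf{T}} \sqrt{I - \tilde{X}^2},$$

$$\mathrm{GADF} = [\sin(\phi_i - \phi_j)] = \sqrt{I - \tilde{X}^2}^{\,\mathsf{T}} \tilde{X} - \tilde{X}^{\mathsf{T}} \sqrt{I - \tilde{X}^2},$$

where $I$ is the unit row vector $[1, 1, \ldots, 1]$ and subscripts $i$ and $j$ of $\phi$ represent the rows and columns of the matrices GASF and GADF, respectively. After converting to the polar coordinate system, we regard the time series at each time step as a one-dimensional metric space. By defining $\langle x, y \rangle = x y - \sqrt{1 - x^2}\sqrt{1 - y^2}$ and $\langle x, y \rangle = \sqrt{1 - x^2}\, y - x \sqrt{1 - y^2}$, the GAFs of the two inner products are actually quasi-Gramian matrices $[\langle \tilde{x}_i, \tilde{x}_j \rangle]$.

The main diagonals of GASF and GADF are

$$\mathrm{GASF}_{i,i} = \cos(2\phi_i) = 2\tilde{x}_i^2 - 1, \qquad \mathrm{GADF}_{i,i} = \sin(0) = 0.$$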

GAFs have several advantages. First, they provide a way to maintain time dependence, because time increases as the matrix position moves from the upper left corner to the lower right corner. Second, they include time correlation, because $G_{(i,j \mid |i-j| = k)}$ represents the relative correlation by superposition or difference of directions with respect to the time interval $k$. The main diagonal $G_{i,i}$ is the special case when $k = 0$ and contains the original value or angle information. Finally, the time series can be reconstructed from the main diagonal. However, when the length of the original time series is $n$, the size of the Gramian matrix is $n \times n$, which leads to large GAFs.
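As a rough illustration of the encoding described above, the following Python sketch computes the GASF and GADF of a single series with NumPy. It assumes min-max rescaling to $[-1, 1]$ before the arc cosine is taken; the function names `rescale` and `gramian_angular_fields` are illustrative and are not taken from the code used in the experiments.

```python
import numpy as np


def rescale(x, lo=-1.0, hi=1.0):
    """Min-max rescale a 1-D series into [lo, hi] (assumed range: [-1, 1])."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("a constant series cannot be min-max rescaled")
    return lo + (hi - lo) * (x - x.min()) / span


def gramian_angular_fields(x):
    """Return (GASF, GADF) of a 1-D series as two n-by-n arrays."""
    x_tilde = np.clip(rescale(x), -1.0, 1.0)  # guard against round-off
    phi = np.arccos(x_tilde)                  # angles in [0, pi]
    gasf = np.cos(phi[:, None] + phi[None, :])  # cos(phi_i + phi_j)
    gadf = np.sin(phi[:, None] - phi[None, :])  # sin(phi_i - phi_j)
    return gasf, gadf
```

For a series of length n, both outputs are n-by-n arrays. GASF is symmetric and GADF is antisymmetric with a zero main diagonal, which is a quick sanity check on any implementation.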

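Similarly, for the time series $X$, we define MTF as follows:

$$M = \begin{bmatrix} w_{ij \mid x_1 \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_1 \in q_i,\, x_n \in q_j} \\ w_{ij \mid x_2 \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_2 \in q_i,\, x_n \in q_j} \\ \vdots & \ddots & \vdots \\ w_{ij \mid x_n \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_n \in q_i,\, x_n \in q_j} \end{bmatrix}.$$

By dividing the data into $Q$ quantiles, the $Q \times Q$ Markov transition matrix ($W$) of $X$ can be established. Here, $q_i$ and $q_j$ are the data quantiles containing the points at time stamps $i$ and $j$ (temporal axis), respectively, and $w_{ij}$ is given by the frequency with which a point in quantile $q_j$ is followed by a point in quantile $q_i$. By considering the time position, the matrix $W$, which contains the transition probabilities on the amplitude axis, is expanded into the MTF matrix ($M$). The main diagonal $M_{ii}$ (the case $i = j$) is the probability of transition from each quantile to itself (the self-transition probability). The MTF matrix ($M$) is more sensitive to the distribution of $X$ and to the time dependence on the time steps $t_i$ than the Markov transition matrix ($W$), which is constructed by directly counting the transitions between quantiles along the time axis in a first-order Markov chain [30].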
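The construction of the MTF can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the experiments. The number of quantile bins (`n_bins`) is a free choice here, and the sketch uses the row convention `W[a, b] = P(next point in bin b | current point in bin a)`, which is an assumption about the orientation of the transition matrix.

```python
import numpy as np


def markov_transition_field(x, n_bins=4):
    """Return (W, M): the n_bins-by-n_bins transition matrix and the n-by-n MTF."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges, then the bin index (0 .. n_bins - 1) of each point.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)
    # Count first-order transitions between consecutive points.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):
        W[a, b] += 1.0
    # Normalise each row to probabilities; rows with no transitions stay zero.
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Spread the transition probabilities along the time axis:
    # M[i, j] = W[bin of x_i, bin of x_j].
    M = W[bins[:, None], bins[None, :]]
    return W, M
```

The last step is what distinguishes the MTF from the plain transition matrix: every pair of time stamps (i, j) looks up the probability of moving between the bins that contain the two points, so the temporal order is kept in the image.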

2.2. Multidimensional Time Series Imaging

We introduce the GAF/MTF framework into multisensor data fusion. First, the GAF/MTF unit is established, and then, the multidimensional time series imaging framework is established based on this unit. The framework is used to synchronously compress and fuse the time series collected by different sensors into RGB images of a specified size according to the timestamp.

GASF, GADF, and MTF conversion are performed on a time series of length $n$, and a three-dimensional matrix $I$ of size $n \times n \times 3$ is obtained. The layers in the third dimension of $I$ correspond to the GASF, GADF, and MTF conversion results from top to bottom. Because the size of the matrix depends on the length $n$ of the time series, a large $n$ makes the matrix too large for subsequent calculations. Therefore, $I$ is downsampled with a decimation filter to obtain a matrix of size $m \times m \times 3$. The GAF/MTF unit is established as shown in Figure 1, where the input is a single-channel timing signal and the output is expanded into three two-dimensional single-channel pictures, represented by grayscale images.
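The size reduction in the GAF/MTF unit can be illustrated with the short sketch below. Note that it swaps in plain block averaging as a simple stand-in for the decimation filter mentioned in the text, and it assumes that the image side length is an integer multiple of the target size. The function name `block_mean_downsample` is hypothetical.

```python
import numpy as np


def block_mean_downsample(layer, m):
    """Reduce a square n-by-n layer to m-by-m by averaging non-overlapping blocks.

    Stand-in for a decimation filter; requires n to be a multiple of m.
    """
    layer = np.asarray(layer, dtype=float)
    n = layer.shape[0]
    if layer.shape != (n, n) or n % m != 0:
        raise ValueError("layer must be square with side length divisible by m")
    f = n // m  # block side length
    # Split rows and columns into (m, f) groups and average inside each block.
    return layer.reshape(m, f, m, f).mean(axis=(1, 3))
```

Applying the function to each of the GASF, GADF, and MTF layers in turn gives three layers of equal size regardless of the length of the input signal, which is what allows signals of different lengths to be fused later.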

Furthermore, for the multichannel timing signal $\{X_1, X_2, \ldots, X_c\}$, the number of channels is $c$ and the signal length is $n$. After GAF/MTF conversion, a matrix set containing $c$ pictures of size $m \times m \times 3$ is obtained. The set size is $m \times m \times 3 \times c$.

In Formulas (4)–(6), the larger the value of the calculation result, the more obvious the feature. We propose the concept of vertical maximum pooling; that is, a maximum pooling window with a depth of $c$ (the number of channels) is established at each corresponding position of the multidimensional picture, and the window size and sliding step are set to 1. By taking the maximum value in each window, the size of the matrix set is changed from $m \times m \times 3 \times c$ to $m \times m \times 3$. After that, the GASF layer, GADF layer, and MTF layer are placed on the red layer, green layer, and blue layer, respectively, and saved as RGB images. The flowchart is shown in Figure 2.
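The vertical maximum pooling and RGB assembly can be sketched as follows. The array layout (sensor, row, column, layer) and the per-layer rescaling to the 0–255 range are assumptions made for this illustration; the text does not state how pixel values are quantised before the images are saved.

```python
import numpy as np


def vertical_max_pool(stack):
    """Fuse per-sensor images by taking the maximum across the sensor axis.

    stack has shape (c, m, m, 3): c sensors, m-by-m pixels, and the
    GASF / GADF / MTF layers in the last axis. Returns shape (m, m, 3).
    """
    return np.asarray(stack, dtype=float).max(axis=0)


def to_rgb(fused):
    """Rescale each layer to 0..255 (assumed quantisation) and cast to uint8."""
    lo = fused.min(axis=(0, 1), keepdims=True)
    hi = fused.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat layers
    return np.rint(255.0 * (fused - lo) / span).astype(np.uint8)
```

Because the pooling window has size 1 and step 1 in the image plane, the operation is simply an element-wise maximum over the sensors, so the fused image keeps the strongest response at each pixel from whichever sensor produced it.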

Through this method, the time series classification problem is converted into an image classification problem, which bypasses the weakness of deep learning in processing time series. At the same time, the method realizes feature-level fusion of the signals and reduces the data dimension, which benefits the subsequent calculations. The GAF/MTF conversion used does not need hyperparameters to be set, which avoids the interference caused by manual feature selection.

3. Deep Residual Network

3.1. Architecture of the Deep Residual Network

For the two-dimensional image classification problem, relatively mature neural networks are already available, including AlexNet [31], GoogLeNet [32], and VGG [33]. These networks are deep convolutional networks. As the requirements for image classification accuracy and data volume rise, the depth of the network model continues to increase, and the learning ability is further enhanced. However, because of the degradation problem, a deeper network may have a higher error rate than a shallower one [34]. In response to this problem, He et al. proposed the deep residual network (ResNet), which differs from previous networks in its special “shortcut connection” [23]. Figure 3 shows the two types of shortcut connections.

Through this structure, the input and output of the block undergo element-wise superposition. This simple addition does not add additional parameters and calculations to the network, but it can greatly improve the speed and effect of model training. When the number of layers of the model increases, this simple structure can solve the degradation problem well.
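The element-wise superposition of input and output can be illustrated with a minimal NumPy sketch. For brevity it uses two fully connected layers as a stand-in for the convolutional layers of a real residual block, so it shows the shortcut mechanism only and is not the ResNet architecture itself; `residual_block` and its weight arguments are hypothetical names.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, w1, w2):
    """Identity-shortcut block: y = relu(F(x) + x), with F(x) = relu(x @ w1) @ w2.

    x has shape (batch, d); w1 and w2 have shape (d, d) so the shapes match
    and the shortcut needs no extra parameters.
    """
    return relu(relu(x @ w1) @ w2 + x)
```

If the weights of the residual branch are zero, the block reduces to `relu(x)`, that is, to the identity for non-negative inputs. This is the intuition behind the degradation argument: an added block can always fall back to passing its input through unchanged.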

3.2. Training Network Parameter

In model building, training accounts for most of the time cost. We introduce several mechanisms to improve the training speed of the network, including the stochastic gradient descent with momentum (SGDM) solver, L2 regularization, and minibatch training. The SGDM solver combines the stochastic gradient descent (SGD) solver with first-order momentum, which simulates the concept of momentum in physics: each update uses an accumulated, exponentially weighted sum of past gradients instead of the current gradient alone [35]. The solver accelerates well in the early stage of the descent, can help the optimization escape local minima in the middle and late stages, and suppresses oscillation and accelerates convergence when the gradient changes direction. L2 regularization adds an L2-norm penalty on the weights to the original loss function. This constraint usually imposes a large penalty on sparse and peaked weight vectors and prefers uniform parameters [36], which encourages each neural unit to use all the inputs of the previous layer instead of only part of them and thereby reduces overfitting. Minibatch training divides a large training set into several small subsets and uses only the samples in one subset for each update [37]. Compared with batch gradient descent, which uses all samples for every update, minibatch gradient descent updates the parameters more often, is conducive to more robust convergence, helps avoid local optima, and limits the amount of data loaded into the network at a time, which reduces the hardware demand.
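The three mechanisms can be summarised in a few lines of NumPy. The sketch below is a generic formulation, not the configuration used in the experiments: the learning rate, momentum coefficient, and L2 factor are placeholder values, and the L2 term is written as the gradient of (l2 / 2) times the squared weight norm.

```python
import numpy as np


def sgdm_step(w, v, grad, lr=0.01, momentum=0.9, l2=1e-4):
    """One SGD-with-momentum update with L2 regularization.

    w: weights, v: velocity (accumulated past updates), grad: loss gradient
    on the current minibatch. Returns the updated (w, v).
    """
    grad = grad + l2 * w              # gradient of the L2 penalty (l2 / 2) * ||w||^2
    v = momentum * v - lr * grad      # accumulate momentum
    return w + v, v


def minibatches(n_samples, batch_size, rng):
    """Yield index arrays that split a shuffled training set into minibatches."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]
```

In a training loop, `minibatches` would be called once per epoch and `sgdm_step` once per minibatch, so the parameters are updated many times per pass over the data instead of once.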

3.3. Adaptive Diagnostic Model for Multicondition Composite Fault of the Bearing-Gear System

An adaptive multisensor fault diagnostic model for the bearing-gear system based on GAF/MTF and ResNet is proposed. Through adaptive feature-level data fusion with GAF/MTF, the model can fuse time series of different lengths and types, collected by sensors at different sampling frequencies, into two-dimensional images. All two-dimensional images have the same size and number of channels. In theory, this fusion method can fuse data from arbitrary sensors. ResNet, with its very deep structure and residual mechanism, can then mine deep fault features in a relatively short time and make decisions to obtain the final diagnostic result.

As the network depth increases, both the model accuracy and the training time increase. We therefore selected ResNet18, which contains 18 weighted layers (17 convolutional layers and one fully connected layer), for the model.

The flowchart of the model is shown in Figure 4. First, different types of sensors collect experimental data. The sensors include three-way accelerometers, microphones, current sensors, torque sensors, and rotary encoders. The sampling frequency of the sensors is related to the motor speed. After that, the collected experimental data are preprocessed, including operations such as deleting invalid signals and randomly dividing the signals into segments. Through these operations, the $c$-channel sensor data are preprocessed to obtain a matrix of size [c, n, z]. In the matrix, $n$ indicates the length of a signal segment and $z$ represents the number of signal segments. The preprocessed time series signal matrix is fused at the feature level with GAF/MTF to obtain an RGB image dataset of size [m, m, z]. Finally, ResNet is used to classify the image set to obtain the fault diagnostic result.
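The random division into signal segments in the preprocessing step might look like the sketch below. It assumes that all channels have already been aligned to a common sample grid and that segments may overlap; neither detail is specified in the text, and the function name `random_segments` is hypothetical.

```python
import numpy as np


def random_segments(signal, seg_len, n_segments, rng=None):
    """Cut n_segments windows of length seg_len at random start positions.

    signal: array of shape (channels, total_samples), channels time-aligned.
    Returns an array of shape (n_segments, channels, seg_len); the same start
    index is used for every channel so the sensors stay synchronised.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    total = signal.shape[1]
    if seg_len > total:
        raise ValueError("segment length exceeds signal length")
    starts = rng.integers(0, total - seg_len + 1, size=n_segments)
    return np.stack([signal[:, s:s + seg_len] for s in starts])
```

Each returned segment would then pass through the GAF/MTF conversion and fusion to give one RGB image, so the number of segments equals the number of images in the dataset.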

In Figure 4, the input of the model is the multichannel time series collected by using sensors, and the output is the diagnostic result, which realizes end-to-end diagnosis. In the operation of the model, the internal structure can be adjusted adaptively without manual participation. The adaptive step includes the automatic fusion of time series signals of different lengths into pictures of equal size in the GAF/MTF conversion; in the training of ResNet, the adaptive adjustment of hyperparameters facilitates the sharing of parameters in network operation.

4. Experiment and Analysis

4.1. Construction of a Test Bed

The mechanical part of the bearing-gear system fault diagnostic test bench includes a drive motor, a bearing seat with replaceable bearings, and a planetary gearbox, as shown in Figure 5. The bearing is fixed in the bearing seat by the bearing end cover and can be replaced. The planetary gearbox includes a sun gear at the driving end, three planet gears surrounding the sun gear, and a ring gear fixed to the housing. The motor drives the bearing and the sun gear to rotate synchronously through the transmission shaft, and the planet gears are driven by meshing with the sun gear. There is a long transmission distance between the bearing and the gearbox. By replacing the faulty parts of the bearing and the gearbox, compound failures in different transmission stages can be simulated.

In this study, we built a multisensor acquisition system, as shown in Figure 6. The acquisition system includes two three-way accelerometers, two microphones, torque sensors, rotary encoders, and current sensors. The three-way accelerometer and the microphone are, respectively, arranged at the bearing seat and the planetary gearbox; the torque sensor is arranged at the output end of the motor; the rotary encoder is arranged at the input end of the planetary gearbox; the current sensor is arranged on an input line of the driving motor. Table 1 shows the types of sensors used in the experiment and the types of signals collected separately. Through these sensors, various data during the operation of the experimental platform can be monitored synchronously and saved by using the NI industrial computer.

4.2. Types of Faulty Parts

In the experiment, artificial faults were machined into bearings and gears to simulate the faults generated during actual operation of the equipment. The bearing is a cylindrical roller bearing and comes in two structures: one with a detachable outer ring, which is used to produce outer ring and rolling element faults, and one with a detachable inner ring, which is used to produce inner ring faults. The two bearings have the same size. For the gearbox, we produced different types of faults on the planet gear and the sun gear. Figure 7 shows the types of faulty parts of bearings and gearboxes. By combining the fault types of bearings and gearboxes, many mechanical faults found in actual environments can be simulated.

Table 2 shows a list of faulty parts, corresponding to the faulty part coding shown in Figure 7. Three processing methods are mainly used in the production of faulty parts: electrical discharge machining (EDM), wire cutting, and milling. Wire cutting is used to produce larger faults, while EDM and milling are used to produce smaller faults. There are 12 types of faulty parts and 2 types of normal parts in the experiment. The 12 types of faulty parts include 6 types of faulty bearing parts and 6 types of faulty gearbox parts.

Correspondingly, we designed the experiments, as shown in Table 3, including the no failure experiment (No. 1), single failure experiment (No. 2–No. 12), compound failure experiment in the same transmission stage (No. 13–No. 19), compound failure experiment in different transmission stages (No. 20–No. 25), and multiple compound failure experiment (No. 26). Each experiment in the table contains subexperiments with four motor speeds. The approximate speeds of the motors are 86.4 RPM, 288 RPM, 576 RPM, and 864 RPM, respectively. The precise speed can be calculated by using the rotary encoder on the test bench. A total of 104 experiments are performed.

4.3. Comparative Model

We divide the collected data into the training set, validation set, and test set at a ratio of 7 : 1 : 2. For the 26 types of experiments at the four speeds, 5 sampling lengths of 0.2 s, 0.4 s, 0.6 s, 0.8 s, and 1 s were used to intercept the signals. In total, 520 categories are obtained, and each category contains 200 signal samples.
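For reference, the number of categories and samples follows directly from the experimental design (26 experiment types, 4 motor speeds, 5 sampling lengths, and 200 samples per category):

```latex
26 \times 4 \times 5 = 520 \ \text{categories}, \qquad
520 \times 200 = 104\,000 \ \text{samples}
```

Assuming that the 7 : 1 : 2 ratio is applied to the pooled samples (the text does not state whether the split is made per category), this corresponds to 72 800 training samples, 10 400 validation samples, and 20 800 test samples.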

In order to verify the superiority of the GAF/MTF-ResNet model, we established comparison models based on the results of peer research. Figure 8 shows the structure of each comparative model.
(a) In 2019, Shao et al. proposed a multisensor fault diagnostic model [26]. It uses an accelerometer and a current sensor to collect the operating signals of the test bed, applies the wavelet transform to the signal from each sensor to obtain the corresponding time-frequency diagrams, and feeds them to a multi-input single-output DCNN for single fault diagnosis of the gearbox.
(b) On the basis of model (a), we increase the number of input sensor signals to 10, including the vibration, current, torque, and sound pressure signals.
(c) Jing et al. proposed a multisensor diagnostic model for compound faults in 2017 [27]. The model preprocesses the signals collected by the sensors and inputs them directly into a one-dimensional deep convolutional network (1D-DCNN), whose output is the type of fault. It uses convolutional layers in place of specialized feature extraction units.
(d) The GAF/MTF-ResNet model built in this paper for multisensor complex fault diagnosis.

All comparative models are built using the Keras framework [38] based on TensorFlow [39], and Numba [40] is used to accelerate the computation. TensorFlow is an open-source library developed and maintained by Google Brain. It is a symbolic mathematics system based on dataflow programming and has been widely used in machine learning. Keras is an open-source neural network library written in Python that provides a high-level application programming interface on top of TensorFlow. Its code is object-oriented, fully modular, and extensible, which simplifies model building. Numba is a JIT compiler that compiles Python functions into machine code. Python code compiled by Numba (mainly array operations) can run at speeds close to those of C or Fortran code.

The models run on a Windows 10 system equipped with an NVIDIA TITAN Xp graphics card, and the GPU is used to accelerate training.

Table 4 shows the parameters of each model, including the comparative models. The batch size is set as large as the available video memory allows. Keskar et al. [41] and Smith et al. [42] reported that, with other conditions being the same, a model trained with a larger batch size shows better performance.

4.4. Experimental Results and Analysis

Figure 9 shows how the accuracy of the training set and validation set changes with the number of epochs for the model using ResNet18. In order to ensure the reliability of the experimental results, a prediction for a composite fault is counted as correct only if the model predicts all of the faults; otherwise, its accuracy is taken as 0.
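The all-or-nothing scoring rule for composite faults can be written as an exact-match accuracy. In the sketch below, each sample is represented by the set of fault codes it contains; this representation and the fault codes used in the usage note are hypothetical and serve only to illustrate the rule.

```python
def exact_match_accuracy(true_faults, predicted_faults):
    """Fraction of samples whose predicted fault set equals the true fault set.

    A composite fault that is only partly recognised scores 0 for that sample.
    """
    if len(true_faults) != len(predicted_faults) or not true_faults:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(set(t) == set(p) for t, p in zip(true_faults, predicted_faults))
    return hits / len(true_faults)
```

For example, with true labels `[{"B1", "G2"}, {"B1"}, set()]` and predictions `[{"B1"}, {"B1"}, set()]`, the first sample is counted as wrong even though one of its two faults was found, so the accuracy is 2/3.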

In Figure 9, the accuracy of the training set eventually reaches 100%. As the number of epochs increases, the accuracy of the validation set does not decrease, which suggests that the “shortcut connection” mechanism helps to limit overfitting. The accuracy of the validation set of ResNet18 reaches 71.92%, and the accuracy of the test set is 72.14%. The results show that the model has the ability to identify complex fault types. In order to test this ability, we carried out the experiments shown in Tables 5 and 6.

We conducted a more detailed analysis of the results of the test set. All classification labels are divided into five types: the gearbox fault label, bearing fault label, gearbox and bearing fault label, speed label, and time label. Compound faults in the same transmission stage are regarded as separate fault types. Table 5 shows the accuracy of the test set under each of the five label types.
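The per-label breakdown can be sketched as follows. The sketch assumes that each of the 520 class labels can be decomposed into a gearbox fault, a bearing fault, a motor speed, and a sampling time; the `Label` tuple, its field names, and the label values in the usage note are hypothetical.

```python
from collections import namedtuple

# Hypothetical decomposition of one of the 520 class labels.
Label = namedtuple("Label", ["gearbox", "bearing", "speed", "duration"])


def per_label_accuracy(y_true, y_pred):
    """Accuracy for each label component, for both faults jointly, and overall."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    acc = {
        field: sum(getattr(t, field) == getattr(p, field) for t, p in pairs) / n
        for field in Label._fields
    }
    acc["gearbox_and_bearing"] = sum(
        t.gearbox == p.gearbox and t.bearing == p.bearing for t, p in pairs
    ) / n
    acc["overall"] = sum(t == p for t, p in pairs) / n
    return acc
```

A prediction that gets the speed and sampling time right but misidentifies the bearing fault, for instance, counts towards the speed and time accuracies but not towards the bearing, joint, or overall accuracies. This is how a near-100% accuracy on speed and time labels can coexist with a lower overall accuracy.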

In Table 5, our model has a particularly good classification effect for speed and time labels, and the recognition rate is close to 100%. Compared with gearbox faults, this model has better classification capabilities for bearing faults.

In order to show the advantages of the proposed model over other models, experiments were carried out on all comparative models. With the model parameters shown in Table 4, all the experiments used the same training set, validation set, and test set and the same running environment. Table 6 shows the results of the comparative experiments. We recorded the accuracy of the training set, validation set, and test set, as well as the training time of each model.

In Table 6, the GAF/MTF-ResNet model performs better than the other models on the training set, validation set, and test set. The training time of the model using ResNet18 is not much different from that of the other models. The wavelet-DCNN model performs worse than the wavelet-DCNN (max) model because the former uses less sensor data than the latter. The 1D-DCNN model, which takes the one-dimensional time series signal as input, has the worst performance and has almost no classification ability for different types of faults. A possible reason is that the input of the neural network must have a fixed size, whereas in this experiment the signal length is chosen adaptively according to the motor speed, so the signals are scaled to a common length in data preprocessing. Scaling inevitably loses information, and the loss may be more pronounced for the one-dimensional time series signal, which ultimately leads to the poor performance of the model.

5. Conclusion

In this paper, a GAF/MTF-ResNet model based on multisensor detection is proposed for coupled fault diagnosis of bearing-gear mechanical systems. The model first uses the GAF/MTF algorithm, which bypasses the vanishing or exploding gradient problem of the classic neural network model, to fuse multichannel time series signals of different types and lengths into two-dimensional RGB pictures at the feature level. Then, we use ResNet to train on and classify the picture sets, and finally, we realize the diagnosis of different fault types.

In building the model, the manual selection of key steps such as feature extraction was avoided, and end-to-end operation of the model was realized. The experiment we designed has complex failure types, small failure sizes, and low motor speeds, which requires the model to have a high discrimination capability. More importantly, we designed mechanical faults in different transmission stages so that the experiment more realistically simulates the coupling faults that occur under the complex working conditions of a real mechanical system.

The GAF/MTF algorithm is used to realize the fusion and dimensionality reduction of multidimensional information based on the time-frequency domain information from the original signal. Moreover, this algorithm transforms the original time series classification problem into a two-dimensional image classification problem, which avoids the weakness of neural networks in processing time series. The ResNet introduced afterwards has more layers and better performance than the other networks, and its training time does not increase excessively.

The experimental results show that the GAF/MTF-ResNet model has a training set accuracy of 100% and a test set accuracy of 72.14% for 520 types of experimental labels with different speeds, different sampling times, and different types of failures. The recognition accuracy for the speed and sampling time is close to 100%, and the recognition accuracy for gearbox failure and bearing failure is 75.25% and 88.97%, respectively. The model therefore has good application potential and merits further research.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Natural Science Foundation of Shandong Province, China (No. ZR2021ME026), the Natural Science Foundation of Shandong Province, China (No. ZR2020QE158), the Shandong Province Science and Technology Small and Medium Enterprise Innovation Capability Improvement Project, China (No. 2021TSGC1045), and the Key Research and Development Project of Qingdao, China (No. 21-38-04-0002).