Abstract

The rolling bearing is one of the most important parts of rotating machinery, and dependence on such machinery keeps growing. It is therefore necessary to continuously monitor the operating status of the bearing and diagnose its faults. To better analyze the bearing vibration signal in the time and frequency domains and reduce information loss, we propose a model that decomposes the original bearing vibration signal with a length of 1024 by a two-layer wavelet packet. Four low-frequency and high-frequency feature vectors of a length of 1024 are obtained as the input to the analysis model. The proposed model automatically extracts features from each frequency subband of the network input and then fuses the features. The accuracy of the model on a single load on the Case Western Reserve University (CWRU) dataset is 98–100%, which shows that the diagnostic effect is satisfactory.

1. Introduction

The rolling bearing is an indispensable part of rotating machinery. Unexpected damage to bearings has a huge impact on the operation of the equipment and can have immeasurable consequences. As manufacturing develops, machines must be reliable and safe so that the mechanical system remains available [1–3]. Therefore, it is important to monitor the status of the rolling bearings in a machine and diagnose their faults.

Effective fault diagnosis is required to avoid production safety accidents and reduce equipment maintenance costs, which also improves productivity [4]. To detect abnormal conditions of rolling bearings in time, vibration signals are collected from sensors under different working conditions and used by intelligent fault diagnosis methods. The signals are then analyzed and processed to identify the type of failure [5–7]. An intelligent fault diagnosis method mainly has two steps: extraction and classification of fault features.

Fault features are extracted from the vibration signal by time-domain and frequency-domain analysis. Time-domain analysis calculates the peak, mean square, root mean square, skewness, and kurtosis to recognize and characterize the corresponding bearing fault. Frequency-domain analysis includes wavelet transform [8], Fourier spectrum analysis [9], Hilbert–Huang transform [10], and other methods. Both kinds of analysis have their limitations, and an appropriate method still has to be selected for each working scenario of the rolling bearing based on professional and technical knowledge. Deep learning is now generally used for these analyses. It relies on the learning and reasoning ability of the neural network to extract and analyze the vibration signal of the bearing and identify the fault by using complex nonlinear functions [11]. Therefore, deep learning in the fault diagnosis of the rolling bearing has attracted researchers’ interest, and the results are used for improving the stability of machine operation [12].
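The time-domain statistics named above can be computed directly from a sampled signal. The following is a minimal sketch, not the authors' implementation: the function name `time_domain_features` is hypothetical, the moments use the population form (division by n), and the kurtosis is the non-excess form (3 for a Gaussian signal). It assumes the signal is not constant.

```python
import math


def time_domain_features(x):
    """Return peak, mean square, RMS, skewness, and kurtosis of a signal.

    Assumes x is a non-constant sequence of floats (otherwise the
    variance is zero and skewness/kurtosis are undefined).
    """
    n = len(x)
    mean = sum(x) / n
    mean_square = sum(v * v for v in x) / n
    rms = math.sqrt(mean_square)
    peak = max(abs(v) for v in x)
    # Central moments in population form (divide by n, not n - 1).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "peak": peak,
        "mean_square": mean_square,
        "rms": rms,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess kurtosis
    }


# Usage on a synthetic 1024-point sine wave (illustrative data only).
signal = [math.sin(2 * math.pi * k / 64) for k in range(1024)]
features = time_domain_features(signal)
```

Impulsive bearing faults tend to raise the peak and kurtosis of the signal, which is why these statistics are commonly used as hand-crafted features.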

Building on previous research results [11], this study uses a CNN for rolling bearing fault diagnosis, with time-frequency analysis performed by two-layer wavelet packet decomposition. The decomposition is combined with a multibranch one-dimensional convolutional neural network, so that the vibration signal of the rolling bearing is analyzed and the fault category is judged efficiently.

The proposed method has the following advantages:
(1) Using two-layer wavelet packet decomposition, a multiscale analysis of bearing vibration signals is carried out in the time and frequency domains, which reduces the loss of fault information.
(2) Each branch is designed with different convolution kernel sizes and strides to extract the corresponding fault features according to the characteristics of the low-frequency and high-frequency components, and fault classification improves after feature fusion.
(3) The model structure has a strong generalization ability when diagnosing bearing faults across loads.

The rest of this paper is organized as follows: in the second section, the basics of wavelet packet decomposition and the convolutional neural network are introduced. The third section proposes the fault diagnosis model. The fourth section describes the experimental verification and analysis of the experimental results. The fifth section concludes this research.

2.1. Literature Review

The theory of deep learning was proposed by Hinton in 2006 [13]. Owing to its powerful feature extraction and learning capabilities, it has developed rapidly in various fields [3, 14, 15]. Janssens et al. used global spectrum analysis to identify bearing faults by means of a fast Fourier transform. The vibration signal was preprocessed to select the dominant frequency feature, and the main feature was extracted by using principal component analysis. Finally, the extracted main features were classified using a linear discriminator [16]. Janssens et al. were the first to use convolutional neural networks to diagnose the faults of bearings and gears in gearboxes. The time-domain vibration signals were converted into frequency-domain signals through the discrete Fourier transform, and the frequency-domain signals were then used as the input of a convolutional neural network to identify the fault of the rolling bearing. The accuracy of this method in the experimental system was 6% higher than that of the conventional algorithm [17].

Xu et al. proposed a hybrid deep learning model based on a convolutional neural network (CNN) and gcForest [18]. They used continuous wavelet transform to convert bearing vibration signals into time-frequency images, then used the CNN to extract bearing fault features, and finally fed the extracted features to gcForest, which acted as the classifier. The experimental results showed that the model had a fault diagnosis accuracy higher than 98% on data sets of different scales [3, 18, 19]. Jin et al. proposed an end-to-end deep convolutional neural network with a local sparse structure, using local sparse nodes to replace high-dimensional convolutional layers and fully connected layers. Even in the presence of noise, the network showed performance similar to the original method with 47% of the parameters [20]. Wang et al. proposed an adaptive overlapping convolutional neural network (AOCNN) that uses root-mean-square pooling layers to overcome displacement changes and edge problems [21]. Huang et al. used an improved deep CNN to exploit multiscale information in bearing fault diagnosis [22]. Liu et al. proposed a dislocated time series CNN (DTS-CNN) to overcome the shortcomings of the traditional CNN in the diagnosis of modern faulty motors [23]. Zhang et al. proposed a bearing fault diagnosis method based on a deep 1D-CNN that takes the original vibration signal as input without any denoising and achieves high accuracy under noise and different loads [24]. Wang et al. proposed a method to fuse multimodal sensor signals (that is, data collected by accelerometers and microphones): a 1D-CNN extracts features from the original vibration and acoustic signals, and the features are fused to obtain stronger robustness and high accuracy [3, 19, 21]. Yao et al. proposed a superimposed inverse residual convolutional neural network (SIRCNN) that reduces the size of the model by using depthwise separable convolution. They introduced a residual structure to ensure the stability of the model in a noisy environment. The experimental results showed that the fault diagnosis model of the rolling bearing based on SIRCNN effectively identified the type and degree of damage to the bearing [25].

2.2. Backgrounds

Traditional time-domain and frequency-domain analyses perform a global transformation of the signal but cannot analyze the signal in both domains at the same time. Wavelet analysis can work in the time and frequency domains simultaneously, but it decomposes only the low-frequency signal further. As the high-frequency signal of the bearing vibration also contains fault information, fault information may be lost. Wavelet packet decomposition additionally decomposes the high-frequency signals, which gives a better time-frequency resolution and allows a detailed multiscale analysis of bearing vibration signals.

2.2.1. Wavelet Packet Decomposition

Wavelet packet decomposition convolves the signal with low-pass and high-pass wavelet filters. Both filters are defined through inner products involving the Haar wavelet function and the scaling function.

The wavelet coefficients of the bearing vibration signal in each decomposition layer and frequency band are calculated from the original bearing vibration signal: the coefficients in the i-th decomposition layer and the j-th frequency band are obtained by applying the two filters, together with the translation operations, to the coefficients of the preceding layer.
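For reference, the filters and the coefficient recursion can be written in one standard convention (Mallat's). This is a sketch of the textbook form, not necessarily the notation of the original equations: the symbols $h$, $g$, $\varphi$, $\psi$, $d_{i,j}$, $x$, $k$, and $n$ are assumptions introduced here.

```latex
% Low-pass filter h and high-pass filter g
% (\varphi: scaling function, \psi: Haar wavelet function,
%  \langle\cdot,\cdot\rangle: inner product, t and k: variables)
h(k) = \left\langle \tfrac{1}{\sqrt{2}}\,\varphi\!\left(\tfrac{t}{2}\right),\ \varphi(t-k) \right\rangle,
\qquad
g(k) = \left\langle \tfrac{1}{\sqrt{2}}\,\psi\!\left(\tfrac{t}{2}\right),\ \varphi(t-k) \right\rangle

% Wavelet packet coefficients d_{i,j} in layer i and band j,
% starting from the original signal x
d_{0,0}(n) = x(n), \qquad
d_{i+1,2j}(k) = \sum_{n} h(n-2k)\, d_{i,j}(n), \qquad
d_{i+1,2j+1}(k) = \sum_{n} g(n-2k)\, d_{i,j}(n)
```

For the Haar wavelet this gives $h = (1/\sqrt{2},\ 1/\sqrt{2})$ and $g = (1/\sqrt{2},\ -1/\sqrt{2})$, so each step reduces to scaled pairwise sums and differences.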

We carried out two-layer wavelet packet decomposition on the bearing vibration signal S(0,0) and obtained two low-frequency and two high-frequency components. The frequency increases from left to right. The wavelet packet decomposition tree structure is shown in Figure 1. A schematic diagram of the relationship between the coefficients of wavelet packet decomposition and the size of the convolution kernel is shown in Figure 2.
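The two-layer decomposition can be sketched with the Haar filters in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names `haar_step` and `wavelet_packet_2level` are hypothetical, each step downsamples by two, and the node labels follow the filter path (first letter for the first layer). With downsampling, a 1024-point sample yields 256 coefficients per node, 1024 in total; the text does not state whether the model uses downsampled or full-length components.

```python
import math


def haar_step(x):
    """One Haar analysis step: return (low-pass, high-pass) halves of x.

    Assumes len(x) is even; each output has len(x) // 2 coefficients.
    """
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * s for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * s for k in range(half)]
    return low, high


def wavelet_packet_2level(x):
    """Two-layer Haar wavelet packet decomposition of x.

    Unlike the plain wavelet transform, the high-pass output of the
    first layer is decomposed again. Labels follow the filter path;
    mapping nodes to frequency bands depends on the ordering
    convention in use.
    """
    low, high = haar_step(x)
    ll, lh = haar_step(low)
    hl, hh = haar_step(high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


# Usage on a synthetic 1024-point sample (illustrative data only).
sample = [math.sin(2 * math.pi * 5 * k / 1024) for k in range(1024)]
nodes = wavelet_packet_2level(sample)
```

Because the Haar filters are orthonormal, the total energy of the four nodes equals the energy of the input sample, which is a convenient sanity check.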

2.2.2. Deep-Learning: Convolutional Neural Networks

The convolutional neural network (CNN) was inspired by the biological vision system and proposed by LeCun et al. [26]. The CNN consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer [27]. Training a CNN involves only the forward propagation of the training data and the backpropagation of the error. Forward propagation processes the training data layer by layer according to the network parameters and outputs the probability of each category for the training data. Backpropagation adjusts the training parameters of the network according to the difference between the category output by the network and that of the sample label. General classification problems use cross-entropy as the loss function to calculate this difference. After each forward propagation is completed, the training parameters of each layer of the network are updated according to the loss value calculated by the loss function.

The convolutional layer is the core of the convolutional neural network. It extracts the corresponding features of the input data through multiple convolution operations. Its local connection and weight sharing reduce the number of parameters involved in training, which avoids the slow computation and overfitting caused by too many parameters. The convolution operation generally changes the size of the original feature map, and the step size (stride) of the convolution kernel affects the output size. The calculation of the convolutional layer is as follows:

$$x^{l} = f\left(w^{l} * x^{l-1} + b^{l}\right)$$

where $x^{l}$ represents the output features of the $l$-th layer, $w^{l}$ represents the weight matrix of the convolution kernel of the $l$-th layer, the operator $*$ represents the convolution operation, $x^{l-1}$ is the output of the $(l-1)$-th layer, $b^{l}$ represents the bias term, and the function $f$ represents the activation function of the output.
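The effect of kernel size and stride on the output size can be illustrated with a single-channel sketch. The function names and the example kernel and stride values are hypothetical, not taken from the paper. As in most deep learning frameworks, the "convolution" here is implemented as a sliding dot product without kernel flipping, with no padding and a ReLU activation.

```python
def conv1d_output_length(n, kernel_size, stride=1, padding=0):
    """Number of output positions of a 1D convolution."""
    return (n + 2 * padding - kernel_size) // stride + 1


def conv1d(x, w, b=0.0, stride=1):
    """Single-channel 1D convolution without padding, followed by ReLU.

    x: input sequence, w: kernel weights, b: bias term.
    """
    n_out = conv1d_output_length(len(x), len(w), stride)
    out = []
    for k in range(n_out):
        start = k * stride
        z = sum(x[start + m] * w[m] for m in range(len(w))) + b
        out.append(max(0.0, z))  # ReLU activation
    return out


# A hypothetical wide kernel with a large stride shrinks a 1024-point
# input far more than a narrow kernel with a small stride.
wide = conv1d_output_length(1024, kernel_size=64, stride=8)
narrow = conv1d_output_length(1024, kernel_size=16, stride=2)
```

The same kernel weights `w` are reused at every position, which is the weight sharing that keeps the parameter count independent of the input length.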

The activation function performs a nonlinear transformation of the output value of the convolution operation. The convolution operation is then repeated to extract more abstract features of the data. The commonly used activation functions in neural networks include the hyperbolic tangent function (Tanh) and the rectified linear unit (ReLU). The expressions of the two activation functions are as follows:

$$\mathrm{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

$$\mathrm{ReLU}(x) = \max(0, x)$$

The pooling layer, also called the subsampling or down-sampling layer, reduces the dimension of the feature map while retaining the important features. This reduces the number of parameters to be calculated in the network, saves computing resources, and lowers the risk of overfitting. Commonly used pooling operations include maximum pooling and average pooling. Maximum pooling uses the local maximum value as the output, while average pooling uses the local average value as the output.
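The two pooling operations can be sketched in a few lines. The function name `pool1d` and its parameters are hypothetical; the stride defaults to the window size, which gives non-overlapping windows.

```python
def pool1d(x, size, stride=None, mode="max"):
    """1D pooling over windows of `size` samples.

    mode="max" returns the local maximum of each window,
    mode="avg" returns the local average.
    """
    if stride is None:
        stride = size  # non-overlapping windows by default
    out = []
    for start in range(0, len(x) - size + 1, stride):
        window = x[start:start + size]
        if mode == "max":
            out.append(max(window))
        else:
            out.append(sum(window) / size)
    return out


feature_map = [1, 3, 2, 8, 5, 4]
max_pooled = pool1d(feature_map, size=2, mode="max")
avg_pooled = pool1d(feature_map, size=2, mode="avg")
```

In this example both operations halve the length of the feature map; maximum pooling keeps the strongest response in each window, while average pooling smooths it.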

The output layer classifies the features extracted by the convolution operations. The commonly used multiclass classifier is softmax, which makes decisions through a probability distribution:

$$p_{i} = \frac{e^{z_{i}}}{\sum_{k=1}^{K} e^{z_{k}}}$$

where $z_{i}$ represents the output value of the $i$-th neuron in the classification layer, and $K$ represents the total number of categories.
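The softmax output and the cross-entropy loss used during training can be sketched as follows. The function names are hypothetical, and the example logits are made-up values for a 10-class problem. Subtracting the maximum before exponentiation is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(z):
    """Convert raw output values z into a probability distribution."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy loss for one sample with integer class `label`."""
    return -math.log(probs[label])


# Hypothetical raw outputs of a 10-neuron classification layer.
logits = [0.1, 2.5, -1.0, 0.3, 0.0, 0.7, -0.4, 1.2, 0.2, -2.0]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
loss = cross_entropy(probs, label=1)
```

The loss is small when the probability assigned to the true class is close to 1 and grows without bound as that probability approaches 0.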

3. Proposed Method and Analysis of Data

3.1. Proposed Experimental Method

A multibranch 1D-CNN network model structure is proposed for the analysis of the data. The structural parameters of the four branch networks are shown in Tables 1–4. The fault features extracted by the four branches are fused after passing through the flatten layer. Then, after the fully connected layers, the extracted fault features are classified by the softmax classification function. A single sample with a sampling frequency of 12 kHz and a data point length of 1024 is subjected to two-layer wavelet packet decomposition. Lastly, the numbers of flattened and concatenated neurons are calculated as shown in Table 5.

The bearing vibration signal is decomposed into four wavelet packet coefficients, which are 0–1.5 kHz (LL), 1.5–3 kHz (LH), 3–4.5 kHz (HL), and 4.5–6 kHz (HH) as input for each branch network. After the convolutional pooling operation of each branch network, the fusion is performed in the feature layer, the fused fault features are input to the fully connected layer, and then the softmax activation function identifies and classifies faults.
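The four band edges follow from the sampling frequency: a 12 kHz signal covers 0 to 6 kHz (the Nyquist frequency), and two decomposition layers split this range into 2² = 4 equal bands. A minimal sketch of this arithmetic, with a hypothetical function name:

```python
def wavelet_packet_bands(sampling_rate_hz, levels):
    """Return the (low, high) edges in Hz of each wavelet packet band."""
    nyquist = sampling_rate_hz / 2
    n_bands = 2 ** levels
    width = nyquist / n_bands
    return [(i * width, (i + 1) * width) for i in range(n_bands)]


bands = wavelet_packet_bands(12000, levels=2)
for name, (low, high) in zip(["LL", "LH", "HL", "HH"], bands):
    print(f"{name}: {low / 1000:g}-{high / 1000:g} kHz")
```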

Because the signal in the low-frequency band has a long period, a larger convolution kernel is used to cover a complete period, and the stride is relatively large. The signal in the high-frequency band has a shorter period, so a smaller convolution kernel is used, and the stride is relatively small (Tables 2 and 3).

A dropout operation with a deactivation ratio of 0.3 is added before the second convolutional layer of each of the four branch networks. By randomly deactivating neurons, it prevents the model from relying on specific neuron connections and thereby enhances the generalization ability of the model. A fully connected layer with 64 neurons is added after the concatenate layer, and then the softmax classification function is used in the output layer for classification.
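The size of the fused feature vector follows from the per-branch layer settings. The sketch below only does the shape bookkeeping; the layer configurations `low_branch` and `high_branch` are hypothetical placeholders and are not the values in the paper's tables. It assumes no padding and an input of 1024 points per branch. Dropout does not change shapes, so it is omitted.

```python
def flattened_size(input_length, layers):
    """Flattened feature count of one branch.

    layers: sequence of ("conv", kernel, stride, filters) or
    ("pool", size, stride) tuples, applied in order without padding.
    """
    length, channels = input_length, 1
    for layer in layers:
        kind, kernel, stride = layer[0], layer[1], layer[2]
        length = (length - kernel) // stride + 1
        if kind == "conv":
            channels = layer[3]  # a conv layer sets the channel count
    return length * channels


# Hypothetical settings: wide kernel and large stride for low-frequency
# branches, narrow kernel and small stride for high-frequency branches.
low_branch = [("conv", 64, 8, 16), ("pool", 2, 2),
              ("conv", 3, 1, 32), ("pool", 2, 2)]
high_branch = [("conv", 16, 2, 16), ("pool", 2, 2),
               ("conv", 3, 1, 32), ("pool", 2, 2)]

branches = {"LL": low_branch, "LH": low_branch,
            "HL": high_branch, "HH": high_branch}
sizes = {name: flattened_size(1024, cfg) for name, cfg in branches.items()}
concatenated = sum(sizes.values())

# Parameters of the 64-neuron fully connected layer and 10-class output.
dense_params = concatenated * 64 + 64
output_params = 64 * 10 + 10
```

Under these placeholder settings the high-frequency branches contribute more features than the low-frequency ones, because their smaller stride keeps more time steps.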

4. Experimental Results and Discussion

4.1. Dataset Description

The public data set provided by Case Western Reserve University (CWRU) is widely used in research on rolling bearing fault diagnosis. Therefore, we used the CWRU public data set to verify the proposed multibranch network model. The bearing vibration data in the CWRU data set were collected by accelerometers placed on the drive end and the fan end at sampling frequencies of 12 and 48 kHz using a 16-channel data logger. The bearings used at the drive and fan ends were SKF6205 and SKF6203, respectively. To simulate failure damage, single-point damage was introduced by electrical discharge machining (EDM) on the ball, inner ring, and outer ring. The damage diameters were 0.007, 0.014, 0.021, 0.028, and 0.04 inches. The damage points on the outer ring of the rolling bearing were set at three positions: 90°, 180°, and 360°.

In this experiment, we used the vibration signal of the drive-end bearing sampled at 12 kHz under three load states of 1, 2, and 3 HP. The data set of each load has 9 different fault states and the normal state (10 different states in total), and the original vibration signal is sequentially segmented into samples of 1024 points. In total, 1150 sample signals are obtained under each load, of which 900 are training samples and 250 are testing samples, as shown in Table 4.
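The sequential segmentation and the 900/250 split can be sketched as follows. The function names are hypothetical, the segments are assumed to be non-overlapping, and the shuffled split with a fixed seed is an assumption: the text does not state how training and testing samples were selected.

```python
import random


def segment_signal(signal, length=1024):
    """Cut a signal into consecutive, non-overlapping samples.

    Trailing points that do not fill a complete sample are dropped.
    """
    n_samples = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n_samples)]


def split_train_test(samples, n_train=900, n_test=250, seed=0):
    """Shuffle the samples and split them into training and testing sets."""
    if n_train + n_test > len(samples):
        raise ValueError("not enough samples for the requested split")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # assumed: random selection
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:n_train + n_test]]
    return train, test


# Usage with a synthetic recording (illustrative data only).
recording = [float(k % 97) for k in range(5 * 1024 + 300)]
samples = segment_signal(recording)
train, test = split_train_test(samples, n_train=4, n_test=1)
```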

4.2. Verification of Results

4.2.1. Single Load

The proposed model is trained and tested on four workloads. The model uses the Adam optimizer with categorical cross-entropy as the loss function. The learning rate is set to 0.0001, and 5-fold cross-validation is used. To reduce random errors, the experiment is repeated 50 times and the average value is calculated. The experimental results are shown in Table 5. The proposed model has the highest accuracy of 100% on four loads and the lowest accuracy of 98%. The average accuracy on four different loads reaches 99.82%.
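The 5-fold cross-validation split can be sketched with a small index generator. The function name `k_fold_indices` is hypothetical, the folds are contiguous, and applying the split to the 900 training samples is an assumption; the text does not specify these details.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Each sample appears in exactly one validation fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Usage: five folds over 900 samples, 720 for training and 180 for
# validation in each fold.
folds = list(k_fold_indices(900, k=5))
```

In practice the sample order would be shuffled before splitting so that each fold contains all fault states.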

The standard deviations over the 50 repetitions of 5-fold cross-validation for 0, 1, 2, and 3 HP are 0.16, 0.48, 0.11, and 0.13%, respectively. These low standard deviations indicate that the proposed multibranch convolutional neural network performs consistently under different loads and has a strong generalization ability, which shows that it is applicable to various load conditions. The model accuracy under different loads is shown in Figure 3.
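The mean and standard deviation over repeated runs can be computed with the standard library. The accuracy values below are made-up placeholders, not results from the paper, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation (`stdev`) is an assumption.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, population standard deviation) of run accuracies."""
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Hypothetical accuracies (%) of repeated runs under one load.
runs = [99.6, 100.0, 99.8, 99.9, 99.7]
mean_acc, std_acc = summarize_runs(runs)
print(f"accuracy: {mean_acc:.2f}% +/- {std_acc:.2f}%")
```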

4.2.2. Performance in Different Workloads

In an actual industrial production environment, mechanical equipment does not operate at a constant load, as different loads are required for different tasks. Thus, the performance of the multibranch network model is tested across loads and compared with that of other models from the references: fast Fourier transform-based support vector machine (FFT-SVM), FFT-multilayer perceptron (FFT-MLP), FFT-deep neural network (FFT-DNN), deep convolutional neural networks with wide first-layer kernels (WDCNN), text and image information-based convolutional neural network (TICNN), and ensemble TICNN. In Tables 1, 6, 7, and 8, datasets 1, 2, and 3 denote the data collected under workloads of 1, 2, and 3 HP, and “1 ⟶ 2” means that a model trained on dataset 1 is tested on dataset 2.
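The cross-load protocol can be enumerated as ordered pairs of a training dataset and a different testing dataset. This is a minimal sketch with hypothetical function names; the accuracy values in the usage example are placeholders, not results from the paper.

```python
from itertools import permutations


def cross_load_scenarios(datasets=(1, 2, 3)):
    """Return all (train, test) pairs with different train and test loads."""
    return list(permutations(datasets, 2))


def average_accuracy(results):
    """Average accuracy over a dict mapping (train, test) to accuracy."""
    return sum(results.values()) / len(results)


scenarios = cross_load_scenarios()
labels = [f"{train} -> {test}" for train, test in scenarios]

# Placeholder accuracies (%) for illustration only.
placeholder = {pair: 95.0 for pair in scenarios}
mean_cross_load = average_accuracy(placeholder)
```

With three datasets there are six scenarios, since “1 ⟶ 2” and “2 ⟶ 1” are different experiments.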

The experimental results in Table 8 and Figure 4 demonstrate the effectiveness of the proposed multibranch network model. The average accuracy of the proposed model across loads is 96.4%, which is 29.8 percentage points higher than the average accuracy of FFT-SVM (66.66%), 16 percentage points higher than that of FFT-MLP (80.4%), and 6.4 percentage points higher than that of WDCNN (90%). It is slightly higher than that of TICNN and ensemble TICNN. The accuracy of the multibranch network model is 90.2% in the 3 ⟶ 1 scenario. In the other scenarios, the average accuracy of the model is higher than 96%.

5. Conclusions

We propose a multibranch convolutional neural network model for bearing fault diagnosis. Each sample of length 1024 is decomposed by a two-layer wavelet packet into four low-frequency and high-frequency components of length 1024, which are sent to the four branches of 0–1.5 kHz (LL), 1.5–3 kHz (LH), 3–4.5 kHz (HL), and 4.5–6 kHz (HH). The corresponding fault features are extracted in the four branch networks. The low-frequency and high-frequency information in the bearing vibration signal is analyzed at the same time, which minimizes the loss of fault information. After feature fusion in the proposed network, a softmax classification function identifies and classifies the extracted fault features. In summary, the proposed model uses frequency subbands to extract bearing fault features in four branches, whose outputs are fused for classification. The proposed model has the highest accuracy of 100% on four different loads (0, 1, 2, and 3 HP) and the lowest accuracy of 98%. The average accuracy on four different loads reaches 99.82%, which demonstrates the effectiveness of the proposed model in diagnosing bearing faults. The average accuracy of the model across multiple loads reaches 96.4%, which indicates a robust generalization ability.

Data Availability

The data are Excel files and can be accessed on the following websites: https://engineering.case.edu/bearingdatacenter/normal-baseline-data (normal baseline data) and https://engineering.case.edu/bearingdatacenter/12k-drive-end-bearing-fault-data (12 kHz drive-end bearing fault data). There are no restrictions on data access. The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the Ministry of Science and Technology grants (MOST 111-2221-E-035-066- and 110-2622-E-035-006-).