#### Abstract

Extracting features that carry fault information is crucial for the fault diagnosis of rotating machinery. Many previous works using deep convolutional neural networks (CNNs) have achieved excellent performance in extracting fault information from measured signals. They may, however, be time-consuming and lack versatility. In this paper, a CNN integrated with the adaptive batch normalization (ABN) algorithm (ABN-CNN) is developed to avoid the high computing resource requirements of such complex networks. It uses a large-scale convolution kernel in the first layer, followed by multiple layers of small 3 × 1 convolution kernels. As a result, fast convergence and high recognition accuracy under noise and load variation can be achieved for bearing fault diagnosis. The performance results verify that the proposed model is superior to the Support Vector Machine with Fast Fourier Transform (FFT-SVM), Multilayer Perceptron with Fast Fourier Transform (FFT-MLP), and Deep Neural Network with Fast Fourier Transform (FFT-DNN) models.

#### 1. Introduction

The vibration characteristics of a machine are a crucial design issue. An imbalance of its rotating parts and elements can result in noncoaxiality of shafts [1]. Noise and vibration also occur in complex structures such as ships, where they pose a noticeable obstacle in ship construction [2]. In such cases, vibration measurement can help identify dynamic state changes in a rotating system [3]. Accordingly, fault diagnosis of bearings is commonly based on vibration signals, from which a set of features is extracted and classified to predict rotating shaft faults. The success of fault diagnosis is determined by the choice of feature extractor and classifier [4].

Research on intelligent algorithms for diagnosis or prediction has attracted increasing attention in recent years. However, the feature extractor and classifier of a neural network may not adapt to load changes well enough to maintain high accuracy [5–8]. For this reason, combinations of different neural network methods have been tried in a variety of disciplines. For example, trademark image retrieval was implemented using a combination of deep CNNs, in which pretrained networks were fine-tuned for the retrieval task using two different databases [9]. In another case, a recurrent neural network (RNN) was used to learn inner spectral correlations, while a CNN focused on saliency features and spatial relevance [10]. Recently, a hybrid pattern recognition method was proposed for short-term wind power forecasting, in which the time series of produced power is estimated through a combination of preprocessing, feature selection, and regression steps [11].

To solve the problem of bearing fault diagnosis, the ideal approach is to combine feature extraction and classification in one model without loss of information. The convolutional neural network (CNN) algorithm is “end-to-end”: it completes the whole process of feature extraction, feature dimension reduction, and classification within a single neural network [12–14]. A CNN is a multilayer neural network comprising a filtering stage and a classification stage [15–18]. The filtering stage extracts the characteristics of the input signal, and the classification stage classifies the learned features; the parameters of the two stages are obtained through joint training.

In recent years, two-dimensional convolutional neural networks built from stacked 3 × 3 convolution kernels, e.g., VGGNet, ResNet, and Google’s Inception V4, have been reported [19–21]. This type of CNN model allows the network to be made deeper [22]. At the same time, it achieves a larger receptive field with fewer parameters, thereby suppressing overfitting. However, for a one-dimensional vibration signal, a stack of two 3 × 1 convolution layers obtains a receptive field of only 5 × 1 at the cost of six weights, so its application is limited.

#### 2. The Proposed Model

##### 2.1. WFK-CNN Algorithm

The Deep Convolutional Neural Network with Wide First-layer Kernel (WFK-CNN) model is characterized by large-scale convolution kernels in the first layer, followed by small 3 × 1 convolution kernels in the subsequent convolution layers. The convolution kernel size of all convolution layers except the first is set to 3 × 1. After each convolution operation, batch normalization (BN) is performed, and then maximum pooling of 2 × 1 is carried out. The structure of WFK-CNN is shown in Figure 1.

##### 2.2. Batch Normalization (BN) Algorithm

The BN algorithm can reduce internal covariate shift, improve the training efficiency of the network, and enhance its generalization ability [19–21]. In this paper, a batch normalization layer is added between each convolutional layer and its activation layer and between the fully connected layer and its activation layer. Its main operation is to subtract the Mini-Batch mean from the input of the convolutional or fully connected layer and then divide by the Mini-Batch standard deviation, which accelerates the training process. However, this may limit the input values to a narrower range and reduce the expressive ability of the network. Therefore, the normalized value is multiplied by a scaling amount *γ* and shifted by an offset *β* to restore expressiveness. When BN acts on the fully connected layer, let the input of the *l*th BN layer for the *i*th sample of a Mini-Batch of size $m$ be $y_i^{l}$; the batch normalization operation is then expressed in equations (1)–(4):

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} y_i^{l} \tag{1}$$

$$\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}\left(y_i^{l} - \mu_B\right)^2 \tag{2}$$

$$\hat{y}_i^{l} = \frac{y_i^{l} - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \tag{3}$$

$$z_i^{l} = \gamma^{l}\,\hat{y}_i^{l} + \beta^{l} \tag{4}$$

where $y_i^{l}$ is the input of the *l*th BN layer, $\gamma^{l}$ and $\beta^{l}$ are the BN layer scaling and offset, $z_i^{l}$ is the BN layer output, and *ε* is the numerical stability constant term.

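When BN acts on the convolutional layer, the statistics are computed per feature map over both the Mini-Batch and all positions of the map. With $y_{i,j}^{l}$ denoting the input of the *l*th BN layer at position $j$ (of $p$) for sample $i$ (of $m$), the batch normalization operation is represented as equations (5)–(8):

$$\mu_B = \frac{1}{mp}\sum_{i=1}^{m}\sum_{j=1}^{p} y_{i,j}^{l} \tag{5}$$

$$\sigma_B^2 = \frac{1}{mp}\sum_{i=1}^{m}\sum_{j=1}^{p}\left(y_{i,j}^{l} - \mu_B\right)^2 \tag{6}$$

$$\hat{y}_{i,j}^{l} = \frac{y_{i,j}^{l} - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \tag{7}$$

$$z_{i,j}^{l} = \gamma^{l}\,\hat{y}_{i,j}^{l} + \beta^{l} \tag{8}$$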

In Guo et al. (2018), the mean $\mu_B$ and variance $\sigma_B^2$ of each Mini-Batch are used to obtain unbiased estimates of the mean $\mu$ and variance $\sigma^2$ of the entire training sample. This paper instead computes $\mu$ and $\sigma^2$ of the entire training sample directly. During testing, $\mu_B$ and $\sigma_B^2$ are replaced with $\mu$ and $\sigma^2$.
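The forward BN operation for both layer types, and the substitution of stored statistics at test time, can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function name `batch_norm`, the array shapes, the value of `eps`, and the random data are all assumptions made for the example.

```python
import numpy as np

def batch_norm(x, gamma, beta, axes, eps=1e-5, stats=None):
    """Normalize x over `axes`, then scale by gamma and shift by beta.

    stats: optional (mean, var) used instead of the batch statistics,
           e.g. statistics of the whole training set at test time.
    Returns the output and the (mean, var) that were used.
    """
    if stats is None:
        mean = x.mean(axis=axes, keepdims=True)
        var = x.var(axis=axes, keepdims=True)  # biased (1/m) estimate
    else:
        mean, var = stats
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta, (mean, var)

rng = np.random.default_rng(0)

# Fully connected case: shape (batch, features), statistics per feature.
fc_in = rng.normal(3.0, 2.0, size=(256, 100))
fc_out, fc_stats = batch_norm(fc_in, gamma=1.0, beta=0.0, axes=(0,))

# Convolutional case: shape (batch, channels, length), statistics per
# channel, pooled over the batch and over all positions.
conv_in = rng.normal(-1.0, 0.5, size=(256, 16, 128))
conv_out, conv_stats = batch_norm(conv_in, gamma=1.0, beta=0.0, axes=(0, 2))

# Test time: reuse the stored statistics instead of those of the test batch.
test_batch = rng.normal(3.0, 2.0, size=(8, 100))
test_out, _ = batch_norm(test_batch, 1.0, 0.0, axes=(0,), stats=fc_stats)
```

The only difference between the two cases is the set of axes over which the mean and variance are taken, which is why a single helper with an `axes` argument is used here.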

When BN acts on a fully connected layer, the derivatives of the loss function *L* with respect to the neurons of the BN layer and its learnable parameters *γ* and *β* are shown in the following equation:

When BN acts on the convolution layer, the derivatives of the loss function *L* with respect to the neurons of the BN layer and its learnable parameters *γ* and *β* are shown in the following equation:
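As an illustration of these derivatives for the fully connected case, the sketch below implements the standard BN backward pass and checks one entry against a finite-difference estimate. It assumes the usual BN formulation (biased Mini-Batch variance, `eps` inside the square root); the function names, the toy loss, and the array sizes are hypothetical and chosen only for the check.

```python
import numpy as np

def bn_forward(y, gamma, beta, eps=1e-5):
    """BN over the batch axis for an input of shape (batch, features)."""
    mu = y.mean(axis=0)
    var = y.var(axis=0)
    y_hat = (y - mu) / np.sqrt(var + eps)
    return gamma * y_hat + beta, (y_hat, var, gamma, eps)

def bn_backward(dz, cache):
    """Gradients of the loss w.r.t. the BN input y, gamma and beta."""
    y_hat, var, gamma, eps = cache
    dgamma = np.sum(dz * y_hat, axis=0)
    dbeta = np.sum(dz, axis=0)
    dy_hat = dz * gamma
    # Compact form obtained by chaining through the batch mean and variance.
    dy = (dy_hat - dy_hat.mean(axis=0)
          - y_hat * np.mean(dy_hat * y_hat, axis=0)) / np.sqrt(var + eps)
    return dy, dgamma, dbeta

rng = np.random.default_rng(0)
y = rng.normal(size=(8, 4))
gamma = rng.normal(size=4)
beta = rng.normal(size=4)
w = rng.normal(size=(8, 4))  # fixed weights defining a toy loss

def loss(y_in):
    z, _ = bn_forward(y_in, gamma, beta)
    return np.sum(z * w)

z, cache = bn_forward(y, gamma, beta)
dy, dgamma, dbeta = bn_backward(w, cache)  # dL/dz equals w for this loss

# Central finite difference for a single input entry.
h = 1e-6
y_plus, y_minus = y.copy(), y.copy()
y_plus[0, 0] += h
y_minus[0, 0] -= h
numeric = (loss(y_plus) - loss(y_minus)) / (2 * h)
gradient_error = abs(numeric - dy[0, 0])
```

For the convolutional case the same expressions apply with the sums and means taken over both the batch axis and the position axis of each feature map.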

##### 2.3. WFK-CNN Parameter Standard

In the WFK-CNN model, the size and step size of the first-layer convolution kernel and the number of convolutional layers must be chosen. The core concept of the convolutional neural network is the receptive field, which is the range of the underlying network that a neuron perceives [23–27]. The profile of a neuronal receptive field is shown in Figure 2.

For the WFK-CNN filtering stage to learn displacement-independent features, the receptive field on the input signal of a neuron in the last pooling layer should cover more than one period. Suppose that the size of this receptive field is $R^{(0)}$, *T* is the number of data points recorded by the accelerometer during one rotation of the bearing, and *L* is the length of the input signal. Then, $T \le R^{(0)} \le L$ can be used as a criterion for WFK-CNN parameter selection.

First, the relationship between the receptive field $R^{(l)}$ of the *l*th pooling layer and the receptive field $R^{(l-1)}$ of the (*l* − 1)th pooling layer is shown in the following equation:

$$R^{(l-1)} = S^{(l)}\left(P^{(l)} R^{(l)} - 1\right) + W^{(l)} \tag{11}$$

where $S^{(l)}$ and $W^{(l)}$ are the step size and convolution kernel scale of the *l*th convolutional layer and $P^{(l)}$ is the number of downsampling points of the *l*th pooling layer.

When $l > 1$, $S^{(l)} = 1$, $W^{(l)} = 3$, and $P^{(l)} = 2$, equation (11) can be simplified to the following equation:

$$R^{(l-1)} = 2R^{(l)} + 2 \tag{12}$$

When $l = n$ and $R^{(n)} = 1$, the receptive field of a neuron of the last pooling layer on the first pooling layer is shown in the following equation:

$$R^{(1)} = 3 \times 2^{n-1} - 2 \tag{13}$$

where *n* is the number of convolutional layers.

Substituting equation (13) into equation (11) with $l = 1$ and $P^{(1)} = 2$, the receptive field on the input signal of a neuron of the last pooling layer is obtained in the following equation:

$$R^{(0)} = S^{(1)}\left(3 \times 2^{n} - 5\right) + W^{(1)} \tag{14}$$

According to the design rule $T \le R^{(0)} \le L$, the final design criterion can be determined as

$$T \le S^{(1)}\left(3 \times 2^{n} - 5\right) + W^{(1)} \le L \tag{15}$$

For the input signal length *L* and the signal period *T* used in this study, if there are 5 convolutional layers, then the step size of the first-layer convolution can only be 8 or 16. The number of neurons in the network increases when the number or scale of the convolution kernels or the number of network layers is increased, or when the step size of the convolution kernel is reduced; this increases the expressiveness of the network but at the risk of overfitting. Other hyperparameters of the network need to be further adjusted in the experiment according to the amount of training data [28–34].

The parameters of the WFK-CNN model used in this study are shown in Table 1. It has five convolution and pooling layers. The size of the convolution kernel in the first layer is 64 × 1, and the other convolution kernels are 3 × 1. The number of neurons in the hidden layer is 100, and the softmax layer has 10 outputs, corresponding to 10 kinds of bearing states.
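The receptive-field calculation of this section can be checked numerically with a short sketch. It assumes the standard layer-to-layer relation in which a pooling neuron covering `r` outputs of the next layer sees `s * (pool * r - 1) + w` inputs, with step size 1, kernel size 3, and pooling 2 for all layers after the first, and pooling 2 for the first layer as well. The function names are hypothetical; the kernel size 64, the 5 layers, and the candidate step sizes 8 and 16 are taken from the text above.

```python
def receptive_field(n_layers, first_stride, first_kernel,
                    stride=1, kernel=3, pool=2):
    """Receptive field on the input of one neuron in the last pooling layer."""
    r = 1  # a single neuron in the last (n-th) pooling layer
    for layer in range(n_layers, 0, -1):
        # Layer 1 uses the wide first-layer kernel; the others use 3 x 1.
        s, w = (first_stride, first_kernel) if layer == 1 else (stride, kernel)
        r = s * (pool * r - 1) + w
    return r

def closed_form(n_layers, first_stride, first_kernel):
    """Closed form of the same recursion for stride 1, kernel 3, pooling 2."""
    return first_stride * (3 * 2 ** n_layers - 5) + first_kernel

for s1 in (8, 16):
    print(s1, receptive_field(5, s1, 64), closed_form(5, s1, 64))
# prints:
# 8 792 792
# 16 1520 1520
```

A candidate first-layer step size can then be screened by comparing the returned value with the signal period and the input length.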

##### 2.4. ABN-CNN Model

The ABN-CNN model integrates the WFK-CNN algorithm with the Adaptive Batch Normalization (Ad-BN) algorithm. As a result, the domain adaptation ability can be effectively improved when the distribution of the test samples differs from that of the training samples. The ABN-CNN model is briefly described in Table 2.

The flowchart of the ABN-CNN algorithm is shown in Figure 3. The training samples are used to train the WFK-CNN model until the training process is completed. If the distributions of the training signal and the test signal are not consistent, the test set is input to the WFK-CNN model with forward propagation only, and the mean and variance of all BN layers are replaced by the mean and variance computed on the test set, while all other network parameters remain unchanged.
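The statistic replacement step can be illustrated on a single BN layer. The sketch below is a simplified stand-in, not the authors' code: the class `BNLayer`, its method names, and the synthetic "training" and "test" distributions are assumptions. In a full network, the same replacement would be applied to every BN layer in turn while the test set is propagated forward.

```python
import numpy as np

class BNLayer:
    """BN layer with stored statistics (hypothetical helper for illustration)."""

    def __init__(self, n_features, eps=1e-5):
        self.gamma = np.ones(n_features)   # learned scale, left untouched
        self.beta = np.zeros(n_features)   # learned offset, left untouched
        self.mean = np.zeros(n_features)
        self.var = np.ones(n_features)
        self.eps = eps

    def fit_statistics(self, x):
        """Store the mean and variance of x (shape: samples x features)."""
        self.mean = x.mean(axis=0)
        self.var = x.var(axis=0)

    def __call__(self, x):
        x_hat = (x - self.mean) / np.sqrt(self.var + self.eps)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 4))
test = rng.normal(2.0, 3.0, size=(1000, 4))  # shifted and rescaled domain

bn = BNLayer(4)
bn.fit_statistics(train)   # statistics of the training set
before = bn(test)          # test activations are badly centred

bn.fit_statistics(test)    # Ad-BN step: only the statistics are replaced
after = bn(test)           # activations are re-centred and re-scaled
```

Because `gamma` and `beta` are not modified, no labels from the test domain are needed for this step.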

#### 3. Performance Results

In this study, data from the CWRU Rolling Bearing Data Center, which provides a world-recognized standard data set, were used to evaluate the proposed model for bearing fault diagnosis and to compare it with existing algorithms [35]. These ball bearing test data came from experiments conducted using a 2 hp Reliance Electric Motor. Accordingly, the actual test conditions of the motor support the use of the proposed model in industrial applications.

##### 3.1. Performance Analysis of Bearing Fault Diagnosis in Noise Environment

In this section, the anti-noise capability of the ABN-CNN algorithm is analyzed. First, noise is added to the test samples to simulate noise pollution in an industrial environment. The noise in diagnostic signals is generally additive white Gaussian noise. Therefore, the test signals contain different levels of additive white Gaussian noise.

Signal-to-noise ratio (SNR) is defined from the ratio of $P_{\text{signal}}$ to $P_{\text{noise}}$, which represent the energy of the signal and the noise, respectively:

$$\mathrm{SNR_{dB}} = 10\log_{10}\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right) \tag{16}$$

Figure 4 shows, from top to bottom, the inner-ring fault signal of the bearing without added noise, the additive white Gaussian noise signal, and the inner-ring fault signal with noise, where SNR = 0 dB. It can be seen that it is difficult to extract valid fault information directly from the noisy signal.
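Adding white Gaussian noise at a prescribed SNR can be sketched as follows. This is a minimal example under stated assumptions: the function names `add_awgn` and `measured_snr_db`, the sinusoidal test signal, and its length and sampling rate are hypothetical and are not taken from the experiments; the noise power is derived from the usual decibel definition of SNR.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Return (noisy signal, noise) with noise scaled to the target SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise, noise

def measured_snr_db(signal, noise):
    """SNR in dB computed from the realized signal and noise powers."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
t = np.arange(2048) / 12000.0            # hypothetical sampling grid
clean = np.sin(2 * np.pi * 160.0 * t)    # stand-in for a vibration signal
noisy, noise = add_awgn(clean, snr_db=0.0, rng=rng)
realized = measured_snr_db(clean, noise)
```

The realized SNR fluctuates slightly around the target because the noise power is itself estimated from a finite number of samples.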

##### 3.2. Model Training and Testing

The training convergence curves of the recognition rate with and without BN in the WFK-CNN model are shown in Figure 5. The curves of the target function value with and without BN, plotted on a logarithmic scale, are shown in Figure 6. The size of each Mini-Batch is set to 256, and the learning rate of the Adam algorithm is set to 0.001.

##### 3.3. The Influence of the Convolution Kernel Size on the Noise

The influence of the convolution kernel size on the recognition rate under noise is studied for the WFK-CNN model and the ABN-CNN model. In the test, the SNR ranges from −4 dB to 10 dB, and the convolution kernel size ranges from 16 to 128.

As shown in Table 3, the WFK-CNN model generally achieves a higher recognition rate in a noisy environment when the convolution kernel is large. For example, at an SNR of 0 dB, the recognition rate exceeds 90% when the convolution kernel size is 112 × 1 but is as low as 55.46% when the convolution kernel size is 16 × 1. However, when the convolution kernel size increases to 128 × 1, the recognition rate may decrease slightly. For example, the recognition rate of the 128 × 1 convolution kernel is 89.65%, which is slightly lower than the 89.93% of the 96 × 1 convolution kernel. In Table 4, when the SNR is >−4 dB, the recognition rate of the ABN-CNN model is generally above 90%. Compared with the WFK-CNN model, the recognition rate is significantly improved, especially at lower SNR and smaller convolution kernel size. For example, when SNR = −4 dB, the WFK-CNN model with a 64 × 1 convolution kernel reaches only 51.89%, but the ABN-CNN model reaches 92.56%. These results show that the convolution kernel size influences the recognition rate considerably, particularly at low SNR. Accordingly, a suitably large convolution kernel should be selected when the noise is strong, especially in the WFK-CNN model.

##### 3.4. Anti-Noise Comparison

The anti-noise comparison between ABN-CNN, WFK-CNN, FFT-SVM, FFT-MLP, and FFT-DNN models is shown in Figure 7. SVM uses a radial basis kernel function, and the number of neurons per layer of the FFT-DNN model is selected as 1025, 500, 200, 100, and 10, respectively. The convolution kernel size of the ABN-CNN and WFK-CNN models is set to 112 × 1.
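The 1025 input neurons of the FFT-DNN model would be consistent with the one-sided magnitude spectrum of a 2048-point real-valued segment, which has 2048/2 + 1 = 1025 frequency bins. The sketch below shows that preprocessing step under this assumption; the segment length, the normalization by the segment length, and the function name `fft_features` are not stated above and are hypothetical.

```python
import numpy as np

def fft_features(segment):
    """One-sided magnitude spectrum of a real-valued segment."""
    return np.abs(np.fft.rfft(segment)) / len(segment)

rng = np.random.default_rng(0)
segment = rng.normal(size=2048)   # hypothetical 2048-point vibration segment
features = fft_features(segment)
print(features.shape)  # prints (1025,)
```

The resulting feature vector would then be fed to the SVM, MLP, or DNN classifier in place of the raw time-domain signal.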

It can be seen that the recognition rate of FFT-DNN is higher than that of FFT-MLP, but the noise immunity of both is relatively weak. For example, at SNR = 4 dB, the recognition rate of both models is less than 90%. In addition, the WFK-CNN model has higher noise and interference immunity than the FFT-DNN and FFT-MLP models, but slightly lower than the FFT-SVM model. On the other hand, across the noise environments tested, the recognition rate of the ABN-CNN model exceeds 90%, which is the strongest anti-noise ability of all models.

##### 3.5. Fault Diagnosis Performance Analysis under Variable Load Condition

Figure 8 shows normalized diagnostic signals with 0.014 inch inner-ring defects under different loads. As can be seen, the features of the vibration signal vary with the load. In addition to the inconsistent amplitude, the period and phase of the fluctuations are also very different. This phenomenon makes it difficult to correctly classify the extracted features for fault recognition.

The proposed WFK-CNN (Ad-BN) model, trained on data recorded under motor loads of 1 hp, 2 hp, and 3 hp, is of great practical significance for diagnosing vibration signals, particularly when the load changes. The variable-load settings for training and testing are described in Table 5. The data from one load, e.g., set A, are used for training, and the data from the other two loads, e.g., sets B and C, are used for testing, and so on.

##### 3.6. Performance Analysis under Various Scenarios

As shown in Figure 9, the average recognition rate of the FFT-SVM algorithm is less than 70%, and the FFT-MLP and FFT-DNN algorithms reach about 80%, while the WFK-CNN algorithm achieves an average recognition rate of 90%. The ABN-CNN algorithm further improves the average recognition rate to 95.9%. In particular, when set B (2 hp load) is used for training and set A (1 hp load) is used for testing, the recognition rates of the FFT-SVM, FFT-MLP, and FFT-DNN models are 20% lower than that of the WFK-CNN model. On the other hand, the ABN-CNN model has on average about 6% higher recognition rate than the WFK-CNN model. When set C (3 hp load) is used for training and set A or B is used for testing, the recognition rate of the ABN-CNN model is more than 10% higher than that of the WFK-CNN model.

#### 4. Conclusions

In this study, the proposed ABN-CNN model presents a relatively simple network that uses a large-scale convolution kernel in the first layer followed by small-kernel convolution layers. Unlike approaches based on the traditional Fourier transform, it is trained as a deep CNN with the adaptive batch normalization algorithm for bearing fault diagnosis. It can effectively reduce the difficulty in adjusting the parameters of the WFK-CNN model. When the SNR exceeds −4 dB, the proposed model reaches an average recognition rate of more than 90% using the one-dimensional bearing vibration signal. However, if the distribution of the test samples is significantly different from that of the training samples, the diagnostic performance may decrease. Noise interference and load change may also affect the recognition rate. Under more favorable conditions with higher SNR, recognition rates as high as 99% can be achieved. In general, the experimental results confirm that the ABN-CNN algorithm can considerably improve the recognition rate of the WFK-CNN model and that it is superior to the FFT-SVM, FFT-MLP, and FFT-DNN algorithms. Future research can focus on increasing the recognition rate at low SNR and under load variation in noisy conditions. In addition, the axial piston pump, a widely used type of hydraulic pump, is one of the critical noise sources in industry. Accordingly, fault diagnosis of axial piston pumps using convolutional neural networks or deep belief networks via iterative learning processes can be studied further.

#### Data Availability

The data that support the findings of this study are openly available in the CWRU Rolling Bearing Data Center at http://csegroups.case.edu/bearingdatacenter/home.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### Authors’ Contributions

Chao Fu developed the model. Qing Lv collected the data and carried out the performance analysis. Hsiung-Cheng Lin helped edit the manuscript. All the authors contributed to the writing of the final research paper.

#### Acknowledgments

This work was supported by the Key Fund of Hebei Provincial Education Department (Grant no. ZD2018028).