#### Abstract

Deep learning is able to mine complex relationships for fault diagnosis. A deep convolutional neural network (DCNN), with its deep rather than shallow structure, can be applied to mining useful information from original vibration data. However, when the number of training samples is small, the diagnosis accuracy suffers. As an improvement of the DCNN, the deep convolutional neural network based on the Fisher-criterion (FDCNN) can be used for fault diagnosis with small samples. But the model parameters in that method are set by human labor or prior knowledge, which is bound to bring a negative influence on the diagnosis accuracy. Therefore, a novel adaptive Fisher-based deep convolutional neural network (AFDCNN) method, which optimizes the model parameters adaptively, is proposed as an improvement of the FDCNN. Comparative verification test results show that the AFDCNN has more outstanding performance.

#### 1. Introduction

Intelligent rotating machinery health monitoring and fault diagnosis bring many benefits; for example, they reduce the dependence on costly training and highly skilled operators and detect potential hazards before a catastrophic failure occurs [1, 2]. Meanwhile, they also reduce the operation and maintenance costs of complex engineering systems. Rolling bearings are widely used as critical moving parts of rotating machinery, so their state of health matters [3, 4]. Therefore, intelligent health condition monitoring and accurate fault diagnosis of rolling element bearings are of great significance.

To meet the aforementioned needs, fault diagnosis methods such as the back propagation neural network (BPNN) and the support vector machine (SVM) have been used for machinery health monitoring [5–11]. However, as rotating machinery grows in scale, speed, and complexity, an ideal fault diagnosis method should identify the health status of the diagnosed object accurately, quickly, and intelligently. Indeed, there are now more stringent requirements that fault diagnosis methods be more intelligent.

As a great progress in diagnosis methods, deep learning [12] can solve the problems that traditional fault diagnosis methods face: they have to extract features on the basis of prior knowledge and have limited capacity to mine the hidden relationships needed for quantitative fault diagnosis. The deep convolutional neural network (DCNN), with its deep structure, can be established on the basis of deep learning theory [13, 14]. It can mine distributed features from the original vibration data adaptively [15, 16]. Since deep learning theory was introduced into mechanical fault diagnosis, it has attracted a lot of attention [17]. Jun Lee and Kim [18] proposed a novel algorithm for localizing slab identification numbers (SINs) in factory scenes by using a DCNN. Bai [19] used a DCNN to extract features and achieved good diagnostic results. Guo et al. [20] proposed a hierarchical learning-rate-adaptive deep convolutional neural network based on an improved algorithm and applied it to bearing fault diagnosis. Verstraete et al. [21] proposed a fault diagnosis method based on the DCNN and time-frequency image analysis and achieved good results on two public datasets of rolling element bearing vibration signals. Zhuang et al. [22] proposed a novel deep learning method based on the DCNN and achieved ideal results as well. Zhang et al. [23] proposed a deep graph convolutional network on the basis of graph convolution operators, graph coarsening methods, and graph pooling operations; the experimental results demonstrate that the proposed method can detect different kinds and severities of faults in roller bearings by learning from the constructed graphs. Wang et al. [24] proposed an enhanced intelligent diagnosis method based on multisensor data fusion and the DCNN, which achieved higher prediction accuracy and more obvious visualization clustering effects.

The aforementioned applications show that the DCNN is a potential tool for the fault diagnosis of rolling element bearings. However, as a diagnosis model built from training samples, the DCNN is also influenced by the number of training samples [25]. Here lies the problem: labeled experimental vibration samples are not always sufficient, and some of them are very difficult to obtain [26, 27]. The deep convolutional neural network based on the Fisher-criterion (FDCNN) has been used for word recognition with small samples [28]. Aiming at this shortcoming of the DCNN and learning from related methods in image recognition, this paper adopts the Fisher classification criterion in the back propagation stage of model training. However, the model parameters in [28] are set based on prior knowledge, which is bound to bring a negative influence on the recognition accuracy. Therefore, a novel adaptive Fisher-based deep convolutional neural network (AFDCNN) method, in which the model parameters can be optimized adaptively, is proposed for the fault diagnosis of bearings in this paper. The advantages of the proposed method are as follows:

1. The AFDCNN is able to extract fault features from the original data adaptively.
2. The AFDCNN is able to establish the hidden relationship between the machinery health conditions and the measured signals adaptively.
3. With limited samples, the AFDCNN achieves better performance than the DCNN.
4. The proposed method avoids dependence on expert experience to some extent.

The architecture of the paper is organized as follows. First, a brief introduction to the traditional DCNN is given. Second, the DCNN model is improved based on the idea of the Fisher-criterion, which steers the learned features in a direction more conducive to classification. However, the model parameters in this method are based on prior knowledge, which is bound to bring a negative influence on the diagnosis accuracy. Therefore, the FDCNN model is further improved by using optimization algorithms to tune the parameter combination adaptively, and thus the AFDCNN is proposed. Third, the collected bearing fault samples are used for two purposes: the training sample set is used to build the model, and the test sample set is used to verify it. Furthermore, a contrast verification between the traditional methods and the AFDCNN method is carried out. Fourth, the conclusion is drawn.

#### 2. Brief Introduction to the DCNN

Essentially, a typical 10-layer DCNN model, shown in Figure 1, has two parts [20]: the feature extractor and the Softmax classifier. The feature extractor has one input layer, three alternating convolutional layers (C-layers) and max-pooling layers (P-layers), and two full connection layers (FC-layers). The C-layer is used for feature extraction, and the P-layer is used for resampling. After several alternating C-layers and P-layers, the FC-layers compute the class scores. Then, the class scores are input into the Softmax classifier and the diagnosis results are obtained.
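The alternation of C- and P-layers can be followed with simple size bookkeeping. The sketch below is illustrative only: the input width of 32 and the 3 × 3 kernels are assumptions, not values from the paper, and 'valid' convolutions with non-overlapping 2 × 2 pooling are assumed.

```python
def conv_out(n, k):
    # a 'valid' convolution with a k x k kernel shrinks each dimension by k - 1
    return n - k + 1

def pool_out(n, s=2):
    # non-overlapping pooling with an s x s window divides each dimension by s
    return n // s

n = 32                       # hypothetical input width
for k in (3, 3, 3):          # three alternating C- and P-layers
    n = pool_out(conv_out(n, k))
print(n)                     # width of the feature map handed to the FC-layers
```

With these assumed sizes the map shrinks 32 → 30 → 15 → 13 → 6 → 4 → 2 before the FC-layers flatten it into class scores.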

##### 2.1. The Convolutional Layer (C-Layer)

The filter bank of the *l*-th C-layer is described as follows:

$$K^{l} = \left\{ k_{1}^{l}, k_{2}^{l}, \ldots, k_{n^{l}}^{l} \right\},$$

in which $k_{i}^{l}$ is a linear filter of the *l*-th layer, its size is $a \times a$, and $i = 1, 2, \ldots, n^{l}$, where $n^{l}$ is the number of different kernels or filters in the bank $K^{l}$. A matrix $x^{l-1}$ with size $b \times b$ is convolved with the filter $k_{i}^{l}$. The operation can be written as

$$y_{i}^{l} = f\left( x^{l-1} \ast k_{i}^{l} + b_{i}^{l} \right),$$

where $\ast$ denotes the convolution operation, $b_{i}^{l}$ is the bias term, and $f(\cdot)$ is the activation function. The Softmax function is

$$\sigma(z)_{j} = \frac{e^{z_{j}}}{\sum_{k} e^{z_{k}}}.$$
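As a toy illustration of the convolution step, a one-dimensional 'valid' cross-correlation (the operation CNN libraries actually compute under the name "convolution") can be sketched with NumPy; the signal and filter values below are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy input signal
w = np.array([1.0, 0.0, -1.0])           # toy filter (a simple edge detector)

# 'valid' cross-correlation: output length = len(x) - len(w) + 1
y = np.correlate(x, w, mode="valid")
print(y)  # each output element equals x[i] - x[i+2]
```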

##### 2.2. The Pooling Layer (P-Layer)

The P-layer is used for resampling. After the pooling operation, the matrix size becomes

$$b^{l} = \frac{b^{l-1}}{s},$$

in which $b^{l-1}$ is the size of the input sample of the *l*-th layer and $s$ is the downsampling size; for example, when the mean-sampling method is used, $s$ is 2.

##### 2.3. The Softmax Classifier

The Softmax classifier can be described as follows:

$$h_{\theta}(x) = f\left( \theta^{T} x \right),$$

where $f(\cdot)$ is an activation function and $\theta$ is its parameter. The parameter $\theta$ is learned from a training set, and $x$ is the learned feature. The result of the equation above is a label score between 0 and 1. Furthermore, the predicted class $\hat{c}$ and its score $\hat{p}$ can be described as

$$\hat{c} = \arg\max_{j} h_{\theta}(x)_{j}, \qquad \hat{p} = \max_{j} h_{\theta}(x)_{j}.$$
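A compact NumPy sketch of the classifier's final step, using arbitrary toy class scores (the max-subtraction is a standard numerical-stability trick, not something the paper specifies):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()       # scores normalized to probabilities in (0, 1)

scores = np.array([1.0, 2.0, 0.5])  # toy class scores from the FC-layer
p = softmax(scores)
pred = int(np.argmax(p))            # predicted class index
```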

Compared to traditional fault diagnosis methods, the DCNN has won widespread attention by relying on its adaptive feature extraction. The reconstruction error between the inputs and outputs is selected as the energy function in the method. The connection weights of the network are optimized and adjusted through the forward and back propagation processes so that the energy function is minimized. The weight-sharing principle is used in the forward propagation process to reduce the complexity of the algorithm. The obtained sample feature vector is adjusted by the weights and biases, and the sample prediction labels are then obtained through an activation function. To obtain a better training model, the process of weight optimization is one of the key factors.

#### 3. Adaptive Fisher-Based Deep Convolutional Neural Network (AFDCNN)

##### 3.1. The Traditional Process of Model Training

Assuming that $m$ samples $\left\{ \left( x^{(1)}, y^{(1)} \right), \ldots, \left( x^{(m)}, y^{(m)} \right) \right\}$ constitute the sample set and they belong to $k$ categories, the traditional energy function [26] can be represented as follows:

$$J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}\left( x^{(i)} \right) - y^{(i)} \right\|^{2},$$

where $W$ is the weight value of each unit, $b$ is the bias term, and $h_{W,b}\left( x^{(i)} \right)$ is the output of the last neural network layer, namely, the fault-pattern index of the sample $x^{(i)}$. The target of training the network is to find the minimum value of the function $J(W, b)$ by adjusting $W$ and $b$. Using the gradient descent method to optimize the objective function, the iterative formulas can be represented as

$$W := W - \alpha \frac{\partial J(W, b)}{\partial W}, \qquad b := b - \alpha \frac{\partial J(W, b)}{\partial b},$$

where $\alpha$ is the learning rate. Before using the back propagation algorithm, the first step is forward propagation, which is used to calculate the output value $a^{(n_{l})}$ of the last layer of the network. Then, the error between $a^{(n_{l})}$ and the actual value can be calculated. The error can be represented as

$$\delta_{i}^{(n_{l})} = \frac{\partial J(W, b)}{\partial z_{i}^{(n_{l})}} = -\left( y_{i} - a_{i}^{(n_{l})} \right) f'\left( z_{i}^{(n_{l})} \right),$$

where $n_{l}$ is the index of the output layer, $z_{i}^{(l)}$ is the summed input of unit $i$ in layer $l$, and $z_{i}^{(n_{l})}$ is the summed input of unit $i$ in the last layer. The minimum error between the input tag value and the prediction value is used as the energy function in the back propagation process to adjust the weights accurately.
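The gradient-descent update above can be illustrated on a deliberately tiny case: a single scalar weight $w$ and bias $b$ with a squared-error energy. This is a toy sketch, not the paper's network; the values of $x$, $y$, and $\alpha$ are arbitrary.

```python
def energy(w, b, x, y):
    # squared-error energy for a one-unit linear "network"
    return 0.5 * (w * x + b - y) ** 2

def gd_step(w, b, x, y, alpha=0.1):
    # gradient-descent update: W := W - alpha * dJ/dW, b := b - alpha * dJ/db
    err = w * x + b - y
    return w - alpha * err * x, b - alpha * err

w, b = 0.0, 0.0
before = energy(w, b, x=2.0, y=1.0)
w, b = gd_step(w, b, x=2.0, y=1.0)
after = energy(w, b, x=2.0, y=1.0)   # strictly smaller than `before`
```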

##### 3.2. The Optimization Process of Model Training

In the back propagation process of the DCNN, the adjustment of the weights can be steered toward directions that favor classification based on the idea of the Fisher-criterion. At the same time, the search space of the weight iteration is constrained by the discriminant conditions, which further guides the learned features toward class separability.

$S_{w}$ is the similarity measure function within a class, defined as the summed distance of all samples to their category mean; $S_{b}$ is the similarity measure function between classes, defined as the summed distance of the category means to the overall mean:

$$S_{w} = \sum_{c=1}^{k} \sum_{x_{i} \in c} \left\| a\left( x_{i} \right) - m_{c} \right\|^{2}, \qquad S_{b} = \sum_{c=1}^{k} \left\| m_{c} - m \right\|^{2},$$

where $m_{c}$ is the mean value of the category-$c$ samples, and it can be represented as

$$m_{c} = \frac{1}{N_{c}} \sum_{x_{i} \in c} a\left( x_{i} \right),$$

with $N_{c}$ the number of samples in category $c$, $a\left( x_{i} \right)$ the learned feature of sample $x_{i}$, and $m$ the mean of all samples.
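These two measures can be sketched directly in NumPy. The definitions below follow the common Fisher-style formulation (squared Euclidean distances, unweighted between-class term); the paper's exact normalization may differ, and the sample values are arbitrary:

```python
import numpy as np

def fisher_terms(X, labels):
    # S_w: summed squared distance of samples to their category mean
    # S_b: summed squared distance of category means to the overall mean
    mu = X.mean(axis=0)
    sw, sb = 0.0, 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        sw += ((Xc - mc) ** 2).sum()
        sb += ((mc - mu) ** 2).sum()
    return sw, sb

X = np.array([[0.0], [2.0], [10.0], [12.0]])  # two tight, well-separated classes
labels = np.array([0, 0, 1, 1])
sw, sb = fisher_terms(X, labels)              # small S_w, large S_b
```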

When $S_{w}$ is used as an energy function in the gradient algorithm, after each iteration the predicted category moves closer to the actual one, and when $S_{b}$ is used, the distance between the different categories becomes larger. In order to make the features learned by each DCNN layer more conducive to the diagnosis, the following model is used:

$$J_{\text{total}} = J(W, b) + \lambda_{1} S_{w} - \lambda_{2} S_{b},$$

in which $J(W, b)$ is the energy function of the DCNN and $J_{\text{total}}$ is the overall energy function. The parameter combination $\left( \lambda_{1}, \lambda_{2} \right)$ depends on expert experience, which is bound to bring a negative influence on the training model.

##### 3.3. The Improved Optimization Process of Model Training

In order to avoid the influence of human factors on model training and obtain the parameters adaptively, several optimization algorithms have been adopted and compared.

Before the optimization process, the objective-function residuals can be derived as follows.

For the function $S_{w}$, the residual of each output-layer unit can be represented as

$$\delta_{i}^{(n_{l})} = \left( a_{i}^{(n_{l})} - m_{c,i} \right) f'\left( z_{i}^{(n_{l})} \right).$$

For the function $S_{b}$, the residual of each output-layer unit can be represented as

$$\delta_{i}^{(n_{l})} = \left( m_{c,i} - m_{i} \right) f'\left( z_{i}^{(n_{l})} \right),$$

where $m_{c,i}$ and $m_{i}$ are the $i$-th components of the category mean and the overall mean, respectively.

Particle swarm optimization (PSO) and stochastic gradient descent (SGD) are adopted to optimize the parameter combination $\left( \lambda_{1}, \lambda_{2} \right)$ adaptively.
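To show the shape of such a search, here is a minimal PSO over two parameters. It is a toy stand-in only: in the AFDCNN the objective `f` would be the training objective evaluated at $(\lambda_{1}, \lambda_{2})$, whereas below it is a hypothetical quadratic with a known minimum; the swarm size, bounds, and coefficients are assumptions.

```python
import random

def pso(f, n_particles=20, iters=60, lo=0.0, hi=1.0, seed=0):
    # minimal particle swarm minimizing f over two parameters in [lo, hi]
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi), rng.uniform(lo, hi)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]             # each particle's best position
    pbest_f = [f(p) for p in pos]
    gbest = min(pbest, key=f)[:]            # swarm's best position
    gbest_f = f(gbest)
    w, c1, c2 = 0.7, 1.5, 1.5               # inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(2):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f

# hypothetical objective whose known minimum is at (0.3, 0.6)
target = lambda p: (p[0] - 0.3) ** 2 + (p[1] - 0.6) ** 2
best, best_f = pso(target)
```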

In the model, all the weights can be obtained from the BP algorithm after the last-layer residual error is minimized. For the different working conditions of the same diagnosed object, the optimal parameter combination obtained by the optimization algorithm should ensure that the AFDCNN diagnosis model is quick and accurate.

In this paper, the rolling element bearing is used as the object being diagnosed. Assume that the object being diagnosed has $k$ kinds of faults, each category has $N$ samples, and the sampling frequency is $f_{s}$.

The proposed method includes three convolutional layers, three max-pooling layers, and two full connection layers. The flow chart for AFDCNN is shown in Figure 2.

#### 4. Experimental Comparison

The bearing data are provided by the Case Western Reserve University (CWRU) [29]. The main components of the experimental apparatus were a 2-hp motor, a torque transducer, and a dynamometer. The motor shaft was supported by 6205-2RS JEM SKF bearings. The data were collected with a sampling frequency of 12 kHz, and the sampling time was 1 s. Figure 3 shows time-domain samples of four kinds of health conditions: normal (*N*), outer race fault (OF), inner race fault (IF), and roller fault (RF). Table 1 shows the sample division of the dataset obtained. The configuration of the computer is an Intel(R) Core(TM) i7-7400 CPU with 16 GB RAM.

**(a)**

**(b)**

**(c)**

**(d)**

##### 4.1. Description of the Data

A description of the data is provided in Figure 3 and Table 1.

##### 4.2. Comparison with DCNN, FDCNN, PSO-AFDCNN, and SGD-AFDCNN Method

The convolutional neural network structure of the DCNN and FDCNN is 6*C*-2*S*-12*C*-2*S*-12*C*-2*S*; that is, the model includes three convolutional layers and three pooling layers with a fixed convolution kernel size. Based on experience, the model parameter combination $\left( \lambda_{1}, \lambda_{2} \right)$ was fixed in advance. The fault recognition results are shown in Figures 4 and 5.

**(a)**

**(b)**

The convolutional neural network structure of the AFDCNN is the same as that of the DCNN and FDCNN. The flow chart in Figure 5 describes the hierarchical framework of the proposed method, and the flow chart in Figure 6 describes the architectural hierarchy of the AFDCNN. Table 2 states some of the parameters of the AFDCNN model during training. By using PSO and SGD, the optimal parameter combination $\left( \lambda_{1}, \lambda_{2} \right)$ is obtained, respectively. As shown in Figures 7(a) and 8(a), the minimum stable value of the objective function appeared in the eleventh generation and the eighth generation, respectively. Two optimized combinations are obtained accordingly. The fault recognition results are shown in Figures 7(b) and 8(b).

**(a)**

**(b)**

**(a)**

**(b)**

Table 3 states the models adopted in this paper and the diagnosis results. From the diagnosis results of the different models, it can be seen that both the PSO-AFDCNN and SGD-AFDCNN models have superior recognition rates, and because of the difference in optimization speed, the SGD-AFDCNN shows better performance.

Furthermore, comparisons of the bearing fault quantitative diagnosis between the PSO-AFDCNN model and the SGD-AFDCNN model are carried out, and the diagnosis results are shown in Figures 9 and 10.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

**(c)**

To further analyze the evaluation performance of the two methods, a statistical indicator is used to quantify the accuracy of the second layer of the proposed system. The accumulation error, which denotes the maximum deviation from the actual fault size, is defined as follows:

$$E_{\max} = \max_{i} \left| d_{i}^{\text{pred}} - d_{i}^{\text{actual}} \right|,$$

where $d_{i}^{\text{pred}}$ and $d_{i}^{\text{actual}}$ are the predicted and actual fault sizes of the $i$-th sample.
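Reading the accumulation error as the maximum absolute deviation between predicted and actual fault sizes, it reduces to a one-line computation; the fault-size values below are arbitrary toy numbers, not the paper's data:

```python
def accumulation_error(predicted, actual):
    # maximum absolute deviation between predicted and actual fault sizes
    return max(abs(p - a) for p, a in zip(predicted, actual))

e = accumulation_error([1.0, 2.5, 4.0], [1.2, 2.4, 4.0])  # worst case: 0.2
```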

The formula above is used to calculate the maximum error achieved using the PSO-AFDCNN and SGD-AFDCNN methods, and the results are listed in Table 4 for comparison.

The conclusion that both PSO-AFDCNN and SGD-AFDCNN have excellent diagnosis ability on the bearing experimental data can be drawn from the comparison results. Furthermore, the SGD-AFDCNN showed faster diagnosis speed, while the PSO-AFDCNN obtained better diagnosis accuracy. The superiority of the proposed hierarchical AFDCNN model is confirmed by the experimental comparison results collectively.

#### 5. Conclusion

In this paper, a novel DCNN model, called AFDCNN, is proposed, and a contrast verification among DCNN, FDCNN, PSO-AFDCNN, and SGD-AFDCNN has been carried out on the bearing dataset.

The advantages of the AFDCNN are stated as follows:

1. It is able to extract fault features from the original data adaptively.
2. It is able to establish the hidden relationship between the machinery health conditions and the measured signals adaptively.
3. Both SGD-AFDCNN and PSO-AFDCNN perform well on bearing fault-pattern recognition; SGD-AFDCNN showed better computational speed than PSO-AFDCNN, while, in the process of quantitative diagnosis, PSO-AFDCNN obtained better diagnosis accuracy.
4. The proposed method avoids dependence on expert experience to some extent.

The results of the experiments demonstrated that the proposed AFDCNN model has superior ability compared to other methods, such as DCNN and FDCNN. The AFDCNN model achieved a high degree of fault diagnosis accuracy and offered an automatic feature extraction method which could be a practical and convenient method for the bearing fault diagnosis.

#### Data Availability

The data are available in the website: https://csegroups.case.edu/bearingdatacenter/pages/welcome-case-western-reserve-university-bearing-data-center-website.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grant nos. 51975576 and 51475463) and Defense Industrial Technology Development Program (Grant no. WDZC20195500305).