Abstract

Intelligent bearing fault diagnosis has received much research attention in the field of rotary machinery systems, where miscellaneous deep learning methods are generally applied. Among these methods, the convolutional neural network is particularly powerful because of its ability to learn rich features from the original data. However, normal convolutions cannot fully utilize the information along the data flow while the features are being abstracted in deeper layers. To address this problem, a new supervised learning model is proposed for small sample size bearing fault diagnosis with consideration of imbalanced data. This model, which is developed based on a convolutional neural network, has a high generalization ability, and its performance is verified by conducting two experiments that use data collected from a self-made bearing test rig. The proposed model demonstrates a favorable performance and is more effective and robust than other deep learning methods.

1. Introduction

Rolling element bearings (REBs) are common and important mechanical components used in various industries [1]. They are also the primary sources of mechanical faults in these devices [2]. REBs have three main components, namely, the inner race, the rolling element ball, and the outer race [3]. The REB faults that emerge during running operation may lead to severe safety problems and huge maintenance costs [4]. Therefore, monitoring and diagnosing the health status of REBs, including their fault type, have attracted the attention of many researchers. Over the past few decades, many scholars have examined effective fault diagnosis algorithms for bearings and employed machinery vibration data for bearing diagnosis.

Data-driven intelligent fault diagnosis methods have developed rapidly over the past few years [5]. Many studies have adopted various machine learning algorithms, including K-nearest neighbor (KNN) [6], support vector machines (SVM) [7], random forest (RF) [8], extreme learning machine [9], and artificial neural networks (ANN) [10]. These methods usually involve two procedures, namely, fault feature extraction and fault type classification. Feature extraction relies on prior knowledge and an expert background. The signal features of the time, frequency, or time-frequency domains are usually extracted from the raw vibration signals via short-time Fourier transform (STFT) [11], empirical mode decomposition [12], local mean decomposition [13], or Hilbert–Huang transform [14]. In the fault diagnosis procedure, these extracted features are input into a machine learning model to obtain the diagnosis results. However, one common drawback of these conventional intelligent methods is that effective feature extraction depends on specific domain knowledge and thereby requires much human effort.

With the increase in computational power, deep learning-based methods can productively and rapidly utilize mechanical vibration signals [15] and generate accurate diagnosis results without requiring much expertise [16]. With their powerful representation learning ability, deep learning methods have attracted much attention in fault diagnosis research and have achieved state-of-the-art performance in bearing fault diagnosis. Zhang et al. [17] proposed a fault diagnosis model based on deep neural networks, and their experimental results showed that this model can efficiently recognize bearing fault types. Wang et al. [18] proposed a deep neural network with batch normalization for the diagnosis of bearings and gears. He and He [11] used an optimized deep learning model called the large memory storage retrieval (LAMSTAR) neural network for bearing fault diagnosis and used acoustic emission signals to validate its classification performance. Lu et al. [19] applied a stacked denoising autoencoder (SDA) for bearing fault diagnosis. Shen et al. [20] proposed a method based on a stacked CAE for automatic and robust feature extraction and fault diagnosis of bearings and gearboxes; the method performed well under noise interference. Janssens et al. [21] developed a convolutional neural network and feature learning model to build a fault detection system for outer-raceway faults. Eren et al. [22] used a 1D convolutional neural network to achieve generic real-time bearing fault diagnosis, and their experimental results showed that this network can demonstrate an excellent classification performance.

Although the above studies have applied different algorithms, they generally assume that the datasets employed for validation are balanced. However, in reality, acquiring normal data is much more convenient than acquiring faulty data, thereby resulting in imbalanced datasets. Such a problem arises when the classes of a dataset are not equally distributed [23]. One aim of classic algorithms is to maximize the classification accuracy. However, the employed accuracy metric is biased toward the majority class; that is, a classifier may achieve a very high classification accuracy without accurately predicting the minority class, even though such a class is often more important than the majority class in reality. Accordingly, an accurate prediction or detection of the minority class should be prioritized over the majority class [24]. Therefore, how to fully represent the minority class and how to classify imbalanced data warrant further investigation.

The imbalanced classification problem has received much research attention across different fields. There are two main research streams: one focuses on the data and the other on the algorithm. Xu et al. [25] developed a new framework for fault diagnosis; they first designed and extracted features and then performed feature selection to eliminate redundant features and identify effective ones before using them for fault classification. Zhang et al. [26] proposed a synthetic oversampling approach called weighted minority oversampling (WMO) to balance the imbalanced data distribution and then used the balanced data for fault diagnosis. Mao et al. [27] used the fast Fourier transform to obtain the frequency spectrum of the original vibration signal, applied a generative adversarial network to generate synthetic minority samples, and then fed the balanced synthetic data into a classifier for fault diagnosis. These data-based methods focus on the data distribution; their key idea is to transform imbalanced data into balanced data, and they usually require considerable work on feature extraction and selection. Zhao et al. [28] used a normalized CNN to automatically and accurately identify the condition of rolling bearings; they used imbalanced data and tested the performance of their method on an open-source experimental dataset. Zhang et al. [29] applied a fast clustering algorithm and SVM for rotating machinery fault diagnosis with imbalanced data; however, they only considered four fault classes in their work. Studies on imbalanced data also remain relatively fewer than those focusing on balanced data, probably because the most commonly employed methods are better suited to balanced data than to unbalanced data. However, the percentage of normal samples is usually much larger than that of fault samples in reality; in this case, the diagnosis performance of proposed methods should be tested by using imbalanced data. This study therefore focuses on the development of a model that is applicable to imbalanced data and can deal with different cases.

This study proposes the generative convolutional neural network (GCNN) to diagnose bearing faults while taking imbalanced data into consideration by using small-scale samples. This model is inspired by both the convolutional neural network and the residual neural network [30] and comprises two main structures, namely, the basic network body and the skip connections between the encode and decode parts. This model can concatenate encoding feature maps to the corresponding decoding feature maps. When the features are being upsampled in the decode part, each layer connects to a corresponding layer in the encode part. In this way, GCNN prevents the loss of information during the convolution process without using superfluous parameters.

The rest of this article is organized as follows. Section 2 introduces the proposed model. Section 3 describes the experiment test rig and its setup, the model implementation, and the data preprocessing and training processes. Section 4 discusses the experimental situation, analyzes the experiment results for both balanced data and imbalanced data, explores the generalization ability of GCNN in three scenarios, and compares its performance with that of several deep learning methods. Section 5 concludes the paper.

2. Theoretical Background and the Proposed Model

2.1. Basic Theory of the Convolutional Neural Network

The convolutional neural network (CNN) is a deep feed-forward neural network. The typical CNN architecture includes an input layer, several convolutional layers, several pooling layers, one or two fully connected layers, and an output layer, as illustrated in Figure 1.
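For illustration, the sketch below shows this typical layout in Keras. It is a minimal, assumption-based example (the layer sizes, input shape, and class count are placeholders), not the configuration used later in this paper.

```python
# Minimal sketch of the typical CNN layout described above (tensorflow.keras assumed).
# Layer sizes and the (32, 32, 1) input shape are illustrative placeholders.
from tensorflow.keras import layers, models

def build_basic_cnn(num_classes=10):
    return models.Sequential([
        layers.Input(shape=(32, 32, 1)),                   # input layer
        layers.Conv2D(16, (3, 3), activation="relu"),      # convolutional layer
        layers.MaxPooling2D((2, 2)),                       # pooling layer
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),               # fully connected layer
        layers.Dense(num_classes, activation="softmax"),   # output (softmax) layer
    ])
```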

2.1.1. Convolutional Layer

The convolutional layer in a CNN comprises several convolution filters that aim to generate feature maps through the convolutional functions of each filter. Such convolutional function can be formulated as

$$x_j^{l} = f\left(\sum_{i=1}^{M} x_i^{l-1} \ast w_{ij}^{l} + b_j^{l}\right),$$

where $x_j^{l}$ represents the convolutional output of the $j$th channel of layer $l$, $M$ is the number of filters in this layer, $x_i^{l-1}$ represents the $i$th channel output in the previous layer $l-1$, $w_{ij}^{l}$ represents the weights between this layer and the previous layer, $b_j^{l}$ represents the bias of the $j$th channel of convolutional layer $l$, $\ast$ represents the convolutional operation, and $f(\cdot)$ represents the nonlinear activation function.

2.1.2. Pooling Layer

The pooling layer is another nonlinear downsampling layer that mainly reduces the dimensions of the convolutional layer output through downsampling. This layer has no weights or bias units, and it reduces the number of parameters in the network. Several pooling methods are available, including average pooling, max pooling, logarithmic pooling, and weight pooling, among which max pooling is frequently used in classification tasks. Max pooling obtains the maximum value within a set area and can be formulated as

$$p_j^{l}(m, n) = \max_{0 \le a < W,\; 0 \le b < H} x_j^{l}(s \cdot m + a,\; s \cdot n + b),$$

where $p_j^{l}$ represents the output of the pooling filter, $W$ and $H$ represent the size of the filter, $x_j^{l}$ represents the $j$th channel output in convolutional layer $l$, and $s$ represents the stride, which controls the overlap between the output of the previous layer and the pooling filter.

2.1.3. Fully Connected Layer

After passing through the convolution and pooling layers, the extracted features enter the fully connected layer, where these features are further extracted and connected to the next layer (usually the output layer). All neurons in a fully connected layer are defined as

$$y^{l} = f\left(W^{l} y^{l-1} + b^{l}\right),$$

where $W^{l}$ represents the weight matrix connecting the $(l-1)$th and $l$th layers, $b^{l}$ represents the bias vector, $y^{l-1}$ and $y^{l}$ represent the outputs of the $(l-1)$th and $l$th layers, and $f(\cdot)$ represents the nonlinear activation function.

2.1.4. Softmax Layer

The output layer uses a softmax function, which defines the probability of each class for multiclass classification. In softmax, the probability for one sample to belong to the $j$th class in a $K$-class problem can be defined as

$$p(y = j \mid z) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}},$$

where $z_j$ is the input of the $j$th output neuron.

If the output for a softmax neuron is close to 1, then the outputs for the other neurons will be close to 0. The sum of all predicted class probabilities is 1.
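As a quick numerical illustration of this property (an added sketch with arbitrary logit values, not part of the original derivation), the snippet below computes softmax probabilities and confirms that they sum to 1.

```python
# Numerical check of the softmax property: the probabilities sum to 1 and the
# largest logit dominates. The logit values are arbitrary.
import numpy as np

logits = np.array([4.0, 1.0, 0.5, -2.0])
probs = np.exp(logits) / np.sum(np.exp(logits))   # softmax
print(probs)          # approximately [0.92, 0.05, 0.03, 0.002]
print(probs.sum())    # approximately 1.0
```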

2.2. The Proposed Method: GCNN

The proposed generative convolutional neural network (GCNN) is a new model built based on the architecture of CNN with extra skip connections. This model mainly involves convolution operations, max pooling operations, ReLU activations, concatenations (skip connection), and upsample convolution operations, all of which comprise the encode and decode parts of the model. Figure 2 shows the GCNN architecture.

The input data are fed into the network before they pass through the encode and decode parts. The automatically extracted features are then flattened into a single dimension before being input into a classifier to obtain the output. The encode part learns features at different scales, whereas the decode part combines information from the former layers to obtain highly comprehensive features.

The encode part has three 2D convolutional layers, with each layer having a set of parameters that form learnable kernels (also known as filters). The number of kernels increases at each subsequent layer, and each filter has a small receptive field whose depth is equal to that of the input volume of the layer. The convolution process involves element-wise multiplications where the entries of the filter are multiplied by the input matrix values. This procedure produces layer-by-layer learned feature maps.

The convolutional and fully connected layers use the rectified linear unit (ReLU) as their nonlinear activation functions. The ReLU is a nonsaturating activation function that is defined as

$$f(x) = \max(0, x).$$

The encode part comprises three superpositions of the convolution and max-pooling layers.

Another convolutional layer, which serves as the center of GCNN, is located between the encode and decode parts.

The decode part of GCNN is symmetric to the encode part. The convolutional layers in this part also have 3 × 3 kernels but use a decreasing number of filters to make the feature map return to its original size.

At the end of the network, the learned features are flattened into a single dimension before being fed into the softmax classifier. The softmax layer generates mutually exclusive probabilistic outputs for the categories.

GCNN has an additional skip path to create an expansion feature map. This skip path directly concatenates the encode and decode parts of the model with layers that share similar feature sizes. Afterward, extra rich features are added from the encode part to the corresponding expansion layer of the decode part. The concatenation can benefit the feature representations and retain some information from the prior layer in the encode part that may be lost after the data pass through the encode part.

GCNN can learn to associate the characteristics of the earlier encode part layers with those of the later decode part layers. The principal idea of this model is to extract features from all layers in the decode part separately and finally to use them in the classification layer for classification.
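To make the encode-decode structure and the skip connections concrete, the sketch below outlines a possible GCNN body in the Keras functional API. It is a hedged illustration only: the filter counts, input shape, and class count are assumptions, and the exact configuration of GCNN is given in Table 4 rather than here.

```python
# Hedged sketch of the encode-decode body with skip connections (Keras functional API).
# Filter counts, input shape, and network depth are illustrative assumptions.
from tensorflow.keras import layers, models

def build_gcnn_sketch(input_shape=(64, 64, 1), num_classes=16):
    inputs = layers.Input(shape=input_shape)

    # Encode part: convolution + max pooling, keeping each scale for the skip path
    e1 = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D((2, 2))(e1)
    e2 = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D((2, 2))(e2)
    e3 = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(p2)
    p3 = layers.MaxPooling2D((2, 2))(e3)

    # Center convolution between the encode and decode parts
    c = layers.Conv2D(128, (3, 3), padding="same", activation="relu")(p3)

    # Decode part: upsampling + convolution, concatenated with the matching encode maps
    d3 = layers.UpSampling2D((2, 2))(c)
    d3 = layers.Concatenate()([d3, e3])            # skip connection
    d3 = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(d3)
    d2 = layers.UpSampling2D((2, 2))(d3)
    d2 = layers.Concatenate()([d2, e2])            # skip connection
    d2 = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(d2)
    d1 = layers.UpSampling2D((2, 2))(d2)
    d1 = layers.Concatenate()([d1, e1])            # skip connection
    d1 = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(d1)

    # Flatten the learned features and classify with softmax
    x = layers.Flatten()(d1)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```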

2.3. Initial Bias Setting

When the dataset is imbalanced, the bias can be initialized to reflect the degree of imbalance. Setting the bias correctly can accelerate convergence, and this setting is applied only to the final layer of the model.

The correct bias to set can be derived as

$$p_0 = \frac{pos}{pos + neg}, \qquad b_0 = \ln\left(\frac{pos}{neg}\right),$$

where $p_0$ is the initial probability, $b_0$ is the initial bias of the model, $pos$ represents the number of positive samples, and $neg$ represents the number of negative samples.

When the initial bias is set in this way, the model no longer needs to spend the first few epochs merely learning the class prior implied by the imbalanced samples, thereby saving training time and facilitating convergence. Therefore, setting the bias correctly provides an easy starting point for training the model and facilitates the feature learning process.
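A minimal sketch of this initialization, assuming a binary (positive/negative) output neuron and illustrative sample counts, is shown below; the final-layer bias is set to $\ln(pos/neg)$ so that the initial prediction equals the class prior.

```python
# Sketch of the initial bias setting for an imbalanced binary output.
# The sample counts are illustrative, not the counts used in the experiments.
import numpy as np
from tensorflow.keras import layers, initializers

pos, neg = 100, 45                     # illustrative positive/negative counts
b0 = np.log(pos / neg)                 # initial bias
p0 = pos / (pos + neg)                 # prior probability encoded by b0 (sigmoid(b0) == p0)

# Apply the bias only to the final layer of the model
output_layer = layers.Dense(1, activation="sigmoid",
                            bias_initializer=initializers.Constant(b0))
```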

2.4. Adam Optimizer and Learning Rate Decay

Adaptive moment estimation (Adam) [31] is an algorithm for efficient stochastic optimization that only requires first-order gradients and a small amount of computer memory. Adam updates every parameter with an individual learning rate; that is, each parameter in the network is associated with a specific learning rate. The parameters are updated as follows:

$$\theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},$$

where $\theta_t$ and $\theta_{t-1}$ are the model parameters at the $t$ and $t-1$ timesteps, $\alpha$ is the learning rate, $\epsilon$ is assigned a small value to prevent the divisor from being equal to 0, and $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected first and second moment estimates calculated as

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^{t}},$$

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^{2}.$$

In these equations, $\beta_1$ and $\beta_2$ are the exponential decay rates for the moment estimates and belong to $[0, 1)$, $m_t$ and $m_{t-1}$ are the biased first moment estimates at the $t$ and $t-1$ timesteps, $v_t$ and $v_{t-1}$ are the biased second moment estimates at the $t$ and $t-1$ timesteps, and $g_t$ is the gradient at timestep $t$.

Although Adam adapts the learning rates dynamically, using a learning rate decay schedule can substantially improve the performance of this algorithm. Learning rate decay is triggered when the monitored metric stops improving. For instance, when the testing loss is used as the metric during training, a normally decreasing loss has no effect on the learning rate; however, once the patience number of epochs is reached (i.e., the testing loss stops decreasing or even increases), a learning rate decay takes place. The decayed learning rate is computed by multiplying the current learning rate by a factor:

$$lr_{new} = lr \times factor.$$
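In practice, this combination can be expressed with Keras's Adam optimizer and the ReduceLROnPlateau callback, as sketched below; the decay factor and patience match the values listed in Section 3.4, whereas the initial and minimum learning rates are illustrative assumptions.

```python
# Sketch of the optimizer and learning rate decay schedule described above.
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

optimizer = Adam(learning_rate=1e-3)   # illustrative initial learning rate

lr_decay = ReduceLROnPlateau(
    monitor="val_loss",                # metric watched for improvement
    factor=0.2,                        # new_lr = current_lr * factor when triggered
    patience=5,                        # epochs without improvement before decaying
    min_lr=1e-6,                       # illustrative lower bound
)
```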

2.5. Evaluation Metrics

The most direct way to evaluate the performance of a classifier is to use a confusion matrix. Consider an imbalance binary class classification problem where the minority class is positive and the majority class is negative. As shown in Table 1, a confusion matrix has four basic elements, namely, true positive (TP), false negative (FN), false positive (FP), and true negative (TN). Among these elements, TP and TN are samples that have been correctly diagnosed or classified, whereas FP and FN are samples that have been incorrectly classified.

Accuracy is the most frequently used metric for evaluating classifier performance. This metric represents the proportion of correctly diagnosed samples to all samples. The confusion matrix defines accuracy as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$

Accuracy presents a good and simple way to measure the performance of a model with balanced data. However, accuracy is sensitive to the class distribution: when the distribution varies, the accuracy value varies greatly as well. With imbalanced data, accuracy therefore hinders comparative analysis and gives little indication of how well an algorithm handles the minority class.

Precision, recall, F1-score, and AUC are frequently used as the evaluation metrics when dealing with imbalanced data [32]. Precision and recall measure exactness and completeness, respectively, and demonstrate an inverse relationship. These metrics can be computed as the percentage of positive predictions that are correct and the percentage of positive samples that are correctly classified, respectively:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}.$$

F1-score combines precision and recall into a single measure. This metric is positively related to the performance of a model and may present a highly comprehensive description of such model. F1-score can be formulated as

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$

The area under the curve (AUC) indicates the probability that the classifier ranks a randomly chosen positive example higher than a randomly chosen negative example. The specific curve is the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR). The TPR is defined in the same way as recall, whereas the FPR is defined as

$$FPR = \frac{FP}{FP + TN}.$$

The performance of a model is positively related to the value of the AUC, which ranges from 0 to 1. A model with 100% correct predictions has an AUC value of 1.0.

This study employs accuracy, precision, recall, F1-score, and AUC to evaluate the performance of GCNN across different situations.
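For reference, these metrics can be computed as in the sketch below; scikit-learn is an assumption here (the paper does not name the metric implementation), with macro averaging for the multiclass precision, recall, and F1-score and one-vs-rest AUC on the predicted class probabilities.

```python
# Sketch of computing the evaluation metrics with scikit-learn (assumed library).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_pred_proba):
    """y_true: integer labels; y_pred_proba: (n_samples, n_classes) probabilities."""
    y_pred = y_pred_proba.argmax(axis=1)
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall":    recall_score(y_true, y_pred, average="macro"),
        "f1":        f1_score(y_true, y_pred, average="macro"),
        "auc":       roc_auc_score(y_true, y_pred_proba, multi_class="ovr"),
    }
```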

2.6. Overall Flowchart of the Proposed Method

Figure 3 presents the flowchart of GCNN, whose main stages are summarized as follows.

Stage 1: Data acquisition. The mechanical vibration signal data of bearings are detected by sensors and collected by a data acquisition system.

Stage 2: Data preparation. Without extra signal preprocessing or manual feature extraction, the original signal data are split into different datasets according to the dataset descriptions in Section 3.2.

Stage 3: Model building. GCNN is constructed on a computer server and compared with other deep learning models.

Stage 4: GCNN is trained by using the training set of Experiment 1, Scenario 1, Case 1.

Stage 5: The trained GCNN is tested by using the testing set of Experiment 1, Scenario 1, Case 1, and the diagnosis results are obtained.

Stage 6: Stages 4 and 5 are repeated for the remaining experiments, scenarios, and cases.

3. Implementation and Experiments Setup

3.1. Experiment Details

The self-made bearing test experimental platform shown in Figure 4 was used for vibration signal acquisition [33]. This platform comprises a drive motor, healthy bearings, acceleration sensors, a loading system, an NI PXIe-1082 data acquisition system, and testing bearings. A PCB 352C33 accelerometer was installed on the testing bearing in the 12:00 direction, and 6205-2RS SKF bearings were used as the test bearings. The motor speed was set to 961 r/min (loaded with 1 kN), and the sampling frequency was set to 10 kHz.

Four categories of rolling bearings, namely, normal (Norm), inner-race fault (IF), rolling ball fault (BF), and outer-race fault (OF), under different fault severities were collected from the test platform. Test experiments were carried out under five bearing fault severity levels (0.2, 0.3, 0.4, 0.5, and 0.6 mm) for IF, BF, and OF separately. Figure 5 presents the vibration signals under different health conditions.

3.2. Dataset

Two datasets were used for the two experiments. Experiment 1 employed a balanced dataset, whereas experiment 2 employed an imbalanced dataset containing several different imbalance degree cases.

To evaluate the generalization ability of GCNN, each of the aforementioned datasets involved three scenarios: four health conditions (four classes), ten health conditions (ten classes), and sixteen health conditions (sixteen classes). Case 1 represented the balanced data situation. The number of Norm samples was set to 100 for all cases, whereas the number of faulty samples was gradually reduced from 100 to 45 across the cases, so that the imbalance ratio of normal samples to faulty samples gradually increased. For each condition, 80% of the data were used for training, and the remaining 20% were used for testing. The dataset details are shown in Tables 2 and 3.
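The sketch below illustrates how one such case can be assembled and split; the per-class sample counts shown are placeholders, and Tables 2 and 3 remain the authoritative description of each case.

```python
# Sketch of assembling one imbalanced case and splitting it 80/20 (illustrative counts).
import numpy as np
from sklearn.model_selection import train_test_split

def build_case(samples_per_class, class_labels, n_normal=100, n_faulty=60):
    """samples_per_class: dict mapping class label -> array of raw samples."""
    X, y = [], []
    for label in class_labels:
        n = n_normal if label == "Norm" else n_faulty   # imbalance between Norm and faults
        X.append(samples_per_class[label][:n])
        y += [label] * n
    X, y = np.concatenate(X), np.array(y)
    # 80% for training, 20% for testing, stratified so each class keeps its ratio
    return train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
```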

3.3. Implementation

The model was trained and tested on a laboratory computer server with a Windows 7 operating system. Python 3.6, CUDA 8.0, cudnn 7.6.0, Tensorflow 2.0.0 [34], and Keras 2.3.1 [35] were among the software packages used in the implementation. Table 4 presents the parameters of GCNN.

3.4. Training

The model was trained by Adam with an initial learning rate of , learning rate decay factor of 0.2, patience of 5, and minimum learning rate of . During the training process, the validation loss was used as the training metric to adjust the learning rate. According to the learning rate decay strategy, if the loss does not decrease for five consecutive epochs, then the learning rate will be computed as the current learning rate multiplied by the decay factor. The training epoch was set to 100 for each round of training.

Each scenario, experiment, and case followed the same training methodology.
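A hedged sketch of one such training round is given below; it reuses the GCNN and learning rate decay sketches from Sections 2.2 and 2.4, and the loss function, initial learning rate, and batch size are assumptions rather than reported settings.

```python
# Sketch of one training round; X_train, y_train, X_test, y_test come from the
# dataset split in Section 3.2, and build_gcnn_sketch/lr_decay from earlier sketches.
from tensorflow.keras.optimizers import Adam

model = build_gcnn_sketch(num_classes=10)
model.compile(optimizer=Adam(learning_rate=1e-3),        # illustrative learning rate
              loss="sparse_categorical_crossentropy",    # assumed loss for integer labels
              metrics=["accuracy"])

history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),    # testing loss drives the decay
                    epochs=100,                          # 100 epochs per round, as above
                    batch_size=32,                       # illustrative batch size
                    callbacks=[lr_decay])
```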

4. Results and Analysis

To evaluate the diagnosis performance and the generalization ability of GCNN, two experiments with balanced and imbalanced data were separately performed in three scenarios (four classes, ten classes, and sixteen classes). The imbalanced experiment included six imbalance degree cases.

4.1. Evaluation with Balance Data: Diagnosis and Analysis

GCNN was initially trained with balanced data in the ten-classes scenario. The Case 1 dataset listed in Table 2 was used in this experiment. To compare GCNN with other artificial intelligence methods, several deep learning methods were applied for diagnosis, including a wide convolutional neural network (WCNN), a narrow convolutional neural network (NCNN), and a classical five-layer neural network (FNN). WCNN has seven convolutional layers with ReLU activation functions; each convolutional layer is followed by a max pooling layer, and fully connected and softmax layers are placed at the end of the model. NCNN has an architecture similar to that of WCNN but has only two convolutional layers and two max pooling layers. Meanwhile, the FNN used here is an ordinary fully connected neural network with an input layer, three fully connected layers, and one softmax output layer. The numbers of nodes in the three hidden layers are 3072, 1536, and 500. The other hyperparameter settings are the same as those of the other models.

Figure 6 shows the training and testing accuracies of the compared methods in the four-classes, ten-classes, and sixteen-classes scenarios. GCNN achieves a 100% classification accuracy across all scenarios in the training process (Figure 6(a)). GCNN also outperforms the other models in terms of testing accuracy across the three scenarios. Despite demonstrating a favorable diagnostic performance in all scenarios, WCNN reported 4%, 3%, and 4% lower accuracies than GCNN in the four-classes, ten-classes, and sixteen-classes scenarios, respectively. NCNN did not fare better than GCNN or WCNN, and FNN demonstrated the worst performance among all models.

Having more classification classes makes it more difficult for a model to generate good results, and poor performance in such settings reflects a limited generalization ability. The performance of GCNN in the three scenarios validates its good generalization ability to deal with different numbers of diagnosis classes.

GCNN was initially designed for fault diagnosis in the ten-classes scenario. Figure 7 plots the accuracy and loss curves of the four compared models during their learning process on the training and testing datasets. This plot facilitates the comparison of the performance and convergence rates of these models. As shown in Figure 7(a), the training loss of GCNN converged sharply and much faster than those of NCNN and FNN. The testing loss of GCNN also converged quickly and showed minimal fluctuation. The training loss curves of WCNN were similar to those of GCNN yet took more epochs to converge. Meanwhile, the loss curves of NCNN converged more slowly than those of the former two models, and its testing loss curve converged to a larger value. FNN encountered an overfitting problem. The trends shown in Figure 7(b) were opposite to the changes in the loss curves. All four models achieved high training accuracies with different convergence rates, but only GCNN achieved the highest accuracy quickly and stably.

Given that introducing more category labels increases complexity, a traditional model may not perform well when the number of classes changes. To check whether GCNN has such generalization ability and to evaluate its suitability for fault diagnosis with different numbers of class labels, the results for the other two classification scenarios are described as follows.

Figures 8 and 9 present the loss and accuracy change curves of the compared models during the learning processes in the four-classes and sixteen-classes scenarios, respectively. Among all models, GCNN achieved the highest accuracy and lowest loss in both scenarios.

A balanced dataset was used for fault diagnosis in experiment 1. GCNN outperformed all the other models when dealing with balanced data and demonstrated a favorable generalization ability in diagnosing bearing faults in the four-classes, ten-classes, and sixteen-classes scenarios.

4.2. Evaluation with Imbalanced Data: Diagnosis and Analysis

The diagnosis performance of GCNN with imbalanced data was investigated to further test its ability. Experiment 2 used an imbalanced dataset with different imbalance degrees. As shown in Table 2, six imbalanced cases (Cases 2 to 7) were implemented in experiment 2 to investigate the generalization ability and robustness of GCNN in dealing with different imbalance degree situations.

First, the proposed GCNN was compared with a classical method designed for imbalanced datasets, namely, the synthetic minority oversampling technique combined with a support vector machine (SMOTE-SVM). This method was applied in the classical pipeline manner: the SMOTE algorithm was first applied to the imbalanced dataset to create a balanced dataset, the classical time and frequency domain features of the signals were then extracted, and the SVM was finally applied for classification. This comparison used the most severe imbalance degree case, Case 7, as the dataset, and the parameter settings of each method were kept the same in the three different scenarios.
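A sketch of this baseline pipeline is given below, assuming the imbalanced-learn and scikit-learn packages; the feature extraction step is reduced to a few illustrative time-domain statistics rather than the full set of classical time and frequency features.

```python
# Sketch of the SMOTE-SVM baseline: oversample the minority classes, extract
# simple features, and classify with an SVM (imbalanced-learn/scikit-learn assumed).
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.svm import SVC

def time_features(signals):
    # Illustrative time-domain features per signal: RMS, peak, and standard deviation
    rms = np.sqrt(np.mean(signals ** 2, axis=1))
    peak = np.max(np.abs(signals), axis=1)
    std = np.std(signals, axis=1)
    return np.stack([rms, peak, std], axis=1)

def smote_svm(X_train_raw, y_train, X_test_raw):
    # 1) Oversample the raw minority-class signals to balance the training set
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train_raw, y_train)
    # 2) Extract features from the balanced training signals and the test signals
    X_train, X_test = time_features(X_bal), time_features(X_test_raw)
    # 3) Train the SVM classifier and predict the test labels
    clf = SVC(kernel="rbf").fit(X_train, y_bal)
    return clf.predict(X_test)
```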

Figure 10 shows the t-SNE visualization of the diagnosis results for the different scenarios. Figures 10(a)–10(c) show the results of SMOTE-SVM for four-classes, ten-classes, and sixteen-classes fault diagnosis, respectively, whereas Figures 10(d)–10(f) show the corresponding results of GCNN. GCNN clearly achieved better results than SMOTE-SVM in all three scenarios. As the number of classes increased, the data distributions of different classes may have differed only slightly from one another, and SMOTE failed to generate robust samples for each faulty class. This increased the difficulty for the SVM to classify each class type and ultimately left the results in disorder. Conversely, the relatively stable performance of GCNN demonstrated a good automatic feature extraction ability and good generalization. Further studies on GCNN are carried out in the following parts.

Next, during the deep learning model evaluation, a confusion matrix was used to summarize the true and predicted labels for each model during the testing process. If a model predicted everything perfectly, then its confusion matrix would be a diagonal matrix where the values off the main diagonal, which indicate incorrect predictions, would be equal to 0. A lighter color in the confusion matrix corresponds to fewer incorrectly labeled samples. Figures 11–14 show the confusion matrices of GCNN and the three other compared models in the ten-classes scenario with different cases.
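The confusion matrices can be produced as in the sketch below, assuming scikit-learn and matplotlib; the color map is chosen so that lighter cells correspond to fewer samples, matching the description above.

```python
# Sketch of building and plotting a confusion matrix for one trained model.
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

def plot_confusion(y_true, y_pred, class_names):
    cm = confusion_matrix(y_true, y_pred)          # rows: true labels, columns: predicted
    disp = ConfusionMatrixDisplay(cm, display_labels=class_names)
    disp.plot(cmap="Blues")                        # lighter cells = fewer samples
    plt.show()
```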

Figure 11 shows that the predicted labels of GCNN in the ten-classes scenario closely match the true labels in six different cases. The confusion matrices of almost all cases have clear diagonals. The true label 5 showed the worst match in different cases, especially in Cases 5 and 6. Label 5 in the ten-classes scenario represented BF with 0.2 mm fault size.

As shown in Figure 12, the confusion matrices of WCNN in different cases were similar to those of GCNN, thereby validating the favorable classification performance of the former. Figures 13 and 14 show that NCNN and FNN encountered difficulties in dealing with all imbalance cases. While NCNN managed to handle the first few low imbalance level cases, FNN performed poorly in all cases.

Precision, recall, F1-score (calculated from precision and recall), and AUC were used as metrics in analyzing the results obtained with imbalanced data as discussed in Section 2.5. Among these metrics, AUC was mainly considered, with a larger value indicating a better performance. Table 5 compares the performance of the deep learning models in the ten-classes scenario.

As shown in Table 5, GCNN outperforms all the other models in terms of AUC. Specifically, the AUC values of GCNN were above 0.97 across all six cases, with the AUC values in Cases 2 and 3 nearing 1. All metrics decreased as the imbalance degree increased because a higher imbalance degree prevents a model from effectively diagnosing the fault classes, given that most classifiers are sensitive to the distribution of samples. GCNN demonstrated the best overall performance even though some of its individual metrics were lower than those of WCNN.

To discover additional information, the changes in precision, recall, AUC, and loss during the learning process were plotted for visualization. Figure 15 presents the curves of GCNN under different cases. The training and testing curves of GCNN were better than those of the other methods. This model also achieved fast convergence and low fluctuation for all metrics.

In sum, GCNN demonstrated a good performance when diagnosing with imbalanced data. This model also shows good robustness and generalization ability when dealing with different imbalance degree data in the ten-classes scenario.

Figure 16 presents the fault diagnosis AUC results of the models in the four-classes and sixteen-classes scenarios to further check whether GCNN can handle different complex scenarios, including the balanced dataset and six imbalanced cases. Tables 6 and 7 present the four metrics of all models in the four-classes and sixteen-classes scenarios of six imbalanced cases, respectively.

GCNN outperformed all models in both the four-classes and sixteen-classes scenarios and showed great robustness and generalization ability to deal with increasingly imbalanced data. For both the four-classes and sixteen-classes scenarios, WCNN and NCNN obtained poor results as the imbalance degree increased. FNN did not obtain an AUC value above 0.85.

Experiment 2 used an imbalanced dataset for fault diagnosis, and results highlight that GCNN can solve the imbalance diagnosis problem and achieve better results than the other models. In sum, GCNN has a good generalization ability to diagnose bearing faults in different scenarios with imbalanced data.

5. Conclusions

This paper proposes GCNN, an effective and efficient architecture for complex small sample size bearing fault diagnosis with consideration of imbalanced data. This model benefits from the automatic feature learning ability of a CNN and the generalization capability of the skip connections that carry features from the encode part to the decode part. Experiments were conducted on balanced and imbalanced data to evaluate the performance of GCNN. Given that imbalanced data classification is an inherently difficult task, six different imbalance degree cases were used for verification in experiment 2. The results showed that GCNN outperforms all the other deep learning methods compared in this work. In sum, GCNN has a robust feature representation capability and can use balanced or imbalanced data for fault diagnosis scenarios with either few or many classes.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 51875376 and 51875375) and in part by the National Key R&D Program of China (Nos. 2017YFC0804803 and 2018YFB1600500).