#### Abstract

Vibration signals of a gearbox are sensitive to the presence of faults. Based on vibration signals, this paper presents an implementation of a deep learning algorithm, the convolutional neural network (CNN), for fault identification and classification in gearboxes. Different combinations of condition patterns based on some basic fault conditions are considered: 20 test cases are used, each of which includes 12 combinations of different basic condition patterns. The vibration signals are preprocessed using statistical measures of the time domain signal such as standard deviation, skewness, and kurtosis. In the frequency domain, the spectrum obtained with the FFT is divided into multiple bands, and the root mean square (RMS) value is calculated for each band so that the energy maintains its shape at the spectrum peaks. The achieved accuracy indicates that the proposed approach is highly reliable and applicable to fault diagnosis of industrial reciprocating machinery. Compared with peer algorithms, the present method exhibits the best performance in gearbox fault diagnosis.

#### 1. Introduction

Gearboxes play a crucial role in mechanical transmission systems: they transmit power between shafts and are expected to work 24 hours a day in production systems. Any gearbox failure may cause unwanted downtime, expensive repairs, and even human casualties. It is therefore essential to detect and diagnose faults at an early stage [1–4]. As an effective component of condition-based maintenance, fault diagnosis has gained much attention for the safe operation of gearboxes [5, 6].

Machine fault identification can be done with different methodologies such as vibration signature analysis, lubricant signature analysis, noise signature analysis, and temperature monitoring. The gearbox condition can be reflected by such measurements as vibratory, acoustic, thermal, electrical, and oil-based signals [7–12]. Of these, vibration-based diagnosis is the most widely employed, because every machine is considered to have a normal spectrum until a fault appears, at which point the spectrum changes [13, 14]. Vibration signals have been proven effective in reflecting the health condition of rotating machinery. In vibration-based gearbox fault diagnostics, Wang et al. [15] proposed the application of local mean decomposition of the vibration signal to diagnose a low-speed helical gearbox. Hong et al. [16] investigated vibration measurements for planetary gearbox fault detection. The vibration characteristics in both the time and the frequency domains were analyzed by Lei et al. [17] for the diagnostics of planetary gearboxes.

Various studies exist on algorithms for the detection and diagnosis of faults in gearboxes; among these are support vector machines and artificial neural networks. A support vector machine based envelope spectrum was proposed by Guo et al. [18] to classify three health conditions of planetary gearboxes. An intelligent diagnosis model based on the wavelet support vector machine (SVM) and the immune genetic algorithm (IGA) was proposed for gearbox fault diagnosis [19]. The IGA was developed to determine the optimal parameters for the wavelet SVM with the highest accuracy and generalization ability. Tayarani-Bathaie et al. [20] suggested a dynamic neural network to diagnose gas turbine faults. An artificial neural network combined with empirical mode decomposition was applied to automatic bearing fault diagnosis based on vibration signals [21]. Among the typical classifiers, the support vector classification (SVC) family (i.e., the standard SVC and its variants) has attracted much attention due to its strong classification performance. According to the literature, the SVM family achieves good results in comparison with peer classifiers.

Recently, deep learning has achieved great success in the classification field. Deep learning gains better classification performance owing to its “deeper” representations of the faulty features. Up to now, different deep learning networks such as the deep belief network [22], deep Boltzmann machines (DBMs) [23], the deep autoencoder [24], and convolutional neural networks [25] have been introduced, but few have been used for fault diagnosis. Tran et al. [26] introduced the application of deep belief networks to diagnose reciprocating compressor valves. Tamilselvan and Wang [27] employed deep belief learning based health state classification for the iris dataset, the wine dataset, the Wisconsin breast cancer diagnosis dataset, and the Escherichia coli dataset. The few existing reports that use deep learning structures for fault diagnosis commonly rely on features from a single modality.

This paper presents a study of the application of the convolutional neural network to the identification and classification of gearbox faults. The convolutional neural network (CNN) is a type of feed-forward artificial neural network. Its individual neurons are tiled in such a way that they respond to overlapping regions in the visual field [28]. The CNN and its variations are widely used models for image and video recognition [29, 30]. In this work, it is used as a classifier for gearbox fault diagnosis.

The most successful methods of vibration-based fault diagnosis are composed of two main steps: extracting the sensitive features and classifying the condition patterns. In vibration-based fault diagnosis, the most commonly used features have been generated from the temporal [31], spectral [32], wavelet [33], and other representations of the signals. Different representations can be regarded as different observations of the vibration signals [34]. In this work, statistical measurements such as standard deviation, skewness, and kurtosis are computed from the acquired time domain data. In the frequency domain, the spectrum obtained with an FFT is divided into multiple bands. The root mean square value is calculated for each band so that the energy maintains its shape at the spectrum peaks. Vectors of the features of the preprocessed signal are formed and used as input parameters for the CNN. It is important to point out that the testing is performed under five different rotation frequencies, and for each one four different load conditions are applied, which simulates the most likely scenario within an industrial application.

The rest of this paper is structured as follows. The CNN model and the method of extracting statistical features are introduced in Section 2; Section 3 explains the mechanical conditions of the experiment; Section 4 presents the implementation of the classifier based on the CNN model; and Section 5 shows the obtained results and their evaluation. Finally, conclusions are drawn in Section 6.

#### 2. Methodologies

In this section, we first present the convolutional neural network. Then the approach to extracting the sensitive features is introduced, in which some classical statistical parameters are calculated in the time and the frequency domains.

##### 2.1. Deep Learning with Convolutional Neural Network

The convolutional neural network was inspired by the structure of the visual system [35], and in particular by the models of it proposed in [36]. The first computational models were based on local connectivity between neurons and on hierarchically organized transformations of the image, as in Fukushima’s neocognitron [37]. LeCun and collaborators, following up on this idea, designed and trained convolutional networks using the error gradient, obtaining state-of-the-art performance [38, 39] on several pattern recognition tasks. Modern understanding of the physiology of the visual system is consistent with the processing style found in convolutional networks in the literature [40]. To this day, pattern recognition systems based on convolutional neural networks are among the best performing systems [41]. This has been shown clearly for handwritten character recognition [38], which has served as a machine learning benchmark for many years.

A typical convolutional neural network [38] is organized in layers of two types: convolutional layers and subsampling layers. Each layer has a topographic structure.

At each location of each layer, there are a number of different neurons. Each has its set of input weights that is associated with neurons in a rectangular patch in the previous layer. The same set of weights, but a different input rectangular patch, is associated with neurons at different locations.

Figure 1 presents the architecture of typical convolutional neural networks, in which the early analysis consists of alternating convolution and subsampling operations, while the last stage of the architecture consists of a generic multilayer network: the last few layers (closest to the outputs) will be fully connected 1-dimensional layers. CNNs work on the 2-dimensional data, so called maps, directly, unlike normal neural networks which would concatenate these into vectors. Typically convolutional layers are interspersed with subsampling layers to reduce computation time and to gradually build up further spatial and configural invariance. A small subsampling factor is desirable in order to maintain specificity at the same time.

A convolutional layer composes its feature maps by convolving kernels over the feature maps of the layer below it. At a convolution layer, the previous layer’s feature maps are convolved with learnable kernels and passed through the activation function to form the output feature maps. Each output map may combine convolutions with multiple input maps. In general, it is calculated as follows [41]:

$$x_j^{\ell} = f\left(\sum_{i \in M_j} x_i^{\ell-1} * k_{ij}^{\ell} + b_j^{\ell}\right), \tag{1}$$

where $M_j$ represents a selection of input feature maps; $\ell$ is the $\ell$th layer in the network; $k_{ij}^{\ell}$ is a matrix of size $S \times S$, where $S$ is the size of the convolutional kernels; and $f$ is a nonlinear activation function, typically the hyperbolic tangent or the sigmoid function. Each output map is given an additive bias $b_j^{\ell}$; for a particular output map, the input maps are convolved with distinct kernels. That is to say, if output maps $j$ and $k$ both sum over input map $i$, then the kernels applied to map $i$ are different for output maps $j$ and $k$.
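The forward pass of a convolution layer, as described above, can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names, the dictionary layout of the kernels, and the use of "valid" convolution (no padding) are assumptions made for the example.

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D convolution of map x with kernel k (the kernel is flipped)."""
    s_r, s_c = k.shape
    out_r, out_c = x.shape[0] - s_r + 1, x.shape[1] - s_c + 1
    k_flipped = k[::-1, ::-1]
    out = np.empty((out_r, out_c))
    for r in range(out_r):
        for c in range(out_c):
            out[r, c] = np.sum(x[r:r + s_r, c:c + s_c] * k_flipped)
    return out

def conv_layer_forward(inputs, kernels, biases, selections, f=np.tanh):
    """Compute the output maps of one convolution layer.

    inputs:     list of 2-D input feature maps
    kernels:    dict mapping (i, j) to the kernel from input map i to output map j
    biases:     additive bias of each output map
    selections: selections[j] lists the input maps summed into output map j
    f:          activation function (tanh here; a sigmoid is the other usual choice)
    """
    outputs = []
    for j, selected in enumerate(selections):
        total = sum(conv2d_valid(inputs[i], kernels[(i, j)]) for i in selected)
        outputs.append(f(total + biases[j]))
    return outputs
```

Each output map has its own kernel per input map, so two output maps that read the same input map still apply different kernels to it.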

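A subsampling layer produces downsampled versions of the input maps. If there are $N$ input maps, then there will be exactly $N$ output maps, although the output maps will be smaller. More formally [41],

$$x_j^{\ell} = f\left(\beta_j^{\ell}\,\mathrm{down}\left(x_j^{\ell-1}\right) + b_j^{\ell}\right), \tag{2}$$

where $x_j^{\ell}$ is the $j$th feature map of layer $\ell$, $f$ is the activation function, and $\mathrm{down}(\cdot)$ represents a subsampling function. Typically this function sums over each distinct $n$-by-$n$ block in the input feature map, so that the output feature map is $n$ times smaller along both spatial dimensions. Each output map is given its own multiplicative bias $\beta_j^{\ell}$ and an additive bias $b_j^{\ell}$.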
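The subsampling step can be illustrated as follows. This is a sketch under the assumption that the map dimensions are exact multiples of the block size; the function names are hypothetical.

```python
import numpy as np

def down(x, n):
    """Sum over each distinct n-by-n block of the 2-D map x."""
    rows, cols = x.shape
    if rows % n or cols % n:
        raise ValueError("map dimensions must be multiples of n")
    # Split each axis into (blocks, within-block) and sum the within-block axes.
    return x.reshape(rows // n, n, cols // n, n).sum(axis=(1, 3))

def subsample_layer_forward(inputs, betas, biases, n, f=np.tanh):
    """One output map per input map: f(beta * down(x) + b)."""
    return [f(beta * down(x, n) + b) for x, beta, b in zip(inputs, betas, biases)]
```

For a 4-by-4 input and n = 2 the result is a 2-by-2 map, that is, n times smaller along both spatial dimensions.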

To discriminate between $C$ classes, a fully connected output layer with $C$ neurons is added. The output layer takes as input the concatenated feature maps of the layer below it, which form the feature vector $f_v$:

$$o = f\left(b_o + W_o f_v\right), \tag{3}$$

where $b_o$ is a bias vector, $W_o$ is a weight matrix, and $f$ is the activation function.
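A short sketch of the output layer follows. The sigmoid activation, the row-major flattening order, and the function names are assumptions made for illustration; the predicted class is taken as the index of the largest output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_layer_forward(feature_maps, weights, bias, f=sigmoid):
    """Concatenate the feature maps into one vector and apply the dense layer."""
    feature_vector = np.concatenate([m.ravel() for m in feature_maps])
    return f(bias + weights @ feature_vector)

def predict_class(outputs):
    """Index of the output neuron with the largest activation."""
    return int(np.argmax(outputs))
```

With one output neuron per class, `weights` has one row per class and one column per element of the concatenated feature vector.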

The convolution kernels, the multiplicative and additive biases of the feature maps, and the weight matrix and bias vector of the output layer are the learnable parameters of the model. Learning is done using gradient descent, which can be implemented efficiently using a convolutional implementation of the backpropagation algorithm as shown in [41]. Because kernels are applied over entire input maps, there are many more connections in the model than weights; that is, the weights are shared. This makes learning deep models easier than with normal feedforward-backpropagation neural nets, as there are fewer parameters, and the error gradients go to zero more slowly because each weight has greater influence on the final output.
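The gap between the number of connections and the number of weights can be made concrete with a small count. The sketch below assumes "valid" convolution and full connectivity between input and output maps; the layer dimensions in the usage note are hypothetical and chosen only for illustration.

```python
def conv_layer_counts(in_size, kernel_size, n_in_maps, n_out_maps):
    """Return (shared kernel weights, connections) for one convolution layer.

    Biases are not counted. Every output pixel reuses the same kernel
    weights, so connections = weights * number of output pixels per map.
    """
    out_size = in_size - kernel_size + 1
    weights = n_in_maps * n_out_maps * kernel_size ** 2
    connections = weights * out_size ** 2
    return weights, connections
```

For example, `conv_layer_counts(16, 5, 1, 8)` gives 200 weights but 28800 connections, because each 5-by-5 kernel is reused at all 12 × 12 output positions.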

##### 2.2. Statistical Features of the Gearbox Vibration Signals

The gearbox condition is reflected by the information contained in different features in the frequency and time domains. These features are obtained from the set of signals measured from the vibrations at different speeds and loads. From the resulting graphs, the values that can be used as input parameters for the CNN are selected. Sixty percent of the sample set is used for training the CNN, and forty percent is used for testing.
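The 60/40 division of the sample set can be sketched as below. The paper does not state how samples are assigned to each subset, so the random shuffle and the fixed seed are assumptions of this example.

```python
import random

def split_samples(samples, train_fraction=0.6, seed=0):
    """Shuffle sample indices and split them into training and testing subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test
```

Applied to 12000 samples, this yields 7200 training samples and 4800 testing samples.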

###### 2.2.1. Time Data Statistical Features

Statistical parameters are usually good indices for extracting condition information. In this research, statistical measurements such as standard deviation, skewness, and kurtosis for each node are used. These are computed from the acquired time domain data; the formulas used are shown in Table 1, where $E(x)$ is the expected value of $x$. Bias correction is applied in the evaluation of skewness and kurtosis. The standard deviation, skewness, and kurtosis evaluated on each of the vibration signals are used for training and testing the CNN. They are evaluated using standard MATLAB functions.
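A NumPy sketch of the three time domain features is given below. Since Table 1 is not reproduced here, the bias-corrected estimators are an assumption: they follow the usual sample-size corrections (the ones MATLAB applies when the bias flag of `skewness` and `kurtosis` is set to 0), and the kurtosis is not the excess kurtosis, so a normal distribution gives a value near 3.

```python
import numpy as np

def time_domain_features(x):
    """Return (standard deviation, skewness, kurtosis) of a 1-D signal.

    Assumes at least 4 samples. Skewness and kurtosis are bias-corrected.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** p) for p in (2, 3, 4))  # central moments
    std = np.sqrt(np.sum(d ** 2) / (n - 1))            # sample standard deviation
    g1 = m3 / m2 ** 1.5                                # biased skewness
    g2 = m4 / m2 ** 2                                  # biased kurtosis
    skew = np.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 - 3 * (n - 1)) + 3
    return std, skew, kurt
```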

###### 2.2.2. Fast Fourier Transform Banded RMS Value

Figure 2 shows the vibration signal spectrum obtained during the test under the following condition patterns: gear with face wear 0.4 [mm], gear with face wear 0.5 [mm], bearing with 2 pits on inner ring, and bearing with 2 pits on outer ring, for 5 different rotation speeds and a load of 375 W. Figure 3 shows the frequency spectrum under five combinations of different condition patterns. The time domain signal was multiplied by a Hanning window to obtain the FFT spectrum, in which a shift in frequency and an increase in amplitude as the speed increases are noticeable during the test. The different spectrum graphs showed that the amplitude of each component increases in proportion to the load. Some accentuations and attenuations were also observed on certain spectral components, which suggests that the fault features depend on the amount of load applied.

With the objective of reducing the amount of input data to the CNN, the spectrum was split into multiple bands, a number chosen so that the root mean square (RMS) values keep track of the energy in the spectrum peaks [42]. The RMS value of each band is evaluated with (4):

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x_n^{2}}, \tag{4}$$

where $N$ is the number of samples in the frequency band and $x_n$ is the $n$th spectral sample of the band.
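The banded RMS computation can be sketched with NumPy as follows. The number of bands is left as a parameter, and the use of the magnitude of the one-sided FFT and of nearly equal-width bands are assumptions of this example.

```python
import numpy as np

def banded_rms(signal, n_bands):
    """Hanning-window the signal, take its FFT magnitude, and return the RMS per band."""
    signal = np.asarray(signal, dtype=float)
    windowed = signal * np.hanning(signal.size)
    magnitude = np.abs(np.fft.rfft(windowed))      # one-sided spectrum
    bands = np.array_split(magnitude, n_bands)     # nearly equal-width bands
    return np.array([np.sqrt(np.mean(band ** 2)) for band in bands])
```

A pure tone concentrates its energy in the band that contains its frequency, so that band has the largest RMS value while the overall spectral shape is summarized by far fewer numbers than the raw spectrum.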

Vectors of the features of the preprocessed signal are formed as input parameters for the CNN from the following: RMS values, standard deviation, skewness, kurtosis, rotation frequency, and applied load measurements. In this work, the frequency range is 0 to 22050 Hz and the size of the data vector in the frequency domain is 18000 samples. The spectrum is divided into frequency bands.
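Assembling the feature vector can be sketched as below. The band count of 251 used in the usage note is hypothetical: it is chosen only so that the vector has 256 entries and fills a 16 × 16 input map of the kind considered in Section 4. The ordering of the entries, the zero padding, and the row-major reshape are likewise assumptions.

```python
import numpy as np

def build_feature_vector(band_rms, std, skew, kurt, rotation_hz, load_w):
    """Concatenate band RMS values with the time domain and operating-condition features."""
    extras = [std, skew, kurt, rotation_hz, load_w]
    return np.concatenate([np.asarray(band_rms, dtype=float), extras])

def to_square_map(vector, side):
    """Zero-pad the vector to side*side entries and reshape it row by row."""
    if vector.size > side * side:
        raise ValueError("feature vector does not fit in the requested map")
    padded = np.zeros(side * side)
    padded[:vector.size] = vector
    return padded.reshape(side, side)
```

For instance, `to_square_map(build_feature_vector(np.ones(251), 1.0, 0.0, 3.0, 8.0, 375.0), 16)` returns a 16 × 16 array.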

#### 3. Experimental Setup

To validate the effectiveness of the proposed method, we carried out experiments on a gearbox fault experimental platform. Figure 4 shows the internal configuration of the gearbox and the positions of the accelerometers. Three shafts and four gears compose the two-stage transmission of the gearbox. An input gear (, modulus = 2, and pressure angle Φ = 20°) was installed on the input shaft. Two intermediate gears ( = = 53) were installed on an intermediate shaft for the transmission between the input gear and the output gear ( = 80, installed on the output shaft). The faulty components used in the experiments included gears , , , and and bearings , , , and as labeled in Figure 4(a). The test conditions are described in Table 2. The vibration signal is obtained from an accelerometer mounted vertically on the gearbox case. Tables 3 and 4 describe each fault condition of each gearbox component used in the experiment. We call these basic condition patterns. In our experiment, a test case includes several basic condition patterns, that is, a combination of multiple component faults. For example, test case A shown in Table 5 includes the following fault information: gear : gear with pitting on teeth; gear : gear with face wear 0.5 [mm]; bearing : bearing with 4 pits on outer ring; bearing : bearing with 2 pits on outer ring; gears and and bearings and : normal.

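**Figure 4:** Internal configuration of the gearbox and positions of the accelerometers, panels **(a)** and **(b)**.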

To evaluate the performance of the proposed method for gearbox fault diagnosis, we first constructed the 12 condition patterns listed in Table 5. Each pattern was tested under 4 different load conditions and 5 different input speeds. For each combination of pattern, load, and speed, we repeated the test 5 times. In each test, the vibration signals were collected in 24 recordings, each of which lasted 0.4096 s.

#### 4. Implemented Classifier Based on CNN and Statistical Features

In this section, the implementation of the classifier based on the CNN is introduced. Figure 5 shows the block diagram of the signal processing. The CNN-based classifier includes the following parameters:

(1) The size of the input feature map, which depends on the feature representation of the preprocessed signal.

(2) The number of alternating convolution and subsampling layers, which decides the architecture of the CNN. Two schemes are investigated: one with two convolutional layers and two subsampling layers, and another with one convolutional layer and one subsampling layer.

(3) The number of output feature maps of each convolution layer.

(4) The scale of each subsampling layer: for a given scale, the size of the output feature map of the subsampling layer is 1/scale of that of the input feature map.

(5) The convolutional kernel size: each input map is convolved with its corresponding kernel and the result is added to the output map; the convolutional kernel is usually a square matrix whose side length is called the convolutional kernel size.
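How the parameters listed above determine the feature map sizes through the network can be sketched as follows. The example assumes "valid" convolution (a kernel of size k shrinks each side by k - 1) and a subsampling scale that divides the map size exactly; the kernel size of 5 and the scale of 2 in the usage note are hypothetical values.

```python
def feature_map_sizes(input_size, layers):
    """Return the side length of the feature maps after each layer.

    layers is a sequence of ("conv", kernel_size) or ("sub", scale) pairs.
    """
    sizes = [input_size]
    size = input_size
    for kind, value in layers:
        if kind == "conv":
            size = size - value + 1
        elif kind == "sub":
            if size % value:
                raise ValueError("subsampling scale must divide the map size")
            size //= value
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        if size < 1:
            raise ValueError("feature map vanished; kernel is too large")
        sizes.append(size)
    return sizes
```

With a 16 × 16 input, `feature_map_sizes(16, [("conv", 5), ("sub", 2)])` returns `[16, 12, 6]`; with a 28 × 28 input and two convolution and subsampling stages, `feature_map_sizes(28, [("conv", 5), ("sub", 2), ("conv", 5), ("sub", 2)])` returns `[28, 24, 12, 8, 4]`.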

To determine the optimal architecture of the CNN-based classifier for gearbox fault diagnosis, some parameter tuning is performed. Table 6 presents 11 schemes with different parameters of the CNN-based classifier. They are applied to a test case with the 12 patterns indicated in Table 5, using data with 12000 sample signals, where sixty percent of the sample set is used for training the CNN and forty percent is used for testing. The classification rate and the computation time of each training epoch (Intel Core i7-4710MQ CPU @ 2.50 GHz, 8.00 GB memory) are recorded in Table 6. From Table 6, we can see that the cases with a 16 × 16 input feature map are superior to those with 28 × 28, and that the cases with one convolutional layer and one subsampling layer are superior to those with two convolutional layers and two subsampling layers. Cases #7 to #11 have very good classification accuracy, and cases #9 to #11 have shorter computation times. We therefore select the following configuration for the proposed CNN-based classifier: one convolutional layer and one subsampling layer, , , , and . The suggested architecture of the CNN-based classifier for gearbox fault diagnosis is described in Figure 6.

#### 5. Experiment Evaluations

The training is done in the first instance with the 12 patterns indicated in Table 5. The data used have 12000 sample signals, where sixty percent of the sample set is used for training the CNN and forty percent is used for testing. To tune the parameters further, we consider cases #9 to #11 in Table 6 with different numbers of iteration epochs: 50, 100, 150, 200, 250, and 300. Table 7 indicates the classification rate for the first instance. As shown in Table 7, each parameter pair has excellent performance for gearbox fault classification. The lowest classification rate is 89.46%, for the pair of and epochs = 50; the best one is 98.35%, for the pair of and epochs = 200. In the following experiments, and epochs are set to 8 and 200, respectively.

The confusion matrix is an effective tool for visualizing the performance of a classification algorithm. Each column of the confusion matrix represents the instances in a predicted class (output class), while each row represents the instances in an actual class (target class). Figure 7 presents the confusion matrix of the CNN model for the 12 patterns indicated in Table 5. As shown in Figure 7, the global percentage of true positive classification of the 12 fault condition patterns is 98.4% and the total error is 1.6%. The smallest percentage of true positive classifications is obtained for type 3, because this condition pattern combines 6 basic faults. This is evident in the confusion matrix, in which type 4 is classified as type 3 a total of 30 times; most of the confusion is between type 4 and type 3, which share 4 basic faults. The percentages of true positive classification of type 1, type 6, and type 12 are all 100%. The confusion matrix in Figure 7 shows that the present CNN model has a very high percentage of true positive classification.
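The quantities read from a confusion matrix can be computed as in the sketch below, which follows the convention stated above (rows are actual classes, columns are predicted classes). Class labels are assumed to be integers starting at 0, and the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, n_classes):
    """matrix[a][p] counts samples of actual class a predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def true_positive_rates(matrix):
    """Return (per-class true positive rate, global true positive rate)."""
    per_class = [
        row[i] / sum(row) if sum(row) else float("nan")
        for i, row in enumerate(matrix)
    ]
    total = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(len(matrix))) / total
    return per_class, overall
```

The total error is one minus the global true positive rate, and off-diagonal entries show which pairs of condition patterns are confused with each other.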

To further validate the robustness of the present CNN model, a fault condition pattern library was constructed, containing 58 kinds of combinations based on the basic patterns described in Tables 3 and 4. Twenty test cases are used to test the robustness of the present CNN method; each test case has 12 kinds of combinations randomly selected from the pattern library. The experimental results of the 20 test cases using the CNN are shown in Table 8. The lowest classification rate of the CNN method is 91.4%, for the 1st test case, and the highest reaches 98.9%, for the 10th test case. The mean, standard deviation, and median of the classification rate using the CNN are 96.8%, 2.57%, and 97.7%, respectively.
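The construction of the test cases and the summary statistics can be sketched as below. Patterns are represented only by their index in the library, sampling is assumed to be without replacement within a test case, and the sample (n - 1) standard deviation is an assumption, since the estimator used for Table 8 is not stated.

```python
import random
import statistics

def build_test_cases(library_size=58, patterns_per_case=12, n_cases=20, seed=0):
    """Draw n_cases test cases, each a random selection of distinct pattern indices."""
    rng = random.Random(seed)
    return [rng.sample(range(library_size), patterns_per_case) for _ in range(n_cases)]

def summarize(rates):
    """Return (mean, sample standard deviation, median) of classification rates."""
    return statistics.mean(rates), statistics.stdev(rates), statistics.median(rates)
```

Calling `summarize` on the 20 classification rates of a method gives the three figures reported for it.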

In addition, the CNN method was compared with a “shallow” learning algorithm, the SVM. The SVM is one of the most important representatives of the “shallow” learning community, and good classification results have been reported for it in gearbox fault diagnosis in existing research (e.g., [43]). The SVM is applied using LibSVM [44]. The parameters for the SVM are chosen as and a kernel given by a radial basis function where . These parameters were found through a search aiming at the best model for the SVM. Figure 8 shows the confusion matrix of the SVM method for the 12 patterns indicated in Table 5. Its global percentage of true positive classification of the 12 fault condition patterns is only 69.0% and the total error is 31.0%. The smallest percentage of true positive classifications is 30%, for type 3. The experimental results of the 20 test cases using the SVM method are shown in Table 8. The mean, standard deviation, and median of the classification rate using the SVM for the 20 test cases are 67.8%, 6.49%, and 66.8%, respectively. Compared with the deep learning CNN method, the SVM method exhibits inferior performance for gearbox fault diagnosis.
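For reference, the radial basis function kernel has the form exp(-γ‖u - v‖²), which is the form LibSVM uses for its RBF kernel type. The sketch below only illustrates the kernel itself; the γ values in the usage note are hypothetical and are not the values selected in the experiments.

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)
```

Identical feature vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a larger γ makes the decay faster.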

#### 6. Conclusions

In this paper, a CNN-based deep learning technique using vibration measurements has been proposed to diagnose the fault patterns of a gearbox. The present CNN method identifies and classifies gearbox faults by using the vibration signals measured with an accelerometer. The feature representation used as input to the CNN is a vector formed by RMS values, standard deviation, skewness, kurtosis, rotation frequency, and applied load. To evaluate the proposed CNN method, gearbox fault diagnosis experiments were carried out using different techniques. The results show that the present method outperforms the peer methods in gearbox fault diagnosis. This type of classifier could contribute to maintenance routines for industrial systems by lowering costs and helping to guarantee continuous production, and, with the appropriate equipment, online diagnostics could be performed.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work is supported by Natural Science Foundation Project of CQ CSTC (nos. cstc2012jjA40041 and cstc2012jjA40059), Science Research Fund of Chongqing Technology and Business University (no. 2011-56-05), the National Natural Science Foundation of China (51375517), and the Project of Chongqing Innovation Team in University (KJTD201313).