This paper proposes a condition monitoring method for the early detection of defects in a chain sprocket drive (CSD) system and the classification of fault types before a catastrophic failure occurs. In the operation of a CSD system, early defect detection is very useful in preventing system failure. In this work, eight fault types were created by deliberately damaging CSD system components, such as the gear teeth, bearings, and drive motor shaft, and installing them in the CSD system. To detect the fault signals during CSD system operation, the vibration was measured using an Internet of Things (IoT) device, which features a wireless MEMS accelerometer, Bluetooth function, Wi-Fi function, and battery. The IoT device was mounted on the gearbox housing. The measured one-dimensional vibration time series was transformed into time-scale images using the continuous wavelet transform (CWT). A convolutional neural network (CNN) was employed to extract deep features embedded in the images, which are closely related to fault types. To update the learning parameters of the CNN, the RMSprop learning algorithm was applied, and the CNN was trained using 500 image samples. The multiple-classification performance of the trained network was tested using 100 image samples. Feature maps for different fault types were obtained from the final CNN convolution layer. To visualize the fault types, t-distributed stochastic neighbor embedding (t-SNE) was applied to the feature maps to convert the high-dimensional data into two-dimensional data. The two-dimensional features enabled excellent classification of the eight fault types and one normal type.

1. Introduction

In a mechanical system, the gear train, belt pulley drive, and chain sprocket drive (CSD) are used to transmit power using rotating shafts. When the drive shaft and driven shaft are near each other, a gear drive unit is used. Where the distance between the drive shaft and driven shaft is relatively long, a belt pulley drive unit or CSD unit is often used. A CSD system has the advantage of transmitting power with greater force than a belt pulley drive system, and it does so without slip during the transmission process. Thus, CSDs have been applied in nearly all mechanical industries, such as machine tools, marine and aerospace drives, motorcycles, and the timing systems of automotive engines. A CSD system failure during operation can cause catastrophic human and economic losses. Therefore, it is critical to identify defects in a CSD system before it breaks down. A CSD system generally consists of a chain sprocket unit and its driving system. Damage to the chain sprocket unit is mainly due to roller chain fatigue [1]. The driving system of a chain sprocket unit consists of the electric motor, gears, bearings, and rotating shaft. Damage to a driving system is caused by a defect or failure [2, 3] of the bearings supporting the rotating shaft or of the gear teeth, which transmit the driving force to the chain sprocket unit. Several studies have been conducted to evaluate fault diagnosis methods for rotating machinery. For the feature extraction of fault signals, traditional fault diagnosis methods such as the time-average method [4], cepstral analysis [5], pseudo–Wigner–Ville distribution (PWVD) [6], discrete wavelet transform (DWT) [7], higher-order method [8], adaptive line enhancement [9], empirical mode decomposition (EMD) [10], and cyclostationary analysis [10, 11] have been used. The continuous wavelet transform (CWT) is a nonorthogonal wavelet transform, whereas the DWT is an orthogonal transform. The Fourier transforms of the mother wavelets used for the CWT and the DWT differ from each other. 
The smoothing effect of the CWT in the frequency domain is better than that of the DWT. The DWT is useful for signal compression and recovery, whereas the CWT is advantageous for the time-frequency analysis of a signal to extract fault information. In the PWVD, a poorly chosen kernel function leads to a cross-term problem. EMD lacks a complete theoretical foundation and is not useful for analysis when multiple faults are present. Thus, the CWT has been widely used for the time-frequency analysis of fault signals. To classify fault patterns using fault features, k-nearest neighbor algorithms [12], Bayesian classifiers [13], support vector machines (SVMs) [14], and artificial neural networks (ANNs) [15] have been used. Most recently, owing to advances in computational performance and learning algorithms, deep learning approaches that include both feature extraction and classification of fault patterns have been applied in the field of fault diagnosis [16, 17] and have become the most popular diagnosis methods [18–23]. Most papers on the fault diagnosis of rotating machinery using deep learning techniques have focused on the classification of a specific single-component defect, such as the bearing wear pattern [16, 18–20], gear tooth failure pattern [17, 21, 22], and unbalanced rotating shaft [23]. That is, these previous studies analyzed and classified the fault patterns of single components of a mechanical system, such as a tooth, bearing, or rotating shaft. This paper presents the diagnosis of a CSD system and the multiple classification of fault patterns using a deep learning technique. In this work, CSD system components such as the electric motor shaft, bearings, and gears were deliberately damaged and assembled. A total of eight fault-states were created. The signals used for fault diagnosis were the vibrational accelerations measured in the CSD system. 
In this work, an Internet of Things (IoT) sensor was developed for the wireless measurement of fault signals and the real-time transmission of the measured data. The developed IoT sensor was first applied in a laboratory test to validate the new method, and it is now being used for fault detection in a chain conveyor system. The acceleration was measured by the IoT device, which is newly developed and consists of a wireless microelectromechanical system (MEMS) accelerometer, Bluetooth function, Wi-Fi function, and battery. The wireless MEMS accelerometer is useful in IoT applications because its small size and battery-powered operation are typical requirements for IoT sensors [24, 25]. For the diagnosis of a fault state or the normal state using deep learning, image data are essential. Therefore, the one-dimensional time data were converted into time-scale image data by the continuous wavelet transform (CWT). The image data include time-frequency information related to the fault types. A convolutional neural network (CNN) was employed for the multiple classification of eight fault-states and one normal state, and the image data were used as the CNN input. The combination of CWT and CNN has previously been introduced in the field of fault detection [26–28]. The main difference among these algorithms is the CNN structure, which varies with the application area. The CNN structure, such as the filter size (kernel size), number of layers, and number of inputs and outputs, should be determined optimally for successful application to different systems. In this study, the combination of CWT and CNN was also employed for the condition monitoring of the CSD system. A new optimal CNN structure is presented for the diagnosis of multiple faults in the CSD system. With this CNN, the patterns of eight fault types and a normal type were successfully classified, and their feature maps were well extracted.

2. Theory

2.1. Theory of CWT

The CWT [29] is based upon a family of functions:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad a > 0,\ b \in \mathbb{R},$$

where $\psi$ is a fixed function called the “mother wavelet,” which is localized in terms of both time and frequency. The function $\psi_{a,b}$ is obtained by applying the operations of shifting in the time domain (b-translation) and scaling in the frequency domain (a-dilation) to the mother wavelet. The mother wavelet used throughout this study is the Morlet wavelet [30], which is characterized by its center frequency $f_c$, that is, the center frequency of the mother wavelet when it is transformed to the frequency domain, and by its bandwidth $B$, defined as the variance of the Fourier transform of the Morlet wavelet. In the following, $f$ indicates the frequency and the superscript $*$ denotes the complex conjugate.

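The CWT of a signal $x(t)$ is defined by

$$W(a,b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right) dt = \int_{-\infty}^{\infty} x(t)\,\psi^{*}_{a,b}(t)\,dt,$$

where $\psi^{*}_{a,b}(t)$ is the complex conjugate of $\psi_{a,b}(t)$.

Here, $\psi_{a,b}(t)$ plays a role analogous to that of $e^{j\omega t}$ in the definition of the Fourier transform. If the mother wavelet satisfies the admissibility condition

$$C_{\psi} = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^{2}}{|\omega|}\,d\omega < \infty,$$

where $\Psi(\omega)$ is the Fourier transform of $\psi(t)$, then the inverse wavelet transform can be obtained by

$$x(t) = \frac{1}{C_{\psi}} \int_{-\infty}^{\infty}\!\int_{0}^{\infty} W(a,b)\,\psi_{a,b}(t)\,\frac{da\,db}{a^{2}}.$$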
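As a rough numerical illustration of the CWT definition, the following sketch evaluates W(a, b) by direct summation for a short test signal. It assumes a complex Morlet mother wavelet of the form ψ(t) = (πB)^(-1/2) exp(j2πf_c t) exp(−t²/B); this normalization, the function names, the parameter values, and the test signal are assumptions made for the example and are not taken from this study.

```python
import cmath
import math

def morlet(t, fc=1.0, B=2.0):
    """Complex Morlet wavelet in an assumed form (see the text above)."""
    return (cmath.exp(2j * math.pi * fc * t) * math.exp(-t * t / B)
            / math.sqrt(math.pi * B))

def cwt(x, fs, scales, fc=1.0, B=2.0):
    """Return |W(a, b)| for every scale a and every sample instant b."""
    dt = 1.0 / fs
    n = len(x)
    rows = []
    for a in scales:
        row = []
        for ib in range(n):
            acc = 0j
            for it in range(n):
                u = (it - ib) * dt / a          # (t - b) / a
                if abs(u) > 4.0 * math.sqrt(B):  # envelope is negligible here
                    continue
                acc += x[it] * morlet(u, fc, B).conjugate()
            row.append(abs(acc) * dt / math.sqrt(a))
        rows.append(row)
    return rows

fs = 200.0                      # sampling rate of the toy signal in Hz
f0 = 20.0                       # frequency of the toy sine in Hz
x = [math.sin(2 * math.pi * f0 * k / fs) for k in range(200)]
freqs = [10.0, 20.0, 40.0]
scales = [1.0 / f for f in freqs]   # scale a corresponds to frequency fc / a
scalogram = cwt(x, fs, scales)

mid = len(x) // 2
strongest = max(range(len(freqs)), key=lambda i: scalogram[i][mid])
print(freqs[strongest])         # frequency row with the largest |W| at mid-signal
```

Each row of `scalogram` is one scale, so stacking many scales gives the kind of time-scale image that is later used as a network input. The triple loop is chosen for readability only; a practical implementation would use FFT-based convolution.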

2.2. Theory of CNN

Artificial neural networks have been widely used for the prediction and classification of sound and vibration signals. Before deep neural networks (DNNs) were introduced, ANNs with shallow neural network (SNN) structures were used [31–33]. An SNN uses a supervised training process with a feature vector. However, it is difficult to extract the system fault features if the dynamic system characteristics are unknown. Therefore, the DNN structure was developed for feature extraction [34]. A CNN is one of the DNN structures [35], as shown in Figure 1. A CNN uses an unsupervised training process and feature maps related to fault information, which are extracted from the stages of several convolutional and pooling layers. The neurons in a CNN are arranged in the form of feature maps. The input to a convolutional layer is the image x of size m × n. The convolutional layer contains f filters (kernels) of size r × s, which have smaller dimensions than the input image. The output of the convolutional layer is a set of f feature maps of size (m − r + 1) × (n − s + 1) when the filters are moved with a stride of one pixel. The filter, realized by assigning a weight $f_{ij}$ to each pixel in the input image and calculated as a weighted sum, extracts certain features contained in the image. An additive bias is then added to the weighted sums, and the result is passed through a nonlinear function to obtain the pixels of the convolutional map. Traditionally, sigmoid and hyperbolic tangent functions were used; recently, rectified linear units [35] have become popular. The activation output of a particular feature map j in the convolutional layer l is given as

$$x_j^{l} = \phi\!\left(\sum_{i} x_i^{l-1} \otimes k_{ij}^{l} + b_j^{l}\right),$$

where ϕ is the nonlinear activation function; $b_j^{l}$ is the scalar bias for the layer; $x_i^{l-1}$ is the selected feature map i in the (l − 1)th layer, which is summed up by the feature map j in the layer; ⊗ denotes the convolutional operator that convolutes the activation of the preceding layer; and $k_{ij}^{l}$ is the filter used to perform the convolutional operation.
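The output-size rule and the bias-plus-activation step can be checked on a small example. The sketch below is a minimal, assumed implementation of one convolutional feature map with stride one and no padding, followed by a rectified linear unit; as in most CNN software, the kernel is applied without flipping (cross-correlation). The function name and the toy image and kernel are hypothetical.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide an r x s kernel over an m x n image (stride 1, no padding),
    add the bias, and apply a ReLU activation to every weighted sum."""
    m, n = len(image), len(image[0])
    r, s = len(kernel), len(kernel[0])
    out = []
    for i in range(m - r + 1):
        row = []
        for j in range(n - s + 1):
            z = bias
            for p in range(r):
                for q in range(s):
                    z += image[i + p][j + q] * kernel[p][q]
            row.append(max(0.0, z))   # rectified linear unit
        out.append(row)
    return out

# Toy 5 x 6 image whose intensity grows along both axes (m = 5, n = 6).
image = [[float(i + j) for j in range(6)] for i in range(5)]
# Hypothetical 3 x 3 horizontal-gradient kernel (r = s = 3).
kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

fmap = conv2d_valid(image, kernel, bias=2.5)
print(len(fmap), len(fmap[0]))   # (m - r + 1) x (n - s + 1) = 3 x 4
print(fmap[0][0])                # 3 rows x (+2) + 2.5 = 8.5
```

A layer with f filters simply repeats this operation f times with different kernels and biases, which yields the f feature maps described above.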

The filter weights are trained to detect specific features. Hence, effective feature selection in successive stages that can distinguish between different categories is necessary for the accurate classification of new input images. Each convolutional layer is followed by a pooling layer. Each feature map is subjected to region-wise pooling, such as taking the maximum or average of nonoverlapping pixels. The output of the pooling layer leads to a dimensional reduction depending on the chosen stride. The activation output after downsizing the feature map d in a layer l is given as

$$x_d^{l} = \chi\!\left(x_d^{l-1},\, N^{l}\right),$$

where χ is the downsizing function, such as the average or maximum function downsized by a factor of $N^{l}$, and $x_d^{l-1}$ is the convoluted feature map to be downsized.
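A minimal sketch of the downsizing step is given below for nonoverlapping square regions, with either the average or the maximum as the pooling function. The function name and the 4 × 4 example map are assumptions for illustration.

```python
def pool2d(fmap, size=2, mode="mean"):
    """Downsize a feature map by `size` using nonoverlapping square regions."""
    rows, cols = len(fmap) // size, len(fmap[0]) // size
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            block = [fmap[i * size + p][j * size + q]
                     for p in range(size) for q in range(size)]
            row.append(max(block) if mode == "max" else sum(block) / len(block))
        out.append(row)
    return out

fmap = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0]]

mean_pooled = pool2d(fmap, 2, "mean")
max_pooled = pool2d(fmap, 2, "max")
print(mean_pooled)   # [[3.5, 5.5], [11.5, 13.5]]
print(max_pooled)    # [[6.0, 8.0], [14.0, 16.0]]
```

With a factor of 2, each 2 × 2 region collapses to one pixel, so the map shrinks by half along both axes.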

As the original input passes through successive convolution and pooling processes, the network learns to efficiently represent all of the images. The last neural network layer is a fully connected layer whose output is given by

$$y = \sigma\!\left(W f + b_o\right),$$

where $b_o$ is the bias for the output layer, W is the weight matrix between the input and output layers of the fully connected layer, f denotes the feature maps of the fully connected input layer, and σ is the softmax function [35]. The filter weights, the biases of the convolutional and output layers, and W are learned during training. The training takes place via stochastic gradient descent (SGD) with the objective of minimizing the error between the actual and desired output. The gradient is computed using the backpropagation method [36]. All the filter weights and biases are updated according to the objective function for each input sample until an optimal representation is obtained for the training samples. For the backpropagation method, the cost function J is defined in terms of the target and actual outputs, where P is the number of output neurons, $t_p$ is the pth element of the target output for the pth fault, and $y_p$ is the actual output of the network for the pth fault. A common problem encountered in training CNNs is overfitting, which results in poor performance on a holdout test set after the network is trained on a small or even large training set; this affects the ability of the model to generalize to unseen data. The SGD algorithm has been used as a learning algorithm [34] in backpropagation. To overcome the overfitting problem, adaptive moment estimation (ADAM) [37, 38] was proposed, which is another method that computes adaptive learning rates for each parameter. Recently, Hinton [39] proposed the RMSprop algorithm. RMSprop is an unpublished adaptive learning rate method; it lies within the realm of adaptive learning rate methods that have seen growing popularity in recent years. In this study, the RMSprop algorithm was employed. 
For the RMSprop algorithm, the update rule is mathematically given by

$$E[g^{2}]_{t} = \beta\,E[g^{2}]_{t-1} + (1-\beta)\left(\frac{\partial J}{\partial w}\right)^{2}, \qquad w_{t+1} = w_{t} - \frac{\eta}{\sqrt{E[g^{2}]_{t}}}\,\frac{\partial J}{\partial w},$$

where $E[g^{2}]$ is the moving average of squared gradients, $\partial J/\partial w$ is the gradient of the cost function with respect to the weight, η is the learning rate, and β is the moving average parameter. A default value of 0.9 for the moving average parameter works well for most applications.
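To make the behavior of this update concrete, the following sketch applies RMSprop to a single weight. The quadratic cost J(w) = (w − 3)², the starting point, the step count, and the small constant `eps` (added to the denominator to avoid division by zero, as is common in software implementations) are assumptions for the example only.

```python
import math

def rmsprop_minimize(grad, w0, eta=0.001, beta=0.9, steps=1000, eps=1e-8):
    """Minimize a scalar cost with RMSprop, given its gradient function."""
    w = w0
    avg_sq = 0.0                                   # moving average of g^2
    for _ in range(steps):
        g = grad(w)                                # dJ/dw at the current weight
        avg_sq = beta * avg_sq + (1.0 - beta) * g * g
        w -= eta * g / (math.sqrt(avg_sq) + eps)   # eps is an added safeguard
    return w

# Hypothetical cost J(w) = (w - 3)^2, so dJ/dw = 2 (w - 3).
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0,
                           eta=0.01, beta=0.9, steps=2000)
print(round(w_final, 1))   # settles close to the minimizer w = 3
```

Because the gradient is divided by its own running root-mean-square value, the step length stays on the order of η regardless of the gradient scale, which is the property that distinguishes RMSprop from plain SGD.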

3. Experiment

3.1. CSD System and Synthetic Fault Patterns for Test

A visualization of the CSD system used for the experiment is shown in Figure 2. The CSD system used for testing is the one on the right in Figure 2(a). The technical specifications are summarized in Table 1.

In the CSD system used for testing, there are four bearings, four gears, one chain, two sprockets, and one electric motor. The input shaft rotational speed is 1800 rpm (30 Hz), and that of the output shaft is 59.4 rpm (0.99 Hz), so the speed reduction ratio is approximately 30. If the chain or sprocket is damaged, the CSD system cannot work; therefore, artificial faults were created only in the drive portion of the CSD system, namely the motor shaft, bearings, and gear teeth. Faults were not introduced in the sprocket and chain.

The faults and conditions created are as follows:
(1) Cracked tooth in Driven Gear 2 (Fault 1)
(2) Broken tooth in Driven Gear 2 (Fault 2)
(3) Unbalanced shaft in electric motor (Fault 3)
(4) Outer-raceway fault in Bearing ① (Fault 4)
(5) Inner-raceway fault in Bearing ① by inserting matter (Fault 5)
(6) Unbalanced shaft in electric motor + broken tooth in Driven Gear 2 (Fault 6)
(7) Unbalanced shaft in electric motor + broken tooth in Driven Gear 2 + outer-raceway fault in Bearing ① (Fault 7)
(8) Unbalanced shaft in electric motor + broken tooth in Driven Gear 2 + inner-raceway fault in Bearing ① (Fault 8)
(9) Normal condition (normal)

Some images of the synthetic fault conditions are presented in Figure 3. Figures 3(a)–3(e) show the cracked gear tooth, the broken gear tooth, the unbalanced shaft of the electric motor, the outer-raceway fault created by making a hole, and the inner-raceway fault created by inserting matter through the hole, respectively.

3.2. Data Acquisition

The experiments were performed under the conditions shown in Figure 2(c). The helical gearbox consists of two gears mounted on independent shafts. The input shaft is connected to the electric motor, which transforms electrical energy into rotational movement for transmission to the mechanical system. The output shaft is linked to the chain sprocket unit, whose chain is connected to the final output shaft and transmits the mechanical force that rotates it. The final output shaft can be used to drive the conveyor system. The most important details regarding these mechanical components are listed in Table 1. Vibration data were obtained using a vibration sensor unit mounted on the gearbox housing to measure the acceleration in the vertical direction. Sound data were obtained using a microphone placed at a distance of 1 m from the CSD system to measure the radiated noise, as shown in Figure 2(b). The vibration sensor unit, called “Happy-Go,” was built with the following components:
(i) STM32F030C8T6 (MCU; CPU)
(ii) Intsain BPT2640TXSRF (BLE; Bluetooth)
(iii) Intsain WIS3200 (WLAN; IoT development platform)
(iv) Analog Devices ADXL337 (MEMS accelerometer)
(v) Battery (3.7 V, 2600 mAh)

“Happy-Go” has its own built-in low-pass filter set to 1/2 the sampling rate; the device is shown in Figure 4.

The accelerometer also has a built-in high-pass filter, but it was deactivated. The vibration signal was sampled at 2000 Hz, so the cut-off frequency of the low-pass filter was 1000 Hz. The measurement range, sensitivity, noise density, and frequency bandwidth of the sensor are ±3 g, 270 mV/g, 175 μg/√Hz, and 2000 Hz, respectively. The vibration data were transmitted to a computer via Wi-Fi. Bluetooth is installed for the future transmission of measured data to a smartphone but was not used in this study. A ½-inch free-field microphone (B&K 4192, Denmark) was used to measure the sound data, which were transmitted to the computer through a data acquisition system (NI 9233, USA). With this setup, a dataset was created incorporating the healthy and faulty conditions presented in Section 3.1. Tests were performed for each condition, resulting in a total of 700 test runs. Each test had a runtime of 5 min, from which the last 30 s of vibration data were captured using the accelerometer.

4. Signal Processing for the Measured Data

4.1. Data Analysis Based on Vibration Theory

The vibration data measured in the time domain were analyzed in the frequency and time-frequency domains using the MATLAB (MathWorks, USA) signal processing toolbox. Figure 5 shows the time histories of the eight fault signals and one normal signal measured by the accelerometer during one test.

Figure 6 compares the power spectrum of the normal vibration signal with those of the eight fault vibration signals. The spectrum shapes of the eight fault signals differ from that of the normal signal. According to the vibration theory of rotating machinery [40–43], rotating machinery such as the CSD system has several major vibration sources under normal conditions: impact vibration between the sprocket and chain, gear meshing vibration, bearing rolling vibration, and electric motor shaft rotor vibration. The frequencies of these sources in the CSD system were calculated using the system specifications listed in Table 1:
(i) Gear meshing frequency in Gear 1: 450 Hz (rotating frequency × number of teeth: 30 Hz × 15)
(ii) Gear meshing frequency in Gear 2: 88 Hz (rotating frequency × number of teeth: 30 Hz × 15/82 × 16)
(iii) Bearing 1 rolling frequency: 270 Hz (rotating frequency × number of balls: 30 Hz × 9)
(iv) Bearing 2 and 3 rolling frequency: 39 Hz (rotating frequency × number of balls: 5.49 Hz × 7)
(v) Bearing 4 rolling frequency: 8 Hz (rotating frequency × number of balls: 0.99 Hz × 8)
(vi) Chain and sprocket impact frequency at drive shaft: 18.8 Hz (rotating frequency × number of sprocket teeth: 0.99 Hz × 19)
(vii) Structural dynamic resonance of the CSD system: 57 Hz
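The arithmetic behind the listed source frequencies can be reproduced in a few lines. The sketch below uses only the shaft speeds and element counts quoted in the list; the value of 16 teeth for the second gear mesh is an assumption, chosen because it is the count that reproduces the listed 88 Hz, and the dictionary keys are hypothetical labels. The products may differ slightly from the rounded values in the list.

```python
motor_hz = 1800 / 60.0            # input shaft speed: 30 Hz
mid_hz = motor_hz * 15 / 82       # intermediate shaft speed: about 5.49 Hz
out_hz = 59.4 / 60.0              # output shaft speed: 0.99 Hz

sources = {
    "gear mesh 1": motor_hz * 15,       # 15 teeth on the input gear
    "gear mesh 2": mid_hz * 16,         # 16 teeth assumed (gives about 88 Hz)
    "bearing 1": motor_hz * 9,          # 9 balls
    "bearings 2 and 3": mid_hz * 7,     # 7 balls
    "bearing 4": out_hz * 8,            # 8 balls
    "chain/sprocket": out_hz * 19,      # 19 sprocket teeth
}

for name, freq in sources.items():
    print(f"{name}: {freq:.1f} Hz")

below_500 = all(freq < 500.0 for freq in sources.values())
print(below_500)   # every normal-condition source lies under 500 Hz
```

All of these normal-condition sources fall below 500 Hz, which is the basis for separating them from the fault-related content at higher frequencies.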

The spectra have several vibration peaks corresponding to these sources, as shown in Figure 6. These peaks lie in the frequency region below 500 Hz and are related to the rotating frequency of the shaft, the teeth meshing frequency, the contact frequency of the balls in the bearings, and the contact frequency between the chain and sprocket. The variations in vibration caused by the eight faults primarily occur in the frequency region above 500 Hz, as shown in Figure 6. These high-frequency peaks are related to mechanical faults such as broken teeth, bearing wear, and rotating shaft imbalance. The spectrum shapes of the eight fault signals also differ from one another. Based on these differences in spectrum shape, the eight fault types can be distinguished and classified using the vibration signals. For a more meaningful analysis of fault classification, the CWT was applied to the vibration signals, and the time-frequency information of the vibration signals was obtained. Figure 7 shows the CWT analysis results for the eight fault vibration signals. The major frequency components in the CWT analysis of the normal vibration signal are explained in detail in Figure 8. These major frequencies and the vibration sources to which they are related are listed in Table 2.

4.2. Feature Extraction

The traditional fault classification method in the field of machine learning uses a feature vector. Feature vectors extracted from raw signals have been used as the input of classifiers such as SVM and SNN. Therefore, numerous feature extraction methods have been studied for many years [4–11]. The major features of rotating machinery can be summarized [17, 44] as follows:
(i) Peak to peak
(ii) Root mean square (time and frequency)
(iii) Standard deviation (time and frequency)
(iv) Shape factor
(v) Frequency center
(vi) Impulse factor
(vii) Crest factor
(viii) Mean, time and frequency (first moment)
(ix) Variance, time and frequency (second moment)
(x) Skewness (third moment)
(xi) Kurtosis (fourth moment)
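Several of the time-domain features in this list can be computed directly from a sampled signal, as sketched below. The definitions used here (for example, shape factor as RMS divided by mean absolute value, and non-excess kurtosis) are common conventions assumed for the example; the cited works may define some features differently, and the function name and test signal are hypothetical.

```python
import math

def time_domain_features(x):
    """Compute a handful of classical time-domain features of a signal."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    abs_mean = sum(abs(v) for v in x) / n
    peak = max(abs(v) for v in x)
    return {
        "peak_to_peak": max(x) - min(x),
        "rms": rms,
        "std": std,
        "shape_factor": rms / abs_mean,
        "impulse_factor": peak / abs_mean,
        "crest_factor": peak / rms,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
    }

fs = 2000                                                      # samples per second
x = [math.sin(2 * math.pi * 50 * k / fs) for k in range(fs)]   # 1 s of a 50 Hz sine
feats = time_domain_features(x)
for name, value in feats.items():
    print(f"{name}: {value:.3f}")
```

For a pure sine the crest factor is √2 and the kurtosis is 1.5; impulsive faults such as a broken tooth typically raise both, which is why these quantities are popular hand-crafted features.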

To reduce feature dimensionality and improve classification accuracy, feature selection is critical to subsequent classifications. Several researchers have proposed effective methods of feature selection [45]. The classification machine using feature vectors has adopted the supervised training method [34]. However, it is difficult to extract the feature vectors for multiple-fault classification. Therefore, it is necessary to extract the features automatically based on the unsupervised training method.

5. CNN for Multiple-Fault Classification

5.1. Input for CNN

Recently, machine classification using an unsupervised training method based on DNN was proposed. The CNN is a classification machine using an unsupervised training method. The supervised training method demands a feature vector correlated with the fault characteristics as the input of a classification machine. The major feature vectors of rotating machinery were mentioned in Section 4.2. These feature vectors are useful for the fault pattern classification of a single mechanical element, such as a tooth, a bearing, or a shaft alone. However, when multiple faults occur during CSD system operation, it is difficult to determine which feature vectors are correlated with the multiple faults of the teeth, bearings, shaft imbalance, and their combinations. Therefore, the feature vectors for the multiple classification should be self-extracted using a DNN such as a CNN. The CNN uses the raw signal instead of hand-crafted features and generates a feature map at each convolution layer stage. It is regarded as an unsupervised training method because the raw signal is used as the input of the CNN-based machine learning. In general, the CNN uses images as input data. In this study, the CWT was applied to the signal recorded by the wireless MEMS accelerometer. The image data obtained by the CWT were used as the CNN input image, as shown in Figure 7. Each 30 s vibration record was classified by the multiclass classifier as Fault 1, Fault 2, …, Fault 8, or normal. From the CWT image of a 30 s record (126 × 60,000), a reduced image (126 × 400) corresponding to 0.2 s was used as the CNN input.
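The image dimensions quoted above follow from the 2000 Hz sampling rate given in Section 3.2. The sketch below reproduces that arithmetic and shows one way a wide scalogram could be cut into 400-column blocks; the use of nonoverlapping blocks, the function name, and the placeholder data are assumptions, since the text does not state how the 0.2 s windows were selected.

```python
fs = 2000          # sampling rate in Hz (Section 3.2)
record_s = 30      # length of one vibration record in seconds
segment_s = 0.2    # length of one CNN input image in seconds
n_scales = 126     # number of rows (scales) in the CWT image

samples_per_record = fs * record_s               # columns in the full image
samples_per_segment = round(fs * segment_s)      # columns in the reduced image
n_segments = samples_per_record // samples_per_segment

def split_columns(scalogram, width):
    """Cut a scalogram (list of rows) into nonoverlapping blocks of `width` columns."""
    n_cols = len(scalogram[0])
    return [[row[start:start + width] for row in scalogram]
            for start in range(0, n_cols - width + 1, width)]

# Placeholder scalogram covering 1 s (2000 columns) instead of a real CWT result.
demo = [[0.0] * 2000 for _ in range(n_scales)]
blocks = split_columns(demo, samples_per_segment)

print(samples_per_record, samples_per_segment, n_segments)   # 60000 400 150
print(len(blocks), len(blocks[0]), len(blocks[0][0]))         # 5 126 400
```

Under this assumption, one 30 s record would provide up to 150 candidate input images of size 126 × 400.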

5.2. Network Setup and Training

After the vibration signals were processed using the CWT and converted into two-dimensional images, they were used as the CNN inputs. The DNN toolbox for MATLAB (MathWorks, USA) was utilized. The network architecture is shown in Figure 9. The first layer following the input layer is a convolutional layer with eight feature maps and a filter size of 3 × 3. This is followed by a mean-pooling layer of size 2 × 2. The next layer is a convolutional layer with 16 feature maps and a filter size of 3 × 3, followed by a 2 × 2 mean-pooling layer. The output layer contains nine neurons corresponding to the eight different fault conditions and one normal condition. All the layers are fully connected. The softmax function was used as the classification function. The RMSprop method [39] was used to train the network with an initial learning rate of 0.001. The batch size was 128. Training was carried out for 1000 iterations (50 epochs). The change in accuracy during the learning iterations is shown in Figure 10. The optimal weights, with minimum error, are reached after 390 iterations. Of the 700 samples, 500 were used for training, 100 for validation, and 100 for testing, and 10 different networks were produced.
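The size of each feature map in this architecture can be traced with simple shape arithmetic. The sketch below assumes a single-channel 126 × 400 input, convolutions with stride one and no padding, and pooling that discards any incomplete border region; these settings and the helper names are assumptions, because the text does not specify the padding or stride used in the toolbox.

```python
def conv_out(shape, k=3):
    """Output size of a k x k convolution with stride 1 and no padding."""
    return (shape[0] - k + 1, shape[1] - k + 1)

def pool_out(shape, p=2):
    """Output size of p x p nonoverlapping pooling (incomplete borders dropped)."""
    return (shape[0] // p, shape[1] // p)

shape = (126, 400)      # one CWT input image
c1 = conv_out(shape)    # first convolution, 8 feature maps
p1 = pool_out(c1)       # first mean pooling
c2 = conv_out(p1)       # second convolution, 16 feature maps
p2 = pool_out(c2)       # second mean pooling

n_features = 16 * p2[0] * p2[1]          # inputs to the 9-neuron output layer
n_params = (8 * (3 * 3 * 1) + 8          # first convolution: weights + biases
            + 16 * (3 * 3 * 8) + 16      # second convolution: weights + biases
            + 9 * n_features + 9)        # output layer: weights + biases

print(c1, p1, c2, p2)         # (124, 398) (62, 199) (60, 197) (30, 98)
print(n_features, n_params)   # 47040 424617
```

Under these assumptions nearly all trainable parameters sit in the output layer, which is typical for a shallow CNN applied to large input images.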

5.3. Results and Discussion

The classification accuracy rate is the ratio of the number of correctly classified test samples to the total number of test samples. In this case, 97 of the 100 test samples were correctly classified. As shown in Figure 10, the accuracy rate reaches its maximum value and becomes stable at 390 iterations. The filter size in the convolution layer and the number of filters were optimized to ensure that the accuracy rate converges to the maximum value and is stable. The accuracy rate is defined as follows [20]:

$$\text{accuracy rate} = \frac{\text{number of correctly classified test samples}}{\text{total number of test samples}} \times 100\%.$$

The network performed with a classification accuracy rate of 97%, misclassifying just three samples, as shown in Figure 11.

The feature maps for the eight fault-states and one normal state were extracted from the final convolutional layer of one network among the 10 trained networks, as shown in Figure 12. The figure clearly shows different feature maps for the eight fault-states and one normal state. According to these results, the feature map of each fault shows a different time-frequency characteristic.

The features must be visualized to verify the image recognition accuracy. If the data for each pixel are used as features, as shown in Figure 12, then the features are high-dimensional data and show the time-frequency characteristics related to each fault. Therefore, t-distributed stochastic neighbor embedding (t-SNE) was used for visualization by converting the high-dimensional data into two-dimensional data, as shown in Figure 13 [46]. “High-dimensional” implies that the number of features is high, and “two-dimensional” indicates that the number of features is two. Figure 13(a) shows the reduced two-dimensional features for 20 input images of the CNN; Figure 13(b) shows the reduced two-dimensional features for 20 iterated feature maps extracted from the final convolution layer of the trained CNN. Finally, t-SNE was applied to the 10 feature maps obtained from the final convolution layers of the 10 different networks trained with 10 different input datasets, converting these features into two-dimensional data. The results are plotted in Figure 13(c). The features of the raw images are scattered randomly, making it difficult to classify the fault and normal states. However, the features obtained by the trained network are grouped according to the fault or normal conditions. Therefore, it is possible to classify the fault and normal states.

6. Conclusions

In this paper, a novel condition monitoring approach for the CSD system is proposed, based on the integration of CWT and CNN. In the CSD system, eight fault-states and one normal state were artificially created. An IoT device for vibration measurement was also developed. The device features a wireless MEMS accelerometer, Bluetooth function, and Wi-Fi function. The vibration data were measured using the wireless MEMS accelerometer mounted on the CSD system. One-dimensional vibration signals in the time domain were transformed into time-scale images via the CWT. These images were then classified by a CNN, which can extract the underlying deep features embedded in the images that are closely related to fault types. As critical factors affecting the network classification accuracy, the filter size and number of filters in the convolutional and pooling layers of the CNN structure were optimized. The input images obtained by the CWT and the feature maps extracted by the CNN are high-dimensional data. Thus, t-SNE was used for visualization by converting the high-dimensional data into two-dimensional data. The two-dimensional features enabled clear classification of the eight fault-states and one normal state. The results showed that the condition monitoring approach for the CSD system based on the integration of CWT and CNN classifies the fault and normal states with high accuracy (97% on the test samples).

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.


Acknowledgments

This work was supported by the Inha University Research Grant.