#### Abstract

Vibration signals and shaft orbits are important features that reflect the operating state of rotating machinery, and fault diagnosis and feature extraction are critical to ensuring its safe and reliable operation. This paper proposes a novel fault diagnosis method based on a convolutional neural network (CNN), the discrete wavelet transform (DWT), and singular value decomposition (SVD). The CNN extracts features from shaft orbit images. The DWT is applied to the denoised swing signal of the rotating machinery to obtain the wavelet decomposition coefficients of each branch of the signal. After single-branch reconstruction of the coefficients of each branch, the SVD input matrix is formed and its singular values are extracted as the feature vector. The features extracted by the two methods are combined and then classified by support vector machines (SVMs). The comparison results show that this hybrid method has a higher recognition rate than the other methods tested.

#### 1. Introduction

With the increasing complexity of mechanical structures, it is increasingly important to monitor and diagnose the condition of rotating machinery such as hydropower units and wind turbine systems. In many cases, the vibration signal and the shaft orbit are important features that reflect the state of mechanical equipment. Therefore, an efficient feature extraction and fault diagnosis method plays a significant role in the operation management and condition monitoring of mechanical equipment.

In order to obtain useful information from the vibration signal that reflects the operating status of mechanical equipment, various signal processing methods have been proposed, such as Fourier analysis [1, 2], empirical mode decomposition (EMD) [3, 4], and the wavelet transform [5, 6]. In vibration signal processing, Guan et al. [7] used EMD to decompose the signal into several intrinsic mode functions (IMFs) with different signal-to-noise ratios. Mogal and Lalwani [8] studied the amplitude and phase of the vibration signal in three directions by using the FFT and order analysis, drawing a number of conclusions applicable to fault recognition. However, when the EMD method is used to batch process data, the number of IMF components is uncertain, and EMD is prone to end effects and mode mixing [9]. For signals with significant local characteristics, Fourier analysis methods are often ineffective. Although these problems are mitigated by subsequent improvements or new methods, they still exist. Among the many signal processing methods, the wavelet transform is widely used in fault diagnosis of rotating machinery because of its good time-frequency localization. Chen et al. [10] used the discrete wavelet transform (DWT) to obtain a time-frequency representation of planetary gearbox vibration signals. Lu et al. [5] denoised the vibration signals of hydropower generating units with multiple wavelets.

In order to identify the different operating states reflected by the vibration signal, feature extraction is an important part of signal processing. Unlike other feature extraction methods, singular value decomposition (SVD) extracts signal features from the perspective of data matrix transformation and has therefore been widely used. Wang et al. [11] applied SVD to the established matrix and obtained an optimal weight vector according to the orthogonal matrix parameters. Golafshan and Yuce Sanliturk [12] used SVD to denoise the bearing vibration signal and obtained more reliable fault diagnosis results. One of the contributions of this paper is to combine the wavelet transform and SVD to extract features of the swing signal of the rotor test bed.

In fault diagnosis, in addition to the analysis of the vibration waveform signal, the rotor shaft orbit is used as an information carrier for the actual operating condition of rotor machinery, and its shape characteristics are very important for judging the operating state of the unit. However, relevant research is limited because the shaft orbit is a two-dimensional image, which makes its features hard to extract, and the diagnosis depends heavily on expert knowledge or experience. The common methods are the Hu invariant moment [13], the Fourier descriptor [14], the Walsh transform [15], and so on, but these methods mainly rely on mathematical transformations of the graphics and cannot accurately extract the features of the image. In recent years, with the rise of deep learning [16], deep learning techniques represented by convolutional neural networks (CNNs) have shown particular advantages and potential in image feature extraction and recognition.

As one of the classical algorithms of deep learning, the CNN has received more and more attention. Its convolution structure has powerful feature learning ability, effectively reduces the complexity of the network, and is robust and fault tolerant, while being easy to train and optimize. These characteristics have made it highly successful in the field of image recognition, and it is also widely used in the fault diagnosis of rotating machinery. You et al. [17] applied a CNN and support vector regression (SVR) to the intelligent diagnosis of an automobile transmission gearbox. Janssens et al. [18] proposed a CNN-based feature extraction model for condition monitoring that autonomously learns useful features for bearing fault detection from the data itself. Jeong et al. [19] identified the rotor shaft orbit with a CNN. Verstraete et al. [20] developed a CNN to learn features directly from time-frequency images of raw rolling element bearing signals.

In this paper, DWT and SVD are used to obtain the features of swing signals (including *X* and *Y* directions) collected on the rotor test bed. Firstly, the DWT is used to transform the denoised swing signal of the rotor test bed, and the wavelet decomposition coefficients of each branch of the signal are obtained by transformation. Secondly, after single branch reconstruction of the different branch coefficients, the SVD input matrix is formed and the singular value (SV) is extracted to form the swing waveform feature vector. At the same time, a CNN is built to extract features of the different shaft orbits synthesized by swing signals. Lastly, the hybrid features extracted by SVD and CNN are input into support vector machines (SVMs) to identify the different operating status of the rotor test bed and thereby achieve the purpose of fault diagnosis.

The remainder of the article is organized as follows: Section 2 introduces the theoretical background of the proposed method, including the basic structure and training process of the CNN and the theory of DWT and SVD; Section 3 details the feature extraction method proposed in this paper; Section 4 verifies the hybrid feature extraction method and compares its operating state recognition results on the rotor test bed with those of other methods; and Section 5 summarizes the conclusions.

#### 2. Theoretical Background of the Proposed Method

##### 2.1. Convolutional Neural Network

The CNN is a typical feedforward neural network. In recent years, it has shown excellent performance in image recognition and target detection. It can handle overfitting problems well and enable deeper learning on a larger scale [21]. CNNs have now been successfully applied in the fields of document recognition [22], voice detection [23], vehicle license plate recognition [24], and fault diagnosis [25, 26].

The basic structure of a CNN can be divided into the input layer, convolution layer, pooling layer, fully connected layer, and output layer. The input layer receives the image; the convolution layer extracts image features by convolving the image with the convolution kernels; and the pooling layer pools the features extracted by the convolution layer to reduce the amount of network computation. Convolution layers and pooling layers alternate, so that each stage abstracts higher-level features from the previously extracted image features. The fully connected layer receives the feature vector transmitted by the front-end network and calculates the network output, and the output layer classifies the images.

###### 2.1.1. Convolution Layer

The convolutional layer is the core component of a CNN. In this layer, multiple convolution kernels are convolved with the input image, and after a bias is added, an activation function is applied to obtain a series of feature maps [27]. In general, it is calculated as follows:

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right), \tag{1}$$

where $x_j^{l}$ is the $j$th image of the $l$th layer, $f(\cdot)$ is the nonlinear activation function used by the convolutional layer, $M_j$ represents a selection of input feature maps, $k_{ij}^{l}$ is the weight matrix corresponding to the convolution kernel, and $b_j^{l}$ is the additive bias given to each output feature map.

In neural networks, the traditional nonlinear activation function is the sigmoid function $f(x) = 1/(1 + e^{-x})$, but it suffers from the vanishing gradient problem [28]. The ReLU activation function largely avoids this problem and has been widely used in deep neural networks in recent years [25, 29, 30]. It is described as follows:

$$f(x) = \max(0, x). \tag{2}$$
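As a minimal sketch of the convolution-plus-activation step described above, the following pure-Python example slides one kernel over one input map, adds a bias, and applies ReLU. The function names, the 4×4 input, and the 2×2 kernel are illustrative assumptions and are not taken from the paper; as in most CNN libraries, the kernel is not flipped.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return x if x > 0.0 else 0.0


def conv2d_valid(image, kernel, bias=0.0):
    """'Valid' sliding-window product of one input map with one kernel,
    followed by an additive bias and ReLU.

    Assumption of this sketch: the kernel is applied without flipping
    (cross-correlation), as is conventional in CNN implementations.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(relu(s))
        out.append(row)
    return out


# Hypothetical 4x4 input map and 2x2 kernel, for illustration only.
image = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 1.0],
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 2.0],
]
kernel = [[1.0, 0.0], [0.0, -1.0]]
feature_map = conv2d_valid(image, kernel, bias=0.5)
print(feature_map)
```

A 4×4 input and a 2×2 kernel give a 3×3 feature map; negative pre-activations are clipped to zero by ReLU.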

###### 2.1.2. Pooling Layer

The pooling layer is often connected behind the convolutional layer. It can reduce the size of the picture because it combines adjacent pixels in a particular area of the image into a single representative value while maintaining the invariance of the feature scale to some extent.

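A pooling layer is calculated as follows:

$$x_j^{l} = f\left(\beta_j^{l}\, \mathrm{down}\left(x_j^{l-1}\right) + b_j^{l}\right), \tag{3}$$

where $x_j^{l}$ is the $j$th image of the $l$th layer (pooling layer) and $\mathrm{down}(\cdot)$ is the pooling function. Each output image is given its own multiplicative bias $\beta_j^{l}$ and an additive bias $b_j^{l}$.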

Max pooling, mean pooling, and stochastic pooling are currently the most commonly used pooling algorithms. In the CNN developed in this article, the mean pooling algorithm is adopted.
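The mean pooling operation can be sketched as follows. This is only an illustration under stated assumptions: the 2×2 non-overlapping window and the sample feature map are hypothetical, and the multiplicative and additive biases of the pooling layer are omitted (equivalent to a multiplicative bias of 1 and an additive bias of 0).

```python
def mean_pool(feature_map, size=2):
    """Non-overlapping mean pooling with a size x size window.

    Each window of adjacent pixels is replaced by its average, so the
    output is `size` times smaller along each axis.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            window = [
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map, for illustration only.
fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [5.0, 7.0, 4.0, 2.0],
    [0.0, 0.0, 1.0, 1.0],
    [4.0, 4.0, 1.0, 1.0],
]
pooled = mean_pool(fmap, size=2)
print(pooled)
```

The 4×4 map is reduced to 2×2, which is the reduction in image size that the pooling layer provides.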

###### 2.1.3. Fully Connected Layer

The classification phase of CNN is generally composed of a fully connected layer and a classifier, which mainly integrates and classifies information from the previous layer. In CNN, the commonly used classifier is the softmax classifier [29, 31].

The cross entropy function is a very popular loss function in deep learning networks due to its high sensitivity to errors. Its expression is

$$E = \sum_{i=1}^{M} \left[-d_i \ln y_i - \left(1 - d_i\right) \ln\left(1 - y_i\right)\right], \tag{4}$$

where $y_i$ is the actual output of the $i$th output node, $d_i$ is the expected output of that node, and $M$ is the number of output nodes. In this paper, the ReLU activation function and the cross entropy-driven learning rule are employed, and minibatch stochastic gradient descent (MSGD) is applied to train the CNN.
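To illustrate why the cross entropy loss is sensitive to errors, the sketch below evaluates a softmax output against a correct and an incorrect expected output. It assumes the per-node form $E = \sum_i [-d_i \ln y_i - (1 - d_i)\ln(1 - y_i)]$ with one-hot expected outputs; the scores, the four output nodes, and the function names are hypothetical and chosen only for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, d, eps=1e-12):
    """Cross entropy summed over output nodes (assumed per-node form):
    E = sum_i [-d_i ln(y_i) - (1 - d_i) ln(1 - y_i)].
    Outputs are clipped to (eps, 1 - eps) to avoid log(0).
    """
    loss = 0.0
    for yi, di in zip(y, d):
        yi = min(max(yi, eps), 1.0 - eps)
        loss += -di * math.log(yi) - (1.0 - di) * math.log(1.0 - yi)
    return loss


# Hypothetical scores for four output nodes (one per operating state).
scores = [2.0, 0.5, 0.1, -1.0]
y = softmax(scores)

d_correct = [1.0, 0.0, 0.0, 0.0]  # expected output matches the largest score
d_wrong = [0.0, 0.0, 0.0, 1.0]    # expected output is the least likely node

loss_correct = cross_entropy(y, d_correct)
loss_wrong = cross_entropy(y, d_wrong)
print(loss_correct, loss_wrong)
```

The loss is several times larger when the network assigns low probability to the expected node, so misclassified samples produce large corrective gradients.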

##### 2.2. Wavelet Transform and Singular Value Decomposition

###### 2.2.1. Wavelet Theory

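The discrete wavelet transform (DWT) is obtained by directly discretizing the scale parameter and the shift parameter of the continuous wavelet transform (CWT). The continuous wavelet transform is expressed as follows:

$$W_f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \tag{5}$$

where $a$ is the scale parameter, $b$ is the shift parameter, and $\psi$ is the wavelet basis function.

$a$ and $b$ are discretized at the same scale, that is, $a = a_0^{j}$ and $b = k a_0^{j} b_0$. Generally, $a_0 = 2$ and $b_0 = 1$ [32]. Then, the DWT can be expressed as follows:

$$W_f(j, k) = 2^{-j/2} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(2^{-j} t - k\right) dt. \tag{6}$$

In order to implement multiscale analysis on a computer, the decomposition coefficients must be computed in discrete form by a fast algorithm. For an energy-limited physical signal $f(t)$, a finite-precision decomposition can be used, as follows:

$$f(t) = \underbrace{\sum_{k} c_{J,k}\, \varphi_{J,k}(t)}_{\in V_J} + \sum_{j=1}^{J} \underbrace{\sum_{k} d_{j,k}\, \psi_{j,k}(t)}_{\in W_j}. \tag{7}$$

In equation (7), $c_{j,k}$ is the scale coefficient, $V_j$ is the scale space, $d_{j,k}$ is the wavelet coefficient, and $W_j$ is the wavelet space; $\varphi_{j,k}$ and $\psi_{j,k}$ denote the corresponding scaling and wavelet basis functions, and $J$ is the number of decomposition levels. $c_{j,k}$ and $d_{j,k}$ can be calculated by the Mallat decomposition algorithm, which is shown in Figure 1. The details of the algorithm can be found in [33, 34].

In practice, the continuous signal $f(t)$ satisfies the Shannon sampling theorem, and the samples $f(k)$ obtained by digital sampling can be used as an approximate representation of the initial coefficients $c_{0,k}$, as shown in the following equation:

$$c_{0,k} \approx f(k). \tag{8}$$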

In this article, the coefficients of each branch are reconstructed individually (single-branch reconstruction). This yields the component of the original signal at the scale corresponding to those coefficients, and each reconstructed component has the same length as the original signal.
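Single-branch reconstruction can be sketched with a small pure-Python example. To keep the code short, the Haar wavelet is swapped in for the wavelet used in the paper, and the 32-sample test signal is hypothetical; the point is only that a 4-level decomposition yields five branches, each of which reconstructs to a component of the original length, and that these components sum back to the original signal.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One Haar analysis step: returns (approximation, detail)."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One Haar synthesis step: inverse of haar_step."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def wavedec_haar(x, level):
    """Multilevel decomposition; returns [cA_L, cD_L, ..., cD_1]."""
    coeffs = []
    approx = list(x)
    for _ in range(level):
        approx, detail = haar_step(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


def single_branch(coeffs, branch):
    """Reconstruct from one branch only, with all other branches zeroed."""
    kept = [c if i == branch else [0.0] * len(c) for i, c in enumerate(coeffs)]
    approx = kept[0]
    for detail in kept[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx


# Hypothetical test signal: two sinusoids plus a slow trend.
n = 32
signal = [
    math.sin(2 * math.pi * k / 8) + 0.5 * math.sin(2 * math.pi * k / 2.5) + 0.1 * k
    for k in range(n)
]

coeffs = wavedec_haar(signal, level=4)
branches = [single_branch(coeffs, b) for b in range(len(coeffs))]

# By linearity, the single-branch components add up to the original signal.
recombined = [sum(col) for col in zip(*branches)]
max_error = max(abs(r - s) for r, s in zip(recombined, signal))
print(len(branches), max_error)
```

Stacking the five equal-length components row by row gives a matrix with one row per branch, which is the kind of matrix that can then be passed to SVD.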

###### 2.2.2. Singular Value Decomposition (SVD)

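A singular value (SV) is defined as follows: given a matrix $A \in \mathbb{R}^{m \times n}$, the arithmetic square roots of the eigenvalues of the matrix $A^{T}A$ are called the singular values of $A$.

SVD is, in essence, a matrix decomposition method. For a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$, there exist a unitary (orthogonal) matrix $U \in \mathbb{R}^{m \times m}$ and a unitary (orthogonal) matrix $V \in \mathbb{R}^{n \times n}$ that satisfy

$$A = U \Sigma V^{T}, \qquad \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) & O \\ O & O \end{bmatrix} \in \mathbb{R}^{m \times n}. \tag{9}$$

Equation (9) is the definition of SVD, which means that matrix $A$ can be decomposed into the product of three matrices, where $(\sigma_1, \sigma_2, \ldots, \sigma_r)$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ is the SV vector. In this paper, the wavelet decomposition coefficients of the signal, after single-branch reconstruction, are used to form the SVD input matrix $A$.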
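As a small worked example of the definition of singular values (the $2 \times 2$ matrix is hypothetical and chosen only so that the arithmetic is easy to follow), the singular values of a real matrix can be obtained as the square roots of the eigenvalues of $A^{T}A$:

$$
A = \begin{bmatrix} 3 & 0 \\ 4 & 5 \end{bmatrix}, \qquad
A^{T}A = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}, \qquad
\det\left(A^{T}A - \lambda I\right) = (25 - \lambda)^2 - 400 = 0
\;\Rightarrow\; \lambda_1 = 45,\ \lambda_2 = 5,
$$

$$
\sigma_1 = \sqrt{45} = 3\sqrt{5} \approx 6.708, \qquad
\sigma_2 = \sqrt{5} \approx 2.236 .
$$

As a check, $\sigma_1 \sigma_2 = 15 = |\det A|$. The SV vector of this matrix is therefore $(3\sqrt{5}, \sqrt{5})$.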

#### 3. The Proposed Hybrid Method

The proposed hybrid fault diagnosis method is a combinational algorithm based on CNN and DWT-SVD theories, and it is named CNN-wavelet SVD in the following. In this model, the CNN is used only as a feature extractor. The features extracted by the CNN and by DWT-SVD are combined and classified by a support vector machine (SVM) [35, 36]. When the method is used for fault diagnosis of rotating machinery in this article, the process can be divided into the following four steps, and the flowchart is shown in Figure 2:

(1) The denoised swing signals collected on the rotor test bed are decomposed into a series of wavelet decomposition coefficients by DWT. After reconstruction of the coefficients of the different branches, the SVD input matrix is formed and the SVs are extracted as the feature vector.

(2) The swing waveforms (*X* and *Y* directions) are synthesized into the corresponding shaft orbit images. After preprocessing, the shaft orbits are converted into grayscale images of uniform size and serve as input to the CNN. The structure of the CNN developed in this paper is shown in Figure 3.

(3) The CNN is trained with the shaft orbit images in different operating states, and feature vectors of the shaft orbits are obtained by using the trained CNN.

(4) The feature vectors extracted by the two methods are combined into a hybrid vector. In the classification phase, an SVM is trained to recognize the operating condition of the rotor test bed.
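The synthesis of a shaft orbit image from the two swing waveforms (step 2) could look roughly like the sketch below. It is not the preprocessing used in the paper: the 32×32 image size, the binary pixel values, the function name `orbit_to_image`, and the synthetic elliptical orbit are all assumptions made for illustration. The sampling frequency of 2048 Hz and the rotation frequency of 20 Hz (1200 r/min) follow the experimental setup reported in Section 4.

```python
import math


def orbit_to_image(x, y, size=32):
    """Rasterize (x, y) swing samples into a size x size grayscale image.

    Both axes share one scale so that the shape of the orbit is preserved.
    Pixels visited by the orbit are set to 255, all others stay 0.
    """
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    image = [[0 for _ in range(size)] for _ in range(size)]
    for xv, yv in zip(x, y):
        col = min(int((xv - lo) / span * size), size - 1)
        row = min(int((hi - yv) / span * size), size - 1)  # larger y is higher up
        image[row][col] = 255
    return image


# Synthetic elliptical orbit (hypothetical data, not measured signals).
fs = 2048        # sampling frequency in Hz
rot_hz = 20.0    # 1200 r/min
n = 2048         # samples per record
x = [math.cos(2 * math.pi * rot_hz * k / fs) for k in range(n)]
y = [0.6 * math.sin(2 * math.pi * rot_hz * k / fs) for k in range(n)]

image = orbit_to_image(x, y, size=32)
lit = sum(1 for row in image for v in row if v == 255)
print(lit)
```

Because the record spans many revolutions, the same pixels are visited repeatedly and the resulting image does not depend on where in the revolution the record starts.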

As shown in Figure 3, the CNN applied in this article contains two convolution layers, two pooling layers, and a full connected layer. The convolution kernel size is determined as and , respectively. Other parameters are shown in Table 1.

#### 4. Experiment Results and Analysis

In order to verify the effectiveness of the method, the rotor test bed shown in Figure 4 was used as the experimental object for feature extraction. The test bed was driven by a DC motor, and a DH5600 speed controller was used to control its speed. The rotor had a diameter of 10 mm and a length of 850 mm and was formed by connecting two shaft segments through a flange. Two discs with a diameter of 75 mm were installed on the rotor. The sensors of the system included two displacement sensors for measuring the swing, an acceleration sensor for measuring vibration, and a rotational speed sensor for measuring the rotational speed. The rotational speed sensor was placed between the motor and the top bearing, and the displacement sensors were placed horizontally and vertically at a distance of 20 cm from the mass disc.

The swing and vibration signals collected by the sensor were ultimately transmitted to a computer for storage, display, and analysis. The sampling frequency was 2048 Hz, and the rotation speed of rotor was 1200 r/min. Figure 5 shows the signal collection and experimental control device.

This test bed can simulate a variety of rotor faults such as imbalance, misalignment, and rubbing of the unit. The imbalance fault was simulated by screwing a 2 g mass into the threaded hole near the edge of the mass disc to create an unbalanced centrifugal force. A misalignment fault means that the axes of the two shafts connected by the flange are not collinear; it was produced by misaligning the two shafts at the flange. A rubbing fault means that the shaft collides or rubs with other parts of the machine during rotation; it was produced by screwing in the rubbing bolt until it contacted the rotating shaft.

##### 4.1. Data Acquisition and Processing

Four states were studied in this experiment: the normal state, unbalanced state, misaligned state, and contact-rubbing state. We collected 350 sets of swing waveform data under each of the normal and unbalanced states, 210 sets under the misaligned state, and 270 sets under the contact-rubbing state. Each set of waveform data includes 2048 points. The denoised swing signal of each condition and the corresponding shaft orbit are shown in Figure 6. Since each shaft orbit is formed by superposing multiple rotation periods of the swing signal, the choice of the initial point does not affect the identification and feature extraction of the shaft orbit.
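A short calculation, using only the sampling frequency, rotation speed, and record length stated above, shows how many revolutions each record contains:

$$
f_r = \frac{1200\ \text{r/min}}{60\ \text{s/min}} = 20\ \text{Hz}, \qquad
\frac{f_s}{f_r} = \frac{2048}{20} = 102.4\ \text{samples per revolution},
$$

$$
T = \frac{2048\ \text{points}}{2048\ \text{Hz}} = 1\ \text{s}, \qquad
T \cdot f_r = 20\ \text{revolutions per record}.
$$

Each shaft orbit image therefore superposes about 20 revolutions, which is consistent with the remark that the starting point of a record is not critical.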

Figure 6: Denoised swing signal of each condition and the corresponding shaft orbit, panels (a) to (h).

##### 4.2. Feature Extraction

The denoised swing signals under the different states were decomposed by the wavelet transform. In the experiment, the wavelet basis function was “DB8,” and the number of decomposition levels was set to 4. The swing signals were decomposed with MATLAB’s wavelet function “wavedec,” which yields a series of wavelet coefficients. The wavelet coefficients were then reconstructed with zero-padding extension, and the resulting wavelet reconstruction waveforms are shown in Figure 7.

Figure 7: Wavelet reconstruction waveforms, panels (a) to (d).

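The five sets of wavelet reconstruction coefficients of each signal are used to construct the input matrix $A$ of the SVD. Because the length of each signal is 2048, $A \in \mathbb{R}^{5 \times 2048}$. According to equation (9), there exist a unitary matrix $U \in \mathbb{R}^{5 \times 5}$ and a unitary matrix $V \in \mathbb{R}^{2048 \times 2048}$ that satisfy

$$A = U \Sigma V^{T}. \tag{10}$$

In equation (10), $\Sigma = \left[\operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_5),\ O\right] \in \mathbb{R}^{5 \times 2048}$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_5 \ge 0$. So $\sigma_1, \sigma_2, \ldots, \sigma_5$ are the SVs, which carry the characteristics of the signal components. In other words, $(\sigma_1, \sigma_2, \ldots, \sigma_5)$ is the SV vector representing the signal feature.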

As shown in Figure 6, the shaft orbits differ between operating conditions, so it is feasible to use a CNN for feature extraction and recognition. In this experiment, 50% of the shaft orbit images of each condition were selected randomly and input to the CNN for training, and the other half were used for testing. To prevent overfitting, the dropout algorithm was used in the CNN, and MSGD was used as the optimization method. The number of features extracted from the CNN is an important choice, so the experiment was repeated with different numbers of features. The accuracy of the CNN is shown in Figures 8 and 9.

In these two figures, the number of features extracted from the CNN varies from 10 to 100. As shown in Figures 8 and 9, CNNs with different numbers of features converge differently: when the number of features is between 10 and 50, the convergence process is unstable, and when it is between 60 and 100, the convergence process is relatively stable. Figures 9 and 10 show that the CNN with 90 features converges better and faster, while more features may lead to overfitting.

The trained CNN was used to extract features of shaft orbit images, and the number of features is 90. Then, the extracted 90 features of each shaft orbit image were combined with the 10 features of corresponding swing waveform (*X* and *Y* directions) extracted by SVD to become a hybrid feature vector. The length of each hybrid feature vector is 100. To avoid large differences in the values of different feature vectors, all feature vectors were normalized.
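The construction of the hybrid feature vector can be sketched as follows. The paper does not state which normalization was applied, so per-feature min-max scaling to [0, 1] is an assumption of this sketch, and the feature values are synthetic placeholders rather than real CNN outputs or singular values.

```python
def combine_features(cnn_features, sv_x, sv_y):
    """Concatenate 90 CNN features with 5 + 5 singular values (X and Y)."""
    return list(cnn_features) + list(sv_x) + list(sv_y)


def min_max_normalize(samples):
    """Scale every feature (column) to [0, 1] across all samples.

    Assumption: min-max scaling; the paper only says that the feature
    vectors were normalized. Constant features are mapped to 0.
    """
    n_features = len(samples[0])
    lows = [min(s[j] for s in samples) for j in range(n_features)]
    highs = [max(s[j] for s in samples) for j in range(n_features)]
    normalized = []
    for s in samples:
        row = []
        for j in range(n_features):
            span = highs[j] - lows[j]
            row.append((s[j] - lows[j]) / span if span > 0 else 0.0)
        normalized.append(row)
    return normalized


# Synthetic placeholder data for three samples (not measured values).
samples = []
for i in range(3):
    cnn = [0.01 * (i + 1) * (j + 1) for j in range(90)]   # small CNN activations
    sv_x = [100.0 / (j + 1) + i for j in range(5)]        # large singular values
    sv_y = [80.0 / (j + 1) + 2 * i for j in range(5)]
    samples.append(combine_features(cnn, sv_x, sv_y))

hybrid = min_max_normalize(samples)
print(len(hybrid), len(hybrid[0]))
```

Without scaling, the singular values (tens to hundreds in this toy data) would dominate the much smaller CNN features in any distance-based classifier. In practice the minimum and maximum would normally be computed on the training set only and then applied to the test set.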

##### 4.3. Comparison Experiments

In this section, the extracted hybrid feature vectors of the different states were input into an SVM for classification and identification. 50% of the feature vectors of each state were selected with the same random rule as in Section 4.2 and input into the SVM for training, and the other 50% were used for testing. The SVM uses the radial basis function (RBF) as its kernel function, and its parameters are selected with a grid parameter optimization algorithm combined with cross-validation. In order to verify the advantage of the proposed feature extraction method, the classification and identification results were compared with those of the CNN. The comparison of the results of these two methods is shown in Table 2.
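For reference, the RBF kernel and the idea of a parameter grid can be sketched in a few lines. The kernel form $K(u, v) = \exp(-\gamma \lVert u - v \rVert^2)$ is the standard RBF definition; the grid ranges, the toy samples, and the names `c_grid` and `gamma_grid` are hypothetical, and the cross-validation loop that would score each candidate is not shown.

```python
import math


def rbf_kernel(u, v, gamma):
    """Radial basis function kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma):
    """Kernel matrix K[i][j] = K(samples[i], samples[j])."""
    return [[rbf_kernel(u, v, gamma) for v in samples] for u in samples]


# Hypothetical search grid over the penalty factor C and kernel parameter gamma.
# Each (C, gamma) pair would be scored by cross-validation on the training set,
# and the best-scoring pair would be used to train the final SVM.
c_grid = [2.0 ** k for k in range(-2, 5)]
gamma_grid = [2.0 ** k for k in range(-4, 3)]
candidates = [(c, g) for c in c_grid for g in gamma_grid]

# Toy two-dimensional samples, for illustration only.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(samples, gamma=0.5)
print(len(candidates), K[0][1], K[0][2])
```

A larger gamma makes the kernel more local (similarity decays faster with distance), which is why it is tuned jointly with the penalty factor.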

In Table 2, the identification rate of the CNN-wavelet SVD is 100% at every epoch, whereas the identification rate of the CNN is 49.49% at epoch 1 and reaches 100% only at epoch 35. This comparison shows that the CNN-wavelet SVD reaches its final identification rate from the first epoch while the CNN alone needs 35 epochs, and it gives a preliminary indication that this method is better suited to rotor mechanical fault diagnosis and feature extraction than the traditional CNN.

To further highlight the effectiveness of the proposed feature extraction method, this paper explored the recognition effect of the method when the training set was a small sample and the test set was a large sample. Next, the accuracy of this feature extraction method was tested when the training set for each state was 10, 15, and 20 samples, and the rest were test set samples. As a comparison, the image features and waveform features in the mixed feature samples were separately input to the SVM for recognition.
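The size of the resulting training and test sets follows directly from the number of records collected per state in Section 4.1. The short sketch below only performs this bookkeeping; the dictionary and function names are illustrative.

```python
# Number of swing waveform records collected per state (Section 4.1).
samples_per_state = {
    "normal": 350,
    "unbalanced": 350,
    "misaligned": 210,
    "contact-rubbing": 270,
}


def split_sizes(train_per_state):
    """Return (training set size, test set size) for a given number of
    training samples taken from each of the four states."""
    train = train_per_state * len(samples_per_state)
    test = sum(samples_per_state.values()) - train
    return train, test


for n in (10, 15, 20):
    train, test = split_sizes(n)
    share = train / (train + test)
    print(f"{n} per state: {train} training, {test} test ({share:.1%} for training)")
```

With 10, 15, and 20 training samples per state, only about 3% to 7% of the 1180 records are used for training. These test-set sizes also appear consistent with the waveform-feature accuracies reported in the following subsections, for example 886/1140 ≈ 77.7193%, 1112/1120 ≈ 99.2857%, and 1098/1100 ≈ 99.8182%, although the correct-classification counts themselves are an inference and are not stated in the paper.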

###### 4.3.1. The Number of Training Set = 10

When the training set for each state is 10, the recognition rate of mixed features of each generation by SVM and the recognition rate of the image features of each generation by SVM are shown in Figure 11. The parameters of SVM are determined as the best penalty factor and the best core parameter given by a radial basis function where .

As shown in Figure 11, although the recognition rates of SVM on these two kinds of features can eventually reach 100%, it is clear that the recognition effect of SVM on hybrid features is better than that of single image features.

The recognition rate of SV features of the swing waveform by SVM is shown in Figure 12. In Figure 12, because of the similarity of SV features in misaligned state and contact-rubbing state, the SVM fails to effectively distinguish the contact-rubbing state based on waveform features, and the accuracy is 77.7193%.

The classification result of SVM on these three kinds of features is shown in Table 3. It is clear that, with a small number of learnable samples, the hybrid features extracted by CNN-wavelet SVD can still make the SVM distinguish the four states well.

###### 4.3.2. The Number of Training Set = 15

When the training set for each state is 15, the recognition rate of mixed features of each generation by SVM and the recognition rate of the image features of each generation by SVM are shown in Figure 13. As the number of training samples increases, the recognition rate of SVM on hybrid features and image features also increases, but the recognition effect of SVM on hybrid features is still better than that of single image features. The recognition rate of SV features of the swing waveform by SVM is shown in Figure 14, and the accuracy is 99.2857%.

In Table 4, the recognition results of SVM on the three different features in Figures 13 and 14 are compared, and it can be seen that, under the same experimental samples, the hybrid features can more fully reflect the operating state of the machine.

###### 4.3.3. The Number of Training Set = 20

When the training set for each state is 20, the recognition rates of SVM on hybrid features and image features are shown in Figure 15, and the accuracy of hybrid features has reached 100%. The recognition rate of SV features of the swing waveform by SVM is shown in Figure 16, and the accuracy is 99.8182%.

Table 5 shows the recognition rate of SVM on three different features, and it can be seen that the recognition rate of hybrid features is still the highest when the training set for each state is 20.

#### 5. Conclusions

In this article, a novel feature extraction method combining wavelet SVD with CNN is proposed. The wavelet SVD is used to extract waveform features of swing signals (*X* and *Y* directions) collected from the rotor test bed, and the trained CNN is applied for image features extraction of shaft orbits synthesized by corresponding swing signals. The number of features extracted from CNN is determined by comparative analysis. In experiment, SVM is used to distinguish the different running states of the machine based on the extracted features, and a grid parameter optimization algorithm and a cross-validation method are used to select the best parameters of the SVM.

In order to verify the effectiveness of the proposed method, 50% of the feature vectors extracted by this method are first input into the SVM for training and the other 50% are used for testing, and the recognition result is compared with that of the CNN. The result gives a preliminary indication that the hybrid features extracted by this method better reflect the operating status of mechanical equipment. Furthermore, in order to highlight the effectiveness of the method, this paper tests the recognition effect of the SVM on hybrid features when the training set is a small sample. For comparison, the same number of image feature samples extracted by the CNN and waveform feature samples extracted by wavelet SVD are also input separately into the SVM for training and testing. The results show that even when the training set is a small sample, the SVM still has the highest recognition rate for hybrid features, so the hybrid features can effectively improve the classification accuracy of the SVM when training samples are insufficient.

In summary, the proposed CNN-wavelet SVD method is able to obtain more obvious features of rotor signals and better recognition effect than other methods. Therefore, it can be better applied to rotor mechanical fault diagnosis and feature extraction.

#### Data Availability

The data used to support the findings of this study are included within the supplementary information file.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Authors’ Contributions

D.L.1. and Z.X. conceptualized the study. D.L.1. was responsible for the methodology. D.L.1. and X.L. analyzed using software. X.H. and D.L.4. performed formal analysis. Z.X. provided funding acquisition. D.L.1. wrote the original draft. P.Z. reviewed and edited the manuscript.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 51379160) and the 948 Program of the Ministry of Water Resources of China (Grant no. 201321).

#### Supplementary Materials

The supplementary material is the data used to support the findings of this paper.