Abstract

This paper constructs a novel network structure (SVD-1DCNN) based on singular value decomposition (SVD) and a one-dimensional convolutional neural network (1DCNN), which takes the original signal as input to realize intelligent diagnosis of bearing faults. In the simulation experiment, the output of the first convolution layer was analyzed in the time domain and the time-frequency domain. Qualitative and quantitative analysis showed that the convolution kernels not only extracted the classification features of the signals but also gradually highlighted the learned features during network training. The same behavior of the convolution kernels was observed when the network was applied to fault diagnosis on the bearing data provided by the Case Western Reserve University (CWRU) Bearing Data Center. The proposed network achieved a good classification effect on both the simulated signals and the measured signals.

1. Introduction

A small fault in a mechanical device often affects the stability and safety of the entire system and can even lead to catastrophic consequences [1]. As a key component of mechanical equipment, bearings are widely used in various types of machinery. A bearing failure can cause many serious mechanical failures, so the safe and smooth operation of bearings is critical to mechanical equipment. Timely detection, localization, and troubleshooting of bearing faults can effectively improve the safety of industrial production. Therefore, it is of great significance to study the fault diagnosis of bearings.

The traditional fault diagnosis process generally consists of three steps: data acquisition, feature extraction and selection, and fault pattern recognition [2].

The collected data include vibration signal, acoustic signal, and temperature signal, and since the vibration signal can directly characterize the state of the mechanical equipment, the vibration signal is most commonly collected in fault diagnosis [3]. In the fault diagnosis of bearing, commonly used signal processing methods include Fourier transform (FT) [4], short-time Fourier transform (STFT) [5], wavelet transform (WT) [6], Wigner–Ville distribution (WVD) [7], and empirical mode decomposition (EMD) [8]. The above methods can extract features that are conducive for classification and diagnosis [9, 10] and then pass the extracted features through various classifiers to realize pattern recognition of bearing faults. Among the various pattern recognition methods, the machine learning-based method is the most used. Wang et al. [11] used KPCA to extract features from bearing fault signal and used k-nearest neighbor (KNN) as a classifier to achieve diagnosis; Fei et al. [12] reconstructed the characteristics of bearing vibration signal after singular value decomposition based on wavelet packet transform phase space and established support vector machine (SVM) model of bearing diagnosis; Mahamad and Hiyama [13] performed fast Fourier transform (FFT) and envelope processing on the bearing vibration signal, extracted time domain and frequency domain feature as input, and then used ANN to fulfill the diagnosis. However, the existing intelligent fault diagnosis methods based on the above feature extraction and classification still have three limitations: first, the feature extraction methods often require the operators to have professional prior knowledge and rich experience. 
As the research progresses, the form of input signal becomes more diversified, and its objectivity and accuracy may be affected if feature extraction is still based on past experience [14, 15]; second, the feature extraction methods are poor in generality, and often a method only has a good feature extraction result for a certain type of signal; and third, feature extraction and pattern recognition are two independent processes, and the diagnosis model cannot be jointly optimized globally.

In recent years, the successful application of deep learning in the fields of speech recognition [16], face recognition [17], computer vision [18], and image processing [19] has made it a research hotspot. Various deep learning models can extract abstract features directly from the original signal, avoiding manual extraction of feature [20], and they also have better universality [21] and can jointly optimize the two processes of feature extraction and pattern recognition in various classification problems [22]. Thanks to these advantages, researchers have introduced a variety of deep learning models in bearing fault diagnosis; for example, Duong and Kim [23] constructed a DNN structure which is based on the stacked denoising autoencoder (DAE) nonmutually exclusive classifier (NMEC) method for combined modes to realize bearing fault diagnosis, Shao et al. [24] developed a convolutional deep belief network with Gaussian visible units to obtain an excellent accuracy rate of bearing fault diagnosis, Chen and Li [25] utilized the acceleration sensors to collect the vibration signal of the bearing and input the time domain and frequency domain characteristics of the signal into multiple two-layer sparse autoencoder (SAE) neural networks for feature fusion, and then the fused feature was further classified by DBN. Lu et al. [26] established a deep neural network model based on autoencoder (AE) and achieved good results in bearing fault diagnosis. Shao et al. [27] proposed a novel optimization deep belief network (DBN) for bearing fault diagnosis which is verified by the simulation signal and experimental signal of a rolling bearing.

Figure 1 shows the main differences between the traditional fault diagnosis method and the deep learning-based fault diagnosis method.

Convolutional neural network [28] is a typical deep learning model that has also attracted attention. It extracts the characteristics of the signal layer by layer through convolution, pooling, and nonlinear activation function mapping. Compared with the fully connected deep learning model, CNN has stronger robustness and better generalization ability [29]. At the same time, CNN improves network performance and reduces training costs by weight sharing and pooling operation and is less prone to overfitting problem than other deep learning models [30]. From the perspective of input, the existing CNN models include two types: one-dimensional convolutional neural network (1DCNN) and two-dimensional convolutional neural network (2DCNN).

For 2DCNN, its input is actually two-dimensional matrix. In the fault diagnosis of bearing, researchers used a variety of methods to convert one-dimensional original signal into two-dimensional matrix and then used it as 2DCNN input. In [20], the one-dimensional signal was converted into two-dimensional gray map as the input of 2DCNN, and the input of 2DCNN in [31] was the root mean square (RMS) map of the characteristics of the vibration signal after Fourier transform (FT). In [32], the continuous wavelet transform scale (CWTS) map was directly classified by 2DCNN.

However, in practice, the bearing vibration signal is a one-dimensional time signal, and the method of converting the original one-dimensional signal into a two-dimensional matrix also depends on experience. These methods cannot guarantee that the conversion is free of warping, distortion, or even loss of useful information, any of which may result in insufficient feature learning and low accuracy. If the original one-dimensional signal is used as input directly, the input of the network contains all the feature information in the original signal and the above problem is avoided. In addition, compared with 2DCNN, 1DCNN has better interpretability: the convolution kernel and its extracted feature are one-dimensional vectors, so multiple signal processing methods can be conveniently used to study them, which is conducive to further understanding 1DCNN and its feature extraction mechanism.

For 1DCNN, its input is one-dimensional vector. In practice, the actual measured signal often contains a lot of noise, which will greatly increase the difficulty in extracting fault features in a simple shallow CNN model. In the case where the measured noisy signal is input, the diagnosis accuracy can be improved by the following two methods.

One idea is to preprocess and denoise the signal. Common denoising methods with good performance include wavelet transform [33], singular value decomposition (SVD) [34], and ARMED filtering [35]. The noise components in the signal are removed by an artificial method, and the denoised signal is used as the input of the 1DCNN. However, these methods also rely on experience. The denoised signal also loses some features. It is impossible to determine whether the removed signal components contain the classification features required by the network, and the process of denoising and network extraction is also two independent processes.

Another idea is to reduce human intervention by directly using the original signal as input and completing feature extraction and pattern recognition through 1DCNN. Previous studies have shown that, for noisy signals, increasing the number of network layers allows the network to learn higher-level, richer signal classification features. However, deeper networks have two shortcomings. First, the error is calculated by the chain rule in the form of backpropagation, which easily leads to the gradient decreasing or increasing exponentially as the number of layers grows. Therefore, the deeper the CNN is, the more easily it encounters the vanishing gradient or exploding gradient problem, and the more difficult it is to train [29]. Second, the deeper the network, the more likely network degradation becomes, which leads to an increase of the sample error in the training process. Similarly, increasing the number of feature maps can also increase the content learned by the network, enabling the network to learn more signal features, but it also brings an overfitting problem to the network.

These problems have greatly limited the application of CNN in fault diagnosis. Therefore, this paper proposes a network structure based on SVD and 1DCNN (SVD-1DCNN), which improves the pattern recognition accuracy rate of the network by embedding the SVD layer in the network, and its input is the original signal. The feasibility of the method was verified by the simulated signal and the measured signal.

The rest of the paper is organized as follows: Section 2 briefly describes SVD-1DCNN, Section 3 performs the simulation experiment, Section 4 uses the proposed method for bearing fault diagnosis and verifies the effectiveness and feasibility of the method, and Section 5 presents the conclusions.

2. Materials and Methods

2.1. Signal Denoising Based on SVD

SVD is a classical matrix transformation method. Because of its zero phase offset, no initialization parameters, and easy implementation, it has been widely used in signal denoising.

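For an arbitrary matrix $A \in \mathbb{R}^{m \times n}$, the SVD gives

$$A = U \Sigma V^{T},$$

where $U$ is an $m \times m$ matrix, $V$ is an $n \times n$ matrix, and $\Sigma$ is an $m \times n$ matrix whose elements are 0 except those on the principal diagonal; the elements on the principal diagonal of $\Sigma$ are called the singular values of matrix $A$.

Express $U$ and $V$ in matrix form as follows: $U = [u_1, u_2, \ldots, u_m]$ and $V = [v_1, v_2, \ldots, v_n]$, where $u_i \in \mathbb{R}^{m \times 1}$ and $v_i \in \mathbb{R}^{n \times 1}$.

Express $\Sigma$ in matrix form as follows.

When $m < n$,

$$\Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m) & O \end{bmatrix}.$$

When $m > n$,

$$\Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n) \\ O \end{bmatrix}.$$

$A$ is further rewritten in terms of $u_i$ and $v_i$:

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^{T},$$

where $r = \min(m, n)$, and $A$ is further rewritten in the form of a matrix sum:

$$A = \sum_{i=1}^{r} A_i, \qquad A_i = \sigma_i u_i v_i^{T}.$$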

It can be seen that the essence of SVD is to decompose any $m \times n$ matrix into a linear superposition of several submatrices of the same dimension. The weight of each submatrix, i.e., its singular value $\sigma_i$, reflects the importance of that submatrix. Singular values often carry potentially important information about the matrix. Based on these characteristics of SVD, the singular values of a signal matrix containing complex information can be conveniently selected for study, which makes signal feature extraction possible.
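The superposition described above can be checked numerically. The following is a minimal NumPy sketch, assuming a small random matrix purely for illustration (the matrix `A`, its size, and the variable names are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))          # illustrative m x n matrix (m = 4, n = 6)

# Thin SVD: U is m x r, s holds the r singular values, Vt is r x n, r = min(m, n)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-one submatrices sigma_i * u_i * v_i^T, each with the same shape as A
submatrices = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

# Summing the submatrices recovers A up to floating-point error
A_rebuilt = np.sum(submatrices, axis=0)
print(np.allclose(A, A_rebuilt))
```

NumPy returns the singular values in descending order, so the first submatrices carry the largest weights.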

As mentioned earlier, SVD is a decomposition method for matrix, but the actual signal is one-dimensional. Therefore, the key to extracting signal features by SVD is to transform one-dimensional signal into two-dimensional matrix. The existing forms of matrix construction mainly include Cycle matrix, Toeplitz matrix, and Hankel matrix. Among them, SVD based on Hankel matrix can better highlight the useful features of signals [36], which is conducive to the separation of useful signal and noise.

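For a noisy signal $x = [x(1), x(2), \ldots, x(N)]$ with length $N$, the Hankel matrix of the signal is constructed as follows:

$$H = \begin{bmatrix} x(1) & x(2) & \cdots & x(n) \\ x(2) & x(3) & \cdots & x(n+1) \\ \vdots & \vdots & & \vdots \\ x(N-n+1) & x(N-n+2) & \cdots & x(N) \end{bmatrix}, \tag{5}$$

where $1 < n < N$ and $m = N - n + 1$ is the number of rows. Each signal admits multiple Hankel matrices with different numbers of columns. When constructing the Hankel matrix, the product of the row number and the column number should be maximized as far as possible, so the matrix should be square or nearly square [37]. Because $m + n = N + 1$ is fixed, the inequality principle implies that the product $mn$ is largest when $m$ and $n$ are equal or close, so the structure of the optimal Hankel matrix is determined as follows:

$$n = \frac{N}{2},\; m = \frac{N}{2} + 1 \quad (N \text{ even}); \qquad n = \frac{N+1}{2},\; m = \frac{N+1}{2} \quad (N \text{ odd}).$$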
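A short sketch of this construction in NumPy follows. It assumes the near-square rule is applied with m = N - n + 1 rows and n = N/2 columns for even N (n = (N + 1)/2 for odd N); the function names are hypothetical.

```python
import numpy as np

def hankel_shape(N):
    """Near-square Hankel shape (m rows, n columns) with m = N - n + 1."""
    n = (N + 1) // 2          # N/2 for even N, (N + 1)/2 for odd N
    m = N - n + 1
    return m, n

def hankel_matrix(x):
    """Hankel matrix whose entry (i, j) is x[i + j] (0-based indexing)."""
    x = np.asarray(x, dtype=float)
    m, n = hankel_shape(len(x))
    idx = np.arange(m)[:, None] + np.arange(n)[None, :]
    return x[idx]

H = hankel_matrix(np.arange(1, 8))   # signal 1..7, N = 7
print(H.shape)
print(H)
```

For a window of length 1024, this rule gives a 513 x 512 matrix.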

The key to denoising a noisy signal by SVD is how to separate the singular values of the useful signal from those of the noise. Zhao et al. [38] proposed a method to determine the singular values of the useful signal based on the singular value difference spectrum. Assume that the Hankel matrix of the noisy signal has the form shown in equation (5) and that the singular value difference spectrum method identifies the first $k$ singular values as those of the useful signal. The Hankel matrix of the useful signal separated by this method can then be expressed as

$$\hat{H} = \sum_{i=1}^{k} \sigma_i u_i v_i^{T}.$$

Furthermore, the denoised signal can be obtained by reducing $\hat{H}$ to a one-dimensional signal. It can be seen that the singular value difference spectrum method is a denoising method based on the characteristics of the data itself.
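The denoising procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the number of retained singular values is taken at the largest peak of the difference spectrum, and the truncated matrix is reduced to a one-dimensional signal by averaging its anti-diagonals (the text does not specify the reduction step). The function name and the test signal are hypothetical.

```python
import numpy as np

def svd_denoise(x):
    """Denoise a 1-D signal by truncated SVD of its near-square Hankel matrix."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = (N + 1) // 2
    m = N - n + 1
    idx = np.arange(m)[:, None] + np.arange(n)[None, :]   # H[i, j] = x[i + j]
    U, s, Vt = np.linalg.svd(x[idx], full_matrices=False)

    # Difference spectrum b_i = sigma_i - sigma_(i+1); keep values up to its peak
    diff = s[:-1] - s[1:]
    k = int(np.argmax(diff)) + 1

    # Rank-k Hankel estimate of the useful signal
    H_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

    # Assumed reduction step: average entries that map to the same sample
    sums = np.bincount(idx.ravel(), weights=H_k.ravel(), minlength=N)
    counts = np.bincount(idx.ravel(), minlength=N)
    return sums / counts, k

rng = np.random.default_rng(0)
t = np.arange(512) / 1024.0
clean = np.sin(2 * np.pi * 50 * t)                 # illustrative useful signal
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised, k = svd_denoise(noisy)
print(k, np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

A single sinusoid has a Hankel matrix of rank 2, so two singular values stand out from the noise floor in this example.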

2.2. Proposal of Diagnostic Model

Figure 2 shows the SVD-1DCNN structure constructed in this paper. The network embeds an SVD layer after C1 to realize further feature transformation. The feature maps in the SVD layer are connected to the corresponding feature maps in C1.

The SVD-1DCNN mainly includes an input layer, convolution layers, pooling layers, a fully connected layer, an output layer, and an SVD layer. The convolution layers, the pooling layers, and the SVD layer are the core structures of the SVD-1DCNN, and each of them has several feature maps. Each feature map is connected to a corresponding feature map on the adjacent layer, and the output of a feature map in the previous layer is the input of the corresponding feature map in the next layer.

In this structure, SVD layer denoises and reconstructs the output of C1 (primary classification feature) to achieve joint optimization of feature extraction and denoising and reconstruction. The denoised and reconstructed features are used as input to the next layer. In this network structure, the convolution kernels realize adaptive denoising of signals, and the useful feature components required by the network are highlighted. The SVD layer’s denoising and reconstruction process further highlights the features, so it is more conducive to network extraction classification features.

Since SVD-1DCNN includes a SVD layer, in order to ensure that the network can carry out backpropagation, it is necessary to ensure that the error can be backpropagated from S2 layer to C1 layer, and the weights and bias in C1 can be updated. This process of SVD-1DCNN is described below.

Suppose the signal in the input layer is $x$, the output of the $j$-th feature map of C1 is $c_j$, and the output of the corresponding feature map of the SVD layer is $s_j$. Then the output of the $i$-th node of the $j$-th feature map of C1 is $c_j(i)$, the output of the corresponding node on the corresponding feature map of the SVD layer is $s_j(i)$, and the difference between the two is $\Delta_j(i) = c_j(i) - s_j(i)$. The singular value difference spectrum method achieves denoising based on the characteristics of the signal itself, so for a given input, $\Delta_j(i)$ is a constant. In the process of error backpropagation, assuming that the error passed from the S2 layer to the $i$-th node of the $j$-th feature map of the SVD layer is $\delta_j(i)$, the error passed from this node to the corresponding node of the corresponding feature map of C1 is also $\delta_j(i)$, since $s_j(i) = c_j(i) - \Delta_j(i)$. Because $\Delta_j(i)$ is a constant, $s_j(i)$ is differentiable with respect to the weights and bias of C1 in the backpropagation process. In the process of network training, the weights and bias in C1 can therefore be updated by $\delta_j(i)$.
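The argument above can be written out as a short derivation. It is a sketch under the assumption stated in the text that the component removed by the SVD layer is constant for a given input. The notation is introduced here for illustration: $c_j(i)$ is the output of node $i$ of feature map $j$ of C1, $s_j(i)$ is the corresponding output of the SVD layer, $\Delta_j(i)$ is their difference, $\delta_j(i)$ is the error arriving at the SVD layer node from S2, $E$ is the training loss, and $w_j$, $b_j$ are the kernel weights and bias of feature map $j$ of C1.

```latex
\begin{aligned}
s_j(i) &= c_j(i) - \Delta_j(i), \qquad
\frac{\partial s_j(i)}{\partial c_j(i)} = 1, \\
\frac{\partial E}{\partial c_j(i)} &=
\frac{\partial E}{\partial s_j(i)}\,
\frac{\partial s_j(i)}{\partial c_j(i)} = \delta_j(i), \\
\frac{\partial E}{\partial w_j} &=
\sum_i \delta_j(i)\,\frac{\partial c_j(i)}{\partial w_j}, \qquad
\frac{\partial E}{\partial b_j} =
\sum_i \delta_j(i)\,\frac{\partial c_j(i)}{\partial b_j}.
\end{aligned}
```

Under this assumption the SVD layer behaves as an additive offset during the backward pass, so the error reaches C1 unchanged.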

Figure 3 shows the flowchart of the method, and the specific steps are as follows:

Step 1: the original signals are taken as network input. After the feature transformation through the C1 layer, the primary classification features in the original signals are extracted, and the primary classification features are used as the input of the SVD layer.

Step 2: in each training process, the SVD layer denoises and reconstructs the output of C1 to further extract higher-level classification features.

Step 3: the classification features extracted by the SVD layer are used as the input of the next layer to achieve deeper feature transformation.

In each training process, the SVD layer denoises and reconstructs the output of C1, which is conducive to the network to obtain more obvious classification characteristics of signals under the background of noise and enhance the network fault diagnosis ability.
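The three steps can be sketched as one forward pass through C1, the SVD layer, and the first pooling layer. This is a minimal illustration under several assumptions that the text does not fix: "valid" cross-correlation without an activation function, a pooling window of 2, truncation at the largest peak of the singular value difference spectrum, and anti-diagonal averaging to return to a one-dimensional feature map. All function names and the random inputs are hypothetical.

```python
import numpy as np

def svd_layer(c):
    """Denoise one C1 feature map via truncated SVD of its Hankel matrix."""
    N = len(c)
    n = (N + 1) // 2
    m = N - n + 1
    idx = np.arange(m)[:, None] + np.arange(n)[None, :]
    U, s, Vt = np.linalg.svd(c[idx], full_matrices=False)
    k = int(np.argmax(s[:-1] - s[1:])) + 1          # difference-spectrum peak
    H_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    sums = np.bincount(idx.ravel(), weights=H_k.ravel(), minlength=N)
    return sums / np.bincount(idx.ravel(), minlength=N)

def forward_c1_svd_s2(x, kernels, biases):
    """Step 1: C1 convolution. Step 2: SVD layer. Step 3: feed the next layer."""
    pooled = []
    for w, b in zip(kernels, biases):
        c = np.correlate(x, w, mode="valid") + b     # C1 feature map
        s = svd_layer(c)                             # denoised, reconstructed map
        L = (len(s) // 2) * 2
        pooled.append(s[:L].reshape(-1, 2).mean(axis=1))  # average pooling, step 2
    return np.stack(pooled)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)                        # illustrative input signal
kernels = 0.01 * rng.standard_normal((3, 351))       # three 1 x 351 kernels
biases = np.zeros(3)
out = forward_c1_svd_s2(x, kernels, biases)
print(out.shape)
```

Each feature map is processed independently, matching the one-to-one connection between C1 and SVD layer feature maps described above.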

3. Performance Analysis Based on Simulated Signals

3.1. Construction of Simulation Signals

Four types of simulation signals with a signal-to-noise ratio (SNR) of 20 dB are constructed, and the feasibility of the research is verified by classifying them. Table 1 shows the relevant parameters of the signals, where $n(t)$ is the Gaussian white noise and $\varphi$ is the phase of the signals, which is randomly generated between 1 and 100. The time domain and time-frequency diagrams of the four types of signals are shown in Figure 4.

In SVD-1DCNN, since the original signals contain noise, the output of each feature extraction layer also contains noise. In order to quantify the feature extraction effect of each feature extraction layer, the estimated SNR, denoted $\widehat{\mathrm{SNR}}$, is defined as an index. Assuming that the output of a layer is $y$ and the denoised signal reconstructed by SVD is $\hat{y}$, the noise component can be expressed as $y - \hat{y}$. Then the estimated SNR of the output of this layer can be expressed by the following equation:

$$\widehat{\mathrm{SNR}} = 10 \log_{10} \frac{\sum_{i} \hat{y}(i)^{2}}{\sum_{i} \bigl(y(i) - \hat{y}(i)\bigr)^{2}}. \tag{9}$$

Further, the estimated SNR of each simulation signal is calculated by equation (9), and the results are shown in Table 2. As can be seen from Table 2, the estimated SNR accurately reflects the actual SNR of the signals.
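A small sketch of this index follows, assuming the usual power-ratio definition of SNR in decibels: the power of the denoised signal divided by the power of the removed component. To keep the check simple, the known clean signal stands in for the SVD-reconstructed signal; the sinusoid, the helper names, and the noise scaling are illustrative assumptions.

```python
import numpy as np

def snr_index(y, y_hat):
    """SNR in dB of y, treating y_hat as signal and y - y_hat as noise."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    noise = y - y_hat
    return 10.0 * np.log10(np.sum(y_hat ** 2) / np.sum(noise ** 2))

def add_white_noise(clean, snr_db, rng):
    """Add Gaussian white noise scaled so that the result has the given SNR."""
    noise = rng.standard_normal(len(clean))
    target_power = np.mean(clean ** 2) / 10.0 ** (snr_db / 10.0)
    noise *= np.sqrt(target_power / np.mean(noise ** 2))
    return clean + noise

rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0
clean = np.sin(2 * np.pi * 50 * t)          # hypothetical simulated component
noisy = add_white_noise(clean, 20.0, rng)
value = snr_index(noisy, clean)
print(round(value, 3))
```

In practice the clean signal is unknown, and the SVD-denoised output takes its place as `y_hat`.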

3.2. Pattern Recognition of Simulated Signal

The network structure of SVD-1DCNN is shown in Figure 2. Because SVD-1DCNN is an improved network based on 1DCNN, its structure and parameters are likewise chosen according to the specific pattern recognition task and experience. In this task there are only four types of simulated signals, so, based on previous experience, the learning rate of the network is set to 0.1, the training batch size to 10, and the maximum number of iterations to 1500; both pooling layers use average pooling with a step size of 2. For convenience of representation, (m, n)-[p, q] is used to denote the relevant parameters of the network, where m and n represent the sizes of the convolution kernels in the two convolution layers, and p and q represent the numbers of convolution kernels in the corresponding convolution layers.

SVD is usually used as a preprocessing step in signal processing: the original signals are denoised first, and the denoised signals are then used in the subsequent analysis. Therefore, in order to compare the classification effects, a second network structure (SVD + 1DCNN) is constructed. In this network, the original signals are denoised first, and the denoised signals are used as the input of the network to realize pattern recognition, as shown in Figure 5. The other parameter settings of this network are the same as in SVD-1DCNN.

Four types of simulation signals are used for the experiment. Each signal contains 60 samples (50 training samples and 10 test samples). In order to verify the stability of the networks, 10 experiments were conducted for each network structure. In addition, the classification results of each experiment of the two networks were evaluated by confusion matrix and accuracy. Confusion matrix is calculated by the four parts composed of true label and prediction label, which are true positive (TP), false negative (FN), false positive (FP), and true negative (TN), respectively. The confusion matrix is shown in Table 3.

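The accuracy is the overall judgment of the classification model, i.e., the proportion of correct predictions in the total number of samples. It is calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}.$$

$a_i$, $\bar{a}$, and $s^2$ were used to represent the classification accuracy of the $i$-th experiment, the average accuracy, and the variance of the accuracy, respectively, and then $a_i$, $\bar{a}$, and $s^2$ satisfy the following equations:

$$\bar{a} = \frac{1}{10} \sum_{i=1}^{10} a_i, \qquad s^2 = \frac{1}{10} \sum_{i=1}^{10} \left(a_i - \bar{a}\right)^2.$$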
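These quantities can be computed directly from a confusion matrix and a list of per-experiment accuracies. In the sketch below, the confusion matrix and the ten accuracy values are hypothetical numbers chosen for illustration, not results from the paper. The variance is taken as the population variance (`ddof=0`), which is an assumption about the normalization.

```python
import numpy as np

def accuracy(confusion):
    """Correct predictions (diagonal) divided by all predictions."""
    confusion = np.asarray(confusion, dtype=float)
    return np.trace(confusion) / confusion.sum()

# Hypothetical 4-class confusion matrix: rows are true labels, columns predictions
confusion = np.array([
    [10, 0, 0, 0],
    [0, 8, 1, 1],
    [0, 0, 10, 0],
    [0, 1, 0, 9],
])
acc_one = accuracy(confusion)

# Hypothetical accuracies of 10 repeated experiments
accs = np.array([0.925, 0.95, 0.925, 0.9, 0.925, 0.95, 0.925, 0.925, 0.9, 0.925])
mean_acc = accs.mean()
var_acc = accs.var(ddof=0)      # population variance; use ddof=1 for the sample form
print(acc_one, mean_acc, var_acc)
```

For a two-class matrix, the diagonal sum equals TP + TN, so `accuracy` reduces to the binary formula.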

In the classification of simulation signals, different network structures need to be tested over multiple experiments to determine the best one. Among the various network structures, the network whose structure is (351, 80)-[3, 3] is taken as an example to show its confusion matrix in one experiment and its average accuracy and variance after 10 experiments.

Table 4 is the confusion matrix in one experiment when the structure of SVD-1DCNN is (351, 80)-[3, 3].

Table 5 is the confusion matrix in one experiment when the structure of SVD + 1DCNN is (351, 80)-[3, 3].

As can be seen from Table 4, SVD-1DCNN classifies every type of signal correctly. As can be seen from Table 5, SVD + 1DCNN classifies 90% of the test samples correctly for three of the four signal types, with the remaining 10% assigned to one other class in each case; for the remaining signal type its classification accuracy is 80%, with 10% assigned to each of two other classes. In general, SVD-1DCNN has a higher classification accuracy than SVD + 1DCNN.

Ten experiments were carried out on both networks. Table 6 shows the accuracy of each experiment, the average accuracy, and the variance of the two networks after 10 experiments.

As can be seen from Table 6, over multiple experiments the variance of SVD-1DCNN is 0 and the variance of SVD + 1DCNN is $1.25 \times 10^{-4}$, indicating that both networks have excellent stability. According to the final experimental results, SVD-1DCNN has a higher average accuracy than SVD + 1DCNN. In the classification of simulation signals, SVD-1DCNN has a better classification effect.

In addition, the average accuracy and variance of SVD-1DCNN and SVD + 1DCNN were calculated with different network structures, as shown in Table 7.

As shown in Table 7, both networks have excellent stability, and the classification effect of SVD-1DCNN is better than that of SVD + 1DCNN. It can be seen that the number of convolution kernels has a considerable impact on the classification results. With too few convolution kernels, the network fails to achieve high classification accuracy, but more convolution kernels are not always better: an excessive number may even reduce the training effect of the network.

In the above network structure, the network structure of (351, 80)-[3, 3] has the best classification effect, so it will be taken as the research object in the following part. Therefore, the final parameters of SVD-1DCNN are as follows: the first convolutional layer contains three convolution kernels, each of which has a size of 1 × 351; the second convolutional layer contains three convolution kernels, each with a size of 1 × 80; the learning rate is 0.1; the training batch is 10; the maximum number of iterations is 1500; the pooling mode of the two pooling layers is average pooling; and the step size is 2. The corresponding parameters in SVD + 1DCNN are the same as those in SVD-1DCNN.
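The feature map lengths implied by these parameters can be worked out with simple arithmetic. The sketch below assumes an input length of 1024 (the window length used for the measured signals in Section 4.1; the length of the simulated signals is not stated), "valid" convolution with stride 1, and a pooling window equal to the step size of 2. The helper names are hypothetical.

```python
def conv_len(length, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return length - kernel + 1

def pool_len(length, step=2):
    """Output length of non-overlapping pooling with the given step."""
    return length // step

input_len = 1024                      # assumed input length
c1_len = conv_len(input_len, 351)     # first convolution layer, 1 x 351 kernels
svd_len = c1_len                      # the SVD layer keeps the feature map length
s2_len = pool_len(svd_len)            # first average pooling layer
conv2_len = conv_len(s2_len, 80)      # second convolution layer, 1 x 80 kernels
pool2_len = pool_len(conv2_len)       # second average pooling layer

print(c1_len, svd_len, s2_len, conv2_len, pool2_len)
```

Under these assumptions each of the three feature maps shrinks from 1024 points to 129 points before the fully connected layer.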

3.3. Analysis of the Role of Convolution Kernel
3.3.1. Qualitative Analysis

In the SVD-1DCNN network structure, each feature map in the convolution layer contains a convolution kernel, and the result of convolving the kernel with the signals is the output of the feature map. In order to analyze the role of the convolution kernels during training, one of the four simulated signals is taken as an example. During the training process, the output of the feature map of C1 of SVD-1DCNN is extracted, and its time domain and time-frequency diagrams are shown in Figure 6.

It can be seen from Figure 6 that, during the training process, the convolution kernel highlights part of the frequency characteristics and suppresses other frequency characteristics. C1 highlights this portion of the frequency characteristics as primary classification features of the input. It is worth noting that the convolution kernel only selects part of the frequency features from the input as the primary classification features, which adaptively realizes the dimensionality reduction of the data and improves the classification efficiency of the network.
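One way to examine which frequencies a one-dimensional kernel highlights is to view it as an FIR filter and compute its frequency response. The sketch below uses a hypothetical kernel (a Hann-windowed 50 Hz cosine of length 351) and an assumed sampling rate of 1024 Hz; neither is taken from the trained network.

```python
import numpy as np

fs = 1024                                   # assumed sampling rate in Hz
n = np.arange(351)                          # kernel length 351, as in C1
kernel = np.hanning(351) * np.cos(2 * np.pi * 50 * n / fs)   # hypothetical kernel

# Magnitude response, zero-padded to fs points so that each bin is 1 Hz wide
response = np.abs(np.fft.rfft(kernel, n=fs))
freqs = np.fft.rfftfreq(fs, d=1.0 / fs)

peak_hz = freqs[np.argmax(response)]
print(peak_hz, response[200] / response[50])
```

A kernel with this response passes components near 50 Hz and strongly attenuates a component at 200 Hz, which is the kind of selective behavior described above.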

Figure 7 shows the time domain and frequency domain diagrams of the output of the C1 feature map for the four types of signals at the end of the network training. It can be seen that the convolution kernel performs different denoising operations on the four types of signals and retains the main frequency components in the original signals.

At the same time, in order to more intuitively analyze the feature extraction effect of the convolution kernel on the signals, the output of C1 is shown in Figure 8. Figure 8 shows the time-frequency diagrams of the four original signals and the time-frequency diagrams of the output of C1 during network training.

It can be seen from Figures 7 and 8 that, in the training process, the convolution kernel realizes the feature extraction of the original signals. As the number of iterations increases, the noise components are gradually eliminated, highlighting the features learned by the network.

3.3.2. Quantitative Analysis

In order to analyze the feature extraction effect of the convolution kernels on the original signal, the estimated SNR is used as the evaluation index. Table 8 lists the estimated SNR of the outputs of C1’s three feature maps after 200, 500, and 1000 iterations.

In order to visually explain the change in the estimated SNR, one of the simulated signals is used as an example. Figure 9 shows the estimated SNR of the three feature maps’ outputs, calculated according to equation (9).

It can be seen from Figures 8 and 9 that, in the training process, the primary classification features extracted by each convolution kernel have a higher estimated SNR than the input, which highlights the useful feature components in the signals. At the same time, as the number of iterations increases, the estimated SNR of the primary classification features becomes higher, and the useful feature components in the signal become more significant.

Through simulation experiment, it can be found that, in the training process, the convolution kernels can adaptively remove the noise components in the signals according to the characteristics of the original signals and retain the learned features. It can be said that the convolution kernels not only extract the characteristic components in the original signals, but also achieve denoising.

3.4. Analysis of Two Networks’ Classification Effects

SVD + 1DCNN and SVD-1DCNN have different classification effects on the same dataset. Apart from the position of the SVD step, the parameters of the two networks are the same. Comparing the two structures shows that in SVD + 1DCNN the input of S2 is the output of C1, whereas in SVD-1DCNN it is the output of the SVD layer. Therefore, the estimated SNR of S2’s input is calculated to analyze the feature extraction capabilities of the two networks. For convenience of presentation, map 1, map 2, and map 3 are used to represent the output of C1 of SVD + 1DCNN, and Smap 1, Smap 2, and Smap 3 are used to represent the output of the SVD layer in SVD-1DCNN. Figure 10 shows the estimated SNR of S2’s input in the two networks.

It can be seen from Figure 10 that, in the training process of the two networks, the estimated SNR of S2’s input increases with the number of iterations, and the networks’ feature extraction ability becomes stronger. With the same number of iterations, the estimated SNR of S2’s input in SVD-1DCNN is higher than that in SVD + 1DCNN. This indicates that the input features of S2 in SVD-1DCNN are more obvious. In further feature extraction, features with a high estimated SNR are more conducive to the network learning high-level features.

Therefore, it can be said that SVD-1DCNN has stronger feature extraction ability than SVD + 1DCNN, which is conducive to improving the accuracy of pattern recognition.

4. Bearing Fault Diagnosis Based on SVD-1DCNN

4.1. Data Collection and Processing

The experimental data in this paper come from the bearing database of Case Western Reserve University (CWRU) [39]. The experimental data are the drive-end acceleration data sampled at 12 kHz. The data include four types: data with a fault diameter of 0.007 inches on the rolling element, data with a fault diameter of 0.007 inches on the inner ring, data with a fault diameter of 0.007 inches on the outer ring, and normal data.

The length of each signal is about 120,000 points. In order to increase the randomness of the training set and the test set, a window of length 1024 is moved along the signal in random steps, starting from the first point of each signal, as shown in Figure 11. In the sampling process, 60 samples are obtained from each signal; 10 of them are randomly selected as the test set, and the remaining 50 are used as the training set. In this way, a training set containing 200 samples and a test set containing 40 samples are obtained.
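The sampling scheme can be sketched as follows. The text does not give the distribution of the random steps, so this sketch assumes uniform integer steps bounded so that all 60 windows fit inside the signal; the function names and the stand-in signal are hypothetical.

```python
import numpy as np

def sample_windows(signal, n_samples=60, window=1024, rng=None):
    """Cut n_samples windows, advancing the start point by random steps."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal)
    max_step = (len(signal) - window) // (n_samples - 1)
    if max_step < 1:
        raise ValueError("signal too short for the requested number of windows")
    steps = rng.integers(1, max_step + 1, size=n_samples - 1)
    starts = np.concatenate(([0], np.cumsum(steps)))   # first window at point 0
    return np.stack([signal[s:s + window] for s in starts])

def split_train_test(samples, n_test=10, rng=None):
    """Randomly hold out n_test samples; the rest form the training set."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(len(samples))
    return samples[perm[n_test:]], samples[perm[:n_test]]

rng = np.random.default_rng(0)
signal = np.arange(120_000, dtype=float)     # stand-in for one measured signal
samples = sample_windows(signal, rng=rng)
train, test = split_train_test(samples, rng=rng)
print(samples.shape, train.shape, test.shape)
```

Applying this to each of the four signal types yields 4 x 50 = 200 training samples and 4 x 10 = 40 test samples.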

4.2. Fault Diagnosis of Measured Signals

Figure 12 shows the signals in the four states in the time domain and time-frequency diagrams. It can be seen that the potential failure modes are masked by noise and the signal characteristics are hidden in strong background noise and unrelated interference.

As can be seen from Figure 12, the four measured signals contain a large amount of noise, which increases the difficulty of pattern recognition. The estimated SNR of the four types of signals is calculated according to equation (9), as shown in Table 9.

Figure 13 shows the output of C1 for the four types of signals during network training. As can be seen from Figure 13, the characteristics of the original signals are submerged in a large amount of noise, but after the C1 convolution, the noise in the original signal is gradually eliminated. As the number of iterations increases, the characteristic components in the original signal gradually become prominent.

In order to visually reflect how the estimated SNR changes, the rolling element fault signal is selected for explanation. Figure 14 shows the change of the estimated SNR of the outputs of the three feature maps of C1 during training.

It can be seen from Figure 14 that the measured signal has a low estimated SNR, and that the estimated SNR of the signal is improved after the C1 feature extraction. In the pattern recognition of the measured signals, the convolution kernel also selectively filters out the noise in the original signals, and as the number of iterations increases, the denoising effect becomes more significant.

In addition, the estimated SNR of S2’s input in SVD + 1DCNN and SVD-1DCNN is calculated, as shown in Figure 15.

As can be seen from Figure 15, for the measured signals, the estimated SNR of S2’s input likewise increases with the number of iterations in the training of the two networks. At the same number of iterations, the estimated SNR of S2’s input in SVD-1DCNN is higher than that in SVD + 1DCNN. Through the experimental analysis of the measured signals, it can be said that SVD-1DCNN has stronger feature extraction ability than SVD + 1DCNN. The confusion matrices of SVD-1DCNN and SVD + 1DCNN in this classification process are shown in Tables 10 and 11.

As can be seen from Table 10, SVD-1DCNN correctly classifies each type of measured signal. According to Table 11, for the rolling element fault signal, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as normal signal. For the inner ring fault signal, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as outer ring fault signal. For the outer ring fault signal, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as inner ring fault signal. For the normal signal, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as rolling element fault signal. For the measured signals, SVD-1DCNN has a higher classification accuracy than SVD + 1DCNN.

Ten experiments were carried out on both networks. Table 12 shows the accuracy of each experiment, the average accuracy, and the variance of the two networks after 10 experiments.

As can be seen from Table 12, in the classification of measured signals, both networks have excellent stability, and the experimental results of SVD-1DCNN are better than those of SVD + 1DCNN.

5. Conclusions

This paper proposes a fault diagnosis method based on SVD and 1DCNN, which takes the original signals as input and avoids the loss of feature information. The feasibility of the method is verified by experiments of simulated signals and measured signals. In addition, the role of convolution kernels in feature extraction is also analyzed. The main conclusions can be summarized as follows.

A novel network structure, SVD-1DCNN, is constructed by embedding an SVD layer after the first convolution layer of 1DCNN. In the novel network, the SVD layer denoises and reconstructs the output of the first convolution layer (primary classification feature) to achieve joint optimization of feature extraction and denoising and reconstruction, and the output of the SVD layer is used as input to the next pooling layer. Experiments show that the method has higher pattern recognition accuracy, which shows SVD-1DCNN is more conducive to the accurate diagnosis of bearing faults.

By analyzing the output of the first convolution layer, it is found that the convolution kernels in the network extract different frequency components for different signals and filter out other frequency components. In the training process, the convolution kernel plays the role of extracting features and denoising, and as the number of training iterations increases, the denoising effect of the convolution kernel improves.

Data Availability

The data used to support the findings of this study are available in [39].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (grant number: 51705531).