Due to variation of working conditions and influence of noise in vibration data, rolling bearing intelligent diagnosis based on deep learning faces challenges in efficient utilization of monitoring data and scientific extraction of fault features. This study proposes a one-dimensional convolution neural network (1D-CNN)-based intelligent diagnosis method for a rolling bearing, which fuses the horizontal and the vertical vibration signals, makes full use of spectral order features by full-spectrum analysis, and achieves accurate classification of fault pattern by 1D-CNN model. The experimental datasets of constant and variable working conditions of rolling bearing are constructed. The test results of the proposed method show that spectral order features are extracted effectively by full-spectrum analysis and high diagnostic accuracy is obtained by the constructed 1D-CNN model on both datasets. The comparison with the other four similar methods indicates that the diagnostic accuracy of the proposed method outperforms the comparative methods significantly in the case of variable operating conditions.

1. Introduction

Due to the complexity of the working environment and the variability of operating conditions, rolling bearings are prone to various faults during operation. The development of accurate fault diagnosis techniques for rolling bearings is crucial for monitoring the health condition of rotating machinery [1]. With the fast development of multisensor technology and artificial intelligence, intelligent diagnosis (ID) methods have been put into practice; they monitor the health condition of bearings with a data-driven model and avoid the dependence of analytic-based or knowledge-based models on professional expertise [2]. As a representative artificial intelligence method, deep learning, which originates from shallow neural networks, has the advantage of automatically learning latent features from raw data and mapping the learned features to the target output with the assistance of deep hierarchical architectures [3]. Several typical deep learning approaches have been employed in the field of bearing ID, such as the stacked autoencoder (AE) [4], deep belief network (DBN) [5], long short-term memory (LSTM) [6], and convolution neural network (CNN) [7, 8], and have shown outstanding performance in these references.

CNN biologically inspired by the visual cortex of mammals has the capacity for fusing multisource raw data appropriately and learning highly discriminative features by alternating convolution and subsampling operations. Due to its sharing weight structures, CNN needs a smaller number of training parameters than other deep learning approaches. In view of these advantages, CNN is widely used in image recognition [9], electrocardiogram signal classification [10], mechanical fault diagnosis [11], etc. There are two main CNN models in the area of bearing ID, that is, 2-dimensional (2D) CNN-based model and 1-dimensional (1D)-CNN-based model. Researchers have put forward a number of effective solutions to transform the 1D vibration data into 2D images, such as wavelet transform [12], wavelet packet transform [13], and spectral kurtosis diagram [14], in which 1D vibration signals are converted into 2D time-frequency images. The other path is to physically cut off 1D vibration signals and reshape them to meet the structural requirements of 2D-CNN. For example, Wang et al. [15] cut off vibration signals of three orthogonal directions into a series of segments with the same length and then converted signal amplitudes of the same position as pixel values of 2D image. Compared with 2D-CNN-based diagnosis models, 1D-CNN-based ones are more convenient in classifying mechanical vibration data, since it no longer needs to convert dimensions. Up to now, many scholars have adopted such models. For example, Levent et al. [16] provided a compact adaptive 1D-CNN classifier for induction-bearing fault diagnosis without any dimension transformation, hand-crafted feature extraction, and feature selection. Appana et al. [17] employed envelope spectrum and 1D-CNN to deal with bearing fault diagnosis under variable speed, in which 1D-CNN automatically extracted high-quality features and classified bearing defects. 
On the other hand, many studies have shown that multichannel data fusion is an effective way to improve diagnostic accuracy. Jing et al. [18] concatenated four types of signals (i.e., vibration signal, acoustic signal, current signal, and instantaneous angular speed) head to tail into one data sample to form 1D data-level fused input data. Chen et al. [19] combined horizontal and vertical vibration signals in parallel as an input sample of 1D-CNN. In addition, Bai et al. [20] proposed a diagnosis strategy based on a multichannel CNN combined with a multiscale clipping fusion data augmentation technique. Xue et al. [21] established a two-stream feature fusion CNN model for rolling bearing fault diagnosis. For diagnosis under variable conditions, several solutions have also been put forward. For example, Zhang et al. [22] constructed a multimode CNN model to improve the fault detection accuracy of rolling bearings under variable working conditions. Li et al. [23] proposed a data-driven fault feature separation method to eliminate the working condition information and extract precise, classifiable features for fault diagnosis. Su et al. [24] proposed a hierarchical branch CNN scheme that considered polluted data and variations in the working environment conditions. However, the above studies rely on practical experience to construct features, and some of them are limited to analyzing specific working conditions.

The literature mentioned above reveals that both 2D-CNN and 1D-CNN have been successfully utilized in some applications of bearing fault diagnosis. Nevertheless, efficient and precise intelligent diagnosis of rolling bearings in actual scenarios still faces three challenges: (1) Monitoring sensors produce many kinds of sampled data. If all of the data are used as CNN inputs, the computational load increases significantly. Thus, improving data utilization efficiency is an important issue for improving diagnosis efficiency. (2) The organization of input data has a great influence on the diagnosis result, especially under various working conditions. The trained network may output wrong diagnosis results when the input data are organized improperly. In this respect, research on proper data organization is of great significance to CNN-based diagnosis approaches. (3) The parameters of the CNN structure, such as the number of layers, the size of kernels in each convolution layer, and the subsampling rate in each pooling layer, have a significant impact on the diagnosis results. The optimal parameters are usually obtained by trial and error. Hence, in the actual diagnosis scenario, the establishment of an optimal CNN hierarchy is worth further study. Against this background, this study proposes a novel 1D-CNN-based diagnosis method for the rolling bearing, in which the full spectrum of dual-sensor data is calculated, the elements of each spectral order orbit are combined as input information, and a constructed 1D-CNN network is used to classify fault patterns.
The main contributions of the study are summarized as follows: (i) By analyzing the full spectrum of vibration signals in two orthogonal directions of a rolling bearing in operation, the relationship between the vibration frequency responses of the two directions is revealed, which provides more information about the characteristics of specific malfunctions than a half spectrum such as the Fourier spectrum. (ii) The spectral order based on the rotating frequency is taken as the datum, and the three elements of each spectral order orbit are combined as a pixel point of a sample. As a result, the approach can obtain satisfactory performance when the rolling bearing works at different speeds. (iii) The proposed approach designs a 1D-CNN-based model, which borrows the idea of pixel processing in 2D-CNN to fuse the three elements of each spectral order orbit for better feature extraction and categorization. The performance of the designed model is verified to be superior to that of four other similar methods.

The rest of this study is structured as follows: Section 2 introduces the basic theory of full spectrum, spectral order, and 1D-CNN model. Section 3 presents the process of the proposed approach and the detailed setting parameters of the designed 1D-CNN network. In Section 4, the proposed approach is used in analyzing rolling bearing experimental datasets, and then it is compared to the other four similar methods in classification performance. Finally, the conclusions and future work are discussed in Section 5.

2.1. Full Spectrum

As a commonly used spectral analysis technology, the Fourier spectrum is calculated from one channel signal and provides information about contained spectral components in the right half-plane. Nevertheless, in the case of multichannel homologous signals, half spectrums are independently obtained from these signals, giving no information about the phase correlation among them. In contrast, full spectrum, with the aid of analyzing two homologous channel data in orthogonal directions, obtains forward and backward spectral components in relation to the direction of rotor rotation, and the relative phase correlation between the spectral components of the two directions [25]. Thus, full spectrum is a more powerful tool in the field of mechanical fault diagnosis.

Full-spectrum technology is based on the fact that the rotor lateral response is the synthesis of a series of spectral components. The trace of the rotor is considered to be composed of a series of elliptical orbits under these spectral frequencies [26]. Each elliptical orbit is a sum of two circular orbits, rotating at the same frequency but in the opposite direction. The radiuses of the forward orbit and reverse orbit represent the spectral component strength in the direction of rotation and the opposite direction, respectively. The relative phase correlations between the forward and reverse responses are revealed by the inclination angle of ellipses.

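Supposing $x(n)$ and $y(n)$ ($n = 0, 1, \ldots, N-1$) are two vibration signals of length $N$, collected from two orthogonal directions of the rolling bearing, full-spectrum analysis can be carried out through the following steps:
(i) Constitute a complex sequence
$$z(n) = x(n) + j\,y(n), \tag{1}$$
where $j$ represents the imaginary unit.
(ii) Apply the Fourier transform (FT) to the sequence $z(n)$ to obtain $Z(k) = Z_R(k) + jZ_I(k)$, where $Z_R(k)$ and $Z_I(k)$ are the real part and imaginary part of $Z(k)$, respectively.
(iii) Calculate the long axis and short axis of each ellipse as follows:
$$R_L(k) = \frac{1}{N}\left(|Z(k)| + |Z(N-k)|\right), \qquad R_S(k) = \frac{1}{N}\left(|Z(k)| - |Z(N-k)|\right), \tag{2}$$
where $R_L(k)$ and $R_S(k)$ are the long axis and short axis of the $k$th ellipse orbit, respectively, and $|\cdot|$ indicates taking the complex modulus.
(iv) Define the inclination angle $\alpha_k$ of each ellipse as the angle between the long axis and the horizontal axis, and calculate it from the tangent of $2\alpha_k$ by the following equation:
$$\tan 2\alpha_k = \frac{Z_I(k)\,Z_R(N-k) + Z_R(k)\,Z_I(N-k)}{Z_R(k)\,Z_R(N-k) - Z_I(k)\,Z_I(N-k)}. \tag{3}$$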

It can be seen that the information fusion of dual-sensor data can be carried out with only one complex Fourier transform. Because the FT suffers from spectral leakage and lacks time-domain resolution, the time-domain signal is commonly multiplied by a window function before the FT to weaken the influence of these two drawbacks. In this study, each sample of dual-sensor data is collected under one working condition, and the Hann window function is chosen to suppress spectral leakage. Since each ellipse orbit carries the information of a single frequency component, it can be used to represent that spectral component. Each ellipse orbit is determined by three elements: long axis, short axis, and inclination angle, which are denoted as $(R_L, R_S, \alpha)$. Two special cases deserve mention. In the first, the ellipse degenerates into a circle: either the forward or the reverse spectral component vanishes, the long axis equals the short axis in length, and the relative phase is taken as 0, that is, $|R_S| = R_L$ and $\alpha = 0$. In the second, the ellipse degenerates into a straight line: the forward and reverse spectral components have equal strength, the short axis is 0, and the relative phase is defined as the orientation of the line, that is, $(R_L, 0, \alpha)$.
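The steps of full-spectrum analysis can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the direct $O(N^2)$ transform, the $1/N$ scaling of the radii, and the signed short axis are assumptions made for illustration, and no window function is applied. The inclination angle is taken as half the sum of the forward and reverse phases, which is the direction in which the two counter-rotating circles align.

```python
import cmath
import math


def dft(z):
    """Direct O(N^2) discrete Fourier transform of a complex sequence."""
    n = len(z)
    return [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def orbit_elements(x, y, k, tol=1e-9):
    """Return (long axis, short axis, inclination angle) of the k-th orbit.

    Assumption: with the unnormalized DFT above, the forward and reverse
    circle radii are |Z(k)|/N and |Z(N-k)|/N.
    """
    n = len(x)
    spectrum = dft([complex(a, b) for a, b in zip(x, y)])
    forward, reverse = spectrum[k], spectrum[(n - k) % n]
    r_f, r_b = abs(forward) / n, abs(reverse) / n
    if min(r_f, r_b) > tol:
        # The two circles align along the mean of their phases.
        alpha = 0.5 * cmath.phase(forward * reverse)
    else:
        # Circular orbit: the angle is undefined, so it is set to 0.
        alpha = 0.0
    # The sign of the short axis tells forward from reverse precession.
    return r_f + r_b, r_f - r_b, alpha
```

For a pure forward rotation ($x = \cos$, $y = \sin$) the sketch returns equal long and short axes, and for a vibration confined to one direction it returns a zero short axis, matching the two degenerate cases described above.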

2.2. Spectral Order

Spectral order is defined as the ratio of one frequency to a reference frequency. For rotating machinery, the rotation frequency of shaft is commonly taken as the reference frequency, since most fault characteristic frequencies are directly related to it. Thus, rotating frequency-based spectral order analysis is an effective technology for fault diagnosis under variable speed conditions.

The structure and parameters of a rolling bearing are shown in Figure 1, and its four fault characteristic frequencies can be expressed as follows:
$$f_o = \frac{Z}{2}\left(1 - \frac{d}{D}\right) f_r, \quad f_i = \frac{Z}{2}\left(1 + \frac{d}{D}\right) f_r, \quad f_b = \frac{D}{2d}\left[1 - \left(\frac{d}{D}\right)^2\right] f_r, \quad f_c = \frac{1}{2}\left(1 - \frac{d}{D}\right) f_r, \tag{4}$$
where $f_o$, $f_i$, $f_b$, and $f_c$ represent the fault characteristic frequencies of the outer race, inner race, ball, and cage, respectively, $Z$ is the number of balls, $f_r$ denotes the rotation frequency of the shaft, and $d$ and $D$ are the roller diameter and pitch diameter, respectively. Formula (4) shows that the four fault characteristic frequencies are proportional to the rotation frequency. Based on this, rotating-frequency-based spectral orders can be used as fault features for distinguishing fault types.
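The proportionality between the fault characteristic frequencies and the rotation frequency can be checked numerically with the textbook bearing formulas. The sketch below is illustrative only: the function names are hypothetical, the optional `contact_angle` argument is an assumption (it defaults to zero), and the geometry used in any call should be replaced by the values of the bearing under study.

```python
import math


def fault_orders(n_balls, d_roller, d_pitch, contact_angle=0.0):
    """Fault characteristic frequencies divided by the shaft frequency."""
    ratio = d_roller / d_pitch * math.cos(contact_angle)
    return {
        "outer": 0.5 * n_balls * (1.0 - ratio),
        "inner": 0.5 * n_balls * (1.0 + ratio),
        "ball": 0.5 * d_pitch / d_roller * (1.0 - ratio ** 2),
        "cage": 0.5 * (1.0 - ratio),
    }


def fault_frequencies(f_rot, **geometry):
    """Scale the speed-independent orders by the rotation frequency in Hz."""
    orders = fault_orders(**geometry)
    return {name: order * f_rot for name, order in orders.items()}
```

Because `fault_orders` does not depend on the shaft speed, a feature expressed in orders stays at the same position when the speed changes, which is the property the spectral order analysis relies on.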

Supposing the rotating frequency $f_r$ is obtained, the spectral order of a discrete sequence $x(n)$ ($n = 0, 1, \ldots, N-1$) can be calculated by the discrete-time Fourier transform (DTFT) as follows:
$$X(o) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi o f_r n / f_s}, \tag{5}$$
where $X(o)$ represents the spectral value at the $o$th order and $f_s$ is the sampling frequency of the sequence $x(n)$. The rotating frequency should be a fixed value in formula (5); otherwise, the spectral order should be obtained by the equal-angle sampling technique. In this study, each vibration signal is sampled at a fixed rotating frequency, and different vibration signals are sampled at different rotating frequencies. Hence, the DTFT is adequate for calculating the spectral order.
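A short sketch of the order-domain DTFT may help make the idea concrete. The function name and argument names are hypothetical; the code simply evaluates the DTFT at the frequencies `order * f_rot`, assuming the rotating frequency is constant over the sequence.

```python
import cmath
import math


def order_spectrum(x, f_rot, f_samp, orders):
    """Evaluate the DTFT of x at the frequencies order * f_rot (in Hz)."""
    result = []
    for order in orders:
        step = -2j * math.pi * order * f_rot / f_samp
        result.append(sum(v * cmath.exp(step * n) for n, v in enumerate(x)))
    return result
```

As a usage example, a component at three times the shaft frequency appears at order 3 whether the shaft turns at 10 Hz or 20 Hz, even though its frequency in hertz doubles.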

2.3. 1D-CNN

As a feed-forward neural network, CNN with three characteristics (i.e. local receptive fields, shared weights, and spatial subsampling) has shown outstanding performances in many areas such as image recognition and video processing. Its architecture mainly consists of convolution layers, pooling layers, and fully connected layers with an output classifier. 1D-CNN is a degraded version of CNN, where input raw data, filter kernels, and feature maps are all one dimensional.

The main function of the convolution layer is to nonlinearly map input data into a series of feature vectors, named feature maps. Filter kernels, acting like visual cortical receptors, are convolved with the input data within their receptive fields. Then the activation function receives the convolution results plus biases to obtain feature maps. The mathematical model of the convolution layer can be expressed as follows:
$$x_j^{l} = f\left(\sum_{i=1}^{C_{l-1}} x_i^{l-1} * w_{ij}^{l} + b_j^{l}\right),$$
where $x_j^{l}$ and $b_j^{l}$ are the output and bias of the $j$th feature map at layer $l$, respectively, $x_i^{l-1}$ and $C_{l-1}$ are the output of the $i$th channel and the number of channels at layer $l-1$, respectively, $w_{ij}^{l}$ is the shared weight between the $i$th channel at layer $l-1$ and the $j$th feature map at layer $l$, $*$ denotes the convolution operation, and $f(\cdot)$ denotes an activation function. Through convolution operations layer by layer, the inherent features of input samples can be learned and expressed as the output of feature maps.

The pooling layer usually follows each convolution layer, and its main function is to reduce the dimension of feature maps and maintain the invariance of characteristic scale. Max pooling, mean pooling, and stochastic pooling are the three commonly used pooling methods. A good pooling method can speed up calculation and prevent overfitting. In our proposed 1D-CNN model, the max pooling function is chosen for its superior performance in preserving local features, and its formula is as follows:
$$p_j^{l}(k) = \max_{(k-1)S + 1 \le t \le (k-1)S + W} a_j^{l}(t),$$
where $p_j^{l}(k)$ is the output of the $k$th pooling region in the $j$th feature map at layer $l$, $W$ and $S$ are the size and the stride step of the pooling region, respectively, and $a_j^{l}(t)$ denotes the corresponding element of the pooling region.
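The max pooling operation on a single one-dimensional feature map can be sketched in a few lines. The function name is hypothetical, and incomplete regions at the end of the map are simply dropped, which is an assumption rather than something stated in the text.

```python
def max_pool_1d(feature_map, size=2, stride=2):
    """Keep the largest value of each pooling region of a 1D feature map."""
    return [
        max(feature_map[start:start + size])
        for start in range(0, len(feature_map) - size + 1, stride)
    ]
```

With a region size and stride of 2, the length of the feature map is halved while the strongest local response in each region is preserved.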

The fully connected layer receives the feature maps extracted by the previous convolution and pooling layers and summarizes them with a fully connected single-layer perceptron to obtain higher-level features. Finally, a classifier function makes the categorization decision. The softmax regression function is one of the commonly used classifier functions, and its mathematical expression can be written as follows:
$$P(y = c \mid \mathbf{x}) = \frac{\exp\left(\mathbf{w}_c^{T}\mathbf{x} + b_c\right)}{\sum_{k=1}^{K}\exp\left(\mathbf{w}_k^{T}\mathbf{x} + b_k\right)},$$
where the output $P(y = c \mid \mathbf{x})$ denotes the estimated probability for the class $c$ with the input feature vector $\mathbf{x}$, $K$ is the number of classes, and $\mathbf{w}_c$ and $b_c$ represent the weight coefficients and bias in the fully connected layer, respectively.
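As an illustration of the classifier stage, the following sketch applies softmax to a list of class scores, where each score stands for the weighted sum plus bias of one class. The function name is hypothetical, and subtracting the maximum score before exponentiation is a common numerical safeguard added here, not a step described in the text.

```python
import math


def softmax(scores):
    """Convert class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - shift) for value in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The predicted category is the index of the largest probability, which is also the index of the largest score.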

3. Proposed Diagnosis Approach

The flowchart of the proposed diagnosis approach is illustrated in Figure 2. It mainly consists of four essential stages: (1) preprocessing raw vibration signals of rolling bearing; (2) analyzing full spectrum and calculating the three elements of each spectral order to form spectral order features as the input of constructed 1D-CNN network; (3) constructing a 1D-CNN network structure as a diagnosis decision maker; (4) training the constructed network with known samples to obtain its effective and robust parameters and diagnosing testing samples according to the output of the trained network.

3.1. Raw Data Preprocessing

In our method, vibration data in two orthogonal directions are considered as the raw data sample to enrich the expression of fault information. First, two acceleration sensors installed in orthogonal directions around the rolling bearing acquire raw data at the same time. Second, the raw data of each health state are sliced into many overlapping segments by a Hann window function of length $N$. In this way, we can obtain enough samples of each health state and maintain time-domain resolution when performing the Fourier transform. Figure 3 shows the detail of the processing step, where $M$ samples will be obtained if the length of the raw data is $L$:
$$M = \left\lfloor \frac{L - N}{S} \right\rfloor + 1,$$
where $N$ is the length of the Hann window function and $S$ is the length of the shift. Third, the two segments of the orthogonal directions that cover the same time span are combined into a complex sequence, as shown in formula (1).
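The overlapped slicing step can be sketched as follows. The helper names are hypothetical, and the symmetric form of the Hann window is an assumption, since the text does not say which variant is used. The segment count follows from the record length, window length, and shift length.

```python
import math


def hann(length):
    """Symmetric Hann window (the symmetric variant is an assumption)."""
    return [
        0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
        for i in range(length)
    ]


def count_segments(total_len, win_len, shift):
    """Number of full windows that fit when sliding by `shift` samples."""
    if total_len < win_len:
        return 0
    return (total_len - win_len) // shift + 1


def slice_record(record, win_len, shift):
    """Cut a record into overlapping, Hann-weighted segments."""
    window = hann(win_len)
    return [
        [record[start + i] * window[i] for i in range(win_len)]
        for start in range(0, len(record) - win_len + 1, shift)
    ]
```

For example, with the record length of 32768 points and the window and shift lengths of 2048 and 300 reported in the experimental section, `count_segments(32768, 2048, 300)` gives 103 segments per record and direction.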

3.2. Spectral-Order Feature Formation

After raw data preprocessing, the maximum spectral order number is set according to the length of each complex sequence. Then, the spectral order of each complex sequence is calculated by the discrete-time Fourier transform (DTFT), as shown in formula (5). Supposing $Z(o)$ represents the spectral value at the $o$th order, we take the real and imaginary parts of $Z(o)$ and record them as $Z_R(o)$ and $Z_I(o)$, respectively.

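As described in Sections 2.1 and 2.2, the spectral value of each order reveals some information about a rotation-frequency-based component, and each spectral order corresponds to an ellipse orbit whose shape is determined by three elements, that is, long axis, short axis, and inclination angle. Therefore, we use the three elements of each ellipse orbit to represent the corresponding spectral order component. The long axis and short axis of each ellipse orbit are calculated as shown in the following formula:
$$R_L(o) = \frac{1}{N}\left(|Z(o)| + |Z(-o)|\right), \qquad R_S(o) = \frac{1}{N}\left(|Z(o)| - |Z(-o)|\right),$$
where $R_L(o)$ and $R_S(o)$ are the long axis and short axis of the $o$th ellipse orbit, respectively, $Z(-o)$ is the spectral value at the opposite order, which carries the reverse component, $N$ is the length of the complex sequence, and $|\cdot|$ indicates taking the complex modulus. The inclination angle of each ellipse orbit is calculated according to the following formula:
$$\tan 2\alpha_o = \frac{Z_I(o)\,Z_R(-o) + Z_R(o)\,Z_I(-o)}{Z_R(o)\,Z_R(-o) - Z_I(o)\,Z_I(-o)},$$
where $\alpha_o$ is the inclination angle of the $o$th ellipse orbit and $\tan 2\alpha_o$ denotes the tangent of $2\alpha_o$.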

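After that, the three elements of each ellipse orbit are combined together as a tricolor pixel $(R_L(o), R_S(o), \alpha_o)$. All these tricolor pixels are spliced one by one into a spectral order feature matrix $\mathbf{F}$, in which each row holds the pixel of one spectral order:
$$\mathbf{F} = \begin{bmatrix} R_L(o_1) & R_S(o_1) & \alpha_{o_1} \\ R_L(o_2) & R_S(o_2) & \alpha_{o_2} \\ \vdots & \vdots & \vdots \\ R_L(o_K) & R_S(o_K) & \alpha_{o_K} \end{bmatrix}.$$

$\mathbf{F}$ is normalized by column and is then used as an input sample of the constructed 1D-CNN.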
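Column-wise normalization of the feature matrix can be sketched as below. The text does not state which normalization is used, so min-max scaling to the range [0, 1] is an assumption made for illustration, as is the function name. Each row is taken to hold the three elements of one spectral order.

```python
def normalize_columns(matrix):
    """Min-max scale every column of a row-major matrix to [0, 1].

    Assumption: min-max scaling; a constant column is mapped to 0.
    """
    columns = list(zip(*matrix))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        [(value - low) / span for value, low, span in zip(row, lows, spans)]
        for row in matrix
    ]
```

Normalizing each column separately keeps the long axis, short axis, and inclination angle on comparable scales before they enter the network as three channels.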

3.3. 1D-CNN Construction

Similar to the three channels of an image signal, the three elements of spectral orders can be regarded as three channels of a 1D-CNN input. The network structure of the 1D-CNN is shown in Figure 4. It is mainly composed of one input layer, four convolution-pooling layers, one flattening and random dropout layer, two fully connected layers, and one output layer. In order to avoid internal covariate shift and vanishing gradients in each layer and to accelerate the convergence of the network, the convolution results of each layer are batch normalized before being fed into activation functions. The batch normalization (BN) operation is described as follows:
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta,$$
where $x_i$ and $y_i$ are the input and output of the BN operation, respectively, $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and the mini-batch variance, respectively, and $\varepsilon$ is a small constant that avoids a zero denominator. $\gamma$ and $\beta$ are the scale transformation factor and the offset factor, respectively, which are learned during network training to restore the expressive ability of the network.
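The batch normalization operation can be illustrated for one feature over a mini-batch. This is a training-mode sketch with hypothetical names; the scale and offset factors are passed in as plain numbers instead of being learned, and the value of `eps` is an assumption.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    count = len(values)
    mean = sum(values) / count
    variance = sum((v - mean) ** 2 for v in values) / count
    scale = gamma / (variance + eps) ** 0.5
    return [scale * (v - mean) + beta for v in values]
```

With `gamma = 1` and `beta = 0` the output has approximately zero mean and unit variance, which is the state from which the learned factors restore the representation the network needs.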

In view of the experience of the existing literature, the size of convolution kernels used in 1D-CNN should be longer so as to increase the receptive field of each convolution kernel. Meanwhile, because the first convolution layer receives three channel data of spectral orders, we set the kernel size of first convolution layer with a length of 24 and a width of 3. As the output mappings of the first convolution layer become one dimensional, the widths of kernels in subsequent convolution layers are set to 1, but the lengths of kernels are set to decrease layer by layer. The stride step of all convolution layers is 2, so that the size of the feature map is halved. Each convolution layer is followed by a maximum pooling layer with both pooling region size and stride step of 2, so that the size of the feature map is halved again. The number of channels in the first layer is 32, and then the number of channels in the following layers is doubled or kept unchanged. The output of the last pooling layer with a size of 4 × 128 is flattened into a 512-dimensional vector. In order to reduce the interdependence and structural risk among neurons, the random dropout technique is used to the output of flattening layer with a dropout probability of 0.5; that is, the output of each neuron in the hidden layer is set to zero with 50% probability during network training. Then, the output of random dropout layer is fed into two fully connected layers for classification. The specific parameters of construction network are listed in Table 1.

Figure 5 shows the operation details of the first two convolution layers. In the first convolution layer, the three-channel data of input are convoluted with kernels of size 24 × 3 to output the first level feature mappings. Taking the first block data of input convoluted with kernel 11 in Figure 5 as an example, the result of 4 is obtained based on the convolution operation of 0 × 1 + 1 × 1 + 0 × 1 + 1 × 0 + 2 × 1 + 0 × 1 + 1 × 1 + 0 × 0 + 1 × 0. In the second convolution layer, the output mappings of the first convolution layer are convoluted with kernels of size 18×1 to output the second level feature mappings. Taking the first block data of map 11 convoluted with kernel 21 as an example, the result of 4 is obtained based on the convolution operation of 4 × 1 + 6 × 0. The convolution operation of subsequent layers is the same as that of the second convolution layer.
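The two worked products above can be reproduced with a small sketch. The values are taken from the text; their arrangement into rows of three (one row per position, one column per channel) is an assumption, since Figure 5 is not reproduced here, and the function name is hypothetical.

```python
def conv_block(block, kernel):
    """Sum of element-wise products of one input block and one kernel."""
    return sum(
        x * w
        for block_row, kernel_row in zip(block, kernel)
        for x, w in zip(block_row, kernel_row)
    )


# First convolution layer: three-channel block and kernel 11 (values from the text).
first_block = [[0, 1, 0], [1, 2, 0], [1, 0, 1]]
kernel_11 = [[1, 1, 1], [0, 1, 1], [1, 0, 0]]
first_result = conv_block(first_block, kernel_11)

# Second convolution layer: single-channel block of map 11 and kernel 21.
second_result = conv_block([[4], [6]], [[1], [0]])
```

Both `first_result` and `second_result` evaluate to 4, in agreement with the results quoted for Figure 5.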

3.4. Network Training

In our 1D-CNN-based model, the commonly used cross-entropy is applied as the cost function during network training, which is defined as follows:
$$J = -\frac{1}{M}\sum_{m=1}^{M}\sum_{k=1}^{K} y_{mk}\,\log \hat{y}_{mk},$$
where $y_{mk}$ and $\hat{y}_{mk}$ are the $k$th elements of the target vector and the estimated output vector of the $m$th sample, respectively, and $M$ and $K$ are the total numbers of samples and categories, respectively.
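A minimal sketch of the cross-entropy cost is given below. Averaging over the samples and clipping the predicted probabilities at a small `eps` are assumptions added for illustration, and the function name is hypothetical.

```python
import math


def cross_entropy(targets, outputs, eps=1e-12):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    total = 0.0
    for target_vec, output_vec in zip(targets, outputs):
        for target, output in zip(target_vec, output_vec):
            # Clipping keeps log() finite when a probability is exactly 0.
            total -= target * math.log(max(output, eps))
    return total / len(targets)
```

The cost is zero when every sample receives probability 1 for its true category and grows as probability mass moves to wrong categories.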

The adaptive moment estimation (ADAM) [27] is used as an optimizer to minimize the cost function, so that each parameter of the network can be adjusted with adaptive learning rates based on the first moment and the second moment of the gradients. The batch size and initial learning rate are set to 128 and 0.001, respectively. The maximum number of iterations is 50, and an early stopping strategy is used to prevent overfitting. The convolution kernel weights, bias terms, and other parameters of each layer are gradually updated in each cycle during training. When the cost function falls within the acceptable range or no longer changes, the network is considered to have converged and training stops.

4. Experimental Verification

4.1. Experiment Setup

Experiment data are collected from the XJTU-SY bearing datasets [28] to verify the effectiveness of the proposed diagnosis approach. The bearing testbed is shown in Figure 6, which consists of a driven motor, a motor controller, a support shaft, two support bearings, a hydraulic loading system, and so on. The rolling bearings of type LDK UER204 were tested under different health states, such as inner race wear, outer race wear, and outer race fracture. The bearing of each health state was tested under three different operating conditions. Two accelerometers were placed perpendicular to one another on the housing of the tested bearings to collect vibration data in the horizontal and the vertical direction synchronously. The sampling frequency was set to 25.6 kHz and every sampling record lasted for 1.28 s (i.e., 32768 data points) with an interval of 1 min.

4.2. Details of the Constructed Experimental Datasets

In this section, we constructed two experimental datasets to evaluate the effectiveness of the proposed diagnosis approach; that is, one consists of vibration data records under the same operation condition (called dataset A), and the other consists of vibration data records under different operation conditions (called dataset B).

As the XJTU-SY bearing datasets contain vibration data of the complete degradation processes, different records may represent different stages of degradation. We mainly consider 7 types of health states: normal (N), inner race wear (IRW), outer race wear (ORW), outer race fracture (ORF), cage fracture (CF), IRW&ORW, and IRW&ORF. Dataset A is constructed with sampling records under the 2nd operating condition (37.5 Hz/11 kN), and dataset B is constructed with sampling records under the 1st (35 Hz/12 kN), 2nd (37.5 Hz/11 kN), and 3rd (40 Hz/10 kN) operating conditions. The first few records of each tested bearing are selected as the raw data of the normal state, and the last few records of each tested bearing are selected as the raw data of the corresponding fault states. All of the selected records were sliced into overlapping segments according to the method described in Section 3.1, where the window length $N$ and the shift length $S$ were set to 2048 and 300, respectively. A pair of segments of the horizontal and the vertical directions was combined into a complex sequence, which was treated as a raw sample. In dataset A, each health state contains 2140 training samples and 400 testing samples. In dataset B, each health state contains 3260 training samples and 700 testing samples. The details of the two constructed experimental datasets are listed in Table 2.

4.3. Experimental Results and Discussion
4.3.1. Evaluation Metrics

Usually, the performance of a classification model can be evaluated with four metrics, namely accuracy $Acc$, precision $P$, recall $R$, and F1 score $F_1$, which are calculated based on the following formula:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}, \quad P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F_1 = \frac{2PR}{P + R},$$
where $TP$ denotes the number of samples classified correctly into a class, $TN$ denotes the number of samples that do not belong to the class and are classified into other classes, $FP$ denotes the number of samples classified incorrectly into the class, and $FN$ denotes the number of samples belonging to the class but classified into other classes. The accuracy represents the proportion of correctly classified samples among all samples, which measures the overall classification effect of the model. The other three metrics measure the classification effect of each category [29].
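The four metrics can be computed per category from a confusion matrix, as in the following sketch. The function name is hypothetical; rows are taken as actual categories and columns as predicted categories, and a metric with a zero denominator is reported as 0, which is an assumption.

```python
def class_metrics(confusion, cls):
    """Accuracy, precision, recall, and F1 score of one category.

    `confusion[i][j]` counts samples of actual category i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(row[cls] for row in confusion) - tp
    tn = total - tp - fn - fp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1
```

For instance, for a two-category matrix `[[8, 2], [1, 9]]`, category 0 has 8 true positives, 2 false negatives, and 1 false positive, so its recall is 0.8 and its precision is 8/9.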

4.3.2. Results of the Proposed Method

In our proposed method, the spectral order values of each raw sample of dataset A and dataset B are calculated over the range (0, 20] with an interval of 0.01 according to formula (5). The three elements of each spectral order, calculated based on formulas (2) and (3), are combined into a spectral order feature vector with a length of 2000. After being normalized by column, the spectral order feature vector is fed into the first convolution layer of the constructed 1D-CNN structure to train the network and gradually update the parameters of each layer. Finally, the spectral order feature vectors of the testing set are input to the trained network, and the diagnostic results are obtained from the output of the network.

In order to verify the feature learning ability of the constructed 1D-CNN, t-SNE (t-distributed stochastic neighbor embedding) [30] is used for dimension reduction and visualization of the features learned from the test samples. Taking testing samples from dataset A and dataset B for one trial, their features learned from the penultimate fully connected layer are shown in Figure 7. It can be seen that the data points of the same category can aggregate well and the data points of different categories are separated for both datasets, which shows that the constructed 1D-CNN network is competent for concentrating spectral order features and making a correct classification decision. Then, the confusion matrices of this trial are visualized in Figure 8, in which the rows and columns denote actual category and predicted category, respectively. Figure 8(a) shows that almost all of testing samples in dataset A are correctly classified except for 5 misclassification instances. Although there are few misclassification instances in Figure 8(b), the classification accuracy of each category in dataset B has reached more than 98.9%, which preliminarily demonstrates that the proposed method has the capacity for fault diagnosis of rolling bearing under constant and variable working conditions.

In order to show the per-category diagnosis result of this trial, the four evaluation metrics mentioned in Section 4.3.1 are listed in Table 3, from which it can be seen that the four evaluation metrics of each category for both datasets are over 99.8%. For dataset A, the highest precision of 100% appears in category N (i.e., normal), the lowest precision of 99.25% appears in category ORF (i.e., outer race fracture), and the F1 scores for each category label are similar. For dataset B, although there is a slight decline in the four metrics, the lowest accuracy and precision still reach 99.71% and 98.57%, respectively. The accuracy ratios for each category in dataset A and dataset B are over 99.86% and 99.28%, respectively, which further validates the effectiveness of the proposed rolling bearing diagnosis approach regardless of whether the raw vibration data are collected under constant or variable working conditions.

4.3.3. Comparison with Other Methods

To further demonstrate the performance of the proposed method, four other methods are employed in our comparative experiments. The first comparative method is taken from [18], in which four types of signals (i.e., vibration signal, acoustic signal, current signal, and instantaneous angular speed) are joined by head-to-tail connection to form the data-level fused input of a 1D-CNN model. Following the same idea, we join horizontal and vertical vibration signal segments by head-to-tail connection to form the input samples of a 1D-CNN model, named 1D-CNN with serial input (1DCNN-SI). The hierarchical structure of the 1D-CNN model is the same as that of Figure 4, but the input length and the kernel length in each convolution layer are doubled. The second comparative method is derived from [19], where horizontal and vertical vibration signals are combined as two-channel input samples and then fed into a 1D-CNN-based diagnosis model. We set the structural parameters of the 1D-CNN to be the same as those in Table 1, except that the kernel width in the first convolution layer changes from 3 to 2. This method is called 1D-CNN with parallel input (1DCNN-PI). The third comparative method, named 2DCNN-u8, comes from [15], which converts multiple vibration signals into two-dimensional images and uses a 2D-CNN model to extract features and diagnose faults. In our comparative experiment, the two-channel vibration data are combined into pixels by row and column. The pixel value, represented by an 8-bit unsigned integer, is determined by the scalar product of the signal values at the pixel position. For consistency of comparison, the hierarchical structure of the 2D-CNN is the same as that of the proposed method, except that the network parameters of each layer become two-dimensional. In the fourth comparative method, vibration signal samples are converted into spectrum amplitude vectors by full frequency analysis and then input into a 1D-CNN model with the same structural parameters as those of the 1DCNN-SI method. The fourth method is abbreviated as 1DCNN-FT.
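The input layouts of the first three comparative methods can be sketched as follows. This is only an illustration under stated assumptions: the function names, the segment length of 64 points, and the random test signals are hypothetical, and the 2DCNN-u8 mapping (outer product of the two segments followed by min-max scaling to 0–255) is one possible reading of the pixel rule described above, not a confirmed reproduction of [15].

```python
import numpy as np


def serial_input(h, v):
    """1DCNN-SI style: head-to-tail connection, one channel of length 2N."""
    return np.concatenate([h, v])[np.newaxis, :]


def parallel_input(h, v):
    """1DCNN-PI style: two channels of length N each."""
    return np.stack([h, v], axis=0)


def image_input_u8(h, v):
    """2DCNN-u8 style (assumed reading): pixel (i, j) is built from h[i] * v[j],
    then min-max scaled to an 8-bit unsigned integer."""
    img = np.outer(h, v)
    lo, hi = img.min(), img.max()
    scaled = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
    return np.round(scaled * 255).astype(np.uint8)


# Hypothetical horizontal and vertical segments of N = 64 points.
rng = np.random.default_rng(0)
h_seg = rng.standard_normal(64)
v_seg = rng.standard_normal(64)

si = serial_input(h_seg, v_seg)      # shape (1, 128)
pi_ = parallel_input(h_seg, v_seg)   # shape (2, 64)
img = image_input_u8(h_seg, v_seg)   # shape (64, 64), dtype uint8
```

All three layouts keep the samples in the time domain, so none of them compensates for changes in rotating speed before the network sees the data.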

The four comparative methods and our proposed method are each tested 10 times on the two datasets in order to reduce the influence of random factors. Their diagnosis accuracy rates are plotted in Figure 9, which shows that our proposed method is superior to the four comparative methods in terms of overall accuracy and fluctuation range for both datasets. For dataset A, the diagnosis accuracies of all five methods are over 95%, and the average diagnosis result of our proposed method is about 2–3 percentage points higher than that of the other four methods. For dataset B, however, the diagnosis accuracy of the four comparative methods declines significantly; in particular, that of the 1DCNN-SI and 1DCNN-PI methods drops to about 60%. The reason is that both methods receive input samples directly in the time domain and lack the frequency-domain feature extraction needed to deal with variable working conditions. In contrast, owing to the formation of spectral order features before the 1D-CNN input, our proposed method still achieves a convincing diagnostic accuracy above 98.6% on the variable working condition dataset B.
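To illustrate why order-based spectral features are insensitive to speed changes, the sketch below forms a full spectrum from the complex signal x + jy and resamples its amplitude onto a fixed grid of orders (frequency divided by rotating frequency). It is a minimal sketch under assumptions, not the exact procedure of the proposed method: the function names, the sampling rate of 2048 Hz, the order range of ±10, the 201-point grid, and the synthetic third-order forward component are all hypothetical.

```python
import numpy as np


def full_spectrum_orders(x, y, fs, f_rot, max_order=10.0, n_bins=201):
    """Full-spectrum amplitudes on a fixed order grid.

    x, y  : horizontal and vertical vibration segments (same length)
    fs    : sampling frequency in Hz
    f_rot : rotating frequency in Hz for this segment
    Positive orders come from positive-frequency bins of x + jy and
    negative orders from negative-frequency bins.
    """
    z = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    n = z.size
    spectrum = np.fft.fftshift(np.fft.fft(z)) / n            # two-sided spectrum
    freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / fs))   # increasing, in Hz
    orders = freqs / f_rot                                    # frequency -> order
    grid = np.linspace(-max_order, max_order, n_bins)         # speed-independent axis
    amplitudes = np.interp(grid, orders, np.abs(spectrum))
    return grid, amplitudes


def demo_peak_order(f_rot, fs=2048.0, n=2048):
    """Synthetic check: a circular component at 3x rotating frequency."""
    t = np.arange(n) / fs
    x = np.cos(2 * np.pi * 3 * f_rot * t)
    y = np.sin(2 * np.pi * 3 * f_rot * t)
    grid, amp = full_spectrum_orders(x, y, fs, f_rot)
    return grid[np.argmax(amp)]


peak_at_20hz = demo_peak_order(20.0)   # rotating frequency 20 Hz
peak_at_32hz = demo_peak_order(32.0)   # rotating frequency 32 Hz
```

In this synthetic case the dominant component sits at the same position on the order axis for both rotating frequencies, whereas in the raw frequency spectrum it would move from 60 Hz to 96 Hz. This speed-independent alignment is what the time-domain inputs of 1DCNN-SI and 1DCNN-PI lack.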

Table 4 lists the average results of the four evaluation metrics in the comparative experiment. In detail, the four evaluation metrics obtained by the proposed method for dataset A are over 99.3%, which is about 4 and 3 percentage points higher than those of the 1DCNN-SI and 1DCNN-PI methods, respectively. For dataset B, our proposed method obtains diagnosis results of over 98.6%, whereas the results of the 1DCNN-SI and 1DCNN-PI methods decrease so much that these methods are no longer usable, because they include no treatment for variable working conditions. Although the result of the 2DCNN-u8 method is significantly higher than those of the 1DCNN-SI and 1DCNN-PI methods, it is still about 14 percentage points lower than that of the proposed method, which indicates that the proposed method is more effective than the 2DCNN-u8 method in dealing with variable working conditions. The result of 1DCNN-FT is more than 15 percentage points lower than that of the proposed method, which illustrates that the step of spectral order feature formation is essential for variable working conditions and significantly enhances diagnostic accuracy.

5. Conclusions and Future Work

This study presents a novel 1D-CNN-based fault diagnosis method for rolling bearings with dual-sensor vibration data fusion. The method fuses the vibration signals of the horizontal and vertical directions, extracts the correlation information between the signals of the two sensors by full-spectrum analysis, and forms spectral order features as the input of the 1D-CNN model. Because the spectral order features are referenced to the rotating frequency, the presented method is suitable for fault diagnosis under variable operating conditions. The effectiveness and superiority of the presented method are validated in comparative experiments on datasets of constant and variable operating conditions, and the results show that the evaluation metrics of the presented method are significantly better than those of the four comparative methods. In addition, the benefit of spectral order feature formation is also reflected in the comparative experiments: the diagnostic accuracy decreases significantly for both datasets when this step is skipped.

In future work, we will extend the 1D-CNN-based fault diagnosis method to more mechanical objects, such as gearboxes, and optimize the network structure for specific fault diagnosis tasks. Moreover, as a few misclassified instances remain under variable operating conditions, we will introduce transfer learning into the 1D-CNN-based fault diagnosis method to further improve the diagnosis accuracy.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was supported by the program of Henan provincial Machine Learning & Image Analysis working studio collaborated with prominent foreign scientists (Grant No. GZS2022012, Department of Science and Technology of Henan Province) and the open research program of National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology (Grant No. 20200206).