#### Abstract

Remaining useful life (RUL) prediction plays a significant role in developing condition-based maintenance and improving the reliability and safety of machines. This paper proposes an RUL prediction scheme that combines a deep-learning-based health indicator (HI) with a new relevance vector machine (RVM). First, both one-dimensional time-series information and two-dimensional time-frequency maps are input into a hybrid deep-learning network consisting of a convolutional neural network (CNN) and a long short-term memory (LSTM) network to construct the HI. Then, the prediction results and confidence interval are calculated by a new RVM enhanced by a polynomial regression model. The proposed method is verified on the public PRONOSTIA bearing datasets. Experimental results demonstrate the effectiveness of the proposed method in improving the prediction accuracy and analyzing the prediction uncertainty.

#### 1. Introduction

Rotating machinery has played an essential role in industrial applications. However, most rotating machinery operates under severe working conditions which may cause different types of faults. Therefore, timely maintenance is vital for the reliability of the rotating machinery [1–4]. Industrial Internet of Things (IoT) and data-driven techniques have been transforming the scheduled maintenance into predictive maintenance. Remaining useful life (RUL) prediction is a critical component of a predictive maintenance scheme, which will reduce the cost of unplanned maintenance and enhance the reliability, safety, and availability of the rotating machinery [5].

Data-driven RUL prediction for machinery mainly consists of two steps: health indicator (HI) construction and RUL prediction based on the constructed HI [6–10]. The HI is a quantitative value that represents the degradation process of the monitored machinery; common examples include root mean square (RMS) [11], kurtosis [12], and entropy [13]. However, many traditional HIs have a poor monotonic trend, which harms the prediction accuracy. An HI curve with a good monotonic trend correlates well with the degradation process, so the RUL can be predicted by extrapolating the historical data. In contrast, most HI curves do not show an evident trend until severe degradation starts, which hinders maintenance planning and reduces the prediction accuracy. Besides, many HI construction methods do not consider the historical data of similar machinery, which contain rich degradation information.

Recently, the deep-learning network has shown great potential in dealing with big data [14–16]. Motivated by the strong power of the deep learning, researchers have done many related works about remaining useful life prediction based on deep-learning method. Zhu et al. [17] presented a deep-learning method for RUL through a multiscale convolutional neural network, the input of which was time-frequency maps. Liao et al. [18] proposed an enhanced restricted Boltzmann machine with a novel regularization term to construct a new HI that is suitable for RUL. The input data of the network was time-series features. Xia et al. [19] presented a two-stage automated approach to estimate the RUL, in which the autoencoded deep neural networks were used to classify different degradation stages and a shallow neural-networks-based regression model was used to predict the remaining useful life. Zhang et al. [20] constructed a new HI called “waveform entropy.” Then, the new HI and some traditional HIs were input into the long short-term memory network to identify the bearing remaining useful life. Al-Dulaimi et al. [21] proposed a hybrid deep neural network framework for RUL estimation. This framework used an end-to-end RUL prediction scheme, the output of which was the RUL value. Although these deep-learning-based RUL prediction methods have shown great performance, they still may be confronted with some problems: (1) The deep-learning-based RUL prediction scheme does not provide any confidence limit, which is not beneficial for people to make a maintenance scheme. (2) Most of the deep-learning networks process only one-type input data, missing some important degradation information.

The relevance vector machine (RVM) is an artificial intelligence method that learns machinery degradation patterns from available data instead of building statistical models. It can handle the prognostics of sophisticated machinery whose degradation process is difficult to describe with a statistical model [22]. Moreover, the RVM-based RUL prediction method gives a confidence interval, which provides an uncertainty estimate with a probabilistic meaning. Therefore, the RVM has been attracting more and more attention in the RUL prediction of machinery [23–25]. The relevance vectors of the RVM are sparse and its hyperparameters are simple, which is beneficial for online RUL prediction [26]. However, the long-term prediction accuracy of the RVM is poor. Therefore, a new RVM prediction method that keeps the sparsity characteristic while improving long-term prediction accuracy is proposed in this paper.

RUL prediction involves many sources of uncertainty, such as measurement error, load randomness, degradation feature extraction error, and modeling error. These need to be quantified and managed during the prediction process, and a confidence interval of the forecast should be given to facilitate maintenance planning. At present, research on RUL uncertainty mainly focuses on statistical data-driven methods, which are based on the theory of probability and statistics. With a statistical or stochastic model, the probability distribution of the remaining life follows naturally, which makes it easy to quantify the uncertainty of the RUL prediction. Liao et al. [27] constructed a multiphase degradation model with jumps based on the Wiener process to describe the multiphase degradation pattern. All the parameters of the model were assumed to be random variables, which led to the uncertainty of the final RUL. Gao et al. [28] proposed a right-time prediction method to reduce the prognostics uncertainty of mechanical systems under unobservable degradation. Wang et al. [29] presented a probabilistic framework for RUL prediction of bearings, in which the Markov chain Monte Carlo method is used for posterior sampling to predict the RUL and output its uncertainty. Most existing deep-learning methods can only achieve point prediction and cannot provide the uncertainty of the prediction results, which greatly limits the practical application of deep learning in the RUL prediction field [30]. Some researchers have established RUL prediction models based on Bayesian neural networks to address the uncertainty problem [30, 31]. Although a Bayesian neural network can quantify the uncertainty of RUL prediction, its high training cost limits its practical application.

Although the deep-learning-based HI construction methods and RVM-based RUL prediction methods have been widely studied, the methods combining them are relatively lacking. To fill the research gap, a new RUL prediction scheme that combines a new deep-learning structure-based HI construction method and a new RVM-based RUL prediction method is proposed. The new RUL prediction scheme can not only learn the degradation process features from different types of data and get RUL prediction result automatically but also provide a confidence interval (CI).

The contributions of this paper can be summarized as follows:

(1) A new deep-learning structure that can deal with one-dimensional time-series data and two-dimensional image data simultaneously is proposed to construct the HI. The constructed HI has better performance compared with other deep-learning-based HI construction methods.

(2) The proposed systematic approach integrates the deep-learning-based HI and a new RVM-based prediction method into one framework to estimate the RUL automatically and provide a confidence interval.

(3) A new RVM model is proposed by combining the traditional RVM with a polynomial regression model, improving the long-term prediction accuracy.

The paper is organized as follows: Section 2 provides the theoretical background. Section 3 introduces the new RUL prediction scheme combining the deep-learning-based HI and the new RVM. Section 4 applies the presented RUL estimation scheme to an experimental bearing dataset. Section 5 presents and analyzes the results. The conclusions are presented in Section 6.

#### 2. Theoretical Background

The proposed RUL prediction scheme mainly consists of four functional layers: a time-series information learning layer, a time-frequency map information learning layer, a fully connected layer, and an RUL prediction layer. To make full use of the information contained in the original data, the hybrid deep-learning structure consists of two parallel paths followed by a fully connected multilayer neural network. The two parallel paths are the time-series information learning layer, built from a long short-term memory (LSTM) neural network, and the time-frequency map information learning layer, built from a convolutional neural network (CNN). The LSTM extracts temporal features, while the CNN extracts spatial features; the two are then fused by the fully connected layer to construct an HI. Finally, the HI is fed into the RUL prediction layer to obtain the remaining useful life and its confidence interval. The theoretical background of each layer is introduced as follows.

##### 2.1. Time-Series Information Learning Layer

The time-series information learning layer mainly consists of the long short-term memory (LSTM) network, a state-of-the-art method for processing sequence data. It extends the recurrent neural network with a memory cell, which mitigates the vanishing and exploding gradient problems. Figure 1 shows the LSTM network, in which the hidden layer is replaced by memory cells.

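The memory cell of the LSTM mainly consists of an input gate, an output gate, and a forget gate. Equations (1)–(6) represent the network update process at time $t$ [32]:

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{2}$$

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{3}$$

$$a_t = \tanh(W_a x_t + U_a h_{t-1} + b_a) \tag{4}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot a_t \tag{5}$$

$$h_t = o_t \odot \tanh(c_t) \tag{6}$$

In the above equations, $i_t$, $o_t$, $f_t$, $a_t$, $c_t$, and $h_t$ represent the input gate, output gate, forget gate, the output value of the input gate, the state value of the memory cell, and the output value of the hidden layer, respectively. $x_t$ is the input at time $t$ and $\sigma$ is the sigmoid function. $W_i$, $W_o$, $W_f$, $W_a$, $U_i$, $U_o$, $U_f$, and $U_a$ are the weight matrices. $b_i$, $b_o$, $b_f$, and $b_a$ are the bias values. $\odot$ is the point multiplication operation.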
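As a rough illustration of the memory-cell update, the following minimal pure-Python sketch performs a single LSTM step in the standard formulation (sigmoid gates, tanh candidate value). The names `lstm_step`, `params`, and the gate keys `"i"`, `"o"`, `"f"`, `"a"` are illustrative assumptions, not the authors' implementation, and the toy weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM memory-cell update.

    params maps a gate name to (W, U, b): W acts on the input x_t,
    U acts on the previous hidden output h_prev, and b is the bias.
    """
    def gate(name, activation):
        W, U, b = params[name]
        pre = [wx + uh + bias
               for wx, uh, bias in zip(matvec(W, x_t), matvec(U, h_prev), b)]
        return [activation(p) for p in pre]

    i_t = gate("i", sigmoid)      # input gate
    o_t = gate("o", sigmoid)      # output gate
    f_t = gate("f", sigmoid)      # forget gate
    a_t = gate("a", math.tanh)    # candidate value admitted by the input gate
    # Point (element-wise) multiplication mixes the old state and the candidate.
    c_t = [f * c + i * a for f, c, i, a in zip(f_t, c_prev, i_t, a_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Hypothetical toy example: 2 input features, 1 hidden unit, all-zero weights.
zero = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero for name in ("i", "o", "f", "a")}
h_t, c_t = lstm_step([0.3, -0.1], [0.0], [2.0], params)
```

With all-zero weights every gate equals 0.5 and the candidate is 0, so the cell state is simply halved; this makes the roles of the forget and output gates easy to check by hand.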

##### 2.2. Time-Frequency Map Information Learning Layer

The time-frequency map information learning layer is made up of a deep convolutional neural network, consisting of a convolutional layer and a pooling layer.

In the convolutional layer, local features are generated by convolving kernels with the feature maps of the previous layer. The convolution results are then passed through the activation function to construct the feature maps of the current layer, as follows [33]:

$$x_j^{l} = f\Bigl(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\Bigr) \tag{7}$$

In the above equation, $x_j^{l}$ is the $j$th feature map of the $l$th layer. $x_i^{l-1}$ is the $i$th feature map of the $(l-1)$th layer. $k_{ij}^{l}$ is the convolutional kernel. $b_j^{l}$ is the bias of the $l$th layer. $M_j$ is the set of input feature maps of the convolutional layer. $f(\cdot)$ is the activation function.

In the pooling layer, features are extracted from the feature maps by subsampling to increase computational efficiency. The max-pooling method is given as

$$x_j^{l+1}(c, d) = \max_{0 \le p,\, q < m} x_j^{l}(c \cdot m + p,\; d \cdot m + q) \tag{8}$$

In the above equation, $x_j^{l}$ and $x_j^{l+1}$ are the $j$th input feature map of layer $l$ and the $j$th output feature map of layer $l + 1$. $m$ is the pooling filter size, $c$ and $d$ index the values obtained after convolution, and $p$ and $q$ are the moving steps inside the pooling window.
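The subsampling step can be sketched in a few lines of plain Python. This is a minimal illustration that assumes non-overlapping square windows (stride equal to the pooling filter size `m`); the function name `max_pool2d` and the toy feature map are hypothetical.

```python
def max_pool2d(feature_map, m):
    """Non-overlapping m x m max pooling over a 2-D feature map (list of rows)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - m + 1, m):          # top edge of the window
        pooled_row = []
        for c in range(0, cols - m + 1, m):      # left edge of the window
            window = [feature_map[r + p][c + q]
                      for p in range(m) for q in range(m)]
            pooled_row.append(max(window))       # keep the strongest response
        pooled.append(pooled_row)
    return pooled

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [3, 6, 4, 8]]
pooled = max_pool2d(fmap, 2)   # 4 x 4 map reduced to 2 x 2
```

Each output value keeps only the largest activation in its window, which is why pooling reduces the map size by a factor of `m` per axis while preserving the most salient local feature.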

##### 2.3. Fully Connected Layer

The fully connected layer is added after the time-series information learning layer and the time-frequency map information learning layer. The features learned by these two layers are flattened and fed into the fully connected layer, which can be represented by the following equation:

$$O = f\Bigl(\sum_{j} w_j x_j + b\Bigr) \tag{9}$$

In the above equation, $O$ is the final output value. $x_j$ is the $j$th neuron. $w_j$ represents the weight between the $j$th neuron and the output node. $b$ is the bias. $f(\cdot)$ is the activation function.

##### 2.4. RUL Prediction Layer

This layer can filter the unwanted measurement noise and manage the uncertainty in prognostics. The RUL prediction layer is built on a new relevance vector machine (RVM) that combines the traditional RVM method with polynomial models.

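RVM is a kernel-based algorithm built on the Bayesian inference framework [29]. The RVM model of a given dataset $\{x_i, t_i\}_{i=1}^{N}$ is

$$t = \Phi w + \varepsilon \tag{10}$$

where $t = [t_1, \ldots, t_N]^T$, and $w = [w_0, w_1, \ldots, w_N]^T$ is the weight of the RVM model; $\Phi = [\phi(x_1), \ldots, \phi(x_N)]^T$ is the design matrix; $\phi(x_i) = [1, K(x_i, x_1), \ldots, K(x_i, x_N)]^T$, $i = 1, \ldots, N$, and $N$ is the sample number; $K(\cdot, \cdot)$ is the kernel function; $\varepsilon$ is the Gaussian distributed random error with the mean of 0 and the variance of $\sigma^2$.

According to Bayesian inference, the likelihood of the dataset follows a Gaussian distribution, which can be written as

$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{\lVert t - \Phi w \rVert^2}{2\sigma^2}\right\} \tag{11}$$

Maximum-likelihood estimation of $w$ and $\sigma^2$ from (11) will generally lead to severe overfitting, so a Gaussian prior over the weights is defined to smooth the functions as

$$p(w \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1}) \tag{12}$$

In the above equation, $p(w \mid \alpha)$ is the Gaussian prior probability over the weights, $w$ is the weight of the RVM model, and $\alpha = [\alpha_0, \alpha_1, \ldots, \alpha_N]^T$ is a vector of hyperparameters corresponding with the weights $w$. These hyperparameters are the critical features of the model and are ultimately responsible for the sparsity properties.

The posterior over the unknowns can be computed with Bayes' rule, given the defined noninformative prior distribution:

$$p(w, \alpha, \sigma^2 \mid t) = \frac{p(t \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(t)} \tag{13}$$

Equation (13) cannot be computed directly, but it can be decomposed as

$$p(w, \alpha, \sigma^2 \mid t) = p(w \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t) \tag{14}$$

The posterior distribution of the weights is

$$p(w \mid t, \alpha, \sigma^2) = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)} = \mathcal{N}(w \mid \mu, \Sigma) \tag{15}$$

The posterior covariance and the mean of equation (15) are

$$\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1} \tag{16}$$

$$\mu = \sigma^{-2} \Sigma \Phi^T t \tag{17}$$

where $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.

As can be seen from equations (16) and (17), the values of the hyperparameter $\alpha$ and the noise variance $\sigma^2$ need to be obtained in order to get the values of $\Sigma$ and $\mu$. The maximum-likelihood estimation method is used to obtain the estimated values $\alpha_{MP}$ and $\sigma_{MP}^2$. The probability distribution of the output value $t_*$ for a new input value $x_*$ is calculated as

$$p(t_* \mid t, \alpha_{MP}, \sigma_{MP}^2) = \mathcal{N}(t_* \mid y_*, \sigma_*^2) \tag{18}$$

where

$$y_* = \mu^T \phi(x_*) \tag{19}$$

$$\sigma_*^2 = \sigma_{MP}^2 + \phi(x_*)^T \Sigma\, \phi(x_*) \tag{20}$$

In the training process, most $\alpha_i$ values tend to infinity, so the corresponding weights have posterior distributions whose mean and variance are both zero. Those parameters and the corresponding kernel functions therefore play no role in the regression analysis, which gives the RVM its sparsity. The input data corresponding to the nonzero weights are called relevance vectors (RVs). The 95% upper and lower confidence bounds can be calculated as

$$t_{UB} = \mu_0 + \sum_{x_i \in \mathrm{RV}} \mu_i K(x_*, x_i) + 1.96\, \sigma_* \tag{21}$$

$$t_{LB} = \mu_0 + \sum_{x_i \in \mathrm{RV}} \mu_i K(x_*, x_i) - 1.96\, \sigma_* \tag{22}$$

where $t_{UB}$ and $t_{LB}$ are the upper bound and lower bound of the predicted value $t_*$, respectively. $K(\cdot, \cdot)$ is the kernel function; $\mathrm{RV}$ represents the set of the relevance vectors, $x_i$ represents a relevance vector, and $x_*$ is the input data.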
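The prediction step can be illustrated with a short sketch. It assumes the standard RVM predictive distribution (mean equal to the posterior mean weights applied to the kernel features, variance equal to the noise variance plus the weight-uncertainty term) and a Gaussian kernel; the function names, the kernel width, and all numeric values below are hypothetical and are not taken from the paper.

```python
import math

def gaussian_kernel(x, z, width):
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))

def rvm_predict(x_new, relevance_x, mu, sigma_mat, noise_var, width):
    """Predictive mean, variance, and 95% bounds for a scalar input x_new.

    mu and sigma_mat are the posterior mean and covariance of the weights
    that survived pruning (bias first, then one weight per relevance vector).
    """
    phi = [1.0] + [gaussian_kernel(x_new, x_i, width) for x_i in relevance_x]
    mean = sum(m * p for m, p in zip(mu, phi))
    # Predictive variance = noise variance + phi^T Sigma phi
    sigma_phi = [sum(s * p for s, p in zip(row, phi)) for row in sigma_mat]
    var = noise_var + sum(p * sp for p, sp in zip(phi, sigma_phi))
    half_width = 1.96 * math.sqrt(var)           # 95% band of a Gaussian
    return mean, var, (mean - half_width, mean + half_width)

# Hypothetical posterior with two relevance vectors at running times 0 and 10.
relevance_x = [0.0, 10.0]
mu = [0.1, 0.2, 0.7]
sigma_mat = [[0.01, 0.0, 0.0],
             [0.0, 0.01, 0.0],
             [0.0, 0.0, 0.01]]
mean, var, (lower, upper) = rvm_predict(10.0, relevance_x, mu, sigma_mat,
                                        noise_var=0.0004, width=5.0)
```

Because only the relevance vectors enter the kernel expansion, the cost of a prediction grows with the number of retained vectors rather than with the full training set, which is the practical benefit of the sparsity described above.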

The polynomial models are suitable for long-term RUL prediction. Polynomial regression belongs to the least-squares curve fitting family. Specifically, it estimates the coefficients of a polynomial function so that the function closely approximates the curve. The mathematical expression of polynomial regression is as follows:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n \tag{23}$$

where $y$ is the response variable, $x$ is the predictor variable, and $\beta_0, \beta_1, \ldots, \beta_n$ are model coefficients that can be estimated by curve fitting methods.

In this paper, the RVM and the polynomial model are combined: the response variable $y$ is the relevance vector (RV), and $x$ is the corresponding running time. The coefficients of the polynomial model are determined from *x* and the relevance vectors *y* of the test bearing. The predicted value is then calculated by equation (23), the variance is calculated by equation (20), and the 95% upper and lower confidence bounds are calculated by equations (21) and (22).
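To make the fitting step concrete, the sketch below fits a polynomial to a handful of relevance vectors by ordinary least squares (normal equations solved with Gaussian elimination) and extrapolates it to a future running time. The helper names `polyfit_ls` and `polyval`, the polynomial degree, and the toy relevance vectors are assumptions for illustration; the paper does not specify them here.

```python
def polyfit_ls(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ..., bn]."""
    n = degree + 1
    # Normal equations (V^T V) b = V^T y for Vandermonde rows [1, x, x^2, ...]
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            rhs[r] -= factor * rhs[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (rhs[i] - tail) / A[i][i]
    return coeffs

def polyval(coeffs, x):
    """Evaluate b0 + b1*x + ... + bn*x^n."""
    return sum(b * x ** k for k, b in enumerate(coeffs))

# Hypothetical relevance vectors: (running time, HI value) pairs.
rv_times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rv_values = [0.1 + 0.02 * x + 0.001 * x ** 2 for x in rv_times]
coeffs = polyfit_ls(rv_times, rv_values, degree=2)
future_hi = polyval(coeffs, 10.0)   # extrapolated HI beyond the observed range
```

For real running times spanning thousands of periods, rescaling `x` before fitting would be advisable, since the normal equations become poorly conditioned when high powers of large values are summed.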

##### 2.5. Time-Frequency Analysis

During the performance degradation of rolling bearings, the vibration acceleration signals are nonstationary. Time-frequency analysis includes both time-domain and frequency-domain information, so it can effectively characterize nonstationary signals. The continuous wavelet transform is a time-frequency analysis method commonly used in condition monitoring of rotating machinery. The calculation formula is as follows:

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt \tag{24}$$

where $a$ is the scale parameter; $b$ is the translation parameter; $x(t)$ is the original vibration acceleration signal; $\psi(t)$ is the mother wavelet function; $\psi^{*}(t)$ is the complex conjugate of $\psi(t)$. There is no standard or universal method for the selection of the mother wavelet function. In this paper, the Morlet wavelet, which is similar to the impact signal of a rolling bearing, is chosen as the mother wavelet. After the continuous wavelet transform, the one-dimensional vibration acceleration signal is mapped to a two-dimensional coefficient matrix, and the time-frequency diagram of the vibration signal is obtained.
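The mapping from a one-dimensional signal to a two-dimensional coefficient matrix can be sketched with a direct discretization of the transform. The example assumes a real-valued Morlet wavelet of the form exp(-t²/2)·cos(5t) (the paper does not state its wavelet parameters), and the test signal and scales are hypothetical.

```python
import math

def morlet(t):
    """Real-valued Morlet mother wavelet (assumed form)."""
    return math.exp(-0.5 * t * t) * math.cos(5.0 * t)

def cwt(signal, scales, dt=1.0):
    """Discretized continuous wavelet transform.

    Returns one row of coefficients per scale; row[k] approximates W(a, b)
    with b = k * dt. The wavelet is real, so it equals its own conjugate.
    """
    n = len(signal)
    coeffs = []
    for a in scales:
        row = []
        for k in range(n):                       # translation b = k * dt
            total = 0.0
            for j in range(n):                   # Riemann sum over time
                total += signal[j] * morlet((j - k) * dt / a)
            row.append(total * dt / math.sqrt(a))
        coeffs.append(row)
    return coeffs

# Hypothetical test signal: a pure tone at 0.1 cycles per sample.
signal = [math.cos(2.0 * math.pi * 0.1 * j) for j in range(128)]
scales = [2.0, 8.0, 32.0]
tf_map = cwt(signal, scales)                     # 3 x 128 coefficient matrix
energy = [sum(c * c for c in row) for row in tf_map]
```

The energy concentrates at the scale whose stretched wavelet oscillates at the tone frequency (scale 8 here), which is the property that lets the time-frequency map localize bearing fault signatures. This direct double loop is quadratic in the signal length and is meant only to mirror the formula; a practical implementation would use an FFT-based routine from a signal-processing library.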

#### 3. RUL Scheme Combining Deep-Learning-Based HI and a New RVM

A hybrid deep-learning structure that can learn temporal features and spatial features simultaneously is proposed to exploit the mutual information in multidimensional features for degradation assessment and RUL prediction. The training set is constructed from historical whole-lifetime monitoring data, and the resulting set of HI curves is then used to train the RVM. The sparsity of RVM regression is highly dependent on the choice of kernel function. Common kernel functions are classified into local kernels and global kernels. With a local kernel, only data points that are close to each other affect the kernel values.

In contrast, a global kernel allows data points that are far away from each other to affect the kernel values as well. Common global kernels include the polynomial function, the spline function, and so forth [34]. Different types of kernels differ markedly in their interpolation and extrapolation ability. A multikernel RVM-based prediction method is therefore proposed to make full use of the strengths of different kernels by combining them with the particle swarm optimization (PSO) algorithm. Figure 2 shows the proposed RUL scheme.
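The difference between local and global kernels can be seen in a small sketch. The Gaussian kernel stands in for a local kernel and the polynomial kernel for a global one; the convex combination in `mixed_kernel` is one common way to build a multikernel and is an assumption here, as are the parameter names and values (in the described scheme such parameters would be tuned, for example by PSO).

```python
import math

def gaussian_kernel(x, z, width):
    """Local kernel: decays quickly as |x - z| grows."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))

def polynomial_kernel(x, z, degree=2, offset=1.0):
    """Global kernel: distant points still contribute."""
    return (x * z + offset) ** degree

def mixed_kernel(x, z, weight, width, degree=2):
    """Convex combination of a local and a global kernel (assumed form)."""
    return (weight * gaussian_kernel(x, z, width)
            + (1.0 - weight) * polynomial_kernel(x, z, degree))

near = gaussian_kernel(1.0, 1.0, width=0.5)       # identical points
far_local = gaussian_kernel(1.0, 10.0, width=0.5)  # effectively zero
far_global = polynomial_kernel(1.0, 10.0)          # still large
combined = mixed_kernel(1.0, 10.0, weight=0.5, width=0.5)
```

The local term dominates near the training points, which helps interpolation, while the global term keeps the kernel response alive far from them, which helps extrapolation.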

First, the time-series information, including time-domain and frequency-domain features, and the time-frequency map information of the whole lifetime are extracted from the original vibration signal. Each type of information is processed by its own information learning layer. Then, a fully connected layer combines the features learned by the time-series information learning layer and the time-frequency map information learning layer. The HI is constructed by a three-layer neural network using the combined information. Finally, the constructed HI curve is used to predict the RUL with the RUL prediction layer, which is built from the RVM and the polynomial model.

At the inspection time $t_k$, the future HI can be predicted with the constructed polynomial curve. When the polynomial curve reaches the failure threshold, the bearing is considered to have failed. According to the concept of the first hitting time [31], the RUL of the bearing can be defined as

$$\mathrm{RUL}(t_k) = \inf\{\, l : \mathrm{HI}(t_k + l) \ge \gamma \,\} \tag{25}$$

In the above equation, $\mathrm{RUL}(t_k)$ is the remaining useful life at inspection time $t_k$, $\mathrm{HI}(t_k + l)$ is the predicted HI at the time $t_k + l$, and $\gamma$ is the failure threshold.
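The first-hitting-time definition can be sketched as a forward search over future running periods. The function name, the step size, the search horizon, and the toy HI curve below are assumptions for illustration only.

```python
def first_hitting_rul(predict_hi, t_k, threshold, step=1, horizon=100000):
    """Smallest l >= 0 (a multiple of step) with predict_hi(t_k + l) >= threshold.

    Returns None if the threshold is not reached within the horizon.
    """
    for l in range(0, horizon + 1, step):
        if predict_hi(t_k + l) >= threshold:
            return l
    return None

def hi_curve(t):
    """Hypothetical predicted HI: rises linearly and reaches 1.0 at t = 1000."""
    return t / 1000.0

rul = first_hitting_rul(hi_curve, t_k=500, threshold=1.0)
```

Evaluating the same search on the upper and lower confidence curves instead of the mean curve would give the corresponding bounds on the RUL, under the assumption that those curves are monotonic over the search range.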

#### 4. Experimental Results and Analysis

In this section, run-to-failure data acquired from accelerated degradation tests of rolling element bearings are used to verify the effectiveness and superiority of the proposed RUL scheme in practical applications. The experimental data come from the PRONOSTIA platform used in the IEEE PHM 2012 Data Challenge [35]. As shown in Figure 3, the experimental platform mainly consists of three parts: a rotating part, a degradation generation part, and a signal acquisition part.

##### 4.1. Data Description

In this experiment, 17 rolling element bearings working under three different conditions are tested. The experimental conditions are listed in Table 1. Under each condition, two bearings’ data are used as a training dataset, while others are testing datasets, which are listed in Table 2. The acceleration sensor is installed in the outer layer of the rolling element bearing. The sampling frequency is 25.6 kHz and every sampling process includes 2560 points. The sampling process is repeated every 10 s.

##### 4.2. HI Construction

The whole lifetime data of the first bearing are selected for analysis. The horizontal acceleration signal shown in Figure 4 indicates that the vibration amplitude increases as the experiment proceeds, but it is hard to determine accurately when the incipient fault occurred. Therefore, different features are extracted from the original signal, including 10 time-domain features, 12 frequency-domain features, and 1 time-frequency-domain feature, which are listed in Table 3.

The training data can be represented by $\{x_t, y_t\}$, in which $x_t$ is the time-frequency map (of fixed size) at time *t* and $y_t$ represents the degree of performance degradation of the bearing at time period *t*, with $y_t = t/T$, where *T* is the life cycle period of the training bearing. It was verified that the relationship between the training data label and the running cycle does not affect the final result of health indicator construction. Therefore, the linear model is selected here to construct the training dataset labels. Taking bearing 1_1 as an example, the bearing fails completely at period 2800, so the degradation degree corresponding to period 1400 is $1400/2800 = 0.5$.
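The linear labeling rule is simple enough to state in code. This is a minimal sketch assuming the label grows linearly from 0 at the start of life to 1 at failure; the function name is hypothetical.

```python
def degradation_label(t, T):
    """Linear degradation label: 0 at the start of life, 1 at failure."""
    if not 0 <= t <= T:
        raise ValueError("t must lie within the life cycle [0, T]")
    return t / T

# Bearing 1_1 example from the text: failure at period 2800.
labels = [degradation_label(t, 2800) for t in (0, 1400, 2800)]
```

Halfway through the life of bearing 1_1 (period 1400) the label is 0.5, matching the example in the text.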

In the proposed deep-learning network, the convolutional structure mainly follows the classical AlexNet network, and the time-series information learning layer is constructed by stacking three LSTM layers. The literature shows that this network structure can effectively extract the characteristics of time-series data. A connection layer joins the information extracted by the time-series information learning layer and the time-frequency map information learning layer, so that the degradation information is captured comprehensively. Finally, a fully connected layer is constructed to output the final result. Detailed network parameters are given in Table 4.

##### 4.3. RUL Prediction

The HI constructed in Section 4.2 is used to predict the remaining useful life with the RUL prediction layer. To describe the prognostics procedure in detail, Figure 5 shows the complete degradation prediction process of bearing1_5 at one inspection time. First, all the HIs constructed with the hybrid deep-learning network, shown by hollow blue dots in Figure 5, are input into the RVM to perform regression analyses with different kernel parameter values. Next, the kernel parameter value is selected by the PSO algorithm. As shown in Figure 5, the relevance vectors are calculated by the RVM with the optimized kernel parameter value. The polynomial model is constructed by fitting the RVs. Subsequently, the HI values at future running periods are predicted with the constructed polynomial model. When the predicted HI value reaches the predefined failure threshold, the bearing is considered to have failed. The RUL is then obtained from the first hitting time defined in Section 3, and the confidence interval is computed by equations (21) and (22).

#### 5. Results and Analysis

Figure 6 represents the HIs of bearing 1_3 constructed with the different methods to illustrate the superiority of the proposed method compared with the deep-learning method of single-structure network and traditional HI. Figures 6(a)–6(d) are traditional HIs, which are commonly used in the RUL prediction domain. It can be seen in Figure 6(a) that the RMS increases significantly at the end of the experiment. The kurtosis and crest factor are sensitive to the incipient degradation process with more background noise, which cannot present the degradation process of the whole lifetime clearly as shown in Figures 6(b) and 6(c). In Figure 6(d), the peak-peak-value-based HI has similar trends to RMS, which is insensitive to the incipient degradation process. RUL prediction based on these HIs cannot provide timely maintenance suggestions.

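Figure 6: HIs of bearing 1_3 constructed with different methods: (a)–(d) traditional HIs (RMS, kurtosis, crest factor, and peak-to-peak value); (e) deep CNN; (f) LSTM; (g) the proposed hybrid structure.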

Figures 6(e)–6(g) described the deep-learning-based HI suggesting better monotonicity and trendability than the traditional HI. However, the single deep-learning structure cannot fully utilize the information contained in the original acceleration signal. It can be seen in Figure 6(e) that the HI constructed by the deep CNN could present the degradation process, but there exists too much background noise at the earlier stage, since the single CNN structure cannot learn and distinguish the time-frequency picture effectively at the earlier degradation stage. From Figure 6(f), it can be concluded that the HI constructed with LSTM is almost constant at the earlier degradation stage because the single LSTM structure is unable to learn the difference of the earlier degradation features between the training set and testing set. Figure 6(g) is the HI curve constructed with the proposed method, suggesting that the HI constructed with deep hybrid structure has better linearity and less background noise, which are beneficial to promoting prognostic accuracy.

Figure 7 shows the RUL prediction results of bearing 1_3 to further illustrate the influence of different deep-learning structures on the predicted results. Figure 7(a) shows the RUL prediction results of the LSTM-based HI, where the predicted RUL is 4800 s and the 95% confidence interval is [2510, 7090] s. It can be seen in Figure 7(a) that the relevance vectors in the earlier stage affect the shape of the fitted polynomial, which lowers the prediction accuracy. Figure 7(b) shows the RUL prediction results of the CNN-based HI, where the predicted RUL is 5190 s and the 95% confidence interval is [0, 10830] s. Such a wide confidence interval brings too much uncertainty to the RUL results, which hinders maintenance planning; here, the RVs in the middle period affect the confidence interval. Figure 7(c) shows the RUL prediction results of the hybrid-structure-based HI, where the predicted RUL is 5600 s and the 95% confidence interval is [5398, 5803] s. The actual RUL is 5730 s, which lies within the confidence interval. Moreover, the confidence interval is narrow, indicating low uncertainty in the prediction results.

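Figure 7: RUL prediction results of bearing 1_3 with (a) the LSTM-based HI, (b) the CNN-based HI, and (c) the hybrid-structure-based HI.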

The RUL prediction results of all the test bearings are shown in Figure 8. The proposed method can effectively predict the performance degradation trend and obtain a relatively narrow confidence interval. Three types of degradation curve can be identified in Figure 8: linear type (such as bearing1_3, bearing1_4, bearing1_7, bearing2_4, and bearing2_5), exponential type (such as bearing2_2), and S-shape type (such as bearing1_5, bearing1_6, bearing2_1, bearing2_3, and bearing2_6). The polynomial model can effectively fit these different types of degradation curve.

Figure 8: RUL prediction results of all the test bearings, panels (a)–(k).

The proposed method is compared with six other studies on the same dataset to illustrate its superiority; the results are listed in Table 5. Column 1 shows the testing bearings. The prediction starting time is shown in column 2. For each testing bearing, the actual and predicted RUL times are displayed in columns 3 and 4, respectively. The percent errors of the proposed method are shown in the final column, and those of the six comparative studies are shown in columns 5 to 10. The mean and SD of the percent errors and the scoring metrics are shown in the last three rows. A scoring function to evaluate the final prediction results is defined as follows:

$$A_i = \begin{cases} \exp\left(-\ln(0.5) \cdot \dfrac{Er_i}{5}\right), & Er_i \le 0, \\[2ex] \exp\left(+\ln(0.5) \cdot \dfrac{Er_i}{20}\right), & Er_i > 0, \end{cases} \tag{26}$$

where $A_i$ is the prediction score of the *i*th test bearing and $Er_i$ is the percent error of the RUL prediction results for the *i*th testing dataset, which can be calculated as

$$Er_i = \frac{\mathrm{ActRUL}_i - \mathrm{RUL}_i}{\mathrm{ActRUL}_i} \times 100\% \tag{27}$$

where $\mathrm{ActRUL}_i$ represents the actual RUL of test bearing *i* and $\mathrm{RUL}_i$ represents the predicted RUL of test bearing *i*.
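The evaluation metrics can be sketched as follows, assuming the scoring function of the IEEE PHM 2012 Data Challenge, in which overestimated RULs (negative percent error) are penalized more heavily than underestimated ones and the final score is the mean over the test bearings. The function names are hypothetical.

```python
import math

def percent_error(actual_rul, predicted_rul):
    """Percent error Er = 100 * (actual - predicted) / actual."""
    return 100.0 * (actual_rul - predicted_rul) / actual_rul

def accuracy_score(er):
    """Per-bearing score A in (0, 1]; equals 1 for a perfect prediction."""
    if er <= 0:
        # Late prediction (RUL overestimated): score halves every 5 percent.
        return math.exp(-math.log(0.5) * (er / 5.0))
    # Early prediction (RUL underestimated): score halves every 20 percent.
    return math.exp(math.log(0.5) * (er / 20.0))

def mean_score(actual, predicted):
    """Average the per-bearing scores over all test bearings."""
    scores = [accuracy_score(percent_error(a, p))
              for a, p in zip(actual, predicted)]
    return sum(scores) / len(scores)

# Bearing 1_3 values quoted in the text: actual 5730 s, predicted 5600 s.
er_1_3 = percent_error(5730.0, 5600.0)
score_1_3 = accuracy_score(er_1_3)
```

The asymmetry reflects maintenance practice: predicting failure too late risks an unplanned breakdown, so it costs more score than an equally sized early prediction.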

The RUL prediction method proposed by Sutrisno et al. [36] predicts the RUL based on vibration frequency signature anomaly detection and survival time ratio. However, the anomaly detection time point is decided by subjective criteria, and the prediction errors in the table are large. Hong et al. [37] constructed a packet-EMD and SOM-based HI to predict the RUL, improving the accuracy compared with the previous work. However, it requires extracting more than 100 features to construct the HI, which is time-consuming. Lei et al. [38] proposed a new HI construction method based on weighted minimum quantization error (WMQE) to predict the RUL of bearings. These three methods use feature extraction, selection, and fusion, which rely on manual experience and are time-consuming.

Guo et al. [40] proposed a deep-learning-based method to construct the HI from multiple features in the time domain, frequency domain, and time-frequency domain. This method showed superiority over the SOM-based HI construction method, but it has a lower accuracy than that of Lei et al. Yoo [41] proposed a new method that constructs the HI with a CNN and uses Gaussian process regression for RUL prediction. This method improves the prediction accuracy and efficiency. Si et al. [42] constructed the HI with wavelet packet decomposition, empirical mode decomposition, and a self-organizing map and used an RVM combined with an exponential degradation model to predict the RUL, which improves the RUL accuracy effectively. However, the HI construction process is complicated and time-consuming.

In Table 5, the proposed method shows the lowest mean percent error and a low deviation, indicating that the model is accurate and reliable overall. Moreover, the method does not require a sophisticated feature extraction algorithm designed from human experience, and thus realizes intelligent RUL prediction. The performance degradation processes of the test bearings are not consistent. Compared with the other methods, the prediction accuracy of the proposed method is not the highest on every test bearing, but the average error and score of the prediction results are the best. In future work, the performance degradation process of different test bearings will be studied in depth to further improve the RUL prediction accuracy for each type of test bearing.

#### 6. Conclusions

This paper proposes a new RUL prediction scheme combining deep learning and a new RVM method. Firstly, different types of degradation data are input into the deep-learning network with a hybrid structure to construct the health indicator. Then the new RVM model consisting of RVM and a polynomial model is used to predict the RUL and calculate confidence interval. Finally, the proposed method is compared with different RUL prediction methods to verify the effectiveness.

The proposed deep-learning network with a hybrid structure could learn from different types of degradation data. The constructed health indicator curve has better monotonicity and trendability than the single-structure deep-learning network, such as CNN and LSTM. The RVM is widely used in RUL prediction. On the one hand, the RVM could reduce the redundancy of the degradation curve to enhance the prediction accuracy. On the other hand, the prediction results of RVM are profoundly affected by kernel function and the long-term prediction ability is reduced. The proposed method retains the advantage of RVM and overcomes the disadvantage by combining the polynomial model with RVM. The final RUL prediction results show that the proposed method can enhance prediction accuracy and narrow down the confidence interval.

Although the proposed RUL scheme improves the prediction results, it is time-consuming. In future work, it is expected to raise the computational efficiency by researching a better deep-learning structure.

#### Data Availability

The experimental data are obtained from the PRONOSTIA platform used in the IEEE PHM 2012 Data Challenge: Nectoux, P.; Gouriveau, R.; Medjaher, K.; Ramasso, E.; Chebel-Morello, B.; Zerhouni, N. “PRONOSTIA: An Experimental Platform for Bearings Accelerated Degradation Tests,” presented at the IEEE Int. Conf. Prognostics Health Manage., Denver, CO, USA, 2012, 1–8.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest.

#### Acknowledgments

This work has been supported in part by the National Natural Science Foundation of China (61640308).