#### Abstract

We present a prediction framework to estimate the remaining useful life (RUL) of equipment based on the generative adversarial imputation net (GAIN) and multiscale deep convolutional neural network and long short-term memory (MSDCNN-LSTM). The proposed method addresses the problem of missing data caused by sensor failures in engineering applications. First, a binary matrix is used to adjust the proportion of “0” to simulate the amount of missing data in the engineering environment. Then, the GAIN model is used to impute the missing data and approximate the true sample distribution. Finally, the MSDCNN-LSTM model is used for RUL prediction. Experiments are carried out on the commercial modular aero-propulsion system simulation (C-MAPSS) dataset to validate the proposed method. The prediction results show that the proposed method outperforms other methods when packet loss occurs, showing significant improvements in the root mean square error (RMSE) and the score function value.

#### 1. Introduction

Prognostics and Health Management (PHM) aims to monitor, predict, and manage the health of a system through models and algorithms and is widely used in aviation, military equipment, industrial manufacturing, and other fields [1]. As one of the important research issues of PHM, remaining useful life (RUL) prediction can provide strategy support for establishing the best maintenance management for equipment. The data-driven method for RUL prediction analyzes equipment operation data through modeling to determine the remaining available time of the equipment. Therefore, the quality of the data is directly related to the accuracy of the RUL prediction [2].

Precision equipment and multisensor fusion are widely used in the industrial field, and obtaining complete monitoring data is crucial to predicting the remaining useful life (RUL) of equipment. In engineering applications, various factors, such as failure of data storage, sensor damage, and mechanical failure, may lead to missing information during equipment information collection and storage [3]. Data packet loss is especially detrimental in complex and harsh working environments, such as aerospace and agricultural production environments [4]. The high cost and difficulty of obtaining equipment degradation data and the existence of information intervals between samples make RUL prediction challenging.

In 1987, Rubin [5] proposed that missing data mechanisms fall into three categories: missing at random (MAR), missing completely at random (MCAR), and missing not at random (MNAR). Handling missing data appropriately is particularly important to ensure the accuracy of missing data imputation [6]. Scholars have conducted numerous studies, and the methods can be roughly divided into three categories: ignoring data or deletion, imputation, and statistical models.

Deleting the missing items in the dataset is the simplest data processing method. Strike et al. [7] simulated three types of missing data mechanisms and compared different techniques for processing missing data, such as listwise deletion, mean imputation, and eight types of hot-deck imputation. Their detailed simulation study concluded that simple deletion was a suitable choice when the missing data volume was small.

The imputation method fills in missing values. The most probable value is typically used for imputation, which causes less information loss than deleting all samples with missing values. Commonly used methods include mean imputation, median imputation, mode imputation, and maximum likelihood estimation. Inspired by machine learning, prediction models have been used to estimate missing values from the available information in the dataset [8]. Troyanskaya et al. [9] used the k-nearest neighbor (KNN) method to estimate missing values in gene microarray data, and the imputation effect was better than that of the method based on singular value decomposition (SVD). Duan et al. [10] used a deep learning model with a denoising stacked autoencoder (DSAE) to estimate missing values in traffic data. This method proved effective for traffic data imputation and analysis.

Statistical models impute the missing values based on the linear or nonlinear relationship between the missing data and the observed data. Ni et al. [11] proposed an advanced calculation method based on a Bayesian network to learn from the raw data. A Markov chain Monte Carlo method was used for sampling based on the probability distribution learned by the Bayesian network; the method imputes the missing data multiple times and makes statistical inferences about the results. Li et al. [12] proposed a systematic calculation method of traffic flow data based on probabilistic principal component analysis and historical data to estimate missing flow data.

Statistical models that impute missing values based on prior knowledge of the data model can provide excellent results. However, they have shortcomings when the dataset and the prior knowledge are incomplete. Machine learning has substantial application potential for data imputation. This study focuses on using existing data and machine learning algorithms to impute missing values.

We propose an RUL prediction framework based on data imputation to deal with missing sensor data in engineering applications. First, the missing data are simulated using various missing sample rates. Then, the generative adversarial imputation net (GAIN) model is used to impute the missing values and fill in the dataset. Finally, the proposed multiscale deep convolutional neural network and long short-term memory (MSDCNN-LSTM) prediction model is used to obtain the RUL value of the equipment. The proposed method is well suited for predicting the RUL of equipment if the sensor data are affected by data packet loss in engineering applications. The performance of the proposed method is demonstrated using the commercial modular aero-propulsion system simulation (C-MAPSS) dataset.

#### 2. Related Work

In recent years, deep learning has shown powerful function mapping and data processing capabilities. To extract the complex characteristics inside the spectrum and predict the nicotine volume in tobacco, Jiang et al. [13] proposed a one-dimensional fully convolutional network (1D-FCN) model. Hu et al. [14] presented a deep neural network-based visual analysis approach that processes videos to detect different augmentative and alternative communication users in practice sessions.

Deep learning has also been widely used in data-driven RUL prediction methods. Babu et al. [15] first used a convolutional neural network (CNN) to predict the RUL of an engine, which improved the ability to automatically extract multidimensional features. Then, Li et al. [16] improved the prediction accuracy by using a deep CNN (DCNN) structure and time window data processing. To make the CNN model learn more detailed features, Li et al. [17] proposed the MSDCNN algorithm, that is, a DCNN with different convolution kernel sizes. To extract the time correlation features of condition monitoring data, Kong et al. [17] proposed a hybrid algorithm of CNN and long short-term memory (LSTM) to learn spatial and temporal features. Huang et al. [18] developed a novel deep convolutional neural network-bootstrap-based integrated prognostic approach for the RUL prediction of rolling bearings. Hu et al. [19] applied the LSTM model to the RUL prediction of turbine engines and studied a parameter optimization method with Bayesian theory.

In this article, we use the MSDCNN-LSTM RUL prediction model proposed by Liu et al. [20] to learn more detailed features in a high-dimensional space and predict the RUL of aircraft engines. The hybrid MSDCNN-LSTM model consists of an MSDCNN submodel and an LSTM submodel. Within the MSDCNN-LSTM model, the MSDCNN extracts high-dimensional features from the input data after time window processing, and the LSTM performs time-series learning on the input data at the same time. Then, the feature maps of the MSDCNN and the LSTM are added and flattened. Finally, the output is sent to a dense layer that produces the RUL output value. The structure of the MSDCNN-LSTM is shown in Figure 1.

#### 3. Proposed Method

##### 3.1. Missing Data Imputation Method Based on GAIN

In 2014, Ian Goodfellow et al. [21] first proposed the generative adversarial net (GAN), which generates data in an adversarial manner with a generator and a discriminator. The method attracted the attention of researchers and was verified theoretically and practically in engineering applications. The GAN has wide applicability in image, text, and audio processing and other fields.

We used the GAIN model [22] to generate time-series data with a distribution similar to that of the original data for missing data imputation. The basic structure of the model is shown in Figure 2.

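The generator observes each part of the real data and imputes the missing data according to the observations. The vector in the missing data imputation is expressed as follows:

$$\bar{X} = G\left(\tilde{X}, M, (1 - M) \odot Z\right),$$

where $\tilde{X}$ represents a small sample with missing data, $M$ represents a binary matrix with the same size as $\tilde{X}$, $Z$ represents noise, and $\odot$ represents the multiplication of the corresponding elements.

Finally, the generator outputs a complete vector after imputation as follows:

$$\hat{X} = M \odot \tilde{X} + (1 - M) \odot \bar{X}.$$

Since some of the output of the generator is real and some is generated, the difference between the GAIN and the GAN is that the discriminator of GAIN does not determine the authenticity of the entire vector but detects the real and generated parts, i.e., it predicts the value of $m$ in $M$. The model trains $D$ by maximizing the probability of correctly predicting $M$ and trains $G$ by minimizing the probability of correctly predicting $M$. The objective function is expressed as

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{\hat{X}, M}\left[ M^{T} \log D(\hat{X}) + (1 - M)^{T} \log\left(1 - D(\hat{X})\right) \right].$$

The discriminator distinguishes the source of each part of the input data, and the obtained discriminant matrix is represented by $\hat{M} = D(\hat{X})$. The cross-entropy loss function is used to evaluate each element in $\hat{M}$:

$$\mathcal{L}_{D}(M, \hat{M}) = -\sum_{i} \left[ m_{i} \log \hat{m}_{i} + (1 - m_{i}) \log\left(1 - \hat{m}_{i}\right) \right].$$

The loss function of the generator is defined as

$$\mathcal{L}_{G}(M, \hat{M}) = -\sum_{i} (1 - m_{i}) \log \hat{m}_{i} + \alpha \sum_{i} m_{i} \left(\tilde{x}_{i} - \bar{x}_{i}\right)^{2},$$

where $\alpha$ is the coefficient of the mean square error term.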
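The quantities used by the GAIN losses can be illustrated on a single small vector. The following is a minimal sketch in plain Python, not the authors' implementation: it assumes the standard GAIN formulation, in which observed entries are kept and missing entries are taken from the generator output, the discriminator is scored with a cross-entropy loss against the true mask, and the generator loss adds a weighted squared error on the observed entries. The names `combine`, `discriminator_loss`, `generator_loss`, and `alpha`, as well as all numeric values, are hypothetical.

```python
import math

def combine(x_tilde, mask, x_bar):
    """Keep observed entries (mask 1) and take generated values where the mask is 0."""
    return [m * x + (1 - m) * g for x, m, g in zip(x_tilde, mask, x_bar)]

def discriminator_loss(mask, m_hat, eps=1e-8):
    """Cross-entropy between the true mask and the discriminator's predicted mask."""
    return -sum(
        m * math.log(p + eps) + (1 - m) * math.log(1 - p + eps)
        for m, p in zip(mask, m_hat)
    )

def generator_loss(x_tilde, mask, x_bar, m_hat, alpha, eps=1e-8):
    """Adversarial term on missing entries plus alpha times squared error on observed ones."""
    adversarial = -sum((1 - m) * math.log(p + eps) for m, p in zip(mask, m_hat))
    reconstruction = sum(m * (x - g) ** 2 for x, m, g in zip(x_tilde, mask, x_bar))
    return adversarial + alpha * reconstruction

# Hypothetical values: four entries, the second and fourth are missing.
x_tilde = [0.2, 0.0, 0.7, 0.0]   # sample with missing entries set to 0
mask = [1, 0, 1, 0]              # 1 = observed, 0 = missing
x_bar = [0.25, 0.5, 0.7, 0.1]    # assumed generator output
m_hat = [0.9, 0.2, 0.8, 0.4]     # assumed discriminator output

x_hat = combine(x_tilde, mask, x_bar)
d_loss = discriminator_loss(mask, m_hat)
g_loss = generator_loss(x_tilde, mask, x_bar, m_hat, alpha=10.0)
print(x_hat, d_loss, g_loss)
```

In this sketch, only the entries with mask value 0 are replaced by generated values, so the imputed vector never overwrites an observed measurement.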

##### 3.2. RUL Prediction Framework Based on GAIN and MSDCNN-LSTM

An RUL prediction framework that combines the GAIN and MSDCNN-LSTM is designed; it consists of three parts: preprocessing the missing data, missing data imputation based on the GAIN model, and RUL prediction based on the MSDCNN-LSTM model, as shown in Figure 3.

First, data preprocessing is performed on the C-MAPSS dataset. The method described in Section 4.2 is used to construct sample data for the training set with different missing data rates. Subsequently, the GAIN network is used to impute the samples with different missing data rates, and the generator and the discriminator generate data in an adversarial manner to obtain a dataset close to the original one. Finally, the generated samples are used as the input of the MSDCNN-LSTM prediction model. The MSDCNN and LSTM models process the data simultaneously. The multiscale structure of the MSDCNN substantially improves the feature extraction capability. Convolution kernels of three different sizes are used to extract features from the input data, and the feature maps are spliced together and combined with the time-series features learned by the LSTM to obtain the prediction results. The model is trained iteratively and evaluated using two indicators (root mean square error (RMSE) and the score function), and the test set data are then input to obtain the RUL prediction result.

The RUL prediction process based on the combination of GAIN and MSDCNN-LSTM is shown in Figure 4. First, the missing data are generated using missing data rates of 0.1, 0.2, 0.3, 0.4, and 0.5. Then, we use the GAIN network to impute the missing values of the samples. During the training of the GAIN network, we set the number of epochs to 1000, and the newly generated time-series dataset is standardized and processed by a time window. After setting the RUL labels of the training set and test set, the next stage is model training and system prediction. The parallel MSDCNN-LSTM hybrid model performs multiscale feature extraction on the time-series training set, and the parameters and weights in the model are updated using a minibatch of 512. When the early stopping conditions set by the system are met, the model training ends early. If the early stopping condition is not met, the minibatch training is continued until the maximum epoch. After the model is trained, we input the test set data to predict the RUL result of each engine.
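The standardization and time window steps in this process can be sketched as follows. This is a minimal illustration in plain Python, assuming maximum-minimum (min-max) scaling per sensor and a sliding window with stride 1; the window length of 3, the helper names `min_max_scale` and `sliding_windows`, and the sensor values are hypothetical and are not taken from the experiments.

```python
def min_max_scale(series):
    """Scale one sensor's series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]

def sliding_windows(rows, window):
    """Split a sequence of time steps into overlapping windows with stride 1."""
    return [rows[i:i + window] for i in range(len(rows) - window + 1)]

# Hypothetical readings of two sensors over six operating cycles.
sensor_a = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
sensor_b = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

scaled = [min_max_scale(sensor_a), min_max_scale(sensor_b)]
# Rearrange to one row per time step: [sensor_a value, sensor_b value].
rows = [list(step) for step in zip(*scaled)]
windows = sliding_windows(rows, window=3)
print(len(windows), windows[0])
```

Each window is a small matrix of shape (window length, number of sensors) that can be passed to a model such as the MSDCNN-LSTM as one input sample.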

#### 4. Experiments and Results

##### 4.1. Experimental Dataset and Settings

The experiments are conducted on a server computer configured with an Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz and an NVIDIA GeForce TITAN XP. The C-MAPSS dataset is used to verify the proposed method. The C-MAPSS dataset is divided into four subsets (FD001, FD002, FD003, and FD004) according to the operating conditions and failure modes. Each dataset contains engine degradation data monitored by 21 sensors, as listed in Table 1. Each subset is divided into a training set and a test set. FD002 and FD004 have 6 operating conditions, and FD003 and FD004 have 2 failure modes.

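The score function and the RMSE are used as evaluation indicators. The formula of the score function is [23]

$$s = \sum_{i=1}^{n} s_{i}, \qquad s_{i} = \begin{cases} e^{-d_{i}/13} - 1, & d_{i} < 0, \\ e^{d_{i}/10} - 1, & d_{i} \geq 0. \end{cases}$$

The formula of the RMSE is

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_{i}^{2}},$$

where $n$ is the number of test samples and $d_{i}$ represents the difference between the predicted value of RUL and the true value, $d_{i} = \mathrm{RUL}_{\mathrm{pred},i} - \mathrm{RUL}_{\mathrm{true},i}$. When $d_{i}$ is less than 0, the predicted value is less than the true value, and the result is referred to as an advanced prediction; otherwise, it is a lagging prediction.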

The lower the value of the score function and RMSE, the better the predictive ability of the model. The RMSE is a symmetric function and provides the same result for an advanced prediction and lagging prediction. However, the score function is an asymmetric function and is more sensitive to lagging prediction. Because lagging prediction has more serious consequences, it results in stronger penalties than advanced prediction. Therefore, these indicators comprehensively measure the performance of the algorithm.
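The asymmetry described above can be checked with a short sketch. It assumes the scoring function commonly used with the C-MAPSS dataset, with constants 13 for advanced predictions and 10 for lagging predictions, and defines the error as the predicted RUL minus the true RUL. The function names and the RUL values are hypothetical.

```python
import math

def score(pred, true):
    """Asymmetric score: lagging predictions (d >= 0) are penalized more heavily."""
    total = 0.0
    for p, t in zip(pred, true):
        d = p - t
        total += math.exp(-d / 13) - 1 if d < 0 else math.exp(d / 10) - 1
    return total

def rmse(pred, true):
    """Root mean square error; symmetric in the sign of the error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

# Hypothetical true RUL values and two sets of predictions with the same absolute error.
true = [100.0, 50.0]
early = [90.0, 40.0]    # advanced predictions, d = -10
late = [110.0, 60.0]    # lagging predictions, d = +10

print(rmse(early, true), rmse(late, true))
print(score(early, true), score(late, true))
```

Both prediction sets give the same RMSE, while the lagging set receives the larger score, which is the behavior the two indicators are meant to capture together.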

##### 4.2. Simulating the Missing Data Rate

It is assumed that the original dataset is $X = \{x_{ij}\}$, $X \in \mathbb{R}^{n \times T}$, where $n$ is the number of sensors, $T$ is the length of the time series, and $x_{ij}$ is the measured value of the $i$th sensor corresponding to the $j$th period. Here, we define a binary matrix $M = \{m_{ij}\}$, $M \in \{0, 1\}^{n \times T}$, which has the same size as the original data and consists of 0 and 1 values. The reconstructed missing data can be expressed as

$$\tilde{X} = M \odot X,$$

where $\odot$ denotes element-wise multiplication and $\tilde{X}$ represents the observed component of $X$. A value of $m_{ij} = 1$ indicates that $x_{ij}$ is observed, and a value of $m_{ij} = 0$ indicates that $x_{ij}$ is missing. Datasets with different missing sample rates can be created by changing the proportion of 0 values in $M$.
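The masking step can be sketched in a few lines of Python. This is an illustration only: it assumes that the zeros are placed uniformly at random, which the text does not specify, and the helper names `make_mask` and `apply_mask`, the seed, and the toy data are hypothetical.

```python
import random

def make_mask(n_sensors, n_steps, missing_rate, seed=0):
    """Binary mask in which round(missing_rate * size) entries are 0, placed at random."""
    rng = random.Random(seed)
    total = n_sensors * n_steps
    n_missing = round(missing_rate * total)
    flat = [0] * n_missing + [1] * (total - n_missing)
    rng.shuffle(flat)
    return [flat[r * n_steps:(r + 1) * n_steps] for r in range(n_sensors)]

def apply_mask(data, mask):
    """Element-wise product of data and mask; missing entries become 0."""
    return [[x * m for x, m in zip(row, mrow)] for row, mrow in zip(data, mask)]

# Toy data: 4 sensors, 10 time steps, all values nonzero.
data = [[float(10 * i + j + 1) for j in range(10)] for i in range(4)]
mask = make_mask(4, 10, missing_rate=0.3)
observed = apply_mask(data, mask)

n_zeros = sum(row.count(0) for row in mask)
print(n_zeros / 40)
```

Changing `missing_rate` changes the proportion of zeros in the mask and therefore the simulated missing data rate.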

##### 4.3. Data Imputation Simulation Results and Analysis

In the 4 subsets of the C-MAPSS dataset, the data of 7 sensors that show no changes were eliminated. Therefore, the sensor numbers used in this experiment are 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21. The RMSE was used as an evaluation indicator to evaluate the imputation effect of GAIN.

Table 2 lists the imputation accuracy of GAIN for different missing data rates (0.1, 0.2, 0.3, 0.4, and 0.5) on the C-MAPSS dataset.

As the missing data rate increases, the RMSE values of the four subdatasets FD001, FD002, FD003, and FD004 increase, and the accuracy decreases. In the case of a high missing data rate, there is less sample information, and it is difficult to fill in the missing sample data. The imputation performance of GAIN is better for the FD002 and FD004 datasets with complex working conditions and a large sample size than for the FD001 and FD003 datasets with simple working conditions and a small sample size. Therefore, the imputation performance is better for a larger sample size, and the imputation accuracy decreases as the missing data rate increases.

Figure 5 shows the visualization results of GAIN after missing data imputation for a missing data rate of 0.5. The horizontal axis represents the operating cycle of the first engine, and the vertical axis is the result of the first sensor data after maximum-minimum standardization. The middle black rectangle represents the real data, and the red dots represent the results of GAIN after imputation. Although the effect of missing data is more serious when the missing data rate is high, the data after imputation based on GAIN fluctuates in a small range around the real data, and the overall distribution is consistent with the real data distribution.

Table 3 shows the influence of the loss function on the GAIN model performance. During data imputation, the loss function is particularly important for training the generator and discriminator models. After conducting experiments, we found that the model performance was best when the cross-entropy loss function and the mean square error loss function were used. We use the FD001 dataset with a missing data rate of 0.5 as an example to verify the impact of the loss function on the model performance and compare the simulation results obtained from different combinations of loss functions. It can be seen from Table 3 that the combination of the two loss functions and the adjustment of the parameters significantly affect the results. The optimal RMSE value is obtained when the cross-entropy loss function is used for the discriminator and the cross-entropy loss function plus the mean square error loss function is used for the generator. In the experiment, different combinations were used under the same conditions to verify the effect of the weighting parameter in the loss function. The results are listed in Table 4.

Adding the parameter to the generator loss function improves the imputation accuracy, but and substantially increase the RMSE value. Therefore, the model provides the best performance when the coefficient of the RMSE in the generator loss function is .

Figure 6 shows the results of the GAIN imputation and other methods. The GAIN imputation, mean imputation, median imputation, and mode imputation are compared using the FD001 dataset. The horizontal axis represents the missing data rate, and the vertical axis represents the RMSE. As the missing data rate increases, the RMSEs of the four methods show an upward trend, and the imputation performance decreases. The results for different missing data rates indicate that the mean value imputation results are more stable than the mode and median imputation methods. However, the GAIN achieves the smallest RMSE values for the different missing data rates, indicating that it outperforms the other methods.
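The simple baselines in this comparison can be sketched with the Python standard library. The example below is illustrative only: it assumes that the imputation RMSE is computed over the missing entries, which the text does not state explicitly, and the helper names and the toy sensor values are hypothetical and unrelated to the reported results.

```python
import math
import statistics

def impute_column(values, mask, strategy):
    """Fill missing entries (mask 0) with the mean, median, or mode of the observed ones."""
    observed = [v for v, m in zip(values, mask) if m == 1]
    stat = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = stat(observed)
    return [v if m == 1 else fill for v, m in zip(values, mask)]

def imputation_rmse(true, imputed, mask):
    """RMSE between true and imputed values, restricted to the missing entries."""
    errors = [(t - p) ** 2 for t, p, m in zip(true, imputed, mask) if m == 0]
    return math.sqrt(sum(errors) / len(errors))

# Toy single-sensor example: the fourth and sixth readings are missing.
true = [1.0, 2.0, 2.0, 3.0, 10.0, 4.0]
mask = [1, 1, 1, 0, 1, 0]
masked = [t * m for t, m in zip(true, mask)]

results = {}
for strategy in ("mean", "median", "mode"):
    imputed = impute_column(masked, mask, strategy)
    results[strategy] = imputation_rmse(true, imputed, mask)
print(results)
```

A learned imputer such as GAIN would be evaluated in the same way, by replacing `impute_column` with the model output on the masked data.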

Figure 7 shows the RUL prediction results of all engine units after GAIN imputation on the C-MAPSS dataset when the missing data rate is 0.1. The test engines are sorted by RUL in ascending order to better observe the changes in prediction accuracy. The horizontal axis represents the test engine unit, and the vertical axis represents the RUL. In the figure, the black dots represent the real RUL, and the red dots represent the model prediction results. It can be seen from Figure 7 that, at the initial stage of engine operation, the RUL value is relatively large and the prediction error is relatively large. When the engine has run for a long time or is about to fail, the degradation information is more obvious, and the predictive performance is significantly enhanced. The proposed framework shows a good forecasting effect.

Figure 7: RUL prediction results of all test engine units after GAIN imputation at a missing data rate of 0.1, panels (a) to (d).

Table 5 shows the RUL prediction results with and without GAIN imputation for a missing data rate of 0.1. It is worth noting that the system automatically replaces missing data with 0 to ensure the smooth execution of the RUL prediction algorithm. Therefore, without imputation at a missing data rate of 0.1, the score function value cannot be obtained, and the zero filling causes difficulties for the subsequent RUL prediction, such as a substantial increase in the RMSE value. However, after the missing data are imputed by GAIN, the prediction results are significantly improved. The RMSE is improved by at least 80.16%, and the score function value is improved by at least 99.98%.

Table 6 shows the prediction results of the proposed GAIN method for missing data rates of 0.1, 0.2, 0.3, 0.4, and 0.5. As the missing data rate increases, the prediction accuracy of the 4 subdatasets decreases. The score function is more affected than the RMSE. When the missing rate is less than 0.4, the proposed RUL prediction framework based on data imputation can show better performance. When the missing data rate is higher than or equal to 0.4, too much data information is lost, resulting in low prediction performance. However, the prediction result of the proposed framework is much better than using no missing data imputation for a missing data rate of 0.1.

#### 5. Conclusions

This paper proposed an RUL prediction method based on the combination of GAIN and MSDCNN-LSTM. Experiments were carried out with a missing data rate of 0.1–0.5 to simulate data packet loss in industrial production. In the GAIN method, the generator interacts with the discriminator to impute the missing data and generate a sample dataset that is close to reality. This dataset is used as the input of the MSDCNN-LSTM prediction model. A comparison of the simulation results of GAIN and other methods indicated that the GAIN imputation method outperformed the mean, median, and mode imputation methods. The proposed prediction framework was compared with no data imputation when packet loss occurred and exhibited a significant improvement. The RUL prediction framework showed better prediction performance than other methods on the C-MAPSS dataset for different missing data rates.

#### Data Availability

The data used to support the findings of the study are included within the article.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This work was supported by the Beijing Natural Science Foundation (Grant no. 4202026), Qin Xin Talents Cultivation Program of Beijing Information Science and Technology University (QXTCP A202102), Beijing Postdoctoral Science Foundation (Grant no. ZZ-2019-65), Chaoyang District Postdoctoral Science Foundation (Grant no. 2019ZZ-45), the National Key R&D Program of China (Grant no. 2018YFB1308300), and National Natural Science Foundation of China (Grant no. 62103056).