Abstract

Because engines are key components of aircraft, improving their safety, reliability, and economy is crucial. To ensure flight safety and reduce the cost of maintenance during aircraft engine operation, a prognostics and health management system that focuses on fault diagnosis, health assessment, and life prediction is introduced. The predicted remaining useful life (RUL) is the most important information for making decisions about aircraft engine operation and maintenance, and its prediction relies largely on the selection of performance degradation features. The choice of such features is highly significant, but current RUL prediction algorithms have some weaknesses, notably the inability to obtain tendencies from the data. Especially with aircraft engines, extracting useful degradation features from multisensor data with complex correlations is a key technical problem that has hindered the implementation of degradation assessment. To solve these problems, deep learning has been proposed in recent years to exploit multiple layers of nonlinear information processing for unsupervised self-learning of features. This paper presents a deep learning approach to predict the RUL of an aircraft engine based on a stacked sparse autoencoder and logistic regression. The stacked sparse autoencoder is used to automatically extract performance degradation features from multiple sensors on the aircraft engine and to fuse multiple features through multilayer self-learning. Logistic regression is used to predict the remaining useful life. However, the hyperparameters of the deep learning model, which significantly impact the feature extraction and prediction performance, are determined based on expert experience in most cases. The grid search method is introduced in this paper to optimize the hyperparameters of the proposed aircraft engine RUL prediction model. The method is applied to a benchmark dataset to predict the RUL of an aircraft engine and to demonstrate the effectiveness of the proposed approach.

1. Introduction

Because engines are core components of an aircraft, engine failure is often a major cause of serious accidents and casualties [1]. Therefore, the safety and reliability of engines are vital to the performance of aircraft. However, it is difficult to ensure their safety and reliability because of their complicated structures, and engine failure inevitably arises from the effects of aging, environment, and variable loading as the working time increases. For this reason, it is essential to detect underlying degradation, predict effectively how soon an engine will fail, implement maintenance promptly, and ultimately prevent catastrophic failure.

In the field of aircraft maintenance, traditional maintenance is either purely reactive (fixing or replacing an aircraft engine component after it fails) or blindly proactive (assuming a certain level of performance degradation with no input from the aircraft engine itself and maintaining the aircraft engine on a routine schedule whether maintenance is actually needed or not). Both scenarios are quite wasteful and inefficient, and neither is conducted in real time [2–5]. Because it schedules maintenance tasks on the basis of fault diagnosis, performance degradation assessment, and the predicted remaining useful life of the aircraft equipment, and thereby helps to prevent faults in advance, prognostics and health management (PHM) is gradually replacing these two maintenance strategies. Prognostics, as the core of PHM, involves managing performance deterioration processes or faults in the aircraft engine and forecasting when components or systems of the engine will break down or when the performance will reach an unacceptable level.

There are three main classes of RUL prediction methods: (1) data-driven methods, (2) physics model-based methods, and (3) methods that combine data-driven and physics model-based methods [6–9]. The data-driven methods use past condition monitoring data, the current health status of the system, and data on the degradation of similar systems. The methods based on physics models use system-specific mechanistic knowledge, failure regulation, and condition monitoring data to predict the RUL of a system or component. There are two main challenges in prognostics based on physics: (1) there is not enough physical knowledge to construct a physical degradation model and (2) the values of the physical model’s parameters are difficult to determine exactly. Therefore, it is important to understand the failure mechanism of the system correctly, and experienced personnel are required for physics-based models [10, 11]. In addition, the peripheral environment during device operation (e.g., the temperature and humidity) and the operating conditions (e.g., the fan speed) may be used as inputs and constitute additional dimensions to be considered. Therefore, the requirements of data-driven methods to model the degradation and predict the RUL are easier to satisfy in reality. At present, data-driven methods are widely used in RUL prediction [12, 13].

The performance of many data-driven prognostics methods is heavily dependent on the choice of the performance degradation data to which they are applied [14]. However, engines have many sensor parameters, and the data from different sensors vary in how clearly they reveal engine performance degradation: the data from some sensors are sensitive to degradation, whereas the data from others are not. Therefore, it is necessary to select the sensor parameters whose data are more sensitive to the engine’s performance degradation trend as the training data for the RUL prediction model. By observing the characteristic variations of the data from all sensor parameters, a quadratic fitting curve is used to fit the degradation data from the different sensors and to rank the engine’s sensor parameters by sensitivity.

Three problems hinder the implementation of performance degradation feature extraction in practice. The first is how to select the most sensitive performance degradation features so that performance degradation trends can be identified easily. The second is that the relevant performance degradation features are often unavailable and unknown a priori, so a large number of candidate performance degradation features have been proposed to better represent the performance degradation state. The last is that most traditional methods of extracting performance degradation features for prognostics are unsupervised and cannot automatically adjust the parameters of the feature extraction model based on feedback from the prediction [15–17]. Such feature extraction and selection is significant but represents a principal shortcoming of popular prognostics algorithms: the inability to extract and organize discriminative or trend information from data. Therefore, it is important to develop an automatic feature extraction method that is capable of extracting the prominent features to achieve better insight into the underlying performance degradation state.

Deep learning, a method that has been put forward in the last few years, can be used to extract multilevel features from data, which means that the method can express data at different levels of abstraction [18]. Deep learning is an end-to-end machine learning system. It can automatically process an original signal, identify discriminative and trend features in the input data layer by layer, and then directly output the classification or regression result. The whole process of feature learning and classifier or regression model training is based on optimizing an overall objective function. In contrast, traditional machine learning processes are divided into several discontinuous steps, such as manual feature extraction and classifier or regression model training, and each step is based on optimizing a separate objective function. Owing to the advantage of feature self-learning, deep learning has had great success in applications in artificial intelligence, including computer vision (CV), natural language processing (NLP) [19, 20], object recognition [21], and image information retrieval [22, 23]. Deep learning is not only popular in the academic world but also favored in the industrial world. Companies such as Google, Microsoft, Apple, IBM, and Baidu [24], whose products are widely used, are researching deep learning and have made achievements such as AlphaGo.

Many deep learning methods have been proposed, for instance, deep neural networks (DNNs), convolutional neural networks (CNNs), and deep belief networks (DBNs) [25]. The stacked sparse autoencoder (SAE) [26] is one of the most commonly used deep neural network approaches. An SAE consists of multiple layers of autoencoders, such as sparse autoencoders and denoising autoencoders. A sparse autoencoder builds on the basic autoencoder and introduces a sparsity constraint to make the encoded representation as sparse as possible. A denoising autoencoder learns to remove noise that is added to the initial input data and thus extracts a more robust representation of the input data [27]. For this reason, an SAE can effectively capture the important factors of the input data, extract more helpful and robust features, and thereby achieve excellent performance in pattern recognition and machine learning.

In recent years, various researchers have demonstrated the success of DNN and SAE models in machine health monitoring. Applications include fault classification of induction motors operated under six different conditions, vibration-based fault diagnosis of rolling bearings and hydraulic pumps, fault detection in a tidal turbine’s generator using vibration data acquired from an accelerometer placed within the nacelle of the turbine, vibration-based condition monitoring of air compressors, multiclass fault classification of spacecraft using the large variety of data generated during spacecraft tests, anomaly detection and fault disambiguation in large flight datasets, drill bit and steel plate health monitoring using vibration data, and fault recognition of voltage transformers in the electric power industry [28–36]. At present, most research on SAE-based health monitoring focuses on anomaly detection and fault diagnosis. However, there are few applications to RUL prediction, especially for aircraft engines.

Consequently, a prognostics method based on a stacked sparse autoencoder is proposed to promote self-learning of multilayer features and to predict the RUL of an aircraft engine. The remainder of this paper is organized as follows: Section 2 presents the entire prediction method procedure and framework. Section 3 presents and discusses the prediction results. Finally, conclusions are drawn in Section 4.

2. Methodology

This section introduces the relevant algorithms used in this research. As depicted in Figure 1, the whole procedure for RUL prediction for an aircraft engine consists of two main steps: data preprocessing and RUL prediction using the SAE.

2.1. Data Preprocessing

Selection of sensors that are sensitive to performance degradation and standardization of sensor data with different dimensions are the primary tasks necessary to obtain a high RUL prediction accuracy. Three steps are needed to preprocess the data.

2.1.1. Sensor Selection

Different sensors in an aircraft engine have very different responses to the performance degradation process. Some sensors show unclear tendencies because of noise or insensitivity to degradation trends. Choosing insensitive parameter data may reduce the RUL prediction accuracy. To improve the performance of the prediction model, sensors that are more sensitive to the performance degradation process are chosen as inputs to the RUL prediction model. A method called slope analysis is proposed for sensitivity measurement. Its three main steps are as follows:

Step 1: curve fitting is performed on the degradation data for each parameter of each engine. Then, the parameters of the best-fit curves, called slopes, are used to analyze the sensitivity of the degradation data.

Step 2: the average values of all the slopes obtained in step 1 that belong to the same sensor are calculated. The different average values for the different sensors then show the individual sensitivity of the degradation data.

Step 3: the degradation data with larger slopes are selected for predicting the RUL of the engine.
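The three steps of slope analysis can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a straight-line least-squares fit stands in for the fitted curve, the absolute slope is used as the sensitivity score, and the function names (`fit_slope`, `rank_sensors`) and the toy readings are hypothetical.

```python
from statistics import mean

def fit_slope(values):
    """Step 1: least-squares slope of the readings against cycle index 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def rank_sensors(engines):
    """Steps 2 and 3: average the absolute slope of each sensor over all
    engines and return the sensor names sorted from most to least sensitive.

    engines: list of dicts mapping a sensor name to its readings per cycle.
    """
    sensors = engines[0].keys()
    score = {s: mean(abs(fit_slope(e[s])) for e in engines) for s in sensors}
    return sorted(score, key=score.get, reverse=True)

# Toy data (hypothetical): sensor "a" drifts with time, sensor "b" is nearly flat.
engines = [
    {"a": [0.0, 1.0, 2.0, 3.0], "b": [5.0, 5.0, 5.0, 5.0]},
    {"a": [1.0, 1.5, 2.0, 2.5], "b": [4.0, 4.1, 4.0, 4.1]},
]
ranking = rank_sensors(engines)
print(ranking)
```

Raw slopes depend on the units of each sensor, so in practice the readings would probably need to be scaled to a common range before their slopes are compared.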

2.1.2. Data Normalization

The linear function that best preserves the original performance degradation pattern of the aircraft engine is chosen to map the data for each selected sensor to [0, 1].

2.1.3. RUL Normalization

The proposed prediction method outputs a result in the range from 0 to 1. In the training stage of the prediction model, the RUL of each cycle of aircraft engine should also be normalized to [0, 1] using a linear function. The test outputs of the prediction model need to be inversely mapped from [0, 1] to the real RUL.
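A minimal sketch of the two linear mappings described in Sections 2.1.2 and 2.1.3 is given below. It assumes min-max scaling for the sensor data and division by a maximum RUL for the target; the paper states only that a linear function is used, so both choices, as well as the function names, are illustrative assumptions.

```python
def min_max_scale(values):
    """Linearly map one sensor's readings to [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal carries no trend; map it to zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def normalize_rul(rul, rul_max):
    """Map a RUL in cycles to [0, 1], assuming rul_max is the largest RUL."""
    return rul / rul_max

def denormalize_rul(y, rul_max):
    """Inverse mapping from a model output in [0, 1] back to cycles."""
    return y * rul_max

scaled = min_max_scale([2.0, 4.0, 6.0])
restored = denormalize_rul(normalize_rul(50, 200), 200)
print(scaled, restored)
```

The same `rul_max` has to be used in both directions, otherwise the test outputs are mapped back to the wrong scale.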

2.2. SAE Model Construction
2.2.1. Deep Architecture

Cortical computations in the brain have deep architecture and multiple layers of processing. For example, a visual image is processed in multiple stages by the brain, first by cortical area “V1,” then by cortical area “V2,” and so on [37]. Inspired by the information-processing scheme of the brain, deep neural networks have similar deep architectures and multiple hidden layers, which can support complex recognition tasks [6, 37]. As is typical of deep neural networks, the stacked sparse autoencoder (SAE) consists of multiple autoencoders. Compared with traditional neural networks with shallow architectures, it can learn features better and extract deeper discriminative representations [38].

However, it is difficult to train deep architectures [39]. This problem has been addressed by Hinton et al. [40–42], who showed that deep architectures can be trained by relying on two main procedures: (1) the layers of the deep architecture are pretrained as unsupervised autoencoders, and the output of the top layer’s autoencoder is used as the input to a logistic regression, and (2) fine-tuning based on backpropagation is used to adjust the model parameters to obtain accurate prediction results.

2.2.2. Sparse Autoencoder

An autoencoder, first introduced by Hinton et al. [40], is a general form of deep learning method [43] that has been extensively used in unsupervised feature learning. As shown in Figure 2, an autoencoder has three layers: an input layer, a hidden layer, and an output layer. The whole network is trained to reconstruct the input layer at the output layer, while the output of the hidden layer is taken as the key feature. However, the traditional autoencoder is not an efficient way to obtain significant representations because of its intrinsic limitations. The SAE, as an extension of an autoencoder, can be trained to obtain relatively sparse representations by introducing a sparse penalty term into the autoencoder [44]. The sparse features learned by the SAE have meanings that are more practical in experiments and applications.

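The SAE model contains two parts:

(i) An encoder map. The encoder maps an input vector $x^{(i)}$ (the $i$th training example) to the latent representation $h^{(i)}$ through the deterministic mapping

$$h^{(i)} = f\left(W^{(1)} x^{(i)} + b^{(1)}\right), \tag{1}$$

where the sigmoid function $f$ is the activation function of the encoder with weight matrix $W^{(1)}$ and bias vector $b^{(1)}$.

(ii) A decoder map. The decoder maps the feature $h^{(i)}$ back to a reconstruction $\hat{x}^{(i)}$ of the vector in the output space [45] through the mapping function

$$\hat{x}^{(i)} = g\left(W^{(2)} h^{(i)} + b^{(2)}\right). \tag{2}$$

The decoder map tries to learn a function such that $\hat{x}^{(i)} \approx x^{(i)}$, which means making the output $\hat{x}^{(i)}$ similar to the input $x^{(i)}$. Similarly, the sigmoid function $g$ is set as the activation function of the decoder map with weight matrix $W^{(2)}$ and bias vector $b^{(2)}$.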

During the learning process, the parameters of the SAE are adjusted using backpropagation by minimizing the cost function within the sparsity constraint. The sparsity constraint works on the hidden layer to limit its units and makes it into a sparse vector in which most elements are zero or close to zero [44]. For the autoencoder’s network structure, a neuron with a sigmoid activation function is in the active state if its output is close to 1 and the inactive state if its output is close to 0. Therefore, the sparsity constraint is introduced to restrict most of the neurons to inactivity most of the time.

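The activation of hidden unit $j$ (the $j$th component of the latent representation) is denoted by $a_j(x)$, and the average activation of hidden unit $j$ over the $m$ training examples is as follows:

$$\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j\left(x^{(i)}\right). \tag{3}$$

Then, we define the sparsity constraint as $\hat{\rho}_j = \rho$, where $\rho$ denotes the sparsity criterion and has a value that is close to zero, that is, most of the neurons in the hidden layer are inactive.

To reach the goal of sparsity, a penalty term is introduced to the objective function that penalizes $\hat{\rho}_j$ if it deviates significantly from $\rho$. In our study, the KL divergence [45] is selected as the penalty term:

$$\sum_{j=1}^{s} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) = \sum_{j=1}^{s} \left[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho) \log\frac{1-\rho}{1-\hat{\rho}_j} \right], \tag{4}$$

where $s$ is the number of hidden units.

The training set of $m$ training examples is denoted by $\{x^{(1)}, \ldots, x^{(m)}\}$, and the original cost function is defined as

$$J(W, b) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| \hat{x}^{(i)} - x^{(i)} \right\|^{2} + \frac{\lambda}{2} \sum_{l=1}^{2} \sum_{j} \sum_{k} \left( W^{(l)}_{jk} \right)^{2}, \tag{5}$$

where $\hat{x}^{(i)}$ is the reconstruction of $x^{(i)}$.

The first term in (5) is an average sum-of-squares error term, and the second term is a regularization term or weight decay term, which tends to decrease the magnitude of the weights. Here, $W$ and $b$ are the same as in (1) and (2), and $\lambda$ is the weight decay parameter.

By adding the sparse penalty term, the cost function is modified to

$$J_{\mathrm{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{s} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right), \tag{6}$$

where $\beta$ represents the weight of the sparsity penalty term.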
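To make the terms of the sparse cost function described above concrete, the following sketch evaluates it for a tiny autoencoder in plain Python. The names `lam`, `beta`, and `rho` stand for the weight decay parameter, the sparsity weight, and the sparsity criterion; the layer sizes, weights, and data are hypothetical, and the code illustrates the formula only, not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(W, b, x):
    """Sigmoid layer f(W x + b), with W given as a list of rows."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def kl(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparse_cost(X, W1, b1, W2, b2, lam, beta, rho):
    """Reconstruction error + weight decay + beta * KL sparsity penalty."""
    m = len(X)
    H = [layer(W1, b1, x) for x in X]        # hidden activations
    X_hat = [layer(W2, b2, h) for h in H]    # reconstructions
    recon = sum(0.5 * sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
                for x_hat, x in zip(X_hat, X)) / m
    decay = 0.5 * lam * sum(w * w for W in (W1, W2) for row in W for w in row)
    # Average activation of each hidden unit over the training set.
    rho_hat = [sum(h[j] for h in H) / m for j in range(len(b1))]
    penalty = sum(kl(rho, r) for r in rho_hat)
    return recon + decay + beta * penalty

# Hypothetical 2-2-2 autoencoder with all-zero parameters: every unit outputs 0.5.
zeros = [[0.0, 0.0], [0.0, 0.0]]
X = [[0.5, 0.5]]
cost = sparse_cost(X, zeros, [0.0, 0.0], zeros, [0.0, 0.0],
                   lam=1e-3, beta=3.0, rho=0.05)
print(cost)
```

In this toy case the reconstruction and weight decay terms vanish, so the whole cost comes from the two hidden units whose average activation of 0.5 is far from the target `rho`.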

2.2.3. Denoising Autoencoder

Despite the process described above, learning features well to improve the performance and generalization ability of the prediction model continues to face challenges because of the noise and outliers that commonly appear in real-world data. To force the hidden layer to discover more robust features, the autoencoder can be trained by reconstructing the input from a corrupted version of it, which is the idea behind denoising autoencoders [37], as shown in Figure 3.

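The data corruption is implemented by corrupting the initial input $x$ to create a partially destroyed version $\tilde{x}$ by means of a stochastic mapping,

$$\tilde{x} \sim q_D\left(\tilde{x} \mid x\right). \tag{7}$$

The standard approach is to apply masking noise to the original data by setting a random fraction of the elements of $x$ to zero. Next, the corrupted data $\tilde{x}$ pass through a basic autoencoder process and are mapped to a hidden representation,

$$h = f\left(W^{(1)} \tilde{x} + b^{(1)}\right). \tag{8}$$

From this equation, we reconstruct

$$\hat{x} = g\left(W^{(2)} h + b^{(2)}\right), \tag{9}$$

where the activation functions $f$ and $g$, the weight matrices $W^{(1)}$ and $W^{(2)}$, and the bias vectors $b^{(1)}$ and $b^{(2)}$ are those of the encoder and decoder in (1) and (2).

In the last stage, the parameters are trained to minimize the average reconstruction error to make $\hat{x}$ as close as possible to the uncorrupted input $x$.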
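The masking noise described above can be sketched in a few lines. The function name `mask_noise`, the corruption fraction of 0.4, and the sample vector are hypothetical; the sketch only shows how a fixed fraction of the input elements is zeroed while the clean input is kept as the reconstruction target.

```python
import random

def mask_noise(x, fraction, rng):
    """Return a copy of x with a random `fraction` of its elements set to zero."""
    k = round(fraction * len(x))
    zeroed = set(rng.sample(range(len(x)), k))
    return [0.0 if i in zeroed else v for i, v in enumerate(x)]

rng = random.Random(0)            # seeded so that the corruption is repeatable
x = [0.2, 0.4, 0.6, 0.8, 1.0]     # clean input, kept as the training target
x_tilde = mask_noise(x, 0.4, rng) # corrupted input fed to the encoder
print(x_tilde)
```

During training, `x_tilde` would be passed through the encoder and decoder, and the reconstruction error would be measured against `x` rather than `x_tilde`.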

2.2.4. Structure of the Stacked Sparse Autoencoder

As a typical neural network, the stacked autoencoder consists of multiple layers of sparse or denoising autoencoders (with their decoders discarded) and a logistic regression. The outputs of each layer of the stacked autoencoder are wired to the inputs of the subsequent layer. The architecture of a two-layer stacked sparse autoencoder is shown in Figure 4. Each sparse or denoising autoencoder generates a representation of the inputs (data from the aircraft engine’s sensors) that is more abstract and high dimensional than the previous layer’s because it is obtained by applying an additional nonlinear transformation. The outputs of the last sparse autoencoder layer are input to the logistic regression, and the results (the predicted RUL) are then obtained.

(1) Prediction Using Logistic Regression. The purpose of logistic regression is to find an optimal model for matching independent variables and class distinctions of dependent variables (probabilities of the occurrence of an event). The logistic function is expressed by

$$P(z) = \frac{1}{1 + e^{-z}}. \tag{10}$$

The logistic or logit model is

$$\ln\frac{P}{1 - P} = z = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n, \tag{11}$$

where $z$ is a linear combination of the independent variables $x_1, \ldots, x_n$.

Parameters of the model (such as $\beta_0, \beta_1, \ldots, \beta_n$) need to be determined beforehand, which is the major premise for determining $P$. Because of the existence of dichotomous dependent variables, it is improper to estimate the values of the parameters using the least-squares method [46]. Therefore, instead of minimizing the sum of the squared errors, this paper uses the maximum likelihood method to estimate the parameters (such as $\beta_0, \beta_1, \ldots, \beta_n$) of the logistic regression [47]. Then, the probability $P$ of the occurrence of the event can be obtained using (11) once the vector $\beta = (\beta_0, \beta_1, \ldots, \beta_n)$ has been determined.
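The forward path of the two-layer architecture described above, from sensor inputs through the stacked encoders to the logistic output, can be sketched as follows. The layer sizes, weights, and the function name `predict_normalized_rul` are hypothetical; in the actual method the parameters would come from pretraining and fine-tuning rather than being written by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(W, b, x):
    """One encoder layer: sigmoid(W x + b), with W given as a list of rows."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def predict_normalized_rul(x, layers, beta0, beta):
    """Pass x through each encoder in turn, then apply the logistic output."""
    h = x
    for W, b in layers:
        h = encode(W, b, h)
    z = beta0 + sum(bk * hk for bk, hk in zip(beta, h))
    return sigmoid(z)  # value in (0, 1), read as the normalized RUL

# Hypothetical sizes and weights: 3 sensor inputs -> 2 features -> 2 features.
layers = [
    ([[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]], [0.0, 0.1]),
    ([[0.5, -0.5], [0.2, 0.2]], [0.0, 0.0]),
]
y = predict_normalized_rul([0.2, 0.7, 0.5], layers, beta0=-0.3, beta=[1.0, -1.0])
print(y)
```

Because the output is a sigmoid, it always lies strictly between 0 and 1, which is why the RUL targets are normalized to that range before training.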

(2) Fine Tuning. The process of fine-tuning mainly focuses on adjusting the weights in the SAE network, which leads to much better prediction performance.

First, feed-forward is used to compute the activations for all the autoencoder layers.

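In the next step, we set $\delta^{(n_l)} = -\left(\nabla_{a^{(n_l)}} J\right) \odot f'\left(z^{(n_l)}\right)$ for the output layer $n_l$, where $\nabla J = \theta^{T}(I - P)$, $I$ is the input label, and $P$ is the vector of conditional probabilities. Then, for layers $l = n_l - 1, n_l - 2, \ldots, 2$, we set $\delta^{(l)} = \left(\left(W^{(l)}\right)^{T} \delta^{(l+1)}\right) \odot f'\left(z^{(l)}\right)$, and then, the desired partial derivatives are

$$\nabla_{W^{(l)}} J(W, b; x, y) = \delta^{(l+1)} \left(a^{(l)}\right)^{T}, \qquad \nabla_{b^{(l)}} J(W, b; x, y) = \delta^{(l+1)}, \tag{12}$$

where $a^{(l)}$ and $z^{(l)}$ denote the activations and weighted inputs of layer $l$, and $f$, $W$, and $b$ are as in (1) and (2).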

Finally, the batch gradient descent algorithm is used to minimize the overall cost function.

2.3. Training and Optimization of SAE-Based RUL Prediction
2.3.1. Procedure for Training the SAE-Based RUL Prediction Model

A two-layer stacked sparse autoencoder and a logistic regression (LR) model are used as an example to illustrate the training procedure in the proposed deep learning-based RUL prediction methodology. The values of the SAE hyperparameters are fixed before training; a grid search is used to find a set of optimal values. The four major steps of the procedure are as follows:

Step 1: a single-layer denoising autoencoder (DAE), the first layer of the SAE, is trained to extract robust performance degradation features using unsupervised learning [37]. The signals of the selected sensors are input into the DAE, and low-level features are output by the hidden layer of the DAE.

Step 2: a single-layer sparse autoencoder (AE), the second layer of the SAE, is trained for unsupervised self-learning of features. The low-level features are input into the AE, and the high-level features are output by the hidden layer of the AE.

Step 3: the high-level features are used as inputs to train the LR model for RUL prediction. The target output of the LR model is the normalized RUL of the aircraft engine.

Step 4: the previously trained SAE and LR model are combined into an integrated feature learning and RUL prediction model. Then, the integrated model is trained using supervised learning for the final feature learning to obtain the RUL prediction model. The signals of the selected engine sensors are the inputs of the integrated model, and the normalized RUL of the engine is used as the target output during model training. Training the integrated model using supervised learning fine-tunes the model parameters (the parameters of the DAE, the AE, and the LR models), starting from the values obtained in steps 1 to 3. The features obtained from the fine-tuned DAE and AE present the degradation trend of the engine performance more clearly. Based on these features, the LR model can provide a more accurate RUL prediction result.

The process of training the proposed RUL prediction method is summarized in Figure 5.

2.3.2. Grid Search-Based SAE RUL Prediction Model Parameter Optimization and Validation

The hyperparameters of a deep learning model, which have significant impacts on the feature extraction performance, are adjusted in most cases based on expert experience. In view of the difficulty of adjusting the hyperparameters of deep learning models manually, a method of optimizing the hyperparameters is necessary.

There are currently two main types of methods for automated hyperparameter selection for the SAE (shown in Figure 6). The first type comprises model-free methods, which include the grid and random search methods. The second type comprises model-based methods, which fall into three subcategories: Bayesian optimization (e.g., Spearmint [48]), nonprobabilistic methods (such as RBF surrogate models [49]), and evolutionary algorithms (e.g., genetic algorithms [50] and particle swarm optimization [51]). Model-based methods explore the solution space efficiently according to the algorithm selected and quickly obtain an acceptable parameter value. However, the identified hyperparameter value may be a local optimum, and these methods have several hyperparameters of their own, which increases their complexity.

Unlike model-based methods of hyperparameter selection, model-free hyperparameter selection methods search for the optimal parameters within the defined space; the main ones are grid and random searching [52, 53]. In this paper, the grid search method is chosen to search for the hyperparameters of the SAE.

There are a few reasons why grid search is chosen as the hyperparameter optimization algorithm in the proposed SAE-based RUL prediction model.

(1) Compared with manual sequential optimization of the hyperparameters, a grid search is more likely to identify better model parameters in the same amount of time.

(2) Compared with model-based hyperparameter selection methods, a grid search is simple to implement and easy to parallelize.

(3) Compared with the random search method of hyperparameter optimization, grid searches are recommended when few parameters need to be optimized.

Theoretically, when the space defined for the parameters to be optimized is large enough and the step size between candidate values is small enough, a grid search can find the global optimal solution.

There are three main steps in the grid search-based hyperparameter optimization of the SAE RUL prediction model.

Step 1: the hyperparameters to be optimized are defined in the space, and the space is divided into grids with a fixed step size. Each point on each grid is a combination of model parameters.

Step 2: the training set is divided into several subsets of equal size. Then, the SAE is trained with one combination of model parameters. Details of the procedure for training the SAE-based RUL prediction model are given in Section 2.3.1 of this paper.

Step 3: step 2 is repeated until the grid search has been completed. The resulting optimal hyperparameters are output.
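The three steps above can be sketched as an exhaustive loop over the grid. Two assumptions are made that the paper does not state explicitly: the equal-size subsets of step 2 are read as folds for k-fold cross-validation, and the model training is hidden behind a caller-supplied `evaluate` function. The hyperparameter names, candidate values, and the toy error function are hypothetical.

```python
from itertools import product

def k_fold_indices(n, k):
    """Step 2: split range(n) into k contiguous folds of (nearly) equal size."""
    size, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + size + (1 if i < extra else 0)
        folds.append(list(range(start, stop)))
        start = stop
    return folds

def grid_search(grid, n_samples, k, evaluate):
    """Try every grid point and return the one with the lowest mean error.

    grid: dict mapping a hyperparameter name to its candidate values (step 1).
    evaluate(params, train_idx, val_idx): validation error, lower is better.
    """
    names = list(grid)
    folds = k_fold_indices(n_samples, k)
    best_params, best_err = None, float("inf")
    for values in product(*(grid[name] for name in names)):  # step 3 loop
        params = dict(zip(names, values))
        errs = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            errs.append(evaluate(params, train_idx, val_idx))
        err = sum(errs) / k
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

def toy_evaluate(params, train_idx, val_idx):
    """Stand-in for training the SAE and scoring it on the validation fold."""
    return (params["hidden1"] - 20) ** 2 + (params["lr"] - 0.1) ** 2

grid = {"hidden1": [10, 20, 30], "lr": [0.01, 0.1, 1.0]}
best, err = grid_search(grid, n_samples=100, k=5, evaluate=toy_evaluate)
print(best, err)
```

Each grid point is evaluated independently of the others, which is what makes the search straightforward to parallelize.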

Figure 7 shows the process of the method. Once an SAE with the optimal parameters has been obtained, the degradation data of the engines to be predicted are input into the trained model, and the RUL of each engine is obtained.

3. Case Study

3.1. Engine Data Description

The challenge datasets used for the prognostics challenge competition at the 2008 PHM International Conference consist of multiple multivariate time series, which were collected via a dynamical simulation of an engine system. The model simulated various degradation scenarios in any of the five rotating components of the simulated engine (fan, LPC, HPC, HPT, and LPT), and the connections among the engine modules in the simulation are shown in Figure 8. Each engine begins in normal operation, and degradation appears at some cycle of the simulation. The degradation data for each engine are recorded until the engine fails. The simulation model yields 218 engine datasets, defined as unit 1 through unit 218, with different failure times measured by the number of operating cycles for the same engine system.

The complete dataset for each cycle of each engine unit consists of the unit ID, the operating cycle index, the operational regime settings, and typical sensor measurements. A total of 21 sensors (shown in Table 1) are installed in different components of the aircraft engine, and the 21 sensor signals are obtained under the R1 operation regime shown in Table 2. In this study, sensor data were collected from 200 aircraft engines injected with the HPC degradation fault mode. The dataset considered in this study consists of three files, which include degradation data for 100 training and 100 testing units and the remaining useful life of the 100 testing units. Each training unit runs to breakdown, and each testing unit stops running at some time before it breaks down. Previous studies indicate that the dataset is authoritative and accurate [54–56].

3.2. Results and Discussion
3.2.1. Data Preprocessing

In the process of engine performance degradation, the performance data from the sensors gradually change over time, and the data indirectly reflect the degradation tendency of the engine’s performance. However, different parameters differ in their sensitivity to degradation. Figure 9 shows the degradation tendency of the 21 performance parameters. According to Figure 9, data from seven of the sensors (1, 5, 6, 10, 16, 18, and 19) exhibit no tendency, so only the sensitivities of the remaining parameters to engine performance degradation are analyzed. The results for the 14 performance parameters for which the sensitivity analysis is conducted are shown in Table 3. To reduce the computational complexity, the data from the first six sensors in the sensitivity ranking (4, 7, 8, 11, 12, and 15) are selected. Based on a survey and analysis of relevant information about the RUL of engines, parameters T24 and T30 are also chosen as objects of study. Finally, eight performance parameters (2, 3, 4, 7, 8, 11, 12, and 15) are chosen for predicting the RUL of the aircraft engine [57–59].

3.2.2. SAE Parameters Optimized Using Grid Searching

The SAE used in this paper has eight hyperparameters: input layer, hidden layer 1, hidden layer 2, output layer, learning rate of SAE, learning rate of NN model, and number of training cycles. Based on the results, the values of the input layer, learning rate of NN model, and number of training cycles are fixed as shown in Table 4, which is a good parameter match. Three hyperparameters then need to be optimized using grid searching. The grid search method of automated hyperparameter selection for the SAE is performed in a defined space with a fixed step size. The proposed parameters of the SAE obtained using the grid search method are shown in Table 5.

3.2.3. Results

With the DNN hyperparameters selected automatically using a grid search, the experimental results show that the method is effective: the first-ranked parameter set achieves an accuracy rate of 83.82% and an acceptable rate of 86.32% (shown in Table 6). For comparison, the first-ranked prediction accuracy in the 2008 PHM data challenge engine life prediction is 84.19% (shown in Table 7), so the two RUL prediction accuracies are quite close. However, the FD005T dataset used in the data challenge contains six types of working condition and 218 training and test sets. Therefore, the comparison between the method proposed in this paper and the 2008 PHM data challenge is only a relative one. The accuracy of the RUL predictions obtained in this paper is acceptable in the field of engine prediction, and the results are satisfactory. Table 8 shows the first seven optimal parameter arrays obtained by the grid search method. Table 6 shows the life prediction results based on the first seven optimal parameters. Table 7 shows the seven most accurate engine life predictions in the 2008 PHM data challenge.

4. Conclusions

In this paper, a new data-driven approach to engine prognostics is developed based on deep learning, which can capture effective nonlinear features by itself and reduce manual intervention. The SAE, a type of deep learning model, is not only able to capture the tendency of the system to evolve but is also sufficiently robust to noise. The grid search algorithm is used to select the hyperparameters of the SAE automatically. The method of predicting an aircraft engine’s remaining useful life is applied to the 2008 PHM data challenge dataset to demonstrate the effectiveness of the proposed approach. The experimental results, with a satisfactory prediction accuracy and acceptance rate for all the samples, indicate that the method is effective at predicting the RUL of an aircraft engine. The method is also significant for enhancing the safety of aircraft engines and for prognostics and health management of aircraft engines to reduce the cost of maintenance.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant nos. 51605014, 51105019, and 51575021), the Aviation Science Fund (Grant no. 20163351018), the Technology Foundation Program of National Defense (Grant no. Z132013B002), and the Fundamental Research Funds for the Central Universities (Grant no. YWF-18-BJ-Y-159).