Abstract
The prediction of the aero-engine performance parameters is very important for aero-engine condition monitoring and fault diagnosis. In this paper, the chaotic phase space of engine exhaust temperature (EGT) time series which come from actual air-borne ACARS data is reconstructed through selecting some suitable nearby points. The partial least square (PLS) based on the cubic spline function or the kernel function transformation is adopted to obtain chaotic predictive function of EGT series. The experiment results indicate that the proposed PLS chaotic prediction algorithm based on biweight kernel function transformation has significant advantage in overcoming multicollinearity of the independent variables and solve the stability of regression model. Our predictive NMSE is 16.5 percent less than that of the traditional linear least squares (OLS) method and 10.38 percent less than that of the linear PLS approach. At the same time, the forecast error is less than that of nonlinear PLS algorithm through bootstrap test screening.
1. Introduction
Aeroengine condition monitoring is applied as a better way to estimate aero-engine condition and reliability amongst most of the commercial airlines [1, 2]. Predicting and detecting abnormal behavior of main aeroengine performance parameters is of great importance in the aeroengine condition monitoring. With the development of modern industry, one typical aeroengine becomes a more and more integrative and complex system, of which working condition and fault are very difficult to predict and judge, and the influence and harm caused by system faults are also more serious than ever. Therefore, there will be a growing demand for effective prediction methods of long-term trend of aeroengine system to predict abnormal behavior of significant performance parameters and potential faults. Then we can take preventive maintenance measure to avoid faults occurrence, shorten aircraft down time, thereby reducing maintenance cost and improving aeroengine life.
The aeroengine system is a complex system with nonlinear and dynamic chaos action, so it is difficult to establish an exact analytic mathematical model of performance parameters. As a result, analysis usually depends on time series obtained from observation. But in practical monitoring the availability of obtaining characteristic data is very limited, and one fault usually shows various changes in performance parameters, and a same change may result from different faults [3, 4]. However, extracting multiple fault characteristic of parameters usually brings complexity and redundancy, which may bring difficulty for the following prediction. The theory of chaotic prediction is a mathematics method to deal with inexactness, uncertainty, and incompletion, which is remarkable efficiency in aspects of data reduction and feature extraction.
Studies on chaos prediction can be traced back to Parker and Chua (1989) [5]. More sophisticated approaches of nonlinear chaotic time series prediction mainly rely on neural network [6], support vector theory [7], fuzzy rules [8], a simple least square algorithm [9]. In recent years, considerable interest has been aroused in the study of nonlinear chaos prediction with the actual observation of aeroengine dynamic system [10, 11]. These papers summarize the prediction model of nonlinear chaos system based on ordinary least squares (OLS); however, there are some practical problems in those models such as serious multi-collinearity and poor robustness of regression function.
Partial least squares (PLS) can overcome information overlaps caused by multi-collinearity since it adopts canonical correlation analysis, which improves accuracy of nonlinear system modeling [12]. What is more, PLS adopts cross validity method, which rejects redundancy principal component and enhances regression equations robustness. Hence, in our investigation the designed chaotic algorithm is based on the nonlinear partial least squares (NPLS) regression and the prediction research on the time series.
2. Data Pretreatment
Typical aeroengine performance parameters are exhaust gas temperature (EGT), low-spool rotor speed (NI), high-spool rotor speeds (N2), and fuel flow (WF) [13]. These four measurements are often called the four basic parameters, which can be recorded in the air-borne equipment on most new and old aeroengines. Note that we take the most important performance parameter, EGT, as example to analyze in this paper; the other parameters can be fitted in the same method.
2.1. Similarly Correcting EGT Sequence
According to operating principle of the aeroengine, the data of main performance parameters of the same aeroengine in different external flight conditions such as the external atmosphere total temperature, total pressure, MACH numbers are quite different, so the original engine exhaust temperature EGT data usually cannot be directly used for comparison analysis. To solve this problem, similar theory is adopted to eliminate the influence of external conditions on the aeroengine EGT sequence [14]. where EGTobs is the original EGT data, EGTcor is the corrected value of , is the total air temperature.
2.2. Eliminating Outliers According to Layida Criterion
If outliers exist in the data, they will seriously affect the prediction accuracy of an algorithm. Layida criterion is the most commonly used for discrimination outliers, of which the precondition is that there is sufficient data [15]. For time series , if is large enough (normally ), and only random disturbance exists in all data. Assume that the random time series is following a normal distribution whose mean is μ and deviation is , so the probability that the deviation falls in ranges is about 0.3%. According to Layida criterion, if the absolute deviation of a time series data is larger than , then can be judged as an outlier and should be removed.
Let the number of sample be , the observed value of time series is , the sample mean and standard deviation is and , respectively. If the sample point meets , then is an outlier which will be eliminated. and should be computed again after all the outliers are removed.
3. Chaos Prediction Model Based on Nonlinear Partial Least Squares (PLS) Regression
Suppose observed time series is , the chaotic phase space is reconstructed by [16]. where is the delay time, m is the embedding dimension and is the length of the time series.
3.1. Characterizing Chaos through Lyapunov Exponents
Through the years, the existence of chaos has been characterized in time series by means of several methods, among others: analysis of Fourier spectra, fractal power spectra, entropy, fractal dimension, and calculation of Lyapunov exponents. However, several of these methods have proven not to be very efficient [16]. The Lyapunov exponents provide all the information about the local and global complexity of chaos, therefore the measurement of Lyapunov exponents has been an effective method to judge whether a time series is caused by a chaotic dynamical system.
The Lyapunov exponents are the average rates of expansion () or contraction () near a limit set of a dynamical system [16]. In fact, the LE is a quantitative measure of the divergence of several trajectories in the system. There is one LE for each direction in the d-dimensional phase space where the system is embedded. The variable d represents the Lyapunov dimension or phase space dimension. The LE does not change when the initial conditions of a trajectory are modified or when some perturbation occurs in the system. If at least one LE is positive, the system presents chaotic motion [17]; hence if chaos exists in a dynamical system, the largest Lyapunov exponent must be positive.
We will calculate the largest Lyapunov exponents according to the algorithm offered by Wolf et al. (1985) [17]; the numerical results are shown in Table 1.
3.2. Chaos Prediction Model
Let state vector at time be , we obtain the nearest neighbor points by calculating the Euclidean distance between the target points and any one of reconstruction vectors [18], where
Let be the sample matrix of reconstructed variable,
Let be the prediction variable, then the nonlinear chaos prediction model is described as where is the column of , is the length of prediction step.
In practice, the observed data sets of independent variable and dependent variable can usually be obtained. However, the model form of their relations is unknown especially in case of higher dimensions, and the problem becomes more complicated when the relationship between the dependent variable and independent variables is nonlinear. Usually (3.4) is fitted by additive model or multiplicative model. For the sake of computation convenience, we consider the addition model of each independent variable; that is,
At present, the nonlinear relationship between the original variable usually is into a quasi-linear relationship, after then, the prediction model is established by an effective linear theory method. Spline function and kernel function are commonly used in the transformation of the basic functions.
Cubic Spline Function Transformation
Let be the cubic spline fitting function, then, where where are the model parameters to be determined; are range points, segment length and section number are divided on the variable , respectively.
The minimum observed value of variables is recorded as min() and the maximum as max(), then the nonlinear prediction function relationship between independent variables and dependent variable can be transformed as the following equation:
Equations (3.6)–(3.8) show that the relationship between and is linear, and the relationship between and is nonlinear, so the chaos prediction model with cubic spline function transformation is a quasi-linear regression model (see in (3.10)).
Kernel Function Transformatio
The principle of Kernel function transformation is same as spline function transformation; that is, a kernel function transformation based on one dimensional nonlinear function , and the cubic spline function is replaced by basic function . Thus, the chaos prediction model with kernel function transformation is as follows [19]:
The commonly used kernel functions include uniform kernel function, Epanechnikov kernel function, biweight kernel function, tri-weight kernel function and Gaussian kernel function.
3.3. Nonlinear Partial Least Squares Regression (NPLS)
However, there exist some problems when establishing the above quasi-linear regression model based on ordinary least squares method, because the dimension of regression model will increase significantly after a function change, which may lead to the result that the number of sample points is less than the number of transformed variables; on the other hand, the data of reconstructed variable are come from the same series, and the nonlinear correlation exists between each other. So it is not appropriate to employ the ordinary least squares method to estimate the model parameters when the multicollinear relationship among the new transformed independent variables is found. One of the effective methods to solve this problem is to use partial least squares method (PLS) to estimate parameter of the predictive model.
In this paper, a modified chaos prediction approach based on nonlinear PLS regression with basic function transformation is developed. The cubic spline function and various kernel function transformation are used in our model, and the comparative analysis between them is presented in Section 5.
3.4. Prediction Evaluation
Normally, prediction accuracy is evaluated by the mean squared error MSE, where is predict value, is observed value. However MSE is concerned with the order of the magnitude of the observed data. In this paper, the normalized mean square error NMSE is used as the evaluation criterion of prediction. where is sample variance of the prediction sequence. Our investigation adopts the mean of 20 predicted values NMSE in order to overcome contingency.
4. Algorithm and Algorithm Flowchart
Step 1. The observational data of aeroengine performance parameter EGT are collected and similarly corrected, and then the outliers are removed according to Layida criterion.
Step 2. For the selected embedding dimension m and delay time ), the chaos phase space on pretreated EGT time series is reconstructed, and the appropriate nearby points are selected to construct independent variable matrix and dependent variable vector y.
Step 3. Choose section number M ; each row vector in independent variable matrix is transformed to by various basis function transformation such as (3.8), then
Step 4. The new independent variable and dependent variable are standardized and marked as , so the prediction model is described as where ) are the parameter of regression model (4.2), is the stochastic error and follows the normal distribution with zero mean and the same variance.
Step 5. The partial least squares (PLS) is applied to build regression model (4.2). The number of appropriate PLS components is extracted by effective cross-method [20]; estimate the regression coefficients , and obtain the regression coefficients , of original model (3.5) by returning .
Step 6. If estimation of prediction model for all the combination of are completed, then go to step 7, otherwise go to step 3.
Step 7. Calculate the predicted value according to prediction function (3.6) and calculate the corresponding predicted NMSE.
Step 8. If the calculation of the combination of all the embedding dimension and delay time is completed, then go to step 9, otherwise go to step 2.
Step 9. Find out the optimal prediction model with the smallest NMSE, and obtain the corresponding dimension , delay time and section number .
Algorithm flow chart is shown in Figure 1.
5. Results and Analysis
The in-the-wing ACARS data of four PW4077D aeroengines have been collected for one year, and about 1400 EGT data of each engine were selected to form a time series used in numerical example analysis.
The maxim value of embedding dimension , the delay time , and the segment number in the above algorithm are estimated after a large amount of experiment. In our research, the range of is among 1–30, is 1–20, is 1–8, respectively. The cubic spline function and five kernel function introduced in Section 2 are employed for transformation, and we minimize chaotic prediction NMSE based on PLS algorithm.
The numerical results of the first engine are shown in Table 1, in which line 1 to line 6 show the prediction results based on various NPLS, line 7 to line 8 show the prediction results based on the common PLS and OLS.
As can be seen in Table 1, all the largest Lyapunov exponents are positive and it is clarified that the nonlinear and chaotic natures exist in EGT sequence. The first seven NMSE calculated based on PLS are all less than that based on OLS, which indicates that there is an obvious advantage of PLS over OLS in fitting chaotic prediction models. The NMSE of PLS based on various nonlinear transform methods is even less than that of free transformed PLS, and it indicates that there is a greater advantage of chaotic prediction based on PLS in forecasting aeroengine EGT. PLS prediction algorithm based on biweight Kernel transformation is the optimal chaotic prediction algorithm, estimated optimal embedding dimension is 8, the delay time is 19, and the number of nonlinear transformation section is 2.
PLS regression model includes all of the original selected variables, and there is multi-collinearity in the independent variables, which can affect the prediction accuracy. The model parameters estimated based on PLS is of a pretty complex nonlinear character, so the independent variables cannot be selected by the parameter test of linear methods. In this paper, the nonparametric statistical method (Bootstrap [20]) is adopted to select the independent variables transformed with kernel function and Cubic spline function, and to find the minimum NMSE; the optimal results of the first engine are shown in Table 2.
Table 2 shows the predictive results based on various NPLS algorithm after selecting independent variable by Bootstrap method. Only the optimal predictive NMSE based on triweight Kernel function transformation decreases, and the minimum NMSE in Table 2 is larger than that in Table 1; it indicates that better NMSE cannot be achieved by Bootstrap test for selecting independent variables.
Figure 2 shows the comparison of the prediction values based on OLS, PLS, and biweight Kernel transformation chaos algorithm with the original value. The PLS algorithm based on biweight Kernel function transformation has the optimal NMSE in Figure 2. However, not all of the predictive value has the minimum forecast NMSE.
The deviation between the predictive value and original value based on the above three algorithms are shown in Figure 3. The fluctuations of the deviations curve based on biweight Kernel method with the optimal NMSE is the smallest, and larger predicted error is not appeared in its deviation sequence.
Through the numerical results of the first engine we also find that the search range of designed algorithm is enormous and it takes a long time, so we hope to further reduce search range and the algorithm complexity. The percentages of the number of NMSE in Table 1 which falls in are statistically analyzed, and the results are shown in Figures 4, 5 and 6.
As can be seen from Figure 4 to Figure 6, the percentage of NMSE falling in is lower for different value of , τ, M, which indicates that the probability of obtaining suboptimal solution by our algorithm is very low, and also illustrates that the selection of variables is of significant importance to improve the prediction accuracy. From Figure 4 we can see that when the dimension of predicted model is over 15, there is no optimal NMSE which falls within the range above mentioned, So the range of in the algorithm can be selected among ranging from 1 to 15. Figures 5 and 6 also show that the predicted NMSE distribution of various delay time and section number are more dispersed, so the search range of and cannot be minimized.
The brief computational results of the other three engines are as in Table 3.
From the above calculation results we can see that the algorithm based on various nonlinear PLS (Cubic Spline Interpolation, Uniform Kernel, Epanechnikov Kernel, biweight Kernel, Tri-weight Kernel, and Gaussian Kernel) is superior to that based on PLS, and the algorithm based on PLS is superior to that based on OLS. The results in Tables 1 and 3 show that the predictive results derived from the algorithm based on biweight Kernel are the optimal among the four aeroengines. And there are no significant difference between the first aeroengines and the second aeroengines in the same aircraft in predictive NMSE, so we can conclude that the proposed algorithm is stable for predicting the aeroengine parameter EGT.
Tables 1 and 3 also show that the optimal values of the embedding dimension (m), delay time , subset numbers searched by various NPLS algorithm are approximately the same, but they are quite different among the four aeroengines.
6. Conclusion
The chaos prediction model based on NPLS through Kernel function or Cubic Spline function transformation is developed and is applied in predicting the aeroengine performance parameters EGT. Numerical results show that the proposed model is able to effectively predict the performance parameter EGT with a higher degree of accuracy. The optimal embedding dimension is 8, the delay time is 19, and the number of segments is 2. NPLS chaotic prediction algorithm based on biweight Kernel function transformation is the optimal EGT prediction algorithm because of the smallest predictive NMSE. Our prediction NMSE is 16.5 percent less than that of the traditional linear least squares (OLS) method and 10.38 percent less than that of the linear PLS approach. The NPLS algorithm of chaotic prediction with selection variables through Bootstrap test fails to make NMSE further reduced. And we find that reducing calculation time complexity by lessening search range of embedding dimension will not affect the prediction accuracy of EGT. It is proved that the proposed algorithm is robust by the predictive results from four aeroengines.
In particular, the conservative baselines of main aeroengine performance parameters are given in ECM program provided by jet engine manufacturers [14]. Set the baseline value of EGT be EGTbase, the deviation between EGTcor and EGTbase can be defined as
The trend plot of the deviation () sequence in continuous flights is routinely used by airlines for the performance monitoring and diagnostics [13]. On the basis of our prediction algorithm, the EGTcor of upcoming flights can be predicted, and then a layout for trending of the deviation of main performance parameters for upcoming flights can be provided. Experienced power maintainer or engineer look at a given trend shift in the measurement deviations can identify significant behavior disparities, evaluate the aeroengine condition, and isolate the faulty module or component using the ECM trend plot reports and related PW4000 fingerprint.
In summary, accurately forecasting the aeroengine performance parameters is of great importance in the aeroengine condition monitoring. The anomalies of the aeroengine performance parameters can be detected in time by the predictive value calculated from our algorithm, then maintenance measures are taken early to prevent the aeroengine from failure effectively, which assists to reduce aircraft down time, and improve the reliability of the aeroengine.
Acknowledgment
This work is supported by the Fundamental Research Funds for the Central Universities under Grant no. ZXH2012 K005.