Abstract

Although very uncommon, the sequential failures of all aircraft Pitot tubes, with the consequent loss of signals for all the dynamic parameters from the Air Data System, have been found to be the cause of a number of catastrophic accidents in aviation history. This paper proposes a robust data-driven method to detect faulty measurements of aircraft airspeed, angle of attack, and angle of sideslip. The approach first consists of selecting suitable sets of model regressors, which are then used as inputs of neural network-based estimators operating online for failure detection. The setup of the proposed fault detection method is based on the statistical analysis of the residual signals in fault-free conditions, which, in turn, allows the tuning of a pair of floating limiter detectors that act as time-varying fault detection thresholds with the objective of reducing both the false alarm rate and the detection delay. The proposed approach has been validated using real flight data by injecting artificial ramp and hard failures on the above sensors. The results confirm the capabilities of the proposed scheme, showing accurate detection with a desirably low false alarm rate when compared with an equivalent scheme using conventional "a priori set" fixed detection thresholds. The achieved performance improvement consists mainly of a substantial reduction of the detection time while keeping desirably low false alarm rates.

1. Introduction

The Air Data System (ADS) [1] is a critical component of the conventional suite of sensors for both manned and unmanned aircraft. The ADS provides direct measurements of the aircraft true airspeed; typically connected with the ADS are the vanes for measuring the important aerodynamic angles known as the angle of attack (α) and the angle of sideslip (β). Since the Pitot probes and the α- and β-vanes are installed on the fuselage and/or wings, they are exposed to external atmospheric agents; in general, these devices are robust to even extreme weather conditions. However, there are very specific weather conditions which can lead to peculiar ice crystals obstructing the tiny conduits of the ADS, thus inducing faulty sensor measurements. Incorrect measurements of airspeed could lead to unrecoverable flight conditions, such as in the cases of Air France Flight 447 [2] and the NASA X-31 experimental aircraft [3]. Additional causes of crashes included the erroneous covering of the Pitot tubes (AeroPerú 757) [4] and the formation of insect nests inside the static taps (Birgenair Flight 301) [5] as well as the icing of the angle-of-attack vane (XL Airways Germany Flight 888T) [6]. At present, the conventional approach to provide fault tolerance for flight sensors is based on Hardware Redundancy (HR) [7], which basically consists in the installation of multiple sensors (initially 4, later reduced to 3) measuring the same parameter. These sensors are continuously monitored to check their status, and their values are continuously compared. The use of Built-In-Testing (BIT) voting schemes allows the detection and the isolation of a faulty sensor. An alternative approach is given by Analytical Redundancy (AnR) [8]. AnR basically involves the use of virtual sensors, such as conventional state estimators, neural networks, Kalman Filters, special cases of Kalman Filters (such as Extended Kalman Filters and Unscented Kalman Filters), and/or other predictive models, to provide alternative estimates of the parameter from the faulty sensor. Recently, an AnR scheme based on machine learning techniques has been proposed in [9] for the failure detection and correction of the airspeed sensor of an aircraft.

Through data from other functional sensors (measuring parameters highly correlated with the parameter associated with the faulty sensor) used as inputs, the virtual sensor can estimate the target measurement, which, when compared with the data provided by the sensor under scrutiny, produces the residual signal. The residual signal is clearly very important for fault detection purposes; in fact, an alarm status can be declared when the residual signal exceeds some detection thresholds. It should be emphasized that the approach proposed in this paper is completely data-driven [10, 11] and does not require any specific aircraft-dependent dynamic mathematical model. In summary, the innovative aspects of this research effort are the complete independence from any model-based requirement along with the introduction and the validation of "ad hoc" designed adaptive thresholds (floating limiters) that are applied to the whitened and filtered residuals with the goal of improving the performance in terms of fault detectability, fault detection delay, and false alarm rate when compared with conventional fixed-threshold schemes [12–15].

Recently, the authors of this paper have also proposed in [16] another data-driven approach for the design of robust fault detection filters for aircraft air data sensors. In [16], the robustness of the scheme is achieved through the adoption of parametric interval prediction models that are derived by the solution of a convex optimization problem. Instead, in the present study, the robustness of the failure detection scheme was achieved through time-varying fault detection thresholds that are derived online by a simple floating limiter filter that is based on the windowed mean and variance of the whitened residual.

The paper is organized as follows. Section 2 describes the identification model for the neural networks used in this study. Section 3 briefly introduces the P92 Tecnam aircraft from which the flight data were recorded. Sections 4, 5, and 6 outline the data-driven modelling of the true-airspeed, angle-of-attack, and angle-of-sideslip sensors, covering regressor selection and the multilayer perceptron estimators. Sections 7, 8, and 9 are dedicated to failure modeling, the analysis and filtering of the residuals, and threshold computation, respectively. Finally, Sections 10 and 11 provide the results along with an assessment of the performance of the proposed scheme in terms of false alarm rate and detection delay compared with a conventional fixed-threshold fault detection system.

2. Prediction Model: Feedforward Neural Network

This section describes the neural network-based model used for the estimation of the airspeed V_t, the angle of attack α, and the angle of sideslip β as functions of other available measurements. Analytical Redundancy approaches are based on the use of online estimators that are supposed to provide an estimation of the sensor measurement. Several types of estimators can be found in the literature, such as Kalman Filters [17], Unscented Kalman Filters [18], Extended Kalman Filters [19], FIR filter estimators [20], and adaptive and nonadaptive neural networks [21, 22]. Specifically, in this effort, a nonautoregressive (NR) model [23–25] has been used. The model is essentially a nonlinear time series model, also known as a nonlinear regression model [26], formulated as follows:

y(k) = f(x(k)) + ε(k)   (1)

where y(k) is the (target) signal to be estimated, x(k) is the final set of regressors (see Sections 4 and 5), and ε(k) is a modelling error. Note that the NR model does not depend on previous values of y; this guarantees that the estimation is not affected by the occurrence of a fault on y. Thus, the NR model is able to provide a reliable long-term (multistep ahead) estimation independently of the occurrence of the fault; on the contrary, the estimation of an adaptive model (i.e., a NARX model) is influenced by the previous occurrence of a fault on y.

3. P92 Aircraft and Flight Data

The proposed scheme has been validated using flight data from the semiautonomous P92 Tecnam aircraft shown in Figure 1.

The P92 is a general aviation single-engine light aircraft with a classic high-wing configuration [27]. During the test flights, the aircraft was remotely controlled through a data link from a ground station. The on-board pilot was operative only during the take-off and landing phases. The nominal aircraft mass is 600 kg (1324 lb); the propulsion is provided by a Rotax 912 ULS, 74 kW (99 hp), with a two-blade fixed-pitch propeller, capable of providing a max cruise speed of 219 km/h (118 kts) and an operational ceiling of 4267 m (14,000 ft). A total of 8 flights were recorded for approx. 3.5 hours of flight time. The available flight data were divided into two groups, that is, training data (used for the goal of training a pseudo-online learning neural network used as the AnR-based virtual sensor) and validation data (used for the assessment of the performance of the estimation scheme). Since the recorded flight data were not completely homogenous, a different number of flights were used for the robust estimation of the true airspeed V_t and of the aerodynamic angles α and β.

For the V_t and β analysis, two flights were used for training purposes and five flights for validation purposes. Instead, for the α analysis, two flights were used for training purposes and six flights for validation purposes. Data statistics for all the flights are shown in Tables 1(a), 1(b), and 1(c).

For all flights (both training and validation), only the portion of the flight at "cruise" altitude was considered. In fact, the take-off and the initial climb, as well as the descent and the landing, were not considered since they are associated with specific aerodynamic configurations (including, for example, the use of flaps and/or high angle-of-attack conditions). This decision was made because not enough data were available for training the neural estimators in these conditions, since only a few minutes are available for each flight. Therefore, only flight data recorded above a minimum cruise altitude were considered for the purpose of this study. The flight parameters were recorded with a sampling time of 0.1 s.

4. Data-Driven Modeling of the Aircraft True Airspeed, Angle of Attack, and Angle of Sideslip

The true airspeed (V_t), the angle of attack (α), and the angle of sideslip (β) are all estimated as functions of a combination of measurements from the other available sensors (excluding, of course, the signal being estimated from its own set of regressors). Furthermore, the scheme does not require any information, data, or functional relationships such as the aircraft equations of motion. In other words, the approach can be applied without any loss of generality to any type of aircraft. Thus, the proposed approach can be classified as "data-driven" instead of "model-driven." It should be clarified that in all cases the GPS velocity was not considered. In fact, the GPS velocity and the true airspeed are strongly related through the relationship V_GPS = V_t + V_wind, where V_wind is the wind velocity component along the flight path. Therefore, a wind gust could appear as an additive fault of the same amplitude.

Thus, V_t, α, and β are estimated as functions of other measurements using the model structure described in Section 2. The initial vector of regressors for V_t, α, and β has been set by simply selecting the signals that characterize the aircraft kinematic and dynamic response as well as by considering the correlation coefficient with the target signal. The resulting vector of candidate regressors is

x_0 = [φ, θ, ψ, α, β, p, q, r, n_x, n_y, n_z, δ_e, δ_a, δ_r, δ_th, h, T, V_t, V_z, χ, ψ_w]   (2)

where φ, θ, and ψ are the aircraft roll, pitch, and yaw angles, respectively; α and β are the angles of attack and sideslip, respectively; p, q, and r are the aircraft angular rates; n_x, n_y, and n_z are the load factors along the body axes; δ_e, δ_a, and δ_r are the elevator, aileron, and rudder deflections, respectively; δ_th is the throttle; h is the altitude; T is the air temperature; V_t is the true aircraft airspeed; V_z is the aircraft climbing rate; χ is the track angle; and ψ_w is the wind direction. Note that in more general cases, additional control surface deflections could be introduced.

To cope with possible nonlinearities and dynamic effects, each vector of potential regressors was augmented with the quadratic terms (only pure quadratic terms were added and not mixed terms), resulting in

x = [x_0, x_0²]   (3)

where x_0² denotes the element-wise squares of the candidate regressors in (2).

5. Regressor Selection

A Matlab-based automated procedure known as Stepwise Selection [28] has been successfully used in this effort for the selection of the regressors. This scheme iteratively adds and removes variables relying on the t-statistics of the estimated regression coefficients. Starting from the initial set of regressors in (3), the Stepwise Selection procedure provided the final sets of selected regressors for the three sensors, as documented in Figures 2(a)–2(c) and Tables 2(a)–2(c); a code-level sketch of the selection logic is given below.
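To make the selection logic concrete, the following Python sketch reproduces a simplified p-value-based add/remove loop of the same kind; the entry/exit significance levels, the function name, and the use of statsmodels are illustrative assumptions and do not reproduce the exact settings of the Matlab stepwise routine used in this work.

```python
import numpy as np
import statsmodels.api as sm

def stepwise_select(X, y, names, p_enter=0.05, p_remove=0.10):
    """Greedy stepwise regressor selection driven by coefficient significance.

    X: (N, M) matrix of candidate regressors, y: (N,) target signal.
    p_enter / p_remove are illustrative entry/exit significance levels.
    """
    selected, improved = [], True
    while improved:
        improved = False
        # Forward step: try adding the most significant excluded regressor.
        best_p, best_j = 1.0, None
        for j in (j for j in range(X.shape[1]) if j not in selected):
            fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
            p = fit.pvalues[-1]               # p-value of the candidate term
            if p < best_p:
                best_p, best_j = p, j
        if best_j is not None and best_p < p_enter:
            selected.append(best_j)
            improved = True
        # Backward step: drop the least significant regressor, if any.
        if selected:
            fit = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
            pvals = fit.pvalues[1:]           # skip the intercept
            worst = int(np.argmax(pvals))
            if pvals[worst] > p_remove:
                selected.pop(worst)
                improved = True
    return [names[j] for j in selected]
```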

Specifically, Figures 2(a)–2(c) show the variable selection process performed by the stepwise selection method for the three sensors. The indices of the selected variables are represented by the red dots in the upper section of each figure, while the lower section reports the RMSE prediction error (evaluated on the training data) as a function of the algorithm iteration.

The correlations between each regressor selected by the stepwise process and the target signal are reported, for each of the sensors, in Tables 2(a), 2(b), and 2(c):

6. Multilayer Perceptron Models

The estimation model introduced in Section 2 is given by (1), where the unknown mapping f(·) has to be approximated from the flight data.

For each sensor (V_t, α, and β), the function f(·) was mapped through a neural network architecture. After considering different options, the neural topology selected for this study is the well-known multilayer perceptron with one hidden layer. The specific architecture for each neural approximator is shown in Table 3.
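As an illustration of this mapping step, the sketch below fits a single-hidden-layer multilayer perceptron to the selected regressors; the hidden-layer size, activation, and solver are placeholder choices, since the actual architectures used in the paper are those listed in Table 3.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_virtual_sensor(X_train, y_train, n_hidden=15):
    """Fit a one-hidden-layer MLP mapping the selected regressors to the
    target air data signal (V_t, alpha, or beta); n_hidden is illustrative."""
    model = make_pipeline(
        StandardScaler(),                             # normalize the regressors
        MLPRegressor(hidden_layer_sizes=(n_hidden,),  # single hidden layer
                     activation="tanh",
                     solver="lbfgs",
                     max_iter=2000,
                     random_state=0),
    )
    model.fit(X_train, y_train)
    return model

# Residual on validation data, r(k) = y(k) - y_hat(k):
# model = train_virtual_sensor(X_train, y_train)
# r = y_val - model.predict(X_val)
```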

Table 4 summarizes the statistical results achieved on the training flight data for the residual signals, which are defined as the difference between the measured signal y and its estimation ŷ.

In a recent study [16], the authors compared the performance of neural network models with that of simple linear regression models using the same set of regressors. In that study, it was verified that the mapping performance of the linear regression models is worse than that provided by the neural network models; this was the main motivation for using neural network models in the present study. A possible explanation is that the unknown functions to be identified are not linear in the selected regressors; therefore, the neural estimators outperform the linear regression models.

7. Failure Modeling and Residual Whitening

Failure modeling is a very important task in the overall design of fault tolerant schemes [29–31]. In the literature, several fault modeling approaches can be found, such as bias, freezing, drift, loss of accuracy, and calibration error [32]. In this effort, without any loss of generality, only two specific classes of failures have been considered, that is, step failures and ramp failures, to simulate the injection of hard or soft failures. For each of these cases, the magnitude of the failure was also varied to test the fault sensitivity. Clearly, hard and soft failures provide different challenges in terms of detectability. The residual signal is defined as the difference between the sensor output and the estimate of the same signal, that is,

r(k) = y(k) − ŷ(k)

At nominal (fault-free) conditions, the residual signal r(k) should theoretically approach zero in a statistical sense. Unfortunately, this is not true in practice due to unavoidable system and measurement noise as well as estimation errors. Thus, the goals of the fault detection scheme are to detect the occurrence of the failure as soon as possible after its occurrence and, at the same time, to minimize the false alarm rate. From a detailed statistical analysis, it became evident that the residual signal is characterized by a high level of autocorrelation caused by low-frequency uncertainties and noise. Therefore, in its raw form, the residual signal is not a suitable tool since it contains high levels of colored noise; in fact, the residual signal should be uncorrelated for optimal FD. Therefore, a whitening filter has been designed to remove as much correlation as possible from the residual. For our purposes, it was assumed, as shown in (9), that the correlated sequence r(k) can be modeled as an autoregressive (AR) process of order n [33]:

r(k) = a_1 r(k−1) + a_2 r(k−2) + … + a_n r(k−n) + e(k)   (9)

where a_1, …, a_n are the AR parameters and e(k) is white noise. The term a_1 r(k−1) + … + a_n r(k−n) can be considered as a one-step-ahead prediction of r(k) and e(k) as the corresponding prediction error. The parameters a_i were estimated through the Least Squares method by minimizing the mean squared prediction error on the training data. Then, after the computation of the parameters, e(k) can be simply derived as the output of the linear filter

e(k) = H(z^-1) r(k)

where the discrete-time transfer function can be easily obtained as

H(z^-1) = 1 − a_1 z^-1 − a_2 z^-2 − … − a_n z^-n

The signal e(k) is therefore labeled as the "whitened residual".

The appropriate order n of the whitening filter was experimentally determined by applying, for increasing values of the filter order n, the Ljung-Box test [34] to verify whether the whitened residual still exhibits a significant autocorrelation. Table 5 shows the final values of n selected by this design process.
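A minimal sketch of the whitening-filter design is reported below: the AR coefficients are estimated by least squares on the fault-free training residual, the whitened residual is obtained by filtering with H(z^-1), and the Ljung-Box test is used to check for remaining autocorrelation. The candidate orders, the number of test lags, and the significance level are assumptions; a recent statsmodels version returning a DataFrame is assumed as well.

```python
import numpy as np
from scipy.signal import lfilter
from statsmodels.stats.diagnostic import acorr_ljungbox

def fit_ar_coefficients(r, n):
    """Least-squares fit of r(k) = a_1 r(k-1) + ... + a_n r(k-n) + e(k)."""
    Phi = np.column_stack([r[n - i - 1: len(r) - i - 1] for i in range(n)])
    a, *_ = np.linalg.lstsq(Phi, r[n:], rcond=None)
    return a

def whiten(r, a):
    """Apply H(z^-1) = 1 - a_1 z^-1 - ... - a_n z^-n to obtain e(k)."""
    b = np.concatenate(([1.0], -a))
    return lfilter(b, [1.0], r)

def select_order(r, max_order=20, alpha=0.05):
    """Smallest AR order for which the Ljung-Box test no longer rejects whiteness."""
    for n in range(1, max_order + 1):
        e = whiten(r, fit_ar_coefficients(r, n))
        pval = acorr_ljungbox(e[n:], lags=[20])["lb_pvalue"].iloc[0]
        if pval > alpha:              # no significant autocorrelation left
            return n
    return max_order
```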

A comparison between the residuals before and after the whitening filtering is shown in Figures 3(a), 3(b), 4(a), 4(b), 5(a), and 5(b).

8. Design of the EWMA Filter

Following the design of the whitening filter, the design of the fault detection filters was performed using tools from statistical process control (SPC). In this context, the exponentially weighted moving average (EWMA) filter is a well-known chart used to detect the presence of small shifts in process variables. The EWMA chart tracks an exponentially weighted moving average of all the previous samples, so that the most recent samples are weighted more heavily than the older ones. Even though normality of the distribution is usually the basic assumption in many EWMA charts, this chart has exhibited enough robustness even with non-normally distributed data [35]. The output s(k) of the filter is the output of a first-order linear digital filter taking as input the whitened residual e(k), that is,

s(k) = λ e(k) + (1 − λ) s(k−1)

where λ (0 < λ ≤ 1) is the smoothing parameter.

Clearly, the selection of the appropriate value of the smoothing parameter λ is critical. Larger values of λ are associated with a larger weight of the current input on the current output; conversely, smaller values of λ are associated with larger weights of all the previous input samples on the current output. The output mean and standard deviation can be computed, at stationary conditions, using (14):

μ_s = μ_e,   σ_s = σ_e √(λ / (2 − λ))   (14)

where μ_e and σ_e are the mean and the standard deviation, respectively, of the whitened residual.

A standard, commonly adopted procedure for the selection of λ was not found in the literature. In this effort, the Least Squares method [36] was used, with the goal of finding an optimized value of λ such that the sum of the squared one-step prediction errors (SSE) is minimized.
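To make the tuning procedure explicit, a compact sketch of the EWMA filter and of the SSE-based selection of λ is given below; the grid of candidate λ values is an assumption, while the one-step prediction error criterion follows the description above.

```python
import numpy as np

def ewma(e, lam):
    """First-order EWMA filter: s(k) = lam*e(k) + (1 - lam)*s(k-1)."""
    s = np.empty_like(e, dtype=float)
    s[0] = e[0]
    for k in range(1, len(e)):
        s[k] = lam * e[k] + (1.0 - lam) * s[k - 1]
    return s

def tune_lambda(e, grid=np.linspace(0.01, 1.0, 100)):
    """Return the lambda minimizing the SSE of the one-step prediction error,
    i.e. sum_k (e(k) - s(k-1))^2, computed on the training whitened residual."""
    best_lam, best_sse = None, np.inf
    for lam in grid:
        s = ewma(e, lam)
        sse = np.sum((e[1:] - s[:-1]) ** 2)
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam
```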

Figure 6 shows the SSE as a function of λ for the EWMA signals produced by the whitened residuals originating from the three ADS models, evaluated on the training data (the SSE curves are all shifted so that the minimum value is zero for each of the three filters).

Based on Figure 6, the tuning of the EWMA filter was performed by selecting the value of λ such that the SSE is minimized. Thus, the minimum values of the SSE were obtained with the values of λ reported in Table 6:

9. Data-Driven Threshold Setting

For FD purposes, conventional approaches include the use of fixed thresholds or of time-varying (floating) thresholds. Several studies have been conducted using both approaches [12, 13, 37, 38], typically resulting in a trade-off between them. Fixed thresholds are easier to set since they depend on a smaller number of parameters; on the other hand, fixed thresholds may not allow the detection of small amplitude failures. The setup of floating thresholds is, of course, more complex; however, they provide in general more robustness and better performance in terms of detectability, detection delay, and false alarm rate, provided that they are carefully designed. A typical problem of floating (adaptive) thresholds is an excessively fast adaptation to the failure, which could result in a missed fault detection. In this effort, the floating (adaptive) thresholds [39] are defined as follows:

THR_up(k) = μ_W(k) + k_1 σ_W(k) + k_2,   THR_low(k) = μ_W(k) − k_1 σ_W(k) − k_2

where μ_W(k) and σ_W(k) are the mean value and the standard deviation, respectively, of the EWMA-filtered residual, both computed over a sliding window of length 1800 samples (180 sec), while k_1 and k_2 are design parameters. Once again, a standard procedure for setting k_1 and k_2 could not be found in the literature; therefore, an "ad hoc" sensitivity study was conducted using the training data to find the optimal values with the goal of minimizing false alarms. Specifically, a grid search was performed over the two parameters, with k_1 in the range 0 to 10 (with a 0.5 step increment) and k_2 in the range 0 to 3 (with a 0.05 step increment). For each grid point, the experimental false alarm probability was computed on the training data, retaining only the pairs of values (k_1, k_2) leading to an experimental false alarm probability below the desired level. Next, among these pairs, the best one was selected as the one minimizing the following index, defined as the average half-width of the floating-limiter band:

J = mean over k of [k_1 σ_W(k) + k_2]

Clearly, a small value of J implies a narrow band for the floating limiter, which is positive from a fault detection point of view. Table 7 shows the optimized values of k_1 and k_2 achieved by applying the proposed procedure:

The results of this design are summarized in Figures 7(a)–7(c), where the experimental probability plots derived from the training data are provided.

In each of Figures 7(a)–7(c), three lobes are visible. The central lobe is the probability distribution of the EWMA output, while the blue lobes are the probability distributions of the lower thresholds and the red lobes are the probability distributions of the upper thresholds. To reduce false alarms, the overlap between the probability distribution of the EWMA output and the probability distributions of the thresholds should be as small as possible: if the EWMA output does not significantly exceed the thresholds, high false alarm rates are avoided.
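Under the threshold form reconstructed above (windowed mean plus or minus k_1 times the windowed standard deviation plus k_2), the floating-limiter computation and the grid search can be sketched as follows; the window length of 1800 samples follows the text, while the false-alarm bookkeeping is simplified and the variable names are illustrative.

```python
import numpy as np

def floating_thresholds(s, k1, k2, window=1800):
    """Time-varying thresholds from the windowed mean and std of the EWMA
    output s(k): THR_up/low(k) = mu_W(k) +/- (k1 * sigma_W(k) + k2)."""
    mu = np.empty_like(s, dtype=float)
    sig = np.empty_like(s, dtype=float)
    for k in range(len(s)):
        w = s[max(0, k - window + 1): k + 1]
        mu[k], sig[k] = w.mean(), w.std()
    half_band = k1 * sig + k2
    return mu + half_band, mu - half_band

def grid_search(s_train, pfa_max):
    """Keep the (k1, k2) pairs whose training false-alarm probability stays
    below pfa_max, then pick the pair with the narrowest average band J."""
    best, best_J = None, np.inf
    for k1 in np.arange(0.0, 10.5, 0.5):
        for k2 in np.arange(0.0, 3.05, 0.05):
            up, low = floating_thresholds(s_train, k1, k2)
            pfa = np.mean((s_train > up) | (s_train < low))
            J = np.mean(up - low) / 2.0       # average half-band width
            if pfa <= pfa_max and J < best_J:
                best, best_J = (k1, k2), J
    return best
```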

10. Results

Fault detection during the validation tests was performed using 5 flights for the true airspeed (V_t), 6 flights for the angle of attack (α), and 5 flights for the angle of sideslip (β). The testing was conducted in simulated real time using a Simulink model. For each flight, an artificial additive fault was injected at a prescribed time instant, with the detectors set using the values of λ, k_1, and k_2 reported in Tables 6 and 7. Without any loss of generality, two failure types were considered, that is, a "fast" step and a "slow" ramp (the duration of the ramp is 10 sec; afterwards, the amplitude remains constant at its final value; in this case, by the magnitude (or "size") of the fault we conventionally mean the steady-state value reached by the fault signal at the end of the ramp), with the modeling parameters shown in Table 8.
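The two additive fault profiles of Table 8 can be generated as in the short sketch below; the 10 s ramp duration follows the text, while the assumed sampling time and the magnitudes are illustrative.

```python
import numpy as np

def step_fault(n_samples, k_fault, magnitude):
    """Hard failure: additive bias of constant magnitude from sample k_fault on."""
    fault = np.zeros(n_samples)
    fault[k_fault:] = magnitude
    return fault

def ramp_fault(n_samples, k_fault, magnitude, ramp_s=10.0, dt=0.1):
    """Soft failure: linear ramp lasting ramp_s seconds, then constant at
    'magnitude' (the steady-state value, i.e. the fault "size")."""
    fault = np.zeros(n_samples)
    ramp_end = min(k_fault + int(ramp_s / dt), n_samples)
    fault[k_fault:ramp_end] = np.linspace(0.0, magnitude, ramp_end - k_fault)
    fault[ramp_end:] = magnitude
    return fault

# The fault is injected additively on the monitored sensor output:
# y_faulty = y_measured + ramp_fault(len(y_measured), k_inject, size)
```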

A critical parameter for an FD scheme is the number of false alarms. Clearly, a lower false alarm level implies higher robustness against system and measurement noise. The evaluation of the FD scheme was conducted in two steps. In the first step, the scheme was tested on every validation flight in fault-free conditions (without injecting any fault); since false alarms are independent of the fault injection, the false alarm rate was evaluated on the whole length of the flight assuming no fault. During the second step, a fault was injected to evaluate the detection delay. Tables 9(a), 9(b), and 9(c) show, for the three sensors, the number of false alarms in terms of number of samples along with the "longest continuous stay out of thresholds" time in seconds.

Other critical indexes for assessing the performance of a FD scheme are the fault detection delay and the minimum amplitude detectable fault. These indexes are defined as follows:
(i) Fault Detection Delay. This is the time elapsed from the injection of the fault to the moment when the detector detects a "true fault."
(ii) Minimum Amplitude Detectable Fault. This is the minimum amplitude of the fault that produces the detection of a "true fault."

In this study, a fault detection is considered a “true fault” if, considering an observation period of 60 s following the failure, the fault detection thresholds are crossed for a time duration longer than 4 s during the observation period; otherwise, the detection is considered not reliable and the true fault is not declared. This logic implies that the minimum decision time for a fault declaration is 60 seconds.
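A sketch of this decision logic is given below; the 4 s excess time is interpreted here as cumulative time outside the thresholds within the 60 s window, and the sampling time is an assumption.

```python
import numpy as np

def evaluate_detection(s, up, low, k_inject, dt=0.1,
                       obs_window_s=60.0, min_excess_s=4.0):
    """Declare a 'true fault' and measure the detection delay.

    A true fault is declared if, within the observation window following the
    injection, the EWMA output stays outside the floating thresholds for more
    than min_excess_s seconds (cumulative). The delay is measured from the
    injection instant to the first threshold crossing.
    """
    n_obs = int(obs_window_s / dt)
    win = slice(k_inject, min(k_inject + n_obs, len(s)))
    out = (s[win] > up[win]) | (s[win] < low[win])
    true_fault = out.sum() * dt > min_excess_s
    delay = float(np.argmax(out)) * dt if true_fault else None
    return true_fault, delay
```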

Tables 10(a), 10(b), and 10(c) show the performance for V_t, α, and β in terms of detection delay, while Figures 8(a), 8(b), 9(a), 9(b), 10(a), and 10(b) show the threshold evolution during the injection of the fault.

The minimum amplitude detectable fault was computed by incrementally decreasing the amplitude of the fault injected until it was no longer possible to declare a “true fault” within the predefined observation period.
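A simple downward search implements this computation, as sketched below; the amplitude decrement and the run_detection wrapper (which would re-run the whole injection, estimation, and threshold pipeline) are hypothetical placeholders.

```python
def minimum_detectable_amplitude(run_detection, start_amp, step):
    """Decrease the injected fault amplitude until a 'true fault' can no longer
    be declared; run_detection(amp) -> bool wraps the full FD pipeline."""
    amp, last_detected = start_amp, None
    while amp > 0 and run_detection(amp):
        last_detected = amp          # smallest amplitude detected so far
        amp -= step
    return last_detected
```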

The results for the minimum amplitude detectable faults are shown in Table 11:

Tables 12(a), 12(b), and 12(c) summarize the performance of the FD scheme for the minimum detectable fault amplitude along with the associated fault detection delay.

11. Comparison between Adaptive- and Fixed-Threshold Approaches

As discussed in the literature, a very interesting area of research is the performance comparison, in terms of specific FD parameters, between "fixed" and "adaptive" thresholds. Considering the fixed-threshold computation approach discussed in [37] as the baseline, in this study the comparison was performed using the same training and validation data for both approaches, reporting the performance in terms of false alarms and detection delay. For the purpose of being completely objective and conducting a fair comparison, the fixed thresholds have been computed starting from the residual signals produced by the same neural estimators and processed with the same whitening filters to obtain the whitened residual. The difference between the two approaches starts after the EWMA filtering (designed with the same parameters). For the fixed-threshold approach, the EWMA output was processed to find optimized values for the upper and lower control limits (UCL and LCL). Specifically, during the training, the UCL and the LCL were derived from the data depending on the desired probability of false alarm P_FA, where P_FA is defined as the probability that the EWMA output exceeds the detection thresholds in a fault-free condition. Defining F(·) as the experimental cumulative distribution function of the fault-free EWMA output, the fault detection thresholds are defined as follows:

LCL = F^-1(P_FA / 2),   UCL = F^-1(1 − P_FA / 2)
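For the baseline, the fixed UCL/LCL can therefore be taken as quantiles of the empirical distribution of the fault-free EWMA output on the training data, as sketched below; the symmetric split of the desired false alarm probability between the two tails is an assumption.

```python
import numpy as np

def fixed_thresholds(s_train, pfa):
    """UCL and LCL such that the fault-free training EWMA output falls outside
    them with total probability pfa (split equally between the two tails)."""
    lcl = np.quantile(s_train, pfa / 2.0)
    ucl = np.quantile(s_train, 1.0 - pfa / 2.0)
    return ucl, lcl
```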

P_FA was set to the same value in both approaches. Also, the validation tests were conducted on the same flights used in Section 10, using the same fault amplitude and the same injection time for the failure. Tables 13, 14, and 15 show the validation results in terms of false alarms and detection delay:

As far as soft failures are concerned, the detection delay analysis was performed using the minimum amplitude detectable faults determined in Section 10 and reported in Table 11. The achieved results are shown in Tables 15(a), 15(b), and 15(c):

Analyzing the results, it is evident that, in the case of hard failures, detecting faults with the adaptive limiters can save from 0% to 50% of the detection time with the same false alarm rate. Substantially different results were achieved when comparing the two approaches in the case of soft failures: the adaptive thresholds allow a much lower detection delay than the fixed thresholds (up to a ratio of 1:30). In some cases, the occurrence of a soft failure was not even detected using fixed thresholds.

Note: the study presented in this paper is specifically focused on the design of experimental FD filters. Therefore, the problem of fault isolation is not considered in the study. Indeed, if the fault signature matrix for the residuals of V_t, α, and β is computed, it can be immediately seen that the three faults are not strongly isolable, since each residual depends on V_t, α, and β. Different approaches can be used to deal with the fault isolation problem, such as (1) performing a fault sensitivity analysis to investigate whether the fault has a different effect on the amplitudes of the three residuals and then exploiting this information for discriminating the faulty signal, or (2) excluding, a priori, the remaining two signals from the set of regressors (for instance, the regressor set used for approximating the V_t signal would not include the α and β signals). In this second case, we obtain, by construction, experimental models that produce a strongly isolable failure signature matrix. Interested readers are referred to the recent paper by the authors [16] where this last approach has been applied.

12. Conclusions

The goal of this study was the design of a robust adaptive threshold scheme able to deal with failures involving air data sensors. A completely data-driven approach for the design and the tuning of the FD schemes for the true-airspeed, angle-of-attack, and angle-of-sideslip sensors of an aircraft has been proposed. It has been shown that, through the use of this scheme, it is possible to significantly improve the robustness in terms of false alarms and failure detection delay. The data-driven design philosophy was used in many parts of the overall FD procedure, that is, in the regressor selection phase for the considered sensors, in the design of the whitening and EWMA filters, as well as in the computation of the detection thresholds. Validation studies based on actual flight data have shown good overall performance in the presence of hard and soft failures and the possibility to detect very small faults. Additionally, a comparison with a fixed-threshold approach was performed. The results of the comparison revealed that, for hard failures, there is a limited difference between the approaches. In contrast, a different trend has been observed in the case of soft failures. Specifically, a significant detection delay has been observed in the case of fixed thresholds, leading in some cases to a "no detection at all" situation.

The capabilities of this data-driven FD approach coupled with the flexibility of the adaptive threshold provided by the floating limiters have proven to be substantially better than fixed-threshold methods in terms of FD performance. This last aspect makes the proposed scheme appealing for the application to a variety of sensor failures for systems where physical redundancy in the sensors is not available.

Data Availability

Research data are not made freely available due to a confidentiality agreement between the authors and Tecnam.

Conflicts of Interest

The authors Fabio Balzano, Mario L. Fravolini, Marcello R. Napolitano, Stéphane D’Urso, Michele Crispoltoni, and Giuseppe del Core declare that there is no conflict of interest regarding the publication of this paper.