Abstract

The increasing complexity of plants and the development of sophisticated control systems have encouraged the parallel development of efficient, rapid fault detection and isolation (FDI) systems, and FDI in industrial systems has lately become of great significance. This paper proposes a new technique for fast fault detection and diagnosis in nonlinear dynamic systems with multiple inputs and multiple outputs. The main contribution of this paper is an FDI scheme based on reference models of fault-free and faulty behaviors designed with neural networks. Fault detection relies on residuals obtained by comparing measured signals with the outputs of the fault-free reference model. Fault isolation then follows from the Euclidean distance between the outputs of the models of faults and the measurements. The advantage of this method is that it provides not only early detection but also early diagnosis, thanks to the parallel computation of the models of faults and to the proposed decision algorithm. The effectiveness of this approach is illustrated with simulations on the DAMADICS benchmark.

1. Introduction

Fault-tolerant control and reliability issues for industrial systems and technological processes require the development of advanced fault detection and isolation (FDI) approaches. The main objective of fault detection and isolation is to provide early warnings to operators, so that appropriate actions can be taken to prevent breakdowns of the system after the occurrence of faults. As a consequence, FDI methods help to avoid system breakdowns and material damage. During the last decades, many investigations have been made using analytical approaches for FDI based on mathematical models. Among the model-based methods, parameter estimation, parity relations, and observer design are the most often applied techniques [1–5]. Unfortunately, these approaches require a mathematical description that is often not available in engineering practice. In order to solve this problem, system identification strategies can be applied [6, 7]. One of the most popular approaches to nonlinear system identification is based on artificial neural networks (ANNs) [8–14]. ANNs are computational models with particular properties such as the ability to learn, simplicity of implementation, generalization, and good approximation properties.

The aim of fault detection is to deliver alarms when faults occur. The aim of fault diagnosis is to determine the type, magnitude, and location of faults. Detection and diagnosis procedures are based on the observed system and on knowledge about the process. The inputs of a knowledge-based fault diagnosis system are therefore observed measurements and fault-relevant knowledge about the process [15, 16]. Diagnosis of actuators, sensors, and system components can then be achieved either via a remote and supervisory diagnosis system or using local intelligence and self-validation methods [17–20]. Such self-validation techniques and condition monitoring are popular for numerous applications in various domains. The usual difficulty with FDI is to provide early diagnosis. Even if detection is generally obtained with short-delay algorithms, diagnosis requires more time in order to collect data and process history [21–23]. Diagnosis therefore generally needs the observation of the fault consequences over a fairly long time window and is achieved a posteriori.

This paper focuses on the problem of early fault detection and diagnosis using nonlinear models that are designed with neural networks. Fault detection is based on the design and analysis of residuals, which also provide an estimate of the time of fault occurrence. Fault isolation results from the analysis of a bank of additional residuals. These residuals are obtained from models of faults designed from the database of fault candidates. Such models run simultaneously and are updated with the estimated time of occurrence of the faults. Starting several models of faults simultaneously once a fault is detected leads to parallel computation that accelerates the diagnosis. Each model behaves according to a single expected fault and is compared with the collected data, using the Euclidean distance, to provide a rapid diagnosis. The effectiveness of this approach is illustrated with simulation results obtained with the DAMADICS benchmark.

2. FDI Method

The potential of ANNs for FDI problems has been demonstrated in recent years. The neural network approach is applicable to systems for which mathematical models are difficult to obtain. Adaptive ANNs are used to differentiate various faults from the normal conditions, and from one another, according to different fault patterns. Such patterns are extracted from the measured input-output system data, either by offline training or by online learning. In addition, ANNs are helpful to model the nonlinear dynamics according to nonlinear autoregressive structures with exogenous inputs (NARX) [24]. In this section, an FDI method is proposed that is based on the design of neural models representing fault-free behaviors as well as faulty behaviors.
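As an illustration of the NARX structure mentioned above, the following minimal sketch assembles the regressors that such a model would take as network inputs. The function name `narx_regressors`, the lag orders `nu` and `ny`, and the toy data are assumptions made for illustration only; they are not taken from the paper.

```python
def narx_regressors(u, y, nu=2, ny=2):
    """Build (regressor, target) pairs for a single-output NARX model.

    y(t) is predicted from [y(t-1), ..., y(t-ny), u(t-1), ..., u(t-nu)].
    u is a list of input vectors (one list per time step), y a list of floats.
    The lag orders nu and ny are illustrative assumptions.
    """
    start = max(nu, ny)
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - d] for d in range(1, ny + 1)]
        past_u = [v for d in range(1, nu + 1) for v in u[t - d]]
        rows.append(past_y + past_u)
        targets.append(y[t])
    return rows, targets


# Toy data: two inputs, one output, six samples.
u = [[0.0, 1.0], [0.1, 1.0], [0.2, 0.9], [0.3, 0.9], [0.4, 0.8], [0.5, 0.8]]
y = [0.0, 0.05, 0.12, 0.20, 0.29, 0.37]
rows, targets = narx_regressors(u, y, nu=2, ny=2)
```

Each row of `rows` would be presented to the network, with the matching entry of `targets` as the desired output.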

2.1. Fault-Free Model Design

In what follows we consider dynamic systems with multiple inputs and $n$ measured outputs, and we assume that the state variables are not measurable. Such systems often exhibit complex nonlinear dynamics. As a consequence, knowledge-based models are not easy to obtain. An alternative is to use data-based models, and artificial neural networks are often used for that purpose [25, 26]. The goal is to design models of fault-free and faulty behaviors that will be used for the design of residuals (Figure 1).

In order to get the best ANN architecture, several configurations are tested by trial and error, using pruning methods to eliminate useless nodes. The ANN is trained with the Levenberg-Marquardt algorithm with early stopping. This algorithm is known for its rapid convergence. During the learning stage, the ANN is trained with data collected during normal operation (fault-free model). Then the ANN reference models are validated with another set of data. Details about the sizing and training of the ANN can be found in [25–27].

2.2. Fault Detection

During monitoring, the direct comparison of the system outputs $y_i(t)$ with the fault-free model outputs $\hat{y}_{i0}(t)$ leads to $n$ residuals $r_{i0}(t)$:

$$r_{i0}(t) = y_i(t) - \hat{y}_{i0}(t), \quad i = 1, \ldots, n. \tag{1}$$

The residuals provide a source of information about faults for further processing. Fault detection is based on the evaluation of the residual magnitudes. It is assumed that each residual $r_{i0}(t)$, $i = 1, \ldots, n$, is close to zero in the fault-free case and far from zero in the case of a fault. Thus, faults are detected by setting a threshold on each residual signal (Figure 2; a single residual and a single fault are considered for simplicity). The analysis of the residuals also provides an estimate of the time of occurrence of the fault, which is used for diagnosis. When several residuals are used, the estimate $\hat{t}_f$ of the time of occurrence is the earliest of the estimates $\hat{t}_{f,i}$ provided by the individual residuals:

$$\hat{t}_f = \min_{i = 1, \ldots, n} \hat{t}_{f,i}. \tag{2}$$

A fault is detected when the absolute value of one residual becomes larger than its threshold $s_i$:

$$|r_{i0}(t)| > s_i. \tag{3}$$

The main difficulty with this evaluation is that the measurement of the system outputs is usually corrupted by disturbances (e.g., measurement noise). In practice, because of modeling uncertainties and disturbances, large thresholds must be assigned in order to avoid false alarms. Such thresholds usually reduce the fault detection sensitivity and can lead to nondetections.
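The detection rule of this section can be sketched as follows. This is a minimal illustration, assuming that residuals are formed as measured output minus fault-free model output and that the first threshold crossing is taken as a crude estimate of the time of occurrence; the function name `detect_fault` and all numerical values are hypothetical.

```python
def detect_fault(measured, predicted, thresholds):
    """Threshold-based detection on residuals (measured minus predicted).

    measured, predicted: lists of samples, each sample a list of n outputs.
    thresholds: list of n positive thresholds, one per residual.
    Returns (residuals, first_alarm), where first_alarm is the index of the
    first sample at which any |residual| exceeds its threshold, or None.
    """
    residuals = [[y - yh for y, yh in zip(ys, yhs)]
                 for ys, yhs in zip(measured, predicted)]
    first_alarm = None
    for t, r in enumerate(residuals):
        if any(abs(ri) > s for ri, s in zip(r, thresholds)):
            first_alarm = t
            break
    return residuals, first_alarm


# Hypothetical data: the first output drifts away from the model after sample 3.
measured = [[1.00, 2.00], [1.01, 1.99], [0.99, 2.01], [1.00, 2.00],
            [1.05, 2.02], [1.12, 2.05], [1.20, 2.30]]
predicted = [[1.0, 2.0] for _ in measured]
thresholds = [0.1, 0.2]  # assumed values, chosen above the noise level
residuals, first_alarm = detect_fault(measured, predicted, thresholds)
```

With larger thresholds the alarm would be raised later or not at all, which is the sensitivity trade-off described above.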

2.3. Design of the Models of Faults

When multiple faults are considered, the isolation of the detected faults is no longer trivial, and early diagnosis becomes a difficult task. One can add measurements and use analysis tools (residual analysis) in order to isolate the faults, but the number of available sensors limits the use of such an approach. Another approach is to use a history of collected data to improve the knowledge about the faulty behaviors and then to use this knowledge to design models of faults and additional residuals. We design and use such models to estimate the behavior under each fault candidate and compare it with the measurements, so as to provide the most probable fault according to the Euclidean distance between estimated and measured signals [28]. This approach is developed in the following. Several fault candidates $f_k$ are considered. For each fault candidate $f_k$, a dedicated model FM based on ANNs is designed (Figure 3). These models are trained with a procedure similar to the one used for the design of the fault-free reference models. Each model of fault is trained with data that result from the observation of the faulty behavior corresponding to the single fault $f_k$. Such data can be found in the history collected for the considered system, or obtained from specific studies in which faults are deliberately introduced in order to investigate their consequences on system safety and availability.

Once the model of fault $f_k$ is trained and validated, it produces estimates $\hat{y}_{ik}(t)$, $i = 1, \ldots, n$, of the output signals under the assumption that fault $f_k$ occurs at the estimated time of occurrence. The comparison of these estimates with the actual measurements is used to isolate the fault.

The inputs of the network FM are the system input signals, the fault candidate $f_k$, and the estimate of the time of occurrence of the fault provided by the detection stage. The outputs of the network FM are the estimated faulty outputs $\hat{y}_{ik}(t)$, $i = 1, \ldots, n$, obtained assuming that fault $f_k$ disturbs the system from the estimated time of occurrence. The design and training of these models follow a method similar to the one used for the fault-free model.

2.4. Early Fault Diagnosis

The models of faults run simultaneously once a fault is detected, starting from the estimated time of occurrence of the fault. Each model behaves according to a single fault candidate, and the resulting behaviors are compared with the collected data to provide a rapid diagnosis. With numerous fault candidates $f_k$, the outputs $\hat{y}_{ik}(t)$ of each model FM are compared with the measurements $y_i(t)$ to compute additional residuals. The most probable fault candidate is determined by comparing all residuals $r_{ik}(t)$, $i = 1, \ldots, n$, resulting from the outputs and the models of faults:

$$r_{ik}(t) = y_i(t) - \hat{y}_{ik}(t). \tag{4}$$

The proposed method uses a time window that can be sized according to the time requirement. Multistep diagnosis with a large window introduces a diagnosis delay but leads to a decision with a high confidence index. On the contrary, single-step diagnosis leads to immediate diagnosis but with a lower confidence index. To evaluate the probability of each fault candidate, let us define $R_{ik}(t)$ as the cumulative residuals, obtained by accumulating $r_{ik}$ over the sliding time interval of maximal size $H$. Then $D_k(t)$ is the Euclidean norm of the vector of cumulative residuals, of dimension $n$:

$$D_k(t) = \sqrt{\sum_{i=1}^{n} R_{ik}(t)^2}. \tag{6}$$

$D_k(t)$ is used to decide the most probable fault according to single-step or multistep (i.e., immediate or delayed) diagnosis.

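Multistep diagnosis at time $t$ results from the a posteriori analysis of the distances $D_k(t)$ computed for the time interval $[0, t]$ (i.e., $H = t$). The most probable fault at time $t$ is the candidate whose model is closest to the measurements:

$$\hat{f}(t) = f_{k^*}, \quad k^* = \arg\min_{k} D_k(t). \tag{7}$$

A confidence factor that the current fault is $f_{k^*}$ is then derived from the distance $D_{k^*}(t)$: it is near 1 when $D_{k^*}(t)$ is near 0 and far from 1 when $D_{k^*}(t)$ is far from 0.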

Immediate diagnosis results from the analysis of the distance computed at the current time $t$ only. In order to attenuate the effects of noise and outliers, the most probable fault candidate is determined by comparing the cumulative residuals over a sliding time interval of maximal size $H$. Single-step diagnosis results as a consequence: the most probable fault at time $t$ is the candidate with the smallest distance over this short window, and a confidence factor that the current fault is this candidate is derived from the corresponding distance. The window width (i.e., the number of steps) is selected in order to satisfy real-time requirements for rapid diagnosis.
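The decision step can be sketched as follows. This is a minimal illustration under stated assumptions: the cumulative residual of each output is taken here as the sum of absolute residuals over the window (the exact accumulation rule is an assumption), and the function name `diagnose`, the fault labels, and the data are hypothetical. A window of one sample corresponds to immediate diagnosis, and a window covering the whole record to multistep diagnosis.

```python
import math


def diagnose(measured, fault_model_outputs, window):
    """Rank fault candidates by Euclidean distance of cumulative residuals.

    measured: list of samples, each a list of n outputs.
    fault_model_outputs: dict mapping a fault label to the outputs estimated
        by that fault model (same shape as measured).
    window: number of most recent samples to accumulate over.
    Returns (best_label, distances).
    """
    recent = measured[-window:]
    n = len(recent[0])
    distances = {}
    for label, estimates in fault_model_outputs.items():
        est_recent = estimates[-window:]
        # Assumed accumulation rule: sum of |y_i - yhat_ik| over the window.
        cumulative = [sum(abs(ys[i] - es[i]) for ys, es in zip(recent, est_recent))
                      for i in range(n)]
        distances[label] = math.sqrt(sum(c * c for c in cumulative))
    best = min(distances, key=distances.get)
    return best, distances


# Hypothetical measurements and two hypothetical fault models.
measured = [[1.0, 2.0], [1.2, 2.1], [1.4, 2.2]]
models = {
    "f12": [[1.0, 2.0], [1.1, 2.1], [1.4, 2.3]],
    "f15": [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],
}
best, distances = diagnose(measured, models, window=3)
```

Because every fault model is evaluated independently, the loop over candidates could be distributed over parallel workers, in the spirit of the parallel computation of the models of faults.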

3. Application to the DAMADICS Benchmark

The proposed method is applied to signals obtained with the DAMADICS simulator. This system has been used to validate and discuss several FDI systems [29–31]. In [30], binary-valued evaluation of the fault symptoms is explored, and the authors focus on the optimization of the neural network architecture according to the Akaike Information Criterion and the Final Prediction Error. Both criteria include the learning error and also a term that depends on the complexity (size of the network in number of nodes) and on the dimension of the learning set, in order to optimize the complexity/performance ratio. The authors report interesting detection performance with small networks, but some faults are not isolable. In comparison, our approach requires a larger number of networks, and the networks have more nodes, but all faults are detected and isolated. In [31], multiple-valued evaluation of the fault symptoms is introduced to improve the distinguishability of faults. Such a method requires heuristic knowledge about the influence of faults on residuals. In comparison, our approach uses binary-valued evaluation of the residuals but needs analytical models of the actuator faults, including fault candidates that are not used in [31].

3.1. DAMADICS Description

The DAMADICS benchmark is an engineering research case study that can be used to evaluate detection and isolation methods. The benchmark is an electropneumatic valve actuator in the Lublin sugar factory in Poland [32]. The actuator consists of a control valve, a pneumatic servomotor, and a positioner (Figure 4).

In Figure 4, D/A is the data acquisition unit, PC is the positioner processing unit, E/P is the electropneumatic transducer, V1, V2, and V3 are bypass valves, and DT, PT, FT, and TT denote the displacement, pressure, flow, and temperature transducers, respectively.

In the actuator, faults can appear in the control valve, servomotor, electropneumatic transducer, piston rod travel transducer, pressure transmitter, or microprocessor control unit. Nineteen types of faults are considered ($f_1$ to $f_{19}$, Table 1). The faults are emulated under carefully monitored conditions, keeping the process operation within acceptable quality limits.

Five available measurements and 1 control value signal have been considered for benchmarking purposes: process control external signal CV, values of liquid pressure on the valve inlet P1 and outlet P2, liquid flow rate F, liquid temperature T1, and stem displacement X (Table 2) [32].

3.2. Residuals Design for Detection

The positioner and control valve are modeled with two multilayer ANNs, netX and netF (Figure 5), that represent the interaction between the inputs and the outputs in the fault-free case. To select the structure of the neural networks netX and netF, numerous tests have been carried out to obtain the best architectures (i.e., number of hidden layers and number of neurons per layer) to model the operation of the actuator. The training and test data were generated by simulation using Matlab-Simulink models [27]. The selected structures are ANNs with 6 nodes in layer 1, 3 nodes in layer 2, and a single output neuron. Table 3 summarizes some results obtained during the training stage in order to select the best architecture [25, 27]. For this purpose, the mean square error (MSE) is computed over the set of training data for a training of 1000 epochs.

When the training is over, netX and netF provide estimates of the outputs X and F. The corresponding MSE over the training data, for a training of 1000 epochs, is among the results reported in Table 3.

Validation is done with the measured data provided by the Lublin Sugar Factory in 2001 [32]. Validation is illustrated in Figure 6.

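Two residuals are designed, one per modeled output: the difference between the measured stem displacement X and its estimate provided by netX, and the difference between the measured flow rate F and its estimate provided by netF.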

3.3. Fault Detection

Fault detection is obtained by comparing the current value of each residual with a threshold. The thresholds are determined according to the standard deviation of the residual in the fault-free case, separately for output X and for output F. During normal operation the residuals remain near zero. Figure 7 depicts both residuals when a fault is simulated during the time interval [20 s, 80 s]. The fault is detected by both residuals when they reach their thresholds. From this figure one can also evaluate the delay to detection, which is about 2 s for this fault. Such information will be used further for diagnosis.

The two residuals are binary valued according to the detection thresholds. Table 4 sums up the residual signatures for the 19 types of faults ($f_1$ to $f_{19}$). A positive entry means that residual $r_{i0}$ is large positive (according to the previous thresholds), and a negative entry means that residual $r_{i0}$ is large negative. To conclude, all faults are detected by at least one of the two residuals.

Residual analysis is an essential step in FDI systems. The choice of constant or adaptive detection thresholds strongly influences the quality and performance of the FDI. The problem of threshold selection is closely linked to the behavior of the residuals and also to constraints that may be imposed, such as security margin tolerances [29]. For this reason, we study the variation of the delay to detection according to the magnitude of the two detection thresholds. The results are summarized in Table 5.

According to Table 5, if the detection thresholds increase, the delay to detection also increases, in an exponential manner, and may lead to nondetections (ND). On the contrary, small thresholds may lead to false alarms (FA). The thresholds must therefore be carefully selected. For the remainder of this work, one threshold is selected for each of the two residuals.
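The trade-off between false alarms, detection delay, and nondetections can be reproduced on synthetic data. The sketch below is only an illustration: the residual samples, the fault onset, and the multiples of the fault-free standard deviation used as thresholds are assumptions, not values from Table 5, and the helper `detection_delay` is hypothetical.

```python
import statistics


def detection_delay(residual, threshold, fault_start):
    """Samples between fault onset and the first |residual| > threshold.

    Returns "FA" if the threshold is crossed before fault_start (false alarm),
    "ND" if it is never crossed (nondetection), else the delay in samples.
    """
    for t, r in enumerate(residual):
        if abs(r) > threshold:
            return "FA" if t < fault_start else t - fault_start
    return "ND"


# Synthetic residual: small noise, then a ramp starting at sample 5.
residual = [0.01, -0.02, 0.015, -0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.25]
fault_start = 5

# Spread of the residual in the fault-free part of the record.
sigma = statistics.pstdev(residual[:fault_start])

# Thresholds expressed as multiples of sigma (the multiples are assumptions).
delays = {k: detection_delay(residual, k * sigma, fault_start)
          for k in (1, 3, 10, 20)}
```

On this toy record the smallest threshold triggers a false alarm, intermediate thresholds detect the fault with a growing delay, and the largest threshold misses it.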

3.4. Fault Diagnosis

According to Table 4, three groups of faults with similar symptoms can be separated: group 1, group 2, and group 3.

Within each group, faults are not isolable. For this reason we propose to use the method described in Sections 2.3 and 2.4 in order to improve the isolability of faults and to perform the complete early diagnosis. For this purpose, 19 models of faults FM are designed according to the history of data available with the DAMADICS benchmark. Each model FM computes two estimated outputs, one for X and one for F, and comparisons with the measured data lead to two additional residuals per fault candidate. The application of the diagnosis stage leads to the results in Table 6 and Figures 9 to 11. The cumulative residuals and the distance are defined according to (5) and (6). Single- and multistep diagnoses are obtained according to (7) and (10).

For example, fault $f_{15}$ is simulated within the time interval [444 s, 1000 s] (Figure 8). The two residuals are depicted together with their detection thresholds. According to the detection stage, a fault is detected with a delay of 7 s, and group 2 is isolated (Figure 8). Multistep diagnosis is illustrated with a large time interval, [0, 1000 s] (Figure 9).

For multistep diagnosis, Figure 9 and Table 6 report the location of each model FM in the plane of the two cumulative residuals and also the distance at time t = 1000 s. The model FM corresponding to the fault candidate f15 provides the estimated outputs with the smallest Euclidean distance from the measured outputs. To conclude, f15 is the most probable fault when residuals are analyzed within the time interval [0, 1000 s]. Similar conclusions have been obtained for numerous other simulations.

Another example is provided when fault $f_{12}$ is simulated within the time interval [500 s, 1000 s]. According to the detection stage, a fault is detected and group 1 is isolated. Early diagnosis is illustrated with a small time interval. Figures 10 and 11 plot the location of each model FM in the plane of the two cumulative residuals as time evolves. At any time $t$, the trajectory with minimal distance to the origin (i.e., minimal value of the distance) corresponds to the most probable fault. In Figures 10 and 11, the most probable fault f12 is highlighted. Figure 11 plots details about the trajectory of the corresponding model FM.

One can notice that the trajectories start near the origin (i.e., the effects of the expected faults on the residuals are weak) and then move away from the origin (i.e., the effects of the expected faults increase). The trajectory corresponding to the model FM of f12 (i.e., the expected fault is the actual one) remains near the origin in comparison with the other trajectories. One can conclude that the fault candidate f12 is the most probable fault because its distance to the origin is the smallest one. One can also notice that the two cumulative residuals cover the positive part of the plane. The distribution of the cumulative residuals in this plane confirms the significance of both outputs X and F for the design of residuals. Thus, Figures 10 and 11 are also useful to check whether the considered outputs are helpful for diagnosis. Similar conclusions have been obtained for numerous other simulations.

3.5. Discussion

Kościelny et al. [31] have introduced the distinguishability factor Γ, which depends on the cardinality of the set of distinguishable faults. Since all faults are isolable with our approach, we obtain the most favorable value of Γ. In comparison, a less favorable value has been obtained in [31]. In [30], Patan et al. consider only 3 faults and mention that f1 is not isolable with their contribution, so their value of Γ is certainly not the most favorable one either. In addition, the delay to detection is not discussed at all in [30, 31]. From our point of view, the delay to detection is quite important to consider, since this delay also influences the rapidity (and efficiency) of the diagnosis.

It is important to notice that the good performance of our approach is due to a large computational effort: 20 models with 2 outputs each are required to compute 40 residuals. In comparison, [30] uses only 4 models with 2 outputs each, and [31] uses 10 residuals. The Akaike Information Criterion (AIC) and Final Prediction Error (FPE) [30], which measure the complexity/performance ratio, remain quite good: FPE (netF) = 0.15 and AIC (netF) = −1.85. However, these criteria do not include the complexity due to the number of networks working in parallel. In any case, in our work we do not consider the optimization of the complexity/performance ratio as introduced in [30]; we simply use NNs of appropriate dimensions so that the training error is small enough.
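For reference, both criteria can be computed from the training loss, the number of adjustable parameters, and the number of training samples. The sketch below uses the textbook definitions from system identification, FPE = V(1 + d/N)/(1 − d/N) and the normalized AIC = log V + 2d/N; whether [30] uses exactly these variants is an assumption, and the numerical inputs are hypothetical.

```python
import math


def fpe(loss, n_params, n_samples):
    """Akaike's Final Prediction Error: V * (1 + d/N) / (1 - d/N)."""
    ratio = n_params / n_samples
    return loss * (1 + ratio) / (1 - ratio)


def aic(loss, n_params, n_samples):
    """Normalized Akaike Information Criterion: log(V) + 2*d/N."""
    return math.log(loss) + 2 * n_params / n_samples


# Hypothetical network: loss V = 0.1, d = 10 weights, N = 1000 samples.
fpe_value = fpe(0.1, 10, 1000)
aic_value = aic(0.1, 10, 1000)
```

Both quantities grow with the number of parameters of a single network; neither accounts for the number of networks running in parallel, which is the limitation noted above.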

4. Conclusion

The method proposed in this work for early detection and diagnosis of faults combines the computing power and the robustness of neural networks with a simple real-time decision based on the Euclidean distance of cumulative residuals in the residual space. The method leads to fault detection, estimation of the time of fault occurrence, and evaluation of the most probable fault. The results obtained with the DAMADICS benchmark illustrate the performance of the method. It is important to notice, however, that this good performance is due to a large computational effort: twenty models with two outputs each are required, and these networks work in parallel.

The main limitation of the proposed method is the need to design a model of fault for each fault candidate. Such a design requires time, computational resources, and a large history of data. We will consider the systematic design of models of faulty behaviors in future work. Another perspective is to exploit the correlation between diagnosis performance and the selection of estimated outputs for residual design. The analysis of cumulative residuals in the residual space provides an interesting point of view to continue this investigation, and the coverage of the residual space will be used to select appropriate residuals. Future work will also include a deeper interpretation of the distance as a probability or likelihood ratio for performance evaluation.

Acknowledgment

This work was supported in part by the Ministry of Higher Education and Scientific Research in Algeria.