Journal of Control Science and Engineering
Volume 2008 (2008), Article ID 265189, 10 pages
Research Article

Reliability Monitoring of Fault Tolerant Control Systems with Demonstration on an Aircraft Model

1Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada T6G 2V4
2Department of Computer Science and Engineering, Aalborg University Esbjerg, Niels Bohrs Vej 8, Esbjerg 6700, Denmark

Received 4 April 2007; Revised 6 September 2007; Accepted 13 November 2007

Academic Editor: Kemin Zhou

Copyright © 2008 Hongbin Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


This paper proposes a reliability monitoring scheme for active fault tolerant control systems using a stochastic modeling method. The reliability index is defined based on system dynamical responses and a safety region; the plant and controller are assumed to have a multiple regime model structure, and a semi-Markov model is built for reliability evaluation based on the safety behavior of each regime model estimated by using Monte Carlo simulation. Moreover, the history data of fault detection and isolation decisions is used to update its transition characteristics and reliability model. This method provides an up-to-date reliability index as demonstrated on an aircraft model.

1. Introduction

To meet the high reliability requirements of safety-critical processes, major progress has been made in fault tolerant control systems (FTCSs). FTCSs that employ fault detection and isolation (FDI) schemes and reconfigurable controllers to accommodate fault effects are known as active FTCSs. Most work on reconfigurable controller design is performed under the assumption of perfect FDI. However, imperfect FDI results are inevitable owing to disturbances or modeling uncertainties, and they may violate the designated reliability requirement. Therefore, it is necessary to validate the design of FTCSs from a reliability perspective.

The reliability of FTCSs has been investigated using various methods. The key problem is to set up appropriate reliability models with control objectives and safety requirements incorporated. As fault occurrences and system failures are rare events, dynamic models are usually not suitable for reliability analysis. For example, Wu used serial-parallel block diagrams and Markov models for evaluation purposes, and defined a coverage concept to relate reliability and control actions [1]. Walker proposed Markov and semi-Markov models to describe the transitions of fault and FDI modes, but control actions were not considered [2]. In previous work, we considered static model-based control objectives and built a semi-Markov model from imperfect FDI and hard-deadline concepts [3, 4]. However, in many practical systems, the safety and reliability of operation are assessed based on dynamic system responses. For instance, reliability in structural control is defined as the probability of system outputs outcrossing safety boundaries and is evaluated by using a Gaussian approximation [5]. Also, an online reliability monitoring scheme using updated information may aid maintenance scheduling, provide early alarms, and avoid emergency overhauls. How to evaluate reliability when it is defined based on the system trajectory, and how to implement an online monitoring scheme, are the main motivations of this paper.

The objectives of this paper are threefold. First, a steady-state test (SST) is proposed to reduce false alarms of FDI decisions. The stochastic modeling of such an FDI scheme is studied, based on which the transition characteristics of FDI modes can be described. The second objective is to develop a reliability evaluation scheme for FTCSs based on system dynamic responses and a safety boundary. Finally, online monitoring features are considered, such as estimation of FDI transition parameters based on history data and timely update of the reliability index to reflect up-to-date system behavior.

The remainder of this paper is organized as follows: the assumptions and system structure are given in Section 2; FDI scheme, modeling, and parameter estimation are discussed in Section 3; the determination of outcrossing failure rates and hard-deadlines are discussed in Section 4; the reliability model construction is discussed in Section 5 followed by a demonstration example of an F-14 aircraft model in Section 6.

2. Assumptions and System Structure

Assumption 1. The considered plant is assumed to have finite fault modes, and dynamics under each fault mode can be effectively represented by a linear system model.

Fault modes are represented by a set $S$ of $N$ integers; $\{\mathcal{M}_i : i \in S\}$ represents the set of dynamical plant models under the various fault modes; $\{\mathcal{K}_j : j \in S\}$ denotes a set of reconfigurable controllers in a switching structure, where $\mathcal{K}_j$ is designed for fault mode $j$ based on $\mathcal{M}_j$, $j \in S$. However, true fault modes are usually not directly known, so an FDI scheme is used to generate estimates of fault modes, which may deviate from the true fault modes with certain error probabilities.

Assumption 2. FDI scheme is assumed to generate a fault estimate based on a batch of measurements and calculations for every fixed period 𝑇c.

This assumption states a cyclic feature of FDI schemes, such as statistical tests and interacting multiple model (IMM) Kalman filters [6]. FDI modes are represented by a discrete-time stochastic process $\eta_n \in S$, where $n \in \mathbb{N}$, the set of nonnegative integers. The time duration between consecutive discrete indices is equal to the FDI detection period $T_c$. $\mathcal{K}_j$ is put in use when $\eta_n = j$, $j \in S$. Corresponding to $\eta_n$, a discrete-time stochastic process $\zeta_n \in S$ denotes the true fault mode. In reliability engineering, constant failure rates are usually assumed for the main part of the component life cycle. In such a case, $\zeta_n$ can be described as a Markov chain [7], and its transition probabilities are denoted as $G_{ij} = \Pr\{\zeta_{n+1} = j \mid \zeta_n = i\}$, $i, j \in S$.

Remark 1. The semi-Markov process can be used as a general FDI model. It can describe any type of sojourn time distribution; in contrast, the Markov process model accepts exponential sojourn time distributions only. More discussions can be found in [4].

Assumption 3. System performance is assumed to be represented by a vector signal 𝑧(𝑡). Safety region, denoted as Ω, is assumed to be a fixed region in the space of 𝑧(𝑡) bounded by its safety threshold. Failure is assumed to occur when 𝑧(𝑡) exceeds a safety region for the first time.

This assumption is intended to define an appropriate reliability index based on the system dynamical response. It is common in control systems to use a signal $z(t)$ to represent performance, and $z(t)$ is usually to be kept small against influences from exogenous disturbances, modeling uncertainties, and changes in dynamical characteristics caused by faults. The safety region $\Omega$ is assumed to be fixed and known a priori. The scenario in which $z(t)$ leaves $\Omega$ represents loss of control and system failure. More discussions on this assumption can be found in [8].

Definition 1. For a time interval from 0 to $t$, the reliability function $R(t)$ is defined as the following probability:

$$R(t) = \Pr\{\forall\, 0 \le \tau \le t,\ z(\tau) \in \Omega\}. \tag{1}$$

Mean time to failure (MTTF) is defined as the expected time of satisfactory operation:

$$\mathrm{MTTF} = \int_0^{\infty} R(t)\,dt. \tag{2}$$

Remark 2. Unlike repairs, which rely on human intervention while system operation is stopped, control actions are executed automatically and can be deemed internal actions of FTCSs. Therefore, MTTF represents the mean operational time without human intervention before failure.

Compared with $\zeta_n$ and $\eta_n$, $z(t)$ is typically a fast changing function determined by both continuous and discrete dynamics. As shown in Figure 1, $\zeta_n$ and $\eta_n$ are two regime modes and determine the transitions among regime models. When $\zeta_n = i$ and $\eta_n = j$ are fixed, $z(t)$ evolves according to the plant model of fault mode $i$ and the controller $\mathcal{K}_j$. As a result of these hybrid dynamics, directly evaluating $R(t)$ and MTTF is a difficult problem. Therefore, a discrete-time semi-Markov chain $X_n$ is constructed for reliability evaluation. The main idea is that the hybrid system is decomposed into various regime models; each regime model is then evaluated for its safety characteristics, and $X_n$ is constructed to integrate these characteristics with the transition parameters of the regime modes, so that its transition probabilities can be solved for reliability evaluation. The structure and main components of the reliability monitoring scheme are illustrated in Figure 2.

Figure 1: Transitions among regime models.
Figure 2: System structure.

The semi-Markov reliability model $X_n$ is the kernel component for calculating MTTF. It is constructed based on the following parameters: (1) the transition rates of $\zeta_n$, called plant failure rates, (2) the estimates of $\zeta_n$ from FDI and the confirmation test, called confirmed fault modes, (3) the parameters of $\eta_n$ estimated from history data, called FDI transition characteristics, (4) the probability of $z(t)$ crossing the safety boundary during an FDI cycle $T_c$ when $\zeta_n = \eta_n$, called outcrossing failure rates, (5) the average number of periods before crossing the safety boundary when $\zeta_n \neq \eta_n$, called hard deadlines. Among these parameters, the second and third ones can be updated online.

3. FDI Scheme and Its Characterization

3.1. Steady-State Tests

It is well known that false alarm and missing detection rates are two conflicting quality criteria of FDI. One is usually improved at the cost of degrading the other. What is worse, the general rules of adjusting FDI to improve these two criteria simultaneously are often not known. For example, in a scheme based on IMM Kalman filters, it is not clear how to determine Markov interaction parameters. Considering that most false alarms last for short time only, an SST strategy is adopted for postprocessing FDI decisions.

SST requires that, when the FDI decision changes, the new decision is accepted only when it stays the same for a minimum number of detection cycles. Let $T^{\mathrm{SST}}_j$ denote the required number of consistent cycles for FDI mode $j$, $j \in S$. The effectiveness of this SST strategy relies on the distribution of false alarm durations. For example, if a nonnegative discrete random variable $\lambda_0$ denotes the false alarm duration when the system fault mode is $\zeta_n = 0$, $T^{\mathrm{SST}}_0$ can be taken as the $(1-\alpha)$-quantile of $\lambda_0$, $0 < \alpha < 1$, meaning

$$\Pr\{\lambda_0 > T^{\mathrm{SST}}_0\} \le \alpha, \tag{3}$$

which implies that the false alarm probability can be reduced by the ratio $\alpha$ when FDI decisions are accepted only after $T^{\mathrm{SST}}_0$. The weakness of this method is the additional detection delay of $T^{\mathrm{SST}}_j$ when a fault occurs. However, this happens only on the rare occurrences of faults. Compared with the improvement on the relatively more frequent transitions of FDI modes, this weakness is acceptable.
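The quantile rule in (3) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `sst_period` and the sample durations are hypothetical, and the empirical distribution of recorded false alarm durations stands in for the distribution of $\lambda_0$.

```python
def sst_period(false_alarm_durations, alpha):
    """Smallest T (in FDI cycles) whose empirical Pr{duration > T} is <= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not false_alarm_durations:
        raise ValueError("need at least one recorded false alarm duration")
    n = len(false_alarm_durations)
    for t in range(max(false_alarm_durations) + 1):
        exceed_count = sum(d > t for d in false_alarm_durations)
        if exceed_count <= alpha * n:
            return t
    return max(false_alarm_durations)


# Hypothetical false alarm durations (in cycles) observed under fault mode 0.
durations = [1, 1, 1, 2, 1, 3, 1, 2, 1, 6]
t_sst = sst_period(durations, alpha=0.15)
exceed = sum(d > t_sst for d in durations) / len(durations)
```

With these samples, only the single 6-cycle false alarm outlasts the selected period, so the empirical exceedance probability stays below the chosen $\alpha$.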

Detection decisions from SST are represented by 𝜂𝑛 and used for controller reconfigurations. In Figure 2, the confirmation test is an SST with large test period to further reduce false alarm probability to a negligible level. It generates confirmed fault modes, which are used with FDI trajectories for updating transition parameters of 𝜂𝑛 and reliability index.
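A minimal sketch of the SST postprocessing step is given below. It assumes one particular reading of the rule (a changed decision is accepted once it has persisted for `t_sst[mode]` consecutive cycles); the function name, the dictionary layout, and the sample sequence are hypothetical.

```python
def steady_state_test(raw_decisions, t_sst, initial_mode=0):
    """Filter raw FDI decisions; a new mode is accepted only after it has
    been reported for t_sst[mode] consecutive cycles."""
    accepted = initial_mode
    candidate, run_length = initial_mode, 0
    filtered = []
    for decision in raw_decisions:
        if decision == candidate:
            run_length += 1
        else:
            candidate, run_length = decision, 1
        if candidate != accepted and run_length >= t_sst[candidate]:
            accepted = candidate
        filtered.append(accepted)
    return filtered


raw = [0, 0, 1, 0, 0, 1, 1, 1, 1, 0]      # hypothetical raw FDI decisions
steady = steady_state_test(raw, t_sst={0: 2, 1: 3})
```

The isolated alarm at the third cycle is suppressed, while the sustained run of mode 1 is accepted with a delay of three cycles, which is the trade-off between false alarms and detection delay described above. A confirmation test can be obtained from the same function by passing larger periods.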

3.2. Stochastic Models

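A sample path of $\eta_n$ is given in Figure 3. Let $\theta_m \in S$ and $T_m$ denote the FDI mode and cycle index, respectively, after the $m$th transition of $\eta_n$, $m \in \mathbb{N}$. For example, in Figure 3, $\theta_1 = \eta_5$ and $T_2 = 5$. $\theta_m$ and $T_m$ together determine the FDI trajectory, and $\eta_n = \theta_{S_n}$, where $S_n = \sup\{m \in \mathbb{N} : T_m \le n\}$ is the discrete-time counting process of the number of jumps in $[1, n]$. $(\theta, T) = \{\theta_m, T_m : m \in \mathbb{N}\}$ is called a discrete-time Markov renewal process if

$$\Pr\{\theta_{m+1} = j,\ T_{m+1} - T_m = l \mid \theta_0, \ldots, \theta_m;\ T_0, \ldots, T_m\} = \Pr\{\theta_{m+1} = j,\ T_{m+1} - T_m = l \mid \theta_m\} \tag{4}$$

holds for fixed $\zeta_{T_m} = \zeta_{T_m + 1} = \cdots = \zeta_{T_{m+1}} = k$, $k, j \in S$, $l, m \in \mathbb{N}$. $\eta_n = \theta_{S_n}$ is then called the associated discrete-time semi-Markov chain of $(\theta, T)$. It can be shown that $\theta_m$ is a Markov chain, and its transition probability matrix is denoted by $P^k$.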

Figure 3: A sample path of 𝜂𝑛.

Given $\zeta_{T_m} = \zeta_{T_m + 1} = \cdots = \zeta_{T_{m+1}} = k$, let $\tau^k_{ij} = T_{m+1} - T_m$ if $\theta_m = i$ and $\theta_{m+1} = j$, $i, j, k \in S$. $\tau^k_{ij}$ is the sojourn time of $\eta_n$ between its transition to state $i$ at $T_m$ and the consecutive transition to $j$ at $T_{m+1}$. If the transition destination state is not specified, let $\tau^k_i$ denote the sojourn time at state $i$.

As shown in Figure 3, $\tau^k_{ij}$ is the sum of two variables: a constant $T^{\mathrm{SST}}_i$ for the SST period and a random sojourn time $\sigma^k_{ij}$. Let $h^k_{ij}(l)$ and $g^k_{ij}(l)$ denote the discrete distribution functions of $\tau^k_{ij}$ and $\sigma^k_{ij}$, respectively, which have the following relation:

$$h^k_{ij}(l) = \Pr\{\tau^k_{ij} = l\} = \begin{cases} 0, & l \le T^{\mathrm{SST}}_i, \\ g^k_{ij}(l - T^{\mathrm{SST}}_i), & l > T^{\mathrm{SST}}_i. \end{cases} \tag{5}$$

This semi-Markov description provides a general model of FDI mode transitions, but it involves a large number of parameters. The transition characteristics of $\eta_n$ are jointly determined by $P^k$ and $h^k_{ij}$ (or $g^k_{ij}$). If $S$ contains $N$ fault modes, there are $N$ transition probability matrices $P^k$ and $N^3$ distribution functions $h^k_{ij}$. If each $h^k_i$ follows a geometric distribution, the description of $\eta_n$ may degenerate to a hypothetical Markov model $\tilde{\eta}_n$.

All Markov chains can be considered as a special type of semi-Markov chain. If $\eta_n$ can be modeled as a Markov chain with transition probability matrix denoted by $H^k$ for $\zeta_n = k$, the following relations hold for $i \neq j$:

$$P^k_{ij} = \frac{H^k_{ij}}{1 - H^k_{ii}}, \tag{6}$$

$$h^k_{ij}(l) = (H^k_{ii})^{l-1} H^k_{ij}, \tag{7}$$

$$h^k_i(l) = (H^k_{ii})^{l-1} (1 - H^k_{ii}). \tag{8}$$

It is obvious that $h^k_i$ is a geometric distribution. In fact, this is an essential property of Markov chains, as shown in the following lemma.
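Relations (6)-(8) can be checked numerically with a short sketch. The matrix `H` below is a hypothetical three-mode transition matrix for one fixed fault mode $k$, and the function names are illustrative only.

```python
def embedded_chain(H):
    """Eq. (6): P_ij = H_ij / (1 - H_ii) for i != j, and P_ii = 0."""
    n = len(H)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        leave = 1.0 - H[i][i]
        for j in range(n):
            if i != j and leave > 0.0:
                P[i][j] = H[i][j] / leave
    return P


def kernel(H, i, j, l):
    """Eq. (7): probability of staying l cycles at i and then jumping to j."""
    return H[i][i] ** (l - 1) * H[i][j]


def sojourn_pmf(H, i, l):
    """Eq. (8): geometric sojourn time at i, destination unspecified."""
    return H[i][i] ** (l - 1) * (1.0 - H[i][i])


H = [[0.90, 0.06, 0.04],   # hypothetical Markov transition matrix H^k
     [0.30, 0.70, 0.00],
     [0.50, 0.00, 0.50]]
P = embedded_chain(H)
pmf_total = sum(sojourn_pmf(H, 0, l) for l in range(1, 501))
jump_to_1 = sum(kernel(H, 0, 1, l) for l in range(1, 501))
```

Summing (7) over all sojourn lengths recovers the embedded-chain probability in (6), and (8) sums to one, which is a quick consistency check on the three relations.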

Lemma 1. A discrete-time semi-Markov chain degenerates to a Markov chain if and only if the sojourn time at each state (when subsequent state is not specified) follows geometric distribution.

The proof is given in the appendix. When $T^{\mathrm{SST}}$ is nonzero, the sojourn time of $\eta_n$ does not follow a geometric distribution owing to this deterministic constant, and Lemma 1 cannot be directly applied. However, as $T^{\mathrm{SST}}$ is known, a hypothetical process $\tilde{\eta}_n$ can be constructed by setting $T^{\mathrm{SST}}$ to zero; if the sojourn time of $\tilde{\eta}_n$ is geometrically distributed, it can be described as a Markov chain; the original sojourn time of $\eta_n$ can be recovered by adding $T^{\mathrm{SST}}$ to that of $\tilde{\eta}_n$. This method may greatly reduce the number of parameters for characterizing FDI results.

3.3. Transition Parameter Estimation

FDI transition parameters can be estimated as an offline test on FDI when both fault mode and FDI detection results are known. This estimation can also be carried out online using FDI history data and confirmed fault modes.

When $\eta_n$ is modeled as a semi-Markov chain, $P^k$ and $h^k_{ij}$ (or $g^k_{ij}$) are the parameters to be estimated. $P^k$ can be estimated from the transition history of $\eta_n$. For example, when $\zeta_n$ is kept as a constant $k$, if there are $M_{ij}$ transitions from $i$ to $j$ among all $M$ transitions leaving $i$, the $ij$th element of $P^k$ can be estimated as $\hat{P}^k_{ij} = M_{ij}/M$.

The estimation of the sojourn time distribution $g^k_{ij}$ can be completed in two steps: the histogram of the sojourn time is first examined to select a standard distribution, such that the nonparametric estimation is converted to a parametric one; $\hat{g}^k_{ij}$ is then obtained by estimating the unknown parameters in the distribution functions.

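If $\hat{g}^k_{ij}$ follows a geometric distribution for all $i, j, k \in S$, $\eta_n$ can be described as a hypothetical Markov chain $\tilde{\eta}_n$ under the hypothesis that $T^{\mathrm{SST}}_i = 0$. As a result, the transition probability $H^k_{ij}$ from $i$ to $j$ and the sojourn time $\tau^k_i$ at $i$ have the following relation:

$$\Pr\{\tau^k_i = n\} = (H^k_{ii})^{n-1} (1 - H^k_{ii}). \tag{9}$$

Therefore, $E(\tau^k_i) = 1/(1 - H^k_{ii})$, and $H^k_{ii}$ can be estimated by

$$\hat{H}^k_{ii} = \begin{cases} 1 - \dfrac{1}{\sum_{l=1}^{M} \tau^k_i(l)/M}, & \sum_{l=1}^{M} \tau^k_i(l)/M \neq 0, \\ 1, & \text{otherwise}, \end{cases} \tag{10}$$

where $\tau^k_i(l)$, $l = 1, \ldots, M$, denote $M$ sojourn time samples at state $i$. $H^k_{ij}$, $i \neq j$, can be estimated based on the transition frequency from state $i$ to $j$:

$$\hat{H}^k_{ij} = (1 - \hat{H}^k_{ii}) \frac{w^k_{ij}}{M}, \tag{11}$$

where $(1 - \hat{H}^k_{ii})$ is a normalization coefficient and $w^k_{ij}$ represents the number of FDI transitions from $i$ to $j$.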
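The estimators (10)-(11) can be sketched on a recorded mode sequence as follows. This is an illustration under stated assumptions: the sequence `path` is a hypothetical FDI history collected while the fault mode is constant and with the SST period already removed, the "otherwise" branch of (10) is read as "no completed sojourn has been observed", and all function names are illustrative.

```python
def sojourns_and_counts(path, i):
    """Completed sojourn lengths at mode i and the destination counts w_ij."""
    sojourns, counts = [], {}
    run = 0
    for prev, cur in zip(path, path[1:]):
        if prev != i:
            continue
        run += 1
        if cur != i:                      # the sojourn at i ends here
            sojourns.append(run)
            counts[cur] = counts.get(cur, 0) + 1
            run = 0
    return sojourns, counts


def estimate_row(i, sojourns, counts, n_modes):
    """Row i of the Markov matrix: eq. (10) on the diagonal, eq. (11) elsewhere."""
    row = [0.0] * n_modes
    M = len(sojourns)
    if M == 0:                            # assumed reading of "otherwise" in (10)
        row[i] = 1.0
        return row
    mean_sojourn = sum(sojourns) / M
    row[i] = 1.0 - 1.0 / mean_sojourn
    for j in range(n_modes):
        if j != i:
            row[j] = (1.0 - row[i]) * counts.get(j, 0) / M
    return row


path = [0, 0, 0, 1, 0, 0, 2, 2, 0, 0, 0, 0, 1]   # hypothetical FDI history
sojourns, counts = sojourns_and_counts(path, i=0)
row0 = estimate_row(0, sojourns, counts, n_modes=3)
```

Because the off-diagonal entries are scaled by $(1 - \hat{H}^k_{ii})$, the estimated row sums to one by construction.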

4. Outcrossing Failure Rates and Hard-Deadlines

Owing to FDI delays or incorrect decisions, controller $\mathcal{K}_i$ may be used with its designated regime model, that of fault mode $i$ (matched cases), or with the model of another fault mode $j$, $i \neq j$ (mismatched cases). Matched cases usually account for most of the operation time, while mismatched cases often appear as temporary operation.

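Definition 2. The outcrossing failure rate in matched cases is defined as

$$v_{ii} \triangleq \Pr\{\exists\, \tau,\ nT_c < \tau \le (n+1)T_c,\ z(\tau) \notin \Omega \mid z(nT_c) \in \Omega,\ \zeta_n = \eta_n = i\}, \quad i \in S. \tag{12}$$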

Monte Carlo simulation can be used for estimating 𝑣𝑖𝑖: sample simulations are performed by using generated sample uncertain plant model and sample disturbance input; the simulation time when system fails is called a sample time-to-failure. With a large number of time-to-failure samples obtained, 𝑣𝑖𝑖 can be estimated as the ratio between 𝑇c and sample mean of time-to-failure.

Mismatched cases are usually temporary operation caused by FDI false alarms or delays, and the system may return to a matched case if $z(t)$ does not diverge to the unsafe region. So, it is important to find the average tolerable time before system failure. This time limit is called the hard-deadline, denoted by $T^{\mathrm{hd}}_{ij}$ for $\zeta_n = i$ and $\eta_n = j$. It can also be estimated by the sample mean of time-to-failure using Monte Carlo simulations.
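The two Monte Carlo estimates can be sketched on a toy problem. Everything below is an assumption made for illustration: a scalar recursion $z \leftarrow a z + w$ with Gaussian noise stands in for the closed-loop response, `limit` plays the role of the safety boundary, a stable value of `a` mimics a matched case, and an unstable one mimics a mismatched case. It is not the aircraft model of Section 6.

```python
import random


def time_to_failure(a, limit, rng, max_steps=100_000):
    """Steps until |z| first exceeds `limit` for z <- a*z + w, w ~ N(0, 1)."""
    z = 0.0
    for step in range(1, max_steps + 1):
        z = a * z + rng.gauss(0.0, 1.0)
        if abs(z) > limit:
            return step
    return max_steps      # censored run, counted as failing at the cap


def mean_time_to_failure(a, limit, runs, rng):
    """Sample mean of time-to-failure over independent simulation runs."""
    return sum(time_to_failure(a, limit, rng) for _ in range(runs)) / runs


rng = random.Random(0)            # fixed seed for repeatability
STEPS_PER_CYCLE = 10              # simulation steps per FDI period T_c

matched_ttf = mean_time_to_failure(a=0.9, limit=6.0, runs=200, rng=rng)
mismatched_ttf = mean_time_to_failure(a=1.1, limit=6.0, runs=200, rng=rng)

v_ii = STEPS_PER_CYCLE / matched_ttf          # T_c over mean time-to-failure
t_hd_ij = mismatched_ttf / STEPS_PER_CYCLE    # hard-deadline in FDI cycles
```

The matched estimate is a per-cycle failure probability, while the mismatched estimate is a number of cycles, so the two quantities enter the reliability model in different ways.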

5. Reliability Model Construction

The states of the semi-Markov chain $X_n$ for reliability evaluation are classified into two groups: one unique failure state, denoted by $s_{\mathrm{F}}$, and multiple functional states, defined as state combinations of $\zeta_n = i$ and $\eta_n = j$, denoted as $s_{ij}$, $i, j \in S$. For example, if two types of faults are considered in the plant, $\zeta_n$ includes the states fault-free, fault type 1, fault type 2, and both fault 1 and fault 2, represented by $S = \{0, 1, 2, 3\}$, and $X_n$ contains 17 states (16 functional states $s_{ij}$ plus $s_{\mathrm{F}}$).
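The state space for the two-fault example can be enumerated directly. The tuple representation and the names below are illustrative choices, not notation from the paper.

```python
from itertools import product

FAULT_MODES = [0, 1, 2, 3]    # fault-free, fault 1, fault 2, both faults
FAILURE = "sF"

# A functional state s_ij pairs the true fault mode i with the FDI mode j.
functional = list(product(FAULT_MODES, repeat=2))
states = functional + [FAILURE]

matched = [s for s in functional if s[0] == s[1]]
mismatched = [s for s in functional if s[0] != s[1]]
```

The split into matched and mismatched states mirrors the split between outcrossing failure rates and hard-deadlines in Section 4.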

The semi-Markov kernel of $X_n$ is denoted as $Q(\cdot, \cdot, m)$, representing the one-step transition probability in $m$ cycles. It is determined by the following parameters: (1) transition characteristics of fault and FDI modes, (2) outcrossing failure rate in state $s_{ii}$, denoted by $v_{ii}$, (3) hard-deadline in state $s_{ij}$, denoted by $T^{\mathrm{hd}}_{ij}$, (4) FDI SST period, denoted by $T^{\mathrm{SST}}_j$ for FDI mode $j$.

Let us begin with the case in which the FDI mode can be described as a hypothetical Markov chain $\tilde{\eta}_n$ with transition probabilities denoted by $H^k_{ij}$. The calculation of $Q$ is classified into the following cases.

Case 1. The transitions from functional states to themselves are not defined, and the corresponding elements are assigned as zeros:

$$Q(s_{ii}, s_{ii}, m) = 0, \qquad Q(s_{ij}, s_{ij}, m) = 0, \quad i, j \in S. \tag{13}$$

Case 2. Failure state $s_{\mathrm{F}}$ is absorbing:

$$Q(s_{\mathrm{F}}, s_{\mathrm{F}}, m) = \begin{cases} 1, & m = 1, \\ 0, & m > 1. \end{cases} \tag{14}$$

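Case 3. Initial states are matched states $s_{ii}$:

$$Q(s_{ii}, s_{\mathrm{F}}, m) = \begin{cases} (1 - v_{ii})^{m-1} G_{ii}^{m-1} v_{ii}, & m \le T^{\mathrm{SST}}_i, \\ p_{ii} \left[(1 - v_{ii}) G_{ii} H^i_{ii}\right]^{m - T^{\mathrm{SST}}_i - 1} v_{ii}, & m > T^{\mathrm{SST}}_i, \end{cases}$$

$$Q(s_{ii}, s_{ji}, m) = \begin{cases} (1 - v_{ii})^{m-1} G_{ii}^{m-1} (1 - v_{ii}) G_{ij}, & m \le T^{\mathrm{SST}}_i, \\ p_{ii} \left[(1 - v_{ii}) G_{ii} H^i_{ii}\right]^{m - T^{\mathrm{SST}}_i - 1} (1 - v_{ii}) G_{ij} H^i_{ii}, & m > T^{\mathrm{SST}}_i, \end{cases}$$

$$Q(s_{ii}, s_{ij}, m) = \begin{cases} 0, & m \le T^{\mathrm{SST}}_i, \\ p_{ii} \left[(1 - v_{ii}) G_{ii} H^i_{ii}\right]^{m - T^{\mathrm{SST}}_i - 1} (1 - v_{ii}) G_{ii} H^i_{ij}, & m > T^{\mathrm{SST}}_i, \end{cases}$$

$$Q(s_{ii}, s_{kj}, m) = \begin{cases} 0, & m \le T^{\mathrm{SST}}_i, \\ p_{ii} \left[(1 - v_{ii}) G_{ii} H^i_{ii}\right]^{m - T^{\mathrm{SST}}_i - 1} (1 - v_{ii}) G_{ik} H^i_{ij}, & m > T^{\mathrm{SST}}_i, \end{cases} \tag{15}$$

where $p_{ii} = \Pr\{X_1 = X_2 = \cdots = X_{T^{\mathrm{SST}}_i} = s_{ii} \mid X_0 = s_{ii}\} = (1 - v_{ii})^{T^{\mathrm{SST}}_i} G_{ii}^{T^{\mathrm{SST}}_i}$, $i \neq j$, $k \neq i$, $i, j, k \in S$.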
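The derivation of these equations is based on Markov transition probabilities and the decomposition of each event. For example,

$$Q(s_{ii}, s_{\mathrm{F}}, m) = \Pr\{X_1 = X_2 = \cdots = X_{m-1} = s_{ii},\ X_m = s_{\mathrm{F}} \mid X_0 = s_{ii}\} = \Pr\{X_1 = X_2 = \cdots = X_{m-1} = s_{ii} \mid X_0 = s_{ii}\} \times \Pr\{X_1 = s_{\mathrm{F}} \mid X_0 = s_{ii}\}. \tag{16}$$

Considering the SST of FDI, if $m \le T^{\mathrm{SST}}_i$,

$$\Pr\{X_1 = X_2 = \cdots = X_{m-1} = s_{ii} \mid X_0 = s_{ii}\} = (1 - v_{ii})^{m-1} G_{ii}^{m-1}. \tag{17}$$

If $m > T^{\mathrm{SST}}_i$,

$$\Pr\{X_1 = X_2 = \cdots = X_{m-1} = s_{ii} \mid X_0 = s_{ii}\} = \Pr\{X_1 = X_2 = \cdots = X_{T^{\mathrm{SST}}_i} = s_{ii} \mid X_0 = s_{ii}\} \times \left[(1 - v_{ii}) G_{ii} H^i_{ii}\right]^{m - T^{\mathrm{SST}}_i - 1}. \tag{18}$$

$Q(s_{ii}, s_{\mathrm{F}}, m)$ can be obtained by combining these two probabilities with $\Pr\{X_1 = s_{\mathrm{F}} \mid X_0 = s_{ii}\} = v_{ii}$.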
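The Case 3 formulas can be evaluated with a short sketch. It assumes that the per-cycle probability of remaining in $s_{ii}$ after the SST period is $(1 - v_{ii}) G_{ii} H^i_{ii}$ for every destination, as in (18). The two-mode numbers are hypothetical, and `H[k][i][j]` is an illustrative layout for $H^k_{ij}$.

```python
def case3_kernel(i, m, v, G, H, t_sst):
    """Return {destination: Q(s_ii, destination, m)} for a matched state s_ii.
    Destinations are "F" (failure) or (k, j), meaning state s_kj."""
    n = len(G)
    T = t_sst[i]
    stay_in_sst = (1.0 - v[i]) * G[i][i]        # FDI is held fixed during SST
    out = {}
    if m <= T:
        hold = stay_in_sst ** (m - 1)
        out["F"] = hold * v[i]
        for j in range(n):
            if j != i:                           # only the fault mode can change
                out[(j, i)] = hold * (1.0 - v[i]) * G[i][j]
        return out
    p_ii = stay_in_sst ** T
    stay = stay_in_sst * H[i][i][i]
    hold = p_ii * stay ** (m - T - 1)
    out["F"] = hold * v[i]
    for k in range(n):
        for j in range(n):
            if (k, j) != (i, i):
                out[(k, j)] = hold * (1.0 - v[i]) * G[i][k] * H[i][i][j]
    return out


v = [1e-3, 2e-3]                                 # hypothetical v_ii
G = [[0.99, 0.01], [0.0, 1.0]]                   # hypothetical fault transitions
H = [[[0.95, 0.05], [0.60, 0.40]],               # H^0
     [[0.30, 0.70], [0.10, 0.90]]]               # H^1
t_sst = [3, 2]
total = sum(sum(case3_kernel(0, m, v, G, H, t_sst).values())
            for m in range(1, 2001))
```

Summed over all destinations and all sojourn lengths, the kernel from a matched state adds up to one, which confirms that the four destination types in (15) exhaust the ways of leaving $s_{ii}$.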

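Case 4. Mismatched states, $s_{ij}$, $i \neq j$. When $m \le T^{\mathrm{SST}}_j$, the transition probability of $X_n$ to any other state is zero because of the SST period. When $T^{\mathrm{SST}}_j < m \le T^{\mathrm{hd}}_{ij}$, the probability of $X_n$ transiting to any other state is zero except to $s_{ii}$. The above reasoning is based on the facts that FDI rarely jumps to other false modes when the current mode is incorrect, and that the mean fault occurrence time is of a much higher order than a short false FDI detection period. Therefore, when $T^{\mathrm{SST}}_j < m \le T^{\mathrm{hd}}_{ij}$,

$$Q(s_{ij}, s_{ii}, m) = (H^i_{jj})^{m - T^{\mathrm{SST}}_j - 1} H^i_{ji}. \tag{19}$$

When $m > T^{\mathrm{hd}}_{ij}$, $X_n$ jumps to $s_{\mathrm{F}}$ at the earliest time $m = T^{\mathrm{hd}}_{ij} + 1$ only:

$$Q(s_{ij}, s_{\mathrm{F}}, T^{\mathrm{hd}}_{ij} + 1) = 1 - \sum_{k = T^{\mathrm{SST}}_j + 1}^{T^{\mathrm{hd}}_{ij}} Q(s_{ij}, s_{ii}, k) = 1 - \frac{1 - (H^i_{jj})^{T^{\mathrm{hd}}_{ij} - T^{\mathrm{SST}}_j}}{1 - H^i_{jj}} H^i_{ji}. \tag{20}$$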

In the general case, $\eta_n$ is modeled as a semi-Markov chain, and the competition probability method discussed in [4] can be utilized.

Definition 3. Given $\zeta_n = i$ and $\eta_n = j$, the combinational mode is denoted as $(i, j)$, $i, j \in S$. Suppose $(\zeta_{n+1}, \eta_{n+1}) = \cdots = (\zeta_{n+m-1}, \eta_{n+m-1}) = (i, j)$ and the next combinational mode after the subsequent transition of $\zeta_n$ or/and $\eta_n$ at $n + m$ is $(\zeta_{n+m}, \eta_{n+m}) = (k, l)$, where $k \neq i$ or/and $l \neq j$, $k, l \in S$. The probability of this event is called the competition probability, denoted by $\rho_{(i,j)(k,l)}(m)$.

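The calculation formulas of $\rho_{(i,j)(k,l)}(m)$ were derived in [4, Section 3] and are omitted here for brevity. As the states of $X_n$ are mainly defined as the state combinations of $\zeta_n$ and $\eta_n$, the calculation of the semi-Markov kernel of $X_n$ is simplified when $\rho_{(i,j)(k,l)}(m)$ is available, as shown in the following formulas:

$$\begin{aligned}
Q(s_{ii}, s_{kl}, m) &= (1 - v_{ii})^{m} \rho_{(i,i)(k,l)}(m), \\
Q(s_{ii}, s_{\mathrm{F}}, m) &= (1 - v_{ii})^{m-1} v_{ii}, \\
Q(s_{ii}, s_{ii}, m) &= 0, \\
Q(s_{ij}, s_{kl}, m) &= \begin{cases} \rho_{(i,j)(k,l)}(m), & m \le T^{\mathrm{hd}}_{ij},\ k = l = i, \\ 0, & \text{otherwise}, \end{cases} \\
Q(s_{ij}, s_{\mathrm{F}}, m) &= \begin{cases} 0, & m \le T^{\mathrm{hd}}_{ij}, \\ 1 - \sum_{r=1}^{T^{\mathrm{hd}}_{ij}} Q(s_{ij}, s_{ii}, r), & m > T^{\mathrm{hd}}_{ij}, \end{cases} \\
Q(s_{\mathrm{F}}, s_{\mathrm{F}}, m) &= \begin{cases} 1, & m = 1, \\ 0, & m > 1. \end{cases}
\end{aligned} \tag{21}$$

Although these formulas appear to be simpler, both the parameter estimation and the competition probability calculations carry a much heavier computational burden than the first case, in which the FDI decision is modeled as a hypothetical Markov chain. Once $X_n$ is constructed, the calculation of the reliability function and MTTF is straightforward using available formulas [9].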
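One way to turn a kernel $Q$ into a reliability curve is a first-passage recursion on the probability of having reached $s_{\mathrm{F}}$. The sketch below is a generic recursion for discrete-time semi-Markov chains and is not claimed to be the specific formulas of [9]. The callable `q(a, b, m)`, the state names, and the single-state test kernel are all hypothetical.

```python
def reliability_curve(states, fail, q, start, horizon):
    """Return [R(0), ..., R(horizon)], where R(n) is the probability that the
    failure state has not been reached n cycles after entering `start`."""
    # reached[a][n] = Pr{fail reached within n cycles | state a entered at 0}
    reached = {a: [0.0] * (horizon + 1) for a in states}
    reached[fail] = [1.0] * (horizon + 1)
    functional = [a for a in states if a != fail]
    for n in range(1, horizon + 1):
        for a in functional:
            total = 0.0
            for m in range(1, n + 1):           # sojourn of m cycles in a
                for b in states:
                    if b != a:
                        total += q(a, b, m) * reached[b][n - m]
            reached[a][n] = total
    return [1.0 - reached[start][n] for n in range(horizon + 1)]


V = 0.01        # hypothetical per-cycle failure probability
T_C = 0.1       # FDI period in seconds


def q_single(a, b, m):
    """Toy kernel: one functional state that can only fail."""
    if a == "s00" and b == "sF":
        return (1.0 - V) ** (m - 1) * V
    return 0.0


curve = reliability_curve(["s00", "sF"], "sF", q_single, "s00", horizon=50)
partial_mttf = T_C * sum(curve)   # truncated sum, so it underestimates MTTF
```

For the toy kernel the recursion reproduces the closed form $R(n) = (1 - v)^n$. In practice the horizon has to be long enough for $R(n)$ to decay before the truncated sum is a useful approximation of (2).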

6. Demonstration on an F-14 Aircraft Model

6.1. Model Description

A control problem of the F-14 aircraft was presented in [10], and is also used as a demonstration example in the MATLAB Robust Control Toolbox.¹ This problem considers the design of a lateral-directional axis controller during powered approach to a carrier landing with two command inputs from the pilot: lateral stick and rudder pedal. At an angle-of-attack of 10.5 degrees and an airspeed of 140 knots, the nominal linearized F-14 model has four states: lateral velocity, yaw rate, roll rate, and roll angle, denoted by $v$, $r$, $p$, and $\phi$, respectively; two control inputs: differential stabilizer deflection and rudder deflection, denoted by $\delta_{\mathrm{dstab}}$ and $\delta_{\mathrm{rud}}$, respectively; and four outputs: roll rate, yaw rate, lateral acceleration, and side-slip angle, denoted by $p$, $r$, $y_{\mathrm{ac}}$, and $\beta$, respectively. The system dynamics equations are omitted here, and can be loaded in MATLAB 7.1 using the command “load F14nominal.” An additional disturbance input is added to represent the wind gust effects.

The control objective is to have desired handling quality (HQ) responses from lateral stick to roll rate $p$ and from rudder pedal to side-slip angle $\beta$. Under fault-free modes, the HQ models are $5\left(\frac{2}{s+2}\right)$ and $2.5\left(\frac{1.25^2}{s^2 + 2.5s + 1.25^2}\right)$; when a fault occurs, the HQ models degrade to $5\left(\frac{1}{s+1}\right)$ and $2.5\left(\frac{0.75^2}{s^2 + 1.5s + 0.75^2}\right)$, respectively.

The system block diagram is shown in Figure 4, where F-14nom represents the nominal linearized F-14 model, and $A_S$ and $A_R$ the actuator models. $e_p$ and $e_\beta$ represent the weighted model matching errors. Actuator energy is described by $e_{\mathrm{act}}$, and noise is added to the measured output after antialiasing filters.

Figure 4: Control design diagram for F-14 lateral axis (Courtesy of The MathWorks, Inc.).

The considered faults occur in the two actuators. Under fault-free mode, their transfer functions are

$$A_S = A_R = \frac{25}{s + 25}. \tag{22}$$

Two types of actuator faults are considered here; each has a mean occurrence time of $10^5$ FDI periods, that is, a failure rate of $10^{-5}$. Under fault type 1, the transfer function of $A_S$ becomes

$$A_S = 0.5\,\frac{15}{s + 15}. \tag{23}$$

Under fault type 2, the transfer function of $A_R$ becomes

$$A_R = 0.5\,\frac{10}{s + 10}. \tag{24}$$

These fault modes are described as changes of actuator gains and time constants. The set of fault modes is denoted by $S = \{0, 1, 2, 3\}$, representing fault-free, fault type 1, fault type 2, and simultaneous occurrence of both.
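The effect of the two faults on the actuators can be made concrete with a short sketch. It assumes that the transfer functions (22)-(24) are first-order lags of the form $K a/(s + a)$, with $K = 1$, $a = 25$ when fault-free, $K = 0.5$, $a = 15$ under fault type 1, and $K = 0.5$, $a = 10$ under fault type 2. The dictionary layout and function names are illustrative.

```python
import math

# (gain K, pole a) of K * a / (s + a), under the reading stated above
ACTUATORS = {
    "fault-free A_S, A_R": (1.0, 25.0),
    "fault 1, A_S": (0.5, 15.0),
    "fault 2, A_R": (0.5, 10.0),
}


def step_response(gain, pole, t):
    """Unit-step response of gain * pole / (s + pole) at time t >= 0."""
    return gain * (1.0 - math.exp(-pole * t))


def summary(gain, pole):
    """DC gain and time constant of the first-order lag."""
    return {"dc_gain": gain, "time_constant_s": 1.0 / pole}


report = {name: summary(*params) for name, params in ACTUATORS.items()}
```

Under this reading, each fault halves the steady-state actuator gain and slows the actuator, from a 0.04 s time constant to roughly 0.067 s and 0.1 s, respectively.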

6.2. Performance Characterization of Controller and FDI

Four $H_\infty$ controllers are designed, one for each fault mode, to achieve nominal HQ control objectives under fault-free mode and degraded HQ performance under fault modes. Typical output trajectories under fault-free mode are shown in Figure 5, where the curves labeled with “Real” represent the measured outputs, “Ideal” the outputs under nominal HQ performance, and “Degraded” the outputs under degraded HQ performance. The absolute minimal matching errors between the real responses and the expected outputs under ideal HQ performance are shown in Figure 6, which are assumed to represent system safety behaviors. When these matching errors go over the safety limits, 30% of expected output, the aircraft is considered as failed.

Figure 5: Output trajectories.
Figure 6: The trajectories of matching errors.

An IMM FDI is constructed to detect fault occurrences. To reduce false alarms, a steady-state test strategy is applied on FDI decisions with $T^{\mathrm{SST}}_j = 6$ for any FDI mode $j$. A typical FDI trajectory is shown in Figure 7. It is clear that the steady FDI mode is free of false alarms in the shown time period. However, detection time delays are introduced when faults occur at 20 and 50 seconds, respectively.

Figure 7: FDI trajectory.

To represent FDI detection characteristics, a batch of fault and FDI history data is collected for statistical estimation. First, histograms of FDI delays are generated to check the distribution type. When there is no fault, the histogram of FDI sojourn time at fault-free mode is shown in Figure 8. It clearly resembles a geometric distribution. Equations (10)-(11) are then used to estimate Markov transition probabilities, and those under fault-free mode are obtained as

$$H^0 = \begin{bmatrix} 0.9990 & 0 & 0.0010 & 0.0000 \\ 1.0000 & 0 & 0 & 0 \\ 0.1330 & 0 & 0.8670 & 0 \\ 0.5000 & 0 & 0 & 0.5000 \end{bmatrix}. \tag{25}$$

Note that $H^0(2,1) = 1$ and $H^0(2,2) = 0$ represent the transition probabilities of FDI from a false alarm state. Estimated from the given history data, these values imply that the FDI leaves the false alarm state in one transition cycle. However, estimation error may exist, and the true value of $H^0(2,2)$ may be close to, but not exactly, zero.

Figure 8: Histogram of FDI sojourn time.

As a result of FDI false alarms, missed detections, and detection delays, controllers may be engaged for fault modes for which they are not designed. So, it is necessary to evaluate system behavior under all possible combinations of FDI and fault modes. Here, Monte Carlo simulations are adopted with the following settings: (1) command stick inputs are square waves with frequency as a random variable ranging from 0.2 to 2 Hertz, (2) wind gust disturbances and sensor measurement noises are assumed to be Gaussian processes, (3) actuator saturation effects limit control inputs to 20 and 30, respectively, (4) system failure is assumed to occur when model matching errors go over 30% of stick commands. For example, when fault mode 2 has occurred, the mean time to system failure is 57 403 seconds when controller $\mathcal{K}_2$ is used, and 6 seconds when $\mathcal{K}_1$ is used. Considering the sampling period to be 0.1 second for IMM FDI, the outcrossing failure rate and hard-deadline are $v_{22} = 1/574030$ and $T^{\mathrm{hd}}_{21} = 60$, respectively.
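The two quoted values follow from the simulated mean times to failure and the 0.1 second FDI period, using the estimators of Section 4 (ratio of $T_c$ to the mean time-to-failure for the matched case, and mean time-to-failure expressed in FDI cycles for the mismatched case):

```latex
v_{22} = \frac{T_c}{\text{mean time to failure}}
       = \frac{0.1\ \text{s}}{57\,403\ \text{s}}
       = \frac{1}{574\,030},
\qquad
T^{\mathrm{hd}}_{21} = \frac{6\ \text{s}}{0.1\ \text{s}} = 60\ \text{cycles}.
```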

6.3. Reliability Evaluation

The semi-Markov reliability model can be constructed based on fault transition rates, FDI transition parameters, outcrossing failure rates, and hard-deadlines. The predicted reliability function and MTTF can thereby be calculated. By using MTTF as an objective, an optimization is performed on $T^{\mathrm{SST}}$. It is found that MTTF will be improved from 27 727 to 32 605 seconds if $T^{\mathrm{SST}}_j$ is reduced from 6 to 1. A comparison of reliability functions before and after this optimization is shown in Figure 9. It is clearly shown that the reliability index is improved.

Figure 9: Reliability functions comparison.

Comparisons of the transition probabilities between these two SST periods are shown in Figure 10, in which each subfigure gives the transition probability curves from $s_{00}$ to other states. For example, the subfigure at the first row and second column shows that the transition probabilities to $s_{01}$ are increased from 0 to about 0.008. This is a natural result of increased false alarms when reducing $T^{\mathrm{SST}}_j$. In fact, when $T^{\mathrm{SST}}_j = 1$, the new Markov transition parameters $H^0$ become

$$H^0 = \begin{bmatrix} 0.9822 & 0.0017 & 0.0122 & 0.0038 \\ 0.2634 & 0.7366 & 0 & 0 \\ 0.1989 & 0 & 0.8011 & 0 \\ 0.3530 & 0 & 0 & 0.6470 \end{bmatrix}. \tag{26}$$

Figure 10: Comparison of transition probabilities.

Compared with $H^0$ in (25), the element on the first row and second column is increased from 0 to 0.0017, a confirmation of increased false alarms. On the other hand, detection delays are reduced approximately from 6 to 1, and the system stays for less time under mismatched fault and FDI cases. Overall, MTTF is improved.
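The change in false alarm behavior can be read off the two estimated matrices with a few lines of code. The matrices are those quoted in (25) and (26). The helper names are illustrative, and the mean sojourn refers to the hypothetical Markov chain, that is, before the SST period is added back.

```python
H0_SST6 = [[0.9990, 0.0, 0.0010, 0.0000],    # eq. (25), T_SST = 6
           [1.0000, 0.0, 0.0, 0.0],
           [0.1330, 0.0, 0.8670, 0.0],
           [0.5000, 0.0, 0.0, 0.5000]]
H0_SST1 = [[0.9822, 0.0017, 0.0122, 0.0038],  # eq. (26), T_SST = 1
           [0.2634, 0.7366, 0.0, 0.0],
           [0.1989, 0.0, 0.8011, 0.0],
           [0.3530, 0.0, 0.0, 0.6470]]


def false_alarm_probability(H):
    """Per-cycle probability that FDI leaves mode 0 while the plant is fault-free."""
    return 1.0 - H[0][0]


def mean_sojourn_cycles(H, i):
    """Mean of the geometric sojourn time at mode i, from eq. (9)."""
    return 1.0 / (1.0 - H[i][i])


fa_sst6 = false_alarm_probability(H0_SST6)
fa_sst1 = false_alarm_probability(H0_SST1)
sojourn_sst6 = mean_sojourn_cycles(H0_SST6, 0)
sojourn_sst1 = mean_sojourn_cycles(H0_SST1, 0)
```

Shortening the SST period raises the per-cycle false alarm probability at the fault-free mode by more than an order of magnitude, which is the cost that the shorter detection delay has to outweigh for MTTF to improve.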

This evaluation procedure can be completed in an online manner. The estimated FDI transition parameters $H$ and the current mode of $\zeta_n$ provided by the confirmation test on FDI can be used to provide an updated MTTF based on this most recent information.

7. Conclusions

A reliability monitoring scheme for FTCSs is reported in this paper. The scheme contains two postprocessing strategies on FDI results to provide an estimated fault mode for control reconfiguration and a confirmed mode for updating reliability. The stochastic transitions of the FDI mode are represented by a semi-Markov chain with parameters estimated from history data. Under geometric sojourn time distributions, the FDI mode can be described by an equivalent hypothetical Markov chain that simplifies its model and the reliability analysis. Safe and satisfactory operation of the system is defined by system trajectories and safety boundaries; the probability of violating this safety criterion under fixed fault and FDI modes is estimated using Monte Carlo simulations. The overall reliability evaluation is obtained through a semi-Markov model constructed by integrating FDI transition characteristics and failure probabilities under each regime model. This scheme provides timely monitoring of the reliability index of FTCSs, and was demonstrated on an F-14 aircraft model.


¹ MATLAB and Robust Control Toolbox are trademarks of The MathWorks, Inc.


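Appendix

Proof of Lemma 1. The “only if” part is trivial as shown in (8). Let $\eta_n$ denote a semi-Markov chain; the associated Markov renewal process is denoted as $\theta_m$ and $T_m$, and the sojourn time distribution $h^k_i$ when the subsequent state is not specified is a geometric distribution:

$$\Pr\{\eta_{n+1} = j \mid \eta_1, \ldots, \eta_n\} = \Pr\{\eta_{n+1} = j \mid \theta_1, \ldots, \theta_{S_n}, T_1, \ldots, T_{S_n}\}. \tag{A.1}$$

If $\theta_{S_n} = j$,

$$\begin{aligned}
\Pr\{\eta_{n+1} = j \mid \eta_1, \ldots, \eta_n\}
&= \Pr\{T_{S_n+1} > n+1 \mid \theta_1, \ldots, \theta_{S_n}, T_1, \ldots, T_{S_n}, T_{S_n+1} > n\} \\
&= \Pr\{T_{S_n+1} > n+1 \mid \theta_{S_n}, T_{S_n}, T_{S_n+1} > n\} \\
&= \Pr\{T_{S_n+1} - T_{S_n} > n+1 - T_{S_n} \mid \theta_{S_n}, T_{S_n+1} - T_{S_n} > n - T_{S_n}\} \\
&= \Pr\{T_{S_n+1} - T_{S_n} > 1 \mid \theta_{S_n}\} \\
&= \Pr\{\eta_{n+1} = j \mid \eta_n\};
\end{aligned} \tag{A.2}$$

otherwise, $\theta_{S_n} \neq j$, and we have

$$\begin{aligned}
\Pr\{\eta_{n+1} = j \mid \eta_1, \ldots, \eta_n\}
&= \Pr\{\theta_{S_n+1} = j, T_{S_n+1} = n+1 \mid \theta_1, \ldots, \theta_{S_n}, T_1, \ldots, T_{S_n}, T_{S_n+1} > n\} \\
&= \Pr\{\theta_{S_n+1} = j, T_{S_n+1} = n+1 \mid \theta_{S_n}, T_{S_n}, T_{S_n+1} > n\} \\
&= \Pr\{\theta_{S_n+1} = j, T_{S_n+1} - T_{S_n} = n+1 - T_{S_n} \mid \theta_{S_n}, T_{S_n+1} - T_{S_n} > n - T_{S_n}\} \\
&= \Pr\{\theta_{S_n+1} = j, T_{S_n+1} - T_{S_n} = 1 \mid \theta_{S_n}\} \\
&= \Pr\{\eta_{n+1} = j \mid \eta_n\}.
\end{aligned} \tag{A.3}$$

In the above derivations, the memoryless property of geometric distributions has been used:

$$\begin{aligned}
\Pr\{T_{S_n+1} - T_{S_n} > n+1 - T_{S_n} \mid T_{S_n+1} - T_{S_n} > n - T_{S_n}\} &= \Pr\{T_{S_n+1} - T_{S_n} > 1\}, \\
\Pr\{T_{S_n+1} - T_{S_n} = n+1 - T_{S_n} \mid T_{S_n+1} - T_{S_n} > n - T_{S_n}\} &= \Pr\{T_{S_n+1} - T_{S_n} = 1\}.
\end{aligned} \tag{A.4}$$

The Markov property of $\eta_n$ is proved, so $\eta_n$ is a Markov chain.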


  1. G. J. Balas, A. K. Packard, J. Renfrow, C. Mullaney, and R. T. M'Closkey, “Control of the F-14 aircraft lateral-directional axis during powered approach,” Journal of Guidance, Control, and Dynamics, vol. 21, no. 6, pp. 899–908, 1998.
  2. V. Barbu, M. Boussemart, and N. Limnios, “Discrete-time semi-Markov model for reliability and survival analysis,” Communications in Statistics: Theory and Methods, vol. 33, no. 11, pp. 2833–2868, 2004.
  3. R. V. Field Jr. and L. A. Bergman, “Reliability-based approach to linear covariance control design,” Journal of Engineering Mechanics, vol. 124, no. 2, pp. 193–199, 1998.
  4. W. Kuo and M. Zuo, Optimal Reliability Modeling, John Wiley & Sons, Hoboken, NJ, USA, 2002.
  5. H. Li, Q. Zhao, and Z. Yang, “Reliability modeling of fault tolerant control systems,” to appear in International Journal of Applied Mathematics and Computer Science.
  6. H. Li and Q. Zhao, “Reliability evaluation of fault tolerant control with a semi-Markov fault detection and isolation model,” Proceedings of the Institution of Mechanical Engineers Part I, vol. 220, no. 5, pp. 329–338, 2006.
  7. J. Song and A. Der Kiureghian, “Joint first-passage probability and reliability of systems under stochastic excitation,” Journal of Engineering Mechanics, vol. 132, no. 1, pp. 65–77, 2006.
  8. B. Walker, “Fault tolerant control system reliability and performance prediction using semi-Markov models,” in Proceedings of Safeprocess, pp. 1053–1064, Kingston Upon Hull, UK.
  9. N. E. Wu, “Coverage in fault-tolerant control,” Automatica, vol. 40, no. 4, pp. 537–548, 2004.
  10. Y. Zhang and X. R. Li, “Detection and diagnosis of sensor and actuator failures using IMM estimator,” IEEE Transactions on Aerospace and Electronic Systems, vol. 34, no. 4, pp. 1293–1313, 1998.