Abstract

Fault tolerant technology is often used to improve system reliability. However, high reliability makes it difficult to obtain sufficient fault samples, resulting in epistemic uncertainty, which significantly increases the challenge of diagnosing these systems. In this paper, a novel dynamic diagnosis strategy for complex systems is proposed to improve diagnostic efficiency; it makes full use of dynamic fault trees, Bayesian networks (BN), fuzzy sets theory, and TOPSIS. Specifically, it uses a dynamic fault tree to model dynamic fault modes and evaluates the failure rates of the basic events using fuzzy sets to deal with epistemic uncertainty. Furthermore, it generates qualitative structure information based on zero-suppressed binary decision diagrams and calculates quantitative parameters provided by reliability analysis using a hybrid approach. Additionally, sensor data are incorporated to update the qualitative information and quantitative parameters. Qualitative information, quantitative parameters, and the previous diagnosis result are taken into account to design a new dynamic diagnosis strategy, which can locate the fault at the lowest cost. Finally, a case study is given to verify the developed approach and to demonstrate its effectiveness.

1. Introduction

The growing demand for high reliability and safety in modern systems has caused these systems to become increasingly complex. Fault tolerant technologies are widely employed to improve their dependability. However, high reliability makes it difficult to obtain sufficient fault samples (epistemic uncertainty), and a complex structure makes components interact with each other (failure dependency of components), which significantly increases the challenge of diagnosing these systems. Efficient diagnostic approaches, which restore the system as quickly as possible and at minimal test cost, can reduce downtime and therefore enhance operational availability. To address this issue, many researchers have developed effective theories and methodologies. A sequential diagnostic algorithm based on heuristic information search was proposed by Johnson [1]; an efficient sequential test procedure was constructed using information theory to locate failures, but its results were not always optimal. Huang and Jing therefore presented a diagnosis strategy for multivalued attribute systems based on the Rollout algorithm, which could obtain the optimal diagnosis strategy [2]. Building on this work, Zhang et al. proposed a sequential fault diagnosis method with imperfect tests considering life cycle cost [3]. A function-based diagnosis strategy was presented, which constructed a correlation matrix and determined the diagnosis sequence using two indices for fault detection and fault isolation [4]. However, these approaches need to build the fault-test matrix beforehand, which is unsuitable for diagnosing redundant systems, because the correspondence between faulty components and tests in redundant systems is many-to-one rather than one-to-one. Doguc proposed a reliability-based diagnosis approach that calculated the system reliability parameters using Bayesian networks (BN) and continuously monitored the overall system reliability value [5].
When the detected change in system reliability exceeded the threshold, a search algorithm was used to locate the failed component, that is, the one with the greatest change between its prior and posterior probabilities. However, constructing the BN was based on model training, which needed sufficient fault data and could not deal with epistemic uncertainty. A diagnostic importance factor (DIF) was introduced to determine the diagnosis sequence using dynamic fault tree analysis [6], which could, to some extent, alleviate the fault-data acquisition bottleneck. On this basis, a diagnosis method was proposed to incorporate evidence information from sensors into the fault diagnosis in order to reduce the number of minimal cut sets (MCS) [7]. Nonetheless, the solution for the dynamic fault tree was based on a Markov chain (MC), which usually leads to the state space explosion problem. Furthermore, it did not update the posterior failure probability based on the sensor information, which affects diagnostic efficiency. In addition, the diagnosis strategies of these methods are based only on DIF and usually cause an MCS with a smaller DIF to be diagnosed first [8]. In the work of [9], reliability results were calculated by mapping a dynamic fault tree to a corresponding discrete-time BN (DTBN), and an efficient diagnostic algorithm, based on the DIF of both components and cut sequences, could overcome the above disadvantages. However, DTBN is still an approximate solution for the dynamic fault tree and requires huge memory resources to obtain the query variables' probabilities accurately. Furthermore, these diagnostic methods usually regard the failure rates of the components as crisp values describing their reliability characteristics. However, fault tolerant technology has greatly improved system reliability, which causes the epistemic uncertainty problem.
For this reason, it is impossible to estimate precisely the failure rates of the basic events, especially for new equipment. That is to say, crisp approaches have difficulty expressing the imprecise or vague nature of the system model and cannot deal with epistemic uncertainty. This frequently happens in systems where available fault data are insufficient or in a dynamically varying environment. Fuzzy approaches can be a good choice when little quantitative information is available [10, 11], so fuzzy fault tree analysis has been proposed to handle such issues. Tanaka et al. pioneered the use of fuzzy sets theory in fault tree analysis, treating the probabilities of basic events as trapezoidal fuzzy numbers, and used the fuzzy extension principle to calculate the probability of the top event [12]. Based on their work, many extensions were developed [13-17]. However, these approaches use the static fault tree to model the system fault behaviors and cannot cope with the dynamic failure behaviors of complex systems. Fuzzy dynamic fault tree (FDFT) analysis has therefore been introduced [18, 19], which takes into account the combination of failure events as well as the order in which they occur. Nonetheless, the solution for FDFT is still an MC-based approach and suffers from the notorious state space explosion problem. To overcome these difficulties and limitations, Duan and Zhou proposed a new diagnosis method that used fuzzy sets to evaluate the failure rates of the basic events and a dynamic fault tree model to capture the dynamic failure mechanisms [20]. However, the solution for the dynamic fault tree is still based on DTBN and cannot avoid the aforementioned problems. Besides, all these diagnosis algorithms are, in essence, single-attribute decision-making.
Although multiple attribute decision-making was used in [21, 22], both methods failed to incorporate sensors data and update the reliability results to optimize the diagnosis process using the previous diagnosis results.

Motivated by the problems mentioned above, this paper presents a dynamic diagnosis framework for redundant systems based on reliability analysis and sensors, considering epistemic uncertainty, shown in Figure 1. It pays special attention to meeting the above challenges. We adopt expert elicitation and fuzzy sets theory to deal with uncertainty by treating the failure rates and diagnostic test costs as fuzzy numbers. Furthermore, we use a dynamic fault tree model to capture the dynamic behaviors of the failure mechanisms of redundant systems and calculate the quantitative parameters provided by reliability analysis using BN and an algebraic technique in order to avoid the aforementioned problems. In addition, we present a new approach to incorporate sensor data and the previous diagnosis result to update the reliability parameters. Finally, qualitative information, DIF, the sensitivity index (SI), and the heuristic information value (HIV) are taken into account comprehensively to obtain the optimal diagnostic ranking order using an improved TOPSIS. The proposed method takes full advantage of the dynamic fault tree for modeling, fuzzy sets theory for handling uncertainty, BN for inference, and TOPSIS for the best fault search scheme, which makes it especially suitable for fault location in redundant systems. The remaining sections of this paper are organized as follows. Section 2 provides the dynamic fault tree model construction for redundant systems, the estimation of failure rates for the basic events, and the dynamic fault tree solution based on BN and an algebraic technique. A scheme is proposed to incorporate the evidence data from sensors to update the sum of all MCS and the reliability parameters in Section 3. Section 4 presents a new dynamic diagnosis algorithm using an improved TOPSIS solution, which makes full use of DIF, SI, HIV, and the previous diagnosis result. Section 5 is devoted to a simple illustrative example of the proposed approach.
Several conclusions and future research recommendations are presented in the final section.

2. Fault Model Construction and Analysis

2.1. Dynamic Fault Tree Model Construction

Fault tree is a deductive method to determine the potential causes that may lead to the occurrence of a predefined undesired event, generally denoted as the top event. Dynamic fault tree extends static fault tree to model the dynamic behavior of system failure mechanisms such as priorities of failure events, spares and dynamic redundancy management, and sequence-dependent events. Usually, there are six major types of dynamic gates defined: the sequence enforcing gate (SEQ), the functional dependency gate (FDEP), the priority AND gate (PAND), the cold spare gate (CSP), the hot spare gate (HSP), and the warm spare gate (WSP). The construction of the fault tree usually requires a deep understanding of the system and its components. It usually includes the following steps: (1) Define the top event to be analyzed. (2) Set boundary conditions for the analysis. (3) Identify and evaluate fault events. (4) Construct the fault tree using the equivalent gates. You may refer to [23] for more details.
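The gate types and construction steps above can be sketched as a small data structure. The following is a minimal illustration, not the paper's implementation; the node fields, names, and example rates are our own choices.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One node of a dynamic fault tree: either a basic event or a gate.

    Gate names follow the types listed above (AND/OR for static gates;
    SEQ, FDEP, PAND, CSP, HSP, WSP for dynamic gates)."""
    name: str
    gate: str = "BASIC"            # "BASIC" marks a basic event (leaf)
    children: List["Node"] = field(default_factory=list)
    failure_rate: float = 0.0      # per hour, for basic events
    dormancy: float = 1.0          # dormancy factor for spare inputs

# Step 1: define the top event; steps 2-4: set boundaries, identify
# fault events, and connect them with the appropriate gates.
pump_a = Node("PumpA", failure_rate=1e-4)
pump_b = Node("PumpB", failure_rate=1e-4, dormancy=0.3)  # warm spare
top = Node("LossOfFlow", gate="WSP", children=[pump_a, pump_b])
```

Here `PumpA` is the primary input of a warm spare gate and `PumpB` its spare with a hypothetical dormancy factor of 0.3.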

2.2. Fuzzy Failure Rate Generating Algorithm of the Basic Events

In order to calculate the reliability parameters needed for diagnosis, the failure rates of the basic events must be known in advance. However, fault tolerant technology has greatly improved system reliability, and high reliability causes a lack of sufficient fault data. For this reason, it is very difficult to estimate precisely the failure rates of the basic events, especially for new equipment. In this study, expert elicitation through several interviews and fuzzy sets theory are used to estimate the failure rates of the basic events through qualitative data processing. The procedure consists of four modules: experts' evaluation, a fuzzification module, a defuzzification module, and a failure rates generator module. Experts evaluate the failure rates of the basic events using qualitative natural language based on their knowledge of the system, which captures uncertainties, rather than by expressing judgments in a quantitative manner. The component failure rate is described by seven linguistic values: very high, high, reasonably high, moderate, reasonably low, low, and very low. The fuzzification module converts the experts' evaluations expressed in qualitative natural language into the operational format of fuzzy numbers, quantifying the qualitative data for each basic event into equivalent quantitative data in the form of the membership function of a fuzzy number. The quantitative data obtained from the fuzzification module are still fuzzy numbers and cannot be used directly for fault tree analysis because they are not crisp values. So each fuzzy number must be converted to a crisp score, named the fuzzy possibility score (FPS), which represents the expert's belief in the most likely occurrence of a basic event. This step is usually called defuzzification [24]. There are several defuzzification techniques.
We use an area defuzzification technique to realize this step, as it has the lowest relative error and the closest match with real data. The fuzzy possibility score of an event is then converted into the corresponding fuzzy failure rate (FFR) by the failure rates generator module. FFR can be obtained based on the logarithmic function proposed by Onisawa [25], which uses the concepts of error possibility and likely fault rate.
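The four modules above can be sketched as follows. This is a hedged illustration: the triangular membership functions for the linguistic values are made-up placeholders (the real ones come from the experts), the defuzzification shown is a simple centroid rather than the paper's exact area technique, and only Onisawa's logarithmic FPS-to-FFR conversion is taken from the literature.

```python
import math

# Hypothetical triangular fuzzy numbers (a, m, b) for three of the seven
# linguistic values; the actual membership functions are expert-defined.
LINGUISTIC = {
    "very low":  (0.0, 0.1, 0.2),
    "moderate":  (0.4, 0.5, 0.6),
    "very high": (0.8, 0.9, 1.0),
}

def aggregate(opinions):
    """Average the experts' triangular fuzzy numbers component-wise."""
    n = len(opinions)
    return tuple(sum(LINGUISTIC[o][k] for o in opinions) / n for k in range(3))

def defuzzify(tfn):
    """Centroid defuzzification of a triangular fuzzy number -> FPS."""
    a, m, b = tfn
    return (a + m + b) / 3.0

def fps_to_ffr(fps):
    """Onisawa's logarithmic conversion from a fuzzy possibility score
    (FPS) to a fuzzy failure rate (FFR, per hour)."""
    if fps == 0:
        return 0.0
    k = ((1.0 - fps) / fps) ** (1.0 / 3.0) * 2.301
    return 1.0 / (10.0 ** k)

# Three experts rate a basic event; the opinions here are illustrative.
fps = defuzzify(aggregate(["moderate", "moderate", "very high"]))
ffr = fps_to_ffr(fps)
```

For example, an FPS of 0.5 maps to an FFR of about 5e-3 per hour under Onisawa's formula.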

2.3. Qualitative Analysis of Dynamic Fault Tree

Once the fault tree has been developed for the top event, it can be evaluated to identify the MCS that would lead to the top event. Traditional algorithms such as the Boolean manipulation approach and binary decision diagram based methods are effective for solving MCS [26-29], but they are inappropriate for a dynamic fault tree. Zero-suppressed binary decision diagrams (ZBDD), introduced by Tang and Dugan, separate timing constraints from logic constraints and convert the dynamic fault tree into a static fault tree [30]. This algorithm generates the MCS of the resulting static fault tree using a few set operations and expands each MCS into minimal cut sequences by considering the timing constraints.

Let $S_1$ and $S_2$ be the inputs of MCS-AND and MCS-OR, respectively; the basic set operations are as follows:
$$G_{\mathrm{AND}} = \mathrm{minimize}(S_1 \otimes S_2), \qquad G_{\mathrm{OR}} = \mathrm{minimize}(S_1 \cup S_2),$$
where $\cap$, $\setminus$, $\cup$, and $\otimes$, respectively, represent set intersection, set difference, set union, and set product (the minimization step uses $\cap$ and $\setminus$ to discard non-minimal sets). $G_{\mathrm{AND}}$ and $G_{\mathrm{OR}}$ are the outputs of MCS-AND and MCS-OR, respectively.

The MCS generation algorithm is executed recursively during the depth-first left-most traversal of a fault tree. Firstly, it generates the MCS of the inputs of a connection gate and then carries out some operations to combine the MCS of the inputs into the MCS of the output of the connection gate. Finally, we can obtain all the minimal cut sequences from the MCS by considering the timing constraints. For convenience, we define the sum of all MCS as the characteristic function of the system.
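The recursive traversal described above can be sketched for the static (logic-only) part of the tree. This is an illustrative implementation of MCS-OR (set union) and MCS-AND (set product) with minimization, not the paper's ZBDD code; the node tuple format is our own choice, and timing constraints are omitted.

```python
def minimize(cut_sets):
    """Drop any cut set that strictly contains another (keep only minimal sets)."""
    return {c for c in cut_sets if not any(o < c for o in cut_sets)}

def mcs(node):
    """Minimal cut sets of a static fault tree node, as a set of frozensets.

    Node format: ("basic", name) for a basic event,
    ("gate", "AND"|"OR", [children]) for a gate."""
    if node[0] == "basic":
        return {frozenset([node[1]])}
    _, gate, children = node
    child_sets = [mcs(c) for c in children]
    if gate == "OR":
        out = set().union(*child_sets)          # MCS-OR: set union
    else:                                        # MCS-AND: set product
        out = child_sets[0]
        for s in child_sets[1:]:
            out = {a | b for a in out for b in s}
    return minimize(out)

# Example: T = A AND (B OR C)  ->  MCS {A,B} and {A,C}
tree = ("gate", "AND", [("basic", "A"),
                        ("gate", "OR", [("basic", "B"), ("basic", "C")])])
result = mcs(tree)
```

The minimization step also handles repeated events: for OR(A, AND(A, B)) the non-minimal set {A, B} is absorbed by {A}.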

2.4. Quantitative Analysis of Dynamic Fault Tree

After a dynamic fault tree is constructed and all basic events have exponentially distributed failure times with their corresponding failure rates, the reliability parameters can be calculated by solving the dynamic fault tree. The traditional solution for a dynamic fault tree is based on the MC model [31], which suffers from the state space explosion problem and cannot solve larger dynamic fault trees. Therefore, DTBN was proposed to solve the dynamic fault tree in [32]. Dynamic logic gates are converted to a DTBN and the reliability results are calculated using a universal BN inference algorithm. However, this is an approximate solution and requires large memory resources to calculate the probability distribution accurately. In addition, as the number of time intervals increases, both the accuracy and the execution time increase greatly. An improved algorithm has been introduced to reduce the dimension of the conditional probability tables by an order of magnitude [33]; however, this method fails to perform posterior probability updating. Yuge and Yanagi proposed a novel approach to solve the dynamic fault tree [34]. They use a modularization algorithm to classify the modules of the dynamic fault tree into two types: those that satisfy the parental Markov condition and those that do not. A module without the parental Markov condition is replaced with an equivalent single event, whose occurrence probability is obtained as the sum of disjoint sequence probabilities. After the contraction of the modules without the parental Markov condition, the standard BN algorithm is applied to the dynamic fault tree. In this paper, we extend this method and calculate the reliability parameters by mapping the dynamic fault tree into a BN in order to overcome the disadvantages mentioned above. For simplicity, we suppose that the dynamic fault tree has no repeated events and that all components have only two states (state 1 for failure and state 0 for working).

2.4.1. Fault Probability of a Module with Sequence Dependence

Let us consider an event sequence with $n$ events, $e_1 \prec e_2 \prec \cdots \prec e_n$, including some spare events. An event in the sequence is expressed by $e_{i[j]}$, which means that the event that failed in the $i$th order of the sequence is designated a spare of the event that failed in the $j$th order. $e_{i[0]}$ indicates an event that was originally in active mode. $e_{i[j]}$ ($j \neq 0$) has a dormancy factor $\alpha_i$. The sequence probability of $e_1 \prec \cdots \prec e_n$ can be calculated using the following $n$-tuple integration:
$$P = \int_0^t \int_{t_1}^t \cdots \int_{t_{n-1}}^t \prod_{i=1}^{n} g_i(t_i)\, \mathrm{d}t_n \cdots \mathrm{d}t_1,$$
where $t_i$ indicates the occurrence time of $e_i$, and the factor $g_i(t_i)$ is built from $f_i$, the probability density function of $e_i$, and from the survival function of $e_i$ in standby mode (for spare events, $g_i$ accounts for the standby period before activation). $A$ is the set of events that were originally in active mode and $B_a$ ($B_s$) is the set of spare events that fail in active (standby) mode [34].

When the failure time of $e_i$ in active mode is subject to an exponential distribution with rate $\lambda_i$, the sequence probability is
$$P = \prod_{i=1}^{n}\lambda_i'\;\mathcal{L}^{-1}\!\left[\frac{1}{s\prod_{i=1}^{n}(s+a_i)}\right],$$
where $\lambda_i'$ is the rate with which $e_i$ actually fails ($\lambda_i' = \alpha_i\lambda_i$ for a spare failing in standby mode), $a_i = \sum_{k=i}^{n}\lambda_k^{(i)}$ with $\lambda_k^{(i)}$ the (active or dormant) failure rate of $e_k$ during the $i$th interval, for $i = 1, \dots, n$, and $\mathcal{L}^{-1}$ is the inverse Laplace transform operator.

If every $a_i$ in the above equation is distinct from the others, the sequence probability has the closed form
$$P = \prod_{i=1}^{n}\lambda_i'\left[\frac{1}{\prod_{i=1}^{n}a_i} + \sum_{i=1}^{n}\frac{e^{-a_i t}}{-a_i\prod_{j\neq i}(a_j - a_i)}\right],$$
where $t$ is the mission time.
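The closed-form sequence probability can be checked numerically. The sketch below covers only the all-active case (no spares, so each effective rate equals the component rate and $a_i$ is the sum of the rates of the not-yet-failed components); the rates and mission time are made-up values, and the result is cross-checked against both the direct two-component formula and Monte Carlo simulation.

```python
import math
import random

def seq_prob(rates, t):
    """P(e1 fails before e2 ... before en, all by time t) for independent,
    always-active exponential components, via the partial-fraction inverse
    Laplace transform (requires all a_i distinct)."""
    n = len(rates)
    a = [sum(rates[i:]) for i in range(n)]  # a_i = sum of rates of survivors
    total = 1.0 / math.prod(a)
    for i in range(n):
        denom = -a[i] * math.prod(a[j] - a[i] for j in range(n) if j != i)
        total += math.exp(-a[i] * t) / denom
    return math.prod(rates) * total

lam, t = [0.001, 0.002], 1000.0
closed = seq_prob(lam, t)

# Monte Carlo cross-check of P(T1 < T2 <= t)
random.seed(0)
N = 200_000
hits = 0
for _ in range(N):
    t1 = random.expovariate(lam[0])
    t2 = random.expovariate(lam[1])
    if t1 < t2 <= t:
        hits += 1
mc = hits / N
```

For two active components, the same probability is $(1-e^{-\lambda_2 t}) - \frac{\lambda_2}{\lambda_1+\lambda_2}(1-e^{-(\lambda_1+\lambda_2)t})$, which the closed form reproduces.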

2.4.2. Mapping Dynamic Fault Tree into BN

Dynamic fault tree includes static logic gates and dynamic logic gates. For the fault tree only with static logic gates, it is straightforward to map a fault tree into a BN: one only needs to “redraw” the nodes and connect them while correctly enumerating reliabilities. Figure 2 shows the conversion of an AND gate into corresponding nodes in a BN and its conditional probability table (CPT). Dynamic fault tree extends traditional fault tree by defining special gates to model the components’ sequential and functional dependencies. So it is different from the static fault tree to map a dynamic fault tree into a BN. Next, we will discuss the WSP gate as it will be later used in our example.

The WSP gate has one primary input and one or more alternate inputs. The primary input is initially powered on and the alternate inputs are in standby mode. When the primary input fails, it is replaced by an alternate input; in turn, when this alternate input fails, it is replaced by the next available alternate input, and so on. In standby mode, the component failure rate is reduced by a factor $\alpha$ called the dormancy factor, a number between 0 and 1. A cold spare has a dormancy factor $\alpha = 0$, and a hot spare has a dormancy factor $\alpha = 1$. The WSP gate output is true when the primary input and all the alternate inputs fail. Figure 3 shows the WSP gate, with primary input $A$ and spare input $B$, and its equivalent BN. Table 1 shows the CPT of the node WSP. Suppose that $A$ and $B$ follow the same exponential distribution with rate $\lambda$. Here, $P_{AB}$ and $P_{BA}$ in this table are sequence probabilities calculated by the following equation:
$$P_{AB} = \lambda^2\,\mathcal{L}^{-1}\!\left[\frac{1}{s(s+\lambda+\alpha\lambda)(s+\lambda)}\right], \qquad P_{BA} = \alpha\lambda^2\,\mathcal{L}^{-1}\!\left[\frac{1}{s(s+\lambda+\alpha\lambda)(s+\lambda)}\right].$$

If the dormancy factor $\alpha = 0$, the WSP gate turns into the cold spare gate and (6) can be written as follows:
$$P_{AB} = \lambda^2\,\mathcal{L}^{-1}\!\left[\frac{1}{s(s+\lambda)^2}\right] = 1 - e^{-\lambda t} - \lambda t\, e^{-\lambda t}, \qquad P_{BA} = 0.$$

The output of node WSP is an AND gate whose CPT is shown in Figure 2.
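The two WSP sequence probabilities (primary-then-spare and spare-then-primary, denoted `p_ab` and `p_ba` here) can be evaluated with the same partial-fraction inversion. This is a numeric sketch under our reading of the equations above, with made-up values of $\lambda$ and $t$; it is sanity-checked against the known limits (a hot spare behaves like two independent active components, and a cold spare follows the Erlang-2 distribution).

```python
import math

def inv_laplace_prod(a, t):
    """L^{-1}[ 1 / (s * prod_i (s + a_i)) ](t) via partial fractions,
    valid when all a_i are distinct."""
    total = 1.0 / math.prod(a)
    for i, ai in enumerate(a):
        denom = -ai * math.prod(aj - ai for j, aj in enumerate(a) if j != i)
        total += math.exp(-ai * t) / denom
    return total

def wsp_sequence_probs(lam, alpha, t):
    """Sequence probabilities P_AB and P_BA for a WSP gate with primary A,
    spare B, identical active failure rate lam, and dormancy factor alpha."""
    a = [lam + alpha * lam, lam]      # 1st interval: A active, B dormant
    base = inv_laplace_prod(a, t)
    p_ab = lam * lam * base           # A fails (active), then B (active)
    p_ba = alpha * lam * lam * base   # B fails (dormant), then A (active)
    return p_ab, p_ba

lam, t = 0.001, 2000.0
p_ab, p_ba = wsp_sequence_probs(lam, 1.0, t)  # hot spare limit (alpha = 1)
hot_fail = p_ab + p_ba                         # should equal (1 - e^{-lam t})^2
```

For $\alpha = 1$ the two orderings are equally likely, and the gate failure probability reduces to that of two independent active components.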

2.4.3. Calculating Reliability Parameters

After a dynamic fault tree model is constructed, we can convert the dynamic fault tree into an equivalent BN using the proposed method. Once the structure of the BN is known and all the probability tables are filled, it is straightforward to calculate the reliability parameters of the system using the BN inference algorithm. These reliability parameters mainly include the system unreliability, DIF, and SI. Calculating the system unreliability is straightforward; it can be obtained using the following equation:
$$U_S(t) = P(S = \text{state1}),$$
where state1 represents the failure of the system or a component.

DIF is defined conceptually as the probability that an event has occurred given the fact that the top event has also occurred; it is the cornerstone of reliability-based diagnosis methodology [6]. This quantitative measure allows us to discriminate between components by their importance from a diagnostic point of view. Components with a larger DIF are diagnosed first, which assures a reduced number of system checks while bringing the system back. Consider
$$DIF_C = P(C = 1 \mid S = 1),$$
where $C$ is a component in system $S$.

Suppose that the system has failed at the mission time; we enter the evidence that the system has failed, that is, $S = 1$, and calculate DIF using the BN inference algorithm.

Sensitivity analysis allows the designer to see how far the components in a system contribute to the top event failure and to quantify the impact that improving component reliability will have on the overall system reliability. Here we show how one can perform sensitivity analysis through the use of SI [35]. The SI of the $i$th basic event is defined as
$$SI_i = \frac{P(T = 1) - P(T = 1 \mid e_i = 0)}{P(T = 1)},$$
where $P(T = 1)$ is the probability of the top event failure and $P(T = 1 \mid e_i = 0)$ is the probability that the top event has occurred given the fact that the basic event $e_i$ has not occurred.
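The three reliability parameters of this section can be computed exactly on a toy model by enumerating component states, which is what a BN inference engine does (far more efficiently) at scale. The example system, its structure function, and the failure rates below are all hypothetical.

```python
import itertools
import math

def system_fails(state):
    """Structure function of a hypothetical system (1 = failed):
    top event T = (A AND B) OR C, a redundant pair plus one weak point."""
    return (state["A"] and state["B"]) or state["C"]

def joint(q, state):
    """Joint probability of a component state vector (independent components)."""
    p = 1.0
    for c, qc in q.items():
        p *= qc if state[c] else (1.0 - qc)
    return p

def prob_top(q, condition=None):
    """P(T = 1 | condition) by brute-force enumeration over all states."""
    num = den = 0.0
    for bits in itertools.product([0, 1], repeat=len(q)):
        state = dict(zip(q, bits))
        if condition is not None and not condition(state):
            continue
        p = joint(q, state)
        den += p
        num += p * system_fails(state)
    return num / den

def dif(q, comp):
    """DIF_C = P(C = 1 | T = 1)."""
    num = 0.0
    for bits in itertools.product([0, 1], repeat=len(q)):
        state = dict(zip(q, bits))
        if state[comp] and system_fails(state):
            num += joint(q, state)
    return num / prob_top(q)

def si(q, comp):
    """SI_i = (P(T=1) - P(T=1 | e_i = 0)) / P(T=1)."""
    u = prob_top(q)
    return (u - prob_top(q, condition=lambda s: s[comp] == 0)) / u

t = 2000.0                                    # mission time, hours
rates = {"A": 1e-3, "B": 2e-3, "C": 1e-4}     # made-up failure rates
q = {c: 1.0 - math.exp(-lam * t) for c, lam in rates.items()}
U = prob_top(q)                               # system unreliability
```

On this model the unreliability matches the analytic value $q_A q_B + q_C - q_A q_B q_C$, and the DIF and SI values order the components for diagnosis.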

3. Incorporating Sensors Information

3.1. Sensors Diagnostic Model

When a redundant system fails, additional evidence from diagnostic sensors is sometimes observed too, and this may be used to optimize the diagnosis process. Usually, for a system that undergoes reliability analysis, the fault tree is constructed without including diagnostic sensors. To optimize the diagnosis process, Dugan proposed an approach that appends a sensor layer for capturing evidence onto the dynamic fault tree without impacting the reliability analysis; the sensor layer uses static gates to represent sensors [7]. An algorithm for using evidence information was developed to reduce the number of suspected MCS. However, this algorithm updates only the MCS according to sensor data and fails to update the quantitative information. As we know, the BN created from the dynamic fault tree is appropriate for reliability analysis. In order to use the BN for fault diagnosis, we need to add evidence nodes representing the evidence information. Each evidence node in the BN is linked to the component it observes, with the link directed from the component to the evidence node. Each evidence node is given a CPT built from the probability of producing the observation results. Figure 4 shows the sensors diagnostic model. S1 and S2 are sensors, each detecting one component. Supposing that the sensors do not fail during the mission time, we obtain the CPT of S1 shown in Figure 4. This sensors diagnostic model does not affect the system reliability analysis and can update the qualitative information and quantitative parameters according to sensor data. In addition, the performance of a diagnostic system highly depends on the number and location of sensors. In this paper, we determine the best locations for sensors based on expected cost specifications.

3.2. Updating the Qualitative Information

If sensors detect the failure of some components, we can adopt the evidence information to reduce the number of MCS to be diagnosed. Since examining the cut set that caused the system to fail and then fixing the bad components in that cut set should bring the system up, we can enhance diagnosis by reducing the number of cut sets examined. The cut sets under evidence (CUE) are the set of all essential MCS remaining after the evidence eliminates some cut sets. We can use the evidence information from sensors to manipulate the characteristic function of the system in order to generate the CUE function using Algorithm 1.

Input:
Φ: the characteristic function (the sum of all MCS, each MCS a product of component events)
E: evidence information function
e_c: e_c = 1 if the failure of component c is observed, otherwise e_c = 0
Output: CUE
   CUE = 0
   for (each product p in Φ)
    temp = 1
    for (each component c in p)
     if (c is monitored and e_c = 0)    /* sensor shows c has not failed */
        temp = 0
     else  temp = temp · c
    CUE = CUE + temp
   return (CUE)
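Our reading of Algorithm 1 is that any cut set containing a component whose sensor reports "no failure" cannot explain the observed system failure and is eliminated. The sketch below implements that filter over an explicit MCS list rather than the symbolic characteristic function; the cut sets and evidence are made-up examples.

```python
def cut_sets_under_evidence(mcs_list, evidence):
    """Filter minimal cut sets using sensor evidence.

    evidence maps a monitored component to 1 (failure observed) or
    0 (no failure observed); unmonitored components are simply absent.
    A cut set containing a component observed healthy cannot have caused
    the system failure and is dropped from the CUE."""
    cue = []
    for cut in mcs_list:
        if all(evidence.get(c, 1) == 1 for c in cut):
            cue.append(cut)
    return cue

# Hypothetical MCS of a system and sensor readings:
mcs_list = [{"A", "B"}, {"C"}, {"A", "D"}]
evidence = {"A": 0, "C": 1}   # sensor: A healthy, C failed
cue = cut_sets_under_evidence(mcs_list, evidence)
```

With A observed healthy, both cut sets containing A are eliminated, leaving only {C} to examine.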
3.3. Updating the Quantitative Parameters

In addition, we can use the evidence information to update quantitative parameters such as DIF and SI. The DIF of the components under the evidence information conditions can be calculated using (11). Now we input the evidence information to the BN and update the DIF and SI of the components using the inference algorithm. Consider
$$DIF_{C|E} = P(C = 1 \mid S = 1, E),$$
where $C$, $S$, and $E$ represent the component, the system, and the evidence information, respectively.

4. Dynamic Diagnosis Strategy

In essence, the optimal diagnostic sequence problem is a multiple attribute decision-making (MADM) problem. MADM models try to choose the best alternative given a set of selection attributes and a set of alternatives. Generally, there are three independent steps in MADM models to obtain the ranking order of alternatives [36]: (1) determine the relevant attributes and alternatives; (2) allocate numerical measures to the relative importance of the attributes and to the impacts of the alternatives on these attributes; (3) determine a ranking score for each alternative using a corresponding algorithm. The Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), developed by Hwang and Yoon, is a useful technique for dealing with MADM problems [37]. It is based on the concept that the best alternative should have the farthest distance from the negative ideal solution (NIS) and the shortest distance from the positive ideal solution (PIS). The positive ideal solution minimizes the cost attributes and maximizes the benefit attributes, whereas the negative ideal solution minimizes the benefit attributes and maximizes the cost attributes. TOPSIS takes full advantage of the attribute information, gives an optimal ranking of alternatives, and does not require attribute preferences to be independent. This method has the advantages of simplicity and the ability to obtain an indisputable preference order. In the process of TOPSIS, the performance ratings are usually given as crisp values, but crisp data are often insufficient to model real-life situations. Since experts' opinions are often vague and cannot be expressed with exact numerical values, a more feasible solution may be to use linguistic evaluations instead of numerical values, that is, to suppose that the ratings of the attributes are evaluated by means of linguistic variables. In this paper, we propose an improved TOPSIS to solve the MADM problem.

4.1. Constructing and Normalizing Diagnostic Decision Table

DIF enables us to discriminate between components by their importance from a diagnostic point of view. SI allows the designer to quantify the importance of each of the system's components and the impact that improving component reliability will have on the overall system reliability. So we treat DIF and SI as attributes $u_1$ and $u_2$, respectively. Owing to the differing complexity of the components, their test costs differ, so a balance should be struck between DIF and test cost. Therefore, we introduce a new measure of importance called HIV, which allows us to optimize the cost of diagnosis. This measure is simply the DIF per unit cost. HIV has an important effect on the diagnostic sequence and is treated as attribute $u_3$. HIV appears in the following equation:
$$HIV_i = \frac{DIF_i}{c_i},$$
where $DIF_i$ is the DIF of component $i$ and $c_i$ is the test cost of component $i$.

Different attributes usually have different values and dimensions, which are not always directly comparable, so we should normalize the diagnostic decision table. For the quantitative data, we normalize them with the following equation:
$$r_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{n} x_{ij}^2}},$$
where $x_{ij}$ is the $j$th attribute value of the $i$th component.

For the fuzzy numbers, we normalize them with the following equation:
$$\tilde r_{ij} = \frac{\tilde a_{ij}}{\max_i \|\tilde a_{ij}\|},$$
where $\tilde a_{ij} = (a_{ij}^{(1)}, a_{ij}^{(2)}, a_{ij}^{(3)})$ and $\|\tilde a_{ij}\|$ is the module of the triangular fuzzy number:
$$\|\tilde a_{ij}\| = \sqrt{\frac{\big(a_{ij}^{(1)}\big)^2 + \big(a_{ij}^{(2)}\big)^2 + \big(a_{ij}^{(3)}\big)^2}{3}}.$$

4.2. Determining the Weights of Attributes

A weight can be defined as a value assigned to an evaluation criterion that indicates its importance relative to the other criteria under consideration. Many methods can be used to calculate the relative weights. In this step, Shannon's entropy is applied to determine the relative weights of the attributes. In entropy-based weighting, if all alternatives have the same value for one of the decision measures, its entropy is maximized, which means that this measure is of no importance to the decision-maker because the relative weights of the alternatives are the same. Alternatively, if a measure is below its maximum entropy, the alternatives are highly differentiated and the measure therefore carries a higher relative importance weight. It follows that the relative importance weight of a given measure is inversely proportional to its entropy. The normalized ratings are calculated as described below:
$$p_{ij} = \frac{r_{ij}}{\sum_{i=1}^{n} r_{ij}},$$
where $n$ is the number of alternatives and $p_{ij}$ is the weight factor.

Then, the entropy value of each attribute can be calculated as follows:
$$E_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij},$$
where $E_j$ is the entropy of the normalized values of attribute $j$.

The nonreliance, or deviation, $d_j$ obtained from the data for attribute $j$ states how much useful information the relevant attribute places at the disposal of the decision-maker. The value of $d_j$ is obtained from the following equation [38]:
$$d_j = 1 - E_j.$$

Finally, the weights of the attributes can be calculated as follows:
$$w_j = \frac{d_j}{\sum_{k} d_k},$$
where $w_j$ is the weight of attribute $j$.

4.3. Determining the Positive and Negative Ideal Solutions

Attributes can be divided into two classes: benefit attributes, where larger performance scores are preferred, and cost attributes, where smaller performance scores are preferred. There are three attributes in the diagnostic decision table, and they all belong to the benefit attributes. The positive ideal solution is made of all the best performance scores and the negative ideal solution is made of all the worst performance scores for these measures in the diagnostic decision table. When the attribute is a benefit attribute, the positive and negative ideal solutions are calculated as
$$v_j^+ = \max_i v_{ij}, \qquad v_j^- = \min_i v_{ij},$$
where $v_j^+$ is the maximal value of the $j$th attribute and $v_j^-$ is the minimal value of the $j$th attribute.

When the attribute is a cost attribute, the positive and negative ideal solutions are calculated as
$$v_j^+ = \min_i v_{ij}, \qquad v_j^- = \max_i v_{ij}.$$

4.4. Determining the Optimal Diagnosis Sequence

The distance of each alternative to the positive ideal solution and to the negative ideal solution is known in the literature as the Euclidean distance and can be calculated using the following equations:
$$D_i^+ = \sqrt{\sum_{j}\big(v_{ij} - v_j^+\big)^2}, \qquad D_i^- = \sqrt{\sum_{j}\big(v_{ij} - v_j^-\big)^2}.$$
After $D_i^+$ and $D_i^-$ of each alternative have been calculated, a closeness coefficient is introduced to determine the ranking order of all alternatives. The closeness coefficient of each alternative is calculated as
$$CC_i = \frac{D_i^-}{D_i^+ + D_i^-}.$$

The closeness coefficient measures how far an alternative is from the best and worst solutions. Obviously, an alternative comes closer to the PIS and farther from the NIS as $CC_i$ approaches 1. Therefore, we can determine the optimal ranking order of all alternatives and choose the best one, with the biggest closeness coefficient, from among the set of feasible alternatives.
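Sections 4.1-4.4 can be sketched end to end for the crisp case (the fuzzy normalization of the linguistic test costs is omitted here). The decision table below is a made-up example with rows as components and columns as DIF, SI, and HIV, all treated as benefit attributes per the text.

```python
import math

def entropy_weights(X):
    """Shannon-entropy weights for decision matrix X (rows = alternatives)."""
    n, m = len(X), len(X[0])
    d = []
    for j in range(m):
        col = [X[i][j] for i in range(n)]
        s = sum(col)
        p = [x / s for x in col]                      # p_ij
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        d.append(1.0 - e)                             # deviation d_j = 1 - E_j
    total = sum(d)
    return [x / total for x in d]                     # w_j

def topsis(X, benefit=None):
    """Closeness coefficient CC_i of each alternative (row of X)."""
    n, m = len(X), len(X[0])
    benefit = benefit or [True] * m
    # vector normalization, then entropy weighting
    norms = [math.sqrt(sum(X[i][j] ** 2 for i in range(n))) for j in range(m)]
    w = entropy_weights(X)
    V = [[w[j] * X[i][j] / norms[j] for j in range(m)] for i in range(n)]
    pis = [max(V[i][j] for i in range(n)) if benefit[j]
           else min(V[i][j] for i in range(n)) for j in range(m)]
    nis = [min(V[i][j] for i in range(n)) if benefit[j]
           else max(V[i][j] for i in range(n)) for j in range(m)]
    cc = []
    for i in range(n):
        dp = math.sqrt(sum((V[i][j] - pis[j]) ** 2 for j in range(m)))
        dn = math.sqrt(sum((V[i][j] - nis[j]) ** 2 for j in range(m)))
        cc.append(dn / (dp + dn))
    return cc

# Hypothetical decision table: columns = DIF, SI, HIV for three components
X = [[0.90, 0.30, 0.45],
     [0.60, 0.10, 0.30],
     [0.20, 0.05, 0.10]]
cc = topsis(X)
```

In this example the first component dominates every attribute, so its closeness coefficient is 1 and it would be checked first.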

4.5. Updating Diagnostic Decision Table Using the Previous Diagnosis Result

As mentioned above, DIF, SI, and HIV are considered comprehensively to obtain the optimal diagnostic ranking order of redundant systems using an improved TOPSIS. The component with a bigger closeness coefficient should be checked first. This assures a reduced number of system checks while repairing the system. However, this method alone fails to update the reliability results in order to optimize the diagnosis process using the previous diagnosis result. That is to say, if DIF, SI, and HIV are not updated along with the previous diagnosis result, the diagnosis performance suffers significantly. So, when the previous diagnosis fails, we should feed this evidence information to the BN and update the reliability results. Furthermore, the diagnostic decision table can be updated and the closeness coefficients recalculated to determine the next optimal ranking order using the aforementioned method. Finally, we can obtain the optimal diagnostic ranking order of redundant systems.

5. Case Study

5.1. Dynamic Fault Tree of Braking System

The microcomputer controlled straight electropneumatic braking system has the advantages of swift response, flexible operation, combined application with electric braking, and antislip control, which have made it the first-choice braking system for urban rail transit. It is an electromechanical control system and achieves its function through the coordination of an electrical circuit part and an air circuit part. Specifically, it includes the power unit for the braking system, the service braking instruction processing unit, the service braking control unit, the emergency braking instruction processing unit, the air supply unit, and the braking execution unit. The service braking instruction processing unit includes the braking controller, the logic controller, and the braking instruction line; it generates the service braking signals and transmits them to the braking control unit of every vehicle. The service braking control unit receives the service braking signals, calculates the service braking force, and monitors the braking system state; it consists of the microcomputer brake control unit (MBCU) and several valves. Four modules (empty weight valves, under compaction switch, emergency braking button, and emergency braking switch) form the emergency braking instruction processing unit, which generates the emergency braking signals and transmits them to the emergency braking control unit. The air supply unit offers air for the braking system, and the train is actuated to brake by the braking execution unit. A high degree of coupling and complicated logic relationships exist among these modules, so we use a dynamic fault tree to model the dynamic fault behaviors. Figure 5 shows a dynamic fault tree for service braking failure of a microcomputer controlled straight electropneumatic braking system. Table 2 shows the fuzzy failure rates of the basic events for the braking system, obtained using expert elicitation and fuzzy sets theory.

5.2. Calculating Reliability Data

After the dynamic fault tree is constructed and the failure rates of all basic events are generated, the reliability results of the braking system can be obtained by qualitative and quantitative analysis. We use the ZBDD method mentioned above to obtain all minimal cut sets (MCS) of the braking system. The characteristic function of the braking system is

For quantitative analysis, we map the dynamic fault tree into the equivalent BN to calculate the DIF, SI, and HIV. Assuming that the mission time is 2000 hours, we enter the evidence that the braking system has failed:

Solving the BN using the inference algorithm gives the DIF of components for the braking system in Table 3.
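As a concrete illustration of what the BN inference computes, the sketch below evaluates the DIF of each component, defined as P(component down | system down), by brute-force enumeration over a toy model. The function name and the two-component test systems are ours, not the paper's braking-system model; real BN inference avoids this exponential enumeration.

```python
from itertools import product

def diagnostic_importance(p_fail, system_failed):
    """DIF of each component: P(component down | system down).

    p_fail: maps component name -> failure probability at mission time
    system_failed: maps a {component: 0/1} state dict (1 = failed) -> bool
    Computed by exhaustive enumeration, a toy stand-in for BN inference.
    """
    names = list(p_fail)
    p_sys = 0.0
    joint = {n: 0.0 for n in names}
    for states in product([0, 1], repeat=len(names)):
        state = dict(zip(names, states))
        p = 1.0
        for n in names:
            p *= p_fail[n] if state[n] else 1.0 - p_fail[n]
        if system_failed(state):
            p_sys += p                    # accumulate P(system down)
            for n in names:
                if state[n]:
                    joint[n] += p         # accumulate P(n down, system down)
    return {n: joint[n] / p_sys for n in names}
```

For a two-component parallel (redundant) pair, both components have DIF = 1: given that the system is down, both must have failed.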

However, the performance of a diagnostic system depends strongly on the number and location of sensors. According to the optimal sensor placement in [39] and Table 3, and will be the best locations for sensors. If the sensors do not detect the failure of and , we can use this evidence to reduce the number of minimal cut sets to be diagnosed using Algorithm 1. The following CUE function is generated:

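Independently of the exact CUE expression, the underlying idea can be sketched as follows (a simplified reading of Algorithm 1, with our own function name): any minimal cut set containing a component that the sensors report as working can no longer explain the observed system failure, so it is discarded.

```python
def cue(minimal_cut_sets, working):
    """Cut sets under evidence: keep only the minimal cut sets that contain
    no component known (from sensor data) to be working.

    minimal_cut_sets: iterable of frozensets of component names
    working: set of component names the sensors report as healthy
    """
    working = set(working)
    return [cs for cs in minimal_cut_sets if not (cs & working)]
```

With no sensor evidence, every minimal cut set remains a candidate; each healthy reading prunes the search space.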
Now we input the evidence defined in (27) to the BN and update the DIF and SI using the inference algorithm. Table 4 shows the diagnostic data with the sensor data. Consider

5.3. Dynamic Diagnosis Strategy Based on TOPSIS

Owing to uncertainty, the test cost of components is usually difficult to express as a crisp value, so we introduce fuzzy linguistic expressions to assess it. Table 5 shows the components' test costs for the braking system, and Table 6 gives the corresponding diagnostic decision table. Using (13)~(19), we obtain the weighted normalized diagnostic decision table shown in Table 7. Table 8 shows the distance of each alternative from the positive and negative ideal solutions, together with the corresponding closeness coefficient.
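For readers unfamiliar with the method, a crisp (non-fuzzy) version of the TOPSIS closeness coefficient underlying Tables 7 and 8 can be sketched as follows; the paper's improved TOPSIS additionally handles triangular fuzzy entries, which this minimal sketch omits.

```python
import math

def topsis(matrix, weights, benefit):
    """Classic TOPSIS closeness coefficients.

    matrix[i][j]: score of alternative i on criterion j
    weights[j]:   criterion weights (assumed to sum to 1)
    benefit[j]:   True for larger-is-better criteria (e.g. DIF),
                  False for cost criteria (e.g. test cost)
    Returns one closeness coefficient per alternative; larger means the
    alternative sits closer to the positive ideal solution.
    """
    m, n = len(matrix), len(matrix[0])
    # vector-normalize each column, then apply the criterion weights
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(m))) for j in range(n)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n)] for i in range(m)]
    # positive/negative ideal solutions, per criterion direction
    pos = [(max if benefit[j] else min)(v[i][j] for i in range(m)) for j in range(n)]
    neg = [(min if benefit[j] else max)(v[i][j] for i in range(m)) for j in range(n)]
    cc = []
    for i in range(m):
        d_pos = math.sqrt(sum((v[i][j] - pos[j]) ** 2 for j in range(n)))
        d_neg = math.sqrt(sum((v[i][j] - neg[j]) ** 2 for j in range(n)))
        cc.append(d_neg / (d_pos + d_neg))
    return cc
```

An alternative that dominates on every criterion coincides with the positive ideal solution and gets a closeness coefficient of 1.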

According to Table 8, we should diagnose first. If fails, the diagnosis is over. Otherwise, we feed this evidence ( works) into the BN and update the diagnostic decision table. Table 9 shows the updated closeness coefficients of components for the braking system when works, from which we conclude that the next component to diagnose is . If fails, the diagnosis is over; otherwise, we input this evidence to the BN and update the diagnostic decision table again. Repeating these steps, we finally obtain the optimal diagnostic ranking order of the braking system. Because a CUE represents a minimal set of component failures that can cause system failure under the evidence, we diagnose the CUEs one by one to find the root cause of the braking system failure; after finishing one CUE, we move to the next. Within a CUE, the diagnostic ranking order of components is determined by their closeness coefficients: the component with the larger closeness coefficient is checked first. Based on the closeness coefficients and CUEs, the optimal diagnostic ranking order of the braking system is described by the graphical diagnostic decision tree (DDT) in Figure 6(a). If the previous diagnosis results are not used to update the reliability parameters, the corresponding DDT is shown in Figure 6(b).

Average diagnostic cost is widely used to evaluate fault diagnosis algorithms: the lower the diagnostic cost, the better the algorithm. For simplicity, we use the expected diagnostic cost (EDC). EDC can be computed by

where is the unreliability of the system under the sensor data, is the sum of all test costs from the top node to a CUE's leaf node, and is the unreliability of that CUE.
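The EDC formula itself is not reproduced above. Under one plausible reading of the description (each leaf of the DDT contributes its cumulative path test cost, weighted by the probability that its CUE is the actual cause given that the system has failed), a sketch is:

```python
def expected_diagnostic_cost(leaves, q_system):
    """EDC sketch under an assumed form of the paper's (elided) formula.

    leaves: iterable of (path_cost, q_cue) pairs, where path_cost is the sum
            of test costs from the DDT top node to that CUE's leaf node and
            q_cue is the unreliability of the CUE
    q_system: unreliability of the system under the sensor data
    """
    return sum(cost * q_cue / q_system for cost, q_cue in leaves)
```

If the CUEs partition the failure probability (their unreliabilities sum to the system unreliability), this reduces to a probability-weighted average of the path costs.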

Since the test cost of components is expressed as a triangular fuzzy number, it cannot be used directly to calculate the EDC. For convenience, we evaluate the test cost of each component as ten times the modal value of its triangular fuzzy number. Assuming that the test costs of components are independent, the diagnostic costs of the different algorithms computed with (28) are shown in Table 10, which indicates that the proposed approach is more efficient than the others because it comprehensively takes the DIF, SI, HIV, and previous diagnosis results into account to obtain the optimal diagnostic ranking order of the braking system.
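The defuzzification convention just described amounts to a one-liner; the helper name is ours, and the (a, m, b) triple is the usual lower bound, modal value, and upper bound of a triangular fuzzy number.

```python
def crisp_test_cost(tfn, scale=10):
    """Defuzzify a triangular fuzzy test cost (a, m, b) by taking the modal
    value m, scaled by a factor of 10 as in the case study."""
    a, m, b = tfn
    return scale * m
```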

6. Conclusion

In this paper, we have discussed the use of dynamic fault trees, BN, fuzzy sets theory, and MADM to locate failures in complex systems. Specifically, we have emphasized three important issues that arise in engineering diagnostic applications: insufficient fault data, uncertainty of test cost, and failure dependency of components. To address insufficient fault data, expert elicitation and fuzzy sets theory are used to evaluate the failure rates of the basic events. To address the uncertainty of test cost, triangular fuzzy numbers are adopted to express it. To address failure dependency, a dynamic fault tree is used to model the dynamic behaviors of system failure mechanisms. Furthermore, we calculate the reliability results with BN and an algebraic technique in order to avoid the drawbacks of a single analysis technique, and we present a scheme to incorporate sensor information to update these results. In addition, taking the qualitative information, DIF, SI, and HIV, as well as the previous diagnosis result, into account, we design a new dynamic diagnosis strategy using an improved TOPSIS. The proposed method takes full advantage of the dynamic fault tree for modeling, fuzzy sets theory for handling uncertainty, and TOPSIS for determining the best fault search scheme, making it especially suitable for fault diagnosis of complex systems.

In future work, we will focus on how to account for sensor reliability and use it to update the DDT, thereby further improving diagnosis efficiency.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (71461021), the Natural Science Foundation of Jiangxi Province (20142BAB207022 and 20151BAB207044), the Science and Technology Foundation of Department of Education in Jiangxi Province (GJJ14166), the China Postdoctoral Science Foundation (2015M580568), and the Postdoctoral Science Foundation of Jiangxi Province (2014KY36).