Abstract

Fault tree analysis is a well-structured, precise, and powerful tool for system evaluation. However, the conventional approach has been found inadequate for dealing with missing fault data, failure dependency, and uncertainty. This paper presents a comprehensive study on the reliability evaluation of a data communication system (DCS) using a dynamic fault tree approach based on fuzzy sets. It combines the advantages of the dynamic fault tree for modelling, fuzzy set theory for handling uncertainty, and the Bayesian network (BN) for inference. Specifically, it adopts expert elicitation and fuzzy set theory to evaluate the failure rates of the basic events of DCS and uses a dynamic fault tree model to capture the dynamic failure mechanisms. Furthermore, reliability parameters are calculated by mapping the dynamic fault tree into an equivalent BN. The results show that the proposed method is more flexible and adaptive than conventional fault tree analysis for fault diagnosis and reliability estimation of DCS.

1. Introduction

The data communication system (DCS) is a key subsystem of urban rail transit, and its reliability has a direct impact on the stability and safety of train operation. With rapid technological innovation, the performance of key equipment in the DCS of urban mass transit has improved greatly; at the same time, the growing complexity of its technology and structure raises significant challenges for system reliability evaluation and maintenance. These challenges are as follows. (1) Lack of sufficient fault data: the completeness of fault data has a significant influence on system reliability analysis. However, it is very difficult in practice to obtain a large number of fault samples, for two reasons. One is the imprecise knowledge available at the early design stage of a new product. The other is that changes in environmental conditions may mean that historical fault data no longer represent future failure behaviours. (2) Failure dependency of components: DCS adopts many redundant units and fault tolerance techniques to improve its reliability. The behaviours of the components in the system and their interactions, such as failure priority, sequentially dependent failures, functionally dependent failures, and dynamic redundancy management, should therefore be taken into consideration. (3) High levels of uncertainty: DCS usually operates in a dynamic environment and is strongly affected by technical, human, and operational malfunctions that may lead to hazardous incidents.

Fault tree analysis (FTA) has been widely used to calculate the reliability of complex systems. It is a logical and diagrammatic method for evaluating the possibility of an accident resulting from combinations of failure events. However, conventional FTA, which commonly assumes that the reliability characteristics of the components of a complex system are described by precise probability distributions, has been found inadequate for dealing with the challenges mentioned above. Therefore, fuzzy set theory has been introduced as a useful tool to handle challenges (1) and (3). The fuzzy fault tree analysis model employs fuzzy set and possibility theory and deals with ambiguous, qualitatively incomplete, and inaccurate information. Several researchers have successfully used the fuzzy fault tree technique in various areas, including nuclear safety assessment [1], risk analysis [2, 3], and the reliability of a gas power plant [4]. They treated basic event probabilities as fuzzy numbers and applied the fuzzy extension principle to compute the top event probability. However, these approaches use a static fault tree to model the system fault behaviours and cannot cope with challenge (2). Dynamic fault tree analysis has been introduced [5], which takes into account not only the combination of failure events but also the order in which they occur. Meshkat et al. analysed the dependability of systems with on-demand and active failure modes using a dynamic fault tree and solved it with a Markov chain (MC) model [6]. However, this method has two well-known problems. One is its ineffectiveness in solving large dynamic fault trees; that is, the MC-based approach suffers from the state space explosion problem. The other is its ineffectiveness in handling uncertainty in the failure data; that is, the failure rates of the system components are treated as crisp values. Hence, Li et al. proposed a fuzzy dynamic fault tree to analyse the fuzzy reliability of a CNC machining centre [7]. Nevertheless, the solution of the fuzzy dynamic fault tree is still based on the MC model. In order to solve larger dynamic fault trees, a discrete-time Bayesian network (DTBN) was proposed for the reliability analysis of dynamic fault trees in [8, 9]. The authors converted dynamic logic gates to a DTBN and calculated the reliability results with a standard Bayesian network (BN) inference algorithm. However, this is an approximate solution and requires huge memory resources to obtain the joint probability distribution accurately. An innovative algorithm has been introduced to reduce the dimension of the conditional probability tables by an order of magnitude; however, this method cannot perform probability updating [10]. Montani et al. proposed a translation of the dynamic fault tree into a dynamic Bayesian network (DBN) [11]. The DBN model is essentially applicable to Markov processes, and the result of the calculation gives approximate probabilities.

Motivated by the problems mentioned above, this paper presents a reliability evaluation of DCS based on fuzzy sets and a dynamic fault tree, with special attention to the three challenges above. We adopt expert elicitation and fuzzy set theory to deal with insufficient fault data and uncertainty by treating the failure rates as fuzzy numbers. In addition, we use a dynamic fault tree model to capture the dynamic behaviours of DCS failure mechanisms and calculate reliability results using a BN and an algebraic technique in order to avoid the aforementioned problems.

The objective of this paper is to evaluate the reliability of DCS using fuzzy sets and a dynamic fault tree. The paper is organized as follows. Section 2 provides a brief introduction to DCS and its dynamic fault tree model. Section 3 describes the estimation of failure rates for the basic events. Section 4 presents a novel dynamic fault tree solution which uses a BN and an algebraic technique. Section 5 analyses the reliability of DCS. The outcomes of the research and recommendations for future research are presented in the final section.

2. Dynamic Fault Tree of DCS

2.1. DCS

DCS is one of the key components of the train control system and is the medium for transmitting data among the modules of the automatic train control system. It mainly comprises the ground wired backbone communication network and the train-ground communication network, as shown in Figure 1. The ground wired backbone communication network is mainly used to connect the zone controller, the computer-based interlocking system, the automatic train supervision system, the data storage unit, and so on. It usually adopts a bidirectional self-healing loop of industrial Ethernet, so that communication is not interrupted when one device fails. The train-ground communication network has evolved from point-type electromagnetic induction communication through point-type wireless communication to continuous wireless communication. Train control based on wireless communication not only reduces the number of ground units but also satisfies the requirements of high-volume train-ground information transmission and secure communication and thus improves the operational capability of the urban rail transport system.

The train-ground communication networks consist of the train-ground access devices and the train-ground communication transmission system. The train-ground access devices are responsible for information acquisition, composition, decomposition, encoding, and decoding and for the information transmission security mechanism, which guarantees safe, reliable, and real-time information transmission. Specifically, the train-ground access devices include the following.

(i) Centralized Radio Control Unit (CRCU). CRCU, located in the control center, is primarily responsible for transmitting diagnostic information, passenger travel information, and speech information.

(ii) Decentralized Radio Control Unit (DRCU). DRCU, located in the decentralized control center, offers the interface between the decentralized control system and the traction power supply system. In addition, it performs important tasks such as information acquisition, composition, decomposition, encoding, and decoding among the decentralized control system, the vehicle control system, the localization system, and the traction power supply system.

(iii) Mobile Radio Control Unit (MRCU). MRCU, located at opposite ends of the train, not only offers the interface between the vehicle control system and the localization system but also implements information processing among the vehicle control system, the localization system, the decentralized control system, and the traction power supply system.

2.2. Dynamic Fault Tree for DCS

DCS of urban mass transit is a complex system that adopts redundancy techniques to ensure higher reliability. For example, hardware redundancy is adopted in the design of CRCU, DRCU, and MRCU. A high degree of coupling and complicated logical relationships exist between these modules. The behaviours of the components in these modules and their interactions, such as failure priority, sequentially dependent failures, functionally dependent failures, and dynamic redundancy management, should therefore be taken into consideration. A traditional static fault tree is unsuitable for modelling these dynamic fault behaviours, so we use a dynamic fault tree model to capture them. Taking the decentralized traction control failure as the top event, the dynamic fault tree of DCS is established in Figure 2. The failure events and the different components of DCS are represented by the symbols presented in Table 1.

3. Estimation of Failure Rates for the Basic Events of DCS

In order to evaluate the reliability of DCS, the failure rates of the basic events must be known. However, it is very difficult to estimate a precise failure rate because of insufficient data or the vague characteristics of the events, especially for new equipment. In this study, expert elicitation through several interviews and questionnaires, together with fuzzy set theory, is used to determine the failure rates of the basic events.

3.1. Selecting Experts to Form Evaluation Committee

Experts are selected from different fields, such as design, installation, maintenance, operation, and management of DCS, to judge the failure rates of the basic events. Experts are more comfortable judging event failure likelihood in qualitative natural language, based on their experience and knowledge of DCS, than expressing judgments in a quantitative manner, and such language captures the uncertainty involved. The granularity of the set of linguistic values commonly used in engineering system safety is from four to seven terms. In this paper, the component failure rate is described by seven linguistic values, that is, very high, high, reasonably high, moderate, reasonably low, low, and very low.

3.2. Converting Linguistic Terms to Fuzzy Numbers

After the experts' evaluation, a numerical approximation system is used to systematically map linguistic terms into trapezoidal fuzzy numbers. Each predefined linguistic value has a corresponding mathematical representation. The shapes of the membership functions that represent the linguistic variables in engineering systems are shown in Figure 3. To eliminate the bias of any single expert, eleven experts are asked to judge how likely a basic event is to fail in the system under investigation. It is therefore necessary to aggregate these opinions into a single one. There are many methods to aggregate fuzzy numbers. An appealing approach is the linear opinion pool [12]:

$$M_i = \sum_{j=1}^{n} w_j A_{ij}, \quad i = 1, 2, \ldots, m,$$

where $m$ is the number of basic events; $A_{ij}$ is the linguistic expression of basic event $i$ given by expert $j$; $n$ is the number of experts; $w_j$ is the weighting factor of expert $j$; and $M_i$ represents the combined fuzzy number of basic event $i$. Usually, an $\alpha$-cut addition followed by the arithmetic averaging operation is used for aggregating the membership functions of several fuzzy numbers. The membership function of the aggregated fuzzy number is thus computed from the membership functions of the trapezoidal fuzzy numbers given by the individual experts.
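The aggregation step can be sketched in Python as follows. This is a minimal illustration, assuming that each linguistic value maps to a trapezoidal fuzzy number (a, b, c, d) and that the linear opinion pool is applied parameter by parameter, which is what α-cut addition followed by weighted averaging yields for trapezoidal numbers with non-negative weights. The numerical scale `LINGUISTIC_SCALE`, the function name `aggregate_opinions`, and the equal weights in the usage line are hypothetical; the actual membership functions are those of Figure 3.

```python
from typing import Dict, Sequence, Tuple

Trapezoid = Tuple[float, float, float, float]  # (a, b, c, d) with a <= b <= c <= d

# Hypothetical scale for the seven linguistic values; the real membership
# functions are the ones plotted in Figure 3 and may differ.
LINGUISTIC_SCALE: Dict[str, Trapezoid] = {
    "very low": (0.00, 0.00, 0.10, 0.20),
    "low": (0.10, 0.20, 0.25, 0.35),
    "reasonably low": (0.25, 0.35, 0.40, 0.50),
    "moderate": (0.40, 0.475, 0.525, 0.60),
    "reasonably high": (0.50, 0.60, 0.65, 0.75),
    "high": (0.65, 0.75, 0.80, 0.90),
    "very high": (0.80, 0.90, 1.00, 1.00),
}


def aggregate_opinions(opinions: Sequence[str], weights: Sequence[float]) -> Trapezoid:
    """Linear opinion pool: weighted average of the experts' trapezoids."""
    if len(opinions) != len(weights) or not opinions:
        raise ValueError("one weight per expert opinion is required")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = float(sum(weights))
    pooled = [0.0, 0.0, 0.0, 0.0]
    for term, weight in zip(opinions, weights):
        # Normalise the weights so that they sum to one.
        for k, value in enumerate(LINGUISTIC_SCALE[term]):
            pooled[k] += (weight / total) * value
    return (pooled[0], pooled[1], pooled[2], pooled[3])


if __name__ == "__main__":
    # Three experts with equal weights judging one basic event.
    print(aggregate_opinions(["low", "reasonably low", "low"], [1, 1, 1]))
```

Unequal weights can be passed when some experts are considered more reliable than others; the function normalises them internally.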

3.3. Calculating Fuzzy Fault Rate of the Basic Events

The final ratings of the basic events are also fuzzy numbers and cannot be used directly for fault tree analysis because they are not crisp values. The fuzzy number must therefore be converted to a crisp score, called the fuzzy possibility score (FPS), which represents the experts' most likely belief that a basic event will occur. This step is usually called defuzzification. There are several defuzzification techniques [13]: the area defuzzification technique, the left and right fuzzy ranking defuzzification technique, the centroid defuzzification technique, the area between the centroid point and the original point defuzzification technique, and the centroid-based Euclidean distance defuzzification technique. In this paper, the area defuzzification technique is used to map the fuzzy numbers into FPS because it has the lowest relative errors and the closest match with the real data. If $(a, b, c, d; 1)$ is a trapezoidal fuzzy number, then its area defuzzification is as follows:

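The fuzzy possibility score of an event is then converted into the corresponding fuzzy failure rate (FFR), which plays the role of the failure rate. Based on the logarithmic function proposed by Onisawa [14], which utilizes the concept of error possibility and likely fault rate, the fuzzy failure rate can be obtained by (4):

$$\mathrm{FFR} = \begin{cases} \dfrac{1}{10^{K}}, & \mathrm{FPS} \neq 0, \\ 0, & \mathrm{FPS} = 0, \end{cases} \qquad K = \left[\frac{1 - \mathrm{FPS}}{\mathrm{FPS}}\right]^{1/3} \times 2.301. \tag{4}$$

Table 2 shows the fuzzy failure rates of the basic events for DCS.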
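The conversion from a trapezoidal fuzzy number to a failure rate can be sketched as below. Two points are assumptions of this sketch rather than the procedure of the paper: the crisp score is obtained with the centroid defuzzification technique, used here only as a stand-in for the area defuzzification technique, and the function names `centroid_fps` and `onisawa_failure_rate` are hypothetical. The second function implements Onisawa's logarithmic function, FFR = 10^(-K) with K = ((1 - FPS)/FPS)^(1/3) × 2.301 for FPS ≠ 0 and FFR = 0 otherwise.

```python
from typing import Tuple

Trapezoid = Tuple[float, float, float, float]  # (a, b, c, d) with a <= b <= c <= d


def centroid_fps(fuzzy: Trapezoid) -> float:
    """Centroid of a trapezoidal membership function of height 1.

    Stand-in for the area defuzzification technique used in the paper.
    """
    a, b, c, d = fuzzy
    denominator = 3.0 * ((d + c) - (a + b))
    if denominator == 0.0:
        # Degenerate case: the fuzzy number is a single crisp point.
        return a
    return ((d * d + c * c + c * d) - (a * a + b * b + a * b)) / denominator


def onisawa_failure_rate(fps: float) -> float:
    """Convert a fuzzy possibility score in [0, 1] into a fuzzy failure rate."""
    if not 0.0 <= fps <= 1.0:
        raise ValueError("FPS must lie in [0, 1]")
    if fps == 0.0:
        return 0.0
    k = ((1.0 - fps) / fps) ** (1.0 / 3.0) * 2.301
    return 10.0 ** (-k)


if __name__ == "__main__":
    fps = centroid_fps((0.10, 0.20, 0.25, 0.35))  # hypothetical aggregated opinion
    print(fps, onisawa_failure_rate(fps))
```

Because the mapping is monotonic, a larger FPS always yields a larger failure rate, so the ranking of basic events given by the experts is preserved.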

4. Dynamic Fault Tree Analysis Using BN and Algebraic Technique

4.1. Mapping Static Fault Tree into BN

There is a clear correspondence between a static fault tree and a BN. The fault tree can be seen as a deterministic particular case of the BN. Conceptually, it is straightforward to map a fault tree into a BN: one only needs to “redraw” the nodes and connect them while correctly enumerating the reliabilities. Figure 4 shows the conversion of OR and AND gates into equivalent nodes in a BN. The parent nodes are assigned prior probabilities, which coincide with the failure probabilities of the corresponding basic nodes in the fault tree, and the child node is assigned its conditional probability table (CPT). Since the OR and AND gates represent deterministic causal relationships, all the entries of the corresponding CPT are either 0 or 1. The detailed algorithm for converting a fault tree into a BN was proposed in [15, 16].
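The gate-to-node mapping can be illustrated with a small Python sketch. It builds the deterministic CPT of an OR or AND gate and obtains the failure probability of the child node by summing over all parent states. The function names are hypothetical, the parents are assumed to be independent basic events, and plain enumeration is used instead of a dedicated BN inference engine.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

CPT = Dict[Tuple[int, ...], float]  # parent states -> P(child = failed | parents)


def or_gate_cpt(n_parents: int) -> CPT:
    """CPT of an OR gate: the child fails if at least one parent has failed."""
    return {s: float(any(s)) for s in product((0, 1), repeat=n_parents)}


def and_gate_cpt(n_parents: int) -> CPT:
    """CPT of an AND gate: the child fails only if all parents have failed."""
    return {s: float(all(s)) for s in product((0, 1), repeat=n_parents)}


def child_failure_probability(priors: Sequence[float], cpt: CPT) -> float:
    """Marginal failure probability of the child node for independent parents."""
    total = 0.0
    for states in product((0, 1), repeat=len(priors)):
        weight = 1.0
        for state, prior in zip(states, priors):
            weight *= prior if state else 1.0 - prior
        total += weight * cpt[states]
    return total


if __name__ == "__main__":
    priors = [0.1, 0.2]  # hypothetical failure probabilities of two basic events
    print(child_failure_probability(priors, or_gate_cpt(2)))
    print(child_failure_probability(priors, and_gate_cpt(2)))
```

Every CPT entry is 0 or 1, which reflects the deterministic nature of the static gates.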

4.2. Fault Probability of a Module with Sequence Dependence

Let us consider an event sequence composed of events, , including several spare events. An event in the sequence is denoted by , which means that the event that failed in the th order of the sequence is designated a spare of an event that failed in the th order. denotes an event that was originally in active mode. ( ) has a dormancy factor . The sequence probability of can be calculated using the -tuple integration as where indicates the occurrence time of , is the probability distribution function of , and is the survival function of in standby mode. is a set of events that were originally in active mode and ( ) is a set of spare events that fail in active (standby) mode [17].

When the failure time of in active mode follows an exponential distribution with , the sequence probability is where for , and is the inverse Laplace transform operator.

If every in the above equation is distinct from the other, the sequence probability is where .
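As a simplified illustration of a sequence probability, the sketch below treats the case of only two events, both originally in active mode and with no spare events, whose failure times are independent and exponentially distributed. The closed form used here follows from integrating the two densities over the ordered region 0 < τ1 < τ2 ≤ t and is checked against a midpoint-rule integration. The function names and the numerical rates are hypothetical.

```python
import math


def sequence_probability_closed_form(lam_a: float, lam_b: float, t: float) -> float:
    """P(A fails first, then B fails, both within [0, t]) for exponential rates."""
    total = lam_a + lam_b
    return (lam_a / total) * (1.0 - math.exp(-total * t)) \
        - math.exp(-lam_b * t) * (1.0 - math.exp(-lam_a * t))


def sequence_probability_numeric(lam_a: float, lam_b: float, t: float,
                                 steps: int = 20000) -> float:
    """Same probability by midpoint-rule integration over the failure time of A."""
    h = t / steps
    acc = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        density_a = lam_a * math.exp(-lam_a * tau)
        # Probability that B fails after tau but no later than t.
        b_fails_later = math.exp(-lam_b * tau) - math.exp(-lam_b * t)
        acc += density_a * b_fails_later * h
    return acc


if __name__ == "__main__":
    lam_a, lam_b, t = 1e-3, 2e-3, 1000.0  # hypothetical rates (per hour) and mission time
    print(sequence_probability_closed_form(lam_a, lam_b, t))
    print(sequence_probability_numeric(lam_a, lam_b, t))
```

A useful consistency check is that the probabilities of the two possible orders add up to the probability that both events have failed by time t.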

4.3. Mapping Dynamic Fault Tree into BN

Dynamic fault tree extends traditional fault tree by defining special gates to capture the components' sequential and functional dependencies. Currently there are six types of dynamic gates defined: the functional dependency gate (FDEP), the cold, hot, and warm spare gates (CSP, HSP, WSP), the priority AND gate (PAND), and the sequence enforcing gate (SEQ). Here, we briefly discuss the FDEP and the WSP gates as they will be later used in our examples.

4.3.1. WSP Gate

The WSP gate has one primary input and one or more alternate inputs. The primary input is initially powered on and the alternate inputs are in standby mode. When the primary fails, it is replaced by an alternate input, and, in turn, when this alternate input fails, it is replaced by the next available alternate input, and so on. In standby mode, the component failure rate is reduced by a factor called the dormancy factor $\alpha$, which is a number between 0 and 1. A cold spare has a dormancy factor $\alpha = 0$ and a hot spare has a dormancy factor $\alpha = 1$. The WSP gate output is true when the primary and all the alternate inputs have failed. Figure 5 shows the WSP gate and its equivalent DTBN. Table 3 shows the CPT of node . Suppose that and follow the same exponential distribution with . Here, and in this table can be derived as

and are sequence probabilities calculated by (8). Consider

The output of node WSP is an AND gate whose CPT is shown in Figure 4.
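For a WSP gate with one primary and one spare that share the same exponential failure rate, the probability that the gate output is true can also be written in closed form, which gives a convenient check on any CPT-based calculation. The sketch below is an assumption-laden illustration, not the CPT construction of Table 3: it integrates directly over the failure time of the primary, and the function name `wsp_unreliability` and the numerical values are hypothetical.

```python
import math


def wsp_unreliability(lam: float, alpha: float, t: float) -> float:
    """P(primary and spare have both failed by time t).

    lam   : failure rate of either component in active mode
    alpha : dormancy factor of the spare (0 = cold spare, 1 = hot spare)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("the dormancy factor must lie in [0, 1]")
    primary_survives = math.exp(-lam * t)
    if alpha == 0.0:
        # Cold spare: it cannot fail in standby, so the integral reduces to lam * t.
        spare_takes_over = lam * t * primary_survives
    else:
        # Primary fails at tau, spare survived standby until tau and then
        # survives in active mode until t, integrated over tau in [0, t].
        spare_takes_over = primary_survives * (1.0 - math.exp(-alpha * lam * t)) / alpha
    return 1.0 - primary_survives - spare_takes_over


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, wsp_unreliability(1e-3, alpha, 1000.0))
```

With α = 1 the expression reduces to the AND of two independent components, and with α = 0 it reduces to the cold standby result, so the warm spare lies between the two.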

4.3.2. FDEP Gate

FDEP is used for modelling situations where one component’s correct operation is dependent upon the correct operation of some other component. It has a single trigger input, which could be another basic event or the output of another gate, a nondependent output reflecting the status of the trigger, and one or more dependent basic events. Figure 6 shows functional dependency gate and its equivalent BN. Table 4 shows the CPT of node . Here, in this table can be derived as

The CPT of output node FDEP is shown in Table 5.
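The effect of the FDEP gate on a dependent basic event can be sketched as a two-row CPT. The sketch assumes that the dependent event fails with certainty once the trigger has occurred and otherwise fails according to its own exponential distribution; this is the usual reading of a functional dependency, but the exact entries of Table 4 are not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Dict


def fdep_dependent_cpt(lam: float, t: float) -> Dict[int, float]:
    """P(dependent event has failed by t | trigger state), trigger state 1 = occurred."""
    return {
        1: 1.0,                        # trigger occurred: the dependent event is forced to fail
        0: 1.0 - math.exp(-lam * t),   # trigger absent: only its own failures count
    }


def dependent_failure_probability(p_trigger: float, lam: float, t: float) -> float:
    """Marginal failure probability of the dependent event."""
    cpt = fdep_dependent_cpt(lam, t)
    return p_trigger * cpt[1] + (1.0 - p_trigger) * cpt[0]


if __name__ == "__main__":
    # Hypothetical trigger probability, failure rate (per hour), and mission time.
    print(dependent_failure_probability(0.05, 2e-4, 1000.0))
```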

5. Reliability Analysis of DCS

5.1. Calculating Reliability

According to the dynamic fault tree shown in Figure 2 and the basic failure data shown in Table 1, we can map the dynamic fault tree into an equivalent BN using the proposed method. Once the structure of the BN is known and all the probability tables are filled, it is straightforward to compute the fault probability of DCS using an inference algorithm. Relatively mature exact and approximate inference algorithms are available for BNs, such as the variable elimination algorithm, the search-based algorithm, the conditioning algorithm, the jointree algorithm, and the differential algorithm. Here, we use the jointree algorithm to calculate the reliability indices of DCS. Table 6 shows the unreliability of DCS at different mission times obtained with different methods for solving the dynamic fault tree. As can be seen in Table 6, the accuracy of the DTBN method increases as the number of time intervals $n$ increases. Although the DTBN method ( ) is almost in agreement with the method proposed in this paper, its CPT memory and execution time become considerably larger.

5.2. Sensitivity Analysis

Sensitivity analysis allows the designer to quantify the importance of each of the system’s components and the impact that improving a component’s reliability will have on the overall system reliability. Here, we show how sensitivity analysis can be performed using a sensitivity index [18]. The sensitivity index of the th basic event is defined as where is the probability of the top event failure; is the probability that the top event has occurred given that the basic event has not occurred.

Table 7 shows the sensitivity index of all basic events for DCS. According to Table 7, the MRCU multiplexer board and the DRCU multiplexer board have the maximum sensitivity index, which means that they are the key components. Their reliability should therefore be improved at the product design stage in order to decrease the failure probability of DCS.

5.3. Performing Diagnosis

Diagnosis is an obvious capability of the framework due to the use of BN. We can conveniently calculate importance parameters with the BN and perform diagnosis to locate the system failure. The diagnostic importance factor (DIF) is the cornerstone of reliability-based diagnosis methodology. DIF is defined conceptually as the probability that an event has occurred given that the top event has also occurred. This quantitative measure allows us to discriminate between components by their importance from a diagnostic point of view. Components with larger DIF are checked first, which reduces the number of system checks needed while fixing the system. Consider

$$\mathrm{DIF}_C = P(C \mid S),$$

where $C$ is a component in system $S$.

Suppose the system has failed and we would like to know the most probable cause that took it down. We enter the evidence that DCS has failed, that is, that the top event has occurred, and solve the BN using the jointree algorithm. Table 8 gives the components’ DIF. The components should be checked one by one in order of decreasing DIF to locate the DCS failure. According to Table 8, the MRCU multiplexer board and the DRCU multiplexer board have the maximum DIF, which means that they are the most unreliable components. So, when DCS fails, they should be diagnosed first. Furthermore, proper measures should be taken for these components to improve their reliability at the product design stage in order to decrease the failure probability of DCS.
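The DIF calculation can be sketched on a toy model. The sketch below computes the probability that each component has failed given that the top event has occurred by enumerating all component states, which is feasible only for small models and is used here in place of the jointree algorithm. The component names, the failure probabilities, and the structure function `toy_top_event` are hypothetical and do not correspond to Figure 2 or Table 8.

```python
from itertools import product
from typing import Callable, Dict


def diagnostic_importance(priors: Dict[str, float],
                          top_event: Callable[[Dict[str, int]], bool]) -> Dict[str, float]:
    """Return P(component failed | top event occurred) for independent components."""
    names = list(priors)
    p_top = 0.0
    p_joint = {name: 0.0 for name in names}
    for states in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, states))
        weight = 1.0
        for name in names:
            weight *= priors[name] if assignment[name] else 1.0 - priors[name]
        if top_event(assignment):
            p_top += weight
            for name in names:
                if assignment[name]:
                    p_joint[name] += weight
    if p_top == 0.0:
        raise ValueError("the top event has zero probability")
    return {name: p_joint[name] / p_top for name in names}


def toy_top_event(state: Dict[str, int]) -> bool:
    """Hypothetical structure: the system fails if A fails, or if B and C both fail."""
    return bool(state["A"] or (state["B"] and state["C"]))


if __name__ == "__main__":
    dif = diagnostic_importance({"A": 0.01, "B": 0.1, "C": 0.2}, toy_top_event)
    # Components are checked in order of decreasing DIF.
    for name, value in sorted(dif.items(), key=lambda item: -item[1]):
        print(name, round(value, 4))
```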

6. Conclusion

In this work, we have discussed the use of fuzzy set theory, the dynamic fault tree, and BN to evaluate the reliability of DCS. Specifically, we have emphasized three important issues that arise in engineering diagnostic applications, namely, insufficient fault data, uncertainty, and failure dependency of components. To address insufficient fault data and uncertainty, we adopt expert elicitation and fuzzy set theory to evaluate the failure rates of the basic events of DCS. To address failure dependency, we use a dynamic fault tree to model the dynamic behaviours of the system failure mechanisms. Furthermore, we calculate reliability parameters of DCS using a BN and an algebraic technique in order to avoid the state space explosion problem and the need for huge memory resources. As can be seen from Tables 7 and 8, the MRCU multiplexer board and the DRCU multiplexer board contribute most to the top event probability. Their reliability should therefore be improved at the product design stage in order to decrease the failure probability of DCS. The proposed method combines the advantages of the dynamic fault tree for modelling, fuzzy set theory for handling uncertainty, and BN for inference, which makes it especially suitable for reliability evaluation and fault diagnosis of complex systems.

In future work, we will focus on common cause failures to optimize the dynamic fault tree model and on establishing a dynamic fault tree solution method that does not rely on the exponential distribution assumption.

Conflict of Interests

The authors declare there is no conflict of interests.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61074139 and Science and Technology Foundation of Department of Education in Jiangxi Province under Grant GJJ14166.