Abstract

Traditional testability demonstration experiments select failure samples without considering failure propagation. As a result, some failures are often omitted from the sample, and this omission creates serious usage risks, because the omitted failures can trigger serious propagation failures. This paper proposes a new failure sample selection method to solve this problem. First, the method uses a directed graph and ant colony optimization (ACO) to obtain subsequent failure propagation sets (SFPS) from a failure propagation model; failure samples are then selected on the basis of the number of elements in each SFPS. Compared with the traditional sampling plan, the method improves the coverage of the test failure samples, increases diagnostic capability, and decreases the usage risk.

1. Introduction

In the industrial manufacture of electrical systems and equipment, testability plays a crucial role in improving the reliability of large-scale electrical equipment [1]. Good testability allows failures to be detected and isolated quickly, reduces maintenance time, and increases system availability. Testability has therefore attracted increasing attention from researchers.

Testability refers to the ability of a system to support failure diagnosis, fault isolation, and fault prognosis. Numerous models and methods have been developed to diagnose, predict, and prevent failures or faults. In 1983, Huang et al. introduced the concept of K-node fault diagnosis [2] and pointed out that testability depends only on the structure of a circuit, not on the values of its elements. In [3], Maeda et al. discussed the testability and distinguishability of nonlinear systems using analysis and graph theory. Yang et al. [4] proposed a slope fault model in the complex domain that applies to the diagnosis of both linear and nonlinear analog circuits. In [5], a fault diagnosis method under tolerance conditions was proposed that uses fuzzy mathematics to detect faults. To detect and isolate faulty components and to predict the remaining useful performance of analog circuits, Vasan et al. [6] used a kernel method for diagnosis and a particle filtering method for prognosis.

A key stage of design for testability is the testability demonstration experiment, which verifies the ability to detect and isolate failures by injecting failures into the system [1]. In traditional methods, the injected failure samples are either selected at random or chosen by the highest failure probability. These methods ignore the failure propagation modes of the system, which often leads to serious omissions in failure sample selection. Specifically, a failure with a very low failure rate may still cause propagation failures, yet it is unlikely to be selected in a traditional demonstration experiment precisely because of that low rate. When such a failure occurs, it spreads to other components and can cause severe faults. In other words, if propagation is not considered, low-failure-rate faults that trigger propagation faults can be missed, so the resulting test failure set is incomplete and cannot be relied on to detect and isolate failures correctly. Several studies have addressed this problem [7–9]. Reference [7] analyzed failure propagation in aircraft engine systems using small-world network theory. Li et al. [8] used a fuzzy probability Petri net model to describe fault propagation and then introduced a sample selection method based on propagation intensity, which achieves a better fault coverage rate.

In this work, we use failure propagation probabilities to derive the intensity of failure propagation and then, based on that intensity, use ant colony optimization (ACO) to search for the maximum probability failure propagation path. Finally, subsequent failure propagation sets (SFPS) are built and a new failure sample selection plan is proposed. The proposed method effectively reduces the risk of omitting propagation failures and increases the accuracy of failure diagnosis.

The rest of this paper is organized as follows. Section 2 briefly introduces the principles of failure propagation modeling and ant colony optimization and presents the new failure sample selection optimization method. Section 3 verifies the diagnostic effect of the method through a case study that compares it with traditional failure sample selection. Section 4 presents brief conclusions.

2. Methodology

The proposed method involves three major stages: (i) design and analysis of the failure sample selection plan on the basis of a failure propagation model; (ii) path optimization with ACO; and (iii) failure sample selection optimization based on subsequent failure propagation sets. Figure 1 shows a block diagram of the optimization procedure for electrical systems. Once a sampling plan is confirmed, failure samples are allocated to different modules. To detect and identify failures correctly and to avoid omitting propagation failures, failure propagation must be taken into account: the ACO algorithm searches for the maximum probability propagation path, and a new failure sample selection is then made according to the edge intensities along this path. This section first presents the fundamentals of the testability demonstration experiment and of ACO.

2.1. The Modeling and Analysis of Failure Propagation
2.1.1. Stage 1: The Building of Failure Sample Selection Plan

Before the testability demonstration experiment, enough failure samples must be extracted by random sampling. Suppose that in a testability demonstration experiment $n$ samples are selected for independent tests and $F$ of the tests result in failure. A positive integer $c$ is set as the threshold value: if $F \le c$, the experiment is considered to meet the standard; otherwise, it is considered unqualified. Figure 2 shows the process of the traditional failure sample selection plan. The primary aim of the scheme is therefore to determine the values of $n$ and $c$.

For the testability demonstration experiment, we assume that the number of failed tests follows a binomial distribution. Let $q$ be the probability that a single test succeeds. After $n$ independent tests, the probability of exactly $F$ failures is

$$P(F) = C_n^F (1-q)^F q^{\,n-F}, \tag{1}$$

where $C_n^F = \dfrac{n!}{F!\,(n-F)!}$ is the binomial coefficient, that is, the number of unordered selections of $F$ distinct elements from a set of $n$ elements. A testability demonstration experiment succeeds when the number of failed tests is less than or equal to the threshold $c$. Therefore, the probability of a successful experiment is the sum of the probabilities of $0, 1, \ldots, c$ failed tests:

$$P(F \le c) = \sum_{F=0}^{c} C_n^F (1-q)^F q^{\,n-F}. \tag{2}$$
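The probability of exactly F failures in n tests and the probability that at most c tests fail can be evaluated directly with the standard library. The following is a minimal sketch; the function names `prob_failures` and `prob_accept` are hypothetical, and q is the success probability of a single test.

```python
from math import comb


def prob_failures(n: int, f: int, q: float) -> float:
    """Probability of exactly f failed tests out of n when each test succeeds with probability q."""
    return comb(n, f) * (1.0 - q) ** f * q ** (n - f)


def prob_accept(n: int, c: int, q: float) -> float:
    """Probability that at most c of the n tests fail, i.e. that the demonstration passes."""
    return sum(prob_failures(n, f, q) for f in range(c + 1))


if __name__ == "__main__":
    # Hypothetical plan and success probability, for illustration only.
    print(prob_accept(50, 3, 0.95))
```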

Through consultation between the supplier and the customer, the design value $q_0$ of the fault detection rate (FDR) is determined; it is the probability of success of a single test. The minimum acceptable value of the FDR is $q_1$; when $q \ge q_1$, the test is considered to have reached the design standard. The supplier's risk $\alpha$ is the probability that equipment whose FDR equals the design value $q_0$ is rejected, so $1-\alpha$ is the minimum acceptance probability the supplier will accept. The user's risk $\beta$ is the maximum acceptable probability that equipment whose FDR is only $q_1$ is accepted. Under these conditions, $n$ and $c$ are determined from

$$\begin{cases} \displaystyle\sum_{F=0}^{c} C_n^F (1-q_0)^F q_0^{\,n-F} \ge 1-\alpha, \\[2ex] \displaystyle\sum_{F=0}^{c} C_n^F (1-q_1)^F q_1^{\,n-F} \le \beta. \end{cases} \tag{3}$$

Once the plan is confirmed, the $n$ samples are allocated to the modules of the system according to the layered design and in proportion to their weights. An injected failure set is then built by extracting failure modes from each module. Only $n$ failure modes are acquired, which is fewer than the total number of failure modes in the system, so to give the injected failure set larger failure coverage a hierarchical allocation of the sample size is used:

$$n_i = n\,C_{pi}, \qquad C_{pi} = \frac{\lambda_i Q_i T_i}{\sum_{j} \lambda_j Q_j T_j}, \qquad \lambda_i = \frac{1}{\mathrm{MTBF}_i}, \tag{4}$$

where $n_i$ denotes the number of samples assigned to module $i$, $C_{pi}$ is the assignment weight of the $i$th module, $Q_i$ is the number of failure modes of the $i$th module (an indicator of its complexity), $T_i$ is the operation time coefficient of the $i$th module (the ratio of its operation time to its working life), and $\mathrm{MTBF}_i$ is the mean time between failures of module $i$. Thus $\lambda_i$ is the failure rate of the $i$th module, expressed in failures per unit time.
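A plan satisfying both inequalities of formula (3) can be found by a simple search over n. The sketch below is one possible approach, not necessarily the authors' procedure: for each n it takes the largest c allowed by the user's-risk constraint and checks the supplier's-risk constraint, which works because the acceptance probability never decreases as c grows. The helper `allocate` assumes the common proportional form of expression (4), with weight proportional to failure rate times number of failure modes times operation time coefficient; all function names are hypothetical.

```python
from math import comb


def prob_accept(n, c, q):
    """P(F <= c) for n independent tests that each succeed with probability q."""
    return sum(comb(n, f) * (1.0 - q) ** f * q ** (n - f) for f in range(c + 1))


def find_plan(q0, q1, alpha, beta, n_max=1000):
    """Return (n, c) with the smallest n meeting both inequalities of formula (3), or None.

    q0: design FDR, q1: minimum acceptable FDR, alpha: supplier's risk, beta: user's risk.
    """
    for n in range(1, n_max + 1):
        # Acceptance probability grows with c, so the user's-risk constraint caps c.
        c = -1
        while c + 1 <= n and prob_accept(n, c + 1, q1) <= beta:
            c += 1
        # If the largest admissible c fails the supplier's-risk constraint, no smaller c passes.
        if c >= 0 and prob_accept(n, c, q0) >= 1.0 - alpha:
            return n, c
    return None


def allocate(n, modules):
    """Hypothetical helper: stratified allocation n_i = n * C_pi.

    modules: list of (Q_i, MTBF_i, T_i); assumes C_pi = lambda_i*Q_i*T_i / sum_j(lambda_j*Q_j*T_j)
    with lambda_i = 1 / MTBF_i.
    """
    scores = [q * t / mtbf for q, mtbf, t in modules]
    total = sum(scores)
    weights = [s / total for s in scores]
    # Rounded counts may not add up to n exactly; adjust by hand if needed.
    return weights, [round(n * w) for w in weights]
```

Because rounding is applied per module, the allocated counts can differ from n by a few samples; the text does not say how such a remainder is distributed.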

2.1.2. Stage 2: Failure Propagation Modeling

In this section, a failure propagation model based on propagation probability is built using a directed graph (DG) of failure propagation and its adjacency matrix. In graph theory, a DG is a set of nodes connected by directed edges. It can describe the failure propagation relationships among the components of an electrical system. Formally, the failure propagation DG shown in Figure 3 is represented as $G = (V, F, E)$, where $V$ is the set of nodes (components); $F$ is the failure set, which here contains 5 failure modes $f_1, \ldots, f_5$; and $E$ is the set of directed edges, which describes the link between any two circuit components or modules together with the capacity, or intensity, of failure propagation along it.

The intensities of failure propagation and the relationships between nodes (components) may be heterogeneous. Assuming the system has $N$ nodes, we introduce an adjacency matrix $A$ whose main diagonal is zero to describe the links between components:

$$A = [a_{ij}]_{N \times N}, \qquad a_{ij} = \begin{cases} 1, & (v_i, v_j) \in E, \\ 0, & (v_i, v_j) \notin E, \end{cases} \qquad a_{ii} = 0, \tag{5}$$

where $a_{ij}$ is the directed link from node $v_i$ to node $v_j$, which exists with probability $p_{ij}$.

The existence of an edge from node $v_i$ to node $v_j$ is determined by the probability $p_{ij}$, independently of the other edges:

$$p_{ij} = \sum_{k} \mu(x_k)\, p(x_k), \tag{6}$$

where $\mu(x_k)$ is the membership degree of $x_k$ in the fuzzy set of failure states, $x_k$ denotes the $k$th failure symptom signal, and $p(x_k)$ is the probability of $x_k$. The probabilities are collected in the probability matrix $P = [p_{ij}]_{N \times N}$.
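The adjacency matrix and the probability matrix can be built from an edge list with plain Python. This is a minimal sketch with hypothetical function names and 0-based node indices; the edge probability is computed as the probability of a fuzzy event, the sum over symptom signals of membership degree times signal probability, which is an assumption about the exact form of formula (6).

```python
def adjacency_matrix(n_nodes, edges):
    """Build A = [a_ij]: a_ij = 1 if the directed edge (v_i, v_j) exists, else 0; diagonal stays 0."""
    a = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        if i == j:
            raise ValueError("self-loops are not allowed: the main diagonal must be zero")
        a[i][j] = 1
    return a


def fuzzy_event_probability(memberships, probabilities):
    """Assumed form: p_ij = sum_k mu(x_k) * p(x_k) over the failure symptom signals x_k."""
    if len(memberships) != len(probabilities):
        raise ValueError("need one membership degree per symptom signal")
    return sum(mu * p for mu, p in zip(memberships, probabilities))


def probability_matrix(n_nodes, edge_symptoms):
    """edge_symptoms: dict (i, j) -> (memberships, probabilities) for each existing edge."""
    p = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j), (mu, px) in edge_symptoms.items():
        p[i][j] = fuzzy_event_probability(mu, px)
    return p
```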

2.1.3. Stage 3: Analysis of Failure Propagation

When a failure occurs at a node of a circuit system, it can spread to the connected neighbor nodes and cause them to fail. The intensity of failure spread serves as the directed link weight between nodes: the greater the intensity of an edge, the more likely failure propagation along that edge is, and hence the more likely a failure is to cascade to the neighbor node at the end of that edge.

The intensity of failure spread, formula (7), combines the failure propagation probability and the node degree, each with its own weight, together with the crossing-clustering coefficient. Its terms are the crossing-clustering coefficient; the weight of the failure propagation probability; the weight of the node degree; the propagation probability from node $v_i$ to node $v_j$ in the $t$th propagation step; the set of subsequent nodes reached after $t$ propagation steps; and the degree of node $v_j$ within that set. The degree of a node is the number of edges incident to it.

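To compare intensities easily and to simplify the calculations, we standardize them with z-scores, since a z-score expresses, as a signed number, how far a value lies above or below the mean in units of standard deviation. For an edge intensity $I$,

$$z = \frac{I - \mu}{\sigma}, \tag{8}$$

$$\mu = \frac{1}{M}\sum_{e=1}^{M} I_e, \qquad \sigma = \sqrt{\frac{1}{M}\sum_{e=1}^{M} (I_e - \mu)^2}, \tag{9}$$

where $\mu$ is the expected value of the intensities, $\sigma$ is their population standard deviation, and $M$ is the number of edges.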
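Standardizing the edge intensities uses the population mean and population standard deviation, both available in Python's `statistics` module. A minimal sketch, assuming intensities are supplied as a dictionary keyed by edge (the function name is hypothetical):

```python
from statistics import fmean, pstdev


def z_scores(intensity):
    """Map each edge to the z-score of its intensity (population mean and standard deviation)."""
    values = list(intensity.values())
    mu = fmean(values)
    sigma = pstdev(values, mu)
    if sigma == 0:
        raise ValueError("all intensities are equal; z-scores are undefined")
    return {edge: (value - mu) / sigma for edge, value in intensity.items()}
```

Using `pstdev` rather than `stdev` matches the population standard deviation named in the text.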

For instance, suppose the intensities of the edges of Figure 3 are known. Their z-scores, calculated with (8) and (9), are listed in Table 1. Take one failure as an example: according to the structure of the DG, two propagation edges leave the node where it occurs, with z-scored intensities 1.3908 and −0.9934. Because 1.3908 > −0.9934, the failure is more likely to propagate along the first edge than along the second.

From the above analysis, failure propagation is most likely to follow the path with the greatest intensity. In Figure 3, the bold line marks the propagation path of this failure with the maximum intensity.
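The rule "propagation follows the edge with the greatest intensity" can be illustrated with a simple greedy walk. This is only an illustration, not the ACO search of Section 2.2: at each node it takes the outgoing edge with the largest z-score among successors not yet on the path. The input format and function name are hypothetical.

```python
def greedy_propagation_path(start, out_edges):
    """Follow the outgoing edge with the largest z-score until no unvisited successor remains.

    out_edges: dict node -> dict successor -> z-scored intensity (hypothetical input format).
    """
    path = [start]
    visited = {start}
    node = start
    while True:
        # Candidate successors of the current node that are not already on the path.
        candidates = {v: z for v, z in out_edges.get(node, {}).items() if v not in visited}
        if not candidates:
            return path
        node = max(candidates, key=candidates.get)
        path.append(node)
        visited.add(node)
```

A greedy walk can miss a better path hidden behind a weak first edge, which is one reason a global search such as ACO is used instead.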

2.2. Path Optimization with ACO

In general, the structure of Very Large Scale Integration (VLSI) circuits is so complex that analyzing failure propagation manually is impractical, so intelligent algorithms are used. In this paper, ACO is adopted to obtain the maximum probability failure propagation path.

The algorithm was proposed by M. Dorigo in his doctoral thesis in 1992, where it was applied to the travelling salesman problem by imitating the foraging behavior of ants; the goal there is to find the shortest round trip linking a series of cities [10]. More details about the technique can be found in [10]. ACO is robust and well suited to parallel implementation [11]. We therefore use ACO to search for the maximum probability failure propagation path.

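The maximum probability failure propagation path in a circuit is searched for as follows. At time $t$, ant $k$ at node $v_i$ uses the pheromone deposited between nodes to choose its next node. The probability that ant $k$ moves from $v_i$ to $v_j$ is

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in \mathcal{N}_i} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}]^{\beta}}, & j \in \mathcal{N}_i, \\[2ex] 0, & \text{otherwise}, \end{cases} \tag{10}$$

where $\eta_{ij}$ equals the intensity of failure propagation from $v_i$ to $v_j$, $\tau_{ij}(t)$ is the amount of pheromone deposited on the transition from $v_i$ to $v_j$, $\alpha$ and $\beta$ are parameters that control the influence of $\tau_{ij}$ and $\eta_{ij}$, respectively (unrelated to the risks of Section 2.1.1), and $\mathcal{N}_i$ is the set of nodes connected to $v_i$.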
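The transition rule above is the standard Ant System rule: the probability of each candidate is proportional to pheromone raised to α times heuristic value raised to β. A minimal sketch, assuming the heuristic values η are positive; z-scores can be negative, so in practice they would have to be shifted (for example, by subtracting the minimum and adding a small constant), which is our assumption rather than part of the described method. Function names and the default α and β are hypothetical.

```python
import random


def transition_probabilities(i, neighbors, tau, eta, alpha=1.0, beta=2.0):
    """Probability of moving from node i to each neighbor j; tau and eta are dicts keyed by (i, j)."""
    weights = {}
    for j in neighbors:
        if eta[(i, j)] <= 0:
            raise ValueError("heuristic values must be positive; shift the z-scores first")
        weights[j] = tau[(i, j)] ** alpha * eta[(i, j)] ** beta
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()} if total > 0 else {}


def choose_next(i, neighbors, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Roulette-wheel selection of the next node; returns None when node i has no successors."""
    probs = transition_probabilities(i, neighbors, tau, eta, alpha, beta)
    if not probs:
        return None
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.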

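The pheromones are updated by

$$\tau_{ij}(t+1) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}, \qquad \Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}, \qquad \tau_{ij}(0) = \tau_0, \tag{11}$$

where $\rho$ is the pheromone evaporation coefficient, $\tau_0$ is the initial pheromone level, $Q$ is the amount of pheromone released by an ant, $m$ is the number of ants, and $\Delta\tau_{ij}^{k}$ is the pheromone deposited on edge $(v_i, v_j)$ by ant $k$.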
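The update combines evaporation at rate ρ with the deposits of all m ants. The text does not specify how much pheromone a single ant deposits, so the sketch below assumes that ant k deposits Q times its path score on every edge it traversed and nothing elsewhere; the function names and this deposit rule are hypothetical.

```python
def init_pheromones(edges, tau0=1.0):
    """tau_ij(0) = tau0 on every directed edge."""
    return {edge: tau0 for edge in edges}


def update_pheromones(tau, ant_paths, ant_scores, rho=0.1, q=1.0):
    """One step: tau_ij <- (1 - rho) * tau_ij + sum over ants k of delta_tau_ij^k.

    Assumption: ant k deposits q * score_k on every edge of its path and nothing elsewhere.
    tau: dict (i, j) -> pheromone; ant_paths: node sequences; ant_scores: one score per ant.
    """
    deposits = {edge: 0.0 for edge in tau}
    for path, score in zip(ant_paths, ant_scores):
        # Consecutive node pairs of the path are the edges the ant traversed.
        for edge in zip(path, path[1:]):
            deposits[edge] = deposits.get(edge, 0.0) + q * score
    return {edge: (1.0 - rho) * tau.get(edge, 0.0) + d for edge, d in deposits.items()}
```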

2.3. Failure Sample Selection Optimization Based on Subsequent Propagation Failure Set

In the testability demonstration experiment, the unit under test (UUT) contains replaceable units $R_i$ ($i = 1, \ldots, r$, where $r$ is the number of replaceable units), and $R_i$ has $N_i$ failure modes. Following the allocation rule of stratified sampling, $n_i$ failure samples are assigned to $R_i$, so $n_i$ suitable failure modes must be selected from its $N_i$ failure modes to form a failure sample set. As discussed above, the influence of failure propagation must also be taken into account. Here, subsequent failure propagation sets (SFPS) are used to optimize the failure sample set. The SFPS of a failure mode is the set of failure modes that occur along its failure propagation path; it indicates the range of failure spread.

Let the failure mode set of the replaceable unit $R_i$ be $F = \{f_1, f_2, \ldots, f_{N_i}\}$. The steps of failure sample selection optimization based on SFPS are as follows.

Step 1. Count the number of elements in the SFPS of every failure mode in $F$ to construct the set $S = \{s_1, s_2, \ldots, s_{N_i}\}$. Then count the elements of $S$ that are greater than 1 and denote this number by $h$. Select the corresponding $h$ failure modes from $F$, that is, the failure modes whose SFPS contains more than one element, to construct a new failure mode set $F_1$. Next, let $F_2 = F - F_1$ be the set of the remaining failure modes.

Step 2. If $h < n_i$, generate a set $U$ of $n_i - h$ distinct random numbers, uniformly distributed on the integers from 1 to $N_i - h$. According to $U$, extract the corresponding failure modes from $F_2$, taken in natural order, to form a set $F_3$. The final failure sample set for the testability demonstration experiment is $F_1 \cup F_3$.

Step 3. If $h \ge n_i$, generate a set $U'$ of $n_i$ distinct random numbers, uniformly distributed on the integers from 1 to $h$. Select the corresponding elements of $F_1$ to compose a set $F_4 \subseteq F_1$; $F_4$ is the final failure sample set.

Step 4. Obtain the complete failure sample set of the UUT by combining the failure sample sets produced by Step 2 or Step 3 for all replaceable units.
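Steps 1 to 4 can be sketched as follows. The input format is hypothetical: each failure mode's SFPS size is given as a dictionary, modes are listed in natural order, and random indices are drawn without replacement so that the selected modes are distinct. Step 4 is read here as concatenating the per-unit sample sets over all replaceable units, which is our interpretation.

```python
import random


def select_failure_samples(failure_modes, sfps_count, n_i, rng=None):
    """SFPS-based failure sample selection for one replaceable unit (Steps 1 to 3).

    failure_modes: failure-mode labels in natural order (the set F).
    sfps_count: dict mode -> number of elements in its SFPS.
    n_i: number of samples allocated to this unit.
    """
    rng = rng or random.Random()
    if n_i > len(failure_modes):
        raise ValueError("cannot select more samples than there are failure modes")
    # Step 1: modes whose SFPS has more than one element form F1; the rest form F2.
    f1 = [f for f in failure_modes if sfps_count.get(f, 0) > 1]
    f2 = [f for f in failure_modes if sfps_count.get(f, 0) <= 1]
    h = len(f1)
    if h < n_i:
        # Step 2: keep all of F1 and draw n_i - h distinct indices from 1..len(F2).
        idx = sorted(rng.sample(range(1, len(f2) + 1), n_i - h))
        return f1 + [f2[k - 1] for k in idx]
    # Step 3: draw n_i distinct indices from 1..h and keep those modes of F1.
    idx = sorted(rng.sample(range(1, h + 1), n_i))
    return [f1[k - 1] for k in idx]


def select_for_uut(units, rng=None):
    """Step 4 (as interpreted here): concatenate the sample sets of all replaceable units.

    units: list of (failure_modes, sfps_count, n_i) tuples.
    """
    rng = rng or random.Random()
    samples = []
    for modes, counts, n_i in units:
        samples.extend(select_failure_samples(modes, counts, n_i, rng))
    return samples
```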

3. Case Study

A certain type of air-to-air missile system consists of six modules: the refrigeration module, the vibration control device, the rectifier, the shear stents, the lock system, and the circuit box. Here, we take only the refrigeration module as an example. Table 2 lists the failure modes of the refrigeration module.

Suppose that the values of $q_0$, $q_1$, $\alpha$, and $\beta$ are known from the contract agreed between the supplier and the customer. Using formula (3), the failure sampling plan $(n, c)$ is confirmed with $n = 50$, which means that 50 failure samples are allocated to the 6 modules by proportional stratified sampling. The assignment weight of the refrigeration module is known to be 0.121, so by expression (4) the number of failure samples assigned to it is 6. Therefore, 6 suitable failure samples must be selected from its 12 failure modes to establish the failure sample set of the refrigeration module.
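As a worked check of this allocation, assuming expression (4) takes the proportional form $n_i = n\,C_{pi}$ with rounding to the nearest integer, the refrigeration module receives

$$n_{\mathrm{ref}} = n\,C_{p,\mathrm{ref}} = 50 \times 0.121 = 6.05 \approx 6.$$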

According to the circuit connections of the system, the directed graph of failure propagation for the refrigeration module is obtained as shown in Figure 4; the component set and the failure mode set of the refrigeration module can be read from this graph. Using formulas (6), (7), and (8), the propagation intensity of each directed edge in the system is obtained, as shown in Table 3. In the table, the z-score values range from −1.1389 to 1.6663.

Analyzing the failure propagation with the data in Table 3 shows that the greater the z-score of an edge, the more likely failure propagation along that edge is. For instance, the smallest z-score of all edges is −1.1389, which means that a failure at the source node of that edge spreads to its end node with only a very small probability, if at all. Conversely, the edge with the greatest z-score, 1.6663, is the one along which propagation is most likely. Thus, by using ACO to search for the maximum z-score path of each failure, the maximum probability propagation path of each failure is easy to find. For example, one failure spreads along the two edges with the largest intensities (z-scores 0.8528 and 1.6663), and ACO returns these two edges as its maximum probability propagation path. Next, the method of Section 2.3 gives the corresponding subsequent failure propagation set of this failure, which contains two failure modes; that is, its failure spread number is 2. The subsequent failure propagation sets and failure spread numbers of the other failure modes are obtained in the same way. Finally, the optimized sample set is established from the failure spread paths.

Table 4 compares the proposed method with the traditional failure sampling plan; the symbol √ marks a selected failure sample. The traditional plan selects 6 of the 12 failure modes at random, without considering the influence of failure propagation, whereas the proposed method selects the 6 samples with failure propagation taken into account. The comparison shows that the proposed method achieves better failure coverage than the traditional one.

4. Conclusion

This paper proposes a new failure sample selection method to overcome the shortcomings of traditional sample selection. First, a DG and ACO are used to obtain the maximum probability failure propagation path based on edge intensity. Then, a new failure sample selection method is proposed on the basis of the subsequent failure propagation set. Because it focuses on propagation failures and thereby establishes a relatively complete fault sample set, the method increases failure coverage compared with the traditional sampling plan, and a case study demonstrates that it can decrease the usage risk.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities of China (Grant no. ZYGX2015J074), Science and Technology Support Project of Sichuan Province, China (2014FZ0037, 2015FZ0111), and Support Project of CDTU (KY1311018B).