Table of Contents Author Guidelines Submit a Manuscript
Science and Technology of Nuclear Installations
Volume 2015, Article ID 693891, 29 pages
http://dx.doi.org/10.1155/2015/693891
Research Article

Risk-Based Clustering for Near Misses Identification in Integrated Deterministic and Probabilistic Safety Analysis

1Energy Department, Politecnico di Milano, Via La Masa 34, 20156 Milano, Italy
2Chair on System Science and Energetic Challenge, Fondation EDF-Electricite de France, Ecole Centrale Paris and Supelec, Chatenay-Malabry Cedex, 92295 Paris, France

Received 16 December 2014; Accepted 16 February 2015

Academic Editor: Arkady Serikov

Copyright © 2015 Francesco Di Maio et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In integrated deterministic and probabilistic safety analysis (IDPSA), safe scenarios and prime implicants (PIs) are generated by simulation. In this paper, we propose a novel postprocessing method, which resorts to a risk-based clustering method for identifying Near Misses among the safe scenarios. This is important because the possibility of recovering these combinations of failures within a tolerable grace time allows avoiding deviations to accident and, thus, reducing the downtime (and the risk) of the system. The postprocessing risk-significant features for the clustering are extracted from the following: (i) the probability of a scenario to develop into an accidental scenario, (ii) the severity of the consequences that the developing scenario would cause to the system, and (iii) the combination of (i) and (ii) into the overall risk of the developing scenario. The optimal selection of the extracted features is done by a wrapper approach, whereby a modified binary differential evolution (MBDE) embeds a -means clustering algorithm. The characteristics of the Near Misses scenarios are identified solving a multiobjective optimization problem, using the Hamming distance as a measure of similarity. The feasibility of the analysis is shown with respect to fault scenarios in a dynamic steam generator (SG) of a nuclear power plant (NPP).

1. Introduction

Integrated deterministic and probabilistic safety analysis (IDPSA) attempts to overcome some limitations of deterministic safety analysis (DSA) and probabilistic safety analysis (PSA). The former is solidly founded on the multibarrier and defense-in-depth concepts and aims at verifying the capability of a nuclear power plant (NPP) to withstand a set of postulated design basis accidents (DBA) [1, 2]. To account for the uncertainties in the model representation of the actual plant behavior, conservatism is introduced in the calculations by thermal-hydraulics (TH) codes under DBA conditions [3]. The latter aims at considering a wider set of possible accidental scenarios and includes the quantification of accident probabilities [4, 5].

Both DSA and PSA are scenario-based analyses, where scenario selection and definition are done by expert judgment. State of the art of DSA and PSA approaches can provide relevant and important insights into what is already known to be an “issue,” but they are not capable of revealing what, and to what extent, is not known (i.e., scenarios which are not expert-selected in the DSA and PSA inputs), with the risk of neglecting or underestimating potentially dangerous scenarios [6]. This is due to the difficulties of the static structure of the classic DSA and PSA approaches in treating dynamic variations that usually occur during the operational time of a process [7] due to (i) stochastic disturbances (e.g., equipment failures), (ii) deterministic plant responses (i.e., transients), (iii) controls, and (iv) operator actions [6, 8, 9]. Indeed, the order and timing of the events occurring along a scenario and the values of the process variables at the time of event occurrence are critical in determining the evolution of the scenario itself [10].

The development and application of IDPSA in practice must meet the challenge of computational complexity, in both model construction and implementation and in postprocessing for the retrieval of the relevant information from the scenario outcomes. The number of dynamic scenario branches generated in IDPSA increases in power law with the number of occurring events and, thus, is much larger than in classical PSA based on event trees (ET) and fault trees (FT). The a posteriori information retrieval (postprocessing) then becomes quite burdensome and difficult [11, 12]. Continuous event trees (CETs) [13, 14] and dynamic event trees (DETs) [15, 16] provide realistic frameworks for IDPSA. However, their application is limited by their computationally intensive nature, by the need of tailoring the algorithms to the system under consideration and by the need of processing a massive amount of data for any single initiating event considered [17].

Postprocessing, in general, consists in classifying the generated dynamic scenarios into safe scenarios and prime implicants (PIs), that is, sequences of events that represent minimal combinations of accident failures necessary for system failure and cannot be covered by more general implicants [18]. Among the safe scenarios, Near Misses are important scenarios to be identified, because they are those sequences of events that reach values of the safety parameters close to, but not exceeding, the corresponding acceptable thresholds [19]. They can, thus, be relevant contributors to the “hidden” risk of the system and should not be neglected, as a small deviation may transform them into accidental scenarios.

In the literature, several authors introduce the concept of Near Misses as accident precursors [20, 21]. We here consider Near Misses as sequences of events that incidentally keep the system in a safe state but endangered and insecure. For the purpose of the analysis, they are here defined as sequences of events similar to those leading the system into fault conditions, except for one characteristic which is missing or is slightly different (e.g., sequence time lag, different failure magnitude, and different involved component in an event) [22].

The postprocessing analysis entails a “Forward” classification of the dynamic scenarios into classes, that is, safe, PIs, and Near Misses and a “Backward” identification of the similarities of the features of the scenarios (i.e., stochastic event occurrence and deterministic process variables values), which characterize the groups of Near Misses among the whole set of safe scenarios.

For the “Forward” classification of the Near Misses sequences, we look at two factors of risk: the probability of occurrence of an undesired event and the severity of the consequence caused by the event [23]. Thus, we describe the sequences of events by (i) the probability that the developing scenario is an accidental scenario, (ii) the consequence that the developing scenario can cause to the system, and (iii) the overall risk of the developing scenario that we compute synthetically as (expected consequence).

The optimal features for discerning the Near Misses from the safe scenarios are extracted from the profiles of , , and of the accidental scenarios and selected by a wrapper algorithm, which takes into account six statistical indicators of , , and , and, through a modified binary differential evolution (MBDE) optimization algorithm, selects the best features, which are fed to a -means clustering algorithm, which is a simple and well-known clustering algorithm (other classical clustering algorithms, such as mean-shift [24, 25] or fuzzy -means [19, 26]).

The outcomes of this “Forward” classification is, then, interpreted by a “Backward” identification of the similarities of the features of the Near Misses scenarios: the acquired knowledge can be exploited in an online integrated risk monitoring system that can rapidly detect the problem and set up a repair strategy of the occurring failures before the system reaches a fault state.

The proposed approach is illustrated with reference to scenarios occurring in the steam generator (SG) of a NPP [27]. We use multiple-valued logic (MVL) theory for modeling the behavior of the system, where timing and sequences of component failure events are determining the system behavior [4]. By using MVL, we increase the limited description capability of binary variables in modeling the different component operational states (e.g., a valve that can be closed, partially closed, or fully open or can fail at different times) and, therefore, perform an IDPSA postprocessing analysis on the whole set of simulated accidental scenarios [17].

The paper is organized as follows. In Section 2, the SG model used to generate the scenarios for the reliability analysis is presented [27], along with multistate representation of the system dynamics. In Section 3, the PIs are identified and the risk-based “Forward” and “Backward” Near Misses identification method is introduced with reference to the case study considered. In Section 4, conclusions and remarks are drawn.

2. Case Study

2.1. The U-Tube Steam Generator (UTSG) Model

The U-tube steam generator (UTSG) under consideration is sketched in Figure 1. The improper control of the water level, whose difficulties arise from nonminimum phase plant characteristics, that is, plant strong inverse response behavior, particularly at low operating power, due to the so-called “swell and shrink” effects [28], is a major cause of NPP unavailability [2830].

Figure 1: Schematic of the UTSG [31].

The reactor coolant enters the UTSG at the bottom and moves upward and then downward in the inverted U-tubes, transferring heat to the secondary fluid before exiting at the bottom. The secondary fluid, the feedwater (), enters the UTSG at the top of the downcomer, through the space between the tube bundle wrapper and the SG shell. The value of is regulated by a system of valves: a low flow rate valve, used when the operating power () is smaller than 15% of nominal power (), and a high flow rate valve when [27]. In the secondary side of the tube bundle, water heats up, reaches saturation, starts boiling, and turns into a two-phase mixture. The two-phase fluid moves up through the separator/riser section, where steam is separated from liquid water, and through the dryers, which ensure that the exiting steam () is essentially dry. The separated water is recirculated back to the downcomer. The balance between the exiting and the incoming governs the change in the water level in the SG. Because of the two-phase nature, two types of water level measurements are considered, as shown in Figure 1, each reflecting a different level concept: the narrow range level () is calculated by pressure difference between two points close to the water level and indicates the mixture level, whereas the wide range level () is calculated by pressure difference between the two extremities of the SG (steam dome and bottom of the downcomer) and indicates the collapsed liquid level that is related to the mass of water in the SG.

“Swell and shrink” phenomena are also modeled to reproduce the dynamic behavior of the SG: when increases, the steam pressure in the steam dome decreases and the two-phase fluid in the tube bundle expands causing to initially swell (i.e., rise), instead of decreasing as would have been expected by the mass balance; contrarily, if decreases or increases, a shrink effect occurs. A similar model has been presented in [27].

The is governed by and across the tube bundle region of the SG as shown by the following transfer function:where is the flow rate of the incoming water in the tube bundle, (2), is the equivalent steam-water mixture flow rate exiting the tube bundle region, (3), is a time constant that accounts for the dynamics.

The incoming water flow rate is proportional to :where the lag accounts for the feed-water valve dynamics and accounts for the water mass transportation dynamics: their values are reported in Table 1.

Table 1: Parameters of the UTSG model at different power levels [27].

The exiting steam-water mass is proportional to :where the first-order lag accounts for the elapsed time from the turbine steam demand and the increase of , and the nonminimum phase term accounts for the two-phase swell and shrink effects.

Combining (1), (2), and (3), is equal toand , that is, the overall water mass in the steam generator, iswhere is a time constant that accounts for the dynamics.

We assume and , and and ; the state space representation of the SG model is, thus, The values of the parameters , , ,, , and change depending on the power , as shown in Table 1.

The goal of the system is to maintain the SG water level at a reference position (): the SG fails if the rises (falls) above (below) the threshold (), in which case automatic reactor or turbine trips are triggered. Indeed, if the exceeds , the steam separator and dryer lose their functionality and excessive moisture is carried in , degrading the turbine blades profile and the turbine efficiency; if decreases below , insufficient cooling capability of the primary fluid occurs. Similarly, the is relevant for the cooling capability of the primary circuit [28]. Prealarms are triggered when exceeds () if a small deviation from occurs or when exceeds (), when the deviation is large. Set points of and of depend on , as shown in Figure 2, and, thus, also the alarms thresholds depend on . The set point is low at low , to partially account for the strong inverse response of [28]; thus, the low level thresholds are more restrictive than the high level thresholds at low .

Figure 2: Set point for at different power rate values.

A dedicated model has been implemented in SIMULINK to simulate the dynamic response of the UTSG at different values. Both feedforward and feedback digital control schemes have been adopted. The feedback controller is a PID that provides a flow rate resulting from the residuals between and , whereas the feedforward controller operates a safety relief valve that is opened if and only if exceeds the and removes a constant flow safety flow rate (). The block diagram representing the SIMULINK model of the SG is shown in Figure 3: the controlled variable is , whereas the control variable is .

Figure 3: Block diagram representing the SIMULINK model of the SG.
2.2. The Set of Possible Failures

The set of multiple component failures that can occur during the system life are shown in Figure 4.(1)The outlet steam valve can fail stuck at a random time in  (s) in three different positions: (i) closed; (ii) stuck open at 50% of the nominal that should be provided at ; (iii) stuck open at 150% of the nominal that should be provided at .(2)The safety relief valve can fail stuck at a random time in  (s) at a uniform random value in the range  (kg/s).(3)The communication between the sensor that monitors and the PID controller can fail at random times in  (s), in which case the PID is provided with the same input value of the previous time step.(4)The PID controller can fail stuck at random times in  (s), providing a uniform random flow rate belonging to of the nominal that should be provided at .It is worth noticing that in the UTSG there are two PID controllers and, thus, two communications between the sensors measuring and the PIDs (one for high power feedback control and the other for low power feedback control). The selective action of the PIDs depending on hides some of the failures. For example, if the power profile of the scenario under investigation is a ramp, both PIDs are called in operation: if anyone (or both) failed, their fault state is detectable. On the contrary, if we consider scenarios with constant power profile, for example, low power rate (), the occurrence of a high power feedback control failure cannot be detected, and, thus, the fault remains hidden.

Figure 4: Sketch of the failures that can be injected into the system.

Choices and hypotheses for modeling the failures (i.e., the mission time, the number and type of faults, the distributions of failure times, and magnitudes) have been arbitrarily made with the aim of generating multiple failures in the sequences and capturing the dynamic influence of their order, timing, and magnitude. The choice of a mission time () equal to 4000 (s) has been made, because it is a long enough interval of time to allow the complete development also of slow dynamic accident scenarios.

2.3. The Multistate Representation of System Dynamics

For realistically treating the dynamic behavior of the UTSG when component failures occur, we go beyond the binary state representation and adopt a multiple value logic (MVL) [17, 32] for an approximated description of the continuous time of occurrence of component failures and their magnitude. The MVL allows describing that the components can fail at any (discrete) time (not only the initial time) along the scenario, with different (discrete) magnitudes (not only the most conservative). The discretization of the time and magnitudes values is as follows:time discretization: we use the labels , , , and , for failures occurring in the intervals  (s),  (s),  (s), and  (s), respectively; if the label , the component does not fail within the time of the whole scenario, ;magnitude discretization:the steam valve magnitude is indicated as 1, 2, or 3 for failure states corresponding to stuck at 0%, stuck at 50%, and stuck at 150% of the value that should be provided at , respectively; if the steam valve magnitude is indicated as 0, the component does not fail in ;the safety relief valve fails with magnitude indicated as 1, 2, 3, and 4, if it is stuck between  (kg/s),  (kg/s),  (kg/s), and  (kg/s), respectively; if the safety relief valve magnitude is indicated as 0, the component does not fail in ;the communication between the sensor measuring and the PID controller is labelled 0 if the communication works, 1 otherwise;the PID controller failure magnitude range is discretized into 8 equally spaced magnitude intervals, labelled from 1 to 8, representative of failure states corresponding to discrete intervals of output value belonging to of the value that should be provided at ; if the PID controller magnitude is labelled as 0, the component does not fail in .The values of time and magnitude and order of failure occurrence for each component are included into a sequence vector that represents a scenario. As an example, the sequence vector of Figure 5 represents a scenario where the steam valve fails stuck at its maximum allowable value at a time in  (s) and it is the third event occurring along the sequence; the safety relief valve fails first in  (s), with a magnitude belonging to  (kg/s); the communication between the sensor measuring and the PID controller is the second failure event in the sequence and occurs in  (s); finally, the PID controller fails stuck in  (s), with a magnitude belonging to of the value that should be provided at .

Figure 5: Sequence vector representing a scenario.

The number of possible sequence vectors that arise from the MVL discretization is 100509, each one evolving towards either safe or faulty conditions. To investigate this, a Monte Carlo-driven fault injection engine is used to sample combinations of discrete times and discrete magnitudes of components failures.

The (dynamic) analysis has been performed with respect to the two constant power scenarios, 5% (low power level) and 80% (high power level). The system configurations considered are listed in Table 2.

Table 2: System configurations.

The dynamic analysis shows that the same combination of components failures does not unequivocally lead to only one system end state but rather it depends on when the failures occur and with what magnitude. This is shown in Figure 6, where the frequencies of occurrence of the three system end states (“High,” “Safe,” and “Low”) are plotted for the 16 dynamic system configurations of Table 2.

Figure 6: Histograms for high power level (a) and low power level (b) of the frequencies of occurrence of the end states for each of the 16 system configurations of Table 2, simulated by sampling discrete failure times and magnitudes of components failures.

Figure 7 shows that, at high power operation, the timing of the events is quite important, because with the same system configuration but different times of failure occurrences, the system end state change. Specifically, in Figure 7(a), the safety valve fails stuck at 100% of after 1020 seconds and the communication between the sensor measuring and the PID controller fails at(i)1052 seconds (solid line),(ii)1063 seconds (dashed-dotted line).The two scenarios lead to low and high failure modes, respectively, whereas they would be considered as minimal cuts sets (MCS) in a static reliability analysis presented in Appendix A.

Figure 7: Example of dynamic system behavior at 80% .

Figure 7(b) shows the effects of different failures magnitudes on the system end state: the safety relief valve fails stuck in its maximum position at 2000 seconds, the communication fails at 2010 seconds, and the PID controller fails at 2020 seconds with two different magnitudes:(i)magnitude equal to 13% of the nominal value that should be provided at 80% (dashed-dotted line),(ii)magnitude equal to 12% of the nominal value that should be provided at 80% (solid line).The low power scenarios also present dynamic effects, as shown in Figure 8. In particular, Figure 8(a) shows the effects of the timing on the system end state: the safety relief valve fails stuck at  (kg/s) at 1005 (s) and the steam output valve fails stuck at 150% of the nominal value that should be provided at 5% at(i)1046 seconds (dashed-dotted line),(ii)1047 seconds (solid line).Figure 8(b) shows the effects of the order of components failure occurrence on the system end state: the safety relief valve fails stuck at  (kg/s) and the PID controller fails stuck at its minimum allowable value.(i)The PID controller failure is the first failure event along the sequence of events (dashed-dotted line).(ii)The safety relief valve failure is the first failure event along the sequence of events (solid line).Hereafter, without loss of generality, among the system configurations of Table 2, we focus only on the classification of the PIs and Near Misses of the high level failure mode at high power level ().

Figure 8: Example of dynamic system behavior at 5% .

3. Near Misses Identification

The Near Misses identification is here treated as a classification problem, in which Near Misses are sorted out from the safe scenarios, among the whole set of accidental transients simulated. In practice, the PIs are first identified among the whole set of 100509 possible scenarios and, then, the Near Misses are separated out among the remaining safe scenarios.

3.1. Prime Implicants Identification

A PI is a set of variables that represents a minimal combination of accident component failures necessary for system failure and cannot be covered by a more reduced implicant [17, 18]. Note that in our case the “PIs” identification task may consider noncoherent structure functions, for which both failed and working states of the same components can lead the system to failure. In such circumstances, traditional methods, for example, based on minimal cut sets analysis, cannot be applied, whereas dynamic reliability methods need to be applied for the identification of the PIs [33, 34].

The PIs identification among the whole set of 100509 possible scenarios is performed by means of the visual interactive method presented in [34]. The basic idea it relies on is that PIs are those scenarios with as few as possible events that are capable of leading the system into a failure state [35]; then, we first select as most important feature for the PIs identification the literal cost of the sequence vector (i.e., the number of components whose behavior is specified in the accident sequence) and then the accident sequences associated with the lowest literal cost are selected and stored as PIs. In fact, these are the most reduced sequences (i.e., with least number of events) that cannot be covered by any other implicant, and, thus, these are PIs by definition. The selected PIs and the implicants covered by them are deleted from the set of implicants and the procedure is repeated for the remaining implicants until all are covered. By so doing, 1255 PIs are identified for the high level failure mode, covering 36128 minterms. The total computational time approximately required for the identification of the PIs is 780 (s) on an Intel Core 2 Duo T9300 CPU @2.50 GHz.

3.2. The “Forward” Classification

Once the (1255) PIs for the SG high level failure mode have been identified, they are removed from the set of all possible scenarios, which is left with 64381 safe scenarios. For the identification of Near Misses among these, we resort to their definition as sequences of failure events that indeed keep the system in a safe condition but endangered (i.e., a quasifault system state). To this aim, we introduce a risk-based characterization of these remaining scenarios, calculating their associated risk, at each time instant , as [23]where is the probability that at time the scenario can lead the system into an accidental scenario and is the consequence that the developing scenario would cause to the system.

In this view, we build a functional relationship such that increases as moves further away from the reference level , in a way that if is equal to and if reaches . Such relationship is given in (8) below, assuming that scenarios whose approaches are more prone to failure than those with close to ; that is, (8) “filters out” (i.e., neglects) scenarios whose is close to and “mines” (i.e., weighs more) scenarios whose is close to :where is the cumulative probability function of the Gaussian distribution with mean , and standard deviation . Figure 9 shows the trend of .

Figure 9: Probability function for the definition of risk.

The consequence of a scenario increases as approaches the failure threshold , and can be calculated at time as [23] where is the intensity coefficient that accounts for the closeness of to the thresholds , , and , and for the exceedance time between the first event of the failure sequence (hereafter called initiating event (IE)) and the time of exceeding the threshold: the shorter this time, the more critical the scenario. Thus, the larger is, the faster and closer approaches a threshold; we assume (no consequences) if no threshold is exceeded; (low consequences) if exceeds after at least 2001 (s) from IE or if exceeds after at least 3001 (s) from IE; (medium consequences) if exceeds within 2000 (s) from IE, if exceeds and the elapsed time is in  (s), and if exceeds after at least 2001 (s) from IE; (catastrophic consequences) if exceeds within 1000 (s) from IE or if exceeds and the elapsed time from IE is in  (s). A matrix representation of the intensity coefficient is shown in Figure 10.

Figure 10: Matrix representation of the intensity coefficient .

By so doing, the available 64381 remaining safe scenarios are fully described at each time instant  [s] by their values of probability , consequence , and overall risk . An example of the , , and evolutions for two generic trends of is shown in Figure 11. More specifically, the behaviors represented in Figure 11(a) are due to(i)solid line: the PID controller fails at 100 (s) with magnitude 4 and the safety relief valve fails at 190 (s) with magnitude 2;(ii)dashed-dotted line: the safety relief valve fails at 100 (s) with magnitude 1, the communication between the sensor measuring and the PID controller is interrupted at 136 (s), and the PID controller fails at 3917 (s) with magnitude 5.It is worth analysing the behavior of , , and, thus, (Figures 11(b), 11(c), and 11(d), resp.): all three abovementioned functions increase as moves further away from and decrease as approaches . The steps shown in the consequences and risk plots (around 800 [s] for the solid line scenario and around 3500 [s] for the dashed-dotted line scenario) are due to the change of the discrete consequence intensity coefficient along the scenarios. The solid line scenario is faster than the dashed-dotted line scenario (upper plot) and, thus, the value of the parameters for the former scenario is 400 (catastrophic consequences, see Figure 10), due to the fact that exceeds within 1000 (s), whereas, (medium consequences, see Figure 10) for the dashed-dotted scenarios, because exceeds within  (s). Thus, the solid line scenario is more abrupt in its development towards failure and expected to have more catastrophic consequences, and, thus, more overall risk, than the dashed-dotted scenario, because the time between IE and the exceedance of is shorter (i.e, less grace time).

Figure 11: Probability , consequences , and risk for two sequences of events.
3.2.1. Features Selection

The identification of the Near Misses is treated as an unsupervised classification problem and addressed by clustering, where (i) the number of clusters is unknown and (ii) the features that enable the best clustering according to the risk-based characteristic profiles of , , and of the accidental scenarios are unknown. Unsupervised clustering, thus, entails identifying the number of clusters in which similar scenarios can be grouped according to similar values of some scenario features. To do this, from the profiles , , and , we extract some statistical indicators as features [36]:(1)mean value:(2)peak value:(3)standard deviation:(4)root mean square:(5)skewness:(6)kurtosis:where is alternatively equal to , , and and, thus, the total number of features is equal to . Among these 18 available features, we search for those that are optimal for clustering the 64381 scenarios in Near Misses and safe scenarios.

We resort to a wrapper framework [37, 38], whereby a modified binary differential evolution (MBDE) search engine [33, 39] searches candidate groups of features sets that are fed to a -means clustering algorithm [40]; eventually, the wrapper evolves so that among these candidate groups, the group retained is that which makes the -means clustering algorithm perform best (most compact and separate clusters). The idea behind the wrapper approach is shown in Figure 12. During the features search by MBDE, the -means clustering is run on the (training) safe scenarios with sets of features () that are randomly selected by the MBDE algorithm. The optimal number () of clusters is also unknown and it is determined by looking at the clustering performance obtained by the -means with reference to the Calinski-Harabasz (CH) index [41], which accounts for the ratio of the overall between-cluster variance (separation) and the overall within-cluster variance (compactness). The search proceeds iteratively until the CH index is maximised and the number of clusters is fixed. Then, the results of the wrapper algorithm are evaluated on an independent test set (), that is, the safe scenarios that have been left out during the training phase.

Figure 12: Wrapper approach for optimal feature subset selection based on a MBDE optimization algorithm and a -means classifier.

The CH index for a number of clusters, is equal to [41]where is the overall between-cluster variance:and is the overall within-cluster variance:where is the number of scenarios assigned to the th cluster, is the centroid of the th cluster, that is, the mean of the selected features belonging to the th cluster, is the mean of the selected features, and and are the norms, that is, Euclidean distances, between the two vectors.

The optimal features selection provides as best features the standard deviation of , the standard deviation of , and the root mean square of ; the best performance is obtained with and .

3.2.2. The Clustering Results

The obtained clusters of the safe scenarios are shown in Figure 13 with reference to the features of mean risk () and time elapsed from the instant at which starts to deviate from zero, that is, the time interval during which the system is exposed to risk. The rationale behind this choice is that the larger and the longer , the more dangerous the scenarios. In Figure 13, clusters 3, 4, and 5 (triangles, crosses, and squares, resp.) are well separated, that is, the low level risk scenarios clusters are widened by the adoption of (8) for the quantification of the risk profile . It is possible to distinguish the scenarios having the lowest risk level from the scenarios having low risk level, and, thus, the highest risk scenarios are well separated from the lower risk scenarios. The good performance obtained when (8) is adopted instead of other profiles, for example, linear probability function () that would give the same importance to any level , for the quantification of the risk profile , is due to the fact that (8) “filters out” (i.e., neglects) scenarios whose is close to and “mines” (i.e., weighs more) scenarios whose is close to : the 332 circles in Figure 13 (listed in Appendix B) can, thus, be considered the Near Misses scenarios, that is, scenarios that incidentally keep the system into safe state, although in endangered and insecure operational conditions.

Figure 13: Clustering results.
3.3. The “Backward” Approach

Once the Near Misses for the SG high level failure mode have been identified by clustering, we can search for similarities among them in terms of their multiple value sequences, that is, order and timing of event occurrences and deterministic process variables values. This “Backward” approach can lead us to finding the minimum conditions, that is, minimum and minimum , that lead the system into a quasifault state. The problem can be framed as a multiobjective optimization problem (MOP) [42] that looks for the set of scenarios to dominate any other scenarios with respect to the fitness function :where and are the objectives functions of the defined MOP, that is, minimum and , respectively. The solution of the MOP of (19) is the Pareto set shown in Figure 14, where 12 solutions are plotted (squares lined by continuous line) and listed in Table 3. These scenarios of minimum are expected to cover all failure of Near Misses scenarios cluster.

Table 3: List of the Pareto-optimal sequence vectors.
Figure 14: Pareto front for the cluster of Near Misses.

The coverage can be verified by, first, identifying the most similar characteristics of the sequence vectors belonging to the Near Misses cluster with the Pareto set scenarios , and, then, by solving a set covering problem (SCP) [43, 44].

The most similar characteristics can be computed by coverage vectors (one for each scenario belonging to ): this entails calculating the Hamming distance [45] between each sequence vectors in and each one of the other sequence vectors in the Near Misses cluster [46]. The entries of the coverage vector (in our case twelve entries, one for time, magnitude, and order of occurrence of each component failure; see Figure 5) are increased if the Hamming distance between one same entry of the considered scenario belonging to and of the Near Misses vectors is equal to zero, as shown, without loss of generality, in Figure 15 for 1 sequence vector of and only 2 Near Misses vectors.

Figure 15: Coverage vector computation by Hamming distance.

Table 4 lists the 12 coverage vectors, where each entry is the percentage of Near Misses vectors having the same stochastic behavior of the optimal set shown in Table 3. It can be seen that, for each scenario belonging to , columns 8, 11, and 12 (e.g., sensor-PID communication failure magnitude, PID failure magnitude, and PID order of failure, resp.) have the largest values of the coverage vectors: this means that the majority of the sequence vectors of the Near Misses clusters can be well represented by (only) these failures. Furthermore, the analysis of the MVL values of the scenarios belonging to (Table 3) where the largest coverage values of these columns are registered (i.e, 87%, 98.5%, and 85.2% for columns 8, 11, and 12, resp.) highlights that these failures are characterized by the same MVL values that can be summarized as follows:(i)the failure of the communication between the sensor monitoring the and the PID controller;(ii)the failure of the PID controller with magnitude belonging to of the value that should be provided at , that is, magnitude equal to 4 in MVL framework, and it is the first accident occurring along the sequence of events in over 85% of the Near Misses scenarios.A SCP can, thus, be solved for verifying that these latest characteristics are the minimum set of stochastic event occurrences and deterministic process variables values of that exhaustively describe the scenarios belonging to the Near Misses cluster: if a Near Miss sequence vector is characterized by (at least) one of the common characteristics, this is covered by the optimal set . In the present application we have verified that all the scenarios belonging to the identified Near Misses cluster are covered by the minimal conditions that lead the system into a quasifault state, that is, the optimal set . In conclusion, the occurrence of one of the common characteristics listed above is sufficient to lead the system into endangered and insecure operational conditions.

Table 4: List of coverage vectors for each scenario belonging to the Pareto set .

4. Conclusions

In this paper, a risk-based clustering approach for Near Misses identification has been proposed. The approach includes a risk-based feature selection task, where by each safe scenario it is described in terms of probability, consequence, and overall risk. The optimal features set is identified by a wrapper approach based on the combination of a MBDE algorithm with -means clustering. The characteristics of the Near Misses scenarios are, then, identified solving a multiobjective optimization problem and Hamming distance as a measure of similarity.

The application of the approach to a case study of IDPSA of a UTSG has shown the possibility of retrieving relevant information for risk monitoring.

Appendices

A. Static Reliability Analysis

For a static reliability analysis of the UTSG, we conservatively assume that component failures occur at the beginning of the scenario, with magnitudes equal to their extreme (either maximum or minimum) plausible values [19]. We analyze the dynamic response of the system at constant values ( and ) and identify the minimal cuts sets (MCS) with respect to the low and high level failure modes. Considering the binary, safe or faulty, states of the 6 components (component state is 0 if it works and 1 if it is failed), the number of possible system configurations is equal to 26. However, many configurations are not detectable in constant power scenarios, for example, simultaneous occurrence of low and high power communication failures, whereas some others are not important when event occurrence timing is not considered; for example, PID and communication failures occur simultaneously, because, in this case, the feedback control output would always be the same as a stand-alone PID failure. Thus, the possible system configurations to be considered in a static analysis with constant power is equal to 12 for each power level (Table 5).

Table 5: Possible system configurations to be considered in the static reliability analysis with constant power profile.

To identify the system MCS, the different system configurations of Table 5 have been simulated by the SIMULINK model, at low and at high (constant) power levels. It turns out that the MCSs for the high level failure mode are the same at both power levels (Figure 16): the failures of the PID controller at its minimum values (i.e., −18% of the nominal that should be provided at ) and of the steam valve at its maximum value (i.e., 150% of the nominal value that should be provided at ) are two first-order MCS. The evolutions when these MCSs occur are shown in Figures 17 and 18.

Figure 16: Fault tree for the high level failure mode.
Figure 17: evolution when the PID controller output is stuck at time at the minimum allowable value of −18% of nominal that should be provided at 5% (a) and at 80% (b).
Figure 18: evolution when the steam valve fails stuck at time at the maximum allowable value of 150% of nominal that should be provided at 5% (a) and at 80% (b).

The analysis of the low level failure mode provides different MCSs at different . At 5% , there are three first-order MCSs represented by the following: (i) safety valve fails stuck at the maximum allowable value: that is,  (kg/s); (ii) steam valve fails stuck closed; (iii) PID controller fails stuck at its maximum values, (i.e., 18% of the nominal value that should be provided at 5% ). The evolution when these MCSs occur and the relative FT are shown in Figures 19 and 20, respectively.

Figure 19: (a) evolution when the safety relief valve fails stuck with  (kg/s); (b) the steam valve fails stuck closed; (c) the PID controller fails stuck at 18% of nominal that should be provided at 5% .
Figure 20: Fault tree for the low level failure mode at low power.

At 80% , three MCSs are found: (i) a second-order MCS that combines the failure of the safety relief valve at its maximum allowable value: that is,  (kg/s), and the failure of the communication, (ii) the steam valve failure in a closed position, and (iii) the PID controller fails at its maximum value (i.e., 18% of the nominal value that should be provided at 80% ). The evolution when these MCSs occur and the relative FT are shown in Figures 21 and 22, respectively.

Figure 21: (a) evolution when simultaneously the safety relief valve fails stuck with  (kg/s) and the communication fails; (b) the steam valve fails stuck closed; (c) the PID controller fails stuck at 18% of the nominal that should be provided at 80% .
Figure 22: Fault tree for the low level failure mode at high power.

B. Near Misses Sequence Vector Scenarios

See Table 6.

Table 6

Abbreviations

CET:Continuous event tree
CH:Calinski-Harabasz index
DBA:Design basis accident
DET:Dynamic event tree
DSA:Deterministic safety analysis
ET:Event tree
FT:Fault tree
IDPSA:Integrated deterministic and probabilistic safety analysis
IE:Initiating event
MBDE:Modified binary differential evolution
MCS:Minimal cuts set
MOP:Multiobjective optimization problem
MVL:Multiple-valued logic
NPP:Nuclear power plant
PIs:Prime implicants
PSA:Probabilistic safety analysis
SCP:Set covering problem
SG:Steam generator
TH:Thermal-hydraulics
UTSG:U-tube steam generator.
Symbols
:Probability that the developing scenario is an accidental scenario
:Consequence that the developing scenario can cause to the system
:Overall risk of the developing scenario
:Time instant
:Probability that at time the scenario can lead the system into an accidental scenario
:Consequence that at time the developing scenario is predicted to cause to the system
:Overall risk of the developing scenario at time
:Flow rate of fresh feed-water entering the steam generator
:Operating power
:Nominal power
:Flow rate of dry steam exiting the steam generator
:Narrow range steam generator water level
:Wide range steam generator water level
:Time constant for the dynamics
:Flow rate of incoming water in steam generator tube bundle region
:Time constant for the water mass transportation dynamics
:Time constant for the feed-water valve dynamics
:Flow rate of steam-water mixture exiting the steam generator tube bundle region
:Time constant for the dynamics relating to
:Constant in the nonminimum phase term of the dynamics relating to
:Time constant for the dynamics
:System state
:Derivative of system state
:Narrow range steam generator water level at a reference position
:Automatic reactor trip threshold
:Turbine trip threshold
:First prealarm automatic reactor trip threshold
:First prealarm turbine trip threshold
:Second prealarm automatic reactor trip threshold
:First prealarm turbine trip threshold
:Water flow rate provided by PID controller
:Water flow rate removed by safety valve
:Mission time
:Time steps in MVL discretization
:Cumulative probability function of the Gaussian distribution
:Mean value of the Gaussian distribution
:Standard deviation of the Gaussian distribution
:Intensity coefficient
:Number of clusters
:Index of the profile of , , and
:Mean value of the th profile
:Peak value of the th profile
:Standard deviation of the th profile
:Root mean square of the th profile
:Skewness of the th profile
:Kurtosis of the th profile
:Number of scenarios to the training set
:Dimension of the set of features
:Number of scenarios to the test set
:Overall between-cluster variance
:Overall within-cluster variance
:Number of scenarios assigned to the th cluster
:Generic scenario
:Centroid of the th cluster
:Mean risk of the clustered scenarios
:Time elapsed from the instant at which starts to deviate from zero of the clustered scenarios
:Fitness function of the MOP
:First objective function of the MOP
:Second objective function of the MOP
:Sequence vector belonging to the Pareto set of the MOP.

Conflict of Interests

The authors declare no conflict of interests.

References

  1. D. G. Kang, S.-H. Ahn, and S. H. Chang, “A combined deterministic and probabilistic procedure for safety assessment of beyond design basis accidents in nuclear power plant: application to ECCS performance assessment for design basis LOCA redefinition,” Nuclear Engineering and Design, vol. 260, pp. 165–174, 2013. View at Publisher · View at Google Scholar · View at Scopus
  2. E. Zio and F. Di Maio, “The needs and dreams for methodologies of IDPSA,” in Proceedings of the Integrated Deterministic and Probabilistic Safety Analyses Workshop, KTH, Stockholm, Sweden, November 2012.
  3. E. Zio, F. Di Maio, and J. Tong, “Safety margins confidence estimation for a passive residual heat removal system,” Reliability Engineering and System Safety, vol. 95, no. 8, pp. 828–836, 2010. View at Publisher · View at Google Scholar · View at Scopus
  4. T. Aldemir, “A survey of dynamic methodologies for probabilistic safety assessment of nuclear power plants,” Annals of Nuclear Energy, vol. 52, pp. 113–124, 2013. View at Publisher · View at Google Scholar · View at Scopus
  5. W. Keller and M. Modarres, “A historical overview of probabilistic risk assessment development and its use in the nuclear power industry: a tribute to the late Professor Norman Carl Rasmussen,” Reliability Engineering and System Safety, vol. 89, no. 3, pp. 271–285, 2005. View at Publisher · View at Google Scholar · View at Scopus
  6. Y. Vorobyev and P. Kudinov, “Development and application of a genetic algorithm based dynamic pra methodology to plant vulnerability search,” in International Topical Meeting on Probabilistic Safety Assessment and Analysis, vol. 1, pp. 559–573, 2011.
  7. N. Khakzad, F. Khan, and P. Amyotte, “Dynamic risk analysis using bow-tie approach,” Reliability Engineering and System Safety, vol. 104, pp. 36–44, 2012. View at Publisher · View at Google Scholar · View at Scopus
  8. M. Marseguerra and E. Zio, “Monte Carlo approach to PSA for dynamic process systems,” Reliability Engineering and System Safety, vol. 52, no. 3, pp. 227–241, 1996. View at Publisher · View at Google Scholar · View at Scopus
  9. J. Kirschenbaum, P. Bucci, M. Stovsky et al., “A benchmark system for comparing reliability modeling approaches for digital instrumentation and control systems,” Nuclear Technology, vol. 165, no. 1, pp. 53–95, 2009. View at Google Scholar · View at Scopus
  10. T. Aldemir, S. Guarro, J. Kirschenbaum et al., “A Benchmark implementation of two dynamic methodologies for the reliability modeling of digital instrumentation and control systems,” NUREG-CR Report Draft, 2008. View at Google Scholar
  11. P. E. Labeau, C. Smidts, and S. Swaminathan, “Dynamic reliability: towards an integrated platform for probabilistic risk assessment,” Reliability Engineering and System Safety, vol. 68, no. 3, pp. 219–254, 2000. View at Publisher · View at Google Scholar · View at Scopus
  12. E. Zio, “Integrated deterministic and probabilistic safety assessment: concepts, challenges, research directions,” Nuclear Engineering and Design, vol. 280, pp. 413–419, 2014. View at Publisher · View at Google Scholar
  13. J. Devooght and C. Smidts, “Probabilistic reactor dynamics I: the theory of continuous event trees,” Nuclear Science and Engineering, vol. 111, no. 3, pp. 229–240, 1992. View at Google Scholar · View at Scopus
  14. V. Kopustinskas, J. Augutis, and S. Rimkevičius, “Dynamic reliability and risk assessment of the accident localization system of the Ignalina NPP RBMK-1500 reactor,” Reliability Engineering and System Safety, vol. 87, no. 1, pp. 77–87, 2005. View at Publisher · View at Google Scholar · View at Scopus
  15. E. Hofer, M. Kloos, B. Krzykacz-Hausmann, J. Peschke, and M. Sonnenkalb, Dynamic Event Trees for Probabilistic Safety Analysis, Gesellschaft für Anlagen- und Reaktorsicherheit, Garsching, Germany, 2004.
  16. A. Hakobyan, R. Denning, T. Aldemir, S. Dunagan, and D. Kunsman, “A methodology for generating dynamic accident progression event trees for level 2 PRA,” Tech. Rep. SAND2008-4746, Sandia National Laboratories, Albuquerque, NM, USA, 2008. View at Google Scholar
  17. F. Di Maio, S. Baronchelli, and E. Zio, “A computational framework for prime implicants identification in non-coherent dynamic systems,” Risk Analysis, vol. 35, no. 1, pp. 142–156, 2015. View at Publisher · View at Google Scholar
  18. W. V. Quine, “The problem of simplifying truth functions,” The American Mathematical Monthly, vol. 59, pp. 521–531, 1952. View at Publisher · View at Google Scholar · View at MathSciNet
  19. E. Zio and F. D. Maio, “Processing dynamic scenarios from a reliability analysis of a nuclear power plant digital instrumentation and control system,” Annals of Nuclear Energy, vol. 36, no. 9, pp. 1386–1399, 2009. View at Publisher · View at Google Scholar · View at Scopus
  20. V. M. Bier and W. Yi, “The performance of precursor-based estimators for rare event frequencies,” Reliability Engineering and System Safety, vol. 50, no. 3, pp. 241–251, 1995. View at Publisher · View at Google Scholar · View at Scopus
  21. J. W. Johnson and D. M. Rasmuson, “The US NRC's accident sequence precursor program: an overview and development of a Bayesian approach to estimate core damage frequency using precursor information,” Reliability Engineering and System Safety, vol. 53, no. 2, pp. 205–216, 1996. View at Publisher · View at Google Scholar · View at Scopus
  22. J. H. Saleh, E. A. Saltmarsh, F. M. Favarò, and L. Brevault, “Accident precursors, near misses, and warning signs: critical review and formal definitions within the framework of Discrete Event Systems,” Reliability Engineering and System Safety, vol. 114, no. 1, pp. 148–154, 2013. View at Publisher · View at Google Scholar · View at Scopus
  23. O. Zadakbar, S. Imtiaz, and F. Khan, “Dynamic risk assessment and fault detection using a multivariate technique,” Process Safety Progress, vol. 32, no. 4, pp. 365–375, 2013. View at Publisher · View at Google Scholar · View at Scopus
  24. K. Fukunaga and L. D. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition,” IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32–40, 1975. View at Google Scholar · View at MathSciNet
  25. D. Mandelli, A. Yilmaz, K. Metzroth, T. Aldemir, and R. Denning, “Scenario aggregation and analysis via Mean-Shift Methodology,” in Proceedings of the International Congress on Advances in Nuclear Power Plants (ICAPP '10), vol. 2, pp. 987–991, June 2010. View at Scopus
  26. J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer Academic, Norwell, Mass, USA, 1981. View at MathSciNet
  27. J. F. Aubry, G. Babykina, A. Barros et al., “Project APPRODYN: APPROches de la fiabilité DYNamique pour modéliser des systèmes critiques,” Tech. Rep., Collaboration CRAN, EDF R&D, INRIACQFD, UTT-ICD, 2012. View at Google Scholar
  28. M. V. Kothare, B. Mettler, M. Morari, P. Bendotti, and C.-M. Falinower, “Level control in the steam generator of a nuclear power plant,” IEEE Transactions on Control Systems Technology, vol. 8, no. 1, pp. 55–69, 2000. View at Publisher · View at Google Scholar · View at Scopus
  29. H. Habibiyan, S. Setayeshi, and H. Arab-Alibeik, “A fuzzy-gain-scheduled neural controller for nuclear steam generators,” Annals of Nuclear Energy, vol. 31, no. 15, pp. 1765–1781, 2004. View at Publisher · View at Google Scholar · View at Scopus
  30. M. Marseguerra, E. Zio, and F. Cadini, “Optimized adaptive fuzzy controller of the water level of a pressurized water reactor steam generator,” Nuclear Science and Engineering, vol. 155, no. 3, pp. 386–394, 2007. View at Google Scholar · View at Scopus
  31. [IAEA-TECDOC-981; 1997], “Assessment and management of ageing of major nuclear power plant component important to safety: Steam,” IAEA, IAEA-TECDOC-98, Vienna, 1997.
  32. S. F. Garribba, E. Guagnini, and P. Mussio, “Multiple-valued logic trees: meaning and prime implicants,” IEEE Transactions on Reliability, vol. 34, no. 5, pp. 463–472, 1985. View at Publisher · View at Google Scholar · View at Scopus
  33. F. Di Maio, S. Baronchelli, M. Vagnoli, and E. Zio, “Prime implicants determination by differential evolution for dynamic reliability analysis of non-coherent systems,” Reliability Engineering and System Safety. Under review.
  34. F. Di Maio, S. Baronchelli, and E. Zio, “A visual interactive method for prime implicants identification,” IEEE Transactions on Reliability, no. 99, pp. 1–11, 2014. View at Publisher · View at Google Scholar
  35. C. M. Rocco and M. Muselli, “A machine learning algorithm to estimate minimal cut and path sets from a Monte Carlo simulation,” in Probabilistic Safety Assessment and Management, pp. 3142–3147, Springer, London, UK, 2004. View at Publisher · View at Google Scholar
  36. F. Di Maio, J. Hu, P. Tse, M. Pecht, K. Tsui, and E. Zio, “Ensemble-approaches for clustering health status of oil sand pumps,” Expert Systems with Applications, vol. 39, no. 5, pp. 4847–4859, 2012. View at Publisher · View at Google Scholar · View at Scopus
  37. R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artificial Intelligence, vol. 97, no. 1-2, pp. 273–324, 1997. View at Publisher · View at Google Scholar · View at Scopus
  38. P. Baraldi, F. Cannarile, F. Di Maio, and E. Zio, “Hierarchical k-nearest neighbours classification and binary differential evolution for fault diagnostics of automotive bearings operating under variable conditions,” Reliability Engineering and System Safety. Under review.
  39. L. Wang, X. Fu, M. I. Menhas, and M. Fei, “A modified binary differential evolution algorithm,” in Life System Modeling and Intelligent Computing, vol. 6329 of Lecture Notes in Computer Science, pp. 49–57, Springer, Berlin, Germany, 2010. View at Publisher · View at Google Scholar
  40. J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297, Berkeley, Calif, USA, 1967.
  41. T. Calinski and J. Harabasz, “A dendrite method for cluster analysis,” Communications in Statistics, vol. 3, no. 1, pp. 1–27, 1974. View at Google Scholar
  42. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002. View at Publisher · View at Google Scholar · View at Scopus
  43. J. E. Beasley and P. C. Chu, “A genetic algorithm for the set covering problem,” European Journal of Operational Research, vol. 94, no. 2, pp. 392–404, 1996. View at Publisher · View at Google Scholar · View at Scopus
  44. F. Di Maio, S. Baronchelli, and E. Zio, “Hierarchical differential evolution for minimal cut sets identification: application to nuclear safety systems,” European Journal of Operational Research, vol. 238, no. 2, pp. 645–652, 2014. View at Publisher · View at Google Scholar · View at MathSciNet · View at Scopus
  45. R. W. Hamming, “Error detecting and error correcting codes,” The Bell System Technical Journal, vol. 29, no. 2, pp. 147–160, 1950. View at Publisher · View at Google Scholar · View at MathSciNet
  46. A. Popa and J. J. McDowell, “The effect of Hamming distances in a computational model of selection by consequences,” Behavioural Processes, vol. 84, no. 1, pp. 428–434, 2010. View at Publisher · View at Google Scholar · View at Scopus