Research Article  Open Access
Automatic Emergence Detection in Complex Systems
Abstract
Complex systems consist of multiple interacting subsystems, whose nonlinear interactions can result in unanticipated (emergent) system events. Extant systems analysis approaches fail to detect such emergent properties, since they analyze each subsystem separately and typically arrive at decisions through linear aggregations of individual analysis results. In this paper, we propose a quantitative definition of emergence for complex systems. We also propose a framework to detect emergent properties given observations of a system's subsystems. This framework, based on a probabilistic graphical model called Bayesian Knowledge Bases (BKBs), learns individual subsystem dynamics from data, probabilistically and structurally fuses these dynamics into a single model of the complex system's dynamics, and detects emergent properties. Fusion is the central element of our approach, since it accounts for situations in which a common variable may have different probability distributions in different subsystems. We evaluate our detection performance against a baseline approach (a Bayesian Network ensemble) on synthetic testbeds built from UCI datasets. To do so, we also introduce a method to simulate, and a metric to measure, the discrepancies that occur with shared/common variables. Experiments demonstrate that our framework outperforms the baseline. In addition, we demonstrate that this framework has uniform polynomial time complexity across all three procedures: learning, fusion, and reasoning.
1. Introduction
Complex systems usually consist of multiple subsystems, whose nonlinear interactions can cause unpredictable and disastrous outcomes. However, it is intractable to analyze all possible outcomes of a complex system directly, due to the combinatorial nature of the problem. Extant analysis approaches often build a separate model for each subsystem and draw conclusions about the entire system by linearly aggregating the individual analysis results. This approach, although simple, cannot model emergence in complex systems. Emergence is, in fact, one of the most challenging concepts in complex systems, and there is significant disagreement about its nature. Some researchers, such as Mill and John [1] and Broad [2], model complex systems using a layered approach, in which the world consists of different strata. Per this approach, higher-level emergent properties result from lower-level causal interactions. Others, such as Wears et al. [3], study emergent properties through predictive approaches and claim that emergent properties are system-level features that could not have been anticipated. Per this definition, emergent properties are those that cannot be predicted even by individuals who possess thorough knowledge of the parts of the system. Popper and Eccles [4] relate emergence to unpredictability by studying the nondeterminism within complex systems. Another viewpoint identifies a spectrum of approaches to emergence. Bedau [5] distinguishes between weak and strong emergence. Per his definition, weak emergence can be derived from knowledge of the system's microdynamics and external conditions, but only by simulation. Strong emergence, on the other hand, cannot be derived even by simulation.
No matter how emergence is defined, these definitions agree that emergence stems from the interaction of the subsystems of a complex system. To model subsystem interactions and detect the resulting emergence, we must first model the complex system itself. Extant complex systems modeling techniques can be classified into three groups: (i) subject matter experts (SMEs) manually analyze system dynamics and create a descriptive model, such as the model in [3]; (ii) experts simulate system dynamics via an agent-based complex systems model (ACS), such as the model in [6]; and (iii) data scientists collect data about individual subsystems, learn subsystem models from the data via machine learning approaches, and integrate the subsystem models via ensemble methods, such as the model in [7]. The first approach is only useful for post-event analysis, since SMEs can manually analyze only event-related scenarios out of all possible scenarios, whose number is combinatorial. The second approach requires experts to manually build behavioral models for each agent and set up proper parameters, which is both time-consuming and expensive for large-scale complex systems. The third approach, although easy to apply and suitable for large complex systems, cannot detect emergence, because ensemble methods integrate subsystem models through their outputs, neglecting the interactions among shared/common variables.
To overcome the drawbacks of extant approaches, we need a new framework that simultaneously builds complex system models automatically from data, detects emergence, and scales to large complex systems. The first requirement for the new framework is to learn a complex system model from data automatically. Given a single dataset drawn from the entire complex system, we could simply learn a single model in the hope of capturing all interactions and then detect emergence within it. However, since a large-scale complex system usually consists of multiple (loosely) coupled, (possibly competing) subsystems, it is impractical (and likely infeasible) to construct a single dataset that captures all of its features and dynamics. At best, we typically have access to multiple datasets corresponding to different subsystems. Given this limitation, we can only learn a separate model from each subsystem's dataset and fuse these submodels into one model via the variables they share. Ensemble methods also learn separate models for different subsystems, but they run inference on each model separately and choose the (weighted) majority opinion as the final opinion. However, the true result may differ from the majority opinion.
We provide an alternative definition of emergence in complex systems, derived as follows. Given some target variable, we query its state on the subsystem models learned from the corresponding datasets and group their opinions into majority and minority sets. Then we observe its state at the level of the entire system. If its true state (observed over the entire system) differs from the majority opinion given by the subsystems, we consider the situation emergent. This definition resembles the one given by predictive approaches, in that such a state “cannot be predicted even by individuals who possess thorough knowledge of the parts of this system.”
Based on the existence of majority and minority opinions, we define four types of emergence. If all subsystems form a unanimous opinion and the true result differs from it, we call it Type 1 emergence. If both majority and minority opinions exist, but the true result differs from both the majority and the minority opinions, we call it Type 2 emergence. If both majority and minority opinions exist, and the true result is consistent with a minority opinion, we call it Type 3 emergence. If only minority opinions exist, and the true result differs from all of them, we call it Type 4 emergence. This definition is complete for a complex system with an arbitrary number of subsystems, provided that each subsystem can give a valid opinion about the queried target. However, if some subsystem cannot give a direct opinion on the target variable but can give opinions about variables that also exist in other subsystems, its opinion will implicitly affect the other subsystems' opinions about the target variable. Worse, if such feedback exists among the subsystems, a conclusion may not be reached easily. We leave such complex scenarios for future work.
In this paper, we describe our approach to modeling and detecting emergence in complex systems according to our proposed definition of emergence. In brief, we first learn subsystem dynamics from observations of each subsystem using a probabilistic graphical model called Bayesian Knowledge Bases (BKBs) [8]. Then we fuse these BKBs into one BKB via the BKB fusion algorithm [9], which incorporates the interactions among subsystems in a way that is both probabilistically and structurally sound. Lastly, we perform belief updating on the fused BKB (FBKB) to detect emergence in the complex system. The entire framework, which consists of learning, fusion, and reasoning blocks, is named Bayesian Knowledge Fusion for Complex System (BKFCS).
Experiments on synthetic datasets show that our proposed method detects emergence better than extant approaches. We also show that our proposed algorithm has polynomial time complexity in all three phases: learning, fusion, and reasoning.
The contribution of this paper is twofold:
(i) It defines four types of emergence in a complex system based on deviations from the majority and minority opinions of its subsystems, derived from observations/datasets of those subsystems. This quantitative, data-driven definition differs from extant descriptive definitions of emergence in that it sets up concrete boundaries between different kinds of emergence. To the best of our knowledge, this quantitative approach is the first of its kind.
(ii) It designs an automatic emergence detection algorithm based on supervised machine learning techniques. The framework is built upon a probabilistic graphical model named Bayesian Knowledge Base, which not only detects the four types of emergence but also traces back the variable interactions that result in emergence.
The rest of this paper is organized as follows. We begin with brief background on Bayesian Knowledge Bases (BKBs), learning BKBs from data, fusing multiple BKBs, and belief updating on BKBs; these are the principal components of our detection framework and algorithm. Next, in Section 3, we formally define emergence in complex systems according to our four proposed types, provide an illustrative example of a complex system, apply our emergence detection framework to this example, explore factors underlying emergence in complex systems with respect to our framework, and briefly recap the framework and its operation. Having established our framework, we then detail our experiments and analyses on synthetically generated complex systems testbeds, comparing against extant approaches and evaluating our proposed factors and measures.
2. Background
This section first introduces Bayesian Knowledge Bases (BKBs), the building block of our proposed framework. Next, we summarize the BKB learning approach for subsystem data and describe how to fuse multiple such BKBs into one fused BKB, which represents the subsystem interactions that can cause emergence. Lastly, we present how to run belief updating on a fused BKB and its role in detecting emergence.
2.1. Bayesian Knowledge Bases (BKBs)
Before introducing BKBs, we would like to provide some intuition behind our choice of building blocks for our framework. Researchers have proposed various methods and modeling strategies to explore different aspects of complex systems. In this paper, since the research objective is to detect and explain emergence in complex systems, we opted for probabilistic graphical models, which are powerful tools for exploring variable relationships and providing quantitative explanations. In fact, probabilistic graphical models such as Bayesian Networks (BN) [10] and Markov Random Fields (MRF) [11] have been widely applied to model causal relationships and/or interactions among variables in a system. Many researchers have also proposed methods to learn a Bayesian Network or Markov Random Field from data [12–20].
However, neither BNs nor MRFs serve our purpose well. In an MRF, variable connections are undirected, so an MRF cannot express causal relationships, yet one of our goals is to understand the causal relationships behind emergence. For BNs, extant methods of fusing multiple BNs into one BN have several drawbacks. First, if two BNs contain contradictory information about the direction of a causal link, extant fusion algorithms require compromise and consensus regarding this direction [12, 13], which results in unrecoverable information loss. Second, if two BNs contain incompatible variable distributions, a new distribution is created by merging them [12]. Unfortunately, this new distribution no longer represents the causal relationships observed in the subsystems.
To solve these problems, we use Bayesian Knowledge Bases [8] in our emergence detection framework. BKBs are an alternative to Bayesian Networks (BNs) that specify dependence at the instantiation level (whereas BNs are specified only at the random variable level), allow cycles between variables, and loosen the requirement of specifying complete probability distributions. Figure 1 illustrates a simple BKB.
In general, a BKB is specified by a set of Inodes (instantiation nodes, rectangles), a set of Snodes (support nodes, circles), and edges between and , namely, the tuple . In a BKB, a variable is called a component (denoted as ). A BKB does not include an icon for a component; instead, it represents all instantiations/states of a component with multiple Inodes. This differs from a BN, which represents a variable/component with a single icon. In Figure 1, there are two components, and . Each component can take two states, and . An Inode is drawn as a rectangle, and it represents the th state of the th variable. In this example, an Inode corresponds to the first rectangle in the first row with remaining Inodes , , and , respectively.
An Snode is represented as a circle, and it contains a value for some prior or conditional probability. A directed edge connects an Snode and an Inode, which represents direct conditional dependency between the single immediate Inode descendant of the Snode (also called its head, denoted as ) and the immediate Inode predecessors (also called its tail, denoted as ). The conditional probability is denoted as . In the example, the Inode is the descendant or head of the Snode with value 0.01, and the Inode is a predecessor or tail of the same Snode. This connection represents the conditional probability . If an Snode only has a descendant but no predecessor, the connection from it to its descendant represents the prior probability. In the example, one such connection is , representing that .
The set of components to which set belongs is the parent component set of Inode , denoted as . In the example, component is the parent set of the Inode and the Inode . This relationship is similar to that in a BN, where all states of one variable have the same set of parent variables. In a general BKB, however, different states of a component can have different parent variable sets. This feature allows more flexible variable relationships in a BKB than in a BN; however, exploiting it is beyond the scope of this paper.
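The Inode/Snode structure and the parent component set described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BKB implementation of [8]: an Inode is assumed to be a (component, state) pair, an Snode stores its head, its tail set, and its probability, and the parent component set of an Inode is read as the set of components appearing in the tails of the Snodes that point to it. The component names "A" and "B", their states, and all probability values are hypothetical placeholders rather than the contents of Figure 1.

```python
from dataclasses import dataclass

# Hypothetical encoding: an Inode is a (component, state) pair, e.g. ("A", "a1").
Inode = tuple[str, str]

@dataclass(frozen=True)
class Snode:
    head: Inode              # the single immediate Inode descendant
    tail: frozenset          # immediate Inode predecessors (empty set => prior)
    prob: float              # P(head | tail), or the prior if tail is empty

def parent_components(inode: Inode, snodes: list[Snode]) -> set[str]:
    """Components appearing in the tails of all Snodes whose head is `inode`."""
    return {comp for s in snodes if s.head == inode for comp, _ in s.tail}

# Placeholder BKB fragment: B's states depend on A's states.
snodes = [
    Snode(("A", "a1"), frozenset(), 0.3),
    Snode(("A", "a2"), frozenset(), 0.7),
    Snode(("B", "b1"), frozenset({("A", "a1")}), 0.01),
    Snode(("B", "b1"), frozenset({("A", "a2")}), 0.6),
]

print(parent_components(("B", "b1"), snodes))  # {'A'}
print(parent_components(("A", "a1"), snodes))  # set()
```

Because parent sets are computed per Inode rather than per component, two states of the same component could in principle return different sets, which is the extra flexibility the text attributes to general BKBs.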
2.2. Subsystem Learning from Data
This subsection describes a BKB learning algorithm, inspired by extant BN learning algorithms.
The first step in building our Bayesian Knowledge Fusion for Complex System (BKFCS) emergence detection framework is to learn a probabilistic model from subsystem data. In the machine learning literature, scoring-function-based methods have been widely applied to the BN learning problem. Scoring functions can be classified into two categories: information-theoretic scoring functions and Bayesian scoring functions [14].
Typical information-theoretic scoring functions include log likelihood (LL), minimal description length (MDL), Bayesian information criterion (BIC) [15], Akaike information criterion (AIC) [16], and mutual information test (MIT) [17]. Typical Bayesian scoring functions include BD [18], BDe [18], BDeu [19], and K2 [20]. However, BKBs represent variable dependencies at the level of variable instantiations, so existing scoring functions cannot be applied directly in a BKB learning algorithm.
Instead, we propose a modified scoring function designed for learning a BN-like BKB, together with a greedy algorithm that learns a BKB from a given dataset. This algorithm learns a BKB that maximizes the scoring function (1) given dataset , assuming it contains cases and variables/features, and each feature/component , has states/Inodes. The notation means the number of cases in which condition “” holds. The penalty constant is set to 0.01 in our algorithm. The function consists of two parts: the first part computes the log likelihood of BKB given dataset , and the second part is a penalty for complexity and overfitting, which is proportional to the difference between the number of possible Snodes and the number of Snodes that appear in the BKB. Our function differs from existing scoring functions in the penalty term (the second part): MDL, BIC, and AIC penalize network fitness only by the total number of parent-child patterns, namely, .
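To make the count notation concrete, the sketch below computes the log-likelihood part of a BN-like score from counts, in the standard form of summing N(x, pa) log(N(x, pa)/N(pa)) over observed child-state/parent-configuration pairs, and also counts how many Snodes (observed pairs) a candidate structure would contain versus how many are possible. It is an illustration under stated assumptions: the dataset, the variable names, and the helper functions are hypothetical, and the exact penalty term of (1) is not reproduced here.

```python
import math
from collections import Counter

def log_likelihood(data, child, parents):
    """Sum of N(x, pa) * log(N(x, pa) / N(pa)) over observed (pa, x) pairs."""
    # Joint counts N(child = x, parents = pa) and marginal counts N(parents = pa).
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    return sum(n * math.log(n / marginal[pa]) for (pa, _), n in joint.items())

def snode_counts(data, child, parents, states):
    """Return (possible, present) Snode counts for one child under one parent set."""
    possible = len(states[child]) * math.prod(len(states[p]) for p in parents)
    present = len({(tuple(row[p] for p in parents), row[child]) for row in data})
    return possible, present

# Hypothetical four-case dataset over two binary variables.
data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 0}]
states = {"A": [0, 1], "B": [0, 1]}

print(log_likelihood(data, "B", []))            # about -2.249
print(log_likelihood(data, "B", ["A"]))         # about -1.386
print(snode_counts(data, "B", ["A"], states))   # (4, 3)
```

In this toy dataset, adding A as a parent of B raises the log-likelihood from about -2.25 to about -1.39, while the resulting structure uses 3 of 4 possible Snodes; a greedy learner would weigh such gains against the penalty in (1).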
We have also learned BKBs using the Bayes, BDeu, MDL/BIC, MIT (entropy), and AIC scoring functions and tested their performance against BKBs learned with our proposed function on thirteen UCI datasets. We chose these five popular scoring functions because their usefulness has been widely tested and validated. Their average accuracies are 84%, 83%, 82%, 70%, and 85%, respectively (details in Table 12). Our scoring function achieves 85% average accuracy on the same testbed (details in the first column of Table 13). Thus, our function outperforms four of the five scoring functions and performs comparably to AIC. However, BKBs learned using AIC tend to have simple structures. Even though such simple BKBs perform as well in classification tasks as BKBs learned with our proposed function, a BKB learned from the AIC score drops variable interactions both within a BKB and across BKBs. Without sufficient interactions across BKBs, its capability to detect emergence is reduced. For this reason, we do not use AIC.
In general, learning a BN or BKB from data is NP-hard; therefore, we make several trade-offs to achieve polynomial time complexity. A detailed complexity analysis is provided in Appendix A. In the worst case, the time complexity of the entire learning algorithm is , where is the number of variables and is the number of cases. The other two constants are explained in the Appendix.
To test its performance against other kinds of models on a general supervised classification task, we compare the BKB classifier with a wide range of popular classifiers on the same testbed: Adaboost [21], Bayesian Network [22], Sequential Minimal Optimization (SMO) [23], logistic regression [24], and decision tree [25]. Experimental results show that our classifier achieves comparable accuracy. Since learning a BN-like BKB is not the central contribution of this paper, these results are detailed in Appendix A.
2.3. Subsystem Probabilistic Fusion
This subsection describes how to fuse multiple BKBs learned from multiple subsystem related datasets into one fused BKB (FBKB) that represents the entire system dynamics.
To fuse multiple BKBs, we apply the BKB fusion algorithm developed by Santos Jr. et al. [9]. The resulting fused BKB (FBKB) is the knowledge base on which the BKFCS framework reasons.
We design another BKB, shown in Figure 3, which contains the same set of variables as BKB 1 in Figure 1 but different probability distributions. Fusing BKB 1 and BKB 2 (Figure 3) then yields the fused BKB in Figure 2. Briefly, the idea is to associate each component of each BKB with a special component called a source fusion component. In this example, there are two such components: and , and each has two source Inodes: “tom” and “john.” Each source Inode connects to an Inode via all Snodes pointing to it. Each source Inode also has one Snode that points to it, representing the reliability/weight of its source. In this example, the weights are 0.5/0.5, meaning that the two sources “tom” and “john” are equally reliable.
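The source-fusion construction just described can be sketched as follows. This is a simplified, hypothetical rendering of the idea, not the algorithm of Santos Jr. et al. [9] (whose details are in Appendix B). It assumes each Snode stores a head Inode, a tail set, and a probability; it names the source fusion component of a component C "S_C" (a placeholder convention); it adds each source Inode to the tail of every Snode from that source's BKB whose head belongs to C; and it gives each source Inode a prior Snode holding the source's reliability, normalized over the sources that mention C. The BKB contents are placeholder values, not the distributions in Figures 1 and 3.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snode:
    head: tuple       # (component, state) Inode this Snode points to
    tail: frozenset   # (component, state) Inodes this Snode depends on
    prob: float       # conditional (or prior, if tail is empty) probability

def fuse(bkbs, weights):
    """Fuse {source: [Snode, ...]} into one Snode list via source components."""
    fused = []
    # Record which sources mention each component (via the heads of their Snodes).
    sources_by_comp = {}
    for src, snodes in bkbs.items():
        for s in snodes:
            sources_by_comp.setdefault(s.head[0], set()).add(src)
    # One prior Snode per source Inode, weighted by normalized source reliability.
    for comp, srcs in sources_by_comp.items():
        total = sum(weights[src] for src in srcs)
        for src in sorted(srcs):
            fused.append(Snode((f"S_{comp}", src), frozenset(), weights[src] / total))
    # Every original Snode keeps its probability and gains its source Inode in the tail.
    for src, snodes in bkbs.items():
        for s in snodes:
            fused.append(Snode(s.head, s.tail | {(f"S_{s.head[0]}", src)}, s.prob))
    return fused

def toy_bkb(p_a1, p_b1_given_a1):
    """Hypothetical two-component BKB fragment with placeholder probabilities."""
    a1 = ("A", "a1")
    return [
        Snode(a1, frozenset(), p_a1),
        Snode(("A", "a2"), frozenset(), 1 - p_a1),
        Snode(("B", "b1"), frozenset({a1}), p_b1_given_a1),
        Snode(("B", "b2"), frozenset({a1}), 1 - p_b1_given_a1),
    ]

fbkb = fuse({"tom": toy_bkb(0.3, 0.01), "john": toy_bkb(0.6, 0.9)},
            {"tom": 0.5, "john": 0.5})
print(len(fbkb))  # 12: 4 source priors + 8 original Snodes
```

Because each original Snode keeps its own probability and only gains a source Inode in its tail, conflicting distributions for a shared component coexist side by side instead of being merged, which is the property the text contrasts with BN fusion.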
This source fusion component is the glue that connects variables from different subsystems, so it fuses BKBs from various subsystems at the variable instantiation level. In this way, fusion not only computes the inferences originating from each subsystem but also computes new inferences generated by subsystem interactions through their shared variables. The accumulated probability of these new inferences contributes to the detection of emergence. Fusion also preserves the distributions and variable relationships of the base subsystems without loss of information. In general, a fused BKB cannot be represented as a BN, since both cycles and different parent Inode combinations can occur for each target Inode drawn from the different BKBs being fused together [9]. We provide the details of the BKB fusion algorithm in Appendix B.
The time complexity of the BKB fusion algorithm is also polynomial. In particular, its worst-case complexity is , where is the number of Inodes in all subsystem BKBs, is the number of Snodes, and is the total number of edges/arcs. Please refer to Appendix B for details.
2.4. Belief Updating for Emergence Detection
This subsection describes an efficient belief updating algorithm on the FBKB and briefly demonstrates how to detect emergence and perform general classification tasks at the same time.
In general, performing belief updating on a BN or a BKB is NP-hard, and even finding an approximate solution is NP-hard [26]. Bayesian belief updating computes the probability that a target variable takes a certain state, given the observation that some other feature variables take certain states. It is denoted as , where is a set of observed feature-variable instantiations. Since this probability is proportional to the joint probability , we only compute the joint probability, which we obtain by summing the probabilities of all inferences consistent with and . Exact inferencing simply enumerates all inferences, picks out the consistent ones, and sums their probabilities to obtain the joint probability.
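The enumeration step of exact inferencing can be sketched as follows. As a simplifying assumption, the inferences are supplied pre-enumerated as (assignment, probability) pairs from a hypothetical two-component model, whereas in a BKB each inference is a consistent set of Snodes whose probability is the product of their values; the function names and numbers are illustrative only.

```python
def consistent(assignment, constraints):
    """True if the assignment agrees with every (variable, state) constraint."""
    return all(assignment.get(var) == state for var, state in constraints.items())

def joint(inferences, target, state, evidence):
    """Sum the probabilities of all inferences consistent with target=state and evidence."""
    required = {**evidence, target: state}
    return sum(p for assignment, p in inferences if consistent(assignment, required))

def posterior(inferences, target, states, evidence):
    """Normalize the joint probabilities over the target's states."""
    scores = {s: joint(inferences, target, s, evidence) for s in states}
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}

# Hypothetical inferences: P(A) times P(B | A) for every complete assignment.
inferences = [
    ({"A": "a1", "B": "b1"}, 0.3 * 0.01),
    ({"A": "a1", "B": "b2"}, 0.3 * 0.99),
    ({"A": "a2", "B": "b1"}, 0.7 * 0.6),
    ({"A": "a2", "B": "b2"}, 0.7 * 0.4),
]
print(posterior(inferences, "A", ["a1", "a2"], {"B": "b1"}))
```

Because the normalizing constant is shared by all target states, ranking the joint scores gives the same answer as ranking the posteriors, which is why only the joint probability needs to be computed.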
If we perform belief updating on BKB 1, BKB 2, and their fused BKB, we observe emergence. To demonstrate the belief updating procedure, we first label all Snodes of the three BKBs in Table 1. Notice that, in this example, each pair of Snodes sums to 1, so only half of the Snodes need to be labeled. Based on these labels and the belief updating rules, we compute variable 's state probabilities in the three BKBs, as shown in Table 2. In the last two rows, the constant 0.25 is the product of the two source fusion variable priors, namely, 0.5 × 0.5. Since the two sources have equal weights and the constant appears in all inferences, it does not change the relative ordering of the probabilities of 's two states.


From the last column , we see that, for both BKBs 1 and 2, . In the fused BKB, on the other hand, we see that . This is an instance of emergence that cannot be detected by aggregating separate analyses of the subsystems. It is only a simple example; for general complex systems, we fully describe our detection framework through real-world examples and provide the underlying mathematical formulations and solutions.
Finally, we note that, in the example fused BKB, the number of inferences doubles compared with that of each single BKB, as a result of variable interaction. A fused BKB can contain an exponential number of inferences, which makes the exact inferencing algorithm extremely demanding for multivariate systems. Instead, we provide a sampling-based approach to approximate the joint probability. To sidestep NP-hardness, we set a constant threshold on the number of valid samples collected before termination. Therefore, our approximation approach has uniform polynomial time complexity while maintaining decent performance compared to the exact inferencing algorithm. We also compared its running time and accuracy against the exact inferencing algorithm and conclude that it is sufficient for detecting emergence efficiently. In the worst case, the time complexity of the approximation algorithm is , while that of the exact inferencing algorithm is , where SV is the number of shared variables among subsystems, is the number of evidence items in a testing case, and is the average number of states across all shared variables. Details are described in Appendix C.
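The idea of bounding the sampler by a constant number of valid samples can be sketched with plain rejection sampling. This is a swapped-in, simplified technique on a hypothetical two-variable model rather than the FBKB sampler of Appendix C: samples are drawn forward from P(A) and P(B | A), samples inconsistent with the evidence are discarded, and sampling stops once `valid_target` consistent samples have been collected. The `max_draws` cap is an added assumption that guarantees termination when the evidence is very unlikely; all names and numbers are illustrative.

```python
import random

# Hypothetical toy model: P(A) and P(B | A); names and numbers are illustrative.
P_A = {"a1": 0.3, "a2": 0.7}
P_B_GIVEN_A = {"a1": {"b1": 0.01, "b2": 0.99}, "a2": {"b1": 0.6, "b2": 0.4}}

def draw(dist, rng):
    """Draw one state from a {state: probability} distribution."""
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states])[0]

def sample_case(rng):
    """Forward-sample a complete assignment from the toy model."""
    a = draw(P_A, rng)
    return {"A": a, "B": draw(P_B_GIVEN_A[a], rng)}

def approx_posterior(target, evidence, valid_target=1000, max_draws=100_000, seed=0):
    """Estimate P(target | evidence) from a bounded number of valid samples."""
    rng = random.Random(seed)
    counts, valid = {}, 0
    for _ in range(max_draws):
        case = sample_case(rng)
        if all(case[v] == s for v, s in evidence.items()):   # keep consistent samples
            counts[case[target]] = counts.get(case[target], 0) + 1
            valid += 1
            if valid == valid_target:                       # constant threshold reached
                break
    return {s: c / valid for s, c in counts.items()}

print(approx_posterior("A", {"B": "b1"}))
```

Because the loop stops after `valid_target` accepted samples or `max_draws` attempts, its cost is bounded by a constant number of samples times the cost of drawing one sample; the price is an estimate whose error shrinks roughly in proportion to 1/sqrt(valid_target).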
3. Automatic Detection of Emergence
This section first formally defines different types of emergence in complex systems and explains the intuitions behind these definitions. Next, it applies our proposed framework to a real-world example, a historical US blackout. Lastly, we analyze some major factors causing emergence in a general complex system, explain how to detect emergence from data automatically, and briefly summarize our proposed emergence detection framework.
3.1. Definition of Emergence in Complex Systems
This subsection formally defines emergence in complex systems; the definition forms the basis for all of the following subsections.
As mentioned in the Introduction, emergence is unpredictable system behavior caused by nonlinear interactions among subsystems. However, many other factors can also cause unexpected/unpredictable system behavior, and such behavior should not be categorized as emergence. To rule out alternative explanations of unexpected behavior in a complex system, such as incomplete information, inconsistent measurements, or inexpert judgments, we make three assumptions in this definition:
(i) Assumption one is that all subsystems within a complex system are observed, and their features/behaviors are recorded descriptively and/or quantitatively. This assumption means that there is no hidden subsystem or obscured subsystem behavior that could cause unpredictable behavior in the overall system.
(ii) Assumption two is that someone with sufficient expert knowledge can build consistent models from these observables for each subsystem and analyze subsystem behaviors using the constructed models. Here, “consistent” means that the same modeling technique and logic are applied across all subsystems, with no subsystem treated differently.
(iii) Assumption three is that we have access to the ground truth about both subsystem and overall system behaviors, so that the emergence definition rests on ground truth rather than on relative metrics influenced by the applied modeling techniques.
In our framework, we require that datasets be available for both the subsystems and the overall system and that maximum likelihood logic be applied in modeling system behavior. In this way, all three assumptions are satisfied. Prior work [28] studied an emergent border-crossing behavior during the 2009 H1N1 pandemic in Mexico using the BKB framework. That work defined two types of emergence: strong emergence and weak emergence. However, its BKBs were constructed manually from descriptive data sources. In this paper, we apply a data-driven approach for automatic emergence detection whenever data is available.
Given subsystem data and maximum likelihood logic, we can query the most likely state of a target variable (Tar) in every subsystem. Each subsystem then makes its decision based on its partial knowledge of , learned from the corresponding subsystem dataset. The subsystems' opinions can form multiple sets: a majority opinion set and/or minority opinion set(s). In one extreme case, all subsystems form a unanimous opinion, and there is no minority opinion. In another case, each subsystem has an opinion different from all the others, or every opinion has the same number of supporters; in this case, there is no majority opinion.
Finally, we apply the same logic to an overall system dataset to determine the most likely state of Tar. Intuitively, if there is a majority opinion on the subsystem side, it is expected to coincide with the overall system opinion; otherwise, we claim this discrepancy as a form of emergence. If there is no majority opinion on the subsystem side and the overall system opinion agrees with one of the minority opinions, the result is also accepted as non-emergent; otherwise, we again claim it as a type of emergence. Based on these intuitions, we illustrate four types of emergence in Table 3. In this table, Sub 1 to Sub 3 represent three subsystems, Whole denotes the opinion of the overall system, and Type labels the type of emergence to which the case belongs. The states “a”, “b”, “c”, and so forth represent different opinions about Tar from the subsystems and/or the overall system.

In general, a complex system can have an arbitrary number of subsystems, but three is the minimum number needed for all types of emergence to be possible. Not all types (if any) need occur in a given complex system. If Tar is binary, only Type 1 and Type 3 can occur; if it is multinomial, all four types can occur. Furthermore, under this definition, we expect Type 3 emergence to be observed most often. The condition for Type 2 emergence is harder to meet, so it should occur less frequently. Type 1 and Type 4 are likely the rarest, as their conditions are the most stringent.
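The four-type rule stated above can be sketched as a small classifier. Two interpretive assumptions are made here: a majority opinion is taken to be a unique most-supported state (so a tie for the top count means no majority, matching the equal-supporters case), and a return value of 0 encodes a non-emergent case. The function name and encoding are hypothetical.

```python
from collections import Counter
from itertools import product

def emergence_type(sub_opinions, whole):
    """Return 0 for a non-emergent case, otherwise the emergence type 1-4."""
    counts = Counter(sub_opinions)
    top = max(counts.values())
    leaders = [state for state, c in counts.items() if c == top]
    if len(leaders) == 1:                      # assumption: unique top state = majority
        majority = leaders[0]
        minorities = set(counts) - {majority}
        if whole == majority:
            return 0                           # whole agrees with the majority
        if not minorities:
            return 1                           # unanimous subsystems, whole differs
        return 3 if whole in minorities else 2
    return 0 if whole in counts else 4         # no majority: only minority opinions

print(emergence_type(["a", "a", "a"], "b"))  # 1
print(emergence_type(["a", "a", "b"], "c"))  # 2
print(emergence_type(["a", "a", "b"], "b"))  # 3
print(emergence_type(["a", "b", "c"], "d"))  # 4

# With a binary Tar and three subsystems, only Types 1 and 3 (plus non-emergence) arise.
binary = {emergence_type(list(s), w) for s in product("ab", repeat=3) for w in "ab"}
print(sorted(binary))  # [0, 1, 3]
```

The last line checks the binary claim exhaustively for three subsystems: a 2-1 split leaves the whole-system state either equal to the majority or equal to the minority, so Types 2 and 4 cannot arise.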
3.2. Emergence Detection: BKFCS
This subsection details emergence detection through BKFCS.
If we have a dataset of system behaviors under various circumstances, we can apply BKFCS to detect emergence within the system from data. We refer to a system configuration as a case in the dataset. A system configuration is a tuple of variable-state pairs representing a system working status. For instance, if a system has two binary variables, and , then it will have at most four different configurations, namely, . The system dataset is stored as a two-dimensional matrix in which each row corresponds to one configuration and each column corresponds to a feature/variable of the system. We also call each row an entry or case of the system. In addition, we assume that both the subsystem datasets and the overall system dataset are available, so we can set up the ground truth for each case. To distinguish emergent cases from non-emergent ones, we need to label each testing case as emergent or non-emergent based on majority and minority opinions. Assume that the subsystem datasets are labeled as , and the overall system dataset is labeled as . We use to label the ground truth of each case but provide BKFCS only with the subsystem datasets . By comparing its predictions with the ground truth, we can measure BKFCS's performance.
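As a small illustration of configurations and the two-dimensional dataset layout, the snippet below enumerates every configuration of two hypothetical binary variables X and Y and writes them as matrix rows (cases) with one column per variable. The variable names and the dictionary representation are assumptions for illustration only.

```python
from itertools import product

# Hypothetical variables and their states.
variables = {"X": [0, 1], "Y": [0, 1]}

# A configuration assigns one state to every variable.
configs = [dict(zip(variables, combo)) for combo in product(*variables.values())]

# Dataset layout: one row per case, one column per variable.
columns = list(variables)
matrix = [[cfg[v] for v in columns] for cfg in configs]

print(len(configs))  # 4
print(matrix)        # [[0, 0], [0, 1], [1, 0], [1, 1]]
```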
To classify a testing case as emergent or non-emergent, we first run belief updating on all BKBs learned from the subsystem datasets. Then we form majority and minority opinions from the individual opinions of all BKBs. From these opinions, we know which states of the target lead to an emergent case and which do not. Next, we perform belief updating again on the fused BKB, which gives probabilities for both the emergent and the non-emergent states.
To simplify this procedure, we first treat emergence detection as a binary classification problem; namely, all types of emergence cases are viewed as positive, while non-emergence cases are viewed as negative. For each testing case, we compute the accumulated probability of the case being positive and the accumulated probability of it being negative per function (2). Then we normalize and into and and compare them per (3) to determine whether the case is emergent: if the difference is bigger than a predefined threshold (discussed in the experiment section), we declare the case emergent. We then compare the claimed result with the ground truth to evaluate BKFCS's performance.
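Since the exact forms of (2) and (3) are not reproduced in this text, the following is one plausible reading, sketched under stated assumptions: the positive (negative) score is the sum of the joint probabilities, from belief updating on the FBKB, of the target states labeled emergent (non-emergent) for the case; the two scores are normalized to sum to 1; and the case is declared emergent when the normalized positive score exceeds the normalized negative score by more than the threshold. The function name, inputs, and numbers are hypothetical.

```python
def detect_emergence(state_probs, emergent_states, threshold):
    """state_probs: target state -> joint probability from belief updating on the FBKB.

    Returns (is_emergent, normalized positive score, normalized negative score).
    """
    # Accumulated probability of the emergent (positive) and non-emergent (negative) states.
    p_pos = sum(p for s, p in state_probs.items() if s in emergent_states)
    p_neg = sum(p for s, p in state_probs.items() if s not in emergent_states)
    z = p_pos + p_neg
    p_pos_hat, p_neg_hat = p_pos / z, p_neg / z
    # Assumed form of the decision rule: positive margin above the threshold.
    return p_pos_hat - p_neg_hat > threshold, p_pos_hat, p_neg_hat

# Hypothetical case: subsystem opinions make "b" and "c" the emergent states.
state_probs = {"a": 0.2, "b": 0.5, "c": 0.3}
flag, p_pos, p_neg = detect_emergence(state_probs, {"b", "c"}, threshold=0.1)
print(flag)  # True
```

Raising the threshold trades recall for precision: fewer cases are declared emergent, and only those where the fused BKB favors the emergent states by a wide margin.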
3.3. An Example of Emergence in Complex System
This subsection details a realworld emergence example.
We selected the 1996 US west coast blackout [29] as our conceptual demonstration example. On July 2, 1996, a blackout occurred on the west coast of the US, affecting over two million customers. The first event was a single-phase-to-ground fault on the 345 kV Jim Bridger-Kimport line. System protection removed this line from service, clearing the fault. Twenty milliseconds later, system protection opened the 345 kV Jim Bridger-Goshen line due to misoperation of the ground element in a relay at Bridger. Loss of the two lines correctly initiated a remedial action scheme (RAS) that removed two generating units from service. The next event was system protection opening the 230 kV Round Up-LaGrande line due to misoperation of a zone 3 relay at Round Up. Together, these three events caused a series of disturbances throughout the entire system and overloaded other lines, which in turn took more lines offline.
Per the incident report [30], “the simultaneous combination of operating conditions on July 2 was not anticipated or studied. The speed of the collapse seen July 2 was not observed in this region and was not anticipated in studies.” In fact, due to the combinatorial nature of the interactions that can occur in such complex systems, it is impractical to study all combinations and prevent every possible adverse outcome before it happens.
This incident meets all three assumptions of the proposed emergence definition. First, all behaviors and features of each subsystem (here, the power supply and delivery systems) are recorded. Their designed features all function as expected, and each subsystem achieves its individual purposes, such as line protection, power delivery rebalancing, and overload protection. In theory, these measures should be sufficient to protect the entire system from collapse. In short, the first assumption, no hidden behavior or missing information, is met. Second, all subsystems handle incidents according to the same logic, which is prebuilt into hardware and software action rules, and the company's employees followed operating procedures to solve the immediate problems they encountered. This satisfies the second assumption of equal treatment in all subsystems. Finally, the behavior of the entire system, a system-scale failure, is also recorded. Therefore, we know the ground-truth behavior of both the subsystems and the overall system.
Since all three assumptions are met, we can classify the observed behavior as Type 1 emergence: judged subsystem by subsystem, power delivery in the entire network should not fail, yet the overall system observation tells us the opposite. In the next subsection, we apply our framework to model this incident and compute the emergence.
3.4. Applying Emergence Detection Framework on 1996 US West Coast Blackout Incident
This subsection details how to apply our proposed framework to model this incident.
In this accident, the first three major events are the openings of the Jim Bridger-Kimport line, the Jim Bridger-Goshen line, and the Round Up-LaGrande line. Since the details of the incident are recorded descriptively, we manually build three BKBs, one per event (Figure 4). Next, we fuse them into one FBKB (Figure 5) with the BKB fusion algorithm. We then perform BKB belief updating on the three event BKBs and on the FBKB, choosing the variable “system failure” (abbr. “SF”) as the target. For demonstration purposes, we label S-nodes as before in Table 4. Table 5 lists the target state probabilities for every subsystem (single event) and for the entire system. In the last two rows of this table, the value given is the product of the source fusion variable probabilities that correspond to that inference. Note that the BKBs in this case are simplified so that only instantiated variable states are depicted; hence some S-nodes occur in the overall system BKB but in none of the subsystem BKBs.


Checking the target state probabilities in Table 5, we see that all three event BKBs favor the same state of SF (no system failure), whereas the overall system BKB favors the opposite state (system failure). This is a Type 1 emergence per our definition. We now study this emergence from a mathematical point of view. The S-node values in these BKBs are just one solution to the set of inequalities (4), which is the mathematical foundation for emergence in this work. Other types of emergence can be constructed in a similar way whenever the feasible region of the corresponding inequalities is nonempty. However, this real-world example displays only one type of emergence. In the next subsection, we discuss emergence detection in a general system.
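To make the Type 1 pattern concrete, the following Python sketch scores each target state by its best inference, using the BKB property that an inference's probability is the product of the weights of its S-nodes, and then checks whether all subsystems agree while the fused model disagrees. The S-node weights below are made-up illustrative values, not the entries of Table 5, and the helper names are hypothetical.

```python
from math import prod


def inference_probability(snode_weights):
    """In a BKB, the probability of an inference is the product of the
    weights of the S-nodes it contains."""
    return prod(snode_weights)


def most_probable_state(best_inferences):
    """best_inferences maps each target state to the S-node weights of the
    best inference supporting it; return the most probable state."""
    return max(best_inferences, key=lambda s: inference_probability(best_inferences[s]))


def is_type1(subsystem_opinions, fused_opinion):
    """Type 1 pattern: every subsystem agrees, but the fused model disagrees."""
    return len(set(subsystem_opinions)) == 1 and fused_opinion != subsystem_opinions[0]


# Illustrative (made-up) S-node weights for the target SF in each model.
events = [
    {"no": [0.9, 0.8], "yes": [0.1, 0.8]},
    {"no": [0.7], "yes": [0.3]},
    {"no": [0.6, 0.9], "yes": [0.4, 0.5]},
]
fused = {"no": [0.8, 0.5, 0.4], "yes": [0.9, 0.7, 0.6]}

opinions = [most_probable_state(e) for e in events]
fused_opinion = most_probable_state(fused)
print(opinions, fused_opinion, is_type1(opinions, fused_opinion))
```

In the fused model each inference also multiplies in the source fusion S-node weights, which is why a state that loses in every subsystem can still win after fusion.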
3.5. Relevant Factors Underlying Emergence
This subsection describes, from a data-driven perspective, the factors affecting emergence in complex systems, as well as emergence detection in general systems.
Recall the US blackout example above: its subsystems shared multiple parameters, both variables and probabilities. In a general complex system, however, subsystems can diverge from one another in many ways. We now discuss these variations from a data-driven perspective, which provides quantitative metrics for these factors.
In some complex systems, such as power delivery systems, different subsystems have similar structures and parameters; in others, such as health care delivery systems, subsystems differ from each other. It is reasonable to believe that subsystem variation also plays a role in the emergence of complex systems. Therefore, if we collect multiple datasets for the subsystems of a complex system, we should consider how these datasets differ from each other. To quantify their differences, we define the following dataset similarity metrics, which capture factors relevant to emergence and are used in the Experiments.
Assume two sets (subsystem datasets) both include a given variable, and consider the probability distribution of that variable in each set.
Define the pairwise variable similarity of the variable between the two sets in terms of the Kullback-Leibler divergence of its distribution in one set from its distribution in the other.
This measures how much two experts' views of a certain variable differ.
Now assume that the variable exists only in some of the sets.
Define the variable similarity as the average pairwise variable similarity of the variable over the sets that contain it. This measures, on average, how much all experts' views of a certain variable differ.
Define the dataset similarity, Ω, as a ratio of variable counts: the number of variables shared by the sets relative to the number of variables in the sets. This measures how much the variable selection criteria of two experts differ.
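The metrics above can be approximated with the following Python sketch. Because the exact formulas are not reproduced here, two choices are assumptions: the pairwise similarity maps the Kullback-Leibler divergence D(P_i || P_j) to exp(-D), and Ω is computed as shared variables over all distinct variables of the two sets. A small smoothing constant `eps` (also an assumption) keeps the divergence finite when a state is missing from one dataset.

```python
import math
from collections import Counter
from itertools import combinations


def pmf(values, states, eps=1e-9):
    """Empirical probability mass function over `states`, with a tiny
    smoothing constant so that no state has zero probability."""
    counts = Counter(values)
    raw = [counts[s] + eps for s in states]
    total = sum(raw)
    return [c / total for c in raw]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for two aligned pmfs."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def pairwise_similarity(values_i, values_j, states):
    """Assumed mapping from divergence to similarity: exp(-D(P_i || P_j))."""
    return math.exp(-kl_divergence(pmf(values_i, states), pmf(values_j, states)))


def variable_similarity(columns, states):
    """Average pairwise similarity of one variable over the datasets containing it."""
    pairs = list(combinations(columns, 2))
    if not pairs:
        raise ValueError("need the variable in at least two datasets")
    return sum(pairwise_similarity(a, b, states) for a, b in pairs) / len(pairs)


def dataset_similarity(vars_i, vars_j):
    """Assumed form of Omega: shared variables over all distinct variables."""
    vars_i, vars_j = set(vars_i), set(vars_j)
    return len(vars_i & vars_j) / len(vars_i | vars_j)
```

Identical distributions give a similarity of 1, and the value falls toward 0 as the distributions diverge. Note that the Kullback-Leibler divergence is asymmetric, so the order of the two sets matters.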
A related question is how these differences arise in real-world systems. The answer is complicated. Sometimes different subsystems observe partially overlapping subsets of a system's features, and each shared variable has the same probability distribution in every subsystem. Such systems should have high dataset similarity scores between their subsystems. In other situations, different subsystems observe the same variable from various perspectives, resulting in contradictory probability distributions for each shared variable. These systems will have low dataset similarity scores between their subsystems. In the second kind of situation, the distributions of shared variables differ from one source to another because of differences in perspective, sample representativeness, random noise, and system biases.
Therefore, once we have datasets about subsystem characteristics under various system configurations, we should be able to identify which configurations lead to emergent behaviors.
As for ground truth, we apply a model-independent criterion. Consider the set of variables observed in a subsystem dataset and the set of variables observed in the testing set. Each test case has a target state and a state for each of its observed variables; likewise, each observation in the subsystem dataset has a state for each of that dataset's variables. For each subsystem, we determine the target state of a test case from the observations in that subsystem's dataset through function (5). If (5) does not yield a unique state, we pick the best state by function (6). Then, for each case in the testing set, we compute the opinions of all subsystems about the target state from these two equations, resulting in an opinion vector. Combined with the case's true target state, this vector determines whether the case is emergent and which type of emergence it belongs to. This forms the ground truth for each case in the testing set.
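The ground-truth criterion can be illustrated with the following Python sketch. Functions (5) and (6) are not reproduced in this section, so the sketch assumes that (5) returns the most frequent target state among the subsystem's observations that match the test case on the variables that subsystem observes, and that (6) breaks ties by the state's overall frequency in that subsystem's dataset. Both rules, the fallback for cases with no matching observation, and the helper names are hypothetical.

```python
from collections import Counter


def subsystem_opinion(case, records, observed_vars, target):
    """Hypothetical reading of functions (5) and (6).

    (5) assumed: the most frequent target state among the subsystem's
        records that match `case` on every variable the subsystem observes.
    (6) assumed: break ties by the state's overall frequency in the
        subsystem dataset (remaining ties fall to the first state found).
    """
    overall = Counter(r[target] for r in records)
    matches = Counter(
        r[target] for r in records
        if all(r[v] == case[v] for v in observed_vars)
    )
    counts = matches or overall  # assumption: fall back if nothing matches
    best = max(counts.values())
    tied = [state for state, c in counts.items() if c == best]
    if len(tied) == 1:
        return tied[0]
    return max(tied, key=lambda state: overall[state])


def opinion_vector(case, subsystems, target):
    """One opinion per subsystem; `subsystems` is a list of (records, observed_vars)."""
    return [subsystem_opinion(case, recs, obs, target) for recs, obs in subsystems]
```

Comparing this opinion vector with the case's true target state yields the ground-truth label without reference to any learned model.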
Based on the ground truth, we can perform an emergence detection task. Given several datasets representing subsystem dynamics, we first learn one BKB from each dataset with the BKB learning algorithm introduced in the Background. We then fuse these BKBs into one FBKB with the BKB fusion algorithm. Lastly, for each test case, we run belief updating via sampling on both the individual BKBs and the FBKB. To distinguish emergent from nonemergent cases, we form majority and minority opinions by querying the individual BKBs for the most probable state of the target variable and compare them with the opinion obtained by querying the FBKB on the target variable. Per the emergence definition, we decide whether the case is emergent and which type of emergence it belongs to. Finally, we compare our decision with the ground-truth label to see whether we made the right call.
3.6. Emergence Detection Framework Recap
We now provide a step-by-step recap of our framework.
Step 1. Collect data from multiple subsystems. These datasets contain subsystem feature states as well as target variable states.
Step 2. Learn BKBs for each subsystem via BKB learning algorithm if subsystem data are presented in a structured form. Otherwise, we build BKBs manually based on descriptive data about subsystem features and target variable states.
Step 3. Fuse BKBs for subsystems into one FBKB via BKB fusion algorithm. If we have information about BKB reliabilities, we assign them to fusion algorithm; otherwise, we simply assign equal reliabilities to all subsystem BKBs.
Step 4. Analyze the individual BKBs and the FBKB using belief updating. Compute the individual BKB opinions and the FBKB opinion for each combination of system feature states.
Step 5. Determine which cases are emergent and their emergence types according to the definitions in Table 3.
Step 6. If ground truth is available, compare BKFCS's emergence decisions with it to evaluate performance.
4. Experiments
This section begins by designing synthetic datasets that simulate various types of complex systems. It then details building complex system models from the synthetic datasets via BKB learning and fusion. Finally, we summarize the framework's performance in comparison with existing methods.
4.1. Designing Synthetic Datasets
Even though various types of complex systems exist in the real world, subsystem datasets for emergence modeling have typically not been available, for one of two reasons. (1) Subsystem behaviors and features are usually described in natural language or equations in postmortem briefings, and the framework cannot currently be applied directly to such forms of knowledge. (2) When subsystem datasets have been recorded, they are not available to the public for commercial, security, or political reasons. As such, we test our proposed framework BKFCS against baselines using synthetic testbeds.
We selected thirteen datasets (Table 6) from the UCI machine learning library [27] according to several criteria. First, both independent and dependent variables are categorical or binary, since BKBs do not currently handle continuous variables; discretizing continuous features would introduce an uncontrolled level of noise. Second, the number of samples is sufficient relative to the number of variables; otherwise, no algorithm could extract useful patterns from the dataset, and the comparison would be meaningless. Finally, the datasets cover various combinations of variable and sample numbers, so that they represent a diversity of scenarios: different scales of complex systems, different amounts of available data, and various kinds of variable interactions between and within subsystems.

To evaluate BKFCS performance, we split each dataset into training and testing sets via 10-fold cross-validation. We then split each training set into multiple subsets, where a subset includes some of the features and all of the cases. Different subsets have varying numbers of shared/common variables, representing their interactions in complex systems (Algorithm 1). To simulate differences in dataset similarity, we introduce ten popular perturbing functions that transform the original distribution of a shared variable into a perturbed one (Table 7).

These functions have various effects on the original distribution: some transform a uniform or relatively even distribution into a skewed one, others lessen the skewness of a distribution, and others flip the density of a distribution, making rare cases more common and common ones rarer. In short, they cover most of the inconsistencies that can arise when fusing information from multiple sources. The perturbation procedure is as follows: for each shared variable, we compute its original probability mass function (pmf), randomly choose a function for each source, compute the perturbed pmf for each source, and modify the shared variable's instantiations, with minimal change, so that its distribution follows the perturbed pmf.
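The perturbation procedure can be sketched in Python as follows. Table 7 is not reproduced here, so `square_and_renormalize` is a hypothetical stand-in for one skew-increasing perturbing function. The minimal-change step is also an assumption: target counts come from largest-remainder rounding, and only surplus entries of over-represented states are relabeled, so the number of changed entries equals the total surplus.

```python
import random
from collections import Counter


def square_and_renormalize(pmf):
    """Hypothetical perturbing function: squaring then renormalizing makes a
    distribution more skewed toward its most common states."""
    squared = {s: p * p for s, p in pmf.items()}
    z = sum(squared.values())
    return {s: p / z for s, p in squared.items()}


def target_counts(pmf, n):
    """Turn a pmf into integer counts summing to n (largest-remainder rounding)."""
    raw = {s: p * n for s, p in pmf.items()}
    counts = {s: int(x) for s, x in raw.items()}
    leftover = n - sum(counts.values())
    by_remainder = sorted(raw, key=lambda s: raw[s] - counts[s], reverse=True)
    for s in by_remainder[:leftover]:
        counts[s] += 1
    return counts


def perturb_column(values, perturb, seed=0):
    """Relabel as few entries as possible so the column follows perturb(pmf)."""
    rng = random.Random(seed)
    n = len(values)
    have = Counter(values)
    goal = target_counts(perturb({s: c / n for s, c in have.items()}), n)
    # Pick random surplus positions from states that occur too often.
    surplus = []
    for s, want in goal.items():
        positions = [i for i, v in enumerate(values) if v == s]
        if len(positions) > want:
            surplus.extend(rng.sample(positions, len(positions) - want))
    # One slot per missing occurrence of each under-represented state.
    deficit = [s for s, want in goal.items() for _ in range(want - have[s])]
    new_values = list(values)
    for i, s in zip(surplus, deficit):
        new_values[i] = s
    return new_values


column = ["a"] * 6 + ["b"] * 4
perturbed = perturb_column(column, square_and_renormalize)
```

Because total surplus equals total deficit, every relabeled entry moves one surplus occurrence to a deficit state, and no entry is changed more than once.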
4.2. Applying BKFCS on One Synthetic Dataset
This subsection demonstrates learning BKBs and BKB fusion from synthetic datasets.
We first demonstrate BKFCS on the balloon dataset, which has 76 cases with five variables each. Therefore, per Algorithm 1, each training set contains 68 cases, and each testing set contains 8 cases. The five variables are “size,” “act,” “age,” “color,” and “class” (the target variable), all of which are binary. With a low variable overlap, there is one shared variable. In one round, we pick “size” as the shared feature variable and split the remaining three feature variables evenly among three subsystems.
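The feature split used in this round can be sketched as below. It is a hypothetical stand-in for Algorithm 1, not a reproduction of it: `n_shared` features are copied into every subsystem, the remaining features are dealt out round-robin, and every subsystem keeps the target column and all cases. Because the shared feature is chosen by a seeded shuffle, the sketch may pick a feature other than “size”.

```python
import random


def split_features(features, target, n_subsystems, n_shared, seed=0):
    """Hypothetical feature split: n_shared features go to every subsystem,
    the rest are dealt out round-robin, and each subsystem keeps the target."""
    rng = random.Random(seed)
    pool = list(features)
    rng.shuffle(pool)
    shared, rest = pool[:n_shared], pool[n_shared:]
    return [shared + rest[k::n_subsystems] + [target] for k in range(n_subsystems)]


def project(records, columns):
    """Keep only the given columns of each record (all cases are kept)."""
    return [{c: r[c] for c in columns} for r in records]


# Balloon-style split: four features plus the target, one shared feature.
subsets = split_features(["size", "act", "age", "color"], "class",
                         n_subsystems=3, n_shared=1)
```

Each subsystem dataset is then `project(training_records, columns)` for one entry of `subsets`, so the subsystems differ only in which features they observe.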
From the three subsets created in this manner, we learn three BKBs with the BKB learning algorithm described in the Background; they are drawn in Figure 6.
Then, we apply the BKB fusion algorithm detailed in the Background to fuse the three BKBs into one and perform belief updating on the fused BKB. For space reasons, we omit the fused BKB here. Finally, we run the emergence detection algorithm on the testing set. The details of emergence detection on this dataset and on the other datasets are presented in the following subsection.
4.3. Emergence Detection on All Synthetic Datasets
This subsection details emergence detection on all synthetic datasets.
Here we evaluate BKFCS performance on these datasets. A typical way of evaluating classifier performance is to compare the true positive rate against the false positive rate and plot the results as ROC curves. To study the ratio of correct to false claims of emergence, we need to know how many cases are truly emergent; after all, emergence can only be detected if it occurs in the testing sets. We analyze the emergence rate in the synthetic datasets by comparing the majority and minority opinions of the individual subsystem datasets against the overall system opinion on each case in the testing sets. For instance, if all three subsets agree on a test case but the overall set's opinion differs, we label the case as a Type 1 emergence case. If our model predicts the same opinion as the overall opinion, we count the case as correctly identified; otherwise, it is a false negative. To evaluate the overall emergence rate of a dataset, we collapse the different types of emergence. Table 8 summarizes the aggregated emergence rate, which sums all four types of emergence, for each dataset under different parameters.

Perturbation is also used in some experiments to simulate probability distribution variations in subsystem datasets. We call these datasets perturbed sets, and we call those in which the shared variables keep the same distribution original sets. In most datasets, for both original and perturbed sets, the emergence rate is positively correlated (p < 0.05) with dataset similarity, Ω. This is because the more shared variables there are among subsystems, the more interactions exist among them.
Recall from Section 3.2 that we compare the accumulated state difference computed in (3) with a predefined decision threshold. In our experiments, we vary this threshold from 0.05 to 0.25 in steps of 0.05 and list all results. Figure 7 shows the results for different decision thresholds and dataset similarities on the original sets only; the ROC curves for the perturbed sets show similar relationships, so we omit them due to space limitations. First, in both original and perturbed sets, all ROC curves lie above the baseline (the line where true positive rate equals false positive rate). Second, as Ω grows from 10 percent to 60 percent, most ROC curves move northwest, indicating improved performance. Third, in most datasets, the decision threshold has a significant impact on precision and recall. Finally, at a fixed threshold, precision and recall vary widely across datasets. Nevertheless, in most conditions, our algorithm reaches a 50 percent true positive rate while keeping the false positive rate under 20 percent.
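The threshold sweep can be sketched as follows. Each test case is assumed to have a score, namely the normalized accumulated state difference from (3), and a boolean ground-truth label (emergent or not); each threshold then yields one (false positive rate, true positive rate) point of an ROC curve. The function names are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """Return (false positive rate, true positive rate) when every case whose
    score exceeds `threshold` is declared emergent."""
    predicted = [s > threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both emergent and nonemergent cases")
    return fp / negatives, tp / positives


def threshold_sweep(scores, labels, thresholds=(0.05, 0.10, 0.15, 0.20, 0.25)):
    """One ROC point per decision threshold, matching the sweep used above."""
    return {t: roc_point(scores, labels, t) for t in thresholds}
```

Lower thresholds declare more cases emergent, moving the point up and to the right along the ROC curve.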
Figure 7 demonstrates the overall performance of BKFCS on all types of emergence. However, we also want to break it down by type. Therefore, we need the emergence rate of each type in each dataset, and we evaluate the detection efficiency for each type.
Here, we treat the different types of emergence cases separately and show the emergence rate for each dataset in Table 9. In this table, the first column of the first row shows 9%, meaning that, in the original datasets, when omega is set to 0.1 (10% overlapping features), the average Type 1 emergence rate across the thirteen datasets is 9 percent. The second column of the first row shows 12%, meaning that the average Type 1 emergence rate across the thirteen datasets is 12 percent, and so on. For each omega, Type 3 emergence occurs most often, followed by Type 1 emergence; Type 2 and Type 4 emergence are observed less often. These results are consistent with our emergence definition, because Type 2 and Type 4 emergence reflect more divergent opinions among the subsystems and thus a harder decision-making process. Type 1 and Type 3 emergence, on the other hand, occur more often in practice, and they should also be easier to detect.

To test this hypothesis, Table 10 gives the confusion matrix of BKFCS detection rates for each type of emergence, averaged across the thirteen datasets and all omega values. In this table, the sum of each row is the percentage of all cases that truly belong to a certain type of emergence, and the sum of each column is the percentage of cases predicted to be a certain type. Each cell gives the percentage of all cases that truly belong to its row's type and are classified as its column's type. The results indicate that BKFCS detects most Type 3 and Type 1 emergence but performs worse on Type 2 emergence, and it cannot detect any Type 4 emergence. This behavior is expected: the proposed BKB learning algorithm learns a BKB model from subsystem data by maximizing a likelihood score penalized by BKB structure complexity. As a result, it has limited capability to capture extremely low-frequency patterns, which correspond to Type 4 emergence.
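A confusion matrix in the layout described above (cells as percentages of all cases, so row sums give true-type percentages and column sums give predicted-type percentages) can be computed as follows. The label names, including `none` for nonemergent cases, are hypothetical.

```python
from collections import Counter

LABELS = ("none", "type1", "type2", "type3", "type4")  # "none" = nonemergent


def confusion_percent(true_types, predicted_types, labels=LABELS):
    """Cell [t][p] is the percentage of all cases whose true type is t and
    whose predicted type is p, so row and column sums are percentages too."""
    n = len(true_types)
    pairs = Counter(zip(true_types, predicted_types))
    return {t: {p: 100.0 * pairs[(t, p)] / n for p in labels} for t in labels}


example = confusion_percent(
    ["type1", "type1", "type3", "none"],
    ["type1", "none", "type3", "none"],
)
```

Averaging such matrices over datasets and omega values gives a table in the same layout as Table 10.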

4.4. Performance Comparison against Ensemble Methods
This subsection compares the performance of BKFCS with BN fusion baselines.
The baseline is set up as follows: for each subsystem dataset, we learn a BN using the Weka machine learning package. We then learn a BN for the whole system from the union of the subsystem datasets. Recall that we provide the classifiers only with the subsystem datasets and keep the whole-system dataset as ground truth. By comparing the majority and minority opinions of the subsystem BNs with the opinion of the BN learned from the union dataset, we evaluate the baseline's emergence detection capability.
We repeat this procedure on all datasets with all parameters and list the true positive and false positive rates in Table 11. For comparison, the same table lists the BKFCS results at threshold 0.05. Finally, we summarize the average performance of both approaches in six configurations in Figure 8. In this figure, results are organized into six groups from left to right, where group 1 represents the original set with omega = 0.1 and group 6 the perturbed set with omega = 0.6. Recall that for false positive rates lower is better, and for true positive rates higher is better. We then run one-tailed paired t-tests on both true positive and false positive rates for all six configurations. Eleven of the twelve tests show significant differences at the 0.05 level. All six true positive rates in BKFCS are those of BN, but two are significantly larger than those of BN (groups 3 and 6). Only group 5 shows no significant difference in false positive rate between the two classifiers. In short, BKFCS is much better than the BN ensemble approach at detecting emergence in these synthetic datasets.
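The one-tailed paired t-test can be sketched with the Python standard library. The sketch assumes the paired units are the thirteen datasets within one configuration; it returns the t statistic and its degrees of freedom, which are then compared with the one-tailed critical value, or converted to a p-value with a statistics package such as SciPy's `scipy.stats.ttest_rel` with `alternative="greater"`.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for H1: mean(a - b) > 0, with n - 1 degrees of freedom.
    A one-tailed test rejects H0 when t exceeds the critical value t(1 - alpha, n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1
```

For false positive rates, where lower is better, the same function applies with the arguments swapped (baseline first).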



5. Conclusion
In this paper, we propose a quantitative definition of emergence and an emergence detection algorithm that learns several subsystem models and fuses them through their variable interactions while preserving all inconsistent information. Experiments on synthetic datasets show that this algorithm detects emergence in complex systems better than extant methods. To the best of our knowledge, this is the first automatic emergence detection approach based on fusing graphical models.
Appendix
A. BKB Learning Algorithm
This appendix provides details on a new BKB learning algorithm, analyzes its time complexity, and compares its performance against five baselines on thirteen datasets from the UCI machine learning library [27].
We assume that there exists at least one dataset for each subsystem, from which we learn a BKB. Even though a BKB can contain cycles, for simplicity we concentrate on learning acyclic BKBs here; learning cyclic BKBs and their modeling impacts will be studied in future work. The algorithm first learns a component-level structure and then builds a BKB from that structure and the dataset. In a general BKB, different states of a component can have different sets of parent components. In a BKB built from the structure, however, all states of a component share the same set of parent components, because the structure specifies parent-child relationships at the variable level rather than at the variable-instantiation level. Based on this simplification, our algorithm learns a BKB that maximizes the score function (1) given the dataset, with its cases and variables/features, where each feature/component has its own number of states/I-nodes. The count notation in (1) denotes the number of cases in which a given condition holds. The penalty constant is set to 0.01 in our algorithm. The score function consists of two parts: the first part computes the log likelihood of the BKB given the dataset, and the second part is a penalty for complexity and overfitting, proportional to the difference between the maximum possible number of S-nodes and the number of S-nodes that appear in the BKB. A nonzero difference means that the BKB associates a component only with some instantiations of its parents that occur in the training set and cannot generalize to unobserved instantiations, which is a sign of overfitting. Moreover, a parent set with more possible instantiations than there are cases in the dataset must overfit the training data, since the number of cases is the upper bound on the number of observed patterns.
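A node-level reading of the score function (1) can be sketched as follows. It follows the prose description above rather than the exact formula: the log-likelihood term sums, over observed (parent instantiation, node state) pairs, the pair count times the log of the pair count divided by the parent count, and the penalty is 0.01 times the difference between the maximum possible number of S-nodes and the observed number, where each observed pair is assumed to become one S-node. Summing node scores into a BKB score is likewise an assumption, and the names `node_score` and `bkb_score` are hypothetical.

```python
import math
from collections import Counter


def node_score(records, node, parents, n_states, penalty=0.01):
    """Score of one component given its parent components.

    loglik:  sum over observed (parent instantiation, node state) pairs of
             count * log(count / parent_count).
    penalty: penalty * (max possible S-nodes - observed S-nodes), assuming
             every observed pair becomes one S-node.
    """
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in records)
    parent_counts = Counter(tuple(r[p] for p in parents) for r in records)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    max_snodes = n_states[node] * math.prod(n_states[p] for p in parents)
    return loglik - penalty * (max_snodes - len(joint))


def bkb_score(records, parent_sets, n_states):
    """Assumed total score: the sum of node scores over all components."""
    return sum(node_score(records, x, pa, n_states) for x, pa in parent_sets.items())
```

Because the score decomposes by node, a local move only requires rescoring the nodes whose parent sets change, which is what the search below exploits.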
Based on (1), we design a polynomial-time BKB learning algorithm that finds a near-optimal solution, as shown in Algorithm 2. It is well known that learning a general BN from data is NP-hard, so we impose some constraints to make a polynomial-time algorithm possible. First, we cap the number of iterations (1000). Second, we set an upper bound on the number of parents each feature/variable can have in the BKB (C.1), which also avoids overfitting per the previous analysis of the parent pattern limit. Last but not least, since the algorithm takes a greedy strategy, it can only find a local maximum from a given starting point; we therefore precompute multiple starting points with various densities and search for multiple local maxima in parallel. We then choose the best BKB among these local maxima as an approximation of the global maximum.

The algorithm works as follows. A fully connected DAG (directed acyclic graph) has one edge/arc between every pair of variables, and we search from multiple initial graphs with different densities. Remember that an “edge/arc” in the structure connects two components/variables, while an edge in a BKB connects an I-node and an S-node. Algorithm 2 first generates a random DAG with a given density, i.e., a given ratio of the fully connected DAG's edges, where the ratio ranges from 0.01 to 1 at an interval of 0.01.
To compute the random DAG, we first randomly shuffle the variables and build a fully connected graph based on this shuffle: each variable points to all variables behind it in the shuffle. This guarantees acyclicity, and its time complexity is quadratic in the number of variables. We then take the first edges of this fully connected graph, as many as the density requires, to form the initial graph, which takes time linear in the number of edges taken. From this graph, we iteratively search all its immediate neighbors for a better graph. An immediate neighbor of a graph is a graph that can be built by adding, deleting, or reversing a single arc/edge. In the structure, the two endpoints of such an edge are components, whereas in a BKB they are I-nodes. For each pair of components there are three possible scenarios (no edge, an edge in one direction, or an edge in the other direction), and each scenario yields two potential neighbors. In each scenario, Algorithm 2 tests the acyclicity of the two potential neighbors through topological ordering; this check takes time linear in the size of the graph. If a neighbor is acyclic, we compute its score and compare it with the current graph's score. In fact, we only recompute the scores of the nodes whose parent sets change, through function (A.3).
In this function, the structure graph is associated with a BKB and can be either the current graph or one of its neighbor graphs. Algorithm 2 stores each node score of the current best graph and computes the change of score as follows: if the neighbor is built by adding or removing an arc, only the child node of that arc changes its score.
Therefore, we compute that node's score change via (A.1). If the neighbor is built by reversing an arc, both of its endpoints change their scores, and we compute the change via (A.2). In both cases, we count all instantiations of the affected node and its parents in one loop over the dataset and compute the log likelihood score for each instantiation. In the worst case, the dataset contains as many different patterns as it has cases, which bounds the cost of computing a node score difference. After computing the score change, we compare it with the current largest improvement and update that value. After we evaluate all neighbors of the current graph, we replace the current graph with its best neighbor for the next iteration. However, if no neighbor has a higher score, the current graph is a local maximum, and the iteration stops. The worst-case time complexity of each iteration, of all iterations from one starting point, and of the entire algorithm is polynomial; therefore, this is a polynomial-time algorithm.
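The search procedure described above can be sketched as follows: random initial DAGs from a shuffled variable order, Kahn's topological ordering as the acyclicity check, add/delete/reverse neighbors, and greedy hill climbing that rescores only the nodes whose parent sets change. This is a sequential sketch, not the parallel implementation; the parent bound `max_parents=3`, the toy scoring function, and all function names are assumptions for illustration. Memoizing the node score with `functools.lru_cache` mirrors the constant-time lookup of already computed patterns.

```python
import random
from collections import deque
from functools import lru_cache
from itertools import combinations


def random_dag(n_vars, ratio, rng):
    """Fully connected DAG along a shuffled order; keep the first ratio share of edges."""
    order = list(range(n_vars))
    rng.shuffle(order)
    full = [(order[i], order[j]) for i in range(n_vars) for j in range(i + 1, n_vars)]
    return set(full[:round(ratio * len(full))])


def is_acyclic(n_vars, edges):
    """Kahn's topological ordering: acyclic iff every node gets ordered."""
    indegree = [0] * n_vars
    children = [[] for _ in range(n_vars)]
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    queue = deque(x for x in range(n_vars) if indegree[x] == 0)
    ordered = 0
    while queue:
        u = queue.popleft()
        ordered += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return ordered == n_vars


def neighbors(n_vars, edges):
    """Yield (neighbor_edges, nodes_whose_parents_change): two per pair of nodes."""
    for u, v in combinations(range(n_vars), 2):
        if (u, v) in edges:
            yield edges - {(u, v)}, {v}                      # delete u -> v
            yield (edges - {(u, v)}) | {(v, u)}, {u, v}      # reverse u -> v
        elif (v, u) in edges:
            yield edges - {(v, u)}, {u}                      # delete v -> u
            yield (edges - {(v, u)}) | {(u, v)}, {u, v}      # reverse v -> u
        else:
            yield edges | {(u, v)}, {v}                      # add u -> v
            yield edges | {(v, u)}, {u}                      # add v -> u


def parents_of(node, edges):
    return tuple(sorted(u for u, v in edges if v == node))


def hill_climb(n_vars, start, node_score, max_parents=3, max_iter=1000):
    current = set(start)
    scores = {x: node_score(x, parents_of(x, current)) for x in range(n_vars)}
    for _ in range(max_iter):
        best_gain, best = 0.0, None
        for cand, changed in neighbors(n_vars, current):
            if any(len(parents_of(x, cand)) > max_parents for x in changed):
                continue
            if not is_acyclic(n_vars, cand):
                continue
            gain = sum(node_score(x, parents_of(x, cand)) - scores[x] for x in changed)
            if gain > best_gain:
                best_gain, best = gain, (cand, changed)
        if best is None:  # no improving neighbor: local maximum
            break
        current, changed = best
        for x in changed:
            scores[x] = node_score(x, parents_of(x, current))
    return current, sum(scores.values())


def multi_start(n_vars, node_score, seed=0):
    """Climb from densities 0.01 to 1.00 and keep the best local maximum."""
    rng = random.Random(seed)
    starts = [random_dag(n_vars, k / 100, rng) for k in range(1, 101)]
    return max((hill_climb(n_vars, s, node_score) for s in starts), key=lambda r: r[1])


@lru_cache(maxsize=None)
def toy_score(node, parents):
    """Toy score: rewards only the arc 0 -> 1 and penalizes other parent sets."""
    if (node, parents) == (1, (0,)):
        return 1.0
    return 0.0 if not parents else -1.0


best_edges, best_score = multi_start(3, toy_score)
```

In practice `toy_score` would be replaced by a memoized wrapper around a data-driven node score such as the one sketched for (1).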
In practice, we can reduce running time in several ways. First, we compute local maxima from different initial graphs in parallel. Second, within each iteration, we compute the node scores of neighbors in parallel. Third, we memoize all node scores for patterns already computed and look up existing patterns in constant time. Our platform is a 16-node Dell cluster; each node contains two Intel® Xeon® E5-2640 CPUs clocked at 2.6 GHz and 512 GB of RAM. We have a total of 512 hyperthreaded cores (256 physical cores), which can speed up the algorithm by two orders of magnitude.
First, we learn BNs from single UCI datasets with five scoring functions using Weka. The classification accuracies are listed in Table 12. In the first column, each abbreviation corresponds to one dataset, in the same order as in Table 6; for instance, “Bs” refers to “Balancescale” and “Co” to “Connect 4”. The last row, “Avg”, gives the average over all datasets. In the first row, each abbreviation denotes one scoring function. Next, we compare our algorithm with five baselines. We chose these older algorithms instead of state-of-the-art ones because our goal is not to learn a classifier that beats the best performer but to learn a BKB that helps detect emergence. Their 10-fold cross-validation accuracies are tabulated in Table 13. In the first column, dataset abbreviations are the same as in the previous table. In the first row, each term denotes a type of classifier: “bkbc” means BKB classifier, “ad” Adaboost, “bn” Bayesian network, “smo” sequential minimal optimization, “lr” logistic regression, and “dt” decision tree. The results indicate that our algorithm is competitive with the baselines in single-source classification tasks.
In addition, we notice that BKBs learned with the proposed scoring function also outperform most BNs learned with several extant scoring functions. In fact, according to the “no free lunch theorem” [31], every classifier has its strengths and weaknesses, as the performance variation across datasets shows.
B. BKB Fusion Algorithm
This appendix details the BKB fusion algorithm initially designed by Santos Jr. et al. [9] and analyzes its time complexity.
Given BKBs learned from distinct subsystems, we integrate them through BKB fusion [9] into a single BKB that reflects the entire complex system. Santos Jr. et al. have proven that if all individual BKBs are valid BKBs, the fused BKB is also a valid BKB. This property means we could build a hierarchy of BKBs representing emergent properties that appear at different levels of a complex system, although we do not do so in this paper. We first introduce the algorithm and then analyze its complexity.
The BKB fusion algorithm is shown in Algorithm 3. To fuse multiple BKBs, we start with an empty BKB and a weighting function that represents each BKB's relative importance in the complex system. We first add all I-nodes, S-nodes, and edges from the individual BKBs to the fused BKB; this combination takes time linear in the total number of nodes and edges of the individual BKBs. Next, we add a source fusion I-node for each S-node in every individual BKB, together with a supporting S-node for each source fusion I-node just added. Specifically, for each S-node coming from a given BKB, we note its head I-node and add a source I-node that connects to that head I-node via the S-node. This source I-node is an instantiation of the source fusion component associated with the head I-node.
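The combination and source-fusion steps described above can be sketched as follows. The BKB representation (S-node ids mapped to a head I-node, tail I-nodes, and a weight), the naming of source I-nodes, and the choice of normalized reliability as the weight of each supporting S-node are assumptions for illustration; the complete algorithm is given by Santos Jr. et al. [9].

```python
def fuse_bkbs(bkbs, reliabilities=None):
    """Sketch of the first fusion steps.

    bkbs: list of dicts mapping an S-node id to (head, tails, weight), where
          head is an I-node (component, state) and tails is a tuple of I-nodes.
    Returns the fused set of I-nodes and dict of S-nodes.
    """
    if reliabilities is None:
        reliabilities = [1.0] * len(bkbs)  # equal reliabilities, as in Step 3
    total = sum(reliabilities)
    inodes, snodes = set(), {}
    for k, bkb in enumerate(bkbs):
        for sid, (head, tails, weight) in bkb.items():
            inodes.add(head)
            inodes.update(tails)
            # Source I-node: state k of the source fusion component tied to `head`.
            source = ("source:" + str(head[0]) + "=" + str(head[1]), k)
            inodes.add(source)
            # Original S-node, now also conditioned on its source I-node.
            snodes[(k, sid)] = (head, tuple(tails) + (source,), weight)
            # One supporting S-node per source I-node (assumed: normalized reliability).
            snodes[("support", source)] = (source, (), reliabilities[k] / total)
    return inodes, snodes
```

S-nodes from the same BKB with the same head share one source I-node, so each source I-node receives exactly one supporting S-node.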