#### Abstract

The connectivity of a network contains information about the relationships between nodes, which can denote interactions, associations, or dependencies. We show that this information can be analyzed by measuring the uncertainty (and certainty) contained in paths along nodes and links in a network. Specifically, we derive from first principles a measure known as *effective information* and describe its behavior in common network models. Networks with higher effective information contain more information in the relationships between nodes. We show how subgraphs of nodes can be grouped into macronodes, reducing the size of a network while increasing its effective information (a phenomenon known as *causal emergence*). We find that informative higher scales are common in simulated and real networks across biological, social, informational, and technological domains. These results show that the emergence of higher scales in networks can be directly assessed and that these higher scales offer a way to create certainty out of uncertainty.

#### 1. Introduction

Networks provide a powerful syntax for representing a wide range of systems, from the trivially simple to the highly complex [1–3]. It is common to characterize networks based on structural properties like their degree distribution or clustering, and the study of such properties has been crucial for the growth of Network Science. Yet there remains a gap in our treatment of the information contained in the relationships between nodes in a network, particularly in networks that have both weighted connections and feedback, which are hallmarks of complex systems [4, 5]. As we will show, analyzing this information allows for modeling the network at the most appropriate, informative scale. This is especially critical for networks that describe interactions or dependencies between nodes such as contact networks in epidemiology [6], neuronal and functional networks in the brain [7], or interaction networks among cells, genes, or drugs [8], as these networks can often be analyzed at multiple different scales.

Here we introduce information-theoretic measures that capture the information contained in the connectivity of a network, which can be used to identify when these networks possess informative higher scales. To do so, we focus on the out-weight vector, $W_i^{out}$, of each node, $v_i$, in a network. This vector consists of weights $w_{ij}$ between $v_i$ and its neighbors, $v_j$, and $w_{ij} = 0$ if there is no edge from $v_i$ to $v_j$. For each $W_i^{out}$ we assume $\sum_j w_{ij} = 1$, which means $w_{ij}$ can be interpreted as the probability $p_{ij}$ that a random walker on $v_i$ will transition to $v_j$ in the next timestep, where a random walker might represent the passing of a signal, an interaction, or a state-transition [9]. The information contained in a network’s connectivity can be characterized by the uncertainty among its nodes’ out-weights and in-weights. The total information in the relationships between nodes is a function of this uncertainty and can be derived from two properties.

The first is the uncertainty of a node’s outputs, which is the Shannon entropy [10] of its out-weights, $H(W_i^{out})$. The average of this entropy, $\langle H(W_i^{out}) \rangle$, across all nodes is the amount of noise present in the network’s relationships. Only if $\langle H(W_i^{out}) \rangle$ is zero is the network *deterministic*.

The second property is how weight is distributed across the whole network, $\langle W_i^{out} \rangle$. This vector is composed of elements that are the sum of the in-weights $w_{ji}$ to each node $v_i$ from each of its incoming neighbors, $v_j$ (then normalized by the total weight of the network). Its entropy, $H(\langle W_i^{out} \rangle)$, reflects how certainty is distributed across the network. If all nodes link only to the same node, then $H(\langle W_i^{out} \rangle)$ is zero, and the network is totally *degenerate* since all nodes lead to the same node.

The *effective information* ($EI$) of a network is the difference between these two quantities:

$$EI = H(\langle W_i^{out} \rangle) - \langle H(W_i^{out}) \rangle$$

The entropy of the distribution of out-weights in the network forms an upper bound of the amount of unique information in the network’s relationships, from which the information lost due to the uncertainty of those relationships is subtracted. Networks with high $EI$ contain more certainty in the relationships between nodes in the network (since the links represent less uncertain dependencies, unique associations, or deterministic transitions), whereas networks with low $EI$ contain less certainty. Note that $EI$ can be interpreted simply as a structural property of random walkers on a network and their behavior, similar to other common network measures [9].
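As a minimal sketch of this definition, the following Python computes effective information from a row-normalized weight matrix. The function names (`entropy`, `effective_information`) and the two example matrices are illustrative assumptions rather than part of the original analysis code, and entropies are taken in bits.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def effective_information(W):
    """EI = H(<W_i^out>) - <H(W_i^out)> for a row-normalized matrix W."""
    n = len(W)
    # Average uncertainty of each node's out-weights (the noise term).
    mean_out_entropy = sum(entropy(row) for row in W) / n
    # Average out-weight vector: the normalized in-weight received by each node.
    mean_out_weights = [sum(row[j] for row in W) / n for j in range(n)]
    return entropy(mean_out_weights) - mean_out_entropy

# Hypothetical 3-node example: node 0 splits its weight, nodes 1 and 2 do not.
W_example = [[0.0, 0.5, 0.5],
             [0.0, 0.0, 1.0],
             [1.0, 0.0, 0.0]]

# Fully degenerate example: every node links only to node 0.
W_sink = [[1.0, 0.0, 0.0] for _ in range(3)]

print(round(effective_information(W_example), 4))
print(effective_information(W_sink))
```

Under these assumptions the first network has roughly 1.126 bits of effective information, while the fully degenerate network has none, since both entropy terms vanish.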

Here, we use this measure to develop a general classification of networks (key terms can be found in Supplementary Materials, SM V A). Furthermore, we show how the connectivity and different growth rules of a network have a deep relationship to that network’s $EI$. This also provides a principled means of quantifying the amount of information among the micro-, meso-, and macroscale dependencies in a network. We introduce a formalism for finding and assessing the most informative scale of a network: the scale that minimizes the uncertainty in the relationships between nodes. For some networks, a macroscale description of the network can be more informative in this manner, demonstrating a phenomenon known as *causal emergence* [11, 12], which here we generalize to complex networks. This provides a rigorous means of identifying when networks possess an informative higher scale.

#### 2. Results

##### 2.1. Effective Information Quantifies a Network’s Dependencies

This work expands to networks previous research on using effective information to measure the amount of information in the causal relationships between the mechanisms or states of a system. Originally, $EI$ was introduced to capture the causal influence between two subsets of neurons as a step in the calculation of integrated information in the brain [13]. Later, a system-wide version of $EI$ was shown to capture fundamental causal properties in Boolean networks of logic gates, particularly their determinism and degeneracy [11].

Our current derivation from first principles of an $EI$ for networks is equivalent to this system-wide definition (SM V B), which was based originally on interventions upon system states. For example, if a system in a particular state, *A*, always transitions to state *B*, the causal relationship between *A* and *B* can be represented by a node-link diagram wherein the two nodes, *A* and *B*, are connected by a directed arrow, indicating that *B* depends on *A*. This might be a node pair in a “causal diagram” (often represented as a directed acyclic graph, or a *DAG*) such as those used in [14, 15] to represent interventions and causal relationships. In such a case, the information in the causal relationship between *A* and *B* can be assessed by intervening to randomize *A* and measuring the effects on *B*. The $EI$ would be the mutual information between *A* and *B* under such randomization [16].

To expand this framework to networks in general, we relax this intervention requirement by assuming that the elements in $W_i^{out}$ sum to 1. In this case, an “intervention” can be interpreted as dropping a random walker on the network. For example, if the network represents a DAG or Markov chain, then dropping a random walker on a node would be equivalent to an intervention that sets the system into that node’s state. The entropy of the transitions of the random walkers and how those transitions are distributed defines the $EI$ of a network. In this generalized formulation, $EI$ indicates information about causation only in networks where the nodes and edges actually represent dynamics, interactions, or couplings. In the case where edges represent correlations, or where what nodes or edges represent is undefined, $EI$ is merely a structural property of the information contained in the behavior of hypothetical random walkers (however, this situation is no different from other analysis methods that rely on random walkers).

Here we describe how this generalized structural property behaves in common network models, asking basic questions about the relationship between a network’s $EI$ and its size, density, and structure. These inquiries allow for the exhaustive classification and quantification of the information contained in the connectivity of real networks. It is intuitive that the $EI$ of a network will increase as the network grows in size. In general, adding more nodes should increase the entropy, which should in turn increase the amount of information. However, in cases of randomness rather than structure, $EI$ should reflect this randomness. We found this is indeed the case.

Figure 1(a) shows the relationship between a network’s $EI$ and its size under several parameterizations of Erdős-Rényi (ER) random graphs [17, 18]. As the size of an ER network increases (while keeping constant the probability that any two nodes will be connected, $p$), its $EI$ converges to a value of $-\log_2(p)$. That is, in random networks, $EI$ is dominated solely by the probability that any two nodes are connected, a key finding which demonstrates that, after a certain point, a random network structure does not contain more information as its size increases. This shift occurs in ER networks at approximately $\langle k \rangle = \log_2(N)$, which is also the point at which we can expect all nodes to be in a giant component [1]. This finding illustrates that network connectivity must be nonrandom to increase the amount of information in the relationships between nodes (see SM V C 1 for derivation). Note that if a network is maximally dense (i.e., a fully connected network, with self-loops), $EI = 0.0$. However, we expect such dense low-$EI$ structures to be uncommon, since network structures found in nature and society tend to be sparse [19].
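The convergence described here can be checked numerically with a small sketch. The helper names (`erdos_renyi`, `effective_information`), the seed, and the chosen sizes are assumptions made for illustration; nodes that happen to have no edges are simply given an all-zero row, which is a simplification and not necessarily how the original analysis treats them.

```python
import math
import random

def entropy(dist):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def effective_information(W):
    """EI = H(<W_i^out>) - <H(W_i^out)> for a row-normalized matrix W."""
    n = len(W)
    mean_out_entropy = sum(entropy(row) for row in W) / n
    mean_out_weights = [sum(row[j] for row in W) / n for j in range(n)]
    return entropy(mean_out_weights) - mean_out_entropy

def erdos_renyi(n, p, seed=0):
    """Undirected ER graph returned as a row-normalized weight matrix."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    W = []
    for row in adj:
        k = sum(row)
        # Assumption: isolated nodes keep an all-zero out-weight vector.
        W.append([a / k if k else 0.0 for a in row])
    return W

p = 0.1
for n in (100, 200, 400):
    ei = effective_information(erdos_renyi(n, p, seed=1))
    print(n, round(ei, 3), "target", round(-math.log2(p), 3))
```

With $p$ held fixed, the printed values should settle near $-\log_2(p)$ once the network is large enough that most nodes have many neighbors.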

We report another key relationship between a network’s connectivity and its $EI$ in Figure 1(b). We again compare the $EI$ of a network to its size, focusing on networks grown under different parameterizations of a preferential attachment model [20, 21]. Under a preferential attachment growth model, a new node is added to the network at each time step, contributing $m$ new edges to the network; these edges connect to nodes already in the network, $v_i$, with a probability proportional to $k_i^{\alpha}$. Here, $k_i$ is the degree of node $v_i$ and $\alpha$ tunes the amount of preferential attachment. A value of $\alpha = 0.0$ corresponds to each node having an equal chance of receiving a new node’s link (i.e., no preferential attachment). The classic Barabási-Albert network corresponds to linear preferential attachment, $\alpha = 1.0$ [21]. Superlinear preferential attachment, $\alpha > 1.0$, creates networks that have less and less $EI$, eventually resembling star-like structures (see SM V C 2 for derivation). As shown in Figure 1(b), only in cases of sublinear preferential attachment, $\alpha < 1.0$, does the network’s $EI$ continue to increase with its size. When $\alpha = 0.0$ (creating a random tree) the network’s $EI$ increases logarithmically as its size increases.
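The growth rule itself can be sketched in a few lines. The function name `preferential_attachment`, the seed clique used to start the process, and the sampling of $m$ distinct targets per new node are assumptions for illustration; the original simulations may differ in these details.

```python
import random

def preferential_attachment(n, alpha, m=1, seed=0):
    """Grow an undirected network where attachment probability scales as k**alpha."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Assumption: start from a fully connected seed of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    for new in range(m + 1, n):
        candidates = list(range(new))
        targets = []
        for _ in range(m):
            # Weight each existing node by its degree raised to alpha.
            weights = [len(adj[v]) ** alpha for v in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            targets.append(pick)
            candidates.remove(pick)  # keep the m targets distinct
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj

for alpha in (0.0, 1.0, 3.0):
    net = preferential_attachment(200, alpha, m=1, seed=0)
    print(alpha, "max degree:", max(len(nbrs) for nbrs in net.values()))
```

Larger values of `alpha` concentrate links on a few nodes, which is the star-like regime discussed in the text; the resulting adjacency can be row-normalized and passed to any $EI$ routine.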

The maximum possible $EI$ in a network of $N$ nodes is $\log_2(N)$. This can be seen in the case of a directed ring network where each node has one incoming link and one outgoing link, each with a weight of 1.0, so each node has one node uniquely connecting to it. In such a network, each node contributes zero uncertainty, since $H(W_i^{out}) = 0$, and $H(\langle W_i^{out} \rangle) = \log_2(N)$, and therefore, its $EI$ is always $\log_2(N)$. In general, the $EI$ of undirected lattices is fixed entirely by its size and the dimension, $d$, of the ring lattice (i.e., $d = 1$ is an undirected ring, $d = 2$ is a torus, etc. [22]), so $EI = \log_2(N) - \log_2(2d)$ for such lattices (see SM V C 2 for derivation).
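These closed-form values are easy to confirm on small lattices. The sketch below assumes uniform weights over each node's neighbors; `from_neighbors` and the 16-node examples are hypothetical choices made for illustration.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def effective_information(W):
    """EI = H(<W_i^out>) - <H(W_i^out)> for a row-normalized matrix W."""
    n = len(W)
    mean_out_entropy = sum(entropy(row) for row in W) / n
    mean_out_weights = [sum(row[j] for row in W) / n for j in range(n)]
    return entropy(mean_out_weights) - mean_out_entropy

def from_neighbors(neighbors):
    """Build a weight matrix with equal weight on each listed out-neighbor."""
    n = len(neighbors)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            W[i][j] = 1.0 / len(nbrs)
    return W

n, side = 16, 4
directed_ring = from_neighbors([[(i + 1) % n] for i in range(n)])
undirected_ring = from_neighbors([[(i - 1) % n, (i + 1) % n] for i in range(n)])

def idx(x, y):
    """Node index on a side-by-side grid with wraparound (a torus)."""
    return (x % side) * side + (y % side)

torus = from_neighbors([[idx(x + 1, y), idx(x - 1, y), idx(x, y + 1), idx(x, y - 1)]
                        for x in range(side) for y in range(side)])

print(effective_information(directed_ring))    # log2(16)
print(effective_information(undirected_ring))  # log2(16) - log2(2)
print(effective_information(torus))            # log2(16) - log2(4)
```

The directed ring attains the maximum of $\log_2(N)$, and each added lattice dimension costs the entropy of a node's uniform choice among its $2d$ neighbors.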

The picture that emerges is that $EI$ is inextricably linked with a network’s connectivity and growth (even network motifs, as shown in SM V D) and therefore to the fundamentals of Network Science. Random networks have a fixed amount of $EI$, and scale-freeness represents the critical bound for the growth of $EI$. In general, dense networks and star-like networks have less $EI$. The next section explores how $EI$’s components explain these associations.

##### 2.2. Determinism and Degeneracy

*Determinism* and *degeneracy* are the two fundamental components of $EI$ [11]. They are based on a network’s connectivity (see Figure 2(a) for a visual explanation), specifically the degree of overlapping weight in the networks. Determinism and degeneracy are derived from the uncertainty over outputs and uncertainty in how those outputs are distributed, respectively:

$$\text{determinism} = \log_2(N) - \langle H(W_i^{out}) \rangle$$

$$\text{degeneracy} = \log_2(N) - H(\langle W_i^{out} \rangle)$$

In a maximally deterministic network wherein all nodes have a single output, $w_{ij} = 1.0$, the determinism is $\log_2(N)$ because $\langle H(W_i^{out}) \rangle = 0$. Conceptually, this means that a random walker will move deterministically starting from any node. Degeneracy is the amount of information in the connectivity lost via an overlap in input weights (e.g., if multiple nodes output to the same node). In a perfectly nondegenerate system where all nodes have equal input weights, the degeneracy is zero since $H(\langle W_i^{out} \rangle) = \log_2(N)$. Together, determinism and degeneracy can be used to define $EI$:

$$EI = \text{determinism} - \text{degeneracy}$$

These two quantities provide clear explanations for why different networks have the $EI$ they do. For example, as the size of an Erdős-Rényi random network increases, its degeneracy approaches zero, which means the $EI$ of a random network is driven only by the determinism of the network, which is in turn the negative log of the probability of connection, *p*. Similarly, in *d*-dimensional ring lattice networks, the degeneracy term is always zero, which means the $EI$ of a ring lattice structure also reduces to the determinism of that structure. Ring networks with an average degree $\langle k \rangle$ will have a higher $EI$ than ER networks with the same average degree because ring networks will have a higher determinism value. In the case of star networks, the degeneracy term alone governs the decay of the $EI$ such that hub-and-spoke-like structures quickly become uninformative in terms of cause and effect (see SM V C for derivations concerning these cases). In general, this means that canonical networks can be characterized by their ratio of determinism to degeneracy (see Figure 2(b)).

##### 2.3. Effective Information in Real Networks

So far, we have been agnostic as to the origin of the network under analysis. As described previously, to measure the $EI$ of a network, one can create each $W_i^{out}$ by normalizing each node’s out-weight vector to sum to 1.0. Regardless of what the relationships between the nodes represent, the network’s determinism reflects how targeted the out-weights of the nodes are (networks with more targeted links possess higher $EI$), while the degeneracy captures the overlap of the targeting of nodes. High $EI$ reflects greater specificity in the connectivity, whereas low $EI$ indicates a lack of specificity (as in Figure 2(a)). This generalizes our results to multiple types of representations, although the origin of the normalized network should be kept in mind when interpreting the value of the measure.

Since the $EI$ of a network will change depending on the network’s size, we use a normalized form of $EI$ known as *effectiveness* in order to compare the $EI$ of real networks. Effectiveness ranges from 0.0 to 1.0 and is defined as

$$\text{effectiveness} = \frac{EI}{\log_2(N)}$$

As the determinism of a network decreases toward its minimum possible value and its degeneracy increases toward its maximum, the effectiveness of that network will trend to 0.0. Regardless of its size, a network wherein each node has a deterministic output to a unique target has an effectiveness of 1.0.
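The decomposition and its normalized form can be illustrated with a short sketch, assuming determinism and degeneracy are each measured against $\log_2(N)$ and that their difference is $EI$. The function names and the three small test networks are illustrative choices, not the original implementation.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def determinism(W):
    """log2(N) minus the average entropy of the out-weight vectors."""
    n = len(W)
    return math.log2(n) - sum(entropy(row) for row in W) / n

def degeneracy(W):
    """log2(N) minus the entropy of the average out-weight vector."""
    n = len(W)
    mean_out_weights = [sum(row[j] for row in W) / n for j in range(n)]
    return math.log2(n) - entropy(mean_out_weights)

def effectiveness(W):
    """EI (determinism minus degeneracy) normalized by log2(N)."""
    return (determinism(W) - degeneracy(W)) / math.log2(len(W))

n = 5
ring = [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]
dense = [[1.0 / n] * n for _ in range(n)]           # fully connected, self-loops
star = [[0.0] * n for _ in range(n)]                # node 0 is the hub
for s in range(1, n):
    star[0][s] = 1.0 / (n - 1)
    star[s][0] = 1.0

for name, W in (("ring", ring), ("dense", dense), ("star", star)):
    print(name, round(determinism(W), 3), round(degeneracy(W), 3),
          round(effectiveness(W), 3))
```

In this toy comparison the ring is fully deterministic and nondegenerate, the dense network has neither determinism nor degeneracy, and the star loses most of its determinism to degeneracy at the hub.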

In Figure 3, we examine the effectiveness of 84 different networks corresponding to data from real systems. These networks were selected primarily from the Konect Network Database [23], which was used because its networks are publicly available, range in size from dozens to tens of thousands of nodes, often have a reasonable interpretation as being based on interactions between nodes, and they are diverse, ranging from social networks, to power networks, to metabolic networks. We defined four categories of interest: biological, social, informational, and technological. We selected our networks by using all the available networks (under 40,000 nodes due to computational constraints) in the domains corresponding to each category within the Konect database, and where it was appropriate, the Network Repository as well [24]. See Materials & Methods section and SM Table II for a full description of this selection process.

Lower effectiveness values correspond to structures that have either high degeneracy (as in right column, Figure 2(a)) or low determinism (as in left column, Figure 2(a)) or a combination of both. In the networks we measured, biological networks on average have lower effectiveness values, whereas technological networks on average have the highest effectiveness. This finding aligns intuitively with what we know about the relationship between $EI$ and network structure, and it also supports long-standing hypotheses about the role of redundancy, degeneracy, and noise in biological systems [25, 26]. On the other hand, technological networks like power grids, autonomous systems, or airline networks on average are associated with higher effectiveness values. One explanation for this difference is that efficiency in human-made technological networks tends to create sparser, nondegenerate networks with higher effectiveness on average, wherein the nodes’ relationships are more specific in their targeting.

Perhaps it might be surprising to find that evolved networks have such low effectiveness. But, as we will show, a low effectiveness can actually indicate that there is informative higher-scale (macroscale) connectivity in the system. That is, a low effectiveness can reflect the fact that biological systems often contain higher-scale structure, which we demonstrate in the following section.

##### 2.4. Causal Emergence in Complex Networks

This new global network measure, $EI$, offers a principled way to answer an important question: what is the scale that best captures the connectivity of a complex system? The resolution to this question is important because science analyzes the structure of different systems at different spatiotemporal scales, often preferring to intervene and observe systems at levels far above that of the microscale [12]. This is likely because relationships at the microscale can be extremely noisy and therefore uninformative, and coarse-graining can minimize this noise [11]. Indeed, this noise minimization is actually grounded in Claude Shannon’s noisy-channel coding theorem [10], wherein dimension reductions can operate like codes that use more of a channel’s capacity [16]. Higher-level causal relationships often perform error-correction on the lower-level relationships, thus generating extra effective information at those higher scales. Measuring this difference provides a principled means of deciding when higher scales are more informative (emergence) or when higher scales are extraneous, epiphenomenal, or lossy (reduction).

Bringing these issues to network science, we can now ask, what representation will minimize the uncertainty present in a network? We do this by examining *causal emergence*, which is when a dimensionally reduced network contains more informative connectivity, in the form of a higher $EI$ than the original network. Note that, as discussed, $EI$ can be interpreted solely as a general structural property of networks. Therefore, while we still call this phenomenon “causal emergence” because it has the same mathematical formalization as previous work in Boolean networks and Markov chains [11, 12, 16], here we focus on how it can be used to identify the informative higher scales of networks regardless of what those networks represent.

Notably, the phenomenon can be measured by recasting networks at higher scales and observing how the $EI$ changes, a process which identifies whether the network’s higher scales add information above and beyond lower scales.

##### 2.5. Network Macroscales

First, we must introduce how to recast a network, $G$, at a higher scale. This is represented by a new network, $G_M$. Within $G_M$, a micronode is a node that was present in the original $G$, whereas a macronode is defined as a node, $\mu$, that represents a subgraph, $S$, from the original $G$ (replacing the subgraph within the network). Since the original network has been dimensionally reduced by grouping nodes together, $G_M$ will always have fewer nodes than $G$.

A macronode $\mu$ is defined by some $W_\mu^{out}$, derived from the edge weights of the various nodes within the subgraph it represents. One can think of a macronode as being a summary statistic of the underlying subgraph’s behavior, a statistic that takes the form of a single node. Ultimately there are many ways of representing a subgraph, that is, building a macronode, and some ways are more consistent than others in capturing the subgraph’s behavior, depending on the connectivity. We highlight here that macroscales of networks should in general be *consistent* with their underlying microscales in terms of their dynamics. While this has never been assessed within networks or systems generally, there has been previous research that has asked whether the macroscales of structural equation models are consistent with the effect of all possible interventions [27].

Here, to decide whether or not a macronode is a consistent summary of its underlying subgraph, we formalize consistency as a measure of whether random walkers behave identically on $G$ and $G_M$. We do this because random walks are often used to represent dynamics on networks [9], and therefore, many important analyses and algorithms (such as PageRank for determining a node’s centrality [28] or InfoMap for community discovery [29]) are based on random walks.

Specifically, we define the *inconsistency* of a macroscale as the Kullback-Leibler divergence [30] between the expected distribution of random walkers on $G$ vs. $G_M$, given some identical initial distribution on each. The expected distribution over $G$ at some future time, $t$, is $P_m(t)$, while the distribution over $G_M$ at some future time $t$ is $P_M(t)$. To compare the two, the distribution $P_m(t)$ is summed over the same nodes in the macroscale $G_M$, resulting in the distribution $P_{M|m}(t)$ (the microscale given the macroscale). We can then define the macroscale inconsistency over some series of timesteps $T$ as

$$\text{inconsistency} = \sum_{t=0}^{T} D_{KL}\left[ P_M(t) \,\|\, P_{M|m}(t) \right]$$

This consistency measure addresses the extent to which a random dynamical process on the microscale topology will be recapitulated on a dimensionally reduced topology (for how this is applied in our analysis, see Materials & Methods).
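A small sketch makes the comparison concrete. It assumes the inconsistency is the sum, over timesteps $0$ to $T$, of the Kullback-Leibler divergence (in bits) between the macroscale walker distribution and the microscale distribution summed into macroscale nodes. All names, the toy four-node network, and the deliberately wrong macroscale are hypothetical.

```python
import math

def step(dist, W):
    """Advance a distribution of random walkers by one timestep."""
    return [sum(dist[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]

def kl_divergence(p, q):
    """D_KL[p || q] in bits; infinite if q lacks support where p has mass."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return math.inf
            total += p_i * math.log2(p_i / q_i)
    return total

def inconsistency(W_micro, W_macro, groups, p0, T):
    """groups[k] lists the micro nodes summarized by macro node k."""
    def project(dist):
        return [sum(dist[i] for i in g) for g in groups]
    p_micro, p_macro = list(p0), project(p0)
    total = 0.0
    for _ in range(T + 1):
        total += kl_divergence(p_macro, project(p_micro))
        p_micro, p_macro = step(p_micro, W_micro), step(p_macro, W_macro)
    return total

# Micro network: A -> {B, C} with equal weight, B -> D, C -> D, D -> A.
W_micro = [[0.0, 0.5, 0.5, 0.0],
           [0.0, 0.0, 0.0, 1.0],
           [0.0, 0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0, 0.0]]
groups = [[0], [1, 2], [3]]              # macro nodes: A, mu = {B, C}, D
W_good = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
W_bad = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # mu wrongly -> A
p0 = [0.25, 0.25, 0.25, 0.25]

print(inconsistency(W_micro, W_good, groups, p0, T=5))
print(inconsistency(W_micro, W_bad, groups, p0, T=5))
```

Because B and C have identical outputs, grouping them loses nothing and the first value is zero; the second macroscale misroutes walkers and accumulates positive divergence.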

What constitutes a consistent macroscale depends on the connectivity of the subgraph that gets grouped into a macronode, as shown in Figure 4. The $W_\mu^{out}$ can be constructed based on the collective $W^{out}$ of the subgraph (shown in Figure 4(a)). For instance, in some cases, one could just coarse-grain a subgraph by using its average $W^{out}$ as the $W_\mu^{out}$ of some new macronode (as in Figure 4(b)). However, it may be that the subgraph has dependencies not captured by such a coarse-grain. Indeed, this is similar to the recent discovery that when constructing networks from data, it is often necessary to explicitly model higher-order dependencies by using higher-order nodes so that the dynamics of random walks stay true to the original data [31]. We therefore introduce *higher-order macronodes* (HOMs), which draw on similar techniques to consistently represent subgraphs as single nodes [31].

Different subgraph connectivities require different types of HOMs to consistently represent them. For instance, HOMs can be based on the input weights to the macronode, which take the form $\mu|j$. In these cases, $W_{\mu|j}^{out}$ is a weighted average of each node’s $W_i^{out}$ in the subgraph, where the weight is based on the input weight to each node in the subgraph (Figure 4(c)). Another type of HOM that generally leads to consistent macronodes over time is when $W_\mu^{out}$ is based on the stationary output from the subgraph to the rest of the network, which we represent as $\mu|\pi$ (Figure 4(d)). These types of HOMs may have minor inconsistencies given some initial state, but will almost always trend toward perfect consistency as the network approaches its stationary dynamics (outlined in Section 4).
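One plausible reading of these constructions is a weighted average of the subgraph's out-weight vectors, with the weights supplied by whichever quantity defines the macronode (uniform, input-based, or stationary). The sketch below implements that reading for stationary weights only; `stationary`, `coarse_grain`, and the choice to fold within-subgraph weight into a self-loop are assumptions, not the original implementation.

```python
def stationary(W, iters=500):
    """Stationary distribution via lazy power iteration (tolerates periodic chains)."""
    n = len(W)
    pi = [1.0 / n] * n
    for _ in range(iters):
        stepped = [sum(pi[i] * W[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, stepped)]
    return pi

def coarse_grain(W, group, node_weights):
    """Replace the nodes in `group` with one macronode, appended as the last node."""
    keep = [i for i in range(len(W)) if i not in group]

    def collapse(row):
        # Weight into any grouped node becomes weight into the macronode.
        return [row[j] for j in keep] + [sum(row[j] for j in group)]

    total = sum(node_weights[i] for i in group)
    # Macronode out-weights: weighted average of the grouped nodes' out-weights.
    mixed = [sum(node_weights[i] * W[i][j] for i in group) / total
             for j in range(len(W))]
    return [collapse(W[i]) for i in keep] + [collapse(mixed)]

# Micro network: A -> {B, C} with equal weight, B -> D, C -> D, D -> A.
W = [[0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 0.0]]

pi = stationary(W)
macro = coarse_grain(W, group=[1, 2], node_weights=pi)  # node order: A, D, mu
for row in macro:
    print([round(x, 6) for x in row])
```

Passing uniform values as `node_weights` would give the plain average of Figure 4(b), so the same helper covers both cases under these assumptions.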

Subgraphs with complex internal dynamics can require a more complex type of HOM in order to preserve the macronode’s consistency. For instance, in cases where subgraphs have a delay between their inputs and outputs, this can be represented by a combination of $\mu|j$ and $\mu|\pi$, which when combined captures that delay (Figure 4(e)). In these cases, the macronode has two components, one of which acts as a buffer over a timestep. This means that macronodes can possess memory even when constructed from networks that are at the microscale memoryless, and in fact, this type of HOM is sometimes necessary to consistently capture the microscale dynamics.

We present these types of macronodes not as an exhaustive list of all possible HOMs, but rather as examples of how to construct higher scales in a network by representing subgraphs as nodes and also sometimes using higher-order dependencies to ensure those nodes are consistent. This approach offers a complete generalization of previous work on coarse-grains [11] and also black boxes [16, 32, 33], while simultaneously solving the previously unresolved issue of macroscale consistency by using higher-order dependencies. The types of macronodes formed by subgraphs also provide substantive information about the network, such as whether the macroscale of a network possesses memory or path-dependency.

##### 2.6. Causal Emergence Reveals the Scale of Networks

A network has an informative macroscale when a recast network, $G_M$ (a macroscale), has more $EI$ than the original network, $G$ (the microscale). In general, networks with lower effectiveness (low $EI$ given their size) have a higher potential for such emergence, since they can be recast to reduce their uncertainty. Searching across groupings allows the identification or approximation of a macroscale that maximizes the $EI$.

Checking all possible groupings is computationally intractable for all but the smallest networks. Therefore, in order to find macronodes which increase the $EI$, we use a greedy algorithm that groups nodes together and checks if the $EI$ increases. By choosing a node and then pairing it iteratively with its surrounding nodes we can grow macronodes until pairings no longer increase the $EI$, and then move on to a new node (see the Materials & Methods section for details on this algorithm).
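The flavor of such a search can be conveyed with a deliberately simplified sketch. It is not the algorithm detailed in Materials & Methods: it tries every pair of nodes rather than only a node's surroundings, merges pairs by plain averaging of their out-weights, and accepts the first merge that raises $EI$. All function names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def effective_information(W):
    """EI = H(<W_i^out>) - <H(W_i^out)> for a row-normalized matrix W."""
    n = len(W)
    mean_out_entropy = sum(entropy(row) for row in W) / n
    mean_out_weights = [sum(row[j] for row in W) / n for j in range(n)]
    return entropy(mean_out_weights) - mean_out_entropy

def merge_pair(W, a, b):
    """Replace nodes a and b with one macronode (appended last), by plain averaging."""
    keep = [i for i in range(len(W)) if i not in (a, b)]

    def collapse(row):
        return [row[j] for j in keep] + [row[a] + row[b]]

    averaged = [(x + y) / 2 for x, y in zip(W[a], W[b])]
    return [collapse(W[i]) for i in keep] + [collapse(averaged)]

def greedy_macroscale(W):
    """Keep merging the first pair that increases EI until no pair does."""
    current = [row[:] for row in W]
    improved = True
    while improved and len(current) > 1:
        improved = False
        base = effective_information(current)
        for a in range(len(current)):
            for b in range(a + 1, len(current)):
                candidate = merge_pair(current, a, b)
                if effective_information(candidate) > base + 1e-12:
                    current, improved = candidate, True
                    break
            if improved:
                break
    return current

def star(n):
    """Undirected star with node 0 as the hub, row-normalized."""
    W = [[0.0] * n for _ in range(n)]
    for s in range(1, n):
        W[0][s] = 1.0 / (n - 1)
        W[s][0] = 1.0
    return W

micro = star(6)
macro = greedy_macroscale(micro)
print(len(micro), round(effective_information(micro), 4))
print(len(macro), round(effective_information(macro), 4))
```

On this six-node star the search groups the spokes step by step and stops at a two-node macroscale, since merging the hub with anything lowers $EI$.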

By generating undirected preferential attachment networks and varying the degree of preferential attachment, $\alpha$, we observe a crucial relationship between preferential attachment and causal emergence. One of the central results in network science has been the identification of “scale-free” networks [21]. Our results show that networks that are not “scale-free” can be further separated into micro-, meso-, and macroscales depending on their connectivity. This scale can be identified based on their degree of causal emergence (Figure 5(a)). In cases of sublinear preferential attachment ($\alpha < 1.0$), networks lack higher scales. Linear preferential attachment ($\alpha = 1.0$) produces networks that are scale-free, which is the zone of preferential attachment right before the network develops higher scales. Such higher scales only exist in cases of superlinear preferential attachment ($\alpha > 1.0$). And past $\alpha > 3.0$ the network begins to converge to a macroscale where almost all the nodes are grouped into a single macronode. The greatest amount of causal emergence is found in mesoscale networks, which is when $\alpha$ is between 1.5 and 3.0, when networks possess a rich array of macronodes. Note that the increase in $EI$ following macroscale groupings for $\alpha > 1.0$ shown in Figure 5(a) resembles the decrease in $EI$ with higher $\alpha$ that we observe in Figure 1(b). This is because after $\alpha > 1.0$ the decreasing $EI$ of the microscale leaves room for improvement of the $EI$ at the macroscale, following a grouping of nodes.

Correspondingly, the size of $G_M$ decreases as $\alpha$ increases and the network develops an informative higher scale, which can be seen in the ratio of macroscale network size, $N_M$, to the original network size, $N$ (Figure 5(b)). As discussed previously, networks generated with higher values for $\alpha$ will be more and more star-like. Star-like networks have higher degeneracy and thus less $EI$, and because of this, we expect that there are more opportunities to increase the network’s $EI$ through grouping nodes into macronodes. Indeed, the ideal grouping of a star network is when $N_M = 2$ and $EI = 1$ bit. This result is similar to recent advances in spectral coarse-graining that also observe that the ideal coarse-graining of a star network is to collapse it into a two-node network, grouping all the spokes into a single macronode [34], which is what happens to star networks that are recast as macroscales.

Our results offer a principled and general approach to such community detection by asking whether there is an informational gain from replacing a subgraph with a single node. Therefore, we can define *causal communities* as being when a cluster of nodes, or some subgraph, forms a viable macronode (note that this assumes the connections in the network actually represent possible causal interactions, but it is also merely a topological property). Fundamentally, causal communities represent noise at the microscale. The closer a subgraph is to complete noise, the greater the gain in $EI$ by replacing it with a macronode (see SM V G). Minimizing the noise in a given network also identifies the optimal scale to represent that network. However, there must be some structure that can be revealed by noise minimization in the first place. In cases of random networks that form a single large component which lacks any such structure, causal emergence does not occur (as shown in SM V G).

##### 2.7. Causal Emergence in Real Networks

The presence and informativeness of macroscales should vary across real networks, depending on connectivity. Here, we investigate the disposition toward causal emergence of real networks across different domains. We draw from the same set of networks that are analyzed in Figure 3, the selection process and details of which are outlined in the Materials & Methods section. The network sizes span up to 40,000 nodes, thus making it infeasible to find the best macroscales for each of them. Therefore, we focus specifically on the two categories that previously showed the greatest divergence in terms of the $EI$: biological and technological. Since we are interested in the general question of whether biological or technological networks show a greater disposition or propensity for causal emergence, we approximate causal emergence by calculating the causal emergence of sampled subgraphs of growing sizes. Each sample is found using a “snowball sampling” procedure, wherein a node is chosen randomly and then a weakly connected subgraph of a specified size is found around it [35]. This subgraph is then analyzed using the previously described greedy algorithmic approach to find macronodes that maximize the $EI$ in each network. Each available network is sampled 20 times for each size taken from it. In Figure 6, we show how the causal emergence of these real networks differentiates as we increase the sampled subgraph size, in a sequence of 50, 100, 150, and finally 200 nodes per sample. Networks of these sizes previously provided ample evidence of causal emergence in simulated networks, as in Figure 5(a). Comparing the two categories of real networks, we observe a significantly greater propensity for causal emergence in biological networks, and this difference is more pronounced the larger the samples are.
Note that constructing a random null model of these networks (e.g., a configuration model) would tend to create networks with minimal or negligible causal emergence, as is the case for ER networks (Figure 13 in SM V G).
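The sampling step can be sketched as a breadth-first expansion from a random seed node. This is an illustrative assumption about how a “snowball” of a fixed size might be grown; `snowball_sample`, the treatment of edges as undirected (to obtain weak connectivity), and the toy ring graph are not taken from the original code.

```python
import random
from collections import deque

def snowball_sample(adj, size, seed=0):
    """Grow a connected node set of the requested size around a random start node.

    `adj` maps each node to its neighbors, with edge direction ignored.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))
    sampled = {start}
    queue = deque([start])
    while queue and len(sampled) < size:
        node = queue.popleft()
        neighbors = sorted(adj[node])
        rng.shuffle(neighbors)  # avoid always favoring low-numbered neighbors
        for nb in neighbors:
            if nb not in sampled:
                sampled.add(nb)
                queue.append(nb)
                if len(sampled) == size:
                    break
    return sampled

# Toy undirected ring of 30 nodes, used only to exercise the sampler.
ring = {i: {(i - 1) % 30, (i + 1) % 30} for i in range(30)}
sample = snowball_sample(ring, size=10, seed=3)
print(sorted(sample))
```

If the start node sits in a component smaller than the requested size, this sketch returns the whole component, so callers would need to check the sample size before analysis.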

That subsets of biological systems show a high disposition toward causal emergence is consistent with, and even explanatory of, many long-standing hypotheses surrounding the existence of noise and degeneracy in biological systems [36]. It also explains the difficulty of understanding how the causal structure of biological systems functions, since they are cryptic in that they contain certainty at one level and uncertainty at another.

#### 3. Discussion

We have shown that the information in the relationships between nodes in a network is a function of the uncertainty intrinsic to their connectivity as well as how that uncertainty is distributed. To capture this information, we adapted a measure, effective information ($EI$), for use in networks and analyzed what it reveals about common network structures that have been studied by network scientists for decades. For example, the $EI$ of an ER random network tends to $-\log_2(p)$, and whether the $EI$ of a preferential attachment network grows or shrinks as new nodes are added is a function of whether its degree of preferential attachment, $\alpha$, is greater or less than 1.0. In networks where the mechanisms or transitions are unknown, but the structure is known, $EI$ captures the degree of unique targeting in the network. In real networks, we showed that the $EI$ of biological networks tends to be much lower than that of technological networks.

We also illustrated that what has been called “causal emergence” can occur in networks. This is the gain in EI that occurs when a network is recast as a new, macroscale network. Finding this sort of informative higher scale means balancing the minimization of uncertainty while simultaneously maximizing the number of nodes in the network. These methods may be useful in improving scientific experimental design, the compression and search of big data, model choice, and even machine learning. Importantly, not every recast network will have a higher EI than the network that it represents; that is, these same techniques can identify cases of reduction. Ultimately, this is because comparing the EI of different network representations provides a ground for comparing the effectiveness of any two network representations of the same complex system. These techniques allow for the formal identification of the scale of a network. Scale-free networks can be thought of as possessing a fractal pattern of connectivity [37], and our results show that the scale of a network is the breaking of that fractal in one direction or the other. Note that a future area of research is how to efficiently identify such informative higher scales, as well as how network properties beyond the EI change across scales [38].

The study of higher-order structures in networks is an increasingly rich area of research [29, 39–42], often focusing on constructing networks that better capture the data they represent. Here, we introduce a formal and generalized way to recast networks at a higher scale while preserving random walk dynamics. In many cases, a macroscale of a network can be just as consistent in terms of random walk dynamics and also possess greater EI. Some macronodes in a macroscale may be of different types with different higher-order properties. In other words, we show how to turn a lower-order network into a higher-order network. One noteworthy and related aspect of our work is demonstrating how a system that is memoryless at the microscale can actually possess memory at the macroscale, indicating that whether a system has memory is a function of scale.

While some [43] have previously recast subgraphs as individual nodes as we do here, they have not done so in ways that are based on noise minimization and maximizing consistency, focusing instead on gains to algorithmic speed via compression. Explicitly creating macronodes to minimize noise brings the dependencies of the network into focus. This means that causal emergence in networks has a direct relationship to community detection, a vast subdiscipline that treats dense subgraphs within a network as representing shared properties, membership, or functions [44, 45]. However, the relationship between causal emergence and traditional community detection is not as direct as it may seem. For one, causal emergence is high in networks with high degeneracy (i.e., networks with high-degree hubs, as we show in Figure 5(a)). Community detection algorithms do not typically select for such structural properties, instead focusing on dense subgraphs that connect more highly within the subgraph than outside [44]. In SM Figure 12, we show a landscape of stochastic block model networks and their associated values for causal emergence. Indeed in networks that would have high modularity [46] (e.g., two disconnected cliques), we do observe causal emergence, but only when the two disconnected cliques are of *different sizes*. This distinction is key and situates networks that display causal emergence in a meaningful place in the study of complex networks. In light of this, macronodes offer a sort of community detection where the micronodes that make up a macronode are a community and ultimately can be replaced by a macronode that summarizes their behavior while reducing the subgraph’s noise. Under this interpretation, community structure is characterized by noise rather than shared memberships.

#### 4. Materials and Methods

##### 4.1. Selection of Real Networks

Networks were chosen to represent the four categories of interest: social, informational, biological, and technological (see SM Figure 10, where we detail the same information as in Figure 3, but also include the source of the network data in addition to the effectiveness value of each network). We used all the available networks under 40,000 nodes (due to computational constraints) within all the domains in the Konect database that reflected our categories of interest. For our social category, we used the domains *Human Contact*, *Human Social*, *Social*, and *Communication*. For our information category, we used the domains *Citations*, *Co-authorship*, *Hyperlinks*, *Lexical*, and *Software*. For our biological category we used the domains *Trophic* and *Metabolic*. Due to overlaps between the Konect database and the Network Repository [24] in these domains, and the paucity of other biological data in the Konect database, we also included the *Brains* domain and the *Ecology* domain from the Network Repository to increase our sample size (again, all networks within these domains under 40,000 nodes were included). For our technological category, we used the domains *Computer* and *Infrastructure* from the Konect database. Again due to overlap between the Konect database and the Network Repository, we also included the *Technological* and *Power Networks* domains from the Network Repository. For a full table of the networks used in this study, along with their source and categorization, see Table II.

##### 4.2. Creating Consistent Macronodes

Previously we outlined methods for creating consistent macronodes of different types. Here, we explore their implementation, which requires deciding which macroscales are consistent. Inconsistency is measured as the Kullback-Leibler divergence between the expected distributions of random walkers on the microscale and on the macroscale, given an initial distribution, as in equation (5).

To measure the inconsistency, we use an initial maximum entropy distribution on the nodes shared between the microscale and the macroscale, that is, only the set of nodes that are left ungrouped in the macroscale. Similarly, we only analyze the expected distribution over that same set of micronodes. Since such distributions cover only a portion of the network, we normalize each distribution to 1.0 by including a single probability that represents all the nonshared nodes (representing when a random walker is on a macronode).

We focus on the nodes shared between the microscale and the macroscale for the inconsistency measure because (a) it is easy to calculate, which is necessary during an algorithmic search, (b) except for unusual circumstances, the inconsistency over the shared nodes still reflects the network as a whole, and (c) even in cases of the most extreme macroscales (such as those in Figure 5), there are still nodes shared between the two scales.
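
The inconsistency measure described above can be sketched as follows. This is a minimal illustration rather than the implementation used for the reported results. It assumes both scales are given as row-normalized transition dictionaries, that every node has at least one outgoing edge, and that the divergence is taken from the macroscale distribution to the microscale distribution (the direction is an assumption here). All function names are hypothetical.

```python
import math


def step(dist, transitions):
    """Advance a random-walker distribution by one timestep."""
    new = {}
    for node, mass in dist.items():
        for target, p in transitions.get(node, {}).items():
            new[target] = new.get(target, 0.0) + mass * p
    return new


def restrict(dist, shared):
    """Probabilities on the shared nodes, plus one bin for everything else."""
    kept = [dist.get(n, 0.0) for n in shared]
    return kept + [max(0.0, 1.0 - sum(kept))]


def kl_divergence(p, q):
    """D(p || q) in bits; infinite if q is zero where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def inconsistency(micro, macro, shared, timesteps):
    """Per-timestep divergence between macro and micro walker distributions."""
    shared = sorted(shared)
    # Maximum entropy (uniform) initial distribution over the shared nodes.
    start = {n: 1.0 / len(shared) for n in shared}
    d_micro, d_macro = dict(start), dict(start)
    values = []
    for _ in range(timesteps):
        d_micro = step(d_micro, micro)
        d_macro = step(d_macro, macro)
        values.append(kl_divergence(restrict(d_macro, shared),
                                    restrict(d_micro, shared)))
    return values
```

A macroscale would be treated as consistent under this sketch when every returned value is zero (or falls to zero within the chosen number of timesteps).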

Here, we examine our methods of using higher-order dependencies in order to demonstrate that this creates consistent macronodes. We use 1000 simulated preferential attachment networks, sampled uniformly at random across the degree of preferential attachment (up to 2.0), the network size (up to 35 nodes), and the number of edges added with each new node (either 1 or 2). These networks were then grouped via the algorithm described in the following section. All macronodes were of the same type, and their inconsistency was checked over 1000 timesteps. These macronodes generally have consistent dynamics, either because they start that way or because they trend toward consistency over time; of the 1000 networks, only 4 had any divergence greater than 0 after 1000 timesteps. In Figure 11 in SM V F, we show 15 of these simulated networks, along with their parameters, number of macronodes, and consistencies. Note that even in the cases with early nonzero inconsistency, the inconsistency is always very low in absolute terms (in bits), and all 15 of the randomly chosen networks trend toward consistency over time. In our observations, most macronodes converge before 500 timesteps, so in analyzing the real-world networks using this macronode type, we check all macronodes for consistency and only reject those that are inconsistent at 500 timesteps. More details about the algorithmic approach to finding causal emergence can be found in the following section.

##### 4.3. Greedy Algorithm for Causal Emergence

The greedy algorithm used for finding causal emergence in networks is structured as follows: for each node, $v_i$, in the shuffled node list of the original network, collect a list of neighboring nodes, $\{v_j\}$, drawn from $B_{v_i}$, where $B_{v_i}$ is the *Markov blanket* of $v_i$ (in graphical models, the Markov blanket, $B_{v_i}$, of a node, $v_i$, corresponds to the “parents,” the “children,” and the “parents of the children” of $v_i$ [47]). This means that $B_{v_i}$ consists of nodes with outgoing edges leading into $v_i$, nodes that the outgoing edges from $v_i$ lead into, and nodes that have outgoing edges leading into the out-neighbors of $v_i$. For each node $v_j$ in $\{v_j\}$, the algorithm calculates the EI of a macroscale network after $v_i$ and $v_j$ are combined into a macronode, $v_M$, according to one of the macronode types in Figure 4. If the resulting network has a higher EI value, the algorithm stores this structural change and, if necessary, supplements the queue of nodes, $\{v_j\}$, with any new neighboring nodes from $v_M$’s Markov blanket that were not already in $\{v_j\}$. If a node, $v_j$, has already been combined into a macronode via a grouping with a previous node, $v_i$, then it will not be included in the new queues of later nodes to check. The algorithm iteratively combines such pairs of nodes until every node, $v_j$, in the Markov blanket of every node, $v_i$, has been tested.
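
The search described above can be sketched as follows. This is a simplified illustration, not the released implementation. As a stand-in for the macronode types in Figure 4, it assumes a macronode's out-weights are the uniform average of its members' normalized out-weights, it assumes EI is the entropy of the average out-weight distribution minus the average per-node entropy, and it omits the consistency check. Every node is assumed to appear as a key of the input dictionary, and all function names are hypothetical.

```python
import math
import random


def normalize(row):
    total = sum(row.values())
    return {t: w / total for t, w in row.items()} if total > 0 else {}


def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def effective_information(net):
    """EI in bits of {node: {target: weight}} (assumed definition)."""
    rows = [r for r in map(normalize, net.values()) if r]
    if not rows:
        return 0.0
    avg = {}
    for row in rows:
        for t, p in row.items():
            avg[t] = avg.get(t, 0.0) + p / len(rows)
    return entropy(avg.values()) - sum(entropy(r.values()) for r in rows) / len(rows)


def coarse_grain(micro, label):
    """Macroscale network implied by `label` (micronode -> group label)."""
    members = {}
    for node, group in label.items():
        members.setdefault(group, []).append(node)
    macro = {}
    for group, nodes in members.items():
        row = {}
        for node in nodes:  # uniform average over members (an assumption)
            for target, p in normalize(micro[node]).items():
                g = label[target]
                row[g] = row.get(g, 0.0) + p / len(nodes)
        macro[group] = row
    return macro


def markov_blanket(net, node):
    """Parents, children, and parents of the children of `node`."""
    children = set(net[node])
    parents = {u for u, row in net.items() if node in row}
    coparents = {u for u, row in net.items() if children & set(row)}
    return (parents | children | coparents) - {node}


def greedy_causal_emergence(micro, seed=None):
    rng = random.Random(seed)
    label = {n: n for n in micro}       # every micronode starts ungrouped
    macro, best = micro, effective_information(micro)
    grouped = set()                     # micronodes already inside a macronode
    order = sorted(micro)
    rng.shuffle(order)
    for v in order:
        if v in grouped:
            continue
        queue = sorted(markov_blanket(macro, v))
        tested = set()
        while queue:
            u = queue.pop()
            if u in tested or u in grouped:
                continue
            tested.add(u)
            trial = dict(label)
            trial[u] = v                # tentatively merge u into v's macronode
            trial_macro = coarse_grain(micro, trial)
            score = effective_information(trial_macro)
            if score > best:            # keep the merge only if EI increases
                label, macro, best = trial, trial_macro, score
                grouped.update((u, v))
                queue.extend(sorted(markov_blanket(macro, v) - tested))
    return label, best
```

The function returns the final grouping and the EI of the resulting macroscale; causal emergence under this sketch is the difference between that value and `effective_information(micro)`.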

#### Data Availability

All data used in this work were retrieved from the Konect Database [23] and the Network Repository [24], which are publicly available. Software for calculating EI in networks and for finding causal emergence in networks is available by request or at https://github.com/jkbren/einet.

#### Disclosure

The opinions expressed in this publication are those of the author(s) and do not necessarily reflect the views of Templeton World Charity Foundation, Inc.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### Authors’ Contributions

B.K. and E.H. conceived the project. B.K. and E.H. wrote the article. B.K. performed the analyses.

#### Acknowledgments

The authors thank Conor Heins, Harrison Hartle, and Alessandro Vespignani for their insights about notation and formalism of effective information. This research was supported by the Allen Discovery Center program through The Paul G. Allen Frontiers Group (12171). This publication was made possible through the support of a grant from Templeton World Charity Foundation, Inc. (TWCFG0273). This work was also supported in part by the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program.

#### Supplementary Materials

A: table of key terms. B: effective information calculation. C: deriving the effective information of common network structures. D: network motifs as causal relationships. E: table of network data. F: examples of consistent macronodes. G: emergent subgraphs.