Complexity

Volume 2019, Article ID 5120581, 13 pages

https://doi.org/10.1155/2019/5120581

## Reconstructing Mesoscale Network Structures

^{1}IMT School for Advanced Studies, Piazza S. Francesco 19, 55100 Lucca, Italy

^{2}University College London, The Bartlett Centre for Advanced Spatial Analysis, Gower Street, WC1E 6BT London, UK

Correspondence should be addressed to Tiziano Squartini; tiziano.squartini@imtlucca.it

Received 19 February 2018; Revised 5 December 2018; Accepted 19 December 2018; Published 10 January 2019

Academic Editor: Lucas Lacasa

Copyright © 2019 Jeroen van Lidth de Jeude et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

When facing the problem of reconstructing complex mesoscale network structures, it is generally believed that models encoding the organization of nodes into modules must be employed. The present paper focuses on two block structures that characterize the empirical mesoscale organization of many real-world networks, i.e., the *bow-tie* and the *core-periphery* ones, with the aim of quantifying the minimal amount of topological information that needs to be enforced in order to reproduce their topological details. Our analysis shows that constraining the network degree sequences is often enough to reproduce such structures, as confirmed by model selection criteria such as AIC or BIC. As a byproduct, our paper enriches the toolbox for the analysis of bipartite networks, which is still far from complete: both the bow-tie and the core-periphery structure, in fact, partition the networks into asymmetric blocks characterized by binary, directed connections, thus calling for the extension of a recently proposed method to randomize *undirected*, bipartite networks to the *directed* case.

#### 1. Introduction

The analysis of mesoscale network structures is a topic of great interest within the community of network scientists: most of the attention, however, has gone to community detection [1–3], while the analysis of other mesostructures has remained far less explored.

The present work aims at contributing to this stream of research by exploring how effectively models that constrain only local information reproduce complex mesostructures such as the bow-tie and the core-periphery ones. When approaching such a problem it is, in fact, commonly believed that models encoding the organization of nodes into modules must be employed: here we test this hypothesis by comparing models that enforce topological information like the total number of links, the degree sequences, and the reciprocity structure with their block-wise counterparts.

To this aim, we have considered real-world networks whose topological structure is *empirically* characterized by bow-tie and core-periphery structures: both are characterized by a central, cohesive subgraph surrounded by a loosely connected set of nodes [4]; in the first case, however, the central part of the network has a fan-in and a fan-out component, respectively entering into and exiting from it.

Remarkably, all models considered in the present paper can be recovered within the same framework, i.e., the entropy-maximization one, which has proven rather effective for approaching both pattern detection and the reconstruction of real-world networks [5, 6]. Such a framework allows a tunable likelihood function to be defined for each considered model, so that selection criteria like AIC or BIC can be applied to determine unambiguously the “winner” among competing models, i.e., the one carrying the right amount of information to account for the inspected structures.

As a byproduct, our paper enriches the toolbox for the analysis of bipartite networks. Among the many available network representations, the bipartite one has recently received much attention [7, 8]. This, in turn, has led to the definition of algorithms for randomizing [9–12], reconstructing [13], or projecting [14, 15] *undirected*, bipartite networks. Their directed representation, however, has not been explored yet, thus calling for the definition of techniques to approach the study of this kind of network as well.

This is especially true when considering that bipartite networks emerge quite naturally when studying the aforementioned mesoscale structures. It is, in fact, evident that analysing the way nodes cluster together unavoidably leads to the analysis of the way such modules interact. From an algebraic point of view, this boils down to considering matrices characterized by diagonal square blocks (i.e., the adjacency matrices of the modules themselves) and off-diagonal rectangular blocks (i.e., the adjacency matrices of the bipartite networks encoding their interactions).

Our method will be employed to analyse economic and financial networks empirically characterized by either bow-tie or core-periphery structures: more specifically, we will focus on two systems, the World Trade Web and the Dutch Interbank Network. As we will show, while the former can be described by a partial bow-tie structure, the latter is characterized by the coexistence of a core-periphery-like structure and a proper bow-tie one, the second one carrying a larger amount of information about the system evolution than the first one.

#### 2. Data

Let us now describe the two systems we have considered for the present analysis.

*The World Trade Web*. We consider yearly bilateral data on exports and imports from the UN Comtrade Database [17], from 1992 to 2002. We limit ourselves to considering the World Trade Web (WTW hereafter) in its binary, directed representation at the aggregate level. In order to perform a temporal analysis and compare different years, we restrict ourselves to a balanced panel of countries (present in the data throughout the considered interval). Accordingly, for a given year $t$, $a_{ij}^{(t)} = 1$ ($a_{ij}^{(t)} = 0$) means that country $i$ has registered a nonnull (null) export towards country $j$.

*The Dutch Interbank Network*. We consider a dataset where nodes are Dutch banks and a link from node $i$ to node $j$ indicates that bank $i$ has an exposure larger than 1.5 million euros, with maturity shorter than one year, towards a creditor bank $j$ [18]. We consider 44 quarterly snapshots of the Dutch Interbank Network (DIN hereafter), from 1998Q1 to 2008Q4. The last year in the sample, 2008, is the year during which the recent financial crisis became manifest.
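Both data sets enter the analysis as binary, directed adjacency matrices. The following is a minimal sketch of that preprocessing step, assuming the raw data are available as a square matrix of flows or exposures. The function name `binarize` and its `threshold` argument are illustrative and not taken from the original study; the maturity filter applied to the interbank data is not modelled here.

```python
def binarize(weights, threshold=0.0):
    """Return the binary, directed adjacency matrix of a weighted matrix.

    a[i][j] = 1 if the flow from i to j is strictly larger than
    `threshold`, and 0 otherwise. Self-loops are discarded.
    `threshold` is a hypothetical parameter: 0 mimics the "nonnull
    export" rule, 1.5 an exposure cutoff expressed in millions of euros.
    """
    n = len(weights)
    return [
        [1 if i != j and weights[i][j] > threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy weighted matrix (made-up numbers, for illustration only).
flows = [
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 2.0],
    [1.0, 0.0, 0.0],
]
trade_like = binarize(flows)                     # any nonnull flow is a link
interbank_like = binarize(flows, threshold=1.5)  # only flows above 1.5 count
```

With the toy numbers above, the flow of size 1.0 survives the first rule but not the second one.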

#### 3. Methods

##### 3.1. The General Framework

Let us, first, provide an algebraic representation of the mesoscale structures considered in the present paper, i.e., the bow-tie and the core-periphery ones.

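Networks whose topology is empirically characterized by a core-periphery structure can be represented as follows:

$$\mathbf{A} = \begin{pmatrix} \mathbf{A}_{\mathrm{CC}} & \mathbf{A}_{\mathrm{CP}} \\ \mathbf{A}_{\mathrm{PC}} & \mathbf{A}_{\mathrm{PP}} \end{pmatrix} \tag{1}$$

The adjacency matrix $\mathbf{A}$ is composed of four distinct blocks: while the square adjacency matrices $\mathbf{A}_{\mathrm{CC}}$ and $\mathbf{A}_{\mathrm{PP}}$ lying along the diagonal represent the core and the periphery modules, the two rectangular (in the most general case), off-diagonal matrices $\mathbf{A}_{\mathrm{CP}}$ and $\mathbf{A}_{\mathrm{PC}}$ represent the (bipartite) networks through which they interact. Usually, the link densities of the matrices above are ordered in such a way that the core module is (much) denser than the periphery module.

Notice that the two matrices $\mathbf{A}_{\mathrm{CP}}$ and $\mathbf{A}_{\mathrm{PC}}$ carry genuinely different information: while the generic entry $a_{cp} = 1$ ($a_{cp} = 0$) of $\mathbf{A}_{\mathrm{CP}}$ indicates that a directed link from node $c$ in the core to node $p$ in the periphery is present (absent), the generic entry $a_{pc} = 1$ ($a_{pc} = 0$) of $\mathbf{A}_{\mathrm{PC}}$ indicates that a directed link from periphery node $p$ to core node $c$ is present (absent). In other words, in order to fully describe the topological structure of *one*, *directed* bipartite network, *two* matrices are, in fact, needed. Naturally, in case the network is undirected, $\mathbf{A}_{\mathrm{CC}} = \mathbf{A}_{\mathrm{CC}}^{T}$, $\mathbf{A}_{\mathrm{PP}} = \mathbf{A}_{\mathrm{PP}}^{T}$, and $\mathbf{A}_{\mathrm{PC}} = \mathbf{A}_{\mathrm{CP}}^{T}$, which restores the symmetry of the whole adjacency matrix (i.e., $\mathbf{A} = \mathbf{A}^{T}$).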
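The four-block picture can be made concrete with a short sketch that, given a binary directed adjacency matrix and a list of core nodes, measures the link density of each block. The function name `block_densities`, the dictionary keys, and the toy matrix are illustrative assumptions; the core/periphery assignment is taken as given rather than inferred.

```python
def block_densities(adj, core):
    """Link density of the four blocks of a core-periphery partition.

    adj  : binary directed adjacency matrix, adj[i][j] = 1 for a link i -> j
    core : indices of the core nodes; all other nodes form the periphery
    """
    n = len(adj)
    core = sorted(set(core))
    periphery = [i for i in range(n) if i not in set(core)]

    def density(rows, cols):
        # Ordered pairs (i, j) with i != j: self-loops are not allowed.
        pairs = [(i, j) for i in rows for j in cols if i != j]
        if not pairs:
            return 0.0
        return sum(adj[i][j] for i, j in pairs) / len(pairs)

    return {
        "core": density(core, core),
        "core_to_periphery": density(core, periphery),
        "periphery_to_core": density(periphery, core),
        "periphery": density(periphery, periphery),
    }


# Toy network: nodes 0 and 1 form the core, nodes 2 and 3 the periphery.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
densities = block_densities(toy_adj, core=[0, 1])
```

In this toy case the two off-diagonal blocks have different densities, which illustrates why a directed bipartite network needs two matrices to be described.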

While the definition of core-periphery structure is quite intuitive, the definition of bow-tie structure, on the other hand, is based on the concept of node *reachability*: node $j$ is reachable from node $i$ if a path exists from node $i$ to node $j$ (a path being defined as a sequence of adjacent links connecting $i$ with $j$). According to this definition, each node is assigned to one of the sets described in [19]. The definition of the three most relevant ones follows:

(i) SCC: each node in the Strongly Connected Component (SCC) is reachable from any other node belonging to the SCC;

(ii) IN: each node in the SCC is reachable from any node belonging to the IN-component;

(iii) OUT: each node in the OUT-component is reachable from any node belonging to the SCC.

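According to the definitions above, networks whose topology is empirically characterized by a bow-tie structure can be represented by the following adjacency matrix:

$$\mathbf{A} = \begin{pmatrix} \mathbf{A}_{\mathrm{SCC}} & \mathbf{0} & \mathbf{A}_{\mathrm{SCC}\to\mathrm{OUT}} \\ \mathbf{A}_{\mathrm{IN}\to\mathrm{SCC}} & \mathbf{A}_{\mathrm{IN}} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{A}_{\mathrm{OUT}} \end{pmatrix} \tag{2}$$

the three blocks $\mathbf{A}_{\mathrm{SCC}}$, $\mathbf{A}_{\mathrm{IN}}$, and $\mathbf{A}_{\mathrm{OUT}}$ representing the SCC, IN-, and OUT-component, respectively. The off-diagonal matrices $\mathbf{A}_{\mathrm{IN}\to\mathrm{SCC}}$ and $\mathbf{A}_{\mathrm{SCC}\to\mathrm{OUT}}$, instead, represent the (bipartite) networks through which they interact.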
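The reachability-based definitions of SCC, IN, and OUT translate directly into a breadth-first search. The sketch below is only illustrative: it assumes that the bow-tie is built around the largest strongly connected component, it lumps every remaining node into a set called `OTHER`, and its cost grows quadratically with the number of nodes, so it is meant for small examples. The function names are hypothetical.

```python
from collections import deque


def reachable(start, neighbours):
    """Set of nodes reachable from `start` (itself included) via BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie(n, edges):
    """Split nodes 0..n-1 into SCC, IN, OUT and OTHER (assumes n >= 1)."""
    succ = {i: set() for i in range(n)}
    pred = {i: set() for i in range(n)}
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    best = None
    for s in range(n):
        fwd = reachable(s, succ)   # nodes reachable from s
        bwd = reachable(s, pred)   # nodes from which s is reachable
        scc = fwd & bwd            # strongly connected component of s
        if best is None or len(scc) > len(best[0]):
            best = (scc, fwd, bwd)

    scc, fwd, bwd = best
    return {
        "SCC": scc,
        "IN": bwd - scc,           # reach the SCC, not reachable from it
        "OUT": fwd - scc,          # reachable from the SCC, do not reach it
        "OTHER": set(range(n)) - fwd - bwd,
    }


# Toy network: 0 -> {1 <-> 2} -> 3, with node 4 isolated.
parts = bow_tie(5, [(0, 1), (1, 2), (2, 1), (2, 3)])
```

For larger networks, a linear-time algorithm for strongly connected components (e.g., Tarjan's) would be the natural replacement for the inner loop.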

##### 3.2. Null Models

Let us now provide a brief description of the set of models that will be implemented to analyse the two kinds of mesoscale structures described above (for a detailed description see Appendix A). Let us also clarify that we will proceed by comparing the empirical network structures with models that constrain an increasing amount of information: in other words, we will compare our observations with increasingly refined benchmarks, a way of proceeding that justifies our choice of naming the latter *null models*.

The first class of null models we consider for the present analysis is the one including the so-called *degree-informed null models*. All null models in this class are defined by constraints encoding node-specific local information (i.e., the directed degree sequences), besides the membership of nodes in specified groups (labeled by the symbols $g_i$). Upon combining these two kinds of information, one obtains, in the most general case, block-specific directed degree sequences, definable as

$$k_i^{\mathrm{out}}(g_i \to s) = \sum_{j \in s,\, j \neq i} a_{ij} \tag{3}$$

$$k_i^{\mathrm{in}}(s \to g_i) = \sum_{j \in s,\, j \neq i} a_{ji} \tag{4}$$

with $k_i^{\mathrm{out}}(g_i \to s)$ indicating the contribution to the out-degree of node $i$ (belonging to block $g_i$) coming from block $s$ (and analogously for $k_i^{\mathrm{in}}(s \to g_i)$). Remarkably, all null models in this class induce a probability for the generic network configuration $\mathbf{A}$ reading

$$P(\mathbf{A}) = \prod_{i} \prod_{j \neq i} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}} \tag{5}$$

with $p_{ij}$ being (in the most general case)

$$p_{ij} = \frac{x_i^{(g_j)}\, y_j^{(g_i)}}{1 + x_i^{(g_j)}\, y_j^{(g_i)}} \tag{6}$$

an expression making the dependence of the nodes' degree(s) on the group membership apparent. Notice that all degree-informed null models considered here can be recovered from (6) upon suitably relaxing the aforementioned dependencies. As an example, the directed version of the Stochastic Block Model (SBM) can be recovered by posing $x_i^{(g_j)} y_j^{(g_i)} \equiv \chi_{g_i g_j}$ in (6); the traditional Directed Configuration Model (DCM), on the other hand, is obtainable by posing $x_i^{(g_j)} y_j^{(g_i)} \equiv x_i y_j$ in the same equation. Upon eliminating the parameters' dependence on nodes, $x_i y_j \equiv \chi$ and the Directed Random Graph Model (DRG) is finally obtained.

Interestingly, the *directed degree-corrected SBM* (ddc-SBM) can be recovered by decoupling the parameters' dependence on node-specific quantities from their group membership, i.e., by posing $x_i^{(g_j)} y_j^{(g_i)} \equiv x_i y_j \chi_{g_i g_j}$.
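To illustrate how a degree-informed null model is fitted in practice, the sketch below treats the simplest degree-constrained case, the DCM, assuming the standard maximum-entropy form $p_{ij} = x_i y_j / (1 + x_i y_j)$. The parameters are found by requiring that the expected out- and in-degrees match the observed ones, here through a plain fixed-point iteration. The function names, the starting guess, and the number of iterations are illustrative choices; the iteration is not guaranteed to converge for every degree sequence, and the code assumes a realizable sequence with at least one link.

```python
def dcm_probabilities(x, y):
    """Matrix of link probabilities p_ij = x_i y_j / (1 + x_i y_j), i != j."""
    n = len(x)
    return [
        [x[i] * y[j] / (1 + x[i] * y[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def solve_dcm(k_out, k_in, n_iter=200):
    """Fixed-point iteration for the DCM parameters x (out) and y (in)."""
    n = len(k_out)
    scale = sum(k_out) ** 0.5          # illustrative starting guess
    x = [k / scale for k in k_out]
    y = [k / scale for k in k_in]
    for _ in range(n_iter):
        new_x, new_y = [], []
        for i in range(n):
            # Rearranged constraints:
            #   k_out_i = sum_j x_i y_j / (1 + x_i y_j)
            #   k_in_i  = sum_j x_j y_i / (1 + x_j y_i)
            denom_x = sum(y[j] / (1 + x[i] * y[j]) for j in range(n) if j != i)
            denom_y = sum(x[j] / (1 + x[j] * y[i]) for j in range(n) if j != i)
            new_x.append(k_out[i] / denom_x if k_out[i] else 0.0)
            new_y.append(k_in[i] / denom_y if k_in[i] else 0.0)
        x, y = new_x, new_y            # simultaneous update of all parameters
    return x, y


# Toy example: a directed 3-cycle, where every node has out- and in-degree 1.
x, y = solve_dcm([1, 1, 1], [1, 1, 1])
p = dcm_probabilities(x, y)
expected_out = [sum(row) for row in p]
expected_in = [sum(p[i][j] for i in range(3)) for j in range(3)]
```

The block-wise models follow the same logic, with one pair of parameters per node and per block instead of one pair per node.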

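When analysing directed networks, however, a nontrivial piece of information to be taken into account is represented by reciprocity [20]. For this reason, a second class of null models, i.e., the one including the so-called *reciprocity-informed null models*, is considered as well. Null models in this class are defined by constraints encoding the (non)reciprocal degree sequences, besides the usual node membership. In the most general case, the constraints defining such models can be written as

$$k_i^{\rightarrow}(g_i \to s) = \sum_{j \in s,\, j \neq i} a_{ij}^{\rightarrow} \tag{7}$$

$$k_i^{\leftarrow}(s \to g_i) = \sum_{j \in s,\, j \neq i} a_{ij}^{\leftarrow} \tag{8}$$

$$k_i^{\leftrightarrow}(g_i \leftrightarrow s) = \sum_{j \in s,\, j \neq i} a_{ij}^{\leftrightarrow} \tag{9}$$

with $a_{ij}^{\rightarrow} = a_{ij}(1 - a_{ji})$, $a_{ij}^{\leftarrow} = a_{ji}(1 - a_{ij})$, and $a_{ij}^{\leftrightarrow} = a_{ij} a_{ji}$ [20] and $k_i^{\leftrightarrow}(g_i \leftrightarrow s)$ indicating the contribution to the reciprocal degree of node $i$ (belonging to block $g_i$) coming from block $s$. All models in this second class induce a probability for the network $\mathbf{A}$ reading

$$P(\mathbf{A}) = \prod_{i} \prod_{j < i} \left(p_{ij}^{\rightarrow}\right)^{a_{ij}^{\rightarrow}} \left(p_{ij}^{\leftarrow}\right)^{a_{ij}^{\leftarrow}} \left(p_{ij}^{\leftrightarrow}\right)^{a_{ij}^{\leftrightarrow}} \left(p_{ij}^{\nleftrightarrow}\right)^{a_{ij}^{\nleftrightarrow}} \tag{10}$$

where $a_{ij}^{\nleftrightarrow} = (1 - a_{ij})(1 - a_{ji})$. As before, different null models induce different functional forms for the probability coefficients $p_{ij}^{\rightarrow}$, $p_{ij}^{\leftarrow}$, $p_{ij}^{\leftrightarrow}$, $p_{ij}^{\nleftrightarrow}$: more explicitly, while the Reciprocal Configuration Model (RCM) is defined by the set of equations

$$p_{ij}^{\rightarrow} = \frac{x_i y_j}{D_{ij}}, \quad p_{ij}^{\leftarrow} = \frac{x_j y_i}{D_{ij}}, \quad p_{ij}^{\leftrightarrow} = \frac{z_i z_j}{D_{ij}}, \quad p_{ij}^{\nleftrightarrow} = \frac{1}{D_{ij}}, \quad D_{ij} = 1 + x_i y_j + x_j y_i + z_i z_j \tag{11}$$

its block-wise counterpart, i.e., the Block Reciprocal Configuration Model (BRCM), is defined by the block-specific version of the coefficients above (see Appendix A for more details).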
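The (non)reciprocal degree sequences constrained by this second class of models can be measured directly on a binary adjacency matrix. The sketch below uses the usual definitions from the reciprocity literature [20]: a link $i \to j$ is non-reciprocated if $j \to i$ is absent and reciprocated otherwise. The function name and the toy matrix are illustrative, and the block-specific versions would simply restrict the sums to the nodes of one block.

```python
def reciprocal_degrees(adj):
    """Non-reciprocated out-, non-reciprocated in- and reciprocated degrees."""
    n = len(adj)
    k_out_only, k_in_only, k_recip = [], [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # i -> j present, j -> i absent
        k_out_only.append(sum(adj[i][j] * (1 - adj[j][i]) for j in others))
        # j -> i present, i -> j absent
        k_in_only.append(sum(adj[j][i] * (1 - adj[i][j]) for j in others))
        # both i -> j and j -> i present
        k_recip.append(sum(adj[i][j] * adj[j][i] for j in others))
    return k_out_only, k_in_only, k_recip


# Toy network: 0 <-> 1 is reciprocated, 0 -> 2 is not.
toy_adj = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
k_out_only, k_in_only, k_recip = reciprocal_degrees(toy_adj)
```

By construction, the total out-degree of each node equals the sum of its non-reciprocated out-degree and its reciprocated degree.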

Models in both classes are *parametric*: a recipe is, then, needed to estimate the parameters appearing in their definition. To this aim, the likelihood-maximization principle can be invoked, the likelihood function associated with the observed network $\mathbf{A}^*$ reading

$$\mathcal{L}(\vec{\theta}) = \ln P(\mathbf{A}^* \mid \vec{\theta}) \tag{12}$$

Notably, the fact that each null model we consider in this paper treats different node pairs as independent allows us to write the likelihood for block models in a block-wise form, i.e., as

$$\mathcal{L} = \sum_{b} \mathcal{L}_b \tag{13}$$

with $b$ indexing the different modules (e.g., the SCC, the IN- and OUT-components, and the bipartite blocks connecting them, in the case of bow-tie structures).
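The block-wise decomposition of the likelihood can be checked numerically. The sketch below covers the degree-informed class only, where each directed pair is an independent Bernoulli variable, and assumes every probability lies strictly between 0 and 1. The function names, the group labels, and the toy probabilities are illustrative.

```python
import math
from collections import defaultdict


def pair_log_likelihood(a, p):
    """Log-probability of observing a (0 or 1) under link probability p."""
    return math.log(p) if a else math.log(1.0 - p)


def block_log_likelihoods(adj, prob, group):
    """Log-likelihood contribution of each (source block, target block)."""
    blocks = defaultdict(float)
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i != j:
                key = (group[i], group[j])
                blocks[key] += pair_log_likelihood(adj[i][j], prob[i][j])
    return dict(blocks)


toy_adj = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
toy_prob = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
group = ["core", "core", "periphery"]

blocks = block_log_likelihoods(toy_adj, toy_prob, group)
total = sum(blocks.values())   # equals the log-likelihood of the whole network
```

Because the sum runs over disjoint sets of node pairs, each block can be fitted separately and the results added up afterwards.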

##### 3.3. Model Selection Criteria

Although raising the number of parameters to better reproduce empirical patterns is tempting, the risk of overfitting should, nevertheless, be avoided. A criterion to identify the best model out of a basket of possible ones is, thus, needed. In what follows, we will adopt the Akaike Information Criterion (AIC hereafter)

$$\mathrm{AIC} = -2\mathcal{L} + 2k \tag{14}$$

and the Bayesian Information Criterion (BIC hereafter)

$$\mathrm{BIC} = -2\mathcal{L} + k \ln n \tag{15}$$

whose first addendum is, in both cases, proportional to the maximized log-likelihood $\mathcal{L}$ of the null model under analysis, $k$ is the number of parameters defining the model, and $n$ is the sample size (set, as usual, at $n = N(N-1)$, with $N$ the number of nodes). Both AIC and BIC are minimum for the best explanatory model in the basket [21].
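As a worked example of the two criteria, the sketch below evaluates them for the simplest model in the basket, the Directed Random Graph, whose single parameter is estimated as the observed link density. It assumes the standard textbook definitions of AIC and BIC and takes the number of ordered node pairs, $N(N-1)$, as the sample size; function names and toy numbers are illustrative.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion: -2 ln L + 2 k."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, sample_size):
    """Bayesian Information Criterion: -2 ln L + k ln n."""
    return -2.0 * log_lik + n_params * math.log(sample_size)


def drg_log_likelihood(n_nodes, n_links):
    """Maximized log-likelihood of the DRG (requires 0 < links < pairs)."""
    n_pairs = n_nodes * (n_nodes - 1)
    p = n_links / n_pairs              # maximum-likelihood link density
    return n_links * math.log(p) + (n_pairs - n_links) * math.log(1.0 - p)


# Toy network: 4 nodes and 6 directed links, i.e., density 1/2.
n_nodes, n_links = 4, 6
log_lik = drg_log_likelihood(n_nodes, n_links)
sample_size = n_nodes * (n_nodes - 1)
drg_aic = aic(log_lik, n_params=1)
drg_bic = bic(log_lik, n_params=1, sample_size=sample_size)
```

A richer model lowers the first term but pays a larger penalty through the second one, which is what allows the two criteria to guard against overfitting.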

In order to make (14) and (15) more explicit, let us call $B$ the number of blocks our network can be divided into (i.e., the *diagonal* blocks of the matrix $\mathbf{A}$) and $N$ the number of nodes. While the Directed Random Graph (DRG) is defined by just one parameter, $k_{\mathrm{DRG}} = 1$, the Stochastic Block Model (SBM) is defined by one parameter per block of the adjacency matrix, i.e., by $k_{\mathrm{SBM}} = B^2$ parameters in the most general case (as can be verified upon inspecting definitions (1) and (2)).

Specifying the degree sequences further raises the number of parameters: the Directed Configuration Model (DCM) is, in fact, defined by $k_{\mathrm{DCM}} = 2N$ parameters, the directed degree-corrected Stochastic Block Model (ddc-SBM) is defined by $k_{\mathrm{ddc\text{-}SBM}} = 2N + B^2$ parameters, and the Block Configuration Model (BCM) is defined by $k_{\mathrm{BCM}} = 2NB$ parameters (each node, in fact, “needs” two parameters per block).

Accounting also for the information provided by the reciprocity requires a number of parameters to be specified that is $k_{\mathrm{RCM}} = 3N$ for the Reciprocal Configuration Model (RCM) and $k_{\mathrm{BRCM}} = 3NB$ for the Block Reciprocal Configuration Model (BRCM; each node, in fact, “needs” three parameters per block).

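The model selection framework based upon the two information criteria above also allows the probability that a given model $m$ is the best approximating model to be calculated, via the so-called *AIC weights* and *BIC weights*, defined as

$$w_m^{\mathrm{AIC}} = \frac{e^{-\Delta_m^{\mathrm{AIC}}/2}}{\sum_{r} e^{-\Delta_r^{\mathrm{AIC}}/2}}, \qquad w_m^{\mathrm{BIC}} = \frac{e^{-\Delta_m^{\mathrm{BIC}}/2}}{\sum_{r} e^{-\Delta_r^{\mathrm{BIC}}/2}}$$

with $\Delta_m^{\mathrm{AIC}} = \mathrm{AIC}_m - \min_r \{\mathrm{AIC}_r\}$ and $\Delta_m^{\mathrm{BIC}} = \mathrm{BIC}_m - \min_r \{\mathrm{BIC}_r\}$, respectively, the index $r$ running over the models in the basket.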
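The weights can be computed with a few lines of code. The sketch below assumes the usual definition, in which each score is first shifted by the minimum score in the basket and then passed through $e^{-\Delta/2}$; the function name and the numerical scores are made up for illustration and are not results from the paper.

```python
import math


def ic_weights(scores):
    """Turn AIC (or BIC) scores into weights that sum to one.

    scores : dict mapping a model name to its AIC or BIC value
    """
    best = min(scores.values())
    # Shifting by the minimum keeps the exponentials numerically stable.
    raw = {m: math.exp(-(s - best) / 2.0) for m, s in scores.items()}
    norm = sum(raw.values())
    return {m: r / norm for m, r in raw.items()}


# Hypothetical scores for three competing models.
toy_scores = {"DRG": 120.0, "DCM": 100.0, "BCM": 102.0}
weights = ic_weights(toy_scores)
best_model = max(weights, key=weights.get)
```

Only differences between scores matter: adding the same constant to every entry of `toy_scores` leaves the weights unchanged.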

#### 4. Results

*The World Trade Web*. Although the WTW has been deeply studied throughout the years [22–25], the analysis of its mesoscale organization has received far less attention [16, 26]. Interestingly, when checking for the applicability of the bow-tie definition provided above, the WTW appears to be partitioned into an SCC and an IN-component only, the OUT-component being completely missing (see Figure 1). According to the algebraic representation introduced in Section 3.1, the WTW mesoscale structure is represented by the following adjacency matrix

$$\mathbf{A}_{\mathrm{WTW}} = \begin{pmatrix} \mathbf{A}_{\mathrm{SCC}} & \mathbf{0} \\ \mathbf{A}_{\mathrm{IN}\to\mathrm{SCC}} & \mathbf{A}_{\mathrm{IN}} \end{pmatrix}$$

with $\mathbf{A}_{\mathrm{IN}} = \mathbf{0}$ throughout our temporal interval. This implies that the nodes belonging to the IN-component do not establish internal relationships, their links pointing towards the SCC nodes only (via the $\mathbf{A}_{\mathrm{IN}\to\mathrm{SCC}}$ block). Interestingly, the percentage of nodes belonging to the SCC steadily increases with time: from 32% in 1992 to almost 75% in 2002. Since the total number of nodes does not vary across the considered temporal interval, the IN-component shrinks accordingly. These results refine the picture drawn in [16], where only the largest connected component was considered.