Complexity
Volume 2019, Article ID 5120581, 13 pages
https://doi.org/10.1155/2019/5120581
Research Article

Reconstructing Mesoscale Network Structures

1IMT School for Advanced Studies, Piazza S. Francesco 19, 55100 Lucca, Italy
2University College London, The Bartlett Centre for Advanced Spatial Analysis, Gower Street, WC1E 6BT London, UK

Correspondence should be addressed to Tiziano Squartini; tiziano.squartini@imtlucca.it

Received 19 February 2018; Revised 5 December 2018; Accepted 19 December 2018; Published 10 January 2019

Academic Editor: Lucas Lacasa

Copyright © 2019 Jeroen van Lidth de Jeude et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

When facing the problem of reconstructing complex mesoscale network structures, it is generally believed that models encoding the nodes’ organization into modules must be employed. The present paper focuses on two block structures that characterize the empirical mesoscale organization of many real-world networks, i.e., the bow-tie and the core-periphery ones, with the aim of quantifying the minimal amount of topological information that needs to be enforced in order to reproduce their topological details. Our analysis shows that constraining the network degree sequences is often enough to reproduce such structures, as confirmed by model selection criteria such as AIC or BIC. As a byproduct, our paper enriches the toolbox for the analysis of bipartite networks, which is still far from complete: both the bow-tie and the core-periphery structures, in fact, partition networks into asymmetric blocks characterized by binary, directed connections, thus calling for the extension of a recently proposed method to randomize undirected, bipartite networks to the directed case.

1. Introduction

The analysis of mesoscale network structures is a topic of great interest within the community of network scientists: most of the attention, however, has been devoted to community detection [1–3], while the analysis of other mesostructures has remained far less explored.

The present work aims at contributing to this stream of research by exploring the effectiveness of models that constrain only local information in reproducing complex mesostructures such as the bow-tie and the core-periphery ones. When approaching such a problem, it is commonly believed that models encoding the nodes’ organization into modules must be employed: here we test this hypothesis by comparing models that enforce topological information such as the total number of links, the degree sequences, and the reciprocity structure with their block-wise counterparts.

To this aim, we have considered real-world networks whose topology is empirically characterized by bow-tie and core-periphery structures. Both consist of a central, cohesive subgraph surrounded by a loosely connected set of nodes [4]; in the bow-tie case, however, the central part of the network also has a fan-in and a fan-out component, respectively entering into and exiting from it.

Remarkably, all models considered in the present paper can be recovered within the same framework, i.e., the entropy-maximization one, which has proven rather effective for approaching both pattern-detection and network-reconstruction problems [5, 6]. Such a framework allows a likelihood function to be defined for each model, so that selection criteria like AIC or BIC can be applied to unambiguously determine the “winner” among competing models, i.e., the one carrying the right amount of information to account for the inspected structures.

As a byproduct, our paper enriches the toolbox for the analysis of bipartite networks. Among the many available network representations, the bipartite one has recently received much attention [7, 8]. This, in turn, has led to the definition of algorithms for randomizing [9–12], reconstructing [13], or projecting [14, 15] undirected, bipartite networks. Their directed counterpart, however, has not been explored yet, thus calling for the definition of techniques to study this kind of network as well.

This is especially true when considering that bipartite networks emerge quite naturally when studying the aforementioned mesoscale structures. It is, in fact, evident that analysing the way nodes cluster together unavoidably leads to analysing the way such modules interact. From an algebraic point of view, this boils down to considering matrices characterized by square diagonal blocks (i.e., the adjacency matrices of the modules themselves) and rectangular off-diagonal blocks (i.e., the adjacency matrices of the bipartite networks encoding their interactions).

Our method will be employed to analyse economic and financial networks empirically characterized by either bow-tie or core-periphery structures: more specifically, we will focus on two systems, the World Trade Web and the Dutch Interbank Network. As we will show, while the former can be described by a partial bow-tie structure, the latter is characterized by the coexistence of a core-periphery-like structure and a proper bow-tie one, the bow-tie structure carrying more information about the system’s evolution than the core-periphery one.

2. Data

Let us now describe the two systems we have considered for the present analysis.

The World Trade Web. We consider yearly bilateral data on exports and imports from the UN Comtrade Database [17], from 1992 to 2002. We limit ourselves to the World Trade Web (WTW hereafter) in its binary, directed representation at the aggregate level. In order to perform a temporal analysis and compare different years, we restrict ourselves to a balanced panel of countries (i.e., countries present in the data throughout the considered interval). Accordingly, for a given year $t$, $a_{ij}^{(t)}=1$ ($a_{ij}^{(t)}=0$) means that country $i$ has registered a nonnull (null) export towards country $j$.

The Dutch Interbank Network. We consider a dataset where nodes are Dutch banks and a link from node $i$ to node $j$ indicates that bank $i$ has an exposure larger than 1.5 million euros, with maturity shorter than one year, towards the creditor bank $j$ [18]. We consider 44 quarterly snapshots of the Dutch Interbank Network (DIN hereafter), from 1998Q1 to 2008Q4. The last year in the sample is the year during which the recent financial crisis became manifest.

3. Methods

3.1. The General Framework

Let us, first, provide an algebraic representation of the mesoscale structures considered in the present paper, i.e., the bow-tie and the core-periphery ones.

Networks whose topology is empirically characterized by a core-periphery structure can be represented as follows:

$$\mathbf{A}=\begin{pmatrix}\mathbf{C} & \mathbf{B}\\ \tilde{\mathbf{B}} & \mathbf{P}\end{pmatrix}, \tag{1}$$

i.e., the adjacency matrix is composed of four distinct blocks: while the square adjacency matrices $\mathbf{C}$ and $\mathbf{P}$ lying along the diagonal represent the core and the periphery modules, the two (in the most general case, rectangular) off-diagonal matrices $\mathbf{B}$ and $\tilde{\mathbf{B}}$ represent the (bipartite) networks through which they interact. Usually, the link densities of the blocks above satisfy the chain of relationships $\rho_{\mathbf{C}}>\rho_{\mathbf{B}},\rho_{\tilde{\mathbf{B}}}>\rho_{\mathbf{P}}$; i.e., the core module is (much) denser than the periphery module.

Notice that the two matrices $\mathbf{B}$ and $\tilde{\mathbf{B}}$ bring genuinely different information: while the generic entry $b_{i\alpha}=1$ ($b_{i\alpha}=0$) indicates that a directed link from the node $i$ in the core to the node $\alpha$ in the periphery is present (absent), the generic entry $\tilde{b}_{\alpha i}=1$ ($\tilde{b}_{\alpha i}=0$) indicates that a directed link from the periphery node $\alpha$ to the core node $i$ is present (absent). In other words, in order to fully describe the topological structure of one directed bipartite network, two matrices are needed. Naturally, in case the network is undirected, $\mathbf{C}=\mathbf{C}^T$, $\mathbf{P}=\mathbf{P}^T$, and $\tilde{\mathbf{B}}=\mathbf{B}^T$, which restores the symmetry of the whole adjacency matrix (i.e., $\mathbf{A}=\mathbf{A}^T$).
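The chain of density relationships above can be checked directly on data. The following is a minimal sketch, assuming NumPy, a zero-diagonal adjacency matrix, and a core/periphery partition given as index lists; the helper `block_density` and the toy matrix are hypothetical illustrations, not the authors’ code.

```python
import numpy as np

def block_density(A, rows, cols):
    """Fraction of realized links in the block A[rows, cols].

    Assumes A has a zero diagonal and that `rows` and `cols` are either
    identical (diagonal block) or disjoint (off-diagonal block).
    """
    A = np.asarray(A)
    n_links = A[np.ix_(rows, cols)].sum()
    n_pairs = len(rows) * len(cols)
    if set(rows) == set(cols):      # diagonal block: self-loops are not possible links
        n_pairs -= len(rows)
    return n_links / n_pairs

# Hypothetical toy network: a fully connected core {0, 1, 2} and a periphery {3, 4}
A = np.zeros((5, 5), dtype=int)
core, periphery = [0, 1, 2], [3, 4]
for i in core:
    for j in core:
        if i != j:
            A[i, j] = 1
A[0, 3] = 1   # one core -> periphery link (block B)
A[4, 1] = 1   # one periphery -> core link (block B~)

rho_C = block_density(A, core, core)            # 1.0
rho_B = block_density(A, core, periphery)       # 1/6
rho_Bt = block_density(A, periphery, core)      # 1/6
rho_P = block_density(A, periphery, periphery)  # 0.0
```

Excluding the self-pairs only on diagonal blocks keeps the densities comparable with the off-diagonal ones, whose pairs are all admissible.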

While the definition of core-periphery structure is quite intuitive, the definition of bow-tie structure is based on the concept of node reachability: node $j$ is reachable from node $i$ if a path exists from node $i$ to node $j$ (a path being defined as a sequence of adjacent links connecting $i$ with $j$). According to this definition, each node is assigned to one of the sets described in [19]. The definitions of the three most relevant ones follow:
(i) SCC: each node in the Strongly Connected Component (SCC) is reachable from any other node belonging to the SCC;
(ii) IN: each node in the SCC is reachable from any node belonging to the IN-component;
(iii) OUT: each node in the OUT-component is reachable from any node belonging to the SCC.

According to the definitions above, networks whose topology is empirically characterized by a bow-tie structure can be represented by the following adjacency matrix (ordering nodes as IN, SCC, OUT and neglecting the nodes belonging to the remaining sets):

$$\mathbf{A}=\begin{pmatrix}\mathbf{I} & \mathbf{B}_{IS} & \mathbf{0}\\ \mathbf{0} & \mathbf{S} & \mathbf{B}_{SO}\\ \mathbf{0} & \mathbf{0} & \mathbf{O}\end{pmatrix}, \tag{2}$$

the three diagonal blocks $\mathbf{S}$, $\mathbf{I}$, and $\mathbf{O}$ representing the SCC, the IN-, and the OUT-component, respectively. The off-diagonal matrices $\mathbf{B}_{IS}$ and $\mathbf{B}_{SO}$, instead, represent the (bipartite) networks through which they interact.
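The reachability-based definitions above translate into a simple, if quadratic, procedure. Below is a minimal sketch using only the Python standard library; it assumes the network is given as an adjacency list and takes the largest strongly connected component as the SCC. The function names are hypothetical, and a large-scale analysis would rather rely on a linear-time SCC algorithm (e.g., Tarjan’s).

```python
from collections import deque

def reachable(adj, source):
    """Set of nodes reachable from `source` (itself included) via directed paths."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(adj):
    """Largest SCC, IN- and OUT-component; nodes in the remaining sets are ignored."""
    n = len(adj)
    reach = [reachable(adj, i) for i in range(n)]
    comps, assigned = [], set()
    for i in range(n):
        if i not in assigned:
            # the SCC of i: nodes reachable from i that can also reach i
            comp = {j for j in reach[i] if i in reach[j]}
            assigned |= comp
            comps.append(comp)
    scc = max(comps, key=len)
    anchor = next(iter(scc))
    out = reach[anchor] - scc                          # reachable from the SCC
    inn = {i for i in range(n) if i not in scc and anchor in reach[i]}  # reach the SCC
    return scc, inn, out

# Hypothetical example: 0 -> 1 <-> 2 <-> 3 -> 4
adj = {0: [1], 1: [2], 2: [1, 3], 3: [2, 4], 4: []}
scc, inn, out = bow_tie(adj)   # scc = {1, 2, 3}, inn = {0}, out = {4}
```

Ordering the nodes as IN, SCC, OUT then makes the adjacency matrix block upper-triangular, as in the representation above.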

3.2. Null Models

Let us now provide a brief description of the set of models that will be implemented to analyse the two kinds of mesoscale structures described above (for a detailed description, see Appendix A). Let us also clarify that we will compare the empirical network structures with models constraining an increasing amount of information: in other words, we will compare our observations with increasingly refined benchmarks, which justifies our choice of calling the latter null models.

The first class of null models we consider for the present analysis includes the so-called degree-informed null models. All null models in this class are defined by constraints encoding node-specific local information (i.e., the directed degree sequences), besides the membership of nodes to specified groups (labeled by the symbols $g,g'=1\dots B$, with $g_i$ denoting the group of node $i$). Upon combining these two kinds of information, one obtains, in the most general case, block-specific directed degree sequences, definable as

$$k_i^{out,g'}=\sum_{j\in g',\,j\neq i}a_{ij},\qquad k_i^{in,g'}=\sum_{j\in g',\,j\neq i}a_{ji},$$

with $k_i^{out,g'}$ indicating the contribution to the out-degree of node $i$ (belonging to block $g_i$) coming from block $g'$ (and analogously for $k_i^{in,g'}$). Remarkably, all null models in this class induce a probability for the generic network configuration $\mathbf{A}$ reading

$$P(\mathbf{A})=\prod_{i\neq j}p_{ij}^{a_{ij}}(1-p_{ij})^{1-a_{ij}},$$

with $p_{ij}$ being (in the most general case)

$$p_{ij}=\frac{x_i^{(g_j)}y_j^{(g_i)}}{1+x_i^{(g_j)}y_j^{(g_i)}}, \tag{6}$$

an expression making the dependence of the nodes’ degrees on the group membership apparent. Notice that all degree-informed null models considered here can be recovered from (6) upon suitably relaxing the aforementioned dependencies. As an example, the directed version of the Stochastic Block Model (SBM) can be recovered by posing $x_i^{(g_j)}y_j^{(g_i)}\equiv\chi_{g_ig_j}$ in (6); the traditional Directed Configuration Model (DCM), on the other hand, is obtainable by posing $x_i^{(g_j)}\equiv x_i$ and $y_j^{(g_i)}\equiv y_j$ in the same equation. Upon also eliminating the parameters’ dependence on nodes, i.e., $x_i\equiv x$ and $y_j\equiv y$, the Directed Random Graph Model (DRG) is finally obtained.

Interestingly, the directed degree-corrected SBM (ddc-SBM) can be recovered by decoupling the parameters’ dependence on node-specific quantities from their group membership, i.e., by posing $x_i^{(g_j)}y_j^{(g_i)}\equiv x_iy_j\chi_{g_ig_j}$.
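To make the estimation step concrete, here is a minimal sketch of how the DCM, in which $p_{ij}=x_iy_j/(1+x_iy_j)$, might be fitted by iterating its likelihood equations $k_i^{out}=\sum_{j\neq i}p_{ij}$, $k_i^{in}=\sum_{j\neq i}p_{ji}$ as a fixed-point map. It assumes NumPy; the iteration scheme, the initial guess, and the name `fit_dcm` are our illustrative choices, not the authors’ implementation.

```python
import numpy as np

def fit_dcm(A, n_iter=10_000, tol=1e-12):
    """Fixed-point solution of the DCM likelihood equations; returns the p_ij matrix."""
    A = np.array(A, dtype=float)
    np.fill_diagonal(A, 0.0)
    k_out, k_in = A.sum(axis=1), A.sum(axis=0)
    off = 1.0 - np.eye(len(A))                  # mask enforcing j != i
    x = k_out / np.sqrt(A.sum())                # rough initial guess
    y = k_in / np.sqrt(A.sum())
    for _ in range(n_iter):
        xy = np.outer(x, y)                     # xy[i, j] = x_i * y_j
        x_new = k_out / (off * y[None, :] / (1.0 + xy)).sum(axis=1)
        y_new = k_in / (off * x[:, None] / (1.0 + xy)).sum(axis=0)
        step = max(np.abs(x_new - x).max(), np.abs(y_new - y).max())
        x, y = x_new, y_new
        if step < tol:
            break
    xy = np.outer(x, y)
    return off * xy / (1.0 + xy)                # p_ij with zero diagonal

# Hypothetical directed network with 6 nodes and 10 links
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3), (1, 4), (3, 1)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = 1.0
P = fit_dcm(A)   # row (column) sums of P reproduce the observed out- (in-) degrees
```

Nodes with zero in- or out-degree get $y_i=0$ or $x_i=0$ and are therefore reproduced exactly, which is the mechanism invoked in the Results to explain the IN- and OUT-components.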

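When analysing directed networks, however, a nontrivial piece of information to be taken into account is reciprocity [20]. For this reason, a second class of null models, i.e., the so-called reciprocity-informed null models, is considered as well. Null models in this class are defined by constraints encoding the (non)reciprocal degree sequences, besides the usual nodes’ membership. In the most general case, the constraints defining such models can be written as

$$k_i^{\rightarrow,g'}=\sum_{j\in g',\,j\neq i}a_{ij}^{\rightarrow},\qquad k_i^{\leftarrow,g'}=\sum_{j\in g',\,j\neq i}a_{ij}^{\leftarrow},\qquad k_i^{\leftrightarrow,g'}=\sum_{j\in g',\,j\neq i}a_{ij}^{\leftrightarrow},$$

with $a_{ij}^{\rightarrow}=a_{ij}(1-a_{ji})$, $a_{ij}^{\leftarrow}=a_{ji}(1-a_{ij})$, and $a_{ij}^{\leftrightarrow}=a_{ij}a_{ji}$ [20], and $k_i^{\leftrightarrow,g'}$ indicating the contribution to the reciprocal degree of node $i$ (belonging to block $g_i$) coming from block $g'$. All models in this second class induce a probability for the network reading

$$P(\mathbf{A})=\prod_{i<j}(p_{ij}^{\rightarrow})^{a_{ij}^{\rightarrow}}(p_{ij}^{\leftarrow})^{a_{ij}^{\leftarrow}}(p_{ij}^{\leftrightarrow})^{a_{ij}^{\leftrightarrow}}(p_{ij}^{\nleftrightarrow})^{a_{ij}^{\nleftrightarrow}},$$

with $a_{ij}^{\nleftrightarrow}=(1-a_{ij})(1-a_{ji})$. As before, different null models induce different functional forms for the probability coefficients $p_{ij}^{\rightarrow}$, $p_{ij}^{\leftarrow}$, $p_{ij}^{\leftrightarrow}$, $p_{ij}^{\nleftrightarrow}$: more explicitly, while the Reciprocal Configuration Model (RCM) is defined by the set of equations

$$p_{ij}^{\rightarrow}=\frac{x_iy_j}{D_{ij}},\qquad p_{ij}^{\leftarrow}=\frac{x_jy_i}{D_{ij}},\qquad p_{ij}^{\leftrightarrow}=\frac{z_iz_j}{D_{ij}},\qquad p_{ij}^{\nleftrightarrow}=\frac{1}{D_{ij}},\qquad D_{ij}=1+x_iy_j+x_jy_i+z_iz_j,$$

its block-wise counterpart, i.e., the Block Reciprocal Configuration Model (BRCM), is defined by the block-specific version of the coefficients above (see Appendix A for more details).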
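The reciprocity-informed constraints rest on splitting each node’s links into nonreciprocated outgoing, nonreciprocated incoming, and reciprocated ones. A minimal sketch, assuming NumPy; the function name `reciprocal_degrees` and the optional `cols` argument (restricting the count to the members of one block $g'$) are hypothetical.

```python
import numpy as np

def reciprocal_degrees(A, cols=None):
    """Nonreciprocated out/in degrees and reciprocated degree of every node.

    If `cols` (indices of the members of block g') is given, only the
    contributions coming from those nodes are counted.
    """
    A = np.array(A, dtype=int)
    np.fill_diagonal(A, 0)
    a_out = A * (1 - A.T)      # a_ij^->  : i -> j only
    a_in = A.T * (1 - A)       # a_ij^<-  : j -> i only
    a_rec = A * A.T            # a_ij^<-> : both directions
    if cols is not None:
        a_out, a_in, a_rec = a_out[:, cols], a_in[:, cols], a_rec[:, cols]
    return a_out.sum(axis=1), a_in.sum(axis=1), a_rec.sum(axis=1)

# Hypothetical example: 0 <-> 1 and 0 -> 2
A = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
k_out_nr, k_in_nr, k_rec = reciprocal_degrees(A)        # [1,0,0], [0,0,1], [1,1,0]
k_rec_from_01 = reciprocal_degrees(A, cols=[0, 1])[2]   # block-specific: [1,1,0]
```

By construction, $k_i^{out}=k_i^{\rightarrow}+k_i^{\leftrightarrow}$ and $k_i^{in}=k_i^{\leftarrow}+k_i^{\leftrightarrow}$, so the RCM constrains strictly more information than the DCM.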

Models in both classes are parametric: a recipe is, then, needed to estimate the parameters appearing in their definition. To this aim, the likelihood-maximization principle can be invoked, the log-likelihood function associated with the observed network $\mathbf{A}^*$ reading $\mathcal{L}(\vec{\theta})=\ln P(\mathbf{A}^*|\vec{\theta})$. Notably, the fact that each null model we consider in this paper treats different pairs of nodes as independent allows us to write the likelihood of block models in a block-wise form, i.e., as $\mathcal{L}=\sum_{g,g'}\mathcal{L}_{gg'}$, with $g,g'$ indexing the different modules (e.g., $g,g'\in\{\mathrm{IN},\mathrm{SCC},\mathrm{OUT}\}$ in the case of bow-tie structures).
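Because pairs of nodes are independent, the log-likelihood is a sum of pair contributions, and grouping those contributions by block gives the block-wise form directly. A minimal sketch, assuming NumPy, randomly drawn (hypothetical) link probabilities, and a hypothetical three-block partition:

```python
import numpy as np

def loglik_terms(A, P):
    """Matrix of a_ij ln p_ij + (1 - a_ij) ln(1 - p_ij); self-loops contribute 0."""
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float)
    off = ~np.eye(len(A), dtype=bool)
    terms = np.zeros_like(P)
    terms[off] = A[off] * np.log(P[off]) + (1.0 - A[off]) * np.log1p(-P[off])
    return terms

rng = np.random.default_rng(0)
N = 6
P = rng.uniform(0.05, 0.95, size=(N, N))        # hypothetical independent-link probabilities
A = (rng.uniform(size=(N, N)) < P).astype(int)   # one sampled configuration
labels = np.array([0, 0, 1, 1, 2, 2])            # hypothetical partition, e.g. IN, SCC, OUT

terms = loglik_terms(A, P)
total = terms.sum()
blockwise = sum(terms[np.ix_(labels == g, labels == h)].sum()
                for g in range(3) for h in range(3))
# total and blockwise coincide up to rounding: L = sum over (g, g') of L_gg'
```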

3.3. Model Selection Criteria

Although raising the number of parameters to better reproduce empirical patterns is tempting, the risk of overfitting should be avoided. A criterion to identify the best model out of a basket of possible ones is, thus, needed. In what follows, we will adopt the Akaike Information Criterion (AIC hereafter)

$$\mathrm{AIC}=-2\mathcal{L}+2k \tag{14}$$

and the Bayesian Information Criterion (BIC hereafter)

$$\mathrm{BIC}=-2\mathcal{L}+k\ln n, \tag{15}$$

whose first addendum is, in both cases, proportional to the log-likelihood $\mathcal{L}$ of the null model under analysis, $k$ is the number of parameters defining the model, and $n$ is the sample size (set, as usual, at $n=N(N-1)$, the number of ordered pairs of distinct nodes). Both AIC and BIC attain their minimum for the best explanatory model in the basket [21].
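As a worked illustration of the two criteria, the sketch below evaluates them for the simplest model, the DRG, whose maximum-likelihood parameter is the link density. It uses only the standard library; the toy network size and the sample size $n=N(N-1)$ are assumptions of the example.

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: -2 ln L + 2 k."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian Information Criterion: -2 ln L + k ln n."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical directed network with N = 4 nodes and L = 3 links, fitted by the DRG
N, L = 4, 3
n = N * (N - 1)                   # sample size: ordered pairs of distinct nodes
p = L / n                         # maximum-likelihood DRG parameter (0.25)
ll_drg = L * math.log(p) + (n - L) * math.log(1.0 - p)
aic_drg, bic_drg = aic(ll_drg, 1), bic(ll_drg, 1, n)
```

Since $\ln n>2$ as soon as $n\geq 8$, the BIC penalizes each extra parameter more heavily than the AIC, which is why the two criteria may disagree on richer models.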

In order to make (14) and (15) more explicit, let us call $B$ the number of blocks our network can be divided into (i.e., the number of diagonal blocks of the matrix $\mathbf{A}$). While the Directed Random Graph (DRG) is defined by just one parameter, $k_{\mathrm{DRG}}=1$, the Stochastic Block Model (SBM) is defined by $k_{\mathrm{SBM}}=B^2$ parameters (as can be verified upon inspecting definitions (1) and (2)).

Specifying the degree sequences further raises the number of parameters: the Directed Configuration Model (DCM) is, in fact, defined by $k_{\mathrm{DCM}}=2N$ parameters, the directed degree-corrected Stochastic Block Model (ddc-SBM) by $k_{\mathrm{ddc\text{-}SBM}}=2N+B^2$, and the Block Configuration Model (BCM) by $k_{\mathrm{BCM}}=2NB$ (each node, in fact, “needs” two parameters per block).

Accounting also for the information provided by reciprocity requires $k_{\mathrm{RCM}}=3N$ parameters to be specified for the Reciprocal Configuration Model (RCM) and $k_{\mathrm{BRCM}}=3NB$ for the Block Reciprocal Configuration Model (BRCM; each node, in fact, “needs” three parameters per block).
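A minimal sketch tabulating the parameter counts of the models in our basket, for $N$ nodes and $B$ blocks. The BCM and BRCM counts follow from the “two/three parameters per node and block” remarks above; the other formulas, and in particular the ddc-SBM one, are assumptions of this sketch. The function name `n_params` is hypothetical.

```python
def n_params(model, N, B):
    """Number of free parameters k of each null model (N nodes, B blocks).

    The ddc-SBM entry (2N + B^2) is an assumption: node-specific multipliers
    plus one multiplier per ordered pair of blocks.
    """
    counts = {
        "DRG": 1,
        "SBM": B ** 2,              # one link probability per ordered pair of blocks
        "DCM": 2 * N,               # out- and in-degree multiplier per node
        "ddc-SBM": 2 * N + B ** 2,
        "BCM": 2 * N * B,           # two multipliers per node and block
        "RCM": 3 * N,               # three multipliers per node
        "BRCM": 3 * N * B,          # three multipliers per node and block
    }
    return counts[model]

# e.g. a bow-tie partition (B = 3) of a hypothetical network with N = 100 nodes
ranking = sorted(["DRG", "SBM", "DCM", "ddc-SBM", "BCM", "RCM", "BRCM"],
                 key=lambda m: n_params(m, 100, 3))
```

The resulting ordering shows why block-wise models must gain a lot of likelihood to beat their nonblock counterparts: the BCM and BRCM multiply the DCM and RCM counts by $B$.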

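The model selection framework based upon the two information criteria above also allows one to calculate the probability that a given model is the best approximating one, via the so-called AIC weights and BIC weights, defined as

$$w_m=\frac{e^{-\Delta_m/2}}{\sum_{m'}e^{-\Delta_{m'}/2}},$$

with $\Delta_m=\mathrm{AIC}_m-\min_{m'}\mathrm{AIC}_{m'}$ and $\Delta_m=\mathrm{BIC}_m-\min_{m'}\mathrm{BIC}_{m'}$, respectively.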
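The weights turn raw criterion values into a probability-like ranking over the basket. A minimal sketch using only the standard library; the scores below are made-up numbers for illustration, not values from the paper’s data.

```python
import math

def information_weights(scores):
    """AIC (or BIC) weights: w_m = exp(-Delta_m / 2) / sum_m' exp(-Delta_m' / 2)."""
    best = min(scores.values())
    rel = {m: math.exp(-(s - best) / 2.0) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}

# Hypothetical AIC values (not taken from the paper's data)
weights = information_weights({"DCM": 100.0, "RCM": 104.0, "SBM": 150.0})
# weights["DCM"] is about 0.88, weights["RCM"] about 0.12, weights["SBM"] is negligible
```

Note how a difference of only 4 in AIC already gives the better model roughly seven times the weight of its competitor; this is how two models with “very similar” AIC values can still yield a clear winner.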

4. Results

The World Trade Web. Although the WTW has been deeply studied throughout the years [22–25], the analysis of its mesoscale organization has received far less attention [16, 26]. Interestingly, when checking the applicability of the bow-tie definition provided above, the WTW appears to be partitioned into an SCC and an IN-component only, the OUT-component being completely missing (see Figure 1). According to the algebraic representation introduced in Section 3.1, the WTW mesoscale structure is represented by the following adjacency matrix

$$\mathbf{A}=\begin{pmatrix}\mathbf{I} & \mathbf{B}_{IS}\\ \mathbf{0} & \mathbf{S}\end{pmatrix},$$

where $\mathbf{I}$ and $\mathbf{S}$ are the IN-component and SCC blocks and $\mathbf{B}_{IS}$ collects the links from the IN-component to the SCC, with $\mathbf{I}=\mathbf{0}$ throughout our temporal interval. This implies that the nodes belonging to the IN-component do not establish internal relationships, their links pointing towards the SCC nodes only (via the $\mathbf{B}_{IS}$ block). Interestingly, the percentage of nodes belonging to the SCC steadily increases with time: from 32% in 1992 to almost 75% in 2002. Since the total number of nodes does not vary across the considered temporal interval, the IN-component shrinks accordingly. These results refine the picture drawn in [16], where only the largest connected component was considered.

Figure 1: Top panel: the WTW bow-tie structure, composed of the SCC and the IN-component only. The panels below show the countries belonging to the SCC (in colors) and the countries belonging to the IN-component (in gray) in 1993, 1998, and 2002, respectively. Countries belonging to the SCC keep increasing their reciprocated degree (see also Figure 2); the richest world countries (Canada, Europe, Japan, in dark red) are always characterized by the largest values of reciprocated degree.
Figure 2: Dynamics of the in-degree (defined as $k_i^{in}=\sum_{j\neq i}a_{ji}$) and of the reciprocated degree (defined as $k_i^{\leftrightarrow}=\sum_{j\neq i}a_{ij}a_{ji}$) of a sample of countries (Italy, in green; Japan, in black; China, in red; Russia, in blue; India, in brown; USA, in purple; Australia, in orange): while the in-degree remains rather stable across time, the value of the reciprocated degree keeps rising once the country has joined the SCC. Such dynamics can be interpreted as a signal of ongoing integration [16].

From a macroeconomic point of view, the increasing number of nodes within the SCC may signal an ongoing globalization process [16]. It is interesting to notice that the inclusion of (whole subsets of) countries within the SCC seems to be related to the existence of trade agreements. Examples are provided by the Commonwealth nations, all of which have been part of the SCC since 1993, by the European nations (the EU as a whole joined the SCC in 1994, the same year as the EEA agreement), and by the USA (NAFTA entered into force in 1994 as well). From a purely topological perspective, an interesting dynamics takes place: as shown in Figure 2, the reciprocated degree of nodes belonging to the SCC keeps rising. Since all nodes are characterized by a rather stable in-degree, this finding points out the tendency of such countries to reciprocate previously established connections by creating new outgoing links (i.e., to consolidate existing trade relationships). Moreover, such dynamics suggest that the large number of paths within the SCC may be due to its large reciprocity.

Let us now analyse what kind of topological information is actually needed in order to explain the mesoscale WTW structure. To this aim, let us sum up the observations about the empirical structure of the WTW by picturing a densely connected, highly reciprocated SCC, fed by an IN-component with no internal links, throughout our temporal interval.

The need for a block model becomes evident when comparing the homogeneous benchmark provided by the DRG with its block-wise counterpart, i.e., the SBM (see Figure 3). The SBM outperforms the DRG since the network is composed of modules characterized by very different link densities that cannot be reproduced by tuning just one, global parameter: in fact, the SCC is densely connected, while the IN-component has no internal links at all.

Figure 3: Evolution of the AIC and BIC values for the WTW across the years 1992-2002: while the SBM (blue trend) must be preferred to the traditional DRG (the network being composed of parts with different link densities), heterogeneous benchmarks are, generally speaking, to be preferred. Although the DCM and the RCM are characterized by very similar AIC values, AIC and BIC weights always favour the DCM. The ddc-SBM experiences convergence problems throughout the entire temporal period.

Generally speaking, however, benchmarks encoding the degree heterogeneity are to be preferred. Interestingly, (both) nonblock models outperform block models, indicating that specifying information beyond the one encoded into local properties is indeed unnecessary. This is not surprising, however, when considering that the nodes belonging to the IN-component have zero in-degrees: these are exactly reproduced by both the DCM and the RCM, so that the “peripheral” part of the network under analysis is automatically explained by a simpler kind of statistics, with no need to invoke any a priori partition.

Let us now compare our null models over the IN→SCC block and over the SCC block. For what concerns the former, the information carried by reciprocity is already encoded into the degree sequences: this result is rooted in the observation that the links from the IN-component to the SCC are not reciprocated, so that the DCM and the RCM reach the same likelihood on this block.

The same consideration, together with the observation that the large reciprocity of the WTW is due to reciprocal connections established between nodes within the SCC, leads to very similar likelihood values for the reciprocity-informed models and for their degree-informed counterparts. As a consequence, the two likelihood values being (overall) very similar, the model with the larger number of parameters is more heavily “penalized” (i.e., it scores a larger AIC or BIC value).

On the other hand, comparing the BCM and the DCM on the SCC leads to the conclusion that, as the SCC enlarges, the two models become increasingly similar on this block, since the largest contribution to the nodes’ degrees comes from the connections established with other nodes within the SCC itself.

Apparently, thus, two nonblock models compete, i.e., the DCM and the RCM (see Figure 3). However, the computation of the AIC and BIC weights for each model in our basket reveals that the DCM always wins. The explanation of this result may lie in the evidence that the WTW reciprocity is actually compatible with the DCM prediction throughout our time interval, as the computation of the reciprocity index introduced in [27] reveals. In other words, the seemingly peculiar mesoscale structure of the WTW is, to a good extent, reproduced by just specifying local constraints such as the in- and out-degree sequences.
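Comparing the observed reciprocity with a null-model prediction can be sketched as follows. This is a minimal illustration, assuming NumPy: it computes the observed reciprocity $r=L^{\leftrightarrow}/L$ and, as an assumption of ours, a normalized index $(r-\langle r\rangle)/(1-\langle r\rangle)$ in the spirit of [27], where $\langle r\rangle$ is approximated by the ratio between expected reciprocated links and expected links under an independent-link model (for instance, a fitted DCM). The exact index used in [27] may differ, and all names are hypothetical.

```python
import numpy as np

def reciprocity(A):
    """Observed reciprocity r = (reciprocated links) / (links)."""
    A = np.array(A, dtype=float)
    np.fill_diagonal(A, 0.0)
    return (A * A.T).sum() / A.sum()

def expected_reciprocity(P):
    """Ratio <L^<->> / <L> under an independent-link model with probabilities P."""
    P = np.array(P, dtype=float)
    np.fill_diagonal(P, 0.0)
    return (P * P.T).sum() / P.sum()

def reciprocity_index(A, P):
    """Normalized deviation (r - <r>) / (1 - <r>): about 0 if A is compatible with P."""
    r, r_null = reciprocity(A), expected_reciprocity(P)
    return (r - r_null) / (1.0 - r_null)

A = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]                  # hypothetical network: 0 <-> 1 and 0 -> 2
P = np.full((3, 3), 0.5)         # hypothetical null-model probabilities
r = reciprocity(A)               # 2/3
index = reciprocity_index(A, P)  # (2/3 - 1/2) / (1 - 1/2) = 1/3
```

A value close to zero would indicate that the reciprocity is already explained by the null model providing `P`, which is the sense in which the WTW reciprocity is “compatible with the DCM prediction”.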

The Dutch Interbank Network. According to the axiomatic model in [28], the DIN has been described as characterized by a well-defined core-periphery structure [18]. However, as pointed out elsewhere [29], such a mesoscale organization is compatible with the predictions of either the DCM or the RCM, depending on the topological quantity inspected.

Notably, the DIN is also characterized by a certain degree of bow-tieness, given the presence of an SCC, an IN-component, and, differently from the WTW, also a nonvanishing OUT-component: two of the blocks, however, are empty, and nodes belonging to the IN- and OUT-components are not directly linked with each other (but only via the SCC nodes). From a purely empirical point of view, the evolution of the DIN bow-tie structure is much more informative than the evolution of its core-periphery structure: as Figure 4 shows, while the size of the DIN SCC in 2008 reduces to more than half its precrisis value, thus providing an additional, structural indicator of the crisis, the number of nodes belonging to the core shows no significant variations across the same period. Very interestingly, the SCC starts shrinking well before 2008, a dynamics seemingly constituting an additional early-warning signal of the upcoming topological change affecting the DIN. The IN-component, in turn, shrinks as well, while the OUT-component enlarges.

Figure 4: Evolution of the DIN bow-tie structure (the SCC is shown in gray, the IN-component is shown in blue, and the OUT-component is shown in green). The crisis period (last four points) is signalled by a sharp decrease of the SCC and IN-components size (and a corresponding increase of the OUT-component size). The size of the SCC, however, starts shrinking in 2004Q1 (deviating from the approximately constant trend observed since 1998Q1), seemingly constituting an additional, early-warning signal of the upcoming crisis. On the other hand, the DIN core (shown in orange) does not undergo any significant variation throughout the whole temporal interval.

In order to identify the null model encoding the right amount of topological information to explain the DIN bow-tie structure, let us notice that its SCC can be pictured as a weakly connected, weakly reciprocated subgraph (in 2008, the SCC reciprocity drops even further). More precisely, $\rho_{\mathbf{S}}\simeq\rho\ll\rho_{\mathbf{C}}$, where $\rho_{\mathbf{S}}$, $\rho$, and $\rho_{\mathbf{C}}$ are the link densities of the SCC, of the whole network, and of the core; i.e., while the SCC connectance basically coincides with that of the whole network, the core is much denser, an empirical observation that explains why the SBM provides a better explanation of the core-periphery structure (see Figure 5). Conversely, the AIC and BIC values for the SBM and the DRG are closer when considering the bow-tie structure.

Figure 5: Evolution of the AIC and BIC values for the DIN across the quarters 1998Q1-2008Q4: while the SBM (blue trend) must be preferred to the traditional DRG (being the network composed by parts with different link densities), heterogeneous benchmarks are, generally speaking, to be preferred. Although the DCM wins in the vast majority of cases (both for the bow-tie and the core-periphery mesoscale structures), quarters exist where the DCM and the RCM compete; BIC, on the other hand, lets the SBM win sometimes, when analysing the DIN core-periphery structure. The ddc-SBM experiences convergence problems throughout the entire temporal period.

Generally speaking, however, models accounting for the degree heterogeneity are to be preferred. As for the WTW, zero in-degrees and zero out-degrees are exactly reproduced by nonblock models like the DCM and the RCM. On top of this, the low reciprocity of the DIN suggests that reciprocity plays a minor role in determining the nodes’ degrees. As a consequence, the DCM and the RCM can be interpreted as different ways of rewriting the same (configuration) model. More quantitatively, $\mathcal{L}_{\mathrm{DCM}}\simeq\mathcal{L}_{\mathrm{RCM}}$.

Deviations from this idealized picture, however, exist. This is particularly evident when analysing the SCC block, where reciprocity indeed plays a role; when considering the “peripheral” blocks, instead, one concludes that the DCM and the RCM yield the same likelihood on both the IN→SCC and the SCC→OUT blocks (since the links from the IN-component to the SCC and from the SCC to the OUT-component are not reciprocated).

Consistently, AIC and BIC weights let the DCM win in the vast majority of cases, although in some periods the DCM and the RCM compete. Overall, this also holds for the DIN core-periphery structure.

5. Discussion

The WTW and the DIN represent two real-world systems characterized by (apparently) nontrivial mesoscale structures: while the first one is characterized by a (partial) bow-tie organization, in the second one the bow-tie partition coexists with a core-periphery partition. Both kinds of mesoscale structures are characterized by interacting blocks whose internal topology is commonly believed to be determined by a nontrivial interplay between the nodes’ connectivity and the reciprocity of connections. It is, thus, interesting to ask to what extent such structures are, instead, accounted for by purely local information.

Remarkably, what our analysis points out is that specifying the degree sequences is often enough to reproduce these mesoscale structures, thus suggesting that the observed modules emerge as a consequence of local connectivity patterns between nodes: for example, the absence of incoming/outgoing connections for a set of nodes naturally leads them to be identified as an IN-/OUT-component.

Differences between systems, naturally, exist. Contrary to what is observed in the WTW case, AIC and BIC provide different answers to the question concerning the performance of block models in explaining the DIN core-periphery structure: while the Akaike criterion ranks the BCM first, the Bayesian criterion assigns the highest score to the SBM in the vast majority of temporal snapshots. If, on the one hand, this preserves the role potentially played by blocks, on the other it points out that the large difference between the connectivity values of the core and the periphery [29] provides, by itself, an effective explanation of this mesoscale organization.

A second comment about the DIN concerns the observation that, when considering the core-periphery structure, the AIC values of block models overlap with those of the simpler models to a larger extent (see Figure 5): this may be a consequence of the fact that the core-periphery partition is, in some sense, less “neat” than the bow-tie one (the requirement that nodes belonging to either the IN- or the OUT-component have zero in- or out-degree represents quite a strong constraint). However, the core-periphery organization only apparently requires additional information to be explained, as the explicit calculation of the Akaike weights confirms.

A third comment concerns reciprocity: although it plays a role in the definition of the “core” parts (i.e., the SCC and the properly defined core), its explanatory power is much more limited than expected. As a result, the degree sequences seem to encode all the information needed to reproduce the mesoscale structures considered in the present paper, thus questioning the role supposedly played by higher-level information (e.g., a partition into blocks) in explaining them.

Appendix

A.

Generally speaking, all null models considered in this paper can be recovered within the Exponential Random Graphs (ERG) framework. Following [5], a canonical ensemble of adjacency matrices is considered, whose probability distribution maximizes the Shannon entropy under a given set of constraints. A probability coefficient is, then, assigned to every adjacency matrix in the ensemble. The solution of the aforementioned constrained-optimization problem is the well-known exponential distribution

$$P(\mathbf{A})=\frac{e^{-H(\mathbf{A})}}{Z},$$

with the Hamiltonian $H(\mathbf{A})$ summing up the imposed set of constraints and $Z=\sum_{\mathbf{A}}e^{-H(\mathbf{A})}$ being the normalization.
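For linear constraints, the exponential distribution factorizes over pairs of nodes. The brute-force check below illustrates this for a 3-node directed network with a degree-based Hamiltonian $H(\mathbf{A})=\sum_i(\alpha_ik_i^{out}+\beta_ik_i^{in})$: it enumerates all $2^6$ configurations, normalizes $e^{-H}$ by $Z$, and compares the result with $\prod_{i\neq j}p_{ij}^{a_{ij}}(1-p_{ij})^{1-a_{ij}}$, $p_{ij}=x_iy_j/(1+x_iy_j)$, $x_i=e^{-\alpha_i}$, $y_j=e^{-\beta_j}$. Only the standard library is used; the multiplier values are arbitrary, hypothetical choices.

```python
import itertools
import math

N = 3
pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
alpha = [0.3, -0.2, 0.5]   # hypothetical multipliers of the out-degrees
beta = [-0.1, 0.4, 0.0]    # hypothetical multipliers of the in-degrees

def hamiltonian(bits):
    # sum_i alpha_i k_i^out + sum_i beta_i k_i^in = sum over links i -> j of (alpha_i + beta_j)
    return sum(alpha[i] + beta[j] for (i, j), a in zip(pairs, bits) if a)

configs = list(itertools.product([0, 1], repeat=len(pairs)))
weights = [math.exp(-hamiltonian(bits)) for bits in configs]
Z = sum(weights)                           # partition function
P_erg = [w / Z for w in weights]

def p_link(i, j):
    xy = math.exp(-alpha[i]) * math.exp(-beta[j])
    return xy / (1.0 + xy)

P_fact = [math.prod(p_link(i, j) if a else 1.0 - p_link(i, j)
                    for (i, j), a in zip(pairs, bits))
          for bits in configs]
max_diff = max(abs(u - v) for u, v in zip(P_erg, P_fact))   # zero up to rounding
```

The factorization is what makes the likelihood equations of the appendix tractable: each model reduces to independent Bernoulli (or, for reciprocity-informed models, four-state) variables per pair.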

A.1. Degree-Informed Null Models

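All degree-informed null models can be recovered as particular cases of the following Hamiltonian:

$$H(\mathbf{A})=\sum_i\sum_g\left(\alpha_i^{(g)}k_i^{out,g}+\beta_i^{(g)}k_i^{in,g}\right)+\sum_{g,g'}\gamma_{gg'}L_{gg'}, \tag{A.1}$$

where $L_{gg'}=\sum_{i\in g}\sum_{j\in g',\,j\neq i}a_{ij}$ is the number of links from block $g$ to block $g'$, i.e., a Hamiltonian defined by constraints encoding the dependence on block-specific, local quantities, in the most general case.

Block Configuration Model (BCM). The BCM is defined by the probability coefficients introduced in (6), i.e.,

$$p_{ij}=\frac{x_i^{(g_j)}y_j^{(g_i)}}{1+x_i^{(g_j)}y_j^{(g_i)}} \tag{A.2}$$

(where $x_i^{(g)}\equiv e^{-\alpha_i^{(g)}}$ and $y_i^{(g)}\equiv e^{-\beta_i^{(g)}}$), to be numerically determined by solving the likelihood equations

$$k_i^{out,g}=\sum_{j\in g,\,j\neq i}p_{ij},\qquad k_i^{in,g}=\sum_{j\in g,\,j\neq i}p_{ji}, \tag{A.3}$$

with $i=1\dots N$ and $g=1\dots B$. The BCM extends the results in [30, 31] to the directed case.

Directed Degree-Corrected SBM (ddc-SBM). Interestingly, upon identifying $\alpha_i^{(g)}\equiv\alpha_i$ and $\beta_i^{(g)}\equiv\beta_i$, the directed degree-corrected SBM (ddc-SBM) is recovered. Upon retaining all multipliers in (A.1) and defining $x_i\equiv e^{-\alpha_i}$, $y_i\equiv e^{-\beta_i}$, and $\chi_{gg'}\equiv e^{-\gamma_{gg'}}$, one finds

$$p_{ij}=\frac{x_iy_j\chi_{g_ig_j}}{1+x_iy_j\chi_{g_ig_j}}; \tag{A.4}$$

although formally equivalent, expressions (A.4) and (6) are not so when it comes to estimating the unknown parameters: (A.4) is, in fact, determined by solving the equations

$$k_i^{out}=\sum_{j\neq i}p_{ij},\qquad k_i^{in}=\sum_{j\neq i}p_{ji},\qquad L_{gg'}=\sum_{i\in g}\sum_{j\in g',\,j\neq i}p_{ij}, \tag{A.5}$$

thus requiring fewer parameters than the BCM [32]. The ddc-SBM generalizes the results in [30, 33] to the nonsparse case.

Directed Configuration Model (DCM). The DCM is obtained by posing $\alpha_i^{(g)}\equiv\alpha_i$, $\beta_i^{(g)}\equiv\beta_i$ and $\gamma_{gg'}\equiv0$ in (A.1). Upon defining $x_i\equiv e^{-\alpha_i}$ and $y_i\equiv e^{-\beta_i}$, the surviving multipliers induce probability coefficients reading

$$p_{ij}=\frac{x_iy_j}{1+x_iy_j}, \tag{A.6}$$

to be numerically determined by solving the likelihood equations

$$k_i^{out}=\sum_{j\neq i}p_{ij},\qquad k_i^{in}=\sum_{j\neq i}p_{ji}, \tag{A.7}$$

with the out- and in-degrees reading $k_i^{out}=\sum_{j\neq i}a_{ij}$ and $k_i^{in}=\sum_{j\neq i}a_{ji}$, respectively, and $i=1\dots N$.

Stochastic Block Model (SBM). Notice that the directed version of the Stochastic Block Model (SBM) can be recovered as a special case of the BCM, by posing $\alpha_i^{(g)}\equiv0$ and $\beta_i^{(g)}\equiv0$ in (A.1) and solving the equations

$$L_{gg'}=N_{gg'}\,p_{gg'}, \tag{A.8}$$

with $p_{gg'}=\chi_{gg'}/(1+\chi_{gg'})$ and $N_{gg'}=N_gN_{g'}$ for $g\neq g'$, $N_{gg}=N_g(N_g-1)$, $N_g$ being the number of nodes in block $g$.

Directed Random Graph Model (DRG). The DRG can be recovered as a particular case of the DCM, obtained by posing $\alpha_i^{(g)}\equiv\alpha$, $\beta_i^{(g)}\equiv\beta$, and $\gamma_{gg'}\equiv0$ in (A.1). The only coefficient $p=xy/(1+xy)$, with $x=e^{-\alpha}$ and $y=e^{-\beta}$, is determined by solving the equation

$$L=N(N-1)\,p, \tag{A.9}$$

with $L=\sum_i\sum_{j\neq i}a_{ij}$.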
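Unlike the configuration-like models, the SBM and the DRG admit closed-form solutions: (A.8) gives $p_{gg'}=L_{gg'}/N_{gg'}$ and (A.9) gives $p=L/[N(N-1)]$. A minimal sketch, assuming NumPy, a zero-diagonal adjacency matrix, and blocks with at least two nodes; the function names and the toy data are hypothetical.

```python
import numpy as np

def fit_sbm(A, labels):
    """p_gg' = L_gg' / N_gg' for the directed SBM (zero-diagonal A, blocks of size >= 2)."""
    A = np.asarray(A)
    labels = np.asarray(labels)
    probs = {}
    for g in np.unique(labels):
        for h in np.unique(labels):
            rows, cols = labels == g, labels == h
            L_gh = A[np.ix_(rows, cols)].sum()
            n_g, n_h = rows.sum(), cols.sum()
            N_gh = n_g * (n_g - 1) if g == h else n_g * n_h   # ordered pairs, no self-loops
            probs[(int(g), int(h))] = L_gh / N_gh
    return probs

def fit_drg(A):
    """p = L / (N (N - 1)) for the DRG (zero-diagonal A)."""
    A = np.asarray(A)
    N = len(A)
    return A.sum() / (N * (N - 1))

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 1, 0]])
labels = [0, 0, 1, 1]      # hypothetical two-block partition
sbm = fit_sbm(A, labels)   # {(0, 0): 1.0, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.5}
drg = fit_drg(A)           # 4 / 12
```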

A.2. Reciprocity-Informed Null Models

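Reciprocal Configuration Model (RCM). The RCM is defined by the following probability coefficients:

$$p_{ij}^{\rightarrow}=\frac{x_iy_j}{1+x_iy_j+x_jy_i+z_iz_j}, \tag{A.10}$$

$$p_{ij}^{\leftarrow}=\frac{x_jy_i}{1+x_iy_j+x_jy_i+z_iz_j}, \tag{A.11}$$

$$p_{ij}^{\leftrightarrow}=\frac{z_iz_j}{1+x_iy_j+x_jy_i+z_iz_j}, \tag{A.12}$$

to be numerically determined by solving the likelihood equations

$$k_i^{\rightarrow}=\sum_{j\neq i}p_{ij}^{\rightarrow},\qquad k_i^{\leftarrow}=\sum_{j\neq i}p_{ij}^{\leftarrow},\qquad k_i^{\leftrightarrow}=\sum_{j\neq i}p_{ij}^{\leftrightarrow}, \tag{A.13}$$

with $k_i^{\rightarrow}=\sum_{j\neq i}a_{ij}(1-a_{ji})$, $k_i^{\leftarrow}=\sum_{j\neq i}a_{ji}(1-a_{ij})$, and $k_i^{\leftrightarrow}=\sum_{j\neq i}a_{ij}a_{ji}$.

Block Reciprocal Configuration Model (BRCM). The RCM can be redefined in a block-wise fashion, by specifying the probability coefficients defined by (A.10), (A.11), and (A.12) for each block. A Block Reciprocal Configuration Model (BRCM) is thus naturally defined by the system of equations

$$k_i^{\rightarrow,g}=\sum_{j\in g,\,j\neq i}p_{ij}^{\rightarrow},\qquad k_i^{\leftarrow,g}=\sum_{j\in g,\,j\neq i}p_{ij}^{\leftarrow},\qquad k_i^{\leftrightarrow,g}=\sum_{j\in g,\,j\neq i}p_{ij}^{\leftrightarrow},$$

with obvious meaning of the symbols.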
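The RCM likelihood equations (A.13) can be solved numerically in several ways. Below is a minimal sketch of one option, assuming NumPy: a fixed-point iteration on $x_i$, $y_i$, $z_i$ with a damping factor (our choice, since a plain iteration of this map can oscillate). The name `fit_rcm`, the initial guess, and the toy network are hypothetical; the sketch also assumes that every degree class has at least one nonzero entry.

```python
import numpy as np

def fit_rcm(A, n_iter=20_000, tol=1e-12, damping=0.5):
    """Damped fixed-point solution of k_i^->, k_i^<-, k_i^<-> = expected values."""
    A = np.array(A, dtype=float)
    np.fill_diagonal(A, 0.0)
    k_out = (A * (1 - A.T)).sum(axis=1)     # nonreciprocated out-degrees
    k_in = (A.T * (1 - A)).sum(axis=1)      # nonreciprocated in-degrees
    k_rec = (A * A.T).sum(axis=1)           # reciprocated degrees
    off = 1.0 - np.eye(len(A))              # mask enforcing j != i
    x, y, z = (k / np.sqrt(k.sum()) for k in (k_out, k_in, k_rec))
    for _ in range(n_iter):
        D = 1.0 + np.outer(x, y) + np.outer(y, x) + np.outer(z, z)   # D_ij
        x_new = k_out / (off * y[None, :] / D).sum(axis=1)
        y_new = k_in / (off * x[None, :] / D).sum(axis=1)
        z_new = k_rec / (off * z[None, :] / D).sum(axis=1)
        # averaging old and new values damps the oscillations of the plain map
        x_new = damping * x + (1 - damping) * x_new
        y_new = damping * y + (1 - damping) * y_new
        z_new = damping * z + (1 - damping) * z_new
        step = max(np.abs(x_new - x).max(), np.abs(y_new - y).max(), np.abs(z_new - z).max())
        x, y, z = x_new, y_new, z_new
        if step < tol:
            break
    D = 1.0 + np.outer(x, y) + np.outer(y, x) + np.outer(z, z)
    probs = (off * np.outer(x, y) / D, off * np.outer(y, x) / D, off * np.outer(z, z) / D)
    return probs, (k_out, k_in, k_rec)

# Hypothetical network: three reciprocated pairs plus a few one-way links
links = [(0, 1), (1, 0), (2, 3), (3, 2), (4, 5), (5, 4),
         (0, 2), (2, 4), (4, 0), (1, 3), (3, 5), (5, 1), (0, 3)]
A = np.zeros((6, 6))
for i, j in links:
    A[i, j] = 1.0
(p_out, p_in, p_rec), (k_out, k_in, k_rec) = fit_rcm(A)
```

The BRCM follows the same pattern, with one triple of multipliers per node and block.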

B.

Let us explicitly solve the BCM in the two, off-diagonal matrices and . In order to fix the formalism, let us suppose the two off-diagonal blocks and to have dimensions and , respectively. Analogously to the undirected case [12], solving the DCM within the off-diagonal blocks of the matrix induces the following probability coefficientsandthe probability that a link from a core node to a periphery node exists is and the probability that a link from a periphery node to a core node exists is . Consistently, the vector is coupled to the outgoing degrees, while the vector is coupled to the incoming degrees.

The aforementioned probability coefficients are determined via the likelihood condition in (A.3). Let us notice that the out-degree of core nodes and the in-degree of periphery nodes are measured on the matrix ; the converse is true for the matrix . More quantitatively, upon indicating with the core and periphery nodes degrees, one hasand

The estimation step, thus, reads
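A hedged sketch of this estimation step for a single off-diagonal block follows. It assumes the bipartite form p_cp = x_c y_p / (1 + x_c y_p), with rows being the nodes that send links within the block and columns those that receive them, and it again uses fixed-point iteration as an illustrative solver. For the core-to-periphery block the row sums are the core out-degrees and the column sums the periphery in-degrees; the periphery-to-core block is handled in the same way.

```python
import numpy as np


def solve_bipartite_block(M, tol=1e-10, max_iter=10_000):
    """Fit p_cp = x_c y_p / (1 + x_c y_p) to a rectangular binary block M.

    Rows of M are senders and columns receivers; on average the model
    reproduces the row sums (out-degrees) and column sums (in-degrees).
    """
    M = np.asarray(M, dtype=float)
    k_rows = M.sum(axis=1)
    k_cols = M.sum(axis=0)
    x = k_rows / np.sqrt(M.sum())
    y = k_cols / np.sqrt(M.sum())
    for _ in range(max_iter):
        denom = 1.0 + np.outer(x, y)  # denom[c, p] = 1 + x_c y_p
        s_x = (y[None, :] / denom).sum(axis=1)
        s_y = (x[:, None] / denom).sum(axis=0)
        x_new = np.divide(k_rows, s_x, out=np.zeros_like(x), where=s_x > 0)
        y_new = np.divide(k_cols, s_y, out=np.zeros_like(y), where=s_y > 0)
        done = max(np.abs(x_new - x).max(), np.abs(y_new - y).max()) < tol
        x, y = x_new, y_new
        if done:
            break
    xy = np.outer(x, y)
    return xy / (1.0 + xy)


# 4 core nodes with 3 outgoing links each, 6 periphery nodes with 2 incoming.
M = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]])
P_block = solve_bipartite_block(M)
```

Unlike the monopartite DCM, no diagonal mask is needed, because a node never appears both as a row and as a column of the same off-diagonal block.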

The SBM can be recovered by posing and , to be estimated by solving with obvious meaning of the symbols.

Inserting the information about reciprocity into a bipartite null model leads to the following probability coefficient: that “mixes” the information coming from the two biadjacency matrices and (whence the choice of a different symbol, , to indicate the bipartite network as a whole). The new variables read , , , and : while indicates that a nonreciprocated link is present from the core node to the periphery node , indicates that a nonreciprocated link is present from the periphery node to the core node ; naturally, indicates that both links are present between nodes and , and indicates that no link is present between the same nodes.

The probability coefficients defining our bipartite, reciprocal model read whose numerical value is determined by the following sufficient statistics, i.e., the reciprocal and nonreciprocal degrees of both core nodes (with ) and periphery nodes (with ). Notice that the binary variables defining () are the same ones defining (): in fact, the nonreciprocated links outgoing from the core (periphery) are the same links incoming into the periphery (core). Finally, the estimation step for such a model reads
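The four pair states and the resulting sufficient statistics can be computed directly from the two biadjacency matrices. The sketch below is illustrative (names and dictionary keys are not from the paper); it checks that every core-periphery pair is in exactly one state and that the nonreciprocated links leaving the core are the ones entering the periphery.

```python
import numpy as np


def bipartite_reciprocal_states(C, P):
    """Four mutually exclusive states of each (core c, periphery p) pair.

    C: core -> periphery block, shape (N_c, N_p).
    P: periphery -> core block, shape (N_p, N_c).
    """
    C = np.asarray(C, dtype=int)
    Pt = np.asarray(P, dtype=int).T  # Pt[c, p] = 1 if the link p -> c exists
    return {
        "core_to_periphery": C * (1 - Pt),  # nonreciprocated c -> p
        "periphery_to_core": (1 - C) * Pt,  # nonreciprocated p -> c
        "reciprocated": C * Pt,             # both links present
        "empty": (1 - C) * (1 - Pt),        # no link at all
    }


C = np.array([[1, 1], [0, 1]])
P = np.array([[1, 0], [0, 1]])
s = bipartite_reciprocal_states(C, P)
# Sufficient statistics: row sums give core-node degrees,
# column sums give periphery-node degrees.
core_out_nr = s["core_to_periphery"].sum(axis=1)
core_in_nr = s["periphery_to_core"].sum(axis=1)
core_rec = s["reciprocated"].sum(axis=1)
periphery_in_nr = s["core_to_periphery"].sum(axis=0)
periphery_out_nr = s["periphery_to_core"].sum(axis=0)
periphery_rec = s["reciprocated"].sum(axis=0)
```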

C.

The aim of this appendix is to provide simple examples of network configurations that further illustrate the methodology presented in the paper.

To this end, let us consider a bimodular structure where the link density of the two communities (with and nodes, respectively) amounts to and where the two off-diagonal blocks have the same link density, i.e., . Let us now compare the explanatory power of the SBM and the DRG. The explicit calculation of the BIC for the SBM leads to the expression while, for consistency, the BIC for the DRG reads with being the weighted average of the SBM probability coefficients. In fact,
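The comparison can be reproduced numerically with the standard definition BIC = k ln n - 2 ln L_max. The sketch below rests on assumptions that may differ from the paper's exact bookkeeping: the network is directed without self-loops, the sample size n is the number of ordered node pairs N(N - 1), the SBM has one free parameter per block (four in total) and the DRG has one. Block link counts are set to their expected values, so that densities can be varied continuously; function names are illustrative.

```python
import math


def bernoulli_loglik(links, pairs):
    """Maximized log-likelihood of `links` successes over `pairs` trials."""
    if pairs == 0:
        return 0.0
    p = links / pairs
    ll = 0.0
    if links > 0:
        ll += links * math.log(p)
    if pairs - links > 0:
        ll += (pairs - links) * math.log(1.0 - p)
    return ll


def bic_two_blocks(n1, n2, p11, p22, p_off):
    """BIC of a four-block directed SBM and of the DRG on a two-group network."""
    pairs = [n1 * (n1 - 1), n2 * (n2 - 1), n1 * n2, n2 * n1]
    dens = [p11, p22, p_off, p_off]
    links = [p * m for p, m in zip(dens, pairs)]
    n_obs = sum(pairs)  # assumption: one observation per ordered pair
    ll_sbm = sum(bernoulli_loglik(l, m) for l, m in zip(links, pairs))
    # The DRG density is the pair-weighted average of the block densities.
    ll_drg = bernoulli_loglik(sum(links), n_obs)
    bic_sbm = len(pairs) * math.log(n_obs) - 2.0 * ll_sbm
    bic_drg = 1.0 * math.log(n_obs) - 2.0 * ll_drg  # one global parameter
    return bic_sbm, bic_drg


# Bimodular case: scan the off-diagonal density around the diagonal one.
for q in (0.10, 0.28, 0.30, 0.32, 0.50):
    s, d = bic_two_blocks(50, 50, 0.3, 0.3, q)
    print(f"p_off={q:.2f}  BIC_SBM={s:.1f}  BIC_DRG={d:.1f}")
```

Setting p11 different from p22 gives a core-periphery configuration instead. For instance, under the assumptions above, a 20-node core with density 0.8 and an 80-node periphery with density 0.05 yield a lower BIC for the SBM at every off-diagonal density between 0.01 and 0.99, because the core block alone already departs strongly from any single global density.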

Let us now plot the trends of and as the parameter varies. As Figure 6 shows, there is a region of values around where the SBM (i.e., the model specifying the network partition into modules) is penalized: in that region the first terms of the two expressions practically coincide, but the SBM correction term is larger than the DRG one. In other words, the network is homogeneous enough to be satisfactorily described by the single global parameter defining the DRG.

Figure 6: Left panel: comparison between the numerical values of the BIC computed for the SBM (blue trend) and the DRG (red trend) on a bimodular network where the link density of the two communities (; ) amounts to . Notice that there is a region of values around where the SBM is penalized: a single global parameter is, in fact, enough to satisfactorily reproduce the network structure. As the link density of the off-diagonal blocks deviates from the value , the network becomes more and more heterogeneous and specifying the modules becomes rewarding. Right panel: comparison between the numerical values of the BIC computed for the SBM (blue trend) and the DRG (red trend) on a core-periphery network where the link densities of the two blocks (; ) amount to and . Notice that there is no region of values where the DRG is to be preferred: the network is so heterogeneous that a single global parameter is not enough to account for its structure.

Let us now consider a core-periphery structure where the link densities of the two communities (with and nodes, respectively) amount to and , and where the two off-diagonal blocks have the same link density, i.e., . Analogously to the previous example, the BIC for the SBM can be computed explicitly, while the BIC for the DRG is formally analogous to (C.2). Plotting the trends of and as the parameter varies reveals that the SBM is always to be preferred: in this case, the network heterogeneity can never be accounted for by a single global parameter.

As a last case study, let us now compare the DCM and the RCM. To this end, let us explicitly solve both models on binary, directed networks with an increasing level of reciprocity . As Figure 7 shows, as increases, the RCM becomes increasingly preferred. To better understand this result, let us consider the two extreme configurations, i.e., the perfectly non-reciprocal one with and the perfectly reciprocal one with . In the first case, the evidence that , , induces probability coefficients satisfying the equalities and , thus leading the DCM to be preferred. In the second case, the evidence that induces probability coefficients satisfying the equalities , thus leading the RCM to be preferred.
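The reciprocity level driving this comparison can be measured directly on the adjacency matrix. The sketch below assumes the usual definition, r = (number of reciprocated links) / (total number of links), for a network without self-loops, and checks the two extreme configurations discussed above together with an intermediate case. The function name is illustrative.

```python
import numpy as np


def reciprocity(A):
    """Fraction of directed links whose reverse link is also present."""
    A = np.asarray(A, dtype=int)
    links = A.sum()
    if links == 0:
        return 0.0
    # (A * A.T)[i, j] = 1 exactly when both i -> j and j -> i exist.
    return float((A * A.T).sum() / links)


dag = np.triu(np.ones((4, 4), dtype=int), k=1)  # no link is reciprocated
sym = dag + dag.T                               # every link is reciprocated
mix = np.zeros((3, 3), dtype=int)
mix[0, 1] = mix[1, 0] = mix[0, 2] = mix[2, 1] = 1  # 2 of 4 links reciprocated

r_dag, r_sym, r_mix = reciprocity(dag), reciprocity(sym), reciprocity(mix)
```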

Figure 7: Comparison between the numerical values of the BIC computed for the RCM (blue trend) and the DCM (red trend) on a network with an increasing level of reciprocity .

Data Availability

World Trade Web data that support the findings of this study are openly available at the UN Comtrade Database (http://comtrade.un.org/). Dutch interbank exposures data are not publicly available due to privacy restrictions.

Disclosure

The manuscript was presented at the CCS 2018 Conference (http://ccs2018.web.auth.gr/reconstructing-mesoscale-network-structures).

Conflicts of Interest

The authors declare no competing financial interests.

Authors’ Contributions

Jeroen van Lidth de Jeude, Riccardo Di Clemente, Guido Caldarelli, Fabio Saracco, and Tiziano Squartini developed the method. Jeroen van Lidth de Jeude performed the analysis. Jeroen van Lidth de Jeude, Riccardo Di Clemente, Guido Caldarelli, Fabio Saracco, and Tiziano Squartini wrote the manuscript. All authors reviewed and approved the manuscript.

Acknowledgments

This work was supported by the EU Projects CoeGSS (Grant no. 676547), DOLFINS (Grant no. 640772), MULTIPLEX (Grant no. 317532), Openmaker (Grant no. 687941), and SoBigData (Grant no. 654024). RDC, as Newton International Fellow of the Royal Society, acknowledges support from the Royal Society, the British Academy, and the Academy of Medical Sciences (Newton International Fellowship, NF170505).

References

  1. S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
  2. S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
  3. B. S. Khan and M. A. Niazi, “Network community detection: a review and visual survey,” 2017, https://arxiv.org/abs/1708.00977.
  4. P. Csermely, A. London, L. Wu, and B. Uzzi, “Structure and dynamics of core/periphery networks,” Journal of Complex Networks, vol. 1, no. 2, pp. 93–123, 2013.
  5. T. Squartini and D. Garlaschelli, “Analytical maximum-likelihood method to detect patterns in real networks,” New Journal of Physics, vol. 13, Article ID 083001, 2011.
  6. R. Mastrandrea, T. Squartini, G. Fagiolo, and D. Garlaschelli, “Enhanced reconstruction of weighted networks from strengths and degrees,” New Journal of Physics, vol. 16, Article ID 043022, 2014.
  7. A. Tacchella, M. Cristelli, G. Caldarelli, A. Gabrielli, and L. Pietronero, “A new metrics for countries' fitness and products' complexity,” Scientific Reports, vol. 2, article 723, 2012.
  8. G. Cimini, A. Gabrielli, and F. S. Labini, “The scientific competitiveness of nations,” PLoS ONE, vol. 9, no. 12, p. e113470, 2014.
  9. M. Kitsak and D. Krioukov, “Hidden variables in bipartite networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 84, no. 2, Article ID 026114, 2011.
  10. C. F. Dormann and R. Strauss, “A method for detecting modules in quantitative bipartite networks,” Methods in Ecology and Evolution, vol. 5, no. 1, pp. 90–98, 2014.
  11. G. Strona, D. Nappo, F. Boccacci, S. Fattorini, and J. San-Miguel-Ayanz, “A fast and unbiased procedure to randomize ecological binary matrices with fixed row and column totals,” Nature Communications, vol. 5, Article ID 4114, 2014.
  12. F. Saracco, R. Di Clemente, A. Gabrielli, and T. Squartini, “Randomizing bipartite networks: The case of the World Trade Web,” Scientific Reports, vol. 5, Article ID 10595, 2015.
  13. T. Squartini, A. Almog, G. Caldarelli, I. Van Lelyveld, D. Garlaschelli, and G. Cimini, “Enhanced capital-asset pricing model for the reconstruction of bipartite financial networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 96, no. 3, Article ID 032315, 2017.
  14. M. Tumminello, S. Miccichè, F. Lillo, J. Piilo, and R. N. Mantegna, “Statistically validated networks in bipartite complex systems,” PLoS ONE, vol. 6, no. 3, 2011.
  15. F. Saracco, M. J. Straka, R. Di Clemente, A. Gabrielli, G. Caldarelli, and T. Squartini, “Inferring monopartite projections of bipartite networks: An entropy-based approach,” New Journal of Physics, vol. 19, no. 5, Article ID 053022, 2017.
  16. M. Barigozzi, G. Fagiolo, and D. Garlaschelli, “Multinetwork of international trade: A commodity-specific analysis,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 81, no. 4, Article ID 046104, 2010.
  17. http://comtrade.un.org/.
  18. I. v. Lelyveld and D. L. In 't Veld, “Finding the Core: Network Structure in Interbank Markets,” DNB Working Paper 348, 2012.
  19. R. Yang, L. Zhuhadar, and O. Nasraoui, “Bow-tie decomposition in directed graphs,” in Proceedings of the 14th International Conference on Information Fusion, Fusion '11, pp. 1–5, USA, July 2011.
  20. D. Garlaschelli and M. I. Loffredo, “Multispecies grand-canonical models for networks with reciprocity,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 73, no. 1, 015101, 4 pages, 2006.
  21. K. P. Burnham, D. R. Anderson, and K. P. Huyvaert, “AIC model selection and multimodel inference in behavioral ecology: Some background, observations, and comparisons,” Behavioral Ecology and Sociobiology, vol. 65, no. 1, pp. 23–35, 2011.
  22. M. Á. Serrano and M. Boguñá, “Topology of the world trade web,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 2, p. 015101, 2003.
  23. A. Fronczak and P. Fronczak, “Statistical mechanics of the international trade network,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 85, no. 5, Article ID 056113, 2012.
  24. G. Fagiolo, T. Squartini, and D. Garlaschelli, “Null models of economic networks: The case of the world trade web,” Journal of Economic Interaction and Coordination, vol. 8, no. 1, pp. 75–107, 2013.
  25. R. Mastrandrea, T. Squartini, G. Fagiolo, and D. Garlaschelli, “Reconstructing the world trade multiplex: The role of intensive and extensive biases,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 90, no. 6, Article ID 062804, 2014.
  26. S. Torreggiani, G. Mangioni, M. J. Puma, and G. Fagiolo, “Identifying the community structure of the international food-trade multi network,” 2017, https://arxiv.org/abs/1711.05784v1.
  27. D. Garlaschelli and M. I. Loffredo, “Patterns of Link Reciprocity in Directed Networks,” Physical Review Letters, vol. 93, no. 26, Article ID 268701, 2004.
  28. B. Craig and G. von Peter, “Interbank tiering and money center banks,” BIS Working Paper, vol. 322, 2010.
  29. T. Squartini, I. Van Lelyveld, and D. Garlaschelli, “Early-warning signals of topological collapse in interbank networks,” Scientific Reports, vol. 3, article no. 3357, 2013.
  30. B. Karrer and M. E. J. Newman, “Stochastic blockmodels and community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 83, no. 1, Article ID 016107, 2011.
  31. P. Fronczak, A. Fronczak, and M. Bujok, “Exponential random graph models for networks with community structure,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 88, no. 3, Article ID 032810, 2013.
  32. J. Reichardt, R. Alamino, and D. Saad, “The interplay between microscopic and mesoscopic structures in complex networks,” PLoS ONE, vol. 6, no. 8, 2011.
  33. Y. Zhu, X. Yan, and C. Moore, “Oriented and degree-generated block models: Generating and inferring communities with inhomogeneous degree distributions,” Journal of Complex Networks, vol. 2, no. 1, pp. 1–18, 2014.