An Asymmetric Popularity-Similarity Optimization Method for Embedding Directed Networks into Hyperbolic Space

Wu, Zongning; Di, Zengru; Fan, Ying

doi:https://doi.org/10.1155/2020/8372928

Complexity

Research Article | Open Access

Volume 2020 | Article ID 8372928 | https://doi.org/10.1155/2020/8372928

Zongning Wu,¹ Zengru Di,¹ and Ying Fan¹

Academic Editor: Dehua Shen

Received 16 Oct 2019

Revised 28 Feb 2020

Accepted 02 Apr 2020

Published 22 Apr 2020

Abstract

Network embedding is a frontier topic in current network science. The scale-free property of complex networks can emerge as a consequence of the exponential expansion of hyperbolic space. Some embedding models have recently been developed to explore the hyperbolic geometric properties of complex networks, in particular symmetric networks. Here, we propose a model for embedding directed networks into hyperbolic space. Based on the bipartite structure of directed networks and the multiplexing of node information, the method replays the generation law of asymmetric networks in hyperbolic space and estimates the hyperbolic coordinates of each node in a directed network with the asymmetric popularity-similarity optimization method. Experiments on several real networks show that the embedding algorithm is stable and that the model enlarges the application scope of existing methods.

1. Introduction

Complex networks can greatly simplify real systems while preserving the essential information of the interaction structure. They are thus an ideal tool for investigating complex systems. However, complex networks are models with nongeometric properties, which means that a large set of tools and methodologies developed in geometry cannot be applied to them directly. In this context, there is a wave of studies exploring the geometric properties of complex networks, aiming at mapping complex networks into latent variables (for example, hidden variables and Gaussian latent variables) or a low-dimensional metric space (for example, Euclidean space or hyperbolic space) [1–3].

Advances in network geometry have shown that structural properties observed in scale-free networks derived from real complex systems can emerge from the geometric properties of an underlying space [4, 5]. Hyperbolic geometry is a branch of non-Euclidean geometry, and it has many applications in practical engineering techniques. More importantly, the random geometric model and the hyperbolic embedding model of growth [3, 6] proposed later can easily explain the heterogeneity and high clustering of scale-free networks and even give a clear meaning to each coordinate. These models can not only simulate the growth of networks but also explain the dynamic process of the classic BA network model in complex networks. Related studies are thus becoming increasingly popular, and network models with geometric properties have been used successfully in many fields in network science and other disciplines, including brain science [7, 8], international trade [9], route transfer [10–12], and protein formation mechanisms [13, 14].

When this research framework was proposed, it attracted wide attention from scholars, and models and hyperbolic embedding methods were then developed for it [3, 6, 15–17]. These models perform well in studying the latent structure of networks. In particular, in the popularity-similarity optimization method, the hyperbolic property of complex networks is combined with the hidden space property, and the spatial position of nodes is estimated by statistical inference. However, these models cannot describe real systems completely. One drawback is that they ignore the directionality of links. The relationships between nodes are unequal in most real networks, which is the so-called asymmetry property of links. Although the asymmetry property brings many challenges for network embedding, ignoring the asymmetry of directed networks loses much important information and cannot lead to a full representation of the structure and function of real systems.

Another branch of study is network representation learning or graph embedding through machine learning methods and matrix analysis [1, 18–22], including directed networks [23, 24]. Those models have provided many useful modeling inspirations for us, such as motifs, random walks, centrality information with graph convolutions [25], the high-order proximity [26], and the spatial and temporal proximity [27]. Unfortunately, an obvious limitation is that these models cannot identify the real meaning of the spatial coordinates when obtaining a vector representation of the nodes. More importantly, since it has been demonstrated that a scale-free degree distribution is a basic condition for the embedding of hyperbolic space [3], it is urgent to develop a directed network embedding model in hyperbolic space.

To further address this problem, we explore the intrinsic relationship between directed links and the network topology. Interestingly, in a nontrivial way, directed networks have a hidden bipartite structure [28]. We checked that all kinds of directed networks have such structures, and this phenomenon is the universal law of directed networks. The contribution of our work is that we offer a directed network embedding scheme based on node information multiplexing and identifying a potential topological structure (bipartite structure) as well as a new idea for the dimensionality reduction of directed network data. In addition, using visualization technology, we provide a new snapshot of directed networks in the hyperbolic space, which enables us to show the nodes’ status and macrolevel structure and features.

In this paper, we propose an asymmetric popularity-similarity optimization method for embedding directed networks into hyperbolic space through the bipartite structure of directed networks, and our methodology is grounded in the network topology information and its characteristics. From this perspective, we first introduce the concept of the bipartite structure of directed networks. Based on this, the model for mapping nodes from directed networks to hyperbolic space is discussed in Section 2. Four real-world directed networks are used to test the applicability of our mapping method in Section 3. Section 4 closes the article with concluding remarks.

2. Materials and Methods

2.1. Data

In the simulation, we use four empirical network datasets and describe their sources below. They cover Caenorhabditis elegans (C. elegans) neural systems, international trade, email relationships, and international migration. (1) The email dataset is the internal email communication network between employees of a mid-sized manufacturing company [29, 30]. The network is directed; nodes represent employees, and edges between two nodes are individual emails. (2) The international migration dataset is a weighted directed network with 153 nodes and can be obtained from the World Bank (http://www.worldbank.org/). The adjacency matrix describes the immigration from country $i$ to country $j$ with migration flow $w_{ij}$. (3) The C. elegans neural dataset, a directed network, describes the neural interconnection via chemical synapses and gap junctions and can be obtained from the Wormatlas database [31]. (4) The international trade dataset describes the trade relationships among countries (or regions) and is obtained from the United Nations commodity trade statistics database (https://comtrade.un.org/). If country $i$ imports to country $j$ with weight $w_{ij}$, the corresponding entry of the adjacency matrix is $w_{ij}$.

2.2. Methods

2.2.1. Complex Network and Hyperbolic Space

Hyperbolic space is an isotropic space with negative curvature that cannot be embedded in any Euclidean space. The topological geometry and hyperbolic geometry of complex networks are intimately related, which has been well explored in mathematics [32], because the shortest paths in networks, those defining chemical distances, closely follow their hyperbolic geodesics in the latent space. Poincaré disks with radius $R$ can be used to represent hyperbolic spaces according to Ref. [3]. The main property of hyperbolic geometry is the exponential expansion of space: the area $A(R)$ of a two-dimensional hyperbolic disc grows with $R$ as $A(R) \sim e^{R}$. From the perspective of hyperbolic geometry, the emergence of scale-free networks produces two exponentials: node density increases exponentially with the distance from the center of the disk, and the average degree declines exponentially with that distance. In early studies, the hyperbolic geometric model and hidden geometric models were shown to perform equivalently for embedding scale-free network topology into a metric space.

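A hidden geometric model assigns to nodes their expected degrees $\kappa$ and angular coordinates $\theta$, where $\kappa$ is drawn from the power-law distribution $\rho(\kappa) = (\gamma - 1)\kappa_0^{\gamma - 1}\kappa^{-\gamma}$, $\kappa_0$ is the minimum expected degree of a node, $\bar{k}$ is the average degree of the network, and $\gamma$ is the power exponent. $\theta$ is chosen uniformly at random from $[0, 2\pi]$. Then, each pair of nodes is connected with probability $p(\chi) = 1/(1 + \chi^{\beta})$, where the effective distance is $\chi = N\Delta\theta/(2\pi\mu\kappa\kappa')$ for a network of $N$ nodes, the angular distance is $\Delta\theta = \pi - |\pi - |\theta - \theta'||$, and the parameters $\mu$ and $\beta$ are constrained by $\mu = \beta\sin(\pi/\beta)/(2\pi\bar{k})$.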
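
The generation rule above can be illustrated with a minimal sketch. It assumes the standard form of the hidden geometric model ($p(\chi) = 1/(1+\chi^{\beta})$, $\kappa_0 = \bar{k}(\gamma-2)/(\gamma-1)$, $\mu = \beta\sin(\pi/\beta)/(2\pi\bar{k})$ with $\beta > 1$); the function names `sample_kappa`, `angular_distance`, and `generate_s1` are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_kappa(kappa0, gamma, rng):
    # Inverse-transform sampling from rho(kappa) ~ kappa^(-gamma), kappa >= kappa0.
    u = rng.random()  # u in [0, 1), so 1 - u is in (0, 1]
    return kappa0 * (1.0 - u) ** (-1.0 / (gamma - 1.0))


def angular_distance(t1, t2):
    # Shortest angular separation on the circle, in [0, pi].
    return math.pi - abs(math.pi - abs(t1 - t2))


def generate_s1(n, gamma, beta, avg_k, seed=0):
    # Assumed standard parameter relations (not quoted from the paper).
    rng = random.Random(seed)
    kappa0 = avg_k * (gamma - 2.0) / (gamma - 1.0)
    mu = beta * math.sin(math.pi / beta) / (2.0 * math.pi * avg_k)
    kappas = [sample_kappa(kappa0, gamma, rng) for _ in range(n)]
    thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dth = angular_distance(thetas[i], thetas[j])
            chi = n * dth / (2.0 * math.pi * mu * kappas[i] * kappas[j])
            if rng.random() < 1.0 / (1.0 + chi ** beta):
                edges.append((i, j))
    return kappas, thetas, edges


kappas, thetas, edges = generate_s1(200, gamma=2.5, beta=2.0, avg_k=6.0, seed=1)
```

The double loop is quadratic in the number of nodes, which is acceptable for a sketch but not for large networks.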

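To build a hyperbolic geometry model for complex networks, the curvature of the hyperbolic space is set to −1. The Poincaré model assigns to nodes their radial coordinates $r$ with density $\rho(r) = \frac{\gamma - 1}{2} e^{(\gamma - 1)(r - R)/2}$ and angular coordinates $\theta$ with uniform density $\rho(\theta) = 1/(2\pi)$. The geodesic distance $x$ between nodes $i$ and $j$ satisfies $\cosh x = \cosh r_i \cosh r_j - \sinh r_i \sinh r_j \cos\Delta\theta_{ij}$. Nodes are connected with probability $p(x) = 1/(1 + e^{\beta(x - R)/2})$. The process of constructing complex networks reflects the competition of two forces in this model: popularity ($r$) and similarity ($\theta$).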
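
As a small illustration of these two formulas, the sketch below evaluates the hyperbolic law of cosines for curvature −1 and a Fermi–Dirac connection probability. The exact parametrization of the probability (the factor $\beta/2$ in the exponent) is an assumption, and the function names are hypothetical.

```python
import math


def hyperbolic_distance(r1, t1, r2, t2):
    # Hyperbolic law of cosines on a disk of curvature -1.
    dtheta = math.pi - abs(math.pi - abs(t1 - t2))
    if dtheta == 0.0:
        return abs(r1 - r2)  # same ray: distance is the radial difference
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(arg, 1.0))  # guard against rounding below 1


def connection_probability(x, R, beta):
    # Assumed Fermi-Dirac form: p(x) = 1 / (1 + exp(beta * (x - R) / 2)).
    return 1.0 / (1.0 + math.exp(beta * (x - R) / 2.0))
```

Two nodes on opposite sides of the disk are separated by the sum of their radial coordinates, and a pair at distance exactly $R$ is connected with probability 1/2 under this form.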

In addition, note that the hidden metric space model and the hyperbolic geometric model of a network are equivalent under the transformation $r = R - 2\ln(\kappa/\kappa_0)$.
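
As a brief check of this equivalence, the following worked step assumes the commonly used forms of the two models: $p(\chi) = 1/(1+\chi^{\beta})$ with $\chi = N\Delta\theta/(2\pi\mu\kappa\kappa')$, the large-distance approximation $x \approx r + r' + 2\ln(\Delta\theta/2)$, the mapping $r = R - 2\ln(\kappa/\kappa_0)$, and the disk radius $R = 2\ln[N/(\mu\pi\kappa_0^2)]$. These forms, and in particular the expression for $R$, are assumptions taken from the standard hyperbolic network literature.

```latex
\begin{aligned}
e^{(x-R)/2} &\approx e^{(r+r'-R)/2}\,\frac{\Delta\theta}{2}
  = e^{R/2}\,\frac{\kappa_0^{2}}{\kappa\kappa'}\,\frac{\Delta\theta}{2} \\
  &= \frac{N}{\mu\pi\kappa_0^{2}}\,\frac{\kappa_0^{2}}{\kappa\kappa'}\,\frac{\Delta\theta}{2}
  = \frac{N\,\Delta\theta}{2\pi\mu\kappa\kappa'} = \chi, \\
\frac{1}{1+e^{\beta(x-R)/2}} &\approx \frac{1}{1+\chi^{\beta}}.
\end{aligned}
```

Under these assumptions, the Fermi–Dirac probability in hyperbolic distance and the connection probability in effective distance coincide up to the quality of the distance approximation.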

2.2.2. Interaction between Links with Direction and a Bipartite Structure

Directed links, as important linking features of the network model, are increasingly used to incorporate the dynamics of the evolution and node status of real systems. The asymmetry property increases the difficulty of network embedding. The asymmetry of links can be identified in the topological information of complex systems, that is, in a bipartite structure [33]. In a bipartite structure, unlike in the original directed network, the nodes fall into two nonoverlapping groups according to their role. Modifying the modeling methods of directed networks is therefore critical to overcoming these difficulties.

Nodes in directed networks can be split into two parts, and the two ends of a link come from different sets. Such a way of modeling is called the directed network with a bipartite structure, as shown in Figures 1(a) and 1(b). Specifically, each node $i$ in a directed network is represented by a node $i_A$ in set A and a node $i_B$ in set B in the bipartite structure. A directed link that goes from $i$ to $j$ ($i \to j$) is mapped to the link ($i_A$, $j_B$). By doing so, the number of nodes doubles, but this does not affect subsequent work, and the directed network is converted to a bipartite network. Additionally, the method reconstructs geometric directed networks by dividing nodes into two categories, but these two categories are not completely independent, and their intrinsic relationship is accounted for by the parameter $\beta$ during the modeling.
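
The transformation described above can be sketched as follows. The tuple labels `(node, "A")` and `(node, "B")` and the function names are illustrative choices, not notation from the paper.

```python
from collections import Counter


def to_bipartite(directed_edges):
    # Every node gets one copy in set A (link origin) and one in set B (link end).
    nodes = sorted({u for edge in directed_edges for u in edge})
    set_a = [(u, "A") for u in nodes]
    set_b = [(u, "B") for u in nodes]
    # A directed link i -> j becomes a link between i's A-copy and j's B-copy.
    links = [((i, "A"), (j, "B")) for i, j in directed_edges]
    return set_a, set_b, links


def bipartite_degrees(links):
    # Degree in set A equals out-degree; degree in set B equals in-degree.
    return Counter(a for a, _ in links), Counter(b for _, b in links)


edges = [(1, 2), (2, 3), (1, 3)]
set_a, set_b, links = to_bipartite(edges)
deg_a, deg_b = bipartite_degrees(links)
```

In this representation the out-degree and in-degree sequences of the directed network become the degree sequences of the two sets.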

Figure 1

Flow diagram of the directed network embedding algorithm. The algorithm steps and intermediate input/output are illustrated. (a, b) An illustration of transforming directed networks (a) to directed networks with a bipartite structure (b). Specifically, an asymmetric link in (a) can be expressed as a link between any pair of nodes from set A and set B in (b). (c) The main principle of our embedding method. The hidden metric model with the asymmetric popularity-similarity method is used to construct the embedding process, where the estimation and optimization methods (MLE and LMH) obtain the metric space coordinates of nodes. (d) The mapping achieves the representation and visualization of directed networks in the hyperbolic plane.

Since the bipartite structure has been discovered to be a hidden geometric property in the latent metric space [16], we also expect the origin node and the end node to be able to form directed links under the multiscale node multiplexing perspective. Relationships between direction and topology provide a powerful idea for spatial mapping and directed link prediction. Unlike in an ordinary bipartite network, the nodes in the two sets ($i_A$, $i_B$) are in one-to-one correspondence, and each represents a certain kind of attribute of node $i$. For directed networks, for example, in the case of international trade networks, countries with stronger export capabilities are more likely to become trading partners with ones that have a high level of import. To show this, we develop a new method for displaying the asymmetry of the adjacency matrix of networks in hyperbolic space with a hybrid of direction and topology.

2.2.3. A Geometric Model of a Directed Network in Hyperbolic Space

The directed network embedding model describes how generative geometric directed networks are embedded in hyperbolic space. The model multiplexes node information as an embedding foundation by identifying the bipartite structure of directed networks and considers the trade-off between node popularity, represented by the radial coordinate, and similarity, represented by the angular coordinate distance, to be the definition of the connection probability.

In the binary network embedding model, the connection probability is greater between nodes with higher popularity and greater similarity. Unlike binary networks, directed networks should be linked by trading off four types of forces, namely, the similarity and popularity in each set (set A or set B). Radial coordinates ($r^{out}$ and $r^{in}$) and angular coordinates ($\theta^{out}$ and $\theta^{in}$) represent the popularity and similarity of nodes, respectively. The connection of a pair of nodes in directed networks balances four forces: out-popularity $r^{out}$, out-similarity $\theta^{out}$, in-popularity $r^{in}$, and in-similarity $\theta^{in}$. The flow diagram of the directed network embedding algorithm is shown in Figure 1.

Computational implementations of the directed embedding model are as follows.

Step 1. Obtaining model parameters from directed networks with bipartite structures.
The scale-free property is ubiquitously observed in real-world networks, and hyperbolic geometry captures such features of complex networks. According to the principle that nodes multiplex information, the basic properties can be computed for each set (A and B): the average degree $\bar{k}$, the degree distribution $P(k)$, and the exponent $\gamma$ of the power-law distribution.
However, not all real-world networks have the directed scale-free property. Most networks are nevertheless heterogeneous in node degrees and weights, and this heterogeneity allows us to address the problem by finding the backbone structure of the complex network through link filtering [17, 34], which captures the power-law property in directed networks. In this paper, the backbone is constructed by deleting links one at a time in order of increasing weight until a threshold is reached. To find the appropriate value of the threshold, we plot the fraction of nodes remaining in the backbone vs. the fraction of remaining links for each deletion step. According to Ref. [9], the best choice of threshold is the point in the plane that maximizes the vertical distance to the diagonal, so that most nodes remain in the system and most links with small weight are removed.
Another significant parameter of our model is $\beta$. $\beta$ controls the clustering property of complex networks and is intractable due to its nontrivial dependence on the topological structure [16]. In the symmetric network embedding model, $\beta$ is estimated by comparing the real networks to synthetic networks generated with the model using different values of $\beta$ for several topological properties [9, 11, 35]. However, this method fails to compute the $\beta$ values of directed networks in experiments. Note that the embedding model splits each node into two sets, but these two sets should intuitively be dependent to a certain extent. Therefore, the $\beta$ in the directed network embedding model needs to take into account both the clustering property and the intrinsic relationship between the nodes of the two sets.
Common neighbors meet the conditions discussed above. On the one hand, the clustering definitions in bipartite structures are derived from common neighbors and can therefore represent an intrinsic relationship between the two sets, such as the quadrilateral-based clustering coefficient and the 4-loop density [5]. On the other hand, empirical results indicate that higher values of $\beta$ favor connections at smaller angular distances, and the number of common neighbors $m$ grows asymptotically as a power-law function of angular similarity [16]. Here, we use the number of common neighbors as the coarse-grained representation of the angular similarity to estimate $\beta$ (that is, in set A and in set B). Finally, we take the area under the curve (AUC) as a function of $\beta$ (namely, AUC($\beta$)) and determine the parameter $\beta$ from the mapping performance of our model. Specifically, one value of $\beta$ is used for the international trade, international migration, and email networks, and a different value is used for the C. elegans network.
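The backbone extraction described in Step 1 can be sketched as follows: links are removed in order of increasing weight, and the cut is placed where the fraction of remaining nodes exceeds the fraction of remaining links by the largest margin. This is a minimal sketch under the assumption that a node "remains" while it still has at least one link; the function name `backbone_threshold` and the edge format `(u, v, weight)` are hypothetical.

```python
def backbone_threshold(weighted_edges):
    # weighted_edges: list of (u, v, weight); returns (best_gap, weight_cut, kept_edges).
    edges = sorted(weighted_edges, key=lambda e: e[2])
    total_links = len(edges)
    degree = {}
    for u, v, _ in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    total_nodes = len(degree)
    remaining_nodes = total_nodes
    best_gap, best_cut, best_kept = 0.0, None, edges
    for removed, (u, v, w) in enumerate(edges, start=1):
        for x in (u, v):
            degree[x] -= 1
            if degree[x] == 0:
                remaining_nodes -= 1  # node lost its last link
        node_frac = remaining_nodes / total_nodes
        link_frac = (total_links - removed) / total_links
        gap = node_frac - link_frac  # vertical distance to the diagonal
        if gap > best_gap:
            best_gap, best_cut, best_kept = gap, w, edges[removed:]
    return best_gap, best_cut, best_kept


sample = [("a", "b", 10), ("b", "c", 9), ("c", "d", 8),
          ("a", "c", 1), ("a", "d", 2), ("b", "d", 3)]
gap, cut, kept = backbone_threshold(sample)
```

In this toy example the three light links are redundant, so removing them keeps every node while halving the number of links.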

Step 2. Embedding into hyperbolic space.
Geometric models—hyperbolic geometric models and hidden geometric models—have successfully captured the natural geometry underlying real complex networks. Similarly, a directed network embedding into hyperbolic space also has two equivalent models. Next, two geometric models of directed networks will be introduced.
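In the hyperbolic model of directed networks, scale-free directed networks are generated by scattering nodes randomly into a hyperbolic disk of radius $R$. Each node $i$ is assigned radial coordinates $(r_i^{out}, r_i^{in})$ and angular coordinates $(\theta_i^{out}, \theta_i^{in})$: $(r_i^{out}, \theta_i^{out})$ are in set A, and $(r_i^{in}, \theta_i^{in})$ are in set B. The connection probability ($p_{ij}$) of the directed link from $i$ to $j$ in the hyperbolic space represented by the Poincaré disk is defined by the Fermi–Dirac distribution: $p(x_{ij}) = 1/(1 + e^{\beta(x_{ij} - R)/2})$. The hyperbolic distance $x_{ij}$ between $i$ and $j$ is given by the hyperbolic law of cosines: $\cosh x_{ij} = \cosh r_i^{out} \cosh r_j^{in} - \sinh r_i^{out} \sinh r_j^{in} \cos\Delta\theta_{ij}$, where $\Delta\theta_{ij} = \pi - |\pi - |\theta_i^{out} - \theta_j^{in}||$. Note that the hyperbolic distance can be well approximated by $x_{ij} \approx r_i^{out} + r_j^{in} + 2\ln(\Delta\theta_{ij}/2)$.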
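In the hidden geometric model of directed networks, the connection probability between nodes can be any integrable function of the effective distance; here it is
$$p(\chi_{ij}) = \frac{1}{1 + \chi_{ij}^{\beta}}, \qquad (1)$$
where the effective distance between the nodes is $\chi_{ij} = N\Delta\theta_{ij}/(2\pi\mu\kappa_i^{out}\kappa_j^{in})$, and $\mu$ is a parameter of the model. $\kappa^{out}$ and $\kappa^{in}$ represent the expected degree of nodes in set A and set B, respectively. The expected degree of every node is drawn from the power-law distribution
$$\rho(\kappa) = (\gamma - 1)\kappa_0^{\gamma - 1}\kappa^{-\gamma}, \qquad (2)$$
where $\kappa \in \{\kappa^{out}, \kappa^{in}\}$, the minimum expected degree is $\kappa_0 = \bar{k}(\gamma - 2)/(\gamma - 1)$, $\bar{k}$ is the average degree, and $\gamma > 2$ is the power-law exponent of the corresponding set. Note that radial coordinates and expected degrees have an intrinsic relationship, namely $r_i^{out} = R - 2\ln(\kappa_i^{out}/\kappa_0^{out})$ and $r_i^{in} = R - 2\ln(\kappa_i^{in}/\kappa_0^{in})$. To facilitate solving for the parameters, we apply the hidden geometric model for directed network embedding in Step 3.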

Step 3. Spatial positions and parameter estimation.
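Given a snapshot of a real directed network consisting of $N$ nodes, we use the hidden geometric model to represent the directed network. We propose an asymmetric popularity-similarity optimization method to compute the radial (popularity) and angular (similarity) coordinates for each node $i$. In particular, the embedding process of asymmetric links aims to find the coordinates of every node such that the likelihood that the given directed network topology is generated by the model described above is maximal.
From a statistical perspective, inferring coordinates means finding the coordinates for which the hyperbolic model best matches a given adjacency matrix. The maximum likelihood estimation method has been widely applied to infer coordinates in hyperbolic embedding [9, 11, 36]. The hidden variables take particular values $\{\kappa_i, \theta_i\}$ in the network and can be found from the observed adjacency matrix $a_{ij}$ using Bayes' rule as follows:
$$\mathcal{L}(\{\kappa_i, \theta_i\} \mid a_{ij}) = \frac{\mathcal{L}(a_{ij} \mid \{\kappa_i, \theta_i\})\,\mathrm{Prob}(\{\kappa_i, \theta_i\})}{\mathcal{L}(a_{ij})}, \qquad (3)$$
where $\mathcal{L}(a_{ij})$ is the probability of the observed adjacency matrix under the model and does not depend on the hidden variables. The prior probability of the hidden variables is given by
$$\mathrm{Prob}(\{\kappa_i, \theta_i\}) = \prod_{i=1}^{N} \frac{\rho(\kappa_i^{out})}{2\pi}\,\frac{\rho(\kappa_i^{in})}{2\pi}, \qquad (4)$$
and the likelihood is
$$\mathcal{L}(a_{ij} \mid \{\kappa_i, \theta_i\}) = \prod_{i \neq j} p(\chi_{ij})^{a_{ij}} \left[1 - p(\chi_{ij})\right]^{1 - a_{ij}}, \qquad (5)$$
where $a_{ij}$ is the adjacency matrix; if there is a link from $i$ to $j$, $a_{ij} = 1$, otherwise $a_{ij} = 0$. The hidden variables are $\{\kappa_i^{out}, \theta_i^{out}; \kappa_i^{in}, \theta_i^{in}\}$, and $\chi_{ij}$ is the effective distance. The hidden coordinates are obtained by maximizing the likelihood in equation (3) or, equivalently, its logarithm:
$$\ln \mathcal{L} = C + \sum_{i} \left[\ln\rho(\kappa_i^{out}) + \ln\rho(\kappa_i^{in})\right] + \sum_{i \neq j} \left\{ a_{ij}\ln p(\chi_{ij}) + (1 - a_{ij})\ln\left[1 - p(\chi_{ij})\right] \right\}, \qquad (6)$$
where the constant $C$ is independent of $\kappa$ and $\theta$.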
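Next, node positions can be obtained from the likelihood function $\mathcal{L}$. First, inferring the radial coordinates is relatively easy. We derive the analytical solution by setting to zero the partial derivative of equation (6) with respect to the expected degree $\kappa_i^{out}$:
$$\frac{\partial \ln \mathcal{L}}{\partial \kappa_i^{out}} = 0 \;\Longrightarrow\; \sum_{j \neq i} p(\chi_{ij}) = k_i^{out} - \frac{\gamma^{out}}{\beta}, \qquad (7)$$
and analogously for $\kappa_i^{in}$, where $k_i^{out}$ is the observed out-degree of node $i$ and $\gamma^{out}$ is the power-law exponent of set A. Because the sum on the left is the expected out-degree of node $i$ in the model, we can obtain the parameter $\kappa_i$ (the expected degree of node $i$) from the observed degree as follows: $\kappa_i^{out} = k_i^{out} - \gamma^{out}/\beta$ and $\kappa_i^{in} = k_i^{in} - \gamma^{in}/\beta$.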
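A minimal sketch of this radial step is given below. It assumes the estimator $\kappa = \max(\kappa_0, k - \gamma/\beta)$ used for undirected embeddings and the mapping $r = R - 2\ln(\kappa/\kappa_0)$, applied separately to the out-degrees (set A) and in-degrees (set B); the clipping at $\kappa_0$ and the function name `infer_radial` are assumptions, not details confirmed by the text.

```python
import math


def infer_radial(degrees, gamma, beta, kappa0, R):
    # Assumed estimator: expected degree = observed degree - gamma / beta,
    # clipped from below at the minimum expected degree kappa0.
    kappas = [max(kappa0, k - gamma / beta) for k in degrees]
    # Assumed mapping from expected degree to radial coordinate.
    radii = [R - 2.0 * math.log(kappa / kappa0) for kappa in kappas]
    return kappas, radii


out_degrees = [1, 10, 100]
kappas_out, r_out = infer_radial(out_degrees, gamma=2.5, beta=2.0, kappa0=1.0, R=20.0)
```

Nodes with larger degree receive smaller radial coordinates, so hubs sit closer to the center of the disk.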
Similar to $\kappa$, the angular coordinates $\theta$ are obtained by maximizing the likelihood function in equation (6). However, it is difficult to obtain an analytical solution, and numerical methods have therefore been used to estimate angular coordinates, including the standard Metropolis-Hastings algorithm (SMH) [37] and the localized Metropolis-Hastings algorithm (LMH) [11]. Since the LMH method performs well in estimating $\theta$ and computes angular coordinates in a distributed manner without knowing the global network topology [11], we applied the LMH method to infer angular coordinates. The global log-likelihood in equation (6) can be written as a sum of local contributions, $\ln \mathcal{L} = C' + \sum_i \ln \mathcal{L}_i$, where $C'$ collects the terms that do not depend on the angular coordinates. In the LMH method, the local contribution of every node $i$ to the global log-likelihood is first defined as $\ln \mathcal{L}_i = \sum_{j \neq i} \{ a_{ij}\ln p(\chi_{ij}) + (1 - a_{ij})\ln[1 - p(\chi_{ij})] \}$. Next, nodes are visited one by one. When a particular node $i$ is visited, all other nodes keep their positions fixed, and the angular position of $i$ is moved to the position that maximizes the local log-likelihood at that visit. Candidate angular positions are sampled at fixed intervals over $[0, 2\pi]$.
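The node-by-node update can be illustrated with a simplified sketch. It is a greedy grid search over candidate angles that maximizes the local log-likelihood, in the spirit of LMH but not a Metropolis-Hastings sampler; it updates only the out-angles while the in-angles stay fixed, and it assumes $p(\chi) = 1/(1+\chi^{\beta})$ with $\chi_{ij} = N\Delta\theta_{ij}/(2\pi\mu\kappa_i^{out}\kappa_j^{in})$. All function names are hypothetical.

```python
import math


def delta_theta(t1, t2):
    return math.pi - abs(math.pi - abs(t1 - t2))


def log_term(chi, beta, linked):
    # ln p if linked else ln(1 - p), with p = 1 / (1 + chi**beta), computed stably.
    z = beta * math.log(chi)
    log_one_plus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # ln(1 + chi**beta)
    return -log_one_plus if linked else z - log_one_plus


def local_loglik(i, theta_i, theta_in, kappa_out, kappa_in, out_neighbors, mu, beta):
    n = len(theta_in)
    total = 0.0
    for j in range(n):
        if j == i:
            continue
        dth = max(delta_theta(theta_i, theta_in[j]), 1e-9)  # avoid log(0)
        chi = n * dth / (2.0 * math.pi * mu * kappa_out[i] * kappa_in[j])
        total += log_term(chi, beta, j in out_neighbors[i])
    return total


def sweep_out_angles(theta_out, theta_in, kappa_out, kappa_in, out_neighbors,
                     mu, beta, steps=360):
    # One pass over all nodes; each out-angle moves to the best grid position.
    grid = [2.0 * math.pi * k / steps for k in range(steps)]
    updated = list(theta_out)
    for i in range(len(updated)):
        updated[i] = max(grid, key=lambda t: local_loglik(
            i, t, theta_in, kappa_out, kappa_in, out_neighbors, mu, beta))
    return updated


theta_in = [2.5, 1.0, 4.0]
ones = [1.0, 1.0, 1.0]
result = sweep_out_angles([0.0, 0.0, 0.0], theta_in, ones, ones,
                          [{1}, set(), set()], mu=0.1, beta=2.0)
```

In this toy case node 0 links only to node 1, so its out-angle settles next to the in-angle of node 1 and away from that of node 2.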

3. Results

3.1. Validating the Asymmetric Popularity-Similarity Optimization Method

To assess how well the asymmetric popularity-similarity optimization (A-PSO) method performs, we examine the embedding accuracy by comparing the topology inferred by the A-PSO method to real-world directed networks. Experiments are conducted on the following networks: the C. elegans neural network, the international trade network, the email network, and the international migration network. Additionally, we analyze the robustness and performance of the embedding model by repeated experiments, and the results are stable. We take two measures: (1) comparing the topological structure of the synthetic networks to that of the original network in terms of the degree distribution, clustering coefficient distribution, and betweenness centrality distribution; and (2) performing a global test that compares the empirical connection probability computed from data with the theoretical prediction of equation (1) as a function of hyperbolic distance.

The first test of embedding accuracy compares properties of first-order neighborhoods and higher-order structure, namely degree, clustering coefficient, and betweenness centrality, according to standard practice. We show the cumulative distributions of degree, clustering, and betweenness in Figure 2 and observe a good match between the properties of the synthetic networks constructed by our model and those of the real directed networks.

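Figure 2

Panels (a–l): cumulative distributions of degree, clustering coefficient, and betweenness centrality for the real directed networks and the synthetic networks constructed by the model.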

The second check computes the connection probability from empirical network data and compares it with the theoretical prediction given by equation (1). The observed closeness of the empirical and theoretical connection probabilities in Figure 3 suggests that hyperbolic metric spaces are reasonable representations of the directed networks. In addition, we compute the area under the curve (AUC), which is widely used in the field of link prediction as an accuracy evaluation index. The AUC statistic measures how well our model can reproduce networks in hyperbolic space, and the best possible result corresponds to AUC = 1. In Table 1, the high AUC values of the embedding results suggest that directed networks can be reproduced in hyperbolic space, especially socioeconomic networks such as the international trade network and the email network, whose AUC values are well above 0.88.
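
For reference, the AUC used in link prediction can be computed as the probability that a randomly chosen existing link receives a higher score than a randomly chosen nonexistent link, with ties counted as one half. The sketch below uses this standard definition; scoring a node pair by its model connection probability (or by the negative hyperbolic distance) is an assumption about how the scores would be obtained here.

```python
def auc_score(link_scores, nonlink_scores):
    # Fraction of (link, non-link) pairs in which the link is ranked higher.
    wins = 0.0
    for s in link_scores:
        for t in nonlink_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(link_scores) * len(nonlink_scores))
```

The exhaustive comparison is quadratic in the number of pairs; for large networks the same quantity is usually estimated by random sampling of pairs.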

Figure 3

Empirical versus theoretical connection probability. The hyperbolic embedding of directed networks is successful, as the empirical connection probability closely matches the prediction of the hidden geometric model. (a–d) The results of the C. elegans neural network, international trade network, email network, and international migration network, respectively. The black dashed lines are the theoretical results of the connection probability, given by equation (1). Note that the parameters used in the theoretical curve are the same as those of the corresponding empirical network. The pink dots are the results of the empirical directed networks. The whole range of hyperbolic distances is binned, and each point is the average of the connection probabilities in its bin.
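
The binning described in the caption can be sketched as follows, assuming equal-width bins over the observed range of distances and one entry per ordered node pair; the function name and argument layout are hypothetical.

```python
def empirical_connection_probability(distances, linked, n_bins):
    # distances: hyperbolic distance of every ordered node pair
    # linked:    1 if the corresponding directed link exists, else 0
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all distances are equal
    counts = [0] * n_bins
    hits = [0] * n_bins
    for d, a in zip(distances, linked):
        b = min(int((d - lo) / width), n_bins - 1)
        counts[b] += 1
        hits[b] += 1 if a else 0
    # Return (bin center, fraction of connected pairs) for every non-empty bin.
    return [(lo + (b + 0.5) * width, hits[b] / counts[b])
            for b in range(n_bins) if counts[b]]


curve = empirical_connection_probability([0.0, 1.0, 2.0, 3.0], [1, 1, 0, 0], n_bins=2)
```

Plotting these points against the theoretical curve gives the kind of comparison shown in the figure.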

3.2. Linking the Hyperbolic Property and the Empirical Network

Using the A-PSO method for embedding directed networks into hyperbolic space in the above sections, we showed the association between the theoretical model and empirical networks. In this section, we first provide comparisons between hyperbolic properties (popularity, distance, and the core-periphery structure) and economic measures to show how they correlate monetary macroeconomic indicators and represent economic systems and their time evolution in hyperbolic space. Next, we analyze the relationship between disk partitions and real-function areas by comparing different measures of distance (similarity, real distance, and hyperbolic distance) in the C. elegans neural system.

3.2.1. Economic Systems in Hyperbolic Space

The international trade network is a complex system of states and their trade relations, and the aim of examining it is to understand international trade in terms of quantities and mechanisms [38–40]. The three key characteristics of the international trading system—globalization, stratification, and localization—lay the foundation for the geometric interpretation of the international trading system [9]. From the perspective of the directed network generation mechanism, closer countries have a greater chance of becoming connected by national status and similarity in the underlying trade space.

Through statistical inference techniques and the structural information of networks, a directed international trade network can be embedded in hyperbolic space. Such an embedding model uses two dimensions of national information for each country, import popularity and export popularity. Extracting a higher-dimensional national status measure can help to analyze how multitrade relations change world trade and even reshape the world trading system. However, international trade networks often lack the directed scale-free property; we use the link-filtering method described in Step 1 to address this problem, as shown in Figure 4(d).

Figure 4

The embedding results of international trade networks. (a, b) The visualization of the international trade network on the Poincaré disk. To facilitate the display of the position of the nodes, we highlight the origin nodes and end nodes. (a) Origin nodes are highlighted, and yellow nodes on the edge are end nodes so that the popularity and similarity of origin nodes in the disc are visible. (b) The end nodes are highlighted. (c) The ranking evolution over time of the centrality of major countries as measured by the distance from the center of the disc. (d) The curve denotes the fraction of nodes remaining in the backbone network vs. the fraction of links remaining in it. The pink dot represents the appropriate threshold in the extraction process.

To study this issue further, we apply our model to visualize international trade systems and infer the popularity and similarity properties ($r^{out}$, $\theta^{out}$, $r^{in}$, $\theta^{in}$) of nodes, according to 20 years of international trade data (1996–2016). Recent studies have discussed the correlation between radial coordinates, node degrees, and the national economic scale [38, 39]. Similarly, radial coordinates and GDP have a significant negative correlation (approximately −0.5), which indicates that the spatial position of countries can be used to identify the national economic size, especially the country’s export capacity (the correlation coefficient of imports and GDP is higher than that of exports). That is, export status can be used as a coarse-grained measure of the national economy size. Moreover, the correlation coefficient between angular distance and geographical distance is not significant (approximately 0.1), which indicates that the international trade network does not have geographical clustering in the hyperbolic space. The core-periphery structure of the hyperbolic network shows that only approximately 35% of its nodes are in the center position of the hyperbolic space.

The long-term evolution of the international trade system based on the network hyperbolic embedding method is shown in Figure 4(c). From the perspective of exporting ability, it presents world energy commodity trade as an imbalanced, diversified, and multipolar development. The United States and Russia have always occupied a central position, reflecting the fact that energy resources are the decisive factor in exporting capacity. In particular, with the depletion of the North Sea oil field, the UK’s export position in energy trade has been gradually marginalized. Interestingly, Saudi Arabia’s energy export status has gradually been marginalized due to the change in the direction of Saudi energy policy. Asia, Africa, and the European continent have become active areas and injected new vitality into the energy trade market. The European Community, China, and India have moved to central positions following the increase of their importing dominance. While India has moved towards a more central position of superpower status during the last few years, the United States, which is a leader among trade superpowers, has been at the core of international trade.

3.2.2. Complex Ecosystems in Hyperbolic Space

Caenorhabditis elegans is a soil-dwelling nematode that is evolutionarily rudimentary. It contains approximately 300 neurons, and neural interconnections are made via chemical synapses and gap junctions. Despite a century of investigation, knowledge of nematode neuronal networks is incomplete [41, 42]. Here, we use a directed hyperbolic network embedding framework and the latent geometry of neurons as the entry point to provide a new perspective for studying the topology and visualization of neural networks.

The C. elegans neural network is embedded in the hyperbolic space with the geometric features of the neurons and geometric distance (hyperbolic distance), and the topology is found to conform to the simple and powerful probability-based linking rule of hyperbolic embedding. We compare the embedded results with the real positions of neurons and the geometric distances: the position distance of neurons is the relative position between the neurons, and the geometric distances are the hyperbolic distance and the angular distance. However, the hyperbolic distance in our model is asymmetric, which makes it difficult to use in some real tasks, such as node clustering analysis and node centrality computation. To further compare distances in different metric spaces, we define a symmetric distance between each pair of nodes. The results are shown in Figures 5(a) and 5(b). From the results, we find that the angular distance is similar to the position distance. The difference between the hyperbolic distance and the position distance, on the contrary, is larger. This shows not only that the effective distance between neurons includes dimensions other than position distance but also that the effective distance of the nervous system is the result of a nonlinear hybrid of topological information and spatial information.

Figure 5

The result of mapping the C. elegans neural network into hyperbolic space. (a, b) The relationships among the hyperbolic distance, the angular distance, and the spatial distance. The abscissa is the distance between nodes, and the ordinate is the number of node pairs at that distance. (c) The value of the partition index as a function of the starting point of the partition, used to find the optimal partition of the disc. The blue and pink lines show the index calculated from the angular distance and the hyperbolic distance, respectively.

Embedding in hyperbolic space reproduces the clustering, small-world, scale-free, and rich-club properties of real networks. Another commonly observed feature of complex networks is community structure, in which links within a community are dense and links between communities are sparse [43]. Because the connection probability is a decreasing function of the hyperbolic distance, the model has no angular regions in which a cluster of spatially close nodes is more densely connected internally than to the rest of the network. Nevertheless, the angular distance on the hyperbolic disc indicates the similarity of nodes, and a partition of the disc is a potential module defined in geometric space [13]. To obtain the community structure of the neural network once the nodes have settled coordinates, we define the best partition of the Poincaré disc as the one that minimizes the average distance from one partition to the rest.

Since there are 10 neural functional areas, we divide the hyperbolic disc into 10 sectors of 36 degrees each and compare them with the real neural functional areas. Note that the angular coordinates of nodes are distributed in [0, 2π] and that the choice of starting position matters, since it can change the resulting partition. To solve this problem, we define an index, the ratio of the distances inside a community (a zone of the Poincaré disc) to the external distances (to the rest of the zones), to describe the optimal partition. The index is computed from the number of partitions n and the average distances between nodes inside and outside each community. Its range is [0, 1]; the closer it is to 1, the better the community division.
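The scan over starting positions can be sketched in a few lines of Python. This is a minimal sketch, not the procedure used for Figure 5(c): the score below (one minus the ratio of the mean within-sector distance to the mean between-sector distance, clipped to [0, 1]) is a hypothetical stand-in for the partition index, only the angular distance is used, and the function names and the number of scanned offsets `n_steps` are illustrative choices.

```python
import math
from itertools import combinations

TWO_PI = 2 * math.pi


def angular_distance(t1, t2):
    """Angular separation of two angles, folded into [0, pi]."""
    return math.pi - abs(math.pi - abs(t1 - t2) % TWO_PI)


def sector_labels(thetas, n_sectors, offset):
    """Assign each angle to one of n_sectors equal sectors starting at `offset`."""
    width = TWO_PI / n_sectors
    # min(...) guards against the modulo returning exactly 2*pi by rounding.
    return [min(int(((t - offset) % TWO_PI) // width), n_sectors - 1)
            for t in thetas]


def partition_score(thetas, labels):
    """HYPOTHETICAL index: 1 - (mean intra distance / mean inter distance)."""
    intra, inter = [], []
    for i, j in combinations(range(len(thetas)), 2):
        d = angular_distance(thetas[i], thetas[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    if not intra or not inter or sum(inter) == 0:
        return 0.0
    ratio = (sum(intra) / len(intra)) / (sum(inter) / len(inter))
    return min(1.0, max(0.0, 1.0 - ratio))


def best_offset(thetas, n_sectors, n_steps=360):
    """Scan starting positions within one sector width and keep the best score."""
    width = TWO_PI / n_sectors
    candidates = [k * width / n_steps for k in range(n_steps)]
    return max(candidates,
               key=lambda o: partition_score(
                   thetas, sector_labels(thetas, n_sectors, o)))


# Toy example: two well-separated angular groups, split into two half-discs.
thetas = [0.1, 0.2, 0.3, 3.2, 3.3, 3.4]
offset = best_offset(thetas, n_sectors=2)
labels = sector_labels(thetas, 2, offset)
print(offset, labels, partition_score(thetas, labels))
```

For the 10 functional areas discussed above, the same call would use `n_sectors=10`, so that each sector spans 36 degrees and the offset is scanned within a single sector width.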

In this way, we obtain the optimal partition of hyperbolic space by trying different starting positions for the partition, taking the minimum distance between nodes as the reference for the initial partition span, and maximizing the index. We considered both the hyperbolic distance and the angular distance as the basis for the calculation; the results are shown in Figure 5(c). Because neuron function depends strongly on spatial position, the mutual information between the community structure and the functional partition is very low. An interesting finding is that some neurons with high out-directed popularity rankings, such as RIAL, RIAR, SAAVR, RMDR, and SMDVR, belong to the same neurological functional area, the lateral ganglion, but fall into different partitions in the hyperbolic space. This indicates that there are many long-range links among the neurons of C. elegans. For example, the neuron RIAL, in the front, and VD12, in the trailing region, are close in angular distance in the hyperbolic disc, which is also why the topological partition is not consistent with the functional partition. The classification obtained from the hyperbolic embedding thus complements the functional partitioning.

4. Discussion

In this work, we developed a mapping model of directed networks in hyperbolic space and highlighted the bipartite structure of directed networks. We especially focused on two main issues: (1) how to identify asymmetrical links from topological information, and (2) how to embed directed networks into the hyperbolic space and whether it is feasible to use empirical data to test the model. The results show that the directed links are hidden in the topological information in a nontrivial way. Based on empirical data, we mapped some real directed networks into hyperbolic space, including economic systems and biological ecosystems. We found that our method can reveal the topological features of directed networks, such as the degree of nodes, degree distribution, clustering coefficient, and core-periphery structure.

Furthermore, we analyzed the importance of nodes and their evolutionary rules through visualization. The results show that the positions of nodes in the hyperbolic space can be used to quantify their importance, and that the change in a node's position is consistent with the evolution of the corresponding country's status in the economic system. More importantly, similar phenomena are observed in both socioeconomic systems and nervous systems: there are significant discrepancies between spatial aggregation and the community structure of the topology. This is reflected in two ways. First, the mutual information between the community structure in hyperbolic space and the spatial distribution of nodes is particularly low. Second, the Pearson correlation coefficient between the hyperbolic distance and the true distance between nodes is not significant. Long-range links weaken the spatial agglomeration of system functions, which means that the hyperbolic embedding reflects the clustering of system functions. The effective distance of the network results from a trade-off between topological information and geometric features.

It should be reiterated that, unlike an undirected network, a directed network is constructed from the trade-off among the four coordinates of each node. Undirected network embedding is based only on the trade-off between popularity and similarity, which may bias the prediction of network links. Consider an extreme case: if two nodes have large out-degrees and in-degrees of 0, there can be no link between them, yet an undirected embedding assigns them a large probability of being linked. Last but not least, asymmetric links make embedding more difficult, which is reflected to some extent in the embedding accuracy. Recent work in network geometry suggests that the community structure can be used as a coarse version of a network's embedding in a hidden space with hyperbolic geometry [44, 45]. From the perspective of mesoscale structure, the contribution of community structure should be incorporated into the embedding model in future studies, which may improve the embedding accuracy and reduce the algorithmic complexity.

Appendix

In this section, we further discuss the performance of the embedding method in estimating angular coordinates. The aim is to check, by generating synthetic directed networks, whether the inferred angular coordinates are close to the real ones. Specifically, synthetic directed networks are created by following the A-PSO model, and the inferred angular coordinates are estimated from each synthetic network using the embedding method. The generation process can be summarized as follows:

Step 1: the expected degree is derived from the pdf and obtained by Monte Carlo simulation
Step 2: the angular coordinates are sampled uniformly at random from [0, 2π]
Step 3: the synthetic directed network is constructed from the connection probability and the model parameters
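The three steps can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the A-PSO specification: the power-law form of the expected-degree pdf and its inverse-transform sampler, the mapping from expected degree to radial coordinate, the disc radius 2 ln n, the Fermi-Dirac connection probability (the form commonly used in hyperbolic network models), and the pairing of a source's out-coordinates with a target's in-coordinates are all choices made for the example, as are the parameter names `gamma`, `k_min`, and `temperature`.

```python
import math
import random

TWO_PI = 2 * math.pi


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Hyperbolic law of cosines (curvature -1)."""
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2) % TWO_PI)
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(arg, 1.0))


def sample_expected_degree(rng, k_min, gamma):
    """Step 1 (ASSUMED pdf): Monte Carlo draw from p(k) ~ k^(-gamma), k >= k_min."""
    u = 1.0 - rng.random()  # uniform in (0, 1]
    return k_min * u ** (-1.0 / (gamma - 1.0))


def generate_directed_network(n, gamma=2.5, k_min=2.0, temperature=0.5, seed=0):
    rng = random.Random(seed)
    disc_radius = 2.0 * math.log(n)  # ASSUMED disc radius

    def radius(k):
        # ASSUMED mapping: larger expected degree -> closer to the centre.
        return max(0.0, disc_radius - 2.0 * math.log(k / k_min))

    coords = []
    for _ in range(n):
        coords.append({
            "r_out": radius(sample_expected_degree(rng, k_min, gamma)),
            "r_in": radius(sample_expected_degree(rng, k_min, gamma)),
            # Step 2: angles uniform at random in [0, 2*pi).
            "theta_out": rng.random() * TWO_PI,
            "theta_in": rng.random() * TWO_PI,
        })

    # Step 3: draw each directed link i -> j with an ASSUMED Fermi-Dirac probability.
    edges = []
    for i, src in enumerate(coords):
        for j, dst in enumerate(coords):
            if i == j:
                continue
            x = hyperbolic_distance(src["r_out"], src["theta_out"],
                                    dst["r_in"], dst["theta_in"])
            z = min((x - disc_radius) / (2.0 * temperature), 700.0)  # avoid overflow
            if rng.random() < 1.0 / (1.0 + math.exp(z)):
                edges.append((i, j))
    return coords, edges


coords, edges = generate_directed_network(50, seed=1)
print(len(coords), "nodes,", len(edges), "directed links")
```

The real angular coordinates stored in `coords` are what the inferred coordinates would be compared against after the synthetic network has been passed through the embedding method.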

The inferred angular coordinates are then computed by the embedding method from the topology of the synthetic directed networks, and compared with the real ones in a scatter plot [11]. The coordinates of set B are estimated well, as shown in Figure 6(a): the scatter points gather near two lines, which can be spliced into a single line after a translation along the X-axis. The scattered points in Figure 6(b), however, are disordered, which shows that the estimation error of the node angles in set A is large. This result is most likely related to the angle estimation procedure and to the challenge posed by asymmetry.

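Figure 6

Inferred versus real angular coordinates in synthetic directed networks. (a) Nodes in set B. (b) Nodes in set A.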
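The "two lines that can be spliced into one after a translation" pattern is what a global rotation of the angular coordinates looks like in a scatter plot, because angles wrap around at 2π. A minimal Python sketch of how such a rotation could be estimated and removed before measuring the error is given below. The circular-mean estimator and the function names are illustrative assumptions and are not taken from the embedding method; a possible mirror reflection of the angles is not handled.

```python
import math

TWO_PI = 2 * math.pi


def circular_distance(a, b):
    """Angular separation of two angles, folded into [0, pi]."""
    return math.pi - abs(math.pi - abs(a - b) % TWO_PI)


def best_rotation(real, inferred):
    """Circular mean of the differences real - inferred, in [0, 2*pi)."""
    s = sum(math.sin(t - p) for t, p in zip(real, inferred))
    c = sum(math.cos(t - p) for t, p in zip(real, inferred))
    return math.atan2(s, c) % TWO_PI


def align(real, inferred):
    """Rotate the inferred angles onto the real ones and report the mean error."""
    shift = best_rotation(real, inferred)
    aligned = [(p + shift) % TWO_PI for p in inferred]
    error = sum(circular_distance(t, p) for t, p in zip(real, aligned)) / len(real)
    return shift, aligned, error


# Toy data: the inferred angles equal the real ones rotated by 1.3 rad.
real = [0.1, 1.0, 2.5, 4.0, 6.0]
inferred = [(t - 1.3) % TWO_PI for t in real]
shift, aligned, error = align(real, inferred)
print(shift, error)
```

A small mean error after alignment corresponds to the well-estimated case, whereas a disordered scatter would leave a large residual error whatever rotation is applied.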

Data Availability

The data used in this article are from public databases and can be downloaded from the relevant website. Code is available from the authors at [email protected].

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research was supported by the National Natural Science Foundation of China (NSFC, Grant nos. 71731002 and 61573065) and BNU Interdisciplinary Research Foundation for the First-Year Doctoral Candidates (Grant BNUXKJC1921).

References

[1] P. Cui, X. Wang, J. Pei, and W. Zhu, “A survey on network embedding,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 5, pp. 833–852, 2019.
[2] M. Á. Serrano, D. Krioukov, and M. Boguñá, “Self-similarity of complex networks and hidden metric spaces,” Physical Review Letters, vol. 100, no. 7, Article ID 078701, 2008.
[3] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, “Hyperbolic geometry of complex networks,” Physical Review E, vol. 82, no. 3, Article ID 036106, 2010.
[4] D. Krioukov, F. Papadopoulos, A. Vahdat, and M. Boguñá, “Curvature and temperature of complex networks,” Physical Review E, vol. 80, no. 3, Article ID 035101, 2009.
[5] D. Krioukov, “Clustering implies geometry in networks,” Physical Review Letters, vol. 116, no. 20, Article ID 208302, 2016.
[6] F. Papadopoulos, M. Kitsak, M. Á. Serrano, M. Boguñá, and D. Krioukov, “Popularity versus similarity in growing networks,” Nature, vol. 489, no. 7417, pp. 537–540, 2012.
[7] A. Cacciola, A. Muscoloni, V. Narula et al., “Coalescent embedding in the hyperbolic space unsupervisedly discloses the hidden geometry of the brain,” 2017, https://arxiv.org/abs/1705.04192.
[8] C. Seguin, M. P. Van Den Heuvel, and A. Zalesky, “Navigation of brain networks,” Proceedings of the National Academy of Sciences, vol. 115, no. 24, pp. 6297–6302, 2018.
[9] G. García-Pérez, M. Boguñá, A. Allard, and M. Á. Serrano, “The hidden hyperbolic geometry of international trade: world trade atlas 1870–2013,” Scientific Reports, vol. 6, p. 33441, 2016.
[10] R. Kleinberg, “Geographic routing using hyperbolic space,” in Proceedings of the INFOCOM 2007. 26th IEEE International Conference on Computer Communications, pp. 1902–1909, IEEE, Barcelona, Spain, May 2007.
[11] M. Boguñá, F. Papadopoulos, and D. Krioukov, “Sustaining the internet with hyperbolic mapping,” Nature Communications, vol. 1, p. 62, 2010.
[12] J. Zhang, “Greedy forwarding for mobile social networks embedded in hyperbolic spaces,” in Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM (SIGCOMM’13), vol. 43, pp. 555-556, ACM, Hong Kong, China, August 2013.
[13] M. Á. Serrano, M. Boguñá, and F. Sagués, “Uncovering the hidden geometry behind metabolic networks,” Molecular Biosystems, vol. 8, no. 3, pp. 843–850, 2012.
[14] G. Alanis-Lobato, P. Mier, and M. Andrade-Navarro, “The latent geometry of the human protein interaction network,” Bioinformatics, vol. 1, p. 9, 2018.
[15] A. Allard, M. Á. Serrano, G. García-Pérez, and M. Boguñá, “The geometric nature of weights in real complex networks,” Nature Communications, vol. 8, Article ID 14103, 2017.
[16] M. Kitsak, F. Papadopoulos, and D. Krioukov, “Latent geometry of bipartite networks,” Physical Review E, vol. 95, no. 3, Article ID 032309, 2017.
[17] K.-K. Kleineberg, M. Boguñá, M. Á. Serrano, and F. Papadopoulos, “Hidden geometric correlations in real multiplex networks,” Nature Physics, vol. 12, no. 11, pp. 1076–1081, 2016.
[18] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[19] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, ACM, New York, NY, USA, August 2014.
[20] P. Goyal and E. Ferrara, “Graph embedding techniques, applications, and performance: a survey,” Knowledge-Based Systems, vol. 151, pp. 78–94, 2018.
[21] Y. Shi, Q. Zhu, F. Guo, C. Zhang, and J. Han, “Easing embedding learning by comprehensive transcription of heterogeneous information networks,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2190–2199, London, UK, August 2018.
[22] H. Yin, L. Zou, Q. V. H. Nguyen, Z. Huang, and X. Zhou, “Joint event-partner recommendation in event-based social networks,” in Proceedings of the IEEE 34th International Conference on Data Engineering (ICDE), pp. 929–940, IEEE, Paris, France, April 2018.
[23] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu, “Asymmetric transitivity preserving graph embedding,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16), pp. 1105–1114, ACM, San Francisco, CA, USA, August 2016.
[24] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line: large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web (WWW’15), pp. 1067–1077, Florence, Italy, May 2015.
[25] H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W.-C. Peng, and X. Li, “Exploiting centrality information with graph convolutions for network representation learning,” in Proceedings of the IEEE 35th International Conference on Data Engineering (ICDE), pp. 590–601, IEEE, Macao, China, April 2019.
[26] H. Chen, H. Yin, W. Wang, H. Wang, Q. V. H. Nguyen, and X. Li, “Pme: projected metric embedding on heterogeneous networks for link prediction,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1177–1186, London, UK, August 2018.
[27] Y. Wang, H. Yin, H. Chen, T. Wo, J. Xu, and K. Zheng, “Origin-destination matrix prediction via graph convolution: a new perspective of passenger demand modeling,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1227–1235, Anchorage, AK, USA, August 2019.
[28] R. Guimerà, M. Sales-Pardo, and L. A. N. Amaral, “Module identification in bipartite and directed networks,” Physical Review E, vol. 76, no. 3, Article ID 036102, 2007.
[29] R. Michalski, S. Palus, and P. Kazienko, “Matching organizational structure and social network extracted from email communication,” in Lecture Notes in Business Information Processing, vol. 87, pp. 197–206, Springer, Berlin, Germany, 2011.
[30] J. Kunegis, “KONECT: the Koblenz network collection,” in Proceedings of the 22nd International Conference on World Wide Web (WWW’13 Companion), pp. 1343–1350, Rio de Janeiro, Brazil, May 2013.
[31] J. G. White, E. Southgate, J. N. Thomson, and S. Brenner, “The structure of the nervous system of the nematode Caenorhabditis elegans,” Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, vol. 314, no. 1165, pp. 1–340, 1986.
[32] V. Nekrashevych, Self-Similar Groups, vol. 117, American Mathematical Society, Providence, RI, USA, 2005.
[33] J. G. Foster, D. V. Foster, P. Grassberger, and M. Paczuski, “Edge direction and the structure of networks,” Proceedings of the National Academy of Sciences, vol. 107, no. 24, pp. 10815–10820, 2010.
[34] M. Á. Serrano, M. Boguñá, and A. Vespignani, “Extracting the multiscale backbone of complex weighted networks,” Proceedings of the National Academy of Sciences, vol. 106, no. 16, pp. 6483–6488, 2009.
[35] F. Papadopoulos, R. Aldecoa, and D. Krioukov, “Network geometry inference using common neighbors,” Physical Review E, vol. 92, no. 2, Article ID 022807, 2015.
[36] F. Papadopoulos, C. Psomas, and D. Krioukov, “Network mapping by replaying hyperbolic growth,” IEEE/ACM Transactions on Networking, vol. 23, no. 1, pp. 198–211, 2014.
[37] M. Newman and G. Barkema, Monte Carlo Methods in Statistical Physics, chapters 1–4, Oxford University Press, New York, NY, USA, 1999.
[38] M. Á. Serrano and M. Boguñá, “Topology of the world trade web,” Physical Review E, vol. 68, no. 1, Article ID 015101, 2003.
[39] D. Garlaschelli and M. I. Loffredo, “Fitness-dependent topological properties of the world trade web,” Physical Review Letters, vol. 93, no. 18, Article ID 188701, 2004.
[40] A. Almog, T. Squartini, and D. Garlaschelli, “A GDP-driven model for the binary and weighted structure of the international trade network,” New Journal of Physics, vol. 17, no. 1, Article ID 013009, 2015.
[41] L. R. Varshney, B. L. Chen, E. Paniagua, D. H. Hall, and D. B. Chklovskii, “Structural properties of the Caenorhabditis elegans neuronal network,” PLoS Computational Biology, vol. 7, no. 2, Article ID e1001066, 2011.
[42] G. Yan, P. E. Vértes, E. K. Towlson et al., “Network control principles predict neuron function in the Caenorhabditis elegans connectome,” Nature, vol. 550, no. 7677, pp. 519–523, 2017.
[43] M. E. J. Newman, “Detecting community structure in networks,” The European Physical Journal B: Condensed Matter, vol. 38, no. 2, pp. 321–330, 2004.
[44] A. Muscoloni and C. V. Cannistraci, “A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities,” New Journal of Physics, vol. 20, no. 5, Article ID 052002, 2018.
[45] A. Faqeeh, S. Osat, and F. Radicchi, “Characterizing the analogy between hyperbolic embedding and community structure of complex networks,” Physical Review Letters, vol. 121, no. 9, Article ID 098301, 2018.

Copyright

Copyright © 2020 Zongning Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
