Abstract

Interactions between two different guilds of entities are pervasive in biology. They may happen at the molecular level, as in a diseasome, or amongst individuals linked by biotic relationships, such as mutualism or parasitism. These sets of interactions are complex bipartite networks. Visualization is a powerful tool to explore and analyze them, but the most common plots, the bipartite graph and the interaction matrix, become confusing when working with real biological networks. We have developed two new types of visualization that exploit the structural properties of these networks to improve readability. A technique called k-core decomposition identifies groups of nodes that share connectivity properties. With the results of this analysis it is possible to build one plot based on information reduction (polar plot) and another that takes the groups as elementary blocks for spatial distribution (ziggurat plot). We describe the applications of both plots and the software to create them.

1. Introduction

Network science is a powerful tool for biological research across all scales: molecular [1-3], genetic [4-6], individual [7, 8], and community [9, 10]. The conceptual framework is valid for them all, and this fact has fostered both theoretical and applied developments. An important subset of biological networks is bipartite. These networks have two different classes of nodes (guilds). Each node may be tied to nodes of the opposite guild but never to its peers.

Gene-protein, host-pathogen, and predator-prey interactions are the basis of bipartite biological networks. A common structural property of these networks is the core-periphery organization [11-13]. This fact is well known in ecology. In mutualistic communities there is one group of very interconnected nodes, the generalists, that provide stability and resilience [14]. Species with a low number of links (low degree), the specialists, are tied to those generalists. This property is called nestedness, and there are different indexes to measure its strength [15]. Another important structural feature is modularity, which accounts for the existence of small groups of nodes that are densely linked among themselves inside a sparsely connected network [16].

In many cases the issue of interest is not the generalization of the network properties but the study of a particular system itself. In fields that deal with complex systems, scientists are often more interested in finding special relationships or understanding the role of a specific node than in statistical properties. A detailed, qualitative rather than quantitative analysis of the relationships in a complex network may be more useful for some researchers in medicine, biology, sociology, or even economics. Visualization may play an important role in network analysis as an interface between data and people [17-19].

The range of possible applications is wide [20]. For instance, with a good network plot a field ecologist could identify the central species of a community and those most endangered. A clinical researcher may detect anomalies in complex gene-protein associations. Visualization is an essential procedure in the exploratory stage [21] and requires fast and interactive applications able to disentangle structure. General purpose network analysis applications like Gephi [22] are a good choice for a quick overview, but plotting bipartite layouts is not their primary purpose. Although a lot of effort has been put into their development, tools designed for bipartite biological visualization are still scarce [23-28].

The most common plots in the literature are the bipartite graph and the interaction matrix, two ways to visualize a bipartite network of any kind. In the bipartite graph, nodes of both classes are plotted along parallel lines. Interactions appear as links amongst them (Figure 1(a)). On the one hand, it is quite simple, as it makes the separation of guilds clear. On the other hand, it is not easy to follow indirect interactions, those between two nodes of the same class linked by a common node of the opposite class. They are not very relevant in affiliation networks (journals-authors, movies-actors) [34] but are extremely important in many biological networks. They create feedback loops that increase complexity and eventually give rise to emergent properties [35, 36].

For networks with more than 75 nodes, the bipartite plot becomes extremely confusing. It is hard to distinguish individual links and impossible to follow indirect interactions. Accumulation of links in the space between guilds creates what is known as the hairball effect [37], but the main shortcoming of the bipartite plot is that it does not show the hierarchical organization of the network.

In the interaction matrix, nodes of one guild are arranged along rows and nodes of the opposite guild along columns. A filled cell marks the interaction between two species (Figure 1(b)). With the interaction matrix it is possible to visually discover patterns of nestedness and modularity, so it is more expressive in the representation of structure. On the other hand, indirect interactions are even less apparent than in the bipartite plot. The matrix also becomes difficult to interpret when the number of nodes and links grows.
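The relationship between the interaction matrix and indirect interactions can be made concrete with a few lines of Python. This is only an illustrative sketch: the toy community, the species names, and the function `indirect_interactions` are assumptions introduced here and are not part of the software described in this paper.

```python
from itertools import combinations

# Hypothetical toy community: rows are plants, columns are pollinators.
# A cell greater than zero means that the two species interact.
plants = ["plant1", "plant2", "plant3"]
pollinators = ["pol1", "pol2"]
matrix = [
    [1, 1],  # plant1 is visited by pol1 and pol2
    [1, 0],  # plant2 is visited by pol1 only
    [0, 1],  # plant3 is visited by pol2 only
]

def indirect_interactions(matrix, row_names, col_names):
    """Map each pair of row species to the column species they share."""
    shared = {}
    for (i, a), (j, b) in combinations(enumerate(row_names), 2):
        common = [col_names[c] for c in range(len(col_names))
                  if matrix[i][c] > 0 and matrix[j][c] > 0]
        if common:
            shared[(a, b)] = common
    return shared

pairs = indirect_interactions(matrix, plants, pollinators)
```

In this toy example plant1 interacts indirectly with plant2 through pol1 and with plant3 through pol2, whereas plant2 and plant3 share no pollinator. Neither classical plot makes these relationships easy to read once the network grows.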

To overcome the drawbacks of the bipartite graph and the interaction matrix, there are two possible strategies: reducing information or taking advantage of known network traits to order nodes and links in space. In this paper, we explain how structural properties of bipartite biological networks are the basis of two new types of visualization. Both rely on a classical technique called k-core decomposition [38]. We also describe an interactive application to plot them.

2. Plots

The rationale behind this research is that as biological networks are not random, this fact should provide a natural way to group nodes using their topological properties. These groups must be the basis for a spatial distribution that minimizes the hairball effect and, in addition, makes structural sense.

The k-core decomposition is a fast and efficient technique to cluster nodes by their connectivity properties [39, 40]. The k-core of a graph is a maximal connected subgraph in which every node has degree at least k. Each node of the core of order k has links with at least k other nodes that belong to that same core. The k-shell is the set of nodes that belong to the k-core but not to the (k+1)-core, and k is the k-index of those nodes. The practical implication of this definition is that nodes are classified according to their connectivity. The innermost shell is the set of nodes with the highest k-index. Nodes with higher degrees are the generalists. As the k-index decreases, nodes become more specialist. The usual way to identify the subsets is the pruning algorithm: one starts pruning the nodes with just one link, recursively. This subset of nodes constitutes the 1-shell. The remaining nodes are tied by at least two links. In the next step one extracts the nodes left with two links or fewer, also recursively; this subset is the 2-shell. And so on. This procedure helps to recognize how the nodes of the k-shell are tied to the network. We refer to [41] for further details on the k-core analysis of bipartite networks.
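The pruning algorithm can be sketched in a few lines of Python. This is a minimal illustration and not the code of the software described later; the helper names `build_adjacency` and `k_shell_decomposition` and the toy network are assumptions made for this example. Isolated nodes, which have no links at all, receive index 0 here.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Turn a list of (node, node) links into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)

def k_shell_decomposition(adjacency):
    """Return {node: k-index} using the recursive pruning algorithm."""
    degree = {node: len(nbs) for node, nbs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Prune every node left with k links or fewer; removing a node may
        # push its neighbours below the threshold, so the pruning cascades.
        to_prune = [node for node in remaining if degree[node] <= k]
        while to_prune:
            node = to_prune.pop()
            if node not in remaining:
                continue  # already pruned through another neighbour
            remaining.discard(node)
            shell[node] = k
            for nb in adjacency[node]:
                if nb in remaining:
                    degree[nb] -= 1
                    if degree[nb] <= k:
                        to_prune.append(nb)
        k += 1
    return shell

# Hypothetical toy network: plants p1, p2 and pollinators a1, a2, a3.
edges = [("p1", "a1"), ("p1", "a2"), ("p2", "a1"), ("p2", "a2"), ("p1", "a3")]
shells = k_shell_decomposition(build_adjacency(edges))
```

In this toy network the pollinator a3 has a single link and falls in the 1-shell, while the remaining four nodes keep two links each amongst themselves and form the 2-shell.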

As a result of the analysis we define two magnitudes. The first one is $k_{radius}$. The $k_{radius}$ of node $m$ of guild $A$ is the average distance to all nodes of the innermost shell of guild $B$ (set $C^{B}$):

$$k^{A}_{radius}(m) = \frac{1}{|C^{B}|} \sum_{j \in C^{B}} dist_{mj}, \quad m \in A$$

where $dist_{mj}$ is the shortest path from node $m$ of guild $A$ to node $j$ of guild $B$. In an intuitive way, $k_{radius}$ measures how far the node is from the most connected shell, the group that is the cornerstone of the network.

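The second magnitude is $k_{degree}$. It is defined as the sum of the reciprocal values of $k_{radius}$ of neighbour nodes:

$$k^{A}_{degree}(m) = \sum_{j \in B} \frac{a_{mj}}{k^{B}_{radius}(j)}, \quad m \in A$$

where $a_{mj}$ is the element of the interaction matrix that represents the link, considered as binary (1 if nodes $m$ and $j$ are linked, 0 if they are not). Note that $k_{degree}$ is a weighted degree.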
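As a worked illustration of both magnitudes, the following Python sketch computes them with breadth-first searches. It assumes that the k-index of every node is already known; the function names, the toy network, and the convention of assigning an infinite distance to unreachable nodes are choices made for this example, not a description of the actual implementation.

```python
import math
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def k_magnitudes(adjacency, guild, shell):
    """Return (k_radius, k_degree) dictionaries for all nodes."""
    k_max = max(shell.values())
    core_nodes = [n for n in adjacency if shell[n] == k_max]
    # One search per core node; links are undirected, so distances are symmetric.
    dist_from_core = {c: bfs_distances(adjacency, c) for c in core_nodes}
    radius = {}
    for m in adjacency:
        opposite_core = [c for c in core_nodes if guild[c] != guild[m]]
        total = sum(dist_from_core[c].get(m, math.inf) for c in opposite_core)
        radius[m] = total / len(opposite_core)
    # Weighted degree: each neighbour contributes the reciprocal of its k_radius.
    degree = {m: sum(1.0 / radius[j] for j in adjacency[m]) for m in adjacency}
    return radius, degree

# Hypothetical toy network with its k-index values given by hand.
adjacency = {
    "p1": {"a1", "a2", "a3"}, "p2": {"a1", "a2"},
    "a1": {"p1", "p2"}, "a2": {"p1", "p2"}, "a3": {"p1"},
}
guild = {"p1": "plant", "p2": "plant", "a1": "pol", "a2": "pol", "a3": "pol"}
shell = {"p1": 2, "p2": 2, "a1": 2, "a2": 2, "a3": 1}
k_radius, k_degree = k_magnitudes(adjacency, guild, shell)
```

In this example the peripheral pollinator a3 is one step away from p1 and three steps away from p2, so its $k_{radius}$ is 2. The plant p1 has three neighbours with $k_{radius}$ values 1, 1, and 2, which gives a $k_{degree}$ of 2.5.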

2.1. The Polar Plot

The k-core decomposition helps to visualize very large systems and networks and to understand their structure [42, 43]. In particular, the fingerprint plot uses a polar coordinate system [44]. Nodes are depicted at a distance proportional to the shell they belong to and their areas are proportional to their degree. The plot includes only a fraction of links. There are some variations that work on the principle of edge-bundling [45], merging nodes and links to create a more readable plot [46].

Taking this idea as the starting point, we build the polar plot. The differences are noteworthy. The first one is the bipartite nature of the networks, so space is divided into two half-planes, one for each guild. Node shapes are also different for each guild. This plot provides an overview of how far from the core the nodes are and, at the same time, of their connectivity (by the size of the marker) and of the shell to which they belong (by the color of the marker). This visualization is useful to detect some special features of the network; for instance, a well bonded core will present the innermost shell at a distance equal to one, and a nested network will show a periphery close to the core. The plot shows the periphery nodes that are less relevant for the network connectivity as markers far away from the core, and it allows detecting highly connected nodes that do not belong to the core. The angle does not convey information; the algorithm computes it to reduce node overlapping. Links are not displayed.
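The geometry of the polar plot can be illustrated with a short Python sketch. The radial coordinate is the distance of the node to the core, and each guild is confined to its own half-plane. The rule used here to choose angles (an even spread inside the half-plane) is only an assumed stand-in for the overlap-reducing algorithm, whose details are not given in the text; the function name and the input values are also hypothetical.

```python
import math

def polar_layout(radius, guild, upper_guild):
    """Return {node: (x, y)} with one half-plane per guild."""
    layout = {}
    for g in sorted(set(guild.values())):
        members = sorted((n for n in radius if guild[n] == g),
                         key=lambda n: (radius[n], n))
        # Upper half-plane spans (0, pi); lower half-plane spans (pi, 2*pi).
        start = 0.0 if g == upper_guild else math.pi
        for i, node in enumerate(members):
            # Assumed rule: spread the nodes evenly, avoiding the boundary.
            angle = start + math.pi * (i + 1) / (len(members) + 1)
            layout[node] = (radius[node] * math.cos(angle),
                            radius[node] * math.sin(angle))
    return layout

# Hypothetical distances to the core for a small network.
radius = {"p1": 1.0, "p2": 1.0, "a1": 1.0, "a2": 1.0, "a3": 2.0}
guild = {"p1": "plant", "p2": "plant", "a1": "pol", "a2": "pol", "a3": "pol"}
coords = polar_layout(radius, guild, upper_guild="plant")
```

Marker size and color would then be mapped to the connectivity and to the shell of each node, respectively, as described above.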

Optionally, the user may choose to display the histograms of $k_{radius}$, $k_{degree}$, and k-shell. The $k_{radius}$ histogram shows the distribution of node distances to the network core. The $k_{degree}$ histogram is very similar to the degree distribution but with noninteger bins, due to the weights in its definition. The most interesting histogram is that of the k-shell; a typical nested network exhibits a U-shaped histogram. This shape of the distribution is related to a big core and numerous peripheral nodes; an L-shaped histogram is related to a network with too many peripheral nodes and a small core.

Figure 2 is the polar plot of a host-parasite assembly with a characteristic high concentration of nodes in the innermost shell. Most nodes lie inside the 1 circle, but there is an appreciable number of outlying species. This network is moderately nested according to NODF, a common measure of nestedness that ranges from 0 to 100 [47].
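For readers unfamiliar with NODF, the following Python sketch shows how the index can be computed from a binary interaction matrix. It follows the usual definition based on paired overlap and decreasing fill; the function name and the two test matrices are assumptions made for this example, and the sketch does not reproduce the value of the network in Figure 2.

```python
from itertools import combinations

def nodf(matrix):
    """NODF nestedness (0 to 100) of a binary interaction matrix."""
    n_cols = len(matrix[0])
    rows = [{c for c, v in enumerate(row) if v > 0} for row in matrix]
    cols = [{r for r, row in enumerate(matrix) if row[c] > 0}
            for c in range(n_cols)]

    def paired_sum(vectors):
        total = 0.0
        for a, b in combinations(vectors, 2):
            # Pairs with equal fill (or an empty vector) contribute zero.
            if len(a) == len(b) or not a or not b:
                continue
            big, small = (a, b) if len(a) > len(b) else (b, a)
            # Percentage of the links of the poorer vector shared with the richer.
            total += 100.0 * len(big & small) / len(small)
        return total

    n_pairs = len(rows) * (len(rows) - 1) / 2 + len(cols) * (len(cols) - 1) / 2
    return (paired_sum(rows) + paired_sum(cols)) / n_pairs

perfectly_nested = [[1, 1, 1],
                    [1, 1, 0],
                    [1, 0, 0]]
one_to_one = [[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
nested_score = nodf(perfectly_nested)
diagonal_score = nodf(one_to_one)
```

The perfectly nested matrix reaches the maximum value, whereas the matrix of exclusive one-to-one interactions scores zero because no pair of species differs in its number of links.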

2.2. The Ziggurat Plot

The polar plot does not show network links, as it works on the information reduction strategy. The ultimate goal of this research is the creation of a new kind of diagram with as many details as possible. The basic idea is grouping nodes by their k-shell. If we place nodes with the same k-index in a reduced area, links amongst nodes of the same shell will not spread across the whole space. Only ties whose ends lie in different shells would follow long paths. The receding stepped shape of each group of nodes reminded us of the ancient Mesopotamian temples, so we have called this second kind of plot ziggurat.

This simple principle is not so easy to implement. The bipartite nature of networks means that links have to go from one guild to the opposite.

The core-periphery organization implies that there are many ties from 1-shell nodes towards groups of higher k-index. Nodes with high degree are prone to be visually suffocated by surrounding links in the bipartite graph. See plant species numbers 1, 2, 3, and 4 in Figure 1. This challenge is a formidable obstacle.

The general procedure to draw a ziggurat plot is as follows:
(1) Perform the k-core decomposition and assign each node its k-index.
(2) Compute $k_{radius}$ and $k_{degree}$.
(3) Draw the highest k-index shell of each guild as a group of stacked rectangles ordered by $k_{degree}$ and decreasing height.
(4) For each intermediate k-shell of both guilds (k-index from 2 up to one below the maximum), draw species groups as stacked rectangles, with the one with smallest $k_{radius}$ as the basis. Raise the position of the basis according to their k-index.
(5) Draw 1-index species as rectangles in the outer part of the plot. If two or more nodes of 1-shell are tied to the same node of a higher shell, merge them into just one rectangle.
(6) Draw outliers, chains of 1-shell nodes tied to other 1-shell nodes.
(7) Draw nodes disconnected from the giant component as a small bipartite plot in the lower area of the graph.
(8) Draw links.
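Steps (3) and (4) amount to grouping nodes by guild and k-index and sorting each group. The Python sketch below illustrates that bookkeeping only, without any drawing. The function name, the input dictionaries, and the ordering keys (decreasing weighted degree for the innermost shell, ascending distance to the core for the others) are assumptions made for this example.

```python
from collections import defaultdict

def ziggurat_groups(guild, shell, radius, wdegree):
    """Return {(guild, k-index): ordered node list} for shells with k >= 2."""
    k_max = max(shell.values())
    groups = defaultdict(list)
    for node, k in shell.items():
        if k >= 2:  # 1-shell nodes are handled in later steps
            groups[(guild[node], k)].append(node)
    ordered = {}
    for (g, k), nodes in groups.items():
        if k == k_max:
            # Innermost shell: assumed order by decreasing weighted degree.
            ordered[(g, k)] = sorted(nodes, key=lambda n: (-wdegree[n], n))
        else:
            # Other shells: the node closest to the core is the basis.
            ordered[(g, k)] = sorted(nodes, key=lambda n: (radius[n], n))
    return ordered

# Hypothetical values for a handful of nodes.
guild = {"p1": "plant", "p2": "plant", "p3": "plant", "p4": "plant",
         "p5": "plant", "a1": "pol", "a2": "pol"}
shell = {"p1": 3, "p2": 3, "p3": 2, "p4": 2, "p5": 1, "a1": 3, "a2": 3}
radius = {"p1": 1.0, "p2": 1.0, "p3": 1.8, "p4": 1.4, "p5": 2.5,
          "a1": 1.0, "a2": 1.0}
wdegree = {"p1": 2.0, "p2": 3.5, "p3": 1.2, "p4": 1.6, "p5": 0.5,
           "a1": 2.5, "a2": 2.0}
groups = ziggurat_groups(guild, shell, radius, wdegree)
```

Each ordered list would then be translated into a stack of rectangles, with the first element as the basis of the ziggurat.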

Figure 3 shows a ziggurat plot under construction. It is the same network as in Figure 1. The k-core decomposition puts each species inside one shell; we do not show nodes of the 1-shell at this moment. The maximum k-index is 4 for this community.

The innermost shell (4-shell) is found at the center of the plot, slightly leftwards. Nodes are rectangular-shaped and are ordered by $k_{degree}$. Heights decrease just for plotting convenience. The specular position of both guilds leaves space to draw the links amongst them. In Figure 3 we have plotted just three connections from pollinator 1 towards plants of the 4-shell.

Lower k-shells have ziggurat shape, with nodes ordered by ascending $k_{radius}$, so pollinator 7 is the closest to the innermost shell in the 3-shell. Links inside the shell (grey color) connect the left sides of rectangles (plant 4-pollinator 9). Links between two different shells (green) connect the right side of the node with the higher k-index to the left side of the node with the lower one (plant 17-pollinator 7).

3-shell ziggurats are more distant from the horizontal axis than 2-shell ziggurats. Moving them up or down, it is possible to change the area of the internal almond-shaped space defined by the ziggurats and the innermost shell triangles. This area is key because links from the 4-shell lie here and do not cross the inner ziggurats.

The outer space is the home of the 1-shell nodes. We divide them into three groups: outsiders, tails, and chains of specialists. Outsiders are nodes disconnected from the giant component. They are unusual in recorded ecological networks because by definition they do not interact with the rest of the community. This network lacks outsiders. Tails are nodes directly connected to nodes of higher k-index. They are very common, and to reduce the number of lines we apply a simple grouping rule. If tails are tied to the same species of a ziggurat, we plot them in a single box with just one link. Chains of specialists are less frequent. They are built with nodes of the 1-shell linked amongst them; the node of the chain that has a link with a shell of higher k-index is the root node (plant 13).
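The classification of 1-shell nodes can be sketched as follows in Python. The sketch assumes that the k-index of each node is available and treats every node outside the largest connected component as an outsider; the function names and the toy network are hypothetical and serve only to illustrate the grouping rule.

```python
from collections import defaultdict, deque

def connected_components(adjacency):
    """Return the list of connected components as sets of nodes."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        components.append(component)
    return components

def classify_one_shell(adjacency, shell):
    """Split peripheral nodes into outsiders, grouped tails, and chains."""
    giant = max(connected_components(adjacency), key=len)
    outsiders, chain_roots, chain_members = [], [], []
    tails = defaultdict(list)  # anchor node of a higher shell -> merged tails
    for node in sorted(adjacency):
        if node not in giant:
            outsiders.append(node)
        elif shell[node] == 1:
            higher = [nb for nb in adjacency[node] if shell[nb] > 1]
            peers = [nb for nb in adjacency[node] if shell[nb] == 1]
            if higher and not peers:
                tails[higher[0]].append(node)  # plotted in one shared box
            elif higher:
                chain_roots.append(node)
            else:
                chain_members.append(node)
    return outsiders, dict(tails), chain_roots, chain_members

# Hypothetical network: a 2-shell cycle, two tails, one chain, one outsider pair.
adjacency = {
    "p1": {"a1", "a2", "a3", "a4"}, "p2": {"a1", "a2"},
    "a1": {"p1", "p2", "p3"}, "a2": {"p1", "p2"},
    "a3": {"p1"}, "a4": {"p1"},
    "p3": {"a1", "a5"}, "a5": {"p3"},
    "p9": {"a9"}, "a9": {"p9"},
}
shell = {"p1": 2, "p2": 2, "a1": 2, "a2": 2, "a3": 1, "a4": 1,
         "p3": 1, "a5": 1, "p9": 1, "a9": 1}
outsiders, tails, roots, members = classify_one_shell(adjacency, shell)
```

In this toy network a3 and a4 are tails tied to the same plant p1, so they would share one box and one link. The plant p3 is the root of a chain whose other member is a5, and the isolated pair p9-a9 is classified as outsiders.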

Putting everything together, we obtain the ziggurat plot of Figure 4. This may be compared with the bipartite graph of the same network (Figure 1).

Links are drawn as straight lines or as splines, which make the diagram more appealing to the eye. If links are weighted, the width of each link may optionally be set proportional to a function of the interaction strength.

Note that, for a given node, the main links are those towards higher k-shells. With this plot, it is very easy to observe how many links depart from a node to higher and to lower k-shells. One can also determine whether a node is more connected to higher k-shells than another one and, therefore, whether its contribution to the network is more important.
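The counts that the ziggurat plot makes visible can also be tabulated directly. The following Python sketch, with a hypothetical function name and toy data, counts for every node the links that go to higher, equal, and lower shells.

```python
def link_profile(adjacency, shell):
    """Count the links of each node towards higher, same, and lower shells."""
    profile = {}
    for node, neighbours in adjacency.items():
        counts = {"higher": 0, "same": 0, "lower": 0}
        for nb in neighbours:
            if shell[nb] > shell[node]:
                counts["higher"] += 1
            elif shell[nb] == shell[node]:
                counts["same"] += 1
            else:
                counts["lower"] += 1
        profile[node] = counts
    return profile

# Hypothetical toy network with its k-index values given by hand.
adjacency = {
    "p1": {"a1", "a2", "a3"}, "p2": {"a1", "a2"},
    "a1": {"p1", "p2"}, "a2": {"p1", "p2"}, "a3": {"p1"},
}
shell = {"p1": 2, "p2": 2, "a1": 2, "a2": 2, "a3": 1}
profile = link_profile(adjacency, shell)
```

Here the plant p1 has two links inside its own shell and one towards a lower shell, while the pollinator a3 has a single link, directed to a higher shell.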

3. Exploratory Analysis Using k-Core Plots

The ziggurat plot unveils structural details that are hard to visualize in the bipartite graph. Figure 5 is a network of associations amongst human diseases and long noncoding RNA (lncRNA); we refer to the original paper for comparison with the bipartite visualization [31]. It is a small network with 39 nodes and low connectivity, just 44 links. The lncRNA node with the highest degree is number 8 (XIST), which in the bipartite plot looks like the most central one. The ziggurat shows at a glance that, despite its high connectivity, it appears in association with diseases that belong to its chain of specialists. On the other hand, diseases like breast cancer and acute myeloid leukemia are associated with multiple lncRNAs.

The network of Figure 6 is slightly bigger, with 29 gene signatures used for predicting the preoperative treatment response of breast cancer and 19 pathways to different types of cancer [32]. The bipartite plot in the original paper is hard to understand because of the number of ties, 149.

Figure 6 shows a network with a stronger hierarchy than Figure 5. The identification of genes most frequently associated with pathways to cancer is straightforward.

The main application of the polar plot is the visual comparison of networks even if their sizes are very different. Figure 7 is a subset of a disease-cofactor network. Authors selected diseases tied to at least 5 cofactor-interacting proteins (39 nodes) and plotted the bipartite graph 7. The ziggurat plot (Figure 7(a)) of the subset shows an extremely nested structure, an effect of the selection rule. The polar plot of the full network (Figure 7(b)), with 414 nodes, displays a much richer structure. Diseases are distributed across a wider range of . Most cofactors have high degree and were not filtered; the opposite happened to the disease nodes.

These figures are a small sample of the importance of choosing a good visualization tool, one that is based on a sound structural decomposition of the network.

4. Software

The k-core analysis and the plotting of ziggurat and polar graphs are provided as an open source application.

4.1. The kcorebip Package

The R package kcorebip contains the functions to perform the analysis and to plot static graphs of a network. It comes with a set of networks for testing purposes. Ecological data were downloaded from the web of life database [48]. As the format of the web of life files has become a de facto standard because of its simplicity, kcorebip follows the same convention for input files.
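To give an idea of how simple such an input file can be, the following Python sketch reads an interaction matrix stored as comma-separated text. The layout assumed here (column species in the header row, one row species at the start of each line, interaction counts in the cells) is a hypothetical example and should be checked against the actual files; the function name and the sample data are also assumptions.

```python
import csv
import io

def read_interaction_matrix(handle):
    """Read a labelled interaction matrix from an open text file."""
    reader = csv.reader(handle)
    header = next(reader)
    col_names = header[1:]  # the first header cell is assumed to be empty
    row_names, matrix = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        row_names.append(record[0])
        matrix.append([float(value) for value in record[1:]])
    return row_names, col_names, matrix

# Hypothetical file content; a real file would be opened with open(path).
sample = '"",pol1,pol2\nplant1,3,0\nplant2,1,2\n'
rows, cols, matrix = read_interaction_matrix(io.StringIO(sample))

# Binary links: any positive cell counts as an interaction.
links = [(rows[i], cols[j])
         for i, row in enumerate(matrix)
         for j, value in enumerate(row) if value > 0]
```

The resulting list of links is the kind of input on which a k-core analysis can operate.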

The function network_k_analysis computes the k-magnitudes and other useful indexes, using functions provided by packages such as bipartite and igraph [49, 50]. We refer to the user manual for details.

Ziggurat and polar graphs use basic calls to the ggplot2 graphics package [51]. We compute coordinates and sizes from scratch, without relying on other network plotting libraries.

4.2. Interactive Application

The kcorebip package is a powerful solution for researchers with programming skills who need high quality plots for scientific publications, but exploratory analysis requires a more interactive approach. BipartGraph has been designed with this need in mind.

The technological choice is Shiny, the R reactive programming environment. It has the advantage of a native backend and a JavaScript-based user interface that may be easily extended. This combination of technologies ensures compatibility with the most common operating systems.

The interactive ziggurat is the main feature of BipartGraph. The original implementation of the kcorebip package only provided the ggplot2 object to display or save. To create an interactive version we faced two main choices: replicating the code with a dynamic technology or extending kcorebip. We found a fast and almost nonintrusive solution: creating an SVG object. The ziggurat is a set of rectangles, lines, and texts. The most time consuming tasks are network analysis and spatial distribution. These computations are performed just once; for each ggplot2 element that the plotting function draws, it also creates the SVG equivalent.
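The idea of emitting an SVG equivalent for each graphical element can be illustrated with a short Python sketch. It is not the code of BipartGraph, which is written in R; the function name, the dictionary keys, and the colors are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def ziggurat_svg(rectangles, links, width=200, height=100):
    """Serialize rectangles, labels, and links as an SVG document string."""
    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                             "width": str(width), "height": str(height)})
    for link in links:
        ET.SubElement(svg, "line", {
            "x1": str(link["x1"]), "y1": str(link["y1"]),
            "x2": str(link["x2"]), "y2": str(link["y2"]),
            "stroke": "grey"})
    for rect in rectangles:
        # The id lets browser-side code attach tooltips or highlights.
        ET.SubElement(svg, "rect", {
            "id": rect["id"], "x": str(rect["x"]), "y": str(rect["y"]),
            "width": str(rect["w"]), "height": str(rect["h"]),
            "fill": rect["fill"]})
        label = ET.SubElement(svg, "text", {"x": str(rect["x"] + 2),
                                            "y": str(rect["y"] + 12)})
        label.text = rect["label"]
    return ET.tostring(svg, encoding="unicode")

# Hypothetical coordinates, as if produced by a previous layout step.
rectangles = [
    {"id": "plant-1", "x": 10, "y": 10, "w": 60, "h": 15,
     "fill": "green", "label": "1"},
    {"id": "pol-1", "x": 10, "y": 60, "w": 60, "h": 15,
     "fill": "orange", "label": "1"},
]
links = [{"x1": 10, "y1": 25, "x2": 10, "y2": 60}]
svg_text = ziggurat_svg(rectangles, links)
```

Because the layout is computed only once, producing this second representation adds little cost, and the identifiers of the elements give the browser a handle for interactive features.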

The browser displays the SVG ziggurat with multiple options for the user: tooltips, selection of a node or a link, highlighting of connections, and zooming in and out. In addition, a second panel shows information about the highlighted nodes and the related information available on Wikipedia (Figure 8).

The configuration panels make plot properties easy to modify. Visual and intuitive Shiny controls, such as sliders or checkboxes, hide the complexity of the input parameters of the ziggurat_graph function.

At any moment, the user may download the high quality, high resolution static plot with the printable ziggurat option. In order to reproduce the results or to include the graph in other environments, such as R Markdown or Jupyter notebooks, we added the Download generating code button. When it is clicked, BipartGraph writes a file with the last ziggurat_graph call, ready to use in any R script.

There is no interactive version of the polar plot, as we think that network exploration is much easier with the ziggurat. The user may produce the static polar plot, the high quality downloadable PNG file, and the generating code, in the same way as explained for the ziggurat.

5. Conclusions

Visualization of bipartite biological networks is very useful for researchers when they are interested in following the paths from a node or scanning the structure of the network. Using the k-core decomposition we have designed and developed two new graphs that work by information reduction (polar plot) and spatial grouping by connectivity (ziggurat plot). They provide two complementary views of internal network structure.

We would like to emphasize the importance of choosing a correct visualization of complex networks and, in particular, of bipartite networks, as it helps in correctly understanding networks with a large number of nodes and high density.

We benchmarked both tools with the full collection of ecological bipartite networks of the web of life database. The ziggurat plot remains readable up to 250 nodes, that is, about fourfold the limit of the bipartite plot. The polar plot works fine for networks beyond that size because it works by reducing information, at the price of a loss of detail.

Software is provided as open source, under the permissive MIT license, and comes in two versions. The package kcorebip provides the full functionality for researchers with a minimum of R programming skills. The application BipartGraph is the fully fledged interactive environment to build both kinds of graphs for a broader audience. Its user centric design makes it very easy to master, provides some additional features, and is open to new fields of application such as education.

Data Availability

Interaction matrices were downloaded from the web of life database http://www.web-of-life.es. A subset of these matrices is installed by default with BipartGraph, including all networks used in this paper. Software availability data are as follows: name of software: BipartGraph; programming language: R; operating system: Windows, Linux, and macOS; availability: SW at https://github.com/jgalgarra/bipartgraph; user interface: web browser; license: free, under MIT License.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Ministry of Economy and Competitiveness of Spain (MTM2015-63914-P). The authors express their gratitude to Juan Manuel García-Santi for his assistance in building the interactive prototype as part of his BEng dissertation [52]. They are also indebted to all the colleagues and friends who beta-tested the application and improved usability with their comments and suggestions.

Supplementary Materials

The authors provide a graphical abstract in video format explaining the main points of the method. They think this video can be useful to show researchers the advantages of BipartGraph software.