Abstract

With the development of social networks, people increasingly use social network tools to record their life and work. How to analyze social networks to explore the latent characteristics and trends of social events has become a hot research topic. To analyze such networks effectively, information visualization techniques are employed to extract latent information from large-scale social network data and present it concisely as graphs. Graph drawing is a crucial part of information visualization. In this paper, we study graph layout algorithms and propose a new graph drawing scheme that combines multilevel and single-level drawing approaches, including a graph division method based on communities and a refinement approach based on a partitioning strategy. We also compare the effectiveness of our scheme with that of FM3 in experiments. The experimental results show that our scheme produces a clearer diagram and effectively extracts the community structure of the social network for use in drawing.

1. Introduction

Graph drawing is a technique that combines information science and mathematics and is employed in many research areas, such as social network analysis. Since social networks commonly involve large amounts of data about features and relationships, it is difficult for people to understand them directly. Graphs help analysts visualize and reason about such data. Given a set of nodes and a set of edges describing their relationships, graph drawing calculates the position of each node and plots the edges as curves. In other words, it transforms abstract data such as text and numbers into static or dynamic visual results, so that people can easily understand the principles and inner meaning of large amounts of complex data, and it helps them make judgments and analytic decisions from a macro view. Although graph drawing for social networks has been studied for several years, many problems remain to be solved.

Currently, most graph drawing schemes treat a social network as a single community structure when drawing the graph, rather than as multiple communities within one network. However, a social network commonly exhibits the features of various communities. As a result, many graph drawing algorithms perform well on data sets with certain features but poorly on more complex data. Therefore, how to detect the various community structures in social networks and how to adapt the drawing to the detected structure are important research problems.

Besides, in many social network data sets, members fuse and exchange various kinds of semantic information; however, current drawing approaches do not consider the effect of semantic information on visualization and employ only the topology model or structural features. As a result, many graphs are hard to read, and readers can hardly extract the information about the members or communities they care about from the drawn graphs. Therefore, we need a new drawing approach that combines topology and semantic information to make the drawn graphs readable and reasonable.

Currently, there are several main categories of graph drawing approaches, such as node-link [1–4], space filling [5, 6], matrix [7–9], and mix [10, 11]. The node-link approach is relatively simple: it treats nodes as vertices, calculates only their positions, and represents each edge as a curve or polyline. Space filling reduces multidimensional problems, for example, from three dimensions to two; a nested curve such as Hilbert m-Peano curve is recursively refined to represent the data. The matrix approach represents a diagram as a connectivity matrix in which the entry in row $i$ and column $j$ represents the edge from node $i$ to node $j$, and the attributes of the edge are encoded in visual features such as color, form, or size.

For the node-link approach, there are two main kinds of drawing algorithms: single-level and multilevel.

Single-level drawing approaches include several typical types, such as tree-based [12], radial [13], and force-directed [6]. Among them, the force-directed approach is widely used. The force-directed idea was proposed by Eades [14]: the relationships are mapped onto a physical mechanics model in which nodes are replaced by small solid balls of a certain radius and edges are replaced by springs. At initialization, the coordinates of each ball are generated randomly. The elastic forces of the springs then move the balls until the energy of the whole system is minimal, which is called the optimal state. Later, the spring algorithm was updated in several schemes [2, 15–17]; the major difference among them is the way the elastic force is computed.
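The spring model described above can be illustrated with a minimal Python sketch. It assumes an Eades-style rule (a logarithmic spring between adjacent nodes and inverse-square repulsion between non-adjacent ones); the function name, the constants `k` and `step`, and the fixed iteration count are illustrative choices, not values taken from any of the cited schemes.

```python
import math
import random

def spring_layout(nodes, edges, iterations=200, k=1.0, step=0.05, seed=0):
    """Eades-style sketch: springs act on adjacent nodes, other pairs repel."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}  # random start
    adjacent = {frozenset(e) for e in edges}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                dist = math.hypot(dx, dy) or 1e-9
                if frozenset((u, v)) in adjacent:
                    f = math.log(dist / k)       # positive (attract) when dist > k
                else:
                    f = -k * k / (dist * dist)   # negative (repel)
                fx, fy = f * dx / dist, f * dy / dist
                force[u][0] += fx
                force[u][1] += fy
                force[v][0] -= fx
                force[v][1] -= fy
        for v in nodes:                          # move every ball a small step
            pos[v][0] += step * force[v][0]
            pos[v][1] += step * force[v][1]
    return pos
```

For a single edge, the two endpoints settle at distance `k`, the rest length of the spring; the variants cited above differ mainly in how the force `f` is defined.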

The multilevel scheme is mainly used to improve the effectiveness of the layout and shorten drawing time. Its main idea is to coarsen the diagram recursively. A coarsened diagram is an abstract representation of the fine-grained diagram at a higher level and can be drawn much faster. In other words, the scheme can be applied to larger data sets to enhance the visualization and reduce the running cost at the same time. FM3 (fast multipole multilevel method) [15] is a classical multilevel algorithm applicable to most graphs. In FM3, the diagram is segmented into several subdiagrams called “solar systems”; each “solar system” is compressed into a node, and the process repeats until a hierarchical diagram is formed. The method was more effective than earlier approaches [18]. Walshaw [19] proposed coarsening by maximum matching, a greedy algorithm that tries to include the largest possible number of edges. The ACE algorithm [20] decides the number of partitions by solving the Laplacian matrix: the eigenvectors are computed by constructing a hierarchical coarsening matrix and recursively evaluating the eigenvector of each level to obtain the vector of the original diagram. Archambault et al. [21] proposed a multilevel approach based on topological features. In this approach, the topological feature of interest is detected first, and the subdiagram with that feature is replaced by a node in the coarsened level; feature detection and compression are then executed recursively. During the process, a proper drawing approach is selected for the topological feature of each subdiagram.
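As an illustration of one coarsening step, the following Python sketch merges greedily matched pairs of adjacent nodes. It is a simplified stand-in for matching-based coarsening, not the procedure of any cited paper: the greedy pass yields a maximal (not necessarily maximum) matching, and the function name and tuple-based node labels are assumptions made for the example.

```python
def coarsen_by_matching(nodes, edges):
    """Greedily match adjacent nodes and merge each matched pair into one coarse node."""
    matched = {}
    for u, v in edges:                      # greedy maximal matching
        if u != v and u not in matched and v not in matched:
            matched[u] = matched[v] = (u, v)
    # Unmatched nodes become singleton coarse nodes.
    coarse_of = {v: matched.get(v, (v,)) for v in nodes}
    coarse_nodes = sorted(set(coarse_of.values()))
    # Keep one coarse edge per pair of distinct coarse nodes that were connected.
    coarse_edges = sorted({
        tuple(sorted((coarse_of[u], coarse_of[v])))
        for u, v in edges if coarse_of[u] != coarse_of[v]
    })
    return coarse_nodes, coarse_edges
```

Applying the function repeatedly to its own output gives the hierarchy of progressively smaller diagrams that a multilevel scheme draws from coarse to fine.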

In this paper, we propose a new drawing scheme that combines multilevel and single-level drawing approaches, including a graph partition method based on communities and a layout refinement approach based on a partitioning strategy. Community-based graph partition is employed in the graph division stage of multilevel drawing. A single-level force-directed algorithm sets the initial coordinates for the layout refinement process, and the partition-based refinement process then iterates on these coordinates and optimizes them to achieve the best layout. We also compare the effectiveness of our scheme with that of FM3 in experiments. The experimental results show that our scheme produces a clearer diagram and effectively extracts the community structure of the social network for use in the drawing algorithm.

The objective of our graph drawing scheme is to quickly present users with readable graphs that also precisely reflect the underlying structure of the data. It includes three main goals: (1) recognizing the communities accurately, (2) adaptive layout, and (3) reasonable use of the layout space to reflect the strong and weak relations among vertices.

We propose a new adaptive scheme to achieve the above goals.

3. Assumption and Notations

In this paper, we mainly study undirected graphs. An undirected graph can be represented by $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges. In our work, we target connected undirected graphs. The notation is as follows: $n$ is the number of vertices; $m$ is the number of edges; $N(v)$ is the neighbor set of node $v$; $e(u, v)$ is the edge between nodes $u$ and $v$; $(x_v, y_v)$ are the coordinates assigned to node $v$; $d_G(u, v)$ is the distance between two nodes in the graph; and $d(u, v)$ is the distance between any two nodes.

In our scheme, we adopt small-world network theory. Researchers have studied small-world network theory for a long time, but most of this research focuses on exploring the principles and topology of small-world networks. For example, a common task in social network analysis is recognizing the patterns and relations among connected nodes that carry social implications such as social status. In fact, many networks to be visualized have features such as community characteristics, which people can recognize directly if they are shown in a graph. However, most researchers focus on data and topological analysis of social networks, while small-world network theory is not fully employed in information visualization. Therefore, we propose a graph drawing scheme based on small-world network theory. We separate a network into small hierarchical communities whose members are highly connected with each other. This makes it much more convenient for users to observe the relations and groups among members and to understand the relational structure of the graph.

4. The Graph Drawing Scheme

The scheme mainly involves two steps: community partition and adaptive refinement.

4.1. Communities Partition

In community partition, we adopt a filtering approach that separates a graph into a hierarchy of subnetworks by finding the weakest edges and using them as the starting points of separation. The procedure is shown in Figure 1.

The process calculates the strength of each edge to find the weakest edges in the network and then deletes them, so that the network is separated into subnetworks with stronger internal connections.

The procedure of community partition can be divided into 3 steps.

(1) Filter Out Weak Edges. The strength of an edge represents its contribution to the clustering coefficient. If an edge connects two groups of neighbors that do not interact with each other, its strength is considered zero; the edge is therefore weak and is filtered out.

We can set a threshold value $t$; edges whose strengths are lower than $t$ are filtered out, which divides the original graph into subnetworks. Based on our observation, the threshold value is related to the maximum edge strength rather than being an empirical value, so we propose an approach to identify it: find the maximum and minimum of the strengths of all edges and compute the threshold from them using a ratio parameter. When the ratio is close to the biggest strength, such as 0.95, it can guarantee the accuracy.

Given an edge $e = (u, v)$, the edge strength can be calculated as follows (as shown in Figure 2). (1) Separate the neighbors of $u$ and $v$ into three subsets which are pairwise disjoint. (2) $M(u)$ represents the set of all of $u$’s neighbors which are not adjacent to $v$. (3) Similarly, $M(v)$ represents the set of all of $v$’s neighbors which are not adjacent to $u$. (4) $W(u, v)$ represents the set of common neighbors of $u$ and $v$, and $r(A, B)$ represents the number of edges between set $A$ and set $B$. (5) $s(A, B)$ represents the ratio of the edges that actually exist to all possible edges between $A$ and $B$. (6) Any edge between $M(u)$ and $M(v)$ or $W(u, v)$ is part of a 4-edge closed loop which must contain the edge $e$. (7) We define $\gamma_3(e)$ as the ratio of 3-edge closed loops including $e$. Then the strength of $e$ can be calculated by the following equations:
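A minimal Python sketch of this calculation follows. It assumes that the strength is the sum of the 3-edge loop ratio and of the pairwise densities that make up the 4-edge loop term, which is the form used in earlier small-world clustering work; this combination, the function name `edge_strength`, and the adjacency-dictionary input are assumptions for illustration and may differ from the exact equations of the scheme.

```python
from itertools import combinations

def edge_strength(adj, u, v):
    """Strength of edge (u, v); adj maps each node to its set of neighbours."""
    m_u = adj[u] - adj[v] - {v}          # neighbours of u that are not adjacent to v
    m_v = adj[v] - adj[u] - {u}          # neighbours of v that are not adjacent to u
    w = adj[u] & adj[v]                  # common neighbours of u and v

    def density(a, b):
        """Existing edges between a and b divided by all possible ones."""
        if a is b:
            possible = len(a) * (len(a) - 1) // 2
            existing = sum(1 for x, y in combinations(a, 2) if y in adj[x])
        else:
            possible = len(a) * len(b)
            existing = sum(1 for x in a for y in b if y in adj[x])
        return existing / possible if possible else 0.0

    total = len(m_u) + len(m_v) + len(w)
    gamma3 = len(w) / total if total else 0.0           # 3-edge loops through (u, v)
    gamma4 = (density(m_u, w) + density(m_v, w)         # assumed 4-edge loop term
              + density(m_u, m_v) + density(w, w))
    return gamma3 + gamma4
```

Under these assumptions, a bridge between two otherwise unconnected triangles gets strength 0, while an edge inside a triangle gets strength 1, which matches the intuition that bridges are the weakest edges.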

(2) Hierarchical Decomposition. Recursively apply the filtering of weak edges until the final graph is small enough and still maintains the community characteristics.

(3) Selection of the Threshold Value Decides How Clusters Form. Graph drawing provides a good way to view the data sets. When a graph becomes very big, subgraphs are presented as dense areas.

However, this kind of hierarchical clustering has a disadvantage: after each hierarchical decomposition, it is difficult to decide the edge strengths of the higher-level graph. For example, suppose there are three subgraphs $A$, $B$, and $C$ of the same size. In the original graph, there are 10 edges between $A$ and $B$ but only 2 edges between $B$ and $C$. Apparently, the relation between $A$ and $B$ is much closer than that between $B$ and $C$. But after the hierarchical decomposition, in the higher-level graph, the number of edges between $A$ and $B$ becomes 1, and so does the number of edges between $B$ and $C$. The weight of the relations is completely lost, which is not reasonable. Therefore, we propose an approach to solve the problem.

Suppose the original graph is $G_0$; after one round of hierarchical clustering, a higher-level graph $G_1$ is generated, and then $G_2$ and $G_3$ are generated in sequence. Suppose $e = (u, v)$ is an edge of $G_i$ whose strength is smaller than the threshold $t$, which means that it is deleted in the clustering process. After it is deleted, $u$ and $v$ are located in two different clusters, which we denote by $C_u$ and $C_v$. Then, in the higher-level graph $G_{i+1}$, there is an edge that connects $C_u$ with $C_v$. In $G_{i+1}$, the strength of this new edge is calculated from $r(C_u, C_v)$, the number of edges between $C_u$ and $C_v$ in $G_i$. That is, the edge strengths in $G_{i+1}$ are decided by $G_i$.
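The quantity that the higher-level strength depends on, the number of lower-level edges running between two clusters, can be counted as in the following Python sketch. Only the counting step is shown; how the count is turned into a strength is not assumed here, and the names `inter_cluster_edge_counts` and `cluster_of` are hypothetical.

```python
from collections import Counter

def inter_cluster_edge_counts(edges, cluster_of):
    """Count the edges of the lower-level graph between each pair of clusters."""
    counts = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:                                   # ignore edges inside a cluster
            counts[tuple(sorted((cu, cv)))] += 1
    return counts
```

With ten edges between two clusters and two edges between another pair, the counts stay 10 and 2 instead of both collapsing to a single unweighted edge, which is the loss of weight described in the three-subgraph example.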

4.2. Adaptive Refinement

In graph drawing for social networks, analysts are often interested in the topology of relationships and choose a drawing strategy according to a physical model rather than the semantic information. As a result, much important relationship information is covered or lost. Therefore, we propose a new refinement approach that makes full use of the layout space and makes the visualization result intuitive, based on both topological and semantic characteristics. It has three main objectives: (1) make the community partition as clear as possible by regionalization, that is, draw different communities in different regions, with the size of each region reflecting the size of its community; (2) make full use of the layout region by minimizing the intersection of the subregions; (3) set the distance between two vertices to reflect the strength of their relation, and likewise set the distance between two communities to reflect the strength of the relation between them. To achieve the three goals, the approaches are as follows.

Suppose a list of hierarchical graphs $G_0, G_1, \ldots, G_k$ has been constructed, and consider two consecutive graphs $G_i$ and $G_{i+1}$, where $G_{i+1}$ is the compressed result of $G_i$. Suppose $G_i$ has $n_i$ nodes, $C_u$ is a community of $G_i$ with $|C_u|$ nodes, $u$ is the vertex of $G_{i+1}$ obtained by compressing $C_u$, $\mathrm{Area}_u$ is the area distributed to $u$ in the layout of $G_{i+1}$, and $R_u$ is the rectangle of $u$ within the limited layout area. The area of each vertex in the layout of $G_{i+1}$ is calculated as follows. (1) Calculate the area $\mathrm{Area}_u$ of $u$ according to the size of its community. (2) Calculate the rectangle $R_u$ partitioned for $u$ in the layout of $G_{i+1}$; its width and height are in proportion to the width and height of the layout area. Since there may be overlaps between rectangles, we need to reduce them. (3) Locate the subrectangles and minimize their overlap; reducing the overlap area means making more use of the space. After step (2), we have the rectangles of all the vertices of $G_{i+1}$ and need to distribute them to appropriate positions so that the overlap is minimal. Each subrectangle can be viewed as laid flat on the original rectangle. The overlap areas of subrectangles are calculated as follows (shown in Figure 3).
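Steps (1) and (2) can be sketched in Python as follows. The sketch assumes that each community receives a share of the layout area equal to its share of the nodes and that every rectangle keeps the aspect ratio of the layout region; both rules, and the name `allocate_rectangles`, are illustrative assumptions rather than the exact equations of the scheme.

```python
import math

def allocate_rectangles(community_sizes, width, height):
    """Return {community: (rect_width, rect_height)} for a width x height layout."""
    total = sum(community_sizes.values())
    rects = {}
    for name, size in community_sizes.items():
        area = width * height * size / total        # assumed: area share = node share
        scale = math.sqrt(area / (width * height))  # assumed: same aspect ratio as layout
        rects[name] = (width * scale, height * scale)
    return rects
```

For example, in an 8 by 4 layout, a community holding a quarter of the nodes receives a 4 by 2 rectangle, that is, a quarter of the total area.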

Suppose $R_1$ and $R_2$ are two subrectangles in the layout, their areas are $A_1$ and $A_2$, their widths and heights are $w_1$, $h_1$ and $w_2$, $h_2$, and the coordinates of their center points are $(x_1, y_1)$ and $(x_2, y_2)$. Consider $\Delta x = |x_1 - x_2|$ and $\Delta y = |y_1 - y_2|$. To identify whether the two rectangles intersect, we compare the distance between the two center points along each coordinate axis with half of the sum of their widths or heights:

$$\Delta x < \frac{w_1 + w_2}{2}, \qquad \Delta y < \frac{h_1 + h_2}{2}.$$

If both inequalities are satisfied, the two rectangles intersect.

After identifying that they have intersection part, we can calculate the overlap area as the following formula:

After computing the overlap areas of all subrectangles, we maintain a 0-1 matrix that records whether each pair of subrectangles intersects. The pairs are checked one by one; if there are $n$ vertices in the graph, the matrix has size $n \times n$ and is denoted ShadowM. Then all overlap areas can be calculated as the following formula:
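The intersection test, the ShadowM matrix, and the total overlap can be sketched in Python as below. The overlap area uses the standard geometry of axis-aligned rectangles; representing a rectangle as a tuple `(cx, cy, w, h)` and the function names are assumptions made for the example, and the total is taken as the plain sum of pairwise overlaps, which is one plausible reading of the formula.

```python
def overlap_area(r1, r2):
    """Overlap of two axis-aligned rectangles given as (cx, cy, w, h), centre-based."""
    x1, y1, w1, h1 = r1
    x2, y2, w2, h2 = r2
    ox = min(x1 + w1 / 2, x2 + w2 / 2) - max(x1 - w1 / 2, x2 - w2 / 2)
    oy = min(y1 + h1 / 2, y2 + h2 / 2) - max(y1 - h1 / 2, y2 - h2 / 2)
    return ox * oy if ox > 0 and oy > 0 else 0.0

def shadow_matrix(rects):
    """n x n 0-1 matrix: entry [i][j] is 1 when rectangles i and j intersect."""
    n = len(rects)
    return [[1 if i != j and overlap_area(rects[i], rects[j]) > 0 else 0
             for j in range(n)] for i in range(n)]

def total_overlap(rects):
    """Sum of pairwise overlap areas over the pairs flagged in the matrix."""
    shadow = shadow_matrix(rects)
    return sum(overlap_area(rects[i], rects[j])
               for i in range(len(rects)) for j in range(i + 1, len(rects))
               if shadow[i][j])
```

The condition `ox > 0 and oy > 0` is equivalent to requiring that the centre distance along each axis be smaller than half of the sum of the widths or heights.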

Then, according to the objectives, we need to make the overlap areas as small as possible.

In order to achieve the third goal, we define the standard distance between two entities: suppose the edge strength between two entities is $s$; then their standard distance is calculated by the following formula:

Suppose $(x_u, y_u)$ and $(x_v, y_v)$ are the actual coordinates of vertices $u$ and $v$; then the real distance is $d(u, v) = \sqrt{(x_u - x_v)^2 + (y_u - y_v)^2}$. We use $\delta$ to represent the deviation between the real distance and the edge strength; $\delta$ is expected to be as small as possible so that the real distance reflects the edge strength.

Considering the three above goals, the objective function is as follows:

5. Main Algorithms

In our scheme, we propose three main algorithms: (1) community partition algorithm, (2) hierarchical compression algorithm, and (3) optimization algorithm based on blocks.

The first algorithm compresses the first layer; the second generates the hierarchical compression map, which is refined by the third.

5.1. Community Partition Algorithm

Community partition algorithm is the first step of our scheme. The procedure is as follows. (1) Calculate the neighbor sets of all edges: the single neighbor sets of the two end nodes of an edge, which have no intersection; the 3-edge circle neighbor sets, whose members form a circle of 3 edges with the two nodes; and the 4-edge circle neighbor sets, whose members form a circle of 4 edges with the two nodes. These neighbor sets are the basis for calculating the edge strengths in the next step. (2) Calculate the strengths of the above edges. (3) Find the maximum and minimum edge strengths and calculate the filter threshold value. (4) Delete all the edges whose strengths are lower than the threshold value and update the diagram. (5) Recalculate the connected components of the updated diagram and return them. (6) Compress each connected component into a packed node as a new vertex of the updated graph.

We then add an edge between two new vertices if their components were related in the original diagram.

The pseudocode is shown in Algorithm 1.

Input: Graph g
Output: New graph after community partition
compressGraph(Graph g)
    for each edge e in g do
        u ← e.node1; v ← e.node2;
        // Compute the single neighbours, 3-circles common neighbours and 4-circles common neighbours of u and v.
        computesES(e);
    maxES ← max(Set(ES)); minES ← min(Set(ES));
    threshold ← computeThresh(maxES, minES);
    filterLowEdges(g, threshold);
    computeComponent(g);
    for each component in g.components
        newNode ← compSubG(component); newG.add(newNode);
    newG.addEdges();
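
A runnable Python sketch of the same procedure is given here for illustration. It takes the edge strengths as an input dictionary instead of computing them, and it assumes that the threshold interpolates between the minimum and maximum strength by a `ratio` parameter; that threshold rule, the default `ratio=0.5`, and all function and variable names are hypothetical and not taken from the scheme.

```python
def compress_graph(nodes, edges, strength, ratio=0.5):
    """edges: list of (u, v); strength: dict mapping each edge to its strength."""
    max_es, min_es = max(strength.values()), min(strength.values())
    # Assumption: threshold lies between the extremes; the exact rule may differ.
    threshold = min_es + ratio * (max_es - min_es)
    kept = [e for e in edges if strength[e] >= threshold]   # filter weak edges

    # Connected components of the filtered graph via union-find.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in kept:
        parent[find(u)] = find(v)

    component = {v: find(v) for v in nodes}
    new_nodes = sorted(set(component.values()))              # one packed node each
    # Connect two packed nodes if any original edge ran between their components.
    new_edges = sorted({tuple(sorted((component[u], component[v])))
                        for u, v in edges if component[u] != component[v]})
    return new_nodes, new_edges, component
```

On two triangles joined by a single weak bridge, the sketch returns two packed nodes connected by one edge.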

5.2. Hierarchical Compressions

Hierarchical compression is the second step; its pseudocode is shown in Algorithm 2.

Input: Graph oriG;
Output: Hierarchical graphs after compression
hierGraphs.push(oriG);
while (hierGraphs.topG.nodes.size > tN)
    newG ← compressGraph(hierGraphs.topG);
    hierGraphs.push(newG);
return hierGraphs;

The algorithm is built on Algorithm 1. It first pushes the original diagram onto a stack and then checks whether the top element of the stack satisfies the compression condition, that is, whether it has more than tN nodes; if so, it compresses the top element and pushes the result onto the stack again.
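The loop can be sketched in Python as follows, with the compression step passed in as a function so that the sketch stays self-contained. The guard that stops when compression no longer shrinks the graph is an addition for safety and is not part of the pseudocode; the names `build_hierarchy`, `compress`, and `t_n` are illustrative.

```python
def build_hierarchy(graph, compress, t_n):
    """graph: (nodes, edges); compress: function returning a coarser (nodes, edges)."""
    hier_graphs = [graph]                        # list used as a stack; top is last
    while len(hier_graphs[-1][0]) > t_n:
        new_graph = compress(hier_graphs[-1])
        if len(new_graph[0]) >= len(hier_graphs[-1][0]):
            break                                # added guard: no progress, stop
        hier_graphs.append(new_graph)
    return hier_graphs
```

With a toy `compress` that keeps every second node, a graph of 8 nodes and `t_n = 2` yields levels of 8, 4, and 2 nodes.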

5.3. Optimization Algorithm Based on Blocks

The main procedure of the single-level partition algorithm includes three steps: (1) adopt the spring algorithm to locate the coordinates of the single-level diagram, (2) set partitions and put each vertex into the partitions, and (3) iteratively execute gradient descent to minimize the intersection of the partitions.

The algorithm is a loop that processes the diagram of each layer. In the first step, it adopts the spring algorithm to set the initial positions of the single-level diagram. In the second step, it traverses all the vertices; because each vertex corresponds to a connected component (a child diagram) of the graph at the top of the stack, the components differ in size. According to the size of each component, it allocates a subarea whose width and height are in proportion to those of the original area and sets the center of the subarea to the initial coordinate of the vertex obtained in the first step. In the third step, iterative gradient descent on the intersecting parts of the subareas minimizes the overlap and yields the final coordinates of all areas. Its pseudocode is shown in Algorithm 3.

Input: hierarchical compressed graphs hierGraphs
Output: the layout result
while (hierGraphs not empty)
    springlayout(hierGraphs.topG);
    layout ← distrArea(hierGraphs.topG.nodes);
    optimize(layout);
    hierGraphs.pop();
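
The optimize step can be illustrated with a small Python sketch of gradient descent on the pairwise overlap. It is a simplified stand-in, not the scheme's optimizer: it measures overlap by penetration depths along each axis (exact when neither rectangle contains the other along an axis), ignores the distance term and the layout boundary, and uses a fixed learning rate; all names and constants are hypothetical.

```python
def overlap_extents(a, b):
    """Penetration depths along x and y for centre-based rectangles (cx, cy, w, h)."""
    ox = (a[2] + b[2]) / 2 - abs(a[0] - b[0])
    oy = (a[3] + b[3]) / 2 - abs(a[1] - b[1])
    return ox, oy

def total_penalty(rects):
    """Sum of ox * oy over all intersecting pairs."""
    total = 0.0
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            ox, oy = overlap_extents(rects[i], rects[j])
            if ox > 0 and oy > 0:
                total += ox * oy
    return total

def optimize(rects, rate=0.1, iterations=200):
    """Move rectangle centres along the negative gradient of the overlap penalty."""
    rects = [list(r) for r in rects]
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in rects]
        for i in range(len(rects)):
            for j in range(i + 1, len(rects)):
                a, b = rects[i], rects[j]
                ox, oy = overlap_extents(a, b)
                if ox <= 0 or oy <= 0:
                    continue                      # disjoint pairs contribute nothing
                sx = 1.0 if a[0] >= b[0] else -1.0
                sy = 1.0 if a[1] >= b[1] else -1.0
                # d(ox*oy)/d(a.x) = -sx*oy and d(ox*oy)/d(a.y) = -sy*ox; b is opposite.
                grads[i][0] -= sx * oy
                grads[i][1] -= sy * ox
                grads[j][0] += sx * oy
                grads[j][1] += sy * ox
        for r, g in zip(rects, grads):
            r[0] -= rate * g[0]
            r[1] -= rate * g[1]
    return [tuple(r) for r in rects]
```

Two heavily overlapping squares are pushed apart until the penalty reaches zero, after which the gradient vanishes and the positions stay fixed.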

6. Evaluation

In community partition, we compare our scheme with a popular partition approach [3] which is based on an empirical value. We adopt 10 social network data sets with community structure. The experimental configuration is as follows: operating system, Windows 7; CPU, Intel(R) Core(TM)4 Quad CPU 2.33 GHz; memory, 4 GB; and area of layout is .

In the multilevel drawing stage, we compare our scheme with the fast multilevel algorithm FM3. The data sets are taken from Newman's classical data sets and include two groups of artificial social network graphs with communities and three real data sets: “subScience,” “football,” and “polbooks.” The artificial graphs include a social network with 128 nodes and 1009 edges and a scientists' collaboration network with 379 nodes and 914 edges. The football graph represents the network of games among the football teams in one competition season and includes 114 nodes (teams) and 615 edges (competitions); polbooks is the data set of the sales of American political books on http://www.amazon.com/, where an edge represents that two books were bought together in one order.

6.1. Evaluation Metrics

We adopt the cluster quality value MQ to evaluate the accuracy of community partition. MQ is the average density of edges inside a community. After a graph is partitioned, a larger MQ indicates that the partition result is closer to the real community structure. MQ is calculated as follows:
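Following the verbal definition above, MQ can be sketched in Python as the mean intra-community edge density. The sketch assumes that density means existing internal edges divided by all possible internal node pairs and that single-node communities are skipped; both choices and the name `cluster_quality` are assumptions for illustration, and the exact formula of the scheme may contain further terms.

```python
def cluster_quality(clusters, edges):
    """Average intra-community edge density over all communities."""
    edge_set = {frozenset(e) for e in edges}
    densities = []
    for members in clusters:
        members = list(members)
        possible = len(members) * (len(members) - 1) // 2
        if possible == 0:
            continue        # assumption: single-node communities are skipped
        internal = sum(1 for i, u in enumerate(members) for v in members[i + 1:]
                       if frozenset((u, v)) in edge_set)
        densities.append(internal / possible)
    return sum(densities) / len(densities) if densities else 0.0
```

A partition whose communities are all cliques scores 1 under this definition, and sparser communities pull the average down.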

In several common algorithms, the MQ threshold value is an empirical value chosen by a statistical approach. Given an MQ value $q$, we can identify the probability that the partition quality is higher than $q$.

In our scheme, the threshold value is computed from the maximum and minimum edge strengths, as described in Section 4.1.

6.2. Community Partition Comparison

On the 10 social network data sets, we compare the cluster quality value of our scheme with that of the empirical threshold approach; the result is shown in Table 1.

Table 1 shows that the MQ obtained with the empirical value is lower than the MQ of our scheme, and the average MQ obtained with the empirical value is also lower than the average obtained with our threshold selection approach.

6.3. Comparison of Layout Effects

Figure 4 compares the drawing results of the FM3 algorithm and our algorithm on the artificial social network data sets. The first data set has 128 nodes, 1009 edges, and four obvious communities; the second data set has 5 obvious communities.

From Figure 4, both our scheme and FM3 can recognize the four communities in the graph. The difference is that FM3 often presents the whole layout in a very small region: the community partition is clear, but the degree of overlap inside the communities is high. The main reason is that configuring its parameters requires many experiments and validations to adjust the empirical values for various kinds of layout regions. Therefore, operators who are not familiar with the algorithm or lack experience need much more time to reach an ideal drawing result. In our scheme, by contrast, not only are the four communities recognized correctly, but the layout space is also used better for visualization. Besides, only one parameter needs to be configured (the percentage of users’ expectation on final layout space). Thus, it is easy for users to understand and operate.

Figure 5 compares the drawing results of our scheme and FM3 on three real data sets with certain community structures. The first data set is a graph of collaboration relationships among researchers, “subScience,” with 379 vertices and 914 edges; the second is a competition schedule graph of a football club, “football,” with 114 vertices and 616 edges; the third is the sales situation of books about American politics on http://www.amazon.com/, “polbooks,” with 105 vertices and 882 edges.

For the first data set, our scheme recognizes the main community structures and makes full use of the layout space; for the second, it presents the distribution of competitions among teams; for the third, it partitions the buyers into 3 categories and makes good use of the layout space. In general, our scheme has better layout effects than FM3.

7. Conclusion

We studied graph drawing schemes and proposed a new scheme for social networks which improves the effectiveness of graph drawing. We also compared the effectiveness of our scheme with that of FM3 in experiments. The experimental results show that our scheme effectively extracts the community structure of the social network for use in the drawing algorithm and produces a clearer diagram.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This research was supported by National Natural Science Foundation of China (no. 61100192) and Research Fund for the Doctoral Program of Higher Education of China (no. 20112302120074) and was partially supported by Shenzhen Strategic Emerging Industry Development Foundation (no. JCYJ20120613151032592 and no. ZDSY20120613125016389), National Key Technology R&D Program of MOST China under Grant no. 2012BAK17B08 and National Commonweal Technology R&D Program of AQSIQ China under Grant no. 201310087. The authors thank the reviewers for their comments.