Abstract

Detecting local community structure in complex networks is an appealing problem that has attracted increasing attention in various domains. However, most current local community detection algorithms are sensitive to the state of the source node and cannot effectively identify the multiple communities linked by overlapping nodes. We propose a novel local community detection algorithm based on maximum clique extension, called LCD-MC. The method first finds all the maximum cliques containing the source node and takes them as the initial local communities; it then extends each unclassified local community by greedy optimization until no node can further increase the objective function; the procedure ends once every maximum clique has been assigned to a community. An empirical evaluation on both synthetic and real datasets demonstrates that our algorithm outperforms several state-of-the-art approaches.

1. Introduction

In recent years, more and more research has focused on large complex networks, such as social networks, protein interaction networks, citation networks, and the World Wide Web. Extensive research has indicated that community structure exists universally in complex networks and that nodes within a community are more densely connected to each other than to nodes in other communities. Moreover, the nodes of a community often have similar attributes or play similar roles. Community detection has therefore become one of the basic tasks of complex network analysis and is of both theoretical and practical importance.

Research on community detection has mostly focused on detecting all community structures in a whole network from a global viewpoint [1–5]. However, many real complex networks are prohibitively large. For example, the friend relation networks on Facebook and Twitter contain hundreds of millions of nodes [6], and detecting the community structures of such huge networks costs tremendous time and space. In addition, as the nodes and links of many complex networks evolve dynamically [7, 8], it is often hard to acquire complete network information, which further increases the difficulty of global community detection. Therefore, many scholars have begun to focus on local community detection in complex networks.

Unlike global community detection, which partitions an entire complex network, local community detection only seeks the community that contains a designated node (the source node). The network is essentially divided into two parts, namely, the community where the designated node is located and the rest of the network. The local community is densely connected internally but only loosely connected to the outside. Local community detection does not need to know all information about the network in advance. It starts from a node, gradually extends from it, and acquires the local information around the current community during the extension process. Representative algorithms for local community detection include [9–13].

However, most available local community detection algorithms have two limitations. First, they start directly from the source node, greedily select the best node from the candidates, and add it to the local community until the community satisfies the specified conditions; such a procedure easily deviates from the real local community, which reduces detection accuracy. Second, they return only a single local community, so when the source node is an overlapping (hub) node connecting multiple communities, they cannot obtain all of its local communities.

To solve the above two problems, we propose a local community detection algorithm based on maximum cliques extension (LCD-MC), which consists of finding maximum cliques and extending local communities. Its main advantages are as follows.
(i) Instead of taking the source node directly as input, local community extension starts from all maximum cliques containing the source node, which increases the stability of local community detection.
(ii) Local communities that share an overlapping node can be identified flexibly, because every maximum clique satisfying certain conditions is extended separately.
(iii) The experimental results on both synthetic and real networks demonstrate that, compared with state-of-the-art local community detection algorithms, LCD-MC obtains local communities of better quality and can effectively identify the multiple local community structures connected by an overlapping node.

Let a complex network be $G = (V, E)$, where $V$ is the node set, $E$ is the edge set, and $n = |V|$ and $m = |E|$ are the numbers of nodes and edges in the network, respectively. Unlike global algorithms, which divide $G$ into a number of closely connected community structures, local community detection designates a node $v_0$ and explores the community structures closely related to $v_0$.

Clauset first proposed the formal definition of local community detection [9]. Assume that we already know a community structure $C$ composed of some nodes (initially, $C$ contains only the node $v_0$); the set $U$ consists of the nodes that are connected with nodes in $C$ but do not belong to $C$. The process of local community detection is to repeatedly select nodes from $U$ and add them to the current community $C$ until the predefined local modularity $R$ reaches its maximum value. To define the objective function $R$, Clauset also defined the boundary $B$ of community $C$, that is, the set of nodes in $C$ that have at least one neighbor in $U$, as shown in Figure 1.
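The three node sets in this definition can be illustrated with a few lines of Python. The snippet below is only a sketch under stated assumptions: the graph is stored as a dictionary mapping each node to the set of its neighbors, and the function name `shell_and_boundary` and the toy graph are hypothetical, chosen for illustration.

```python
def shell_and_boundary(adj, community):
    """Return (U, B) for a community C, given an adjacency dict of sets."""
    C = set(community)
    # U: nodes outside C that are adjacent to at least one member of C.
    U = {w for v in C for w in adj[v]} - C
    # B: members of C that have at least one neighbor outside C (i.e., in U).
    B = {v for v in C if adj[v] - C}
    return U, B


# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
U, B = shell_and_boundary(adj, {0, 1, 2})
print(U, B)  # {3} {2}
```

Only the neighborhoods of the nodes in the current community are inspected, which reflects the point that local detection does not require the whole network.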

Given $C$, $U$, and $B$, the local modularity $R$ is defined by

$$R = \frac{\sum_{ij} B_{ij}\,\delta(i,j)}{\sum_{ij} B_{ij}} \tag{1}$$

where $B_{ij} = 1$ if nodes $v_i$ and $v_j$ are connected and at least one of them belongs to $B$, and $B_{ij} = 0$ otherwise; $\delta(i,j) = 1$ when one of $v_i$ and $v_j$ belongs to $B$ and the other belongs to $C$, and $\delta(i,j) = 0$ otherwise. The larger the value of $R$, the better the detected local community structure. Initially, $C$ contains only $v_0$, and Clauset used a greedy optimization of the local modularity $R$ to find the local community structure where the designated node $v_0$ is located.
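As a rough illustration of this definition, the local modularity can be computed by counting the edges that touch the boundary. This is a minimal sketch, not Clauset's implementation: the adjacency-dictionary representation, the function name `local_modularity_R`, the toy graph, and the convention of returning 0.0 when the boundary is empty are all assumptions.

```python
def local_modularity_R(adj, community):
    """Fraction of boundary-incident edges whose other endpoint lies in C."""
    C = set(community)
    # Boundary B: members of C with at least one neighbor outside C.
    B = {v for v in C if adj[v] - C}
    inside = 0   # edges with one endpoint in B and the other in C
    total = 0    # all edges with at least one endpoint in B
    seen = set()
    for b in B:
        for w in adj[b]:
            edge = frozenset((b, w))
            if edge in seen:      # count each undirected edge once
                continue
            seen.add(edge)
            total += 1
            if w in C:
                inside += 1
    # Assumed convention: an empty boundary yields 0.0.
    return inside / total if total else 0.0


# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
# For C = {0,1,2} the boundary is {2}; two of its three edges stay inside C.
print(local_modularity_R(adj, {0, 1, 2}))
```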

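Similar to the Clauset algorithm, Luo et al. proposed the LWP algorithm [10], in which $R$ is replaced by a new local modularity $M$:

$$M = \frac{\sum_{i<j} A_{ij}\,\delta(i,j)}{\sum_{i<j} A_{ij}\,\lambda(i,j)} \tag{2}$$

where $A_{ij} = 1$ if nodes $v_i$ and $v_j$ are connected and $A_{ij} = 0$ otherwise; $\delta(i,j) = 1$ if both $v_i$ and $v_j$ belong to community $C$, and $\delta(i,j) = 0$ otherwise; and $\lambda(i,j) = 1$ if exactly one of $v_i$ and $v_j$ belongs to community $C$, and $\lambda(i,j) = 0$ otherwise. In other words, $M$ is the ratio of the number of edges inside $C$ to the number of edges between $C$ and the rest of the network. In addition to the different objective function, the LWP algorithm includes both addition and deletion operations, so that nodes can be added to or deleted from community $C$ whenever doing so increases $M$. Moreover, the LWP algorithm does not need the size of the community to be predefined.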
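The local modularity of LWP reduces to a ratio of two edge counts, which the following sketch makes explicit. The adjacency-dictionary representation, the function name `local_modularity_M`, the toy graph, and the choice to return infinity when no edge leaves the community are assumptions made for illustration.

```python
def local_modularity_M(adj, community):
    """Ratio of internal edges to external edges of a community."""
    C = set(community)
    # Each internal edge is seen from both endpoints, hence the division by 2.
    e_in = sum(1 for v in C for w in adj[v] if w in C) // 2
    # Each external edge is seen only from its endpoint inside C.
    e_out = sum(1 for v in C for w in adj[v] if w not in C)
    # Assumed convention: a community with no outgoing edge scores infinity.
    return e_in / e_out if e_out else float("inf")


# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(local_modularity_M(adj, {0, 1, 2}))  # 3 internal edges, 1 external edge
```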

LMF [11] is a local-global community detection algorithm, which proposes the following fitness function:

$$F = \frac{k_{in}}{(k_{in} + k_{out})^{\alpha}} \tag{3}$$

where $k_{in}$ and $k_{out}$ are the sums of the internal node degrees and external node degrees of community $C$, respectively, and $\alpha$ is a resolution parameter that controls the size of the local community. This algorithm is similar to the LWP algorithm in that, for a given $\alpha$, it drives the fitness function to a local maximum by adding and deleting nodes.
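A small sketch may help to see how the resolution parameter affects the fitness value. It assumes the usual reading of the two degree sums (the internal sum counts each internal edge twice, once per endpoint); the function name `fitness_F`, the default value of `alpha`, and the toy graph are hypothetical.

```python
def fitness_F(adj, community, alpha=1.0):
    """Fitness k_in / (k_in + k_out) ** alpha of a community."""
    C = set(community)
    # Sum of internal degrees: every internal edge contributes 2.
    k_in = sum(1 for v in C for w in adj[v] if w in C)
    # Sum of external degrees: every edge leaving C contributes 1.
    k_out = sum(1 for v in C for w in adj[v] if w not in C)
    total = k_in + k_out
    # Assumed convention: a community without any edge has fitness 0.0.
    return k_in / total ** alpha if total else 0.0


# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(fitness_F(adj, {0, 1, 2}))             # k_in = 6, k_out = 1
print(fitness_F(adj, {0, 1, 2}, alpha=2.0))  # same sums, larger exponent
```

With a larger exponent the denominator grows faster than the numerator, so the same community scores lower and the greedy search tends to stop at smaller communities.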

Wu et al. [12] proposed a local community detection algorithm based on link similarity (LS). The algorithm first defines the similarity between a single node and a local community and then carries out local community detection in decreasing order of the calculated similarity values. The search process of this algorithm consists of greedy clustering, optimization, and trimming.

Chen et al. [13] proposed a local community detection algorithm based on the local degree center node (LMD). Although the objective of local community detection is to find the community structure where the given node $v_0$ is located, the authors argued that, for some given nodes, starting the detection directly from $v_0$ may not yield ideal results. Therefore, to increase the robustness of local community detection, LMD does not start the search from the given node; instead, it first finds the local degree center node nearest to the given node and then extends local communities from this local degree center node, with $R$, $M$, and $F$ as objective functions. Here, a local degree center node is a node whose degree is greater than or equal to that of all its neighbor nodes.
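The notion of a local degree center node can be sketched as follows. This is not the LMD implementation: interpreting "nearest" as the smallest hop distance found by breadth-first search is an assumption, as are the function name `nearest_local_degree_center` and the toy graph.

```python
from collections import deque


def nearest_local_degree_center(adj, source):
    """Breadth-first search for the closest node whose degree is at least
    as large as the degree of each of its neighbors."""

    def is_center(v):
        return all(len(adj[v]) >= len(adj[w]) for w in adj[v])

    seen = {source}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if is_center(v):
            return v
        for w in sorted(adj[v]):   # sorted only to make ties deterministic
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return None  # not reached: a maximum-degree node of the component qualifies


# Toy graph (hypothetical): triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(nearest_local_degree_center(adj, 0))  # 2
```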

3. Algorithm

In this section, we introduce the proposed local community detection algorithm based on maximum cliques extension (LCD-MC), which is mainly composed of two parts, namely, algorithm FindMC for finding the maximum cliques of a node and algorithm LCD for local community extension corresponding to the maximum cliques.

3.1. FindMC Algorithm for Finding Maximum Cliques

Definition 1. Given an undirected graph $G = (V, E)$, if $V' \subseteq V$ and $(u, v) \in E$ for any two distinct nodes $u, v \in V'$, then $V'$ is called a complete subgraph of $G$.

Definition 2. A complete subgraph $V'$ of $G$ is a maximum clique of $G$ if and only if $V'$ is not contained in any larger complete subgraph of $G$ (this is the notion usually called a maximal clique).
In FindMC, we adopt the idea of the Bron-Kerbosch algorithm [19] to find the maximum cliques that contain the given source node. The algorithm maintains three sets, $R$, $P$, and $X$, whose functions are as follows:
(1) $R$ stores the nodes that form the clique currently being built;
(2) $P$ stores the candidate nodes that are adjacent to every node in $R$ and can therefore still be added to $R$ to extend the current clique;
(3) $X$ stores the candidate nodes from $P$ that have already been processed.
If a node in $P$ can join the nodes in $R$ to form a larger clique, it is added to $R$ for the recursive search and then moved from $P$ to $X$.
The FindMC algorithm starts from node $v_0$ and recursively constructs a search tree as the nodes in set $P$ are successively added to set $R$. In this search tree, each internal node corresponds to a state, that is, a candidate clique, while each leaf node represents a maximum clique.
Algorithm 1 gives the pseudocode of the FindMC algorithm. In the initialization phase, FindMC stores only node $v_0$ in set $R$ and only the neighbours of $v_0$ in set $P$, because the nodes of any maximum clique containing $v_0$ can only be neighbours of $v_0$, and any search outside this range is wasted (line 01). In addition, so that set $P$ can be updated safely during the iteration, the nodes in $P$ are copied to nodeList, where they are sorted by degree in descending order. A node with a larger degree is more likely to belong to a maximum clique, so such nodes are given priority to improve efficiency (lines 02~03). After initialization, FindMC executes the conventional Bron-Kerbosch algorithm recursively on the sets $R$, $P$, and $X$ and stores the maximum cliques found in MCS (the maximum clique set).

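Input: network $G = (V, E)$, source node $v_0$
Output: Maximum Clique Set MCS
Begin:
(1) $R \leftarrow \{v_0\}$, $P \leftarrow N(v_0)$, $X \leftarrow \emptyset$   // $N(v)$: the neighbours of $v$
(2) nodeList $\leftarrow P$
(3) sort the nodes in nodeList by degree in descending order
(4) foreach $v$ in nodeList
(5)   BronKerbosch ($R \cup \{v\}$, $P \cap N(v)$, $X \cap N(v)$, MCS);
(6)   $P \leftarrow P \setminus \{v\}$;
(7)   $X \leftarrow X \cup \{v\}$;
(8) endfor
(9) return MCS
End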

Algorithm 2 is the conventional Bron-Kerbosch algorithm. If both sets $P$ and $X$ are empty, the nodes in $R$ satisfy the condition for a maximum clique, and $R$ is added to the maximum clique set MCS (lines 01–03). Otherwise, the algorithm selects a pivot node $u$ from $P \cup X$ and calls itself recursively for each node $v$ in $P$ that is not a neighbour of $u$. When the recursive call for $v$ has returned, $v$ is deleted from set $P$ and added to set $X$ (lines 04–09).

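Input: $R$, $P$, $X$, MCS
Output: MCS
Begin:
(1) if $P = \emptyset$ and $X = \emptyset$
(2)   MCS $\leftarrow$ MCS $\cup \{R\}$;
(3) else
(4)   $u \leftarrow$ a pivot node in $P \cup X$
(5)  foreach $v$ in $P \setminus N(u)$   // $N(u)$: the neighbours of $u$
(6)   BronKerbosch ($R \cup \{v\}$, $P \cap N(v)$, $X \cap N(v)$, MCS);
(7)   $P \leftarrow P \setminus \{v\}$;
(8)   $X \leftarrow X \cup \{v\}$;
(9) endfor
(10) endif
(11) return MCS
End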
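Algorithms 1 and 2 can be sketched together in Python. The code below is an illustrative sketch rather than the authors' C# implementation: the adjacency-dictionary representation and the function names `bron_kerbosch` and `find_mc` are hypothetical, the pivot rule (the node of P ∪ X with the most neighbors in P) is an assumption because the text does not state how the pivot is chosen, and returning the source alone when it has no neighbors is an added convention.

```python
def bron_kerbosch(adj, R, P, X, cliques):
    """Bron-Kerbosch with pivoting; appends each maximal clique to `cliques`."""
    if not P and not X:
        cliques.append(frozenset(R))
        return
    # Assumed pivot rule: the node of P | X with the most neighbors in P.
    pivot = max(P | X, key=lambda u: len(adj[u] & P))
    for v in list(P - adj[pivot]):
        bron_kerbosch(adj, R | {v}, P & adj[v], X & adj[v], cliques)
        P.remove(v)   # v has been fully explored ...
        X.add(v)      # ... so it is excluded from later branches


def find_mc(adj, source):
    """All maximal cliques that contain `source` (FindMC-style sketch)."""
    if not adj[source]:
        return [frozenset({source})]   # assumed convention for isolated nodes
    cliques = []
    R, P, X = {source}, set(adj[source]), set()
    # Visit the neighbors in degree-descending order (ties broken by label).
    for v in sorted(P, key=lambda u: (-len(adj[u]), u)):
        bron_kerbosch(adj, R | {v}, P & adj[v], X & adj[v], cliques)
        P.remove(v)
        X.add(v)
    return cliques


# Toy graph (hypothetical): triangles {0,1,2} and {2,3,4} sharing node 2.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4},
    3: {2, 4}, 4: {2, 3},
}
print(find_mc(adj, 2))
```

For the shared node 2, the search returns both triangles, which is the situation in which LCD-MC later extends more than one local community.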

3.2. LCD Algorithm for Extending Local Community Structure

A maximum clique is a very closely connected node set; however, requiring full connection is too strict a definition of community structure. Therefore, the second step of LCD-MC further extends the local community structure from the MCS obtained by Algorithm 1, as shown in Algorithm 3 (LCD). The LCD algorithm carries out the following operations for each unclassified maximum clique (no node in such a clique has been allocated to any local community):
(1) initialization, which adds each node of the current clique to the local community LC and adds all nodes that are connected with LC but do not belong to it into set $U$ (lines 04–08);
(2) extension of the local community LC, which selects from $U$ the node that yields the greatest increase of the objective function, adds it to LC, and updates the corresponding nodes in $U$, until no node can increase the value of the objective function (lines 09–21);
(3) finally, return of the required local community set LCS, each community of which contains the initial node (line 23).

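Input: $G$, MCS
Output: Local Communities LCS
Begin:
(1) LCS $\leftarrow \emptyset$
(2) foreach unclassified MC in MCS
(4)   LC $\leftarrow \emptyset$, $U \leftarrow \emptyset$
(5)   foreach $v$ in MC
(6)    LC $\leftarrow$ LC $\cup \{v\}$
(7)    Initialize $U$;
(8)   endfor
(9)   while true
(10)    $v^* \leftarrow$ null, maxDelta $\leftarrow 0$
(11)    foreach $u$ in $U$
(12)     delta $\leftarrow$ CalculateDeltaValue ($u$);
(13)     if delta > maxDelta then $v^* \leftarrow u$, maxDelta $\leftarrow$ delta   (end of foreach)
(14)    if maxDelta > 0
(15)     LC $\leftarrow$ LC $\cup \{v^*\}$;
(16)     Update $U$;
(17)    else
(18)     break;
(19)    endif
(20)   endwhile
(21)   LCS $\leftarrow$ LCS $\cup$ {LC}
(22) endfor
(23) return LCS
End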
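The extension step can be sketched as a greedy loop. The code below is a sketch under several assumptions and should not be read as the authors' implementation: it uses the ratio of internal to external edges as the objective, recomputes that ratio from scratch for clarity instead of updating it incrementally, and treats a clique as unclassified when none of its nodes other than the source has been assigned yet (every clique contains the source, so the source itself cannot be part of that test). The names `m_value` and `lcd` and the toy graph are hypothetical.

```python
def m_value(adj, C):
    """Internal edges divided by external edges (infinity if none leave C)."""
    e_in = sum(1 for v in C for w in adj[v] if w in C) // 2
    e_out = sum(1 for v in C for w in adj[v] if w not in C)
    return e_in / e_out if e_out else float("inf")


def lcd(adj, source, cliques):
    """Greedily extend each unclassified clique into a local community."""
    communities, classified = [], set()
    for mc in cliques:
        # Assumed reading of "unclassified": ignore the shared source node.
        if (set(mc) - {source}) & classified:
            continue
        LC = set(mc)
        while True:
            U = {w for v in LC for w in adj[v]} - LC   # current shell
            current = m_value(adj, LC)
            best, best_delta = None, 0.0
            for u in sorted(U):
                delta = m_value(adj, LC | {u}) - current
                if delta > best_delta:
                    best, best_delta = u, delta
            if best is None:      # no candidate increases the objective
                break
            LC.add(best)
        communities.append(LC)
        classified |= LC
    return communities


# Toy graph (hypothetical): node 0 belongs to two groups,
# {0,1,2,3} and {0,4,5,6}; its maximal cliques are {0,1,2} and {0,4,5}.
adj = {
    0: {1, 2, 4, 5},
    1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2},
    4: {0, 5, 6}, 5: {0, 4, 6}, 6: {4, 5},
}
print(lcd(adj, 0, [frozenset({0, 1, 2}), frozenset({0, 4, 5})]))
```

On this toy graph each clique grows by exactly one node, so the overlapping node 0 ends up in two local communities.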

On line 12 of Algorithm 3, the function CalculateDeltaValue() calculates the increment of the objective function after node $u$ is added to the current local community. Here, the objective function can be any objective function mentioned in Section 2. To improve efficiency, the increment can be calculated from the change that adding node $u$ to the local community would cause, without recomputing the objective from scratch. Take the objective function $M$ in (2) as an example. Assume that the number of edges in the current local community LC is $e_{in}$ and the number of edges between LC and $U$ is $e_{out}$. Let $\Delta e_{in}$ be the increase in the number of edges inside LC caused by adding $u$, and let $e'_{out}$ be the number of edges between LC and $U$ after $u$ has been added to LC; then, the delta value of $M$ is

$$\Delta M = \frac{e_{in} + \Delta e_{in}}{e'_{out}} - \frac{e_{in}}{e_{out}} \tag{4}$$
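The incremental calculation can be sketched in a few lines. The bookkeeping below is a derivation added for illustration rather than taken from the text: if the candidate has `links_in` edges into the community and `links_out` edges to nodes outside it, then the internal edge count grows by `links_in`, while the external edge count loses those `links_in` edges and gains the `links_out` new ones. The function name `delta_M` and its parameter names are hypothetical, and the external edge count is assumed to be positive before the addition.

```python
def delta_M(e_in, e_out, links_in, links_out):
    """Change of the internal/external edge ratio when one node is added.

    e_in, e_out : internal and external edge counts before the addition
    links_in    : edges from the candidate into the community
    links_out   : edges from the candidate to nodes outside the community
    """
    new_in = e_in + links_in
    # Edges that pointed at the candidate become internal; its remaining
    # edges become new external edges of the enlarged community.
    new_out = e_out - links_in + links_out
    new_M = new_in / new_out if new_out else float("inf")
    return new_M - e_in / e_out   # assumes e_out > 0


# A community with 3 internal and 4 external edges gains a node that has
# 2 edges into the community and none elsewhere.
print(delta_M(3, 4, 2, 0))  # 5/2 - 3/4 = 1.75
```

Only the degree information of the candidate is needed, so each evaluation costs time proportional to the candidate's degree instead of the size of the community.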

3.3. Time Complexity

We analyze the time complexity of LCD-MC for its two steps separately. The worst-case time complexity of the Bron-Kerbosch algorithm is $O(3^{n/3})$ [20]; however, because LCD-MC only needs the cliques containing the initial node, it essentially runs the Bron-Kerbosch algorithm on the subgraph formed by the initial node and its neighbors. The worst-case time complexity of FindMC, the first step of LCD-MC, is therefore $O(3^{d/3})$, where $d$ is the degree of the initial node, while that of LCD, the second step of LCD-MC, depends on the size of the detected local community $C$. Note that both $d$ in FindMC and $|C|$ in LCD are far smaller than the overall size of the network, so the cost of LCD-MC is governed by local quantities rather than by the size of the whole network.

4. Evaluation

In this section, we compare the proposed LCD-MC with several representative local community detection algorithms. We conducted all the experiments on a Pentium Core2 Duo 2.8 GHz PC with 2 GBytes of main memory, running Windows 7. We implemented our algorithm in C#, using Microsoft Visual Studio 2008.

4.1. Experimental Setup

We compared LCD-MC with several representative local community detection algorithms, including Clauset [9], LWP [10], LS [12], and LMD [13]. Of them, Clauset was the first to propose local community detection, and, therefore, we take it as the baseline algorithm. Both LWP and LS inspect the community while extending it and delete the nodes that do not satisfy certain conditions. LWP takes $M$ in (2) as the objective function, while LS takes a node similarity factor into consideration and can adopt any objective function. Both LMD and LCD-MC start local community detection from something other than the source node itself: LMD starts from the local degree center node nearest to the source node, while LCD-MC starts from the maximum cliques where the source node is located. Both LMD and LCD-MC can be combined with different objective functions. In our experiment, LS, LMD, and LCD-MC all used one common objective function.

We first compared the quality of the local communities found by these algorithms. As these algorithms, except LCD-MC, cannot identify multiple local communities connected by overlapping nodes, we selected LFR benchmarks [21, 22] with nonoverlapping structure and several labeled real networks as experimental data; their information is shown in Tables 1 and 2. The parameters in Table 1 are as follows: $N$, the number of nodes; $k$, the average degree; $maxk$, the maximum degree; $minc$, the minimum community size; $maxc$, the maximum community size; $mu$, the mixing parameter, that is, the probability that a node is connected with nodes of external communities. It should be pointed out that, to obtain a single local community structure, the maximum clique with the largest number of nodes was selected for extension in the second step of LCD-MC.

To evaluate the quality of the local communities generated by various methods, we adopt F-Measure score (FM) and normalized mutual information (NMI) [23] as the evaluation indexes.

FM is a commonly used measure for community detection algorithms. Assume that $T$ is the set of node pairs $(u, v)$ where nodes $u$ and $v$ belong to the same class in the ground truth, and $S$ is the set of node pairs that belong to the same community generated by an algorithm. Then FM combines precision and recall:

$$FM = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{5}$$

where precision and recall are given by

$$\mathrm{precision} = \frac{|T \cap S|}{|S|}, \qquad \mathrm{recall} = \frac{|T \cap S|}{|T|} \tag{6}$$
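The pair-based definition can be made concrete with a short sketch. The function names `same_group_pairs` and `f_measure`, the representation of classes and communities as lists of node sets, and the convention of returning 0.0 for empty pair sets are assumptions for illustration.

```python
from itertools import combinations


def same_group_pairs(groups):
    """All unordered node pairs that share a group."""
    pairs = set()
    for group in groups:
        pairs.update(frozenset(p) for p in combinations(sorted(group), 2))
    return pairs


def f_measure(truth_groups, found_groups):
    """Pair-based precision, recall, and their harmonic mean (FM)."""
    truth_pairs = same_group_pairs(truth_groups)
    found_pairs = same_group_pairs(found_groups)
    common = len(truth_pairs & found_pairs)
    # Assumed convention: an empty pair set gives a score of 0.0.
    precision = common / len(found_pairs) if found_pairs else 0.0
    recall = common / len(truth_pairs) if truth_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# The true class has four nodes; the algorithm found only three of them.
print(f_measure([{0, 1, 2, 3}], [{0, 1, 2}]))
```

In this example all three found pairs are correct (precision 1.0), but only three of the six true pairs are recovered (recall 0.5), which illustrates why the two indexes need to be read together.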

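NMI is another widely used criterion for measuring the performance of community detection algorithms. Formally, NMI can be defined as

$$NMI = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} N_{ij} \log\left(\frac{N_{ij}\, n}{N_{i\cdot} N_{\cdot j}}\right)}{\sum_{i=1}^{C_A} N_{i\cdot} \log\left(\frac{N_{i\cdot}}{n}\right) + \sum_{j=1}^{C_B} N_{\cdot j} \log\left(\frac{N_{\cdot j}}{n}\right)} \tag{7}$$

where $N$ is the confusion matrix, $N_{ij}$ is the number of nodes that are both in the $i$th class and in the $j$th cluster, $C_A$ and $C_B$ are the numbers of classes and clusters, respectively, $N_{i\cdot}$ and $N_{\cdot j}$ are the numbers of nodes in the $i$th class and the $j$th cluster, respectively, and $n$ is the total number of nodes.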
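The confusion-matrix form of NMI can be sketched directly from these quantities. The code assumes that every node carries exactly one class label and one cluster label; the function name `nmi`, the dictionary representation, and the convention of returning 1.0 when both labelings consist of a single group are assumptions for illustration.

```python
import math
from collections import Counter


def nmi(truth, found):
    """Normalized mutual information of two labelings of the same nodes.

    truth, found: dicts mapping every node to its class / cluster label.
    """
    n = len(truth)
    confusion = Counter((truth[v], found[v]) for v in truth)  # N_ij
    row = Counter(truth[v] for v in truth)                    # class sizes
    col = Counter(found[v] for v in truth)                    # cluster sizes
    numerator = -2.0 * sum(
        c * math.log(c * n / (row[i] * col[j]))
        for (i, j), c in confusion.items()
    )
    denominator = (sum(c * math.log(c / n) for c in row.values())
                   + sum(c * math.log(c / n) for c in col.values()))
    if denominator == 0:
        return 1.0  # assumed convention: both labelings are one single group
    return numerator / denominator


truth = {0: "a", 1: "a", 2: "b", 3: "b"}
print(nmi(truth, {0: 1, 1: 1, 2: 2, 3: 2}))  # same grouping, other labels
print(nmi(truth, {0: 1, 1: 2, 2: 1, 3: 2}))  # grouping independent of truth
```

Only nonzero entries of the confusion matrix enter the sum, which avoids evaluating the logarithm of zero.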

4.2. Synthetic Networks

Figures 2, 3, 4, and 5 show the experimental results of the five local community detection algorithms on the synthetic networks S1, S2, S3, and S4. Each figure includes four subplots, and the abscissa of each subplot represents the value of the mixing parameter $mu$. The larger the value of $mu$, the less distinct the community structure and the more difficult it is to find the corresponding community. The ordinates of the subplots represent the evaluation indexes, which, from left to right and from top to bottom, are precision, recall, FM, and NMI.

From the experimental results, we can make the following observations.
(1) Comparison of LWP and LS with Clauset. Compared with the baseline algorithm, both LWP and LS favor either precision or recall at the expense of the other: when a high precision is achieved in local community detection, the recall is often relatively low, and vice versa. In terms of the comprehensive evaluation indexes FM and NMI, neither algorithm is guaranteed to find a better local community than Clauset. This suggests that the critical factor in local community detection is not the self-inspection of the detected community.
(2) Comparison of LCD-MC and LMD with Clauset. In terms of the four indexes, both algorithms achieve better results than Clauset to various extents, indicating that starting local community detection from the initial node is indeed restrictive. According to the experimental results, LCD-MC, which starts from the maximum cliques of the node, is more effective than LMD, which starts the detection from the local degree center node.
(3) As a whole, LCD-MC achieves the best results on all four evaluation indexes. On both small community networks (S1 and S3) with a mixing parameter mu smaller than 0.6 and large community networks (S2 and S4) with a mixing parameter mu smaller than 0.5, LCD-MC finds the local community structure of each node almost fully correctly. On a highly mixed network, for example, with a mu of 0.8 or 0.9, neither LCD-MC nor the other algorithms obtain ideal results, which is consistent with the fact that the community structure of such networks is not distinct.

To sum up, LCD-MC can find local communities with better quality on synthetic networks compared with the other representative local community detection algorithms.

4.3. Real Networks

To further verify the performance of LCD-MC, we compare it with the other algorithms on real networks and show the comparison results in Table 3. Bold entries mark the best value achieved for each evaluation index. Of the indexes, precision and recall each reflect only one aspect of algorithm performance, while FM and NMI assess performance comprehensively and are therefore more meaningful for comparison. We can see that LCD-MC achieved the best results on the Karate, Football, Polblogs, and Adjnoun networks (all algorithms achieved a very low NMI value on Adjnoun, which can be neglected). On the Polbooks network, the results of LCD-MC are not the best, but they are only slightly inferior to the best results, which were achieved by the LMD algorithm.

To sum up, LCD-MC also achieved better experimental results on real networks compared with the other local community detection algorithms.

4.4. Local Communities of Overlapped Node

We verified the ability of LCD-MC to identify the multiple local communities in which overlapping nodes are located on an LFR synthetic network. The network is generated with the LFR configuration parameters $N$, $k$, $maxk$, $minc$, $maxc$, $mu$, $on$, and $om$, where $on$ is the number of overlapping nodes and $om$ is the number of memberships of each overlapping node. The corresponding network layout is shown in Figure 6. Table 4 shows the corresponding community distribution, in which nodes 2, 28, 37, and 56 are overlapping nodes, each connecting two communities.

As the Clauset, LWP, LS, and LMD algorithms can obtain only one local community from each node, we selected Clauset as their representative for comparison with the LCD-MC algorithm. The results of local community detection on the network depicted in Figure 6 by Clauset and LCD-MC are shown in Tables 5 and 6, respectively. Clauset found only one local community from each overlapping node, because the Clauset algorithm can only extend its single current local community according to the objective function $R$. LCD-MC effectively found two local communities from each overlapping node, because, in the initialization, LCD-MC uses maximum cliques as candidate local communities instead of a single node. Take node 2 as an example. In the initialization, its maximum clique set included three maximum cliques. One of them was used as the initial local community for extension and, as a result, community 4 was found, as shown in Table 4. At this stage, a second maximum clique was already included in this local community, but the third maximum clique was still outside it. Therefore, LCD-MC started from the third maximum clique to continue the search for a new local community and thus found two local community structures connected with node 2. LCD-MC therefore has the ability to find multiple local communities connected by overlapping nodes.

5. Conclusion

In this paper, we propose a novel local community detection algorithm for large complex networks based on maximum cliques extension (LCD-MC). The algorithm first adopts the idea of the Bron-Kerbosch algorithm to recursively find all maximum cliques containing the source node, then takes an arbitrary maximum clique satisfying certain conditions as the initial local community, and, by continuously exploring the neighborhood of the current local community, adds conforming nodes to the local community structure until no conforming node remains. The main characteristic of LCD-MC is that the extension starts from the maximum cliques of a given node instead of directly from the given node. In this way, it avoids deviation during community extension and can identify multiple local community structures connected by an overlapping node. We compared LCD-MC with representative algorithms such as Clauset, LWP, LS, and LMD on various synthetic and real networks. The experimental results demonstrate that the LCD-MC algorithm finds local communities of better quality on both synthetic and real networks. Moreover, the experimental results on a synthetic network with known overlapping community structures indicate that the LCD-MC algorithm is able to identify the multiple local communities in which an overlapping node is located.

Conflict of Interests

The authors declare that they have no financial and personal relationships with other people or organizations that can inappropriately influence their work; there is no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, the paper.

Acknowledgments

This work was supported by the National High Technology Research and Development Program of China (Grant no. 2012AA0622022 and Grant no. 2012AA011004), the Doctoral Fund of Ministry of Education of China (Grant no. 20100095110003 and Grant no. 20110095110010), the State 863 projects (Grant no. 2012AA011004), and the Graduate Research and Innovation Projects in Jiangsu Province (Grant no. CXZZ12_0934).