Complexity
Volume 2019, Article ID 3475458, 13 pages
https://doi.org/10.1155/2019/3475458
Research Article

A Semantic Community Detection Algorithm Based on Quantizing Progress

1School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China
2Postdoctoral Research Station of Computer Science and Technology, Harbin University of Science and Technology, Harbin, Heilongjiang 150080, China

Correspondence should be addressed to Deyun Chen; chendeyun@hrbust.edu.cn and Hailu Yang; yanghailu@hrbust.edu.cn

Received 26 July 2018; Revised 25 November 2018; Accepted 11 December 2018; Published 9 January 2019

Academic Editor: Pasquale De Meo

Copyright © 2019 Xu Han et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

A semantic social network contains an enormous number of nodes and complex semantic information, and traditional community detection algorithms cannot produce ideal, cogent communities for it. To solve the problem of community detection in semantic social networks, we present a clustering community detection algorithm based on the PSO-LDA model. Because the semantic model is the LDA model, we use the Gibbs sampling method to obtain quantitative parameters that map semantic information into a semantic space. Then, we present a PSO strategy that uses the semantic relation to solve overlapping community detection. Finally, we establish the semantic modularity (SimQ) for evaluating the detected semantic communities. The validity and feasibility of the PSO-LDA model and the semantic modularity are verified by experimental analysis.

1. Introduction

With the development of society and the improvement of science and technology, semantic social networks have developed rapidly, and many of them, such as Twitter and Weibo, have had a significant impact on our lives. In these networks, different individuals have different small social “worlds”, which are called communities [1]. Researchers therefore focus on community detection not only to divide networks into modules but also to gain deeper insight into the interesting properties of semantic social networks. In practical applications, semantic communities greatly benefit intelligent information retrieval, marketing management, individualized services, and other information management domains [2]. So far, research on community detection falls mainly into three categories: topological community detection [3], overlapping community detection [4], and semantic community detection.

Topological community detection represents the pioneering work; its goal is to study the topological structure and divide a social network into several separate subnetworks. Representative algorithms include Modular Optimization [5], GN [6], and FN [7]. Researchers then gradually turned to overlapping communities, which are closer to real networks than those studied previously. CPM [8] was therefore proposed to detect overlapping communities. Soon afterwards, overlapping community detection received more attention in social networks, and many representative algorithms were proposed, including LFM [9], EAGLE [10], COPRA [11], DEMON [12], and so forth. Neuman and Yair [13] proposed an agglomerative spectral clustering method with conductance and edge weights, in which the most similar nodes are agglomerated based on the eigenvector space and edge weights. However, this method is suitable only for nonsemantic social networks. Then, with the growing interest in semantic networks, semantic community detection attracted researchers' attention. Yang and McAuley [14] proposed the CESNA model, which detects communities by using edge structure and node attributes. This method leads to more accurate community detection as well as improved robustness in the presence of noise in the network structure, but it is unstable when applied to semantic networks. Reihanian and Ali [15] proposed a generic framework for overlapping community detection in social networks, with a special focus on rating-based social networks. This framework considers the information shared by the users in order to find meaningful communities. The most important feature of semantic communities is that their nodes not only have topological relationships but also carry semantic context. Because semantic data mining must rely on text analysis, many semantic community detection algorithms apply the Latent Dirichlet Allocation (LDA) [16] model as their core model.

In the last few years, the analysis of semantic social networks has become popular, and most of the algorithms use the LDA model as their basic model. The SVM-DTW method proposed by Solera, Calderara, and Cucchiara [17] can work on hierarchical networks. This method has a simple structure and needs few input parameters, but it does not consider the semantic context, so the detected communities have little connection with the real semantic network. Li, Ming, and She [18] proposed the GRTM model, which not only models users' interests as latent variables through their information but also considers the connections between users as a result of their information. This method combines context analysis with topological analysis, and the detected communities are close to the real semantic social network, but its sampling is deficient and can produce fuzzy, irrelevant communities. Xiao and Liu [19] proposed the GLDA-FP model, which can be extended with a prediscretizing method that helps the LDA model detect topic evolution automatically, but it requires a large amount of computation. The LCTA model proposed by Yin, Cao, and Gu [20] assigns different topic distributions to different communities to make the model reasonable. This method is highly accurate, but the number of communities needs to be preset, and some hidden parameters need to be set from experience.

In this paper, we propose a novel community detection algorithm that divides nodes into clusters. The main characteristic of the communities detected by this algorithm is that members of the same community have common or similar interests. We take into account the topic and keyword information in the text of individuals' words through the LDA model, then quantize the semantic nodes and map them into a semantic space. We then obtain ideal virtual social communities by using the Particle Swarm Optimization algorithm. Last but not least, we build a novel modularity model and use the new function to evaluate the virtual social communities we obtain.

Compared with other models used in semantic social networks, such as the Louvain method [21] and the stochastic block model [22], the LDA model provides a probabilistic method with a solid mathematical foundation. For the subsequent sampling step, Gibbs sampling offers an accurate and powerful mathematical justification for the convergence and solution of the LDA model, which is not available in the other semantic models. Combined with the PSO algorithm, the probability function produced by the LDA model can be closely integrated with the inertia weight and the constriction factor of the particles [23]. For performance measurement, we propose a new community evaluation model based on semantic information that uses the cosine function, which enriches the classic evaluation model.

The rest of the paper is organized as follows. Section 2 introduces the LDA model in semantic networks. Section 3 presents Gibbs sampling and the proposed algorithm. To verify our approach, we conducted extensive experiments on real datasets. Performance evaluation and experimental results are shown and discussed in Sections 4 and 5. Finally, in Section 6 we draw conclusions and outline further work.

2. Preliminaries

2.1. Community Detection Process

The problem of community detection is NP-hard [24]: a solution must be initialized at the beginning and then optimized continually until a satisfactory solution is obtained. The main goal of semantic community detection is to form communities whose individuals share common interests and probably have similar characteristics [25]. We therefore propose a novel idea that focuses on the textual data of individuals' words. Given the complexity of community detection, we use a probabilistic graphical model, LDA, to model the network. This model has a clear hierarchical structure [26], and the size of its parameter space is independent of the number of training documents.

First, we select topics and words from individuals’ semantic information through LDA model. Then, we map semantic nodes into semantic space via Gibbs sampling method [27]. Last, in order to get more accurate communities, we use Particle Swarm Optimization (PSO) algorithm to form semantic communities. The proposed community detection algorithm is clearly explained in the following steps.

2.1.1. Similar Semantic Information Discovery

Every individual says different words, so each node in a semantic social network has its own information content [28]. We therefore abstract the semantic context into topics and then extract keywords from the topics. Through the semantic information, we obtain distributions that constrain the otherwise unstructured context [29]. In this way, dividing a semantic social network into communities based on similar documents, topics, and keywords drawn from the social semantic content makes the communities realistic [30]. The LDA probability model is shown in Figure 1.

Figure 1: LDA probability model.

In this section, we apply the LDA model to the information content. The relevant mathematical symbols for illustrating the LDA model are given in Table 1. The LDA model assumes the following generative process for each node:

Table 1: The symbol description.

  $\theta \sim \mathrm{Dir}(\alpha)$. The parameter $\theta$, which pertains to the topic distribution, follows the Dirichlet distribution with prior parameter $\alpha$.

  $\phi \sim \mathrm{Dir}(\beta)$. The parameter $\phi$, which pertains to the keyword distribution, follows the Dirichlet distribution with prior parameter $\beta$.

  $z \sim \mathrm{Mult}(\theta)$. The topic $z$ follows the multinomial distribution given the topic distribution probability $\theta$.

  $w \sim \mathrm{Mult}(\phi_z)$. The keyword $w$ follows the multinomial distribution given the keyword distribution probability $\phi_z$ over topic $z$.

The process of forming the LDA model is shown in Algorithm 1, where $M$ denotes the number of documents in the process.

Algorithm 1: The generative process of LDA.
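
The generative process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function names, the symmetric prior values `alpha` and `beta`, and the corpus sizes are all hypothetical choices made for the example, and only the standard library is used.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the probability vector `probs`."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate (documents, topic assignments) from the LDA generative process."""
    rng = random.Random(seed)
    # One keyword distribution phi_k ~ Dir(beta) per topic (assumed symmetric prior).
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    docs, topics = [], []
    for _ in range(num_docs):
        # Topic distribution theta ~ Dir(alpha) for this document (node).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        # Topic z ~ Mult(theta) for every position, then keyword w ~ Mult(phi_z).
        z = [sample_categorical(theta, rng) for _ in range(doc_len)]
        w = [sample_categorical(phi[k], rng) for k in z]
        docs.append(w)
        topics.append(z)
    return docs, topics


docs, topics = generate_corpus(num_docs=4, doc_len=10, num_topics=3, vocab_size=8)
print(docs[0], topics[0])
```

Each document is a list of keyword indices and each topic list records which topic generated the keyword at the same position, which is the hidden variable that the sampling step in Section 3 has to recover.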

3. Gibbs Sampling and PSO Strategy

3.1. Gibbs Sampling

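Gibbs sampling [31] is a simple case of Markov chain Monte Carlo (MCMC) [32]. It draws a set of approximate samples from a Markov chain that is constructed to converge to the target probability distribution, which makes it suitable for high-dimensional models [33] such as LDA. Because of this Markov chain property, the conditional probability distribution becomes the key to Gibbs sampling [34]. For LDA in this paper, we only sample topics in the semantic social network; that is, we only need to consider the hidden variable $z$. We denote by $z_{\neg i}$ the topic set excluding $z_i$ and by $w_{\neg i}$ the set of keywords excluding $w_i$, in order to draw from the posterior probability $p(z_i \mid z_{\neg i}, w)$. For $z_i$, we can find the corresponding keyword $w_i$, so the probability can be described as in the following equation:

$$p(z_i = k \mid z_{\neg i}, w) \propto p(z_i = k, w_i = t \mid z_{\neg i}, w_{\neg i}).$$

When $z_i = k$ and $w_i = t$ ($t$ is one of the keywords in $w$; $k$, which corresponds to $t$, is one of the topics in $z$), the probability only involves the conjugate distributions of the $m$th document and the $k$th topic under the Dirichlet-multinomial model, where $\alpha$ and $\beta$ are the Dirichlet prior parameters of the topic and keyword distributions.

Let $n_m = (n_m^{(1)}, \dots, n_m^{(K)})$ be the numbers of topics in the $m$th document; the multinomial distribution can be described as

$$p(z_m \mid \theta_m) = \prod_{k=1}^{K} \theta_{m,k}^{\,n_m^{(k)}}.$$

The numbers of keywords in the $k$th topic, named $n_k = (n_k^{(1)}, \dots, n_k^{(V)})$, can be shown as follows under the multinomial distribution:

$$p(w_k \mid \phi_k) = \prod_{t=1}^{V} \phi_{k,t}^{\,n_k^{(t)}}.$$

The posterior distributions of $\theta_m$ and $\phi_k$ can be obtained in the following equations:

$$p(\theta_m \mid z_{\neg i}, w_{\neg i}) = \mathrm{Dir}(\theta_m \mid n_{m,\neg i} + \alpha), \qquad p(\phi_k \mid z_{\neg i}, w_{\neg i}) = \mathrm{Dir}(\phi_k \mid n_{k,\neg i} + \beta),$$

where $K$ is the number of topics and $V$ is the number of keywords.

The distribution probability can then be calculated from the expectations of these posteriors:

$$p(z_i = k \mid z_{\neg i}, w) \propto \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k'=1}^{K} \left(n_{m,\neg i}^{(k')} + \alpha_{k'}\right)} \cdot \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t'=1}^{V} \left(n_{k,\neg i}^{(t')} + \beta_{t'}\right)}.$$

Here $n_{m,\neg i}^{(k)}$ is the number of times topic $k$ occurs in the $m$th document excluding position $i$, the first denominator sums this count over all topics, $n_{k,\neg i}^{(t)}$ is the number of times keyword $t$ is assigned to topic $k$ excluding position $i$, and the second denominator sums this count over all keywords.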
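
To make the sampling step concrete, the following is a minimal sketch of a collapsed Gibbs sampler for LDA in the standard form described by Heinrich [31]. It is an illustration under assumptions rather than the authors' implementation: the function name `gibbs_lda`, the symmetric priors `alpha` and `beta`, the iteration count, and the toy corpus are all hypothetical.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.5, beta=0.1,
              iterations=50, seed=0):
    """Collapsed Gibbs sampler for LDA; returns (theta, phi) point estimates."""
    rng = random.Random(seed)
    n_mk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kt = [[0] * vocab_size for _ in range(num_topics)]  # keyword counts per topic
    n_k = [0] * num_topics                                # total keywords per topic
    z = []                                                # topic of every token

    # Random initial topic assignment.
    for m, doc in enumerate(docs):
        z.append([])
        for t in doc:
            k = rng.randrange(num_topics)
            z[m].append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                k = z[m][i]
                # Remove token i from the counts (the "excluding i" statistics).
                n_mk[m][k] -= 1
                n_kt[k][t] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional; the document-side denominator
                # does not depend on the topic and is therefore omitted.
                weights = [
                    (n_mk[m][j] + alpha) * (n_kt[j][t] + beta)
                    / (n_k[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[m][i] = k
                n_mk[m][k] += 1
                n_kt[k][t] += 1
                n_k[k] += 1

    # Posterior mean estimates of the topic and keyword distributions.
    theta = [[(n_mk[m][k] + alpha) / (len(doc) + num_topics * alpha)
              for k in range(num_topics)] for m, doc in enumerate(docs)]
    phi = [[(n_kt[k][t] + beta) / (n_k[k] + vocab_size * beta)
            for t in range(vocab_size)] for k in range(num_topics)]
    return theta, phi


# Toy corpus (hypothetical): each document is a list of keyword indices.
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 0, 1, 1, 0], [3, 2, 2, 3, 2]]
theta, phi = gibbs_lda(docs, num_topics=2, vocab_size=4)
print(theta)
```

Each row of `theta` can serve as the semantic coordinate of one node, which is the sense in which sampling maps semantic information into a semantic space.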

3.2. PSO Class Dependent LDA (PSO-LDA)

Particle Swarm Optimization (PSO) is an intelligent optimization algorithm first proposed by J. Kennedy and R. C. Eberhart [35]. The PSO algorithm has the advantages of simplicity, rather quick convergence [36], few control parameters, and so forth.

Compared with other optimization algorithms, such as Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Simulated Annealing (SA), the PSO algorithm has two attractive features. First, PSO optimizes the solution starting from the local optimum and runs fast, which makes the algorithm more adaptable to the evolution of networks. Second, particles in PSO can be mapped to nodes in the semantic network, and the process of finding the optimal solution in PSO is consistent with the birth process of a semantic community.

PSO initializes a set of random solutions at startup and uses iterative search to find optimal solutions [37]. In PSO, each candidate solution of the optimization problem is called a “particle”, and each particle has its own fitness value. We therefore design a heuristic method to detect communities based on PSO, in which each particle searches for the optimal solution by sharing social information with other individuals.

In PSO-LDA, LDA semantic features are put into PSO. We map the nodes of the semantic social network to the “particles” of PSO and the semantic information vector of each node to the velocity of the corresponding particle. For the fitness value, we use the information similarity function instead. In PSO, we let the nodes of the semantic social network simulate the behavior of a “bird flock”, in which information is shared socially and each individual gains from the discoveries and previous experience of all other nodes during the search for food [38]. Thus, each node (particle) in the semantic social network (swarm) is assumed to “fly” over the search space looking for promising regions of the landscape.

First, we assume that the search space is $D$-dimensional, and the position of the $i$th particle of the swarm is denoted as $X_i$, the vector $X_i = (x_{i1}, x_{i2}, \dots, x_{iD})$. Each particle keeps two pieces of information during the process: its “best” position with the smallest value (i.e., its personal best position) $P_i = (p_{i1}, \dots, p_{iD})$ and the position with the best function value among all particles in the swarm (i.e., the global best position of all particles) $P_g = (p_{g1}, \dots, p_{gD})$. At each iteration, particle $i$ of the swarm updates its velocity $V_i = (v_{i1}, \dots, v_{iD})$ and its position according to the following equation:

$$v_{id}^{t+1} = \omega v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{t}\right),$$

where $t$ is the current iteration, $i = 1, 2, \dots, N$, $d = 1, 2, \dots, D$, $N$ represents the size of the population, $D$ is the dimension of the search space, $\omega$ is the inertia weight, and $c_1$ and $c_2$ are two positive constants. $r_1$ and $r_2$ are study factors, that is, two random numbers drawn from the range $[0, 1]$ for each dimension.

In the search space, once the velocity is updated, the particle position is changed as in the following equation:

$$x_{id}^{t+1} = x_{id}^{t} + \chi v_{id}^{t+1},$$

where $\chi$ is a constriction factor that regulates the magnitude of the velocity to maintain a balance between exploration and exploitation. It can be calculated as follows:

$$\chi = \frac{2}{\left| 2 - c - \sqrt{c^{2} - 4c} \right|},$$

where $c = c_1 + c_2$ and $c > 4$. The constriction factor influences the proposed algorithm; we discuss this issue in Section 5.3. The pseudocode for PSO is described in Algorithm 2 [39].

Algorithm 2: Optimization algorithm by PSO.
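
As a rough illustration of the update rules in this section, the sketch below implements a generic PSO loop with an inertia weight and a constriction factor that scales the position step. It is not the authors' Algorithm 2: the function names, the parameter values, and the sphere function used as a stand-in for the information similarity fitness are all assumptions made for the example. The constriction factor follows the widely used Clerc-Kennedy form, which requires $c_1 + c_2 > 4$.

```python
import math
import random


def constriction_factor(c1, c2):
    """Clerc-Kennedy constriction factor; requires c1 + c2 > 4."""
    c = c1 + c2
    if c <= 4:
        raise ValueError("c1 + c2 must exceed 4")
    return 2.0 / abs(2.0 - c - math.sqrt(c * c - 4.0 * c))


def pso_minimize(fitness, dim, swarm_size=20, iterations=100,
                 omega=0.7, c1=2.05, c2=2.05, bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness`; returns (best position, best value, initial best value)."""
    rng = random.Random(seed)
    chi = constriction_factor(c1, c2)
    low, high = bounds
    x = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(swarm_size)]
    v = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in x]                      # personal best positions
    pbest_val = [fitness(p) for p in x]
    g = min(range(swarm_size), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best position
    initial_best = gbest_val

    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update with inertia weight omega.
                v[i][d] = (omega * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Position update scaled by the constriction factor chi.
                x[i][d] += chi * v[i][d]
            value = fitness(x[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = x[i][:], value
    return gbest, gbest_val, initial_best


def sphere(position):
    """Stand-in fitness (assumption): sum of squares, minimized at the origin."""
    return sum(c * c for c in position)


best, best_val, initial_best = pso_minimize(sphere, dim=3)
print(round(constriction_factor(2.05, 2.05), 4), best_val)
```

In the PSO-LDA setting, the stand-in fitness would be replaced by a similarity-based objective over the nodes' semantic vectors; how the authors encode communities in the particle position is not specified here, so that part is left out of the sketch.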

4. Performance Measure

Generally speaking, the performance measures for semantic social networks are mostly based on the topological structure. The model proposed by Shen et al. [40] is widely used in evaluating overlapping communities and is described in the following equation:

$$EQ = \frac{1}{2m} \sum_{C} \sum_{i \in C,\, j \in C} \frac{1}{O_i O_j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right], \tag{15}$$

where $k_i$ is the degree of node $i$ and $k_j$ is the degree of node $j$, $2m$ is the total degree of the network, $A_{ij}$ is the element of the adjacency matrix of the network, $O_i$ is the number of communities to which node $i$ belongs and $O_j$ is the number of communities to which node $j$ belongs, and $C$ is a community in the network.

Because we use both the topological structure and the semantic context to detect communities, we give a novel evaluation model named SimQ, which adds information similarity to the topological evaluation index, in (16). In it, $i$ and $j$ are nodes, $O_i$ and $O_j$ are the numbers of communities to which nodes $i$ and $j$ belong, $2m$ is the total degree of the network, and $A_{ij}$ is the element of the adjacency matrix of the network. For the information similarity, we are given a social graph $G = (V, E)$, where $V$ is the set of nodes in the network and $v_i$ is the $i$th node; $E$ is the set of edges linking the graph nodes. The point of SimQ is to measure the structural correlation of nodes while adding semantic correlation components at the same time, which better suits the basic characteristics of semantic communities. Each node $v_i$ is associated with an information vector $I_i = (I_{i1}, \dots, I_{iD})$; $\mathrm{sim}(i, j)$ is the information similarity of two neighboring nodes $v_i$ and $v_j$, which is calculated as

$$\mathrm{sim}(i, j) = \frac{\sum_{d=1}^{D} I_{id} I_{jd}}{\sqrt{\sum_{d=1}^{D} I_{id}^{2}} \, \sqrt{\sum_{d=1}^{D} I_{jd}^{2}}}, \tag{17}$$

where $D$ is the dimension of the social network. In our method, if the semantic components of two nodes are close, the angle between their projections in two-dimensional space is relatively small. Otherwise, the projection vectors point in clearly different directions.
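
The two ingredients of this section, the overlapping modularity of Shen et al. [40] and the cosine information similarity, can be sketched as follows. The function names and the toy graph are assumptions. The optional `pair_weight` hook is purely hypothetical: it shows one possible way to fold a similarity term into the topological index by weighting each node pair, and it should not be read as the paper's SimQ definition, whose exact form is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two information vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extended_modularity(adj, communities, pair_weight=None):
    """Overlapping modularity of Shen et al.; `pair_weight` is a hypothetical hook."""
    two_m = sum(len(neighbors) for neighbors in adj.values())  # total degree
    membership = {node: 0 for node in adj}                      # O_i per node
    for community in communities:
        for node in community:
            membership[node] += 1
    total = 0.0
    for community in communities:
        for i in community:
            for j in community:
                a_ij = 1.0 if j in adj[i] else 0.0
                term = a_ij - len(adj[i]) * len(adj[j]) / two_m
                term /= membership[i] * membership[j]
                if pair_weight is not None:
                    term *= pair_weight(i, j)  # assumed similarity weighting
                total += term
    return total / two_m


# Toy graph (assumption): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
communities = [{0, 1, 2}, {3, 4, 5}]

# Hypothetical information vectors, identical for every node.
info = {n: [1.0, 0.0] for n in adj}
eq = extended_modularity(adj, communities)
weighted = extended_modularity(
    adj, communities,
    pair_weight=lambda i, j: cosine_similarity(info[i], info[j]))
print(eq, weighted)
```

With identical information vectors every pair has similarity 1, so the weighted value coincides with the purely topological one; as the vectors of nodes inside a community diverge, the weighted value drops, which is the qualitative behavior a semantic modularity is meant to capture.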

5. Experimental Results

In this part, we present and discuss experiments on the analysis of the topic number, the evaluation criteria, real datasets, and different community detection algorithms, based on three datasets (the American College Football network dataset, the Krebs polbooks network dataset, and the dolphins network dataset).

5.1. The Analysis on Topics Number

The number of topics $K$, which is one of the input parameters of the PSO-LDA model, can influence the compactness of communities. We therefore choose the following three datasets to verify the effect of the topic number on the result. The American College Football network is shown in Figure 2. This network, created by Newman, is a complex social network of the American college football league. Nodes represent football teams, and an edge between two neighboring nodes indicates that the two teams have played a match. It contains 115 nodes and 616 edges. The Krebs polbooks network, established by V. Krebs, is shown in Figure 3. The nodes represent political books sold on Amazon. Generally, the books are divided by political tendency into approximately three classes. To obtain the topic distribution, Newman collected the political tendency within 3 steps of each node. The dolphins network, collected by Newman, is shown in Figure 4. It is made up of two families and includes 62 nodes and 159 edges. We simulate the semantic information of each node so that it fits a Dirichlet distribution.

Figure 2: The graph of football network.
Figure 3: The graph of polbooks network.
Figure 4: The dolphins network.

In this section, we vary the topic number $K$ in experiments on the three datasets (football, polbooks, and dolphins). Figure 5 shows the comparison of the two performance measures on the three datasets for different values of $K$. As the topic number grows and the topic distribution rises, the number of detected communities increases with $K$. In Figure 5, once the topic number becomes large enough, the topic distribution tends to be stable, which results in the increase in communities. The comparison shows that both performance measures tend to decrease as $K$ increases once the topic number has passed an optimal point. The optimal value of $K$ is 6 in Figure 5.

Figure 5: The performance of detected communities with different $K$.

To make the communities more intuitive, Figure 6 shows the detected communities of the three datasets when $K$ is 6, 12, and 18.

Figure 6: The communities for $K$ = 6, 12, and 18 (the black nodes are overlapping nodes).

5.2. The Comparison on Different Optimization Algorithms

In this section, we compare different optimization algorithms on the three network datasets above (dolphins, polbooks, and football). We compare the number of communities, the size of communities, the runtime, and the semantic concentration obtained with the PSO algorithm, Genetic Algorithm (GA), Ant Colony Optimization (ACO), and Simulated Annealing (SA). The result is shown in Figure 7. From Figure 7, we can see that the PSO algorithm produces more communities of smaller size than the others. As for runtime, PSO runs a little better than ACO and SA. The semantic concentration [41] is a function that measures the degree of cohesion on a specific topic. It is based on a performance measure of community links, in which a pair of nodes is counted if and only if the two nodes belong to the same community and there is a link between them. Compared with the similarity-based modularity, the semantic concentration focuses on the stability of social groups in the local environment. It should be noted, however, that a higher semantic concentration does not mean a higher modularity of the communities, and a higher modularity does not mean that we obtain the best division, because the overlapping parts of communities can affect the semantic cohesion. The ideal community structure should therefore do well on both measures, which also matches the performance measures of overall optimization and local optimization. Compared with GA, ACO, and SA in Figure 7, the communities detected by PSO are slightly smaller and slightly more numerous, which is in accordance with the topic distribution. As for runtime, PSO runs a bit slower than ACO but much better than GA and SA. Figure 8 shows the four optimization algorithms run on the dolphins network; as in Figure 7, PSO works much better than the other algorithms on community detection.

Figure 7: The performance of different optimization algorithms.
Figure 8: The comparison on different optimization algorithms on dolphins (the black nodes are overlapping nodes).

5.3. The Comparison on the Constriction Factor with and

In this section, we compare and over three datasets. The run diagrams, which and run in three datasets, are shown in Figure 9. From (16), we put the similar function of information into and . So generally, the tendency of diagram can be higher than . The maximum value of in football dataset is 0.4233 ( =0.52) and is 0.4064 (=0.53); and there exists bias when =0.53, and the value of is 0.4112 (not the maximum one). There is also bias in polbooks dataset and dolphins dataset, and the maximum value of is 0.4154 ( =0.54) and is 0.3982 ( =0.55) in polbooks dataset while the maximum value of is 0.4639 ( =0.60) and is 0.4489 ( =0.62) in dolphins dataset.

Figure 9: The diagrams of comparison on the constriction factor with and .

5.4. The Comparison on Community Detection Algorithms

Considering the bias in semantic community detection, we use classical nonsemantic algorithms to illustrate the issue, taking the football dataset as an example.

We choose GN, FN, LFM, and COPRA as the classical nonsemantic algorithms, of which LFM and COPRA are overlapping community detection algorithms. The evaluation results of these algorithms are listed in Table 2, and the detected communities on the football dataset are shown in Figure 10.

Table 2: The classical nonsemantic algorithms on , , and .
Figure 10: The detected communities with classical nonsemantic algorithms on football.

From the results in Table 2, the topological modularity of the classical nonsemantic algorithms is higher than that of PSO-LDA (0.5132), but their semantic modularity is lower than that of PSO-LDA (0.4258). This suggests that the classical nonsemantic algorithms achieve a higher score in topological structure detection and a lower score in semantic detection. Compared with semantic algorithms, community detection by classical nonsemantic algorithms is biased with respect to obtaining the ideal communities. On the one hand, this experiment verifies the performance of these algorithms; on the other hand, we use it to verify the relation among the three measures above. According to Table 2, PSO-LDA performs well on the two modularity measures and is higher than the other algorithms in semantic concentration. This means that PSO-LDA performs well in overall search (the two modularity measures) and works better than the others in local search (the semantic concentration).

5.5. The Comparison on Real Datasets

In this section, we compare different real datasets: the Quantifying Link Semantics-Publication (QLSP) dataset (805 nodes); the Academic Social Network (ASN) dataset (2500 extracted nodes) (https://www.aminer.cn/aminernetwork); DBLP(A) and DBLP(B), which are 10000 nodes and 20000 nodes extracted from the DBLP (December 31, 2014) dataset (2839219 nodes) (http://dblp.uni-trier.de/db/); and the Enron email network (Enron) dataset (25000 extracted nodes) (http://snap.stanford.edu/data/email-Enron.html). The performance measures and the number of detected communities obtained by the various algorithms on these datasets are reported in Table 3, with PSO-LDA run at a fixed parameter setting. The corresponding histograms are shown in Figures 11 and 12. From Figures 11 and 12, the PSO-LDA model is more suitable for semantic community detection than the classical nonsemantic algorithms.

Table 3: The results of classical nonsemantic algorithms under various datasets.
Figure 11: The histogram of with various classical algorithms.
Figure 12: The histogram of with various classical algorithms.

6. Conclusion

In this paper, we presented a novel community detection algorithm, PSO-LDA, that combines the topological structure with semantic information. It avoids the need to preset the number and the size of communities. Because Gibbs sampling solves for the hidden parameters of the proposed model, the sampling result approaches the realistic state. The main contribution of this research is the use of different similarity measures to quantify the similarity between nodes based on the topological structure and their semantic information. In future work, we will apply the model to fields such as privacy protection and worm containment in semantic social networks.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is sponsored by National Natural Science Foundation of China (61402126), Nature Science Foundation of Heilongjiang province of China (F2016024), Heilongjiang Postdoctoral Science Foundation (LBH-Z15095), University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province (UNPYSCT-2017094), Heilongjiang Province Foundation for Returned Scholars (LC2018030), and National Training Programs of Innovation and Entrepreneurship for Undergraduates (201810214020). The paper is also supported by China Natural Science Fund.

References

  1. S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
  2. Y. Ruan, D. Fuhry, and S. Parthasarathy, “Efficient community detection in large networks using content and links,” in Proceedings of the International Conference on World Wide Web, pp. 1089–1098, 2013.
  3. U.-U. Narantsatsralt and S. Kang, “Social network community detection using agglomerative spectral clustering,” Complexity, vol. 2017, Article ID 3719428, 10 pages, 2017.
  4. C.-D. Wang, J.-H. Lai, and P. S. Yu, “NEIWalk: Community discovery in dynamic content-based networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 7, pp. 1734–1748, 2014.
  5. A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 2, Article ID 066111, 2004.
  6. M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review. E, Statistical, Nonlinear, and Soft Matter Physics, vol. 69, article 066133, 2004.
  7. M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys Rev E Stat Nonlin Soft Matter Phys, vol. 69, no. 2, article 026113, 2004.
  8. G. Palla, I. Derényi, I. S. Farkas, and T. S. Vicsek, “Uncovering the overlapping community structure,” Nature, vol. 435, no. 7043, pp. 398–406, 2005.
  9. A. Lancichinetti, S. Fortunato, and J. Kertesz, “Detecting the overlapping and hierarchical community structure of complex networks,” New Journal of Physics, vol. 11, no. 3, pp. 19–44, 2012.
  10. V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, pp. 155–168, 2008.
  11. H. A. Deylami and M. Asadpour, “Link prediction in social networks using hierarchical community detection,” in Information and Knowledge Technology, pp. 1–5, 2015.
  12. X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov, “Clustering on multi-layer graphs via subspace analysis on Grassmann manifolds,” IEEE Transactions on Signal Processing, vol. 62, no. 4, pp. 905–918, 2014.
  13. Y. Neuman, Y. Neuman, and Y. Cohen, “A novel procedure for measuring semantic synergy,” Complexity, vol. 2017, Article ID 5785617, 8 pages, 2017.
  14. J. Yang, J. McAuley, and J. Leskovec, “Community detection in networks with node attributes,” in Proceedings of the 13th IEEE International Conference on Data Mining (ICDM '13), pp. 1151–1156, 2013.
  15. A. Reihanian, M. R. Feizi-Derakhshi, and H. S. Aghdasi, “Overlapping community detection in rating-based social networks through analyzing topics, ratings and links,” Pattern Recognition, 2018.
  16. D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
  17. F. Solera, S. Calderara, and R. Cucchiara, “Socially constrained structural learning for groups detection in crowd,” IEEE Computer Society, 2016.
  18. X. Li, C. Ming, and J. She, “Connection discovery using shared images by gaussian relational topic model,” in Proceedings of the IEEE International Conference on Big Data, pp. 931–936, 2017.
  19. Y. Xiao, L. Liu, M. Xu, H. Wang, and Y. Liu, “Glda-fp: Gaussian lda model for forward prediction,” in Proceedings of the International Conference on Big Data, pp. 124–139, 2018.
  20. X. Yu, J. Yang, and Z. Q. Xie, “A semantic overlapping community detection algorithm based on field sampling,” Expert Systems with Applications, vol. 42, no. 1, pp. 366–375, 2015.
  21. S. Gupta and P. Kumar, “Community detection in heterogenous networks using incremental seed expansion,” in Proceedings of the 2016 International Conference on Data Science and Engineering (ICDSE), pp. 1–5, 2017.
  22. Y. Zhao, E. Levina, and J. Zhu, “Consistency of community detection in networks under degree-corrected stochastic block models,” The Annals of Statistics, vol. 40, no. 4, pp. 2266–2292, 2012.
  23. W. B. Towne, C. P. Rosé, and J. D. Herbsleb, “Measuring similarity similarly: Lda and human perception,” ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 1, 2016.
  24. S. Cavallari, V. W. Zheng, H. Cai, K. C.-C. Chang, and E. Cambria, “Learning community embedding with community detection and node embedding on graphs,” in Proceedings of the 26th ACM International Conference on Information and Knowledge Management, (CIKM '17), pp. 377–386, 2017.
  25. Z. Yin, L. Cao, Q. Gu, and J. Han, “Latent community topic analysis: Integration of community discovery with topic modeling,” ACM Transactions on Intelligent Systems and Technology, vol. 3, no. 4, pp. 1–21, 2012.
  26. Z. Xia and Z. Bu, “Community detection based on a semantic network,” Knowledge-Based Systems, vol. 26, pp. 30–39, 2012.
  27. F. Zhao, Y. Zhu, H. Jin, and L. T. Yang, “A personalized hashtag recommendation approach using LDA-based topic model in microblog environment,” Future Generation Computer Systems, vol. 65, pp. 196–206, 2016.
  28. S. Ahajjam, M. El Haddad, and H. Badir, “A new scalable leader-community detection approach for community detection in social networks,” Social Networks, vol. 54, pp. 41–49, 2018.
  29. X. Yang and J. Cao, “A Fast and accurate way for API network construction based on semantic similarity and community detection,” in Proceedings of the IFIP International Conference on Network and Parallel Computing, pp. 75–86, 2017.
  30. C. X. Zhai, “Probabilistic topic models for text data retrieval and analysis,” in Proceedings of the International ACM SIGIR Conference, pp. 1399–1401, 2017.
  31. G. Heinrich, “Parameter estimation for text analysis,” Technical Report, 2008.
  32. W. K. Hastings, “Monte carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, no. 1, pp. 97–109, 1970.
  33. M. Sachan, D. Contractor, T. A. Faruquie, and V. L. Subramaniam, “Using content and interactions for discovering communities in social networks,” in Proceedings of the International Conference on World Wide Web, pp. 331–340, 2012.
  34. G.-J. Qi, C. C. Aggarwal, and T. Huang, “Community detection with edge content in social media networks,” in Proceedings of the IEEE 28th International Conference on Data Engineering, (ICDE '12), pp. 534–545, 2012.
  35. J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948, 1995.
  36. S. Kianian, M. R. Khayyambashi, and N. Movahhedinia, “Semantic community detection using label propagation algorithm,” Journal of Information Science, vol. 42, no. 2, pp. 166–178, 2016.
  37. H. Abadlia, N. Smairi, and K. Ghedira, “Particle swarm optimization based on island models,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 49-50, 2017.
  38. X. Wang, D. Jin, X. Cao, L. Yang, and W. Zhang, “Semantic community identification in large attribute networks,” in Proceedings of the 30th AAAI Conference on Artificial Intelligence, (AAAI '16), pp. 265–271, 2016.
  39. N. A. Helal, R. M. Ismail, N. L. Badr, and M. G. Mostafa, “An efficient algorithm for community detection in attributed social networks,” in Proceedings of the International Conference on Informatics and Systems, pp. 180–184, 2016.
  40. H. Shen, X. Cheng, K. Cai, and M.-B. Hu, “Detect overlapping and hierarchical community structure in networks,” Physica A: Statistical Mechanics and its Applications, vol. 388, no. 8, pp. 1706–1712, 2009.
  41. A. Clauset, “Finding local community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 72, no. 2, Article ID 026132, 2005.