Abstract

Block structures and evolution characteristics usually coexist in dynamic networks, so analyzing temporal community structure with a two-step strategy (detect communities first, then analyze their evolution) tends to give inaccurate results. A few approaches do take the evolution characteristics into account when modeling temporal community structures, but they cannot determine the number of communities automatically. This paper therefore proposes a model, Evolutionary Bayesian Nonnegative Matrix Factorization (EvoBNMF), which models temporal community structures together with their evolution characteristics. More specifically, EvoBNMF introduces evolution behaviors that quantify the transfer intensity of communities between adjacent snapshots. The most appropriate number of communities is then determined autonomously by shrinking the corresponding evolution behaviors. Experimental results show that our approach has superior performance on temporal community detection while determining the number of communities autonomously.

1. Introduction

Dynamic network analysis, an important branch of complex network science, has attracted wide attention in recent years [1]. Temporal community structure analysis is one of its important research problems and includes two subproblems: temporal community detection [1] and community evolution analysis [2]. In most cases, however, community structure analysis of dynamic networks first detects the community structures and then analyzes the corresponding evolution pattern with a heuristic strategy. Such work ignores the evolution characteristics of community structures during temporal community detection. Because block structures and evolution characteristics coexist in dynamic networks, this leads to inaccurate detection results. It is therefore necessary to propose a model that describes community structures together with their evolution characteristics in order to improve the accuracy of temporal community detection.

Temporal community detection, which has received wide attention, focuses on mining the meaningful block structures or functional modules hidden in the snapshots of dynamic networks. Early work used two-stage approaches, which first detect communities on each snapshot with a static method and then match them across snapshots [3]. These approaches detect the communities of the current snapshot without using the community structures of earlier snapshots, so they discard the evolution characteristics of temporal community structures and are usually sensitive to noise. Later, approaches based on evolutionary clustering [4] were proposed. They make up for this shortcoming by detecting the communities at the current snapshot from both the current topology and the previous community structures [5]. However, most of these works cannot determine the number of communities at each snapshot automatically, so it has to be specified in advance. Determining the number of communities is in fact a model selection problem, which is a common challenge for community detection. In addition, most of these works focus on identifying temporal communities accurately but do not analyze the corresponding community evolution.

Community evolution analysis, in turn, exposes the evolution behaviors, which quantify the transition relationships of communities between adjacent snapshots, and helps to trace the change trends of dynamic networks. Heuristic approaches [3] came first; they usually summarize how communities change over time in order to analyze the evolution pattern after the communities have been detected. Feature engineering-based approaches [6] followed; they extract evolution features from the detected temporal communities with a machine learning algorithm. Both types analyze the evolution only after detecting communities, so the results of the evolution analysis rely too heavily on the results of community detection, and both ignore that community structures and evolution characteristics coexist in dynamic networks. Generative model-based approaches [7], which model the generative mechanism of community structure and community evolution synchronously, make up for that shortcoming. However, most of these approaches describe the evolution behaviors only qualitatively, not quantitatively, and few of them can handle the model selection problem automatically.

To address the above issues, we model the community structures together with their evolution characteristics, which both improves temporal community detection and determines the number of communities at each snapshot autonomously. In this paper, a model called Evolutionary Bayesian Nonnegative Matrix Factorization (EvoBNMF) is proposed based on a Bayesian probabilistic model. In detail, we introduce evolution behaviors to model the evolution characteristics of community structures with Bayesian Nonnegative Matrix Factorization (BNMF) [8] in an evolutionary clustering framework [4]. Then, we develop a gradient descent algorithm to optimize the parameters of EvoBNMF by maximizing the posterior. The most appropriate number of communities is determined autonomously by shrinking the corresponding evolution behaviors. Finally, experiments on synthetic and real-world networks against several state-of-the-art methods show that EvoBNMF has superior performance on temporal community detection while determining the number of communities autonomously. The contributions of this work are as follows:

(i) A model called Evolutionary Bayesian Nonnegative Matrix Factorization (EvoBNMF) is proposed, which models community detection with evolution characteristics to improve the performance of temporal community detection
(ii) The proposed EvoBNMF finds the most appropriate number of communities autonomously by shrinking the corresponding evolution behaviors of each snapshot network
(iii) An effective algorithm is developed to optimize the objective function of EvoBNMF, whose time complexity can be reduced to linear
(iv) Extensive experiments on synthetic and real-world dynamic networks demonstrate that EvoBNMF has superior performance on temporal community detection in comparison with state-of-the-art methods

2. Related Work

According to their core ideas, the methods for temporal community detection can be divided into three categories: the snapshot matching-based methods [9, 10], the historical structural dependency-based methods [11, 12], and the community evolution model-based methods [13, 14].

The basic idea of the snapshot matching-based method is first to detect communities on each network snapshot independently with a static community detection algorithm and then to match the communities between snapshots with some similarity strategy. For example, Seifikar et al. [9] proposed a Louvain-based dynamic community detection algorithm that relies on knowledge derived from the previous steps of the network evolution. Mishra et al. [10] proposed a tree-based community detection algorithm that exploits two important properties, connectedness and influence, to find communities in the network. However, this kind of method concentrates on discovering the community structure of each snapshot and ignores the smoothness of the evolution of the dynamic network structure, so the evolution of the community structure is split across multiple snapshots.

The basic idea of the historical structural dependency-based method derives from the assumption that the structure of a dynamic complex network evolves smoothly. The community structure of the current snapshot is believed to evolve slowly from the previous snapshot, so the current community structure depends on the community structure of one or more historical snapshots. For example, Yin et al. [12] proposed an efficient and effective multiobjective method by modifying the traditional evolutionary clustering framework and the particle swarm algorithm. Rossetti et al. [5] proposed an online incremental dynamic community detection algorithm (Tiles) based on incremental modularity optimization. Its computation on the network substructure is local, and the number of nodes and communities involved is limited, which speeds up the updates. In addition, Wang et al. [15] constructed a novel similarity that combines structural perturbation theory with network topology characteristics and proposed a dynamic community mining algorithm based on evolutionary clustering. All of these methods avoid matching community structures between snapshots and incorporate the smoothness of dynamic community evolution.

The community evolution model-based method simulates the generation rules and the attribute characteristics hidden in the networks from the perspective of the generation mechanism of a dynamic complex network. At the same time, a reasonable network evolution mechanism is embedded to construct a parameterized generative model of dynamic community evolution. Finally, the model parameters are solved to obtain the optimal community structure and the evolution pattern. For example, Ting et al. [13] proposed a novel framework for fitting the multilayer stochastic block model (SBM) that builds on multislice modularity maximization and can discover a common community partition of all snapshots simultaneously. Yu et al. [16] constructed a matrix decomposition model containing an edge evolution time function for mining the evolution patterns in the microstructure (edges) of dynamic networks, which can be applied to structural trend prediction, link prediction, and anomaly detection. Li et al. [14] proposed a method that performs graph embedding and dynamic community detection by jointly learning the graph representation and NMF. In general, this kind of method abandons the two-step strategy of the snapshot matching-based method and retains the evolution smoothness property of the historical structural dependency-based method. It makes up for the lack of an evolution mechanism in the historical structural dependency-based methods and has gradually become the most popular approach to dynamic community detection.

3. Methodology

3.1. Notations

A dynamic network is usually cut into a series of network snapshots according to a fixed time window. It can be expressed as a sequence of snapshots, each consisting of the node (entity) set and the edge set observed at that snapshot. Each network snapshot is represented by an adjacency matrix whose elements record the links between pairs of nodes at that snapshot. In addition, we summarize the main notations in Table 1.
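The snapshot construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is treated as undirected, interactions arrive as hypothetical `(u, v, time)` records with integer node ids, time starts at 0, and the edge weight is the number of interactions in the window. The function name `build_snapshots` is not from the paper.

```python
import numpy as np

def build_snapshots(edges, num_nodes, window):
    """Bin (u, v, time) records into per-window weighted adjacency matrices.

    Assumes an undirected network, integer node ids in [0, num_nodes),
    and times measured from 0 in the same unit as `window`.
    """
    edges = list(edges)
    if not edges:
        return []
    last_time = max(t for _, _, t in edges)
    num_snapshots = int(last_time // window) + 1
    snapshots = [np.zeros((num_nodes, num_nodes)) for _ in range(num_snapshots)]
    for u, v, t in edges:
        A = snapshots[int(t // window)]
        A[u, v] += 1.0          # edge weight = number of interactions
        if u != v:
            A[v, u] += 1.0      # keep the matrix symmetric
    return snapshots

# Toy usage: three interactions and a window of 3 time units give 2 snapshots.
toy_edges = [(0, 1, 0), (0, 1, 1), (1, 2, 5)]
toy_snapshots = build_snapshots(toy_edges, num_nodes=3, window=3)
```

Each matrix in `toy_snapshots` plays the role of one snapshot adjacency matrix in the notation above.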

Because the number of communities is unknown in most real cases, we take a Bayesian view and assume that the community membership degree and the evolution tendency each follow a probability distribution. We further assume that the parameters of the distribution of the evolution tendency follow a given prior distribution. We then construct the dynamic community evolution model EvoBNMF under the NMF framework and transform the optimization problem from maximizing the posterior probability into minimizing the negative logarithm of the posterior probability. The task of EvoBNMF is summarized as follows:

(i) Input: the adjacency matrix sequence of the dynamic complex network and the hyperparameters
(ii) Output: the dynamic community structures, the community evolution matrix sequence, and the number of communities

3.2. EvoBNMF Model

Here, we design the generative graphical model of EvoBNMF (see Figure 1) as a Bayesian probabilistic model built on the core idea of evolutionary clustering. For snapshot 1, no historical structure information is available, so EvoBNMF is constructed in the same way as the static BNMF. Accordingly, the observed adjacency matrix is influenced by an unobserved expectation snapshot network, each element of which denotes the expected link weight between a pair of nodes at snapshot 1. The expectation snapshot network is the product of a basis matrix and a community membership matrix, where each entry of the membership matrix captures the propensity of a node to belong to a community and the number of communities is unknown. As in [8], we assume that each observed link weight is drawn from a Poisson distribution whose rate is the corresponding expected link weight, and that the entries of the basis matrix and the membership matrix are drawn from half-normal distributions with scale parameters. Because the Gamma distribution is the conjugate prior [17] of the half-normal distribution, each scale parameter is drawn from a Gamma distribution with two hyperparameters. According to the graphical model in Figure 1, the model of snapshot 1 is the same as that of [8], and the corresponding posterior at snapshot 1 takes the same form as in [8].

Minimizing the negative log posterior is equivalent to maximizing the posterior, so the negative log posterior serves as the objective function of snapshot 1. Its specific form, up to an additive constant, is the one given in [8].
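For orientation, the snapshot-1 objective can be evaluated numerically as follows. This is a sketch written from the general form of Bayesian NMF with a Poisson likelihood, half-normal priors, and Gamma hyperpriors, not a transcription of the paper's equation. It assumes a parameterization in which each scale parameter `beta[k]` acts as an inverse variance, and the names `V`, `W`, `H`, `beta`, `a`, and `b` are illustrative.

```python
import numpy as np

def bnmf_neg_log_posterior(V, W, H, beta, a, b, eps=1e-12):
    """Negative log posterior of static Bayesian NMF, up to an additive constant.

    V: (N, N) observed adjacency matrix; W: (N, K) basis matrix;
    H: (K, N) membership matrix; beta: (K,) precision-like scale parameters;
    a, b: shape and rate of the Gamma prior on beta (assumed parameterization).
    """
    n = V.shape[0]
    V_hat = W @ H
    # Poisson likelihood: sum(v_hat - v * log v_hat); the log(v!) term is constant.
    likelihood = np.sum(V_hat - V * np.log(V_hat + eps))
    # Half-normal priors on the entries of W and H, one beta per community.
    energy = (W ** 2).sum(axis=0) + (H ** 2).sum(axis=1)
    half_normal = np.sum(0.5 * beta * energy - n * np.log(beta))
    # Gamma prior on each beta_k.
    gamma_prior = np.sum(b * beta - (a - 1.0) * np.log(beta))
    return likelihood + half_normal + gamma_prior
```

A large `beta[k]` makes the penalty on column `k` of `W` and row `k` of `H` heavy, which is what drives unneeded communities toward zero in this family of models.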

For each later snapshot, the observed adjacency matrix is likewise influenced by an unobserved expectation snapshot network, each element of which denotes the expected link weight between a pair of nodes at that snapshot. The expectation snapshot network is again the product of a basis matrix and a community membership matrix, where each entry of the membership matrix captures the propensity of a node to belong to a community and the number of communities is unknown. The difference is that, following the core idea of evolutionary clustering, the model of the current snapshot also takes the historical structure information into account. In addition, we introduce an evolution matrix to model the evolution behaviors of communities synchronously. Each element of the evolution matrix denotes the propensity of the nodes of a community at the previous snapshot to transfer into a community at the current snapshot. We regard the current community membership as having evolved from the previous one through these evolution behaviors and introduce a penalty term that forces the two to agree approximately. In detail, we assume that the observations are drawn from Poisson distributions, that the factor matrices are drawn from half-normal distributions with scale parameters, and that the scale parameters are drawn from a Gamma distribution with two hyperparameters [17]. According to the graphical model in Figure 1, the joint distribution at the snapshot follows from these assumptions and includes a balance parameter, and the corresponding posterior is obtained from it.

As for snapshot 1, minimizing the negative log posterior is equivalent to maximizing the posterior, so the negative log posterior serves as the objective function of each later snapshot.

Under the distributional assumptions above (Poisson-distributed observations, half-normal priors on the factor matrices, and a Gamma prior on the scale parameters), the objective function can be written out explicitly up to an additive constant.

3.3. Updating Rules

For snapshot 1, EvoBNMF reduces to BNMF, so the updating rules of the objective function, equations (8)–(10), are the same as those of [8].

Similarly, for each later snapshot, we optimize equation (7) with respect to its four groups of parameters with a gradient descent algorithm, which yields the updating rules of equations (11)–(14).

We update the parameters iteratively according to the above rules until the objective function converges. We determine the most appropriate number of communities of each snapshot automatically with a statistical model selection method. In detail, we set a large value as the initial number of communities. After parameter optimization, we shrink the learned matrices by removing the irrelevant rows or columns whose sum is zero or very close to zero. The pseudocode of the solving algorithm of EvoBNMF is presented in Algorithm 1. The returned community label vectors are the results of temporal community detection, and the returned evolution matrices are the results of quantifying the evolution behaviors.
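For the first snapshot, where the model reduces to static BNMF, the iterative update can be sketched as below. The multiplicative rules are the ones usually given for Bayesian NMF with a Poisson likelihood, half-normal priors, and Gamma hyperpriors; they are written from that general form rather than copied from equations (8)–(10). The names `bnmf_step` and `fit_bnmf`, the default values of `a` and `b`, and the fixed iteration count used in place of a convergence test are all assumptions made for illustration. The update rules for later snapshots, which involve the evolution matrix, are not reproduced here.

```python
import numpy as np

def bnmf_step(V, W, H, beta, a=5.0, b=2.0, eps=1e-12):
    """One round of multiplicative updates for static Bayesian NMF (sketch).

    V: (N, N) adjacency matrix; W: (N, K); H: (K, N); beta: (K,).
    a and b are illustrative Gamma hyperparameters, not the paper's values.
    """
    n = V.shape[0]
    ones = np.ones_like(V, dtype=float)
    B = np.diag(beta)
    # Update the membership matrix H.
    ratio = V / (W @ H + eps)
    H = H / (W.T @ ones + B @ H + eps) * (W.T @ ratio)
    # Update the basis matrix W using the new H.
    ratio = V / (W @ H + eps)
    W = W / (ones @ H.T + W @ B + eps) * (ratio @ H.T)
    # Closed-form update of the scale parameters.
    energy = (W ** 2).sum(axis=0) + (H ** 2).sum(axis=1)
    beta = (n + a - 1.0) / (0.5 * energy + b)
    return W, H, beta

def fit_bnmf(V, k_init, num_iters=200, seed=0):
    """Run a fixed number of update rounds from a random positive start."""
    rng = np.random.default_rng(seed)
    n = V.shape[0]
    W = rng.random((n, k_init)) + 0.1
    H = rng.random((k_init, n)) + 0.1
    beta = np.ones(k_init)
    for _ in range(num_iters):
        W, H, beta = bnmf_step(V, W, H, beta)
    return W, H, beta
```

Starting with `k_init` larger than the expected number of communities mirrors the strategy described above: components that the data do not support are expected to decay toward zero and can be removed afterwards.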

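Algorithm 1: The solving algorithm of EvoBNMF.
(1) Initialize the model parameters with a large initial number of communities;
(2) while not converged do
(3)   Update the parameters of snapshot 1 according to equations (8)–(10);
(4) for each later snapshot do
(5)   while not converged do
(6)     Update the parameters of the snapshot according to equations (11)–(14);
(7) for each snapshot do
(8)   shrink the learned matrices by removing the rows or columns whose sum is close to zero;
(9)   obtain the community label vector from the shrunken membership matrix;
(10) return the community label vectors and the evolution matrices.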
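The shrinking and labeling steps at the end of Algorithm 1 can be sketched as follows. The tolerance `tol`, the function name `shrink_and_label`, and the rule of assigning each node to the community with the largest membership value are assumptions for illustration; the paper only states that rows or columns with a sum of zero or close to zero are removed.

```python
import numpy as np

def shrink_and_label(W, H, tol=1e-6):
    """Drop inactive communities and assign each node a community label.

    W: (N, K) basis matrix; H: (K, N) membership matrix.
    A community is kept only if both its column of W and its row of H
    have a sum above `tol` (an assumed threshold).
    """
    W = np.asarray(W, dtype=float)
    H = np.asarray(H, dtype=float)
    keep = (W.sum(axis=0) > tol) & (H.sum(axis=1) > tol)
    if not keep.any():
        raise ValueError("no community survives the shrinking step")
    W_small, H_small = W[:, keep], H[keep, :]
    # Assumed labeling rule: strongest membership wins.
    labels = H_small.argmax(axis=0)
    return W_small, H_small, labels, int(keep.sum())

# Toy usage: the middle community is empty and is removed.
W_toy = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
H_toy = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 3.0]])
W_s, H_s, labels_toy, k_toy = shrink_and_label(W_toy, H_toy)
```

In this toy case two communities remain, and the number of surviving rows is the detected number of communities.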

According to Algorithm 1, the iterative parameter updates are the most time-consuming part, and the overall cost is the cost of one iteration multiplied by the average number of iterations. Dynamic networks are usually very sparse in real cases, so the number of node pairs in the per-iteration cost can be replaced approximately by the average number of edges at each snapshot. In addition, the number of communities can be ignored because it is usually much smaller than the number of nodes. The time complexity of the optimization algorithm of EvoBNMF can therefore be reduced to linear.

4. Experiments

To verify the principle and the effectiveness of EvoBNMF, we design comparison experiments on synthetic and real-world networks. In this section, we introduce the experimental settings, discuss the experimental results, and analyze the parameter sensitivity and the convergence of the algorithm.

4.1. Settings
4.1.1. Datasets

We test the performance of EvoBNMF on eight dynamic networks. Four networks are generated according to SYN-FIX [18], and the other four come from the real-world KIT-mail data (https://i11www.iti.uni-karlsruhe.de/en/projects/spp1307/emaildata). Table 2 shows their statistics, including the number of snapshots, the average number of nodes, the average number of edges, and the average number of communities. The details are as follows:

(1) SYN-FIX [19]: this type of dataset is generated snapshot by snapshot with the Girvan-Newman benchmark. The network properties are set as follows: the number of snapshots is 10, the number of nodes is 128, the number of communities is 4, the mixing parameter, which controls the noise level, is 3, and the average degree of the nodes is 16 or 20. The community transfer parameter NC, which controls the proportion of nodes moving from their current community to other communities, is set to 10% or 30%.
(2) LFR [20]: the classical synthetic benchmark LFR describes the dynamics of networks through community evolution events, which include Birth, Death, Growth, Contraction, Merging, and Splitting. Here, we select the Mergesplit event to generate datasets and set different probabilities of community merging and splitting during network generation.
(3) KIT-Email: this is a mail communication network from the Information Department of Karlsruhe Institute of Technology (KIT) in Germany. The members are the nodes, and the number of mails exchanged between two members is the weight of the edge. The research groups are the corresponding communities. Here, the network data of the 48 months from September 2006 to August 2010 are divided into different dynamic networks. Specifically, we construct each snapshot by aggregating 2, 3, 4, or 6 consecutive months and obtain four different dynamic networks.
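To make the SYN-FIX settings concrete, the following sketch generates a SYN-FIX-like sequence with the parameter values listed above (128 nodes, 4 communities, average degree 16, mixing parameter 3, transfer proportion 10%). It is a simplified stand-in, not the benchmark generator used in the paper: edges are sampled independently with planted-partition probabilities chosen to match the expected intra- and inter-community degree, those probabilities are kept fixed even after community sizes drift, and the name `syn_fix_like` and all argument names are hypothetical.

```python
import numpy as np

def syn_fix_like(num_snapshots=10, n=128, num_comms=4, avg_degree=16,
                 z_out=3, nc=0.10, seed=0):
    """Generate a SYN-FIX-like sequence of adjacency matrices and labels.

    z_out is the expected number of inter-community edges per node;
    nc is the fraction of nodes that change community at each step.
    """
    rng = np.random.default_rng(seed)
    size = n // num_comms
    labels = np.repeat(np.arange(num_comms), size)
    # Planted-partition probabilities matching the expected degrees
    # for the initial, equally sized communities.
    p_in = (avg_degree - z_out) / (size - 1)
    p_out = z_out / (n - size)
    snapshots, truth = [], []
    for t in range(num_snapshots):
        if t > 0:
            movers = rng.choice(n, size=int(round(nc * n)), replace=False)
            for i in movers:
                # Shift by 1..num_comms-1 so the node always changes community.
                labels[i] = (labels[i] + rng.integers(1, num_comms)) % num_comms
        same = labels[:, None] == labels[None, :]
        prob = np.where(same, p_in, p_out)
        upper = np.triu(rng.random((n, n)) < prob, k=1)
        A = (upper | upper.T).astype(float)   # symmetric, no self-loops
        snapshots.append(A)
        truth.append(labels.copy())
    return snapshots, truth
```

For the KIT-Email setting, aggregating the 48 months into windows of 2, 3, 4, and 6 months gives 24, 16, 12, and 8 snapshots, respectively.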

4.1.2. Evaluation Metrics

The performance of community detection is evaluated with two widely used indexes: the Normalized Mutual Information (NMI) and the Error Rate (ER) [7]. Both compare the community structure detected by the algorithm with the ground truth. NMI is built from the entropies of the two partitions and the mutual information between them [18], which are computed from the number of nodes, the number of communities, and the overlaps between detected and ground-truth communities. As an entropy-based measure bounded in [0, 1], NMI is usually used to measure the consistency between two partitions, and a larger value is better. ER is usually used to measure the difference between two partitions, and a smaller value is better. ER tends to increase with the scale of the network.
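A minimal sketch of both metrics is given below. Two choices are assumptions, since the exact formulas are not legible here: NMI is normalized by the arithmetic mean of the two entropies, that is, 2I(A, B) / (H(A) + H(B)), and ER is taken as the squared Frobenius norm of the difference between the two co-membership matrices, which is the form commonly used in the evolutionary clustering literature. All function names are illustrative.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a partition given as a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(a, b):
    """Mutual information between two partitions of the same nodes."""
    a, b = np.asarray(a), np.asarray(b)
    n = a.size
    mi = 0.0
    for x in np.unique(a):
        for y in np.unique(b):
            n_xy = np.sum((a == x) & (b == y))
            if n_xy > 0:
                mi += (n_xy / n) * np.log(n * n_xy / (np.sum(a == x) * np.sum(b == y)))
    return float(mi)

def nmi(a, b):
    """NMI with arithmetic-mean normalization (assumed variant)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in one community
    return 2.0 * mutual_information(a, b) / (h_a + h_b)

def one_hot(labels):
    """Node-by-community indicator matrix."""
    _, inverse = np.unique(labels, return_inverse=True)
    return np.eye(inverse.max() + 1)[inverse]

def error_rate(a, b):
    """Squared Frobenius distance between co-membership matrices (assumed form)."""
    Z, G = one_hot(a), one_hot(b)
    return float(np.linalg.norm(Z @ Z.T - G @ G.T, "fro") ** 2)
```

Because the assumed ER is not normalized by the number of node pairs, it grows with network size, which is consistent with the remark above that ER tends to increase with the scale of the network.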

The accuracy of the autonomous determination of the number of communities is evaluated with KA [21], which compares the ground-truth number of communities with the number of communities detected by each method.

4.1.3. Comparison Approaches

In this work, five state-of-the-art approaches are chosen as baselines for community detection:

(i) BNMF [8]: a two-step strategy that segments the network into discrete time steps and detects communities with the static Bayesian NMF on each snapshot separately. At the first snapshot, EvoBNMF reduces to BNMF.
(ii) DyLouvain [3]: it optimizes the temporal modularity with a greedy heuristic method. The resolution parameter and the coupling parameter are set to 1 and 0.5, the settings commonly used in related works.
(iii) PisCES [22]: a temporal community detection model based on global spectral clustering, derived from the ideas of evolutionary clustering and degree correction. Its parameter is set to 0.1, and the maximum number of communities defaults to a fixed fraction of the number of nodes.
(iv) DYNMO [18]: a multiobjective approach based on evolutionary clustering, formalized as a multiobjective optimization problem and optimized by a genetic algorithm.
(v) ESPRA [15]: an evolutionary clustering algorithm based on the fusion of structural perturbation and network topological features, which can determine the number of communities automatically. The smoothness balance parameter is set to 0.8, and the balance parameter between perturbation and similarity information is set to 0.5.

4.2. Experimental Results
4.2.1. Illustrative Example

To clarify the working principle of EvoBNMF, Figure 2 gives an illustrative example based on the results on Net. 1. Due to space constraints, only a subset of the snapshots is shown. At snapshot 1, the learned basis matrix and membership matrix are decomposed from the observed adjacency matrix. Only four columns of the basis matrix and four rows of the membership matrix have high values. After the low-valued rows are adaptively compressed, the number of remaining rows of the membership matrix is the detected number of communities. The compressed membership matrix and the observed adjacency matrix of snapshot 2 are then both inputs of the model at snapshot 2. For each later snapshot, one pair of matrices is decomposed from the observed adjacency matrix and another pair from the compressed historical membership matrix, synchronously in a unified model. Note that the evolution matrices are obtained after the adaptive compression, and they correspond to quantitative results of the evolution behavior of communities.

The community evolution matrix mined from the dynamic network by EvoBNMF represents its evolution pattern and reflects the community evolution relationship between adjacent snapshots. Because the scale of the evolution matrix differs across snapshots, each evolution matrix is row-normalized so that every row sums to 1. Each normalized entry can then be regarded as the propensity of a node to transfer from one community to another between two adjacent snapshots. Figure 3 gives the community evolution relationship between adjacent snapshots of Net. 1. In each subgraph, the vertical coordinate is the community label in the current snapshot, the horizontal coordinate is the community label in the next snapshot, and the shade of the color represents the transition probability, that is, the normalized entry for the corresponding pair of communities. On the whole, the diagonals of these subgraphs are yellow (light), which means that most nodes of a community are likely to remain in their current community. This reflects that the dynamic network evolves slowly.
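The row normalization described above is a one-line operation. The sketch below also leaves all-zero rows at zero instead of dividing by zero, which is an added safeguard and not something the paper specifies; the function name `row_normalize` is illustrative.

```python
import numpy as np

def row_normalize(E, eps=1e-12):
    """Scale each row of an evolution matrix to sum to 1.

    Rows whose sum is (near) zero are returned as all zeros.
    """
    E = np.asarray(E, dtype=float)
    row_sums = E.sum(axis=1, keepdims=True)
    return np.divide(E, row_sums, out=np.zeros_like(E), where=row_sums > eps)

# Toy usage: rows become transition propensities between communities.
E_toy = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 3.0]])
P_toy = row_normalize(E_toy)
```

After normalization, a row dominated by its diagonal entry indicates a community that mostly persists into the next snapshot.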

Four of the subgraphs, however, show anomalous evolution patterns. For example, a new community emerges when the network evolves from snapshot 6 to snapshot 7. The probability that nodes transfer from one existing community to the new community is relatively high, whereas the probability that nodes remain in that existing community is relatively low. It can therefore be speculated that the new community has split from the existing one. When the network evolves from snapshot 7 to snapshot 8, the new community disappears again, and its nodes have a high probability of moving to another community. Similarly, further communities appear in later subgraphs and then disappear. This indicates that the evolution of the community structure is unstable after snapshot 6. The birth and death of communities correspond to the dynamic community evolution events first defined by Palla et al. [23]. In real-world social networks, a large number of similar community evolution events may indicate that real events lie behind them, which suggests that EvoBNMF can be applied to event detection in real-world social networks.

To represent the evolution process of the dynamic communities more vividly, Figure 4 gives a schematic diagram of the evolution and its transitions over time, corresponding to the community evolution of Net. 1 in Figure 3. In Figure 4, the horizontal axis is the network snapshot label, the vertical axis is the dynamic community label, the colored circles represent different communities, the circle size represents the relative size of a community, and the dotted arrows represent the node transfer relationships between communities. The community evolution intensity in Figure 3 corresponds to the node transfer relationships and the changes of communities in Figure 4. For example, the high diagonal strength in many subgraphs of Figure 3 corresponds in Figure 4 to communities that change little over time. In addition, new communities appear in snapshots 7 and 9 and disappear again in snapshots 8 and 10, respectively, which is also visible in Figure 4. In general, Figures 3 and 4 show the evolution pattern of the dynamic communities and give a visual diagram of the evolution, which helps in understanding the evolution of temporal communities in the networks.

4.2.2. Temporal Community Detection

In order to investigate the effectiveness, we compare the accuracy of our proposed EvoBNMF with five state-of-the-art methods on temporal community detection, including BNMF [8], Dyluvain [3], PisCES [22], DYNMO [18], and ESPRA [15]. The hyperparameters are set as , , and in experiments.

First, Table 3 shows the NMI, ER, and KA results of the compared methods on the SYN-FIX networks. The best results, shown in bold, demonstrate that DyLouvain and the proposed EvoBNMF are comparable in performance and that both are better than the others. The reason is that DyLouvain optimizes the temporal modularity with a greedy heuristic method, which suits the synthetic SYN-FIX data. In addition, the results of BNMF, the static version of EvoBNMF, are clearly worse than those of EvoBNMF, which verifies the validity of the proposed model.

Furthermore, Figure 5 shows, from top to bottom, the NMI, ER, and KA results of the compared methods. These results are averages over ten repetitions, with the corresponding variance bars. The x-axis is the snapshot label, and the y-axis is the NMI, ER, or KA value. Across the subfigures, DyLouvain and EvoBNMF achieve better NMI and ER values, and DYNMO and EvoBNMF achieve higher KA values. This indicates that the proposed EvoBNMF performs well not only on temporal community detection but also on the autonomous determination of the number of communities.

Similarly, the subfigures in Figure 6 show, from top to bottom, the NMI, ER, and KA results of the compared methods. EvoBNMF achieves the best NMI, ER, and KA results in most cases, but not at the first snapshot. The main reason is that no historical structure information is available for the first snapshot, so EvoBNMF degenerates to BNMF. The accuracy improves significantly from snapshot 1 to snapshot 2, which demonstrates the effectiveness of EvoBNMF.

4.2.3. Parameter Sensitivity and Algorithm Convergence

We test the sensitivity of EvoBNMF to the balance parameter in terms of NMI by varying the parameter with a step length of 0.02. As shown in Figure 7(a), the performance of EvoBNMF is not sensitive to the parameter over the tested range and is best at about 0.2.

In addition, we verify the convergence of EvoBNMF on Net. 12. Figure 7(b) shows the convergence of the objective function at the snapshots of Net. 12. The objective value converges within 50 iterations in all cases, which demonstrates that the convergence rate is relatively fast.

5. Conclusion

In this paper, we focus on modeling temporal community structure together with its evolution characteristics to improve community detection, and we propose the EvoBNMF model, which traces the corresponding evolution behaviors synchronously in dynamic networks. A gradient descent algorithm is developed to optimize the model. Importantly, the number of communities of each snapshot is determined automatically by shrinking the evolution behaviors in EvoBNMF. Experimental results on synthetic and real-world networks demonstrate the effectiveness of EvoBNMF. In the future, we will study predictive tasks on dynamic networks (e.g., prediction of links or community structures), which are of great practical significance and application value.

Data Availability

The data sets used to support the results of this study are available from the corresponding author upon request.

Disclosure

This manuscript is an extension of a conference paper [24].

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Key Research Project of Zhejiang Yuexiu University (D2020003) and the Finance Science and Technology Project of the 12th Division of Xinjiang Construction Corps (SR202103).