Special Issue: Advances on the Resilience of Complex Networks
Underestimated Cost of Targeted Attacks on Complex Networks
The robustness of complex networks under targeted attacks is deeply connected to the resilience of complex systems, that is, their ability to respond appropriately to an attack. In this paper, we study the robustness of complex networks under the realistic assumption that the cost of removing a node is not constant but rather proportional to the degree of the node or, equivalently, to the number of removed links a removal action produces. We investigate the state-of-the-art targeted node removal algorithms and demonstrate that they become very inefficient once the cost of the attack is taken into consideration. For the case when it is possible to attack or remove links, we propose a simple and efficient edge removal strategy named Hierarchical Power Iterative Normalized cut (HPI-Ncut). The results on real and artificial networks show that the HPI-Ncut algorithm outperforms all the node removal and link removal attack algorithms when the same definition of cost is taken into consideration. In addition, we show that, on sparse networks, the complexity of this hierarchical power iteration edge removal algorithm is only O(n·log^(2+ε)(n)).
The ability of a complex system to dynamically adapt to internal failures or external disturbances is called resilience. This adaptation is connected to the robustness of the network structure, which is defined as the ability to maintain functionality, without adaptation, under internal failures or external disturbances (attacks). In this paper, we focus on the robustness of complex networks under targeted attacks with a more realistic cost function. The robustness of connected components under random failure of nodes or links is described by classical percolation theory [2, 3]. Percolation is the simplest process showing a continuous phase transition, scale invariance, fractal structure, and universality, and it is described by just a single parameter, namely, the probability of removing a node or an edge. Network science studies have demonstrated that scale-free networks [4, 5] are more robust than random networks [6, 7] under random attacks or failures but less robust under targeted attacks [8–12]. Recently, studies of network resilience have moved their focus to more realistic scenarios of interdependent networks, competing networks, and different failure and recovery [16, 17] mechanisms.
Although the study of network robustness has received a huge amount of attention, the majority of targeted attack strategies are still based on the heuristic identification of influential nodes [11, 18–21], with no performance guarantees for the optimality of the solution. Finding the minimal set of nodes whose removal maximally fragments the network is called the network dismantling problem [22, 23], and it belongs to the NP-hard class. Thus, no polynomial-time algorithm is known for it, and only recently have different state-of-the-art approximation algorithms [22–28] been proposed for this task. Although these methods show promising results for network dismantling, we take one step back and analyze an implicit assumption they share: that the cost of a removal action is the same for all nodes, regardless of their importance or centrality in the network. This assumption is not realistic. Attacking a central node, for example, a high degree node in a sociotechnical system, usually comes with a higher additional cost than the same action on a low degree node. Therefore, it is more realistic to explicitly assume that the cost of an attack is heterogeneous. In this paper, we define the cost of removing a node as a function of its degree.
Recently, a similar definition of cost was used to analyze the fragmentation and strengthening processes for a class of random network models. Under the assumptions of these random network models, the authors found that the optimal cost for the fragmentation and strengthening processes is given by a priority list of degrees for the removed nodes which is independent of the network's degree distribution.
In this work, we make the explicit assumption that the cost of an attack is proportional to the degree of a node or, equivalently, to the number of adjacent links a removed node has. We investigated different state-of-the-art node removal algorithms on real networks, and the results show that, with respect to this notion of cost, most of them are very inefficient and in most instances perform even worse than the random removal strategy for a fixed finite cost budget.
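As a minimal illustration of this cost model, the following sketch (a hypothetical toy graph in plain Python; all names are illustrative, not from the paper) charges one unit of cost per incident edge, so removing a hub is several times more expensive than removing a leaf:

```python
# Hypothetical toy graph as adjacency sets; all names are illustrative.
graph = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"hub", "c"},
}

def removal_cost(g, node):
    """Degree-based cost: one unit per adjacent edge that must be cut."""
    return len(g[node])

def remove_node(g, node):
    """Remove a node and all its incident edges; return the cost paid."""
    cost = removal_cost(g, node)
    for nb in g.pop(node):
        g[nb].discard(node)
    return cost

print(removal_cost(graph, "hub"))  # 4: attacking the hub cuts four links
print(removal_cost(graph, "a"))    # 1: a leaf is four times cheaper
```

Under a uniform-cost assumption both removals would count the same; under the degree-based cost, targeting the hub consumes four times more of the budget.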
Furthermore, when edge removal attacks are possible, we compare them to the node removal strategies with respect to the same definition of cost, that is, the number of removed links needed to fragment the network. Note that removing a node is equivalent to removing all the edges of that node; therefore, every node removal action can be reproduced by an edge removal strategy, but the converse does not hold. We also highlight that comparisons between node-based and edge-based strategies are only interpretable in cases when edge-based attacks are possible. For that case, we propose and use an edge removal strategy, named the Hierarchical Power Iterative Normalized cut (HPI-Ncut), as one possible way to overcome the large fragmentation cost. Although edge-based strategies have more degrees of freedom, as they can remove only a fraction of the edges adjacent to a node, we still find cases where node-based strategies outperform edge-based ones. However, our proposed method (HPI-Ncut) always outperforms all the state-of-the-art targeted node-based attack algorithms and edge removal strategies [18, 27, 30].
The structure of this paper is as follows. First, in Section 2 ("Materials and Methods"), we introduce the empirical and artificial networks used in this paper (Section 2.1), present and describe current targeted attack strategies (Section 2.2), define a degree cost-fragmentation measure (Section 2.3), and describe the proposed HPI-Ncut method (Section 2.4). Then, in Section 3 ("Results and Discussions"), we quantify the cost of the state-of-the-art node removal strategies and show that in most cases such attacks are inefficient with respect to the degree-based definition of cost (Section 3.1). These results have an important impact on real-world scenarios of network fragmentation where the cost budget is limited. Next, for the case when it is possible to remove single edges (e.g., shielding communication links, removing power lines, or cutting off trading relationships), we use the proposed HPI-Ncut method and compare its performance with other strategies (Section 3.2). Finally, the effect of the HPI-Ncut edge removal method as an immunization measure for an epidemic spreading process on networks is presented (Section 3.3).
2. Materials and Methods
In this section, we describe the data sets and some existing state-of-the-art targeted attack algorithms. Among them, the node removal-based attack algorithms are designed to dismantle the network into pieces with no regard for the cost of the attack; in other words, these algorithms assume that all nodes have a uniform cost. We also introduce edge betweenness and bridgeness, which were originally proposed for evaluating the importance of nodes and links, as two comparable link attack methods. Finally, we define the degree cost-fragmentation effectiveness (DCFE) as an index to measure the performance of the different attack methods and present the HPI-Ncut method.
2.1. Data Sets
To evaluate the performance of the network dismantling (fragmentation) algorithms, we used both real and synthetic networks in this paper: (a) Political Blogs, which is an undirected social network collected around the time of the 2004 US presidential election. It is a relatively dense network with an average degree of 27.36; (b) Petster-hamster, which is an undirected social network containing friendship and family links between users of the website http://hamsterster.com. This data set can be downloaded from KONECT (http://konect.uni-koblenz.de/networks/petster-hamster); (c) Power Grid, which is an undirected power grid network in which a node is either a generator, a transformer, or a substation, and a link represents a transmission line. This data set can also be downloaded from KONECT (http://konect.uni-koblenz.de/networks/opsahl-powergrid); (d) Autonomous Systems, which is an undirected network from the University of Oregon Route Views Project. This data set can be downloaded from SNAP (https://snap.stanford.edu/data/as.html); (e) an Erdős–Rényi (ER) network constructed with 2500 nodes, average degree 20, and connection probability 0.01; (f) a scale-free (SF) network with size 10,000, exponent 2.5, and average degree 4.68; (g) a scale-free (SF) network with size 10,000, exponent 3.5, and average degree 2.35; (h) a stochastic block model (SBM) with ten clusters, which is an undirected network with 4232 nodes and average degree 2.60. The basic properties of these networks are listed in Table 1.
2.2. Compared Attack Strategies
In this subsection, we briefly introduce the state-of-the-art node removal attack algorithms and the edge evaluation methods that are used in this paper. We also employ several baseline methods for edge-based attacks, based on random edge removal and on the sequential removal of edges with high betweenness and bridgeness values.

(i) Percolation method: in the study of network attacks, percolation is a random process of uniform removal of either nodes (site percolation) or edges (bond percolation).

(ii) High degree (HD) method [9, 36]: in the HD method, all nodes are first ranked according to their degrees. Then the highest ranked nodes (together with their associated edges) are removed one by one. The high degree adaptive (HDA) method is an adaptive version of the HD method: HDA recomputes and reranks the degrees of all nodes before every removal.

(iii) Equal graph partitioning (EGP) algorithm: the EGP algorithm, which is based on the nested dissection algorithm, can partition a network into two groups with an arbitrary size ratio. In every iteration, the EGP algorithm divides the target node set into three subsets: the first group, the second group, and the separator group. The separator group is made up of all nodes that connect to both the first group and the second group. The separator group is then minimized by trying to move its nodes into the first or the second group. Finally, after removing all nodes in the separator group, the original network is decomposed into two groups. In our implementation, we partition the network into two groups of approximately equal size.

(iv) Collective Influence (CI) algorithm: the CI algorithm attacks the network by mapping the integrity of a tree-like random network onto optimal percolation theory in order to identify the minimal separator set. Specifically, the collective influence of a node is computed from the degrees of the neighbors on the frontier of a ball of radius ℓ around it. CI is an adaptive algorithm which iteratively removes the node with the highest CI value after recomputing the CI values of all nodes in the residual network. In our implementation, we compute the CI values with a fixed ball radius ℓ.

(v) Min-Sum algorithm: the three-stage Min-Sum algorithm (1) breaks all cycles, which can be detected from the 2-core of the network, with the Min-Sum message passing algorithm, (2) breaks all trees larger than a threshold, and (3) greedily reinserts short cycles, under the constraint that the size of the GCC does not become too large. In our implementation, we set the two thresholds to 0.5% and 1% of the network size.

(vi) CoreHD algorithm: inspired by the Min-Sum algorithm, the CoreHD algorithm iteratively deletes the node with the highest degree from the 2-core of the residual network.

(vii) Belief propagation-guided decimation (BPD) [28, 39]: the BPD method is a loop-focused global algorithm which removes a set of nodes so that all loops in the network are broken. In every iteration, the node with the highest probability of being suitable for deletion is deleted. After the deletion of a given fraction of nodes, the probabilities of all nodes are updated.

(viii) Edge betweenness: betweenness is a widely used centrality measure, defined as the sum of the fractions of all-pairs shortest paths that pass through a node. Edge betweenness, an extension of betweenness, evaluates the importance of a link and is defined as the sum of the fractions of all-pairs shortest paths that pass through this link. In this strategy, links are removed sequentially from high to low edge betweenness value.

(ix) Bridgeness: bridgeness uses local information of the network topology to evaluate the significance of edges in maintaining network connectivity. The bridgeness of a link is determined by the size of the k-clique communities that the two end points of this link are connected with and the size of the k-clique community that the link itself belongs to.
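As an illustration of how such adaptive strategies interact with a degree-based budget, here is a minimal sketch (plain Python; the toy graph and function names are ours, not from the paper) of the HDA method that re-ranks degrees before every removal and pays one unit of cost per incident edge:

```python
from collections import deque

def gcc_size(g):
    """Size of the largest connected component, via BFS."""
    seen, best = set(), 0
    for start in g:
        if start in seen:
            continue
        comp, queue = 0, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp += 1
            for nb in g[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, comp)
    return best

def hda_attack(g, budget):
    """High-degree-adaptive attack under a degree-based edge budget:
    re-rank degrees before every removal; removing a node costs one
    unit per incident edge. Returns (final GCC size, cost spent)."""
    g = {u: set(vs) for u, vs in g.items()}  # work on a copy
    spent = 0
    while g and spent < budget:
        target = max(g, key=lambda u: len(g[u]))  # recompute every step
        spent += len(g[target])                   # cost = current degree
        for nb in g.pop(target):
            g[nb].discard(target)
    return gcc_size(g), spent

# Two triangles joined by a bridge; nodes 2 and 3 have degree 3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
     3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(hda_attack(g, budget=3))  # (3, 3): one hub removed, GCC drops to 3
```

With the uniform-cost bookkeeping used by the original algorithms, the same budget would instead be counted as a number of nodes, hiding the fact that high-degree targets are more expensive.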
2.3. Degree Cost-Fragmentation Effectiveness (DCFE)
The robustness of the network structure can be measured in different ways, but a common approach is to characterize the size of the largest (giant) connected component (GCC) as a function of the ratio of removed nodes or edges, that is, of the cost. This function has been characterized in two distinct ways: (i) by the value of the critical point at which the largest component completely collapses or (ii) by measuring the size of the largest component during the whole attack process. However, only recently have the cost functions of node attacks been formulated in a more general way, as a function of degree.
We make the explicit assumption that the cost of removing a node is proportional to its degree, that is, to the number of adjacent edges that have to be removed. Let us define G(c, S) as the size of the GCC for a fixed attack cost c under strategy S. The cost c is measured as the ratio of removed edges to the total number of edges in the network. Now, for a fixed budget c, strategy S_1 is more efficient than strategy S_2 if and only if G(c, S_1) < G(c, S_2); that is, the size of the GCC is smaller when attacking with strategy S_1 than with strategy S_2 under the limited budget c.
Here we define the degree cost-fragmentation effectiveness (DCFE) of strategy S as the area under the curve of the size of the GCC versus the cost, which can be computed as the integral over all possible budgets: DCFE(S) = ∫_0^1 G(c, S) dc. This measure is a variant of the robustness measure that takes all possible cost budgets into consideration, where the cost is proportional to the degree, that is, to the number of adjacent edges that have to be removed. A smaller value of DCFE implies that the attack has a stronger effect over all possible budgets.
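In practice the DCFE can be approximated from a sampled GCC-versus-cost curve by numerical integration; a minimal sketch (trapezoidal rule; the function name is ours):

```python
def dcfe(costs, gcc_sizes):
    """Approximate DCFE(S) = integral of G(c, S) dc by the trapezoidal rule.
    costs: increasing edge-removal fractions in [0, 1];
    gcc_sizes: relative GCC size measured after each cost level.
    A smaller DCFE means a stronger attack over all budgets."""
    area = 0.0
    for i in range(1, len(costs)):
        area += 0.5 * (gcc_sizes[i] + gcc_sizes[i - 1]) * (costs[i] - costs[i - 1])
    return area

# A strategy whose GCC decays linearly to zero has DCFE = 0.5.
print(dcfe([0.0, 0.5, 1.0], [1.0, 0.5, 0.0]))  # 0.5
```

Comparing two strategies then reduces to comparing their areas: the one with the smaller value fragments the network more effectively across all budgets.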
2.4. HPI-Ncut: Edge Removal Strategy
In this section, we introduce and describe the Hierarchical Power Iterative Normalized cut (HPI-Ncut) edge removal strategy. When edge removal actions are applicable, we compare them with the node-based strategies under the same definition of cost. The link fragmentation problem can be stated as follows: given a budget of links that can be attacked or removed, which links should we pick? This is mathematically equivalent to asking how to partition a given network with a minimal separating set of edges.
We apply a spectral strategy to the edge attack problem, which falls into the class of well-known spectral clustering and partitioning algorithms [43–47]. Concretely, we use hierarchical partitioning with the Ncut objective function, combined with a power iteration procedure for approximating eigenvectors. The algorithm hierarchically applies the spectral bisection algorithm, which has the same objective function as the normalized cut algorithm, and uses the power iteration method to approximate the spectral bisection. In order to explain our algorithm, we first quickly recall the spectral bisection algorithm.
The Spectral Bisection Algorithm. Input: the adjacency matrix A of a network. Output: a separating set of edges that partitions the network into two disconnected clusters V_1, V_2:

(1) compute the eigenvector v_2 corresponding to the second smallest eigenvalue λ_2 of the normalized Laplacian matrix L = D^(-1/2)(D − A)D^(-1/2), or some other vector w for which w^T L w is close to minimal. We use the power iteration method, explained below, to compute this vector;

(2) put all nodes i with w_i ≥ 0 into the first cluster V_1 and all nodes with w_i < 0 into the second cluster V_2. All edges between these two clusters form the separating set that partitions the network.
The clusters obtained by this method usually have very balanced sizes. If, however, it is important to obtain clusters of exactly the same size, one can put the n/2 nodes with the largest entries of w into one cluster and the remaining nodes into the other cluster.
Hierarchical Power Iterative Normalized Cut (HPI-Ncut) Algorithm. Input: the adjacency matrix A of a network. Output: a partition of the network into small groups:

(1) partition the GCC of the network into two disconnected clusters V_1 and V_2 by using the spectral bisection algorithm and removing all links in the separating set;

(2) if the budget for link removal has not been exhausted and the GCC is not yet small enough, partition V_1 and V_2 recursively with Step (1).
We cluster hierarchically because this allows the fragmentation to be refined gradually. For example, if, after partitioning the network into 2^k clusters, we decide that the clusters should be smaller, we only have to partition each of the existing clusters into two new clusters, obtaining 2^(k+1) clusters. The links that were already removed remain removed, and we just need to remove some additional ones. If, however, we had used flat spectral clustering directly, the set of links to be removed in order to partition the network into 2^(k+1) clusters might not contain the set of links that had to be removed for 2^k clusters.
Power Iteration Method. Input: the adjacency matrix A of a network and the number of iterations T. Output: the eigenvector v_2, or some other vector w for which w^T L w is close to λ_2:

(1) draw w_0 randomly with uniform distribution on the unit sphere;

(2) set w_0 ← w_0 − ⟨w_0, v_1⟩v_1, where v_1 = D^(1/2)1 / ‖D^(1/2)1‖ (with 1 the all-ones vector) is the eigenvector of the smallest eigenvalue of L;

(3) for t = 1 to T, set w_t = M w_(t−1) / ‖M w_(t−1)‖, where M = 2I − L, and return w = w_T.
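Putting the bisection rule and the power iteration together, the following self-contained sketch (plain Python; our own illustration, not the authors' implementation) iterates on M = 2I − L for a toy graph made of two triangles joined by a single bridge edge; the separating set it finds is the bridge:

```python
import math
import random

def spectral_bisection(adj, iters=200, seed=0):
    """Sketch of spectral bisection via power iteration on M = 2I - L,
    where L = D^(-1/2)(D - A)D^(-1/2) is the normalized Laplacian.
    Returns the two clusters and the separating edge set."""
    nodes = sorted(adj)
    deg = {u: len(adj[u]) for u in nodes}
    # v1 = D^(1/2)·1 (normalized): eigenvector of L's smallest eigenvalue.
    v1 = {u: math.sqrt(deg[u]) for u in nodes}
    norm = math.sqrt(sum(x * x for x in v1.values()))
    v1 = {u: x / norm for u, x in v1.items()}

    rng = random.Random(seed)
    w = {u: rng.gauss(0.0, 1.0) for u in nodes}
    for _ in range(iters):
        # keep w orthogonal to v1 so the iteration converges to v2, not v1
        dot = sum(w[u] * v1[u] for u in nodes)
        w = {u: w[u] - dot * v1[u] for u in nodes}
        # w <- M w, using M w = w + D^(-1/2) A D^(-1/2) w
        nxt = {}
        for u in nodes:
            s = sum(w[v] / math.sqrt(deg[v]) for v in adj[u])
            nxt[u] = w[u] + s / math.sqrt(deg[u])
        norm = math.sqrt(sum(x * x for x in nxt.values()))
        w = {u: x / norm for u, x in nxt.items()}

    part1 = {u for u in nodes if w[u] >= 0}
    part2 = set(nodes) - part1
    cut = {(u, v) for u in part1 for v in adj[u] if v in part2}
    return part1, part2, cut

# Two triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part1, part2, cut = spectral_bisection(adj)
print(sorted(sorted(p) for p in (part1, part2)))  # the two triangles
print(len(cut))  # 1: only the bridge edge is cut
```

Removing the returned cut set is one bisection step; HPI-Ncut then recurses on each cluster until the budget is spent or the GCC is small enough.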
Objective Function of the Spectral Bisection Algorithm. In Appendix A, we show that the spectral bisection algorithm has the same objective function as the relaxed Ncut algorithm:

Ncut(V_1, V_2) = cut(V_1, V_2) · (1/vol(V_1) + 1/vol(V_2)),

where V_1 denotes the set of nodes in the first partition, V_2 the set of nodes in the second partition, cut(V_1, V_2) the number of links between them, and vol(V_j) = Σ_{i∈V_j} d_i, with d_i the degree of node i.
The main reason we use this objective function is that it simultaneously minimizes the number of removed links and keeps the total sums of node degrees (volumes) of the two partitions approximately equal. In Appendix B, we show the exponential convergence of the power iteration method to the eigenvector associated with the second smallest eigenvalue of L.
Complexity of the HPI-Ncut Algorithm. In Appendix C, we show that the complexity of the spectral bisection algorithm is O(Tm) and the complexity of the hierarchical clustering algorithm is O(Tm·log(n)), where T is the number of iterations of the power iteration method and m is the number of edges. The power iteration method converges with exponential speed (Appendix B), and the average degree is almost constant for large sparse networks. Hence we may expect asymptotically good results with T = log^(1+ε)(n) for any ε > 0, giving the hierarchical spectral clustering algorithm a complexity of O(n·log^(2+ε)(n)) on sparse networks. In practice, we used T proportional to log(n), which gives a complexity of O(n·log^2(n)).
3. Results and Discussions
In this section, we compare existing node-targeted attack strategies with respect to the new definition of cost. We make the explicit assumption that the cost of removing a node is proportional to the number of adjacent edges that have to be removed. This means that nodes with higher degree have a higher associated removal cost.
3.1. Effectiveness of the Node Targeting Attack Strategies
When the degree-based cost of targeted attacks is taken into account, the results can be highly counterintuitive. The performance of the state-of-the-art node removal-based methods is in some cases even worse than that of the naive random removal of nodes (site percolation), as shown in Figures 1 and 2. In fact, networks have an intrinsic resilience to attacks that stems from their distinct structures. To factor out these architectural differences, we use the site percolation method as a baseline null model. The site percolation strategy removes nodes uniformly at random and thus reflects, to a certain extent, the intrinsic resilience of the attacked network. The cost-fragmentation effectiveness of site percolation is used as the baseline value; see the details in Section 2.3.
(a) Power Grid
(b) Political Blogs
(d) Autonomous Systems
(d) SBM with ten clusters
Table 2 summarizes the DCFE of the different attack strategies on the eight networks. Table 3 summarizes the improvement in DCFE of each attack strategy compared with the null model (site percolation), calculated as the relative difference between the DCFE of site percolation and that of the given strategy. On the whole, all node-centric strategies (HD, HDA, EGP, CI, CoreHD, Min-Sum, and BPD) work distinctly better than the baseline on the three networks with lower average degree, that is, the Power Grid, SF (exponent 3.5), and SBM networks. However, on the empirical Petster-hamster social network, the Political Blogs network, the Autonomous Systems network, and the SF (exponent 2.5) network, all these node-centric strategies are comparable to or even worse than the baseline method according to the DCFE score. More interestingly, for a fixed budget, many networks are more fragile under the HD attack strategy than under HDA, as shown in Tables 2 and 3. In the last line of Table 3, we compute the average improvement over the different networks, which reflects the overall performance of the algorithms. These results suggest that, in realistic settings, state-of-the-art node removal-based algorithms are rather inefficient once the cost of fragmentation is taken into account.
3.2. Effectiveness of the HPI-Ncut Attacks
In this section, we compare the proposed edge removal-based attack strategy, the HPI-Ncut algorithm, with the random uniform attack, edge betweenness, bridgeness, and some classical node removal strategies (see the details in Section 2). The results show that the HPI-Ncut strategy greatly decreases the cost of the attack compared with the state-of-the-art removal strategies.
In the general case, each attack strategy generates a ranking list of all (or some) nodes or links of the network. As the nodes or links are removed one after another, the size of the GCC of the residual network characterizes the effectiveness of the algorithm. The removal process ceases when the size of the GCC becomes smaller than a given threshold (here we use 0.01). To test the effectiveness of the spectral edge removal algorithm, HPI-Ncut, we plot the size of the GCC versus the fraction of removed links, for both real networks (Figures 1 and 3) and synthetic networks (Figures 2 and 4), compared with classical node removal algorithms (Figures 1 and 2) and existing link evaluation methods (Figures 3 and 4). The results show that the HPI-Ncut algorithm outperforms all the other attack algorithms.
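The evaluation protocol just described, removing ranked links one after another and recording the relative GCC size until it falls below the threshold, can be sketched as follows (plain Python; the toy path graph and function names are ours):

```python
from collections import deque

def largest_cc(g):
    """Size of the largest connected component, via BFS."""
    seen, best = set(), 0
    for start in g:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in g[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def attack_curve(adj, ranked_edges, threshold=0.01):
    """Remove edges in ranked order; record (cost, relative GCC size).
    Stops once the GCC drops below `threshold` of the node count."""
    g = {u: set(vs) for u, vs in adj.items()}
    n = len(g)
    m = sum(len(vs) for vs in g.values()) // 2
    curve = [(0.0, largest_cc(g) / n)]
    for k, (u, v) in enumerate(ranked_edges, 1):
        g[u].discard(v)
        g[v].discard(u)
        rel = largest_cc(g) / n
        curve.append((k / m, rel))
        if rel < threshold:
            break
    return curve

# Toy path 0-1-2-3: cutting the middle edge halves the GCC.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(attack_curve(path, [(1, 2), (0, 1)]))
```

The resulting curve is exactly the object that the DCFE of Section 2.3 integrates: the x-axis is the ratio of removed links and the y-axis is the relative GCC size.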
(a) Power Grid
(b) Political Blogs
(d) Autonomous Systems
(d) SBM with ten clusters
In Figures 1 and 2, we compare the HPI-Ncut algorithm with several state-of-the-art node removal-based targeted attack algorithms. Figure 1(a) shows that all the node removal-based algorithms are better than the site percolation method on the Power Grid network, because the average degree of the Power Grid network is very low, only 2.67. This is also confirmed by the results in Figures 2(c) and 2(d), where the average degrees of the SF (exponent 3.5) and SBM networks are 2.35 and 2.60, respectively. The trends of the curves in Figures 1 and 2 also show that the targeted attack algorithms work better on networks with lower average degree. Furthermore, apart from the HPI-Ncut algorithm, the other algorithms perform worse than the baseline method (site percolation) on the scale-free networks: site percolation performs better until the proportion of removed links exceeds 0.7 on the SF (exponent 2.5) network and 0.2 on the SF (exponent 3.5) network. Site percolation on the SF (exponent 3.5) network presents an obvious phase transition compared with the result on the SF (exponent 2.5) network. In addition, as Figures 2(a) and 2(d) show, the SBM network has an obvious cluster structure compared with the ER network, and the BPD, Min-Sum, CI, CoreHD, EGP, and site percolation algorithms perform better on the SBM network. Moreover, the error of the site percolation method is larger on the ER network than on the SBM network. This implies that the cluster structure of a network has a large influence on the performance of the attack strategies.
To conclude the results of Figures 1 and 2, the state-of-the-art targeted node removal strategies incur a large cost for optimized targeted attacks. When it is possible to apply edge-based strategies, the HPI-Ncut algorithm overwhelmingly outperforms all the node removal-based attack algorithms, whether on sparse or dense networks and on networks with or without cluster structure. It is also interesting that some of the node-targeted attack strategies (BPD, Min-Sum) can outperform edge-based strategies on several networks (Power Grid, ER, SF, and SBM), but never HPI-Ncut.
In Figures 3 and 4, we compare the HPI-Ncut algorithm with some existing link evaluation algorithms. First, we find that the HPI-Ncut algorithm works better and is more stable than all the other algorithms. Second, comparing the results of site and bond percolation in Figures 1 and 2, we see that the bond percolation method outperforms the site percolation method only when the average degree of the network is low (see the results for the Power Grid, SF (exponent 3.5), and SBM networks); otherwise, site percolation is the better choice. Third, in Figures 4(b) and 4(c), we see that the bond percolation method performs better than the edge betweenness and bridgeness algorithms when the cost is limited on scale-free networks, that is, when the proportion of removed links is smaller than 0.63 in Figure 4(b) and smaller than 0.4 in Figure 4(c). To conclude, the HPI-Ncut algorithm overwhelmingly outperforms all the node removal-based attack algorithms and link evaluation algorithms, whether on sparse or dense networks and on networks with or without cluster structure.
3.3. Spreading Dynamics after HPI-Ncut Immunization
To display more intuitively the ability of HPI-Ncut to immunize links, we studied the susceptible-infected-recovered (SIR) epidemic spreading process on four real networks. We compared both the spreading speed and the spreading scope on these networks before and after targeted immunization by HPI-Ncut. The simulation results in Figure 5 show that, by removing just 10% of the links, the HPI-Ncut immunization profoundly affects the function of the networks. The proportions of the GCC of the Political Blogs, Power Grid, Petster-hamster, and Autonomous Systems networks after the attack are 37% (449/1222), 1% (54/4941), 57% (1146/2000), and 37% (2387/6474), respectively. Thus, the spreading speeds are greatly delayed and the spreading scopes are tremendously shrunken on these networks.
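A minimal discrete-time SIR sketch (our own simplification: transmission probability beta per contact and recovery after exactly one step; all names are illustrative) shows how cutting bridge links bounds the final outbreak size:

```python
import random

def sir_final_size(adj, seed_node, beta, rng=None):
    """Discrete-time SIR: each step, every infected node infects each
    susceptible neighbor with probability beta, then recovers.
    Returns the final outbreak size (number of recovered nodes)."""
    rng = rng or random.Random(0)
    infected, recovered = {seed_node}, set()
    while infected:
        new = set()
        for u in infected:
            for v in adj[u]:
                if v not in infected and v not in recovered and v not in new:
                    if rng.random() < beta:
                        new.add(v)
        recovered |= infected
        infected = new
    return len(recovered)

# A path 0-1-2-3-4; with beta = 1 the epidemic reaches every node.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sir_final_size(path, 0, 1.0))  # 5

# Immunizing the bridge 2-3 confines the outbreak to one component.
cut = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
print(sir_final_size(cut, 0, 1.0))  # 3
```

The outbreak can never exceed the size of the connected component containing the seed, which is why shrinking the GCC by link immunization directly limits the spreading scope.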
(a) Political Blogs
(b) Power Grid
(d) Autonomous Systems
To summarize, we investigated several state-of-the-art node-targeted attack algorithms and found that they are very inefficient when the degree-based cost of the attack is taken into consideration. The cost of removing a node is defined as the number of links that have to be removed in the attack process.
We found some highly counterintuitive results; namely, the performance of the state-of-the-art node removal-based methods can be even worse than that of the naive site percolation method under a limited cost budget. This demonstrates that the current state-of-the-art node-targeted attack strategies underestimate the heterogeneity of the cost associated with the nodes of complex networks.
Furthermore, for the cases when link removal strategies are possible, we compared the performance of the node-centric strategies (HD, HDA, EGP, CI, CoreHD, BPD, and Min-Sum) and the edge removal strategies (edge betweenness and bridgeness) based on the cost of their attacks, measured in the same units, that is, the ratio of removed links. We propose a hierarchical power iterative algorithm (HPI-Ncut) to fragment a network, which has the same objective function as the Ncut spectral clustering algorithm. The results show that the HPI-Ncut algorithm outperforms all the node removal-based attack algorithms and link evaluation algorithms on all the networks. In addition, the total complexity of the HPI-Ncut algorithm is only O(n·log^(2+ε)(n)), which makes it practical for large-scale networks with over a million nodes.
The cost underestimation of the current state-of-the-art algorithms with respect to the degree-based cost has a strong influence on the development and design of better robustness and resilience mechanisms in complex systems. Furthermore, a more accurate estimation of robustness under realistic conditions will allow a better allocation of response resources.
A. Objective Function
Let G = (V, E) be an undirected graph with adjacency matrix A and diagonal degree matrix D, whose i-th diagonal entry is the degree d_i of node i. For S ⊂ V, let cut(S, S̄) denote the number of links between S and its complement S̄. We define

N(S) = cut(S, S̄) · (1/vol(S) + 1/vol(S̄)),

where vol(S) = Σ_{i∈S} d_i. If we describe the set S by the normalized indicator vector g = D^(1/2) f / ‖D^(1/2) f‖, where f_i = 1/vol(S) for i ∈ S and f_i = −1/vol(S̄) for i ∉ S, one can show that

N(S) = g^T L g, with L = D^(-1/2)(D − A)D^(-1/2).

From the definition of g one can see that finding a set S which minimizes N(S) corresponds to partitioning the network into two sets S and S̄ such that (1) cut(S, S̄) is small, and hence there are only few links between S and S̄, and (2) 1/vol(S) + 1/vol(S̄) is small, so that the sets S and S̄ contain more or less equally many links.
Finding such a set is NP-hard, but by relaxing the constraints on the right-hand side of the identity (A.3) one can find good approximate solutions:

(1) Find w* = argmin { w^T L w : ‖w‖ = 1, w ⟂ D^(1/2)1 }, where 1 is the all-ones vector; we impose the condition w ⟂ D^(1/2)1 because every normalized indicator vector g of a nontrivial set S satisfies g ⟂ D^(1/2)1.

(2) Set S = { i : w*_i ≥ 0 } and define S̄ = V \ S.
The idea behind this method is that g will be the best approximation of w* among all normalized indicator vectors, and, since w* minimizes w^T L w, the value g^T L g = N(S) will also be close to the minimum (w*)^T L w*.
One can show that a solution to (A.4) is given by w* = v_2, the eigenvector of the second smallest eigenvalue λ_2 of the normalized Laplacian matrix L. Since D^(-1/2) is a diagonal matrix with positive entries whenever the network is connected, the entries of the vectors D^(-1/2)w* and w* have the same sign, and therefore the sign-based assignment of nodes to S and S̄ is the same for both vectors.
B. Exponential Convergence of the Power Iteration Method
L is real and symmetric. Therefore it has real eigenvalues λ_1 ≤ λ_2 ≤ ⋯ ≤ λ_n corresponding to eigenvectors v_1, …, v_n which form an orthonormal basis of R^n. One can easily show that λ_1 = 0 and λ_n ≤ 2. So, in order to compute v_2, we consider the matrix M = 2I − L, which has the same eigenvectors as L. The corresponding eigenvalues are now 2 − λ_i; in particular, v_1 corresponds to the largest eigenvalue of M and v_2 to the second largest.
If w_0 is a random vector uniformly drawn from the unit sphere and we force it to be perpendicular to v_1 by setting w_0 ← w_0 − ⟨w_0, v_1⟩v_1, then w_0 = Σ_{i≥2} c_i v_i with c_2 ≠ 0 almost surely. Furthermore, M^t w_0 = Σ_{i≥2} c_i (2 − λ_i)^t v_i, and if we set w_t = M^t w_0 / ‖M^t w_0‖, then w_t converges with exponential speed to some eigenvector of M with eigenvalue 2 − λ_2, because for every i with λ_i > λ_2 we have (2 − λ_i)/(2 − λ_2) < 1 and therefore ((2 − λ_i)/(2 − λ_2))^t → 0. Generally, one can deduce from (B.1) that

w_t^T L w_t = ( Σ_{i≥2} c_i^2 (2 − λ_i)^(2t) λ_i ) / ( Σ_{i≥2} c_i^2 (2 − λ_i)^(2t) ),

and therefore this quantity converges to λ_2 with exponential speed.
The complexity of the spectral bisection algorithm is the same as the complexity of the power iteration method. The complexity of the power iteration method equals the number of iterations T times the complexity of multiplying M with a vector, that is, O(T⟨k⟩n), where ⟨k⟩ is the average degree of the network, or, equivalently, O(Tm), where m is the number of edges.
Assuming that the spectral bisection algorithm always produces clusters of equal size, the complexity of the hierarchical spectral clustering algorithm is then given by the sum of (i) the complexity of applying spectral bisection once on the whole network of size $n$, (ii) the complexity of applying it on each of the two clusters that we obtained from the first application of spectral bisection and which will have size $n/2$, (iii) the complexity of applying it on each of the 4 clusters that we obtained from the previous step and which will have size $n/4$, and so on, down to (iv) the complexity of applying it on each of the $2^{i}$ clusters that we obtained from the previous step and which will have size $n/2^{i}$.
That is, in total at most
$$\sum_{i=0}^{\log_{2}n-1}2^{i}\,O\!\left(t\cdot\frac{n}{2^{i}}\langle k\rangle\right)=O\!\left(t\,n\langle k\rangle\log n\right)=O\!\left(t\,m\log n\right),$$
where we have made the pessimistic assumption that the number of iterations $t$ and the average degrees $\langle k\rangle$ are in each step as large as they were in the beginning.
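The hierarchical scheme described above amounts to repeated bisection; the following sketch illustrates it on a small network (for brevity it computes $v_{2}$ with a dense eigendecomposition instead of the power iteration, and all function names are our own):

```python
import numpy as np

def bisect(A, nodes):
    """One spectral bisection step on the subgraph induced by `nodes`."""
    sub = A[np.ix_(nodes, nodes)]
    deg = sub.sum(axis=1)
    deg[deg == 0] = 1.0                          # guard against isolated nodes
    d_is = 1.0 / np.sqrt(deg)
    L = np.eye(len(nodes)) - (d_is[:, None] * sub) * d_is[None, :]
    v2 = np.linalg.eigh(L)[1][:, 1]              # second-smallest eigenvector
    left = [nodes[i] for i in range(len(nodes)) if v2[i] > 0]
    right = [v for v in nodes if v not in left]
    return left, right

def hierarchical_clusters(A, k):
    """Split the network into k clusters, always bisecting the largest one."""
    clusters = [list(range(A.shape[0]))]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        clusters.extend(bisect(A, clusters.pop(0)))
    return clusters

# Two triangles (nodes 0-2 and 3-5) joined by a bridge edge 2-3
# split cleanly into the two triangles.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(sorted(sorted(c) for c in hierarchical_clusters(A, 2)))  # [[0, 1, 2], [3, 4, 5]]
```

With balanced splits, the recursion depth is $O(\log n)$, matching the sum above.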
The choice of the function $t(n)$ is a little involved. If the initial random choice of the vector $x$ is very unfortunate, many iterations may be needed in order to obtain a good approximation of the eigenvector $v_{2}$. In fact, if $c_{2}=0$, then this algorithm would not converge to $v_{2}$ at all; however, this event has probability $0$.
Another condition that might slow down the computation of $v_{2}$ is if some of the other eigenvalues $\lambda_{i}$, $i>2$, are close to $\lambda_{2}$. In that case $\mu_{i}$ would be close to $\mu_{2}$ and therefore one can see from (B.1) that the corresponding $v_{i}$ might have a large contribution in $x^{(k)}$ for a long time. However, when $\lambda_{i}$ is close to $\lambda_{2}$, this also implies that
$$\frac{v_{i}^{\top}L_{\mathrm{sym}}\,v_{i}}{v_{i}^{\top}v_{i}}=\lambda_{i}$$
is close to
$$\frac{v_{2}^{\top}L_{\mathrm{sym}}\,v_{2}}{v_{2}^{\top}v_{2}}=\lambda_{2},$$
and therefore $v_{i}$ also provides a good partition of the network, since these are the quantities that are related to the cut-size.
Due to this fast convergence, one can expect asymptotically good partitions already when the number of iterations is $t(n)=O(\log^{1+\epsilon}n)$ for some $\epsilon>0$, giving the hierarchical spectral clustering algorithm a complexity of $O(m\log^{2+\epsilon}n)$ in general and $O(n\log^{2+\epsilon}n)$ for sparse networks.
D. HPI-Ncut Algorithm with Different Number of Partitions
The previous sections give us a clear picture of the performance of the different attack algorithms. Some algorithms, such as the HPI-Ncut algorithm, the Min-Sum algorithm, and the edge betweenness algorithm, work quite well, while others do not. What causes such a difference? Figure 6 may give us a clue. In this toy example, the original network is a stochastic block model (SBM) network with two clusters, 2078 nodes, and 3729 links in total. Figure 6 shows a visualization of the top 10% of removed links for the different algorithms. Note that the number of red links in Figures 6(b)–6(f) is the same, namely 373. However, compared with the edge betweenness and HPI-Ncut algorithms, far fewer of the links between the two clusters are removed by the EGP and CI algorithms, and more of the removed links lie inside the left or the right cluster. Furthermore, compared with the edge betweenness algorithm, the links removed by the HPI-Ncut algorithm are concentrated on the bridge between the two clusters. This helps to partition the network into two disconnected clusters.
(a) Original network
In the previous sections, the default target number of disconnected clusters in the HPI-Ncut algorithm was set to 2. Figure 7 shows the size of the GCC after a targeted attack by HPI-Ncut with different target numbers of disconnected clusters, on the SBM network with two clusters and with ten clusters, respectively. Figure 7 indicates that when the original network contains fewer clusters, the target number of clusters in HPI-Ncut greatly affects the size of the GCC in the initial stage of the targeted attack, while this influence declines sharply in the later part of the attack process. The target number has a smaller impact on the attack performance of HPI-Ncut when the original network contains more clusters. Furthermore, when the target number of disconnected clusters is set to 2, we always obtain the optimal outcome on both networks. To conclude, we recommend setting the default target number of disconnected clusters in the HPI-Ncut algorithm to 2.
(a) SBM with two clusters
(b) SBM with ten clusters
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The work of Nino Antulov-Fantulin has been funded by the EU Horizon 2020 SoBigData project under Grant Agreement no. 654024. The work of Dijana Tolić is funded by the Croatian Science Foundation IP-2013-11-9623 “Machine Learning Algorithms for Insightful Analysis of Complex Data Structures.” Xiao-Long Ren acknowledges the support from China Scholarship Council (CSC).
J. P. Sethna, Statistical Mechanics: Entropy, Order Parameters, and Complexity, Oxford Master Series in Physics, Oxford University Press, Oxford, UK, 2006.
P. Erdős and A. Rényi, “On the evolution of random graphs,” in Publications of the Mathematical Institute of the Hungarian Academy of Sciences, pp. 17–61, 1960.
J. Leskovec, J. Kleinberg, and C. Faloutsos, “Graphs over time: densification laws, shrinking diameters and possible explanations,” in Proceedings of the KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 177–187, ACM, New York, NY, USA, 2005.
R. Cohen and S. Havlin, Complex Networks: Structure, Robustness and Function, Cambridge University Press, Cambridge, UK, 2010.