Abstract

Skeleton network extraction is crucial for studying the core structure and essential information of complex networks. This paper introduces a novel network extraction method, the TPKS-skeleton, for investigating the global terrorism network. The method aims to reduce the network’s size while preserving key topological and spatial features. The TPKS-skeleton comprises three steps: node evaluation, similarity-based clustering, and skeleton network reconstruction. The importance of skeleton nodes is quantified by an improved topology potential algorithm. Similarity-based clustering is then applied to detect high incident concentrations and to allocate the important nodes according to event features and spatial distribution. Finally, the skeleton network is reconstructed by aggregating the highly influential nodes from each cluster and their simplified edges. To verify the efficiency of the proposed method, we apply a network assessment framework with three classes: node-equivalence assessment, network-equivalence assessment, and spatial information assessment. For each class, various assessment indexes were computed using the original network as a benchmark. The results show that the proposed TPKS-skeleton outperforms other competitive methods, in particular in node equivalence measured by Spearman rank correlation and in network structural equivalence measured by the quadratic assignment procedure. From the spatial perspective, the TPKS-skeleton network reasonably preserves all kinds of spatial information. Our study paves the way to extracting the optimal skeleton of the global terrorism network, which might be beneficial for counterterrorism and for network analysis in wider areas.

1. Introduction

Over the past decade, complex network analysis has become a vital component in understanding the behavioral patterns of the global terrorism network. The knowledge discovered by investigating various complex network properties helps to reduce the risk or effect of future attacks [1, 2]. Moreover, it can support the deliberate design of feasible plans to disrupt or dismantle terrorist organizations [3, 4].

Mining and analyzing the entire global terrorism network is, however, a nontrivial task, and several issues remain for complex network analysis. First, the amount of data has far exceeded the capability of most available technologies to interpret and analyze the large-scale network appropriately. Statistical evidence from START [5] shows that nearly 182,000 terrorism incidents occurred globally from 1970 to 2017. Since the attacks of September 11th, the total number of terrorism incidents has increased rapidly, reaching its peak in 2014. Despite the fall in deaths over the following three consecutive years, terrorist incidents in Europe nevertheless increased [6]. Second, terrorism is growing not only in the network’s size but also in the complexity of the social relationships involved. This is illustrated by the suicide bombing attacks in Sri Lanka on Easter Sunday, 2019 [7], which imply that collaboration among terrorist groups is not a straightforward matter: local terrorists could have attempted the attack on their own but may not have succeeded without collaborators. Third, a large-scale network inevitably comes with noisy information. The terrorism data are collected from multiple sources, contain a number of unimportant elements, and are inconsistent. Directly modeling a network from such data may produce misleading relationships and push the network far from its actual structure. Together, these problems make network analysis and the discovery process computationally very expensive and persistently difficult.

In this paper, we analyze the global terrorism network through its core structure. Instead of investigating the significant information from the entire network, we pay attention to the skeleton network [8]. The skeleton network relies on the concept of minimizing the network’s scale and complexity [9]. It is a simplified form of the original network [10] that allows network analysis methods to perform faster and more efficiently [11]. The skeleton network makes it easier to visualize, identify, and understand relevant information through an abstraction of the network. At the same time, it still preserves the most useful insights and the major functions of the original network. Therefore, research on skeleton network extraction may make it much easier to discover the ground-truth relationships of the global terrorism network.

The main objectives of this paper are as follows. First, we propose extracting the skeleton network from the influential nodes identified by an improved topology potential algorithm (TPKS-skeleton). The extraction strategy rests on the assumption that a skeleton network can retain its functions and behavior patterns with a small portion of the important nodes. Second, we apply the TPKS-skeleton method to obtain intrinsic insight into the global terrorism network. The influence value of all nodes in the network is evaluated to identify the optimal skeleton nodes with high significance. A clustering algorithm over the similarity-based network is then applied to group nodes by their region in space and the characteristics of their terrorism events. Consequently, the skeleton network is reconstructed from a set of highly influential nodes from each cluster and the simplified edges. Evaluation with the quadratic assignment procedure (QAP) and Voronoi regions indicates that our skeleton network extraction method preserves the topological structure and spatial information consistently with the original network. Results from our study can also provide valuable contributions to the field of social network analysis and related areas.

The remainder of this paper is organized as follows: in Section 2, we describe the current situation on skeleton network extraction and its lack of contribution to the terrorism domain. In Section 3, the concept of TPKS-skeleton for network extraction is presented. In Section 4, we briefly describe the details of the global terrorism dataset, the construction process of the original network, and the network assessment framework. Furthermore, we discuss the deployment of TPKS-skeleton for extracting the core network structure and its results in Section 5. Finally, Section 6 gives conclusions to this paper.

2. Current Situation on Skeleton Network Extraction

The literature on skeleton network extraction takes a number of different approaches. Most of them introduce techniques to deal with the edge reduction problem, for instance, edge weighting [12], edge sampling [13], statistical importance of links [14], and unexpected noise connections [15]. These methods try to remove as many edges as possible but retain a large number of vertices. We argue that only a small number of important nodes and their connections might be sufficient for network functionality. The basic idea of our approach is that a set of the most influential nodes can sufficiently describe the ground-truth information of the global terrorism network.

Recent network extraction methods have also been applied successfully in the social network domain. They have been developed for and applied to different networks, including coauthorship networks [16], online social networks (e.g., YouTube, LiveJournal, and the Goodreads blogging community) [11], and communication networks [17, 18]. Unfortunately, empirical investigations of the global terrorism network are rare, notably in the form of a skeleton. In network science, recent studies focus on resilience inside the terrorism network as a consequence of network disruption strategies [19, 20]. Rostami and Mondani [21] compare different criminal networks to illustrate the complexity, similarity, and reliability of measurement from the viewpoint of social network analysis (SNA). The research in [22] highlights the evolution of hub nodes via the backbone network structure and is the only work closely related to ours. The existing studies indicate that the performance of skeletonization has not been sufficiently confirmed, especially for extracting hidden behaviors and meaningful insights from the global terrorism network.

In a spatial context, efficient network extraction methods are not only used to reduce network size but are also supposed to preserve the key topology and spatial features [23]. It is worth recalling that the impact of terrorism remains widespread over the targeted geographical area. The global terrorism network contains attributes associated with the geolocation where the incidents occurred. Although some of the existing methods can be applied to spatial networks, they do not take into account the space occupied by vertices or their spatial distribution. Existing methods are purely statistical and therefore have negligible implications in a spatial sense.

Besides, the efficiency of existing methods in retaining spatial features has not yet been clarified. It is still unclear which methods perform best on a network that contains spatial attributes. Disregarding spatial features may hinder the understanding of terrorism behavior, and the usefulness of existing applications for spatial networks is doubtful. For this reason, skeleton network extraction algorithms that are sensitive to both network topology and geography are required. This situation calls for a novel approach that could improve the comprehensive understanding of the global terrorism network and support other network mining processes. The question arises whether skeleton network extraction based on topology potential can sufficiently preserve both the topological information and the spatial information of the original network.

3. Method

The proposed network extraction method consists of three consecutive processes: (1) evaluation of node influence value by an improved topology potential algorithm, (2) identification of important nodes associated with the geographical area and event features, and finally, (3) reconstruction of the skeleton network.

3.1. Influential Nodes by an Improved Topology Potential

Topology potential (TP) has been adopted in recent years as a key technique for measuring the influence of nodes. Its efficiency is supported by evidence covering the analysis of social networks [24, 25], medical science [26, 27], and information search [28, 29]. TP was introduced in light of the data field [30, 31]. It is based on the concept of “the importance of nodes which are adjacent to the important nodes” [32]. As such, TP is an indicator of a node’s ability, which is influenced by the differential position of the node and its neighbors [33].

In this research, the subjects of the global terrorism network are actors (terrorists and targets), which are represented by nodes. Edges illustrate the interactive relationships between the different types of nodes. A directed complex network $G = (V, E)$ consists of the set of nodes $V = \{v_1, v_2, \ldots, v_n\}$ and the set of edges $E$. Here $v_i$ represents each unique node, while $e_{ij} \in E$ denotes the interactive relation from the node $v_i$ to the node $v_j$, i.e., the attackers’ action. TP is defined in the form of a Gaussian function [34] and can be formalized as

$$\varphi(v_i) = \sum_{j=1}^{n} m_j \exp\left(-\left(\frac{d_{ij}}{\sigma}\right)^{2}\right), \tag{1}$$

where $\varphi(v_i)$ is the TP of the node $v_i$, for $i = 1, 2, \ldots, n$, and $d_{ij}$ expresses the shortest distance from the node $v_i$ to the node $v_j$. The parameter σ is a factor used to control the influence region of each node. $m_j$ represents the mass of the node $v_j$, which is redefined by k-shell centrality [8]. Accordingly, the influence value of the nodes is induced by a different value of $m_j$ in an improved TP algorithm, namely, TPKS.
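As a minimal sketch of how a topology potential with k-shell mass could be computed, the following Python code assumes an unweighted graph stored as an undirected adjacency dictionary, hop counts as distances, a sum that includes the node itself (distance 0), and no contribution from unreachable nodes. The function names `k_shell`, `hop_distances`, and `tpks` are illustrative and are not taken from the original implementation.

```python
import math
from collections import deque


def k_shell(adj):
    """Return the k-shell index of each node by iterative peeling."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell, k = {}, 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # nothing left to peel at this level, move to the next shell
            continue
        for v in peel:
            shell[v] = k
            remaining.discard(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
    return shell


def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def tpks(adj, sigma=1.0):
    """Topology potential with node mass set to the k-shell index (assumed form)."""
    mass = k_shell(adj)
    potential = {}
    for v in adj:
        dist = hop_distances(adj, v)
        potential[v] = sum(
            mass[u] * math.exp(-((d / sigma) ** 2)) for u, d in dist.items()
        )
    return potential


# Toy graph: a triangle a-b-c with a pendant node d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(tpks(toy, sigma=1.0))
```

In this toy graph the hub `a` receives a larger potential than the pendant node `d`, because `a` sits in a higher shell and is one hop from every other node.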

TPKS describes the influence of a node itself and of its neighbors [35]. A source node, i.e., a terrorist, can pass its impact to a target through a reachable path in the network topology [36]. As previously tested and confirmed by [37], TPKS has great potential in the context of skeleton network extraction. The reason is straightforward: TPKS tends to build a skeleton network from a small set of highly influential nodes, and this set is sufficient and significant for the network’s key functions. The network extracted by TPKS is therefore much reduced in size and complexity, which matches the objective of this paper. As a result, the extracted network alleviates some of the cost issues of the large-scale network and allows other network analysis algorithms to perform precisely.

3.2. Clustering over Similarity Network

The selection of skeleton nodes is guided by a clustering algorithm that relies on both geolocation and event features. Clustering is a fundamental concept in which all observations are assigned to different clusters [38]. A resulting cluster is a group of densely connected nodes that are more similar to each other than to nodes in different clusters. Thus, the distribution of the nodes in the global terrorism network and their feature similarities can be used to determine the influence of a core node on other nodes. The purpose of clustering in this study is to detect high incident concentrations and to allocate the highly influential nodes of each cluster accordingly. It also serves to break the dataset into subgroups so that essential nodes can be correctly selected as the core members of the skeleton network.

To complete this task, we employ the similarity network model [39] to quantify the resemblance between terrorist organizations in the given terrorists-to-targets network. The algorithm creates a summary vector for each terrorist based on the associated targets. This algorithm has extremely high computational demands that exceed the capability of our available resources. To overcome this issue, we include only the terrorist events from 2001 onward, the year of the tragic incident on September 11th [40]. After the summary vector is obtained, the algorithm computes the similarity scores according to the TPKS value and the following event features:
(1) Geolocation: the geographical attribute could influence the accessibility of spatial nodes in the global terrorism network. It is, therefore, realistic to define the dissimilarity between nodes by their location in physical space. For each terrorist, we calculate the mean value of the geolocation coordinates (latitude and longitude) of all its targeted attacks. This estimates the active geographic area of the terrorism events caused by a certain terrorist organization.
(2) Peak year: the year in which the terrorist performs the most events. The peak year shows how active the terrorist is along the time dimension.
(3) Lethality: the sum of the number of deaths and the number of wounded from all attacks by a terrorist.
(4) Attack type: the general method or tactic that the terrorist used the most in all of its events.
(5) Weapon type: the type of weapon that the terrorist used the most in all of its events.
(6) Target type: the victim type that the terrorist targeted the most in all of its events.
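The summary vector described above can be sketched as follows. This is only an illustration under stated assumptions: the event field names (`lat`, `lon`, `year`, `killed`, `wounded`, `attack_type`, `weapon_type`, `target_type`) and the function `summarize` are hypothetical, and coordinates are averaged arithmetically, which ignores wrap-around at the 180° meridian.

```python
from collections import Counter
from statistics import mean


def most_frequent(values):
    """Most common value; ties go to the value encountered first."""
    return Counter(values).most_common(1)[0][0]


def summarize(events):
    """Collapse all events of one terrorist group into a summary vector."""
    return {
        # (1) mean coordinates of all targeted attacks
        "lat": mean(e["lat"] for e in events),
        "lon": mean(e["lon"] for e in events),
        # (2) year with the most events
        "peak_year": most_frequent(e["year"] for e in events),
        # (3) deaths plus wounded over all attacks
        "lethality": sum(e["killed"] + e["wounded"] for e in events),
        # (4)-(6) most frequently used categories
        "attack_type": most_frequent(e["attack_type"] for e in events),
        "weapon_type": most_frequent(e["weapon_type"] for e in events),
        "target_type": most_frequent(e["target_type"] for e in events),
    }


# Hypothetical events of a single group.
events = [
    {"lat": 10.0, "lon": 20.0, "year": 2005, "killed": 3, "wounded": 2,
     "attack_type": "Bombing", "weapon_type": "Explosives", "target_type": "Business"},
    {"lat": 20.0, "lon": 40.0, "year": 2005, "killed": 0, "wounded": 1,
     "attack_type": "Bombing", "weapon_type": "Explosives", "target_type": "Police"},
    {"lat": 30.0, "lon": 30.0, "year": 2007, "killed": 1, "wounded": 0,
     "attack_type": "Armed Assault", "weapon_type": "Firearms", "target_type": "Business"},
]
print(summarize(events))
```

The TPKS value of the group would be appended to this vector before similarity scores are computed.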

Algorithm 1 gives the detailed procedure for constructing a similarity network based on geolocation and event features. Here, a network element is defined as a small subgraph containing a terrorist node and the targets it attacked. Once the similarity score between a pair of subgraphs reaches the threshold value, the algorithm connects the two subgraphs by adding a tie between their terrorist nodes. Subsequently, we apply the k-means clustering algorithm to partition the similarity network into k clusters [41]. To obtain a value of k, we use factoextra::fviz_nbclust() with Elbow analysis in the R platform, which suggests k = 4 as the optimum number of clusters for the similarity network. Each node is assigned to the cluster with the nearest mean by Euclidean distance, so that nodes in the same cluster are as distinct as possible from nodes in other clusters.
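The elbow analysis relies on the within-cluster sum of squares (WCSS) of a k-means partition for increasing k. The paper performs this step in R; the following pure-Python version of Lloyd's algorithm is only a sketch of the idea, with randomly chosen initial centers as an assumption and with `sq_dist` and `kmeans` as illustrative names.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (centers, clusters, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random initial centers
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        updated = [
            tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    wcss = sum(
        sq_dist(p, centers[i]) for i, members in enumerate(clusters) for p in members
    )
    return centers, clusters, wcss


# Two well-separated toy groups: the WCSS drops sharply from k = 1 to k = 2.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in (1, 2, 3):
    print(k, round(kmeans(points, k)[2], 3))
```

The "elbow" is the value of k after which the WCSS stops decreasing sharply; in the toy data this happens at k = 2.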

Algorithm 1: Similarity Network based on Geolocation and Event Features
Input: Original Network, Terrorist Node
Output: Similarity Network
(1) m-Modified Topology Potential function ()
(2)  TPKS ← 0
(3)  for each node do
(4)   calculate TPKS of the node within its influence range (in hops)
(5)  end for
(6)
(7) similarityScore function ()
(8)  score ← 0
(9)  for each feature do
(10)   if feature is unbounded numeric value then
(11)    curr =
(12)   if feature is bounded numeric value then
(13)    curr =
(14)   if feature is categorical then
(15)    curr =
(16)    curr =
(17)   if feature is geolocation then
(18)    distance =
(19)    curr = ;
(20)   score += curr;
(21)  return score
(22)
(23) similarityNetwork function ()
(24)  for each pair of terrorist nodes do
(25)
(26)
(27)
(28)
(29)   similarity ← similarityScore ()
(30)   if similarity reaches the threshold then
(31)    add tie between the two terrorist nodes
(32)  return

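The per-feature formulas of Algorithm 1 are not recoverable from the listing, so the sketch below substitutes simple stand-ins and should be read as an assumption throughout: numeric features use a range-normalized difference, categorical features use exact matching, and geolocation uses the haversine great-circle distance scaled by half the Earth's circumference. All names (`haversine_km`, `similarity_score`, `similarity_network`, the field names, and the `ranges` argument) are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0
CATEGORICAL = ("attack_type", "weapon_type", "target_type")


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def similarity_score(a, b, ranges):
    """Sum of per-feature similarities, each in [0, 1] (assumed formulas)."""
    score = 0.0
    for name, span in ranges.items():  # numeric features, e.g. TPKS, peak year
        score += 1.0 - min(abs(a[name] - b[name]) / span, 1.0)
    for name in CATEGORICAL:  # categorical features: exact match
        score += 1.0 if a[name] == b[name] else 0.0
    dist = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    score += 1.0 - dist / (math.pi * EARTH_RADIUS_KM)  # geolocation
    return score


def similarity_network(summaries, ranges, threshold):
    """Add a tie between two terrorist nodes when their score reaches the threshold."""
    return [
        (u, v)
        for u, v in combinations(sorted(summaries), 2)
        if similarity_score(summaries[u], summaries[v], ranges) >= threshold
    ]
```

With three numeric features, three categorical features, and geolocation, the score ranges from 0 to 7, so the threshold would be chosen on that scale.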
3.3. Reconstruction of Skeleton Network

The practical procedure of skeleton network reconstruction includes four steps. In the first step, the initial set of terrorist nodes from the similarity network is identified. For each cluster, the algorithm selects the highly influential terrorists, where the number of top nodes is set by the control parameter. This step yields a terrorist network that may contain some disconnected components, because the top terrorist nodes from different clusters are dissimilar and may originally have had no connection with each other. In order to maintain the network’s key structure, we keep only the connected network with the highest degree of connectivity.

In the second step, the related targets are discovered with the identical control parameter. All target nodes that have at least one connection to any of the terrorist nodes from the first step are ranked in decreasing order of their TPKS value, and the algorithm picks only the top target nodes.

In the third step, the algorithm merges the set of top terrorists and the set of top targets. Both terrorist and target nodes and their formerly connected edges are aggregated to form the whole skeleton network.

Finally, all edges between two nodes are mapped into a single one, and the summary statistics or attributes of the edges are combined. As a result, the node set and edge set of the skeleton contain only the most important nodes and edges, and the new network is much smaller than the original network. The summary description of the TPKS-skeleton is given in Algorithm 2.

Algorithm 2: Skeleton Network Extraction Model (TPKS-skeleton)
Input: Original Network, Similarity Network, nodes in different clusters
Output: Skeleton Network
(1) TPKS-skeleton ()
(2)  ;
(3)  ;
(4)   = max(degree ())
(5)
(6)
(7)  ;
(8)
(9)   = merge(, )
(10)
(11)  for each pair of nodes do
(12)   if more than one edge exists between the pair then
(13)    map all edges into a single tie
(14)   else
(15)    keep the tie between the pair
(16)   end if
(17)  until the network is connected
(18)  end for
(19)  return Skeleton Network
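The reconstruction steps can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it assumes that edges are given as (terrorist, target, weight) triples, that the control parameter is a fraction `ratio` of nodes to keep, and that merged edge weights are summed; it omits the selection of the best-connected component from the first step. The names `top_fraction` and `extract_skeleton` are hypothetical.

```python
import math
from collections import defaultdict


def top_fraction(nodes, score, ratio):
    """Keep the highest-scoring share of nodes (at least one node)."""
    keep = max(1, math.ceil(ratio * len(nodes)))
    return set(sorted(nodes, key=lambda v: score[v], reverse=True)[:keep])


def extract_skeleton(edges, clusters, score, ratio):
    """Return merged skeleton edges as {(terrorist, target): combined weight}."""
    # Step 1: top terrorists from every cluster of the similarity network.
    terrorists = set()
    for members in clusters.values():
        terrorists |= top_fraction(members, score, ratio)
    # Step 2: top targets among those attacked by the selected terrorists.
    candidates = {t for s, t, _ in edges if s in terrorists}
    targets = top_fraction(candidates, score, ratio) if candidates else set()
    # Steps 3 and 4: merge both node sets and collapse parallel edges into one tie.
    merged = defaultdict(float)
    for s, t, w in edges:
        if s in terrorists and t in targets:
            merged[(s, t)] += w
    return dict(merged)


# Hypothetical toy input: two clusters of terrorists and four targets.
clusters = {0: ["T1", "T2"], 1: ["T3", "T4"]}
score = {"T1": 9, "T2": 2, "T3": 7, "T4": 1, "X": 8, "Y": 5, "Z": 3, "W": 1}
edges = [("T1", "X", 3), ("T1", "X", 2), ("T1", "Y", 1), ("T3", "Z", 4),
         ("T2", "W", 1), ("T3", "X", 1), ("T4", "W", 2)]
print(extract_skeleton(edges, clusters, score, ratio=0.5))
```

In the toy input, the two parallel edges from T1 to X collapse into a single tie whose weight is their sum.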

4. Experimental Settings

In this section, we introduce the network data and the network assessment framework used in this study. The TPKS algorithm was developed in the Python programming language. The skeletonizing experiments and comparisons were mainly conducted with an open package available in the R platform. RStudio was used to generate the data visualizations.

4.1. Data Source Description

We use a dataset from the Global Terrorism Database (GTD) to construct the network. The GTD is collected and maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) [5]. The terrorism incidents are recorded from several public sources, including media articles, electronic news, books, journals, and legal documents. The observations span from 1970 to 2017. However, our study excludes the incidents from 1993, as only 15% of the attacks could be reliably identified [42]. Moreover, the geolocations of the 1993 incidents, which could have benefited our study, were not available. The entire GTD dataset contains nearly 182,000 observations with 135 attributes, e.g., perpetrator information, victim information, geolocation, number of casualties, and consequences.

The GTD is a large-scale network dataset that continually grows with time, so a cleaning process is necessary before the study. Incidents in the GTD are often ambiguous between terrorism and other forms of crime or political violence. To make a clear determination, we filtered out all incidents that the GTD marks as unclear. Incidents carried out by an “Unknown Terrorist” or against an “Unknown Target” were removed from the dataset. Since our study relies on spatial information, any event with unavailable geolocation was filtered out. Our work does not include attacks that were attempted but were ultimately unsuccessful. Moreover, we eliminated all incidents that targeted a multinational area or occurred in an international area, in order to keep network modularity tied to a particular nation. After cleaning, 70,113 observations remain in the global terrorism dataset.

4.2. Original Network Construction

The global terrorism network contains all actors and their relationships involved in each terrorism incident. Over the entire network, two types of actors (terrorist nodes and target nodes) are connected by directed links. A relationship is defined whenever a terrorist attacks a target in an event at a given timeframe. A directed edge then runs from the terrorist node to the target node and is associated with that timeframe, counted from the initial timeframe. Each target node is associated with the geolocation where the incident occurred. The severity of the attack, given by the number of deaths and wounded, can be used to define the weight of an edge. With this structure, the generated network is a directed, weighted, and bipartite network (DNWB) in which relations exist only across the two sets of nodes. Thus, connections between nodes within the same group (i.e., terrorist-to-terrorist, target-to-target) are not allowed.
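A minimal sketch of this construction is given below. The incident field names (`group`, `target_type`, `city`, `lat`, `lon`, `year`, `killed`, `wounded`) and the choice to identify a target node by its target type and city are assumptions made for illustration; they are not the GTD column names.

```python
def build_network(incidents):
    """Build a directed, weighted, bipartite edge list from incident records."""
    terrorists = set()
    targets = {}  # target node -> (lat, lon) of its first recorded incident
    edges = []    # (terrorist node, target node, year, weight)
    for inc in incidents:
        # Tagging each node with its type keeps the two node sets disjoint.
        src = ("terrorist", inc["group"])
        dst = ("target", inc["target_type"], inc["city"])
        terrorists.add(src)
        targets.setdefault(dst, (inc["lat"], inc["lon"]))
        weight = inc["killed"] + inc["wounded"]  # severity of the attack
        edges.append((src, dst, inc["year"], weight))
    return terrorists, targets, edges


# Hypothetical incident records.
incidents = [
    {"group": "G1", "target_type": "Business", "city": "CityA",
     "lat": 1.0, "lon": 2.0, "year": 2005, "killed": 2, "wounded": 3},
    {"group": "G1", "target_type": "Business", "city": "CityA",
     "lat": 1.0, "lon": 2.0, "year": 2006, "killed": 0, "wounded": 1},
    {"group": "G2", "target_type": "Police", "city": "CityB",
     "lat": 5.0, "lon": 6.0, "year": 2005, "killed": 1, "wounded": 0},
]
terrorists, targets, edges = build_network(incidents)
print(len(terrorists), len(targets), len(edges))
```

Because every edge starts at a node tagged `terrorist` and ends at a node tagged `target`, terrorist-to-terrorist and target-to-target links cannot occur by construction.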

4.3. Assessment Framework for Skeleton Network Extraction

To verify the accuracy and performance of the network extraction method, we propose a network assessment framework comprising three classes: node-equivalence assessment, network-equivalence assessment, and spatial information assessment. Several measures are computed for the extracted skeleton network and compared with the original network.

Node-equivalence assessment examines how well the nodes preserved in the skeleton network cover the important nodes in the original network. It seeks a significant correlation in the degree of dominance of the individual actors. The evaluation indexes in this class are Spearman’s rank-order correlation [43] and the feature correlation matrix [44].

Network-equivalence assessment examines whether two networks are significantly equivalent in their overall topological structure. The validation strategies in this class are the basic structural features of the network and the network correlation by QAP [45].

Spatial information assessment examines how the skeleton network and the original network exhibit relationships among spatial nodes. In this class, different kinds of spatial information on the Voronoi diagram [46] are explored to determine whether the information patterns of the two networks are consistent.
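The idea behind QAP correlation can be sketched as follows, assuming that both networks are expressed as adjacency matrices over the same ordered node set (how nodes missing from the skeleton are handled is not specified here). The observed Pearson correlation of the off-diagonal entries is compared with the correlations obtained after jointly permuting the rows and columns of one matrix. The names `offdiag`, `pearson`, and `qap` are illustrative.

```python
import random
from statistics import mean


def offdiag(m, order=None):
    """Off-diagonal entries of matrix m, with rows and columns relabeled by order."""
    n = len(m)
    order = order or list(range(n))
    return [m[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def qap(a, b, permutations=1000, seed=0):
    """Return (observed correlation, share of permutations at least as large)."""
    rng = random.Random(seed)
    observed = pearson(offdiag(a), offdiag(b))
    nodes = list(range(len(a)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(nodes)  # relabel the nodes of b, keeping its structure intact
        if pearson(offdiag(a), offdiag(b, nodes)) >= observed:
            hits += 1
    return observed, hits / permutations


# Toy directed network compared with itself.
a = [[0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
print(qap(a, a, permutations=200))
```

A small share of permutations reaching the observed value suggests that the structural agreement between the two networks is unlikely to arise from node labeling alone.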

5. Results and Discussion

5.1. Reduction Effectiveness of TPKS-Skeleton

When extracting the skeleton network, a higher control parameter retains many significant nodes with high TPKS values, but also noise nodes. A lower control parameter, on the other hand, filters out a larger number of nodes that are key to the network information, leading to a risk of network destruction. Setting the control parameter too low or too high may therefore result in inefficient extraction, and it is necessary to examine the suitable extraction level beforehand. For this purpose, we refer to the empirical research in [47]. Their proposed metric of reduction effectiveness guides us in specifying the extraction capacity that is appropriate for a certain network. The reduction effectiveness is found by quantifying the size of isolated subnetworks. Because the skeleton network contains the most significant connected nodes that hold key information, removing the skeleton causes the entire network structure to break into small pieces. Without a skeleton structure, the network malfunctions, as the remaining subgraphs cannot operate further. For this experiment, the networks were generated at different compression ratios, with the control parameter ranging from 0.01 to 1. The number of isolated subnetworks was then derived by removing the skeleton network from the original network at each scale.
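The counting step can be sketched as follows, assuming that connectivity is judged on an undirected adjacency dictionary and that the skeleton at a given ratio is simply the top-ranked share of nodes by score (the per-cluster selection is left out for brevity). The names `count_components` and `reduction_curve` are illustrative.

```python
def count_components(adj, removed):
    """Number of connected components left after deleting the removed nodes."""
    seen = set(removed)
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
    return components


def reduction_curve(adj, score, ratios):
    """Map each compression ratio to the number of isolated subnetworks."""
    ranked = sorted(adj, key=lambda v: score[v], reverse=True)
    curve = {}
    for r in ratios:
        keep = round(r * len(ranked))  # size of the skeleton at this ratio
        curve[r] = count_components(adj, ranked[:keep])
    return curve


# Toy star network: removing the hub isolates every leaf.
star = {"h": {"a", "b", "c", "d"}, "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"}}
score = {"h": 5, "a": 4, "b": 3, "c": 2, "d": 1}
print(reduction_curve(star, score, [0.0, 0.2, 1.0]))
```

In the star example, removing only the top 20% of nodes (the hub) already splits the network into four pieces, which is the kind of peak the reduction-effectiveness curve looks for.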

Figure 1 shows the compression ratio and the number of subnetworks. The fitted curve decreases continually after reaching its peak at the compression ratio of 0.2 (skeleton of the top 20% of nodes). At compression ratios of 0.5 to 0.8, removing the extracted network still yields a relatively high number of subnetworks, which implies that when the backbone structure at such a compression ratio is destroyed, the global terrorism network could collapse easily. However, although the networks extracted at compression ratios of 0.5 to 0.8 produce many subnetworks, they remain bulky in size, so retaining that many nodes is no longer effective. It can be concluded that whenever the number of isolated subnetworks is comparatively large while the set of skeleton nodes is sufficiently small, the entire network structure is likely to collapse after the skeleton is eliminated. This experiment gives direct evidence that measuring the performance of the TPKS-skeleton should exclude inefficient compression ratios.

Together with the reduction effectiveness, the coverage ratio is applied. The coverage ratio is empirically used to measure the overall skeleton nodes from different granularities that cover the important nodes of the original network [47]. Based on the TPKS value, all the nodes are ranked in descending order. Then, the coverage of each node can be calculated:

For an arbitrary node, the coverage compares its ranking position in the extracted skeleton network with its ranking position in the original network. A node in the skeleton network with a larger coverage value implies better quality. Therefore, the prominence of the skeleton network at a particular scale depends on the distribution of the coverage of all nodes. The coverage ratio of a network is defined in equation (3). The larger the coverage ratio, the higher the accuracy of the extracted skeleton.

The coverage ratio of the skeleton network is also shown in Figure 1. The ratio was normalized so that 1 is the maximum score and 0 is the minimum. The networks at compression ratios of 0.1 and 0.2 provide higher coverage ratios than the others: 0.1175 and 0.0926, respectively. When the compression ratio increases to 0.3, the coverage ratio gradually declines, and the network preserves a much higher number of nodes. Therefore, the compression ratio of 0.2 was chosen as the optimal ratio for extracting the network by the TPKS-skeleton. This ratio is more efficient than the others in that removing the skeleton breaks connections over the entire network, while the coverage of important nodes is well preserved.

Figure 2 shows the original network and the network extracted with the compression ratio of 0.2. The original network in Figure 2(a) exhibits a hairball structure in which the real topological structure is hidden in the noise. Some periphery nodes appear in the network; these can be considered isolated nodes with few or weak connections. Compared to the original network, the TPKS-skeleton network in Figure 2(b) contains far fewer nodes, namely the highly influential terrorists and their corresponding targets (the top 20% of nodes from each cluster). There is only a single connection between a node pair. Thus, the network is comparatively smaller and less complex than the original network. The TPKS-skeleton network at this scale is intended to sufficiently explain the behavioral patterns of the original global terrorism network in the analyses that follow.

5.2. Node-Equivalence and Spearman Ranking Correlation

To evaluate the node importance ranking, we investigate the relationship between the top nodes from two networks. We list the top 20 terrorists and target nodes from the skeleton network to compare with the top 20 terrorists and targets from the original network.

As presented in Table 1, the highly influential nodes are ranked by their TPKS values, which are quantified by equation (1). With the removal of some peripheral nodes at a specific compression ratio, the TPKS-skeleton network contains nearly all key nodes, most of which are positioned at the top of the ranking. The Taliban ranks as the deadliest terrorist group, followed by ISIL, Muslim Extremists, Al-Qaida in Iraq, IRA, ETA, and ISI. However, Boko Haram, one of the most dangerous terrorist groups in 2019 [48], does not appear in the list from our TPKS-skeleton. The exclusion of Boko Haram from the skeleton network implies that this terrorist group has a weak relationship to other key terrorists.

In this exploration, different features of all terrorist nodes are used to plot the correlation matrix [44]. The plots in Figure 3 show the positive, negative, weak, or strong relationships between one feature and the others. The correlation coefficient, indicated by the color of the cell, accounts for the strength of correlation. For both the original network and the skeleton network, attack type shows a strong, positive correlation with weapon type. This signifies that terrorists tend to perform different types of attacks depending on the type of accessible weapons. However, the geolocations (latitude and longitude) show no sign of correlation in either network. In the original network, most of the event features lack relationships with other features. The correlation matrix indicates that the global terrorism network is extremely unpredictable, as the co-occurrence of incidents varies. In contrast, the terrorist features in the skeleton network are more strongly correlated than those in the original network. The skeleton network exhibits a weak similarity and positive correlation between year, lethality, attack type, target type, and weapon type. The TPKS-skeleton not only maintains a ranking correlation similar to that of the original network but also makes the relationship between two nodes’ properties more pronounced.

When only the target nodes in Table 2 are considered, Private Citizens & Property, Business, and Government in Belfast occupy the top 3 positions in both the skeleton network and the original network. The list of top targets shows that Belfast was the most vulnerable area for global attacks. Cities in European Union countries, including Paris, Lurgan, Londonderry, Newtownabbey, London, and Manchester, dominate the rest of the top 20 targets in the TPKS-skeleton. Likewise, Baghdad, Istanbul, and Basra from the Middle East region are also included. However, target nodes from South America are less dominant in the original network.

To confirm the ability of the TPKS-skeleton method in terms of node-equivalence, correlation analysis is employed. We adopt the Spearman rank-order correlation [49] to find a statistically significant relationship between the influence values of the nodes from the original network and those from the skeleton network. Spearman’s rho is calculated on the ranked values and does not carry any assumption about the distribution of the data:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},$$

where $\rho$ represents the Spearman rank correlation, $d_i$ is the difference between the ranks of the corresponding variables, and n denotes the number of observations. The value of the correlation coefficient lies between +1 and −1. A perfect degree of association between the two variables is indicated by 1. As the correlation coefficient goes towards 0, the relationship becomes weaker. A negative sign indicates an inverse relationship between the two variables.
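As a worked example of the rank-difference form of Spearman's rho, the short Python sketch below assumes that no two values are tied (ties would require average ranks, which this simplified version does not handle). The function names `ranks` and `spearman_rho` and the sample values are illustrative.

```python
def ranks(values):
    """Rank positions with 1 for the largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation from squared rank differences."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical influence values of five nodes in two networks.
original = [9.1, 7.4, 5.0, 3.2, 1.1]   # ranks 1, 2, 3, 4, 5
skeleton = [8.0, 8.5, 4.0, 2.0, 3.0]   # ranks 2, 1, 3, 5, 4
print(spearman_rho(original, skeleton))
```

Here the rank differences are −1, 1, 0, −1, 1, so the sum of squared differences is 4 and rho = 1 − 24/120 = 0.8, a strong positive association.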

The performance of TPKS-skeleton is compared with five competitive methods: disparity filter (DF) [12], global weight thresholding (GWT) [50], k-core decomposition (k-core) [51], minimum spanning tree (MST) [52], and w-skeleton [53]. These network extraction methods rely on different principles, so choosing parameters that allow an impartial comparison is difficult. For this reason, we study the reduction effectiveness of each method and select its single best skeleton at a certain granularity. The skeleton network that provides the highest number of subgraphs is taken as optimal and chosen as the competitor. The MST is an exception to this criterion because the method requires no control parameter as input; the tree is constructed directly by applying the mst() function available in the R platform.

The traditional correlation coefficient is defined over paired observations, so the Spearman correlation cannot be computed directly for networks that differ in size and contain unpaired nodes. To find the correlation between the two sets of variables (i.e., nodes in the extracted network and nodes in the original network), we design the correlation measure using two techniques. In the first, the correlation is found by directly removing unpaired observations: nodes of the original network that do not appear in the skeleton network are filtered out before the coefficient is computed. In the second, we adopt the idea of random and representative samples suggested in [54]. We run the randomization test 1000 times for each method and use the average result over the random sets to reflect the equivalence of node importance between the skeleton network and the original network.
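The first technique (dropping unpaired nodes before correlating) can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary layout (node label mapped to influence value), and the toy values are all hypothetical. The second, randomized technique is not sketched because the text does not specify how the random sets are drawn.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def paired_correlation(original, skeleton):
    """Keep only nodes present in both networks, then correlate their values."""
    shared = sorted(set(original) & set(skeleton))
    return spearman([original[v] for v in shared],
                    [skeleton[v] for v in shared])


# Hypothetical influence values: the skeleton keeps three of five nodes.
original_influence = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 2.0, "e": 1.0}
skeleton_influence = {"a": 9.0, "c": 4.0, "e": 1.0}
rho_paired = paired_correlation(original_influence, skeleton_influence)
```

Computing rho through ranks rather than the closed form keeps the sketch valid when several nodes share the same influence value.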

Table 3 shows that the importance of the skeleton nodes from all methods is positively and strongly correlated with the influence capability of the nodes in the original network. It is no surprise that the skeleton extracted by the MST always holds a high correlation value under both techniques, because the MST preserves every node of the original network. We argue that although the MST gives the highest correlation, it fails to reduce the network size. It is therefore of little use at the application level: the key nodes cannot be recognized, and the resulting network remains too complex for practical use. Under the first technique (paired nodes), the results from GWT and k-core are highly related to the original network; nevertheless, both give a low value under the second technique (random sampling). The skeleton network from DF has the lowest correlation value, whereas w-skeleton gives the best result on one of the two measures but a comparatively low value on the other. Considering both measures and their average, the node importance ranking from the TPKS-skeleton has the strongest correlation among the compared methods.

5.3. Network Features and Graphical Representation of Different Skeletons

At this point, we study the cumulative degree distribution of the global terrorism network, where $P(k)$ denotes the probability density function and $k$ is the degree of a node. This simple measure gives a partial view of the network structure and allows us to discriminate among the mechanisms that shape a network. The cumulative degree distribution in Figure 4(a) is broad and long-tailed. It reveals a heterogeneous topology in the nodes' activity and follows a linear, downward-sloping trend. From this information, the extracted skeleton network is probably a scale-free network. As in the original network, only a small number of actors have a large number of connections, while the majority carry only a few links.
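A cumulative degree distribution of this kind can be computed directly from an edge list. The sketch below is illustrative only: it assumes an undirected, unweighted edge list and the convention P(K ≥ k) (the paper may use a strict inequality), and the function name and toy network are hypothetical.

```python
from collections import Counter


def cumulative_degree_distribution(edges):
    """Return {k: P(K >= k)} for every degree k observed in the edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    distribution = {}
    remaining = n  # number of nodes with degree >= current k
    for k in sorted(counts):
        distribution[k] = remaining / n
        remaining -= counts[k]
    return distribution


# Toy network: one hub "h" with four neighbours and one extra tie a-b.
toy_edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
ccdf = cumulative_degree_distribution(toy_edges)
# Degrees are h=4, a=2, b=2, c=1, d=1, so ccdf == {1: 1.0, 2: 0.6, 4: 0.2}
```

Plotting these values on log-log axes is the usual way to check for the roughly linear, long-tailed decay described above.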

The same applies to the overall weight distribution drawn in Figure 4(b), where the edge weight distribution of the original network is compared with that of the edges remaining after the skeleton network is extracted. The cumulative weight distribution of the TPKS-skeleton deviates from that of the original network and is more linear. This implies that heavyweight edges are conserved in the skeleton, because many edges are merged into a single edge and their weights are combined.

The strength distributions of the skeleton network and the original network are shown in Figure 4(c). We compute the strength of a vertex as $s_i = \sum_{j \in N(i)} w_{ij}$, where $N(i)$ denotes the set of neighbors of node $i$ and $w_{ij}$ is the weight of edge $(i, j)$. The cumulative distribution $P(s)$ gives the probability of having a node whose strength is larger than $s$. Most vertices have links with only a minor effect on lethality, and only a small fraction of nodes are associated with multiple severe attacks. Filtering out unimportant nodes also removes the edges connected to them, which decreases the scale of node strength. At the same time, the downward pattern follows a path similar to that of the original network.
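Node strength, the sum of the weights of the edges incident to a node, and the fraction of nodes above a given strength can be sketched as follows. The function names, the edge-tuple layout, and the toy weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def node_strength(weighted_edges):
    """Sum the weights of all edges incident to each node (undirected)."""
    strength = defaultdict(float)
    for u, v, w in weighted_edges:
        strength[u] += w
        strength[v] += w
    return dict(strength)


def fraction_stronger_than(strength, threshold):
    """Share of nodes whose strength exceeds the threshold."""
    above = sum(1 for value in strength.values() if value > threshold)
    return above / len(strength)


# Toy terrorist-to-target edges with made-up weights.
toy_weighted_edges = [("g1", "t1", 3.0), ("g1", "t2", 1.0), ("g2", "t2", 2.0)]
strength = node_strength(toy_weighted_edges)
share_above = fraction_stronger_than(strength, 2.5)
# strength: g1=4.0, t1=3.0, t2=3.0, g2=2.0, so share_above == 0.75
```

Removing a node from the edge list lowers the strength of each of its neighbours, which is the scale reduction noted in the text.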

Another essential characteristic is the clustering coefficient distribution. Some network measures cannot be applied directly to a bipartite network, so we applied the network projection method from tnet [55] to obtain the clustering coefficient of the terrorist-to-target network. Figure 4(d) illustrates the distribution of the local clustering coefficient: the clustering coefficient of all nodes from both the skeleton and the original network is plotted against the cumulative frequency. The node clustering increases as the cumulative frequency increases. Both networks have some tightly clustered parts and some less clustered parts. The plot implies that some nodes belong to very dense subgroups and that these subgroups are connected through hubs.

The local clustering suggests that hub nodes are found proportionally across the different dense subgroups. Since the global terrorism network is scale-free, this spread of hub nodes across groups makes the network resilient to random attack. As emphasized in [3], the network contains many more low-degree nodes than high-degree nodes, so randomly removing a terrorist node has a low chance of selecting a high-priority one. Removing boundary nodes causes little to no damage to the network as a whole. Nonetheless, directly targeting known hubs or relatively high-influence nodes, even a small portion of them, could destroy the whole network structure. Thus, identifying the skeleton structure with TPKS-skeleton preserves the native characteristics of the network and offers a promising opportunity to destabilize the global terrorism network.
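The contrast between removing a boundary node and removing a hub can be illustrated on a toy graph by measuring the largest connected component that survives. This is a hedged sketch on an invented two-hub network, not an analysis of the GTD data; the function name and node labels are hypothetical.

```python
from collections import defaultdict, deque


def largest_component(edges, removed=()):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    adjacency = defaultdict(set)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        if u not in removed and v not in removed:
            adjacency[u].add(v)
            adjacency[v].add(u)
    nodes -= removed
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


# Toy network: two hubs joined by one tie, each with four low-degree neighbours.
hub_edges = [("H1", "H2")]
hub_edges += [("H1", f"a{i}") for i in range(4)]
hub_edges += [("H2", f"b{i}") for i in range(4)]

intact = largest_component(hub_edges)                       # 10 nodes connected
after_leaf = largest_component(hub_edges, removed=["a0"])   # 9 remain connected
after_hub = largest_component(hub_edges, removed=["H1"])    # only 5 remain connected
```

Deleting a single hub halves the connected core here, whereas deleting a peripheral node removes only that node.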

To confirm the power of TPKS-skeleton, we present graphical representations of the different skeleton networks in Figure 5. The resulting networks reveal distinguishing features that reflect the mechanism each method relies on. Table 4 provides several topological features of the original network and the different skeleton networks. When these basic characteristics are examined carefully, more detailed structural distinctions emerge: the skeleton networks match the original network to varying degrees.

In Table 4, the DF method prunes the connections of the original network and produces a skeleton network that includes more than 11% of the crucial terrorists and targets and over 7% of the crucial incidents involved. DF and GWT are similar both in network visualization (Figures 5(a) and 5(b)) and in topological features, e.g., number of nodes, number of edges, and average degree. DF holds a higher number of communities (ignoring the case of the MST) but has a lower global clustering coefficient than GWT. GWT produces the clustering coefficient (0.3819) nearest to that of the original network (0.3302). The skeleton network from the k-core method retains the core behavior by neglecting peripheral nodes, as presented in Figure 5(c). k-core uncovers densely connected nodes with the highest maximum degree ($k_{\max}$ = 2864). Compared with the other methods, k-core preserves a large number of edges. The network thus highlights a small set of nodes that have the most massive connectivity and are significant for the network core structure.

In Figure 5(d), the MST has roughly the same structure as the original network. The skeleton conserves all the nodes, and only half of the edges are removed. The sparse connections among a large number of vertices diminish the cliquishness of the MST and cause the clustering coefficient to drop to zero. Since the number of nodes in the MST is fixed at that of the original network, it might fail to achieve the objective of skeletonization and network simplification defined in this paper. The w-skeleton method focuses only on the largest connected component of the weight-thresholded network. Thus, the extracted network in Figure 5(e) contains a single large connected network and does not include any isolated vertices or other subgroups. w-skeleton conserves more nodes than any other method except the MST, and its clustering coefficient is also close to that of the original network.

In Table 4, TPKS-skeleton yields the extracted network with the smallest number of nodes and edges (904 nodes and 1349 edges). As Figure 5(f) shows, the TPKS-skeleton network exhibits densely connected subgroups. The network has the highest global clustering coefficient, which far exceeds that of the original network. The reason is that the skeletonizing process is combined with the similarity network model: some ties among terrorist groups are added according to similarity features and geolocations. The TPKS-skeleton network contains connections across different classes of nodes, which leads to a high number of local cycles or cliques. The network also displays some isolated subnetworks that are not tied to other subgroups. It comprises nodes that are highly significant with regard to both their TPKS value and their similarity group. TPKS-skeleton reveals more regionally oriented groups of highly influential terrorists and targets, with their geolocations across the globe. This means that high-TPKS nodes are not always preserved in the skeleton network; preservation also depends on their role in a specific community. Specifically, the relationships among nodes in the TPKS-skeleton network cover not only terrorist-to-target but also terrorist-to-terrorist ties.

5.4. Network Structural Similarity by QAP Correlation

In this section, we employ the quadratic assignment procedure (QAP) to find the correlation between two network matrices while retaining the integrity of the observed structure [56]. The QAP is a nonparametric, permutation-based test commonly used in SNA. It identifies systematic connections between different relations in dyadic datasets. Following the concept of the QAP, all pairs of nodes in the global terrorism network are analyzed. The QAP generates networks with the same structure by taking the matrix of the observed network and permuting its rows and columns together. The permutation is repeated over a large number of iterations; we set 1000 iterations for this experiment. In doing so, the QAP should efficiently reflect the structural correlation between the original network and the different skeleton networks.
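A QAP correlation test can be sketched with the standard library alone. The sketch below makes several assumptions that are not stated in the paper: both networks are given as square adjacency matrices over the same ordered node set (nodes removed by a skeleton would appear as all-zero rows and columns), the statistic is the Pearson correlation of off-diagonal entries, and the p-value is one-sided. All function names are hypothetical.

```python
import random


def offdiag_pearson(a, b):
    """Pearson correlation between the off-diagonal cells of two matrices."""
    n = len(a)
    x = [a[i][j] for i in range(n) for j in range(n) if i != j]
    y = [b[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    vx = sum((p - mx) ** 2 for p in x)
    vy = sum((q - my) ** 2 for q in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant matrix
    return cov / (vx * vy) ** 0.5


def qap_test(a, b, iterations=1000, seed=0):
    """Observed correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(a)
    observed = offdiag_pearson(a, b)
    hits = 0
    for _ in range(iterations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel nodes: the same permutation is applied to rows and columns.
        permuted = [[a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if offdiag_pearson(permuted, b) >= observed:
            hits += 1
    return observed, (hits + 1) / (iterations + 1)


# Toy example: a 5-node path compared with itself and with its complement.
path = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
complement = [[0 if i == j else 1 - path[i][j] for j in range(5)] for i in range(5)]
same_corr, same_p = qap_test(path, path, iterations=200)
anti_corr, anti_p = qap_test(path, complement, iterations=200)
```

Permuting rows and columns with one shared permutation keeps each permuted matrix a relabelled copy of the same graph, which is what lets the test respect the dependence among dyads.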

Table 5 reports the percentage of node reduction, together with the fractions of edge reduction and weight reduction and the QAP correlation. The results show that the QAP correlations between the skeleton networks and the original network are all statistically significant, except in the case of DF, whose network structure is anticorrelated with the original network. GWT bears the next least structural similarity. The structural-equivalence of GWT and the MST is moderate (QAP = 0.396 and 0.521, respectively). Interestingly, TPKS-skeleton sharply reduces the network size in all aspects (up to 97% for nodes, 98% for edges, and 68% for weights) while maintaining a relatively high similarity to the original network (QAP = 0.906). The k-core network displays an almost-perfect QAP correlation, but the fractions of nodes, edges, and weights it removes are smaller than those of the TPKS-skeleton. Indeed, there is always a trade-off between keeping the network as small as possible and retaining topology information similar to that of the original network. In this respect, TPKS-skeleton outperforms the other skeleton network extraction methods: it preserves a high QAP correlation with the smallest number of nodes and edges in the skeleton network.

5.5. Spatial Information on Voronoi Diagrams

Voronoi diagrams have been widely adopted to evaluate the spatial information embedded in maps [57]. Inspired by this, we employ spatial statistics and quantitative measurements based on Voronoi diagrams to calculate the spatial information of the global terrorism network. The Voronoi diagram divides the entire network into polygonal partitions, each representing the region governed by a particular spatial object according to proximity to that object [58]. Here, the spatial objects are nodes. The boundary between two neighboring nodes is the perpendicular bisector of the line connecting them, so that each Voronoi region covers the location of exactly one terrorist or target involved in a terrorism incident. Figure 6 shows the Voronoi diagrams generated from the spatial positions of the nodes. We investigate spatial information over these diagrams in three aspects. First, geometric information relates to the position, size, and shape of the network. Second, spatial relationship information derives from neighboring features and their distribution [46]. Third, geometric balance and concentration are based on the concept of spatial equilibrium [59].

Using the quantitative measure of information known as Shannon's entropy [60], the geometric information of a network can be evaluated from the relative space occupied by each spatial node. The geometric information of the global terrorism network is therefore calculated as the entropy of the Voronoi regions, and it indicates how spatial nodes occupy the network space. In Table 6, TPKS-skeleton provides less geometric information than the networks generated by the other methods. This implies that the space governed by each node in the Voronoi diagram varies, and we can assume that the area affected by the devastating consequences of a terrorist attack varies as well. Figure 6 shows that the TPKS-skeleton method highlights more distinct areas around certain geolocations, because the network contains fewer nodes and regions. Nodes also mostly correspond to a wider relative space, i.e., the blank space that surrounds the node. The larger the empty space around a node, the more easily the node can be recognized and the more vulnerable it is. Of the remaining methods, DF and k-core reflect a larger amount of geometric information, whereas GWT and the MST are closely related to the information of the original network. Such substantial information means that the relative space held by each node is similar and that the nodes in the network are more evenly distributed.
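An entropy over Voronoi region areas can be sketched as below. This assumes the common form in which each region's share of the total area is treated as a probability and the logarithm is base 2; the paper's exact normalization and log base are not given in this section, and the function name and toy areas are hypothetical.

```python
import math


def geometric_information(areas):
    """Shannon entropy (in bits) of the area shares of the Voronoi regions."""
    total = sum(areas)
    shares = [area / total for area in areas if area > 0]
    return -sum(p * math.log2(p) for p in shares)


# Four regions of equal area versus four regions dominated by one large cell.
even_entropy = geometric_information([1.0, 1.0, 1.0, 1.0])     # 2.0 bits
uneven_entropy = geometric_information([97.0, 1.0, 1.0, 1.0])  # well below 2 bits
```

For a fixed number of regions the entropy is highest when all regions have equal area, which matches the reading that larger geometric information indicates a more even distribution of nodes.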

Voronoi diagrams can also reflect the adjacency relationships among nodes, which are described by spatial relationship information. This kind of information helps in understanding the relationships between the spatial objects contained in the skeleton network. Spatial relationship information is typically based on three kinds of relationship: topological, distance, and directional [57]. Our validation focuses on topological relationship information [46], which can be quantified as the sum of Voronoi neighbors divided by the total number of nodes in the dual graph. The number of neighbors thus indicates the complexity of the dual graph. The index values summarized in Table 6 show that the average number of neighbors per vertex differs among methods. The GWT skeleton has the largest average number of neighbors, which also differs greatly from that of the original network. This suggests that when meaningless edges and their corresponding nodes are filtered out, the relationships between the remaining spatial nodes in the skeleton become more dominant. Using the original network as a benchmark, the MST performs best, as it preserves the spatial relationship information closest to that of the original network. This is because the MST keeps all the nodes of the original network. The result suggests that nodes contribute more to the spatial information, and have a higher impact on the network structure, than edges do. The MST also provides the smallest average number of neighbors, indicating less complexity than the other competitive methods. Meanwhile, DF and w-skeleton hold almost identical values. The spatial relationship of TPKS-skeleton is slightly more complex than that of the MST and k-core, as its average number of neighbors is larger. Interestingly, the result from TPKS-skeleton does not deviate from the other methods: it preserves relatively high spatial relationship information even though it contains only a few nodes and edges.
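The average number of Voronoi neighbors per node follows directly from the adjacency pairs of the dual graph, since each shared border contributes one neighbor to both regions. The sketch assumes the adjacency pairs have already been obtained from a Voronoi or Delaunay routine; the function name and the five-site layout are hypothetical.

```python
def average_voronoi_neighbors(n_nodes, adjacent_pairs):
    """Sum of neighbor counts over all nodes, divided by the number of nodes."""
    unique_pairs = {frozenset(pair) for pair in adjacent_pairs
                    if pair[0] != pair[1]}
    return 2 * len(unique_pairs) / n_nodes


# Hypothetical layout: a centre site "c" surrounded by four corner sites.
# The centre borders every corner, and each corner borders its two
# adjacent corners, giving eight distinct adjacencies.
pairs = [("c", "nw"), ("c", "ne"), ("c", "se"), ("c", "sw"),
         ("nw", "ne"), ("ne", "se"), ("se", "sw"), ("sw", "nw")]
mean_neighbors = average_voronoi_neighbors(5, pairs)  # 16 / 5 = 3.2
```

Duplicate or reversed pairs are collapsed before counting so that each shared border is counted once.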

Next, the concepts of geometric balance and concentration are applied to explain the spatial distribution. Spatial equilibrium is commonly used to verify the balance of a distribution within the sample space. Its value ranges between 0 and 1; the larger the value, the more balanced the network's distribution. Spatial concentration, in contrast, reveals whether the spatial distribution concentrates in particular areas. The higher the spatial equilibrium, the smaller the spatial concentration. In Table 6, all skeletons share the spatial distribution characteristics of the original network, and all extracted skeletons tend toward spatial concentration. The highest spatial concentration is seen in the network extracted by TPKS-skeleton, which at the same time has the smallest spatial equilibrium. This high concentration arises because the method selects the major nodes from clusters and uses the intensity of incidents over the geographical area as a basis. It indicates that the key incidents were neither equally distributed nor widespread over the globe but were instead denser in specific territories. We therefore conclude that the proposed TPKS-skeleton greatly reduces network size while reasonably retaining all kinds of spatial information.
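One common way to obtain an equilibrium index bounded by 0 and 1, with concentration as its complement, is to normalize Shannon entropy by its maximum. The sketch below uses that formulation purely as an assumption: the definitions in [59] and the quantities reported in Table 6 may differ, and the function names and inputs (one non-negative share per region) are hypothetical.

```python
import math


def spatial_equilibrium(shares):
    """Entropy of the shares divided by its maximum, log(number of regions)."""
    if len(shares) < 2:
        return 0.0
    total = sum(shares)
    probs = [s / total for s in shares if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(shares))


def spatial_concentration(shares):
    """Complement of the equilibrium index (assumed relationship)."""
    return 1.0 - spatial_equilibrium(shares)


balanced = spatial_equilibrium([25, 25, 25, 25])  # close to 1: evenly spread
clustered = spatial_equilibrium([97, 1, 1, 1])    # much lower: one dominant area
```

Under this assumed definition the two indices always move in opposite directions, which is consistent with the inverse relationship described in the text.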

6. Conclusions

The global terrorism network is generally dense and contains many nodes and edges that are redundant and insignificant. The underlying structure of the real network is very likely hidden. Identifying the most influential nodes and edges would allow the real network’s structure to become apparent. This process lies in the concept of skeleton network extraction. The study is based on the belief that a skeleton network would be less complex and contain only the most relevant information from a small portion of important vertices. As such, a well-extracted network would greatly simplify the analysis of complex network properties and the inherent behavior of the underlying complex system.

Our proposed skeleton network extraction method has broad prospects beyond the traditional methods. The influence value of all nodes is evaluated by the TPKS algorithm to obtain the optimal skeleton entities with high significance results. We used the integrated process of the clustering algorithm over the similarity-based network to detect the group of nodes regarding their region in space and the event features. This is because nodes in the global terrorism network exhibit social relationships and are always associated with spatial units. Consequently, the skeleton network is simply reconstructed by a set of the high-influential nodes from each cluster and the simplified edges.

TPKS-skeleton allows one to measure how far the resulting node-equivalence, topological structure, and spatial information are from those of the original network. The extracted network covers the most important nodes of the original network and provides the greatest node correlations among the competitive methods. Moreover, the TPKS-skeleton displays high structural-equivalence, as defined by the QAP, even though it preserves only a few nodes and edges. The validation experiments on spatial relationships using Voronoi diagrams provide a detailed distribution of information and of the intensity of event features, which makes the results more understandable. Therefore, we are convinced that our skeleton network extraction method based on topology potential and similarity-based clustering is novel and unique. The TPKS-skeleton network seems to provide a reasonable abstraction of the global terrorism network that makes its structure easier to understand. This information could offer possible solutions for counterterrorism, such as monitoring suspects and preventing terrorists from jointly committing attacks. Furthermore, we anticipate that our research findings could be useful for further academic research in complex network domains.

Data Availability

The dataset is publicly available in the Global Terrorism Database (GTD), which is owned by the University of Maryland. The GTD is free for individual use and can be downloaded from https://www.start.umd.edu/gtd/access/. The source code used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by Science and Technology Innovation Research Project of the Ministry of Science and Technology of China (grant numbers ZLY201970 and ZLY201976-02), and National Natural Science Foundation of China (grant number 6147203).