Abstract

As the hybrid worm can propagate through both personal social interactions and wireless communications, it has been identified as one of the most severe threats to the mobile Internet. This problem is expected to worsen with the boom in social applications and mobile services. In this work, we study the propagation dynamics of hybrid worms and propose a systematic countermeasure. The system maintains a set of community structures that describe the high-speed infection zones of worms, and it contains worm propagation by distributing the worm signature to guard nodes selected from the periphery of each community. For nodes that are geographically close but located in different communities, we evaluate the communication security between them based on the observed infection history and limit communications between insecure ones to keep the worm from spreading across communities. We also design an efficient worm signature forwarding strategy that enables most nodes in the network to reach an immune state before being infected by the worm. Extensive real-trace-driven simulations verify the feasibility and effectiveness of the proposed methods.

1. Introduction

With the rapid deployment of mobile Internet technology, smartphone-based social services such as Facebook, WeChat, and LinkedIn have already reached billions of registered users, many of whom choose to incorporate those services into their work and family life. On the positive side, the mobile Internet provides a convenient platform for people to communicate with close friends and interact online. On the negative side, however, it is also a breeding ground for the mobile Internet worm [1].

The propagation of the worm in the mobile Internet mainly follows two dominant patterns [2]. First, the short-range worm infects all Bluetooth- or Wi-Fi-enabled devices within the infection radius, exhibiting a spatial propagation pattern similar to that of a contact-based disease [3]. Such infections rely on peer-to-peer communications between geographically adjacent sensors, which form geographic interaction networks (GINs). The GINs worm always exploits hardware vulnerabilities of mobile devices to crash them. Defending against this type of infection is challenging due to the lack of effective centralized regulation. For this reason, most existing methods use a distributed coping scheme that allows users to limit communications with vulnerable devices to isolate the proximity worm [4, 5]. Second, a long-range worm can replicate itself and infect all smartphones whose identifiers are stored in the infected smartphone’s contact list, a delocalized propagation mode based on social relations in social information networks (SINs). The SINs worm is similar to the one observed in the Multimedia Messaging Service (MMS); both exhibit a slow start followed by exponential propagation [6]. Recent works mostly use a partitioning strategy to isolate the SINs worm in several disjoint “islands” [7–9].

Recent research reveals that as mobile phone functions continue to increase, the worm no longer uses a single model to spread [2]. The hybrid worm uses short-range infections as well as long-range ones. The first variant that utilizes the hybrid propagation mechanism is Commwarrior, which spreads from one phone to another via the Bluetooth interface and MMS. Since the message usually comes from friends or family members, Commwarrior has a high probability of being activated. The most recent malware that exhibits the hybrid propagation feature is WannaCry, devastating ransomware that uses the campus network as the short-range propagation route. Although there has been no case of WannaCry infections on mobile phones yet, people have to be vigilant because of its extremely destructive power. Figure 1 gives a brief description of the propagation dynamics of the hybrid worm. The synergetic infection mechanism visibly increases the infection ability of the worm and brings a considerable challenge to the worm containment task.

To solve this problem, one possible method is to integrate the SINs and the GINs into a single network using a dimensionality reduction method [10]. However, these two networks evolve at quite different speeds, so the integration is usually not meaningful. In this paper, we adopt a divide-and-conquer strategy and propose a 2-dimensional circulation controller. We first model the propagation dynamics of the hybrid worm in the mobile Internet and analyze the vulnerable spot in the worm propagation chain. Next, two components are designed to reduce infection rates. The first is the SINs containment unit, which aims to stop the secondary forwarding caused by acquaintances, based on the fact that worm propagation on social networks mainly depends on the closeness of social relationships. The other is the GINs feedback unit. Some devices are in geographic proximity and thus give the worm the chance to propagate across communities through short-range communications. Based on the security history records, the GINs feedback unit restricts communication with insecure devices with a certain probability. The outputs of these two units serve as the input parameters of each other, forming a cyclic state.

The rest of the paper is organized as follows: Section 2 introduces related work. Section 3 discusses the propagation dynamics of the hybrid worm. Section 4 presents the containment scheme, and Section 5 describes the system in detail. The proposed methods are evaluated in Section 6. Finally, we draw conclusions and outline further work.

2. Related Work

Although the worm is well understood on the Internet [6, 11], the worm on the mobile Internet has received only limited attention. In simple terms, the existing methods fall into two categories. For short-range worm containment, Su et al. [12] showed that Bluetooth is an essential interface for worm propagation. Yan and Eidenbenz [13], Mickens and Noble [14], and Morris-King and Cam [15] confirmed this conclusion by analyzing the propagation dynamics of worms transferred via the Bluetooth interface. Zyba et al. [4] designed a distributed coping scheme to eliminate the adjacent worm by using the worm signature. However, the time complexity of the algorithm is too high for it to handle a large network. Yang et al. [16] proposed a sensor worm coping scheme based on graph coloring. The basic principle of this method is to increase the diversity of software versions in the network. Li et al. [5] proposed a method to evaluate node vulnerability and control the worm by restricting the communication between vulnerable nodes. Miklas et al. [17] exploited social relations to improve the security of the Bluetooth interface and reduced the propagation speed of the worm by refusing connection requests from strangers. Gao and Liu [18] focused on the impact of human behavior on worm propagation and proposed a two-layer network model to protect large-scale dynamic mobile networks. The short-range worm relies on direct connections between hardware interfaces and currently tends to attack wireless sensor networks [19–21] and vehicular networks [22–24].

For long-range worm containment, Fleizach et al. [25] verified the differences in propagation characteristics between the Internet worm and the mobile Internet worm and evaluated the propagation effect of the MMS worm on cellular networks. Meng et al. [7] investigated the credibility of communications in the Short Messaging Service (SMS) by analyzing the trajectory data in mobile networks. Bose et al. [26] used a quarantine method to limit the interaction between vulnerable nodes in MMS networks. Zhu et al. [8] argued that the core nodes in social networks should be immunized first, but this method ignores the transmission route of the worm via the Bluetooth interface, so the worm can still spread quickly. Moreover, the algorithm requires the number of clusters k, which cannot be known in advance for social networks. Zhao et al. [27] integrated the centralized and decentralized patch distribution strategies by constructing a new network layer model. Yang and Yang [28] proposed an evaluation framework for testing patch distribution efficiency. The key to long-range worm containment is to identify the area of high-speed infection. The latest studies tend to use community detection [9, 29, 30] and social influence analysis [31, 32] to solve this problem.

The above studies have made remarkable progress in the field of SINs and GINs worm containment, respectively. However, hybrid worm containment on the mobile Internet is still an open issue. The contribution of this study is that we formalized the propagation equation of the hybrid worm and proposed a worm containment scheme based on the mesoscopic analysis. Different from the flooding patching strategy, we preferentially distribute patches to the high-impact nodes in the network and establish a link between historical communication records and security predictions through Bayesian inference.

3. Propagation Dynamics of the Hybrid Worm

The susceptible-infected-recovered (SIR) model is used in epidemiological theory to measure the propagation dynamics of contagions within a population under contact infections [33]. Inspired by [2], we propose the hybrid-SIR model to depict the propagation characteristics of the hybrid worm by changing the propagation criteria of the SIR model.

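Let $S(t)$, $I(t)$, and $R(t)$ represent the number of susceptible, infected, and recovered nodes at time t, respectively, so that $I(t) + R(t)$ counts the nodes that have been infected, including those now immune. Let β, γ, and N, respectively, represent the infection rate, the recovery rate, and the total number of nodes; then the differential equations of the SIR model are given by

$$\frac{dS(t)}{dt} = -\frac{\beta S(t) I(t)}{N}, \qquad \frac{dI(t)}{dt} = \frac{\beta S(t) I(t)}{N} - \gamma I(t), \qquad \frac{dR(t)}{dt} = \gamma I(t). \tag{1}$$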
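
As a minimal sketch of the classic SIR dynamics that the hybrid-SIR model builds on, the following code integrates the standard equations with a forward-Euler step. The function name `simulate_sir`, the step size, and all parameter values are illustrative assumptions and are not taken from the paper's experiments.

```python
def simulate_sir(beta, gamma, n_total, i0, steps, dt=0.1):
    """Forward-Euler integration of the classic SIR equations.

    beta: infection rate, gamma: recovery rate, n_total: population size N,
    i0: initially infected nodes. All values used below are hypothetical.
    """
    s, i, r = float(n_total - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n_total * dt  # S -> I flow
        new_recoveries = gamma * i * dt               # I -> R flow
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: one seed in a population of 1000 nodes.
history = simulate_sir(beta=0.5, gamma=0.1, n_total=1000, i0=1, steps=1000)
s_end, i_end, r_end = history[-1]
print(f"susceptible={s_end:.1f} infected={i_end:.1f} recovered={r_end:.1f}")
```

The sum of the three compartments stays at N throughout, which is a quick sanity check before changing the propagation criteria.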

For our hybrid-SIR model, all of the interactions between smartphones derive from the SINs and the GINs. Let and , respectively, represent the number of infections via the SINs and the GINs at time t. calculates the total number of infected nodes at time t. denotes the total number of susceptible nodes at time t. Then, we have

When an infected smartphone intends to propagate the worm through the SINs, it behaves like a traditional SMS virus which sends messages to the one found in the local contact list (reflected in the degree of nodes). Let denote the probability of the worm being activated, denote the average degree of nodes in the SINs, the number of susceptible nodes neighboring a start point equals to , and the total number of susceptible nodes is thus given by

Based on equation (3), the equation that depicts the dynamics of infected nodes in the SINs with time is

where denotes the recovery rate by means of sending patches.

When an infected device tries to propagate the worm through the GINs, it first detects all adjacent neighbors (whose number depends on population density) within its propagation range R. We assume that the smartphones are distributed with density ρ; the average number of accessible smartphones then equals $\rho \pi R^2$.

At the new time step, only the infected smartphones that lie on the circumference of the infection ripple have the chance to exchange messages with susceptible smartphones and therefore the possibility of infecting them, while the smartphones located in the interior of the infection ripple do not contribute to further spatial infections (those nodes are either immune or still infected).

Let and represent the number of smartphones lies on and lies within the circumference of the ripple, respectively; the infected dynamics of the periphery nodes with the incremental infected radius is described as

Here, calculates the radius of the infection ripple at time . We have based on the fact that the number of nodes inside the infection ripple is the sum of neighbors of the center node. Thus, equation (5) can be simplified into

We assume that the spatial infection ripple of a start node is generated at time and achieves the current infection status after the duration of . Then, the incremental infection at time is

After simplification, we have

In equations (7) and (8), indicates that for each smartphone that lies on the circumference of the ripple (e.g., the node in Figure 2), approximately two-thirds of its neighbors outside the ripple are susceptible. denotes the probability of a GIN worm being activated in the susceptible device. is the proportionality constant [34]. The incremental infection of all infection ripples at time t is thus given by

It means that dominates the spatial infections of the worm. In other words, there are infected phones emerging at time s and each one approximately contributes incremental infection at time t.

We roughly verified the propagation performance of the worm via the SINs, the GINs, and the hybrid mode based on the proposed model. Two parameter settings are considered: one is , , and ; the other is , , and . Simulation results in Figure 3 indicate that with the increasing number of smartphone’s neighbors (), the hybrid model greatly enhances the power of worm infection and brings considerable challenges to the worm containment task.

4. Containment Scheme of the Hybrid Worm

In this section, we present the framework of the proposed system (shown in Figure 4) and demonstrate its basic idea. The system consists of five units. On the top, it starts with the Online Community Manager. In social networks, messages from close friends have a higher probability of being received and opened. Thus, the infection rate in equation (4) is essentially equal to one. We cannot expect the recovery rate to be increased, as the mobile operator necessarily requires time to detect the worm and develop patches. Also, directly modifying is unrealistic. However, reducing the average degree of nodes amounts to seeking a sparse representation of the network. The Online Community Manager maintains a set of community structures [35], in which nodes are closer to each other than to any node outside the community. In this light, if the nodes are mapped to the communities, the propagation speed of the worm can be significantly reduced at the mesoscopic level during the exponential growth phase.

The node monitor is used to monitor the infection status of nodes and to generate guard nodes that lie on the periphery of each community. Once the worm has been found on the network, the system starts generating the worm signature and delivers it to the guard nodes through the data transmitter. If a guard node receives the worm signature before it is infected, it becomes immune to the worm. As a result, worm propagation can be blocked, since any malicious message has to go through the guard nodes to reach the adjacent communities.

All the information about the nodes is gathered in the information collector, which provides the evidence for the security evaluator. The purpose of security evaluation is to prevent the worm from spreading across communities through location-based infections. For users with high activity and poor security, we randomly reject communication with them until their security consciousness improves. This scheme will reduce in equation (8) to some extent, while $\rho$ in the equation is intractable since it describes the density of the population in real space. The output of the security evaluator also guides the community detection process, ensuring security within the community. Section 5 describes the system in more detail.

5. System Description

5.1. Online Community Manager

Community detection in this component consists of three steps: security evaluation, label propagation [36], and overlapping community combination. Note that we do not intend to detect “high-quality” partitions (i.e., with higher modularity [37]), but to detect a reasonable structure with higher density and internal forwarding efficiency. We use the label propagation (LP) algorithm to simulate the process of nodes receiving neighbor messages. In the algorithm, the label delivered by a sender is more likely to be accepted if the sender satisfies the following conditions: (1) it is sufficiently secure; (2) it has coercive power.

For condition 1, we first analyze the communication security of nodes on the SINs (marked as ) and estimate the probability of secure communication θ with the Bayesian formula as follows:

Let a denote the number of secure communications between node i and its neighbors in n communications (marked as normal in Figure 4), so that $n - a$ is the number of insecure communications in the same situation (marked as infected in Figure 4). Before inference, we assume that the probability of secure communication θ follows the uniform distribution $U(0, 1)$, which means $p(\theta) = 1$, $\theta \in [0, 1]$. Estimating communication security is a binomial experiment $B(n, \theta)$, so the likelihood of a out of n communications being normal is

Substituting (equation (11)) into (equation (10)), we have

Given that for any real number α and β, the beta function satisfies

Let and , respectively, represent the number of normal and infected communications, then we have

Equation (14) indicates that the communication security of node i follows beta distribution with parameters and , and its expectation is
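
The Bayesian argument in equations (10)–(15) follows the standard beta-binomial pattern. The block below is a sketch of that derivation written only in terms of a and n as defined in the text; the paper's own notation for the normal and infected counts may differ, so the symbols here should be read as assumptions.

```latex
\begin{align}
p(\theta \mid a) &= \frac{p(a \mid \theta)\, p(\theta)}{\int_0^1 p(a \mid \theta)\, p(\theta)\, d\theta}
  && \text{(Bayes' formula)} \\
p(a \mid \theta) &= \binom{n}{a}\, \theta^{a} (1-\theta)^{n-a}, \qquad p(\theta) = 1 \text{ on } [0,1]
  && \text{(binomial likelihood, uniform prior)} \\
p(\theta \mid a) &= \frac{\theta^{a} (1-\theta)^{n-a}}{\int_0^1 \theta^{a} (1-\theta)^{n-a}\, d\theta}
  = \frac{\theta^{a} (1-\theta)^{n-a}}{B(a+1,\; n-a+1)}
  && \text{(beta posterior)} \\
\mathbb{E}[\theta \mid a] &= \frac{a+1}{n+2}
  && \text{(posterior mean)}
\end{align}
```

Here $B(\alpha, \beta) = \int_0^1 x^{\alpha-1} (1-x)^{\beta-1}\, dx$ is the beta function. With no observations (a = n = 0) the estimate is 1/2, and for large n it approaches the observed fraction a/n.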

According to the law of large numbers, we have . It seems that frequent sending of secure messages is an indication of reliable communication. This conclusion, however, does not consider the security of the adjacent environment. We define the user’s security evaluation function as

Here, denotes the degree of node i. In equation (16), the multiplier represents the security of the adjacent environment. It can be found that when the adjacent environment is insecure and the user can still send secure messages with higher probability, the user is considered to be relatively reliable.

Similarly, we can define the likelihood of node i sending insecure messages as . For condition 2, if the malicious message sent by a user does not significantly change the security of the adjacent environment, the coercive power (cp) of the user is weak. We define this condition in equation (17), which measures the probability of a security-conscious user receiving a malicious message:

Based on equations (16) and (17), the transition probability of messages (whether secure or not) sent by node i on the SINs is defined as . Combined with the security evaluation result (equation (27) submitted by the GINs feedback unit, the transition probability of the community label held by node i is defined as

We modify the propagation and acceptance criteria of the LP process and propose a security-based label propagation (SLP) algorithm based on equation (18); the pseudocode is described in Algorithm 1.

Require:
Contact history from information collector;
Ensure:
Community structure ;
Step 1. Initialize the community label. Each node i in graph G contains a feature vector , where l represents the community label, b represents the belonging degree, L represents the transition probability of l, t represents the number of iterations. For example, the feature pair of i is means the initial label of node i is itself and the belonging degree equals to one;
Step 2. Set ;
Step 3. Arrange each in a random order and assign them to V;
Step 4. For each , , calculate the belonging degree of the adjacent community label as ;
Step 5. For each , if exists s.t. , then replace with , and remain unchanged. Else stop and jump to Final;
Step 6. Set and go to Step 3;
Final.
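
To make the flavor of a security-weighted label propagation concrete, the sketch below implements a simplified, non-overlapping variant: every node adopts the label with the largest total weight among its neighbors' labels from the previous iteration, where each neighbor's vote is weighted by a per-node transition probability. It is not Algorithm 1 itself. The function name `secure_label_propagation`, the `trust` weights, the tie-breaking rule, and the omission of belonging degrees and overlapping labels are all simplifying assumptions.

```python
from collections import defaultdict


def secure_label_propagation(adj, trust, max_iter=100):
    """Synchronous label propagation with per-sender vote weights.

    adj:   dict mapping node -> set of neighbor nodes (undirected graph)
    trust: dict mapping node -> weight of the label that node sends
           (hypothetical stand-in for the transition probability)
    """
    labels = {v: v for v in adj}  # each node starts with its own label
    for _ in range(max_iter):
        new_labels = {}
        for v in adj:
            votes = defaultdict(float)
            for u in adj[v]:
                votes[labels[u]] += trust[u]  # labels from previous iteration
            if not votes:  # isolated node keeps its label
                new_labels[v] = labels[v]
                continue
            best = max(votes.values())
            candidates = sorted(l for l, w in votes.items() if w == best)
            # Keep the current label on ties, otherwise take the smallest label.
            new_labels[v] = labels[v] if labels[v] in candidates else candidates[0]
        if new_labels == labels:  # converged
            break
        labels = new_labels
    communities = defaultdict(set)
    for v, l in labels.items():
        communities[l].add(v)
    return sorted(sorted(c) for c in communities.values())


# Two triangles joined by the single edge 2-3, with uniform trust.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
trust = {v: 1.0 for v in adj}
communities = secure_label_propagation(adj, trust)
print(communities)
```

Lowering the `trust` value of a node reduces the chance that its label spreads, which is the intuition behind accepting labels preferentially from secure senders.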

The output of Algorithm 1 is a set of overlapping communities, as each node is allowed to have multiple community labels in Step 5. However, this increases the bandwidth cost of the network: the more communities there are, the more guard nodes need to be monitored. Therefore, we execute a merging program to reduce structural redundancy. We define the structure stability of community C as

Equation (19) measures the tradeoff between structure density and security. What we need to evaluate is whether the overlapping part provides significant stability for the entire community. In our method, any two communities can be merged if . Note that the merging program also provides a way to detect dynamic communities. Given a network G, an initial community structure C, and an incremental update , we have

Equation (20) indicates that when the network changes, we only need to execute the SLP algorithm on the newly added nodes in the local environment and then run the merging program, which avoids recomputing the existing results. Figure 5 shows a three-step example of the SLP process. Each node chooses to join a security substructure based on the observed contact history. In the algorithm, the label in iteration t is always based on its neighboring labels in iteration $t - 1$ to avoid the oscillations of labels [36].

5.2. Data Transmitter

The function of the data transmitter is twofold: (1) construct the forwarding strategy of worm signature and (2) select the guard nodes (the initial delivery node set). When the worm is found in the network, the data transmitter will send the worm signature to the network operations center (NOC) and then distribute it to each network node. Flooding is a possible forwarding strategy. However, this will bring enormous bandwidth costs and influence normal network communications. Besides, the flooding method considers that all the network nodes have the same delivery priority, and the forwarding capability is weak. We propose a new function to calculate the forwarding capability of nodes, which is described as follows:

Here, denotes the hops between node i and node j. Equation (21) shows that the delivery priority of nodes depends on the security and coercive power of its neighbors within 2-hops. The start points of the delivery process include the overlapping nodes and the endpoints of links between communities. This setting will ensure community quarantine [38] as early as possible and prevent worms from spreading across communities. Algorithm 2 describes the pseudocode of the data transmitter.

Require:
Graph ; community structure ;
Ensure:
Guard nodes ; signature propagation;
Step 1. Initialize . For each do , ;
Step 2. For each , initialize the worm signature from the NOC in i’s buffer as ; set initial token τ in i’s buffer;
Step 3. For each , , if exists τ in i’s buffer and j’s buffer is null then duplicate and deliver to node j; if exists j s.t. then duplicate and deliver token τ to node j;
Step 4. If all nodes have received the worm signature, then stop and jump to Final. Else go to Step 3;
Final.
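
The text states that the start points of the delivery process are the overlapping nodes and the endpoints of links between communities. A minimal sketch of that selection rule follows; the function name `select_guard_nodes` and the input representation (adjacency sets plus a list of member sets) are assumptions made for illustration, and the forwarding-capability ranking of equation (21) is not modeled.

```python
def select_guard_nodes(adj, communities):
    """Return overlapping nodes plus endpoints of inter-community links.

    adj:         dict mapping node -> set of neighbor nodes
    communities: list of sets of nodes (communities may overlap)
    """
    membership = {v: set() for v in adj}
    for idx, members in enumerate(communities):
        for v in members:
            membership[v].add(idx)

    # Nodes that belong to more than one community.
    guards = {v for v, m in membership.items() if len(m) > 1}

    # Endpoints of edges whose two ends share no community.
    for v, nbrs in adj.items():
        for u in nbrs:
            if not membership[v] & membership[u]:
                guards.update((v, u))
    return guards


# Two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
guards_disjoint = select_guard_nodes(adj, [{0, 1, 2}, {3, 4, 5}])

# Two triangles sharing node 2.
adj_overlap = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
guards_overlap = select_guard_nodes(adj_overlap, [{0, 1, 2}, {2, 3, 4}])
print(guards_disjoint, guards_overlap)
```

In both examples every path between the two communities passes through a selected node, which is the property the quarantine relies on.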

5.3. Security Evaluator

In the GINs feedback unit, the security evaluator is an essential component designed to perform the community quarantine procedure and provide security evaluation results for short-range data transmission. Unlike in social applications, there is not enough evidence to use the Bayesian formula to deduce the communication security of users, since short-range interfaces are not used frequently in practice. However, this does not prevent the use of Bayesian reasoning in this component.

Let and , respectively, represent the number of normal and infected communications that node i receives from its neighbor j in n rounds. Based on the observed security history, node i will calculate the communication security between neighbor j as

When there is no supervision record, the initial value of both and equals to one. Equation (22) is unlikely to provide a reliable security calculation directly when the sample is insufficient. To solve this problem, we design an uncertainty computing function , which is defined as

In equation (23), when or dominates, the value of declines, which indicates that the security of the next communication will be more predictable. Based on equations (22) and (23), the revised communication security can be defined as

Another parameter that needs to be considered in the security evaluator is the interaction frequency, which is measured in this paper as the regularity of interaction. We present the concept of circulation connections. As shown in the dashed box in Figure 6, the communication interval can be regarded as continuous communication. Let and , respectively, represent the number of continuous and circulation connections in a sliding window of size ws, then the interaction frequency is thus given by

In the four cases shown in Figure 6, equals to 0.9, 0.8, 0.2, and 0, respectively. The first two cases show strong regularity, which probably comes from close friends. While the latter two cases with a low could be the contact history from acquaintances or strangers. From a statistical point of view, a low communication frequency will not provide sufficient evidence to support the communication security evaluation. For example, when and , should be punished appropriately; when and , should be gained slightly. We define the communication security of arc on the GINs as

Here, is used to decide whether the security should be gained or punished and the regulation threshold is approximately equal to 0.5671. When , we have . In this case, we use to give a reference value to the GINs security evaluation. Based on equation (26), the security evaluation result of node i in the GINs can be defined as

Algorithm 3 describes our idea on community quarantine. For node , , if i has not received any worm signature, i will reject j’s messages with a certain probability by using equation (26). Considering channel utilization, we first perform the algorithm on users who are located in different communities but geographically adjacent (i.e., within 30 m [39]). Next, these users will broadcast quarantine notification to other community members to achieve global synchronization.

Require:
; ; ;
Ensure:
Community quarantine;
Step 1. ;
Step 2. For each , , if , i will reject the messages from j (except ) with probability ;
Step 3. For each , i broadcasts quarantine notification to node , j will reject messages from other communities with probability ;
Step 4. For each , if , then stop and jump to Final. Else go to Step 3;
Final.

Here, . We use counter to record the number of consecutive messages on arc and increase the rejection probability with grows.

5.4. Complexity Analysis

The computational complexity of the proposed system mainly consists of three parts: (1) community detection and combination in the Online Community Manager; (2) forwarding capability calculation in Algorithm 2; and (3) security evaluation in Algorithm 3.

For the first part, the time complexity of Algorithm 1 is [40], m is the number of edges, n is the number of nodes, and is the average number of communities per node. The time complexity of the merging process is , where represents the average degree of communities and is the number of communities.

For each community , we have

where represents the average number of nodes in the community. The SLP algorithm guarantees that each node is located in at least one community; therefore

It is quite clear that . Let represent the average degree of nodes, then we have

Based on equations (27)–(30), we have . Since , the time complexity of the first part is . For the adaptive procedure (equation (20)), the time complexity of the SLP process and merging process is , where represents the incremental update of the network.

For part 2 and part 3, the time complexity is , where is the size of the sliding window. We use the KMP algorithm to match circulation connections in the window, and the time complexity is . For the adaptive procedure, the time complexity of part 2 and part 3 is .
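
For reference, the following is a minimal sketch of Knuth-Morris-Pratt matching over a contact window, which runs in time linear in the window size plus the pattern length. Encoding the window as a list of 0/1 slots and using `[1, 0, 1]` as the pattern are illustrative assumptions; the paper does not specify how circulation connections are encoded.

```python
def build_failure(pattern):
    """Failure table: length of the longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(window, pattern):
    """Return the start index of every (possibly overlapping) match."""
    if not pattern:
        return []
    fail = build_failure(pattern)
    matches, k = [], 0
    for i, x in enumerate(window):
        while k > 0 and x != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading the window
        if x == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # allow overlapping matches
    return matches


# Hypothetical window: 1 = contact in this slot, 0 = no contact.
window = [1, 0, 1, 0, 1, 1, 0, 1]
matches = kmp_find_all(window, [1, 0, 1])
print(matches)
```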

In , all the components are executed sequentially; therefore, the time complexity of the initial phase is . For the adaptive procedure, the time complexity is . Note that in the above analysis, we assume that the system confronts extreme conditions (e.g., we assume that the newly added nodes link to all the existing nodes). In practice, the execution time of will be shorter.

6. Experiments and Analyses

In this section, we present and discuss experiments with the proposed system on two different kinds of real-world traces: the MIT Reality trace (collected from students and faculty in the MIT Media Laboratory) (http://crawdad.org/mit/reality) and the Haggle Project trace (collected over four days during INFOCOM 2006 in Barcelona) (http://crawdad.org/cambridge/haggle). In both datasets, Bluetooth contacts, phone call records, and users’ locations were provided to construct the GINs and the SINs layers. Each Bluetooth contact includes the start time, end time, and the IDs of nodes. Each phone call record includes call logs, cell tower IDs, application usage, and phone status (such as charging and idle).

For each round of the simulation, a portion (default 35%) of the dataset was used as the contact history (including normal and infected communications). At the very beginning, we randomly chose 0.05% of the nodes as the seed set of worm sources to initiate the infection. The propagation of the malicious message follows the hybrid propagation model proposed in Section 3. In the model, the recovery rate is and the probability of a node sending messages to acquaintances (its top 10 neighbors) in the SINs and the GINs is set to 0.2 and 0.05, respectively and to strangers is set to 0.05 and 0.01, respectively. The activation probability of the worm is set to 0.95 (from acquaintances) and 0.05 (from strangers) without running any coping scheme. To avoid flooding broadcast, each infected node attempts to attack its neighbor nodes only once.

6.1. The Analysis on Community Quality

In this section, we test the quality of the community structure generated by the SLP algorithm (Algorithm 1) in terms of the average community size (marked as ACS), the number of detected communities (marked as #Communities), the structure stability of the community structure (equation (19), marked as SS), the EQ function, and the execution time. The EQ function proposed by Shen et al. [41] is widely used in evaluating overlapping communities and is described in the following equation:

$$EQ = \frac{1}{2m} \sum_{i} \sum_{v \in C_i,\, w \in C_i} \frac{1}{O_v O_w} \left[ A_{vw} - \frac{k_v k_w}{2m} \right],$$

where $k_v$ is the degree of node $v$, $2m$ is the total degree of the network nodes, $A_{vw}$ is the element of the adjacency matrix of the network, $O_v$ is the number of communities to which node $v$ belongs, and $C_i$ is the i-th community in the network. The comparison algorithms include COPRA [40] (based on label propagation), EAGLE [41] (a high-performance overlapping detection algorithm), and A3CS [42] (which detects dynamic community structure). We run these algorithms on Haggle and MIT Reality, respectively, and the experimental results are reported in Tables 1 and 2.
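
The EQ measure of Shen et al. is commonly written as a sum over communities of the modularity term for each ordered node pair, divided by the number of communities each of the two nodes belongs to. The sketch below computes it under that reading for an unweighted, undirected graph; the function name `extended_modularity` and the input representation are assumptions for illustration.

```python
def extended_modularity(adj, communities):
    """EQ for overlapping communities on an unweighted, undirected graph.

    adj:         dict mapping node -> set of neighbor nodes
    communities: list of sets of nodes (communities may overlap)
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    two_m = sum(degree.values())  # total degree of the network
    if two_m == 0:
        return 0.0

    # Number of communities each node belongs to.
    overlap = {v: 0 for v in adj}
    for members in communities:
        for v in members:
            overlap[v] += 1

    total = 0.0
    for members in communities:
        for v in members:
            for w in members:
                a_vw = 1.0 if w in adj[v] else 0.0
                expected = degree[v] * degree[w] / two_m
                total += (a_vw - expected) / (overlap[v] * overlap[w])
    return total / two_m


# Two triangles joined by the bridge 2-3, split into the two triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
eq_split = extended_modularity(adj, [{0, 1, 2}, {3, 4, 5}])
eq_whole = extended_modularity(adj, [set(adj)])
print(eq_split, eq_whole)
```

For disjoint communities the value reduces to ordinary modularity, and placing all nodes in a single community gives zero, which makes both cases convenient checks.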

In both datasets, the SLP algorithm produces more communities (10 and 16, respectively) with smaller sizes (8 and 6, respectively) than the others, which means that the detected communities are more granular. When this feature is applied to Algorithm 2, the system has more opportunities to find high-impact nodes in the local environment, which improves both worm containment performance and patch distribution efficiency. The EQ value of the SLP algorithm (0.439 and 0.441, respectively) is slightly lower than that of COPRA and EAGLE. As discussed in Section 5, the SLP algorithm does not aim to find high-“modularity” partitions [37], but to find communities with higher internal security awareness. Accordingly, the structure stability (SS score) of SLP (0.486 and 0.533, respectively) is higher than that of the other three algorithms. In terms of execution time, SLP (1.132 and 1.296, respectively) is slightly slower than COPRA and faster than EAGLE and A3CS, which suits the online community monitoring task.

6.2. The Performance of SINs Containment Unit

The test in this section mainly considers three scenarios: (1) turn on and off the GINs feedback unit to demonstrate the performance of the SINs containment unit; (2) replace the SLP algorithm with COPRA, EAGLE, and A3CS to verify the performance of the Online Community Manager in the SINs containment unit; and (3) compare with three community-based coping strategies, including Member [30], I2C [29], and TC-based [9]. We primarily focus on the infected ratio which is reflected in the proportion of nodes infected by the worm.

In Figure 7, the curve marked as “no coping” depicts the infected ratio over time without any coping scheme and serves as the baseline for comparison. When the GINs feedback unit is turned off, the performance of the proposed scheme is slightly better than that of Member and I2C and better than that of COPRA, EAGLE, and A3CS, owing to the higher security within the community structure. The proposed scheme performs significantly better when the GINs feedback unit is turned on, primarily because the unit decreases worm propagation across communities between users whose locations are adjacent. When the recovery rate , the proposed scheme can constrain the further spread of worms at around 30 time units and hold the final infected ratio at 22.5% (Haggle) and 12.2% (MIT Reality) at 70 time units. We find that, at the mesoscopic level, the smaller the group size, the higher the internal security of the group and the better the worm containment performance. Besides, the community quarantine strategy significantly improves worm containment, by an amount approximately equal to the gap between GINs ON and GINs OFF.

6.3. The Performance of GINs Feedback Unit

In this section, the SINs containment unit is turned on and off to demonstrate the performance of the security evaluator in the GINs feedback unit. The comparison methods include (1) a distributed local detection-based scheme [4] (marked as distribute), (2) a proximity signature forwarding-based scheme [4] (marked as proximity), (3) a Bluetooth-based malware coping scheme [21] (marked as hierarchical), (4) a pruning-based proximity malware coping scheme [15] (marked as K-distance), (5) a community-based proximity malware coping scheme [5] (marked as centralize), (6) a social network-based patching scheme [8] (marked as socialize), and (7) TC-based (which performed well in the previous test). The first four methods primarily focus on the GINs worm containment task, while the latter three aim to contain the propagation of the worm on the SINs. For ease of comparison, we keep the relevant parameters the same as in the original papers.

Figure 8 gives the experiment results. In contrast to Figure 7, the performance of SINs OFF is weaker than that of GINs OFF, which indicates that the high-speed infection of the worm relies on the community structure. The comparison with the other seven methods confirms this conclusion: the schemes using social relationships (centralize, socialize, and TC-based) outperform the schemes using geographic location (distribute, proximity, K-distance, and hierarchical). We also note that patching is an effective way to control worms (proximity, centralize, and socialize), and its performance is much higher than that of the structure-based control methods (distribute, K-distance, and hierarchical). On the whole, SINs ON performs better than the other methods, with an average reduction of 14.4% (Haggle) and 5.9% (MIT Reality) in the infected ratio. Another observation is that, although the GINs worm propagates slowly, ignoring the GINs feedback unit significantly reduces the containment performance, by an amount approximately equal to the gap between TC-based and SINs ON.

6.4. The Comparison on Worm Containment Capability

This section contains three series of experiments. First, we test the percentage of patched nodes, which is defined as the average number of signatures forwarded in the network. Comparing Figures 8 and 9, we can see that the proposed scheme needs to deliver patches to only 20.9% (Haggle) and 16.7% (MIT Reality) of the network nodes to hold the infected ratio at 22.5% (Haggle) and 12.2% (MIT Reality). With the same patching ratio, proximity, centralize, socialize, and TC-based can only hold the infected ratio at around 60.4%, 41.3%, 40.5%, and 35.4% (Haggle) and 31.7%, 24.7%, 21.8%, and 19.2% (MIT Reality), respectively. Distribute does not need to deliver patches, but its nodes thereby lose the chance to become immune to worms. As a result, the performance of distribute is only slightly better than “no coping” in Figure 8.

In the experiment shown in Figure 9, the start points of the worm signature are automatically generated by Algorithm 2 and grow gradually with time. Next, we manually set the initial immunization ratio (percentage of patched nodes) from 0% to 5% (giving priority to the guard nodes) and measure the infected ratio at 70 time units. We use a patching threshold μ (the initial infected ratio) to decide when to initiate the patching process. The experiments are conducted with μ = 5%, 10%, 15%, and 20% (beyond 20%, the worm becomes uncontrollable), and the results are shown in Figures 10 and 11.
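The role of the patching threshold μ can be illustrated with a deliberately simplified, deterministic toy model, which is not the paper's simulator. It assumes that every infected node infects all susceptible neighbours in each time step, that patching starts from a given set of guard nodes once the infected ratio reaches μ, that immune nodes forward the signature to all their neighbours, and that the signature wins when it reaches a node in the same step as the worm. The function name `simulate_threshold_patching` and its parameters are hypothetical.

```python
def simulate_threshold_patching(adj, seeds, guards, mu, steps):
    """Return the final infected ratio of a toy worm-versus-signature race.

    adj: dict mapping each node to an iterable of its neighbours.
    seeds: initially infected nodes.
    guards: nodes patched first once the threshold mu is reached.
    mu: patching threshold, i.e. infected ratio that triggers patching.
    steps: number of synchronous time steps to simulate.
    """
    n = len(adj)
    infected = set(seeds)
    immune = set()
    patching = False

    for _ in range(steps):
        # Start patching once the infected ratio reaches the threshold.
        if not patching and len(infected) / n >= mu:
            patching = True
            immune |= {g for g in guards if g not in infected}

        # Signature forwarding: immune nodes patch susceptible neighbours.
        new_immune = set()
        if patching:
            new_immune = {
                w for v in immune for w in adj[v]
                if w not in infected and w not in immune
            }

        # Worm spreading: ties go to the signature (assumption).
        new_infected = {
            w for v in infected for w in adj[v]
            if w not in infected and w not in immune and w not in new_immune
        }

        immune |= new_immune
        infected |= new_infected

    return len(infected) / n
```

On a 10-node path with the worm starting at one end and a single guard node in the middle, raising μ from 0.1 to 0.3 lets the worm claim one more node before the signature blocks it, which mirrors the qualitative observation that later detection makes containment harder.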

In both cases, the proposed scheme performs better than the other containment schemes under the same patching cost. For example, the proposed scheme outperforms proximity, centralize, socialize, and TC-based by 20.1%, 13.2%, 10.2%, and 7.7%, respectively, on the Haggle network when μ = 5%. This advantage is even more pronounced when μ = 20%, with the gap becoming 35.7%, 16.1%, 12.1%, and 10.4%, respectively. We also draw four observations: (1) the later the worm is discovered (i.e., the higher the threshold μ is), the more difficult it is to control; (2) the sparser the network, the easier the worm is to control; (3) increasing the delivery priority of high-impact nodes can significantly improve worm containment performance; and (4) once a critical point is reached (such as 2.5% and 1.5% in Figures 10 and 11 when μ = 5%), further increasing the number of patched nodes does not effectively reduce the infected ratio.

The third experiment in this part tracks the infected ratio over time when , and . The comparison methods include (1) random patching (repeated 100 times for consistency), (2) centralize (short-range containment), (3) socialize (long-range containment), and (4) TC-based (community-based coping scheme). To make the results more explicit, we set the initial percentage of patched nodes to 5% (which performed well in the previous simulation), and the results are shown in Figures 12 and 13.

In this experiment, the random patching strategy is not sufficient to control the spread of worms. Even for the MIT Reality dataset with a sparse topology, the infected ratio eventually stabilizes at 79.1% (around 70 time units). For the Haggle dataset, the value rises to 89.1% (around 80 time units). As expected, the later the worm is discovered, the higher the infected ratio. In essence, the worm signature is “racing” against the worm: the faster the signature “runs”, the better the containment performance. On both datasets, when , centralize, socialize, TC-based, and the proposed scheme can hold the infected ratio below 50% (around 40 time units). The reason is that the active containment scheme delivers the worm signatures in parallel, while the worm can only move around randomly. When , only the proposed scheme can effectively control the further spread of the worm, and the network reaches a stable state at around 50 time units.

6.5. The Analysis on Rejection and Immunization Number

Finally, we briefly analyze the immunization and rejection strategies of the proposed scheme. We focus on the percentage of rejections and immunizations with initial infected ratio , and during the iteration. The results are shown in Figures 14 and 15. On average, the number of immunizations is higher than the number of rejections, suggesting that the immunization strategy dominates the worm containment process. This effect is more pronounced when the link density is higher (i.e., in the Haggle network). We also notice that, after the numbers of immunizations and rejections decay from their peak, the infected ratio still grows at a high rate (compare Figures 12 and 13), which is related to the two-stage propagation pattern of the worm. The first stage is the “strong-link infection” of the top ten mutual friends, which makes the number of attacks reach its peak quickly. The second stage is the “weak-link infection” of friends outside the top ten. In this mode, the probability of a node being infected is proportional to the degree of the node. Once a high-degree node is infected, it triggers a broader range of secondary infections, which continues to increase the infected ratio. From this perspective, prioritizing the immunization of high-impact nodes is practical and offers a performance advantage. Besides, the immunization strategy does not affect the normal communication of the network (e.g., the peak is 21.3% and 17.1% when in Figure 14), which preserves the channel utilization of the network to a certain extent.
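The degree-proportional rule of the “weak-link infection” stage can be sketched as follows. This is only an illustration under the assumption that the infection probability of a node is its degree divided by the total degree of the network; the paper's exact normalization is not shown here, and the function names are hypothetical.

```python
import random


def degree_proportional_probs(adj):
    """Map each node to degree / total degree (assumed normalization)."""
    total = sum(len(nbrs) for nbrs in adj.values())
    if total == 0:
        raise ValueError("graph has no edges")
    return {v: len(nbrs) / total for v, nbrs in adj.items()}


def pick_weak_link_target(adj, rng):
    """Draw one node with probability proportional to its degree."""
    probs = degree_proportional_probs(adj)
    nodes = list(probs)
    weights = [probs[v] for v in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


# Example: a star graph whose hub (node 0) has four neighbours.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
star_probs = degree_proportional_probs(star)
star_target = pick_weak_link_target(star, random.Random(0))
```

In the star example the hub carries half of the total degree, so it is drawn as often as all four leaves combined, which is why immunizing high-degree nodes first removes a disproportionate share of the secondary infections.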

7. Conclusion

This paper models the spreading dynamics of the hybrid worm and proposes several designs for worm containment on the mobile Internet. First, we identify the community structure of the network and control the long-range infection by immunizing the guard nodes that lie on the periphery of each community. Second, we present a worm signature distribution scheme that quickly delivers each signature to all nodes to prevent the worm from spreading to a larger population. Third, we design a security evaluation method to help users make favorable decisions when exchanging data over short-range transmission interfaces.

Experimental results show that the key to SINs worm containment is to accurately identify high-impact nodes in the network. Besides, although the GINs worm spreads slowly, it can significantly accelerate the propagation of hybrid infections. The proposed method achieves high performance and efficiency in location-based social networks. Subsequent research will focus on the multilayer network worm containment task to achieve an accurate perception of the real world.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request. The data can also be downloaded from the URLs given in Section 6.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was sponsored by the National Natural Science Foundation of China (61402126 and 61602133), Nature Science Foundation of Heilongjiang Province of China (F2016024), Heilongjiang Postdoctoral Science Foundation (LBH-Z15095), University Nursing Program for Young Scholars with Creative Talents in Heilongjiang Province (UNPYSCT-2017094), Scientific Research Foundation for the Overseas Returning Person of Heilongjiang Province of China (LC2018030), and National Training Programs of Innovation and Entrepreneurship for Undergraduates (201810214020).