Abstract

Epidemic models trade the modeling accuracy for complexity reduction. This paper proposes to group vertices in directed graphs based on connectivity and carries out epidemic spread analysis on the group basis, thereby substantially reducing the modeling complexity while preserving the modeling accuracy. A group-based continuous-time Markov SIS model is developed. The adjacency matrix of the network is also collapsed according to the grouping, to evaluate the Jacobian matrix of the group-based continuous-time Markov model. By adopting the mean-field approximation on the groups of nodes and links, the model complexity is significantly reduced as compared with previous topological epidemic models. An epidemic threshold is deduced based on the spectral radius of the collapsed adjacency matrix. The epidemic threshold is proved to be dependent on network structure and interdependent of the network scale. Simulation results validate the analytical epidemic threshold and confirm the asymptotical accuracy of the proposed epidemic model.

1. Introduction

Epidemic models have been widely used to analyze sophisticated interactions in networks, e.g., virus attack and propagation in computer networks [1, 2], rumor spreading in social networks [3], and cascading failures [4]. As a basic epidemic model, the susceptible-infected-susceptible (SIS) model defines two node states (susceptible and infected) and captures two state transition processes, namely, the infection from infected nodes to susceptible nodes and the self-healing of infected nodes. Earlier epidemic models omit topologies by assuming a network topology of a complete graph [5]. Network topologies, playing a key role in cyber security [6], are later considered and proved to have a strong impact on the epidemic propagation process [7].

Epidemic models trade the modeling accuracy for complexity reduction. Markov-chain based epidemic models are able to precisely analyze the epidemic propagation process but require Markov states to capture the S/I states of nodes and therefore can hardly be applied to large-scale networks [7, 8]. A number of topological epidemic models [812] decompose the -state Markov process into small Markov processes. This is achieved by using the expected infection probability of every node instead of the actual node state. A significant result from the models is that the epidemic threshold, under which the epidemic will eventually die out, is given by , where is the largest eigenvalue of the adjacency matrix of the network topology [9].

Network features, e.g., the degree distribution of scale-free networks [13], can be employed to simplify epidemic models by adopting the degree-based mean-field approach [14]. Specifically, nodes with the same degree are assumed to be infected with the same probabilities. An interesting result of the epidemic in scale-free networks is the absence of an epidemic threshold [15]. The degree-based mean-field approach has extended the SIS process, e.g., the epidemic propagation with incubation and the epidemic with a recovery state [16, 17]. However, the degree-based epidemic model cannot capture epidemic propagations in specific networks.

This paper presents a group-based continuous-time Markov model to quantitatively analyze the SIS process in large-scale directed networks. We start with the network modeling, where nodes are categorized into groups according to their connectivity. A collapsed adjacency matrix is proposed to describe the network topology. Based on the node groups, a continuous-time Markov model is proposed to capture the SIS-type propagation, where the state of a group is estimated by taking the mean-field approximation. Focusing on the problem of epidemic threshold, the proposed nonlinear model is linearized via omitting high-order terms around the disease-free point. The epidemic threshold is derived by performing matrix analysis on the Jacobian matrix of the linearized model and then validated by simulations. The key contributions of this paper can be summarized as follows:(i)We propose a new modeling method, which groups nodes with the same connectivity in directed networks and models the epidemic propagation of the groups by using continuous-time Markov SIS models.(ii)By taking the mean-field approximation, the proposed SIS model is asymptotically accurate with the decrease of effective spreading rate and/or the increase of node groups.(iii)Linearization and stability analysis are carried out on the proposed SIS model to deduce the epidemic threshold, under which the epidemic eventually becomes extinct.(iv)The epidemic threshold is proved to be dependent on network structure and interdependent of the network scale.

Comprehensive simulations confirm the validity of the proposed mean-field epidemic model and the deduced epidemic threshold in large-scale networks.

The rest of this paper is organized as follows. In Section 2, related works are reviewed. In Section 3, the directed network model is presented, followed by the proposed mean-field SIS model in Section 4. In Section 5, numerical and simulation results are provided, followed by conclusions in Section 6.

The SIS model was firstly developed for biological infectious diseases, which defines two states for a node, i.e., susceptible or infected [5]. A susceptible node can be infected with probability by an infected neighbor. An infected node can be cured with probability . Recently, various states were introduced into the SIS model. Kephart et al. [18] and Vojnovic et al. [19] proposed a new “warning” state at every node, in addition to the “infected” and “susceptible” states. The probability of a node switching to the “warning” state depends on the population of nodes in the warning state and is independent of the network topology. The SIS model was extended with “quarantined”, “vaccinated”, and “delay” states, to capture the time-delayed worm propagation in computer networks [20]. These models significantly simplified the dynamic process by omitting the impact of network topologies.

The network topologies can have a strong impact on epidemic propagations [11]. In [8], the continuous-time Markov process was adopted to model epidemic propagations in specific topologies, where every Markov state collects the states of all nodes in the network. The second largest eigenvalue of the Markov transition matrix determines the convergence rate to the absorbing virus-free state. With a discrete-time Markov model, the convergence rate was proved to be asymptotically bounded [7]. The Markov models can be decomposed into small Markov processes in -node networks by applying the mean-field theory to achieve tractability [8, 1012]. The decomposed Markov processes deduced a widely approved result that a virus dies out quickly if , where is the adjacency matrix of the network, and is the largest eigenvalue of . The network can also be dynamic, where nodes can transfer among communities [21]. The simulation results on two communities revealed that the node mobility can accelerate the malware propagation and improve the epidemic threshold.

Some network features can be employed to simplify epidemic models and improve tractability. Pastor-Satorras et al. [15] derived the probability of a node being infected as a function of the expected number of its infected neighbors in scale-free networks. The number of infected nodes at the equilibrium state was given by , where is a network generation parameter and is the effective spreading rate. Zou et al. [22] simulated the propagation of Internet email worms in scale-free graphs and showed that the aforementioned result can be overestimated due to the implicit homogeneous mixing assumption. Meanwhile, Li et al. showed the analysis on scale-free graphs is inaccurate for specific topologies [23].

The statistic topology models are generally based on undirected networks. However, there is an intrinsic directionality in the propagation in specific types of dynamics, e.g., infectious disease spreading [24] and information transmission [25]. Directed networks, sets of vertices, and a collection of directed edges that connect pairs of ordered vertices are useful to represent specific transmissions with intrinsic directionality in the propagation [14]. Meyers et al. [26] employed the percolation theory to predict disease transmission through semidirected contact networks, where edges may be directed or undirected and found that the probability of an epidemic and the expected fraction of a population infected during an epidemic can be different in semidirected networks, in contrast to the routine assumption that these two quantities are equal. Li et al. [27] defined the directionality as the percentage of unidirectional links and found that the lower bound of the epidemic threshold increases with a growing , implying that the directionality hinders the propagation of epidemic processes. In [28], Khanafer et al. studied the stability of an SIS N-intertwined Markov model over arbitrary directed network topologies and showed that when the basic reproduction number is greater than one, the epidemic state is locally exponentially stable, and when the network is not initialized at the disease-free state, the epidemic state is globally asymptotically stable.

3. The Directed Network Model

We consider the SIS epidemic process in a strongly connected network with nodes connected by directed edges. Each node in the network can be in either a susceptible () or an infected () state. An infected node can infect its susceptible neighbors along the directed edges at the rate of per edge. Infected nodes can independently recover to be susceptible at the rate of .

We suppose that the nodes in a network can be categorized into groups, denoted by , , , , where the nodes in the same group have the same out-degrees and the same number of edges to the nodes in the same destination group. The number of -nodes can be denoted by (here, ). Given the node groups, we define a collapsed adjacency matrix, denoted by , to describe the topology of . The -th entry of the matrix, denoted by , describes the number of edges from a -node pointing to -nodes. As a result, can be described by the node number vector and the collapsed adjacency matrix . Figure 1 provides an example of node categorization, where nodes are categorized into groups, i.e., . Its collapsed adjacency matrix can be given by

Other notations are defined as follows: () denotes the number of -nodes in state . denotes an edge starting from a -node and ending at a -node, where the -node and the -node are in states and , respectively. denotes the number of edges. Let () denote the number of edge pairs consisting of and ( and ), as illustrated by Figure 2. Numerical relationships between states of nodes and states of edges satisfyHere, (2a) is because any node is in either an or state. (2b) is because any edge is in one of the four states given in the right-hand side (RHS) of (2b). Meanwhile, edges can be classified according to the state of the starting point, as given by (2c) and (2d). (2e) is deduced from (2a), (2b), and (2d).

4. Group-Based Mean-Field SIS Model

We propose to analyze the SIS process in directed networks by employing the mean-field approximation which uses a single average effect to approximate the effect of all the other individuals on any given individual. Thus, the same group of nodes in our model are evaluated by the same average estimation. The state transitions of nodes and edges in the SIS process can be given byHere, (3a) and (3b) give the changing rate of susceptible and infected -nodes. The RHS is because an infected -node can be cured with the rate ; a susceptible -node can be infected with the rate per edge by an infected neighbor. In the continuous-time model, the time slot is infinitesimal that the infection rate can be summed together, i.e., .

Equations (3c)-(3f) capture the time-varying number of links. The first two terms on the RHS of (3c) are because an edge can transfer from an or edge when the infected node is cured. The last two terms on the RHS of (3c) capture the cases where the starting -node or the ending -node is infected by its infected neighbors. (3d) is because an edge can transfer to an edge in the case that the infected -node is cured at the rate ; and to in the case that the susceptible -node is infected by its infected neighbors. can transfer from in the case that the infected -node is cured with the rate or from in the case that the susceptible -node is infected by its infected neighbors. Different from previous SIS models in undirected networks, e.g., [15], cannot transfer to , as the epidemic can only propagate along directed edges. Equation (3e) can be similarly obtained attaching the infection process, i.e., . Equation (3f) is self-explanatory.

Known as the disease-free equilibrium point, the equilibrium point of interest is ; i.e., all nodes are susceptible. The condition of the disease-free equilibrium point can be deduced by linearizing the SIS model given by (3a)–(3f). This is because the stability of the original nonlinear system can be determined by the eigenvalues of the linearized model as stated in Lyapunov’s First Method [29].

To linearize the SIS model, the number of edge pairs and is first estimated by using the number of unpaired edges. This is achieved by applying the moment closure approximation as evaluated in [30, 31]. As a result, we haveIn (4a), every susceptible -node on average has edges pointing to -nodes. is then estimated by employing (2e), where and are introduced to solve the equation. Here, satisfies and ; satisfies and . (4b)-(4d) can be similarly obtained.

By substituting (4a)–(4d) into (3a)–(3f), (3c)-(3f) can be rewritten asThe terms of can be suppressed by substituting (2b) into (5a)–(5d). As a result, we have

Near the disease-free equilibrium point, (6a)–(6c) can be linearized by suppressing all higher order terms. As a result, we have

After the model has been linearized, the condition of the disease-free equilibrium point can be obtained by performing eigenvalue analysis on the Jacobian matrix of the linearization. The Jacobian matrix is a matrix and denoted by . The nonlinear dynamic system is stable at the equilibrium point if and only if all the eigenvalues of the Jacobian matrix are negative, as stated by the Hartman-Grobman Theorem [32]. In other words, the epidemic is certainly extinct ifwhere is the largest eigenvalue of the operator. is the Jacobian matrix of (7a)–(7c) and can be given bywhere is an diagonal matrix whose diagonal entries are . We note that and are zero matrices. As a result, is an upper block triangular matrix, andwhere is the set of eigenvalues of the operator.

Here, , , and are diagonal matrices. Their diagonal entries are , , and , respectively. The entry in the -th row, -th column of , denoted by , is given by

By employing the Schur complement, i.e.,we have , where . The entry in the -th row, -th column of , denoted by , is given by

We conclude that . Note that and are diagonal matrices, and all the diagonal entries of them are negative, i.e., all the eigenvalues of and are negative. As a result, we have

The matrix can be written aswhere is a diagonal matrix and can be given by . The entry in the -th row, -th column of , denoted by , is given byWe have that all the eigenvalues of are and

Note that is a sparse matrix and similar with a block matrix consisting of and zero matrices as given byThis is achieved by performing matrix operations on . Thus, the eigenvalues of can be given by

Combining (8), (14), (17), and (19), the epidemic will become extinct if . In other words, the epidemic threshold, denoted by , is given by The epidemic dies out, if .

5. Simulation and Numerical Results

In this section, numerical and simulation results are presented to validate the proposed group-based SIS model and the deduced epidemic threshold. In every run of the simulations, the groups and the network topology, i.e., the scale and structure, are first set up, according to the rules specified in Section 3; i.e., number of directed edges are added from a -node to number of randomly selected -nodes. The nodes do not connect themselves despite . Then, the infection rate and the curing rate are configured, based on the analytical epidemic threshold . The simulations are carried out on the NepidemiX [33], which is a Python library implementing simulations of epidemics. For initialization, randomly selected 10% of the nodes are infected. During a simulation run, the infected nodes can be cured at the rate of . Every infected node can infect its neighbors connected by edges at the rate of . Every dot in the figures is the average of 100 independent runs under the same configurations, including the network topology, the percentage of initially infected nodes, and .

We first validate our model on a 1000-node network where nodes connect each other and form a complete graph. The infection and curing rates are set to be and per time slot. is set to be larger than to evaluate the proposed model. Ten percent of nodes are randomly chosen to be infected at . Figure 3 plots the infection density, given by , from to 30. The analytical results are numerically evaluated by employing (6a)–(6c) with the nodes evenly divided into 1, 2, 5, 10, 25, and 50 groups, respectively.

From Figure 3, we can see that the analytical results can asymptotically approach the simulation results with the increasing number of groups, e.g., from to . The simulation results can outgrow the analytical results when the number of node groups is small. The analytical results under a single group of nodes, i.e., , can substantially underestimate the infection density. The analytical result is only 83.8% () of the simulation result, when . In contrast, the analytical results under 50 groups of nodes, i.e., when , match the simulation results indistinguishably. This is because the proposed model is designed to decouple the state transitions of edges connecting nodes from different groups and estimate the number of edges connecting infected nodes and subsequently the population of infected nodes. The mean-field approximation is applied to model the interplay between the averaged ratios of infectious edges connecting different pairs of node groups. With an increasing number of node groups, the averaged ratio of infectious edges connecting a specific pair of node groups can become increasingly representative with a reducing deviation. In other words, the averaged ratio becomes increasingly precise for a reducing set of edges. In the special case where every node forms a group (i.e., the number of groups is ), the ratio is exactly the probability at which an edge is infectious. The proposed model is able to capture the state transition of an edge under the averaged effect of all other individual edges, and can be fairly accurate given the large number of edges. Note that the number of groups, e.g., , is far less than the network size, i.e., . This finding allows us to model the epidemic propagation with a small number of differential equations. Figure 3 also shows that the analytical results can be accurate at the initial stage (or low infection densities) even with few node groups. This is because the state transitions of different edges are loosely coupled if only very few nodes are infected.

We proceed to evaluate the model accuracy with different infection densities. This is done by adjusting the infection rate. A 1000-node network is considered where nodes connect each other and form a complete graph. The epidemic propagation with four effective spreading rates, i.e., , and , are simulated and analyzed. The values of are larger than, and close to, the analytical threshold. The infection rates are obtained by adjusting while setting to . The analysis is based on (6a)–(6c) by evenly dividing the nodes into 5 groups to explore the applicability of the model to a small number of groups.

From Figure 4, we can see that the simulation results still overtake the analytical results when and 0.002. For example, the simulation result is 0.459 in the case of and , while the analytical result is only 0.418. However, the gap between simulation results and analytical results decreases with dropping , i.e., from to 0.0015. When further declines, i.e., and 0.001, the analytical results are able to match the simulation results from the beginning to the end. According to (20), the epidemic threshold is given by . This figure reveals that the proposed mean-field model is asymptotically accurate with a decreasing and is able to precisely describe the epidemic propagation when the effective spreading rate is around the epidemic threshold.

We evaluate the epidemic threshold given by (20) in Figure 5. Three networks with 500 nodes are considered, where nodes are divided into three groups (i.e., ). The impact of the network topology on the threshold is evaluated by varying the number of edges in the network. Without loss of generality, their collapsed adjacency matrix is set towhere , or . The curing rate is set to be . Ten percent of nodes are randomly selected to be infected at the initial state. The infection density at is used as the stable infection density, as indicated by the -axis. Evaluated with (20), the epidemic threshold is = 0.005, 0.0033, and 0.0025 when = 4, 6, and 8, respectively. We can see from Figure 5 that (the solid vertical lines) can precisely specify the epidemic thresholds. When , the epidemic can be suppressed eventually. When , the infection density grows with and also exhibits convexity. We can see that the network topology has a strong impact on the epidemic threshold and the infection density. Specifically, the threshold decreases with the growth of . For example, halves from to , when doubles from 4 to 8 (the number of edges doubles, as well). The infection density increases with the growth of , especially around the threshold. For example, in the case of and , the infection density is 0.5 as compared to the infection density of .

We note that the epidemic threshold, given by (20), is determined by the network structure , rather than the number of nodes given by . To illustrate this, we compare the number of infected populations in different scales of networks, illustrated by Figure 6. To be specific, , where and 6, respectively. Their collapsed adjacency matrices are obtained with (21) by letting . As a result, . Figure 6 firstly confirms the accuracy of illustrated by the solid vertical line. The observation that the three networks with the same structure but different scales share the same epidemic threshold validates that the epidemic threshold depends on the network structure rather than the scale. This finding can significantly reduce the complexity to deduce the epidemic threshold, i.e., from decomposing an -dimensional matrix by using , given by [12], to decomposing an dimensional matrix (). For example, and in Figure 4, and and in Figure 5. It is interesting to notice that the infection densities are the same, e.g., in the case of , although the infected population varies in different scales of networks. This can be the reason of that the epidemic threshold depends on the network structure rather than the network scale.

6. Conclusion

In this paper, we designed a continuous-time SIS model in large-scale networks. By categorizing nodes into groups, the model complexity was significantly reduced. The proposed epidemic model was validated to be asymptotically accurate with the decrease of the effective spreading rate and/or the increase of node groups. The epidemic threshold can be deduced with the largest eigenvalue of the collapsed adjacency matrix whose dimension is much smaller than the network scale. Simulation results corroborated the effectiveness of the model, as well as the analytical accuracy of the threshold in large-scale networks.

Data Availability

The simulations are built on the epidemic simulation platform NepidemiX.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China (no. 2017YFB0802703), the National Natural Science Foundation of China (no. 61121061), and UTS DVC-R Funding Initiative for Research Strengths.