Abstract

We investigate the impact of human dynamics on information propagation in online social networks. First, we study the statistical properties of human behavior using data from “Sina Microblog,” one of the most popular online social networks in China. We find that human activity patterns are heterogeneous and bursty and are often described by a power-law interevent time distribution. Second, we propose an extended Susceptible-Infected (SI) propagation model that incorporates bursty activity and limited attention. We show how bursty human behavior and limited attention affect information propagation in online social networks. The results of this paper can be useful for optimizing or controlling information propagation in online social networks.

1. Introduction

The rapid development of information and communication technology has driven the wide adoption of online social networks in our lives. Indeed, online social networks such as Sina Microblog, Twitter, and Facebook have become an indispensable part of our lives. Every day we sign in to our homepages more than once to view and share information. These online social networks share common characteristics: instantaneity, simplicity, and universality. Sina Microblog, for example, unlike a traditional blog, allows users to disseminate information from mobile devices in texts of up to 140 characters, anytime and anywhere. Investigating online social networks is crucial in a broad range of settings, from information propagation and viral marketing to political purposes.

In recent years, online social networks have attracted widespread attention as a platform for the empirical study of information [1–4]. Despite the progress that has been made, the empirical study of information propagation is still in its infancy. Studies in this direction have been hindered mostly by the shortage of available large-scale data. However, the availability of large-scale data from online social networks has recently created unprecedented opportunities to explore the impact of human behavior on information propagation.

First, information propagation in online social networks is determined by the rhythms and activity patterns of humans [5, 6]. A growing number of recent measurements indicate that human activity patterns are heterogeneous and bursty [7–11]. If only the time interval between events is considered, these activity patterns are often described by a power-law interevent time distribution $P(\tau) \sim \tau^{-\alpha}$, where $\tau$ is the time interval between two consecutive activities [12]. Researchers have recently begun to realize that bursty human behavior has an important impact on the dissemination of information [13, 14].

Second, the wide adoption of online social networks has increased the competition among pieces of information for our limited attention. Every day we receive a great deal of information from various online social networks. However, we do not have enough time and attention to disseminate every message we receive. An interesting question is whether such competition affects the velocity of information propagation. The issue of limited attention has been studied through messages posted and forwarded in online social networks [15, 16]. However, how limited attention affects the velocity of information propagation is still unclear.

In this paper, we propose an extended Susceptible-Infected (SI) propagation model that incorporates bursty human activity patterns and limited attention for the first time. We then collect a large amount of real data to test the model. Combining theoretical research and empirical analysis, we study the information spreading process in social networks both qualitatively and quantitatively. The key contributions of this study are summarized as follows.
(1) From the empirical statistics we find that, at the group level, the interactive time (the time interval between two consecutive logins to the microblog homepage) follows a power-law distribution with exponent $\alpha$, and the number of newly infected individuals (calculated as the number of new forwards per day) follows a power law with exponent $\beta$. The two slope values are related to each other.
(2) Through both theoretical analysis and simulation, we show that if the generation time distribution follows a power law with exponent $\gamma$, then the decay of the propagation velocity is characterized by the same power law; if bursty human behavior follows a power-law distribution with exponent $\alpha$, the decay of the propagation velocity also follows a power law, with an exponent determined by $\alpha$.

In summary, although tremendous efforts have been devoted to research on information propagation, further study based on human dynamics is still needed to reveal the role of human behavior in information propagation in online social networks. Future studies could also draw on more mature theories of spreading dynamics, such as those in [17, 18].

The rest of this paper is organized as follows. Section 2 gives the data description. In Section 3, we propose the extended SI model. In Section 4, we present simulation results and observations. Section 5 introduces theoretical analysis. Finally, in Section 6, we conclude the work.

2. Data Description

The dataset used in this paper was collected from Sina Microblog (http://www.weibo.com/), one of the most popular microblog platforms in China at present. The dataset includes 345,095 messages from 41,667 individuals, posted between 2009/8/16 and 2011/6/4 and collected by snowball sampling. These messages were forwarded 203,997,094 times and triggered 58,617,139 comments. For each message, the message ID, release time, number of forwards, and number of comments were recorded. For each individual, the individual ID and the times at which the individual signed in to his/her microblog homepage were recorded.

The basic statistical results show that, at the group level, the interactive time (the time interval between two consecutive logins to the microblog homepage) follows a power-law distribution (Figure 1(a)). The number of newly infected individuals (calculated as the number of new forwards per day) also follows a power law (Figure 1(b)). Denoting the slope of the interactive time distribution by $\alpha$ and the slope of the newly infected individual distribution by $\beta$, we find that the two slopes are related to each other.
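The slope of an interactive time distribution can be estimated in several ways. The following is a minimal sketch, assuming login timestamps are available as plain numbers and using the continuous maximum-likelihood estimator for a power-law tail. This estimator is a stand-in chosen for illustration and is not necessarily the fitting method used for Figure 1. The function names and the cutoff `x_min` are hypothetical.

```python
import math
import random


def interevent_times(timestamps):
    """Return the positive gaps between consecutive (sorted) timestamps."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:]) if b > a]


def powerlaw_mle(samples, x_min):
    """Continuous MLE of alpha for a tail p(x) ~ x^(-alpha), x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one sample strictly above x_min")
    return 1.0 + len(tail) / log_sum


def synthetic_powerlaw(alpha, n, rng, x_min=1.0):
    """Draw n samples with density ~ x^(-alpha) by inverse transform (alpha > 1)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = synthetic_powerlaw(2.5, 20000, rng)
    print("estimated exponent:", powerlaw_mle(samples, 1.0))
```

On real data, `interevent_times` would be applied per individual and the gaps pooled before fitting; the choice of `x_min` strongly affects the estimate and should be checked against the log-log plot.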

3. Model

3.1. Model Description

In this paper, we use branching processes [19, 20] in conjunction with power-law human behavior to describe the process of information propagation. We adopt the Susceptible-Infected (SI) propagation model to simulate information propagation in online social networks. As in the classical SI model, the population is divided into two states, susceptible (S) and infected (I). In the information propagation model, however, a susceptible individual is one who does not yet know a given message, and an infected individual is one who knows the message and shares it with his/her friends. After being infected, an individual never returns to the susceptible state. At time $t$, there are $S(t)$ susceptible individuals and $I(t)$ infected individuals, and the population size is $N = S(t) + I(t)$.

Initially all individuals are susceptible except for a single infected individual. Unlike in the traditional model, an infected individual can be inactive at a given time step; that is, the infected individual will not infect connected susceptible individuals at that time step. The time interval between two consecutive active steps of an infected individual is defined as the interactive time, which is often characterized by a power-law distribution at the group level. Different individuals have different active time intervals, and each individual acts with a fixed interactive time.

On the other hand, the advent of online social networks has greatly lowered the cost of information generation and propagation, boosting the potential reach of each message. However, the abundance of information to which we are exposed through online social networks exceeds our capacity to consume it. Because of limited time and attention, an individual cannot continuously check for updates on his/her homepage. We assume that individuals interact on a directed online social network. Each individual is equipped with two lists. One is the screen, a time-ordered list in which received messages are recorded. The other is the memory, in which messages of interest to the individual are recorded. Each individual can share some of the messages from the list with his/her friends. The friends in turn pay attention to a newly received message by placing it at the top of their lists. Because of limited attention, we allow messages to survive on an individual’s screen for a finite amount of time $T$. We also assume that each individual forwards each message only once and then loses interest in it. In addition, if the individual does not forward the message within $T$, the individual is no longer concerned with the message and deletes it from the screen. Each message attracts the individual’s attention with probability $p$; that is to say, the individual forwards the message with probability $p$.
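The screen and memory described above can be sketched as a small data structure. This is an illustrative reading of the text, not the authors' implementation: the class name `Screen`, the choice to store forwarded messages in the memory, and the rule that a message seen once is never placed on the screen again are all assumptions.

```python
class Screen:
    """Time-ordered list of received messages with a finite lifetime (hypothetical helper)."""

    def __init__(self, lifetime):
        self.lifetime = lifetime  # how long a message survives on the screen
        self.arrived = {}         # message id -> arrival time step
        self.memory = set()       # messages this individual has forwarded
        self.seen = set()         # every message ever received (assumption)

    def receive(self, msg, t):
        # Assumption: a message that was already seen is not shown again.
        if msg not in self.seen:
            self.seen.add(msg)
            self.arrived[msg] = t

    def visible(self, t):
        # Drop expired messages, then list the survivors newest first.
        self.arrived = {m: a for m, a in self.arrived.items()
                        if t - a <= self.lifetime}
        return sorted(self.arrived, key=lambda m: self.arrived[m], reverse=True)

    def forward(self, msg, t):
        # A message is forwarded at most once, then leaves the screen.
        if msg in self.visible(t):
            del self.arrived[msg]
            self.memory.add(msg)
            return True
        return False
```

The probabilistic decision to forward is deliberately left outside this class, so that the same structure can be reused with different attention rules.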

3.2. SI Model Based on Bursty and Limited Attention

According to the previous description, the SI model incorporating bursty activity and limited attention is illustrated in Figure 2. We characterize the timing of information propagation by the generation time, which is defined as the time interval between the forwarding of an individual and the forwarding of his/her followers.

To sum up, the extended SI model is defined as follows.

Step 1. At time step $t$, an individual $i$ posts a message. Meanwhile, each individual $j$ receives the message, where $j \in \Gamma_i$ and $\Gamma_i$ is the set of individual $i$'s neighbors.

Step 2. Each individual $j$ has a first active time step and is subsequently active at regular intervals of $\tau_j$, where $\tau_j$ is the active time interval of individual $j$.

Step 3. At each active time step, individual $j$ forwards the message with probability $p$. If individual $j$ forwards the message at time step $t'$, we obtain the generation time $\Delta = t' - t$, which must not exceed the time $T$ for which a message survives on the screen.

Step 4. Update the time step and repeat Steps 1 to 3 until the preset number of time steps is reached.
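The four steps can be sketched as a time-stepped simulation. This is a minimal sketch under stated assumptions, not the authors' code: the function name, the symbols `tau` (active time interval per individual), `p` (forwarding probability), and `T` (message lifetime on the screen), the random choice of each individual's first active step, and the rule that only the first reception of the message counts are all illustrative choices. The network is passed in as a plain adjacency dictionary.

```python
import random


def simulate_extended_si(adj, tau, p, T, seed, max_steps, rng=None, phase=None):
    """Return (infection time per infected node, list of generation times)."""
    rng = rng or random.Random()
    if phase is None:
        # Assumption: each node's activity schedule starts at a random offset.
        phase = {v: rng.randrange(tau[v]) for v in adj}
    infected_at = {seed: 0}
    seen = {seed}        # everyone who has ever received the message
    pending = {}         # node -> time step at which the message arrived
    generation_times = []
    for v in adj[seed]:
        if v not in seen:
            seen.add(v)
            pending[v] = 0
    for t in range(1, max_steps + 1):
        # Iterate over a snapshot: nodes reached at step t act from t + 1 on.
        for v, arrived in list(pending.items()):
            if t - arrived > T:
                del pending[v]  # attention window expired, message dropped
            elif (t - phase[v]) % tau[v] == 0 and rng.random() < p:
                del pending[v]
                infected_at[v] = t
                generation_times.append(t - arrived)
                for w in adj[v]:
                    if w not in seen:
                        seen.add(w)
                        pending[w] = t
        if not pending:
            break
    return infected_at, generation_times
```

For example, on a path of four nodes with every `tau` equal to 1, `p = 1`, and a large `T`, the message advances one node per step, so every generation time equals 1.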

In addition, we introduce two indicators to characterize the velocity of information propagation:
(1) the half time, defined as the first time step at which the number of infected individuals exceeds half of the population;
(2) the mean time, defined as the mean infection time of an individual after the outbreak, evaluated up to the maximum simulation step.
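Both indicators can be computed from a record of infection times. The sketch below assumes such a record is available as a dictionary mapping each infected individual to the time step of infection. The exact formula for the mean time is not recoverable from the text, so the version here (a plain average over individuals infected after step 0 and no later than `t_max`) is an assumption, as are the function names.

```python
def half_time(infected_at, population):
    """First time step at which more than half the population is infected, else None."""
    count = 0
    for t in sorted(infected_at.values()):
        count += 1
        if count > population / 2:
            return t
    return None


def mean_time(infected_at, t_max):
    """Average infection time over individuals infected in steps 1..t_max, else None."""
    times = [t for t in infected_at.values() if 0 < t <= t_max]
    return sum(times) / len(times) if times else None
```

Returning `None` when the threshold is never reached keeps runs in which the message dies out distinguishable from fast outbreaks.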

4. Simulation Results and Observations

In our simulations, initially all individuals are susceptible except for a single infected individual. Each individual has a fixed interactive time, drawn from a power-law distribution. We set the message lifetime to $T = 1440$ time steps, because a message survives in an individual's list for one day, namely, 1440 minutes [15]. Simulations were performed on a Barabási-Albert (BA) network. We fix the degree of attention and randomly select an initial infected node. For a detailed comparison, we also ran the same SI dynamics with an exponential interactive time distribution. From the numerical simulation results (Figures 3 and 4), we make the following observations about the propagation process.
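A simulation setup of this kind needs two ingredients: a BA network and power-law distributed interactive times. The sketch below builds both with the standard library only. It is an illustration under assumptions: the function names are hypothetical, the graph is treated as undirected, the integer intervals are obtained by truncating a continuous power-law draw (so the tail only approximately follows the target exponent), and no parameter values from the paper are implied.

```python
import random


def barabasi_albert(n, m, rng):
    """Undirected BA graph as {node: set(neighbors)} via preferential attachment."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    adj = {v: set() for v in range(n)}
    targets = list(range(m))
    repeated = []  # each node appears once per incident edge
    for source in range(m, n):
        for v in targets:
            adj[source].add(v)
            adj[v].add(source)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))  # degree-proportional choice
        targets = list(chosen)
    return adj


def sample_powerlaw_interval(alpha, rng, tau_min=1, tau_max=None):
    """Integer interval with a tail roughly ~ tau^(-alpha); requires alpha > 1."""
    while True:
        x = tau_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        k = int(x)
        if tau_max is None or k <= tau_max:
            return k


def sample_exponential_interval(mean, rng):
    """Integer interval (at least 1) from an exponential with the given mean."""
    return max(1, int(round(rng.expovariate(1.0 / mean))))
```

Passing an optional `tau_max` bounds the heavy tail, which can be useful when the simulation horizon is finite.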

Observation 1. In the power-law case, the average number of newly infected individuals and the generation time both follow power-law distributions (Figure 3).

Observation 2. The smaller the exponent of the interactive time distribution, that is, the larger the heterogeneity of the interactive time, the slower the propagation. The half time and the mean time decrease monotonically as the exponent increases (Figure 4).

To investigate the impact of attention on the propagation process, we fix the interactive time to follow a power-law distribution with a given exponent and randomly select an initial infected node. With the other parameters held fixed, simulations were again performed on a BA network. The results are averaged over independent runs. From the numerical simulation results (Figure 5), we make the following observation about the propagation process.

Observation 3. The higher the degree of attention, the faster the propagation. The half time and the mean time decrease monotonically as attention increases (Figure 5).

5. Theoretical Analysis

In this section, we analyze the properties of the propagation dynamics. We prove that the decay exponent of the propagation velocity equals the exponent of the generation time distribution. Furthermore, we prove that the exponent characterizing the bursty behavior is related to the exponent of the decay of the propagation velocity.

Proposition 1. If the generation time distribution follows a power law with exponent $\gamma$, then the decay of the propagation velocity also follows a power law with the same exponent $\gamma$.

Proof. We consider a general theory of the propagation process in online social networks. We assume that the outbreak starts from a single infected individual at time $t = 0$. In this case, the average number of newly infected individuals at time $t$ is [19]
$$n(t) = \sum_{d} z_d\, g^{\star d}(t),$$
where $z_d$ is the average number of individuals at generation $d$ away from the first infected individual, $g$ is the generation time distribution, and $\star$ denotes the convolution operation; for example, $g^{\star 2}(t) = \int_0^t g(\tau)\, g(t - \tau)\, d\tau$. This sum can be written in terms of a characteristic time scale and the Levy distribution whose exponent is set by the generation time distribution.
Using the expression for the Levy distribution given in [21], we obtain that $n(t)$ decays as a power law with the same exponent as the generation time distribution. Thus, the proposition is proved.

This proposition means that if the generation time distribution follows a power law with exponent $\gamma$, then the decay of the propagation velocity is characterized by the same power law.

Proposition 2. If the distribution of the interactive time follows a power law with exponent $\alpha$, then the decay of the propagation velocity also follows a power law, with an exponent determined by $\alpha$.

Proof. Assume that the exponent of the power-law interactive time distribution is such that the active time interval has a finite mean.
Since the generation time probability density function is related to the interactive time probability density function [21], the generation time distribution also follows a power law. According to Proposition 1, the decay of the propagation velocity then follows a power law with the same exponent as the generation time distribution. Thus, the proposition is proved.
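The argument can be made concrete with a short worked derivation. This is a sketch under two labeled assumptions and is not a transcription of the original equations: the interactive time has a power-law tail with exponent $\alpha > 2$ (so that its mean is finite), and the generation time is taken to be the residual waiting time of the interactive time process, which is one standard way of relating the two densities. The symbols $C$, $\alpha$, $\beta$, $\Delta$, and $\langle \tau \rangle$ are introduced here for illustration.

```latex
% Assumption 1: power-law tail with alpha > 2, hence a finite mean.
\[
  P(\tau) \simeq C\,\tau^{-\alpha},
  \qquad
  \langle \tau \rangle = \int_0^{\infty} \tau\,P(\tau)\,d\tau < \infty .
\]

% Assumption 2: generation time = residual waiting time of the interactive time.
\[
  g(\Delta) = \frac{1}{\langle \tau \rangle} \int_{\Delta}^{\infty} P(\tau)\,d\tau
            \simeq \frac{C}{(\alpha - 1)\,\langle \tau \rangle}\,\Delta^{-(\alpha - 1)} .
\]

% Proposition 1 carries the exponent of g over to the decay of n(t).
\[
  n(t) \sim t^{-\beta},
  \qquad
  \beta = \alpha - 1 .
\]
```

Under these assumptions, integrating the tail lowers the exponent by one, so a heavier-tailed interactive time distribution (smaller $\alpha$) gives a slower decay of the propagation velocity.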

6. Conclusion

An extended SI model is proposed in this paper. Rather than analyzing the network topology, we study information propagation in online social networks from the perspective of human dynamics. We find that human behavior greatly affects the range and velocity of information propagation.

In the future, as online social systems develop, other factors may come to influence information propagation in online social networks. The propagation model will therefore need to be improved in order to better explain the propagation process.

Acknowledgments

The authors would like to thank Liang Huang and Byungjoon Min for helpful discussions. This work was supported by Program for New Century Excellent Talents in University (NCET-11-0597) and the Fundamental Research Funds for the Central Universities (2012RC1002).