Discrete Dynamics in Nature and Society
Volume 2012 (2012), Article ID 872908, 11 pages
Impact of Bursty Human Activity Patterns on the Popularity of Online Content
1School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China
2School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA
Received 11 June 2012; Accepted 13 August 2012
Academic Editor: Bo Yang
Copyright © 2012 Qiang Yan and Lianren Wu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The dynamics of online content popularity has attracted increasing research attention in recent years. In this paper, we provide a quantitative, temporal analysis of the dynamics of online content popularity in a massive system: Sina Microblog. We use time-stamped data to investigate the impact of bursty human comment patterns on the popularity of online microblog news. Statistical results indicate that the number of news items and comments grows exponentially. The strength of forwarding and commenting is characterized by bursts and displays a fat-tailed distribution. To characterize the dynamics of popularity, we explore the distribution of the time interval between consecutive comment bursts and find that it also follows a power law. Bursty patterns of human commenting are responsible for the power-law decay of popularity. These results are supported by both theoretical analysis and empirical data.
1. Introduction
The advent of Web 2.0 and online social media is fostering web-mediated brokers such as microblogs and search engines, through which anyone can easily publish and promote content online. The dynamics of online content popularity has been deeply affected by the existence of these web-mediated brokers. Web 2.0 and online social media not only change traditional communication processes with new types of phenomena but also generate a huge amount of time-stamped data, making it possible for the first time to study the dynamics of online content popularity and human activity patterns at the global system scale.
Many human factors may affect the popularity of online content, including human interests [2–4], social identity [5, 6], limited attention, and memory effects. In this paper, we focus on the impact of bursty patterns of human commenting on the popularity of news on Sina Microblog (http://www.weibo.com/). Temporal heterogeneity and burstiness are widely observed in many human-activated systems and may result from both endogenous mechanisms, like the highest-priority-first protocol, and exogenous factors, like the seasonality and heterogeneity of human activities. This phenomenon has been found in various human activity patterns such as instant messaging [11, 12], web browsing [13, 14], e-mail and surface mail [15, 16], and mobile phone calls [17, 18]. Considering only the timing of events, these human activity patterns are often described by a power-law distribution $P(\tau) \sim \tau^{-\alpha}$, where $\tau$ is the time interval between two consecutive activities. Many mechanisms of human dynamics have been proposed to explain the temporal bursts, such as the task priority mechanism, the memory effect mechanism [8, 19], the human interaction mechanism, the interest-driven mechanism [2, 3], and the social identity mechanism [5, 6].
Here, the popularity of microblog news is defined as the number of comments per day posted for a piece of news. It is well documented that the statistical properties of this variable are very heterogeneous, with a distribution that follows a power law. We explore the distribution of the time interval between consecutive comment bursts and find that it follows a power law. Furthermore, we prove that the exponent $\alpha$, which characterizes the bursty patterns of human commenting, is connected to the exponent $\beta$ of the decay of popularity by the relation $\beta = \alpha - 1$.
The rest of this paper is organized as follows. Section 2 gives the data set. In Section 3, we study the dynamics of online news popularity. Section 4 introduces the relationship between bursty patterns of human activity and online content popularity. The power-law distribution is verified in Section 5. Finally, in Section 6 we conclude the work.
2. Data Set
The data for this research were collected from Sina Microblog (http://www.weibo.com/), which is one of the biggest microblogging platforms in China. We collected all news about a public topic, dated from August 20, 2009 to September 3, 2010, a period of 380 days. During this period, a total of 125,150 pieces of news were released, which were forwarded 2,260,826 times and triggered 1,786,000 comments. For each piece of news, the news ID, release time, number of times forwarded, and number of comments were recorded. Therefore, we can track the dynamics of one specific piece of news through its unique ID.
From the statistical results, we find that during the data collection window, the number of news items and comments grows exponentially (Figure 1). At the beginning of the observation window, only a small amount of news and comments were posted. This is because Sina Microblog had only just been launched on August 14, 2009, and only a small group of people knew of the application at that time.
The measurements also indicate that the numbers of forwardings and comments are heterogeneous and bursty. In our data set, among all 125,150 pieces of news, 65,772 were forwarded and 69,440 were commented on. As shown in Figure 2, the strength of forwarding (the number of times a piece of news is forwarded) and the strength of commenting (the number of comments for a piece of news) follow power laws with the same exponent.
3. The Dynamics of Online Popularity
In order to quantitatively analyze the popularity of online microblog news, we consider the number of comments per day posted for a piece of news, expressed by at time . We study which represents the relative variation of the measurement in the time unit. Here, we use one day as the time unit, so is the average value of comment strength. And means the number of comments posted for a piece of news in the first day after the news released.
The relative variation of comments per time unit is shown in Figure 3. Most news experienced a burst and received little attention thereafter. The relative variation may be negative, indicating a decrease in popularity; since our main concern is the positive values, events with negative variation are neglected.
Another way to characterize the dynamics of bursty systems is to study the distribution of time intervals between successive events. We analyzed the distribution of the time between consecutive comment bursts, namely, the time intervals between positive relative variations, shown in Figure 4. The intervals between bursts follow a power-law distribution.
We use the maximum likelihood estimation (MLE) method in conjunction with the Kolmogorov-Smirnov (KS) statistic table to verify whether the fit is a good match to the data. In this case, the KS statistic suggests that the power-law curve is a good fit for the data, as explained in detail in Section 5.
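The burst-interval measurement described above can be sketched in a few lines. This is only an illustration under stated assumptions: the relative variation is taken here as the day-over-day change divided by the previous day's count, which may differ from the exact definition used in the paper, and the function names and sample counts are hypothetical.

```python
def relative_variation(daily_counts):
    """Day-over-day relative change in comments (assumed definition).

    Returns None for a day whose previous day had zero comments,
    because the ratio is undefined there.
    """
    variation = []
    for prev, cur in zip(daily_counts, daily_counts[1:]):
        variation.append((cur - prev) / prev if prev > 0 else None)
    return variation


def burst_intervals(daily_counts):
    """Number of days between consecutive positive relative variations."""
    variation = relative_variation(daily_counts)
    # variation[i] describes the change from day i to day i + 1
    burst_days = [day for day, v in enumerate(variation, start=1)
                  if v is not None and v > 0]
    return [later - earlier for earlier, later in zip(burst_days, burst_days[1:])]


# Made-up daily comment counts for one piece of news
counts = [40, 10, 4, 6, 2, 1, 5, 1]
print(relative_variation(counts))
print(burst_intervals(counts))
```

With these made-up counts, the variation is positive on days 3 and 6, so `burst_intervals` returns a single interval of 3 days. Pooling such intervals over all news items would give the empirical distribution whose tail is examined in Figure 4.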
4. Relationship between Bursty Human Activity Patterns and Popularity
In this section, we focus on the impact of bursty human activity patterns on the dynamics of popularity. First of all, we show that the number of comments for a piece of news can be derived from the comment patterns of users.
Assume that a given piece of news is released at time $t_0$ and that all users can comment on it. The comment patterns are different from the browsing patterns. Every user can comment on a piece of news more than once. Figure 5 shows the comment pattern of one user; each vertical line represents a separate comment on the news. The thick line denotes the time when the user comments on the news for the first time after its release at $t_0$. The release time of the news divides the interval $\tau$ between two consecutive comments into two parts of length $t'$ and $t$, where $\tau = t' + t$. The probability that a user comments at time $t$ after the news was released is proportional to the number of possible intervals $\tau$. For a user characterized by a power-law intercomment time distribution with exponent $\alpha$ and a minimum time unit of $\tau_0$, the probability of finding an interval having a length larger than $t$ is
$$P(\tau > t) = \int_{t}^{\infty} (\alpha - 1)\,\tau_0^{\alpha - 1}\,\tau^{-\alpha}\,d\tau = \left(\frac{t}{\tau_0}\right)^{-(\alpha - 1)}. \qquad (4.1)$$
In (4.1) we assume that $\alpha > 1$.
For users characterized by different exponents, the number of comments $N(t)$ can be calculated analytically as the average of (4.1) over the observed exponent values:
$$N(t) \propto \left\langle \left(\frac{t}{\tau_0}\right)^{-(\alpha - 1)} \right\rangle_{\alpha}. \qquad (4.2)$$
For simplicity, we focus on the case in which all users are characterized by the same exponent $\alpha$. For example, the intercomment time distribution follows a power law at the collective level (Figure 7).
Hence, (4.2) can be written as
$$N(t) \sim t^{-(\alpha - 1)}.$$
Thus, we prove that the number of comments for a piece of news decays as a power law with exponent $\beta = \alpha - 1$; namely, the decay of popularity follows a power law with exponent $\beta = \alpha - 1$.
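As a numerical sanity check on the scaling argument, the sketch below samples intercomment times from a continuous power law with exponent $\alpha$ and minimum time $\tau_0$, then compares the fraction of intervals longer than $t$ with the closed form $(t/\tau_0)^{-(\alpha-1)}$. The continuous density, the inverse-transform sampler, and all names and parameter values are assumptions made for illustration; they are not taken from the paper's data.

```python
import random


def sample_interval(alpha, tau0, rng):
    """Draw tau from p(tau) proportional to tau^(-alpha), tau >= tau0, alpha > 1."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so the power below is finite
    return tau0 * u ** (-1.0 / (alpha - 1.0))


def survival_empirical(alpha, tau0, t, n=200_000, seed=0):
    """Fraction of n sampled intervals that are longer than t."""
    rng = random.Random(seed)
    longer = sum(sample_interval(alpha, tau0, rng) > t for _ in range(n))
    return longer / n


def survival_theory(alpha, tau0, t):
    """Closed-form probability that an interval exceeds t (valid for t >= tau0)."""
    return (t / tau0) ** (-(alpha - 1.0))


alpha, tau0 = 2.5, 1.0  # illustrative values only
for t in (2.0, 4.0, 8.0):
    print(t, survival_empirical(alpha, tau0, t), survival_theory(alpha, tau0, t))
```

On a log-log plot the empirical survival values fall on a line of slope $-(\alpha - 1)$, which is the decay exponent the argument predicts for popularity.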
In our data set, the number of comments for a piece of news decays as a power law (Figure 6(a)). More than 80% of comments take place within the first day; the share then decays to only 10% on the second day and finally reaches a small amount after five days. Meanwhile, we compute the lifetime of all news (Figure 6(b)) and find that the average lifetime is 5.16 days. The distribution of the interval between two consecutive comments at the collective level also follows a power law (Figure 7).
To sum up, we prove that bursty human activity patterns strongly affect the popularity of news. Moreover, the exponent $\alpha$ characterizing the bursty human activity patterns is connected to the exponent $\beta$ of the decay of popularity by the relation $\beta = \alpha - 1$. These results are supported by both theoretical analysis and empirical data.
5. Testing the Power-Law Hypothesis
Recent empirical observations suggested that power-law distributions occur in many natural and man-made systems. Unfortunately, most previous empirical studies of power-law distributed data have not attempted to test the power-law hypothesis quantitatively. Instead, they typically rely on qualitative appraisals of the data, for instance, based on visualizations.
In this section, we use a goodness-of-fit test to determine whether the fit is a good match to the data. First, we fit our empirical data to the power-law model using maximum likelihood estimation (MLE) and calculate the KS statistic for this fit. Next, we use the KS table to obtain a sound basis for confirming or rejecting the power-law hypothesis.
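Mathematically, a quantity $x$ obeys a power law if it is drawn from a probability distribution
$$p(x) \propto x^{-\alpha},$$
where $\alpha$ is a constant parameter of the distribution known as the exponent or scaling parameter.
In the discrete case, the power-law distribution, known as the zeta distribution, is expressed as
$$p(x) = \frac{x^{-\alpha}}{\zeta(\alpha)}, \qquad x = 1, 2, 3, \ldots,$$
where $\zeta(\alpha)$ is the Riemann zeta function defined as $\zeta(\alpha) = \sum_{k=1}^{\infty} k^{-\alpha}$.
Maximum likelihood estimation of the zeta distribution maximizes the log-likelihood function given by
$$\ell(\alpha \mid x) = \ln L(\alpha \mid x) = \ln \prod_{i=1}^{n} \frac{x_i^{-\alpha}}{\zeta(\alpha)} = -n \ln \zeta(\alpha) - \alpha \sum_{i=1}^{n} \ln x_i,$$
where $L(\alpha \mid x)$ is the likelihood function of $\alpha$, given the data $x = (x_1, \ldots, x_n)$, and $\ell(\alpha \mid x)$ is the log-likelihood function.
When $\alpha > 1$, this maximum can be obtained for the zeta distribution by finding the zero of the derivative of the log-likelihood function:
$$\frac{d\ell}{d\alpha} = -n\,\frac{\zeta'(\alpha)}{\zeta(\alpha)} - \sum_{i=1}^{n} \ln x_i = 0, \qquad \text{that is,} \qquad \frac{\zeta'(\hat{\alpha})}{\zeta(\hat{\alpha})} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i.$$
The most commonly used goodness-of-fit test is the Kolmogorov-Smirnov (KS) test, whose statistic is simply the maximum distance between the CDFs (cumulative distribution functions) of the data and the fitted model:
$$D = \max_{x \ge x_{\min}} \left| S(x) - P(x) \right|,$$
where $S(x)$ is the cumulative distribution function of the data and $P(x)$ is the cumulative distribution function for the power-law model that best fits the data in the region $x \ge x_{\min}$.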
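The fit described above can be sketched with the standard library alone. The block below is a minimal illustration, not the authors' code: it assumes the zeta distribution on $x = 1, 2, 3, \ldots$ (so $x_{\min} = 1$), evaluates $\zeta(\alpha)$ by a truncated sum with a tail correction, maximizes the log-likelihood $-n \ln \zeta(\alpha) - \alpha \sum \ln x_i$ by golden-section search instead of solving the derivative condition directly, and then computes the KS distance. All function names, search bounds, and tolerances are hypothetical choices.

```python
import math
from collections import Counter


def zeta(alpha, terms=2000):
    """Riemann zeta for alpha > 1: partial sum plus a midpoint-rule tail estimate."""
    partial = sum(k ** -alpha for k in range(1, terms + 1))
    return partial + (terms + 0.5) ** (1.0 - alpha) / (alpha - 1.0)


def log_likelihood(alpha, n, sum_log):
    """Zeta log-likelihood: -n ln zeta(alpha) - alpha * sum(ln x_i)."""
    return -n * math.log(zeta(alpha)) - alpha * sum_log


def fit_alpha(data, lo=1.05, hi=6.0, tol=1e-6):
    """Maximize the concave log-likelihood on [lo, hi] by golden-section search."""
    n = len(data)
    sum_log = sum(math.log(x) for x in data)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = log_likelihood(c, n, sum_log), log_likelihood(d, n, sum_log)
    while b - a > tol:
        if fc > fd:  # the maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = log_likelihood(c, n, sum_log)
        else:        # the maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = log_likelihood(d, n, sum_log)
    return (a + b) / 2.0


def ks_statistic(data, alpha):
    """Largest gap between the empirical CDF and the fitted zeta CDF."""
    n = len(data)
    counts = Counter(data)
    norm = zeta(alpha)
    empirical = model = gap = 0.0
    for x in range(1, max(data) + 1):
        empirical += counts.get(x, 0) / n
        model += x ** -alpha / norm
        gap = max(gap, abs(empirical - model))
    return gap


# Made-up comment counts, for illustration only
sample = [1] * 70 + [2] * 14 + [3] * 6 + [4] * 3 + [5] * 2 + [7, 9, 12, 20, 35]
alpha_hat = fit_alpha(sample)
print(alpha_hat, ks_statistic(sample, alpha_hat))
```

Golden-section search is used here only because the log-likelihood is concave in $\alpha$, which guarantees a single maximum; a root finder applied to the derivative condition would serve equally well.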
First, we fit our empirical data to the power-law model using the MLE method and calculate the KS statistic for this fit. Second, we adopt the KS table, shown in Table 1, to obtain a goodness-of-fit estimate. The KS quantiles in the table were generated from simulations: for each of the logarithmically spaced sample sizes, 10,000 power-law distributions were simulated, with random exponents from 1.5 to 4.0. Third, we calculate the $p$-value, namely, the fraction of the time that the resulting statistic is larger than the value for the empirical data. Conover presented detailed instructions on how to use the KS table to obtain a goodness-of-fit estimate.
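The idea behind the $p$-value can be illustrated with a direct Monte Carlo simulation in place of a precomputed KS table. The sketch below is a swapped-in variant, not the table-based procedure described above: for brevity it uses the continuous power law with $x_{\min} = 1$, whose maximum likelihood estimate has the closed form $\hat{\alpha} = 1 + n / \sum \ln(x_i / x_{\min})$, rather than the discrete zeta distribution. The function names, the number of simulations, and the seeds are all hypothetical.

```python
import math
import random


def fit_alpha(data, x_min=1.0):
    """Continuous power-law MLE; assumes at least one value exceeds x_min."""
    return 1.0 + len(data) / sum(math.log(x / x_min) for x in data)


def ks_distance(data, alpha, x_min=1.0):
    """Largest gap between the empirical CDF and 1 - (x / x_min)^(1 - alpha)."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # compare against the empirical CDF just before and just after x
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap


def sample_power_law(alpha, n, rng, x_min=1.0):
    """Inverse-transform samples from p(x) proportional to x^(-alpha), x >= x_min."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def p_value(data, n_sim=200, seed=0, x_min=1.0):
    """Fraction of synthetic data sets whose KS distance exceeds the observed one."""
    rng = random.Random(seed)
    alpha = fit_alpha(data, x_min)
    observed = ks_distance(data, alpha, x_min)
    larger = 0
    for _ in range(n_sim):
        synthetic = sample_power_law(alpha, len(data), rng, x_min)
        simulated = ks_distance(synthetic, fit_alpha(synthetic, x_min), x_min)
        larger += simulated > observed
    return larger / n_sim


rng = random.Random(1)
power_law_data = sample_power_law(2.5, 500, rng)       # synthetic, for illustration
uniform_data = [1.0 + rng.random() for _ in range(500)]  # deliberately not a power law
print(p_value(power_law_data), p_value(uniform_data))
```

A large $p$-value means the observed KS distance is typical of genuine power-law samples of the same size, so the hypothesis is not rejected; a value near zero, as expected for the uniform sample, indicates a poor fit.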
According to the above theory and KS table, we obtain the goodness-of-fit estimates as shown in Table 2.
From Table 2, we find that the data sets described in this paper are consistent with a power-law distribution. For the data set described in Figure 3, however, most values are less than 1, whereas the zeta distribution is defined on positive integers. Therefore, in this case we cannot use the method of maximum likelihood estimation. Instead, we use the method of nonlinear least squares and find that the data obey a power-law distribution over certain ranges.
6. Conclusions
The main goal of this paper is to explore the impact of bursty human activity patterns on the dynamics of popularity. Through theoretical analysis and empirical data, we prove that bursty human activity patterns are responsible for the power-law decay of popularity. This conclusion is consistent with previous studies [21, 24, 29–31]. Our statistical results also indicate that the number of news items and comments grows exponentially. The strength of forwarding and commenting is characterized by bursts and displays a fat-tailed distribution. To characterize the dynamics of bursty systems, we explore the distribution of the time interval between consecutive comment bursts and find that it also follows a power law. Our measurements also indicate that microblog news has a short lifetime. Most comments take place within the first day, and the average lifetime of all news is 5.16 days. The average lifetime may vary for different social media, but the decay law of popularity is very likely generic, as it does not depend on the content but is determined mainly by human activity patterns. Indeed, the exponent $\alpha$, which characterizes the bursty human activity patterns, is connected to the exponent $\beta$ of the decay of popularity by the relation $\beta = \alpha - 1$.
Acknowledgments
This work was supported by the Program for New Century Excellent Talents in University (NCET-11-0597) and the Fundamental Research Funds for the Central Universities (2012RC1002).
References
- D. Tapscott and A. D. Williams, Wikinomics: How Mass Collaboration Changes Everything, Portfolio Hardcover, New York, NY, USA, 2006.
- Z. D. Zhao, H. Xia, M. S. Shang, and T. Zhou, “Empirical analysis on the human dynamics of a large-scale short message communication system,” Chinese Physics Letters, vol. 28, no. 6, Article ID 068901, 4 pages, 2011.
- M. S. Shang, G. X. Chen, S. X. Dai, B. H. Wang, and T. Zhou, “Interest-driven model for human dynamics,” Chinese Physics Letters, vol. 27, no. 4, Article ID 048701, 4 pages, 2010.
- X. P. Han, T. Zhou, and B. H. Wang, “Modeling human dynamics with adaptive interest,” New Journal of Physics, vol. 10, Article ID 073010, 8 pages, 2008.
- Q. Yan, L. Wu, and L. Yi, “Research on the human dynamics in mobile communities based on social identity,” Discrete Dynamics in Nature and Society, vol. 2012, Article ID 672756, 13 pages, 2012.
- Q. Yan, L. Yi, and L. Wu, “Human dynamic model co-driven by interest and social identity in the MicroBlog community,” Physica A, vol. 391, no. 4, pp. 1540–1545, 2012.
- L. Weng, A. Flammini, A. Vespignani, and F. Menczer, “Competition among memes in a world with limited attention,” Scientific Reports, vol. 2, article 335, 2012.
- A. Vazquez, “Impact of memory on human dynamics,” Physica A, vol. 373, pp. 747–752, 2007.
- T. Zhou, Z.-D. Zhao, Z. Yang, and C. Zhou, “Relative clock verifies endogenous bursts of human dynamics,” Europhysics Letters, vol. 97, no. 1, article 18006, 2012.
- W. Hong, X. P. Han, T. Zhou, and B. H. Wang, “Heavy-tailed statistics in short-Message communication,” Chinese Physics Letters, vol. 26, no. 2, Article ID 028902, 2009.
- Z.-D. Zhao and T. Zhou, “Empirical analysis of online human dynamics,” Physica A, vol. 391, no. 11, pp. 3308–3315, 2012.
- B. Gonçalves and J. J. Ramasco, “Human dynamics revealed through Web analytics,” Physical Review E, vol. 78, no. 2, Article ID 026123, 2008.
- F. Radicchi, “Human activity in the web,” Physical Review E, vol. 80, no. 2, Article ID 026118, 2009.
- A. L. Barabási, “The origin of bursts and heavy tails in human dynamics,” Nature, vol. 435, no. 7039, pp. 207–211, 2005.
- J. G. Oliveira and A. L. Barabási, “Darwin and Einstein correspondence patterns,” Nature, vol. 437, no. 7063, p. 1251, 2005.
- H.-H. Jo, M. Karsai, J. Kertesz, and K. Kaski, “Circadian pattern and burstiness in mobile phone communication,” New Journal of Physics, vol. 14, Article ID 013055, 2012.
- J. Candia, M. C. González, P. Wang, T. Schoenharl, G. Madey, and A.-L. Barabási, “Uncovering individual and collective human dynamics from mobile phone records,” Journal of Physics A, vol. 41, no. 22, Article ID 224015, 2008.
- A. Vázquez, J. G. Oliveira, Z. Dezsö, K. I. Goh, I. Kondor, and A. L. Barabási, “Modeling bursts and heavy tails in human dynamics,” Physical Review E, vol. 73, no. 3, Article ID 036127, pp. 1–19, 2006.
- Y. Wu, C. Zhou, J. Xiao, J. Kurths, and H. J. Schellnhuber, “Evidence for a bimodal distribution in human communication,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 44, pp. 18803–18808, 2010.
- J. Ratkiewicz, S. Fortunato, A. Flammini, F. Menczer, and A. Vespignani, “Characterizing and modeling the dynamics of online popularity,” Physical Review Letters, vol. 105, no. 15, Article ID 158701, 2010.
- A. Clauset, C. R. Shalizi, and M. E. J. Newman, “Power-law distributions in empirical data,” SIAM Review, vol. 51, no. 4, pp. 661–703, 2009.
- M. L. Goldstein, S. A. Morris, and G. G. Yen, “Problems with fitting to the power-law distribution,” European Physical Journal B, vol. 41, no. 2, pp. 255–258, 2004.
- Z. Dezsö, E. Almaas, A. Lukács, B. Rácz, I. Szakadát, and A.-L. Barabási, “Dynamics of information access on the web,” Physical Review E, vol. 73, no. 6, Article ID 066132, 2006.
- N. L. Johnson, S. Kotz, and A. W. Kemp, Univariate Discrete Distributions, John Wiley & Sons, New York, NY, USA, 2nd edition, 1992.
- H. Bauke, “Parameter estimation for power-law distributions by maximum likelihood methods,” European Physical Journal B, vol. 58, no. 2, pp. 167–173, 2007.
- W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, UK, 2nd edition, 1992.
- W. J. Conover, Practical Nonparametric Statistics, John Wiley & Sons, New York, NY, USA, 1999.
- G. Szabo and B. A. Huberman, “Predicting the popularity of online content,” Communications of the ACM, vol. 53, no. 8, pp. 80–88, 2010.
- R. Crane and D. Sornette, “Robust dynamic classes revealed by measuring the response function of a social system,” Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 41, pp. 15649–15653, 2008.
- M. J. Salganik, P. S. Dodds, and D. J. Watts, “Experimental study of inequality and unpredictability in an artificial cultural market,” Science, vol. 311, no. 5762, pp. 854–856, 2006.