Abstract

The dynamics of online content popularity have attracted increasing research attention in recent years. In this paper, we present a quantitative temporal analysis of the dynamics of online content popularity in a massive system: Sina Microblog. We use time-stamped data to investigate the impact of bursty human comment patterns on the popularity of online microblog news. Statistical results indicate that the numbers of news items and comments grow exponentially. The strengths of forwarding and commenting are bursty and follow fat-tailed distributions. To characterize the dynamics of popularity, we examine the distribution of the time interval $\Delta t$ between consecutive comment bursts and find that it also follows a power law. Bursty human comment patterns are responsible for the power-law decay of popularity. These results are supported by both theoretical analysis and empirical data.

1. Introduction

The advent of Web 2.0 and online social media [1] is fostering web-mediated brokers such as microblogs and search engines, through which anyone can easily publish and promote content online. The dynamics of online content popularity have been deeply affected by these web-mediated brokers. Web 2.0 and online social media not only change traditional communication processes and give rise to new types of phenomena, but also generate a huge amount of time-stamped data, making it possible for the first time to study the dynamics of online content popularity and human activity patterns at the scale of the whole system.

Many human factors may affect the popularity of online content, including human interests [2–4], social identity [5, 6], limited attention [7], and memory effects [8]. In this paper, we focus on the impact of bursty patterns of human comments on popularity in Sina Microblog (http://www.weibo.com/) [9]. Temporal heterogeneity and burstiness are widely observed in human-driven systems; they may result both from endogenous mechanisms, such as the highest-priority-first protocol, and from exogenous factors, such as the seasonality and heterogeneity of human activities [10]. This phenomenon has been found in various human activity patterns, such as instant messaging [11, 12], web browsing [13, 14], e-mail and surface mail [15, 16], and mobile phone calls [17, 18]. If only the timing of events is considered, these activity patterns are often described by a power-law distribution $P(\tau) \sim \tau^{-\alpha}$, where $\tau$ is the time interval between two consecutive activities. Many mechanisms of human dynamics have been proposed to explain these temporal bursts, such as the task priority mechanism [15], the memory effect mechanism [8, 19], the human interaction mechanism [20], the interest-driven mechanism [2, 3], and the social identity mechanism [5, 6].

Here, the popularity of a piece of microblog news is defined as the number of comments posted on it per day. It is well documented that the statistical properties of this variable are highly heterogeneous, with a power-law distribution. We examine the distribution of the time interval $\Delta t$ between consecutive comment bursts and find that it follows a power law. Furthermore, we prove that the exponent $\alpha$, which characterizes the bursty patterns of human comments, is connected to the exponent $\theta$ of the popularity decay by the relation $\theta = \alpha - 1$.

The rest of this paper is organized as follows. Section 2 describes the data set. Section 3 studies the dynamics of online news popularity. Section 4 analyzes the relationship between bursty patterns of human activity and online content popularity. Section 5 verifies the power-law distributions, and Section 6 concludes the work.

2. Data Set

The data used in this study were collected from Sina Microblog (http://www.weibo.com/), one of the largest microblogging platforms in China. We collected all news items on a public topic released between August 20, 2009 and September 3, 2010, a period of 380 days. During this period, a total of 125,150 news items were released; they were forwarded 2,260,826 times and triggered 1,786,000 comments. For each news item, we recorded its ID, release time, number of forwards, and number of comments, so the dynamics of any specific news item can be tracked through its unique ID.

The statistical results show that, during the data collection window, the numbers of news items and comments grew exponentially (Figure 1). At the beginning of the observation window, only a small number of news items and comments were posted, because Sina Microblog had just been launched on August 14, 2009, and only a small group of people knew about the service at that time.
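The text does not say how exponential growth was established. A common check, sketched here as an assumption rather than the authors' procedure, is to regress the logarithm of a daily count on time: if $N(t) = N_0 e^{rt}$, then $\log N(t)$ is linear in $t$ with slope $r$. The function name `fit_exponential_growth` and the synthetic series are hypothetical; no Sina Microblog data are used.

```python
import math

def fit_exponential_growth(counts):
    """Fit N(t) = N0 * exp(r * t), t = 0, 1, 2, ..., by least squares on log N(t).

    `counts` holds strictly positive daily totals (e.g. news items or
    comments per day). Returns the tuple (N0, r).
    """
    ts = range(len(counts))
    ys = [math.log(c) for c in counts]
    n = len(ys)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    # Ordinary least-squares slope and intercept of log N against t.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    r = sxy / sxx
    n0 = math.exp(y_mean - r * t_mean)
    return n0, r

# Synthetic, noiseless series growing 2% per day (not the study's data).
synthetic = [5.0 * math.exp(0.02 * t) for t in range(100)]
n0, r = fit_exponential_growth(synthetic)
print(f"N0 = {n0:.3f}, r = {r:.4f}")  # N0 = 5.000, r = 0.0200
```

On real counts the residuals of this regression, not just the slope, indicate whether the exponential model is adequate.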

The measurements also indicate that the numbers of forwards and comments are heterogeneous and bursty. Of the 125,150 news items in our data set, 65,772 were forwarded and 69,440 were commented on. As shown in Figure 2, the forwarding strength $S_{\text{forward}}$ (the number of forwards of a news item) and the comment strength $S_{\text{comment}}$ (the number of comments on a news item) both follow power laws with the same exponent.

3. The Dynamics of Online Popularity

To quantitatively analyze the popularity of online microblog news, we consider the number of comments posted per day on a piece of news, denoted $X(t)$ at time $t$. We study

$$\Delta X(t) = \frac{X(t) - X(t-1)}{\langle X \rangle}, \tag{3.1}$$

which represents the relative variation of the measurement per time unit. We use one day as the time unit, and $\langle X \rangle$ denotes the average comment strength. $X(1)$ is the number of comments posted on a piece of news during the first day after its release.
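A minimal sketch of Eq. (3.1) for one news item, assuming the daily counts $X(1), X(2), \dots$ are stored in a list and that $\langle X \rangle$ is the mean daily count of that item over its observed days. The text only calls $\langle X \rangle$ the average comment strength, so this normalization is an assumption; the function name `relative_variation` and the sample counts are hypothetical.

```python
def relative_variation(x):
    """Eq. (3.1): ΔX(t) = [X(t) - X(t-1)] / <X> for one news item.

    x[0] is X(1), the comments posted on the first day after release,
    x[1] is X(2), and so on. <X> is taken as the mean of x (an assumption).
    The returned list is aligned with t = 2, 3, ..., len(x).
    """
    mean_x = sum(x) / len(x)
    return [(x[t] - x[t - 1]) / mean_x for t in range(1, len(x))]

# Hypothetical daily comment counts X(1), ..., X(7) for one news item.
daily_comments = [40, 6, 2, 9, 1, 0, 2]
dx = relative_variation(daily_comments)
# Keep only bursts (positive variations), as done in the text.
bursts = [v for v in dx if v > 0]
print(dx)
print(bursts)
```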

Figure 3 shows the relative variation of comments per time unit. Most news items experience a burst and receive little attention thereafter. A negative relative variation indicates a decrease in popularity; since our main concern is the positive values, events with negative variation are neglected.

Another way to characterize the dynamics of bursty systems is to study the distribution of time intervals between successive events. We analyzed the distribution of time intervals between consecutive comment bursts [21], that is, the intervals between successive positive values of $\Delta X$ (Figure 4). These inter-burst intervals follow a power-law distribution.
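The inter-burst intervals can be extracted as sketched below, assuming a burst is any day with $\Delta X > 0$ and that intervals are counted in days between successive bursts of the same news item; pooling these intervals over all news items would then give a histogram like the one in Figure 4. `burst_intervals` is a hypothetical helper, not code from the study.

```python
def burst_intervals(dx):
    """Waiting times (in days) between consecutive positive entries of a
    ΔX(t) series, i.e. between consecutive comment bursts.

    dx[k] is ΔX on the (k + 2)-th day (see Eq. (3.1)); only the spacing
    between positive entries matters here.
    """
    burst_days = [k for k, v in enumerate(dx) if v > 0]
    return [b - a for a, b in zip(burst_days, burst_days[1:])]

# Hypothetical ΔX series: bursts on entries 1, 2, 5 and 8.
print(burst_intervals([-3.9, 0.5, 0.8, -0.9, -0.1, 0.2, 0.0, 0.0, 1.1]))  # [1, 3, 3]
```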

We use the maximum likelihood estimation (MLE) method [22] together with the Kolmogorov-Smirnov (KS) statistic table [23] to verify whether the power-law fit is a good match to the data. In this case, the KS statistic indicates that the power law is a good fit to the data; the procedure is explained in detail in Section 5.

4. Relationship between Bursty Human Activity Patterns and Popularity

In this section, we focus on the impact of bursty human activity patterns on the dynamics of popularity. First of all, we show that the number of comments for a piece of news 𝑋(𝑡) can be derived from the comment patterns of users.

Assume that a given piece of news is released at time $t_0$ and that all users can comment on it. Comment patterns differ from browsing patterns [24]: every user can comment on a piece of news more than once. Figure 5 shows the comment pattern of one user; each vertical line represents a separate comment on the news. The thick line denotes the first time the user comments on the news after its release at $t_0$. The release time $t_0$ divides the interval $\Delta t$ between two consecutive comments into two parts of length $t_1$ and $t_2$, where $t_1 + t_2 = \Delta t$. The probability that a user comments at time $t_2$ after the news was released is proportional to the number of possible $\Delta t$ intervals. For a user characterized by a power-law intercomment time distribution with exponent $\alpha$ and a minimum time unit $t_m$, the probability of finding a $\Delta t$ interval longer than $t_2$ is

$$p(\Delta t > t_2) = (\alpha - 1)\, t_m^{\alpha - 1} \int_{t_2}^{\infty} (\Delta t)^{-\alpha}\, d\Delta t = \left(\frac{t_2}{t_m}\right)^{-\alpha + 1}. \tag{4.1}$$

In (4.1) we assume that $\alpha > 1$, so that the integral converges.
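Eq. (4.1) is the survival function of a power law with density $(\alpha - 1)\, t_m^{\alpha - 1} \Delta t^{-\alpha}$ on $[t_m, \infty)$. The sketch below compares the closed form with a Monte Carlo estimate that uses Python's `random.paretovariate`, whose shape parameter corresponds to $\alpha - 1$ here. The choice $\alpha = 2.5$, $t_m = 1$ mirrors the collective exponent quoted in the text; the function names are illustrative.

```python
import random

def survival_closed_form(t2, alpha, t_m=1.0):
    """Eq. (4.1): P(Δt > t2) = (t2 / t_m)^(1 - alpha), for alpha > 1 and t2 >= t_m."""
    return (t2 / t_m) ** (1.0 - alpha)

def survival_monte_carlo(t2, alpha, t_m=1.0, n=200_000, seed=0):
    """Estimate P(Δt > t2) by sampling intercomment times with density
    (alpha - 1) * t_m^(alpha - 1) * Δt^(-alpha) on [t_m, infinity)."""
    rng = random.Random(seed)
    # random.paretovariate(k) returns X >= 1 with P(X > x) = x^(-k); with
    # k = alpha - 1 and scale t_m this is exactly the density above.
    hits = sum(1 for _ in range(n) if t_m * rng.paretovariate(alpha - 1.0) > t2)
    return hits / n

for t2 in (2.0, 5.0, 10.0):
    print(t2, survival_closed_form(t2, 2.5), survival_monte_carlo(t2, 2.5))
```

With 200,000 draws the two columns should agree to within about 0.002.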

For users characterized by different exponents, the number of comments $X(t)$ can be calculated analytically as the average of (4.1) over the observed exponent values:

$$X(t) \propto \left\langle \left(\frac{t}{t_m}\right)^{-\alpha + 1} \right\rangle_{\alpha}. \tag{4.2}$$

For simplicity, we assume that $t_m = 1$ and focus on the case in which all users are characterized by the same exponent $\alpha$. For example, at the collective level the intercomment time distribution follows a power law with exponent $\alpha = 2.5$ (Figure 7).

Hence, (4.2) can be written as

$$X(t) \propto t^{-\alpha + 1}. \tag{4.3}$$

Thus, we prove that popularity, measured by the number of comments on a piece of news, decays as a power law with exponent $\alpha - 1$.
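A quick numerical check of Eqs. (4.2) and (4.3) with $t_m = 1$: when every user has the same exponent, the log-log slope of $X(t)$ equals $1 - \alpha$; with a spread of exponents (the values below are hypothetical), the average in (4.2) is dominated at long times by the smallest exponent, so the local slope tends to $1 - \min \alpha$. This sketches the averaging step only and is not fitted to the data.

```python
import math

def comment_decay(t, exponents, t_m=1.0):
    """Eq. (4.2) up to a constant: the average of (t / t_m)^(1 - alpha)
    over the per-user exponents in `exponents`."""
    return sum((t / t_m) ** (1.0 - a) for a in exponents) / len(exponents)

def loglog_slope(f, t1, t2):
    """Slope of log f(t) versus log t between times t1 and t2."""
    return (math.log(f(t2)) - math.log(f(t1))) / (math.log(t2) - math.log(t1))

# All users share alpha = 2.5 (Eq. (4.3)): the slope is 1 - alpha = -1.5.
same = [2.5] * 10
slope_same = loglog_slope(lambda t: comment_decay(t, same), 1.0, 100.0)

# Hypothetical heterogeneous exponents: at late times the smallest one
# dominates, so the local slope approaches 1 - 2.0 = -1.0.
mixed = [2.0, 2.5, 3.0]
slope_mixed = loglog_slope(lambda t: comment_decay(t, mixed), 1e3, 1e4)
print(slope_same, slope_mixed)
```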

In our data set, the number of comments $X(t)$ on a piece of news decays as a power law with exponent $\theta = 1.5$ (Figure 6(a)). More than 80% of comments are posted within the first day; the share drops to only 10% on the second day and becomes small after five days. We also compute the lifetime of all news items (Figure 6(b)) and find that the average lifetime is 5.16 days. The distribution of intervals between two consecutive comments at the collective level follows a power law with exponent $\alpha = 2.5$ (Figure 7).

In summary, we prove that bursty human activity patterns strongly affect the popularity of news. The exponent $\alpha$ characterizing bursty human activity patterns is connected to the exponent $\theta$ of the popularity decay by the relation

$$\theta = \alpha - 1. \tag{4.4}$$

With the measured $\alpha = 2.5$, (4.4) predicts $\theta = 1.5$, which matches the observed decay exponent. These results are supported by both the theoretical analysis and the empirical data.

5. Testing the Power-Law Hypothesis

Recent empirical observations suggest that power-law distributions occur in many natural and man-made systems. Unfortunately, most previous empirical studies of power-law distributed data have not tested the power-law hypothesis quantitatively; instead, they typically rely on qualitative appraisals of the data, for instance, based on visualizations.

In this section, we use a goodness-of-fit test to determine whether the fit is a good match to the data. First, we fit our empirical data to the power-law model by maximum likelihood estimation (MLE) and calculate the KS statistic for this fit [22]. Next, we use the KS table [23] as a basis for accepting or rejecting the power-law hypothesis.

Mathematically, a quantity $x$ obeys a power law if it is drawn from a probability distribution

$$p(x) \propto x^{-\alpha}, \tag{5.1}$$

where $\alpha$ is a constant parameter of the distribution known as the exponent or scaling parameter.

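In the discrete case, the power-law distribution, known as the zeta distribution [25], is

$$p(x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}, \tag{5.2}$$

where $\zeta(\alpha, x_{\min}) = \sum_{k=0}^{\infty} (k + x_{\min})^{-\alpha}$ is the generalized (Hurwitz) zeta function. For $x_{\min} = 1$ it reduces to the Riemann zeta function $\zeta(\alpha) = \sum_{k=1}^{\infty} k^{-\alpha}$.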
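Evaluating Eq. (5.2) requires the generalized (Hurwitz) zeta function $\zeta(\alpha, x_{\min}) = \sum_{k \ge 0} (k + x_{\min})^{-\alpha}$, which the Python standard library does not provide (SciPy's `scipy.special.zeta(x, q)` does). The sketch below assumes only the standard library: it sums the first 50 terms directly and approximates the remainder with the Euler-Maclaurin formula. Function names are illustrative.

```python
import math

def hurwitz_zeta(s, q, n_terms=50):
    """Generalized zeta ζ(s, q) = Σ_{k>=0} (k + q)^(-s), for s > 1 and q > 0.

    The first n_terms terms are summed directly; the remainder is
    approximated by the Euler-Maclaurin formula (integral, half term,
    and the first two Bernoulli corrections).
    """
    head = sum((k + q) ** -s for k in range(n_terms))
    a = n_terms + q
    tail = (a ** (1.0 - s) / (s - 1.0)
            + 0.5 * a ** -s
            + s * a ** (-s - 1.0) / 12.0
            - s * (s + 1.0) * (s + 2.0) * a ** (-s - 3.0) / 720.0)
    return head + tail

def zeta_pmf(x, alpha, x_min=1):
    """Eq. (5.2): P(X = x) = x^(-alpha) / ζ(alpha, x_min) for integers x >= x_min."""
    if x < x_min:
        return 0.0
    return x ** -alpha / hurwitz_zeta(alpha, x_min)

print(zeta_pmf(1, 2.0))                                     # 1 / ζ(2) = 6 / π²
print(sum(zeta_pmf(x, 2.5, 3) for x in range(3, 10**5)))    # close to 1
```

For example, `zeta_pmf(1, 2.0)` is $1/\zeta(2) = 6/\pi^2 \approx 0.6079$.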

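Maximum likelihood estimation of the zeta distribution maximizes the log-likelihood function

$$l(\alpha \mid x) = \prod_{i=1}^{n} \frac{x_i^{-\alpha}}{\zeta(\alpha, x_{\min})}, \qquad L(\alpha \mid x) = \log l(\alpha \mid x) = \sum_{i=1}^{n} \left[ -\alpha \log x_i - \log \zeta(\alpha, x_{\min}) \right] = -\alpha \sum_{i=1}^{n} \log x_i - n \log \zeta(\alpha, x_{\min}), \tag{5.3}$$

where $l(\alpha \mid x)$ is the likelihood function of $\alpha$ given the data $x = \{x_i\}$, $1 \le i \le n$, with $x_i \ge x_{\min}$, and $L(\alpha \mid x)$ is the log-likelihood function. For $x_{\min} = 1$, $\zeta(\alpha, x_{\min})$ is the Riemann zeta function $\zeta(\alpha)$.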
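Eq. (5.3) translates directly into code. The sketch below uses a standard-library Hurwitz zeta (direct sum plus Euler-Maclaurin tail, an implementation choice rather than anything specified in the text) and ignores observations below $x_{\min}$; for $x_{\min} = 1$ the normalization is the Riemann $\zeta(\alpha)$. The sample counts are hypothetical.

```python
import math

def hurwitz_zeta(s, q, n_terms=50):
    """ζ(s, q) = Σ_{k>=0} (k + q)^(-s): direct sum plus Euler-Maclaurin tail."""
    head = sum((k + q) ** -s for k in range(n_terms))
    a = n_terms + q
    return head + (a ** (1.0 - s) / (s - 1.0) + 0.5 * a ** -s
                   + s * a ** (-s - 1.0) / 12.0
                   - s * (s + 1.0) * (s + 2.0) * a ** (-s - 3.0) / 720.0)

def log_likelihood(alpha, data, x_min=1):
    """Eq. (5.3): L(alpha | x) = -alpha * Σ log x_i - n * log ζ(alpha, x_min).

    Observations below x_min are ignored; n counts the remaining ones.
    """
    xs = [x for x in data if x >= x_min]
    sum_log = sum(math.log(x) for x in xs)
    return -alpha * sum_log - len(xs) * math.log(hurwitz_zeta(alpha, x_min))

sample = [1, 1, 1, 2, 1, 3, 1, 2, 5, 1]     # hypothetical counts
for alpha in (1.5, 2.0, 2.5, 3.0):
    print(alpha, round(log_likelihood(alpha, sample), 3))
```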

When $x_{\min} > 1$, this maximum can be obtained for the zeta distribution by finding the zero of the derivative of the log-likelihood function [26]:

$$\frac{d}{d\alpha} L(\alpha \mid x) = -\sum_{i=1}^{n} \log x_i - \frac{n}{\zeta(\alpha, x_{\min})} \frac{d\zeta(\alpha, x_{\min})}{d\alpha} = 0 \quad \Longrightarrow \quad \frac{\zeta'(\alpha, x_{\min})}{\zeta(\alpha, x_{\min})} = -\frac{1}{n} \sum_{i=1}^{n} \log x_i, \tag{5.4}$$

where the prime denotes differentiation with respect to $\alpha$. The most commonly used goodness-of-fit test is the Kolmogorov-Smirnov (KS) test [27], whose statistic is simply the maximum distance between the cumulative distribution functions (CDFs) of the data and the fitted model:

$$K = \max_{x \ge x_{\min}} \left| F(x) - P(x) \right|, \tag{5.5}$$

where $F(x)$ is the CDF of the data and $P(x)$ is the CDF of the power-law model that best fits the data in the region $x \ge x_{\min}$.
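Because $\log \zeta(\alpha, x_{\min})$ is convex in $\alpha$, the log-likelihood (5.3) is concave, so its maximizer is the unique root of Eq. (5.4). The sketch below therefore maximizes $L$ directly by golden-section search on an assumed bracket $[1.01, 6]$ instead of solving (5.4) with $\zeta'$, and then computes the KS distance (5.5). For discrete data both CDFs are step functions that change only at integers, so evaluating them at every integer from $x_{\min}$ to the largest observation covers every candidate for the maximum. All names are hypothetical.

```python
import math

def hurwitz_zeta(s, q, n_terms=50):
    """ζ(s, q) = Σ_{k>=0} (k + q)^(-s): direct sum plus Euler-Maclaurin tail."""
    head = sum((k + q) ** -s for k in range(n_terms))
    a = n_terms + q
    return head + (a ** (1.0 - s) / (s - 1.0) + 0.5 * a ** -s
                   + s * a ** (-s - 1.0) / 12.0
                   - s * (s + 1.0) * (s + 2.0) * a ** (-s - 3.0) / 720.0)

def fit_zeta_mle(data, x_min=1, lo=1.01, hi=6.0, tol=1e-8):
    """Maximize L(alpha | x) of Eq. (5.3) by golden-section search on [lo, hi].

    L is concave in alpha, so the maximizer is the root of Eq. (5.4).
    If the true maximizer lies outside [lo, hi] (e.g. all x_i == x_min),
    the result is pinned near the nearer end of the bracket.
    """
    xs = [x for x in data if x >= x_min]
    n, sum_log = len(xs), sum(math.log(x) for x in xs)

    def loglik(alpha):
        return -alpha * sum_log - n * math.log(hurwitz_zeta(alpha, x_min))

    g = (math.sqrt(5.0) - 1.0) / 2.0          # inverse golden ratio
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if loglik(c) > loglik(d):             # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                                  # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0

def ks_statistic(data, alpha, x_min=1):
    """Eq. (5.5): K = max over x >= x_min of |F(x) - P(x)|.

    F is the empirical CDF and P the fitted zeta CDF. Both only change at
    integers, so checking every integer up to max(data) is sufficient.
    """
    xs = sorted(x for x in data if x >= x_min)
    n, z = len(xs), hurwitz_zeta(alpha, x_min)
    k_max, model_cdf, i = 0.0, 0.0, 0
    for x in range(x_min, xs[-1] + 1):
        model_cdf += x ** -alpha / z           # P(X <= x) under the model
        while i < n and xs[i] <= x:
            i += 1                             # i / n is F(x)
        k_max = max(k_max, abs(i / n - model_cdf))
    return k_max

sample = [1, 1, 1, 2, 1, 3, 1, 2, 5, 1]       # hypothetical counts
alpha_hat = fit_zeta_mle(sample)
print(alpha_hat, ks_statistic(sample, alpha_hat))
```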

First, we fit our empirical data to the power-law model using the MLE method and calculate the KS statistic for this fit. Second, we use the KS table shown in Table 1 to obtain a goodness-of-fit estimate [23]. The KS quantiles in this table were generated from simulations: for each of a set of logarithmically spaced sample sizes, 10,000 power-law distributions were simulated with random exponents between 1.5 and 4.0. Third, we calculate the $P$ value, namely, the fraction of the time that the resulting statistic is larger than the value for the empirical data. Conover [28] gives detailed instructions on how to use the KS table to obtain a goodness-of-fit estimate.
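The paper reads its $P$ value off a precomputed KS table. As an alternative, the same quantity (the fraction of synthetic power-law samples whose KS statistic is larger than the empirical one) can be estimated by direct simulation. To keep the sketch short it swaps in the continuous power law, whose MLE $\hat\alpha = 1 + n / \sum_i \ln(x_i / x_{\min})$ and inverse-CDF sampler have closed forms; it is not the discrete zeta procedure of the paper, and `n_sims` and the seed are arbitrary choices.

```python
import math
import random

def fit_continuous(xs, x_min):
    """Closed-form MLE of alpha for p(x) ∝ x^(-alpha), x >= x_min (continuous)."""
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)

def ks_continuous(xs, alpha, x_min):
    """KS distance between the empirical CDF and P(x) = 1 - (x / x_min)^(1 - alpha)."""
    xs = sorted(xs)
    n, k = len(xs), 0.0
    for i, x in enumerate(xs):
        p = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x: check both sides.
        k = max(k, abs(i / n - p), abs((i + 1) / n - p))
    return k

def mc_p_value(xs, x_min, n_sims=200, seed=1):
    """Fraction of synthetic samples whose KS statistic is larger than the data's.

    Each synthetic sample has the same size as xs, is drawn from the fitted
    power law by inverse-CDF sampling, and is refitted before computing K.
    """
    rng = random.Random(seed)
    alpha = fit_continuous(xs, x_min)
    k_data = ks_continuous(xs, alpha, x_min)
    larger = 0
    for _ in range(n_sims):
        # 1 - random() lies in (0, 1], so the power below is always finite.
        synth = [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in xs]
        if ks_continuous(synth, fit_continuous(synth, x_min), x_min) > k_data:
            larger += 1
    return larger / n_sims

# Shifted exponential data are not power-law distributed, so P should be near 0.
demo_rng = random.Random(0)
exp_like = [1.0 + demo_rng.expovariate(1.0) for _ in range(1000)]
print(mc_p_value(exp_like, 1.0, n_sims=100))
```

A $P$ value near 0 argues against the power-law hypothesis; a larger value means the data are consistent with it, not that a power law is proven.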

Applying this procedure with the KS table, we obtain the goodness-of-fit estimates shown in Table 2.

Table 2 shows that the data sets described in this paper are consistent with a power-law distribution. For the data set in Figure 3, however, most values are less than 1, so the maximum likelihood method above, which assumes integer values $x \ge x_{\min} \ge 1$, cannot be applied. Instead, we use nonlinear least squares and find that these data follow a power law over a certain range.
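The text does not specify the least-squares setup. One possibility, sketched here as an assumption, fits $y = c\,x^{-a}$ on the original (not logarithmic) scale: for fixed $a$ the optimal $c$ has a closed form, so only a one-dimensional search over $a$ remains, done here by golden-section search on an assumed range $[0, 5]$. To fit "over a certain range", the caller would pass only the points inside that range. The function name is hypothetical.

```python
import math

def fit_power_law_nls(xs, ys, a_lo=0.0, a_hi=5.0, tol=1e-10):
    """Nonlinear least-squares fit of y = c * x^(-a) on the original scale.

    For fixed a, the c minimizing Σ (y - c x^(-a))^2 is
    c(a) = Σ y x^(-a) / Σ x^(-2a); the residual is then minimized over a
    by golden-section search, assuming a single minimum in [a_lo, a_hi].
    Returns (c, a).
    """
    def best_c(a):
        return (sum(y * x ** -a for x, y in zip(xs, ys))
                / sum(x ** (-2.0 * a) for x in xs))

    def sse(a):
        c = best_c(a)
        return sum((y - c * x ** -a) ** 2 for x, y in zip(xs, ys))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = a_lo, a_hi
    p, q = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if sse(p) < sse(q):          # minimum lies in [lo, q]
            hi, q = q, p
            p = hi - g * (hi - lo)
        else:                        # minimum lies in [p, hi]
            lo, p = p, q
            q = lo + g * (hi - lo)
    a = (lo + hi) / 2.0
    return best_c(a), a

# Noiseless synthetic data y = 2 x^(-1.5) on 1 <= x <= 20 (illustrative only).
xs = [float(x) for x in range(1, 21)]
ys = [2.0 * x ** -1.5 for x in xs]
print(fit_power_law_nls(xs, ys))
```

Fitting on the original scale weights large $y$ values heavily; a log-log regression would weight all points equally, which is a different estimator.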

6. Conclusion

The main goal of this paper is to explore the impact of bursty human activity patterns on the dynamics of popularity. Through theoretical analysis and empirical data, we prove that bursty human activity patterns are responsible for the power-law decay of popularity. This conclusion is consistent with previous studies [21, 24, 29–31]. Our statistical results also indicate that the numbers of news items and comments grow exponentially, and that the strengths of forwarding and commenting are bursty and follow fat-tailed distributions. To characterize the dynamics of bursty systems, we examine the distribution of the time interval $\Delta t$ between consecutive comment bursts and find that it also follows a power law. Our measurements further indicate that microblog news items have short lifetimes: most comments are posted within the first day, and the average lifetime of all news items is 5.16 days. The average lifetime may vary across social media, but the decay law of popularity is very likely generic, since it does not depend on content but is determined mainly by human activity patterns. Indeed, the exponent $\alpha$ characterizing bursty human activity patterns is connected to the exponent $\theta$ of the popularity decay by the relation $\theta = \alpha - 1$.

Acknowledgments

This work was supported by Program for New Century Excellent Talents in University (NCET-11-0597) and the Fundamental Research Funds for the Central Universities (2012RC1002).