Abstract

Statistical properties of human comment behavior are studied using data from “Tianya” and “Tieba”, two very popular online social systems (forums) in China. We find that both the reply number R and the view number V of a thread in a subforum obey power-law distributions, which indicates that a class of highly popular topics exists. These topics deserve special attention because they play an important role in the formation and control of public opinion. In addition, the relationship between R and V also obeys a power-law function. Based on human comment habits, a model is introduced to explain the view and reply behaviors in the forum. Numerical simulations of the model fit the empirical results well. Our findings are helpful for discovering collective patterns of human behavior and the evolution of public opinion in the virtual society as well as the real one.

1. Introduction

Statistical properties and models of human behavior have received much attention in different scientific fields, such as sociology, psychology, and economics. However, most existing findings are only qualitative, because real data capturing the complexity of human behavior have been lacking. It is usually assumed that human behavior is a Poisson process [1, 2], which is a kind of Markov process. However, some researchers have found that the interevent time distribution of some human behaviors follows a power law, which means that the underlying process is non-Markov [3–14]. This topic attracts growing interest because of its theoretical importance and potential applications.

As an important part of modern life and human dynamics, human behavior on the Internet also attracts more and more attention. Chmiel et al. investigated the flows of visitors migrating between different portal subpages and developed a model of portal surfing in which a browsing process corresponds to a self-attracting walk on weighted networks with a short memory [15]. Grabowski found that the distribution of human activity has the form of a power law [16]. Based on data from “Tianya”, Wu et al. found that the dynamics of human comments in the online society is non-Markov and proposed a model to explain it [17]. All these studies indicate that some kinds of human behavior in online systems are non-Markov and share common statistical properties. More and more researchers treat the forum as a virtual society in order to study the properties and evolution of complex friendship networks [18, 19].

A forum is very important for the spreading of information and public opinions. Many public opinions are formed and then spread in forums. Analyzing user behavior in the forum is helpful not only for understanding human behavior and enhancing information spreading, but also for designing better websites, which in turn matters for information spreading. Recently in China, reports of public opinion being deliberately manipulated through news have attracted more and more attention. There was a report that at least half of the public opinions on the Internet were put forward by companies on purpose. It is therefore very important to study human comment behavior in the forum. Yu et al. analyzed the view and reply data in a forum, which was the beginning of research on human comment behavior in forums. They found that the view and reply numbers of a thread in a sub-forum follow power laws. However, they mainly considered the statistical properties of the behavior and did not present a model to explain the basic mechanism [20].

In this paper, we consider data collected from “Tianya” and “Tieba”, which are very popular online social systems in China; the data are different from those in [17]. We show that both the view number V and the reply number R of a thread in a sub-forum obey power-law distributions, which confirms the finding of Yu et al. [20]. The relationship between R and V is also a power law. These results indicate that a large number of topics are important in the formation and evolution of public opinions. Furthermore, based on human habits, a model is proposed to explain these phenomena, and numerical simulations are given to reproduce the human comment behavior in the forum. We hope this work is useful for understanding complex human behaviors in forums.

This paper is organized as follows: in Section 2, the origin of the data is introduced. The statistical results are presented in Section 3. The model and numerical simulations are presented in Section 4. Finally, our conclusion is given in Section 5.

2. Description of the Original Data

Our data are obtained from the sub-forums of “Tianya” (http://www.tianya.cn) and “Tieba” (http://tieba.baidu.com), which are two of the most popular online social systems in China. Each user is assigned a distinct identity name (ID) in the forums. A topic in a sub-forum is called a thread. A thread is the minimal unit, and it can be divided into a root thread and reply threads. A root thread is a new topic, and the reply threads are related to a root one. The users discuss public opinion in both the root and the reply threads. As of 2010/02/11, there were 33,296,350 IDs in “Tianya,” and about 200,000 IDs on average were online at the same time. The topics and opinions in “Tianya” and “Tieba” reflect part of the public opinion of the real society in China. Our data sets are collected from the threads in four sub-forums. The topics range from public news to personal stories, which suggests that our results hold across different kinds of content. The format of the data is shown in Table 1, where the first column is the title of a thread, the second gives the name of the author of the root thread, the third shows R and V, and the last is the last update time of the thread.

3. Statistical Results

In the forum, the numbers of views and replies of a thread reflect the influence of a topic. Moreover, more replies mean more discussion and more communication. These two quantities play an important role in public opinion formation and in web design. Hence, we study the statistical properties of R and V for the threads of each sub-forum. Four sub-forums are randomly selected as our data sets. Their topics and some of their properties are listed in Table 2.

The distributions of R and V in each sub-forum are shown in Figure 1, from which we can clearly see that all the distributions are power laws, although the threads differ in their contents. The exponents vary across sub-forums. These results show that the process of human comments is non-Markov, as is the human dynamics of letter and e-mail communications, web browsing, online movie watching, and broker trades. The heavy tail of the distribution allows for many more threads with large R and V than a Poisson process would. A thread with larger R and V has much more influence on public opinion. The number of threads of this kind is so large that they cannot be ignored: a large population reads such a thread, and their opinions may be influenced by it. These threads therefore deserve close attention.
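As a rough illustration of how a power-law exponent such as those in Figure 1 might be estimated from a list of view or reply counts, the following minimal sketch uses the standard maximum-likelihood estimator for a continuous power-law tail. The function names, the cutoff `x_min`, and the synthetic data are assumptions made for illustration; the fitting procedure actually used for Figure 1 is not stated in the text.

```python
import math
import random


def estimate_powerlaw_exponent(values, x_min):
    """Maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= x_min.

    Uses the continuous approximation alpha = 1 + n / sum(ln(x_i / x_min)).
    Only observations at or above the (hypothetical) cutoff x_min are used.
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("all tail observations equal x_min; exponent undefined")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, alpha, x_min, rng):
    """Draw n synthetic samples from a continuous power law by inverse transform."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for the view numbers of one sub-forum (not real data).
    synthetic_views = sample_powerlaw(20000, alpha=2.5, x_min=10.0, rng=rng)
    print(estimate_powerlaw_exponent(synthetic_views, x_min=10.0))
```

For integer counts with a small cutoff, this continuous approximation can be biased, so a discrete estimator may be preferable in practice.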

It is well known that the more views a thread has, the more replies it receives. However, the quantitative relationship between R and V, which is a basic property of a thread, is not obvious. Hence, we next focus on the relationship between view and reply behaviors, shown in Figure 2. We find that it appears as a straight line in a log-log plot, which means that R follows a power law in V. The nonlinear relationship also means that the reply number increases more slowly than the view number when the view number is large enough. This indicates that the users' interest in replying decreases as V increases.
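The straight line in a log-log plot can be quantified by an ordinary least-squares fit of log R against log V. The sketch below is one simple way to do this; the function name and the sample numbers are hypothetical, and the fitting method used for Figure 2 is not given in the text.

```python
import math


def fit_loglog_slope(views, replies):
    """Least-squares fit of log(R) = slope * log(V) + intercept.

    Threads with zero views or zero replies are skipped, because their
    logarithm is undefined. Returns (slope, intercept).
    """
    pairs = [(math.log(v), math.log(r))
             for v, r in zip(views, replies) if v > 0 and r > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two threads with positive V and R")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0.0:
        raise ValueError("all view numbers are identical; slope undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up (V, R) values lying exactly on R = 2 * V^0.7, for illustration only.
    example_views = [10, 100, 1000, 10000]
    example_replies = [2.0 * v ** 0.7 for v in example_views]
    print(fit_loglog_slope(example_views, example_replies))
```

A fitted slope below one corresponds to the observation that the reply number grows more slowly than the view number.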

4. The Model and the Simulations

To better understand the empirical observations in Section 3, we propose a model based on our intuitive experience of human comment habits. We observe that the view number of each sub-forum increases more and more quickly as time evolves. There are many threads in each sub-forum, and each thread is viewed according to its content and its previous number of views. Hence, our model is defined by the following scheme.

Step 1 (growing). Initially, there are a few threads in the sub-forum, and each thread has a random small V and R. At each step, a new thread is created, and a number of views fall on the old threads. Every old thread has some probability of being viewed.

Step 2 (view habit). The probability that an old thread is viewed at each step is based on its attraction. The attraction of a thread at a given time reflects its previous view number V, together with an initial attraction that differs from topic to topic.

Step 3 (reply habit). At each step, when a user views a thread, the user replies to it with a certain probability.
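The three steps above can be sketched as a short simulation. Several details are assumptions made only for illustration, because the exact formulas are not recoverable from the text: the attraction is taken to be linear (`initial_attraction + V`), the same `initial_attraction` is used for every thread, a fixed number `views_per_step` of views is distributed at each step, and the reply probability `reply_prob` is a constant. All names and parameter values are hypothetical.

```python
import random


def simulate_forum(steps, views_per_step, initial_attraction, reply_prob, seed=None):
    """Simulate view and reply counts of threads in one sub-forum.

    Returns two lists (views, replies), one entry per thread.
    Assumed attraction of thread i: initial_attraction + views[i].
    """
    if initial_attraction <= 0:
        raise ValueError("initial_attraction must be positive")
    rng = random.Random(seed)

    # Step 1 (growing): a few initial threads with random small V and R.
    views = [rng.randint(1, 3) for _ in range(3)]
    replies = [rng.randint(0, v) for v in views]

    for _ in range(steps):
        # Step 2 (view habit): old threads are viewed with probability
        # proportional to their attraction. Weights are frozen within a step.
        weights = [initial_attraction + v for v in views]
        viewed = rng.choices(range(len(views)), weights=weights, k=views_per_step)
        for i in viewed:
            views[i] += 1
            # Step 3 (reply habit): each view leads to a reply with
            # the (assumed constant) probability reply_prob.
            if rng.random() < reply_prob:
                replies[i] += 1
        # A new thread is created at every step, with no views or replies yet.
        views.append(0)
        replies.append(0)
    return views, replies


if __name__ == "__main__":
    v, r = simulate_forum(steps=2000, views_per_step=5,
                          initial_attraction=1.0, reply_prob=0.2, seed=42)
    print(max(v), max(r), len(v))
```

With a constant reply probability, R stays roughly proportional to V, so this sketch does not by itself reproduce the nonlinear relationship between R and V; a reply probability that decreases with the view number would be one way to explore that, again as an assumption rather than the model's stated rule.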

Mathematically, the model is similar to the growing networks in [21]. Following the analysis of growing networks in [21], we obtain that the distribution of V is a power law at sufficiently large times.
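To indicate why a power law arises, a mean-field sketch can be written under explicit assumptions that are not taken from the text: the attraction of thread i is linear, A + V_i(t); exactly m views are distributed at each step; one new thread with V = 0 is created per step at time t_i. The symbols A, m, and β are introduced here only for illustration.

```latex
\begin{align}
\frac{dV_i}{dt}
  &= m\,\frac{A + V_i(t)}{\sum_j \bigl(A + V_j(t)\bigr)}
   \approx \frac{m\,\bigl(A + V_i(t)\bigr)}{(A + m)\,t}, \\
A + V_i(t)
  &= A\left(\frac{t}{t_i}\right)^{\beta},
  \qquad \beta = \frac{m}{A + m}, \\
P(V)
  &\propto (A + V)^{-(1 + 1/\beta)}
   = (A + V)^{-(2 + A/m)} .
\end{align}
```

The first line uses the fact that after t steps there are about t threads and about mt views in total, so the normalizing sum is approximately (A + m)t. The last line follows because creation times t_i are uniformly distributed over [0, t]. Under these assumptions the exponent would be 2 + A/m.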

To compare our model with the empirical observations, let us take sub-forum C in our data sets as an example. Here we use the parameters in the simulation. The results are shown in Figure 3. Figure 3(a) shows that the distribution of the view number is indeed a power law with an exponent similar to that obtained from the data. Figure 3(b) shows that the reply number also obeys a power-law distribution. The nonlinear relationship between the view and reply numbers is shown in Figure 3(c) and agrees with that obtained from the data. In Figure 3(d), we further study the relationship between the model parameter and the slope; the two vary in opposite directions, one decreasing as the other increases. From the analyses above, we can see that the proposed model describes well the most important features of human view and reply behaviors in online social systems.

5. Conclusion

In this paper, we analyze the statistical properties of view and reply behaviors in online social systems. We find that they are different types of interactive human dynamics, both of which are non-Markov. The view and reply numbers follow power-law distributions, and the relationship between them also follows a power law. A model based on the attraction of individual threads is introduced to explain this complex human behavior. Numerical simulations of the model fit the empirical results well. Our work is useful for understanding complex human behavior in real society, for example, discussion behavior in a meeting or group communications in trunked mobile telephony [22]. We expect that a quantitative understanding of human view and reply behaviors, when combined with additional content analyses, will open a new perspective on distinguishing fraudulent public opinions from genuine ones.

Acknowledgment

This paper is supported by the National Natural Science Foundation of China (Grants nos. 61104152, 60804046), the Fundamental Research Funds for the Central Universities (Grant no. 2011R01), the Foundation for the Author of National Excellent Doctoral Dissertation of China (Grant no. 200951), and the Asia Foresight Program under NSFC Grant (Grant no. 61161140320).