Abstract

This paper studies human behavior in the largest social network system in China, the Sina Microblog system. By analyzing large-scale real-life data, we find that the message releasing interval (intermessage time) obeys a power-law distribution at both the individual level and the group level. Statistical analysis also reveals that human behavior in social networks is mainly driven by four basic elements: social pressure, social identity, social participation, and social relation between individuals. Empirical results show the impact of the four elements on human behavior and the relations between them. To further understand the mechanism of these dynamics, we introduce a hybrid human dynamic model that combines the “interest” of individuals with the “interaction” among people and incorporates the four elements simultaneously. To evaluate the model, we simulate both two-agent and multiagent interactions on a real-life social network topology. The simulation results are consistent with the empirical studies, so the model provides a good understanding of human dynamics in social networks.

1. Introduction

The rapid development of social networks provides a unique source for analyzing human dynamics in the modern age. With the evolution of mobile communication technology, people can use social applications such as Twitter and Facebook more conveniently. This growth in applications is accompanied by a surge of data, and the era of big data and complex systems gives us an unprecedented opportunity to study human behavior [1]. In China, Sina Microblog (http://en.wikipedia.org/wiki/Sina_Weibo), which is akin to a hybrid of Twitter and Facebook, is the most popular social network site for information propagation and discussion. As of May 2012, Sina Microblog had more than 300 million registered users and generated more than 100 million microblogs every day. It accounts for 57% of microblog users and 87% of microblog activity in China, and 60% of its active users log in through mobile terminals (http://tech.sina.com.cn/i/2012-05-15/12307109653.shtml). Such systems hold a large amount of information, both on individual behaviors and on human interactions, and therefore offer great potential for analyzing human behavior and understanding human dynamics. The study of complex systems also attracts researchers in various fields [2–7].

Traditional studies usually assume that human behavior is random and can thus be modeled as a Poisson process [8]. This assumption leads to an exponential interevent time distribution of human activities. However, many recent empirical studies show that this assumption does not hold. For example, Barabási first discovers that the time interval between sending an email and receiving a reply follows a heavy-tailed power-law distribution [9]. Afterwards, similar statistical properties in human dynamics are found empirically in various datasets, including web browsing [10], short message sending [11], cyber-physical networking [12], netizens’ behaviors on forums [13], and movie watching [14].

To understand the intrinsic cause of this heavy-tailed property, Barabási and Vázquez first propose a priority queuing model and successfully explain the phenomenon based on a task queue [9, 15, 16]. Subsequently, researchers design various human dynamic models as extensions. Blanchard and Hongler propose an aging model which assumes that the priority of each task follows the “earliest deadline first” principle [17]. Deng et al. treat the task deadline as a restrictive condition and study its influence on the waiting time of the task [18]. Dall’Asta and other researchers apply an economic optimization method to the process of task fulfillment [19]. These models are largely based on task priority queuing and are not suitable for non-task-driven scenarios such as movie watching, feasting, and microblogging for entertainment.

Vazquez first proposes a memory model to analyze human dynamics [20]. Memory models assume that humans perceive their past activities and therefore accelerate or reduce their activity rates according to their memories. Building on the memory model, Ming-Sheng and coworkers propose an interest-driven model for human dynamics, in which people’s interest in an activity changes with the frequency of their involvement. For example, interest fades after frequent involvement but may suddenly revive after a long period of indifference. This change in interest may cause the heavy-tailed distribution of behavior [21]. Han et al. also note that people’s interest in a certain activity may change with their feelings and thus propose a self-adapting human dynamic mechanism [22]. Yan et al. study people’s interest in the Sina Microblog community and point out that social identity, defined as others commenting on or forwarding a user’s message, is an important factor in arousing user interest [23]. Such interest models provide a good understanding of the possible dynamic mechanisms in their scenarios. However, they focus on individual behavior and are not suitable for social network scenarios, where there are not only individual behaviors but also interactions between individuals.

The impact of human interaction on the patterns of human dynamics is first addressed by Oliveira and Vazquez [24]. They provide a minimal model that consists of two priority queues, that is, an interacting one and a noninteracting one. Human interaction is taken into account by executing the interacting task only when both individuals choose to execute it (i.e., an AND-type protocol for the execution of the interacting task). The model suits scenarios in which the two interacting agents must complete interactive work synchronously, such as participating in a conference call. Subsequently, a series of extended models are proposed, for example, the OR-type protocol model [25, 26] and the short message interaction model [27]. However, not all interaction behavior follows an AND-type or OR-type protocol. Besides, these works mainly focus on two-agent interaction scenarios and do not capture the real structural features of social networks. Recently, Xiao et al. study human dynamics in an Internet forum system and highlight the real-life social network with arbitrary relationships [28].

In the microblog community, a representative online social network characterized by mobility, people can express their viewpoints on what they see and feel, participate in the discussion of social events, and receive praise or criticism anytime and anywhere. User behavior is influenced by various factors such as the user’s work environment, social identity, personality, and social circles. This kind of human behavior is not task-driven, and it cannot be explained by interest alone, by interaction alone, or by a simple mixture of the two.

To find what drives human dynamics in social networks, we study the combined impact of interest and node influence (i.e., interaction) on human dynamics in arbitrary social networks. We analyze human behaviors in China’s largest online social network (Sina Microblog), including posting a new microblog and commenting on or forwarding an existing one. Based on the Sina datasets, empirical evidence shows that different types of intermessage time distributions follow a power law at both the individual level and the group level. We then propose a human dynamic model that combines individual behavior (i.e., interest) and node influence (i.e., interaction). Rather than simply plugging the two parts together, we build a stronger model that integrates various useful parameters mathematically in our modeling and simulation. These parameters reflect the factors affecting user behavior. When tested on real-life social network datasets, the simulation results of our model are consistent with the empirical observations, which implies that our model offers a suitable explanation of the power-law properties in human dynamics.

This paper is organized as follows. After the introduction in Section 1, Section 2 describes the origin of the data; Section 3 shows the statistical analysis; Section 4 presents our hybrid model on the combination of interest and interaction; Section 5 compares the results of simulation and the empirical ones; Section 6 concludes this paper.

2. Data Description

Empirical data are collected from Sina Microblog (http://weibo.com), the largest online social networking site in China. At the time of writing, it has more than 300 million registered users (with unique IDs) and more than 100 million microblogs per day. The news and topics in Sina Microblog cover all aspects of life, so it provides a rich dataset that reflects Chinese people’s activities and dynamics. The Sina Microblog data has been studied in [23], which analyzes the intermessage time distribution using a simple model based on individual behavior. In this paper, we study a richer, hybrid model that considers both interest and interaction.

In the process of data collection, we randomly select a user as the starting point (e.g., the first author’s Sina ID) and crawl this ID’s personal profile and links using a breadth-first traversal of the graph. Each user is assigned a serial number according to the download sequence. In addition, we crawl the microblogs that each user releases, the comments that each microblog receives, and the relationships between users. The logical view of the database is shown in Figure 1. There is a many-to-many relationship between users, a one-to-many relationship between users and microblogs, a one-to-many relationship between users and comments, and a one-to-many relationship between microblogs and comments. In total, 49,556 user profiles are downloaded. From 2011/08/21 to 2012/02/22, these users sent 3,057,635 microblogs over the six months. These microblogs were commented on 185,079,821 times and forwarded 506,765,237 times. We also downloaded 61,880 relations, which are all the social relationships of the users whose serial number is less than 200. It is worth noting that the relationship field in the relation table describes the social relation between user A and user B. This field takes one of three values, 1, 2, or 3, which mean that A follows B, A is followed by B, and A and B follow each other, respectively.
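The breadth-first numbering described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory dictionary `follow_graph` in place of the live crawler, and it assumes that serial numbers start at 1; neither detail is stated in the paper.

```python
from collections import deque

def bfs_serial_numbers(follow_graph, start_user):
    """Assign serial numbers to users in breadth-first discovery order.

    follow_graph is a hypothetical stand-in for the crawled link lists:
    a dict mapping a user ID to an iterable of linked user IDs.
    Numbering from 1 is an assumption made for this sketch.
    """
    serial = {start_user: 1}
    queue = deque([start_user])
    while queue:
        user = queue.popleft()
        for neighbor in follow_graph.get(user, ()):
            if neighbor not in serial:
                # The next unused number follows the download sequence.
                serial[neighbor] = len(serial) + 1
                queue.append(neighbor)
    return serial

toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
order = bfs_serial_numbers(toy_graph, "a")
print(order)
```

Users closer to the starting point receive smaller serial numbers, so a cut such as "serial number less than 200" selects the users discovered earliest in the traversal.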

3. Statistical Analysis

This section presents the empirical study of the Sina Microblog community. We study human behavior in the social network from three sides. First, we analyze the intermessage time distribution at the individual level and at the group level. Second, we propose four basic social elements based on the user behavior data and investigate their impact on user behavior. Last, we analyze the intrinsic relations between these elements. The work in this section is the basis of our proposed model. The details are as follows.

Before the analysis outlined above, we first compute basic statistics of the data. Among the 49,556 users, 45,579 have posted messages. From 2011/08/21 to 2012/02/22, 23,100 users posted 3,057,635 messages, which were commented on 185,079,821 times and forwarded 506,765,237 times. Considering the number of messages that each user releases, 22,770 of the 23,100 users fall within the range examined here, accounting for 98.571% of these users. Following the approach in [14, 29], we take 100 messages as one step and divide the users in this range into 10 groups. Twenty users are randomly selected in each group. Empirical results show that the intermessage time distribution at the group level obeys a power law. For lack of space, we do not show all 10 plots but present one group in Figure 2(a). Figure 2(b) shows the relationship between the power exponent and the number of messages in each group. We observe a positive correlation between the two. Next, the intermessage time distribution at the individual level is analyzed by random sampling. Fifty users are randomly selected in each group. Empirical results show that the intermessage time distribution of most users obeys a power law.
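As an illustration of how intermessage times and a power-law exponent can be obtained from posting timestamps, the following minimal sketch may help. The paper does not state its fitting procedure; the continuous maximum-likelihood estimator used here is a swapped-in standard technique, and the function names and the cutoff `tau_min` are assumptions of this sketch.

```python
import math

def intermessage_times(timestamps):
    """Return the positive gaps between consecutive posting times.

    Zero-length gaps (simultaneous posts) are dropped, an assumption of
    this sketch, because they cannot be placed on a logarithmic scale.
    """
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]

def powerlaw_exponent_mle(intervals, tau_min=1.0):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Uses the standard estimator 1 + n / sum(ln(tau / tau_min)) over the
    intervals that are at least tau_min.
    """
    tail = [t for t in intervals if t >= tau_min]
    log_sum = sum(math.log(t / tau_min) for t in tail)
    if log_sum == 0:
        raise ValueError("need at least one interval larger than tau_min")
    return 1.0 + len(tail) / log_sum

# Posting times in minutes for one hypothetical user.
gaps = intermessage_times([0, 5, 7, 7, 20, 300, 310, 2000])
print(gaps, powerlaw_exponent_mle(gaps))
```

With real data, the estimate is only meaningful when the sample is large and the tail is close to a straight line on a log-log plot, which is what figures such as Figure 2(a) are used to check.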

Based on these basic statistics, we propose four basic elements that drive human behavior in a social network system: social pressure, social identity, social participation, and social relation, each represented by its own mathematical symbol. Social pressure is the impact on individual behavior of the social environment, working conditions, social circle, and other exogenous factors. This effect manifests as regularity in users’ messaging time and messaging amount. Figure 3(a) shows the relation between messaging time and messaging amount for all users over 24 hours. The statistical results are fully consistent with the data released by Sina (http://tech.sina.com.cn/i/2012-05-15/12307109653.shtml). Figure 3(b) shows a similar experiment at the individual level, with 4 randomly selected users. Different users have different habits. We consider that these differences reflect how individual interests, habits, and hobbies are expressed under social pressure.
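A 24-hour activity profile of the kind plotted in Figure 3 can be computed as in the sketch below. The function name `hourly_profile` and the use of `datetime` objects as input are assumptions for illustration, not the authors' code.

```python
from collections import Counter
from datetime import datetime

def hourly_profile(post_times):
    """Return the fraction of messages released in each hour of the day.

    post_times is an iterable of datetime objects (hypothetical input
    format). The result is a list of 24 fractions that sum to 1, or all
    zeros when no messages are given.
    """
    counts = Counter(t.hour for t in post_times)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts.get(hour, 0) / total for hour in range(24)]

sample = [
    datetime(2012, 1, 1, 9, 0),
    datetime(2012, 1, 1, 9, 30),
    datetime(2012, 1, 2, 21, 15),
    datetime(2012, 1, 3, 21, 40),
]
profile = hourly_profile(sample)
print(profile[9], profile[21])
```

Computed over all users, such a profile gives a group-level curve; computed per user, it exposes the individual habits discussed above.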

Social identity is the number of comments that each message attracts: for a given user, it is the number of comments the user receives divided by the number of messages the user releases. Figure 4(a) is the cumulative probability distribution of social identity over all users who have released messages. Because of the pronounced long tail, Figure 4(b) shows the same result over a restricted range of social identity. It can be found that 90.939% of the users fall below the cutoff considered. Moreover, we take 100 as one step and divide the users into 10 groups by social identity. Empirical results show that the intermessage time distribution at the group level obeys a power law, similar to the statistical results in Figure 2(a). Unlike Figure 2(b), however, the power exponent does not have a positive correlation with social identity. It can be concluded that social identity, which reflects endogenous factors of the user such as charisma, cannot change the user’s interest over the long term. However, we find that the interest of most users is excited in the short term, in sync with a short-term surge of social identity. Figure 5 shows the message releasing sequence of one randomly selected user, at the time scale of the original data. The vertical lines represent the number of messages in one day, and the black nodes represent the maximum comment number on the same day. The figure marks the synchronized surges of social identity and message number. The results indicate the short-term stimulus effect of social identity and also show the strongly real-time character of the microblog system.

Social participation refers to the proportion of a user’s messages that are forwarded from others. This parameter reflects endogenous factors of users, such as participation in social events and social topics. For a given user, it is the number of messages the user forwards divided by the number of messages the user releases. Figure 6 is the cumulative probability distribution of social participation over all users. It can be found that social participation obeys a uniform distribution. In addition, we group users by social participation and analyze its relationship with the power exponent; the results show no significant correlation between them. The results indicate that social participation cannot change a user’s interest but determines the probability of forwarding messages from others or of joining a debate about social events.
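The two per-user ratios defined above, social identity and social participation, can be sketched as follows. The record layout (a list of dictionaries with `comments` and `is_forward` keys) is hypothetical, and counting forwarded messages among the user's released messages is an assumption of this sketch.

```python
def user_elements(messages):
    """Compute (social identity, social participation) for one user.

    messages is a list of dicts with hypothetical keys:
      "comments":   number of comments the message received
      "is_forward": True if the message was forwarded from another user
    Social identity is taken as average comments per message, and social
    participation as the share of messages that are forwards.
    """
    count = len(messages)
    if count == 0:
        return 0.0, 0.0
    total_comments = sum(m["comments"] for m in messages)
    forwarded = sum(1 for m in messages if m["is_forward"])
    return total_comments / count, forwarded / count

example = [
    {"comments": 3, "is_forward": False},
    {"comments": 1, "is_forward": True},
    {"comments": 0, "is_forward": False},
    {"comments": 4, "is_forward": False},
]
identity, participation = user_elements(example)
print(identity, participation)
```

Because social participation is a proportion, it always lies between 0 and 1, whereas social identity is unbounded and long-tailed, which is why the two are treated differently in the analysis.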

Social relation is the relationship between two users. As introduced in Section 2, any two users can be in one of three relations: following, followed, or following-followed. A fourth case is that the two users have no relation at all. Through statistical analysis, we find that many users interact heavily with only a few of their friends. In particular, for about 60% of the users, more than 80% of their interactions involve fewer than 8 close friends. This shows that most users have their own fixed social circle. Social relation cannot stimulate user interest but affects the probability of interaction between users.
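The concentration of interaction on a few friends can be measured with a sketch like the one below. The input format (one partner ID per comment or forward) and the function name are assumptions made for illustration.

```python
from collections import Counter

def top_friend_share(interaction_partners, k):
    """Return the share of a user's interactions that go to the k most
    frequent partners.

    interaction_partners is a hypothetical list with one partner ID per
    interaction (comment or forward).
    """
    counts = Counter(interaction_partners)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total

partners = ["u1"] * 6 + ["u2"] * 3 + ["u3"]
share = top_friend_share(partners, k=2)
print(share)
```

A user matches the pattern reported above when this share exceeds 0.8 for a value of `k` below 8.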

After these basic elements are proposed, the intrinsic relations between them are further analyzed. According to the definitions, each user has a unique social pressure, social identity, and social participation but many social relations with different friends. Besides, the impact of social pressure on user behavior is mainly reflected in the users’ messaging time and messaging amount. Therefore, the analysis focuses on the relation between the number of messages, social identity, and social participation. Similar to the method above, the users are divided equally into 10 groups by their number of messages. For lack of space and for convenience of visualization, three groups are selected to show the relation between the three quantities in Figure 7. It is worth noting that social identity is normalized. As shown in Figure 3(b), since the social identity of most users is very small, we set a threshold parameter of 50 and normalize social identity against it. After this processing, social identity and social participation lie in comparable ranges. Figure 8(a) shows the percentage of users in each group whose social identity exceeds a given level, and Figure 8(b) shows the corresponding result for social participation. We observe that the percentage of more attractive users increases as the number of messages grows, while most users lose their social participation as the number of messages grows. So it can be concluded that there is a negative correlation between social identity and social participation as the number of messages grows.

4. Model

To understand the intrinsic mechanism of human dynamics in social networking, we propose a rich model in this section. This model considers both the endogenous dynamics of an individual (interest) and the interaction with the social environment (interaction); the model is therefore hybrid. From the interest aspect, a person’s enthusiasm for contributing to the social network, whether active or inactive, is driven by social pressure and social participation. Ming-Sheng and Han et al. have proposed interest-driven human dynamics models for scenarios such as web browsing and movie watching [21, 22]. However, these models do not identify the reasons underlying the change of interest. They are also based on a single agent and are not suitable for social network scenarios, which are characterized not only by individual behavior but also by interaction between agents. From the interaction aspect, the behavior of each individual can be affected by its surroundings (i.e., the social identity of the neighboring nodes and the social relation with them). Furthermore, user behavior is also influenced by the pronounced time-limit characteristic of the microblog system. Therefore, we study a hybrid model that combines the impact of interest and interaction. Moreover, the four basic elements that drive human behavior are built into the model. The key points of the model are as follows.

(1) Social Network. People (e.g., registered users of the Sina Microblog system) are formalized as a directed, weighted graph representing the social network. Each individual user is a node in the node set, and each node carries its own social pressure, social identity, and social participation. A directed edge set represents the social relationships in the network: there is a directed edge from one node to another if the first follows the second. For each node, we consider its adjacent node set, the set of nodes that it follows, and the set of nodes that follow it. Each ordered pair of nodes has a distance, a variable determined by their social relation. The distance takes one of three values, corresponding to the first node following the second, the two nodes following each other, and the first node not following the second, respectively. These three values are adjustable parameters, and we impose a constraint on them.

(2) Time Discretization. Time is discretized into steps (e.g., one minute in analyzing our Sina datasets). Therefore, people in the social network act or remain inactive at timestamps measured in minutes.

(3) Action. At each timestamp, an arbitrary node releases a message with a probability that is related to its social pressure, which affects the messaging time and messaging amount of users. The value of this probability comes from the statistical results shown in Figure 3. Once a node launches a new message, the message is sent to the waiting queue of every neighbor node, and the current timestamp is recorded as the launch time of the message.

(4) Interaction-Hybrid Interest. If a node does not launch a new message at a given timestamp, it may comment on or forward one message in its waiting queue with some probability. Once the node decides to comment or forward, the message is deleted from its waiting queue and a new comment/forward message is sent to the launcher of the original message. We assume that this probability decreases as time goes by, and we use a simple linear decline function to describe this change of interest. On the other hand, from the interaction viewpoint, we include social elements such as the social identity of a node in the function. The probability is thus defined with respect to the launcher of each message in the waiting queue.

(5) Time Limit. The statistical analysis in the previous section shows that the microblog system is characterized by its real-time nature. People may easily shift their focus from an old topic to a new one as time goes by. A threshold parameter representing the maximum time limit is set at 1440 min (one day) according to Figures 3 and 5. If a message is not commented on or forwarded within this time limit, it is dropped from the waiting queue.
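The five key points can be sketched as a small discrete-time simulation. This is an illustration under stated assumptions, not the authors' implementation: the per-node posting probability `post_prob`, the per-launcher factor `base`, the exact form of `reply_probability`, and the random choice of which queued message to consider are all hypothetical. Only the one-minute time step, the linear decline with message age, the waiting queues, and the 1440-minute limit are taken from the description above.

```python
import random

T_MAX = 1440  # maximum time limit in minutes (one day), as in the text

def reply_probability(age, base, t_max=T_MAX):
    """Hypothetical linearly declining comment/forward probability.

    base stands in for the dependence on social identity, social
    participation, and distance, whose exact form is not reproduced here.
    """
    if age < 0 or age > t_max:
        return 0.0
    return base * (1.0 - age / t_max)

def simulate(followers, post_prob, base, steps, seed=0):
    """Run the sketch and return {node: list of action timestamps}.

    followers[u] lists the nodes that receive u's new messages; every
    node must appear as a key. All arguments are assumptions.
    """
    rng = random.Random(seed)
    queues = {u: [] for u in followers}   # entries: (launcher, launch_time)
    actions = {u: [] for u in followers}
    for t in range(steps):
        for u in followers:
            # Time limit: drop messages older than T_MAX minutes.
            queues[u] = [(v, t0) for (v, t0) in queues[u] if t - t0 <= T_MAX]
            if rng.random() < post_prob[u]:
                # Action: release a message and push it to neighbor queues.
                actions[u].append(t)
                for w in followers[u]:
                    queues[w].append((u, t))
            elif queues[u]:
                # Interaction: consider one waiting message (random pick).
                idx = rng.randrange(len(queues[u]))
                launcher, t0 = queues[u][idx]
                if rng.random() < reply_probability(t - t0, base[launcher]):
                    queues[u].pop(idx)
                    actions[u].append(t)
                    # The comment/forward goes back to the launcher.
                    queues[launcher].append((u, t))
    return actions

net = {"a": ["b"], "b": ["a"]}
acts = simulate(net, {"a": 0.01, "b": 0.01}, {"a": 0.2, "b": 0.2}, steps=5000, seed=1)
print({u: len(ts) for u, ts in acts.items()})
```

The gaps between consecutive entries of each action list are the simulated intermessage times, which can then be compared with the empirical distributions.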

Mathematically, given that one message is released by node at , the probability of being commented or forwarded by at time step is

Then

Based on the analysis above, the intermessage time distribution of a node follows a power law whose exponent is determined by the model parameters. At the individual level, the empirical experiments shown in Figures 4(b) and 8(a) indicate that the social identity of a user is usually very small, and Figures 6 and 8(b) indicate that the social participation of a user is a fixed value. At the group level, the distribution of social identity obeys a power law, as shown in Figure 4(a), and the distribution of social participation is uniform, as shown in Figure 6.

5. Simulation

To validate our hybrid model, the simulation is divided into two steps. In the first step, the simulation is carried out for a scenario with two agents. The purpose of this experiment is to simplify the model and highlight the effect of the basic social elements on human behavior at the individual level. The simplification is reasonable because the statistical analysis shows that most users have their own fixed social circle. In the second step, we build a network from real user relation data and simulate group behavior. By emphasizing the topology of the real network, we further study the principles of human dynamics in the complex system.

For the scenario of interaction with two agents, it is assumed that they are user and user . As mentioned in Section 4, our model has four kinds of main parameters, that is, , , , and . They correspond to the four basic elements above: is a function of timestamp . Its value comes from empirical experiments. We select the mean value in Figure 3(a) as . and . From the above analysis, is a small positive integer for the major user, and is a little bigger than . We assume , based on the analyzing results in Figure 4. For a specific user, is a fixed value.

From the definition of model, we know , . In order to reflect the interaction between the agent, the social relation between and is assumed to be mutual, namely, .

The timestamp runs from 0 to 60 (min) × 24 (h) × 180 (d) = 259,200 minutes, which is consistent with the empirical data. The intermessage time distribution of the first user obeys a power law, as shown in Figure 9(a), and the result for the second user is similar. By the above analysis, the distance has the largest adjustable range of all the parameters. With the other parameters fixed, the effect of the distance on the power exponent is shown in Figure 9(b). We observe that as the distance changes from 0.1 to 10, the power exponent varies from 0.62713 to 2.9092. This covers the range of the power exponent in the empirical experiments. Theoretically, the distance may be arbitrarily small. In practice, there is always some distance from any friend, so the distance cannot be a very small parameter. On the other hand, when the distance is large enough, the intermessage time distribution starts to lose its power-law character. In addition, for most users, social identity is very small and stable, and its effect on the power exponent is not significant. However, surges in user behavior are influenced by social identity in the short term, and the value of social identity affects the amplitude of the surge. The synchronized surge of social identity and message number is shown in Figure 9(c), which verifies that our model simulations are consistent with the empirical results in Figure 5. Furthermore, if the relation is one-way (we assume that one user follows the other), synchronization occurs in one direction only, and only when the relevant parameter is big enough; the user who is followed does not interact with the follower.

In the second step, we build the network from the real relations of Sina users and simulate human behavior at the group level. As mentioned in Section 2, 61,880 relations are downloaded, which include all the social relations of the users whose serial number is less than 200. The social network of these people is shown in Figure 10. In this graph, black edges denote mutual relations and gray edges denote one-way relations. Each node is labeled with a number. Our simulations are based on this network. The users are divided into 5 groups according to their number of messages. For each node, there are five main parameters. The first four can be calculated from the empirical data. The fifth, the distance between a pair of nodes, has three possible values, which are set to fixed values in the simulation. For lack of space, only the intermessage time distribution of one group is shown in Figure 11(a). It can be concluded that the distribution also obeys a power law at the group level. The power exponent of each group is shown in Figure 11(b), which confirms that our model simulations are consistent with the empirical results in Figure 2(b).
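Building a directed network from the relation table of Section 2 can be sketched as follows. The tuple layout `(user_a, user_b, code)` is a hypothetical representation of one table row; the meaning of the codes 1, 2, and 3 follows the description of the relationship field.

```python
def build_follow_edges(relations):
    """Turn relation records into directed follow edges.

    relations is an iterable of (user_a, user_b, code) tuples, where
    code 1 means A follows B, 2 means A is followed by B, and 3 means
    A and B follow each other. Returns (edges, mutual_pairs): edges is a
    set of (follower, followee) pairs and mutual_pairs is a set of
    frozensets for pairs linked in both directions.
    """
    edges = set()
    for user_a, user_b, code in relations:
        if code == 1:
            edges.add((user_a, user_b))
        elif code == 2:
            edges.add((user_b, user_a))
        elif code == 3:
            edges.add((user_a, user_b))
            edges.add((user_b, user_a))
        else:
            raise ValueError(f"unknown relationship code: {code}")
    mutual_pairs = {
        frozenset(edge) for edge in edges if (edge[1], edge[0]) in edges
    }
    return edges, mutual_pairs

rows = [(1, 2, 1), (3, 1, 2), (2, 3, 3)]
edges, mutual_pairs = build_follow_edges(rows)
print(sorted(edges), mutual_pairs)
```

Mutual pairs correspond to the black edges of Figure 10 and the remaining edges to the gray ones.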

6. Conclusions

Social networking sites such as microblog systems (e.g., Sina Microblog in China) provide a unique way for rapid information propagation and discussion. Research on the laws underlying user behavior on such sites matters for understanding human dynamics and, in turn, can help provide better services. Traditional studies of such human dynamics are largely limited to simple models, with either a trivial interest mechanism or simple interactions between only two agents. In this paper, we provide a rich hybrid model that combines the impact of individual interest and of interactions among users in a large social network. Rather than simply plugging the two parts together, we build a stronger model that integrates various useful parameters mathematically in our modeling and simulation. Moreover, in treating “interactions,” we take into account the real network topology and the four basic social elements behind the social network. We simulate our hybrid model both in a two-agent scenario and in a multiagent scenario on a real social network, and we evaluate it with data from the largest microblog system in China. We focus on analyzing the effect of the basic elements on human behavior. Comparing our simulations with the empirical studies, we observe similar power-law intermessage time distributions in the different scenarios. Therefore, our model offers an understanding of the dynamic mechanism of human dynamics in social networks.

In this paper, the four basic social elements are defined simply; for example, social identity is taken as the average number of comments that each message attracts. To further improve our hybrid model, we will apply more advanced metrics to quantify those parameters. For example, we will consider link analysis algorithms such as PageRank to model a node’s social identity. In addition, we will model the evolution of social networks and study its effects on social events, to better understand human dynamics in an evolving social networking context.

Acknowledgment

This work is supported by the National Key Basic Research Program (973 program) of China (2013CB329603), the Natural Science Foundation of China (60905025, 61074128, 61272400, 71231002), and partially by the Program for NCET. Support from the Joint Construction Science and Technology Research Program of the Chongqing Municipal Education Committee under Grant KJ110529, the Natural Science Foundation of CQUPT (A2009-39, A2010-13, A2011-16), and the Educational Reform Projects of CQUPT (XJG1031, XJG1216) is also acknowledged.