Special Issue: Recent Advances in Complex Networks Theories with Applications
Modeling of Information Diffusion in Twitter-Like Social Networks under Information Overload
Due to information overload in social networks, it becomes increasingly difficult for users to find useful information according to their interests. This paper considers Twitter-like social networks and proposes models to characterize the process of information diffusion under information overload. Users are classified into different types according to their in-degrees and out-degrees, and user behaviors are generalized into two categories: generating and forwarding. View scope is introduced to model the user information-processing capability under information overload, and the average number of times a message appears in view scopes after it is generated by a user of a given type is adopted to characterize the information diffusion efficiency, which is calculated theoretically. To verify the accuracy of the theoretical analysis, we conduct simulations, whose results agree closely with the theoretical results. These results are important for understanding the diffusion dynamics in social networks, and the analysis framework can be extended to consider more realistic situations.
1. Introduction
Research on social networks has received remarkable attention in the past decade, since social networks provide numerous features to encourage information sharing among users. Among the existing social networks, microblogging services (e.g., Twitter and Facebook) have become increasingly popular and provide new communication methods for people to stay connected with their friends. The use of microblogging for lightweight communication makes it an important candidate medium for informal communication.
Twitter is arguably one of the most well-known microblogging platforms currently available and is used by hundreds of millions of people all over the world. Twitter users post updates from computers or mobile phones to broadcast what happens in their daily lives, such as what they are reading, thinking, and experiencing. Users declare the persons they are interested in by following them. When user $i$ follows user $j$, we say that user $i$ is one of user $j$’s followers and user $j$ is one of user $i$’s followees. Twitter users are allowed to post short messages (up to 140 characters), called tweets, and to forward messages, called retweets. Each user has a web form, where all his/her followees’ new messages (both tweets and retweets) are arranged in reverse-chronological order. So after logging in, a user will be notified if his/her followees have posted new messages.
Essentially, relationships in Twitter are asymmetric, since a user who is followed by another user does not necessarily have to reciprocate by following him/her back. Some social networks adopt symmetric relationships. For example, in Facebook, a relationship is established when a request for friendship is accepted by a user, which adds both on each other’s contact lists. If one user removes the other, the relationship is broken. Therefore, an important difference between these two social networks is that the network of Twitter is directed, while that of Facebook is undirected. Having noticed the increasing popularity of Twitter, we take Twitter-like social networks into account in this paper.
Compared with traditional media such as newspapers and television, social networks allow creation and exchange of user-generated contents, and every user can produce and distribute messages. This results in an explosively growing amount of information and leaves many social networks increasingly saturated with information. Besides, due to the potential for marketing and advertising, Twitter and some other social networks are considered to be efficient approaches to stimulate the awareness and adoption of products or services. One important benefit of these social networks is that the costs of generating and transmitting information are almost negligible, so advertising messages can reach wide audiences within a short period of time. This also leads to a large volume of advertising information. However, due to the limitation of information-processing capability, if messages arrive in numbers larger than what users can process, some messages will be lost without catching users’ attention; that is, information overload occurs. Under information overload, users will find it difficult to find useful messages according to their personal interests, which has a serious negative impact on the user experience. Therefore, to understand and then address the information overload issue arising in social networks, it is important to model and analyze the process of information diffusion under information overload, which is the focus of this paper.
Most research on diffusion dynamics in social networks has focused on the spread of one phenomenon at a time, for example, diffusion models for disease, influence, knowledge, and cooperation. Recently, some researchers have begun to study competitive diffusion, which models the process by which multiple competing epidemics, influences, or phenomena diffuse through a complex or social network. These problems are somewhat similar to the one considered in this paper, but they fail to characterize the information overload phenomenon in social networks, where every user can generate new messages. In our previous work [10, 11], we studied the process of information diffusion under information overload in Facebook-like social networks. The network structure of Twitter, however, is very different from that of Facebook. Besides, considerable effort has been devoted to alleviating the information overload syndrome, where filter-based or cost-based approaches are usually adopted [12–14]. To the best of our knowledge, there is no prior work that models and analyzes the process of information diffusion under information overload for Twitter-like social networks.
The remainder of this paper is organized as follows. We describe the models in Section 2 and analyze the process of information diffusion under information overload in Section 3. To verify the accuracy of theoretical analysis results, we conduct simulations and provide the simulation results in Section 4. Finally, we conclude this paper in Section 5.
2. Model Descriptions
In this section, we propose models to capture the characteristics of Twitter-like social networks, such as network, user behaviors, and information diffusion under information overload. Based on these models, we can analyze the process of information diffusion under information overload theoretically.
2.1. Network
We consider a Twitter-like social network as a directed network, where nodes represent typical users and links represent the relationships between pairs of users. Note that a user who is followed by another user does not necessarily have to reciprocate by following him/her back. We let the direction of a link be the same as the direction of information diffusion. For example, in Figure 1, user is followed by users , , , and , where the update messages of user can be received by users , , , and , and user can only receive the update messages of user .
Since isolated users never get involved in the process of information diffusion, we neglect all the isolated users and classify the rest of users into different types according to their in-degrees and out-degrees; that is, a user with in-degree and out-degree is of type , where . For type users, we define to be the probability that a randomly chosen follower is of type . Then, we have , , and We further define to be the fraction of type users in the network, and we get
Consider the ensemble of networks in which the distributions and take specified values. This defines a random graph model similar to the random graphs defined in [15, 16]. That is to say, the network is drawn uniformly at random from the ensemble of all possible networks with the distributions and . For users, we denote by the maximum number of in-degrees and by the maximum number of out-degrees. Then, this network can be characterized by the tensor and the matrix . Note that in a Twitter-like social network, users usually have moderate numbers of followees due to attention limitation. So, we usually have .
2.2. User Behaviors
In Twitter-like social networks, different functions are adopted to diffuse information. After logging in, users can post tweets to broadcast things which happen in their daily lives. There are also other functions such as reply and retweet which allow users to interact with their friends. In this paper, we generalize these behaviors into two categories: generating and forwarding; that is, users can generate new messages or forward messages generated by other users. Note that forwarded messages can still be forwarded.
To model the user ability of message processing under information overload, we introduce the term view scope, which indicates the messages a user can process at a time. Note that for users in Twitter-like social networks, messages are listed in a reverse-chronological order. So for a user with view scope number , if information overload occurs, he/she can process (i.e., browse) the latest messages after logging in, while the former ones are lost. In this paper, we assume homogeneous view scope number, which is , for all users.
To model user behaviors, we make the same assumptions as .
(i) The process of user login follows a Poisson process with rate .
(ii) After logging in and browsing the messages, a user may choose to log off or react to these messages (i.e., generate or forward a message), while the reacting probability is .
(iii) Among the reacting actions, users may choose to forward a randomly chosen browsed message with probability or generate a new message with probability .
Actually, user online activities may be bursty, and users may generate or forward multiple messages at a time. However, we make these assumptions to simplify the analysis here and plan to extend this analysis framework to more realistic situations in our future work.
2.3. Information Diffusion under Information Overload
Under information overload, messages are arriving in numbers larger than what users can process, and some messages are lost without catching users’ attention. We use Figure 2 to illustrate the evolvement of view scopes under information overload. Suppose user is followed by other users, such as users , , and . The view scopes of these users are depicted in Figure 2(a). After user processes the messages in his/her view scope (i.e., , ,, and ), he/she may generate a new message or just forward a message in his/her view scope. No matter which action is chosen, this message (say ) will be placed at the top of all his/her followers’ view scopes, and the messages at the bottom of his/her followers’ view scopes (i.e., , , and ) will be discarded due to the information overload effect, which are depicted in Figure 2(b).
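The view-scope update described above behaves like a bounded, newest-first queue. The following is a minimal Python sketch of that behavior, assuming a view scope of 3 messages and hypothetical follower names and message labels (`m1` to `m5`); none of these values are taken from Figure 2.

```python
from collections import deque

def deliver(view_scope, message):
    """Place a message at the top of a view scope; return the discarded message, if any."""
    full = len(view_scope) == view_scope.maxlen
    discarded = view_scope[-1] if full else None
    view_scope.appendleft(message)  # a full deque drops its bottom (rightmost) item
    return discarded

# Hypothetical follower view scopes, newest message first.
scopes = {
    "follower_a": deque(["m4", "m2", "m1"], maxlen=3),  # already full
    "follower_b": deque(["m3"], maxlen=3),              # still has room
}

# The followed user generates or forwards message m5; it reaches every follower.
dropped = {name: deliver(scope, "m5") for name, scope in scopes.items()}
```

Only the full view scope loses a message, which is the information overload effect in this model; a view scope with spare room simply grows.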
One may argue that the view scope of user should be cleared after he/she has processed all the messages. However, for simplicity we assume memoryless users here. That is to say, processed messages can still be processed as long as they are in the view scope. We will model the behaviors of users with memories in our future work.
3. Performance Analysis
In this section, we analyze the process of information diffusion under information overload based on the proposed models. Specifically, we are interested in the information diffusion efficiency, which is characterized by the average number of times a message appears in view scopes after it is generated by a type user (say ). To achieve this goal, we first calculate the average number of times a message is forwarded by a type user after it arrives in this user’s view scope (say ).
3.1. Calculation of
Since users log in following a Poisson process with rate , we know that the probability that a user logs in and then generates or forwards a message within a time slot, which is of length , is . Consider a type user (say user ). Note that he/she is memoryless and he/she may choose to forward a randomly chosen message in his/her view scope with probability after he/she decides to react to the browsed messages. So if a message (say ) is in his/her view scope, the average number of times that he/she will forward this message in time slots is
The followees of user will generate or forward messages, which will be placed at the top of his/her view scope. Let , and then, for user , the probability that multiple followees generate or forward messages in the same time slot can be neglected. So the probability that a new message arrives in user ’s view scope in a time slot is . Note that message will be discarded after new messages arrive. Then the probability that message will stay in user ’s view scope for time slots is
Therefore, the average number of times that user will forward message is
From (5.56) at [17, page 199], we get
So, we have
Remark 1. Intuitively, the larger the view scope number is, the longer a message stays in the view scope and the more this message is forwarded. However, from (7) we find that is unrelated to . This is because larger will lead to more messages stored in the view scope, which reduces the probability that a given message is chosen to be forwarded in a time slot.
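The cancellation described in Remark 1 can be checked numerically. The sketch below is our own reading of the argument in this subsection, not a reproduction of the paper's equations. It assumes a full view scope of `view_scope` messages, a per-slot reaction probability `react_prob` for every user, a forwarding probability `forward_prob`, at most one arrival per time slot, and a message that stays until `view_scope` newer messages have arrived. All names are hypothetical.

```python
from math import comb

def mean_forwards(in_degree, view_scope, react_prob, forward_prob, horizon=5000):
    """Expected number of times one user forwards a given message (sketch)."""
    # Probability that some followee posts in a slot (simultaneous posts neglected).
    arrival = in_degree * react_prob
    # Probability that the user reacts, forwards, and picks this particular message.
    per_slot_forward = react_prob * forward_prob / view_scope
    # Mean number of slots until `view_scope` arrivals (negative binomial, truncated).
    mean_stay = sum(
        t * comb(t - 1, view_scope - 1)
        * arrival ** view_scope * (1 - arrival) ** (t - view_scope)
        for t in range(view_scope, horizon)
    )
    return per_slot_forward * mean_stay

# Same user (in-degree 5), three different view scope sizes.
results = [mean_forwards(5, n, react_prob=0.01, forward_prob=0.5) for n in (5, 10, 40)]
```

Under these assumptions, all three values equal `forward_prob / in_degree`. A larger view scope keeps the message longer but makes it proportionally less likely to be the one picked for forwarding.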
3.2. Calculation of
Consider a type user (say user ) and suppose his/her followers are divided into some partition , where is the number of type followers and The probability that the partition takes a particular value is given by the multinomial distribution 
We define the generating function as the distribution of the number of times a message appears in view scopes after it is generated by a type user and the generating function as the distribution of the number of times a message appears in view scopes after it arrives in the view scope of a type user. Then where is the Kronecker delta function, and
Then, by substituting (9) into (13) and performing the sum over , we have By solving this equation, we can derive the distribution of the number of times a message appears in view scopes after it is generated by a type user. However, here we just calculate the average number of times, which is
We know that users with out-degree 0 can generate or forward messages, but no one can receive them. So, and . We further know that users with in-degree 0 never forward messages. That is to say, never contributes to the right of (15). So we can first calculate where and then get from (15).
We know that becomes an tensor for , which is still hard to handle. We rearrange the elements of this tensor so that they form a matrix, which is called matricizing . Specifically, we let
We can write (17) in matrix form and get where
So we get
Remark 2. From (20), we observe that is determined by and and is unrelated to other factors such as , , and .
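To illustrate why only the forwarding probability and the network structure matter, the following sketch computes the average number of appearances by fixed-point iteration instead of the matrix form. It is an illustrative reconstruction under stated assumptions, not the paper's equations (15) to (20). It assumes that a type with in-degree `j` forwards an arriving message `p / j` times on average, and that each arrival counts as one appearance. The function name, the `follower_prob` layout, and the ring example are all hypothetical.

```python
def average_appearances(types, follower_prob, p, max_iter=10_000, tol=1e-12):
    """types[a] = (in_degree, out_degree); follower_prob[a][b] = probability that
    a randomly chosen follower of a type-a user is of type b."""
    n = len(types)
    # Mean forwards per arrival; users with in-degree 0 never forward.
    forwards = [p / j if j > 0 else 0.0 for j, _ in types]

    def reach(a, h):
        # Expected appearances triggered by one post of a type-a user.
        return types[a][1] * sum(follower_prob[a][b] * h[b] for b in range(n))

    # h[a]: appearances caused by one arrival at a type-a user, including the arrival.
    h = [1.0] * n
    for _ in range(max_iter):
        new_h = [1.0 + forwards[a] * reach(a, h) for a in range(n)]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    # A generated message is one post by its author.
    return [reach(a, h) for a in range(n)]

# Toy network: a directed ring, so every user is of type (1, 1).
g = average_appearances(types=[(1, 1)], follower_prob=[[1.0]], p=0.5)
```

For the ring, the recursion reduces to h = 1 + p h, so the average is 1 / (1 - p), which is 2 for p = 0.5. The login rate, the reacting probability, and the view scope number do not enter the computation, in line with Remark 2.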
4. Simulation Results
To verify the accuracy of the theoretical analysis results, we conduct simulations and provide the simulation results in this section. We first consider a directed ER network and then a growing network model, which generates directed and degree-correlated networks.
The simulations are conducted in a discrete fashion. Specifically, time is slotted, and in each time slot a random user is selected to generate or forward a message. Denoting by the user number, each simulation is run time slots, where . That is to say, each user will be selected times on average to generate or forward messages. We further set and .
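The slotted procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' simulator. It assumes that `followers[u]` lists the followers of user `u`, and that a user whose view scope is empty generates a new message instead of forwarding. The function name and parameters are hypothetical.

```python
import random
from collections import deque

def simulate(followers, view_scope, p_forward, slots, seed=0):
    """Return (author, appearances), both indexed by message id."""
    rng = random.Random(seed)
    n_users = len(followers)
    scopes = [deque(maxlen=view_scope) for _ in range(n_users)]
    author = []       # author[m]: user who generated message m
    appearances = []  # appearances[m]: times m was placed in a view scope
    for _ in range(slots):
        u = rng.randrange(n_users)  # one random user acts per time slot
        if scopes[u] and rng.random() < p_forward:
            # Forward a uniformly chosen message from the user's view scope.
            m = scopes[u][rng.randrange(len(scopes[u]))]
        else:
            # Generate a new message (also the fallback for an empty view scope).
            m = len(author)
            author.append(u)
            appearances.append(0)
        for v in followers[u]:
            scopes[v].appendleft(m)  # newest on top; the oldest falls off when full
            appearances[m] += 1
    return author, appearances

# Toy directed ring of three users: user 0 is followed by 1, 1 by 2, 2 by 0.
author, appearances = simulate([[1], [2], [0]], view_scope=5, p_forward=0.5, slots=1000)
```

Averaging `appearances[m]` over the messages generated by users of one type gives the simulated counterpart of the quantity analyzed in Section 3. In the ring, each action reaches exactly one view scope, so the counts sum to the number of slots.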
4.1. Directed ER Network
In the directed ER network, we let the user number and the average user in-degree (or out-degree) . That is to say, each link is included in the network with probability .
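A directed ER network of this kind can be generated as sketched below. The link probability `mean_degree / (n_users - 1)` is our assumption for obtaining the target average degree without self-loops, and the sizes in the example are illustrative, not the paper's settings.

```python
import random

def directed_er(n_users, mean_degree, seed=0):
    """Return followers[u]: the users who receive the messages posted by u."""
    rng = random.Random(seed)
    # Assumed link probability; each ordered pair is linked independently.
    link_prob = mean_degree / (n_users - 1)
    followers = [[] for _ in range(n_users)]
    for u in range(n_users):
        for v in range(n_users):
            if u != v and rng.random() < link_prob:
                followers[u].append(v)  # link u -> v, i.e., v follows u
    return followers

followers = directed_er(n_users=300, mean_degree=6)
average_out_degree = sum(len(f) for f in followers) / len(followers)
```

The link direction matches the convention of Section 2, where a link points in the direction of information diffusion.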
The results for from simulations and theoretical analysis (i.e., (20)) are depicted in Figure 3. To quantify the gap between these results, we plot the differences in Figure 4, from which we know that the theoretical analysis results coincide very well with the simulation results.
4.2. Growing Network Model
Degree correlations among nodes in a network essentially characterize the network structure, while many real-world networks show degree correlations [19–21]. In particular, social networks show assortative mixing, that is, a preference of high-degree nodes to be connected to other high-degree nodes [15, 16].
To generate directed and degree-correlated networks, we adopt the growing network model proposed in , where in each step the probability of adding a new node and creating a link from one of the earlier nodes (say ) is and the probability of adding a new link and connecting two old nonlinked nodes (say from to ) is where is the node set, () is the out-degree (in-degree) of node , and parameters must obey the constraints and to ensure that each node will be chosen with positive probability.
Here, we set , , and to generate a degree-correlated network. The distributions of in-degrees and out-degrees are depicted in Figure 5, from which we know that in-degrees and out-degrees follow power law distributions. The degree correlations are depicted in Figure 6, from which we know that degree correlations at users are evident, while high in-degree users usually have high out-degrees. However, the degree correlations at links are not so obvious.
Simulation results for are depicted in Figure 7(a), while the theoretical analysis results are depicted in Figure 7(b). We also plot the differences in Figure 8, from which we know that the theoretical analysis results are quite consistent with the simulation results, especially for users with low in-degrees and out-degrees. However, even for the only user who is of type , the value of difference is about 6, which is very small compared to the value of .
5. Conclusions
Having noticed the increasing popularity of Twitter and the negative influence of information overload, we take Twitter-like social networks into account and propose models to capture characteristics such as network, user behaviors, and information diffusion under information overload. Based on these models, we analyze the process of information diffusion under information overload theoretically, and the accuracy of the theoretical analysis results is verified by simulations. These results are important for understanding the diffusion dynamics in social networks and are of use to advertisers in viral marketing. However, to simplify the analysis, we make some assumptions, such as Poisson arrival and memoryless users, which may be unrealistic. We seek to extend these models to characterize more realistic situations and validate the theoretical analysis results by empirical evidence in our future work. Besides, the impact of degree correlations on spreading dynamics appears to be nontrivial, and it has been demonstrated that degree correlations strongly influence information diffusion. Another direction for future work is to analyze the impact of degree correlations on information diffusion under information overload.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants nos. 61105124 and 71331008, the Research Fund for the Doctoral Program of Higher Education of China under Grant no. 20114307120023, and the China Scholarship Council under Grant no. 2011611534.
References
K. Koroleva, H. Krasnova, and O. Gunther, “‘stop spamming me!’-exploring information overload on facebook,” in Proceedings of the 16th Americas Conference on Information Systems (AMCIS '10), Lima, Peru, August 2010.
R. M. Anderson and R. M. May, “Population biology of infectious diseases: part I,” Nature, vol. 280, no. 5721, pp. 361–367, 1979.
A. Borodin, Y. Filmus, and J. Oren, “Threshold models for competitive influence in social networks,” in Proceedings of the 6th International Conference on Internet and Network Economics (WINE '10), pp. 539–550, Stanford, Calif, USA, 2010.
M. Broecheler, P. Shakarian, and V. S. Subrahmanian, “A scalable framework for modeling competitive diffusion in social networks,” in Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom '10), and the 2nd IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT '10), pp. 295–302, August 2010.
E. Solan and E. Reshef, “The effect of filters on spam mail,” Discussion Papers 1402, Northwestern University, Center for Mathematical Studies in Economics and Management Science, Evanston, Ill, USA, 2005.
R. E. Kraut, S. Sunder, J. Morris, R. Telang, D. Filer, and M. Cronin, “Markets for attention: will postage for email help?” in Proceedings of the 8th 2002 ACM Conference on Computer Supported Cooperative Work (CSCW '02), pp. 206–215, New Orleans, La, USA, November 2002.
J. Cheng, A. Sun, and D. Zeng, “Information overload and viral marketing: countermeasures and strategies,” in Lecture Notes in Computer Science, vol. 6007, pp. 108–117, 2010.
M. E. J. Newman, “Assortative mixing in networks,” Physical Review Letters, vol. 89, no. 20, Article ID 208701, 2002.
M. E. J. Newman, “Mixing patterns in networks,” Physical Review E, vol. 67, Article ID 026126, 2003.
R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics, Addison Wesley, Reading, Mass, USA, 1994.
R. Pastor-Satorras, A. Vázquez, and A. Vespignani, “Dynamical and correlation properties of the internet,” Physical Review Letters, vol. 87, no. 25, Article ID 258701, pp. 1–4, 2001.
M. E. J. Newman and J. Park, “Why social networks are different from other types of networks,” Physical Review E-Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 3, Article ID 036122, pp. 1–18, 2003.