Abstract

With the rapid development of network technology, the microblog has become a significant carrier of public opinion. It is therefore important to investigate the propagation characteristics of topics and to identify the opinion leaders in the microblog network. The propagation of hot topics in the microblog network is influenced by the authority of the participating individuals. We build a time-varying model with a variable external field strength to simulate the topic propagation process. This model also fits multimodal events. Opinion leaders are individuals who markedly influence the discussion of a topic during its propagation, and they can help guide the healthy development of public opinion. We build an AHP model based on the influence, the support, and the activity of a node, as well as a Microblog-Rank algorithm based on the weighted undirected network, to identify the opinion leaders and analyze their characteristics. Experiments on data collected from Sina Microblog from October 2012 to November 2012 and from January 2013 to February 2013 show that our models predict the trend of hot topics efficiently and that the opinion leaders we find are reasonable.

1. Introduction

Microblog is another important platform for network information interaction and propagation after the blog. It is based on network and communication technology and has considerable advantages in the speed and reach of information propagation as well as in the breadth and depth of reports. On this free and open platform, microblog opinion leaders rely on the quantity and quality of their microblogs to spark intense group debates by setting discussion topics. They can even shape or change attitudes and prompt others to act. According to the statistics, among the Chinese Internet users, microblog users older than 19 years old occupy until September 20, 2012. The number of microblog users is about 327 million [1]. Microblog has become a crucial network tool for information propagation. Therefore, it is important to predict the laws and trends of topic propagation in the microblog network and to study the opinion leaders of topics. Doing so will help in designing corresponding mechanisms to guide and control the propagation process.

Research on the laws of topic diffusion has attracted much attention and is mainly related to time-varying models [2, 3]. Zhao et al. [2] put forward a discrete-time propagation model based on node popularity and liveness. Zhang et al. [3] drew on the epidemic model to derive topic propagation models for both BBS and blogs, including multimodal ones. Yan et al. [4] proposed an extended susceptible-infected (SI) propagation model that incorporates burstiness and limited attention. Chen and Gao [5] defined authority nodes that release anti-rumor information as a prevention strategy to control rumors in a directed microblog user network. Other works predicted diffusion probabilities with the independent cascade (IC) model [6, 7]. Afrasiabi and Benyoucef [8] observed that people who are in neither a friendship network nor a subscription network have a greater effect on propagation than friends or subscribers. Yoganarasimhan [9] studied how the size and structure of the local network around a node affect the aggregate diffusion of products seeded by it.

The identification of opinion leaders has received wide attention. Zhai et al. [10] presented many kinds of recognition methods in their work. There are three main approaches to opinion leader recognition: first, analytical methods based on characteristic attributes, for instance, the AHP method [11], the TOPSIS method [12], and an improved mixed framework for opinion leader identification [13]; second, methods based on cluster analysis, such as opinion leader recognition with K-means clustering [14]; third, methods based on social network analysis, including the PageRank algorithm [15–17] and the HITS algorithm [17]. However, although the propagation model in [2] simulates the propagation of certain topics accurately, its results are not satisfactory when topics contain subevents. The PageRank algorithm [17] considers only the interactive relationship between users and does not consider a user’s own authority.

This paper collects and analyzes data on four hot topics from Sina Microblog, which has the most users in China. In Section 2, we first build a time-varying model with a variable external field strength to simulate the topic propagation process; this model also fits multimodal events. In Section 3, we build an AHP model based on the influence, the support, and the activity of a node, as well as a Microblog-Rank algorithm based on the weighted undirected network, to identify the opinion leaders and analyze their characteristics. Experiments on the data collected from Sina Microblog show that our models predict the trend of hot topics efficiently and that the opinion leaders we find are reasonable.

2. The Hot Topic Propagation Model

A hot topic is an issue that the public cares most about within a certain time and range. In recent years, most issues have come to public attention through the Internet. This paper takes Sina Microblog as the background and the hot topic as the research object. We observe the characteristics of the dynamic propagation process and then identify the opinion leaders.

2.1. The Hot Topic Propagation

Hot topics propagate widely and quickly. In order to collect real-time and more complete microblog data, we use Rweibo to crawl the Sina Microblog data automatically. Rweibo is a software development kit for the R language that implements the interface provided by Sina Microblog. The data are the numbers of microblogs discussing these hot events on Sina Microblog. We analyze how the quantity changes for 4 events over the 40 days after each event happened. The 1st event is Yuan Lihai’s adoption of abandoned babies and orphans. The 2nd event is the PM2.5 haze in China. The 3rd event concerns the Diaoyu Islands. The 4th event concerns the 2012 Nobel Prize for literature, which was awarded to Mo Yan.

Figure 1 shows the rate of daily posting after these incidents. The horizontal axis shows the number of days since each event, and the vertical axis shows the ratio of the number of daily postings to the total number of microblogs in which users discuss the topic in the network.

In event 1, the number of microblog posts peaks within a day, which shows the timeliness of microblog. The number of microblog postings on event 2 shows its first peak from the 5th day to the 7th day. The National Meteorological Center of CMA issued a haze alert, so the second peak occurred after the 22nd day. For event 3, the number of microblog postings peaked on the 16th day, since Japan deployed fighter planes on the 14th day to prevent Chinese planes from flying over the Diaoyu Islands, and the USA has long interfered in this event. After Mo Yan was awarded the 2012 Nobel Prize for literature, the number of postings on the event doubled. We can thus follow the development of an event through the number of microblogs posted every day, and the data we collected match the actual situation. As shown in Figure 1, events 1, 3, and 4 are single-peak events: they reach one peak, after which the propagation rate declines slowly, and they die out in about 30 days. In contrast, event 2 is a multipeak event: its propagation rate has two peaks, and the first one is higher than the second one. Therefore, the data collection for identifying the opinion leaders needs to last for at least 30 days after the first peak appears.

2.2. The Hot Topic Propagation Model

Let the undirected graph $G = (V, E, A)$ represent the actual propagation network, where $V$ is the set of microblog nodes, $E$ is the set of edges connecting the users, and $A$ is the set of authority values. We suppose that any two nodes can communicate with each other, so the microblog network is a fully connected undirected graph. Zhao et al. [2] proposed a discrete-time dynamic model for the bursty propagation of incidental events. Building on Zhao’s model, we construct a time-varying model with a variable external field strength to simulate the topic propagation process.

Assume that represents the total number of microblogs that participate in discussing one topic in the network. Let be the initial time and let be the unit time. Let be the posted microblog numbers at and let be the new posting microblog number in . Namely, We mainly discuss the statistical properties of and the change trend of by the simulation.

The authority value of the user in the actual network is average value through the normalization of friends count, fans count, and microblog count. After checking the actual data of four events, we know the authority value follows the power-law distribution. Let the authority value of the user be . Its distribution is which follows the power-law distribution, and the power law is at . Therefore, the authority probability density function is defined as where is 1.5 at and is a parameter.
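The averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say which normalization is used, so min-max scaling is an assumption, and the function names and the four sample users are hypothetical.

```python
def min_max(values):
    """Scale a list of counts to [0, 1]; a constant list maps to all zeros.

    Min-max scaling is an assumption; the paper only says "normalization".
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def authority_values(friends, fans, posts):
    """Average the normalized friends, fans, and microblog counts per user."""
    columns = [min_max(friends), min_max(fans), min_max(posts)]
    return [sum(col[i] for col in columns) / 3.0 for i in range(len(friends))]


# Hypothetical counts for four users (not data from the paper).
friends = [10, 200, 50, 10]
fans = [5, 10000, 300, 5]
posts = [20, 1500, 400, 20]
authority = authority_values(friends, fans, posts)
```

With these sample counts, the second user has the largest value in every column and therefore the largest authority value, while the first and last users share the smallest one.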

The node state is divided into the published microblog and the unpublished microblog. The function represents the state of microblog at . Consider the following:

The topic field strength formed by internal nodes in the network is defined as where is the authority value of node .

In fact, we can obtain the topic from the external network information. With the time passing, the external field strength will improve over time above a fundamental level and then tend to be stable. Because the external field strength is limited to the environmental capacity, we assume that the external field strength follows the logistic model partly. Suppose is a parameter related to the rate of the initial external field strength changing and is the fundamental level. The external field strength formula is as follows:

In practice, some events contain two or more sub-events. For example, event 2 contains two sub-events: “The National Meteorological Center of CMA issued a yellow haze alert on the 5th day” and “haze enshrouded eastern and central China on the 21st day.” A sub-event can lead to a high propagation rate.

Therefore, on the day a sub-event occurs, the simulation system is reset by a certain proportion. Namely, we turn the state of some nodes from published to unpublished on the first day of the second sub-event of each event. In practice, the time at which a sub-event occurred is known, so it is reasonable to set the sub-event time in the simulation.

If the microblog gets the topic information from the network at , the probability of the unpublished state transformed into the published state is

The different topics have some differences on the microblog number. In order to see the trend, we perform normalization to the propagation data; namely,

In order to judge the simulation effect, we define the mean square error as the error function: where represents the actual normalized data,
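As a minimal sketch of the normalization and the error function, the following assumes that the daily counts are divided by the total number of microblogs on the topic (which matches the vertical axis described for Figure 1) and that the error is the usual mean of squared differences over the days. The function names and the sample counts are hypothetical.

```python
def normalize(daily_counts):
    """Turn daily posting counts into shares of the total (assumed normalization)."""
    total = sum(daily_counts)
    if total == 0:
        raise ValueError("no microblogs were posted")
    return [count / total for count in daily_counts]


def mean_square_error(simulated, actual):
    """Mean of the squared differences between two normalized series."""
    if len(simulated) != len(actual):
        raise ValueError("series must cover the same number of days")
    return sum((s - a) ** 2 for s, a in zip(simulated, actual)) / len(actual)


# Hypothetical four-day series (not data from the paper).
actual = normalize([10, 40, 30, 20])
simulated = normalize([20, 30, 30, 20])
error = mean_square_error(simulated, actual)
```

Normalizing both series first makes topics with very different posting volumes comparable, which is the stated purpose of the normalization step.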

2.3. Simulation

We set the following steps in Algorithm 1 to simulate the process of the topic dynamic propagation.

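Function TopicPropagation{
Initialize
 the model parameters and the authority value of every node.
 the state of every node (unpublished).
 the initial time.
While (the simulated time is not over){
 if (the new sub-event occurs){
  for each published node:
  if (rand() < 0.5){
   reset the node to the unpublished state.}
 }
 compute the internal and the external field strength.
 While (there are nodes not yet visited){
  compute the publishing probability of the node.
  if (rand() < the probability & the node is unpublished){
   set the node to the published state.
   increase the number of new microblogs.}
  move to the next node.
 }
 record the number of new microblogs of this time step.
 advance the time by one unit.
}
normalize the simulated propagation data.
Return
 the normalized propagation data.
}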
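To make the simulation loop concrete, the following is a rough runnable sketch under stated assumptions rather than a reproduction of Algorithm 1. The formulas are not recoverable from the text, so several choices here are illustrative: the internal field is taken as the authority-weighted share of published nodes, the external field as a logistic curve rising from about `base` to `cap`, and the publishing probability as their weighted sum capped at 1. All parameter names and default values are hypothetical; only the power-law exponent 1.5 and the reset at sub-events come from the text.

```python
import math
import random


def external_field(day, base, cap, rate, midpoint):
    """Logistic-shaped external field: rises from about `base` and levels off at `cap`."""
    return base + (cap - base) / (1.0 + math.exp(-rate * (day - midpoint)))


def sample_authority(n, rng, exponent=1.5, a_min=1.0):
    """Draw n authority values from a power law p(a) ~ a**(-exponent), a >= a_min."""
    return [a_min * (1.0 - rng.random()) ** (-1.0 / (exponent - 1.0)) for _ in range(n)]


def simulate_topic(authority, days, sub_event_days=(), reset_fraction=0.5,
                   base=0.01, cap=0.2, rate=0.5, midpoint=5.0, coupling=0.5, seed=0):
    """Return the normalized number of new microblogs per day."""
    rng = random.Random(seed)
    n = len(authority)
    total_authority = sum(authority)
    published = [False] * n
    new_per_day = []
    for day in range(days):
        # On a sub-event day, reset a proportion of published nodes.
        if day in sub_event_days:
            for i in range(n):
                if published[i] and rng.random() < reset_fraction:
                    published[i] = False
        # Assumed internal field: authority-weighted share of published nodes.
        internal = sum(a for a, p in zip(authority, published) if p) / total_authority
        external = external_field(day, base, cap, rate, midpoint)
        # Assumed transition probability for an unpublished node.
        prob = min(1.0, external + coupling * internal)
        new_posts = 0
        for i in range(n):
            if not published[i] and rng.random() < prob:
                published[i] = True
                new_posts += 1
        new_per_day.append(new_posts)
    total = sum(new_per_day)
    return [c / total for c in new_per_day] if total else [0.0] * days


rng = random.Random(42)
authority = sample_authority(500, rng)
curve = simulate_topic(authority, days=40, sub_event_days=(21,), seed=1)
```

Passing an empty `sub_event_days` gives the unimodal case, while a reset day such as day 21 lets previously published nodes post again and so can produce a second peak.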

After collecting the real data of events 1 to 4, we use a computer program to estimate the optimal parameters within a reasonable range. The results are shown in Table 1.

Zhao’s algorithm [2] is aimed at sudden incidents that do not contain sub-events. Accordingly, we list the parameters used in this algorithm in Table 2.

We compute the average error and the minimum error of our algorithm and of Zhao’s algorithm over 1000 tests. Figure 2 and Table 3 compare the two algorithms on events 1, 3, and 4.

Both algorithms give good results for unimodal topic propagation. Event 2 contains sub-events, so the results of the two algorithms differ markedly. As shown in Figure 3 and Table 4, our algorithm is more precise.

3. Opinion Leader Identifying Model of Topics Network

Microblog, which is known as the most powerful carrier of public opinion on the network, has created a new era of Internet media. With its emergence and prosperity, microblog not only provides a new platform for traditional opinion leaders but also provides fertile soil for the growth of emerging opinion leaders.

3.1. Microblog Dataset

The section above discussed how topics propagate in the microblog network. We know that a topic lasts for about 30 days, so the opinion leaders may appear within 30 days after the incident occurs. Therefore, we only collect information from that period on the web. The data we use in this paper cover 3 hot topics in January 2013 and the event in which Mo Yan was awarded the 2012 Nobel Prize for literature, as shown in Table 5.

The detailed information collected for each microblog is as follows:

(1) microblog: ID of the microblog, the number of comments, the number of forwards, the text of the microblog, the length of the microblog, and the posting time;

(2) author: ID of the user, the number of fans, the number of friends, and the number of microblogs;

(3) comment (collected in addition for event 4): ID of the comment, the text of the comment, the length of the comment, and the posting time.

Through Figure 4, we can see that the number of forward and the number of comments satisfy the power-law distribution and the exponent is in . It proves that the communication networks of these events are scale-free networks, and only a few users have much focus, so opinion leaders possibly exist.

3.2. The Method of Identifying Opinion Leader

Although the theory of opinion leaders has been widely used in different fields, the standards for judging opinion leaders diverge. There are three traditional methods of finding opinion leaders: questionnaire, self-report, and observation, but the cost of these methods is too high. Sina Microblog is a platform for information exchange on which users show their opinions to others and communicate with each other by commenting on and forwarding microblogs. This interaction provides a large amount of data to support our research on opinion leaders. According to the definition of opinion leaders proposed by Paul Lazarsfeld, opinion leaders should be very active and have much influence on some topics. Therefore, we analyze microblog opinion leaders from three aspects: influence, support, and activity. The more influence users have, the more responses they obtain by posting information and the more they influence other users. In addition, opinion leaders should take an active part in discussing topics and interact with other users, so that they are more likely to show their own ideas to others.

In this section, considering these three aspects and combining the characteristics of microblog spreading, we extract features of opinion leaders. Then, we identify and analyze opinion leaders using methods based on the PageRank algorithm and the analytic hierarchy process (AHP).

3.2.1. AHP

In this assessment system, we set 3 one-class indexes and 7 two-class targets, as shown in Table 6. The value of two-class targets is normalization of the actual data.

Since each two-class target of the same one class target is equally important, equations of 1-class targets are as follows: Every two-class target is a normalization of actual data. The formula of normalizing is where is original data of two-class target. Before normalizing the target , we should use an equation to measure it. The equation of posting time is . And we set be be Jan. 4th, 2013. Supposing is a parameter, we make it 0.01. Therefore, the value of assessment about user is where is the vector of weight and (see Algorithm 2).

Function AHP{
Initialize
: The number of nodes
: the value of influence about user .
: the power of support about user .
: the value of activity about user .
.
.
While
.
.
Return
 The Top opinion leaders .
}
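The scoring step of the AHP model can be sketched as follows, under stated assumptions. Each first-level index is taken as the plain mean of its normalized second-level indicators, since the indicators under one index are stated to be equally important, and the final score is the weighted sum of the three indexes. The split of the 7 indicators into groups of 3, 2, and 2, the weight vector, and the sample users are hypothetical; the actual indicators and weights are given in Tables 6 and 8.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def ahp_scores(users, weights):
    """Weighted sum of the influence, support, and activity indexes per user.

    `users` maps a user name to three lists of already normalized indicators.
    """
    scores = {}
    for name, groups in users.items():
        first_level = [mean(groups[key]) for key in ("influence", "support", "activity")]
        scores[name] = sum(w * x for w, x in zip(weights, first_level))
    return scores


def top_k(scores, k):
    """Return the k users with the highest assessment values."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical normalized indicators and weights (not values from the paper).
users = {
    "u1": {"influence": [0.9, 0.8, 0.7], "support": [0.2, 0.1], "activity": [0.3, 0.5]},
    "u2": {"influence": [0.1, 0.2, 0.3], "support": [0.9, 0.8], "activity": [0.6, 0.4]},
    "u3": {"influence": [0.0, 0.1, 0.2], "support": [0.1, 0.0], "activity": [0.1, 0.1]},
}
weights = (0.2, 0.5, 0.3)  # influence, support, activity; assumed to sum to 1
scores = ahp_scores(users, weights)
leaders = top_k(scores, 2)
```

Because the support index carries the largest assumed weight here, the user with strong support and moderate activity ranks ahead of the user with high influence alone.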

3.2.2. Microblog-Rank Algorithm

The recognition method based on the PageRank algorithm is grounded in graph theory. It identifies whether users are opinion leaders by studying the numbers of comments made and received among the microblog users and by considering the influence of the microblog users. Thus, the microblog opinion leaders are those users who have higher influence, receive more comments on their microblogs, actively comment on others’ microblogs, and interact frequently with the people around them.

According to the above description, a microblog network on a certain topic can be defined as an undirected network with edge weight and node strength . A node in set means a microblog user. Set is an edge set, where edge , which means a relationship of the comments between the user and the user . The means the edge weight between the node and the node , which is the number of comments between the user and the user . Meanwhile, in the actual network, the users have different influential power, such as friends count, fans count, and microblog count, so we should add a node strength to measure it. As shown in Figure 5.

PageRank algorithm is one of the top ten classical algorithms in data mining. It assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set. Assume that user in the microblog network has interactive behavior with others; we define the user’s opinion leader value (Microblog-Rank, MR) as follows.

Microblog interactive network is a weighted and undirected network. Firstly, we like to give out the weight of links and nodes. The formula is In (13), is their own influence value measures by normalization of initial data, such as the number of fans, the number of friends, and the number of microblogs. Then, we get the sum of them. is the number of communications between the user and the user . the Microblog-Rank value for any node can be expressed as follows: This section is based on the weighted network, so we calculate the MR value by the weight. In addition, because dangling links exist in the actual network, which has no reply link, it will lead the algorithms to be not convergent. Therefore, we add the damping factor , and this factor should be set between 0 and 1. And is always (see [18]). By the iteration, we can get all the users’ MR values (see Algorithm 3).

Function Micro-blog-rank{
Initialize
: The accuracy of convergence .
: The number of nodes.
: Influence of user .
: The number of communication between user and user .
: Weight of edge to · .
: Initial value of , .
While
.
Return
 The Top opinion leaders, .
}
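The iteration can be sketched as a weighted PageRank on the undirected comment network. This is a hedged illustration, not the authors' Microblog-Rank: the edge-weight formula is not recoverable from the text, so the weight of moving from user i to user j is assumed to be the influence of j times the number of comments between i and j. The damping value 0.85 is the commonly used PageRank default, not a value confirmed by the source, and users without links are treated as dangling nodes whose rank is spread uniformly. All names and sample data are hypothetical.

```python
def microblog_rank(influence, comment_edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Iterate a weighted PageRank until the total change falls below `tol`.

    influence: dict user -> node strength (normalized own influence).
    comment_edges: list of (user_a, user_b, number_of_comments) tuples.
    """
    users = list(influence)
    n = len(users)
    # Assumed weight of the step i -> j: influence[j] * comments(i, j).
    weight = {u: {} for u in users}
    for a, b, count in comment_edges:
        weight[a][b] = weight[a].get(b, 0.0) + influence[b] * count
        weight[b][a] = weight[b].get(a, 0.0) + influence[a] * count
    out_sum = {u: sum(weight[u].values()) for u in users}
    rank = {u: 1.0 / n for u in users}
    for _ in range(max_iter):
        # Rank held by users without usable links is shared by everyone.
        dangling = sum(rank[u] for u in users if out_sum[u] == 0.0)
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for u in users:
            if out_sum[u] == 0.0:
                continue
            for v, w in weight[u].items():
                new_rank[v] += damping * rank[u] * w / out_sum[u]
        change = sum(abs(new_rank[u] - rank[u]) for u in users)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-user network; user "d" takes part in no comments.
influence = {"a": 0.9, "b": 0.3, "c": 0.2, "d": 0.1}
comment_edges = [("a", "b", 5), ("a", "c", 3), ("b", "c", 1)]
mr = microblog_rank(influence, comment_edges)
ranking = sorted(mr, key=mr.get, reverse=True)
```

Spreading the rank of isolated users uniformly keeps the MR values summing to 1, which is one common way to handle dangling nodes so that the iteration converges.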

3.3. Actual Examples
3.3.1. The Event of “Mo Yan Being Awarded the Nobel Prize”

At 19:00 Beijing time on October 11, 2012, the 2012 Nobel Prize for literature was announced and awarded to the Chinese writer Mo Yan. This event received wide attention in China. We try to explore the influence of such emergencies among college students. We collected 703 microblogs in total. The data set covers 698 pairs of comment relationships and involves 1171 users. We then establish a microblog interactive network based on the reply relationship. Figure 6 shows the degree distribution of the network: the abscissa is the degree, and the ordinate is the percentage of nodes with each degree in the network.

In Figure 6, we see that the microblogs network of reply relationship is a scale-free network, and it satisfies the power-law distribution. Isolated users that did not participate in any replies account for nearly 45%, and only one person received 40 replies.
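A degree distribution of this kind can be computed from the list of users and reply pairs, as in the minimal sketch below. It assumes that the degree of a user is the number of distinct users they exchanged replies with, which the text does not state explicitly; the function name and the sample data are hypothetical.

```python
from collections import Counter


def degree_distribution(users, reply_pairs):
    """Share of users at each degree, where degree counts distinct reply partners."""
    neighbours = {u: set() for u in users}
    for a, b in reply_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    counts = Counter(len(partners) for partners in neighbours.values())
    return {degree: counts[degree] / len(users) for degree in sorted(counts)}


# Hypothetical network with two isolated users (not data from the paper).
users = ["u1", "u2", "u3", "u4", "u5"]
reply_pairs = [("u1", "u2"), ("u1", "u3"), ("u1", "u2")]
distribution = degree_distribution(users, reply_pairs)
```

The entry for degree 0 is the share of isolated users, which corresponds to the users who did not participate in any replies.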

Using MATLAB R2009a, we calculate the MR value of each user and select as opinion leaders the users whose MR values are in the top 1%; the others are general users. Furthermore, the opinion leaders are visualized in the interactive network with UCINET 6.0. Table 7 lists the opinion leaders of the event “Mo Yan.”

In Figure 7, blue nodes represent general users, while red nodes represent opinion leaders.

We also plot the relationship between the number of opinion leaders and their influence. In Figure 9, the influence increases quickly when there are fewer than 15 opinion leaders. When the number is more than 30, the influence does not change noticeably.

From Figure 8, we know that, when is more than 25, the value of each parameter of opinion leaders tends to be stable. When is less than 10, each parameter value changes greatly. Therefore, parameter should be from 10 to 20. It is reasonable to let be 12 and our results in Table 7 are reasonable.

Through the above analysis, we find that the opinion leaders of microblog in an incident should have high values in the number of fans, the number of forwards, and the number of comments. The more fans an author has, the more users can see the microblog, and high numbers of forwards and comments mean that the microblog gets much attention on the Internet. So the result is reasonable.

3.3.2. Three Hot Topics in January, 2013

Opinion leaders must be those who can guide topic discussions and attract more attention. Therefore, we give the support index the largest weight. In addition, opinion leaders should be active in the topic discussions. Therefore, we make activity the second most important index. The detailed weights are listed in Table 8.

In order to measure the effectiveness of the algorithm, we use AHP and TOPSIS method [12] to obtain the Top 10 opinion leaders in these three events. The results are shown in Tables 9 and 10.

According to our analysis, the opinion leaders are all users who possess prominent values on one or more attributes (Figures 10, 11, 12, and 13). Overall, they rank ahead of the other users.

In event 1, the same users appear in the top 10 opinion leaders under both methods, although their ranks differ slightly. All opinion leaders perform outstandingly on more than one attribute.

In event 2, the first and the second leaders perform outstandingly on many attributes; however, the others merely possess high values on the last two attributes. Moreover, the parameter values of the last six leaders are close to each other.

In event 3, the opinion leaders that we obtained all perform outstandingly on the “release time” attribute and the “microblog length” attribute. From Figure 13, we conclude that current affairs such as “Diaoyu Islands” are more closely related to time and that their opinion leaders often appear in the first several days after the topic occurs.

Overall, the results obtained by these two methods are similar, which suggests that the AHP method is sound and effective. In TOPSIS, we first need to find the positive ideal solution and the negative ideal solution [5], but this is not needed in the AHP. Therefore, the AHP is simpler and more convenient.
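For comparison, the extra step that TOPSIS requires can be seen in the following sketch of the standard TOPSIS procedure. It is an illustration under assumptions, not the implementation used for Tables 9 and 10: all criteria are treated as benefit criteria, the columns are vector-normalized, and the decision matrix and weights are hypothetical.

```python
import math


def topsis(matrix, weights):
    """Closeness of each row to the positive ideal solution (higher is better)."""
    n_criteria = len(weights)
    # Vector-normalize each column; an all-zero column keeps a norm of 1.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)]
                for row in matrix]
    # Positive and negative ideal solutions, assuming benefit criteria only.
    best = [max(row[j] for row in weighted) for j in range(n_criteria)]
    worst = [min(row[j] for row in weighted) for j in range(n_criteria)]
    scores = []
    for row in weighted:
        d_best = math.sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_worst = math.sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Hypothetical influence, support, and activity values for three users.
matrix = [
    [0.80, 0.15, 0.40],
    [0.20, 0.85, 0.50],
    [0.10, 0.05, 0.10],
]
closeness = topsis(matrix, weights=(0.2, 0.5, 0.3))
```

The two ideal solutions and the two distance computations are the additional work relative to a weighted sum, which is the sense in which the AHP scoring is simpler.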

3.4. Opinion Leaders

From the results, we find that opinion leaders consist of the following kinds of users.

(1) Official microblog users of mass media, including magazines, newspapers, and TV stations. Users such as “Youth Digest,” “Entrepreneurial state magazine,” “Oriental Morning Post,” and “China News Weekly” all belong to the news media or the literature media. The mass media’s understanding of events is more authoritative and deeper than that of others and can attract more attention from web surfers.

(2) Public figures, such as the radio program host “Guo Chendong,” the chairman of the HIERSUN diamond agency “Li Houlin,” the radio program host “1011 Zhang Chi,” the magazine editor “Zhou Jiangong,” and “Chongqing Weizi,” the person at the center of the “a post-90s girl who showed off her books” incident. They possess a certain social influence, and what they say on microblog attracts more attention from others. Thus, they are much more likely to be opinion leaders than common users.

(3) Microblog users in fields related to the emergencies. “Yuan Lihai’s adoption” is about public welfare assistance; therefore, the public welfare microblog user “powerful mouse v” appears among the opinion leaders. “PM2.5 haze in China” is an event about an environmental problem; thus, microblog users concerned with environmental protection, such as “Moruier Air Purifier” and “Sina Environmental Protection,” appear among the opinion leaders. “Chinese Diaoyu Island” is a political and military hot topic; therefore, “Nothing God 2430” in the field of current affairs and “Nucleon Submarine Chaser” in the military field also become opinion leaders. Because the event “Mo Yan won the Nobel Prize” belongs to the cultural field, students are more concerned about it. Thus, a university’s official microblog user such as “Jinan University” may be an opinion leader in this field. These users are more authoritative, their understanding is deeper, and they possess more prestige, so they attract attention more easily than common users.

4. Conclusion

Research on topic propagation characteristics and on the identification of opinion leaders is important for the guidance of public opinion and for rumor control. In the business world, this influence can also be put to commercial use. This paper constructs a time-varying hot topic propagation model and models for identifying opinion leaders based on the AHP and the PageRank algorithm. We use Sina Microblog data on four events to validate the models and obtain reasonable results. However, several points should be improved, and the work can be extended in the following aspects.

(1) For the spread of topics, the number of parameters is rather large, so it is hard to find accurate parameter values to fit. We can consider more closely the connection between the parameters and the actual data and simplify the parameters.

(2) This paper considers opinion leader identification from the aspect of microblog users rather than microblog contents. Therefore, text recognition can be added in the future to reflect users’ attitudes toward topics more faithfully.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the anonymous referees for their helpful comments on an earlier version of this work. This work was partly supported by the National NSF of China (no. 11071089).