Abstract

Finding opinion leaders has practical value. Because different data sources have different characteristics, no standard algorithm exists for detecting opinion leaders across them; each data source has its own structural characteristics and calls for its own detection algorithm. Experimental results show that opinion leaders and their characteristics can be found among the comments from Weibo, a Chinese social network similar to Facebook or Twitter in the USA.

1. Introduction

With further study, the definition of an opinion leader has expanded: it now covers not only the most influential person but also the most influential comment. Finding and detecting opinion leaders in social networks has great commercial and political value. By identifying the most influential person, companies can target their sales and governments can guide public opinion. Additionally, detecting the most influential comments helps to understand how public opinion forms. By building multiple topic networks, opinion leaders can be detected with the POLD (Positive Opinion Leader Detection) algorithm. Some researchers have suggested that ideas from the control field, such as data-driven control [1–6] and robust control [7–13], could be brought into the study of finding and dynamically detecting opinion leaders in social networks. However, this idea has so far remained at the conceptual level. Therefore, this work proposes a Dynamic Opinion Rank algorithm to find the opinion leaders in the comments on Chinese news. With this method, the most influential comments can be found among all comments in the network, and the most influential users can be found in the entire user network.

2. Problem Formulation

A single-theme network based on Weibo news consists of three levels: themes, comments, and users. Relationship mappings exist between these levels: each theme maps to many comments (a one-to-many mapping), and a further mapping relates comments to users. This study analyzes a single topic, builds a single-topic network and a mathematical model of the users, and then finds the most influential comments and users. The structure of the three levels is shown in Figure 1.

As shown in Figure 1, there are three levels. “Layer 1” stands for all the themes of news, “Layer 2” denotes the single-topic comment network, which is composed of comments, and “Layer 3” is the single-topic user network. Based on the comment network, it is possible to find the most influential comment, for example cmt3. Then, by using the mapping between comments and users, the corresponding most influential user can be found in the user network.

Definition 1 (the most influential comments). Consider the comment set of a single-topic network. After a ranking algorithm is applied, every comment in the set has a score, and the comments can be ordered by their scores. The comments with the highest score are then defined as the most influential comments; these are also the opinion leaders of comments [14].

Definition 2 (the most influential user). As in Definition 1, each user in the user set has its own score, and the users can be ordered by these scores. The user with the highest score is the most influential one and is defined as the opinion leader of users [15].

3. Attitude Stabilization Controller Design

3.1. Analysis of Emotions

Finding the positive and negative emotion links requires determining the emotional tendency of each comment. Based on the HowNet dictionary, this work first determines the emotional tendency of the comments [16]. Comments are classified according to their emotional bias as positive (1), negative (−1), or neutral (0), as shown in Table 1.

According to the preceding definitions, any comment can be divided into statements. ICTCLAS is then applied to segment each statement into words [17]; emotion words are extracted using the emotion dictionary, and the number of negation words (such as “No”) in each statement is counted. The emotional value of a word is set to 1, −1, or 0. Accumulating the values of all the emotion words in a statement gives the emotional value of that statement. The parity of the number of negation words is then used to correct the emotional tendency of the statement, which yields its final value, and accumulating over all the statements of a comment yields the final emotional tendency of the comment.
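The scoring procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `POLARITY` lexicon and the `NEGATIONS` set are hypothetical stand-ins for the HowNet dictionary, the input is assumed to be already segmented into words (the role ICTCLAS plays in the paper), and clipping each result to 1, −1, or 0 is an assumption.

```python
# Hypothetical toy lexicon; the paper uses the HowNet dictionary instead.
POLARITY = {"good": 1, "great": 1, "bad": -1, "terrible": -1}
# Hypothetical negation words; the paper only gives "No" as an example.
NEGATIONS = {"no", "not", "never"}


def sign(x):
    """Map a number to 1, -1, or 0."""
    return (x > 0) - (x < 0)


def statement_sentiment(words):
    """Sum word polarities, then flip the sign if the negation count is odd."""
    total = sum(POLARITY.get(w, 0) for w in words)
    negations = sum(1 for w in words if w in NEGATIONS)
    if negations % 2 == 1:
        total = -total
    return sign(total)  # assumption: clip the statement value to {1, -1, 0}


def comment_sentiment(statements):
    """Accumulate the statement values into the tendency of the comment."""
    return sign(sum(statement_sentiment(s) for s in statements))


# A comment made of three pre-segmented statements.
comment = [["good"], ["not", "bad"], ["terrible"]]
tendency = comment_sentiment(comment)
```

A double negation leaves the sign unchanged, which is the point of using the parity of the negation count rather than its mere presence.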

3.2. Modeling of Single-Topic Network

By using the explicit and implicit link algorithm, all of the link relationships in the comment set are found. Based on the sentiment analysis method, the procedure proposed in this work to establish a single-theme network is described in Algorithm 1.

Input: explicit links and implicit links in the comment set, sentiment orientation of every comment cmti;
Output: //Comment network of the topic;
Description:
(1) for each comment cmti
(2)  for each comment cmtj
(3)   if (cmti links to cmtj) //the link may be explicit or implicit
(4)    if cmti has the same sentiment orientation as cmtj
(5)     add a positive link from cmti to cmtj;
(6)    else
(7)     add a negative link from cmti to cmtj;
(8) assign weight wtij to edge (cmti, cmtj);

In Algorithm 1, lines 1 to 7 traverse the comment set and find all of the links. Line 8 assigns a weight to each positive or negative link. This weight is determined by the function similarity, which represents the content similarity between the two linked comments, and by “tag”, which denotes the emotional consistency between them. For any reply relationship, if the two comments have a consistent emotional tendency, the link is viewed as a positive link, that is, tag = 1. Otherwise, the link is a negative link and tag = −1. The weight is thus assigned according to (2).

In (2), if the link between the two comments is explicit, then the similarity is equal to 1. If the link is implicit, the similarity is the similarity between the two texts. If the emotional tendencies of the two comments are consistent, the weight keeps its sign; otherwise, its sign is reversed. The construction procedure is illustrated in Figure 2.
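The construction of the signed, weighted comment network can be sketched as below. The sketch assumes that the edge weight is the product of the tag (1 or −1) and the similarity, which matches the verbal description but is not confirmed by a formula in the text. The function name, the `(source, target, kind)` link format, and the `text_similarity` callback are hypothetical.

```python
def build_comment_network(links, sentiment, text_similarity):
    """Return {(i, j): weight} for every link from comment i to comment j.

    links           -- iterable of (i, j, kind), kind is "explicit" or "implicit"
    sentiment       -- dict mapping a comment id to 1, -1, or 0
    text_similarity -- callable (i, j) -> similarity in [0, 1] (hypothetical)
    """
    weights = {}
    for i, j, kind in links:
        # An explicit reply has similarity 1; an implicit link uses text similarity.
        similarity = 1.0 if kind == "explicit" else text_similarity(i, j)
        # Consistent sentiment gives a positive link, otherwise a negative one.
        tag = 1 if sentiment[i] == sentiment[j] else -1
        weights[(i, j)] = tag * similarity  # assumed form of the weight
    return weights


# Comment 2 replies explicitly to comment 1; comment 3 is implicitly linked to it.
links = [(2, 1, "explicit"), (3, 1, "implicit")]
sentiment = {1: 1, 2: 1, 3: -1}
network = build_comment_network(links, sentiment, lambda i, j: 0.4)
```

Storing the network as a dictionary keyed by edge keeps the sketch independent of any graph library; the sign of each weight carries the positive or negative nature of the link.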

In Figure 2(b), the structure of the comment set is shown explicitly. The comments are ordered chronologically by release time, corresponding to the floors floor1, floor2, floor3, floor4, and floor5, and the corresponding single-topic network is shown in Figure 2(a). The edges are numbered in ascending order, which represents the order in which the links were discovered.

4. Dynamic Detection of Opinion Leaders

Opinion leaders are the most influential comments or persons. This paper presents an approach that finds the most influential comments in a single-topic network and builds a user network to find the most influential user.

4.1. The Factors of Time

When a comment is read, the longer the interval before a reply, the weaker its influence. Hence, the impact of time should be considered [18]. Following this analysis, this section proposes a model of the relationship between the time factor and the strength of a comment's influence; the impact of time is shown in Figure 3.

As shown in Figure 3, there is a comment set in which earlier comments influence later ones. For example, “B → A” means that comment B is affected by comment A. The distance between two comments denotes the time interval between them; the larger the interval, the weaker the influence [19]. For example, the distance between A and C is greater than the distance between C and B; thus, the impact of B on C is greater than its impact on A. On the other hand, the influence of comment C also changes over time. Therefore, the link weight between comments is related not only to the similarity but also changes gradually with time [20].

According to the above analysis, there is an important relationship between the release time of a comment and the likelihood that the comment is selected. To reduce the selection probability of older comments, a function of the time distance is defined in (3). It depends on the time of the comment being replied to, the time at which the replying comment was posted, a time-dependent damping coefficient, and a control factor; hence, it is a time-varying function. If the replied-to comment is far in the past, it has a smaller probability of being accessed. An appropriate value of the control factor can be chosen to enlarge or reduce the effect of the time distance. The larger the distance is, the smaller the impact becomes. The change of the function with time is shown in Figure 4.

As shown in Figure 4, the function changes gradually with the time interval, so the definition is reasonable.
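To make the role of the time function concrete, the following sketch uses an exponential decay. The exponential form is an assumption chosen only because it decreases smoothly with the time interval; the text does not confirm the actual formula. The parameter names `damping` and `control` and their default values are hypothetical.

```python
import math


def time_decay(t_replied, t_reply, damping=1.0, control=3600.0):
    """Hypothetical decay factor in (0, 1] for a reply posted at t_reply.

    t_replied -- posting time of the comment being replied to (seconds)
    t_reply   -- posting time of the replying comment (seconds)
    damping   -- assumed damping coefficient (larger means faster decay)
    control   -- assumed control factor that rescales the time distance
    """
    interval = max(0.0, t_reply - t_replied)
    return math.exp(-damping * interval / control)  # assumed exponential form


# A reply after ten minutes keeps more influence than a reply after five hours.
quick = time_decay(0.0, 600.0)
late = time_decay(0.0, 5 * 3600.0)
```

Increasing `control` stretches the time axis so that the influence decays more slowly, which mirrors the stated purpose of the control factor.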

4.2. The Detection of the Most Influential Comments

In a single-theme network, if two comments are explicitly linked, the similarity equals 1, and the impact thus certainly exists. If the link is implicit, the similarity is that between the two texts; that is, an impact may exist between the comments. For each link, the weight is reduced with time by using the time function defined in Section 4.1, which gives the time-adjusted weight. To obtain probabilities, these values are normalized as in (5). By using (5), the transition probability is obtained. Then, the matrix of the improved finite Markov chain is built in three steps: (a) Step 1; (b) Step 2; (c) Step 3. The three matrices obtained in these steps are, in sequence, the matrix that considers the links only, the matrix that considers the time factor, and the normalized matrix.
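The three steps can be sketched as follows: start from the signed link weights, scale each weight by a time-decay factor, and normalize each row. Dividing each row by the sum of the absolute values of its entries is an assumption made here so that negative links keep their sign; the normalization actually used in (5) may differ. The function and argument names are hypothetical.

```python
def transition_matrix(weights, decay, n):
    """Build an n x n row-normalized matrix from signed link weights.

    weights -- {(i, j): signed weight of the link from comment i to comment j}
    decay   -- {(i, j): time-decay factor in (0, 1]}; missing links default to 1
    n       -- number of comments, indexed 0 .. n-1
    """
    # Steps 1 and 2: link weights scaled by the time factor.
    rows = [[0.0] * n for _ in range(n)]
    for (i, j), wt in weights.items():
        rows[i][j] = wt * decay.get((i, j), 1.0)
    # Step 3: normalize each row by its absolute sum (assumed normalization).
    for row in rows:
        total = sum(abs(v) for v in row)
        if total > 0:
            for k in range(n):
                row[k] /= total
    return rows


weights = {(1, 0): 1.0, (2, 0): -0.4, (2, 1): 0.6}
decay = {(2, 1): 0.5}
P = transition_matrix(weights, decay, 3)
```

Rows for comments without outgoing links stay all zero in this sketch; how such comments are treated in the original model is not stated.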

4.3. The Improved Model of the Finite Markov Chain

In the field of information retrieval, the PageRank algorithm is widely used. Inspired by this algorithm, a random walk model called Dynamic Opinion Rank is proposed in this section. The model takes into account not only the emotional factors but also the time factor [21].

From the standpoint of the model, if a comment receives more positive replies, it is more influential. Moreover, if this comment also replies to other comments, then, following the PageRank model, its influence is passed on to them. Usually, a comment may be affected in the following two ways: (1) the comment raised by a user is affected by an opinion of interest with a certain probability; (2) the comment may also be subject to random effects.

Based on the above analysis, an algorithm similar to PageRank is proposed. It uses the improved finite Markov chain transition matrix defined over the set of comments in the single-topic network. In the transposed matrix, each row represents all the cases in which other comments link to the corresponding comment.

For any comment, the ranking score, which represents its authority value, can then be calculated. Following the above method, the score of each comment over a period of time is eventually obtained, and the comment with the maximum authority value is chosen as the opinion leader. If a comment receives many incoming links, most of them are positive (emotionally consistent) links, and the time intervals between the comments are not very long, then this comment is likely to get a higher ranking score.
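A PageRank-style iteration over the transition matrix can be sketched as below. The update rule (a uniform random-jump term plus a damped sum over incoming links) is the standard PageRank form and is assumed here; the exact Dynamic Opinion Rank formula is not reproduced in the text. The function name, the damping value 0.85, and the stopping tolerance are hypothetical choices.

```python
def opinion_rank(P, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate score[j] = (1 - d) / n + d * sum_i P[i][j] * score[i].

    P -- n x n matrix, P[i][j] is the signed, normalized weight of the link
         from comment i to comment j (rows have absolute sum at most 1)
    d -- assumed probability of following a link instead of jumping at random
    """
    n = len(P)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - d) / n + d * sum(P[i][j] * scores[i] for i in range(n))
            for j in range(n)
        ]
        # Stop once the scores no longer change noticeably.
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Comment 1 replies to comment 0; comment 2 replies to comments 0 and 1.
P = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
scores = opinion_rank(P)
leader = max(range(len(scores)), key=lambda k: scores[k])
```

Because every row of `P` has an absolute sum of at most 1 and `d` is below 1, each iteration shrinks the difference between successive score vectors, so the loop settles even when some links are negative. A negative incoming link lowers the score of its target, which is how emotionally inconsistent replies reduce a comment's authority in this sketch.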

5. Experimental Results and Analysis

To verify the proposed algorithm, an experimental analysis was conducted. The data for the experiment were obtained from Weibo news. One news item was tracked over two days (2012-08-17 07:55:17 to 2012-08-18 05:42:43), and this period was divided into four time periods; each period was analyzed to identify the opinion leaders, and the dynamic change of the opinion leaders was examined.

5.1. The Result of Finding Out the Most Influential Comment

By building the single-topic comment network, setting the parameters, and applying the algorithm proposed in Section 4, the experimental results shown in Figures 5 and 6 are obtained. As shown in Figure 5, there are 211 comments in the first time period, and comment Number 25 receives the highest score. However, because the period is short, the relationships between comments are not yet clear, so the opinion leader identified at this stage may be inaccurate and may change with time. Indeed, Figure 6 shows that in the second period Number 25 is no longer the opinion leader, while Number 166 receives the highest score; hence, Number 166 is the opinion leader at this time. Some comments receive low scores because their views are not accepted by others. Although the number of comments increases during the second time period, the relationships between comments still appear relatively sparse.

As illustrated in Figure 7, Number 25 becomes the opinion leader again, while the score of Number 166 decreases with time. Additionally, as the number of comments increases, the relationships between comments become denser and the status of the comments tends to stabilize.

Figure 8 shows that the number of comments is 560 in the fourth time period. Number 25 again receives the highest score and is the opinion leader at this time. In comparison with Figure 7, the scores of newly published comments grow faster. The result in Figure 8 also indicates that new comments get more attention, which supports taking time into account. On the other hand, the scores of many comments are growing. Owing to the characteristics of news comments, a news item receives less attention after a period of time, and the number of new comments decreases. Hence, the leadership of Number 25 is likely to be maintained for a long time.

The most influential comments and their ranking scores are shown in Table 2. We find that the opinion leaders change over time. Moreover, the rank of the opinion leaders is also affected by time. This confirms that it is necessary to take time into consideration when developing the algorithm.

To evaluate the performance of the Dynamic Opinion Rank algorithm, a standard is needed that allows experts to divide the comments in each time period into two categories: strong influence and weak influence. Then, the F-score is measured for rankings based on the time at which a comment was raised, on degree centrality, and on the degree of authority, as well as for several Opinion Rank algorithms. The comparison results are shown in Figure 9. The Dynamic Opinion Rank algorithm is found to be more accurate and more stable than the other approaches, which supports the effectiveness of the proposed scheme.

5.2. The Finding of the Most Influential Users

In the process of finding the most influential users, the single-topic user network is constructed first, and then the algorithm proposed in Section 4 is applied to detect the most influential users. For the DBSCAN density-based clustering algorithm, the radius is set between 0.06 and 0.12, and the initial MinPts is set to 1. Consequently, some clusters contain noise. With the proposed approach, 3 to 5 clusters are finally obtained, as shown in Table 3.
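For readers unfamiliar with DBSCAN, the following is a minimal pure-Python sketch of the algorithm with the radius and MinPts as parameters. What is clustered in the paper, and with which distance, is not stated in this section, so the one-dimensional points and the absolute-difference distance below are purely hypothetical. With MinPts equal to 1, every point counts as a core point, so no point is labeled as noise by the algorithm itself.

```python
def dbscan(points, eps, min_pts, dist):
    """Return a cluster label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = 0

    def neighbors(i):
        # The neighborhood includes the point itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical one-dimensional feature values.
points = [0.0, 0.05, 0.10, 0.50, 0.55, 0.90]
labels = dbscan(points, eps=0.06, min_pts=1, dist=lambda a, b: abs(a - b))
```

On this toy data the radius 0.06 chains the first three points into one cluster and the next two into another, and leaves the last point as a single-element cluster.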

From Table 3, we find that the first clusters, which have the most elements, can be removed and replaced by clusters with fewer elements. Because the network is sparse, the parameters are set accordingly. Finally, the opinion leaders in each period are detected; the result is given in the last row of Table 3. The experiment reveals that opinion leaders change dynamically with time.

6. Conclusions

This paper presents a Dynamic Opinion Rank algorithm to find the opinion leaders in Chinese news comments. Unlike existing approaches, the proposed network model takes both explicit and implicit links into account. Moreover, the proposed algorithm showed that the most influential comments and the opinion leaders are time-varying. Experimental results further verified the effectiveness of the proposed strategy.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Project no. 71171068) and the Polish-Norwegian Research Programme (Project no. Pol-Nor/200957/47/2013). The authors highly appreciate the above financial support.