Data Analysis and Optimization for Intelligent Transportation in Internet of Things
Research on Multifeature-Based Superposter Identification in Online Learning Forums
With the development of online learning and distance education, learners' discussions in online forums have become increasingly effective in facilitating learning. Superposters, who play an increasingly important role in these forums, have attracted close attention from researchers, and the key research question is how to identify superposters among a large number of participants. Some studies focus on the network interaction of superposters and on some content-related features, but they neglect basic qualities a superposter should possess, such as language expression, as well as learning-related features such as learning collaboration. Based on an analysis of an online learning corpus, and by combining network interaction with different N-gram features, this paper proposes a superposter identification method built on three primary features, namely language expression (L), content quality (C), and social network interaction (S), and eight secondary features, including learning collaboration. The method was applied to a real online learning forum corpus to identify 28 preset superposters, achieving the results of , , , and . The experiments show that the method is effective for identifying superposters in online learning forums.
With the growth of online learning and distance education, discussions in online forums have become increasingly effective in facilitating learning. Through online exchanges with teachers, learners can solve problems and ease the emotional loneliness of learning. Previous research has shown that opinion leaders play an important role in online learning and have a positive effect on interaction [1, 2]; their posts can significantly help both themselves and others to learn. To distinguish them from opinion leaders in social networks, this paper terms them superposters in learning forums. Unlike opinion leader identification in the traditional sense, which has been studied extensively [3–10], superposter identification in online learning forums has received almost no attention. In the context of online learning forums, superposters are users who actively post high-quality information that helps learners solve problems and promotes learning. Because the discourse environments differ, the superposters of online learning forums differ from opinion leaders in social networks, who mainly spread information via the Internet and thereby influence receivers' views and public opinion. Superposters and opinion leaders are thus similar in some respects and different in others: both are active in interaction, but superposters aim to foster cooperative learning, whereas opinion leaders seek to sway public opinion. How, then, can superposters be identified among thousands of online learners? Previous research of this kind has been based on social online communities and applied in sociology and economics; little has been based on the forums of online learning platforms or applied in education.
Although Reppel asserted that such an approach applies to education, that research was conducted mainly on blogs, to identify opinion leaders in online learning communities.
By analyzing authentic online learning forums and the characteristics of superposters, this paper identifies three aspects that distinguish superposters from ordinary learners and uses them to construct a model framework for superposter identification in online learning forums. The framework considers both the network interaction structure of learners and the discourse features of their posts, so as to identify superposters more accurately. Experiments show that combining these different features is effective for identifying superposters in online learning forums.
This work makes two main contributions: first, it proposes a new framework for identifying superposters in learning forums; second, it demonstrates, through experiments on a real online learning forum corpus, that the framework is effective for identifying superposters.
The rest of the paper is organized as follows: Section 2 reviews related work, Section 3 details the superposter identification framework, Section 4 describes the experimental design and analyzes the results, and Section 5 presents the discussion and conclusion.
2. Related Work
Opinion leaders play a significant role in social networks. As a result, identifying opinion leaders in social networks has attracted great attention from researchers in fields such as sociology and business. Their role includes participation in social politics, the promotion and popularization of new products or services in business, and influence on the decisions of other consumers [6, 12]. The current literature offers the following main methods of identifying opinion leaders:
(1) Identification based on network interaction structure: building on this structure and combining users' social influence with the attributes of web links, these methods measure users' centrality and prestige in social networks through the links between them; examples include the well-known PageRank and HITS algorithms and social network analysis, which have been used to identify opinion leaders [4, 6, 10, 14]. These methods model the network interaction structure as a graph to assess the importance of user nodes. They emphasize structure but ignore what opinion leaders actually say; moreover, as the number of network nodes grows with the amount of information, the graph becomes so complex that opinion leaders can no longer be identified effectively.
(2) Identification combining network interaction and post contents: because relying on network interaction alone has these limitations, much research identifies opinion leaders by combining post contents with network interaction. Based on social network analysis and user comments, Bodendorf and Kaiser explored opinion leaders in online communities and the propagation trends of the opinions they express. Combining network structure, user behavior, and the emotional features of posts, Cao et al. studied social network-based opinion leaders through multidimensional feature analysis. Li and Du constructed an opinion leader identification framework from blog contents, author attributes, reader attributes, and the network relationship between blog authors and readers, to identify opinion leaders engaged in word-of-mouth marketing on social blogs. Although these studies achieved good results, they depended too heavily on influence or centrality; as a result, they could not reflect the quality of the content that opinion leaders publish, which limits identification accuracy. Moreover, they were conducted on social networks rather than in education. Using the features of expertise, novelty, influence, activity, longevity, and centrality, Li and Ma et al. built an indicator framework to identify opinion leaders. Huang et al. identified superposters according to the quantity and quality of learners' posts in course forums. This is one of the few studies of opinion leader (superposter) identification in online learning forums, but it does not capture the quality of superposters' posts or their role in cooperative learning.
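Section 4 uses PageRank (PR) as the comparison baseline. As a point of reference, here is a minimal power-iteration PageRank over a forum reply graph. It is a sketch under assumptions the paper does not state: each reply is an edge from the replier to the author of the post being replied to, self-replies are dropped, rank held by users who never reply is spread uniformly, and the damping factor is 0.85. The function name `pagerank` is illustrative.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank on a directed reply graph.

    edges: iterable of (replier, original_poster) pairs; the edge
           replier -> original_poster passes rank to the original poster.
    Returns {user: score}, with scores summing to 1.
    """
    out_links = defaultdict(set)
    nodes = set()
    for u, v in edges:
        if u == v:
            continue  # self-replies carry no interaction signal
        out_links[u].add(v)
        nodes.update((u, v))
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling users (no outgoing replies) is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

Sorting users by the returned scores yields a TOP M list that can be evaluated with the same accuracy indexes as the proposed method.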
In our view, superposters in learning forums differ from opinion leaders in social networks: they must have a certain level of cultural literacy and cooperative learning skills, which the studies above do not capture. Therefore, to address the limitations of these opinion leader identification studies, this paper proposes a superposter identification framework based on language expression, content quality, and interaction structure, in order to identify superposters among the participating learners and learning supporters.
3. Superposter Identification Framework
We consider that superposters in online learning forums should (1) be active in posting and replying; (2) express themselves well in writing; and (3) publish high-quality posts, showing a good ability to learn, being knowledgeable, accurately reflecting learning needs, or providing other assistance to online learners. These criteria reflect both the importance of posters as nodes in forum interaction and the authority of their posts. Based on these features, the paper proposes the framework for superposter identification in online learning forums (with Chinese as the working language) shown in Table 1. In line with this definition, the framework measures a learner's language expression level, the quality of their post contents, and their interaction activity through three features: language expression, content quality, and social network interaction, respectively.
3.1. Social Network Interaction
In social network analysis, degree centrality is an important index that measures the social interaction of individuals and is commonly used to evaluate their social status and prestige. It comprises out-degree centrality and in-degree centrality. Out-degree centrality reflects the replies a poster (learner or learning supporter) makes to others' posts. For the ith learner of learner set A, where N is the total number of learners (here and below), it is computed from two quantities: the number of replies the learner makes to others, i.e., the number of outgoing links of the learner's node in the interaction network, which reflects the importance of the node's position; and the ratio of the number of the learner's replies to others' posts to the total number of replies (excluding all self-replies), which reflects the degree to which the learner participates in the interaction. Traditional algorithms consider only the first quantity and ignore the degree of interaction reflected by the second.
In-degree centrality, also known as prestige, reflects the replies other posters make to the learner's posts. It is likewise computed from two quantities: the number of incoming links of the learner's node in the interaction network, which reflects the node's prestige; and the ratio of the number of others' replies to the learner's posts to the total number of others' replies (excluding all self-replies), which reflects the learner's centrality in the interaction network but is rarely considered in traditional algorithms.
The social network interaction index of the learner then combines out-degree and in-degree centrality, with a weighting parameter balancing the two.
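The two centralities and their combination can be sketched as follows. Only the ingredients come from the text: the number of distinct reply partners (outgoing or incoming links) and each learner's share of all non-self replies. How the paper merges the link count and the ratio into one centrality, and how it weights out-degree against in-degree, is assumed here purely for illustration: each centrality is an equal-weight average of the link count normalized by N − 1 and the reply share, and S = λ·C_out + (1 − λ)·C_in with a hypothetical weight `lam`.

```python
from collections import defaultdict


def social_interaction_index(replies, learners, lam=0.5):
    """Illustrative social network interaction index S for every learner.

    replies:  (replier, replied_to) pairs, one per reply.
    learners: the full learner set A, so N = len(learners).
    lam:      hypothetical weight on out-degree centrality.
    """
    replies = [(u, v) for u, v in replies if u != v]  # exclude self-replies
    total = len(replies) or 1          # guard against an empty forum
    norm = max(len(learners) - 1, 1)   # classic degree normalizer N - 1

    out_links, in_links = defaultdict(set), defaultdict(set)
    out_count, in_count = defaultdict(int), defaultdict(int)
    for u, v in replies:
        out_links[u].add(v)   # breadth: distinct people u replied to
        in_links[v].add(u)    # breadth: distinct people who replied to v
        out_count[u] += 1     # depth: u's share of all replies
        in_count[v] += 1

    scores = {}
    for a in learners:
        # Assumed form: average of normalized link count and reply share.
        c_out = 0.5 * len(out_links[a]) / norm + 0.5 * out_count[a] / total
        c_in = 0.5 * len(in_links[a]) / norm + 0.5 * in_count[a] / total
        scores[a] = lam * c_out + (1 - lam) * c_in
    return scores
```

Setting `lam=1.0` reproduces the unbalanced case discussed in Section 4, in which only replies to others' posts count.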
3.2. Language Expression
Much research on opinion leader identification has not considered the language expression skills of opinion leaders. However, whether in social networks or online learning forums, an opinion leader or superposter must express themselves fluently and possess a certain level of cultural literacy. If a post contains violent language or is consistently unclear, other users (learners, in learning forums) may refuse to discuss it further, however innovative or important it is. For this reason, this paper measures language expression with three indexes: “word normalization,” “term nonnormalization,” and “language elegance.” These indexes are relatively easy to compute and reflect posters' language expression skills.
3.2.1. Word Normalization
Word normalization measures how frequently Class I and Class II commonly used Chinese characters appear in posts, and thus how standard the characters used by learners are. When learners use uncommon characters to appear learned, readers may find the posts harder to learn from, which limits the spread of information. The word normalization index of a learner is computed from the frequencies of Class I and Class II commonly used Chinese characters in all of the learner's posts, the total frequency of Chinese characters in those posts, and the numbers of distinct Class I and Class II characters they contain. The constants 2500 and 1000 are the numbers of distinct Class I and Class II characters, respectively, and 0.9 and 0.1 are empirically chosen weighting parameters.
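The counts behind the word normalization index can be gathered as below. The text names the inputs (frequencies and distinct counts of Class I and Class II characters, the total character frequency, and the constants 2500, 1000, 0.9, and 0.1), but the combination in `word_normalization` (0.9 times the share of common characters plus 0.1 times their type coverage) is a hypothetical reading. The character sets `class_i` and `class_ii` are assumed to be supplied by the caller, and only CJK Unified Ideographs (U+4E00 to U+9FFF) are counted as Chinese characters.

```python
CLASS_I_TYPES = 2500   # distinct Class I common characters (constant from the text)
CLASS_II_TYPES = 1000  # distinct Class II common characters (constant from the text)


def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def word_normalization_counts(text, class_i, class_ii):
    """Frequencies and distinct counts of Class I / Class II characters."""
    chars = [ch for ch in text if is_cjk(ch)]  # punctuation is ignored
    return {
        "f1": sum(ch in class_i for ch in chars),
        "f2": sum(ch in class_ii for ch in chars),
        "total": len(chars),
        "n1": len({ch for ch in chars if ch in class_i}),
        "n2": len({ch for ch in chars if ch in class_ii}),
    }


def word_normalization(text, class_i, class_ii, w_freq=0.9, w_type=0.1):
    """Hypothetical index: weighted frequency share plus type coverage."""
    c = word_normalization_counts(text, class_i, class_ii)
    if c["total"] == 0:
        return 0.0
    freq_share = (c["f1"] + c["f2"]) / c["total"]
    type_coverage = (c["n1"] + c["n2"]) / (CLASS_I_TYPES + CLASS_II_TYPES)
    return w_freq * freq_share + w_type * type_coverage
```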
3.2.2. Term Nonnormalization
Term nonnormalization measures learners' (Internet users') use of uncivil words, that is, impolite, violent, and vulgar words, as well as Internet slang, in forum exchanges. To further assess the normalization of word use, the paper defines the term nonnormalization index of a learner from the frequency of uncivil words, the frequency of Internet slang, and the total frequency of words in all of the learner's posts, with a constant C = 1.2 that weights uncivil words more heavily because they are more improper than Internet slang.
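A sketch of the term nonnormalization index, assuming posts have already been segmented into words and that the uncivil-word and slang lexicons are supplied by the caller. The form (C·f_uncivil + f_slang) / f_total is an assumed reading of the definition; only the constant C = 1.2 and its role of weighting uncivil words above slang come from the text. Higher values indicate less standard language.

```python
def term_nonnormalization(words, uncivil, slang, c=1.2):
    """Weighted share of uncivil words and Internet slang among all words.

    words:   the learner's posts, already segmented into a list of words.
    uncivil: lexicon of impolite, violent, or vulgar words (caller-supplied).
    slang:   lexicon of Internet slang (caller-supplied).
    c:       extra weight on uncivil words (1.2 in the text).
    """
    if not words:
        return 0.0
    f_uncivil = sum(w in uncivil for w in words)
    f_slang = sum(w in slang for w in words)
    # Assumed form: each uncivil word counts c times as much as a slang word.
    return (c * f_uncivil + f_slang) / len(words)
```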
3.2.3. Language Elegance
Language elegance measures the use of fixed expressions (set terms, phrases, and idioms) in posts. Although language derives from everyday life, there is no denying that “individualized teaching” (yin cai shi jiao, 因材施教, a Chinese idiom) is more concise, refined, and elegant than “adopting different teaching methods for different students.” Frequent use of such expressions in a learner's posts indicates the learner's vocabulary and command of the language, so the paper uses them to gauge language expression ability. The language elegance of a learner is computed from the frequencies of commonly used Class I words, commonly used Class II words, and idioms in all of the learner's posts; the total frequency of words; the numbers of distinct Class I words, Class II words, and idioms the learner uses; and the total numbers of distinct Class I words, Class II words, and idioms. The remaining constants are coefficients obtained as locally optimal solutions through repeated experiments and from empirical values.
Finally, the language expression index of a learner is computed from word normalization, term nonnormalization, and language elegance using three weighting parameters.
3.3. Content Quality
The content quality of posts directly affects the outcome of interaction in online learning forums, so assessing content quality is essential when identifying superposters. In assessing the quality of the content posted by candidate superposters, the paper focuses on three questions:
(i) Q1: Do the posts help others solve learning problems?
(ii) Q2: Are the replies relevant to the topic?
(iii) Q3: How well do the posts match course knowledge points?
That is, if a learner is a superposter, his/her posts should be of high quality: helpful to others during interaction, highly relevant to the topic (rather than spam or meaningless posts), and rich in professional knowledge points. The paper therefore evaluates content quality through learning collaboration, correlation with the thread, and expertise of content.
3.3.1. Learning Collaboration
Learning collaboration, which captures how posts and interaction support participants' learning, measures whether a learner's posts help others solve problems. The learning collaboration of a learner is defined from the number of the learner's helpful posts, the learner's total number of posts, and the number of beneficiaries of the learner's posts.
The main difficulty is determining whether a post actually helps others solve problems. Related research has taken two approaches: manual confirmation, which is time- and labor-consuming and impractical for a large corpus; and automatic confirmation, which identifies question-answer pairs in forums using rules and forum structure and has achieved good results in extraction experiments. Helpfulness can also be confirmed by votes in forums. Following the second approach, the paper determines helpful posts and beneficiaries using rules and statistics.
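One way to apply rules and forum structure to helpfulness is sketched below. The rule is hypothetical and not the paper's: a reply counts as helpful if the thread starter later posts an acknowledgement in the same thread, and the starter then counts as a beneficiary of the replier. The cue list `ACK_CUES` is illustrative, and the function returns only the three counts named in the definition (helpful posts, total posts, beneficiaries), leaving their combination open.

```python
from collections import defaultdict

# Hypothetical acknowledgement cues; a Chinese forum would rely on cues such as "谢谢".
ACK_CUES = ("thank", "solved", "谢谢")


def collaboration_counts(threads):
    """Per-author (helpful posts, total posts, beneficiaries) under a toy rule.

    threads: list of threads; each thread is a time-ordered list of
             (author, text) tuples whose first tuple is the main post.
    """
    helpful = defaultdict(int)
    total = defaultdict(int)
    beneficiaries = defaultdict(set)
    for thread in threads:
        starter = thread[0][0]
        for author, _ in thread:
            total[author] += 1
        for k in range(1, len(thread)):
            author = thread[k][0]
            if author == starter:
                continue  # the asker's own follow-ups are not answers
            later = thread[k + 1:]
            acknowledged = any(
                who == starter and any(cue in text.lower() for cue in ACK_CUES)
                for who, text in later
            )
            if acknowledged:
                helpful[author] += 1
                beneficiaries[author].add(starter)
    return {a: (helpful[a], total[a], len(beneficiaries[a])) for a in total}
```

The rule is deliberately crude: every non-starter reply that precedes an acknowledgement is credited, so a real system would need finer question-answer matching.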
3.3.2. Correlation with the Thread
Correlation with the thread is the relevance of a learner's replies to the topic discussed in the main post. During discussions in online learning forums, learners often post irrelevant comments about question B in a thread discussing question A. A superposter should rarely, if ever, do so and should comment in line with the thread. The correlation of a learner's replies with the topic is therefore computed as the ratio of the number of the learner's replies judged relevant to the target topic (main post) to the total number of the learner's replies. The relevance between a reply and its target thread is measured by cosine similarity.
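The relevance ratio and the cosine measure can be sketched with bag-of-words vectors. The text specifies cosine similarity and the ratio of relevant replies to all replies; the tokenization (token lists supplied by the caller), the use of raw term counts rather than TF-IDF weights, and the relevance cut-off `threshold=0.2` are assumptions.

```python
import math
from collections import Counter


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def thread_correlation(replies, threshold=0.2):
    """Share of a learner's replies judged relevant to their main posts.

    replies:   list of (reply_tokens, main_post_tokens) pairs for one learner.
    threshold: hypothetical cut-off at or above which a reply counts as relevant.
    """
    if not replies:
        return 0.0
    relevant = sum(1 for r, m in replies if cosine(r, m) >= threshold)
    return relevant / len(replies)
```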
3.3.3. Expertise of Content
Expertise of content concerns the course knowledge points covered in posts. The expertise-of-content index of a learner's posts in the forum is computed from the frequency of knowledge points in the learner's posts, the number of posts containing at least one knowledge point (KnowledgePostNum), the total number of posts in the forum, and the total frequency of knowledge points in the forum. Each knowledge point that appears once or not at all is not counted repeatedly. The result is the index of educational content of the learner's posts in the forum.
Accordingly, the content quality of a learner is computed from learning collaboration, correlation with the thread, and expertise of content using weighting parameters.
3.4. Superposter Index
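Using the MIN-MAX method, the paper normalizes the language expression, content quality, and social network interaction indexes to the range 0 to 100. For example, the language expression index of learner $a_i$ is normalized as

$$L'(a_i) = 100 \times \frac{L(a_i) - \min_{a_j \in A} L(a_j)}{\max_{a_j \in A} L(a_j) - \min_{a_j \in A} L(a_j)},$$

where the minimum and maximum are taken over all learners in set A. The content quality and social network interaction indexes are normalized in the same way.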
Finally, the superposter index (SI) is computed from the three normalized indexes using weighting parameters that can be set according to the actual situation.
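Normalization and the final ranking can be sketched as follows. MIN-MAX scaling to 0–100 follows the text; treating SI as a linear weighted sum of the three normalized indexes, the equal default weights, and mapping a constant index to 0 are assumptions.

```python
def min_max_0_100(scores):
    """MIN-MAX normalization of a {learner: score} dict onto [0, 100]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}  # no spread to rescale
    return {k: 100.0 * (v - lo) / (hi - lo) for k, v in scores.items()}


def superposter_index(L, C, S, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed linear SI = wL * L' + wC * C' + wS * S' over normalized indexes.

    L, C, S: {learner: raw index} dicts sharing the same learners.
    """
    Ln, Cn, Sn = min_max_0_100(L), min_max_0_100(C), min_max_0_100(S)
    wl, wc, ws = weights
    return {k: wl * Ln[k] + wc * Cn[k] + ws * Sn[k] for k in L}


def top_m(si, m):
    """The M learners with the highest superposter index."""
    return sorted(si, key=si.get, reverse=True)[:m]
```

Normalizing before weighting keeps an index with a large raw range from dominating the sum, which is why the weights can be compared directly.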
4. Experimental Results and Analysis
4.1. Data Set
The data were downloaded from the Q&A forum of the online learning course Computer Application Foundation. The dataset contains 7494 threads, 22369 posts, and 6747 participants (6712 learners and 35 teachers). Sampling and analysis of the teachers' posts (see Table 2) showed that 28 of the 35 teachers met the definition of a superposter, so these 28 teachers were treated as the superposters to be identified by the proposed method. The remaining 7 teachers did not qualify: 4 of them posted only 2 posts each and 3 posted only 1 post each. In addition, to count the knowledge points in posts, the paper constructs a knowledge point set for the online unified examination based on Fundamentals of Computer Application.
There is no mature, widely recognized method for evaluating superposter identification. In this section, evaluation uses two indexes: the accuracy of TOP M, i.e., the proportion of preset superposters among the M top-ranked users, and the average accuracy of TOP M.
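The two evaluation indexes can be sketched with standard information-retrieval definitions, assumed here to correspond to the paper's accuracy of TOP M (precision at M) and average accuracy of TOP M (average precision over the top M, averaged over the ranks at which a superposter appears). Other normalizations of average precision exist, so this is one common variant rather than the paper's exact formula.

```python
def precision_at_m(ranking, relevant, m):
    """Share of the top-M ranked users who are preset superposters."""
    return sum(1 for u in ranking[:m] if u in relevant) / m


def average_precision_at_m(ranking, relevant, m):
    """Mean of precision@k over the ranks k <= M holding a superposter."""
    hits, total = 0, 0.0
    for k, u in enumerate(ranking[:m], start=1):
        if u in relevant:
            hits += 1
            total += hits / k  # precision at this hit's rank
    return total / hits if hits else 0.0
```

With the 28 teachers as the relevant set, a ranking that places only 14 superposters in the TOP 28, as reported for L, gives a precision at 28 of 14/28 = 0.5.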
4.3. Result Analysis
Using the three feature indexes of the model, language expression (L), content quality (C), and network interaction structure (S), the paper tests how well the 28 superposters are identified. After repeated weighting tests on the data, the weighting parameters were set as shown in Table 3; the results and statistical analysis are given in Tables 4 and 5, respectively. Tables 5 and 6 and Figures 1–3 compare the results of our algorithm with those of the PageRank algorithm (PR).
According to Table 5: ① The model indexes identify superposters well, and the model is very effective on this dataset. ② Among the individual features, content quality performs best, with an average accuracy above 0.9. Social network structure, a common index in social network analysis, correctly identifies 22 superposters among the TOP 28 and ranks second. Language expression, which is essentially an N-gram model, correctly identifies only 14 superposters among the TOP 28 and ranks third. ③ Although single features have limitations, their combinations can be strikingly effective: the LC, CS, and LCS combinations each correctly identify 24 superposters among the TOP 28, and with LC and LCS every one of the TOP 15 users is a superposter, which supports the rationality and effectiveness of the feature design. ④ The single feature L performs only moderately, but its combinations with other features, especially LC, perform well. It is unclear whether this means that the two content-based features complement each other. ⑤ The L and LS models each identify 14 superposters among the TOP 28. This suggests that language expression depends heavily on text length and is insensitive to superposters who write short texts, even though the design tried to avoid this. Because LS identifies fewer superposters than S alone, L and S might be suspected of capturing overlapping information; however, L is content-based whereas S is graph-based (network interaction), so the two differ in structure. It remains unclear whether combining two structurally different models in this way makes them interfere and degrades the result.
The same holds for the CS model, which performs worse than C alone; this study cannot explain it. Notably, all 28 superposters are identified within the TOP 51. Figures 1 and 2, derived from Table 5, show the trends of the identification results and the average identification results for each feature in different cases. Compared with the PageRank algorithm, our algorithm (LCS) achieves better results (see Tables 5 and 6 and Figures 1–3).
In social networks, out-degree is widely believed to be as important as in-degree. When testing the weighting parameter of S, we found that one setting gave a good local result, with other values bringing slight improvements. However, that setting was very unbalanced: it considered only replies to others' posts and ignored others' replies to the learner's own posts, so it did not generalize. Inspection of the dataset showed that the numbers of threads and replies differ, especially for the 28 teachers. To keep the model general, the paper set this parameter after weighing these considerations. The other parameters were set according to the best results each single feature achieved in the experiments.
Intermediate results for content quality revealed that some data were missing, such as the numbers of beneficiaries and helpful posts, especially for learning supporters. For example, the posts and replies of teachers were related to the course and helpful to learners, so all learners who took part in the interaction should count as beneficiaries and the teachers' posts should count as helpful. However, the method could not obtain this information accurately, which led to missing data and hampered the identification of teachers as superposters (the relationship between recall and accuracy is shown in Table 6 and Figure 3). Nevertheless, the LCS model identifies all 28 superposters within the TOP 51, whereas the PageRank algorithm needs the TOP 571 to do so.
By analyzing the posts of learners in online learning forums, the paper proposed a superposter identification model based on characters, words, and network interaction structure. First, by analyzing user interaction as a graph, the paper calculated the out-degree and in-degree centrality of each user node, which capture both the breadth and the depth of each node's interaction, to determine its activity and importance in the interaction network. Next, learners' language expression was incorporated into the identification framework through word normalization, term nonnormalization, and language elegance, which assess how standard learners' characters and terms are and their basic command of the language. The third feature, content quality, is the most important in online learning forums and comprises learning collaboration, correlation with the thread, and expertise of content. An online learning forum is designed to facilitate cooperation between learners and interaction around learning content. Learning collaboration considers whether a post helps others learn; correlation with the thread checks whether a post is relevant to the topic under discussion; and expertise of content checks whether a post covers course knowledge points. Together, these three indexes target the specific purposes of online learning forums.
Although the superposter identification model has some shortcomings, such as the need for repeated experiments to set the weighting parameters manually, the proposed method achieved good results in identifying the 28 preset superposters. Because the method is easy to implement and computationally light, it is worth applying in practical online learning systems.
Research data related to learners’ personal privacy cannot be shared.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This research was supported by China Scholarship Council and a general project of the National Natural Science Foundation of China (Event-Based Semantic Research on Educational Texts, Grant no. 61977032).
B. D. Wever, H. V. Keer, T. Schellens et al., “Roles as a structuring tool in online discussion groups: the differential impact of different roles on social knowledge construction,” Computers in Human Behavior, vol. 26, no. 4, pp. 516–523, 2010.
S. Zha and C. Lee Ottendorfer, “Effects of peer-led online asynchronous discussion on undergraduate students' cognitive achievement,” American Journal of Distance Education, vol. 26, no. 4, pp. 238–253, 2002.
N. Matsumura, Y. Ohsawa, and M. Ishizuka, “Identifying opinion leaders in the blogosphere,” 2002.
X. Song, Y. Chi, K. Hino et al., “Mining and characterizing opinion leaders from threaded online discussions,” 2007.
X.-H. Fan, J. Zhao, B.-X. Fang et al., “Influence diffusion probability model and utilizing it to identify network opinion leader,” Chinese Journal of Computers, vol. 36, no. 2, pp. 360–367, 2013.
Z. Zhai, H. Xu, and P. Jia, “Identifying opinion leaders in BBS,” 2008.
D. K. Kim, A. C. James, and G. J. Shepherd, Identifying Opinion Leaders by Using Social Network Analysis: A Synthesis of Opinion Leadership Data Collection Methods and Instruments, The Faculty of the Scripps College of Communication of Ohio University, Athens, OH, USA, 2007.
F. Bodendorf and C. Kaiser, Detecting Opinion Leaders and Trends in Online Communities, IEEE International Conference on Digital Society, New York, NY, USA, 2010.
J.-X. Cao, G.-J. Chen, J.-L. Wu et al., “D multi-feature based opinion leader mining in social networks,” Acta Electronica Sinica, vol. 44, no. 4, pp. 898–905, 2016.
J. Huang, A. Dasgupta, A. Ghosh et al., “Superposter behavior in MOOC forums,” 2014.
B. Wang, B. Liu, C. Sun et al., “Extracting Chinese question-answer pairs from online forums,” Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics, vol. 23, 2009.
P. Jurczyk and E. Agichtein, “Discovering authorities in question answer communities by using link analysis,” 2007.
National College Network Education Examination Committee Office, Fundamentals of Computer Application, Tsinghua University Press, Beijing, China, 2013.