Abstract

User groups have become an effective platform for information sharing and communication among users of social network sites. In this work, we propose a single-topic user group discovering scheme comprising three phases: topic impact evaluation, interest degree measurement, and trust chain based discovering. The scheme selects influential topics and gathers users into topic-oriented groups. Our main work includes an overview of the proposed scheme and its related definitions; a topic space construction method based on topic relatedness clustering, together with an evaluation of topic impact (influence degree and popularity degree); a trust chain model that takes the topological information of the user relation network into account from a strength classification perspective; an interest degree (explicit and implicit) evaluation method based on the trust chains among users; and a topic space oriented user group discovering method that groups core users according to their explicit interest degrees and predicts ordinary users from their implicit interest and trust chains. Finally, experimental results are given to demonstrate the effectiveness and feasibility of our scheme.

1. Introduction

User groups in social network sites (SNS) have been attracting increasing attention as venues for topic-related opinion expression and information sharing [1]. Commonly, a user group is built around a set of related topics that interest all of its members. Individual users who are highly interested in these topics join the group and interact with other members conveniently. By joining a topic-associated group, users can express their attitudes, discuss with other members, and share related information about the topics. Within a user group, group-related information is shared more rapidly, so a user group influences its users more deeply than other forms of organization. Generally, a user group related to more influential topics obtains more attention and has a larger impact in the social network. Thus, how to discover influential user groups that attract more users is a significant problem in social network analysis.

(1) Theoretical Background and Consideration. Topics are the primary factor of a user group. The topics that users discuss should be closely related, so that all members of the group share as many common interests as possible. In addition, the topics must have a large impact in SNS to attract the interest of many users to the group. It is therefore indispensable to find a set of topics that are closely related and have a large impact in SNS.

Interest reflects the sense of concern and curiosity about topics that attract or hold users’ attention in SNS. A user’s interest in specific topics is therefore another significant factor in evaluating how likely the user is to join the group: the more interested a user is in the topics, the more likely he/she is to become a member of the user group.

Additionally, information propagates in a fission pattern over a large-scale social relation network formed by users’ relationships, such as friend and follow relationships [2]. This propagation pattern also facilitates detecting and organizing user groups, since users with closer relationships are more likely to maintain similar interests. User relationships in the social network thus provide an important parameter for user group discovering.

Based on the above considerations, the aim of our work is to find the most influential, mutually related topics in SNS and then organize users who are interested in these topics into groups, so that information is shared through close user relationships.

Currently, scholars cluster topics mainly through topic detection technology to construct network models for user grouping [3–5]. However, user group organization not only is based on users’ behaviors, such as sending, forwarding, or accepting, but also involves the implicit effects of social relationships in SNS [6]. Information is more likely to be shared among mutually trusting users. Therefore, the effect of social relationships cannot be ignored in user clustering and information propagation. Many existing studies have shown that relationships among people have a key influence on information propagation [7]. The interests of users, which cause people to cluster in user groups [8], can also be transferred through their trust relationships. Thus, relationships are critical to discovering user groups accurately in SNS.

Trust reflects a user’s confidence or faith in others based on past experiences or other factors. It has been used to measure the closeness of relationships among users and to calculate related reliability in social networks [9, 10]. Users in SNS pursue their favorite items, news, and other information related to the topics, and they also pay attention to, or even accept, information that is related to them or sent by trustworthy persons [11]. In our view, a user group is formed by the trustworthiness among users and essentially reflects their confidence towards a specific topic. Such point-to-point trust relationships bring users together and form groups around their common interests. Thereby, we can accurately discover qualified users to organize a topic user group based on trust chains in SNS.

(2) Main Contributions of Proposed Work. In this paper, we propose a trust-based, topic space oriented user group discovering scheme for social networks, which is composed of three phases: topic space detection, interest evaluation, and user grouping based on trust chain. First, we give an overview of our scheme and the related definitions, that is, the graph model of the social network, topic space, user interest, and trust chain. Second, we propose a method for detecting the core topic set (named topic space in this work) through topic impact evaluation and relatedness clustering. Third, we present a user interest evaluation method covering explicit and implicit interest degrees. Then, we describe a user grouping method for discovering users based on trust chains in SNS. Finally, we perform an experimental analysis to verify the effectiveness and feasibility of our method. The main contributions of this work are as follows: a topic space construction method based on topic relatedness clustering and impact evaluation, where impact covers influence degree and popularity degree; a trust chain model that takes the topological information of the user relation network into account from a strength classification perspective; a user interest degree evaluation method that calculates explicit and implicit interest degrees based on trust chain prediction; and a topic space oriented user group discovering method. With this method, core users who have a large explicit interest in the topic space are grouped according to explicit interest evaluation, and ordinary users who have implicit interest are further estimated based on trust chains in the social network.

2.1. Community Discovering in Network

Community discovering in network environments is a traditional research area [12–15]. In many existing works, networks such as SNS, P2P networks, or distributed systems are characterized by graph theory: machines or users are regarded as a set of vertices, and their relationships or communication interactions are described as a set of edges. On this basis, communities with the features of small-world networks [16] or scale-free networks [17] can be defined as induced subgraphs of the network graph. Links among the vertices (nodes) inside a community are dense and tight, while links to vertices outside the community are sparse and loose [14]. Essentially, community discovering finds sets of vertices with relatively dense links according to the topological structure and graph features of the network. Many efforts have been made in network community discovering. Girvan and Newman proposed a community detection method, called the GN algorithm, that uses the property of community structure, in which network nodes are joined together in tightly knit groups with only looser connections between groups [12]. Besides, other methods such as NN [18] and the $k$-means clustering algorithm [19] are also widely used for community discovering.

2.2. Influence Evaluation in Social Network

In the area of impact evaluation, much research has been done on influence maximization in social networks [20]. Chen and his colleagues published a series of works on influence maximization, covering greedy algorithm evaluation [21], influence diffusion dynamics and influence maximization [22], and scalable influence maximization under the linear threshold model [23]. In our view, the relations among topics and their popularity are two significant aspects of influence evaluation. Since topics are not independent in SNS, there are inherent relations among topics, and influential topics are likely to link to other influential topics. In addition, popularity is an explicit indicator that shows the influence of a topic directly. Therefore, we take these two aspects (relations among topics and their popularity degrees), which are not addressed in traditional works, into account when evaluating influence in the social network.

2.3. User Interest Analysis

User interest analysis has been used in many fields, such as online user clustering, recommender systems, and service quality evaluation. Zeng et al. [24] proposed a user interest analysis method based on user activities on the web. Li et al. [8] presented a method based on user interest popularity distribution in recommender systems. Hegde et al. [25] presented an approach that automatically assigns tags to places based on interest profiles and the visits or check-ins of users at places. Most of these traditional studies rely on users’ explicit data, such as behaviors, profiles, or other related data; that is, only explicit interest is analyzed. However, many users in SNS hold interests that they do not express explicitly, and such users are missed by the traditional measuring methods. Discovering users based on explicit interest alone is therefore insufficient. Implicit interest is another important criterion for finding potential users, which is why we take it into account for user interest analysis in this paper.

2.4. Trust Computation

There has been a large body of research on trust and reputation in the past decades [10, 26, 27]. Many methods, such as summation, averaging, or iteration of past trust ratings [26] and Bayesian models [27], have been proposed to optimize one or more aspects of trust computation performance. In addition, the weighted average of ratings is a typical and widely used method in trust computation [28]: all trust ratings about the target object are aggregated, and a weighted average of the aggregation is taken as the trust value. There are also many works on trust computing in social network research. Javier Ortega and colleagues proposed a method that computes a ranking of the users in a social network and propagates both the positive and negative opinions of the users [29]; the opinions of each user about others influence their global trust scores. Qureshi et al. proposed a decentralized framework and related algorithms for trusted information exchange and social interaction among users, based on a dynamicity aware graph relabeling system [30]. In [31], an extended Advogato trust metric is proposed to facilitate the identification of trustworthy users and to diffuse the capacity of a target user throughout a personal network. Golbeck proposed TidalTrust, which infers trust in social networks using numeric trust values [32]. It uses the shortest paths found by breadth-first search. Furthermore, TidalTrust can be used to retrieve accurate information from the most trusted adjacent nodes.

Different from a traditional network community, a topic user group is composed of users interested in the same topic. Members of a topic user group may be dispersed across the network and may not have had tight and frequent interactions with each other in the past. Therefore, we make the following considerations for topic user group discovering. First, a user’s interest degree is a significant factor in deciding whether the user belongs in the user group; those who maintain strong interest in the topic are definitely core members. Second, many users are highly interested in the topic but do not express their interest explicitly; these potential members should also be recognized. Third, interests may be transferred through users’ relationships based on their trustworthiness. That is, if there is a high level of trust between two users, they are likely to have similar interests in the same topic; for example, one of them may have a positive or negative effect on the other through their trust relationship. Thereby, trust is a crucial link through which users share common interests, and it plays an important role in user group discovering.

3. Overview of Our Scheme

A topic space oriented user group (TUG) is organized in three phases: topic space detection, interest evaluation, and user grouping based on trust chain. The topic space gathers influential and interrelated topics that attract people’s attention and public concern; it is meaningless to detect and organize a user group around inessential and unremarkable topics. User interest reflects how interested a user is in the topics, so the interest degree is the criterion for evaluating users and grouping them into a TUG. The user relationship model, called the trust chain model in this paper, reflects the degree of closeness among users. Through trust chains, we can measure the probability that users have similar topic interests and then group those with mutual trust into the same TUG.

The overview of our scheme is shown in Figures 1(a)–1(e) as follows. (a) Topics are linked through their relations (black lines) in SNS. (b) Users are linked via their trust chains (blue lines) in SNS. (c) The impact and relatedness degrees of topics are evaluated according to the indicators of topic rank and popularity. Then, influential and closely related topics are clustered into the topic space (marked in red in Figure 1(c)). (d) The interest degrees of users towards the topics are measured based on explicit or implicit interest. (e) Core users of the TUG are identified according to their explicit interests (the core users are marked in red, with their interest degrees shown as green dashed lines, in Figure 1(d)). Furthermore, ordinary users of the TUG are detected based on the selected core users and their trust chains (ordinary users are marked in pale red in Figure 1(e)).

Correspondingly, we address the following definitions in this paper.

First, we use graph theory to model the social network formally.

Definition 1 (social network graph model). The social network graph model can be described as $G = (V, E)$, where $V$ is the nonempty set of vertexes, which denote the users in SNS, and $E$ is the set of edges, which denote the relationships among users.

Through Definition 1, we can describe the trustworthy relationships among users by vertexes (users) and edges (user relationships). That is, if a user $u_i$ keeps a trust relationship with another user $u_j$, the trust can be described as an edge $(u_i, u_j) \in E$.

Definition 2 (topic space). Topic space is a set of topics which have large impacts and close relations. A topic space can be defined as $TS = (T, I_{TS})$, where $T$ is the set of topics in the topic space; $t_i \in T$ denotes a topic in the topic space and can be described as $t_i = (s_i, w_i)$; and $I_{TS}$ is the impact degree of the whole topic space.

In the above definition, a topic is described by two elements, $s_i$ and $w_i$: $s_i$ contains the core content of the topic ($c_i$), its subtopic set ($Sub_i$), and its parent topic set ($Par_i$), while $w_i$ denotes the weight value of the topic influence. Meanwhile, $I_{TS}$ is the impact degree of the topic space, an integrated value obtained by combining the impact degrees of all its topics.

Definition 3 (interest degree). The interest degree of a user is the quantified value of the user’s interest level in a specific topic or a topic space, which can be used to predict the probability that the user joins a topic group.

In this study, we consider two kinds of user interest: explicit interest and implicit interest. Interests expressed through users’ direct behaviors, such as judgments, browsing time, approving, and forwarding, are defined as explicit interest, while potential feelings or opinions that users have not expressed are regarded as implicit interest. Explicit interest can be evaluated directly from users’ past behaviors. Implicit interest can be extracted through users’ relationships, since users are linked through their relationships in the social network and such relationships can reveal possible implicit interests. That is, we can estimate implicit interest through trustworthy relationships, which are modeled as trust chains in this paper. For example, if user $u_i$, who shows no direct evidence of interest in a topic, places very high trust in a friend $u_j$ who has a strong interest in the topic, we can predict that $u_i$ might have a certain interest in the topic. In this example, explicit interest is delivered through the users’ trust relationship and thus generates implicit interest, which is the underlying rationale for implicit interest prediction in this work. Correspondingly, users in the social network have interest degrees for both individual topics and the topic space.

Additionally, relationships are another significant entity connecting users in SNS. Consequently, they can be used for evaluating the degree of closeness among users, predicting implicit interest degrees, and further organizing users into groups in this work. To achieve that, we use the notion of trust to reveal the relatedness among users, because users who trust each other are more likely to share common interests and then join the same user group. In this work, the trust relationships among users, including direct and indirect relationships, are defined formally as the concept of a trust chain, as follows.

Definition 4 (trust chain). A trust chain is a model for describing the direct and indirect links among users. It reflects the trustworthy relationship and can be defined as $TC = (N, A, C, K, V)$, where $N$ denotes the nonempty set of user nodes in the trust chain, and the user nodes are divided into three roles: source user nodes $N_s$, intermediate user nodes $N_m$, and target user nodes $N_t$; $A$ denotes the finite set of atomic trust chains; $C$ denotes the combined trust chains, which are composed of atomic trust chains, where the symbols $\otimes$ and $\oplus$ denote serial trust chain and parallel trust chain, respectively; $K$ denotes the classification of the trust chain; and $V$ denotes the trust value of an atomic or combined trust chain.

In addition, trust chains are categorized in two ways: by topological route composition and by strength. First, since indirect trust chains among users can have different route compositions, we divide trust chains into four kinds: atomic trust chain, serial trust chain, parallel trust chain, and combined trust chain. Second, to signify the strength of a trust chain and define its constraints strictly, we classify trust chains as strong or weak. Details of the trust chain are discussed later. Through Definition 4, direct and indirect trust chains can be described formally according to the topological composition of the relationships among users.

Definition 5 (topic space oriented user group). In SNS, a topic space oriented user group (TUG) can be defined as a 3-tuple $TUG = (TS, U, ID)$, where $TS$ denotes the topic space; $U$ is the nonempty set of users, in which $U_c$ is the core user set and $U_o$ is the ordinary user set; and $ID$ denotes the set of the users’ explicit or implicit interest degrees in the topic space.

A TUG contains a topic space and a set of users who maintain strong interests to it. With the consideration that many users do not express their interests through explicit behaviors or evidences, their implicit interests can be estimated and evaluated through their trust relationships with others. Therefore, there are the following properties for TUG in this paper:(1)For user set in , .(2)For and , .(3)For each , if s/he has an explicit interest () to topic space , his/her interest value satisfies condition .(4)For each , if s/he has an implicit interest () to topic space , s/he must satisfy the condition .

For ease of reading, Table 1 lists the nomenclature used in this paper.

4. Topic Space Construction and Impact Evaluation Method

We first describe the method of constructing the topic space for TUG discovering. As mentioned above, only influential and closely related topics are selected for the topic space; thus, our topic space construction method is based on evaluating the relatedness degree and influence degree of topics in the social network.

As defined in Definition 2, a topic space is composed of a set of topics, and each topic is described by two elements: semantics ($s$) and impact level ($w$). In our view, the relatedness degree is measured from the semantic perspective, while the impact degree comprises two aspects, influence degree and popularity degree. Accordingly, there are three aspects to detecting a topic space: relatedness clustering, influence evaluation, and popularity evaluation.

4.1. Relatedness Clustering for Topic Space

Relatedness clustering aims to find topics that are closely related and to form a strongly related topic set. Irrelevant topics should be removed from a topic space because they have little correlation with the topics in it and contribute little to attracting users to join the TUG. Hence, we provide a method called relatedness clustering for topic space.

We propose a factor, denoted as relatedness degree, to describe the closeness of the relations between topics. Assume that there are two topics, $t_i$ and $t_j$, and that their corresponding sample sets, which include topic related posts, comments, or other items, are $S_i$ and $S_j$, respectively. Then, the relatedness degree of the two topics can be calculated by Jaccard similarity as follows:

$$R(t_i, t_j) = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}.$$
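The Jaccard computation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `relatedness_degree`, the use of post identifiers as sample items, and the convention of returning 0 for two empty sample sets are all assumptions made here.

```python
def relatedness_degree(samples_a, samples_b):
    """Jaccard similarity of two topics' sample sets (e.g., IDs of posts or comments)."""
    a, b = set(samples_a), set(samples_b)
    union = a | b
    if not union:
        # Assumed convention: two empty sample sets are treated as unrelated.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical sample sets: the two topics share 2 of 5 distinct items.
pollution = {"post1", "post2", "post3", "post4"}
disease = {"post3", "post4", "post5"}
print(relatedness_degree(pollution, disease))  # 0.4
```

The value is symmetric in its two arguments and always lies in [0, 1], which is what allows it to be used later as an entry of a relatedness matrix.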

It is noteworthy that the relatedness degree depends on the sample sets of the topics: different sample sets yield different relatedness degrees. Therefore, we propose an iterative algorithm for stabilizing the relatedness degree, as shown in Algorithm 1.

select and ;
get ;
;
while
 select and which satisfy condition ;
 get ;
;
;
return .

Based on Algorithm 1, we obtain the relatedness degree of every pair of topics as a measure of how closely they are related. We can then detect closely related topics and cluster them based on the relatedness degree.
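One way to picture such an iterative stabilization is sketched below. It is a hypothetical reading, not a transcription of Algorithm 1: it assumes that the sample sets are enlarged step by step (here, by taking longer prefixes of ordered item lists) and that iteration stops once the Jaccard value changes by less than a tolerance. The names `stabilized_relatedness`, `start`, `step`, and `tol` are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stabilized_relatedness(items_a, items_b, start=100, step=100, tol=0.01):
    """Grow both sample sets until the Jaccard value changes by less than tol.

    items_a and items_b are ordered lists of item IDs; sampling by prefix
    is an assumption made for this sketch.
    """
    size = start
    previous = jaccard(set(items_a[:size]), set(items_b[:size]))
    while size < max(len(items_a), len(items_b)):
        size += step
        current = jaccard(set(items_a[:size]), set(items_b[:size]))
        if abs(current - previous) < tol:
            return current  # the value has stabilized
        previous = current
    return previous  # all items used; return the last value


# Hypothetical data: topic A covers items 0..999, topic B the even items 0..1998.
items_a = list(range(1000))
items_b = list(range(0, 2000, 2))
value = stabilized_relatedness(items_a, items_b)
```

In this toy data the value is already stable after the first enlargement, so the function returns early; with noisier samples more rounds would be needed.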

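Assume that there is a set of candidate topics, $T_c = \{t_1, t_2, \ldots, t_n\}$, for discovering the topic space. Then, the relatedness degree of every two topics can be calculated based on Algorithm 1, which gives a relatedness degree matrix as follows:

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix},$$

where $r_{ij}$ is the relatedness degree of topics $t_i$ and $t_j$.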

According to the relatedness degree matrix, the candidate topic with the maximum column sum is the topic with the highest relevance to all other candidate topics. We call this topic the Topic Space Kernel. The topic space can then be clustered around the Topic Space Kernel. We propose a relatedness clustering algorithm for topic space as shown in Algorithm 2.

;
get ;
;
for all do
get ;
if get then ;
;
for all do
if
 then ;
for all do
 if
 then ;
return ;

In Algorithm 2, closely related topics are discovered and clustered to form the topic space. The relatedness degree values of the topics are stabilized by Algorithm 1 before clustering, and the clustering threshold is given in advance.
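The kernel selection and threshold clustering described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than a transcription of Algorithm 2: the kernel is the topic with the largest column sum, and a topic joins the topic space when its relatedness to the kernel reaches the threshold. The function names, the example topics, and the matrix values are hypothetical.

```python
def topic_space_kernel(matrix):
    """Index of the topic whose column sum in the relatedness matrix is largest."""
    n = len(matrix)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return max(range(n), key=column_sums.__getitem__)


def cluster_topic_space(topics, matrix, threshold):
    """Return the kernel topic and the topics clustered around it."""
    kernel = topic_space_kernel(matrix)
    members = [
        topics[j]
        for j in range(len(topics))
        if j == kernel or matrix[kernel][j] >= threshold
    ]
    return topics[kernel], members


# Hypothetical candidate topics and relatedness degrees.
topics = ["pollution", "air pollution", "disease", "music"]
matrix = [
    [1.0, 0.6, 0.4, 0.0],
    [0.6, 1.0, 0.3, 0.0],
    [0.4, 0.3, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
kernel, space = cluster_topic_space(topics, matrix, threshold=0.3)
print(kernel, space)
```

With these values "pollution" has the largest column sum and becomes the kernel, and "music" is left out because its relatedness to the kernel is below the threshold.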

4.2. Influence Evaluation for Topic Space

The influence degree of a topic is evaluated from its core content, its parent topics, and its subtopics. That is, the influence degree is an integrated value calculated from the importance of these three parts. Here we present a method, called the topic influence rank (TIR) method, which is similar to PageRank [33]. Assume that there are $n$ different topics in SNS. The TIR method counts the number and quality of the relationships to a topic to obtain a rough estimate of how important the topic is; that is, more influential topics are likely to have more relationships with other topics. As shown in Figure 2, there are two kinds of relations between two topics. The first kind is the link relation (solid lines in Figure 2(a)): there are semantic relationships among topics, and the topics are linked through the inner links in topic oriented web pages. Here, a topic oriented web page is a page whose content is mainly about a specific topic. For example, a page containing a text about the topic “music” can be seen as a “music oriented page.” Most topic oriented web pages are categorized manually. (More specifically, a page is marked as a topic oriented page in the following cases: a topic is included in the title of the text in the page; a topic is included in the keywords or label words of the text in the page; or the text in the page is identified as topic oriented through semantic analysis technologies. Owing to the length limitation of the paper, detailed semantic analysis technologies are discussed in other works. All the data are prepared through data preprocessing in this work.) For example, the topics “pollution” and “disease” are in a semantic relation since there are inner links among their pages. The second kind is the hierarchy relation (dotted lines in Figure 2(b)), also called the parent-child relation. In our view, subtopics and parent topics contribute to a topic with which they have a semantic containment relation. For example, the topics “pollution” and “air pollution” are in a semantic containment relation, and the two topics contribute their impact degrees to each other.
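To make the PageRank analogy concrete, the sketch below runs plain PageRank by power iteration over a small topic link graph. It is not the TIR formula itself: the page weight factor and the hierarchy relation are omitted, and the damping value, the function name `topic_rank`, and the example links are assumptions made for illustration.

```python
def topic_rank(links, damping=0.85, iterations=50):
    """Plain PageRank over topics; links maps a topic to the topics it links to."""
    topics = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(topics)
    rank = {t: 1.0 / n for t in topics}
    for _ in range(iterations):
        new_rank = {t: (1.0 - damping) / n for t in topics}
        for source in topics:
            targets = links.get(source, [])
            if targets:
                # A topic passes its rank on in equal shares to the topics it links to.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A topic without outgoing links spreads its rank evenly.
                for target in topics:
                    new_rank[target] += damping * rank[source] / n
        rank = new_rank
    return rank


# Hypothetical link relations among three topics.
links = {
    "pollution": ["disease"],
    "air pollution": ["pollution", "disease"],
    "disease": ["pollution"],
}
rank = topic_rank(links)
```

Here "air pollution" receives no incoming links and ends up with the lowest rank, while "pollution" and "disease" reinforce each other, which mirrors the intuition that topics linked from influential topics become influential themselves.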

We first propose the TIR calculation method of semantic incompatible relation. In this case, all topics have completely different semantics of core contents. Assume there is topic set as , and, for each topic , it has a page set as . Then, for each , it can calculate its TIR degree of link relation, , as follows:where is the set of topics that have a link to page ; denotes the set of pages that have links from page ; and denotes the number of topics in set . Similar to damping factor set in PageRank, we set a parameter here to describe the probability of topic change. Additionally, we give a weight factor, , for page to distinguish the importance of pages as follows: if the page belongs to same topic with page , which means the link from to is an inner link, page would contribute less influence than the page which has external link with and a page belonging to a more influential topic would bring more contributions to the pages in its set. The example is shown as in Figure 2(a) and we can see that there are three topics (, , and ) and their pages and links. Then, can be calculated iteratively and finally can be convergent to stable values (suppose that is set as 0.2) as follows:

In addition, we calculate TIR degree of hierarchy relation. Likewise, we assume that if a topic’s subtopics or parents have higher TIR degrees, the topic would get a higher TIR degree. Let the subtopic set of a topic be and let the parent set be . The TIR value of for hierarchy relation, , can be calculated as follows: where is the number of subtopics of topic and is the number of parent topics of topic . Meanwhile, is impacted by the relatedness degree between topic and its subtopic or parent topic. The value of could be 0 while a topic has no parent or subtopic. For example, there are three parent topics (, , and ) and two subtopics ( and ) of topic , and then value of in Figure 2(b) can be calculated as follows:

In summary, influence degree value of a topic by TIR can be calculated based on the above two equations:

4.3. Popularity Evaluation for Topic Space

Popularity is another significant criterion for topic influence evaluation. In our consideration, the underlying assumption of popularity evaluation is that the more popular a topic is, the more influential it would be. Hence, we calculate the popularity of a topic based on its related data, including user number, propagated communities, average browsing time, and lasting time.

We first introduce several types of topic related data for popularity evaluation in SNS as follows.
(1) Number of followers: $N_f(t)$ denotes the number of users who follow the topic $t$.
(2) Number of communities: $N_c(t)$ denotes the number of communities in which the topic $t$ is propagated.
(3) Browsing time: $T_b(t)$ denotes the average length of time that users spend on topic $t$.
(4) Lasting time: $T_l(t)$ denotes the length of time that topic $t$ stays hot in SNS.
(5) Activity: a topic is active in a time slice if and only if it is posted, followed, browsed, or propagated or is the object of other social behaviors in SNS. $A(t)$ denotes the activity level of topic $t$ over its time slices.

All the above types of data are available through specific collection methods in SNS. In this paper, the popularity of a topic is the average of five indicators derived from the above five types of topic related data. Let the total numbers of users and communities in SNS be $N_U$ and $N_C$, respectively, and let the maximum lengths of browsing time and lasting time over all topics in SNS be $T_b^{\max}$ and $T_l^{\max}$, respectively. Then, the popularity level $P(t)$ of topic $t$ can be calculated as follows:

$$P(t) = \frac{1}{5}\left(\frac{N_f(t)}{N_U} + \frac{N_c(t)}{N_C} + \frac{T_b(t)}{T_b^{\max}} + \frac{T_l(t)}{T_l^{\max}} + A(t)\right).$$

In the above equation, $A(t) = n_a / n_s$, where $n_s$ is the number of time slices in the life cycle of topic $t$ and $n_a$ is the number of time slices in which $t$ is active.
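As a small worked example, the popularity level can be computed as the average of five normalized indicators. The sketch assumes that each indicator is normalized by the corresponding total or maximum described above; the function name, parameter names, and all numbers are hypothetical.

```python
def popularity(followers, communities, browsing_time, lasting_time,
               active_slices, total_slices,
               total_users, total_communities,
               max_browsing_time, max_lasting_time):
    """Average of five normalized popularity indicators of a topic."""
    indicators = [
        followers / total_users,            # share of users following the topic
        communities / total_communities,    # share of communities reached
        browsing_time / max_browsing_time,  # browsing time relative to the maximum
        lasting_time / max_lasting_time,    # lasting time relative to the maximum
        active_slices / total_slices,       # fraction of time slices with activity
    ]
    return sum(indicators) / len(indicators)


# Hypothetical topic: indicators are 0.2, 0.25, 0.5, 0.5, and 0.8.
p = popularity(followers=2000, communities=10, browsing_time=3.0, lasting_time=15.0,
               active_slices=8, total_slices=10,
               total_users=10000, total_communities=40,
               max_browsing_time=6.0, max_lasting_time=30.0)
print(round(p, 2))  # 0.45
```

Because every indicator lies in [0, 1], the popularity level does too, so topics of very different scale remain comparable.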

Through the above two aspects of impact evaluation, that is, influence and popularity, we can get the total impact of a topic in SNS as follows:

Furthermore, impact of whole topic space is measured based on its included topics. Let there be a topic space , and the impact of each topic is . Then, the impact of the topic space is a comprehensive evaluation of all topics in as follows:

5. Trust Chain Model and Its Computation Method

5.1. Trust Chain Model

Here, we propose the model of trust chains in detail based on their different network topologies and their trust value calculation methods. We divide trust chain into four types based on the topology.

(1) Atomic Trust Chain. A trust relationship between users is an atomic trust chain if and only if there is a direct link between two nodes and no intermediate node between them.

(2) Serial Trust Chain. A trust relationship between users is a serial trust chain if and only if there is a serial path from source node to target node and the path has the following properties: for source node, its out-degree is 1 and in-degree is 0; for the target node, its out-degree is 0 and in-degree is 1; for each intermediate node, its out-degree is 1 and in-degree is 1.

(3) Parallel Trust Chain. A trust relationship between users is a parallel trust chain if and only if there are two or more trust paths from the source node to the target node, the paths share no intermediate node, and the paths have the following properties: for the source node, its out-degree is $n$ ($n \geq 2$) and its in-degree is 0; for the target node, its out-degree is 0 and its in-degree is $n$ ($n \geq 2$); for each intermediate node, its out-degree is 1 and its in-degree is 1.

(4) Combined Trust Chain. A trust relationship between users is a combined trust chain if and only if the trust chain is composed of the above three kinds of trust chain.
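The four topological definitions above can be checked mechanically from node degrees. The following Python sketch is illustrative only: the function name `classify_trust_chain` and the edge-list representation are assumptions, not part of the proposed scheme, and it assumes the edge set contains exactly the trust paths leading from the source node to the target node.

```python
from collections import Counter


def classify_trust_chain(edges, source, target):
    """Classify a trust chain by topology using in/out degrees.

    edges: iterable of directed (truster, trustee) pairs that make up the
    paths from source to target (hypothetical input format).
    """
    edges = set(edges)
    if not edges:
        raise ValueError("a trust chain needs at least one edge")
    # Atomic: one direct link and no intermediate node.
    if edges == {(source, target)}:
        return "atomic"
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    nodes = {n for edge in edges for n in edge}
    middle = nodes - {source, target}
    ends_ok = in_deg[source] == 0 and out_deg[target] == 0
    middle_ok = all(in_deg[n] == 1 and out_deg[n] == 1 for n in middle)
    if ends_ok and middle_ok and out_deg[source] == in_deg[target]:
        n_paths = out_deg[source]
        if n_paths == 1:
            return "serial"
        if n_paths >= 2:
            return "parallel"
    # Anything else mixes the three basic kinds.
    return "combined"
```

For instance, two node-disjoint two-hop paths between the same pair of users are reported as a parallel trust chain, while a path that forks at an intermediate node is reported as combined.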

Furthermore, we here classify the trust chain between users into two types according to the mutual trust degrees as follows.

(1) Strong Trust Chain. A trust chain is a strong one if and only if there are two mutually accessible trust chains between two users (one in each direction) and the trust degrees of both chains are higher than a given strong-trust threshold.

(2) Weak Trust Chain. A trust chain is a weak one if and only if there is a trust chain from source node to target node whose trust degree is higher than a given weak-trust threshold, and that trust chain is not a strong one.

In our definition, a strong trust chain reveals a mutual high trust relationship between two nodes, while a weak trust chain reflects a unidirectional trust degree or a bidirectional trust degree with a relatively high value.
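The strength classification can be sketched in a few lines of Python. The function name, the `None` convention for a missing reverse chain, and the default thresholds (0.6 and 0.5, borrowed from the Figure 3 example) are assumptions made for illustration.

```python
def classify_strength(forward, backward=None,
                      strong_threshold=0.6, weak_threshold=0.5):
    """Label a trust chain as 'strong', 'weak', or 'none'.

    forward:  trust degree of the chain from source to target.
    backward: trust degree of the reverse chain, or None if no reverse
              chain exists (hypothetical convention).
    """
    # Strong: chains exist in both directions and both exceed the threshold.
    if (backward is not None
            and forward > strong_threshold
            and backward > strong_threshold):
        return "strong"
    # Weak: the forward chain exceeds the weak threshold but is not strong.
    if forward > weak_threshold:
        return "weak"
    return "none"
```

Note that a high reverse trust degree alone is not enough: the label always depends on the chain from source to target exceeding at least the weak threshold.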

5.2. Computation of Trust Chain Model
5.2.1. Atomic Trust Chain

In atomic trust chain, there is no intermediate node between two nodes. Then, we can calculate the trust degree through their direct trustworthy interactions and their interest similarity. Let there be two nodes , , and denotes the trustworthy opinion which is expressed by to . Assume that there is a common topic set, , which denotes the set of topics of interest by both nodes. For each , the number of nodes that maintain explicit interest degrees to it is , and the maximum number of nodes that maintain explicit interest degrees to all the topics in social network is . Then, the degree of atomic trust chain from to can be calculated as follows:

where denotes the trust degree based on nodes’ direct interactions and is the similarity degree based on nodes’ interests. Here, we use the factors, and , to reveal that a topic with fewer explicitly interested nodes contributes more to the calculation of nodes’ interest similarity.

5.2.2. Serial Trust Chain

In serial trust chain, we give a constraint for its composition as follows.

Constraint 1. Each atomic trust chain part in serial trust chain must be a strong atomic trust chain or a weak atomic trust chain.

That is, an atomic trust chain part with a low trust degree would be excluded from the serial trust chain, and thus the indirect path from source node to target node cannot be considered as a serial trust chain.

Let there be source node , target node , and intermediate node in serial trust chain and which denotes trust value of atomic trust chain part in the serial trust chain. Therefore, we can calculate the trust value of serial trust chain as follows:

Here function denotes the depth of serial trust chain; namely, . We can see that the deeper the depth of serial trust chain is, the weaker the trust value among users is. That means longer serial trust chain would be punished since the trust would be damped with the number of intermediate nodes increasing. In addition, we give a parameter for distinguishing the weights of strong and weak atomic trust chain parts in serial trust chain.

5.2.3. Parallel Trust Chain

In parallel trust chain, there are at least two serial paths without intersection from source node to target node. In addition, we here present a constraint for ensuring the reliability of parallel trust chain as follows.

Constraint 2. A serial path can be seen as a serial trust chain in a parallel trust chain if and only if it is a strong or weak serial trust chain.

From Constraint 2, we can divide the serial trust chains into two types as follows: the strong or weak serial trust chains, called active serial trust chains, are taken into consideration in parallel trust chain evaluation; and the serial trust chains with low trust degrees, called inactive serial trust chains, are excluded from the trust degree calculation of the parallel trust chain. However, the number of inactive serial trust chains (those whose atomic trust chain parts are all strong or weak ones but whose overall trust degree is low) is also used in parallel trust chain evaluation, since their low trust degrees reflect the untrustworthy aspect of the parallel trust chain.

Let there be () serial trust chains in the parallel trust chain from source node to target node . Assume that there are () active serial trust chains in parallel trust chain. () denotes each serial trust chain, and represents trust degree value of serial trust chain . Then, the trust degree of parallel trust chain from to , , can be calculated as follows:

In the above equation, the trust degree of a parallel trust chain is calculated as follows: the trust degrees of active serial trust chains are combined by a weighted average (each active serial trust chain has its own weight), and the number of inactive serial trust chains is treated as a negative aspect and is thus used to weaken the trust degree of the parallel trust chain by exponential weighting as in (33). We can see that a higher ratio of active serial trust chains in the parallel trust chain, and equivalently a lower ratio of inactive to active serial trust chains, implies a higher exponential weighting.

5.2.4. Combined Trust Chain

A combined trust chain includes crossing paths built from the above three kinds of trust chains. We introduce an iterative optimizing approach for combined trust chain, called IOA, which includes the strong or weak trust chain parts and excludes the other parts. The constraint of the proposed approach is as follows.

Constraint 3. There are the following four rules for IOA.

(i) Local trustworthy rule (LTR): for each atomic path in combined trust chain, it can be seen as an active atomic trust chain for combined trust chain if and only if it is a strong or weak trust chain. That is, those atomic paths which are not strong or weak trust chains from nodes to their neighbors can be ignored in the combined trust chain, and thus their successor paths are ignored due to the breakage occurring.

(ii) Serial trustworthy rule (STR): for a serial path from to , it can be seen as active trust chain if and only if it satisfies Constraint 1.

(iii) Serial merging rule (SMR): if there is a combined trust chain from ’s indirect neighbor nodes to , it would be merged as a serial one iteratively.

(iv) Parallel calculating rule (PCR): if there are two or more direct neighbor nodes which have serial trust chains from to , the combined trust chain can be reconstructed as parallel trust chain with its neighbors iteratively if and only if the reconstructed parallel trust chain satisfies Constraint 2.
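The local trustworthy rule (LTR) amounts to pruning the relation graph before any trust value is combined. The Python sketch below is one possible reading, not the authors' implementation: the name `apply_ltr`, the dictionary input, and the default threshold of 0.5 are assumptions, and an atomic path is treated as active when its trust degree exceeds the weak threshold (which presumes the strong threshold is not lower than the weak one).

```python
from collections import deque


def apply_ltr(trust, source, weak_threshold=0.5):
    """Return the atomic paths that remain usable after applying LTR.

    trust: dict mapping (truster, trustee) pairs to atomic trust degrees
    (hypothetical input format).
    """
    # Keep only atomic paths that are at least weak trust chains.
    active = {edge for edge, value in trust.items() if value > weak_threshold}
    successors = {}
    for u, v in active:
        successors.setdefault(u, []).append(v)
    # Breadth-first search: successor paths behind a broken atomic path
    # are never reached from the source and are therefore ignored.
    reachable = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)
    return {(u, v) for u, v in active if u in reachable}
```

The remaining rules (STR, SMR, PCR) would then operate on the pruned edge set; they are not sketched here because they depend on the serial and parallel trust equations.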

An example of our scheme is shown in Figures 3(a)–3(e). In Figures 3(a) and 3(b), we can see the parallel paths and the inverse paths from source node to target node , while the intermediate nodes are , , and . Assume that the thresholds, and , of strong trust chain and weak trust chain are set as 0.6 and 0.5, respectively. Then, we can get the strong atomic trust chains (, , ) and weak trust chains (, ), while the path is an ignorable path due to its low trust degree. Then, we can calculate the trust degrees of serial paths (, ) based on (32) and Constraint 1 as follows:

Likewise, we can get the active serial trust chains as (strong trust chain) and (weak trust chain) according to inverse serial trust chain calculation and Constraint 1 in Figure 3(b), while the serial path of is an inactive path. Then further, we can get the trust degree of the parallel path in Figure 3(a) through (33) as follows:

In Figure 3(c), we can see a combined path which includes joint nodes from source node to target node . According to LTR in Constraint 3, the atomic paths, , , , and , are seen as ignorable paths, and thus we can get the paths for calculating trust as in Figure 3(d). Further, the parallel parts of path from to and path from to can be eliminated by using SMR to generate serial trust chains, as in Figure 3(e). Based on PCR in Constraint 3, the trust degree of combined trust chain in Figure 3(e) can be calculated as follows:

5.3. User Influence Evaluation through Trust Chain

Each user has his/her influence in SNS. Here, we evaluate user influence by counting the users who maintain high trustworthiness in the trust chain. The more users trust the source user in the trust chains, the more influential the source user is.

For a user , let there be a trust chain, , in which user is the source node. We denote that the valid impacted user, , is the node which satisfies the following condition in trust chain :

Assume that there are trust chains, , in which user is the source node, and the valid impacted user set in each trust chain is denoted as . Then, the user influence can be evaluated as follows:

6. The Calculation Method of Interest Degree

6.1. User Explicit Interest Degree Evaluation towards Topic

Explicit interest degree can be measured through users’ direct behaviors or other directly witnessed evidence. In this study, we refer to these direct items for evaluating explicit interest degree as interest evidences. For example, the behaviors of a node, such as “forwarding,” “approving,” “following,” and “comments,” can be seen as interest evidences. In our work, we have the following considerations for user explicit interest degree evaluation towards topic.

(1) Explicit interest is measured by the level of each interest evidence and its weight. This means explicit interest is aggregated from users’ past interest evidences and, meanwhile, interest evidences have different impacts in interest aggregation.

(2) Explicit interest is impacted by the impact degree of the topic. That is, a more influential topic would attract more users to browse it. From this point, we consider that if a user shows explicit interest towards a topic with lower influence, he/she might have more interest in the topic.

Then we address the calculation method of explicit interest degree towards a topic. Assume that user has his/her past different kinds of interest evidences as , and denotes the appearance probability of in all kinds of interest evidences. Meanwhile, for a topic , assume that user did times of interest evidences which were recorded as a set and denotes the probability of interest evidence category appearing when the user faced the topic . The weight of each is (). Then, user node would have an explicit interest degree about the topic when it appears in next future as

In the above equation, denotes the probability of appearing interest evidence for topic and it can be calculated as

Then, we give the calculation method of weight . In our view, there are inherent relations among interest evidences: many interest evidences appear simultaneously or serially. For instance, interest evidence of long time browsing would be likely to lead to interest evidences of “approving” or “adding to the favorites list.” Their impacts are therefore related to each other, and these relations are strengthened as they appear more often. From this view, our underlying principle of interest evidence weight calculation is similar to the method of PageRank, as more important interest evidences are likely to be related to other more important interest evidences. Here, interest evidence caused by another interest evidence is conveniently written as a link . From this, we can calculate the weight of interest evidence as follows.

Let there be an interest evidence set as , and the set of interest evidences that can link to is denoted as . Then, the equation of calculating link-weight of interest evidence is as follows:

where is the probability that occurred in users’ past behaviors and is the number of interest evidences which link out of .

Moreover, user’s interest is a dynamic feeling and keeps changing with the time-passing. Here, we propose a dynamic predicting method based on aging algorithm for describing the interest changing. Suppose the original explicit interest degree calculated at the end of time quantum is . Now, suppose the next value of explicit interest is changed as at the end of next time quantum . Then, we can consider that the explicit interest degree of topic is an integrated value of the past two values and, then, renew the dynamic predicting value by taking a weighted sum of these two numbers; that is,

By analogy, the dynamic estimating value of explicit interest degree can be calculated as
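The aging update described above can be illustrated with a short Python sketch. The weight symbol is not recoverable from the text, so the parameter `alpha` (the share given to the previous estimate) and its default of 0.4 are assumptions, as is the function name.

```python
def aged_interest(measurements, alpha=0.4):
    """Fold per-quantum interest measurements into one aged estimate.

    measurements: explicit interest degrees measured at the end of
    successive time quanta, oldest first.
    alpha: hypothetical weight kept for the previous estimate; the
    newest measurement receives 1 - alpha.
    """
    values = list(measurements)
    if not values:
        raise ValueError("at least one measurement is required")
    estimate = values[0]
    for value in values[1:]:
        # Weighted sum of the running estimate and the newest measurement.
        estimate = alpha * estimate + (1 - alpha) * value
    return estimate
```

With `alpha=0.5`, the measurements 0.2, 0.8, and 0.4 give 0.5 after the second quantum and 0.45 after the third, so older observations fade geometrically.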

In this work, we have the following constraint for interest degree.

Constraint 4. A user has an explicit interest degree toward a topic if and only if his explicit interest degree value is larger than a given threshold .

6.2. Implicit Interest Degree of Topic

Implicit interest degree reflects a kind of users’ potential opinion toward topic. Our proposed implicit interest evaluation mainly focuses on solving the problem of interest prediction without direct evidences or data. Thus, in our proposed work, there are two main considerations for implicit interest evaluation.

(1) Implicit interest manifests the likelihood of a user’s potential interest when he/she has not shown any explicit interest or direct evidence before, such as a newly registered user. Therefore, the only direct evidence for predicting the interest of such a user is his/her relationships with others. Of course, not all relationships can be used to reflect and predict his/her potential interest. Consequently, we in this work use the relationships with a high trustworthy degree for implicit interest evaluation. That is because a relationship with a high trustworthy degree is established based on users’ past interactions and experience, which can reflect the similarity between users at a higher level.

(2) Trustworthy relationship based implicit interest has a relatively low computational overhead. The reason is that, by evaluating the implicit interest through trust relationships, the proposed method relies only on the trustworthy relationships and their degrees among users rather than on other detailed user data. That is, each evaluation only needs to query the trust relationships and degrees among users, which avoids querying and storing detailed records, whether or not they are relevant to the implicit interest evaluation.

In this study, there are two factors for implicit interest evaluation: trust and user similarity. Trust degree reflects the probability of a user’s approving or opposing attitude through his/her trust relationships. Therefore, users tend to be influenced by those whom they trust. On the other hand, user similarity reveals the common interests between users. It means users are easily influenced by those who have similar interests. For example, a user might have potential interest in a topic because the topic is his/her best friend’s favorite topic and most of the time the user shares interests with his/her friend on other topics. Thereby, we give an estimating method for evaluating the implicit user interest degree according to the user trust chain model and interest similarity. This approach comprises two steps: firstly, we calculate the value, called deliverable implicit interest degree, one-to-one from a source node to target node based on their trust chain, explicit interest degree, and user interest similarity; then, we integrate the deliverable implicit interest degrees to the target node from all source nodes which have trust chains with the target node. In addition, there is the following constraint for implicit interest degree of topic.

Constraint 5. The explicit interest degree of a source node can be used for implicit interest evaluation if and only if there is a strong or weak trust chain from target node to the source node.

Let there be a trust chain from target node to source node at time quantum , and the original explicit interest degree of toward a topic is . Suppose that the original trust value of is and then the original deliverable implicit interest degree from to is which can be calculated as

where denotes the user interest similarity between and about all their other common explicit interested topics, and is the weight for distinguishing the impact of strong and weak trust chain.

Now suppose the trust value of and explicit interest degree of are measured to be and at the end of next time quantum . Then, we can renew the estimation value of implicit interest degree by taking a weighted sum of the new values as

By analogy, we can get the equation for deliverable implicit interest degree after quantum as

After finishing the first step of deliverable implicit interest degree estimation, we can calculate the overall implicit interest degree from all the source nodes to target node. Let there be a set of source nodes, which satisfy Constraint 4, and each node in has trust chain from target node . Then the overall implicit interest degree can be calculated as
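Before the aggregation, the set of usable source nodes has to be selected according to Constraints 4 and 5. The Python sketch below covers only this selection step under stated assumptions: the function name, the dictionary inputs, and the default thresholds (0.4 for explicit interest, as suggested in Section 8.3, and 0.5 for a weak trust chain, as in the Figure 3 example) are illustrative and not taken from the paper's equations.

```python
def eligible_sources(target, explicit_interest, trust,
                     interest_threshold=0.4, weak_threshold=0.5):
    """Select source nodes usable for the target's implicit interest.

    explicit_interest: dict node -> explicit interest degree for the topic.
    trust: dict (truster, trustee) -> trust degree of the trust chain
           (hypothetical input format).
    """
    return sorted(
        node
        for node, degree in explicit_interest.items()
        if node != target
        # Constraint 4: the source must hold an explicit interest.
        and degree > interest_threshold
        # Constraint 5: a strong or weak chain from target to source.
        and trust.get((target, node), 0.0) > weak_threshold
    )
```

Only the nodes returned here would contribute deliverable implicit interest degrees to the target node.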

6.3. User Interest Degree of Topic Space

User interest degree of topic space is comprised of user interest degrees of topics which are included in topic space. Here, we use weighted average method for calculating interest degree of topic space.

Assume that there is topic space and user has explicit interest degree (or implicit interest degree) of topic as (or ). Then, user interest degree of topic space can be calculated as

7. TUG Discovering Algorithm Based on Trust Chain

In this work, the basic idea of single topic user group organization is as follows: firstly, influential topic is selected based on impact evaluation; secondly, core users, who have strong interests in the specific topic, should be discovered; then, ordinary users, who keep certain explicit or implicit interests in the topic, should be organized through trust chains. Correspondingly, we set two sets, and , for recording the above two types of users.

Based on the above consideration, we first propose the influential topic discovering algorithm as shown in Algorithm 3.

, ;
for do
 get ;
;
for do
 if then ;
return ;
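A minimal Python sketch of the selection rule in Algorithm 3, assuming it keeps the topic spaces whose impact exceeds the mean impact of all topic spaces and that have at least one follower. The function name and the input format are hypothetical.

```python
def select_influential(topic_spaces):
    """Pick influential topic spaces.

    topic_spaces: dict name -> (impact, follower_count)
    (hypothetical input format).
    """
    if not topic_spaces:
        return []
    # Mean impact over the whole topic space set.
    mean_impact = sum(impact for impact, _ in topic_spaces.values()) / len(topic_spaces)
    return [
        name
        for name, (impact, followers) in topic_spaces.items()
        if impact > mean_impact and followers != 0
    ]
```

A topic space with high impact but no followers is dropped, since no user group could be formed around it.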

Here, the topic spaces whose impact values are larger than the average over the whole topic space set and whose numbers of follower users are not equal to 0 are selected as influential topics. Then, for each topic space , we propose the algorithm of core user discovering for TUG to discover core users from a candidate set as shown in Algorithm 4.

, , ;
while do
 for do
  get ;
  if
  then ;
 for do
  if
  then
  else ;
return ;

In Algorithm 4, we set two thresholds, and , for selecting qualified core users. The thresholds can be calculated as

After discovering the set of core users, ordinary users of TUG can be found based on their trust chains and explicit (or implicit) interest degree. Then, we propose the ordinary user discovering algorithm for TUG as shown in Algorithm 5.

, ;
for do
 if then
  if
  then ;
for do
;
 while do
 if
 then ;
 else , ;
return ;

8. Experiment and Analysis

In this section, we present experiments to evaluate the performance of our proposed scheme. In our experimental scenario, the data come from a real social network platform, Tencent microblog, which is very popular in China, and the dataset was collected manually. Our data include about 3,750 IDs (some IDs are located in two or more communities) and more than 457,000 records, including posts, comments, and users’ behaviors (browsing, approving, forwarding, and others). The topology of the collected dataset comes from the users’ real relationships, and the average out-degree of a node is about 9. We select eight kinds of topic, that is, education, entertainment, sporting, technique, financial, food, touring, and history, for the user group discovering experiment. Details about initial data setting are shown in Table 2. Additionally, using our previous data processing method, we also identify about 780 robot nodes in the dataset, which we call invalid nodes; they only follow real users, forward posts, or hit approving automatically and should be excluded in user grouping.

8.1. Examination for Topic Space Construction and Impact Evaluation
8.1.1. Performance of Topic Space Relatedness Clustering Method

In this examination, we aim to reveal the performance of the topic space relatedness clustering method. Here, the candidate topic sets for space clustering are selected randomly from the eight kinds of topics. For comparison, we give four groups as follows: topic grouping based on the k-means method (M), which classifies topics into a specified number of groups; topic grouping based on cosine similarity (CS), which selects feature words for feature vectors by the TF-IDF method; topic grouping based on NN method (NN); and the proposed relatedness clustering method (RC). To verify the performances of the compared groups, we use a noise-free data set and data sets with 20% and 30% noise, respectively, where the noise consists of topics that do not belong to any of the above eight kinds. The results are shown in Figures 4(a)–4(c). We can see that our proposed method slightly outperforms the other three groups.

And further, we analyze the impact of two thresholds, and , in Algorithms 1 and 2. As shown in Figure 4(e), the average accuracy of relatedness between topics based on Algorithm 1 is decreased with the increasing of value setting of threshold . The reason is that a higher value of would allow Algorithm 1 ending with less stable relatedness values between topics. In addition, we can see that the average accuracy is lower while the value of is set too low or too high. We consider that a too low value of leads to unrelated topics being grouped in topic space while a too high value of leads to appropriate topics being excluded from topic space. This test validates that the thresholds around 0.1 and 0.7 are often a reasonable compromise for and .

8.1.2. Performance of Topic Space Impact Evaluation

In this examination, we reveal the performance of the proposed topic space impact evaluation method. We use about 240 topic spaces, grouped by the proposed space clustering method, for evaluating their impact degrees. All topics in the topic spaces are contained in the initial data set. We set six impact evaluation groups with different methods for comparison as follows: only link relation based impact calculation (LR), only hierarchy relation based impact calculation (HR), only proposed influence degree based impact calculation (ID), only proposed popularity degree based impact calculation (PD), linear threshold model based method (LT), and our proposed integrated impact degree method (TP). We then record the average precisions of the six groups as in Figure 5(a). Here, the reference results of topic impact evaluation are set manually in advance, and an evaluation result is seen as accurate if it differs from the reference result by no more than 0.1. We can see that the LT method and our proposed TP method get similar performances, and both outperform the other methods. Meanwhile, we can see that the precision of TP keeps increasing as the number of topics in the topic space increases, as in Figure 5(b). That means that as the topic space includes more topics, the data for impact evaluation are enriched, which results in better performance in impact evaluation.
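The accuracy criterion used in this examination can be written down directly. The Python sketch below assumes that precision is the fraction of topic spaces whose evaluated impact lies within 0.1 of the manually set reference value, with the boundary counted as accurate; the function name is hypothetical.

```python
def tolerance_precision(evaluated, reference, tolerance=0.1):
    """Fraction of evaluated impacts within `tolerance` of the reference."""
    if len(evaluated) != len(reference) or not evaluated:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1 for value, ref in zip(evaluated, reference)
        if abs(value - ref) <= tolerance
    )
    return hits / len(evaluated)
```

For example, evaluated impacts of 0.5, 0.7, and 0.2 against references of 0.55, 0.5, and 0.2 yield two accurate results out of three.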

8.2. Performance Evaluation of Proposed Trust Chain Calculation

In this examination, we measure the accuracy of predicting trustworthiness among users through invalid node detection. In this test, we use the trust degrees between real users and invalid users to test the accuracy of invalid node detection. For a real user, if he/she has a strong or weak trust chain with an invalid node, the trust relationship between them is counted as wrong. For comparison, we introduce the following groups: atomic trust chain (ATC), serial trust chain (STC), parallel trust chain (PTC), hybrid trust chain (HTC), EigenTrust method (ET) [28], weighted average trust rating (WA) [30], and ultimate trust rating (UT) [31]. Figure 6(a) reveals that the UT method gets the best performance, while our trust chain methods, including ATC, STC, PTC, and HTC, get similar performances and are better than the ET and WA methods. We consider that the reasons are as follows: the ultimate trust method dynamically adjusts its factor at a large computational cost, which results in all users maintaining trustworthy knowledge about others for detecting malicious interactions, whereas the trust chain methods can provide more comprehensive evaluation of relationships among users through the proposed factors at relatively low cost. In addition, we compare the performance of the proposed trust chains (ATC, STC, PTC, and HTC) with strength perspective (strong and weak trust chain) against trust chains without strength constraints. The results are shown in Figure 6(b), and we can see that the performances of the proposed trust chains are clearly higher than those of the methods without strength constraints. With respect to value setting of thresholds , , we have that values around 0.7 and 0.5 are a reasonable compromise according to our empirical testing.

8.3. Examination of User Interest Degree Measurement

In this examination, we evaluate the performance of the proposed interest degree for topic space. Firstly, we test the effects of explicit interest degree calculation. We calculate users’ explicit interest toward selected topic spaces and then compare the explicit interest quality of the following groups: view-time based interest (VT), weighted average of interest evidence based interest degree (WA), the explicit interest without dynamic predicting (ND), and the proposed explicit interest calculation in this paper (EI). As shown in Figure 7(a), the EI method outperforms VT, WA, and ND, with an improvement of 28%, 21%, and 13%, respectively. We consider that the reasons are as follows: interest evidences are weighted by their appearing probability and link weights, and the dynamic predicting method is used for measuring the latest interest. In addition, the impact of the threshold in Constraint 4 is shown in Figure 7(b). By empirical analysis, the threshold can be set as 0.4.

Then, we test the performance of implicit interest prediction for topic space. Figure 7(c) shows the performance of different implicit interest calculation methods, that is, strong trust chain based implicit interest (SMI), weak trust chain based implicit interest (WMI), and integrated implicit interest based on both strong trust chain and weak trust chain (IMI). We can see that the performance of SMI is best since the strong trust chain implies a mutual trustworthy relationship which brings a higher precision of implicit interest prediction. The total average accuracy of implicit interest prediction based on trust chain is about 80.9%.

Finally, we verify the impacts of parameter for interest degree calculation. We test the different values of for explicit interest degree (EI) and implicit interest degree (MI), respectively. In Figure 7(d), we can see that the average accuracy is lower while is too low or too high. On the basis of test validation, a threshold around 0.4 is a reasonable compromise.

8.4. Topic Space Oriented User Group Discovering Analyzing

In this examination, we verify the performance of our user group discovering method with trust chain based interest measuring (TI). We compare our proposed method with three methods: nearest neighbor method (NN), only explicit interest based user grouping method (EI), and relationship closeness (edge density) based user grouping method (RC). Figure 8 compares the user group discovering quality of TI, NN, EI, and RC. As shown in Figure 8, NN, EI, and RC obtain lower performance than TI in all eight kinds of topics, respectively. We consider that the reasons are as follows: TI is an interest sensitive approach which adopts both explicit and implicit interests as a key criterion for user grouping, while NN and RC do not take interest into account and EI only considers the explicit interest; in TI, users are grouped within a close related way, strong or weak trust chain, for manifesting their close correlations, which decreases the error rate in ordinary user discovering.

Furthermore, we examine the effectiveness of user group discovering through online topic recommendation quality. In this examination, we recommend manually preprocessed topics, including both close related and irrelevant topic space, and then record the quality of recommendation. In the recommendation, the above mentioned topics are recommended to users who are in four kinds of user groups that are organized by methods of TI, with NN, EI, and RC. Then, if a user performs behaviors of hitting, reading, forwarding, or posting judgments on his/her received topic, the recommendation is recorded as an accurate one. The precision of a recommendation to a user group is calculated as the average of all members in the user groups. There are six user groups corresponding to each kind of organizing method, respectively, and about 120 times of recommendation were launched in our experiment. We record the average precision of all kinds of user groups and the results are shown in Figure 9. We can see that recommendation quality of user group discovered through our proposed method is better than other three methods. That means close related topics are accepted and irrelevant topics are excluded by users in user group based on TI, which implies higher accuracy and better efficiency than user groups through NN, EI, and RC.

9. Conclusion

For users in SNS, joining a user group to facilitate their communication is very common. Commonly, user groups are organized based on a set of topics having close internal correlations. However, most existing research focuses on user clustering based on relation closeness degree and common explicit interests, while little effort has been devoted to users’ interest interaction and expansion. Establishing user groups based on influential topics has practical significance. First, our work provides a guideline for describing the topic oriented user group and a formal, specific computation methodology for organizing user groups. Each user can inquire about details of topic space, including influence degree, members of group, and estimated user interests, for his/her further purpose of information communication. In practice, a machine driven mechanism can achieve higher efficiency than manual methods and reduce the overhead of user group discovering and organizing. In addition, our proposed formal definitions and conceptions are appropriate for machine reading and understanding of the calculation methods. Secondly, applying both influence and popularity helps aggregate all aspects into a more comprehensive impact degree for topics and topic spaces. This is because the proposed influence degree of topic directly reflects the structural closeness, which also shows the integrated impacts of a topic to some extent, while the proposed popularity degree reflects how influential a topic is from direct data through impartial views. Thereby, our proposed method can provide sound usability for influential topic space discovering. Thirdly, the proposed trust chain model captures direct and indirect trust among users as well as the strength of indirect trust (strong or weak trust chains). Meanwhile, the topological information of trust chain is fully considered in this work. That is, we calculate trust degrees of trust chains in different route compositions corresponding to the ways people trust each other in the real world; for example, trust is damped as the depth of the route increases, and trust is enhanced when multiple direct neighbors link to the same target simultaneously through parallel paths. Furthermore, the proposed interest evaluation gives a novel method of calculating both explicit and implicit interest degrees, which may facilitate automatic user preference analysis.

In our view, the user group in SNS is an important community for users’ daily discussion, information sharing, and recommendation under similar interests. Commonly, user groups are organized based on a set of topics having close internal correlations, which means that closely related topics may attract more users to share their information. In this study, our main purpose is to find a method for gathering influential and closely related topics and predicting all probably interested users (including explicitly and implicitly interested users) to form user groups. To do so, we propose a topic space discovering scheme with a trust relationship based interest measuring perspective, which contains four aspects: influential topic space construction through topic relatedness clustering and impact evaluation, user trust chain evaluation based on SNS topological information, explicit and implicit interest estimation based on user trust chain, and user group discovering. Empirical examinations on our manually collected dataset show the efficiency and feasibility of our proposed scheme. Our future work will study user group organization with overlapped structure.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is funded by National Basic Research Program of China (2014CB340404), National Natural Science Foundation of China (61572326, 61103069), Innovation Program of Shanghai Municipal Education Commission (13YZ052, ZZyy14003), China Postdoctoral Science Foundation funded project (2014M551334), and Program of Shanghai Institute of Technology (YJ2014-06).