Abstract

User influence is an important factor for microblog user recommendation in mobile social networks. However, most existing work on user influence analysis ignores users’ temporal features and fails to filter out marketing users with low influence, which limits the performance of recommendation methods. In this paper, a Tensor Factorization based User Cluster (TFUC) model is proposed. We first identify latent influential users by neural network clustering. Then, we construct a feature tensor from the latent influential users’ opinion, activity, and network centrality information. Next, user influence is predicted from the latent factors produced by the temporal restrained CP decomposition. Finally, we recommend microblog users by considering both user influence and content similarity. Our experimental results show that the proposed model significantly improves recommendation performance: the mean average precision of TFUC exceeds that of the baselines by at least 3.4%.

1. Introduction

Microblogging services, such as Twitter and Weibo, have become some of the most popular platforms for individuals to exchange information by posting messages or comments of up to 140 characters. With the rapid growth of mobile devices, microblog services have created mobile applications that give their users instant, real-time access from anywhere with an Internet connection. For example, as of September 2017, the Sina Weibo platform had more than 376 million monthly active users, of which about 92% were authenticated through a mobile phone and/or tablet. Microblog data contain a large amount of valuable content. However, as a result of the rapidly growing population on microblog platforms, most users are confronted with the serious problem of information overload [1]. It is extremely difficult to find desirable information using mobile devices. In this situation, recommending relevant users to alleviate the flood of information is very important [2].

User influence can provide valuable clues about a user’s preferences and is thus indispensable for recommending microblog users in mobile social networks [3]. Consequently, incorporating user influence into recommender systems has been shown to improve recommendation performance and has received a lot of attention. Li et al. [2] considered social influences and their indirect structural relationships and proposed a Topic-level Social Influence-based microblog recommendation model to make user predictions. Jiang et al. [4] showed that users’ decisions on information adoption can be affected by individual preference and interpersonal influence and then integrated these two factors to construct a scalable algorithm for online behavior prediction. Chen et al. [5] took advantage of tweet content, user social relations, and explicit features and then proposed a collaborative ranking model for the tweet recommendation task. Yan et al. [6] presented a graph-theoretic method to rank tweets and their authors simultaneously by utilizing several networks, i.e., the user network, the tweet network, and the network that ties the two together. Therefore, it is important to analyze user influence in mobile social networks and integrate it into the recommendation framework.

There exist several pioneering studies on user influence analysis in mobile microblog platforms. Velissarios et al. [7] proposed four different metrics emphasizing Twitter content features and the behavior of each user’s followers and then identified influential users through comprehensive metrics that consider a user’s affiliation and her interest rate. Mao et al. [8] introduced a learning-based method for analyzing and measuring users’ social influence by predicting users’ capability of propagating information. Information extracted from social network structures and user behavior factors were combined in the method to achieve better performance. Xia et al. [9] explained the propagation mechanism of influence in terms of the diffusion of users’ emotion. David et al. [10] analyzed the probability of one user being activated by another user and then combined it with that user’s other features to obtain an influence score. Cai et al. [11] proposed the OOLAM model to measure user opinion influence. They separated the user interaction graph into two parts, a positive graph and a negative graph, and ranked users with a PageRank-like algorithm. The methods reviewed above explored user influence from the perspective of users and therefore had low accuracy on specific topics.

Recently, various studies have investigated topic-level user influence. These studies showed that most information is created and diffused in terms of topics, so user influence can be measured more precisely at the topic level. Therefore, topic-level user influence analysis has received increasing attention from researchers. Weng et al. [12] proposed TwitterRank to calculate user influence scores according to the graph structure and topic similarity. Cui et al. [13] introduced item-level influence using probabilistic hybrid factor matrix factorization. Chen et al. [14] proposed the MIRC algorithm, which can distinguish users in different groups. Their experimental results showed that different influence roles may have stronger influence at their own role level. Wang et al. [15] calculated user influence with four features, i.e., Expert, Leader, Social, and Similar, and then applied user influence to group recommendations. Wei et al. [16] took users’ opinion and topic relevance into consideration and then predicted user influence according to the latent factors resulting from the tensor factorization.

However, most studies of topic-level user influence analysis consider only users’ explicit features, which can be obtained directly from users’ profiles [14, 15]. In particular, these existing works neglect the temporal characteristic, which can be obtained from the interactions [17]. In addition, the tensor factorization algorithm for user influence analysis tends to give users with low propagation ability a high ranking score, since it reduces dimensionality by retaining the critical factors. In this paper, a Tensor Factorization based on User Cluster (TFUC) model [18] is proposed for recommending users according to a specific topic. The TFUC model first clusters users into groups according to their temporal characteristic. Then, we measure users’ influence scores by the temporal restrained CP decomposition on the influential clusters. Finally, both user influence and content similarity are integrated for recommending users for a given topic. The experimental results on the Sina Weibo dataset show that the user influence ranking precision of TFUC is better than that of existing models such as TwitterRank, OOLAM, and HF_CP_ALS. Moreover, our proposed TFUC model significantly outperforms the baseline methods in terms of recommendation precision.

This is an extension of our previous work [18], in which we proposed a tensor factorization based user influence analysis method. In this paper, we address the problem of recommending users in mobile social networks and argue that user influence is a very important factor for user recommendation. We expand the experimental dataset and add experiments on recommendation. To summarize, the main contributions of the work are listed as follows.

(1) The latent influential users are identified by a neural network clustering model. This model can filter out marketing users with low influence before the tensor is constructed, which is shown to significantly enhance the recommendation effect.

(2) The TFUC model is proposed by incorporating temporal features, which can improve user recommendation accuracy. In particular, our method integrates temporal features through a tensor model, predicts user influence using the temporal restrained CP decomposition, and finally recommends users by considering both user influence and content similarity.

(3) We conduct extensive experiments using a real-world Tencent Weibo dataset to verify the effectiveness of our proposed recommendation approach. The experimental results suggest that the proposed method can considerably improve recommendation precision and outperform the baseline approaches.

The rest of the paper is organized as follows. The recommendation problem is defined in Section 2. The proposed model for user recommendation is presented in Section 3. Experiments are conducted in Section 4. We conclude the work in Section 5.

2. Problem Formulation

It is well known that people tend to trust users with high social influence in social networks. Therefore, we apply user influence analysis to recommend items to users according to different topics in social media. In this paper, the items refer to users.

We denote items in the microblog as , where is the number of items. Users who have ever interacted with these items are denoted as . There are two core elements of the recommendation system: the characteristic model of items and the characteristic model of users. We characterize every item in as and every user in as .

The goal of this work is to calculate the similarity between items and users and recommend the most similar items to users. In particular, we apply the influence scores of items while calculating the similarity to verify whether a user with a higher influence score is more likely to have his recommendation accepted. To obtain these influence scores, we introduce some necessary features of users, such as the number of fans and the number of posts. Therefore, we let represent users’ fan characteristic and represent users’ post characteristic. Every interaction between and contains the time at which it takes place, so we present this data as .

According to Varun’s theory [19], a user’s influence score can hardly be obtained from a single aspect. Thus, we analyze users’ influence from four aspects as follows.

(1) Users’ propagation ability: measuring the propagation ability of users is an important goal in social networks. This ability is usually calculated from the accumulation of time in document collection . We denote this ability of as .

(2) Users’ opinion strength [20]: a user’s opinion strength captures his overall tendency and effectiveness in the social network. By aggregating the opinion polarity of all users who have interacted with , we can get an opinion score of . We present this score as , which can be analyzed from the document collection .

(3) Users’ fans activity [21]: users with higher levels of activity may contribute more influence to other users in the microblog social network. In our work, we regard the number of articles posted by as his activity. We can obtain ’s global fans activity by accumulating the activity of all who have ever interacted with any of . Formally, we define as ’s global fans activity, where is the collection of users’ post features.

(4) Users’ network centrality: according to [9, 22], users who have higher influence may have more fans. If a user’s fans themselves have more fans, the information posted by this user may spread more widely. This spreading effect is known as the network centrality and is denoted as .

Overall, we formalize user influence analysis as follows: given a topic , the goal is to find a mapping . User influence scores are calculated by aggregating the four user features . After calculating the influence scores of all basic users, we can obtain a user influence ranking list sorted by influence score.

3. Recommendation with User Influence Analysis

In this section, we propose a user influence analysis model [18] and then integrate it into recommendation. A user with high influence can receive a large number of comments in a short time. A user prefers to accept influential users (referred to as items) when he receives users recommended by the recommendation system. Therefore, the performance of the recommendation system can be improved by involving the influence of items. Since the factorization based method performs poorly on low-ranking users, we design a two-step method for influence analysis. In the first step, clusters with low influence scores are identified by a neural network clustering method. In the second step, user influence is predicted by a tensor factorization method.

3.1. Neural Network Clustering Model

Users’ global influence consists of multiple individual influence features, i.e., propagation ability, opinion strength, fans activity, and network centrality. Users with a higher influence rank receive more comments, have stronger opinion strength, and are more central in the network. On this basis, we first partition the data into clusters and filter out users with low influence in . We first describe how we obtain these four user features.

(1) Let denote the number of users who have interacted with . Within a time window , we can get the delay between the time of ’s first interaction in and the time at which interacted with according to [23] as follows: assume that the delay follows an exponential distribution of the form , where is the transmission rate parameter. The transmission rate parameter captures how widely a user can reach in the network, and thus the computing process is

where are the basic users and are the users who have interacted with ’, and the indicator function is 1 if is true and 0 otherwise. Equation (2) gives the total number of times has interacted with , and (3) captures the time accumulation of those interactions. After calculating , we can infer the time accumulation of by the following aggregate function:

(2) Each user shows an opinion polarity toward an interaction behavior, which can be inferred from the interaction between him and the basic users. Therefore, we can get ’s global opinion strength by accumulating the opinion polarity of all his interactions. We utilize (5) to obtain the opinion strength of :

where the indicator function is -1 if has ever expressed a negative interaction behavior and 1 if has ever expressed a nonnegative one.

(3) As we defined previously, ’s fans activity is related to the total number of articles that are posted by all of his fans. Based on this definition, we can obtain ’s fans activity as follows:

(4) Recall that the number of fans of user is available directly from ; we calculate user ’s network centrality as follows:
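The four features described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the exact formulas (2) to (7): the tuple layout of `interactions`, the dictionaries `post_count` and `fan_count`, and the function name are hypothetical. The propagation feature is taken to be the maximum-likelihood rate of an exponential delay model (number of interactions divided by the accumulated delay), and the centrality feature is assumed to be the user's own fan count plus the fan counts of the fans who interacted with him.

```python
from collections import defaultdict

def influence_features(interactions, post_count, fan_count):
    """Return {basic_user: (rate, opinion, fans_activity, centrality)}.

    interactions: list of (basic_user, fan, delay, polarity) tuples, where
    polarity is -1 for a negative interaction and +1 otherwise (assumed layout).
    """
    delays = defaultdict(list)
    opinion = defaultdict(int)
    fans = defaultdict(set)
    for basic, fan, delay, polarity in interactions:
        delays[basic].append(delay)
        opinion[basic] += polarity
        fans[basic].add(fan)

    features = {}
    for basic, user_delays in delays.items():
        total_delay = sum(user_delays)
        # Assumption: exponential delays, so the MLE of the rate is n / sum(delays).
        rate = len(user_delays) / total_delay if total_delay > 0 else 0.0
        # Fans activity: total posts written by the fans who interacted.
        activity = sum(post_count.get(f, 0) for f in fans[basic])
        # Assumption: centrality = own fans + fans of the interacting fans.
        centrality = fan_count.get(basic, 0) + sum(fan_count.get(f, 0) for f in fans[basic])
        features[basic] = (rate, opinion[basic], activity, centrality)
    return features
```

Each basic user ends up with a four-element sample, which is the input shape the clustering step expects.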

We now discuss how to partition users in into clusters according to these four influence features. The input samples of our method are . Each sample involves the four features obtained previously. We denote each sample as , where is , respectively. Let denote multiple clustering centers. Each center has four elements, i.e., . For the clustering problem, the loss function is

where is the clustering center of and is the weight of between the input and the interlayer.

We update each using stochastic gradient descent:

where

Substituting (10) into (9), we have

We update the clustering centers for each batch as

where is an indicator function for clustering center whose result is 1 if belongs to the cluster and 0 otherwise. The denominator in (12) is a counting function that returns the number of samples in cluster .
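The batch center update described above (an indicator-weighted sum of samples divided by the number of samples in the cluster) amounts to taking the mean of the samples assigned to each center. The following is a minimal sketch of that single step only. It assumes nearest-center assignment by squared Euclidean distance and leaves out the learned weights between the input and the interlayer; the function names are hypothetical.

```python
def nearest_center(sample, centers):
    """Index of the center closest to the sample (squared Euclidean distance)."""
    distances = [sum((s - c) ** 2 for s, c in zip(sample, center)) for center in centers]
    return distances.index(min(distances))

def update_centers(batch, centers):
    """One batch update: each center becomes the mean of the samples assigned to it."""
    labels = [nearest_center(x, centers) for x in batch]
    new_centers = []
    for k, center in enumerate(centers):
        members = [x for x, label in zip(batch, labels) if label == k]
        if members:
            # Indicator-weighted sum divided by the member count.
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            # A center with no assigned samples is left unchanged.
            new_centers.append(list(center))
    return new_centers, labels
```

Keeping an empty cluster's center unchanged avoids a division by zero in the counting denominator.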

3.2. Construction of Tensor User Influence Model

We assign each cluster to a specific influence category. Specifically, the cluster containing most of the latent influential users in is selected to construct the tensor model. Users in this cluster are denoted as , where . Our user influence model is represented by a 3-order tensor , where is the number of users in , is the number of comment users in , and is the number of influence features. Tensor decomposition is generally used to predict the distribution of data and the latent features of data. Tensors are widely used in many research areas, such as weather forecasting, event prediction [24], information recommendation [25], and picture processing [26–28]. Finally, we fill each tensor slice with the corresponding influence feature.

(1) The opinion slice of users: this slice records in detail the interaction opinion of every user in on ; i.e.,

where is an indicator function identical to the function in (5), .

(2) The fans activity slice of users: in this slice, users who have ever interacted with exert an activity influence upon . Thus, every element in this slice can be represented as

(3) The centrality slice of users: as mentioned in Section 3, we represent a user’s network centrality by his diffusion ability, which can be represented by his total number of neighbours; i.e.,
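The three slices can be assembled into a 3-order tensor as sketched below. This is an illustration under assumptions rather than the exact slice definitions: the argument names are hypothetical, an entry is nonzero only when the comment user has interacted with the basic user, the activity slice is assumed to hold the comment user's post count, and the centrality slice is assumed to hold the comment user's fan count.

```python
def build_influence_tensor(users, commenters, polarity, post_count, fan_count):
    """Build tensor[i][j][k] with k = 0 (opinion), 1 (fans activity), 2 (centrality).

    polarity maps (user, commenter) -> +1 or -1 for pairs that interacted;
    pairs that never interacted keep an all-zero fibre.
    """
    tensor = [[[0.0, 0.0, 0.0] for _ in commenters] for _ in users]
    for i, u in enumerate(users):
        for j, c in enumerate(commenters):
            if (u, c) in polarity:
                tensor[i][j] = [
                    float(polarity[(u, c)]),      # opinion slice
                    float(post_count.get(c, 0)),  # fans activity slice (assumed)
                    float(fan_count.get(c, 0)),   # centrality slice (assumed)
                ]
    return tensor
```

The result has shape (number of users) x (number of comment users) x 3, matching the three influence features used in the factorization.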

3.3. Factorization of Tensor User Influence Model

For the tensor , the loss function of rank-R CP decomposition [29, 30] is

The corresponding objective function for stochastic optimization problem is

However, this formulation neglects the temporal influence feature. Thus, a time constraint is added to the user matrix, so that the influence score of users whose propagation ability is strong increases and the score of users who post frequently but receive few comments decreases. The new loss function is written as

where is the time constraint matrix, which can be obtained from (4). is diagonal, and its main diagonal element is

where is the users in .

The objective function is

Following the method proposed in [30], the gradient of (18) is

According to the theory proposed by Acar et al. [31], we can get that

where is the mode-1 unfolding, , and is the Khatri-Rao product between and . In the same way, we can get , , , and .

We can obtain a rule for updating by substituting (21) into the stochastic gradient descent method as follows:

where is the step size. The updating rules of are similar to that of . We give only the updating rule of due to space limitations.
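To make the gradient-based update concrete, the sketch below fits a plain rank-R CP model by gradient descent on the loss 0.5 * ||X - [[A, B, C]]||^2. It is a simplified illustration and not the TFUC update itself: the time constraint matrix is omitted, full-batch gradient descent is used in place of the stochastic variant, and the names `A`, `B`, `C`, `step`, and all function names are assumptions of this sketch. The gradient for each factor is written as an explicit sum over the residual tensor, which is equivalent to multiplying the corresponding unfolded residual by the Khatri-Rao product of the other two factors. Plain Python lists keep the example dependency-free.

```python
import random

def cp_reconstruct(A, B, C):
    """Tensor whose (i, j, k) entry is sum_r A[i][r] * B[j][r] * C[k][r]."""
    rank = len(A[0])
    return [[[sum(a[r] * b[r] * c[r] for r in range(rank)) for c in C] for b in B] for a in A]

def residual(X, A, B, C):
    """Reconstruction minus data, entry by entry."""
    Xhat = cp_reconstruct(A, B, C)
    return [[[Xhat[i][j][k] - X[i][j][k] for k in range(len(C))]
             for j in range(len(B))] for i in range(len(A))]

def cp_loss(X, A, B, C):
    """Half the squared Frobenius norm of the residual."""
    return 0.5 * sum(e * e for plane in residual(X, A, B, C) for row in plane for e in row)

def cp_gradient_step(X, A, B, C, step):
    """One simultaneous gradient-descent update of all three factor matrices."""
    E = residual(X, A, B, C)
    I, J, K, R = len(A), len(B), len(C), len(A[0])
    grad_A = [[sum(E[i][j][k] * B[j][r] * C[k][r] for j in range(J) for k in range(K))
               for r in range(R)] for i in range(I)]
    grad_B = [[sum(E[i][j][k] * A[i][r] * C[k][r] for i in range(I) for k in range(K))
               for r in range(R)] for j in range(J)]
    grad_C = [[sum(E[i][j][k] * A[i][r] * B[j][r] for i in range(I) for j in range(J))
               for r in range(R)] for k in range(K)]

    def descend(M, G):
        return [[m - step * g for m, g in zip(m_row, g_row)] for m_row, g_row in zip(M, G)]

    return descend(A, grad_A), descend(B, grad_B), descend(C, grad_C)

def random_factor(rows, rank, rng):
    """Factor matrix with entries drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(rank)] for _ in range(rows)]

def cp_decompose(X, rank, steps=300, step=0.005, seed=0):
    """Fit factors A, B, C to X by repeated gradient steps from a random start."""
    rng = random.Random(seed)
    A = random_factor(len(X), rank, rng)
    B = random_factor(len(X[0]), rank, rng)
    C = random_factor(len(X[0][0]), rank, rng)
    for _ in range(steps):
        A, B, C = cp_gradient_step(X, A, B, C, step)
    return A, B, C
```

A small fixed step size is used here for simplicity; on real data the step size would need tuning, and a stochastic variant would sample entries or slices instead of using the full residual.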

3.4. Measurement of Users Influence

We now discuss how to calculate users’ influence scores by utilizing the result of the tensor decomposition. Users’ influence can be calculated from three different influence scores.

(1) Score of users’ opinion strength:

(2) Score of users’ fans activity:

(3) Score of network centrality:

where is the expectation of . We normalize each influence score using min–max normalization. We then obtain the final influence score by combining these three normalized scores as follows:

We add a user topic similarity metric to the combining function so that users with higher topic similarity receive higher influence scores. This topic similarity metric is described in Section 4.1.
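The normalization and combination step can be illustrated as follows. Min–max normalization is as described in the text; the way the three normalized scores and the topic similarity are combined is an assumption made purely for illustration (an unweighted mean scaled by the topic similarity), and the function names are hypothetical.

```python
def min_max(scores):
    """Rescale scores linearly to [0, 1]; constant inputs map to all zeros."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]

def combined_influence(opinion, activity, centrality, topic_similarity):
    """Assumed combination: mean of the three normalized scores times topic similarity."""
    normalized = [min_max(opinion), min_max(activity), min_max(centrality)]
    return [sim * sum(parts) / 3.0
            for sim, parts in zip(topic_similarity, zip(*normalized))]
```

Normalizing each score separately keeps a feature with a large numeric range, such as fan counts, from dominating the combined score.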

3.5. Recommendation Model with User Social Influence

In this section, we recommend items to users by using a content-based recommendation algorithm on the Tencent Weibo dataset. The items in this dataset are persons, organizations, or groups in the real world. Initially, we obtain the preferences and interests of the users who should receive the recommendation. Users’ preferences and interests are analyzed from their articles and comments. After that, we establish the user characteristic model based on these preferences and interests. The other essential step is establishing the item characteristic model. To adapt to the dataset, we use the preferences and interests of the items as the item characteristic. Based on these two models, we calculate the similarity between the users and the items. Furthermore, we combine the ranking indicators of items with the similarity and call the result influence-similarity. Finally, we recommend items to users according to the influence-similarity.

There are two core parts of the above content-based recommendation process: the user characteristic model and the item characteristic model. Since the users and the items are all individual users in the Tencent Weibo dataset, we represent every user as a characteristic vector by using the TF-IDF method. Formally, we denote the user vector as , where is a word extracted from the articles and comments that user has ever posted, and is the corresponding weight of the word in the text collection. The weight is calculated by the TF-IDF method as follows:

where is the number of times word appears in text , is the total number of words that text contains, is the total number of texts in the dataset, and is the number of texts that contain word .

The next step is to calculate the cosine similarity between the users and items. For example, when calculating the similarity between user and item , we have

However, when (33) is adopted to calculate the similarity between user and item , it does not take the influence of item into consideration. Therefore, we add the influence ranking indicator of item to the original cosine similarity, so that an item with a higher influence score has a higher probability of being recommended. The cosine similarity with the influence of items is calculated as follows:

where is the influence ranking indicator of item in its topic areas.
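The content-based pipeline above (TF-IDF vectors, cosine similarity, then an influence-aware similarity) can be sketched as follows. The TF-IDF weight follows the quantities named in the text: term count divided by text length, multiplied by the logarithm of the number of texts over the number of texts containing the word. The natural logarithm, the absence of smoothing, and in particular the multiplicative way the influence indicator enters `influence_similarity` are assumptions of this sketch; all function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(texts):
    """texts: dict mapping a user or item id to its list of tokens."""
    n_texts = len(texts)
    doc_freq = Counter()
    for tokens in texts.values():
        doc_freq.update(set(tokens))  # count each word once per text
    vectors = {}
    for key, tokens in texts.items():
        counts = Counter(tokens)
        total = len(tokens)
        vectors[key] = {
            word: (count / total) * math.log(n_texts / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def influence_similarity(u, v, influence):
    """Assumed form: cosine similarity scaled by the item's influence indicator."""
    return cosine(u, v) * influence
```

With this form, two items that match a user's content equally well are ordered by their influence indicator, which is the behavior the text asks for.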

4. Experiments

4.1. Datasets

In this section, extensive experiments are conducted on two popular microblog platforms in China, i.e., Sina Weibo and Tencent Weibo.

For the Sina Weibo dataset, we first crawled 2015 basic users in different topics, including law, basketball, economy, and health. We crawled these users’ information and all articles posted by these users from October 31, 2016, to December 1, 2016. The basic statistics of the Sina Weibo dataset are shown in Table 1. We annotate users’ influence ranking manually according to [16]. In this dataset, an interaction between two users is represented as a comment. If commented on ’s articles, an interaction is generated between them. There exists a delay between the time posted an article and the time commented on this article. Therefore, we can obtain the temporal characteristic based on this delay. In addition, the topic similarity metric in (29) is obtained in the same way as in [16].

The Tencent Weibo dataset is obtained from KDD Cup 2012, Track 1. This dataset contains about 6095 high influence users in different topics. These users are called “Item” in this dataset. There are about 73,209,277 recommendation logs in this dataset. Each recommendation is sent to a user who is associated with a user profile, such as his gender, the number of articles he has posted, and keywords extracted from all his articles. We can infer a user’s number of fans from the relational network in this dataset. In this dataset, an interaction between two users is represented as a recommendation. This interaction contains two significant pieces of information, i.e., acceptability and time stamp. Thus, we can obtain ’s opinion strength and temporal characteristic according to (5) and (3), respectively. For experimental convenience, we choose high influence users in four topics, and the time window is from October 12, 2011, to October 13, 2011. Table 2 shows the basic statistics of this dataset. In this dataset, the topic similarity metric in (29) is obtained according to the Jaccard similarity of their characteristic sets.

4.2. Baseline

We compared TFUC with the following baselines:

(i) TwitterRank [12], which calculates user influence according to the users’ interactions in a certain topic.

(ii) TwitterRank_C, in which we apply TwitterRank to calculate user influence based on the latent influential user cluster obtained by our cluster model.

(iii) OOLAM [11], which is a PageRank-like method in which interactions are divided into positive and negative parts so that users’ opinion influence is calculated in the positive and negative parts, respectively.

(iv) OOLAM_c, in which we use latent influential users to construct the positive and negative graphs, respectively.

(v) OOLAM_SM, in which the users’ topic similarity is taken into consideration in OOLAM.

(vi) OOLAM_SM_C, in which the cluster model is added to OOLAM_SM.

(vii) HF_CP_ALS [16], which is a tensor model in which users’ opinion and topic relevance are taken into consideration.

(viii) HF_CP_ALS_C, in which the cluster model is added to HF_CP_ALS.

(ix) CP_SGD, in which low influence users are not filtered when we construct the user tensor.

In addition, to verify whether a recommendation system with influence performs better than one without, we choose a simple recommendation algorithm, i.e., the content-based (BC) algorithm, as the baseline.

4.3. Evaluations

The evaluations include three ranking precision evaluations and two recommended precision evaluations. The ranking precision evaluations are as follows:

where is the set of real top- users and is the predicted set of top- users;

where denotes the -th rank and denotes the number of users; and

where is a certain topic and denotes the number of topics.

The two recommended precision evaluations are as follows:

where is a certain topic, is the number of topics, and represents the number of users that need to be recommended. is an acceptance index whose value is 1 if the -th user was once accepted when he was recommended to other users and 0 otherwise.

Equation (38) is the average precision in a single topic; it reflects the performance of the model in a single topic. Equation (39) reflects the overall performance of the model across all topics. The higher the is, the more likely users with higher influence scores are to be accepted by other users.
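The evaluation measures can be sketched in Python as follows. The top-k precision follows the description in the text (overlap between the real and predicted top-k sets, divided by k). The exact averaging used in the remaining measures is assumed here: the average precision is taken as the mean of the top-k precision over all ranks k, the mean average precision as the mean over topics, and the recommended precision as the fraction of the first n recommended users whose acceptance index is 1. All function names are hypothetical.

```python
def precision_at_k(true_ranking, predicted_ranking, k):
    """Overlap between the real and predicted top-k users, divided by k."""
    return len(set(true_ranking[:k]) & set(predicted_ranking[:k])) / k

def average_precision(true_ranking, predicted_ranking):
    """Assumed form: mean of precision_at_k over every rank k = 1..n."""
    n = len(true_ranking)
    return sum(precision_at_k(true_ranking, predicted_ranking, k)
               for k in range(1, n + 1)) / n

def mean_average_precision(rankings_by_topic):
    """rankings_by_topic: dict topic -> (true_ranking, predicted_ranking)."""
    scores = [average_precision(t, p) for t, p in rankings_by_topic.values()]
    return sum(scores) / len(scores)

def recommended_precision(accepted_flags, n):
    """Assumed form: share of the first n recommended users that were accepted (flag 1)."""
    return sum(accepted_flags[:n]) / n
```

For instance, with a real ranking a, b, c and a predicted ranking b, a, c, the top-1 precision is 0 while the top-2 and top-3 precisions are both 1, so the average precision under this assumed form is 2/3.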

4.4. Precision Results of User Influence Ranking

The of different methods on the Sina Weibo dataset is shown in Table 3. The of our method is optimal except for the in the law topic. To analyze the experimental results in more detail, we compare our method with each baseline separately. It can be seen from Table 3 that our proposed method outperforms TwitterRank, which verifies that a user with strong opinion strength, many active fans, and high propagation ability is influential. The precision of our method is at least 10% higher than that of OOLAM, which demonstrates that a user with high propagation ability and high topic similarity has a high influence score. The temporal features are neglected in OOLAM_SM, so it performs worse than our method. HF_CP_ALS also does not take temporal features into consideration, so a user with high propagation ability does not get a high influence score. Compared with CP_SGD, the precision of our method improves by at least 10%, which means that filtering out some low influence users can improve the performance.

Furthermore, we also calculated the and for each method. Figure 1 shows the precision in different . The is higher when the area under the curve is larger. The detailed and of each method can be seen in Table 4. The of our method is better than that of the other baselines except for the of OOLAM in the basketball topic, and the MAP of our method is the best among all methods. We can conclude that our method performs better than the other baselines.

4.5. User Influence in Recommendation

In the previous section, we showed that TFUC outperforms the other baselines. In this section, we apply TFUC to retrieve users’ influence scores in the Tencent Weibo dataset. After that, we rank users according to these scores. For each basic user in this dataset, we obtain a recommendation result by counting all ’s recommendation logs. If his recommendation was once accepted by other users, his recommendation result is represented as 1, and as 0 otherwise. By calculating the correlation coefficient between the influence ranking list and the result list, we can tell which influence analysis method is closer to the practical situation when used in recommendation.

In the recommendation task, we again first identify latent influential users with the TFUC model. We now discuss how to obtain the user features used to build the clustering model.

(1) As mentioned in datasets description, the interaction between two users is presented as recommendation in Tencent Weibo dataset and this interaction contains time information. Therefore, we can obtain basic users’ propagation ability from this time information by (3).

(2) Due to lack of direct opinion information in Tencent Weibo dataset, we present the acceptance of the recommendations as the opinion of a user to the basic users. In this case, we can obtain basic users’ opinion strength according to (5).

(3) The users’ fans activity and network centrality are obtained similarly to (6) and (7).

After obtaining these four user features, the TFUC model partitions users into different clusters and constructs the tensor model based on the users in the latent influential user cluster. After decomposing the tensor model, all users’ influence scores can be predicted according to (29).

To calculate the precision of the recommendations, we need to obtain the list of successfully accepted users. In the Tencent Weibo dataset, if a accepted the recommendation of whose entity is , the system records this interaction in the recommendation logs. By determining which users are successfully accepted by other users in the log file, we can get an acceptance list.

Finally, we can get the recommendation precision by comparing the influence list and the acceptance list according to (39). Figure 2 and Table 5 show the results.

It can be seen from Figure 2 and Table 5 that the recommendation results combined with the user influence ranking list obtained from our method perform similarly to the OOLAM method in each topic. However, our method shows a better overall performance across the four topics. This result illustrates that when influence scores that incorporate the temporal characteristic and topic similarity are applied in the recommendation system, items with higher influence are more likely to be accepted. The value of our method is 2% to 8% higher than that of OOLAM_SM. This result reflects that when the temporal characteristic is considered, the item influence ranking list is better adapted to the actual recommendation results. Our method also has a higher recommendation precision than HF_CP_ALS, which confirms the above conclusion. Compared with CP_SGD, the value of our method improves in every topic. This is because we filter out the impact of marketing items with low influence but high activity, whose recommendations have a low probability of being accepted.

Based on the analysis above, we can conclude that the high influence items obtained from our method have a higher probability of being accepted by users. Therefore, we combine the recommendation system with the item influence obtained from our method. First, we calculate the recommendation result list from the rec_log_train dataset in topics “1.6.2.1”, “1.1.2.1”, “1.2.2.1”, and “1.12.4.5”. Next, we calculate the influence-similarity between the users and items in this recommendation result list. Then, we choose the top 100 most similar results as our recommendations for users and calculate the average recommended precision in each topic. Finally, we obtain the by fusing the average recommended precision of each topic. The of the content-based recommendation method is 14.5. The improves to 15.5 when TFUC is integrated. This indicates that the performance of the recommendation system can be improved when the influence is considered.

5. Conclusion

This paper focuses on a recommendation task in microblogs in which users’ influence analysis is involved. We introduce a two-step method for influence analysis. First, users are partitioned into an influential part and an uninfluential part. Then, we apply CP decomposition with the stochastic gradient descent method to expedite the decomposition. In addition, a time constraint matrix is applied to the user factor matrix during the decomposition. Finally, we apply the TFUC model to recommend items to users according to the influence of items. The experimental results show that TFUC outperforms the other baselines by at least 3.4%. The extensive experiments on the Tencent Weibo dataset show that the precision of the recommendation system is improved when we combine the recommendation system with the influence of items.

There are many potential future directions for this work. First, the temporal characteristic of users or items in this paper is estimated only roughly, because we assume that the delay of the interaction follows the exponential distribution and do not conduct in-depth research on other temporal accumulation models. Such ambiguity can be further aggravated in the microblog. Additionally, the recommendation algorithm in our work is still primitive; challenges include how to design a more realistic recommendation algorithm and how to combine it with the user influence aspect.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research project was supported by the National Natural Science Foundation of China (no. 61772135 and no. U1605251), the Open Project of Key Laboratory of Network Data Science & Technology of Chinese Academy of Sciences (no. CASNDST201606 and no. CASNDST201708), and the Directors Project Fund of Key Laboratory of Trustworthy Distributed Computing and Service (BUPT) Ministry of Education (no. 2017KF01). The authors thank Lin Gui and Kam-Fai Wong for their cooperation in TFUC model.