Information Filtering via Biased Random Walk on Coupled Social Network
Recommender systems have advanced a great deal in the past two decades. However, most research focuses on mining the similarities among users or objects and overlooks social influence, which plays an important role in users’ purchasing decisions. In this paper, we design a biased random walk algorithm on coupled social networks that generates recommendations based on both social interests and users’ preferences. Numerical analyses on two real data sets, Epinions and Friendfeed, demonstrate that taking social interests into account improves recommendation performance, and experimental results show that our algorithm alleviates the user cold-start problem more effectively than the mass diffusion and user-based collaborative filtering methods.
In the past two decades, Web 2.0 and its applications have greatly accelerated the development of the Internet. They bring much convenience to our lives but also overwhelm us with too many resources in the information ocean. One typical scenario is online shopping. When we are confronted with millions of books on http://www.Amazon.com or billions of commodities on http://www.Taobao.com, it is very difficult to choose the relevant ones from countless candidates. This is the so-called information overload problem. Therefore, an automatic way of helping us make the right decision under information overload is a significant issue for both the academic and industrial communities.
Search engines help users find useful information, which partially alleviates this dilemma: a user inputs keywords and the search engine returns the corresponding results. However, if different users input the same keywords, the search engine returns the same results. Besides, users who resort to a search engine must be able to describe clearly what they want through keywords, yet in most situations users do not know what they really want or find it hard to choose appropriate keywords. Recommender systems have been designed to address this situation.
Recently, social networks (SN) [3, 4] have become a powerful tool to characterize social relationships in online social services, emerging with various Web 2.0 applications in evolutionary games [6, 7], community detection, medical science, and so forth. By taking advantage of social relationships in recommender systems, many traditional challenges can be partially solved, such as the cold-start problem and the data sparsity problem. However, most research focuses on mining the similarities among users or objects in recommender systems, and social influence is seldom taken into account.
Coupled networks (CN), also known as interdependent networks, are usually composed of two layers of networks [12, 13], such as electricity/internet networks and airport/railway networks. Similar to interdependent networks, a coupled social network (CSN) also contains coupling nodes (users), which form a leader-follower relationship in the social network layer and a collecting relationship in the information network layer. Figure 1 gives an illustration of a simple CSN with five users and five objects, where circles denote users and squares represent objects; the social network (upper layer) consists of five users and the information network (lower layer) consists of five objects and five users, where the users are the same as those in the social network. It can be seen that an object collected only by a user’s leader will not be recommended to that user in the user-object network when the two users share no collected object, because the value of similarity between them is zero. However, the user follows the leader in the social network, which indicates that the two may have similar interests to some extent; thus, we can accordingly recommend the leader’s object to the user via the social network. Therefore, by making use of the social relationship between users, the user cold-start problem can be partially solved. When a new user comes to the system, we can recommend him/her some objects through the social network.
Moreover, whereas many researchers use social interest to filter the recommendations, we use it to supplement them. To our knowledge, the random walk algorithm on coupled social networks has yet to be investigated in recommender systems.
The contributions of this paper can be summarized as follows. (i) We use social interest to supplement the recommendations instead of filtering them, and we obtain more accurate recommendations. (ii) We propose a biased random walk recommendation algorithm on coupled social networks, which considers social interests as well as users’ preferences and improves recommendation performance. (iii) Compared with the mass diffusion (MD) [16, 17] and user-based CF (UCF) [18, 19] methods, the proposed algorithm alleviates the user cold-start problem more effectively.
This paper is organized as follows. We introduce the related works in Section 2. In Section 3, we propose a biased random walk algorithm on coupled social network. In Section 4 we describe the data sets and metrics used in this paper. We evaluate the performance of the proposed method in Section 5. Finally, we summarize this paper in Section 6.
2. Related Works
Collaborative filtering (CF) [18–25] is the most frequently used technology in recommender systems; it uses the collection history of users to mine the objects of potential interest to the target user. However, the CF algorithm only takes similar users or objects into account and tends to give the same recommendations to diverse users; namely, it is not conducive to personalized recommendation. Meanwhile, the CF algorithm cannot deal with the cold-start problem; that is, when a new user or object is added to the system, it is difficult for the user to obtain recommendations, or for the object to be recommended, because of the lack of information. To alleviate this problem, many methods have been proposed, such as content-based, trust-aware [27, 28], social-impact, and tag-aware methods.
Random walk is a mathematical formalization of a path that consists of a succession of randomly chosen edges. It has been used successfully in recommender systems based on bipartite networks [16, 32], where it is known as the mass diffusion (MD) method, and many methods based on mass diffusion have since been proposed [17, 33]. Random walk has also been applied in many other fields, such as social networks and Top-k search. However, random walk on coupled social networks has not yet been studied in recommender systems.
Massa and Avesani proposed a social propagation method based on users’ distance from a fixed propagation horizon, which increased the coverage of recommender systems. Esslimani et al. proposed a feedback effect between similarity and social influence in online communities. By utilizing social relations, we can obtain the strength of the social relationship between users and use it to generate more accurate recommendation results. Meanwhile, the literature [36, 38] demonstrated that recommendation performance can be improved by taking the effect of the social network into consideration; both methods use the social relationship to filter out useless information.
Lai et al. proposed a hybrid personal trust model which adaptively combines the rating-based trust model and an explicit trust metric to resolve the drawback caused by insufficient past rating records. Community-based recommender systems have also attracted much research attention; the authors proposed a novel community-based framework that employs a PLSA-based model incorporating social activeness and dynamic interest to discover communities. Wei et al. proposed a multicollaborative filtering trust network algorithm, an improved version of the CF algorithm designed to work on Web 2.0 platforms, which improves prediction accuracy compared with the original CF algorithm. We believe that if the social relationship is used to supplement the user-object network, as in the aforementioned example of Figure 1, we will get more accurate recommendations and alleviate the user cold-start problem. Motivated by this, we propose a biased random walk (diffusion-based) method on coupled social networks to generate recommendations. With this method, new users can obtain recommendations as long as they are connected to others in the social network.
In this section, we introduce the approach of diffusion on coupled social networks. Generally, a recommender system consists of two sets, $U = \{u_1, u_2, \ldots, u_m\}$ and $O = \{o_1, o_2, \ldots, o_n\}$, representing the users and objects, respectively. Denote by $A = \{a_{i\alpha}\}_{m \times n}$ the adjacency matrix of the user-object bipartite network, of which each element $a_{i\alpha} = 1$ if user $u_i$ has collected object $o_\alpha$, and $a_{i\alpha} = 0$ otherwise. Analogously, denote by $B = \{b_{ij}\}_{m \times m}$ the nonsymmetric adjacency matrix of the user-user directed social network, of which each element $b_{ij} = 1$ if user $u_i$ has linked to user $u_j$, and $b_{ij} = 0$ otherwise.
Random Walk on Social Network. Let $P^{S} = \{p^{S}_{ij}\}$ be the transition probability matrix of a directed social network. The probability that a random walker at user $u_i$ goes to user $u_j$ on the social network can be described as
$$p^{S}_{ij} = \frac{b_{ij}}{k_i^{\mathrm{out}}}, \tag{1}$$
where $k_i^{\mathrm{out}} = \sum_{j} b_{ij}$ is the out-degree in the social network, that is, the number of leaders of user $u_i$. Denote by $s_j(t)$ the probability that the walker, coming from other users, is at user $u_j$ at time $t$. Therefore, we have
$$s_j(t+1) = \sum_{i=1}^{m} s_i(t)\, p^{S}_{ij}. \tag{2}$$
The initial probability for the target user $u_i$ is given by $s_i(0) = 1$, and $s_j(0) = 0$ for all of the other users $u_j$. Thus, we can obtain the probability that a random walker goes from the target user to all other users at time $t$.
Random Walk on Bipartite Network. Let $P^{B}$ be the transition probability matrix of a bipartite network. The probability that a random walker at user $u_i$ goes to object $o_\alpha$ on the bipartite network can be described as
$$p_{i\alpha} = \frac{a_{i\alpha}}{k_i}, \tag{3}$$
where $k_i = \sum_{\alpha} a_{i\alpha}$ denotes the number of collected objects of user $u_i$, and the probability that a random walker at object $o_\alpha$ goes to user $u_i$ on the bipartite network can be described as
$$p_{\alpha i} = \frac{a_{i\alpha}}{k_\alpha}, \tag{4}$$
where $k_\alpha = \sum_{i} a_{i\alpha}$ denotes the number of users who have collected object $o_\alpha$ on the bipartite network. Denote by $x_i(t)$ and $y_\alpha(t)$ the probability of user $u_i$ and object $o_\alpha$ on the bipartite network at time $t$, respectively. Therefore, we have
$$x_j(t+1) = \sum_{\alpha=1}^{n} y_\alpha(t)\, p_{\alpha j}, \qquad y_\alpha(t+1) = \sum_{i=1}^{m} x_i(t)\, p_{i\alpha}. \tag{5}$$
Similar to the random walk on the social network, the initial probability for the target user $u_i$ is given by $x_i(0) = 1$. The difference is that there are two kinds of nodes on the bipartite network, and the initial probabilities are $x_j(0) = 0$ and $y_\alpha(0) = 0$ for all the other users $u_j$ and objects $o_\alpha$. At an odd time step $t$ with $t \geq 3$, the probability $y_\alpha(t)$ is the probability of the target user selecting the uncollected object $o_\alpha$. Therefore, we can obtain the recommendation list for the target user according to this probability.
Biased Random Walk on Coupled Social Network. Let $P = \{p_{xy}\}$ be the transition probability matrix of a coupled social network, where $x, y \in U \cup O$. In order to solve the user cold-start problem, suppose that a random walker at user $u_i$ goes to the user’s neighbors (leaders) on the directed social network with probability $\lambda$, and to the user’s neighbors on the bipartite network with probability $1 - \lambda$. Moreover, a random walker at object $o_\alpha$ goes to all users who have collected $o_\alpha$ with equal probability. Thus, the target user finds potential objects not only through other users with similar collecting interests on the bipartite network, but also through friends (leaders) on the directed social network. Denote by $x_i(t)$ and $y_\alpha(t)$ the probability that the walker is at user $u_i$ and object $o_\alpha$ on the coupled social network at time $t$, respectively. Therefore, we have
$$x_j(t+1) = \sum_{i=1}^{m} \lambda\, x_i(t)\, \frac{b_{ij}}{k_i^{\mathrm{out}}} + \sum_{\alpha=1}^{n} y_\alpha(t)\, \frac{a_{j\alpha}}{k_\alpha}, \qquad y_\alpha(t+1) = \sum_{i=1}^{m} (1-\lambda)\, x_i(t)\, \frac{a_{i\alpha}}{k_i}. \tag{6}$$
That is to say, initially, we assign the target user one unit of resource. Then a $\lambda$ ($0 \leq \lambda \leq 1$) proportion of the resource is evenly distributed to the user’s social neighbors through the directed links (social network), and a $1 - \lambda$ proportion is distributed to collected objects through the undirected links (bipartite network). In (6), when $k_i^{\mathrm{out}} = 0$, then $\lambda = 0$ for that user; it means that user $u_i$ has no outlinks in the social network and therefore distributes all of his/her resource to the bipartite network. Similarly, when $k_i = 0$, then $\lambda = 1$; user $u_i$ distributes all of his/her resource to the social network. The initial score for the target user $u_i$ is given by $x_i(0) = 1$, with $x_j(0) = 0$ and $y_\alpha(0) = 0$ for all the other users $u_j$ and objects $o_\alpha$. Thus, we can obtain the recommendations for the target user by ranking the scores $y_\alpha(t)$ of all objects at time $t$. At time $t = 2$, the recommendations are obtained only from the social network, that is, from his/her social leaders. At time $t = 3$ and $\lambda = 0$, the recommendations are obtained only from the bipartite network, which is the pure MD algorithm.
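As a hedged numerical illustration of a single allocation step (the numbers are hypothetical and are not taken from Figure 1 or from the data sets), suppose the walker takes a social step with probability 0.4, and the target user follows 2 leaders and has collected 3 objects. Writing the social-step probability as $\lambda$, the number of leaders as $k^{\mathrm{out}}$, and the number of collected objects as $k$ (symbols assumed here for illustration), each leader and each object receives:

```latex
\underbrace{\frac{\lambda}{k^{\mathrm{out}}}}_{\text{each leader}} = \frac{0.4}{2} = 0.2,
\qquad
\underbrace{\frac{1-\lambda}{k}}_{\text{each object}} = \frac{0.6}{3} = 0.2,
\qquad
2 \times 0.2 + 3 \times 0.2 = 1 .
```

The final equality checks that the unit of resource is conserved by the step.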
Thus, the probability that a random walker arrives at the object at time is recognized as the possibility that the target user purchases this object. We call this algorithm biased random walk (BRW). For the example in Figure 1, the transition probability matrix for coupled social network is given in the following equation:
Consider and ; then , which means users and are reachable within 2 steps with 0.0833 probability through the coupled social network. On the other hand, without social network, the random walk distance on the original bipartite network for an arbitrary time because and are not reachable from each other in bipartite network.
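The walk described in this section can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `biased_random_walk`, `recommend`, `collected`, `leaders`, and `lam` are hypothetical, `lam` stands for the probability of taking a social step from a user node, and a user with neither leaders nor collected objects simply drops the resource, a case the text does not address.

```python
from collections import defaultdict


def biased_random_walk(collected, leaders, target, lam, steps):
    """Return object scores for `target` after `steps` walk steps.

    collected: dict user -> set of collected objects (bipartite layer)
    leaders:   dict user -> set of followed users (social layer)
    lam:       probability of taking a social step (assumed name)
    """
    # Invert the bipartite layer once: object -> users who collected it.
    collectors = defaultdict(set)
    for user, objects in collected.items():
        for obj in objects:
            collectors[obj].add(user)

    user_score = {target: 1.0}  # one unit of resource on the target user
    object_score = {}

    for _ in range(steps):
        next_user = defaultdict(float)
        next_object = defaultdict(float)

        for user, score in user_score.items():
            out = leaders.get(user, set())
            objs = collected.get(user, set())
            if not out and not objs:
                continue  # isolated user: resource is dropped (assumption)
            # No leaders: everything goes to objects. No objects:
            # everything goes to leaders. Otherwise split by lam.
            social_share = 0.0 if not out else (1.0 if not objs else lam)
            if social_share > 0:
                for leader in out:
                    next_user[leader] += social_share * score / len(out)
            if social_share < 1:
                for obj in objs:
                    next_object[obj] += (1 - social_share) * score / len(objs)

        # Objects pass their resource equally to every collector.
        for obj, score in object_score.items():
            users = collectors[obj]
            for user in users:
                next_user[user] += score / len(users)

        user_score, object_score = dict(next_user), dict(next_object)

    return object_score


def recommend(collected, leaders, target, lam=0.5, steps=3, top=10):
    """Rank the target's uncollected objects by walk score."""
    scores = biased_random_walk(collected, leaders, target, lam, steps)
    seen = collected.get(target, set())
    ranked = sorted((o for o in scores if o not in seen),
                    key=lambda o: (-scores[o], str(o)))
    return ranked[:top]


# Toy network (hypothetical): u1 follows u2; u1 and u3 share object "a".
collected = {"u1": {"a"}, "u2": {"b"}, "u3": {"a", "c"}}
leaders = {"u1": {"u2"}}
print(recommend(collected, leaders, "u1", lam=0.0, steps=3))
print(recommend(collected, leaders, "u1", lam=0.5, steps=2))
```

In the toy network, setting `lam=0.0` with three steps reduces the walk to diffusion on the bipartite layer only, while two steps with a positive `lam` reaches only objects collected by the target's leaders.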
4. Data and Metrics
4.1. Data Sets
To evaluate our algorithm’s performance, two real data sets are analyzed in the experiments. The data sets are from http://www.epinions.com and http://www.friendfeed.com, both of which provide user-object collecting information and user-user social relationships. The Epinions data set was collected by Paolo Massa in a 5-week crawl (November/December 2003) from the http://www.epinions.com website, and the Friendfeed data set was collected by Fabio Celli et al. from http://www.friendfeed.com (September 6, 2009 to September 19, 2009). We extract a smaller data set from each by randomly sampling the complete records of user activities. The sampled Epinions data set contains 4,066 users, 7,649 objects, 154,122 collected links, and 217,071 social links. The sampled Friendfeed data set contains 4,148 users, 5,700 objects, 96,942 collected links, and 386,804 social links. Table 1 shows the basic statistics of the two data sets: the number of users, objects, and ratings, and the data sparsity of the user-object network.
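The sparsity of the two user-object networks can be recomputed from the counts quoted above. The sketch below assumes that sparsity means the fraction of possible user-object pairs that are observed collected links; the paper's exact formula did not survive in this text, so that definition and the function name `sparsity` are assumptions.

```python
def sparsity(num_users, num_objects, num_links):
    """Fraction of possible user-object pairs that are observed links."""
    return num_links / (num_users * num_objects)


# Counts quoted in the text: (users, objects, collected links).
datasets = {
    "Epinions": (4066, 7649, 154122),
    "Friendfeed": (4148, 5700, 96942),
}

for name, (users, objects, links) in datasets.items():
    print(f"{name}: {sparsity(users, objects, links):.4%}")
```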
To test our algorithm’s performance, each information network is randomly divided into two parts: the training set consists of 90% of the entries and the remaining 10% constitute the testing set. The training set is treated as known information used for generating recommendations, while the testing set is regarded as unknown information used for testing the performance of the recommendation results. To evaluate the proposed algorithm, we employ five metrics that characterize not only the accuracy of recommendations but also their diversity, defined as follows.
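A random 90/10 division of the user-object links can be sketched as below. The function name `split_links`, the fixed seed, and the representation of links as `(user, object)` tuples are illustrative assumptions; the text does not say how the split was implemented.

```python
import random


def split_links(links, train_fraction=0.9, seed=0):
    """Shuffle the links and cut them into training and testing parts."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: 10 users who each collected 10 objects.
links = [(user, obj) for user in range(10) for obj in range(10)]
train, test = split_links(links)
print(len(train), len(test))
```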
(1) Precision. Precision represents the probability that the selected objects appear in the recommendation list, defined as
$$P_i(L) = \frac{d_i(L)}{L},$$
where $P_i(L)$ represents user $u_i$’s precision, $d_i(L)$ denotes the number of recommended objects that appear in $u_i$’s testing set, and $L$ represents the length of the recommendation list. By averaging over all users’ precisions, we obtain the precision of the whole recommender system as
$$P(L) = \frac{1}{m} \sum_{i=1}^{m} P_i(L),$$
where $m$ represents the number of users. Obviously, a higher precision means a higher recommendation accuracy.
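(2) Recall. Recall represents the probability that the recommended objects appear in the user’s collected list, defined as
$$R_i(L) = \frac{d_i(L)}{D_i},$$
where $R_i(L)$ represents user $u_i$’s recall for a recommendation list of length $L$, $d_i(L)$ is the number of recommended objects that appear in $u_i$’s testing set, and $D_i$ is the number of objects collected by user $u_i$ in the testing set. Averaging over all individuals’ recall, we obtain the recall of the whole recommender system.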
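(3) F-Measure. Generally speaking, for each user, recall is sensitive to the recommendation list length $L$, and a larger $L$ generally gives a higher recall but a lower precision. The F-measure, which assigns equal weight to precision and recall, is defined as
$$F_i(L) = \frac{2\, P_i(L)\, R_i(L)}{P_i(L) + R_i(L)},$$
where $P_i(L)$ and $R_i(L)$ are user $u_i$’s precision and recall for a recommendation list of length $L$.
By averaging over all users’ $F_i(L)$, we can also obtain the whole system’s $F(L)$.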
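The three accuracy metrics for a single user can be sketched together, since they share the count of recommended objects that appear in the user's testing set. The function name `precision_recall_f` and its argument names are hypothetical; system-wide values would follow by averaging the per-user results.

```python
def precision_recall_f(recommended, test_objects, list_length):
    """Per-user precision, recall, and F-measure for a top-L list."""
    top = recommended[:list_length]
    hits = sum(1 for obj in top if obj in test_objects)
    precision = hits / list_length
    recall = hits / len(test_objects) if test_objects else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0  # avoid dividing by zero
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 2 of the 4 recommended objects are in the test set.
p, r, f = precision_recall_f(["a", "b", "c", "d"], {"b", "d", "e"}, 4)
print(p, r, f)
```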
(4) HD. HD is a metric that measures the diversity of users’ recommendation lists. It uses the Hamming distance to measure the difference between the recommendation lists of users $u_i$ and $u_j$, which is defined as
$$H_{ij}(L) = 1 - \frac{Q_{ij}(L)}{L},$$
where $Q_{ij}(L)$ is the number of commonly recommended objects in the top-$L$ positions of the recommendation lists of users $u_i$ and $u_j$. Averaging $H_{ij}(L)$ over all pairs of users, we obtain the $H(L)$ of the recommender algorithm. Obviously, a higher $H(L)$ means a higher diversity among users’ recommendations.
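A short sketch of the Hamming-distance diversity follows: one function for a pair of top-L lists and one for the average over all user pairs. The names `hamming_distance` and `mean_hamming_distance` and the dictionary layout of `recommendations` are assumptions made for illustration.

```python
from itertools import combinations


def hamming_distance(list_i, list_j, list_length):
    """1 minus the share of objects common to both top-L lists."""
    common = len(set(list_i[:list_length]) & set(list_j[:list_length]))
    return 1 - common / list_length


def mean_hamming_distance(recommendations, list_length):
    """Average the pairwise distance over all pairs of users."""
    pairs = list(combinations(recommendations.values(), 2))
    total = sum(hamming_distance(a, b, list_length) for a, b in pairs)
    return total / len(pairs)


# Hypothetical recommendation lists for three users.
recommendations = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["d", "e"]}
print(mean_hamming_distance(recommendations, 2))
```

A value of 0 means every pair of users received identical lists, and a value of 1 means no two users share a recommended object.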
(5) Ranking Score ($RS$). Generally, the recommender system aims to generate a ranking list of the target user’s uncollected objects through the prediction score. One of the most used metrics to evaluate an algorithm’s performance is the ranking score, which measures the users’ satisfaction with the ranking list and is defined as
$$RS_{i\alpha} = \frac{l_{i\alpha}}{L_i},$$
where $l_{i\alpha}$ is the position of the uncollected object $o_\alpha$ in user $u_i$’s ranking list and $L_i$ is the length of user $u_i$’s ranking list. By averaging the ranking score over all links in the testing set, we obtain the whole system’s ranking score $RS$. A small $RS$ means the recommender system puts the user’s favorite objects near the top of the ranking list; hence, the smaller $RS$ is, the better the algorithm performs.
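The ranking score of each test link of one user can be sketched as below. The function name `link_ranking_scores` is hypothetical, objects that received no score are assumed to have score 0, and ties keep their input order, which is one possible convention among several. The system-wide value is then the mean over the per-link values of all users.

```python
def link_ranking_scores(scores, collected, test_objects, all_objects):
    """Relative rank of each test object among the uncollected objects."""
    candidates = [obj for obj in all_objects if obj not in collected]
    # Highest score first; unscored objects default to 0 (assumption).
    ranked = sorted(candidates, key=lambda obj: -scores.get(obj, 0.0))
    position = {obj: rank for rank, obj in enumerate(ranked, start=1)}
    return [position[obj] / len(ranked)
            for obj in test_objects if obj in position]


# Hypothetical user who collected "a" and has test objects "c" and "e".
values = link_ranking_scores(
    scores={"b": 0.1, "c": 0.5, "d": 0.3},
    collected={"a"},
    test_objects=["c", "e"],
    all_objects=["a", "b", "c", "d", "e"],
)
print(values, sum(values) / len(values))
```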
Figure 2 shows the ranking score values on Epinions and Friendfeed data sets. From the figure we can see that the best performance is achieved at time . At time , the recommendations are obtained only from social network and when it will generate random recommendation results since the ranking score value is much bigger than others. When the resource will spread only on bipartite network; therefore objects get scores in odd time steps only, and user get scores in even time steps only. In addition, the ranking score will fluctuate up and down alternately with time . That is because when the recommendations are obtained from social interest in odd time step, and from both social interests and collecting preferences in even time step. With the increase of time in even and odd time step, respectively, the ranking score becomes worse due to the existence of the redundant correlations .
The best ranking score performance occurs at time ; that is, when we consider the social interest in the recommender systems, it will improve the performance of recommender systems. Figure 3 shows the experimental results of precision, recall, F-measure, HD with recommendation list , and ranking score on Epinions and Friendfeed data sets at time . gives the pure MD algorithm. It can be found that when the parameter reaches the optimal value, the precision, recall, -measure, and almost simultaneously reach the maximum value except that of HD. Tables 2 and 3 show the results of biased random walk (BRW) compared with the mass diffusion (MD) and user-based CF (UCF) on Epinions and Friendfeed data sets, respectively. We can see that BRW algorithm has a higher ranking-accuracy than other algorithms and almost similar accuracy-precision with MD but lower diversity-precision than MD algorithm. It is because the probability of reciprocity links is large in the social network (Epinions data set is 45.47% and Friendfeed data set is 62.72%), where is the number of bidirectional links and is the number of all links in social network. Because it is easier for the random walker to go from one user to another user in social network, the recommendations obtained from social network will be similar among friends.
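The reciprocity quoted above is the number of bidirectional links divided by the number of all links in the social network. A minimal sketch follows, assuming links are directed `(follower, leader)` pairs and that each direction of a mutual pair counts as one bidirectional link; the text does not spell out that counting convention, and the function name `reciprocity` is hypothetical.

```python
def reciprocity(social_links):
    """Share of directed links whose reverse link also exists."""
    links = set(social_links)
    if not links:
        return 0.0
    mutual = sum(1 for (i, j) in links if (j, i) in links)
    return mutual / len(links)


# Hypothetical network: users 1 and 2 follow each other, 1 also follows 3.
print(reciprocity([(1, 2), (2, 1), (1, 3)]))
```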
Generally speaking, small-degree users are the vast majority in these systems. Figure 4 shows the user degree distribution in the training set on the Epinions and Friendfeed data sets: 23.06% and 61.5% of users have degrees smaller than 10 on Epinions and Friendfeed, respectively. That is to say, improving performance for small-degree users could improve the performance of the whole system. Figure 5 shows the ranking score versus user degree in the training set. From the figure we can see that MD and UCF have almost the same ability for small-degree users, and our method performs better than both. Meanwhile, it can be seen that our method, which brings social interest into the recommender system, performs better for both larger- and smaller-degree users. In other words, it can alleviate the user cold-start problem.
6. Conclusion and Discussion
In a real online recommender system, it is difficult for new users or users with few collections to obtain recommendations because of the lack of information. However, if they are active in the social network, the system can obtain recommendations from their friends or social leaders. In this way, social networks can help us to solve the user cold-start problem.
In this paper, we proposed a recommendation algorithm via biased random walk on a two-layer coupled network: user-object bipartite network and user-user social network. Experiment results on two real data sets indicate that social interest and user’s preference can be combined together in a delicate way to improve the accuracy metric of recommendation systems. Compared with two other baseline algorithms, our algorithm achieves the best precision measure and has the best ability of accurately recommending objects to the small degree users, effectively alleviating the user cold-start problem.
This paper only provides a simple method to incorporate the social interest into the recommender systems by random walk on coupled social-information network, while a couple of issues remain open for future study. (i) The structure and evolution of coupled social networks are still unclear to us, but we believe they will be helpful for designing effective recommendation algorithms. (ii) The current algorithm assumes that a random walker goes to his friend on social network and his collected objects on bipartite network with the same probability; we conjecture that an appropriately adjusted weight assignment will further improve the algorithmic performance.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors acknowledge Jun-Lin Zhou for helpful discussions. This work was partially supported by the Natural Science Foundation of China (Grant nos. 61103109, 11105024, and 61300018) and the Special Project of Sichuan Youth Science and Technology Innovation Research Team (Grant no. 2013TD0006).
A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, “Methods and metrics for cold-start recommendations,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 253–260, ACM, 2002.
E. Vozalis and K. G. Margaritis, “Analysis of recommender systems algorithms,” in Proceedings of the 6th Hellenic European Conference on Computer Mathematics and Its Applications (HERCMA '03), vol. 2003, Athens, Greece, 2003.
F. Radicchi and A. Arenas, “Abrupt transition in the structural formation of interconnected networks,” Nature Physics, vol. 9, pp. 717–720, 2013.
T. Zhou, Z. Kuscsik, J. Liu, M. Medo, J. R. Wakeling, and Y. Zhang, “Solving the apparent diversity-accuracy dilemma of recommender systems,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 10, pp. 4511–4515, 2010.
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “Grouplens: an open architecture for collaborative filtering of netnews,” in Proceedings of the ACM Conference on Computer Supported Cooperative Work, pp. 175–186, ACM, 1994.
J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” in The Adaptive Web, pp. 291–324, Springer, New York, NY, USA, 2007.
J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, “An algorithmic framework for performing collaborative filtering,” in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230–237, 1999.
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th International Conference on World Wide Web, pp. 285–295, ACM, 2001.
J. S. Breese, D. Heckerman, and C. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering,” in Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI '98), pp. 43–52, Morgan Kaufmann, Madison, Wis, USA, July 1998.
M. J. Pazzani and D. Billsus, “Content-based recommendation systems,” in The Adaptive Web, pp. 325–341, Springer, 2007.
R. Burke, “Hybrid web recommender systems,” in The Adaptive Web, pp. 377–408, Springer, New York, NY, USA, 2007.
K. Pearson, “The problem of the random walk,” Nature, vol. 72, no. 1865, p. 294, 1905.
A. Zeng, A. Vidmer, M. Medo, and Y. C. Zhang, “Information filtering by similarity-preferential diffusion processes,” Europhysics Letters, vol. 105, Article ID 58002, 2014.
A. W. Yu, N. Mamoulis, and H. Su, “Reverse top-k search using random walk with restart,” in Proceedings of the VLDB Endowment, vol. 7, 2014.
B. Yin, Y. Yang, and W. Liu, “Exploring social activeness and dynamic interest in community-based recommender system,” in Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, pp. 771–776, International World Wide Web Conferences Steering Committee, 2014.
D. Crandall, D. Cosley, D. Huttenlocher, J. Kleinberg, and S. Suri, “Feedback effects between similarity and social influence in online communities,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 160–168, August 2008.