Theory and Applications of Complex Networks 2014
Research Article | Open Access
A Probabilistic Recommendation Method Inspired by Latent Dirichlet Allocation Model
The recent decade has witnessed the increasing popularity of recommendation systems, which help users acquire relevant knowledge, commodities, and services from the overwhelming ocean of information on the Internet. Latent Dirichlet Allocation (LDA), originally presented as a graphical model for text topic discovery, has since found applications in many other disciplines. In this paper, we propose an LDA-inspired probabilistic recommendation method that models user-item collecting behavior as a two-step process: every user first becomes a member of a latent user-group with a certain probability, and each user-group then collects various items with different probabilities. Gibbs sampling is employed to approximate all the probabilities in the two-step process. Experimental results on three real-world data sets, MovieLens, Netflix, and Last.fm, show that our method achieves competitive precision, coverage, and diversity compared with four other typical recommendation methods. Moreover, we present an approximation strategy that reduces the computational complexity of our method with only a slight degradation in performance.
1. Introduction
The advent of the Internet has ushered in an era of information explosion. It is very difficult to select relevant items from the countless candidates on e-commerce websites. As an automatic way to help people make the right decisions under information overload, recommendation systems have become a significant topic for both the academic and industrial communities.
During the last decade, many recommendation methods have been proposed, including collaborative filtering methods [1, 2], content-based methods, spectral methods [4, 5], and iterative refinement methods [6, 7]. These methods all rely on computing user similarity, item similarity, or both. More recently, network-based recommendation methods have been proposed to mine the latent relevance between users and items, such as methods based on mass diffusion or association rules [8, 9].
Latent Dirichlet Allocation (LDA) was first presented by Blei et al. in 2003 as a graphical model for text topic discovery; it can be used to find the latent relations among words and to generate a document set through the model. LDA has been widely used in document analysis [11–13], document classification, and document clustering [14–16]. LDA was first introduced into recommender systems to analyze context in content-based methods. In tag-based recommendation systems, LDA is now widely used to find latent relations between the keywords of item descriptions and the tags that users create, so that items can be recommended based on tags [18–20]. For instance, Kang et al. proposed the LA-LDA model, which considers not only the tags created by the target user but also those created by his or her friends in the social network, thereby extending the scope of candidate tags for the target user.
In this paper, we propose a new content-unaware probabilistic recommendation method inspired by the LDA model. We regard users' collecting behaviors as probabilistic events in which each user belongs to multiple user-groups and each user-group has its own collecting preferences. In our method, the collecting process is regarded as two coupled probabilistic processes mediated by user-groups: every user is a member of a latent user-group with a certain probability, and each user-group collects various items with different probabilities.
Calculating these probabilities on the entire data set is costly in both time and memory. To reduce the computational complexity of our method, we introduce an approximation strategy that samples part of the data set to build a rough probabilistic recommendation model, at the cost of a slight degradation in performance.
Many products on an e-commerce website are unpopular: the sales of each individual product lie in the tail of the sales curve, yet together these unpopular products account for a large portion of total income. This is the so-called long-tail phenomenon. A good recommender system must therefore attend to diversity as well as accuracy. Experimental results on three real-world data sets, MovieLens, Netflix, and Last.fm, show that our method achieves competitive performance not only in precision and coverage but also in diversity.
2. Materials and Methods
2.1. Recommendation Model
People have multiple intrinsic attributes, including physiological characteristics, preferences, taboos, and religious beliefs. These attributes can be clustered into many user-groups, each representing users with similar attributes. A user does not belong to only one user-group: for example, a user may simultaneously be male, a high school student, a minor, Chinese, and a Christian. Thus one user belongs to multiple user-groups, while users in different user-groups have different habits. For instance, users in user-groups with the attribute “elder” are more likely to buy health care products and presbyopia glasses than users in user-groups with the attribute “younger.” Our recommendation model therefore rests on two assumptions:
(1) users' collecting behaviors are probabilistic events;
(2) one user belongs to multiple user-groups, and users in the same user-group have similar collecting preferences.
The collecting action of users on items is therefore regarded as a two-step probabilistic process: users are observed as members of several latent user-groups, and the users in each user-group collect items according to a group-item probability distribution. Let z denote a user-group to which a user belongs, and let θ_u be the probability vector linking user u to the user-groups: each component gives the probability that u belongs to the corresponding user-group, so the probability that u belongs to user-group k is θ_{u,k} = p(z = k | u). Similarly, φ_k is the item probability vector of user-group k: each component gives the probability that users of this group collect the corresponding item, so the probability that a user in user-group k buys item i is φ_{k,i} = p(i | z = k). In other words, θ reflects the degree of association between users and user-groups, and φ shows which items the users of a given user-group are more likely to buy. For example, a user who loves both basketball and music belongs to two user-groups, but if he prefers playing basketball to listening to music, his membership probability is higher for the basketball group. Likewise, a student may not care about household items but usually buys books for study. Based on the assumptions above, if there are K user-groups, the probability that user u buys item i can be expressed as
$$ p(i \mid u) = \sum_{k=1}^{K} p(i \mid z = k)\, p(z = k \mid u) = \sum_{k=1}^{K} \varphi_{k,i}\, \theta_{u,k}. \tag{1} $$
Once θ and φ have been calculated, the collecting probability of every item for each user can be computed. Ranking the items by these probabilities yields a list from which a proper recommendation can be drawn. In practice, however, θ and φ are difficult to calculate directly.
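Once θ and φ are available, the ranking described above reduces to a weighted sum over user-groups followed by a sort. The Python sketch below illustrates this under two assumptions of ours: θ_u and φ are given as plain lists of probabilities, and items the user has already collected are excluded from the list. The function names and toy numbers are hypothetical.

```python
from typing import List, Sequence, Set


def collecting_probabilities(theta_u: Sequence[float],
                             phi: Sequence[Sequence[float]]) -> List[float]:
    """Return p(i|u) = sum_k theta_u[k] * phi[k][i] for every item i."""
    n_groups, n_items = len(phi), len(phi[0])
    return [sum(theta_u[k] * phi[k][i] for k in range(n_groups))
            for i in range(n_items)]


def recommend(theta_u: Sequence[float], phi: Sequence[Sequence[float]],
              collected: Set[int], length: int) -> List[int]:
    """Rank the items the user has not collected yet by p(i|u); keep the top `length`."""
    scores = collecting_probabilities(theta_u, phi)
    candidates = [i for i in range(len(scores)) if i not in collected]
    # sorted() is stable, so ties keep ascending item-index order
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:length]


# Toy example: 2 user-groups, 4 items (made-up numbers)
theta_u = [0.8, 0.2]
phi = [[0.5, 0.3, 0.1, 0.1],
       [0.1, 0.1, 0.5, 0.3]]
print(recommend(theta_u, phi, collected={0}, length=2))  # [1, 2]
```

Here p(i|u) is 0.42, 0.26, 0.18, and 0.14 for items 0 to 3; item 0 is skipped because the user already has it.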
This difficulty leads us to LDA, a probabilistic model that uses latent topics to bridge documents and words. Through the latent topics, LDA generates documents via two probabilistic steps: it first chooses a topic according to the document's topic distribution and then draws a word according to that topic's word distribution. Inspired by LDA, our recommendation model is designed as a three-layer Bayesian structure consisting of the user layer, the user-group layer, and the item layer. The model is determined by a pair of hyperparameters, α and β: α describes the relative intensity among user-groups, with θ ~ Dirichlet(α), and β governs the item probability distribution of each user-group, with φ ~ Dirichlet(β). The complete graphical model of the LDA-based probabilistic recommendation model is shown in Figure 1. Because the model can be constructed without any descriptions of items or users, it is a content-unaware probabilistic recommendation model.
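In our method, the probability that user u has attribute (belongs to user-group) k is expressed as θ_{u,k} = p(z = k | u); the probability that users who have attribute k purchase item i is expressed as φ_{k,i} = p(i | z = k); and the probability that user u purchases item i is expressed as
$$ p(i \mid u) = \sum_{k=1}^{K} \theta_{u,k}\, \varphi_{k,i}. \tag{2} $$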
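To make the three-layer structure concrete, the following Python sketch simulates the generative process the model assumes: φ_k ~ Dirichlet(β) for each user-group, θ_u ~ Dirichlet(α) for each user, and then, for every collection event, a user-group drawn from θ_u followed by an item drawn from that group's φ_k. It is an illustration under our own assumptions (symmetric priors, a fixed number of collection events per user, function names chosen for this example), not the authors' code. As in LDA with repeated words, a user may draw the same item more than once.

```python
import random
from typing import List, Tuple


def sample_dirichlet(concentration: float, dim: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet draw: normalize independent Gamma(concentration, 1) variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_collections(n_users: int, n_groups: int, n_items: int,
                         items_per_user: int, alpha: float, beta: float,
                         seed: int = 0) -> Tuple[List[List[float]], List[List[int]]]:
    """Simulate the user -> user-group -> item process of the three-layer model."""
    rng = random.Random(seed)
    groups, items = range(n_groups), range(n_items)
    # phi[k]: item distribution of user-group k, phi_k ~ Dirichlet(beta)
    phi = [sample_dirichlet(beta, n_items, rng) for _ in groups]
    collections = []
    for _ in range(n_users):
        # theta_u: this user's membership distribution, theta_u ~ Dirichlet(alpha)
        theta_u = sample_dirichlet(alpha, n_groups, rng)
        user_items = []
        for _ in range(items_per_user):
            z = rng.choices(groups, weights=theta_u)[0]                # step 1: pick a user-group
            user_items.append(rng.choices(items, weights=phi[z])[0])  # step 2: the group picks an item
        collections.append(user_items)
    return phi, collections


phi, collections = generate_collections(n_users=5, n_groups=3, n_items=10,
                                        items_per_user=4, alpha=0.5, beta=0.5, seed=1)
```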
2.2. Parameters Estimation
There are many approximate inference strategies for estimating the parameters θ and φ in LDA, such as the Laplace approximation, variational inference, Gibbs sampling, and expectation propagation. Griffiths and Steyvers reported that Gibbs sampling outperforms the other methods in both perplexity and speed. Since the structure of our method is similar to that of LDA, we also chose Gibbs sampling to estimate θ and φ. Gibbs sampling is a simple Markov chain Monte Carlo (MCMC) method: it constructs a Markov chain that converges to the target distribution and draws samples from it. Each state of the Markov chain is an assignment of user-groups, where the sampled variable z is a latent variable assigned to each item collected by a user. Transitions between states follow a simple rule: by sampling conditioned on the current values of all other variables and on the users' purchase data, the chain moves to the next state.
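Here, we use the posterior distribution p(z_j = k | z_{¬j}, D), which is calculated by counting the user-groups assigned to items, as the transition probability with which the user-group of the j-th collection, item i purchased by user u, shifts to k:
$$ p(z_j = k \mid \mathbf{z}_{\neg j}, D) \propto \frac{n_{k,\neg j}^{(i)} + \beta}{n_{k,\neg j}^{(\cdot)} + N\beta} \cdot \frac{n_{u,\neg j}^{(k)} + \alpha}{n_{u,\neg j}^{(\cdot)} + K\alpha}. \tag{3} $$
Here, the subscript ¬j denotes a count that excludes the current assignment of item j; n_k^{(i)} is the number of times that item i has been assigned to user-group k; n_k^{(·)} is the number of times that all items have been assigned to user-group k; n_u^{(k)} is the number of times that items purchased by user u have been assigned to user-group k; n_u^{(·)} is the number of items that user u has purchased; α and β are the Dirichlet hyperparameters of θ and φ; K is the number of user-groups; and N is the number of items.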
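When the Markov chain is close to the target distribution after sufficiently many iterations, we record the current values of the latent variable z and use them to estimate φ and θ as in (4) and (5):
$$ \varphi_{k,i} = \frac{n_k^{(i)} + \beta}{n_k^{(\cdot)} + N\beta}, \tag{4} $$
$$ \theta_{u,k} = \frac{n_u^{(k)} + \alpha}{n_u^{(\cdot)} + K\alpha}. \tag{5} $$
Here, n_k^{(i)} is the number of times item i has been assigned to user-group k; n_k^{(·)} is the number of times that all items have been assigned to user-group k; n_u^{(k)} is the number of times that items purchased by user u have been assigned to user-group k; n_u^{(·)} is the number of items that user u has purchased; α and β are the Dirichlet hyperparameters, K is the number of user-groups, and N is the number of items.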
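As described, the sampler corresponds to the standard collapsed Gibbs scheme for LDA, with users playing the role of documents and items the role of words. A minimal pure-Python sketch, assuming symmetric hyperparameters, a fixed number of sweeps instead of a convergence test, and estimates read from the final state only (no averaging over thinned samples); the function name and toy data are hypothetical:

```python
import random
from typing import List, Tuple


def gibbs_sample(collections: List[List[int]], n_items: int, n_groups: int,
                 alpha: float, beta: float, iterations: int = 200,
                 seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling over user-group assignments; returns (phi, theta)."""
    rng = random.Random(seed)
    K, N = n_groups, n_items
    n_ki = [[0] * N for _ in range(K)]      # n_k^(i): item i assigned to group k
    n_k = [0] * K                           # n_k^(.): all items assigned to group k
    n_uk = [[0] * K for _ in collections]   # n_u^(k): user u's items assigned to group k
    z: List[List[int]] = []                 # z[u][j]: group of the j-th item of user u
    for u, items in enumerate(collections):
        z.append([])
        for i in items:
            k = rng.randrange(K)            # random initial user-group
            z[u].append(k)
            n_ki[k][i] += 1
            n_k[k] += 1
            n_uk[u][k] += 1
    for _ in range(iterations):
        for u, items in enumerate(collections):
            for j, i in enumerate(items):
                k = z[u][j]
                # remove the current assignment to obtain the "not j" counts
                n_ki[k][i] -= 1
                n_k[k] -= 1
                n_uk[u][k] -= 1
                # transition weights up to a constant; the user-side denominator
                # n_u^(.) - 1 + K*alpha is the same for every k, so it is dropped
                weights = [(n_ki[g][i] + beta) / (n_k[g] + N * beta) * (n_uk[u][g] + alpha)
                           for g in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[u][j] = k
                n_ki[k][i] += 1
                n_k[k] += 1
                n_uk[u][k] += 1
    # point estimates from the final state of the chain
    phi = [[(n_ki[k][i] + beta) / (n_k[k] + N * beta) for i in range(N)] for k in range(K)]
    theta = [[(n_uk[u][k] + alpha) / (len(items) + K * alpha) for k in range(K)]
             for u, items in enumerate(collections)]
    return phi, theta


# Toy data: users 0-2 collect items 0-2, users 3-5 collect items 3-5
data = [[0, 1, 2], [0, 1, 2], [1, 2, 0], [3, 4, 5], [3, 5, 4], [4, 5, 3]]
phi, theta = gibbs_sample(data, n_items=6, n_groups=2, alpha=0.1, beta=0.01,
                          iterations=100, seed=42)
```

Keeping the three count tables up to date incrementally makes each update O(K), which is what makes collapsed Gibbs sampling cheap per step.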
2.3. Approximate Model
In practice, the data set is updated every day, and using the entire data set to build the recommendation model is both time-consuming and space-consuming. To save time and space, we prefer to build the model from less data and to generate recommendation lists only on demand instead of preparing them in advance. In this paper, we therefore present an approximation method that builds an approximate model of the probabilistic recommendation.
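In the approximation method, we sample part of the users' collection data from the data set to build an imprecise probabilistic recommendation model, which then guides the creation of recommendation lists in two ways. First, the latent user-group variables of the target user are initialized using the parameter φ of the imprecise model. Second, the transition probability is defined as the product of the constant φ̂_{k,i} calculated in the imprecise model and an iteration term computed from the target user's own assignments:
$$ p(z_j = k \mid \mathbf{z}_{\neg j}) \propto \hat{\varphi}_{k,i} \cdot \frac{n_{u,\neg j}^{(k)} + \alpha}{n_{u,\neg j}^{(\cdot)} + K\alpha}, \tag{6} $$
where φ̂ is the group-item distribution of the imprecise model and the counts n_u are taken over the items collected by user u.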
The steps of the approximate method are as follows.
(1) Choose a subset of users from the data set, called the approximating data, for constructing the approximate model.
(2) Use the approximating data to initialize the Markov chain: a random user-group from 1 to K is assigned to each item collected by each user.
(3) Iterate the Markov chain with (3) until it converges. Equation (4) is then used to compute φ, which constructs the approximate model.
(4) When user u needs a recommendation list, a user-group is assigned to each item collected by u: the group-item collecting probability φ obtained in Step (3) is used to draw the initial group from 1 to K probabilistically. This is the initial state of the Markov chain for user u.
(5) Iterate with (6) for a sufficient number of steps, called the burn-in period, after which the Markov chain is assumed to be near the target distribution. Then record the current values of z.
(6) Thereafter, sample once every fixed number of iterations, called the thinning interval. According to (5), estimate θ_u, the probability distribution of user u over the user-groups.
(7) Rank items by the product of θ_u and φ to provide the recommendation list.
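Steps (4)–(7) amount to "folding in" a single user against a fixed group-item distribution φ̂ learned from the approximating data. The sketch below is our reading of those steps, assuming that the transition in (6) is φ̂_{k,i} multiplied by the user's own count term (n_u^(k) + α) and that θ_u is averaged over the thinned samples; the function names and the default burn-in, thinning, and sample counts are arbitrary placeholders.

```python
import random
from typing import List, Sequence


def fold_in_user(items: Sequence[int], phi_hat: Sequence[Sequence[float]],
                 alpha: float, burn_in: int = 50, thinning: int = 5,
                 n_samples: int = 10, seed: int = 0) -> List[float]:
    """Estimate theta_u for one user while keeping phi_hat (from the sampled data) fixed."""
    rng = random.Random(seed)
    K = len(phi_hat)
    # step (4): draw each item's initial group with probability proportional to phi_hat[k][i]
    z = [rng.choices(range(K), weights=[phi_hat[k][i] for k in range(K)])[0] for i in items]
    n_uk = [0] * K
    for k in z:
        n_uk[k] += 1
    theta_sum = [0.0] * K
    for step in range(1, burn_in + thinning * n_samples + 1):
        for j, i in enumerate(items):
            n_uk[z[j]] -= 1
            # step (5): fixed constant phi_hat[k][i] times the user's (count + alpha) term
            weights = [phi_hat[k][i] * (n_uk[k] + alpha) for k in range(K)]
            z[j] = rng.choices(range(K), weights=weights)[0]
            n_uk[z[j]] += 1
        # step (6): after burn-in, keep one sample every `thinning` sweeps
        if step > burn_in and (step - burn_in) % thinning == 0:
            for k in range(K):
                theta_sum[k] += (n_uk[k] + alpha) / (len(items) + K * alpha)
    return [t / n_samples for t in theta_sum]


def rank_items(theta_u: Sequence[float], phi_hat: Sequence[Sequence[float]],
               collected: Sequence[int], length: int) -> List[int]:
    """Step (7): score items by sum_k theta_u[k] * phi_hat[k][i]; return top uncollected ones."""
    scores = {i: sum(t * row[i] for t, row in zip(theta_u, phi_hat))
              for i in range(len(phi_hat[0])) if i not in collected}
    return sorted(scores, key=scores.get, reverse=True)[:length]


phi_hat = [[0.45, 0.45, 0.05, 0.05],
           [0.05, 0.05, 0.45, 0.45]]
theta_u = fold_in_user([0, 1], phi_hat, alpha=0.1)
```

Only the target user's counts change during this loop, so the cost per request scales with the length of that user's collection rather than with the whole data set.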
The time cost of the approximate model depends on the size of the approximating data, and so does its performance. In the experiment, we use different percentages of the data as approximating data to find the optimal size. The approximate model is, however, imprecise because it relies on only a local portion of the data, and its performance fluctuates depending on which data are chosen for approximate modeling. We therefore compare four sampling strategies to find out which one yields a user distribution most similar to that of the entire data and the same performance tendency: random (users are chosen at random from the entire data); item degree (users are sampled proportionally according to the average degree of the items they collected); user degree (users are sampled proportionally according to their degree); and quick classification (users are first classified with a quick classification method and then sampled proportionally from each class). In the experiment, we use the average value and the upper-bound value to represent the performance of the approximate model.
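Two of these sampling strategies can be sketched in a few lines. The random strategy is unambiguous; for the user-degree strategy we assume stratified sampling, in which users are sorted by degree, split into equal-size strata, and the same fraction is drawn from each stratum. The number of strata and the function names are hypothetical choices, not taken from the paper.

```python
import random
from typing import Dict, List, Set


def sample_users_random(user_items: Dict[int, Set[int]], fraction: float,
                        seed: int = 0) -> List[int]:
    """'Random' strategy: draw a fraction of users uniformly without replacement."""
    rng = random.Random(seed)
    users = sorted(user_items)
    return rng.sample(users, max(1, round(fraction * len(users))))


def sample_users_by_degree(user_items: Dict[int, Set[int]], fraction: float,
                           n_bins: int = 5, seed: int = 0) -> List[int]:
    """Hypothetical 'user degree' strategy: stratify users by degree, sample each stratum."""
    rng = random.Random(seed)
    # order users by degree (number of collected items); ties broken by user id
    users = sorted(user_items, key=lambda u: (len(user_items[u]), u))
    chosen: List[int] = []
    for b in range(n_bins):
        stratum = users[b * len(users) // n_bins:(b + 1) * len(users) // n_bins]
        if stratum:
            k = min(len(stratum), max(1, round(fraction * len(stratum))))
            chosen.extend(rng.sample(stratum, k))
    return chosen


# 100 toy users whose degrees cycle through 1..10
data = {u: set(range(u % 10 + 1)) for u in range(100)}
subset = sample_users_by_degree(data, fraction=0.1)
```

Stratifying keeps low-degree and high-degree users in roughly their original proportions, which is the property the comparison of strategies is meant to test.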
3. Experiments and Results
3.1. Data Description
Three benchmark data sets (Table 1) are used to evaluate the performance of the proposed LDA-based recommendation method. The first, MovieLens, is provided by the GroupLens Project at the University of Minnesota. The second, Netflix, is a randomly selected subset of the large data set released by the DVD rental company Netflix for the Netflix Prize. The third, Last.fm, was released in the framework of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). For MovieLens, following the chronological order of the data, we use the earliest 80% as the training set and the latest 20% as the probe set. For Netflix and Last.fm, the data are randomly split into two parts: the training set contains 90% of the data, and the remaining 10% constitutes the probe set.
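The two splitting schemes can be sketched as follows, assuming each record is a (user, item, timestamp) tuple; the record layout, function names, and fixed random seed are our assumptions.

```python
import random
from typing import List, Sequence, Tuple

# Assumed record layout: (user_id, item_id, timestamp)
Record = Tuple[int, int, int]


def chronological_split(records: Sequence[Record], train_fraction: float = 0.8
                        ) -> Tuple[List[Record], List[Record]]:
    """Earliest `train_fraction` of records -> training set, the rest -> probe set."""
    ordered = sorted(records, key=lambda r: r[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def random_split(records: Sequence[Record], train_fraction: float = 0.9,
                 seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle the records, then cut them into training and probe sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# MovieLens-style 80/20 split by time; Netflix/Last.fm-style 90/10 random split
records = [(t % 3, t, 100 - t) for t in range(10)]
train, probe = chronological_split(records)
train2, probe2 = random_split(records)
```

The chronological split guarantees that every probe record is no older than any training record, which mimics predicting future behavior from past behavior.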
3.2. Evaluation Metrics and Comparison Methods
Three evaluation metrics were used to assess recommendation performance in the experiment: precision, coverage, and diversity. Precision is a basic evaluation metric. It is defined as the proportion of recommended items that users actually accept:
$$ P = \frac{1}{M} \sum_{u} \frac{|R_u \cap B_u|}{|R_u|}. \tag{7} $$
Here, R_u is the list of items recommended to user u, B_u is the set of items that user u has bought (in the probe set), and M is the number of users.
Different algorithms provide different recommendation lists to users. The union of the recommendation lists can be used to compute the proportion of the entire item set that gets recommended; we call this proportion coverage:
$$ C = \frac{\left| \bigcup_{u} R_u \right|}{N}. \tag{8} $$
Here, R_u is the list of items recommended to user u, and N is the total number of items. Diversity is an important metric for personalized recommender systems. It evaluates how different users' recommendation lists are from one another, and we define it as the average Hamming distance between recommendation lists:
$$ D = \frac{2}{M(M-1)} \sum_{u < v} H_{uv}, \qquad H_{uv} = 1 - \frac{Q_{uv}}{L}. \tag{9} $$
Here, H_{uv} is the Hamming distance between the recommendation lists of users u and v, Q_{uv} is the number of items the two lists have in common, L is the length of the recommendation list, and M is the number of users.
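A small Python sketch of the three metrics, assuming recommendation lists are stored per user, every list has length L, and precision is computed against each user's probe-set items; the names and toy data are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Set


def precision(recs: Dict[int, List[int]], probe: Dict[int, Set[int]]) -> float:
    """Mean over users of |R_u & B_u| / |R_u| (B_u: the user's items in the probe set)."""
    users = [u for u in recs if recs[u]]
    hits = [len(set(recs[u]) & probe.get(u, set())) / len(recs[u]) for u in users]
    return sum(hits) / len(users)


def coverage(recs: Dict[int, List[int]], n_items: int) -> float:
    """Fraction of all N items that appear in at least one recommendation list."""
    recommended: Set[int] = set()
    for items in recs.values():
        recommended.update(items)
    return len(recommended) / n_items


def diversity(recs: Dict[int, List[int]], length: int) -> float:
    """Mean Hamming distance H_uv = 1 - Q_uv / L over all pairs of users."""
    pairs = list(combinations(sorted(recs), 2))
    total = sum(1 - len(set(recs[u]) & set(recs[v])) / length for u, v in pairs)
    return total / len(pairs)


recs = {0: [1, 2], 1: [2, 3], 2: [4, 5]}
probe = {0: {1}, 1: {2, 3}, 2: set()}
print(precision(recs, probe), coverage(recs, 10))  # 0.5 0.5
```

For the toy lists, the pair (0, 1) shares one of two items (H = 0.5), while the other two pairs share none (H = 1), so D = 2.5/3.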
For comparison, we present the results of four recommendation methods: probabilistic spreading (ProbS), heat spreading (HeatS), user-based collaborative filtering (UserCF), and the association rule algorithm (ARule). UserCF is one of the most classic collaborative filtering methods: based on the similarity of purchased items between users, it recommends items that similar users have bought but the target user has not. The association rule method is also widely used in recommender systems. It concentrates on latent relationships between items: every user's item list is analyzed to create a list of the most related items, called association rules. HeatS, a variant of ProbS, achieves the highest coverage and diversity among current recommendation algorithms, but it neglects accuracy. In the experiment, we therefore use the precision of HeatS as the lower bound of precision and its coverage and diversity as the upper bounds, and we evaluate performance with the enhanced metrics given in (10a)–(10c). ProbS and HeatS are both network-based spreading methods; ProbS does well on precision but not as well on coverage and diversity.
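For reference, ProbS and HeatS can be computed on the user-item bipartite network by two rounds of spreading. The sketch below follows the usual formulation of these two methods (resource is split evenly among neighbours in ProbS, neighbour values are averaged in HeatS); it is our own illustration with hypothetical function names, not the implementation used in the experiments, and it omits UserCF and ARule.

```python
from collections import defaultdict
from typing import Dict, Set


def _item_users(user_items: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Invert the user -> items map of the bipartite user-item network."""
    inverse: Dict[int, Set[int]] = defaultdict(set)
    for u, items in user_items.items():
        for i in items:
            inverse[i].add(u)
    return inverse


def probs_scores(user_items: Dict[int, Set[int]], target: int) -> Dict[int, float]:
    """ProbS (mass diffusion): resource spreads items -> users -> items, split evenly."""
    item_users = _item_users(user_items)
    resource: Dict[int, float] = defaultdict(float)
    for b in user_items[target]:                 # each collected item starts with 1 unit
        for j in item_users[b]:
            resource[j] += 1.0 / len(item_users[b])
    scores: Dict[int, float] = defaultdict(float)
    for j, r in resource.items():
        for a in user_items[j]:
            scores[a] += r / len(user_items[j])
    return {a: s for a, s in scores.items() if a not in user_items[target]}


def heats_scores(user_items: Dict[int, Set[int]], target: int) -> Dict[int, float]:
    """HeatS (heat conduction): each node takes the average 'temperature' of its neighbours."""
    item_users = _item_users(user_items)
    own = user_items[target]
    # user temperature = fraction of the user's items that the target has collected
    temp = {j: len(items & own) / len(items) for j, items in user_items.items() if items}
    return {a: sum(temp[j] for j in users) / len(users)
            for a, users in item_users.items() if a not in own}


net = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}
p = probs_scores(net, target=0)
h = heats_scores(net, target=0)
```

Because HeatS averages rather than splits, weakly connected (unpopular) items are not penalized for having few neighbours, which is why it favors coverage and diversity, while ProbS favors popular items and hence precision.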
To evaluate the performance of the approximate model, we define a missing rate and a comprehensive rate Comp, as shown in (11). The missing rate compares a performance metric (precision, coverage, or diversity) of the approximate model with the same metric of the normal model. The formula also involves the percentage of approximating data in the entire data set, a controlling parameter that accommodates fluctuation, and the weights of the three metrics (precision, coverage, and diversity) in the comprehensive rate.
The recommendation performance of the different methods on the MovieLens, Netflix, and Last.fm data sets is shown in Figures 2, 3, and 4, respectively. The parameters of our method affect the results; here we use empirically chosen constant values for them.
Our method performs far better than the other methods on the MovieLens data set, which has the most links in the experiment; ProbS runs a close second, while UserCF and ARule perform significantly worse. When the recommendation list length is below 20, the performance of our method is at least twice that of UserCF and ARule. On the Netflix data set, our method consistently performs very well in terms of precision, coverage, and diversity; the precision of ProbS approaches that of our method, but its other metrics are much worse. Our method also achieves good overall performance on Last.fm, the sparsest data set in the experiment. When the recommendation list length exceeds 50, the precision of our method is lower than that of ARule and its coverage ranks a close second, while the consistency of its diversity is the best. The performance of the approximate model is shown in Figures 5 and 6.
Figure 5 plots the missing rate of each metric on a separate panel to show its tendency. For precision, as the data size increases, the missing rate declines ever more slowly and eventually levels off; it stays between 0 and 20%. The tendency of coverage differs from that of precision: the missing rate first increases, with the transition occurring when the approximating data amount to 5% to 10% of the data set. By contrast, the missing rate of diversity is very high at the head of the curve, reaches its lowest point in the 5% to 10% range as well, and levels off at the tail.
Figure 6 shows the comprehensive performance of the approximate model. The comprehensive curve depends on the parameters, which we kept fixed in the experiment. The two data sets differ in their comprehensive curves: the curve for MovieLens is flat, with a few lower values at the head and slight oscillation at the tail, whereas the curve for Netflix is hook-shaped. Considering these factors, the optimal value occurs near 10%.
4. Conclusions
In this paper, we proposed a method that uses users' behaviors to make recommendations. Instead of modeling with tags or contexts, our method builds a recommendation model from the users' collecting lists, without the contents of items. The experiments show that our method exhibits all-round competitive performance on precision, coverage, and diversity compared with four typical classes of recommendation algorithms. To reduce the computational complexity of our method, we also proposed an approximate model whose performance curve is determined by the adjusting parameters. The experiments show that the approximate method is feasible, since the optimal data size is under 20%. When precision is considered the most important metric, it is more appropriate to use less than 10% of the data to construct the approximate model.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61300018 and 61103109), Research Fund for the Doctoral Program of Higher Education of China (no. 20120185120017), China Postdoctoral Science Foundation (nos. 2013M531951 and 2014T70860), Fundamental Research Funds for the Central Universities (no. ZYGX2012J071), and Special Project of Sichuan Youth Science and Technology Innovation Research Team (no. 2013TD0006).
References
[1] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” in The Adaptive Web, vol. 4321 of Lecture Notes in Computer Science, pp. 291–324, Springer, 2007.
[2] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th International Conference on World Wide Web (WWW '01), 2001.
[3] M. J. Pazzani and D. Billsus, “Content-based recommendation systems,” in The Adaptive Web, vol. 4321 of Lecture Notes in Computer Science, pp. 325–341, Springer, 2007.
[4] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins, “Eigentaste: a constant time collaborative filtering algorithm,” Information Retrieval, vol. 4, no. 2, pp. 133–151, 2001.
[5] S. Maslov and Y.-C. Zhang, “Extracting hidden information from knowledge networks,” Physical Review Letters, vol. 87, no. 24, Article ID 248701, 2001.
[6] P. Laureti, L. Moret, Y. Zhang, and Y. Yu, “Information filtering via iterative refinement,” Europhysics Letters, vol. 75, no. 6, pp. 1006–1012, 2006.
[7] J. Ren, T. Zhou, and Y.-C. Zhang, “Information filtering via self-consistent refinement,” Europhysics Letters, vol. 82, Article ID 58007, 2008.
[8] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93), vol. 22, no. 2, pp. 207–216, 1993.
[9] W. Lin, S. A. Alvarez, and C. Ruiz, “Efficient adaptive-support association rule mining for recommender systems,” Data Mining and Knowledge Discovery, vol. 6, no. 1, pp. 83–105, 2002.
[10] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[11] T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 1, 2004.
[12] D. M. Blei and J. D. Lafferty, “Correlated topic models,” in Advances in Neural Information Processing Systems, vol. 18, 2005.
[13] F.-F. Li and P. Perona, “A Bayesian hierarchical model for learning natural scene categories,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 2, pp. 524–531, June 2005.
[14] X. Wei and W. B. Croft, “LDA-based document models for ad-hoc retrieval,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06), pp. 178–185, August 2006.
[15] Z. Wang and X. Qian, “Text categorization based on LDA and SVM,” in Proceedings of the International Conference on Computer Science and Software Engineering (CSSE '08), vol. 1, pp. 674–677, December 2008.
[16] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, “Clustering the tagged web,” in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM '09), pp. 54–63, New York, NY, USA, February 2009.
[17] K. Yu, B. Zhang, H. Zhu, H. Cao, and J. Tian, “Towards personalized context-aware recommendation by mining context logs through topic models,” Lecture Notes in Computer Science, vol. 7301, no. 1, pp. 431–443, 2012.
[18] R. Krestel, P. Fankhauser, and W. Nejdl, “Latent Dirichlet allocation for tag recommendation,” in Proceedings of the 3rd ACM Conference on Recommender Systems, pp. 61–68, October 2009.
[19] Y. Song, L. Zhang, and C. L. Giles, “Automatic tag recommendation algorithms for social recommender systems,” ACM Transactions on the Web, vol. 5, no. 1, article 4, 2011.
[20] X. Si and M. Sun, “Tag-LDA for scalable real-time tag recommendation,” Journal of Computational Information Systems, vol. 6, no. 1, pp. 23–31, 2009.
[21] J.-H. Kang, K. Lerman, and L. Getoor, “LA-LDA: a limited attention topic model for social recommendation,” in Social Computing, Behavioral-Cultural Modeling and Prediction, vol. 7812 of Lecture Notes in Computer Science, pp. 211–220, Springer, Berlin, Germany, 2013.
[22] T. Hofmann, “Unsupervised learning by probabilistic Latent Semantic Analysis,” Machine Learning, vol. 42, no. 1-2, pp. 177–196, 2001.
Copyright © 2014 WenBo Xie et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.