Research Article  Open Access
A Probabilistic Recommendation Method Inspired by Latent Dirichlet Allocation Model
Abstract
The past decade has witnessed the growing popularity of recommendation systems, which help users find relevant knowledge, commodities, and services in the overwhelming ocean of information on the Internet. Latent Dirichlet Allocation (LDA), originally presented as a graphical model for text topic discovery, has since found applications in many other disciplines. In this paper, we propose an LDA-inspired probabilistic recommendation method that treats the user-item collecting behavior as a two-step process: every user first becomes a member of one latent user-group with a certain probability, and each user-group then collects various items with different probabilities. Gibbs sampling is employed to approximate all the probabilities in the two-step process. Experimental results on three real-world data sets, MovieLens, Netflix, and Last.fm, show that our method performs competitively on precision, coverage, and diversity in comparison with four other typical recommendation methods. Moreover, we present an approximation strategy that reduces the computational complexity of our method at the cost of a slight degradation in performance.
1. Introduction
The advent of the Internet has brought an era of exploding information. It is very difficult to select the relevant items from the countless candidates on e-commerce websites. As an automatic way to help people make the right decisions under information overload, the recommendation system has become a significant topic for both the academic and industrial communities.
During the last decade, many recommendation methods have been proposed, including collaborative filtering methods [1, 2], content-based methods [3], spectral methods [4, 5], and iterative refinement methods [6, 7]. These methods are all based on the computation of user similarity, item similarity, or both. Recently, some network-based recommendation methods have been proposed to mine the latent relevance of users and items, such as the methods based on mass diffusion or association rules [8, 9].
Latent Dirichlet Allocation (LDA) was first presented as a graphical model for text topic discovery by Blei et al. in 2003 [10]; it can be used to find the inherent relations among words and to generate document sets through the model. LDA has been widely used in document analysis [11–13], document classification, and document clustering [14–16]. LDA was first introduced into recommender systems for analyzing the context in content-based methods [17]. In tag-based recommendation systems, LDA is now widely used to find the latent relation between the keywords of item descriptions and the item tags created by users, so that items can be recommended based on the tags [18–20]. For instance, Kang et al. [21] proposed the LA-LDA model, which considers not only the tags created by the target user but also the tags created by his or her friends in the social network, extending the scope of candidate tags beyond those created by the target user.
In this paper, we propose a new content-unaware probabilistic recommendation method inspired by the LDA model. Users' collecting behaviors are probabilistic events, in which one user belongs to multiple user-groups and the users in each user-group have different collecting preferences. In our method, the collecting process is regarded as two joined probabilistic processes mediated by the user-group; that is, every user is a member of each latent user-group with a certain probability, while each user-group collects various items with different probabilities.
Calculating the probabilities on the entire data set is time-consuming and space-consuming. To reduce the computational complexity of our method, we introduce an approximation strategy that samples a part of the data set to build a rough probabilistic recommendation model, at the cost of a slight degradation in performance.
Many products on an e-commerce website are not popular; that is, the sales of every single such product lie in the tail of the sales curve, but the sales of all these unpopular products together constitute a big portion of the whole income. This is the so-called long tail phenomenon. Therefore, a good recommender system must focus on both accuracy and diversity. Experimental results on three real-world data sets, MovieLens, Netflix, and Last.fm, show that our method performs competitively not only on precision and coverage but also on diversity.
2. Materials and Methods
2.1. Recommendation Model
People have multiple, differing inner attributes, including physiological characteristics, preferences, taboos, and religious beliefs. These attributes can be clustered into many user-groups, each of which represents users with similar attributes. A user does not belong to only one user-group. For example, a user may be male, a high school student, a minor, Chinese, and a Christian all at once. One user belongs to multiple user-groups, while users in different user-groups have different habits. For instance, users in the user-groups that contain the attribute "elder" are more likely to buy health care products and presbyopia glasses than those who belong to the user-groups containing the attribute "younger." In our recommendation model, we put forward two assumptions: (1) the users' collecting behaviors are probabilistic events; (2) one user belongs to multiple user-groups, and users in the same user-group have similar collecting preferences.
The collecting action of users on items is therefore considered a two-step probabilistic process: users are observed as members of several latent user-groups, and the users in each user-group collect items according to the group-item probability distribution. Here we assume that $z$ is a user-group to which a user belongs, and $\theta$ is the probability vector between users and user-groups. Each column of $\theta$ represents the probability that a user belongs to the corresponding user-group, so the probability that user $u$ belongs to user-group $z$ can be expressed as $p(z \mid u) = \theta_{u,z}$. Similarly, $\varphi$ is the commodity probability vector, and each column of $\varphi$ represents the probability that the users of the current user-group will collect the corresponding item, so the probability that a user who belongs to user-group $z$ will buy item $i$ can be expressed as $p(i \mid z) = \varphi_{z,i}$. In fact, $\theta$ reflects the degree of association between users and user-groups, and $\varphi$ shows which items the users of a user-group are more likely to buy. For example, a user who loves both basketball and music belongs to two user-groups, but he prefers playing basketball to listening to music. The intensity of the association between the user and each user-group is captured by the probability that the user belongs to that user-group. Likewise, a student may not care about household items but usually buys books for study. Based on the assumptions above, if there are $K$ groups, the probability that user $u$ buys item $i$ can be expressed as
$$p(i \mid u) = \sum_{z=1}^{K} p(i \mid z)\, p(z \mid u). \tag{1}$$
As long as $\theta$ and $\varphi$ are calculated, the collecting probability vector of each user can be computed. Ranking the items by these probabilities yields a list from which a proper recommendation can be made. In practice, however, it is not easy to calculate $\theta$ and $\varphi$ directly.
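The ranking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `recommend`, `theta_u`, and `phi` are hypothetical, the toy probabilities are made up, and the exclusion of items the user has already collected is an assumption that the text does not spell out.

```python
from typing import List, Set


def recommend(theta_u: List[float],
              phi: List[List[float]],
              collected: Set[int],
              top_n: int) -> List[int]:
    """Rank items by sum over groups of p(group | user) * p(item | group)."""
    num_groups = len(theta_u)
    num_items = len(phi[0])
    scores = [
        sum(theta_u[z] * phi[z][i] for z in range(num_groups))
        for i in range(num_items)
    ]
    # Assumption: items the user already collected are not recommended again.
    candidates = [i for i in range(num_items) if i not in collected]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:top_n]


# Toy example with 2 user-groups and 4 items (all numbers are made up).
theta_u = [0.8, 0.2]              # user-to-group probabilities
phi = [
    [0.5, 0.3, 0.1, 0.1],         # item probabilities of group 0
    [0.1, 0.1, 0.2, 0.6],         # item probabilities of group 1
]
top = recommend(theta_u, phi, collected={0}, top_n=2)
print(top)  # [1, 3]
```

Item 3 outranks item 2 here even though the user belongs mostly to group 0, because group 1 collects item 3 with a much higher probability.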
Latent Dirichlet Allocation (LDA) is a probabilistic model that uses latent topics to bridge documents and words. Using the latent topics, LDA constructs documents via two probabilistic processes: it first chooses a topic according to the document-topic distribution and then draws a word according to the topic-word distribution of that topic. Inspired by LDA, our recommendation model is designed as a three-layer Bayesian structure: the user layer, followed by the user-group layer and the item layer. To construct it, parameters are used in pairs. The recommendation model is determined by the hyperparameters $\alpha$ and $\beta$, in which $\alpha$ describes the relative intensity between user-groups, $\theta \sim \mathrm{Dirichlet}(\alpha)$, and $\beta$ reflects the probability distribution of each user-group, $\varphi \sim \mathrm{Dirichlet}(\beta)$. The complete graphical model representation of LDA for the probabilistic recommendation model is shown in Figure 1. The model can be constructed without item or user descriptions, so it is a content-unaware probabilistic recommendation model.
In our method, the probability that user $u$ has attribute $z$ is expressed as $\theta_{u,z}$; the probability that users who have attribute $z$ purchase item $i$ is expressed as $\varphi_{z,i}$; and the probability that user $u$ purchases item $i$ is expressed as
$$p(i \mid u) = \sum_{z=1}^{K} \theta_{u,z}\, \varphi_{z,i}. \tag{2}$$
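To make the three-layer structure concrete, the following sketch simulates the two-step collecting process with the Python standard library. It is an illustration rather than the authors' implementation: the function names and toy parameter values are hypothetical, a Dirichlet draw is obtained by normalising Gamma draws, and, as a simplification, a simulated user may pick the same item more than once.

```python
import random
from typing import List


def sample_dirichlet(concentration: float, dim: int,
                     rng: random.Random) -> List[float]:
    """Draw from a symmetric Dirichlet by normalising Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_collections(num_users: int, num_groups: int, num_items: int,
                         items_per_user: int, alpha: float, beta: float,
                         seed: int = 0) -> List[List[int]]:
    """Simulate users collecting items through latent user-groups."""
    rng = random.Random(seed)
    # One item distribution per user-group, drawn from Dirichlet(beta).
    phi = [sample_dirichlet(beta, num_items, rng) for _ in range(num_groups)]
    collections = []
    for _ in range(num_users):
        # One group distribution per user, drawn from Dirichlet(alpha).
        theta_u = sample_dirichlet(alpha, num_groups, rng)
        picks = []
        for _ in range(items_per_user):
            z = rng.choices(range(num_groups), weights=theta_u)[0]  # step 1: pick a group
            i = rng.choices(range(num_items), weights=phi[z])[0]    # step 2: pick an item
            picks.append(i)
        collections.append(picks)
    return collections


simulated = generate_collections(num_users=5, num_groups=3, num_items=20,
                                 items_per_user=8, alpha=0.5, beta=0.1)
```

Recommendation runs this process in reverse: the collections are observed, and the group memberships and group-item probabilities have to be inferred.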
2.2. Parameters Estimation
There are many approximate inference strategies for estimating the parameters $\theta$ and $\varphi$ in LDA, such as Laplace approximation, variational inference, Gibbs sampling, and expectation propagation. Griffiths and Steyvers reported that Gibbs sampling compares favorably with the other methods in both perplexity and speed [11]. Since the structure of our method is similar to that of LDA, we also chose the Gibbs sampling algorithm to estimate $\theta$ and $\varphi$. Gibbs sampling is a simple MCMC (Markov chain Monte Carlo) method. It constructs a Markov chain that converges to the target distribution and draws samples from it. Each state of the Markov chain is an assignment of user-groups, the latent variables being sampled, to the items collected by users. The transition between states follows a simple rule: by sampling conditioned on the current values of all variables and the data set of users' purchases, the chain transitions to the next state.
Here, we use the posterior distribution $P(z_j = k \mid \mathbf{z}_{-j}, u, i)$, which is calculated by counting the user-groups assigned to items, as the transition probability that the user-group assignment $z_j$ of item $i$ purchased by user $u$ shifts from its current value to $k$, as shown in
$$P(z_j = k \mid \mathbf{z}_{-j}, u, i) \propto \frac{n_{-j,k}^{(i)} + \beta}{n_{-j,k}^{(\cdot)} + N\beta} \cdot \frac{n_{-j,k}^{(u)} + \alpha}{n_{-j,\cdot}^{(u)} + K\alpha}. \tag{3}$$
Here, $n_{-j}$ denotes a count that does not include the current assignment $z_j$; $n_{-j,k}^{(i)}$ is the number of times that item $i$ has been assigned to user-group $k$; $n_{-j,k}^{(\cdot)}$ represents the number of times that all items have been assigned to user-group $k$; $n_{-j,k}^{(u)}$ is the number of times that items purchased by user $u$ have been assigned to user-group $k$; $n_{-j,\cdot}^{(u)}$ represents the quantity of items that user $u$ has purchased; $N$ is the number of items; and $K$ is the number of user-groups.
When the Markov chain is near the target distribution after adequate iterations, we record the current values of the latent variables and use them to estimate $\varphi$ and $\theta$, as shown in (4) and (5):
$$\varphi_{k,i} = \frac{n_{k}^{(i)} + \beta}{n_{k}^{(\cdot)} + N\beta}, \tag{4}$$
$$\theta_{u,k} = \frac{n_{k}^{(u)} + \alpha}{n_{\cdot}^{(u)} + K\alpha}. \tag{5}$$
Here, $n_{k}^{(i)}$ is the number of times item $i$ has been assigned to user-group $k$; $n_{k}^{(\cdot)}$ represents the number of times that all items have been assigned to user-group $k$; $n_{k}^{(u)}$ is the number of times that items purchased by user $u$ have been assigned to user-group $k$; and $n_{\cdot}^{(u)}$ represents the quantity of items that user $u$ has purchased.
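The estimation procedure of this section can be sketched as a small collapsed Gibbs sampler. This is a minimal illustration, assuming symmetric hyperparameters and a single recorded state at the end of the run; the function name `gibbs_fit`, the variable names, and the toy data are hypothetical, and the update follows the standard collapsed Gibbs sampler for LDA of Griffiths and Steyvers [11] rather than any code released with the paper.

```python
import random
from typing import List, Tuple


def gibbs_fit(collections: List[List[int]], num_items: int, num_groups: int,
              alpha: float, beta: float, iterations: int,
              seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (theta, phi): user-by-group and group-by-item probabilities."""
    rng = random.Random(seed)
    groups = range(num_groups)
    n_ki = [[0] * num_items for _ in groups]         # item i assigned to group k
    n_k = [0] * num_groups                           # any item assigned to group k
    n_uk = [[0] * num_groups for _ in collections]   # items of user u in group k

    # Initial state: a uniformly random group for every collected item.
    z = [[rng.randrange(num_groups) for _ in items] for items in collections]
    for u, items in enumerate(collections):
        for j, i in enumerate(items):
            k = z[u][j]
            n_ki[k][i] += 1
            n_k[k] += 1
            n_uk[u][k] += 1

    for _ in range(iterations):
        for u, items in enumerate(collections):
            for j, i in enumerate(items):
                # Remove the current assignment so the counts exclude it.
                k = z[u][j]
                n_ki[k][i] -= 1
                n_k[k] -= 1
                n_uk[u][k] -= 1
                # Unnormalised transition weights; the user-side denominator
                # is identical for every group, so it is omitted.
                weights = [
                    (n_ki[g][i] + beta) / (n_k[g] + num_items * beta)
                    * (n_uk[u][g] + alpha)
                    for g in groups
                ]
                k = rng.choices(groups, weights=weights)[0]
                z[u][j] = k
                n_ki[k][i] += 1
                n_k[k] += 1
                n_uk[u][k] += 1

    # Smoothed estimates from the final state of the chain.
    phi = [[(n_ki[k][i] + beta) / (n_k[k] + num_items * beta)
            for i in range(num_items)] for k in groups]
    theta = [[(n_uk[u][k] + alpha) / (len(items) + num_groups * alpha)
              for k in groups] for u, items in enumerate(collections)]
    return theta, phi


# Toy data: users 0-1 collect items {0, 1}; users 2-3 collect items {2, 3}.
toy = [[0, 1], [0, 1], [2, 3], [2, 3]]
theta, phi = gibbs_fit(toy, num_items=4, num_groups=2,
                       alpha=0.5, beta=0.1, iterations=50)
```

Each row of `theta` and `phi` sums to one by construction, which is a quick sanity check after any run.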
2.3. Approximate Model
In practice, the data set is updated every day. Using the entire data set to build the recommendation model is both time-consuming and space-consuming. To save time and space, we prefer to model with less data and to generate the recommended items only when they are required instead of preparing them in advance. In this paper, we present an approximation method for building an approximate model of the probabilistic recommendation.
In the approximation method, we sample part of the users' collection data from the data set to build an imprecise probabilistic recommendation model. The imprecise model serves as a guide for creating a recommendation list in two ways. On one hand, the latent user-group vector is initialized using the parameter of the imprecise model. On the other hand, the transition probability is defined as the product of the constant $\hat{\varphi}$ calculated in the imprecise model and the iteration parameter $\theta$, as shown in
$$P(z_j = k \mid \mathbf{z}_{-j}, u, i) \propto \hat{\varphi}_{k,i} \cdot \frac{n_{-j,k}^{(u)} + \alpha}{n_{-j,\cdot}^{(u)} + K\alpha}. \tag{6}$$
The steps of the approximate method are as follows.
(1) Choose a subset of users from the data set for constructing the approximate model; this subset is called the approximating data.
(2) Use the approximating data to initialize the Markov chain: a random user-group from 1 to $K$ is assigned to each item collected by each user $u$.
(3) Use Equation (3) to iterate the Markov chain until it converges. Equation (4) is then used to work out $\varphi$, which constitutes the approximate model.
(4) When user $u$ needs a recommendation list, a user-group is assigned to each item collected by user $u$: the parameter $\varphi$, the collecting probability of the user-group worked out in Step (3), is used to initialize each assignment from 1 to $K$ probabilistically. This is the initial state of the Markov chain for user $u$.
(5) Use Equation (6) to iterate a sufficient number of times, called the burn-in space, after which the Markov chain is considered to be near the target distribution. Then record the current values of the assignments.
(6) Sample once every fixed number of iterations, called the thinning space [22]. According to Equation (5), estimate $\theta$, the probability that user $u$ belongs to each user-group.
(7) Rank the items by the product of $\theta$ and $\varphi$ to provide the recommendation list.
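The per-user part of the procedure, Steps (4) to (7), can be sketched as follows. This is a hedged illustration: `fold_in_user` and all parameter names are hypothetical, the group-item matrix `phi_hat` stands in for the one produced by the approximate model and is filled with made-up values, and averaging the estimates over the thinned samples is an assumption, since the text does not say how the samples are combined.

```python
import random
from typing import List


def fold_in_user(items: List[int], phi_hat: List[List[float]], alpha: float,
                 burn_in: int, thinning: int, num_samples: int,
                 seed: int = 0) -> List[float]:
    """Estimate one user's group probabilities while keeping phi_hat fixed."""
    rng = random.Random(seed)
    groups = range(len(phi_hat))
    # Step (4): initialise each assignment in proportion to phi_hat[k][i].
    z = [rng.choices(groups, weights=[phi_hat[k][i] for k in groups])[0]
         for i in items]
    n_k = [0] * len(phi_hat)
    for k in z:
        n_k[k] += 1

    theta_sum = [0.0] * len(phi_hat)
    denom = len(items) + len(phi_hat) * alpha
    for sweep in range(1, burn_in + thinning * num_samples + 1):
        for j, i in enumerate(items):
            n_k[z[j]] -= 1   # exclude the current assignment from the count
            # Step (5): fixed phi_hat times the user's current group counts.
            weights = [phi_hat[k][i] * (n_k[k] + alpha) for k in groups]
            z[j] = rng.choices(groups, weights=weights)[0]
            n_k[z[j]] += 1
        # Step (6): after burn-in, keep one state every `thinning` sweeps.
        if sweep > burn_in and (sweep - burn_in) % thinning == 0:
            for k in groups:
                theta_sum[k] += (n_k[k] + alpha) / denom
    return [t / num_samples for t in theta_sum]


# Made-up approximate model: group 0 favours items 0-1, group 1 items 2-3.
phi_hat = [[0.49, 0.49, 0.01, 0.01],
           [0.01, 0.01, 0.49, 0.49]]
theta_new = fold_in_user([0, 1], phi_hat, alpha=0.5,
                         burn_in=20, thinning=2, num_samples=50)
# Step (7): score every item by the product of theta and phi_hat.
scores = [sum(theta_new[k] * phi_hat[k][i] for k in range(2)) for i in range(4)]
```

Because `phi_hat` is never updated, the cost of serving one user depends only on that user's own items and the number of sweeps, not on the size of the full data set.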
The time consumption of the approximate model depends on the size of the approximating data, and so does its performance. In the experiment, we use different percentages of the data as approximating data to find the optimal size. The approximate model is, however, imprecise owing to its reliance on data locality, and its performance oscillates when different data are chosen for approximate modeling. Four sampling strategies are therefore compared to find out which one yields a user distribution most similar to that of the entire data set and shows the same performance tendency: random (randomly choose users from the entire data set); item degree (sample proportionally according to the average degree of each user's items); user degree (sample proportionally according to the average degree of users); and quick classification (use a quick classification method to classify the users and then sample proportionally). In the experiment, we use the average value and the upper bound value to represent the performance of the approximate model.
3. Experiments and Results
3.1. Data Description
Three benchmark data sets (Table 1) are used to evaluate the performance of the proposed LDA-based recommendation method. The first data set, MovieLens, is provided by the GroupLens project at the University of Minnesota. The second data set, Netflix, is a randomly selected subset of the huge data set released by the DVD rental company Netflix for its Netflix Prize. The third data set, Last.fm, was released in the framework of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). Following the chronological order of the data in MovieLens, we chose the earliest 80 percent of the data as the training set and the latest 20 percent as the probe set. For the Netflix and Last.fm data sets, the data is randomly divided into two parts: the training set contains 90% of the data, and the remaining 10% constitutes the probe set in the experiment.
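The chronological split used for MovieLens can be sketched as below. The record layout `(user_id, item_id, timestamp)` and the function name are assumptions made for illustration; the toy records are made up.

```python
from typing import List, Tuple

Record = Tuple[int, int, int]  # (user_id, item_id, timestamp), an assumed layout


def chronological_split(records: List[Record],
                        train_fraction: float = 0.8
                        ) -> Tuple[List[Record], List[Record]]:
    """Put the earliest records in the training set and the rest in the probe set."""
    ordered = sorted(records, key=lambda r: r[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Ten toy records whose timestamps are deliberately out of order.
records = [(u, i, t) for u, i, t in zip(
    [1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
    [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
    [5, 3, 9, 1, 7, 2, 8, 4, 10, 6],
)]
train, probe = chronological_split(records)
```

For the random 90/10 split used on the other two data sets, shuffling the records before cutting at the 0.9 mark would play the same role.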

3.2. Evaluation Metrics and Comparison Methods
Three evaluation metrics are used to assess recommendation quality in the experiment: precision, coverage, and diversity. Precision is a basic evaluation metric. It is defined as the proportion of recommended items that the user accepts:
$$P = \frac{|R \cap T|}{|R|}. \tag{7}$$
Here, $R$ represents the list of items recommended to a user, and $T$ represents the set of items that the user has bought.
Different algorithms provide different recommendation lists to users. The union of the recommendation lists can be used to work out the proportion of recommended items in the entire item set. We use coverage to denote this proportion, as shown in
$$C = \frac{\left|\bigcup_{u} R_u\right|}{N}. \tag{8}$$
Here, $R_u$ represents the list of items recommended to user $u$, and $N$ represents the quantity of all items. Diversity is an important metric for personalized recommender systems. It is used to evaluate the difference between users' recommendation lists, and we use the average Hamming distance between recommendation lists to define diversity as follows:
$$D(L) = \frac{1}{M(M-1)} \sum_{u \neq v} H_{uv}(L). \tag{9}$$
Here, $H_{uv}(L)$ is the Hamming distance between the recommendation lists of user $u$ and user $v$; $L$ is the length of the recommendation list; and $M$ represents the quantity of users.
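The three metrics can be computed as in the sketch below. Several details are assumptions rather than statements from the text: precision is averaged over users, the Hamming distance between two lists of equal length is taken in the normalised form one minus the fraction of shared items, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def precision(recs: Dict[int, List[int]], bought: Dict[int, Set[int]]) -> float:
    """Mean over users of the fraction of recommended items the user bought."""
    per_user = [len(set(r) & bought.get(u, set())) / len(r)
                for u, r in recs.items()]
    return sum(per_user) / len(per_user)


def coverage(recs: Dict[int, List[int]], num_items: int) -> float:
    """Fraction of all items that appear in at least one recommendation list."""
    distinct = set().union(*recs.values())
    return len(distinct) / num_items


def diversity(recs: Dict[int, List[int]]) -> float:
    """Mean normalised Hamming distance over all pairs of users."""
    distances = [1 - len(set(a) & set(b)) / len(a)
                 for a, b in combinations(recs.values(), 2)]
    return sum(distances) / len(distances)


# Toy example: three users, lists of length 2, ten items in total.
recs = {1: [0, 1], 2: [1, 2], 3: [3, 4]}
bought = {1: {0}, 2: {5}, 3: {3, 4}}
p = precision(recs, bought)        # (0.5 + 0.0 + 1.0) / 3 = 0.5
c = coverage(recs, num_items=10)   # 5 distinct items / 10 = 0.5
d = diversity(recs)                # (0.5 + 1.0 + 1.0) / 3
```

Averaging over unordered pairs gives the same value as averaging over ordered pairs, because the distance is symmetric.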
For comparison, we present the results of four recommendation methods: probabilistic spreading (ProbS), heat spreading (HeatS), user-based collaborative filtering (UserCF), and the association rule algorithm (ARule). User-based collaborative filtering is one of the most classic collaborative filtering methods. Based on the similarity of purchased items between users, it recommends the items that similar users have bought but the target user has not yet bought. The association rule method is also widely used in recommender systems. This method concentrates on the latent relationships between items. To find these relationships, every user's item list is analyzed to create a list of the most related items, called an association rule. The heat spreading method, a variant of the probabilistic spreading method, has the highest coverage and diversity among current recommendation algorithms, but it ignores accuracy. In the experiment, we use the precision of HeatS as the lower bound of precision and its coverage and diversity as the upper bounds. Therefore, we use the enhanced metrics to evaluate the performance, as shown in (10a)–(10c). Heat spreading and probabilistic spreading are integrated methods which do well on precision but are not so good on coverage and diversity.
To evaluate the performance of the approximate model, the missing rate and the comprehensive rate Comp are defined, as shown in (11). The definition involves the following quantities: the missing rate of each performance metric (precision, coverage, and diversity); the performance of the normal model and the corresponding performance of the approximate model; the percentage of approximating data in the entire data set; a controlling parameter that accommodates fluctuation; and the weights of the three metrics (precision, coverage, and diversity) in the comprehensive rate.
3.3. Results
The recommendation performances of the different methods on the MovieLens, Netflix, and Last.fm data sets are shown in Figures 2, 3, and 4, respectively. The parameters of our method affect the result; here we use empirically chosen constant values for them.
Our method performs best on the MovieLens data set, which has the most links in the experiment, with ProbS running a close second, while both UserCF and ARule perform significantly worse. When the recommendation list is shorter than 20, the performance of our method is at least twice as good as that of the latter two methods. On the Netflix data set, our method consistently performs very well in terms of precision, coverage, and diversity. The precision of ProbS comes close to that of our method, while its coverage and diversity are much worse. In addition, our method achieves good comprehensive performance on Last.fm, which is the sparsest data set in the experiment. When the recommendation list length is over 50, the precision of our method is lower than that of ARule, and its coverage runs a close second. Furthermore, the consistency of its diversity is the best among the compared methods. The performances of the approximate model are shown in Figures 5 and 6.
The metrics are plotted in separate panels to show their tendencies, as shown in Figure 5. For precision, as the size of the approximating data increases, the missing rate declines more and more slowly and eventually levels off. The missing rate stays between 0 and 20%. The tendency of coverage is different from that of precision: the missing rate increases at first, and the turning point occurs when 5% to 10% of the data is used. In contrast, the missing rate of diversity is very high at the head of the curve. Its lowest point is also located in the range of 5% to 10%, and the missing rate levels off at the tail.
Figure 6 shows the comprehensive performance of the approximate model. Different comprehensive curves are obtained depending on the parameters, which we kept fixed in the experiment. There are some differences between the comprehensive curves of the two data sets. The comprehensive curve of the MovieLens data set is flat, with somewhat lower values at the head of the curve and slight oscillation at the tail. In contrast, the comprehensive curve of the Netflix data set is shaped like a hook. Considering the abovementioned factors, the optimal value occurs near 10% of the data.
4. Conclusions
In this paper, we proposed a method that makes use of users' behaviors to give recommendations. Instead of modeling with tags or contexts, our method uses the users' collecting lists to construct a recommendation model without the contents of items. As shown in the experiment, our method exhibits an all-round competitive performance on precision, coverage, and diversity in comparison with four typical classes of recommendation algorithms. To reduce the computational complexity of our method, an approximate model is also proposed in this paper, in which the adjusting parameters determine the performance curve of the approximate model. As shown in the experiment, the approximate method is feasible since the optimal size of the approximating data is under 20% of the data set. When precision is considered the most important metric, it is more appropriate to use less than 10% of the data to construct the approximate model.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (nos. 61300018 and 61103109), Research Fund for the Doctoral Program of Higher Education of China (no. 20120185120017), China Postdoctoral Science Foundation (nos. 2013M531951 and 2014T70860), Fundamental Research Funds for the Central Universities (no. ZYGX2012J071), and Special Project of Sichuan Youth Science and Technology Innovation Research Team (no. 2013TD0006).
References
[1] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen, “Collaborative filtering recommender systems,” Lecture Notes in Computer Science, vol. 4321, pp. 291–324, 2007.
[2] B. Sarwar, G. Karypis, and J. Konstan, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the 10th International Conference on World Wide Web (WWW '01), 2001.
[3] M. J. Pazzani and D. Billsus, “Content-based recommendation systems,” in The Adaptive Web, vol. 4321 of Lecture Notes in Computer Science, pp. 325–341, Springer, 2007.
[4] K. Goldberg, T. Roeder, D. Gupta, and C. Perkins, “Eigentaste: a constant time collaborative filtering algorithm,” Information Retrieval, vol. 4, no. 2, pp. 133–151, 2001.
[5] S. Maslov and Y.-C. Zhang, “Extracting hidden information from knowledge networks,” Physical Review Letters, vol. 87, no. 24, Article ID 248701, 2001.
[6] P. Laureti, L. Moret, Y. Zhang, and Y. Yu, “Information filtering via iterative refinement,” Europhysics Letters, vol. 75, no. 6, pp. 1006–1012, 2006.
[7] J. Ren, T. Zhou, and Y.-C. Zhang, “Information filtering via self-consistent refinement,” Europhysics Letters, vol. 82, Article ID 58007, 2008.
[8] R. Agrawal, T. Imielinski, and A. Swami, “Mining association rules between sets of items in large databases,” in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93), vol. 22, no. 2, pp. 207–216.
[9] W. Lin, S. A. Alvarez, and C. Ruiz, “Efficient adaptive-support association rule mining for recommender systems,” Data Mining and Knowledge Discovery, vol. 6, no. 1, pp. 83–105, 2002.
[10] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. 4-5, pp. 993–1022, 2003.
[11] T. L. Griffiths and M. Steyvers, “Finding scientific topics,” Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 1, 2004.
[12] D. M. Blei and J. D. Lafferty, “Correlated topic models,” in Advances in Neural Information Processing Systems, vol. 18, 2005.
[13] F.-F. Li and P. Perona, “A Bayesian hierarchical model for learning natural scene categories,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 2, pp. 524–531, June 2005.
[14] X. Wei and W. B. Croft, “LDA-based document models for ad-hoc retrieval,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06), pp. 178–185, August 2006.
[15] Z. Wang and X. Qian, “Text categorization based on LDA and SVM,” in Proceedings of the International Conference on Computer Science and Software Engineering (CSSE '08), vol. 1, pp. 674–677, December 2008.
[16] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, “Clustering the tagged web,” in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM '09), pp. 54–63, New York, NY, USA, February 2009.
[17] K. Yu, B. Zhang, H. Zhu, H. Cao, and J. Tian, “Towards personalized context-aware recommendation by mining context logs through topic models,” Lecture Notes in Computer Science, vol. 7301, no. 1, pp. 431–443, 2012.
[18] R. Krestel, P. Fankhauser, and W. Nejdl, “Latent Dirichlet allocation for tag recommendation,” in Proceedings of the 3rd ACM Conference on Recommender Systems, pp. 61–68, October 2009.
[19] Y. Song, L. Zhang, and C. L. Giles, “Automatic tag recommendation algorithms for social recommender systems,” ACM Transactions on the Web, vol. 5, article 4, no. 1, 2011.
[20] X. Si and M. Sun, “Tag-LDA for scalable real-time tag recommendation,” Journal of Computational Information Systems, vol. 6, no. 1, pp. 23–31, 2009.
[21] J.-H. Kang, K. Lerman, and L. Getoor, “LA-LDA: a limited attention topic model for social recommendation,” in Social Computing, Behavioral-Cultural Modeling and Prediction, vol. 7812 of Lecture Notes in Computer Science, pp. 211–220, Springer, Berlin, Germany, 2013.
[22] T. Hofmann, “Unsupervised learning by probabilistic Latent Semantic Analysis,” Machine Learning, vol. 42, no. 1-2, pp. 177–196, 2001.
Copyright
Copyright © 2014 Wen-Bo Xie et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.