Research Article  Open Access
Normalizing Item-Based Collaborative Filter Using Context-Aware Scaled Baseline Predictor
Abstract
Item-based collaborative filter algorithms play an important role in modern commercial recommendation systems (RSs). To improve recommendation performance, normalization is widely used as a basic component of the predictor models. Among the many normalization methods, subtracting the baseline predictor (BLP) is the most popular. However, the BLP uses a statistical constant that does not consider the context. We found that slightly scaling the different components of the BLP separately can dramatically improve the performance. This paper proposes several normalization methods based on baseline predictors scaled according to different context information. The experimental results show that using a context-aware scaled baseline predictor for normalization indeed achieves better recommendation performance in terms of RMSE, MAE, precision, recall, and nDCG.
1. Introduction
The abundance of information available on the Internet makes it increasingly difficult for people to find what they want, especially in the Electronic Commerce domain. As a consequence, building personalized information selection models is becoming crucial. Among the many different information selection technologies, recommendation systems have developed greatly, driven by their adoption at most of the famous online shopping companies [1, 2].
The algorithms for recommending items have been studied extensively, and most belong to two main categories. Content-based recommendation systems try to recommend items according to the users' past preferences [3–5], whereas collaborative recommendation systems make recommendations in terms of the preferences of similar neighbors [6–9]. Recommendation systems based purely on content generally suffer from the problems of limited content analysis and overspecialization. Defining appropriate item features is very difficult in many situations, and these features depend heavily on the users' history, so such systems cannot find latent profiles for recommendation.
Collaborative filter (CF) approaches overcome some of the limitations of content-based ones. Items for which the content is not available or difficult to obtain can still be recommended to users through the feedback of other users. CF methods can also recommend items with very different content, as long as other users have already shown interest in these items. Among collaborative recommendation approaches, methods based on nearest neighbors still enjoy a huge amount of popularity, due to their simplicity, their efficiency, and their ability to produce accurate and personalized recommendations [10–12]. CF models try to capture the interactions between users and items that produce the different rating values. However, many of the observed rating values are due to effects associated with either users or items, independently of their interaction. A principal example is that typical CF data exhibit large user and item biases, that is, systematic tendencies for some users to give higher ratings than others and for some items to receive higher ratings than others.
The item-based collaborative filter [13, 14] is much more accurate than the user-based one [15, 16] when the number of items is larger than the number of users. Electronic commerce businesses always carry huge numbers of products, far exceeding the number of users. However, the average number of common ratings is very small, because most users are interested in only very few items. User-based collaborative filter systems easily suffer from overfitting problems in this situation, so item-based collaborative filter algorithms play an important role in modern commercial recommendation systems (RSs). This paper intends to improve the recommendation performance using a novel rating normalization strategy.
When it comes to assigning a rating to an item, each user has their own personal scale. Even if an explicit definition of each of the possible ratings is supplied, some users might be reluctant to give high/low scores to items they liked/disliked. Different rating normalization schemes have been designed for different reasons [17–19]. Also, many of the observed rating values are due to effects associated with either users or items, independently of their interaction. We not only convert individual ratings to a more universal scale but also consider the user and item biases.
The baseline predictor (BLP), which combines the overall average rating with the user or item biases, involves these factors in the normalization. But, for item-based collaborative filter systems, the BLP is always a statistical constant that cannot adapt to the context [20–23]. We found that the recommendation performance can be improved if we slightly scale the different parts of the BLP within a limited range. In this paper, we provide some novel context-aware scaled baseline predictors (CASBLP) for item-based collaborative filter normalization, considering different context information. The experimental results show that CASBLP can significantly improve the prediction performance in terms of Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), precision, recall, and Normalized Discounted Cumulative Gain (nDCG).
The rest of this paper is organized as follows. We present the details of CASBLP in Section 2 and show experimental results in Section 3. Finally, we conclude the paper in Section 4.
2. Description of Models
2.1. Baseline Predictor for Item-Based CF Normalization
A general neighborhood-based collaborative filter recommendation using BLP normalization is defined as follows:
$$\hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in N_u^k(i)} w_{ij}\,(r_{uj} - b_{uj})}{\sum_{j \in N_u^k(i)} \lvert w_{ij} \rvert}.$$
$\hat{r}_{ui}$ is the rating predictor based on the nearest neighbors, and $b_{ui}$ is the baseline predictor, which is always defined as
$$b_{ui} = \mu + b_u + b_i.$$
Denote by $\mu$ the average of all ratings. The parameters $b_i$ and $b_u$ indicate the observed deviations of item $i$ and user $u$, respectively, from the average.
For item-based CF, we do not use the user bias, because similar items are used as the neighbors. So the BLP in item-based CF is
$$b_{ui} = \mu + b_i.$$
The neighborhood $N_u^k(i)$ is replaced by the following formula:
$$N_u^k(i) = N^k(i) \cap I_u,$$
where $N^k(i)$ is the set of the $k$ most similar items to the item $i$, and $I_u$ is the set of items the user $u$ has rated.
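As an illustration only (not the authors' implementation), the item-based prediction normalized by the baseline $\mu + b_i$ can be sketched as follows; all function and variable names here are hypothetical, and similarities are assumed to be precomputed:

```python
from collections import defaultdict

def item_baselines(ratings):
    """Compute the global mean mu and per-item deviations b_i
    from a list of (user, item, rating) triples."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    per_item = defaultdict(list)
    for _, i, r in ratings:
        per_item[i].append(r)
    b = {i: sum(rs) / len(rs) - mu for i, rs in per_item.items()}
    return mu, b

def predict(u, i, mu, b, sims, user_ratings):
    """Item-based prediction normalized by the baseline mu + b_i.
    sims[(i, j)] is a similarity weight; user_ratings[u] maps item -> rating.
    Items the user rated act as the candidate neighborhood."""
    num = den = 0.0
    for j, r in user_ratings[u].items():
        w = sims.get((i, j), 0.0)
        num += w * (r - (mu + b.get(j, 0.0)))   # deviation from baseline
        den += abs(w)
    base = mu + b.get(i, 0.0)
    return base if den == 0 else base + num / den
```

With no informative neighbors, the prediction falls back to the baseline itself, which is exactly the role normalization plays in the formula above.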
There are many different similarity weight functions. In this paper, we use two popular ones, Cosine and Pearson's Correlation, which are defined, respectively, as
$$w_{ij} = \frac{\sum_{u \in U_{ij}} r_{ui}\, r_{uj}}{\sqrt{\sum_{u \in U_{ij}} r_{ui}^2}\,\sqrt{\sum_{u \in U_{ij}} r_{uj}^2}}, \qquad
w_{ij} = \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)^2}},$$
where $U_{ij}$ is the set of users who rated both items $i$ and $j$, and $\bar{r}_i$ is the mean rating of item $i$ over these users.
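For reference, the two similarity weights can be sketched over the users who co-rated a pair of items. This is a simplified illustration, assuming ratings are stored as dictionaries mapping users to ratings:

```python
from math import sqrt

def cosine_sim(ri, rj):
    """Cosine similarity over users who rated both items;
    ri and rj map user -> rating for items i and j."""
    common = set(ri) & set(rj)
    if not common:
        return 0.0
    num = sum(ri[u] * rj[u] for u in common)
    den = (sqrt(sum(ri[u] ** 2 for u in common))
           * sqrt(sum(rj[u] ** 2 for u in common)))
    return num / den if den else 0.0

def pearson_sim(ri, rj):
    """Pearson correlation over co-rating users, centring each
    item's ratings on its mean over the common users."""
    common = set(ri) & set(rj)
    if len(common) < 2:
        return 0.0
    mi = sum(ri[u] for u in common) / len(common)
    mj = sum(rj[u] for u in common) / len(common)
    num = sum((ri[u] - mi) * (rj[u] - mj) for u in common)
    den = (sqrt(sum((ri[u] - mi) ** 2 for u in common))
           * sqrt(sum((rj[u] - mj) ** 2 for u in common)))
    return num / den if den else 0.0
```

Pearson centring removes each item's own rating level, while plain Cosine keeps it; this difference matters later when interpreting the experimental comparison of the two weights.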
2.2. Motivation of Scaling Baseline Predictor
The baseline predictor can introduce some information that is independent of the neighborhood influence, but it is always set as a constant. However, we found that slightly scaling the baseline predictor yields a better prediction accuracy. Using a single scaling factor for $\mu$, $b_u$, and $b_i$, however, is not a good idea. Figure 1 shows an example on a small MovieLens dataset in which scaling the BLP decreases the RMSE.

From Figure 1, the best scaling factor is 0.6, at which we get the lowest RMSE. However, from another perspective, such as the Top-$N$ measures, using the same scaling factor is not a good choice. Figure 2 shows that scaling the BLP could not improve the precision and recall.

For recommendation systems, the Top-$N$ measures are more important than the RMSE. To improve both the RMSE and the Top-$N$ measures, we should not use the same scaling factor for the three parameters in the BLP:
$$b_{ui} = \alpha\,\mu + \beta\,b_u + \gamma\,b_i.$$
Determining these three parameters is very difficult; unlike matrix factorization models, neighborhood-based CF models cannot train the unknown parameters. In this paper, we provide several context-aware scaling factors. Before describing the details, we first rewrite (6) in another representation. Actually, the baseline predictor can also be described as
$$b_{ui} = \mu + \frac{\sum_{u' \in U_i} (r_{u'i} - \mu)}{\lvert U_i \rvert},$$
where $U_i$ is the set of users rating item $i$. The scaled version of the baseline predictor can then be considered as
$$b_{ui} = \alpha\,\mu + \frac{\sum_{u' \in U_i} (r_{u'i} - \mu)}{\lvert U_i \rvert + \lambda}.$$
Here, we use the denominator to control the scaling of the bias, so the bias term always shrinks. In fact, $\lambda$ is the Bayesian mean damping term [24]: it biases the item means toward the global mean $\mu$. Our task is to determine $\alpha$ and $\lambda$ according to the context information.
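A one-line sketch may make the role of the damping term concrete. The name `lam` below stands for the Bayesian damping term in the denominator; a larger value shrinks the bias toward zero, that is, the prediction toward the global mean:

```python
def damped_item_bias(item_ratings, mu, lam):
    """Item deviation with a damping term lam in the denominator.
    item_ratings is the list of ratings the item received, mu the
    global mean.  lam = 0 gives the plain (unscaled) item bias."""
    n = len(item_ratings)
    return sum(r - mu for r in item_ratings) / (n + lam)
```

For an item with few ratings, `n + lam` is dominated by `lam`, so the bias is strongly pulled toward zero; for a heavily rated item the damping has little effect, which is exactly the Bayesian-mean behavior described above.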
Recommendation is a rather special machine learning problem: the user-item matrix is always very sparse. When data is sparse, we need other sources of knowledge to help the learning algorithm. Mining context information is one way of adding such knowledge to recommendation system algorithms.
2.3. Context-Aware Scaled Baseline Predictors
We consider several context situations to determine the scaled baseline predictors: the ratings distribution, the categories distribution, the timestamp distribution, and the links distribution. First, we denote by $I$ the set of all the items and by $U$ the set of all the users.
The rating distribution aware (RDA) method scales the baseline predictors in terms of the ratings distribution. The values of the ratings are usually discrete. Denote by $V$ the set of possible rating values.

Denote by $T_v$ the set of rating records whose rating value is $v$; each record is a triple $(u, i, r)$, where $u$ is the user, $i$ is the item, and $r$ is the rating of $i$ by $u$. Similarly, $U_v$ denotes the set of users whose ratings include the value $v$, and $I_v$ denotes the set of items rated with the value $v$. We sort all the sets $T_v$ in descending order of size, and denote by $T$ the set of all rating records. The scaling factors of RDA are then evaluated from the sizes of the $N$ largest sets. If the sizes of some sets are equal and the number of candidates is larger than $N$, we randomly select among the sets of the same size.
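Since the scaling expressions are given above only in outline, the bookkeeping they rely on (grouping rating records by value and keeping the largest groups) can be sketched as follows; ties are broken arbitrarily here rather than randomly, and the function name is hypothetical:

```python
from collections import defaultdict

def rating_value_groups(records, n):
    """Group (user, item, rating) records by rating value and return
    the n largest groups as (value, records) pairs, in descending
    order of group size."""
    groups = defaultdict(list)
    for u, i, r in records:
        groups[r].append((u, i))
    ordered = sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)
    return ordered[:n]
```

The sizes of these groups (and of the corresponding user and item sets) are the raw quantities from which the RDA scaling factors are computed.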
Like RDA, the category distribution aware (CDA) method scales the baseline predictors in terms of the category distribution. The items in a recommendation system always have some labels indicating special attributes; in MovieLens, the movies are labeled with genres. Each label corresponds to a category, and each item may belong to more than one category.

Suppose we have $m$ categories, and denote by $C = \{c_1, \dots, c_m\}$ the set of these categories. Denote by $I_c$ the set of items belonging to category $c$. These sets are again ordered by descending size, and the CDA scaling factors are computed from the sizes of the $N$ largest category sets. Note that the numerator and the denominator are chosen differently from RDA, because the items always belong to multiple categories.
There is always a timestamp record for each rating. The timestamp distribution aware (TDA) method scales the baseline predictor in terms of the timestamp distribution. Suppose each rating record is a 4-tuple $(u, i, r, t)$, where the meanings of $u$, $i$, and $r$ are the same as before, and $t$ is the timestamp at which $u$ rated $i$ with the score $r$. The format of $t$ is usually a Unix timestamp; we reduce $t$ to the "yymmdd" format, so that the base unit of time is the day.

Let $T_d$ be the set of rating records whose reduced timestamp is $d$. As in the previous two methods, we create a set of these sets ordered by descending size.

We select the first $N$ elements of this ordered set to compose a truncated set, and we collect the set of distinct users of the rating records belonging to it. The scaling factors of TDA are computed from the sizes of these sets.
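The timestamp reduction used by TDA can be sketched directly; UTC is assumed here for the day boundary, which the text does not specify:

```python
from datetime import datetime, timezone

def reduce_timestamp(unix_ts):
    """Reduce a Unix timestamp to 'yymmdd', so that the base time
    unit becomes the day, as TDA requires.  UTC is assumed."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%y%m%d")
```

Rating records sharing the same reduced string fall into the same set $T_d$, whose sizes drive the TDA scaling factors.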
The links distribution aware (LDA) method scales the baseline predictor in terms of the links distribution. The links represent the relationships between users and items, which make up a rating network. No pair of users is linked, and no pair of items is linked, so the network is bipartite. Equation (13) and Figure 3 show an example of a rating network.

Only when the rating between a user and an item is larger than or equal to a given threshold do we connect them. We then compute the degree of each user and each item, and we create two ordered sets, one of users and one of items, sorted by descending degree, together with the corresponding ordered sequences of degree values.
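The degree bookkeeping of the rating network can be sketched as follows; the `threshold` parameter follows the connection rule described above, and the function name is an assumption:

```python
from collections import Counter

def rating_network_degrees(records, threshold):
    """Build the bipartite rating network: an edge (user, item)
    exists only when the rating is >= threshold.  Returns the user
    degrees and the item degrees as Counters."""
    du, di = Counter(), Counter()
    for u, i, r in records:
        if r >= threshold:
            du[u] += 1
            di[i] += 1
    return du, di
```

Sorting the two Counters by value gives the descending degree sequences from which the LDAU and LDAI scaling factors are computed.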
For LDA, we have two ways of evaluating the scaling factors. When considering the degrees of the users, the method is called LDAU; when considering the degrees of the items, it is called LDAI. Unlike the other methods, LDA controls the two scaling factors using different distributions: one factor is computed from the top degrees alone, while the other combines the top degrees with the average degree.
3. Experiments
3.1. Experimental Settings
We use the MovieLens latest dataset in our experiments, which includes 100,000 ratings and 6,100 tag applications applied to 10,000 movies by 700 users [25]. The dataset has four files: links, movies, ratings, and tags; we use these files to obtain the different context information. We compare several different methods in our experiments, the names and meanings of which are shown in Table 1.

There are two similarity weight functions in our experiments: Cosine and Pearson's Correlation. The neighborhood sizes of the item-based models are all set to 20, while they are 100 for the user-based models. The number of largest sets used by the CASBLP methods is the same for all of them, 6 by default; the values of the damping-related coefficient are also the same, 20 by default. We randomly split the dataset into 5 parts and use cross-validation to train and test the models.
For the Top-$N$ metrics (e.g., precision and recall), we randomly select 100 items in the test phase as candidates, excluding the ones appearing in the training set. Only the items rated 3.5 or above (including 3.5) are recommended.
The neighborhood-based collaborative filter models always incur a high memory cost, so we run the different algorithms on a machine with 16 GB of RAM.
3.2. Experimental Metrics
Five metrics are used in our experiments: precision, recall, RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), and nDCG (Normalized Discounted Cumulative Gain).
For a test dataset, denote by TP the set of recommended items in which the users are really interested, by FP the set of recommended items in which the users are not interested, by FN the set of not-recommended items in which the users are interested, and by TN the set of not-recommended items in which the users are not interested. The metrics of precision and recall are defined, respectively, as follows:
$$\text{precision} = \frac{\lvert \mathrm{TP} \rvert}{\lvert \mathrm{TP} \rvert + \lvert \mathrm{FP} \rvert}, \qquad \text{recall} = \frac{\lvert \mathrm{TP} \rvert}{\lvert \mathrm{TP} \rvert + \lvert \mathrm{FN} \rvert}.$$
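Working with sets of recommended and relevant items, the two metrics can be computed directly (a minimal sketch; the set-based representation is an assumption):

```python
def precision_recall(recommended, relevant):
    """precision = |TP| / (|TP| + |FP|), recall = |TP| / (|TP| + |FN|),
    where TP = recommended & relevant, FP = recommended - relevant,
    and FN = relevant - recommended."""
    tp = len(recommended & relevant)
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```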
The recommendation system generates predicted ratings $\hat{r}_{ui}$ for a test set $\mathcal{T}$ of user-item pairs $(u, i)$ for which the true ratings $r_{ui}$ are known. The RMSE and MAE between the predicted and actual ratings are given by
$$\mathrm{RMSE} = \sqrt{\frac{1}{\lvert \mathcal{T} \rvert} \sum_{(u,i) \in \mathcal{T}} (\hat{r}_{ui} - r_{ui})^2}, \qquad \mathrm{MAE} = \frac{1}{\lvert \mathcal{T} \rvert} \sum_{(u,i) \in \mathcal{T}} \lvert \hat{r}_{ui} - r_{ui} \rvert.$$
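Both error metrics can be evaluated in a few lines over (predicted, actual) rating pairs (a minimal sketch):

```python
from math import sqrt

def rmse_mae(pairs):
    """pairs is a list of (predicted, actual) ratings.
    Returns (RMSE, MAE) over the whole test set."""
    n = len(pairs)
    rmse = sqrt(sum((p - a) ** 2 for p, a in pairs) / n)
    mae = sum(abs(p - a) for p, a in pairs) / n
    return rmse, mae
```

RMSE squares the errors before averaging, so it penalizes large mistakes more heavily than MAE does.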
Recommendation systems always present to the user a list of recommendations, imposing a certain natural browsing order. In many cases, we are not interested in predicting an explicit rating or selecting a set of recommended items, as in the previous sections; rather, we are interested in ordering items according to the user's preferences. nDCG is a measure from information retrieval, where positions are discounted logarithmically. Assuming that each user $u$ has a "gain" $g_{ui}$ from being recommended an item $i$, the average Discounted Cumulative Gain (DCG) for a list of $J$ items is defined as
$$\mathrm{DCG} = \frac{1}{\lvert U \rvert} \sum_{u \in U} \sum_{j=1}^{J} \frac{g_{u i_j}}{\max(1, \log_b j)},$$
where the logarithm base $b$ is a free parameter, typically between 2 and 10. A logarithm with base 2 is commonly used to ensure that all positions are discounted. nDCG is just the normalized version of DCG:
$$\mathrm{nDCG} = \frac{\mathrm{DCG}}{\mathrm{DCG}^*},$$
where $\mathrm{DCG}^*$ is the ideal DCG. The value of nDCG ranges from 0 to 1, and the larger the value is, the better the performance is.
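A minimal single-list sketch of nDCG, using the common base-2 discount $\log_2(j+1)$ at rank $j$ (one of several discounting conventions; the averaging over users is omitted here):

```python
from math import log2

def ndcg(gains):
    """gains[k] is the gain of the item at 0-based rank k.  Positions
    are discounted by log2(rank + 1), and the DCG is normalized by
    the DCG of the ideal (descending-gain) ordering."""
    dcg = sum(g / log2(k + 2) for k, g in enumerate(gains))
    idcg = sum(g / log2(k + 2) for k, g in enumerate(sorted(gains, reverse=True)))
    return dcg / idcg if idcg else 0.0
```

A list already sorted by descending gain scores exactly 1; any deviation from that ideal order lowers the score.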
3.3. Experimental Results
We slightly change the format of the MovieLens dataset and import it into a MySQL database. The coefficients of the BLP can then be conveniently calculated using some SQL queries. All of the coefficients of the CASBLP methods are shown in Table 2.

The experimental comparison results are shown in Table 3 (using Cosine similarity) and Table 4 (using Pearson's Correlation). It seems that using Cosine is better than using Pearson's Correlation in our experiments. Perhaps this is because, even though each user has a different personal rating scale, the rating matrix is so sparse that sparsity, rather than scale differences, becomes the major issue. When data is sparse, Cosine is always a good choice.


From Table 3, we can see that when no normalization scheme is used (NoBP), all of the metrics are much worse than the others. The unscaled BLP is clearly better than NoBP: the precision increases by about 7%, the recall increases by more than 10%, and the RMSE decreases by 4%. It is surprising that, by only using the simple unscaled BLP, the MAE improves by 15% and the nDCG by more than 20%. Because the recommendation order has great commercial significance, normalization is an important improvement in a recommendation system. Our context-aware scaled BLP normalization schemes make further improvements, mainly on the precision and recall metrics. From both Tables 3 and 4, CASBLP normalization has almost the same RMSE, MAE, and nDCG as USBP, sometimes even a little worse. But, for a commercial recommendation system, what the users care about is whether the RS recommends what they really need; product sales would benefit from even a 1% improvement in precision or recall. The precision of our CASBLP schemes increases by about 5%, and the recall increases by about 8%, which is a great improvement from the commercial perspective.
An important question is whether the coefficients we used are optimal. So we vary the mean-scaling coefficient from 0 to 1 and the damping term from 0 to 200 to observe the changes in performance. Figures 4–6 show the impact of the scaling factors on RMSE, precision, and recall, respectively.

For all three metrics, the optimum of the damping term is near 20, at which the RMSE is the lowest and the precision and recall are the highest. What is interesting is that any shrinking of the mean-scaling coefficient can improve precision and recall, even if we set it to zero. However, shrinking it causes a slightly higher RMSE, except at values near 0.8.

This means that the global mean term controls the accuracy of the rating prediction, but once it has shrunk, the item bias plays the crucial role in item recommendation. The cause of this phenomenon may be that the mean rating is computed over all the users and thus carries global information, while the biases are computed over only a few similar neighbors and thus carry local information. For personalized recommendation systems, the local information is much more important, and a plain average prediction has little meaning. That is why, even if we set the mean coefficient to 0 and use only the item biases, we can still get a passable prediction performance.
The neighbor size is an important factor in neighborhood-based recommendation systems, whether item-based or user-based. We increase the neighbor size geometrically from 5 to 320. Figures 7, 8, and 9 show the changes in recommendation performance, including precision, recall, and RMSE, respectively.
What we can see from Figure 9 is consistent with what we have concluded from Tables 3 and 4. Whether using the scaled BLP or the unscaled BLP, we get similar RMSE values, all of which are much lower than those of the NoBP scheme. As the neighbor size grows, all the RMSE values trend toward stability.
What surprised us are the results for precision and recall. Both metrics increase with the neighbor size until reaching stable values, except for the NoBP scheme, whose precision and recall decrease to their stable values. This may be because, without normalization, the prediction lacks personalization and leaves too many decoys to choose from.
Figures 7 and 8 also show results consistent with Tables 3 and 4. Just by slightly changing the coefficients of the BLP, we get higher precision and recall than the unscaled BLP scheme and NoBP, especially when using larger neighbor sizes.
4. Conclusions
Rating normalization is an important step when designing collaborative filter recommendation systems, especially for the item-based ones, which play a key role in the domain of online commerce. Using the baseline predictor for normalization considers both global information and local information. Although we found that balancing them can improve the recommendation performance, there is no clear way of determining the weights of these two sources of information. In this paper, we proposed some context-aware scaled BLP schemes, which compute the weights of the mean ratings and the biases, respectively, in terms of different context information. What we concluded from the experiments not only verified the advantage of the scaled BLP but also pointed out the different roles of each part of the BLP. This paper only studied the BLP normalization of item-based collaborative filter systems on a single MovieLens dataset. The user-based and matrix factorization models are actually much different from the item-based ones; we will explore them in future work using different and larger recommendation datasets.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (no. 61602399 and no. 61502410) and Shandong Provincial Natural Science Foundation, China (no. ZR2016FB22).
References
[1] T.-Y. Ku, H.-S. Won, and H. Choi, "Service recommendation system for big data analysis," in Proceedings of the 30th International Conference on Information Networking (ICOIN '16), pp. 317–320, January 2016.
[2] D. Zhang, T. He, Y. Liu, S. Lin, and J. A. Stankovic, "A carpooling recommendation system for taxicab services," IEEE Transactions on Emerging Topics in Computing, vol. 2, no. 3, pp. 254–266, 2014.
[3] L. Yao, Q. Z. Sheng, A. H. H. Ngu, J. Yu, and A. Segev, "Unified collaborative and content-based web service recommendation," IEEE Transactions on Services Computing, vol. 8, no. 3, pp. 453–466, 2015.
[4] Z.-S. Chen, J.-S. R. Jang, and C.-H. Lee, "A kernel framework for content-based artist recommendation system in music," IEEE Transactions on Multimedia, vol. 13, no. 6, pp. 1371–1380, 2011.
[5] W. Paireekreng, "Mobile content recommendation system for revisiting user using content-based filtering and client-side user profile," in Proceedings of the 12th International Conference on Machine Learning and Cybernetics (ICMLC '13), vol. 4, pp. 1655–1660, July 2013.
[6] X. Du, L. Huang, and Y. Du, "Improve the collaborative filtering recommender system performance by trust network construction," Chinese Journal of Electronics, vol. 25, no. 3, pp. 418–423, 2016.
[7] L. Wang, X. Meng, and Y. Zhang, "Applying HOSVD to alleviate the sparsity problem in context-aware recommender systems," Chinese Journal of Electronics, vol. 22, no. 4, pp. 773–778, 2013.
[8] Y. Cai, H.-F. Leung, Q. Li, H. Min, J. Tang, and J. Li, "Typicality-based collaborative filtering recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 3, pp. 766–779, 2014.
[9] Y. Hu, Q. Peng, X. Hu, and R. Yang, "Time aware and data sparsity tolerant web service recommendation based on improved collaborative filtering," IEEE Transactions on Services Computing, vol. 8, no. 5, pp. 782–794, 2015.
[10] J. Wu, L. Chen, Y. Feng, Z. Zheng, M. C. Zhou, and Z. Wu, "Predicting quality of service for selection by neighborhood-based collaborative filtering," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 43, no. 2, pp. 428–439, 2013.
[11] X. Ma, C. Wang, Q. Yu, X. Li, and X. Zhou, "An FPGA-based accelerator for neighborhood-based collaborative filtering recommendation algorithms," in Proceedings of the IEEE International Conference on Cluster Computing (CLUSTER '15), pp. 494–495, IEEE, September 2015.
[12] Y. El Madani El Alami, E. H. Nfaoui, and O. El Beqqali, "Toward an effective hybrid collaborative filtering: a new approach based on matrix factorization and heuristic-based neighborhood," in Proceedings of the 1st International Conference on Intelligent Systems and Computer Vision (ISCV '15), pp. 1–8, March 2015.
[13] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, 2003.
[14] J. Zou and F. Fekri, "A belief propagation approach to privacy-preserving item-based collaborative filtering," IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 7, pp. 1306–1318, 2015.
[15] N. Chang and T. Terano, "Improving the performance of user-based collaborative filtering by mining latent attributes of neighborhood," in Proceedings of the International Conference on Mathematics and Computers in Sciences and in Industry (MCSI '14), pp. 272–276, September 2014.
[16] Z. Jia, Y. Yang, W. Gao, and X. Chen, "User-based collaborative filtering for tourist attraction recommendations," in Proceedings of the IEEE International Conference on Computational Intelligence and Communication Technology (CICT '15), pp. 22–25, February 2015.
[17] M. R. N. Ranjbar, M. G. Tadesse, Y. Wang, and H. W. Ressom, "Bayesian normalization model for label-free quantitative analysis by LC-MS," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 4, pp. 917–927, 2015.
[18] J. Liu and G. Deng, "A new-user cold-starting recommendation algorithm based on normalization of preference," in Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '08), pp. 1–4, October 2008.
[19] R. Jin, L. Si, and C. Zhai, "Preference-based graphic models for collaborative filtering," in Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI '03), pp. 329–333, San Francisco, Calif, USA, 2003.
[20] R. Jin, L. Si, C. Zhai, and J. Callan, "Collaborative filtering with decoupled models for preferences and ratings," in Proceedings of the 12th International Conference on Information and Knowledge Management (CIKM '03), pp. 309–316, November 2003.
[21] J. Li, P. Feng, and J. Lv, "ICAMF: improved context-aware matrix factorization for collaborative filtering," in Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '13), pp. 63–70, November 2013.
[22] M. Tang, Z. Zheng, G. Kang, J. Liu, Y. Yang, and T. Zhang, "Collaborative web service quality prediction via exploiting matrix factorization and network map," IEEE Transactions on Network and Service Management, vol. 13, no. 1, pp. 126–137, 2016.
[23] X. Luo, M. Zhou, H. Leung et al., "An incremental-and-static-combined scheme for matrix-factorization-based collaborative filtering," IEEE Transactions on Automation Science and Engineering, vol. 13, no. 1, pp. 333–343, 2016.
[24] M. D. Ekstrand, J. T. Riedl, and J. A. Konstan, "Collaborative filtering recommender systems," Foundations and Trends in Human-Computer Interaction, vol. 4, no. 2, pp. 81–173, 2010.
[25] GroupLens, MovieLens Latest Datasets, January 2016, http://grouplens.org/datasets/movielens/.
Copyright
Copyright © 2017 Wenming Ma et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.