Abstract
Aggregated recommendation refers to the process of suggesting one kind of items to a group of users. Compared to user-oriented or item-oriented approaches, it is more general and, therefore, more appropriate for cold-start recommendation. In this paper, we propose a random forest approach to create aggregated recommender systems. The approach is used to predict the rating of a group of users to a kind of items. In the preprocessing stage, we merge user, item, and rating information to construct an aggregated decision table, where rating information serves as the decision attribute. We also model the data conversion process corresponding to the new user, new item, and double cold-start problems. In the training stage, a forest is built for the aggregated training set, where each leaf is assigned a distribution of discrete ratings. In the testing stage, we present four predicting approaches to compute evaluation values based on the distribution of each tree. Experimental results on the well-known MovieLens dataset show that the aggregated approach maintains an acceptable level of accuracy.
1. Introduction
Recommender systems (RSs) [1–3] have been extensively studied to present items, such as movies, music, and books. They collect information on the preferences of their users for a set of items. The information is used to fulfill two main user tasks; one is predicting the rating [4], and the other is finding good items [5].
Model-based RSs apply demographic or content information to construct a model. Some algorithms, such as Bayesian classifiers [6] and decision trees [7], have been used to generate the respective models. Model-based algorithms are suitable for cold-start recommendation for new items, new users, and new communities [2, 8, 9].
Aggregated RS aims at recommending a kind of items to a group of users. A popular approach is to jointly recommend items to user groups [10] (e.g., a group of four friends who wish to choose a movie). Given the specific characteristics of the recommendation to groups, Jameson and Smyth [11] proposed to appropriately establish a consensus for different group semantics that formalize the agreements and disagreements among users.
In this paper, we propose a random forest (RF) approach to create aggregated RS by taking advantage of demographic, content, and rating information. This approach intends to predict the ratings of a group of users to a kind of items and to deal with three cold-start recommendation problems, namely, new item (NI), new user (NU), and double cold-start (DOCS), where new items are recommended to new users. A decision tree [12] is a natural approach to these problems. However, one decision tree only takes advantage of limited information of users and items. Therefore, for many new users and new items, a decision tree may produce no predicting result at all. Our approach uses different information of users and items to construct an RF (a collection of decision trees [13]) to avoid this situation.
Our approach has three stages. In the preprocessing stage, the user, item, and rating tables of the original dataset are merged into an aggregated table. Then we construct training and testing sets through cross-validation. To the best of the authors’ knowledge, little is known regarding the aggregated approach. In the training stage, an RF predictor is built as an ensemble of individual tree predictors. Each decision tree classifier is generated from the training set, with each leaf assigned a distribution of the class attribute. In the testing stage, the demographic and content information of each user-item pair is fed to all decision trees in the RF. Each instance obtains its class distribution information through top-down search. We adopt four predicting approaches, called standard voting, weighted average, distribution aggregation based voting, and distribution aggregation based average, to compute evaluation values of the RF.
The contribution of the paper is fourfold. First, we propose a new aggregated approach to predict the rating of a group of users to a kind of items. Second, we merge demographic, content, and rating information into a new aggregated table and then adopt three kinds of strategies, namely, NI, NU, and DOCS, to split training and testing sets through cross-validation. Third, we build five kinds of aggregated RSs by RF. The first three RFs are NI, NU, and DOCS. The two other RFs are new item with average rating (NIAR) and new user with average rating (NUAR). The DOCS RS can be used to recommend a kind of new items to a group of new users. Fourth, we develop four ensemble approaches to compute the predicted ratings of these aggregated RSs.
Experiments are undertaken with five scenarios corresponding to the five aggregated RSs. The above-mentioned ensemble approaches are employed to find the appropriate setting of the forest size and to compare the performance with respect to the mean absolute error (MAE) [14] in each scenario. MAE is a statistical accuracy metric that measures the deviation between real ratings and predictions generated by the RS. If $\{r_1, \ldots, r_n\}$ are all the real values in the target set, $\{p_1, \ldots, p_n\}$ are the predicted values for the same ratings, and $e_i = p_i - r_i$ is the error on the $i$th rating, then the MAE is
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |e_i| = \frac{1}{n} \sum_{i=1}^{n} |p_i - r_i|.$$
The lower the MAE, the more accurate the approach.
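As a minimal sketch of the metric, the MAE can be computed as below. The function name and the toy ratings are illustrative assumptions, not values from the experiments.

```python
def mean_absolute_error(real, predicted):
    """Average absolute deviation between real and predicted ratings."""
    if len(real) != len(predicted) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for r, p in zip(real, predicted)) / len(real)


# Hypothetical ratings on a five-point scale.
real = [4, 3, 5, 2]
predicted = [3, 3, 4, 4]
print(mean_absolute_error(real, predicted))  # 1.0
```

The absolute errors here are 1, 0, 1, and 2, so the MAE is 4/4 = 1.0.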
Experimental results on the well-known MovieLens dataset show that the forest does not need to be large for the performance in terms of MAE to remain stable, and that the aggregated approach maintains an acceptable level of accuracy.
2. Data Models
In this section, the original datasets are converted to an aggregated decision table. Five kinds of aggregated data models are constructed through cross-validation.
2.1. Information Systems and Decision Systems
In this subsection, we revisit the definitions of information system [15] and decision systems.
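Definition 1. $S = (U, A)$ is an information system, where $U = \{x_1, x_2, \ldots, x_n\}$ is the set of all objects, $A = \{a_1, a_2, \ldots, a_m\}$ is the set of all attributes, and $a_j(x_i)$ is the value of $x_i$ on attribute $a_j$ for $i \in [1..n]$ and $j \in [1..m]$.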
Example 2. An example of information system is given by Table (a) of Figure 1, where and . UID is a key. Another example of information system is given by Table (b) of Figure 1.
The decision system is a fundamental concept in data mining and machine learning and is often defined as follows [16].
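Definition 3. A decision system is the 5-tuple $S = (U, C, D, V, I)$, where $U$ is a finite set of objects called the universe, $C$ is the set of conditional attributes, $D$ is the set of decision attributes, $V = \{V_a \mid a \in C \cup D\}$ is the set of values for each $a \in C \cup D$, and $I = \{I_a \mid a \in C \cup D\}$ is an information function for each $a \in C \cup D$.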
2.2. Rating System
In this subsection, the rating system is defined.
Definition 4. Let $U$ and $V$ be two sets of objects.
Consider
$$R: U \times V \to V_R,$$
where $V_R$ is the domain of rating.
If $V_R$ is boolean, $R$ is a binary relation from $U$ to $V$. If $V_R$ is numeric, $R$ is a rating function from $U$ to $V$. In this paper, we discuss numeric ratings. A rating function is more often stored in the database as a table with two foreign keys, which saves storage. For convenience of illustration, here we represent it with a $|U| \times |V|$ rating matrix.
With Definitions 1 and 4, we propose the following definition.
Definition 5. A rating system is a 5-tuple $S = (U, A, V, B, R)$, where $(U, A)$ and $(V, B)$ are two information systems, and $R$ is a rating function on $U \times V$.
Example 6. An example of rating is given by Table (c) of Figure 1, where is the set of users as indicated by Table (a) of Figure 1 and is the set of items as indicated by Table (b) of Figure 1. Here .
An example of rating system includes Tables (a), (b), and (c) of Figure 1.
2.3. Aggregated Decision Systems
In this subsection, we build decision systems to mine the behavior of users on items. For this purpose, we propose the concept of aggregated decision system as follows.
Definition 7. An aggregated decision system induced by a rating system is where and , for all , .
The number of objects in is . To distinguish this type of decision system from the other types discussed later, we refer to it as the aggregated decision system (ADS) or the first-class decision system (1DS).
In Table (c) of Figure 1 some elements are 0; a zero element means that the user has not rated the movie. We remove them and get a new decision system as follows.
Definition 8. An aggregated decision system with positive rating induced by a rating system is where and , , .
The number of objects in is . We refer to it as the or the second-class decision system (2DS).
Example 9. Table (d) of Figure 1 presents a decision system where , , ,, and . is a key pair. The key pair does not participate in the establishment of RF model; therefore, it is not used in the mining work.
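The merge that produces an aggregated decision table with positive ratings can be sketched as follows. This is an illustration under stated assumptions: the toy users, items, ratings, and attribute names are hypothetical and do not reproduce Figure 1, and the `aggregate` helper is introduced only for this example.

```python
# Hypothetical toy data; attribute names are illustrative only.
users = {
    "u1": {"age": 25, "gender": "M", "occupation": "student"},
    "u2": {"age": 40, "gender": "F", "occupation": "engineer"},
}
items = {
    "m1": {"year": 1995, "genre": "comedy"},
    "m2": {"year": 1999, "genre": "drama"},
}
# Rating function stored as a table keyed by (user, item); 0 means "not rated".
ratings = {("u1", "m1"): 4, ("u1", "m2"): 0, ("u2", "m1"): 3, ("u2", "m2"): 5}


def aggregate(users, items, ratings):
    """Merge user, item, and rating tables, keeping positive ratings only."""
    rows = []
    for (uid, mid), r in sorted(ratings.items()):
        if r > 0:  # drop unrated pairs
            row = {"uid": uid, "mid": mid}  # key pair, not used for mining
            row.update(users[uid])          # conditional attributes of the user
            row.update(items[mid])          # conditional attributes of the item
            row["rating"] = r               # decision attribute
            rows.append(row)
    return rows


table = aggregate(users, items, ratings)
for row in table:
    print(row)
```

Each row carries the key pair, the user and item attributes as conditional attributes, and the rating as the decision attribute; the unrated pair is dropped.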
The attribute of average rating (AR) has been the focus of most empirical studies on product reviews [17]. There are two kinds of AR. One kind is AR of user (UAR), which reflects the rating habit for the user. The other is AR for item (IAR), which reflects the degree of item popularity. With Definition 8, we can define a new type of the aggregated decision system with AR as follows.
Definition 10. An aggregated decision system with or the third-class decision system (3DS) is where and , , .
Example 11. The movies , , and are rated by user in Table (c) of Figure 1. Therefore, the AR of is . Similarly, the of is .
The movie is rated by the users , , and in Table (c) of Figure 1. Therefore, the AR for is . Similarly, the for is .
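The two kinds of average rating can be computed as in the sketch below. The ratings are hypothetical and are not those of Figure 1; the function name is an assumption made for this illustration, and unrated pairs (zeros) are excluded from the averages.

```python
from collections import defaultdict


def average_ratings(ratings):
    """Compute per-user (UAR) and per-item (IAR) average ratings.

    `ratings` maps (user_id, item_id) to a numeric rating; 0 means unrated.
    """
    user_scores, item_scores = defaultdict(list), defaultdict(list)
    for (uid, mid), r in ratings.items():
        if r > 0:  # skip unrated pairs so they do not lower the averages
            user_scores[uid].append(r)
            item_scores[mid].append(r)
    uar = {u: sum(v) / len(v) for u, v in user_scores.items()}
    iar = {m: sum(v) / len(v) for m, v in item_scores.items()}
    return uar, iar


ratings = {("u1", "m1"): 4, ("u1", "m2"): 2, ("u2", "m1"): 3, ("u2", "m2"): 0}
uar, iar = average_ratings(ratings)
print(uar)  # {'u1': 3.0, 'u2': 3.0}
print(iar)  # {'m1': 3.5, 'm2': 2.0}
```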
In some situations we are interested in the aggregated decision system concerning a subset of users and items.
Definition 12. A subset of the aggregated decision system () or the fourth-class decision system (4DS) with respect to and is where , , , and , , .
Definition 13. A subset of the aggregated decision system with AR () or the fifth-class decision system (5DS) with respect to and is where , , , and , , .
In this paper, we discuss the cold-start problem. In Definitions 12 and 13, the demographic or content information of the subsets is not independent. Therefore, the subsets are not used to solve the cold-start problem.
2.4. Data Splitting
For proper estimation of the classification accuracy, the decision system is divided into training and testing sets. The training set is used to build a classifier, which is then used to classify the testing set.
We adopt three kinds of splitting strategies based on , namely, NU, NI, and DOCS. Two kinds of splitting strategies are adopted based on , namely, NUAR and NIAR. The NU and NUAR approaches split the user table into two parts. Each part is then merged with the item and rating information to construct the training and testing sets, respectively. A sample is that the number of training or testing sets is of the original set. The NI and NIAR approaches split the item table into two parts. Each part is then merged with the user and rating information to construct the training and testing sets, respectively. A sample is that the number of training or testing sets is of the original set.
The DOCS approach splits both the user table and the item table into two parts. The first part of the user table, the first part of the item table, and the rating information are merged into the training set. The second part of the user table, the second part of the item table, and the rating information are merged into the testing set. A sample is that the number of training or testing sets is of the original set.
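The three basic splitting strategies can be sketched as follows. This is a hedged illustration: the function names, the row layout (dictionaries with `uid` and `mid` keys), the seeds, and the `train_fraction` value of 0.8 are all assumptions introduced here, not settings taken from the paper.

```python
import random


def split_ids(ids, train_fraction, seed):
    """Shuffle identifiers reproducibly and cut them into two disjoint sets."""
    ids = sorted(ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])


def split_table(table, strategy, train_fraction=0.8, seed=0):
    """Split an aggregated table by users (NU), items (NI), or both (DOCS)."""
    u_train, u_test = split_ids({r["uid"] for r in table}, train_fraction, seed)
    i_train, i_test = split_ids({r["mid"] for r in table}, train_fraction, seed + 1)
    if strategy == "NU":      # test users never appear in training
        train = [r for r in table if r["uid"] in u_train]
        test = [r for r in table if r["uid"] in u_test]
    elif strategy == "NI":    # test items never appear in training
        train = [r for r in table if r["mid"] in i_train]
        test = [r for r in table if r["mid"] in i_test]
    elif strategy == "DOCS":  # both users and items are new at test time
        train = [r for r in table if r["uid"] in u_train and r["mid"] in i_train]
        test = [r for r in table if r["uid"] in u_test and r["mid"] in i_test]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return train, test
```

Note that under DOCS the rows that pair a training user with a testing item (or the reverse) fall into neither set, so the two sets together are smaller than the original table.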
Supposing a group of new users and the item model, the function predicts whether these users would be interested in a set of items. With Definitions 8 and 12, the training or testing set of NU is defined as follows.
Definition 14. A subset of the aggregated decision system with respect to user sampling is where , .
Give and , where and . While , is the training set. While , is the testing set.
Given a set of new items and the user model, the function predicts whether a group of users would be interested in these items. With Definitions 8 and 12, the training or testing set of NI is defined as follows.
Definition 15. A subset of the aggregated decision system with respect to user sampling NI is where , .
Give and , where and . While , is the training set. While , is the testing set.
With Definitions 12, 14, and 15, the training or testing set of DOCS is defined as follows.
Definition 16. A subset of the aggregated decision system with respect to user sampling DOCS is where , , and .
Give and , where and . Give and , where and . While and , is the training set. While and , is the testing set.
With Definitions 10 and 13, the training or testing set of NUAR is defined as follows.
Definition 17. A subset of the aggregated decision system with respect to user sampling NUAR is where , , and .
Let , , , and . While , is the training set. While , is the testing set.
With Definitions 10 and 13, the training or testing set of NIAR is defined as follows.
Definition 18. A subset of the aggregated decision system with respect to user sampling NIAR is where , , and .
Let , , , and . While , is the training set. While , is the testing set.
3. Random Forest Based Prediction
In this section, an RF for aggregated dataset is constructed. Four kinds of predicting approaches will be used to compute evaluation values.
3.1. Construct the Random Tree
Building an RF involves two steps: random decision trees are built from the aggregated training set, and an RF is then constructed from these trees.
Decision tree learners build a decision tree by recursively partitioning the training data. When a random decision tree is built, demographic and content information serve as conditional attributes, and rating information serves as the decision attribute. Each root-to-leaf path of the tree represents a rule for the ratings of one kind of movies by one group of users.
Example 19. In Table (d) of Figure 1, the conditional attributes are and the decision attribute is .
There are four steps to build a random decision tree: (1) an attribute whose information gain is greater than 0 is randomly selected from the conditional attributes as the root node; (2) the original set is split into subsets based on the values of the root node; (3) the other splitting nodes are constructed in the same way, and the subsets are split recursively to construct subtrees; (4) each leaf is assigned a vector that indicates the distribution of the decision values. The building process of a random decision tree is described in Algorithm 1.
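The four steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1: the five-point `RATINGS` scale, the dictionary-based tree layout, the small tolerance used for "information gain greater than 0", and all function names are introduced here for the example.

```python
import math
import random
from collections import Counter

RATINGS = (1, 2, 3, 4, 5)  # assumed discrete rating scale


def entropy(rows, decision):
    counts = Counter(row[decision] for row in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def partition(rows, attr):
    """Group rows by their value on one conditional attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups


def information_gain(rows, attr, decision):
    groups = partition(rows, attr).values()
    remainder = sum(len(g) / len(rows) * entropy(g, decision) for g in groups)
    return entropy(rows, decision) - remainder


def build_tree(rows, attrs, decision, rng):
    # Step 1: randomly pick an attribute among those with positive gain.
    candidates = [a for a in attrs if information_gain(rows, a, decision) > 1e-12]
    if not candidates:
        # Step 4: a leaf stores the distribution of the decision values.
        counts = Counter(row[decision] for row in rows)
        return {"leaf": [counts.get(r, 0) for r in RATINGS]}
    attr = rng.choice(candidates)
    rest = [a for a in attrs if a != attr]
    # Steps 2 and 3: split on the attribute values and recurse on each subset.
    children = {v: build_tree(g, rest, decision, rng)
                for v, g in partition(rows, attr).items()}
    return {"attr": attr, "children": children}


def predict_distribution(tree, instance):
    """Top-down search; returns None when a value was never seen in training."""
    while "leaf" not in tree:
        tree = tree["children"].get(instance.get(tree["attr"]))
        if tree is None:
            return None
    return tree["leaf"]
```

A single tree returns `None` for an instance whose attribute value did not occur on the chosen path during training, which is the situation that motivates combining several trees.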

The following examples illustrate the selection process of root node and the way to get the distribution of leaf node.
Example 20. The conditional attribute is randomly selected as tree node. After a randomized selection, the root node of Figure 2(a) is , and the root node of Figure 2(b) is . Then we illustrate the subtree and leaf-node construction process of Figure 2(a). The training data is split according to three values , , and of . is randomly selected as subtree node corresponding to the value . The leaf node corresponding to the value is the distribution of the decision values. There are two instances corresponding to the value , and the decision value of the two instances is the same rating based on the aggregated decision table of Figure 1. Therefore, the distribution of the leaf node is corresponding to the rating rated from . If standard voting is applied to the distribution when the leaf node is built, the leaf of is 3. In other words, the root-to-leaf path of the tree represents a rule that the rating of all movies rated by the student is 3.
Figure 2: Two random decision trees, (a) and (b).
A random decision tree only takes advantage of limited information of users and items. Therefore, for many new users and new items, a random decision tree may produce no predicting result at all. For example, the tree of Figure 2(a) has no information of users. No predicting result is produced if based on the classifier of information. However, using the two trees of Figures 2(a) and 2(b) together avoids this situation.
3.2. Construct the Forest
Based on the five kinds of splitting approaches described in Section 2.4, we construct five kinds of RFs: NU forest, NUAR forest, NI forest, NIAR forest, and DOCS forest. The NU and NUAR forests are only used to solve the new user problem. These two models can predict the rating of a group of new users to a kind of items. The NI and NIAR forests are only used to solve the new item problem. These two models can predict the rating of a group of users to a kind of new items. The DOCS forest can be used to solve both the new user and the new item problems. This model can predict the rating of a group of new users to a kind of new items.
There are three steps to build the RF. First, the aggregated training and testing sets are generated according to the RF model: the original data is split based on Definition 14 for the NU forest, Definition 17 for the NUAR forest, Definition 15 for the NI forest, Definition 18 for the NIAR forest, and Definition 16 for the DOCS forest. Second, the condition attributes are randomized based on a random seed. Third, random decision trees are built based on Algorithm 1. The size of the forest is designated by the user. The building process of the random forest is described in Algorithm 2.
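The forest assembly loop can be sketched as follows. This is an illustration under assumptions, not Algorithm 2 itself: `build_forest`, its parameters, and the per-tree seeding scheme are hypothetical, and the tree builder is passed in as a callable so that the sketch stays independent of any particular tree implementation.

```python
import random


def build_forest(train_rows, attrs, decision, n_trees, build_tree, seed=0):
    """Build `n_trees` random trees, each driven by its own random seed."""
    forest = []
    for k in range(n_trees):
        rng = random.Random(seed + k)  # a different random seed per tree
        shuffled = list(attrs)
        rng.shuffle(shuffled)          # randomize the condition attributes
        forest.append(build_tree(train_rows, shuffled, decision, rng))
    return forest


# Stand-in tree builder used only to show the calling convention.
def stub_tree(rows, attrs, decision, rng):
    return {"root": attrs[0], "size": len(rows)}


forest = build_forest([], ["age", "gender", "genre"], "rating", 5, stub_tree)
print(len(forest))
```

Passing the same `seed` reproduces the same forest, which makes runs with different forest sizes comparable.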

After the RFs are built, we can use them to predict. Figure 3 depicts the RF’s building and predicting process. Multiple RFs are built through selecting different numbers of random trees. Each tree in Figure 3 uses a different random seed; therefore, each one significantly contributes to the prediction.
3.3. Predicting Approaches
For each RF, we design four prediction approaches: standard voting, weighted average, distribution aggregation based voting, and distribution aggregation based average.
By comparing approaches (1) and (2), we can know which is more precise between standard voting and weighted average. By comparing approaches (3) and (4), we can know which is more precise between distribution aggregation based voting and distribution aggregation based average. By comparing approaches (1) and (3), we can know which is more precise between standard voting and distribution aggregation based voting. By comparing approaches (2) and (4), we can know which is more precise between weighted average and distribution aggregation based average.
We describe the four combination algorithms as follows.
(1) Standard Voting. For each instance of the testing set, there are three steps to get the predicted rating through standard voting. First, a predicted rating is computed in each decision tree of the RF through standard voting. These predicted ratings are discrete values. Second, the number of random trees corresponding to each predicted rating is counted. Third, the rating supported by the largest number of trees is used as the RF predicting value. This is given by where is count of random trees.
The following example illustrates the three steps.
(2) Weighted Average. For each instance of testing set, there are two steps through weighted average. The first step is the same as the first one of standard voting. Second, the weighted average of these predicting ratings is computed as the RF predicting value. This is given by where is the highest rating.
The following example illustrates the two steps.
Example 21. Based on Example 20 and Figure 2(a), the distribution of the leaf node is corresponding to the value of if is 5. When an instance of testing set gets the distribution after traversing the random decision tree classifier, the predicting rating of the instance is through standard voting.
Assume there are 10 trees of an RF. The number of random trees is 5 corresponding to the predicting rating . The number of random trees is 3 corresponding to the predicting rating . The number of random trees is 2 corresponding to the predicting rating .
The final predicting value of standard voting is , and then the final predicted value of weighted average is given by
(3) Distribution Aggregation Based Voting. For each instance of the testing set, there are three steps to get the predicted rating through distribution aggregation based voting. First, each predicted distribution is obtained through top-down search of each decision tree of the RF. Second, the cumulative vector is computed by accumulating the predicted distributions of all trees. Third, the final predicted value of the RF is computed from the cumulative vector through standard voting.
(4) Distribution Aggregation Based Average. For each instance of the testing set, there are three steps to get the predicted rating through distribution aggregation based average. The first two steps are the same as in distribution aggregation based voting. The third step differs: the final predicted value of the RF is computed from the cumulative vector through averaging.
Example 22. Assume there are 3 trees of an RF. The predicted distribution of the first tree is . The predicted distribution of the second tree is . The predicted class distribution of the third tree is . The cumulative vector is .
The final predicted value of distribution aggregation based voting is , and then the final predicting value of distribution aggregation based average is given by
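The four predicting approaches can be compared side by side in a short sketch. The three leaf distributions below are hypothetical and are not the ones used in Examples 21 and 22; the five-point `RATINGS` scale, the function names, and the tie-breaking rule (lowest rating wins) are assumptions made for this illustration.

```python
RATINGS = (1, 2, 3, 4, 5)  # assumed discrete rating scale


def vote(dist):
    """Rating with the largest count; ties go to the lowest rating."""
    return RATINGS[max(range(len(dist)), key=lambda i: dist[i])]


def mean(dist):
    """Count-weighted average rating of a distribution."""
    return sum(r * c for r, c in zip(RATINGS, dist)) / sum(dist)


def tree_vote_counts(dists):
    """Number of trees supporting each rating after per-tree standard voting."""
    per_tree = [vote(d) for d in dists]
    return [per_tree.count(r) for r in RATINGS]


def cumulative(dists):
    """Element-wise sum of the leaf distributions of all trees."""
    return [sum(col) for col in zip(*dists)]


def standard_voting(dists):
    return vote(tree_vote_counts(dists))


def weighted_average(dists):
    return mean(tree_vote_counts(dists))


def da_voting(dists):
    return vote(cumulative(dists))


def da_average(dists):
    return mean(cumulative(dists))


# Hypothetical leaf distributions returned by three trees for one instance.
dists = [[0, 0, 2, 1, 0], [0, 0, 3, 1, 0], [0, 0, 0, 1, 2]]
print(standard_voting(dists))   # 3
print(weighted_average(dists))  # (3 + 3 + 5) / 3
print(da_voting(dists))         # 3
print(da_average(dists))        # 3.7
```

Here the per-tree votes are 3, 3, and 5, and the cumulative vector is [0, 0, 5, 3, 2]. The two voting approaches return a discrete rating, while the two averaging approaches return a real-valued prediction.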
4. Experimental Results
In Section 3, we designed five kinds of RFs, each with four predicting approaches. In this section, we evaluate the resulting 20 prediction schemes. Each scheme is repeated 10 times with different random partitions into training and testing sets (i.e., 10 cross-validation runs).
We try to answer the following questions through experimentation.
(1) How large is the size of RF when the precision in terms of MAE keeps stable?
(2) Which is more precise, in terms of MAE, NU, NI, or DOCS?
(3) Which is more precise, in terms of MAE, NU or NUAR?
(4) Which is more precise, in terms of MAE, NI or NIAR?
(5) Which is more precise, in terms of MAE, standard voting, weighted average, distribution aggregation based voting, or distribution aggregation based average?
4.1. Dataset
We experimented with the well-known MovieLens dataset (http://www.movielens.org/) assembled by the GroupLens project. It is widely used in recommender systems (see, e.g., [18]). The database schema is as follows:
(i) user (userID, age, gender, and occupation),
(ii) movie (movieID, release-year, and genre),
(iii) rates (userID and movieID).
We use the version with 943 users and 1,682 movies. The original rate relation contains the ratings of movies on a 5-point scale. The user age has 61 distinct values as indicated by the dataset. The user occupation has 21 distinct values. The movie release-year has 85 distinct values. The genre is a multivalued attribute; therefore, we scale it to 18 boolean attributes, namely, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film-noir, horror, musical, mystery, romance, scientific-fiction, thriller, war, and western. All users have watched at least one movie, and the dataset consists of approximately 100,000 movie ratings. However, the rating matrix is still sparse, because no one has watched more than 45 percent of the movies, and only 20 percent of the users have watched more than 10 percent of the movies.
4.2. Results
The original dataset is partitioned into a training set and a testing set through cross-validation. The training set is of the original one, and the testing set is of the original one.
To find the forest size at which the precision in terms of MAE becomes stable, we vary the number of random trees from 2 to 50. We undertake 20 sets of experiments to answer, one by one, the questions raised at the beginning of this section. Each experiment is repeated 10 times with different samplings of the training and testing sets, and the average accuracy in terms of MAE [19] is computed. MAE has been used to evaluate recommender systems in several cases [20, 21].
Figure 4(a) compares the four approaches for NI. The MAE of the four predicting approaches ranges between 0.88 and 0.91. The weighted average approach is the best of the four. The precision of weighted average becomes stable once the size of the forest is greater than or equal to 20, whereas the precision of the other three approaches is stable throughout.
Figure 4: MAE of the four predicting approaches for (a) NI, (b) NU, (c) DOCS, (d) NIAR, and (e) NUAR.
Figure 4(b) compares the four approaches for NU. The MAE of the four predicting approaches ranges between 0.92 and 0.99. The standard voting approach is the best of the four. The precision of standard voting is stable throughout, whereas the precision of the other three approaches becomes stable once the size of the forest reaches a certain value.
Figure 4(c) compares the four approaches for DOCS. The MAE of the four predicting approaches ranges between 0.92 and 1.07. The standard voting approach is the best of the four. The precision of standard voting is stable throughout, whereas the precision of the other three approaches becomes stable once the size of the forest reaches a certain value.
Figure 4(d) compares the four approaches for NIAR. The MAE of the four predicting approaches ranges between 0.88 and 0.91. The weighted average approach is the best of the four. The precision of standard voting is stable throughout, whereas the precision of the other three approaches becomes stable once the size of the forest reaches a certain value.
Figure 4(e) compares the four approaches for NUAR. The MAE of the four predicting approaches ranges between 0.91 and 0.96. The distribution aggregation based voting approach is the best of the four. The precision of standard voting is stable throughout, whereas the precision of the other three approaches becomes stable once the size of the forest reaches a certain value.
In summary, Figure 4 shows that the precision in terms of MAE is stable on the whole when the size of the random forest is 2 to 20. Among these approaches, the precision of standard voting is stable throughout. The NI approach is more precise than the NU approach, and the NU approach is more precise than the DOCS approach. Aggregated algorithms with AR are slightly more precise than those without it. The NIAR approach performs almost the same as NI, and these two approaches are the most precise of all. One reason is that they are based on the user rating history, which forms his/her preference. They yield an MAE of 0.88 (on a five-point rating scale) on the movie rating dataset.
5. Discussions
To the best of the authors’ knowledge, previously published studies say little about aggregation in recommender systems. This work is related to previously published works on model-based RSs and group recommenders.
Model-based RSs use the demographic information, item information, and collection of ratings to create a model that generates the recommendations [8]. Model-building methods work by creating a model offline and then running the model online. Among the most widely used models are Bayesian classifiers [6], neural networks [22], and decision trees [23]. These models have been used to solve three kinds of cold-start problems [8]: new community, new item, and new user. The new community problem refers to the difficulty in obtaining a sufficient amount of data (ratings) for making reliable recommendations. The new item problem [8] arises because the new items entered in an RS do not usually have initial ratings. The new user problem represents one of the great difficulties faced by an RS in operation. Since a new user has not yet provided any rating in the RS, he/she cannot receive any personalized recommendations from a memory-based RS.
The random forest [13] is composed of many decision trees. A decision tree is a general computational model represented as a set of if-then rules [24]. Each tree is built based on a different set of training data and grown to the largest extent possible without pruning. Each splitting or decision node uses the best splitting attribute from a randomly selected subset of the conditional attributes. To classify a new object, each tree in the forest gives a classification. The final classification of the object is determined by the majority of votes among the classes decided by the forest of trees.
As defined in [10, 25], there are two strategies commonly adopted for generating group recommendations: aggregated models and aggregated predictions. The former combines individual user models, that is, individual user profiles that capture the preferences of a group member, into a group model from which the items recommended for the group are identified, whereas the latter generates predictions for individual group members and then aggregates the predictions to suggest items for the group. In this paper we present our proposed aggregated recommender, which predicts the rating of a group of users to a kind of items.
6. Conclusions
In this paper, we proposed random forest approaches to aggregated recommendation. By comparing several predicting approaches, we may draw the following conclusions: the MAE is stable on the whole when the size of the random forest is 5 to 20; the aggregated recommender can be used to solve the NI, NU, and DOCS problems; the precision of the NI approach is the highest; the AR attribute can improve the predicting accuracy; and the precision of the DOCS approach maintains an acceptable level.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is in part supported by National Science Foundation of China under Grant nos. 61379089 and 61379049 and Scientific Research Starting Project of SWPU no. 2014QHZ025.