Abstract

Recommendation systems are used when searching online databases. As such they are very important tools because they provide users with predictions of the outcomes of different potential choices and help users to avoid information overload. They can be used on e-commerce websites and have attracted considerable attention in the scientific community. To date, many personalized recommendation algorithms have aimed to improve recommendation accuracy from the perspective of vertex similarities, such as collaborative filtering and mass diffusion. However, diversity is also an important evaluation index in the recommendation algorithm. In order to study both the accuracy and diversity of a recommendation algorithm at the same time, this study introduced a “third dimension” to the commonly used user/product two-dimensional recommendation, and a recommendation algorithm is proposed that is based on a triangular area (TR algorithm). The proposed algorithm combines the Markov chain and collaborative filtering method to make recommendations for users by building a triangle model, making use of the triangulated area. Additionally, recommendation algorithms based on a triangulated area are parameter-free and are more suitable for applications in real environments. Furthermore, the experimental results showed that the TR algorithm had better performance on diversity and novelty for real datasets of MovieLens-100K and MovieLens-1M than did the other benchmark methods.

1. Introduction

The rapid development of the Internet and e-commerce has had a tremendous impact on our lifestyles and has changed the way information is accessed. On the one hand, hundreds of millions of products are available online, making life much more convenient [1]. On the other hand, there is a problem of information overload: the large amount of data generated every day makes it harder than before to find the items we want [2]. Personalized recommendation is considered the most effective way to solve the problem of information overload [3, 4], and recommendation systems have been used in many fields [5–7]. However, it is still difficult to meet the growing demand for product information services [8]. Investigations show that the underlying reasons vary, but the most prominent is the lack of adaptability of the recommendation algorithm, that is, the constraints that the two performance evaluation standards of accuracy and diversity impose on conventional recommendation algorithms [8].

A variety of personalized recommendation algorithms have been proposed previously [9–13]. The most representative is collaborative filtering, which includes user-based collaborative filtering (UCF) and item-based collaborative filtering (ICF) [14]. UCF and ICF are based, respectively, on the weighted combination of similar users’ opinions and the similarity between items [15]. There is also a series of recommendation methods based on content filtering [16–18], as well as work on the “community structure” found in complex networks [19, 20]. The “time role method” [21] is a time-aware modification of an existing recommendation method that combines temporal and structural aspects. In recent years, many physical dynamics-based methods have been used in recommendation algorithms, such as mass diffusion (MD) [22], heat conduction (HC) [23], and a hybrid recommendation method that combines MD and HC [24–26].

Essentially, all of these methods make recommendations for users by studying vertex similarity and focus primarily on recommendation accuracy; as a result, their recommendation diversity is relatively poor. However, a recent study found that although the most popular method does not achieve very high accuracy [27], its diversity performance is outstanding. Research results have shown that excessive attention to accuracy can have a detrimental effect on the quality of the recommendation results, especially with respect to their relevance to user interests [28, 29].

New developments in e-commerce have taken place at two large Chinese companies, Alibaba and Jingdong. For example, these companies are now vigorously developing their offline entities to work with their online operations. The traditional two-dimensional recommendation (the user and product model) cannot fully accommodate the roles of location and high diversity. The present paper proposes a data model that introduces a “third dimension” into the traditional two-dimensional user/product recommendation relationship. A recommendation algorithm based on triangulated area is designed by using the third-dimensional data relationship. In this algorithm, a weight is computed for the relationship between each pair of dimensions in the three-dimensional model, the three weights are used as the side lengths of a triangle, and the triangular area is then calculated using Heron’s formula. As each triangle corresponds to a product, recommendations can be made for the target user according to the triangular areas sorted from large to small.

The triangle recommendation (TR) algorithm proposed in this paper has been tested on two real datasets. The results show that TR was more effective than other benchmark methods in terms of recommendation diversity. In addition, TR is parameter-free, which makes it easier to apply in practice than some parameter-based benchmark methods. Because the present work does not focus exclusively on recommendation accuracy, recommendation diversity increased dramatically.

The rest of this paper is organized as follows. In the second part, a detailed description is given of how to build a triangle model and how to apply the model to recommendation systems, and the diversity analysis of the triangle model is explained. In the third section, the experimental data are presented and the performance evaluation standards of the recommendation algorithm are introduced. The fourth section shows the performance of the proposed method on a public dataset and compares its performance with other benchmark methods. Finally, the work is summarized and an indication of the direction for future work is provided.

2. Triangle Recommendation Model

A user-object network without weights or directions is used in the study of recommendation systems. It describes the relationship between users and products and is denoted $G(U, O, E)$, where $U$, $O$, and $E$ represent the user set, the object set, and the edge set, respectively. To distinguish between users and objects, Latin and Greek indices are used to represent them, respectively. $G$ can be represented by an adjacency matrix $A$, whose elements are denoted $a_{i\alpha}$. If object $o_\alpha$ is collected by user $u_i$, then $a_{i\alpha} = 1$; otherwise, $a_{i\alpha} = 0$. The ultimate purpose of any recommendation algorithm is to provide the target user with a ranked list of objects that the user has not yet collected. For user $u_i$, the recommendation list of length $L$ is denoted $O_i^L$. That is, $O_i^L$ is the set of the $L$ uncollected objects that have the highest recommendation scores for $u_i$.
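The definition of the recommendation list can be illustrated with a minimal Python sketch. The function name `top_l_uncollected`, the score matrix `S`, and the toy adjacency matrix `A` are hypothetical and serve only to show how the first uncollected objects are selected from any per-user scoring.

```python
import numpy as np

def top_l_uncollected(scores, adjacency, user, length):
    """Return indices of the `length` best-scored objects the user has not collected."""
    user_scores = scores[user].astype(float).copy()
    # Collected objects (adjacency entry equal to 1) must never be recommended.
    user_scores[adjacency[user] == 1] = -np.inf
    order = np.argsort(-user_scores, kind="stable")
    n_uncollected = int((adjacency[user] == 0).sum())
    return order[:min(length, n_uncollected)].tolist()

# Toy example: 2 users, 4 objects (illustrative values only).
A = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0]])
S = np.array([[0.9, 0.2, 0.7, 0.4],
              [0.1, 0.8, 0.3, 0.6]])
print(top_l_uncollected(S, A, user=0, length=2))
```

Any scoring method, including the triangle area described later, could supply the score matrix in this sketch.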

2.1. Construction of the Triangle Model

Recommendation accuracy and diversity are both studied when evaluating the performance of recommendation algorithms. In order to achieve recommendations with high diversity, a “third dimension” (T1) was introduced. In the present study, the category attributes of the products (in the MovieLens-100K dataset there are 19 product categories) were tested as the third dimension. The third dimension could be changed according to the actual application scenario (e.g., business or distance). The relationship between any two of the three dimensions is then studied. A weight is used to measure the strength of each relationship and, finally, the three weights are treated as the three sides of a triangle. Each triangle corresponds to an object and the final recommendations are determined based on the triangle area, as shown in Figure 1.

The composition of the three dimensions is U (user), O (object), and T1 (third dimension).

In this investigation, the relationship between two dimensions was not studied directly from the perspective of their similarity. In the user/object bipartite graph, the relationship weights are calculated by combining k-means with a Markov chain. The user’s historical behaviors are taken as the basis of the study, and the user’s future behaviors are then predicted. The obtained prediction values are used as the weight of the user/object relationship. For the other two relationships (object/T1 and user/T1), the weights are calculated using the collaborative filtering method. Once the user U and the third dimension T1 are fixed, they can be combined with multiple objects O to construct a three-dimensional spatial structure, as shown in Figure 2. Each object forms its own triangle with U and T1, and the recommendation list is composed of these objects. Because a single target user can correspond to multiple objects and to multiple third-dimension values, the final recommendation list corresponds to the spatial structure shown in Figure 2.

2.2. Solution of the Triangle Model

First, the calculation of the weights between each user and object using the “k-means Markov chain” is introduced. Here, the k-means method is used first to cluster the users, taking one target user as an example.

Step 1. The clustering process must consider both the sameness between the target user and another user (in order to ensure the accuracy of the recommendation) and the differences between them (in order to ensure the recommendation diversity). The k-means two-dimensional clustering method was therefore used in the present study, where the number of common objects between the two users was defined as the first dimension, and was defined as the second dimension. Therefore, the data structure of other user data during the clustering is
where represents the degree of , .
The data of used for clustering is
The number of classes is the count of different sets: that is,
The cluster about the user is formed by clustering according to the Euclidean distance between the two-dimensional vectors.

Step 2. The cluster of is found, which is represented by . The number of different items , including the behavior records of all the users in , was counted, where represents all the users and the product purchase behavior records in .
The records of each user in were sorted by the ascending order of timestamps and a sequence then was obtained. Here, the timestamp is a factor in the construction of the Markov Chain.
A time-based dynamic model such as that of reference [21] was not studied specifically here. Because each object ID is unique, the object ID was taken as the object state, and the state transfer matrix was then constructed.

Step 3. is assigned. equals the total occurrence number of in divided by the total occurrence number of in .
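The construction of the state transfer matrix from time-ordered records can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `transition_probabilities` is hypothetical, object IDs act as states, and each row is normalized by the number of outgoing transitions of a state (an assumption that keeps every row summing to one).

```python
from collections import defaultdict

def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities between object IDs.

    `sequences` holds one list of object IDs per user, already sorted by
    ascending timestamp.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Count every observed step current -> next.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: c / total for nxt, c in followers.items()}
    return probs

# Toy records for three users in one cluster (illustrative values only).
cluster_sequences = [[1, 2, 3], [1, 2, 1], [2, 3]]
P = transition_probabilities(cluster_sequences)
print(P[2])
```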

Step 4. The initial behavior vector of item for is obtained by sorting the behavioral data of according to the order of , where is equal to the given score of the target user on the item .
Thus, the final result of is
In order to reduce the number of iterations, the number of state transfers was set as after comparative experiments with collaborative filtering. Thus, the final weight was calculated as

Step 5. Next, the weights of the other two relationships, namely the relationship between the object and the category and the relationship between the user and the category, were calculated using the collaborative filtering method. The weight of the object and the category is denoted as follows:
where is the average weight of product and category, described as
in which is the number of users that have acted on the given object, is the ranking of the th user on the given object, is the total number of users, and is the number of categories of the given object.
In addition, the weight of the user and the category is also calculated with the collaborative filtering method and is denoted as follows:
where is described as
where is the average weight of user and category and is the number of the given user’s actions in each category.

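Step 6. Each triangle contains an object. As all three side lengths are known, the triangle area can be obtained according to Heron’s formula as follows:
$$S = \sqrt{p\,(p - a)(p - b)(p - c)},$$
where $a$, $b$, and $c$ denote the three relationship weights used as side lengths and $p$ represents half of the perimeter:
$$p = \frac{a + b + c}{2}.$$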
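The area computation and the resulting ranking can be sketched in a few lines of Python. The names `heron_area` and `rank_by_area` and the toy side lengths are hypothetical; the sketch assumes the three weights of each object already satisfy the triangle inequality.

```python
import math

def heron_area(a, b, c):
    """Area of a triangle with side lengths a, b, c (Heron's formula)."""
    p = (a + b + c) / 2.0  # half of the perimeter
    radicand = p * (p - a) * (p - b) * (p - c)
    if radicand <= 0:
        raise ValueError("side lengths do not form a non-degenerate triangle")
    return math.sqrt(radicand)

def rank_by_area(triangles):
    """Sort object IDs by triangle area, from large to small."""
    areas = {obj: heron_area(*sides) for obj, sides in triangles.items()}
    return sorted(areas, key=areas.get, reverse=True)

# One triangle per candidate object (illustrative weights only).
candidates = {"o1": (3, 4, 5), "o2": (2, 2, 3), "o3": (5, 5, 6)}
print(rank_by_area(candidates))
```

Objects at the head of the returned list would be recommended first.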

Step 7. To make sure the three weights can construct a triangle, must be increased or decreased so that a triangle can be constructed with the other two weights. Because it will have an impact on the area whether it increases or decreases , the area must be corrected, as illustrated in Section 2.3.
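The condition that three weights can construct a triangle is the triangle inequality, which can be checked as in the sketch below. The helper names are hypothetical, and the sketch only reports the open interval in which an adjusted side must fall; it does not reproduce the area correction of Section 2.3.

```python
def forms_triangle(a, b, c):
    """True if three positive lengths satisfy the strict triangle inequality."""
    return a + b > c and a + c > b and b + c > a

def feasible_range(a, b):
    """Open interval (low, high) for a third side, given two fixed sides."""
    return abs(a - b), a + b

print(forms_triangle(1, 2, 5))   # these weights would need adjusting
print(feasible_range(1, 2))      # the third side must lie strictly inside
```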

2.3. The Changes of
2.3.1. Increase

When increasing , the area must be decreased accordingly. The increased is denoted as
The unknown represents the multiple of weight increase. No matter how much it increases, it must be within a small range. If the value is taken that meets the requirements, the obtained area is
where is
Here, an important assumption must be made that the original three weighting values can construct a triangle. When calculating the triangular area, the absolute value must be taken for each product factor in the root of (11); then we can get the median, , where
Then, the final equation for calculating the triangle area is
where represents the area correction factor:

It is important to note that the main purpose of the proposed recommendation algorithm based on the triangle model is to improve the diversity of the recommendation. Thus, when correcting the area after increasing , the maximum value of the denominator in (18) is taken.

2.3.2. Decreasing

Processing of the area correction for the case of decreasing is similar to that for increasing. Only (17) and (18) need to be changed, and the following processing method is obtained:

It should be noted that to ensure recommendation accuracy, the minimum value of the denominator in (20) is taken. Although the research focus of this project is on the recommendation diversity, that does not mean giving up accuracy.

2.4. Analysis of Parameters and Diversity
2.4.1. The Analysis of Changes

In the construction of a triangle model, there will be cases when the three relationship weightings cannot meet the requirements for constructing a triangle. When , , and cannot meet the requirements for constructing a triangle, the processing method is as follows: keep the two weighting values unchanged and change the other weighting value. In practical applications, the “third dimension” is a variable factor: for example, it can be replaced by a seller. This investigation does not consider changes of and because both the relationship and relationship can be obtained from known information (for example, the training dataset). Below is an analysis showing that changes in and will not affect user behavior but a change in will have an impact on the recommendation result.

Analysis 1. Considering will have a direct impact on . Because there is no direct information representing the relationship in the known information, the change of does not directly affect . Similarly, the discussion of or is the same, so only is given here as an example.

Let the function , the change amount of is , the change amount of is , and the change amount of is ; then we have .

Then,

It can be obtained from the formula above that the user’s behavior will not be affected when is changed.

Analysis 2. Because there is no direct information representing the relationship in the known information, the choice of a different will lead to different user behavior. This is the value of the existence of . In the same way as for Analysis 1, the concept of derivatives can still be used to explain why changes in have an impact on the user behavior. Because is related to the product and the product is related to the user, the changes of will lead to changes in and at the same time.

If , the changes of will lead to the changes of and at the same time.

Then, .

Hence,

It can be concluded from the above formula that changes in will have impacts on the user behavior factor. Thus, the changed parameter in this study was .

2.4.2. Diversity Analysis of the Triangle Model

This algorithm introduces the “third dimension” to the traditional two-dimensional recommendation algorithm and proposes a method based on the triangular area to improve the diversity of the recommendation algorithm. In that case, the question arises: Is the diversity performance getting better with the increase of dimensions? To this end, this section of the manuscript presents an analysis using the “squeeze theorem” that is used to determine whether limits exist and demonstrates the method of equation-solving in linear algebra. The deterministic problem of diversity in two dimensions, four dimensions, and higher dimensions is analyzed.

Theorem 1. The triangular recommendation algorithm has better diversity performance than the two-dimensional recommendation algorithm.

Proof. The existing recommendation algorithms have introduced various methods from various fields and then the bipartite graph is used to describe how to make recommendations for the target users. These algorithms mainly focus on recommendation accuracy and have poor performance in diversity and novelty. When so implemented, the most popular recommendation algorithm does not have higher accuracy but it has better performance in the diversity and novelty [27]. With the following assumptions, the two dimensions can be mapped onto a linear function:

The and values represent the user and the object, respectively. When the slope (indicating the strength of the relationship between the user and the object) is larger, indicating that the user is more interested in this object, the object will be recommended to the user. Thus, the recommendation accuracy is improved. As the objects with large values cannot be distributed regularly on a line, they need to be translated for and units, as shown in Figure 3.

Thus, at least objects have the shortest distance in the space with distance of , and then the objects are recommended to target users, where is expressed as follows:
If the “third dimension” is introduced, it is equivalent to adding an unknown parameter in (23) as
In a two-dimensional relationship, the parameters () are bound, and is a free variable. This is similar to solving a system of multivariate linear equations in linear algebra, so a suitable value can be found in the three-dimensional space to satisfy (25), and there are many values that satisfy the condition. This is the reason for better diversity performance after introducing the “third dimension.”

Theorem 2. Four-dimensional or even higher dimensional recommendation algorithms can be represented by the triangle recommendation algorithm.

Proof. Assume a diversity recommendation algorithm is constructed with user, object, , and fourth dimension . It is clear that no matter how many dimensions are added, the new added dimension must be closely linked with the object. Then the situations illustrated in Figure 4 should be considered to illustrate all possible distributions of the four dimensions. Of course, the area of the constructed graph is still used under each possible state in order to make recommendations. The following work is illustrated with Figure 4.

As the quadrilateral formed by four points varies, that is, it may not necessarily be a regular figure, so the area of the quadrilateral is denoted by , and it can be represented by

It is not difficult to see that is the most important weighting in , , and . It is also the research focus of the existing two-dimensional recommendation algorithm. At this point, two cases of , should be considered:

(1) When , for the target user, only one extra dimension is needed. Either or can be chosen and only three dimensions are needed.

(2) When or , is the sum of the two dimensions (in the three-dimensional relationship). The recommendation list of target users is the only obviously affected one of the new dimension. Therefore, another new dimension is redundant and only three dimensions are needed.

To sum up, in the two cases discussed above, the “squeeze theorem” can be used to illustrate the importance of adding a “third dimension” into the two-dimensional relationships. In other words, the purpose is to improve the diversity of the recommendation, but the accuracy of the recommendation cannot be abandoned completely. In the two-dimensional relationship, it is difficult to achieve good recommendation diversity; in the four-dimensional relationship, as illustrated by the discussion above, it is not necessary to have the fourth dimension. Hence, all cases for higher dimensions can be transformed into the research of several three-dimensional relationships and there is no need for higher multidimensional models. In the three-dimensional relationship, it not only retains the recommendation accuracy in the two-dimensional relationship but also improves the recommendation diversity. Thus, the method proposed in this paper, which is adding the third dimension to the two dimensions, is effective in improving diversity.

3. Data and Evaluation Indices

3.1. Data Description

Traditional recommendation research has not worked with three dimensions, which made data collection for this paper difficult. Two commonly used real datasets were finally selected: the MovieLens-100K dataset and the MovieLens-1M dataset, which are often used as benchmarks for recommendation algorithms. The MovieLens datasets were provided by the GroupLens project at the University of Minnesota. The datasets use a 5-point rating scale, where a higher score means that the user liked the movie more. In constructing the bipartite model, only data with a rating greater than or equal to 3 were considered. After coarse-graining, the small dataset contains 82520 links and the large one contains 836478 links. It should be noted that the bipartite model has no weighting in the following analysis. In other words, once a link passes the threshold, its rating value is ignored. The basic statistics of the datasets are shown in Table 1.
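The coarse-graining step can be sketched as a simple filter. The function name `build_links` and the toy rating tuples are hypothetical; the only rule taken from the text is that ratings of 3 or more become unweighted links.

```python
def build_links(ratings, threshold=3):
    """Keep (user, object) pairs whose rating is at least `threshold`.

    The rating value itself is dropped, so the resulting bipartite
    network is unweighted.
    """
    return sorted({(user, obj) for user, obj, rating in ratings if rating >= threshold})

# Toy (user, object, rating) records (illustrative values only).
raw = [(1, 10, 5), (1, 11, 2), (2, 10, 3), (2, 12, 1), (3, 11, 4)]
links = build_links(raw)
print(links)
```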

3.2. Evaluation Indices

In order to evaluate the performance of recommendation algorithms in practical applications, cross-validation usually is used to assess how the results generalize to independent datasets [30]. A cross-validation process involves partitioning the dataset into two complementary subsets: the training set and the testing set. The training set is used by the recommendation algorithm to obtain the recommendation results, and the testing set is then used to verify them. In the following experiment, a 10-fold cross-validation strategy was used to evaluate the performance of the proposed algorithm in each independent realization. In particular, the users’ evaluations (i.e., the edges of the bipartite graph) are divided randomly into 10 subsamples of equal size, and the partitioning process is independent of the user and the object. One of the 10 subsamples is retained to test the performance of the recommendation algorithm, and the remaining 9 subsamples are combined and used as the training set. In other words, 90% of the entire dataset is used to produce recommendations, and the remaining 10% is used to evaluate them. The process is repeated 10 times so that each of the 10 subsamples is used once as the testing set. Finally, the 10 test results are averaged to obtain a single result, which is used as the basis for evaluating the performance of the algorithm.
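The 10-fold procedure on the edge list can be sketched as follows. The generator name `ten_fold_splits` and the fixed `seed` are hypothetical choices made for reproducibility; the split is over links only and ignores which user or object a link belongs to, as described above.

```python
import random

def ten_fold_splits(links, n_folds=10, seed=0):
    """Yield (train, test) link lists; each fold serves once as the test set."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled links into n_folds subsamples of (near) equal size.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [link for j, fold in enumerate(folds) if j != k for link in fold]
        yield train, test

# Toy network with 100 links (illustrative only).
toy_links = [(u, o) for u in range(10) for o in range(10)]
splits = list(ten_fold_splits(toy_links))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

A metric would be computed on each of the 10 test sets and the 10 values averaged.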

In previous literature, extensive research has been done on how to evaluate the performance of recommendation algorithms [31, 32]. In this study, seven widely used indices were used to evaluate the performance of the recommendation algorithm, including four accuracy indices (AUC, MAP, Precision, and Recall), two diversity indices (Intrasimilarity and Intersimilarity), and a novelty index (Popularity). The following is a brief overview of the seven indices.

Accuracy is one of the important indices for evaluating the quality of recommendation algorithms. First, the AUC (area under the ROC curve) [33] is introduced. The AUC value can be interpreted as the probability that a randomly chosen collected object is ranked higher than a randomly chosen uncollected object. In the calculation of AUC, an edge randomly selected from the testing set is compared with a randomly selected nonexistent edge. If the weight of the edge in the testing set is larger than that of the nonexistent edge, then 1 point is added. If the two values are equal, then 0.5 points are added. After $n$ independent comparisons, if the testing-set edge has the higher value $n'$ times and the two values are equal $n''$ times, then the AUC can be defined as
$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}.$$
The greater the AUC value, the higher the accuracy of the algorithm.
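The sampling procedure for AUC can be sketched as below. The function name `sampled_auc`, the dictionary-based `scores` structure, and the default number of comparisons are assumptions made for illustration.

```python
import random

def sampled_auc(scores, test_links, nonexistent_links, n=10000, seed=0):
    """Estimate AUC from n random comparisons of test and nonexistent links.

    `scores` maps a (user, object) pair to its recommendation score.
    """
    rng = random.Random(seed)
    higher = equal = 0
    for _ in range(n):
        test_score = scores[rng.choice(test_links)]
        other_score = scores[rng.choice(nonexistent_links)]
        if test_score > other_score:
            higher += 1          # counts 1 point
        elif test_score == other_score:
            equal += 1           # counts 0.5 points
    return (higher + 0.5 * equal) / n

# Toy scores (illustrative values only).
toy_scores = {("u1", "a"): 0.9, ("u1", "b"): 0.1, ("u1", "c"): 0.9}
print(sampled_auc(toy_scores, [("u1", "a")], [("u1", "b")], n=100))
```

A value near 0.5 indicates a ranking no better than chance.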

Then, three $L$-dependent accuracy indices are introduced, namely, the MAP (Mean Average Precision) index [34], the Precision index, and the Recall index [35]. The MAP is a standard rank-aware measure of the overall ranking accuracy in the field of information retrieval, which is similar to the average ranking score [31, 32]. The average precision for user $u_i$ is defined as
$$AP_i(L) = \frac{1}{D_i} \sum_{s=1}^{d_i(L)} \frac{s}{r_s}.$$
Here, $D_i$ represents the number of objects of $u_i$ in the testing set; $d_i(L)$ represents the number of common objects of the testing set and the recommendation list with length of $L$; and $r_s$ represents the ranking of the $s$th common object in the recommendation list, with $1 \le r_s \le L$. Then, the MAP index can be obtained by averaging $AP_i(L)$ over all users:
$$\mathrm{MAP}(L) = \frac{1}{m} \sum_{i=1}^{m} AP_i(L),$$
where $m$ represents the number of users. A larger MAP index corresponds to a better overall ranking accuracy.
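A minimal sketch of the MAP computation follows. It normalizes each user's average precision by the number of that user's test objects, which is the common information-retrieval convention and is an assumption here; the function names and toy data are hypothetical.

```python
def average_precision(rec_list, test_set):
    """Average precision of one ranked list against one user's test objects."""
    hits = 0
    total = 0.0
    for rank, obj in enumerate(rec_list, start=1):
        if obj in test_set:
            hits += 1
            total += hits / rank  # precision at the rank of this hit
    return total / len(test_set)

def mean_average_precision(rec_lists, test_sets):
    """Average AP over all users that have at least one test object."""
    users = [u for u in rec_lists if test_sets.get(u)]
    return sum(average_precision(rec_lists[u], test_sets[u]) for u in users) / len(users)

# Toy data (illustrative values only).
toy_recs = {"u1": ["a", "b", "c", "d"], "u2": ["e", "f", "g", "h"]}
toy_tests = {"u1": {"a", "c"}, "u2": {"f"}}
print(mean_average_precision(toy_recs, toy_tests))
```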

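The Precision index is defined as the ratio of the number of common objects that appear in the recommendation list and the testing set to the length of the recommendation list. For all the users, the average Precision is defined as
$$P(L) = \frac{1}{m} \sum_{i=1}^{m} \frac{d_i(L)}{L},$$
where $m$ is the number of users, $L$ is the length of the recommendation list, and $d_i(L)$ is the number of objects common to the recommendation list and the testing set of user $u_i$.

Recall is also defined as a ratio, with the number of objects $D_i$ of user $u_i$ in the testing set as the denominator:
$$R(L) = \frac{1}{m} \sum_{i=1}^{m} \frac{d_i(L)}{D_i}.$$
The higher the values of Precision and Recall, the higher the recommendation accuracy.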
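Precision and Recall can be computed together, as in the sketch below. The function name `precision_recall` and the toy data are hypothetical, and users without test objects are skipped to avoid division by zero (an assumption not stated in the text).

```python
def precision_recall(rec_lists, test_sets):
    """Average Precision and Recall over users with at least one test object."""
    users = [u for u in rec_lists if test_sets.get(u)]
    precision = recall = 0.0
    for u in users:
        hits = len(set(rec_lists[u]) & test_sets[u])  # common objects
        precision += hits / len(rec_lists[u])
        recall += hits / len(test_sets[u])
    return precision / len(users), recall / len(users)

# Toy data (illustrative values only).
pr_recs = {"u1": ["a", "b", "c", "d"], "u2": ["a", "e", "f", "g"]}
pr_tests = {"u1": {"a", "c"}, "u2": {"e", "x", "y", "z"}}
print(precision_recall(pr_recs, pr_tests))
```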

In personalized recommendation, diversity is an important index that measures how varied the objects recommended by the algorithm are. As it is difficult to obtain external sources of object similarity information, the diversity measure usually is based on the evaluation matrix. Intersimilarity is one of the widely used diversity indices and it can be quantified by the Hamming distance [36]. The average Hamming distance over all users can be defined as
$$H(L) = \frac{1}{m(m-1)} \sum_{i \neq j} \left(1 - \frac{C_{ij}(L)}{L}\right),$$
where $m$ is the number of users, $L$ is the length of the recommendation list, and $C_{ij}(L)$ represents the number of common objects in the recommendation lists of user $u_i$ and user $u_j$. The greater the Hamming distance value, the higher the diversity. Intrasimilarity is another diversity index [37]; it is measured by the cosine similarity between objects in the recommendation list of the target user. Mathematically, the average Intrasimilarity [38] of all users is defined as
$$I(L) = \frac{1}{mL(L-1)} \sum_{i=1}^{m} \sum_{\substack{o_\alpha, o_\beta \in O_i^L \\ \alpha \neq \beta}} s_{\alpha\beta},$$
where $s_{\alpha\beta}$ is the cosine similarity between object $o_\alpha$ and object $o_\beta$ in the recommendation list $O_i^L$ of user $u_i$ with length of $L$. The smaller the Intrasimilarity value, the higher the recommendation diversity.
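Both diversity indices can be sketched as follows. The function names, the `collectors` mapping (object to the set of users who collected it), and the toy data are hypothetical. Averaging over unordered pairs is used because both quantities are symmetric, which gives the same value as averaging over ordered pairs.

```python
import math
from itertools import combinations

def hamming_distance(rec_lists, length):
    """Mean of 1 - (common objects / length) over all pairs of users."""
    pairs = list(combinations(rec_lists, 2))
    total = sum(1 - len(set(rec_lists[i]) & set(rec_lists[j])) / length
                for i, j in pairs)
    return total / len(pairs)

def intrasimilarity(rec_lists, collectors):
    """Mean cosine similarity between objects inside each user's list."""
    def cosine(a, b):
        ka, kb = len(collectors[a]), len(collectors[b])
        if ka == 0 or kb == 0:
            return 0.0
        return len(collectors[a] & collectors[b]) / math.sqrt(ka * kb)

    per_user = []
    for objs in rec_lists.values():
        pairs = list(combinations(objs, 2))
        per_user.append(sum(cosine(a, b) for a, b in pairs) / len(pairs))
    return sum(per_user) / len(per_user)

# Toy data (illustrative values only).
div_recs = {"u1": ["a", "b"], "u2": ["a", "c"]}
div_collectors = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"z"}}
print(hamming_distance(div_recs, 2), intrasimilarity(div_recs, div_collectors))
```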

Novelty is an important index, which aims to quantify the ability of an algorithm to produce novel (i.e., unpopular) and unexpected results. Here, novelty is quantified using the average popularity of the recommended objects. It can be defined as
$$N(L) = \frac{1}{mL} \sum_{i=1}^{m} \sum_{o_\alpha \in O_i^L} k_\alpha,$$
where $m$ is the number of users, $L$ is the length of the recommendation list, and $k_\alpha$ is the degree of object $o_\alpha$ in the recommendation list $O_i^L$ of user $u_i$. Lower popularity indicates higher novelty and, potentially, better user experience.
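The popularity index reduces to an average of object degrees, as the sketch below shows. The function name `average_popularity` and the toy degrees are hypothetical; with equal-length lists, dividing by the total number of recommended objects equals dividing by the number of users times the list length.

```python
def average_popularity(rec_lists, degree):
    """Mean degree of all recommended objects; lower values mean higher novelty."""
    total = sum(degree[obj] for objs in rec_lists.values() for obj in objs)
    count = sum(len(objs) for objs in rec_lists.values())
    return total / count

# Toy data (illustrative values only).
nov_recs = {"u1": ["a", "b"], "u2": ["a", "c"]}
nov_degree = {"a": 10, "b": 2, "c": 4}
print(average_popularity(nov_recs, nov_degree))
```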

4. Experiments and Results

The recommendation algorithm based on triangle area was applied to two real online rating datasets. To facilitate comparison, some existing recommendation algorithms were also considered, including global ranking (GR) [31], user-based collaborative filtering (UCF) [14], item-based collaborative filtering (ICF) [14], mass diffusion (MD) [22], heat conduction (HC) [23], and CosRA-based filtering [38].

In GR, from the perspective of item degree, all items are sorted in descending order according to their degrees and then items with high degrees are recommended to the target user [31] at one time. In UCF, the recommendation list of the target user is obtained by analyzing the historical behavior of other users with similar interests and preferences [39]. Similarly, in ICF, by analyzing the past preferences of the target user, the similarity of the items is obtained and the recommendation list then can be obtained [40]. MD can be viewed as the resource allocation process in the user-object network [41, 42]. The CosRA-based index combines the cosine index with the RA index and a new similarity index, CosRA, is generated. Through the two steps of resource allocation processes, the uncollected items are sorted in descending order according to the size of the resource and recommended to the user from large to small [38]. The realization details of the previous five recommendation algorithms can be found in the survey report [9].

The results of the seven evaluation metrics are shown in Table 2. Because the Intrasimilarity values are small in both datasets, four decimal places are retained for all seven metrics so that changes in the Intrasimilarity value can be observed. This study focused mainly on improving recommendation diversity, and in this respect the proposed recommendation algorithm based on triangular area (TR) performs better than the other algorithms. This is discussed in two parts, as shown in Tables 2 and 3. First, comparing TR with the 5 recommendation algorithms other than HC, it can be observed that TR has the highest Hamming distance value, the lowest Intrasimilarity value, and the highest novelty. Although TR does not perform as well as other recommendation algorithms in the four accuracy metrics, it has the best performance in the three metrics that measure diversity and novelty.

By comparing TR and HC with other recommendation algorithms, it is evident that these two algorithms have better performance than other algorithms in terms of diversity and novelty. However, the performances of these two algorithms are inferior to other algorithms in terms of accuracy. Of the seven evaluation criteria, for the dataset MovieLens-100K, HC is only better than TR in terms of AUC value and the novelty value. TR is better than HC from the perspective of accuracy. For the dataset MovieLens-1M, the performance difference of HC and TR in diversity and novelty is not significant. However, the performance difference in accuracy is very evident. This is because HC studies the similarity between items from the perspective of degree and decides how to allocate resources accordingly. This is helpful in improving accuracy. There is no effective way to achieve the best performance in all the three aspects of accuracy, diversity, and novelty at the same time [27]. The latest study found that although the most popular method did not achieve very high accuracy [27], its diversity performance is outstanding.

From Tables 2-3, we can see intuitively that the performance of TR is much better than that of other recommendation algorithms in the two aspects of diversity and novelty.

5. Summary and Future Work

This study differs from research on traditional two-dimensional recommendation algorithms. Due to the rapid development of the Internet, the volume of available information has increased dramatically, and information overload has become an unavoidable problem that needs to be solved urgently. If too much attention is paid to the accuracy of the recommendation, users may find the recommendations unnecessary. Therefore, the focus of the present study was recommendation diversity. A “third dimension” was introduced alongside the two usual dimensions, and the relationship between each pair of dimensions in the three-dimensional model was studied and quantified as a weight. The three weights were then treated as the three sides of a triangle, and each triangle corresponded to an object. The area of the triangle was calculated using Heron’s formula. Finally, the recommendation list for the target user was obtained by sorting the triangle areas in descending order. In the second part of the investigation, the effectiveness of the three-dimensional recommendation was illustrated by the method of solving multivariate equations and by application of the “squeeze theorem,” which is used to decide whether a limit exists. The experimental results also showed that the proposed algorithm performs well in terms of recommendation diversity.

In the realization of the proposed recommendation algorithm, user behaviors are clustered and processed by combining k-means and a Markov chain in the user-object relationship. This makes the algorithm relatively complex. In future work, one focus will be to reduce the time complexity while keeping the effectiveness of the technique, and the recommendation performance of the model will be evaluated from the perspective of time, as in reference [21]. Due to the limited number of commonly used public datasets that are available to construct the triangle model, the experiments in this paper used only the MovieLens datasets. It is planned that more test data will be collected to validate the model. Furthermore, an investigation of the balance between accuracy and diversity will be helpful and is an anticipated focus of future work.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was funded partially by the Open Fund of Key Laboratory of the Ministry of Education (Grant 13zxzk01) and the Digital Media Science Innovation Team of CDUT (Grant 10912-kytd201510).