Abstract

With the rapid development of customer relationship management, more and more user recommendation technologies are being used to enhance customer satisfaction. Although many good recommendation algorithms exist, increasing their accuracy and diversity to match users’ preferences remains a challenge. In this paper, we construct a user recommendation model containing a new method to compute the similarities among users on bipartite networks. Unlike standard similarity measures, ours considers the influence of each object node, including popular degree, preference degree, and trust relationship. Substituting this new definition of similarity for the standard cosine similarity, we propose a modified collaborative filtering algorithm based on multifactors (CF-M). Detailed experimental analysis on two benchmark datasets shows that CF-M achieves high accuracy and also generates more diverse recommendations.

1. Introduction

With the rapid development of enterprise informatization, customer relationship management (CRM) has become an indispensable part of the supply chain, and entrepreneurs and scholars are showing growing interest in its applications. This trend is partly attributable to the availability of an overwhelming amount of customer transaction data and the necessary data-mining tools to obtain managerially useful insights [1]. CRM is a model for managing a company’s interactions with current and future customers. It aims to maximize the benefits gained from relationships with customers and to enhance the enterprise’s competitive power. The most important expected outcomes of CRM can be listed as follows [2]: improvements in efficiency, cost reduction, improved profitability, increases in sales, enhanced customer value, customer satisfaction, and improved customer loyalty.

Nowadays, customer satisfaction is becoming more crucial to researchers and practitioners alike. There are many approaches to enhancing customer satisfaction, such as setting lower prices, improving product quality, and providing better service. Specifically, with the rapid growth of e-commerce applications, the size and complexity of business websites are increasing quickly. For the users of these websites it becomes increasingly difficult and time consuming to find the information or products they are looking for. As a consequence, how to efficiently help users filter out unwanted information and find what is really useful for them is a challenging problem for customer service. Recommendation technologies are used to provide individual marketing decisions for each user. Their main task is to recommend good products to users, and their performance metric is the number of recommendations made to users until good products are recommended, as well as the number of users that are eventually satisfied. Crucially, they do not require detailed keywords provided by users. Instead, they use the users’ historical activities and possible personal profiles to uncover their preferences or potential interests. Today, some good recommendation technologies have been used to recommend books and CDs at Amazon.com, movies at Netflix.com, and news at VERSIFI Technologies [3].

With the development of recommendation technologies, various kinds of approaches have been proposed, including collaborative filtering (CF) [4, 5], content-based filtering [6, 7], K-Nearest Neighbor (K-NN) [8–11], the diffusion approach [12–14], and spectral analysis [15, 16]. CF is a class of information filtering techniques that predict what users will like according to their similarity to other users, based on collecting and analyzing a large amount of information on users’ behaviors, activities, or preferences. Content-based filtering selects items based on the correlation between the content of the items and the users’ preferences, as opposed to a collaborative filtering system, which chooses items based on the correlation between people with similar preferences. K-NN is one of the most commonly used algorithms for classifying objects based on the properties of their closest neighbors in the feature space. In K-NN, an object is classified through a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors. The diffusion approach applies a three-step mass diffusion starting from the target user on a user-object bipartite network. Spectral analysis is a new recommendation algorithm that relies on the singular value decomposition (SVD) of the rating matrix.

Because the CF method has been the most widely and successfully used in many applications, more and more scholars devote themselves to improving this technology. In this paper, we construct a user recommendation model and present a novel method to compute the similarities among users on bipartite networks. Compared with other standard similarity computation methods, the advantage of our method is that it takes into account the influence of each object node, including popular degree, preference degree, and trust relationship. Detailed numerical analysis on two benchmark datasets, MovieLens and Book-Crossing, indicates that our modified collaborative filtering based on multifactors (CF-M) outperforms the other algorithms compared. Specifically, it not only provides more accurate recommendations but also generates more diverse recommendations by precisely recommending less popular objects.

The main contribution of this paper is a better recommendation algorithm to improve the quality of individual customer service. The proposed algorithm achieves high accuracy and a certain degree of diversity and also addresses a problem in current CF algorithms, which consider few object factors in the process of similarity computation. Detailed experiments demonstrate the superiority of the proposed algorithm. A review of related work is given in Section 2. In Section 3, our recommendation model and modified collaborative filtering algorithm based on multifactors (CF-M) are described. Section 4 provides experimental results and analysis of the CF-M algorithm on two benchmark datasets. Finally, we draw conclusions in Section 5.

2. Related Work

Collaborative filtering is the most widely used technique to produce user-specific recommendations of items based on patterns of ratings or usage, without the need for exogenous information about either items or users. Since the method was proposed, many scholars have attempted to improve it to enhance the quality of recommendation results. As a result, many modified CF algorithms have been developed.

Liu et al. [17] proposed a novel method to compute the similarity between congeneric nodes on bipartite networks. They considered the influence of a node’s degree and then presented a modified collaborative filtering (MCF) to substitute the standard cosine similarity. Yang et al. [18] proposed an approach based on the fact that any two users might have some common interest genres as well as different ones. Different from most existing methods, this approach introduced a more reasonable similarity measure, considering users’ preferences and rating patterns. Zhao et al. [19] presented a shared collaborative filtering approach to alleviate the sparsity problem. The proposed approach leveraged the data from other parties to improve CF performance and did not compromise the privacy of other parties. Bobadilla et al. [20] provided a detailed formulation of their proposed method and an extensive set of experiments and comparative results, which showed the superiority of the designed collaborative filtering over traditional collaborative filtering in (a) the number of recommendations obtained, (b) the quality of the predictions, and (c) the quality of the recommendations. Liu et al. [21] proposed a sequence-based trust model based on users’ sequences of ratings on documents. The model considered two factors in computing the trustworthiness of users. It also enhanced the similarity of user profiles and was incorporated into a standard collaborative filtering method to discover trustworthy neighbors for making predictions. Kim et al. [22] proposed a collaborative approach to user modeling for enhancing personalized recommendations to users. Their approach first discovered some useful and meaningful user patterns and then enriched the personal model with collaboration from other similar users.

López-Nores et al. [23] presented a new strategy called property-based collaborative filtering (PBCF) to address problems of recommender systems by introducing a new filtering strategy centered on the properties that characterized the items and the users. Tsai and Hung [24] assessed the applicability of cluster ensembles to collaborative filtering recommendation. They used two well-known clustering techniques and three ensemble methods. The experimental results based on the MovieLens dataset showed that cluster ensembles could provide better recommendation performance than single clustering techniques in terms of recommendation accuracy and precision. Choi et al. [25] proposed a hybrid online-product recommendation method combining implicit rating-based collaborative filtering and sequential pattern analysis (SPA). Their research had two objectives: one was to derive implicit ratings so that CF could be applied to online transaction data even when no explicit rating information was available, and the other was to integrate CF and SPA to improve recommendation quality. Dao et al. [26] proposed a new recommendation model called Context-Aware Collaborative Filtering using genetic algorithm (CACF-GA) for location-based advertising (LBA), based on both users’ preferences and the interaction’s context. They first defined discrete contexts and then applied the concept of context similarity to conventional CF to create the context-aware recommendation model. Eckhardt [27] proposed a collaborative filtering model which could provide clear information about preferences and then used this model as the user similarity measure instead of the traditional ratings-based similarity. Kant and Bharadwaj [28] developed an effective content-based filtering (CBF) method by introducing an item representation scheme and fuzzy similarity measures and by incorporating collaborative diverse predictions to improve its recommendation diversity.

Lai et al. [29] proposed a hybrid personal trust model which adaptively combined the rating-based trust model and an explicit trust metric to resolve the drawback caused by insufficient past rating records; after that, they presented a recommendation method based on a hybrid model of personal and group trust to improve recommendation performance. Choi and Suh [30] proposed a new similarity function in order to select different neighbors for each different target item. In the new similarity function, the rating of a user on an item was weighted by the item similarity between the item and the target item.

The target of our work is to construct a recommendation model containing an effective method that gives users recommendation results with high accuracy and a certain degree of diversity and also improves the quality of customer service. The experimental results show that our method performs better than the other recommendation algorithms compared in Section 4. In addition, our research results can be applied to CRM improvement or e-commerce construction.

3. Modified Collaborative Filtering Recommendation Model

In this section, we introduce the similarity computation of the traditional collaborative filtering algorithm, construct an effective recommendation model, and derive our modified collaborative filtering based on multifactors (CF-M).

Figure 1 shows the framework of our proposed recommendation model. In this model, we use a new similarity computation method considering more factors to compute the similarity between target user and other users and then obtain the objects collected by the similar users but not by the target user. Finally, we generate a recommendation list made up of these objects and then recommend them to the target user.

There are several phases in this framework.

(1) Data Preprocessing. Data preprocessing is an important step in the recommendation model. Data-gathering methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, missing values, and so forth. Analyzing data that has not been carefully screened for such problems can produce misleading results. So, in this phase, we process the primary data to fulfill the requirements of the recommendation method before similarity computation. For example, we normalize the users’ ratings so that our proposed recommendation method can use these data directly. Some implicit evaluations do not indicate users’ preferences but cannot be ignored either, so we assign suitable values to them.

(2) Similarity Computation. This phase is the key procedure. First, we give some definitions. We assume that there is a recommendation model which consists of $m$ users and $n$ objects, and each user has selected some objects. The relationship between users and objects can be described by a bipartite network. Let $U = \{u_1, u_2, \ldots, u_m\}$ denote the user set and $O = \{o_1, o_2, \ldots, o_n\}$ denote the object set; the recommendation model can be fully described by an $m \times n$ adjacency matrix $A = \{a_{ij}\}$, where $a_{ij} = 1$ when object $o_j$ is selected by user $u_i$; otherwise, $a_{ij} = 0$. After that we use CF-M to compute the similarity between two users. The detailed process of this algorithm will be described in Section 3.2.

(3) Recommendation. In the previous phase, we use the CF-M algorithm to compute the similarity between the target user and the others based on the influence of each object node, including popular degree, preference degree, and trust relationship.

In this step, we calculate the comprehensive preference degree of each product unselected by the target user. The products are then sorted by comprehensive preference degree in descending order to compile a recommendation list, and the top $L$ products are recommended to the target user. In general, $L$ is no more than 100.
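As an illustration of the bipartite representation used in the similarity computation phase, the following minimal Python sketch builds a 0/1 adjacency matrix from a list of (user, object) selections and derives the user and object degrees from it. The function name `adjacency_matrix`, the row-per-user layout, and the toy data are assumptions made for this example only.

```python
def adjacency_matrix(selections, users, objects):
    """Dense 0/1 matrix with one row per user and one column per object.

    selections: iterable of (user, object) pairs meaning "user selected object".
    """
    chosen = set(selections)
    return [[1 if (u, o) in chosen else 0 for o in objects] for u in users]


# Toy data (hypothetical): three users and four objects.
pairs = [("u1", "o1"), ("u1", "o2"), ("u2", "o2"), ("u2", "o3"), ("u3", "o4")]
A = adjacency_matrix(pairs, ["u1", "u2", "u3"], ["o1", "o2", "o3", "o4"])

# Degree of a user: number of objects selected (row sum).
user_degree = [sum(row) for row in A]
# Degree of an object: number of users who selected it (column sum).
object_degree = [sum(col) for col in zip(*A)]
```

A dense list of lists is used only for readability; for datasets of the size used in Section 4 a sparse representation would be the more practical choice.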

3.1. Similarity Computation of Traditional Collaborative Filtering

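Traditional collaborative filtering usually adopts the standard cosine similarity or the Pearson correlation to compute the similarity between two users. For arbitrary users $u_i$ and $u_j$, the number of common objects shared by them can be expressed as

$$c_{ij} = \sum_{l=1}^{n} a_{il} a_{jl}, \tag{1}$$

where $n$ is the number of objects and $a_{il} = 1$ if user $u_i$ has selected object $o_l$ and $a_{il} = 0$ otherwise.

For the standard cosine similarity, let $s_{ij}$ denote the similarity between $u_i$ and $u_j$ and let $k(u_i)$ denote the degree of user $u_i$, namely, how many objects are collected by this user. We can then formulate the expression as

$$s_{ij} = \frac{c_{ij}}{\sqrt{k(u_i)\,k(u_j)}} = \frac{1}{\sqrt{k(u_i)\,k(u_j)}} \sum_{l=1}^{n} a_{il} a_{jl}. \tag{2}$$

The problem with (2) is that it does not take into account the influence of an object’s degree, so that objects with different degrees make the same contribution to the similarity. If users $u_i$ and $u_j$ have both selected object $o_l$, then they are treated as having a similar preference for object $o_l$, regardless of how popular $o_l$ is.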
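For binary selection vectors, the standard cosine similarity is the number of shared objects divided by the square root of the product of the two user degrees. The following minimal Python sketch computes it for two rows of a 0/1 user-object matrix; the function names and the zero-degree convention (returning 0.0) are assumptions of this example.

```python
import math


def common_objects(row_i, row_j):
    """Number of objects selected by both users (rows are 0/1 lists)."""
    return sum(a * b for a, b in zip(row_i, row_j))


def cosine_similarity(row_i, row_j):
    """Standard cosine similarity between two users' selection vectors."""
    k_i, k_j = sum(row_i), sum(row_j)  # user degrees
    if k_i == 0 or k_j == 0:
        return 0.0  # assumption: users with no selections have similarity 0
    return common_objects(row_i, row_j) / math.sqrt(k_i * k_j)


# Two users who share one object out of two selections each.
u1 = [1, 1, 0, 0]
u2 = [0, 1, 1, 0]
similarity = cosine_similarity(u1, u2)
```

Note that every shared object adds exactly 1 to the numerator, whatever its popularity, which is the limitation discussed above.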

3.2. Modified Collaborative Filtering Recommendation Method

In a real recommender system, the similarity between two users is not simple to compute but is influenced by many factors. So we need to improve the traditional collaborative filtering method to fit these complex conditions. Generally, the similarity between two users should be related to their degrees, preference degree, and trust relationship. According to these features, we propose a modified collaborative filtering algorithm based on multifactors (CF-M), which includes all the factors mentioned previously, to increase the accuracy and diversity of the recommendation results.

By analyzing these factors, we can conclude that each object node’s degree and preference degree are related to its popularity and to the corresponding users’ comments or ratings, respectively. For preference degree computation, users’ comments must first be quantified before the degrees of preference can be distinguished; in other cases, we can divide the degrees of preference directly. Trust relationship is derived from two users’ past ratings on corated products by adopting Hwang and Chen’s [31] trust computation method. In other words, trust relationship relates to users’ evaluations or scores. Generally, a recommender is more trustworthy if he or she has contributed more precise predictions than other users have.

We assume that the similarity computation on user-object bipartite networks is affected by an influence degree that is proportional to $k^{\lambda}(o_l)$, with $\lambda$ being a freely adjustable parameter. Accordingly, the contribution of object $o_l$ to the similarity should be negatively correlated with its degree and positively correlated with its preference degree and trust relationship. This means that it is not very meaningful if two users both select a popular object, while if a very unpopular object is simultaneously selected by two users, there must be some common tastes shared by these two users.

So we suppose that the contribution of object $o_l$ to $s_{ij}$ is inversely proportional to $k^{\lambda}(o_l)$ and directly proportional to its preference degree and trust relationship. The formulation of $s_{ij}$ can be expressed as

$R$ is the range of the rating score, which equals the difference between the maximum and minimum rating scores. $p_{il}$ represents the preference degree that object $o_l$ obtained from user $u_i$. $k(o_l)$ denotes the degree of object $o_l$, namely, how many users select this object. $k(u_i)$ denotes the degree of user $u_i$, namely, how many objects are selected by this user.

Following the user similarity computation, we can find the products unselected by the target user but selected by other users who are highly similar to the target user. Then we can predict the comprehensive preference degree of these products. Let $v_{ij}$ represent the comprehensive preference degree of object $o_j$ obtained from target user $u_i$. The formulation of $v_{ij}$ can be expressed as

In the process of recommendation, we take the values $v_{ij}$ of the objects uncollected by target user $u_i$ and sort them in descending order. Since the target user is expected to prefer the objects at the top, we recommend the top $L$ objects to this user.
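The ranking step can be sketched in Python as follows. The scoring rule used here, a similarity-weighted average of the other users’ selections, is the form commonly used in user-based CF and is an assumption of this example rather than the exact CF-M expression; the names `predict_scores` and `recommend_top` and the toy similarity matrix are likewise hypothetical.

```python
def predict_scores(sim, A, target):
    """Score every object for `target` as a similarity-weighted average.

    ASSUMED form: score_j = sum_l sim[target][l] * A[l][j] / sum_l sim[target][l],
    with l running over all users other than the target.
    """
    others = [l for l in range(len(A)) if l != target]
    total = sum(sim[target][l] for l in others)
    n_objects = len(A[0])
    if total == 0:
        return [0.0] * n_objects
    return [sum(sim[target][l] * A[l][j] for l in others) / total
            for j in range(n_objects)]


def recommend_top(sim, A, target, L):
    """Indices of the top-L objects the target user has not selected yet."""
    scores = predict_scores(sim, A, target)
    unselected = [j for j, a in enumerate(A[target]) if a == 0]
    unselected.sort(key=lambda j: scores[j], reverse=True)  # descending
    return unselected[:L]


# Toy example: 3 users x 4 objects and a hand-written similarity matrix.
A = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
top = recommend_top(sim, A, target=0, L=1)
```

Any user-user similarity matrix can be passed as `sim`, so the same ranking code works for the standard cosine similarity and for a multifactor similarity.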

The pseudocode of the CF-M algorithm is shown in Pseudocode 1.

Algorithm CF-M: Calculating the similarity between users
begin
 get ;
= size , = size ;
 parameter ;
 preference degree ;
 range of the rating score ;
= , = sum , = sum ;
 for
  for
    ;
   for
     ,
   end
    ;
    ;
  end
 end
End
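To make the multifactor idea concrete, the following Python sketch computes a degree-penalized, rating-aware user similarity. It is one possible reading of the description in this section and should not be taken as the CF-M formula itself. It assumes that each co-selected object contributes an agreement term, 1 minus the absolute rating difference divided by the rating range, divided by the object degree raised to a tunable exponent, and that the sum is normalized by the square root of the product of the user degrees. The agreement term is a hypothetical stand-in for the preference degree and trust relationship factors, and all names are chosen for illustration.

```python
import math


def multifactor_similarity(ratings, lam, rating_range):
    """Sketch of a multifactor user similarity (ASSUMED form, not the paper's).

    ratings: dict mapping user -> {object: rating}.
    lam: exponent penalising popular objects (0 disables the penalty).
    rating_range: maximum rating minus minimum rating.
    Returns a dict mapping (user_a, user_b) -> similarity, symmetric.
    """
    # Object degree: how many users selected each object.
    object_degree = {}
    for items in ratings.values():
        for obj in items:
            object_degree[obj] = object_degree.get(obj, 0) + 1

    users = sorted(ratings)
    sim = {}
    for idx, u in enumerate(users):
        for v in users[idx + 1:]:
            shared = ratings[u].keys() & ratings[v].keys()
            total = 0.0
            for obj in shared:
                # Hypothetical agreement term: 1 when ratings match exactly.
                gap = abs(ratings[u][obj] - ratings[v][obj])
                agreement = 1.0 - gap / rating_range
                # Unpopular objects (small degree) contribute more.
                total += agreement / object_degree[obj] ** lam
            norm = math.sqrt(len(ratings[u]) * len(ratings[v]))
            sim[(u, v)] = sim[(v, u)] = total / norm if norm else 0.0
    return sim


toy = {"u1": {"o1": 5, "o2": 4}, "u2": {"o2": 4, "o3": 3}, "u3": {"o2": 2}}
sim = multifactor_similarity(toy, lam=1.0, rating_range=4)
```

With the exponent set to 0 and identical ratings on all shared objects, this sketch reduces to the standard cosine similarity, which is a convenient sanity check.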

3.3. Recommendation Performance Metrics

In this paper, we adopt some standard metrics to measure the accuracy and diversity of the proposed method, in which accuracy is the most important aspect in evaluating the recommendation algorithmic performance.

We use five metrics: ranking score, precision, recall, intrasimilarity, and Hamming distance. The first three are used to test accuracy and the last two are used to test diversity. The detailed descriptions of these metrics are as follows.

(1) Ranking score. For an arbitrary user $u_i$, suppose that object $o_j$ is in the test set (so that, according to the training set, $o_j$ is an unselected object for $u_i$) and is ranked in position $l_{ij}$ in the ordered recommendation list of $u_i$, which has length $N_i$. We can formulate the expression as $r_{ij} = l_{ij} / N_i$. For example, if $N_i = 200$, namely, there are 200 unselected objects for $u_i$, and $o_j$ is the 10th from the top, we say that the position of $o_j$ is 10/200, denoted by $r_{ij} = 0.05$. The average of $r_{ij}$ over all user-object pairs in the test set defines the average ranking score $\langle r \rangle$, which can be used to evaluate the algorithmic accuracy. The smaller the ranking score is, the higher the algorithmic accuracy is.

(2) Precision is defined as the ratio of the number of recommended objects collected by users appearing in the test set to the total number of recommended objects. This measure is used to evaluate the effectiveness of a given recommendation list. The precision can be formulated as $P = N_{rs} / N_r$, in which $N_{rs}$ represents the number of recommended products collected by users appearing in the test set and $N_r$ is the total number of recommended products.

(3) Recall is defined as the ratio of the number of recommended objects collected by users appearing in the test set to the total number of objects actually collected by users. A larger recall corresponds to better performance. The recall can be formulated as $\mathrm{Recall} = N_{rs} / N_s$, in which $N_{rs}$ represents the number of recommended products collected by users appearing in the test set and $N_s$ is the total number of products actually bought by users.

(4) Intrasimilarity evaluates the similarities between objects inside users’ recommendation lists. A good recommendation algorithm is expected to give fruitful recommendation results and to help users explore their potential fields of interest. Therefore, it calls for a lower intrasimilarity. There are many similarity metrics between objects. Here we adopt the widely used cosine similarity to measure objects’ similarity. For two objects $o_i$ and $o_j$, their similarity is defined as

$$s^{o}_{ij} = \frac{1}{\sqrt{k(o_i)\,k(o_j)}} \sum_{l=1}^{m} a_{li} a_{lj},$$

where $k(o_i)$ is the degree of object $o_i$, $m$ is the number of users, and $a_{li} = 1$ if user $u_l$ has selected object $o_i$ and $a_{li} = 0$ otherwise. For an arbitrary user $u_l$, the number of recommended objects is $L$. We first calculate the similarity of each of the $L(L-1)/2$ pairs of objects and then average these values to get $I_l$. Finally, we use the mean value of $I_l$ over all users to measure the diversity within recommendation lists.

(5) Hamming distance measures the strength of personalization. If the number of overlapped objects in the recommendation lists of $u_i$ and $u_j$ is $Q_{ij}$, their Hamming distance is

$$H_{ij} = 1 - \frac{Q_{ij}}{L},$$

where $L$ is the length of each recommendation list. Generally speaking, a more personalized recommendation list should have long Hamming distances to other lists. Accordingly, we use the mean value of the Hamming distance, $\langle H \rangle$, averaged over all the user-user pairs, to measure the strength of personalization.
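The five metrics can be sketched for a single user (or a single pair of users) as follows. This is a minimal illustration under stated assumptions: recommendation lists are Python lists of object identifiers, the object similarity is supplied as a callable, and all function names are hypothetical.

```python
from itertools import combinations


def ranking_score(ordered_unselected, obj):
    """Relative position of a test object: 1-based rank / list length."""
    return (ordered_unselected.index(obj) + 1) / len(ordered_unselected)


def precision_recall(recommended, test_objects):
    """Precision = hits / #recommended, recall = hits / #test objects."""
    hits = len(set(recommended) & set(test_objects))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(test_objects) if test_objects else 0.0
    return precision, recall


def intra_similarity(recommended, object_sim):
    """Mean similarity over all unordered pairs in one recommendation list."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    return sum(object_sim(a, b) for a, b in pairs) / len(pairs)


def hamming_distance(list_i, list_j):
    """1 - overlap / L for two recommendation lists of equal length L."""
    overlap = len(set(list_i) & set(list_j))
    return 1.0 - overlap / len(list_i)


# The example from the text: 10th position among 200 unselected objects.
example_rank = ranking_score(list(range(200)), 9)
```

Averaging `ranking_score` over all test pairs, `intra_similarity` over all users, and `hamming_distance` over all user-user pairs gives the aggregate values used to compare algorithms.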

4. Experimental Results and Analysis

To test the recommendation algorithmic performance, we use two benchmark datasets. The MovieLens [32] dataset consists of 1682 movies and 943 users. Each user has rated at least 20 movies by using a discrete number in the scale of 1 to 5. The original data contains 100,000 ratings. In the dataset, there are three kinds of information tables: demographic information about the users, information about the items (movies), and the score about the movies. The Book-Crossing dataset [33] contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit/implicit) about 271,379 books. Ratings (Book-Rating) are either explicit, expressed on a discrete number in the scale of 1 to 10 (higher values denoting higher appreciation), or implicit, expressed by 0.

In our experiment, we need to preprocess the datasets. For MovieLens, only the links with ratings no less than 3 are considered and . For Book-Crossing, only the links with ratings no less than 5 or equal to 0 are considered and . We divide each processed dataset into two parts: the training set which contains 80% of the data and the remaining 20% of the data for the test.
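The preprocessing and splitting described above can be sketched in Python as follows. The rating thresholds come from the text; the (user, object, rating) triple layout, the function names, and the use of a seeded random shuffle to form the 80%/20% split are assumptions of this example.

```python
import random


def filter_links(ratings, keep):
    """Keep the (user, object, rating) triples whose rating satisfies `keep`."""
    return [triple for triple in ratings if keep(triple[2])]


def split_links(links, train_fraction=0.8, seed=0):
    """Randomly split links into a training part and a test part."""
    shuffled = list(links)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def movielens_rule(rating):
    """MovieLens: keep links with ratings no less than 3."""
    return rating >= 3


def book_crossing_rule(rating):
    """Book-Crossing: keep links with ratings no less than 5 or equal to 0."""
    return rating >= 5 or rating == 0


# Toy ratings (hypothetical): 20 triples with ratings cycling through 1..5.
toy = [("u%d" % (i % 3), "o%d" % i, 1 + i % 5) for i in range(20)]
links = filter_links(toy, movielens_rule)
train, test = split_links(links)
```

Fixing the seed makes the split reproducible, which matters when several algorithms are compared on the same training and test sets.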

Firstly, we need to predict the range of the optimal values of the parameter $\lambda$ in order to reduce the computational cost of determining the optimal value. According to some works in the literature on CF approaches, we predict that the optimal value of $\lambda$ lies in the range of 1 to 2. To find the optimal value rapidly, we execute an iterative computation based on the strategy of binary search, with the step between successive values of $\lambda$ set to 0.01. All these computational definitions and steps lead to lower computational costs. Figure 2 shows the algorithmic accuracy, measured by the ranking score, as a function of $\lambda$. We note that for the two benchmark datasets the best performance of this algorithm occurs around . Certainly, people can adjust the parameter’s value by themselves in practice.
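A simple way to locate the best parameter value within the predicted range is sketched below. For clarity it uses a plain exhaustive scan over a grid with step 0.01 rather than the binary-search strategy mentioned above, and the `evaluate` callable, which should return the average ranking score for a given parameter value, is a hypothetical placeholder.

```python
def scan_parameter(evaluate, low=1.0, high=2.0, step=0.01):
    """Return the grid value in [low, high] that minimises evaluate(value).

    evaluate: callable mapping a parameter value to a score where lower is
    better (for example, the average ranking score on the test set).
    """
    n_steps = int(round((high - low) / step))
    # Rounding removes floating-point drift such as 1.2300000000000002.
    grid = [round(low + i * step, 10) for i in range(n_steps + 1)]
    return min(grid, key=evaluate)


# Toy objective (hypothetical) with its minimum at 1.5.
best = scan_parameter(lambda value: (value - 1.5) ** 2)
```

The scan needs 101 evaluations on this range; a search that exploits a single-minimum shape of the curve would need far fewer, at the cost of assuming that shape.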

Recall and precision can be used to balance two competing factors: cost and efficiency. The efficiency can be improved by increasing the number of recommended products; however, the cost increases at the same time. That is to say, the cost can be decreased by reducing the number of recommendations, but the efficiency may decrease correspondingly. At a certain length $L$ of the recommendation list, precision tests whether the cost is deserved or necessary, while recall tests whether the efficiency is sufficient. Based on these two measures, one can find a certain $L$ as a tradeoff between cost and efficiency. In general, the number of recommended products $L$ is no more than 100.

Figure 3 shows the precision and recall for different values of the parameter $\lambda$. The figure shows that the algorithm reaches the highest precision and maximum recall when the parameter $\lambda = 1.86$ for the MovieLens dataset. In addition, the precision reaches a good level if and so does the recall.

For the Book-Crossing dataset, Figure 4 shows that the algorithm reaches the highest precision and maximum recall when the parameter is also equal to 1.86. Furthermore, the precision and recall reach a good level if .

After that, we compare CF-M with three other widely used recommendation algorithms: CF, MCF, and NBI [34] in all five metrics. Different from CF and MCF, NBI is a diffusion algorithm based on homogeneous diffusion process on networks; that is, each object distributes its resource to its neighbors equally. In addition, it has been demonstrated to be more accurate than the classical CF algorithm, with lower computational complexity. We summarize the algorithmic performance in Table 1 for MovieLens and Table 2 for Book-Crossing.

Compared with the standard CF, as is seen in Table 1 in the condition of recommendation number , CF-M reduces the ranking score by 23.2%, and compared with MCF it reduces the ranking score by 12.7%. Similarly, our algorithm has a lower ranking score than the NBI algorithm. For the rest of the metrics, our algorithm is also the best. Although the Book-Crossing dataset is similar to MovieLens, it is much sparser, so we set the number of recommended products to no less than 50.

Table 2 shows that our algorithm outperforms the other three algorithms on all five criteria: lower ranking score, higher precision, higher recall, lower intrasimilarity, and larger Hamming distance.

CF-M algorithm adjusts the accuracy via parameter . When , the comprehensive preference degree of each product is inversely proportional to and directly proportional to . Thus, the algorithm tends to recommend popular products to users. But it is not what people want. On the other hand, when , the comprehensive preference degree of each product is inversely proportional to and directly proportional to . In this case, the algorithm tends to recommend unpopular products to users. The experimental results show that it is more suitable to recommend unpopular and reliable products to users in fact. Finally, for an online recommender system, we need to consider the processing time and memory consumption of its recommendation algorithm. If we denote and by the average degree of users and objects on the bipartite network, the computational complexity of CF-M is and the memory store is , which is the same as CF and MCF. For NBI algorithm, its computational complexity is and the memory store is . When the number of objects is much larger than the number of users, the CF-M may be more practicable; otherwise NBI may be more practicable. Although the accuracy and diversity of the proposed method CF-M are increased, it is still confronted with the cold-start problem.

5. Conclusions

A recommendation model predicts users’ potential future likes and interests by using data on their past preferences. An excellent recommendation algorithm achieves high accuracy and a certain degree of diversity and can enhance the quality of personalized service. Since the collaborative filtering approach was proposed, it has attracted much attention for its convenience, high accuracy, and diversity as well as its low computational complexity.

In this paper, we construct an effective recommendation model in order to improve current recommender systems for better customer service. We analyze the collaborative filtering algorithm and propose a modified one based on multiple influence factors. We compute the similarity between two users on a bipartite network. Compared with other standard similarity computation methods, the key advantage of our method is that it takes into account the influence of each object node, including popular degree, preference degree, and trust relationship. All the factors are governed by a parameter whose value is obtained by optimal value calculation. Certainly, people can adjust the parameter’s value according to actual requirements. Detailed numerical analysis on two benchmark datasets, MovieLens and Book-Crossing, indicates that the presented algorithm achieves high accuracy and also generates a certain degree of diversity.

Concerning future work, we will improve our recommendation model and pay more attention to algorithmic structure. The research will cover the following aspects: how to compute the similarity while taking user context factors into account; how to introduce other technologies to improve the accuracy and diversity; how to keep the recommendation algorithm robust under hostile attack; and how to alleviate the influence of the cold-start problem.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grants nos. 71071140 and 71071141), the Natural Science Foundation of Zhejiang Province (Grant no. LQ12G01007), and the Ministry of Education, Humanities and Social Sciences project (Grant no. 13YJCZH216).