Research on E-Commerce Platform-Based Personalized Recommendation Algorithm
To address data sparsity and timeliness problems in traditional E-commerce collaborative filtering recommendation algorithms, this paper exploits the fact that commodities in an E-commerce system belong to different category levels: when the user-item rating matrix is constructed, nonrated items are filled in by calculating the RF/IRF of the commodity’s corresponding level. In the recommendation prediction stage, to account for the timeliness of the recommendation system, a time-weighted prediction formula is adopted, and a personalized recommendation model is designed by integrating the level filling method with rating time. Experimental results on a real dataset verify the feasibility and validity of the algorithm, which achieves higher prediction accuracy than the recommendation algorithms it is compared with.
With the rapid development of the Internet and the continuous expansion of E-commerce, the number and variety of commodities are growing quickly. Merchants offer numerous commodities through shopping websites, and customers often spend a large amount of time finding the ones they want. Having to browse large amounts of irrelevant information and products drives consumers away because of information overload. In the E-commerce age, users need an electronic shopping assistant that can recommend commodities likely to interest or satisfy them, according to their interests and hobbies. Personalized recommendation systems have emerged to solve these problems.
Personalized recommendation suggests information and commodities to users according to their interests and purchasing behaviors. A personalized recommendation system is an advanced business intelligence platform built on the mining of massive datasets; it aims to help E-commerce websites provide fully personalized decision-making support and information services for customer purchases. E-commerce platform-based personalized recommendation technology has been widely discussed in academia and industry. Recommendations are usually based on factors such as the website's best-selling commodities, the user's city, and the user's past purchase behaviors and purchase history, which are used to predict the user's likely purchase behaviors.
Traditional collaborative filtering (CF) algorithms suffer from data sparsity and cold start problems. With the rapid development of network technology, personalized recommendation in the E-commerce environment faces new challenges: faster timeliness, higher accuracy, and stronger user personalization. A major feature of this setting is that the influence of the real-time situation must be considered. On the basis of traditional collaborative filtering algorithms, this paper adds three innovations: (1) a more suitable method for filling in data for nonrated commodities; (2) a time factor that gives high weight to data close to the evaluation time and low weight to data far from it; and (3) an exploration of how the number of nearest neighbors influences recommendation accuracy, in order to obtain the optimal nearest-neighbor set. These changes improve the prediction accuracy of the algorithm and better satisfy users’ needs for personalized services.
The remainder of this paper is organized as follows. Section 2 provides an overview of related work at home and abroad. Section 3 introduces the key technology of the E-commerce recommendation system. Section 4 presents the experimental dataset, evaluation metrics, experimental scheme, and experimental results with their analysis. Section 5 gives the conclusion and future work.
2. Related Work
With the continuous improvement of E-commerce platforms, E-commerce personalized recommendation has gradually developed into a mature system, and academia and E-commerce enterprises have paid increasing attention to it. At present, many large-scale websites at home and abroad provide recommendation functions for users, and many prototypes of personalized recommendation systems have emerged and achieved good results in application. Representative recommendation systems are shown in Table 1.
Using the various social relations found in social networking services for recommendation has made great progress and has become a hot topic in personalized recommendation research. Bonhard and Sasse studied the influence of social background on recommendation results and showed that, when purchasing commodities, users tend to accept recommendations from acquaintances. Sinha and Swearingen carried out experiments with multiple online recommendation systems, and the results indicated that, when both online systems and friends provided recommendations, users tended to choose the latter. Caverlee et al. constructed a trust framework from online social networking services that uses trust and feedback information to generate recommendation lists with higher precision. Adomavicius and Tuzhilin proposed a multidimensional recommendation algorithm and pointed out that recommendation feature dimensions need to be added according to specific conditions. Nguyen et al. proposed GPFM, a nonlinear probabilistic algorithm for context-aware recommendation. The algorithm uses Gaussian processes in the recommendation process and can exploit not only explicit feedback but also implicit feedback; gradient descent is used for optimization, which improves the scalability of the model. Another paper compares four main recommendation technologies and reviews hot research topics in E-commerce personalized recommendation. A further study proposes personalized product recommendations based on preference similarity, recommendation trust, and social relations.
Many domestic scholars have also carried out thorough studies of personalization in E-commerce. Huang and Benyoucef reviewed the literature on E-commerce personalized recommendation, explained the concept of social commerce, discussed the relevant design characteristics of social commerce and E-commerce, and put forward a new model and a set of principles to guide the design of social commerce. Li et al. proposed an E-commerce personalized recommendation algorithm that integrates commodity similarity, recommendation trust, and social relations. Their experimental results indicate that social relations in social networking can be used to improve the accuracy of recommendation algorithms. Zhang and Liu proposed a personalized recommendation algorithm integrating trust relationships and time series. Another paper proposed a social networking recommendation algorithm that integrates several kinds of context information. On the basis of users’ geographical location and time information, this algorithm explores the social relations of potential users in depth and helps users find other users with similar preferences. Recommendations are then made by combining the social relations of mobile users, which effectively improves recommendation accuracy. In another study, the authors comprehensively considered the influences of user preference, geographical convenience, and friends and put forward a group purchase discount coupon recommendation system to promote location-sensitive commodities.
A closer analysis of the abovementioned algorithms shows that existing personalized recommendation algorithms still have many deficiencies: preference models scale poorly; they cannot adapt to dynamic changes in datasets, so little usable time information is available; and they do not solve the cold start problem well. To address these problems, and taking into account factors such as the timeliness of the recommendation system, this paper adopts a time-weighted prediction formula and gives different weights to rating data according to rating time, so as to improve the recommendation quality of the E-commerce recommendation system.
3. Key Technology of Recommendation System
To better address data sparsity and the rating time factor, this paper adopts a level (hierarchical) filling method to predict nonrated items and then incorporates time weights in the recommendation prediction stage to improve the recommendation accuracy of the algorithm.
3.1. Hierarchical Filling Method
The traditional collaborative filtering (CF) algorithm sets nonrated items to the average or to a fixed value, for example, 3 on a rating scale of 1 to 5, as shown in Table 2. Three is chosen because it is the middle rating of the 1–5 scale. This approach ignores user preference and simply uses the midpoint as the predicted rating, even though different users rate items differently. The method has the advantage of simplicity, but it cannot solve the problems that traditional collaborative filtering faces with sparse user-item matrices.
To reduce the sparsity of the rating matrix, this paper adopts a level filling method to construct the rating matrix. On E-commerce websites, each commodity belongs to a category, which in turn has a parent category. In other words, commodities in E-commerce are organized into levels, and different commodities belong to different hierarchies, as shown in Figure 1. The level of a commodity is taken into account when the rating matrix is constructed: for each level, a prediction rating is calculated and filled in for the commodities under it. This paper combines traditional classification methods with user rating data to compute a preliminary rating for the data that the target users have not rated, and this method is used to construct a new user-item rating matrix.
For rated data, ratings are grouped by the category to which the item belongs. When the rating matrix is constructed by collaborative filtering, the Rated Frequency (RF) of each category is calculated as follows:
Item Rated Frequency (IRF) represents the weight of rated items and is calculated as follows:
where the two quantities involved are the total amount of rated data together with automatically filled prediction ratings, and the total number of automatically filled prediction ratings.
This paper proposes a user-item rating matrix construction algorithm that automatically fills in ratings for nonrated data in categories whose RF/IRF is greater than a threshold. The design flow of the hierarchical filling method is as follows.
Input: the initial user-item rating matrix.
Output: the user-item rating matrix after the prediction ratings are filled in.
Step 1. Calculate RF/IRF for each category in the initial matrix.
Step 2. For categories whose RF/IRF is greater than the threshold, fill nonrated entries with the average item rating.
Step 3. For new items in the rating matrix, automatically fill in 3, which yields the final user-item rating matrix.
By calculating RF/IRF for each item category, this paper fills in scores only for items in the top-ranked categories instead of simply filling the whole dataset with a fixed value, which reduces data sparsity. Finally, the algorithm automatically fills new items with 3, aiming to solve the cold start problem for new items.
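The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the RF formula is not reproduced in this text, so the sketch assumes RF is the share of a category's user-item cells that carry a rating, it leaves IRF out entirely, and the names `hierarchical_fill`, `threshold`, and `default` are hypothetical.

```python
from collections import defaultdict


def hierarchical_fill(ratings, item_category, users, items,
                      threshold=0.5, default=3.0):
    """Return (filled_ratings, rf) for a sparse {(user, item): rating} dict."""
    # ASSUMPTION: RF of a category = rated cells / all cells in that category.
    rated, cells = defaultdict(int), defaultdict(int)
    for item in items:
        for user in users:
            cells[item_category[item]] += 1
            if (user, item) in ratings:
                rated[item_category[item]] += 1
    rf = {cat: rated[cat] / cells[cat] for cat in cells}

    # Mean observed rating per item; items nobody has rated count as "new".
    totals, counts = defaultdict(float), defaultdict(int)
    for (_, item), value in ratings.items():
        totals[item] += value
        counts[item] += 1

    filled = dict(ratings)
    for item in items:
        for user in users:
            if (user, item) in filled:
                continue  # real ratings are never overwritten
            if counts[item] == 0:
                filled[(user, item)] = default  # Step 3: new item gets 3
            elif rf[item_category[item]] > threshold:
                filled[(user, item)] = totals[item] / counts[item]  # Step 2
    return filled, rf
```

Cells in categories at or below the threshold stay empty, so the matrix becomes denser only where the category has enough rating evidence to make the item average meaningful.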
3.2. Improvement of Recommendation Timeliness
The CF algorithm does not take the influence of time on rating data into account: it treats ratings given by different users at different moments equally. Users' interests and preferences change dynamically over time, so the time at which different users are interested in the same item differs. However, if their ratings are the same, they are likely to be regarded as similar neighbor users, which degrades recommendation quality. This paper therefore introduces a time function, shown as follows:
where the time variable is the time at which a user shows interest in an item. The time function is monotonically decreasing: its value falls as a rating grows older, and the time weight stays within a fixed range. That is, the newer the data, the greater the weight and the larger the value of the time function. In this way, the influence of time on recommendation quality is taken into account.
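To make the idea of a decreasing time weight concrete, here is a minimal sketch. The paper's own time function is not reproduced in this text, so the exponential decay below is a stand-in chosen only because it decreases with the age of a rating; `time_weight`, `now`, and `decay` are hypothetical names and parameters.

```python
import math


def time_weight(rating_time, now, decay=0.01):
    """Weight of a rating made at rating_time, evaluated at time now."""
    # ASSUMPTION: exponential decay in the age of the rating. This is an
    # illustrative stand-in, not the time function used in the paper.
    age = max(0.0, now - rating_time)
    return math.exp(-decay * age)
```

With this choice a rating made at the evaluation time has weight 1, and older ratings have weights that shrink toward 0, so newer data always counts more.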
3.3. Time Weighted Based Prediction Function
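The CF algorithm predicts the rating of item $i$ by the current user $u$ according to the rating similarity of users or items, as follows:
$$P_{u,i} = \bar{R}_u + \frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,\left(R_{v,i} - \bar{R}_v\right)}{\sum_{v \in N(u)} \left|\mathrm{sim}(u,v)\right|}, \tag{4}$$
where $\mathrm{sim}(u,v)$ represents the rating similarity between users $u$ and $v$, and $N(u)$ denotes the nearest-neighbor set of user $u$. Equation (5) calculates the similarity between users $u$ and $v$ by Pearson’s correlation coefficient:
$$\mathrm{sim}(u,v) = \frac{\sum_{i \in I_{uv}} \left(R_{u,i} - \bar{R}_u\right)\left(R_{v,i} - \bar{R}_v\right)}{\sqrt{\sum_{i \in I_{uv}} \left(R_{u,i} - \bar{R}_u\right)^2}\,\sqrt{\sum_{i \in I_{uv}} \left(R_{v,i} - \bar{R}_v\right)^2}}, \tag{5}$$
where $I_{uv}$ represents the set of items rated by both users $u$ and $v$; $R_{u,i}$ and $R_{v,i}$, respectively, represent the scores given to item $i$ by users $u$ and $v$; and $\bar{R}_u$ and $\bar{R}_v$, respectively, represent the mean scores of all items rated by users $u$ and $v$.
The time function is added to (4), giving the improved time-weighted prediction rating of an item for the target user:
Here, the time function is the one shown in (3), and each rated item has exactly one weight. Recent ratings are given large weights and older ratings are given small weights, which helps predict more accurately.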
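The user-based prediction and the Pearson similarity described above can be sketched as follows. This is a minimal illustration of the standard mean-centered form, which is assumed here to match the description; the function names and the nested-dict data layout are hypothetical, and the default of 40 neighbors simply mirrors the value the experiments later report as best.

```python
import math


def mean_rating(ratings, user):
    """Mean of all ratings given by one user."""
    values = ratings[user].values()
    return sum(values) / len(values)


def pearson(ratings, u, v):
    """Pearson correlation over items rated by both u and v."""
    common = set(ratings[u]) & set(ratings[v])
    mean_u, mean_v = mean_rating(ratings, u), mean_rating(ratings, v)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    return num / (den_u * den_v) if den_u and den_v else 0.0


def predict(ratings, u, item, k=40):
    """Predict u's rating of item from the k most similar users who rated it."""
    sims = [(pearson(ratings, u, v), v)
            for v in ratings if v != u and item in ratings[v]]
    top = sorted(sims, key=lambda pair: pair[0], reverse=True)[:k]
    den = sum(abs(s) for s, _ in top)
    if den == 0:
        return mean_rating(ratings, u)  # no usable neighbors: fall back to mean
    num = sum(s * (ratings[v][item] - mean_rating(ratings, v)) for s, v in top)
    return mean_rating(ratings, u) + num / den
```

Here `ratings` maps each user to a dict of `{item: rating}`. The prediction is not clipped to the 1 to 5 scale; a production system would likely clamp it.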
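A small sketch may help show how a time weight changes a neighborhood prediction. The exact form of the paper's weighted formula is not reproduced in this text, so this sketch makes an explicit assumption: each neighbor's mean-centered rating is multiplied by its time weight in the numerator, while the denominator remains the sum of absolute similarities. The function name and the tuple layout are hypothetical.

```python
def predict_time_weighted(user_mean, neighbors):
    """Time-weighted prediction from precomputed neighbor data.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean, time_weight).
    """
    # ASSUMPTION: the time weight scales each neighbor's deviation only.
    num = sum(s * (r - m) * w for s, r, m, w in neighbors)
    den = sum(abs(s) for s, _, _, _ in neighbors)
    return user_mean if den == 0 else user_mean + num / den
```

For example, with a target mean of 4 and two equally similar neighbors, one recent rating of 5 (weight 1.0) and one old rating of 1 (weight 0.5), the prediction is 3.75, whereas treating both ratings equally would give 3.0. The recent, positive rating therefore pulls the prediction upward.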
3.4. New Recommendation Model (NewRec)
To address data sparsity and timeliness problems in traditional collaborative filtering recommendation algorithms, this paper integrates the hierarchical filling method and a time factor into CF and puts forward a new personalized recommendation algorithm, NewRec. The NewRec recommendation model is shown in Figure 2. The model is divided into three main modules: a data preprocessing module, a sparsity reduction module, and a nearest-neighbor recommendation module.
The data preprocessing module takes user information as input, including user purchase records, user ratings of commodities, and the time users spend on websites. This information is converted into a data format that the recommendation method accepts, forming the user-item rating matrix.
In the sparsity reduction module, for all items in the user-item rating matrix, the RF/IRF of the commodity’s corresponding level is calculated and the resulting values are filled into the rating matrix, which alleviates the problem of data sparsity.
In the nearest-neighbor recommendation module, to account for the timeliness of the recommendation system, the time-weighted prediction formula is used to calculate the prediction ratings of the target items, rank them, and select the top-N items as the recommendation set.
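The final ranking step of the nearest-neighbor module can be sketched as follows. This is a minimal illustration that assumes the predicted ratings are already available as a dict; the name `top_n` and the data layout are hypothetical.

```python
def top_n(predictions, n):
    """Return the n items with the highest predicted ratings, best first."""
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:n]]
```

Because Python's sort is stable, items with equal predicted ratings keep their original dict order, which makes the recommendation list deterministic.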
4. Experimental Analysis
The dataset in this paper is from https://movielens.org/ and was collected by the GroupLens research group at the University of Minnesota. MovieLens is a site that provides personalized recommendations to users through collaborative filtering technology. The system uses ratings ranging from 1 to 5; the higher the rating, the more interested the user is. The dataset contains the ratings of 1,682 movies by 943 users. According to the latest statistics, the database of the MovieLens site holds over 70,000 users and 6,600 rated movies. The MovieLens datasets are abundant, clear, real, and accurate, so they have been widely used in simulation tests of personalized recommendation systems and are regarded as authoritative test data in this field. Using this dataset for simulation, this paper designs a reasonable and feasible evaluation standard and carries out a comparative analysis of the recommendation quality of the improved algorithm. The experimental results support the validity and rationality of the improved algorithm.
4.2. Experimental Scheme
The actual effect of a collaborative filtering recommendation algorithm in an E-commerce personalized recommendation system is mainly influenced by two factors: data sparsity and the number of nearest neighbors. Thus, this experiment designs the following two schemes.
First, the CF algorithm, time function-based recommendation (TimeRec for short), hierarchical filling (HF for short), and the NewRec algorithm proposed in this paper are compared under different degrees of data sparsity. Different degrees of data sparsity realistically simulate the working conditions of an E-commerce recommendation system and show how recommendation quality changes with the amount of effective information.
Second, the recommendation performance of CF, HF, TimeRec, and NewRec is compared under different numbers of nearest neighbors. This shows how the recommendation quality of each algorithm changes with the number of nearest neighbors and helps select the optimal number of nearest neighbors for each algorithm for use in later experiments.
This section designs five experiments to verify the superiority of the algorithm in this paper:
(1) The influence of different degrees of sparsity on recommendation quality: three degrees of data sparsity are selected for comparison.
(2) MAE comparison between the hierarchical filling method and traditional collaborative filtering (CF).
(3) The influence of time on recommendation accuracy.
(4) The influence of the number of nearest neighbors on the recommendation algorithms: the effect of nearest-neighbor sets of different sizes on recommendation quality is observed.
(5) Recommendation quality: with the same number of neighbors, the recommendation quality of the different algorithms is compared.
To test the performance of the NewRec recommendation model and the time function-based improved algorithm TimeRec, this paper verifies the validity of the model by experiment. The traditional collaborative filtering recommendation algorithm CF is taken as the baseline. The CF algorithm uses the similarities between items to recommend similar commodities to target users. The similarities between users or items can be calculated by (5).
To compare algorithm performance, this paper adopts MAE and RMSE to evaluate the recommendation algorithms. MAE is defined as follows:
$$\mathrm{MAE} = \frac{1}{N} \sum_{(u,i)} \left| P_{u,i} - R_{u,i} \right|, \tag{7}$$
where $R_{u,i}$ represents the actual rating of commodity $i$ by user $u$, $P_{u,i}$ represents the prediction rating of commodity $i$ by user $u$, and $N$ represents the number of all prediction ratings. RMSE is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{(u,i)} \left( P_{u,i} - R_{u,i} \right)^2}. \tag{8}$$
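The two metrics can be computed with a few lines of Python. This is a minimal sketch of the standard definitions of MAE and RMSE; the function names and the paired-list input format are assumptions made for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between paired actual and predicted ratings."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between paired actual and predicted ratings."""
    return math.sqrt(
        sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```

For both metrics, lower values mean more accurate predictions. RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE does.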
4.5. Experiment Results
4.5.1. Influences of Data Sparsity on Recommendation Algorithm
Data sparsity refers to the ratio of nonrated entries to all entries in the rating matrix. To verify the influence of data sparsity on recommendation accuracy, this paper fills prediction ratings into the original user-item rating matrix before the recommendation calculation. Datasets with sparsity of 0.92, 0.81, and 0.74 are selected, and the CF algorithm is used for verification. The experimental results are shown in Figure 3.
Figure 3 shows that recommendation quality does not improve steadily as sparsity decreases. In this experiment, recommendation quality is highest when the sparsity is 0.81, so datasets with a sparsity of 0.81 are used in the following experiments.
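The sparsity definition used in this experiment can be computed directly. The sketch below is illustrative only, and the function name is hypothetical.

```python
def sparsity(num_ratings, num_users, num_items):
    """Share of user-item cells that carry no rating."""
    return 1.0 - num_ratings / (num_users * num_items)
```

As a worked example, the matrix described earlier has 943 users and 1,682 movies. Assuming the 100,000 ratings of the public MovieLens 100K release (a figure this text does not state), `sparsity(100000, 943, 1682)` is about 0.937, so filling in prediction ratings is what brings the matrix down to the 0.92, 0.81, and 0.74 levels compared here.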
4.5.2. Analysis of Hierarchical Filling (HF) Method in Recommendation Accuracy
To verify the influence on recommendation accuracy of reducing data sparsity through hierarchical filling (HF), MAE is calculated before and after the HF method is applied. Figure 4 shows that, as the number of neighbor users increases, the MAE of both the HF method and the CF method decreases, and the MAE of HF is smaller than that of CF for the same number of neighbors. Thus, the HF method outperforms the CF algorithm.
4.5.3. Influences of Time on Recommendation Accuracy
To maintain recommendation accuracy, the influence of time must be considered in the rating prediction stage, with each rated item having exactly one weight. Recent ratings are given greater weight and older ratings smaller weight, which improves forecasting. To verify the influence of time on recommendation accuracy, this section compares the MAE of the CF algorithm with that of the TimeRec algorithm.
Figure 5 shows that the MAE of the improved TimeRec algorithm, which uses the time function, is lower than that of CF, which does not. This comparison shows that time does influence recommendation prediction and that using the time function improves the recommendation quality of the recommendation system.
4.5.4. Influences of the Number of User Neighbors on Recommendation Accuracy
The nearest neighbors of each user are easily found by calculating the similarity between users. To verify the influence of the number of user neighbors on recommendation accuracy, this section runs a comparison in which the number of nearest neighbors increases from 10 to 60 in steps of 10. The experimental results are shown in Figure 6.
Figure 6 shows that, as the number of nearest neighbors increases, the MAE of all four algorithms first decreases and then increases. The MAE of the improved algorithm NewRec is lower than that of the other three algorithms, which indicates that NewRec provides better recommendation quality. Further analysis shows that all four recommendation algorithms reach their lowest MAE when the number of nearest neighbors is 40; that is, with 40 nearest neighbors, all four algorithms achieve good recommendation quality.
4.5.5. Comparison among Different Recommendation Algorithms on Recommendation Accuracy
To verify the recommendation accuracy of the NewRec algorithm proposed in this paper, this section calculates the RMSE of the algorithms through experiments; the results are shown in Figure 7.
Figure 7 shows that, compared with the traditional collaborative filtering recommendation algorithm CF, the level filling-based improved algorithm HF, and the time function-based improved algorithm TimeRec, the improved algorithm NewRec achieves the highest recommendation accuracy.
Summing up the abovementioned experimental results, the following conclusion can be drawn: compared with the other three algorithms, the recommendation quality of the improved algorithm NewRec is significantly improved once hierarchical filling and the time function are added.
This paper exploits the fact that commodities in an E-commerce system belong to different levels to fill specific scores into the rating matrix by calculating the RF/IRF of the commodity’s corresponding level, which alleviates the problems of data sparsity and cold start to a certain extent. In the recommendation prediction stage, to account for the timeliness of the recommendation system, a time-weighted prediction formula is adopted and different weights are given to rating data according to rating time, so as to improve the recommendation quality of the E-commerce recommendation system. The experimental results on a real dataset indicate that the algorithm in this paper is better than the traditional collaborative filtering recommendation algorithm in running efficiency and recommendation accuracy.
Collaborative filtering is a common recommendation technology in E-commerce personalized recommendation systems, but it has many problems. To address data sparsity in the user-item rating matrix and the timeliness of user evaluations, this paper proposes an improved collaborative filtering recommendation algorithm, NewRec, and verifies its feasibility through experimental simulation, showing that it can improve the recommendation quality of an E-commerce recommendation system. Many problems and shortcomings remain in the study of E-commerce personalized recommendation. In particular, the improved collaborative filtering algorithm in this paper does not consider the influences of context and user interaction behaviors on personalized recommendation, which require further study in the future.
The authors declare that they have no competing interests.
This paper is supported by the Science and Technology Development Planning of Shandong Province (2014GGX101011, 2015GGX101018), A Project of Shandong Province Higher Educational Science and Technology Program (J12LN31, J13LN11, and J14LN14), and Jinan Higher Educational Innovation Plan (201401214, 201303001).
R. Sinha and K. Swearingen, “Comparing recommendations made by online systems and friends,” in Proceedings of the Delos-NSF Workshop on Personalization and Recommender Systems in Digital Libraries, 2001.
G. Adomavicius and A. Tuzhilin, “Multidimensional recommender systems: a data warehousing approach,” in Electronic Commerce: Second International Workshop, WELCOM 2001 Heidelberg, Germany, November 16-17, 2001 Proceedings, vol. 2232 of Lecture Notes in Computer Science, pp. 180–192, Springer, Berlin, Germany, 2001.
T. V. Nguyen, A. Karatzoglou, and L. Baltrunas, “Gaussian process factorization machines for context-aware recommendations,” in Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '14), pp. 63–72, ACM, Gold Coast, Australia, July 2014.
J. Wang, A. P. De Vries, and M. J. T. Reinders, “Unifying user-based and item-based collaborative filtering approaches by similarity fusion,” in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 501–508, ACM, August 2006.