Abstract

With the rapid development of e-commerce platforms, consumer returns have become common, and they are now a stumbling block to the profitability of e-commerce companies. To protect consumers’ purchase rights, the Chinese government has introduced a 7-day no-reason return policy. To use the return policy to attract buyers, e-commerce platforms have created a more relaxed and convenient return environment for consumers. On the one hand, the return policy has increased customer trust in e-commerce platforms and stimulated purchase demand. On the other hand, return behavior increases the costs of the platform. As consumption upgrades, customers pay more attention to personalized experience: in addition to price, the quality of the services provided by e-commerce platforms directly affects customers’ purchasing decisions and return behavior. Whether consumers will purchase again under the personalized return policy of an e-commerce platform is therefore worth studying. To this end, an ensemble learning method (AdaBoost-FSVM) based on the fuzzy support vector machine (FSVM) is applied to predict the purchase intention of consumers. First, the grid search method is used to optimize the modeling parameters of the FSVM base classifier. Second, the AdaBoost-FSVM ensemble prediction model is constructed from multiple base classifiers. To evaluate the performance of the prediction model, logistic regression (LR), support vector machine (SVM), FSVM, random forest (RF), and XGBoost are used to construct comparison models for purchasing behavior. The experimental results show that the method used in this study predicts more accurately than the comparison algorithms. The predictive model can be used in the recommendation systems of shopping websites and can also guide e-commerce companies in customizing preferential policies and services, so as to quickly and accurately stimulate the purchase intention of more potential consumers.

1. Introduction

The development of the Internet has given rise to a variety of online platforms, and e-commerce has become part of people’s lives. Online shopping has many advantages over shopping in physical stores. For example, consumers can purchase goods at any time and from any place. Online shopping also provides a broader market for both parties: sellers can sell to consumers around the world, and buyers can choose products from stores anywhere in the world. E-commerce also removes intermediate links in commodity circulation, thereby greatly reducing the cost of circulation and transactions. In addition, online shopping supports a personalized shopping process. Consumers can filter commodities by price, reviews, service, and other parameters according to their needs, so as to find the items they like. Consumers are important participants in e-commerce, and the factors that influence their purchase behavior are crucial to promoting transactions and the sustainable development of e-commerce platforms.

To protect consumers’ purchase rights, the government has introduced a 7-day no-reason return policy. Under the government’s return policy framework, e-commerce platforms have successively introduced their own return policies and services. The impact of return policies on consumers’ purchase behavior has been studied by many scholars. Reference [1] studied the impact of full-refund and partial-refund policies on consumers’ purchase intentions. Reference [2] constructed a profit maximization model to study the relationship between sales volume and return volume and obtained the optimal refund policy and product price. Reference [3] constructed a model based on consumer preferences to derive the seller’s optimal pricing and return fee policy. Reference [4] built a model to analyze full-refund and partial-refund policies and pricing strategy by introducing the return policy into the supply chain contract. Reference [5] built several theoretical models to analyze the relationship between return policy, product quality, and product pricing based on consumer purchase and return behavior; the results show that the gift policy and the return policy influence and complement each other. Reference [6] derived the optimal pricing and refund policy for fashion product retailers, incorporating both full-refund and partial-refund policies into the model. A similar study on the return policy of fast-selling fashion products is given in [7]. Building on [8], reference [9] explored return service fees for mass-customized fashion products and deduced the circumstances under which e-commerce companies achieve better results by offering free returns. Reference [10] used an empirical method to study the impact of return strategies on customer purchases. It divides online purchase behavior into two stages; because of information asymmetry, customers face a greater risk of return in the second stage. Through three sets of online purchase experiments, the relationship among the leniency of the return policy, customer purchase decisions, and merchant profit was verified. Reference [11] argues that customer returns are inevitable but are not necessarily bad for companies. It uses an econometric model to characterize return behavior during transactions and the impact of returns on the future decisions of enterprises and consumers, and thereby demonstrates the role of returns in the transaction process. Reference [12] studies the impact of return behavior from another perspective: if e-commerce companies can provide return services tailored to customers’ customized needs, they can gain a large advantage. The tools used in these studies show that machine learning algorithms [13–21] are widely used to predict purchase behavior.

Most of the above works study the impact of the return policies provided by e-commerce companies on consumers’ purchase intention under the no-reason return policy. These studies have the following problems. First, they did not take into account the return policies of retailers across all e-commerce platforms. Second, they did not pay attention to the return services provided by individual merchants. Third, the economic and labor costs of returning goods were not included in the calculation. In response to these problems, this article focuses on the return policies of existing e-commerce companies and uses a new ensemble learning method to predict consumers’ buying behavior. By modeling user behavior in historical data, it predicts whether a user will buy a product again. The prediction results can then be used to analyze which return policies and return services improve consumers’ purchase intention. The contributions of this study mainly include the following four points:
(1) The collected original data are incomplete: there are missing values, duplicated records, inconsistent attribute data types, and noise. This study therefore first preprocesses the data.
(2) Through the analysis of the data and of consumer purchasing behavior, the basic features related to the research question are identified, and the features are associated and extracted according to the characteristics of each data set.
(3) The AdaBoost-FSVM ensemble learning method is used to predict consumers’ buying behavior. The ensemble method uses the AdaBoost algorithm with FSVM as the weak classifier. FSVM introduces a membership mechanism on top of SVM, so that the classifier has better noise resistance. Comparative experiments show that the ensemble learning method used in this paper predicts better than the nonensemble methods and the XGBoost ensemble method.
(4) Through the analysis of the experimental results, the relationship among the return policy of the e-commerce company, the return service, and consumers’ purchase intention is obtained. This relationship can guide e-commerce companies in optimizing return policies and services, so as to achieve the best sales relationship between e-commerce companies and consumers.

2.1. E-Commerce Company’s Return Policy

In 2014, the government revised the Consumer Protection Law. The revised law formally introduced a return policy for online shopping known as “7 days without reason.” This policy stipulates that consumers have the right to return goods within 7 days of receipt without giving a reason. On the one hand, the no-reason return policy protects the legitimate rights and interests of consumers. On the other hand, offering a no-reason return policy is also an effective means for major e-commerce platforms to attract consumers and stimulate purchase demand. When shopping online, consumers consider not only factors related to the product itself but also the level of service provided by the seller, of which the return policy is an important measure. The current return policies of several mainstream e-commerce platforms are shown in Table 1.

Comparing the return policies of major e-commerce companies leads to the following observations:
(1) All e-commerce platforms promise no-reason returns within 7 days, but the specific return policies and services differ.
(2) Vertical e-commerce companies have more relaxed return policies than comprehensive e-commerce companies. For example, JUMEI.COM promises no-reason returns within 30 days; even used goods can be returned, and the postage paid by consumers is subsidized. Comprehensive e-commerce companies such as TMALL.COM and JD.COM have stricter return policies and do not subsidize postage.
(3) Comprehensive e-commerce companies provide more considerate return services than vertical e-commerce companies. For example, JD.COM requires retailers to respond to a customer’s 7-day no-reason return application within 24 hours. If the goods need to be collected from the consumer, the staff must pick them up within 48 hours. Retailers should give clear advice on postsale disposal within 48 hours of receiving the returned-goods message from the consumer.

2.2. Influencing Factors of Consumers’ Online Buying Behavior

When consumers buy online, the retailer’s service level, in addition to product price, directly affects consumer demand and return behavior. Therefore, alongside product competition and price competition, service competition is inevitable. All major e-commerce platforms use various means to improve service levels, give consumers a better shopping experience, and reduce the chance of a mismatch between customer expectations and the actual product. In 2017, the China Consumers Association conducted a survey on “online shopping integrity and consumer awareness.” The survey results identify the factors that affect consumers’ online purchases. The details are shown in Figure 1.

The results in Figure 1 show that consumers are affected by many factors when shopping online. 70% of customers pay attention to “product/service quality,” and 60% pay attention to “product/service price.” Service quality is therefore a key factor for enterprises seeking to enhance competitiveness and attract consumers.

2.3. Consumer Repurchase Behavior Analysis

In real life, consumers use online shopping platforms to filter products, save them to favorites, and add them to shopping carts. These behaviors reflect the customer’s preference for the target product. The shopping population includes young people, the elderly, office workers, and students, and different groups have different preferences. For example, young people are keen to purchase electronic products, the elderly like to purchase daily household items or food, and students like to purchase school supplies and snacks. Some consumers therefore repeatedly purchase a certain type of goods, such as daily vegetables and fruits, and have a high repeat purchase rate. Others only occasionally order goods such as electronic products, and their repeat purchase rate is low. There are also consumers who are keen on shopping, purchase a wide variety of items, and place orders frequently; because the product categories they buy are so varied, some products may have a high repeat purchase rate and others a low one.

The goal of this experiment is to select feature data based on this analysis and to use ensemble learning algorithms to model and predict the probability that a consumer buys again. A feasible approach is to treat the prediction as a binary classification problem. This study uses supervised learning, for which the historical data provide the training samples and target values. More attention should therefore be paid to repurchased products during the analysis: for a given type of consumer, it is important to know which products are often repurchased and to uncover the underlying patterns.

2.4. Order Information Analysis

Finding useful information and extracting features from historical data are key steps in prediction. Most fields in each table of the data set are meaningful for predicting repurchase behavior, but a small number are mainly descriptive and carry little information. Fields that help prediction can be converted into valid features through feature extraction, and fields of little significance can be ignored. The reorder field is an important feature; during training, it allows the accuracy of the prediction to be verified. Information such as the time period in which a product is purchased also plays an important role in predicting the user’s repurchase behavior.

In real life, people’s purchasing patterns are generally regular. Orders are more concentrated on holidays than on weekdays and during the day than at night. Higher-quality goods are more likely to be purchased repeatedly, and such goods are often placed in the shopping cart first. More useful information can be obtained through more detailed analysis.
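
The regularities described above can be checked directly on order-level records. The following is a minimal sketch on a toy table; the column names (`order_dow`, `order_hour_of_day`, `add_to_cart_order`, `reordered`) are assumptions chosen for illustration and may differ from the fields listed in the data tables.

```python
import pandas as pd

# Toy order-line records; all column names and values are illustrative assumptions.
orders = pd.DataFrame({
    "order_dow": [0, 0, 6, 3, 3, 6],             # day of week of the order
    "order_hour_of_day": [10, 22, 14, 9, 15, 11],
    "add_to_cart_order": [1, 3, 1, 2, 1, 4],     # position at which the item entered the cart
    "reordered": [1, 0, 1, 0, 1, 0],             # 1 if the item was bought before
})

# Share of order lines that are repurchases, by day of week and by hour of day.
reorder_by_dow = orders.groupby("order_dow")["reordered"].mean()
reorder_by_hour = orders.groupby("order_hour_of_day")["reordered"].mean()

# Repurchase rate for items placed in the cart first (1) versus later (0).
orders["added_first"] = orders["add_to_cart_order"].eq(1).astype(int)
reorder_by_position = orders.groupby("added_first")["reordered"].mean()

print(reorder_by_dow, reorder_by_position, sep="\n")
```

On real data, the same group-by summaries would indicate whether the order time and the cart position carry enough signal to be kept as features.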

3. Consumer Repurchase Behavior Prediction

3.1. Forecasting Process

The construction process of the consumer buying behavior prediction model is shown in Figure 2. First, features are constructed from the original data to obtain the feature summary table, and the training set and test set are selected. Second, feature selection is performed on the training set: the features that the prediction algorithm needs are selected from the feature list. Third, the training set after feature selection is input into the algorithm for learning to obtain a predictive model. Finally, the test set is input into the prediction model to obtain the prediction result.

3.2. Raw Data and Preprocessing

The data set used was provided for a competition in the Kaggle machine learning community. The purpose of the competition is to predict, based on historical data, which products consumers will purchase again in their next order. The open-source historical sales data of Instacart can be downloaded online. The data structure is shown in Tables 2–6.

3.3. Feature Extraction

The original data have high dimensionality and contain poor-quality features, which leads to long model training times. Effective features not only reduce the complexity of modeling but are also very important for improving prediction accuracy. Based on the needs of this research, valuable features are selected from the original sample data. The selected features fall into four groups (basic data features, user features, product features, and user-product features), 30 in total. The selected features are shown in Tables 7–10.
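
As an illustration of how user, product, and user-product features can be derived and joined into one feature summary table, the sketch below aggregates a toy purchase history with pandas. The column names and the particular features are hypothetical examples, not the 30 features selected in this study.

```python
import pandas as pd

# Toy purchase history; column names are assumptions for illustration.
history = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 2],
    "product_id": [10, 11, 10, 10, 12, 12],
    "reordered":  [0, 0, 1, 0, 0, 1],
})

# User features: number of items bought and share of repurchases per user.
user_feats = history.groupby("user_id").agg(
    user_total_items=("product_id", "size"),
    user_reorder_rate=("reordered", "mean"),
).reset_index()

# Product features: how often a product is bought and how often it is a repurchase.
product_feats = history.groupby("product_id").agg(
    product_total_orders=("user_id", "size"),
    product_reorder_rate=("reordered", "mean"),
).reset_index()

# User-product features: how many times a given user bought a given product.
up_feats = history.groupby(["user_id", "product_id"]).agg(
    up_times_bought=("reordered", "size"),
).reset_index()

# Feature summary table: one row per (user, product) pair.
features = up_feats.merge(user_feats, on="user_id").merge(product_feats, on="product_id")
print(features)
```

Each row of `features` then describes one candidate (user, product) pair, which is the unit on which a repurchase prediction is made.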

3.4. Model Training

In the AdaBoost [22, 23] algorithm, the weight of each training sample is initialized, and these weights form the vector D. First, a weak classifier is trained on the training data set, and its classification error rate is calculated. The next weak classifier is then trained after the sample weights have been changed: the weights of samples that were classified correctly in the previous round are reduced, and the weights of samples that were classified incorrectly are increased. Finally, a combination strategy is applied to the weak classifiers to determine the final classification result. Figure 3 is a schematic diagram of the AdaBoost algorithm.
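
The reweighting described above can be made concrete with a small numeric example. The sketch below uses the standard discrete AdaBoost update for labels in {-1, +1}; the source does not state the exact formulas it uses, so this variant is an assumption.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round; `weights` must sum to 1, labels are -1 or +1."""
    miss = y_true != y_pred
    error = float(np.sum(weights[miss]))            # weighted error rate
    error = min(max(error, 1e-10), 1 - 1e-10)       # guard against log(0)
    alpha = 0.5 * np.log((1.0 - error) / error)     # weight of this weak classifier
    # Correct samples are scaled by exp(-alpha), misclassified ones by exp(+alpha).
    new_weights = weights * np.exp(-alpha * y_true * y_pred)
    return error, alpha, new_weights / new_weights.sum()

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])   # the second sample is misclassified
D = np.full(5, 0.2)                      # uniform initial weight vector D
error, alpha, D_next = adaboost_round(D, y_true, y_pred)
print(error, alpha, D_next)
```

After one round, the misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.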

FSVM [24, 25] does not learn well from unbalanced samples, and the AdaBoost algorithm can overcome this shortcoming, so this study proposes an AdaBoost-FSVM ensemble learning algorithm in which FSVM is used as the weak classifier in AdaBoost. First, the training samples are preprocessed. Second, the kernel function of SVM [26] is optimized through the grid search method to obtain the optimal penalty factor c and deformation factor [27]. The sample weights are then initialized randomly, and a subset of training samples is selected according to these weights. The selected subset is used for FSVM learning. Each learning machine is assigned a weight according to its training results, and the sample weights are updated according to the error rate of the training result. The sample subset used in the next round of FSVM learning is selected according to the updated sample weights. As FSVM learning continues, multiple learning machines are trained. These learning machines are combined by weighted voting, and finally a strong classifier is obtained. The execution process of AdaBoost-FSVM is shown in Figure 4.
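
A minimal sketch of the grid search and of a single FSVM learner is given below, using scikit-learn. Several choices are assumptions, because the source does not specify them: the RBF kernel, the grid values, the synthetic data, and the membership function (here a simple distance-to-class-centre rule). The fuzzy membership is passed as `sample_weight`, which in `SVC.fit` rescales the penalty factor per sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fuzzy_membership(X, y, delta=1e-3):
    """Assumed membership in (0, 1]: lower for samples far from their class centre."""
    s = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        mask = y == label
        dist = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        s[mask] = 1.0 - dist / (dist.max() + delta)
    return s

# Synthetic stand-in for the preprocessed training samples.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Grid search over the penalty factor C and the RBF kernel parameter gamma.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}   # illustrative grid
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
grid.fit(X, y)

# FSVM: the same SVM, but each sample's penalty is scaled by its membership.
fsvm = SVC(kernel="rbf", **grid.best_params_)
fsvm.fit(X, y, sample_weight=fuzzy_membership(X, y))
print(grid.best_params_, fsvm.score(X, y))
```

Because outlying samples receive small memberships, their slack is penalized less, which is the mechanism behind the noise resistance attributed to FSVM.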

The flow of the AdaBoost-FSVM algorithm is summarized as follows.

Step 1: The actual problem is digitized and transformed into a data format that FSVM can handle.
Step 2: Normalize the data.
Step 3: Use the grid search method to find the optimal FSVM parameters (the penalty factor c and the deformation factor). The fuzzy coefficient is set to m = 2.
Step 4: Initialize the weight of each sample.
Step 5: According to the weight of each sample, select k samples to form a training subset.
Step 6: Train FSVM on the subset of samples to obtain a learning machine.
Step 7: Calculate the error rate of the learning machine and the weight of the learning machine.
Step 8: Update the weight of each sample according to the error rate of the learning machine.
Step 9: Determine whether the number of training rounds has reached the maximum number of iterations. If not, return to Step 5; otherwise, stop iterating. The learning machines then vote according to their weights to form the final strong classifier, which outputs the result.
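
The steps above can be sketched in Python as follows. This is an illustrative reading of the procedure under stated assumptions, not the authors' implementation: the data are synthetic, the membership function is a distance-to-class-centre rule, the sample weights start uniform, the kernel is RBF, the parameters `C` and `gamma` are passed in rather than grid-searched (Step 3), and learners with a weighted error of 0.5 or more are discarded.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def membership(X, y, delta=1e-3):
    """Assumed fuzzy membership: decreases with distance from the class centre."""
    s = np.empty(len(y), dtype=float)
    for label in np.unique(y):
        mask = y == label
        dist = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        s[mask] = 1.0 - dist / (dist.max() + delta)
    return s

def fit_adaboost_fsvm(X, y, n_rounds=10, k=150, C=1.0, gamma="scale", seed=0):
    """Train weighted FSVM learners; y must use the labels -1 and +1."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)                              # Step 4 (uniform: an assumption)
    ensemble = []
    for _ in range(n_rounds):                            # Step 9: iteration limit
        idx = rng.choice(n, size=k, replace=True, p=w)   # Step 5: weighted subset
        if np.unique(y[idx]).size < 2:
            continue                                     # an SVM needs both classes
        model = SVC(kernel="rbf", C=C, gamma=gamma)
        model.fit(X[idx], y[idx], sample_weight=membership(X[idx], y[idx]))  # Step 6
        pred = model.predict(X)
        err = float(np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10))           # Step 7
        if err >= 0.5:
            continue                                     # no better than chance: discard
        alpha = 0.5 * np.log((1.0 - err) / err)          # weight of the learning machine
        w = w * np.exp(-alpha * y * pred)                # Step 8: update sample weights
        w = w / w.sum()
        ensemble.append((alpha, model))
    return ensemble

def predict_adaboost_fsvm(ensemble, X):
    """Weighted vote of the learning machines (end of Step 9)."""
    votes = sum(alpha * model.predict(X) for alpha, model in ensemble)
    return np.where(votes >= 0, 1, -1)

X, y01 = make_classification(n_samples=300, n_features=6, n_informative=4,
                             class_sep=1.5, random_state=0)
X = MinMaxScaler().fit_transform(X)                      # Step 2: normalize
y = 2 * y01 - 1                                          # map labels {0, 1} to {-1, +1}
ensemble = fit_adaboost_fsvm(X, y)
accuracy = float(np.mean(predict_adaboost_fsvm(ensemble, X) == y))
print(len(ensemble), accuracy)
```

Resampling by weight in Step 5, rather than reweighting the loss directly, keeps each base learner an ordinary FSVM trained on k samples, which also bounds the training cost per round.
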
3.5. Model Evaluation

After completing the model training, it is necessary to evaluate the quality of the model. The indicators for evaluating the quality of the model are shown in Table 11.
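
For reference, the accuracy, precision, and F1 values reported in the experiments can be computed from the confusion-matrix counts of a binary classifier. The sketch below uses the standard definitions and a made-up set of labels; recall is included because F1 depends on it.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels (standard definitions)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative labels: 1 = the consumer repurchased, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```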

4. Experiment and Model Evaluation

4.1. Experimental Environment and Parameter Settings

This research uses Python, with PyCharm as the development environment. Python’s data analysis library pandas and plotting library Matplotlib are used for data import, matrix transformation, and graphical display. The machine learning library scikit-learn and the XGBoost framework are used as the algorithm libraries for training models.

The comparison algorithms in this experiment are logistic regression (LR) [29], SVM [30], FSVM [31], random forest (RF) [32], XGBoost [33], and AdaBoost-SVM [34]. The data set used in the experiment is the original data set described in Section 3.2. To study consumers’ repurchase intention for purchased goods under the return policy, the data set was filtered so that only the records of consumers with return records were retained. Of the filtered data, 70% is used as the training set, and the remaining 30% is used as the test set.
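
The filtering and the 70/30 split can be sketched with pandas and scikit-learn as follows. The table and the column names `has_return_record` and `reordered` are hypothetical, and stratifying the split on the label is an added assumption that the source does not mention.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy sample table; `has_return_record` and `reordered` are hypothetical columns.
samples = pd.DataFrame({
    "user_id": range(100),
    "has_return_record": [i % 2 == 0 for i in range(100)],
    "reordered": [(i // 2) % 2 for i in range(100)],
})

# Keep only consumers with return records, then split 70% / 30%.
filtered = samples[samples["has_return_record"]]
train, test = train_test_split(
    filtered, test_size=0.3, stratify=filtered["reordered"], random_state=42
)
print(len(filtered), len(train), len(test))
```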

4.2. Experimental Results and Analysis

Experiments were performed on the Kaggle data set shown in Section 3.2, and the experimental results are shown in Table 12. After adding 1%, 3%, and 5% noise data to the original data set, the experimental results obtained by each comparison algorithm are shown in Table 13.
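
The source does not describe how the noise data were generated. One common way to run such a robustness test is to flip a fixed fraction of the training labels, and the sketch below assumes that approach; the function name and the label encoding {0, 1} are illustrative.

```python
import numpy as np

def add_label_noise(y, rate, seed=0):
    """Flip a fraction `rate` of binary {0, 1} labels chosen uniformly at random."""
    rng = np.random.default_rng(seed)
    y_noisy = np.array(y, copy=True)
    n_flip = int(round(rate * len(y_noisy)))
    idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

y = np.array([0, 1] * 500)   # 1000 illustrative labels
noisy = {rate: add_label_noise(y, rate) for rate in (0.01, 0.03, 0.05)}
for rate, labels in noisy.items():
    print(rate, int((labels != y).sum()))
```

Each model would then be retrained on the noisy labels and evaluated on a clean test set, so that any drop in accuracy can be attributed to the injected noise.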

Among the four nonensemble algorithms (LR, SVM, FSVM, and RF), FSVM has the best predictive effect. This is because the algorithm introduces a fuzzy strategy that gives it better noise resistance, and data collected in actual production and life inevitably carry some noise. Compared with the four algorithms of LR, SVM, RF, and FSVM, the accuracy of AdaBoost-FSVM is increased by 4%, 2.7%, and 0.2%, respectively. The precision of AdaBoost-FSVM is improved by 6%, 4%, and 3%, respectively. The F1 of AdaBoost-FSVM is increased by 5%, 3%, and 1%, respectively. The experimental results in Table 12 show that the performances of FSVM and RF are similar and that FSVM is slightly better.

Among the three ensemble algorithms (XGBoost, AdaBoost-SVM, and AdaBoost-FSVM), AdaBoost-FSVM has the best prediction effect. Compared with XGBoost and AdaBoost-SVM, the accuracy of AdaBoost-FSVM is increased by 5% and 5%, respectively, the precision by 3% and 4%, and the F1 by 4% and 4%. The experimental data in Table 12 show that the prediction effects of the XGBoost and AdaBoost-SVM models are similar and that AdaBoost-FSVM improves on both in accuracy, precision, and F1.

The experimental results in Table 13 show that the prediction performance of each algorithm decreases as the noise increases. With 1% noise, the accuracies of LR, SVM, FSVM, RF, XGBoost, AdaBoost-SVM, and AdaBoost-FSVM are reduced by 2.3%, 1.3%, 1.2%, 2.3%, 3%, 2.4%, and 1.1%, respectively. With 3% noise, the accuracies of LR, SVM, FSVM, RF, XGBoost, AdaBoost-SVM, and AdaBoost-FSVM are reduced by 1.9%, 2.5%, 4.6%, 4.4%, 4.0%, 3.8%, and 2.3%, respectively. With 5% noise, the accuracies of LR, SVM, FSVM, RF, XGBoost, AdaBoost-SVM, and AdaBoost-FSVM are reduced by 5.3%, 4.9%, 4.5%, 7.1%, 6.8%, 5.8%, and 3.7%, respectively. It can be seen from the reduction rates that AdaBoost-FSVM degrades least overall, which indicates that the algorithm is more robust to noise.

5. Conclusion

The introduction of the government’s return policy has prompted e-commerce companies to launch their own return strategies and services. To study the impact of different return policies and services on consumers’ subsequent purchase behavior, this study selected the consumer data with return records from the data set for experiments. The AdaBoost-FSVM ensemble learning algorithm is used as the predictive model. Using ensemble learning mitigates the impact of unbalanced data on the algorithm and also improves its stability. In addition, the base classifier in the ensemble is FSVM, which introduces a fuzzy strategy that gives the classifier better noise resistance. To evaluate the performance of the predictive model, nonensemble and ensemble algorithms are used for comparison on the same data set. The experimental results show that the AdaBoost-FSVM algorithm has the best prediction effect. Using the results predicted by AdaBoost-FSVM, e-commerce companies can more accurately analyze the return policies and services suited to different types of consumers, which can guide them in adjusting their return policies and services. Although the prediction performance of the model has been improved to a certain extent, there is still room for further improvement. Future work will continue to study other methods for handling unbalanced data.

Data Availability

The labeled data sets used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the Scientific Research Project of Jilin Education Department.