Abstract

In recent years, blockchain has substantially enhanced the credibility of e-commerce platforms for users. The accuracy of predicting the repeat purchase behaviour of e-commerce users directly affects the effectiveness of merchants' precision marketing. Existing ensemble learning models have low prediction accuracy when the purchase behaviour samples are unbalanced and the feature engineering draws on only a single information dimension. To overcome this problem, an ensemble learning prediction model based on multisource information fusion is proposed. Tests on the Tmall dataset showed that the accuracy and AUC values of the model reached 91.28% and 70.53%, respectively.

1. Introduction

As one of the most disruptive and innovative technologies to emerge from the fourth industrial revolution, blockchain has significant potential for the industrial transformation of e-commerce platforms. Blockchain can ensure payment security, protect users' privacy, and enhance the transaction transparency of e-commerce platforms [1]. Therefore, it can improve the credibility of a platform and expand its user base. More importantly, blockchain redistributes the benefits of a centralized e-commerce platform to every merchant or seller, which effectively increases the income and the enthusiasm of the merchants. To attract more buyers and increase purchase rates, many merchants launch large-scale promotions through e-commerce platforms on specific dates. However, the new buyers attracted during the promotion period are mostly one-time buyers, who do not provide long-term returns for the merchants. Therefore, repeat buyer prediction after a promotion can help an e-commerce platform carry out effective marketing and build a base of long-term customers. Historical behavioural data about buyers obtained during promotions, such as browsing and adding items to the cart or wishlist, can help e-commerce platforms understand user preferences and shopping habits and thereby achieve repeat buyer prediction.

The existing prediction models can be divided into two categories: individual models and ensemble models. Individual models, such as logistic regression (LR), support vector machine (SVM), recurrent neural network (RNN), and multilayer perceptron (MLP), are widely applied in repeat buyer prediction [2, 3]. As the number of e-commerce buyers increases, the volume of historical behavioural data grows rapidly, and individual prediction models based only on impact factors cannot achieve the desired results. In recent years, many researchers have proposed repeat buyer ensemble learning prediction models, such as random forest (RF), gradient boosting decision tree (GBDT), and XGBoost, that integrate multiple individual prediction models. Results demonstrate that ensemble learning prediction models are superior to individual prediction models in terms of accuracy and robustness [4, 5]. However, the existing GBDT-based ensemble learning models perform poorly at processing buyers' behaviour sequence data. Neural networks, represented by RNN and long short-term memory (LSTM) models, have shown notable success in sequence data modeling, which makes it possible to build a buyer behaviour sequence prediction model based on an RNN [6]. The individual models based on GBDT and on neural networks are quite different; after fusion, they can effectively process various types of historical behaviour data about buyers. At the same time, new requirements have emerged for the method of fusing individual models in ensemble learning. To meet these requirements, we propose a repeat buyer ensemble learning prediction model that combines the DeepCatboost, DeepGBM, and double attention BiGRU (DABiGRU) individual models using the vote-stacking method. The proposed model is better at modeling the discrete purchase records and historical behaviour sequence features, thus improving the accuracy of repeat buyer prediction.

The main contributions of this article are summarized as follows: (1) This paper analyzes the impact factors of repeat purchase behaviour and constructs the corresponding features. The DeepCatboost and DABiGRU individual models are proposed, and the DeepGBM individual model is introduced to predict repeat buyers. (2) An improved vote-stacking fusion approach is presented, which differentiates the training data of the individual models, adds a base learning layer, and applies a majority voting decision-making mechanism to the test set predictions. (3) Experiments were conducted on the widely used Tmall dataset. Compared with the individual and other reference models, the vote-stacking ensemble learning model achieved better results in predicting repeat buyers.

The rest of this paper is organized as follows. Section 2 discusses existing studies on repeat buyer prediction methods and ensemble learning. Section 3 describes the sample balancing and feature construction and proposes the DeepCatboost and DABiGRU individual models, as well as an ensemble learning prediction model based on the vote-stacking fusion approach. The proposed method is evaluated on a real dataset in Section 4. Section 5 summarizes the paper.

2. Related Work

The individual prediction models predict repeat buyers by constructing impact features and feeding them into a single machine learning or deep learning algorithm. Using a large amount of historical behaviour data about buyers' browsing, clicking, and purchasing on e-commerce platforms, Liu and Li established a prediction model based on an SVM, which predicted the repeat purchase behaviour of buyers after promotions; they demonstrated the feasibility of using historical behaviour data to build a prediction model and to identify repeat buyers after promotions [7]. Tang et al. proposed a purchasing behaviour prediction framework based on an SVM model whose parameters were optimized with the firefly algorithm [8], achieving better results than the traditional SVM model. Sakar et al. compared visitors' purchase intention prediction models based on RF, SVM, and MLP and found that the accuracy and F1 score of the MLP model were significantly higher than those of the RF and SVM models [9]. The performance of traditional machine learning models depends heavily on feature engineering. However, constructing features requires substantial human effort, which makes traditional machine learning models unsuitable for repeat buyer prediction tasks with massive user purchase behaviour data. Therefore, many repeat buyer prediction models based on deep learning algorithms have been proposed recently. For the task of customer purchase prediction in multichannel online promotions, Ling et al. used a fully connected long short-term memory network to model the interaction between customers and promotion channels, the nonlinear sequence correlation, and the cumulative effect of customers' browsing behaviour [10]. The above-mentioned neural network based models substantially reduce the dependence on feature engineering. However, there is still a gap between the ability of a neural network to handle the dense numerical features of the historical purchase behaviour data of buyers and that of ensemble learning models based on GBDT.

With the development of ensemble learning techniques, many researchers have introduced ensemble prediction models by combining different individual prediction models to improve the accuracy and robustness of prediction results. Common ensemble learning prediction models are mainly XGBoost [4] and RF [11, 12]. Li and Shao proposed a user purchase behaviour prediction model that combined LSTM and RF [13]. The model not only extracts features from the user behaviour data and commodity attributes but also dynamically extracts features relating to the user behaviour data as a successive associated sequence by using LSTM on sequence data. Next, the model applies RF to predict user purchase behaviour. However, the dynamic feature extraction using LSTM is not accurate enough on user behaviour data, as it simply considers that each piece of historical behaviour data has equal impact on the purchase behaviour. In addition, it cannot represent the interrelation between different user behaviours (such as browsing, collecting, or purchasing). Kumar et al. adopted a hybrid method combining the artificial bee colony algorithm with machine learning techniques to predict repeat buyers. Their study found that the seller features and the buyer features were the main factors that affected the intention of repeat purchase, which inspired the feature construction of this paper [14].

Ensemble learning models can fuse the prediction results of different individual models for collaborative decision-making, enabling more accurate, stable, and robust final results. At present, the commonly used fusion strategies are the voting, averaging, and learning methods [15–18]. In recent years, many studies have shown that the performance of an ensemble model can be effectively improved by improving the strategy for fusing the individual models [19–21]. In reference [22], a stacking ensemble model using URL and HTML features was proposed to detect phishing web pages. This model effectively combined three individual models, GBDT, LightGBM, and XGBoost, so that the different individual models complemented each other, thus improving the accuracy of phishing web page detection. A credit score ensemble prediction model based on a multistage adaptive classifier using statistical learning and machine learning was proposed in [23]. According to the performance of the classifiers on the dataset, base classifiers were selected adaptively from the candidate classifier set, and the parameters of the base classifiers were optimized using the Bayesian optimization algorithm. Then, the optimized base classifiers were combined with a multilayer stacking ensemble method to produce new features. Compared with the individual models, the ensemble models RF and AdaBoost showed better performance and data adaptability.

3. Repeat Buyer Prediction Model

3.1. Model Framework

Figure 1 demonstrates the overall framework of the proposed repeat buyer prediction model, which fuses information about buyers, sellers, and the interaction behaviour between them. In the preprocessing stage, the historical behaviour data about buyers is cleaned and erroneous and missing data are eliminated, and sub-time undersampling is adopted to balance the samples [24]. Three-dimensional hidden features are constructed from the buyers, the sellers, and the interaction between them. The sample-balanced historical purchase behaviour data is inputted into the DABiGRU model, and the constructed features are fed into the DeepCatboost and DeepGBM individual models to predict the repeat buyers. The individual models are ensembled with the vote-stacking fusion approach, and the final prediction results are obtained.

3.2. Sample Balancing and Feature Engineering
3.2.1. Sample Balancing

Usually, there are very few repeat buyers after a promotion, so the repeat buyer samples and the one-time buyer samples are not balanced. To correct this, the sub-time undersampling method shown in Algorithm 1 is introduced. Exploiting the time-sensitive characteristic of buyers' purchase behaviour, the original samples of repeat buyers and one-time buyers are segmented by day. For each buyer in the original sample, the three nearest neighbour buyers are determined according to their Euclidean distance. If the buyer is a one-time buyer and more than two of its three nearest neighbours are repeat buyers, it is deleted; if the buyer is a repeat buyer and more than two of its three nearest neighbours are one-time buyers, the one-time buyers are removed from the nearest neighbours. The rest are kept in the original buyer sample.

Input: the original historical data about buyers D; the number of recording days T.
Output: the balanced historical data about buyers D'.
1: segment D into D_1, ..., D_T according to the recording day
2: for each day subset D_t do // traverse each buyer of the original data
3:   x = RandomChoose(D_t) // randomly select a buyer sample
4:   if x is a repeat buyer then
5:     if more than two of the three nearest neighbours KNN(x) are one-time buyers then
6:       delete the one-time buyers in KNN(x) // remove the one-time buyers from the nearest neighbours
7:     else
8:       save(x) // keep this buyer sample
9:   else
10:    if more than two of the three nearest neighbours KNN(x) are repeat buyers then
11:      delete(x) // remove this buyer sample
12:    else
13:      save(x) // keep this buyer sample
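
For concreteness, the following is a minimal Python sketch of the sub-time undersampling idea in Algorithm 1. It assumes a pandas DataFrame df with a "day" column, numeric feature columns, and a binary "label" column (1 = repeat buyer, 0 = one-time buyer); the column names, the k = 3 neighbourhood, and the two-of-three threshold (one reading of the "more than two" rule) are illustrative, not taken from the paper.

    import numpy as np
    import pandas as pd
    from sklearn.neighbors import NearestNeighbors

    def subtime_undersample(df, feature_cols, k=3):
        kept = []
        for _, day_df in df.groupby("day"):      # segment the data by recording day
            if len(day_df) <= k:                 # too few samples to form a neighbourhood
                kept.append(day_df)
                continue
            X = day_df[feature_cols].to_numpy()
            y = day_df["label"].to_numpy()
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: the point itself
            _, idx = nn.kneighbors(X)
            drop = set()
            for i in range(len(day_df)):
                neigh = idx[i, 1:]               # the three nearest neighbours
                n_repeat = int(y[neigh].sum())
                if y[i] == 0 and n_repeat >= 2:  # one-time buyer amid repeat buyers
                    drop.add(i)
                elif y[i] == 1 and (k - n_repeat) >= 2:
                    # repeat buyer: remove the one-time buyers among its neighbours
                    drop.update(int(n) for n in neigh if y[n] == 0)
            mask = np.ones(len(day_df), dtype=bool)
            mask[list(drop)] = False
            kept.append(day_df[mask])
        return pd.concat(kept, ignore_index=True)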
3.2.2. Feature Engineering

After analyzing the data before and after the promotion, it is found that the historical interaction data of buyers is scattered across the buyer information, the seller information, the historical behaviour, and so on. Very few features in the original data can be applied directly to repeat buyer prediction, and predictions based on them are not ideal. Therefore, statistical analysis and the machine learning methods of latent Dirichlet allocation (LDA), principal component analysis (PCA), and factorization machines (FM) were employed to build the repeat buyer features. These features were then fed into the DeepCatboost and DeepGBM individual prediction models for selection and training. In addition to the basic features of buyers, sellers, and their interaction, three hidden features were constructed, with the following specific meanings:

(1) Topic Features. Drawing on the method of LDA topic modelling of text, the seller is regarded as a document, and the IDs of all historical buyers are regarded as the words in the document [25]. Potential factors generated in the reset low-dimensional space are regarded as the topic features of sellers, and the topic features of the buyers can be obtained in the same way.
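
As a rough illustration of the topic features, the sketch below builds seller topics with scikit-learn's LDA, treating each seller as a document whose words are its historical buyers' IDs. The DataFrame and column names (seller_id, buyer_id) and the topic count are assumptions, not the paper's setup; buyer topic features are obtained the same way with the roles swapped.

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def seller_topic_features(log: pd.DataFrame, n_topics: int = 10):
        # one "document" per seller: the space-joined IDs of its historical buyers
        docs = log.groupby("seller_id")["buyer_id"].apply(
            lambda ids: " ".join(map(str, ids))
        )
        counts = CountVectorizer(token_pattern=r"\S+").fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        topics = lda.fit_transform(counts)       # sellers x topics matrix
        return pd.DataFrame(topics, index=docs.index)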

(2) Similarity Features. The similarity features include seller similarity and buyer similarity. The greater the number of cobuyers (shared buyers) between two sellers, the higher the similarity of the two sellers. Before feeding into the training model, PCA is applied to the highly sparse similarity matrix to improve the training efficiency [26].
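
A corresponding sketch of the seller-similarity features, under the same assumed interaction DataFrame as above: the binary co-buyer incidence matrix yields a cosine similarity between sellers, and PCA compresses the sparse similarity matrix before training.

    import pandas as pd
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.decomposition import PCA

    def seller_similarity_features(log: pd.DataFrame, n_components: int = 20):
        # binary seller x buyer incidence matrix
        inc = pd.crosstab(log["seller_id"], log["buyer_id"]).clip(upper=1)
        sim = cosine_similarity(inc)             # seller x seller co-buyer similarity
        reduced = PCA(n_components=n_components).fit_transform(sim)
        return pd.DataFrame(reduced, index=inc.index)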

(3) Feedback Features. The feedback features are defined as the inner product of the potential factors between the seller and the buyer, where the potential factors are obtained by the FM and the feedback matrix [27]. The meaning of a feedback feature is that different buyers who engage in purchasing behaviour with the same seller may have similar preferences, which denotes that if one buyer becomes a repeat buyer, another buyer is more likely to be a repeat buyer of the same seller.
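
A hedged sketch of the feedback features follows. The paper derives the latent factors with an FM; a truncated SVD of the buyer-seller feedback matrix is substituted here as a lightweight stand-in, so the factor values will differ from a true FM. The feedback feature of a (buyer, seller) pair is the inner product of their latent factors.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import TruncatedSVD

    def feedback_features(log: pd.DataFrame, n_factors: int = 16):
        fb = pd.crosstab(log["buyer_id"], log["seller_id"])  # feedback counts
        svd = TruncatedSVD(n_components=n_factors, random_state=0)
        buyer_f = svd.fit_transform(fb)                      # buyer latent factors
        seller_f = svd.components_.T                         # seller latent factors

        # feedback feature of a (buyer, seller) pair: inner product of the factors
        def pair_feature(b_idx: int, s_idx: int) -> float:
            return float(np.dot(buyer_f[b_idx], seller_f[s_idx]))

        return buyer_f, seller_f, pair_feature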

3.3. DeepCatboost Individual Model

Catboost can automatically process categorical features and use the relationships between features to enrich the original feature dimensions [28]. However, owing to the varied representations of the historical behaviour data of buyers and the existence of missing data, there is a risk of overfitting in model training.

To improve the generalization ability of the Catboost model, the layer-by-layer feature extraction idea of representation learning is applied to group and train the input data of the Catboost model, and the classification results of the upper layer are fed into the training set of the lower layer. The specific steps are as follows:

(1) Multiple Catboost submodels are trained independently by randomly selecting a subset of the features.

(2) The prediction results of the Catboost submodels are fused with the original features, and the fused features are fed into the next layer for learning. The random noise in Formula (1) is introduced to mitigate the overfitting problem of the fusion process:

$\tilde{x} = [x, \hat{y} + \varepsilon]$,  (1)

where $\tilde{x}$ denotes the historical buying feature after fusion, $x$ stands for the original feature, $\hat{y}$ represents the prediction result of the Catboost submodel, and $\varepsilon$ denotes the random noise.

(3) The fused features are learned in the second layer, and the prediction results of the multiple Catboost submodels are fused by weighting to produce the final prediction results.

The final DeepCatboost model is shown in Figure 2.
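
The following is a minimal sketch of this layered training scheme using the catboost package. The number of submodels, the iteration counts, and the noise scale are illustrative placeholders; in practice, out-of-fold predictions would further reduce the leakage that the noise term of Formula (1) is meant to dampen.

    import numpy as np
    from catboost import CatBoostClassifier

    rng = np.random.default_rng(0)

    def deep_catboost_fit(X, y, n_submodels=3, noise_std=0.01):
        layer1, pred_cols = [], []
        for _ in range(n_submodels):
            # each submodel sees a random subset of the features
            sub = rng.choice(X.shape[1], size=max(1, X.shape[1] // 2), replace=False)
            m = CatBoostClassifier(iterations=200, verbose=False).fit(X[:, sub], y)
            p = m.predict_proba(X[:, sub])[:, 1]
            # Formula (1): fuse the prediction with random noise before passing it down
            pred_cols.append(p + rng.normal(0, noise_std, len(p)))
            layer1.append((m, sub))
        X_fused = np.column_stack([X] + pred_cols)   # original features + noisy predictions
        layer2 = CatBoostClassifier(iterations=200, verbose=False).fit(X_fused, y)
        return layer1, layer2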

3.4. DeepGBM Individual Model

The DeepGBM individual model shown in Figure 3 combines the advantages of a neural network in dealing with large-scale sparse categorical data and of GBDT in dealing with dense numerical features. The model can handle both strong categorical and numerical features while maintaining efficient learning [29]. It mainly includes two components: GBDT2NN, which focuses on the dense numerical features of the historical purchase behaviour of buyers, and CatNN, which deals with sparse categorical features such as the buyer's age and sex.

To apply the DeepGBM individual model to predict repeat buyers, a GBDT model is first trained using the historical purchase behaviour data of buyers, and then the DeepGBM is trained from the leaves of the GBDT trees:

$e_i = H(\|_{t \in \mathbb{T}} \mathrm{onehot}(L_t(x_i)); \omega), \quad i = 1, \dots, n$,

where $n$ stands for the number of units in the training sample, $\|$ represents the connection (concatenation) operation, $L_t(x_i)$ is the leaf index of sample $x_i$ in tree $t$, and $H(\cdot; \omega)$ is a fully connected network that connects the multiple one-hot indexes and converts them to a dense embedding vector of the tree group $\mathbb{T}$.

The output of the DeepGBM model is expressed as

$\hat{y}(x) = \sigma'(w_1 \times y_{GBDT2NN}(x) + w_2 \times y_{CatNN}(x))$,

where $w_1$ and $w_2$ are the training parameters of the GBDT2NN component of Formula (6) and the CatNN component of Formula (7), respectively, and $\sigma'$ represents the binary output transformation of whether the buyer will purchase again or not.

The model is trained using the following loss function:

$\mathcal{L} = \alpha \mathcal{L}'(\hat{y}(x), y) + \beta \sum_{j=1}^{k} \mathcal{L}^{\mathbb{T}_j}$,  (5)

where $y$ denotes a real repeat buyer, $\hat{y}(x)$ is the prediction result regarding a repeat buyer, $\mathcal{L}'$ denotes the cross-entropy loss function of the classification task of whether a buyer will purchase again or not, and $\mathcal{L}^{\mathbb{T}_j}$ represents the embedding loss of the tree group $\mathbb{T}_j$. It can be inferred from Formula (5) that $k$ is the number of tree groups and that $\alpha$ and $\beta$ are the predefined hyperparameters that control the intensity of the end-to-end loss and the embedding loss, respectively.

The GBDT2NN component is expressed as

$y_{GBDT2NN}(x) = \sum_{j=1}^{k} w^{\mathbb{T}_j} \times \mathcal{N}(x[\mathbb{I}^{\mathbb{T}_j}]; \theta_j)$,  (6)

where $\mathbb{I}^{\mathbb{T}_j}$ represents the impact features of buyer repeat purchase in the tree group $\mathbb{T}_j$ and $k$ is the number of tree groups. Because of the large scale and complex structure of the historical behaviour data of buyers, the number of trees in a tree group is large and each tree contains many features. To improve the ability of feature extraction, the top features are selected to represent all the features in the tree group according to the importance of each feature.

The CatNN component combines an FM part and a deep part:

$y_{FM}(x) = w_0 + \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle e_i, e_j \rangle x_i x_j$,  (7)

where $e_i$ is the embedding vector, $d$ stands for the number of features, $w_0$ and $w$ are the linear part parameters, and $\langle \cdot, \cdot \rangle$ represents the inner product;

$y_{Deep}(x) = \mathcal{N}([e_1; e_2; \dots; e_d]; \theta)$,

where $\mathcal{N}(\cdot; \theta)$ denotes a multilayer neural network with input $[e_1; \dots; e_d]$ and parameter $\theta$.
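
To make the GBDT2NN step concrete, the sketch below distills a trained GBDT into a small neural network via its leaf indexes, using lightgbm and scikit-learn as stand-ins. The end-to-end training with the CatNN component and the combined loss of Formula (5) are omitted, and all hyperparameters are illustrative.

    import lightgbm as lgb
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.neural_network import MLPRegressor

    def gbdt2nn_sketch(X_train, y_train):
        gbdt = lgb.LGBMClassifier(n_estimators=50).fit(X_train, y_train)
        leaves = gbdt.predict(X_train, pred_leaf=True)   # (n_samples, n_trees) leaf ids
        onehot = OneHotEncoder(handle_unknown="ignore")
        L = onehot.fit_transform(leaves)                 # concatenated one-hot leaf indexes
        target = gbdt.predict_proba(X_train)[:, 1]       # distillation target
        # H(.): a fully connected network mapping one-hot leaves to a dense output
        H = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=300).fit(L, target)
        return gbdt, onehot, H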

3.5. DABiGRU Individual Model

The DeepCatboost model is often inefficient in processing sparse historical behaviour data of buyers. Therefore, a DABiGRU model is proposed, which can make full use of the sparse and complex features that are automatically learned from massive data and can meet the basic requirement that individual models have substantial differences in ensemble learning.

The DABiGRU individual model is depicted in Figure 4. The model includes a feature embedding layer for encoding the original data, a bidirectional recurrent layer for modeling buyers’ purchase behaviour, a double attention layer for fusing the bidirectional recurrent layer, and a classification layer for expressing the prediction results.

3.5.1. Feature Embedding Layer

To predict the repeat buyers, features are automatically extracted from the three aspects of information, buyers, sellers, and the interaction between them, by the word embedding method and the DABiGRU neural network. Firstly, the word embedding model is applied to encode the interaction behaviour information between buyers and sellers, the age and sex of the buyer, etc., where the encoding length is determined by experiments. After the word vector codes are obtained, the feature submodel is used to train on the interaction behaviour information between buyers and sellers to obtain the feature vectors.

As shown in the feature embedding layer of Figure 4, each record of buyer behaviour is converted into word vectors by embedding and encoding the three variables commodity_id, brand_id, and commodity category_id. Then, the word vectors of each buyer are transformed into an $n$-dimensional vector through the neural network ReLU_n. The corresponding weight vector $m$ of the $n$-dimensional vectors is obtained by ReLU_M, and the final eigenvector is obtained by averaging all the $n$-dimensional vectors according to the weight vector $m$.

3.5.2. Bidirectional Recurrent Layer

There exists a sequential relationship in the historical behaviour data of buyers, and the BiGRU model is adopted to model this long-term dependency [30]. Compared with the traditional LSTM, this model is faster, and it avoids the vanishing gradient problem of the standard RNN, which makes it more suitable for predicting the purchasing behaviour of buyers.

The BiGRU model is a neural network consisting of a forward GRU unit and a backward GRU unit, as shown in Figure 5. The current hidden layer state $h_t$ of the BiGRU model is codetermined by the current input $x_t$, the output of the forward hidden layer state $\overrightarrow{h}_{t-1}$, and the backward hidden layer state $\overleftarrow{h}_{t+1}$. As the BiGRU model can be regarded as two unidirectional GRU units, the hidden layer state at time $t$ can be obtained as the weighted sum of the forward hidden layer state $\overrightarrow{h}_t$ and the backward hidden layer state $\overleftarrow{h}_t$:

$\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1})$,
$\overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t+1})$,
$h_t = w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t + b_t$,

where the function $\mathrm{GRU}(\cdot)$ makes a nonlinear transformation of the inputted word vector of buyers' behaviour and converts it into the corresponding GRU hidden state, $w_t$ and $v_t$ represent the weights of the forward and backward hidden states of the BiGRU at time $t$, respectively, and $b_t$ is the bias term at time $t$.

The historical purchase behaviour sequence of the buyers is expressed as $H = (h_1, h_2, \dots, h_T) \in \mathbb{R}^{T \times d}$, where $d$ indicates the dimension of the hidden state. The historical purchase behaviour sequences of the buyers include three types: browsing, purchasing, and collecting. The corresponding hidden sequences can be obtained by inputting the three coded feature vectors into the bidirectional recurrent layer: the browsing behaviour sequence of the buyers is expressed as $H^{br}$, the purchasing behaviour sequence as $H^{pu}$, and the collecting behaviour sequence as $H^{co}$.

3.5.3. The Double Attention Layer

In order to better integrate the three behaviour types of buyers’ browsing, buying, and collecting, a double attention mechanism is proposed. The lower attention mechanism allocates enough attention to the key information in the behaviour sequence, and the upper attention mechanism mainly focuses on the relationship between the three behaviour sequences [31].

The lower attention mechanism supports repeat buyer prediction by using self-attention to identify the behaviours that have a greater influence on repeat purchasing within each behaviour sequence. The self-attention mechanism, which usually requires no additional information, can automatically learn the weight distribution from the behaviour data of the buyers. Its formulas are as follows:

$u_i = \tanh(W_1 h_i + b)$,
$\alpha_i = \dfrac{\exp(W_2 u_i)}{\sum_j \exp(W_2 u_j)}$,

where $\alpha_i$ stands for the degree of importance of the $i$-th behaviour to the current behaviour sequence, $u_i$ represents the score that is automatically learned from the behaviour data of buyers, $W_1$ and $W_2$ are the weight matrices, and $b$ is the bias term.

The lower attention mechanism is introduced into the BiGRU model, and its input is the output vector of the BiGRU model. The outputs of the lower attention mechanism are calculated as

$s^{br} = \sum_i \alpha_i h_i^{br}, \quad s^{pu} = \sum_i \alpha_i h_i^{pu}, \quad s^{co} = \sum_i \alpha_i h_i^{co}$,

where $s^{br}$, $s^{pu}$, and $s^{co}$ represent the lower attention mechanism outputs for the buyers' browsing, buying, and collecting behaviour sequences, respectively.

Different from the lower attention mechanism that focuses on the behaviours within a behaviour sequence of buyers, the upper attention mechanism attends to the effect of the three kinds of behaviour sequences of buyers, namely, browsing, buying, and collecting, on repeat purchasing. For example, when a buyer purchases a product from a seller, the whole process may involve browsing, collecting, purchasing, etc. The upper attention model mines the interaction between different behaviour sequences to better model the buyer's purchasing behaviour. Imitating the self-attention mechanism of the Transformer [32, 33], as shown in Figure 6, the upper attention mechanism characterizes the interaction between the buyers' sequences by feeding in two behaviour sequences and calculating the distances between the behaviours of the two sequences [34]:

$c^{bp} = a^{br} \odot a^{pu}, \quad c^{bc} = a^{br} \odot a^{co}, \quad c^{pc} = a^{pu} \odot a^{co}$,

where $a^{br}$ indicates the attention to the browsing behaviour sequence of the buyers, $a^{pu}$ indicates the attention to the purchasing behaviour sequence of the buyers ($a^{co}$ is defined analogously for collecting), $\odot$ stands for the element-wise product of matrices, $c^{bp}$ denotes the interattention to the browsing and purchasing behaviour sequences of the buyers, $c^{bc}$ is the interattention to the browsing and collecting behaviour sequences of the buyers, and $c^{pc}$ represents the interattention to the purchasing and collecting behaviour sequences of the buyers.

3.5.4. Classification Layer

The classification layer classifies the repeat buyers from the one-time buyers by the softmax function. The results of the bidirectional recurrent layer and the double attention layer are concatenated as the input of the classification layer [35].

The probability of each category is calculated as

$p = \mathrm{softmax}(W_c z)$,

where $W_c \in \mathbb{R}^{c \times d}$ is the weight matrix, $d$ is the dimension of the input vector $z$, and $c$ represents the number of categories of repeat buyers and one-time buyers. After the prediction probability distribution is obtained, the cross-entropy loss function is introduced to calculate the difference between the real distribution and the predicted distribution. The parameters of the model are updated with back propagation [36, 37].
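
A compact PyTorch sketch of this pipeline (embedding, BiGRU, lower additive attention, softmax classifier) is given below for a single behaviour sequence. The upper attention that fuses the three behaviour sequences is omitted for brevity, and all dimensions are illustrative rather than the paper's settings.

    import torch
    import torch.nn as nn

    class AttnBiGRU(nn.Module):
        def __init__(self, vocab_size, emb_dim=80, hidden=64, n_classes=2):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.bigru = nn.GRU(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.score = nn.Linear(2 * hidden, 1)   # lower (self-)attention scorer
            self.out = nn.Linear(2 * hidden, n_classes)

        def forward(self, ids):                     # ids: (batch, seq_len)
            h, _ = self.bigru(self.emb(ids))        # (batch, seq_len, 2*hidden)
            a = torch.softmax(self.score(torch.tanh(h)), dim=1)  # attention weights
            s = (a * h).sum(dim=1)                  # weighted sum of hidden states
            return torch.log_softmax(self.out(s), dim=-1)  # class log-probabilities

    model = AttnBiGRU(vocab_size=1000)
    logp = model(torch.randint(0, 1000, (4, 20)))   # toy batch of 4 sequences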

3.6. Vote-Stacking Prediction Model

In the traditional stacking model, the different individual models of the first layer are trained on the same training set, which makes the differences between their output values insignificant and results in poor generalization performance. For this reason, the three-layer vote-stacking model shown in Figure 7 is introduced. The first and second layers are both base learning layers, and the third layer is the meta learning layer. The final prediction results are obtained by applying the majority voting decision mechanism to the outputs of the individual models and the learning layers [38].

The first base learning layer includes the three individual models DeepCatboost, DeepGBM, and DABiGRU. The model uses nonidentical training data to further increase the differences between the output values and improve the prediction ability of the model. The historical purchase behaviour data of buyers is time sensitive: experience suggests that the closer a behaviour is to the prediction time, the greater its influence on the result. Therefore, the original data is divided into three groups by period, and the data in each group is randomly divided into three data clusters. One data cluster in each group is selected randomly without replacement, and the three data clusters selected from the three groups are combined and fed into each individual model for training. The process of building the individual models in the first base learning layer is as follows.

The training set consisting of three data clusters is inputted into each individual model $M_i$, and the probability that each buyer is a repeat buyer is predicted by five-fold cross-validation. The prediction results are expressed as $P_i = (p_{i1}, p_{i2}, \dots, p_{i5})$, where $p_{ik}$ stands for the output of the classifier obtained from the individual model $M_i$ on the $k$-th fold data subset; the value of $p_{ik}$ may be 0 (denoting a one-time buyer) or 1 (denoting a repeat buyer).

The test results $\bar{t}_i$ are obtained by inputting the test set into the three trained individual models, where $\bar{t}_i$ represents the average of the five-fold cross-validation results of the test samples in individual model $M_i$.

The main difference between the second base learning layer and the first is that the second layer retains the implicit relationship between the original features of the historical behaviour of buyers and the predicted probability of repeat buyers. Compared with the first base learning layer, the second learning layer includes an additional prediction result column for the five-fold cross-validation set and an additional test result column for the test set. In the first base learning layer, the prediction results of the five-fold cross-validation set of the three individual models are added to the features of the original training set to form the features of the new training set. The test set prediction results are combined with the initial test set features to form the new features of the test set.

The new training set is divided into five nonoverlapping subsets. One of the five subsets is chosen as the test set and the other four as training sets to train and test the individual models DeepCatboost, DeepGBM, and DABiGRU. This process is repeated until each of the five subsets has been used as the test set once, and the prediction results are saved. In the process of establishing the individual models, each model is tested five times on the test dataset, and the average value is taken as the test result.

The third meta learning layer can effectively integrate the advantages of the three individual models and improve accuracy and stability. LR has a strong generalization ability and can reduce the overfitting risk of vote-stacking, so the meta learning layer is modeled by LR [39].
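
A minimal scikit-learn sketch of the vote-stacking fusion is shown below. For brevity it compresses the two base learning layers into one, assumes three scikit-learn-style base models with predict_proba, and lets a tie in the final vote default to the negative class, so it illustrates the scheme rather than reproducing the paper's exact three-layer setup.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    def vote_stacking(models, X_train, y_train, X_test):
        # five-fold out-of-fold probabilities from each individual model
        oof = np.column_stack([
            cross_val_predict(m, X_train, y_train, cv=5,
                              method="predict_proba")[:, 1]
            for m in models
        ])
        test_cols = []
        for m in models:
            m.fit(X_train, y_train)
            test_cols.append(m.predict_proba(X_test)[:, 1])
        base_probs = np.column_stack(test_cols)
        # meta learner (LR) on [original features, base-layer predictions]
        meta = LogisticRegression(max_iter=1000).fit(
            np.hstack([X_train, oof]), y_train)
        meta_pred = meta.predict(np.hstack([X_test, base_probs]))
        # majority vote over the three base models and the meta learner
        votes = np.column_stack([(base_probs > 0.5).astype(int), meta_pred])
        return (votes.mean(axis=1) > 0.5).astype(int)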

4. Experiment and Analysis

4.1. Experimental Data and Evaluation Indicators

The experimental data contains the shopping information of approximately 260,000 anonymous buyers six months before and after the "Double Eleven Shopping Festival" (https://tianchi.aliyun.com/competition/entrance/231674). It is mainly composed of three tables of information relating to the buyers: the personal information table, the buyer behaviour log table, and the purchase behaviour table of the buyer. The ratio of the positive samples of repeat buyers to the negative samples of one-time buyers is about 1 : 10, and sample balancing is carried out with the sub-time undersampling method. The task is to predict, from this historical data, whether a buyer will become a repeat buyer of a given merchant within six months after the promotion.

For the repeat buyer prediction problem, the combinations of the real category and the predicted category fall into four cases: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). After the confusion matrix is obtained, the accuracy and area under the curve (AUC) values can be calculated as

$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$,
$\mathrm{AUC} = \dfrac{\sum_{i \in \mathrm{positive}} \mathrm{rank}_i - M(M+1)/2}{M \times N}$,

where $M$ is the total number of repeat buyers (positive samples) and $N$ is the total number of one-time buyers (negative samples).
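
With predicted probabilities in hand, both indicators can be computed directly with scikit-learn; the arrays below are toy values for illustration.

    import numpy as np
    from sklearn.metrics import accuracy_score, roc_auc_score

    y_true = np.array([1, 0, 0, 1, 0])            # real labels (1 = repeat buyer)
    p_pred = np.array([0.8, 0.3, 0.6, 0.7, 0.2])  # predicted repeat probabilities
    acc = accuracy_score(y_true, p_pred > 0.5)    # (TP + TN) / (TP + TN + FP + FN)
    auc = roc_auc_score(y_true, p_pred)           # rank statistic over M x N pairs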

4.2. Experimental Results and Analysis
4.2.1. Word Vector Encoding Length Selection

The encoding length of the word vectors of the feature embedding layer in the DABiGRU individual model is determined by experiments. The encoding length is positively related to the size of the coding objects: the higher the total number of words, the longer the encoding length. The ratio of the total numbers of words for commodities, brands, and commodity categories, obtained by statistics, is about 625 : 5 : 1. The experiments are carried out on the five combinations of word vector lengths shown in Table 1. As the total numbers of words for the age and sex of the buyers are small, their word vector lengths are set to 10.

Experimental results show that the AUC value improves as the word vector encoding length increases. The main reason is that a longer encoding has a stronger representation ability, which in turn improves the overall representation ability of the DABiGRU individual model. However, a longer word vector encoding also makes the DABiGRU individual model consume more system resources. Considering the limitations of the experimental environment, the word vector encoding length combination of [150, 80, 80, 10, 10] was selected for the following experiments.

4.2.2. Impact Feature Analysis of Repeat Purchase by Buyers

To analyze the impact features of the repeat purchases by buyers, Table 2 presents the top 10 impact features of the DeepCatboost individual prediction model.

The features listed in Table 2 show that the interaction features have the strongest impact on repeat purchase, followed by the seller features and the buyer features. Regarding the interaction between the buyer and seller, feature (1) represents the degree of "preference" of a buyer for a seller; in practice, buyers are often accustomed to buying goods from sellers they have purchased from previously. Features (5) and (6) measure the length of time a buyer "stays" with a given seller; by experience, the longer a buyer "stays" with a certain merchant, the more likely they are to buy. From the buyer's point of view, it can be inferred from feature (1) that the promotion intensity has the greatest impact on predicting repeat purchase, and from features (2) and (4) that the buyer purchase conversion rate is also an important impact factor for repeat purchase prediction. From the seller's point of view, the buyer's age in feature (7) and gender in feature (10) also show a certain impact on repeat buyer prediction.

4.2.3. Improved Individual Model and Vote-Stacking Ensemble Model Test

Figure 8 demonstrates the difference in accuracy and AUC values before and after the improvement of the DeepCatboost and DABiGRU individual prediction models.

It can be observed from the experimental results shown in Figure 8 that the accuracy and AUC values of the improved DeepCatboost and DABiGRU individual models are superior to those of the original Catboost and LSTM models on both the training set and the test set, which indicates that the improved DeepCatboost individual model can effectively reduce the risk of overfitting and improve the accuracy and robustness of predictions. The DABiGRU model outperforms the original LSTM neural network model by introducing the attention mechanism and improving the neural cell structure. This is because the added double attention mechanism can effectively mine and pay sufficient attention to the key information within and between the sequences of the historical purchase behaviour of buyers. Compared with the unidirectional LSTM neural network model, the output state of the BiGRU neural network model is codetermined by the preceding and subsequent inputs, and the prediction result is more accurate. In addition, the experimental results show that the accuracy and AUC values of the proposed model on the training set and the test set differ little, and there are no obvious overfitting or underfitting problems.

Table 3 shows the experimental results of the accuracy and AUC values of five ensemble learning prediction models. In order to evaluate the experimental results objectively and accurately, each model was executed 10 times on the same training and test sets, and the average and variance of the accuracy and AUC values were considered the final results.

It can be observed from Table 3 that the average accuracy and AUC values of the vote-stacking ensemble model are 91.28% and 70.50%, which are higher than those of the other four ensemble models on both the training set and the test set. This demonstrates the effectiveness of the vote-stacking ensemble prediction model's selection of individual models and its stacking fusion strategy. The main reasons are as follows. Compared with the XGBoost and LightGBM ensemble learning models based on GBDT, the proposed ensemble model can not only effectively deal with the dense numerical features in the historical behaviour data of buyers but also process large-scale sparse categorical features efficiently, which further enriches the feature types and improves the performance of the model. The nonidentical training data employed in the vote-stacking model increases the differences between the output values. The features of the original data and the new features generated by the base learning layer are effectively combined to enrich the feature types of the model. The final test results are determined by the voting mechanism, which effectively combines the test results of the base learning layer and the meta learning layer and reduces the error of the traditional stacking algorithm that relies only on the meta learning layer. In addition, the difference in accuracy and AUC values between the training set and the test set is small, indicating that the model has good generalization ability.

In order to verify the stability of the vote-stacking model more intuitively, the accuracy values of the five models on 10 training sets are presented in a line graph as shown in Figure 9.

It can be seen from Figure 9 that, compared with the individual models, the vote-stacking and stacking models fluctuate less and stay within a higher-accuracy interval, so their overall stability is better. The vote-stacking fusion method is therefore more suitable than the traditional stacking method for the repeat buyer prediction task, where the dataset differences are significant. By combining the prediction results of the three individual models, the ensemble learning model can effectively avoid relying on any individual model with a poor prediction result. By improving the performance of each individual model and the collaboration between the individual models, the ensemble model improves the overall accuracy and stability of the predictions.

It can be observed from Figure 10 that the AUC values of the seven models improve to different extents after sub-time undersampling. Among them, the AUC values of the DeepCatboost, LightGBM, vote-stacking, and stacking models are enhanced most obviously. The experimental results show that, when sub-time undersampling is used to process the historical behaviour data of buyers, the prediction accuracy of the models improves even though reducing the negative samples loses some information.

In order to verify the effectiveness of the vote-stacking model, it is compared with five other prediction models, namely, SVM [8], RF [11], TMFBG [4], neural network (NN) [10], and blending [5], using the same datasets and experimental settings. The results are shown in Table 4.

The proposed model outperforms the other five models in terms of accuracy and AUC values. The reason is that the proposed model can effectively fuse the advantages of the decision tree and neural network models on random and nonlinear training data and can efficiently process the high-dimensional nonlinear historical behaviour data of buyers. It also has better generalization ability than the reference models.

5. Conclusions

A repeat buyer prediction ensemble model based on the vote-stacking fusion method was introduced in this paper. First, the model applies sub-time undersampling to balance the unbalanced historical behaviour data of buyers. Then, the three individual models DeepCatboost, DeepGBM, and DABiGRU are constructed. Finally, the vote-stacking fusion method is used to fuse the prediction results of the three individual models and obtain the final prediction results. The effectiveness of the proposed method was evaluated on the real Tmall dataset, and the experimental results showed that the proposed model is superior to the reference models. In addition, the DeepCatboost individual model was used to analyze the important impact features related to buyers' repeat purchase behaviour, providing data support for sellers to carry out precision marketing and increase buyer loyalty.

In the future, we will continue to investigate the impact factors associated with buyers' repeat purchase behaviour, construct additional fine-grained features, and design ensemble prediction models with more, and more diverse, individual models. In addition, we will attempt to derive better ensemble learning fusion strategies to further improve the performance of repeat buyer prediction.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant numbers 61662013, U171123, U1811264, and 61662015); Guangxi Innovation-Driven Development Project (Science and Technology Major Project) (grant number AA17202024); and Graduate Student Innovation Program, Guilin University of Electronic Technology (grant number 2019YCXS045). We thank LetPub (https://www.letpub.com) for its linguistic assistance during the preparation of this manuscript.