[Retracted] Real-Time Prediction of Cross-Border e-Commerce Spike Performance Based on Neural Network and Decision Tree

Li, Na; Gong, Chongyi; Lv, Dongqin

DOI: https://doi.org/10.1155/2022/5066467

Wireless Communications and Mobile Computing


This article has been retracted.

Special Issue

Machine Learning Enabled Signal Processing Techniques for Large Scale 5G and 5G Networks


Research Article | Open Access

Volume 2022 | Article ID 5066467 | https://doi.org/10.1155/2022/5066467


Academic Editor: Mohammad Farukh Hashmi

Received 08 Mar 2022

Accepted 23 Mar 2022

Published 14 May 2022

Abstract

For a cross-border e-commerce company's spike (flash-sale) campaign, both time and data are limited, so the prediction draws on historical campaign performance data together with the data available from the current campaign. Because the data are limited and the prediction task is unusual, the raw data are first processed and then modeled with a BP (backpropagation) neural network. This paper proposes a real-time prediction model based on a decision tree and a BP neural network that forecasts the company's spike-campaign performance (sales) every minute and can also serve as an early warning, which supports the company's decision-making. In practice, the company used this model to detect a downward performance trend during the May spike campaign and then improved performance and the sell-out rate through email marketing and larger discounts, avoiding an inventory backlog.

1. Introduction

With the development of the Internet and the growing number of Internet users, e-commerce has emerged as one of the Internet's biggest beneficiaries [1]. In recent years, it has also become a preferred shopping channel for many consumers. As the industry has grown, competition has intensified, and e-commerce companies depend on data more heavily than traditional retailers do [2, 3]. For every e-commerce company, performance (sales revenue) is the most important indicator, and predicting it is especially important during spike events [4].

In recent years, the number of small stores marketing through the Internet has grown year by year. According to incomplete statistics, there are more than 10 million online businesses on the Jingdong financial platform alone [5]. Thousands of small stores are gathered on e-commerce platforms, and, like other traditional enterprises, they have difficulty obtaining financing. Their main financing channels are bank loans, public financing, and guarantees from third-party institutions. However, the information systems of small and medium-sized stores are immature [6]: they often lack formal financial statements, valuable collateral, and complete credit records. It is therefore difficult for small and medium-sized e-commerce enterprises to obtain bank loans, while for banks, lending to SMEs is costly and risky. In short, one of the main reasons small and medium-sized retailers struggle to obtain loans is the low value of their collateral, and access to capital has been a key factor limiting the growth of these stores from the start [1].

To address the financing problem that restricts the development and operating scale of small and medium-sized stores on their platforms, Alibaba, Jingdong, Suning, and other leaders of the e-commerce industry have drawn on their large customer bases and databases to set up independent microfinance companies and launch corresponding products that help the enterprises and merchants on their platforms [7, 8]. For example, Jingdong Finance has launched three core products, Jingbao Bei, Beijing Microfinance, and Real Estate Financing, which have greatly eased the difficult and costly financing faced by small and micro e-commerce enterprises. Among them, Beijing Microfinance is a credit-first financial product: borrowers have a high degree of autonomy, no collateral is required, funds are available quickly and at low cost, borrowing and repayment can take place at any time, and approval is fully online. It thus accelerates the flow of funds for enterprises and merchants, which can use these funds to develop more convenient sales channels and reach a broader market [9].

A search of the related literature shows many models that can be used for sales forecasting, such as linear regression, exponential smoothing for time series, logistic regression, and logistic regression combined with weight-of-evidence (WOE) encoding. These models require enough historical data to be collected in order to predict with high accuracy. In commodity sales forecasting, demand in the retail industry is unstable, and sales volume changes without an obvious pattern because people's interests and consumption habits change constantly. Various external social factors, such as climate, commodity markets, and national macroeconomic control policies, also play a role [10], as do rapid changes in market demand, fashion trends, and promotions. These changes make commodity forecasts highly uncertain, and how to use data collected in an early period to forecast later sales accurately needs further study [10].

However, since the data studied in this paper are not tied to specific product categories (such as clothing, household electrical appliances, or electronic devices), we can choose a model that makes accurate short-term forecasts directly from the store's actual daily data, without modeling the influence of climate, consumer interests, demand, and similar factors [11]. We apply two machine learning algorithms, the BP neural network model and the gradient boosting decision tree (GBDT) algorithm, to uncover patterns that usually go unnoticed, since in everyday practice people tend only to observe such phenomena without analyzing them. For example, whether for an online store or a brick-and-mortar store, a wider variety of products is generally assumed to bring more sales, yet the main factors behind this are rarely analyzed [12, 13].

The contributions of this paper are as follows:

This paper proposes a real-time prediction model based on a decision tree and a BP neural network. The model predicts the performance of the company's spike campaigns every minute and can also provide early warnings.

Given the many difficulties and restrictions involved, we process all the order data in the database and use a BP neural network model for prediction. Because spike campaigns are short, we take the cumulative performance value as the model output so that the model tracks real-time changes in performance.

This paper also analyzes stores by category, so that sellers can compare themselves with stores of the same type and formulate a clearer development strategy for their own stores. This can improve a store's reputation and profit, strengthen its standing with customers, and demonstrate a higher credit limit and capability to investors.

2. Related Work

In recent years, with the rapid development of neural networks, more and more researchers have applied neural network methods to various fields, including commodity sales forecasting. For example, Fong et al. [14] applied neural networks to economic forecasting, and Lee et al. [15] used neural network models to forecast commodity sales. The results of [16] show that the strong nonlinear approximation capability of neural networks is well suited to forecasting the sales of goods. In [17], online store sales are predicted with a BP neural network. Alsalman et al. [18] alleviated the problem of locally optimal solutions and gave neural networks real "depth," after which deep neural networks were gradually adopted in sales prediction research. For example, Abdulkarem and Hou [19] used a convolutional neural network to extract effective features automatically from raw structured data, applied the method to product sales prediction, and verified its effectiveness on an e-commerce dataset; Du et al. [20] applied deep learning to build the Crown model, a sales prediction model for agricultural e-commerce, and achieved effective sales predictions for agricultural e-commerce; and Yang et al. [21] trained an LSTM neural network with stochastic gradient descent as a supervised learning method, compared it with a BP network in predicting future sales of goods, and showed that the LSTM network adapts better and is more suitable for sales prediction in the e-commerce industry.

Lu et al. [9] analyzed the types and causes of e-commerce finance risks from the perspective of microfinance companies established on e-commerce platforms and concluded that a sound credit risk evaluation system is needed to control the risks on these platforms. Ren et al. [1] studied e-commerce platforms' lending strategies toward online merchants and the corresponding adjustment mechanisms under the condition of controlled risk.

Although neural networks are increasingly used in the e-commerce industry, their application to spike campaigns remains largely unexplored, mainly because neural networks need a large amount of training data. This paper constructs enough training data for a neural network by processing the raw data and then evaluates the accuracy of the BP neural network model's predictions.

3. Introduction to the Boosting Algorithm and Decision Trees

3.1. Introduction to Decision Trees

When learning a single decision tree, tree ensembles constrain it in different ways: a random forest considers only a random subset of the features, whereas regularized boosting methods add penalty terms to the objective function to discourage complex tree structures.

GBDT uses regression trees as its base learners. Decision trees are divided into classification trees and regression trees: a classification tree chooses its splits with an impurity measure such as entropy (maximizing the information gain), while a regression tree chooses the splits that minimize the mean squared error. The GBDT algorithm can be used for both classification and regression. A GBDT model consists of many decision trees, usually at least a hundred, and each tree is small (i.e., shallow). At prediction time, a sample is passed through every tree; each tree's output corrects the current predicted value, and the accumulated result is the final prediction [22].
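To make the regression-tree criterion concrete, the following minimal sketch (plain Python; the helper names `sse` and `best_split` are hypothetical, not from the paper) searches one numeric feature for the threshold whose two child nodes have the smallest total squared error. Because the sample count is fixed, this is equivalent to minimizing the mean squared error of the split.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best split "x <= threshold".

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns (None, sse(ys)) when no split is possible.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy example (made-up values): y jumps from about 1 to about 5 after x = 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
print(best_split(xs, ys))  # threshold 3.5, total squared error about 0.10
```

A full regression tree repeats this search over all features and recursively inside each child node; GBDT keeps these trees shallow.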

GBDT is a member of the boosting family of ensemble learning methods and is trained iteratively. Suppose that the previous iteration produced a strong learner $f_{t-1}(x)$ with loss $L(y, f_{t-1}(x))$. The goal of the current iteration is to find a weak learner $h_t(x)$, a CART regression tree, that minimizes this round's loss $L(y, f_t(x)) = L(y, f_{t-1}(x) + h_t(x))$; that is, each round looks for the decision tree that makes the loss on the samples as small as possible.
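Under the squared-error loss that Section 5.1 uses, this minimization takes a simple form. The short worked derivation below writes $f_{t-1}$ for the model after round $t-1$ and $h_t$ for the new tree, and assumes the conventional loss $L(y, f) = \tfrac{1}{2}(y - f)^2$ (the factor $\tfrac{1}{2}$ only rescales the gradient) plus a shrinkage factor $\nu$, the learning rate that gbm calls `shrinkage`:

```latex
\begin{aligned}
r_{t,i} &= -\left[\frac{\partial L\bigl(y_i, f(x_i)\bigr)}{\partial f(x_i)}\right]_{f = f_{t-1}}
         = y_i - f_{t-1}(x_i), \\
h_t &= \arg\min_{h} \sum_{i=1}^{n} \bigl(r_{t,i} - h(x_i)\bigr)^2, \\
f_t(x) &= f_{t-1}(x) + \nu\, h_t(x), \qquad 0 < \nu \le 1.
\end{aligned}
```

Here $h$ ranges over regression trees of the chosen depth. Under squared loss, each new tree is therefore fitted to the current residuals, and with $\nu = 1$ the update reduces to $f_t = f_{t-1} + h_t$.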

Neural networks mimic the way human neurons are activated and transmit signals. Figure 1 shows, from left to right, the input layer, the hidden layer, and the output layer.

Each neuron first accumulates the stimuli from the neurons connected to it and then passes the result, after applying the activation function, to the neurons of the next layer, as shown in Figure 2.

The BP neural network model has a three-layer structure: the input layer, the hidden layer, and the output layer. Let $w_{ij}$ denote the weight from input neuron $i$ to hidden neuron $j$, $b_j$ the bias of hidden neuron $j$, and $f$ the activation function. The input $\alpha_j$ and output $h_j$ of each hidden neuron $j$ are then

$$\alpha_j = \sum_{i} w_{ij} x_i + b_j, \qquad h_j = f(\alpha_j).$$

This completes forward propagation; backpropagation (BP) follows. The BP neural network first assigns random values to all parameters and then, guided by the chosen error function, updates all parameters from the output layer back toward the input layer by gradient descent [21, 23]. Collect all parameters in the vector $\theta = (w, b)$, where $w$ denotes the weights and $b$ the biases, and let $E(\theta)$ be the error function. The vector of partial derivatives of the error function with respect to the parameters is

$$\nabla_\theta E = \left(\frac{\partial E}{\partial w}, \frac{\partial E}{\partial b}\right).$$

The parameter vector is then updated as

$$\theta \leftarrow \theta - \eta \, \nabla_\theta E,$$

where $\eta > 0$ is the learning rate.
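The forward pass, backpropagation, and gradient-descent update described above can be sketched in plain Python for the smallest useful case. This is a hypothetical illustration, not the authors' network. It assumes one input, a single hidden layer of ReLU units, one linear output, the per-sample error $E = \tfrac{1}{2}(\hat{y} - y)^2$, and per-sample (stochastic) gradient descent with learning rate `lr` in the role of $\eta$. The function name `train_bp` and the toy data are made up.

```python
import random


def relu(z):
    """ReLU activation: max(0, z)."""
    return z if z > 0 else 0.0


def train_bp(samples, hidden=8, lr=0.01, epochs=200, seed=0):
    """Train a 1-input, `hidden`-unit ReLU, 1-output network by backpropagation.

    samples: list of (x, y) pairs; per-sample error E = 0.5 * (y_hat - y) ** 2.
    Returns the parameters and the mean error of each epoch.
    """
    rng = random.Random(seed)
    # Start from random parameter values, as the text describes.
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                           # output bias
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            # Forward pass: hidden inputs, hidden outputs, network output.
            a = [w1[j] * x + b1[j] for j in range(hidden)]
            h = [relu(v) for v in a]
            y_hat = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = y_hat - y  # dE/dy_hat
            total += 0.5 * err ** 2
            # Backward pass: chain rule from the output layer back to the input
            # layer, then the step theta <- theta - lr * dE/dtheta.
            for j in range(hidden):
                grad_a = err * w2[j] * (1.0 if a[j] > 0 else 0.0)  # old w2[j]
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_a * x
                b1[j] -= lr * grad_a
            b2 -= lr * err
        history.append(total / len(samples))
    return (w1, b1, w2, b2), history


# Toy data (made up): learn y = 2x + 1 on x in [0, 1].
data = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
params, history = train_bp(data)
print(round(history[0], 4), round(history[-1], 4))  # the error should fall
```

Each hidden unit's gradient is computed before its outgoing weight is updated, so the update uses the gradient at the current parameters, matching the formula above.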

4. Data Selection and Processing

Since we are predicting spike-campaign performance, we select all records in the company database's order tables for the campaign period. The order table records each customer's email, order time, order number, order amount, and other variables; of these, we use only the order time and the order amount.

However, predicting performance during a spike campaign differs from ordinary performance prediction: spike campaigns are usually short, so little data is available for prediction. Given these difficulties and limitations, we process all the order data in the database and then forecast with a BP neural network model [24–26]. To track real-time changes in performance over such a short campaign, we take the cumulative performance value as the model output. To obtain enough samples to train the BP neural network, we slice the order time into minutes and sum the order amounts within each minute to obtain the per-minute performance value. Accumulating the per-minute performance values over time then yields the cumulative performance value for each minute, which turns the raw data into an incremental time series.
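The minute-slicing and accumulation steps can be sketched as follows. This is a hypothetical helper, not the authors' code. It assumes each order arrives as a `(datetime, amount)` pair. The paper does not say how minutes without any orders are handled, so the sketch assumes they repeat the running total.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def cumulative_per_minute(orders):
    """Turn raw orders into a cumulative per-minute performance series.

    orders: iterable of (order_time, order_amount) with order_time a datetime.
    Returns [(minute, cumulative_amount), ...] covering every minute from the
    first to the last order; minutes without orders repeat the running total
    (an assumption, since the source does not specify this case).
    """
    per_minute = defaultdict(float)
    for order_time, amount in orders:
        minute = order_time.replace(second=0, microsecond=0)  # slice by minute
        per_minute[minute] += amount                          # per-minute sum
    if not per_minute:
        return []
    series, running = [], 0.0
    minute, last = min(per_minute), max(per_minute)
    while minute <= last:
        running += per_minute.get(minute, 0.0)  # accumulate over time
        series.append((minute, running))
        minute += timedelta(minutes=1)
    return series


# Toy orders (made-up values): two in 10:00, none in 10:01, one in 10:02.
orders = [
    (datetime(2022, 5, 1, 10, 0, 5), 20.0),
    (datetime(2022, 5, 1, 10, 0, 40), 15.5),
    (datetime(2022, 5, 1, 10, 2, 10), 30.0),
]
for minute, total in cumulative_per_minute(orders):
    print(minute.strftime("%H:%M"), total)
# 10:00 35.5
# 10:01 35.5
# 10:02 65.5
```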

5. Analysis of Experimental Results

5.1. Modeling and Analysis Based on GBDT Algorithm

The experiments mainly involve fitting the GBDT model and training the neural network model. The experimental environment is a single machine with an 8-core CPU and 36 GB of memory, and the programming environment is PyTorch [4]. The main experimental parameters follow the usual settings for each model and algorithm.

Following the GBDT algorithm and its implementation steps, we use 90% of the data set, consisting of the variables retained by the generalized linear model, as the training set and the remaining 10% as the test set, and fit the model with the R software. Three variables, ord_cnt, rtn_amt, and offer_amt, are used as inputs to the GBDT model, which can be implemented in R with the relevant functions of the gbm package. The loss function is the mean squared error. For the shrinkage (learning rate), a step that is too large tends to overshoot the optimum, so a small learning rate is preferred; however, a smaller step requires more training iterations to reach the optimal model, which increases the training time and computational resources required. We set the shrinkage to 0.01 and the resampling ratio (bag.fraction) to 0.5. For the tree depth (interaction.depth), shallow trees were preferred, and after repeated trials the final depth was set to 1. The maximum number of iterations is 1000.
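The settings above can be mirrored in a small, dependency-free Python sketch. It is neither the gbm package nor the authors' code. It is a hypothetical re-implementation of gradient boosting with squared loss, depth-1 trees, shrinkage 0.01, a subsampling fraction of 0.5 without replacement, and 1000 iterations. The parameter names `n_trees`, `shrinkage`, and `bag_fraction` only echo gbm's `n.trees`, `shrinkage`, and `bag.fraction`. The sketch also accumulates each variable's squared-error reduction and normalizes the totals to sum to 100, which is how relative influence is commonly reported for boosted trees.

```python
import random


def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r.

    Returns (feature, threshold, left_value, right_value, gain), where gain is
    the reduction in squared error, or None if no split is possible.
    """
    n, total = len(r), sum(r)
    best = None
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # identical values cannot be separated
            right_sum = total - left_sum
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[4]:
                best = (f, (lo + hi) / 2, left_sum / k, right_sum / (n - k), gain)
    return best


def gbdt_fit(X, y, n_trees=1000, shrinkage=0.01, bag_fraction=0.5, seed=0):
    """Boost stumps under squared loss; returns (model, relative_influence)."""
    rng = random.Random(seed)
    n = len(y)
    f0 = sum(y) / n                              # initial constant model
    pred = [f0] * n
    stumps, influence = [], [0.0] * len(X[0])
    m = min(n, max(2, int(bag_fraction * n)))    # rows used per tree
    for _ in range(n_trees):
        rows = rng.sample(range(n), m)           # subsample without replacement
        resid = [y[i] - pred[i] for i in rows]   # negative gradient of squared loss
        stump = fit_stump([X[i] for i in rows], resid)
        if stump is None:
            continue
        f, thr, left, right, gain = stump
        stumps.append((f, thr, left, right))
        influence[f] += gain                     # credit the split variable
        for i in range(n):                       # shrunken update of every row
            pred[i] += shrinkage * (left if X[i][f] <= thr else right)
    total = sum(influence) or 1.0
    return (f0, shrinkage, stumps), [100.0 * v / total for v in influence]


def gbdt_predict(model, x):
    """Sum the initial value and every shrunken stump output for one sample."""
    f0, shrinkage, stumps = model
    return f0 + shrinkage * sum(left if x[f] <= thr else right
                                for f, thr, left, right in stumps)


# Made-up data: three inputs standing in for ord_cnt, rtn_amt, offer_amt;
# the target depends mostly on the first input and not at all on the third.
X = [[i % 10, (i // 10) % 6, i % 3] for i in range(60)]
y = [3.0 * a + 0.5 * b for a, b, _ in X]
model, rel_inf = gbdt_fit(X, y)
print([round(v, 1) for v in rel_inf])  # the first input should dominate
```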

Cross-validation determines the optimal number of iterations to be 316, the point at which the cross-validation residual error reaches its minimum, as shown in Figure 3.
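Choosing the number of iterations by cross-validation amounts to averaging each fold's validation-loss curve and taking the iteration with the lowest mean. The following is a minimal sketch with made-up loss values. The function name `best_iteration` is hypothetical, and gbm performs a comparable step internally.

```python
def best_iteration(fold_curves):
    """Return (iteration, mean_loss) minimizing the fold-averaged validation loss.

    fold_curves[k][t] is the validation loss of fold k after t + 1 iterations;
    the returned iteration is 1-based, like a tree count.
    """
    n_iter = len(fold_curves[0])
    mean_curve = [sum(curve[t] for curve in fold_curves) / len(fold_curves)
                  for t in range(n_iter)]
    t_best = min(range(n_iter), key=lambda t: mean_curve[t])
    return t_best + 1, mean_curve[t_best]


# Two hypothetical folds whose mean loss bottoms out at the third iteration.
folds = [[5.0, 3.0, 2.0, 2.5, 3.0],
         [6.0, 4.0, 2.2, 2.1, 2.8]]
print(best_iteration(folds))  # iteration 3, mean loss about 2.1
```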

Figure 4 illustrates the importance of each explanatory variable. The relative influence of order quantity (ord_cnt) on the dependent variable is 58.03, that of the return amount (rtn_amt) is 20.98, and that of the discount amount (offer_amt) is 20.97. This ranking of variable importance from the GBDT algorithm is fully consistent with the results of the generalized linear model.

The generalized linear model, the GBDT algorithm [5, 6], and the BP neural network model were each used for prediction. Figures 5 and 6 compare the predicted and true values on the final test data for the three models, together with the relative error of each model's predictions. They show that, for sales in this store category, the GBDT algorithm fits the true values better than the generalized linear model and the BP neural network: its relative error is mostly below 5%, with only one or two values above 6%. The GBDT algorithm therefore simulates future sales for the first category of stores better and with higher prediction accuracy.
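The relative-error comparison can be reproduced in a few lines. This sketch assumes the usual definition $|\hat{y} - y| / |y|$, since the paper does not state its formula. The function names and the numbers are made up for illustration and are not the paper's data.

```python
def relative_errors(predicted, actual):
    """Relative error |predicted - actual| / |actual| for each test point."""
    return [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]


def share_within(errors, tolerance):
    """Fraction of points whose relative error is at most `tolerance`."""
    return sum(e <= tolerance for e in errors) / len(errors)


# Made-up test values, not the paper's data.
actual = [100.0, 200.0, 400.0, 800.0]
predicted = [103.0, 191.0, 412.0, 856.0]
errors = relative_errors(predicted, actual)
print([round(e, 3) for e in errors])  # [0.03, 0.045, 0.03, 0.07]
print(share_within(errors, 0.05))     # 0.75
```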

5.2. BP Neural Network Prediction Effect

A training set of 1000 samples from the middle of the campaign was selected to train the network, which was then used to predict the performance over the next 500 minutes. Because the data set is relatively simple, a three-layer fully connected neural network is built, with the ReLU activation function and a dropout layer to prevent overfitting. As shown in Figure 7, both curves decrease rapidly and then stabilize.
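The paper does not state which inputs the network receives. One common choice, assumed here purely for illustration, is to feed the previous `window` cumulative values and predict the next one. The sketch below builds such pairs and splits them in time order into 1000 training samples followed by 500 test samples. `make_windows`, `chronological_split`, and the window length of 10 are hypothetical.

```python
def make_windows(series, window):
    """Pair each value with the `window` values that precede it."""
    inputs, targets = [], []
    for t in range(window, len(series)):
        inputs.append(series[t - window:t])
        targets.append(series[t])
    return inputs, targets


def chronological_split(inputs, targets, n_train, n_test):
    """Split without shuffling so the test block lies after the training block.

    Shuffling would let later minutes leak into the training set.
    """
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:n_train + n_test], targets[n_train:n_train + n_test])
    return train, test


# Toy cumulative series of 1600 minutes (made up): a steadily rising total.
series = [10.0 * minute for minute in range(1600)]
X, y = make_windows(series, window=10)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y, 1000, 500)
print(len(train_X), len(test_X))  # 1000 500
```

To start from the middle of a campaign, as the paper does, the same split can be applied to a slice of `X` and `y` that begins at a later index.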

Figure 8 compares the trends of the predicted and true values. The two remain close, so the prediction accuracy is good enough to support decision-making to a large extent; the forecast horizon can, of course, be extended if necessary.

To evaluate the BP neural network model more comprehensively, we applied it to the entire campaign data set. To improve accuracy, we used the first 500 minutes to predict the next 200 minutes and then produced a global forecast for the entire campaign, as shown in Figure 9. The forecast is inaccurate in a small region in the middle; on inspecting the data, we found that email marketing was added during that period of the campaign, which caused a small surge in performance.
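The scheme of using 500 minutes to forecast the next 200 and then rolling forward across the whole campaign can be sketched as a rolling-origin loop. To keep the sketch self-contained, a least-squares linear trend stands in for the trained BP network. This is an explicit substitution, not the paper's model. `fit_linear_trend` and `rolling_forecast` are hypothetical names.

```python
def fit_linear_trend(history):
    """Least-squares line value = intercept + slope * t for t = 0..n-1."""
    n = len(history)
    t_mean = (n - 1) / 2
    v_mean = sum(history) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(history))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den if den else 0.0
    return slope, v_mean - slope * t_mean


def rolling_forecast(series, train_len=500, horizon=200):
    """Forecast `horizon` steps from each window of `train_len` points, then
    move the window forward by `horizon` until the series is covered.

    Returns (index, forecast) pairs for indices train_len .. len(series) - 1.
    """
    forecasts = []
    start = 0
    while start + train_len < len(series):
        # Stand-in model: refit a linear trend on the current window.
        slope, intercept = fit_linear_trend(series[start:start + train_len])
        steps = min(horizon, len(series) - start - train_len)
        for h in range(steps):
            t = train_len + h  # position relative to the window start
            forecasts.append((start + t, intercept + slope * t))
        start += horizon
    return forecasts


series = [5.0 + 2.0 * minute for minute in range(1000)]  # made-up linear data
result = rolling_forecast(series)
print(len(result), result[0][0], result[-1][0])  # 500 500 999
```

Replacing `fit_linear_trend` with a trained network, refit or fine-tuned on each window, gives the global forecast the text describes.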

6. Conclusions

In this paper, we apply a BP neural network, together with a GBDT model, to this prediction task. In practice, the company used the model to predict the performance of its May spike campaign and then improved performance and the sell-out rate through email marketing and larger discounts, avoiding an inventory backlog. In the future, we plan to build e-commerce sales prediction models for other scenarios, for example, sales prediction for agriculture-related e-commerce.

Data Availability

The dataset used in this paper is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgments

This work was supported by the General Scientific Research Project of Zhejiang Provincial Department of Education—research on the relationship between economic growth and environmental quality in Zhejiang Province under the practice of “two mountains” concept (Y202045106).

References

[1] S. Ren, T. M. Choi, K. M. Lee, and L. Lin, “Intelligent service capacity allocation for cross-border-E-commerce related third-party-forwarding logistics operations: a deep learning approach,” Transportation Research Part E: Logistics and Transportation Review, vol. 134, p. 101834, 2020.
[2] J. Guo, Y. Li, Y. Xu, and K. Zeng, “How live streaming features impact consumers’ purchase intention in the context of cross-border E-commerce? A research based on SOR theory,” Frontiers in Psychology, vol. 12, p. 5015, 2021.
[3] L. Einav, J. Levin, I. Popov, and N. Sundaresan, “Growth, adoption, and use of mobile E-commerce,” American Economic Review, vol. 104, no. 5, pp. 489–494, 2014.
[4] P. Dutta, P. Suryawanshi, P. Gujarathi, and A. Dutta, “Managing risk for e-commerce supply chains: an empirical study,” IFAC-PapersOnLine, vol. 52, no. 13, pp. 349–354, 2019.
[5] S. Ji, X. Wang, W. Zhao, and D. Guo, “An Application of a Three-Stage XGBoost-Based Model to Sales Forecasting of a Cross-Border E-Commerce Enterprise,” Mathematical Problems in Engineering, vol. 2019, 15 pages, 2019.
[6] W. Mu, “A big data-based prediction model for purchase decisions of consumers on cross-border e-commerce platforms,” Journal Européen des Systèmes Automatisés, vol. 52, no. 4, pp. 363–368, 2019.
[7] W. Gao, “Intelligent Prediction Algorithm of Cross-Border E-Commerce Logistics Cost Based on Cloud Computing,” Scientific Programming, vol. 2021, 10 pages, 2021.
[8] Y. Su, Y. Wang, and C. Mi, “The forecast of development prospects of China's cross-border E-commerce based on grey system theory,” in 2017 International Conference on Grey Systems and Intelligent Services (GSIS), pp. 182–186, Stockholm, Sweden, Aug. 2017.
[9] C. W. Lu, G. H. Lin, T. J. Wu, I. Hu, and Y. C. Chang, “Influencing Factors of Cross-Border E-Commerce Consumer Purchase Intention Based on Wireless Network and Machine Learning,” Security and Communication Networks, vol. 2021, 9 pages, 2021.
[10] C. H. Cao, Y. N. Tang, D. Y. Huang, G. WeiMin, and Z. Chunjiong, “IIBE: an improved identity-based encryption algorithm for WSN security,” Security and Communication Networks, vol. 2021, Article ID 8527068, 8 pages, 2021.
[11] T. Xie, C. Zhang, and Y. Xu, “Collaborative parameter update based on average variance reduction of historical gradients,” Journal of Electronics and Information Technology, vol. 43, no. 4, pp. 956–964, 2021.
[12] Z. Ibrahim and D. Rusli, “Predicting students’ academic performance: comparing artificial neural network,” in 21st Annual SAS Malaysia Forum, Shangri-La Hotel, Kuala Lumpur, September 2007.
[13] M. Ali, E. Eyduran, M. M. Tariq et al., “Comparison of artificial neural network and decision tree algorithms used for predicting live weight at post weaning period from some biometrical characteristics in Harnai sheep,” Pakistan Journal of Zoology, vol. 47, no. 6, pp. 1579–1585, 2015.
[14] S. Fong, Y. W. Si, and R. P. Biuk-Aghai, “Applying a hybrid model of neural network and decision tree classifier for predicting university admission,” in 2009 7th International Conference on Information, Communications and Signal Processing (ICICS), pp. 1–5, Macau, China, Dec. 2009.
[15] S. M. Lee, J. O. Kang, and Y. M. Suh, “Comparison of hospital charge prediction models for colorectal cancer patients: neural network vs. decision tree models,” Journal of Korean Medical Science, vol. 19, no. 5, pp. 677–681, 2004.
[16] J. Wang, M. Li, Y. T. Hu, and Y. Zhu, “Comparison of hospital charge prediction models for gastric cancer patients: neural network vs. decision tree models,” BMC Health Services Research, vol. 9, no. 1, pp. 1–6, 2009.
[17] K. Mathan, P. M. Kumar, P. Panchatcharam, G. Manogaran, and R. Varadharajan, “A novel Gini index decision tree data mining method with neural network classifiers for prediction of heart disease,” Design Automation for Embedded Systems, vol. 22, no. 3, pp. 225–242, 2018.
[18] Y. S. Alsalman, N. K. A. Halemah, E. S. AlNagi, and W. Salameh, “Using decision tree and artificial neural network to predict students academic performance,” in 2019 10th International Conference on Information and Communication Systems (ICICS), pp. 104–109, Irbid, Jordan, June 2019.
[19] A. Abdulkarem and W. Hou, “The impact of organizational context on the levels of cross-border E-commerce adoption in Chinese SMEs: the moderating role of environmental context,” Journal of Theoretical and Applied Electronic Commerce Research, vol. 16, no. 7, pp. 2732–2749, 2021.
[20] S. Du, H. Li, and B. Sun, “Hybrid Kano-fuzzy-DEMATEL model based risk factor evaluation and ranking of cross-border e-commerce SMEs with customer requirement,” Journal of Intelligent & Fuzzy Systems, vol. 37, no. 6, pp. 8299–8315, 2019.
[21] Y. Yang, L. Yang, H. Chen, J. Yang, and C. Fan, “Risk factors of consumer switching behaviour for cross-border e-commerce mobile platform,” International Journal of Mobile Communications, vol. 18, no. 6, pp. 641–664, 2020.
[22] A. Radwan, K. M. S. Huq, S. Mumtaz, K. F. Tsang, and J. Rodriguez, “Low-cost on-demand C-RAN based mobile small-cells,” IEEE Access, vol. 4, pp. 2331–2339, 2018.
[23] Y. Luo, J. Ma, and C. Li, “Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM,” Electronic Commerce Research, vol. 20, no. 2, pp. 405–426, 2020.
[24] L. Junfang and C. Shan, “Design of Sino–Japanese cross border e-commerce platform based on FPGA and data mining,” Microprocessors and Microsystems, vol. 80, p. 103360, 2021.
[25] R. Shen, “The comparative history and development of E-commerce in China and the United States,” Journal of Mathematical Finance, vol. 10, no. 3, pp. 483–498, 2020.
[26] B. Song, W. Yan, and T. Zhang, “Cross-border e-commerce commodity risk assessment using text mining and fuzzy rule-based reasoning,” Advanced Engineering Informatics, vol. 40, pp. 69–80, 2019.

Copyright

Copyright © 2022 Na Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
