Abstract

E-commerce has become a crucial Internet-based business model around the world, so forecasting its transaction trend can provide important information for market planning and development. For this purpose, this study proposes an integrated model that combines an enhanced whale optimization algorithm (EWOA) with a support vector machine (SVM) to forecast the E-commerce transaction trend. First, the global optimization ability of the whale optimization algorithm (WOA) is enhanced by a search updating strategy. Second, multiple factors that may affect the E-commerce transaction trend are analyzed and selected using the gray correlation mechanism. Third, the EWOA algorithm is employed to optimize the SVM random parameters. Finally, the EWOA-SVM model is established for forecasting the E-commerce transaction trend. Tests on two representative cases indicate that the EWOA-SVM model converges faster than the WOA-SVM model and achieves a lower prediction error than the SVM and WOA-SVM models.

1. Introduction

The digital economy has grown rapidly in recent years, and E-commerce transactions have become one of its core components in the global market [1, 2]. The rapid development of E-commerce has changed industries such as logistics, manufacturing, and traditional retail. Transaction volume is usually regarded as a crucial indicator for assessing the level of E-commerce development. As a result, forecasting the E-commerce transaction trend is indispensable for providing a quantitative basis for long-term planning and strategy formulation by enterprises and governments [3, 4].

At present, prediction methods for the E-commerce transaction trend focus on machine learning models, regression models, and combination models [5, 6]. Machine learning models include neural networks, the support vector machine (SVM), the extreme learning machine (ELM), etc. [7]. They predict the E-commerce transaction trend through the mapping relationship between influencing factors and transaction volume, but they are sensitive to parameter selection [8]. The regression models mainly refer to the moving average (MA) model, the autoregressive (AR) model, the autoregressive moving average (ARMA) model, and nonlinear regression models. They may work well for stationary series, but they are not suitable for nonstationary time series. Combination models, on the other hand, can provide competitive results and outperform single models, but their computational cost is higher than that of a single model [9, 10].

Zhang et al. [11] applied the ELM model to forecast E-commerce transactions and proposed an improved optimization algorithm to optimize the random model parameters. However, only a few basic influencing factors were considered, so the model may not be applicable in real circumstances. Ji et al. [12] combined XGBoost and ARIMA models to predict the size of E-commerce transactions and obtained better results than the individual XGBoost and ARIMA models, but the complexity and computational cost may pose a difficulty for further applications. Alternatively, Chen et al. [13] combined clustering technology with a machine learning model to forecast the transaction size. Clustering was used to divide the training samples, and the machine learning model was then trained on each group. The random parameters that may influence machine learning performance were not well considered, resulting in unstable prediction results. Di Pillo et al. [14] employed SVM to predict the sales scale. Compared with linear regression models, the SVM model has a stronger nonlinear mapping ability, but it is sensitive to random model parameters. Mao et al. [15] predicted China’s E-commerce online transaction volume with a combination of the ARMA model and the gray model, and the forecasting result was satisfactory.

The SVM model is suitable for small-sample prediction, and it has strong generalization ability and few random parameters. Li et al. [16] proposed an improved dragonfly algorithm to optimize the SVM random parameters in short-term wind power prediction. Liu et al. [17] applied SVM to forecast the remaining life of lithium-ion batteries and used the chicken swarm optimization algorithm to determine the random parameters. Pham et al. [18] employed the SVM model for rainfall prediction and achieved high prediction accuracy. Additionally, Hossain and Muhammad [19] applied SVM to emotion classification in an emotion recognition system. Huang and Wang [20] used the SVM model for pattern classification, where a genetic algorithm was applied to optimize the model’s random parameters and improve classification accuracy.

For SVM to be used for forecasting the E-commerce transaction trend, two main problems need to be resolved. The first is to reduce the influence of the random parameters of the SVM model, which is an optimization problem. The second is to select the crucial factors affecting the E-commerce transaction trend. Consequently, this study proposes an integrated model that combines an enhanced whale optimization algorithm (EWOA) with the SVM model on the basis of multiple-factor analysis and machine learning. In this approach, the EWOA algorithm is used to optimize the random parameters of the SVM model. The modeling process is introduced in Section 2. Section 3 analyzes multiple influencing factors and determines the critical ones for E-commerce transactions. Section 4 validates the proposed model through two cases. The conclusions and future work are presented in Section 5.

2. Models of E-Commerce Transaction Trend Forecast

2.1. SVM Model

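SVM, one of the most widely studied machine learning models, has a simple structure, few adjustable parameters, and strong generalization ability. It is often used in pattern recognition, disease diagnosis, regression prediction, and other fields [21]. The samples are mapped to a high-dimensional space through a mapping function $\varphi(x)$. Given a sample set $\{(x_i, y_i)\}_{i=1}^{n}$, the hyperplane between input x and output y is established in the high-dimensional space as follows [22]:

$$f(x) = \omega^{T}\varphi(x) + l, \tag{1}$$

where $f(x)$ represents the output function, $\omega$ is the weight, and l indicates the offset.

Equation (1) is transformed into a constrained optimization problem through the principle of structural risk minimization. To take the errors into account, slack variables are introduced into the objective function. The constrained minimization is expressed as follows [23]:

$$\min_{\omega,\, l,\, \xi,\, \xi^{*}} \; \frac{1}{2}\|\omega\|^{2} + c\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \quad \text{s.t.} \quad \begin{cases} y_i - \omega^{T}\varphi(x_i) - l \le \varepsilon + \xi_i, \\ \omega^{T}\varphi(x_i) + l - y_i \le \varepsilon + \xi_i^{*}, \\ \xi_i \ge 0,\ \xi_i^{*} \ge 0, \end{cases} \tag{2}$$

where $\xi_i$ and $\xi_i^{*}$ are slack variables, $c$ is a penalty coefficient, and $\varepsilon$ is the error.

The optimization problem is solved with Lagrange multipliers: the Lagrangian is differentiated with respect to each variable, and the dual form of the optimization problem is finally obtained [24, 25]:

$$\max_{\alpha,\, \alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i(\alpha_i - \alpha_i^{*}) \quad \text{s.t.} \quad \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0,\ \ 0 \le \alpha_i,\ \alpha_i^{*} \le c, \tag{3}$$

where $\alpha_i$ and $\alpha_i^{*}$ are Lagrange multipliers and $K(x_i, x_j)$ is the kernel function. The radial basis function (RBF) kernel is used in this study:

$$K(x_i, x_j) = \exp\left(-g\,\|x_i - x_j\|^{2}\right), \tag{4}$$

where $g$ ($g > 0$) is the kernel parameter.

Finally, the SVM regression function is defined as follows:

$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})K(x_i, x) + l. \tag{5}$$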

Generally, the penalty coefficient and the kernel parameter of the SVM model are random parameters, and their values bring uncertainty to the prediction results, especially for complex data. These random parameters therefore need to be optimized. For this reason, a whale optimization algorithm based on a search updating strategy is proposed to optimize the SVM parameters and improve the prediction accuracy for the E-commerce transaction trend.
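To make the role of the two tuned parameters concrete, the following minimal sketch evaluates an RBF kernel and an SVM regression function of the form "weighted sum of kernel values plus offset". It is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names, the kernel form exp(-g * ||xi - xj||^2), and all numeric values are hypothetical.

```python
import math

def rbf_kernel(xi, xj, g):
    """RBF kernel exp(-g * ||xi - xj||^2); g > 0 is the kernel parameter (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-g * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, offset, g):
    """Evaluate f(x) = sum_i coef_i * K(x_i, x) + offset.

    dual_coefs[i] holds the difference of the two Lagrange multipliers
    associated with support vector i.
    """
    return offset + sum(
        coef * rbf_kernel(sv, x, g)
        for sv, coef in zip(support_vectors, dual_coefs)
    )

# Hypothetical values for illustration only (not taken from the study).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, offset=0.1, g=0.5)
```

The penalty coefficient does not appear in the prediction itself; it only bounds the multipliers during training, whereas the kernel parameter directly shapes every kernel evaluation. This is one reason the two are usually tuned jointly.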

2.2. WOA Algorithm

To date, a variety of intelligent algorithms have been developed and applied, such as the PSO algorithm [26], the crow search algorithm (CSA) [27], and a series of hybrid swarm algorithms [28–33]. Each algorithm uses a different method or strategy to fit specific purposes or applications. For example, Zapata et al. [34] developed a hybrid swarm algorithm for collective construction of 3D structures, and Precup [35] proposed slime mould algorithm-based tuning of cost-effective fuzzy controllers for servo systems. WOA is a new swarm intelligence optimization algorithm with a strong optimization ability [36]. It simulates the predatory behavior of whales in nature, including foraging, encircling, bubble hunting, and food searching [37]. During the foraging phase, information is exchanged between individuals in the whale group, and the food location is determined through this communication. Usually, the initial optimal target position is used as the food position, and the whale approaches the food by updating its position. The whale position updating strategy is described as follows [38, 39]:

$$B = \left|C \cdot \hat{x}(m) - x(m)\right|, \tag{6}$$

$$x(m+1) = \hat{x}(m) - A \cdot B, \tag{7}$$

where m is the current iteration number; A and C are coefficient matrices; B represents the distance between the whale and the food; x is the whale position; and $\hat{x}$ represents the optimal position in the whale group.

The coefficient matrices A and C in equations (6) and (7) are calculated as follows:

$$A = 2a \cdot u - a, \tag{8}$$

$$C = 2u, \tag{9}$$

where u is a random number between 0 and 1, and a decreases linearly from 2 to 0 over the iterations.

The whales adopt encircling and spiraling behaviors in the predation stage. To realize the shrinking encirclement, A decays as a decreases from 2 to 0. Once the food location is locked, the prey is attacked along a spiral path. At this time, the position updating strategy of the whales is as follows [40, 41]:

$$x(m+1) = \hat{B} \cdot e^{br} \cos(2\pi r) + \hat{x}(m), \qquad \hat{B} = \left|\hat{x}(m) - x(m)\right|, \tag{10}$$

where b (b = 1) is a constant that defines the spiral shape; $r$ is a random number in the interval [−1, 1]; and $\hat{B}$ represents the distance between the whale and the locked food.

Assuming that the whale chooses between shrinking encirclement and spiral attack with a probability of 50% each, the position updating strategy is expressed as follows:

$$x(m+1) = \begin{cases} \hat{x}(m) - A \cdot B, & p < 0.5, \\ \hat{B} \cdot e^{br} \cos(2\pi r) + \hat{x}(m), & p \ge 0.5, \end{cases} \tag{11}$$

where $p$ is a random number in the interval [0, 1].

In addition, whales can search for food randomly by updating A. The whale searches for food in a larger area when $|A| \ge 1$ and in a smaller area when $|A| < 1$:

$$x(m+1) = x_{\text{rand}} - A \cdot \left|C \cdot x_{\text{rand}} - x(m)\right|, \tag{12}$$

where $x_{\text{rand}}$ represents a random position vector.

In the WOA algorithm, most of the parameters are random, and only the maximum number of iterations and population size need to be set, which is one of the advantages of the algorithm.
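The update rules of the classic WOA can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' code: the function name `woa_minimize`, the use of one scalar A and C per whale, the clipping to the search bounds, the immediate update of the best position, and the sphere test function are all choices made here for brevity.

```python
import math
import random

def woa_minimize(objective, dim, lower, upper, n_whales=30, max_iter=200, b=1.0, seed=0):
    """Minimal sketch of the classic WOA loop; returns (best_position, best_value)."""
    rng = random.Random(seed)
    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = min(whales, key=objective)[:]
    best_val = objective(best)

    for m in range(max_iter):
        a = 2.0 * (1.0 - m / max_iter)          # linear decrease from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a      # coefficient A
            C = 2.0 * rng.random()              # coefficient C
            p = rng.random()                    # branch selector in [0, 1]
            r = rng.uniform(-1.0, 1.0)          # spiral parameter in [-1, 1]
            if p < 0.5:
                # |A| < 1: shrink towards the best whale; otherwise follow a random whale
                target = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new_x = [t - A * abs(C * t - xj) for t, xj in zip(target, x)]
            else:
                # spiral move around the best position found so far
                new_x = [abs(bj - xj) * math.exp(b * r) * math.cos(2 * math.pi * r) + bj
                         for bj, xj in zip(best, x)]
            whales[i] = [min(max(v, lower), upper) for v in new_x]   # keep inside bounds
            val = objective(whales[i])
            if val < best_val:
                best, best_val = whales[i][:], val
    return best, best_val

def sphere(x):
    """Unimodal test function with global minimum 0 at the origin."""
    return sum(v * v for v in x)

best_pos, best_val = woa_minimize(sphere, dim=2, lower=-10.0, upper=10.0)
```

As the text notes, only the population size and the iteration budget are set by the user here; every other quantity is drawn at random inside the loop.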

2.3. EWOA Algorithm

Wolpert and Macready [42] showed, through the “no free lunch” theory, that no optimization algorithm can solve all optimization problems. Different optimization algorithms may therefore obtain different solutions to the same problem, and developing new algorithms may achieve better results. Although the traditional WOA algorithm has a stronger optimization capability than algorithms such as particle swarm optimization (PSO) and differential evolution, it still has some disadvantages. For example, its coefficient a decreases linearly to achieve the shrinking encirclement, which does not reflect the dynamic changes during the iteration process well. The population diversity also becomes limited in later iterations, so the algorithm can be trapped in a local minimum. The EWOA algorithm is developed to resolve these problems. First, a dynamic attenuation coefficient is introduced to simulate the dynamic change in the shrinking and encircling behaviors of whales during the iterative process. The dynamic attenuation coefficient is defined by equation (13), where $m_{\max}$ represents the maximum number of iterations and m is the current iteration number.

The value of the dynamic attenuation coefficient over the iterations is depicted in Figure 1. The coefficient declines faster in the initial stage of the iteration, which allows the food position to be locked quickly. In later iterations it decays more slowly, which strengthens the algorithm’s local exploration ability.

To address the weakening of population diversity in later iterations, an area search updating strategy is proposed. During the optimization process, the whales migrate to other regions to search for food according to the regional update frequency M. This preserves the population diversity to a large extent and thus improves the algorithm’s optimization ability. The whale’s updated position is given by equation (14), in which the random term obeys a Gaussian distribution.

Similar to WOA algorithm, most parameters in EWOA algorithm are random, but the maximum number of iterations, population size, and migration frequency need to be set in advance.

The process flowchart of the EWOA algorithm for searching the global optima is shown in Figure 2.

As shown in Figure 2, the EWOA algorithm optimization process includes the following steps:
(1) Initialize EWOA algorithm parameters.
(2) Determine whether to implement the area search updating strategy.
(3) If the area search updating strategy is implemented, the location is updated according to equation (14); otherwise the location is updated according to equations (10) and (12) [36].
(4) Update the optimal location of the whale population [37].
(5) Determine whether to terminate the iteration. If the iteration is terminated, the optimization is completed. Otherwise, return to step (2).
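The two EWOA ingredients used in the steps above can be illustrated with a short sketch. The exact forms of equations (13) and (14) are not reproduced here, so everything below is an assumption chosen only to match the described behavior: `dynamic_a` is a hypothetical quadratic decay that falls quickly at first and slowly later, `should_migrate` assumes that the area search is triggered every M-th iteration, and `migrate` is a hypothetical Gaussian jump clipped to the search bounds. The values of `m_max`, `M`, and `scale` are placeholders.

```python
import random

def linear_a(m, m_max):
    """Classic WOA coefficient: linear decrease from 2 to 0."""
    return 2.0 * (1.0 - m / m_max)

def dynamic_a(m, m_max):
    """Assumed stand-in for equation (13): fast early decay, slow late decay."""
    return 2.0 * (1.0 - m / m_max) ** 2

def should_migrate(m, M):
    """Step (2): apply the area search update every M-th iteration (assumed rule)."""
    return m > 0 and m % M == 0

def migrate(position, lower, upper, rng, scale=0.1):
    """Assumed stand-in for equation (14): Gaussian jump, clipped to the bounds."""
    width = upper - lower
    return [min(max(v + rng.gauss(0.0, scale * width), lower), upper) for v in position]

rng = random.Random(0)
m_max, M = 500, 50   # placeholder settings, not the values of Table 2
schedule = [(m, linear_a(m, m_max), dynamic_a(m, m_max)) for m in (0, 125, 250, 500)]
moved = migrate([0.0, 0.0], -10.0, 10.0, rng)
```

With this stand-in, both schedules start at 2 and end at 0, but the dynamic one is already at 0.5 halfway through, where the linear one is still at 1.0, so the shrinking phase begins earlier.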

2.4. Convergence Analysis

Five standard test functions are used to analyze the convergence efficiency of the algorithm. f1, f2, and f3 are unimodal functions, whose local extremum is the global optimum, while f4 and f5 are multimodal functions. The variable ranges of the functions are listed in Table 1.

In addition to the EWOA algorithm, the PSO algorithm [26], the crow search algorithm (CSA) [27], and the classic WOA algorithm were chosen for comparison. CSA is a recent swarm intelligence optimization algorithm with good convergence performance, which makes it a suitable comparison algorithm. PSO is a classic optimization algorithm that is commonly used as a baseline. The traditional WOA algorithm is included so that its convergence results can be compared directly with those of the EWOA algorithm. The algorithms’ parameters are set as shown in Table 2.

In PSO, and are the maximum and minimum values, respectively. C1 and C2 are the learning coefficients. In CSA, AP is the consciousness probability. FL is the flight length. b is used to define the spiral shape in both WOA and EWOA algorithms. M represents the search update frequency in EWOA. The population size is 30, and the maximum number of iterations is 500. The algorithms are tested under a unified platform, and each optimization algorithm is repeated 30 times for optimizing each test function. The average convergence value, the best convergence value, and the worst convergence value from every test function are concluded in Table 3.

As can be seen, the EWOA algorithm achieves better search results than the other algorithms. The results on the unimodal functions f1, f2, and f3 are consistently better than those on the multimodal functions f4 and f5. Furthermore, the PSO and CSA algorithms show poor optimization results on the multimodal functions, where the worst value reaches 28.85 for PSO. For all unimodal functions, the WOA algorithm achieves very low values close to 0, while the EWOA algorithm reaches the optimal value, i.e., 0.

The fitness function used to evaluate the convergence process is defined by equation (15), where n is the number of training samples, $P_{\text{train},i}$ represents the E-commerce transaction training value, and $\hat{P}_{\text{train},i}$ indicates the corresponding E-commerce transaction prediction value.

The iterative convergence curves (log(Fitness)) of the WOA and EWOA algorithms on the test functions are shown in Figure 3. The EWOA algorithm converges considerably faster than the WOA algorithm on all test functions and requires far fewer iterations to converge.

3. Analysis of Multiple Influencing Factors in E-Commerce Transactions Trend

E-commerce transaction volume is defined here as the average of the E-commerce sales volume and the E-commerce procurement volume. The E-commerce transaction sample set was collected in China from 2005 to 2019 and includes the annual E-commerce transaction volume and multiple influencing factors. The influencing factors fall into three groups: basic resources, transaction level, and economic development level. The basic resources include the Internet penetration rate (A1; unit: %), the number of websites (A2; unit: ten thousand), the number of CN domain names (A3; unit: ten thousand), and the number of Internet users (A4; unit: 100 million). The Internet penetration rate reflects the degree to which basic resources are shared. Websites serve as transaction platforms, so their number is an important index of E-commerce transactions. The number of CN domain names reflects the number of Internet service companies, and the number of Internet users reflects the demand for shopping services via the Internet.

At the transaction level, the express delivery business volume (A5; unit: 100 million) and the express delivery business revenue (A6; unit: 100 million yuan) have a crucial impact. The express delivery business plays an important role in E-commerce sales, and the express delivery business revenue reflects the level of E-commerce transactions in the express delivery industry. At the economic development level, gross domestic product (GDP), a macroeconomic factor, is considered a key indicator of economic activity and reflects the state of E-commerce development. Therefore, GDP (A7; unit: trillion yuan) is used to evaluate the E-commerce transactions in this study.

The statistics on China’s E-commerce transaction volume and the influencing factors from 2005 to 2019 are presented in Table 4. The E-commerce transaction volume (T) increases every year, from 1.29 trillion in 2005 to 34.81 trillion in 2019. The influencing factors A1–A7 show a similar growing trend, with A5 and A6 increasing much more than the others.

The gray correlation is employed to analyze the correlation degree between the multiple influencing factors and the E-commerce transaction volume. First, the E-commerce transaction volume and the influencing factors are made dimensionless to reduce the differences between their numerical values. Then, the correlation coefficient is calculated. The E-commerce transaction volume is denoted as the reference sequence $X_0(k)$, and the influencing factors are denoted as the comparison sequences $X_i(k)$. The difference between the reference sequence and the ith comparison sequence at time k is as follows:

$$\Delta_i(k) = \left|X_0(k) - X_i(k)\right|. \tag{16}$$

The correlation coefficient between the ith comparison sequence and the reference sequence, written in the standard gray relational form, is as follows:

$$\zeta_i(k) = \frac{\Delta_{\min} + \rho\,\Delta_{\max}}{\Delta_i(k) + \rho\,\Delta_{\max}}, \tag{17}$$

where $\Delta_i(k)$ represents the absolute difference between the two sequences at time k, $\Delta_{\min}$ is the minimum absolute difference of the comparison sequences, $\Delta_{\max}$ is the maximum absolute difference of the comparison sequences, and $\rho$ is the distinguishing coefficient of the standard formulation (conventionally 0.5).

The correlation degree between the ith comparison sequence and the reference sequence is defined in the following equation:

$$\gamma_i = \frac{1}{n}\sum_{k=1}^{n}\zeta_i(k), \tag{18}$$

where n denotes the number of the sequence data.

The correlation degree between E-commerce transaction volume and multiple influencing factors is presented in Table 5.

The express delivery business revenue (A6) has the highest correlation degree, 0.91. The correlation degrees of the Internet penetration rate (A1), the number of CN domain names (A3), and the number of Internet users (A4) exceed 0.8, while those of the number of websites (A2), the express delivery business volume (A5), and GDP (A7) are below 0.8. Accordingly, A1, A3, A4, and A6 are selected as the input variables of the forecasting models.
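The gray correlation screening described in this section can be sketched as follows. This is an illustrative implementation of the standard gray relational procedure rather than the authors' code: dividing each sequence by its own mean is an assumed dimensionless step, the distinguishing coefficient `rho=0.5` is the conventional default, and the toy sequences are hypothetical, not the values of Table 4.

```python
def gray_relational_degrees(reference, factors, rho=0.5):
    """Return the gray correlation degree of each factor sequence to the reference.

    Sequences are made dimensionless by dividing by their own mean (an assumed
    choice); rho is the distinguishing coefficient of the standard formulation.
    """
    def scale(seq):
        mean = sum(seq) / len(seq)
        return [v / mean for v in seq]

    ref = scale(reference)
    diffs = [[abs(r - c) for r, c in zip(ref, scale(f))] for f in factors]
    d_min = min(min(row) for row in diffs)
    d_max = max(max(row) for row in diffs)
    if d_max == 0.0:                      # every factor matches the reference exactly
        return [1.0 for _ in factors]
    degrees = []
    for row in diffs:
        coefs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coefs) / len(coefs))
    return degrees

# Hypothetical toy sequences, not the values of Table 4.
volume = [1.0, 2.0, 4.0, 8.0]
factor_a = [2.0, 4.0, 8.0, 16.0]   # proportional to the reference
factor_b = [8.0, 4.0, 2.0, 1.0]    # moves in the opposite direction
deg_a, deg_b = gray_relational_degrees(volume, [factor_a, factor_b])
```

A factor that tracks the reference after scaling receives a degree close to 1, while a factor that moves differently receives a lower degree. A threshold such as the 0.8 used in the text can then be applied to keep only the strongly related factors.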

4. Model Implementation with Case Analysis

4.1. E-Commerce Transaction Prediction Using the EWOA-SVM Model

Based on the integration of the EWOA and SVM models, the proposed EWOA-SVM model is established to forecast the E-commerce transaction trend. The prediction process is depicted in Figure 4 and proceeds as follows:
(1) Analyze the impact of multiple influencing factors on E-commerce transaction volume
(2) Calculate the correlation degree between different influencing factors and E-commerce transactions through gray correlation using equation (18)
(3) Select the factors strongly related to E-commerce transactions as model input variables
(4) Divide the data into a training set and a test set, and normalize the data
(5) Construct the E-commerce transaction trend prediction model using EWOA-SVM
(6) Set the parameters of the EWOA algorithm and the SVM model [40]
(7) Train the EWOA-SVM model using the training set
(8) Use EWOA to optimize the random parameters of SVM
(9) Calculate the fitness values of the EWOA algorithm through equation (15) [41]
(10) Output the optimal parameters of SVM after the training process is complete
(11) Verify the EWOA-SVM model using the test set
(12) Employ the trained SVM to predict E-commerce transaction trends
(13) Evaluate the forecast results of E-commerce transactions
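Step (4) of the process above, dividing the data and normalizing it, can be sketched as follows. The helper names and the min-max scaling fitted on the training years only are assumptions made for illustration, since the normalization method is not specified; the `volume` numbers are placeholders rather than Table 4 data. The split year corresponds to Case 1, where 2005 to 2014 is used for training.

```python
def split_by_year(records, last_train_year):
    """Chronological split: years up to last_train_year train, later years test."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test

def fit_min_max(values):
    """Return a scaler mapping the training range to [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0               # guard against a constant column
    return lambda v: (v - lo) / span

# Placeholder records: the 'volume' numbers are hypothetical, not Table 4 data.
records = [{"year": y, "volume": float(i + 1)} for i, y in enumerate(range(2005, 2020))]
train, test = split_by_year(records, last_train_year=2014)   # Case 1 split
scale = fit_min_max([r["volume"] for r in train])
train_scaled = [scale(r["volume"]) for r in train]
test_scaled = [scale(r["volume"]) for r in test]              # may exceed 1 for a growing series
```

Fitting the scaler on the training years avoids leaking test information into training. For a steadily growing series, the scaled test values then fall outside [0, 1], which is worth keeping in mind when interpreting extrapolation errors.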

All algorithms in this study were implemented and run in MATLAB, and the core program code and datasets can be freely accessed at https://drive.google.com/drive/folders/1OPlt_W_u8XHrvT_PW-wOwUBZtUL2mind?usp=sharing.

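The root mean square error (rmse) [46] and the fitting coefficient r2 [47] are used to evaluate the model performance, where rmse represents the prediction error and r2 indicates how well the predicted values follow the changing trend of the actual values. The closer the fluctuating trend of the predicted values is to the real one, the closer r2 is to 1.

$$\text{rmse} = \sqrt{\frac{1}{num}\sum_{i=1}^{num}\left(P_{\text{test},i} - \hat{P}_{\text{test},i}\right)^{2}}, \tag{19}$$

$$r^{2} = 1 - \frac{\sum_{i=1}^{num}\left(P_{\text{test},i} - \hat{P}_{\text{test},i}\right)^{2}}{\sum_{i=1}^{num}\left(P_{\text{test},i} - \bar{P}_{\text{test}}\right)^{2}}, \tag{20}$$

where num is the number of testing samples; $P_{\text{test},i}$ is the test value of E-commerce transaction; $\hat{P}_{\text{test},i}$ represents the predicted value of E-commerce transaction; and $\bar{P}_{\text{test}}$ is the average test value of E-commerce transaction.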
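The two evaluation metrics can be computed with a few lines of Python. The sketch below assumes the common definitions, namely the square root of the mean squared error for rmse and one minus the ratio of the residual to the total sum of squares for r2. The sample values are hypothetical and serve only to show the arithmetic.

```python
import math

def rmse(actual, predicted):
    """Root mean square error over the test samples."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Fitting coefficient: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical test values and predictions, for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
error = rmse(actual, predicted)       # sqrt(1/3)
fit = r_squared(actual, predicted)    # 1 - 1/2
```

Because rmse is expressed in the units of the transaction volume while r2 is scale-free, a model can rank best on one metric and not on the other, so it is useful to report both.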

4.2. Case 1

Two cases were used to test the effectiveness of the proposed EWOA-SVM model, with the SVM and WOA-SVM models included for comparison. The SVM model is selected to analyze the influence of the random parameters on the prediction results, and the WOA-SVM model is chosen to compare the search capability of the WOA algorithm with that of the EWOA algorithm. In Case 1, the E-commerce transaction data from 2005 to 2014 was chosen as the training set, and the data from 2015 to 2019 was used as the test set. The training convergence curves of the WOA-SVM and EWOA-SVM models are presented in Figure 5. The EWOA algorithm converges significantly faster than the WOA algorithm. Moreover, the fitness value of the EWOA algorithm is clearly smaller than that of the WOA algorithm during the iteration process.

The test results of the SVM, WOA-SVM, and EWOA-SVM models are presented in Figure 6, and the specific prediction values are listed in Table 6. The predictions of the WOA-SVM and EWOA-SVM models fit the actual E-commerce transaction curve well. In contrast, the SVM model shows a relatively higher error, particularly in 2015.

4.3. Case 2

In Case 2, the E-commerce transaction data from 2005 to 2010 was selected as the training set, and the data from 2011 to 2019 was used as the test set. The training convergence curves of the WOA-SVM and EWOA-SVM models are presented in Figure 7. The EWOA algorithm again converges faster than the WOA algorithm, and its fitness value is considerably smaller than that of the WOA algorithm during the iteration process. The prediction results of the SVM, WOA-SVM, and EWOA-SVM models are shown in Figure 8. In general, the prediction accuracy of the WOA-SVM and EWOA-SVM models is higher than that of the SVM model, and their predictions fit the actual E-commerce transaction trend well. The SVM predictions are nevertheless satisfactory except for 2014 to 2015, where they deviate more from the actual values. The detailed prediction outcomes are listed in Table 7.

4.4. Evaluation of Test Results

The SVM, WOA-SVM, and EWOA-SVM models were used to predict the trend of E-commerce transactions in Cases 1 and 2, and their prediction results are evaluated in this section. For Cases 1 and 2, the relative error (Re%) curves of the SVM, WOA-SVM, and EWOA-SVM models are shown in Figure 9, and the Re% values are summarized in Table 8.

For Case 1, the Re interval of the SVM model was [−9.61%, 1.23%], that of the WOA-SVM model was [−2.75%, 2.98%], and that of the EWOA-SVM model was [−2.73%, 2.06%]. The fluctuation range of the EWOA-SVM model was the smallest, and its errors were much smaller than those of the SVM model and slightly smaller than those of the WOA-SVM model. For Case 2, the maximum Re value of the WOA-SVM and EWOA-SVM models exceeded 27%. The prediction error for 2013 was relatively large, but the remaining errors were less than 20%. Overall, the WOA-SVM and EWOA-SVM models predicted better than the SVM model.

The rmse and r2 values of the SVM, WOA-SVM, and EWOA-SVM models are listed in Table 9. In Case 1, the rmse values of the WOA-SVM and EWOA-SVM models are below 0.6, and the minimum rmse, 0.51, is obtained by the EWOA-SVM model. The rmse of the EWOA-SVM model is thus 13.56% and 47.42% smaller than that of the WOA-SVM and SVM models, respectively. However, the SVM model gives the best fit in this case, with an r2 value of up to 99%. In Case 2, the rmse values of all three models increase significantly compared with Case 1. The minimum rmse, 1.26, is again obtained by the EWOA-SVM model and is 14.86% and 17.64% smaller than that of the WOA-SVM and SVM models, respectively. Additionally, the EWOA-SVM model reaches the highest r2 value among all models, 98.42%.
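The percentage comparisons above follow from a simple relative-reduction formula, sketched below. The EWOA-SVM rmse values (0.51 and 1.26) are quoted in the text, whereas the reference values for the WOA-SVM and SVM models are back-computed here from the quoted percentages and rounded to two decimals. They are therefore an assumption for illustration and should be checked against Table 9.

```python
def relative_reduction(reference_rmse, model_rmse):
    """Percentage by which model_rmse is smaller than reference_rmse."""
    return (reference_rmse - model_rmse) / reference_rmse * 100.0

# EWOA-SVM rmse values are quoted in the text; the reference values are
# back-computed from the quoted percentages and rounded (an assumption).
case1 = {"WOA-SVM": relative_reduction(0.59, 0.51), "SVM": relative_reduction(0.97, 0.51)}
case2 = {"WOA-SVM": relative_reduction(1.48, 1.26), "SVM": relative_reduction(1.53, 1.26)}
```

Under these assumed reference values, the computed reductions agree with the quoted 13.56%, 47.42%, 14.86%, and 17.64% to within rounding.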

4.5. Discussion about Theory and Real Applications

At present, the global economy has entered a new era of information networks. E-commerce is a new type of business operation model whose rapid growth has been driven by information technology. Because E-commerce offers wide transaction coverage, low cost, fast information circulation, and highly coordinated workflows, it has become a new engine for economic development. Accordingly, the E-commerce transaction trend is becoming an important indicator for measuring the level of business or economic activity. To this end, this study proposes the EWOA-SVM model to predict the trend of E-commerce transactions, which provides a theoretically grounded and effective tool for E-commerce development.

In real applications, a precise prediction of the E-commerce transaction trend can provide a decision-making basis for governments or enterprises when they formulate development policies or plans for future business or industrial investments. The proposed model can identify the crucial factors with a high correlation degree in E-commerce transactions and construct correlation indexes for the E-commerce transaction trend. Importantly, it can be applied by logistics enterprises, Internet enterprises, and other information network companies in their business activities.

5. Conclusions

In this study, the model training data was collected from the E-commerce transaction volume and the influencing factors A1–A7 between 2005 and 2019. This data support and the robust EWOA structure help alleviate the overfitting problem, and the evaluation results of each model have been discussed to examine this issue. The main contributions of this paper are summarized as follows:
(1) A dynamic search coefficient and a search updating strategy are combined to overcome the limitations of the WOA algorithm. Accordingly, the EWOA algorithm can reach the global optima, i.e., 0, for multimodal functions, indicating a strong ability to escape from local minima.
(2) The express delivery business revenue, Internet penetration rate, number of CN domain names, and number of Internet users are confirmed as the most crucial factors in the E-commerce transaction trend.
(3) The evaluation results demonstrate that the EWOA-SVM model achieves a lower rmse than the compared SVM and WOA-SVM models in the prediction of the E-commerce transaction trend. For example, the rmse of the EWOA-SVM model for Case 1 is 13.56% smaller than that of the WOA-SVM model and 47.42% smaller than that of the SVM model. In Case 2, the rmse of the EWOA-SVM model is 14.86% smaller than that of the WOA-SVM model and 17.64% smaller than that of the SVM model.

In future work, additional influencing factors of the E-commerce transaction trend may be included to reflect practical circumstances. In addition, the generalization ability of the model for other kinds of data prediction may be improved further.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare no conflicts of interest with respect to the research, authorship, and/or publication of this article.