Abstract

Facing an increasingly competitive market, enterprises need correct and timely decisions to solve operational problems and maintain their competitive advantages. In this context, insufficient information may cause general mathematical modeling methods to overfit, making it difficult to ensure good analytical performance. Therefore, it is important for enterprises to be able to analyze and make predictions effectively using small data sets. Although various approaches have been developed to solve prediction problems, their application is often limited by insufficient observations. To better handle data uncertainty, this study proposes an aggregating prediction model for management decision analysis using small data sets. Compared with six popular approaches, the results from the experiments show that the proposed method can effectively deal with the small data set prediction problem and is thus an appropriate decision analysis tool for managers.

1. Introduction

Decision analysis is one of the most important tasks for managers [1]. Effective decision-making helps managers solve operational problems in a timely manner, which is vital to remain viable in an increasingly competitive market. Uncontrollable factors and uncertain events may lead to invalid decision-making and affect business performance. Predictive analysis can help managers grasp possible future trends and reduce the impact of uncertainty on personal judgments [2], thereby assisting enterprises to make better decisions for their business.

Decisions that require immediate responses can be difficult for managers. In order to process information and run an operation effectively, managers must grasp the situation in real time through a limited number of observations [3]. An example is analyzing the outbreak of a new disease. If the government can make the right decisions as soon as possible, the potential harm and impact of the new disease on people’s health will be reduced. Decisions should be made in a timely manner to prevent infected people from spreading the disease; a prompt response thus adds management value. Building prediction models using small data sets therefore has significant practical value.

Popular prediction approaches are roughly divided into three categories: time series analysis, relational models, and data mining techniques [4]. Time series analysis considers the continuous trend of data and needs only historical data to predict future demand [5]. It has been widely adopted to solve various prediction problems; however, it typically requires a large number of observations to obtain better forecasting performance. Relational models explore the causal connection between the independent variables and the dependent variables to predict the possible outputs of the dependent variables [6]. The accuracy of the prediction depends on whether the selected independent variables can properly explain the dependent variables. Data mining techniques use algorithms to extract useful hidden information from the collected data [7]. These techniques can obtain favorable predictions through an effective learning process; however, the prediction results depend on the amount of training data and the representativeness of the training set in the population [8].

The approaches discussed above vary in their practical applications. Therefore, before building a model, data analysis must be performed to determine which approach is suitable for the collected data [9]. This modeling pretest requires a sufficient number of samples; otherwise, the evaluation may fail. This limitation makes these approaches unsuitable for prediction using small data sets [10]. A representative example is the problem of predicting electronic commerce (e-commerce) transaction volumes in China. An appropriate development strategy must be drafted based on up-to-date information [11]. Using observations that carry updated information to build a model better reflects the true situation [12]; therefore, it is valuable to use a limited number of updated observations to make predictions [13].

This research proposes a modeling procedure based on grey system theory for developing an aggregating prediction model that combines various approaches instead of selecting a single most suitable prediction model. The perspective of model integration is used to solve the small data set prediction problem, with the aim of improving the stability of the prediction results by combining the advantages of various approaches. Specifically, the proposed method is designed as a two-stage modeling procedure. First, four methods are assessed through grey incidence analysis to determine how well each reflects the trend of the real series. Second, a robust compound prediction model for small samples is built by the weighted average method. In addition, a pretest is performed to evaluate the feasibility of the proposed method before trend forecasting. Results from the experiments show that the proposed method produces favorable predictions under small data sets. Because the proposed method reduces decision risk, it is considered a practical tool for small data set prediction.

The remainder of this article is organized as follows: Section 2 describes the proposed method. Section 3 presents the data analysis and comparison among the various approaches. Finally, the conclusion is discussed in Section 4.

2. Methodology

This study aims to solve prediction problems when the available samples are limited. Although popular prediction approaches (such as statistical methods, data mining, and artificial neural networks) have acceptable performance in normal applications [14–16], they are not directly applicable to small data set analyses due to limited information. Therefore, this study proposes a modeling procedure that integrates the advantages of various popular approaches for this specific problem. This section details the concepts and steps of the proposed method.

2.1. Conceptual Design

Popular forecasting approaches usually have their own limitations and scopes of application [7, 17]. Therefore, it is necessary to conduct a pretest through data analysis to determine which approach is more appropriate before formal trend forecasting. A robust forecasting technique is very important for effectively grasping future trends [18]. For this reason, this research proposes a relatively robust compound prediction model based on grey system theory, called the grey-based aggregating model (GAM).

The proposed method combines the advantages of various approaches by applying the viewpoint of compounding models, thereby improving prediction performance. The proposed method is a two-stage modeling procedure: first, four popular methods are used to obtain the basic predicted values; then, grey-based weights are obtained to combine them into the final predicted values.

2.2. Basis of Aggregating Model

Four fundamental prediction techniques are selected as the basis of the proposed model, namely, the grey model (GM), linear regression (LR), the backpropagation neural network (BPNN), and support vector regression (SVR). GM is an important technique for managing insufficient information, which is easy to implement and can provide accurate predictions under small data sets [9]. LR is a commonly used numerical prediction method because its implementation is straightforward, and it can produce good results when data follow a linear trend [17]. BPNN is widely used in nonlinear data analysis and is a modeling method with excellent learning ability [7]. SVR is a statistical learning method that can overcome the difficulty of nonparametric prediction with limited samples [7].
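
To make the grey model concrete, the following Python sketch implements the standard first-order one-variable grey model, GM(1,1), the variant adopted later in Section 2.6. The function name, the least-squares implementation, and the illustrative numbers are ours and are not taken from the original study.

```python
import numpy as np

def gm11_forecast(x0, horizon=1):
    """Standard GM(1,1): fit on the observed series x0 and forecast `horizon` steps ahead."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                               # 1-AGO: accumulated generating operation
    z1 = 0.5 * (x1[1:] + x1[:-1])                    # background values (means of consecutive AGO terms)
    B = np.column_stack((-z1, np.ones(n - 1)))       # data matrix of the grey differential equation
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]      # develop coefficient a and grey input b
    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a         # time-response function of the whitening equation
    x0_hat = np.concatenate(([x1_hat[0]], np.diff(x1_hat)))   # inverse AGO restores the original scale
    return x0_hat[:n], x0_hat[n:]                    # (fitted values, out-of-sample forecasts)

# Illustrative call with made-up numbers (not the Table 1 data):
fitted, future = gm11_forecast([10.0, 11.4, 13.1, 15.2], horizon=1)
```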

Because the above methods have their own specific conditions and the components of demand are inherently complex, it is difficult to determine which method is most suitable for demand forecasting. Therefore, this research does not recommend using a single model for analysis. Instead, the study tries to retain the advantages of the above four models to form a new compound model.

The aggregating model aims at achieving relatively robust trend prediction. Therefore, the risk of choosing the wrong model will be controlled, thereby helping managers to better cope with uncertainty. In summary, this paper proposes GAM to solve the problems encountered in demand forecasting.

2.3. Grey Incidence Analysis

The degree of grey incidence (DGI) is a pretesting measurement index commonly used in grey system theory. DGI is a technique for evaluating the fitness of a prediction model regardless of whether the sample size is large or small [19]. The basic idea of DGI is to use the geometrical similarity of the series curves to determine the relationship between two series. The more similar the curves, the greater the incidence between the series, and vice versa [20].

Theoretically, residual analysis must be applied after building a model to investigate whether the model performs acceptably [21, 22]. However, an error-based objective function is usually selected to optimize the fitting status of a developed model in the learning phase of the modeling process. If similar error indicators are repeatedly used to analyze the data, it would lead to specific deviations and nonobjective results in the pretesting stage. To avoid this phenomenon, it is a feasible option to adopt the DGI to evaluate the fitting status during the modeling process instead of using any error-based index for pretesting.

There are many kinds of general DGI. Although their development principles differ, they all effectively measure the geometric similarity between two series. To facilitate calculation and application, this paper adopts the similitude degree of grey incidence (SDGI) [9] as the measurement index for evaluating different prediction methods. The SDGI is suitable for evaluating how well a given model fits the real data, as it calculates the relational similarity between two series based on their geometrical similarity. A higher SDGI value indicates that the two series are geometrically similar and that the variance of the prediction error is stable. This implies that such a method is more robust and should be given a higher weight. Therefore, the order of the SDGI values is used to determine the weights of the different prediction methods. The detailed steps of the SDGI are as follows:
Step 0: give two paired series, each with n periods.
Step 1: use equation (1) to apply the zero-starting point operator to both series and form two new series.
Step 2: use equation (2) to subtract one zero-starting series from the other to obtain a difference series.
Step 3: use equation (3) to sum the area of the difference series.
Step 4: use equation (4) to calculate the SDGI from this area.
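
Equations (1) to (4) are not reproduced in this excerpt. The Python sketch below follows the common textbook formulation of the SDGI (zero-starting point operator, signed area with a half-weighted final term, and the mapping 1/(1 + |area|)); whether this matches the paper's equations exactly is an assumption.

```python
import numpy as np

def sdgi(x, y):
    """Similitude degree of grey incidence between two equal-length series (common formulation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = x - x[0]                                  # zero-starting point operator (Step 1)
    y0 = y - y[0]
    d = x0 - y0                                    # difference series (Step 2)
    area = abs(np.sum(d[1:-1]) + 0.5 * d[-1])      # area between the two curves (Step 3)
    return 1.0 / (1.0 + area)                      # SDGI in (0, 1]; larger means more similar (Step 4)

# A fitted series that tracks the actual series closely gets an SDGI near 1:
print(sdgi([10.0, 11.4, 13.1, 15.2], [10.1, 11.2, 13.3, 15.0]))
```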

2.4. Rank-Sum Weighting Method

To obtain the final aggregating model, the different forecasting models need to be combined, and the most common way is the weighted average method. Determining a reasonable weight for each method is the key issue at this stage. To facilitate the application of the proposed method, this paper uses a heuristic weighting method, the rank-sum weighting method (RSWM), to set the importance of each method; its calculation formula is equation (5), where m is the total number of prediction methods and R_j is the rank of method j. The method ranked first has the highest reference value, and its numerator is exactly equal to the total number of methods m; conversely, the method ranked last has the lowest reference value, and its numerator is exactly equal to 1.
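
Equation (5) itself is not reproduced in this excerpt. Under the usual rank-sum formulation implied by the description above (a numerator of m − R_j + 1, normalized over all methods), the weights for the four base methods work out as follows:

\[ w_j = \frac{m - R_j + 1}{\sum_{k=1}^{m} (m - R_k + 1)} = \frac{2\,(m - R_j + 1)}{m\,(m + 1)}, \qquad m = 4:\; (w_1, w_2, w_3, w_4) = (0.4,\ 0.3,\ 0.2,\ 0.1). \]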

2.5. Modeling Procedure of the Proposed GAM

The GAM is applied to combine the final prediction outputs for grasping the development trend of China’s e-commerce. The detailed steps of the GAM are as follows:
Step 0: give an initial series with n periods.
Step 1: apply this series as the training sample to establish forecasting models based on BPNN, GM, LR, and SVR, and then obtain the fitted series of each established model.
Step 2: calculate the SDGI (Section 2.3) between each fitted series and the initial series to obtain one SDGI value per method.
Step 3: sort the SDGI values from largest to smallest to get the ranking value of each method.
Step 4: use the RSWM to determine the weight of each method and obtain the final model weights.
Step 5: apply the weighted average method to aggregate the final forecast value.
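
Putting Steps 1–5 together, a minimal Python sketch of the aggregation stage could look as follows. It reuses the sdgi() function sketched in Section 2.3 and the rank-sum weights of Section 2.4; the model interface (each callable fits on the training series and returns a fitted series plus a one-step forecast) is an assumption made for illustration.

```python
def gam_forecast(train, models):
    """Aggregate base one-step forecasts with SDGI-ranked rank-sum weights (Steps 1-5).

    `models` maps a method name (e.g., "GM", "LR", "BPNN", "SVR") to a callable that
    fits on `train` and returns (fitted_series, one_step_forecast).
    """
    preds, scores = {}, {}
    for name, fit_predict in models.items():
        fitted, forecast = fit_predict(train)        # Step 1: fit each base model
        preds[name] = forecast
        scores[name] = sdgi(train, fitted)           # Step 2: SDGI between actual and fitted series
    order = sorted(models, key=lambda n: scores[n], reverse=True)
    ranks = {name: r + 1 for r, name in enumerate(order)}        # Step 3: rank 1 = largest SDGI
    m = len(models)
    weights = {n: (m - ranks[n] + 1) / (m * (m + 1) / 2)         # Step 4: rank-sum weights (sum to 1)
               for n in models}
    return sum(weights[n] * preds[n] for n in models)            # Step 5: weighted-average forecast
```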

2.6. Feasibility Measurement

An effective prediction model must deliver accurate forecasting results. It is therefore necessary to evaluate prediction methods using an error-based index, and only methods that pass this inspection can be used for future trend forecasting. In this study, the mean absolute percentage error (MAPE) is selected to assess the modeling performance in the pretesting phase. The MAPE can assist managers in assessing the possible risks of using different forecasting tools. Equation (7) is the calculation formula of the MAPE, which compares the predicted values with the actual values in each period.
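
Equation (7) is not reproduced in this excerpt; the MAPE formula it refers to is the standard one, where y_t and ŷ_t denote the actual and predicted values in period t:

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%. \]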

In the pretesting stage, the prediction results of the proposed GAM are compared with those of six popular prediction techniques to confirm whether the proposed method can provide more robust prediction results. The machine learning software used here is Weka 3.6.11, and the models are built with the default parameter settings. The GM used is the typical first-order one-variable grey model, GM(1,1).

2.7. Rolling Framework

Time series forecasting emphasizes the immediateness of information, and the rolling framework is a process that allows the data to be metabolized for this purpose. For example, four given data points are used to predict the value of the next period with the prediction techniques. After the prediction is acquired, the newly predicted output is added to the data set to replace the oldest of the four points. Subsequently, the updated data set (the three remaining observations plus the new prediction) is used to obtain the next predicted value. The process is repeated until all desired predicted values are found.
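
A minimal Python sketch of this rolling process is given below; it assumes a generic one-step forecaster (for example, the gam_forecast() function sketched in Section 2.5), and all names are illustrative.

```python
def rolling_forecast(history, steps, window, forecast_next):
    """Rolling framework: each new forecast replaces the oldest value in the window."""
    data = list(history[-window:])          # e.g., the four most recent observations
    forecasts = []
    for _ in range(steps):
        nxt = forecast_next(data)           # predict the next period from the current window
        forecasts.append(nxt)
        data = data[1:] + [nxt]             # drop the oldest datum, append the new forecast
    return forecasts

# Example: five future values from a 4-point window, using the GAM sketched in Section 2.5.
# future = rolling_forecast(series, steps=5, window=4,
#                           forecast_next=lambda w: gam_forecast(w, models))
```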

3. Experimental Results

The effectiveness and applicability of the proposed GAM are validated using a real case in the following sections.

3.1. Data Description and Experimental Design

This experiment verifies the effectiveness of the proposed GAM in forecasting China’s e-commerce transaction volume. The study uses data on the total amount of e-commerce transactions collected from the National Bureau of Statistics of China. The data set contains ten annual observations, ranging from 2011 to 2020 (Table 1), measured in trillions of Chinese Yuan.

E-commerce is one of the current main business modes based on network communication technology and an important driving force for the integration and development of physical transactions and the digital economy [23]. Under the impact of coronavirus disease 2019 (COVID-19), e-commerce has quickly become an indispensable part of people’s lives [24]. E-commerce allows economic and commercial activities to proceed with reduced interpersonal contact, which helps decrease the transmission of the virus [25]. In the postpandemic era, the causal relationship between the development of e-commerce and economic growth is obvious.

To maintain the momentum of economic development, the support of the e-commerce operating environment is necessary [26]. The formulation of development policy is therefore critical. It not only guides the operation of industries but also affects the consumption habits of people [27]. The work of creating an e-commerce operating environment (for example, formulating laws and regulations, determining industry standards, cultivating talents, and building logistics facilities) usually requires long-term efforts to achieve an acceptable result [28], and improper policy directions could bring substantially negative effects [29]. An adequate e-commerce transaction volume prediction is a prerequisite for formulating an effective development policy as it reduces the possibility of errors in policy planning [2]. Therefore, an accurate prediction for e-commerce transaction volumes has important practical significance for governments [26]. Effectively determining the trend of e-commerce transaction volumes helps the government draft an industrial development strategy, which is crucial for economic recovery after COVID-19. In order to reflect the current situation, appropriate e-commerce development policies should be based on updated and relevant information.

In the experiment, four data points are used each time to build a model for predicting the next output. That is, the predicted value for 2015 is inferred from the model built on the data from 2011 to 2014. In the pretesting stage, the four techniques mentioned above are first used to obtain a total of 24 models and 24 predicted values (six rolling windows for the targets 2015 to 2020, times four techniques). Next, these predicted values are used to determine the weights required to generate the aggregating model. The final predicted value is obtained by the weighted average method.
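
As a sketch of this design, and under the same illustrative interfaces as in Section 2.5, the walk-forward pretest could be organized as follows; the split into six four-observation windows follows the description above.

```python
def pretest_predictions(series, models, window=4):
    """Walk-forward pretest: six one-step targets (2015-2020) x four base techniques = 24 models.

    `series` holds the ten annual observations of Table 1 in chronological order;
    `models` and gam_forecast() follow the sketches in Section 2.5.
    """
    preds = {name: [] for name in models}
    preds["GAM"] = []
    for t in range(window, len(series)):                 # target indices 4..9, i.e., years 2015..2020
        train = series[t - window:t]                     # the four preceding actual observations
        for name, fit_predict in models.items():
            preds[name].append(fit_predict(train)[1])    # each base technique's one-step forecast
        preds["GAM"].append(gam_forecast(train, models)) # the aggregated forecast
    return preds
```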

3.2. Modeling Example of the Proposed GAM

This section explains the modeling details of the proposed GAM. First, each of the four prediction techniques (GM, LR, BPNN, and SVR) built six models and obtained six corresponding predicted values (columns 3 to 6 of Table 2). Second, the SDGI between the actual value series and each predicted value series was calculated, yielding one SDGI value per technique. Third, the rank of each prediction technique was determined according to its SDGI. Fourth, the weight of each method was obtained by the RSWM. Last, through the weighted average method, the final GAM was formed as the weighted combination of the four base models, and the corresponding predicted values were obtained from this final model (column 7 of Table 2).

3.3. Comparisons

In the pretest stage, the prediction results of the GAM were compared with those obtained using six popular methods: GM, LR, BPNN, SVR, the radial basis function network (RBFN), and Gaussian process regression (GPR). The MAPE of the proposed GAM is 2.84% (Table 3), the best among these methods. Its value is less than 5% and falls within the level of highly accurate forecasting (Table 4 [30]), indicating that the proposed method is appropriate for predicting China’s e-commerce transaction volume. In addition, compared with the four base prediction techniques (GM, LR, BPNN, and SVR), the GAM has higher prediction accuracy. This shows that the proposed method can improve prediction performance and produce favorable forecasts.

3.4. Future Trend of China’s E-Commerce Transaction Volume

To grasp the future trend of China’s e-commerce transaction volume, this study integrates the rolling framework into the proposed GAM to improve its trend prediction performance. A schematic diagram of the rolling framework used in this study is shown in Figure 1.

Table 5 shows the predicted values of e-commerce transaction volume in China for the next five years, obtained using the proposed GAM with the rolling framework. According to this prediction, China’s e-commerce transaction volumes will show a steady upward trend from 2021 to 2025.

By 2025, e-commerce transaction volumes in China are expected to be 45.566 trillion Chinese Yuan, which reflects an increase of approximately 22% relative to the transaction volumes in 2020. Under this development trend, the Chinese government should continue to invest in improving the e-commerce operating environment. The improvement of the quality of e-commerce transactions will not only increase people’s consumer satisfaction but also help the country’s economic growth.

4. Conclusion and Discussion

To maintain an operational advantage in a highly competitive environment, enterprises must respond quickly to business problems, and it is essential to be able to make timely and correct decisions. Decision contexts often involve many uncontrollable factors and uncertain events. To overcome this unmanageable uncertainty, businesses must employ the right analytical techniques. Predictive techniques can help managers grasp future trends, mitigate the effects of uncertainty, and lead to meaningful decisions. In a variety of management situations, cost and time considerations often make it impossible to obtain sufficient information, which makes decisions that require an immediate response a more difficult task for managers. If managers can grasp the situation in real time through a limited number of observations and carry out appropriate processing, effective operations management can be achieved. Therefore, building prediction models under small data sets has significant practical value.

Although popular prediction techniques have acceptable performance in normal applications, they are not suitable for prediction problems with insufficient information from small data sets. Grey system theory is a technique for small data set analysis [31], and its research scope covers the problems encountered in this paper. Therefore, a modeling procedure based on grey system theory is proposed to integrate the advantages of various approaches to this specific problem, thereby obtaining a more robust prediction output. Through the verification of China’s e-commerce transaction volume, the results of the experiment show that the proposed GAM can produce favorable predictions with a MAPE as low as 2.84%. These results imply that the proposed method is useful for decision analysis with limited data. The proposed procedure outperforms the individual popular methods in the experiment. Furthermore, the results obtained using data mining and statistical learning-based methods (such as LR, BPNN, SVR, RBFN, and GPR) are not superior to those of the proposed method, which may be due to the small sample size. These approaches typically require a sufficient training data set to prevent overfitting and obtain robust models; if the training data set were large enough, the prediction performance of LR, BPNN, SVR, RBFN, and GPR should improve. Finally, the prediction shows that China’s e-commerce transaction volumes are steadily increasing, indicating that investing resources to improve the e-commerce operating environment is in line with the direction and interests of China’s economic development.

The proposed GAM can improve the stability of forecasting results by aggregating the advantages of several models and is considered a feasible tool when only small data sets are available. In the future, introducing heuristic methods into the modeling procedure to improve prediction accuracy is a valuable research direction. In addition, the proposed method can be combined with data preprocessing methods (for example, virtual sample generation techniques) to further improve its ability to handle problems with small data sets. Moreover, using more training samples to confirm the predictive power of the proposed GAM is also worth exploring. Finally, the proposed method could be applied in other fields, such as finance, industry, engineering, and energy, to further verify its reliability, validity, and practical value.

Data Availability

The data used in the experiment are listed in this article; anyone can use these data by citing this article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Social Science Planning Project of Fujian Province (China) under Grant FJ2019B099, the Natural Science Foundation of Fujian Province (China) under Grant 2021J01326, and the Science and Technology Planning Project of Quanzhou city (China) under Grant 2019C096R.