Abstract

This paper introduces an entropy-based belief function for the forecasting problem. While the likelihood-based belief function needs to know the distribution of the objective function for prediction, the entropy-based belief function does not. This matters because the observed-data likelihood is often complex in practice. We therefore replace the likelihood function with the entropy; that is, we propose an approach in which a belief function is built from the entropy function. As an illustration, the proposed method is compared to the likelihood-based belief function in simulation and empirical studies. According to the results, our approach performs well under a wide array of simulated data models and distributions. The evidence shows that the frequentist method yields much narrower prediction intervals, while our entropy-based method produces the widest ones. However, our entropy-based belief function still produces an acceptable range for the prediction intervals, as the true value always lies within them.

1. Introduction

Analysts are concerned with the uncertainty of the future. Reliable and timely information about future conditions is important for policy makers and planners to propose appropriate plans and make correct decisions in the current situation. In addition, the assessment of future conditions is essential to reach reliable results and rational decisions. For example, in economic research, predicting the future of the economy has become common practice (see Mitchell [1] and Lahiri and Monokroussos [2]). In recent years, the need for economic forecasting has become even more important due to high fluctuations in the global economy. Whatever the approach used, a forecast cannot be trusted unless it is accompanied by some measure of uncertainty. Thus, in this context, it is highly desirable to have subjective probabilities or prediction intervals for the prediction [3]. Beyond economics, prediction is also important in finance, medicine, engineering, and biology, in order to provide efficient solutions to current problems.

Since the introduction of the Dempster–Shafer theory of belief functions [4, 5], new formal frameworks for handling uncertainty have been proposed and have attracted increasing interest in various areas. In this approach, a piece of evidence is represented by a belief function, which is mathematically equivalent to a random set [6]. Although the approach is conceptually simple and easy to apply to various problems and models, statistical methods based on belief functions have not gained much acceptance. Kanjanatarakul et al. [7] noted that this approach cannot be applied easily to complex statistical models in many areas, such as machine learning or econometrics. Thus, they followed Shafer's approach to statistical inference using belief functions, based on the concept of likelihood. Shafer [5] extended Dempster's seminal work and proposed a belief function constructed from the likelihood function. This construction was recently justified by Denoeux [8] from three basic principles: the likelihood principle, compatibility with Bayesian inference, and the least commitment principle.

The likelihood-based belief function has gained increasing interest in the last few years. Previous works using the likelihood-based belief function in forecasting models include Kanjanatarakul et al. [3], Kanjanatarakul et al. [7], Thianpaen et al. [9], and Chakpitak et al. [10]. However, it is often very difficult to specify an appropriate likelihood distribution for such models. If the distributional assumption is not specified properly, it might lead to model misspecification and bring about prediction bias. To overcome this problem, an approach based on information theory, the classical maximum entropy (ME) principle introduced by Jaynes [11], is proposed in this study. The ME principle has opened the way for a whole new class of estimation methods. The estimation allows us to extract all the available information from the observed data with minimal assumptions and avoids specifying a likelihood function. In this study, we consider the Generalized Maximum Entropy (GME). We modify the likelihood-based belief function by using an entropy function in place of the likelihood function. That is, we propose an approach in which a belief function on the parameter space is built from the entropy function. The advantage of this method is that the statistical inference does not assume any distribution; the plausibility refers to the relative entropy instead. Hence, the method is more flexible in explaining a wide range of variation in the observed data.

Although the likelihood-based belief function has been shown to be a reliable interval prediction method (Denoeux [8]), its applications to forecasting have remained limited. The reason might be that the likelihood-based belief function requires the statistician to provide the likelihood distribution function, which is problematic when no likelihood knowledge, or only weak information, is available. The purpose of this paper is to introduce a new approach for forecasting future observations as well as providing prediction intervals. Different from the likelihood-based belief function, we replace the parametric likelihood function with the entropy function to derive the entropy-based belief function, and thus model misspecification is avoided since no distributional assumption is made.

In this paper, we further explore the proposed approach by proving that, under weak distributional information, the predictive entropy-based belief function converges to the true probability distribution of the not-yet-observed data. Several simulation studies are conducted to investigate the performance of our proposed method. Finally, we apply the proposed method to the autoregressive (AR) model for predicting the growth rate of Thailand's GDP, where data are scarce and involve high uncertainty. In addition, we show that the method can be used for predicting future observations in the linear regression framework, using data from the operation of a plant for the oxidation of ammonia to nitric acid, measured on 21 consecutive days. We note that this method allows for accurate decision-making based on reliable forecasts in various research fields.

The remainder of the paper is organized as follows. Section 2 provides the background on the entropy-based belief function concept and describes estimation and prediction using the entropy-based belief function. Section 3 presents a simulation study and real-data applications in various fields, evaluating the performance of our method in autoregressive and regression models. Section 4 concludes the paper.

2. Methodology

This section briefly explains the necessary background notions related to the entropy approach and introduces the connection between entropy and the belief function. The Generalized Maximum Entropy approach and parameter estimation are first presented in Sections 2.1 and 2.2, respectively. The entropy-based belief function is then introduced in Section 2.3. Finally, the basic concept of the predictive belief function is presented in Section 2.4.

2.1. Generalized Maximum Entropy Approach

In this study, we propose using a maximum entropy estimator to estimate the unknown parameters in econometric models, say the AR and linear regression models. Before we discuss this estimator for regression and its statistical properties, we briefly summarize the entropy approach. The maximum entropy concept consists of inferring the probability distribution that maximizes information entropy given a set of constraints. Let $p_k = (p_{k1}, \ldots, p_{kM})'$ be a proper probability vector of coefficient $\beta_k$ on support $z_k$, where $z_k = (z_{k1}, \ldots, z_{kM})'$ is the $M$-dimensional support vector of coefficient $\beta_k$. Following Shannon's entropy (1948), the summation of the entropies is

$$H(p) = -\sum_{k}\sum_{m=1}^{M} p_{km}\ln p_{km}, \qquad (1)$$

where the probabilities $p_{km}$ are the unknown and unobservable probabilities in the entropy and $\sum_{m=1}^{M} p_{km} = 1$. The entropy measures the uncertainty of the distribution and reaches a maximum when $p_{km} = 1/M$; in other words, if no constraints (data) are imposed, $H(p)$ reaches its maximum value for the uniform distribution.
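To make this idea concrete, here is a minimal numerical sketch (in Python, with illustrative values not taken from the paper) of Shannon's entropy and of the fact that the uniform distribution maximizes it:

```python
# A minimal sketch of Shannon's entropy for a discrete probability vector p;
# it illustrates that H(p) is maximized by the uniform distribution.
import numpy as np

def shannon_entropy(p):
    """H(p) = -sum_i p_i ln p_i, with 0 ln 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

uniform = np.ones(5) / 5          # no constraints -> uniform distribution
peaked = np.array([0.9, 0.05, 0.03, 0.01, 0.01])
print(shannon_entropy(uniform))   # ln(5) ~= 1.609, the maximum
print(shannon_entropy(peaked))    # strictly smaller
```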

2.2. Parameter Estimation

To illustrate the entropy estimation, let us consider the simple autoregressive (AR) model. Let $y_t$, $t = 1, \ldots, T$, be observed data expressed as a linear combination of past observations $y_{t-1}, \ldots, y_{t-K}$. In the AR($K$) model, we have

$$y_t = \sum_{k=1}^{K} \beta_k y_{t-k} + \varepsilon_t, \qquad (2)$$

where $K$ and $\beta_k$ are the autoregressive order and the autoregressive coefficients, respectively, and $\varepsilon_t$ is independent and identically distributed white noise that is not assumed to follow any particular distribution, only to be an uncorrelated random variable.

To apply this concept to estimation of the regression model, we generalize the maximum entropy principle to the inverse problem in the regression framework. Rather than searching for the point estimates $\beta_k$ directly, we view these unknown parameters as expectations of random variables with support values for each estimated parameter value:

$$\beta_k = \sum_{m=1}^{M} z_{km} p_{km} = z_k' p_k, \qquad (3)$$

where $M \geq 2$, $\sum_{m=1}^{M} p_{km} = 1$ for all $k$, and $z_{k1}$ and $z_{kM}$ are the lower and upper bounds of $\beta_k$; here $p_k$ is the $M$-dimensional estimated probability vector defined on the support vector $z_k$. Next, similar to the above expression, $\varepsilon_t$ is also constructed as the mean value of some random variable. Each $\varepsilon_t$ is assumed to be a finite, discrete random variable with support vector $v = (v_1, \ldots, v_J)'$. Let $w_t = (w_{t1}, \ldots, w_{tJ})'$ be a $J$-dimensional proper probability vector defined on $v$ such that

$$\varepsilon_t = \sum_{j=1}^{J} v_j w_{tj} = v' w_t, \qquad \sum_{j=1}^{J} w_{tj} = 1. \qquad (4)$$

Using the reparameterized unknowns $p$ and $w$, we can rewrite equation (2) as

$$y_t = \sum_{k=1}^{K} \left( \sum_{m=1}^{M} z_{km} p_{km} \right) y_{t-k} + \sum_{j=1}^{J} v_j w_{tj}, \qquad (5)$$

where the support vectors $z_k$ and $v$ are convex sets, symmetric around zero. Then, we can construct our Generalized Maximum Entropy (GME) estimator as

$$\max_{p, w}\; H(p, w) = -\sum_{k=1}^{K}\sum_{m=1}^{M} p_{km}\ln p_{km} - \sum_{t=1}^{T}\sum_{j=1}^{J} w_{tj}\ln w_{tj}, \qquad (6)$$

subject to

$$y_t = \sum_{k=1}^{K} (z_k' p_k)\, y_{t-k} + v' w_t, \quad t = 1, \ldots, T, \qquad (7)$$

$$\sum_{m=1}^{M} p_{km} = 1 \ \ \forall k, \qquad \sum_{j=1}^{J} w_{tj} = 1 \ \ \forall t, \qquad (8)$$

where $p_{km}$ and $w_{tj}$ are on the interval [0, 1].
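The program (6)-(8) can be solved numerically. The following Python sketch, under stated assumptions, solves the primal problem directly for a generic linear model $y = X\beta + \varepsilon$ with a general-purpose constrained optimizer; the function name `gme_fit` and the SLSQP solver choice are illustrative, not from the paper, and a production implementation would typically work with the dual (concentrated) objective instead:

```python
# A hedged sketch of the GME estimator in equations (6)-(8), assuming a
# linear model y = X @ beta + eps.  Names (gme_fit, M, J) are illustrative.
import numpy as np
from scipy.optimize import minimize

def gme_fit(y, X, z, v):
    """z: (K, M) coefficient supports; v: (J,) error support."""
    T, K = X.shape
    M, J = z.shape[1], v.shape[0]
    n_p, n_w = K * M, T * J

    def unpack(theta):
        return theta[:n_p].reshape(K, M), theta[n_p:].reshape(T, J)

    def neg_entropy(theta):              # objective (6), negated for minimize
        t = np.clip(theta, 1e-12, 1.0)
        return np.sum(t * np.log(t))

    def data_constraint(theta):          # constraint (7): y - X beta - eps = 0
        p, w = unpack(theta)
        beta = (z * p).sum(axis=1)       # beta_k = z_k' p_k, eq. (3)
        eps = w @ v                      # eps_t = v' w_t, eq. (4)
        return y - X @ beta - eps

    def adding_up(theta):                # constraint (8): rows sum to 1
        p, w = unpack(theta)
        return np.concatenate([p.sum(axis=1) - 1.0, w.sum(axis=1) - 1.0])

    theta0 = np.concatenate([np.full(n_p, 1.0 / M), np.full(n_w, 1.0 / J)])
    res = minimize(neg_entropy, theta0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * (n_p + n_w),
                   constraints=[{"type": "eq", "fun": data_constraint},
                                {"type": "eq", "fun": adding_up}])
    p_hat, w_hat = unpack(res.x)
    return (z * p_hat).sum(axis=1), p_hat, w_hat   # estimates, probabilities
```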

This optimization problem can be solved using the Lagrangian method, which takes the following form:

$$L = H(p, w) + \sum_{t=1}^{T} \lambda_t \left[ y_t - \sum_{k=1}^{K} (z_k' p_k)\, y_{t-k} - v' w_t \right] + \sum_{k=1}^{K} \theta_k \left( 1 - \sum_{m=1}^{M} p_{km} \right) + \sum_{t=1}^{T} \mu_t \left( 1 - \sum_{j=1}^{J} w_{tj} \right), \qquad (9)$$

where the unknown multipliers $\lambda$, $\theta$, and $\mu$ are vectors of Lagrangian multipliers. Thus, the resulting first-order conditions are

$$\frac{\partial L}{\partial p_{km}} = -\ln p_{km} - 1 - z_{km}\sum_{t}\lambda_t y_{t-k} - \theta_k = 0, \qquad \frac{\partial L}{\partial w_{tj}} = -\ln w_{tj} - 1 - \lambda_t v_j - \mu_t = 0. \qquad (10)$$

Thus, we have

$$p_{km} = \exp\!\left( -1 - \theta_k - z_{km}\sum_{t}\lambda_t y_{t-k} \right), \qquad (11)$$

$$w_{tj} = \exp\!\left( -1 - \mu_t - \lambda_t v_j \right). \qquad (12)$$

Since the constraints require that $\sum_{m} p_{km} = 1$ and $\sum_{j} w_{tj} = 1$, we sum (11) and (12) over $m$ and $j$, respectively, and we have

$$\Omega_k(\hat\lambda) \equiv \exp(1 + \theta_k) = \sum_{m=1}^{M} \exp\!\left( -z_{km}\sum_{t}\hat\lambda_t y_{t-k} \right), \qquad (13)$$

$$\Psi_t(\hat\lambda) \equiv \exp(1 + \mu_t) = \sum_{j=1}^{J} \exp\!\left( -\hat\lambda_t v_j \right), \qquad (14)$$

where $\hat\lambda$ is the estimated multiplier. To estimate the Lagrangian multipliers, we can substitute the candidate probabilities $p_{km}(\lambda)$ and $w_{tj}(\lambda)$ into the original Lagrangian (equation (9)). Then, the estimated $\hat\lambda$ can be obtained by differentiating the concentrated Lagrangian (conditional on the candidates $p(\lambda)$ and $w(\lambda)$) with respect to $\lambda$ and setting the derivative equal to zero. These estimated multipliers are then substituted into equations (13)-(14). Note that $\Omega_k$ and $\Psi_t$ are constants for a given parameter, so the optimal probabilities in equations (11) and (12) can be rewritten as

$$\hat p_{km} = \frac{\exp\!\left( -z_{km}\sum_{t}\hat\lambda_t y_{t-k} \right)}{\Omega_k(\hat\lambda)}, \qquad \hat w_{tj} = \frac{\exp\!\left( -\hat\lambda_t v_j \right)}{\Psi_t(\hat\lambda)}. \qquad (15)$$
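As a sketch of equations (13)-(15), assuming the multipliers $\lambda$ are given, the candidate probabilities take a softmax form that can be evaluated in closed form. The helper below is illustrative (its name and the max-subtraction trick for numerical stability are ours):

```python
# A sketch of the concentrated solution (13)-(15): given Lagrange multipliers
# lam (one per observation), the optimal probabilities have a softmax form.
# X is the T x K design matrix (lagged y's for the AR model), z and v the
# coefficient and error supports.
import numpy as np

def candidate_probs(lam, X, z, v):
    # p_km(lam) = exp(-z_km * sum_t lam_t x_tk) / Omega_k(lam), eq. (15)
    a = -z * (X.T @ lam)[:, None]               # (K, M) exponents
    p = np.exp(a - a.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)           # divide by Omega_k, eq. (13)
    # w_tj(lam) = exp(-lam_t v_j) / Psi_t(lam), eq. (15)
    b = -np.outer(lam, v)                       # (T, J) exponents
    w = np.exp(b - b.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)           # divide by Psi_t, eq. (14)
    return p, w
```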

Summing up, we maximize the joint-entropy objective in equation (6) subject to the model constraint in equation (7) and the adding-up restrictions in equation (8). The solution to this maximization problem is unique: forming the Lagrangian and solving the first-order conditions yields the optimal solutions $\hat p$, $\hat w$, and $\hat\lambda$. These estimated probabilities are then used to derive the point estimates for the regression coefficients and error terms; see equations (3) and (4).

Example 1. To better understand the entropy estimation, we give a simple example. Consider the first-order linear autoregressive AR(1) model defined by

$$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t. \qquad (16)$$

The true values of $\beta_0$ and $\beta_1$ are set to 1 and −1, respectively. The error is drawn randomly from a normal distribution with mean 0 and variance 1. We generate equation (16) using sample size T = 10. The support spaces of $\beta_0$ and $\beta_1$, as well as that of $\varepsilon_t$, are defined symmetrically around zero. The simulated data are shown in Figure 1, and the results are reported in Tables 1 and 2.
To estimate equation (16), we construct the GME problem as

$$\max_{p, w}\; H(p, w) = -\sum_{k=0}^{1}\sum_{m=1}^{M} p_{km}\ln p_{km} - \sum_{t=1}^{T}\sum_{j=1}^{J} w_{tj}\ln w_{tj}, \qquad (17)$$

subject to $y_t = z' p_0 + (z' p_1)\, y_{t-1} + v' w_t$ and the adding-up constraints, where $\hat\beta_0 = z'\hat p_0$ and $\hat\beta_1 = z'\hat p_1$ are the parameter estimates in this problem; see equations (3)-(4).
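A hedged sketch of this example follows, reusing the `gme_fit` helper sketched in Section 2.2. The simulation settings (T = 10, $\beta_0 = 1$, $\beta_1 = -1$, standard normal errors) follow the text, while the support values below are illustrative placeholders, since the example's exact supports are not restated here:

```python
# A sketch reproducing Example 1 under stated assumptions.  Requires the
# gme_fit sketch from Section 2.2; the supports are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
T = 10
y = np.zeros(T + 1)
for t in range(1, T + 1):                 # generate eq. (16)
    y[t] = 1.0 - 1.0 * y[t - 1] + rng.normal(0.0, 1.0)

Y = y[1:]                                 # regress y_t on (1, y_{t-1})
X = np.column_stack([np.ones(T), y[:-1]])
z = np.array([[-5.0, 0.0, 5.0],           # support for beta_0 (placeholder)
              [-5.0, 0.0, 5.0]])          # support for beta_1 (placeholder)
v = np.array([-3.0, 0.0, 3.0])            # error support (placeholder)
beta_hat, p_hat, w_hat = gme_fit(Y, X, z, v)
print(beta_hat)                           # GME point estimates of (b0, b1)
```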

2.3. The Entropy-Based Belief Function

Following the three basic principles of the likelihood-based belief function, namely the likelihood principle, compatibility with Bayesian inference, and the least commitment principle, which are justified by Denoeux [8], we modify this approach by using an entropy function as the substitute for the likelihood function. Let $\theta$ be the parameter of interest; we propose an approach in which a belief function on the parameter space $\Theta$ is built from the entropy function. In this case, the entropy ratio plays the role of a "relative plausibility," and the contour function can thus be readily calculated as

$$pl(\theta) = \frac{H(p^{\theta}, w^{\theta})}{H(\hat p, \hat w)}, \qquad (18)$$

where $\theta = (p, w)$, in which $p$ and $w$ are proper probability mass vectors on the interval [0, 1], and $H(\hat p, \hat w)$ is the unconstrained maximum

$$H(\hat p, \hat w) = \max_{p, w}\; H(p, w), \qquad (19)$$

subject to

$$y_t = \sum_{k=1}^{K} (z_k' p_k)\, y_{t-k} + v' w_t, \quad t = 1, \ldots, T, \qquad (20)$$

$$\sum_{m=1}^{M} p_{km} = 1 \ \ \forall k, \qquad \sum_{j=1}^{J} w_{tj} = 1 \ \ \forall t. \qquad (21)$$

Meanwhile, $H(p^{\theta}, w^{\theta})$ is defined as the maximum entropy attainable under constraints (20)-(21) with the parameter fixed at $\theta$,

$$H(p^{\theta}, w^{\theta}) = \max_{p, w:\, (p, w) = \theta}\; H(p, w), \qquad (22)$$

and the marginal contour functions on each individual parameter are

$$pl_k(\beta_k) = \sup_{\theta:\, \theta_k = \beta_k} pl(\theta). \qquad (23)$$

Subject to the additional constraints in equations (20)-(21), this belief function is called the entropy-based belief function on $\Theta$ induced by the observed data. The corresponding plausibility function can be estimated as

$$Pl(A) = \sup_{\theta \in A} pl(\theta), \qquad (24)$$

for all $A \subseteq \Theta$. The focal sets of this belief function are the level sets of $pl$ defined as

$$\Gamma(u) = \{\theta \in \Theta : pl(\theta) \geq u\}, \qquad (25)$$

where $u$ is uniformly distributed in [0, 1]. This belief function is equivalent to the random set induced by the Lebesgue measure on [0, 1] and the multivalued mapping $\Gamma$ from [0, 1] to $2^{\Theta}$ [3].
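Putting the pieces together, the contour function (18) can be sketched as the ratio of a constrained to an unconstrained maximum entropy. The helper below is an assumption-laden illustration (the names `max_entropy` and `contour_pl` are ours, not the paper's); NaN entries in `beta_fixed` mark coefficients left free, which also supports the marginal contour functions in (23):

```python
# A hedged sketch of equations (18)-(23) for a linear model y = X @ beta + eps.
import numpy as np
from scipy.optimize import minimize

def max_entropy(y, X, z, v, beta_fixed=None):
    """Maximized entropy under (20)-(21); numeric entries of beta_fixed pin
    a coefficient (the extra constraint behind eq. (22)), NaN leaves it free."""
    T, K = X.shape
    M, J = z.shape[1], v.shape[0]
    split = K * M

    def unpack(t):
        return t[:split].reshape(K, M), t[split:].reshape(T, J)

    def neg_H(t):
        t = np.clip(t, 1e-12, 1.0)
        return np.sum(t * np.log(t))

    cons = [{"type": "eq", "fun": lambda t: np.concatenate(
                [unpack(t)[0].sum(1) - 1, unpack(t)[1].sum(1) - 1])},
            {"type": "eq", "fun": lambda t:
                y - X @ (z * unpack(t)[0]).sum(1) - unpack(t)[1] @ v}]
    if beta_fixed is not None:
        pin = np.asarray(beta_fixed, dtype=float)
        mask = ~np.isnan(pin)
        cons.append({"type": "eq", "fun": lambda t:
                     ((z * unpack(t)[0]).sum(1) - pin)[mask]})
    t0 = np.concatenate([np.full(split, 1 / M), np.full(T * J, 1 / J)])
    res = minimize(neg_H, t0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * t0.size, constraints=cons)
    return -res.fun

def contour_pl(theta, y, X, z, v):
    """Contour function (18): constrained over unconstrained max entropy."""
    return max_entropy(y, X, z, v, beta_fixed=theta) / max_entropy(y, X, z, v)
```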

2.4. The Basic Method for Predictive Belief Function

To forecast future data $X$, the sampling model used by Dempster [12] is introduced here. In this model, the forecast data $X$ is expressed as a function of the proper probability mass functions, which are obtained from past observed data, and an unobserved auxiliary variable $U$ with known probability measure not depending on $\theta$. Note that $\mathbb{X}$ is the sample space of $X$:

$$X = \varphi(\theta, U), \qquad (26)$$

where $\varphi$ is defined in such a way that the distribution of $X$ is the conditional distribution given fixed $\theta$. When $X$ is a continuous random variable, equation (26) can be computed by

$$X = F_{X \mid \theta}^{-1}(U), \qquad (27)$$

where $F_{X \mid \theta}^{-1}$ is the inverse conditional cumulative distribution function (cdf) of $X$ and $U$ is uniformly distributed in [0, 1]. We note that, in the likelihood-based belief function case, $F_{X \mid \theta}^{-1}$ is the inverse of some parametric conditional cdf, say normal, Student-t, or another parametric distribution. However, in the entropy-based belief function case, we do not need any distributional assumption on $X$. Composing $\varphi$ with $\Gamma$, we get a new multivalued mapping defined as

$$\Gamma'(u, U) = \{\varphi(\theta, U) : \theta \in \Gamma(u)\}. \qquad (28)$$

The predictive belief function and plausibility are then induced by the multivalued mapping $\Gamma'$ and the uniform measure on $[0, 1]^2$ as follows:

$$Bel(A) = P\big(\Gamma'(u, U) \subseteq A\big), \qquad Pl(A) = P\big(\Gamma'(u, U) \cap A \neq \emptyset\big), \qquad (29)$$

for all $A \subseteq \mathbb{X}$.

3. Simulation and Application: Entropy-Based Belief Function

Let us now turn our attention to examining the performance of the entropy-based belief function. In this section, the inference and forecasting methodology outlined in the previous section is applied to simulated and real data. We apply the entropy-based belief function to forecasting with the AR and regression models, presented in Sections 3.1 and 3.2, respectively.

3.1. The Entropy-Based Belief Function in Autoregressive Model Problem
3.1.1. Prediction

As we have seen in Section 2, the estimation problem is to make statements about some probability after observing data. To describe the prediction method for the AR model, let us consider the case where $K = 1$; thus, the first-order linear autoregressive AR(1) model is defined by

$$y_{T+1} = \beta_0 + \beta_1 y_T + \varepsilon_{T+1},$$

where $y_{T+1}$ is the predicted value, which is not yet observed, and $y_T$ is the observed value at time $T$.

The main idea is to express $y_{T+1}$ as a function of the parameters, viewed as expectations of probabilities with supports $z$ and $v$, and of some pivotal variable $U$ whose distribution does not depend on the parameters. We can write equivalently

$$y_{T+1} = \beta_0 + \beta_1 y_T + F_{\hat\varepsilon}^{-1}(U), \qquad (30)$$

where $F_{\hat\varepsilon}^{-1}$ is the quantile function of the empirical distribution based on the estimated residuals $\hat\varepsilon_t$. Then, in the case of the entropy approach, we rewrite equation (30) as

$$y_{T+1} = \sum_{m=1}^{M} z_m p_{0m} + \left( \sum_{m=1}^{M} z_m p_{1m} \right) y_T + F_{\hat\varepsilon}^{-1}(U), \qquad (31)$$

where $M$ is the number of support points of $z$ and $J$ that of $v$. Consequently, in the entropy estimation, we rewrite the focal set of the prediction, using equation (31), as

$$\Gamma'(u, U) = \left\{ \sum_{m} z_m p_{0m} + \left( \sum_{m} z_m p_{1m} \right) y_T + F_{\hat\varepsilon}^{-1}(U) : (p, w) \in \Gamma(u) \right\}. \qquad (32)$$
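The only distributional ingredient in (30)-(31) is the empirical quantile function of the residuals; a minimal sketch:

```python
# A small sketch of the pivotal step in equation (30): the error quantile
# function is the empirical quantile of the estimated residuals, so no
# parametric error distribution is assumed.
import numpy as np

def empirical_quantile(residuals):
    """Return F^{-1}: [0, 1] -> R based on the estimated residuals."""
    r = np.asarray(residuals, dtype=float)
    return lambda u: np.quantile(r, u)

# usage: y_next = beta0 + beta1 * y_T + empirical_quantile(res)(U)
```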

To forecast $y_{T+1}$, the predictive belief and plausibility functions can then be approximated using Monte Carlo simulation via the following procedure (Kanjanatarakul et al. [3] and Kanjanatarakul et al. [7]):

(1) Draw $u_i$ and $U_i$ independently from the uniform distribution on [0, 1], for $i = 1, \ldots, N$ draws. Then, compute (or approximate) the focal set $\Gamma'(u_i, U_i)$, which is the interval defined by the following lower and upper bounds:

$$y_{T+1}^{L}(u_i, U_i) = \min_{(p, w)} \left[ \sum_{m} z_m p_{0m} + \Big( \sum_{m} z_m p_{1m} \Big) y_T + F_{\hat\varepsilon}^{-1}(U_i) \right], \qquad y_{T+1}^{U}(u_i, U_i) = \max_{(p, w)} \left[ \cdot \right],$$

which can be computed using a constrained nonlinear optimization algorithm, subject to

$$pl(p, w) \geq u_i$$

and the adding-up and model constraints in equations (20)-(21).

(2) We can then approximate the predictive belief ($Bel$) and plausibility ($Pl$) functions on $\mathbb{R}$ as

$$\widehat{Bel}(A) = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\{\Gamma'(u_i, U_i) \subseteq A\}, \qquad \widehat{Pl}(A) = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\{\Gamma'(u_i, U_i) \cap A \neq \emptyset\},$$

for all $A \subseteq \mathbb{R}$. These lower and upper cdfs of the predictive belief function are approximated using N = 5000 randomly generated focal sets (Algorithm 1).

Require: Desired number of focal sets $N$, number of support points $M$ and $J$, support spaces of the coefficients $z$ and the error terms $v$.
  for $i = 1$ to $N$ do
   Draw $u_i$ and $U_i$ from the uniform distribution on [0, 1]
   Search for $(p, w)$ such that $pl(p, w) \geq u_i$, minimizing the prediction in equation (31),
   and set the lower bound $y_{T+1}^{L}(u_i, U_i)$ to the minimum found.
   Search for $(p, w)$ such that $pl(p, w) \geq u_i$, maximizing the prediction in equation (31),
   and set the upper bound $y_{T+1}^{U}(u_i, U_i)$ to the maximum found.
  end for
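Below is a hedged Python sketch of Algorithm 1 for the AR(1) case, reusing the `max_entropy` helper sketched in Section 2.3. The relative-entropy constraint $pl(p, w) \geq u_i$ is passed to the optimizer as an inequality; the variable names and the SLSQP choice are illustrative:

```python
# A hedged sketch of Algorithm 1.  Requires the max_entropy sketch from
# Section 2.3.  x_new holds the new observation's regressors, e.g. [1, y_T]
# for AR(1); resid are the estimated residuals.  Reduce N for a quick run.
import numpy as np
from scipy.optimize import minimize

def focal_sets(y, X, z, v, x_new, resid, N=5000, seed=0):
    rng = np.random.default_rng(seed)
    T, K = X.shape
    M, J = z.shape[1], v.shape[0]
    split = K * M
    H_max = max_entropy(y, X, z, v)          # unconstrained maximum entropy

    def unpack(t):
        return t[:split].reshape(K, M), t[split:].reshape(T, J)

    def entropy(t):
        t = np.clip(t, 1e-12, 1.0)
        return -np.sum(t * np.log(t))

    def predict(t, U):                        # prediction in eq. (31)
        beta = (z * unpack(t)[0]).sum(axis=1)
        return x_new @ beta + np.quantile(resid, U)

    base = [{"type": "eq", "fun": lambda t: np.concatenate(
                [unpack(t)[0].sum(1) - 1, unpack(t)[1].sum(1) - 1])},
            {"type": "eq", "fun": lambda t:
                y - X @ (z * unpack(t)[0]).sum(1) - unpack(t)[1] @ v}]
    t0 = np.concatenate([np.full(split, 1 / M), np.full(T * J, 1 / J)])
    bounds = [(0.0, 1.0)] * t0.size
    lower, upper = np.empty(N), np.empty(N)
    for i in range(N):
        u, U = rng.uniform(), rng.uniform()
        cons = base + [{"type": "ineq",       # pl(p, w) >= u
                        "fun": lambda t, u=u: entropy(t) - u * H_max}]
        lo = minimize(lambda t: predict(t, U), t0, method="SLSQP",
                      bounds=bounds, constraints=cons)
        hi = minimize(lambda t: -predict(t, U), t0, method="SLSQP",
                      bounds=bounds, constraints=cons)
        lower[i], upper[i] = lo.fun, -hi.fun
    # sorting lower and upper yields the upper and lower predictive cdfs
    return lower, upper
```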
3.1.2. Experiment Study

In this section, we conduct simulation and experimental studies to investigate the finite-sample performance of the proposed Shannon entropy-based belief method. The simulated data are constructed from the AR model using different error distributions: normal, Student-t, skewed Student-t, and uniform. We note that the normal and Student-t are selected as symmetric distributions, while the skewed Student-t and uniform are typical examples of a highly skewed distribution and a flat distribution, respectively. In these experiments, we consider three competing likelihood functions: normal, Student-t, and skewed Student-t likelihoods.

To compare with these likelihood-based belief functions, the one-step-ahead forecast is computed from the simulated data and compared with the true value. We note that, as the contour function has a unique maximum $\hat y_{T+1}$, this maximizer may be taken as a point prediction in the belief-function approach to the AR(1) model. That is, $\hat y_{T+1}$ can be viewed as the most plausible value of $y_{T+1}$ given the past information. Thus, we need a criterion to gauge the performance of the method. Here, the mean square error (MSE) is employed, defined as

$$MSE = \frac{1}{R}\sum_{r=1}^{R} \left( y_r - \hat y_r \right)^2,$$

where $R = 20$ is the number of repetitions, $y_r$ is the true value, and $\hat y_r$ is the predicted value.

In this Monte Carlo simulation, we consider sample sizes $T = 20$ and $T = 40$. $R = 20$ samples are generated for each case. To make a fair comparison, the error terms are generated from the normal, Student-t, skewed Student-t, and uniform distributions. Then, the sampling experiments are based on the AR(1) model:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t.$$

In this experiment, the first 20 observations are used to predict the 21st observation. Likewise, the first 40 observations are used to predict the 41st observation. Then, the most plausible predicted values, $\hat y_{21}$ and $\hat y_{41}$, are compared with $y_{21}$ and $y_{41}$, respectively. In the case of the Shannon entropy, the coefficient support is initially set to (−5, 0, 5) and the support for the errors to $(-3\hat\sigma, 0, 3\hat\sigma)$, where $\hat\sigma$ is computed from the conventional AR model. Table 3 reports the prediction MSE of the considered estimators. Considering the performance of the proposed estimator when the errors are generated from a uniform distribution, we find that the error of the Shannon entropy-based belief function is smaller than that of the other methods for the one-step-ahead forecast. In addition, the performance of this method improves when the sample size increases from 20 to 40.

Next, we investigate the performance of the proposed estimator when the errors are generated from the normal, Student-t, and skewed Student-t distributions and compare it with the parametric likelihood-based belief functions. The overall picture in Table 3 is different. The Shannon entropy-based belief function outperforms a likelihood-based belief function, in terms of prediction MSE, when the assumed error distribution does not match the true one. When the errors are generated from a parametric distribution, the likelihood-based belief function with the correctly specified distribution always performs best. However, our Shannon entropy-based belief function delivers the second-best prediction over the whole range of error distributions.

Thus, from this simulation study, we can conclude that our method performs well, particularly when the error distribution is unknown or uniform. Nonetheless, when the error is generated from a known distribution, the parametric likelihood-based belief function outperforms the entropy-based one. This means that if the distribution is known and the model is estimated under the true distribution, an accurate result is obtained. However, as the error distribution is often unknown in practice, these results confirm the advantage of the Shannon entropy-based belief function, which performs uniformly well over a wide range of error distributions.

Furthermore, it is interesting to consider other entropy measures such as Renyi entropy [13] and Tsallis entropy [14]; we thus introduce these two entropy-based belief functions to investigate the performance of various entropy measures. We note that Renyi entropy is a generalization of Shannon entropy, while Tsallis entropy is nonlogarithmic and is obtained through the joint generalization of the averaging procedures and the concept of information gain. These two entropies are indexed by an order $\alpha$. The Renyi entropy is formulated as

$$H_{\alpha}(p) = \frac{1}{1 - \alpha} \ln \left( \sum_{i} p_i^{\alpha} \right),$$

while the Tsallis entropy can be defined as

$$S_{\alpha}(p) = \frac{k}{\alpha - 1} \left( 1 - \sum_{i} p_i^{\alpha} \right),$$

where $k$ is a positive constant that depends on the particular units used. For simplicity, we set $k = 1$ and fix the order $\alpha$ following [15]. Note that both Renyi and Tsallis entropies reduce to the Shannon entropy as a special case when $\alpha \to 1$. To construct the Renyi entropy-based belief function and the Tsallis entropy-based belief function, we replace the Shannon entropy in the relative plausibility (equation (18)) by the Renyi and Tsallis entropies, respectively.
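A small sketch of the three entropy measures follows (with $k = 1$ in the Tsallis formula, as above); numerically, both generalized entropies approach Shannon's as the order tends to 1:

```python
# A minimal sketch of the Shannon, Renyi, and Tsallis entropies; the limit
# order -> 1 recovers Shannon's entropy for the two generalized measures.
import numpy as np

def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def renyi(p, alpha):
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def tsallis(p, alpha, k=1.0):
    return k * (1.0 - np.sum(p ** alpha)) / (alpha - 1.0)

p = np.array([0.2, 0.3, 0.5])
print(shannon(p), renyi(p, 1.0001), tsallis(p, 1.0001))  # nearly equal
```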

The comparison of these three entropies is shown in the last three columns of Table 3, and it is evident that the Shannon entropy-based belief function still delivers the best prediction performance on the simulated data. However, the performances of the three entropies do not differ much in terms of MSE.

Example 2. Application to predicting GDP growth by the AR model.
In this example, real data are used to investigate the performance of our method. The dataset, derived from the Thomson Reuters Datastream and World Bank databases, consists of yearly Gross Domestic Product (GDP) data from the end of 1995 to 2018. The series is transformed into the growth rate and plotted in Figure 2. We then perform the Augmented Dickey–Fuller (ADF) unit root test on GDP growth and determine the lag order of the AR model according to the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The ADF statistic is −4.337, which exceeds the critical value at the 10% significance level in absolute value, meaning that GDP growth is stationary. The lag selection for the AR model is shown in Table 4. The result shows that the AR model with lag 1 provides the lowest AIC and BIC.
To evaluate the prediction performance, the forecast evaluation period runs from 2014 to 2018. We perform recursive forecasting, updating GDP growth year by year over 5 years in total, which gives us enough data points to evaluate the out-of-sample forecast performance. For this evaluation period, we compute the root mean square error (RMSE) and mean absolute error (MAE) of the forecasts at each horizon (from one to five years ahead), as follows (a small sketch is given after the formulas):

$$RMSE = \sqrt{\frac{1}{h}\sum_{i=1}^{h} \left( y_i - \hat y_i \right)^2}, \qquad MAE = \frac{1}{h}\sum_{i=1}^{h} \left| y_i - \hat y_i \right|,$$

where $h = 5$ and $\hat y_i$ is the forecast of $y_i$.
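For concreteness, the two criteria can be computed as in this minimal sketch:

```python
# A minimal sketch of the two evaluation criteria over the h = 5
# out-of-sample years.
import numpy as np

def rmse(actual, forecast):
    d = np.asarray(actual) - np.asarray(forecast)
    return np.sqrt(np.mean(d ** 2))

def mae(actual, forecast):
    d = np.asarray(actual) - np.asarray(forecast)
    return np.mean(np.abs(d))
```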
The out-of-sample forecast comparison of the AR(1) model under the likelihood-based approach, the entropy-based approach, and the conventional frequentist method is reported in Table 5. The out-of-sample forecasts made for the 2014–2018 period are summarized by the most plausible prediction values and their upper-lower prediction intervals. The true values are shown in the first column. The estimated conditional expectation and the frequentist 95% prediction intervals are shown in column 2. Columns 3–7 show the most plausible predictions and quantile intervals for the normal, Student-t, and skewed Student-t likelihood-based belief functions and the three entropy-based belief functions, respectively.
We can now consider the performance of different methods. First, we evaluate the entropy-based method regarding the prediction value. Second, we compare the performance regarding the range of the prediction intervals.
In the first evaluation, the smaller the MAE and RMSE, the better the corresponding model's performance. According to Table 5, all three entropy-based belief functions outperform the likelihood-based methods: the RMSE and MAE values of the entropy-based belief functions are somewhat smaller than those of the likelihood-based belief functions. Again, the Shannon entropy provides lower RMSE and MAE than the other two entropies. We expect that if the assumed likelihood does not correctly match the distribution of the real data, a less accurate prediction is obtained. In addition, the small sample size may affect the accuracy of the estimated parameters. Hence, in this study, we conclude that the entropy-based belief function for the AR(1) model performs well in the out-of-sample forecasts relative to the other competing methods.
In the second evaluation, concerning the prediction intervals for Thai GDP growth, the methods suggest reliable prediction intervals around the predicted value. All prediction interval methods for the AR(1) model are reliable, in the sense that the true values lie within the intervals, except for the 95% frequentist approach: its prediction interval for Thai GDP growth in 2014 does not cover the true value. This result reveals that the conventional frequentist approach may not provide appropriate prediction intervals.
Regarding the widths of the prediction intervals, we calculate the average interval for comparison. The results show that clear differences between the methods exist: the frequentist method has a very narrow mean interval, while our entropy-based method produces the widest ones. We suspect that this is because our approach relies on the empirical distribution of the errors, which leads to larger simulated errors in the Monte Carlo algorithm used to approximate the predictive entropy-based belief function. However, this experiment shows that our method still produces an acceptable range for the prediction intervals, as the true values always lie within them.
We now use the entropy-based belief function method to predict Thai economic growth. This prediction is relevant to the current policy debate on whether Thai GDP growth will go up or down. The forecast made for 2019 is based on scant data, which is reflected by wide upper-lower cdf intervals (Figure 3). Figure 3 presents the lower and upper cumulative distribution functions (cdfs) for the prediction problem, obtained from the belief function (black solid line) and the plausibility function (pink solid line). We also plot the vertical blue dotted line corresponding to the most plausible value, together with the quantile intervals, for GDP growth in 2019. The predicted growth rate is around 0.03961 (3.961%) in 2019, and the 95% prediction interval ranges from −0.0391 to 0.0558 (pink dashed line) and from 0.0630 to 0.1124 (black dashed line).
Furthermore, to illustrate the accuracy of our estimation, we show contour plots of the plausibility in two-dimensional parameter subspaces for each method in the Appendix. These contour plots show two-dimensional slices of the contour function, where one of the three parameters is fixed at its maximum normal-likelihood estimate, two of the four parameters are fixed at their maximum Student-t likelihood estimates, three of the five parameters are fixed at their maximum skewed Student-t likelihood estimates, and all but two of the probability parameters are fixed at their maximum entropy estimates. These contour plots show that the estimated parameters of each method reach the maximum plausibility, as marked by the red cross. Considering the contour plots for our proposed entropy-based belief functions, we can summarize the optimization graphically on a contour plot of the plausibility: the estimated probabilities reach a maximum plausibility at the red cross.

3.2. The Entropy-Based Belief Function in the Regression Problem
3.2.1. Regression Problem

Consider the simple linear regression model:

$$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t,$$

where $y_t$ is the observed dependent variable at time $t$, $x_t$ is an independent variable at time $t$, and $\varepsilon_t$ is the error term, which is not assumed to follow any distribution when the entropy-based approach is used.

3.2.2. Prediction with Certain Inputs

Similar to the AR model, the prediction with linear regression can be computed using the method described in Section 3.1.1. Let $y_{T+1}$ be the future value, which cannot be observed at time $T$, and let $x_{T+1}$ be the observed inputs or covariates at time $T + 1$:

$$y_{T+1} = \beta_0 + \beta_1 x_{T+1} + \varepsilon_{T+1}, \qquad (45)$$

where $y_{T+1}$ is a predicted value which is not yet observed and $x_{T+1}$ is the observed value at time $T + 1$. We can write, equivalently,

$$y_{T+1} = \beta_0 + \beta_1 x_{T+1} + F_{\hat\varepsilon}^{-1}(U). \qquad (46)$$

Under the entropy approach, we rewrite equations (45)-(46) as

$$y_{T+1} = \sum_{m=1}^{M} z_m p_{0m} + \left( \sum_{m=1}^{M} z_m p_{1m} \right) x_{T+1} + F_{\hat\varepsilon}^{-1}(U). \qquad (47)$$

To forecast the future observation conditional on all the information in the observed inputs at time $T + 1$, the predictive belief and plausibility functions can then be approximated using Monte Carlo simulation as described in Section 3.1.1. We then compute the maximum and minimum of the linear regression prediction subject to $pl(p, w) \geq u$ for the maximum and minimum problems, respectively. The additional constraints are the adding-up and model conditions in equations (20)-(21).

Example 3. As an example, we illustrate the proposed method by applying it to data from the operation of a plant for the oxidation of ammonia to nitric acid, measured on 21 consecutive days. The data were analyzed by Brownlee [16] in a simple regression setting and are available within the R software. The dataset consists of 4 variables, namely, air flow to the plant (AIR), cooling water inlet temperature (WATER), acid concentration (percentage minus 50, times 10) (ACID), and ammonia lost (percentage times 10) (LOSS). We take LOSS as the dependent variable and AIR, WATER, and ACID as independent variables. Thus, we consider the following linear regression model:

$$LOSS_t = \beta_0 + \beta_1 AIR_t + \beta_2 WATER_t + \beta_3 ACID_t + \varepsilon_t.$$

Using Maximum Likelihood (ML) and GME estimation, we obtain the estimated parameters, standard errors, and test values. Table 6 provides the estimated results of our ML and GME estimates, together with some basic statistics (standard errors and test values) and the plausibilities (see [7]). However, in the case of GME estimation, the computation of the plausibilities is different. Recall that, in the entropy approach, the estimated parameter of the model cannot be obtained directly from the estimation but is computed as the sum over all expected support points, $\beta_k = \sum_{m} z_{km} p_{km}$. Thus, it is reasonable to test the hypothesis $\beta_k = 0$. We know that if $\sum_{m} z_{km} p_{km} = 0$, then $\beta_k = 0$. Hence, we turn our attention to testing $pl(\beta_k = 0)$. To compute this, we maximize the entropy under the usual constraints and the additional constraint $\sum_{m} z_{km} p_{km} = 0$; let $p^{0}_{km}$ denote the resulting optimal probability of coefficient $k$ at support point $m$. We can then compute $pl(\beta_k = 0)$ as the ratio of this constrained maximum entropy to the unconstrained one, as in equation (18). In this test, we check whether $pl(\beta_k = 0)$ is less than a chosen threshold. If $pl(\beta_k = 0)$ is small (here, below 0.10), the hypothesis that the coefficient equals zero is rejected, indicating that the coefficient is significant.
Figure 4 shows the marginal contour function for the parameter ACID obtained from the likelihood-based approach, and we can observe that $pl(\beta_{ACID} = 0)$ is greater than 0.10, indicating that the acid concentration variable has an insignificant effect on ammonia loss. In Figure 5, the marginal contour functions obtained from the entropy-based approach are plotted for the parameters AIR, WATER, and ACID. We can observe that $pl(\beta_{ACID} = 0) > 0.10$ there as well, while the corresponding plausibilities for AIR and WATER fall below the threshold. This confirms that there is no significant effect of ACID on LOSS.
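A hedged sketch of this plausibility-based significance test follows, reusing the `max_entropy` helper sketched in Section 2.3 (the function names are ours); the 0.10 threshold follows the text:

```python
# A sketch of the plausibility test of H0: beta_k = 0.  Requires the
# max_entropy sketch from Section 2.3; NaN entries leave coefficients free.
import numpy as np

def pl_beta_zero(k, y, X, z, v):
    """pl(beta_k = 0): constrained over unconstrained maximum entropy."""
    K = X.shape[1]
    pin = np.full(K, np.nan)          # NaN = coefficient left free
    pin[k] = 0.0                      # pin only the k-th coefficient at zero
    return (max_entropy(y, X, z, v, beta_fixed=pin)
            / max_entropy(y, X, z, v))

def is_significant(k, y, X, z, v, threshold=0.10):
    # reject H0 (coefficient significant) when its plausibility is small
    return pl_beta_zero(k, y, X, z, v) < threshold
```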

Example 4. Continuing Example 3, we consider the prediction problem. We take the ammonia lost as the dependent variable and only AIR and WATER as the inputs, since they show a significant effect on the LOSS variable. We re-estimate the model, and the results are provided in Table 7 and Figure 6. Similar significance results for the estimated parameters are obtained for all methods. We also make a prediction of LOSS when the inputs are known at time $T + 1$; hence, the prediction equation is based on

$$LOSS_{T+1} = \beta_0 + \beta_1 AIR_{T+1} + \beta_2 WATER_{T+1} + \varepsilon_{T+1},$$

with $\varepsilon_{T+1}$ assumed to follow the empirical distribution of the residuals. We can write, equivalently,

$$LOSS_{T+1} = \sum_{m} z_m p_{0m} + \left( \sum_{m} z_m p_{1m} \right) AIR_{T+1} + \left( \sum_{m} z_m p_{2m} \right) WATER_{T+1} + F_{\hat\varepsilon}^{-1}(U).$$

The predictive belief function on $LOSS_{T+1}$ can then be approximated using the methods described in Section 3.2.2. Figure 6 displays the lower and upper cdfs of the various predictive belief functions, approximated using N = 5000 randomly generated focal sets. The prediction value of $LOSS_{T+1}$ is plotted as the vertical blue dotted line, together with the quantile intervals. We then compare the prediction performance based on the prediction bias. The true value of $LOSS_{T+1}$ is 15; thus, we calculate the prediction bias as the difference between the predicted and true values. According to the last row of Table 7, the Shannon entropy-based belief function outperforms the others, as the lowest bias is obtained. Moreover, the prediction interval of the Shannon entropy method is reliable: although the 0.05 quantile interval does not cover the true value, the lower bound of its interval is the closest to the true value of 15. Therefore, to confirm the reliability of the method, we extend the lower quantile interval of the Shannon entropy-based method to the 0.01 level and observe that the true observed value is contained in the 0.01 lower quantile interval (green dashed line).

4. Conclusion

In recent years, forecasting has become an important tool for decision-making and strategic planning, so describing the uncertainty of forecasts accurately is a very important issue. The approach developed in this paper models estimation uncertainty using a belief function. This belief function is just another piece of evidence: it can be combined with other belief functions by Dempster's rule, and the combined belief function is then marginalized to obtain the upper and lower predictive belief functions.

As the distribution function is generally unknown, we have a concern that estimation and prediction problems solved in the likelihood-based belief function framework might be biased if the observed data are not normally distributed. Therefore, this study takes advantage of the entropy approach to construct a belief function and illustrates these solutions in the autoregressive (AR) and regression contexts. Specifically, we replace the parametric likelihood function with an entropy measure (Shannon, Renyi, or Tsallis) to derive the entropy-based belief function, and thus model misspecification is avoided since no distributional assumption is made.

To validate the performance of our proposed method, a simulation study is conducted. The results reveal that our method performs remarkably well in comparison with a host of competing methods. We would like to highlight that if the distribution is known and the model is estimated under the true distribution, an accurate result is obtained. However, as the error distribution is often unknown in practice, these results confirm the advantage of the Shannon entropy-based belief function, which performs uniformly well over a wide range of error distributions.

Furthermore, our entropy-based belief function is also applied to real data. In the first real application, we conduct out-of-sample forecasts for the years 2014 to 2018, with the results displayed in Table 5. We find that the entropy-based belief function for the AR(1) model exhibits good out-of-sample performance relative to the other competing methods. We then apply our method to predict the growth rate of GDP in 2019 and find that the growth rate of Thai GDP is around 3.961%. In the second real application, we apply our method to predict a future value using linear regression with certain inputs. The estimates under the likelihood-based and entropy-based methods exhibit slight differences, as expected. The forecasting bias of our entropy-based belief functions is lower than that of the likelihood-based ones, in particular for the Shannon entropy. This suggests that the entropy-based belief function produces more accurate predictions in the regression context as well. From these results, it is encouraging that the entropy-based belief function offers a better alternative for the prediction problem.

In brief, this study proposes the entropy-based belief function as an alternative to likelihood-based predictive distributions. The method is very general and can be used without assuming any parametric distribution. However, as the computational complexity is quite high, we may face higher computational costs with large datasets. We leave improving the computational performance of our method to future work. In addition, to make more accurate predictions, this approach may be extended to combine a random interval encoding expert assessments of the prediction from several government agencies, investors, business enterprises, and private/public institutions. Furthermore, the forecasting performance of the entropy-based belief function could be improved by increasing the order in the Renyi and Tsallis entropies; we therefore suggest that future studies consider higher orders of these entropies. Finally, it would be worthwhile to compare our approach with the posterior distribution-based belief function in the Bayesian context and with the Deng entropy-based belief function to further confirm the performance of our entropy-based method.

Appendix

A. Out-of-Sample Prediction: Optimization Evaluation

A.1. Normal Likelihood-Based Belief Function Contour Plot

The contour plot of the marginal contour functions of the normal likelihood-based belief function in two-dimensional parameter subspaces is shown in Figure 7.

A.2. Student-t Likelihood-Based Belief Function Contour Plot

The contour plot of the marginal contour functions of the Student-t likelihood-based belief function in two-dimensional parameter subspaces is shown in Figure 8.

A.3. Skewed Student-t Likelihood-Based Belief Function Contour Plot

The contour plot of the marginal contour functions of the skewed Student-t likelihood-based belief function in two-dimensional parameter subspaces is shown in Figure 9.

Data Availability

In this study, we use simulated data to show the performance of our model, and the simulation processes are explained in the paper. For the real data analysis section, the data can be freely collected from Thomson Reuters Datastream and the R package. The data are also available from the corresponding author upon request ([email protected]).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the CMU Junior Fellowship Program. It was also supported by the Center of Excellence in Econometrics, Faculty of Economics, Chiang Mai University. The authors thank Prof. Thierry Denœux and three reviewers for valuable comments that improved this paper.