Abstract

This work compares Autometrics with two penalization techniques, namely, the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD), under asymmetric error distributions such as the exponential, gamma, and Frechet, with varying sample sizes and numbers of predictors. Comprehensive simulations, based on a wide variety of scenarios, reveal that all considered methods improve as the sample size increases. In the case of low multicollinearity, these methods perform well in terms of potency, but in terms of gauge, the shrinkage methods collapse, and the higher gauge leads to overspecification of the models. High levels of multicollinearity adversely affect the performance of Autometrics. In contrast, the shrinkage methods are robust to high multicollinearity in terms of potency, but they tend to select a massive set of irrelevant variables. Moreover, we find that expanding the data rapidly mitigates the adverse impact of high multicollinearity on Autometrics and gradually corrects the gauge of the shrinkage methods. For the empirical application, we take the gold prices data spanning from 1981 to 2020. To compare the forecasting performance of all selected methods, we divide the data into two parts: data over 1981–2010 are taken as training data, and those over 2011–2020 are used as testing data. All methods are trained on the training data and then assessed on the testing data. Based on the root-mean-square error and mean absolute error, Autometrics remains the best at capturing the gold prices trend and produces better forecasts than MCP and SCAD.

1. Introduction

In regression analysis, the core concern of researchers is to discover the key predictors for achieving better prediction of the response variable. Identifying the potential predictors is therefore very beneficial both for knowledge discovery and for boosting the predictive power of the model [1]. Hence, variable selection is one of the most vital steps in constructing a linear regression model. In practice, a large number of predictors can raise the variance of the fitted model, while an improper selection of predictors may result in unreliable or biased results. In other words, incorporating more predictors in the model may cause high variation in the least-squares fit, which in turn overfits the model and hence yields poor forecasts for the future [2]. Furthermore, if the predictors are highly correlated with each other, then the standard error associated with each regression coefficient tends to increase, which leads to invalid inferences [3–5]. On the other hand, missing a single important predictor may lead to model mis-specification, and the conclusions drawn on the basis of such a model could be misleading [6].

In the recent era, a considerable chunk of research has focused on the analysis of “high-dimensional” data in the disciplines of finance and economics. Consequently, considerable attention is being paid to the variety of techniques applicable in the domains of data mining, dimension reduction, and machine learning [7, 8]. Among them, penalization techniques and Autometrics are very popular for handling huge data sets [9].

Many studies exist in the literature in which the performance of Autometrics is evaluated theoretically as well as empirically; see, for example, [9–16]. Similarly, many researchers have evaluated penalization techniques in a time series setup, such as Mol et al. [17], Inoue and Kilian [18], Bai and Ng [19], Kim and Swanson [20, 21], Luciani [22], Swanson and Xiong [8, 23], Swanson et al. [24], and Maehashi and Shintani [25].

In the above papers, penalization techniques are often compared with each other, and only a few papers have compared Autometrics with penalization techniques such as the least absolute shrinkage and selection operator (Lasso), adaptive Lasso, and weighted adaptive Lasso. To date, no paper has considered the modified forms of penalization techniques in our context. Hence, this study contributes in two dimensions. Firstly, we consider two modified penalization techniques, the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD), and compare them with Autometrics theoretically as well as empirically. Secondly, the comparison is made under asymmetric error distributions instead of the Gaussian.

Our study aims to compare Autometrics with improved penalization techniques, namely, smoothly clipped absolute deviation and the minimax concave penalty, under several asymmetric error distributions (exponential, gamma, and Frechet) through Monte Carlo simulations. Moreover, we alter the sample size, the number of predictors, and the magnitude of multicollinearity in order to determine their effect on the considered techniques. For the real-world analysis, we consider a financial data set.

The remaining part of the work is organized as follows. Section 2 elaborates on the model selection techniques and the data-generating process. Monte Carlo evidence on the comparative performance of the various model selection methods is discussed in Section 3. The real data application is described in Section 4. Section 5 gives the concluding remarks.

2. Model Selection Techniques

Model selection is one of the crucial steps of empirical research across all disciplines where prior theory does not predefine a complete and correct specification. Economics is certainly one of them, as macroeconomic processes are typically high-dimensional, nonstationary, and complicated [26]. Commonly, many different specifications can be recommended to fit the data. Hence, statistical model selection becomes a primary and ubiquitous task in empirical economic research.

Selection procedures such as information criteria, stepwise regression, and penalized regression are unavoidable. There can never be a consensus about which model is best, because there is a considerable number of criteria for assessing a model’s performance. Luckily, during the past two decades, a revolution in model building has taken place in the form of general-to-specific modeling, denoted gets, as implemented in the computer program PcGive. Computer automation of gets methods has shed new light on statistical model selection.

PcGive is a computer program that selects an econometric model automatically. It offers a genuinely new approach to formulating models and is particularly devised for handling economic data when the correct form of the equation under analysis is unknown. In PcGive, the automatic model selection job is performed by Autometrics. Hence, in the next subsection, we provide a detailed explanation of Autometrics.

2.1. Autometrics

The automated gets procedure can almost be considered a “black box”: a final model is selected from an initial model constructed from a set of candidate variables. The initial model is referred to as the general unrestricted model (GUM). Often, a set of terminal candidate models is found; in such circumstances, information criteria are utilized as the tiebreaker. It is also possible that a final GUM, which is the union of the terminal candidate models, is chosen in the block-search procedure.

The starting point of the automated gets procedure is that the GUM is statistically well specified, which is checked by mis-specification testing. Thereafter, diagnostic tests guarantee that all underlying terminal candidate models clear these tests as well. Simplification of the GUM is done via a path search. Such a search is needed to tackle the complex autocorrelation that is often present in macroeconomic data. A simplification is acceptable provided the expelled variables are insignificant and the new model is a valid reduction of the GUM. The latter condition is also known as encompassing the GUM or backtesting and, in the context of linear regression models, is based on the F-test of the removed variables.

In the application of Autometrics, the reduction p-value is the principal choice to be made, used both for backtesting and for judging individual coefficient significance. There are several devices to avoid estimating some models [27]. The method is very efficient: even though the costs of statistical inference cannot be circumvented, the costs of search are kept substantially low. Two automatic model selection frameworks that do not fall within the general-to-specific (gets) methodology are as follows:
(1) Stepwise regression: start with the empty model and add the most significant omitted variable. At any stage, the most insignificant variable currently in the model is removed. Hence, in every iteration, we include one significant variable and discard an insignificant one [28]. This process is repeated until all variables in the model are significant and all omitted variables are insignificant.
(2) Backward elimination: all predictors are entered into the initial model; then predictors are removed one at a time, starting from the least significant. The process continues until all remaining predictors have a p-value of α or smaller (a minimal sketch of this procedure is given after the list).
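To make the contrast with automated gets concrete, the sketch referenced in item (2) is given here: a minimal backward-elimination routine, assuming a pandas DataFrame `X` of candidate predictors and a response vector `y`; the 0.05 threshold is an illustrative choice rather than a value taken from the paper.

```python
# A minimal backward-elimination sketch (kept separate from Autometrics itself).
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X: pd.DataFrame, y, alpha: float = 0.05) -> pd.DataFrame:
    """Drop the least significant predictor until every p-value is <= alpha."""
    Xc = X.copy()
    while Xc.shape[1] > 0:
        fit = sm.OLS(y, sm.add_constant(Xc)).fit()
        pvals = fit.pvalues.drop("const")     # ignore the intercept
        worst = pvals.idxmax()                # least significant remaining predictor
        if pvals[worst] <= alpha:             # every predictor is already significant
            break
        Xc = Xc.drop(columns=[worst])         # remove it and refit
    return Xc
```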

There are three main differences from automated gets: (i) lack of search, (ii) no backtesting, and (iii) no mis-specification testing/diagnostic tracking. Figure 1 describes how Autometrics selects the model automatically.

2.1.1. Methodology

Autometrics comprises the following five basic stages:
(i) In the first stage, the linear model known as the general unrestricted model (GUM) is formed.
(ii) In the second stage, the parameters are estimated, and the statistical significance of the GUM is tested.
(iii) In the third stage, the presearch process is performed.
(iv) In the fourth stage, the tree-path search is carried out.
(v) In the last stage, the final model is selected.

Doornik [27] elaborated the entire algorithm of Autometrics; the steps to run Autometrics are as follows. Start by considering all candidate variables in a linear model (the GUM), estimate it by the least-squares method, and then verify it through diagnostic tests. If there are insignificant coefficients, simpler models are estimated using a tree-path reduction search and validated by diagnostic tests. If several terminal models are detected, Autometrics tests their union. Rejected models are deleted, and the union of the terminal models that survive forms a new GUM for another tree-path search iteration. This inspection procedure continues, and the terminal models are statistically assessed against their union. If two or more terminal models clear the encompassing tests, then the prechosen information criterion is the gateway to the final decision. A drastically simplified single-path sketch of this reduce-and-backtest loop is given below.
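The following is a deliberately simplified, single-path illustration of the loop described above: it removes the least significant variable only if the reduced model still encompasses the GUM under an F-test on the removed variables. Actual Autometrics performs a multi-path tree search with presearch and diagnostic tracking, none of which is reproduced here; the function name and the 0.05 target are illustrative assumptions.

```python
# Single-path gets sketch: reduce, then backtest the reduction against the GUM.
import pandas as pd
import statsmodels.api as sm

def single_path_gets(X: pd.DataFrame, y, alpha: float = 0.05) -> pd.DataFrame:
    gum = sm.OLS(y, sm.add_constant(X)).fit()            # general unrestricted model
    keep = list(X.columns)
    while keep:
        fit = sm.OLS(y, sm.add_constant(X[keep])).fit()
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:                        # all retained variables significant
            break
        candidate = [c for c in keep if c != worst]
        if not candidate:                                # do not reduce to the empty model
            break
        reduced = sm.OLS(y, sm.add_constant(X[candidate])).fit()
        _, p_val, _ = gum.compare_f_test(reduced)        # backtest: F-test of removed variables
        if p_val < alpha:                                # reduction rejected against the GUM
            break
        keep = candidate
    return X[keep]
```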

2.2. Shrinkage Methods

One of the assumptions of the classical linear regression model is that there is no association among the covariates, which often does not hold in practice. Violation of this assumption is known as the problem of multicollinearity. In the presence of multicollinearity, it is a challenging task to estimate the reliable effect of a specific covariate. More specifically, the estimated coefficients have high sampling variance along with possibly false signs, due to which both estimation and prediction are poorly affected.

An alternative and widely used family of methods for dealing with many features is regularized/penalized regression, which includes many procedures; our study selects two of the most well-known and robust ones: the minimax concave penalty and smoothly clipped absolute deviation. A form of the regularized least-squares estimator is the minimizer of the objective function

$$\hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}}\left\{\frac{1}{2m}\sum_{i=1}^{m}\big(y_{i}-\mathbf{x}_{i}^{\top}\boldsymbol{\beta}\big)^{2}+\sum_{j=1}^{k}P_{\lambda}\big(|\beta_{j}|\big)\right\},\tag{1}$$

where $y_{i}$ is the response, $\mathbf{x}_{i}$ is the vector of covariates for observation $i$, and $\boldsymbol{\beta}=(\beta_{1},\ldots,\beta_{k})^{\top}$ is the coefficient vector. Here, $k$ and $m$ denote the number of covariates and observations, respectively. The second term in equation (1) represents the penalty function, which adopts different shapes for different procedures. The term $\lambda$ refers to the tuning parameter that controls the amount of shrinkage; its range lies between zero and infinity.

We provide a brief discussion of the following methods (a NumPy sketch of the SCAD and MCP penalty functions follows the list):

Least Absolute Shrinkage and Selection Operator (Lasso): the $\ell_{1}$ penalty $P_{\lambda}(|\beta_{j}|)=\lambda|\beta_{j}|$ leads to the Lasso estimator, where $\lambda$ refers to the tuning parameter and is selected through cross-validation [29]. The $\ell_{1}$ norm shrinks several regression coefficients exactly to zero, retaining only the relevant predictors. If there is a high correlation among a group of predictors, Lasso keeps only one predictor from the group. In addition, Lasso is biased in feature selection [30].

Smoothly Clipped Absolute Deviation (SCAD): the continuous, differentiable penalty function can be defined as

$$P_{\lambda,a}\big(|\beta_{j}|\big)=\begin{cases}\lambda|\beta_{j}|, & |\beta_{j}|\le\lambda,\\[4pt] \dfrac{2a\lambda|\beta_{j}|-\beta_{j}^{2}-\lambda^{2}}{2(a-1)}, & \lambda<|\beta_{j}|\le a\lambda,\\[4pt] \dfrac{(a+1)\lambda^{2}}{2}, & |\beta_{j}|>a\lambda,\end{cases}$$

for $a>2$ and $\lambda>0$ [31], with $a=3.7$ as recommended by Lu et al. [32].

Minimax Concave Penalty (MCP): the minimax concave penalty is

$$P_{\lambda,\gamma}\big(|\beta_{j}|\big)=\lambda\int_{0}^{|\beta_{j}|}\left(1-\frac{x}{\gamma\lambda}\right)_{+}\mathrm{d}x,$$

where the value of $\gamma$ is 3.7. This penalty substantially preserves the convexity of the penalized loss in sparse regions, given certain thresholds, while providing variable selection and near-unbiasedness [33].
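For reference, a minimal NumPy sketch of the SCAD and MCP penalty functions defined above is given below; the defaults a = 3.7 and γ = 3.7 mirror the values quoted in the text, while λ would normally be chosen by cross-validation.

```python
# Element-wise SCAD and MCP penalty values for a vector of coefficients.
import numpy as np

def scad_penalty(beta, lambda_, a=3.7):
    """SCAD penalty: linear near zero, quadratic blend, then constant."""
    b = np.abs(beta)
    small = lambda_ * b
    mid = (2 * a * lambda_ * b - b**2 - lambda_**2) / (2 * (a - 1))
    large = (a + 1) * lambda_**2 / 2
    return np.where(b <= lambda_, small, np.where(b <= a * lambda_, mid, large))

def mcp_penalty(beta, lambda_, gamma=3.7):
    """MCP: quadratically tapered penalty that flattens beyond gamma * lambda."""
    b = np.abs(beta)
    inner = lambda_ * b - b**2 / (2 * gamma)
    flat = gamma * lambda_**2 / 2
    return np.where(b <= gamma * lambda_, inner, flat)
```

Plugging either function in as the second term of equation (1) yields the corresponding penalized least-squares objective.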

2.3. Selection of Tuning Parameter

The tuning parameter λ is often selected using a cross-validation approach aimed at achieving the optimal prediction solution. It entails splitting the given data at random into two parts: a training data set and a validation data set (or holdout set). The training data set is used to fit the model, and the fitted model is used to predict the responses for the validation set. The validation set error rate, commonly calculated using the MSE in the case of a numerical response, provides an estimate of the test error rate. The k-fold cross-validation method involves randomly splitting the data into k groups, or folds, of roughly similar size; usually, we use k equal to 5 or 10. The first fold serves as a validation set, and the method is fitted on the remaining k − 1 folds. The mean squared error, MSE1, is calculated on the observations in the held-out fold. This procedure is repeated k times, with a different fold serving as the validation set each time, yielding the test error estimates MSE1, MSE2, …, MSEk. Averaging these values yields the k-fold CV estimate. A sketch of this search over a grid of λ values is given below.
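A minimal sketch of this grid search over λ is shown below. Since scikit-learn does not ship SCAD or MCP, the Lasso is used here purely as a stand-in penalized estimator to illustrate the cross-validation mechanics; in R, the analogous search for SCAD/MCP is provided by cv.ncvreg in the ncvreg package.

```python
# k-fold cross-validation over a grid of tuning parameters (X, y are NumPy arrays).
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    cv_mse = []
    for lam in lambdas:
        fold_mse = []
        for train_idx, test_idx in kf.split(X):
            model = Lasso(alpha=lam).fit(X[train_idx], y[train_idx])
            pred = model.predict(X[test_idx])
            fold_mse.append(mean_squared_error(y[test_idx], pred))
        cv_mse.append(np.mean(fold_mse))      # k-fold CV estimate for this lambda
    return lambdas[int(np.argmin(cv_mse))]    # lambda with the smallest CV error
```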

2.4. Artificial Data-Generating Process

In this section, we introduce scenarios intended to demonstrate the performance of Autometrics against the shrinkage methods delineated in the previous subsections. We consider two types of correlation structure among the covariates, that is, low (0.25) and high (0.90), while varying the distribution of the error terms. Our study follows the data-generating process of Doornik and Hendry [13] and Wahid et al. [34] to generate artificial data as

$$y_{i}=\mathbf{x}_{i}^{\top}\boldsymbol{\beta}+\varepsilon_{i},\qquad i=1,\ldots,m,$$

where $y_{i}$ is the response variable. The set of covariates, $\mathbf{x}_{i}=(x_{i1},\ldots,x_{ik})^{\top}$, is generated from a multivariate normal distribution as $\mathbf{x}_{i}\sim N_{k}(\mathbf{0},\Sigma)$, where the mean of the covariates is zero and $\Sigma$ is the variance-covariance matrix. The variance-covariance matrix contains the variances and covariances together. In our case, the variance is assumed to be one, and the covariance between $x_{m}$ and $x_{n}$ is generated as [34]

$$\operatorname{cov}(x_{m},x_{n})=\rho^{|m-n|},$$

which permits regulation of the degree of pairwise correlation between covariates $m$ and $n$ by altering the single parameter $\rho$. Furthermore, $\boldsymbol{\beta}$ represents the regression coefficients, and $\varepsilon_{i}$ is the disturbance term, which is generated from three asymmetric probability distributions in this study: the exponential, gamma, and Frechet distributions.
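The following NumPy sketch generates data from this DGP. The Toeplitz correlation cov(x_m, x_n) = ρ^{|m−n|} follows the reconstruction above, while the unit coefficients on the 15 relevant predictors and the error-distribution parameters are illustrative assumptions, since the exact values are not reproduced here.

```python
# Sketch of the artificial data-generating process under asymmetric errors.
import numpy as np

def simulate(n=100, k=30, k_rel=15, rho=0.25, error="exponential", seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(k)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # unit variances, cov = rho**|m-n|
    X = rng.multivariate_normal(np.zeros(k), sigma, size=n)
    beta = np.concatenate([np.ones(k_rel), np.zeros(k - k_rel)])  # 15 relevant predictors
    if error == "exponential":
        eps = rng.exponential(scale=1.0, size=n)
    elif error == "gamma":
        eps = rng.gamma(shape=2.0, scale=1.0, size=n)
    else:                                                 # Frechet via inverse transform
        eps = (-np.log(rng.uniform(size=n))) ** (-1.0 / 3.0)
    y = X @ beta + eps
    return X, y, beta
```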

The reasons behind selecting these distributions from the large set of available ones are as follows: the exponential distribution is the standard benchmark in the literature on asymmetric distributions, while the Frechet distribution, also known as the inverse Weibull distribution, and the gamma distribution are generalized forms of the exponential distribution. Moreover, the distribution of financial data is mostly right-skewed [35].

For our study, we consider the following three asymmetric probability distributions: (i) the exponential distribution, (ii) the gamma distribution, and (iii) the Frechet distribution.

2.4.1. Scenario 1

We perform simulation experiments considering three cases for the number of covariates. In each experiment, we assume that 15 predictors are relevant and the remaining are irrelevant.

We consider two cases of sample size. In this scenario, the errors of the model are generated from an exponential distribution.

2.4.2. Scenario 2

This scenario is the same as the first experiment, except that the errors are generated from a gamma distribution.

2.4.3. Scenario 3

This scenario is the same as the first experiment, except that the errors are generated from the Frechet distribution.

2.5. Measures of Methods Performance

There are several ways to evaluate the methods’ performance in terms of variable selection; we adopt potency and gauge. Gauge is the empirical null retention frequency, that is, how often irrelevant covariates are retained. The comparison of Autometrics with the penalization methods is therefore evaluated through the correct retention of the relevant (nonzero) covariates, interpreted as potency, and the incorrect retention of the irrelevant (zero) covariates, referred to as gauge [13].

Mathematically, the gauge is delineated as follows:

$$\text{gauge}=\frac{1}{M}\sum_{r=1}^{M}\frac{\#\big(\hat{\mathcal{I}}_{r}\big)}{\#\big(\mathcal{I}\big)}.$$

The gauge measures the retained irrelevance and, for a well-calibrated procedure, should correspond to the nominal significance level (α); here, $\mathcal{I}$ denotes the set of irrelevant covariates in the initial model, $\hat{\mathcal{I}}_{r}$ denotes the set of estimated (retained) irrelevant covariates in replication $r$, and $M$ is the number of replications [36].

Potency is defined as follows:

$$\text{potency}=\frac{1}{M}\sum_{r=1}^{M}\frac{\#\big(\hat{\mathcal{R}}_{r}\big)}{\#\big(\mathcal{R}\big)},$$

where $\mathcal{R}$ denotes the set of relevant covariates in the initial model and $\hat{\mathcal{R}}_{r}$ denotes the set of estimated (retained) relevant covariates in replication $r$; an expected potency tending towards 1 is evidence of a good method [36]. Furthermore, we repeat each simulation experiment $M = 1{,}000$ times, and the expected potency and gauge are used to compare the methods. We use R software for the entire analysis.
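Given a Boolean matrix of selection outcomes across the Monte Carlo replications, potency and gauge reduce to simple averages of retention rates, as in the sketch below; the array names are hypothetical.

```python
# Potency and gauge from a (replications x variables) Boolean selection matrix.
import numpy as np

def potency_gauge(selected: np.ndarray, relevant: np.ndarray):
    retention = selected.mean(axis=0)        # retention rate of each candidate variable
    potency = retention[relevant].mean()     # average retention of the relevant variables
    gauge = retention[~relevant].mean()      # average retention of the irrelevant variables
    return potency, gauge

# Example shapes: selected with shape (1000, 30); relevant = np.arange(30) < 15
```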

3. Simulation Results and Discussion

The results of the Monte Carlo experiments are illustrated in Tables 1–3.

Scenario I: Table 1 presents the simulation results for exponentially distributed errors, with varying sample sizes and numbers of covariates. All methods improve with increasing sample size. In the case of low multicollinearity, in almost all cases, Autometrics and the shrinkage methods (SCAD and MCP) retain all the relevant predictors, but the shrinkage methods also retain a large number of irrelevant predictors. Retaining irrelevant variables often leads to an overspecified model. When the level of multicollinearity is increased, Autometrics finds 61% of the relevant variables (potency) along with around 3% irrelevant variables (gauge), while the shrinkage methods retain more than 80% of the relevant variables with a much higher percentage of irrelevant variables. As we increase the sample size, the potency of Autometrics is dramatically enhanced, and its gauge also improves. The shrinkage methods improve in gauge, but it remains very high.

Scenario II: Table 2 presents the simulation results for gamma distributed errors, with varying sample sizes and numbers of covariates. All results improve with an expanding data window. In this scenario, all methods correctly specify the relevant variables in most cases, but the shrinkage methods retain some irrelevant variables. In other words, it can be concluded that the shrinkage methods overspecify the model.

Scenario III: Table 3 depicts the simulation findings for Frechet distributed errors, with varying sample sizes and numbers of covariates. The potency and gauge of almost all methods improve with increasing sample size. In the presence of low multicollinearity, all methods select 100% of the relevant variables under a large sample. Autometrics often selects around 1% irrelevant variables (gauge), while the shrinkage methods select a large proportion of irrelevant variables. When the level of multicollinearity is increased, all methods are adversely affected. Autometrics retains 72% of the relevant variables with approximately 3% irrelevant variables. On the other hand, the shrinkage methods retain more than 90% of the active variables along with a massive set of irrelevant variables. As we increase the number of observations, the potency of Autometrics improves and its gauge is reduced to 1%. The gauge of the shrinkage methods also improves, but it is still considered high.

Finally, comparing the potency and gauge of all methods across the error distributions, we can see that under gamma distributed errors the potency is higher and the gauge is lower than those achieved under exponentially and Frechet distributed errors.

4. Empirical Analysis

Complementing the Monte Carlo experiments, this study performs a real data analysis using a Pakistani financial data set. The data set consists of 12 time series observed at an annual frequency spanning 1981 to 2020 and is taken from the World Development Indicators, International Financial Statistics, the Yahoo! Finance website, and the International Country Risk Guide. Among the 12 variables, gold prices are the response variable, and the remaining variables are treated as predictors. The predictors are selected through theory and literature to form the general model, known as the general unrestricted model (GUM). Before the analysis, some missing observations in the data set are replaced by averaging the neighboring observations, and the data set is then standardized in order to reduce variation, which in turn provides stable results [37]. Table 4 describes the variables, their symbols, and the sources of the data.

From Figure 2, it can be observed that the frequency distribution of the target variable (in our case, gold prices) is right-skewed, and the boxplot in Figure 2(b) also reveals that some outlying observations are present in the series. However, Gujarati et al. [4] consider graphical representation an informal approach; therefore, to reconfirm the distribution of gold prices, we turn to a formal statistical test, namely, the Shapiro–Wilk test.

After applying the Shapiro–Wilk test, we obtain a p-value that is almost zero; thus, the null hypothesis that the data are normally distributed is rejected, implying that the distribution under consideration is highly skewed. Table 5 depicts the findings for the real data considering 11 covariates. Autometrics retains GDP, IR, UEMP, TO, SP, and REER, which reveals that these covariates contribute significantly to gold prices. MCP selects all covariates except inflation (INF) and the market rate (MR), and SCAD retains all covariates.

It is a fact that we do not know the data-generating process in the real world. Therefore, it is difficult to compare the models’ performance based on potency and gauge using real data. In such circumstances, the best and most widely used alternative is an out-of-sample forecast assessment, which requires dividing the data into two parts: a training set and a testing set. Thus, in this work, we split the data set into two parts: data from 1981 to 2010 are utilized to train the models, and the remaining data (2011–2020) are used to evaluate their forecasting performance. The root-mean-square error (RMSE) and mean absolute error (MAE) are computed to evaluate the forecasting performance of all considered methods, as shown in Figure 3. The smaller the values of RMSE and MAE, the closer the predicted values are to the actual values, indicating a better forecast. The forecast errors shown by the bars in Figure 3 suggest that Autometrics outperforms the rival methods in the out-of-sample forecast. This illustrates that Autometrics has better predictive power than the competing models in the sense that it attains the lowest prediction errors in the multistep-ahead forecast.
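For completeness, the two forecast accuracy measures can be computed as below; `y_test` and `y_pred` stand for hypothetical arrays holding the 2011–2020 actual and predicted gold prices.

```python
# Out-of-sample forecast accuracy: root-mean-square error and mean absolute error.
import numpy as np

def rmse(y_test, y_pred):
    return np.sqrt(np.mean((np.asarray(y_test) - np.asarray(y_pred)) ** 2))

def mae(y_test, y_pred):
    return np.mean(np.abs(np.asarray(y_test) - np.asarray(y_pred)))
```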

5. Concluding Remarks

In this work, we compare Autometrics with two penalization techniques, namely, the minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD), under asymmetric error distributions (exponential, gamma, and Frechet) with varying sample sizes and numbers of predictors. Simulations across a wide variety of scenarios demonstrate that all methods improve as the sample size grows. In the case of low multicollinearity, these methods perform well in terms of potency, but in terms of gauge, the shrinkage methods collapse: the higher gauge leads to overspecification of the model. An increased level of multicollinearity among the regressors adversely affects the potency of Autometrics and, to a lesser extent, that of the shrinkage methods. At the same time, the shrinkage methods select a massive set of irrelevant variables. We have observed that expanding the data window rapidly alleviates the detrimental influence of high multicollinearity on the potency of Autometrics and steadily rectifies the gauge of the penalized techniques.

For the real data analysis, we consider the gold prices data along with 11 covariates spanning 1981 to 2020. To compare the forecasting performance of the selected methods, we divide the data into two parts, that is, 1981–2010 as training data and 2011–2020 as testing data. The methods are trained on the training data, and their performance is assessed via the testing data. Based on RMSE and MAE, Autometrics remains the best at capturing the gold prices trend and provides better forecasts than MCP and SCAD. We observe that the penalization techniques retain many irrelevant covariates in comparison with Autometrics and hence tend to increase the forecast error.

Data Availability

Data can be provided upon special request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.