#### Abstract

This research compares factor models based on principal component analysis (PCA) and partial least squares (PLS) with Autometrics, the elastic smoothly clipped absolute deviation (E-SCAD), and the minimax concave penalty (MCP) under simulated schemes featuring multicollinearity, heteroscedasticity, and autocorrelation. The comparison is made with varying sample sizes and numbers of covariates. We find that under low and moderate multicollinearity, MCP often produces superior forecasts except in the small-sample case, where E-SCAD is better. Under high multicollinearity, the PLS-based factor model remains dominant, but as the sample grows, the prediction accuracy of E-SCAD improves significantly relative to the other methods. Under heteroscedasticity, MCP performs very well and usually beats the rival methods; in some large-sample settings, Autometrics provides forecasts similar to MCP. Under low and moderate autocorrelation, MCP shows outstanding forecasting performance except in the small-sample case, where E-SCAD produces remarkable forecasts. Under extreme autocorrelation, E-SCAD outperforms the rival techniques for both small and medium samples, but a further increase in sample size makes MCP forecasts comparatively more accurate. To compare the predictive ability of all methods on real data, we split the sample into two parts (i.e., data over 1973–2007 as training data and data over 2008–2020 as testing data). Based on the root mean square error and mean absolute error, the PLS-based factor model outperforms the competing models in forecasting performance.

#### 1. Introduction

The prediction of macroeconomic variables is very important in macroeconomic studies, monetary policy analysis, and environmental economics. Accurate forecasts yield sound insights into the mechanisms of dynamic economies [1], more effective monetary policies [2], and better portfolio management and hedging strategies [3]. In today's data-rich environment, economists and decision-makers track a very large number of macroeconomic series.

Low-dimensional models, such as vector autoregressions, often include a few prespecified economic covariates and therefore struggle to capture the dynamic and complex patterns contained in huge panels of time series [4]. Omitting important variables leads to an underspecified model and hence biased results. There is thus a pressing need for updated statistical models and analysis frameworks that extend the low-dimensional counterparts for improved forecasts. In the recent era, the analysis of "Big Data" has accordingly become a core topic of economics research, drawing special attention to the large class of techniques available in machine learning, dimension reduction, and penalized regression [5, 6]. Recently, in the regression context, Doornik and Hendry [7] categorized Big Data into three classes: tall big data, huge big data, and fat big data. Each type can be defined as follows:

(i) Tall big data: many more observations than covariates (*N* ≫ *P*)
(ii) Huge big data: more observations than covariates (*N* > *P*)
(iii) Fat big data: fewer observations than covariates (*N* < *P*)

where *N* and *P* represent the number of observations and covariates, respectively. We graphically represent the Big Data classes in Figure 1.

There are many related studies on macroeconomic forecasting based on factor models and machine learning techniques. In the last two decades, forecasting studies using large-scale datasets and pseudo-out-of-sample forecasts include those by Artis et al. [8]; Boivin and Ng [9, 10]; Forni et al. [11]; Armah and Swanson [12, 13]; Stock and Watson [14–18]; Varian [5]; Kim and Swanson [19, 20]; Castle et al. [21, 22]; Luciani [23]; Kristensen [24]; Swanson and Xiong [6, 25]; Tu and Lee [26]; Swanson et al. [27]; Maehashi and Shintani [28]; Kim and Ko [29]; Kim et al. [30]; Abdić et al. [31]; and Kim and Shi [32].

Moreover, Stock and Watson [17] discussed past studies on the utility of factor models for forecasting in depth. There is an extensive and growing body of literature in this area. A few studies are particularly relevant, as they address both theoretical and empirical problems, including Armah and Swanson [12, 13]; Artis et al. [8]; Bai and Ng [1, 33, 34]; Banerjee and Marcellino [35]; Boivin and Ng [9, 10]; Ding and Hwang [36]; Dufour and Stevanovic [37]; Stock and Watson [15–18]; and Smeekes and Wijler [38].

The abovementioned papers consider principal component analysis, independent component analysis, and sparse principal component analysis for the construction of the factor model. There is also a small but growing body of literature investigating the classical approach (Autometrics) in the context of macroeconomic forecasting [7, 21, 22]. To date, we have not found any paper that investigates partial least squares (PLS) theoretically in our context, although the method has been applied empirically in various fields. In addition, some papers have utilized shrinkage methods such as ridge regression, the lasso, the elastic net, the adaptive lasso, and the nonnegative garrote, but none has used the updated forms of shrinkage methods in our context.

Filling these gaps, this work implements some updated big data techniques to extend the literature on macroeconomic forecasting both theoretically and empirically. On the dimension-reduction side, we build factor models with the aim of highlighting their importance for macroeconomic prediction; specifically, we construct them using principal component analysis (PCA) and partial least squares (PLS). In addition, we assess the latest version of the classical approach (Autometrics) and updated versions of shrinkage methods, namely the elastic smoothly clipped absolute deviation (E-SCAD) and the minimax concave penalty (MCP). We evaluate these techniques in a simulation setting where the true data generating process (DGP) of the factor model is known. In summary, our prime contribution is a forecasting comparison of updated shrinkage methods and Autometrics with factor models under simulated scenarios featuring multicollinearity, heteroscedasticity, and autocorrelation, together with an application to macroeconomic data, to provide a conclusive assessment of predictability. The study aims to deliver an improved method for policymakers; the tool is not restricted to workers' remittances or the stock market (as in our application) but is valid for any time series.

The remaining part of the paper is organized as follows. In Section 2, we provide a detailed discussion regarding factor models based on principal component analysis and partial least squares. In Section 3, we discuss big data techniques, such as the classical approach and shrinkage methods. Monte Carlo evidence on the comparative performance of several forecasting techniques is discussed in Section 4. Empirical findings are given in Section 5. Section 6 provides concluding remarks.

#### 2. Methods

The techniques we intend to apply in subsequent sections are reported in Figure 2.

This study aims to compare the predictive ability of factor models based on principal component analysis and partial least squares with Autometrics, elastic smoothly clipped absolute deviation (E-SCAD), and minimax concave penalty under different scenarios like multicollinearity, heteroscedasticity, and autocorrelation. Macroeconomic and financial datasets are used for the analysis of the real phenomenon.

##### 2.1. Factor Models

The notion of factor models, also called diffusion indexes, entails using properly extracted hidden common factors, distilled from a huge set of features, as inputs in the identification of parsimonious models. More specifically, let *X* be an *N* × *P* matrix of data points and *F* an *N* × *k* matrix of latent factors.

Stock and Watson [17] have reviewed the literature on forecasting through factor models in depth. In the detailed discussion of factor model methodology below, we follow Stock and Watson [15]:

$$X = F\Lambda' + \varepsilon, \tag{1}$$

where $\varepsilon$ represents the random error matrix, $\Lambda$ is the *P* × *k* coefficient (loading) matrix, and *F* is the factor matrix of dimension *N* × *k*.

We construct the following forecasting model based on the work of Bai and Ng [39], Kim and Swanson [19], and Stock and Watson [15]:

$$y_{t+h} = \beta' \hat{F}_t + \varepsilon_{t+h}, \tag{2}$$

where $y_{t+h}$ is the outcome variable to be forecasted, *h* is the forecast horizon, and $\hat{F}_t$ is the *k* × 1 vector of factors distilled from *F* in equation (1). The associated coefficient $\beta$ is a vector of unknown parameters, and $\varepsilon_{t+h}$ is the random error. Factor model forecasting consists of two steps. In the first step, we estimate the *k* latent (unobserved) factors $\hat{F}_t$ from the observable predictors; to achieve a meaningful dimension reduction, *k* is supposed to be much smaller than *P* (i.e., *k* ≪ *P*). In the second step, we estimate $\beta$ by regressing the outcome on $\hat{F}_t$ using the data at hand. Subsequently, an out-of-sample forecast is constructed.
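The two-step diffusion-index procedure can be sketched as follows. This is an illustrative NumPy example on simulated data, not the authors' code (the paper's analysis used R packages); the dimensions, seed, and coefficient values are our own assumptions. Factors are extracted by PCA via a singular value decomposition, and the outcome is then regressed on the estimated factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated panel: N time periods, P predictors driven by k true factors.
N, P, k = 200, 50, 3
F_true = rng.normal(size=(N, k))
Lam = rng.normal(size=(P, k))
X = F_true @ Lam.T + rng.normal(scale=0.5, size=(N, P))
y = F_true @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.1, size=N)

# Step 1: estimate k latent factors by PCA (SVD of the standardized panel).
Xc = (X - X.mean(0)) / X.std(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
F_hat = U[:, :k] * s[:k]          # N x k principal-component factors

# Step 2: regress y on the estimated factors and form fitted values.
Z = np.column_stack([np.ones(N), F_hat])
beta = np.linalg.lstsq(Z, y, rcond=None)[0]
y_fit = Z @ beta

r2 = 1 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 2))
```

Because the estimated factors span approximately the same space as the true ones (up to rotation), the second-step regression recovers most of the outcome's variation even though the individual PCs need not match the true factors one-for-one.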

Kim and Swanson [19] utilized the PCA approach to estimate the unobserved factors, known as principal components (PCs). The latent PCs are uncorrelated and are obtained by projecting the data in the directions of maximal variance; naturally, the PCs are ordered by their variance contributions. The first PC reflects the direction of maximal variance in the data, the second PC the direction explaining maximal variance in the remaining orthogonal subspace, and so on.

This approach is most frequently used in the literature of factor analysis because PCs are easily derived via the use of singular value decompositions [15, 33, 34].

Boivin and Ng [10], however, argued that the factor model's predictive performance is likely to deteriorate if the extracted factors are dominated by factors irrelevant to the target variable. Similarly, Tu and Lee [26] noted that PCA imposes the factor structure on *X* alone and does not consider the outcome variable; neglecting the outcome variable during factor extraction can thus induce inefficient forecasts of it. The solution to this problem is given in the next section.

##### 2.2. The Partial Least Squares (PLS) Method

This study considers another method, partial least squares (PLS) regression, developed by Wold [40]. The method is appropriate in a data-rich environment and may be considered an alternative to PCA-based factor models. Unlike PCA, PLS identifies new factors in a supervised way; that is, it uses the response variable to identify new factors that not only approximate the predictors well but are also related to the response. Roughly speaking, PLS seeks the directions of maximal variance that help explain both the response variable and the explanatory variables. PLS for an outcome variable $y_t$ is motivated by the statistical model

$$y_t = x_t'\beta + \varepsilon_t, \tag{3}$$

where $x_t = (x_{1t}, \ldots, x_{nt})'$ is an *n* × 1 vector of covariates at time *t* = 1, …, *T*, $\beta$ is an *n* × 1 vector of associated coefficients, and $\varepsilon_t$ is the disturbance term. Kim and Ko [29] argued that PLS models are especially useful when there is a large number of covariates. Instead of the model given in (3), one may adopt another dimension-reduction approach through the following linear regression with a *Z* × 1 vector of components $P_t = (p_{1t}, \ldots, p_{Zt})'$:

$$y_t = P_t'\theta + u_t. \tag{4}$$

We define $P_t = W' x_t$, where $W = [w_1, w_2, \ldots, w_Z]$ is the *n* × *Z* matrix whose *z*-th column $w_z$, *z* = 1, 2, …, *Z*, denotes the vector of weights on the covariates for the *z*-th component, and $\theta$ is the *Z* × 1 vector of PLS coefficients. The *k*-step-ahead forecast then follows as $\hat{y}_{t+k} = P_t'\hat{\theta}$, *k* = 1, 2, …, *m*.
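The supervised weighting that distinguishes PLS from PCA can be sketched with a minimal NIPALS-style extraction in NumPy. This is an illustrative implementation under our own assumptions (simulated data, one response, deflation after each component), not the paper's code, which relied on the R package pls; the key point is that each weight vector is proportional to the covariance between the predictors and the response.

```python
import numpy as np

def pls_components(X, y, Z):
    """Extract Z PLS components by NIPALS-style deflation.

    Each weight vector w_z is proportional to X'y, so the components
    are supervised by the response, unlike principal components."""
    Xk = X - X.mean(0)
    yk = y - y.mean()
    n, p = Xk.shape
    W = np.zeros((p, Z))
    T = np.zeros((n, Z))
    for z in range(Z):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weights on covariates for component z
        t = Xk @ w                      # z-th component (score vector)
        p_load = Xk.T @ t / (t @ t)     # X loadings used for deflation
        Xk = Xk - np.outer(t, p_load)   # remove the explained part of X
        yk = yk - t * (yk @ t) / (t @ t)
        W[:, z], T[:, z] = w, t
    return W, T

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 20))
beta = np.zeros(20); beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.normal(scale=0.1, size=100)

W, T = pls_components(X, y, Z=3)
# Regress y on the components to obtain the Z x 1 PLS coefficient vector.
theta = np.linalg.lstsq(T, y - y.mean(), rcond=None)[0]
resid = y - y.mean() - T @ theta
print(resid.std())
```

With three components on a signal of rank three, the residual standard deviation falls close to the noise level, illustrating why a small number of supervised components can forecast well.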

#### 3. Classical Approach and Shrinkage Methods

The fundamental comparison of interest here is between automatic selection over variables and PC- and PLS-based factors in terms of prediction. Factors are often regarded as essential for summarizing a large amount of information, but the classical approach and shrinkage methods are alternatives.

##### 3.1. Classical Approach

Autometrics is a well-known big data algorithm consisting of five steps. First, we construct a linear model, referred to as the general unrestricted model (GUM); second, we estimate the unknown parameters and test them statistically; third, a presearch process is performed; fourth, a tree-path search is carried out; and the final step selects the final model.

Doornik [41] delineates the complete algorithm in detail. The key idea is to begin modeling with a linear model that incorporates all candidate features (the GUM), estimate it by least squares, and carry out statistical tests to validate the congruency of the model. If the estimated GUM contains coefficients that are statistically insignificant by the prespecified criteria, simpler models are estimated along different search paths and ratified by diagnostic tests. Once terminal models are detected, Autometrics tests them against their union: rejected models are discarded, and the union of the surviving terminal models forms a new GUM for another tree-path search iteration. The inspection process continues, with the terminal models repeatedly checked against their union. If two or more terminal models pass the encompassing tests, a preselected information criterion decides the final choice.
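A heavily simplified sketch of the general-to-specific idea follows. This is not Autometrics itself: the real algorithm performs presearch, multi-path tree search, diagnostic tracking, and encompassing tests, whereas this toy version follows a single backward-elimination path, dropping the least significant regressor until every surviving coefficient passes a t-threshold (|t| ≥ 2.58, roughly a one percent level). The data and threshold are our own assumptions.

```python
import numpy as np

def gets_backward(X, y, t_min=2.58):
    """Toy general-to-specific selection: start from the GUM with all
    candidate regressors and repeatedly drop the least significant one
    until every surviving |t| exceeds t_min (~1% level). This is a
    single-path simplification of Autometrics' tree search."""
    keep = list(range(X.shape[1]))
    while keep:
        Z = np.column_stack([np.ones(len(y)), X[:, keep]])
        beta = np.linalg.lstsq(Z, y, rcond=None)[0]
        e = y - Z @ beta
        s2 = e @ e / (len(y) - Z.shape[1])
        cov = s2 * np.linalg.inv(Z.T @ Z)
        t = np.abs(beta[1:]) / np.sqrt(np.diag(cov)[1:])
        worst = int(np.argmin(t))
        if t[worst] >= t_min:
            break                      # all survivors are significant
        keep.pop(worst)                # drop the least significant regressor
    return keep

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = 3 * X[:, 0] - 2 * X[:, 4] + rng.normal(size=300)
print(gets_backward(X, y))   # the relevant columns 0 and 4 should survive
```

Even this stripped-down version conveys why the choice of significance level (liberal versus conservative strategy) governs how many variables are retained.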

The econometric models are obtained by applying Autometrics to the GUM, which in its simplest static form regresses the outcome on all candidate covariates:

$$y_t = \beta_0 + \sum_{i=1}^{P} \beta_i x_{it} + \varepsilon_t.$$

Under Autometrics, two main strategies are commonly used for model selection: a liberal strategy and a conservative one. Our study implements the strategy based on a one percent significance level rather than five percent; in other words, the statistical significance of each estimated coefficient is judged at the one percent level.

##### 3.2. Shrinkage Methods

A prominent alternative approach for dealing with many features is the family of penalized regression methods, which comprises many techniques; our study adopts the following updated forms: the elastic smoothly clipped absolute deviation and the minimax concave penalty.

###### 3.2.1. Elastic Smoothly Clipped Absolute Deviation

Fan and Li [42] added a new penalization technique to the literature, known as SCAD. The technique is nonconvex and enjoys the oracle properties: sparsity, continuity, and unbiasedness. It selects the useful covariates and estimates their magnitudes asymptotically as efficiently as if the underlying true model were known (i.e., the oracle property). The SCAD function overcomes limitations of existing methods such as ridge and the lasso. The SCAD penalty is defined as

$$p_{\lambda}(|\beta|) = \begin{cases} \lambda|\beta|, & |\beta| \le \lambda, \\[4pt] \dfrac{2a\lambda|\beta| - \beta^2 - \lambda^2}{2(a-1)}, & \lambda < |\beta| \le a\lambda, \\[4pt] \dfrac{\lambda^2(a+1)}{2}, & |\beta| > a\lambda. \end{cases}$$

The unknown tuning parameter $\lambda$ is determined by generalized cross-validation, and Fan and Li fixed the value of $a$ at 3.7. The penalty function above is continuous, and for an ordinary least squares estimate $z$ the resulting univariate solution is given by the SCAD thresholding rule

$$\hat{\beta}(z) = \begin{cases} \operatorname{sign}(z)\,(|z| - \lambda)_{+}, & |z| \le 2\lambda, \\[4pt] \dfrac{(a-1)z - \operatorname{sign}(z)\, a\lambda}{a-2}, & 2\lambda < |z| \le a\lambda, \\[4pt] z, & |z| > a\lambda. \end{cases}$$
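The SCAD penalty and its univariate solution can be sketched in NumPy as follows. This is an illustrative implementation of the standard Fan–Li form with $a = 3.7$ as in the text; it is not the paper's estimation code (which used R's glmnet/ncvreg ecosystem), and the example values of $z$ and $\lambda$ are our own.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li: linear near zero (like the lasso),
    a quadratic blend in the middle, and constant (hence unbiased)
    for large |beta|."""
    b = np.abs(beta)
    small = b <= lam
    mid = (b > lam) & (b <= a * lam)
    return np.where(small, lam * b,
           np.where(mid, (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
                    lam**2 * (a + 1) / 2))

def scad_threshold(z, lam, a=3.7):
    """Closed-form univariate SCAD solution for an OLS estimate z."""
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)        # soft thresholding
    if az <= a * lam:
        return np.sign(z) * ((a - 1) * az - a * lam) / (a - 2)
    return z                                           # large signals untouched

print(scad_threshold(0.5, lam=1.0))  # small estimate shrunk to zero
print(scad_threshold(5.0, lam=1.0))  # large estimate left unbiased at 5.0
```

The three regimes make the oracle behavior visible: small noise coefficients are set to zero, while large coefficients escape shrinkage entirely, unlike the lasso's constant bias.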

The tuning parameters can be chosen by data-driven techniques. A limitation of SCAD is that it tends to select only one variable from a correlated set of predictors. Zeng and Xie [43] extended SCAD by augmenting it with an $L_2$ penalty and called the result elastic SCAD (E-SCAD). Mathematically, the E-SCAD estimator can be written as

$$\hat{\beta}_{\text{E-SCAD}} = \arg\min_{\beta} \left\{ \lVert y - X\beta \rVert^2 + \sum_{j=1}^{P} p_{\lambda_1}(|\beta_j|) + \lambda_2 \lVert \beta \rVert^2 \right\}.$$

Owing to the $L_2$ penalty, E-SCAD achieves an additional property beyond the oracle properties: the penalty encourages highly correlated features to enter or leave the model simultaneously. Hence, E-SCAD selects whole groups of correlated predictors rather than a single variable.

###### 3.2.2. Minimax Concave Penalty

Zhang [44] proposed the minimax concave penalty (MCP), which preserves the convexity of the penalized loss in sparse regions as far as possible, given specified thresholds for feature selection and unbiasedness. The MCP is defined as

$$p_{\lambda,\gamma}(|\beta|) = \begin{cases} \lambda|\beta| - \dfrac{\beta^2}{2\gamma}, & |\beta| \le \gamma\lambda, \\[4pt] \dfrac{\gamma\lambda^2}{2}, & |\beta| > \gamma\lambda. \end{cases}$$

The tuning parameter $\gamma$ minimizes the maximum concavity subject to the constraints that guarantee unbiasedness and feature selection; the corresponding penalty derivative is

$$p'_{\lambda,\gamma}(t) = \left( \lambda - \frac{t}{\gamma} \right)_{+}, \qquad t \ge 0.$$

The dual tuning parameters in concave penalized regression play a key role in controlling the amount of regularization. The concavity of the MCP penalty largely preserves convexity in sparse regions by minimizing the maximal concavity. Zhang [44] showed that increasing the regularization parameter $\gamma$ yields more convexity and an almost unbiased penalty. The MCP penalty function is a quadratic spline.
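The MCP penalty and its univariate thresholding rule can be sketched as follows. This is an illustrative NumPy implementation of the standard form, with $\gamma = 3$ chosen by us for the example (the paper tunes such parameters by cross-validation); it is not the authors' code.

```python
import numpy as np

def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty: its derivative (lam - t/gamma)_+ tapers
    linearly to zero, so large coefficients are left unpenalized."""
    b = np.abs(beta)
    return np.where(b <= gamma * lam,
                    lam * b - b**2 / (2 * gamma),
                    0.5 * gamma * lam**2)

def mcp_threshold(z, lam, gamma=3.0):
    """Univariate MCP solution (firm thresholding) for an OLS estimate z."""
    if abs(z) <= gamma * lam:
        return np.sign(z) * max(abs(z) - lam, 0.0) / (1 - 1 / gamma)
    return z   # beyond gamma*lam the estimate is left unbiased

print(mcp_threshold(0.5, lam=1.0))   # small effect removed
print(mcp_threshold(10.0, lam=1.0))  # large effect untouched
```

Comparing this with the SCAD sketch above shows how both penalties transition from lasso-like shrinkage near zero to no shrinkage for large effects, with $\gamma$ controlling how quickly the MCP taper ends.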

#### 4. Monte Carlo Evidence on Forecasting Performance

Our simulation study consists of three main scenarios, namely, simulations on a data generating process (DGP) with (i) multicollinearity, (ii) heteroscedasticity, and (iii) autocorrelation. In each scenario, we vary the DGP attributes: the correlation strength among features, the magnitude of the error variance, and the correlation of the error term with its previous values (lags).

##### 4.1. Data Generating Process

We generate data from the following equation:

$$y_t = x_t'\beta + \varepsilon_t.$$

The predictors $x_1, x_2, \ldots, x_P$ are generated from a multivariate normal distribution, $x \sim N(0, \Sigma)$. The same data generating process (DGP) was used by [38] for artificial data generation. Our study considers three sample sizes for the simulation experiments. We suppose a dual set of features, altering the numbers of active (*p*) and inactive (*q*) features, as portrayed in Figure 3.

In our simulation experiments, we consider three scenarios. In the first, we generate the pairwise correlation between predictors $x_i$ and $x_j$ as $\rho^{|i-j|}$. The population covariance matrix is produced in the following way:

$$\Sigma_{ij} = \rho^{|i-j|}, \qquad i, j = 1, \ldots, P.$$

By altering the parameter $\rho$, we obtain different correlation structures; following Xiao and Xu [45], we assume $\rho \in \{0.25, 0.5, 0.9\}$. In the second scenario, we generate correlation between the current residual and its lag (autocorrelation), denoted $\rho$. The autocorrelation is generated as follows:

$$\varepsilon_t = \rho\, \varepsilon_{t-1} + u_t.$$

Our experiments assume low, moderate, and high autocorrelation, with *ρ* ∈ {0.25, 0.5, 0.9}. The third scenario examines heteroscedasticity (i.e., the variance of the error term is not constant but changes across data points).

We therefore split the variance into two components, $\sigma_1^2$ and $\sigma_2^2$: given *n* observations, we set the variance of the first *n*/2 observations to $\sigma_1^2$ and the variance of the remaining observations to $\sigma_2^2$. Our simulation experiments assume three cases of heteroscedasticity, with the variance pair $(\sigma_1^2/\sigma_2^2) \in \{0.1/0.3,\ 0.2/0.6,\ 0.3/0.9\}$. Tenfold cross-validation is used to determine the optimal value(s) of the tuning parameter(s).
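One replication of the design can be sketched as follows. For compactness this illustration combines the three scenarios' ingredients, Toeplitz correlation $\rho^{|i-j|}$ among predictors, AR(1) errors, and a two-regime variance, in a single draw, whereas the paper varies them one at a time; the sample size, numbers of active and inactive features, and parameter values are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p_active, q_inactive = 200, 10, 40
P = p_active + q_inactive

# Scenario 1 ingredient: Toeplitz covariance Sigma_ij = rho ** |i - j|.
rho = 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(P), np.arange(P)))
X = rng.multivariate_normal(np.zeros(P), Sigma, size=n)

# Scenario 3 ingredient: AR(1) errors e_t = rho_e * e_{t-1} + u_t.
rho_e = 0.5
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho_e * e[t - 1] + u[t]

# Scenario 2 ingredient: two-regime error variance (first half vs second half).
sig = np.where(np.arange(n) < n // 2, np.sqrt(0.2), np.sqrt(0.6))

beta = np.concatenate([np.ones(p_active), np.zeros(q_inactive)])
y = X @ beta + sig * e

# 80/20 split as in the Monte Carlo design, with an OLS benchmark fit.
n_tr = int(0.8 * n)
Z_tr = np.column_stack([np.ones(n_tr), X[:n_tr]])
b_hat = np.linalg.lstsq(Z_tr, y[:n_tr], rcond=None)[0]
pred = b_hat[0] + X[n_tr:] @ b_hat[1:]
rmse = np.sqrt(np.mean((y[n_tr:] - pred) ** 2))
mae = np.mean(np.abs(y[n_tr:] - pred))
print(rmse, mae)
```

In the full experiment this draw-fit-evaluate cycle is repeated *M* = 1000 times for each method, and the RMSE and MAE are averaged across replications.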

To evaluate the forecasting performance of all methods, we divide each realization so that 80 percent of the data are used to train the models and the remainder to evaluate them, following [46]. The entire process is replicated *M* = 1000 times, and the root mean square error (RMSE) and mean absolute error (MAE) are averaged over the *M* replications to assess forecast performance. The smaller the RMSE and MAE, the closer the predicted values are to the actual values and the better the forecast. For the analysis, we relied on several R packages: gets, glmnet, ncvreg, pls, caret, forecast, and Metrics.

##### 4.2. Simulation Results

The forecast comparison results derived from the Monte Carlo experiments are presented in Tables 1–3. All methods improve as the number of observations grows, while increasing the number of irrelevant and candidate variables adversely affects predictive ability.

*Scenario 1*. Under low and moderate multicollinearity, MCP is superior to the rival methods except in the small-sample case, where E-SCAD and the PLS-based factor model are dominant; more specifically, E-SCAD most often produces the better forecasts there. Under high multicollinearity, the PLS-based factor model is superior in particular, while asymptotically E-SCAD outperforms the other methods.

*Scenario 2*. Under all schemes of heteroscedasticity, MCP usually performs better than all competing models. When the number of predictors equals 50, Autometrics provides forecasts similar to MCP in large samples.

*Scenario 3*. Under low and moderate autocorrelation, MCP shows outstanding forecasting performance, particularly as the sample size increases; in contrast, when *n* = 100, E-SCAD produces remarkable forecasts. Under extreme autocorrelation, E-SCAD outperforms the rival techniques for both small and moderate samples, but once the sample grows to 400, MCP yields the comparatively more accurate forecast.

#### 5. Real Data Analysis

After the Monte Carlo experiments, this study performs real data analysis using big data. We focus on two datasets, macroeconomic and financial-market data; specifically, the study considers workers' remittance inflows and stock market data, respectively. Many factors influence workers' remittance inflows and the stock market. Among them, some covariates are recommended by economic and financial theory, and a long list of additional variables has been recommended by past studies. This study considers all possible determinants suggested by theory and the literature to form a general model, known in the econometrics literature as the general unrestricted model (GUM).

##### 5.1. Data Source

This study uses annual data for Pakistan from 1973 to 2020, sourced from the World Development Indicators (WDI), International Financial Statistics (IFS), the International Country Risk Guide, and the State Bank of Pakistan. The few missing observations in the dataset are imputed by averaging neighboring observations. Most variables are transformed to logarithms to move their distributions toward normality.

Details on the variables used for the analysis are given in Appendix Table 4.

##### 5.2. Correlation Matrix

For the empirical analysis, we split the dataset into two parts: observations from 1973 to 2007 are used to train the models, and the remaining data are used to evaluate their forecasting performance. Before computing the forecast errors, we examine the correlation structure among covariates through visualization. In Figures 4 and 5, blue and red exhibit positive and negative correlations, respectively; the intensity of the color and the area of each circle are directly proportional to the correlation coefficient. The legend on the right side of each correlogram maps colors to correlation coefficients. The many dark blue and red circles clearly illustrate high pairwise correlations; in other words, high multicollinearity exists among the predictors in both datasets. Figure 6 reveals that the distribution of the stock market data is almost symmetric. In addition, diagnostic tests indicate that the residuals of the estimated model are independently and identically distributed. As noted in the simulation experiments, in the presence of high multicollinearity the PLS-based factor model outperformed the other methods in terms of forecast error, particularly when the sample size is small, which suggests that the PLS-based factor model is more robust in such circumstances.

##### 5.3. Forecast Comparison Based on Two Real Datasets

Root mean square error and mean absolute error are computed to ascertain the predictive ability of MCP, E-SCAD, Autometrics, and the factor models based on PCA and PLS; results are shown in Figures 7 and 8, respectively. The findings show that the PLS-based factor model outperforms the rival methods out of sample: it attains the lowest forecast errors in the multistep-ahead forecasts for the period 2008–2020 in both real datasets, which supports the simulation results.


#### 6. Concluding Remarks

This study compared factor models based on principal component analysis and partial least squares with the classical approach (Autometrics) and shrinkage procedures, namely the minimax concave penalty (MCP) and the elastic smoothly clipped absolute deviation (E-SCAD). The comparison was made under multicollinearity, heteroscedasticity, and autocorrelation with varying sample sizes and numbers of covariates, using Monte Carlo experiments to compare the methods' predictive ability. All methods improve with growing sample size in all scenarios, while expanding the number of irrelevant and candidate variables reduces forecasting accuracy. Under low and moderate multicollinearity, MCP often produced the better forecasts, except for small samples, where E-SCAD is dominant. Under extreme multicollinearity, the PLS-based factor model is superior, but with larger samples the prediction accuracy of E-SCAD improves significantly relative to the other methods. Under all schemes of heteroscedasticity, MCP performs better than all competing models; when the number of predictors equals 50, Autometrics provides forecasts similar to MCP in large samples. Under low and moderate autocorrelation, MCP showed outstanding forecasting performance except in the small-sample case, where E-SCAD produced remarkable forecasts. Under extreme autocorrelation, E-SCAD outperformed the rival techniques for both the smallest and medium samples, but once the sample grows to 400, MCP yields the comparatively more accurate forecast.

For the empirical application, macroeconomic and financial datasets are used. To compare the forecasting performance of all methods, we divided each dataset into two parts (i.e., data over 1973–2007 as training data and data over 2008–2020 as testing data). All methods were trained on the training data and subsequently evaluated on the testing data. Based on RMSE and MAE, the PLS-based factor model forecasts more robustly than the competing models. The study's recommendations are reported in Table 4.

##### 6.1. Limitations and Future Direction

This study has a few limitations: it focuses only on linear models and considers yearly data, and the simulation part is restricted to Gaussian errors, although in practice model errors need not be normal. Future research can therefore examine the forecasting performance of advanced statistical and machine learning techniques under nonnormal residuals and missing observations. The study can also be expanded to nonlinear and nonparametric algorithms such as artificial neural networks, random forests, and support vector machines.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this study.

#### Supplementary Materials

*Appendix Table 4*. Variables description. (Supplementary Materials)