Journal of Probability and Statistics

Volume 2017, Article ID 1235979, 12 pages

https://doi.org/10.1155/2017/1235979

## Bootstrap Order Determination for ARMA Models: A Comparison between Different Model Selection Criteria

University of California San Diego (UCSD), San Diego, CA, USA

Correspondence should be addressed to Livio Fenga; ude.dscu@agnefl

Received 31 July 2016; Accepted 19 March 2017; Published 16 April 2017

Academic Editor: Ramón M. Rodríguez-Dagnino

Copyright © 2017 Livio Fenga. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The present paper deals with the order selection of models of the class for autoregressive moving average. A novel method—previously designed to enhance the selection capabilities of the Akaike Information Criterion and successfully tested—is now extended to the other three popular selectors commonly used by both theoretical statisticians and practitioners. They are the final prediction error, the Bayesian information criterion, and the Hannan-Quinn information criterion which are employed in conjunction with a semiparametric bootstrap scheme of the type sieve.

#### 1. Introduction

Autoregressive moving average (ARMA) models [1] are a popular choice for the analysis of stochastic processes in many fields of applied and theoretical research. They are mathematical tools employed to model the persistence, over time and space, of a given time series. They can be used for a variety of purposes, for example, the generation of predictions of future values, to remove the autocorrelation structure from a time series (prewhitening) or to achieve a better understanding of a physical system. As it is well known, performances of an ARMA model are critically affected by the determination of its order: once properly built and tested, such models can be successfully employed to describe the reality, for example, trend patterns of economic variables and temperature oscillations in a given area, or to build futures scenarios through simulation exercises. Model order choice plays a key role not only for the validity of the inference procedures but also, from a more general point of view, for the fulfillment of the fundamental principle of parsimony [2, 3]. Ideally, the observation of this principle leads to choosing models showing simple structures on one hand but able to provide an effective description of the data set under investigation on the other hand. Less parsimonious models tend to extract idiosyncratic information and therefore are prone to introduce high variability in the estimated parameters. Such a variability determines for the model a lack of generalization capabilities (e.g., when new data become available), even though, by adding more and more parameters, an excellent fit of data is usually obtained [4]. Overfitting is more likely to occur when the system under investigation is affected by different sources of noise, for example, related to changes in survey methodologies, time evolving processes, and missing observations. These phenomena, very common and in many cases simply unavoidable in “real life” data, might have a significant impact on the quality of the data set at hand. Under noisy conditions, a too complex model is likely to fit the noise components embedded in the time series and not just the signal and therefore it is bound to yield poor future values’ predictions. On the other hand, bias in the estimation process arises when underfitted models are selected, so that only a suboptimal reconstruction of the underlying Data Generating Process (DGP) can be provided. As it will be seen, bias also arises as a result of the uncertainty conveyed by the process* itself* of model selection. ARMA model order selection is a difficult step in time series analysis. This issue has attracted a lot of attention so that, according to different philosophies, theoretical and practical assumptions as well as several methods, both parametric and nonparametric, have been proposed over the years as a result. Among them, bootstrap strategies [5–9] are gaining more and more acceptance among researchers and practitioners. In particular, in [6] bootstrap-based procedures applied to the Akaike Information Criterion (AIC) [10, 11] in the case of ARMA models, called b-MAICE (bootstrap-Minimum AIC Estimate), has proven to enhance the small sample performances of this selector. The aim of this work is to extend such a procedure to different selectors, that is, final prediction error (FPE) [12] and two information based criteria, that is, Bayesian information criterion (BIC) [13, 14] and Hannan-Quinn criterion (HQC) [15, 16]. In particular, the present paper is aimed at giving empirical evidences of the quality of the bootstrap approach in model selection, by comparing it with the standard procedure, which, as it is well known, is based on the minimization of a selection criterion. In particular, the empirical study (presented in Section 4) has been designed to contrast the performances of each of the considered selectors both in nonbootstrap and bootstrap world. The validity of the proposed method is assessed not only in the case of pure ARMA processes, but also when real life phenomena are simulated and embedded in the artificial data. In practice, the problem of order determination is considered also when the observed series is contaminated with outliers and additive Gaussian noise. The last type of contamination has been employed, for example, in [17], for testing a model selection approach driven by information criteria in the autoregressive fractionally integrated moving average (ARFIMA) and ARMA cases. Such a source of disturbance has been employed here in order to test the degree of robustness of the method proposed against overfitting. As it will be seen, computer simulations show that the addition of white noise generates a number of incorrect specifications comparable to those resulting from the contamination of the process with outliers of the type innovation. Outliers are a common phenomenon in time series, considering the fact that real life time series from many fields, for example, economic, sociology, and climatology, can be subjected and severely influenced by interruptive events, such as strikes, outbreaks of war, unexpected heat or cold waves, and natural disasters [18, 19]. The issue is absolutely nontrivial, given that outliers can impact virtually all the stages of the analysis of a given time series. In particular, model identification can be heavily affected by additive outliers, as they can induce the selection of underfitted models as a result of the bias elements introduced into the inference procedures. In the simulation study (Section 4), outliers of the type additive (i.e., added to some observations) and innovative (i.e., embedded in the innovation sequence driving the process) [19] will be considered.

The remainder of the paper is organized as follows: in Section 2, after introducing the problem of order identification for time series, the considered selectors are illustrated along with the related ARMA identification procedure. In Section 3 the employed bootstrap selection method is illustrated and the bootstrap scheme briefly recalled. Finally, small sample performances of the proposed method will be assessed via Monte Carlo simulations in Section 4.

#### 2. Order Selection for Time Series Models

A key concept underlying the present paper is that, in general, “reality” generates complex structures, possibly -dimensional, so that a model can at best capture only the main features of the system under investigation in order to reconstruct a simplified version of a given phenomenon. Models are just approximations of a given (nontrivial) phenomenon and the related identification procedures could never lead to the determination of the “true” model. In general, there is no true model in a finite world. What we can do is to find the one giving the best representation of the underlying DGP, according to a predefined rule. In this section, after highlighting the role played by model selection procedures in generating uncertainty, we briefly introduce the models belonging to the class ARMA along with the order selectors considered. Finally, the information criterion-based standard selection procedure is illustrated.

##### 2.1. Uncertainty in Model Selection

Uncertainty is an unfortunate, pervasive, and inescapable feature characterizing real life data which has to be faced continually by both researchers and practitioners. The framework dealt with here is clearly no exception: if the true model structure is an unattainable goal, approximation strategies have to be employed. Such strategies are generally designed on iterative basis and provide an estimate of the model structure which embodies, by definition, a certain amount of uncertainty. Common sources of uncertainty are those induced by the lack of discriminating power of the employed selector and by the so-called model selection bias [20, 21], which arises when a model is specified and fitted on the* same* data set. Unfortunately, not only are these two types of uncertainty not mutually exclusive but also statistical theory provides little guidance to quantify their effect in terms of bias introduced in the model as a result [22]. Particularly dangerous is this last form of uncertainty, as it is based upon the strong and unrealistic assumption of making correct inference* as if* a model is known to be true, while its determination has been made on the same set of data. On the other hand, the first source of uncertainty is somehow less serious, given its direct relationship with the size of the competition set, which is usually included in the design of the experiment. In practice, it is related to the fact that very close SC minima can be found in the model selection process, so that even small variations in the data set can cause the identification of different model structures. In general, trying to explain only in part the complexity conveyed in the observed process by means of as simple as possible structures is a way to minimize uncertainty in the model selection, as it is likely to lead to the definition of a smaller set of candidate models. This approach can be seen as an extension of the principle of parsimony to the competition set. In the sequel, how the proposed procedure, being aimed at replicating both the original process and the related selection procedure, has a positive effect in reducing both the considered sources of uncertainty will be emphasized [23].

##### 2.2. The Employed Identification Criteria

Perhaps the most well-known model order selection criteria (SC), among those considered, are the AIC and the FPE, whose asymptotic equivalence to the -test has been proved in [24]. AIC has been designed on information-theoretic basis as an asymptotically unbiased estimate of the Kullback-Leibler divergence [25] of the fitted model relative to the true model. Assuming , being the sample size, to be randomly drawn from an unknown distribution with density , the estimation of is done by means of a parametric family of distributions, with densities , being the unknown parameters’ vector. Denoting as the predictive density function, as the true model, and as the approximating one, Kullback-Leibler discrepancy can be expressed as follows:

As the first term on the right hand side of (1) does not depend on the model, it can be neglected so that we can rewrite the distance in terms of the expected log likelihood, ; that is,

This quantity can be estimated by replacing with its empirical distribution , so that we have that . This is an overestimated quantity of the expected log likelihood, given that is closer to than . The related bias can be written as follows: and therefore an information criterion can be derived from the bias-corrected log likelihood; that is,

Denoting by and the number of estimated parameters and the sample size, respectively, Akaike proved that is asymptotically equal to , so that the information based criterion takes the form . By multiplying this quantity by , finally AIC is defined as . In such a theoretical framework, AIC can be seen as a way to solve the Akaike Prediction Problem [6], that is, to find a model producing estimation of density minimizing* Kullback*-*Leibler *discrepancy (1). Originally conceived for AR process, extended to the ARMA case by Soderstrom and Stoica [24], FPE was designed as the minimizer of the one-step-ahead mean square forecast error, after taking in account the inflating effect of the estimated parameter. FPE statistic is defined as , where is the estimated variance of the residuals and is the model’s size. A different perspective has led to the construction of BIC-type criteria, which are grounded on the maximization of the model posterior probability [14]. In more detail, they envision the specification of the prior distribution on parameter values and the models, respectively, denoted by and , and their introduction into the analysis through the joint probability function . Posterior probabilities for are then obtained through Bayes theorem, so that the value of maximizing (4), that is,is found. With being the likelihood function associated with both the data and the model , the selected order will be . By assuming all the models equally probable, that is, , the BIC criterion is hence defined by . The last criterion considered—constructed from the law of iterated algorithm—is the BIC, in which the penalty function grows at a very slow rate as the samples size increases. It is defined as follows: .

All these selectors can be divided into two groups: one achieving asymptotic optimality [26] and one selection consistency. AIC and FPE fall in the first group, in the sense that the selected model asymptotically tends to reach the smallest average squared error [27, 28], if the true DGP is not included in the competition set. On the other hand, BIC and HQ are dimension consistent [29], in that the probability of selection of the “true” model approaches 1 as the sample size goes to infinity. However, it should be pointed out that such an asymptotic property holds only if the true density is in the set of the candidate models. In this regard, AIC and FPE as well as the other Shibata efficient criteria (e.g., Mallows [30]) fail to select the “true” model asymptotically. As pointed out earlier, -dimensionality of the “truth” implies for all the models being “wrong” to some extent—except in trivial cases—so that no set of competition models will ever encompass the true DGP. As long as this approach is held true, asymptotic efficient criteria might be preferred. In this case, one may argue a lack of significance in comparing any finite list of candidate models when we rule out the existence of a true one. Such an approach is justified in that, even if no model can ever represent the truth, we can achieve the goal to find the one being approximately correct. Conversely, if one does believe that the true density belongs to the model space, hence dimension consistent selection criteria can be preferred.

##### 2.3. ARMA Model Selection through Minimization of Selection Criteria

In what follows, it is assumed that the observed time series is a realization of a real valued, –mean, second-order stationary process, admitting an autoregressive moving average representation of orders and ; that is, , with . Its mathematical expression is as follows:with and , being and , AR polynomial, and MA polynomial, respectively. With the backward shift operator, such that , is denoted whereas is assumed to be sequence of centered, uncorrelated variables with common variance . The parameters vector is denoted by . Standard assumptions of stationarity and invertibility, respectively, of AR and MA polynomials, that is,are supposed to be satisfied. Finally, the ARMA parameters of the true underlying DGP (5) are denoted by (i.e., ) and the related model by .

Identification procedures of the best approximating model for is carried out on a priori specified set of plausible candidate models ; that is,where the chosen model, say , is selected from (i.e., ). In the ARMA case, each model represents a specific combination of autoregressive and moving average parameters . The set is upper bounded by the two integers and for the AR and MA part, respectively; that is,

This assumption is a necessary condition for the above-mentioned Shibata efficiency and dimension consistency properties to hold other than for the practical implementation of the procedure (the model space needs to be bounded). From an operational point of view, the four SC considered in this work, when applied to models of the class ARMA, take the following form:where is an estimate of the Gaussian pseudo-maximum likelihood residual variance when fitting ARMA () models; that is,

Equations (10)–(13) can be synthetically expressed as follows:where is defined in Section 3 and is the penalty term as a function of model complexity.

The standard identification procedure, here called for convenience Minimum Selection Criterion Estimation (MSCE), is based on the minimization of the . In practice, the model minimizing a given SC is the winner; that is,

#### 3. The Bootstrap Method

As already pointed out, in [6] a bootstrap selection method has been proposed to perform AIC-based ARMA structure identification. The comparative Monte Carlo experiment with its nonbootstrap counterpart, commonly referred to as (Minimum Akaike Information Criterion Expectation) procedure, gave empirical evidences in favor of -MAICE procedure. Such results motivated us to extend this approach to other selectors (see (11), (12), and (13)). For convenience, the proposed generalized version of -MAICE procedure has been called (Bootstrap Minimum Selector Expectation) procedure. Finally, in order to keep the paper as self-contained as possible, and to reduce uncertainty in the experimental outcomes, AIC has also been included in the experiment.

##### 3.1. The Bootstrapped Selection Criteria

The proposed method relies on the bootstrapped version of a given SC, obtained by bootstrapping both the residual variance term and the penalty term, so that (15) becomes

The particularization of (17) to the criteria object of this study is straightforward and yields their bootstrapped versions; that is,with , , being as above defined and being the residual variance of the residuals from the fitting of the bootstrapped series with its ARMA estimate . In symbols,

In essence, bMSE method works as follows: MSCE procedure is applied iteratively on each bootstrap replication of the observed series. A winner model is selected at each iteration on the basis of a given SC, which in turns works exploiting the bootstrap estimated variances of the residuals. The final model is chosen on the basis of its relative frequency over the bootstrap replication.

##### 3.2. The Applied Bootstrap Scheme

Sieve [31] [32, 33] is the bootstrap scheme employed here. It is an effective and conceptually simple tool to borrow randomness from white noise residuals, generated by the fitting procedure of a “long” autoregression to the observed time series. This autoregression, here supposed to be –*mean*, is of the type , , under the stationarity conditions as in (6). Its use is here motivated by the representation of process of type (5); that is,with being a sequence of iid variables with and . In essence, bootstrap approximates a given process by a finite autoregressive process, whose order increases with the sample size such that , , . In this regard, in the empirical study the estimation of the -*vector* of coefficients has been carried out through the Yule-Walker equations. The residuals obtained from the fitting procedure of this autoregression to the original data are then employed to build up the centered empirical distribution function, which is defined aswhere , with being the mean value of the available residuals, that is, , . From bootstrap samples are generated by the recursionwith starting values , for , .

##### 3.3. The Proposed bMSE Procedure

Let be the observed time series realization of ARIMA () DGP (5), from which bootstrap replications are generated via method (Section 3). Our B-MSCE procedure is based on the minimization, over all the combinations of ARMA structures, of a given SC by applying MSCE procedure to each bootstrap replication of the original time series .

In what follows the proposed procedure is summarized in a step-by-step fashion.(1)A maximum ARMA order () is arbitrarily chosen, so that exhaustive set of tentative ARMA models, that is, , with ,, of size , is defined.(2)The number of bootstrap replications is chosen.(3)A bootstrap replication, , of the original time series is generated via method.(4)The competition set is iteratively fitted to so that values (one for each of the models in ) of the are computed and stored in the -dimension vector .(5)Minimum value is extracted from so that a winner model, , is selected; that is,(6)By repeating times steps () to (), the final model is chosen according to a mode-based criterion, that is, on the basis of its more frequent occurrence in the set of the bootstrap replications. In practice, the selected model is chosen according to the following rule:with the symbol being used as a counter of the number of the cases satisfying the inequality condition expressed in (24).

The order of the autoregression is chosen by iteratively computing the Ljung-Box statistic [34] on the residuals resulting from the fitting of tentative autoregression on the original time series with sample size . Further orders, say , , for increasing sample sizes, , , are selected according to the relation , where (in [6] is chosen by iteratively computing the spectral density on the residuals resulting from the fitting of tentative autoregression on the original time series; the order for which the spectral density is approximately constant is then selected).

The presented method is exhaustive and then highly computer intensive, as for all the possible pairs (in the attempt to reduce such a burden, sometimes, see, e.g., [35], the set of the ARMA orders under investigation is restricted to ; i.e., the competition set is made up of ; however, the fact that such an approach entails the obvious drawback of not being able to identify common processes, such as , has appeared to be a too strong limitation; therefore, in spite of its ability to drastically reduce the computational time, such an approach has not been followed here), the values of the given must be computed for each of the bootstrap replications.

#### 4. Empirical Study

In this section, the outcomes of a simulation study will be reported. It has been designed with the twofold purpose of (i) evaluating bMSE procedure’s small sample performances and (ii) giving some evidences of its behavior for increasing sample sizes. As a measure of performances, the percentage frequency of selection of the true order , in the sequel denoted as and for MSCE and bMSE procedure, respectively, has been adopted; that is,with denoting the number of the artificial time series employed in the experiment and the quantifier symbol, expressing the number of times the statement “time series correctly identified” is true. Its extension to the bootstrap case is straightforward.

Aspect (i) consists of a series of Monte Carlo experiments carried out on three different sets of time series, 10 for each set, detailed in Table 1, which () are realization of three prespecified ARMA orders, that is, (), (), and () (one order for each set), and () differ from each other, within the same set, only for the coefficients’ values, but not for the order (). Two sample sizes will be considered, that is, , 200. Formally, these sets are, respectively, denoted as and supposed to belong to the order subspace : . For each , 10 different coefficient vectors are specified, that is, . The validity of the presented method is assessed on comparative basis, using as benchmark the standard MSCE procedure. For the sake of concision, the values and will be computed averaging over all the DGPs belonging to either the same set or . In practice, two indicators, that is, the Percentage Average Discrepancy (PAD) and the Overall Percentage Average Discrepancy (OPAD), depending on weather only one set or the whole order subspace is considered, will be employed. They are formalized as follows:where with the symbol being the cardinality of a set is denoted. In other words, the average percentage differences in the frequency of selection of the true model is used as a measure of the gain/loss generated by bMSE procedure with regard to a single (26) or by averaging over the sets (27). As already outlined, in analyzing aspect (ii) the attention is focused on the behavior of the proposed method for increasing sample sizes, that is, , 200, 500, 1000. In Table 4, the results obtained for the case of 4 DGPs—detailed in the same table—will be given. In both (i) and (ii), for each , a set of time series has been generated. Each time series has been artificially replicated times using the bootstrap scheme outlined in Section 3.2 (the simulations have been implemented using the software R (8.1 version) and performed using the hardware resources of the University of California, San Diego; in particular, the computer server EULER (maintained by the Mathematical Department) and the supercomputer IBM-TERAGRID have been employed). The number of bootstrap replications employed has been chosen on empirical basis, as the best compromise between performances yielded by the method and computational time.