Abstract

The Akaike Information Criterion (AIC) based on least squares (LS) regression minimizes the sum of the squared residuals, and LS is sensitive to outlying observations. Alternative criteria that are less sensitive to outlying observations have been proposed; examples are the robust AIC (RAIC), the robust Mallows Cp (RCp), and the robust Bayesian information criterion (RBIC). In this paper, we propose a robust AIC obtained by replacing the scale estimate with a high breakdown point estimate of scale. The robustness of the proposed method is studied through its influence function. We show, through simulated and real data examples, that the proposed robust AIC is effective in selecting accurate models in the presence of outliers and high leverage points.

1. Introduction

The Akaike Information Criterion (AIC) [1] is a powerful technique for model selection, and it has been widely used for selecting models in many fields of study.

Consider the multiple linear regression model
$$ y_i = \beta_0 + \mathbf{x}_i^T\boldsymbol{\beta} + \varepsilon_i, \quad i = 1, \dots, n, \quad (1) $$
where $\mathbf{x}_i$ is a vector containing the explanatory variables, $y_i$ is the response variable, $\boldsymbol{\beta}$ is a vector of slope parameters, $\beta_0$ is the intercept parameter, and $\varepsilon_i$ is the error component, which is independent and identically distributed (iid) with mean 0 and variance $\sigma^2$. The classical AIC is defined as
$$ \mathrm{AIC} = n \log\!\big(\hat{\sigma}^2\big) + 2p, \quad (2) $$
where $\hat{\sigma}^2 = \mathrm{RSS}/n$, with $\mathrm{RSS} = \sum_{i=1}^{n}\hat{\varepsilon}_i^2$ the residual sum of squares and $p$ the number of fitted parameters.
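To make the computation concrete, here is a minimal sketch (in Python, with variable and function names of our own choosing) of the classical AIC in (2) for an LS fit:

```python
import numpy as np

def classical_aic(y, X):
    """Classical AIC for a least squares fit: n*log(RSS/n) + 2p, as in (2)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])      # add an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)         # residual sum of squares
    p = Xd.shape[1]                            # number of fitted parameters
    return n * np.log(rss / n) + 2 * p
```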

Since the LS estimator is vulnerable in the presence of outliers, it is not surprising that the AIC inherits this problem. Several robust alternatives to the AIC have been proposed in the literature (see [2–4]). For example, Ronchetti [3] proposed and investigated the properties of a robust version of the AIC with respect to $M$-estimation. A similar idea was used by Martin [2] for autoregressive models. More recently, Tharmaratnam and Claeskens [4] proposed robust versions of the AIC with respect to $S$-estimation and $MM$-estimation, generalizing the information criteria using full likelihood models. In contrast to these more involved approaches to deriving a robust AIC, we introduce a straightforward approach that focuses on modifying the estimate of the scale.

To set the idea, the influence of an outlier on the AIC is illustrated through the presence of outliers in the $y$-direction (called vertical outliers) or in the $x$-direction (called leverage points). For this, a contaminated point is added to the data, with its $y$ value ranging between −1.5 and 3. A similar approach is taken for leverage points, by contaminating the $x$ value instead (Figure 1). Table 1 and Figure 2 show that the value of the AIC increases as the size of the contamination increases, as expected, and if the outlying $y$ or $x$ value is extremely large, the AIC is unbounded; that is, it tends to infinity.
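The effect summarized in Table 1 can be reproduced along the following lines; the sample size, coefficients, and contamination sizes below are our own illustrative choices rather than the paper's exact configuration, and classical_aic is the helper from the previous sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

for shift in [0, 5, 10, 50, 100]:   # size of the vertical outlier
    yc = y.copy()
    yc[0] = y[0] + shift            # contaminate a single response value
    print(shift, round(classical_aic(yc, x), 2))
# The AIC grows without bound as the contamination grows, as in Table 1;
# contaminating x[0] instead illustrates the leverage point case.
```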

The remainder of the paper is organized as follows. Section 1.1 reviews some robust regression estimators. In Section 1.2 we review a robust version of the AIC; we discuss the robustness problem from the viewpoint of model selection and point out the sensitivity to leverage points of the robust AIC based on the $M$-estimator. We derive the influence function of the AIC and study its properties in Section 2. The performance of the robust AIC is evaluated and compared to the commonly used classical AIC in Section 3. Finally, concluding remarks are presented in Section 4.

1.1. Robust Regression Estimates

The $M$-estimator [5] is the value of $\boldsymbol{\beta}$ that minimizes the objective function
$$ \sum_{i=1}^{n} \rho\!\left(\frac{\varepsilon_i(\boldsymbol{\beta})}{\sigma}\right), \quad (3) $$
where $\rho$ is symmetric and nondecreasing on $[0,\infty)$, $\rho(0) = 0$, and $\rho$ is continuously differentiable almost everywhere. Furthermore, $\rho$ is a function that is less sensitive to outliers than the square, yielding the estimating equation
$$ \sum_{i=1}^{n} \psi\!\left(\frac{\varepsilon_i(\boldsymbol{\beta})}{\sigma}\right)\mathbf{x}_i = \mathbf{0}, $$
where $\psi = \rho'$. If we choose the $\rho$ function in (3) as the Tukey biweight function
$$ \rho(u) = \min\!\left\{1,\; 1 - \big(1 - (u/c)^2\big)^3\right\}, $$
where $c$ is chosen so that $E_{\Phi}[\rho(u)] = 1/2$, with $\Phi$ the standard normal cumulative distribution function (yielding $c = 1.547$), the resulting estimator is the biweight $S$-estimator. $M$-estimators are efficient and highly robust to unusual values of $y$, but one rogue leverage point can break them down completely. For this reason, generalized $M$-estimators were introduced, which solve
$$ \sum_{i=1}^{n} w(\mathbf{x}_i)\,\psi\!\left(\frac{\varepsilon_i(\boldsymbol{\beta})}{\sigma}\right)\mathbf{x}_i = \mathbf{0}, $$
where $w$ is a weight function that downweights high leverage points [6].
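As an illustration (assuming the statsmodels library is available; note that its default biweight tuning constant targets 95% efficiency rather than the 50% breakdown choice above), an M-fit can be contrasted with LS as follows:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:3] += 10.0                                 # a few vertical outliers

X = sm.add_constant(x)
m_fit = sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit()
ls_fit = sm.OLS(y, X).fit()
print("LS slope:", ls_fit.params[1], " M slope:", m_fit.params[1])
# The M-estimate resists the vertical outliers; a high-leverage outlier
# in x could still break it down, motivating the GM-estimators above.
```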

In recent years, a good deal of attention in the literature has focused on high breakdown methods, that is, methods that remain resistant even to multiple severe outliers. Many such methods are based on minimizing a more robust scale estimate than the sum of squared residuals. For example, Rousseeuw [7] proposed least median of squares (LMS), a high breakdown method that minimizes the median of the squared residuals rather than their sum. In addition, Rousseeuw [8] proposed least trimmed squares (LTS), which minimizes the sum of the $h$ smallest squared residuals,
$$ \sum_{i=1}^{h} \big(\hat{\varepsilon}^2\big)_{i:n}, $$
based on the ordered squared residuals $(\hat{\varepsilon}^2)_{1:n} \le \cdots \le (\hat{\varepsilon}^2)_{n:n}$. LTS converges at the rate $n^{-1/2}$, with the same asymptotic efficiency under normality as Huber's skip estimator. The convergence rate of LMS is $n^{-1/3}$, and its objective function is less smooth than that of LTS.
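A minimal sketch of LTS fitting is given below, using random elemental starts followed by concentration (C-) steps; this is a simplified stand-in for the FAST-LTS algorithm of Rousseeuw and Van Driessen, not the paper's implementation:

```python
import numpy as np

def lts_fit(y, X, h, n_starts=200, n_csteps=20, seed=0):
    """Crude LTS: random elemental starts plus concentration steps.
    Each C-step refits LS on the h points with smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]                            # assumes h >= p
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)   # elemental start
        for _ in range(n_csteps):
            beta, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
            r2 = (y - Xd @ beta) ** 2
            idx = np.argsort(r2)[:h]           # keep h smallest squared residuals
        obj = np.sort(r2)[:h].sum()            # LTS objective for this start
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```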

1.2. The Robust Version of AIC

Consider the scale estimate of the errors defined by $\hat{\sigma}^2 = \mathrm{RSS}/n$, with $\mathrm{RSS} = \sum_{i=1}^{n}\hat{\varepsilon}_i^2$ and $\hat{\varepsilon}_i = y_i - \hat{\beta}_0 - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}$. By expressing $\hat{\sigma}^2$ in (2) in terms of RSS, the AIC in (2) can be written as
$$ \mathrm{AIC} = n \log\!\left(\mathrm{RSS}/n\right) + 2p. $$
Notice that large values of the AIC indicate that the model (and hence the set of explanatory variables) is less successful in explaining the variation in the response, while a small value indicates an excellent fit to the response data.

Ronchetti [3] proposed a robust counterpart of the AIC statistic, denoted AICR. The extension of AIC to AICR is inspired by the extension of maximum likelihood estimation to $M$-estimation. The author derived AICR for an error distribution with density function proportional to $\exp(-\rho(\cdot))$. For a given constant $\alpha$ and a given function $\rho$, the author chose the model that minimizes
$$ \mathrm{AICR} = 2\sum_{i=1}^{n} \rho\!\left(\frac{\hat{\varepsilon}_i}{\hat{\sigma}}\right) + \alpha p, \quad (8) $$
where $\hat{\sigma}$ is some robust estimate of $\sigma$ and $\hat{\boldsymbol{\beta}}$ is the $M$-estimator defined as in (3). Huber [9] suggested suitable choices of $\rho$ and $\alpha$.
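A sketch of such a criterion follows; the tuning constant c and the penalty constant alpha below are illustrative placeholders rather than the recommended values (see [3] for principled choices), and the scale is taken directly from the M-fit:

```python
import numpy as np
import statsmodels.api as sm

def huber_rho(u, c=1.345):
    """Huber's rho function."""
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u**2, c * a - 0.5 * c**2)

def aic_ronchetti(y, X, alpha=2.0, c=1.345):
    """Ronchetti-style criterion: 2*sum(rho(resid/scale)) + alpha*p, as in (8)."""
    Xd = sm.add_constant(X)
    fit = sm.RLM(y, Xd, M=sm.robust.norms.HuberT(t=c)).fit()
    r = y - Xd @ fit.params
    sigma = fit.scale                      # robust scale from the M-fit
    p = Xd.shape[1]
    return 2 * np.sum(huber_rho(r / sigma, c)) + alpha * p
```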

We introduce an alternative robust version of the AIC by replacing $\hat{\sigma}$ in (8) with the robust LTS estimator of scale, which attains a 50% breakdown point. With $h = \lfloor n/2 \rfloor + \lfloor (p+1)/2 \rfloor$, the LTS criterion finds the estimate corresponding to the half sample having the smallest sum of squared residuals; as expected, the breakdown point is then 50%, and the estimated scale from LTS is
$$ \hat{\sigma}_{\mathrm{LTS}} = \sqrt{\frac{1}{h}\sum_{i=1}^{h} \big(\hat{\varepsilon}^2\big)_{i:n}}. $$
As for other robust estimators, the $M$-estimator and the biweight $S$-estimator are compared to least trimmed squares. Based on the results shown in Table 2, it is evident that the $M$-estimator is much more robust than LS but suffers from leverage points. The biweight $S$-estimator and LTS show robust behavior: AICBS is stable even as the size of the outliers increases. In the next section, we generalize these findings by computing the associated influence functions.
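The proposal can then be sketched as plugging the LTS scale into the AIC formula; the helper lts_fit is from the earlier sketch, and for simplicity no small-sample consistency factor is applied to the scale:

```python
import numpy as np

def aic_lts(y, X, seed=0):
    """AIC with the LTS scale plugged in: n*log(sigma_LTS^2) + 2p (a sketch)."""
    n = len(y)
    p = (1 if np.ndim(X) == 1 else X.shape[1]) + 1   # slopes + intercept
    h = n // 2 + (p + 1) // 2                        # maximal-breakdown choice of h
    beta, obj = lts_fit(y, X, h, seed=seed)
    sigma2 = obj / h                                 # trimmed scale estimate
    return n * np.log(sigma2) + 2 * p
```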

2. Influence Function

Consider the linear model in (1), for $i = 1, \dots, n$. Assume that the distribution of the errors satisfies $F_{\sigma}(u) = F(u/\sigma)$, where $\sigma$ is the residual scale parameter and $F$ is symmetric with a valid probability density function.

Let $\mathbf{x}$ and $\varepsilon$ be independent stochastic variables with joint distribution $H$. The functional $T = \big(\boldsymbol{\beta}(H), S(H)\big)$ is Fisher-consistent for the parameters $(\boldsymbol{\beta}, \sigma)$ at the model distribution $H$. For a Fisher-consistent scale estimator, $S(H) = \sigma$ for all $\sigma > 0$. In general, the influence function of a functional $T$ at the distribution $H$ is defined as
$$ \mathrm{IF}(z; T, H) = \lim_{\epsilon \downarrow 0} \frac{T\big((1-\epsilon)H + \epsilon\Delta_z\big) - T(H)}{\epsilon}, $$
where $T$ is the functional defined as the solution of the objective model and $(1-\epsilon)H + \epsilon\Delta_z$ is the distribution contaminated by an outlier at the point $z$ ($\Delta_z$ denoting the point mass at $z$). The following theorem gives the influence function of the AIC with an arbitrary scale $S$.

Theorem 1. Let $G$ be some distribution other than $H$. Take $z = (\mathbf{x}_0, y_0)$ and denote by $\varepsilon_0 = y_0 - \beta_0 - \mathbf{x}_0^T\boldsymbol{\beta}$ the error term of the model at $z$. Assume that the scale functional $S$ is differentiable, with partial derivatives equal to zero at the origin $(0,0)$. Then,
$$ \mathrm{IF}(z; \mathrm{AIC}, H) = \frac{2n}{\sigma}\,\mathrm{IF}(z; S, H), \quad (13) $$
where the quantities on the right-hand side are computed from the same model. (The proof is given in the Appendix.)

It is clear that the influence function in (13) is bounded if the influence function of the scale $S$ is also bounded. It is evident that the classical AIC is nonrobust, since the LS scale estimate has an unbounded influence function. The influence function of $M$-estimation with respect to $y$ is bounded by the choice of a bounded $\psi$, but it is unbounded in the $\mathbf{x}$-direction; that is,
$$ \mathrm{IF}\big(z; \hat{\boldsymbol{\beta}}_M, H\big) = \sigma\,\psi\!\left(\frac{\varepsilon_0}{\sigma}\right) M^{-1}\mathbf{x}_0, $$
where $\psi = \rho'$ and $M$ is a certain matrix given by
$$ M = E_H\!\left[\psi'\!\left(\frac{\varepsilon}{\sigma}\right)\mathbf{x}\mathbf{x}^T\right]. $$
The influence function of the AIC using the LTS estimator follows from Theorem 1 by substituting the influence function of the LTS scale into (13). We note that the influence function of AICLTS is bounded in both the $\mathbf{x}$- and $y$-directions, as the influence function of the LTS scale is bounded. Moreover, we conclude that the AIC with a high breakdown point estimator provides reliable fits to the data in the presence of leverage points.
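As a concrete instance of the boundedness argument (our own worked example, not taken from the paper), consider the classical standard deviation functional $S(H) = \sqrt{E_H[\varepsilon^2]}$. A direct computation gives
$$ \mathrm{IF}(z; S, H) = \frac{\varepsilon_0^2 - \sigma^2}{2\sigma}, $$
so (13) yields $\mathrm{IF}(z; \mathrm{AIC}, H) = n\big(\varepsilon_0^2 - \sigma^2\big)/\sigma^2$, which grows quadratically in $\varepsilon_0$ and is therefore unbounded; replacing $S$ by the LTS scale, whose influence function is bounded, bounds (13) in both directions.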

3. Numerical Examples

In this section, AICLTS, AICLMS, and AICBS are compared with the classical AIC and with AICR. In this study, 50 independent replicates of 3 independent uniform random variables $x_1$, $x_2$, and $x_3$, together with 50 independent normally distributed errors, were generated. The true model contains only the two variables $x_1$ and $x_2$, that is, $y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$, for $i = 1, \dots, 50$. We computed the AIC based on each of the following estimators: (i) LS estimation; (ii) the $M$-estimator; (iii) the BS estimator; (iv) the LTS estimator; and (v) the LMS estimator. In order to illustrate the robustness to outliers, we consider the following cases: (a) vertical outliers (outliers in the $y$-direction only), (b) good leverage points (outliers in both $x$ and $y$), and (c) bad leverage points (outliers in the $x$-direction only).

For all situations, we randomly replace 0%, 5%, 10%, 20%, 30%, and 40% of the observations with outliers generated from the respective contaminating distributions for the $y$- and $x$-directions. For each of these settings we simulate 1000 samples.
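A sketch of one replication of this design follows; since the exact contaminating distributions are not recoverable from the text, the shifted normals used here are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50

def simulate(eps, kind):
    """One sample with a fraction eps of contaminated cases."""
    X = rng.uniform(size=(n, 3))
    y = X[:, 0] + X[:, 1] + rng.normal(size=n)   # true model uses x1, x2
    m = int(eps * n)
    if kind == "vertical":                       # outliers in y only
        y[:m] += rng.normal(loc=20, scale=1, size=m)
    elif kind == "bad_leverage":                 # outliers in x only
        X[:m, 0] += rng.normal(loc=10, scale=1, size=m)
    elif kind == "good_leverage":                # outliers in x and y
        X[:m, 0] += rng.normal(loc=10, scale=1, size=m)
        y[:m] = X[:m, 0] + X[:m, 1] + rng.normal(size=m)
    return X, y
```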

3.1. Simulation Result

The resulting fit to the data is classified as one of the following (see also the sketch after this list): (i) correct fit (the true model); (ii) overfit (models containing all the variables in the true model plus other, redundant variables); (iii) underfit (models containing only a strict subset of the variables in the true model); (iv) wrong fit (models that are none of the above).
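A direct way to encode this classification (a sketch; the column indices 0 and 1 stand for $x_1$ and $x_2$ in the simulated design):

```python
def classify_fit(selected, true_set=frozenset({0, 1})):
    """Classify a selected variable subset against the true model {x1, x2}."""
    s = set(selected)
    if s == true_set:
        return "correct"
    if true_set < s:          # all true variables plus redundant ones
        return "overfit"
    if s < true_set:          # strict subset of the true variables
        return "underfit"
    return "wrong"            # neither of the above
```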

Tables 3, 4, and 5 show detailed simulation results for the different versions of the AIC. For uncontaminated datasets, the classical AIC performs best compared to the robust AICs. When vertical outliers are introduced, the classical AIC selects a large proportion of wrong models and, as expected, the robust AICs usually (i.e., with higher proportion) select the correct model.

For bad leverage points, we observe that the classical AIC tends to produce overfit and, with a high level of contamination, a higher proportion of wrong fits, while the AIC based on the $M$-estimator tends to produce either an underfit or a wrong fit model. The robust AIC based on high breakdown point estimates, however, retains comparable power in the presence of bad leverage points.

For good leverage points, the classical AIC also tends to produce overfit. On the other hand, the robust AICs tend to produce either a correct fit or an underfit model.

3.2. Example 2 (Stack Loss Data)

The stack loss data set was presented in [10]. It consists of 21 observations on three independent variables; it contains four outliers (cases 1, 3, 4, and 21) and high leverage points (cases 1, 2, 3, and 21). The data are given in Table 6.

We applied the traditional and robust versions of the AIC to these data. Table 7 shows that the classical AIC selects the full model and that the robust AIC based on the $M$-estimator ignores one of the two important variables, whereas the robust AIC methods based on high breakdown point estimators agree on the importance of the two variables $x_1$ and $x_2$.
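This comparison can be reproduced along the following lines, assuming the copy of the stack loss data shipped with statsmodels and reusing classical_aic (and, for the robust version, aic_lts) from the earlier sketches:

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

# Stack loss data, assuming the copy shipped with statsmodels.
d = sm.datasets.stackloss.load()
y = np.asarray(d.endog, dtype=float)
X = np.asarray(d.exog, dtype=float)   # air flow, water temp., acid conc.

# Exhaustive subset search under the classical AIC; swap classical_aic
# for aic_lts to compare the subsets favored by each criterion.
results = sorted(
    (classical_aic(y, X[:, list(cols)]), cols)
    for r in range(1, 4)
    for cols in combinations(range(3), r)
)
print("best (AIC, columns):", results[0])
```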

4. Conclusions

The least trimmed squares (LTS) and least median of squares (LMS) estimators are robust regression methods frequently used in practice. Nevertheless, they are not commonly used for selecting models. This paper introduced versions of the Akaike Information Criterion (AIC) based on the LTS and LMS scales, which are robust against outliers and leverage points. Our simulation results illustrated the excellent performance of AICLTS and AICLMS on contaminated data sets. This paper focused on the AIC variable selection criterion; one might be interested in extending other robust model selection criteria to high breakdown point estimation methods, such as the LTS, LMS, or BS estimators. In addition, this paper considered regression models with continuous variables; future studies might consider mixed variables (i.e., continuous and dummy).

Appendix

Proof of Theorem 1. Consider the AIC as a functional of the distribution,
$$ \mathrm{AIC}(H) = n \log\!\big(S^2(H)\big) + 2p, \quad (\mathrm{A.1}) $$
where $S$ is the scale functional, and let $H_{\epsilon} = (1-\epsilon)H + \epsilon\Delta_z$ denote the contaminated distribution, so that
$$ \frac{\partial}{\partial \epsilon} S(H_{\epsilon})\Big|_{\epsilon=0} = \mathrm{IF}(z; S, H). \quad (\mathrm{A.2}) $$
Inserting (A.2) into (A.1) yields
$$ \mathrm{IF}(z; \mathrm{AIC}, H) = \frac{\partial}{\partial \epsilon}\, n \log\!\big(S^2(H_{\epsilon})\big)\Big|_{\epsilon=0} = \frac{2n}{S(H)}\,\mathrm{IF}(z; S, H) = \frac{2n}{\sigma}\,\mathrm{IF}(z; S, H), $$
by the chain rule and the Fisher consistency $S(H) = \sigma$.

Conflict of Interests

The author declares that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The author would like to thank Professor Nor Aishah Hamzah and Dr. Rossita M. Yunus for their support in completing this study. The author is also grateful to the anonymous reviewers for their valuable comments on an earlier draft of this paper. This research was funded by the University of Malaya, under Grant no. RG208-11AFR.