Abstract

The essential task of risk investment is to select an optimal tracking portfolio among various portfolios. Statistically, this process can be achieved by choosing an optimal restricted linear model. This paper develops a statistical procedure to do so, based on selecting appropriate weights for averaging approximately restricted models. The method of weighted average least squares is adopted to estimate the approximately restricted models under a dependent-error setting. The optimal weights are selected by minimizing a k-class generalized information criterion (k-GIC), which is an estimate of the average squared error from the model average fit. This model selection procedure is shown to be asymptotically optimal in the sense of obtaining the lowest possible average squared error. Monte Carlo simulations illustrate that the suggested method achieves efficiency comparable to that of alternative model selection techniques.

1. Introduction

The essential task of risk investment is to select an optimal tracking portfolio among numerous portfolios of stocks. Given a desired target and a collection of stocks, a tracking portfolio can be formed from any nonempty subset of the given stocks so as to track the target to a certain degree. Because the number of nonempty subsets grows rapidly with the number of stocks, there is a vast collection of possible tracking portfolios. Among them, we should find the optimal tracking portfolio whose return is closest to the target's. Statistically, a tracking portfolio is built from a group of stocks, which is equivalent to fitting a restricted linear model with the target's return as the dependent variable and the returns on the stocks in the group as the regressors. Since the coefficient of a regressor indicates the proportion of the investment in the corresponding stock within the total investment in the portfolio, the linear model is restricted so that all coefficients in the model sum to one. Thus, the task of choosing an optimal tracking portfolio can be accomplished by selecting an optimal restricted linear model.

In this paper, a model averaging technique is developed for the selection problem of restricted linear models. Model selection has played an important role in econometrics and statistics over the past decades. The goal of model selection is to choose a model that fits the observed data well, so its investigation is an indispensable part of empirical analysis. This work proposes a procedure that minimizes a k-class generalized information criterion to select the optimal weights for constrained linear models. Under some regularity conditions, we examine the asymptotic behavior of the selection procedure.

Various methods have been suggested for the problems of model selection. Knight and Fu [1] discussed lasso-type estimators with least squares methods. To simultaneously estimate parameters and select important variables, Fan and Peng [2] proposed a nonconcave penalized likelihood method and demonstrated that this technique has an oracle property when the number of parameters is infinite. Zou and Yuan [3] investigated the oracle theory of model selection based on composite quantile regression. Caner [4] considered model selection with the generalized method of moments estimator. In the empirical likelihood framework, Tang and Leng [5] studied parametric estimation and variable selection for diverging numbers of parameters. Jennifer et al. [6] explored the ability of automatic selection algorithms to handle the selection of both variables and principal components.

Model averaging is another popular and widely used technique for model selection. The method averages the estimators corresponding to different candidate models. Bayesian and frequentist approaches are the two main schools of thought in model averaging. Although their spirit and objectives are similar, the two techniques differ in inference and in the selection of models. The basic paradigm of Bayesian model averaging was introduced by Leamer [7]. Owing to the difficulty of implementation, the approach was largely ignored until the 2000s. For recent developments of this method, readers may refer to Brown et al. [8] and Rodney and Herman [9]. Compared with Bayesian model averaging, the frequentist literature long focused on model selection rather than model averaging; frequentist model averaging has since been considered by many authors, for instance, Hjort and Claeskens [10], Hansen [11], Liang et al. [12], Zhang and Liang [13], and Hansen and Racine [14].

Generally speaking, different methods of model selection rest on distinct model selection criteria, including AIC [15], Mallows' $C_p$ [16], CV [17], BIC [18], GCV [19], the GMM J-statistic [20], and the criterion of [21]. Zhang et al. [22] employed the generalized information criterion for selecting regularization parameters. To choose the basis functions of splines, Xu and Huang [23] showed the optimality of a LsoCV criterion and designed an efficient Newton-type algorithm for it. Focusing on the Kullback-Leibler divergence, So and Ando [24] defined a generalized predictive information criterion using a bias correction of an expected weighted log-likelihood estimator. Groen and Kapetanios [25] examined the AIC and BIC criteria to discuss consistent estimation of a factor-augmented regression.

The literature mentioned above is mainly concerned with unconstrained models with independently and identically distributed random errors. Recently, Lai and Xie [26] discussed model selection for constrained models, but their results were limited to homoscedastic cases. Instead of using unrestricted or homoscedastic models, we develop a k-class generalized information criterion (k-GIC) for the selection of approximately constrained linear models with dependent errors. The k-GIC extends the generalized information criterion proposed by Shao [27] and includes some conventional model selection criteria, such as BIC and GIC, as special cases. We employ the technique of weighted average least squares to estimate the approximately constrained models and choose the weights by minimizing the k-class generalized information criterion. Our main result demonstrates that the k-class generalized information criterion is asymptotically equivalent to the average squared error; in other words, the weights selected by the k-GIC are asymptotically optimal. Moreover, we highlight two new results which enrich the work of Lai and Xie [26]. One is that an estimate of the error variance is given and proved to be consistent. The other is that the weights selected by the k-GIC remain asymptotically optimal when the true variance is replaced by the suggested estimate. The finite sample properties of the model selection procedure are examined by Monte Carlo simulation. The simulation results reveal that the proposed method dominates some alternative approaches.

The remainder of this paper begins with an illustration of the model set-up and the average estimator in Section 2. Section 3 calculates the average squared error of the model average estimator. The k-GIC is introduced and its asymptotic optimality is derived in Section 4. Section 5 reports simulation evidence, and Section 6 concludes.

2. Model Set-Up and Average Estimator

The core task in risk investment is to build a tracking portfolio of stocks whose return mimics that of a chosen investment target. Let $y_t$ be the return at time $t$ from investing in a selected target and $x_{it}$ be the historical return of the $i$th stock at time $t$. Assume that $p$ stocks are available for building a tracking portfolio of the target. Then, a tracking portfolio consisting of all $p$ stocks can be represented by
$$y_t = \sum_{i=1}^{p} \beta_i x_{it} + e_t, \quad t = 1, \ldots, n, \quad (1)$$
where the $\beta_i$ are unknown parameters and $e_t$ is a random error. The left-hand side of (1) is the return of investing one dollar in the target. The right-hand side is the return of investing one dollar in the portfolio consisting of all $p$ stocks plus random noise.

Because each parameter $\beta_i$ stands for the proportion of the investment in the corresponding stock to the total investment in the tracking portfolio, the sum of all parameters is one, namely,
$$\sum_{i=1}^{p} \beta_i = 1, \quad (2)$$
which represents 100 percent of the whole investment.

In practice, there exists a large number of stocks. These stocks compose various portfolios that may track the target to some degree. Among all possible tracking portfolios, an ideal tracking portfolio should be the one whose return is closest to the target's return. Therefore, we need to find such an optimal tracking portfolio. Because of the correspondence between a tracking portfolio and a restricted linear model, the aim of finding the optimal tracking portfolio can be accomplished by choosing an optimal restricted linear model.

In the following, we extend models (1) and (2) to a generalized constrained linear model for the problem of building an optimal tracking portfolio. Suppose that $(y_t, x_t, z_t)$, $t = 1, \ldots, n$, is a random observation at the fixed index $t$, where $x_t = (x_{1t}, \ldots, x_{pt})^\top$ is a fixed-dimensional explanatory variable and $z_t = (z_{1t}, z_{2t}, \ldots)^\top$ is a countably infinite-dimensional explanatory variable. Consider the constrained linear model
$$y_t = \sum_{i=1}^{p} \beta_i x_{it} + \sum_{j=1}^{\infty} \alpha_j z_{jt} + e_t, \quad (3)$$
$$\beta_i = b_i + \sum_{j=1}^{\infty} c_{ij} \alpha_j, \quad i = 1, \ldots, p, \quad (4)$$
where $p$ is a positive integer, $\beta_i$ and $\alpha_j$ are parameters, $e_t$ is a random error, $c_{ij}$ is a constant for restricting the $i$th parameter $\beta_i$, and the $b_i$ are some constants. Assume that $\sum_{j=1}^{\infty} \alpha_j z_{jt}$ and $\sum_{j=1}^{\infty} c_{ij}\alpha_j$ converge in mean square.

In (3), the explanatory variable $x_{it}$ is included in the model on theoretical grounds or for other reasons, and $z_{jt}$ is an additional explanatory variable whose inclusion in the model needs to be determined. In the context of building a tracking portfolio, the fixed explanatory variable $x_{it}$ stands for the historical return of the $i$th required stock at time $t$, which must be selected by investors because of their personal preference or the stable earnings of this stock. In a tracking portfolio, investors need to select some alternative stocks from numerous candidates to realize their expected return. So, the additional explanatory variable $z_{jt}$ indicates the historical return of the $j$th alternative stock at time $t$. Since $\sum_{j=1}^{\infty}\alpha_j z_{jt}$ can be viewed as a series expansion, the identity (3) includes semiparametric models as a special form. In fact, model (3) generalizes the models considered by Lai and Xie [26] and Liang et al. [12]. In addition, the parameters $\beta_i$ and $\alpha_j$ denote, respectively, the proportions of the $i$th required stock and the $j$th alternative stock in a tracking portfolio. In (4), the parameter $\beta_i$ is adjusted by a linear combination of the $\alpha_j$ and the constant $b_i$. The economic significance of (4) is that the proportion of each fixed investment varies with the proportions of all alternative investments. When investors change their preferences or acquire new information on alternative stocks, they are able to adjust the proportions between required stocks and alternative stocks according to (4). This implies that the increase or decrease of alternative stocks can affect the proportion of each required stock in a portfolio. Besides, if we assume that $c_{ij} = 0$, $j = 1, 2, \ldots$, model (4) becomes $\beta_i = b_i$, which has been discussed by Lai and Xie [26]. Particularly, if we set $\alpha_j = 0$, $c_{ij} = 0$, and $\sum_{i=1}^{p} b_i = 1$, the restricted equation (4) turns into (2).

Denote an index set $\mathcal{M} = \{1, 2, \ldots, M\}$, where $M$ is a positive integer. Let $\alpha = (\alpha_1, \alpha_2, \ldots)^\top$ and $z_t = (z_{1t}, z_{2t}, \ldots)^\top$, where "$\top$" stands for the transpose operation. Due to the uncertain number of $z_{jt}$ in formula (3), we consider a sequence of approximately restricted models (3) and (4) indexed by $m \in \mathcal{M}$, where the $m$th model includes the first $m$ elements of $z_t$, that is, $z_{t(m)} = (z_{1t}, \ldots, z_{mt})^\top$, and the parameter $\beta_i$ is restricted by a linear combination of the elements of $\alpha_{(m)} = (\alpha_1, \ldots, \alpha_m)^\top$ and a constant $b_i$. Hence, the $m$th approximately restricted models (3) and (4) are
$$y_t = \sum_{i=1}^{p} \beta_i x_{it} + \sum_{j=1}^{m} \alpha_j z_{jt} + e_{t(m)}, \quad (5)$$
$$\beta_i = b_i + \sum_{j=1}^{m} c_{ij}\alpha_j + \delta_{i(m)}, \quad i = 1, \ldots, p, \quad (6)$$
where $e_{t(m)}$ and $\delta_{i(m)}$ are the approximation errors.

Set $y = (y_1, \ldots, y_n)^\top$, $e_{(m)} = (e_{1(m)}, \ldots, e_{n(m)})^\top$, $\beta = (\beta_1, \ldots, \beta_p)^\top$, $b = (b_1, \ldots, b_p)^\top$, and $\delta_{(m)} = (\delta_{1(m)}, \ldots, \delta_{p(m)})^\top$. By matrix notation, the $m$th approximately restricted models (5) and (6) can be rewritten as
$$y = X\beta + Z_{(m)}\alpha_{(m)} + e_{(m)}, \quad (7)$$
$$\beta = b + C_{(m)}\alpha_{(m)} + \delta_{(m)}, \quad (8)$$
where $X$ is an $n \times p$ matrix whose $(t, i)$th element is $x_{it}$, $Z_{(m)}$ is an $n \times m$ matrix whose $(t, j)$th element is $z_{jt}$, $C_{(m)}$ is the $p \times m$ matrix whose $(i, j)$th element is $c_{ij}$, and $B$ is a $p \times p$ diagonal matrix whose $i$th diagonal element is $b_i$, so that $b = B\mathbf{1}_p$.

Hypothesize that $e = (e_1, \ldots, e_n)^\top$ satisfies $E(e \mid X, Z) = 0$ and its conditional covariance matrix is
$$E(ee^\top \mid X, Z) = \Omega,$$
in which the $(t, s)$th element of $\Omega$ is $\omega_{ts} = \mathrm{cov}(e_t, e_s \mid X, Z)$, with $\omega_{tt} = \sigma_t^2$ and $\sup_t \sigma_t^2 < \infty$. Clearly, the random errors follow a heteroscedastic stationary Gaussian process.

Substituting (8) into (7) yields
$$\tilde{y} = \tilde{Z}_{(m)}\alpha_{(m)} + \tilde{e}_{(m)}, \quad (9)$$
where $\tilde{y} = y - Xb$, $\tilde{Z}_{(m)} = Z_{(m)} + XC_{(m)}$, and $\tilde{e}_{(m)} = X\delta_{(m)} + e_{(m)}$. By the method of least squares, the estimator of $\alpha_{(m)}$ is
$$\hat{\alpha}_{(m)} = \big(\tilde{Z}_{(m)}^\top \tilde{Z}_{(m)}\big)^{-1}\tilde{Z}_{(m)}^\top \tilde{y},$$
where $(\tilde{Z}_{(m)}^\top \tilde{Z}_{(m)})^{-1}$ denotes the inverse of $\tilde{Z}_{(m)}^\top \tilde{Z}_{(m)}$. In the $m$th approximating model (7), we set $\mu = E(\tilde{y} \mid X, Z)$, so that $\tilde{y} = \mu + e$. Thus, the estimator of $\mu$ is
$$\hat{\mu}_{(m)} = \tilde{Z}_{(m)}\hat{\alpha}_{(m)} = P_{(m)}\tilde{y},$$
where $P_{(m)} = \tilde{Z}_{(m)}(\tilde{Z}_{(m)}^\top \tilde{Z}_{(m)})^{-1}\tilde{Z}_{(m)}^\top$ and $P_{(m)}$ is the "hat" matrix.
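To make the estimation step concrete, the following minimal sketch implements the restricted least squares fit of the $m$th approximating model under the reconstructed forms (7)-(9) above; the names `X`, `Z_m`, `C_m`, and `b` are illustrative assumptions rather than quantities fixed by the paper.

```python
# Minimal sketch of the m-th restricted least squares fit, assuming
# y = X beta + Z_m alpha_m + e with beta = b + C_m alpha_m as in (7)-(8).
import numpy as np

def restricted_ls(y, X, Z_m, C_m, b):
    """Return (alpha_hat, mu_hat, P_m) for the m-th approximating model."""
    y_tilde = y - X @ b                 # remove the constant part of the restriction
    Z_tilde = Z_m + X @ C_m             # effective regressors after substituting (8)
    G = np.linalg.inv(Z_tilde.T @ Z_tilde)
    alpha_hat = G @ (Z_tilde.T @ y_tilde)
    P_m = Z_tilde @ G @ Z_tilde.T       # "hat" matrix of the m-th model
    mu_hat = P_m @ y_tilde              # estimator of mu in the m-th model
    return alpha_hat, mu_hat, P_m
```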

Let $w = (w_1, \ldots, w_M)^\top$ be a weight vector, where $w_m \geq 0$, $m = 1, \ldots, M$. Define a weight set as
$$\mathcal{H}_n = \Big\{ w \in \big\{0, \tfrac{1}{N}, \tfrac{2}{N}, \ldots, 1\big\}^M : \sum_{m=1}^{M} w_m = 1 \Big\},$$
where $N$ is a positive integer.

For all $w \in \mathcal{H}_n$, the weighted average estimator of $\alpha$ is
$$\hat{\alpha}(w) = \sum_{m=1}^{M} w_m \hat{\alpha}_{(m)}.$$

Naturally, the weighted average estimator of $\beta$ is
$$\hat{\beta}(w) = \sum_{m=1}^{M} w_m \hat{\beta}_{(m)}, \qquad \hat{\beta}_{(m)} = b + C_{(m)}\hat{\alpha}_{(m)}.$$

Furthermore, the weighted average estimator of $\mu$ is
$$\hat{\mu}(w) = \sum_{m=1}^{M} w_m \hat{\mu}_{(m)} = P(w)\tilde{y},$$
where $P(w) = \sum_{m=1}^{M} w_m P_{(m)}$. It can be seen that the weighted estimator $\hat{\mu}(w)$ is an average of the estimators $\hat{\mu}_{(m)}$, $m = 1, \ldots, M$. The weighted "hat" matrix $P(w)$ depends on the nonrandom regressors and the weight vector $w$. In general, the matrix $P(w)$ is symmetric, but not idempotent.
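As a small illustration of the averaging step, the sketch below combines the candidate fits with a given weight vector; it assumes the per-model hat matrices from the previous sketch and is not the paper's own implementation.

```python
# Sketch of the weighted average fit mu_hat(w) = P(w) y_tilde with
# P(w) = sum_m w_m P_m; hat_matrices holds P_1, ..., P_M.
import numpy as np

def average_fit(w, hat_matrices, y_tilde):
    P_w = sum(w_m * P_m for w_m, P_m in zip(w, hat_matrices))
    return P_w @ y_tilde, P_w           # averaged fit and weighted hat matrix
```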

For a positive integer $h$, let $\omega_h$ be the maximal value of $|\mathrm{cov}(e_t, e_{t+h})|$, $t = 1, \ldots, n - h$, and let $\lambda_j(\Omega)$, $j = 1, \ldots, n$, be the nonzero eigenvalues of $\Omega$. Assume that both $\{\omega_h\}$ and $\{\lambda_j(\Omega)\}$ are summable, namely,
$$\sum_{h=1}^{\infty} \omega_h < \infty, \qquad \sum_{j} \lambda_j(\Omega) < \infty.$$

Since the covariance matrix $\Omega$ determines the algebraic structure of the model average estimator, we discuss its properties in the following.

Lemma 1. For any $n$-dimensional vectors $u$ and $v$, one has $|u^\top \Omega v| \leq c_0\,\|u\|\,\|v\|$ for a constant $c_0 < \infty$, where $\|\cdot\|$ is the Euclidean norm.

Proof. Through the definition of $\Omega$, one has
$$u^\top \Omega v = \sum_{t=1}^{n}\sum_{s=1}^{n} u_t\,\omega_{ts}\,v_s.$$
Applying the Cauchy-Schwarz inequality, for the double sum, one gets
$$|u^\top \Omega v| \leq \Big(\sum_{t,s}|\omega_{ts}|\,u_t^2\Big)^{1/2}\Big(\sum_{t,s}|\omega_{ts}|\,v_s^2\Big)^{1/2}.$$
Similarly, $\sum_{t,s}|\omega_{ts}|\,u_t^2 \leq \big(\sup_t \sigma_t^2 + 2\sum_{h \geq 1}\omega_h\big)\|u\|^2$ holds. Therefore, $|u^\top \Omega v| \leq c_0\,\|u\|\,\|v\|$ with $c_0 = \sup_t \sigma_t^2 + 2\sum_{h \geq 1}\omega_h$, which is finite by the summability assumption.

Because the matrix $P(w)$ plays an important role in analyzing the problems of model selection, we state some of its properties. We set $A(w) = I - P(w)$. Let $\lambda(\cdot)$ and $\lambda_{\max}(\cdot)$ denote an eigenvalue and the largest eigenvalue of a matrix, respectively.

Lemma 2. One has (i) $\lambda_{\max}(P(w)) \leq 1$ and (ii) $\mathrm{tr}(P(w)) = \sum_{m=1}^{M} w_m\,\mathrm{tr}(P_{(m)})$, where $\mathrm{tr}(\cdot)$ denotes the trace operation.

Proof. It follows from $P(w) = \sum_{m=1}^{M} w_m P_{(m)}$ and the linearity of the trace that (ii) is established. Next, we consider (i). Without loss of generality, let $\lambda_j$ and $v_j$, $j = 1, \ldots, n$, be the eigenvalues and orthonormal eigenvectors of $P_{(m)}$, respectively. Observe that $P_{(m)}$ is idempotent, which implies that $\lambda_j \in \{0, 1\}$, $j = 1, \ldots, n$. For any $x$ with $\|x\| = 1$, it can be seen that $x^\top P_{(m)} x = \sum_j \lambda_j (v_j^\top x)^2 \leq \sum_j (v_j^\top x)^2 = 1$, where $\lambda_j \geq 0$. Due to $w_m \geq 0$, $\sum_m w_m = 1$, it means that
$$x^\top P(w) x = \sum_{m=1}^{M} w_m\, x^\top P_{(m)} x \leq 1.$$

From Lemma 2, we know that $A(w) = I - P(w)$ is nonnegative definite.

Lemma 3. Let $N(\delta)$ denote the number of eigenvalues of $P(w)$ that are greater than $\delta > 0$. Then $N(\delta) \leq \delta^{-1}\,\mathrm{tr}(P(w))$.

Proof. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ be the eigenvalues of $P(w)$. If $\lambda_1 \leq \delta$, we know that Lemma 3 holds trivially. Otherwise, let $N = N(\delta)$, so that $\lambda_j > \delta$ for $j = 1, \ldots, N$. It is easy to see that $N\delta < \sum_{j=1}^{N}\lambda_j$, where $\lambda_j \geq 0$ for all $j$. Thus, it can be shown that $N\delta < \sum_{j=1}^{n}\lambda_j = \mathrm{tr}(P(w))$, which implies that $N(\delta) \leq \delta^{-1}\,\mathrm{tr}(P(w))$.

Lemma 4. For any $w \in \mathcal{H}_n$, there exists a constant $c_1 < \infty$ such that $\mathrm{tr}(P(w)\Omega) \leq c_1\,\mathrm{tr}(P(w))$, $\mathrm{tr}(P(w)\Omega P(w)) \leq c_1\,\mathrm{tr}(P(w)P(w))$, and $\mathrm{tr}\big((P(w)\Omega)^2\big) \leq c_1^2\,\mathrm{tr}(P(w)P(w))$, in which $c_1 = \lambda_{\max}(\Omega)$.

Proof. Notice that $\mathrm{tr}(AB) \leq \lambda_{\max}(A)\,\mathrm{tr}(B)$ and $\lambda_{\max}(AB) \leq \lambda_{\max}(A)\,\lambda_{\max}(B)$, where $A$ and $B$ are any positive semidefinite matrices.
Since $P(w)$ and $\Omega$ are symmetric and nonnegative definite, the first inequality is established by $\mathrm{tr}(P(w)\Omega) \leq \lambda_{\max}(\Omega)\,\mathrm{tr}(P(w)) = c_1\,\mathrm{tr}(P(w))$. Let $\lambda_j$ be the eigenvalues of $\Omega$. From the symmetric property of $\Omega$, we know that there exists an orthogonal matrix $Q$ such that $\Omega = Q^\top \Lambda Q$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Then, it yields $\mathrm{tr}(P(w)\Omega P(w)) = \mathrm{tr}(\Omega P(w)P(w))$, where $P(w)P(w)$ is symmetric and nonnegative definite. The second inequality holds because
$$\mathrm{tr}\big(P(w)\Omega P(w)\big) = \mathrm{tr}\big(\Omega P(w)P(w)\big) \leq \lambda_{\max}(\Omega)\,\mathrm{tr}\big(P(w)P(w)\big) = c_1\,\mathrm{tr}\big(P(w)P(w)\big).$$
For the last formula, we note that
$$\mathrm{tr}\big((P(w)\Omega)^2\big) = \mathrm{tr}\big(\Omega P(w)\Omega P(w)\big) \leq \lambda_{\max}(\Omega)\,\mathrm{tr}\big(P(w)\Omega P(w)\big) \leq c_1^2\,\mathrm{tr}\big(P(w)P(w)\big).$$
This completes the proof of Lemma 4.

Lemma 5. Let $A$ be a symmetric matrix and $\xi$ be a Gaussian random vector with zero expectation and covariance matrix $\Omega$. Then, one has $\mathrm{Var}(\xi^\top A \xi) = 2\,\mathrm{tr}(A\Omega A\Omega)$, where $\mathrm{Var}$ is the operation of variance.

Proof. The proof of this lemma is provided in Lai and Xie [26].

3. Average Squared Error

Denote an average squared error by
$$L_n(w) = \frac{1}{n}\,\|\mu - \hat{\mu}(w)\|^2.$$
In some treatments, the first $k_0$ observations are excluded from this sum, where $k_0$ is a fixed positive integer which is often used to eliminate the boundary effect. Andrews [28] suggested that $k_0$ can take the value of zero when the errors obey an independent and identical distribution with variance $\sigma^2$. The most common situation is $k_0 = 0$. The average squared error can be viewed as a measure of accuracy between $\hat{\mu}(w)$ and $\mu$. Obviously, an optimal estimator attains the minimum value of $L_n(w)$. In other words, we should select a weight vector from $\mathcal{H}_n$ to make the average squared error as small as possible.
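In a simulation, where the true mean vector is known, the average squared error can be evaluated directly; a minimal sketch (with `mu` and `mu_hat_w` as assumed inputs) follows.

```python
# L_n(w) = ||mu - mu_hat(w)||^2 / n, computable only when mu is known.
import numpy as np

def average_squared_error(mu, mu_hat_w):
    return np.mean((mu - mu_hat_w) ** 2)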

In order to investigate the problem of weight selection, we give the expression of the conditional expected average squared error as follows:
$$R_n(w) = E\big(L_n(w) \mid X, Z\big).$$

From the definition of $R_n(w)$, the following lemma can be obtained.

Lemma 6. The conditional expected average squared error can be rewritten as
$$R_n(w) = \frac{1}{n}\Big(\|A(w)\mu\|^2 + \mathrm{tr}\big(P(w)\Omega P(w)\big)\Big),$$
where $A(w) = I - P(w)$.

Proof. A straightforward calculation of $\|\mu - \hat{\mu}(w)\|^2$ leads to
$$\|\mu - \hat{\mu}(w)\|^2 = \|A(w)\mu\|^2 - 2\,\mu^\top A(w)P(w)e + e^\top P(w)P(w)e.$$
Using $E(e \mid X, Z) = 0$ and taking conditional expectations on both sides of the above equality gives rise to Lemma 6.

4. The k-Class Generalized Information Criterion and Asymptotic Optimality

As the value of $\mu$ is unknown, the average squared error cannot be used directly to select the weight vector $w$. Thus, we suggest a k-class generalized information criterion (k-GIC) to choose the weight vector. Further, we will prove that the weight vector selected by the k-GIC asymptotically minimizes the average squared error $L_n(w)$.

The k-GIC for the restricted model average estimator is
$$G_n(w) = \frac{1}{n}\,\|\tilde{y} - \hat{\mu}(w)\|^2 + \frac{k}{n}\,\sigma^2\,\mathrm{tr}\big(P(w)\big),$$
where $k$ is larger than one and satisfies assumption (35) mentioned below. The k-class generalized information criterion extends the generalized information criterion (GIC) proposed by Shao [27]. Because $k$ can take different values, the k-GIC includes some common information criteria for model selection, such as the Mallows criterion ($k = 2$) and the BIC criterion ($k = \log n$), among others.
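The following hedged sketch evaluates the Mallows-type form of the criterion assumed above, $G_n(w) = \|\tilde{y} - \hat{\mu}(w)\|^2/n + k\,\sigma^2\,\mathrm{tr}(P(w))/n$; the exact penalty in the paper may differ, so this is illustrative only.

```python
# Hedged sketch of a k-GIC evaluation under the Mallows-type form assumed above.
import numpy as np

def k_gic(y_tilde, mu_hat_w, P_w, sigma2, k):
    n = len(y_tilde)
    rss = np.sum((y_tilde - mu_hat_w) ** 2) / n      # goodness-of-fit term
    penalty = k * sigma2 * np.trace(P_w) / n         # complexity penalty
    return rss + penalty
```

Setting `k=2` recovers a Mallows-type criterion, while `k=np.log(n)` gives a BIC-type penalty.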

Lemma 7. If $E(e \mid X, Z) = 0$, we have
$$E\big(G_n(w) \mid X, Z\big) = R_n(w) + \frac{1}{n}\,\mathrm{tr}(\Omega) + \frac{1}{n}\Big(k\,\sigma^2\,\mathrm{tr}\big(P(w)\big) - 2\,\mathrm{tr}\big(P(w)\Omega\big)\Big).$$

Proof. Recalling the definition of $G_n(w)$, one gets
$$\tilde{y} - \hat{\mu}(w) = A(w)\mu + A(w)e. \quad (31)$$
Observe that
$$\|A(w)\mu + A(w)e\|^2 = \|A(w)\mu\|^2 + 2\,\mu^\top A(w)A(w)e + e^\top A(w)A(w)e. \quad (32)$$
Notice that $E\big(\mu^\top A(w)A(w)e \mid X, Z\big) = 0$ and $E\big(e^\top A(w)A(w)e \mid X, Z\big) = \mathrm{tr}\big(A(w)A(w)\Omega\big)$. Moreover, we have that $A(w)A(w) = I - 2P(w) + P(w)P(w)$ and $\mathrm{tr}\big(A(w)A(w)\Omega\big) = \mathrm{tr}(\Omega) - 2\,\mathrm{tr}\big(P(w)\Omega\big) + \mathrm{tr}\big(P(w)\Omega P(w)\big)$.
Thus, taking conditional expectations on both sides of (31), we obtain
$$E\big(G_n(w) \mid X, Z\big) = \frac{1}{n}\Big(\|A(w)\mu\|^2 + \mathrm{tr}\big(A(w)A(w)\Omega\big) + k\,\sigma^2\,\mathrm{tr}\big(P(w)\big)\Big). \quad (33)$$
It follows from (33) and Lemma 6 that Lemma 7 is established as desired.

Lemma 7 shows that $G_n(w)$ is equivalent to the conditional expected average squared error plus an error bias. Particularly, as $n$ approaches infinity, the bias term vanishes and $G_n(w)$ becomes an asymptotically unbiased estimate of $R_n(w) + \mathrm{tr}(\Omega)/n$, where the additional term does not depend on $w$.

The k-GIC criterion is defined so as to select the optimal weight vector. The optimal weight vector $\hat{w}$ is chosen by minimizing $G_n(w)$:
$$\hat{w} = \arg\min_{w \in \mathcal{H}_n} G_n(w).$$

Obviously, the well-posed estimators of the parameters are $\hat{\alpha}(\hat{w})$ and $\hat{\beta}(\hat{w})$ with the weight vector $\hat{w}$. Under some regular conditions, we intend to demonstrate that the selection procedure is asymptotically optimal in the following sense:
$$\frac{L_n(\hat{w})}{\inf_{w \in \mathcal{H}_n} L_n(w)} \stackrel{p}{\longrightarrow} 1. \quad (34)$$

If formula (34) holds, we know that the weight vector $\hat{w}$ selected by the k-GIC asymptotically realizes the minimum value of $L_n(w)$. In other words, $\hat{w}$ is asymptotically equivalent to the weight vector obtained by minimizing $L_n(w)$ and is the optimal weight vector for the average estimator. The asymptotic optimality of $\hat{w}$ can be established under the following assumptions.
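A practical way to carry out the minimization is to search over the weight set. The sketch below solves the continuous relaxation of the problem with scipy's SLSQP solver; since the weight set defined above is discrete, this is an approximation, and all names and the criterion form follow the illustrative assumptions of the earlier sketches.

```python
# Sketch of weight selection: minimize the assumed k-GIC form over the
# simplex {w : w_m >= 0, sum_m w_m = 1} as a continuous relaxation.
import numpy as np
from scipy.optimize import minimize

def select_weights(hat_matrices, y_tilde, sigma2, k):
    M, n = len(hat_matrices), len(y_tilde)

    def objective(w):
        P_w = sum(w_m * P_m for w_m, P_m in zip(w, hat_matrices))
        rss = np.sum((y_tilde - P_w @ y_tilde) ** 2) / n
        return rss + k * sigma2 * np.trace(P_w) / n

    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * M
    w0 = np.full(M, 1.0 / M)            # start from equal weights
    res = minimize(objective, w0, method='SLSQP', bounds=bounds, constraints=cons)
    return res.x
```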

Assumptions. Write $\xi_n = \inf_{w \in \mathcal{H}_n} n R_n(w)$. As $n \to \infty$, we assume that
$$\xi_n^{-1}\, k \sup_{w \in \mathcal{H}_n} \mathrm{tr}\big(P(w)\big) \longrightarrow 0, \quad (35)$$
$$\sum_{m=1}^{M} \big(n R_n(w_m^0)\big)^{-G} \longrightarrow 0 \quad (36)$$
for a large constant $G$ and any positive integer $m \leq M$, where $w_m^0$ is the weight vector whose $m$th element is one and all others are zero.
The above assumptions have been employed in much of the literature on model selection. For instance, expression (35) was used by Shao [27], and formula (36) was adopted by Li [29], Andrews [28], Shao [27], and Hansen [11].
The following lemma offers a bridge for proving the asymptotic optimality of $\hat{w}$.

Lemma 8. We have
$$G_n(w) = L_n(w) + \frac{1}{n}\,\|e\|^2 + \frac{2}{n}\Big(e^\top A(w)\mu - e^\top P(w)e\Big) + \frac{k}{n}\,\sigma^2\,\mathrm{tr}\big(P(w)\big). \quad (38)$$

Proof. Using $\tilde{y} = \mu + e$, $\mu - \hat{\mu}(w) = A(w)\mu - P(w)e$, and the definition of $G_n(w)$, one obtains
$$\|\tilde{y} - \hat{\mu}(w)\|^2 = \|\mu - \hat{\mu}(w)\|^2 + 2\,e^\top\big(\mu - \hat{\mu}(w)\big) + \|e\|^2 = n\,L_n(w) + 2\,e^\top A(w)\mu - 2\,e^\top P(w)e + \|e\|^2.$$
This completes the proof of Lemma 8.

Lemma 8 and the fact that $\|e\|^2/n$ does not depend on $w$ imply that
$$\arg\min_{w \in \mathcal{H}_n} G_n(w) = \arg\min_{w \in \mathcal{H}_n}\Big\{L_n(w) + \frac{2}{n}\big(e^\top A(w)\mu - e^\top P(w)e\big) + \frac{k}{n}\,\sigma^2\,\mathrm{tr}\big(P(w)\big)\Big\}. \quad (39)$$

The goal is to choose $w$ by minimizing $L_n(w)$. From (39), one only needs to select $w$ through minimizing
$$L_n(w) + \Delta_n(w), \quad (40)$$
where $\Delta_n(w) = \frac{2}{n}\,e^\top A(w)\mu - \frac{2}{n}\,e^\top P(w)e + \frac{k}{n}\,\sigma^2\,\mathrm{tr}(P(w))$.

Compared with $L_n(w)$, it is sufficient to establish that the terms of $\Delta_n(w)$ are uniformly negligible for any $w \in \mathcal{H}_n$. More specifically, to prove (34), we need to check
$$\sup_{w \in \mathcal{H}_n} \frac{|e^\top A(w)\mu|}{n\,R_n(w)} \stackrel{p}{\longrightarrow} 0, \quad (41)$$
$$\sup_{w \in \mathcal{H}_n} \frac{\big|2\,e^\top P(w)e - k\,\sigma^2\,\mathrm{tr}(P(w))\big|}{n\,R_n(w)} \stackrel{p}{\longrightarrow} 0, \quad (42)$$
$$\sup_{w \in \mathcal{H}_n} \Big|\frac{L_n(w)}{R_n(w)} - 1\Big| \stackrel{p}{\longrightarrow} 0, \quad (43)$$
where "$\stackrel{p}{\longrightarrow}$" denotes the convergence in probability.

Following the idea of Li [29], we verify the main result of our work, namely that minimizing the k-class generalized information criterion is asymptotically optimal. Now, we state the main theorem.

Theorem 9. Under assumptions (35) and (36), minimizing the k-class generalized information criterion is asymptotically optimal; namely, (34) holds.

Proof. The asymptotic optimality of the k-GIC requires showing that (41), (42), and (43) are valid.
Firstly, we prove that (41) holds. For any $\epsilon > 0$, by Chebyshev's inequality, one has
$$P\Big(\sup_{w \in \mathcal{H}_n} \frac{|e^\top A(w)\mu|}{n R_n(w)} > \epsilon\Big) \leq \sum_{m=1}^{M} \frac{E\big[\big(e^\top A(w_m^0)\mu\big)^2 \mid X, Z\big]}{\epsilon^2\,\big(n R_n(w_m^0)\big)^2}, \quad (44)$$
which, by Lemma 1, is no greater than
$$c_0\,\epsilon^{-2} \sum_{m=1}^{M} \frac{\|A(w_m^0)\mu\|^2}{\big(n R_n(w_m^0)\big)^2}. \quad (45)$$
Recalling the definition of $R_n(w)$, we get $\|A(w_m^0)\mu\|^2 \leq n R_n(w_m^0)$. Then, (45) does not exceed $c_0\,\epsilon^{-2}\sum_{m=1}^{M}\big(n R_n(w_m^0)\big)^{-1}$. By assumption (36), one knows that (45) tends to zero. Thus, (41) is established.
To prove (42), it suffices to verify that
$$\sup_{w \in \mathcal{H}_n} \frac{\big|e^\top P(w)e - \mathrm{tr}(P(w)\Omega)\big|}{n R_n(w)} \stackrel{p}{\longrightarrow} 0, \quad (46)$$
$$\sup_{w \in \mathcal{H}_n} \frac{\big|2\,\mathrm{tr}(P(w)\Omega) - k\,\sigma^2\,\mathrm{tr}(P(w))\big|}{n R_n(w)} \longrightarrow 0. \quad (47)$$
By Chebyshev's inequality and Lemmas 4 and 5, for any $\epsilon > 0$, we have
$$P\Big(\sup_{w \in \mathcal{H}_n} \frac{|e^\top P(w)e - \mathrm{tr}(P(w)\Omega)|}{n R_n(w)} > \epsilon\Big) \leq \sum_{m=1}^{M} \frac{2\,\mathrm{tr}\big((P(w_m^0)\Omega)^2\big)}{\epsilon^2\,\big(n R_n(w_m^0)\big)^2}. \quad (48)$$
From (36), we know that (48) is close to zero. Thus, (46) is reasonable.
Recalling Lemma 4 and (35), it yields
$$\sup_{w \in \mathcal{H}_n} \frac{\big|2\,\mathrm{tr}(P(w)\Omega) - k\,\sigma^2\,\mathrm{tr}(P(w))\big|}{n R_n(w)} \leq \frac{(2 c_1 + k\,\sigma^2)\sup_{w}\mathrm{tr}(P(w))}{\xi_n} \longrightarrow 0, \quad (49)$$
which derives (47). Then, (42) is proved.
Next, we show that the expression (43) also holds. A straightforward calculation leads to
$$n L_n(w) - n R_n(w) = e^\top P(w)P(w)e - \mathrm{tr}\big(P(w)\Omega P(w)\big) - 2\,\mu^\top A(w)P(w)e. \quad (50)$$
From the identity (50), we know that
$$\sup_{w} \Big|\frac{L_n(w)}{R_n(w)} - 1\Big| \leq \sup_{w} \frac{\big|e^\top P(w)P(w)e - \mathrm{tr}(P(w)\Omega P(w))\big|}{n R_n(w)} + 2\,\sup_{w} \frac{\big|\mu^\top A(w)P(w)e\big|}{n R_n(w)}. \quad (51)$$
Obviously, the proof of (43) needs to verify that
$$\sup_{w} \frac{\big|e^\top P(w)P(w)e - \mathrm{tr}(P(w)\Omega P(w))\big|}{n R_n(w)} \stackrel{p}{\longrightarrow} 0, \quad (52)$$
$$\sup_{w} \frac{\big|\mu^\top A(w)P(w)e\big|}{n R_n(w)} \stackrel{p}{\longrightarrow} 0. \quad (53)$$
To prove (52), it should be noticed that $\lambda_{\max}(P(w)) \leq 1$ and $\mathrm{tr}\big((P(w)\Omega P(w))^2\big) \leq c_1\,\mathrm{tr}\big(P(w)\Omega P(w)\big) \leq c_1\, n R_n(w)$. In addition, $\|P(w)A(w)\mu\|^2 \leq \|A(w)\mu\|^2 \leq n R_n(w)$.
Thus, for any $\epsilon > 0$ and $w_m^0 \in \mathcal{H}_n$, Chebyshev's inequality gives
$$P\Big(\sup_{w} \frac{|e^\top P(w)P(w)e - \mathrm{tr}(P(w)\Omega P(w))|}{n R_n(w)} > \epsilon\Big) \leq \sum_{m=1}^{M} \frac{2\,\mathrm{tr}\big((P(w_m^0)\Omega P(w_m^0))^2\big)}{\epsilon^2\,\big(n R_n(w_m^0)\big)^2}, \quad (54)$$
$$P\Big(\sup_{w} \frac{|\mu^\top A(w)P(w)e|}{n R_n(w)} > \epsilon\Big) \leq \sum_{m=1}^{M} \frac{c_0\,\|A(w_m^0)\mu\|^2}{\epsilon^2\,\big(n R_n(w_m^0)\big)^2}. \quad (55)$$
Combining (36) and (45), one knows that both (54) and (55) tend to zero. In other words, (43) is confirmed. We conclude that the expressions (41), (42), and (43) are reasonable. This completes the proof of Theorem 9.

In practice, the covariance matrix $\Omega$ of the errors is usually unknown and needs to be estimated. However, it is difficult to build a good estimate for $\Omega$ because of its special structure. In the special case when the random errors are independently and identically distributed with variance $\sigma^2$, a consistent estimator of $\sigma^2$ can be built for the constrained models (7) and (8). Let $m = M^*$ in $\tilde{Z}_{(m)}$ and $P_{(m)}$, where $M^*$ corresponds to a "large" approximating model. Denote
$$\hat{\sigma}^2 = \frac{\tilde{y}^\top \big(I - P_{(M^*)}\big)\tilde{y}}{n - \mathrm{tr}\big(P_{(M^*)}\big)}.$$
The coming theorem will show that $\hat{\sigma}^2$ is a consistent estimate of $\sigma^2$.
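A minimal sketch of this residual-based estimate, assuming the form of $\hat{\sigma}^2$ reconstructed above with the hat matrix of a "large" approximating model:

```python
# sigma2_hat = y_tilde' (I - P_large) y_tilde / (n - tr(P_large)),
# where P_large is the hat matrix of a "large" candidate model.
import numpy as np

def sigma2_large_model(y_tilde, P_large):
    n = len(y_tilde)
    resid = y_tilde - P_large @ y_tilde
    return (resid @ resid) / (n - np.trace(P_large))
```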

Theorem 10. If $\mathrm{tr}(P_{(M^*)})/n \to 0$ when $n \to \infty$ and $M^* \to \infty$, we have $\hat{\sigma}^2 \stackrel{p}{\longrightarrow} \sigma^2$ as $n \to \infty$.

Proof. Writing $A_{(M^*)} = I - P_{(M^*)}$, one obtains
$$\hat{\sigma}^2 = \frac{e^\top A_{(M^*)}e}{n - \mathrm{tr}(P_{(M^*)})} + \frac{\mu^\top A_{(M^*)}\mu}{n - \mathrm{tr}(P_{(M^*)})} + \frac{2\,\mu^\top A_{(M^*)}e}{n - \mathrm{tr}(P_{(M^*)})}, \quad (56)$$
where $\tilde{y} = \mu + e$.
Since $E\big(e^\top A_{(M^*)}e\big) = \sigma^2\big(n - \mathrm{tr}(P_{(M^*)})\big)$, it leads to
$$E\Big(\frac{e^\top A_{(M^*)}e}{n - \mathrm{tr}(P_{(M^*)})}\Big) = \sigma^2. \quad (57)$$
For any $\epsilon > 0$, it follows from (57) and Chebyshev's inequality that
$$P\Big(\Big|\frac{e^\top A_{(M^*)}e}{n - \mathrm{tr}(P_{(M^*)})} - \sigma^2\Big| > \epsilon\Big) \leq \frac{2\,\sigma^4\,\mathrm{tr}\big(A_{(M^*)}A_{(M^*)}\big)}{\epsilon^2\,\big(n - \mathrm{tr}(P_{(M^*)})\big)^2}. \quad (58)$$
Equation (58) implies that the first term on the right-hand side of (56) converges to $\sigma^2$ in probability. Let $h_{tt}$ be the $t$th diagonal element of the "hat" matrix $P_{(M^*)}$. Then, $1 - h_{tt}$ is the $t$th diagonal element of $A_{(M^*)}$ and satisfies $0 \leq 1 - h_{tt} \leq 1$.
Because $\sum_{j}\alpha_j z_{jt}$ and $\sum_{j}c_{ij}\alpha_j$ converge in mean square, we have $\mu^\top A_{(M^*)}\mu/n \to 0$ as $M^* \to \infty$. Notice that
$$\frac{\mu^\top A_{(M^*)}\mu}{n - \mathrm{tr}(P_{(M^*)})} = \frac{n}{n - \mathrm{tr}(P_{(M^*)})}\cdot\frac{\mu^\top A_{(M^*)}\mu}{n}. \quad (59)$$
The above expression implies that the second term on the right-hand side of (56) approaches zero. By a similar proof to (59), we obtain that the final term on the right-hand side of (56) also tends to zero. The proof of Theorem 10 is complete.

In the case of independently and identically distributed errors, if we replace $\sigma^2$ by $\hat{\sigma}^2$ in the k-GIC, the k-GIC can be simplified to
$$\hat{G}_n(w) = \frac{1}{n}\,\|\tilde{y} - \hat{\mu}(w)\|^2 + \frac{k}{n}\,\hat{\sigma}^2\,\mathrm{tr}\big(P(w)\big).$$

Here, we intend to illustrate that the model selection procedure of minimizing $\hat{G}_n(w)$ is also asymptotically optimal.

Theorem 11. Assume that the random errors are i.i.d. with mean zero and variance $\sigma^2$. Under conditions (35) and (36), minimizing $\hat{G}_n(w)$ remains asymptotically optimal.

Proof. Using a similar technique to the derivation of (39), we obtain
$$\hat{G}_n(w) = G_n(w) + \frac{k}{n}\big(\hat{\sigma}^2 - \sigma^2\big)\,\mathrm{tr}\big(P(w)\big). \quad (61)$$
With an appropriate modification of the proofs of (41)-(43), one only needs to verify
$$\sup_{w \in \mathcal{H}_n} \frac{k\,\big|\hat{\sigma}^2 - \sigma^2\big|\,\mathrm{tr}(P(w))}{n R_n(w)} \stackrel{p}{\longrightarrow} 0, \quad (62)$$
which is equivalent to showing
$$\big|\hat{\sigma}^2 - \sigma^2\big|\cdot\frac{k \sup_{w \in \mathcal{H}_n}\mathrm{tr}(P(w))}{\xi_n} \stackrel{p}{\longrightarrow} 0. \quad (63)$$
From the proof of Theorem 10, we have $\hat{\sigma}^2 - \sigma^2 = o_p(1)$, where $o_p(1)$ denotes a term tending to zero in probability.
By assumption (35), one knows that $k \sup_{w}\mathrm{tr}(P(w))/\xi_n$ is close to zero. Thus, we obtain (63). It follows from Lemma 3 and the definition of $\xi_n$ that the left-hand side of (62) is no greater than the left-hand side of (63). Then, the formula (63) is confirmed. Therefore, we conclude that minimizing $\hat{G}_n(w)$ is also asymptotically optimal.

5. Monte Carlo Simulation

In this section, Monte Carlo simulations are performed to investigate the finite sample properties of the proposed restricted linear model selection. The data generating process is
$$y_t = \sum_{j=1}^{\infty} \alpha_j z_{jt} + e_t, \quad t = 1, \ldots, n,$$
where $z_{1t} = 1$ and the remaining $z_{jt}$ follow the $N(0, 1)$ distribution and are independently and identically distributed across $j$ and $t$. The parameter $\alpha_j$, $j = 1, 2, \ldots$, is determined by a deterministic decay rule governed by a constant $c$, where $c$ is selected to control the population $R^2$. The error $e_t$ is independent of the regressors. We consider two cases of the error distribution.

Case 1. $e_t$ is independently and identically distributed $N(0, 1)$.

Case 2. $e = (e_1, \ldots, e_n)^\top$ obeys the multivariate normal distribution $N(0, \Sigma)$, where $\Sigma$ is an $n \times n$ covariance matrix. The $t$th diagonal element $\sigma_{tt}$ of $\Sigma$ is generated from the uniform distribution on $(0, 1)$. The $(t, s)$th ($t \neq s$) nondiagonal element of $\Sigma$ is a function of $\sigma_{tt}$ and $\sigma_{ss}$ that decays with $|t - s|$, where $\sigma_{tt}$ and $\sigma_{ss}$ denote the $t$th and $s$th diagonal elements of $\Sigma$, respectively.
The sample size is varied across several values of $n$. The number of models $M_n$ is determined by the integer part of a fractional power of $n$, where $\lfloor x \rfloor$ stands for the integer part of $x$. We set $c$ so that the population $R^2$ varies on a grid between 0.1 and 0.9. The number of simulation trials is $R$. For the k-GIC, the tuning constant takes the value one and the model dimension adopts the effective number of parameters $\mathrm{tr}(P(w))$. A hedged sketch of the two error designs is given below; the coefficient decay rule and the constants are illustrative placeholders, since the exact rules are not specified here.
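```python
# Hedged sketch of the simulation designs.  The decay rule theta_j = c / j
# and the correlation constant rho are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, M, c, correlated=False):
    Z = rng.standard_normal((n, M))               # i.i.d. N(0, 1) regressors
    theta = c / np.arange(1, M + 1)               # placeholder decay rule
    mu = Z @ theta
    if correlated:                                # Case 2: dependent normal errors
        d = rng.uniform(0.0, 1.0, n)              # diagonal variances on (0, 1)
        rho = 0.5                                 # illustrative correlation decay
        t = np.arange(n)
        Sigma = np.sqrt(np.outer(d, d)) * rho ** np.abs(np.subtract.outer(t, t))
        e = rng.multivariate_normal(np.zeros(n), Sigma)
    else:                                         # Case 1: i.i.d. N(0, 1) errors
        e = rng.standard_normal(n)
    return Z, mu, mu + e
```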
To assess the performance of the k-GIC, we consider five estimators: AIC model selection estimators (AIC), BIC model selection estimators (BIC), the leave-one-out cross-validated model selection estimator (CV, [17]), Mallows model averaging estimators (MMA, [11]), and k-GIC model selection estimators (k-GIC). Following Machado [30], the AIC and BIC are defined, respectively, as
$$\mathrm{AIC}_{(m)} = \log\big(\hat{\sigma}_{(m)}^2\big) + \frac{2m}{n}, \qquad \mathrm{BIC}_{(m)} = \log\big(\hat{\sigma}_{(m)}^2\big) + \frac{m \log n}{n},$$
where $\hat{\sigma}_{(m)}^2 = n^{-1}\|\tilde{y} - \hat{\mu}_{(m)}\|^2$.

We employ the out-of-sample prediction error to evaluate each estimator. For each replication, $n_0$ observations are generated as out-of-sample data. In the $r$th simulation, the prediction error is
$$\mathrm{PE}_r = \frac{1}{n_0}\sum_{t=1}^{n_0}\big(y_t^{\mathrm{new}} - \hat{y}_t^{\mathrm{new}}\big)^2,$$
where $\hat{y}_t^{\mathrm{new}}$ is computed from the model selected by one of the five methods. Then, the out-of-sample prediction error is calculated by
$$\mathrm{PE} = \frac{1}{R}\sum_{r=1}^{R}\mathrm{PE}_r,$$
where $R$ is the number of replications. Obviously, a smaller PE implies a better estimation method. We consider PE under homoscedastic errors first. The prediction error calculations are summarized in Figure 1. The four panels in the graph depict results for a variety of sample sizes. In each panel, the population $R^2$ is displayed on the x-axis and PE is displayed on the y-axis.
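A short sketch of this evaluation loop, under the same illustrative assumptions as the previous code blocks:

```python
# Out-of-sample evaluation: average squared forecast error over R replications.
import numpy as np

def out_of_sample_pe(fit_and_predict, R, n, n0, **sim_kwargs):
    """fit_and_predict: callable returning (predictions, realized values)
    for n0 new observations after fitting on a sample of size n."""
    pes = []
    for _ in range(R):
        y_hat_new, y_new = fit_and_predict(n, n0, **sim_kwargs)
        pes.append(np.mean((y_new - y_hat_new) ** 2))   # PE_r
    return np.mean(pes)                                  # PE
```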

We find that the k-GIC estimators are almost always the best among those considered. When $R^2$ is very large, the MMA estimators can sometimes be marginally preferred to the k-GIC estimators. In each panel, AIC and CV have quite similar prediction errors. For smaller samples, AIC obtains a higher prediction error than CV; however, the AIC estimators yield smaller PE than the CV estimators as the sample size increases. In many situations, the PE of the BIC estimator with a large $R^2$ is quite poor relative to the other methods.

Next, we discuss PE under correlated errors; the calculations are summarized in Figure 2. Broadly speaking, the conclusions are similar to those found in the homoscedastic cases. The k-GIC estimator frequently yields the most accurate estimates, followed by the MMA estimator, and both averaging estimators enjoy significantly smaller PE than the other three estimators over a large portion of the parameter space. When $R^2$ is small, the BIC estimator outperforms the k-GIC estimator. Again, the AIC estimator is habitually the worst performing estimator, with CV a close second over a large region of the space. Besides, their relative efficiency relies closely on the sample size, with the BIC estimator revealing increasing PE and the remaining four estimators showing decreasing PE as $n$ increases.

6. Conclusions

In risk investment, an important subject is to find an optimal portfolio. The commonly used techniques are optimization methods based on the mean-variance scheme. However, those methods are computationally cumbersome and cannot obtain closed-form solutions for some complex problems. To remedy these defects of mean-variance optimization, an alternative methodology for obtaining an optimal portfolio is to use model selection. This paper develops a statistical procedure for the selection problem of an optimal tracking portfolio. We build the theoretical models of tracking portfolios as constrained linear models. Then, the selection of an optimal portfolio boils down to choosing an optimal constrained linear model.

In the setting of unrestricted models or homoscedastic models, a large number of works investigate the problems of model selection. In contrast, we discuss model selection for constrained models with dependent errors. The restricted models are estimated by the method of weighted average least squares. Thus, the selection of an optimal constrained model is equivalent to finding a set of optimal weights. We select the weights by minimizing a k-class generalized information criterion (k-GIC), which is an estimate of the average squared error from the model average fit. The procedure of selecting weights is proved to be asymptotically optimal. Through Monte Carlo simulation, the performance of the k-GIC is compared against that of four other methods. It is found that the k-GIC gives the best performance in most cases.

There are two limitations of our results which are open for further research. First, what is the asymptotic distribution of the parametric estimators? Second, can the theory be generalized to allow for continuous weights? These questions remain to be answered by future research. In this work, we mainly adopt the method of regression analysis to solve the selection problem. In fact, some alternative mathematical tools can also be employed to explore the theoretical properties of model selection. For example, the optimal model can be selected by the methods of linear optimization or quadratic programming, and we can apply the techniques of linear functional analysis and stochastic control to study the inference of the parametric estimators. Besides, we mention that the applications of this study can also be extended to some other fields, including risk management, ruin theory, and factor analysis.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work was supported by the Shandong Institute of Business and Technology (SIBT Grant 521014306203).