Special Issue: Stochastic Systems and Control: Theory and Applications
Research Article | Open Access
On the Degrees of Freedom of Mixed Matrix Regression
With the increasing prominence of big data in modern science, data of interest are more complex and stochastic. To deal with complex matrix and vector data, this paper focuses on the mixed matrix regression model. We mainly establish the degrees of freedom of the underlying stochastic model, which is a key ingredient for constructing adaptive selection criteria that efficiently select the optimal model fit. Under some mild conditions, we prove that the degrees of freedom of the mixed matrix regression model are the sum of the degrees of freedom of Lasso and of regularized matrix regression. Moreover, we establish the degrees of freedom of nuclear-norm regularized multivariate regression. Furthermore, we prove that the estimates of the degrees of freedom of the underlying models are consistent.
1. Introduction
With the increasing prominence of large-scale data in modern science, data of interest are becoming more complex and may take the form of a matrix rather than a vector. At the same time, the random noise is not always normal. Such complex stochastic data are frequently collected in a large variety of research areas such as information technology, engineering, medical imaging and diagnosis, and finance [1–7]. A well-known example is the study of an electroencephalography data set of alcoholism. The study consists of 122 subjects in two groups, an alcoholic group and a normal control group, and each subject was exposed to a stimulus. Voltage values were measured from 64 channels of electrodes placed on the subject's scalp for 256 time points, so each sampling unit is a 256 × 64 matrix. To address scientific questions arising from such data, sparsity or other forms of regularization are crucial owing to the ultrahigh dimensionality and complex structure of the matrix data. Often, a variety of models in statistics lead to the estimation of matrices with rank constraints. The true signal often has low rank or can be well approximated by a low rank matrix. Recently, Zhou and Li [5] proposed the so-called regularized matrix regression model to deal with matrix data, which is based on spectral regularization. This model includes the well-known Lasso [8] as a special case. Moreover, one of the main results in [5] gives the degrees of freedom of the proposed model under an orthonormality assumption.
Degrees of freedom of the underlying stochastic model are an important topic. To evaluate the performance of a model when we use it to analyze data, we need to choose the optimal tuning parameter of that model. Many methods have been proposed to solve this problem. Popular methods include $C_p$, AIC, and BIC [9–11]. Another option is cross-validation, which is computationally expensive. Efron [11] showed that $C_p$ is an unbiased estimate of prediction error and that, in most cases, it is more accurate than cross-validation. Thus, $C_p$ and AIC can outperform cross-validation. The fundamental idea of $C_p$, AIC, and BIC is connected with the concept of degrees of freedom.
Degrees of freedom are easily understood in the linear model, where they equal the number of prediction variables. However, if there are constraints on the prediction variables, the degrees of freedom no longer correspond exactly to the number of variables; see, for example, [5, 12–18]. After Stein [12] derived Stein's unbiased risk estimation, analytical forms of the degrees of freedom of different models were studied for the vector case. For instance, Hastie and Tibshirani [13] showed that the degrees of freedom of a linear smoother equal the trace of the prediction matrix. In general, it is difficult to obtain the degrees of freedom of many models. Ye [15] in 1998 and Shen and Ye [16] in 2002 used computational methods to estimate the degrees of freedom. A drawback of these methods is that the computational cost grows with the amount of data. For the high-dimensional vector case, Zou et al. [14] gave the degrees of freedom of Lasso. Furthermore, Tibshirani and Taylor [17, 18] gave the degrees of freedom of generalized Lasso.
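The statement that a linear smoother has degrees of freedom equal to the trace of its prediction matrix can be checked numerically. The sketch below is only an illustration: the function name `hat_matrix_trace` is hypothetical, and the ridge penalty is an assumed stand-in for "a constraint on the prediction variables", not a method used in this paper.

```python
import numpy as np

def hat_matrix_trace(X, ridge=0.0):
    """Trace of the linear smoother S = X (X^T X + ridge * I)^{-1} X^T."""
    p = X.shape[1]
    gram = X.T @ X + ridge * np.eye(p)
    S = X @ np.linalg.solve(gram, X.T)
    return float(np.trace(S))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))

# Ordinary least squares: the trace recovers the number of predictors.
df_ols = hat_matrix_trace(X)
# Assumed ridge penalty: the trace drops below the number of predictors.
df_ridge = hat_matrix_trace(X, ridge=5.0)
print(df_ols, df_ridge)
```

With no penalty the trace matches the number of columns of `X`; once a penalty is added the effective degrees of freedom fall below that count, which is the phenomenon the paragraph above describes.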
However, for the matrix case, there are only a few results about the degrees of freedom of matrix regression. An analytical form of the degrees of freedom of our model is essential both in theory and in practice. Thus, it is important to study the degrees of freedom in the matrix case in the big data era. Besides Zhou and Li's work [5] on the degrees of freedom of regularized matrix regression, Yuan [19] obtained the degrees of freedom in low rank matrix estimation, which includes the cases of the rank constraints and nuclear-norm regularization. Note that Yuan [19] just considered the rank constraints of multivariate regression, and Zhou and Li [5] did not consider the mixed case, which combines matrix and vector data. If we use the nuclear norm as the penalty, what are the degrees of freedom of that model? If the variables are mixed, what are the degrees of freedom of that model?
We answer the above questions in this paper. Firstly, we prove that the degrees of freedom of the mixed matrix regression model are the sum of the degrees of freedom of Lasso and of regularized matrix regression; this result is useful for constructing adaptive selection criteria that efficiently select the optimal model fit. Then, following the same idea, we establish the degrees of freedom of nuclear-norm regularized multivariate regression. It is worth noticing that Zou et al. [14] not only gave the unbiased estimate of the degrees of freedom of the Lasso model but also proved the consistency of that estimate. This is an interesting and important work on the estimates of the degrees of freedom of Lasso. Based on their work, we finally prove that the estimates of the degrees of freedom given in this paper are consistent.
Our paper is organized as follows. In Section 2, we introduce the primary model, basic concepts, and notations used in our paper. In Section 3, we show the process of computing the degrees of freedom of model (3). In Section 4, we give the degrees of freedom of multivariate regression with nuclear-norm regularization. In Section 5, we verify the consistent property of the estimates. We conclude the paper with a discussion of potential future research in Section 6.
2. Preliminaries
In this section, we introduce our model and basic concepts. First we present the mixed matrix regression model. Then, for convenience of discussion, we give some basic background and notation.
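Suppose $y \in \mathbb{R}$ is the response variable, $z \in \mathbb{R}^{p_0}$ is the prediction vector, and $X \in \mathbb{R}^{p_1 \times p_2}$ is the prediction matrix. They are known. Let $\gamma \in \mathbb{R}^{p_0}$ and $B \in \mathbb{R}^{p_1 \times p_2}$ be the unknown coefficient vector and matrix. The statistical model of matrix regression is given as
$$y = \langle z, \gamma \rangle + \langle X, B \rangle + \varepsilon, \tag{1}$$
where $\langle X, B \rangle = \operatorname{tr}(X^{T} B)$ is the sum of the products of the corresponding elements of $X$ and $B$; $\varepsilon$ is the prediction error of the model. Suppose we take $n$ samples
$$y_i = \langle z_i, \gamma \rangle + \langle X_i, B \rangle + \varepsilon_i, \quad i = 1, \dots, n. \tag{2}$$
Note that, in the real data case, there are always some special structures of $B$ and $\gamma$ such that $B$ has low rank and $\gamma$ is usually sparse. In this case, we define the mixed matrix regression model as
$$\min_{\gamma, B} \; \frac{1}{2} \sum_{i=1}^{n} \left( y_i - \langle z_i, \gamma \rangle - \langle X_i, B \rangle \right)^2 + \lambda \|B\|_* + \mu \|\gamma\|_1, \tag{3}$$
where $\lambda \ge 0$, $\mu \ge 0$ are the regularization parameters and $\|B\|_*$ is the nuclear norm of $B$, which is the sum of the singular values of $B$. That is, if $B$ has the singular value decomposition $B = U \Sigma V^{T}$, where $U \in \mathbb{R}^{p_1 \times p_1}$ and $V \in \mathbb{R}^{p_2 \times p_2}$ are orthogonal matrices and $\Sigma \in \mathbb{R}^{p_1 \times p_2}$ is a matrix whose main diagonal elements are the singular values $\sigma_1 \ge \cdots \ge \sigma_r \ge 0$ of $B$ with $r = \min\{p_1, p_2\}$, then $\|B\|_* = \sum_{i=1}^{r} \sigma_i$. $\|\gamma\|_1$ is defined as the sum of the absolute values of the components of $\gamma$. That is, if $\gamma = (\gamma_1, \dots, \gamma_{p_0})^{T}$, then $\|\gamma\|_1 = \sum_{j=1}^{p_0} |\gamma_j|$. Clearly, if the matrix part is absent ($B = 0$) in model (3), we get the Lasso model. For the Lasso model, the research is very mature, including algorithms and the degrees of freedom. In statistical parlance, Lasso uses an $\ell_1$ penalty, which has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the tuning parameter is sufficiently large. We say that Lasso yields sparse models that involve only a subset of the variables, thereby performing variable selection. Lasso has been widely used in statistics and machine learning. In model (3), if the vector part is absent ($\gamma = 0$), we get the regularized matrix regression model mainly studied in [5].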
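As a concrete reading of the mixed model, the following sketch evaluates a penalized least squares objective with a nuclear-norm penalty on the matrix coefficient and an $\ell_1$ penalty on the vector coefficient. The factor 1/2 in front of the squared loss, the array layout, and all names (`mixed_objective`, `lam_nuc`, `lam_l1`) are assumptions made for illustration.

```python
import numpy as np

def nuclear_norm(B):
    """Sum of the singular values of B."""
    return float(np.linalg.svd(B, compute_uv=False).sum())

def l1_norm(gamma):
    """Sum of the absolute values of the entries of gamma."""
    return float(np.abs(gamma).sum())

def mixed_objective(y, Z, Xs, gamma, B, lam_nuc, lam_l1):
    """Penalized squared loss for a mixed vector/matrix regression.

    y  : (n,) responses
    Z  : (n, p0) vector covariates, one row per sample
    Xs : (n, p1, p2) matrix covariates, one matrix per sample
    """
    # <X_i, B> is the sum of elementwise products of X_i and B.
    inner = np.einsum("ijk,jk->i", Xs, B)
    resid = y - Z @ gamma - inner
    return 0.5 * float(resid @ resid) + lam_nuc * nuclear_norm(B) + lam_l1 * l1_norm(gamma)

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 3))
Xs = rng.standard_normal((6, 2, 2))
gamma = np.array([1.0, -2.0, 0.0])   # sparse vector coefficient
B = np.diag([3.0, 4.0])              # matrix coefficient
y = Z @ gamma + np.einsum("ijk,jk->i", Xs, B)  # noiseless responses
print(mixed_objective(y, Z, Xs, gamma, B, lam_nuc=0.5, lam_l1=2.0))
```

Because the responses are generated without noise, the value printed here consists of the two penalty terms only, which makes it easy to see how each norm contributes.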
Now we review some basic results on the degrees of freedom. Based on Stein's unbiased risk estimation, Efron et al. [20] showed that the effective degrees of freedom of any fitting procedure have a rigorous definition under a differentiability condition on the estimate $\hat{y}$ based on $y$, where $y$ denotes the response vector. That is, given a method $\delta$, let $\hat{y} = \delta(y)$ denote its fit. Then, under the differentiability of $\hat{y}$, the degrees of freedom of $\delta$ are given by
$$df(\delta) = E\left[ \operatorname{tr}\left( \frac{\partial \hat{y}}{\partial y} \right) \right].$$
This means that the degrees of freedom of $\delta$ are the expected trace of the Jacobian matrix $\partial \hat{y} / \partial y$, which is a special case of Definition 1. Once we have the degrees of freedom, we can establish three well-known information criteria, $C_p$, AIC, and BIC, under the normal noise case.
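To show how a degrees-of-freedom value feeds into model selection, here is a minimal sketch of the three criteria. The normalizations below follow one common convention from the Lasso literature and are an assumption, not necessarily the exact forms intended here; `selection_criteria` and `sigma2` (the noise variance, taken as known) are hypothetical names.

```python
import numpy as np

def selection_criteria(y, y_hat, df, sigma2):
    """Cp, AIC and BIC under normal noise, in one assumed normalization."""
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))
    cp = rss / n + 2.0 * df * sigma2 / n
    aic = rss / (n * sigma2) + 2.0 * df / n
    bic = rss / (n * sigma2) + np.log(n) * df / n
    return {"Cp": cp, "AIC": aic, "BIC": bic}

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.0, 2.0, 3.0, 2.0])
print(selection_criteria(y, y_hat, df=2, sigma2=1.0))
```

In practice one would evaluate these criteria over a grid of tuning parameters, plugging in the estimated degrees of freedom of each fit, and keep the parameter with the smallest value.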
In order to deal with the degrees of freedom of the mixed matrix regression model, we define an operator that simplifies the expression of the optimization problem, together with the Jacobian matrix of a matrix function.
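Definition 1. Suppose there is a matrix function $F$:
$$F: \mathbb{R}^{m \times q} \to \mathbb{R}^{s \times t}, \quad X \mapsto F(X).$$
Then one defines the Jacobian matrix as $\partial \operatorname{vec}(F(X)) / \partial \operatorname{vec}(X)^{T}$.
Suppose $X \in \mathbb{R}^{m \times q}$, $F(X) \in \mathbb{R}^{s \times t}$. We vectorize a matrix into a vector by stacking its columns. For example, $\operatorname{vec}(X) = (x_{11}, \dots, x_{m1}, \dots, x_{1q}, \dots, x_{mq})^{T}$. Then the Jacobian matrix of $F$ can be written as the $st \times mq$ matrix
$$\frac{\partial \operatorname{vec}(F(X))}{\partial \operatorname{vec}(X)^{T}} = \left( \frac{\partial \operatorname{vec}(F(X))_k}{\partial \operatorname{vec}(X)_l} \right)_{1 \le k \le st,\; 1 \le l \le mq}.$$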
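The column-stacking convention and the resulting Jacobian can be illustrated numerically. The sketch below uses central finite differences and, as an assumed example that does not appear in the paper, the linear map $F(X) = AXC$, whose Jacobian in this convention is the Kronecker product $C^{T} \otimes A$. The helper names are hypothetical.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def numerical_jacobian(F, X, h=1e-6):
    """Central-difference approximation of d vec(F(X)) / d vec(X)^T."""
    m, q = X.shape
    cols = []
    for k in range(m * q):
        e = np.zeros(m * q)
        e[k] = h
        E = e.reshape((m, q), order="F")  # perturb the k-th entry of vec(X)
        cols.append((vec(F(X + E)) - vec(F(X - E))) / (2.0 * h))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
C = rng.standard_normal((2, 4))
X = rng.standard_normal((3, 2))

J_num = numerical_jacobian(lambda M: A @ M @ C, X)
J_exact = np.kron(C.T, A)  # known identity: vec(A X C) = (C^T kron A) vec(X)
print(np.max(np.abs(J_num - J_exact)))
```

The same finite-difference helper can be reused as a sanity check for any hand-derived Jacobian of a differentiable matrix function.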
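Definition 2. Let the operator $\mathcal{A}$ be defined from $\mathbb{R}^{p_1 \times p_2}$ to $\mathbb{R}^{n}$ by
$$\mathcal{A}(B) = \left( \langle X_1, B \rangle, \dots, \langle X_n, B \rangle \right)^{T}.$$
It is easy to verify that the operator $\mathcal{A}$ is linear and $\mathcal{A}(B) = \mathbf{X} \operatorname{vec}(B)$, where $\mathbf{X} = (\operatorname{vec}(X_1), \dots, \operatorname{vec}(X_n))^{T} \in \mathbb{R}^{n \times p_1 p_2}$.
Let $y = (y_1, \dots, y_n)^{T}$, $Z = (z_1, \dots, z_n)^{T} \in \mathbb{R}^{n \times p_0}$, $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^{T}$. Then, we can rewrite the mixed matrix regression model (3) as
$$\min_{\gamma, B} \; \frac{1}{2} \left\| y - Z\gamma - \mathcal{A}(B) \right\|_2^2 + \lambda \|B\|_* + \mu \|\gamma\|_1. \tag{9}$$
Let $\beta = (\gamma^{T}, \operatorname{vec}(B)^{T})^{T}$ denote the unknown coefficients, and let $W = (Z, \mathbf{X})$ denote the prediction matrix. Our paper is based on the assumptions that $W$ is full rank and the matrix data and vector data are independent, that is, $Z^{T} \mathbf{X} = 0$.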
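A small sketch may help make the sampling operator of Definition 2 concrete. It assumes the operator maps a coefficient matrix to the vector of inner products with the sample matrices, and it checks that this agrees with multiplying a stacked design matrix by the vectorized coefficient. All function names are hypothetical, and the adjoint is included only as a convenience for writing normal equations.

```python
import numpy as np

def apply_A(Xs, B):
    """A(B): vector of inner products <X_i, B>, one per sample."""
    return np.einsum("ijk,jk->i", Xs, B)

def adjoint_A(Xs, r):
    """A*(r): the weighted sum of the sample matrices, sum_i r_i X_i."""
    return np.einsum("i,ijk->jk", r, Xs)

def design_matrix(Xs):
    """Rows are vec(X_i)^T, so that A(B) = design_matrix(Xs) @ vec(B)."""
    return np.stack([X.reshape(-1, order="F") for X in Xs])

rng = np.random.default_rng(0)
Xs = rng.standard_normal((5, 3, 2))   # n = 5 samples of 3 x 2 matrices
B = rng.standard_normal((3, 2))
r = rng.standard_normal(5)

lhs = apply_A(Xs, B)
rhs = design_matrix(Xs) @ B.reshape(-1, order="F")
print(np.max(np.abs(lhs - rhs)))                       # agreement of the two forms
print(lhs @ r - np.sum(B * adjoint_A(Xs, r)))          # adjoint identity residual
```

Writing the operator as a matrix product is what allows the mixed model to be treated as an ordinary regression on the concatenated design of vector and vectorized matrix covariates.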
3. The Unbiased Estimate of the Degrees of Freedom
We begin with the least squares estimate of our mixed matrix regression, which is the optimal solution of the problem By taking the partial derivative of the minimization problem, we know that From our definitions in Section 2, we can easily verify that , . From the relationship , we obtain that By taking the partial derivative of the implicit functions above, we get Thus, we derive . If is a full rank matrix, we can get .
Based on the definition of the degrees of freedom, we know that if the estimate $\hat{y}$ is differentiable with respect to $y$, then $\operatorname{tr}(\partial \hat{y} / \partial y)$ is an unbiased estimate of the degrees of freedom. Combining the chain rule with the Jacobian matrix of the fitted value with respect to the responses, we get Together with the above arguments, this gives Because , it is easy to obtain
We are ready to present our main result in this section.
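Theorem 3. Let $\hat{B}_{LS}$ be the usual least squares estimate of $B$ and assume that it has distinct positive singular values $\sigma_1 > \cdots > \sigma_r > 0$, where $r = \min\{p_1, p_2\}$, with the convention $\sigma_j = 0$ for $j > r$; then the unbiased estimate of the degrees of freedom of model (9) is
$$\widehat{df} = \sum_{i=1}^{r} 1\{\sigma_i > \lambda\} \left[ 1 + \sum_{1 \le j \le p_1,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2} + \sum_{1 \le j \le p_2,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2} \right] + \|\hat{\gamma}\|_0,$$
where $\hat{\gamma}$ is the estimate of $\gamma$ and $\|\hat{\gamma}\|_0$ is the number of nonzero elements in $\hat{\gamma}$. Clearly, $E[\widehat{df}]$ is the degrees of freedom of mixed matrix regression.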
Theorem 3 is an immediate result of the following two propositions whose proofs are relegated to the Appendix for the sake of presentation.
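Proposition 4. For any $\lambda \ge 0$, the unbiased estimate of the degrees of freedom of the regularized matrix regression model equals $\widehat{df}_{\lambda}$ given by
$$\widehat{df}_{\lambda} = \sum_{i=1}^{r} 1\{\sigma_i > \lambda\} \left[ 1 + \sum_{1 \le j \le p_1,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2} + \sum_{1 \le j \le p_2,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2} \right],$$
where $\hat{B}_{LS}$ is the usual least squares estimate of $B$, which is assumed to have distinct positive singular values $\sigma_1 > \cdots > \sigma_r > 0$ with $r = \min\{p_1, p_2\}$ and the convention $\sigma_j = 0$ for $j > r$.
Proposition 5. For any $\mu \ge 0$, the unbiased estimate of the degrees of freedom of Lasso equals $\widehat{df}_{\mu}$ given by
$$\widehat{df}_{\mu} = \|\hat{\gamma}\|_0,$$
the number of nonzero elements of the Lasso estimate $\hat{\gamma}$.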
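The Lasso part of the degrees of freedom is simply a count of nonzero coefficients. The sketch below illustrates this in the simplest setting, an assumed design with orthonormal columns, where the Lasso solution of a problem with loss $\frac{1}{2}\|y - Z\gamma\|_2^2$ reduces to soft-thresholding the least squares coefficients. The names `soft_threshold` and `lasso_df_orthonormal` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t, setting small entries to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_df_orthonormal(Z, y, mu):
    """Lasso estimate and its df estimate when Z has orthonormal columns.

    Assumes the objective 0.5 * ||y - Z g||^2 + mu * ||g||_1 and Z^T Z = I.
    """
    gamma_ls = Z.T @ y                      # least squares coefficients
    gamma_hat = soft_threshold(gamma_ls, mu)
    return gamma_hat, int(np.count_nonzero(gamma_hat))

Z = np.eye(5)[:, :3]                        # 5 x 3 design with orthonormal columns
y = np.array([3.0, -0.5, 2.0, 1.0, 1.0])
gamma_hat, df_hat = lasso_df_orthonormal(Z, y, mu=1.0)
print(gamma_hat, df_hat)
```

For a general design the Lasso estimate has no closed form and must be computed by a solver, but the degrees-of-freedom estimate is still read off as the number of nonzero entries of the fitted coefficient vector.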
4. Multivariate Regression with Nuclear-Norm Regularization
This section considers multivariate regression, which has the statistical model
$$Y = XB + E,$$
where $Y$ is an $n \times q$ response matrix, $X$ is an $n \times p$ prediction matrix, $B$ is a $p \times q$ unknown coefficient matrix, and $E$ is the $n \times q$ matrix of regression random noise.
Very recently, Yuan [19] studied the degrees of freedom of multivariate regression with a low rank constraint via the following optimization model:
$$\min_{B} \; \frac{1}{2} \|Y - XB\|_F^2 \quad \text{subject to} \quad \operatorname{rank}(B) \le k.$$
Since optimization under a rank constraint is difficult to compute (it is NP-hard in general), we usually relax the rank constraint to nuclear-norm regularization. Then we get the nuclear-norm regularized multivariate regression model
$$\min_{B} \; \frac{1}{2} \|Y - XB\|_F^2 + \lambda \|B\|_*. \tag{22}$$
Following the same technique as in the proof of Theorem 3, we can easily obtain the degrees of freedom of the nuclear-norm regularization multivariate regression. We omit its proof for brevity.
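Theorem 6. Assume that $X^{T} X = I_p$ in (22). Let $\hat{B}_{LS}$ be the usual least squares estimate and assume that it has distinct positive singular values $\sigma_1 > \cdots > \sigma_r > 0$, where $r = \min\{p, q\}$. With the convention $\sigma_j = 0$ for $j > r$, the following expression is an unbiased estimate of the degrees of freedom of the regularized fit (22):
$$\widehat{df}(\lambda) = \sum_{i=1}^{r} 1\{\sigma_i > \lambda\} \left[ 1 + \sum_{1 \le j \le p,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2} + \sum_{1 \le j \le q,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2} \right].$$
Thus $E[\widehat{df}(\lambda)]$ is the degrees of freedom of the nuclear-norm regularized multivariate regression.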
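A closed-form degrees-of-freedom estimate of this kind can be sanity-checked against a direct numerical divergence. The sketch below assumes an orthonormal design, so that the regularized estimate is the singular value soft-thresholding of the least squares estimate, and it assumes the standard closed form for the divergence of that map (with zero singular values padded in for the longer dimension). All function names are hypothetical.

```python
import numpy as np

def svt(M, lam):
    """Singular value soft-thresholding: shrink each singular value by lam."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def df_nuclear(M, lam):
    """Closed-form df estimate evaluated at the least squares estimate M."""
    p, q = M.shape
    s = np.linalg.svd(M, compute_uv=False)
    # Pad with zeros so that sigma_j = 0 for j > min(p, q).
    pad = np.concatenate([s, np.zeros(max(p, q) - s.size)])
    df = 0.0
    for i, si in enumerate(s):
        if si <= lam:
            continue  # thresholded singular values contribute nothing
        term = 1.0
        for limit in (p, q):
            for j in range(limit):
                if j != i:
                    term += si * (si - lam) / (si**2 - pad[j]**2)
        df += term
    return df

def numeric_divergence(M, lam, h=1e-6):
    """Sum over entries of d svt(M)_ab / d M_ab by central differences."""
    total = 0.0
    for a in range(M.shape[0]):
        for b in range(M.shape[1]):
            E = np.zeros_like(M)
            E[a, b] = h
            total += (svt(M + E, lam)[a, b] - svt(M - E, lam)[a, b]) / (2.0 * h)
    return total

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((4, 3)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
B_ls = U @ np.diag([3.0, 2.0, 0.5]) @ V.T   # singular values 3, 2, 0.5
print(df_nuclear(B_ls, 1.0), numeric_divergence(B_ls, 1.0))
```

The two printed numbers should agree up to finite-difference error whenever the singular values are distinct and none of them is close to the threshold, which are the same regularity conditions the theorem imposes.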
5. Consistency of the Unbiased Estimate
The consistency of an estimate is important because it implies that the estimate converges to the true value in probability. Suppose the quantity to be estimated is $\theta$; we use statistical methods to get an estimate $\hat{\theta}_n$, which is a function of the sample size $n$. If $\hat{\theta}_n$ is a consistent estimate of $\theta$, then, as the sample size increases, $\hat{\theta}_n$ converges to $\theta$ in probability. That is, for any $\epsilon > 0$, we have
$$\lim_{n \to \infty} P\left( |\hat{\theta}_n - \theta| > \epsilon \right) = 0.$$
In this section, we prove the consistency of the estimates of the degrees of freedom given in the previous sections. We first prove the consistency of the unbiased estimate for regularized matrix regression. To do so, we need the following proposition on the continuity of $\widehat{df}(\lambda)$.
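Proposition 7. An unbiased estimate of the degrees of freedom of the regularized matrix regression model is
$$\widehat{df}(\lambda) = \sum_{i=1}^{r} 1\{\sigma_i > \lambda\} \, T_i(\lambda), \quad T_i(\lambda) = 1 + \sum_{1 \le j \le p_1,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2} + \sum_{1 \le j \le p_2,\, j \ne i} \frac{\sigma_i (\sigma_i - \lambda)}{\sigma_i^2 - \sigma_j^2},$$
where $\sigma_i$ is the $i$th singular value of the least squares estimate, with the convention $\sigma_j = 0$ for $j > r$. In this case, the degrees of freedom are continuous in $\lambda$ only on the intervals $\lambda \in (\sigma_{k+1}, \sigma_k)$, $k = 1, \dots, r$.
Proof. For any $\lambda \in (\sigma_{k+1}, \sigma_k)$, we know that $\sigma_i > \lambda$ exactly when $i \le k$. So the degrees of freedom of the regularized matrix regression model are written as
$$\widehat{df}(\lambda) = \sum_{i=1}^{k} T_i(\lambda).$$
It is obvious that each $T_i(\lambda)$ is a linear function of $\lambda$. Thus, $\widehat{df}(\lambda)$ is continuous in $\lambda \in (\sigma_{k+1}, \sigma_k)$, $k = 1, \dots, r$.
We next prove that $\widehat{df}(\lambda)$ is not continuous at $\lambda = \sigma_k$, $k = 1, \dots, r$. If $\lambda \to \sigma_k$ and $\lambda \in (\sigma_{k+1}, \sigma_k)$, we obtain
$$\lim_{\lambda \to \sigma_k^{-}} \widehat{df}(\lambda) = \sum_{i=1}^{k} T_i(\sigma_k).$$
If $\lambda \to \sigma_k$ and $\lambda > \sigma_k$, we have
$$\lim_{\lambda \to \sigma_k^{+}} \widehat{df}(\lambda) = \sum_{i=1}^{k-1} T_i(\sigma_k).$$
Therefore, the two one-sided limits differ by $T_k(\sigma_k) = 1 \ne 0$. Clearly, $\widehat{df}(\lambda)$ is not continuous at $\lambda = \sigma_k$, $k = 1, \dots, r$.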
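The piecewise behaviour in the proof above can be seen numerically. The sketch below evaluates an assumed closed form of the estimate for a square coefficient matrix with singular values 3 and 1: it checks linearity on the interval between the two singular values and measures the jump at the smaller one. The function `df_hat` and the chosen values are hypothetical.

```python
import numpy as np

def df_hat(sigma, lam, p, q):
    """Assumed closed-form df estimate as a function of the threshold lam.

    sigma : distinct positive singular values of the least squares estimate
    p, q  : dimensions of the coefficient matrix
    """
    sigma = np.asarray(sigma, dtype=float)
    # Convention: singular values beyond min(p, q) are zero.
    pad = np.concatenate([sigma, np.zeros(max(p, q) - sigma.size)])
    total = 0.0
    for i, si in enumerate(sigma):
        if si > lam:
            total += 1.0
            for limit in (p, q):
                total += sum(si * (si - lam) / (si**2 - pad[j]**2)
                             for j in range(limit) if j != i)
    return total

sigma = [3.0, 1.0]
eps = 1e-9
left = df_hat(sigma, 1.0 - eps, 2, 2)    # just below the singular value 1
right = df_hat(sigma, 1.0 + eps, 2, 2)   # just above it
print(left - right)                      # size of the jump at lam = 1

# Linearity on the open interval (1, 3): the midpoint value is the average.
a, b, c = (df_hat(sigma, t, 2, 2) for t in (1.5, 2.0, 2.5))
print(b - 0.5 * (a + c))
```

The jump has size close to one because, at the threshold, the shrunken singular value vanishes and the corresponding term reduces to its leading constant.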
Now, we show that the unbiased estimate is consistent for the true degrees of freedom.
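Theorem 8. Suppose $\sigma_i$ is the $i$th singular value of the least squares estimate of the regularized matrix regression model, and $\lambda_n \to \lambda^{*}$, where $\lambda^{*}$ is not equal to any of the singular values, which means $\lambda^{*} \ne \sigma_i$, $i = 1, \dots, r$. Then, $\widehat{df}(\lambda_n) - df(\lambda_n) \to 0$ in probability.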
Proof. By assumption and Proposition 7, it holds that is a continuous function in , . If we have a sequence satisfying , , the continuous mapping theorem implies that . Immediately, we see . By using the dominated convergence theorem, we can getHence, .
Notice that, for the vector case, Zou et al. [14] not only gave the unbiased estimate of the degrees of freedom of the Lasso model but also proved the following consistency of the estimate.
Proposition 9. For the Lasso model, if $\mu_n \to \mu^{*}$ with $\mu^{*}$ being a nontransition point, then $\widehat{df}(\mu_n) - df(\mu_n) \to 0$ in probability.
6. Conclusions
In this paper, we mainly obtain the degrees of freedom of the mixed matrix regression model. Moreover, we prove that the obtained estimates of the degrees of freedom are consistent. Note that our results on the degrees of freedom are given under the assumption that the prediction matrix and vector are independent. However, if they are not independent but linearly or nonlinearly related, or if the number of samples is less than the number of variables, what is the analytical form of the degrees of freedom? We leave this as a future research topic.
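Appendix
Proposition A.1. For a given matrix $A$ with singular value decomposition $A = U \operatorname{diag}(a) V^{T}$, let $f$ denote any function of the singular values of $B$. The optimal solution to
$$\min_{B} \; \frac{1}{2} \|B - A\|_F^2 + f(B)$$
shares the same singular vectors as $A$ and its ordered singular values are the solution to
$$\min_{b_1 \ge \cdots \ge b_r \ge 0} \; \frac{1}{2} \|b - a\|_2^2 + f(b).$$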
An immediate consequence of the above proposition is the well-known singular value thresholding formula for nuclear-norm regularization.
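Corollary A.2. For a given matrix $A$ with singular value decomposition $A = U \operatorname{diag}(a) V^{T}$, the optimal solution to
$$\min_{B} \; \frac{1}{2} \|B - A\|_F^2 + \lambda \|B\|_*$$
shares the same singular vectors as $A$ and its singular values are $b_i = (a_i - \lambda)_+ = \max\{a_i - \lambda, 0\}$.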
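The singular value thresholding formula is easy to implement and to check empirically. The sketch below, with hypothetical function names, computes the thresholded matrix and compares the penalized objective at that point with its value at randomly perturbed candidates; it assumes the objective carries the factor 1/2 on the squared Frobenius distance.

```python
import numpy as np

def svt(A, lam):
    """Keep the singular vectors of A and replace each a_i by max(a_i - lam, 0)."""
    U, a, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(a - lam, 0.0)) @ Vt

def objective(A, B, lam):
    """0.5 * ||B - A||_F^2 + lam * ||B||_* ."""
    fit = 0.5 * np.linalg.norm(A - B, "fro") ** 2
    penalty = lam * np.linalg.svd(B, compute_uv=False).sum()
    return float(fit + penalty)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
lam = 0.7
B_star = svt(A, lam)
best = objective(A, B_star, lam)

# No random perturbation of B_star should achieve a smaller objective.
worse = [objective(A, B_star + 0.1 * rng.standard_normal(A.shape), lam) for _ in range(20)]
print(best, min(worse))
```

A random search of this kind is not a proof of optimality, but it is a cheap guard against sign or scaling mistakes when the thresholding step is embedded in a larger algorithm.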
Before proving Proposition 4, we also need some lemmas.
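Lemma A.3. Suppose that $\hat{B}_{LS}$ has the singular value decomposition $\hat{B}_{LS} = U \operatorname{diag}(\sigma) V^{T}$. The estimate $\hat{B}(\lambda) = U \operatorname{diag}(\sigma_{\lambda}) V^{T}$, where $\operatorname{diag}(\sigma_{\lambda})$ has diagonal entries $(\sigma_i - \lambda)_+$.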
Proof. According to the result of Corollary A.2, we just need to show that Note that, for any matrix , . Thus, we can getDirect calculation yields that . We then derive
Lemma A.4. One has
Proof. Since , the eigenvectors of the symmetric matrix coincide with the right singular vectors of . Then, by the chain rule,Now
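By the well-known formula for the differential of an eigenvector, $dv_i = (\sigma_i^2 I - \hat{B}_{LS}^{T} \hat{B}_{LS})^{+} \, d(\hat{B}_{LS}^{T} \hat{B}_{LS}) \, v_i$, where $M^{+}$ is the Moore-Penrose generalized inverse of a matrix $M$.
The Jacobian matrix of the symmetric product $B \mapsto B^{T} B$ with $B \in \mathbb{R}^{p_1 \times p_2}$ is $(I_{p_2^2} + K_{p_2 p_2})(I_{p_2} \otimes B^{T})$, where $K_{p_2 p_2}$ is the commutation matrix.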
Now, by cycle permutation invariance of the trace function, we have Then, By symmetry, we also have
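The commutation matrix and the Jacobian of the symmetric product that appear in the proof of Lemma A.4 can be verified numerically. The sketch below assumes column-major vectorization and the standard identity that the Jacobian of $X \mapsto X^{T} X$ is $(I + K)(I \otimes X^{T})$; the function names are hypothetical.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

def commutation_matrix(m, n):
    """K such that K @ vec(A) == vec(A.T) for any m x n matrix A."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at position j*m + i in vec(A) and i*n + j in vec(A.T).
            K[i * n + j, j * m + i] = 1.0
    return K

def jacobian_gram(X):
    """Closed-form Jacobian of X -> X^T X with respect to vec(X)."""
    n = X.shape[1]
    K = commutation_matrix(n, n)
    return (np.eye(n * n) + K) @ np.kron(np.eye(n), X.T)

def numerical_jacobian_gram(X, h=1e-6):
    """Central-difference Jacobian of X -> X^T X, column by column."""
    m, n = X.shape
    cols = []
    for k in range(m * n):
        e = np.zeros(m * n)
        e[k] = h
        E = e.reshape((m, n), order="F")
        plus = vec((X + E).T @ (X + E))
        minus = vec((X - E).T @ (X - E))
        cols.append((plus - minus) / (2.0 * h))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
print(np.max(np.abs(jacobian_gram(X) - numerical_jacobian_gram(X))))
```

Since the map is quadratic, the central difference is exact up to rounding, so the printed discrepancy should be at the level of floating-point noise.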
Lemma A.5. One has
Proof. As in the proof of Lemma A.4, we utilize the fact that is the positive square root of the eigenvalues of the symmetric matrix . Then, by the chain rule and the Jacobian matrix of fitted value with respect to responses,By combining , , , and we obtain that
Proof of Proposition 4. We only need to show that the optimal solution of our model is the solution to the following problem: The least squares estimate of in model (9) is the solution of the following: So . Under the assumption, it is interesting to find that it has no relationship with and can be obtained from the following model: Thus, by Lemma A.3, we have By Lemmas A.4 and A.5, we easily obtain the desired conclusion.
It is worth noting that Zou et al. [14] showed that the degrees of freedom of the Lasso fit are $df = E|\mathcal{A}|$, where $\mathcal{A} = \{ j : \hat{\gamma}_j \ne 0 \}$ is the effective (active) set of the Lasso coefficient estimates $\hat{\gamma}$. Thus, we know that $|\mathcal{A}|$ is an unbiased estimate of the degrees of freedom.
Proof of Proposition 5. As we mentioned in Section 3, under a differentiability condition on $\hat{y}$, $\operatorname{tr}(\partial \hat{y} / \partial y)$ is an unbiased estimate of the degrees of freedom. By the chain rule, Because , we can get . The usual least squares estimate for the Lasso model is defined by So . If , , then we have So we can get In the mixed case, under the assumptions, we obtain that the optimal is the solution of the following: It has no relationship with . Thus, in a similar way, we easily obtain The proof is completed.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the Fundamental Research Funds for the Central Universities (2017JBM323) and the National Natural Science Foundation of China (11671029).
References
[1] P. Bühlmann and S. van de Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer, 2011.
[2] S. Negahban and M. J. Wainwright, “Estimation of (near) low-rank matrices with noise and high-dimensional scaling,” The Annals of Statistics, vol. 39, no. 2, pp. 1069–1097, 2011.
[3] Y. Li, W. Zhang, and X. Liu, “Stability of nonlinear stochastic discrete-time systems,” Journal of Applied Mathematics, vol. 2013, Article ID 356746, 2013.
[4] X. Liu, Y. Li, and W. Zhang, “Stochastic linear quadratic optimal control with constraint for discrete-time systems,” Applied Mathematics and Computation, vol. 228, pp. 264–270, 2014.
[5] H. Zhou and L. Li, “Regularized matrix regression,” Journal of the Royal Statistical Society. Series B. Statistical Methodology, vol. 76, no. 2, pp. 463–483, 2014.
[6] Y. Zhao and W. Zhang, “Observer-based controller design for singular stochastic Markov jump systems with state dependent noise,” Journal of Systems Science and Complexity, vol. 29, pp. 946–958, 2016.
[7] H. Ma and Y. Jia, “Stability analysis for stochastic differential equations with infinite Markovian switchings,” Journal of Mathematical Analysis and Applications, vol. 435, no. 1, pp. 593–605, 2016.
[8] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, pp. 267–288, 1996.
[9] C. L. Mallows, “Some comments on Cp,” Technometrics, vol. 15, no. 4, pp. 661–675, 1973.
[10] H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Second International Symposium on Information Theory, Springer Series in Statistics, pp. 267–281, Springer, New York, NY, USA, 1973.
[11] B. Efron, “The estimation of prediction error: covariance penalties and cross-validation,” Journal of the American Statistical Association, vol. 99, no. 467, pp. 619–642, 2004.
[12] C. M. Stein, “Estimation of the mean of a multivariate normal distribution,” The Annals of Statistics, vol. 9, no. 6, pp. 1135–1151, 1981.
[13] T. Hastie and R. Tibshirani, Generalized Additive Models, Chapman & Hall, New York, NY, USA, 1990.
[14] H. Zou, T. Hastie, and R. Tibshirani, “On the ‘degrees of freedom’ of the lasso,” The Annals of Statistics, vol. 35, no. 5, pp. 2173–2192, 2007.
[15] J. Ye, “On measuring and correcting the effects of data mining and model selection,” Journal of the American Statistical Association, vol. 93, no. 441, pp. 120–131, 1998.
[16] X. Shen and J. Ye, “Adaptive model selection,” Journal of the American Statistical Association, vol. 97, no. 457, pp. 210–221, 2002.
[17] R. J. Tibshirani and J. Taylor, “Degrees of freedom in lasso problems,” The Annals of Statistics, vol. 40, no. 2, pp. 1198–1232, 2012.
[18] R. J. Tibshirani and J. Taylor, “The solution path of the generalized lasso,” The Annals of Statistics, vol. 39, no. 3, pp. 1335–1371, 2011.
[19] M. Yuan, “Degrees of freedom in low rank matrix estimation,” Science China. Mathematics, vol. 59, no. 12, pp. 2485–2502, 2016.
[20] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” The Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
Copyright © 2017 Pan Shang and Lingchen Kong. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.