Tail Bounds for the $\ell_1$ Norm of Gaussian Random Matrices with Applications
As major components of random matrix theory, Gaussian random matrices play an important role in many fields: they are unitarily invariant, have independent entries, and serve as models for multivariate data and multivariate phenomena. Tail bounds for eigenvalues of Gaussian random matrices are an active research topic. In this paper, we present tail and expectation bounds for the $\ell_1$ norm of Gaussian random matrices. Moreover, tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix are computed from these results. Compared with existing results, ours are better suited to the high-dimensional matrix case. Finally, we study tail bounds for the parameter vector of some existing regularization algorithms.
Random matrix theory (RMT) has developed into an important field of probability theory. RMT has been used in many disciplines, e.g., high-dimensional data analysis, neural networks, matrix low-rank approximation, compressed sensing, dimension reduction, combinatorial optimization, deep learning, wireless communications, multiview unsupervised feature selection, local and global problems of random matrices, multiview subspace clustering, and feature selective projection.
Research on tail bounds for sums of random matrices can be traced back to the Ahlswede–Winter method. Tropp achieved tighter results and proposed user-friendly tail bounds for sums of random matrices. To mitigate the influence of the matrix dimension, tail bounds depending on the intrinsic dimension have been proposed [15, 16]. Zhang et al. presented dimension-free tail bounds on the largest singular value for sums of random matrices. Gao et al. obtained dimension-free bounds on the largest singular values of matrix Gaussian series. There have also been studies on the small-deviation behavior of random matrices [19, 20].
1.1. Related Works
Let a Gaussian random matrix be given whose entries are independent standard normal variables, and denote by the Hadamard product of matrices. Such a matrix can then be represented as the Hadamard product of two matrices, where one factor has the same dimension and all entries equal to 1. According to Corollary 4.2 in  and a standing assumption (in this paper, unless otherwise specified, this assumption is satisfied), we obtain that for any ,
Here $\|\cdot\|$ denotes the spectral norm.
According to Theorem 2 in , we can obtain that for any ,
Here $\sigma_{\max}(\cdot)$ denotes the largest singular value (LSV).
The expectation bound for the spectral norm is derived from Theorem 4.6.1 in :
For the LSV, the expectation bound can be obtained from Theorem 4 in :
However, to the best of our knowledge, there has been little work on tail bounds for the $\ell_1$ norm of Gaussian random matrices.
1.2. Overview of Main Results
Gaussian random matrices are used in many fields, e.g., signal processing and combinatorial optimization. To enrich the theory of random matrix concentration inequalities, in this paper we present tail bounds for the $\ell_1$ norm of Gaussian random matrices, i.e., an upper bound on
Here $\|\cdot\|_1$ denotes the entrywise $\ell_1$ norm. We also obtain expectation bounds for the $\ell_1$ norm of Gaussian random matrices. Because, by definition, the $\ell_1$ norm of a Gaussian matrix is a sum of folded Gaussian variables, our results follow from the Laplace-transform method applied to the folded Gaussian distribution. As applications, we use the obtained theorems to compute tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix. In contrast to existing results, ours are better suited to the high-dimensional matrix case.
The rest of this paper is organized as follows. Section 2 introduces preliminary knowledge on the $\ell_1$ norm and the folded Gaussian distribution. Section 3 gives the tail and expectation bounds for the $\ell_1$ norm of Gaussian random matrices and presents the application of our results to the Gaussian Wigner matrix. In Section 4, we study tail bounds for the parameter vector of some existing regularization algorithms. The last section concludes the paper.
2. Notations and Preliminaries
In this section, we provide preliminary knowledge on the $\ell_1$ norm of matrices and the folded Gaussian distribution.
2.1. The $\ell_1$ Norm of Matrices
For a matrix $A = (a_{ij}) \in \mathbb{R}^{m \times n}$, the entrywise $\ell_1$ norm is defined as $\|A\|_1 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|$, where $a_{ij}$ is the $(i,j)$ entry of $A$. The $\ell_1$ norm is the tightest convex approximation to the $\ell_0$ norm, so it promotes sparsity and makes models easier to interpret. The $\ell_1$ norm is widely used in machine learning and related fields, e.g., low-rank approximation  and dictionary learning .
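As a concrete illustration of the definition above (a minimal sketch; the function name is ours), the entrywise $\ell_1$ norm is just the sum of absolute values of all entries:

```python
import numpy as np

def entrywise_l1(A):
    """Entrywise l1 norm: sum of absolute values of all entries."""
    return float(np.abs(A).sum())

A = np.array([[1.0, -2.0], [3.0, -4.0]])
print(entrywise_l1(A))  # 10.0
```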
2.2. Folded Gaussian Distribution
The folded Gaussian distribution is the distribution of the absolute value of a normally distributed random variable; it is widely applied in many areas, especially in Bayesian inference  and process capability measures . Given a standard normal variable $Z$, the random variable $Y = |Z|$ follows the folded Gaussian distribution. Its probability density function (PDF) is given by $f(y) = \sqrt{2/\pi}\, e^{-y^2/2}$ for $y \ge 0$, and 0 everywhere else.
The moment generating function (mgf) is given by $M(t) = \mathbb{E}[e^{tY}] = 2 e^{t^2/2} \Phi(t)$, where $\Phi$ is the normal cumulative distribution function, $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$, and $\operatorname{erf}$ is the error function, defined as $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\, du$.
Since $\Phi(t) \le 1$, there exists a bound on the mgf such that
The third inequality holds because the error function is an odd function.
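The mgf formula above can be checked by simulation (a sketch assuming the standard folded normal; all variable names are illustrative):

```python
import numpy as np
from math import erf, exp, sqrt

def folded_mgf(t):
    # E[exp(t*|Z|)] = 2 * exp(t^2/2) * Phi(t) for Z ~ N(0,1),
    # with Phi(t) = (1 + erf(t / sqrt(2))) / 2
    return 2.0 * exp(t * t / 2.0) * 0.5 * (1.0 + erf(t / sqrt(2.0)))

rng = np.random.default_rng(0)
y = np.abs(rng.standard_normal(1_000_000))  # folded normal samples
t = 0.5
print(np.exp(t * y).mean(), folded_mgf(t))  # the two should nearly agree
```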
3. Main Results
In this section, we present the tail and expectation bounds for the $\ell_1$ norm of Gaussian random matrices and use our theoretical findings to compute the corresponding bounds for the Gaussian Wigner matrix.
3.1. Bounds for Gaussian Random Matrices
Based on the mgf bound (11), we first obtain the tail bound for the $\ell_1$ norm of Gaussian random matrices.
Theorem 1. Let a Gaussian random matrix of dimension $m \times n$ be given; that is, its entries are independent standard normal variables. Then, for any $t > 0$,
Proof. For any positive number $\theta$, we have the following. The first identity uses the monotonicity of the scalar exponential function, and the second relation is Markov’s inequality. The infimum is attained at the optimal choice of $\theta$, and this completes the proof.
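The Chernoff argument in the proof can be sketched numerically: since the $\ell_1$ norm of an $m \times n$ Gaussian matrix is a sum of $mn$ i.i.d. folded normals, Markov's inequality applied to the exponential gives $\mathbb{P}(\|G\|_1 \ge t) \le \inf_{\theta > 0} e^{-\theta t} M(\theta)^{mn}$. The sketch below optimizes over a grid of $\theta$ and compares against an empirical tail frequency (all names are ours; the published bound's exact constants are not reproduced here):

```python
import numpy as np
from math import erf, exp, sqrt

def folded_mgf(theta):
    # mgf of |Z| for Z ~ N(0,1): 2 * exp(theta^2/2) * Phi(theta)
    return 2.0 * exp(theta**2 / 2.0) * 0.5 * (1.0 + erf(theta / sqrt(2.0)))

def chernoff_bound(t, m, n):
    # ||G||_1 is a sum of m*n i.i.d. folded normals, so
    # P(||G||_1 >= t) <= inf_theta exp(-theta*t) * M(theta)^(m*n);
    # approximate the infimum over a grid of theta values
    thetas = np.linspace(0.01, 5.0, 500)
    return min(exp(-th * t) * folded_mgf(th) ** (m * n) for th in thetas)

m, n, t = 5, 5, 30.0
rng = np.random.default_rng(0)
samples = np.abs(rng.standard_normal((20000, m, n))).sum(axis=(1, 2))
emp_tail = float((samples >= t).mean())
print(emp_tail, chernoff_bound(t, m, n))  # empirical tail never exceeds the bound
```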
Compared with the previous result (2), the tail bound (12) has no dimensional coefficient term; it is therefore better suited to the case of high-dimensional matrices and does not require the intrinsic dimension to optimize the result. Compared with the existing result (3), our result is less affected by the dimension.
Based on Theorem 1, we can obtain the expectation bound for the $\ell_1$ norm of Gaussian random matrices.
Theorem 2. Given a Gaussian random matrix, the following expectation bound holds:
Proof. According to Theorem 1, we first bound the tail. Then, by Jensen’s inequality, we obtain the stated expectation bound. This completes the proof.
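Any valid expectation upper bound must dominate the exact value: by linearity, the $\ell_1$ norm of an $m \times n$ standard Gaussian matrix has expectation $mn\sqrt{2/\pi}$, since each folded normal entry has mean $\sqrt{2/\pi}$. A quick simulation confirms this (a sketch; names are illustrative):

```python
import numpy as np
from math import sqrt, pi

# E||G||_1 = m*n*E|Z| with E|Z| = sqrt(2/pi) for Z ~ N(0,1)
m, n = 40, 50
rng = np.random.default_rng(1)
draws = np.abs(rng.standard_normal((1000, m, n))).sum(axis=(1, 2))
exact = m * n * sqrt(2.0 / pi)
print(draws.mean(), exact)  # the empirical mean concentrates around the exact value
```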
3.2. Gaussian Wigner Matrix
As an application, we use our resulting bounds to compute the tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix.
Consider a Gaussian Wigner matrix of the following form: a symmetric matrix whose entries on and above the diagonal are independent standard normal variables, mirrored below the diagonal.
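A minimal construction of such a matrix (the helper name is ours):

```python
import numpy as np

def gaussian_wigner(n, rng):
    """n x n symmetric matrix: i.i.d. N(0,1) on and above the diagonal,
    mirrored below; it has n*(n+1)/2 free Gaussian entries."""
    A = rng.standard_normal((n, n))
    return np.triu(A) + np.triu(A, 1).T

W = gaussian_wigner(4, np.random.default_rng(2))
print(np.allclose(W, W.T))  # True: the matrix is symmetric
```

Note that, by symmetry, the $\ell_1$ norm of such a matrix counts each off-diagonal entry twice: $\|W\|_1 = \sum_i |w_{ii}| + 2 \sum_{i<j} |w_{ij}|$.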
The large deviation principle for the Gaussian Wigner matrix is a classic problem in random matrix theory [26–28]. Next, based on our resulting bounds, we compute the tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix as follows. For any $t > 0$,
Compared with the previous result (17) in , the tail bound (18) has no dimensional coefficient term. Note also that the second bound (19) is tighter than the corresponding previous result in . Therefore, our bounds are more applicable to the case of high-dimensional matrices.
4. Analysis of Regularization Algorithms
Now that the comparison of tail bounds for different norms has been established, we study tail bounds for the parameter vector of some existing regularization algorithms. More precisely, we analyse two well-known regularization algorithms in machine learning: $\ell_1$ and $\ell_2$ regularization.
Given a sample set , where and , we consider the following linear regression model: where denotes the inner product, the weight vector is a column vector, and the bias term is a scalar. This model is prone to overfitting and has a low tolerance for noisy data. To overcome these shortcomings, a regularization term is added to the objective function: where the regularization coefficient controls the strength of the penalty. Typically, the regularization term is the $\ell_1$ or $\ell_2$ norm of the weight vector. The linear regression model with the $\ell_2$ penalty is called ridge regression , and the one with the $\ell_1$ penalty is the least absolute shrinkage and selection operator (Lasso) .
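For concreteness, ridge regression admits a closed-form solution. The following sketch illustrates it (bias term omitted for brevity; all names are ours):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form l2-regularized least squares:
    w = (X^T X + lam * I)^(-1) X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 3))
w_true = np.array([2.0, 0.0, -1.0])
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = ridge_fit(X, y, lam=0.1)
print(w_hat)  # close to w_true, slightly shrunk toward zero
```

Lasso has no closed form and is typically solved by coordinate descent or proximal methods.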
As shown in Figure 1, the solution of Lasso usually contains some zero components, i.e., the weight vector is sparse. Therefore, Lasso has become one of the most popular methods for feature extraction and has been successfully applied to many practical problems [31–34].
The aim is to find a weight vector that minimizes the objective function. The error bound with respect to the parameters is the supremum, with an equivalent expression in terms of probability:
In machine learning, the weight initialization method has a crucial impact on the convergence rate and the final quality of the model. Gaussian initialization, the simplest such method, initializes the weights according to a Gaussian distribution. From the tail bounds of Section 3, we know that the tail bound for the $\ell_1$ norm of Gaussian random matrices is tighter than the existing bounds for the spectral norm discussed in Section 1.1. In practice, $\ell_1$ and $\ell_2$ regularization differ in how the penalty behaves during minimization: as shown in Figure 2, the $\ell_1$ penalty decreases along the absolute-value function, while the $\ell_2$ penalty decreases along the quadratic function, so near zero the $\ell_1$ penalty decreases faster.
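This geometric difference is what produces exact zeros under $\ell_1$ regularization: the proximal step of the $\ell_1$ penalty is soft thresholding, whose constant slope near zero snaps small coefficients to exactly 0, whereas the $\ell_2$ penalty only shrinks them. A minimal sketch (names are ours):

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal operator of lam * |w|: entries with |z| <= lam become exactly 0
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([0.05, -0.3, 2.0])
print(soft_threshold(z, 0.1))  # [ 0.  -0.2  1.9] -- small entry zeroed out
print(z / (1.0 + 0.1))         # ridge-style shrinkage: no exact zeros
```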
In this paper, we present the tail bound for the $\ell_1$ norm of Gaussian random matrices, together with the corresponding expectation bound. As an application, we use the resulting bounds to compute tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix. In contrast to existing results, ours are better suited to the high-dimensional matrix case. Finally, we study tail bounds for the parameter vector of $\ell_1$ and $\ell_2$ regularization algorithms.
In future work, we will apply the theoretical results to some potential practical scenarios, such as clustering and classification.
No data were used to support this study.
Conflicts of Interest
The authors declare no conflicts of interest.
This work was supported by the National Natural Science Foundation of China (12101378), Project of Science and Technology Innovation Fund of Shanxi Agricultural University (2021BQ10), Project of Scientific Research for Excellent Doctors, Shanxi Province, China (SXBYKY2021046), and Shanxi Provincial Research Foundation for Basic Research, China (20210302124548).
P. Bühlmann and S. Van De Geer, Statistics for High-Dimensional Data: Methods, Theory and Applications, Springer Science & Business Media, Berlin, Germany, 2011.
C. Louart, Z. Liao, and R. Couillet, “A random matrix approach to neural networks,” Annals of Applied Probability, vol. 28, no. 2, pp. 1190–1248, 2018.
A. Gittens and M. W. Mahoney, “Revisiting the Nyström method for improved large-scale machine learning,” Journal of Machine Learning Research, vol. 17, no. 1, pp. 3977–4041, 2016.
K. L. Clarkson and D. P. Woodruff, “Low rank approximation and regression in input sparsity time,” in Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 81–90, Palo Alto, CA, USA, June 2013.
A. Naor, O. Regev, and T. Vidick, “Efficient rounding for the noncommutative Grothendieck inequality,” in Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 71–80, Palo Alto, CA, USA, June 2013.
C. H. Martin and M. W. Mahoney, “Implicit self-regularization in deep neural networks: evidence from random matrix theory and implications for learning,” Journal of Machine Learning Research, vol. 22, no. 165, 2021.
C. Zhang, L. Du, and D. Tao, “LSV-based tail inequalities for sums of random matrices,” Neural Computation, vol. 34, 2016.
J. A. Tropp, “An introduction to matrix concentration inequalities,” 2015, https://arxiv.org/abs/1501.01571.
Z. Song, D. P. Woodruff, and P. Zhong, “Low rank approximation with entrywise $\ell_1$-norm error,” in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 688–701, Montreal, Canada, June 2017.
W. Jiang, F. Nie, and H. Huang, “Robust dictionary learning with capped $\ell_1$-norm,” in Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, July 2015.
A. Tikhonov and V. Arsenin, Solutions of Ill-Posed Problems, Halsted Press, New York, NY, USA, 1977.