Abstract

As major components of random matrix theory, Gaussian random matrices play an important role in many fields, because they are unitarily invariant, have independent entries, and can serve as models for multivariate data or multivariate phenomena. Tail bounds for the eigenvalues of Gaussian random matrices are an active research topic. In this paper, we present tail and expectation bounds for the $\ell_1$ norm of Gaussian random matrices. Moreover, tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix are derived from the resulting bounds. Compared with existing results, our bounds are better suited to the high-dimensional matrix case. Finally, we study tail bounds for the parameter vector of some existing regularization algorithms.

1.  Introduction

Random matrix theory (RMT) has developed into an important branch of probability theory. RMT has been used in many disciplines, e.g., high-dimensional data analysis [1], neural networks [2], low-rank matrix approximation [3], compressed sensing [4], dimension reduction [5], combinatorial optimization [6], deep learning [7], wireless communications [8], multiview unsupervised feature selection [9], local and global problems of random matrices [10], multiview subspace clustering [11], and feature selective projection [12].

The research on tail bounds for sums of random matrices can be traced back to the Ahlswede–Winter method [13]. Tropp [14] obtained tighter results and proposed user-friendly tail bounds for sums of random matrices. To mitigate the influence of the matrix dimension, tail bounds that depend on the intrinsic dimension have been proposed [15, 16]. Zhang et al. [17] presented dimension-free tail bounds on the largest singular value of sums of random matrices. Gao et al. [18] obtained dimension-free bounds for the largest singular values of matrix Gaussian series. There have also been studies on the small-deviation behavior of random matrices [19, 20].

1.1.  Related Works

Let $X$ be a Gaussian random matrix whose entries are independent standard normal variables, and let $\circ$ denote the Hadamard product of matrices. Then $X$ can be represented as the Hadamard product of two matrices, $X = X \circ J$, where $J$ is the matrix of the same dimension as $X$ whose entries are all 1. According to Corollary 4.2 in [14] and an additional assumption (which, unless otherwise specified, is assumed to hold throughout this paper), we obtain that for any $t > 0$,

Here, $\|X\|$ denotes the spectral norm of $X$.

According to Theorem 2 in [18], we obtain that for any $t > 0$,

Here, the symbol on the left-hand side denotes the largest singular value (LSV) of $X$.

The following expectation bound for the spectral norm of $X$ is derived from Theorem 4.6.1 in [21]:

For the LSV of $X$, an expectation bound can be obtained from Theorem 4 in [18]:

However, to the best of our knowledge, there is little work on tail bounds for the $\ell_1$ norm of Gaussian random matrices.

1.2.  Overview of Main Results

Gaussian random matrices are used in many fields, e.g., signal processing and combinatorial optimization. To complement and improve the theoretical study of concentration inequalities for random matrices, in this paper we present tail bounds for the $\ell_1$ norm of Gaussian random matrices, i.e., upper bounds on

Here, $\|X\|_1$ denotes the $\ell_1$ norm of $X$. We also obtain expectation bounds for the $\ell_1$ norm of Gaussian random matrices. In view of the definition of the $\ell_1$ norm, our results are derived via the Laplace-transform method applied to the folded Gaussian distribution. As applications, we use the obtained theorems to compute tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix. In contrast to existing results, our bounds are better suited to the high-dimensional matrix case.

The rest of this paper is organized as follows. Section 2 introduces preliminary knowledge on the $\ell_1$ norm and the folded Gaussian distribution. Section 3 gives the tail and expectation bounds for the $\ell_1$ norm of Gaussian random matrices and presents an application of our results to the Gaussian Wigner matrix. In Section 4, we study tail bounds for the parameter vector of some existing regularization algorithms. The last section concludes the paper.

2.  Notations and Preliminaries

In this section, we provide preliminary knowledge on the $\ell_1$ norm of matrices and the folded Gaussian distribution.

2.1.  The $\ell_1$ Norm of Matrices

For a matrix $A \in \mathbb{R}^{m \times n}$, the entrywise $\ell_1$ norm is defined as $\|A\|_1 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|$, where $a_{ij}$ is the $(i, j)$ entry of $A$. The $\ell_1$ norm is an optimal convex approximation to the $\ell_0$ norm, so it promotes sparsity and makes models easier to interpret. The $\ell_1$ norm is widely used in machine learning and related fields, e.g., low-rank approximation [22] and dictionary learning [23].
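As a minimal illustration (not part of the original derivation), the entrywise $\ell_1$ norm can be computed directly in NumPy; note that `np.linalg.norm` with `ord=1` returns the maximum column sum for matrices, so the entrywise sum must be written out explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # a small Gaussian random matrix

# Entrywise l1 norm: the sum of the absolute values of all entries.
l1_norm = np.abs(A).sum()

# np.linalg.norm(A, ord=1) would instead return the maximum column sum,
# so the explicit entrywise sum above is the quantity studied here.
print(l1_norm)
```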

2.2.  Folded Gaussian Distribution

The folded Gaussian distribution is the distribution of the absolute value of a normally distributed random variable; it is widely applied in practice, especially in Bayesian inference [24] and process capability measures [25]. Given a standard normal variable $Z$, the random variable $Y = |Z|$ follows the folded Gaussian distribution. Its probability density function (PDF) is given by $f(y) = \sqrt{2/\pi}\, e^{-y^{2}/2}$ for $y \ge 0$, and 0 everywhere else.

The moment generating function (mgf) is given by $\mathbb{E}[e^{tY}] = 2 e^{t^{2}/2} \Phi(t)$, where $\Phi$ is the standard normal cumulative distribution function, $\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(x/\sqrt{2}\right)\right]$, and $\operatorname{erf}$ is the error function, defined as $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^{2}}\, du$.

Based on this closed form, the mgf admits an upper bound, obtained through the following chain of inequalities:

The third inequality holds because the error function is an odd function.
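As a sanity check (using the standard folded-normal identity $\mathbb{E}[e^{\theta|Z|}] = 2 e^{\theta^{2}/2}\Phi(\theta)$ stated above, and SciPy's normal CDF), the closed-form mgf can be compared against a Monte Carlo estimate:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)

for theta in (0.5, 1.0, 2.0):
    mc = np.exp(theta * np.abs(z)).mean()                   # Monte Carlo E[exp(theta*|Z|)]
    closed = 2.0 * np.exp(theta**2 / 2) * norm.cdf(theta)   # closed-form mgf of |Z|
    print(theta, mc, closed)
```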

3.  Main Results

In this section, we present the tail and expectation bounds for the $\ell_1$ norm of Gaussian random matrices and use our theoretical findings to compute tail and expectation bounds for the Gaussian Wigner matrix.

3.1.  Bounds for Gaussian Random Matrices

Based on the mgf bound (11), we first obtain the tail bound for the $\ell_1$ norm of Gaussian random matrices.

Theorem 1. Let $X$ be an $m \times n$ Gaussian random matrix; that is, its entries are independent standard normal variables. Then, for any $t > 0$,

Proof. For any positive number $\theta$, we bound the tail probability by a chain of relations: the first identity uses the monotonicity of the scalar exponential function, and the second relation is Markov's inequality. The infimum over $\theta$ is attained at an explicit choice of $\theta$, which yields the stated bound and completes the proof.
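Because the displayed bound is not reproduced here, the following sketch only illustrates the Laplace-transform (Chernoff) recipe described in the proof, assuming the entrywise $\ell_1$ norm and the folded-normal mgf from Section 2; the dimensions, threshold, and optimizer are illustrative choices, not the paper's:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

m, n = 5, 8                               # illustrative dimensions
t = m * n * np.sqrt(2 / np.pi) + 10.0     # threshold above E||X||_1

def chernoff_bound(t, m, n):
    # ||X||_1 is a sum of m*n i.i.d. folded standard normals, so
    # P(||X||_1 >= t) <= inf_{theta>0} exp(-theta*t) * mgf(theta)^(m*n),
    # with mgf(theta) = 2 * exp(theta^2 / 2) * Phi(theta).
    def log_bound(theta):
        return -theta * t + m * n * (np.log(2.0) + theta**2 / 2 + norm.logcdf(theta))
    res = minimize_scalar(log_bound, bounds=(1e-6, 50.0), method="bounded")
    return np.exp(res.fun)

rng = np.random.default_rng(0)
samples = np.abs(rng.standard_normal((20_000, m, n))).sum(axis=(1, 2))
print("Chernoff-type bound:", chernoff_bound(t, m, n))
print("Monte Carlo tail   :", (samples >= t).mean())
```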

Compared with the previous result (2), the tail bound (12) has no dimensional coefficient term. Therefore, the tail bound (12) is more applicable to the case of high-dimensional matrices and does not require the intrinsic dimension to sharpen the result. Compared with the existing result (3), our bound is less affected by the dimension.

Based on Theorem 1, we can obtain an expectation bound for the $\ell_1$ norm of Gaussian random matrices.

Theorem 2. Let $X$ be an $m \times n$ Gaussian random matrix. Then

Proof. According to Theorem 1, we obtain the first bound. Then, by Jensen's inequality, we obtain the second bound. This completes the proof.
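For reference, the expectation itself has a simple closed form, $\mathbb{E}\|X\|_1 = mn\sqrt{2/\pi}$, since each $|x_{ij}|$ is a folded standard normal with mean $\sqrt{2/\pi}$; any valid expectation bound, such as (14), must lie above this value. A quick simulation (with illustrative dimensions) confirms it:

```python
import numpy as np

m, n = 20, 30
rng = np.random.default_rng(1)
X = rng.standard_normal((5_000, m, n))          # 5,000 independent m-by-n Gaussian matrices

empirical = np.abs(X).sum(axis=(1, 2)).mean()   # Monte Carlo estimate of E||X||_1
exact = m * n * np.sqrt(2 / np.pi)              # E|x_ij| = sqrt(2/pi) for each entry
print(empirical, exact)
```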

Note that the second bound (14) is tighter than the previous result (5), and the difference becomes more apparent as the matrix dimension increases.

3.2.  Gaussian Wigner Matrix

As an application, we use the resulting bounds to compute tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix.

Consider a Gaussian Wigner matrix $W$ of the following form: a symmetric matrix whose entries on and above the diagonal are independent standard normal variables, with the entries below the diagonal determined by symmetry.

The large deviation principle for the Gaussian Wigner matrix is a classic problem in random matrix theory [26–28]. Next, based on our resulting bounds, we compute tail and expectation bounds for the $\ell_1$ norm of $W$ as follows. For any $t > 0$,
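A small sketch of this setting (assuming the symmetric construction described above; the dimension is an arbitrary illustrative value) builds such a matrix and evaluates its entrywise $\ell_1$ norm; by symmetry, each off-diagonal variable is counted twice in the sum:

```python
import numpy as np

def gaussian_wigner(d, rng):
    # Symmetric matrix with independent N(0,1) entries on and above the diagonal,
    # mirrored below the diagonal (the standard symmetric construction assumed here).
    upper = np.triu(rng.standard_normal((d, d)))
    return upper + np.triu(upper, k=1).T

rng = np.random.default_rng(4)
W = gaussian_wigner(200, rng)

# Entrywise l1 norm of W; each off-diagonal variable contributes twice by symmetry.
print(np.abs(W).sum())
```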

Compared with the previous result (17) in [18], the tail bound (18) has no dimensional coefficient term. Note also that the second bound (19) is tighter than the corresponding result in [18]. Therefore, our bounds are more applicable to the case of high-dimensional matrices.

4.  Analysis of Regularization Algorithms

Now that the comparison of tail bounds for different norms has been established, we study tail bounds for the parameter vector of some existing regularization algorithms. More precisely, we analyse two well-known regularization schemes in machine learning, namely $\ell_1$ and $\ell_2$ regularization.

Given a sample set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^{d}$ and $y_i \in \mathbb{R}$, we consider the linear regression model $f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b$, where $\langle \cdot, \cdot \rangle$ denotes the inner product, $\mathbf{w}$ is the weight vector (a weight matrix with one column and $d$ rows), and $b$ is the bias. This model is prone to overfitting and has a low tolerance for noisy data. To overcome these shortcomings, a regularization term is added to the objective function, weighted by a regularization coefficient $\lambda > 0$. Typically, the regularizer is taken to be the $\ell_1$ or the $\ell_2$ norm of $\mathbf{w}$. The linear regression model with $\ell_2$ regularization is called ridge regression [29], and with $\ell_1$ regularization it is the least absolute shrinkage and selection operator (Lasso) [30].
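A minimal sketch of these two objectives in NumPy is given below; the exact scaling of the loss and of the regularization coefficient (sum versus mean of squared errors) is an assumption made for illustration, and all variable names are hypothetical:

```python
import numpy as np

def ridge_objective(w, b, X, y, lam):
    # Squared loss plus l2 regularization (ridge regression).
    residual = X @ w + b - y
    return np.sum(residual**2) + lam * np.sum(w**2)

def lasso_objective(w, b, X, y, lam):
    # Squared loss plus l1 regularization (Lasso).
    residual = X @ w + b - y
    return np.sum(residual**2) + lam * np.sum(np.abs(w))

rng = np.random.default_rng(5)
X = rng.standard_normal((100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]                    # sparse ground-truth weights
y = X @ w_true + 0.1 * rng.standard_normal(100)  # noisy linear responses

w0, b0 = rng.standard_normal(10), 0.0
print(ridge_objective(w0, b0, X, y, 0.1), lasso_objective(w0, b0, X, y, 0.1))
```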

As shown in Figure 1, the solution of Lasso usually contains some zero components, i.e., the weight vector is sparse. Therefore, Lasso has become one of the most popular methods for feature extraction and has been successfully applied to many practical problems [31–34].

The aim is to find a weight vector that minimizes the objective function. The error bound with respect to the parameters is expressed as a supremum, with an alternative expression as a tail probability.

In machine learning, the weight initialization method has a crucial impact on the convergence rate of a model and on the quality of its final performance. Gaussian initialization is the simplest such method: the weights are initialized according to a Gaussian distribution. From the tail bounds of Section 3, we know that the tail bound for the $\ell_1$ norm of Gaussian random matrices is tighter than the corresponding bound for the $\ell_2$ norm. In fact, the difference between $\ell_1$ and $\ell_2$ regularization lies in the minimization behaviour. As shown in Figure 2, the $\ell_1$ penalty decreases along the absolute value function, while the $\ell_2$ penalty decreases along the quadratic function; near the zero point, the $\ell_1$ penalty decreases faster than the $\ell_2$ penalty.
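The following tiny numerical illustration (not from the paper; the values are arbitrary) makes the Figure 2 point concrete: near zero the $\ell_1$ penalty keeps a constant (sub)gradient magnitude of 1, whereas the $\ell_2$ gradient $2w$ vanishes, which is why $\ell_1$ regularization drives small weights exactly to zero:

```python
import numpy as np

# Near zero, the l1 penalty keeps a constant (sub)gradient of magnitude 1,
# while the l2 penalty's gradient 2w vanishes, so l1 keeps pushing small
# weights all the way to exactly zero (the sparsity effect of Lasso).
w = np.array([1.0, 0.1, 0.01, 0.001])
print("l1 penalty  :", np.abs(w))
print("l1 gradient :", np.sign(w))
print("l2 penalty  :", w**2)
print("l2 gradient :", 2 * w)
```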

5.  Conclusion

In this paper, we present tail bounds for the $\ell_1$ norm of Gaussian random matrices. In particular, we also give expectation bounds for the $\ell_1$ norm of Gaussian random matrices. As an application, we use the resulting bounds to compute tail and expectation bounds for the $\ell_1$ norm of the Gaussian Wigner matrix. In contrast to existing results, our results are better suited to the high-dimensional matrix case. Finally, we study tail bounds for the parameter vector of $\ell_1$ and $\ell_2$ regularization algorithms.

In future work, we will apply the theoretical results to some potential practical scenarios, such as clustering and classification.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (12101378), Project of Science and Technology Innovation Fund of Shanxi Agricultural University (2021BQ10), Project of Scientific Research for Excellent Doctors, Shanxi Province, China (SXBYKY2021046), and Shanxi Provincial Research Foundation for Basic Research, China (20210302124548).