A Subspace Embedding Method in ℓ2 Norm via Fast Cauchy Transform
We propose a subspace embedding method in the ℓ2 norm via the Fast Cauchy Transform (FCT). It is motivated by, and complements, the subspace embedding method in the ℓp norm, for all p except p = 2, of K. L. Clarkson et al. (ACM-SIAM, 2013). Unlike the orthogonal basis traditionally used in Johnson-Lindenstrauss (JL) embedding, we employ a well-conditioned basis in the ℓ2 norm to obtain the concentration property of the FCT in the ℓ2 norm.
Subspace embedding methods have recently proven useful in modern science and engineering. A well-known example is the Johnson-Lindenstrauss (JL) lemma [1, 2], which provides low-distortion subspace embeddings in the ℓ2 norm and has found many important applications in dimension reduction.
A vital mathematical problem in subspace embedding is to devise a basis for the embedded subspace such that where stands for the distortion; is a polynomial about . For the ℓ2 norm, this can be done by taking an orthonormal basis, because the ℓ2 norm is invariant under orthonormal transformations. For a general norm with , this invariance is lost. However, [3, 4] pointed out that the difficulty can be overcome by constructing well-conditioned bases for (1). Indeed, well-conditioned bases are more practical than orthonormal bases. Recently, Sohler and Woodruff  proved that, using random Cauchy variables, over linear mapping , with arbitrarily large constant probability, for any fixed -dimensional subspace, for all holds. Clarkson et al.  extended this to norm linear regression, for all except . The result of FCT in norm can be formulated as follows.
Theorem 1 (Fast Cauchy Transform (FCT) in norm, ). There is a distribution, over matrices , with and , such that for an arbitrary and for all , the inequality holds with probability , where
That is, with large probability, simultaneously for all , . Setting , we obtain the distortion for an arbitrarily small constant . Further, for any , the product can be computed in time.
In this paper, we study the Fast Cauchy Transform in the case of the ℓ2 norm and prove the following.
Theorem 2 (FCT in ℓ2 norm). There is a distribution, over matrices , with , such that for an arbitrary and for all , the inequality holds with probability , where
For any , the product can be computed in time. Setting to a small constant, since and , it follows that . To get concentration, we need to independently bounded in our proof which required to obtain the high probability result, and we to obtain the distortion . From norm to norm, in the embedding matrix, Cauchy variable becomes and the Cauchy distribution has density . Aiming to obtain embedding properties, we take in the tail inequality and obtain the FCT in norm which is our main result.
Let be an input matrix, where we assume and has full column rank. For , the norm of a vector is and for . Let denote the set . For matrices, we use the operator norm and the entrywise norm . denotes the identity matrix and refers to a generic constant. The Cauchy distribution has density . It will factor heavily in our discussion, and bounds for sums of squared Cauchy random variables will be used throughout. First, we state Maurer's inequality.
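As a minimal sketch of the notation above, the following Python snippet evaluates vector ℓp norms and draws Cauchy samples by inverse-CDF sampling. It assumes the standard Cauchy density 1/(π(1 + x²)); the function names are illustrative and do not come from the paper.

```python
import math
import random

def lp_norm(x, p):
    """Vector l_p norm; p = math.inf gives the max-norm."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution (assumed form)."""
    return 1.0 / (math.pi * (1.0 + x * x))

def sample_cauchy(rng):
    """Inverse-CDF sampling: tan(pi * (U - 1/2)) with U uniform on [0, 1)."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(0)
samples = [sample_cauchy(rng) for _ in range(10_000)]
# Pr[|C| <= 1] = 1/2 for a standard Cauchy C, so the sample median
# of |C| should be close to 1 even though the mean does not exist.
median_abs = sorted(abs(c) for c in samples)[len(samples) // 2]
```

The median is used as the sanity check because a standard Cauchy variable has no finite mean or variance, which is also why sums of squared Cauchy variables need the truncation arguments used later.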
Lemma 3 (see ). Let be independent random variables, , and define . Then, for any ,
Here, we take to be .
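Maurer's inequality is commonly cited in the form Pr[S ≤ E[S] − t] ≤ exp(−t² / (2 Σ E[X_i²])) for independent nonnegative variables with finite second moments. The sketch below assumes that form and compares the bound with an empirical lower tail. The uniform variables and the values of n and t are chosen only for illustration and are not taken from the paper.

```python
import math
import random

def maurer_bound(t, second_moments):
    """Upper bound on Pr[S <= E[S] - t], assuming the commonly cited form."""
    return math.exp(-t * t / (2.0 * sum(second_moments)))

# Illustration with X_i ~ Uniform(0, 1): E[X_i] = 1/2 and E[X_i^2] = 1/3.
n, t, trials = 50, 5.0, 20_000
rng = random.Random(1)
mean_s = n * 0.5
hits = sum(
    sum(rng.random() for _ in range(n)) <= mean_s - t
    for _ in range(trials)
)
empirical = hits / trials          # observed lower-tail frequency
bound = maurer_bound(t, [1.0 / 3.0] * n)
```

The bound only needs second moments of nonnegative variables, which is what makes it usable for lower tails of truncated heavy-tailed sums.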
Lemma 4 (see ). Let be i.i.d. (0,1) random variables with probability and let , where , with and . Then, for any ,
3. Main Technical Results: The Fast Cauchy Transform in ℓ2 Norm
3.1. Two Tail Inequalities
In this section, we present two tail inequalities and the definition of a well-conditioned basis in the ℓ2 norm. We then present the Fast Cauchy Transform in the ℓ2 norm, which is an analog of the fast Johnson-Lindenstrauss transform, and prove our main result, Theorem 2, in Section 3.2.
Lemma 5 (upper-tail inequality). For , let be i.i.d. Cauchy random variables and with . Let . Then, for any ,
Proof. For fixed and defined and , then we have ; note that . Because of , we have that By a union bound, . Further, ; hence, . First, we need to bound , Then, By using the pdf of a Cauchy variable, so We conclude that Because of Markov's inequality and , we have We set ; then
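Truncation arguments of this kind rest on the heavy tail of the Cauchy distribution. As a hedged illustration, assuming standard Cauchy variables, Pr[|C| > M] = 1 − (2/π) arctan(M), which is at most 2/(πM), so a union bound over m variables gives at most 2m/(πM). The function names below are illustrative, and the exact truncation level used in the proof may differ.

```python
import math

def cauchy_abs_tail(M):
    """Exact Pr[|C| > M] for a standard Cauchy variable C and M > 0."""
    return 1.0 - (2.0 / math.pi) * math.atan(M)

def union_tail_bound(m, M):
    """Union bound on Pr[some |C_i| > M] over m standard Cauchy variables."""
    return min(1.0, 2.0 * m / (math.pi * M))

# The exact tail never exceeds the simple 2 / (pi * M) bound.
for M in (1.0, 10.0, 1000.0):
    assert cauchy_abs_tail(M) <= 2.0 / (math.pi * M)
```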
Lemma 6 (lower-tail inequality). For , let be independent Cauchy random variables and with and . Let . Then, for any ,
Proof. To bound the lower tail, we use Lemma 3. By homogeneity, it suffices to prove the result for . Let . Clearly and defining , we have that and . Thus, we have that where the last inequality holds by Lemma 3 for . Using the distribution of half-Cauchy, we can verify that by choosing a fixed to make sure , since , and , so and . It follows that and the result follows.
3.2. Definition of Well-Conditioned Basis and Construction of FCT in ℓ2 Norm
Clarkson et al.  constructed as ; let be a parameter governing the failure probability of our algorithm. We modify and obtain , where: has each column chosen independently and uniformly from the standard basis vectors for and is chosen independently from a (0,1)-distribution; for sufficiently large, we will set the parameter ; is a diagonal matrix with diagonal entries chosen independently from a Cauchy distribution; and is a block-diagonal matrix comprised of blocks along the diagonal. Each block is the matrix , where is the identity matrix and is the normalized Hadamard matrix. We will set . The effect of here is to spread the weight of a vector, so that most entries of are not too small.
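The three factors described above can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the authors' implementation. It follows the shape of the construction in Clarkson et al.: a sampling matrix `B` of size r × 2n, a diagonal Cauchy matrix `C` of size 2n × 2n, and a block-diagonal matrix whose blocks stack a normalized Hadamard matrix on top of an identity matrix. The dimensions, the names `sample_transform` and `normalized_hadamard`, and the use of standard Cauchy entries are all assumptions. The scalar factor and the modification to `B` mentioned in the text are omitted.

```python
import math
import random

def normalized_hadamard(s):
    """s x s Hadamard matrix scaled by 1/sqrt(s); s must be a power of two."""
    if s < 1 or s & (s - 1):
        raise ValueError("s must be a power of two")
    H = [[1.0]]
    while len(H) < s:  # Sylvester doubling: [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    scale = 1.0 / math.sqrt(s)
    return [[v * scale for v in row] for row in H]

def sample_transform(n, r, s, rng):
    """Sample B and C once; return a function x -> B C H~ x of length r."""
    if n % s:
        raise ValueError("n must be a multiple of s")
    H = normalized_hadamard(s)
    # C: 2n independent standard Cauchy diagonal entries (assumed scale).
    cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2 * n)]
    # B: column j is the standard basis vector e_{rows[j]} of R^r.
    rows = [rng.randrange(r) for _ in range(2 * n)]

    def apply(x):
        y = []
        for start in range(0, n, s):
            z = x[start:start + s]
            # Block [H_s; I_s] maps the length-s block z to (H_s z, z).
            y.extend(sum(h * v for h, v in zip(row, z)) for row in H)
            y.extend(z)
        out = [0.0] * r
        for j, v in enumerate(y):
            out[rows[j]] += cauchy[j] * v
        return out

    return apply

rng = random.Random(0)
embed = sample_transform(n=16, r=8, s=4, rng=rng)
x = [float(i) for i in range(16)]
y = embed(x)  # embedded vector of length r = 8
```

Sampling the randomness once and returning a closure keeps the map linear, so the same transform can be applied to every column of an input matrix. A production version would replace the dense Hadamard multiply with a fast Walsh-Hadamard transform to reach the running time the construction is designed for.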
We now describe well-conditioned bases in the ℓ2 norm. Our main aim in constructing an ℓ2 well-conditioned basis is to ensure that the distortion of the subspace embedding can be tolerated. Forming an approximate basis for the range of  also leads to faster algorithms for a range of related problems, including low-rank matrix approximation [8, 9].
Next we show the construction of the well-conditioned basis , which consists of two steps: (a) let be a matrix satisfying (4) and compute and its QR-factorization , where is an orthogonal matrix; (b) output . This structure is similar to the algorithm of  for computing an well-conditioned basis.
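The two steps can be sketched as follows, under the assumption that step (a) factors the embedded matrix as Q R and step (b) outputs the input matrix multiplied by the inverse of R, as in the analogous ℓ1 construction. The helper names are hypothetical. Modified Gram-Schmidt stands in here for a library QR routine so that the example has no dependencies.

```python
import math

def qr_r_factor(M):
    """Upper-triangular R of a QR factorization, via modified Gram-Schmidt."""
    n_rows, d = len(M), len(M[0])
    cols = [[M[i][j] for i in range(n_rows)] for j in range(d)]
    R = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j):
            # Project the current column j onto the orthonormal column k.
            R[k][j] = sum(a * b for a, b in zip(cols[k], cols[j]))
            cols[j] = [b - R[k][j] * a for a, b in zip(cols[k], cols[j])]
        R[j][j] = math.sqrt(sum(b * b for b in cols[j]))
        if R[j][j] == 0.0:
            raise ValueError("matrix is rank deficient")
        cols[j] = [b / R[j][j] for b in cols[j]]
    return R

def right_solve_upper(A, R):
    """Return U = A R^{-1} by solving u R = a for each row a of A."""
    d = len(R)
    U = []
    for a in A:
        u = [0.0] * d
        for j in range(d):
            # (u R)_j = sum over k <= j of u_k * R[k][j], which must equal a_j.
            u[j] = (a[j] - sum(u[k] * R[k][j] for k in range(j))) / R[j][j]
        U.append(u)
    return U

def conditioned_basis(A, PiA):
    """(a) QR-factor the embedded matrix PiA; (b) output U = A R^{-1}."""
    return right_solve_upper(A, qr_r_factor(PiA))

A = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]
# Sanity check with the identity as embedding: U is then the Q factor of A.
U = conditioned_basis(A, A)
gram = [[sum(row[i] * row[j] for row in U) for j in range(2)] for i in range(2)]
```

With the identity as embedding, the Gram matrix of `U` is the identity, which confirms the mechanics. With a genuine embedding the columns of `U` are no longer orthonormal, and how far they deviate is what the conditioning analysis controls.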
To obtain Theorem 2, we divide it into two propositions. To prove the upper bound, we use the existence of a -conditioned basis and apply this basis to show that cannot expand too much. To prove the lower bound, we show that the inequality holds with high probability for a particular ; then we use a suitable -net to obtain the result for all .
Proposition 8. With probability at least , for all , , where .
Proof. Let be a --conditioned basis space of , which implies that for some we have . By the construction of , for any , and so Thus our main aim is to show that . We have where . We need to bound . For any vector , we represent by block of size , so and . Recall that and . Then, It follows that Applying this to for , The entry of is ; here we take . So, Here are dependent Cauchy random variables and . Using as a standard orthogonal basis, we obtain Hence, we can apply Lemma 5, and to obtain, Setting to , then we can get . Thus, with probability at least , To prove the second section, first we will show a result for fixed ; we will describe in the next lemma.
Lemma 9. Let .
Proof. We represent any vector by its blocks of size , so and . Let ,. We have that and
We can conclude that , so . To analyze , , we could have . Here we take ; then ( is an independent Cauchy random variable), so To apply Lemma 6, we need to bound and . First, because is standard orthogonal, then
To bound , we will show that is nearly uniform. Because and are independent for , we can use Lemma 4 with , with and ; setting in Lemma 4, By a union bound, none of the exceed with probability at most . We assume this high probability event, so we get . We apply Lemma 4 with and to obtain . By a union bound, with probability at least .
Proposition 10. Assume Proposition 8 holds. Then, for all , holds with probability at least
Proof. The proposition follows by putting a -net on the range of (observe that the range of has dimension at most ). Specifically, let be any fixed dimensional subspace of ( is the range of ). Consider the -net on with cubes of side ; there are such cubes required to cover the hypercube ; and for any two points inside the same cube, . From each of the -cubes, select a fixed generic point and . In order to make sure that belongs to , by a union bound and Lemma 9,
We will condition on the high probability event that for all . For any with , let denote the representative point which is in the same cube of . Then . Considering by choosing , we have , with probability at least
Since , by choosing for large enough , the final probability of failure is at most . And also since and , by the inequality (1), we could get the distortion for an arbitrarily small constant .
In this paper, we extend the choice of embedding matrix and apply a well-conditioned basis in the ℓ2 norm. Based on Maurer's inequality, we obtain tail inequalities for Cauchy variables. Using the specified matrix , we present the Fast Cauchy Transform in the ℓ2 norm.
This work is supported in part by the National Science Foundation of China under Grants no. 61271014, no. 61072118, and no. 61170159; by the Science Project of National University of Defense Technology JC120201; and by the National Natural Science Foundation of Hunan Province (China) 13JJ2001. The authors would like to thank the anonymous referees for helpful suggestions, and Hui Zhang and Tao Sun for many thoughtful discussions.
W. B. Johnson and J. Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” Contemporary Mathematics, vol. 26, pp. 189–206, 1984.
K. L. Clarkson, “Subgradient and sampling algorithms for ℓ1 regression,” in Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '05), pp. 257–266, January 2005.
K. L. Clarkson, P. Drineas, M. Magdon-Ismail, X. R. Meng, M. W. Mahoney, and D. P. Woodruff, “The fast Cauchy transform and faster robust linear regression,” in Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 2013.
A. Maurer, “A bound on the deviation probability for sums of non-negative random variables,” Journal of Inequalities in Pure and Applied Mathematics, vol. 4, no. 1, article 15, 2003.
M. W. Mahoney, Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, NOW Publishers, Boston, Mass, USA, 2011.
P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff, “Fast approximation of matrix coherence and statistical leverage,” in Proceedings of the 29th International Conference on Machine Learning, 2012.