Split-and-Combine Singular Value Decomposition for Large-Scale Matrix
The singular value decomposition (SVD) is a fundamental matrix decomposition in linear algebra. It is widely applied in many modern techniques, for example, high- dimensional data visualization, dimension reduction, data mining, latent semantic analysis, and so forth. Although the SVD plays an essential role in these fields, its apparent weakness is the order three computational cost. This order three computational cost makes many modern applications infeasible, especially when the scale of the data is huge and growing. Therefore, it is imperative to develop a fast SVD method in modern era. If the rank of matrix is much smaller than the matrix size, there are already some fast SVD approaches. In this paper, we focus on this case but with the additional condition that the data is considerably huge to be stored as a matrix form. We will demonstrate that this fast SVD result is sufficiently accurate, and most importantly it can be derived immediately. Using this fast method, many infeasible modern techniques based on the SVD will become viable.
The singular value decomposition (SVD) and the principle component analysis (PCA) are fundamental in linear algebra and statistics. There are many modern applications based on these two tools, such as linear discriminate analysis , multidimensional scaling analysis , and feature extraction, high-dimensional data visualization. In recent years, digital information has been proliferating and many analytic methods based on the PCA and the SVD are facing the challenge of their significant computational cost. Thus, it is crucial to develop a fast approach to compute the PCA and the SVD.
Currently there are some well-known methods for computing the SVD. For example, the GR-SVD is a two-step method which performs Householder transformations to reduce the matrix to bidiagonal form then performs the QR iteration to obtain the singular values [3, 4]. Since the off-diagonal regions are used to store the transform information, this approach is very efficient in saving the computational memory. If we only want to compute a few of the largest singular values and associated singular vectors of a large matrix, the Lanczos bidiagonalization is an important procedure for solving this problem [5–8]. However, the previously mentioned methods all require matrix multiplications for the SVD. One interesting problem is how do we compute the SVD for a matrix when the matrix size is huge and loading the whole matrix into the memory is not possible? The main purpose of this paper is to deal with this problem when the numerical rank of the huge matrix is small.
The second purpose of this paper is to update the SVD when the matrix size is extended by new data updating. If the rank of matrix is much smaller than the matrix size, Matthew proposed a fast SVD updating method for the low-rank matrix in 2006 . A rank- thin SVD of an matrix can be computed in time for . Although the matrix size remains unchanged in the Matthew’s method, we can still adopt the analysis of this paper for updating the SVD when the matrix size is changed.
Multidimensional scaling (MDS) is a method of representing the high-dimensional data into the low-dimensional configuration [10–12]. One of the key approaches of the MDS is simply the SVD, that is, if we can find a fast approach of the MDS then it is possible to find a fast approach of the SVD. When the data configuration is Euclidean, the MDS is similar to the PCA, in that both can remove inherent noise with its compact representation of data. The order three computational complexity makes it infeasible to apply to huge data, for example, when the sample size is more than one million. In 2008, Tzeng et al. developed a fast multidimensional scaling method which turned the classical order three MDS method to be linear . In this paper, we would like to implement SCMDS to the fast SVD approach, say SCSVD. The following subsections are reviews of the classical MDS and the SCSVD.
1.1. Review of the Classical MDS
Assume that is an -by- matrix with rank , where the th row of represents the coordinates of the th data point in an -dimensional space. Let be the product matrix of . Torgerson  proposed the first classical MDS method to reconstruct a matrix of Cartesian coordinates of points from a given symmetric matrix . The key of the classical MDS is to apply the double centering operator and the singular value decomposition.
Assume that is an -by- matrix whose elements are all ’s. We define a symmetric matrix by where is the row mean matrix of , is the column mean matrix of , and is the ground mean matrix of . The operator from to is called double centering. If we define a matrix by can be simplified to . Since matrix is symmetric, the SVD decomposes into . Then we have for some unitary matrix . In practice, we set to obtain the MDS result, namely, .
Although , for some unitary matrix , it is the same as shifting all the rows of such that the row’s mean is zero and then multiplying the result by . Hence, the row of preserves the pairwise relationship among the data points. is derived from by double centering, whose cost is lower than obtaining by the SVD. When the data size is huge, the cost of the classical MDS is dominated by the cost of the SVD, whose computational complexity makes it infeasible. Therefore, we need a fast MDS method to deal with the huge data problem.
The MDS method is useful in the application of dimension reduction. If the matrix is extended to the higher dimensional space and then shifted and/or rotated by an affine mapping, the double centering result of the corresponding product matrix stays the same. More generally, let be derived from by an affine mapping such that , where is orthogonal and and is a nonzero vector in and . That is is a big matrix but with a rank less than . Assume that , the double centering of obtains the same as double centering of .
Theorem 1. Let be the dimensional extension from by an affine mapping . Then the double centering of is the same as the double centering of .
Proof. Let and let , we have where is a vector in . By the definition of , and , we have where is the element of . Then we have,
Unlike the previous definition of , if is a distance matrix with each element , the double centering of is equivalent to , provided that . Hence, the MDS method performs double centering on , multiplies by , and then performs the SVD, which gives the configurations of the data set.
1.2. Review of the SCMDS
In 2008, we adapted the classical MDS so as to reduce the original complexity to , in which we have proved that when the data dimension is significantly smaller than the number of data entries, there is a fast linear approach for the classical MDS.
The main idea of fast MDS is using statistical resampling to split data into overlapping subsets. We perform the classical MDS on each subset and get the compact Euclidean configuration. Then we use the overlapping information to combine each configuration of subsets to recover the configuration of the whole data. Hence, we named this fast MDS method by the split-and-combine MDS (SCMDS).
Let be the -by- product matrix for some and let be a random permutation of . Assume that is a nonempty subset of for , where for and . is considered as the index of overlapping subgroups. is the submatrix of that is extracted from the rows and columns of by the index . Then is the product matrix of the submatrix of by index . Since the size of is much smaller than , we can perform the classical MDS easily on each to obtain the MDS result, say . After obtaining the representation coordinates of for each subgroup, we have to combine these representations into one. Without loss of generality, we only demonstrate how to combine the coordinates of the first two subgroups.
Assume that and are matrices whose columns are the two coordinate sets of the overlapping points with respect to and , respectively. Then there exists an affine mapping that maps to . Let and be the means of columns of and , respectively. In order to obtain the affine mapping, we apply QR factorization to both and . Then we have and . It is clear that the mean of the center of column vectors of has been shifted to zero. Because and come from the same data set, the difference between and is a rotation. Therefore, the triangular matrices and should be identical when there is no noise in and . Due to randomness of the sign of columns of in QR factorization, the sign of columns of might need to be adjusted according to the corresponding diagonal elements of , so that the signs of tridiagonal elements of and are the same.
After necessary modification to the sign of columns of , we conclude that Furthermore, we have Here, the unitary operator is and the shifting is . Since the key step of finding this affine mapping is QR decomposition, the computational cost is , where is the number of columns of and . Therefore, the cost complexity is limited by the number of samples in each overlapping region. The proof of the computational cost of SCMDS is given later.
Assume that there are points in a data set. We divide these samples into overlapping subgroups, and let be the number of points in each subgroup and let be the number of points in each intersection region. Then we have the relationship or For each subgroup, we apply the classical MDS to compute the configuration of each group data, which costs . In each overlapping region, we apply QR factorization to compute the affine transformation, which costs . Assume that the true data dimension is , and the lower bound of is . For convenience, we take for some constant . Then the total computational cost is about When , the computational cost is much smaller than , which is the computation time of the fast MDS method proposed by Morrison et al., 2003 . The key idea of our fast MDS method is to split data into subgroups, then combine the configurations to recover the whole one. Since all the order three complexities are restricted in the small number of data entries, we can therefore speed up MDS.
Proposition 2. When the number of samples is much greater than the dimension of data , the computational complexity of SCMDS is , which is linear.
After reviewing the MDS and the SCMDS, We will now demonstrate how to adapt the SCMDS method to become the fast PCA and how the fast PCA can become the fast SVD with further modification.
2.1. From the SCMDS to the SCPCA
Because the MDS is similar to the PCA when the data configuration is Euclidean, we can adapt the SCMDS method to obtain the fast PCA with the similar constraint when the rank is much smaller than the size and .
Equation (1) shows how to use double centering to convert the product matrix to and the . We note that the MDS result is equal to the matrix obtained from shifting the coordinates of with the mean of data points to the original point followed by a transformation by some unitary matrix .
Let ; the purpose of the PCA is to find the eigenvectors of the covariance matrix of and to obtain the corresponding scores. Since the covariance matrix of defined by is symmetric and finding the eigenvectors of the covariance matrix of is similar to obtaining of in (1), we set by omitting the constant . We note that such omission of the constant will not change the direction of the eigenvectors. When we want to carry out the PCA on , the related matrix can be obtained by , where is defined by In other words, carrying out the PCA on A is equivalent to carrying out the MDS on . That is and the result of PCA will be and . From the result of the MDS (), we can easily obtain and by normalizing the column vectors of .
In practice, we do not want to produce the product matrix when is huge. Rather, after we obtain matrix from , we will randomly produce the index permutation and use the index of overlapping subgroup to compute . And then we follow the step of the SCMDS to obtain the result, .
However, the result of the SCMDS is of the form , where is an unitary matrix in and is usually a nonzero vector. Removal of the factor can be easily done by computing the average of the row vectors of the SCMDS result. After removing the component from each row vector, we obtain .
Since and , we have which is a small matrix. It is easy to solve this small eigenvalue problem to obtain , , and . The computational cost from to obtain and is . Hence, the SCMDS approach to computing the SVD is still linear.
Notice that there are some linear SVD methods for thin matrix ( or ). Even in the case of big matrix with small rank, the traditional SVD method, for example the GR-SVD, can be implemented to be linear (because of the dependency, many components of the matrix become zero in the GR-SVD algorithm). However, if the rank of big matrix is almost full and the numerical rank is small or the matrix size is considerably huge that loading the whole matrix is impossible, the SCMDS has the advantage in computing speed. And we call this approach to obtaining and by the SCMDS to be SCPCA.
2.2. From the SCPCA to the SCSVD
The concepts of the SVD and the PCA are very similar. Since the PCA starts from decomposing the covariance matrix of a data set, it can be considered as adjusting the center of mass of a row vector to zero. On the other hand, the SVD operates directly on the product matrix without shifting. If the mean of the matrix rows is zero, the eigenvectors derived by the SVD are equal to the eigenvectors derived by the PCA. We are looking for a method which will give a fast approach to produce the SVD result without recomputing the eigenvectors of the whole data set, when the PCA result is given. The following is the mathematical analysis for this process.
Let be a column matrix of data set: , where is the mean of columns of . Hence, the row mean of is zero. Assume that we have the PCA result of , that is, . Then we have for some orthogonal matrix . Assume that the rank of is and is much smaller than the matrix size. We observe that or , depending on whether is spanned by columns of . If is spanned by , then where is the coefficient vector of when represented by , that is, .
If the singular value decomposition of is , we have Because the matrix is unitary, is automatically an orthogonal matrix as well. Then we have the SVD of .
Checking the matrix size of , we can see that to compute the SVD of is not a big task. This is because is a -by- matrix, and under our assumption, is much smaller than , so we can apply the economic SVD to obtain the decomposition of .
On the other hand, if is not spanned by , the analysis becomes where is a unit vector defined by Using the same concept of diagonalization in the case when is spanned by columns of , we find the SVD of Then , where is another orthogonal matrix, and hence the SVD of is completed.
Proposition 3. Let be a big size matrix and let its rank be much smaller than . Then the SCSVD or the SCPCA has linear computational complexity.
From the above analysis, we can have a fast PCA approach by computing the SCMDS first and then adapt the MDS result to obtain the PCA. We named this approach SCPCA. Similarly, the fast SVD approach, which computes the SCMDS first, then adapts the MDS result to obtain the PCA, and finally adapts the PCA result to the SVD, is called the SCSVD. These two new approaches work when the rank of is much smaller than the number of samples and the number of variables. To obtain the exact solution, the parameter must be greater than the rank of . In the SCPCA or SCMDS method, if , we only get the approximated solutions of the PCA and the SVD. Under the necessary criterion, we can reduce the computational complexity from to . If the numerical rank is not small, for example, , the computational complexity becomes almost the same as the original PCA and SVD. Our approach has no advantage in the latter case.
3. SVD for Continuously Growing Data
In this section, we look for the solution when the data is updated constantly and we need to compute the SVD continuously. Instead of scanning all the data again, we try to use the previous result of the SCSVD together with the new updated data to compute the next SVD. Before introducing our updating method, we review the general updating methods first.
Let be an -by- matrix, where is the number of variables and is the number of samples. And we assume that both and are huge. When new data comes in, we collect these new data to form a column matrix which is denoted by . Assume that we have the singular value decomposition of , that is, where , are orthogonal and is a diagonal. Since the data gets updated, the data matrix becomes To compute the singular value decomposition of , we need to compute the eigenvalue and eigenvector of .
If is spanned by the column space of , we can represent the column matrix by , where is the coefficient matrix of with columns of as the basis. Since is orthogonal, the coefficient matrix can be computed easily by . Then we have
Note that the matrix is positive symmetric and it is a small size matrix. Using the spectrum theorem, we can decompose this matrix into . Because the matrix is unitary, is rotated by . In this case, the computational cost is dominated by the rotation from to , and it is .
In general, the extended matrix is not spanned by the column space of and the decomposition of should be modified by where , and is the QR decomposition of . The cost to obtain and is , the computational cost of QR decomposition of is , the cost to obtain is , and the cost of rotation to is . When the cost of update process is linear . However, if the rank is not small ( or ), the computational cost is dominated by .
Now, we discuss how to update the SVD in the SCSVD approach. If the updated column vectors are spanned by the column vectors of some subgroup of the original data, say , the dimension will not increase by the extended matrix . We just reset the distance matrix in this subgroup by where is the number of vectors in the original -th subgroup. To compute the MDS result of this updated subgroup, it costs . Then we will combine this MDS representation points with the remaining points. The cost of obtaining the combination affine mapping is still which is increased by the new update. Combining this new subgroup with the whole data set needs . Then transfering the MDS result to the SVD needs , which is linear too.
If is not spanned by any subgroup of the previous SCMDS process, then we have to update to the largest group of the subgroups. Because the dimension of this updated group will increase, we have to make some modifications in the combined approach. If the column matrix of the MDS configuration of the new updated group is and the column matrix of the configuration of the other elements is , where , then we will extend the matrix to an -by- matrix by filling zeros in the bottom of the original . After adjusting and to the same dimension of rows, we perform the same combination step as before. Here, the width of matrix is almost the number of whole data, and this combination step takes .
There is one important difference between the general approach and the SCSVD approach. To obtain the SVD from the MDS result, we need the column mean vector of the matrix. The column mean vector of the original matrix must be computed, and then the column mean vector is updated by the simple weighted average when the new data comes in.
Notice that no matter whether is spanned by or not, the update method in the SCSVD has no advantage over the general update method although the computational costs are of the same order. This is because the true computational cost of the SCSVD updating method is always more than the general updating method, even though they are of the same order. Thus, when we have to update the new data to the matrix on which the SVD will be performed, the general method is recommended. However, when the matrix size is so huge that to expand becomes impossible, it is better to recompute the SVD result. In this case and with the condition that the true rank is much smaller than the matrix size, the SCSVD approach is recommended.
4. Experimental Result
In this section, we show that our fast PCA and fast SVD methods work well for big-sized matrices with small ranks. The simulated matrix is created by the product of two slender matrices and one small diagonal matrix, that is, , where . The size of the first matrix is , the second matrix is , and the third matrix is . Then the product of these three matrixes is of size and its rank is smaller than . The absolute values of the diagonal elements of are decreasing to simulate the situation as data normally comes in. When and are large and is much smaller than and , the simulated matrix satisfies our SCSVD condition. We pick , , and as our first example. Each element of the simulated matrix is generated from the normal distribution and then the diagonal terms of are adjusted. The average elapsed time of the SCSVD is 3.98 seconds, while the economical SVD takes 16.14 seconds. Here, each average value is derived from 100 repeats.
If we increase the matrix to , and fix the same rank , the elapsed time of the economical SVD is 1209.92 seconds, but the SCSVD is only 195.85 seconds. We observe that our SCSVD method demonstrates significant improvement. Figure 1 shows the speed comparison between the economical SVD (solid line) and the SCSVD (dashed line) with the square matrix size from 500 to 4000 by fixed rank 50. We also use fixed parameter and in each simulation test. We can see that the computational cost of the SVD follows the order 3 increase, compared with linear increase of the SCSVD.
Note that when the estimated rank used in the SCSVD is greater than the real rank of data matrix, there is almost no error (except rounding error) between the economic SVD and the SCSVD. The error between the economical SVD and the SCSVD is computed by comparing the orthogonality. Assume that is derived by the SCSVD and is derived from the economical SVD. Since the output of the column vectors of and might reverse, the error is defined by where is the first column of , is the first column of , and is the identity matrix of size . The error is about only. The average error in the same experiment in Figure 1 is and the standard deviation is . Thus, when the estimated rank of the SCSVD is greater than the true rank, the accuracy of the SCSVD is pretty much the same as the SVD in the case of a small rank matrix.
We would like to explore what happens if the estimated rank is smaller than the true rank. According to the experiment of the SCMDS , if the estimated rank is smaller than the true rank, the performance of the SCMDS will deteriorate. In our experiment, we set the true rank of the simulated matrix to be and then observe the estimated rank from to . The relationship between the error and the estimated rank with different matrix size is shown in Figure 2.
We can see that when the estimated rank decreases, the error arises rapidly. Lines in Figure 2 from the bottom to the top are the matrix size with to , respectively. The error increases slowly when the matrix size increases. We observe that making the estimated rank greater than the true dimension is essential for our method.
The purpose of the second simulation experiment is to observe the approximation performance of applying the SCPCA to a big full rank matrix. We generate a random matrix with a fixed number of columns and rows, say 1000. The square matrix is created by the form, , where is the essential rank, is the perturbation, and is a small coefficient for adjusting the influence to the previous matrix. Such a matrix can be considered as a big-sized matrix with a small rank added by a full rank perturbation matrix. We will show that our method works well for this type of matrices.
Figure 3 shows the error versus estimated rank, where the error is defined as (25), which is a comparison of the orthogonality between and . All the elements of matrices , , are randomly generated from the normal distribution and is a diagonal matrix in which the diagonal terms decay by the order, where and the essential rank . We can see that when the estimated rank increases, the composition error decreases. Especially when the estimated rank is greater than the essential rank , there is almost no error. Thus, it is important to make sure that the estimated rank is greater than the essential rank. In other words, when the estimated rank of the SCSVD is smaller than the essential rank, our SCSVD result can be used as the approximated solution of the SVD.
In the last experimental result, we let the matrix be growing and observe the performance between the general and the SCSVD update approaches. We start from a matrix that is formed by , where is a diagonal matrix such that the diagonal terms are positive and decayed rapidly like natural data and each element in and is a random number from the normal distribution . Then we update a random matrix into the matrix 10 times of the size. The element of the updated matrix is also from the normal distribution . When the matrix gets updated, we compute the SVD decomposition to obtain the first columns of . The updated result of the general approach is , and the updated result of the SCMDS approach is . We compute the error as (25) for and . The reason that we concern ourselves only with the first bases of the column space is that it is rare to use such a high dimension in the dimension reduction applications.
The computational time of the SCMDS is more than the general approach; however, the difference is within one second. The error of the SCMDS approach to update rows compared with recomputing the SVD decomposition is with standard deviation . The error of general approach is with standard deviation . The update time of the SCMDS and general approach is shown in Figure 4. The red solid line is the time for the SCMDS and the blue line is that for general approach. We can see that the SCMDS approach is about 8 times of general approach. Hence the SCSVD update approach is not recommended for either saving time or controlling error.
We proposed the fast PCA and fast SVD methods derived from the technique of the SCMDS method. The new PCA and the SVD have the same accuracy as the traditional PCA and the SVD method when the rank of a matrix is much smaller than the matrix size. The results of applying the SCPCA and the SCSVD to a full rank matrix are also quite reliable when the essential rank of matrix is much smaller than the matrix size. Thus, we can use the SCSVD in a huge data application to obtain a good approximated initial value. The updating algorithm of the SCMDS approach is discussed and compared with the general update approach. The performances of the SCMDS approach both in the computational time and error are worse than the general approach. Hence, the SCSVD method is only recommended for computing the SVD for a large matrix but is not recommended for the update approach.
This work was supported by the National Science Council.
M. Cox and T. Cox, Multidimensional Scaling, Handbook of Data Visualization, Springer, Berlin, Germany, 2008.
M. Chalmers, “Linear iteration time layout algorithm for visualizing high-dimensional data,” in Proceedings of the 7th Conference on Visualization, pp. 127–132, November 1996.View at: Google Scholar
A. Morrison, G. Ross, and M. Chalmers, “Fast multidimensional scaling through sampling, springs and interpolation,” Information Visualization, vol. 2, no. 1, pp. 68–77, 2003.View at: Google Scholar