Abstract

We present a randomized singular value decomposition (rSVD) method for the purposes of lossless compression, reconstruction, classification, and target detection with hyperspectral image (HSI) data. Recent work in low-rank matrix approximations obtained from random projections suggests that these approximations are well suited for randomized dimensionality reduction. Approximation errors for the rSVD are evaluated on HSI, and comparisons are made to deterministic techniques as well as to other randomized low-rank matrix approximation methods involving compressive principal component analysis. Numerical tests on real HSI data suggest that the method is promising and is particularly effective for HSI data interrogation.

1. Introduction

Hyperspectral imagery (HSI) data are measurements of the electromagnetic radiation reflected from an object or scene (i.e., materials in the image) at many narrow wavelength bands. Often, this is represented visually as a cube, where each slice of the cube represents the image at a different wavelength. Spectral information is important in many fields such as environmental remote sensing, monitoring chemical/oil spills, and military target discrimination. For comprehensive discussions, please see, for example, [1-3]. Hyperspectral image data are often represented as a matrix $A \in \mathbb{R}^{m \times n}$, where each entry $a_{ij}$ is the reflectance of the $i$th pixel at the $j$th wavelength. Thus, a column of $A$ contains the entire image at a given wavelength, while each row contains the reflectance of one pixel at all given wavelengths, often referred to as the spectral signature of the pixel.
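As a small illustration of this layout (a minimal sketch with hypothetical sizes and variable names, not the paper's code), an HSI cube can be unfolded into such a matrix with NumPy:

import numpy as np

# Hypothetical cube: 100 x 120 pixels and 58 spectral bands.
cube = np.random.default_rng(0).random((100, 120, 58))

# Unfold into a pixels-by-bands matrix A: each row is the spectral signature
# of one pixel; each column is the whole image at one wavelength.
A = cube.reshape(-1, cube.shape[2])                 # shape (12000, 58)

# Folding one column back gives the single-band image at wavelength j.
j = 0
band_image = A[:, j].reshape(cube.shape[0], cube.shape[1])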

HSI data can be collected over hundreds of wavelengths, creating truly massive data sets. The transmission, storage, and processing of these large data sets often present significant difficulties in practical situations [1]. Dimensionality reduction methods provide a means to deal with the computational difficulties of hyperspectral data. These methods often use projections to compress a high-dimensional data space, represented by a matrix $A$, into a lower-dimensional space, represented by a matrix $B$, which is then factorized. Such factorizations are referred to as low-rank matrix factorizations, resulting in a low-rank matrix approximation to the original HSI data matrix $A$. See, for example, [2, 4-6].

Dimensionality reduction techniques are generally regarded as lossy compression; that is, the original data is not exactly represented or reconstructed by the lower-dimensional space. For lossless compression of HSI data, there have been efforts to exploit the correlation structure within HSI data and then code the residuals after stripping off the correlated parts; see, for example, [7, 8]. However, given the large number of pixels, these correlations are often restricted to spatially or spectrally local areas, while dimension reduction techniques essentially exploit the global correlation structure. By coding the residuals obtained after subtracting the low-dimensional representation from the original matrix, one can compress the original data in a lossless manner, as in [8]. The success of lossless compression requires low entropy of the data distribution, and, as we shall see in the experiments section, the entropy of the residuals for our method is generally much lower than the entropy of the original data.
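As a toy sketch of why this works (illustrative names and sizes; the entropy here is a simple fixed-step histogram estimate, not the coder used later in the paper):

import numpy as np

def entropy_bits(x, step=1e-3):
    # Entropy (bits per entry) of x quantized to a fixed step size.
    q = np.round(np.asarray(x).ravel() / step).astype(np.int64)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Toy matrix with a rapidly decaying spectrum, and the residual left by a rank-k TSVD.
rng = np.random.default_rng(0)
m, n, k = 2000, 58, 10
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U0 @ np.diag(100.0 / np.arange(1, n + 1) ** 2) @ V0.T

U, s, Vt = np.linalg.svd(A, full_matrices=False)
R = A - (U[:, :k] * s[:k]) @ Vt[:k, :]              # residual of the rank-k TSVD

print(entropy_bits(A), entropy_bits(R))             # the residual entropy is much lower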

Low-rank matrix factorizations can be computed using two general types of algorithms: deterministic and probabilistic. The most popular methods for deterministic low-rank factorizations are the singular value decomposition (SVD) [9] and principal component analysis (PCA) [10]. Advantages of these methods include the following: first, a small number of singular vectors (or principal components) often suffices to capture the action of a matrix; second, the singular vectors are orthonormal; third, the truncated SVD (TSVD) is the optimal low-rank representation of the original matrix in terms of the Frobenius norm, by the Eckart-Young theorem [9]. This last advantage makes the TSVD especially well suited for compression, since the Frobenius norm of the residual matrix is the smallest among all rank-$k$ representations of the original matrix, and hence we should expect a much lower entropy in its distribution, making it suitable for compressive coding schemes. Both decompositions offer truncated versions, so they can be used to represent an $n$-band hyperspectral image with the data-size equivalent of only $k$ images, where $k \ll n$. For applications of the SVD and PCA in hyperspectral imaging see, for example, [11, 12].
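As a concrete statement of this optimality (in the notation defined in Section 2, with $A_k$ the rank-$k$ TSVD of $A$), the Eckart-Young theorem gives

$\min_{\mathrm{rank}(X) \le k} \|A - X\|_F = \|A - A_k\|_F = \left(\sum_{i > k} \sigma_i^2\right)^{1/2},$

so the residual energy left over for coding is exactly the tail of the singular value spectrum.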

The traditional deterministic way of computing the SVD of a matrix is typically a two-step procedure. In the first step, the matrix is reduced to a bidiagonal matrix using Householder reflections, sometimes combined with a QR decomposition if $m \gg n$. This takes $O(mn^2)$ floating-point operations (flops), assuming that $m \ge n$. The second step is to compute the SVD of the bidiagonal matrix by an iterative method in $O(n)$ iterations, each costing $O(n)$ flops, so the overall cost is still $O(mn^2)$ flops [13, Lecture 31]. In HSI applications, the datasets can easily reach the million-pixel or even gigapixel level, which renders this operation infeasible on typical desktop computers.

One solution is to apply probabilistic methods, which give close approximations to the singular vectors and singular values at a much lower computational cost. These methods begin by randomly projecting the original matrix to obtain a lower-dimensional matrix, while the range of the original matrix is asymptotically kept intact. The much smaller projected matrix is then factorized using a full-matrix decomposition such as the SVD or PCA, after which the resulting singular vectors are backprojected to the original space. Compared to deterministic methods, probabilistic methods often offer lower cost and more robustness in computation, while achieving high-accuracy results. See the seminal paper [14] and the references therein.

Given the redundancy of HSI data, especially in the spectral dimension, there have recently been studies on compressive HSI sensing, either algorithmic [11, 15] or experimental [6, 16, 17], all of which involve a random projection of the data onto a lower-dimensional space. For example, in [11] Fowler proposed an approach that exploits the use of compressive projections in sensors that integrate dimensionality reduction and signal acquisition to effectively shift the computational burden of PCA from the encoder platform to the decoder site. This technique, termed compressive-projection PCA (CPPCA), couples random projections at the encoder with a Rayleigh-Ritz process for approximating eigenvectors at the decoder. In its use of random projections, this technique can be considered to possess a certain duality with our approach to randomized SVD methods in HSI. However, CPPCA recovers coefficients of a known sparsity pattern in an unknown basis. Accordingly, CPPCA requires the additional step of eigenvector recovery.

In this paper, we present a randomized singular value decomposition (rSVD) method for the purposes of lossless compression, reconstruction, classification, and target detection. We apply the rSVD method to a large HSI dataset to demonstrate the efficiency and effectiveness of the proposed method. On another HSI dataset, we show the effectiveness of the proposed algorithm in detecting targets, especially small targets, through singular vectors. In terms of reconstruction quality, we compare our algorithm with CPPCA [11] using the signal-to-noise ratio (SNR).

We note that Chen et al. [18] have recently provided an extensive study on the effects of linear projections on the performance of target detection and classification of hyperspectral imagery. In their tests they found that the dimensionality of hyperspectral data can typically be reduced to a small fraction of that of the original data without severely affecting the performance of commonly used target detection and classification algorithms.

The structure of the remainder of the paper is as follows. In Section 2, we give a detailed overview of rSVD in Section 2.1, the connections between this work and CPPCA in Section 2.2, and the compression and reconstruction of HSI data in Section 2.3. In Section 3, we present numerical results of the rSVD method on two publicly available real data sets. Finally, we draw some conclusions and identify some topics for future work in Section 4.

2. Review of Randomized Singular Value Decomposition

We start by defining terms and notation. The singular value decomposition (SVD) of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as

$A = U\Sigma V^T, \qquad (1)$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthonormal, and $\Sigma \in \mathbb{R}^{m \times n}$ is a rectangular diagonal matrix whose entries on the diagonal are the singular values, denoted as $\sigma_i$. The column vectors of $U$ and $V$ are the left and right singular vectors, respectively, denoted as $u_i$ and $v_i$. Define the truncated SVD (TSVD) approximation of $A$ as a rank-$k$ matrix $A_k$ such that

$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T = U_k \Sigma_k V_k^T. \qquad (2)$

We define the randomized SVD (rSVD) of $A$ as follows:

$\hat{A}_k = \hat{U}_k \hat{\Sigma}_k \hat{V}_k^T, \qquad (3)$

where $\hat{U}_k \in \mathbb{R}^{m \times k}$ and $\hat{V}_k \in \mathbb{R}^{n \times k}$ are both orthonormal, and $\hat{\Sigma}_k$ is diagonal with diagonal entries denoted as $\hat{\sigma}_i$. Denote the column vectors of $\hat{U}_k$ and $\hat{V}_k$ as $\hat{u}_i$ and $\hat{v}_i$, respectively, and call them randomized singular vectors. Here, $\hat{u}_i$, $\hat{v}_i$, and $\hat{\sigma}_i$ are related to $u_i$, $v_i$, and $\sigma_i$, respectively. Define the residual matrix of a TSVD approximation as follows:

$R_k = A - A_k, \qquad (4)$

and the residual matrix of an rSVD approximation as follows:

$\hat{R}_k = A - \hat{A}_k. \qquad (5)$

Define the random projection of a matrix $A$ as follows:

$Y = A\Omega, \qquad (6)$

where $\Omega \in \mathbb{R}^{n \times k}$ is a random matrix with independent and identically distributed (i.i.d.) entries.

2.1. Randomized SVD Algorithm

The rSVD algorithm as considered by [14] explores approximate matrix factorizations using random projections, separating the process into two stages. In the first stage, random sampling is used to obtain a reduced matrix whose range approximates the range of $A$; in the second stage, the reduced matrix is factorized. In this paper, we use this framework for computing the rSVD of a matrix $A$.

The first stage of the method is common to many approximate matrix factorization methods. For a given $\varepsilon > 0$, we wish to find a matrix $Q$ with orthonormal columns such that

$\|A - QQ^TA\|_F \le \varepsilon. \qquad (7)$

Algorithm 1 [14] can be used to compute $Q$.

Input: An $m \times n$ matrix $A$ and a precision measure $\varepsilon$.
Output: An $m \times k$ matrix $Q$ and the rank $k$.
Initialize $Q$ as an empty matrix, $k = 0$, and $e = \|A\|_F$.
while $e > \varepsilon$ do
   $k = k + 1$.
   $y_k = A\omega_k$, where $\omega_k$ is a Gaussian random vector.
   $\tilde{q}_k = (I - QQ^T)\,y_k$.
   $\tilde{q}_k = (I - QQ^T)\,\tilde{q}_k$ (reorthogonalization).
   $q_k = \tilde{q}_k / \|\tilde{q}_k\|_2$.
   $Q = [\,Q \;\; q_k\,]$.
   Compute the error $e = \|A - QQ^TA\|_F$.
end

Because in practice we may not know the target rank of $A$, Algorithm 1 allows us to look for an appropriate target rank $k$ based upon a given $\varepsilon$ such that (7) holds. However, if the target rank is known, one can avoid computing the error term at each iteration by replacing the while loop with a for loop. In practice, the number of columns of $Q$ is usually chosen to be slightly larger than the numerical rank of $A$ [14]. Without loss of generality, we assume that $Q \in \mathbb{R}^{m \times k}$, where $k \ll \min(m, n)$. The columns of $Q$ form an orthogonal basis for the range of $A\Omega$, where $\Omega = [\omega_1, \ldots, \omega_k]$ is a matrix composed of the random vectors $\omega_i$, typically with a standard normal distribution [14]. The range of the product $A\Omega$ is an approximation to the range of $A$.
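A minimal NumPy sketch of this first stage (an assumed simplification of Algorithm 1, not the authors' code: it recomputes the exact residual norm at every iteration, which is only practical for modestly sized matrices):

import numpy as np

def adaptive_range_finder(A, eps, max_rank=None, rng=None):
    # Grow an orthonormal basis Q one column at a time until
    # ||A - Q Q^T A||_F <= eps, in the spirit of Algorithm 1.
    rng = np.random.default_rng(rng)
    m, n = A.shape
    max_rank = max_rank or min(m, n)
    Q = np.empty((m, 0))
    while np.linalg.norm(A - Q @ (Q.T @ A)) > eps and Q.shape[1] < max_rank:
        y = A @ rng.standard_normal(n)        # y_k = A * omega_k
        y = y - Q @ (Q.T @ y)                 # project out the current range
        norm_y = np.linalg.norm(y)
        if norm_y < 1e-12:                    # direction already captured
            continue
        Q = np.hstack([Q, (y / norm_y)[:, None]])
    return Q

In practice, [14] recommends estimating the residual norm from a few extra random samples instead of forming it explicitly.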

The second stage of the rSVD method is to compute the SVD of the reduced matrix $B = Q^TA \in \mathbb{R}^{k \times n}$. Since $k \ll m$, it is generally computationally feasible to compute the SVD of the reduced matrix. Letting $B = \tilde{U}\hat{\Sigma}\hat{V}^T$ denote the SVD of $B$, we obtain

$A \approx QQ^TA = QB = (Q\tilde{U})\hat{\Sigma}\hat{V}^T, \qquad (8)$

where $\tilde{U}$ and $\hat{V}$ are orthogonal matrices, and thus by (8), $\hat{U}\hat{\Sigma}\hat{V}^T$ with $\hat{U} = Q\tilde{U}$ is an approximate SVD of $A$, and the range of $\hat{U}$ is an approximation to the range of $A$. Algorithm 2 summarizes the discussion above. See [14] for details on the choice of $k$, along with extensive numerical experiments using randomized SVD methods and a detailed error analysis of the two-stage method described above.

Input: An $m \times n$ matrix $A$ and a rank $k$ with $k \ll \min(m, n)$.
Output: The rSVD of $A$: $\hat{U}_k$, $\hat{\Sigma}_k$, $\hat{V}_k$.
  Generate an $n \times k$ Gaussian random matrix $\Omega$.
  Form the projection of $A$: $Y = A\Omega$.
  Construct an orthonormal basis $Q$ for the range of $Y$ by Algorithm 1.
  Set $B = Q^TA$.
  Compute the SVD of $B$: $B = \tilde{U}\hat{\Sigma}_k\hat{V}_k^T$.
  $\hat{U}_k = Q\tilde{U}$.
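A compact fixed-rank sketch of Algorithm 2 (a simplification under stated assumptions: the basis $Q$ is obtained from a QR factorization of the sample matrix rather than from Algorithm 1, and a small oversampling parameter is added, as suggested in [14]):

import numpy as np

def rsvd(A, k, oversample=10, rng=None):
    # Randomized SVD: approximations to the leading k singular triplets of A.
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))    # Gaussian test matrix
    Y = A @ Omega                                       # sample the range of A
    Q, _ = np.linalg.qr(Y)                              # orthonormal basis for range(Y)
    B = Q.T @ A                                         # small (k + p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)   # SVD of the reduced matrix
    U = Q @ Ub                                          # back-project the left vectors
    return U[:, :k], s[:k], Vt[:k, :]

# Example usage on a small random matrix:
# U, s, Vt = rsvd(np.random.default_rng(1).standard_normal((5000, 58)), k=10)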

Next we discuss several variations of Algorithm 2 depending on the properties of $A$. We will test all these cases in the numerical results section.

Case 1. If the target rank $k$ is known, and if the singular values of $A$ decay rapidly, we can skip Algorithm 1 by simply using the rank-revealing QR factorization $A\Omega = QR$, where $Q$ is an orthogonal basis of the range of $A\Omega$. Figure 1 from [19] compares the approximation error with the theoretical error for a test matrix; clearly, when the singular values of $A$ decay rapidly, the approximation error is close to the theoretical error with high probability.

Case 2. If the singular values of $A$ decay gradually, or if $\sigma_{k+1}$ is not small, we may lose accuracy in the estimates. Consider introducing a power $q$ and forming $A^{(q)} = (AA^T)^qA$. Since $A^{(q)}$ has the same singular vectors as $A$ while its singular values, $\sigma_i^{2q+1}$, decay more rapidly, the error will be smaller by Theorems 2.3 and 2.5 in [14]. From Figure 2, we see that the approximation error is not always close to the theoretical error, especially when the singular values decay slowly, but, by increasing the power $q$, we observe a reduction of the errors.
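A sketch of this power-iteration modification (same fixed-rank setting as the sketch above; the intermediate products are re-orthonormalized after each multiplication, which is the usual safeguard against loss of accuracy in floating point):

import numpy as np

def randomized_range_power(A, k, q=2, oversample=10, rng=None):
    # Range finder driven by (A A^T)^q A, whose singular values decay faster.
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Y = A @ rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(Y)
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)     # re-orthonormalize after each product
        Q, _ = np.linalg.qr(A @ Q)
    return Q                             # columns approximate the range of A

The returned $Q$ can replace the basis used in the rSVD sketch above.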

Case 3. Algorithm 2 requires us to revisit the input matrix, which may not be feasible for large matrices. For example, in ultraspectral imaging [20], one could have thousands of spectral bands, and PCA on such datasets would require computing the eigenvectors and eigenvalues of a covariance matrix of huge dimension. Another example is the atmospheric correction model MODTRAN5 [21], which utilizes large lookup tables (LUTs); compression of the LUTs by the PCA technique would again require the eigendecomposition of large covariance matrices. Here we introduce a variation of Algorithm 2 that only requires one pass over a large symmetric matrix $A$. We define the matrix $B$ as follows:

$B = Q^TAQ, \qquad (9)$

and we multiply $Y = A\Omega$ by $Q^T$, that is,

$Q^TY = Q^TA\Omega. \qquad (10)$

Since $A \approx QQ^TAQQ^T$, we have the following approximation:

$Q^TY \approx Q^TAQ\,(Q^T\Omega) = B\,(Q^T\Omega), \qquad (11)$

and hence by a least-squares solution we have

$B \approx (Q^TY)(Q^T\Omega)^{\dagger}, \qquad (12)$

where the superscript $\dagger$ represents the pseudoinverse. Notice the absence of $A$ in the approximate formula (12) for $B$. Thus, for a large symmetric $A$, we will use (12) rather than $B = Q^TA$ to compute $B$, while the rest of Algorithm 2 remains the same.
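A minimal sketch of this single-pass variant for a symmetric matrix (names are illustrative; the pseudoinverse in (12) is formed explicitly, which is cheap because $Q^T\Omega$ is only $(k+p) \times (k+p)$):

import numpy as np

def single_pass_symmetric_rsvd(A, k, oversample=10, rng=None):
    # One-pass sketch for symmetric A: after Y = A @ Omega, A is never touched again.
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    Omega = rng.standard_normal((n, k + oversample))
    Y = A @ Omega                                   # the only pass over A
    Q, _ = np.linalg.qr(Y)
    B = (Q.T @ Y) @ np.linalg.pinv(Q.T @ Omega)     # B ~ Q^T A Q, cf. (12)
    B = 0.5 * (B + B.T)                             # symmetrize against sampling noise
    w, W = np.linalg.eigh(B)                        # eigendecomposition of the small B
    idx = np.argsort(np.abs(w))[::-1][:k]           # keep the k leading eigenpairs
    return Q @ W[:, idx], w[idx]                    # approximate eigenpairs of A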

2.2. Connections to CPPCA

A significant difference between the compressive-projection PCA (CPPCA) approach and our work is that CPPCA uses a random orthonormal matrix to compress the data matrix $A$. In comparison, though we also use random projections, our orthonormal matrix $Q$ is constructed from, and directly related to, the data matrix $A$; in particular, we compute an orthogonal $Q$ such that $A \approx QQ^TA$. Also, because the projection onto convex sets (POCS) algorithm is used for reconstruction, the projection matrix of CPPCA has to be different for different blocks of the scene, and these matrices have to be independently drawn and orthogonalized; meanwhile, one random projection matrix in rSVD is sufficient and can be applied to the whole dataset. Another restriction of CPPCA lies in the fact that the Rayleigh-Ritz method requires well-separated eigenvalues [22], which might be true for the first few largest eigenvalues, but is usually not true for the smaller ones. Our approach for matrices with slowly decaying singular values was given in Case 2 of Section 2.1 and is tested in Section 3.1.

2.3. Compression and Reconstruction of HSI Data by rSVD

The flight times of airplanes carrying hyperspectral scanning imagers are usually limited by the data capacity, since within 5 to 10 seconds hundreds of thousands of pixels of hyperspectral data are collected [1]. Hence, for real-time onboard processing, it would be desirable to design algorithms capable of processing this amount of data within 5 to 10 seconds, before the next section of the scene is scanned. Here we use the proposed rSVD algorithm to losslessly compress blocks of HSI data, each corresponding to a fixed frame of flight time, which is equivalent to dividing the HSI data cube along the flight direction, either the $x$ or $y$ direction, with the number of rows ($x$ direction) or columns ($y$ direction) determined by the ground sample distance (GSD) and the flight speed. Algorithm 3 describes the rSVD encoder, which outputs $\{\hat{U}_j, \hat{\Sigma}_j, \hat{V}_j, C_j\}$ to be stored onboard, where $\hat{U}_j$, $\hat{\Sigma}_j$, and $\hat{V}_j$ are the outputs of Algorithm 2 for the $j$th block of data, while $C_j$ is the coded residual. These are then used by Algorithm 4 to reconstruct the original data losslessly; as we can see, the reconstruction only involves a one-pass matrix-matrix multiplication and no iterative algorithms. Compared to CPPCA, the number of bytes used for storing the $\hat{U}_j$s and $\hat{V}_j$s is smaller, and the reconstruction only involves matrix-matrix multiplication. The only possible bottleneck might be the residual coding, but recent developments in floating-point coding have achieved very high throughputs on graphics processing units (GPUs) [23], and even on an eight-Xeon-core computer we have observed throughputs sufficient to code the required amount of HSI data within the available time frame.

Input: An $m \times n$ matrix $A$ (acquired in blocks $A_j$) and an integer $k$.
Output: $\{\hat{U}_j, \hat{\Sigma}_j, \hat{V}_j, C_j\}$.
while  flight continues  do
  Acquire the HSI data, $A_j$, scanned in the last few seconds.
  Apply Algorithm 2 (with Algorithm 1 constructing $Q_j$) to obtain $\hat{U}_j$, $\hat{\Sigma}_j$, and $\hat{V}_j$.
  Compute the residual, $R_j = A_j - \hat{U}_j\hat{\Sigma}_j\hat{V}_j^T$.
  Code $R_j$ as $C_j$ with a parallel floating point coding algorithm.
  Store $\hat{U}_j$, $\hat{\Sigma}_j$, $\hat{V}_j$, and the compressed $C_j$.
  $j = j + 1$.
end

Input: $\{\hat{U}_j, \hat{\Sigma}_j, \hat{V}_j, C_j\}$, $j = 1, \ldots, J$.
Output: Reconstructed matrix $A$.
for $j = 1, \ldots, J$ do
  Decode $R_j$ from $C_j$ with a parallel floating point decoding algorithm.
  $A_j = \hat{U}_j\hat{\Sigma}_j\hat{V}_j^T$.
  $A_j = A_j + R_j$.
end
Group all $A_j$ together in $A$.
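A schematic round trip in the spirit of Algorithms 3 and 4 for a single block (a sketch only, with illustrative names: the residual is quantized to 16-bit integers and compressed with a generic byte coder standing in for the parallel floating-point coder, so the reconstruction is exact up to the chosen quantization step rather than fully lossless):

import numpy as np
import zlib

def encode_block(A, k, step=1e-3):
    # rSVD-style compression of one block: low-rank factors plus a coded residual.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # stand-in for Algorithm 2
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    R = A - (U * s) @ Vt                               # residual matrix
    q = np.round(R / step).astype(np.int16)            # 16-bit quantization; step must fit the residual range
    C = zlib.compress(q.tobytes())                     # stand-in lossless coder
    return U, s, Vt, C, q.shape, step

def decode_block(U, s, Vt, C, shape, step):
    # Reconstruct the block from the stored factors and the coded residual.
    q = np.frombuffer(zlib.decompress(C), dtype=np.int16).reshape(shape)
    return (U * s) @ Vt + q.astype(np.float64) * step

A = np.random.default_rng(2).standard_normal((4000, 58))
A_rec = decode_block(*encode_block(A, k=10))
print(np.max(np.abs(A - A_rec)))    # bounded by step / 2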

3. Numerical Experiments

3.1. Accuracy of the rSVD Estimates

In this section, we first compare the results from rSVD with those from the exact TSVD computed by the MATLAB function "svds," which computes the largest singular values and the associated singular vectors of a large matrix and is considered an efficient and accurate way to obtain the TSVD. To simulate large HSI datasets, we generate random test matrices $A \in \mathbb{R}^{m \times n}$, with $n$ fixed to represent the number of spectral channels, while $m$, representing the number of pixels, increases. The singular values of $A$ are simulated as following a power-law decay, that is, $\sigma_i = i^{-p}$ for a fixed power $p$. We use Algorithm 2 to compute the rSVD. The comparison of computation time is shown in Figure 3(a), from which we find that "svds" is almost as efficient as rSVD when $m$ is relatively small. However, as $m$ increases, the computation time of "svds" grows at a much faster pace than that of rSVD; note that even for the largest $m$ tested, the processing time of rSVD remains well within the onboard processing time limit. To judge the accuracy of the estimated singular vectors, we compute the correlation, or inner product, between the singular vectors computed by "svds" and by rSVD, as shown in Figure 3(b), where we clearly see that both sets are almost identical up to the fifteenth singular vector. To judge the accuracy of the estimated singular values, we compute the relative absolute errors, $|\hat{\sigma}_i - \sigma_i|/\sigma_i$, and plot them in Figure 3(c). Again we observe high accuracy of the estimates for the leading singular values. In most HSI datasets the singular values decay faster than in these simulations, and hence we should expect even higher accuracy of the estimates. Also, it is usually sufficient to estimate only the leading singular vectors and singular values, which often capture the vast majority of the variance of the original HSI data [1].

Then we numerically test the three special cases discussed in Section 2.1.

Case 1. We have considered square Toeplitz matrices of increasingly large sizes. Figure 4 shows the rapid decay of the singular values of such a matrix, and hence these matrices are suitable for testing the algorithm. Figure 5(a) shows that the relative Frobenius-norm errors rise and fall within a single order of magnitude and remain at that order even when the size of the matrix increases. Figure 5(b) demonstrates that the computation time of rSVD is very short for Toeplitz matrices whose singular values decay rapidly.

Case 2. Here we simulate matrices with slowly decaying singular values, that is, $\sigma_i = i^{-p}$ with a small power $p$. For each matrix, we run the rSVD algorithm multiple times for each power $q$ in a small set of values. The norm errors of the reconstructed matrices are averaged across runs and normalized by the norm error when $q = 0$. Figure 6 shows that increasing the power $q$ improves the reconstruction quality, that is, decreases the norm error, and greater effects are observed for larger $q$ because the singular values of $(AA^T)^qA$ are $\sigma_i^{2q+1}$.

Case 3. To simulate covariance matrices, we generate a sequence of positive definite symmetric matrices of increasing size. The eigenspectrum follows a power-law decay. We apply the modified formula (12) for $B$ to compute the SVD, rather than using $B = Q^TA$, and fix the target rank $k$. Figure 7(a) shows the computation time compared with that of the regular SVD in MATLAB, while Figure 7(b) shows the relative Frobenius-norm errors between the original matrix and its low-rank approximation. Clearly, the computation time used by rSVD is far less than that of the regular SVD, while the accuracy remains quite high.

3.2. rSVD on a Large HSI Dataset and a Lossless Compression

The rSVD algorithm was also applied to a relatively large HSI dataset consisting of an image cube collected over Gulfport, MS, by a commercial hyperspectral sensor with a spectral range of 0.45 to 0.72 microns. This cube was then unfolded into a large matrix. Running an exact SVD algorithm on such a matrix is almost impossible on a regular desktop computer with limited memory and speed, while the rSVD algorithm gives us singular values and vectors very close to the true ones, with a significantly reduced amount of flops and memory required. Computing the leading singular vectors and singular values takes only seconds on a desktop computer with quad-core Xeon processors. From the computed singular values and vectors, we observed that the singular vectors after the ninth all appear to be noise, indicating that the data matrix does have a low-rank representation.

Figure 8 shows the results for a small scene extracted from the large Gulfport, MS, dataset described above, consisting of part of the University of Southern Mississippi campus. Notice the targets placed in the scene for detection and identification tests. The first eight left singular vectors are folded back into images from the transformed data. The first singular vector shown in Figure 8 is essentially the mean image across the 58 spectral bands; the second shows high intensity at the grass and foliage pixels; the third shows the targets quite clearly, as well as high-reflectance sandy areas and rooftops; the fifth shows the low-reflectance pavement, rooftops, and shadows; and the seventh shows vehicles at various places, marked by the circles. Starting from the eighth, the remaining singular vectors appear to be mostly noise.

In Figure 9(b), the histogram of the entries in the residual matrix $\hat{R}_k$ shows that the residuals roughly follow a Laplacian distribution and lie within a range significantly smaller than that of the original data in Figure 9(a). Moreover, most of the residuals (93%) are within a narrow interval around zero (notice the log scale on the y-axis), which means that the entropy of the residuals is significantly smaller than the entropy of the original data. This justifies a further coding step on the residuals so as to complete a lossless compression scheme. Here we apply Huffman coding, due to its fast computation, and show the compression ratios at various error rates, corresponding to the numbers of bits used to code the residuals; for example, a 16-bit coding of the residuals corresponds to a specific quantization error range. Figure 10 provides options for balancing compression against accuracy. For practical purposes, an error rate on the order of 0.001 might be sufficient, and this would result in a compression ratio of 2.5 to 4. For comparison purposes, the compression ratio reported for 3D-SPECK [7] on a small dataset at 16-bit coding provides a reference point. If more sophisticated coding algorithms than Huffman coding were applied here, we could see further improvements in the compression ratios. For computing the compression ratios, we have assumed 16-bit (2-byte) coding for all the matrices, including $\hat{U}_k$, $\hat{\Sigma}_k$, $\hat{V}_k$, the residual matrix, and the coded (compressed) residual matrix.

To test the suitability of Algorithm 3 for onboard real-time processing, we apply the rSVD to a single block of data and find that it finishes within seconds on a low-end dual-core laptop computer; with a parallel coding algorithm for the residuals, we should be able to finish Algorithm 3 within the required time frame.

3.3. Small Target Detection Using rSVD

For a small target detection experiment using rSVD, we choose a version of the Forest Radiance HSI dataset, which has been analyzed using numerous target detection methods; see, for example, [18, 24-26]. Our rationale for using an SVD algorithm in target detection lies in the fact that, even though targets might be of small size, if all the spectrally similar targets have sufficient presence, some singular vectors of the HSI data matrix will reflect these features, and hence the presence of targets can be shown simply by these singular vectors. After removing the water-absorption and other noisy bands, we unfold the data cube into a matrix and apply Algorithm 2 to obtain the singular vectors. Figure 11 shows the sum of the first twelve left singular vectors folded back into an image, in which we can clearly detect all but two of the targets, while those two are still slightly visible.
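A sketch of this use of the singular vectors (assuming the cube has already been unfolded into a pixels-by-bands matrix as in Section 1; the number of summed vectors and the threshold are illustrative):

import numpy as np

def detection_map(A, n_rows, n_cols, num_vectors=12):
    # Fold the sum of the leading left singular vectors back into an image;
    # spectrally distinctive targets tend to stand out in this map.
    U, _, _ = np.linalg.svd(A, full_matrices=False)   # or the rSVD sketch above
    return U[:, :num_vectors].sum(axis=1).reshape(n_rows, n_cols)

# Hypothetical usage on an unfolded cube A of shape (n_rows * n_cols, n_bands):
# img = detection_map(A, n_rows, n_cols)
# candidates = np.abs(img - img.mean()) > 3 * img.std()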

3.4. Comparison between rSVD and CPPCA

In this section, we compare rSVD with CPPCA in terms of accuracy and computation time, first on simulated data and then on a real HSI dataset. We first simulate a set of matrices with an increasing number of rows (pixels), while fixing the number of columns as the spectral bands. The singular value spectrum is simulated as following a power-law decay. Both the CPPCA and rSVD algorithms are applied to each simulated matrix, and the results are compared in terms of reconstruction quality and computation time. Figure 12(a) shows that the running time of rSVD increases linearly with the number of pixels, while that of CPPCA remains constant, which is not surprising since CPPCA mainly works on eigenvectors of fixed dimension $n$. However, in terms of reconstruction quality, Figure 12(b) shows the advantage of rSVD. Here the number of reconstructed eigenvectors for CPPCA is set to the value that gives its best norm errors, while a comparable number is used for rSVD.

For the real dataset, we use a small section of the Gulfport dataset and fold it into a matrix. From the matrices reconstructed by both methods with varying rank $k$, we compare their reconstruction quality in terms of signal-to-noise ratio (SNR) in Figure 13 and their computation time in Table 1. Again we observe the better reconstruction quality, though slightly slower computation, of rSVD when compared to CPPCA.

Next, we compare the accuracy of the reconstruction of the eigenvectors of the covariance matrix of $A$ by these two methods. Given that PCA is an extremely useful tool in HSI data analysis, for example, for classification and target detection, it is essential that rSVD reconstructs the eigenvectors accurately and efficiently. Here we simulate a matrix $A = U\Sigma V^T$ with orthogonalized random matrices $U$ and $V$ and a power-law decaying singular value spectrum in $\Sigma$. Then we run both algorithms repeatedly and, within each run, compute the angles between the eigenvectors estimated by CPPCA and the true ones, and between the eigenvectors estimated by rSVD and the true ones. The first row of Figure 14 shows the histograms of the angles between the first eight reconstructed eigenvectors by CPPCA and the true ones, while the second row shows the corresponding histograms for rSVD. We can see that the first three or four eigenvectors by CPPCA appear to be close to the true ones, while the rest are not. Hence, if more than four eigenvectors reconstructed by CPPCA are used, we observe a decrease in reconstruction quality, or an increase in the norm error. In the second row, however, we see good accuracy for all the eigenvectors computed by rSVD.

3.5. Classification of HSI Data by rSVD

Since the projection of an HSI data matrix onto its truncated right singular vectors, that is, $A\hat{V}_k$, contains most of the information in the original matrix $A$, we can use any classification algorithm, such as the popular $k$-means, to classify HSI data while working with its representation in the lower-dimensional space. Consider a small section of the Gulfport dataset. Figure 15 shows the first 8 columns of $A\hat{V}_k$ folded back into images. From the first subfigure, we see that most of the information in the hyperspectral image is contained in the first column, while the second column contains most of the remaining information that the first column does not. The remaining columns contain information at more detailed and spatially clustered levels. Figure 16 shows the result of classification by $k$-means, where we can see the low-reflectance water and shadows in yellow, the foliage in red, the grass in dark red, the pavement in green, high-reflectance beach sand in dark blue, and dirt/sandy grass in blue and light blue.
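A sketch of classification in the reduced space (assuming scikit-learn is available; the seven clusters mirror the classes listed above, and the choice of $k = 8$ columns is illustrative):

import numpy as np
from sklearn.cluster import KMeans

def classify_reduced(A, k=8, n_clusters=7, seed=0):
    # Cluster pixels using their k-dimensional representation A @ V_k.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)   # or the rSVD sketch above
    Z = A @ Vt[:k, :].T                                # k coefficients per pixel
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(Z)

# labels = classify_reduced(A); labels.reshape(n_rows, n_cols) gives a map like Figure 16.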

A comparison with the results of running $k$-means on the full dataset shows that only a small fraction of the pixels are classified differently between the full dataset and its low-dimensional representation. Hence this low-dimensional representation is highly suitable for classification.

4. Conclusions

As HSI data sets grow increasingly massive, compression and dimensionality reduction for analytical purposes have become more and more critical. The randomized SVD algorithms presented in this paper enable us to compress, reconstruct, and classify massive HSI datasets efficiently while maintaining high accuracy relative to exact SVD methods. The rSVD algorithm is also found to be effective in detecting small targets, down to single pixels. We have also demonstrated the fast compression and reconstruction achieved by the proposed algorithms on a large HSI dataset in an urban setting. Overall, the rSVD provides a lower approximation error than some other recent methods and is particularly well suited for compression, reconstruction, classification, and target detection.

Acknowledgments

The authors wish to thank the referees and the project sponsors for providing very helpful comments and suggestions for improving the paper. Research by Jiani Zhang and Jennifer Erway was supported by NSF grant DMS-08-11106. Research by Robert Plemmons and Qiang Zhang was supported by the U.S. Air Force Office of Scientific Research (AFOSR), award number FA9550-11-1-0194, and by the U.S. National Geospatial-Intelligence Agency under Contract HM1582-10-C-0011, public release number PA Case 12-433.