Abstract
To address the difficulty of modeling spatial coherence, extracting complete features, and building sparse representations in hyperspectral image classification, a joint sparse representation classification method based on flexible patch sampling of superpixels is investigated. First, principal component analysis and total variation diffusion are employed to form a pseudo color image, which simplifies superpixel computation with the simple linear iterative clustering (SLIC) model. Then, we design a joint sparse recovery model that samples overcomplete patches of superpixels to estimate the joint sparse characteristics of the test pixel, solved with the orthogonal matching pursuit (OMP) algorithm. Finally, the pixel is labeled according to the minimum distance constraint, based on the joint sparse coefficients and the structured dictionary. Experiments conducted on two real hyperspectral datasets show the superiority and effectiveness of the proposed method.
1. Introduction
Remote sensing image classification has become an important part of remote sensing applications and is used in urban planning, environmental monitoring, crop management, and many other applications [1–4]. Hyperspectral images (HSI) contain spectral vectors with hundreds of dimensions, which can lead to higher accuracy in land cover recognition and classification. Hyperspectral remote sensing image classification has therefore long been a focus of research. At the same time, supervised classification has proved to be a more suitable technique for remote sensing classification [5]. It takes a small number of representative labeled areas as training samples to train the discriminant function and classifier, then computes the statistical characteristics of unlabeled samples and compares them with the labeled samples to predict their classes. Supervised learning is thus more efficient and significantly improves classification accuracy [6]. With the development of pattern recognition and deepening research on compressed sensing, the support vector machine (SVM) [7], Bayesian methods [8], logistic regression [9], and manifold learning [10] have achieved better classification results. To improve the efficiency of data processing and the accuracy of classification, researchers commonly use two strategies. The first is feature dimension reduction or optimal spectral feature selection before classification, such as principal component analysis [11], spectral derivative features [12], and context information [13]. The second is to establish an optimized classification model, such as kernel optimization [14] and ensemble learning [15]. In addition, with further research on compressed sensing, sparse classification based on compressed sensing has attracted wide attention.
It was first applied to face detection, where classification is achieved by sparse modeling with minimum reconstruction error, which brought an outstanding improvement in classification accuracy. Sparse representation has since been applied in various fields of computer vision and pattern recognition, such as image segmentation, image restoration, super resolution, and face recognition.
Recently, sparse representation has been used for hyperspectral image classification and has achieved some success [16–21]. The sparse representation classification method maps high-dimensional signals onto a combination of a few dictionary atoms and their coefficients. It can extract the features of the data source and describe category information effectively while removing noise, and it achieves more accurate classification based on minimum reconstruction error. Chen et al. [17] proposed expressing an unknown pixel as a sparse vector whose nonzero entries correspond to the weights of the selected training samples. The sparse vector is recovered by solving a sparsity-constrained optimization problem and directly determines the class label of the test pixel. Castrodad et al. [18] proposed a sparse representation method at the subpixel level based on learned block-structured discriminative dictionaries. Chen et al. [22] proposed a model that sparsely represents a test sample in terms of all of the training samples in a feature space induced by a kernel function. Srinivas et al. [19] learned a discriminative graph-based classifier that captures interclass information for the sparse representation vectors of each pixel in the local spatial neighbourhood of a central pixel. Fang et al. [20] proposed a multiscale adaptive sparse representation model for dictionary estimation and defined optimizable adaptive sets based on the residual matrix to estimate the sparse coefficients that determine the classification results, which considerably improved classification accuracy.
How to use context information is key to the accurate classification of hyperspectral images. In addition to the neighbourhood information fusion methods above, superpixel segmentation has become an important technique for feature extraction and optimization in numerous applications of computer vision and digital image processing. Superpixel segmentation provides homogeneous regions of the original image, so the complex structure of the image is compressed and simplified for further analysis and processing. On this basis, Feng et al. [23] proposed the basic assumption that the interior pixels of a superpixel have similar labels, that is, the sparse coefficients of homogeneous pixels have similar structure; pixel classification is then regularized by contextual information exploited from the superpixel with a majority voting decision rule. This method classifies each superpixel as a whole, so the overall classification is not fine enough. Zhang et al. [21] integrated spectral and spatial information into group sparse coding (GSC) via clusters, an adaptive spatial partition derived from mean-shift superpixels.
All these methods use spatial context information at different levels and put forward effective sparse model hypotheses. However, spectral-spatial methods still suffer from the selection of adjacent region scales, and overcomplete feature extraction and the sparse representation model remain rather arbitrary. They cannot provide accurate classification of boundary pixels and small regions in the image. To overcome these limitations, we consider that superpixel segmentation can provide homogeneous regions of the original image, so that the complex structure of the image is compressed and simplified for further analysis and processing.
Based on the above analysis, a joint sparse representation classification method based on flexible patch sampling of superpixels (SRCFPSS) is proposed. The flowchart of the SRCFPSS method is shown in Figure 1. First, a group of relatively complete homogeneous superpixels is computed to provide reliable contextual information for sparse recovery. For each test pixel, context information is extracted from its corresponding superpixel: we sample a set of pixels from the inner pixels of that superpixel, sorted by a similarity measure. Then, all the neighbour patches of the sampled pixels are assumed to share a common sparsity pattern, and the sparse coefficient of the test pixel is estimated by solving a sparse optimization problem. Finally, the class label of the test pixel is determined by the minimal total residual.
The remainder of this paper is arranged as follows: Section 2 gives a detailed description of the proposed SRCFPSS algorithm, experimental results are presented in Section 3, and conclusions are given in Section 4.
2. Proposed Method
First, we use the PCA model to extract the first three principal components and compose the pseudo color image. Then, the simple linear iterative clustering (SLIC) [24] method is executed on the total variation (TV) diffusion of the pseudo color image to compute superpixels. We assume that a test pixel is strongly correlated with the internal pixels of its superpixel and can be jointly represented by them. The inner pixels are sampled at equal intervals according to a given sampling frequency. All the sampled pixels are extended to a set of patches that form the reconstruction matrix for the joint sparse representation of the test pixel. Finally, the test pixel is classified according to the reconstruction error.
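The first step of this pipeline, composing a pseudo color image from the first three principal components, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_pseudo_color`, the (height, width, bands) cube layout, and the rescaling of each component to [0, 1] are choices made here for illustration.

```python
import numpy as np

def pca_pseudo_color(cube, n_components=3):
    """Project an (H, W, B) hyperspectral cube onto its first principal components.

    Hypothetical helper: the array layout and the [0, 1] rescaling are assumptions.
    """
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)
    pixels -= pixels.mean(axis=0)              # center every band
    # SVD of the centered data; the rows of vt are the principal directions
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    scores = pixels @ vt[:n_components].T      # shape (H*W, n_components)
    # rescale each component to [0, 1] so it can serve as a color channel
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0                      # guard against constant components
    return ((scores - lo) / span).reshape(h, w, n_components)

# toy usage on random data standing in for a real hyperspectral cube
rng = np.random.default_rng(0)
pseudo_color = pca_pseudo_color(rng.random((20, 20, 50)))
```

The resulting three-channel image can then be passed to any superpixel routine that accepts color images.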
2.1. Superpixels Computing
Existing methods for superpixel computation usually operate on a low-dimensional feature space, such as natural images. These common superpixel segmentation methods may give poor results and be time consuming on hyperspectral images, whose spectra have hundreds of dimensions [25, 26]; the PCA method is an effective way to deal with the computational complexity of high-dimensional data [26]. Therefore, we use the compressed features (the first three principal components) obtained by principal component analysis to construct the pseudo color image. To deal with complicated texture, many image enhancement strategies have been developed [27–33]. In this paper, we use the conduction function to implement nonlinear coupled diffusion filtering on the pseudo color image, which is processed by the total variation model in [34]. The implicit smoothing scheme is given by formula (1).
Formula (1) involves a single band of the pseudo color image, the divergence over the vertical and horizontal dimensions with four features, the number of iterations, the Gaussian diffusion function, and the gradient vector. The number of iterations depends on the superpixel scale, so that local areas are smoothed properly. After nonlinear diffusion filtering, SLIC segmentation is carried out to compute the superpixels. Empirically, larger superpixels are not suitable for computing common features, and an inner pixel count of 150 gives good compactness and uniformity [35].
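The kind of edge-preserving smoothing described here can be illustrated with a simple explicit nonlinear diffusion step. This sketch is an assumption-laden stand-in, not the implicit scheme of formula (1): it uses an explicit Perona-Malik style update with a Gaussian conduction function over the four nearest neighbours, and the names `nonlinear_diffusion`, `kappa`, and `step` are hypothetical.

```python
import numpy as np

def nonlinear_diffusion(band, n_iter=10, kappa=0.1, step=0.2):
    """Explicit four-neighbour nonlinear diffusion of one image band.

    Assumed form (not taken from the paper): Gaussian conduction function
    g(d) = exp(-(d / kappa)^2); step <= 0.25 keeps the update stable.
    """
    u = band.astype(np.float64).copy()
    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")          # replicate borders (zero flux)
        # differences to the north, south, west, and east neighbours
        diffs = (p[:-2, 1:-1] - u, p[2:, 1:-1] - u,
                 p[1:-1, :-2] - u, p[1:-1, 2:] - u)
        # conduction is close to 1 in flat areas and close to 0 across edges
        u = u + step * sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
    return u

# toy usage: a noisy vertical step edge
rng = np.random.default_rng(0)
clean = np.zeros((20, 20))
clean[:, 10:] = 1.0
noisy = clean + 0.02 * rng.standard_normal(clean.shape)
smoothed = nonlinear_diffusion(noisy)
```

Each channel of the pseudo color image would be filtered separately in this sketch; small fluctuations are smoothed while the large jump at the edge is left almost untouched.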
2.2. SRCFPSS
For the SRC algorithm, the class of a test pixel is determined from the residuals of the feature reconstruction computed with the dictionaries and the corresponding sparse vectors. The sparse vector and the corresponding dictionary must therefore be updated to compute the optimized sparse feature vector.
First, the dictionary is initialized with a few labeled training samples collected randomly from the image within N classes. M labeled samples are selected randomly from the original dataset according to a given proportion to set up a structured dictionary with M atoms, together with the atomic index set; a test pixel can then be represented as a linear combination of atoms from all classes.
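A structured dictionary of this kind can be assembled as in the sketch below. It is a minimal illustration under assumptions that the text does not state: the sampling proportion `fraction`, the unit-norm scaling of the atoms, and the name `build_structured_dictionary` are all hypothetical.

```python
import numpy as np

def build_structured_dictionary(spectra, labels, fraction=0.1, seed=0):
    """Randomly pick a proportion of labeled samples per class as dictionary atoms.

    spectra: (n_samples, B) labeled spectra; labels: (n_samples,) class ids.
    Returns the dictionary (B, M), the class of each atom (M,), and the
    indices of the chosen samples. Unit-norm atoms are an assumption here.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for c in np.unique(labels):                # classes in sorted order
        idx = np.flatnonzero(labels == c)
        n_pick = max(1, int(round(fraction * idx.size)))
        chosen.append(rng.choice(idx, size=n_pick, replace=False))
    chosen = np.concatenate(chosen)            # atoms stay grouped by class
    atoms = spectra[chosen].astype(np.float64).T
    norms = np.linalg.norm(atoms, axis=0)
    norms[norms == 0] = 1.0                    # avoid dividing by zero
    return atoms / norms, labels[chosen], chosen
```

Because the atoms are grouped by class, the subdictionary of any class is simply the set of columns whose atom label equals that class.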
Based on the superpixel segmentation, the superpixel containing an arbitrary test pixel can be arranged in a matrix whose dimensions are the number of pixels in the superpixel and the band dimension of the spectral space. All pixels within this superpixel provide complementary and associated information for the same test pixel. To obtain a joint area consisting of materials similar to the test pixel, we compute the similarity between the test pixel and the inner pixels of its superpixel by formula (2), which is evaluated over all spectral bands, and sort the inner pixels in descending order of similarity.
Next, we sample N pixels at uniform spacing from the sorted pixels and take their neighbour windows to construct an overcomplete recovery matrix for the test pixel, which is used to estimate the joint sparse vector for more accurate classification. The overcomplete sampling of superpixels for the recovery matrix is illustrated in Figure 2. The scale of the neighbour windows of the sampled pixels can be set flexibly, with even values handled as a special case. The joint representation then expresses the recovery matrix, which consists of the N neighbour windows for the joint sparse representation of the test pixel, as the product of the structured dictionary and the joint sparse coefficient matrix. The sparse coefficient is computed by solving an optimization problem constrained by the upper bound of the sparsity level, that is, the maximum number of atoms selected from the dictionary.
Given the redundant structured dictionary, solving for the sparse representation vector of each pixel is the key to classification. The OMP algorithm finds the atoms most relevant to the current residual signal and updates the set of representative atoms in each iteration. Once the dictionary atoms are selected, the test pixel is projected onto the subspace spanned by the selected atoms, the sparse coefficients are recomputed, and the residual error is updated until the termination condition is reached. The advantage of the OMP algorithm is that it selects from the dictionary the atoms that best match the given signal in each iteration of the approximation process.
To enhance the influence of homogeneous regions on the sparse representation at each iteration, the residuals of all pixels in the recovery matrix are calculated and assembled into a correlation matrix. For any pixel in the superpixel, the residual correlation matrix is calculated by correlating the dictionary atoms with the current residuals.
For each class and each patch, the atom with the maximum correlation value in this matrix is selected; the correlations are then summed over all patches of each class, the class with the largest sum is selected, and the corresponding atom indexes of each patch are incorporated into the index set of selected representative atoms. The joint sparse coefficient is then estimated by projecting onto the selected atoms.
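The construction of the recovery matrix (sort the superpixel's pixels by similarity, sample at uniform spacing, collect neighbour windows) can be sketched as follows. Several details are assumptions, because they are not recoverable from the text: the similarity is taken as negative Euclidean distance over the bands, the window is a fixed odd size clipped at the image border, and the names `sample_recovery_matrix`, `segments`, `n_samples`, and `window` are hypothetical.

```python
import numpy as np

def sample_recovery_matrix(cube, segments, row, col, n_samples=10, window=3):
    """Build a recovery matrix for the test pixel at (row, col).

    cube: (H, W, B) image; segments: (H, W) superpixel label map.
    Returns a (B, n_patch_pixels) matrix whose columns are spectra.
    """
    h, w, b = cube.shape
    target = cube[row, col].astype(np.float64)
    # all pixels that belong to the same superpixel as the test pixel
    rows, cols = np.nonzero(segments == segments[row, col])
    members = cube[rows, cols].astype(np.float64)
    # assumed similarity: negative Euclidean distance over the bands
    similarity = -np.linalg.norm(members - target, axis=1)
    order = np.argsort(-similarity, kind="stable")   # descending similarity
    # uniformly spaced positions in the sorted list (position 0 is the test pixel)
    n_pick = min(n_samples, order.size)
    positions = np.unique(np.linspace(0, order.size - 1, n_pick).astype(int))
    picks = order[positions]
    half = window // 2
    columns = []
    for r, c in zip(rows[picks], cols[picks]):
        # neighbour window around the sampled pixel, clipped at the image border
        r0, r1 = max(r - half, 0), min(r + half + 1, h)
        c0, c1 = max(c - half, 0), min(c + half + 1, w)
        columns.append(cube[r0:r1, c0:c1].reshape(-1, b))
    return np.vstack(columns).T.astype(np.float64)
```

In this sketch the neighbour windows may extend beyond the superpixel boundary; restricting them to the superpixel would be an equally plausible reading of the description.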
The illustration of the joint sparse representation of two classes is displayed in Figure 3.
The residual is then updated with the new sparse coefficients, and the procedure is iterated until the number of iterations K is reached. Finally, the joint sparse coefficient is output and the class of the pixel is computed: the label of the input pixel is determined by the minimal representation error between the recovery matrix and its approximation recovered with the class-wise sparse coefficients and the corresponding subdictionary. The SRCFPSS procedure is summarized in Algorithm 1.
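The iterate-select-project-update loop and the minimal-residual decision can be sketched with standard simultaneous OMP (SOMP). This is a simplified stand-in under stated assumptions, not the authors' Algorithm 1: it pools the correlations of each atom over all columns of the recovery matrix instead of selecting per class and per patch, and the names `somp`, `classify`, `D`, `atom_labels`, and `X` are hypothetical.

```python
import numpy as np

def somp(D, X, sparsity):
    """Simultaneous OMP: all columns of X share one support of `sparsity` atoms."""
    residual = X.copy()
    support = []
    coef = np.zeros((0, X.shape[1]))
    for _ in range(sparsity):
        # correlation of every atom with every residual column, pooled per atom
        score = np.linalg.norm(D.T @ residual, axis=1)
        if support:
            score[support] = -np.inf           # never reselect an atom
        support.append(int(np.argmax(score)))
        # least-squares projection onto the span of the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], X, rcond=None)
        residual = X - D[:, support] @ coef
    A = np.zeros((D.shape[1], X.shape[1]))
    A[support] = coef
    return A

def classify(D, atom_labels, X, sparsity=3):
    """Return the class whose atoms reconstruct X with the smallest residual."""
    A = somp(D, X, sparsity)
    classes = np.unique(atom_labels)
    errors = [np.linalg.norm(X - D[:, atom_labels == c] @ A[atom_labels == c])
              for c in classes]
    return classes[int(np.argmin(errors))]

# toy usage: six orthonormal atoms, three per class; X is built from class 1
D = np.eye(6)
atom_labels = np.array([0, 0, 0, 1, 1, 1])
X = D[:, [3, 4]] @ np.array([[1.0, 2.0, 0.5], [0.5, 1.0, 2.0]])
label = classify(D, atom_labels, X, sparsity=3)
```

With a single column in `X`, `somp` reduces to ordinary OMP for one test pixel.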

3. Experiment and Analysis
3.1. Datasets and Quantitative Metrics
In this section, we evaluate the proposed approach on two real hyperspectral datasets to verify its effectiveness: the AVIRIS Indian Pines image and the Salinas image.
The Indian Pines image was collected over Indiana, USA, in June 1992 and was provided by the Purdue remote sensing image processing laboratory. The image is of size 145 × 145 pixels, with a spectral coverage ranging from 0.2 to 2.4 μm and a spatial resolution of 20 m. After removal of the noise and water absorption bands from the original data, 200 bands are retained as experimental data. The Indian Pines dataset contains 16 typical classes with 10249 samples. For each class, the numbers of training and test pixels are given in Table 1.
The Salinas image was also acquired by the AVIRIS sensor, over the Salinas River Basin in California. The image size is 512 × 217 pixels and the spatial resolution is 3.7 meters per pixel. As with the Indian Pines image, the water absorption and noise bands were removed, and 204 bands are retained for testing. The image contains 16 different categories with 54129 samples. For each class, the numbers of training and test pixels are given in Table 2.
Three commonly used performance indexes, overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (Kappa), are adopted to evaluate the quality of the classification results in the experiments.
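For reference, the three indexes can be computed from a confusion matrix as in the following sketch. It uses the standard definitions (OA as the fraction of correctly labeled samples, AA as the mean of the per-class accuracies, and Cohen's Kappa); the function name `classification_scores` is hypothetical.

```python
import numpy as np

def classification_scores(y_true, y_pred):
    """Return overall accuracy, average accuracy, and Cohen's Kappa."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    index = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((classes.size, classes.size))
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1            # rows: truth, columns: prediction
    total = cm.sum()
    oa = np.trace(cm) / total
    row = cm.sum(axis=1)
    # per-class accuracy, skipping classes that never occur in y_true
    aa = (np.diag(cm)[row > 0] / row[row > 0]).mean()
    # chance agreement expected from the row and column marginals
    pe = (row * cm.sum(axis=0)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# toy usage with three classes
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 2, 0])
oa, aa, kappa = classification_scores(y_true, y_pred)
```

AA weights every class equally, so it is more sensitive than OA to errors on small classes.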
3.2. Experiments and Performance Analysis
In this section, we choose five state-of-the-art classification approaches for comparison: PSRC [17], JSRM [17], MJSR [20], MASR [20], and BTC [36]. For each class of the two test datasets, a portion of the labeled samples was chosen for training and the rest were used for testing. We run the classification ten times and report the average accuracy to avoid bias.
The scale parameters are set as follows: JSRM uses a single scale for each dataset (7 × 7 for Indian Pines and 11 × 11 for Salinas), while MJSR and MASR use multiple scales (3–13 for Indian Pines and 3–15 for Salinas). The parameters of PSRC and BTC were set to the default values reported in [17, 36]. In the proposed method, the number of superpixels is set empirically to obtain a set of compact superpixels: 240 for the Indian Pines image and 540 for the Salinas image. The flexible and adjustable sampling frequency in each iteration is set to 10 for both the Indian Pines and Salinas images. Using more atoms does not improve accuracy and only increases computation time. In this paper, the sparsity degree is fixed for testing, since other sparsity degrees perform poorly.
The classification results of the six approaches on Indian Pines are shown in Table 3, and the classification maps with the corresponding overall accuracies are displayed in Figure 4. Visually, the pixel-wise sparse representation classifier produces a very noisy classification map. JSRM, MJSR, and MASR, which incorporate contextual information from adjacent areas, give a smoother appearance and perform better in the quantitative comparison. These methods bring a large improvement in classification accuracy compared with PSRC. BTC-WLS is a lightweight sparsity-based classification technique and also provides a smoother appearance than PSRC, JSRM, and MJSR. As can be seen, the proposed method, supervised by superpixels, outperforms the comparison methods in visual effect. In the quantitative comparison of OA, AA, and the Kappa coefficient in Table 3, SRCFPSS also performs better than the other methods, except for AA, where BTC-WLS is better.
Figure 4: Classification maps and corresponding overall accuracies on the Indian Pines image, panels (a)–(i).
The Salinas image is relatively larger than the Indian Pines image, and the accuracy results (OA, AA, and Kappa) are given in Table 4. The classification maps and the corresponding overall accuracy results are displayed in Figure 5.
Figure 5: Classification maps and corresponding overall accuracies on the Salinas image, panels (a)–(i).
As can be seen from Table 4, the classification accuracy of PSRC is low. With contextual spatial information, the accuracy of the joint sparse representation model is greatly improved, with the average accuracy rising by nearly 25%. On this basis, the multiscale joint sparse representation and the multiscale adaptive sparse representation improve the classification accuracy by roughly a further 5%. With the homogeneous superpixel constraint, the classification accuracy of SRCFPSS is improved overall: the overall accuracy reaches 99.48%, the average accuracy 99.32%, and the Kappa coefficient 99.42%.
Figures 6 and 7 show the relationship between the sampling frequency and the classification accuracy indexes (OA, AA, and Kappa coefficient) on the Indian Pines and Salinas images. These indexes are obtained by averaging the results of five independent runs. As can be observed in the two figures, the three indexes of the proposed classifier generally improve as the sampling frequency increases.
4. Conclusion
Aiming at the imperfect utilization of context information, this paper puts forward a sparse representation classification algorithm for hyperspectral images based on overcomplete sampling of superpixels. The main contribution is that collaborative spatial and spectral sampling within superpixels exploits implicit context information for test pixels, and the joint sparse optimization over the sampled patches effectively fuses the spectral and spatial structure information of each HSI data point. The joint sparse norm thus improves the category feature extraction and representation of pixels, providing a significant enhancement of classification performance in the iterative process. The proposed SRCFPSS was tested on two hyperspectral images and obtained better classification performance than the compared methods. In future work, we will introduce discriminative learning algorithms into the proposed model.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the NSFC (61602157, 61572173, and 41601450), the Science and Technology Planning Project of Henan Province (162102210062), the Key Scientific Research Fund of Henan Provincial Education Department for Higher School (15A520072), Doctoral Foundation (B201637), Young Scholar Sponsored of Henan Polytechnic University, Henan Postdoctoral Foundation, and Henan Science and Technology Innovation Outstanding Youth Program (184100510009).