Mathematical Problems in Engineering

Volume 2016, Article ID 3460281, 10 pages

http://dx.doi.org/10.1155/2016/3460281

## Sparse Representation Based Binary Hypothesis Model for Hyperspectral Image Classification

School of Air and Missile Defense, Air Force Engineering University, Xi’an 710051, China

Received 9 March 2016; Revised 9 May 2016; Accepted 17 May 2016

Academic Editor: Wonjun Kim

Copyright © 2016 Yidong Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The sparse representation based classifier (SRC) and its kernel version (KSRC) have been employed for hyperspectral image (HSI) classification. However, the state-of-the-art SRC usually targets extended surface objects with linear mixing in smooth scenes and assumes that the number of classes is given. To handle small targets in complex backgrounds, a sparse representation based binary hypothesis (SRBBH) model is established in this paper. In this model, a query pixel is represented in two ways: by a background dictionary and by a union dictionary. The background dictionary is composed of samples selected from the local dual concentric window centered at the query pixel. Thus, for each pixel the classification issue becomes an adaptive multiclass classification problem, where only the number of desired classes is required. Furthermore, the kernel method is employed to improve the interclass separability. In kernel space, the coding vector is obtained by using the kernel-based orthogonal matching pursuit (KOMP) algorithm. Then the query pixel can be labeled by the characteristics of the coding vectors. Instead of directly using the reconstruction residuals, the different impacts that the background dictionary and the union dictionary have on reconstruction are used for validation and classification. This enhances the discrimination and hence improves the performance.

#### 1. Introduction

The technology for recognizing artificial targets in natural backgrounds is important to military science and civil automatic control. Hyperspectral remote sensors capture digital images in hundreds of narrow spectral bands, which span the visible to infrared spectrum. The high spectral resolution of the data provides an invaluable source of information regarding the physical nature of the different materials and strengthens the capability to identify structures and objects in the image scene. As a result, it becomes possible to detect and classify the target at the same time, that is, integrated detection and classification. However, such a large number of spectral channels implies high dimensionality of the data and poses challenges to image analysis. Most of the common technologies designed for the analysis of grey level, color, or multispectral images are not applicable to hyperspectral images.

One of the most important applications of HSI is classification. Different materials usually reflect electromagnetic energy differently at specific wavelengths. This enables discrimination of materials based on their spectral characteristics. Various techniques have been developed for HSI classification. The SVM [1–3] is a powerful tool for solving supervised classification problems in remote sensing image scenes and performs well. Variations of SVM-based algorithms have also been proposed to improve the classification accuracy [4, 5]. In the context of supervised classification, such as SVM, an additional problem is the so-called Hughes phenomenon, which occurs when the training set does not have enough samples to ensure a reliable estimation of the classifier parameters. It is hard to find the separating hyperplane between two classes with very limited reference data when the number of spectral channels is very large, as is usual in hyperspectral image scenes.

The sparse representation [6] has recently been applied to hyperspectral detection and classification [7–9], relying on the observation that the pixels belonging to the same class approximately lie in the same low-dimensional subspace. Thus, a query pixel can be sparsely represented by a few training samples (atoms) from the dictionary, and the associated sparse representation vector will implicitly encode the class information. While the SVM is a binary classifier (a multiclass SVM requires a one-against-one or one-against-all strategy [10]), the SRC is a multiclass classifier that works from a reconstruction point of view. The SRC can be regarded as a generalized model and often results in good performance. However, the number of classes present in the hyperspectral image scene has to be known to structure the dictionary and calculate the residuals when employing the sparsity model as in [11, 12]. In addition, the sparsity model may not be suitable for background pixels due to the interaction between the target and background sparse vectors.

In this paper, a sparse representation based binary hypothesis (SRBBH) model, which can be regarded as a discriminative and semisupervised model, is proposed to strengthen the performance of the sparsity model. Only the number of desired classes, which is accessible in most practical cases, is required for the SRBBH model. In this model, a target pixel is approximately represented by a union dictionary consisting of both the corresponding target training samples and background training samples, while a background pixel can be approximately represented by the background dictionary alone. Unlike in the SRC, the different impacts that the background dictionary and the union dictionary have on reconstruction, instead of the residuals themselves, are used for validation and classification in the SRBBH model. This scheme enhances the discriminative power of different subspaces and thereby improves the classification performance. However, when the data structure is complex and the problem becomes nonlinear, the SRBBH model based classifier (SRBBHC) may no longer be adequate. By implicitly exploiting the higher order structure of the given data, kernel algorithms obtain significant performance improvement. Therefore, the kernel SRBBHC (KSRBBHC) is developed to project the data into a high-dimensional feature space in which the data become linearly separable. Working with the projected data, the KSRBBHC separately represents the desired class and the undesired class in the corresponding high-dimensional feature space and improves the classification performance.

The rest of the paper is organized as follows. Section 2 briefly reviews the conventional SRC and its kernel version. Section 3 proposes the SRBBHC and its kernel version for HSI. The effectiveness of the proposed SRBBHC and KSRBBHC is demonstrated by experimental results in Section 4. Finally, conclusions are drawn in Section 5.

#### 2. Sparse Representation Based Classification

##### 2.1. SRC

In the sparsity model, it is assumed that the spectral signatures of pixels belonging to the same class approximately lie in the same low-dimensional subspace. Let $\mathbf{x} \in \mathbb{R}^{B}$ be a query pixel, where $B$ is the number of bands. Then the linear representation of $\mathbf{x}$ in terms of all training samples can be written as

$$\mathbf{x} = \mathbf{D}\boldsymbol{\alpha}, \tag{1}$$

where $\mathbf{D}$ is a structured dictionary whose columns are the training samples of all classes and $\boldsymbol{\alpha}$ is the sparse coefficient vector. $\boldsymbol{\alpha}$ can be recovered by solving

$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \left\|\mathbf{x} - \mathbf{D}\boldsymbol{\alpha}\right\|_2 \quad \text{subject to} \quad \|\boldsymbol{\alpha}\|_0 \le K, \tag{2}$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm, which is defined as the number of nonzero entries in the vector, and $K$ is a preset upper bound on the sparsity level. The problem in (2) is NP-hard; it can be approximately solved by greedy algorithms, such as orthogonal matching pursuit (OMP) [13] and subspace pursuit (SP) [14], or relaxed to convex programming [15]. In this paper, the OMP algorithm is used to generate the sparse coefficient vector. The OMP algorithm augments the support set by one index per iteration until $K$ atoms are selected or the approximation error is within a preset threshold.
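The greedy recovery described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `omp`, the tolerance `tol`, and the toy dictionary are assumptions, and the atoms (columns of `D`) are assumed to be roughly unit-norm so that correlations are comparable.

```python
import numpy as np

def omp(D, x, K, tol=1e-6):
    """Approximate x with at most K columns (atoms) of D via greedy OMP."""
    D = np.asarray(D, dtype=float)
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []                      # indices of the selected atoms
    coef = np.zeros(0)
    for _ in range(K):
        if np.linalg.norm(residual) <= tol:
            break                     # approximation error is small enough
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -np.inf   # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Refit all selected atoms by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    if support:
        alpha[support] = coef
    return alpha

# Toy usage (hypothetical data): a 2-sparse signal over an identity dictionary.
D = np.eye(4)
x = np.array([0.0, 3.0, 0.0, -2.0])
alpha = omp(D, x, K=2)
```

The least-squares refit over the whole support at every step is what distinguishes OMP from plain matching pursuit; it keeps the residual orthogonal to all atoms selected so far.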

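Once the sparse coefficient vector $\hat{\boldsymbol{\alpha}}$ is obtained, the class label of $\mathbf{x}$ is determined by the minimal residual between $\mathbf{x}$ and its approximation from each class subdictionary:

$$\operatorname{class}(\mathbf{x}) = \arg\min_{i=1,\dots,M} \left\|\mathbf{x} - \mathbf{D}_i\hat{\boldsymbol{\alpha}}_i\right\|_2, \tag{3}$$

where $\mathbf{D}_i$ is the set of training samples from class $i$, $\hat{\boldsymbol{\alpha}}_i$ is the part of the coefficient vector associated with $\mathbf{D}_i$, and $M$ is the number of classes.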
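The minimal-residual rule can be illustrated as follows. The sketch assumes the sparse coefficient vector has already been computed by some solver; the function name `src_label` and the `atom_labels` array (one class label per dictionary column) are hypothetical names introduced for illustration.

```python
import numpy as np

def src_label(D, atom_labels, alpha, x):
    """Return the class whose atoms reconstruct x with the smallest residual."""
    D = np.asarray(D, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    x = np.asarray(x, dtype=float)
    atom_labels = np.asarray(atom_labels)
    classes = np.unique(atom_labels)
    residuals = []
    for c in classes:
        mask = atom_labels == c
        # Reconstruct x using only the atoms and coefficients of class c.
        residuals.append(np.linalg.norm(x - D[:, mask] @ alpha[mask]))
    residuals = np.array(residuals)
    return classes[int(np.argmin(residuals))], residuals

# Toy usage (hypothetical data): two atoms of class 0, one atom of class 1.
D = np.eye(3)
atom_labels = np.array([0, 0, 1])
x = np.array([1.0, 1.0, 0.1])
alpha = np.array([1.0, 1.0, 0.1])
label, residuals = src_label(D, atom_labels, alpha, x)
```

In this toy case the class-0 atoms leave a residual of 0.1 while the class-1 atom leaves a residual of about 1.41, so the pixel is assigned to class 0.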

##### 2.2. KSRC

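Kernel methods outperform the classical linear algorithms by implicitly exploiting the nonlinear information of the given data [16]. The KSRC relies on the observation that a pixel in the kernel-induced feature space can be linearly represented in terms of the training samples in the same space [9]. Let $\mathbf{x} \in \mathbb{R}^{B}$ be the data point of interest and let $\phi(\mathbf{x})$ be its representation in the kernel-induced feature space. Similar to the SRC, the linear representation of $\phi(\mathbf{x})$ in terms of the training samples in the kernel-induced feature space can be formulated as

$$\phi(\mathbf{x}) = \mathbf{D}_{\phi}\boldsymbol{\alpha}^{\prime}, \tag{4}$$

where $\mathbf{D}_{\phi}$ is the training dictionary in the kernel-induced feature space and $\boldsymbol{\alpha}^{\prime}$ is the coefficient vector. The vector $\boldsymbol{\alpha}^{\prime}$ can be recovered by solving

$$\hat{\boldsymbol{\alpha}}^{\prime} = \arg\min_{\boldsymbol{\alpha}^{\prime}} \left\|\phi(\mathbf{x}) - \mathbf{D}_{\phi}\boldsymbol{\alpha}^{\prime}\right\|_2 \quad \text{subject to} \quad \|\boldsymbol{\alpha}^{\prime}\|_0 \le K. \tag{5}$$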

Problem (5) can be approximately solved by kernelized sparse recovery algorithms, such as the kernelized orthogonal matching pursuit (KOMP) and kernelized subspace pursuit (KSP). Implementation details of the KOMP and KSP can be found in [17]. In this paper, the KOMP is used to solve the problem with the RBF kernel function. To avoid directly evaluating the inner product in the high-dimensional feature space, the kernel-based learning algorithm uses the kernel trick to implement dot products in the feature space without knowing the exact mapping function $\phi$.
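A kernelized greedy pursuit only needs kernel evaluations, never the mapped vectors themselves. The sketch below is one common way to write such a pursuit and is not taken from [17]: the function name `komp`, the small ridge term `reg` added for numerical stability, and the toy kernel values are all assumptions. It takes the kernel matrix of the dictionary with itself (`K_DD`) and the kernel vector between the dictionary and the query pixel (`k_Dx`).

```python
import numpy as np

def komp(K_DD, k_Dx, K, reg=1e-8):
    """Greedy sparse coding in feature space using only kernel values."""
    K_DD = np.asarray(K_DD, dtype=float)
    k_Dx = np.asarray(k_Dx, dtype=float)
    support = []
    coef = np.zeros(0)
    for _ in range(K):
        # Correlation of every atom with the current feature-space residual:
        # <phi(d_j), phi(x) - D_S a_S> = k_Dx[j] - K_DD[j, S] @ a_S
        fitted = K_DD[:, support] @ coef if support else 0.0
        corr = np.abs(k_Dx - fitted)
        if support:
            corr[support] = -np.inf          # do not reselect atoms
        support.append(int(np.argmax(corr)))
        # Least squares in feature space: a_S = K_SS^{-1} k_Sx
        K_SS = K_DD[np.ix_(support, support)]
        coef = np.linalg.solve(K_SS + reg * np.eye(len(support)), k_Dx[support])
    alpha = np.zeros(K_DD.shape[0])
    alpha[support] = coef
    return alpha

# Toy usage (hypothetical kernel values): atoms orthogonal in feature space.
K_DD = np.eye(3)
k_Dx = np.array([0.1, 0.9, 0.2])
alpha_k = komp(K_DD, k_Dx, K=1)
```

With a single allowed atom, the pursuit selects the atom whose kernel value with the query is largest, which is the second atom in this toy example.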

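Once the sparse vector $\hat{\boldsymbol{\alpha}}^{\prime}$ is obtained, the residual associated with the $i$th subject in the feature space is then computed by

$$r_i(\mathbf{x}) = \left\|\phi(\mathbf{x}) - \mathbf{D}_{\phi,i}\hat{\boldsymbol{\alpha}}_i^{\prime}\right\|_2 = \sqrt{k(\mathbf{x},\mathbf{x}) - 2\hat{\boldsymbol{\alpha}}_i^{\prime\top}\mathbf{k}_{\mathbf{D}_i,\mathbf{x}} + \hat{\boldsymbol{\alpha}}_i^{\prime\top}\mathbf{K}_{\mathbf{D}_i}\hat{\boldsymbol{\alpha}}_i^{\prime}}, \tag{6}$$

where $k(\cdot,\cdot)$ is the kernel function, $\mathbf{D}_{\phi,i}$ is the subdictionary of class $i$ in the feature space, and $\mathbf{k}_{\mathbf{D}_i,\mathbf{x}}$ and $\mathbf{K}_{\mathbf{D}_i}$ are, respectively, the kernel evaluations of the target dictionary $\mathbf{D}_i$ with the query pixel and with itself. The class label of the query pixel is then determined as

$$\operatorname{class}(\mathbf{x}) = \arg\min_{i=1,\dots,M} r_i(\mathbf{x}). \tag{7}$$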
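The feature-space residual can be evaluated with kernel values alone, as the following sketch shows. The RBF parameterization `exp(-gamma * ||a - b||^2)` is an assumption (the source does not state which form it uses), and the function names `rbf_kernel` and `kernel_residual` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel between the columns of A (bands x m) and B (bands x n)."""
    sq = (A**2).sum(axis=0)[:, None] + (B**2).sum(axis=0)[None, :] - 2.0 * A.T @ B
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off

def kernel_residual(D, x, alpha, gamma):
    """||phi(x) - phi(D) alpha||_2 computed without forming phi explicitly."""
    D = np.asarray(D, dtype=float)
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    K_DD = rbf_kernel(D, D, gamma)                 # dictionary with itself
    k_Dx = rbf_kernel(D, x[:, None], gamma)[:, 0]  # dictionary with the query
    k_xx = 1.0                                     # RBF kernel of x with itself
    r2 = k_xx - 2.0 * alpha @ k_Dx + alpha @ K_DD @ alpha
    return float(np.sqrt(max(r2, 0.0)))

# Toy usage (hypothetical data): the query equals the first atom.
D = np.array([[1.0, 0.0],
              [0.0, 1.0]])
x = D[:, 0]
r_exact = kernel_residual(D, x, np.array([1.0, 0.0]), gamma=0.5)
r_empty = kernel_residual(D, x, np.array([0.0, 0.0]), gamma=0.5)
```

When the query coincides with an atom and that atom receives coefficient 1, the residual vanishes; with an all-zero coefficient vector it equals the feature-space norm of the query, which is 1 for the RBF kernel.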

Though the SRC and KSRC have proved to be powerful approaches, as shown in [8], their main idea is only appropriate for classifying extended surface targets such as plantations and geological formations. In this case, the pixels in a large neighborhood are likely to consist of similar materials, and different subjects lie next to each other without any undesired subject (background) between them. As a result, spectral mixing only occurs along the boundaries, so most of the target pixels are observed without corruption from the background and most of the training samples selected from the dataset for the dictionary are pure. However, this is unlikely to hold for small targets, whose spectra are almost always mixed with the background. Thus, it may be unreliable for the conventional SRC and KSRC to use the residual information for validation and classification. Moreover, these classifiers assume that all classes of subjects present in the hyperspectral image scene are known, including the number of classes and the corresponding training samples. In other words, the SRC and KSRC cannot distinguish between small targets in a hyperspectral image scene because training samples of the undesired class are generally lacking.

#### 3. SRBBHC and the Kernelized Version

In this section, we introduce the proposed SRBBH model based classification algorithm for HSI, which utilizes the binary hypothesis for quality validation as well as the reconstruction residuals by the two hypotheses for classification. Moreover, a kernelized version of the proposed classifier is also introduced for nonlinear classification in a high-dimensional feature space.

##### 3.1. SRBBHC

When employing the SRC and KSRC for HSI, it is assumed that the number of classes present in the image scene is known. However, this information is difficult to obtain in practice due to scene complexity. In many practical situations, only the number of desired classes is available. Fortunately, for regions of interest such as artificial targets in a natural background, what matters is the class label of the desired subjects, not the class labels of all kinds of subjects. In other words, we need to detect and then reject the undesired query samples before classification. In addition, unlike those of extended surface targets, the pixels of small targets are almost always mixed with the background spectrum. Detecting and classifying desired subjects from mixed pixels is generally difficult, especially when the background spectrum has an abundance close to or even larger than that of the target. Moreover, although the background and target training samples have distinct spectral signatures and lie in two different subspaces, the two subspaces are usually not orthogonal, due to spectral variation [18]. In such a case, the reconstruction residual obtained with the corresponding target training samples may be larger than that obtained with the background training samples, causing the target to be mistaken for background. Thus, it is no longer sufficient to directly use the reconstruction residuals for validation and classification. The SRBBHC solves these problems by using a binary hypothesis model with more reasonable dictionaries, where the query pixel is modeled with the background dictionary under the null hypothesis and with the union dictionary under the alternative hypothesis. The binary hypothesis is then used for validation. In the SRBBHC and its kernelized version, the different impacts that the background dictionary and the union dictionary have on reconstruction, instead of the residuals themselves, are used for validation and classification. In a sense, the SRBBHC can be viewed as a joint target detection and classification scheme, which first detects the valid samples and then classifies them.

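In detail, denote the union dictionary consisting of both the target training samples from class $i$ and the background training samples as $\mathbf{D}_i^{u} = [\mathbf{D}^{b}, \mathbf{D}_i^{t}]$, $i = 1, \dots, M$, where $\mathbf{D}_i^{t}$ is the target subdictionary associated with class $i$, $\mathbf{D}^{b}$ is the background subdictionary, and $M$ is the number of desired classes. If $\mathbf{x}$ belongs to the undesired class, its spectrum lies in a low-dimensional subspace spanned by the background training samples. As a result, the residuals of the different union dictionaries $\mathbf{D}_i^{u}$ are similar to the residual of the background subdictionary $\mathbf{D}^{b}$, that is, hypothesis $H_0$. On the other hand, if $\mathbf{x}$ belongs to class $i$, the union dictionary $\mathbf{D}_i^{u}$ will give a better representation, leading to a smaller residual than the background subdictionary, that is, hypothesis $H_1$. The binary hypothesis for quality validation is modeled as follows:

$$\begin{aligned} H_0 &: \mathbf{x} = \mathbf{D}^{b}\boldsymbol{\beta}, \\ H_1 &: \mathbf{x} = \mathbf{D}_i^{u}\boldsymbol{\theta}_i, \end{aligned} \tag{8}$$

where $\boldsymbol{\beta}$ and $\boldsymbol{\theta}_i$ are the coefficient vectors associated with $\mathbf{D}^{b}$ and $\mathbf{D}_i^{u}$, respectively. In other words, the problem is recast as a local binary classification problem, where the binary hypothesis is used to decide whether the test pixel is a valid sample from one of the desired classes.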

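According to the sparse coding theory [15], the coefficient vectors can be recovered by solving the following minimization problems with the same sparsity level $K$:

$$\begin{aligned} \hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}} \left\|\mathbf{x} - \mathbf{D}^{b}\boldsymbol{\beta}\right\|_2 \quad \text{subject to} \quad \|\boldsymbol{\beta}\|_0 \le K, \\ \hat{\boldsymbol{\theta}}_i &= \arg\min_{\boldsymbol{\theta}_i} \left\|\mathbf{x} - \mathbf{D}_i^{u}\boldsymbol{\theta}_i\right\|_2 \quad \text{subject to} \quad \|\boldsymbol{\theta}_i\|_0 \le K, \end{aligned} \tag{9}$$

where $\mathbf{D}^{b}$ is the background dictionary, $\mathbf{D}_i^{u}$ is the union dictionary of class $i$, and $\boldsymbol{\beta}$ and $\boldsymbol{\theta}_i$ are the associated coefficient vectors.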

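Once the sparse coefficient vectors $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\theta}}_i$ are obtained, the semantic information can be directly extracted from them. The residuals of the background subdictionary $\mathbf{D}^{b}$ and of the different union dictionaries $\mathbf{D}_i^{u}$ are calculated as

$$r^{b}(\mathbf{x}) = \left\|\mathbf{x} - \mathbf{D}^{b}\hat{\boldsymbol{\beta}}\right\|_2, \qquad r_i^{u}(\mathbf{x}) = \left\|\mathbf{x} - \mathbf{D}_i^{u}\hat{\boldsymbol{\theta}}_i\right\|_2, \quad i = 1, \dots, M. \tag{10}$$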

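If the given $\mathbf{x}$ is decided to be a valid sample belonging to class $i$, the union dictionary $\mathbf{D}_i^{u}$ will also give a much better representation than the other union dictionaries $\mathbf{D}_j^{u}$ ($j \ne i$), leading to a larger difference between the background residual $r^{b}(\mathbf{x})$ and the union residual $r_i^{u}(\mathbf{x})$. Then, $\mathbf{x}$ will be assigned to the class with the greatest difference between $r^{b}(\mathbf{x})$ and $r_i^{u}(\mathbf{x})$. Defining a vector $\mathbf{d} = [d_1, \dots, d_M]$ with $d_i = r^{b}(\mathbf{x}) - r_i^{u}(\mathbf{x})$, the outputs of the integrated detection and classification decision are then made by

$$\operatorname{class}(\mathbf{x}) = \begin{cases} \arg\max_{i=1,\dots,M} d_i, & \max_i d_i \ge \delta, \\ \text{background}, & \max_i d_i < \delta, \end{cases} \tag{11}$$

where $\delta$ is a threshold used for validation. When $\max_i d_i < \delta$, the query pixel will be labeled as the undesired class, that is, background. The threshold $\delta$ has an important effect on validation and hence on classification. However, in this study, the threshold is determined experimentally because no theory for analyzing this parameter is available. In our future work, we will investigate how to automatically choose an appropriate $\delta$ for different test datasets.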
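The validation and labeling rule can be sketched as a small function over precomputed residuals. The function name `srbbh_decide`, the use of label 0 for background, and the convention that a difference exactly equal to the threshold counts as a valid target are assumptions made for illustration; the residual values below are invented toy numbers, not results from the paper.

```python
import numpy as np

def srbbh_decide(r_background, r_union, delta):
    """Label a pixel from its background residual and per-class union residuals.

    Returns (label, d): label 0 means background (assumed convention),
    labels 1..M are the desired classes, and d holds the residual differences.
    """
    d = float(r_background) - np.asarray(r_union, dtype=float)
    best = int(np.argmax(d))
    if d[best] < delta:
        return 0, d          # no union dictionary helps enough: reject as background
    return best + 1, d       # class whose union dictionary helps the most

# Toy usage (hypothetical residuals).
label_target, d_target = srbbh_decide(1.0, [0.9, 0.2, 0.95], delta=0.5)
label_bg, d_bg = srbbh_decide(1.0, [0.9, 0.95], delta=0.5)
```

In the first call the second union dictionary reduces the residual by 0.8, which exceeds the threshold, so the pixel is assigned to class 2. In the second call no reduction reaches 0.5 and the pixel is rejected as background.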

Considering the size of the desired subjects, the background dictionary $\mathbf{D}^{b}$ is generated locally for each query pixel through a dual concentric window centered at the query pixel. Only the samples in the outer region are included in $\mathbf{D}^{b}$. As a result, the background dictionary is constructed adaptively for each pixel and better captures the background spectral signature around the query pixel. It is important to note that the same sparsity level must be adopted for the background dictionary and each union dictionary so that the residuals of the two hypotheses in the SRBBHC are comparable.
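Collecting the outer-region samples of a dual concentric window might look like the sketch below. The function name `background_dictionary`, the `(rows, cols, bands)` layout of `cube`, and the choice to clip the window at image borders are assumptions; the default window sizes follow the 3 × 3 and 9 × 9 values reported for ROI-I.

```python
import numpy as np

def background_dictionary(cube, row, col, inner=3, outer=9):
    """Stack the spectra between the inner and outer windows as dictionary columns."""
    n_rows, n_cols, _ = cube.shape
    half_out, half_in = outer // 2, inner // 2
    atoms = []
    for r in range(max(0, row - half_out), min(n_rows, row + half_out + 1)):
        for c in range(max(0, col - half_out), min(n_cols, col + half_out + 1)):
            if abs(r - row) <= half_in and abs(c - col) <= half_in:
                continue                 # inner window: may contain target, skip
            atoms.append(cube[r, c, :])
    return np.stack(atoms, axis=1)       # shape: (bands, number of atoms)

# Toy usage (hypothetical cube of zeros with 5 bands).
cube = np.zeros((20, 20, 5))
D_b_center = background_dictionary(cube, 10, 10)
D_b_corner = background_dictionary(cube, 0, 0)
```

Away from the borders the dictionary holds 9 × 9 − 3 × 3 = 72 atoms; near a corner the clipped window yields fewer atoms, which a real implementation might handle differently (for example by padding).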

##### 3.2. KSRBBHC

For a hyperspectral image scene, the spectral mixing may be nonlinear due to the complex imaging conditions in many practical situations [19]. As a result, the data structure may become complex and the problem is no longer linearly separable. In such a case, the linear SRBBHC is no longer adequate. Fortunately, kernel methods can project the linearly nonseparable data into a high-dimensional feature space in which those data become more separable. Here we extend the proposed SRBBHC into a kernel version, referred to as KSRBBHC.

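Similar to the KSRC, suppose that $\phi(\mathbf{x})$ is the representation of the query pixel $\mathbf{x}$ in the high-dimensional feature space; the SRBBH model becomes

$$\begin{aligned} H_0 &: \phi(\mathbf{x}) = \mathbf{D}_{\phi}^{b}\boldsymbol{\beta}^{\prime}, \\ H_1 &: \phi(\mathbf{x}) = \mathbf{D}_{\phi,i}^{u}\boldsymbol{\theta}_i^{\prime}, \end{aligned} \tag{12}$$

where $\mathbf{D}_{\phi}^{b}$ and $\mathbf{D}_{\phi,i}^{u}$, respectively, result from mapping the background dictionary $\mathbf{D}^{b}$ and the union dictionary $\mathbf{D}_i^{u}$ into the kernel-induced feature space by the mapping function $\phi$, and $\boldsymbol{\beta}^{\prime}$ and $\boldsymbol{\theta}_i^{\prime}$ are the coefficient vectors associated with $\mathbf{D}_{\phi}^{b}$ and $\mathbf{D}_{\phi,i}^{u}$, respectively.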

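Employing the RBF kernel function $k(\cdot,\cdot)$, the residuals of the kernel SRBBHC in terms of the background dictionary and the different union dictionaries are, respectively, computed by

$$\begin{aligned} r_{\phi}^{b}(\mathbf{x}) &= \left\|\phi(\mathbf{x}) - \mathbf{D}_{\phi}^{b}\hat{\boldsymbol{\beta}}^{\prime}\right\|_2 = \sqrt{k(\mathbf{x},\mathbf{x}) - 2\hat{\boldsymbol{\beta}}^{\prime\top}\mathbf{k}_{\mathbf{D}^{b},\mathbf{x}} + \hat{\boldsymbol{\beta}}^{\prime\top}\mathbf{K}_{\mathbf{D}^{b}}\hat{\boldsymbol{\beta}}^{\prime}}, \\ r_{\phi,i}^{u}(\mathbf{x}) &= \left\|\phi(\mathbf{x}) - \mathbf{D}_{\phi,i}^{u}\hat{\boldsymbol{\theta}}_i^{\prime}\right\|_2 = \sqrt{k(\mathbf{x},\mathbf{x}) - 2\hat{\boldsymbol{\theta}}_i^{\prime\top}\mathbf{k}_{\mathbf{D}_i^{u},\mathbf{x}} + \hat{\boldsymbol{\theta}}_i^{\prime\top}\mathbf{K}_{\mathbf{D}_i^{u}}\hat{\boldsymbol{\theta}}_i^{\prime}}, \end{aligned} \tag{13}$$

where $\hat{\boldsymbol{\beta}}^{\prime}$ and $\hat{\boldsymbol{\theta}}_i^{\prime}$ are the estimates of the background and union coefficient vectors $\boldsymbol{\beta}^{\prime}$ and $\boldsymbol{\theta}_i^{\prime}$ obtained by KOMP, and, for a dictionary $\mathbf{D}$, $\mathbf{k}_{\mathbf{D},\mathbf{x}}$ and $\mathbf{K}_{\mathbf{D}}$ denote its kernel evaluations with the query pixel and with itself.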

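Similarly, $\mathbf{x}$ will be assigned to the class with the greatest difference between the background residual $r_{\phi}^{b}(\mathbf{x})$ and the union residual $r_{\phi,i}^{u}(\mathbf{x})$. Defining a new vector $\mathbf{d}_{\phi} = [d_{\phi,1}, \dots, d_{\phi,M}]$ with $d_{\phi,i} = r_{\phi}^{b}(\mathbf{x}) - r_{\phi,i}^{u}(\mathbf{x})$, the outputs of the integrated detection and classification decision are then made by

$$\operatorname{class}(\mathbf{x}) = \begin{cases} \arg\max_{i=1,\dots,M} d_{\phi,i}, & \max_i d_{\phi,i} \ge \delta_{\phi}, \\ \text{background}, & \max_i d_{\phi,i} < \delta_{\phi}, \end{cases} \tag{14}$$

where $\delta_{\phi}$ is also a threshold used for validation. When $\max_i d_{\phi,i} < \delta_{\phi}$, the query pixel will be labeled as the undesired class, that is, background. As in the SRBBHC, although $\delta_{\phi}$ is important for validation and classification, it is also determined experimentally.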

#### 4. Experimental Results

In this section, the classification performance of the KSRBBHC is evaluated and compared with that of four other classifiers (SVMC, SRC, KSRC, and SRBBHC); the RBF kernel function is used for the KSRC and KSRBBHC. The average recognition rate (ARR) and the overall recognition rate (ORR) are used as performance measures. The effectiveness of the proposed algorithms is evaluated with two datasets: a synthetic dataset ROI-I and a real dataset ROI-II, as shown in Figures 4 and 5.
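The text here does not spell out how the two rates are computed. A common reading, assumed in the sketch below, is that the ARR is the mean of the per-class recognition rates and the ORR is the fraction of all test pixels labeled correctly. The function name `recognition_rates` and the toy labels are hypothetical.

```python
import numpy as np

def recognition_rates(y_true, y_pred):
    """Return (ARR, ORR) under the assumed definitions stated above."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # Per-class rate: share of pixels of class c that were labeled c.
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    arr = float(np.mean(per_class))
    orr = float(np.mean(y_pred == y_true))
    return arr, orr

# Toy usage (hypothetical labels): one class-1 pixel is rejected as background (0).
arr, orr = recognition_rates([1, 1, 1, 2], [1, 1, 0, 2])
```

Under these definitions the two rates differ whenever class sizes are unbalanced: here the per-class rates are 2/3 and 1, so the ARR is 5/6 while the ORR is 3/4.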

The ROI-I is constructed by implanting five classes of targets, namely, *artificiality*, *clay*, *tree*, *plane*, and *grass*, into a background scene of size 100 × 100 pixels collected by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) from San Diego, CA, USA. The image has 224 spectral channels (189 available) in wavelengths ranging from 370 to 2510 nm. In detail, each class of target is linearly mixed with the background by varying the abundance from 0.3 to 0.7 with step 0.1 in five small neighborhoods of size 2 × 2 pixels. In other words, the image contains 25 desired subjects occupying 100 pixels. Four unmixed samples per class are randomly chosen for training, and the inner and outer window sizes $w_{\mathrm{in}}$ and $w_{\mathrm{out}}$ of the dual window are set as 3 × 3 and 9 × 9 according to the size of the desired target. We compute the ARRs and ORRs of the KSRC and KSRBBHC for ROI-I with varying kernel parameter $\gamma$ and sparsity level $K$, as shown in Figure 1. One can see from Figure 1 that the KSRC and KSRBBHC are both sensitive to $\gamma$, while $K$ plays a nearly negligible role for the two kernel version algorithms when $\gamma$ is fixed, especially when $\gamma$ is fixed as 0.001, 0.01, and 0.1. When $\gamma$ is fixed as 10, the ARRs and ORRs of the KSRC and KSRBBHC both remain at a relatively high level and change smoothly with $K$. As a result, the kernel parameter $\gamma$ is set as 10 for the KSRC and KSRBBHC.
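The linear implantation used to build the synthetic scene can be sketched as follows. It is assumed here that the stated abundance is the fraction of the target spectrum in the mixed pixel, with the background taking the remainder; the function name `implant` and the two-band toy spectra are hypothetical.

```python
import numpy as np

def implant(background, target, abundance):
    """Linearly mix a target spectrum into a background pixel."""
    background = np.asarray(background, dtype=float)
    target = np.asarray(target, dtype=float)
    return abundance * target + (1.0 - abundance) * background

# Abundances 0.3, 0.4, ..., 0.7 as described for ROI-I.
abundances = np.linspace(0.3, 0.7, 5)
n_classes, patch_pixels = 5, 2 * 2
n_subjects = n_classes * len(abundances)       # one 2 x 2 patch per class and abundance
n_target_pixels = n_subjects * patch_pixels

# Toy usage (hypothetical two-band spectra).
mixed = implant([0.0, 0.0], [1.0, 2.0], 0.3)
```

The bookkeeping reproduces the counts given in the text: five classes times five abundance levels give 25 subjects, and 25 patches of 2 × 2 pixels occupy 100 pixels.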