#### Abstract

Hyperspectral remote sensing is a rapidly developing integrated technology that is widely used in numerous areas. The rich spectral information in hyperspectral images aids the classification and recognition of ground objects. However, the high dimensionality of hyperspectral images introduces redundant information, so the dimensionality of hyperspectral data must be reduced. This paper proposes a hybrid feature selection strategy based on the simulated annealing genetic algorithm (SAGA) and the Choquet fuzzy integral (CFI). Band selection is performed on a subspace decomposition: the simulated annealing algorithm is combined with the genetic algorithm to choose the cross-over and mutation probabilities, as well as the individuals to mutate. The selected bands are then further refined by CFI. Experimental results show that the proposed method achieves higher classification accuracy than traditional methods.

#### 1. Introduction

Hyperspectral remote sensors provide measurements of the Earth's surface with very high spectral resolution, usually resulting in hundreds of channels. In contrast to multispectral sensors, this high spectral resolution makes hyperspectral sensors very powerful in applications requiring the identification of subtle differences in ground covers (e.g., material quantification and target detection). On the other hand, the large-dimensional data spaces generated by these sensors introduce challenging methodological problems. In the context of supervised classification, the most important methodological issue raised by these sensors is the so-called curse of dimensionality (also known as the Hughes effect), which occurs when the number of features is large relative to the number of available training samples [1].

Meanwhile, hyperspectral remote sensing images have nonlinear properties. These properties originate from multiple scattering between photons and ground targets, from within-pixel spectral mixing, and from scene heterogeneity. In addition, because the pixel size in most remote sensing systems is large enough to include several types of land cover, classification errors arise and make the results unreliable. In such cases, traditional classifiers may fail completely.

In the remote sensing literature, numerous methods have been developed to solve the hyperspectral data classification problem. A successful approach to hyperspectral data classification is based on the support vector machine (SVM). SVM separates two classes by identifying the optimal separating hyperplane that maximizes the margin between the closest training samples and the hyperplane. Training samples located on the margin are referred to as support vectors and are used to create the decision surface. The properties of SVM for both full-dimensional and reduced-dimensional data have been investigated, and multi-class SVM strategies have been considered, in [2]. Hyperspectral image classification using different kernel-based approaches has been analyzed and compared in [3], where SVM was found to be more useful than other kernel-based methods. SVM classification performance is compared with other well-known neural approaches in [4], which showed that SVM provides simplicity, robustness, and increased classification accuracy compared with neural networks. In addition, some improved SVM methods have also been successfully used in hyperspectral image classification. A contextual SVM using Hilbert space embedding showed significant improvement over other methods on several hyperspectral images in [5]. A semisupervised method for addressing a domain adaptation problem based on multiple-kernel SVMs in the classification of hyperspectral data was presented in [6]. Thus, SVM is very suitable for hyperspectral image classification. However, dimension reduction is not sufficiently considered in SVM.

Commonly used dimension reduction methods fall into two categories, namely, feature selection and feature extraction. Feature extraction maps a high-dimensional feature space to a low-dimensional space via a linear or nonlinear transformation. However, since every band of hyperspectral data has its own corresponding image, such a transformation does not retain the original physical interpretation of the image. Thus, feature extraction approaches are unsuitable for the dimension reduction of hyperspectral images. Given that the spectral distance between adjacent bands in hyperspectral data is only 10 nm and that the correlation between them is extremely high [7], considerable redundancy is present, which should be largely reduced by feature selection (band selection) methods to improve classification efficiency and accuracy. A semisupervised feature-selection technique for hyperspectral image classification was developed in [8]. A method for unsupervised band selection by transforming the hyperspectral data into complex networks was presented in [9]. In this paper, a new dimension reduction method is proposed that combines the simulated annealing genetic algorithm (SAGA) with the Choquet fuzzy integral (CFI).

SAGA, a genetic algorithm (GA) variant based on a population and a temperature ladder, was recently proposed for sampling from a distribution defined on a space of finite binary sequences. Feature selection strategies for hyperspectral images based on GA and SVM were proposed in [10, 11]. GA-based feature selection and local-Fisher's discriminant analysis-based feature projection are performed for effective dimensionality reduction in [12]. In contrast to these methods, SAGA works by simulating a parallel population of samples at different temperatures. The population is updated via selection, mutation, cross-over, and exchange operations that are highly similar to those of GA. SAGA has the learning capability of GA, as well as the fast-mixing capability of parallel tempering (simulated tempering). In most cases, only classification accuracy is used as the fitness function, and the internal relations between bands and classes are not taken into account. To address this problem, a correction method based on CFI is proposed. The CFI does not assume the independence of one element from another and, based on any fuzzy measure, can be employed to perform the overall evaluation of an input pattern [13]. Moreover, the fuzzy measure defined on an attribute is used as the relative degree of importance of this attribute, so that the connection weights can be interpreted as the fuzzy measure values or the degrees of importance of the respective input variables. The band selection method of this paper, which is based on SAGA and CFI (SAGA-CFI), can not only improve the accuracy of classification but also effectively reduce the uncertainty of the information, which further improves the accuracy.

Since hyperspectral imagery contains hundreds of bands, the direct search space for SAGA and CFI on the original band space is extremely large. An adaptive subspace decomposition (ASD) method for hyperspectral data dimensionality reduction was proposed in [14]. The ASD scheme is used here to avoid the impact of enormous data sets on traditional statistical classification techniques. In this way, the differences between global and local statistical characteristics are fully considered, and the problem posed by a limited number of training samples is alleviated.

In this paper, we use SAGA and CFI in every subspace produced by ASD to choose suitable bands, which differs from the previous work [5, 6, 8–12] in three aspects. First, ASD rather than mutual information is employed to divide the bands into disjoint subspaces. Although mutual information may perform better than ASD, it is not used here because it is closely tied to the entropy of information and can be formulated directly in terms of entropy; since entropy is one of the factors used by CFI, it is better to keep ASD and CFI independent. Second, SAGA, which builds on GA, is used for band selection; it includes a schedule of temperatures and approaches the global minimum when the temperatures change gradually. Third, CFI is employed for the first time to further optimize the band selection. Thus, we reduce the search space and computational complexity, while avoiding the selection of an excessive number of adjacent bands.

The remainder of this paper is organized as follows. Section 2 introduces subspace decomposition. Section 3 presents the proposed SAGA. In Section 4, a brief description of three related elements and fuzzy measure followed by CFI is given. Section 5 provides the SVM classification adopted in this paper. Section 6 describes the proposed method. Experiments and analysis are demonstrated in Section 7. Finally, Section 8 concludes the paper.

#### 2. Subspace Decomposition

The main characteristics of hyperspectral remote sensing data are a large number of imaging channels (approximately 220 bands) and narrow spectral bands. The spectrum of hyperspectral data is highly concentrated, so the overall and local characteristics differ considerably. We may lose some important local characteristics if we select the bands from the total space. Viewed globally, the bands are notably organized in groups. We can split the bands into several groups wherever the correlation between adjacent bands is low. Subspace decomposition not only reduces the dimension of the images, but also significantly improves the efficiency of data processing. Division of data sources based on ASD and fusion classification based on consensus theory is proposed in [15], and ASD remains the commonly used method. According to the correlation matrix between the bands of the hyperspectral image, the full-dimensional data space is adaptively decomposed into several subspaces of different dimensionalities. In each subspace, the bands are very strongly correlated and the energy is more concentrated. Hence, the full data dimensionality can be logically reduced.

Since different bands have different correlations, the subspaces do not all have the same dimensionality. The goal is to match the features of each subspace with one or a few classes. For this purpose, the method relies primarily on the correlation matrix between bands. The element $r_{ij}$ of the correlation matrix is defined as

$$r_{ij} = \frac{E\left[(x_i - \mu_i)(x_j - \mu_j)\right]}{\sqrt{E\left[(x_i - \mu_i)^2\right]\,E\left[(x_j - \mu_j)^2\right]}}. \tag{1}$$

The value of $|r_{ij}|$ ranges between 0 and 1. The closer $|r_{ij}|$ is to 1, the stronger the correlation between the two bands. $\mu_i$ and $\mu_j$ are the mean values of bands $x_i$ and $x_j$, respectively, and $E[\cdot]$ denotes the mathematical expectation.
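As a rough illustration of (1), the band-to-band correlation matrix can be computed by replacing the expectation with a sample average over pixels. The sketch below is an assumption-laden toy: the function name `band_correlation_matrix` and the flat-list representation of a band are hypothetical, and a real implementation would operate on the image cube directly.

```python
import math

def band_correlation_matrix(bands):
    """Pearson correlation between every pair of bands.

    bands: list of bands, each a flat list of pixel values of equal length
    (hypothetical layout chosen for this sketch).
    """
    means = [sum(b) / len(b) for b in bands]
    centered = [[v - m for v in b] for b, m in zip(bands, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    n = len(bands)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            denom = norms[i] * norms[j]
            # A constant band has zero variance; its correlation is set to 0 here.
            r[i][j] = dot / denom if denom > 0 else 0.0
    return r

# Toy data: band 1 is a scaled copy of band 0, band 2 is its mirror image.
toy_bands = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
R = band_correlation_matrix(toy_bands)
```

In this toy case bands 0 and 1 are perfectly correlated and bands 0 and 2 are perfectly anti-correlated, so both pairs have $|r_{ij}|$ close to 1.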

#### 3. Simulated Annealing Genetic Algorithm

The traditional selection, cross-over, and mutation operators, as well as fitness-proportional selection in GA, allow a superior chromosome to maintain or strengthen its predominance in subsequent generations. As a result, the chromosome to which the search converges may not be the overall optimal chromosome. SAGA combines the simulated annealing algorithm with GA, so that it can perform the temperature-control function of simulated annealing by controlling the selection probability [16]. If we want to sample from a distribution defined on a space of finite binary sequences, we employ the following:

$$f(x) \propto \exp\{-H(x)/\tau\}, \tag{2}$$

where $x = (\beta_1, \ldots, \beta_d)$ is the $d$-dimensional binary vector with $\beta_i \in \{0, 1\}$, $\tau$ is the scale parameter (a so-called temperature that can be any value of interest), and $H(x)$ is the fitness function in terms of GA.

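First, a sequence of distributions $f_1(x), \ldots, f_N(x)$ is constructed as follows:

$$f_i(x) = \frac{1}{Z_i(t_i)} \exp\{-H(x)/t_i\}, \tag{3}$$

where, for $i = 1, \ldots, N$, $Z_i(t_i) = \sum_{x} \exp\{-H(x)/t_i\}$. The temperatures form a ladder with the order $t_1 > t_2 > \cdots > t_N$. For convenience, we denote the ladder by $\mathbf{t} = (t_1, \ldots, t_N)$. Note that we always set $t_N = \tau$, the temperature of the target distribution from which we obtain the sample. $\mathbf{x} = \{x_1, \ldots, x_N\}$ denotes a population of samples, where $x_i$ is a sample from $f_i(x)$ and is called a chromosome or an individual in terms of GA, while $N$ represents the population size. In SAGA, the Boltzmann distribution of the population is expressed as

$$f(\mathbf{x}) = \frac{1}{Z(\mathbf{t})} \exp\left\{-\sum_{i=1}^{N} H(x_i)/t_i\right\}, \tag{4}$$

where $Z(\mathbf{t}) = \prod_{i=1}^{N} Z_i(t_i)$. The population is updated by selection, cross-over, mutation, and exchange operators.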

##### 3.1. Selection

The probability of having the chromosome chosen first is and probability of is where

##### 3.2. Cross-Over

One chromosome pair, such as and (), is selected from the current population through the roulette wheel. Two offspring, and , are generated according to a specific cross-over operator. A new population is proposed as and is accepted with probability according to the Metropolis-Hastings rule that is expressed as follows: where denotes the selection probability of from the population and denotes the selection probability of from the population .

##### 3.3. Mutation

We define the mutation operator as an additional move of the Metropolis-Hastings rule. One chromosome, such as , is uniformly chosen from the current population . A new chromosome is generated by the addition of a random vector , such that where is usually chosen to achieve moderate acceptance probability for the mutation operation. The new population is accepted with the probability according to the following Metropolis-Hastings rule:
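A minimal sketch of one mutation move for binary chromosomes follows. It makes several assumptions that are not taken from the text: the random perturbation is a single bit flip (a symmetric proposal, so the Metropolis-Hastings ratio reduces to the Boltzmann ratio at the chromosome's own temperature), the fitness $H$ is treated as an energy to be minimized, and the names `mutate` and `toy_energy` are hypothetical.

```python
import math
import random

def mutate(population, temperatures, energy, rng):
    """Try one bit-flip mutation; return True if the move was accepted."""
    k = rng.randrange(len(population))      # chromosome chosen uniformly
    x = population[k]
    pos = rng.randrange(len(x))             # bit to flip
    y = x[:pos] + [1 - x[pos]] + x[pos + 1:]
    delta = energy(y) - energy(x)
    # Symmetric proposal: accept with min(1, exp(-delta / t_k)).
    if delta <= 0:
        accept_prob = 1.0
    else:
        accept_prob = math.exp(-delta / temperatures[k])
    if rng.random() < accept_prob:
        population[k] = y
        return True
    return False

def toy_energy(chromosome):
    """Hypothetical stand-in for H(x): the number of selected bands."""
    return float(sum(chromosome))

rng = random.Random(0)
population = [[1, 0, 1, 1], [0, 0, 1, 0]]
temperatures = [2.0, 1.0]   # ladder t_1 > t_2
for _ in range(20):
    mutate(population, temperatures, toy_energy, rng)
```

In the paper's setting the energy would be derived from the SVM classification accuracy rather than from this toy function.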

##### 3.4. Exchange

A straightforward implementation of relative parallel tempering can outperform simulated annealing in several crucial respects, and parallel tempering can offer a powerful alternative to simulated annealing for combinatorial optimization problems [17]. Given the current population and the attached temperature ladder in , we propose to obtain a new population by making an exchange between and without changing the . That is, . The new population is then accepted with probability ) according to the Metropolis-Hastings rule below:
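The exchange move can be sketched as below, assuming the population follows a Boltzmann distribution of the form $\exp\{-\sum_k H(x_k)/t_k\}$ and that the swap proposal is symmetric. Under those assumptions the acceptance ratio for swapping the chromosomes at temperatures $t_i$ and $t_j$ is $\exp\{(H(x_i) - H(x_j))(1/t_i - 1/t_j)\}$. The function names are hypothetical.

```python
import math
import random

def exchange_acceptance(h_i, h_j, t_i, t_j):
    """Acceptance probability for swapping the chromosomes held at t_i and t_j."""
    log_ratio = (h_i - h_j) * (1.0 / t_i - 1.0 / t_j)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)

def try_exchange(population, energies, temperatures, i, j, rng):
    """Swap chromosomes i and j (the temperatures stay in place) if accepted."""
    p = exchange_acceptance(energies[i], energies[j],
                            temperatures[i], temperatures[j])
    if rng.random() < p:
        population[i], population[j] = population[j], population[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False

# A low-energy chromosome sitting at the hot level is always moved to the cold one.
pop = [[0, 1], [1, 1]]
ener = [1.0, 3.0]
temps = [2.0, 1.0]
swapped = try_exchange(pop, ener, temps, 0, 1, random.Random(0))
```

With this rule, good solutions found at high temperature migrate toward the low-temperature end of the ladder, while the reverse swap is accepted only occasionally.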

##### 3.5. Fitness Function

Another key element of SAGA is the design of the fitness function. We use only the classification accuracy obtained from the training feature subset as the fitness function. The purpose of the iterative repetition is to determine the optimal feature subset and to maximize classification accuracy. The adopted classifier is SVM, which is described in Section 5.

#### 4. Choquet Fuzzy Integral

Based on subspace decomposition, the CFI method is used to further refine the selected bands. The definitions of the fuzzy measure and the Choquet integral are given in [18, 19].

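*Definition 1 (fuzzy measure (see [18])).* Denote by $\mathcal{B}$ the Borel field obtained from the domain $X$. A fuzzy measure $g$ defined on $\mathcal{B}$ must satisfy the following conditions: (1) $g(\emptyset) = 0$ and $g(X) = 1$, where $\emptyset$ is the null set; (2) given two subsets $A, B \in \mathcal{B}$, if $A \subseteq B$, then $g(A) \le g(B)$; (3) if $\{A_i\} \subset \mathcal{B}$ is a monotone sequence, then $\lim_{i \to \infty} g(A_i) = g(\lim_{i \to \infty} A_i)$. Building on the definition of the fuzzy measure, Sugeno introduced the $g_\lambda$ measure.

*Definition 2 ($g_\lambda$ measure (see [19])).* For all sets $A, B \subseteq X$ with $A \cap B = \emptyset$, there exists $\lambda > -1$ satisfying

$$g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B). \tag{10}$$

Obviously, when $\lambda = 0$, the $g_\lambda$-fuzzy measure is a probability measure.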

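Given a finite set $X = \{x_1, \ldots, x_n\}$, let $g_i = g(\{x_i\})$; the mapping $x_i \mapsto g_i$ is called the fuzzy density function, and $g_i$ is also called the single-point importance. If $A \subseteq X$, according to (10), the following formula can be deduced:

$$g(A) = \frac{1}{\lambda}\left[\prod_{x_i \in A} (1 + \lambda g_i) - 1\right], \tag{11}$$

where $\lambda \ne 0$. Because $g(X) = 1$, the value of $\lambda$ can be obtained by solving

$$\lambda + 1 = \prod_{i=1}^{n} (1 + \lambda g_i). \tag{12}$$

It can be proved that, given a fixed set $\{g_i\}$ with $0 < g_i < 1$, there exists one and only one $\lambda > -1$ with $\lambda \ne 0$ that satisfies (12). So, if the fuzzy densities $g_i$ ($i = 1, \ldots, n$) are given, the unique $g_\lambda$-fuzzy measure can be obtained.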

With regard to the theory of information fusion, the fuzzy density $g_i$ serves as the importance or the contribution of the source $x_i$. The group of sources determines a unique $g_\lambda$-fuzzy measure in the process of data fusion. Based on the $g_\lambda$-fuzzy measure, the Choquet fuzzy integral can be defined.

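*Definition 3 (Choquet integral (see [19])).* Given a function $h: X \to [0, 1]$, its Choquet integral with respect to the fuzzy measure $g$ is defined as

$$\int_X h \, dg = \sum_{i=1}^{n} \left[h(x_i) - h(x_{i-1})\right] g(A_i). \tag{13}$$

In the equation, the value $h(x_i)$ can be interpreted as a credibility estimation of the source $x_i$ for a specific target. Note that the sources are ordered so that $h$ is increasing, $h(x_1) \le h(x_2) \le \cdots \le h(x_n)$, with $h(x_0) = 0$; the fuzzy measure $g(A_i)$ is the importance or contribution of the information sources $A_i = \{x_i, \ldots, x_n\}$ with respect to the ultimate decision-making or estimation.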

According to (13), CFI can be seen as a weighted sum of the values $h(x_i)$, where the weights depend on the rank of $x_i$, and the value of $h(x_i)$ decides the rank of $x_i$; so the CFI is a nonlinear function of $h$. It is clear that when $\lambda = 0$, the $g_\lambda$-fuzzy measure is a probability measure, and the CFI is a linear function of $h$. The CFI is used in data fusion if we regard $h(x_i)$ as the result of goal judgment and $g$ as the degree of importance or contribution. The CFI is thus a nonlinear combination of the results of the information sources with the importance of those sources.
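The computation can be sketched as follows, assuming the common discrete form of the Choquet integral, $\sum_i [h(x_i) - h(x_{i-1})]\, g(A_i)$ with the sources sorted by increasing $h$ and $A_i$ the set of sources from rank $i$ upward, and assuming the measure of each $A_i$ is built recursively from Sugeno's rule $g(A \cup B) = g(A) + g(B) + \lambda g(A) g(B)$. The function name and argument layout are hypothetical.

```python
def choquet_integral(h, densities, lam):
    """Discrete Choquet integral with respect to a lambda-fuzzy measure.

    h:         evaluations h(x_i) of each source
    densities: fuzzy densities g_i of each source
    lam:       the Sugeno lambda consistent with the densities
    """
    # Rank the sources so that h is increasing.
    order = sorted(range(len(h)), key=lambda i: h[i])
    # g(A_i) for A_i = {x_(i), ..., x_(n)}, accumulated from the top rank down.
    g_tail = [0.0] * len(h)
    acc = 0.0
    for pos in range(len(order) - 1, -1, -1):
        g_i = densities[order[pos]]
        acc = g_i + acc + lam * g_i * acc
        g_tail[pos] = acc
    total, prev = 0.0, 0.0
    for pos, idx in enumerate(order):
        total += (h[idx] - prev) * g_tail[pos]
        prev = h[idx]
    return total

# With lam = 0 and densities summing to 1, the result is a plain weighted sum.
value = choquet_integral([0.2, 0.8, 0.5], [0.5, 0.3, 0.2], 0.0)
```

For these toy inputs the weighted sum is 0.5·0.2 + 0.3·0.8 + 0.2·0.5 = 0.44, which the function reproduces.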

Before computing the fuzzy integral, we must compute the value of $\lambda$. From (12), we know that $\lambda$ is the root of a high-order polynomial. If there are many sources, computing $\lambda$ becomes a computational burden, which hinders online and real-time use of the algorithm.
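One simple way to obtain $\lambda$ numerically is bisection on $F(\lambda) = \prod_i (1 + \lambda g_i) - (1 + \lambda)$. The sketch below is not the paper's procedure. It assumes at least two densities, each strictly between 0 and 1, and uses the standard facts that the nonzero root lies in $(-1, 0)$ when the densities sum to more than 1 and in $(0, \infty)$ when they sum to less than 1. The name `solve_lambda` is hypothetical.

```python
def solve_lambda(densities, tol=1e-12):
    """Root lambda > -1 of  lambda + 1 = prod(1 + lambda * g_i)."""
    if len(densities) < 2 or not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("need at least two densities in the open interval (0, 1)")
    total = sum(densities)
    if abs(total - 1.0) < 1e-12:
        return 0.0  # additive case: the measure is a probability measure

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        # f > 0 just above -1 and f < 0 just below 0.
        lo, hi = -1.0, 0.0
        positive_side_is_lo = True
    else:
        # f < 0 just above 0; grow hi until f becomes non-negative.
        lo, hi = 0.0, 1.0
        while f(hi) < 0:
            hi *= 2.0
        positive_side_is_lo = False

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == positive_side_is_lo:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

lam_small = solve_lambda([0.2, 0.3])   # densities sum to less than 1
lam_large = solve_lambda([0.6, 0.7])   # densities sum to more than 1
```

For two densities the equation is quadratic and can be checked by hand: $(1 + 0.2\lambda)(1 + 0.3\lambda) = 1 + \lambda$ gives $\lambda = 25/3$, and $(1 + 0.6\lambda)(1 + 0.7\lambda) = 1 + \lambda$ gives $\lambda = -5/7$.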

#### 5. SVM Classification Methods

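Training data are required to train the SVM model. In general, however, these data cannot be separated without errors. The data points that are closest to the hyperplane are used to measure the margin, and SVM attempts to identify the hyperplane that maximizes the margin while minimizing a quantity proportional to the number of misclassification errors [20, 21]. SVM derives the optimal hyperplane as the solution of the following convex quadratic programming problem [22]:

$$\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\left(w \cdot \phi(x_i) + b\right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, n, \tag{14}$$

where $\{(x_i, y_i)\}_{i=1}^{n}$ are the labeled training data, with $x_i$ a feature vector and $y_i \in \{-1, +1\}$; $w$ and $b$ define a linear classifier in the feature space; $C$ is the regularization parameter defined by the user; and $\xi_i$ is a positive slack variable that handles permitted errors.

The aforementioned optimization problem can be reformulated through a Lagrange function, where the Lagrange multipliers can be found via dual optimization to generate a convex quadratic programming solution as follows [23–25]:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0, \tag{15}$$

where $\alpha = (\alpha_1, \ldots, \alpha_n)$ is the vector of Lagrange multipliers, while $K$ is a kernel function, which is introduced as follows [26]:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j). \tag{16}$$

Thus, the final result is a discrimination function conveniently expressed as a function of the data in the original (lower) dimensional feature space [27]:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right). \tag{17}$$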
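To make the discrimination function concrete, the sketch below evaluates the usual kernel form $\operatorname{sgn}(\sum_i \alpha_i y_i K(x_i, x) + b)$ with a radial basis function kernel. The support vectors, multipliers, bias, and kernel width are hand-picked toy values for illustration only. In practice they come from solving the dual problem with an SVM solver, and all names here are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decide(x, support_vectors, labels, alphas, bias, gamma):
    """Sign of the kernel expansion over the support vectors."""
    score = sum(a * y * rbf_kernel(sv, x, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + bias >= 0 else -1

# Toy model: one support vector per class (values chosen by hand, not trained).
svs = [[0.0, 0.0], [2.0, 2.0]]
ys = [-1, 1]
alphas = [1.0, 1.0]
pred_near_positive = svm_decide([1.9, 2.1], svs, ys, alphas, 0.0, 0.5)
pred_near_negative = svm_decide([0.1, 0.0], svs, ys, alphas, 0.0, 0.5)
```

Each test point receives the label of the support vector it is closest to, because the RBF kernel decays with squared distance.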

#### 6. Proposed Method

##### 6.1. Adaptive Subspace Decomposition

First, adaptive subspace decomposition is used to divide the bands into seven subspaces according to (1). All correlation values between bands are computed, and a proper threshold is then set. Contiguous bands whose correlation reaches the threshold are placed in the same subspace. We can dynamically control the number of subspaces and the number of bands in each subspace by changing the threshold.
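The thresholding step might be sketched as follows. The exact grouping rule is not spelled out here, so this sketch assumes the simplest variant: walk through the bands in order and start a new subspace whenever the correlation between two neighbouring bands falls below the threshold. The function name and the toy matrix are hypothetical.

```python
def split_into_subspaces(r, threshold):
    """Group contiguous band indices using neighbouring-band correlation.

    r: square correlation matrix (list of lists), r[i][j] for bands i and j.
    """
    if not r:
        return []
    subspaces = [[0]]
    for k in range(1, len(r)):
        if abs(r[k - 1][k]) >= threshold:
            subspaces[-1].append(k)      # still strongly correlated: same group
        else:
            subspaces.append([k])        # correlation dropped: new subspace
    return subspaces

toy_r = [
    [1.00, 0.90, 0.30, 0.20],
    [0.90, 1.00, 0.40, 0.30],
    [0.30, 0.40, 1.00, 0.95],
    [0.20, 0.30, 0.95, 1.00],
]
groups = split_into_subspaces(toy_r, 0.8)
```

Raising the threshold produces more, smaller subspaces; lowering it merges neighbouring groups.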

##### 6.2. The Band Order Method in Subspace

SAGA is used to find the optimal bands in each subspace. Here we choose the common binary coding method as the genetic coding mode, and SAGA is run for 50 iterations. Generally, a subspace has many bands, and all the suitable bands should be chosen. If a subspace has only one band, that band must be chosen.

##### 6.3. The Band Reorder Method in Subspace

After the bands are chosen by SAGA, they can be further optimized with the CFI method. CFI takes three factors into account: entropy of information, correlation coefficient, and standard distance between the means.

###### 6.3.1. Entropy of Information and Variance

According to Shannon's information theory, entropy measures information content in terms of uncertainty. The entropy of the hyperspectral components represents the information content of each component. Thus, the higher the entropy, the richer the information content, resulting in a more meaningful representation. The entropy or total information [28] is defined as

$$H = -\sum_{i} p_i \log_2 p_i, \tag{18}$$

where $p_i$ is the probability of pixel value $i$.
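As a small illustration, the Shannon entropy of a band can be estimated from the empirical histogram of its pixel values. The sketch assumes base-2 logarithms (entropy in bits) and a flat list of quantized pixel values; the function name is hypothetical.

```python
import math
from collections import Counter

def band_entropy(pixels):
    """Shannon entropy (in bits) of the empirical pixel-value distribution."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform_two = band_entropy([0, 0, 1, 1])    # two equally likely values
constant = band_entropy([5, 5, 5, 5])       # no uncertainty at all
uniform_four = band_entropy([0, 1, 2, 3])   # four equally likely values
```

A band with a single gray level carries no information, while spreading the pixels evenly over more gray levels increases the entropy.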

Variance represents deviation from mean value to the gray-scale value of pixel. The formulae of computing mean value and variance are as follows [29]: where and are the numbers of two adjacent bands. and represent the width and the height of image, and is the gray-scale value of pixel .

###### 6.3.2. Correlation Coefficients

In statistics, the correlation coefficient denotes the accuracy of a least square fitting to the original data. It is a normalized measure of the strength of the linear relationship between two variables. Correlation is employed in many types of applications, such as hyperspectral image processing where it is used to measure and to quantitatively compare the similarity between bands [30]. The two-dimensional normalized correlation function for image processing is shown below: where is a real number between −1 and 1.

###### 6.3.3. Standard Distance between the Means

Object classes need to be analyzed in depth in which the band is easy to be distinguished [31] that is, the statistical distance between object classes in the band. Standard distance between the means is defined as where and are spectrum means of corresponding regions of the two samples. and are variances of corresponding regions of the two samples. reflects separability of the two samples in each band.
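A separability index of this kind can be sketched as below. This is an assumption rather than the paper's verified formula: the sketch uses the form commonly found in the remote sensing literature, $d = |\mu_1 - \mu_2| / (\sigma_1 + \sigma_2)$, with $\sigma$ taken as the population standard deviation of each sample region in the band. If variances are intended in the denominator instead, `pstdev` would be replaced by `pvariance`.

```python
import statistics

def standard_distance(sample_a, sample_b):
    """|mean_a - mean_b| / (sd_a + sd_b) for two class samples in one band."""
    mu_a = statistics.fmean(sample_a)
    mu_b = statistics.fmean(sample_b)
    spread = statistics.pstdev(sample_a) + statistics.pstdev(sample_b)
    if spread == 0:
        raise ValueError("both samples are constant; the distance is undefined")
    return abs(mu_a - mu_b) / spread

# Toy spectral values of two classes in one band.
d = standard_distance([1.0, 3.0], [6.0, 10.0])
```

Here the means are 2 and 8 and the standard deviations are 1 and 2, so the distance is 6 / 3 = 2; larger values indicate that the two classes are easier to separate in that band.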

Then, the procedure of the band reorder method using CFI is as follows.(1)According to (18), entropies of information in each subspace are computed and recorded as .(2)According to (21), the correlation coefficients in each subspace are computed and recorded as .(3)According to (22), standard distances between the means in each subspace are computed and recorded as .(4)Belief function is constructed and domain is . The relations between index value of each factor and band reorder are described below. The bigger the entropy is, the rich the information is. The smaller the correlation coefficient is, the more independent the band is. The bigger the standard distance between the means is, the easier to distinguish the two samples is. So the belief functions of CFI are listed as follows: where . Equation (23) is reordered and a new equation (24) is generated: , , and are minimum, median, and maximal values of the three, respectively.(5)Another important problem that needs to be fixed is fuzzy measure . Belief function is arranged in ascending order, and the biggest one is of primary importance. In each subspace, , (6)The formula of computing fuzzy integral value is as follows: where , .

##### 6.4. Flowchart of the Proposed Method

The flowchart of this paper is illustrated in Figure 1.

#### 7. Experiments and Analysis

##### 7.1. Hyperspectral Images

Experiments were conducted on a hyperspectral data set acquired over the Northwest Indiana Indian Pines test site 3 (a 2 × 2 mile portion of Northwest Tippecanoe County, Indiana) on June 12, 1992. The data comprise 145 × 145 pixels and 220 bands. The false color image shown in Figure 2 is composed of bands 89, 5, and 120.

##### 7.2. Subspace Decomposition Experiment

The ASD scheme is used to obtain the correlation value between the bands. Table 1 gives the values of the parts of the correlation matrix according to (1).

As presented in Table 1, the autocorrelation coefficient of each band is equal to 1, and the correlation value is very high. In this paper, the ASD method is performed using the correlation criterion of a given threshold, which is 0.8. The full data space is decomposed into seven subspaces. The dimensions of each subspace are shown in Table 2.

From the 220 spectral channels acquired by the AVIRIS sensor, 41 bands were discarded because they were affected by atmospheric problems. The discarded bands were as follows: 1–4, 78, 80–86, 103–110, 149–165, and 217–220. As a result, the new dimensions of each subspace are shown in Table 3.

##### 7.3. SAGA in Each Subspace

The hyperspectral image is categorized into seven classes according to the ground truth. The ratio of training to test samples is 1 : 3 because SVM is suitable for small samples. SAGA is applied in each subspace, and the resulting fitness is illustrated in Figure 3. We select the optimal band in each subspace. SAGA is unnecessary in subspaces 3, 6, and 7 because each of these subspaces contains only one band. The kernel function used is a radial basis function, while the two SVM parameters (i.e., the regularization parameter and the kernel width) are selected based on fivefold cross-validation during the training phase. The search range for is in and for .

##### 7.4. Index Value of CFI

Entropy, correlation coefficient, and standard distance between the means are computed for each band. The CFI index values are then obtained and sorted in descending order in each subspace. The larger the index value, the more important the band. A threshold on the index value is specified; Table 4 shows the index values of the bands when this threshold is 0.940.

SAGA is used to determine which band or bands should be selected in each subspace, but it cannot indicate which bands have higher priority than others. The CFI index values are therefore used to further refine the selected bands, which yields a more effective selection.

##### 7.5. Computational Time Complexity

One issue remains to be considered: the construction and analysis steps of the proposed procedure may consume considerable time. Thus, we compare the time complexity of the four methods GA, SAGA, CFI, and SAGA-CFI in this part. The time complexity of SAGA-CFI is of the same order as that of the other three methods. This means that the processing cost of SAGA-CFI is no higher in order than that of the others.

##### 7.6. Classification Experiment

The hyperspectral image is also categorized into seven classes, while the ratio of training and test samples remains 1 : 3. The numbers of training samples and of test samples are shown in Table 5.

In this work, we implement two other similar classification methods for hyperspectral images to compare with the algorithm proposed in the paper. One method is based on SAGA and SVM classification (SAGA-SVM). The other method is based on CFI and SVM classification (CFI-SVM). The method of this paper is based on SAGA, CFI, and SVM classification (SAGA-CFI-SVM). The error matrices of the three methods are presented in Tables 6, 7, and 8, while the total accuracy and Kappa value are given in Table 9. The threshold is 0.94 in all of the above methods. Table 10 shows how the total accuracy and Kappa value change with the threshold.

In the error matrix, the producer's accuracy (PA) and the user's accuracy (UA) are each defined as the ratio of $x_{ii}$ to a marginal total, where $x_{ii}$ is the value on the major diagonal of the $i$th row in the error matrix, $x_{i+}$ is the total of the $i$th row, and $x_{+i}$ is the total of the $i$th column.

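To measure the agreement between the classification and the reference data, we compute the kappa coefficient based on the following equation:

$$\kappa = \frac{N \sum_{i} x_{ii} - \sum_{i} x_{i+}\, x_{+i}}{N^2 - \sum_{i} x_{i+}\, x_{+i}}, \tag{29}$$

where $N$ is the number of total pixels, $x_{ii}$ is the $i$th diagonal entry of the error matrix, and $x_{i+}$ and $x_{+i}$ are the totals of the $i$th row and the $i$th column, respectively.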
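The accuracy measures can be computed from an error matrix as sketched below. The orientation is an assumption: rows are taken as the classified labels and columns as the reference labels, so the producer's accuracy divides by the column total and the user's accuracy by the row total. If the tables use the opposite orientation, the two are swapped, while the overall accuracy and kappa are unaffected. The function name and the toy matrix are hypothetical.

```python
def accuracy_metrics(matrix):
    """Overall accuracy, kappa, producer's and user's accuracy per class."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    diag = [matrix[i][i] for i in range(n)]
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    kappa = (total * sum(diag) - chance) / (total * total - chance)
    producers = [d / c if c else 0.0 for d, c in zip(diag, col_totals)]
    users = [d / r if r else 0.0 for d, r in zip(diag, row_totals)]
    return {
        "overall": sum(diag) / total,
        "kappa": kappa,
        "producers": producers,
        "users": users,
    }

# Toy two-class error matrix (rows: classified, columns: reference).
metrics = accuracy_metrics([[45, 5], [10, 40]])
```

For this toy matrix the overall accuracy is 0.85 and kappa is (100·85 − 5000) / (100² − 5000) = 0.7.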

The original reference image, the SAGA-SVM classification image, the CFI-SVM classification image, and the SAGA-CFI-SVM are illustrated in Figure 4.

#### 8. Conclusions

An innovative band selection algorithm called SAGA-CFI has been developed and combined with the SVM classifier to classify hyperspectral remote sensing images. On the basis of subspace decomposition, SAGA was used in each subspace to lower the computational complexity and select suitable bands, and the CFI method was adopted to further refine the selected bands in order to increase classification accuracy. SAGA-CFI-SVM has been implemented and compared with conventional algorithms. The comparison results show that the proposed method is superior in terms of classification accuracy.

The classification of hyperspectral remote sensing images based on SAGA-CFI-SVM presented in this paper is far from complete and requires further research. One open problem is to further reduce the computational complexity of SAGA and to accelerate the search procedure. Another is to improve the kernel function in order to obtain significantly higher classification accuracy. Finally, we need to study classification methods based on selective ensembles of support vector machines, which may further improve the accuracy.

#### Acknowledgments

This study is supported by the National Natural Science Foundation of China (no. 61271386) and funded by the CRSRI Open Research Program (no. CKWV2013215/KY) and the Industrialization Project of Universities in Jiangsu Province (no. JH10-9).