#### Abstract

This paper proposes a band selection method based on a two-layer selection (TLS) strategy, which forms an optimal subset of the full band set to reconstitute the original hyperspectral imagery (HSI) and aims to achieve better performance with fewer bands. As its name implies, TLS works in two phases to pick bands with low correlation and a large amount of information into the target set, thereby reducing the dimensionality of HSI. Specifically, the fast density peaks clustering (FDPC) algorithm is first used to select the most representative node in each cluster and build a candidate set. During the implementation, we normalize the local density and relative distance and use a dynamic cutoff distance to weaken the influence of density, so that selection is more likely to be carried out in scattered clusters than in high-density ones. After that, we conduct a further selection in the candidate set using the mRMR strategy and the comprehensive measurement of information (CMI), and the eventual winners are selected into the target set. Compared with six other state-of-the-art unsupervised algorithms on three real-world HSI data sets, TLS groups bands with lower correlation and richer information and shows clear advantages in overall accuracy (OA), average accuracy (AA), and Kappa coefficient.

#### 1. Introduction

Hyperspectral imagery (HSI) combines imaging technology with spectral detection technology, and it helps us analyze the characteristics of objects without direct contact. Since each pixel in HSI has both plane coordinates and spectral information, we usually describe HSI as a three-dimensional cube; that is, each band on the spectral axis corresponds to a 2D image. Because object surfaces absorb and reflect electromagnetic waves of various wavelengths to different degrees, and because the accuracy of spectral acquisition instruments keeps improving, spectra are distributed continuously over hundreds of narrow bands (generally with a bandwidth of less than 10 nm). To date, HSIs obtained via remote sensing are widely applied to data analysis in many fields, such as mineral exploration [1], environmental and atmospheric monitoring [2, 3], and agricultural information services [4]. Compared with color and multispectral images, HSI records more information because of its high spectral resolution, which is very useful for target classification. However, it also brings technological obstacles such as high dimensionality and information redundancy owing to similar or overlapping bands. Existing studies have shown that high correlations frequently appear among adjacent bands, which can cause the “Hughes phenomenon” [5]. Therefore, we always preprocess the spectrum before classification, including noise removal and redundancy reduction, which can effectively cut operating costs and improve processing speed while maintaining the accuracy of image recognition.

There are two approaches to dimensionality reduction for HSI, i.e., band extraction and band selection (BS) [6, 7]. The former projects all bands into a low-dimensional subspace to form a simplified image; however, it may alter the inherent features of the information. Some recent techniques include singular spectrum analysis [8], sparse representation [9], and stacked auto-encoders [10]. In essence, BS is a combinatorial optimization problem; that is, we should find a band combination with rich information, low correlation, and good discrimination by means of an evaluation criterion function. Whatever method is adopted, it is difficult to increase speed and accuracy simultaneously, so a common practice is to improve the efficiency of dimensionality reduction by optimizing existing algorithms.

In this paper, our main contributions are summarized as follows. (1) A two-layer selection strategy is proposed for generating an optimal band combination. Specifically, we first build a candidate set U from the bands selected by FDPC, and then conduct spectral analyses to choose the high-quality elements from U, taking into consideration not only the information contained in a single band but also the interband correlation. (2) We put forward a new information evaluation method called comprehensive measurement of information (CMI); by introducing the standard deviation and the *h*-neighbors average similarity, it considers both the information of an individual band and the correlations among bands. (3) Inspired by the idea of the greedy algorithm, we adopt the mRMR method to enrich the target set iteratively, which minimizes redundancy and maximizes representativeness.

The remaining sections are organized as follows. Section 2 introduces related research on BS technologies in recent years. Section 3 presents the principles of FDPC and information analysis, as well as the implementation flow of mRMR, in detail. Section 4 states concretely how to build an optimal subset to replace the original spectrum via the TLS algorithm. Section 5 arranges a series of experiments and comparative analyses of the results to demonstrate the efficiency of the proposed algorithm. The last section gives some conclusions.

#### 2. Related Work

As mentioned previously, BS is an effective way to reduce the spectral dimensions of HSI because this preprocessing removes the redundant parts of the original spectrum, which reduces the storage and computing consumption of subsequent image processing. If we knew how various objects reflect electromagnetic waves, an object-spectrum dictionary could guide us to select bands accurately; however, obtaining such knowledge is time-consuming, costly, and in many cases impossible. Unsupervised methods, which rely only on the band distribution and interband relationships, adapt well to various application scenarios. At present, unsupervised BS methods are mainly categorized into ranking-based, searching-based, sparsity-based, and clustering-based methods, among others.

##### 2.1. Unsupervised Band Selection Methods

For a long time, research related to BS has mainly focused on two themes. One is the selection algorithm, which is commonly designed using a supervised, semisupervised, or unsupervised approach, and the other is the output evaluation criterion, which is adopted to measure the performance of an algorithm. For the purposes of this discussion, we briefly introduce some unsupervised methods, as well as the corresponding algorithms used in Section 5.

The ranking-based method evaluates the importance of each band by a criterion and uses the top-ranked bands instead of all bands to represent HSI. Clearly, this method can find the most discriminative bands, but high correlation among them is inevitable because the differences between bands are neglected. Constrained band selection (CBS) [11] and maximum variance principal component analysis (MVPCA) [12] are both typical ranking-based algorithms. Compared with CBS, MVPCA is more sensitive to noise, so it should be used selectively according to the characteristics of the data sets.

The searching-based method converts BS into an optimization problem under a given criterion and iteratively searches for the best bands to constitute a target set. For example, in [13], linear prediction (LP) is adopted to evaluate the similarity between a single band and the other ones, and on this basis, the best band in the current round is picked out. The sparsity-based method uses sparse representation or regression to reveal the information structure of a data set, and the representative bands are selected by solving an optimization problem with sparsity constraints. The improved sparse subspace clustering (ISSC) algorithm [14], which will be employed for comparative analysis in Section 5, is a sparse representation-based method.

Nodes belonging to the same cluster have similar features, so clustering-based methods select several exemplars to replace the entire cluster. Hierarchical clustering first initializes either the whole set or each single node as a cluster and then groups the nodes by splitting or aggregation. Classical algorithms include WaLuDi [15], BIRCH [16], and CURE [17]. For partition-based clustering algorithms, both the number and the centers of the clusters must be initialized in advance. We adjust the composition of each cluster by constantly updating the ownership of nodes until the stop condition is met. Some typical algorithms, such as *K*-means [18] and FCM [19], are widely applied in various classification applications. From the perspective of geometric distribution, high-density areas are separated by low-density ones, and each cluster corresponds to a connected data subset with maximum local density. Therefore, density-based algorithms handle classification problems with irregular spatial distributions well, and some algorithms, such as AP [20], DBSCAN [21], and FDPC [22], have shown good performance in clustering nonspherical distributions.

In the process of seeking a target set to reconstitute HSI, both the initialization parameters and noise greatly affect the outcome. In many cases, the capability of an algorithm depends heavily on its initial parameters, and improper parameters may cause deviations between the clustering results and the actual situation, or even significant errors. Furthermore, noise is inevitably generated during image acquisition owing to the environment or the imaging equipment, which can hinder subsequent processing. Usually, noise nodes are outliers with low density, so density-based clustering algorithms have a prominent advantage in noise recognition.

##### 2.2. Fast Density Peaks Clustering

The FDPC algorithm, proposed by Rodriguez and Laio in 2014, can obtain the globally optimal solution with few parameters and a simple, non-iterative process. An obvious advantage over other clustering-based algorithms is that it can find clusters of arbitrary shape rather than just spherical regions. In the field of HSI processing, besides band selection [23], FDPC is also applied to superpixel segmentation [24]. Nevertheless, its performance still needs to be improved, including reducing computational complexity, enhancing the adaptive ability of its parameters, and improving accuracy and robustness.

The time complexity of FDPC is $O(n^{2})$ without considering dimensions, where *n* is the number of nodes. Accordingly, the algorithm is unsuitable for large-scale data clustering because of its high complexity. As improvements, researchers introduce parallel algorithms (e.g., EDDPC [25] and LSH-DDP [26]) or use grid treatment in advance (e.g., DGB [27], DPCG [28], and PDPC [29]) to accelerate it. For example, FastDPC-KNN [30] provides a solution in which KNN is adopted to cooperate with FDPC, and the time complexity is reduced to $O(n \log_{2} n)$. It uses a cover tree to speed up the calculation by distinguishing the type of density peak so as to avoid calculating distances over the global range.

Parameter self-adaption (PS) is another important research issue for FDPC. Specifically, the cutoff distance delimits the neighborhood size of each node, which directly determines the local density statistics and also has a great influence on the composition of clusters. PS reduces the probability of errors caused by empirical settings and adapts better to various data scenarios. Researchers have employed methods such as density estimation [31] and ADPC-KNN [32] to realize PS.

#### 3. Method and Strategy

##### 3.1. FDPC for Candidate Set

FDPC is based on the following two assumptions. Firstly, in each cluster, the density of the center is higher than that of the surrounding nodes; secondly, the distance between the center and any node of higher density is relatively large. Moreover, there are two extremely important values in the algorithm, i.e., the local density $\rho_i$ and the relative distance $\delta_i$, and both of them depend on the similarity matrix *S*. A hyperspectral image with *L* bands and *N* pixels can be described in both spectral and geometric spaces: each band vector $b_i$ is the response of all pixels to band *i*, and each pixel vector is the reflection of one pixel on the different bands. Generally, we should first build an initial similarity matrix, and the similarity between bands $b_i$ and $b_j$ is expressed as the following equation:

In practice, the Gaussian kernel function is commonly applied to the Euclidean distance. In equation (2), $d_{ij}$ is defined as the interband distance based on the similarity matrix, from which we obtain the correlation between pairwise bands. Obviously, the closer two bands are, the higher their redundancy is.

Closely related to the cutoff distance $d_c$, the local density $\rho_i$ is defined as follows:

$$\rho_i = \sum_{j \neq i} \chi(d_{ij} - d_c), \qquad \chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases}$$

A convenient and intuitive way to get $\rho_i$ is to let the indicator function $\chi$ count the nodes whose Euclidean distance from $b_i$ is less than $d_c$. However, this approach does not distinguish the contribution of distance to density, and $\rho_i$ increases by one as long as $d_{ij} < d_c$. Hence, we also adopt the Gaussian kernel function to overcome this limitation. $d_c$ is the only parameter provided for human-machine interaction, and experience shows that the algorithm performs well when $d_c$ is taken at the 1%-2% position of all interband distances sorted in ascending order, so that each node has on average about 1%-2% of all nodes as neighbors. An inappropriate $d_c$ may cause high overlap between clusters or produce a large number of meaningless clusters. Since FDPC is very sensitive to $d_c$, we should set it precisely through some reasonable method, e.g., PSO [33] or ADPclust [34], rather than relying on empirical values. In Section 4.2, we determine $d_c$ according to the number of required bands.
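The two ways of computing the local density can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: the function names `band_distances` and `local_density`, the (pixels × bands) array layout, and the Gaussian form $\exp(-(d_{ij}/d_c)^2)$ commonly used with FDPC are all assumptions made here.

```python
import numpy as np

def band_distances(X):
    """Pairwise Euclidean distances between bands.

    X has shape (N, L): N pixels (rows) by L bands (columns).
    """
    B = np.asarray(X, dtype=float).T           # (L, N): one row per band
    sq = np.sum(B * B, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (B @ B.T)
    np.maximum(d2, 0.0, out=d2)                # clip tiny negatives from round-off
    d = np.sqrt(d2)
    np.fill_diagonal(d, 0.0)
    return d

def local_density(d, d_c, kernel="gaussian"):
    """Local density of every band for a given cutoff distance d_c."""
    if kernel == "cutoff":
        # Indicator form: count the bands closer than d_c (minus the band itself).
        return (d < d_c).sum(axis=1) - 1.0
    # Gaussian form (assumed): nearer bands contribute more; drop the self term exp(0) = 1.
    return np.exp(-(d / d_c) ** 2).sum(axis=1) - 1.0
```

With the indicator form, every neighbor inside the cutoff adds exactly one; with the Gaussian form, a neighbor at distance $d_c$ adds only $e^{-1}$, which is the distinction the paragraph above draws.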

Next, we give the definition of $\delta_i$ as the following equation:

$$\delta_i = \begin{cases} \max_{j} d_{ij}, & \text{if } \rho_i \text{ is the maximum local density} \\ \min_{j:\, \rho_j > \rho_i} d_{ij}, & \text{otherwise} \end{cases}$$

That is, $\delta_i$ is the distance between $b_i$ and the node farthest from it only when $b_i$ has the maximum local density. More generally, when there are nodes with higher density around $b_i$, the distance between $b_i$ and the nearest of them is taken as $\delta_i$. After $\rho_i$ and $\delta_i$ of all nodes are obtained, we establish the decision graph to describe them. In Figure 1(b), the nodes that can act as cluster centers are usually outliers in the decision graph; for example, node 1 has the largest projection values on both axes, which means that it has both the highest local density and sufficient intercluster spacing.
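The relative distance described above can be sketched as follows, assuming a precomputed distance matrix `d` and density vector `rho`; the function name `relative_distance` is introduced here only for illustration.

```python
import numpy as np

def relative_distance(d, rho):
    """Distance from each node to the nearest node of higher density.

    d   : (n, n) symmetric distance matrix
    rho : (n,) local densities
    Nodes that share the maximum density are all treated as peaks.
    """
    d = np.asarray(d, dtype=float)
    rho = np.asarray(rho, dtype=float)
    n = rho.size
    delta = np.empty(n)
    for i in range(n):
        higher = np.flatnonzero(rho > rho[i])
        if higher.size == 0:
            delta[i] = d[i].max()           # density peak: distance to the farthest node
        else:
            delta[i] = d[i, higher].min()   # nearest node with higher density
    return delta
```

Plotting `rho` against `delta` for all nodes gives the decision graph: cluster centers sit in the upper right, and nodes with low `rho` but high `delta` are candidates for noise.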

**Figure 1:** panels (a) and (b); panel (b) is the decision graph of local density and relative distance.

However, most of the remaining nodes are concentrated near the bottom of the graph with small $\delta$ (region B), which indicates that they are grouped around the high-density nodes and have less power to act as independent centers. In addition, FDPC has strong noise detection capability, which helps us eliminate interference bands before BS. In Figure 1(b), the nodes near the vertical axis, e.g., nodes 27 and 28, are probably noise.

For each band $b_i$, we first integrate density and distance by their product, $\gamma_i = \rho_i \delta_i$, and sort the $\gamma_i$ in descending order to obtain a priority sequence. On this basis, we select the *m* top-ranked bands to form a candidate set U, where *m* is the number of required bands. The outputs of FDPC are the exemplars of each cluster, so the vital information is maintained. However, some bands that can provide more information for the classifier, such as boundary ones, are probably not picked out, because FDPC is more likely to select in high-density regions than in low-density ones. Hence, based on the candidate set, we must conduct further analysis from the perspective of spectral information.
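The prioritization step can be sketched as below. The product form $\gamma_i = \rho_i \delta_i$ is the one commonly used with FDPC and is assumed here; the function name `candidate_set` is hypothetical.

```python
import numpy as np

def candidate_set(rho, delta, m):
    """Rank bands by gamma = rho * delta and keep the m top-ranked ones.

    Returns the indices of the selected bands (highest priority first)
    together with the gamma value of every band.
    """
    gamma = np.asarray(rho, dtype=float) * np.asarray(delta, dtype=float)
    order = np.argsort(-gamma, kind="stable")   # descending; ties keep band order
    return order[:m], gamma
```

For example, with `rho = [2, 3, 1]` and `delta = [1, 4, 4]`, the priorities are `[2, 12, 4]`, so `m = 2` keeps bands 1 and 2 in that order.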

##### 3.2. Layer 2 Selection for Target Set

In this paper, we take the amount of information (AoI) and the band correlation as the foundation and integrate them to evaluate information comprehensively. As stated in the last section, the bands selected by FDPC are already representative; however, the selection is one-sided because it considers only spatial position, which implies that bands with little, similar, or overlapping information may still exist in the candidate set.

###### 3.2.1. Comprehensive Measurement of Information

Shannon entropy is a common index for measuring the uncertainty of an event, and researchers usually employ it to quantify the AoI contained in a band. It is generally believed that an event with large entropy corresponds to strong uncertainty, which means that more information can be provided for judgement. Assuming that band $b_i$ takes different values with various probabilities, its Shannon entropy is defined as equation (6), $H(b_i) = -\sum_{x} p(x) \log p(x)$, where $x$ is a possible value of $b_i$ and $p(x)$ is its probability.
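As a minimal sketch, the entropy of a band can be estimated from a histogram of its pixel values. The histogram estimate of the probabilities, the default of 256 bins, the base-2 logarithm, and the function name `band_entropy` are assumptions for illustration; the source does not specify them.

```python
import numpy as np

def band_entropy(band, bins=256):
    """Histogram estimate of the Shannon entropy (in bits) of one band."""
    counts, _ = np.histogram(np.ravel(band), bins=bins)
    p = counts[counts > 0] / counts.sum()     # empirical probabilities of occupied bins
    return float(-(p * np.log2(p)).sum())
```

A band whose pixels all share one value has zero entropy, while a band split evenly between two values carries one bit.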

The standard deviation $\sigma_i$ is another way to measure AoI, and it reflects the uncertainty through the difference between a set of data and its mean value, as defined in the following equation, where $\mu_i$ is the mean value of $b_i$. Apparently, a greater $\sigma_i$ corresponds to a larger AoI.

It is improper to consider the AoI of a single band alone while ignoring the relationships among bands, because high information correlation between adjacent bands is also very common, just like spatial redundancy.

Hence, we put forward CMI to re-evaluate the information of a band by taking both AoI and information redundancy into account.

The correlation between $b_i$ and its *h*-neighbors is measured by the average similarity $\bar{r}_i$, where $r$ is the correlation coefficient between bands and *h* is an even number. For example, when *h* = 2, we judge the information independence of $b_i$ via the average similarity on the pairs $(b_{i-1}, b_i)$ and $(b_i, b_{i+1})$, which is called the nearest neighbor metric. In general, we compute $\bar{r}_i$ over a wider neighborhood by appropriately increasing *h*, because we cannot guarantee that the correlation with the nearest neighbor is always greater than that with a more distant band. However, the probability of information redundancy between bands with a large label difference is very small, so an excessive *h* may cause meaningless computation.

According to equation (8), if $b_i$ is what we are looking for, it should have either a large $\sigma_i$ or a small $\bar{r}_i$, or both. Hence, CMI can prevent bands with high information redundancy from being selected.
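One way to combine the two quantities can be sketched as follows. The exact form of equation (8) is not reproduced here, so the ratio $\sigma_i / \bar{r}_i$ used below is only an assumption that matches the stated behavior (large for a large standard deviation or a small average similarity). The use of absolute correlation, the clipping of the neighborhood at the spectrum edges, and the name `cmi_scores` are likewise illustrative choices.

```python
import numpy as np

def cmi_scores(X, h=2, eps=1e-12):
    """Hypothetical CMI-style score per band: sigma_i / average h-neighbor similarity.

    X : (N, L) array, N pixels by L bands (L >= 2, no constant bands).
    h : even neighborhood size; h // 2 bands are taken on each side.
    """
    X = np.asarray(X, dtype=float)
    L = X.shape[1]
    sigma = X.std(axis=0)                          # amount of information per band
    R = np.abs(np.corrcoef(X, rowvar=False))       # (L, L) absolute correlations
    half = h // 2
    avg_sim = np.empty(L)
    for i in range(L):
        # Neighbors inside the spectrum; edge bands simply have fewer of them.
        nbrs = [j for j in range(i - half, i + half + 1) if j != i and 0 <= j < L]
        avg_sim[i] = R[i, nbrs].mean()
    return sigma / (avg_sim + eps), sigma, avg_sim
```

Under this assumed form, a band that is highly informative but almost identical to its neighbors is penalized, whereas an equally informative band with weakly correlated neighbors scores higher.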

###### 3.2.2. Further Selection Employed mRMR

The abbreviation mRMR denotes maximum representativeness and minimum redundancy. During the implementation of FDPC, $\rho$ has a larger weight because of its measurement scale, so the targets are most probably generated in high-density regions. In Figure 1(b), if we want to select more bands from region B, the results must be the neighbors of node 1 rather than any other node. Clearly, FDPC guarantees the representativeness of the candidate set but tends to cause redundancy, so based on its outputs, we employ the mRMR strategy to conduct a further filtering.

Let U be the candidate set, which is partitioned into the target set and the residual set. Supposing that *k* (*k* ≥ 1) bands already exist in the target set, if the (*k* + 1)th band is required from the residual set to enrich the target set, the best one should satisfy the following conditions. (1) Lowest correlation with the target set, that is, its average distance to every element in the target set is the largest; (2) Highest similarity with the residual set, which indicates that it has the most power to represent the other bands in the residual set. According to formula (9), we select the most appropriate band.

Motivated by the above descriptions, we take AoI and information correlation into account simultaneously, as in formula (10), and obtain a target set with strong representativeness, good discrimination, and low redundancy, thereby reducing the dimensionality of HSI.
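The two conditions of a single mRMR round can be sketched as follows. Formulas (9) and (10) are not reproduced here; the score below (average distance to the target set minus average distance to the rest of the residual set) is only one hypothetical way of trading the two conditions off, and `mrmr_pick` is an illustrative name.

```python
import numpy as np

def mrmr_pick(d, selected, residual):
    """Pick the residual band that is far from the target set and close to the other residual bands.

    d        : (n, n) interband distance matrix
    selected : indices already in the target set (non-empty)
    residual : indices still available in the candidate set
    """
    best, best_score = None, -np.inf
    for b in residual:
        others = [r for r in residual if r != b]
        far_from_target = d[b, selected].mean()                    # large = low redundancy
        close_to_rest = d[b, others].mean() if others else 0.0     # small = representative
        score = far_from_target - close_to_rest
        if score > best_score:
            best, best_score = b, score
    return best
```

Because each round commits to the locally best band, the procedure is greedy in the sense described in the contributions.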

#### 4. TLS Algorithm

##### 4.1. Implementation Flow

TLS integrates spatial position, the information contained in a single band, and the correlation between bands to evaluate the importance of a band, so the comprehensiveness of its outputs makes it suitable for spectral dimensionality reduction.

In this section, we explain how TLS works. In the preliminary BS (layer 1 selection), FDPC first prioritizes the bands and selects the top-ranked ones to establish *U*. In layer 2, we initialize the target set and the residual set and score each band in *U* using the CMI index. In each round, the most valuable band satisfying formula (10) is picked out to join the target set, and the bands carrying information similar to it are removed from the residual set. The iteration continues until the stop condition is met, at which point an informative band combination with low information redundancy has been built.

Figure 2 shows the technical roadmap of TLS.

TLS filters out bands with redundant information via a threshold. According to equation (8), a band becomes the winner of a selection round only when it has both rich AoI and strong information independence. For each band remaining in the residual set, if its correlation with the winner exceeds the threshold, we remove it from the residual set owing to its high correlation. We state the implementation flow of TLS in Algorithm 1 and put some related explanations and analyses in Sections 4.2 and 4.3.
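The layer 2 loop with threshold filtering can be sketched as below. This is not Algorithm 1 itself: the per-band `score` stands in for the criterion of formula (10), and the stop rule (an empty residual set or an optional size limit `k`), the threshold name `theta`, and the function name `layer2_select` are assumptions for illustration.

```python
import numpy as np

def layer2_select(score, R, candidates, theta, k=None):
    """Greedy selection with correlation-based pruning.

    score      : per-band score (higher is better), indexable by band index
    R          : (n, n) interband correlation matrix
    candidates : band indices produced by layer 1
    theta      : correlation threshold above which a band is considered redundant
    k          : optional maximum size of the target set
    """
    residual = list(candidates)
    target = []
    while residual and (k is None or len(target) < k):
        winner = max(residual, key=lambda b: score[b])   # best band of this round
        target.append(winner)
        # Drop the winner and every band too strongly correlated with it.
        residual = [b for b in residual
                    if b != winner and abs(R[winner, b]) <= theta]
    return target
```

Once a band is pruned or selected it never returns to the residual set, so earlier choices are not revisited.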

##### 4.2. Normalization and Parameter Initialization

As mentioned previously, the different measurement scales give $\rho$ a heavy impact on prioritization, so bands with high density are more attractive to FDPC. As a direct improvement, both $\rho$ and $\delta$ are normalized to the interval (0, 1).
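A minimal sketch of the normalization, assuming min-max scaling (the source does not state which scheme is used; `min_max` is an illustrative name):

```python
import numpy as np

def min_max(v):
    """Scale a vector linearly so that its minimum maps to 0 and its maximum to 1."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)      # all values equal: nothing to rank
    return (v - v.min()) / span
```

Note that min-max scaling produces the closed interval [0, 1]; keeping values strictly inside (0, 1) would need a small offset at both ends, which matters if the normalized quantities are later multiplied together.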

In addition, we also reduce the influence of $\rho$ by adjusting $d_c$ dynamically so that the probability of selecting in low-density regions increases gradually. An improper $d_c$ may lead to algorithm failure, or even to a domino effect that the algorithm cannot correct by itself. For the sake of simplicity, the empirical method sets $d_c$ to a fixed size; however, it is inefficient when dealing with high-dimensional data or fake peaks.

To make the density value relatively accurate, we should avoid, as far as possible, a band appearing repeatedly in different neighborhoods. Hence, we hold that $d_c$ should not be fixed but should change dynamically with *m*, as shown in the following equation, which involves the initial value of the cutoff distance. As *m* increases, $d_c$ keeps getting smaller, and the situation in which a band belongs to several density neighborhoods gradually disappears. Usually, the initial value is multiplied by a coefficient to determine $d_c$. The extreme case is that each node corresponds to its own cluster, i.e., the number of clusters equals the number of bands.

##### 4.3. Performance Analysis to TLS

The time complexity of FDPC is $O(n^{2})$, which is mainly the cost of building the similarity matrix. On this basis, TLS adds the cost of acquiring the CMI of each band (linear complexity) and of iteratively generating the target set. Therefore, in the field of dimensionality reduction for HSI, the time complexity of TLS is at least $O(n^{2})$, which affects real-time performance when dealing with high-resolution images.

Besides its high time complexity, TLS also needs *m* to be initialized in advance because it cannot automatically configure the number of clusters according to the data distribution. In layer 1, a missing peak or a fake peak invalidates the proposed algorithm, because the hypothesis that makes FDPC work no longer holds. In addition, the outputs of mRMR cannot be backtracked, which implies that a band cannot be removed once it has been selected into the target set.

In conclusion, the distinct advantage of TLS is that it can find a more effective band combination than other algorithms using the same *m*. Clearly, TLS not only inherits the characteristics of FDPC, such as good exemplar selection, noise insensitivity, and no need to initialize cluster centers, but also makes information an important reference by using CMI.

#### 5. Experiment and Discussion

In this section, a series of comparative experiments is designed and carried out on three HSI data sets, followed by capability comparisons between TLS and other state-of-the-art algorithms using OA, AA, and Kappa coefficient. Analyses and discussions are carried out in three aspects: (1) the difference in band distribution produced by the various algorithms; (2) the influence of the number of selected bands on HSI performance; and (3) the influence of other factors, such as classification model and data set, on the performance of the algorithms. Before the experiments, we first introduce the relevant background, including the data sets, the competing algorithms, the classifiers for validation, and the indicators for capability comparison.

##### 5.1. Preparation for Experiments

###### 5.1.1. Data Sets

Three real-world HSI data sets which are derived from remote sensing images, i.e., Indian Pines, Pavia University (PaviaU), and Salinas are used for experiments. The essential information about them is briefly described in Table 1.

As widely used data sets, they share some common characteristics. First of all, land-cover pixels that belong to the same class have similar features, whereas the spectra corresponding to distinct classes are obviously different, which makes them very suitable for BS by clustering methods. Secondly, the distribution of pixels among classes is inhomogeneous, and most pixels of an HSI are even concentrated in a few classes, as shown in Figure 3. Finally, some contaminated bands have been removed to ensure the validity of the data; for example, the 20 bands in Indian Pines disturbed by external circumstances, numbered 104–108, 150–163, and 220, are removed beforehand.

**Figure 3:** Distribution of pixels among classes, panels (a)–(c); panel (a) is Indian Pines.

###### 5.1.2. Basic Setup for Experiment

In order to verify the effectiveness of TLS, in this paper, MVPCA [12], WaLuDi [15], DBSCAN [21], FDPC [22], LP [13], and ISSC [14] algorithms are applied to reconstitute HSIs as competitors, respectively.

We train KNN (*K* = 5) and SVM (RBF kernel function) models with labeled samples in advance, so that the classifiers gain stronger generalization ability after sufficient training. Because an individual result is uncertain, we take the average of 10 rounds of cross-validation as the final result, which makes the outputs of the algorithms more reliable and convincing. In Indian Pines/PaviaU/Salinas, 30%/10%/10% of the pixels in every class are used for training the classifiers and 10%/5%/5% are used for testing in each round.
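The per-class sampling described above can be sketched as follows. The rounding rule, the guarantee of at least one training sample per class, the random seed handling, and the function name `stratified_split` are assumptions for illustration, not details taken from the experiments.

```python
import numpy as np

def stratified_split(labels, train_frac, test_frac, seed=0):
    """Draw disjoint training and test indices with fixed fractions from every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))   # shuffle this class
        n_train = max(1, int(round(train_frac * idx.size)))
        n_test = max(1, int(round(test_frac * idx.size)))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:n_train + n_test])           # never overlaps the training part
    return np.array(train), np.array(test)
```

Calling the function with ten different seeds and averaging the resulting scores would correspond to the ten rounds mentioned above, e.g. `stratified_split(labels, 0.30, 0.10, seed=r)` for Indian Pines in round `r`.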

For a specific data set, we set the ranges of parameters and thresholds for testing, and the values corresponding to the best results are adopted. The detailed settings are as follows: , , , , and .

###### 5.1.3. Performance Indicators

OA, AA, and the Kappa coefficient are commonly used indicators for evaluating classification performance based on the confusion matrix. OA is the ratio of the number of correctly classified pixels to the total number of pixels; however, it cannot reflect the real situation when class sizes differ greatly. As a more reasonable metric in that case, AA is the mean of the recognition accuracies on the individual classes. The Kappa coefficient is usually employed as a consistency check, and in general, a larger Kappa coefficient means that the predictions are more consistent with the ground truth. Specifically, agreement is substantial when 0.6 < Kappa < 0.8, while Kappa ≥ 0.8 corresponds to almost perfect agreement.
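The three indicators can be computed from a confusion matrix as in the following sketch, which uses the standard definitions (Cohen's Kappa for the consistency check); the function name `classification_scores` and the row/column convention are choices made here.

```python
import numpy as np

def classification_scores(cm):
    """OA, AA, and Cohen's Kappa from a confusion matrix.

    cm[i, j] = number of pixels of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total                        # overall accuracy
    per_class = np.diag(cm) / cm.sum(axis=1)         # accuracy on each true class
    aa = per_class.mean()                            # average accuracy
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2   # chance agreement
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa
```

For the imbalanced matrix `[[90, 10], [5, 5]]`, OA is 95/110 ≈ 0.864 while AA is only 0.70 and Kappa is 16/49 ≈ 0.327, which illustrates how a large class can mask poor recognition of a small one.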

##### 5.2. Results Analysis and Discussion

###### 5.2.1. Distribution of Algorithm Outputs

As mentioned in Section 5.1.2, seven algorithms are adopted to select bands, representing the original image with reduced spectral dimensions. For example, the results of 10 bands selected in Indian Pines are shown in Figure 4, from which we can observe the band distribution and redundancy intuitively.

Obviously, the redundancy produced by MVPCA is the highest among all the employed algorithms, and most of its selected bands are concentrated in the interval [120, 140]. As stated in Section 2, a ranking-based algorithm can find critical bands efficiently by prioritizing, but it probably results in high redundancy and low discrimination because the correlation between bands is neglected. There are no significant differences in the performance of the remaining algorithms, although the ideas they adopt are not exactly the same. The selection results are not uniformly distributed over the entire band interval, and concentration may appear in some local intervals. In Figure 4, different algorithms select within the same intervals, which means that an interval with remarkable characteristics is similarly attractive to various algorithms, but the specific output within an interval may differ. Nevertheless, the global dispersion of the bands selected by TLS is still better than that of some competitors because layer 2 plays an effective role. From the illustration, the concentrated bands appear in four/three intervals when we employ FDPC/DBSCAN, whereas they appear in just two if WaLuDi/LP/ISSC/TLS is used.

###### 5.2.2. Accuracy and Consistency Check

*Comparison of Accuracy Index*. From Figures 5–7, we first summarize some common characteristics. No matter what algorithm or data set is employed, OA improves as *m* increases. Nevertheless, the contribution of each additional band to the classifier decreases gradually, which implies that selecting excessive bands brings little benefit to the classifier. As shown in Figure 5(b), the OA of each algorithm except MVPCA improves by about 20% as *m* increases from 6 to 30; however, if we raise the number to 36 or 42, the OAs remain at the current level and the classification capability does not improve noticeably.

**Figures 5–7:** OA of the compared algorithms versus the number of selected bands *m*; each figure has panels (a) and (b).

Moreover, the effect of the reduced sets formed by different algorithms on image recognition is unstable and depends on both the classifier model and the data set. Intuitively, the OA of the SVM model is significantly higher than that of KNN in Indian Pines, while the performances of the two classifiers are not widely different in PaviaU and Salinas. From the view of the model, SVM seeks a hyperplane that maximizes the margin between two classes by learning from the support vectors, and it has good generalization power as well as strong noise resistance. In contrast, KNN uses nearest-neighbor voting to determine the class of a sample, and its accuracy is slightly lower than that of SVM. On the other hand, the capability of the same algorithm may vary across data sets, and for each algorithm, the OAs in PaviaU and Salinas are superior to those in Indian Pines. Evidently, there are several small-scale classes in Indian Pines, including three with fewer than 50 samples (Figure 3(a)), and the samples in these classes have a high probability of misclassification, which lowers OA.

According to the number of available bands *L*, we set the required number of bands (40 bands from Indian Pines and Salinas and 20 bands from PaviaU), and the per-class accuracy, AA, and OA of each algorithm are shown in Tables 2–4. In each table, we notice the following facts. Firstly, whatever algorithm is employed, the recognition accuracies on most small-scale classes are relatively low (such as grass-pasture-mowed, oats, and buildings-grass-trees-drives in Table 2, gravel in Table 3, and lettuce-romaine-6wk in Table 4), but there are exceptions (such as wheat in Table 2 and shadows in Table 3). This may be caused by overfitting, in which the classifier takes OA as the criterion and fits the training samples as closely as possible. It generates some false negatives on the small-scale classes during model training; in other words, some samples that originally belong to a small-scale class are wrongly assigned to a large-scale one. Therefore, when the classification model is applied to the test set, its generalization ability decreases. Although AA can make up for this defect, the main remedy is to reserve an appropriate number of high-quality bands to improve the discrimination ability of the classifier. Secondly, OA is larger than AA, and the difference is more prominent in Indian Pines. Because OA weights each class by its share of the pixels, poorly recognized small classes affect it less than they affect AA, which averages the accuracies of all classes equally. Finally, some algorithms perform well only on specific classes (such as MVPCA on Alfalfa in Table 2 and DBSCAN on self-blocking bricks in Table 3), which means that the effect of an algorithm relates not only to class scale but also to how well it matches the data distribution. Similarly, TLS is not superior to its competitors on some classes, such as Alfalfa and meadows, although it is the best on the entire data set.

In terms of AA and OA, TLS shows its superiority, and the performance of ISSC is the closest to it. WaLuDi, DBSCAN, and FDPC have similar capabilities in terms of BS, whereas MVPCA and LP are relatively poor. The stability of an algorithm can be measured by the variance of its per-class accuracies, and a small variance corresponds to low volatility. In Tables 3 and 4, the per-class recognition accuracy of TLS deviates least from its mean value, whereas in Table 2 its stability is second only to that of ISSC.
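As a minimal sketch of this stability measure, the snippet below computes the mean and variance of per-class accuracies for two hypothetical methods. The numbers are invented for illustration, and the use of the population variance (`ddof=0`) is an assumption, since the variance estimator is not specified here.

```python
import numpy as np

def accuracy_spread(per_class_acc):
    """Return (mean, population variance) of a vector of per-class accuracies."""
    acc = np.asarray(per_class_acc, dtype=float)
    return acc.mean(), acc.var()  # assumption: ddof=0 (population variance)

# Hypothetical per-class accuracies of two methods with the same AA.
stable_method = [0.90, 0.92, 0.88, 0.90]
volatile_method = [0.99, 0.95, 0.70, 0.96]

mean_a, var_a = accuracy_spread(stable_method)
mean_b, var_b = accuracy_spread(volatile_method)
print(f"stable:   AA = {mean_a:.3f}, variance = {var_a:.5f}")
print(f"volatile: AA = {mean_b:.3f}, variance = {var_b:.5f}")
```

Both methods have the same AA, yet the second one fluctuates far more across classes, which is what the variance is meant to expose.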

In particular, if only a quick, rough recognition of HSI is needed, TLS can pick out a few high-quality bands to accelerate the training of the classifier. For example, in Figure 6(a), the accuracy of TLS exceeds 70% when the SVM model is trained with only 6 bands, which is about 5% higher than that of WaLuDi, LP, and FDPC. However, the advantage of TLS gradually weakens as more bands are appended, and the OAs of the various algorithms are quite close once *m* reaches a certain value.

*Comparison of Consistency Index*. In Table 5, the Kappa coefficients of the various algorithms with different *m* all lie within the interval (0.7, 0.95), indicating that the classification results are highly consistent with the actual values even though only a subset of the bands is used to represent the entire set. Similarly, the Kappa coefficient also increases with the number of selected bands, although the rate of increase gradually slows down. Moreover, from the view of the classification model and the data set, it is confirmed that SVM is more suitable for these data sets, and higher Kappa coefficients are obtained on PaviaU and Salinas owing to their relatively balanced pixel distributions compared with Indian Pines. From the view of the algorithm, TLS has stronger discrimination and can help the classifier make more accurate judgements.
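For reference, the Kappa coefficient can be computed from a confusion matrix as sketched below, using the standard definition of Cohen's kappa, (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (equal to OA) and p_e is the agreement expected by chance. The function name and the confusion matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def cohen_kappa(cm):
    """Cohen's kappa from a confusion matrix (rows: true, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                                   # observed agreement
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n ** 2   # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical 3-class confusion matrix with one small-scale class.
cm = np.array([
    [950,  30, 20],
    [ 40, 450, 10],
    [ 15,   5, 20],
])

kappa = cohen_kappa(cm)
print(f"Kappa = {kappa:.4f}")
```

Because p_e is subtracted, Kappa is lower than OA for the same matrix; it equals 1 for perfect agreement and 0 when the agreement is no better than chance.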

###### 5.2.3. Execution Time

In Section 4.3, we analyzed the time complexity of TLS and pointed out that the algorithm has no advantage in execution speed. The running time of an algorithm depends mainly on the hardware configuration; however, the data set and the number of selected bands also affect it. In this paper, the experiments were run on a Windows 10 computer with an Intel i5 quad-core processor and 8 GB of random-access memory. The execution times of the seven algorithms under different conditions are shown in Table 6.

According to the setting in Section 5.1.2, the number of samples used for the experiments varies across data sets, which is clearly reflected in the time consumed: the execution time of all algorithms is longest on PaviaU. Besides, the speed of an algorithm is greatly affected by its execution mode, and sorting-based or noniterative methods generally cost less time than iterative ones. Hence, MVPCA takes the least time, followed by ISSC, FDPC, and DBSCAN in that order. TLS is faster than LP, and WaLuDi takes the longest time.

#### 6. Conclusion

In this paper, we propose a two layers selection (TLS) algorithm to establish a dimensionality-reduced band set for HSI. On the premise of keeping the basic features of the spectrum, the bands with strong discrimination, low redundancy, and rich information are picked out to complete the image reconstitution, and TLS achieves this goal in two phases. First, we employ the FDPC algorithm to sort the products of the local density and relative distance of all nodes in the all-bands set in order to build a priority sequence, and the top-ranked bands are collected into the candidate set. Because local density has a great influence on the FDPC outputs, we apply normalization and a dynamic cutoff distance so that bands are also selected from scattered low-density regions as far as possible. Second, after CMI is computed, mRMR is adopted to move the bands in the candidate set that meet the given requirements into the target set iteratively. To verify the effectiveness of TLS, six state-of-the-art algorithms are used as competitors in experiments on three remote sensing image data sets. The comparative results in terms of OA, AA, and the Kappa coefficient show that the band combination created by TLS performs best among the compared algorithms. In particular, if a classification model needs to achieve higher accuracy with less training cost, TLS provides an effective way to cut down the dimensions of the samples. Besides HSI processing, it also fits applications where the samples have two or more types of features so that hierarchical selection can be implemented.
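The first layer summarized above can be sketched roughly as follows. This is only an illustrative approximation under stated assumptions, not the exact procedure of the paper: the Gaussian-kernel density, the fixed percentile-based cutoff `d_c` (used here in place of the dynamic cutoff distance), min-max normalization, and the names `fdpc_rank`, `minmax`, and `cutoff_percentile` are all choices made for this example. The second layer (mRMR with CMI) is not sketched, because it depends on the CMI definition given earlier in the paper.

```python
import numpy as np

def minmax(v):
    """Min-max normalize a vector to [0, 1] (all zeros if it is constant)."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def fdpc_rank(bands, cutoff_percentile=2.0):
    """Rank bands by normalized density times normalized relative distance.

    bands: array of shape (L, N), one flattened band image per row.
    Returns (order, score), where order lists band indices from best to worst.
    """
    X = np.asarray(bands, dtype=float)
    n_bands = X.shape[0]

    # Pairwise Euclidean distances between bands.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(dist, 0.0)

    # Assumption: fixed cutoff taken as a percentile of all pairwise distances.
    pair = dist[np.triu_indices(n_bands, k=1)]
    d_c = max(np.percentile(pair, cutoff_percentile), 1e-12)

    # Local density with a Gaussian kernel; subtract 1 to exclude the band itself.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # Relative distance: distance to the nearest band with higher density;
    # a band with the highest density takes its largest distance instead.
    delta = np.empty(n_bands)
    for i in range(n_bands):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    score = minmax(rho) * minmax(delta)
    order = np.argsort(-score, kind="stable")
    return order, score

# Usage on synthetic data: 30 hypothetical bands of 50 pixels each.
rng = np.random.default_rng(0)
bands = rng.normal(size=(30, 50))
order, score = fdpc_rank(bands, cutoff_percentile=20.0)
candidates = order[:5]  # top-ranked bands form the candidate set
print(candidates)
```

Normalizing both factors before multiplying keeps either one from dominating the ranking purely because of its scale, which is the motivation given above for weakening the influence of local density.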

Although much work has been done to improve the capability of BS methods, many technical obstacles remain to be overcome. Future theoretical research will mainly focus on how to reduce the complexity of the algorithms and improve their accuracy and robustness. Meanwhile, enhancing the adaptability to large-scale and high-dimensional data environments is also a direction of our future work.

#### Data Availability

The data used to support the findings of this study are included within the paper.

#### Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

#### Acknowledgments

This research was funded by the Key Project of Natural Science Research of Education Department in Anhui Province of China (grant nos. KJ2020A0757 and KJ2019A0864).