Abstract

In this paper, we propose a new method for hyperspectral image (HSI) classification that aims to exploit both manifold-learning-based feature extraction and neural networks by stacking layers applying locality sensitive discriminant analysis (LSDA) onto the broad learning system (BLS). BLS has proven to be a successful model for various machine learning tasks owing to the high feature representation capacity introduced by its numerous randomly mapped features. However, these random features also produce indiscriminate redundancy, which ultimately lowers performance and causes heavy computing demand, especially when the input data have high dimensionality. In our work, a manifold learning method is integrated into the BLS by inserting two LSDA layers, one after the input layer and one before the output layer, so that the spectral-spatial HSI features are fully utilized to achieve state-of-the-art classification accuracy. Extensive experiments demonstrate our method's superiority.

1. Introduction

Hyperspectral images (HSIs) are produced by hyperspectral sensors that capture reflectance values over tens or even hundreds of spectral bands for each pixel. The increased spectral resolution of HSIs makes them essential for many remote sensing tasks in various fields, such as agriculture [1], environment [2], and the military [3]. Obtaining semantic abstraction from HSIs requires classification, i.e., mapping from pixel values to land-use and/or land-cover descriptions, which is nontrivial because the high spectral redundancy detrimentally affects the classification process through the curse of dimensionality [4] and noisy labels [5]. Moreover, accompanied by increasing spatial resolution, the widespread adoption of integrated spatial and spectral information in HSI analysis has further increased the dimensionality of the input data [6]. It has been shown in many cases that the spectral information useful for HSI classification lies on a nonlinear submanifold embedded in the original feature space, which can be retrieved by manifold-learning-based feature extraction methods [7, 8]. Sun et al. modified isometric mapping (ISOMAP) by accelerating its process to reduce the dimensionality of HSIs [9]. Fauvel et al. investigated kernel principal component analysis (KPCA) combined with a linear classifier for HSI classification and showed its advantage over the original principal component analysis method [10]. In contrast to such global approaches, locally based methods like locally linear embedding (LLE) [11] and Laplacian eigenmaps (LE) [12] merely attempt to preserve the local geometrical structure of the data, thus bringing two prominent advantages: computational efficiency and representation capacity [13]. Some recent studies have attempted to formulate locally based manifold learning with supervised regularization, thereby creating discriminative and compact feature representations [14–16]. Locality sensitive discriminant analysis (LSDA) [17] was developed as an extension of linear discriminant analysis (LDA) by integrating the discriminative properties of LDA with a nearest neighborhood graph (NNG) that models the local geometrical structure of the underlying manifold. Unlike other NNG-based approaches (e.g., locality preserving projections (LPP) [18] and LE [12]), LSDA uses a within-class graph and a between-class graph to obtain good between-class separation while also preserving the within-class local structure. It can therefore be expected to be a useful feature reduction method for supervised classification tasks.

During the past decades, machine learning methods have been widely used to achieve higher semantic prediction accuracy on HSIs. For example, kernel machines such as the support vector machine (SVM) and kernel Fisher discriminant analysis (KFDA) have been used successfully for HSI classification [19]. Ensemble learning methods like random forest [20] and rotation forest [21] have also shown their benefits, especially when the available labeled training samples are limited [22]. Inspired by the biological nervous system, neural network models have achieved great success in general media information processing [23] and HSI analysis [24, 25]. Moreover, neural networks with random weights (NNRW), such as random vector functional link networks (RVFL) [26], Schmidt's method [27], and the extreme learning machine (ELM) [28], set arbitrary weights and biases for the hidden layer while the weights of the output layer are obtained analytically. As noniterative artificial neural network (ANN) based frameworks, NNRW algorithms enable high training efficiency while still retaining powerful representation learning capacity [29]. In the field of HSI classification, Xia et al. [30] reported that the general ELM was more accurate and much faster than SVM. Zhou et al. compared ELM with the composite kernel (ELM-CK) to SVM with CK (SVM-CK) and revealed that the ELM-based method still holds its advantages [31].

Recently, a new NNRW method called the broad learning system (BLS), which broadly extends the hidden layer of RVFL, has been introduced [32, 33]. The main distinctive feature of BLS is that the input data are randomly mapped to features in "feature nodes," which are subsequently transformed by a nonlinear activation function to form "enhancement nodes." Such a higher-order network structure provides an alternative way of learning deep features. In addition, the universal approximation capability of the broad learning system has been proven [33]. Jin et al. developed a robust broad learning system (RBLS) by modifying the regularization terms of the cost function in order to improve its generalization performance when modeling contaminated data [34]. By replacing the feature nodes with Takagi–Sugeno (TS) fuzzy subsystems, Feng and Chen crafted a neurofuzzy model called the fuzzy broad learning system for regression and classification tasks [35]. Kong et al. applied BLS to HSI classification for the first time; their semisupervised framework enabled the proposed method (i.e., semisupervised BLS (SBLS)) to leverage limited labeled samples and substantial unlabeled samples [36]. Although SBLS has shown its advantages over many approaches, including deep learning-based methods, the potential of BLS in HSI classification is still far from fully exploited.

In this paper, we propose a new framework for HSI classification called BLS-LSDA, which integrates hierarchical spectral-spatial information abstraction, a manifold learning method, and BLS. Our method first extracts spectral-spatial features by iteratively abstracting each pixel's neighborhood in a hierarchical manner. The features are then input into manifold learning nodes implementing LSDA. The reduced-dimensional features, which are discriminative and locality preserving, are sent to the feature nodes and afterward to the enhancement nodes of BLS. A second LSDA layer, identical to the first, is adopted to exploit the intrinsic structure of the high-order features produced by random mapping. Finally, the weights of the output nodes are acquired by a ridge regression learning algorithm. Our contributions are highlighted as follows:
(1) Our method integrates a manifold learning algorithm with BLS in a multilayer neural network model, thus providing enhanced feature representation capacity to BLS.
(2) A novel implementation of the spectral-spatial response (SSR) [37], consisting of a Gabor filter and an adaptive weighted filter (AWF), is developed to extract deep features of HSIs without a deep learning scheme.
(3) With comparative experiments conducted on three standard HSI datasets, we show the proposed approach's advantage in classification accuracy over state-of-the-art methods.

The rest of this paper is organized as follows. Section 2 gives a brief overview of BLS. In Section 3, we present our method along with the details of the learning algorithm. Section 4 compares the performance of our method on three benchmark datasets with several prominent approaches and analyzes the experimental results. Finally, discussions and conclusions are given in Section 5.

2. Broad Learning System

Unlike deep neural networks (DNNs), BLS has no need to gradually search for the model's optimized parameters with backpropagation (BP).

The learning procedure of BLS consists of only one step, i.e., performing matrix inversion to determine the weights of the links between the nodes of the hidden layer and the output layer. As a single hidden layer feedforward neural network (SLFN), the most prominent characteristic of BLS is the adoption of mapped feature nodes to construct the enhancement nodes, which bring in higher feature representation capability. Figure 1 shows the framework of the original BLS. Given the input data set X, the ith group of mapped feature nodes can be established by

$$Z_i = \phi_i\left(X W_{e_i} + \beta_{e_i}\right), \quad i = 1, \dots, n, \tag{1}$$

where $W_{e_i}$ denotes the randomly chosen weights and $\beta_{e_i}$ is the bias. It should be noted that different functions $\phi_i$ can be adopted for the n different groups of mapped nodes. Concatenating the mapped nodes, we get

$$Z^n \equiv \left[Z_1, Z_2, \dots, Z_n\right]. \tag{2}$$

Then $Z^n$ is fed into the enhancement nodes to produce a further abstraction of the input data:

$$H_j = \xi\left(Z^n W_{h_j} + \beta_{h_j}\right), \quad j = 1, \dots, m, \tag{3}$$

where $W_{h_j}$ and $\beta_{h_j}$ are randomly generated weights and biases, respectively, and $\xi$ is the activation function; usually, the sigmoid function is used. Eventually, the hidden layer of BLS is

$$A = \left[Z^n \mid H^m\right] \equiv \left[Z_1, \dots, Z_n \mid H_1, \dots, H_m\right]. \tag{4}$$

Then the output layer can be obtained by

$$Y = A W, \tag{5}$$

where $W$ denotes the connection weights between the BLS's hidden layer nodes and output layer nodes. Since $A$ and $Y$ are already known in the learning procedure, we can calculate $W$ by ridge regression of the pseudoinverse:

$$W = A^{+} Y = \left(\lambda I + A^{T} A\right)^{-1} A^{T} Y, \tag{6}$$

where $\lambda$ is the regularization parameter.
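For concreteness, the one-step training described by equations (1)–(6) fits in a few lines of NumPy. The sketch below is a minimal illustration under assumed hyperparameters (the number of feature groups, nodes per group, enhancement nodes, and the regularization λ are placeholders), not the reference implementation of [32, 33].

```python
# Minimal BLS training sketch: random feature nodes, enhancement nodes, ridge-regression output weights.
import numpy as np

def train_bls(X, Y, n_feature_groups=10, nodes_per_group=10, n_enhance=100,
              lam=1e-3, rng=np.random.default_rng(0)):
    """X: (m, d) inputs, Y: (m, c) one-hot labels. Returns hidden-layer parameters and output weights."""
    Z_groups, feat_params = [], []
    for _ in range(n_feature_groups):
        We = rng.standard_normal((X.shape[1], nodes_per_group))   # random weights W_e_i
        be = rng.standard_normal(nodes_per_group)                 # random bias beta_e_i
        Z_groups.append(np.tanh(X @ We + be))                     # Z_i = phi_i(X W_e_i + beta_e_i)
        feat_params.append((We, be))
    Zn = np.hstack(Z_groups)                                      # Z^n = [Z_1, ..., Z_n]
    Wh = rng.standard_normal((Zn.shape[1], n_enhance))            # random weights W_h
    bh = rng.standard_normal(n_enhance)                           # random bias beta_h
    H = 1.0 / (1.0 + np.exp(-(Zn @ Wh + bh)))                     # enhancement nodes (sigmoid)
    A = np.hstack([Zn, H])                                        # hidden layer A = [Z^n | H]
    # Equation (6): W = (lam*I + A^T A)^{-1} A^T Y
    W = np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ Y)
    return feat_params, (Wh, bh), W
```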

3. Methods

3.1. Multiscale Composite Spatial Features

Spatial information has been utilized in hyperspectral image classification for many years [38, 39], along with the recognition that a smoother classification map usually brings higher classification accuracy [40]. However, for pixels lying along edges, a smoothing filter may jeopardize the classification accuracy gain. Thus, in most cases, the spatial information derived by smoothing filters has been combined with the raw spectral band values to acquire a trade-off between context-based and isolated pixel values [31], or context-sensitive adaptive filters have been designed to produce edge-preserving maps [41]. In this work, we utilize the adaptive weighted filter (AWF) proposed by Zhou and Wei [42] to extract spatial information from HSIs. Meanwhile, given its deficiency in capturing differential information, and inspired by the success of Gabor features in hyperspectral image analysis through enhancing the spatial discrimination in highly contrastive areas [43, 44], we exploit the benefit of integrating a simple two-dimensional Gabor filter with the AWF for feature extraction.

By assuming that neighborhood pixels with similar spectral distributions are more likely to belong to the same class, AWF obtains the weight of each pixel in the neighborhood by evaluating the similarity between it and the central pixel of the filter, and computes the filtered response as the weight-normalized average

$$\hat{x}_c = \frac{\sum_{i,j} w_{ij}\, x_{ij}}{\sum_{i,j} w_{ij}}. \tag{7}$$

The similarity is calculated by the Gaussian radial basis function

$$w_{ij} = \exp\left(-\frac{\lVert x_{ij} - x_c \rVert^2}{2\sigma^2}\right), \tag{8}$$

where $x_c$ is the central pixel and $x_{ij}$ is the pixel located at the ith row and jth column of the neighborhood. $\sigma$ is the standard deviation of the pixels' differences,

$$\sigma = \operatorname{std}\left(\left\{\lVert x_{ij} - x_c \rVert\right\}_{i,j}\right). \tag{9}$$
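As an illustration, the following is a minimal sketch of the AWF response for a single pixel under our reading of [42]: the weights are Gaussian RBF similarities between each neighborhood spectrum and the central spectrum, with σ taken as the standard deviation of those spectral differences.

```python
# AWF response for one pixel (a sketch; the normalization in equation (7) is applied explicitly).
import numpy as np

def awf_pixel(neighborhood):
    """neighborhood: (w, w, b) spectral cube centred on the target pixel; returns the filtered spectrum."""
    w = neighborhood.shape[0]
    center = neighborhood[w // 2, w // 2, :]
    diff = np.linalg.norm(neighborhood - center, axis=2)     # spectral distance to the central pixel
    sigma = diff.std() + 1e-12                               # sigma: std of the differences (eq. (9))
    weights = np.exp(-diff ** 2 / (2 * sigma ** 2))          # Gaussian RBF similarity (eq. (8))
    weights /= weights.sum()
    return np.tensordot(weights, neighborhood, axes=([0, 1], [0, 1]))  # weighted average spectrum (eq. (7))
```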

Derived from the computational model of the human visual cortical channels, the 2D Gabor filter (https://en.wikipedia.org/wiki/Gabor_filter) has been widely used in computer vision for various low-level tasks [45, 46]. It is a directional sinusoidal function modulated by a Gaussian envelope on the 2D (h, v) plane, which can be expressed in complex form as

$$g\left(h, v; \lambda, \theta, \psi, \sigma_g, \gamma\right) = \exp\left(-\frac{h'^2 + \gamma^2 v'^2}{2\sigma_g^2}\right)\exp\left(i\left(2\pi\frac{h'}{\lambda} + \psi\right)\right), \tag{10}$$

where

$$h' = h\cos\theta + v\sin\theta, \tag{11}$$

$$v' = -h\sin\theta + v\cos\theta, \tag{12}$$

where $\lambda$ denotes the wavelength of the sinusoidal factor, $\theta$ is the orientation of the normal to the parallel stripes of the Gabor function, $\psi$ is the phase offset, $\sigma_g$ denotes the standard deviation of the Gaussian envelope, and $\gamma$ is the spatial aspect ratio that specifies the ellipticity of the support of the Gabor kernel.
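For illustration, a small sketch of the real part of the Gabor kernel in equation (10) follows; the parameter names match the definitions above, and the default values are arbitrary placeholders rather than the settings used in our experiments.

```python
# Real part of a 2D Gabor kernel of a given (odd) window size.
import numpy as np

def gabor_kernel(size, lam=4.0, theta=0.0, psi=0.0, sigma=2.0, gamma=0.5):
    half = size // 2
    v, h = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1), indexing='ij')
    h_rot = h * np.cos(theta) + v * np.sin(theta)            # h' (eq. (11))
    v_rot = -h * np.sin(theta) + v * np.cos(theta)           # v' (eq. (12))
    envelope = np.exp(-(h_rot ** 2 + (gamma * v_rot) ** 2) / (2 * sigma ** 2))  # Gaussian envelope
    carrier = np.cos(2 * np.pi * h_rot / lam + psi)          # real part of the complex sinusoid
    return envelope * carrier
```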

The two spatial features are then integrated into a multiscale framework. We extract AWF and Gabor features through 5 × 5, 7 × 7, 9 × 9, 11 × 11, and 13 × 13 filters, respectively. At each scale, the filtering is conducted iteratively 3 times, and the obtained features are then sent to the next step. Figure 2 gives an overview of the extraction of the multiscale composite spatial features.
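A rough sketch of this multiscale extraction is given below, reusing awf_pixel and gabor_kernel from the previous sketches; the band-wise 2D convolution for the Gabor response and the edge padding are our implementation assumptions, not details specified in the paper.

```python
# Multiscale composite spatial features: AWF and Gabor responses at five window sizes, iterated 3 times.
import numpy as np
from scipy.ndimage import convolve

def awf_filter(hsi, size):
    """Apply awf_pixel over a whole (rows, cols, bands) cube with edge padding."""
    half = size // 2
    padded = np.pad(hsi, ((half, half), (half, half), (0, 0)), mode='edge')
    out = np.empty(hsi.shape, dtype=float)
    for r in range(hsi.shape[0]):
        for c in range(hsi.shape[1]):
            out[r, c] = awf_pixel(padded[r:r + size, c:c + size, :])
    return out

def multiscale_features(hsi, scales=(5, 7, 9, 11, 13), n_iter=3):
    """Return per-scale (AWF, Gabor) feature cubes, each filtered iteratively n_iter times."""
    features = []
    for size in scales:
        gk = gabor_kernel(size)
        awf_out, gabor_out = hsi.astype(float), hsi.astype(float)
        for _ in range(n_iter):
            awf_out = awf_filter(awf_out, size)
            gabor_out = np.stack([convolve(gabor_out[:, :, b], gk, mode='nearest')
                                  for b in range(gabor_out.shape[2])], axis=2)
        features.append((awf_out, gabor_out))
    return features
```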

3.2. BLS-LSDA

We believe that the validity of BLS largely stems from its numerous randomly constructed hidden nodes, which form a "broad" neural network structure. However, excessive nodes generally cause heavy computational and storage consumption, especially when the number of input nodes is large. Moreover, the randomly produced nodes have often been criticized for their arbitrariness, which may deteriorate performance in real-world applications [47].

The underlying structure of the input data that is useful for classification can be retained by discriminant analysis methods, which seek feature representations that emphasize interclass separation. It is necessary for BLS to strike a compromise between the arbitrarily created nodes and their usefulness in differentiating input features of different classes. To fulfill this task, an effective way is to introduce LSDA into BLS. Derived from LDA, LSDA improves on its prototype by additionally revealing the local geometrical structure of the data manifold. In this work, we craft a novel neural network model named BLS-LSDA by inserting two layers applying LSDA, as shown in Figure 3. Details of the layers are listed as follows.

(1) A layer applying LSDA is added between the input layer and the hidden layer of BLS to decrease the dimensionality of the input data as well as enhance its separability;
(2) By inserting an extra layer applying LSDA between the hidden layer and the output layer, we further introduce a groupwise feature mapping which benefits the retrieval of the output weights.

For each LSDA layer, given m samples $\{x_1, x_2, \dots, x_m\}$ and their labels $\{y_1, y_2, \dots, y_m\}$ as input, LSDA splits the k nearest neighbors of $x_i$ into $N_w(x_i)$ and $N_b(x_i)$, which are the sets of $x_i$'s nearest neighbors with the same and different labels, respectively:

$$N_w(x_i) = \left\{x_i^{j} \mid y\left(x_i^{j}\right) = y(x_i),\ 1 \le j \le k\right\}, \qquad N_b(x_i) = \left\{x_i^{j} \mid y\left(x_i^{j}\right) \ne y(x_i),\ 1 \le j \le k\right\}, \tag{13}$$

where k denotes the number of nearest neighbors of $x_i$. Based on $N_w(x_i)$ and $N_b(x_i)$, the within-class graph $G_w$ and between-class graph $G_b$ are constructed with weight matrices $W_w$ and $W_b$.

In order to map the points in feature space to a line so that the within-class points stay as close as possible while the between-class points stay as far apart as possible, given the map $z = A^{T}x$, where $A$ is a projection matrix, the projection vectors $a$ (columns of $A$) can be retrieved by solving the generalized eigenproblem

$$X\left(\alpha L_b + (1-\alpha) W_w\right)X^{T} a = \lambda X D_w X^{T} a, \tag{14}$$

where $L_b = D_b - W_b$ is the Laplacian matrix of the between-class graph, $D_w$ is a diagonal matrix with $D_{w,ii} = \sum_j W_{w,ij}$, and $\alpha$ is a scalar between 0 and 1. See [17] for details.
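For concreteness, the following compact sketch implements the LSDA step used in each LSDA layer, following the formulation in [17] recalled above; the small regularization term eps and the brute-force nearest-neighbor search are implementation conveniences of ours, not part of the original algorithm.

```python
# LSDA sketch: build within-/between-class kNN graphs, then solve the generalized eigenproblem (14).
import numpy as np
from scipy.linalg import eigh

def lsda_fit(X, y, n_components, k=5, alpha=0.1, eps=1e-6):
    """X: (m, d) samples, y: (m,) labels. Returns a (d, n_components) projection matrix."""
    m = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    Ww = np.zeros((m, m)); Wb = np.zeros((m, m))
    for i in range(m):
        for j in np.argsort(dist[i])[:k]:                    # k nearest neighbors of x_i
            if y[j] == y[i]:
                Ww[i, j] = Ww[j, i] = 1.0                    # within-class graph weight
            else:
                Wb[i, j] = Wb[j, i] = 1.0                    # between-class graph weight
    Dw = np.diag(Ww.sum(axis=1)); Db = np.diag(Wb.sum(axis=1))
    Lb = Db - Wb                                             # between-class graph Laplacian
    S = X.T @ (alpha * Lb + (1 - alpha) * Ww) @ X            # left-hand side of (14)
    C = X.T @ Dw @ X + eps * np.eye(X.shape[1])              # right-hand side, regularized
    evals, evecs = eigh(S, C)                                # generalized symmetric eigenproblem
    return evecs[:, np.argsort(evals)[::-1][:n_components]]  # top eigenvectors as projection directions
```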

Algorithm 1 depicts the framework of the BLS-LSDA training process. Let $g_i$ be the Gabor features and $a_i$ the AWF features of sample $s_i$. They are fed into the first LSDA layer separately to produce dimension-reduced features $\tilde{g}_i$ and $\tilde{a}_i$. Then the two features are weighted and concatenated as

$$x_i = \left[\omega_g \tilde{g}_i,\ \omega_a \tilde{a}_i\right], \tag{15}$$

where $\omega_g$ and $\omega_a$ are the weights of the two feature types.

(i) Require: Training samples $\{s_i\}_{i=1}^{m}$ and the corresponding labels $\{y_i\}_{i=1}^{m}$; the number of groups of mapped nodes in BLS, n.
(ii) Ensure: The weights of the output layer, W.
(1) Extract the Gabor feature $g_i$ and AWF feature $a_i$ of each sample $s_i$;
(2) Feed $g_i$ and $a_i$ into the first layer implementing LSDA to produce dimension-reduced features $\tilde{g}_i$ and $\tilde{a}_i$;
(3) Concatenate the weighted $\tilde{g}_i$ and $\tilde{a}_i$ by equation (15), yielding $x_i$ (stacked row-wise into the matrix X);
(4) for i = 1; i ≤ n; i++ do
(5) Assign random values to $W_{e_i}$ and $\beta_{e_i}$;
(6) Calculate the mapped feature values $Z_i = \phi_i(X W_{e_i} + \beta_{e_i})$;
(7) end for
(8) Concatenate the mapped feature values to get the mapped feature group $Z^n = [Z_1, \dots, Z_n]$;
(9) Assign $W_h$ and $\beta_h$ random values;
(10) Calculate the enhancement nodes $H = \xi(Z^n W_h + \beta_h)$;
(11) Apply LSDA to each $Z_i$ in $Z^n$ and to $H$ in another LSDA layer to get $\tilde{Z}^n$ and $\tilde{H}$;
(12) Concatenate $\tilde{Z}^n$ and $\tilde{H}$ to produce $A$;
(13) Calculate the connection weights W between the BLS's hidden layer and the output layer with equation (6).

According to equations (1) and (3), we now obtain the groups of feature nodes and enhancement nodes of BLS-LSDA as

$$Z_i = \phi_i\left(X W_{e_i} + \beta_{e_i}\right), \quad i = 1, \dots, n, \qquad H = \xi\left(Z^n W_h + \beta_h\right),$$

where X denotes the matrix of LSDA-reduced input features $x_i$.

Each group of mapped features and the enhancement features are fed into the second LSDA layer separately, and then we concatenate the outputs to get A (see Algorithm 1). Finally, the weights of the output layer are calculated with equation (6).
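The following is a rough end-to-end sketch of the Algorithm 1 training flow, reusing lsda_fit from the earlier LSDA sketch; the feature weights (wg, wa), reduced dimensions, node counts, and ridge parameter lam are illustrative placeholders, not the settings used in the paper.

```python
# BLS-LSDA training sketch: LSDA -> feature nodes -> enhancement nodes -> LSDA -> ridge regression.
import numpy as np

def train_bls_lsda(gabor_feat, awf_feat, Y, dim1=15, dim2=8, wg=0.5, wa=0.5,
                   n_groups=10, nodes_per_group=10, n_enhance=100, lam=1e-3,
                   rng=np.random.default_rng(0)):
    """gabor_feat, awf_feat: (m, d) spatial features; Y: (m, c) one-hot labels. Returns output weights W."""
    labels = Y.argmax(axis=1)
    # First LSDA layer: reduce Gabor and AWF features separately, then weight and concatenate (eq. (15)).
    Pg = lsda_fit(gabor_feat, labels, dim1)
    Pa = lsda_fit(awf_feat, labels, dim1)
    X = np.hstack([wg * (gabor_feat @ Pg), wa * (awf_feat @ Pa)])
    # BLS mapped feature nodes (eq. (1)) and enhancement nodes (eq. (3)) on the reduced input.
    Z_groups = []
    for _ in range(n_groups):
        We = rng.standard_normal((X.shape[1], nodes_per_group))
        be = rng.standard_normal(nodes_per_group)
        Z_groups.append(np.tanh(X @ We + be))
    Zn = np.hstack(Z_groups)
    Wh = rng.standard_normal((Zn.shape[1], n_enhance))
    bh = rng.standard_normal(n_enhance)
    H = 1.0 / (1.0 + np.exp(-(Zn @ Wh + bh)))
    # Second LSDA layer: reduce each mapped-feature group and the enhancement nodes, then concatenate into A.
    A = np.hstack([Z @ lsda_fit(Z, labels, dim2) for Z in Z_groups]
                  + [H @ lsda_fit(H, labels, dim2)])
    # Output weights by ridge regression, as in equation (6).
    return np.linalg.solve(lam * np.eye(A.shape[1]) + A.T @ A, A.T @ Y)
```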

4. Experimental Results and Analysis

4.1. Datasets

To investigate the performance of our proposed framework, three open remote sensing datasets shown in Figure 4 have been taken as benchmarks. The first one is the Indian Pines dataset, which was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site in northwestern Indiana, USA. The scene, consisting of 145 × 145 pixels, was captured with 224 spectral bands covering the wavelength range of 0.4–2.5 × 10−6 meters. The number of labeled samples in its 16 classes is quite unbalanced, ranging from 20 to 2455. In our experiment, only 200 bands were used in order to avoid the effect of water absorption.

The second dataset is the Pavia University dataset collected by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor with 115 bands. The dataset has a spatial size of 610 × 340 pixels with 9 labeled land-cover classes. Due to noise, 12 bands were discarded in our experiment.

The Salinas dataset was also captured by the AVIRIS sensor and is characterized by high spatial resolution (3.7 m/pixel). Its ground truth covers 16 classes. 20 water absorption bands were also removed from the Salinas dataset in the experiment.

4.2. Parameter Settings

To quantitatively compare the classification results of BLS-LSDA with some prominent or state-of-the-art methods, including SVM, KELM, SVM-CK, KELM-CK [31], HiFi-We [48], and MASR [49], three frequently used indexes, namely overall accuracy (OA), average accuracy (AA), and the kappa coefficient (κ), are adopted in our experiment. In each run, r (r = 5, 10, 15, 20, 25, 30, 35, 40) labeled samples randomly chosen from each class of each dataset are used for training, while the rest are taken as testing samples (when a class does not have sufficient labeled samples, half of them are selected for training). To make the comparison more reliable, each listed evaluation result is the average of 10 measurements under the same model settings. Moreover, in order to show the main features of our method, some parameters are manually chosen, as shown in Table 1. The analysis of the parameters' sensitivities can be found in Section 4.4.
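For reference, the three indexes can be computed from a class confusion matrix as in the following small sketch; these are the standard definitions, not code taken from our experimental pipeline.

```python
# OA, AA, and kappa from a confusion matrix (rows = true classes, columns = predicted classes).
import numpy as np

def classification_indexes(conf):
    total = conf.sum()
    oa = np.trace(conf) / total                                    # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))                 # mean of per-class accuracies
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2  # expected chance agreement
    kappa = (oa - pe) / (1 - pe)                                   # kappa coefficient
    return oa, aa, kappa
```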

4.3. Classification Results

We present our experimental results on each dataset in Tables 2, 3, and 4. Based on the results shown in all tables, we can find that our proposed BLS-LSDA is superior to the classic methods (i.e., SVM and KELM) and their derivations with spectral-spatial composite kernels (i.e., SVM-CK and KELM-CK), as well as recent prominent methods (i.e., HiFi-We and MASR) focusing on exploring the advantage of spectral-spatial filters in HSI classification. Moreover, in order to explore the utility of the LSDA layers inserted into the BLS model, the classification results using the original BLS are also provided.

For the Indian Pines dataset, our method achieves up to 94.2 ± 0.95% OA, 97.0 ± 0.49% AA, and 93.3 ± 1.10% κ when 40 training samples per class are used. The advantage of our method over the classic methods is much more obvious than that over the other methods, and MASR is better than the proposed method on AA by nearly 0.3%. For the University of Pavia dataset, using the same number of training samples, the classification accuracies are 94.8 ± 0.79% OA, 96.0 ± 0.66% AA, and 93.0 ± 1.09% κ, which surpass all the chosen comparative methods by 3–4% in general. For the Salinas dataset, our BLS-LSDA also achieves higher OA (98.0 ± 0.43%), AA (99.1 ± 0.18%), and κ (97.7 ± 0.48%) than all other compared methods.

The advantage of BLS-LSDA is more obvious when there is a limited number of training samples (i.e., 5, 10, and 15). As an example, in Table 2, when 10 random samples per class are used for training, our method achieves a nearly 4% OA increase over HiFi-We, which attains the highest OA among the comparative methods with 81.6 ± 2.26%. We believe that this is due to the high representation learning capacity of our proposed network structure. Moreover, the standard deviations of our method's classification accuracies are overall lower than those of the other methods, which means that the proposed model is more robust to the randomly chosen training data. However, an exception to the previous statement can be found in Table 3, indicating that when training samples are extremely limited, LSDA may fail to capture the representative features.

Figures 5, 6, and 7 visually show the classification results of BLS-LSDA and the other compared methods when r = 40. By visual evaluation, we can conclude that our proposed method is good at balancing the classification accuracy of pixels in both sharp and smooth regions. Taking the Salinas dataset (Figure 7) as an example, it can easily be seen that our method surpasses the other methods in the homogeneous areas (i.e., the two smooth patches marked with grey circles), while the edges and acute angles are also well preserved.

4.4. Parameters’ Sensitivities Analysis

We have evaluated the impact of different values of the model's parameters shown in Table 1. The results reveal that the classification accuracy is not sensitive to different h, C, and s. Also, since N1, N2, and N3 are interdependent, we choose these parameters empirically. Besides, it has been well recognized that the dimensionality of the reduced subspace is crucial in manifold learning. Here, we investigate the performance of BLS-LSDA with different subspace dimensions on the three benchmark datasets. For each dataset and each class, 20 training samples were randomly selected and the remaining samples were used for testing. All experiments were performed 10 times in order to obtain average results. Figure 8 depicts the relationship between the classification results (OA) and the dimensions of the reduced subspace on the three datasets.

For Indian Pines, we can find that when the dimension of the reduced subspace is less than 15, the overall accuracy goes up as the dimension increases; beyond that, the curve becomes flat. Similar curves can also be observed on the other two datasets, although their turning points are found when the number of dimensions equals 10.

We also investigate the change of classification accuracies with the LSDA trade-off parameter α. Figure 9 shows that with α = 0.1 the model acquires its best performance.

5. Conclusion

In this paper, we present a novel method for HSI classification based on BLS and LSDA. The two algorithms are integrated into a multilayered neural network model. To utilize both the spatial and spectral information, the Gabor filter and the AWF are adopted to produce the input of BLS-LSDA. Our experiments on three open benchmark datasets have shown its advantages over the compared methods in terms of OA, AA, and κ.

Our work has also shown that high classification accuracy can be achieved with the low-dimensional features acquired by the LSDA layers in the model, which implies computational efficiency in real applications.

We believe that BLS-LSDA is a successful improvement on the original BLS for HSI classification; however, some problems remain to be tackled. Our future work will address the initialization of the weights and biases with heuristic algorithms instead of random assignments.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been financially supported by the Fundamental Research Funds for the Central Universities under Grant nos. CCNU20ZN002 and CCNU20TD005.