Abstract

Hyperspectral images (HSI) consist of many contiguous spectral bands and are often utilized for various Earth observation activities, such as surveillance, detection, and identification. The incorporation of both spectral and spatial characteristics is necessary for improved classification accuracy, and deep learning has gained significant traction in the classification of hyperspectral images. This research analyzes how to accurately classify new HSI from limited labelled samples. A novel deep-learning-based categorization pipeline based on feature extraction and classification is designed for this purpose. Initial extraction of spectral and spatial information is followed by their integration to generate fused features. The classification task is completed using a compressed synergic deep convolution neural network with Aquila optimization (CSDCNN-AO) model, constructed by utilizing a novel optimization technique known as the Aquila Optimizer (AO). Four HSI benchmarks, the Kennedy Space Center (KSC) dataset, the Indian Pines (IP) dataset, the Houston U (HU) dataset, and the Salinas Scene (SS) dataset, are used for experimental assessment. Sequential testing on these four HSI classification datasets demonstrates that our innovative framework outperforms the conventional techniques on common evaluation measures such as average accuracy (AA), overall accuracy (OA), and the Kappa coefficient (k). In addition, it significantly reduces training time and computational cost, resulting in enhanced training stability, maximum performance, and remarkable training accuracy.

1. Introduction

Owing to the rapid growth of photonics and optics, hyperspectral (HS) sensors are now installed on several satellites. HSI classification is an essential and challenging task aimed at labelling each pixel contained in a hyperspectral image. HSI data contain spatial-spectral information that is useful for detecting scene objects [1] and have been used in many fields such as environmental surveillance, astronomy, and precision agriculture [2].

In earlier days, HSI classification was performed with machine learning methods such as support vector machines (SVM) [3, 4], k-nearest neighbours (KNN) [5, 6], multinomial logistic regression (MLR) [7, 8], and decision trees [9, 10]. Within the same scene, the spectra of a single material can vary while different materials in different locations can share similar features, so the extracted details remained unreliable because of inadequate spatial structure feature extraction. These issues make it hard to achieve accurate HSI classification, and numerous spectral and spatial feature extraction methods have therefore been proposed.

These techniques have demonstrated good classification performance, but they remain ineffective for classifying HSI in difficult situations. In recent times, deep learning techniques have achieved great success at this kind of task [11–13] and have reached admirable performance in different analysis-oriented tasks, e.g., object recognition and image classification. To classify HSI, both the spatial and spectral perspectives must be considered during processing. Intuitively, an HSI consists of a large number of band images, each of which captures a portion of the electromagnetic spectrum. Meanwhile, the spatial perspective denotes the 2D spatial arrangement of the objects in the HSI. Thus, HSI is typically treated as 3D spectral-spatial data, and many methods have been proposed in the literature accordingly [14, 15].

Certain attempts have been made to model spectral-spatial data concurrently. One such method performs convolution over the spectral and spatial feature spaces in a stacked manner, known as the CNN model [16]. Apparently, the benefit of this CNN model is that it can create rich feature maps. However, the major drawbacks of this method are as follows. First, it is hard to build a deeper CNN structure: the receptive field of the resulting features grows only gradually with the cumulative number of convolution operations, which limits the interpretation ability and depth of the model. Second, the memory cost becomes too expensive when many convolution operations are performed [17–20]. To reduce the abovementioned challenges, we introduce a new CNN model, namely the compressed synergic deep convolution neural network with Aquila optimization (CSDCNN-AO).

The significant goals to achieve the above-said objectives are listed below:
(i) to determine a suitable deep learning method that provides strong support for HSI image classification;
(ii) to reduce the complexity and the loss function in classification;
(iii) to develop the future outcome based on both present and traditional output.

The major contribution of this technique is given below.

This combination reduces the learning complexity through the wavelet concept and reduces the loss function with Aquila optimization. The Aquila optimization method can reduce an enormous number of data features while maintaining their unique properties, using less computation time and less memory space. Furthermore, a synergic deep convolutional neural network (CNN) is employed to obtain an initial result, and the CNN weights are then optimized by Aquila optimization to reduce the error rate. Here, the key role is the compression of data combined with Aquila optimization of the CNN, increasing accuracy with maximum steadiness between the exploitation and exploration phases of the optimization.

The organization of the work is given below:

The literature survey is given in Section 2. In Section 3, the proposed methodology is given. In Section 4, the experimental results and discussions are explained. Finally, in Section 5, the conclusion is given.

2. Literature Review

Yang et al. [21] present a novel synergistic CNN (SyCNN) for accurate HSI classification. The SyCNN combines a hybrid structure of 2D and 3D CNNs with a data interaction module for feature learning that fuses spatial and spectral HSI data. Moreover, it introduces a three-dimensional process before the fully connected layer that supports effective feature extraction. However, it still cannot handle high-dimensional data.

Li et al. [22] suggested an HSI model called the local and hybrid dilated convolution fusion network (LDFN), which fuses local and rich spatial features by expanding the perception field. Initially, several operations are applied, such as dropout, standard convolution, batch normalization, and average pooling. After that, both local and dilated convolution operations are involved in efficient spatial-spectral feature extraction. On the other hand, the parameters were selected manually in the suggested paper.

Patel et al. [23] suggested HSI categorization by an autoencoder with a CNN (AECNN). Pre-processing by the autoencoder enhanced the HSI features, which helped obtain optimized weights for the initial layers of the CNN. Thus, a shallow CNN model could be applied to extract features from the HSI data. However, more contextual information and advanced strategies are still needed to robustify the spatial information.

Wang et al. [24] suggested a semi-supervised HSI classification model that improves on deep learning. The suggested model, namely the arbitrary multiple graphs method, replaces skilled learning with the anchor graph method, which can label significant amounts of unlabelled data automatically and precisely. In this work, the number of training samples is limited.

Shi et al. [25] presented a model namely the 3D coordination attention mechanism (3DCAM). This attention process can attain the HSI's spatial position along both the vertical and horizontal directions. Also, the spatial and spectral data of HSIs were extracted using a CNN. The drawback is that the implementation complexity is not considered.

Zhao et al. [26] suggested combining a stacked autoencoder (SAE) with a 3D deep residual network (3DDRN) to classify HSI. The SAE neural network was designed to reduce the HSI size, and a 3DCNN and a residual network module were used to build the 3DDRN, which extracted spectral-spatial features from the dimension-reduced 3D HSI cubes. The 3DDRN continuously identified deep features, which were passed into SoftMax to complete the classification. Batch normalization (BN) and dropout were used to avoid overfitting the training data.

Yin et al. [27] developed a spatial-spectral mixed network for HSI categorization. The network collects spatial-spectral information from the HSI using three layers of 3D convolution and one layer of 2D convolution. The network employs Bi-LSTM to boost spectral band interactions and extract spectral features as a series of images. Combining two FC layers and utilising SoftMax for classification creates a unified neural network. However, the model misclassified samples in the dataset.

Paul et al. [28] developed SSNET, which blends 3D and 2D convolutions of HSI spectral-spatial information with spatial pyramid pooling (SPP) for creating spatial features at various scales. SPP is employed with two-dimensional local convolutional filters for HSI classification because it resists object distortions. The SPP layer's fixed-length feature vector output reduces the trainable parameters and improves classification performance. The model does, however, have a complicated structure.

Zhang et al. [29] introduced an SSAF-DCR network for hyperspectral image classification, in which three linked components extract features. First, a dense spectral block reuses spectral characteristics as much as possible; then a spectral attention block refines and optimises the spectral features. In the second segment, a dense spatial block and an attention block select spatial features. However, the selection of the number of features is not considered.

Yan et al. [30] offer a 3D cascaded spectral-spatial element attention network (3D-CSSEAN) for image classification. Using the spectral element attention module and the spatial element attention module, the network can concentrate on key spectral and spatial aspects. The two element attention modules were built using activation functions and element-wise multiplication. The model can extract classification-helping properties and is computationally efficient. The network structure is also suitable for small-sample learning, since the attention modules have few training parameters. On the other hand, obtaining labelled samples is expensive and difficult.

To overcome existing challenges, our proposed work introduces novel techniques which are discussed in the following section.

3. Proposed Synergic Deep Learning Model

Let the hyperspectral image be denoted X, comprising B contiguous spectral bands, each band containing the same set of pixel samples; each sample carries a class label. Usually, HSI classification is affected by inter-class similarity and high intra-class variability. To compensate for these issues, we introduce the proposed technique, namely the synergic deep learning model with the feature reduction principle. This method minimizes computational complexity by reducing the spectral and spatial feature dimensions. Here, we evaluate the efficiency of the subsequent feature suppression methods using a hybrid synergic deep CNN model. The proposed synergic deep learning model consists of synergic deep learning (SDL)-based feature extraction, feature reduction, classification, and loss function optimization. The schematic representation of the proposed method is given in Figure 1 and described in the following sections.

3.1. Synergic Deep Convolutional Neural Network Feature Extraction

In the proposed model, as shown in Figure 2, we extract the useful HSI features; the model is composed of an input layer, n DCNN components, and a synergic network. Recently, the DCNN has received much attention for classification; it is employed here to reduce the number of input variables and develop the neural network architecture. A DCNN is a combination of layers, where each layer performs a different function. Pre-processing, convolution, pooling, and final classification operations are performed sequentially in the synergic DCNN [31]. The forward process is a convolution operation on the inputs: the products of weights and inputs are accumulated across layers. Each filter has the same number of channels as the input volume, and the output volume has the same depth as the number of filters. In the convolution process, several computations are carried out. Every layer is composed of neurons that take input values, perform calculations, and produce output values, which are forwarded to the next layer. In a CNN, four important operations are performed during feature learning: convolution, activation, pooling, and normalization. Before the convolution operation, pre-processing is carried out.
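To make this sequence concrete, the following is a minimal PyTorch sketch of the four feature-learning operations named above; the ConvFeatureBlock name, layer sizes, and choice of ReLU/max-pooling are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvFeatureBlock(nn.Module):
    """One feature-learning stage: convolution, normalization, activation, pooling."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_channels)   # normalization
        self.act = nn.ReLU(inplace=True)         # activation
        self.pool = nn.MaxPool2d(kernel_size=2)  # pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # convolution first, then normalization, activation, and pooling
        return self.pool(self.act(self.bn(self.conv(x))))

x = torch.randn(8, 3, 224, 224)        # small stand-in batch (Section 3.1.1 uses groups of 200)
features = ConvFeatureBlock(3, 64)(x)  # -> (8, 64, 112, 112)
```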

3.1.1. Pair Input Layer

Synergic pair input layers are trained randomly; each group of 200 samples with corresponding class labels is given to the DCNN units. Here, each image is of size 224 × 224 × 3. Before passing the data to the next layer, we apply the feature reduction principle.

3.1.2. Feature Reduction by Wavelet Transform

In this feature reduction step, we use the wavelet transform with the Haar basis so that high-dimensional data can be handled efficiently. Here, two filters, a low-pass filter and a high-pass filter, are applied for effective feature reduction. These filters are incorporated into the transform to yield reduced input coefficients. The feature reduction is given in equation (1).

As a result of applying this transformation before the DCNN, the learning complexity and learning time can be reduced. In this process, the CNN architecture is reduced along with the number of features.
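As an illustration, the sketch below performs one level of Haar-based feature reduction on an HSI cube, keeping only the low-pass (approximation) coefficients so that each spatial dimension is halved; the function name, the array layout, and the use of the standard Haar filter (1, 1)/sqrt(2) are our own assumptions.

```python
import numpy as np

def haar_reduce(cube: np.ndarray) -> np.ndarray:
    """One-level 2D Haar approximation applied per band of an (H, W, B) HSI cube."""
    h, w, _ = cube.shape
    cube = cube[: h - h % 2, : w - w % 2, :]               # make H and W even
    rows = (cube[0::2] + cube[1::2]) / np.sqrt(2.0)        # low-pass along rows
    cols = (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2.0)  # low-pass along columns
    return cols                                            # -> (H/2, W/2, B)

reduced = haar_reduce(np.random.rand(224, 224, 3))         # -> (112, 112, 3)
```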

3.1.3. DCNN Component

In every DCNN component, we start from the ResNet-101 architecture, denoted DCNN-n (n = 1, 2, …, N). This type of architecture is suitable for the synergic deep learning (SDL) method. Here, we consider the data sequence with compressed features and the corresponding output class label series. Each component is trained with its own parameter set under the cross-entropy loss expressed in equation (2).

Equation (2) above defines the forward computing process. In the same way, the parameter set used in DCNN-n is specific to that component, and these parameters are not shared among the DCNN components.
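A hedged sketch of one such component follows, using torchvision's ResNet-101 as the backbone and the standard cross-entropy loss; the helper name and the class count of 16 (matching the IP dataset) are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

def make_dcnn_component(n_classes: int) -> nn.Module:
    backbone = models.resnet101(weights=None)   # untrained ResNet-101 backbone
    # replace the final FC layer with one sized to the dataset's classes
    backbone.fc = nn.Linear(backbone.fc.in_features, n_classes)
    return backbone

dcnn = make_dcnn_component(n_classes=16)        # e.g., the 16 IP classes
criterion = nn.CrossEntropyLoss()               # cross-entropy loss as in equation (2)
logits = dcnn(torch.randn(2, 3, 224, 224))
loss = criterion(logits, torch.randint(0, 16, (2,)))
```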

In this SDN model, synergic labels in the DCNN are applied to the input layers, the embedding layer, and the learning layers. In the SDN, the corresponding data pair is given as input to the pair (DCNNi, DCNNj). The outputs from the FC layers are given in equations (3) and (4).

In the next stage, all the deep features are embedded and the resultant outcome is expressed in the following equation (5).

Loss in binary cross-entropy is given as below:

The above expression represents the synergic attributes and the synergic forward computation. This process validates the classes of a data pair and yields a corrective response when a synergic (SN) error occurs.
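The sketch below illustrates one plausible form of this synergic supervision: deep features from a DCNN pair are concatenated (embedded), and a binary head trained with binary cross-entropy predicts whether the two inputs share a class label. The head sizes and names are assumptions; the 2048-dimensional features match a ResNet-101 penultimate layer.

```python
import torch
import torch.nn as nn

class SynergicNetwork(nn.Module):
    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 1))                    # synergic score (logit)

    def forward(self, f_i: torch.Tensor, f_j: torch.Tensor) -> torch.Tensor:
        # embed the pair of deep features by concatenation, then score it
        return self.head(torch.cat([f_i, f_j], dim=1)).squeeze(1)

sn = SynergicNetwork()
f_i, f_j = torch.randn(8, 2048), torch.randn(8, 2048)
y_i, y_j = torch.randint(0, 16, (8,)), torch.randint(0, 16, (8,))
synergic_label = (y_i == y_j).float()             # 1 if the pair shares a class
loss_s = nn.BCEWithLogitsLoss()(sn(f_i, f_j), synergic_label)
```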

3.1.4. Training and Testing

In this stage, we perform the SN optimization process, in which the parameters are updated according to the learning rate, and a trade-off parameter balances the synergic error against the classification sub-model. Additionally, the classification of test data by the SN DCNN components is processed using their prediction vectors. Further, the test data class label is deliberated as below:
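A minimal, self-contained sketch of this training and testing logic follows; the stand-in linear model, the trade-off value lambda_sn, and the use of SGD are illustrative assumptions rather than the authors' exact settings.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 16)                            # stand-in classification sub-model
cls_loss = nn.CrossEntropyLoss()(model(torch.randn(8, 10)),
                                 torch.randint(0, 16, (8,)))
sn_loss = torch.tensor(0.3)                          # placeholder synergic error term

lambda_sn = 0.5                                      # trade-off between the two losses
opt = torch.optim.SGD(model.parameters(), lr=1e-3)   # lr plays the learning-rate role
opt.zero_grad()
(cls_loss + lambda_sn * sn_loss).backward()          # combined update signal
opt.step()

# Test time: average the components' prediction vectors, then take the argmax
components = [nn.Linear(10, 16) for _ in range(3)]
x = torch.randn(1, 10)
probs = torch.stack([torch.softmax(c(x), dim=1) for c in components]).mean(dim=0)
predicted_class = probs.argmax(dim=1)
```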

3.2. Image Classification

This is the final stage, which classifies the HSI with respect to the different class labels. The classification is performed by the SoftMax layer, which has received much attention for multi-label classification. It defines a mapping function from the input vector space to the class labels, as given in equation (10), where the weights have to be tuned using the optimization process. As a result, we can reduce the loss function of this architecture.
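For concreteness, a small sketch of such a SoftMax mapping is given below; the symbols W and b and all dimensions are illustrative assumptions, with W and b playing the role of the weights that the optimizer later tunes.

```python
import numpy as np

def softmax_classify(z: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Map fused feature vectors z to class probabilities via tunable W, b."""
    logits = z @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
probs = softmax_classify(rng.random((4, 32)), rng.random((32, 16)), np.zeros(16))
labels = probs.argmax(axis=1)                                # predicted class labels
```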

3.3. Loss Reduction by Aquila Optimization Algorithm

Losses in this SDL model are reduced by the Aquila optimization algorithm through the weight tuning process. The Aquila optimization algorithm yields the best solution subject to the given constraints.

The mathematical model of Aquila optimization (AO) [32] consists of the following stages: expanded exploration, narrowed exploration, expanded exploitation, and narrowed exploitation.

3.3.1. Expanded Exploration

In this work, the Aquila recognizes the best weights based on the best hunting area, where the best hunting area corresponds to the minimum losses. In this process, the AO (weight optimization) explores extensively from a high soar to determine the area of the search space. Here, X1(t + 1) refers to the next iteration solution estimated by the first search method (X1), and Xbest(t) is the best solution obtained until iteration t. The expanded search (exploration) is controlled through the number of iterations. In addition, XM(t) represents the mean value of the current locations, which is calculated in the following equation, where t and T are the current and maximum iterations, respectively, and n is the population size.
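A sketch of this update, following the standard AO formulation in [32], X1(t + 1) = Xbest(t)(1 − t/T) + (XM(t) − Xbest(t)·rand), is given below; the population size, dimensionality, and random seed are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

def expanded_exploration(pop: np.ndarray, x_best: np.ndarray,
                         t: int, T: int) -> np.ndarray:
    x_mean = pop.mean(axis=0)                        # XM(t): mean of current locations
    # high-soar search, shrinking toward x_best as t approaches T
    return x_best * (1 - t / T) + (x_mean - x_best * rng.random())

pop = rng.random((20, 5))                            # n = 20 candidate weight vectors
x1_new = expanded_exploration(pop, pop[0], t=3, T=100)
```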

3.3.2. Narrowed Exploration

In this stage, the AO narrowly explores the selected space around the targeted prey for a solution. Here, X2(t + 1) is the next iteration solution, and Levy(D) and D are the levy flight distribution function and the dimension space, respectively. Additionally, XR(t) is a random solution taken from the range of (1, …, n). In the levy flight, s refers to a constant with the value of 0.01, and u and v are random numbers.

In the above equation (15), β is a constant value fixed to 1.5. Moreover, the values of y and x, which are used for the spiral search in this optimization, are calculated as follows.

r1 takes values between 1 and 20 over a fixed number of search cycles, and U is a constant with the value of 0.00565. D1 contains integer values that differ based on the dimension of the search space, and ω is a minimum value fixed to the constant 0.005.
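The following sketch assembles the narrowed exploration step from the quantities just described (Levy flight plus spiral terms), again following the standard AO formulation in [32], X2(t + 1) = Xbest(t)·Levy(D) + XR(t) + (y − x)·rand; the default r1 value, the Gaussian draws inside the Levy step, and the random seed are illustrative assumptions.

```python
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(0)

def levy(D: int, beta: float = 1.5, s: float = 0.01) -> np.ndarray:
    """Levy flight step of dimension D with s = 0.01 and beta = 1.5."""
    sigma = (gamma(1 + beta) * np.sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.standard_normal(D) * sigma
    v = rng.standard_normal(D)
    return s * u / np.abs(v) ** (1 / beta)

def spiral(D: int, r1: float = 10.0, U: float = 0.00565, omega: float = 0.005):
    D1 = np.arange(1, D + 1)                     # integer steps over the dimension
    r = r1 + U * D1
    theta = -omega * D1 + 3 * pi / 2
    return r * np.cos(theta), r * np.sin(theta)  # the y and x spiral terms

def narrowed_exploration(pop: np.ndarray, x_best: np.ndarray) -> np.ndarray:
    D = pop.shape[1]
    x_rand = pop[rng.integers(len(pop))]         # XR(t): a random solution
    y, x = spiral(D)
    return x_best * levy(D) + x_rand + (y - x) * rng.random()

pop = rng.random((20, 5))
x2_new = narrowed_exploration(pop, pop[0])
```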

3.3.3. Expanded Exploitation (X3)

In this stage, the weight optimization exploits the accurate value of the solution to get nearer to the prey and attack. Here, X3(t + 1) refers to the next iteration solution, and Xbest(t) represents the estimated prey location. In addition, XM(t) represents the current mean value at the tth iteration, and rand is a random value between 0 and 1. α and δ are small values in (0, 1) that serve as adjustment parameters for the exploitation process, and UB and LB represent the upper and lower bounds of the problem, respectively.
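A corresponding sketch of this update, X3(t + 1) = (Xbest(t) − XM(t))·α − rand + ((UB − LB)·rand + LB)·δ per the standard AO formulation in [32], is shown below; fixing the adjustment parameters to 0.1 is a common convention in AO implementations, and the bounds and seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def expanded_exploitation(pop: np.ndarray, x_best: np.ndarray,
                          lb: float, ub: float,
                          alpha: float = 0.1, delta: float = 0.1) -> np.ndarray:
    x_mean = pop.mean(axis=0)                    # XM(t): mean of current locations
    # move toward the estimated prey location within the problem bounds
    return ((x_best - x_mean) * alpha - rng.random()
            + ((ub - lb) * rng.random() + lb) * delta)

pop = rng.random((20, 5))
x3_new = expanded_exploitation(pop, pop[0], lb=0.0, ub=1.0)
```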

3.3.4. Narrowed Exploitation

In this phase, the attack is carried out at the last location. Here, X4(t + 1) denotes the next iteration solution, and QF denotes the quality function, which is applied to balance the search strategies. G1 specifies the motions of the optimizer that are applied for tracking the prey, G2 specifies values that are reduced from two to zero, and X(t) represents the current solution at iteration t.

QF(t) refers to the tth iteration's quality function, and rand is a random value between 0 and 1. T and t present the maximum and current iterations, respectively. Levy(D) is the levy flight distribution function calculated using equation (6). As a result, we obtain the optimum weights, which reduce the losses in the architecture.
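Finally, the sketch below implements this last update inside a simple search loop over candidate weight vectors, following the standard AO formulation in [32]: X4(t + 1) = QF·Xbest(t) − G1·X(t)·rand − G2·Levy(D) + rand·G1, with QF(t) = t^((2·rand − 1)/(1 − T)^2), G1 = 2·rand − 1, and G2 = 2(1 − t/T). The quadratic stand-in loss and the simplified Levy step are illustrative assumptions, not the authors' exact training loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def narrowed_exploitation(x, x_best, t, T, levy_step):
    qf = t ** ((2 * rng.random() - 1) / (1 - T) ** 2)  # quality function QF(t)
    g1 = 2 * rng.random() - 1                          # prey-tracking motion
    g2 = 2 * (1 - t / T)                               # decreases from 2 to 0
    return qf * x_best - g1 * x * rng.random() - g2 * levy_step + rng.random() * g1

def loss(w):                                           # placeholder loss surface
    return np.sum((w - 0.5) ** 2)

pop, T = rng.random((20, 5)), 100
for t in range(1, T + 1):
    best = min(pop, key=loss)                          # current best weights Xbest(t)
    levy_step = 0.01 * rng.standard_normal(5)          # simplified Levy flight step
    for i in range(len(pop)):
        cand = narrowed_exploitation(pop[i], best, t, T, levy_step)
        if loss(cand) < loss(pop[i]):                  # keep only improving weights
            pop[i] = cand
print(loss(min(pop, key=loss)))                        # quality of the tuned weights
```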

4. Experimental Results and Discussion

In our work, we have used four HSI datasets to analyze our proposed CSDCNN-AO technique: the Houston U (HU) dataset [33], Indian Pines (IP) [34], Kennedy Space Center (KSC) [35], and the Salinas Scene (SS) dataset [17]. In the case of the IP dataset, the image size is 145 × 145 pixels. For the KSC dataset, the size is 512 × 614 pixels with 13 ground-truth classes.

4.1. Dataset and Its Description
4.1.1. Houston U (HU) Dataset

The first dataset is GRSS DFC 2013, which measures 349 × 1905 pixels and has 144 bands spanning the wavelength range 380–1050 nm. It was obtained by the National Center for Airborne Laser Mapping (NCALM) over the University of Houston, with a spatial resolution of 2.5 metres. The picture is separated into two halves: a bright section and a dark section. The bright section has 4143 samples, whereas the dark section contains 824 samples.

4.1.2. Indian Pines (IP)

This agricultural dataset was collected in 1992 from Northwest Indiana utilising the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. It has 145 × 145 pixels and 16 vegetation classes with a spatial resolution of 20 m per pixel. After removing 4 zero bands and 20 bands affected by water absorption effects, 200 spectral bands ranging from 400 to 2500 nm at 10 nm intervals were used for analysis.

4.1.3. Kennedy Space Center (KSC)

The AVIRIS instrument in Florida collected the Kennedy Space Center dataset in 1996. It has a resolution of 512 by 614 pixels, 176 bands, and 13 categories.

4.1.4. Salinas Scene (SS) Dataset

A second set of AVIRIS data was used for experiments on the Salinas Scene, collected by the AVIRIS sensor over Salinas Valley, California, USA, with a spatial resolution of 3.7 m per pixel in the wavelength range of 0.4–2.5 μm and a spectral resolution of 10 nm. It measures 512 × 217 × 224 pixels (water absorption bands included).

The comparative performance of the models on the IP dataset across the different classes is evaluated.

In Table 1, we evaluate the classification performance on the Indian Pines Scene dataset in terms of overall accuracy, average accuracy, and the Kappa coefficient. From the results, we can see that our proposed CSDCNN-AO yields better performance than the other techniques. In Table 1, CSDCNN-AO achieves its best result for the 13th class. In the case of CSDCNN, the 8th class achieves the best performance. For SDCNN, the 16th class has the maximum performance. DCNN also attains its maximum performance for the 16th class only. RNN has its maximum performance in the 6th class.

In Figure 3, (a) represents the original image, and we compare the results of the proposed algorithm with other algorithms such as CSDCNN-ALO [36], CSDCNN-PSO [37], CSDCNN-WOA [38], and CSDCNN-GWO [39]. Such optimizers have been applied in many different fields [40–45]. Among these methods, our proposed work yields the maximum performance, since its output is the closest to the original ground truth image.

In Table 2, we evaluate the classification performance on the KSC dataset using the abovementioned measures. From the results, we can see that our proposed CSDCNN-AO yields better performance than the other techniques. In Table 2, CSDCNN-AO achieves its best result for the 10th class. In the case of CSDCNN, the 11th class achieves the best performance. For SDCNN, the 13th class has the maximum performance. DCNN attains its maximum performance for the 8th class. RNN has its maximum performance in the 6th class.

In Figure 4, (a) represents the original image, and we compare the results of the proposed algorithm with CSDCNN-ALO, CSDCNN-PSO, CSDCNN-WOA, and CSDCNN-GWO. Among these methods, our proposed work yields the maximum performance, since the obtained image is the closest to the original ground truth image.

In Table 3, we evaluate the classification performance on the Salinas Scene (SS) dataset. From the results, we can see that our proposed CSDCNN-AO yields better performance than the other techniques. In Table 3, CSDCNN-AO achieves its best result for the 13th class. In the case of CSDCNN, the 16th class achieves the best performance. For SDCNN, the 14th class has the maximum performance. DCNN also attains its maximum performance for the 16th class only. RNN has its maximum performance in the 11th class.

In Figure 5, (a) represents the original image, and we compare the results of the proposed algorithm with CSDCNN-ALO, CSDCNN-PSO, CSDCNN-WOA, and CSDCNN-GWO. Among these methods, our proposed work yields the maximum performance, since the obtained image is the closest to the original ground truth image.

In Table 4, we evaluate the classification performance on the Houston U dataset. From the results, we can see that our proposed CSDCNN-AO yields better performance than the other techniques. In Table 4, CSDCNN-AO achieves its best result for the 11th class. In the case of CSDCNN, the 8th class achieves the best performance. For SDCNN, the 15th class has the maximum performance. DCNN also attains its maximum performance for the 14th class only. RNN has its maximum performance in the 8th class.

In Figure 6, (a) represents the original image, and we compare the outcome of the proposed algorithm with CSDCNN-ALO, CSDCNN-PSO, CSDCNN-WOA, and CSDCNN-GWO. Among these methods, our proposed work yields the maximum performance, since the obtained image is the closest to the original ground truth image.

The input images were obtained from the four datasets, and results were obtained after feature extraction, feature reduction, classification, and loss function optimization. The four datasets taken for testing purposes are the HU, IP, KSC, and SS datasets, and all four have shown promising results in this classification. The results obtained on these datasets (i.e., computational complexity, overall accuracy, and loss functions) are given in the following figures.

The computational complexity attained over various iterations is shown in Figure 7. The use of various optimizations along with the synergic deep CNN has improved the performance of the proposed algorithm. The computational complexity attained by Aquila optimization is much better, as it identifies the optimal solution in a smaller number of iterations, even though the computational complexity grows as the number of iterations increases. Unlike other meta-heuristic algorithms, this optimization algorithm provides satisfactory results for weight parameter selection compared with ALO, WOA, PSO, and GWO. Therefore, the Aquila optimization is adopted in the proposed process.

The overall accuracy comparison on the abovementioned datasets is shown in Figure 8. Among all the datasets, the KSC dataset shows the highest accuracy value. These four efficient datasets are taken for comparison. However, the overall accuracy is evaluated against the coefficient loss, and the comparative overall accuracy degrades as the coefficient loss increases. Our proposed work yields a maximum accuracy of 99.02%, which lags only for increasing coefficient losses.

The loss comparison between the proposed and existing algorithms on the four datasets is shown in Figure 9. Among all the techniques, our proposed CSDCNN-AO shows a lower loss value than the other algorithms. The four efficient existing algorithms taken for comparison are CSDCNN, SDCNN, DCNN, and RNN. The losses shown on all these datasets are found to be much lower than those of the other existing algorithms; especially for the KSC dataset, the obtained losses are very low compared with the other datasets. This is because the proposed technique has enhanced the effectiveness of the classification process.

5. Conclusion

Compressed spatial and spectral characteristics are employed as the key insight to develop a compressed synergic deep convolution neural network with Aquila optimization (CSDCNN-AO) for efficient HSI classification in this study. This combination reduces the learning difficulty through the wavelet concept and reduces the loss function through Aquila optimization. The Aquila optimization approach can minimize a large number of data features without losing their characteristic state, while using less computing time and memory. Our proposed approach is superior to existing deep learning models due to the higher learning ability of our synergic deep learning model based on compressed features. Compared with the other techniques, our proposed approach reaches the maximum level of classification. In addition, the experimental results showed that the loss function does not significantly impact classification accuracy. The outcome also demonstrates that the CSDCNN-AO approach has the highest accuracy on all four datasets. Furthermore, the average accuracy, overall accuracy, and Kappa coefficient are optimal on all datasets. However, the proposed technique lacks optimal performance for certain samples; in future research, this issue will be resolved using a new model.

Data Availability

The datasets can be obtained from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under grant number RGP. 2/252/43.