Abstract

Sensor data analysis is used in many application areas, for example, the Artificial Intelligence of Things (AIoT), and the rapid development of deep neural network learning has widened its range of applications. In this work, we propose a Depth and Width Changeable Deep Kernel Learning-based algorithm for hyperspectral sensing data analysis. Compared with traditional kernel learning-based hyperspectral data classification, the proposed method has two advantages. First, with deep kernel learning, the features are mapped several times in succession and become more discriminative, so deep kernel learning performs better than multiple kernel learning. Second, the method can adjust the network architecture to the hyperspectral data space by solving an optimization equation based on the span bound. Experiments are carried out to verify the feasibility and performance of the algorithm on hyperspectral data analysis, measured by the classification accuracy on hyperspectral data. A comprehensive analysis of the experiments shows that the proposed algorithm is feasible for hyperspectral sensor data analysis and is a promising classification method for data analysis in many other areas.

1. Introduction

With the development of deep learning theory, researchers have shown that deep architectures are feasible and effective in several applications. Deep kernel learning is widely used in image analysis, and the results show that it is an effective framework [1, 2] for tasks including image annotation [3], classification [4], segmentation [5], and anomalous object detection [6]. Previous works present deep multiple kernel learning [7] and a novel objective function [8]; in this deep kernel learning framework, adaptive backpropagation is used to update the coefficients and weights of the network to improve its representation ability [9-12].

In the deep kernel framework, the accuracy and speed of visual recognition are effectively improved by using stochastic gradient descent, which provides a research framework for subsequent deep kernel learning. A multilayer kernel machine based on the deep belief network was proposed in 2012, which describes the path from the support vector machine to deep multilayer structures more completely [12]. From the perspective of manifold learning, Brahma et al. analyzed why the deep neural network improves the performance of kernel learning and gave a relatively complete theoretical framework of deep kernel learning [13]. For the classification problem, Wong et al. proposed a regularized deep Fisher mapping method in 2011: the learning sample space is explicitly mapped to the feature space, and the deep neural network is used to improve the separability of features according to the Fisher criterion [14]. Building on traditional KPCA, research results show that FKNN and DLF effectively improve image classification performance [15]. To solve the clustering problem, a $k$-means algorithm with an RBF kernel was proposed based on the nonlinear kernel and a multilayer architecture [16]; the kernel $k$-means method was also used to detect changes in satellite images [17], realizing the pixel representation of a pseudo training set for the nonlinear clustering partition by deep learning. For the recursive problem, a local deep kernel learning algorithm was proposed to accelerate the prediction of nonlinear support vector machines and to optimize the spatial tree structure [18]. For the problem of deep model selection, Strobl et al. proposed an approach based on deep multiple kernel learning and effectively improved the performance of deep kernel learning by learning multiple layers of multiple kernel functions [8].
Deep kernel learning algorithms have also been adapted to different application backgrounds. Li et al. proposed a data recognition algorithm combining a CNN and multiple kernel learning for emotion recognition, which effectively improved the recognition performance [19]. For tumor classification in ultrasound images, a research framework based on deep multiple kernel learning was proposed [20]. For discourse-level analysis, Poria proposed a data recognition algorithm based on the multiple kernel model and a deep CNN. For protein-protein interaction in bioinformatics, Yang proposed a classification algorithm based on deep kernel learning [21]. Deep kernel learning has been applied to many areas, for example, image recognition [33] and pattern prediction [34]. A deep kernel learning architecture with a variable structure is attractive because its width and depth can be adjusted. The features are extracted with different kernel networks, and the performance differs with different parameters. These parameters are adjusted through training on the data, and the amount of training data differs between applications.

In our recent work [35], a novel multiple kernel learning-based deep neural network algorithm was proposed for data classification. That work presented the algorithm framework, which has an advantage in nonlinear data classification. The algorithm in [35] can be applied to hyperspectral data, but hyperspectral data analysis differs from the general data classification considered in [35], so the algorithm must be optimized for it. In previous work [36], we analyzed the advantages of kernel learning for hyperspectral data analysis, but the algorithm in [36] is a traditional kernel learning framework and performs worse than deep learning. It is therefore worthwhile to improve the performance of kernel learning on hyperspectral data. Similar results were verified on SAR image classification in the recent work [37]. There, many kernel functions are combined with combination weights, whereas with the traditional kernel the features are mapped with only a single projection; however, the structure in [37] is unchangeable, so its performance is limited.

So, motivated by the works [35-37], we propose a novel structure-changeable deep kernel for hyperspectral sensing data analysis. Different from [35], the structure parameters are optimized with an optimization algorithm, so the algorithm can adjust its architecture and achieve better performance on hyperspectral data analysis than the algorithms in [36, 37]. In the framework, the architecture parameters are solved with an optimization equation using the span bound. The performance on hyperspectral sensor data is verified and evaluated for hyperspectral applications.

2. Proposed Variable Structure-Based Deep Learning Network

2.1. Framework

As an excellent method of nonlinear feature extraction and classification, the kernel learning method can be extended to the deep learning network architecture to enhance its multilayer feature extraction ability. Although the deep learning network and the kernel learning algorithm have been combined in the previous work [37], that algorithm cannot adaptively adjust the network structure, so the network does not adapt to heterogeneous data.

For hyperspectral data analysis, let the hyperspectral data be $X = \{x_1, \ldots, x_N\}$, where each $x_i$ is a hyperspectral vector, with the label vector $y = (y_1, \ldots, y_N)$. With the kernel function $k$, the kernel value is calculated as $k(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$, where $\phi(\cdot)$ gives the nonlinear mapping features of the input hyperspectral data, and the kernel matrix of the hyperspectral data is $K = [k(x_i, x_j)]_{i,j=1}^{N}$. The classification criterion function is
$$f(x) = w^{\top}\phi(x) + b,$$
where $w$ is the coefficient vector and $b$ is the bias for the classification criterion $f$. For the hyperspectral data classification application, the parameters of the corresponding optimization equation are chosen with the cross-validation method. With multiple kernels, the algorithm can be defined as
$$k(x_i, x_j) = \sum_{m=1}^{M} \mu_m k_m(x_i, x_j),$$
where $M$ is the number of different kernel functions and $\mu_m$ is the parameter for combining the kernel functions. Then, under the multiple kernel functions, the classification decision criterion is transformed as
$$f(x) = \sum_{i=1}^{N} \alpha_i y_i \sum_{m=1}^{M} \mu_m k_m(x_i, x) + b,$$
where $\alpha_i$ are the dual coefficients.

With this learning model, the combined kernel of each layer in the DWS-MKL algorithm is a linear combination of multiple kernel functions. The general framework is shown in Figure 1, where $M$ is the total number of basis kernel functions and $\mu_m$ is the combination coefficient of the combined kernel function. In the previous work [35], we presented the MKL framework.
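The linear combination of basis kernels described above can be illustrated at the level of Gram matrices. The following is a minimal sketch, not the authors' implementation: the function name `combine_gram`, the toy matrices, and the requirement that coefficients be nonnegative (which keeps the combination positive semidefinite) are assumptions made for illustration.

```python
def combine_gram(grams, coeffs):
    """Return sum_m coeffs[m] * grams[m] for equally sized square Gram matrices.

    Assumption: coefficients are nonnegative so that a combination of
    positive semidefinite matrices stays positive semidefinite.
    """
    if len(grams) != len(coeffs):
        raise ValueError("need exactly one coefficient per Gram matrix")
    if any(c < 0 for c in coeffs):
        raise ValueError("combination coefficients must be nonnegative")
    n = len(grams[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, coeff in zip(grams, coeffs):
        for i in range(n):
            for j in range(n):
                combined[i][j] += coeff * gram[i][j]
    return combined


# Toy example with two hypothetical basis Gram matrices on two samples.
K_lin = [[1.0, 0.0], [0.0, 1.0]]
K_rbf = [[1.0, 0.5], [0.5, 1.0]]
K = combine_gram([K_lin, K_rbf], [0.25, 0.75])
```

In this toy case the off-diagonal entry of `K` is `0.25 * 0.0 + 0.75 * 0.5 = 0.375`, which shows how the coefficients weight the similarity reported by each basis kernel.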

The overall architecture of the DWS-MKL algorithm, with its variable combination of depth and width, is shown in Figure 2. Since the layers of the DWS-MKL algorithm are directly cascaded, the output of the combined kernel function of the previous layer is the input of the base kernel functions of the next layer, and the channels are independent of each other. Therefore, it is easy to adjust the depth and width combination of the algorithm. For the same data set, the classification accuracy of the DWS-MKL algorithm differs under different depth and width combinations. For data sets of different scales, a fixed depth and width combination cannot maximize the classification accuracy on all data sets. Therefore, it is necessary and reasonable to determine the complexity of the model according to the complexity of the problem to be solved. Adjusting the structure of the algorithm flexibly according to the data set is the main feature of the DWS-MKL algorithm. Here, $K_f$ is the combined kernel of the final layer, which lies outside the deep and wide combined structure of the DWS-MKL algorithm, and $\mu$ is the combination coefficient corresponding to each base kernel function.

The DWS-MKL algorithm uses the SVM as a classifier to solve basic data classification problems. The input of the SVM classifier is the output feature of the combined kernel $K_f$. The specific architecture of the classifier is shown in Figure 3. The decision function of the SVM is
$$f(x) = \sum_{i=1}^{N} \alpha_i y_i K_f(x_i, x) + b,$$
where $\alpha_i$ is the dual coefficient and $b$ is the bias of the decision function $f$. The optimization problem of the SVM is
$$\min_{w, b, \xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i,\quad \xi_i \ge 0,$$
where $i = 1, \ldots, N$, $\xi_i$ is the slack variable, $C$ is the regularization coefficient, and $\phi$ is the feature map induced by $K_f$. The basic structure of the DWS-MKL algorithm is a multiple combined kernel, and each combined kernel is a weighted linear combination of multiple base kernel functions. Therefore, the decision function of the algorithm can be written as
$$f(x) = \sum_{i=1}^{N} \alpha_i y_i \sum_{m=1}^{M} \mu_m k_m(x_i, x) + b,$$
where $\mu_m$ is the combination coefficient of the depth and width combination architecture of the DWS-MKL algorithm. Without loss of generality, the decision function of the DWS-MKL algorithm is uniformly written as
$$f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x; \mu) + b,$$
where the combination parameters $\mu$, dual coefficients $\alpha$, and offsets $b$ are learned through the DWS-MKL algorithm.
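As an illustration of how a kernel SVM decision function of this kind is evaluated once training is finished, the sketch below assumes the standard dual form $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$. The names `svm_decision` and `svm_predict` and the toy numbers are hypothetical; the kernel values between the test point and the training points are taken as already computed.

```python
def svm_decision(alphas, labels, kernel_row, bias):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b.

    alphas     -- dual coefficients, one per training sample
    labels     -- training labels in {-1, +1}
    kernel_row -- precomputed kernel values K(x_i, x) for the test point x
    bias       -- offset b of the decision function
    """
    return sum(a * y * k for a, y, k in zip(alphas, labels, kernel_row)) + bias


def svm_predict(alphas, labels, kernel_row, bias):
    """Return the predicted class (+1 or -1) from the sign of f(x)."""
    return 1 if svm_decision(alphas, labels, kernel_row, bias) >= 0 else -1


# Toy example: two support vectors of opposite class.
score = svm_decision([0.5, 0.5], [1, -1], [1.0, 0.25], 0.0)
label = svm_predict([0.5, 0.5], [1, -1], [1.0, 0.25], 0.0)
```

Here the test point is more similar to the positive support vector (kernel value 1.0 against 0.25), so the score is positive and the predicted label is +1.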

2.2. Algorithm

Extending the combined kernel cascade to $L$ layers, the combined kernel function of the $l$-th layer of a deep multiple kernel learning algorithm can be expressed as
$$K^{(l)}(x_i, x_j) = \left\langle \phi^{(l)}\big(\cdots\phi^{(1)}(x_i)\big),\ \phi^{(l)}\big(\cdots\phi^{(1)}(x_j)\big) \right\rangle,$$
where $x_i$ and $x_j$ are the input vectors of the algorithm and $\phi^{(l)}$ is the nonlinear mapping of the kernel function. In addition to cascading multilayer combined kernels, the DWS-MKL algorithm extends the deep structure to multiple channels, and the channels are independent of each other. Finally, the feature output of each channel is input to a combined kernel according to the sum and average rule. The combined kernel formula is as follows:
$$K_f = \frac{1}{W} \sum_{c=1}^{W} K^{(L,c)},$$
where $K_f$ is the combined kernel of the final layer, which lies outside the deep and wide combined structure of the DWS-MKL algorithm, and $W$ is the number of channels. The total number of combined kernels in each layer is determined by the number of independent channels of the algorithm. The combined kernel of the DWS-MKL algorithm in layer $l$ and channel $c$ is defined as
$$K^{(l,c)} = \sum_{m=1}^{M} \mu^{(l,c)}_m k^{(l,c)}_m,$$
where $k^{(l,c)}_m$ is the $m$-th base kernel function of layer $l$ and channel $c$, and $\mu^{(l,c)}_m$ is the combination coefficient corresponding to that base kernel function.
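The cascade of layers and the averaging over channels can be sketched on Gram matrices. This is only an illustrative sketch under stated assumptions, not the DWS-MKL implementation: each layer applies a linear, a polynomial, and an RBF base kernel to the feature space induced by the previous layer, using the identity $\|\phi(x_i) - \phi(x_j)\|^2 = K_{ii} + K_{jj} - 2K_{ij}$ for the RBF kernel. The function names, the kernel hyperparameters, and the use of a plain average over channels are assumptions.

```python
import math


def next_layer(gram, coeffs, degree=2, coef0=1.0, sigma=1.0):
    """Apply one combined-kernel layer to the Gram matrix of the previous layer.

    coeffs holds the combination coefficients for the (linear, polynomial, RBF)
    base kernels, all evaluated in the feature space induced by `gram`.
    """
    n = len(gram)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k = gram[i][j]
            # Squared distance in the feature space induced by the previous layer.
            sq_dist = gram[i][i] + gram[j][j] - 2.0 * k
            base = (k, (k + coef0) ** degree, math.exp(-sq_dist / (2.0 * sigma ** 2)))
            out[i][j] = sum(c * b for c, b in zip(coeffs, base))
    return out


def deep_channel(gram, layer_coeffs):
    """Cascade the layers of one channel; layer_coeffs has one triple per layer."""
    for coeffs in layer_coeffs:
        gram = next_layer(gram, coeffs)
    return gram


def dws_kernel(gram, channels):
    """Average the outputs of independent channels into a final combined kernel."""
    outputs = [deep_channel(gram, layer_coeffs) for layer_coeffs in channels]
    n = len(gram)
    return [[sum(o[i][j] for o in outputs) / len(outputs) for j in range(n)]
            for i in range(n)]


# Depth 1, width 2: one purely linear channel and one purely RBF channel.
input_gram = [[1.0, 0.0], [0.0, 1.0]]
K_f = dws_kernel(input_gram, [[(1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)]])
```

Changing the depth means changing the number of coefficient triples per channel, and changing the width means changing the number of channels, so the same functions cover every depth and width combination.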

There are many choices for the base kernel functions of the combined kernel, such as the linear, polynomial, RBF, Laplace, and sigmoid kernel functions. The base kernel functions of the DWS-MKL algorithm can be chosen according to the specific practical application. In the example tests of the DWS-MKL algorithm in this work, three kernel functions are selected as the base kernel functions of each combined kernel, namely, the linear kernel, the polynomial kernel, and the RBF kernel. The specific formulas are shown in Table 1.
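For concreteness, the three base kernels can be written in their common textbook forms. The parametrization below (polynomial degree and offset, RBF bandwidth `sigma`) is an assumption and may differ from the formulas in Table 1; the function names are hypothetical.

```python
import math


def linear_kernel(x, y):
    """k(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """k(x, y) = (<x, y> + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(samples, kernel):
    """Evaluate a kernel on every pair of samples."""
    return [[kernel(x, y) for y in samples] for x in samples]


# Two toy spectral vectors (hypothetical values).
pixels = [[1.0, 2.0], [3.0, 4.0]]
G = gram_matrix(pixels, linear_kernel)
```

Each of these functions can be passed to `gram_matrix`, so swapping or adding base kernels only changes which callables are used.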

2.3. Procedure

The depth and width of the algorithm are limited to a preset range. We then input the training data for iterative training and use a cross-validation algorithm to determine the penalty coefficient $C$ and the optimal depth $L$ and width $W$. After training, we obtain the SVM classification model and the combination coefficients of the variable depth and width architecture of the algorithm. We denote the training set by $D$ and the learning rate by $\gamma$.

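Input: training set $D$, penalty coefficient $C$, depth $L$, width $W$, learning rate $\gamma$, number of iterations $N$.
Output: parameter matrix $\mu$.
Initialize $\mu$, compute the combined kernel $K_f$;
for $t = 1$ to $N$ do
  compute the optimal vector $\alpha$ with $K_f$;
  for $l = 1$ to $L$ do
    for $c = 1$ to $W$ do
      compute the gradient terms for $\mu^{(l,c)}$;
      update $\mu^{(l,c)}$;
    end
  end
end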
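The cross-validation step that selects the penalty coefficient, depth, and width can be sketched as a grid search. This is a generic sketch rather than the authors' procedure: the function names are hypothetical, the folds are built by simple interleaving without shuffling, and `fit_and_score` stands for a user-supplied routine (assumed here) that trains the model on the training indices and returns the accuracy on the held-out indices.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for held_out in range(k):
        train = [idx for f, fold in enumerate(folds) if f != held_out for idx in fold]
        yield train, folds[held_out]


def select_architecture(n_samples, depths, widths, penalties, fit_and_score, k=5):
    """Return (mean_score, depth, width, penalty) with the best mean fold score."""
    best = None
    for depth, width, penalty in product(depths, widths, penalties):
        scores = [fit_and_score(train, test, depth, width, penalty)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, depth, width, penalty)
    return best


def dummy_score(train, test, depth, width, penalty):
    """Stand-in scorer for illustration only; a real one would train and test a model."""
    return 1.0 - 0.1 * abs(depth - 2) - 0.1 * abs(width - 3) - 0.01 * abs(penalty - 10)


best = select_architecture(10, [1, 2, 3], [1, 2, 3], [1, 10, 100], dummy_score)
```

With the stand-in scorer, the search returns depth 2, width 3, and penalty 10. In practice, the cost grows with the product of the three grid sizes and the number of folds, which is one reason to bound the depth and width in advance.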

3. Experiments

3.1. Algorithm Feasibility Evaluation

First, we evaluate the feasibility of deep kernel learning on the MSTAR data.

In the experiment, the same SAR image targets as in the deep belief network experiment are used as the verification objects, and ten kinds of military vehicle targets under different angles are selected as the data set. The original images in the MSTAR data set are first preprocessed and cropped to the same size. Then, 1000 images are selected as the training set and 1000 images as the test set. To test the influence of the number of samples on the results of feature extraction, a comparative experiment is carried out with 400 training images and 400 test images, with all other conditions unchanged.

Different data sets are constructed to verify the effect of feature extraction in the two modes, and the target classification results after feature extraction by deep kernel mapping are compared with other common methods. First, the classification task is used to verify the effect of feature extraction. A standard SVM supports only binary classification. Therefore, this part combines the different categories of vehicle targets in pairs, constructs data sets according to these combinations, and verifies the effect of feature extraction by the deep kernel mapping structure in classification under two different data set sizes. The results are shown in Table 2.
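The pairwise construction of binary data sets can be sketched as follows. The function name and the class labels are hypothetical placeholders, not the actual MSTAR categories, and the sketch enumerates every pair of classes, whereas the experiment reports only a selection of pairs.

```python
from itertools import combinations


def pairwise_binary_tasks(samples, labels):
    """Build one binary data set per pair of classes.

    Returns a dict mapping (class_a, class_b) to a list of (sample, target),
    where target is +1 for class_a and -1 for class_b.
    """
    classes = sorted(set(labels))
    tasks = {}
    for class_a, class_b in combinations(classes, 2):
        tasks[(class_a, class_b)] = [
            (sample, 1 if label == class_a else -1)
            for sample, label in zip(samples, labels)
            if label in (class_a, class_b)
        ]
    return tasks


# Toy example with placeholder image identifiers and class names.
images = ["img0", "img1", "img2", "img3"]
targets = ["A", "B", "C", "A"]
tasks = pairwise_binary_tasks(images, targets)
```

For ten classes this construction yields 45 class pairs, so a subset of pairs is a practical choice when only representative binary tasks are needed.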

Through five groups of binary classification tasks, the feature extraction results of deep kernel mapping are verified for different objects and data sets of different sizes. The performance of the same group of network parameters differs slightly between targets, but the overall accuracy improves as the structure becomes deeper. Increasing the amount of data in the data set also improves the classification accuracy, but the calculation time increases accordingly. Therefore, follow-up research needs to balance operation speed and algorithm performance in actual applications when the network structure is optimized and adjusted. After verifying the classification results, some classification methods are implemented to evaluate the algorithms, and some results are shown in Table 3.

The detection accuracy differs among the five kinds of targets, which shows that each category differs from the other categories to a different degree and that the features extracted from different targets perform differently. However, the detection performance improves as the structure depth increases. This shows that the deep structure can improve the ability of kernel mapping in remote sensing image feature extraction, but for specific targets and detection tasks, the structural parameters still need to be optimized to improve performance. Under the same computing resources and data objects, we compare the classification performance of common target feature extraction methods in the target classification task. The support vector machine with the RBF kernel is selected for single kernel mapping, a four-layer network structure with good results is selected for the deep belief network, and the common AlexNet model is selected for the convolutional network. The classification results of each method are shown in Table 4.

Comparing the classification results of feature extraction based on deep kernel mapping with those of the other classification methods shows that feature extraction by deep multiple kernel mapping performs best. Compared with the common single kernel mapping algorithm, the accuracy is greatly improved, which shows that the deep structure can improve the feature extraction performance of the kernel mapping algorithm. In this paper, the better performance is achieved on the basis of the deep structure, the existing loss function, and the number and type of the kernel functions. So, the mapping structure is effective when combined with the parameter optimization algorithm.

Second, we evaluate the method on UCI data sets to test the performance with different depths and widths. Seven public data sets from the UCI benchmark are chosen for the binary and multiclass classification tasks. For comparison, we have implemented other algorithms, including EasyMKL [22], SimpleMKL [23], SM1MKL [24], and L2MKL [25, 26]. These algorithms use the optimal parameters given in the references, and the 5-fold cross-validation method is used to choose the parameters for our algorithm. Some results are shown in Table 5.

From a statistical point of view, 8 of the frameworks in our method use a width of 3, regardless of the depth. This indicates that a wider structure can improve the performance of the framework when combined with an appropriate depth. In summary, DWS-MKL outperforms the MKL algorithms and improves the classification performance.

3.2. Performance on the Hyperspectral Data Classification

In this paper, we carry out experiments on the Indian Pines data set and the Pavia University data set, which are the same as in our previous work [36] for comparison. The Kappa coefficient (KC) and overall accuracy (OA) are used to evaluate the algorithms. Ten basis kernels are used for learning, and [0.01, 2] is the range of the scale parameter $\sigma$. For the classifier, a standard multiclass SVM is used for classification, and the optimal parameters are chosen with the cross-validation method. The original spectral features are reduced with the PCA algorithm. For comparison, 15 other methods are also implemented on the two hyperspectral databases, including RBF-SKL [27], POLY-SKL [27], Mahal-RBF-SKL [31], Mahal-Poly-SKL [31], SK-CV-RBF-SKL [30], SK-POLY-SKL [30], EasyMKL [22], SimpleMKL [23], SM1MKL [24], L2MKL [25, 26], NMF-MKL [32], KNMF-MKL [32], Euclidean-MKL1 [28], Euclidean-MKL2 [29], and QMKL [36]. A detailed description of these algorithms can be found in [36].
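The two evaluation measures can be computed from a confusion matrix. The sketch below uses the standard definitions, with overall accuracy as the fraction of correctly classified samples and the Kappa coefficient as $(p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. The function names and the toy labels are illustrative assumptions.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """OA: correctly classified samples divided by all samples."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def kappa_coefficient(matrix):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e); undefined when p_e equals 1."""
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    p_e = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / total ** 2
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with two classes and four test pixels.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], classes=[0, 1])
oa = overall_accuracy(cm)
kc = kappa_coefficient(cm)
```

In the toy example, three of four pixels are correct, so OA is 0.75, while the chance agreement is 0.5 and the Kappa coefficient is 0.5. Kappa is therefore the stricter measure when the class distribution is unbalanced.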

As shown in Table 6, the proposed method achieves the highest recognition rate among the kernel learning methods compared. The results show that the deep structure can effectively improve the discrimination ability on the data. Among the compared algorithms, including single kernel learning, multiple kernel learning, and variable structure kernel learning, the variable structure deep kernel learning proposed in this paper achieves the best results, which shows that the algorithm is feasible. The data sets used in the experiment were collected in practice, which gives the results good practical relevance and indicates that the proposed method can be used in practical applications.

3.3. Discussion

From the above experimental results, we can see that the deep kernel learning framework is effective for classification. The results on the hyperspectral data sets show that the proposed variable-structure deep kernel learning architecture is also suitable for practical data classification problems and can improve classification accuracy. In addition, the method requires a small amount of computation and a small number of network parameters, which makes it especially suitable for application scenarios with limited computing resources. The three kinds of data used in the paper, namely UCI data, radar data, and hyperspectral data, each have their own characteristics and are representative of practical applications, so the algorithm can be expected to perform correspondingly in practice. Moreover, the proposed algorithm can be implemented on platforms with limited computing ability, for example, DSP and FPGA, so it can be used in many practical computing applications.

4. Conclusion

In this paper, we propose a novel deep kernel learning method for hyperspectral data classification, in which a gradient projection solver is used to minimize the error of the classifier. The resulting Depth and Width Changeable Deep Kernel Learning-based algorithm is applied to hyperspectral sensing data analysis. Experiments verify the feasibility and performance of the algorithm and show that the proposed method has advantages in hyperspectral data classification. In future research, the computing efficiency should be improved so that the method can be used in video recognition and image processing on platforms with small computing ability.

Data Availability

We have not used specific data from other sources for the simulations of the results. The two popular hyperspectral data sets used in this paper, Indian Pines and Pavia University, can be downloaded freely from the website: http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes. The proposed algorithm is implemented in Python.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

We would like to thank Dr. Li Li, Prof. Junbao Li, and Prof. Yanfeng Gu for providing the kernel-based learning programs of their papers for comparison in the experiments. This work is supported by the National Science Foundation of China under Grant No. 61871142.