
Hamidullah Binol, "Ensemble Learning Based Multiple Kernel Principal Component Analysis for Dimensionality Reduction and Classification of Hyperspectral Imagery", Mathematical Problems in Engineering, vol. 2018, Article ID 9632569, 14 pages, 2018. https://doi.org/10.1155/2018/9632569

Ensemble Learning Based Multiple Kernel Principal Component Analysis for Dimensionality Reduction and Classification of Hyperspectral Imagery

Academic Editor: Mustansar A. Ghazanfar
Received: 09 Apr 2018
Accepted: 09 Aug 2018
Published: 06 Sep 2018

Abstract

Classification is one of the most challenging tasks in remotely sensed data processing, particularly for hyperspectral imaging (HSI). Dimension reduction is widely applied as a preprocessing step for classification; however, reducing the dimension with conventional methods does not always guarantee a high classification rate. Principal component analysis (PCA) and its nonlinear version, kernel PCA (KPCA), are traditional dimension reduction algorithms. In a previous work, a variant of KPCA, denoted Adaptive KPCA (A-KPCA), was suggested to obtain a robust unsupervised feature representation for HSI. That technique employs several KPCAs simultaneously, each built on a different candidate kernel, to obtain better feature points. Nevertheless, A-KPCA neglects the individual influence of the subkernels by employing an unweighted combination. Furthermore, if there is at least one weak kernel in the set, the classification performance may be reduced significantly. To address these problems, in this paper we propose an Ensemble Learning (EL) based multiple kernel PCA (M-KPCA) strategy. M-KPCA constructs a weighted combination of kernels with high discriminative ability from a predetermined set of base kernels and then extracts features in an unsupervised fashion. Experiments on two different AVIRIS hyperspectral data sets show that the proposed algorithm achieves a satisfactory feature extraction performance on real data.

1. Introduction

Hyperspectral imaging (HSI) simultaneously provides spatial and high-resolution spectral data and helps classify and recognize materials that are challenging to discriminate with conventional imaging techniques [1]. However, it suffers from the curse of dimensionality, which increases the cost of storage, transmission, and processing of hyperspectral images. To overcome such challenges, dimensionality reduction techniques have been applied to hyperspectral data in the existing literature [2]. In general, HSI exhibits spectral redundancy across many spectral channels. For this reason, dimension reduction or compression is possible, and even necessary, especially for these bands.

Even though there are several dimension reduction approaches in the literature, including manifold learning [3, 4] and tensor-based methods [5], principal component analysis (PCA) [6] remains among the most popular techniques [7-9]. PCA is the discrete form of the continuous Karhunen-Loève Transform; it projects the data into a subspace so that the retained variance is maximized and the least-squares reconstruction error is minimized [10]. Using PCA for dimensionality reduction in HSI is computationally convenient and preserves most of the variance of the raw data. Although PCA has some theoretical inadequacies [11, 12] for use on remote sensing data, particularly hyperspectral images [13], practical applications show that the results obtained with PCA are still competitive for classification [14, 15]. The ability of PCA is nevertheless limited for high-dimensional data since it relies only on second-order statistical information. The nonlinear version of PCA, denoted kernel PCA (KPCA), has been proposed to overcome these limitations [16].

Since KPCA involves higher-order statistics, it extracts more information from the original data [17] and is therefore employed in many applications, including remote sensing, owing to its satisfactory performance. In [18], an artificial neural network trained on kernel principal components was demonstrated to outperform the classical approach. Fauvel et al. [19] showed that KPCA is better than classical PCA in terms of classification accuracy. A general overview of feature reduction techniques for classification of hyperspectral images is presented in [9], with comparative experiments between unsupervised techniques, e.g., PCA and KPCA, and supervised techniques, e.g., double nearest proportion (DNP) [20] and kernel nonparametric weighted feature extraction (KNWFE) [21]. Since supervised learning techniques generally focus on improving class separability, these methods are expected to produce better classification performance. The comparative results with KNWFE nevertheless indicate that PCA and KPCA are still preferable for reducing the dimensionality of hyperspectral images.

Fundamentally, KPCA is a version of PCA whose performance is greatly affected by the choice of the kernel and its parameters. Namely, the selection of the optimal kernel and parameters is crucial for KPCA to achieve good performance. However, application results show that no single kernel function is best for all kinds of machine learning problems [22]; therefore, learning optimal kernels over a kernel set is an active research area [23-27]. Li and Yang presented an ensemble KPCA method with a Bayesian inference strategy in [28]; they exploited only Gaussian radial basis function (RBF) kernels with different scale parameters as subkernels. Zhang et al. [29] developed a method for unsupervised kernel learning in KPCA, dubbed A-KPCA, and applied it to object recognition problems.

A-KPCA learns the kernels via an unsupervised learning approach. The 1D input vectors, e.g., feature vectors, are transformed into 2D feature matrices by different kernels; each column of a feature matrix comes from the corresponding 1D input vector under one of the mappings. Nonlinear feature extraction (FE) is obtained from one set of projective vectors corresponding to the column direction of the feature matrices, while the set of projective vectors corresponding to the row direction is simultaneously utilized to search for an optimal kernel combination. Despite having superior performance compared to KPCA, A-KPCA has some critical limitations. Specifically, A-KPCA works completely unsupervised and is thus incapable of enhancing class separability, and it has no kernel preselection process. These are the main motivations of our work.

In this paper, a novel framework is introduced for hyperspectral FE and classification based on multiple KPCA models with an Ensemble Learning (EL) strategy in a semisupervised manner. EL is the process of combining multiple models, called experts, to set up a strong model for a specific machine learning problem [30]. Strong discriminative ability of the individual experts and high diversity among them are required to produce satisfactory models [31, 32]. An acceptable classification performance highly depends on the class separability of the features, which is directly related to the discriminative ability. Inspired by EL, we extend the A-KPCA method by employing multiple kernels such that subkernels possessing higher discrimination ability are highlighted. The proposed approach, multiple kernel PCA (M-KPCA), learns an ensemble of multiple kernel principal components on an available labeled data set, and the final features are extracted via a weighted combination of all subkernels according to their separability performance. The first purpose of this paper is to utilize KPCA and A-KPCA on hyperspectral images and to determine the impact of using nonlinear versions of PCA on classification performance. The further contributions and novelties of this paper can be summarized as follows: (1) a novel multikernel PCA strategy is presented by exploiting Ensemble Learning to evaluate and select the kernels; (2) M-KPCA achieves superior classification results compared to PCA, KPCA, and A-KPCA by highlighting the subkernels with a class separability based weighting strategy; (3) M-KPCA produces better or competitive classification performance relative to other popular unsupervised FE methods such as locality preserving projections (LPP) [33], random projections (RP) [34], and t-distributed stochastic neighbor embedding (t-SNE) [35]. After FE with all of the mentioned methods, the popular and robust support vector machine (SVM) classifier is used for supervised classification. Since SVMs consider samples close to the class boundary, called support vectors, they show great performance even on high-dimensional data with small training samples [36, 37].

The paper is outlined as follows. Section 2 reviews the related work. In Section 3, the proposed M-KPCA framework is presented. Next, a series of experiments is carried out on real data sets to verify the effectiveness of our method in Section 4. Finally, Section 5 concludes the paper.

2. Related Work

2.1. KPCA Background

The raw data is projected into the feature space by a nonlinear mapping function, and the useful information is concentrated in the principal components corresponding to the larger eigenvalues [19]. Define a learning set as $X = \{\mathbf{x}_i\}_{i=1}^{n}$, $\mathbf{x}_i \in \mathbb{R}^{d}$. Let $\phi : \mathbb{R}^{d} \to \mathcal{F}$ be a nonlinear mapping from the input space to a high-dimensional feature space $\mathcal{F}$. The inner product in feature space is calculated by the kernel function in the original input space:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^{T} \phi(\mathbf{x}_j), \quad (1)$$

where the superscript $T$ represents the transpose operation. Denote $\phi_i = \phi(\mathbf{x}_i)$ and $\Phi = [\phi_1, \phi_2, \ldots, \phi_n]$. Assuming $\sum_{i=1}^{n} \phi_i = 0$, i.e., the data are centered in $\mathcal{F}$, the total scatter matrix can be defined as $S_t = \Phi \Phi^{T} = \sum_{i=1}^{n} \phi_i \phi_i^{T}$. To compute the projective vector $\mathbf{w}$ of the optimal solution, KPCA employs the following criterion:

$$\mathbf{w}^{*} = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}^{T} S_t \mathbf{w}. \quad (2)$$

Computation of the optimal projective vector amounts to solving the eigenvalue problem $S_t \mathbf{w} = \lambda \mathbf{w}$, in which $\lambda \geq 0$ and the eigenvectors $\mathbf{w} \in \mathcal{F}$. Since every solution lies in the span of $\phi_1, \ldots, \phi_n$, the projective vector can be expanded as $\mathbf{w} = \Phi \boldsymbol{\alpha}$. Hence, (2) can be rewritten as an equivalent problem:

$$K \boldsymbol{\alpha} = \lambda \boldsymbol{\alpha}, \quad (3)$$

where $K = \Phi^{T} \Phi$, with entries $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, is the kernel matrix. The solutions of (3) are the eigenvectors $\boldsymbol{\alpha}$ corresponding to the largest eigenvalues; $\mathbf{w} = \Phi \boldsymbol{\alpha}$ is then the solution vector of (2). Like any kernel method, KPCA-based FE does not require the nonlinear mapping explicitly; it only needs a kernel function in the input space. To obtain better performance with KPCA, the parameters of the kernel are optimized. However, this optimization cannot produce adequate solutions for every application or data set because of the nature of the kernel itself [22]. To overcome this drawback, an adaptive kernel combination technique was introduced in [29].
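
For illustration, the following is a minimal NumPy sketch of KPCA as summarized by (1)-(3), using an RBF kernel; the function and variable names are illustrative and are not taken from the toolboxes referenced later in the paper.

import numpy as np

def rbf_kernel(X, Y, sigma):
    # pairwise RBF (Gaussian) kernel matrix between the rows of X and Y
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def kpca_fit(K, n_components):
    # K: (n x n) training Gram matrix; returns the expansion coefficients alpha of Eq. (3)
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n       # center the data in feature space
    vals, vecs = np.linalg.eigh(Kc)                          # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] / np.sqrt(np.maximum(vals[idx], 1e-12))  # scale so that ||w|| = 1

def kpca_transform(K_train, K_new, alphas):
    # K_new: (m x n) kernel values between new samples and the training samples
    n = K_train.shape[0]
    one_n = np.ones((n, n)) / n
    one_m = np.ones((K_new.shape[0], n)) / n
    K_new_c = K_new - one_m @ K_train - K_new @ one_n + one_m @ K_train @ one_n
    return K_new_c @ alphas                                  # kernel principal components

# usage sketch: K = rbf_kernel(X_tr, X_tr, sigma); A = kpca_fit(K, 30)
#               F_te = kpca_transform(K, rbf_kernel(X_te, X_tr, sigma), A)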

2.2. Adaptive KPCA (A-KPCA)

As pointed out in Section 1, the performance of KPCA is notably affected by the selection of the kernel and its parameters; therefore, it needs some extensions. Let $\{\phi_m\}_{m=1}^{M}$ be a set of nonlinear mappings, $\phi_m : \mathbb{R}^{d} \to \mathcal{H}_m$. As mentioned in Section 2.1, the inner products in the spaces $\mathcal{H}_m$ are described by the kernels $k_m$. Using the definition of the $\phi_m$'s, the combined mapping $\psi(\mathbf{x}) = [\phi_1(\mathbf{x})^{T}, \ldots, \phi_M(\mathbf{x})^{T}]^{T} \in \mathcal{H}$ can be written. In this equation, $\mathcal{H} = \mathcal{H}_1 \oplus \cdots \oplus \mathcal{H}_M$ is the Hilbert space formed as the direct sum of the $\mathcal{H}_m$'s, and the inner product in $\mathcal{H}$ can be defined as

$$\langle \psi(\mathbf{x}_i), \psi(\mathbf{x}_j) \rangle_{\mathcal{H}} = \sum_{m=1}^{M} \langle \phi_m(\mathbf{x}_i), \phi_m(\mathbf{x}_j) \rangle_{\mathcal{H}_m} = \sum_{m=1}^{M} k_m(\mathbf{x}_i, \mathbf{x}_j). \quad (4)$$

To construct a 2D feature matrix, a sample $\mathbf{x}_i$ of the learning set is transformed into the high-dimensional feature spaces, and then

$$\Phi(\mathbf{x}_i) = [\phi_1(\mathbf{x}_i), \phi_2(\mathbf{x}_i), \ldots, \phi_M(\mathbf{x}_i)] \quad (5)$$

is obtained. Here, each column of $\Phi(\mathbf{x}_i)$ corresponds to a nonlinear mapping generated by one of the $\phi_m$'s; thus, the vector-based data is converted into a matrix-based format. Assuming the $\phi_m$'s have zero means, i.e., $\sum_{i=1}^{n} \phi_m(\mathbf{x}_i) = 0$, an optimization criterion can be written over the $n$ generated feature matrices. Appropriate $L$ and $N$ matrices must be determined to optimize

$$\max_{L, N} \sum_{i=1}^{n} \| L^{T} \Phi(\mathbf{x}_i) N \|_{F}^{2}, \quad (6)$$

where the columns of $L$ are projective vectors corresponding to the column direction of the $\Phi(\mathbf{x}_i)$'s, while the columns of $N$ correspond to their row direction. The purpose of $L$ is to extract features, while the purpose of $N$ is kernel selection. In other words, unsupervised kernel learning and nonlinear FE are simultaneously realized through the projective vectors contained in $L$ and $N$. $\|\cdot\|_{F}$ is the Frobenius norm of a matrix, i.e., $\|A\|_{F}^{2} = \mathrm{tr}(A^{T} A)$, where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix; the objective can therefore be written as $\sum_{i} \mathrm{tr}(N^{T} \Phi(\mathbf{x}_i)^{T} L L^{T} \Phi(\mathbf{x}_i) N)$. Since the size of the original $\Phi(\mathbf{x}_i)$ is very large, the columns of $L$ are expanded over the mapped training samples (the kernel trick), so that only kernel evaluations are required. Hence, a kernelized criterion is obtained. These calculations allow us to rewrite (6) as (7):

$$\max_{L, N} \sum_{i=1}^{n} \| L^{T} \tilde{K}_i N \|_{F}^{2}, \quad (7)$$

where the constraints of (7) are $L^{T} L = I_q$ and $N^{T} N = I_p$. Here $\tilde{K}_i$ is an $n \times M$ kernel matrix, and it is constructed as follows:

$$\tilde{K}_i = \begin{bmatrix} k_1(\mathbf{x}_1, \mathbf{x}_i) & \cdots & k_M(\mathbf{x}_1, \mathbf{x}_i) \\ \vdots & \ddots & \vdots \\ k_1(\mathbf{x}_n, \mathbf{x}_i) & \cdots & k_M(\mathbf{x}_n, \mathbf{x}_i) \end{bmatrix}. \quad (8)$$

To solve this optimization problem, inspired by Ye’s work [38], an iterative procedure is presented by the following theorem [29].

Theorem 1. Let $L$ and $N$ be the optimal solution to (7); then (i) $L$ consists of the eigenvectors corresponding to the $q$ largest eigenvalues of the matrix $\sum_{i=1}^{n} \tilde{K}_i N N^{T} \tilde{K}_i^{T}$ for a given $N$; (ii) $N$ consists of the eigenvectors corresponding to the $p$ largest eigenvalues of the matrix $\sum_{i=1}^{n} \tilde{K}_i^{T} L L^{T} \tilde{K}_i$ for a given $L$.

After computing $L$ and $N$, these matrices can be used to extract the nonlinear features of a test instance $\mathbf{x}_t$. The kernel matrix $\tilde{K}_t$ is constructed as in (8) and then projected according to $L$ and $N$, so the nonlinear features are contained in $L^{T} \tilde{K}_t N$. The A-KPCA method is given in Algorithm 1.

Input: Given training set $X = \{\mathbf{x}_i\}_{i=1}^{n}$.
(a) Create the kernel matrix $\tilde{K}_i$ for each $\mathbf{x}_i$ as in (8).
(b) Get initial $L_0$ and $N_0$.
(c) For the given $N$, calculate the eigenvectors of $\sum_{i=1}^{n} \tilde{K}_i N N^{T} \tilde{K}_i^{T}$ corresponding to the $q$ largest eigenvalues to form $L$.
(d) For the given $L$, calculate the eigenvectors of $\sum_{i=1}^{n} \tilde{K}_i^{T} L L^{T} \tilde{K}_i$ corresponding to the $p$ largest eigenvalues to form $N$.
(e) Go to step (c) until convergence.
Output: $L$ and $N$.
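
A compact NumPy sketch of this alternating procedure is given below. It assumes the per-sample kernel matrix collects the i-th columns of the M subkernel Gram matrices, as in the reconstruction of (8) above; the initialization, the fixed iteration count, and the helper names are illustrative rather than the authors' reference implementation.

import numpy as np

def a_kpca(kernel_mats, q, p, n_iter=10):
    # kernel_mats: list of M (n x n) subkernel Gram matrices; returns L (n x q) and N (M x p)
    n = kernel_mats[0].shape[0]
    M = len(kernel_mats)
    # per-sample feature matrices: column m is the i-th column of the m-th Gram matrix
    B = [np.column_stack([K[:, i] for K in kernel_mats]) for i in range(n)]
    N = np.eye(M)[:, :p]                                   # simple initial N
    for _ in range(n_iter):                                # alternate until (approximate) convergence
        S_L = sum(Bi @ N @ N.T @ Bi.T for Bi in B)         # column-direction scatter for fixed N
        vals, vecs = np.linalg.eigh(S_L)
        L = vecs[:, np.argsort(vals)[::-1][:q]]
        S_N = sum(Bi.T @ L @ L.T @ Bi for Bi in B)         # row-direction scatter for fixed L
        vals, vecs = np.linalg.eigh(S_N)
        N = vecs[:, np.argsort(vals)[::-1][:p]]
    return L, N

def a_kpca_features(L, N, B_test):
    # nonlinear features of a test instance with kernel matrix B_test (n x M)
    return L.T @ B_test @ N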

3. Multiple Kernel PCA (M-KPCA)

In Section 2, we have seen that A-KPCA manipulates more than one subkernel. Each mapping rule transforms the input data samples into its corresponding reproducing kernel Hilbert space. Each kernel thus acquires a particular type of information from a given data set, thereby providing a partial description, or view, of the data. The value of this specific information may vary according to the machine learning task, such as classification, clustering, or dimensionality reduction. For instance, in a classification problem, kernels with high discrimination ability yield better results. Hence, we add this capability to A-KPCA using ideas from EL. Our proposed technique learns a new representation for a hyperspectral image exploiting all available training data; it is thus independent of the classifier.

As seen in formulation (8) and Theorem 1, there are no coefficients to quantify the contribution of the subkernels to classification; in other words, A-KPCA utilizes an unweighted summation. Nevertheless, the discriminative ability of the kernels used for FE plays a significant role in the separability achieved by the classifier. If we add a weighting coefficient $\mu_m \geq 0$ for the $m$-th subkernel on the right side of (8), then it becomes

$$\tilde{K}_i^{\mu} = \begin{bmatrix} \mu_1 k_1(\mathbf{x}_1, \mathbf{x}_i) & \cdots & \mu_M k_M(\mathbf{x}_1, \mathbf{x}_i) \\ \vdots & \ddots & \vdots \\ \mu_1 k_1(\mathbf{x}_n, \mathbf{x}_i) & \cdots & \mu_M k_M(\mathbf{x}_n, \mathbf{x}_i) \end{bmatrix}. \quad (9)$$

The discriminative ability of a kernel can be measured against an ideal kernel in a given classification task. Cristianini et al. [23] introduced kernel alignment (KA), a measure of similarity between two arbitrary kernels or between a kernel and an ideal kernel. The alignment between two regular kernels is given as

$$A(K_1, K_2) = \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F \langle K_2, K_2 \rangle_F}}, \quad (10)$$

where the Frobenius product of two Gram matrices $K_1$ and $K_2$ is defined as $\langle K_1, K_2 \rangle_F = \sum_{i,j} K_1(\mathbf{x}_i, \mathbf{x}_j) K_2(\mathbf{x}_i, \mathbf{x}_j)$ [23, 39]. This measure can be viewed as the cosine of the angle between $K_1$ and $K_2$, so it fluctuates between $-1$ and $1$ for arbitrary matrices. However, since we consider only positive semidefinite Gram matrices in KA, the score is lower bounded by zero. The alignment can also be adopted to capture the degree of agreement between a kernel and the target label matrix, also considered as the ideal kernel. A larger value of KA indicates a higher discriminative ability, which is one of the main strengths of a subclassifier for improving the ensemble effect in an EL strategy [40, 41]. An idealized kernel for a binary classification problem can be composed of the dot product of the target labels, i.e., $K^{*} = \mathbf{y}\mathbf{y}^{T}$ with $\mathbf{y} \in \{-1, +1\}^{n}$, and the alignment between a kernel $K$ and the ideal kernel is written as

$$A(K, \mathbf{y}\mathbf{y}^{T}) = \frac{\langle K, \mathbf{y}\mathbf{y}^{T} \rangle_F}{\sqrt{\langle K, K \rangle_F \langle \mathbf{y}\mathbf{y}^{T}, \mathbf{y}\mathbf{y}^{T} \rangle_F}} = \frac{\mathbf{y}^{T} K \mathbf{y}}{n \sqrt{\langle K, K \rangle_F}}. \quad (11)$$
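
A short NumPy sketch of the alignment computations in (10) and (11) follows; the labels are assumed to be coded as -1/+1 for the binary case.

import numpy as np

def kernel_alignment(K1, K2):
    # empirical alignment between two Gram matrices, Eq. (10)
    frob = lambda A, B: np.sum(A * B)            # Frobenius inner product
    return frob(K1, K2) / np.sqrt(frob(K1, K1) * frob(K2, K2))

def target_alignment(K, y):
    # alignment of K with the ideal kernel y y^T for labels y in {-1, +1}, Eq. (11)
    return kernel_alignment(K, np.outer(y, y))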

Our goal is to construct an A-KPCA based algorithm with improved separability of multiclass patterns. Here, a kernel class separability (KCS) measure based on scatter matrices is employed to quantify the class separability of the training samples in feature space. The KCS is a general form of KA and can be written as [42]

$$J = \frac{\mathrm{tr}(S_b^{\phi})}{\mathrm{tr}(S_w^{\phi})}, \quad (12)$$

where $S_b^{\phi}$ and $S_w^{\phi}$, respectively, stand for the between-class scatter matrix and the within-class scatter matrix in the kernel-induced feature space, and their traces are obtained as

$$\mathrm{tr}(S_b^{\phi}) = \sum_{c=1}^{C} n_c \| \mathbf{m}_c^{\phi} - \mathbf{m}^{\phi} \|^2, \qquad \mathrm{tr}(S_w^{\phi}) = \sum_{c=1}^{C} \sum_{i=1}^{n_c} \| \phi(\mathbf{x}_i^{c}) - \mathbf{m}_c^{\phi} \|^2, \quad (13)$$

where $n_c$ denotes the number of training samples in the $c$-th class, $c = 1, \ldots, C$, and $\mathbf{x}_i^{c}$ is the $i$-th sample in the related class. $\mathbf{m}_c^{\phi}$ and $\mathbf{m}^{\phi}$ are the mean vector of the $c$-th class and the mean vector of all training samples in feature space, respectively. $\phi$ is the mapping function from the input space to the feature space, as described at the beginning of Section 2.1. A larger value of $J$ signifies superior class separability on the training set. A maximization problem may thus be formulated to obtain optimal kernels and their parameters or to eliminate weak kernels [43]; in this paper, however, we directly exploit the value of (12) as the measure of discriminability; hence

$$\mu_m = \frac{J_m}{\sum_{l=1}^{M} J_l}, \quad m = 1, \ldots, M, \quad (14)$$

where $J_m$ is the KCS value (12) computed with the $m$-th subkernel and $\sum_{m=1}^{M} \mu_m = 1$.
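
Both traces in (13) can be evaluated from kernel entries alone, which makes the measure inexpensive to compute for each subkernel. The sketch below computes the KCS value and the weights normalized to sum to one, as in (14).

import numpy as np

def kernel_class_separability(K, y):
    # tr(S_b)/tr(S_w) in the kernel-induced feature space, from the Gram matrix K and labels y
    classes = np.unique(y)
    tr_Sb, tr_Sw = 0.0, 0.0
    for c in classes:
        idx = np.where(y == c)[0]
        Kcc = K[np.ix_(idx, idx)]
        class_mean_term = Kcc.sum() / len(idx)   # n_c * ||m_c||^2 via kernel entries
        tr_Sw += np.trace(Kcc) - class_mean_term
        tr_Sb += class_mean_term
    tr_Sb -= K.sum() / K.shape[0]                # subtract n * ||m||^2 (global mean term)
    return tr_Sb / tr_Sw

def kernel_weights(kernel_mats, y):
    # normalized class-separability weights over a list of subkernel Gram matrices, Eq. (14)
    J = np.array([kernel_class_separability(K, y) for K in kernel_mats])
    return J / J.sum()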

Consequently, we extend A-KPCA into a noniterative algorithm by using the kernel class separability measure within a semisupervised strategy. The proposed noniterative M-KPCA technique is given in Algorithm 2.

Input: Given training set $X = \{\mathbf{x}_i\}_{i=1}^{n}$ with labels $\{y_i\}_{i=1}^{n}$.
(a) Obtain the weights $\mu_m$ for the corresponding preselected kernels using Eqs. (12) and (13).
(b) Create the weighted kernel matrix $\tilde{K}_i^{\mu}$ for each $\mathbf{x}_i$ as in Eq. (9).
(c) Calculate the eigenvectors and eigenvalues of the column-direction scatter $\sum_{i=1}^{n} \tilde{K}_i^{\mu} (\tilde{K}_i^{\mu})^{T}$. Sort the eigenvectors according to the decreasing order of the eigenvalues and select the first $q$ eigenvectors $\mathbf{l}_1, \ldots, \mathbf{l}_q$.
(d) Calculate the eigenvectors and eigenvalues of the row-direction scatter $\sum_{i=1}^{n} (\tilde{K}_i^{\mu})^{T} \tilde{K}_i^{\mu}$. Sort the eigenvectors according to the decreasing order of the eigenvalues and select the first $p$ eigenvectors $\mathbf{n}_1, \ldots, \mathbf{n}_p$.
(e) The final subspaces are $L = [\mathbf{l}_1, \ldots, \mathbf{l}_q]$ and $N = [\mathbf{n}_1, \ldots, \mathbf{n}_p]$.
Output: $L$ and $N$.
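
The following NumPy sketch condenses Algorithm 2, under the assumption that steps (c) and (d) use the uncoupled column- and row-direction scatters of the weighted per-sample kernel matrices; the weights mu are the class-separability weights of (14), e.g., as produced by the kernel_weights sketch above.

import numpy as np

def m_kpca(kernel_mats, mu, q, p):
    # kernel_mats: list of M (n x n) Gram matrices; mu: weights from Eq. (14)
    n = kernel_mats[0].shape[0]
    # weighted per-sample kernel matrices, one reading of Eq. (9): column m scaled by mu[m]
    B = [np.column_stack([mu[m] * K[:, i] for m, K in enumerate(kernel_mats)])
         for i in range(n)]
    S_col = sum(Bi @ Bi.T for Bi in B)           # column-direction scatter (assumed form)
    S_row = sum(Bi.T @ Bi for Bi in B)           # row-direction scatter (assumed form)
    vals, vecs = np.linalg.eigh(S_col)
    L = vecs[:, np.argsort(vals)[::-1][:q]]
    vals, vecs = np.linalg.eigh(S_row)
    N = vecs[:, np.argsort(vals)[::-1][:p]]
    return L, N

def m_kpca_features(L, N, B_i):
    # features of one sample whose weighted kernel matrix is B_i (n x M)
    return L.T @ B_i @ N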

4. Experiments

In this section, we investigate the performance of the proposed M-KPCA algorithm compared with a number of conventional and state-of-the-art techniques on two Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral data sets. Our experiments are conducted on a machine with an Intel Core i5-2410M CPU at 2.30GHz and 8GB DDR-III RAM.

4.1. Data Sets and Experimental Setup

The first set is airborne remote sensing data captured by the AVIRIS sensor over northwest Indiana on June 12, 1992. The Indian Pines data set has 16 labeled classes, 145 lines per scene, and 145 pixels per line. Originally, the scene has 220 spectral bands (10 nm spectral bandwidth from 0.4 to 2.5 μm); after discarding the water absorption and noise bands, based on [44, 45], only 159 bands were used in the experiments.

The second data source is airborne hyperspectral data acquired by the AVIRIS sensor at 18 m spatial resolution over the Kennedy Space Center (KSC) in March 1996. Noisy bands and water absorption bands are removed; the remaining HSI data has 176 bands covering 13 wetland and upland classes. Figure 1 shows the RGB image of Indian Pines and a false color image of the KSC. Table 1 summarizes the data sets used in our experiments. All samples in each data set are scaled to a common range, as suggested in [46].


Table 1: Number of training and test samples per class for the Indian Pines and KSC data sets.

No | Indian Pines class | Train | Test | KSC class | Train | Test
1 | Alfalfa | 14 | 40 | Scrub | 190 | 571
2 | Corn-no till | 359 | 1075 | Willow swamp | 61 | 182
3 | Corn-min till | 209 | 625 | Cabbage hamm. | 64 | 192
4 | Corn | 59 | 175 | Cabbage palm | 63 | 189
5 | Grass-pasture | 124 | 373 | Slash pine | 40 | 121
6 | Grass-trees | 187 | 560 | Oak | 57 | 172
7 | Grass-pasture mowed | 6 | 20 | Hardwood swamp | 264 | 791
8 | Hay-windrowed | 122 | 367 | Graminoid marsh | 108 | 323
9 | Oats | 5 | 15 | Spartina marsh | 130 | 390
10 | Soybean-no till | 242 | 726 | Cattail marsh | 101 | 303
11 | Soybean-min till | 617 | 1851 | Salt marsh | 105 | 314
12 | Soybean-clean till | 154 | 460 | Mud flats | 126 | 377
13 | Wheat | 53 | 159 | Water | 232 | 695
14 | Woods | 324 | 970 | | |
15 | Bldg-Grass-Tree | 95 | 285 | | |
16 | Stone-steel towers | 24 | 71 | | |
Total | | 2594 | 7772 | | 1541 | 3670

The PCA and KPCAs are implemented using the SIMFEAT toolbox [47]. In each experiment, a single kernel is selected for the KPCA. In addition to the Gaussian radial basis function (RBF) kernel, we have employed three more kernels (see Table 2). Before solving the eigenvalue problem, the parameter σ in the RBF, Laplacian, and Cauchy kernels should be selected or optimized. Unless otherwise stated, the kernel parameter is set by the centroid-based heuristic of [48], i.e., as a function of the distances between the training samples and their centroid (Eq. (15)). The kernel parameter is not optimized and is the same for each kernel given in Table 2. However, the aim of combining nonoptimized kernels is to yield a better FE technique for classification.


Table 2: Kernel functions employed in the experiments.

Kernel | Formula
RBF | k(x, y) = exp(−‖x − y‖² / (2σ²))
Laplacian | k(x, y) = exp(−‖x − y‖ / σ)
Cauchy | k(x, y) = 1 / (1 + ‖x − y‖² / σ²)
Histogram Intersection (HIST) | k(x, y) = Σ_d min(x_d, y_d)
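
The four kernels and the centroid-based scale heuristic can be sketched as follows; the exact functional form used for (15) is an assumption made here for illustration (mean distance of the training samples to their centroid).

import numpy as np

def pairwise_dists(X, Y):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.sqrt(np.maximum(d2, 0.0))

def rbf(X, Y, sigma):        return np.exp(-pairwise_dists(X, Y)**2 / (2.0 * sigma**2))
def laplacian(X, Y, sigma):  return np.exp(-pairwise_dists(X, Y) / sigma)
def cauchy(X, Y, sigma):     return 1.0 / (1.0 + pairwise_dists(X, Y)**2 / sigma**2)
def hist_intersection(X, Y): return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def centroid_sigma(X_train):
    # assumed form of the heuristic in (15): mean distance to the training centroid
    centroid = X_train.mean(axis=0, keepdims=True)
    return np.linalg.norm(X_train - centroid, axis=1).mean()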

The informative portions of the cumulative eigenvalues obtained after eigendecomposition for each method are shown in Figure 2. According to the cumulative eigenvalues of PCA, two principal components reach 99% of the total variance in the Indian Pines case, whereas three principal components are needed to reach 99% of the information in the KSC case. According to these results, the new dimensions of Indian Pines and KSC for the classification experiment are defined as 2 and 3, respectively. However, hyperspectral information cannot be represented using only second-order statistics, as pointed out in Section 1. From Figure 2, it can be seen that more kernel principal components (KPCs) are needed to capture the same amount of variance as with PCA. Note that the total number of components with PCA equals the number of bands, i.e., 159 for Indian Pines, while for KPCA it equals the number of training samples, i.e., 2594, which is significantly higher. For the Indian Pines data set, the first 11, 51, 31, and 33 KPCs are needed to reach 99% of the cumulative variance with the RBF, Laplacian, Cauchy, and histogram intersection (HIST) kernels, respectively. For the KSC results, 8 KPCs are needed with the RBF, 56 with the Laplacian, 18 with the Cauchy, and 55 with the HIST kernel to achieve the same amount of information. In the case of A-KPCA and M-KPCA, p is set to 1 for kernel selection. For the Indian Pines data set, 35 adaptive KPCs and 20 multiple KPCs contain 99% of the information, and only 14 adaptive KPCs and 12 multiple KPCs are needed for the KSC.

To display the first principal components (PCs) more clearly, a subimage of size 100 × 100 from the KSC hyperspectral cube is selected. The first PCs for all of the methods are depicted in Figure 3.

After FE, an SVM classifier is employed for classification. For nonlinear SVMs, we use the RBF kernel formulated in Table 2. The classification experiments and the optimization of the SVM parameters, C and σ, are carried out using LIBSVM [49] with 5-fold cross-validation. Since SVMs are designed to solve binary problems, various approaches have been proposed for multiclass situations such as remote sensing applications. The most popular approaches for multiclass classification are one-against-all (1AA) and one-against-one (1A1). In this paper, we apply the 1AA strategy for each class: each test sample is finally labeled as the class whose output score is maximum.
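
A brief sketch of the 1AA classification step is given below. It uses scikit-learn rather than the LIBSVM interface employed in the paper, and the search grids for C and gamma (gamma = 1/(2σ²)) are illustrative placeholders.

import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV

def train_one_against_all_svm(features_train, y_train):
    # RBF SVM trained one-against-all, with 5-fold cross-validation over C and gamma
    param_grid = {
        "estimator__C": np.logspace(-1, 3, 5),
        "estimator__gamma": np.logspace(-3, 1, 5),
    }
    ova = OneVsRestClassifier(SVC(kernel="rbf"))
    search = GridSearchCV(ova, param_grid, cv=5)
    search.fit(features_train, y_train)
    return search.best_estimator_

# usage sketch: clf = train_one_against_all_svm(F_train, y_train)
#               y_pred = clf.predict(F_test)   # argmax over the per-class output scores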

Finally, we compare the proposed M-KPCA algorithm against five state-of-the-art dimension reduction algorithms, i.e., linear discriminant analysis (LDA) [50], LPP, probabilistic PCA (pPCA) [51], RP, and t-SNE. LDA, LPP, pPCA, and t-SNE are implemented using the MATLAB toolbox [52] for dimensionality reduction, and RP algorithm is designed based on Wang’s work [53].

4.2. Comparison with KPCA and A-KPCA

The original data sets, termed raw, are also classified for comparison. Tables 3 and 4 compare the performance of all models numerically (class accuracies and overall accuracy (OA), in percentages) and statistically (kappa coefficient) for the Indian Pines and KSC data sets, respectively.


Table 3: Class accuracies (%), overall accuracy (OA, %), and kappa coefficient for the Indian Pines data set.

Feature | Raw | PCA | KPCA-RBF | KPCA-Lap | KPCA-Cau | KPCA-HIST | A-KPCA | M-KPCA
# of features | 159 | 2 | 11 | 51 | 31 | 33 | 35 | 20
SVM param C | 430.52 | 279.17 | 534.67 | 663.98 | 663.98 | 663.98 | 603.46 | 346.67
SVM param σ | 2.18 | 15.22 | 13.37 | 13.28 | 13.27 | 13.28 | 16.59 | 4.74
1 | 87.04 | 33.34 | 88.89 | 74.07 | 77.78 | 90.74 | 90.74 | 94.44
2 | 88.77 | 40.59 | 86.82 | 75.80 | 82.08 | 83.19 | 89.82 | 89.96
3 | 86.09 | 06.95 | 86.81 | 65.47 | 82.61 | 78.30 | 88.49 | 88.60
4 | 92.31 | 35.90 | 91.03 | 68.38 | 81.62 | 82.48 | 91.45 | 96.58
5 | 98.19 | 58.95 | 97.79 | 88.73 | 95.77 | 90.74 | 97.59 | 98.59
6 | 98.93 | 91.43 | 98.39 | 97.59 | 97.99 | 96.52 | 99.06 | 99.33
7 | 92.31 | 30.77 | 92.31 | 88.46 | 92.31 | 84.62 | 92.31 | 96.15
8 | 99.39 | 95.50 | 98.98 | 97.75 | 98.98 | 97.34 | 99.39 | 99.59
9 | 90.00 | 00.00 | 95.00 | 90.00 | 85.00 | 90.00 | 95.00 | 95.00
10 | 89.67 | 66.32 | 91.12 | 80.58 | 90.50 | 88.74 | 91.22 | 91.84
11 | 91.90 | 73.30 | 93.27 | 86.95 | 91.13 | 89.55 | 93.11 | 94.25
12 | 94.30 | 21.01 | 95.11 | 80.78 | 90.88 | 88.27 | 95.44 | 95.60
13 | 99.06 | 73.58 | 99.06 | 98.11 | 98.58 | 97.64 | 99.06 | 99.53
14 | 97.30 | 86.79 | 98.07 | 96.91 | 97.99 | 96.52 | 97.84 | 97.60
15 | 80.00 | 15.53 | 77.11 | 53.42 | 70.26 | 58.95 | 82.63 | 88.95
16 | 92.63 | 90.53 | 94.74 | 94.74 | 91.58 | 90.53 | 90.53 | 94.74
OA | 92.47 | 59.78 | 92.69 | 83.92 | 90.16 | 88.30 | 93.44 | 94.28
Kappa | 0.9130 | 0.5399 | 0.9210 | 0.8215 | 0.8967 | 0.8749 | 0.9317 | 0.9348


Table 4: Class accuracies (%), overall accuracy (OA, %), and kappa coefficient for the KSC data set.

Feature | Raw | PCA | KPCA-RBF | KPCA-Lap | KPCA-Cau | KPCA-HIST | A-KPCA | M-KPCA
# of features | 176 | 3 | 8 | 56 | 18 | 55 | 14 | 12
SVM param C | 534.66 | 279.17 | 430.54 | 663.98 | 181.02 | 346.69 | 117.38 | 165.69
SVM param σ | 3.77 | 9.87 | 34.95 | 5.58 | 10.69 | 5.58 | 5.61 | 5.01
1 | 97.50 | 92.64 | 97.50 | 95.66 | 96.58 | 96.32 | 98.03 | 98.29
2 | 94.24 | 84.77 | 93.83 | 90.12 | 91.77 | 91.36 | 94.24 | 95.47
3 | 93.75 | 89.84 | 93.36 | 91.80 | 93.75 | 93.75 | 93.36 | 97.26
4 | 78.17 | 49.21 | 80.16 | 71.43 | 78.17 | 75.40 | 81.35 | 93.25
5 | 77.64 | 56.52 | 60.87 | 55.28 | 61.49 | 57.76 | 78.88 | 88.82
6 | 75.11 | 50.66 | 58.08 | 55.02 | 75.11 | 58.95 | 82.53 | 86.90
7 | 93.34 | 73.33 | 89.52 | 91.43 | 90.48 | 91.43 | 94.29 | 95.24
8 | 96.29 | 78.65 | 94.20 | 90.26 | 95.13 | 91.18 | 96.75 | 96.28
9 | 99.04 | 97.50 | 98.65 | 96.73 | 98.65 | 97.50 | 99.23 | 99.61
10 | 100 | 91.58 | 99.75 | 97.77 | 99.26 | 97.77 | 100 | 99.26
11 | 99.05 | 97.85 | 99.05 | 98.09 | 99.05 | 98.81 | 99.05 | 99.28
12 | 99.06 | 81.11 | 96.82 | 95.23 | 97.42 | 95.63 | 99.01 | 99.20
13 | 100 | 99.89 | 100 | 99.89 | 99.89 | 99.89 | 99.89 | 100
OA | 95.51 | 86.53 | 93.78 | 91.65 | 94.34 | 92.59 | 96.14 | 97.52
Kappa | 0.9522 | 0.8493 | 0.9347 | 0.9106 | 0.9407 | 0.9213 | 0.9601 | 0.9724

Inspection of Table 3 reveals that A-KPCA outperforms PCA and all four KPCAs. Further analysis shows that KPCA performs significantly better than conventional PCA. Regarding the OAs, it is clear that the M-KPCA based classification produces more accurate results than the A-KPCA based classification. Among the kernel functions, the RBF kernel gives the best KPCA results, as seen in Table 3.

The results for the KSC data set are reported in Table 4. Regarding the PCA and KPCA results, FE does not improve the accuracies significantly; the comparison nevertheless shows that KPCA performs better than PCA in terms of classification accuracy. Moreover, classification with the A-KPCA features is more precise than that obtained by employing all the KPCs. As in the previous experiment, the best results are obtained with M-KPCA. Figures 4 and 5 show the available labeled scenes and the classification maps of all models for the Indian Pines and KSC data sets, respectively.

In the last experiment, we increase the number of KPCAs in both A-KPCA and M-KPCA by using different scale parameters with the same kernel. Tables 3 and 4 show that the best single kernel differs between the two data sets. Therefore, we adopt seven RBF kernel functions for Indian Pines and seven Cauchy kernel functions for KSC, with their scale parameters varied over a range around the central parameter determined by (15). The SVM is again employed for classification after FE. For each method, the number of retained eigenvalues is selected to cover 99% of the cumulative variance. Table 5 summarizes the classification accuracies of this experiment. The results show that the M-KPCA based features are better than the individual KPCA and A-KPCA features on all data sets, no matter which kernel parameter is applied.


Table 5: OA (%) and kappa coefficient for seven KPCAs with different scale parameters, A-KPCA, and M-KPCA.

Data | Kernel | Metric | KPCA (seven scale parameters) | A-KPCA | M-KPCA
Indian Pines | RBF | OA | 83.54, 88.69, 90.52, 92.69, 57.80, 52.41, 19.47 | 94.09 | 96.03
Indian Pines | RBF | Kappa | 0.8006, 0.8703, 0.8918, 0.9202, 0.5036, 0.4405, 0.1267 | 0.9327 | 0.9459
KSC | Cauchy | OA | 91.83, 93.78, 93.24, 94.34, 80.27, 72.54, 69.76 | 95.93 | 96.36
KSC | Cauchy | Kappa | 0.9089, 0.9307, 0.9247, 0.9409, 0.7794, 0.6918, 0.6605 | 0.9547 | 0.9574

4.3. M-KPCA versus Other Dimension Reduction Algorithms

In this section, we compare our method (M-KPCA) with five FE methods, i.e., LDA, LPP, pPCA, RP, and t-SNE. M-KPCA is constructed with the subkernels listed in Table 2, and the kernel parameters are determined by (15). Different values of the dimensionality of the new subspaces are tested with the SVM classifier across the two data sets; a set of candidate values is generated independently for the subspace dimension. The classification accuracy is reported for each model, and the results are plotted in Figure 6.

Inspection of Figure 6 reveals that the proposed method consistently outperforms the competing FE methods for multiclass classification in higher dimensions. For instance, if the number of extracted features is set to 50, M-KPCA improves over the best competing method, RP, by 5.19% in terms of OA on Indian Pines and by 3.78% on KSC. It can also be seen from Figure 6 that the t-SNE method is highly stable against dimensional changes. Comparing the methods, we also observe that the performance of LDA and pPCA is limited for both data sets. In the lower dimensions (i.e., when the number of new dimensions is smaller than 10), the best features are produced by t-SNE, which is also the most time-consuming method. The remaining methods are sorted as RP, LDA, LPP, pPCA, and M-KPCA in ascending order of average computation time.

5. Conclusion

In this paper, a novel semisupervised KPCA framework named multiple KPCA (M-KPCA) is proposed for effective feature extraction of hyperspectral images. It applies an ensemble strategy to favor good candidate kernels during the nonlinear projections. A noniterative algorithm is developed to perform feature extraction and kernel combination simultaneously, based on a kernel class separability criterion. In terms of the number of kernels, KPCA uses only one base kernel with predefined parameter(s) (if any). In terms of kernel quality, A-KPCA has no procedure to evaluate the efficiency of the kernels. M-KPCA overcomes these drawbacks of both KPCA and A-KPCA.

The dimension-reduced HSI data is classified by nonlinear SVMs to compare the classification performance of the different models. Experiments on two real HSI data sets demonstrate that the best kernel type varies with the data (see Tables 3 and 4). In the first test, KPCA presents better performance than conventional PCA. The overall evaluation of the dimension reduction performance of PCA, the KPCAs, A-KPCA, and M-KPCA shows that M-KPCA is the most successful. In the second experiment, we employed seven candidate kernel functions with different kernel parameters for each data set; these KPCAs are then used to construct the A-KPCA and M-KPCA. Experiments on the AVIRIS data sets confirm that M-KPCA outperforms the individual KPCAs and A-KPCA in terms of both OA and the kappa coefficient. Moreover, the comparative results in Section 4.3 demonstrate that M-KPCA achieves superior or competitive classification accuracy relative to the other state-of-the-art unsupervised FE methods.

The results clearly validate that semisupervised learning of kernels with M-KPCA increases the robustness of nonoptimized KPCAs. One limitation of M-KPCA, and probably the most important one, is its computational complexity, which is related to the number of samples used to construct the kernel matrices. Therefore, our future work aims to address the problem of reducing this complexity. It is also possible to extend the proposed method into a selective approach that eliminates weak kernels before feature extraction.

Data Availability

The Indian Pines and KSC data that support the findings of this study are, respectively, available at https://engineering.purdue.edu/~biehl/ and http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes.

Conflicts of Interest

The author declares that they have no conflicts of interest.

Acknowledgments

The author would like to thank Daoqiang Zhang and Zhi-Hua Zhou for providing a part of the source code for the A-KPCA.

References

1. Y. Chen, N. M. Nasrabadi, and T. D. Tran, "Hyperspectral image classification via kernel sparse representation," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 217–231, 2013.
2. D. W. Scott, The Curse of Dimensionality and Dimension Reduction in Multivariate Density Estimation: Theory, Practice, and Visualization, John Wiley & Sons, 1992.
3. H. Huang, F. Luo, J. Liu, and Y. Yang, "Dimensionality reduction of hyperspectral images based on sparse discriminant manifold embedding," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 106, pp. 42–54, 2015.
4. L. Zhang, Q. Zhang, L. Zhang, D. Tao, X. Huang, and B. Du, "Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding," Pattern Recognition, vol. 48, no. 10, pp. 3102–3112, 2014.
5. L. Zhang, L. Zhang, D. Tao, and X. Huang, "Tensor discriminative locality alignment for hyperspectral image spectral-spatial feature extraction," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 242–256, 2013.
6. P. Nagabhushan, D. S. Guru, and B. H. Shekar, "Visual learning and recognition of 3D objects using two-dimensional principal component analysis: A robust and an efficient approach," Pattern Recognition, vol. 39, no. 4, pp. 721–725, 2006.
7. L. O. Jimenez, "Hyperspectral data analysis and supervised feature reduction via projection pursuit," IEEE Transactions on Geoscience and Remote Sensing, vol. 37, no. 6, pp. 2653–2667, 1999.
8. S. Kaewpijit, J. L. Moigne, and T. El-Ghazawi, "Automatic reduction of hyperspectral imagery using wavelet spectral analysis," IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 4, pp. 863–871, 2003.
9. X. Jia, B.-C. Kuo, and M. M. Crawford, "Feature mining for hyperspectral image classification," Proceedings of the IEEE, vol. 101, no. 3, pp. 676–697, 2013.
10. C. Lee and D. A. Landgrebe, "Analyzing high-dimensional multispectral data," IEEE Transactions on Geoscience and Remote Sensing, vol. 31, no. 4, pp. 792–800, 1993.
11. J. A. Richards, Remote Sensing Digital Image Analysis, Springer-Verlag, Berlin, Germany, 1993.
12. D. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing, John Wiley & Sons, Hoboken, NJ, USA, 1999.
13. H. Binol, S. Ochilov, M. S. Alam, and A. Bal, "Target oriented dimensionality reduction of hyperspectral data by Kernel Fukunaga–Koontz Transform," Optics and Lasers in Engineering, vol. 89, pp. 123–130, 2015.
14. M. Lennon, G. Mercier, M. Mouchot, L. Hubert-Moy, and S. B. Serpico, "Curvilinear component analysis for nonlinear dimensionality reduction of hyperspectral images," in Proceedings of the Image and Signal Processing for Remote Sensing VII, vol. 4541, pp. 157–168, Toulouse, France.
15. L. Journaux, X. Tizon, I. Foucherot, and P. Gouton, "Dimensionality reduction techniques: An operational comparison on multispectral satellite images using unsupervised clustering," in Proceedings of the 7th Nordic Signal Processing Symposium, NORSIG 2006, pp. 242–245, Rejkjavik, Iceland, June 2006.
16. B. Schölkopf, A. Smola, and K.-R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
17. A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," Neural Networks, vol. 13, no. 4-5, pp. 411–430, 2000.
18. M. Fauvel, J. Chanussot, and J. A. Benediktsson, "Kernel principal component analysis for feature reduction in hyperspectrale images analysis," in Proceedings of the 7th Nordic Signal Processing Symposium, NORSIG 2006, pp. 238–241, Rejkjavik, Iceland, June 2006.
19. M. Fauvel, J. Chanussot, and J. A. Benediktsson, "Kernel principal component analysis for the classification of hyperspectral remote sensing data over urban areas," EURASIP Journal on Advances in Signal Processing, vol. 2009, Article ID 783194, 2009.
20. H.-Y. Huang and B.-C. Kuo, "Double nearest proportion feature extraction for hyperspectral-image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 11, pp. 4034–4046, 2010.
21. B.-C. Kuo, C.-H. Li, and J.-M. Yang, "Kernel nonparametric weighted feature extraction for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 4, pp. 1139–1155, 2009.
22. O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, "Choosing multiple parameters for support vector machines," Machine Learning, vol. 46, no. 1–3, pp. 131–159, 2002.
23. N. Cristianini, J. Kandola, A. Elisseeff, and J. Shawe-Taylor, "On kernel-target alignment," in Advances in Neural Information Processing Systems, pp. 367–373, 2002.
24. C. S. Ong, A. J. Smola, and R. C. Williamson, "Hyperkernels," in Advances in Neural Information Processing Systems, S. Becker, S. Thrun, and K. Obermayer, Eds., pp. 478–485, 2003.
25. J. T. Kwok and I. W. Tsang, "Learning with idealized kernels," in Proceedings of the 20th International Conference on Machine Learning, pp. 400–407, Washington, Wash, USA, 2003.
26. I. W.-H. Tsang and J. T.-Y. Kwok, "Efficient hyperkernel learning using second-order cone programming," IEEE Transactions on Neural Networks and Learning Systems, vol. 17, no. 1, pp. 48–58, 2006.
27. H. Binol, A. Bal, and H. Cukur, "Differential evolution algorithm-based kernel parameter selection for Fukunaga-Koontz Transform subspaces construction," in Proceedings of the High-Performance Computing in Remote Sensing V, vol. 9646, Toulouse, France, September 2015.
28. N. Li and Y. Yang, "Ensemble kernel principal component analysis for improved nonlinear process monitoring," Industrial & Engineering Chemistry Research, vol. 54, no. 1, pp. 318–329, 2014.
29. D. Zhang, Z.-H. Zhou, and S. Chen, "Adaptive kernel principal component analysis with unsupervised learning of kernels," in Proceedings of the 6th International Conference on Data Mining, ICDM 2006, pp. 1178–1182, Hong Kong, China, December 2006.
30. F. S. Uslu, H. Binol, M. Ilarslan, and A. Bal, "Improving SVDD classification performance on hyperspectral images via correlation based ensemble technique," Optics and Lasers in Engineering, vol. 89, pp. 169–177, 2017.
31. K. Tumer and J. Ghosh, "Error correlation and error reduction in ensemble classifiers," Connection Science, vol. 8, no. 3-4, pp. 385–404, 1996.
32. S. Mao, L. Jiao, L. Xiong, and S. Gou, "Greedy optimization classifiers ensemble based on diversity," Pattern Recognition, vol. 44, no. 6, pp. 1245–1261, 2011.
33. X. He and P. Niyogi, "Locality preserving projections," in Advances in Neural Information Processing Systems, vol. 16, The MIT Press, Cambridge, MA, USA, 2004.
34. E. Bingham and H. Mannila, "Random projection in dimensionality reduction: applications to image and text data," in Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), pp. 245–250, San Francisco, Calif, USA, August 2001.
35. L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, pp. 2579–2625, 2008.
36. J. Inglada, "Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 62, no. 3, pp. 236–248, 2007.
37. H. Binol, G. Bilgin, S. Dinc, and A. Bal, "Kernel Fukunaga-Koontz Transform subspaces for classification of hyperspectral images with small sample sizes," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 6, pp. 1287–1291, 2015.
38. J. Ye, "Generalized low rank approximations of matrices," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 887–894, Banff, Canada, July 2004.
39. H. Xiong, M. N. S. Swamy, and M. O. Ahmad, "Optimizing the kernel in the empirical feature space," IEEE Transactions on Neural Networks and Learning Systems, vol. 16, no. 2, pp. 460–474, 2005.
40. T. Sun, L. Jiao, F. Liu, S. Wang, and J. Feng, "Selective multiple kernel learning for classification with ensemble strategy," Pattern Recognition, vol. 46, no. 11, pp. 3081–3090, 2013.
41. H. Binol, H. Cukur, and A. Bal, "A supervised discriminant subspaces-based ensemble learning for binary classification," International Journal of Advanced Computer Research, vol. 6, no. 27, pp. 209–214, 2016.
42. L. Wang, "Feature selection with kernel class separability," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1534–1546, 2008.
43. L. Wang and K. L. Chan, "Learning kernel parameters by using class separability measure," in Proceedings of the Advances in Neural Information Processing Systems, Sixth Workshop on Kernel Machines, Canada, 2002.
44. R. I. Faulconbridge and M. R. Pickering, "Unsupervised band removal leading to improved classification accuracy of hyperspectral images," in Proceedings of the 29th Australasian Computer Science Conference, 2006.
45. O. Rajadell and P. Garcia-Sevilla, "Textural features for hyperspectral pixel classification," in IbPRIA 2009: Pattern Recognition and Image Analysis, vol. 5524 of Lecture Notes in Computer Science, pp. 208–216, Springer, Berlin, Germany, 2009.
46. C.-H. Li, B.-C. Kuo, C.-T. Lin, and C.-S. Huang, "A spatial-contextual support vector machine for remotely sensed image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 3, pp. 784–799, 2012.
47. J. Arenas-Garcia, K. B. Petersen, G. Camps-Valls, and L. K. Hansen, "Kernel multivariate analysis framework for supervised subspace learning: A tutorial on linear and kernel multivariate methods," IEEE Signal Processing Magazine, vol. 30, no. 4, pp. 16–29, 2013.
48. H. Binol, F. S. Uslu, and A. Bal, "Unsupervised nonlinear feature extraction method and its effects on target detection," International Journal of Electrical Electronics and Data Communication, vol. 3, no. 8, pp. 43–46, 2015.
49. C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
50. K. Etemad and R. Chellappa, "Discriminant analysis for recognition of human face images," Journal of the Optical Society of America A: Optics, Image Science & Vision, vol. 14, no. 8, pp. 1724–1733, 1997.
51. M. E. Tipping and C. M. Bishop, "Probabilistic principal component analysis," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 61, no. 3, pp. 611–622, 1999.
52. L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, Matlab Toolbox for Dimensionality Reduction, MICC, Maastricht University, 2007.
53. S. Wang, "A practical guide to randomized matrix computations with MATLAB implementations," 2015, https://arxiv.org/abs/1505.07570.

Copyright © 2018 Hamidullah Binol. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

