Combined Kernel-Based BDT-SMO Classification of Hyperspectral Fused Images
To address the poor generalization and limited flexibility of single-kernel SVM classifiers when classifying combined spectral and spatial features, this paper proposes an approach to improve the classification accuracy and efficiency of hyperspectral fused images: (1) different radial basis kernel functions (RBFs) are employed for spectral and textural features, and a new combined radial basis kernel function (CRBF) is formed by combining them in a weighted manner; (2) the binary decision tree-based multiclass SMO (BDT-SMO) is used to classify hyperspectral fused images; (3) experiments are carried out in which the single radial basis function- (SRBF-) based BDT-SMO classifier and the CRBF-based BDT-SMO classifier are each used to classify the land usages of hyperspectral fused images, with genetic algorithms (GA) optimizing the kernel parameters of the classifiers. The results show that, compared with SRBF-based classifiers, CRBF-based BDT-SMO classifiers achieve greater classification accuracy and efficiency.
Hyperspectral remote sensing images offer high spectral resolution and strong ground-object recognition capabilities because they contain hundreds of fine, contiguous wave bands. However, these images suffer from several problems, for example, low spatial resolution, a high proportion of mixed pixels, a huge amount of data, data uncertainty, and the Hughes phenomenon, which affects supervised classification. Hence, classification based on spectral features alone is ineffective. To improve the spatial resolution, hyperspectral images can be fused with high spatial resolution images, and the classification accuracy can be improved by combining spectral features with spatial features. The support vector machine (SVM) is a well-established learning method that provides an approach to the traditional Hughes effect and overfitting problems. SVM has evident advantages for small-sample, nonlinear, and high-dimensional pattern recognition problems and thus is widely used for classifying hyperspectral remote sensing images. However, the traditional kernel function is a single kernel function, and single kernel-based SVM classifiers perform poorly when classifying combined spectral and spatial features. Multi-kernel SVM classifiers are more effective than single kernel classifiers because they can combine kernel functions according to different classification features and improve classification performance by combining the advantages of spectral, spatial, and structural features. In this paper, a binary decision tree-based multiclass SMO (BDT-SMO) classifier is proposed to improve the classification accuracy and efficiency of hyperspectral remote sensing images. Two radial basis functions (RBFs) are employed for spectral and textural features and are then combined in a weighted manner to form a new combined radial basis function (CRBF).
Finally, the single radial basis function- (SRBF-) based BDT-SMO classifier and the CRBF-based BDT-SMO classifier are each used to classify the land usages of hyperspectral fused images, and genetic algorithms are used to optimize the kernel parameters of the classifiers. The results show that CRBF-based BDT-SMO classifiers have superior classification accuracy and efficiency.
2. Support Vector Machines (SVM)
SVM is a statistical learning-based machine learning method. The key idea behind an SVM is to take the Vapnik structural risk minimization (SRM) as the inductive principle. If the feature space is linearly separable, the optimal classification hyperplane with low VC dimensions can be constructed in the high-dimensional feature space as the decision plane to maximize the distance between the two classes of data. Otherwise, kernel functions can be used to map the data into the high-dimensional feature space and then construct the optimal classification hyperplane in the high-dimensional feature space. The optimal classification hyperplane has to be able to separate the two classes (the error rate is 0 during the training) and maximize the distance between the two classes. An example of the SVM optimal classification hyperplane is shown in Figure 1.
In Figure 1, circles and squares represent the two respective classes of samples. Let $H$ be the optimal classification line. The points on the lines $H_1$ and $H_2$, which are parallel to $H$, are support vectors, and the margin denotes the classification distance. Consider that for the classification plane $w \cdot x + b = 0$, the following is true: $y_i (w \cdot x_i + b) \ge 1$, $i = 1, \ldots, n$. In this case, the classification distance is $2/\|w\|$. Maximizing the distance is equivalent to minimizing $\|w\|^2/2$. The classification plane that satisfies this condition is the optimal classification plane.
For the linearly separable sample set $(x_i, y_i)$, $i = 1, \ldots, n$, $x_i \in R^d$, $y_i \in \{+1, -1\}$, the Lagrange optimization method can be used to convert the classification plane optimization problem into a dual convex quadratic optimization problem. Hence, the linear sample classification problem is equivalent to the maximization of the following function:
$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j), \quad \text{s.t. } \sum_{i=1}^{n} y_i \alpha_i = 0, \ \alpha_i \ge 0, \ i = 1, \ldots, n, \tag{1}$$
where $\alpha_i$ denotes the Lagrange multiplier of each sample. This is a quadratic function optimization problem subject to inequality constraints. In the solutions to (1), only a small subset of the $\alpha_i$ is not equal to 0, and the corresponding samples are the support vectors. The classification plane is determined by the support vectors. That is, a small number of samples suffice to construct the optimal classification plane. Solving the above problem results in the following optimal classification function:
$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x) + b^{*} \right\}, \tag{2}$$
where the nonzero $\alpha_i^{*}$ correspond to the support vectors and $b^{*}$ is the classification threshold, which can be computed using any support vector or defined as the median of a pair of support vectors from both classes.
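For nonlinear classification, a nonlinear mapping $\phi$ (induced by the kernel function) can be performed to map the input data from the original space, $R^d$, to the high-dimensional feature space, $F$, and the hyperplane is then constructed in the high-dimensional feature space. In this case, a kernel function $K(x_i, x_j)$ must be constructed in the original space and be made equal to the inner product in the transformed high-dimensional feature space, $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. In this instance, the nonlinear classification problem is converted to the maximization of the following function:
$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j), \quad \text{s.t. } \sum_{i=1}^{n} y_i \alpha_i = 0, \ 0 \le \alpha_i \le C, \ i = 1, \ldots, n, \tag{3}$$
where $C$ is a constant. Solving the above problem results in the following classification criterion function:
$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^{*} y_i K(x_i, x) + b^{*} \right\}. \tag{4}$$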
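As an illustration of the kernel classification criterion function above (the sign of the kernel-weighted sum over the support vectors plus the threshold), the following minimal Python sketch may be helpful. The polynomial kernel, the function names, and the hand-picked support vectors and multipliers are assumptions made for illustration only; they are not trained values and are not taken from the experiments in this paper.

```python
import numpy as np

def poly_kernel(a, b, degree=2):
    """Polynomial kernel (a . b + 1)^degree, used here only as an example kernel."""
    return float((np.dot(a, b) + 1.0) ** degree)

def svm_decision(x, support_vectors, labels, alphas, b, kernel=poly_kernel):
    """Return sign(sum_i alpha_i * y_i * K(x_i, x) + b) as +1 or -1."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1

# Hypothetical, hand-picked values (not trained): one support vector per class.
# They satisfy the dual constraint sum_i y_i * alpha_i = 0.
svs = np.array([[-1.0, -1.0], [1.0, 1.0]])
ys = np.array([-1, 1])
alphas = np.array([1.0, 1.0])

pred_pos = svm_decision(np.array([0.5, 1.0]), svs, ys, alphas, b=0.0)
pred_neg = svm_decision(np.array([-1.0, -0.5]), svs, ys, alphas, b=0.0)
print(pred_pos, pred_neg)
```

Only samples with nonzero multipliers contribute to the sum, which is why the classifier can be stored as a short list of support vectors.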
For nonlinear classification, choosing different inner product kernel functions produces different SVMs with different optimal classification hyperplanes in the feature space. It has been shown that the common kernels are suitable for most nonlinear classification problems. The four common kernel functions are the linear kernel, the polynomial kernel, the radial basis function (RBF), and the sigmoid function. The choice of SVM kernels and the setup of the kernel parameters are largely dependent on empirical and experimental analysis, because no well-established methods are currently available for this. Foody and Mathur showed that the kernel parameters and the error penalty factor, $C$, rather than the class of the chosen kernels, are the decisive factors in the performance of SVM.
Because the traditional SVM algorithm is only suitable for two-class classification, multiclass classification is often completed through a specific combination of multiple SVM classifiers. The multiclass classification methods commonly used are “one against one” SVM (OAO-SVM), “one against all” SVM (OAA-SVM), error correction code SVM (ECC-SVM), directed acyclic graph SVM (DAG-SVM), and binary decision tree SVM (BDT-SVM).
3. Sequential Minimal Optimization (SMO)
Traditional SVM algorithms can be boiled down to quadratic programming problems. Most existing approaches to the training of large-scale sample sets are based on decomposition and iteration, that is, solving the original quadratic programming problem by decomposing it into smaller quadratic programming problems. This method enormously increases computation complexity without guaranteeing accuracy. The most prominent feature of SMO algorithms is that there is no need to run an iterative numerical solver for the subproblems because each small optimization problem can be solved analytically. Although the solution to each small optimization problem is not necessarily the final solution of the optimized Lagrange multipliers, the objective function approaches the minimal value. The Lagrange multipliers can be optimized repeatedly until the KKT conditions are satisfied and the objective function is minimized. The procedures of the SMO training algorithm are shown in Figure 2. From Figure 2, it can be seen that SMO training involves firstly the determination of the two Lagrange multipliers $\alpha_1$ and $\alpha_2$ and the computation of $\alpha_2$’s upper bound, $H$, and lower bound, $L$. Secondly, $\alpha_2$ needs to be updated. If the change in $\alpha_2$ is smaller than the threshold, the updating fails. Otherwise, $\alpha_1$ needs to be updated according to relevant rules, and the values of all objective functions, $f(x_i)$, should also be updated. The deviation between the function’s output and the target classification, $E_i$, will be computed. If $E_i$ is smaller than the threshold, the algorithm terminates. SMO algorithms have been designed to solve two problems: firstly, they can solve simple optimization problems using analytical methods and secondly, they can choose strategies for optimizing the Lagrange multipliers. Although SMO has more quadratic programming subproblems, it adopts the “divide and conquer” strategy.
Its learning can therefore be performed in a parallel manner, enabling it to handle heavy matrix operations (e.g., inner product computation of the kernel functions) and sample searches. Hence, its processing speed is improved substantially. SMO algorithms can solve large-scale SVM training problems because they do not need to process large matrices and have no special requirements on memory space. Like the traditional SVM algorithm, the SMO algorithm is only suitable for two-class classification, so multiclass classification is often completed through a specific combination of multiple SMO classifiers. The multiclass classification methods commonly used are OAO-SMO, OAA-SMO, ECC-SMO, DAG-SMO, and BDT-SMO [7, 10].
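The analytic two-multiplier step described above can be sketched as follows. This is a minimal sketch that follows the commonly published form of Platt's SMO update and is not necessarily identical to the procedure in Figure 2; the function name `smo_pair_update`, its argument names, and the tolerance `eps` are assumptions made for illustration.

```python
def smo_pair_update(a1, a2, y1, y2, e1, e2, k11, k12, k22, c, eps=1e-5):
    """One analytic SMO step on the multiplier pair (a1, a2).

    e1, e2 are the deviations f(x_i) - y_i; k11, k12, k22 are kernel values.
    Returns (a1_new, a2_new), or None when the update fails.
    """
    # Bounds L and H keep both multipliers inside [0, C] on the constraint line.
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)
    if low >= high:
        return None
    # Curvature of the objective along the constraint line.
    eta = k11 + k22 - 2.0 * k12
    if eta <= 0.0:
        return None  # degenerate case, skipped in this sketch
    # Unconstrained optimum for a2, then clipped to [L, H].
    a2_new = min(high, max(low, a2 + y2 * (e1 - e2) / eta))
    if abs(a2_new - a2) < eps:
        return None  # change too small: the update fails
    # a1 moves so that y1*a1 + y2*a2 stays constant.
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new

# Toy step: both multipliers start at 0, so f(x) = 0 and e_i = -y_i.
result = smo_pair_update(a1=0.0, a2=0.0, y1=1, y2=-1, e1=-1.0, e2=1.0,
                         k11=1.0, k12=0.0, k22=1.0, c=1.0)
print(result)
```

Because each step has a closed-form solution, no numerical quadratic programming routine or large kernel matrix is needed inside the loop.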
4. Binary Decision Tree-Based Multiclass SMO (BDT-SMO) Classifiers
It was shown in [11–17] that both OAO-SVM and OAA-SVM have high classification speeds. When the data size is huge, however, their training efficiency is degraded severely and some regions are inseparable. Thus, they are unsuitable for classifying high-dimensional hyperspectral images that involve large amounts of sample data. ECC-SVM’s generalization is independent of the dimensions of the features, but its coding schemes are highly subjective, because the problems of how to determine the code length, generate codes based on the minimum intersymbol Hamming distance, and search for the optimal arrangement have yet to be solved [11, 12]. The DAG-SVM model has a tree structure. Although it can classify quickly and provides an approach to the inseparable region problems, its classification efficiency is unsatisfactory when faced with many classes. In addition, the root nodes are mostly selected empirically, so improperly selected root nodes can directly reduce classification accuracy. BDT-SVM is similar to DAG-SVM, but the difference is that, for $k$ classes, BDT-SVM only needs to construct $k - 1$ two-class SVM classifiers, whereas DAG-SVM needs $k(k - 1)/2$ internal nodes. Additionally, each node in BDT-SVM has only one parent node and the internal node can be something other than a one-against-one classifier. In DAG-SVM, each node can have many parent nodes, and each internal node must be a one-against-one classifier. BDT-SVM is characterized by high classification efficiency, fault tolerance, and generalization ability and thus it can produce better classification results than the other algorithms mentioned above.
As an improvement on traditional SVM algorithms, SMO classification is faster than traditional algorithms for large sample datasets. SMO is therefore used in this work to replace traditional SVM algorithms, and the separation-based binary decision tree SMO (BDT-SMO) is employed for classifying hyperspectral remote sensing images. Existing research [18, 19] has shown that, compared with OAO-SMO, OAA-SMO, ECC-SMO, and DAG-SMO, BDT-SMO has the best accuracy, efficiency, and generalization ability in classification.
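To make the structure of a binary decision tree over classes concrete, the following sketch builds such a tree and counts its internal nodes, each of which would host one two-class SMO classifier. The splitting rule (seed the two branches with the pair of classes whose centroids are farthest apart) is only a stand-in assumption for the separation measure; the class names and centroid values are hypothetical.

```python
import numpy as np

def build_bdt(classes, centroids):
    """Recursively split class labels into a binary tree.

    Leaves are single labels; an internal node is a (left, right) tuple and
    stands for one two-class classifier separating the two groups.
    """
    if len(classes) == 1:
        return classes[0]
    # Seeds: the two classes whose centroids are farthest apart (assumed rule).
    pairs = [(float(np.linalg.norm(centroids[a] - centroids[b])), a, b)
             for i, a in enumerate(classes) for b in classes[i + 1:]]
    _, seed_l, seed_r = max(pairs)
    left = [c for c in classes
            if np.linalg.norm(centroids[c] - centroids[seed_l])
            <= np.linalg.norm(centroids[c] - centroids[seed_r])]
    right = [c for c in classes if c not in left]
    if not right:  # all centroids coincide: fall back to an arbitrary split
        left, right = classes[:1], classes[1:]
    return (build_bdt(left, centroids), build_bdt(right, centroids))

def count_internal(tree):
    """Number of internal nodes, i.e. two-class classifiers to train."""
    if isinstance(tree, tuple):
        return 1 + count_internal(tree[0]) + count_internal(tree[1])
    return 0

# Hypothetical class centroids in a 2-D feature space.
centroids = {"C1": np.array([0.0, 0.0]), "C2": np.array([0.0, 1.0]),
             "C3": np.array([5.0, 5.0]), "C4": np.array([5.0, 6.0])}
tree = build_bdt(list(centroids), centroids)
n_classifiers = count_internal(tree)
print(tree, n_classifiers)
```

For the four toy classes the tree needs three two-class classifiers, whereas a pairwise scheme over the same classes would need six.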
5. Multiple Radial Basis Kernel Function via Weighted Combination
5.1. Definition and Properties of Kernel Function
For the SVM nonlinear classification problem, the input data in the original space, $R^d$, must be nonlinearly mapped by means of the kernel function to linearly classifiable data in the high-dimensional feature space, where the classification hyperplane is then constructed. If a kernel function, $K(x, z)$, satisfies the Mercer conditions, then $K(x, z)$ corresponds to the inner product in a transformation space. The Mercer theorem, which builds on Hilbert-Schmidt theory, states the following: let $X$ be a compact subset of $R^d$; for any given symmetric function, $K(x, z)$, on $X \times X$, if its integral operator in the Hilbert space satisfies the integral positive definiteness condition
$$\int_{X \times X} K(x, z) f(x) f(z) \, dx \, dz \ge 0, \quad \forall f \in L_2(X), \tag{5}$$
then there must exist a feature space $F$ and a mapping $\phi: X \to F$ such that
$$K(x, z) = \phi(x) \cdot \phi(z). \tag{6}$$
The kernel function provides an effective approach to SVM nonlinear classification, and the inherent linearity of the high-dimensional space makes SVM more practically feasible. In various practical kernel applications, the kernel function operation in the original input space is equivalent to the high-dimensional operation executed after the nonlinear transformation, and the details of the nonlinear mapping are usually unknown. Different kernel functions satisfying the Mercer theorem can be combined as required by the application to construct a combined kernel function of higher complexity, which can be linear or nonlinear. From the kernel function properties, it is known that if $K_1$ and $K_2$ are kernel functions on $X \times X$ and there exist constants $a_1 \ge 0$ and $a_2 \ge 0$, then the functions in the following equations are also kernel functions:
$$K(x, z) = a_1 K_1(x, z) + a_2 K_2(x, z), \tag{7}$$
$$K(x, z) = K_1(x, z) K_2(x, z). \tag{8}$$
From the above equations it can be seen that the following function is also a kernel function if $a_i \ge 0$ and each kernel function $K_i$ satisfies the Mercer conditions above:
$$K(x, z) = \sum_{i=1}^{m} a_i K_i(x, z). \tag{9}$$
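The closure of kernels under nonnegative weighted sums can be checked numerically on a finite sample: the Gram matrix of a valid kernel is symmetric positive semidefinite, and so is any nonnegative combination of such matrices. The sketch below assumes two RBF kernels with arbitrarily chosen widths and weights on random data; none of these values come from the experiments in this paper.

```python
import numpy as np

def rbf_gram(x, sigma):
    """Gram matrix of exp(-||x_i - x_j||^2 / (2 sigma^2)) for the rows of x."""
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 5))          # 20 random samples, 5 features

k1 = rbf_gram(x, sigma=0.5)           # narrow kernel
k2 = rbf_gram(x, sigma=2.0)           # wide kernel
combined = 0.3 * k1 + 0.7 * k2        # nonnegative weights (assumed values)

# A valid kernel yields a symmetric Gram matrix with no negative eigenvalues
# (up to floating-point round-off).
min_eig = np.linalg.eigvalsh(combined).min()
print(min_eig >= -1e-10)
```

A check of this kind only tests one sample set, so it can reveal an invalid combination but cannot prove validity in general; the proof rests on the kernel properties stated above.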
5.2. Radial Basis Kernel Function
In theory, any function satisfying the Mercer theorem can be used as a kernel function, but different kernel functions must be designed for different applications. Common kernel functions include linear kernels, polynomial kernels, sigmoid kernels, and radial basis functions (RBFs). The first three functions are global kernels, and only RBF is a local kernel. Extensive work has shown that RBF-based SVM outperforms SVM based on the other three kernels and thus is used widely. The inner product of the vectors $x$ and $x_c$ in the feature space for RBF is
$$K(x, x_c) = \exp\left( -\frac{\|x - x_c\|^2}{2\sigma^2} \right), \tag{10}$$
where $x_c$ is a center point in RBF, $\|x - x_c\|$ is the norm of the vector representing the distance between $x$ and $x_c$, and $\sigma$ is a width parameter that controls the radial scope of the function, that is, it determines the width of the $x_c$-centered range within which RBF exerts its influence. RBF is a highly local kernel, and the learning ability of RBF-based SVM is largely dependent on the parameter $\sigma$ and the punitive factor, $C$. A small value of $\sigma$ means high SVM learning ability, low empirical risk in the structural risk, and poor generalization ability, giving rise to a large confidence range. SVM uses the punitive factor, $C$, to balance empirical risk with confidence level to minimize the actual risk. A high value of $C$ means a high degree of data fitting and poor generalization ability. Therefore, proper values of $\sigma$ and $C$ need to be selected to minimize structural risk and maximize generalization ability for classifiers.
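The locality of the RBF and the role of its width parameter can be seen in a small numerical example. The sketch assumes the Gaussian form $\exp(-\|x - x_c\|^2 / (2\sigma^2))$; the point, center, and width values are arbitrary illustrations.

```python
import numpy as np

def rbf(x, center, sigma):
    """Gaussian RBF value exp(-||x - center||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

center = [0.0, 0.0]
point = [1.0, 1.0]                      # squared distance to the center is 2

narrow = rbf(point, center, sigma=0.5)  # exp(-4): the point is barely "seen"
wide = rbf(point, center, sigma=2.0)    # exp(-0.25): still strongly similar
at_center = rbf(center, center, sigma=0.5)

print(narrow, wide, at_center)
```

With a small width only samples very close to the center contribute, which gives a flexible but easily overfitted classifier; a large width smooths the response over a wider neighborhood.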
5.3. Combined Radial Basis Kernel Function (CRBF)
There is some evidence [19, 23] that RBF can achieve high accuracy for spectral and textural feature classification and is capable of multiscale learning. To improve the accuracy and efficiency of hyperspectral remote sensing classification, two different RBFs are first used for the spectral and textural features and are then combined with weights to form the CRBF. Consider $x$ as the feature vector of any pixel in the fused image, $x^s$ as the combined vector of the pixel’s spectral features, and $x^t$ as the pixel’s textural feature vector; $x^s$ and $x^t$ can be combined in a separate or cross manner. The kernels of the two combination approaches are shown as follows, respectively:
Equation (11) shows the separate combination of multiple single-kernel RBFs, in which each spectral feature kernel and each textural feature kernel enters the sum with its own weight. From the kernel properties, it can be seen that the combination satisfies the Mercer conditions provided that the weights are nonnegative and each component kernel satisfies them. Equation (12) shows the cross-combination of multiple single-kernel RBFs. Even if each kernel satisfies the Mercer conditions, the cross-combination requires the spectral and textural feature vectors to have the same number of dimensions. It is seldom used in practice due to the high computational complexity of the kernel. In this work, the spectral features have more dimensions than the textural features, so (11) is necessary for combinatorial classification of spectral and textural features. According to the classification features selected in this work, (11) is adjusted by using the same kernel for spectral-terrain features. In addition, the same kernel is used for different types of textural features (gray, gradient, and scale) to reduce the computational complexity of the kernel. The adjusted kernel is shown as follows:
The CRBF constructed based on spectral and textural features is shown as follows:
$$K_{CRBF}(x_i, x_j) = \mu \exp\left( -\frac{\|x_i^s - x_j^s\|^2}{2\sigma_1^2} \right) + (1 - \mu) \exp\left( -\frac{\|x_i^t - x_j^t\|^2}{2\sigma_2^2} \right), \tag{14}$$
where $x^s$ and $x^t$ denote the spectral and textural feature vectors of a pixel, $\mu$ ($0 \le \mu \le 1$) is the weight, and $\sigma_1$ and $\sigma_2$ are the standardization parameters determining the width of the range surrounding the center of the spectral radial kernel and the center of the textural radial kernel, respectively. The weight $\mu$ is used to balance the effect of the spectral radial kernel (SPRBF) and the textural weighted radial basis function kernel (WRBF) in CRBF. When $\mu$ is close to 1, SPRBF is dominant in CRBF; otherwise, WRBF is dominant. By adjusting $\mu$, the weights of the effect of spectral and textural features in CRBF can be properly modified, and the optimum combinations to improve the learning and generalization abilities of CRBF can be identified.
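A weighted combination of a spectral RBF and a textural RBF of the kind described above can be sketched as follows. The sketch assumes the Gaussian form with $2\sigma^2$ in the denominator, a weight and its complement on the two kernels, and a feature vector whose first `n_spectral` entries are spectral; the function name `crbf`, the parameter names, and the toy vectors are hypothetical.

```python
import numpy as np

def crbf(xi, xj, n_spectral, mu, sigma_s, sigma_t):
    """Weighted sum of a spectral RBF and a textural RBF.

    xi, xj: full feature vectors; the first n_spectral entries are treated
    as spectral features and the rest as textural features (assumed layout).
    mu in [0, 1] weights the spectral kernel, (1 - mu) the textural kernel.
    """
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    d_s = np.sum((xi[:n_spectral] - xj[:n_spectral]) ** 2)
    d_t = np.sum((xi[n_spectral:] - xj[n_spectral:]) ** 2)
    k_s = np.exp(-d_s / (2.0 * sigma_s ** 2))   # spectral RBF
    k_t = np.exp(-d_t / (2.0 * sigma_t ** 2))   # textural RBF
    return float(mu * k_s + (1.0 - mu) * k_t)

# Two toy pixels: identical spectra, different texture (3 + 2 features).
p = [0.2, 0.4, 0.6, 0.1, 0.9]
q = [0.2, 0.4, 0.6, 0.9, 0.1]

spectral_only = crbf(p, q, 3, mu=1.0, sigma_s=1.0, sigma_t=1.0)
texture_only = crbf(p, q, 3, mu=0.0, sigma_s=1.0, sigma_t=1.0)
balanced = crbf(p, q, 3, mu=0.5, sigma_s=1.0, sigma_t=1.0)
print(spectral_only, texture_only, balanced)
```

With the weight at 1 the two pixels look identical because only their spectra are compared; lowering the weight lets the textural difference reduce the similarity.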
6. BDT-SMO Parameter Optimization Strategy Based on Genetic Algorithms
The types of kernels and the selection of the corresponding parameters have an enormous impact on the performance of SVM classifiers. As mentioned above, the performance of the CRBF-based BDT-SMO classifier is dependent on the punitive factor, $C$, the weight, $\mu$, and the standardization parameters, $\sigma_1$ and $\sigma_2$. Thus the parameter optimization strategy for CRBF is of great importance. The SVM parameters are traditionally determined by empirical means or via cross-validation. The empirical method is highly subjective and practically infeasible. The cross-validation approach is prone to getting stuck in a local minimum and it is hard to achieve the optimal approximation of a function because it requires the function to be continuous and derivable. In recent years, particle swarm optimization (PSO) algorithms and genetic algorithms (GA) have been used widely and effectively for SVM parameter selection and optimization.
In this work, GA is first used to obtain the kernel parameters of SRBF (including parameters $C$ and $\sigma$) and CRBF (including parameters $C$, $\mu$, $\sigma_1$, and $\sigma_2$). Next, the trained BDT-SMO classifier is employed to classify the land usages of the hyperspectral fused images. SRBF and CRBF parameters can be obtained via GA according to the steps in Figure 3. CRBF has more kernel parameters than SRBF, and the parameter optimization processes of SRBF and CRBF are similar, so only the GA-based parameter optimization process of CRBF is described below.
(1) The BDT-SMO training samples are taken as the original population, the population size is determined, binary encoding is performed on all parameters, and the original GA population is selected randomly.
To better transfer the excellent gene segments to the next generation, the chromosomes corresponding to the parameters (strings composed of specific symbols in a certain order) should be designed by a coding mechanism before the searching process of GA. In this work, the four parameters of CRBF are encoded with binary codes, and the order of parameters in chromosome is , , , and . The lengths of the four parameters are represented with , , , and , respectively, and the length of the chromosome is , so the length of searching interval is . In this work, the values of ~ are set as follows: , , , and . The searching intervals of the parameters’ optimization are set according to the possible value ranges of the parameters as follows: , , , and . The structure of the chromosomes is shown in Figure 4.
(2) The fitness function is designed to measure how suitable the selected parameters are, and the fitness of each individual in the population is computed. In this work, the kernel parameters are optimized to improve the prediction accuracy of classification, that is, to make the actual values of the training samples approach the predicted values as closely as possible. Therefore, the total difference between the actual values and the predicted values is a proper measurement of the fitness. The fitness function used in this work is shown as follows :
In (15), represents the number of training samples and and represent the prediction value and actual value of the training sample . The bigger the value is, the better the fitness of the corresponding parameters becomes. That is, the quality of the individuals will be better, and the probability that better genes are passed on to the next generation will be greater.
(3) Independent genetic manipulation is performed on the population: the independent genetic manipulation in this paper includes selection, crossover, mutation, and fitness evaluation. Firstly, the fitness of all individuals is calculated and the roulette wheel selection strategy is adopted to select individuals from the current population for evolution into the next generation. Secondly, a crossover operation is performed using the one-point crossover algorithm and the crossover probability ($P_c$) to obtain new individuals for the next generation. Thirdly, a mutation operation is performed using the simple mutation algorithm and the mutation probability ($P_m$); new individuals are generated by randomly changing some gene segments in the old individuals. Finally, in the fitness evaluation, the fitness function (15) is used to measure the merits and drawbacks of the individuals. In the genetic algorithm, designing an appropriate fitness function can improve the optimization efficiency. In this work, the population size of GA is 100 and the values of $P_c$ and $P_m$ are as follows: , .
(4) The termination conditions are checked. If they are satisfied, the GA is terminated and the optimal parameters are returned. Otherwise, the third step is repeated to generate new parameter values. When the number of iterations reaches the specified limit or satisfactory solutions have been obtained, the iterations are terminated and the results are output. In this work, to avoid an excessively long search time, no more than 1000 iterations are performed to obtain the final optimized parameters unless satisfactory values of the four parameters have been obtained earlier.
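Steps (1) to (4) above can be sketched as a small binary-coded genetic algorithm with roulette wheel selection, one-point crossover, and simple bit-flip mutation. All numeric settings in the sketch (bits per parameter, search intervals, crossover and mutation probabilities, population size, generation count) are assumptions chosen for a quick demonstration, and `toy_fitness` is a hypothetical stand-in for the fitness function (15), which in practice would require training and evaluating the BDT-SMO classifier.

```python
import random

BITS = 10  # bits per parameter (assumed)
# Assumed search intervals for C, mu, sigma1, sigma2 (illustrative only).
BOUNDS = [(0.1, 100.0), (0.0, 1.0), (0.01, 10.0), (0.01, 10.0)]

def decode(chrom):
    """Map each BITS-long gene segment to a real value inside its interval."""
    params = []
    for k, (lo, hi) in enumerate(BOUNDS):
        bits = chrom[k * BITS:(k + 1) * BITS]
        value = int("".join(map(str, bits)), 2)
        params.append(lo + (hi - lo) * value / (2 ** BITS - 1))
    return params

def toy_fitness(params, target=(10.0, 0.6, 1.0, 2.0)):
    """Hypothetical stand-in: larger when params are closer to a known target."""
    err = sum(((p - t) / (hi - lo)) ** 2
              for p, t, (lo, hi) in zip(params, target, BOUNDS))
    return 1.0 / (1.0 + err)

def roulette(pop, fits, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick, acc = rng.uniform(0.0, sum(fits)), 0.0
    for chrom, fit in zip(pop, fits):
        acc += fit
        if acc >= pick:
            return chrom
    return pop[-1]

def run_ga(pop_size=40, generations=60, pc=0.8, pm=0.02, seed=0):
    rng = random.Random(seed)
    n = BITS * len(BOUNDS)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [toy_fitness(decode(c)) for c in pop]
        for c, f in zip(pop, fits):          # remember the best individual seen
            if f > best_fit:
                best, best_fit = c[:], f
        nxt = []
        while len(nxt) < pop_size:
            a, b = roulette(pop, fits, rng)[:], roulette(pop, fits, rng)[:]
            if rng.random() < pc:            # one-point crossover
                cut = rng.randint(1, n - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):             # bit-flip mutation
                nxt.append([1 - g if rng.random() < pm else g for g in child])
        pop = nxt[:pop_size]
    return decode(best), best_fit

best_params, best_fit = run_ga()
print(best_params, best_fit)
```

The sketch stops after a fixed number of generations; a run that follows the text would also stop early once a satisfactory fitness is reached.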
7. Experiments and Analysis
7.1. Experimental Zones and Samples Selection
7.1.1. Experimental Zones and Related Images
To support the credibility of the experimental results, two zones are used in the experiment. The first zone is located in Xindian Village, Jinan District, part of Gulou District, Fuzhou, adjacent to Fuzhou National Forest Park. The covered area forms a rectangular shape, and the positions of the four apexes are (N26°10′24.87′′, E119°17′40.90′′), (N26°10′22.31′′, E119°20′19.64′′), (N26°6′39.45′′, E119°17′36.49′′), and (N26°6′36.90′′, E119°20′15.14′′). The zone has an elevation of 5~500 m, where the northern terrain is higher than the south, and the north is mostly mountainous with an elevation of 50~590 m, covered by laurel forest, Masson pine, and tea garden, and interlaced with lanes, shrub-meadow, and bare land. The south part is flat and forms the core of the urban area of Fuzhou. The EO-1 Hyperion hyperspectral image and the EO-1 ALI panchromatic image for this zone are of the size 155 by 230 pixels and 465 by 690 pixels, respectively, and were captured on March 26th, 2003, covering an area of about 32.09 km².
The second zone is located in Huangyan District, Taizhou, Zhejiang, and is part of the major urban area in Taizhou. The covered area takes a rectangular shape, and the positions of the four apexes are (N28°40′40.14′′, E121°14′55.85′′), (N28°40′42.20′′, E121°17′37.17′′), (N28°37′22.34′′, E121°14′59.14′′), and (N28°37′24.40′′, E121°17′40.37′′). The southeastern part of the zone is mostly mountain and forest land with an elevation of 50~500 m, covered by laurel forest, Masson pine, and orchard, and interlaced with shrub-meadow, bare land, and lanes. As the heart of the urban area of Taizhou, the other parts of the zone are flat with an elevation of 5~50 m, rich in buildings, farmland, roads, and a small quantity of woodland and rivers. The Hyperion hyperspectral image and the ALI panchromatic image for this zone are 153 by 214 pixels and 459 by 642 pixels, respectively, and were captured on March 10th, 2003, covering an area of about 29.47 km².
The obtained Hyperion hyperspectral images and the ALI panchromatic images needed preprocessing, for example, removal of undesired bands (the original images contain 241 bands), conversion of radiation values, removal of bad lines, stripe repair, elimination of the Smile effect, atmospheric correction, and geometric accuracy correction. The aim is to generate Hyperion images with 134 ground reflectance bands and ALI images with 1 panchromatic band. Next, the Gram-Schmidt (GS3) method is adopted to fuse the images above. The images are fused very effectively because the Hyperion and ALI images are both from the EO-1 sensors and have the same phase. In addition, existing research has shown that, compared with other hyperspectral image fusion methods, the Gram-Schmidt (GS3) method can maintain both the spectral features and the spatial texture features of hyperspectral images, and it is a relatively ideal method for the fusion of Hyperion hyperspectral images and ALI panchromatic images. The fused images from the two zones are shown in Figure 5.
Figure 5: Fused images of the two experimental zones: (a) Zone 1; (b) Zone 2.
The land in the two zones is divided into 12 classes based on its usage: farmland (C1), forest (C2), garden (C3), grassland (C4), bare land (C5), industrial and mining warehousing land (C6), residential land (C7), public administration/commerce and service/land concerning foreign affairs (C8), special land (C9), communications and transportation land (C10), water space (C11), and others (C12).
7.1.2. Selection of Samples
Representative regions are selected through field work and comprehensive analysis of remote sensing images to collect ground-object samples. It is usually required that the training samples are no fewer than the number of test samples and the training samples for each type of ground object are no fewer than the number of classification features. With the assistance of the high resolution images of the zones and the 1 : 10000 land usage maps, all the representative samples are extracted for each class of land usage from the two experimental zones, as shown in Tables 1 and 2.
After the samples are extracted, it is required that the feature data of the training and test samples are normalized, in order to eliminate the difference of features in terms of dimension, order of magnitude, and data dispersion.
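The normalization step can be sketched as follows. The text does not state which normalization is used, so per-feature min-max scaling is assumed here purely for illustration, with the scaling constants taken from the training samples and reused for the test samples; the function names and the toy values are hypothetical.

```python
import numpy as np

def fit_minmax(train):
    """Per-feature minimum and range, estimated from the training samples only."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero
    return lo, span

def apply_minmax(x, lo, span):
    """Scale features with constants obtained from fit_minmax."""
    return (x - lo) / span

# Toy samples: column 0 has a much larger order of magnitude than column 1.
train = np.array([[100.0, 0.2], [300.0, 0.4], [200.0, 0.8]])
test = np.array([[250.0, 0.5]])

lo, span = fit_minmax(train)
train_n = apply_minmax(train, lo, span)
test_n = apply_minmax(test, lo, span)
print(train_n, test_n)
```

After scaling, both features lie on a comparable range, so neither dominates the distance computed inside an RBF kernel merely because of its units.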
7.2. Extraction and Selection of Classification Features
Because some ground objects (e.g., roads, buildings, mountains, and rivers) in the experimental zones show highly obvious textural or terrain features, the classification features in this work include the spectral, textural, and terrain features. To improve classification accuracy, as many features as possible are encompassed in this work because the overfitting problem in SVM schemes is not serious when the number of the features is larger.
One hundred and eighty-one features are extracted from the images of the experimental zones, including 136 spectral features, 42 textural features, and 3 terrain features. The spectral features include 134 ground reflectance features, 1 NDVI feature (normalized difference vegetation index), and 1 NDBI feature (normalized difference built-up index). The textural features involve 6 gray distribution features (angular second moment, contrast, correlation, variance, inverse difference moment, and information entropy, extracted using the gray-level co-occurrence matrix method), 15 gray gradient features (e.g., small gradient dominance, large gradient dominance, and heterogeneity of gray distribution, extracted via the gray-gradient co-occurrence matrix method proposed by Hong Jiguang in 1984), and 21 textural scale features (semivariance, energy, and mean extracted from the 7 high-frequency components after the second level wavelet decomposition). The terrain features include 3 features (elevation, slope, and aspect) extracted from the DEMs of the experimental zones.
All the features belonging to different types can be represented by different feature vectors as follows: spectrum: = ; texture (gray-distribution): ; texture (gray-gradient): ; texture (gray-scale): ; terrain: .
The 5 feature vectors above can be integrated into 16 different combined feature vectors; the details of the different combined feature vectors are shown in Table 3.
In the classification experiment of remote sensing images, the test samples are used to verify the accuracy of the classification, and the classification results can be analyzed through confusion matrices. The common evaluation indexes of classification results are user’s accuracy (UA), producer’s accuracy (PA), overall accuracy (OAA), and the Kappa coefficient. In addition, the speed of the classification can be measured by the prediction time (TS). In this work, to select the best classification method and feature vector, cross-experiments are performed with the 16 different feature vectors above and 5 different classification methods. The classification methods include the minimum distance classifier (MDC), maximum likelihood classifier (MLC), BP neural network (BPN), spectral angle mapping (SAM), and sequential minimal optimization (SMO).
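The evaluation indexes named above can all be derived from a confusion matrix, as the following sketch shows. It assumes the convention that rows are reference (ground truth) classes and columns are assigned classes; the function name and the two-class toy matrix are hypothetical and unrelated to the tables in this paper.

```python
import numpy as np

def accuracy_report(cm):
    """Overall, producer's, and user's accuracy plus Kappa from a confusion matrix.

    cm[i, j] = number of test samples of reference class i assigned to class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / total
    producers = diag / cm.sum(axis=1)   # correct / reference total per class
    users = diag / cm.sum(axis=0)       # correct / assigned total per class
    # Chance agreement expected from the row and column marginals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (overall - expected) / (1.0 - expected)
    return overall, producers, users, kappa

cm = [[45, 5],
      [10, 40]]
overall, producers, users, kappa = accuracy_report(cm)
print(overall, producers, users, kappa)
```

In this toy case 85 of 100 samples are correct while chance agreement is 0.5, so the Kappa coefficient is 0.7, noticeably lower than the overall accuracy.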
To analyze the influence of the number of training samples on the classification results, seven sample subsets of different sizes are selected randomly from the samples extracted from the experimental zones. The seven subsets contain 5%, 10%, 20%, 40%, 60%, 80%, and 100% of the extracted samples, respectively, and are applied to classification experiments using SVM and other methods. The stability of the classification results can be measured by the standard deviation. Each classification experiment is thus repeated 7 times, once for each subset size, and the classification results in Figures 6 and 7 are the stable values from the seven experiments. Because the classification results of the two experimental zones are very similar, only the results of experimental zone 1 are shown.
Figures 6 and 7 show that the accuracy of the classification methods based on only the spectral feature vector (v1) is lower than that obtained with the other feature vectors, and the accuracy and speed of the SMO method are higher than those of the other methods for all feature vectors. In addition, the accuracy of SMO is the highest of all when the feature vector v12 is used.
In addition, the different combined feature vectors have the similar effects on the Kappa coefficient and the overall accuracy, so the effect on the latter will not be shown here.
Therefore, in the land use classification of hyperspectral fused images, the method of SMO based on the features (178 dimensions) combined of spectrum and textural features can achieve the best classification accuracy and speed.
7.3. SMO-Based Multiclass Classification of Hyperspectral Fused Images of Land Usage
Before classification with the combined kernel-based SMO method is undertaken, the optimal SMO-based multiclass classification approach must first be determined. The training samples of the two zones are learned using OAO-SMO, OAA-SMO, DAG-SMO, ECC-SMO, and BDT-SMO, respectively, all based on the same kernel (SRBF). GA is used to optimize the kernel parameters. As in Section 7.2, the experiments are repeated 7 times with different numbers of samples: the seven subsets contain 5%, 10%, 20%, 40%, 60%, 80%, and 100% of the extracted samples, respectively, and are applied to classification experiments using the different SMO multiclass classification approaches. The stability of the classification results is measured by the standard deviation. The experimental results are provided in Tables 4 and 5, and the classification accuracies in the tables are the stable values over the seven experiments.
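The GA-based parameter search can be illustrated with a small real-coded genetic algorithm. This is a generic sketch, not the authors' configuration: the operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), the population settings, the bounds, and the toy fitness function are all assumptions. In a real experiment the fitness would be the validation accuracy of a classifier trained with the candidate parameters.

```python
import random


def ga_optimize(fitness, bounds, pop_size=20, generations=30,
                mutation_rate=0.2, seed=0):
    """Maximize `fitness` over real-valued parameters inside `bounds`.

    Returns (best_params, history); history[g] is the best fitness found in
    generation g. Elitism guarantees that history never decreases.
    """
    rng = random.Random(seed)

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda t: t[0], reverse=True)
        history.append(scored[0][0])
        next_pop = [scored[0][1][:]]          # elitism: copy the best unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            w = rng.random()                  # arithmetic crossover weight
            child = [w * x + (1 - w) * y for x, y in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.1 * (hi - lo))
                child[i] = min(hi, max(lo, child[i]))   # clip to the bounds
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness), history


def toy_fitness(params):
    """Hypothetical stand-in for validation accuracy; peak at (10.0, 0.5)."""
    param_a, param_b = params
    return -((param_a - 10.0) ** 2) / 100.0 - (param_b - 0.5) ** 2


bounds = [(0.1, 100.0), (0.01, 2.0)]
best, history = ga_optimize(toy_fitness, bounds)
print(best, history[-1])
```

Keeping the best individual unchanged in each generation means that the reported best fitness can only improve or stay the same as the search proceeds.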
From Tables 4 and 5, the overall accuracy, average accuracy, Kappa coefficient, and standard deviation of the BDT-SMO approach are all better than those of the other approaches. BDT-SMO is therefore the best of the compared multiclass classification approaches for hyperspectral fused images.
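One reason a binary decision tree is attractive for multiclass problems is that it needs few binary classifiers and few decisions per sample. The sketch below illustrates this with a simplified tree that splits the class list in half at each node; this splitting rule, the ten class labels, and the one-dimensional toy decision function are assumptions made for illustration and stand in for the trained binary SMO classifiers.

```python
def build_tree(classes):
    """Recursively split a list of class labels into a binary tree.

    A leaf is a single label (a string); an internal node is a tuple
    (left_classes, right_classes, left_subtree, right_subtree).
    """
    if len(classes) == 1:
        return classes[0]
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    return (left, right, build_tree(left), build_tree(right))


def count_binary_classifiers(tree):
    """Each internal node requires exactly one binary classifier."""
    if not isinstance(tree, tuple):
        return 0
    return (1 + count_binary_classifiers(tree[2])
            + count_binary_classifiers(tree[3]))


def predict(tree, x, decide):
    """Walk from the root to a leaf.

    `decide(left, right, x)` stands in for a trained binary classifier and
    returns True when x belongs to one of the `left` classes.
    Returns (label, number_of_binary_decisions).
    """
    steps = 0
    while isinstance(tree, tuple):
        left, right, left_sub, right_sub = tree
        tree = left_sub if decide(left, right, x) else right_sub
        steps += 1
    return tree, steps


# Hypothetical setup: ten classes with 1-D centroids at 1.0, 2.0, ..., 10.0.
classes = ["C%d" % i for i in range(1, 11)]
centroid = {c: float(i) for i, c in enumerate(classes, start=1)}


def toy_decide(left, right, x):
    d_left = min(abs(x - centroid[c]) for c in left)
    d_right = min(abs(x - centroid[c]) for c in right)
    return d_left <= d_right


tree = build_tree(classes)
n_classifiers = count_binary_classifiers(tree)
label, steps = predict(tree, 7.2, toy_decide)
print(n_classifiers, label, steps)
```

For k classes the tree needs k - 1 binary classifiers, whereas one-against-one needs k(k - 1)/2 and one-against-all needs k, and a sample passes through only the classifiers on its root-to-leaf path.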
7.4. CRBF-Based BDT-SMO Classification of Hyperspectral Fused Images of Land Usage
Land usages of the two zones are classified using BDT-SMO classifiers based on the linear, polynomial, sigmoid, SRBF, and CRBF kernels, respectively. As in Section 7.3, the experiments are repeated 7 times with different numbers of samples: the seven subsets contain 5%, 10%, 20%, 40%, 60%, 80%, and 100% of the extracted samples, respectively, and are applied to BDT-SMO classification experiments using the different kernel functions. The stability of the classification results is again measured by the standard deviation. The experimental results are provided in Tables 6 and 7, and the classification accuracies in the tables are the stable values over the seven experiments.
Tables 6 and 7 give the experimental results for the two zones. The polynomial kernel is parameterized by an offset coefficient and a polynomial order, the sigmoid kernel by a scale parameter and an attenuation parameter, and GA is employed to optimize the kernel parameters.
From Tables 6 and 7, it can be seen that the CRBF-based BDT-SMO classifiers have the highest overall accuracy (85.43% and 86.15%, which is 1.97% and 2.10% higher than with the SRBF kernel), average accuracy (85.67% and 85.52%, which is 2.01% and 2.15% higher than with the SRBF kernel), and Kappa coefficient values (0.8411 and 0.8483). The standard deviations of the classification accuracies of the CRBF-based BDT-SMO classifiers (0.63 and 0.62) are very similar to those of the SRBF-based classifiers and clearly lower than those of the other kernels. CRBF also has the fastest test speed, and its training speed is second only to that of SRBF. Thus, although the CRBF-based BDT-SMO classifiers train slightly more slowly than the SRBF-based ones, they achieve better accuracy and test speed with almost the same stability, and therefore perform better overall.
The results of the CRBF-based and SRBF-based BDT-SMO classification of land usages from the hyperspectral images are provided in Figures 8 and 9. In addition to the overall classification results, the accuracy of each individual class of land usage is also improved, as shown in Tables 8 and 9: CRBF classifies each class of land usage more accurately. The improvement is especially large for the four difficult classes, C3 (garden), C5 (bare land), C8 (public administration/commerce and service/land concerning foreign affairs), and C10 (communications and transportation land), whose producer's accuracy (PA) reaches 80.04%–80.51%, 66.17%–67.84%, 83.13%–84.76%, and 82.54%–84.07%, respectively, which is 3.99%–5.93%, 2.26%–2.78%, 3.37%–5.21%, and 3.13%–4.44% higher than with the single RBF kernel.
Therefore, the CRBF-based BDT-SMO classifiers classify the land usages in the experimental zones from hyperspectral fused images more effectively.
To show the benefit of CRBF for the classification of C3, C5, C8, and C10 more intuitively, 4 instances are selected from each of the two zones for verification. The BDT-SMO classification results for these instances under the single RBF and CRBF kernels are provided in Table 10.
From Table 10, it can be seen that all 8 positions are misclassified by the SRBF-based BDT-SMO classifier and classified correctly by the CRBF-based BDT-SMO classifier. Therefore, the CRBF-based BDT-SMO classifier can classify land usage from hyperspectral fused images effectively and efficiently.
To improve the accuracy and efficiency of the classification of hyperspectral fused images, a new combined radial basis kernel function (CRBF) is proposed by combining, in a weighted manner, the radial basis kernels used for spectral and textural features. The binary decision tree-based multiclass SMO (BDT-SMO) with the CRBF kernel is used in the classification of hyperspectral fused images, and the parameters of the CRBF-based BDT-SMO classifiers are optimized by genetic algorithms (GA). Experimental results show that, compared with SRBF, CRBF-based BDT-SMO classifiers have better accuracy and efficiency in the classification of hyperspectral fused images.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
The work was supported by the National Natural Science Foundation of China (no. 41201417) and the Science and Technology Project of Fujian Province Education Department of China (no. JA13364). The authors would like to thank Zhilei Lin and Professor Luming Yan for useful assistance, suggestions, and discussions.