Computational Intelligence and Neuroscience

Volume 2018 (2018), Article ID 1018789, 7 pages

https://doi.org/10.1155/2018/1018789

## A Multiple Kernel Learning Model Based on $p$-Norm

^{1}School of Information, Renmin University of China, Beijing 100872, China

^{2}School of Computer Science and Technology, Huaiyin Normal University, Huai’an, Jiangsu 223300, China

Correspondence should be addressed to Xun Liang; moc.361@gnail__nux

Received 29 July 2017; Revised 7 December 2017; Accepted 24 December 2017; Published 23 January 2018

Academic Editor: Toshihisa Tanaka

Copyright © 2018 Jinshan Qi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

By utilizing kernel functions, support vector machines (SVMs) successfully solve linearly inseparable problems, which has greatly extended their range of application. Using multiple kernels (MKs) to improve SVM classification accuracy has been a hot topic in the SVM research community for several years. However, most MK learning (MKL) methods employ an $L_1$-norm constraint on the kernel combination weights, which yields a sparse yet nonsmooth solution for the kernel weights. Alternatively, an $L_p$-norm constraint on the kernel weights keeps all information in the base kernels. Nonetheless, the solution of $L_p$-norm constrained MKL is nonsparse and sensitive to noise. Recently, some scholars presented an efficient sparse generalized MKL ($L_1$- and $L_2$-norms based GMKL) method, which establishes an elastic constraint on the kernel weights. In this paper, we further extend the GMKL to a more generalized MKL method based on the $p$-norm, by joining $L_1$- and $L_p$-norms. Consequently, the $L_1$- and $L_2$-norms based GMKL is a special case of our method when $p = 2$. Experiments demonstrated that our $L_1$- and $L_p$-norms based MKL offers higher classification accuracy than the $L_1$- and $L_2$-norms based GMKL, while keeping the properties of the $L_1$- and $L_2$-norms based GMKL.

#### 1. Introduction of MKL

The support vector machine (SVM) is a classification and regression tool based on statistical learning theory [1]. By utilizing a kernel function, the SVM maps the data into a high-dimensional space, builds an optimal separating hyperplane there, and consequently solves the nonlinear problem. Choosing an adequate kernel function is critical when solving an SVM problem, because different kernels and parameters produce different classification and regression results. The most widely used kernel functions are radial basis functions and polynomial functions. In this paper, we exploit the features of different kernels to improve the classification accuracy of the SVM.

The multiple kernel learning (MKL) model [2] is a flexible learning model. Recent research shows that MKL can obtain higher classification accuracy than a single kernel. Because MKL uses different combinations of kernel functions and has greater flexibility, its performance is normally better. Constructing the MK model is, in fact, the process of seeking the combination of kernels that gives the best classification accuracy. Thus, finding the weights of the different kernels is the central problem in MKL [3, 4]. The simplest form of MKL is the $L_1$-norm MKL [5]. The $L_1$-norm MKL finds the kernel weights on a simplex and thus yields a sparse solution [6, 7]. The sparsity of the selected kernels is helpful in identifying an appropriate combination of data sources or feature subsets in real-world applications. However, the method may discard useful information and thus result in suboptimal generalization.

Alternatively, the $L_p$-norm MKL was proposed by another group of researchers, and it improves on the $L_1$-norm MKL in some scenarios. Unfortunately, the solution of the $L_p$-norm MKL is nonsparse, which means it uses all kernels in the prediction stage. The $L_p$-norm MKL is also sensitive to noise: when there are noisy data in the training set, the classification accuracy decreases greatly. Furthermore, it suffers from poor interpretability and can lead to high computational and storage costs.

Thus, there is research intending to combine the $L_1$-norm MKL and the $L_2$-norm MKL. The resulting algorithm is called the generalized MKL (GMKL) [8]; it combines the advantages of both $L_1$- and $L_2$-norms and is able to achieve higher classification accuracy. Nonetheless, the GMKL algorithm is specialized to the combination of the sparse MKL method with one particular nonsparse kernel learning method, the $L_2$-norm MKL. That research contributed by merging the $L_1$- and $L_2$-norm MKL into one general model, the GMKL [9]. In this paper, we extend the algorithm to a more general form, which combines the sparse MKL and *all* nonsparse MKL algorithms. Thus, we would like to generalize the $L_2$-norm MKL to the $L_p$-norm.

In this paper, we combine $L_1$- and $L_p$-norms [10], by extending the constraint of kernels as . We call our algorithm MKL based on $p$-norm (MKL-BP). In particular, when $p = 2$, the MKL-BP algorithm degenerates into the GMKL algorithm. In our experiments, when , the accuracy of our algorithm tends to be stable and is higher than the results with . Meanwhile, compared with the $L_1$- and $L_2$-norm MKL method, the MKL-BP also shows higher classification accuracy. The advantage of using $p$-norms is that more flexibility can be achieved during the experiments. As $p$ changes, the generalization and precision vary accordingly.

The paper is organized as follows: Section 2 describes in detail the MKL-BP model. Section 3 analyzes and verifies the relevant definitions and theorems of MKL-BP model. The implementation solution of MKL-BP model is described in Section 4. Section 5 uses the MKL-BP model to carry out experiments on the UCI datasets and compares its accuracy, running time, and so on with those of other MKL models. Section 6 concludes this research with directions for future work.

#### 2. Base Framework of MKL-BP

Based on statistical learning theory, the classification problem can be written as the general model below:

The smallest empirical risk is , while the smallest regularization risk is . The parameter $C$ is a preset constant, used for balancing the empirical and regularization risks.

In the -SVM, the model could be shown as

By optimizing problem (2), the classifier could be shown as

Using the Lagrange function and kernel , , we obtain the dual form of problem (2):
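For orientation, the textbook soft-margin SVM and its Lagrangian dual are usually written as follows. This is a sketch in conventional notation ($w$, $b$, $\xi_i$, $\alpha_i$, feature map $\phi$, and $n$ training pairs $(x_i, y_i)$ are assumed symbols), so it may differ in detail from the authors' problems (2) and (4).

```latex
% Soft-margin SVM primal (textbook form, assumed notation)
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,\qquad \xi_i \ge 0 .

% Lagrangian dual with kernel K(x_i, x_j) = \phi(x_i)^{\top}\phi(x_j)
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,K(x_i,x_j)
\quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\qquad 0 \le \alpha_i \le C .
```

The dual depends on the data only through $K(x_i, x_j)$, which is what allows a single kernel to be replaced by a weighted combination of base kernels.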

Problem (4) is the simplest form of the SVM. In the MKL model, the kernel is a linear combination of a series of base kernels. The combined kernel is

In (5), refers to the weight of kernel , and refers to the number of kernels. By using (5) and replacing in (4), we obtain the standard form of MKL:

where and refers to the constraint domain of . In the MKL model, the simplest domain is that of the $L_1$-norm MKL, where . Research shows that the $L_2$- and $L_p$-norm MKLs, where , have better classification characteristics in some respects.
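The linear kernel combination described above can be sketched in a few lines of plain Python. The names `theta` (kernel weights), `rbf_kernel`, `poly_kernel`, and the toy data points are illustrative assumptions, not the authors' notation or settings.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


def gram_matrix(kernel, points):
    """Gram matrix of one base kernel on a list of points."""
    return [[kernel(x, z) for z in points] for x in points]


def combine_kernels(gram_matrices, weights):
    """Weighted sum of precomputed Gram matrices with nonnegative weights."""
    if len(gram_matrices) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be nonnegative")
    n = len(gram_matrices[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, gram_matrices)) for j in range(n)]
        for i in range(n)
    ]


# Toy data and weights, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
base = [gram_matrix(rbf_kernel, points), gram_matrix(poly_kernel, points)]
theta = [0.7, 0.3]
K = combine_kernels(base, theta)
```

Because each base Gram matrix is symmetric positive semidefinite and the weights are nonnegative, the combined matrix is again a valid kernel matrix and can be passed to any SVM solver that accepts precomputed kernels.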

Prior research combined the $L_1$- and $L_2$-norm MKLs into the GMKL model. That paper showed that the novel model keeps the sparsity of the $L_1$-norm MKL and that its classification accuracy does not decrease on noisy data. Domain in the GMKL model is . The setting constant is used to balance the $L_1$- and $L_2$-norm MKLs, and . The experiments showed that when , the model gets the best classification accuracy.

However, that paper covers only one specific combination of sparse and nonsparse MKL models. In this paper, we generalize the model. Concretely, we generalize domain as . We call our model the MKL based on $p$-norm (MKL-BP).
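To make the idea of an elastic weight domain concrete, the sketch below assumes the constraint takes the form $v\lVert\theta\rVert_1 + (1-v)\lVert\theta\rVert_p^p \le 1$ with $\theta \ge 0$, that is, the GMKL form of [8] with the squared $L_2$ term replaced by a $p$th-power term. The symbols `theta`, `v`, and `p` and the exact form of the constraint are assumptions made for illustration.

```python
def elastic_constraint_value(theta, v, p):
    """Value of v * ||theta||_1 + (1 - v) * ||theta||_p^p (assumed form)."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    l1_term = sum(abs(t) for t in theta)
    lp_term = sum(abs(t) ** p for t in theta)
    return v * l1_term + (1.0 - v) * lp_term


def is_feasible(theta, v, p, tol=1e-12):
    """True if theta is nonnegative and satisfies the assumed constraint."""
    nonnegative = all(t >= 0 for t in theta)
    return nonnegative and elastic_constraint_value(theta, v, p) <= 1.0 + tol


theta = [0.6, 0.4, 0.0]
gmkl_value = elastic_constraint_value(theta, v=0.5, p=2)  # p = 2: GMKL-style case
bp_value = elastic_constraint_value(theta, v=0.5, p=4)    # a larger p
```

Under this assumed form, setting `p=2` gives the GMKL-style constraint, and for weights in $[0, 1]$ a larger `p` shrinks the $p$th-power term, which loosens the constraint on small weights.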

In the next section, we present the properties of our model and show that it retains the properties of GMKL. We then give an algorithm for solving the high-dimensional constraint problem. Finally, we report simulation experiments that compare the classification accuracy, running time, and number of kernels used by our model with those of other models.

#### 3. Theorem of MKL-BP

Theorem 1. *Not all the kernels are selected in the MKL-BP model, and the weights of the selected kernels are unique.*

*Proof.* By fixing as , we could easily know that the optimizing result of in (6) would be irrelevant to . We use the Lagrange function and get

By taking the partial derivatives of , we get that

By setting , we get :

Considering the case when in (9) is below zero, we set as

From (10), we could easily find that when , we get . So not all kernels would be selected in the model when . Thus, our model successfully selects the useful kernels in optimization. Also, from (10), the optimization result of is unique in our model.

In particular, when , the algorithm degenerates into the $L_p$-norm MKL, and we get

We find that all , which indicates that all kernels are selected in the $L_p$-norm MKL, so it would not discard useful kernels in the optimization. However, the model would not achieve high prediction accuracy when faced with noisy data. In that scenario, the model may also incur higher computational complexity.

*Definition 2 (similar kernel). *With the optimization of (4) and , if the selected kernels and correspond to the formula below, we call them similar kernels:

Theorem 3. *Similar kernels would get the same kernel weights when approaches the limit.*

*Proof.* We calculate as below:

When approaches the limit, . Theorem 3 indicates that when approaches the limit, among different kernels would be very small, and thus the classification accuracy does not change.

#### 4. Solution of MKL-BP

Although we have presented the MKL-BP model, it is still hard to optimize problem (6). Problem (6) is a quadratic program with a high-dimensional constraint. For the GMKL algorithm, [11] used the level method to solve the problem. However, in our model, the constraint is $p$-dimensional and the method in [11] does not work. So, we resort to the Taylor expansion method to solve the problem approximately.

We use the coordinate descent method to solve the problem iteratively: we fix or , then solve the resulting subproblem, and finally update or .

*Process 1.* Update by fixing *u*. In the first iteration, is initialized as the approximate solution of ; (6) then turns into the standard SVM problem below:

Number refers to the iteration count of the algorithm. We employ the SMO algorithm to solve this standard problem.

*Process 2.* Update by fixing ; (6) turns into a quadratic program with a high-dimensional constraint. We then use the Taylor expansion to decrease the dimension:

By using the transformation in (15), the constraint turns to

With the Taylor expansion, we have changed the high-dimensional constraint into a quadratic constraint. Next, as in the GMKL, we use the level method and the CVX toolbox to solve the problem in Process 2. CVX is a MATLAB toolbox for solving convex optimization problems.
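The effect of a Taylor expansion on a $p$th-power term can be illustrated numerically. The sketch below is a generic second-order expansion of $\sum_m \theta_m^p$ around a current point `theta0`; it assumes the authors' transformation is of this kind, and the names `theta`, `theta0`, and `p` are illustrative.

```python
def power_sum(theta, p):
    """Exact value of sum_m theta_m ** p."""
    return sum(t ** p for t in theta)


def quadratic_taylor_power_sum(theta, theta0, p):
    """Second-order Taylor approximation of sum_m theta_m ** p around theta0.

    Each term theta_m ** p is replaced by
        t0**p + p * t0**(p-1) * d + 0.5 * p * (p-1) * t0**(p-2) * d**2,
    where d = theta_m - t0, so the result is quadratic in theta.
    """
    total = 0.0
    for t, t0 in zip(theta, theta0):
        d = t - t0
        total += (
            t0 ** p
            + p * t0 ** (p - 1) * d
            + 0.5 * p * (p - 1) * t0 ** (p - 2) * d * d
        )
    return total


theta0 = [0.5, 0.5]       # current iterate (expansion point)
theta_near = [0.51, 0.49]  # a nearby candidate
exact = power_sum(theta_near, 4)
approx = quadratic_taylor_power_sum(theta_near, theta0, 4)
```

For `p = 2` the approximation is exact, which is consistent with the GMKL case already having a quadratic constraint. For larger `p` the approximation is accurate only near `theta0`, so it is natural to re-expand around the latest iterate at every step.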

*Process 3.* Update or until the stop criterion is satisfied. The stop criterion is that the program has reached the maximum number of iterations or that the change in the objective function has fallen below the threshold.
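The three processes above follow a generic alternating-update pattern, which can be sketched as below. This is only a skeleton under stated assumptions: `update_a` and `update_u` stand in for the SVM solve and the kernel-weight solve, and the toy objective is a simple convex function of two scalars, not the MKL-BP objective.

```python
def alternating_optimize(update_a, update_u, objective, a0, u0,
                         max_iter=100, tol=1e-8):
    """Alternate two block updates until the objective stops changing.

    Returns the final blocks and the number of iterations performed.
    """
    a, u = a0, u0
    previous = objective(a, u)
    for iteration in range(1, max_iter + 1):
        a = update_a(u)   # Process 1: update the first block with u fixed
        u = update_u(a)   # Process 2: update the second block with a fixed
        current = objective(a, u)
        # Process 3: stop when the objective change falls below the threshold
        if abs(previous - current) < tol:
            return a, u, iteration
        previous = current
    return a, u, max_iter  # stopped by the iteration limit


# Toy problem (hypothetical): f(a, u) = (a - 1)^2 + (u - 2)^2 + a * u,
# whose unique minimizer is a = 0, u = 2.
def toy_objective(a, u):
    return (a - 1.0) ** 2 + (u - 2.0) ** 2 + a * u


def toy_update_a(u):
    return 1.0 - u / 2.0  # exact minimizer over a for fixed u


def toy_update_u(a):
    return 2.0 - a / 2.0  # exact minimizer over u for fixed a


a_opt, u_opt, n_iter = alternating_optimize(
    toy_update_a, toy_update_u, toy_objective, a0=5.0, u0=5.0
)
```

Both stop conditions from Process 3 appear here: the loop ends either when the objective change drops below `tol` or when `max_iter` is reached.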

When $p = 2$, the problem reduces to the GMKL, so the complexity is the same as that of GMKL. According to [8], the complexity of GMKL is , where is the threshold of the solution.

#### 5. Experiments

In this section, we use UCI data to evaluate the classification accuracy of different algorithms.

We evaluate the following algorithms:

*(1) Ave-Kernel*. We use a base combination of the kernels. The weights of base combination of kernels are . We use the standard SVM solver to solve the Ave-kernel.

*(2) Simple-MKL*. It is a traditional $L_1$-MKL model, which is widely used as a baseline in many papers.

*(3) $L_p$-MKL*. The constraint of the kernel weight is ; in our paper we set as -MKL.

*(4) GMKL*. The constraint of kernel weights is , and in our paper, we set as the GMKL.

To be consistent with past work, all SVM QP problems are solved with the LibSVM QP solver. For updating and solving the kernel weights, we use the CVX toolbox.

We set the SVM parameter $C$ to 100. For the MKL-BP algorithm, the parameter settings are as follows:

The parameter $p$ is set to 2, 3, 4, 5, 6, 7, 8, 16, 32, and 64. When $p = 2$, the algorithm degenerates to the GMKL. The balancing constant is set to 0.5 for the MKL-BP.
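The settings listed above can be collected into a small configuration grid. The names `C`, `V`, `P_VALUES`, and `experiment_grid` are illustrative assumptions; only the numeric values come from the text.

```python
C = 100    # SVM parameter stated in the text
V = 0.5    # balancing constant for MKL-BP (symbol name assumed)
P_VALUES = [2, 3, 4, 5, 6, 7, 8, 16, 32, 64]  # tested values of p


def experiment_grid():
    """One configuration per tested p; p = 2 is flagged as the GMKL case."""
    return [
        {"C": C, "v": V, "p": p, "is_gmkl": p == 2}
        for p in P_VALUES
    ]


grid = experiment_grid()
```

Running each configuration on the same data splits keeps the comparison between the GMKL case and the larger values of `p` consistent.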

We analyze our MKL-BP algorithm on 5 datasets from the UCI database. The format of the datasets is given in Table 1.