Mathematical Problems in Engineering

Volume 2019, Article ID 6941475, 8 pages

https://doi.org/10.1155/2019/6941475

## Multiple Kernel Dimensionality Reduction via Ratio-Trace and Marginal Fisher Analysis

^{1}School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, Jiangsu, China

^{2}School of Intelligent Manufacturing, Jiangsu Vocational Institute of Architectural Technology, Xuzhou 221008, Jiangsu, China

Correspondence should be addressed to Yongguo Yang; ygyang88@hotmail.com and Mingming Liu; jsjzi_lmm@126.com

Received 12 July 2018; Revised 20 November 2018; Accepted 11 December 2018; Published 14 January 2019

Academic Editor: Ezequiel López-Rubio

Copyright © 2019 Hui Xu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Traditional supervised multiple kernel learning (MKL) methods for dimensionality reduction are generally extensions of kernel discriminant analysis (KDA), which has some restrictive assumptions. In addition, they are generally based on the graph embedding framework. A more general multiple kernel-based dimensionality reduction algorithm, called multiple kernel marginal Fisher analysis (MKMFA), is presented for supervised nonlinear dimensionality reduction combined with the ratio-trace optimization problem. MKMFA aims at relaxing the restrictive assumption that the data of each class follows a Gaussian distribution, and at finding an appropriate convex combination of several base kernels. To improve the efficiency of multiple kernel dimensionality reduction, the spectral regression framework is incorporated into the optimization model. Furthermore, the optimal weights of the predefined base kernels can be obtained by solving a convex optimization problem. Experimental results on benchmark datasets demonstrate that MKMFA outperforms the state-of-the-art supervised multiple kernel dimensionality reduction methods.

#### 1. Introduction

Recently, multiple kernel dimensionality reduction methods have attracted many researchers, and a series of methods have been proposed based on the graph embedding framework [1–7]. These methods generally transform the primal data into high-dimensional feature spaces induced by a set of base kernels, in which a linear transformation is sought to perform dimensionality reduction. Consequently, these nonlinear dimensionality reduction methods not only deal with high-dimensional data effectively, but also automatically select optimal kernels from a predefined set of base kernels. It has been demonstrated that multiple kernel methods perform better than single kernel-based methods in dimensionality reduction.

Although existing multiple kernel dimensionality reduction methods are significantly superior to single kernel-based dimensionality reduction methods, they still face challenging issues. Firstly, these algorithms have to iteratively solve a time-consuming generalized eigenvalue problem as part of an alternating optimization procedure. Secondly, they generally either transform the primal model into a simpler form by relaxing it into a semidefinite programming (SDP) problem or use gradient descent to obtain local optima, both of which can degrade performance. To overcome these shortcomings, several multiple kernel dimensionality reduction algorithms based on spectral regression were proposed recently [8–10]. They transform the eigen-decomposition of dense matrices into a linear regression problem by means of spectral regression. However, they still rely on convex relaxation or gradient descent to optimize the kernel weights. Instead of convex relaxation, a multiple kernel learning framework was recently proposed that avoids relaxing the primal problem [11]; it learns a transformation into a lower-dimensional space by converting a ratio-trace maximization problem into a semi-infinite linear program. But this method still needs to iteratively compute the generalized eigen-decomposition of dense matrices. In addition, these methods are all multiple kernel versions of KDA and are unified under the graph embedding framework. Thus, they all assume that the distribution of each class is a unimodal Gaussian. This property often does not hold in real-world applications, and the separability of the different classes cannot then be well characterized by the interclass scatter.
Although kernel marginal Fisher analysis (KMFA) has been developed to overcome this limitation by using an intrinsic graph and a penalty graph [12], it has to choose the kernel type and determine its parameters beforehand.

Motivated by these methods, in this paper a new multiple kernel dimensionality reduction algorithm, called multiple kernel marginal Fisher analysis (MKMFA), is presented for supervised nonlinear dimensionality reduction. MKMFA not only removes the restrictive distributional assumption of existing multiple kernel dimensionality reduction methods, but also automatically constructs appropriate kernels for nonlinear dimensionality reduction by means of the ratio-trace model. Furthermore, spectral regression is used to avoid the eigen-decomposition of dense matrices and thus speed up the learning of MKMFA. Finally, like other multiple kernel-based dimensionality reduction methods, it also solves the out-of-sample extension problem.

#### 2. Related Work

##### 2.1. Marginal Fisher Analysis

The graph embedding framework is a general platform for designing dimensionality reduction algorithms; ISOMAP, LLE, and Laplacian eigenmaps can all be derived from it. Within this framework, a new dimensionality reduction algorithm can be developed that avoids the limitations of traditional linear discriminant analysis (LDA) in its data distribution assumption and its available projection directions.

LDA assumes that the data of each class follows a Gaussian distribution, which usually does not hold in practical problems. Without this property, the separability of different classes cannot be characterized by the interclass scatter. This limitation of LDA can be overcome by developing new criteria that characterize intraclass compactness and interclass separability. To this end, marginal Fisher analysis (MFA) was proposed within the graph embedding framework. MFA designs an intrinsic graph that characterizes intraclass compactness and a penalty graph that characterizes interclass separability. Specifically, the intrinsic graph describes the adjacency relationships of intraclass points, connecting each sample to its nearest neighbors in the same class. The penalty graph describes the adjacency relationships of interclass marginal points, connecting the marginal point pairs of different classes.

By following the graph embedding formulation, intraclass compactness is characterized from the intrinsic graph by the term [11–13]

$$\tilde{S}_c = \sum_{i} \sum_{j \in N_{k_1}(i)} \left\| w^T x_i - w^T x_j \right\|^2 = 2\, w^T X (D - W) X^T w,$$

where

$$W_{ij} = \begin{cases} 1, & \text{if } j \in N_{k_1}(i) \text{ or } i \in N_{k_1}(j), \\ 0, & \text{otherwise}. \end{cases}$$

Here, $N_{k_1}(i)$ indicates the index set of the $k_1$ nearest neighbors of the sample $x_i$ in the same class, and $D$ is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$. Interclass separability is characterized by a penalty graph with the term [11–13]

$$\tilde{S}_p = \sum_{i} \sum_{(i,j) \in P_{k_2}(c_i)} \left\| w^T x_i - w^T x_j \right\|^2 = 2\, w^T X (D^p - W^p) X^T w,$$

where

$$W^p_{ij} = \begin{cases} 1, & \text{if } (i,j) \in P_{k_2}(c_i) \text{ or } (i,j) \in P_{k_2}(c_j), \\ 0, & \text{otherwise}. \end{cases}$$

Here, $P_{k_2}(c)$ is the set of the $k_2$ nearest data pairs among the set $\{(i,j) \mid i \in \pi_c,\ j \notin \pi_c\}$, where $\pi_c$ denotes the index set of the samples belonging to the *c*th class, and $D^p$ is the degree matrix of $W^p$. The procedure of the marginal Fisher analysis algorithm is formally stated as follows [11–13]:

Firstly, project the data set into the PCA subspace, preserving either a fixed number of dimensions or a certain proportion of the energy. The PCA transformation matrix is denoted by $W_{\mathrm{PCA}}$.

Construct the intraclass compactness and interclass separability graphs. In the intraclass compactness graph, for each sample $x_i$, set the adjacency matrix entries $W_{ij} = W_{ji} = 1$ if $x_j$ is among the $k_1$-nearest neighbors of $x_i$ in the same class. In the interclass separability graph, for each class $c$, set the similarity matrix entries $W^p_{ij} = W^p_{ji} = 1$ if the pair $(i,j)$ is among the $k_2$ shortest pairs in the set $\{(i,j) \mid i \in \pi_c,\ j \notin \pi_c\}$.
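As an illustration, the two graph constructions above can be sketched in Python. This is a minimal sketch, not the authors' code: the function name `mfa_graphs` and the use of Euclidean distances in the input space are our own choices.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mfa_graphs(X, y, k1=2, k2=2):
    """Build MFA's intrinsic (intraclass) and penalty (interclass) graphs.

    X: (n, d) data matrix, y: (n,) integer class labels.
    Returns adjacency matrices W (intraclass) and Wp (interclass)."""
    n = X.shape[0]
    D = cdist(X, X)                      # pairwise Euclidean distances
    W = np.zeros((n, n))
    Wp = np.zeros((n, n))
    # Intrinsic graph: connect each sample to its k1 nearest same-class neighbors.
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        nn = same[np.argsort(D[i, same])[:k1]]
        W[i, nn] = W[nn, i] = 1
    # Penalty graph: for each class, connect the k2 shortest between-class pairs.
    for c in np.unique(y):
        inc = np.where(y == c)[0]
        out = np.where(y != c)[0]
        pairs = [(i, j) for i in inc for j in out]
        pairs.sort(key=lambda p: D[p[0], p[1]])
        for i, j in pairs[:k2]:
            Wp[i, j] = Wp[j, i] = 1
    return W, Wp
```

Both matrices are symmetric by construction, matching the symmetric edge-setting rule above.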

Marginal Fisher Criterion. From the linearization of the graph embedding framework, we have the Marginal Fisher Criterion

$$w^* = \arg\min_{w} \frac{w^T X (D - W) X^T w}{w^T X (D^p - W^p) X^T w},$$

which is a special linearization of the graph embedding framework with intrinsic Laplacian $L = D - W$ and penalty matrix $B = D^p - W^p$.

Output the final linear projection direction as $w_{\mathrm{final}} = W_{\mathrm{PCA}}\, w^*$.
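The criterion step above amounts to a generalized eigenproblem on the two graph Laplacians. The following is an illustrative sketch only: the small ridge term `reg`, added to keep the denominator matrix positive definite, is our own numerical safeguard and not part of the original algorithm.

```python
import numpy as np
from scipy.linalg import eigh

def mfa_projection(X, W, Wp, dim=1, reg=1e-6):
    """Minimize the ratio of intraclass compactness w^T X(D-W)X^T w
    to interclass separability w^T X(Dp-Wp)X^T w (Marginal Fisher Criterion).

    X: (d, n) data with samples as columns; W, Wp: MFA graphs.
    Returns a (d, dim) projection from a generalized eigenproblem."""
    L  = np.diag(W.sum(axis=1))  - W    # intrinsic-graph Laplacian D - W
    Lp = np.diag(Wp.sum(axis=1)) - Wp   # penalty-graph Laplacian Dp - Wp
    Sc = X @ L  @ X.T                   # intraclass compactness matrix
    Sp = X @ Lp @ X.T                   # interclass separability matrix
    # Ridge keeps the denominator matrix positive definite for eigh.
    vals, vecs = eigh(Sc, Sp + reg * np.eye(X.shape[0]))
    return vecs[:, :dim]                # smallest-ratio directions first
```

On two classes separated along the x-axis but spread along the y-axis, the recovered direction is dominated by the x component, as the criterion intends.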

##### 2.2. Ratio-Trace Optimization Problem

For any two symmetric positive semidefinite matrices $P$ and $Q$, the ratio-trace problem is defined as [14]

$$\max_{W} \ \operatorname{tr}\!\left[ \left( W^T Q W \right)^{-1} \left( W^T P W \right) \right].$$
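A direct evaluation of this objective, useful for checking candidate transformations, might look as follows; the helper name `ratio_trace` is hypothetical.

```python
import numpy as np

def ratio_trace(W, P, Q):
    """Evaluate the ratio-trace objective tr[(W^T Q W)^{-1} (W^T P W)]
    for a candidate transformation W and symmetric PSD matrices P and Q."""
    num = W.T @ P @ W
    den = W.T @ Q @ W
    # solve(den, num) computes den^{-1} num without an explicit inverse.
    return float(np.trace(np.linalg.solve(den, num)))
```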

For a given kernel function $k(\cdot,\cdot)$, the kernelized versions of these algorithms solve the following ratio-trace problem:

$$\max_{A} \ \operatorname{tr}\!\left[ \left( A^T (K Q K + \mu K) A \right)^{-1} A^T K P K A \right], \qquad (9)$$

where $A$ is a transformation matrix, $K$ is the kernel matrix with $K_{ij} = k(x_i, x_j)$, and $\mu \in (0,1)$ is a regularization parameter used to prevent overfitting. $P$ and $Q$ are (algorithm-dependent) symmetric positive semidefinite matrices. The optimal solution to (9) is given by the generalized eigenvectors corresponding to the nonzero generalized eigenvalues of

$$K P K\, \alpha = \lambda \left( K Q K + \mu K \right) \alpha.$$

Once $A$ is obtained, the new representation for a data sample $x$ can be computed using $z = A^T \left[ k(x_1, x), \ldots, k(x_n, x) \right]^T$.
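A sketch of the kernelized solver and the out-of-sample embedding described above. The RBF base kernel and the small stabilizing jitter are our own assumptions, added so the generalized eigenproblem is well posed; the function names are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row-sample sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ratio_trace(K, P, Q, mu=0.1, dim=2):
    """Solve max_A tr[(A^T (K Q K + mu K) A)^{-1} A^T K P K A]
    via a generalized eigenproblem; returns the top-`dim` coefficient vectors."""
    n = K.shape[0]
    lhs = K @ P @ K
    rhs = K @ Q @ K + mu * K + 1e-8 * np.eye(n)   # jitter for stability
    vals, vecs = eigh(lhs, rhs)
    return vecs[:, ::-1][:, :dim]                 # largest eigenvalues first

def embed(A, Xtrain, Xnew, gamma=1.0):
    """Out-of-sample embedding z = A^T [k(x_1, x), ..., k(x_n, x)]^T."""
    return rbf_kernel(Xnew, Xtrain, gamma) @ A
```

The `embed` step is what gives kernel methods of this family their out-of-sample extension: a new point only needs its kernel values against the training set.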

#### 3. Multiple Kernel Marginal Fisher Analysis Dimensionality Reduction via Ratio-Trace

##### 3.1. Kernel Marginal Fisher Analysis via Ratio-Trace

The kernel trick is widely used to improve the separation ability of a linear supervised dimensionality reduction algorithm, and marginal Fisher analysis can be improved in the same way. By replacing $P$ and $Q$ with $D^p - W^p$ and $D - W$, respectively, problem (9) can be rewritten as follows:

$$\max_{A} \ \operatorname{tr}\!\left[ \left( A^T \left( K (D - W) K + \mu K \right) A \right)^{-1} A^T K (D^p - W^p) K A \right].$$

Note that the graphs of kernel marginal Fisher analysis (KMFA) may differ from those of MFA, because the nearest neighbors of each sample in KMFA may differ from those in MFA. In each class, the nearest in-class neighbors of each sample and the closest out-of-class sample pairs are measured through the kernel mapping from the original feature space to the higher-dimensional Hilbert space. The distance between samples $x_i$ and $x_j$ in that space is obtained by the following formula:

$$d(x_i, x_j) = \sqrt{ k(x_i, x_i) - 2\, k(x_i, x_j) + k(x_j, x_j) }.$$
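This feature-space distance follows from expanding the squared norm $\|\phi(x_i) - \phi(x_j)\|^2$ using inner products, which the kernel evaluates implicitly. A minimal sketch (the `max(..., 0)` clamp is our guard against tiny negative values from floating-point round-off):

```python
import numpy as np

def kernel_distance(k, xi, xj):
    """Distance between two samples in the kernel-induced feature space:
    ||phi(xi) - phi(xj)|| = sqrt(k(xi,xi) - 2 k(xi,xj) + k(xj,xj))."""
    return np.sqrt(max(k(xi, xi) - 2.0 * k(xi, xj) + k(xj, xj), 0.0))
```

With a linear kernel this reduces to the ordinary Euclidean distance, which is a convenient sanity check.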

##### 3.2. Multiple Kernel Marginal Fisher Analysis Dimensionality Reduction

In this section, a multiple kernel marginal Fisher analysis framework is presented that incorporates spectral regression and the ratio-trace formulation into multiple kernel learning for dimensionality reduction. On one hand, spectral regression improves speed without sacrificing accuracy. On the other hand, the ratio-trace optimization avoids the conventional convex relaxation and gradient descent methods. The formulation of multiple kernel learning with MFA and ratio-trace is given below; it not only combines multiple kernel dimensionality reduction with MFA, but also selects optimal kernels more effectively than other multiple kernel dimensionality reduction methods via a semi-infinite linear program (SILP).

In the MKL framework, the kernel function is parametrized as a linear combination of $M$ predefined base kernels $k_1, \ldots, k_M$:

$$k(x_i, x_j) = \sum_{m=1}^{M} \beta_m k_m(x_i, x_j),$$

where $\beta_m \ge 0$, $\sum_{m=1}^{M} \beta_m = 1$, and the weights $\beta_m$ are learned from the data. Under the kernel marginal Fisher analysis framework based on ratio-trace, a multiple kernel variant of KMFA, termed MKMFA, is deduced by combining MFA with MKL; it is formulated as the following optimization problem:

$$\max_{A,\ \beta} \ \operatorname{tr}\!\left[ \left( A^T \left( K_\beta (D - W) K_\beta + \mu K_\beta \right) A \right)^{-1} A^T K_\beta (D^p - W^p) K_\beta A \right], \quad \text{with } K_\beta = \sum_{m=1}^{M} \beta_m K_m.$$
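The convex combination of base kernels described above can be sketched as follows. The helper name and the naive double loop are illustrative only; in practice the base kernel matrices would be precomputed vectorized.

```python
import numpy as np

def combined_kernel(kernels, beta, X):
    """Evaluate K_beta = sum_m beta_m K_m on data X (rows are samples),
    given base kernel functions and convex weights beta (beta_m >= 0, sum = 1)."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    n = X.shape[0]
    K = np.zeros((n, n))
    for b, k in zip(beta, kernels):
        Km = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])
        K += b * Km                     # convex mixture of base kernel matrices
    return K
```

Because each base kernel matrix is symmetric positive semidefinite and the weights are nonnegative, the combination $K_\beta$ remains a valid kernel matrix.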

Given the input data points $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{D}$ and $y_i \in \{1, \ldots, c\}$ is the class label of $x_i$, denote $X = [x_1, \ldots, x_n]$ as the training data matrix. The detailed steps of MKMFA are given as follows:

*Step 1. *Construct the intraclass compactness graph $W$ and the interclass separability graph $W^p$.

*Step 2. *We extend the Marginal Fisher Criterion to the multiple kernel case in the following way:

Firstly, intraclass compactness is characterized from the intrinsic graph by the term

$$\tilde{S}_c = \operatorname{tr}\!\left( A^T K_\beta (D - W) K_\beta A \right),$$

where $K_\beta = \sum_{m=1}^{M} \beta_m K_m$ with $(K_m)_{ij} = k_m(x_i, x_j)$, and $D$ is a diagonal matrix with the diagonal elements defined as $D_{ii} = \sum_j W_{ij}$.

Secondly, interclass separability is characterized by a penalty graph with the term

$$\tilde{S}_p = \operatorname{tr}\!\left( A^T K_\beta (D^p - W^p) K_\beta A \right),$$

where $D^p$ is the degree matrix of $W^p$.

To obtain a multidimensional projection, we consider a set of $c$ sample coefficient vectors, denoted by $A = [\alpha_1, \ldots, \alpha_c]$. Finally, the Multiple Kernel Marginal Fisher Criterion can be denoted as follows:

$$\max_{A,\ \beta} \ \operatorname{tr}\!\left[ \left( A^T \left( K_\beta (D - W) K_\beta + \mu K_\beta \right) A \right)^{-1} A^T K_\beta (D^p - W^p) K_\beta A \right], \qquad (18)$$

where $\mu \in (0,1)$ is a regularization parameter used to prevent overfitting.

*Step 3. *Assume the ranks of $D - W$ and $D^p - W^p$ are $r_1$ and $r_2$, respectively, and let $\{(\lambda_i, u_i)\}_{i=1}^{r_1}$ and $\{(\sigma_j, v_j)\}_{j=1}^{r_2}$ be the nonzero eigenvalue-eigenvector pairs of $D - W$ and $D^p - W^p$, respectively. We can obtain the optimal kernel weights $\beta$ by solving the following semi-infinite linear program [15]:

$$\max_{\theta,\ \beta} \ \theta \quad \text{s.t.} \quad \beta_m \ge 0, \quad \sum_{m=1}^{M} \beta_m = 1, \quad \sum_{m=1}^{M} \beta_m S_m(A) \ge \theta \ \ \text{for all feasible } A,$$

where $S_1, \ldots, S_M$ are $M$ functions, $S_m$ being defined by the objective of (18) evaluated with the single base kernel $K_m$.

*Step 4. *Solve the ratio-trace problem (18) using spectral regression to get the optimal $A$. Since $D - W$ and $D^p - W^p$ are both sparse matrices, we can use spectral regression to obtain $A$ in the following way:

Find the $c$ largest generalized eigenvectors $y_1, \ldots, y_c$ of the following eigen-problem:

$$(D^p - W^p)\, y = \lambda\, (D - W)\, y.$$

Find $\alpha_i$ ($i = 1, \ldots, c$) by solving the following least squares regression:

$$\alpha_i = \arg\min_{\alpha} \sum_{j=1}^{n} \left( \alpha^T \kappa(x_j) - y_{ij} \right)^2,$$

where $\kappa(x_j) = [k(x_1, x_j), \ldots, k(x_n, x_j)]^T$ and $y_{ij}$ is the *j*-th element of $y_i$.
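The regression step can be sketched as a regularized least squares solve over the kernel matrix. This is a sketch under our own assumptions: the ridge parameter `delta` is added for numerical stability and is not part of the paper's stated formulation.

```python
import numpy as np

def spectral_regression(K, y_eig, delta=1e-3):
    """Recover a coefficient vector alpha from a target eigenvector via
    regularized least squares: minimize ||K alpha - y||^2 + delta ||alpha||^2.
    This replaces a dense generalized eigen-decomposition with a linear solve."""
    n = K.shape[0]
    # Normal equations of the ridge-regularized problem.
    return np.linalg.solve(K.T @ K + delta * np.eye(n), K.T @ y_eig)
```

When the kernel matrix is well conditioned and `delta` is small, the recovered coefficients reproduce the target eigenvector almost exactly, i.e. $K\alpha \approx y$.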

Algorithm 1 summarizes the procedure for solving (18); this iterative algorithm is referred to as MKMFA. The alternating algorithm for solving the proposed SILP problem belongs to a family of algorithms for solving general semi-infinite programming problems called exchange methods, in which the constraints are exchanged at each iteration. These methods are guaranteed to converge [16].
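The overall alternating scheme can be sketched generically as below. The update callbacks stand in for the ratio-trace/spectral-regression step and the SILP step, which are not reproduced here; the function name and the simple weight-change stopping rule are our own illustrative choices.

```python
import numpy as np

def alternate_mkl(update_A, update_beta, beta0, tol=1e-5, max_iter=50):
    """Generic alternating scheme for MKL-style problems:
    fix the kernel weights beta and solve for the projection A,
    then fix A and re-optimize beta, until the weights stop changing."""
    beta = np.asarray(beta0, dtype=float)
    A = None
    for _ in range(max_iter):
        A = update_A(beta)            # e.g. ratio-trace / spectral regression step
        new_beta = update_beta(A)     # e.g. SILP / exchange-method step
        if np.linalg.norm(new_beta - beta) < tol:
            beta = new_beta
            break
        beta = new_beta
    return A, beta
```

With toy callbacks that converge immediately, the loop terminates as soon as the weights stabilize, mirroring the exchange-method stopping behavior described above.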