Computational Intelligence and Neuroscience

Volume 2017, Article ID 4629534, 9 pages

https://doi.org/10.1155/2017/4629534

## Ranking Support Vector Machine with Kernel Approximation

^{1}National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China

^{2}College of Computer, National University of Defense Technology, Changsha, China

Correspondence should be addressed to Rongchun Li; rongchunli@nudt.edu.cn

Received 18 November 2016; Revised 12 January 2017; Accepted 18 January 2017; Published 13 February 2017

Academic Editor: Carlos M. Travieso-González

Copyright © 2017 Kai Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Learning to rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been widely used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation that avoids computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method trains much faster than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms.

#### 1. Introduction

Learning to rank is an important research area in machine learning. It has attracted the interest of many researchers because of its growing application in areas such as information retrieval systems [1], recommender systems [2, 3], machine translation, and computational biology [4]. For example, in the document retrieval domain, a ranking model is trained on data from a set of queries, where each query contains a group of corresponding retrieved documents and their relevance levels labeled by humans. When a new query arrives, the trained model is used to rank the retrieved documents for that query.

Many types of machine learning algorithms have been proposed for the ranking problem. Among them, RankSVM [5], which extends the basic support vector machine (SVM) [6], is one of the most commonly used methods. The basic idea of RankSVM is to transform the ranking problem into a pairwise classification problem. The early implementation of RankSVM [7] was slow because the explicit pairwise transformation led to a large number of training samples. In order to accelerate the training process, [8] proposed a primal Newton method to solve the linear RankSVM-struct problem without the need for an explicit pairwise transformation, and [9] proposed a RankSVM based on the structured output learning framework.

As with the SVM, the kernel trick can be used to generalize linear RankSVM to the nonlinear case [7, 9]. Kernel RankSVM can give higher accuracy than linear RankSVM for complex nonlinear ranking problems [10]. The nonlinear kernel maps the original features into a high-dimensional space where the problem can be ranked linearly. However, the training time of kernel RankSVM grows dramatically with the size of the training set: the computational complexity is at least quadratic in the number of training examples because of the calculation of the kernel matrix. Kernel approximation is an efficient way to solve this problem. It avoids computing the kernel matrix by explicitly generating a vector representation of the data that approximates the kernel similarity between any two data points.

The approximation methods can be classified into two categories: the Nyström method [11, 12] and random Fourier features [13, 14]. The Nyström method approximates the kernel matrix by a low-rank matrix. The random Fourier features method approximates a shift-invariant kernel based on the Fourier transform of a nonnegative measure [15]. In this paper, we use kernel approximation to address the lengthy training time of kernel RankSVM.
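To make the random Fourier features idea concrete, the following is a minimal NumPy sketch (an illustration only, not the paper's Matlab implementation; the data, the value of gamma, and the number of random features are arbitrary choices) that approximates the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) with an explicit feature map:

```python
import numpy as np

def rff_map(X, n_features, gamma, rng):
    """Random Fourier feature map z(x) such that z(x) . z(y)
    approximates the RBF kernel exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's Fourier transform, N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_map(X, n_features=2000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T      # explicit map: no n-by-n kernel matrix is ever needed
K_exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
```

The approximation error decays roughly as O(1/sqrt(D)) in the number of random features D, so Z can simply replace the original features in any linear learner.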

To the best of our knowledge, this is the first work using kernel approximation to solve the learning to rank problem. We use two types of approximation methods, namely, the Nyström method and random Fourier features, to map the features into a high-dimensional space. After the approximation mapping, a primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge-loss) function of the RankSVM model. Experimental results demonstrate that our proposed method can achieve high performance with much faster training speed than kernel RankSVM. Compared to state-of-the-art ranking algorithms, our proposed method can also achieve comparable or better performance. Matlab code for our algorithm is available online (https://github.com/KaenChan/rank-kernel-appr).
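The Nyström mapping step can be sketched in the same spirit (again an illustrative NumPy toy, not the released code; the landmark choice and gamma are arbitrary assumptions):

```python
import numpy as np

def rbf(A, B, gamma):
    """RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_map(X, landmarks, gamma):
    """Nystrom feature map z(x) = W^{-1/2} k(landmarks, x),
    so that z(x) . z(y) approximates k(x, y)."""
    W = rbf(landmarks, landmarks, gamma)       # kernel among the landmarks
    U, s, _ = np.linalg.svd(W)
    s_inv_sqrt = np.where(s > 1e-10, 1.0 / np.sqrt(s), 0.0)
    M = (U * s_inv_sqrt) @ U.T                 # pseudo-inverse square root of W
    return rbf(X, landmarks, gamma) @ M

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
# With all points as landmarks the map reproduces the kernel exactly: Z Z^T = K
Z = nystrom_map(X, X, gamma=0.2)
K_exact = rbf(X, X, gamma=0.2)
```

In practice one samples m ≪ n landmarks (e.g., uniformly), giving an n-by-m feature map at roughly O(nmd + m^3) cost; all points are used as landmarks here only to make the exactness of the full-rank case visible.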

#### 2. Background and Related Works

In this section, we present the background and related works of learning to rank algorithms and RankSVM.

##### 2.1. Learning to Rank Algorithms

Learning to rank algorithms can be classified into three categories: the pointwise approach, the pairwise approach, and the list-wise approach.

(i) Pointwise: it transforms the ranking problem into regression or classification on single objects. Existing regression or classification algorithms are then directly applied to model the labels of single objects. This approach includes McRank [16] and OC SVM [17].

(ii) Pairwise: it transforms the ranking problem into regression or classification on object pairs and can model the preferences within the object pairs. This approach includes RankSVM [5] and RankBoost [18].

(iii) List-wise: it takes ranking lists as instances in both learning and prediction and can optimize the list-wise loss function directly. This approach includes ListNet [19], AdaRank [20], BoltzRank [21], and SVM MAP [22].

In this paper, we focus on the pairwise ranking algorithm based on SVM.

##### 2.2. Linear RankSVM

Linear RankSVM is a commonly used pairwise ranking algorithm [5]. For the web search problem with a set of queries and a set of retrieved documents for each query, a feature vector $x_i$ is extracted from each query-document pair $(q_i, d_i)$, and the label $y_i$ is the relevance level of the document $d_i$ to the query $q_i$. Thus, the training data is a set of label-query-instance tuples $(y_i, q_i, x_i)$. Let $P$ denote the set of preference pairs. If $(i, j) \in P$, then $x_i$ and $x_j$ are in the same query ($q_i = q_j$) and $x_i$ is preferred over $x_j$ ($y_i > y_j$). The goal of linear RankSVM is to get a ranking function

$$f(x) = w^{T} x$$

such that $f(x_i) > f(x_j)$ for all $(i, j) \in P$.
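The explicit pairwise transformation behind this formulation can be illustrated with a toy example (illustrative code, not from the paper; note how the number of generated pairs can grow quadratically per query, which is what the primal methods mentioned above avoid):

```python
import numpy as np

def pairwise_transform(X, y, qid):
    """For every preference pair (i, j) in the same query with y[i] > y[j],
    emit the difference vector x_i - x_j. A linear classifier on these
    differences is a linear ranking function w with w . (x_i - x_j) > 0."""
    diffs = []
    for i in range(len(y)):
        for j in range(len(y)):
            if qid[i] == qid[j] and y[i] > y[j]:
                diffs.append(X[i] - X[j])
    return np.array(diffs)

X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
y = np.array([2, 1, 1])            # relevance labels
qid = np.array([1, 1, 2])          # the third document belongs to another query
P = pairwise_transform(X, y, qid)  # only the within-query pair (0, 1) is kept
```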

RankSVM has good generalization due to the margin-maximization property. According to [27], the margin is defined as the closest distance between two data points when the data points are projected onto the ranking vector $w$:

$$\delta = \min_{(i,j) \in P} \frac{w^{T}(x_i - x_j)}{\|w\|}.$$

Maximizing the margin is good because data point pairs with small margins represent very uncertain ranking decisions. RankSVM is guaranteed to find a ranking vector with the maximum margin [27]. Figure 1 shows the margin maximization of four data points for linear RankSVM. The weights of two linear ranking functions, namely, $w_1$ and $w_2$, can both rank the four data points correctly. But $w_1$ generalizes better than $w_2$ because the margin of $w_1$ is larger than the margin of $w_2$.
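The projected margin described above can be computed directly; the following toy NumPy check (with made-up points and weight vectors, not the paper's Figure 1 data) shows two weight vectors that both rank the points correctly while one attains a larger margin:

```python
import numpy as np

def ranking_margin(w, X, pairs):
    """Smallest projected gap w . (x_i - x_j) / ||w|| over preference pairs."""
    w = np.asarray(w, dtype=float)
    gaps = [w @ (X[i] - X[j]) for i, j in pairs]
    return min(gaps) / np.linalg.norm(w)

X = np.array([[2.0, 1.0], [1.0, 1.5], [0.0, 0.5]])
pairs = [(0, 1), (1, 2)]           # x0 preferred over x1, x1 over x2
m1 = ranking_margin([1.0, 0.0], X, pairs)
m2 = ranking_margin([1.0, 1.0], X, pairs)
# Both weight vectors give positive gaps on every pair (correct ranking),
# but the first does so with a larger, and hence safer, margin.
```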