Complexity

Volume 2019, Article ID 5937274, 17 pages

https://doi.org/10.1155/2019/5937274

## Two-Phase Incremental Kernel PCA for Learning Massive or Online Datasets

^{1}School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
^{2}Shandong Co-Innovation Center of Future Intelligent Computing, Yantai, China
^{3}BASIRA Lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey
^{4}School of Science and Engineering, Computing, University of Dundee, UK
^{5}Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea
^{6}School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an, China
^{7}School of Computer Science and Engineering, Xidian University, Xi’an, China
^{8}Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Correspondence should be addressed to Dinggang Shen; dgshen@med.unc.edu

Received 2 October 2018; Revised 17 December 2018; Accepted 8 January 2019; Published 11 February 2019

Guest Editor: Jose Garcia-Rodriguez

Copyright © 2019 Feng Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

As a powerful nonlinear feature extractor, kernel principal component analysis (KPCA) has been widely adopted in many machine learning applications. However, KPCA is usually performed in a batch mode, leading to some potential problems when handling massive or online datasets. To overcome this drawback of KPCA, in this paper, we propose a two-phase incremental KPCA (TP-IKPCA) algorithm which can incorporate data into KPCA in an incremental fashion. In the first phase, an incremental algorithm is developed to explicitly express the data in the kernel space. In the second phase, we extend an incremental principal component analysis (IPCA) to estimate the kernel principal components. Extensive experimental results on both synthesized and real datasets showed that the proposed TP-IKPCA produces similar principal components as conventional batch-based KPCA but is computationally faster than KPCA and its several incremental variants. Therefore, our algorithm can be applied to massive or online datasets where the batch method is not available.

#### 1. Introduction

As a conventional linear subspace analysis method, principal component analysis (PCA) can only produce linear subspace feature extractors [1], which are unsuitable for highly complex and nonlinear data distributions. In contrast, as a nonlinear extension of PCA, kernel principal component analysis (KPCA) [2] can capture the higher-order statistical information contained in data, thus producing nonlinear subspaces with better feature extraction performance. This has propelled the use of KPCA in a wide range of applications, such as pattern recognition, statistical analysis, and image processing [3–8]. Basically, KPCA first projects all samples from the input space into a kernel space using a nonlinear mapping and then extracts the principal components (PCs) in the kernel space. In practice, this nonlinear mapping is performed implicitly via the “kernel trick”, where an appropriately chosen kernel function is used to evaluate the dot products of mapped samples without having to explicitly carry out the mapping. As a result, the extracted kernel principal components (KPCs) of the mapped data are nonlinear with respect to the original input space.
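For reference, the batch procedure just described can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code; the function name `batch_kpca`, the choice of an RBF kernel, and the `gamma` parameter are our own assumptions.

```python
import numpy as np

def batch_kpca(X, n_components, gamma=1.0):
    """Batch KPCA: eigendecompose the centered n x n kernel matrix
    and return the training samples projected onto the leading KPCs."""
    n = X.shape[0]
    # Kernel trick: K[i, j] = exp(-gamma * ||x_i - x_j||^2) gives dot
    # products in the implicit feature space without explicit mapping.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Center the (implicitly) mapped data in the feature space.
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order; take the largest ones.
    vals, vecs = np.linalg.eigh(Kc)
    vals = vals[::-1][:n_components]
    vecs = vecs[:, ::-1][:, :n_components]
    # Scale eigenvectors so the feature-space PCs have unit norm.
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    return Kc @ alphas  # projections of the training samples

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Z = batch_kpca(X, n_components=2)
print(Z.shape)  # (100, 2)
```

Note that both the kernel matrix `K` and its eigendecomposition scale with the full sample count `n`, which is exactly the bottleneck discussed next.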

Standard KPCA has some drawbacks that limit its practical application to massive or online datasets. *Firstly*, in the training stage, KPCA needs to store and eigendecompose an $n \times n$ kernel matrix, where $n$ is the total number of samples. This results in a space complexity of $O(n^2)$ and a time complexity of $O(n^3)$, rendering the evaluation of KPCA on large-scale datasets very time-consuming. *Secondly*, in the testing stage, the resulting kernel principal components are defined only implicitly, as linear combinations of the mapped training data, and thus all the training data must be retained after training. For a massive dataset, this translates into high storage costs and increases the computational burden whenever the kernel principal components (KPCs) are used. Furthermore, KPCA is impractical for many real-world applications where samples are collected progressively, since it operates in a batch manner: each time new data arrive, KPCA has to be conducted from scratch.
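A back-of-the-envelope calculation makes the quadratic storage cost concrete (the sample sizes below are illustrative, not taken from the paper):

```python
# Dense float64 kernel matrix: n^2 entries at 8 bytes each.
for n in (1_000, 10_000, 100_000):
    gib = n * n * 8 / 2**30
    print(f"n = {n:>7,}: {gib:10.2f} GiB")
```

Already at one hundred thousand samples, the kernel matrix alone exceeds the main memory of a typical workstation, before any eigendecomposition work begins.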

To overcome these limitations, many promising methods have been proposed in the past few years. These methods can be grouped into two classes. The first class is the batch-based modeling methods, which require that all training data be available for estimating the KPCs. Rosipal and Girolami proposed an EM algorithm for reducing the computational cost of KPCA [9]. However, the convergence of the EM algorithm to the KPCA solution cannot be guaranteed in theory. In [6], the kernel Hebbian algorithm (KHA) was proposed as an iterative variant of the KPCA algorithm. By kernelizing the generalized Hebbian algorithm (GHA), KHA computes KPCA without storing the kernel matrix, so that large-scale, high-dimensional datasets can be processed. Nonetheless, KHA has a scalar gain parameter which is either held constant or decreased according to a predetermined annealing schedule, leading to slow convergence during the training stage. To improve the convergence of KHA, gain adaptation methods were developed by providing a separate gain for each eigenvector estimate [10]. An improved version of KPCA was proposed based on the eigenvalue decomposition of a symmetric matrix [11], where the dataset is divided into multiple subsets, each of which is processed separately. One major drawback of this approach is that it requires storing the kernel matrix, so the space complexity can be extremely large for large-scale datasets. Another variant of conventional KPCA is greedy KPCA [12, 13], which approximates the KPCs by prior filtering of the training data. However, such prior filtering can itself be computationally expensive. Overall, compared with standard KPCA, these batch-based modeling methods can reduce the time or space complexity to some degree. Unfortunately, they cannot handle online data.

The second class consists of incremental methods, which compute KPCs incrementally and can therefore handle online data. Chin and Suter proposed an incremental version of KPCA [14, 15], called IKPCA-RS here for notational simplicity. In IKPCA-RS, singular value decomposition is used to update an eigenfeature space incrementally for incoming data. However, IKPCA-RS may incur high time complexity, especially when dealing with high-dimensional data. In [16, 17], an incremental KPCA was presented based on the empirical kernel map. It is more memory-efficient than standard KPCA; however, it is only an approximate method and is only suitable for polynomial kernel functions. Inspired by the incremental PCA algorithm proposed by Hall et al. [18], Kimura et al. presented an incremental KPCA algorithm [19] in which an incremental updating rule for the eigenaxes is derived based on a set of linearly independent data. Subsequently, modified versions were proposed by Ozawa, Takeuchi, et al. [20, 21]. Furthermore, in order to incrementally process data streams that arrive in chunks of multiple samples at a time, other extensions of KPCA were successively presented [22–24]. Hallgren and Northrop [25] proposed an incremental KPCA (INKPCA) that applies rank-one updates to the eigendecomposition of the kernel matrix. However, INKPCA needs to store the whole dataset when evaluated on a new sample. Notably, incremental methods have the capacity to integrate new, initially unavailable data while keeping memory usage from growing. However, to the best of our knowledge, most of these methods operate in the kernel space, where all the samples are *implicitly* represented. This has two key limitations. *First*, a number of incremental methods may suffer from high computational cost. *Second*, the others can only capture approximate KPCs rather than exact ones, which may affect the accuracy of subsequent processing.

Before continuing, a note on mathematical notation is given as follows. We use lower-case and upper-case letters (e.g., $a$, $A$) to denote scalars, lower-case letters with subscripts (e.g., $a_{ij}$) to denote an element of a matrix or a vector, lower-case bold letters (e.g., $\mathbf{a}$) to denote vectors, and upper-case bold letters (e.g., $\mathbf{A}$) to denote matrices. We use $\mathbf{a}^{T}$ ($\mathbf{A}^{T}$) to denote the transpose of a vector (matrix) and $\|\mathbf{a}\|$ to denote the L2-norm of a vector. Furthermore, we adopt $\{a_i\}$ to denote a set, $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n]$ to denote a matrix with column vectors $\mathbf{a}_i$, and $\mathbf{A} = (a_{ij})$ to denote a matrix composed of the corresponding elements $a_{ij}$. In this paper, a vector always denotes a column vector, and the inner product between $\mathbf{a}$ and $\mathbf{b}$ is expressed as $\langle \mathbf{a}, \mathbf{b} \rangle = \mathbf{a}^{T}\mathbf{b}$. The lower-case bold letter $\boldsymbol{\varphi}$ denotes a nonlinear mapping, and the mapped sample $\boldsymbol{\varphi}(\mathbf{x})$ of an input sample $\mathbf{x}$ is a column vector.

To address these limitations, we propose a two-phase incremental KPCA (TP-IKPCA), where the mapped data are represented in an *explicit* form and the KPCs are updated in an *explicit* space. The computational cost of the whole process is very low, and the accuracy of the KPCs can be theoretically guaranteed. An overview of TP-IKPCA is briefly illustrated in Figure 1. In this figure, $X = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$ denotes the sample set in a *d*-dimensional input space and $n$ denotes the total number of available samples. Let $\boldsymbol{\varphi}$ denote the nonlinear mapping which maps the sample set into an *h*-dimensional implicit kernel space, resulting in the mapped sample set $\{\boldsymbol{\varphi}(\mathbf{x}_1), \ldots, \boldsymbol{\varphi}(\mathbf{x}_n)\}$. Here, *h* may be very large or even infinite, depending on the specific mapping. TP-IKPCA includes two phases. *In the first phase*, we develop an incremental algorithm to capture a standard orthogonal basis $\{\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_r\}$ of the subspace spanned by $\{\boldsymbol{\varphi}(\mathbf{x}_i)\}$ and then *explicitly* obtain the projection vector of each mapped sample by $\mathbf{y}_i = [\langle \boldsymbol{\beta}_1, \boldsymbol{\varphi}(\mathbf{x}_i) \rangle, \ldots, \langle \boldsymbol{\beta}_r, \boldsymbol{\varphi}(\mathbf{x}_i) \rangle]^{T}$, where $r$ denotes the number of standard orthogonal basis vectors. *In the second phase*, an existing incremental PCA method is employed to capture the KPCs based on the explicit data $\{\mathbf{y}_i\}$ in the projection space. In the following sections, we will detail how to incrementally express the implicitly mapped data in an explicit form. We will also theoretically verify that performing PCA on the implicitly mapped samples $\{\boldsymbol{\varphi}(\mathbf{x}_i)\}$ is equivalent to performing PCA on the explicit projection vectors $\{\mathbf{y}_i\}$.
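The two-phase idea can be sketched as follows. This is our own minimal illustration, not the paper's algorithm: phase one is realized here as a kernel Gram–Schmidt procedure via an incrementally grown Cholesky factorization of a dictionary kernel matrix, and phase two as plain PCA on the resulting explicit coordinates (in practice an incremental PCA would be used). The names `rbf`, `phase1_coords`, and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian kernel; stands in for any positive-definite kernel."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def phase1_coords(X, gamma=1.0, tol=1e-8):
    """Incrementally build an orthonormal basis of span{phi(x_1), ...}
    and return explicit coordinate vectors y_i for each mapped sample.

    We keep a dictionary D of samples and the Cholesky factor L of its
    kernel matrix (K_DD = L L^T); the coordinates of phi(x) w.r.t. the
    orthonormalized basis are then c = L^{-1} k with k[j] = k(D[j], x)."""
    D, L, coords = [], None, []
    for x in X:
        if not D:
            D.append(x)
            L = np.array([[np.sqrt(rbf(x, x, gamma))]])
            coords.append(L[0].copy())
            continue
        k = np.array([rbf(d, x, gamma) for d in D])
        c = np.linalg.solve(L, k)        # coordinates of the projection
        res2 = rbf(x, x, gamma) - c @ c  # squared norm of the residual
        if res2 > tol:                   # phi(x) adds a new direction:
            r = np.sqrt(res2)            # grow the dictionary and factor
            m = len(D)
            L = np.block([[L, np.zeros((m, 1))],
                          [c[None, :], np.array([[r]])]])
            D.append(x)
            c = np.append(c, r)
        coords.append(c)
    m = len(D)                           # pad all vectors to the final dim
    return np.array([np.pad(c, (0, m - len(c))) for c in coords])

# Phase 2: ordinary (or incremental) PCA on the explicit coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Y = phase1_coords(X)
Yc = Y - Y.mean(axis=0)
eigvals = np.linalg.eigvalsh(Yc.T @ Yc / len(Y))[::-1]
print(Y.shape, eigvals[:3])
```

Because the rows of `Y` are coordinates in an orthonormal basis of the mapped data's span, `Y @ Y.T` reproduces the kernel matrix (up to the truncation tolerance), which is why PCA on the explicit vectors recovers the same principal directions as KPCA on the implicitly mapped samples.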