Journal of Electrical and Computer Engineering

Volume 2018, Article ID 8745251, 5 pages

https://doi.org/10.1155/2018/8745251

## A New Feature Extraction Algorithm Based on Orthogonal Regularized Kernel CCA and Its Application

^{1}College of Ocean Information Engineering, Hainan Tropical Ocean University, Sanya, China
^{2}Henan Xuehang Education and Information Service Co., Zhengzhou, China
^{3}Shanghai Renyi Technology Co., Ltd., Shanghai, China

Correspondence should be addressed to Fugeng Zeng; zengfugeng@foxmail.com

Received 3 April 2018; Revised 21 July 2018; Accepted 23 August 2018; Published 29 October 2018

Academic Editor: Jar Ferr Yang

Copyright © 2018 Xinchen Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In this paper, an orthogonal regularized kernel canonical correlation analysis (ORKCCA) algorithm is proposed. The ORCCA algorithm can model linear relationships between two groups of random variables, but when no linear relationship exists between the groups, its performance degrades. By introducing the kernel method into CCA, the linear orthogonal regularized CCA algorithm is extended to a nonlinear space. Simulation results on both artificial and handwritten-numeral databases show that the proposed method outperforms ORCCA on nonlinear problems.

#### 1. Introduction

Canonical correlation analysis (CCA) is a multivariate statistical technique that analyzes the mutual relationships between two sets of variables [1–3]. It extracts representative variables that are linear combinations of the variables in each group; the relationships between these new variables reflect the overall relationships between the two groups [4].

The orthogonal regularized canonical correlation analysis (ORCCA) algorithm [5] replaces the conjugate-orthogonality constraints of the original CCA formulation with orthogonality constraints [6, 7]. When the number of samples is small and the sample distributions of the classes differ, ORCCA has better classification ability. A suboptimal solution to the resulting eigenvalue decomposition problem can be obtained by introducing two regularization parameters [8], although the time and space complexity of the associated quadratic optimization problem must be considered as well. Like CCA, ORCCA seeks linear combinations of the variables in each group, so when the relationships between the variables are nonlinear, ORCCA cannot extract the comprehensive variables effectively.

In this paper, the kernel method [9–11] is introduced into the ORCCA algorithm, yielding the ORKCCA algorithm. The kernel method maps linearly inseparable data from a low-dimensional space into a higher-dimensional space [12, 13], where the characteristics of the data can be extracted and analyzed with linear methods. By introducing a kernel function, the computation of orthogonal regularized canonical correlation analysis is extended to a nonlinear feature space. Experimental results show that the classification accuracy of our method on nonlinear problems is significantly improved, demonstrating that ORKCCA is feasible.

#### 2. Orthogonal Regularized CCA Algorithm

Given *n* pairs of pairwise samples $x_i$ and $y_i$ ($i = 1, 2, \ldots, n$), where $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}^q$, we assume that the samples have been centered. The ORCCA algorithm aims at finding a pair of projection directions $w_x$ and $w_y$ which satisfy the following optimization problem [5]:

$$\max_{w_x, w_y} \frac{1}{n} \sum_{i=1}^{n} \left(w_x^{T} x_i\right)\left(w_y^{T} y_i\right) \quad \text{s.t.} \quad w_x^{T} w_x = 1, \quad w_y^{T} w_y = 1. \tag{1}$$

The objective function in Equation (1) can be expanded as

$$\frac{1}{n} \sum_{i=1}^{n} \left(w_x^{T} x_i\right)\left(w_y^{T} y_i\right) = w_x^{T} S_{xy} w_y, \tag{2}$$

where $S_{xy} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i^{T}$, $S_{xx} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^{T}$, and $S_{yy} = \frac{1}{n} \sum_{i=1}^{n} y_i y_i^{T}$.

The optimal model in Equation (1) can be rewritten as

$$\max_{w_x, w_y} w_x^{T} S_{xy} w_y \quad \text{s.t.} \quad w_x^{T} w_x = 1, \quad w_y^{T} w_y = 1. \tag{3}$$

According to the Lagrange multiplier method, the Lagrange function is as follows:

$$L(w_x, w_y) = w_x^{T} S_{xy} w_y - \frac{\lambda_1}{2}\left(w_x^{T} w_x - 1\right) - \frac{\lambda_2}{2}\left(w_y^{T} w_y - 1\right), \tag{4}$$

where both $\lambda_1$ and $\lambda_2$ are Lagrange multipliers.

Setting the partial derivatives of $L$ with respect to $w_x$ and $w_y$ to zero leads to the solutions of Equation (4), given as the eigenvalue problems in Equations (5) and (6), where $I_p$ and $I_q$ denote identity matrices of size $p \times p$ and $q \times q$, respectively.

Both $\eta_1$ and $\eta_2$ in Equations (5) and (6) are called regularization parameters. By solving Equation (5), the eigenvalues and their corresponding eigenvectors $w_x$ can be obtained; the eigenvalues and their corresponding eigenvectors $w_y$ can be obtained from Equation (6).
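Under the orthogonality constraints of Equation (1), maximizing $w_x^{T} S_{xy} w_y$ reduces to eigenvalue problems on the cross-covariance products $S_{xy} S_{yx}$ and $S_{yx} S_{xy}$. The following NumPy sketch illustrates that procedure under the assumptions stated in this section; the variable names are our own, and the ridge terms `eta1`, `eta2` stand in for the regularization parameters of Equations (5) and (6).

```python
import numpy as np

def orcca(X, Y, eta1=1e-3, eta2=1e-3):
    """Sketch of orthogonally constrained CCA.

    X : (p, n) and Y : (q, n) data matrices, one sample per column.
    Returns unit-norm projection directions w_x, w_y and the
    correlation coefficient of the projected training samples.
    """
    # Center the samples, as assumed in Section 2.
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    n = X.shape[1]
    Sxy = X @ Y.T / n  # cross-covariance matrix

    # Regularized eigenproblems in the style of Equations (5) and (6):
    # (Sxy Syx + eta1 I) w_x = mu w_x,  (Syx Sxy + eta2 I) w_y = mu w_y.
    p, q = Sxy.shape
    _, Wx = np.linalg.eigh(Sxy @ Sxy.T + eta1 * np.eye(p))
    _, Wy = np.linalg.eigh(Sxy.T @ Sxy + eta2 * np.eye(q))
    w_x = Wx[:, -1]  # eigenvector of the largest eigenvalue
    w_y = Wy[:, -1]

    # Fix the sign so the two projections correlate positively.
    if w_x @ Sxy @ w_y < 0:
        w_y = -w_y
    r = np.corrcoef(w_x @ X, w_y @ Y)[0, 1]
    return w_x, w_y, r
```

For a strongly linearly related pair of sample sets, the first pair of canonical variables found this way has a correlation coefficient close to 1.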

#### 3. Orthogonal Regularized Kernel CCA Algorithm (ORKCCA)

The ORCCA algorithm can capture linear relationships between two groups of random variables, but when no linear relationship exists between the groups, its performance degrades. The kernel method is an effective way to analyze nonlinear pattern problems, so it is introduced into the ORCCA algorithm here, and the ORKCCA algorithm is proposed.

Let $\varphi$ and $\psi$ be nonlinear mappings which map the original random variables $x$ and $y$ into $\varphi(x)$ in a $P$-dimensional space ($P > p$) and $\psi(y)$ in a $Q$-dimensional space ($Q > q$), respectively. Let $\Phi = [\varphi(x_1), \ldots, \varphi(x_n)]$ and $\Psi = [\psi(y_1), \ldots, \psi(y_n)]$, where $\varphi(x_i) \in \mathbb{R}^P$ and $\psi(y_i) \in \mathbb{R}^Q$.

ORCCA is then carried out in the higher-dimensional spaces $\mathbb{R}^P$ and $\mathbb{R}^Q$. Equation (7) is obtained by substituting $\varphi(x_i)$, $\psi(y_i)$, and the corresponding projection directions $w_{\varphi}$ and $w_{\psi}$ into Equation (1):

$$\max_{w_\varphi, w_\psi} \frac{1}{n} \sum_{i=1}^{n} \left(w_\varphi^{T} \varphi(x_i)\right)\left(w_\psi^{T} \psi(y_i)\right) \quad \text{s.t.} \quad w_\varphi^{T} w_\varphi = 1, \quad w_\psi^{T} w_\psi = 1. \tag{7}$$

Expanding the objective function in Equation (7), we obtain Equation (8).

Applying the kernel trick to Equation (8), the feature-space inner products can be computed without evaluating the mappings explicitly, namely, $k(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$, where $k(\cdot, \cdot)$ is a kernel function. Centering is applied to the kernel matrices $K_x$ and $K_y$, with $(K_x)_{ij} = k(x_i, x_j)$ and $(K_y)_{ij} = k(y_i, y_j)$. The optimal model with the kernel method introduced is then given by Equation (9).

According to the Lagrange multiplier method, the Lagrange function is given in Equation (10), where $\lambda_1$ and $\lambda_2$ are Lagrange multipliers. Taking the partial derivatives of the Lagrange function with respect to $w_\varphi$ and $w_\psi$ and setting them to zero, we obtain Equation (11), where $K_x$ and $K_y$ are positive semidefinite matrices and $\lambda_1$ and $\lambda_2$ are positive numbers.

From Equation (11), $w_\varphi$ and $w_\psi$ can be obtained as given in Equations (12) and (13), where $I_P$ and $I_Q$ are the identity matrices of size $P \times P$ and $Q \times Q$, respectively.

Equations (14) and (15) are then obtained by substituting the expressions for $w_\psi$ and $w_\varphi$ from Equations (12) and (13) into one another.

As before, both $\eta_1$ and $\eta_2$ in Equations (14) and (15) are called regularization parameters. By solving Equation (14), the eigenvalues and their corresponding eigenvectors $w_\varphi$ can be obtained; the eigenvalues and their corresponding eigenvectors $w_\psi$ can be obtained from Equation (15).
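The kernelized procedure can be sketched in the dual coefficients of the training samples. The eigenproblem below is the standard regularized kernel-CCA formulation and serves only as an illustrative stand-in for Equations (14) and (15); the RBF kernel, the centering step, and a single shared regularization parameter `eta` are assumptions of this sketch.

```python
import numpy as np

def rbf_gram(Z, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||z_i - z_j||^2 / (2 sigma^2)) for columns z_i of Z."""
    sq = (Z * Z).sum(axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def orkcca(X, Y, sigma=1.0, eta=1e-3):
    """Sketch of regularized kernel CCA in the dual variables alpha, beta.

    X : (p, n), Y : (q, n); returns the dual coefficient vectors and the
    correlation coefficient of the projected training samples.
    """
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kx = H @ rbf_gram(X, sigma) @ H            # centered Gram matrices
    Ky = H @ rbf_gram(Y, sigma) @ H

    # Regularized dual eigenproblem (standard kernel-CCA form):
    # (Kx + eta I)^-1 Ky (Ky + eta I)^-1 Kx alpha = rho^2 alpha.
    Rx = np.linalg.solve(Kx + eta * np.eye(n), Ky)
    Ry = np.linalg.solve(Ky + eta * np.eye(n), Kx)
    vals, vecs = np.linalg.eig(Rx @ Ry)
    alpha = np.real(vecs[:, np.argmax(np.real(vals))])
    beta = np.linalg.solve(Ky + eta * np.eye(n), Kx @ alpha)

    u, v = Kx @ alpha, Ky @ beta               # projections of training samples
    return alpha, beta, np.corrcoef(u, v)[0, 1]
```

Even when $y$ depends on $x$ only nonlinearly (e.g., $y = x^2$), the first pair of kernel canonical variables found this way remains highly correlated, which is exactly the behavior the linear ORCCA algorithm cannot reproduce.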

#### 4. Simulation Experiments

In this section, we evaluate our method against ORCCA on artificial and handwritten-numeral databases.

##### 4.1. Experiment on Artificial Databases

The pairwise samples $x$ and $y$ are generated from the expressions in Equations (16) and (17), respectively, where the shared latent variable obeys a uniform distribution and the noise terms are Gaussian with standard deviation 0.05. The radial basis function $k(x_i, x_j) = \exp\left(-\left\|x_i - x_j\right\|^{2} / \left(2\sigma^{2}\right)\right)$ is chosen as the kernel function.
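Since Equations (16) and (17) are not reproduced here, the snippet below uses hypothetical generating expressions of the same flavor (a shared uniform latent variable $t$, nonlinear component functions, and additive Gaussian noise with standard deviation 0.05) purely to illustrate the sampling setup; the specific component functions are our own choice, not the paper's.

```python
import numpy as np

def make_pairs(n=100, seed=0):
    """Generate n pairwise samples (x_i, y_i) driven by a single latent t.

    The component functions below are illustrative stand-ins for
    Equations (16) and (17), which are not reproduced in the text.
    """
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 1.0, size=n)              # latent variable, uniform
    noise = lambda: rng.normal(0.0, 0.05, size=n)  # sigma = 0.05, as in the paper
    X = np.vstack([np.sin(2 * np.pi * t) + noise(),
                   t ** 2 + noise()])              # samples x_i as columns
    Y = np.vstack([np.cos(2 * np.pi * t) + noise(),
                   np.exp(t) + noise()])           # samples y_i as columns
    return X, Y

X, Y = make_pairs(100)
```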

###### 4.1.1. Determining Regularization Parameters

For the selection of the regularization parameters, there is so far no reliable method for determining their optimal values. In this paper, in order to simplify the calculation, the two regularization parameters are set equal, and their common value is chosen from $10^{-5}$, $10^{-4}$, $10^{-3}$, $10^{-2}$, $10^{-1}$, and 1. The same approach is used in the literature [5].

According to Equations (16) and (17), 100 pairs of data are randomly generated as training samples. Canonical variables are calculated with the ORCCA and ORKCCA algorithms for the different values of the regularization parameters, and the correlation coefficients of the canonical variables are sorted in descending order. Many pairs of canonical variables can be obtained from the two algorithms; for the sake of simplicity, only the most representative first two pairs of canonical variables are examined.

The average of the correlation coefficients of the first two pairs of canonical variables is taken as the criterion for judging whether a choice of regularization parameters is good: the larger the average value, the better the regularization parameters.
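The selection rule above can be sketched as a small grid search. Here `first_two_corrs(X, Y, eta)` is a hypothetical helper, not from the paper, that returns the correlation coefficients of the first two pairs of canonical variables for a given regularization value.

```python
import numpy as np

def pick_regularization(X, Y, first_two_corrs,
                        grid=(1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0)):
    """Choose eta1 = eta2 = eta maximizing the average of the first two
    canonical correlation coefficients, as described in Section 4.1.1.

    first_two_corrs : hypothetical callable (X, Y, eta) -> (r1, r2),
    with the coefficients already sorted in descending order.
    """
    best_eta, best_avg = None, -np.inf
    for eta in grid:
        r1, r2 = first_two_corrs(X, Y, eta)
        avg = (r1 + r2) / 2.0      # the selection criterion
        if avg > best_avg:
            best_eta, best_avg = eta, avg
    return best_eta, best_avg
```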

Table 1 lists the average value of the correlation coefficients of the first two pairs of canonical variables for the different values of the regularization parameters.