Mathematical Problems in Engineering

Volume 2018 (2018), Article ID 6598025, 7 pages

https://doi.org/10.1155/2018/6598025

## Extrinsic Least Squares Regression with Closed-Form Solution on Product Grassmann Manifold for Video-Based Recognition

^{1}Beijing Key Lab of Multimedia and Intelligent Software Technology, Faculty of Information Technology, Beijing University of Technology, 100 Pingleyuan, Chaoyang District, Beijing 100124, China

^{2}School of Software Technology, Dalian University of Technology, No. 2 Linggong Road, Ganjingzi District, Dalian 116024, China

Correspondence should be addressed to Lichun Wang

Received 21 August 2017; Accepted 30 January 2018; Published 1 March 2018

Academic Editor: Simone Bianco

Copyright © 2018 Yuping Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

Least squares regression is a fundamental tool in statistical analysis and, when only a small number of training samples is available, can be more effective than some complicated models. Representing multidimensional data on a product Grassmann manifold has recently led to notable results in various visual recognition tasks. This paper proposes an extrinsic least squares regression with the Projection Metric on the product Grassmann manifold by embedding the Grassmann manifold into the space of symmetric matrices via an isometric mapping. The proposed regression has a closed-form solution, which is more accurate than the numerical solution of the previous least squares regression based on geodesic distance. Experiments on several recognition tasks show that the proposed method achieves competitive accuracy in comparison with some state-of-the-art methods.

#### 1. Introduction

As an important application of computer vision, video-based recognition such as action recognition [1] attracts more and more attention. For inferring the correct label of a query from a given database of examples, there are mainly two kinds of methods. One kind is based on representations with handcrafted features and the other is based on deep learning architectures such as Convolutional Neural Networks (CNN) [2]. Generally speaking, deep learning algorithms have been shown to be successful when a large amount of data is available [3, 4]. However, the databases for many recognition tasks in daily life are small. In this case, deep learning algorithms lose efficacy and it becomes important to analyze the structure of the data and represent it with discriminative features.

The Grassmann manifold has proven to be a powerful representation for video-based applications such as activity classification [5], action recognition [6], age estimation [7], and face recognition [8, 9]. In these applications, the Grassmann manifold is used to characterize the intrinsic geometry of the data. Taking one representative work as an example, Lui [10] factorized a data tensor using Higher Order Singular Value Decomposition (HOSVD) and imposed each factorized element on a Grassmann manifold. This representation yields a very discriminative structure for action recognition.

Inference on manifold spaces can be achieved extrinsically by embedding the manifold into a Euclidean space, which can be considered as flattening the manifold. In the literature, the most popular choice for embedding a manifold is through its tangent spaces [11, 12]. For example, Lui [10] presented a least squares regression on the product Grassmann manifold, in which the weighted average of the training samples was computed in the tangent space and projected back to the Grassmann manifold by the standard logarithmic and exponential maps. In a tangent space, only the distances from points to the tangent pole are equal to the true geodesic distances, which is restrictive and may lead to inaccurate modeling. An alternative method embeds the Grassmann manifold into the space of symmetric matrices by a diffeomorphism [13] and uses the Projection Metric [14], which is equal to the true Grassmann geodesic distance up to a scale of $\sqrt{2}$.

In this paper, representing multidimensional data on the product Grassmann manifold in the same form as Lui [10], we propose an extrinsic least squares regression on the product Grassmann manifold using the Projection Metric and give a closed-form solution, which is more accurate. As a simple statistical model, least squares regression has advantages such as simple calculation and being more effective than some complicated models when the number of training samples is small [15]. We evaluate the proposed method on three kinds of small-scale datasets, namely, hand gesture, Ballet, and traffic; the higher recognition rates show that our method is competitive with some state-of-the-art methods.

The rest of this paper is organized as follows: Section 2 introduces the mathematical background; Section 3 gives the product Grassmann manifold representation for video; Section 4 presents the distance on the product Grassmann manifold; Section 5 proposes the extrinsic least squares regression on the product Grassmann manifold; Section 6 gives the classification based on extrinsic least squares regression; Section 7 reports experiments on different datasets, whose results show that the proposed method achieves competitive accuracy; Section 8 analyzes the time complexity of the proposed method; and Section 9 gives a conclusion.

#### 2. Mathematical Background

In this section, we introduce the mathematical background used in this paper.

##### 2.1. Grassmann Manifold

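The Stiefel manifold $\mathcal{V}(p,d)$ is the set of all $d\times p$ matrices with orthonormal columns; that is,

$$\mathcal{V}(p,d)=\{X\in\mathbb{R}^{d\times p} : X^{T}X=I_{p}\}, \tag{1}$$

where $I_{p}$ is the $p\times p$ identity matrix. The Grassmann manifold $\mathcal{G}(p,d)$ can be defined as a quotient manifold of $\mathcal{V}(p,d)$ with an equivalence relation $\sim$. In fact, for any $X_1, X_2\in\mathcal{V}(p,d)$,

$$X_1\sim X_2 \iff \operatorname{span}(X_1)=\operatorname{span}(X_2), \tag{2}$$

where $\operatorname{span}(X)$ is the subspace spanned by the columns of $X$. In other words, the Grassmann manifold $\mathcal{G}(p,d)$ is the space of $p$-dimensional linear subspaces of $\mathbb{R}^{d}$ for $0\le p\le d$ [16], each of which may be specified by an arbitrary $d\times p$ matrix with orthonormal columns. Notice that the choice of matrix for a point on the Grassmann manifold is not unique; that is, the same point on the Grassmann manifold can be spanned by different matrices $X_1$ and $X_2$.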
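The non-uniqueness noted above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper `random_stiefel` and the sizes `d = 6`, `p = 2` are illustrative choices and not part of the paper. It builds two different orthonormal bases of the same subspace and confirms that they agree as subspaces by comparing their orthogonal projectors.

```python
import numpy as np

def random_stiefel(d, p, rng):
    """Return a d x p matrix with orthonormal columns (a Stiefel point)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    return q

rng = np.random.default_rng(0)
d, p = 6, 2                        # illustrative sizes (assumption)
X1 = random_stiefel(d, p, rng)
Q = random_stiefel(p, p, rng)      # a p x p orthogonal matrix
X2 = X1 @ Q                        # a different basis of the same subspace

# X2 is still a Stiefel point ...
orthonormal = np.allclose(X2.T @ X2, np.eye(p))
# ... and spans the same subspace: the projectors X X^T coincide.
same_subspace = np.allclose(X1 @ X1.T, X2 @ X2.T)
print(orthonormal, same_subspace)
```

Comparing projectors rather than the basis matrices themselves is what makes the test independent of the chosen representative.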

##### 2.2. Higher Order Singular Value Decomposition (HOSVD)

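HOSVD is a multilinear SVD operating on tensors. Let $\mathcal{A}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$ be a tensor of order $N$. The process of reordering the elements of an $N$-mode tensor into a matrix is called matricization. The mode-$k$ matricization of a tensor $\mathcal{A}$ is denoted by $A_{(k)}$ (see details in [17]). Then each $A_{(k)}$ is factored using SVD as follows:

$$A_{(k)}=U^{(k)}\Sigma^{(k)}\big(V^{(k)}\big)^{T}, \tag{3}$$

where $\Sigma^{(k)}$ is a diagonal matrix, $U^{(k)}$ is an orthogonal matrix whose columns span the column space of $A_{(k)}$, and $V^{(k)}$ is an orthogonal matrix whose columns span the row space of $A_{(k)}$. By using the HOSVD method, an order-$N$ tensor $\mathcal{A}$ can be decomposed as follows:

$$\mathcal{A}=\mathcal{S}\times_1 U^{(1)}\times_2 U^{(2)}\cdots\times_N U^{(N)}, \tag{4}$$

where $\mathcal{S}$ is the core tensor, $U^{(1)},\ldots,U^{(N)}$ are the orthogonal matrices given in (3), and $\times_k$ denotes mode-$k$ multiplication.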
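The decomposition described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names `unfold`, `mode_multiply`, and `hosvd` are hypothetical, and the column ordering of the unfolding may differ from the convention in [17], which changes the right singular vectors but not the left ones used here.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` matricization: rows are indexed by the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Mode-`mode` product: apply `matrix` to every mode-`mode` fiber."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd(tensor):
    """Return the core tensor and the left singular bases of all unfoldings."""
    factors = [np.linalg.svd(unfold(tensor, k), full_matrices=False)[0]
               for k in range(tensor.ndim)]
    core = tensor
    for k, U in enumerate(factors):
        core = mode_multiply(core, U.T, k)   # project each mode onto its basis
    return core, factors

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6))           # synthetic order-3 tensor
core, factors = hosvd(A)

# Multiplying the core by the factor matrices recovers the tensor.
rebuilt = core
for k, U in enumerate(factors):
    rebuilt = mode_multiply(rebuilt, U, k)
print(np.allclose(rebuilt, A))
```

The reconstruction is exact here because each factor matrix is square and orthogonal for this tensor shape; truncating the factors would give an approximation instead.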

##### 2.3. Product Manifold

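Let $\mathcal{M}_1,\mathcal{M}_2,\ldots,\mathcal{M}_m$ be manifolds; the product manifold of these manifolds is defined as

$$\mathcal{M}=\mathcal{M}_1\times\mathcal{M}_2\times\cdots\times\mathcal{M}_m, \tag{5}$$

where $\times$ denotes the Cartesian product and each $\mathcal{M}_k$ is called a factor manifold.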

#### 3. Product Grassmann Manifold Representation for Video

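Video is a kind of multidimensional data and can be represented as a tensor $\mathcal{A}\in\mathbb{R}^{I_1\times I_2\times I_3}$, where $I_1$, $I_2$, and $I_3$ represent the height, width, and length of the video, respectively. The variation of each mode can be captured by HOSVD. Lui et al. [18] found that the traditional HOSVD is not appropriate for forming a product manifold, so they modified the traditional definition of HOSVD to factorize the tensor using the orthogonal matrices $V^{(1)}$, $V^{(2)}$, and $V^{(3)}$ described in (3). That is,

$$\mathcal{A}=\hat{\mathcal{S}}\times_1 V^{(1)}\times_2 V^{(2)}\times_3 V^{(3)}, \tag{6}$$

where $\hat{\mathcal{S}}$ is the core tensor.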

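Since each $V^{(k)}$ is a tall matrix with orthonormal columns, it is a point on a Stiefel manifold, and $\operatorname{span}(V^{(k)})$ is a point on a Grassmann manifold. Hence, $\big(\operatorname{span}(V^{(1)}),\operatorname{span}(V^{(2)}),\operatorname{span}(V^{(3)})\big)$ is a point on the product Grassmann manifold, and this triple is the representation of a video on the product Grassmann manifold.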
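The representation described above can be illustrated with a short sketch, assuming NumPy and a synthetic video tensor. The helper name `video_to_product_grassmann` is hypothetical, and taking the right singular vectors of each unfolding from a reduced SVD is an assumption about how the tall matrices are obtained.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` matricization: rows are indexed by the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def video_to_product_grassmann(video):
    """Hypothetical helper: one tall orthonormal basis per mode of the video."""
    factors = []
    for k in range(video.ndim):
        # Reduced SVD: the rows of vh are orthonormal and span the row space
        # of the mode-k unfolding, so vh.T is a tall matrix with orthonormal columns.
        _, _, vh = np.linalg.svd(unfold(video, k), full_matrices=False)
        factors.append(vh.T)
    return tuple(factors)

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 10, 6))   # height x width x length (synthetic)
point = video_to_product_grassmann(video)
shapes = [V.shape for V in point]
print(shapes)
```

Each returned matrix is a representative of one factor; any right-multiplication by an orthogonal matrix would describe the same point.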

#### 4. Distance on Product Grassmann Manifold

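The intrinsic metric on the Grassmann manifold is the geodesic distance, which is the length of the shortest curve between two $p$-dimensional subspaces $\operatorname{span}(X)$ and $\operatorname{span}(Y)$, that is, $d_G(X,Y)=\|\Theta\|_2$ with $\Theta=(\theta_1,\ldots,\theta_p)$ representing the principal angles [16]. Chikuse [13] introduced a projection embedding $\Pi:\mathcal{G}(p,d)\to\operatorname{Sym}(d)$, $\Pi(X)=XX^{T}$, where $\operatorname{Sym}(d)$ denotes the space of $d\times d$ symmetric matrices. Hamm and Lee [19] defined a distance called the *Projection Metric* on the Grassmann manifold as follows.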

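*Definition 1.* Given two points $X$ and $Y$ on the Grassmann manifold $\mathcal{G}(p,d)$, the distance between $X$ and $Y$ is defined as

$$d_P(X,Y)=\|\Pi(X)-\Pi(Y)\|_F=\|XX^{T}-YY^{T}\|_F. \tag{7}$$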

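*Remark 2.* In fact, for any matrix $\tilde{X}$ with $\operatorname{span}(\tilde{X})=\operatorname{span}(X)$, there exists a $p\times p$ orthogonal matrix $Q$ such that $\tilde{X}=XQ$, so the element $\tilde{X}\tilde{X}^{T}$ is equal to the element $XX^{T}$. In this case, $\Pi(\tilde{X})=\Pi(X)$. Hence it is feasible to use the matrix $XX^{T}$ to represent $\operatorname{span}(X)$. Moreover, $d_P$ is equal to the geodesic distance between two points on the Grassmann manifold up to a scale of $\sqrt{2}$ [14].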

Based on Definition 1, we define a distance on the product Grassmann manifold that sums the distances on each factor Grassmann manifold.

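*Definition 3.* Given two points $\mathcal{X}=(X_1,\ldots,X_m)$ and $\mathcal{Y}=(Y_1,\ldots,Y_m)$ on the product Grassmann manifold $\mathcal{G}(p_1,d_1)\times\cdots\times\mathcal{G}(p_m,d_m)$, the distance between $\mathcal{X}$ and $\mathcal{Y}$ is defined as

$$d(\mathcal{X},\mathcal{Y})=\sum_{k=1}^{m}d_P(X_k,Y_k). \tag{8}$$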
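As a numerical illustration of the two distances above, the following sketch assumes NumPy, takes the Projection Metric in the unnormalized form $\|XX^{T}-YY^{T}\|_F$, and takes the product distance as the plain sum of the factor distances; both forms are assumptions, as are all function names. It also checks the standard identity $\|XX^{T}-YY^{T}\|_F^{2}=2\sum_i\sin^{2}\theta_i$, which links the embedded distance to the principal angles.

```python
import numpy as np

def random_point(d, p, rng):
    """Orthonormal basis of a random p-dimensional subspace of R^d."""
    q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    return q

def projection_metric(X, Y):
    """Frobenius distance between the embedded points X X^T and Y Y^T."""
    return np.linalg.norm(X @ X.T - Y @ Y.T)

def product_distance(Xs, Ys):
    """Sum of the factor-wise Projection Metric distances (assumed form)."""
    return sum(projection_metric(X, Y) for X, Y in zip(Xs, Ys))

def principal_angles(X, Y):
    """Principal angles from the singular values of X^T Y."""
    cosines = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)
X, Y = random_point(7, 3, rng), random_point(7, 3, rng)

theta = principal_angles(X, Y)
lhs = projection_metric(X, Y) ** 2
rhs = 2.0 * np.sum(np.sin(theta) ** 2)
print(np.isclose(lhs, rhs))

# A two-factor product point: the distance is additive over the factors.
Xs = (X, random_point(5, 2, rng))
Ys = (Y, random_point(5, 2, rng))
total = product_distance(Xs, Ys)
```

Because the distance only involves $XX^{T}$ and $YY^{T}$, it does not depend on which orthonormal basis represents each subspace.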

#### 5. Extrinsic Least Squares Regression on Product Grassmann Manifold

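Least squares regression is a simple and efficient technique in statistical analysis. In Euclidean space, the parameter $\beta$ is estimated by minimizing the residual sum-of-squares error

$$\min_{\beta}\ \|y-A\beta\|_2^2, \tag{9}$$

where $A$ is the training set and $y$ is the regression value. The estimated parameter has the closed-form solution

$$\hat{\beta}=(A^{T}A)^{-1}A^{T}y. \tag{10}$$

Hence the corresponding error is

$$R(y)=\|y-A(A^{T}A)^{-1}A^{T}y\|_2^2. \tag{11}$$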
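The Euclidean closed-form solution can be verified on synthetic data. The sketch below assumes NumPy; the data and variable names are illustrative. It solves the normal equations directly and compares the result with NumPy's least squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))   # 20 observations, 3 regressors (synthetic)
y = rng.standard_normal(20)

# Closed-form estimate from the normal equations A^T A beta = A^T y.
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Residual sum of squares at the estimate.
residual = np.sum((y - A @ beta_hat) ** 2)

# Reference solution from a general-purpose least squares routine.
beta_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(beta_hat, beta_ref), residual)
```

Solving the linear system is preferred over forming the explicit inverse; for ill-conditioned problems a routine such as `np.linalg.lstsq` is the safer choice.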

On the Grassmann manifold, Lui [10] extended the linear least squares regression to a nonlinear form. In detail, the estimated parameter is equal to

$$\hat{\beta}=(A\star A)^{-1}(A\star Y), \tag{12}$$

where $\star$ is a nonlinear similarity operator, $A$ is a set of training samples on the manifold, and $Y$ is an element on the manifold. So the corresponding error is

$$R(Y)=\|Y-A\bullet(A\star A)^{-1}(A\star Y)\|, \tag{13}$$

where $\bullet$ is an operator mapping points from the vector space back to the manifold. Since the Grassmann manifold is not closed under normal matrix subtraction and addition, the mapping $\bullet$ is realized by employing the exponential mapping and its inverse, without a closed-form solution. To realize the composition map $A\bullet(A\star A)^{-1}(A\star Y)$, an improved Karcher mean computation algorithm is employed. To avoid the accuracy loss of this iterative algorithm, we introduce an extrinsic least squares regression on the Grassmann manifold by embedding its elements into the space of symmetric matrices. Because the distance on the product Grassmann manifold in (8) is additive over the factors, the extrinsic least squares regression on the product Grassmann manifold is equivalent to three independent subregression problems, one on each factor. Taking one factor as an example, we show the details in the following.

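Let $\{X_i\}_{i=1}^{n}$ be the training set, where $n$ is the number of samples, let $w\in\mathbb{R}^{n}$ be the fitting parameter, and let $Y$ be the regression value. Similar to the idea of least squares regression in Euclidean space, we give a regression on the Grassmann manifold, which is defined in the embedded space of symmetric matrices. The residual is measured as follows:

$$\min_{w}\ \Big\|YY^{T}-\sum_{i=1}^{n}w_iX_iX_i^{T}\Big\|_F^2, \tag{14}$$

where $w_i$ is the $i$th element of the vector $w$. Next we show how to solve the optimization. We have

$$\Big\|YY^{T}-\sum_{i=1}^{n}w_iX_iX_i^{T}\Big\|_F^2=p-2\sum_{i=1}^{n}w_i\|X_i^{T}Y\|_F^2+\sum_{i=1}^{n}\sum_{j=1}^{n}w_iw_j\|X_i^{T}X_j\|_F^2, \tag{15}$$

where $p=\operatorname{tr}(YY^{T}YY^{T})$ is the subspace dimension, and we define

$$[K(X,Y)]_i=\|X_i^{T}Y\|_F^2,\qquad [K(X,X)]_{ij}=\|X_i^{T}X_j\|_F^2. \tag{16}$$

Hence model (14) becomes

$$\min_{w}\ w^{T}K(X,X)\,w-2\,w^{T}K(X,Y)+p. \tag{17}$$

Setting the derivative of (17) with respect to $w$ equal to 0, we have

$$2K(X,X)\,w-2K(X,Y)=0. \tag{18}$$

So the solution of optimization (14) is

$$w^{*}=K(X,X)^{-1}K(X,Y). \tag{19}$$

Hence the corresponding error becomes

$$R(Y)=\Big\|YY^{T}-\sum_{i=1}^{n}w_i^{*}X_iX_i^{T}\Big\|_F^2. \tag{20}$$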
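The closed-form solution on one factor can be sketched as follows, assuming NumPy. The residual is assumed to be $\|YY^{T}-\sum_i w_iX_iX_i^{T}\|_F^{2}$, whose normal equations involve the Gram entries $\|X_i^{T}X_j\|_F^{2}$ and $\|X_i^{T}Y\|_F^{2}$; the function names `gram`, `elsr_weights`, and `elsr_residual` are hypothetical. As a sanity check, the weights are compared with an ordinary least squares fit on the vectorized projectors.

```python
import numpy as np

def random_point(d, p, rng):
    """Orthonormal basis of a random p-dimensional subspace of R^d."""
    q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    return q

def gram(As, Bs):
    """Matrix of squared Frobenius norms ||A^T B||_F^2 for all pairs."""
    return np.array([[np.linalg.norm(A.T @ B) ** 2 for B in Bs] for A in As])

def elsr_weights(train, Y):
    """Closed-form weights: solve K(X, X) w = K(X, Y)."""
    K_xx = gram(train, train)
    K_xy = gram(train, [Y])[:, 0]
    return np.linalg.solve(K_xx, K_xy)

def elsr_residual(train, Y, w):
    """Squared Frobenius error of the weighted combination of projectors."""
    approx = sum(wi * X @ X.T for wi, X in zip(w, train))
    return np.linalg.norm(Y @ Y.T - approx) ** 2

rng = np.random.default_rng(0)
train = [random_point(6, 2, rng) for _ in range(4)]   # synthetic training set
Y = random_point(6, 2, rng)

w = elsr_weights(train, Y)
err = elsr_residual(train, Y, w)

# Reference: ordinary least squares on the vectorized embedded points.
M = np.stack([(X @ X.T).ravel() for X in train], axis=1)
w_ref, *_ = np.linalg.lstsq(M, (Y @ Y.T).ravel(), rcond=None)
print(np.allclose(w, w_ref), err)
```

If the Gram matrix is close to singular, for example when training samples nearly coincide, replacing `np.linalg.solve` with a pseudo-inverse or a small ridge term is a common remedy; that choice is outside what the text above specifies.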

#### 6. Recognition Based on Extrinsic Least Squares Regression

In this section, we consider the 3-order product Grassmann manifold for videos; the situation for higher orders is similar. Suppose $C$ classes are defined for the data. We denote the training set corresponding to the $c$th class as $\{\mathcal{X}_i^{c}\}_{i=1}^{n_c}$, where $n_c$ is the number of samples. Our objective is to infer to which class a test sample $\mathcal{Y}$ belongs.

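The residual error of a query sample $\mathcal{Y}=(Y_1,Y_2,Y_3)$ for class $c$ is defined as

$$R_c(\mathcal{Y})=\sum_{k=1}^{3}\Big\|Y_kY_k^{T}-\sum_{i=1}^{n_c}w_{k,i}^{c*}X_{k,i}^{c}\big(X_{k,i}^{c}\big)^{T}\Big\|_F^2, \tag{21}$$

where $X_{k,i}^{c}$ is the $k$th factor of the $i$th training sample of class $c$ and $w_1^{c*}$, $w_2^{c*}$, $w_3^{c*}$ are the solutions of the subregressions on each factor Grassmann manifold, respectively. The category of the query sample is determined by

$$c^{*}=\arg\min_{c}\ R_c(\mathcal{Y}). \tag{22}$$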
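The decision rule above can be sketched end to end on synthetic data, assuming NumPy. The class residual is assumed to be the sum of the three factor-wise squared Frobenius errors, and the names `factor_residual` and `classify`, the factor sizes, and the random training points are all illustrative. The query is a copy of a training sample from class 1, so its residual for that class should vanish.

```python
import numpy as np

def random_point(d, p, rng):
    """Orthonormal basis of a random p-dimensional subspace of R^d."""
    q, _ = np.linalg.qr(rng.standard_normal((d, p)))
    return q

def factor_residual(train_factor, Y):
    """Closed-form regression on one factor, returning the squared error."""
    K_xx = np.array([[np.linalg.norm(A.T @ B) ** 2 for B in train_factor]
                     for A in train_factor])
    K_xy = np.array([np.linalg.norm(A.T @ Y) ** 2 for A in train_factor])
    w = np.linalg.solve(K_xx, K_xy)
    approx = sum(wi * X @ X.T for wi, X in zip(w, train_factor))
    return np.linalg.norm(Y @ Y.T - approx) ** 2

def classify(train_by_class, query):
    """Return the class with the smallest summed factor-wise residual."""
    residuals = []
    for samples in train_by_class:
        total = 0.0
        for k in range(len(query)):
            total += factor_residual([s[k] for s in samples], query[k])
        residuals.append(total)
    return int(np.argmin(residuals)), residuals

rng = np.random.default_rng(0)
dims = [(12, 3), (10, 3), (8, 2)]            # (d, p) per factor, illustrative
train_by_class = [
    [tuple(random_point(d, p, rng) for d, p in dims) for _ in range(4)]
    for _ in range(2)                        # two synthetic classes
]
query = train_by_class[1][2]                 # a known member of class 1
label, residuals = classify(train_by_class, query)
print(label)
```

Each class requires only three small linear solves per query, one per factor, which is the practical benefit of the closed-form solution over an iterative mean computation.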

#### 7. Experiments on Different Datasets

In this section, we compare the performance of the proposed method with some state-of-the-art methods on two kinds of datasets.

##### 7.1. Action Recognition

###### 7.1.1. Cambridge Hand Gesture Dataset

The Cambridge hand gesture dataset [20] contains 900 video sequences of nine kinds of hand gestures, which are divided into 5 sets according to different illuminations. Figure 1 shows some hand gesture samples. Set 5 (normal illumination) is used for training while the remaining sequences (with different illumination characteristics) are used for testing. The original sequences are converted to grayscale and resized. We denote our method as ELSR and report the correct recognition rate (CRR) for the four illumination sets in Table 1. Compared with product manifold (PM) [10], Grassmann Sparse Coding (gSC) [14], Grassmann Locality-Constrained Coding (gLC) [14], kernel Grassmann Sparse Coding (kgSC) [14], and kernel Grassmann Locality-Constrained Coding (kgLC) [14], our method is competitive with these state-of-the-art methods.