Abstract

Covariance matrices, which are symmetric positive definite (SPD) matrices, are usually regarded as points lying on Riemannian manifolds. We describe a new covariance descriptor that improves the discriminative learning ability of the region covariance descriptor by taking into account the mean of the feature vectors. Due to the specific geometry of Riemannian manifolds, classical learning methods cannot be applied to them directly. In this paper, we propose a subspace projection framework for the classification task on Riemannian manifolds and give its mathematical derivation. It differs from the common technique used for Riemannian manifolds, which explicitly projects points from a Riemannian manifold onto Euclidean space based upon a linear hypothesis. Under the proposed framework, we define a Gaussian Radial Basis Function (RBF) kernel with the Log-Euclidean Riemannian Metric (LERM) to embed a Riemannian manifold into a high-dimensional Reproducing Kernel Hilbert Space (RKHS) and then project it onto a subspace of the RKHS. Finally, a variant of Linear Discriminant Analysis (LDA) is recast onto the subspace. Experiments demonstrate the considerable effectiveness of the mixed region covariance descriptor and the proposed method.

1. Introduction

Image classification is an important and prevalent topic in pattern recognition and computer vision research. In particular, as object detection [1] and visual tracking [2, 3] have been widely developed, these technologies have been effectively applied to numerous real-world scenarios [4–8].

Feature representation is one of the fundamental steps for detection, tracking, and classification tasks. The raw pixel value is the simplest choice and has been used in many traditional classification methods such as Nonparametric Discriminant Multimanifold Learning (NDML) [9] and Discriminant Analysis with Local Gaussian Similarity Preserving (DA-LGSP) [10]. However, this representation is high-dimensional and not robust to noise.

A natural extension of raw pixel values is to search for more informative cues, such as histograms of image gradients, edges, and colors. Using information about the primary colors (Red, Green, and Blue), Marcin et al. calculate Hue-Saturation-Brightness (HSB) values and build models of input image regions with HSB in practical application scenarios [11, 12]. However, describing object features only in color space is not universally applicable. Histogram-based representations such as the Scale-Invariant Feature Transform (SIFT) descriptor [13] and the Histogram of Oriented Gradients (HOG) descriptor [14] are more robust and discriminative. Nevertheless, these descriptors have difficulty handling occlusion. Modeling the object appearance using multiple features is another choice, but it requires a large amount of data [15, 16].

Recently, covariance descriptors [17, 18] have attracted increased attention in computer vision and pattern recognition applications. Tuzel et al. [17] first proposed representing image regions by a nonsingular covariance matrix of $d$ features such as location, intensity, higher order directional derivatives, and edge orientation. The dimensionality of the covariance descriptor is independent of the image size, and the descriptor has only $d(d+1)/2$ distinct values due to symmetry. Furthermore, the covariance matrix does not carry any information regarding the ordering or the number of points, which implies a certain scale and rotation invariance across different images. However, the region covariance descriptor neglects the mean of the feature vectors, whereas we believe the mean vector can add discriminative information. For this reason, we propose a new region descriptor that also considers the mean vector of the $d$ features.

Covariance matrices lie on Riemannian manifolds, not in Euclidean space. It is not adequate to neglect this nonlinear geometry and directly apply conventional machine learning methods; doing so often yields undesirable effects such as the swelling of diffusion tensors [19]. One of the key issues is therefore how to develop efficient discriminative learning algorithms on Riemannian manifolds. A popular choice is to map points of a manifold onto the tangent space at a particular point and then design the classifier there [20]. This has been applied to pedestrian detection [1], action classification [21], and multiclass classification problems [22]. However, flattening a manifold through tangent spaces may produce inaccurate modeling, especially for regions far away from the tangent pole. Another framework follows the idea of the kernel method in Euclidean space [23]. Kernel-based methods embed the manifold into a high-dimensional Reproducing Kernel Hilbert Space (RKHS) and further project the elements of the RKHS onto Euclidean space by an explicit mapping. Benefiting from the explicit mapping, many classification methods originally proposed in Euclidean space can be extended to Riemannian manifolds [24–29]. Riemannian Locality Preserving Projections (RLPP) [30] is a more recent kernel-based method proposed for visual classification tasks. Modeling images as region covariance matrices, RLPP maps a Riemannian manifold onto Euclidean space by a Riemannian pseudo kernel and then exploits Locality Preserving Projection (LPP) for discriminative learning. Considerable results on visual recognition tasks have been achieved. However, the Riemannian pseudo kernel derived from the Affine-Invariant Riemannian Metric (AIRM) [31] is not guaranteed to be positive definite. Besides, although RLPP is a supervised method, we argue that it fails to take full advantage of the class labels. Jayasumana et al. [32] present a framework on Riemannian manifolds to establish the positive definiteness of Gaussian RBF kernels and apply the Log-Euclidean Gaussian kernel to Kernel Principal Component Analysis (KPCA) for recognition tasks.

Based on these related works, we propose a discriminative learning method, Mixed Region Covariance Discriminative Learning (MRCDL), for image classification. The main contributions of this paper are as follows:

(1) We propose a novel region descriptor, which is more discriminative than the region covariance descriptor.

(2) To address the classification task, we propose a subspace projection framework for Riemannian manifolds. Previous frameworks project the elements of Riemannian manifolds onto Euclidean space based upon a linear hypothesis, whereas the final feature space in our framework is a subspace of the RKHS, which is isomorphic to Euclidean space. We give a well-founded derivation of the subspace mapping.

(3) We exploit Kernel Linear Discriminant Analysis (KLDA) on the final low-dimensional subspace to fit the subspace projection framework.

The rest of the paper is organized as follows: Section 2 briefly reviews Riemannian geometry and the properties of RKHS. Section 3 presents the works related to our proposed method. In Section 4, we present the proposed MRCDL in detail. In Section 5, experimental results are presented to demonstrate the effectiveness of MRCDL and the mixed region covariance. Conclusions are drawn in Section 6.

2. Preliminaries

In this section, we briefly review the geometry of Riemannian manifolds and the properties of RKHS.

2.1. Riemannian Geometry on SPD Matrices

The space of $d \times d$ real SPD matrices, denoted by $\mathcal{S}_{++}^d$, is a connected Riemannian manifold. A Riemannian manifold is a differentiable manifold endowed with a symmetric positive definite and smoothly varying second-order tensor field. A tensor is a multilinear functional on a linear space. The tensor at a given point of a differentiable manifold is a tensor on the tangent space at that point, and a tensor field maps each point of the manifold to a tensor on its tangent space. Once we define a symmetric positive definite and smoothly varying second-order tensor field on a differentiable manifold, we define a smoothly varying inner product on each tangent space, and vice versa. Such a tensor field is known as a Riemannian metric of the manifold. It enables us to induce the geodesic distance between two points on the manifold, that is, the length of the shortest curve connecting them, which is the most suitable measure of distance on a Riemannian manifold. In the following, we discuss some popular Riemannian metrics proposed on $\mathcal{S}_{++}^d$.

The affine-invariant geodesic distance induced by the AIRM [31] is the most natural measure of dissimilarity between two points lying on a Riemannian manifold. The AIRM distance between $X, Y \in \mathcal{S}_{++}^d$ is defined as $d_{\mathrm{AIRM}}(X, Y) = \left\| \log\!\left(X^{-1/2} Y X^{-1/2}\right) \right\|_F$, where $\log(\cdot)$ denotes the matrix logarithm, which can be computed through singular value decomposition, and $\|\cdot\|_F$ denotes the matrix Frobenius norm.

The AIRM enjoys several useful theoretical properties; however, this metric has the drawback of high time complexity in practice. The Log-Euclidean geodesic distance [33] is the distance between two SPD matrices measured on the tangent space of $\mathcal{S}_{++}^d$ at the identity matrix. The Log-Euclidean geodesic distance between $X$ and $Y$ is given by $d_{\mathrm{LE}}(X, Y) = \left\| \log(X) - \log(Y) \right\|_F$. This distance is a lower bound on the actual geodesic distance and is computationally more efficient than the affine-invariant geodesic distance.

The Jensen-Bregman LogDet Divergence (JBLD) [34] can be understood as a proxy of the geodesic distance based on matrix divergence. This divergence is also much faster to compute and turns out to be empirically very effective. The JBLD between $X$ and $Y$ is defined as $J(X, Y) = \log\det\!\left(\frac{X + Y}{2}\right) - \frac{1}{2}\log\det(XY)$, where $\det(\cdot)$ denotes the determinant of a matrix.

This measure does not require matrix eigenvalue computations or logarithms and at the same time enjoys many of the properties of the AIRM. However, the JBLD is not a true geodesic distance and can lead to lower accuracy.
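For concreteness, the three measures above can be sketched with standard linear algebra routines. The following NumPy/SciPy sketch is only illustrative, not the authors' implementation; it assumes the inputs are genuine SPD matrices.

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def airm_distance(X, Y):
    """Affine-invariant distance ||log(X^{-1/2} Y X^{-1/2})||_F."""
    X_inv_sqrt = np.linalg.inv(sqrtm(X))
    return np.linalg.norm(logm(X_inv_sqrt @ Y @ X_inv_sqrt).real, 'fro')

def log_euclidean_distance(X, Y):
    """Log-Euclidean distance ||log(X) - log(Y)||_F."""
    return np.linalg.norm(logm(X).real - logm(Y).real, 'fro')

def jbld_divergence(X, Y):
    """Jensen-Bregman LogDet divergence: log det((X+Y)/2) - 0.5 log det(XY)."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

# quick check on two random SPD matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); X = A @ A.T + 5 * np.eye(5)
B = rng.standard_normal((5, 5)); Y = B @ B.T + 5 * np.eye(5)
print(airm_distance(X, Y), log_euclidean_distance(X, Y), jbld_divergence(X, Y))
```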

2.2. RKHS

Let $\mathcal{H}$ be a space of functions $f: \mathcal{X} \to \mathbb{R}$ equipped with an inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ under which it is complete; then $\mathcal{H}$ [35] is a Hilbert space of functions. If there exists a function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ such that (1) $k(\cdot, x) \in \mathcal{H}$ for all $x \in \mathcal{X}$ and (2) $f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}}$ for all $f \in \mathcal{H}$ and $x \in \mathcal{X}$,

then $\mathcal{H}$ is called a Reproducing Kernel Hilbert Space and $k$ is the reproducing kernel of $\mathcal{H}$.

Let $\phi: \mathcal{X} \to \mathcal{H}$ be a mapping defined by $\phi(x) = k(\cdot, x)$ for all $x \in \mathcal{X}$. From the reproducing property it follows that $\langle \phi(x), \phi(y) \rangle_{\mathcal{H}} = k(x, y)$.

A Hilbert space can be generated from a kernel function. A function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called a kernel function if it satisfies the following:

(1) Symmetry: $k(x, y) = k(y, x)$ for all $x, y \in \mathcal{X}$,

(2) Square integrability: for all $x \in \mathcal{X}$, $k(x, \cdot)$ is square integrable,

(3) Positive definiteness: for every positive integer $n$ and all $x_1, \ldots, x_n \in \mathcal{X}$, the matrix $[k(x_i, x_j)]_{i,j=1}^{n}$ is positive definite.

According to the Moore–Aronszajn theorem [35], as soon as a kernel function is defined, a unique RKHS having that function as its reproducing kernel is also defined. Therefore, a kernel function can be used to represent an RKHS.

The success of many state-of-the-art computer vision algorithms arises from the use of kernel methods on Euclidean spaces. One advantage of kernel methods is that they turn a nonlinear space into a linear one by embedding the manifold into an RKHS. In addition, we never need to compute the actual feature vectors, since the inner products in the RKHS can be evaluated by means of the kernel function.
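As a small illustration of this point on Euclidean data (the data and kernel width below are hypothetical), the Gram matrix of a Gaussian RBF kernel supplies all pairwise inner products in the RKHS without ever forming the feature maps, and its eigenvalues are nonnegative, reflecting property (3):

```python
import numpy as np

def gaussian_gram(points, sigma):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(points ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 4))      # 30 hypothetical samples in R^4
K = gaussian_gram(X, sigma=1.0)       # inner products in the RKHS, no explicit feature maps
print(np.min(np.linalg.eigvalsh(K)))  # smallest eigenvalue is nonnegative up to rounding
```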

3. Related Works

3.1. Region Covariance Descriptor

The region covariance descriptor, as a special case of SPD matrices, provides a natural way of fusing multiple features that might be correlated. Let $R$ be the input image or a rectangular region of the input image. Each pixel inside $R$ can be represented by a feature vector including the pixel coordinates, intensity or colors, the first and second order derivatives of the intensity in the $x$ and $y$ directions, and so on. For example, we can extract the following features from the pixel located in the $i$-th row and the $j$-th column of a given image region: $f_{i,j} = \left[\, i,\; j,\; I(i,j),\; \left|I_x(i,j)\right|,\; \left|I_y(i,j)\right| \,\right]^T$, where $I(i,j)$ is the image intensity of the pixel, and $I_x$ and $I_y$ are the first order derivatives of the intensity in the $x$ and $y$ directions, respectively. Suppose $R$ is $H$ pixels in height and $W$ pixels in width. Let $\{f_k\}_{k=1}^{n}$ be the $d$-dimensional feature vectors of the $n$ points inside $R$, where $n = H \times W$. The region covariance of $R$ can be computed by $C_R = \frac{1}{n-1}\sum_{k=1}^{n}(f_k - \mu)(f_k - \mu)^T$, where $\mu = \frac{1}{n}\sum_{k=1}^{n} f_k$ denotes the mean of the feature vectors.
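The computation can be sketched as follows for a grayscale region, using the illustrative per-pixel features above (row, column, intensity, and absolute first derivatives); the feature choice is only an example, and other cues can be stacked in the same way.

```python
import numpy as np

def region_covariance(region):
    """Region covariance of per-pixel features [i, j, I, |dI/dx|, |dI/dy|]."""
    region = region.astype(float)
    gy, gx = np.gradient(region)                  # derivatives along rows and columns
    rows, cols = np.mgrid[0:region.shape[0], 0:region.shape[1]]
    F = np.stack([rows, cols, region, np.abs(gx), np.abs(gy)], axis=-1)
    F = F.reshape(-1, F.shape[-1])                # n x d matrix of feature vectors
    mu = F.mean(axis=0)
    C = (F - mu).T @ (F - mu) / (F.shape[0] - 1)  # d x d region covariance
    return C, mu
```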

3.2. RLPP

Next, we briefly introduce the implementation of RLPP [30]. Suppose there are $N$ training images belonging to $c$ classes. RLPP characterizes input images as covariance matrices by using the region covariance descriptor. Let $\{X_i\}_{i=1}^{N}$ denote the $N$ training samples, where each $X_i$ is an SPD matrix corresponding to an input image region (or image), and let $\{l_i\}_{i=1}^{N}$ denote the corresponding labels. As mentioned above, covariance matrices are SPD matrices lying on a Riemannian manifold, and classical learning algorithms cannot be directly utilized on the manifold. Thus RLPP exploits a Riemannian pseudo kernel, derived from the AIRM geodesic distance, to map the $X_i$ from the Riemannian manifold onto a linear space. The Riemannian pseudo kernel defined in RLPP is formulated as $k(X_i, X_j) = \exp\!\left(-\frac{d_{\mathrm{AIRM}}^2(X_i, X_j)}{2\sigma^2}\right)$.

RLPP implements the visual recognition task in an extrinsic manner by confining the solution to be linear. Given an SPD matrix $X$ as the input, its projection onto the feature space is obtained by $z = W^T k_X$, where $W \in \mathbb{R}^{N \times m}$ is the projection matrix and $k_X = [k(X, X_1), \ldots, k(X, X_N)]^T$ is the kernel vector of $X$.

The aim of RLPP is to optimize $W$ by preserving the local geometrical structure of the original space, which can be modeled by a binary similarity graph $G$ that connects points belonging to the same class; the objective incurs a heavy penalty if such points are mapped far apart in the projected feature space.

The optimized mapping is obtained by minimizing the objective function $\sum_{i,j} \left\| W^T k_{X_i} - W^T k_{X_j} \right\|^2 G_{ij}$.

Therefore, the minimization problem reduces to finding $\min_{W} \operatorname{tr}\!\left(W^T K L K W\right)$ subject to $W^T K D K W = I$, where $K$ is the kernel matrix, $D$ is the diagonal degree matrix with $D_{ii} = \sum_j G_{ij}$, and $L = D - G$ is the graph Laplacian.

The optimal $W$ is given by the generalized eigenvectors corresponding to the $m$ smallest nonzero eigenvalues of the generalized eigenproblem $K L K w = \lambda K D K w$. Then, the recognition task over Riemannian manifolds is reduced to a classification problem in a vector space. The $N$ training samples can be projected and explicitly represented by a series of $m$-dimensional vectors through (7). For any given testing image, the corresponding covariance matrix and kernel vector are first computed through (6) and (7), respectively; according to (8), the test image is also projected into the same space. Nearest Neighbor (NN) classification or a support vector machine is then conducted in this $m$-dimensional vector space.
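The closed-form solution described above admits a short sketch. This is an LPP-style reading of RLPP under the assumption that the pseudo kernel matrix K and the binary similarity graph G have already been built; it is not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def rlpp_projection(K, G, m):
    """Solve K L K w = lambda K D K w and keep the m smallest nonzero solutions."""
    D = np.diag(G.sum(axis=1))
    L = D - G                                          # graph Laplacian
    A, B = K @ L @ K, K @ D @ K
    vals, vecs = eigh(A, B + 1e-8 * np.eye(len(K)))    # ascending eigenvalues
    keep = np.where(vals > 1e-10)[0][:m]               # discard (near-)zero eigenvalues
    return vecs[:, keep]                               # N x m projection matrix W

def rlpp_project(W, k_vec):
    """Map a sample into R^m from its kernel vector k_vec = [k(X, X_1), ..., k(X, X_N)]."""
    return W.T @ k_vec
```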

4. The Proposed Method

In this section, we give a detailed description of the proposed MRCDL.

4.1. Framework

Generally speaking, the training procedure of the framework can be viewed as a four-stage process. The first step is to represent the $N$ labeled training samples with the proposed mixed region covariance descriptor, yielding the SPD matrices $\{X_i\}_{i=1}^{N}$. The second step is to embed the SPD matrices from the Riemannian manifold into a high-dimensional RKHS with the Gaussian RBF kernel induced by the LERM. As previously stated, an RKHS is a high-dimensional inner product space of functions rather than vectors, which means that discriminant analysis cannot be utilized directly. Thus, our third step is to project the elements of the RKHS onto a subspace of the RKHS, which is isomorphic to the Euclidean space $\mathbb{R}^m$. Benefiting from the notion of the inner product and the properties of subspaces, the mapping from the Riemannian manifold to the final linear subspace can be achieved by an explicit mapping. The subspace projection framework is novel, although it is similar in spirit to the methods in [18, 30, 36]. Finally, discriminative learning based on LDA is conducted to obtain the optimized subspace. The samples are then mapped to this low-dimensional discriminant subspace, and the projections of the $N$ training samples can be denoted as $\{z_i\}_{i=1}^{N}$. An overall illustration of the training stage is shown in Figure 1.

For the testing stage, using the mixed region covariance matrices of the test images as input, we project them onto the discriminant subspace by the explicit mapping learned in the training stage. The classification task can then be realized with a K-Nearest-Neighbor (KNN) classifier on the subspace.

4.2. Mixed Region Covariance Descriptor

As depicted in Section 3.1, given a rectangular image region $R$, we can extract feature vectors such as intensity, color, gradients, and filter responses for each pixel. Let $\{f_k\}_{k=1}^{n}$ be the $d$-dimensional feature vectors of the $n$ pixels inside $R$. Firstly, we compute the region covariance descriptor $C$ as in (6) and the mean vector $\mu$. By employing information geometry theory [37, 38], the mean vector and covariance matrix of $R$ can be embedded into an SPD matrix, and the image region is then represented as the $(d+1) \times (d+1)$ SPD matrix $X = \begin{bmatrix} C + \mu \mu^{T} & \mu \\ \mu^{T} & 1 \end{bmatrix}$, where $\mu \mu^{T}$ denotes the outer product of the mean vector with itself.
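A minimal sketch of this descriptor follows, assuming the $(d+1) \times (d+1)$ Gaussian embedding shown above; scaling or regularization conventions in the exact formulation may differ, and the small ridge added at the end is only a practical safeguard.

```python
import numpy as np

def mixed_region_covariance(F):
    """Embed the mean and covariance of an n x d feature matrix F into a (d+1) x (d+1) SPD matrix."""
    mu = F.mean(axis=0)
    C = np.cov(F, rowvar=False)                       # d x d region covariance
    top = np.hstack([C + np.outer(mu, mu), mu[:, None]])
    bottom = np.hstack([mu[None, :], np.ones((1, 1))])
    X = np.vstack([top, bottom])
    return X + 1e-6 * np.eye(X.shape[0])              # keep the matrix strictly positive definite
```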

The mixed region covariance descriptor inherits the advantages of the region covariance descriptor, namely, good robustness, noise resistance, and low dimensionality. Furthermore, the mean vector improves the discriminative learning ability. The improved performance is demonstrated by the experimental results in Section 5.

It is worth noting that the proposed mixed region covariance descriptor is entirely different from the Gaussian Mixture Model (GMM), which is a parametric statistical model assuming a Gaussian distribution for images [39] or image sets [38]; in a GMM, the data are assigned to Gaussian components according to posterior probabilities. In contrast, the mixed region covariance descriptor represents an image in a global manner and makes no assumptions about the data distribution: the mean vector and region covariance are computed from the feature vectors of pixels rather than from Gaussian components.

4.3. Gaussian Kernel and Subspace Projection of RKHS

It is known that the Gaussian kernel is effective in Euclidean spaces. We derive a Gaussian kernel on Riemannian manifolds by replacing the Euclidean distance with the geodesic distance induced by the LERM, which preserves the geometric properties of the manifold and avoids high time complexity. Most importantly, it yields a positive definite kernel, which is a necessary condition for a reproducing kernel. The kernel is defined as $k(X, Y) = \exp\!\left(-\frac{d_{\mathrm{LE}}^{2}(X, Y)}{2\sigma^{2}}\right)$, where $d_{\mathrm{LE}}(X, Y) = \|\log(X) - \log(Y)\|_F$ is the Log-Euclidean distance and $\sigma$ is the kernel width.
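A sketch of computing this kernel matrix for a set of SPD matrices follows; as in the experimental setting of Section 5, the width sigma defaults here to the average pairwise Log-Euclidean distance, which is an assumption of this sketch rather than a requirement of the kernel.

```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_gaussian_kernel(spd_list, sigma=None):
    """K[i, j] = exp(-d_LE(X_i, X_j)^2 / (2 sigma^2)), d_LE = ||log X_i - log X_j||_F."""
    logs = [logm(X).real for X in spd_list]
    n = len(logs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.linalg.norm(logs[i] - logs[j], 'fro')
    if sigma is None:                               # default: average pairwise distance
        sigma = D[np.triu_indices(n, k=1)].mean()
    return np.exp(-D ** 2 / (2.0 * sigma ** 2)), sigma
```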

Let $\mathcal{H}$ be the RKHS generated by the reproducing kernel defined in (13). We can define a mapping $\phi$ from $\mathcal{S}_{++}^d$ to the RKHS $\mathcal{H}$ by $\phi(X) = k(\cdot, X)$.

The maps of the input covariance matrices into the RKHS can be denoted as $\{\phi(X_i)\}_{i=1}^{N}$. Although each $\phi(X_i)$ in the RKHS is a function rather than a vector, the inner product of the Hilbert space can be evaluated through the kernel function: $\langle \phi(X_i), \phi(X_j) \rangle_{\mathcal{H}} = k(X_i, X_j)$.

Next, we construct an $m$-dimensional subspace of the RKHS $\mathcal{H}$. Let $w_p = \sum_{i=1}^{N} W_{ip}\, \phi(X_i)$, $p = 1, \ldots, m$; the set $\{w_p\}_{p=1}^{m}$ can be an orthogonal basis of the subspace only if it is orthonormal, which means that $\langle w_p, w_q \rangle_{\mathcal{H}} = \delta_{pq}$. We have $\langle w_p, w_q \rangle_{\mathcal{H}} = \sum_{i=1}^{N}\sum_{j=1}^{N} W_{ip} W_{jq}\, k(X_i, X_j) = \left(W^T K W\right)_{pq}$, where $W = [W_{ip}] \in \mathbb{R}^{N \times m}$ and $K$ is the kernel matrix with $K_{ij} = k(X_i, X_j)$.

Then, we obtain the constraint on the projection matrix $W$: $W^T K W = I_m$, where $I_m$ is the $m$-dimensional identity matrix. Thus, $\{w_p\}_{p=1}^{m}$ can span an $m$-dimensional subspace of $\mathcal{H}$ only if this constraint is satisfied.

We further project $\phi(X)$ onto the subspace spanned by $\{w_p\}_{p=1}^{m}$. The projection of $\phi(X)$ on this subspace is $\sum_{p=1}^{m} \langle \phi(X), w_p \rangle_{\mathcal{H}}\, w_p$.

Thus, we can project points from a Riemannian manifold onto a low-dimensional linear subspace under the constraint $W^T K W = I_m$. Let $z$ be the low-dimensional embedding of $X$, that is, the vector of coordinates of the projection in the basis $\{w_p\}$; we have $z = W^T k_X$ with $k_X = [k(X, X_1), \ldots, k(X, X_N)]^T$.

The subspace projection framework is different from the mappings in [18, 30, 36], which assume that the points on a Riemannian manifold can be mapped onto Euclidean space, directly or indirectly, based upon a linear hypothesis but do not give a justification; in the proposed subspace projection framework, the constraint $W^T K W = I_m$ is an important precondition.
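The constraint can be enforced for any candidate coefficient matrix by whitening it in the kernel metric. The following is a sketch under the assumption that the kernel matrix K of the training samples is strictly positive definite; the sizes in the sanity check are hypothetical.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def orthonormalize_in_rkhs(A, K):
    """Return W with the same span as A in the RKHS and W.T @ K @ W = I."""
    M = A.T @ K @ A                                   # m x m Gram matrix of the candidate basis
    return A @ fractional_matrix_power(M, -0.5).real

rng = np.random.default_rng(0)
B = rng.standard_normal((20, 20)); K = B @ B.T + 20 * np.eye(20)
A = rng.standard_normal((20, 5))
W = orthonormalize_in_rkhs(A, K)
print(np.allclose(W.T @ K @ W, np.eye(5)))            # True: the constraint W^T K W = I_m holds
```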

4.4. Discriminative Learning on Riemannian Manifolds

A finite-dimensional linear space is isomorphic to the Euclidean space of the same dimension [37]; thus the subspace of the RKHS constructed above is isomorphic to the $m$-dimensional Euclidean space $\mathbb{R}^m$. The proposed MRCDL recasts LDA [40] from Euclidean space onto this subspace and develops a variant of LDA to seek the optimal subspace. LDA is well known for its effectiveness in learning discriminant subspaces. In order to make points from the same class more compact and points from different classes better separated, LDA seeks a projection matrix such that the ratio of the between-class scatter to the within-class scatter is maximized after the projection.

Suppose the input data come from $c$ classes. We denote the data of the $j$-th class as $\Phi_j$, which can be represented as $\Phi_j = \Phi S_j$, where $\Phi = [\phi(X_1), \ldots, \phi(X_N)]$, $S_j \in \{0, 1\}^{N \times N_j}$ is the selection matrix that picks out the samples belonging to the $j$-th class, and $N_j$ is the number of images in the $j$-th class.

We denote the mean and the centralization of $\Phi_j$ as $m_j$ and $\bar{\Phi}_j$, respectively; they can be computed by $m_j = \frac{1}{N_j} \Phi_j \mathbf{1}_{N_j}$ and $\bar{\Phi}_j = \Phi_j \left(I_{N_j} - \frac{1}{N_j} \mathbf{1}_{N_j} \mathbf{1}_{N_j}^T\right)$, where $\mathbf{1}_{N_j}$ is an $N_j$-dimensional vector whose elements are all 1 and $I_{N_j} - \frac{1}{N_j} \mathbf{1}_{N_j} \mathbf{1}_{N_j}^T$ is the centering matrix.

The within-class measure of the $j$-th class can be computed by $\bar{\Phi}_j \bar{\Phi}_j^{T}$.

Then the within-class scatter matrix can be represented as $S_w = \sum_{j=1}^{c} \bar{\Phi}_j \bar{\Phi}_j^{T}$.

Let $m = \frac{1}{N} \Phi \mathbf{1}_N$ denote the mean of all training samples in the RKHS. The between-class scatter matrix can then be formulated as $S_b = \sum_{j=1}^{c} N_j \left(m_j - m\right)\left(m_j - m\right)^{T}$.

The MRCDL seeks the projection matrix that maximizes the ratio of the between-class scatter to the within-class scatter under the constraint $W^T K W = I_m$.

The optimal $W$ is given by the eigenvectors associated with the $m$ largest eigenvalues of the generalized eigenproblem $S_b w = \lambda S_w w$. We can obtain the $m$-dimensional embeddings of the training samples by (19). Given a test example, its $m$-dimensional projection in the discriminant subspace is obtained likewise. Then the class of the test region is predicted by KNN classification in the $m$-dimensional vector space.

The proposed MRCDL is presented in Algorithm 1.

Input:
(i) The mixed region covariance descriptors $\{X_i\}_{i=1}^{N}$ of the image regions for
training and the corresponding labels $\{l_i\}_{i=1}^{N}$.
(ii) The number of images $N_j$ in the $j$-th class, $j = 1, \ldots, c$.
(iii) The mixed region covariance descriptor $X_t$ of the test image region.
Processing:
1: Calculate the matrix logarithm $\log(X_i)$ for all $X_i$,
2: for each point $X_i$ do
3: for each point $X_j$ do
4: Calculate the Log-Euclidean distance between $X_i$ and $X_j$ by (2),
5: end for
6: end for
7: Calculate the kernel matrix $K$ by (13),
8: for each class $j$ do
9: Calculate the mean by (21) and the centralized matrix by (22),
10: end for
11: Calculate the within-class scatter matrix $S_w$ by (24),
12: Calculate the between-class scatter matrix $S_b$ by (26),
13: Solve the eigenproblem $S_b w = \lambda S_w w$. The eigenvectors corresponding to the
$m$ largest eigenvalues form $W$,
14: Calculate the projections $\{z_i\}_{i=1}^{N}$ of the training samples by (19),
15: Calculate the projection $z_t$ of the testing matrix $X_t$ by (18),
16: The label of the test region is determined by a KNN classifier between $z_t$ and $\{z_i\}_{i=1}^{N}$.
Output:
(i) The class of the test region $X_t$.
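For reference, a compact end-to-end sketch of Algorithm 1 in the notation above is given below. It assumes the Log-Euclidean Gaussian kernel of (13); the scatter matrices are formed in the standard kernel Fisher discriminant fashion, which may differ in detail from the exact formulation, so this is an illustrative sketch rather than the reference implementation.

```python
import numpy as np
from scipy.linalg import eigh

def mrcdl_train(K, labels, m):
    """Learn an N x m matrix W from the kernel matrix K and integer class labels."""
    N = len(labels)
    overall_mean = K.mean(axis=1)
    Sb = np.zeros((N, N))
    Sw = np.zeros((N, N))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        Kc = K[:, idx]                                 # kernel columns of class c
        class_mean = Kc.mean(axis=1)
        diff = class_mean - overall_mean
        Sb += len(idx) * np.outer(diff, diff)          # between-class scatter
        centred = Kc - class_mean[:, None]
        Sw += centred @ centred.T                      # within-class scatter
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(N))       # generalized eigenproblem
    return vecs[:, np.argsort(vals)[::-1][:m]]         # keep the m largest eigenvalues

def mrcdl_predict(W, K_train, k_test, labels, k=5):
    """Project the training samples and one test sample, then vote among the k nearest neighbours."""
    Z_train = W.T @ K_train                            # m x N training embeddings
    z_test = W.T @ k_test                              # k_test[i] = k(X_test, X_i)
    order = np.argsort(np.linalg.norm(Z_train - z_test[:, None], axis=0))[:k]
    votes, counts = np.unique(np.asarray(labels)[order], return_counts=True)
    return votes[np.argmax(counts)]
```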
4.5. Computational Complexity

The complexity of computing the mixed region covariance of a rectangular region is dominated by the computation of the region covariance. A fast computation of the region covariance descriptor can be realized based on integral images [1, 6], and the computational cost for any input image region is $O(d^2)$. Considering the properties of the LERM, we perform the matrix logarithm operation on each mixed region covariance matrix, which can be computed in $O(d^3)$ time. The complexity of computing the descriptors and the matrix logarithm operations is linear in $N$.

Neglecting the complexity of the exponential in the kernel function and considering the symmetry of the kernel matrix, the complexity of computing the kernel matrix is $O(N^2 d^2)$. The optimization of the projection matrix involves the calculation of the within-class scatter, the calculation of the between-class scatter, and an eigenvalue decomposition. The complexities of computing the within-class and between-class scatter matrices are polynomial in $N$, and the complexity of the eigenvalue decomposition of the $N \times N$ problem is $O(N^3)$.

In summary, the overall computational complexity of MRCDL is dominated by the construction of the kernel matrix and the eigenvalue decomposition.

5. Experiment

Experiments are performed on recognition and categorization tasks. Several state-of-the-art approaches for image classification are adopted for comparison. The performance of the mixed region covariance and the proposed MRCDL is verified in this section. All of the methods were implemented on an Intel(R) Core(TM) i5-4670K (3.40GHz) PC using Matlab R2014.

5.1. Datasets and Experimental Settings

We used the COIL-20 (Columbia Object Image Library) dataset [41] and the ETH80 dataset [42] for the object categorization task. COIL-20 consists of 20 different objects, with 72 images of different viewpoints per object. The images are manually cropped and normalized to $128 \times 128$ pixels. To calculate the mixed region covariance descriptor of an image, each pixel was mapped onto a 5-dimensional feature vector $\left[I, |I_x|, |I_y|, |I_{xx}|, |I_{yy}|\right]$, where $I$ is the image intensity and $I_x$, $I_y$, $I_{xx}$, $I_{yy}$ are the first and second order derivatives of the intensity in the $x$ and $y$ directions; each image was then described by a $6 \times 6$ mixed region covariance matrix. We randomly selected 10 images per class for training and used the remaining 62 images for testing.

The ETH80 dataset contains 8 categories: apples, pears, tomatoes, cars, cows, cups, dogs, and horses. Each category comprises 10 object instances with 41 images per object. Images in the dataset are manually cropped and rescaled to a common size. Each pixel was mapped onto a 10-dimensional feature vector $\left[x, y, R, G, B, I, |I_x|, |I_y|, |I_{xx}|, |I_{yy}|\right]$ to calculate the mixed region covariance matrix, where $x$ and $y$ are the pixel coordinates, $R$, $G$, $B$ are the corresponding RGB color values, $I$ is the intensity, and $I_x$, $I_y$, $I_{xx}$, $I_{yy}$ are the first and second order derivatives of the intensity in the $x$ and $y$ directions. The descriptor for each image was therefore an $11 \times 11$ SPD matrix. In each experiment, this database was split into a training set and a testing set, where each category had 5 objects for training and the remaining 5 objects for testing. We randomly chose 20 images per object from the training objects and the testing objects; that is, there were 100 images for training and 100 images for testing in each category. The test images and the training images came from different object instances, even within the same category. The KNN classifier was adopted to predict the labels of the input testing images.

We conduct the texture classification task on the Brodatz dataset, which consists of 111 texture images [43]. Each image is divided into four subimages, half of which are used for training and the remaining for testing. In each training image, the mixed region covariance descriptors of 50 randomly chosen image regions were computed from the 5-dimensional feature vector $\left[I, |I_x|, |I_y|, |I_{xx}|, |I_{yy}|\right]$. For a given test image, we computed 100 covariance matrices from randomly selected image regions of varying sizes. We used the KNN classifier (k=5) for the classification of each input region, thus obtaining 100 labels per image. The class of a testing image was predicted by majority voting among the 100 labels.

The proposed MRCDL is compared with KPCA [32], RLPP [30], NDML [9], and DA-LGSP [10]. To validate the effectiveness of the mixed region covariance, the proposed method is also evaluated with the original region covariance descriptor, denoted by RCDL. KPCA and RLPP are likewise evaluated with the mixed region covariance descriptor, denoted by KPCA+MRC and RLPP+MRC, respectively. In NDML and DA-LGSP, we preserve nearly 95% of the image energy to select the number of principal components. KNN search is used for constructing the neighborhood graphs, as in RLPP. The kernel width $\sigma$ in the Gaussian kernel of KPCA, RLPP, and MRCDL is set to the average value of the pairwise distances. Besides, the 5NN classifier is adopted to predict the labels of the testing images (or image regions).

5.2. Performance Comparison with Other Methods

The methods are compared in terms of recognition rate and computation time. The recognition rate is the ratio between the number of correctly identified test images and the total number of testing images. To mitigate the randomness of the experiments, the reported recognition rate is the average over 20 rounds of tests. We present the recognition rates in Table 1. Since NDML and DA-LGSP require uniform dimensions of the input data, the texture recognition task cannot be performed with these two methods.

As shown in Table 1, our proposed MRCDL not only makes a considerable improvement over RLPP but also achieves the best recognition rate among the compared methods. This is because the use of class labels provides more between-class and within-class information for discriminative learning. In addition, the recognition rates of RCDL, RLPP, and KPCA are lower than those of the corresponding methods that use the mixed descriptor, namely, MRCDL, RLPP+MRC, and KPCA+MRC. This result demonstrates the benefit of the mixed region covariance descriptor.

On the COIL-20 dataset, we compared the computational time of the different methods; the time cost of each method is listed in Table 2. NDML and DA-LGSP are faster than the methods based on covariance descriptors, since the matrix logarithm operation is time-consuming. However, our method is much faster than RLPP. The main reason for this large difference is that the geodesic distance induced by the LERM is more efficient to compute than that induced by the AIRM.

5.3. Analysis of Parameter m

The parameter m represents the dimensionality of the projection subspace, which has a certain influence on the recognition rates. In this part, the recognition rates under different dimensionalities are discussed; the results are presented in Figures 2, 3, and 4. The curve of each method shows an increasing trend as the dimensionality grows, and each curve remains almost unchanged once the recognition rate reaches its maximum. The effectiveness of the mixed region covariance is confirmed again by the best recognition rates on all three datasets. Moreover, the dimensionality corresponding to the maximum of MRCDL and RCDL is smaller than that of the other methods, especially on the Brodatz dataset, which indicates that the proposed method achieves good performance even with lower dimensionality.

5.4. Analysis of the Kernel Width

In the statistical model of a Gaussian distribution, the parameter $\sigma$ determines the bandwidth of the distribution. As shown in (13), once the Riemannian metric is fixed, the parameter $\sigma$ plays an important role in the Gaussian kernel. For small values of $\sigma$, the neighborhoods of points in the RKHS are small; as $\sigma$ becomes larger, the dissimilarities between points become ambiguous. Note that $\sigma \to \infty$ corresponds to the case in which all elements of the kernel matrix tend to 1. We suggest making the kernel moderately sensitive to distances and choosing the average value of the pairwise distances as the value of $\sigma$. Experiments are conducted on the COIL-20 and ETH80 datasets to demonstrate the effect of $\sigma$. The recognition rates under different multiples of the average value are presented in Figure 5. The method performs much better when $\sigma$ is set to 0.5 to 5 times the average value of the pairwise distances.

6. Conclusions

In this paper, we proposed an image classification method called Mixed Region Covariance Discriminative Learning (MRCDL) on Riemannian manifolds. Image regions are represented by the proposed mixed region covariance descriptors, which improve the discriminative learning ability of covariance descriptors. The method employs a Log-Euclidean Gaussian kernel to embed Riemannian manifolds into an RKHS and further projects the points onto a subspace of the RKHS. A kernel version of LDA is then conducted on the low-dimensional subspace. The proposed method consistently outperforms the compared methods, even with low dimensionality; this advantage stems from the covariance descriptors together with the within-class and between-class information. The proposed method still has several drawbacks: first, its computational cost is higher than that of NDML and DA-LGSP, although the additional time is relatively small; second, it is a supervised method, which means that all of the training samples must be labeled beforehand.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China through the Project “Research on Nonlinear Alignment Algorithm of Local Coordinates in Manifold Learning” under Grant 61773022.