Abstract

Image recognition tasks increasingly involve data in the form of symmetric positive definite (SPD) matrices. SPD manifolds exhibit nonlinear geometry, so Euclidean machine learning methods cannot be applied to them directly. The kernel trick on SPD manifolds is based on projecting the data onto a reproducing kernel Hilbert space. Unfortunately, existing kernel methods do not consider the connection between SPD matrices and their linear projections. Therefore, a framework that uses the correlation between SPD matrices and projections to model the kernel map is proposed herein. To realize this, this paper formulates a Hilbert–Schmidt independence criterion (HSIC) regularization framework based on the kernel trick, where HSIC is commonly used to express the statistical dependence between two datasets. The proposed framework allows us to extend existing kernel methods to new HSIC regularization kernel methods. Additionally, this paper proposes an algorithm called HSIC regularized graph discriminant analysis (HRGDA) for SPD manifolds based on the HSIC regularization framework. Experimental results on several classification tasks show that the proposed HSIC regularization framework and HRGDA are accurate and effective.

1. Introduction

Vision recognition tasks are often encountered in real-life applications [1–3]. Most traditional image recognition algorithms are constructed in Euclidean space [4, 5]. Recently, symmetric positive definite (SPD) matrices [6] have received increasing attention in the form of region covariance descriptors [7–9], Gaussian mixture models (GMMs) [10], diffusion tensors [11, 12], and structure tensors [13, 14]. These descriptors utilize second-order statistical information to capture the correlation between different features and are effective in various applications [15–19]. SPD matrices lie on an SPD manifold when endowed with an appropriate Riemannian metric, and most conventional machine learning methods cannot be applied to SPD manifolds directly [20, 21]. Therefore, developing methods for classifying points on SPD manifolds is of significant interest.

Most existing classification methods on SPD manifolds employ Riemannian metrics or matrix divergences as the dissimilarity measure [22–26], e.g., the log-Euclidean Riemannian metric (LERM) [21] and the affine-invariant Riemannian metric (AIRM) [20]; although the Jensen–Bregman LogDet divergence (JBLD) [27, 28] is not a true Riemannian metric, it provides a fast, approximate computation of the distance. However, these methods cannot be generalized to other manifolds because they rely on manifold-specific metrics. Another common approach maps the points of a manifold to the tangent space at a specific matrix. Under this approach, traditional dimensionality reduction methods have been extended to Riemannian manifolds [29]. Tuzel et al. applied LogitBoost to classification on SPD manifolds [30], and this method was later generalized to multiclass classification [31]. Tangent space approximations preserve the manifold geometry to some extent. However, mapping the points to the tangent space may introduce inaccurate distance measurements, particularly for points far away from the center. In [32], sparse coding based on embedding the SPD matrices into a unified tangent space was proposed.

More recent studies have addressed the nonlinearity by adopting the kernel trick. Kernel methods project the SPD manifold into a reproducing kernel Hilbert space (RKHS) and then project the points into linear spaces, so that classification algorithms can be extended to SPD manifolds [33–36]. However, the mapping from the RKHS to the Euclidean space rests on a linear assumption, and the intrinsic relationship between the input data and the projections is not considered.

In this study, a novel kernel framework for SPD manifolds that considers the connection between SPD matrices and linear projections is proposed to address this problem. An intrinsic relationship between the SPD matrices and the low-dimensional representations is introduced to make the low-dimensional representations more faithful to the intrinsic features of the input data. In the proposed framework, the relationship between the SPD matrices and the projections is captured by the Hilbert–Schmidt independence criterion (HSIC), and an HSIC regularization term is added to the traditional kernel framework. HSIC is usually used to express the statistical correlation between two datasets; it has been extended to supervised sparse learning feature selection [37–39], dictionary learning [40, 41], subspace learning [42], and nonlinear dimensionality reduction [43]. Although HSIC has found many applications, it has, to our knowledge, not yet been directly applied on SPD manifolds. This study is an extension of our previously published work [44], in which we proposed a method named HSIC subspace learning (HSIC-SL) based on global HSIC maximization. Compared with HSIC-SL, we use HSIC herein as a regularization term to build a novel kernel framework on SPD manifolds. The most significant aspect of the proposed framework is that it allows us to extend traditional kernel methods to new HSIC regularization kernel methods with improved effectiveness. Additionally, this paper proposes an algorithm based on the HSIC regularization framework and graph embedding. Our primary contributions are summarized as follows:

(1) The HSIC is applied to the kernel framework, and an HSIC regularization kernel framework is proposed. Applying the proposed framework to most of the existing kernel methods on SPD manifolds improves their effectiveness. This work can contribute to the development of kernel methods on SPD manifolds.

(2) Three kernel functions for computing the HSIC regularization term are presented. The most appropriate kernel function can be selected for the target application, which increases the flexibility of the proposed framework.

(3) A method called HSIC regularized graph discriminant analysis (HRGDA) on SPD manifolds is proposed based on the HSIC regularization kernel framework. HRGDA uses a LogDet divergence kernel for embedding and a variant of kernel linear discriminant analysis (LDA) for learning.

As mentioned previously, the kernel trick is the most common approach for addressing the nonlinearity of SPD manifolds. Riemannian locality preserving projections (RLPP) [45] embed the Riemannian manifold into a vector space via a Riemannian kernel; however, the time cost is high and the kernel is not always positive definite. Jayasumana et al. [46] presented a theorem for judging the positive definiteness of Gaussian radial basis function (RBF) kernels. Harandi et al. performed sparse coding by embedding SPD matrices into an RKHS through matrix divergences [47]. Zhuang et al. proposed kernel learning and Riemannian metric (KLRM) based on data-dependent kernel learning [48]. Covariance discriminative learning (CDL) [8] maps the SPD matrices to a vector space by using the matrix logarithm. Kernel-based subspace learning (KSLR) [19] defines an improved log-Euclidean RBF kernel and seeks the optimal subspace by using linear discriminant analysis (LDA).

2.1. Kernelized Schemes

The framework of kernel methods on SPD manifolds is presented in Figure 1. Here, let $\{X_1, \dots, X_N\}$ be $N$ samples on an SPD manifold $\mathcal{M}$. First, the input SPD matrices on the SPD manifold are embedded into a high-dimensional RKHS $\mathcal{H}$ with a predefined kernel function $k(\cdot, \cdot)$ and its implicit feature map $\phi(\cdot)$. Second, the data are further projected onto an $m$-dimensional linear subspace of $\mathcal{H}$, which is isomorphic to a vector space $\mathbb{R}^{m}$. Under the isomorphism assumption of the kernel mapping, the projection of $X_i$ is obtained using $y_i = W^{\top} k_i$, where $W \in \mathbb{R}^{N \times m}$ is the transformation matrix and $k_i = \left[k(X_1, X_i), \dots, k(X_N, X_i)\right]^{\top}$ is the $i$-th column of the kernel matrix $K$. Furthermore, we have $Y = W^{\top} K$. Third, the transformation matrix is learned from the training data. To this end, manifold learning methods are performed in the low-dimensional Euclidean space. Therefore, the learning methods reduce to the following optimization problem:

$$W^{*} = \arg\max_{W} F(W), \tag{1}$$

where $F(\cdot)$ is a cost function. In some methods, the problem is instead a minimization problem [45, 46]. Finally, clustering and classification tasks can be performed in the vector space.

In many methods, the cost function of equation (1) is designed such that the optimization problem can be solved by eigenvalue decomposition [8, 19, 45]. The objective functions of these methods can then be written in the generalized Rayleigh quotient form

$$W^{*} = \arg\max_{W} \frac{\operatorname{tr}\left(W^{\top} S_{1} W\right)}{\operatorname{tr}\left(W^{\top} S_{2} W\right)}, \tag{2}$$

where $S_{1}$ and $S_{2}$ are method-dependent matrices constructed from the kernel matrix.

The proposed HSIC regularization kernel framework is designed to improve the performance of these methods, e.g., RLPP, CDL, and KSLR, in the form of equation (2).

2.2. RLPP

RLPP exploits locality preserving projections for discriminative learning; it seeks the optimal transformation matrix $W$ by preserving the local geometry of the Riemannian manifold, which is encoded in a similarity matrix. Here, the binary matrix $S$ imposes a penalty if adjacent points from the same class are mapped far apart. The binary matrix is defined as follows:

$$S_{ij} = \begin{cases} 1, & \ell_{i} = \ell_{j}, \\ 0, & \text{otherwise}, \end{cases}$$

where $\ell_{i}$ and $\ell_{j}$ are the labels of $X_i$ and $X_j$.

The objective function is obtained from the following minimization problem:

$$W^{*} = \arg\min_{W} \operatorname{tr}\left(W^{\top} K L K W\right), \quad \text{s.t.} \; W^{\top} K D K W = I,$$

where $K$ is the kernel matrix, $L = D - S$ is the graph Laplacian, and $D$ is a diagonal matrix with $D_{ii} = \sum_{j} S_{ij}$.

Therefore, the minimization problem reduces to solving the generalized eigenvalue problem

$$K L K w = \lambda K D K w.$$

2.3. CDL

CDL uses the matrix logarithm operator to define the kernel function. Moreover, the objective function of CDL is given as follows:

$$W^{*} = \arg\max_{W} \operatorname{tr}\left(W^{\top} K G K W\right),$$

where $G$ is the connection matrix defined as

$$G_{ij} = \begin{cases} 1/N_{c}, & \ell_{i} = \ell_{j} = c, \\ 0, & \text{otherwise}, \end{cases}$$

with $N_{c}$ denoting the number of samples in class $c$.

Both RLPP and CDL are built on the traditional kernel framework, which means that the map from the input space to the projection space rests on a linear assumption. Therefore, this paper proposes the HSIC regularization kernel framework, which introduces the statistical correlation between SPD matrices and their low-dimensional representations. In the HSIC regularization kernel framework, the intrinsic connection between SPD matrices and low-dimensional projections is measured by HSIC. Under this framework, existing kernel-trick-based algorithms can be developed into new HSIC regularization kernel algorithms. As for the subspace learning components of these algorithms, both RLPP and CDL consider only local geometric features and ignore discriminative information. This paper therefore proposes HRGDA based on the HSIC regularization framework; HRGDA combines local geometry and label information to learn the transformation matrix.

3. Preliminaries

3.1. Geometry of SPD Manifolds

Let $\operatorname{Sym}(n)$ denote the space of real $n \times n$ symmetric matrices. A symmetric matrix $X$ is positive semidefinite if $v^{\top} X v \ge 0$ holds for all nonzero vectors $v$, which is denoted as $X \succeq 0$. Let $\lambda_{1}, \dots, \lambda_{n}$ be the eigenvalues of $X$; then all $\lambda_{i}$ are nonnegative. This property follows from the structure of the matrix $X$. If all eigenvalues of $X$ are positive, then $X$ is an SPD matrix; correspondingly, the inequality $v^{\top} X v > 0$ holds for any nonzero $v$, which is denoted as $X \succ 0$. The space of real $n \times n$ SPD matrices is denoted as $\operatorname{Sym}_{+}(n)$, and this space forms an SPD manifold when endowed with a Riemannian metric. Manifolds are Hausdorff topological spaces with a countable basis; every point has an open neighborhood that is locally homeomorphic to a $d$-dimensional vector space. For differentiable manifolds, all tangent vectors at a given point $P$ form the tangent space $T_{P}\mathcal{M}$ of that point. The inner product $\langle \cdot, \cdot \rangle_{P}$ on the tangent space is called the Riemannian metric. The norm of a tangent vector $v$ is derived from the inner product, i.e., $\|v\|_{P} = \sqrt{\langle v, v \rangle_{P}}$.

Let $P$ be a point on the manifold $\mathcal{M}$ and $v$ be a tangent vector at $P$. There exists exactly one geodesic $\gamma_{v}$ corresponding to the tangent vector $v$ with $\gamma_{v}(0) = P$. The geodesic connecting $P$ and $Q = \gamma_{v}(1)$ is transformed into a straight line in the tangent space, and the length of the geodesic equals the length of this line. The Riemannian distance between $P$ and $Q$ on the manifold is therefore obtained from the geodesic from $P$ to $Q$; this relation is illustrated in Figure 2.

Furthermore, the exponential map $\exp_{P}: T_{P}\mathcal{M} \to \mathcal{M}$ maps the tangent vector $v$ to the point $Q$, i.e., $Q = \exp_{P}(v)$. The inverse operation of $\exp_{P}$ is the logarithmic map, i.e., $v = \log_{P}(Q)$. Under AIRM, the exponential and logarithmic maps are given as follows:

$$\exp_{P}(v) = P^{1/2} \exp\left(P^{-1/2} v P^{-1/2}\right) P^{1/2}, \qquad \log_{P}(Q) = P^{1/2} \log\left(P^{-1/2} Q P^{-1/2}\right) P^{1/2},$$

where $\exp(\cdot)$ and $\log(\cdot)$ denote the matrix exponential and matrix logarithm.

In the symmetric matrix case, the matrix exponential and logarithm can be computed using the eigenvalue decomposition. A symmetric matrix $X$ can be decomposed as $X = U \Sigma U^{\top}$, where $U$ is orthogonal and $\Sigma = \operatorname{diag}(\lambda_{1}, \dots, \lambda_{n})$. Thus, $\exp(X)$ and $\log(X)$ can be, respectively, computed as follows:

$$\exp(X) = U \operatorname{diag}\left(e^{\lambda_{1}}, \dots, e^{\lambda_{n}}\right) U^{\top}, \qquad \log(X) = U \operatorname{diag}\left(\ln \lambda_{1}, \dots, \ln \lambda_{n}\right) U^{\top}.$$
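For concreteness, the following minimal NumPy sketch (the function names are ours) computes the matrix exponential and logarithm of a symmetric matrix via its eigenvalue decomposition, exactly as described above; these helpers are reused in the later sketches.

```python
import numpy as np

def sym_expm(X):
    """Matrix exponential of a symmetric matrix via eigendecomposition X = U diag(lam) U^T."""
    lam, U = np.linalg.eigh(X)
    return (U * np.exp(lam)) @ U.T   # U diag(exp(lam)) U^T

def sym_logm(X):
    """Matrix logarithm of an SPD matrix; requires all eigenvalues to be positive."""
    lam, U = np.linalg.eigh(X)
    return (U * np.log(lam)) @ U.T   # U diag(log(lam)) U^T
```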

3.2. Riemannian Metrics

The geodesic distance is the most common distance measure between two SPD matrices. The affine-invariant distance defined by AIRM [20] is obtained as follows:

$$d_{\mathrm{AIRM}}(X, Y) = \left\| \log\left(X^{-1/2} Y X^{-1/2}\right) \right\|_{F},$$

where $\|\cdot\|_{F}$ is the Frobenius norm. This metric inherits the desirable properties of an affine-invariant distance; however, it exhibits high time complexity in practical implementations.
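A minimal sketch of this distance, assuming the sym_logm helper defined above; the eigendecomposition-based inverse square root keeps everything real and symmetric:

```python
def airm_distance(X, Y):
    """Affine-invariant Riemannian distance between SPD matrices X and Y."""
    lam, U = np.linalg.eigh(X)
    X_inv_sqrt = (U / np.sqrt(lam)) @ U.T        # X^{-1/2}
    M = X_inv_sqrt @ Y @ X_inv_sqrt              # X^{-1/2} Y X^{-1/2}
    return np.linalg.norm(sym_logm(M), 'fro')    # ||log(.)||_F
```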

Another approach for measuring the geodesic distance is LERM [21]. The log-Euclidean distance is defined as

$$d_{\mathrm{LE}}(X, Y) = \left\| \log(X) - \log(Y) \right\|_{F}.$$

The log-Euclidean distance is close to the true geodesic distance and is easier to compute than AIRM. However, the computational cost can still be high for applications with numerous input matrices owing to the matrix logarithms.
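Since the matrix logarithms depend only on the individual matrices, they can be cached once per sample; a sketch using the sym_logm helper above:

```python
def log_euclidean_distance(X, Y):
    """Log-Euclidean distance ||log(X) - log(Y)||_F; log(X) and log(Y) can be precomputed."""
    return np.linalg.norm(sym_logm(X) - sym_logm(Y), 'fro')
```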

Driven by the need for simpler calculations, matrix divergences provide fast approximations of the geodesic distance. Maher et al. introduced the Jeffreys Kullback–Leibler divergence (JKLD) as

$$d_{\mathrm{JKLD}}(X, Y) = \sqrt{\frac{1}{2}\operatorname{tr}\left(X^{-1} Y + Y^{-1} X\right) - n},$$

where $\operatorname{tr}(\cdot)$ is the matrix trace. This measure is fast; however, it can overestimate the geodesic distance.

Recently, the JBLD has been discussed as a proxy for the Riemannian distance. The JBLD is defined as follows:

$$d_{\mathrm{JBLD}}(X, Y) = \log \det\left(\frac{X + Y}{2}\right) - \frac{1}{2} \log \det\left(X Y\right),$$

where $\det(\cdot)$ is the matrix determinant. This divergence is efficient because it does not require matrix logarithms.
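A sketch of the JBLD using numerically stable log-determinants (np.linalg.slogdet); the function name is ours:

```python
def jbld(X, Y):
    """Jensen-Bregman LogDet divergence between SPD matrices X and Y."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)   # log det((X+Y)/2) - 0.5 log det(XY)
```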

4. HSIC Regularization

4.1. HSIC Regularization Kernel Framework

HSIC [49] is used to characterize the statistical dependence between two random variables. Its theoretical basis is somewhat involved: the derivation of HSIC relies on Hilbert–Schmidt operators [50], the cross-covariance operator, mean functions [51], and functional analysis. In practice, however, HSIC can be estimated by an empirical HSIC, computed as

$$\mathrm{HSIC}(X, Y) = \frac{1}{(N-1)^{2}} \operatorname{tr}\left(K_{X} H K_{Y} H\right),$$

where $H = I - \frac{1}{N}\mathbf{1}\mathbf{1}^{\top}$ is the centering matrix and $K_{X}$ and $K_{Y}$ are the kernel matrices of $X$ and $Y$, respectively. We summarized the relevant theory of HSIC in [44]; please refer to our previously published work for the detailed derivation.
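A minimal sketch of this empirical estimator, assuming the two kernel matrices have already been computed:

```python
def empirical_hsic(K_x, K_y):
    """Empirical HSIC: tr(K_x H K_y H) / (N - 1)^2, with H the centering matrix."""
    N = K_x.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return np.trace(K_x @ H @ K_y @ H) / (N - 1) ** 2
```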

The HSIC of the SPD matrices and their low-dimensional representations is denoted by $\mathrm{HSIC}(X, Y)$. To facilitate computation, we select the linear kernel as the kernel function of $Y$, i.e., $k_{Y}(y_{i}, y_{j}) = y_{i}^{\top} y_{j}$. Then, the kernel matrix of $Y$ can be computed as

$$K_{Y} = Y^{\top} Y = K W W^{\top} K.$$

Thus, $\mathrm{HSIC}(X, Y)$ can be computed as

$$\mathrm{HSIC}(X, Y) = \frac{1}{(N-1)^{2}} \operatorname{tr}\left(K_{X} H K W W^{\top} K H\right). \tag{16}$$

Since the coefficient $1/(N-1)^{2}$ has no relation with $Y$, we ignore this coefficient in equation (16). We obtain

$$\mathrm{HSIC}(X, Y) \propto \operatorname{tr}\left(W^{\top} K H K_{X} H K W\right) = \operatorname{tr}\left(W^{\top} Q W\right), \tag{17}$$

where $Q = K H K_{X} H K$.
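Building the regularization matrix Q amounts to a few matrix products; a sketch (the function name is ours):

```python
def hsic_regularizer_matrix(K, K_x):
    """Q = K H K_x H K, so that the HSIC term equals tr(W^T Q W) up to a constant factor."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return K @ H @ K_x @ H @ K
```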

As described in Figure 1, to address the non-Euclidean geometry of SPD manifolds, the traditional kernel-based framework first embeds the input SPD matrices into a high-dimensional RKHS with a predefined kernel and then projects them into a low-dimensional linear subspace. The relationship between the low-dimensional projection and the SPD matrix thus rests on a linear assumption, under which the intrinsic connection between $X$ and $Y$ is ignored. The statistical correlation between the input data and the projected data should be taken into account during the transformation. Therefore, the HSIC of the SPD matrices and the low-dimensional representations is introduced herein to align the low-dimensional representations with the intrinsic features of the input data, where HSIC is typically used to measure the statistical dependence between two datasets. To make the proposed kernel framework applicable to traditional kernel-based methods, this paper proposes to add the HSIC regularization term to equation (1). Then, we have

$$W^{*} = \arg\max_{W}\; F(W) + \mu\, \mathrm{HSIC}(X, Y),$$

where $\mu$ is a regularization coefficient.

In summary, the objective function is formulated as

$$W^{*} = \arg\max_{W}\; F(W) + \mu \operatorname{tr}\left(W^{\top} Q W\right),$$

where $F(W)$ depends on the traditional method.
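When F(W) takes the generalized Rayleigh quotient form of equation (2), one common way to solve the regularized objective is to fold the HSIC term into the numerator and solve a generalized eigenvalue problem; the sketch below follows that route and reuses hsic_regularizer_matrix. The exact solver used in the paper is not reproduced here, so treat this as an assumption.

```python
from scipy.linalg import eigh

def solve_hsic_regularized(S1, S2, Q, mu, m):
    """Top-m generalized eigenvectors of (S1 + mu*Q) w = lambda * S2 w."""
    eps = 1e-8 * np.trace(S2) / S2.shape[0]          # small ridge for numerical stability
    evals, evecs = eigh(S1 + mu * Q, S2 + eps * np.eye(S2.shape[0]))
    order = np.argsort(evals)[::-1]                  # largest eigenvalues first
    return evecs[:, order[:m]]                       # columns form the transformation matrix W
```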

4.2. Kernel Pool

In the calculation of HSIC, the reproducing kernel of $Y$ is fixed to the linear kernel; however, the kernel function $k_{X}$ of the SPD matrices is open to choice. It can be selected from a kernel pool according to the practical application. To generate a valid Hilbert space, the kernel function must be symmetric positive definite [45]. Here, we discuss three alternative kernels.

4.2.1. Log-Linear Kernel

The log-linear kernel is generalized from the polynomial kernel in Euclidean space. It is defined as follows:

$$k_{\mathrm{ll}}(X, Y) = \operatorname{tr}\left(\log(X)\,\log(Y)\right).$$
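A sketch of this kernel using the sym_logm helper defined earlier:

```python
def log_linear_kernel(X, Y):
    """Log-linear kernel: tr(log(X) log(Y))."""
    return np.trace(sym_logm(X) @ sym_logm(Y))
```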

4.2.2. Log-Gaussian Kernel

The log-Gaussian kernel is expressed as

$$k_{\mathrm{lg}}(X, Y) = \exp\left(-\frac{\left\|\log(X) - \log(Y)\right\|_{F}^{2}}{2\sigma^{2}}\right).$$

It replaces the Euclidean distance in the popular Gaussian RBF kernel with the log-Euclidean distance between $X$ and $Y$.
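A corresponding sketch built on the log-Euclidean distance above (the bandwidth sigma is a free parameter):

```python
def log_gaussian_kernel(X, Y, sigma=1.0):
    """Log-Gaussian kernel: exp(-||log(X) - log(Y)||_F^2 / (2 sigma^2))."""
    d = log_euclidean_distance(X, Y)
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))
```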

4.2.3. LogDet Divergence Kernel

The LogDet divergence kernel is defined as

$$k_{\mathrm{ld}}(X, Y) = \exp\left(-\beta\, d_{\mathrm{JBLD}}(X, Y)\right), \tag{22}$$

where $\beta > 0$ is a kernel parameter.

The kernel is a conditionally positive definite kernel [47], and it is guaranteed to be an SPD kernel for the following values of $\beta$:

$$\beta \in \left\{\frac{1}{2}, 1, \frac{3}{2}, \dots, \frac{n-1}{2}\right\} \cup \left[\frac{n-1}{2}, \infty\right),$$

where $n$ is the size of the SPD matrices.
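A sketch combining equation (22) with the jbld helper above; the default beta = 0.5 is merely an illustrative choice from the admissible set:

```python
def logdet_divergence_kernel(X, Y, beta=0.5):
    """LogDet divergence kernel: exp(-beta * JBLD(X, Y))."""
    return np.exp(-beta * jbld(X, Y))
```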

4.3. HSIC Regularization Graph Discriminant Analysis

The HRGDA is proposed based on the HSIC regularization framework. This method uses the LogDet divergence kernel in equation (22) for embedding and a variant of kernel LDA for learning. In kernel-based methods on SPD manifolds, the log-linear and log-Gaussian kernels are commonly used for Hilbert space embedding; specifically, the log-linear kernel is used in CDL, and the log-Gaussian kernel is adopted in KPCA and KSLR. The LogDet divergence kernel used in HRGDA is an efficient alternative, since the computational complexity of JBLD is lower than that of LERM and AIRM. HRGDA seeks a transformation matrix that maximizes the between-class graph scatter while minimizing the within-class graph scatter. The graph discriminant analysis is defined as follows:

$$W^{*} = \arg\max_{W} \frac{\operatorname{tr}\left(W^{\top} S_{b} W\right)}{\operatorname{tr}\left(W^{\top} S_{w} W\right)},$$

where $W$ is the transformation matrix and $S_{b}$ and $S_{w}$ are the between-class and within-class graph matrices, respectively.

The within-class graph matrix is defined from the following function:

$$F_{w} = \frac{1}{2}\sum_{i,j} \left\| y_{i} - y_{j} \right\|^{2} A_{ij}. \tag{25}$$

The adjacency graph $A$ encodes the local geometry and is defined as

$$A_{ij} = \begin{cases} 1, & \ell_{i} = \ell_{j} \text{ and } X_{j} \in \mathcal{N}_{k}(X_{i}), \\ 0, & \text{otherwise}, \end{cases} \tag{26}$$

where $\mathcal{N}_{k}(X_{i})$ denotes the $k$ nearest neighbors of $X_{i}$.

By substituting equation (26) into equation (25), we obtain the following:

$$F_{w} = \operatorname{tr}\left(W^{\top} K \left(D - A\right) K W\right),$$

where the elements on the diagonal of $D$ are the row sums of $A$, i.e., $D_{ii} = \sum_{j} A_{ij}$. Thus, $S_{w}$ can be defined as

$$S_{w} = K \left(D - A\right) K.$$

The between-class scatter is defined from

$$F_{b} = \sum_{p=1}^{C} N_{p} \left\| \mu_{p} - \mu \right\|^{2},$$

where $C$ is the number of classes, $\mu_{p}$ is the center of the $p$-th class in the projected space, $N_{p}$ is the number of samples in the $p$-th class, and $\mu$ is the center of all projected samples.

Similarly, $S_{b}$ can be defined as

$$S_{b} = K \left(D_{b} - B\right) K,$$

where $D_{b}$ is a diagonal matrix with $(D_{b})_{ii} = \sum_{j} B_{ij}$ and $B_{ij} = 1/N - \delta(\ell_{i}, \ell_{j})/N_{\ell_{i}}$.

To sum up, the HRGDA can be formulated as

$$W^{*} = \arg\max_{W} \frac{\operatorname{tr}\left(W^{\top} S_{b} W\right) + \mu \operatorname{tr}\left(W^{\top} Q W\right)}{\operatorname{tr}\left(W^{\top} S_{w} W\right)}.$$

This optimization problem can be solved through generalized eigenvalue decomposition.
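The following sketch assembles the pieces above into an end-to-end HRGDA-style solver; the graph construction details (neighborhood size, between-class weighting) are our assumptions rather than the exact choices in the paper, and it reuses hsic_regularizer_matrix and solve_hsic_regularized from earlier.

```python
def hrgda(K, K_x, labels, mu, m, n_neighbors=5):
    """Sketch of HRGDA: graph discriminant matrices plus HSIC regularization."""
    N = K.shape[0]
    labels = np.asarray(labels)
    # Within-class adjacency: same label and among the k nearest neighbors in the RKHS.
    D2 = np.add.outer(np.diag(K), np.diag(K)) - 2 * K      # squared RKHS distances
    A = np.zeros((N, N))
    for i in range(N):
        for j in np.argsort(D2[i])[1:n_neighbors + 1]:
            if labels[i] == labels[j]:
                A[i, j] = A[j, i] = 1.0
    D = np.diag(A.sum(axis=1))
    S_w = K @ (D - A) @ K                                   # within-class graph matrix
    # Between-class weights: B_ij = 1/N - delta(l_i, l_j)/N_class.
    B = np.full((N, N), 1.0 / N)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        B[np.ix_(idx, idx)] -= 1.0 / len(idx)
    D_b = np.diag(B.sum(axis=1))
    S_b = K @ (D_b - B) @ K                                 # between-class graph matrix
    Q = hsic_regularizer_matrix(K, K_x)                     # HSIC regularization matrix
    return solve_hsic_regularized(S_b, S_w, Q, mu, m)       # N x m transformation matrix W
```

Given a test sample with kernel vector k_test computed against the training set, its low-dimensional representation is then W.T @ k_test.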

4.4. Computational Complexity

The time complexity of the HSIC regularization kernel framework comprises three main parts: (1) calculating the learning function, i.e., $F(W)$; (2) calculating the HSIC regularization term, i.e., $\operatorname{tr}(W^{\top} Q W)$; and (3) conducting the eigenvalue decomposition of the optimization problem.

The computational complexity of calculating the learning function is the same as in the traditional methods. Likewise, the cost of the eigenvalue decomposition, $O(N^{3})$, is unchanged. Thus, the proposed framework brings no additional calculations in parts (1) and (3). Part (2) is an additional calculation. As can be seen from equation (17), the computational complexity of part (2) is dominated by the calculation of the kernel matrix $K_{X}$. The kernel pool provides three useful kernel functions: the log-linear, log-Gaussian, and LogDet divergence kernels. Among the three, the LogDet divergence kernel has the lowest computational cost, because computing a matrix determinant is cheaper than computing a matrix logarithm.

5. Experiments

The effectiveness of the HSIC regularization framework is verified through experiments. Here, we consider four widely used datasets for visual recognition tasks, i.e., the QMUL [52], FERET [53], COIL-20 [54], and ETH80 [55] datasets.

5.1. Datasets and Settings

The QMUL dataset comprises a set of 20,005 images of human heads captured by cameras at an airport terminal. The dataset is split into different sets according to the direction of the head, namely, “back,” “front,” “left,” “right,” and “background.” Sample images are presented in Figure 3. The images are divided into training and testing sets beforehand. To compute the region covariance descriptor of each image, the per-pixel feature vector comprises the $L$, $a$, and $b$ values of the CIELAB color space, the first-order gradients $I_{x}$ and $I_{y}$ of the intensity, and the responses of eight difference-of-Gaussians filters. We randomly select 200 and 100 images of each class as training and testing data, respectively.
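As an illustration of how such a region covariance descriptor is formed from per-pixel feature maps (the feature stack passed in is up to the caller; the small diagonal loading is our addition to keep the matrix strictly positive definite):

```python
def region_covariance(feature_maps, eps=1e-6):
    """Region covariance descriptor from a list of same-sized 2D feature maps."""
    F = np.stack([np.asarray(f, dtype=float).ravel() for f in feature_maps], axis=0)
    C = np.cov(F)                                   # d x d covariance over all pixels
    return C + eps * np.trace(C) / F.shape[0] * np.eye(F.shape[0])
```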

We choose the “b” subset of the FERET dataset for the face recognition experiments. This subset contains 2,000 face images of 200 people, and the size of each image is 64 × 64 pixels. The training set comprises the images of the “ba,” “bc,” “bh,” and “bk” classes; the remaining images constitute the test data. The feature vector comprises the gray scale value $I(x, y)$ and the response values $G_{u,v}(x, y)$ of Gabor filters, where the values of $u$ and $v$ range over 0–4 and 0–7, respectively.

The COIL-20 dataset consists of 20 objects, each comprising 72 images with a size of 128 × 128 pixels. Sample images are presented in Figure 4. The gray scale and the first- and second-order gradients are used to compute the region covariance descriptor of each image. For each object, 10 images are randomly selected as training data, and the remaining images are used as testing data.

ETH80 is an object dataset including images of apples, pears, cars, and dogs, among other categories. Each category contains a total of 410 images from 10 object instances, and the size of the images is 128 × 128 pixels (Figure 5). For the region covariance descriptor, we extract the red, green, and blue color values $R$, $G$, and $B$, the gray scale $I$, and the first- and second-order gradients of the gray scale, $I_{x}$, $I_{y}$, $I_{xx}$, and $I_{yy}$. The dimensionality of the region covariance descriptor is 10 × 10. In each category, half of the instances are randomly selected for training and the remaining instances are used for testing, and 100 images are randomly selected for each instance.

5.2. Compared Methods

To verify the performance of HRGDA and the HSIC regularization framework, RLPP [45], KSLR [19], and CDL [8] are combined with the proposed HSIC regularization, denoted as RLPP-HR, KSLR-HR, and CDL-HR, respectively. The kernel function used to compute the HSIC regularization term is the log-Gaussian kernel. The HSIC regularization methods are compared with several recognition methods on SPD manifolds, namely, HSIC-SL [44], KPCA [46], RSR [47], TSC [2], Riem-DLSC [22], logEuc-SC [32], and KLRM-DL [48]. All parameters of the compared algorithms are set based on the recommendations provided in the corresponding literature. The kernel function in HSIC-SL is the log-Gaussian kernel. The recognition accuracies on the QMUL and FERET datasets are given in Tables 1 and 2, respectively.

5.3. Comparison of Kernels in HSIC Regularization

In this experiment, three kernel functions are used to compute the HSIC regularization, namely, the log-linear, log-Gaussian, and LogDet divergence kernels; the corresponding HSIC regularization variants are denoted as HR (log-linear), HR (log-Gaussian), and HR (LogDet divergence), respectively. Table 3 lists the recognition results of the different kernels on the COIL-20 and ETH80 datasets.

5.4. Discussion

The detailed classification results of all methods on the four image datasets (i.e., QMUL, FERET, COIL-20, and ETH80) are presented in Tables 1–3. The discussion is as follows:

(1) To show that the proposed HSIC regularization kernel framework improves the effectiveness of the traditional kernel framework, we compare the classification accuracy of the HSIC regularization kernel methods with that of the traditional kernel methods. RLPP-HR, CDL-HR, and KSLR-HR are the HSIC regularization kernel methods corresponding to RLPP, CDL, and KSLR, respectively. As shown in Tables 1 and 2, the classification accuracy of RLPP-HR is higher than that of RLPP; similarly, CDL-HR and KSLR-HR achieve better classification accuracy than CDL and KSLR, respectively. This conclusion is even more evident in Table 3: irrespective of the kernel function employed in HSIC, the classification accuracy of the HSIC regularization kernel methods is better than that of the traditional methods. These results indicate that HSIC regularization considerably improves the performance of traditional algorithms. The proposed HSIC regularization kernel framework is superior to the traditional kernel framework because it considers the intrinsic connection between SPD matrices and low-dimensional projections.

(2) Based on the proposed HSIC regularization kernel framework, this study presents a new method called HRGDA. As shown in Table 1, the recognition accuracy of HRGDA is the best on the QMUL dataset. Table 2 presents the performance of all methods on FERET. The recognition accuracy of HRGDA on “bd” and “bg” is higher than that of the other methods; however, in terms of average accuracy, the proposed HRGDA ranks second. The reason might be that FERET is a face recognition dataset containing subtle features, and HRGDA performs slightly worse when classifying datasets with subtle features. Table 3 gives the classification accuracy of all methods on the COIL-20 and ETH80 datasets. HRGDA is superior to the compared methods on COIL-20. On ETH80, the performance of HRGDA is slightly worse than that of KSLR-HR (log-linear). It is worth noting that KSLR-HR (log-linear) is itself a new method generated from the proposed HSIC regularization kernel framework; in other words, HRGDA is still better than the traditional methods on the ETH80 dataset. Among the four classification experiments, the classification accuracy of HRGDA is the best on the QMUL, COIL-20, and ETH80 datasets and the second best on FERET. In general, HRGDA is indeed an excellent algorithm on SPD manifolds.

(3) We compare the effectiveness of the kernel functions for computing HSIC in Table 3. The results show that the choice of kernel function affects the performance of HSIC regularization, and the performance of different kernels varies across datasets. However, irrespective of the kernel function employed in HSIC, the classification accuracy increases by 2–8% over the traditional methods. Users can compare the results of different kernels and select the most appropriate one.

Overall, we consider that the experimental results verify the proposed HSIC regularization framework as an effective framework for kernel methods on SPD manifolds, and that the proposed HRGDA is an accurate and valid learning method on SPD manifolds.

6. Conclusions

Herein, we propose an HSIC regularization kernel learning framework that improves the traditional kernel framework on SPD manifolds by introducing the HSIC of SPD matrices and low-dimensional projections. The traditional kernel framework neglects the connection between SPD matrices and their linear projections. To solve this problem, the proposed framework uses HSIC to measure the statistical correlation between SPD matrices and projections. The proposed framework can be applied to kernel methods on SPD manifolds of a specific form, such as RLPP, CDL, and KSLR. To broaden the applicable scenarios of this framework, we investigate different kernel functions for calculating the HSIC, i.e., the log-linear, log-Gaussian, and LogDet divergence kernels. Additionally, we propose the HRGDA method on the basis of the HSIC regularization kernel framework. Experiments demonstrate that HSIC regularization consistently improves the classification accuracy of the traditional algorithms and that the proposed HRGDA is an effective method on SPD manifolds. We believe that the findings of this work can contribute to the development of kernel methods on SPD manifolds.

However, several deficiencies remain in practical applications. The performance of HRGDA is slightly worse than that of some other methods when classifying subtle textures. In future work, we will develop additional kernel functions on SPD manifolds for this framework, because access to diverse kernel functions will increase the flexibility and applicability of HSIC regularization.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China through the project “Research on Nonlinear Alignment Algorithm of Local Coordinates in Manifold Learning” under grant no. 61773022, the Character and Innovation Project of Education Department of Guangdong Province under grant no. 2018GKTSCX081, the Young Innovative Talents Project of Education Department of Guangdong Province under grant no. 2020KQNCX191, the Guangzhou Science and Technology Plan Project of Bureau of Science and Technology of Guangzhou Municipality under grant no. 202102020700, and the Educational Big Data Enterprise Lab of Guangzhou Panyu Polytechnic under grant no. 2021XQS05.