Applications in Science and Engineering for Modelling, Analysis and Control of Chaos
Research Article, Open Access
Shangfang Li, "A Gaussian Process Latent Variable Model for Subspace Clustering", Complexity, vol. 2021, Article ID 8864981, 7 pages, 2021. https://doi.org/10.1155/2021/8864981
A Gaussian Process Latent Variable Model for Subspace Clustering
Abstract
Effective feature representation is the key to the success of machine learning applications. Recently, many feature learning models have been proposed. Among these models, the Gaussian process latent variable model (GPLVM) for nonlinear feature learning has received much attention because of its superior performance. However, most existing GPLVMs are designed for classification and regression tasks and thus cannot be used for data clustering. To address this issue and extend the application scope, this paper proposes a novel GPLVM for clustering (CGPLVM). Specifically, by combining the GPLVM with a subspace clustering method, our CGPLVM obtains latent variables that are more representative for clustering. Moreover, by introducing a back constraint into the model, it can directly predict the latent variables of new samples, which makes it more suitable for big data learning tasks such as the analysis of chaotic time series. In the experiments, we compare it with related GPLVMs and clustering algorithms. The experimental results show that the proposed model not only inherits the feature learning ability of the GPLVM but also achieves superior clustering accuracy.
1. Introduction
In machine learning tasks, data are often distributed in a high-dimensional space and contain many redundant features. Training machine learning models on such high-dimensional data may result not only in higher computational and storage complexity but also in overfitting [1]. Existing studies have shown that high-dimensional data are often embedded in a low-dimensional manifold. We can therefore use dimension reduction and feature learning methods to learn the low-dimensional manifold and obtain more representative features, improving both the accuracy and the efficiency of machine learning models. Thus, effective feature representation is the key to the success of machine learning applications.
In the past decade, many related methods have been proposed, such as dictionary learning [2], autoencoders [3], the Gaussian process latent variable model (GPLVM) [4], Isomap [5], and locally linear embedding [6]. Among these models, the GPLVM for nonlinear feature learning has received extensive attention because of its superior feature learning ability and has been used in many applications, such as dynamical systems [7] and the modelling and control of nonlinear systems [8]. Given only a few training samples, it can effectively learn the low-dimensional manifold that is embedded in the high-dimensional space and has therefore been widely used in dimension reduction and data visualization tasks [9, 10].
Despite these advantages, the conventional GPLVM is a fully unsupervised feature learning model and thus cannot meet the demands of real-world applications when dealing with specific machine learning tasks, such as the analysis of chaotic time series, dynamical systems [7], and the modelling and control of nonlinear systems, in which we also observe the response values of the inputs. How to modify the GPLVM and improve its performance is the key question of the related research. To date, the extensions of this model mainly focus on supervised and unsupervised learning methods [9, 11, 12]. The supervised extensions assume that, apart from the input features, we also observe the labels of the samples. With these extensions, the GPLVM can effectively use the supervised information to improve the classification accuracy of the learned latent variables. However, in real-world applications, we may also face unsupervised clustering tasks in which neither label information nor any other auxiliary information is available, which makes it more challenging to apply the GPLVM to clustering.
To address the abovementioned issues, this paper proposes a fusion model that combines the GPLVM with the subspace clustering model [13] to simultaneously obtain more representative features and accurate clustering results. Moreover, we also use the back constraint trick [14] in the model, which allows the model to predict new samples directly and makes it more suitable for big data learning tasks such as the analysis of chaotic time series. In the experiments, we verify the performance of the proposed model on multiple datasets. The experimental results show that our model achieves better clustering performance than the other related models.
2. Related Work
2.1. Gaussian Process Latent Variable Model
The GPLVM is a fully unsupervised, nonlinear latent variable model. In this model, given $N$ observed samples $Y = [y_1, \ldots, y_N]^T \in \mathbb{R}^{N \times D}$ (where $y_n$ denotes the $n$th training sample), our objective is to learn the corresponding low-dimensional latent variable $X = [x_1, \ldots, x_N]^T \in \mathbb{R}^{N \times Q}$. In this paper, we use $x_n$ ($x_n \in \mathbb{R}^{Q}$, $Q < D$) to denote the latent variable of $y_n$. The GPLVM thus realizes dimension reduction by learning the latent variables. Specifically, the GPLVM assumes that $y_n$ is generated as follows:

$$y_{nd} = f_d(x_n) + \epsilon_{nd}, \tag{1}$$

where $y_{nd}$ is the $d$th feature of the $n$th training sample, $\epsilon_{nd}$ is the noise term that follows a Gaussian distribution $\mathcal{N}(0, \sigma^2)$, and $f_d$ is a function that follows a Gaussian process prior. We use $F$ to denote the outputs of $f$ with inputs $X$. Thus, we have $p(F \mid X) = \prod_{d=1}^{D} \mathcal{N}(F_{:,d} \mid 0, K)$, where $K$ is the kernel matrix computed by applying the kernel function $k(\cdot, \cdot)$ to the latent variables in $X$. The element in row $i$ and column $j$ of $K$ is computed as $K_{ij} = k(x_i, x_j)$. By integrating out the intermediate variable $F$, we obtain the following marginal likelihood function:

$$p(Y \mid X, \theta) = \prod_{d=1}^{D} \mathcal{N}(Y_{:,d} \mid 0, K + \sigma^2 I), \tag{2}$$

where $\theta$ denotes the hyperparameters involved in the kernel function and noise distribution. In the model optimization process, the GPLVM learns the latent variable $X$ and the hyperparameters $\theta$ jointly by maximizing the above likelihood function and thereby obtains the low-dimensional representation.
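The marginal likelihood described above can be evaluated numerically. The following is a minimal NumPy sketch, assuming an RBF kernel and the standard GPLVM likelihood in which every feature column of the data shares the covariance $K + \sigma^2 I$. The function names and the default hyperparameter values are illustrative and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, variance=1.0, lengthscale=1.0):
    """K[i, j] = variance * exp(-||x_i - x_j||^2 / (2 * lengthscale^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gplvm_neg_log_likelihood(X, Y, variance=1.0, lengthscale=1.0, noise_var=0.1):
    """Negative log marginal likelihood -log p(Y | X, theta)."""
    N, D = Y.shape
    Ky = rbf_kernel(X, variance, lengthscale) + noise_var * np.eye(N)
    L = np.linalg.cholesky(Ky)                           # Ky = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # Ky^{-1} Y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # log |Ky|
    data_fit = 0.5 * np.sum(Y * alpha)                   # 0.5 * tr(Y^T Ky^{-1} Y)
    return data_fit + 0.5 * D * log_det + 0.5 * N * D * np.log(2.0 * np.pi)

# Toy data: 20 samples with 5 observed features and a 2-dimensional latent space.
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 5))
X = rng.normal(size=(20, 2))
nll = gplvm_neg_log_likelihood(X, Y)
```

In a full GPLVM, this quantity would be minimized with respect to both `X` and the kernel hyperparameters using a gradient-based optimizer; the sketch only evaluates it at a fixed point.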
As this generation process shows, the GPLVM is a fully unsupervised dimension reduction model that cannot embed auxiliary information when dealing with specific machine learning tasks and thus cannot meet the demands of real-world applications. For example, in the analysis of chaotic time series, data observed at similar times have similar features. If the GPLVM could use this knowledge, it would learn more representative features for the task and improve the prediction accuracy. The existing extensions of the GPLVM mainly focus on embedding supervised information to improve its classification and regression accuracy, for example, the discriminative GPLVM (DGPLVM) and the supervised GPLVM (SGPLVM). Extensions to the clustering task are much fewer. The existing unsupervised GPLVMs focus only on how to preserve local distances and learn better latent variables or features. For example, the local preserving projection GPLVM (LPPGPLVM) combines the objective of local preserving projection with that of the GPLVM, thus simultaneously learning the low-dimensional representation and preserving the local structures [15]. The GPLVM with back constraints (BGPLVM) introduces a back constraint (a mapping from the observed space to the latent space) into the GPLVM. In this way, it can also preserve local distances.
2.2. Subspace Clustering
The goal of subspace clustering is to segment a set of data samples into different subspaces so that similar samples lie in the same subspace, while dissimilar samples lie in different subspaces. Over the past decade, subspace clustering has been used in various clustering tasks, and many well-designed algorithms have been proposed, such as Gaussian mixture model (GMM) based methods [16, 17], matrix factorization (MF) based methods [18, 19], algebra-based methods [20], and spectral clustering methods [13, 21, 22]. Among these models, the subspace clustering method based on spectral clustering has been widely applied because of its concise implementation and reliable performance. It uses a low-rank representation to construct the affinity matrix for spectral clustering. Its objective is to find the low-rank representation of the input data by optimizing the following function:

$$\min_{Z} \operatorname{rank}(Z) \quad \text{s.t.} \quad Y = ZY, \tag{3}$$

where $Z \in \mathbb{R}^{N \times N}$ is the self-representation matrix and we assume that each sample can be expressed as a linear combination of the other samples. The low-rank penalty term can be considered as a global constraint on the subspace structure of the samples and makes similar samples have similar weights. In general, we can use the nuclear norm to replace the penalty term:

$$\min_{Z} \|Z\|_{*} \quad \text{s.t.} \quad Y = ZY, \tag{4}$$

where the nuclear norm $\|Z\|_{*}$ (the sum of the singular values of $Z$) approximates the rank of $Z$. Considering that the data often contain noise, the exact constraint $Y = ZY$ is relaxed in practice, and the self-representation matrix $Z$ is learned by additionally penalizing the reconstruction error $Y - ZY$.
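In low-rank subspace clustering, we first construct the affinity matrix $W$ and the Laplace matrix $L$ and then use spectral clustering to cluster the data. $W$ and $L$ can be constructed as follows:

$$W = \frac{|Z| + |Z^T|}{2}, \qquad L = D - W, \tag{6}$$

where $D$ denotes a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$. After obtaining the Laplace matrix, we can optimize the following objective function to obtain the latent variable $X$:

$$\min_{X} \operatorname{tr}(X^T L X) \quad \text{s.t.} \quad X^T X = I. \tag{7}$$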
The solution $X$ is composed of the eigenvectors of $L$ corresponding to its $k$ smallest eigenvalues, where $k$ is the dimension of the latent variable. Finally, we run the k-means algorithm on the learned $X$ to obtain the clustering result.
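The spectral clustering pipeline described in this section can be sketched as follows. This is a minimal NumPy illustration, assuming that a self-representation matrix `Z` is already available and that the affinity is the symmetrized absolute value of `Z`. The tiny k-means routine with farthest-point initialization is a simplification chosen to keep the example deterministic; it is not the procedure used in the paper, and all function names are hypothetical.

```python
import numpy as np

def affinity_and_laplacian(Z):
    """Symmetric affinity W = (|Z| + |Z^T|) / 2 and Laplacian L = D - W."""
    W = 0.5 * (np.abs(Z) + np.abs(Z.T))
    D = np.diag(W.sum(axis=1))
    return W, D - W

def spectral_embedding(L, k):
    """Eigenvectors of L belonging to its k smallest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvecs[:, :k]

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy self-representation matrix with two blocks (samples 0-2 and samples 3-6).
Z = np.zeros((7, 7))
Z[:3, :3] = 1.0
Z[3:, 3:] = 1.0
W, L = affinity_and_laplacian(Z)
embedding = spectral_embedding(L, k=2)
labels = kmeans(embedding, k=2)
```

On this block-diagonal example, the two groups of samples receive different labels, because the eigenvectors for the zero eigenvalues of `L` are constant within each connected block.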
3. Model Construction and Optimization
3.1. Designing of the Gaussian Process Latent Variable Clustering Model
Assuming that there are $N$ observed samples denoted as $Y$, our goal is to learn the low-dimensional latent variable $X$ corresponding to these observed variables such that the latent variable has better clustering performance (i.e., such that common clustering algorithms obtain accurate clustering results on the learned $X$).
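To achieve this goal, we assume that the latent variable has the following prior distribution:

$$p(X) = \frac{1}{C} \exp\left(-E(X)\right), \tag{8}$$

where $C$ is a constant that makes $\int p(X)\, dX = 1$ and $E(X)$ has the following form:

$$E(X) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} W_{ij} \|x_i - x_j\|^2, \tag{9}$$

where $W_{ij}$ is the element in row $i$ and column $j$ of the affinity matrix $W$. Equation (9) can be written as follows:

$$E(X) = \operatorname{tr}(X^T L X), \tag{10}$$

where $L = D - W$ is the Laplace matrix of $W$ and $D$ is the diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$.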
In this paper, we assume that the generation of the observed variables from the latent variable is described by the conditional distribution $p(Y \mid X)$. Thus, by Bayes' rule, we obtain the posterior distribution of the latent variable as

$$p(X \mid Y) = \frac{p(Y \mid X)\, p(X)}{p(Y)}. \tag{11}$$
Since $p(Y)$ is a constant, we can obtain the optimal latent variable by maximizing the following joint distribution:

$$p(Y, X) = p(Y \mid X)\, p(X). \tag{12}$$
To introduce the GPLVM into this model, we assume that $Y$ is generated by a latent function $f$ that follows a Gaussian process prior. Thus, equation (12) can be written as

$$p(Y, X \mid \theta, \sigma^2) = p(Y \mid X, \theta, \sigma^2)\, p(X), \tag{13}$$

where $\theta$ denotes the hyperparameter that is involved in the kernel function and $\sigma^2$ denotes the variance of the Gaussian noise distribution.
Through the above modelling process, the GPLVM can effectively embed sample similarity information when learning the latent variable, thus improving the clustering ability of the latent variable. However, how to learn the affinity matrix $W$ remains an open problem for this paper and for related approaches such as self-representation learning and subspace clustering. In this paper, we borrow the idea of low-rank self-representation learning and introduce the following low-rank subspace constraints into the model:
It is worth noting that, in this paper, we assume that $W = Z$, i.e., we directly use the self-representation matrix $Z$ as the affinity matrix. This setting is the same as that of [23], and its role is similar to that of the affinity matrix in the original subspace clustering. The CGPLVM is very similar to the LPPGPLVM. However, in the LPPGPLVM, the Laplace matrix is fixed, whereas in our CGPLVM the Laplace matrix is learned during training. Thus, our CGPLVM achieves better performance than the LPPGPLVM.
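The graph regularization on the latent variable discussed above rests on a standard identity: for a symmetric affinity matrix $W$ with Laplace matrix $L = D - W$, the weighted sum of pairwise squared distances $\frac{1}{2}\sum_{ij} W_{ij}\|x_i - x_j\|^2$ equals $\operatorname{tr}(X^T L X)$. The following NumPy sketch checks this numerically. The symmetrization step and the scaling factor of one half are assumptions made for this illustration.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W with D_ii = sum_j W_ij."""
    return np.diag(W.sum(axis=1)) - W

def pairwise_energy(X, W):
    """0.5 * sum_ij W_ij * ||x_i - x_j||^2, computed from pairwise distances."""
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return 0.5 * np.sum(W * sq_dists)

rng = np.random.default_rng(0)
Z = rng.random((6, 6))        # stand-in for a learned self-representation matrix
W = 0.5 * (Z + Z.T)           # symmetrize; the identity requires W = W^T
X = rng.normal(size=(6, 2))   # latent variables, one row per sample

L = graph_laplacian(W)
energy_pairwise = pairwise_energy(X, W)
energy_trace = np.trace(X.T @ L @ X)
```

Both quantities agree up to floating-point error, which is why a penalty on pairwise latent distances can be implemented as a single trace expression that is differentiable in both `X` and `W`.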
One important limitation of the GPLVM and of self-representation learning is that they cannot effectively predict new samples. To mitigate this problem, we introduce a back constraint into the proposed model. Thus, given a new sample, the model can predict the corresponding low-dimensional latent variable using the constraint function. Specifically, given an observed sample $y$, we assume that we can use a function $g$ to obtain the latent variable $x$:

$$x = g(y; w), \tag{15}$$

where $g$ is a neural network function with learnable parameter $w$. At last, we obtain the objective of the proposed model as follows:
The whole model structure is shown in Figure 1.
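The back constraint can be illustrated with a small feed-forward network that maps an observed sample to its latent variable. The paper only states that the mapping is a neural network with learnable parameters, so the architecture below (one hidden layer with a tanh activation), the layer sizes, and all names are assumptions made for this sketch.

```python
import numpy as np

def init_mlp(d_in, d_hidden, d_latent, seed=0):
    """Randomly initialize the parameters of a one-hidden-layer network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(scale=0.1, size=(d_hidden, d_latent)),
        "b2": np.zeros(d_latent),
    }

def back_constraint(Y, params):
    """Map observed samples (rows of Y) to latent variables."""
    H = np.tanh(Y @ params["W1"] + params["b1"])
    return H @ params["W2"] + params["b2"]

params = init_mlp(d_in=10, d_hidden=16, d_latent=3)

# During training, the latent variables are a function of the data and the
# network parameters rather than free variables.
Y_train = np.random.default_rng(1).normal(size=(50, 10))
X_train = back_constraint(Y_train, params)

# A new sample is embedded with a single forward pass, without re-optimization.
y_new = np.random.default_rng(2).normal(size=(1, 10))
x_new = back_constraint(y_new, params)
```

With this parameterization, the model objective would be optimized over the network parameters instead of over the latent variables directly, which is what makes out-of-sample prediction possible.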
4. Model Optimization
In order to optimize (16), we transform it into the following optimization problem:where , , and are regularization terms. By this formulation, we can use the alternating iterative optimization method to learn all the parameters. First, we fix and write (17) as
This problem can be solved effectively by using gradientbased methods, and its gradient with respect to can be computed as
The gradients with respect to and are similar to the above formulation. For the sake of brevity, we have omitted their derivation processes. We then can fix , and and write (17) as
The gradient of the first term with respect to can be computed aswhere denotes the row of matrix and denotes the vector whose element is 1. The gradient of the second term can be computed as
The subgradient of the third term is
With the above derivations, we can learn the whole model, as described in Algorithm 1. The main computational cost of the CGPLVM is the inversion of the kernel matrix, which has a complexity of $O(N^3)$, where $N$ is the number of training samples. The main storage cost is the kernel matrix, which has a complexity of $O(N^2)$. Thus, both the computation and storage complexities are the same as those of the conventional GPLVM.
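Among the terms typically found in a low-rank objective, the nuclear norm is the one that is not differentiable everywhere, which is why a subgradient is needed. Assuming that the term handled by a subgradient in the derivation above is the nuclear norm of the self-representation matrix, a valid subgradient is $U V^T$, where $U$ and $V$ contain the singular vectors belonging to the nonzero singular values. The NumPy sketch below computes it and checks the subgradient inequality; the function names are illustrative.

```python
import numpy as np

def nuclear_norm(Z):
    """Sum of the singular values of Z."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def nuclear_norm_subgradient(Z, tol=1e-10):
    """One element of the subdifferential of ||Z||_*: U_r V_r^T."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(s > tol))  # numerical rank
    return U[:, :r] @ Vt[:r, :]

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
B = rng.normal(size=(5, 5))
G = nuclear_norm_subgradient(A)

# Subgradient inequality: ||B||_* >= ||A||_* + <G, B - A> for every B.
gap = nuclear_norm(B) - nuclear_norm(A) - np.sum(G * (B - A))
```

A subgradient step on the self-representation matrix would subtract a multiple of `G` plus the gradients of the smooth terms; proximal methods based on singular value thresholding are a common alternative.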

5. Experiments and Analysis
5.1. Experimental Setup
To verify the effectiveness of CGPLVM, we use 8 datasets in the experiments. The detailed information of these datasets is given in Table 1.

The YEAST is a dataset for the prediction of protein localization sites. The USPS is a digits dataset that was gathered at the Center of Excellence in Document Analysis and Recognition at SUNY Buffalo, as part of a project sponsored by the US Postal Service. YALE, JAFFE, and ORL are three face recognition datasets, as shown in Figure 2. TR11, TR41, and TR45 are three textual datasets.
Figure 2: The three face recognition datasets, panels (a), (b), and (c).
To verify the advantage of the CGPLVM, we compare it with the related Gaussian process latent variable models (i.e., GPLVM, BGPLVM, and LPPGPLVM) and with clustering methods, namely spectral clustering (SC) [24], kernel spectral clustering (KSC) [25], and simplex sparse representation learning (SSR) [21]. All the kernel-based models (GPLVM, LPPGPLVM, KSC, and CGPLVM) use the radial basis function (RBF) as the kernel function. It is worth noting that other kernel functions, such as the linear kernel, the Laplacian kernel, and the circular kernel, can also be used in the proposed model. Furthermore, all the hyperparameters of these kernel functions can be learned in the same way as described in this paper. In the experiments, the three hyperparameters of the proposed model are chosen from a fixed candidate set. The hyperparameters of the other models are set to the values used in the original papers. We use the Gaussian process toolkit GPflow to implement the GP-based models. The other models are all implemented in Python. All the algorithms are tested on a Windows computer with an i7 9700 CPU and 16 GB of RAM.
5.2. Clustering Results and Analysis
In the experiments, we use clustering accuracy, purity, and normalized mutual information (NMI) as the clustering measures. At the clustering stage, the latent variables learned by the different methods are used as inputs, and the k-means algorithm is used to obtain the final clustering results. The latent dimension and the number of clusters are both set to the number of classes. To mitigate the sensitivity of k-means to its initial values, we run k-means 20 times with random initialization and report the mean and standard deviation over these 20 runs. The experimental results are shown in Tables 2–4, where the best results are given in bold.
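The three measures can be computed from the contingency table between the true classes and the predicted clusters. The sketch below is a minimal implementation, assuming that accuracy is obtained by the best one-to-one matching between clusters and classes (Hungarian algorithm via `scipy.optimize.linear_sum_assignment`) and that NMI is normalized by the geometric mean of the two entropies. The paper does not state which normalization it uses, so that choice is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def contingency(y_true, y_pred):
    """Count matrix C[i, j] = number of samples in class i and cluster j."""
    _, ti = np.unique(y_true, return_inverse=True)
    _, pi = np.unique(y_pred, return_inverse=True)
    C = np.zeros((ti.max() + 1, pi.max() + 1), dtype=np.int64)
    np.add.at(C, (ti, pi), 1)
    return C

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    C = contingency(y_true, y_pred)
    rows, cols = linear_sum_assignment(-C)  # negate to maximize matches
    return C[rows, cols].sum() / C.sum()

def purity(y_true, y_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    C = contingency(y_true, y_pred)
    return C.max(axis=0).sum() / C.sum()

def nmi(y_true, y_pred):
    """Mutual information divided by the geometric mean of the entropies."""
    P = contingency(y_true, y_pred).astype(float)
    P /= P.sum()
    pt, pp = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pt, pp)[nz]))
    ht = -np.sum(pt[pt > 0] * np.log(pt[pt > 0]))
    hp = -np.sum(pp[pp > 0] * np.log(pp[pp > 0]))
    return mi / np.sqrt(ht * hp) if ht > 0 and hp > 0 else 0.0

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([1, 1, 0, 0, 0, 0])  # cluster ids are arbitrary
acc = clustering_accuracy(y_true, y_pred)
pur = purity(y_true, y_pred)
score = nmi(y_true, y_pred)
```

In the toy example, five of the six samples are consistent with the best cluster-to-class matching, so both accuracy and purity equal 5/6. To reproduce the reporting protocol, these functions would be applied to each of the 20 k-means runs and the results averaged.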



From Tables 2–4, we can observe that the GPLVM, as an unsupervised dimension reduction model, usually obtains latent variables with poor clustering performance. The BGPLVM and LPPGPLVM preserve the local distances of samples during feature learning and thus obtain more representative latent variables. Meanwhile, the LPPGPLVM obtains much better results than the BGPLVM, which indicates that graph Laplace regularization is more suitable for clustering than back constraints. In general, spectral clustering and subspace clustering methods perform better than the GPLVM-based baselines: SC, KSC, and SSR outperform GPLVM, BGPLVM, and LPPGPLVM. The proposed CGPLVM combines subspace clustering with the GPLVM and thus effectively improves the clustering performance of the GPLVM. As shown in the experimental results, the CGPLVM achieves better clustering results than the other related models in most cases.
6. Conclusion and Future Work
This paper proposes a joint model that combines low-rank subspace clustering with the back-constrained GPLVM to address the poor clustering performance of the conventional GPLVM. The proposed CGPLVM not only obtains low-dimensional latent variables but also directly predicts the latent variables of new samples, thus extending the application scope of the GPLVM to tasks such as the analysis of chaotic time series. The experimental results show that the CGPLVM has better latent variable learning ability and superior clustering performance. In future work, we will further extend the CGPLVM to make it suitable for much larger datasets and for supervised tasks such as classification and regression, improving its efficiency and application scope.
Data Availability
The experimental data used to support the findings of this study have been deposited in the UCI repository (https://archive.ics.uci.edu/ml/index.php).
Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the Project of Yulin Normal University Research under Grant 2015YJYB03.
References
[1] X. Li and L. Shu, “Kernel based nonlinear dimensionality reduction for microarray gene expression data analysis,” Expert Systems with Applications, vol. 36, no. 4, pp. 7644–7650, 2009.
[2] K. Kreutz-Delgado, B. D. Murray, K. Engan, T.-W. Lee, and T. J. Sejnowski, “Dictionary learning algorithms for sparse representation,” Neural Computation, vol. 15, no. 2, pp. 349–396, 2003.
[3] L. Fengfu, Q. Hong, and Z. Bo, “Discriminatively boosted image clustering with fully convolutional autoencoders,” Pattern Recognition, vol. 83, pp. 161–173, 2018.
[4] N. D. Lawrence, “Probabilistic nonlinear principal component analysis with Gaussian process latent variable models,” Journal of Machine Learning Research, vol. 6, pp. 1783–1816, 2005.
[5] J. B. Tenenbaum, V. De Silva, J. Langford et al., “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[6] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
[7] J. M. Wang, A. Hertzmann, D. M. Blei et al., “Gaussian process dynamical models,” in Proceedings of the 18th International Conference on Neural Information Processing Systems, pp. 1441–1448, Vancouver, Canada, December 2005.
[8] J. Hall, C. E. Rasmussen, J. M. Maciejowski et al., “Modelling and control of nonlinear systems using Gaussian processes with partial model information,” in Proceedings of the Conference on Decision and Control, pp. 5266–5271, Orlando, FL, USA, December 2012.
[9] L. Cai, L. Huang, C. Liu et al., “Age estimation based on improved discriminative Gaussian process latent variable model,” Multimedia Tools and Applications, vol. 75, no. 19, pp. 11977–11994, 2016.
[10] C. Lu and X. Tang, “Surpassing human-level face verification performance on LFW with GaussianFace,” in Proceedings of the Computer Vision and Pattern Recognition, Columbus, OH, USA, June 2014.
[11] X. Gao, X. Wang, D. Tao et al., “Supervised Gaussian process latent variable model for dimensionality reduction,” Systems Man and Cybernetics, vol. 41, no. 2, pp. 425–434, 2011.
[12] S. Eleftheriadis, O. Rudovic, M. Pantic et al., “Discriminative shared Gaussian processes for multi-view and view-invariant facial expression recognition,” IEEE Transactions on Image Processing, vol. 24, no. 1, pp. 189–204, 2015.
[13] G. Liu, Z. Lin, Y. Yu et al., “Robust subspace segmentation by low-rank representation,” in Proceedings of the International Conference on Machine Learning, pp. 663–670, Haifa, Israel, June 2010.
[14] N. D. Lawrence and J. Quiñonero-Candela, “Local distance preservation in the GPLVM through back constraints,” in Proceedings of the International Conference on Machine Learning, pp. 513–520, New York, NY, USA, June 2006.
[15] W. Xiu, “A latent variable model based on local preservation,” Pattern Recognition and Artificial Intelligence, vol. 23, no. 3, pp. 369–375, 2010.
[16] P. S. Bradley and O. L. Mangasarian, “K-plane clustering,” Journal of Global Optimization, vol. 16, no. 1, pp. 23–32, 2000.
[17] P. Tseng, “Nearest q-flat to m points,” Journal of Optimization Theory and Applications, vol. 105, no. 1, pp. 249–252, 2000.
[18] J. P. Costeira and T. Kanade, “A multibody factorization method for independently moving objects,” International Journal of Computer Vision, vol. 29, no. 3, pp. 159–179, 1998.
[19] C. W. Gear, “Multibody grouping from motion images,” International Journal of Computer Vision, vol. 29, no. 2, pp. 133–150, 1998.
[20] R. Vidal, Y. Yi Ma, and S. Sastry, “Generalized principal component analysis (GPCA),” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1945–1959, 2005.
[21] J. Huang, F. Nie, H. Huang et al., “A new simplex sparse learning model to measure data similarity for clustering,” in Proceedings of the International Conference on Artificial Intelligence, pp. 3569–3575, New Delhi, India, February 2015.
[22] S. Luo, C. Zhang, W. Zhang et al., “Consistent and specific multi-view subspace clustering,” in Proceedings of the National Conference on Artificial Intelligence, pp. 3730–3737, Vancouver, Canada, February 2018.
[23] Z. Kang, H. Xu, B. Wang, H. Zhu, and Z. Xu, “Clustering with similarity preserving,” Neurocomputing, vol. 365, pp. 211–218, 2019.
[24] U. Von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.
[25] C. Alzate and J. A. K. Suykens, “Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA,” IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 32, no. 2, pp. 335–347, 2009.
Copyright
Copyright © 2021 Shangfang Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.