The Simplified Expression of Machine Learning and Multivariate Statistical Analysis Based on the Centering Matrix

Xue, Zhen; Zhang, Liangliang

doi:https://doi.org/10.1155/2021/5545061

Mathematical Problems in Engineering

Research Article | Open Access

Volume 2021 | Article ID 5545061 | https://doi.org/10.1155/2021/5545061

Zhen Xue¹ and Liangliang Zhang¹

Academic Editor: Maria Patrizia Pera

Received 31 Jan 2021

Revised 24 Mar 2021

Accepted 26 Mar 2021

Published 28 Apr 2021

Abstract

In machine learning (ML) algorithms and multivariate statistical analysis (MVA) problems, it is usually necessary to center the original data (zero-centering, i.e., mean subtraction). The centering matrix plays an important role in this process. Full consideration and use of its properties may contribute to improving the speed or stability of some related ML algorithms. Therefore, in this paper, we discuss in detail the properties of the centering matrix, prove some previously known properties, and deduce some new properties. The properties involved mainly concern centering, the quadratic form, spectral decomposition, the null space, projection, exchangeability, the Kronecker square, and so on. Based on this, we explore how the centering matrix simplifies expressions in principal component analysis (PCA) and regression analysis theory. The results show that the sum of squared deviations, which is widely used in regression theory and variance analysis, can be expressed by a quadratic form with the centering matrix as the kernel matrix. ML algorithms that introduce the centering matrix can greatly simplify the learning process and improve predictive ability.

1. Introduction

In regression problems of machine learning (ML) and in training neural networks, we usually need to center the original data (zero-centering, i.e., mean subtraction). The centering matrix achieves this goal. It is a symmetric idempotent matrix and has been successfully applied in various fields, such as transfer learning [1, 2], feature learning [3], extreme learning [4], ensemble learning [5–9], dictionary learning [10, 11], and multivariate statistical analysis (MVA) [12–16]. The practical applications cover a wide range of areas, e.g., data-driven fault diagnosis and prognosis [3, 17, 18], sentiment analysis [7, 8], web page classification [9], information retrieval [4, 19, 20], image denoising [10], signal processing of electronics [11], theoretical chemistry and graph theory [21], and rank data analysis [22]. For instance, Chen S. Z. et al. [4] proposed a flexible ranking extreme learning machine (ELM) method based on matrix-centering transformation to replace the traditional graph Laplacian matrix-based methods. In [19, 20], T. Pahikkala et al. presented two kinds of ranking algorithm for information retrieval by introducing the centering matrix to construct a block diagonal matrix as the weight matrix and finding the linear solution of Rank Regularized Least Squares (RankRLS). The application of the centering matrix in the optimization problem of transfer learning is discussed in [1, 2]. Long M. S. et al. [1] established an optimization problem and then proposed a feature transformation by adopting the centering matrix. In a recent study, the covariance matrices of the source and target features were expressed by the centering matrix [3]. Wang Q. and Zhang L. investigated the online updating of the generalized inverse of the centered data matrix in [5, 6]. In [23], Liu L. P. et al. pointed out that the least squares solution to linear discriminant analysis (LDA) [24] is the product of the pseudoinverse of the centered data matrix and the indicator matrix, which allows the solution to be updated without eigenanalysis. In another study, Sun L. et al. [25] investigated the relationship between the generalized eigenvalue problem and the least squares problem by using the centering matrix. In [26], the application of the centering matrix in the dimension reduction algorithm of principal component analysis (PCA) [27, 28] is discussed. The application of the centering matrix in statistical analysis is addressed in [12–16]. In [21], Gutman I. and Xiao W. examined the generalized inverse of the Laplacian matrix of a connected graph. One of their conclusions was that the Laplacian matrix commutes with its generalized inverse and that their product is the centering matrix. Recently, a MATLAB toolbox, the data-based key-performance-indicator oriented fault detection toolbox (DB-KIT), was developed for prognosis and fault diagnosis of complex systems in [17], and canonical variate dissimilarity analysis was proposed for process incipient fault detection in [18]. Both studies, in which the centering matrix plays an important role, greatly promoted the development of data-driven fault diagnosis and prognosis in modern industry.

With regard to the centering matrix, fuller consideration and use of its properties and of its ability to simplify matrix notation may contribute to improving the speed or stability of some related ML algorithms. For this purpose, in this paper, we first discuss in detail and establish some properties of the centering matrix and draw a mind map to clearly illustrate the relationship between these properties and their potential role in ML and MVA. Then, we explore how the centering matrix simplifies the theory of ML and MVA through two examples. The results can help in proposing, or speeding up, related ML algorithms for regression problems or neural networks.

The centering matrix is defined as follows [13].

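Definition 1. $C_n = I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{T} = I_n - \frac{1}{n}J_n$ is called the centering matrix of order n, where $I_n$ is the identity matrix of order n, $\mathbf{1}_n$ is the column vector of n ones, $J_n$ is a square matrix of order n with all elements unity, and $J_n = \mathbf{1}_n\mathbf{1}_n^{T}$ holds.
For example,
$$C_3 = \begin{pmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{pmatrix}$$
is a centering matrix of order 3.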
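As a quick numerical illustration of Definition 1, the following minimal sketch builds the centering matrix with NumPy. The function name `centering_matrix` is a hypothetical helper introduced here for illustration only; it is not part of the original paper.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Return the centering matrix of order n: I_n - (1/n) * 1_n 1_n^T."""
    return np.eye(n) - np.ones((n, n)) / n

# Order-3 example: diagonal entries are 2/3, off-diagonal entries are -1/3.
C3 = centering_matrix(3)
print(C3)
```

The matrix is dense, so for large n it is usually cheaper to subtract means directly than to form it explicitly; its value here is mainly as a notational device.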
The rest of this paper is organized as follows. Section 2 discusses the properties of the centering matrix. Section 3 presents the main application of the centering matrix in ML and MVA. Finally, we conclude this paper in Section 4.

2. Properties of the Centering Matrix

It can be proved that the centering matrix has the following important properties. In what follows, $C_n$ will always represent the centering matrix of order n, unless otherwise stated.

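Lemma 1 (centering property, see [13], and we prove it). For any data vector $x = (x_1, x_2, \ldots, x_n)^{T}$ of order n, left multiplication by the centering matrix $C_n$, denoted by $C_n x$, has the same effect as subtracting the mean of the elements of $x$ from each element, i.e., $C_n x = x - \bar{x}\mathbf{1}_n$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\mathbf{1}_n^{T}x$.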

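Proof. By the definition of the centering matrix, $C_n x = \left(I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^{T}\right)x = x - \frac{1}{n}\mathbf{1}_n\left(\mathbf{1}_n^{T}x\right) = x - \bar{x}\mathbf{1}_n$. This property illustrates that the data vector $x$ can be centered through left multiplication by $C_n$. This also reveals the origin of the name centering matrix for $C_n$. Meanwhile, $C_n x$ is called the centered vector of $x$. It is easy to see that the sum of all entries of the centered vector is 0.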
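The centering property can be checked numerically. The sketch below uses an arbitrary illustrative vector (not taken from the paper) and compares left multiplication by the centering matrix with direct mean subtraction.

```python
import numpy as np

x = np.array([2.0, 4.0, 9.0])           # illustrative data vector, mean 5
n = x.size
C = np.eye(n) - np.ones((n, n)) / n     # centering matrix of order n

centered = C @ x                        # left multiplication by the centering matrix
direct = x - x.mean()                   # direct mean subtraction

print(centered)                         # entries are -3, -1, 4 up to rounding
print(np.allclose(centered, direct))
```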

Corollary 1. Let $X$ be a data matrix of order $n \times m$, where each column represents the n observations on a particular variable and each row represents a single observation on m variables. Then, the centered matrix of $X$ is $C_n X$.

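Proof. $X$ can be partitioned as $X = (x_1, x_2, \ldots, x_m)$, and each column $x_j$ is a data vector. By Lemma 1, its centered vector is $C_n x_j$. So, the centered matrix of $X$ is $(C_n x_1, C_n x_2, \ldots, C_n x_m) = C_n X$.
Multiplication by the centering matrix can be used to remove not only the mean of a single vector but also the means of multiple vectors stored in the rows or columns of a matrix. For any matrix $X$ of order $n \times m$, the multiplication $C_n X$ removes the means from each of the m columns, while the multiplication $X C_m$ removes the means from each of the n rows [14].
It can be easily verified that the centering matrix is symmetric, i.e., $C_n^{T} = C_n$. Since $C_n^{2} = \left(I_n - \frac{1}{n}J_n\right)^{2} = I_n - \frac{2}{n}J_n + \frac{1}{n^{2}}J_n^{2} = I_n - \frac{1}{n}J_n$ (using $J_n^{2} = nJ_n$), the centering matrix is idempotent, i.e., $C_n^{2} = C_n$. The extension form is $C_n^{k} = C_n$ for $k = 1, 2, \ldots$.
It can be seen from the above two paragraphs that $C_n$ is symmetric and idempotent. Therefore, $C_n$ is a projection matrix. It can be easily proved that $I_n - C_n = \frac{1}{n}J_n$ is also a projection matrix.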
Let the adjoint (adjugate) matrix of $C_n$ be denoted by $C_n^{*}$. Since $C_n^{*} = \frac{1}{n}J_n$ is symmetric and idempotent, $C_n^{*}$ is also a projection matrix.
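The column and row centering described above, together with symmetry and idempotency, can be illustrated with a short sketch. The random data and the helper name `centering_matrix` are assumptions made for this example.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Hypothetical helper: I_n - (1/n) * (all-ones matrix of order n)."""
    return np.eye(n) - np.ones((n, n)) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))             # 5 observations (rows), 3 variables (columns)
C5, C3 = centering_matrix(5), centering_matrix(3)

col_centered = C5 @ X                   # left multiplication removes column means
row_centered = X @ C3                   # right multiplication removes row means

print(np.allclose(col_centered.mean(axis=0), 0.0))
print(np.allclose(row_centered.mean(axis=1), 0.0))
print(np.allclose(C5, C5.T), np.allclose(C5 @ C5, C5))   # symmetric, idempotent
```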

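Theorem 1 (quadratic form property, see [13]). Let $x = (x_1, x_2, \ldots, x_n)^{T}$ be a vector with a set of data as entries. Then, the total sum of squares (SST) of the data, which is proportional to the sample variance, can be expressed by a quadratic form whose kernel matrix is the centering matrix $C_n$, i.e., $\mathrm{SST} = \sum_{i=1}^{n}(x_i - \bar{x})^{2} = x^{T}C_n x$.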

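Proof. By Lemma 1, $C_n x = x - \bar{x}\mathbf{1}_n$ can be obtained. Then, we calculate the inner product of the vector $C_n x$ with itself: $(C_n x)^{T}(C_n x) = \sum_{i=1}^{n}(x_i - \bar{x})^{2}$. Using the symmetry and idempotency of $C_n$, we obtain $(C_n x)^{T}(C_n x) = x^{T}C_n^{T}C_n x = x^{T}C_n x$, so $\sum_{i=1}^{n}(x_i - \bar{x})^{2} = x^{T}C_n x$.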
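A minimal numerical check of the quadratic form property, using an arbitrary illustrative data vector (an assumption of this sketch, not data from the paper):

```python
import numpy as np

x = np.array([1.0, 3.0, 4.0, 8.0])      # illustrative data, mean 4
n = x.size
C = np.eye(n) - np.ones((n, n)) / n     # centering matrix of order n

quad_form = x @ C @ x                   # x^T C x
sst = np.sum((x - x.mean()) ** 2)       # sum of squared deviations: 9 + 1 + 0 + 16 = 26

print(quad_form, sst)
```

Dividing either quantity by n - 1 gives the usual unbiased sample variance.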
The centering matrix is closely related to the Helmert matrix, which is an orthogonal matrix with important applications in defining contrasts in general linear models of statistics. The corresponding conclusion is as follows.

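Proposition 1 (see [12, 29], and we prove it). Let the Helmert matrix $H_n$ of order n be partitioned as $H_n = \begin{pmatrix} h \\ K \end{pmatrix}$, where $h = \frac{1}{\sqrt{n}}\mathbf{1}_n^{T}$, $K$ is an $(n-1) \times n$ matrix consisting of the last $n-1$ rows of $H_n$, and each row of $K$ is given by the following formula:
$$k_i = \frac{1}{\sqrt{i(i+1)}}\left(\mathbf{1}_i^{T},\; -i,\; \mathbf{0}_{n-i-1}^{T}\right), \quad i = 1, 2, \ldots, n-1,$$
where $\mathbf{1}_i^{T}$ and $\mathbf{0}_i^{T}$ are row vectors of order i with all elements unity and zero, respectively. Then, $K^{T}K = C_n$.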

Proof. can be obtained by Theorem 1. For is orthogonal, i.e., whereasso thus, Hence, the conclusion holds.
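The relation between the Helmert matrix and the centering matrix can be sketched numerically. The construction below assumes the standard Helmert form described in Searle [12], in which the i-th of the last n-1 rows consists of i ones, then -i, then zeros, scaled by 1/sqrt(i(i+1)); the function name `helmert_lower_rows` is hypothetical.

```python
import numpy as np

def helmert_lower_rows(n: int) -> np.ndarray:
    """Last n-1 rows of the Helmert matrix of order n (standard form, assumed)."""
    K = np.zeros((n - 1, n))
    for i in range(1, n):
        K[i - 1, :i] = 1.0               # i leading ones
        K[i - 1, i] = -float(i)          # then the entry -i, remaining entries stay zero
        K[i - 1] /= np.sqrt(i * (i + 1)) # normalize the row to unit length
    return K

n = 4
K = helmert_lower_rows(n)
C = np.eye(n) - np.ones((n, n)) / n

print(np.allclose(K.T @ K, C))               # K^T K reproduces the centering matrix
print(np.allclose(K @ K.T, np.eye(n - 1)))   # rows of K are orthonormal
```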
The centering matrix also has a close association with the Laplacian matrix of a connected graph. The related property is as follows.

Proposition 2 (see [21]). The Laplacian matrix $L$ of a connected graph with n vertices and its generalized inverse $L^{+}$ satisfy the following equality: $LL^{+} = L^{+}L = C_n$.
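As an illustration of Proposition 2, the sketch below uses a small path graph (a hypothetical example) and takes the generalized inverse to be the Moore–Penrose pseudoinverse computed by `numpy.linalg.pinv`, which is an assumption about the inverse intended in [21].

```python
import numpy as np

# Adjacency matrix of the path graph 0 - 1 - 2 - 3 (connected, 4 vertices).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A          # graph Laplacian: degree matrix minus adjacency
L_pinv = np.linalg.pinv(L)              # Moore-Penrose pseudoinverse (assumed inverse)

n = L.shape[0]
C = np.eye(n) - np.ones((n, n)) / n

print(np.allclose(L @ L_pinv, C))
print(np.allclose(L_pinv @ L, C))
```

The identity holds because, for a connected graph, the null space of the Laplacian is spanned by the all-ones vector, so the product is the orthogonal projector onto its complement.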

Theorem 2. The eigenvalues of the centering matrix $C_n$ are 0 of multiplicity 1 and 1 of multiplicity $n-1$.

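Proof. Owing to $C_n = I_n - \frac{1}{n}J_n$, we can find the eigenvalues of $J_n$ at first. Assume that an eigenvalue of $J_n$ is $\lambda$ and the corresponding eigenvector is $x$. Then, $J_n x = \lambda x$. Since $J_n^{2} = nJ_n$ and $J_n x = \lambda x$, we can obtain $J_n^{2}x = nJ_n x = n\lambda x$, whereas $J_n^{2}x = J_n(\lambda x) = \lambda^{2}x$, so $\lambda^{2}x = n\lambda x$ with $x \neq 0$; therefore, $\lambda = 0$ or $\lambda = n$.
Let the multiplicity of eigenvalue 0 be $k$. Since $J_n$ is real symmetric, it is diagonalizable; thus, $k = n - \operatorname{rank}(J_n)$. It is easy to see that $\operatorname{rank}(J_n) = 1$, so $k = n - 1$. Because a square matrix of order n has n eigenvalues, the multiplicity of eigenvalue n is 1.
An eigenvalue of $C_n$ equals $1 - \frac{\lambda}{n}$, whereas $\lambda = 0$ or $\lambda = n$, so the eigenvalues of $C_n$ are 0 of multiplicity 1 and 1 of multiplicity $n-1$.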
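The eigenvalue structure stated in Theorem 2 can be confirmed numerically. This is a minimal sketch for order 5 using `numpy.linalg.eigvalsh`, which returns the eigenvalues of a symmetric matrix in ascending order.

```python
import numpy as np

n = 5
J = np.ones((n, n))                     # all-ones matrix
C = np.eye(n) - J / n                   # centering matrix

eig_J = np.linalg.eigvalsh(J)           # expected: 0 (multiplicity n-1) and n (multiplicity 1)
eig_C = np.linalg.eigvalsh(C)           # expected: 0 (multiplicity 1) and 1 (multiplicity n-1)

print(np.round(eig_J, 10))
print(np.round(eig_C, 10))
```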

Corollary 2 (the spectral decomposition of the centering matrix). For the centering matrix $C_n$, there must exist an orthogonal matrix $Q$ of order n such that
$$C_n = Q\Lambda Q^{T} = 1 \cdot q_1q_1^{T} + \cdots + 1 \cdot q_{n-1}q_{n-1}^{T} + 0 \cdot q_nq_n^{T},$$
where $q_1, \ldots, q_{n-1}$ are the unit orthogonal eigenvectors corresponding to eigenvalue 1, $q_n$ is the unit orthogonal eigenvector corresponding to eigenvalue 0, $Q = (q_1, q_2, \ldots, q_n)$, and $\Lambda = \operatorname{diag}(1, \ldots, 1, 0)$.

Proof. Since the centering matrix $C_n$ is real symmetric, it is diagonalizable. By Theorem 2, the eigenvalues of $C_n$ are 0 of multiplicity 1 and 1 of multiplicity $n-1$. Suppose that the unit orthogonal eigenvectors corresponding to eigenvalue 1 are $q_1, \ldots, q_{n-1}$ and the unit orthogonal eigenvector corresponding to eigenvalue 0 is $q_n$. Let $Q = (q_1, q_2, \ldots, q_n)$. Then, $Q$ is orthogonal and satisfies $Q^{T}C_nQ = \Lambda$, where $\Lambda = \operatorname{diag}(1, \ldots, 1, 0)$ is a diagonal matrix. The following formula can be obtained by using the equation $QQ^{T} = Q^{T}Q = I_n$ for left multiplication by $Q$ and right multiplication by $Q^{T}$:
$$C_n = Q\Lambda Q^{T} = \sum_{i=1}^{n-1} q_iq_i^{T}.$$

Remark 1. In the previous formula, each term on the right-hand side is the outer product of a unit orthogonal eigenvector of $C_n$ with itself, multiplied by the corresponding eigenvalue. Each term is a square matrix with rank 1. The above formula has a simple and special structure, so it has important applications in statistics.
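The spectral decomposition can be reproduced with `numpy.linalg.eigh`. In this sketch, the variable `P` (the eigenvectors belonging to eigenvalue 1, stacked as columns) is a name chosen for illustration; note that the eigenvectors for the repeated eigenvalue 1 are not unique, so only their span is determined.

```python
import numpy as np

n = 4
C = np.eye(n) - np.ones((n, n)) / n

w, Q = np.linalg.eigh(C)                # eigenvalues ascending, orthonormal eigenvectors as columns
P = Q[:, w > 0.5]                       # eigenvectors for eigenvalue 1 (n-1 columns)
q0 = Q[:, 0]                            # eigenvector for eigenvalue 0

C_rebuilt = P @ P.T                     # sum of rank-1 outer products q_i q_i^T

print(np.allclose(C_rebuilt, C))
print(np.allclose(P.T @ P, np.eye(n - 1)))
print(np.allclose(np.abs(q0), 1 / np.sqrt(n)))   # proportional to the all-ones vector
```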

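Corollary 3. The centering matrix can be decomposed as $C_n = PP^{T}$, where $P$ satisfies the relation $P^{T}P = I_{n-1}$.

Proof. The equation $C_n = Q\Lambda Q^{T}$ can be obtained by the proof process of Corollary 2, where the definitions of $Q$ and $\Lambda$ are given in Corollary 2. It is easy to see that $\Lambda = \operatorname{diag}(I_{n-1}, 0)$. Therefore, $C_n = PP^{T}$, where $P = (q_1, \ldots, q_{n-1})$ is a matrix of order $n \times (n-1)$ such that $P^{T}P = I_{n-1}$.
By Theorem 2, $C_n$ has a 0 eigenvalue; therefore, $|C_n| = 0$, and thus, $C_n$ is irreversible (singular).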
The equality can be obtained by Corollary 2. Hence, we have [30] .
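It indicates that the Moore–Penrose generalized inverse of $C_n$ is itself, i.e., $C_n^{+} = C_n$.
By Theorem 2, the eigenvalues of $C_n$ are nonnegative, so it is positive semidefinite, denoted by $C_n \succeq 0$; hence, it can be used as the covariance matrix of some random variable.
Because $C_n \succeq 0$, $x^{T}C_nx \geq 0$ for an arbitrary nonzero column vector $x$. Let $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^{T}$, where 1 is the ith element. Then, $e_i^{T}C_ne_i = c_{ii} \geq 0$ for $i = 1, 2, \ldots, n$, i.e., the diagonal elements of $C_n$ are nonnegative.
By Corollary 2, $C_n$ is similar to the diagonal matrix $\Lambda = \operatorname{diag}(1, \ldots, 1, 0)$. Similar matrices have the same rank, so $\operatorname{rank}(C_n) = \operatorname{rank}(\Lambda) = n - 1$, i.e., the rank of $C_n$ is $n-1$.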
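We can verify that the rank of the adjoint (adjugate) matrix $C_n^{*}$ of $C_n$ is 1 by the theory of linear algebra. The proof process is as follows. On the one hand, $C_nC_n^{*} = |C_n|I_n = O$; therefore, $\operatorname{rank}(C_n) + \operatorname{rank}(C_n^{*}) \leq n$, and thus, $\operatorname{rank}(C_n^{*}) \leq n - \operatorname{rank}(C_n)$. Because $\operatorname{rank}(C_n) = n - 1$, $\operatorname{rank}(C_n^{*}) \leq 1$. On the other hand, since $\operatorname{rank}(C_n) = n - 1$, there must exist a nonzero minor of order $n-1$ for $C_n$. Then, we can obtain $C_n^{*} \neq O$, and thus, $\operatorname{rank}(C_n^{*}) \geq 1$. In summary, we have $\operatorname{rank}(C_n^{*}) = 1$.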
By Theorem 2, the eigenvalues of $C_n$ are 0 with multiplicity 1 and 1 with multiplicity $n-1$. Since the trace of a square matrix equals the sum of all eigenvalues of the matrix, i.e., $\operatorname{tr}(C_n) = \sum_{i=1}^{n}\lambda_i = n - 1$, the trace of the centering matrix is $n-1$.
Because and so i.e., the trace of is .
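The rank, trace, singularity, positive semidefiniteness, and Moore–Penrose inverse properties discussed above can be checked together. This is a minimal sketch for order 6 using standard NumPy routines.

```python
import numpy as np

n = 6
C = np.eye(n) - np.ones((n, n)) / n

rank_C = np.linalg.matrix_rank(C)               # expected n - 1
trace_C = np.trace(C)                           # expected n - 1
det_C = np.linalg.det(C)                        # expected 0 (singular)
min_eig = np.linalg.eigvalsh(C).min()           # expected 0 (positive semidefinite)
pinv_is_self = np.allclose(np.linalg.pinv(C), C)

print(rank_C, trace_C, pinv_is_self)
```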
Although is singular, and is nonsingular for . The related conclusion is as follows:

Proposition 3 (see [12], and we prove it). The centering matrix and the matrix satisfy the relations and for

Proof. For and so Using this equation, we have Therefore, In the same way, we can obtain
Using the equation we get the result

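Lemma 2 (see [12], and we prove it). The centering matrix $C_n$ commutes with the matrix $J_n$, and their product is the zero matrix, i.e., $C_nJ_n = J_nC_n = O$.

Proof. $C_nJ_n = \left(I_n - \frac{1}{n}J_n\right)J_n = J_n - \frac{1}{n}J_n^{2} = J_n - J_n = O$ and $J_nC_n = J_n\left(I_n - \frac{1}{n}J_n\right) = J_n - \frac{1}{n}J_n^{2} = O$.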

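Corollary 4. The null space of $C_n$ is $N(C_n) = \{k\mathbf{1}_n : k \in \mathbb{R}\}$, whose dimension is 1, and a basis is $\mathbf{1}_n$.

Proof. Since $\operatorname{rank}(C_n) = n - 1$ and $\dim N(C_n) = n - \operatorname{rank}(C_n) = 1$, the dimension of the null space is 1. The equation $C_nJ_n = O$ can be obtained by Lemma 2, so $C_n\mathbf{1}_n = 0$. So, the null space of $C_n$ is $\operatorname{span}\{\mathbf{1}_n\}$, i.e., $N(C_n) = \{k\mathbf{1}_n : k \in \mathbb{R}\}$. Obviously, a basis of $N(C_n)$ is $\mathbf{1}_n$.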

Corollary 5. For the centering matrix $C_n$, the set of all eigenvectors corresponding to the eigenvalue 0 is $\{k\mathbf{1}_n : k \neq 0\}$.

Proof. Since the vector $\mathbf{1}_n$ is a column of the matrix $J_n$, Lemma 2 gives $C_n\mathbf{1}_n = 0 = 0 \cdot \mathbf{1}_n$. By the definition of an eigenvector, the sum vector $\mathbf{1}_n$ is an eigenvector of $C_n$ corresponding to the eigenvalue 0. Then, we have the desired result.
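Lemma 2 and the null-space statements can be checked with a short sketch. The null space is recovered here from the singular value decomposition, which is one of several possible ways to compute it and is an implementation choice of this example.

```python
import numpy as np

n = 4
ones = np.ones(n)
J = np.ones((n, n))
C = np.eye(n) - J / n

CJ, JC = C @ J, J @ C                   # both products should be the zero matrix

# Right-singular vector for the smallest singular value spans the null space.
_, s, Vt = np.linalg.svd(C)             # singular values are returned in descending order
null_vec = Vt[-1]

print(np.allclose(CJ, 0.0), np.allclose(JC, 0.0))
print(np.allclose(C @ ones, 0.0))
print(np.allclose(np.abs(null_vec), 1 / np.sqrt(n)))   # proportional to the all-ones vector
```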

Proposition 4 (see [14], and we prove it). The vector obtained from the vector $x$ by the transformation $C_n$, denoted by $C_nx$, is the projection of $x$ onto the $(n-1)$-dimensional subspace $V_1$, where the subspace $V_1$ is orthogonal to the subspace $V_0 = \operatorname{span}\{\mathbf{1}_n\}$.

Proof. By Theorem 2, the eigenvalues of $C_n$ are 0 with multiplicity 1 and 1 with multiplicity $n-1$. Suppose that the eigenspace of eigenvalue 1 is the $(n-1)$-dimensional subspace $V_1$ and the eigenspace of eigenvalue 0 is the 1-dimensional subspace $V_0$; then, $V_1$ is orthogonal to $V_0$ [22]. It is worth mentioning that the eigenspace $V_0$ is also the null space of the centering matrix $C_n$. The result $V_0 = \operatorname{span}\{\mathbf{1}_n\}$ can be obtained by Corollary 5. For $V_1$, assuming that an eigenvector corresponding to eigenvalue 1 is $y$, we obtain $C_ny = y$ by the definition of eigenvalue and eigenvector, i.e., $\left(I_n - \frac{1}{n}J_n\right)y = y$. Replacing the corresponding value, we have $J_ny = 0$; thus, $\mathbf{1}_n^{T}y = 0$, i.e., $y \perp \mathbf{1}_n$. Since $C_n$ is a projection matrix, $C_nx$ is the projection of the vector $x$ onto the $(n-1)$-dimensional subspace $V_1$.
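The projection interpretation can be illustrated by splitting a vector into its centered part and its component along the all-ones direction. The data vector below is an arbitrary example chosen for this sketch.

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # illustrative vector
n = x.size
ones = np.ones(n)
C = np.eye(n) - np.ones((n, n)) / n

proj = C @ x                  # projection onto the subspace orthogonal to the all-ones vector
along_ones = x - proj         # remaining component, lies in span{ones}

print(np.isclose(proj @ ones, 0.0))                 # orthogonal to the all-ones vector
print(np.allclose(along_ones, x.mean() * ones))     # equals mean(x) times the all-ones vector
print(np.allclose(C @ proj, proj))                  # projecting again changes nothing
```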
In the following, we will discuss the Kronecker product property of the centering matrix $C_n$.
From the previous text, we know that $C_n$ is a positive semidefinite matrix. Suppose $A$ is a positive semidefinite matrix of order $m$. Since the Kronecker product of two positive semidefinite matrices is also positive semidefinite, $C_n \otimes A$ is also a positive semidefinite matrix [31]. In particular, $C_n \otimes C_n$ is also a positive semidefinite matrix. Next, we will further discuss the other properties of $C_n \otimes C_n$.

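Proposition 5 (Kronecker square property). The Kronecker square of the centering matrix $C_n$ is defined as $C_n^{\otimes 2} = C_n \otimes C_n$. It has the following characteristics:
(i) It is a projection matrix
(ii) It is irreversible
(iii) Its Moore–Penrose generalized inverse is itself, i.e., $(C_n \otimes C_n)^{+} = C_n \otimes C_n$
(iv) Its rank and trace are both equal to $(n-1)^{2}$, i.e., $\operatorname{rank}(C_n \otimes C_n) = \operatorname{tr}(C_n \otimes C_n) = (n-1)^{2}$

Proof.
(i) On the one hand, $(C_n \otimes C_n)^{T} = C_n^{T} \otimes C_n^{T} = C_n \otimes C_n$. It shows that $C_n \otimes C_n$ is symmetric. On the other hand, $(C_n \otimes C_n)^{2} = C_n^{2} \otimes C_n^{2} = C_n \otimes C_n$. Hence, $C_n \otimes C_n$ is an idempotent matrix. To sum up, $C_n \otimes C_n$ is a projection matrix.
(ii) $|C_n \otimes C_n| = |C_n|^{n}|C_n|^{n} = 0$. Hence, $C_n \otimes C_n$ is irreversible.
(iii) $(C_n \otimes C_n)^{+} = C_n^{+} \otimes C_n^{+} = C_n \otimes C_n$. So, the Moore–Penrose generalized inverse of $C_n \otimes C_n$ is equal to itself.
(iv) $\operatorname{rank}(C_n \otimes C_n) = \operatorname{rank}(C_n)\operatorname{rank}(C_n) = (n-1)^{2}$ and $\operatorname{tr}(C_n \otimes C_n) = \operatorname{tr}(C_n)\operatorname{tr}(C_n) = (n-1)^{2}$. Therefore, $\operatorname{rank}(C_n \otimes C_n) = \operatorname{tr}(C_n \otimes C_n) = (n-1)^{2}$.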
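The four characteristics of the Kronecker square can be verified numerically with `numpy.kron`. This is a minimal sketch for order 3, where the Kronecker square is a 9 by 9 matrix and the expected rank and trace are both 4.

```python
import numpy as np

n = 3
C = np.eye(n) - np.ones((n, n)) / n
K2 = np.kron(C, C)                               # Kronecker square, order n^2

is_projection = np.allclose(K2, K2.T) and np.allclose(K2 @ K2, K2)
is_singular = abs(np.linalg.det(K2)) < 1e-10
pinv_is_self = np.allclose(np.linalg.pinv(K2), K2)
rank_K2 = np.linalg.matrix_rank(K2)
trace_K2 = np.trace(K2)

print(is_projection, is_singular, pinv_is_self, rank_K2, trace_K2)
```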
The relationship between some previous properties such as the centering property, quadratic form property, spectral property, and their roles in ML, MVA, theoretical chemistry, and graph theory can be illustrated in Figure 1.

3. The Application of the Centering Matrix in ML and Statistical Analysis

3.1. The Application in the PCA

It is often necessary to calculate the eigenvector of the sample covariance matrix in dimensionality reduction by using the PCA, one of the ML algorithms. In this case, it is more convenient to represent the covariance matrix by the centering matrix.

Example 1. Suppose the sample data matrix isThen, the mean of the sample isThe sample covariance matrix is as follows [26]:Replacing by and using the symmetry and idempotency of , we can obtainwhere is the scatter matrix.
This example shows that the sample covariance matrix and scatter matrix can be expressed succinctly by the centering matrix. Meanwhile, we can know that the sample covariance matrix is a Gramian matrix and a positive semidefinite matrix. Similarly, the correlation matrix can also be expressed simply by the centering matrix.
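The expressions in this example can be sketched numerically. The code below assumes that rows of the data matrix are observations and that the covariance is normalized by 1/(n-1), which matches the default of `numpy.cov`; the normalization used in the original formula may differ (for example, 1/n). The random data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3                            # n observations (rows), m variables (columns)
X = rng.normal(size=(n, m))

C = np.eye(n) - np.ones((n, n)) / n
scatter = X.T @ C @ X                   # scatter matrix via the centering matrix
cov = scatter / (n - 1)                 # assumed normalization: 1/(n-1)

# PCA directions: eigenvectors of the covariance matrix, largest variance first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]

print(np.allclose(cov, np.cov(X, rowvar=False)))
print(eigvals[order])
```

Because the scatter matrix equals $(C_nX)^{T}(C_nX)$, it is a Gramian matrix, so its eigenvalues are nonnegative up to rounding error.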

3.2. The Application in the Regression Analysis

Example 2. The centering matrix has important simplified expression role in regression problem of supervised learning. Suppose there is a linear regression model as shown below between the random vector and nonrandom factors where is the random error, and are unknown parameters. In order to determine their value, substitute the data , by independent observations into the above formula; then, we can obtain the following equations:The corresponding matrix form is whereIt can be found that the least square estimation of the parameter is
The fitted value of can be obtained by substituting the above formula into and omitting error term where is a projection matrix of order n. The expression of residual vector is shown as follows:where is the unity matrix of order n, and if is singular, then find its Moore–Penrose generalized inverse.
Next, we are going to express the regression sum of squares (SSR), the error sum of squares (SSE), and SST by the centering matrix and do further derivation.(i)The following formula about SSR can be obtained by Theorem 1 and the equation On the one hand, the equality holds [32]. On the other hand, Combining the above two equations, we have Therefore,(ii)SSE is also called residual sum of squares and defined as Due to the above formula is changed into We can obtain the following expression of by Theorem 1; hence, We can obtain by (i). Therefore, the preceding formula reduces to (iii)According to Theorem 1, SST can be expressed asIn the above example, for the sum of squares such as SSR and SSE, we expressed it by the centering matrix firstly and then obtained the quadratic form expression of after a series of derivation. For the sum of squares SST, we directly obtained its quadratic form expression by Theorem 1. The Table 1 gives the quadratic form expression in which the kernel function is and the quadratic form expression of for SSR, SSE, and SST, respectively.
From Table 1, we can observe that SSR, SSE, and SST are all expressed as quadratic forms with the centering matrix $C_n$ as the kernel matrix. Generally speaking, any sum of squared deviations can be expressed by the centering matrix. For instance, the sum of squares of normally distributed variables, which follows the $\chi^{2}$ distribution, is such a case. From formulas (23), (25), and (26), we can easily deduce the famous decomposition equality $\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$, which is of vital importance in regression theory and analysis of variance.
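The decomposition of the total sum of squares can be checked on synthetic data. This sketch assumes a linear model with an intercept column in the design matrix and computes the projection matrix through the pseudoinverse; the variable names, coefficients, and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 2
Z = rng.normal(size=(n, p))
X = np.column_stack([np.ones(n), Z])            # design matrix with an intercept column
y = 1.0 + Z @ np.array([2.0, -0.5]) + 0.3 * rng.normal(size=n)

C = np.eye(n) - np.ones((n, n)) / n             # centering matrix
H = X @ np.linalg.pinv(X)                       # projection matrix onto the column space of X
y_hat = H @ y                                   # fitted values
resid = y - y_hat                               # residual vector

sst = y @ C @ y                                 # total sum of squares
ssr = y_hat @ C @ y_hat                         # regression sum of squares
sse = resid @ C @ resid                         # error sum of squares (residuals have zero mean)

print(np.isclose(sst, ssr + sse))
print(np.isclose(sse, resid @ resid))
```

The identity relies on the intercept: the all-ones vector then lies in the column space of the design matrix, so the residuals sum to zero.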
In addition, the centering matrix has important simplified role in the ranking algorithms in information retrieval [4, 19, 20]. For instance, in [19], T. Pahikkala et al. introduced the centering matrix to construct a block diagonal matrix as weighted matrix, express the loss function, obtain the matrix notation of a least squares problem, and conveniently find the linear solution of RankRLS algorithm.

4. Conclusion

As we know, the centering matrix plays an important role in ML and MVA. In this paper, we mainly carried out theoretical research on the centering matrix. We discussed in detail some of its properties, proved some previously known properties, deduced some new ones, and drew a mind map to clearly illustrate the relationship between these properties and their potential role in ML and MVA. Then, we explored applications of the centering matrix in PCA and regression analysis. The results show that the centering matrix has excellent characteristics and simplifies expressions written in matrix notation in ML and MVA. The sample covariance matrix, scatter matrix, and sum of squared deviations can all be expressed succinctly by the centering matrix. Since the centering matrix has good properties such as symmetry and idempotency, the learning process of algorithms based on it can be greatly simplified, and performance aspects such as training complexity, stability, and speed may be improved.

Data Availability

This paper is the theoretical research on the centering matrix. There are no corresponding programs and software.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61801437), Research Topics of Social and Economic Statistics in Shanxi Province (KY2020128), and Shanxi Natural Science Foundation (201801D221040).

References

M. S. Long, J. M. Wang, G. G. Ding, J. G. Sun, and P. S. Yu, “Transfer feature learning with joint distribution adaptation,” in Proceedings of the ICCV, pp. 2200–2207, Sydney, Australia, December 2013.
T. Yang, 2020, The centering matrix [Online] https://orzyt.cn/posts/centering-matrix/.
J. An, P. Ai, and D. K. Liu, “Deep domain adaptation model for bearing fault diagnosis with domain alignment and discriminative feature learning,” Shock and Vibration, vol. 2020, 14 pages, 2020.
S. Z. Chen, K. Chen, C. F. Xu, and L. Lan, “Flexible ranking extreme learning machine based on matrix-centering transformation,” in Proceedings of the IJCNN, Rio de Janeiro, Brazil, July 2018.
Q. Wang and L. Zhang, “Online updating the generalized inverse of centered matrices,” in Proceedings of the 25th AAAI Conference on Artificial Intelligence, San Francisco, California USA, August 2011.
Q. Wang, Research on Several Key Problems of Ensemble Learning Algorithms, Fudan University, Shanghai, China, 2011.
A. Onan and S. Korukoğlu, “A feature selection model based on genetic rank aggregation for text sentiment classification,” Journal of Information Science, vol. 43, no. 1, pp. 25–38, 2017.
A. Onan, S. Korukoğlu, and H. Bulut, “A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification,” Expert Systems with Applications, vol. 62, pp. 1–16, 2016.
A. Onan, “Classifier and feature set ensembles for web page classification,” Journal of Information Science, vol. 42, no. 2, pp. 150–165, 2016.
M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
J. Mairal, F. Bach, and J. Ponce, “Sparse modeling for image and vision processing,” Foundations and Trends in Computer Graphics and Vision, vol. 8, pp. 46–56, 2014.
S. R. Searle, Matrix Algebra Useful for Statistics, John Wiley & Sons, New York, NY, USA, 1982.
X. D. Zhang, Matrix Analysis and Application, Higher education press, Beijing, China, 2004.
Wikipedia, “Centering matrix [online],” 2020, https://encyclopedia.thefreedictionary.com/Centering+matrix.
K. T. Fang and M. Chen, Matrix Algebra in Statistics, Higher Education Press, Beijing, China, 2013.
J. E. Gentle, Matrix Algebra: Theory, Computations, and Application in Statistics, Springer Science+Business Media, LLC, New York, NY, USA, 2007.
Y. Jiang and S. Yin, “Recent advances in key-performance-indicator oriented prognosis and diagnosis with a MATLAB toolbox: db-kit,” IEEE Transactions on Industrial Informatics, vol. 15, no. 5, pp. 2849–2858, 2019.
K. E. S. Pilario and Y. Cao, “Canonical variate dissimilarity analysis for process incipient fault detection,” IEEE Transactions on Industrial Informatics, vol. 14, no. 12, pp. 5308–5315, 2018.
T. Pahikkala, A. Airola, P. Naula, and T. Salakoski, “Greedy rankrls: a linear time algorithm for learning sparse ranking models,” in Proceedings of the Sigir 2010 Workshop on Feature Generation and Selection for Information Retrieval, pp. 11–18, Geneva, Switzerland, July 2010.
T. Pahikkala, W. Waegeman, A. Airola, T. Salakoski, and B. D. Baets, “Conditional ranking on relational data,” in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 1–16, Athens, Greece, September 2010.
I. Gutman and W. Xiao, “Generalized inverse of the Laplacian matrix and some applications,” Bulletin: Classe des sciences mathematiques et natturalles, vol. 129, no. 29, pp. 15–23, 2004.
J. I. Marden, Analyzing and Modeling Rank Data, vol. 59, Taylor & Francis Group, LLC, Boca Raton, FL, USA, 1995.
L. P. Liu, Y. Jiang, and Z. H. Zhou, “Least square incremental linear discriminant analysis,” in Proceedings of ICDM, pp. 298–306, Miami, Florida, December 2009.
R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.
L. Sun, S. W. Ji, and J. P. Ye, “A least squares formulation for a class of generalized eigenvalue problems in machine learning,” in Proceedings of the 26th International Conference on Machine Learning (ICML), pp. 1207–1216, Montreal, Canada, 2009.
CSDN, “Dimension reduction,” 2020, https://blog.csdn.net/cengjing12/article/details/106268447?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-5.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-5.nonecase.
K. Pearson, “On lines and planes of closest fit to systems of points in space,” Philosophical Magazine, vol. 2, no. 6, pp. 559–572, 1901.
H. Hotelling, “Analysis of a complex of statistical variables into principal components,” Journal of Educational Psychology, vol. 24, no. 6, pp. 417–441, 1933.
J. Li, X. L. Li, Y. T. Su, and Z. Z. Ding, Problem Solutions for Matrix Analysis and Applications, Tsinghua University Press, Beijing, China, 2007.
S. G. Wang, The Theory and its Application of Linear Models, The Anhui Education Press, Hefei, China, 1987.
R. F. Yan and X. M. Fan, “The probability proof of several theorems in matrix theory,” Journal of Northwest Normal University (Natural Science), vol. 63, no. 2, pp. 71-72, 1987.
C. L. Mei and J. C. Fan, Data Analysis Method, Higher Education Press, Beijing, China, 2006.

Copyright

Copyright © 2021 Zhen Xue and Liangliang Zhang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
