Special Issue: Artificial Intelligence and Its Applications
Research Article | Open Access
Chang Liu, Tao Yan, WeiDong Zhao, YongHong Liu, Dan Li, Feng Lin, JiLiu Zhou, "Incremental Tensor Principal Component Analysis for Handwritten Digit Recognition", Mathematical Problems in Engineering, vol. 2014, Article ID 819758, 10 pages, 2014. https://doi.org/10.1155/2014/819758
Incremental Tensor Principal Component Analysis for Handwritten Digit Recognition
To overcome the shortcomings of traditional dimensionality reduction algorithms, an incremental tensor principal component analysis (ITPCA) algorithm based on the updated-SVD technique is proposed in this paper. The paper proves the relationship between PCA, 2DPCA, MPCA, and the graph embedding framework theoretically and derives the incremental learning procedures for adding a single sample and multiple samples in detail. Experiments on handwritten digit recognition demonstrate that ITPCA achieves better recognition performance than vector-based principal component analysis (PCA), incremental principal component analysis (IPCA), and multilinear principal component analysis (MPCA). At the same time, ITPCA also has lower time and space complexity.
Pattern recognition and computer vision require processing large amounts of multi-dimensional data, such as image and video data. A large number of dimensionality reduction algorithms have been investigated to date. These algorithms project the whole dataset into a low-dimensional space and construct new features by analyzing the statistical relationships hidden in the data. The new features often give good information or hints about the data's intrinsic structure. As a classical dimensionality reduction algorithm, principal component analysis has been widely applied in various applications.
Traditional dimensionality reduction algorithms generally transform each multi-dimensional datum into a vector by concatenating its rows, an operation called vectorization. Vectorization largely increases the computational cost of data analysis and seriously destroys the intrinsic tensor structure of high-order data. Consequently, tensor dimensionality reduction algorithms have been developed based on tensor algebra [1–10]. Reference [10] summarizes existing multilinear subspace learning algorithms for tensor data. Reference [11] generalizes principal component analysis to tensor space and presents multilinear principal component analysis (MPCA). Reference [12] proposes the graph embedding framework to unify all dimensionality reduction algorithms.
Furthermore, traditional dimensionality reduction algorithms generally employ off-line learning to deal with newly added samples, which aggravates the computational cost. To address this problem, on-line learning algorithms have been proposed [13, 14]. In particular, reference [15] develops incremental principal component analysis (IPCA) based on the updated-SVD technique. However, most on-line learning algorithms focus on vector-based methods; only a limited number of works study incremental learning in tensor space [16–18].
To improve incremental learning in tensor space, this paper presents incremental tensor principal component analysis (ITPCA) based on the updated-SVD technique, combining tensor representation with incremental learning. The paper proves the relationship between PCA, 2DPCA, MPCA, and the graph embedding framework theoretically and derives the incremental learning procedures for adding a single sample and multiple samples in detail. Experiments on handwritten digit recognition demonstrate that ITPCA achieves better performance than vector-based incremental principal component analysis (IPCA) and multilinear principal component analysis (MPCA). At the same time, ITPCA also has lower time and space complexity than MPCA.
2. Tensor Principal Component Analysis
In this section, we employ tensor representation to express high-dimensional image data. A high-dimensional image dataset can be expressed as a tensor dataset $\{\mathcal{X}_m\}_{m=1}^{M}$, where each $\mathcal{X}_m$ is an $I_1 \times I_2 \times \cdots \times I_N$ dimensional tensor and $M$ is the number of samples in the dataset. Based on this representation, the following definitions are introduced.
Definition 1. For the tensor dataset $\{\mathcal{X}_m\}_{m=1}^{M}$, the mean tensor is defined as follows: $\bar{\mathcal{X}} = \frac{1}{M}\sum_{m=1}^{M} \mathcal{X}_m$.
Definition 2. The unfolding matrix of the mean tensor along the $n$th dimension is called the mode-$n$ mean matrix and is defined as follows: $\bar{X}_{(n)} = \left(\bar{\mathcal{X}}\right)_{(n)}$.
Definition 3. For the tensor dataset $\{\mathcal{X}_m\}_{m=1}^{M}$, the total scatter tensor is defined as follows: $\Psi = \sum_{m=1}^{M}\left\|\mathcal{X}_m - \bar{\mathcal{X}}\right\|^{2}$, where $\|\cdot\|$ is the Frobenius norm of a tensor.
Definition 4. For the tensor dataset $\{\mathcal{X}_m\}_{m=1}^{M}$, the mode-$n$ total scatter matrix is defined as follows: $S^{(n)} = \sum_{m=1}^{M}\left(X_{m(n)} - \bar{X}_{(n)}\right)\left(X_{m(n)} - \bar{X}_{(n)}\right)^{\mathsf{T}}$, where $\bar{X}_{(n)}$ is the mode-$n$ mean matrix and $X_{m(n)}$ is the mode-$n$ unfolding matrix of tensor $\mathcal{X}_m$.
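Definitions 1–4 translate directly into code. The sketch below (a NumPy illustration, not the authors' implementation; the function names are ours, and modes are 0-based) computes the mean tensor, the total scatter, and the mode-$n$ total scatter matrix:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: mode-n indexes the rows, all remaining modes the columns.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mean_tensor(samples):                 # Definition 1
    # samples has shape (M, I1, ..., IN)
    return samples.mean(axis=0)

def total_scatter(samples):               # Definition 3
    mean = mean_tensor(samples)
    return sum(np.linalg.norm(X - mean) ** 2 for X in samples)

def mode_scatter(samples, mode):          # Definition 4
    mean_unf = unfold(mean_tensor(samples), mode)
    size = samples.shape[mode + 1]
    S = np.zeros((size, size))
    for X in samples:
        D = unfold(X, mode) - mean_unf    # centered mode-n unfolding
        S += D @ D.T
    return S
```

As a sanity check, the trace of every mode-$n$ scatter matrix equals the total scatter, since both sum the squared Frobenius norms of the centered samples.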
Since it is difficult to solve for all the orthogonal projective matrices simultaneously, an iterative procedure is employed to compute them approximately. The MPCA objective (5) maximizes the total scatter of the projected tensors over the projective matrices $U^{(1)}, \ldots, U^{(N)}$. Assuming that the projective matrices $\{U^{(k)}\}_{k \neq n}$ are known, we can solve the following optimization problem (6) to obtain $U^{(n)}$: $U^{(n)} = \arg\max_{U} \operatorname{tr}\left(U^{\mathsf{T}} \Phi^{(n)} U\right)$ subject to $U^{\mathsf{T}} U = I$, where $\Phi^{(n)} = \sum_{m=1}^{M}\left(\tilde{X}_{m(n)} - \bar{\tilde{X}}_{(n)}\right)\left(\tilde{X}_{m(n)} - \bar{\tilde{X}}_{(n)}\right)^{\mathsf{T}}$ and $\tilde{X}_{m(n)}$ is the mode-$n$ unfolding matrix of the sample projected in all modes except mode $n$.
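The iterative procedure can be sketched as follows (a minimal NumPy illustration under the assumptions of identity-truncation initialization and a fixed iteration count; `mpca` and `mode_product` are names chosen here, not the paper's code):

```python
import numpy as np

def mode_product(T, U, mode):
    # Multiply tensor T by U^T along `mode` (project that mode onto the columns of U).
    return np.moveaxis(np.tensordot(T, U, axes=(mode, 0)), -1, mode)

def mpca(samples, ranks, n_iter=3):
    """samples: (M, I1, ..., IN); ranks: kept dimension per mode."""
    N = samples.ndim - 1
    X = samples - samples.mean(axis=0)            # center the dataset
    U = [np.eye(X.shape[n + 1])[:, :ranks[n]] for n in range(N)]
    for _ in range(n_iter):
        for n in range(N):
            # Project every mode except n with the current matrices.
            P = X
            for m in range(N):
                if m != n:
                    P = mode_product(P, U[m], mode=m + 1)
            # Mode-n scatter of the partially projected samples.
            A = np.moveaxis(P, n + 1, 1).reshape(P.shape[0], P.shape[n + 1], -1)
            S = np.einsum('mij,mkj->ik', A, A)
            w, V = np.linalg.eigh(S)              # ascending eigenvalues
            U[n] = V[:, ::-1][:, :ranks[n]]       # keep top-ranks[n] eigenvectors
    return U
```

Each pass fixes all but one projective matrix and refreshes the free one from an eigendecomposition, exactly the alternating scheme described above.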
According to the above analysis, it is easy to derive the following theorems.
Theorem 5. For tensor data of order $N = 1$, that is, for first-order tensors, the objective function of MPCA is equal to that of PCA.
Proof. For a first-order tensor, $\mathcal{X}_m$ is a vector; then (6) reduces to the objective function of PCA, with $\Phi^{(1)}$ the ordinary covariance matrix of the vectorized samples. So MPCA for first-order tensors is equal to vector-based PCA.
Theorem 6. For tensor data of order $N = 2$, that is, for second-order tensors, the objective function of MPCA is equal to that of 2DPCA.
Proof. For a second-order tensor, $\mathcal{X}_m$ is a matrix, and two projective matrices $U^{(1)}$ and $U^{(2)}$ need to be solved; then (5) becomes exactly the objective function of B2DPCA (bidirectional 2DPCA) [20–22]. Letting $U^{(2)} = I$ so that only the projective matrix $U^{(1)}$ is solved, the objective function simplifies into that of row 2DPCA [23, 24]. Similarly, letting $U^{(1)} = I$ so that only the projective matrix $U^{(2)}$ is solved, the objective function simplifies into that of column 2DPCA [23, 24].
Although vector-based PCA and 2DPCA can be regarded as special cases of MPCA, MPCA and 2DPCA employ different techniques to solve for the projective matrices: 2DPCA carries out PCA on row data and column data, respectively, whereas MPCA employs an iterative solution to compute the projective matrices. If it is supposed that the remaining projective matrices are known, then $U^{(n)}$ is solved, and (6) can be expressed through the Kronecker product of the remaining projective matrices.
Based on the properties of the Kronecker product, $\operatorname{vec}\left(U^{(1)\mathsf{T}} X U^{(2)}\right) = \left(U^{(2)} \otimes U^{(1)}\right)^{\mathsf{T}} \operatorname{vec}(X)$. Since each $U^{(n)}$ has orthonormal columns, $U^{(n)\mathsf{T}} U^{(n)} = I$, and therefore $\left(U^{(2)} \otimes U^{(1)}\right)^{\mathsf{T}}\left(U^{(2)} \otimes U^{(1)}\right) = I$ as well.
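The Kronecker-product step can be checked numerically. The snippet below (a NumPy illustration; the variable names are ours) verifies the standard identity $\operatorname{vec}(U^{(1)\mathsf{T}} X U^{(2)}) = (U^{(2)} \otimes U^{(1)})^{\mathsf{T}} \operatorname{vec}(X)$ with column-major vectorization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
U1 = rng.standard_normal((4, 2))   # mode-1 projection matrix
U2 = rng.standard_normal((3, 2))   # mode-2 projection matrix

vec = lambda M: M.flatten(order='F')   # column-stacking vec(.)
lhs = vec(U1.T @ X @ U2)               # vectorized projected matrix
rhs = np.kron(U2, U1).T @ vec(X)       # the same map in Kronecker form
print(np.allclose(lhs, rhs))           # True
```

Column-major (`order='F'`) flattening is essential here; with NumPy's default row-major flattening the roles of $U^{(1)}$ and $U^{(2)}$ in the Kronecker product swap.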
If the dimensions of the projective matrices do not change during the iterative procedure, the resulting objective is equal to that of B2DPCA. Because MPCA updates the projective matrices during the iterative procedure, it achieves better performance than 2DPCA.
Theorem 7. MPCA can be unified into the graph embedding framework [12].
Proof. Based on basic tensor algebra, the MPCA objective can be rewritten as a sum over pairs of samples. Letting the similarity matrix $W$ have entries $W_{ij} = 1/M$, we have $\sum_{j} W_{ij} = 1$ for any $i$, and (16) takes exactly the form of the graph embedding objective. So the theorem is proved.
3. Incremental Tensor Principal Component Analysis
3.1. Incremental Learning Based on Single Sample
Given $M$ initial training samples $\{\mathcal{X}_m\}_{m=1}^{M}$, when a new sample $\mathcal{Y}$ is added, the training dataset becomes $\{\mathcal{X}_1, \ldots, \mathcal{X}_M, \mathcal{Y}\}$.
The mean tensor of the initial samples is $\bar{\mathcal{X}} = \frac{1}{M}\sum_{m=1}^{M}\mathcal{X}_m$. The covariance tensor of the initial samples is $\Psi = \sum_{m=1}^{M}\left\|\mathcal{X}_m - \bar{\mathcal{X}}\right\|^{2}$, and the mode-$n$ covariance matrix of the initial samples is $S^{(n)} = \sum_{m=1}^{M}\left(X_{m(n)} - \bar{X}_{(n)}\right)\left(X_{m(n)} - \bar{X}_{(n)}\right)^{\mathsf{T}}$. When the new sample is added, the mean tensor becomes $\bar{\mathcal{X}}' = \frac{M\bar{\mathcal{X}} + \mathcal{Y}}{M + 1}$. The updated mode-$n$ covariance matrix (23) then splits into two items: the scatter of the original samples around the updated mean, given in (24), and the scatter of the new sample around the updated mean, given in (25).
Consequently, the mode-$n$ covariance matrix is updated as in (26): $S'^{(n)} = S^{(n)} + \frac{M}{M+1}\left(Y_{(n)} - \bar{X}_{(n)}\right)\left(Y_{(n)} - \bar{X}_{(n)}\right)^{\mathsf{T}}$, where $Y_{(n)}$ is the mode-$n$ unfolding matrix of $\mathcal{Y}$. Therefore, when a new sample is added, the projective matrices are obtained from the eigendecomposition of (26).
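As a concrete illustration of the single-sample update, the following sketch (NumPy; `update_single` is a name chosen here, and the $M/(M+1)$ rank-one form is the standard incremental-scatter identity) folds one new sample into the stored mean and mode-$n$ scatter matrix without revisiting the old samples:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding of a tensor.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def update_single(mean, S, M, Y, mode):
    """Fold one new sample Y into the mean tensor and the mode-n scatter
    matrix S of M existing samples."""
    new_mean = (M * mean + Y) / (M + 1)
    d = unfold(Y - mean, mode)                 # deviation from the OLD mean
    S_new = S + (M / (M + 1)) * (d @ d.T)      # rank-one correction
    return new_mean, S_new, M + 1
```

The test verifies that the incremental result matches a full recomputation over all $M+1$ samples.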
3.2. Incremental Learning Based on Multiple Samples
Given an initial training dataset $\{\mathcal{X}_m\}_{m=1}^{M}$, when $L$ new samples $\{\mathcal{Y}_l\}_{l=1}^{L}$ are added, the training dataset becomes $\{\mathcal{X}_1, \ldots, \mathcal{X}_M, \mathcal{Y}_1, \ldots, \mathcal{Y}_L\}$. In this case, the mean tensor is updated to $\bar{\mathcal{X}}' = \frac{M\bar{\mathcal{X}} + L\bar{\mathcal{Y}}}{M + L}$, where $\bar{\mathcal{Y}}$ is the mean tensor of the new samples. Expanding the mode-$n$ covariance matrix (28) into the scatter of the original samples and the scatter of the new samples around the updated mean, and putting (31) and (34) into (28), we obtain $S'^{(n)} = S_X^{(n)} + S_Y^{(n)} + \frac{ML}{M+L}\left(\bar{Y}_{(n)} - \bar{X}_{(n)}\right)\left(\bar{Y}_{(n)} - \bar{X}_{(n)}\right)^{\mathsf{T}}$, where $S_X^{(n)}$ and $S_Y^{(n)}$ are the mode-$n$ scatter matrices of the original and new samples around their own means. It is worth noting that when new samples become available, there is no need to recompute the mode-$n$ covariance matrix over all training samples; we only have to compute the mode-$n$ covariance matrix of the newly added samples and the term involving the difference between the original mean and the new-sample mean. However, as in traditional incremental PCA, the eigendecomposition of $S'^{(n)}$ must be repeated whenever new samples are added. This repeated eigendecomposition causes heavy computational cost, which is called "the eigendecomposition updating problem." For traditional vector-based incremental learning, the updated-SVD technique was proposed in [25] to fit the eigendecomposition. This paper introduces the updated-SVD technique into tensor-based incremental learning.
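The multiple-sample case can be sketched the same way (a NumPy illustration under the same assumptions; `merge_stats` is a hypothetical name): the stored statistics are merged with the statistics of the new batch plus a cross term from the shift between the two means.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding of a tensor.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def merge_stats(mean_x, S_x, M, new_samples, mode):
    """Merge the mode-n statistics of M old samples with a batch of L new
    samples, touching only the new batch and the stored statistics."""
    L = new_samples.shape[0]
    mean_y = new_samples.mean(axis=0)
    mean_new = (M * mean_x + L * mean_y) / (M + L)
    # Scatter of the new batch around its own mean
    S_y = np.zeros_like(S_x)
    for Y in new_samples:
        D = unfold(Y - mean_y, mode)
        S_y += D @ D.T
    # Cross term from the shift between the two sample means
    d = unfold(mean_y - mean_x, mode)
    S_new = S_x + S_y + (M * L / (M + L)) * (d @ d.T)
    return mean_new, S_new
```

With $L = 1$ the cross-term weight $ML/(M+L)$ reduces to the $M/(M+1)$ factor of the single-sample update.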
For the original samples, the mode-$n$ covariance matrix can be written as $S_X^{(n)} = D_X D_X^{\mathsf{T}}$, where $D_X = \left[X_{1(n)} - \bar{X}_{(n)}, \ldots, X_{M(n)} - \bar{X}_{(n)}\right]$. From the singular value decomposition $D_X = U \Sigma V^{\mathsf{T}}$, we get $S_X^{(n)} = U \Sigma^{2} U^{\mathsf{T}}$. So it is easy to see that the eigenvectors of $S_X^{(n)}$ are the left singular vectors of $D_X$ and its eigenvalues are the squares of the corresponding singular values.
For the new samples, the mode-$n$ covariance matrix is likewise $S_Y^{(n)} = D_Y D_Y^{\mathsf{T}}$, where $D_Y = \left[Y_{1(n)} - \bar{Y}_{(n)}, \ldots, Y_{L(n)} - \bar{Y}_{(n)}\right]$. According to (35), the updated mode-$n$ covariance matrix is built from $D_X$, $D_Y$, and the weighted mean-difference term, so the updated projective matrix consists of the eigenvectors corresponding to the largest eigenvalues of this updated matrix. The main steps of incremental tensor principal component analysis are as follows. Input: original samples and newly added samples. Output: projective matrices.
Step 1. Compute and save the mode-$n$ bases and singular values of the original samples.
Step 2. For $n = 1, \ldots, N$: perform the QR decomposition of the residual of the new data with respect to the current mode-$n$ basis; perform the SVD of the resulting small core matrix; compute the updated singular values; then compute the updated projective matrix from the rotated and truncated extended basis. End for.
Step 3. Repeat the above steps until the incremental learning is finished.
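The per-mode update in Step 2 can be sketched with a Brand-style updated-SVD (a minimal NumPy illustration, not the authors' code; `update_basis` and its arguments are names chosen here):

```python
import numpy as np

def update_basis(U, s, B, k):
    """Updated-SVD sketch: fold new data columns B (d x p) into the current
    rank-k basis U (d x k) with singular values s, keeping the top k."""
    proj = U.T @ B                          # coordinates of B in the current basis
    resid = B - U @ proj                    # component of B outside the basis
    Q, R = np.linalg.qr(resid)              # Step 2a: QR of the residual
    K = np.block([[np.diag(s), proj],
                  [np.zeros((Q.shape[1], len(s))), R]])
    Uk, sk, _ = np.linalg.svd(K)            # Step 2b: SVD of the small core matrix
    U_new = np.hstack([U, Q]) @ Uk[:, :k]   # Step 2c: rotate extended basis, truncate
    return U_new, sk[:k]
```

Because $[A, B] = [U, Q]\,K\,\operatorname{diag}(V^{\mathsf{T}}, I)$ with orthonormal factors on both sides, the singular values of the small core $K$ are exactly those of the extended data matrix; this is why the update never has to revisit the original samples.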
3.3. The Complexity Analysis
For a tensor dataset $\{\mathcal{X}_m\}_{m=1}^{M}$ with $\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, without loss of generality it is assumed that all dimensions are equal, that is, $I_1 = I_2 = \cdots = I_N = I$.
Vector-based PCA converts every sample into a vector and constructs a data matrix of size $I^{N} \times M$. For vector-based PCA, the main computational cost contains three parts: the computation of the covariance matrix, the eigendecomposition of the covariance matrix, and the computation of the low-dimensional features. The time complexity of computing the covariance matrix is $O(M I^{2N})$, the time complexity of the eigendecomposition is $O(I^{3N})$, and the time complexity of computing the low-dimensional features is $O(M P I^{N})$, where $P$ is the number of retained components.
Letting the number of iterations be 1, the time complexity of computing the mode-$n$ covariance matrices for MPCA is $O(N M I^{N+1})$, the time complexity of the eigendecompositions is $O(N I^{3})$, and the time complexity of computing the low-dimensional features is $O(N M I^{N+1})$, so the total time complexity is $O(N M I^{N+1} + N I^{3})$. Considering the time complexity, MPCA is superior to PCA.
For ITPCA, suppose incremental datasets are added continually. MPCA would have to recompute the mode-$n$ covariance matrices and conduct eigendecompositions over the initial dataset plus every incremental dataset: the more training samples there are, the higher the time complexity. If the updated-SVD is used, we only need to compute a QR decomposition of the residual of the new data and an SVD of a small core matrix, whose sizes depend on the dimensions of the unfoldings and the kept rank. It can be seen that the time complexity of the updated-SVD has nothing to do with the number of previously accumulated training samples.
Taking the space complexity into account, if the training samples are reduced to a low-dimensional space of dimension $P$, then PCA needs $O(I^{N} P)$ bytes to save its projective matrix while MPCA needs only $O(N I P)$ bytes for its $N$ projective matrices. So MPCA has lower space complexity than PCA. For incremental learning, both PCA and MPCA need $O(M I^{N})$ bytes to save the initial training samples, whereas ITPCA only needs $O(N I^{2})$ bytes to keep the mode-$n$ covariance matrices.
4. Experimental Results
In this section, handwritten digit recognition experiments on the USPS image dataset are conducted to evaluate the performance of incremental tensor principal component analysis. The USPS handwritten digit dataset has 9298 images of the digits zero to nine, as shown in Figure 1. Each image has size $16 \times 16$. In this paper, we choose 1000 images and divide them into initial training samples, newly added samples, and test samples. Furthermore, the nearest neighbor classifier is employed to classify the low-dimensional features. The recognition results are compared with the PCA, IPCA, and MPCA algorithms.
At first, we choose 70 samples from each of four classes as the initial training samples. At each incremental learning stage, 70 samples from each of two further classes are added, so after three stages the training samples cover all ten classes with 70 samples per class. The remaining samples are used as the testing dataset. All algorithms are implemented in MATLAB 2010 on an Intel(R) Core(TM) i5-3210M CPU @ 2.5 GHz with 4 GB RAM.
Firstly, 36 principal components (PCs) are preserved and fed into the nearest neighbor classifier to obtain the recognition results, which are plotted in Figure 2. It can be seen that MPCA and ITPCA are better than PCA and IPCA for initial learning; the probable reason is that MPCA and ITPCA employ tensor representation, which preserves structural information.
The recognition results at the different learning stages are shown in Figures 3, 4, and 5. The recognition results of the four methods fluctuate violently when the number of low-dimensional features is small; as the number of features increases, the recognition performance becomes stable. In general, MPCA and ITPCA are superior to PCA and IPCA. Although ITPCA and MPCA have comparable performance during the first two learning stages, ITPCA begins to surpass MPCA after the third stage. Figure 6 gives the best recognition rates of the different methods, from which we can draw the same conclusions as from Figures 3, 4, and 5.
The time and space complexity of the different methods are shown in Figures 7 and 8, respectively. Considering time complexity, at the initial learning stage PCA has the lowest cost. As new samples are added, the time complexity of PCA and MPCA grows greatly while that of IPCA and ITPCA remains stable, and ITPCA grows more slowly than MPCA. The reason is that ITPCA performs incremental learning based on the updated-SVD technique and avoids decomposing the mode-$n$ covariance matrices of the original samples again. Considering space complexity, ITPCA has the lowest space complexity among the four compared methods.
5. Conclusions
This paper presents incremental tensor principal component analysis based on the updated-SVD technique to take full advantage of the redundancy of spatial structure information and of on-line learning. Furthermore, this paper proves that PCA and 2DPCA are special cases of MPCA and that all of them can be unified into the graph embedding framework. This paper also analyzes incremental learning based on a single sample and on multiple samples in detail. The experiments on handwritten digit recognition demonstrate that principal component analysis based on tensor representation is superior to principal component analysis based on vector representation. Although MPCA has better recognition performance than ITPCA at the initial learning stage, the learning capability of ITPCA improves gradually and eventually exceeds that of MPCA. Moreover, even as new samples are added, the time and space complexity of ITPCA grow slowly.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work has been funded with support from the National Natural Science Foundation of China (61272448), the Doctoral Fund of the Ministry of Education of China (20110181130007), and the Young Scientist Project of Chengdu University (no. 2013XJZ21).
- H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “Uncorrelated multilinear discriminant analysis with regularization and aggregation for tensor object recognition,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 103–123, 2009.
- C. Liu, K. He, J.-L. Zhou, and C.-B. Gao, “Discriminant orthogonal rank-one tensor projections for face recognition,” in Intelligent Information and Database Systems, N. T. Nguyen, C.-G. Kim, and A. Janiak, Eds., vol. 6592 of Lecture Notes in Computer Science, pp. 203–211, 2011.
- G.-F. Lu, Z. Lin, and Z. Jin, “Face recognition using discriminant locality preserving projections based on maximum margin criterion,” Pattern Recognition, vol. 43, no. 10, pp. 3572–3579, 2010.
- D. Tao, X. Li, X. Wu, and S. J. Maybank, “General tensor discriminant analysis and Gabor features for gait recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1700–1715, 2007.
- F. Nie, S. Xiang, Y. Song, and C. Zhang, “Extracting the optimal dimensionality for local tensor discriminant analysis,” Pattern Recognition, vol. 42, no. 1, pp. 105–114, 2009.
- Z.-Z. Yu, C.-C. Jia, W. Pang, C.-Y. Zhang, and L.-H. Zhong, “Tensor discriminant analysis with multiscale features for action modeling and categorization,” IEEE Signal Processing Letters, vol. 19, no. 2, pp. 95–98, 2012.
- S. J. Wang, J. Yang, M. F. Sun, X. J. Peng, M. M. Sun, and C. G. Zhou, “Sparse tensor discriminant color space for face verification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 876–888, 2012.
- J. L. Minoi, C. E. Thomaz, and D. F. Gillies, “Tensor-based multivariate statistical discriminant methods for face applications,” in Proceedings of the International Conference on Statistics in Science, Business, and Engineering (ICSSBE '12), pp. 1–6, September 2012.
- N. Tang, X. Gao, and X. Li, “Tensor subclass discriminant analysis for radar target classification,” Electronics Letters, vol. 48, no. 8, pp. 455–456, 2012.
- H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “A survey of multilinear subspace learning for tensor data,” Pattern Recognition, vol. 44, no. 7, pp. 1540–1551, 2011.
- H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “MPCA: multilinear principal component analysis of tensor objects,” IEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 18–39, 2008.
- S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
- R. Plamondon and S. N. Srihari, “On-line and off-line handwriting recognition: a comprehensive survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63–84, 2000.
- C. M. Johnson, “A survey of current research on online communities of practice,” Internet and Higher Education, vol. 4, no. 1, pp. 45–60, 2001.
- P. Hall, D. Marshall, and R. Martin, “Merging and splitting eigenspace models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 9, pp. 1042–1049, 2000.
- J. Sun, D. Tao, S. Papadimitriou, P. S. Yu, and C. Faloutsos, “Incremental tensor analysis: theory and applications,” ACM Transactions on Knowledge Discovery from Data, vol. 2, no. 3, article 11, 2008.
- J. Wen, X. Gao, Y. Yuan, D. Tao, and J. Li, “Incremental tensor biased discriminant analysis: a new color-based visual tracking method,” Neurocomputing, vol. 73, no. 4–6, pp. 827–839, 2010.
- J.-G. Wang, E. Sung, and W.-Y. Yau, “Incremental two-dimensional linear discriminant analysis with applications to face recognition,” Journal of Network and Computer Applications, vol. 33, no. 3, pp. 314–322, 2010.
- X. Qiao, R. Xu, Y.-W. Chen, T. Igarashi, K. Nakao, and A. Kashimoto, “Generalized N-Dimensional Principal Component Analysis (GND-PCA) based statistical appearance modeling of facial images with multiple modes,” IPSJ Transactions on Computer Vision and Applications, vol. 1, pp. 231–241, 2009.
- H. Kong, X. Li, L. Wang, E. K. Teoh, J.-G. Wang, and R. Venkateswarlu, “Generalized 2D principal component analysis,” in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '05), vol. 1, pp. 108–113, August 2005.
- D. Zhang and Z.-H. Zhou, “(2D)2 PCA: two-directional two-dimensional PCA for efficient face representation and recognition,” Neurocomputing, vol. 69, no. 1–3, pp. 224–231, 2005.
- J. Ye, “Generalized low rank approximations of matrices,” Machine Learning, vol. 61, no. 1–3, pp. 167–191, 2005.
- J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, “Two-dimensional PCA: a new approach to appearance-based face representation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
- J. Yang and J.-Y. Yang, “From image vector to matrix: a straightforward image projection technique-IMPCA vs. PCA,” Pattern Recognition, vol. 35, no. 9, pp. 1997–1999, 2002.
- J. Kwok and H. Zhao, “Incremental eigen decomposition,” in Proceedings of the International Conference on Artificial Neural Networks (ICANN '03), pp. 270–273, Istanbul, Turkey, June 2003.
- P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: recognition using class specific linear projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711–720, 1997.
Copyright © 2014 Chang Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.