Abstract
Conventional incremental PCA methods usually only discuss the case of adding samples. In this paper, we consider two further cases: deleting samples, and simultaneously adding and deleting samples. To avoid the NP-hard problem of downdating the SVD without right singular vectors and specific position information, we choose to use EVD instead of SVD, which is used by most IPCA methods. First, we propose an EVD updating and downdating algorithm, called EVD dualdating, which permits simultaneous arbitrary adding and deleting operations by transforming the EVD of the covariance matrix into an SVD updating problem plus an EVD of a small autocorrelation matrix. A comprehensive analysis is delivered to express the essence, extendibility, and computational complexity of EVD dualdating. A mathematical theorem proves that if the whole data matrix satisfies the low-rank-plus-shift structure, EVD dualdating is an optimal rank-k estimator in the sequential environment. A selection method based on eigenvalues is presented to determine the optimal rank k of the subspace. Then, we propose three incremental/decremental PCA methods, EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA, which adapt to a varying mean. Finally, extensive comparative experiments demonstrate that EVDD-based methods outperform conventional incremental/decremental PCA methods in both efficiency and accuracy.
1. Introduction
Principal component analysis (PCA) [1], also known as subspace learning or the Karhunen-Loeve transform [2], has been an active topic in the machine learning and pattern recognition communities over the last several decades. As a well-known unsupervised linear dimension reduction and multivariate analysis method, PCA has been applied to biometric recognition [3], gene classification [4], latent semantic indexing [5], and visual tracking [6].
To obtain the optimal orthonormal basis, which gives PCA the minimal reconstruction error, batch-mode PCA can be computed in two ways: the eigenvalue decomposition (EVD) of the data covariance matrix, or the singular value decomposition (SVD) of the data matrix. Both approaches have a high computational cost and a large storage demand for high-dimensional and large-scale datasets. In practical applications, not all observations are available before training; especially in online usage, samples arrive sequentially over time. In these situations, batch-mode PCA cannot meet the demand for real-time processing, because it must recompute the EVD or SVD of the whole data every time.
To solve this issue, incremental learning, whose task is to update the learning results without re-executing the whole process when new data points are added, has been investigated for more than two decades in both the applied mathematics and machine learning communities. Various effective incremental PCA (IPCA) methods have been proposed.
In a period of knowledge explosion, fast-growing information is often contaminated with fake, invalid, or expired data. The presence of a few deviating samples can severely contaminate the learned model, such as the principal directions in PCA. Overdue instances, which to some degree can be regarded as outliers compared with unexpired instances, can reduce the accuracy of the data model. Therefore, for an intelligent learning system, the ability to admit new instances is not enough; the capability to eliminate aberrant samples is also necessary. This is the aim of decremental learning. Compared with IPCA, decremental PCA (DPCA) has not received adequate attention in the literature, and only a few methods have been proposed in the last ten years. Moreover, to the best of our knowledge, there is no incremental-decremental algorithm for subspace learning. Similar work exists only for the support vector machine (SVM) [7]: Cauwenberghs and Poggio [8] propose an incremental-decremental method for adding and/or deleting a single sample, and Karasuyama and Takeuchi [9] extend it to the case of multiple instances.
Because PCA is, in mathematical form, essentially an SVD or EVD, the tasks of incremental PCA and decremental PCA are equivalent to updating and downdating the SVD or EVD. Most existing IPCA approaches adopt similar strategies based on updating the SVD. However, these SVD-based tactics may be impossible to implement for decremental PCA. Lorenzelli and Yao [10] point out that SVD downdating is NP-hard without knowledge of the right singular vectors. Hall et al. [11] argue that the right singular vectors of the remaining matrix cannot be computed without accessing the elements of the right singular vectors of the original matrix. Moreover, in many practical applications, such as subspace learning [12, 13], image reconstruction [14], face recognition [15], and visual tracking [16], only the left singular vectors are needed as the projection matrix, so the right singular vectors are usually not stored, to save memory. If neither the data matrix nor the right singular vectors are preserved, the positions of the deleted points in the queue may be unknown in the decremental case, which makes the right singular vectors incomputable. This problem of incomplete position information does not arise in incremental PCA, because new instances are, by convention, appended to the tail of the queue.
Based on the demand for incremental-decremental learning and the difficulty of decremental learning analyzed above, we introduce a novel online subspace method for simultaneous incremental-decremental learning. The contributions of this paper are as follows.
(1) To avoid the problem of lacking right singular vectors in decremental learning, we use EVD instead of SVD and propose a dualdating algorithm for the eigenspace, namely, EVD dualdating, which can add and delete samples at the same time. Our algorithm transforms the EVD updating and downdating of the covariance matrix into an SVD updating problem plus an EVD of a small autocorrelation matrix. To the best of our knowledge, it is the first attempt at simultaneous incremental-decremental subspace learning, and it has a simpler and unified mathematical form, which theoretically guarantees better performance than the conventional multiple-step implementation.
(2) Several theoretical and computational analyses are presented to further explore the properties of EVD dualdating, including its essence and geometric interpretation, its extended forms for data revising and weighted updating, its computational complexity, a theorem demonstrating its optimality in the sequential mode when the data matrix satisfies the low-rank-plus-shift structure, and an eigenvalue-based method for selecting the optimal rank.
(3) It is proved that the change of mean caused by adding or deleting samples in varying-mean PCA can be transformed into adding and deleting several equivalent vectors in zero-mean PCA. Thus, three online PCA algorithms are derived from EVD dualdating to cope with a changing mean: incremental PCA (EVDD-IPCA), decremental PCA (EVDD-DPCA), and incremental-decremental PCA (EVDD-IDPCA).
The remainder of this paper is organized as follows. Section 2 briefly reviews the updating and downdating methods of both SVD and EVD, and incremental PCA. The proposed EVD dualdating algorithm and its analyses are presented in Section 3. In Section 4, EVD dualdating is applied to incremental-decremental PCA with mean updating. Section 5 presents the experimental results and comparisons with other approaches. Section 6 concludes this paper. Finally, proofs of lemmas and theorems are given in the Appendix.
2. Related Work
Over the past few decades, many efficient incremental PCA methods have been proposed. Generally, existing IPCA algorithms can be divided into three categories. The first category updates eigenvectors without any matrix decomposition; the typical method is candid covariance-free IPCA (CCIPCA) [17]. The second category updates principal components via EVD updating; the subspace merging and splitting model developed by Hall et al. [12] belongs to this category. The third category, which is the most studied, recomputes singular values and singular vectors by sequentially updating the SVD, with the help of the partitioned R-SVD [14] and SVD updating [18].
Weng et al. [17] propose an incremental PCA method that does not compute the covariance matrix, candid covariance-free IPCA (CCIPCA). CCIPCA computes principal components sequentially and considers the complementary space of the lower-order PCs when calculating higher-order PCs. Because the computation of the $i$th PC depends on the $(i-1)$th one, errors accumulate throughout the whole process. Moreover, the sample-to-dimension ratio needs to be large enough to avoid problems of statistical estimation, a condition that is not satisfied in many situations.
Hall et al. [12] develop a merging and splitting eigenspace model (MSES). This algorithm is an online subspace learning algorithm based on EVD, which solves a small eigenproblem on a new orthonormal basis. MSES can update or downdate the EVD by adding or subtracting the eigenspace of the added or deleted samples, and it adapts to changes in the data mean.
Apart from these two approaches, other incremental PCA methods are based primarily on SVD. Levy and Lindenbaum [19] propose the sequential Karhunen-Loeve (SKL) method based on the partitioned R-SVD algorithm, which reduces the SVD of a large data matrix to the SVDs of several small ones via a sequential procedure. This sequentialized partitioned R-SVD algorithm is then used to extract PCs from a sequence of human face images, and a forgetting factor is employed to weaken the effect of old data. However, SKL does not take the data mean into consideration, so its result is not accurate enough for an image sequence with a varying mean, such as a human face under changing illumination. Skočaj and Leonardis [13] develop the weighted and robust incremental subspace learning (WRISL) algorithm, which can handle a changing mean and weighted data. However, WRISL does not consider the chunk updating mode; in other words, only one sample can be processed in each round. Mean updating for multiple samples is solved by Ross et al. [16], who show that, when the mean is taken into account, the covariance matrix of the combined data equals the sum of the covariance matrices of the old data and the new data plus an additional rank-one term built from a single vector. Based on this, Ross et al. obtain an extended SKL algorithm with mean updating, which is applied to visual tracking and successfully tracks a human face and a toy in different poses under varying background and illumination, both indoors and outdoors.
Zha and Simon [18] propose a more general mathematical formulation for updating the SVD, namely, SVD updating. This algorithm, which is applied to LSI, is an efficient incremental method for recalculating the rank-$k$ SVD when updating documents, updating terms, and correcting term weights. Moreover, Zha and Simon prove that if the combined data matrix satisfies the low-rank-plus-shift structure, the result of SVD updating with the new data and the optimal rank-$k$ approximation of the old data is still an optimal rank-$k$ estimate. Zhao et al. [20] propose a chunk incremental PCA approach via the SVD updating algorithm, known as SVDU-IPCA. Compared with other incremental PCA methods, SVDU-IPCA computes the eigendecomposition of the autocorrelation matrix instead of the covariance matrix. The motivation is that the number of samples is usually much smaller than the data dimension in practical applications, so the autocorrelation matrix is also smaller than the covariance matrix. Zhao et al. then find a strategy to update the eigendecomposition of the autocorrelation matrix by SVD updating. However, the change of mean is not considered in SVDU-IPCA, so it is not suitable for situations with a changing mean. Moreover, it suffers from a growing demand for storage and computation, because the autocorrelation matrix grows with the new data, and an additional process is needed to convert the resulting right singular vectors and the stored whole data into principal components. Huang et al. [15] propose an improved SVDU-IPCA method that handles data with a changing mean and reduces storage, where only a small package of condensed data is saved to calculate the left singular vectors.
Although a great deal of research has been done on incremental subspace learning, research on decremental learning is still inadequate in the literature. The merging and splitting eigenspace model developed by Hall et al. [12] can downdate the EVD to recompute PCs when some samples are deleted from the old data. Meanwhile, they claim that SVD downdating cannot be achieved in closed form with their model. Brand [21] proposes a fast modification model of the rank-$k$ singular value decomposition (MSVD). As an extension of the term-weight-correction form of SVD updating, MSVD can recompute the rank-$k$ SVD of the modified data matrix after updating, downdating, revising, and recentering. However, this method does not take the mean into consideration, so its result is not accurate when the data mean is time-varying. Melenchón and Martínez [22] develop a method for downdating, composing, and splitting the SVD (DCS-SVD) with a changing mean. DCS-SVD first downdates, composes, or splits the right singular vectors, then computes the mean and the SVD of the remaining right singular vectors, and finally calculates the resulting SVD. However, this method suffers from a severe efficiency problem, since its core step is the SVD of a matrix at a computational cost that still depends on the data dimension. AIPCA, proposed by Wang et al. [23], is a decremental version of SVDU-IPCA that recomputes the eigendecomposition of the autocorrelation matrix by MSVD. Although AIPCA achieves decremental subspace learning, it inherits the disadvantages of SVDU-IPCA and MSVD, such as the inability to handle a changing mean, a large memory requirement for preserving the data matrix, and an additional process for converting its results into left singular vectors.
Besides accuracy and efficiency, the most severe problem faced by SVD-based decremental methods is that downdating is NP-hard without the position information of the deleted samples in the data matrix, which may not be obtainable in many practical applications.
3. EVD Dualdating
In Section 3.1, we first briefly review the SVD updating [18] algorithm. Our EVD dualdating algorithm is proposed in Section 3.2. Then, the related analyses are reported in the following sections.
3.1. SVD Updating
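Given a data matrix $A \in \mathbb{R}^{d\times n}$, its SVD is $A = U\Sigma V^T$, where $U \in \mathbb{R}^{d\times d}$, $\Sigma \in \mathbb{R}^{d\times n}$, and $V \in \mathbb{R}^{n\times n}$. The best rank-$k$ approximation of $A$ is
$$A_k = U_k\Sigma_kV_k^T, \tag{1}$$
where $\Sigma_k$ is the diagonal matrix with the $k$ largest singular values, and $U_k$ and $V_k$ are the first $k$ columns of $U$ and $V$, respectively. We call (1) the rank-$k$ singular value decomposition (rank-$k$ SVD) of $A$.
When $p$ new samples $B \in \mathbb{R}^{d\times p}$ arrive, how can the rank-$k$ SVD of the new data matrix $[A_k, B]$ be computed using only $U_k$, $\Sigma_k$, $V_k$, and $B$?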
To solve this problem, Zha and Simon [18] propose an efficient mathematical tool, namely, SVD updating. Its detailed procedure is described in Algorithm 1.
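The SVD updating procedure can be sketched in NumPy as below. This is a minimal illustration of the Zha-Simon steps as they are usually stated (a QR decomposition of the part of $B$ outside $\mathrm{span}(U_k)$, followed by the SVD of a small $(k+p)\times(k+p)$ middle matrix), not a transcription of Algorithm 1; the function name `svd_update` and its argument layout (singular values passed as a 1-D array `s_k`) are hypothetical.

```python
import numpy as np


def svd_update(U_k, s_k, V_k, B, k):
    """Rank-k SVD of [U_k diag(s_k) V_k^T, B] by Zha-Simon SVD updating.

    U_k: d x r, s_k: (r,), V_k: n x r, B: d x p new columns.
    Returns U (d x k), s (k,), V ((n + p) x k).
    """
    r, p = s_k.size, B.shape[1]
    UtB = U_k.T @ B                         # coefficients of B in span(U_k)
    Q, R = np.linalg.qr(B - U_k @ UtB)      # orthonormal basis of the residual
    # Small (r+p) x (r+p) middle matrix [[diag(s_k), U_k^T B], [0, R]]
    K = np.block([[np.diag(s_k), UtB],
                  [np.zeros((p, r)), R]])
    Uh, sh, Vht = np.linalg.svd(K)
    Uh, sh, Vh = Uh[:, :k], sh[:k], Vht[:k].T   # keep the k leading triplets
    U = np.hstack([U_k, Q]) @ Uh                # rotate the extended basis
    n = V_k.shape[0]
    W = np.block([[V_k, np.zeros((n, p))],
                  [np.zeros((p, r)), np.eye(p)]])
    return U, sh, W @ Vh
```

Only the small middle matrix is decomposed, so the cost of the SVD itself does not grow with the number of stored samples; the right singular vectors are carried along here only because this classic form of SVD updating returns them.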

3.2. EVD Dualdating
In this section, a thorough discussion of the proposed dualdating algorithm for EVD is presented. Dualdating means updating and downdating together; in other words, we consider the situation of adding and deleting samples simultaneously.
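Given a data matrix $A \in \mathbb{R}^{d\times n}$, its SVD is $A = U\Sigma V^T$. Let the covariance matrix be $C = AA^T$ (in this section, we do not distinguish between the covariance matrix and the scatter matrix, since they differ only by the coefficient $1/n$); then $C$ is a symmetric positive semidefinite matrix, and its EVD is $C = U\Lambda U^T$, where $\Lambda = \Sigma\Sigma^T$. The best rank-$k$ approximation of $C$ is
$$C_k = U_k\Lambda_kU_k^T, \tag{2}$$
where $U_k$ is the first $k$ columns of $U$ and $\Lambda_k$ is a diagonal matrix with the $k$ largest eigenvalues in $\Lambda$. For any matrix $A$, we call (2) the rank-$k$ eigenvalue decomposition (rank-$k$ EVD) of $A$ or $C$.
Now some old samples $D \in \mathbb{R}^{d\times q}$ are to be deleted, where $D$ can be composed of arbitrary columns of $A$. Without loss of generality, let $D$ be the last $q$ columns: $A = [\underline{A}, D]$. Meanwhile, $p$ new instances are available: $B \in \mathbb{R}^{d\times p}$. We are interested in how to express the rank-$k$ EVD of the final data matrix $\tilde{A} = [\underline{A}, B]$ as modifications to $U_k$ and $\Lambda_k$, via $B$ and $D$.
3.2.1. Basic Procedure
The basic procedure of the proposed EVD dualdating algorithm is as follows. Let
$$E = [A, B, D], \qquad J = \mathrm{diag}(I_n, I_p, -I_q). \tag{3}$$
Thus, the covariance matrix of $\tilde{A}$ can be written as
$$\tilde{C} = \tilde{A}\tilde{A}^T = AA^T + BB^T - DD^T \tag{4}$$
$$= EJE^T. \tag{5}$$
The basic idea of EVD dualdating is to transform the dualdating problem into an SVD updating problem plus an extra step with a small computational cost. Firstly, consider the matrix $E$. Knowing $U_k$ and $\Sigma_k = \Lambda_k^{1/2}$, we assume that the right singular vectors in the rank-$k$ SVD of $A$ are $V_k$, so that $A_k = U_k\Sigma_kV_k^T$. Then, the SVD of $[A_k, B, D]$ can be calculated by the SVD updating algorithm:
$$[A_k, B, D] = U_E\Sigma_EV_E^T, \tag{6}$$
where $U_E \in \mathbb{R}^{d\times r}$, $\Sigma_E \in \mathbb{R}^{r\times r}$, and $V_E \in \mathbb{R}^{(n+p+q)\times r}$, with $r = k + p + q$ before truncation.
Let
$$M = \Sigma_EV_E^TJV_E\Sigma_E, \quad \text{so that} \quad \tilde{C} \approx U_EMU_E^T. \tag{7}$$
Because $\tilde{C}$ is a symmetric positive semidefinite matrix, $M$ is also symmetric positive semidefinite. Usually $r = k + p + q \ll d$, so $M$ is a small matrix compared with the $d\times d$ covariance matrix of $\tilde{A}$. The EVD of $M$ is
$$M = W\Lambda_MW^T, \tag{8}$$
where $W \in \mathbb{R}^{r\times r}$ is orthogonal and $\Lambda_M$ is the diagonal eigenvalue matrix.
Finally, the rank-$k$ EVD of $\tilde{C}$ is
$$\tilde{C}_k = \tilde{U}_k\tilde{\Lambda}_k\tilde{U}_k^T, \qquad \tilde{U}_k = U_EW_k, \quad \tilde{\Lambda}_k = (\Lambda_M)_k, \tag{9}$$
where $W_k$ denotes the first $k$ columns of $W$ and $(\Lambda_M)_k$ the diagonal matrix of the $k$ largest eigenvalues of $M$.
By (3) to (9), we have converted the dualdating problem of EVD into an SVD updating problem of adding samples plus the EVD of a small matrix.
3.2.2. Further Simplification
Although the basic procedure of our EVD dualdating algorithm has been given, one problem remains: the assumed right singular vectors $V_k$ are unobtainable. Here we address this problem by simplifying the computation of $M$.
Consider the results of the SVD updating algorithm applied to $[A_k, B, D]$ using the rank-$k$ SVD of $A$:
$$[A_k, B, D] = [U_k, Q]\begin{bmatrix}\Sigma_k & U_k^T[B, D]\\ 0 & R\end{bmatrix}\begin{bmatrix}V_k & 0\\ 0 & I_{p+q}\end{bmatrix}^T, \qquad K := \begin{bmatrix}\Sigma_k & U_k^T[B, D]\\ 0 & R\end{bmatrix} = \hat{U}\hat{\Sigma}\hat{V}^T, \tag{10}$$
where $(I - U_kU_k^T)[B, D] = QR$ is a QR decomposition, so that $U_E = [U_k, Q]\hat{U}$, $\Sigma_E = \hat{\Sigma}$, and $V_E = \mathrm{diag}(V_k, I_{p+q})\hat{V}$. Since $V_k^TV_k = I_k$ and the first diagonal block of $J$ is $I_n$,
$$M = \Sigma_EV_E^TJV_E\Sigma_E = \hat{\Sigma}\hat{V}^T\hat{J}\hat{V}\hat{\Sigma}, \qquad \hat{J} = \mathrm{diag}(I_k, I_p, -I_q). \tag{11}$$
From (11), it can be seen that the right singular vectors $V_k$ are not actually needed, and the computation of $M$ only involves $(k+p+q)\times(k+p+q)$ matrices.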
3.2.3. Algorithm
The detailed procedure of EVD dualdating has been presented above. To sum up, the pseudocode of our EVD dualdating algorithm is described in Algorithm 2. To achieve pure updating or pure downdating of the EVD, it suffices to let the set of deleted samples or the set of added samples, respectively, be empty. As the computation shows, in the pure updating mode EVD dualdating degenerates into the standard SVD updating.
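The following NumPy sketch strings steps (3) to (11) together. It is an illustration under stated assumptions, not a transcription of Algorithm 2: the stored model is assumed to be the eigenvectors `U_k` and eigenvalues `lam_k` (so the singular values are their square roots), truncation to `k` is applied only once at the end, and the function name `evd_dualdate` is hypothetical. The right singular vectors of the old data are never formed; only the SVD of the small middle matrix of SVD updating enters the final eigenproblem.

```python
import numpy as np


def evd_dualdate(U_k, lam_k, B, D, k):
    """Rank-k EVD of C + B B^T - D D^T, given C ~= U_k diag(lam_k) U_k^T.

    U_k: d x r orthonormal, lam_k: (r,) nonnegative eigenvalues,
    B: d x p added samples, D: d x q deleted samples (p or q may be 0).
    """
    r, p, q = lam_k.size, B.shape[1], D.shape[1]
    X = np.hstack([B, D])                    # everything is "added" first
    UtX = U_k.T @ X
    Q, R = np.linalg.qr(X - U_k @ UtX)       # residual basis, as in SVD updating
    K = np.block([[np.diag(np.sqrt(np.maximum(lam_k, 0.0))), UtX],
                  [np.zeros((p + q, r)), R]])
    Uh, sh, Vht = np.linalg.svd(K)
    # Diagonal of J_hat = diag(I_r, I_p, -I_q): -1 marks the deleted columns
    j = np.concatenate([np.ones(r + p), -np.ones(q)])
    Y = Vht.T * sh                           # V_hat @ diag(sigma_hat)
    M = Y.T @ (j[:, None] * Y)               # small symmetric matrix of (11)
    mu, W = np.linalg.eigh(M)                # eigenvalues in ascending order
    order = np.argsort(mu)[::-1][:k]         # k largest eigenvalues
    U_new = np.hstack([U_k, Q]) @ (Uh @ W[:, order])
    return U_new, mu[order]
```

An empty `D` of shape `(d, 0)` gives pure updating, and an empty `B` gives pure downdating. If the residual of a column is numerically zero (for example, when a deleted sample lies in the stored subspace), the matching column of `Q` is arbitrary, but it only enters eigenvectors whose eigenvalues are near zero, which are normally removed by the rank-k truncation.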

3.3. Analysis of EVD Dualdating
In this section, we first analyze the mechanism of EVD dualdating for incremental and decremental learning. Second, some extended forms of EVD dualdating are given for particular uses. Third, the computational complexity of the proposed EVD dualdating algorithm is presented. Fourth, the optimality of EVD dualdating in sequential usage is demonstrated. Finally, we discuss how to determine the optimal rank $k$.
3.3.1. Mechanism of Incremental and Decremental Learning
For convenience, when analyzing the essence of incremental and decremental learning based on EVD dualdating, we only consider the pure updating or downdating situation and denote the changed matrix (the added or deleted samples) as $X \in \mathbb{R}^{d\times m}$ in both situations.
According to the procedure of EVD dualdating, the two key decompositions are the SVD updating of the equivalent adding matrix $[A_k, X]$ and the EVD of the small matrix $M$. In the SVD updating algorithm, the core step is the SVD of the following matrix:
$$K = \begin{bmatrix}\Sigma_k & U_k^TX\\ 0 & R\end{bmatrix}, \tag{12}$$
where $\Sigma_k$ is the diagonal singular value matrix of the original data $A$ (its squares are the variances along $U_k$), $U_k^TX$ is the coefficient matrix obtained by projecting $X$ onto the subspace spanned by $U_k$, and $R$ is the upper triangular reconstruction error matrix of $X$, from $(I - U_kU_k^T)X = QR$. Therefore, in $K$, the left $k$ columns represent the original data, and the remaining $m$ columns represent the added or deleted data.
Then, with $K = \hat{U}\hat{\Sigma}\hat{V}^T$, divide the rows of $\hat{V}\hat{\Sigma}$ into two partitions:
$$\hat{V}\hat{\Sigma} = \begin{bmatrix}Y_1\\ Y_2\end{bmatrix}, \tag{13}$$
where $Y_1$ are the first $k$ rows and $Y_2$ are the last $m$ rows of $\hat{V}\hat{\Sigma}$. Thus, $Y_1$ and $Y_2$ stand for the old data and the changed data, respectively.
So the matrix $M$ can be written as
$$M = \hat{\Sigma}\hat{V}^T\hat{J}\hat{V}\hat{\Sigma} = Y_1^TY_1 \pm Y_2^TY_2, \tag{14}$$
where the sign is $+$ for adding and $-$ for deleting.
Now, let us observe the situation from a geometric viewpoint, as shown in Figure 1. On the left is the column space of the data. The red arrows represent the orthogonal basis $U_k$ of the old subspace. The green arrow is the set of added or deleted samples $X$, whose projection onto $U_k$ and reconstruction error are shown as green dashed arrows. After the QR decomposition, a new basis of the extended subspace is formed by the red and pink arrows. However, the projection of the data matrix onto this basis is not completely diagonalized, so the SVD of the coefficient matrix $K$ on this basis is computed to obtain the diagonalizing rotation. The new orthogonal basis after adding $X$ is then represented by the blue arrows. The row space of $K$ is drawn on the right of the figure. The black and pink arrows form a standard orthogonal basis, where the black ones are the elements corresponding to old samples, and the green one is the element corresponding to the changed samples. The blue arrow represents the orthogonal vectors in the row space of $K$ after adding $X$ via SVD updating. Because EVD dualdating always adds samples first, whether the case is incremental or decremental, it needs to make an adjustment in the row space of $K$: for deletion, the elements corresponding to the changed samples have their signs changed. As shown in Figure 1, this component, marked by the cyan-blue dashed arrow, is reversed. From (14), $M$ is in fact the sum or difference of the autocorrelation matrices of the old data $Y_1$ and the changed data $Y_2$. According to the relationship between the column and row spaces, the EVD of $M$ is used to obtain the new rotation matrix in the data space. Finally, the resulting orthogonal basis is shown by the orange arrows.
To sum up, the aim of EVD dualdating is to obtain the projection matrix induced by the change of the sample set, and its essence is to transform the EVD of a varying covariance matrix in the data space into the EVD of a varying autocorrelation matrix in a dimension-reduced row space.
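To make the row-space view concrete, the short NumPy script below (a sketch, not the authors' implementation) performs pure downdating: it first treats the deleted samples as if they were added, then splits $\hat{V}\hat{\Sigma}$ into the rows for old samples (`Y1`) and the rows for changed samples (`Y2`), and flips the sign of the changed part. The data sizes and random seed are arbitrary choices for illustration, and the old subspace is kept at full rank so that the comparison is exact.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, q = 25, 10, 3
A = rng.standard_normal((d, n))            # original data (kept exactly: k = n)
D = A[:, -q:]                              # samples to delete: the last q columns
U, s, _ = np.linalg.svd(A, full_matrices=False)

# SVD-updating step: D is first treated as if it were added
UtD = U.T @ D
Q, R = np.linalg.qr(D - U @ UtD)           # reconstruction error of D (almost zero here)
K = np.block([[np.diag(s), UtD],
              [np.zeros((q, n)), R]])
_, sh, Vht = np.linalg.svd(K)

Y = Vht.T * sh                             # V_hat @ diag(sigma_hat)
Y1, Y2 = Y[:n], Y[n:]                      # rows of old samples / rows of changed samples
M = Y1.T @ Y1 - Y2.T @ Y2                  # deletion flips the sign of the changed part

mu = np.sort(np.linalg.eigvalsh(M))[::-1]
remaining = A[:, :-q]
ref = np.sort(np.linalg.eigvalsh(remaining @ remaining.T))[::-1]
```

The leading `n - q` eigenvalues of `M` coincide with the nonzero eigenvalues of the covariance of the remaining data, while the rest are numerically zero.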
3.3.2. Extendibility of EVD Dualdating
From the derivation of EVD dualdating, it can be seen that nearly no restriction is imposed on $A$, $B$, and $D$. In the downdating mode, the procedure is still feasible even if $D$ does not consist of columns of $A$. Meanwhile, $B$ can be any matrix with a matching number of rows. The only condition that must be satisfied is that the resulting matrix $AA^T + BB^T - DD^T$ is positive semidefinite. In this sense, our EVD dualdating algorithm has favorable extendibility.
The standard mode of EVD dualdating is adding and deleting samples synchronously. As mentioned before, when the set of deleted or added samples is empty, EVD dualdating works in the pure incremental or decremental mode. When deleted samples are replaced by the same number of revised samples, EVD dualdating can be seen as data revising. Another useful extension is forgetting updating, or so-called weighted updating, which is very important for online applications. In the learning procedure, earlier instances should be assigned a lower weight, since they become antiquated as time goes on. Without a proper weighting mechanism, the contribution of too many old, similar samples can become so prominent that new instances seem meaningless. In [16], a forgetting factor is used to weaken the effect of old images of the tracked object. Via EVD dualdating, a concise yet fine-grained weighting formula can be obtained, in which the weight of an arbitrary sample can be modulated, similar to the approach adopted in [13]. The only operation required is to replace the $\pm 1$ sign weights attached to the columns of the equivalent adding matrix by the desired sample weights, which defines the equivalent data matrix of weighted updating. The detailed extensions of EVD dualdating are listed in Table 1.
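A minimal sketch of weighted updating, assuming that the weights act on the diagonal of the small sign matrix: a forgetting factor `f` scales the whole stored eigenspace (individual old samples cannot be reweighted without their right singular vectors), and each new column of `X` carries its own weight in `w`, with `w = -1` meaning deletion. The function name `evd_weighted_update` and this parameterization are hypothetical and may differ from the formula behind Table 1.

```python
import numpy as np


def evd_weighted_update(U_k, lam_k, X, w, f=1.0, k=None):
    """Rank-k EVD of f * U_k diag(lam_k) U_k^T + X diag(w) X^T.

    f: forgetting factor for the stored eigenspace (assumption),
    w: one weight per column of X (1 adds a sample, -1 deletes it).
    """
    r, m = lam_k.size, X.shape[1]
    k = r if k is None else k
    UtX = U_k.T @ X
    Q, R = np.linalg.qr(X - U_k @ UtX)          # residual basis of X
    K = np.block([[np.diag(np.sqrt(np.maximum(lam_k, 0.0))), UtX],
                  [np.zeros((m, r)), R]])
    Uh, sh, Vht = np.linalg.svd(K)
    # Weighted diagonal: f on the old block, w on the changed columns
    j = np.concatenate([np.full(r, float(f)), np.asarray(w, dtype=float)])
    Y = Vht.T * sh                               # V_hat @ diag(sigma_hat)
    M = Y.T @ (j[:, None] * Y)                   # weighted small matrix
    mu, W = np.linalg.eigh(M)
    order = np.argsort(mu)[::-1][:k]             # k largest eigenvalues
    return np.hstack([U_k, Q]) @ (Uh @ W[:, order]), mu[order]
```

With `f = 1`, passing `X = [x_new, x_old]` and `w = [1, -1]` revises one stored sample; with `f < 1` and positive weights it behaves like a forgetting-factor update.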
3.3.3. Computation Complexity of EVD Dualdating
Before the analysis, we introduce some notation to simplify the presentation: separate cost symbols denote the QR and SVD decompositions of a rectangular matrix and the QR, SVD, and EVD decompositions of a square matrix. The computation of the proposed EVD dualdating algorithm consists of four parts: the QR decomposition of $[U_k, B, D]$, the SVD of the small middle matrix of SVD updating, the EVD of the small autocorrelation matrix, and other multiplication operations, including calculating the reconstruction error and the other intermediate matrices. Because the first $k$ columns of $[U_k, B, D]$ are already orthonormal, its QR decomposition actually only operates on the last $p + q$ columns. The computational complexity is presented in Table 2, divided into two parts: matrix decomposition and transformation cost.
In the pure updating or downdating mode, our EVD dualdating algorithm performs two matrix decompositions, one more than other pure updating and downdating methods, which may make EVD dualdating slower than other methods. However, taking the dimension of the decomposed matrices and the transformation cost into account, the efficiency of EVD dualdating is close to, or even better than, that of other methods. The main advantage of our algorithm is reflected in the dualdating mode: as the only method achieving simultaneous updating and downdating, EVD dualdating avoids many repeated processes and decreases the cumulative error. An experimental comparison of the efficiency and accuracy of EVD dualdating and other incremental and decremental methods is presented in Section 5.
3.4. Justification of the Sequential Usage of $U_k$ and $\Lambda_k$
In many online applications, it is impossible to store the original data because of the limitations of the physical medium and for efficiency reasons. Mathematically, this means that the original data matrix $A$ is unavailable and is replaced by its best rank-$k$ approximation $A_k$, whose covariance $A_kA_k^T = U_k\Lambda_kU_k^T$ can be calculated from $U_k$ and $\Lambda_k$. Therefore, it is essential to demonstrate the effectiveness of EVD dualdating in a sequential process.
Zha and Simon [18] prove that when the combined matrix satisfies the low-rank-plus-shift structure, SVD updating remains optimal when the old data matrix is replaced by its best rank-$k$ approximation. Here, we show that if the whole data matrix satisfies the low-rank-plus-shift structure, the result of EVD dualdating after any adding or deleting operations is also an optimal rank-$k$ estimate. First, we state Lemma 1 without proof.
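Lemma 1. Let $C \in \mathbb{R}^{d\times d}$ be symmetric positive semidefinite, with EVD $C = \sum_{i=1}^{d}\lambda_iu_iu_i^T$, where $u_i$ and $\lambda_i$ are the $i$th eigenvector and eigenvalue, respectively, and $\lambda_1 \ge \cdots \ge \lambda_d$. Then for $k \le j \le d$, one has
$$\mathrm{best}_k(C) = \mathrm{best}_k\Big(\sum_{i=1}^{j}\lambda_iu_iu_i^T\Big),$$
where $\mathrm{best}_k(\cdot)$ denotes the best rank-$k$ approximation.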
The lemma above indicates that, for the rank-$k$ EVD, it is safe to cut off the minor eigenspaces without affecting optimality. With this, we show that under the low-rank-plus-shift structure, when $A$ is replaced by its rank-$k$ approximation, the discarded information would also be discarded by EVD dualdating. The conclusion is summarized in Theorem 2, whose proof can be found in the Supplementary Material (available online at http://dx.doi.org/10.1155/2014/429451).
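The idea that minor eigenspaces can be cut off without changing the rank-$k$ result is easy to check numerically. The sketch below uses a random positive semidefinite matrix and arbitrary `k <= j`; the helper `best_k` is a hypothetical name for the best rank-$k$ approximation computed from an EVD.

```python
import numpy as np


def best_k(S, k):
    """Best rank-k approximation of a symmetric PSD matrix, from its EVD."""
    w, V = np.linalg.eigh(S)
    idx = np.argsort(w)[::-1][:k]                 # indices of the k largest eigenvalues
    return (V[:, idx] * w[idx]) @ V[:, idx].T


rng = np.random.default_rng(3)
d, k, j = 12, 3, 6                                # any k <= j <= d
G = rng.standard_normal((d, d))
C = G @ G.T                                       # symmetric PSD test matrix
lam, U = np.linalg.eigh(C)
lam, U = lam[::-1], U[:, ::-1]                    # descending order
C_j = (U[:, :j] * lam[:j]) @ U[:, :j].T           # keep only the j leading eigenpairs
```

Comparing `best_k(C, k)` with `best_k(C_j, k)` shows that the two approximations coincide (up to rounding), since truncation to `j >= k` leaves the `k` leading eigenpairs untouched.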
Theorem 2. Given a matrix , with its  approximation , the deleted data from , the added data , . Let be the remaining data from , the full data, and the final data, where the underline denotes deletion. Let be the remaining matrix after deleting the columns corresponding to from ’s  approximation, and let be the final data from . Assume that satisfies the low-rank-plus-shift structure; that is, where is symmetric and positive semidefinite with ; then
3.5. Criterion for the Optimal Rank Selection
In the derivation above, the rank of the subspace is assumed to be a fixed number $k$; however, the optimal dimension of the subspace depends on a priori information, which may be unknown in practical applications. Based on the fact that the bulk of the variability of a given dataset can be captured by the top few eigenvectors, we introduce an eigenvalue-based method to determine the best rank of the subspace during the online learning procedure.
Supposing the truncation operation is not yet executed in steps 4 and 6 of Algorithm 2, the obtained singular value and eigenvalue matrices have rank $r = k + p + q$. First, we define the rate
$$\eta(j) = \frac{\sum_{i=1}^{j}\lambda_i}{\sum_{i=1}^{r}\lambda_i},$$
which indicates the proportion of the first $j$ eigenvalues among all eigenvalues. Then, the best dimension can be selected as the minimum number whose rate reaches a threshold $\theta$:
$$k^{*} = \min\{\, j : \eta(j) \ge \theta \,\},$$
where the threshold $\theta$ is a value in $(0, 1)$. In the batch mode, the threshold only depends on the proportion of information to be preserved. For EVD dualdating, because the rank estimation and truncation are performed in every round, the threshold is related to the ratio of saved to new information. In practical implementations, it can be chosen according to the chunk size of the added and deleted samples.
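A minimal sketch of the eigenvalue-ratio rule, assuming the untruncated eigenvalues are available as a 1-D array; `select_rank` and `theta` are hypothetical names. The fallback to the full length guards against floating-point round-off keeping the last ratio just below the threshold.

```python
import numpy as np


def select_rank(eigvals, theta):
    """Smallest k whose leading eigenvalues hold at least a fraction theta of the total."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratio = np.cumsum(lam) / lam.sum()            # eta(k) for k = 1 .. len(lam)
    hits = np.flatnonzero(ratio >= theta)
    return int(hits[0]) + 1 if hits.size else lam.size
```

For example, `select_rank([5, 3, 1, 0.5, 0.5], 0.85)` returns 3, since the first two eigenvalues hold only 8/10 of the total while the first three hold 9/10.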
4. Incremental Decremental PCA Based on EVD Dualdating
In the derivation of EVD dualdating in Section 3.2, the sample mean is not considered, but in practical applications, centering is a necessary step to reduce the effect of the environment. In this section, we first briefly review PCA. Then, we take the mean into account and propose three online subspace learning algorithms: EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA.
4.1. Principal Component Analysis
Principal component analysis (PCA) is one of the most popular multivariate analysis and dimension reduction methods. The goal of PCA is to find an orthonormal basis, the so-called principal components, that has the best reconstruction performance in the sense of minimum mean squared error (MMSE).
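Given a data matrix $A = [a_1, \dots, a_n] \in \mathbb{R}^{d\times n}$ with mean $\mu = \frac{1}{n}\sum_{i=1}^{n}a_i$, the covariance matrix of $A$ is defined by $C = \frac{1}{n}\sum_{i=1}^{n}(a_i - \mu)(a_i - \mu)^T$. Principal components (PCs) are the first $k$ eigenvectors $u_1, \dots, u_k$ corresponding to the $k$ largest eigenvalues $\lambda_1 \ge \cdots \ge \lambda_k$. Let $U_k = [u_1, \dots, u_k]$ and $\Lambda_k = \mathrm{diag}(\lambda_1, \dots, \lambda_k)$; then $U_k$ and $\Lambda_k$ can be obtained by the EVD of the covariance matrix, $C = U\Lambda U^T$. Another way of solving PCA is to compute the SVD of the centered data matrix, $A - \mu\mathbf{1}^T = U\Sigma V^T$, where $\mathbf{1}^T$ stands for an all-ones row vector, each column of the left singular vectors $U$ is a principal component, and $\Sigma$ is the singular value matrix.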
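The two routes to batch-mode PCA can be compared directly. The sketch below assumes the $1/n$-normalized covariance and random anisotropic data (sizes and seed are arbitrary); it checks that the eigenvalues equal $\sigma_i^2/n$ and that the PCs agree up to sign, since eigenvectors are only defined up to a sign flip.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, k = 8, 200, 3
scales = np.arange(1, d + 1)[:, None]          # distinct spread per coordinate
A = rng.standard_normal((d, n)) * scales
mu = A.mean(axis=1, keepdims=True)
Ac = A - mu                                    # A - mu 1^T

# Route 1: EVD of the covariance matrix C = Ac Ac^T / n
lam, U = np.linalg.eigh(Ac @ Ac.T / n)
lam, U = lam[::-1][:k], U[:, ::-1][:, :k]      # k largest eigenpairs

# Route 2: SVD of the centered data matrix
Us, s, _ = np.linalg.svd(Ac, full_matrices=False)
lam_svd, Us = s[:k] ** 2 / n, Us[:, :k]
```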
4.2. Incremental and Decremental PCA
When confronted with a huge, high-dimensional dataset, batch-mode methods, whether based on EVD or SVD, cost tremendous time and storage. Moreover, an online learning system must cope with situations in which not all instances are available before training, or some expired instances need to be deleted after training. These problems are beyond the ability of batch-mode PCA, and incremental and decremental PCA is a natural solution.
In this section, we consider EVD dualdating with a time-varying mean and derive incremental-decremental PCA formulas based on EVD dualdating. As mentioned before, in the updating mode EVD dualdating degenerates into SVD updating without right singular vectors, so EVDD-IPCA is actually the same as the extended sequential KL algorithm. Nonetheless, we still present it in this paper for completeness; interested readers can find more details in [16].
The key idea of the EVDD-based incremental and decremental PCA algorithms is to center the original samples, the added samples, and the deleted samples separately, and to use some mean-revising vectors to keep the covariance matrix equal to that of the actual data. The methods for determining these mean-revising vectors are given in Lemmas 3, 4, and 5. For incremental or decremental PCA, there is only one mean-revising vector, called the equivalent added vector or the equivalent deleted vector, respectively, which is proportional to the difference between the original mean and the mean of the changed samples. For incremental-decremental PCA, the situation is a little more complex: because of cross terms, there are three mean-revising vectors, namely two equivalent added vectors and one equivalent deleted vector. Based on these lemmas, the proposed EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA algorithms are presented in Algorithms 3, 4, and 5.



4.2.1. Incremental PCA
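Lemma 3. Let $A \in \mathbb{R}^{d\times n}$ and $B \in \mathbb{R}^{d\times m}$ be two data matrices, and let their concatenation be $C = [A, B]$. Denote the means and scatter matrices of $A$, $B$, and $C$ as $\mu_A$, $\mu_B$, and $\mu_C$ and $S_A$, $S_B$, and $S_C$, respectively. Then
$$S_C = S_A + S_B + \hat{b}\hat{b}^T,$$
where $\hat{b} = \sqrt{\frac{nm}{n+m}}\,(\mu_A - \mu_B)$.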
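The identity behind the equivalent added vector can be verified numerically. In this sketch the helper `scatter` is a hypothetical name for $\sum_i (x_i - \mu)(x_i - \mu)^T$, and the data (with deliberately different means) are random.

```python
import numpy as np


def scatter(X):
    """Scatter matrix: sum of (x_i - mean)(x_i - mean)^T over the columns x_i of X."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T


rng = np.random.default_rng(5)
A = rng.standard_normal((6, 40)) + 2.0          # old samples, n = 40
B = rng.standard_normal((6, 15)) - 1.0          # new samples, m = 15, shifted mean
n, m = A.shape[1], B.shape[1]
# Equivalent added vector: proportional to the difference of the two means
b_hat = np.sqrt(n * m / (n + m)) * (A.mean(axis=1) - B.mean(axis=1))
S_C = scatter(np.hstack([A, B]))
S_C_from_parts = scatter(A) + scatter(B) + np.outer(b_hat, b_hat)
```

Because only the two means and the sample counts are needed, the mean correction can be fed to a zero-mean update as one extra column.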
4.2.2. Decremental PCA
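Lemma 4. Let $A \in \mathbb{R}^{d\times n}$ and $D \in \mathbb{R}^{d\times q}$ be two data matrices, and let $C = [A, D]$ be their concatenation. Denote the means and scatter matrices of $A$, $D$, and $C$ as $\mu_A$, $\mu_D$, and $\mu_C$ and $S_A$, $S_D$, and $S_C$, respectively. Then
$$S_A = S_C - S_D - \hat{d}\hat{d}^T,$$
where $\hat{d} = \sqrt{\frac{q(n+q)}{n}}\,(\mu_C - \mu_D)$.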
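The decremental counterpart can be checked the same way. This sketch writes the equivalent deleted vector in terms of the original mean $\mu_C$ and the deleted mean $\mu_D$, which are available without storing the remaining samples; that particular form is an assumption consistent with the description above, and `scatter` is again a hypothetical helper.

```python
import numpy as np


def scatter(X):
    """Scatter matrix: sum of (x_i - mean)(x_i - mean)^T over the columns x_i of X."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T


rng = np.random.default_rng(6)
C = rng.standard_normal((6, 50)) + np.linspace(0.0, 1.0, 50)  # slowly drifting samples
A, D = C[:, :38], C[:, 38:]                # keep A (n = 38), delete D (q = 12)
n, q = A.shape[1], D.shape[1]
# Equivalent deleted vector from the original mean and the deleted mean only
d_hat = np.sqrt(q * (n + q) / n) * (C.mean(axis=1) - D.mean(axis=1))
S_A_from_downdate = scatter(C) - scatter(D) - np.outer(d_hat, d_hat)
```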
4.2.3. Incremental Decremental PCA
Lemma 5. Let , , and be three data matrices, and let , . Denote the means and scatter matrices of , , , and as , , , and and , , , and , respectively. This holds where , , and .
Remark 6. As an important dimension reduction approach, PCA is used as a preprocessing step for many other machine learning methods and as a feature extraction method in other applications. Because these methods usually work in the PCA subspace, there is a great demand for simultaneous online incremental-decremental subspace learning and data reconstruction. Artač et al. [24] propose a method to sequentially compute the coefficients of a sample in IPCA. Here, we introduce an incremental approach to update the projection coefficients of a data point after the subspace is renewed via EVDD-IDPCA, without storing the original data. For any sample $x$, assume that the eigenvectors are $U$ and the mean is $\mu$ when $x$ is added to the dataset; the reconstruction of $x$ is then $\hat{x} = Uy + \mu$, where $y = U^T(x - \mu)$ are the projection coefficients of $x$ on the basis $U$. Then, at each round of EVDD-IDPCA, writing the new basis as $\tilde{U} = [U, Q]G$ and the new mean as $\tilde{\mu}$, where $G$ collects the rotations computed by EVD dualdating, the projection coefficients of $x$ can be updated by
$$\tilde{y} = G_1^Ty + \tilde{U}^T(\mu - \tilde{\mu}), \tag{24}$$
where $G_1$ is the first $k$ rows of $G$. It is worth noting that $G$ in (24) is an intermediate variable of EVD dualdating and that $\tilde{U}^T(\mu - \tilde{\mu})$ needs to be computed only once for all samples, so the computational cost of the update is small, $O(k^2)$ per sample, while the memory needed to store a data point is reduced from $d$ to $k$.
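The coefficient update can be sanity-checked with synthetic quantities. In this sketch the new basis is simulated as $[U, Q]G$ with a random orthonormal $G$ and a small random mean shift (both assumptions standing in for one round of EVDD-IDPCA); the update is compared with projecting the stored reconstruction directly.

```python
import numpy as np

rng = np.random.default_rng(7)
d, k, s = 20, 4, 3
U_old, _ = np.linalg.qr(rng.standard_normal((d, k)))    # basis when x was stored
mu_old = rng.standard_normal(d)
x = rng.standard_normal(d)
y_old = U_old.T @ (x - mu_old)                           # the k stored coefficients

# Simulated round: new basis [U_old, Q] @ G with Q orthogonal to U_old
P = np.eye(d) - U_old @ U_old.T
Q, _ = np.linalg.qr(P @ rng.standard_normal((d, s)))
G, _ = np.linalg.qr(rng.standard_normal((k + s, k)))     # (k+s) x k, orthonormal columns
U_new = np.hstack([U_old, Q]) @ G
mu_new = mu_old + 0.1 * rng.standard_normal(d)

# Shared by all stored samples; the per-sample work is one k x k product
shift = U_new.T @ (mu_old - mu_new)
y_new = G[:k].T @ y_old + shift
```

The identity holds because $\tilde{U}^TU = G^T[U, Q]^TU = G_1^T$ when the columns of $Q$ are orthogonal to $U$.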
5. Experiment
In this section, experiments with the proposed algorithms based on EVD dualdating are presented and compared with other classic methods. Incremental PCA has been discussed extensively in the earlier literature, and the proposed EVDD-IPCA is actually equivalent to the extended sequential KL algorithm. We therefore do not evaluate IPCA methods again in this paper; interested readers can find performance analyses and comparisons in related papers [12, 15, 16, 20]. In the following, decremental PCA and incremental decremental PCA experiments on real-world datasets are reported first. Then, an adaptive rank selection experiment of EVD dualdating on an artificial dataset is presented. All experiments are performed in Matlab on a computer with a dual-core 2.0 GHz CPU and 4 GB RAM.
5.1. DPCA Experiment: Performance Evaluation on Real-World Data
To verify the performance and efficiency of the proposed EVDD-DPCA and EVDD-IDPCA, four datasets are used, covering both high dimensionality and large size. The FERET [25] database is a standard dataset for evaluating facial recognition systems, managed by DARPA and NIST. The AR [26] dataset is a popular face image dataset whose images are taken under different facial expressions, illumination conditions, and partial occlusions due to sunglasses and scarves. The Yale Face Database B (Yale B) [27] contains 5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses × 64 illumination conditions). Subsets of AR, FERET, and Yale B are employed in our simulation, containing 952, 720, and 4050 cropped and centered face images, respectively. The fourth database is the Columbia Object Image Library (COIL-100) [28]. It includes 7200 color images of 100 objects, taken at pose intervals of 5 degrees, corresponding to 72 poses per object. Detailed information on the four datasets and our experiment settings is listed in Table 3.
To evaluate decremental learning, we compare the proposed EVDD-DPCA algorithm with the batch-mode PCA, MSES [12], MSVD [21], DCS-SVD [22], and AIPCA [23]. First, the whole dataset is learned via the batch-mode PCA. Then, some classes are assumed to be expired, and their samples are deleted chunk by chunk. In our experiment, the number of expired classes is 40 for FERET, 39 for AR, 30 for Yale B, and 40 for COIL-100, and the chunk size is 10. Every experiment is repeated 20 times to reduce disturbance from operating-system process scheduling and random grouping. Performance is evaluated mainly in terms of efficiency, eigenspace accuracy, and recognition performance: (i) execution time; (ii) the weighted angle between the PCs of the batch-mode PCA and those of each DPCA method, in which the angle between the ith PCs is weighted by the ith eigenvalue of the batch-mode PCA; (iii) recognition rate.
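The exact weighting formula is not recoverable from this section, so the sketch below assumes one plausible form. The angle between the ith PCs is θ_i = arccos|u_i^T v_i|, with the absolute value used because eigenvector signs are arbitrary. Each θ_i is multiplied by λ_i / Σ_j λ_j, where λ holds the batch-mode eigenvalues of the compared PCs. The function name is hypothetical, and an error norm over the compared PCs can then be taken as the Euclidean norm of the result.

```python
import numpy as np


def weighted_angles(U_batch, U_est, eigvals_batch):
    """Eigenvalue-weighted angles between corresponding columns of two bases (assumed form)."""
    cos = np.abs(np.sum(U_batch * U_est, axis=0))   # |u_i^T v_i| for each i
    theta = np.arccos(np.clip(cos, 0.0, 1.0))       # clip guards against rounding above 1
    w = eigvals_batch / np.sum(eigvals_batch)
    return w * theta


U = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
V = np.array([[-1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])  # first PC sign-flipped, second rotated 90 deg
wa = weighted_angles(U, V, np.array([3.0, 1.0]))
error_norm = np.linalg.norm(wa)
# expected: wa = [0, pi/8], because the weights are 0.75 and 0.25
```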
5.1.1. Computational Efficiency by the Subspace Dimension
Recall the analysis of computational complexity in Section 3.3: the practical computational efficiency depends on the dimension of the small matrix to be decomposed and on the cost of transformation. From Table 2, DCS-SVD has a higher matrix-decomposition cost, and AIPCA has a higher transformation cost. MSES, MSVD, and our EVDD have similar computational complexities. MSES needs one additional EVD to extract the eigenspace model of the deleted samples before subtracting it. MSVD needs two QR decompositions of the residual matrices, one in the column space and one in the row space, as well as a larger transformation cost. EVDD needs one additional step to transform updating into downdating. Therefore, when the data dimension is high or the dataset is large, DCS-SVD and AIPCA are less efficient, while MSES, MSVD, and our EVDD achieve similarly high efficiency. This conclusion is also supported by Figures 2(a), 2(b), 2(c), and 2(d). They show the execution time of MSES, DCS-SVD, MSVD, AIPCA, and EVDD-DPCA on FERET, AR, Yale B, and COIL-100 as the number of kept PCs varies from 10 to 200. From these figures, we observe that our proposed EVDD-DPCA achieves better or comparable efficiency.
Figure 2: Execution time of the DPCA methods by the number of kept PCs: (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.
5.1.2. PC Estimation Quality Relative to Ground-Truth PCs
To evaluate accuracy, the angles between the PCs produced by the DPCA methods and those of the batch-mode PCA can be used. However, we use angles weighted by the corresponding eigenvalues, which are more suitable for evaluation because they emphasize the importance of the leading PCs. Figures 3(a), 3(b), 3(c), and 3(d) show the weighted angles of the first 50 PCs of the DPCA methods on the four datasets when the number of kept PCs is 100 and the chunk size is 10. Figures 4(a), 4(b), 4(c), and 4(d) show the error norm of the weighted angles of the first 50 PCs of the DPCA methods on the different datasets as the number of kept PCs varies from 10 to 200, with a chunk size of 10.
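Figure 3: Weighted angles of the first 50 PCs of the DPCA methods (100 kept PCs, chunk size 10): (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.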
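Figure 4: Error norm of the weighted angles of the first 50 PCs of the DPCA methods by the number of kept PCs (chunk size 10): (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.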
These figures show that our proposed EVDD-DPCA algorithm achieves the best accuracy in eigenvector estimation. The accuracy of the principal directions depends on the estimate of the mean and on the cut-off error. An error in the mean shifts the origin used for data centering, which in the worst case may make the direction of the resulting basis entirely different. The cut-off error accumulates in the sequential process, so the more often truncation happens, the lower the accuracy of the final result. MSES and EVDD-DPCA update the mean in the same way, and their estimate equals the true mean. In DCS-SVD, the new mean is updated via the right singular vectors. However, these are cut off to the reduced dimension, so its estimate of the mean is not accurate. This inaccuracy does not affect its computation of the singular vectors, because the mean-correction term is stripped off and no data centering is executed. Hence, the errors of the singular vectors in EVDD-DPCA, MSES, and DCS-SVD come mainly from the cut-off error. Before splitting, MSES calculates the EVD of the deleted data, whose result is cut off to the kept dimension; this step introduces additional cut-off error. DCS-SVD works directly on the right singular vectors to achieve downdating, so it ignores the information about the deleted samples carried by the high-order PCs. In AIPCA and MSVD, the mean is not updated, so all remaining samples are centered with the old mean; therefore, their results deviate far from the true PCs. Figures 3(a), 3(b), 3(c), and 3(d) show that the weighted angle of our proposed EVDD-DPCA is much smaller than those of the other methods, because of its accurate mean estimate and smaller cut-off error. MSES and DCS-SVD perform similarly, while AIPCA and MSVD have larger errors. The same conclusion can be drawn from Figures 4(a), 4(b), 4(c), and 4(d).
The fluctuation at the beginning of these curves arises because the number of observed PCs increases from 10 to 50 while fewer than 50 PCs are kept.
5.1.3. Results of Recognition with Minimum Distance Classifier
In the recognition experiment, the resulting PCs are used as the projection matrix to project each testing image onto the subspace, and then a minimum distance classifier (MDC) is used for recognition. The advantage of MDC in our online application is that only the mean of each class in the projection subspace needs to be saved. The distance between a sample and a class $c$ in MDC is defined as the Mahalanobis distance $d(\mathbf{y},c)=\sqrt{(\mathbf{y}-\mathbf{m}_c)^{\top}\Lambda^{-1}(\mathbf{y}-\mathbf{m}_c)}$. Here $\mathbf{y}$ is the projection vector in the subspace, $\mathbf{m}_c$ is the mean of class $c$ in the subspace, and $\Lambda$ is the eigenvalue matrix estimated by EVD dualdating.
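A minimal sketch of the MDC step follows. It assumes that projected samples and class means are stored as columns of k-dimensional arrays and that Λ is diagonal, holding the eigenvalues. Only the class means in the subspace are kept, as described above. The function name is hypothetical, and squared distances are compared because the square root does not change which class is nearest.

```python
import numpy as np


def mdc_predict(Y, class_means, eigvals):
    """Assign each column of Y (k x N) to the nearest class mean (k x C) under the
    Mahalanobis distance with diagonal covariance diag(eigvals)."""
    diff = Y[:, :, None] - class_means[:, None, :]             # (k, N, C)
    dist2 = np.einsum('knc,k->nc', diff ** 2, 1.0 / eigvals)   # squared distances (N, C)
    return np.argmin(dist2, axis=1)


eigvals = np.array([1.0, 100.0])
class_means = np.array([[0.0, 3.0],
                        [0.0, 0.0]])   # column c is the mean of class c
Y = np.array([[1.0, 2.5],
              [5.0, 0.0]])             # two projected test samples
print(mdc_predict(Y, class_means, eigvals))  # [0 1]
```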
Figures 5(a), 5(b), 5(c), and 5(d) show the recognition rates of the full-data PCA, the batch-mode PCA, and the DPCA methods. The full-data PCA has a lower recognition rate due to the presence of expired instances, while all DPCA methods have similar recognition rates, nearly equal to that of the batch-mode PCA. Similar results were also obtained by Ozawa et al. [29]. This phenomenon can be explained via random projection (RP) [30]. According to the Johnson-Lindenstrauss lemma [31], any set of $n$ points in a high-dimensional Euclidean space can be mapped into a subspace of dimension $O(\varepsilon^{-2}\log n)$ in which all pairwise distances are preserved up to a factor of $1\pm\varepsilon$. Hence, as long as the subspace dimension is large enough, the classification performance of an arbitrary random projection of that dimension is determined mainly by MDC and by the structure of the data space itself. In our experiments, the smallest such dimension lies between the range on FERET, AR, and Yale B, and is about on COIL-100.
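The dimension argument can be illustrated with a Gaussian random projection. We add, for illustration only, the sufficient dimension from the Dasgupta-Gupta proof of the Johnson-Lindenstrauss lemma, k ≥ 4 ln n / (ε²/2 − ε³/3); the paper does not state which form of the bound it relies on. The sketch then measures the worst relative change of squared pairwise distances. The projection is random, so the observed distortion varies with the seed, and rows are samples here.

```python
import numpy as np


def jl_min_dim(n_points, eps):
    """Sufficient target dimension from the Dasgupta-Gupta proof of the JL lemma."""
    return int(np.ceil(4.0 * np.log(n_points) / (eps ** 2 / 2.0 - eps ** 3 / 3.0)))


def max_sq_distortion(X, k, rng):
    """Worst relative change of squared pairwise distances when the rows of X (n x d)
    are projected by a Gaussian matrix scaled by 1/sqrt(k)."""
    P = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    Z = X @ P
    i, j = np.triu_indices(X.shape[0], k=1)
    orig = np.sum((X[i] - X[j]) ** 2, axis=1)
    proj = np.sum((Z[i] - Z[j]) ** 2, axis=1)
    return float(np.max(np.abs(proj / orig - 1.0)))


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2000))   # 50 points in a 2000-dimensional space
k = jl_min_dim(50, 0.5)               # 188
print(k, max_sq_distortion(X, k, rng))
```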
Figure 5: Recognition rates of the full-data PCA, the batch-mode PCA, and the DPCA methods: (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.
5.2. IDPCA Experiment: Performance Evaluation on Real-World Data
To evaluate incremental decremental subspace learning, we compare the proposed EVDD-IDPCA algorithm with the batch-mode PCA, MSES [12], MSVD [21], DCS-SVD [22], and AIPCA [23]. Because DCS-SVD only performs decremental PCA, we combine it with the extended SKL to achieve IDPCA. AIPCA is a decremental version of SVDU-IPCA, so in our experiment it is paired with SVDU-IPCA to perform IDPCA.
The datasets for IDPCA are the same as in the DPCA experiment, and the configuration is shown in Table 4. In our experiment, samples of the pre-training classes are learned by the batch-mode PCA. Then, at every round, a chunk of samples from the expired classes is deleted while a chunk of samples from new classes is added. The chunk size is 10. The training/testing ratio is the same as in the DPCA experiment. Execution time, weighted angle, and recognition rate are used to evaluate the performance of the IDPCA methods.
5.2.1. Computational Efficiency by the Subspace Dimension
Figures 6(a), 6(b), 6(c), and 6(d) present the runtime of the IDPCA methods by the number of kept PCs. The other IDPCA methods process incremental and decremental learning separately. In contrast, our EVDD-IDPCA handles deleted and added samples simultaneously and avoids repeating preprocessing, postprocessing, and some matrix decompositions. Therefore, as shown in Table 2, the dualdating scheme gives EVD dualdating a more concise form with fewer matrix decompositions and a lower transformation cost. Consequently, in our experiment, the proposed EVDD-IDPCA is much more efficient than the other methods, especially when the dataset is large.
Figure 6: Runtime of the IDPCA methods by the number of kept PCs: (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.
5.2.2. PC Estimation Quality Relative to Ground-Truth PCs
Figures 7(a), 7(b), 7(c), and 7(d) show the weighted angles of the first 50 PCs of the different IDPCA methods when the number of kept PCs is 10 and the chunk size is 10. Figures 8(a), 8(b), 8(c), and 8(d) show the error norm of the weighted angles of the first 50 PCs of the IDPCA methods on the different datasets as the number of kept PCs varies from 10 to 200, with a chunk size of 10. EVDD-IDPCA is the only method compared here that performs incremental and decremental learning in a single dualdating step with an accurate mean estimate. By avoiding redundant cut-off error, it obtains principal eigenvectors with considerably better approximation than the other methods. These figures show that its estimation of the leading PCs is significantly better than that of the competing methods.
Figure 7: Weighted angles of the first 50 PCs of the IDPCA methods (chunk size 10): (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.
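Figure 8: Error norm of the weighted angles of the first 50 PCs of the IDPCA methods by the number of kept PCs (chunk size 10): (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.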
5.2.3. Results of Recognition with Minimum Distance Classifier
Figures 9(a), 9(b), 9(c), and 9(d) show the recognition rates of the full-data PCA, the batch-mode PCA, and the IDPCA methods. The results are similar to those of the DPCA experiments. The recognition rate of the full-data PCA is lower because of the presence of expired classes. The recognition rates of the IDPCA methods are close to each other and depend mainly on MDC and the structure of the data space, as long as the subspace dimension is large enough to satisfy the Johnson-Lindenstrauss lemma.
Figure 9: Recognition rates of the full-data PCA, the batch-mode PCA, and the IDPCA methods: (a) FERET, (b) AR, (c) Yale B, (d) COIL-100.
In addition, one important advantage of the EVDD-based methods is not reflected by these DPCA and IDPCA experiments: they do not need the specific position information of the deleted and added samples, which DCS-SVD, AIPCA, and MSVD require.
5.3. Automatic Rank Selection and Weighted EVD Dualdating
In this experiment, we evaluate the selection of the subspace dimension without any prior knowledge. An artificial dataset is used, consisting of data points generated from the following model: where , is a coefficient vector and is small noise sampled from a normal distribution . In the simulation, the data dimension is , and the number of generated samples is . The samples are then learned sequentially with different chunk sizes (5, 10, and 20) by our EVD dualdating and weighted EVD dualdating algorithms. The weights are , and in weighted EVD dualdating. In every round, the number of kept PCs is determined by (20), and the thresholds of the preserved proportion are set separately for chunk sizes 5, 10, and 20. Figure 10 shows how the kept rank evolves during the online learning process; solid lines stand for weighted EVD dualdating and dashed lines for EVD dualdating. In this figure, the kept ranks in all curves quickly rise from the chunk size to 50–60 at the beginning, which means that new features have been added to the eigenspace. Then, the ranks of weighted EVD dualdating converge to a common stable value of 53. It is worth noting that the red solid line, with the smallest chunk size of 5, converges fastest, and the blue solid line, with the largest chunk size, converges slowest. In plain EVD dualdating, the influence of the leading PCs is not weakened. As online learning progresses, it therefore becomes resistant to new features and later excludes minor PCs. As a result, its kept ranks, shown by the dashed lines, all decrease quickly. For example, the blue dashed line (chunk size 20) ends with a rank below 30 after all samples are learned.
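Equation (20) is not reproduced in this section. The sketch below assumes that it keeps the smallest number of leading eigenvalues whose share of the total eigenvalue sum reaches a preset threshold, the "preserved proportion". `select_rank` is a hypothetical name, and eigenvalues are assumed to be non-negative.

```python
import numpy as np


def select_rank(eigvals, threshold):
    """Smallest k such that the k largest eigenvalues preserve at least `threshold`
    of the total (assumed form of the rank-selection rule)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending order
    ratio = np.cumsum(lam) / lam.sum()                      # non-decreasing preserved proportion
    k = int(np.searchsorted(ratio, threshold)) + 1          # first index with ratio >= threshold
    return min(k, lam.size)                                 # guard against rounding just below 1


eigs = [0.5, 5.0, 1.0, 3.0, 0.5]
print([select_rank(eigs, t) for t in (0.4, 0.75, 0.92, 1.0)])  # [1, 2, 4, 5]
```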
6. Conclusion
This paper focuses on the problem of online incremental/decremental subspace learning and proposes a novel dualdating algorithm for EVD, namely, EVD dualdating. Unlike previous works, the proposed EVD dualdating algorithm can renew the EVD of a data matrix while adding and deleting samples simultaneously. Based on EVD dualdating, EVDD-IPCA, EVDD-DPCA, and EVDD-IDPCA are presented to handle a changing mean; the variation of the mean is equivalent to adding and deleting several additional vectors in zero-mean PCA. Extensive comparative experiments on both real-world and artificial datasets demonstrate that our EVD dualdating algorithm achieves significantly better approximation accuracy and computational efficiency than other state-of-the-art incremental and decremental PCA methods.
Appendices
A. Proof of Lemma 4
By definition,
And, the scatter matrix of is
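For reference, a standard derivation of the identity stated in Lemma 4 is sketched below in our own notation. Here $A=[B\;D]$ has $n=m+l$ columns and means $\bar a$, $\bar b$, $\bar d$; the original proof may be organized differently. The cross terms vanish because the columns of $B$ and $D$ are centered around their own means.

```latex
\begin{aligned}
\bar{b} &= \frac{n\bar{a} - l\bar{d}}{m},
  \qquad \bar{b}-\bar{a} = \frac{l}{m}\,(\bar{a}-\bar{d}),\\
S_A &= \sum_{x\in A}(x-\bar{a})(x-\bar{a})^{\top}
     = S_B + S_D + m(\bar{b}-\bar{a})(\bar{b}-\bar{a})^{\top} + l(\bar{d}-\bar{a})(\bar{d}-\bar{a})^{\top}\\
    &= S_B + S_D + \Big(\frac{l^{2}}{m} + l\Big)(\bar{a}-\bar{d})(\bar{a}-\bar{d})^{\top}
     = S_B + S_D + \frac{nl}{m}(\bar{a}-\bar{d})(\bar{a}-\bar{d})^{\top},
\end{aligned}
```

so that $S_B = S_A - S_D - \hat{d}\hat{d}^{\top}$ with $\hat{d}=\sqrt{nl/m}\,(\bar{a}-\bar{d})$.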
B. Proof of Lemma 5
By definition,
And, the scatter matrix of is
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grants nos. 61175028, 61375007), the Ph.D. Programs Foundation of Ministry of Education of China (Grants nos. 20090073110045), and the Shanghai Pujiang Program (Project no. 12PJ1402200).
Supplementary Materials
In many online applications, the original data cannot be stored because of limitations of the physical medium and for efficiency reasons. In mathematical terms, the original data matrix A is unavailable and is replaced by its best rank-k approximation, which can be calculated from U_{k} and Λ_{k}. Theorem 2 proves that, under the low-rank-plus-shift structure, when A is replaced by best_{k}(A), the information discarded will also be discarded after EVD dualdating. In other words, EVD dualdating is an optimal rank-k estimator in the sequential setting.
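As a hedged illustration of the last point, in our own notation: suppose the EVD of A A^T gives U_k and Λ_k. Then the d × k matrix U_k Λ_k^{1/2} has the same product W W^T as best_k(A) best_k(A)^T. It can therefore stand in for the unavailable data in subsequent scatter-based updates. The sketch checks this numerically against a truncated SVD; the helper names are hypothetical.

```python
import numpy as np


def best_rank_k(A, k):
    """Best rank-k approximation of A via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]


def surrogate_from_evd(A, k):
    """d x k matrix U_k Lambda_k^{1/2} built from the top-k eigenpairs of A A^T."""
    lam, U = np.linalg.eigh(A @ A.T)          # eigh returns ascending eigenvalues
    idx = np.argsort(lam)[::-1][:k]
    return U[:, idx] * np.sqrt(np.clip(lam[idx], 0.0, None))


rng = np.random.default_rng(4)
A = rng.standard_normal((6, 12))
Bk, W = best_rank_k(A, 3), surrogate_from_evd(A, 3)
print(np.allclose(Bk @ Bk.T, W @ W.T))  # True
```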