Special Issue: Multimedia Data Fusion
Robust Online Object Tracking Based on Feature Grouping and 2DPCA
We present an online object tracking algorithm based on feature grouping and two-dimensional principal component analysis (2DPCA). Firstly, we introduce regularization into the 2DPCA reconstruction and develop an iterative algorithm to represent an object by 2DPCA bases. Secondly, the object templates are grouped into a more discriminative image and a less discriminative image by computing the variance of the pixels in multiple frames. Then, the projection matrix is learned according to the more discriminative image and the less discriminative image, and the samples are projected. The object tracking results are obtained using Bayesian maximum a posteriori probability estimation. Finally, we employ a template update strategy which combines incremental subspace learning and the error matrix to reduce tracking drift. Compared with other popular methods, our method reduces the computational complexity and is very robust to abnormal changes. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed tracking algorithm achieves more favorable performance than several state-of-the-art methods.
1. Introduction
Online object tracking is a fundamental problem in many computer vision applications such as surveillance, driver assistance systems, and human-computer interaction [1–4]. Although researchers have made great progress in this area, object tracking remains challenging because the appearance of an object varies over time. Intrinsic and extrinsic changes inevitably cause large appearance variation. An effective appearance model is therefore of prime importance for the success of a tracking algorithm [5–8].
In recent years, linear subspace learning, a popular dimensionality reduction and feature extraction technique, has been used successfully in robust visual tracking. Supervised discriminative methods for classification and regression have also been exploited to solve visual tracking problems. For example, Avidan developed a tracking algorithm that employs a support vector machine (SVM) classifier within an optical flow framework. Along similar lines, Williams et al. developed a method in which an SVM-based regressor is used for tracking. Because the regressor is trained on in-plane image motion, this method is not effective in tracking objects with out-of-plane movements. Another approach combines sparse coding with Kalman filtering and uses the color histogram and gradient histogram as features to track the object. Its template update strategy replaces a random template in the original template library with the latest tracking result. This strategy can easily introduce tracking errors when abnormal changes happen to the object, which can cause tracking to fail. A further method casts object tracking as a sparse approximation problem in a particle filter framework. It handles object occlusion by introducing trivial templates, which, however, greatly increases the number of templates and the computational complexity, and this in turn limits the practical value of the algorithm.
Motivated by the above discussion, we propose an online object tracking algorithm based on feature grouping and 2DPCA. Firstly, we introduce regularization into the 2DPCA reconstruction and develop an iterative algorithm to represent an object by 2DPCA bases. Secondly, the object templates are grouped into a more discriminative image and a less discriminative image by computing the variance of the pixels in multiple frames. Then, the projection matrix is learned from the more discriminative image and the less discriminative image, and the samples are projected using the projection matrix. The object tracking results are obtained using Bayesian maximum a posteriori probability (MAP) estimation (Figures 1, 2, and 3). Finally, to further reduce tracking drift, we employ a template update strategy that combines incremental subspace learning and the error matrix. This strategy adapts the template to the appearance change of the target and also reduces the influence of occluded target templates. The experimental results show that the proposed method is robust to abnormal changes.
2. Robust Online Object Tracking Based on Feature Grouping and 2DPCA
2.1. The Theory of 2DPCA
Principal component analysis (PCA) is a well-established linear dimension-reduction technique that has been widely used in many areas (such as face recognition). It finds the projection directions along which the reconstruction error with respect to the original data is minimal and projects the original data into a lower dimensional space spanned by the directions corresponding to the top eigenvalues. Recent studies demonstrate that two-dimensional principal component analysis (2DPCA) can achieve performance comparable to PCA at a lower computational cost.
Given a series of image matrices , 2DPCA aims to obtain an orthogonal left-projection matrix , an orthogonal right-projection matrix , and the projection coefficients by solving the following objective function:
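For reference, the standard bilateral 2DPCA objective matching this description can be written as below. The symbol names are assumptions introduced here for illustration: $B_1, \dots, B_K$ for the image matrices, $U$ and $V$ for the left and right projection matrices, and $A_1, \dots, A_K$ for the projection coefficients.

```latex
\min_{U,\,V,\,A_1,\dots,A_K} \; \frac{1}{K} \sum_{i=1}^{K} \left\| B_i - U A_i V^{\top} \right\|_F^2
\quad \text{subject to} \quad U^{\top} U = I, \; V^{\top} V = I
```

Under this form, for fixed $U$ and $V$ with orthonormal columns, the minimizing coefficient is $A_i = U^{\top} B_i V$.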
Then the coefficient can be approximated by . We note that the underlying assumption of (6) is that the error term is Gaussian distributed with small variances. This assumption is not able to deal with partial occlusion as the error term cannot be modeled with small variances when occlusion occurs. In this paper, we propose an object tracking algorithm by using 2DPCA basis matrices and an additional MLE error matrix .
Let the objective function be the problem is where denotes an observation matrix, indicates its corresponding projection coefficient, is a regularization parameter, and describes the error matrix.
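The iterative representation step can be sketched as an alternating minimization. This is a minimal sketch under stated assumptions, not the authors' implementation: the objective is taken to be $\tfrac{1}{2}\|Y - U A V^{\top} - E\|_F^2 + \lambda \|E\|_1$, where the $\ell_1$ penalty on the error matrix is an assumption (the text does not state the norm), and the names `represent_with_2dpca`, `soft_threshold`, and `objective` are hypothetical.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise shrinkage: the proximal operator of lam * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def objective(Y, U, V, A, E, lam):
    """Assumed objective: 0.5 * ||Y - U A V^T - E||_F^2 + lam * ||E||_1."""
    residual = Y - U @ A @ V.T - E
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(E))

def represent_with_2dpca(Y, U, V, lam=0.1, n_iter=20):
    """Alternately update the coefficient A and the error matrix E.

    Assumes U (m x p) and V (n x q) have orthonormal columns.
    """
    E = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        A = U.T @ (Y - E) @ V                     # exact minimizer over A for fixed E
        E = soft_threshold(Y - U @ A @ V.T, lam)  # exact minimizer over E for fixed A
    return A, E

# Usage on synthetic data: a low-dimensional image plus one corrupted pixel.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((8, 3)))  # orthonormal left basis
V, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # orthonormal right basis
Y = U @ rng.standard_normal((3, 2)) @ V.T
Y[2, 4] += 5.0                                    # simulated occlusion
A, E = represent_with_2dpca(Y, U, V, lam=0.1, n_iter=50)
```

Because each update minimizes the assumed objective exactly over one block of variables, the objective value never increases from one iteration to the next.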
2.2. Feature Grouping
In object tracking problem, consists of templates and forms the object template library, where . In our experiments, we make and the object template size ; that is, . The result image block in current frame is denoted as , which has the same size as object template .
We would like to decompose each template , into a more discriminative image and a less discriminative image . First, we rewrite as , and compute the variance of each pixel in each template. If a pixel has a smaller variance, we can regard it as more stable. Based on this heuristic, we decompose the templates into a more discriminative image and a less discriminative image.
Denote by the mean of the templates . The variance of the th pixel in template is
We want to put those pixels having smaller into a more discriminative image and the remaining into a less discriminative image .
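The grouping heuristic can be illustrated with a short sketch. It assumes one variance per pixel position, computed across all templates, and a median cutoff by default. Both choices, along with the function name `group_pixels_by_variance`, are hypothetical and not taken from the paper.

```python
import numpy as np

def group_pixels_by_variance(templates, threshold=None):
    """Split templates into a stable-pixel image and an unstable-pixel image.

    templates: array of shape (n, h, w), one template per frame.
    threshold: variance cutoff; defaults to the median per-pixel variance
               (an assumed choice, not specified in the source).
    """
    T = np.asarray(templates, dtype=float)
    mean = T.mean(axis=0)                      # mean template
    var = ((T - mean) ** 2).mean(axis=0)       # per-pixel variance over templates
    if threshold is None:
        threshold = np.median(var)
    stable = var <= threshold                  # low variance: more discriminative
    more_disc = np.where(stable, T, 0.0)       # stable pixels kept, others zeroed
    less_disc = np.where(stable, 0.0, T)       # complementary pixels
    return more_disc, less_disc, stable

# Usage: four 2x2 templates in which only the top-left pixel fluctuates.
T = np.zeros((4, 2, 2))
T[:, 0, 0] = [0.0, 10.0, 0.0, 10.0]
T[:, 1, 1] = 3.0
more, less, stable = group_pixels_by_variance(T, threshold=1.0)
```

By construction the two images sum back to the original templates, so no pixel information is discarded at this stage.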
After decomposing, each template can be written as , and we have . However, if we learn the projection matrix only based on , the result cannot be very satisfying because is also useful for determining the projection direction. To effectively exploit the information in both and , we propose to learn a subspace where the energy in is well preserved and the energy in is suppressed. Denote by , , and the mean vectors of , , and , respectively, and let , , and be the centralized image vectors. Accordingly we have the centralized datasets , , and . Clearly, we have , , and . After projection, the average energy of is where is the total scatter matrix of and “tr” is the matrix trace operator.
Similarly, the average energy of is where is the total scatter matrix of .
To preserve the while suppressing the , we seek a projection matrix that maximizes the energy while minimizing the energy by solving the following optimization problem:
An equivalent form of (4) is
Apparently, the desired can be computed by using generalized eigenvalue decomposition; that is, matrix is composed of the generalized eigenvectors of corresponding to the largest eigenvalues.
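The generalized eigenvalue step can be sketched with NumPy alone. The sketch assumes that `S_a` is the scatter matrix of the more discriminative data and `S_b` that of the less discriminative data, stores samples as rows, and adds a small ridge `eps` so that `S_b` is positive definite. These names and the ridge are assumptions for illustration.

```python
import numpy as np

def scatter(X):
    """Total scatter matrix of samples stored as rows of X (n x d)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]

def learn_projection(S_a, S_b, n_components, eps=1e-6):
    """Solve S_a p = w (S_b + eps I) p and keep the largest eigenvalues.

    The problem is whitened with a Cholesky factor of the regularized S_b,
    which reduces it to a symmetric eigenproblem.
    """
    d = S_a.shape[0]
    L = np.linalg.cholesky(S_b + eps * np.eye(d))
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_a @ L_inv.T
    M = 0.5 * (M + M.T)                       # remove round-off asymmetry
    w, Z = np.linalg.eigh(M)                  # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_components]
    P = L_inv.T @ Z[:, order]                 # map back to the original coordinates
    return P, w[order]

# Usage: energy ratios are 4, 2, and 3 along the three coordinate axes.
S_a = np.diag([4.0, 1.0, 9.0])
S_b = np.diag([1.0, 0.5, 3.0])
P, w = learn_projection(S_a, S_b, n_components=2)
```

The columns of `P` are the directions along which the preserved energy is largest relative to the suppressed energy. A dedicated generalized solver would avoid the explicit inverse, which is used here only to keep the sketch short.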
3. Bayesian MAP Estimation
We regard object tracking as a Bayesian MAP estimation problem over the hidden state variables of a hidden Markov model; that is, given a set of observed samples, we estimate the hidden state variable using Bayesian MAP theory.
According to the Bayesian theory, where stands for a state transition model for two consecutive frames and stands for an observation likelihood model. We can obtain the object’s best state in th frame through maximum a posteriori probability estimation; that is, where stands for the th sample of state variable in th frame. In this paper, we choose .
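In the usual notation for recursive Bayesian tracking, the two relations described here take the following form. The symbols are assumptions introduced for illustration: $x_t$ for the state in frame $t$, $y_t$ for the observation, $Y_{1:t}$ for all observations up to frame $t$, and $x_t^i$ for the $i$th of $N$ state samples.

```latex
\begin{aligned}
p(x_t \mid Y_{1:t}) &\propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1}) \, p(x_{t-1} \mid Y_{1:t-1}) \, dx_{t-1}, \\
\hat{x}_t &= \arg\max_{x_t^i} \; p(x_t^i \mid Y_{1:t}), \qquad i = 1, \dots, N.
\end{aligned}
```

Here $p(x_t \mid x_{t-1})$ plays the role of the state transition model and $p(y_t \mid x_t)$ that of the observation likelihood model.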
3.1. State Transition Model
We choose object’s motion affine transformation parameters as state variable , where and , respectively, represent the -direction and -direction translation of the object in th frame, stands for the rotation angle, represents the scale change, stands for the aspect ratio, and stands for the direction of tilt.
We assume that the state transition model follows the Gaussian distribution; that is, where is a diagonal matrix whose diagonal elements are motion affine parameter’s variation , and .
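Drawing candidate states from a Gaussian transition model with a diagonal covariance can be sketched as follows. The parameter order (x translation, y translation, rotation, scale, aspect ratio, skew), the numeric values, and the name `sample_states` are assumptions for illustration. The `sigmas` argument holds standard deviations, that is, the square roots of the diagonal covariance entries.

```python
import numpy as np

def sample_states(prev_state, sigmas, n_samples, rng=None):
    """Draw candidate states x_t ~ N(x_{t-1}, diag(sigmas**2)).

    prev_state: the six affine parameters estimated in the previous frame.
    sigmas:     per-parameter standard deviations.
    """
    rng = np.random.default_rng() if rng is None else rng
    prev_state = np.asarray(prev_state, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    noise = rng.standard_normal((n_samples, prev_state.size)) * sigmas
    return prev_state + noise                  # one candidate state per row

# Usage with hypothetical values for the state and the parameter spreads.
prev = [120.0, 80.0, 0.0, 1.0, 1.0, 0.0]
sigmas = [4.0, 4.0, 0.01, 0.01, 0.002, 0.001]
candidates = sample_states(prev, sigmas, n_samples=600, rng=np.random.default_rng(0))
```

Because the covariance is diagonal, each affine parameter is perturbed independently, which keeps sampling cheap even for a large number of candidates.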
3.2. Observation Likelihood Model
We use object’s reconstruction error to build observation likelihood model; that is, where means Gaussian distribution, and , respectively, represent the mean and variation of Gaussian distribution, stands for the number of pixels of an object template, and stands for the reconstruction error of th pixel of object templates in th frame.
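A reconstruction-error likelihood of this kind, together with the MAP selection over candidate states, can be sketched as below. The sketch works with log-likelihoods for numerical stability and assumes that candidates were drawn from the transition model, so that the candidate with the highest likelihood is taken as the MAP estimate. The names `log_likelihood` and `select_map_state` are hypothetical.

```python
import numpy as np

def log_likelihood(errors, mu=0.0, sigma=1.0):
    """Sum of log N(e_j; mu, sigma^2) over the per-pixel reconstruction errors."""
    e = np.asarray(errors, dtype=float).ravel()
    return float(np.sum(-0.5 * ((e - mu) / sigma) ** 2
                        - np.log(sigma * np.sqrt(2.0 * np.pi))))

def select_map_state(states, errors_per_state, mu=0.0, sigma=1.0):
    """Return the candidate state with the highest likelihood and its index."""
    scores = [log_likelihood(e, mu, sigma) for e in errors_per_state]
    best = int(np.argmax(scores))
    return states[best], best

# Usage: three candidate states with hypothetical per-pixel errors.
states = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
errors = [np.array([1.0, 1.0]), np.array([0.1, 0.0]), np.array([2.0, 2.0])]
best_state, best_index = select_map_state(states, errors)
```

Summing log-densities instead of multiplying densities avoids numerical underflow when a template contains many pixels.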
4. Experimental Results and Analysis
To show the robustness of the projection-based object tracking algorithm discussed in this paper, we test it on several sets of public test videos taken under different environments (Figures 4, 5, and 6). Given the limited space, in this section we list only three of them to show the tracking results and error curves. The moving objects in the chosen test videos undergo different abnormal changes, such as occlusion, rotation, illumination variation, or scale variation (Table 1).
The algorithm is implemented on the Windows operating system. The test computer has an AMD Athlon (TM) X2 Dual Core QL-62 2.00 GHz processor and 1.74 GB of memory. To evaluate the performance of the algorithm, we compare it with six representative and classic tracking algorithms: L1 Tracker, IVT Tracker, PN Tracker, VTD Tracker, MIL Tracker, and Frag Tracker.
The average processing speeds of our method on the different test videos are listed in Table 2. The statistics were obtained using OpenCV 2.2. Table 2 shows that the proposed algorithm can meet real-time requirements.
5. Conclusion
This paper presents a robust tracking algorithm based on feature grouping and 2DPCA. We represent the tracked object using 2DPCA bases and feature grouping, which reduces the effect of abnormal pixels on tracking. The tracking result is obtained within a Bayesian maximum a posteriori probability estimation framework, yielding a stable and robust tracker. We also explicitly take partial occlusion and misalignment into account for appearance model update and object tracking. Experiments on challenging video clips show that our tracking algorithm performs better than several state-of-the-art algorithms. Our future work will generalize the representation model to other related fields.
The research described in this paper was supported by the Fundamental Research Funds for the Central Universities (DC110321, DC120101132, and DC120101131), by a project of the Liaoning Provincial Department of Education (L2012476 and L2010094), and by the National Natural Science Foundation of China (61172058).
M. X. Jiang, M. Li, and H. Y. Wang, “A robust combined algorithm of object tracking based on moving object detection,” in Proceedings of the International Conference on Intelligent Control and Information Processing (ICICIP '10), pp. 619–622, Dalian, China, August 2010.
O. Williams, A. Blake, and R. Cipolla, “A sparse probabilistic learning algorithm for real-time tracking,” in Proceedings of the 8th IEEE International Conference on Computer Vision, pp. 353–360, October 2003.
Z. Kalal, J. Matas, and K. Mikolajczyk, “P-N learning: bootstrapping binary classifiers by structural constraints,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 49–56, San Francisco, Calif, USA, June 2010.
B. Babenko, S. Belongie, and M. Yang, “Visual tracking with online multiple instance learning,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops '09), pp. 983–990, Miami Beach, Fla, USA, June 2009.