Object Tracking via 2DPCA and ℓ2-Regularization
We present a fast and robust object tracking algorithm that uses 2DPCA and ℓ2-regularization in a Bayesian inference framework. First, we model the challenging appearance of the tracked object using 2DPCA bases, which exploit the strength of subspace representation. Second, we adopt ℓ2-regularization to solve the proposed representation model and remove the trivial templates from the sparse tracking method, which yields faster tracking. Finally, we present a novel likelihood function that considers the reconstruction error derived from the orthogonal left-projection matrix and the orthogonal right-projection matrix. Experimental results on several challenging image sequences demonstrate that the proposed method performs favorably against state-of-the-art tracking algorithms.
1. Introduction
Visual tracking is one of the fundamental topics in computer vision and plays an important role in research and in practical applications such as surveillance, human-computer interaction, robotics, and traffic control. Existing object tracking algorithms can be divided into two categories: discriminative and generative. Discriminative methods treat tracking as a binary classification problem with local search, which estimates the decision boundary between an object image patch and the background. Babenko et al. proposed online multiple instance learning (MIL), which treats ambiguous positive and negative samples as bags to learn a discriminative classifier. Zhang et al. proposed a fast compressive tracking algorithm that employs nonadaptive random projections that preserve the structure of the image feature space.
Generative methods typically learn a model to represent the target object and incrementally update the appearance model, searching for the image region with minimal reconstruction error. Inspired by the success of sparse representation in face recognition, super-resolution, and inpainting, sparse-representation-based visual tracking [6–9] has recently attracted increasing interest. Mei and Ling first extended sparse representation to object tracking and cast the tracking problem as finding the likeliest patch under a sparse representation over templates. The method can handle partial occlusion by treating the error term as sparse noise. However, it requires solving a series of complicated ℓ1-norm-related minimization problems many times, so the time cost is considerable. Although some modified ℓ1-norm methods have been proposed to speed up the tracker, they are still far from real time.
Recently, many object tracking algorithms have been proposed that exploit the power of subspace representation from different perspectives. Ross et al. present a tracking method that incrementally learns a low-dimensional PCA subspace representation, efficiently adapting online to changes in the appearance of the target. However, this method is sensitive to partial occlusion. Zhong et al. proposed a robust object tracking algorithm based on a sparse collaborative appearance model that exploits both holistic templates and local representations to account for appearance changes. Zhuang et al. cast the tracking problem as finding the candidate that scores highest in an evaluation model based on a matrix called the discriminative sparse similarity map. Qian et al. exploit an appearance model based on extended incremental nonnegative matrix factorization for visual tracking. Wang and Lu present an online object tracking algorithm that uses 2DPCA and ℓ1-regularization. This method achieves good performance in many scenes. However, its coefficients and sparse error matrix must be computed with an iterative algorithm, and the space and time complexity are too high for real-time tracking.
Motivated by the aforementioned work, this paper presents a robust and fast ℓ2-norm tracking algorithm with an adaptive appearance model. The contributions of this work are threefold: (1) we exploit the strength of the 2DPCA subspace representation using ℓ2-regularization; (2) we remove the trivial templates from the sparse tracking method; (3) we present a novel likelihood function that considers the reconstruction error derived from the orthogonal left-projection matrix and the orthogonal right-projection matrix. Both qualitative and quantitative evaluations on video sequences demonstrate that the proposed method handles occlusion, illumination changes, scale changes, and nonrigid appearance changes effectively at lower computational complexity and can run in real time.
2. Object Representation via 2DPCA and ℓ2-Regularization
Principal component analysis (PCA) is a classical feature extraction and data representation technique widely used in pattern recognition and computer vision. In contrast to PCA, two-dimensional principal component analysis (2DPCA) operates on 2D matrices rather than 1D vectors, so the image matrix does not need to be transformed into a vector beforehand. As a result, extracting image features is computationally more efficient with 2DPCA than with PCA. In this paper, we represent the object using 2D basis matrices. Given a series of image matrices, the projection coefficient matrices are obtained by minimizing the reconstruction error, measured in the Frobenius norm, with respect to an orthogonal left-projection matrix and an orthogonal right-projection matrix.
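As an illustration of the 2D subspace representation, the sketch below learns left- and right-projection matrices from a stack of image matrices and projects one image onto them. It assumes a non-iterative, eigendecomposition-based approximation of bilateral 2DPCA, which may differ from the solver used in this work; the names `learn_2dpca_bases`, `U`, `V`, `k_left`, and `k_right` are illustrative.

```python
import numpy as np

def learn_2dpca_bases(images, k_left, k_right):
    """Return (mean, U, V) for a stack of image matrices of shape (n, h, w).

    Assumption: U and V are the leading eigenvectors of the row and column
    scatter matrices (a common non-iterative approximation).
    """
    images = np.asarray(images, dtype=float)
    mean = images.mean(axis=0)
    centered = images - mean
    # Row scatter (h x h) and column scatter (w x w), averaged over samples.
    row_scatter = np.einsum("nij,nkj->ik", centered, centered) / len(images)
    col_scatter = np.einsum("nji,njk->ik", centered, centered) / len(images)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, row_vecs = np.linalg.eigh(row_scatter)
    _, col_vecs = np.linalg.eigh(col_scatter)
    U = row_vecs[:, ::-1][:, :k_left]
    V = col_vecs[:, ::-1][:, :k_right]
    return mean, U, V

rng = np.random.default_rng(0)
stack = rng.normal(size=(20, 8, 6))          # synthetic data for illustration
mean, U, V = learn_2dpca_bases(stack, k_left=3, k_right=2)
coeffs = U.T @ (stack[0] - mean) @ V         # projection coefficient matrix
```

Because the image is never flattened, the two scatter matrices have the size of the image height and width rather than the size of the pixel count, which is where the efficiency gain over vector-based PCA comes from.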
The cost function is set as an ℓ2-regularized quadratic function whose regularization weight is a constant. The solution of (2) is easily derived in closed form; it involves identity matrices, the Kronecker product of the two projection matrices, and the vectorized form of the observation matrix, and it yields the projection coefficient matrix. The resulting projection matrix is independent of the candidate observation, so it can be precalculated once in each frame before looping over all candidates. When a new candidate arrives, its coefficients are obtained by a single matrix-vector product, which makes the proposed method very fast.
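In assumed notation, the ℓ2-regularized problem and its closed-form solution can be written as follows. Here $B$ is a centered candidate observation, $U$ and $V$ are the left- and right-projection matrices, $A$ is the coefficient matrix, $\lambda$ is the regularization constant, $\operatorname{vec}(\cdot)$ stacks columns, and $P$ is the precomputed projection matrix. These symbols are illustrative choices and may not match the original notation.

```latex
\begin{align}
\min_{A}\; & \|B - U A V^{\top}\|_F^2 + \lambda \|A\|_F^2, \\
\operatorname{vec}(A) &= P\,\operatorname{vec}(B), \qquad
P = \bigl((V \otimes U)^{\top}(V \otimes U) + \lambda I\bigr)^{-1}(V \otimes U)^{\top}.
\end{align}
```

The second line follows from the identity $\operatorname{vec}(U A V^{\top}) = (V \otimes U)\operatorname{vec}(A)$. If $U$ and $V$ have orthonormal columns, then $(V \otimes U)^{\top}(V \otimes U) = I$ and the solution reduces to $A = U^{\top} B V / (1 + \lambda)$.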
Here, we abandon the trivial templates completely, so that the target is represented entirely by the 2DPCA subspace. Once the projection coefficient matrix has been obtained from (3), the error matrix follows directly as the residual between the observation and its reconstruction, so it needs to be calculated only once.
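A minimal sketch of the precomputation and per-candidate encoding described above, assuming column-major vectorization and a ridge (ℓ2) penalty on the coefficients. The function names `precompute_projection` and `encode` and the matrix sizes are illustrative; the value 0.05 is the regularization setting reported in the experiments.

```python
import numpy as np

def precompute_projection(U, V, lam):
    """Build P so that vec(A) = P @ vec(B) solves
    min_A ||B - U A V^T||_F^2 + lam * ||A||_F^2 (column-major vec)."""
    D = np.kron(V, U)                          # vec(U A V^T) = D @ vec(A)
    gram = D.T @ D + lam * np.eye(D.shape[1])
    return np.linalg.solve(gram, D.T)

def encode(B, U, V, P):
    """Return the coefficient matrix A and the error matrix E = B - U A V^T."""
    a = P @ B.reshape(-1, order="F")           # one matrix-vector product
    A = a.reshape(U.shape[1], V.shape[1], order="F")
    E = B - U @ A @ V.T
    return A, E

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(8, 3)))   # orthonormal columns
V, _ = np.linalg.qr(rng.normal(size=(6, 2)))
lam = 0.05
P = precompute_projection(U, V, lam)           # computed once per frame
B = rng.normal(size=(8, 6))                    # one centered candidate
A, E = encode(B, U, V, P)
```

Since `P` depends only on the bases and the regularization constant, it is shared by every candidate in a frame; only the product with the vectorized candidate is repeated.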
3. Tracking Framework Based on 2DPCA and ℓ2-Regularization
Visual tracking is treated as a Bayesian inference task in a Markov model with hidden state variables. Given the series of image matrices observed up to the current frame, we aim to estimate the hidden state variable recursively, using a motion model that represents the state transition between two consecutive states and an observation model that defines the likelihood function.
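For reference, the standard recursive Bayesian filtering form of this estimate is shown below. The symbols $x_t$ (hidden state at frame $t$) and $y_{1:t}$ (observations up to frame $t$) are assumed notation.

```latex
\begin{equation}
p(x_t \mid y_{1:t}) \;\propto\; p(y_t \mid x_t)
\int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}
\end{equation}
```

In this form, $p(x_t \mid x_{t-1})$ plays the role of the motion model and $p(y_t \mid x_t)$ the role of the observation model; a particle filter approximates the integral with a finite set of sampled states.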
Motion Model. We apply an affine image warp to model the target motion between consecutive states. Six parameters of the affine transform are used to model the motion of the tracked target: the horizontal and vertical translations, rotation angle, scale, aspect ratio, and skew. The state transition is formulated as a random walk; that is, the new state is drawn from a Gaussian distribution centered at the previous state, with a diagonal covariance matrix that holds the variances of the affine parameters.
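The random-walk motion model can be sketched as below. The parameter order, the standard deviations, and the function name `sample_candidates` are assumptions made for illustration; only the particle count of 600 comes from the experimental setup.

```python
import numpy as np

def sample_candidates(prev_state, sigmas, n_particles, rng):
    """Draw candidate states from N(prev_state, diag(sigmas**2)).

    Assumed state layout: x translation, y translation, rotation angle,
    scale, aspect ratio, skew.
    """
    prev_state = np.asarray(prev_state, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Independent Gaussian noise per parameter (diagonal covariance).
    noise = rng.normal(size=(n_particles, prev_state.size)) * sigmas
    return prev_state + noise

rng = np.random.default_rng(0)
prev = [120.0, 80.0, 0.0, 1.0, 1.0, 0.0]
sigmas = [4.0, 4.0, 0.02, 0.01, 0.002, 0.001]   # illustrative values only
candidates = sample_candidates(prev, sigmas, n_particles=600, rng=rng)
```

Each row of `candidates` defines one affine warp; the warped image patch is what the observation model scores.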
Observation Model. If no occlusion occurs, an image observation can be generated by a 2DPCA subspace (spanned by the left- and right-projection matrices and centered at the mean observation). Here, we consider partial occlusion in the appearance model for robust tracking. Thus, we assume that the centered image matrices can be represented by a linear combination of the two projection matrices. Then, we draw a set of candidate samples of the state. For each of the observed image matrices, we solve an ℓ2-regularization problem, one per sampled state. This yields the coefficient matrix of each candidate, and the likelihood can be measured by the reconstruction error. However, penalizing the level of the error as well helps to locate the tracked target more precisely. Therefore, we present a novel likelihood function that considers both the reconstruction error and the level of the error matrix, where the error matrix is calculated by (9) from the coefficient matrix given by (3).
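The following sketch scores candidates by reconstruction error and picks the most likely one. It is an illustration under stated assumptions, not the exact likelihood of this work: the bases are assumed to have orthonormal columns, `penalty` weights a hypothetical ℓ1 term on the error matrix, and `scale` is a hypothetical bandwidth.

```python
import numpy as np

def likelihood(B, U, V, lam, penalty=0.0, scale=1.0):
    """Score one centered candidate B against the subspace (U, V).

    Assumes orthonormal columns in U and V, so the ridge solution is
    A = U^T B V / (1 + lam). `penalty` and `scale` are hypothetical knobs.
    """
    A = U.T @ B @ V / (1.0 + lam)
    E = B - U @ A @ V.T
    cost = np.sum(E ** 2) + penalty * np.sum(np.abs(E))
    return np.exp(-cost / scale)

def best_candidate(candidates, U, V, lam, **kwargs):
    """Return the index of the highest-likelihood candidate and all scores."""
    scores = [likelihood(B, U, V, lam, **kwargs) for B in candidates]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(8, 3)))
V, _ = np.linalg.qr(rng.normal(size=(6, 2)))
inside = U @ rng.normal(size=(3, 2)) @ V.T     # lies in the subspace
outside = inside + rng.normal(size=(8, 6))     # corrupted copy
idx, scores = best_candidate([outside, inside], U, V, lam=0.05, penalty=0.1)
```

The candidate that lies in the subspace leaves only a small residual and therefore receives the higher score.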
Online Update. To handle appearance changes of the tracked target, the observation model must be updated. If imprecise samples are used for the update, the tracking model may degrade. Therefore, we present an occlusion-ratio-based update mechanism. After obtaining the best candidate state of each frame, we compute the corresponding error matrix and the occlusion ratio. Two thresholds, 0.1 and 0.6, are introduced to define the degree of occlusion. If the occlusion ratio is below the lower threshold, the tracked target is not occluded, or only a small part of it is corrupted by noise, so the model is updated with the sample directly. If the occlusion ratio lies between the two thresholds, the tracked target is partially occluded; the occluded part is replaced by the average observation and the recovered candidate is used for the update. If the occlusion ratio exceeds the upper threshold, most of the tracked target is occluded, so the sample is discarded without an update. After enough samples have accumulated, we use an incremental 2DPCA algorithm to update the tracker (the left- and right-projection matrices).
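The three-way update rule can be sketched as follows. The thresholds 0.1 and 0.6 come from the text; the definition of the occlusion ratio as the fraction of pixels whose absolute error exceeds `pixel_thresh`, the handling of values exactly at a threshold, and the function names are assumptions.

```python
import numpy as np

def occlusion_ratio(E, pixel_thresh):
    """Fraction of pixels with |error| above pixel_thresh (assumed definition)."""
    return float(np.mean(np.abs(E) > pixel_thresh))

def sample_for_update(B, E, mean, pixel_thresh=0.1, tr1=0.1, tr2=0.6):
    """Return the sample to store for the next model update, or None to skip."""
    ratio = occlusion_ratio(E, pixel_thresh)
    if ratio < tr1:
        return B.copy()                       # little or no occlusion
    if ratio <= tr2:
        occluded = np.abs(E) > pixel_thresh
        return np.where(occluded, mean, B)    # fill occluded pixels with the mean
    return None                               # heavy occlusion: discard

mean = np.zeros((4, 5))
B = np.ones((4, 5))
E = np.zeros((4, 5))
E[:2, :] = 1.0                                # 10 of 20 pixels flagged
recovered = sample_for_update(B, E, mean)
```

Samples returned by `sample_for_update` would be buffered and passed to the incremental 2DPCA update once enough of them have accumulated.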
4. Experiments
The proposed tracking algorithm is implemented in MATLAB and runs on a computer with an Intel i5-3210 CPU (2.5 GHz) and 4 GB of memory. The regularization parameter is set to 0.05. The image observation is resized to a fixed patch size for the proposed 2DPCA representation. For each sequence, the location of the tracked target object is manually labeled in the first frame. We use 600 particles as a trade-off between effectiveness and speed. Our tracker is incrementally updated every 5 frames.
To demonstrate the effectiveness of the proposed tracking algorithm, we select six state-of-the-art trackers for comparison: the ℓ1 tracker, the PN tracker, the VTD tracker, the MIL tracker, the Frag tracker, and the 2DPCA tracker. The comparison uses several challenging image sequences: Occlusion 1, David Outdoor, Caviar 2, Girl, Car 4, Car 11, Singer 1, Deer, Jumping, and Lemming. The challenging factors include severe occlusion, pose change, motion blur, illumination variation, and background clutter.
4.1. Qualitative Evaluation
Severe Occlusion. We test four sequences (Occlusion 1, David Outdoor, Caviar 2, and Girl) with long-term partial or heavy occlusion and scale change. Figure 1(a) demonstrates that the ℓ1 algorithm, the Frag algorithm, 2DPCA, and our algorithm perform better, since these methods take partial occlusion into account. The ℓ1 algorithm, 2DPCA, and our algorithm handle occlusion by avoiding the update of occluded pixels into the PCA basis or the 2DPCA basis. The Frag algorithm works well in some simple occlusion cases (e.g., Figure 1(a), Occlusion 1) thanks to its part-based representation. However, this method performs poorly on more challenging videos (e.g., Figure 1(b), David Outdoor). The MIL tracker is not able to track the occluded target in David Outdoor and Caviar 2, since the Haar-like features adopted by the MIL method are less effective at distinguishing similar objects. For the Girl video, the in-plane and out-of-plane rotation, partial occlusion, and scale change make tracking difficult. It can be seen that the Frag tracker and the proposed tracker work better than the other methods.
Illumination Change. Figures 1(e)–1(g) present tracking results on the Car 4, Car 11, and Singer 1 sequences, which exhibit significant changes of illumination and scale as well as background clutter. The ℓ1 tracker, the 2DPCA tracker, and the proposed tracker perform well on the Car 4 sequence, whereas the other trackers drift away when the target vehicle goes underneath the overpass or the trees. For the Car 11 sequence, 2DPCA and the proposed tracker achieve robust tracking results, whereas the other trackers drift away when drastic illumination change occurs or when a similar object appears. In the Singer 1 sequence, the drastic illumination and scale changes make tracking difficult. It can be seen that the proposed tracker performs better than the other methods.
Motion Blur. It is difficult for tracking algorithms to predict the location of the tracked object when the target moves abruptly. Figures 1(h) and 1(i) show the tracking results on the Deer and Jumping sequences. In the Deer sequence, the animal's appearance is almost indistinguishable because of the fast motion, and most methods lose the target right at the beginning of the video. At frame 53, the PN tracker locks onto a similar deer instead of the correct object. The results show that the VTD tracker and our tracker perform better than the other algorithms. The 2DPCA tracker may recover the target by chance after a failure. The appearance changes in the Jumping sequence are so drastic that the ℓ1, Frag, and VTD trackers drift away from the object. Our tracker successfully keeps track of the object with small errors, whereas the MIL, PN, and 2DPCA trackers track the target only in some frames.
Background Clutter. Figure 1(j) illustrates the tracking results on the Lemming sequence, which contains scale and pose changes as well as severe occlusion against a cluttered background. The Frag tracker loses the target object at the beginning of the sequence, and the VTD tracker also fails when the target object moves quickly or rotates. In contrast, the proposed method adapts to the heavy occlusion, in-plane rotation, and scale change.
4.2. Quantitative Evaluation
To conduct quantitative comparisons between the proposed tracking method and the other state-of-the-art trackers, we compute the center location error in pixels and the overlap rate, the two measures most widely used in quantitative evaluation. The center location error is defined as the Euclidean distance between the center location of the tracked object and the corresponding labeled ground truth. Figure 2 shows the center error plots, where a smaller center error means a more accurate result in each frame. The overlap rate score is defined as the area of the intersection of the tracked bounding box and the ground truth bounding box of a frame, divided by the area of their union. Figure 3 shows the overlap rates of each tracking algorithm for all sequences. Generally speaking, our tracker performs favorably against the other methods.
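Both measures are straightforward to compute per frame. The sketch below assumes axis-aligned boxes given as `(x, y, width, height)`; boxes produced by an affine tracker may be rotated, in which case the intersection would need a polygon-based computation instead. The function names are illustrative.

```python
import numpy as np

def center_error(box_t, box_g):
    """Euclidean distance between box centers; boxes are (x, y, w, h)."""
    cx_t, cy_t = box_t[0] + box_t[2] / 2.0, box_t[1] + box_t[3] / 2.0
    cx_g, cy_g = box_g[0] + box_g[2] / 2.0, box_g[1] + box_g[3] / 2.0
    return float(np.hypot(cx_t - cx_g, cy_t - cy_g))

def overlap_rate(box_t, box_g):
    """Intersection area divided by union area for axis-aligned boxes."""
    x1 = max(box_t[0], box_g[0])
    y1 = max(box_t[1], box_g[1])
    x2 = min(box_t[0] + box_t[2], box_g[0] + box_g[2])
    y2 = min(box_t[1] + box_t[3], box_g[1] + box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_t[2] * box_t[3] + box_g[2] * box_g[3] - inter
    return inter / union if union > 0 else 0.0

tracked = (10.0, 10.0, 20.0, 20.0)
truth = (20.0, 10.0, 20.0, 20.0)
err = center_error(tracked, truth)     # centers are 10 pixels apart
rate = overlap_rate(tracked, truth)    # 200 / 600
```

Averaging `err` and `rate` over all frames of a sequence gives the per-sequence summary numbers typically reported alongside the per-frame plots.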
4.3. Computational Complexity
The most time-consuming part of a generative tracking algorithm is computing the coefficients with respect to the basis vectors. For the ℓ1 tracker, the cost of computing the coefficients with the LASSO algorithm depends on the dimension of the subspace and the number of basis vectors. The load of the 2DPCA tracker with ℓ1-regularization additionally scales with the number of iterations (e.g., 10 on average). For our tracker, the trivial templates are abandoned and square templates are not used, so its load is correspondingly lower. The tracking speeds of the ℓ1 tracker, the 2DPCA tracker, and our method are 0.25 fps, 2.2 fps, and 5.2 fps, respectively (fps: frames per second). Therefore, our tracker is more efficient and much faster than the aforementioned trackers.
5. Conclusion
In this paper, we present a fast and effective tracking algorithm. We first clarify the benefits of utilizing 2DPCA basis vectors. Then, we formulate the tracking process with ℓ2-regularization. Finally, we update the appearance model while accounting for partial occlusion. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed method outperforms several state-of-the-art trackers.
The authors declare that they have no competing interests.
This project is supported by the Shandong Provincial Natural Science Foundation, China (no. ZR2015FL009).
B. Babenko, M. H. Yang, and S. Belongie, “Visual tracking with online multiple instance learning,” in Proceedings of the 22nd IEEE Conference on Computer Vision and Pattern Recognition, pp. 983–990, San Francisco, Calif, USA, 2009.
K. H. Zhang, L. Zhang, and M. H. Yang, “Real time compressive tracking,” in Proceedings of the 12th European Conference on Computer Vision, pp. 864–877, Florence, Italy, 2012.
T. Zhang, B. Ghanem, S. Liu, and N. Ahuja, “Robust visual tracking via multi-task sparse learning,” in Proceedings of the 25th IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042–2049, Providence, RI, USA, 2012.
Z. Kalal, J. Matas, and K. Mikolajczyk, “P-N learning: bootstrapping binary classifiers by structural constraints,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 49–56, San Francisco, Calif, USA, June 2010.
A. Adam, E. Rivlin, and I. Shimshoni, “Robust fragments-based tracking using the integral histogram,” in Proceedings of the 19th IEEE Conference on Computer Vision and Pattern Recognition, pp. 798–805, New York, NY, USA, 2006.