Abstract

We present a novel visual object tracking algorithm based on two-dimensional principal component analysis (2DPCA) and maximum likelihood estimation (MLE). Firstly, we introduce regularization into the 2DPCA reconstruction and develop an iterative algorithm to represent an object by 2DPCA bases. Secondly, the model of sparsity constrained MLE is established. Abnormal pixels in the samples are assigned low weights to reduce their effects on the tracking algorithm. The object tracking results are obtained by using Bayesian maximum a posteriori (MAP) probability estimation. Finally, to further reduce tracking drift, we employ a template update strategy which combines incremental subspace learning and the error matrix. This strategy adapts the template to the appearance change of the target and also reduces the influence of occluded target templates. Compared with other popular methods, our method reduces the computational complexity and is robust to abnormal changes. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed tracking algorithm achieves more favorable performance than several state-of-the-art methods.

1. Introduction

As one of the fundamental problems of computer vision, visual tracking plays a critical role in advanced vision-based applications (e.g., visual surveillance, human-computer interaction, augmented reality, intelligent transportation, and context-based video compression) [1-3]. However, building a robust model-free tracker remains challenging because of the appearance variability of the object of interest, which includes intrinsic factors (e.g., pose variation and shape deformation) and extrinsic factors (e.g., illumination changes, camera motion, and occlusions).

Typically, a complete tracking system can be divided into three main components: (1) an appearance observation model, which evaluates the likelihood of a candidate state belonging to the object model, (2) a motion model, which describes the states of an object over time (such as Kalman filtering and particle filtering), and (3) a search strategy for finding the most likely states in the current frame (e.g., mean shift and sliding window). In this paper, we focus on developing a robust appearance model.

Due to the power of subspace representation, subspace-based trackers (e.g., [4, 5]) are robust to in-plane rotation, scale change, illumination variation, and pose change. However, they are sensitive to partial occlusion because of their underlying assumption that the error term is Gaussian distributed with small variances. This assumption does not hold for object representation when partial occlusion occurs, as the noise term cannot be modeled with small variances.

An effective tracking algorithm (called the L1 tracker) based on sparse representation within a particle filter framework is developed in [6]. The L1 tracker represents the tracked target by using a set of target templates and trivial templates. The target templates depict a subspace of the tracked object, and the trivial templates aim to model the occlusion effectively. However, the use of trivial templates increases the number of templates significantly, which makes the computational complexity of the L1 tracker too high for practical applications.

In [7], the authors also presented a sparse coding-based tracker by combining sparse coding and Kalman filtering and fusing the color and gradient features. To account for the variations of the tracked object during tracking, they use a template update strategy that replaces a random template of the original template library with the last tracking result. However, this simple update scheme can easily introduce tracking errors when abnormal changes occur, which may cause tracking drift.

Motivated by the aforementioned discussion, we propose an object tracking algorithm based on 2DPCA and MLE. Firstly, we introduce regularization into the 2DPCA reconstruction and develop an iterative algorithm to represent an object by 2DPCA bases. Secondly, the model of sparsity constrained MLE is established. Abnormal pixels in the samples are assigned low weights to reduce their effects on the tracking algorithm. The object tracking results are obtained by using Bayesian maximum a posteriori probability (MAP) estimation. Finally, to further reduce tracking drift, we employ a template update strategy which combines incremental subspace learning and the error matrix. This strategy adapts the template to the appearance change of the target and also reduces the influence of occluded target templates. The experimental results show that our algorithm can achieve stable and robust performance, especially when occlusion, rotation, scaling, or illumination variation occurs.

2. Visual Object Tracking Model Based on 2DPCA and MLE

2.1. The Theory of 2DPCA

Principal component analysis (PCA) is a well-established linear dimension-reduction technique, which has been widely used in many areas (such as face recognition [8]). It finds the projection directions along which the reconstruction error to the original data is minimum and projects the original data into a lower dimensional space spanned by those directions corresponding to the top eigenvalues. Recent studies demonstrate that two-dimensional principal component analysis (2DPCA) could achieve performance comparable to PCA with less computational cost [9, 10].

Given a series of image matrices $B_1, B_2, \ldots, B_K$, 2DPCA aims to obtain an orthogonal left-projection matrix $U$, an orthogonal right-projection matrix $V$, and the projection coefficients $A_1, A_2, \ldots, A_K$ by solving the following objective function:

$$\min_{U, V, A_1, \ldots, A_K} \sum_{i=1}^{K} \left\| B_i - U A_i V^{T} \right\|_F^2. \tag{1}$$

Then the coefficient $A_i$ can be approximated by $A_i = U^{T} B_i V$. We note that the underlying assumption of (1) is that the error term is Gaussian distributed with small variances. This assumption is not able to deal with partial occlusion as the error term cannot be modeled with small variances when occlusion occurs. In this paper, we propose an object tracking algorithm by using the 2DPCA basis matrices $U$ and $V$ and an additional MLE error matrix $E$.

Let $L(A, E)$ denote the objective function, which combines the residual $B - U A V^{T} - E$ with a regularization term; the problem is $\min_{A, E} L(A, E)$, where $B$ denotes an observation matrix, $A$ indicates its corresponding projection coefficient, and $\lambda$ is a regularization parameter. $E$ describes the error matrix.
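As a rough illustration of the bilateral projection described in this section, the following Python sketch estimates a left basis and a right basis from the row and column scatter matrices of a stack of images, then projects and reconstructs one image. The symbols `U`, `V`, `A`, and `B` (left basis, right basis, coefficient, image), the function names, and the one-shot eigen-decomposition are assumptions made for illustration; the regularized iterative algorithm of this paper is not reproduced here.

```python
import numpy as np

def fit_2dpca(images, k_left, k_right):
    """Estimate left/right 2DPCA bases from equally sized images.

    Assumption: the bases are the leading eigenvectors of the row and
    column scatter matrices (a common one-shot approximation).
    """
    stack = np.asarray(images, dtype=float)          # shape (K, h, w)
    # Row scatter: sum_k B_k B_k^T (h x h); column scatter: sum_k B_k^T B_k (w x w).
    row_scatter = np.einsum("kij,klj->il", stack, stack)
    col_scatter = np.einsum("kji,kjl->il", stack, stack)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, left_vecs = np.linalg.eigh(row_scatter)
    _, right_vecs = np.linalg.eigh(col_scatter)
    U = left_vecs[:, ::-1][:, :k_left]
    V = right_vecs[:, ::-1][:, :k_right]
    return U, V

def project(B, U, V):
    """Projection coefficient A = U^T B V."""
    return U.T @ B @ V

def reconstruct(A, U, V):
    """Approximate the image as U A V^T."""
    return U @ A @ V.T
```

Keeping fewer columns in `U` and `V` gives a smaller coefficient matrix at the cost of a coarser reconstruction; keeping all columns reproduces the image exactly.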

2.2. MLE Model

The basic idea of sparse coding is to use the templates in a given dictionary $D$ to represent a testing sample $y$ (as $y \approx D\alpha$), where $\alpha$ is the sparse coding coefficient vector. Traditionally, sparsity is measured by the L0-norm, but L0-norm minimization is an NP-hard problem. Fortunately, [11] proves that when the solution is sparse enough, L0-norm minimization is equivalent to L1-norm minimization.

Therefore, the sparse coding problem can be defined as [12, 13]

$$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.} \quad \|y - D\alpha\|_2^2 \le \varepsilon,$$

where $\varepsilon$ is a very small constant. This model shows two constraints in sparse coding: one is that $\|\alpha\|_1$ constrains the sparsity of the represented signal; the other is that $\|y - D\alpha\|_2^2 \le \varepsilon$ constrains the accuracy of the represented signal [14-17].

The two constraint terms can be analyzed as follows. For object tracking, the accuracy constraint is more important than the sparsity constraint, especially when occlusion, rotation, scaling, or illumination variation happens to the object. Under such abnormal changes, whether the model can accurately describe the object directly determines the success or failure of the tracking algorithm. Most current algorithms assume that the sparse coding residual follows the Gaussian distribution. In practice, however, this assumption breaks down when abnormal changes happen, which inevitably leads to tracking failure.

As for the sparsity constraint, although L1-norm minimization is more efficient than L0-norm minimization, it is still very time consuming. Object tracking differs from face recognition in this respect: face recognition algorithms do not demand fast processing during sample training, whereas in object tracking, slow processing directly reduces the practical value of the algorithm. Consequently, introducing L1-norm minimization into object tracking would greatly slow down tracking algorithms.

We note that the tracking accuracy and speed are two important aspects for evaluating the performance of object tracking algorithms. Therefore, in this paper, we develop an MLE-based model that improves the traditional sparse coding model from the two aspects and then apply it to achieve an effective and efficient tracker.

In the field of object tracking, accuracy is the most important issue. Hence, at first, we need to improve the accuracy constraint term in the traditional sparse coding model.

When the reconstruction error follows the Gaussian distribution, the traditional sparse coding solution can be written as

$$\hat{\alpha} = \arg\min_{\alpha} \|y - D\alpha\|_2^2 + \lambda \|\alpha\|_1, \tag{5}$$

where $\lambda$ is a regularization parameter. For object tracking, the dictionary $D$ consists of $n$ templates and forms the object template library: $D = [d_1, d_2, \ldots, d_n]$, $d_i \in \mathbb{R}^{m}$. In our experiments, the number of templates $n$ and the object template size, and therefore the dimension $m$, are fixed. The result image block in the current frame is denoted as $y$, $y \in \mathbb{R}^{m}$. $\alpha$ denotes the coefficient vector of sparse coding. Equation (5) is obviously a minimum variance estimation problem with sparsity constraint. When the object's reconstruction error follows the Gaussian distribution, the solution of (5) is the maximum likelihood estimation.
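To make the L1-regularized least-squares form of sparse coding concrete, the sketch below minimizes $\|y - D\alpha\|_2^2 + \lambda\|\alpha\|_1$ with iterative soft thresholding (ISTA). ISTA is chosen here only for brevity and is an assumption; it is not stated to be the solver used in this paper, and the function names and iteration count are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Shrink every entry of v toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_sparse_code(D, y, lam, n_iter=500):
    """Minimize ||y - D a||_2^2 + lam * ||a||_1 by iterative soft thresholding."""
    D = np.asarray(D, dtype=float)
    y = np.asarray(y, dtype=float)
    # Lipschitz constant of the gradient 2 D^T (D a - y).
    lipschitz = 2.0 * np.linalg.norm(D, 2) ** 2
    step = 1.0 / lipschitz
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - y)
        a = soft_threshold(a - step * grad, step * lam)
    return a
```

Each iteration costs two matrix-vector products with the dictionary, which illustrates why the number of templates directly affects tracking speed.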

However, in practical applications, when the object suffers from occlusion, rotation change, scale change, or illumination variation, the reconstruction errors of abnormal pixels will not follow the Gaussian distribution. In that case, algorithms built on the Gaussian assumption may not track the object accurately. Therefore, we need to build a more adaptive object representation model.

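First, we rewrite the dictionary as $D = [r_1; r_2; \ldots; r_m]$, where the row vector $r_j$, $j = 1, 2, \ldots, m$, is the $j$th row of $D$. Meanwhile, we rewrite the tracking result image block as $y = [y_1; y_2; \ldots; y_m]$, where $y_j$, $j = 1, 2, \ldots, m$, is the $j$th pixel of $y$. In that case, the reconstruction error is $e = y - D\alpha = [e_1; e_2; \ldots; e_m]$, where $e_j = y_j - r_j \alpha$, $j = 1, 2, \ldots, m$, is the $j$th pixel's reconstruction error.

Assume that $e_1, e_2, \ldots, e_m$ are independently and identically distributed according to a certain probability density function $f_\theta(e_j)$, where $\theta$ denotes the parameter set that characterizes the distribution. Then the likelihood function would be $L_\theta(e_1, \ldots, e_m) = \prod_{j=1}^{m} f_\theta(e_j)$, and MLE aims to maximize this likelihood function or, equivalently, minimize the objective function $-\ln L_\theta = \sum_{j=1}^{m} \rho_\theta(e_j)$, where $\rho_\theta(e_j) = -\ln f_\theta(e_j)$, to simplify the computation.

Taking into account the sparsity constraint of $\alpha$, the MLE of $\alpha$ can be formulated as the following minimization:

$$\min_{\alpha} \sum_{j=1}^{m} \rho_\theta(y_j - r_j \alpha) + \lambda \|\alpha\|_1, \tag{6}$$

where $\lambda$ is the regularization parameter. According to [6], formula (6) can be converted into the weighted sparse coding problem

$$\min_{\alpha} \left\| W^{1/2} (y - D\alpha) \right\|_2^2 + \lambda \|\alpha\|_1, \tag{7}$$

where $W$ is a diagonal matrix whose $j$th diagonal element $W_{j,j}$ also stands for the $j$th pixel's weight value. The expression for $W_{j,j}$ involves two positive constants. If every pixel is given the same constant weight, the model reduces to the traditional sparse coding problem. Hence, we can see that formula (7) is more adaptive than the traditional sparse coding model.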

In this study, the weight function is chosen as (9), which is controlled by a single scale factor whose value is fixed in our experiments. The physical meaning of the weight function is to allocate smaller weights to pixels with larger residuals (probably abnormal pixels) and larger weights to pixels with smaller residuals. By setting a reasonable weight threshold, we can discard the abnormal pixels whose weights fall below the threshold and then perform further sparse coding on the remaining pixels. In this way, we can effectively reduce the effect of abnormal pixels and therefore achieve good performance during tracking.

From (9), we can see that the weight value is bounded between 0 and 1 which makes sure that even the pixels with very small residuals would not have too large weight values. This would guarantee the stability of the algorithm.
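The residual-to-weight mapping can be sketched as follows. The weight function of this paper is not reproduced; as a stand-in, the sketch assumes a logistic function of the squared residual, which is likewise bounded between 0 and 1 and decreases as the residual grows. The constants `mu` and `delta`, the threshold, and the function names are hypothetical.

```python
import numpy as np

def logistic_weights(residuals, mu, delta):
    """Weights bounded between 0 and 1 that shrink as the squared residual grows.

    Assumption: w = 1 / (1 + exp(mu * (e^2 - delta))); mu and delta are
    hypothetical tuning constants, not values from the paper.
    """
    e2 = np.asarray(residuals, dtype=float) ** 2
    z = np.clip(mu * (e2 - delta), -50.0, 50.0)   # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(z))

def keep_mask(weights, threshold):
    """True for pixels kept; pixels with weight below the threshold are treated as abnormal."""
    return np.asarray(weights) >= threshold
```

For example, with `mu=8.0` and `delta=0.5`, residuals of 0.0, 0.5, and 2.0 receive strictly decreasing weights, and a threshold of 0.5 discards only the last pixel.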

3. Bayesian MAP Estimation

We can regard object tracking as a Bayesian MAP estimation problem for the hidden state variables in a hidden Markov model; that is, with a set of observed samples $Y_t = \{y_1, y_2, \ldots, y_t\}$, we can estimate the hidden state variable $x_t$ using Bayesian MAP theory.

According to the Bayesian theory,

$$p(x_t \mid Y_t) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Y_{t-1})\, dx_{t-1},$$

where $p(x_t \mid x_{t-1})$ stands for a state transition model for two consecutive frames and $p(y_t \mid x_t)$ stands for an observation likelihood model. We can obtain the object's best state in the $t$th frame through maximum posterior probability estimation; that is,

$$\hat{x}_t = \arg\max_{x_t^i} p(x_t^i \mid Y_t),$$

where $x_t^i$ stands for the $i$th sample of the state variable in the $t$th frame. In this paper, we choose .

3.1. State Transition Model

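We choose the object's motion affine transformation parameters as the state variable $x_t = (l_x, l_y, \theta, s, a, \phi)$, where $l_x$ and $l_y$, respectively, represent the $x$-direction and $y$-direction translation of the object in the $t$th frame, $\theta$ stands for the rotation angle, $s$ represents the scale change, $a$ stands for the aspect ratio, and $\phi$ stands for the direction of tilt.

We assume that the state transition model follows the Gaussian distribution; that is,

$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\, x_{t-1}, \Psi),$$

where $\Psi$ is a diagonal matrix whose diagonal elements are the variances of the affine motion parameters, $\sigma_{l_x}^2, \sigma_{l_y}^2, \sigma_\theta^2, \sigma_s^2, \sigma_a^2, \sigma_\phi^2$.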
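A Gaussian transition with a diagonal covariance amounts to adding independent zero-mean noise to each of the six affine parameters. The sketch below draws candidate states in this way; the function name, the parameter ordering, and the example standard deviations are assumptions for illustration, not values from this paper.

```python
import numpy as np

def propagate_particles(prev_state, sigmas, n_particles, rng):
    """Draw candidate affine states around the previous state.

    prev_state: 6 affine parameters (x translation, y translation,
                rotation, scale, aspect ratio, tilt direction).
    sigmas:     6 standard deviations, i.e. the square roots of the
                diagonal of the transition covariance.
    """
    prev_state = np.asarray(prev_state, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    noise = rng.standard_normal((n_particles, prev_state.size))
    return prev_state + noise * sigmas   # one candidate state per row

# Hypothetical usage: 600 candidates around the previous state.
rng = np.random.default_rng(0)
candidates = propagate_particles(
    [120.0, 80.0, 0.0, 1.0, 1.0, 0.0],
    [4.0, 4.0, 0.02, 0.01, 0.002, 0.001],
    600,
    rng,
)
```

Larger standard deviations let the tracker follow faster motion but spread the same number of candidates over a wider region of the state space.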

3.2. Observation Likelihood Model

We use the object's reconstruction error to build the observation likelihood model; that is,

$$p(y_t \mid x_t) = \prod_{j=1}^{m} \mathcal{N}(e_t^j;\, \mu, \sigma^2),$$

where $\mathcal{N}(\cdot)$ means the Gaussian distribution, $\mu$ and $\sigma^2$, respectively, represent the mean and variance of the Gaussian distribution, $m$ stands for the number of pixels of an object template, and $e_t^j$ stands for the reconstruction error of the $j$th pixel of the object template in the $t$th frame.
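One way to turn per-pixel reconstruction errors into a likelihood, and then pick the best candidate, is sketched below. It assumes the likelihood is a product of independent per-pixel Gaussians, evaluated in the log domain for numerical stability, and that the candidates are treated as equally probable a priori, so the MAP choice reduces to the maximum-likelihood candidate. The function names and default parameters are hypothetical.

```python
import numpy as np

def log_likelihood(errors, mu=0.0, sigma=0.1):
    """Sum of per-pixel Gaussian log-densities of the reconstruction error.

    errors may be one error vector or an array with one row per candidate.
    """
    e = np.asarray(errors, dtype=float)
    return np.sum(
        -0.5 * ((e - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi)),
        axis=-1,
    )

def map_index(candidate_errors, mu=0.0, sigma=0.1):
    """Index of the candidate whose reconstruction error is most likely.

    Assumption: equal prior weight for every candidate.
    """
    return int(np.argmax(log_likelihood(candidate_errors, mu, sigma)))
```

Working with log-likelihoods avoids the underflow that a direct product over hundreds of pixels would cause.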

3.3. Templates Updating

Because the appearance of the target may change during tracking, it is necessary to dynamically update the template library.

In this paper, we use a method named “Half Updating Strategy” to update the templates. We take the tracking results of the first $n$ frames as the initial templates, and from the $(n+1)$th frame on, we use the algorithm described above to obtain and save the tracking results. During this process, if at least 50% of the pixels in the result image block are abnormal, we do not use it to update the tracker. When we have accumulated $n/2$ tracking results, that is, half of the number of initial templates, we replace the first $n/2$ templates in the original template library with the newly accumulated tracking results. Then a “Half Updating” is finished.
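The update rule described above can be sketched as a small function. The names, the list-based template library, and the treatment of the 50% limit as "at least" are assumptions made for illustration; how abnormal pixels are counted is left to the caller.

```python
def half_update(templates, pending, result, abnormal_fraction, max_abnormal=0.5):
    """One step of a half-updating template library (sketch).

    templates:         list of the n current templates, oldest first.
    pending:           accepted tracking results not yet merged.
    result:            tracking result of the current frame.
    abnormal_fraction: fraction of pixels flagged as abnormal in `result`.
    Returns the (possibly updated) templates and pending lists.
    """
    if abnormal_fraction >= max_abnormal:
        return templates, pending            # too corrupted: skip this result
    pending = pending + [result]
    half = len(templates) // 2
    if len(pending) == half:
        # Replace the first half of the library with the accumulated results.
        templates = pending + templates[half:]
        pending = []
    return templates, pending
```

Calling the function once per frame keeps the second half of the library untouched between updates, so a few corrupted results cannot replace the whole library at once.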

4. Experimental Results and Analysis

In order to evaluate the performance of our tracker, we conduct experiments on three challenging image sequences (Table 1 and Figures 1, 2, and 3). These sequences cover most challenging situations in object tracking: occlusion, motion blur, in-plane and out-of-plane rotation, large illumination change, scale variation, and complex background. For comparison, we run six state-of-the-art algorithms with the same initial position of the target. These algorithms are the Frag tracking [18], IVT tracking [19], MIL tracking [20], L1 tracking [6], PN tracking [21], and VTD tracking [22] methods. We present some representative results in this section.

5. Conclusions/Outlook

This paper presents a robust tracking algorithm via 2DPCA and MLE. In this work, we represent the tracked object by using 2DPCA bases and an MLE error matrix. With the proposed model, we can remove the abnormal pixels and thus reduce their effect on the tracking algorithm. We incorporate the object's reconstruction error into the Bayesian MAP estimation framework and design a stable and robust tracker. We also explicitly take partial occlusion and misalignment into account for appearance model update and object tracking. Experiments on challenging video clips show that our tracking algorithm performs better than several state-of-the-art algorithms. Our future work will generalize the representation model to other related fields.

Acknowledgments

The research described in this paper was supported by the Fundamental Research Funds for the Central Universities (DC110321, DC120101132, and DC120101131), the Project of Liaoning Provincial Department of Education (L2012476 and L2010094), and the National Natural Science Foundation of China (61172058).