Abstract

We address the object tracking problem as a multitask feature learning process based on a low-rank representation of features with joint sparsity. We first select features with a low-rank representation within a number of initial frames to obtain the subspace basis. Next, the features represented by the low-rank and sparse properties are learned using a modified joint sparsity-based multitask feature learning framework. Both the features and sparse errors are then optimally updated using a novel incremental alternating direction method. The low-rank minimization problem for learning multitask features can be solved by a short sequence of efficient closed-form updates. Since the proposed method performs feature learning in both a multitask and a low-rank manner, it not only reduces the feature dimension but also improves tracking performance without drift. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art tracking methods for tracking objects in challenging image sequences.

1. Introduction

Object tracking is a well-known problem in computer vision with many applications, including intelligent surveillance, human-computer interfaces, and motion analysis. Despite significant progress, designing a robust object tracking algorithm remains challenging because of real-world factors such as severe occlusion, scale and illumination variations, background clutter, rotations, and fast motions.

An appearance model-based tracking method, which evaluates the likelihood that an observed image patch belongs to the object class, depends on critical factors such as the object representation and the representation scheme. The object representation can be categorized by the adopted features [1, 2] and description models [3, 4]. The representation scheme can be either generative or discriminative. Generative methods regard appearance modeling as finding the image observation with minimal reconstruction error [5, 6]. Discriminative methods, on the other hand, focus on determining a decision region that distinguishes the object from the background [7, 8].

Various object tracking methods based on object appearance models can handle only moderate changes and usually fail to track when the object appearance significantly changes. As a result, an appearance model learning process is required for robust object tracking under challenging issues such as object deformation.

Recently, sparse representation-based $\ell_1$ norm minimization methods have been successfully employed for object tracking [9–13], where an object is represented as one of multiple candidates in the form of a sparse linear combination of a dictionary that can be updated to maintain the optimal object appearance model. Although sparse representation-based methods can robustly track an object under partial occlusion, the computational cost of the $\ell_1$ norm minimization in each frame remains high.

Bao et al. applied the accelerated proximal gradient (APG) approach [14] to efficiently solve the minimization problems for object tracking. Structured multitask tracking (MTT) was proposed to mine the discriminative structure shared between different particles rather than learning each particle individually [15]. Zhang et al. obtained a fast solution of the MTT problem by applying the learning method of Bao et al. to the joint particle representation. By decomposing the sparse coefficients into two matrices that capture inliers and outliers, robust MTT- (RMTT-) based object tracking was proposed in [16].

Low-rank sparse tracking (LRST) was proposed to represent all samples using only a few templates [17]. LRST was later improved with an incremental learning method for low-rank features [18] and with adaptive pruning that exploits temporal consistency [19].

Despite these improvements to MTT and LRST in the particle filter framework, the computational cost increases with the number of particles. Furthermore, MTT regards the sparse representations of sampled particles as independent state data without considering relationships between particles. For this reason, MTT may not work if the particles are drawn from a specific probability distribution. The LRST-based methods are difficult to apply directly to object tracking in online video processing because of their structural computational complexity, such as nuclear norm minimization.

To solve the above-mentioned problems, we propose a novel object tracking algorithm based on multitask feature learning using joint sparsity and low-rank representation. We assume that the object representation can be incrementally optimized in the robust principal component analysis (RPCA) framework. RPCA decomposes the observations into the sum of a low-rank matrix and a sparse matrix; thus it can successfully recover the intrinsic subspace structure from corrupted observations. We first extract features with a low-rank representation within a few frames. After obtaining the subspace basis of the object features, the features represented by the low-rank and sparse properties are learned using a variant of the multitask feature learning framework. Finally, a novel incremental alternating direction method- (ADM-) based low-rank optimization strategy is applied to efficiently update the sparse error and the features. The low-rank optimization problem for learning multitask features can be solved by a short sequence of efficient closed-form updates for the optimal state variables of object tracking.

2. Low-Rank Representation of Object with Joint Sparsity

2.1. Low-Rank Representation of Object

RPCA [20] was recently introduced to recover low-rank features in learning for the reduced subspace. Given a sparse matrix with corrupted observation, RPCA attempts to exactly recover the low-rank matrix and minimize sparse error . Let denote the nuclear norm of the low-rank matrix such as the sum of singular values of . Under this assumption, the principal component pursuit (PCP) is defined as in the following minimization: where represents the norm and is a regularization parameter.
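
For reference, the PCP program is commonly written in the RPCA literature [20] in the following form. The symbols $D$ (observation matrix), $L$ (low-rank component), $S$ (sparse error), and $\lambda$ (regularization parameter) are assumed here for illustration and may differ from the notation used in the rest of this paper.

```latex
\min_{L,\,S}\ \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad D = L + S,
\qquad \text{where } \|L\|_{*} = \sum_{i} \sigma_i(L)
\ \text{ and } \ \|S\|_{1} = \sum_{i,j} |S_{ij}| .
```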

The assumption under RPCA framework [20], for disentangling the low-rank and sparse component, is that has singular vectors and is not sparse. The singular value decomposition of can be expressed as where is the rank of , are positive singular values, and and are the matrices of left- and right-singular projection vectors.

The object tracking problem can be solved under the RPCA framework. Suppose we have sampled particles and the correspondingly observed candidate of an object at time . Each observation can be represented as a linear combination of basis vectors , which spans the data space such as , where , and each column of indicates the discriminative feature representation of each particle observation. In this paper, we utilize the strength of both subspace learning and sparse representation for object appearance model in the principal component analysis (PCA) as [21]

In the feature learning literature, multitask feature learning [22, 23] employs norm minimization to perform feature selection across tasks under the strict assumption that all tasks share a common underlying representation.
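
To illustrate how a joint-sparsity norm couples feature selection across tasks, the following is a minimal sketch that assumes the mixed $\ell_{2,1}$ norm, which is a common choice in multitask feature learning. The layout (rows as features, columns as tasks) and the names `l21_norm` and `prox_l21` are assumptions made for this example, not the notation of this paper.

```python
import numpy as np

def l21_norm(W):
    """Mixed l2,1 norm: sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def prox_l21(W, tau):
    """Proximal operator of tau * ||W||_{2,1} (row-wise shrinkage).

    Each row is scaled toward zero; rows whose norm is below tau are
    zeroed entirely, which removes that feature for all tasks at once.
    """
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(row_norms, 1e-12), 0.0)
    return W * scale

# Hypothetical 3-feature, 2-task coefficient matrix.
W = np.array([[3.0, 4.0],
              [0.1, 0.0],
              [0.0, 2.0]])
W_shrunk = prox_l21(W, tau=1.0)
```

In this toy matrix the weak second feature is discarded for both tasks simultaneously, which is the behavior that distinguishes joint sparsity from element-wise sparsity.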

The goal of the proposed object tracking framework is to search particles that have the most similar feature to previous tracking result. Since particles are densely sampled around the current object state, in order to estimate the optimal low-rank features in iterative steps, we focus on all possible nonsmooth sparse errors, joint sparsity of features based on low-rank, and and constraints that produce solution for the within- and between-feature-tasks sparsity. With this consideration, we formulate the following object tracking problem by adopting three equality constraints as where is called , norm of matrix and , , , and are regularization parameters.

In the proposed object tracking formulation in (4), we should minimize the rank of the matrix for features of all object candidates. Since there is no closed form solution for the rank minimization problem, we replace the rank minimizing of matrix with its convex envelope by the nuclear norm as . Moreover, mining similarities between all particle structures can improve the tracking results. However, in the object tracking process, some candidates are completely different from others when particles are sampled from an abnormally large region. In order to address this problem, we place the matrix in shared feature and nonshared sparse feature . The shared inlier feature exploits the similarities of particles while the nonshared sparse feature considers possible appearance variations of object and background between candidates, and only a small number of candidates are required to reliably represent the observation of each candidate. In order to reduce sparse errors, we minimize the norm of error matrix . This error assumption has been originally introduced in [9] in the presence of object occlusion.

In order to solve this objective function in (4), we introduce three equality constraints and slack variables such as , , and as

The above-mentioned problem can be solved using the conventional inexact augmented Lagrange multiplier (IALM) method [24], which has well-defined convergence properties for nonsmooth optimization and has been used in matrix rank minimization problems. This iterative method builds a Lagrangian function with added quadratic penalty terms that assign a high cost to infeasible points, and it admits closed-form updates for each unknown variable. Since the alternating direction method (ADM) [25] was introduced for low-rank matrix completion and is based on the alternating direction augmented Lagrangian method, we employ ADM to minimize the ALM functions.
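
The alternating closed-form updates of an inexact ALM scheme can be sketched on the plain RPCA problem (a low-rank matrix plus a sparse error), which is simpler than the tracking objective of this paper. The sketch below is not the proposed algorithm; the function names, the default regularization weight, and the penalty schedule are assumptions commonly used with IALM-type solvers.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: shrink the singular values of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca_ialm(D, lam=None, tol=1e-7, max_iter=500, rho=1.5):
    """Split D into low-rank L and sparse S with an inexact ALM loop."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam  # assumed default
    norm_two = np.linalg.norm(D, 2)
    norm_fro = np.linalg.norm(D, 'fro')
    Y = D / max(norm_two, np.abs(D).max() / lam)  # Lagrange multiplier
    mu = 1.25 / norm_two                          # penalty parameter
    mu_max = mu * 1e7
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)             # low-rank update
        S = soft_threshold(D - L + Y / mu, lam / mu)  # sparse-error update
        R = D - L - S                                 # constraint residual
        Y = Y + mu * R                                # multiplier update
        mu = min(mu * rho, mu_max)                    # increase the penalty
        if np.linalg.norm(R, 'fro') / norm_fro < tol:
            break
    return L, S

# Synthetic check: a rank-2 matrix corrupted by about 5% sparse outliers.
rng = np.random.default_rng(0)
L_true = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S_true = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
S_true[mask] = rng.uniform(-10.0, 10.0, size=int(mask.sum()))
D = L_true + S_true
L_hat, S_hat = rpca_ialm(D)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
```

Each variable is updated with the others held fixed, and every update has a closed form (a thresholding of singular values or of matrix entries), which is the property the text relies on.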

2.2. Optimization

To solve the objective function given in (5), we formulate the ALM-based ADM model [24, 26] as where , , , and are Lagrangian multipliers, is called Frobenius norm of matrix , and is a penalty parameter.

The proposed strategy for given variable update is performed by computing sparse coefficients based on the low-rank representation with updated error and jointly sparse features. Thus, we first update the sparse error of the observation matrix by referring to the proof [24] as where .

For updating low-rank variable , we employ the proof in [27] such that low-rank matrix approximation problem with -norm data fidelity can be solved by a soft-thresholding operation on the singular values of observation matrix as where and .

We update the and referring to the proof [24] as

The ADM can perform minimization with its alternative property; thus we formulate the objective function for updating sparse coefficient based on (6) as is updated with the already updated sparse error , low-rank representation , norm , and norm using the following model: We obtain the closed form update framework as where is the partial derivative of with respect to and is a component-wise soft-thresholding operator defined by [28] such that if ; otherwise, . We set and .

Then, we update Lagrange multipliers as where .

2.3. Convergence

The convergence of the proposed object tracking algorithm can be guaranteed using the Karush-Kuhn-Tucker (KKT) point approach by referring to linearized ADM with adaptive penalty (LADMAP) approach [26]. The LADMAP [26] aims to solve the following type of convex problems: where , , and could be either vectors or matrices, and are convex functions, and and are linear mappings.

In many problems, and are vector or matrix norms, and when and are identity mappings, the augmented Lagrangian functions of (14) for minimizing and have closed form solutions. However, when and are not identity mappings, they may not be easy to solve. Therefore, Lin et al. [26] proposed linearizing the quadratic penalty term in the augmented Lagrangian function and adding a proximal term for updating and , resulting in the following updating scheme: where is the Lagrange multiplier, is the penalty parameter, and and are some parameters for each norm. Lin et al. [26] proposed a strategy to adaptively update the penalty parameter and proved that when is nondecreasing and is upper bound and and , then converges to an optimal solution to (14), where and are the operator norm of and , respectively.

For updating the penalty parameter in (6) for the proposed problem, we apply the concept of LADMAP, where , , , and are , , , and , respectively, in (6), with some algebra: where is an upper bound of . The value is decided as where is a constant, and we used in this work.

The iterations in (7)–(13) and (16)–(17) stop when the following two conditions are satisfied: where and are small thresholds.
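
The adaptive penalty update and the two stopping tests can be sketched for the generic two-block problem handled by LADMAP [26]. This is an illustrative sketch under the assumption that the rule follows the general form described for LADMAP; the function name, argument names, and default values (`rho0`, `eps1`, `eps2`) are hypothetical and are not the settings used in this work.

```python
import numpy as np

def ladmap_penalty_and_stop(residual, dx, dy, c, beta, beta_max,
                            eta_a, eta_b, rho0=1.9, eps1=1e-4, eps2=1e-5):
    """Return the next penalty value and whether both stopping tests hold.

    residual : A(x) + B(y) - c at the new iterate
    dx, dy   : changes in x and y between consecutive iterations
    eta_a, eta_b : proximal parameters tied to the operator norms of A and B
    """
    c_norm = np.linalg.norm(c)
    # Test 1: relative feasibility of the linear constraint.
    feasibility = np.linalg.norm(residual) / c_norm
    # Test 2: scaled relative change of the iterates.
    change = beta * max(np.sqrt(eta_a) * np.linalg.norm(dx),
                        np.sqrt(eta_b) * np.linalg.norm(dy)) / c_norm
    # Increase the penalty only when the iterates have nearly stalled.
    rho = rho0 if change < eps2 else 1.0
    beta_next = min(beta_max, rho * beta)
    converged = bool(feasibility < eps1 and change < eps2)
    return beta_next, converged

# Hypothetical inputs: a stalled iterate raises beta, a moving one keeps it.
c = np.ones(3)
beta_stalled, done_stalled = ladmap_penalty_and_stop(
    np.zeros(3), np.zeros(3), np.zeros(3), c, 1.0, 10.0, 1.0, 1.0)
beta_moving, done_moving = ladmap_penalty_and_stop(
    np.zeros(3), np.ones(3), np.zeros(3), c, 1.0, 10.0, 1.0, 1.0)
```

Keeping the penalty nondecreasing and capped by an upper bound matches the conditions under which convergence is stated for LADMAP in the preceding discussion.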

The above two stopping criteria are based on the KKT conditions of problem (6). Algorithm 1 summarizes the overall algorithm for the proposed low-rank representation-based multitask feature learning with joint sparsity.

(1) Input: A data set of data points , and parameters , , , .
(2) Output: Optimal coefficient , and errors
(3) Initialization: Set , , , , , ,  , , ,
, and
(4) Iterate until (18) and (19) are satisfied:
Fix other variables and update using (7)
Fix other variables and update using (8)
Fix other variables and update and using (9) and (10)
Fix other variables and update using (12)
Update , , , and parameters using (13)
Update using (16) and (17)
(5) End

2.4. Object Tracking

Given a set of observed images at the th frame , the goal of object tracking is to recursively estimate the state , maximizing the following state distribution: where denotes the motion model between two temporally consecutive states and indicates the likelihood function.

In the motion model, we regard the state variable as six affine parameters [4], namely, horizontal and vertical translations, rotation, scale, aspect ratio, and skew, such as . The motion model is assumed to have a Gaussian distribution such as , where is a diagonal covariance matrix.
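
Sampling candidate states from such a Gaussian motion model with a diagonal covariance can be sketched as follows. The ordering of the six affine parameters, the standard deviations, and the function name `sample_particles` are assumptions chosen for illustration only.

```python
import numpy as np

def sample_particles(prev_state, sigmas, n_particles, rng):
    """Draw particles around the previous state with independent Gaussian noise.

    prev_state : six affine parameters (assumed order: x translation,
                 y translation, rotation, scale, aspect ratio, skew)
    sigmas     : per-parameter standard deviations, i.e. the square roots
                 of the diagonal entries of the covariance matrix
    """
    prev_state = np.asarray(prev_state, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    noise = rng.standard_normal((n_particles, prev_state.size)) * sigmas
    return prev_state + noise

# Hypothetical previous state and noise levels.
rng = np.random.default_rng(0)
prev_state = [120.0, 80.0, 0.0, 1.0, 1.0, 0.0]
sigmas = [4.0, 4.0, 0.02, 0.01, 0.002, 0.001]
particles = sample_particles(prev_state, sigmas, n_particles=200, rng=rng)
```

Because the covariance is diagonal, each affine parameter is perturbed independently, so the spread of the search region can be tuned per parameter.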

The solution of the proposed algorithm in Algorithm 1 can be obtained by minimizing the following functional as where indicates the th sample data of the state and a reconstruction error for the nonoccluded region of the object is given as .

The likelihood function can be formulated by the reconstruction error given in [29]

This likelihood function, however, cannot deal with severe appearance deformations. For this reason, the likelihood function is modified by an additionally labeled penalty constraint about the similarity in neighboring error matrices for the deformed region of the object such as , where and , respectively, represent matrices with nonzero elements of and . Thus, the proposed likelihood function is given as where is the element-wise multiplication of matrices.

3. Experimental Results

The proposed object tracking algorithm is implemented and tested using ten challenging sequences. Major challenging issues in the test sequences include scale variation, shape deformation, fast motion, out-of-plane rotation, background clutter, object rotation, occlusion, illumination variation, out of view, motion blur, and low resolution. The proposed method is compared with a number of state-of-the-art tracking algorithms such as SMTT [15], APGL1 [14], SPCA [21], ASL [2], ILRF [18], and RMTT [16].

The proposed object tracking algorithm is implemented in MATLAB and processes 1.5 frames per second on a Pentium 2.7 GHz dual core PC without any hardware accelerator such as GPU. For each test sequence, the initial location of the object is manually selected in the first frame. Each image sample from the target and background is normalized to a patch.

3.1. Regularization Parameters

Regularization parameters in (4) are empirically chosen using the center pixel error criterion in [30].

We fix and swap others since represents the sparse error matrix while , and are related to weight variables , , and , respectively. Parameters , , and are parameterized by a discrete set with elements as .

We test different combinations of , , and on three different image sequences with challenging issues such as occlusion, deformation, and scale variation with frames and compute the center pixel errors over all frames. For each in the discrete set with different and , we produce average center pixel errors. When the tracking performance is acceptable for the fixed , we average all the center pixel errors. In this way, we set the regularization parameters as , , , and .

3.2. Quantitative Evaluation

For quantitative performance comparison, we use the center pixel error and the overlap ratio criterion. The center pixel error is the distance between the predicted and the ground truth center pixels. Figure 1 shows the center pixel errors of seven different object tracking algorithms for ten test sequences.
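
The center pixel error can be computed per frame as a Euclidean distance. The sketch below assumes that centers are given as (x, y) pixel coordinates; the function name and sample values are illustrative.

```python
import numpy as np

def center_pixel_error(pred_centers, gt_centers):
    """Per-frame Euclidean distance between predicted and ground truth centers."""
    pred = np.asarray(pred_centers, dtype=float)
    gt = np.asarray(gt_centers, dtype=float)
    return np.linalg.norm(pred - gt, axis=1)

# Hypothetical centers for three frames.
pred = [[100.0, 50.0], [103.0, 54.0], [110.0, 60.0]]
gt = [[100.0, 50.0], [100.0, 50.0], [110.0, 61.0]]
errors = center_pixel_error(pred, gt)
mean_error = float(errors.mean())
```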

The overlap ratio criterion represents the ratio between the number of frames for a specific object to be completely tracked and the total number of frames in the image sequence. In order to decide whether the object is successfully tracked, we employ the overlapped score defined in [30]. Figure 2 shows overlap ratios between the ground truth region and the tracking region.
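
A minimal sketch of this criterion follows, assuming that the overlap score takes the common intersection-over-union form for axis-aligned boxes given as (x, y, width, height) and that a frame counts as successfully tracked when the score exceeds a threshold. The threshold of 0.5 and the function names are assumptions, not values taken from [30].

```python
def overlap_score(box_a, box_b):
    """Intersection over union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def overlap_ratio(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose overlap score exceeds the threshold."""
    scores = [overlap_score(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return sum(s > threshold for s in scores) / len(scores)

# Hypothetical two-frame sequence: one exact hit and one complete miss.
pred_boxes = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)]
gt_boxes = [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 2.0, 2.0)]
ratio = overlap_ratio(pred_boxes, gt_boxes)
```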

3.3. Qualitative Evaluation

The Jogging 1 and Jogging 2 sequences include a variety of critical conditions, such as a disappearing object, out-of-plane rotation, and deformation. Object deformation and out-of-plane rotation make all the existing methods except ASL [2] drift away from the object. On the other hand, the proposed tracking method can successfully track the object since it represents the object appearance more completely using large-scale training data with efficient multitask feature learning for object representation.

The moving object in the Football 1 sequence undergoes in- and out-of-plane rotation and background clutter. Experimental results demonstrate that the proposed method achieves the best tracking performance in this sequence. Other methods cannot avoid drift at some instances in the neighborhood of the th frame.

The Freeman 3 and Car Scale sequences contain blurry scale variation, which makes it difficult to predict the location of the moving object. Furthermore, they include drastic appearance changes caused by motion blur. The proposed method performs well since it can adapt to the scale and appearance changes of the object and overcome the influence of motion blur by using the joint sparsity-based object appearance representation.

In addition to scale variation and in- and out-of-plane object rotations, the Couple and Crossing sequences include fast motion as a critical factor. The fast motion can be caused by both the camera and the object. In the Crossing sequence, the proposed method and SPCA [21] perform very well, whereas the other methods lose the object in the neighborhood of the 30th and 80th frames.

The Liquor sequence contains severe motion blur, scale and illumination variation, and fast motion. Although the object is fully occluded by a bottle with a similar color, the proposed tracking method successfully tracks the object without drifting.

In the Subway sequence, information about the object is lost because of factors such as background clutter and total occlusion. The proposed method can successfully track the moving object since it preserves the sparsity of the object appearance with optimal sparse coding using the low-rank feature representation while considering the joint sparsity. Other trackers except ILRFT [18] and RMTT [16] drift in the neighborhood of the th and th frames.

The Skiing sequence contains a severely deformed object. The proposed method successfully performs tracking owing to its novel object likelihood function defined in (21), which combines the weighted reconstruction error for rigid object regions with the labeling penalty constraint on the similarity of residual matrices for deformed object regions. Figure 3 shows the overall tracking results in each test sequence. The overlap ratio between the ground truth and the tracking region predicted by the proposed method turns to at around frame , while the corresponding center pixel error still increases. From this, we conclude that the proposed method loses track at around frame , while other state-of-the-art tracking methods completely lose track at around frame .

4. Conclusion

In this paper, we presented an effective, robust object tracking method using a multitask feature learning-based low-rank representation with joint sparsity. To overcome the limitations of existing sparse representation-based object tracking methods, we employ a novel optimization process for the low-rank representation of objects by using a recently proposed model minimization method. With efficient subspace learning-based sparse coding, both the optimal sparse codes and the error matrix can be simultaneously and appropriately updated during tracking, even under severe appearance variation. Experimental results demonstrate that the proposed method can successfully track objects in various video sequences with critical issues such as occlusion, deformation, in- and out-of-plane rotations, background clutter, motion blur, scale and illumination variations, and fast motion.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported by the MSIP, Korea, under the ITRC Support Program (NIPA-2014-CAU) supervised by NIPA, in part by the Technology Innovation Program (Development of Smart Video/Audio Surveillance SoC and Core Component for Onsite Decision Security System) under Grant 10047788, and by the Ministry of Science, ICT and Future Planning as Software Grand Challenge Project (Grant no. 14-824-09-002).