Special Issue: Control Problem of Nonlinear Systems with Applications
Feature Selection Tracking Algorithm Based on Sparse Representation
To enhance the robustness of visual tracking in complex environments, a novel visual tracking algorithm based on multifeature selection and sparse representation is proposed. Within the particle filter framework, particles with low similarity to the target are first filtered out by a fast algorithm; then, following the principle of sparsely reconstructing the sample labels, only the features that discriminate strongly against the background are used in the computation, which reduces the disturbance caused by occlusion and noise. Finally, candidate targets are linearly reconstructed via sparse representation, and the sparse equation is solved with the APG method to obtain the state of the target. Four comparative experiments demonstrate that the proposed algorithm effectively improves tracking robustness.
1. Introduction
Visual target tracking has attracted increasing attention from researchers because of its wide application in fields such as robotic visual control, human-machine interaction, intelligent driver assistance, and video surveillance. However, in complex environments, changes in illumination and in the appearance of the target object, occlusion, and noise make the design of an accurate, real-time visual tracker a challenging problem.
In recent years, an appearance modeling technique called sparse representation has been widely used for information compression and pattern recognition. Agarwal and Roth achieved good results in object recognition via sparse representation. Studies [3–5] demonstrate that sparse representation yields a high recognition rate in face recognition. Mei and Ling [6, 7] first introduced sparse representation theory into visual tracking and developed a sparse representation tracking algorithm based on the particle filter. The algorithm uses a template dictionary to linearly reconstruct the candidate targets and imposes sparsity constraints on the reconstruction coefficients. In addition to the target templates, the algorithm uses trivial templates to construct the template dictionary, which gives good robustness to occlusion; however, it is computationally expensive because it adopts the LASSO method to solve the sparse equation. Bai and Li developed a structural sparse representation model, which divides the sample into blocks and uses the Block Orthogonal Matching Pursuit (BOMP) algorithm to solve the sparse equation. This effectively improves the calculation speed, but the algorithm is not robust to illumination change. Hou et al. provided a block sparse representation tracking algorithm in which each block is given a corresponding weight to improve robustness to occlusion, but it may drift under heavy noise. Another tracking algorithm based on structural sparse representation constructs a vector pool of sparse representation coefficients by dividing the target sample into blocks and identifies the target state through the similarity information in the vector pool. However, this algorithm fails to use the residual error information effectively, so its robustness needs further improvement. Based on this, Hou et al. developed a target tracking algorithm with sparse representation based on ranking.
The algorithm gives full consideration to the sparse representation coefficients and the residual error information while locating the target, which improves the robustness of the tracking algorithm. In another approach within the framework of structural sparse representation, the sparse coefficients of the candidate targets are classified by a trained Naive Bayes classifier, which strengthens the algorithm's ability to differentiate between the target and its background. However, that algorithm fails to extract discriminative features, so its robustness needs further improvement. Wang et al. propose a least soft-threshold squares tracking algorithm based on sparse representation, which models the reconstruction residual with a Gaussian-Laplacian distribution and finds the optimal solution of the objective equation with an iterative soft-threshold method. Bao et al. impose constraints on the target template coefficients and the trivial template with the $\ell_1$-norm and $\ell_2$-norm, respectively, and adopt the Accelerated Proximal Gradient (APG) method to solve the objective equation with sparsity constraints. As a result, the accuracy of the tracker and the speed of calculation are significantly improved. Zhuang et al. propose a related multitask concept; their algorithm also uses the APG method to solve the objective equation by iteration, but it requires a fair amount of calculation. In the framework of sparse representation, Lan et al. propose an objective equation based on adaptive multifeature selection and solve it with the APG method. The multifeature selection method effectively improves the robustness of the tracking algorithm, but it requires a large amount of calculation.
To improve the robustness of visual tracking, this paper proposes a novel tracking algorithm based on multifeature fusion and sparse representation. Using the image intensity, the proposed algorithm first applies a simple algorithm to filter out the candidate targets that are largely dissimilar to the target. Then, discriminative features are selected from the multiple features by sparsely reconstructing the sample labels. Finally, it linearly reconstructs the candidate targets via sparse representation and obtains the state of the target.
2. Overview of the Tracking Algorithm
The principle of the proposed tracking algorithm, which is based on multifeature fusion and sparse representation, is shown in Figure 1. Compared with other tracking algorithms, it makes the following contributions:
(1) A fast algorithm quickly filters out the particles with low similarity to the target template, which improves the calculation speed.
(2) Based on sample features constructed from multiple features, the proposed algorithm selects features by sparsely reconstructing the positive and negative sample labels, which makes the selected features more discriminative.
(3) Conventional tracking algorithms based on sparse representation typically impose $\ell_1$-norm constraints on the linear reconstruction function, which requires a large amount of calculation, whereas the proposed algorithm uses the APG method to solve the sparse function with nonnegative constraints, which greatly improves the calculation speed.
3. Fast Particle Filter Algorithm
In the particle filtering framework, most particles bear little resemblance to the target, so they can be filtered out by a simple algorithm. Based on this idea, a fast algorithm is proposed to filter out the particles with low similarity to the target.
Let denote the sample set of the candidate targets, and denote the target template, where is one of the candidate target samples, is the dimension of the sample, and represents the number of particles. To reduce the amount of calculation, only the gray image information of the sample is involved in the calculation. By normalizing the candidate target and target template , we can get and , where is the th feature of , is the th feature of , and is the similarity measure of and . If closely resembles , then the values of will not differ remarkably from one another. We can therefore filter out particles according to the fluctuation of the values of :
where is the th element of and is the mean of all elements of . The value of reflects the fluctuation of the values of . If the target is disturbed by occlusion or noise, the real target may be filtered out when the value of is too small. To solve this problem, only the half of the elements in with the smaller variance values are involved in the calculation. Sort the elements in in descending order according to their values and eliminate the elements in the top half; this gives the vector . Suppose , where is the th element of ; then we can filter out particles according to the value of . Based on experiments, we select the with the smaller values () from all the particles to participate in the later operations. In this way, we can quickly eliminate two-thirds of the particles with low similarity to the object, which effectively improves the calculation speed of the tracking algorithm.
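The filtering idea of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the per-feature similarity measure is taken to be the elementwise difference of the normalized gray vectors, and the names `normalize`, `trimmed_fluctuation`, `prefilter`, and `keep_divisor` are hypothetical. The sketch keeps the steps described above: compute the squared deviation of each similarity value from the mean, discard the larger half of the deviations, sum the rest, and retain the one-third of particles with the smallest scores.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def trimmed_fluctuation(candidate, template):
    """Fluctuation score of one particle; a lower score means closer to the template."""
    c, t = normalize(candidate), normalize(template)
    # ASSUMPTION: the per-feature similarity is the elementwise difference.
    s = [ci - ti for ci, ti in zip(c, t)]
    mean = sum(s) / len(s)
    # Squared deviations from the mean, largest first.
    deviations = sorted(((si - mean) ** 2 for si in s), reverse=True)
    # Drop the top half, which occlusion or noise would otherwise dominate.
    kept = deviations[len(deviations) // 2:]
    return sum(kept)


def prefilter(candidates, template, keep_divisor=3):
    """Return the indices of the 1/keep_divisor of particles with the smallest scores."""
    scores = [trimmed_fluctuation(c, template) for c in candidates]
    n_keep = max(1, len(candidates) // keep_divisor)
    order = sorted(range(len(candidates)), key=lambda i: scores[i])
    return sorted(order[:n_keep])
```

With `keep_divisor=3` the function discards two-thirds of the particles, which matches the proportion reported in the text; only the surviving indices would then go through feature extraction and sparse reconstruction.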
4. Feature Selection
In order to improve the robustness of the tracking algorithm, we use the image intensity and LBP feature to construct the sample features. Denote the gray features of the sample by and the LBP feature of the sample by . When and are combined, we can get , where represents the sample feature, . Because the dimension of is high, it contains a fair amount of redundant information. In order that the algorithm can differentiate between target and background better, we need to select the discriminative features from .
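As a rough illustration of how an intensity vector and an LBP vector can be combined into one sample feature, the sketch below computes the basic 3×3 LBP code of every interior pixel and appends it to the flattened gray values. The source does not say whether raw LBP codes or LBP histograms are used, so the raw-code choice, the neighbour ordering, and the function names `lbp_codes` and `sample_feature` are assumptions made for this example.

```python
def lbp_codes(img):
    """Basic 3x3 LBP code for every interior pixel of a gray image (list of rows)."""
    # Neighbour offsets in clockwise order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    height, width = len(img), len(img[0])
    codes = []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the center.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            codes.append(code)
    return codes


def sample_feature(img):
    """Concatenate the flattened gray values with the LBP codes (ASSUMED layout)."""
    gray = [float(p) for row in img for p in row]
    lbp = [float(code) for code in lbp_codes(img)]
    return gray + lbp
```

Because the concatenated vector is long and partly redundant, a selection step such as the one described next is needed before the sparse reconstruction.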
During the tracking process, according to the target sample in the first frame and the tracking result, we can get the positive and negative sample set of the target , where the positive sample of the target is and the negative sample is , where represents the number of the positive samples and indicates the number of negative samples. Use the sparse reconstruction theory to select the sample features; then we can getwhere is the norm operator and represents the label vectors of the positive and negative samples. The positive and negative samples are set to 1 and , respectively. is the sparse adjustment coefficient and represents the reconstruction vectors, . It should be noted that, in (2), we use the 2-norm to impose constraints on the sparsity of . The benefit is that it can effectively reduce the amount of calculation in solving the sparse function. Set threshold as , and when , the corresponding feature has a strong ability to differentiate between the target and its background.
Construct the mapped vector of feature selection , . Consider
Here, is the feature vector before the feature selection. After the feature selection, the feature vector is defined aswhere denotes the multiplication of the corresponding elements within the vector.
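The selection procedure of this section can be illustrated with a small, self-contained sketch. It assumes that the label reconstruction is a least-squares fit with a squared 2-norm penalty, that positive and negative samples are labeled +1 and -1, and that a feature is kept when the magnitude of its reconstruction weight exceeds the threshold; none of these details can be confirmed from the text as extracted. The names `solve`, `select_features`, `apply_mask`, `lam`, and `threshold` are hypothetical.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def select_features(samples, labels, lam=0.1, threshold=0.05):
    """Fit weights w minimizing ||S w - labels||^2 + lam * ||w||^2 and build a 0/1 mask."""
    d = len(samples[0])
    # Normal equations: (S^T S + lam * I) w = S^T labels.
    gram = [[sum(s[i] * s[j] for s in samples) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(s[i] * l for s, l in zip(samples, labels)) for i in range(d)]
    w = solve(gram, rhs)
    # ASSUMPTION: a feature is discriminative when |w_j| exceeds the threshold.
    mask = [1.0 if abs(wj) > threshold else 0.0 for wj in w]
    return w, mask


def apply_mask(feature, mask):
    """Elementwise product of a feature vector with the selection mask."""
    return [f * m for f, m in zip(feature, mask)]
```

The squared 2-norm penalty gives a closed-form solution through one linear solve, which is consistent with the remark that the 2-norm reduces the amount of calculation compared with an iterative sparse solver.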
5. Solution to the Sparse Equation with APG Approach
After feature selection, we get the feature vector set of the candidate targets , , and denotes the dimension of each sample after feature selection. Let denote the template dictionary, where is the target sample set, , and represents the unit diagonal matrix used to reduce the disturbance of occlusions and noises.
Using template dictionary to make sparse linear reconstruction for the candidate targets, we can getwhere represents the sparse coefficients, is the sparse coefficients corresponding to the target template, and is the sparse coefficients corresponding to and . Impose the nonnegative constraints on the elements in to improve the robustness of the tracker .
After adding a penalty term, (5) with nonnegative constraints can be updated bywhere is the penalty term defined by
The solution to (6) is equivalent to optimizing the convex function. This paper uses the Accelerated Proximal Gradient (APG) approach to find the optimal solution to (6). Setwhere is a differentiable convex function and is a discontinuous convex function. The specific algorithm is shown in Algorithm 1.
Algorithm 1. Minimization algorithm for (6) using the APG method:
(1) Initialize , ,
(2) For = 1 : 4
(3) ;
(4) ;
(5) ;
(6) end for
In Algorithm 1, step needs to find the minimum value of the function, so it can be updated bywhere . Equation (9) can be seen as to solve the minimum values of the two functions as follow:where and are the elements corresponding to and in . It can be seen that the optimal solution to (10) is and the optimal solution to (11) is .
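For readers unfamiliar with APG, the following sketch shows the generic structure of such an iteration: an extrapolation step, a gradient step on the smooth part, a proximal step for the constraint, and an update of the momentum scalar. The objective used here, 0.5·||y − Dc||² + (lam/2)·||c||² with nonnegativity enforced only on the target-template coefficients, is an assumed stand-in, since the exact form of (6) is not recoverable from the text. The function name `apg_nonneg` and its parameters are hypothetical; the default of four iterations mirrors the loop bound printed in Algorithm 1.

```python
import math


def apg_nonneg(D, y, n_target, lam=0.01, n_iter=4):
    """APG sketch for min_c 0.5*||y - D c||^2 + 0.5*lam*||c||^2, with c[:n_target] >= 0.

    D is a list of rows; its first n_target columns are target templates and
    the remaining columns are trivial templates (ASSUMED layout).
    """
    m, n = len(D), len(D[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of D^T D,
    # so this is a safe (if conservative) Lipschitz constant for the gradient.
    lipschitz = sum(v * v for row in D for v in row) + lam
    c_prev = [0.0] * n
    c = [0.0] * n
    t_prev, t = 1.0, 1.0
    for _ in range(n_iter):
        # Extrapolation (momentum) step.
        momentum = (t_prev - 1.0) / t
        beta = [c[j] + momentum * (c[j] - c_prev[j]) for j in range(n)]
        # Gradient of the smooth part at beta: D^T (D beta - y) + lam * beta.
        resid = [sum(D[i][j] * beta[j] for j in range(n)) - y[i] for i in range(m)]
        grad = [sum(D[i][j] * resid[i] for i in range(m)) + lam * beta[j]
                for j in range(n)]
        g = [beta[j] - grad[j] / lipschitz for j in range(n)]
        # Proximal step: clip target coefficients at zero, leave the rest unchanged.
        c_prev, c = c, [max(0.0, g[j]) if j < n_target else g[j] for j in range(n)]
        # Momentum scalar update.
        t_prev, t = t, (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return c
```

The proximal step is where the minimization separates into two independent parts: the target coefficients are clipped at zero, while the trivial-template coefficients simply take the value of the gradient step.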
6. Object Tracking
The proposed algorithm is implemented in the particle filter framework. In the first frame of the video, the initial state of the target is selected with the mouse or obtained through target recognition. Let denote the observation values from the first frame to the th frame of the video. is the state of the th particle in the th frame. The target state in th frame is
Here, can be obtained by solving the following equations:
In (13), is the state transfer function. The state of the sample is defined by a six-dimensional affine vector , whose components represent the $x$ coordinate, $y$ coordinate, length-width ratio, rotation angle, torsion angle, and scale, respectively. Assuming that the six parameters are mutually independent, the state transfer function can be represented as
where is the diagonal matrix constituted by .
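Because the six affine parameters are treated as mutually independent, drawing new particles reduces to adding independent Gaussian noise to each component of the previous state. The sketch below illustrates this; the function name `propagate`, the use of standard deviations rather than variances for `sigmas`, and the numeric values in the usage note are assumptions for the example.

```python
import random


def propagate(state, sigmas, n_particles, rng=None):
    """Draw particles around `state` with independent Gaussian noise per component.

    state  : six affine parameters (x, y, length-width ratio, rotation angle,
             torsion angle, scale)
    sigmas : per-parameter standard deviations (ASSUMED parameterization)
    """
    rng = rng or random.Random()
    return [[rng.gauss(mu, sd) for mu, sd in zip(state, sigmas)]
            for _ in range(n_particles)]
```

For example, `propagate([100.0, 50.0, 1.0, 0.0, 0.0, 1.0], [4.0, 4.0, 0.01, 0.01, 0.01, 0.01], 600)` would generate 600 candidate states whose positions vary by a few pixels and whose shape parameters vary only slightly.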
The relation between the similarity measure in (14) and the residual error based on sparse representation can be represented as
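A common way to relate an observation likelihood to a reconstruction residual, assumed here rather than taken from the source, is an exponential decay with the squared residual. The sketch below computes the residual of a candidate under a dictionary and coefficient vector, converts residuals to likelihood weights, and returns the particle with the largest weight as the state estimate. The names `reconstruction_residual`, `likelihood`, `map_estimate`, and the scale parameter `alpha` are hypothetical.

```python
import math


def reconstruction_residual(y, D, c):
    """Squared Euclidean residual ||y - D c||^2, with D given as a list of rows."""
    return sum((yi - sum(dij * cj for dij, cj in zip(row, c))) ** 2
               for yi, row in zip(y, D))


def likelihood(residual, alpha=1.0):
    """ASSUMED observation model: likelihood decays exponentially with the residual."""
    return math.exp(-alpha * residual)


def map_estimate(states, residuals, alpha=1.0):
    """Return the state whose particle has the highest likelihood, plus all weights."""
    weights = [likelihood(r, alpha) for r in residuals]
    best = max(range(len(states)), key=lambda i: weights[i])
    return states[best], weights
```

Under this model, the candidate that the template dictionary reconstructs most accurately receives the highest weight and is selected as the target state for the frame.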
The proposed algorithm is shown in Algorithm 2.
Algorithm 2. The proposed algorithm in this paper is as follows.
Input. Video and the target state in the first frame.
(1) Obtain the target dictionary of the initial target and the positive and negative samples
(2) for = 1 : ( is the number of frames in the video)
(3) Update the particles with a normal distribution
(4) Extract the intensity of the particles and filter out particles
(5) Extract the LBP feature of the sample and get 
(6) Use sparse reconstruction theory to select the sample features, and then get the feature vector 
(7) Construct a sparse equation and solve it with the APG method
(8) if can be divided by 5
(9) Update the feature vector , update the template dictionary, and update the positive and negative samples
(10) end if
(11) end for
Output. The state of the target in each frame.
7. Experiments and Discussion
The proposed algorithm is implemented in MATLAB 2009b. To make the tracking results more convincing, the experiment compares it with three other representative tracking algorithms: the $\ell_1$ tracking algorithm, the IVT tracking algorithm, and the ASLSAM tracking algorithm. The four algorithms are tested on four widely used tracking test videos: “Panda,” “Woman Square,” “Trellis,” and “ThreePassShop2cor.” Tracking these targets is challenging because the videos involve factors such as partial occlusion and variations in illumination, pose, and scale.
The first test video is Panda, in which a cartoon panda is the tracked target. The difficulty in this video lies in partial occlusion and target rotation. Figure 2 shows the results of the four algorithms in the 4th, 55th, 108th, 176th, 201st, and 241st frames. The results of the proposed algorithm, the ASLSAM algorithm, the $\ell_1$ algorithm, and the IVT algorithm are marked by blue, red, gray, and purple boxes, respectively. As shown in Figure 2, the proposed algorithm tracks the target very well before the 201st frame but drifts from it in the 201st and 241st frames because of occlusion. Nevertheless, it never loses the target during the tracking process, whereas the other three algorithms lose the target when occlusion and target rotation occur.
The second test video is Trellis, with a face as the target. Tracking is difficult because of the drastic changes in illumination and pose in the video. As shown in Figure 3, the proposed algorithm tracks the target faithfully throughout the sequence without being affected by the pose and illumination changes, whereas the $\ell_1$ algorithm and the IVT algorithm lose the target in the 276th frame and the ASLSAM algorithm loses it in the 532nd frame.
The third test video is Woman Square, with a pedestrian as the target. The main challenge comes from partial occlusion. As shown in Figure 4, only the proposed algorithm tracks the target successfully without being affected by the partial occlusion, while the other three algorithms lose the target in the 143rd frame as a result of it.
The fourth test video is ThreePassShop2cor. The difficulty lies in occlusion, the disturbance of a similar object, and scale variation. The tracking results of the four algorithms are shown in Figure 5. Because of the occlusion and the similar object, the proposed tracker drifts from the target in the 125th frame but relocates it later. In contrast, the ASLSAM and IVT algorithms lock onto the similar object instead of the tracked target in the 125th frame, and the $\ell_1$ algorithm loses the target in the 303rd frame.
To compare the tracking results more precisely, Table 1 lists the maximum, mean, and standard deviation of the tracking error of the four algorithms on the four image sequences. The tracking error is the Euclidean distance between the center point of the target given by the tracking algorithm and the center point of the actual target. On all four videos, the proposed algorithm achieves more favorable performance than the other three.
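The tracking error defined above is straightforward to compute from per-frame center points. The following sketch is an illustration only: the function names are hypothetical, and the population form of the standard deviation is an assumption because the text does not say which form is used.

```python
import math


def center_errors(tracked, truth):
    """Per-frame Euclidean distance between tracked and ground-truth centers."""
    return [math.hypot(tx - gx, ty - gy)
            for (tx, ty), (gx, gy) in zip(tracked, truth)]


def error_summary(errors):
    """Maximum, mean, and (population) standard deviation of the tracking error."""
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return {"max": max(errors), "mean": mean, "std": math.sqrt(variance)}
```

Calling `error_summary(center_errors(tracked, truth))` for each algorithm and sequence would produce the three statistics of the kind reported in Table 1.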
In addition to the tracking error, to reflect the relation between the tracking results and the actual target, Table 2 lists the success rate of the four tracking algorithms based on the PASCAL VOC criterion. The success rate of the proposed algorithm is remarkably higher than that of the other three.
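The PASCAL VOC criterion scores a frame by the overlap ratio between the tracked box and the ground-truth box, that is, the area of their intersection divided by the area of their union, and conventionally counts the frame as a success when the ratio exceeds 0.5. The sketch below assumes boxes given as `(x, y, width, height)` and that the conventional 0.5 threshold applies here; the paper does not state the threshold, and the function names are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(tracked, truth, threshold=0.5):
    """Fraction of frames whose overlap ratio exceeds the threshold."""
    hits = sum(1 for a, b in zip(tracked, truth) if overlap_ratio(a, b) > threshold)
    return hits / len(truth)
```

Unlike the center error, this measure also penalizes a tracker that keeps the correct center but estimates the wrong scale, which is why the two tables are complementary.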
8. Conclusion
A new sparse representation-based tracking algorithm in the particle filter framework is proposed in this paper. The method first uses a fast algorithm to filter out the particles that are largely dissimilar to the tracked target, which reduces the amount of calculation. Then, combining the intensity feature and the LBP feature, the algorithm selects discriminative features via sparse representation, which improves its robustness to occlusion and disturbance. Furthermore, it adopts the APG method to solve the sparse equation with nonnegative coefficient constraints, which improves both computational speed and robustness. Finally, the experiments demonstrate that the proposed tracker achieves more favorable tracking results than the compared algorithms.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publishing of this paper.
Acknowledgments
This work is supported by the Foundation of Innovation Demonstration Project for NC mechanical product in Guangdong, China (no. 2013B011301026).
References
S. Agarwal and D. Roth, “Learning a sparse representation for object detection,” Computer Science, vol. 2353, pp. 97–101, 2006.
J. Wright, Y. Yang Allen, A. Ganesh et al., “Robust face recognition via sparse representation,” in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 210–227, Amsterdam, The Netherlands, September 2008.
Y.-E. Hou, W.-G. Li, S. Sekou et al., “Target tracking algorithm with structured sparse representation based on ranks,” Journal of South China University of Technology (Natural Science Edition), vol. 41, no. 11, pp. 23–29, 2013.
X. Lan, A. J. Ma, and P. C. Yuen, “Multi-cue visual tracking using robust feature-level fusion based on joint sparse representation,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1194–1201, Columbus, Ohio, USA, June 2014.