Special Issue: Advancements in Mathematical Methods for Pattern Recognition and its Applications
An Improved Kernelized Correlation Filter Based Visual Tracking Method
Correlation filter based trackers have received great attention in the field of visual target tracking and have shown impressive advantages in terms of accuracy, robustness, and speed. However, correlation filter based methods still face challenges such as target scale variation and occlusion. To deal with these problems, an improved kernelized correlation filter (KCF) tracker is proposed, which employs the GM(1,1) grey model, the interval template matching method, and a multiblock scheme. In addition, a strict template update strategy is presented to accommodate appearance changes and avoid template corruption. The proposed method is compared with state-of-the-art trackers, and all tracking algorithms are evaluated on the object tracking benchmark. The experimental results demonstrate clear improvements of the proposed KCF-based visual tracking method.
Visual tracking is one of the most fundamental tasks in computer vision and has many practical applications, including unmanned aerial vehicles, traffic control, human computer interaction, and video surveillance [1–4]. Various issues should be considered in designing a tracking system, such as real-time tracking, adaptivity, and robustness. Although visual tracking technology has progressed significantly in recent years, long-term visual object tracking remains challenging because of many factors, such as occlusion, deformation, illumination variation, scale variation, and fast motion of targets [7, 8].
Various methods have been proposed to deal with the visual tracking problem. For example, Ma et al. proposed a joint blur state estimation and multitask reverse sparse learning framework to solve the motion blur problem in visual tracking. Zhang et al. presented a robust correlation tracking algorithm with discriminative correlation filters to handle target scale variations in complex scenes. The popular tracking algorithms can be categorized into generative and discriminative methods. The generative method focuses on learning a perfect appearance model to represent the target and then finds the most similar candidate by matching [12, 13]. Among these methods, KLT and NCC are the original ones. They are simple and efficient; however, they tend to drift as the search area enlarges. Unlike the generative method, the discriminative method formulates tracking as a binary classification task and distinguishes the target from the background by using a discriminative classifier. Usually, the classifier is trained and updated with positive and negative samples. In particular, the correlation filter based discriminative method has proven highly efficient and has recently attracted a considerable amount of research attention. However, the correlation filter based method still has limitations, such as sensitivity to scale variation and occlusion.
To deal with the problems existing in the correlation filter based method, many improved methods have been proposed. For example, Henriques et al. presented the Circulant Structure tracker with Kernels (CSK), which avoids the expensive computation of dense sampling by using cyclically shifted samples. That study also proved that the kernel matrix of the samples has circulant structure. Based on CSK, the kernelized correlation filter (KCF) was proposed, which adds multichannel features to the correlation filter pipeline. The KCF method adopts HOG features instead of the raw pixel values used in CSK, which makes the tracking results more accurate. Additionally, many algorithms are designed to handle target scale variation in tracking. For example, the Discriminative Scale Space Tracker (DSST) uses a feature pyramid and a 3D correlation filter to estimate scale. The Scale Adaptive with Multiple Features tracker (SAMF) uses a fixed scaling pool to sample candidates at different sizes. These methods address some limitations of the correlation filter based trackers. However, some problems are still not solved well, such as tracking speed and occlusion.
In this paper, an improved KCF-based visual tracking method is proposed that focuses on the problems of occlusion and scale variation in correlation filter based trackers. The main contributions of this paper are summarized as follows. (1) A general long-term motion model is established based on interval template matching in redetection. (2) A surveyed area is proposed for the detector search. In addition, the GM(1,1) model is used to estimate the position of the target. When a full occlusion occurs, the GM(1,1) model is also used to predict the position of the target based on its previous behavior. (3) Two scale variables of the target are defined to measure the scale variation during tracking, and a novel method is proposed to estimate the scale variables by using a part-based model. (4) Because an offset of the target position reduces the precision and success rate of tracking, the edge information of the target and the maximum response value are used to correct the target position.
This paper is organized as follows. The proposed KCF-based tracker is described in Section 2. Section 3 presents the experiments and the analysis of the results. Some discussions on the performance of the proposed approach are given in Section 4. Finally, Section 5 contains our conclusions.
2. The Proposed Approach
In this paper, the problem of visual tracking based on the KCF method is studied, and an improved method is proposed. The workflow of the proposed method is shown in Figure 1, where the KCF tracker is the key part and the other two important parts are the long-term tracking method and the scale estimation method. The general KCF tracking algorithm has been presented in the literature (see [20, 23] for details), so only the necessary description of the KCF-based method and its improvements is given here.
2.1. The Proposed Long-Term Tracking Method
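The general KCF tracking algorithm includes two main phases, namely, the training and detection phases. In the training phase, a classifier $f$ is trained with a set of samples $\{(x_i, y_i)\}$. In the detection phase, a new sample $z$ is detected by
$$f(z) = \mathcal{F}^{-1}\left(\hat{k}^{xz} \odot \hat{\alpha}\right), \tag{1}$$
where $k^{xz}$ is the kernel correlation of $x$ and $z$, $\alpha$ is a parameter of $f$, $\mathcal{F}$ stands for the Fourier transform operation ($\mathcal{F}^{-1}$ is its inverse and a hat denotes the transform of a quantity), and $\odot$ denotes element-wise multiplication.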
In this paper, the success of the tracking result is determined from the maximum response value of the detection output. A flag is defined by comparing this maximum with a predefined threshold, and a maximum response below the threshold means that the target was not tracked successfully.
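The detection step and the success test can be illustrated with a one-dimensional toy example. This is only a sketch under simplifying assumptions: a linear kernel, a single-channel signal, a naive DFT, and a hypothetical threshold of 0.5. The function names are illustrative and are not taken from the paper.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for a tiny demo."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(signal))
        out.append(acc / n if inverse else acc)
    return out

def train(x, y, lam=1e-4):
    """alpha_hat = y_hat / (k_hat^{xx} + lambda), using a linear kernel."""
    x_hat, y_hat = dft(x), dft(y)
    kxx_hat = [abs(xk) ** 2 for xk in x_hat]
    return [yk / (kk + lam) for yk, kk in zip(y_hat, kxx_hat)]

def detect(alpha_hat, x, z):
    """Response map f(z) = F^{-1}(k_hat^{xz} * alpha_hat), element-wise product."""
    x_hat, z_hat = dft(x), dft(z)
    kxz_hat = [xk.conjugate() * zk for xk, zk in zip(x_hat, z_hat)]
    response = dft([k * a for k, a in zip(kxz_hat, alpha_hat)], inverse=True)
    return [r.real for r in response]

def tracked_successfully(response, threshold):
    """Success flag: the maximum response must reach the threshold."""
    return max(response) >= threshold

# Template, Gaussian-shaped regression target peaked at zero shift.
x = [5.0, 1.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0]
n = len(x)
y = [math.exp(-0.5 * min(i, n - i) ** 2) for i in range(n)]
alpha_hat = train(x, y)

shift = 3
z = x[-shift:] + x[:-shift]          # the template moved 3 samples to the right
response = detect(alpha_hat, x, z)
peak = response.index(max(response))  # location of the maximum response
```

The position of the maximum response recovers the displacement of the target, and the same maximum is what the success flag compares against the threshold.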
To deal with the problem of tracking failure, the target position for the KCF tracker is reinitialized using the grey prediction model GM(1,1) and the interval template matching method. The basic idea of the proposed long-term tracking method is as follows. (1) The grey model GM(1,1) is used to predict the target’s position in the first five frames after the target is lost, and the forecast position is used to update the KCF tracker. (2) If the KCF tracker still cannot accurately track the target after the update, template matching is performed. In this study, the detector searches for the target in a surveyed area instead of the whole image. (3) When a full occlusion occurs, neither the KCF tracker nor the detector can find the target. Because a full occlusion usually lasts only a short time, the position of the target can be estimated from its previous positions by the GM(1,1) method. Once the target leaves the occlusion, it can be found by the detector again and the tracking process continues. The details of the proposed method are introduced as follows.
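A minimal sketch of how these three cases might be dispatched for each frame. The function name, the returned labels, and the frame counter are illustrative assumptions; only the five-frame window and the order of the fallbacks come from the description above.

```python
def choose_recovery_action(max_response, threshold, frames_since_lost,
                           detector_found, gm_frames=5):
    """Pick which component supplies the target position in the current frame."""
    if max_response >= threshold:
        return "kcf"                # tracking succeeded, keep the KCF result
    if frames_since_lost <= gm_frames:
        return "gm11_prediction"    # reinitialize KCF from the GM(1,1) forecast
    if detector_found:
        return "template_matching"  # detector located the target in the surveyed area
    return "gm11_prediction"        # full occlusion: extrapolate from past positions
```

For example, `choose_recovery_action(0.1, 0.3, 7, True)` selects template matching, because the response is below the threshold and the five-frame prediction window has passed.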
2.1.1. The Position Estimation Based on GM(1,1)
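The grey model GM(1,1) is a forecasting model in grey system theory that requires little data, has low computational complexity, and forecasts accurately [24, 25], so it is suitable for real-time visual target tracking. In the grey prediction model, the initial data set is defined as
$$X^{(0)} = \left\{x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n)\right\},$$
where $n$ is the total number of the initial data set. The 1-AGO (accumulated generating operation, AGO) is defined as
$$X^{(1)} = \left\{x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right\}, \quad x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \ldots, n.$$
According to GM(1,1), the first-order grey differential equation is denoted by
$$\frac{dx^{(1)}(t)}{dt} + a\,x^{(1)}(t) = b, \tag{4}$$
where $a$ and $b$ are the grey parameters, which can be solved by the least-square method, namely,
$$[a, b]^{T} = \left(B^{T}B\right)^{-1}B^{T}Y,$$
where the coefficient vector $Y$ and the accumulated matrix $B$ are expressed by
$$Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}, \quad B = \begin{bmatrix} -\tfrac{1}{2}\left(x^{(1)}(1) + x^{(1)}(2)\right) & 1 \\ -\tfrac{1}{2}\left(x^{(1)}(2) + x^{(1)}(3)\right) & 1 \\ \vdots & \vdots \\ -\tfrac{1}{2}\left(x^{(1)}(n-1) + x^{(1)}(n)\right) & 1 \end{bmatrix}.$$
Then the solution of (4) is obtained as follows:
$$\hat{x}^{(1)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right)e^{-ak} + \frac{b}{a}.$$
By applying the inverse accumulated generating operation, the prediction equation is obtained:
$$\hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k),$$
where $\hat{x}^{(0)}(k)$ is called the predicted value of the GM(1,1) model.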
In the proposed approach, if the target cannot be tracked successfully, two GM(1,1) models are established from the horizontal and vertical coordinates of the target in the previous four frames, one for each coordinate, and are used to predict the horizontal and vertical coordinates of the target. The center position of the target is then determined from the two GM(1,1) predicted values, the variances and the averages of the horizontal and vertical coordinate sequences, and a threshold that measures the magnitude of the change in the values of these sequences.
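As an illustration, the standard GM(1,1) procedure can be sketched in a few lines and applied separately to each coordinate. This is a textbook version under stated assumptions, not the authors' implementation: the function name, the handling of nearly constant sequences, and the sample coordinates are hypothetical, and the variance-based selection between predicted and average positions is not reproduced.

```python
import math

def gm11_predict(x0, steps=1):
    """Forecast the next `steps` values of the sequence x0 with a GM(1,1) model."""
    n = len(x0)
    if n < 4:
        raise ValueError("this sketch expects at least four samples")
    # 1-AGO: accumulated sequence x1(k) = x0(1) + ... + x0(k)
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values z(k) and least-squares fit of x0(k) = -a * z(k) + b
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(x0[1:])
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    if abs(det) < 1e-12:
        return [float(x0[-1])] * steps   # degenerate data: repeat the last value
    slope = (m * szy - sz * sy) / det
    a = -slope
    b = (sy - slope * sz) / m
    if abs(a) < 1e-9:
        return [b] * steps               # nearly constant sequence
    def x1_hat(k):
        # Time response of the whitening equation, k is a 1-based index
        return (x0[0] - b / a) * math.exp(-a * (k - 1)) + b / a
    # Inverse AGO: difference of consecutive accumulated predictions
    return [x1_hat(n + s) - x1_hat(n + s - 1) for s in range(1, steps + 1)]

# Hypothetical target centers in the previous four frames
xs = [100.0, 110.0, 121.0, 133.1]
ys = [50.0, 50.0, 50.0, 50.0]
predicted_center = (gm11_predict(xs)[0], gm11_predict(ys)[0])
```

With these sample values the horizontal coordinate grows by about 10% per frame, so the forecast lands close to 146, while the constant vertical coordinate is predicted unchanged.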
2.1.2. Occlusion Handling Based on Interval Template Matching Method
Redetection involves recovering the tracker when it fails. If the object is lost, the detector facilitates the reinitialization of the target position for the KCF tracker, so the performance of the KCF tracker is improved for long-term visual object tracking in challenging conditions. In the proposed method, the detector is based on a sliding window approach and template matching approach. In order to implement the proposed algorithm in real time, the grey model GM(1,1) is used to predict the target’s position in the first five frames after the target is lost, and then the forecast position is used to update the KCF tracker.
If the KCF tracker still cannot accurately track the target after the update, then template matching is performed. In this paper, the Bhattacharyya coefficient is used to calculate the similarity between the target model and the candidate model during the template matching process, namely,
$$\rho(p, q) = \sum_{u=1}^{m} \sqrt{p_u q_u},$$
where $p = \{p_u\}$ and $q = \{q_u\}$, $u = 1, \ldots, m$, stand for the feature vectors of the grayscale histograms of the target model and the candidate model if the original image is grayscale. Otherwise, they are the reconstruction feature vectors of the channel histograms of the target model and the candidate model. The value of $\rho$ is within the range of $[0, 1]$, and the larger it is, the more similar the two models are.
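A short sketch of the Bhattacharyya coefficient between two histograms. It assumes that both histograms have the same number of bins and normalizes them to unit sum before comparison. The function names are illustrative.

```python
import math

def normalize(hist):
    """Scale a histogram so that its bins sum to one."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]

def bhattacharyya_coefficient(target_hist, candidate_hist):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    if len(target_hist) != len(candidate_hist):
        raise ValueError("histograms must have the same number of bins")
    p = normalize(target_hist)
    q = normalize(candidate_hist)
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))
```

Identical histograms give a coefficient of 1, histograms with no common non-empty bin give 0, and the detector would keep the candidate window with the largest coefficient.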
In the template matching process, if the detector searches the entire image, it takes a significant amount of time to find the target. Hence, it is necessary to search only a small region instead of the entire image, and the detector should extract an area with a high probability of containing the target. Based on the statistical information obtained from the GM(1,1) model introduced above, a surveyed area is defined in terms of the height and width of the matching template. The detection strategy based on the proposed surveyed area is shown in Figure 2.
2.2. Scale Estimation Using Multiblock Scheme
In visual tracking, one of the main drawbacks of the KCF tracker is that it does not address target scale variation. When the target scale changes, the KCF tracker is prone to drift and cannot efficiently locate the target. To deal with this problem, a part-based model to estimate the target scale is proposed. In the proposed method, a global block is first used to cover the entire target. Then four main parts are divided out from the global block (see Figure 3). The splitting direction is simply determined by the height, width, and the modified position of the target, and two scale variables are defined to estimate the height and width variation of the target, respectively. They are calculated from four quantities that are in turn defined from the numbers of pixel points in the top, bottom, left, and right blocks. If there are more black pixels than white pixels, these numbers count the black pixel points; otherwise, they count the white pixel points. The basic workflow is as follows. When a new frame arrives, the image patches of the new frame are captured in several scale spaces based on the image patches of the previous frame. The four blocks are converted into binary image blocks, and the numbers of black and white pixel points in the four blocks are counted. In a given frame, the larger the target, the more black pixel points are counted in the four blocks. Therefore, the two scale variables not only reflect the scale change of the target but also can be used to update the height and width of the target separately. In this paper, a rule based on two predefined parameters is used to judge whether a scale change of the target occurs.
In order to estimate the change of the target scale more accurately, a new position is constructed from the target position detected by the KCF tracker and the average of the pixel points detected by applying the Sobel operator to the target, combined through a learning rate. The height and width of the target are then updated accordingly.
With the proposed approach, the target scale can be estimated recursively in every frame during tracking. The procedure of the proposed target scale estimation based on the multiblock scheme is summarized as follows, and an example is shown in Figure 3.
Step 1. Apply the Sobel operator to correct the center position of the target.
Step 2. Determine the multiblock scheme by the height, width, and the modified position of target.
Step 3. Calculate two scale variables by using the multiblock scheme.
Step 4. Judge whether there is a scale change of the target.
Step 5. Update the scale of the target recursively in every frame, if the scale of the target changes.
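The pixel-counting part of the multiblock scheme can be sketched as follows. Several details are assumptions made for illustration: the patch is already binarized (0 for black, 1 for white), the four blocks are the halves of the patch above, below, left of, and right of the corrected center, and the majority color is decided over the whole patch. The formulas that turn these counts into the two scale variables are not reproduced here.

```python
def count_block_pixels(binary_patch, center_row, center_col):
    """Count majority-color pixels in the top, bottom, left, and right blocks."""
    flat = [p for row in binary_patch for p in row]
    blacks = flat.count(0)
    whites = len(flat) - blacks
    counted = 0 if blacks > whites else 1   # count black pixels if they dominate
    return {
        "top": sum(row.count(counted) for row in binary_patch[:center_row]),
        "bottom": sum(row.count(counted) for row in binary_patch[center_row:]),
        "left": sum(row[:center_col].count(counted) for row in binary_patch),
        "right": sum(row[center_col:].count(counted) for row in binary_patch),
    }

# Hypothetical 4x4 binarized patch with the corrected center at (2, 2)
patch = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
counts = count_block_pixels(patch, 2, 2)
```

Comparing such counts between consecutive frames is what allows height changes (top and bottom blocks) and width changes (left and right blocks) to be tracked separately.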
2.3. Model Updating for Tracker
In the KCF tracker, to ensure that the tracking process can adapt to the target variations at the following input frames, the parameter and the feature in (1) should be updated as follows:
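In the standard KCF formulation, this kind of update is usually a linear interpolation between the model kept from previous frames and the one estimated from the current frame. The sketch below shows that standard scheme under the assumption that it applies here; the function name and the learning rate value are hypothetical.

```python
def interpolate_model(previous, current, learning_rate):
    """Blend old and new model values: (1 - rate) * previous + rate * current."""
    return [(1.0 - learning_rate) * p + learning_rate * c
            for p, c in zip(previous, current)]

# The same rule would be applied to both the classifier parameter and the feature template.
updated = interpolate_model([1.0, 2.0], [3.0, 4.0], 0.25)
```

A small learning rate keeps the model stable under brief disturbances, while a larger one adapts faster to appearance changes at the risk of absorbing background.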
In the detector model, the target model describes the similarity between the candidate model and the target. Therefore, the target model should also be constantly updated so that it strictly represents the target. In this paper, the target model is updated by a rule that involves an update rate and the initial template of the target.
The pseudocode of the proposed tracking method is summarized in Algorithm 1.
3. Evaluation of the Proposed Method
In this section, the proposed method is evaluated on the object tracking benchmark OTB50 [28, 29]. These sequences are recorded in various scenarios and contain different challenges such as scale variation, background clutter, illumination variation, and occlusion. The experiments are implemented in MATLAB on a computer with an Intel Core i7-6500U 2.50 GHz CPU and 4.096 GB RAM. HOG features with a fixed cell size and nine orientation bins are used. To mitigate the boundary effect, the extracted features are multiplied by a cosine window. The parameters of the proposed method are listed in Table 1.
To objectively evaluate the proposed algorithm (SLKCF), it is compared with other state-of-the-art tracking methods, including the Multistore Tracker (MUSTER), the Discriminative Scale Space Tracker (DSST), the Scale Adaptive with Multiple Features tracker (SAMF), the KCF tracker, the CSK tracker, Struck, the Tracking-Learning-Detection method (TLD), the Compressive Tracker (CT), and the Distribution Fields for Tracking method (DFT).
3.1. Quantitative Evaluation
In this study, three performance criteria often used in the literature are chosen to evaluate these trackers. The first one is the distance precision (DP), which is defined as the ratio of the number of correctly tracked frames to the total number of frames in the sequence, where a frame counts as correctly tracked if the center location error is within a distance threshold (20 pixels in this study). A higher distance precision at low thresholds means that the results are more accurate. The second one is the success rate (SR), which is the percentage of successfully tracked frames. Tracking is considered successful when the bounding box overlap between the tracked object and the ground truth is greater than a threshold value (set to 0.5 in this study). The third one is the tracking speed, which is also a vital criterion for evaluating a tracker. A tracker that reaches 25 frames per second (FPS) is regarded as running in real time. In the quantitative evaluation experiments, eleven representative videos are selected from the object tracking benchmark OTB50 to show the performance of the proposed approach. The average distance precision and success rate plots of one-pass evaluation (OPE) on these eleven videos are shown in Figure 4. The precision and the success rate of all compared trackers on these eleven videos are listed in Tables 2 and 3. To further test the proposed approach, all the sequences of OTB50 are used to compare the different methods, and the results are listed in Table 4.
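The two accuracy criteria can be computed as in the following sketch. It assumes that bounding boxes are given as `(x, y, width, height)` tuples and that the overlap is measured as intersection over union, which is the usual convention for this benchmark. The function names are illustrative.

```python
import math

def center_error(a, b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bx, by = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    return math.hypot(ax - bx, ay - by)

def overlap_ratio(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    ih = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def distance_precision(predicted, ground_truth, threshold=20.0):
    """Fraction of frames whose center error is within the pixel threshold."""
    hits = sum(center_error(p, g) <= threshold
               for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)

def success_rate(predicted, ground_truth, threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold."""
    hits = sum(overlap_ratio(p, g) > threshold
               for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)

# Three hypothetical frames: exact hit, small offset, complete miss
truth = [(0, 0, 10, 10)] * 3
preds = [(0, 0, 10, 10), (5, 0, 10, 10), (30, 0, 10, 10)]
dp = distance_precision(preds, truth)
sr = success_rate(preds, truth)
```

In this example the second frame passes the 20-pixel distance test but fails the 0.5 overlap test, which shows why the two criteria can rank trackers differently.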
The results in Figure 4 show that the proposed method (SLKCF) achieves outstanding performance in both distance precision and success rate, which indicates that the SLKCF tracker has the best overall performance on both metrics. The results in Figure 4 also show that the proposed approach significantly outperforms the traditional KCF tracker. The results in Tables 2 and 3 reveal that the proposed occlusion handling model and the scale estimation using the multiblock scheme help the KCF tracker achieve outstanding performance: both the average precision and the average success rate of the proposed approach are higher than those of the standard KCF method. Table 4 shows that the proposed approach has the best performance, with DP=86.29% and SR=78.31%, over all the sequences of OTB50. Although the proposed method is slower than the KCF and CSK trackers, it is faster than the other trackers and can still track the target in real time.
3.2. Qualitative Evaluation
In order to evaluate the proposed algorithm more intuitively in comparison with other trackers, Figure 5 displays the results of different trackers on several representative sequences. In Figure 5(a), the Jumping sequence consists of grayscale images and contains fast motion of the target. Owing to the integration of the GM(1,1) grey model and the interval template matching method, the proposed SLKCF obtains good performance. The KCF tracker does well when the target moves smoothly but drifts when fast motion occurs. Figure 5(b) shows a full occlusion in which the target overlaps with the pole. Before the full occlusion, the target is tracked successfully by every tracker. However, when the full occlusion occurs, only the proposed method keeps tracking the target. At this moment, the GM(1,1) model helps correct the tracking process. In Figure 5(c), the results are obtained in the presence of background clutter, illumination changes, motion blur, and occlusion. Here, the proposed SLKCF tracker deals with these challenging factors as well as the other methods do, and the mechanisms proposed in this study make it more efficient and reduce its computational time. The sequence in Figure 5(d) depicts scale variation accompanied by partial occlusion. The ratio of the maximal target size to the minimal one is more than 30 as the target vehicle approaches the camera from far away. DSST, SAMF, MUSTER, and the proposed method can adapt to the target scale change, and all four trackers work well while the scale changes slightly. After the 200th frame, DSST, SAMF, and MUSTER can capture only part of the car. In contrast, the proposed method accurately tracks the entire car. The results in Figures 5(e)-5(f) show that the proposed approach can efficiently handle targets that undergo scale variation, illumination changes, and occlusion.
The videos in the object tracking benchmark are annotated with different attributes, which represent the challenges in each video. The attributes include occlusion (OCC), illumination variation (IV), deformation (DEF), scale variation (SV), fast motion (FM), background clutter (BC), in-plane rotation (IPR), out-of-plane rotation (OPR), and out of view (OV). In order to further examine the validity of the proposed algorithm in these challenging situations, some experiments are conducted. The results of distance precision for the challenging attributes are shown in Table 5 and Figure 6. The results show that the proposed approach ranks first by a large margin compared with the other trackers. These promising results show that the proposed tracker (SLKCF) is more effective than the other trackers for these challenging attributes.
In this paper, the target tracking task is studied and an improved KCF-based tracking algorithm that can be implemented in real time is proposed. In the proposed approach, the KCF tracker is combined with the GM(1,1) grey model, the interval template matching method, and a part-based model, which enables the proposed algorithm to track the target successfully in challenging environments containing occlusion and object scale variation. In these respects, the proposed approach is superior to the traditional KCF tracker. The experimental results show that the proposed method outperforms the other trackers in terms of distance precision and success rate. In future work, the tracking speed of the proposed approach should be further improved, and some novel intelligent methods will be studied to improve the performance of the tracking algorithm.
All data supporting this study are openly available from the website of the Visual Tracker Benchmark at http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html.
Conflicts of Interest
The authors declare that they have no conflicts of interest in this work.
This work was supported by the National Natural Science Foundation of China (61873086, 61801169, and 61573128) and the Fundamental Research Funds for the Central Universities (2018B23214).
C. Kanellakis and G. Nikolakopoulos, “Survey on Computer Vision for UAVs: Current Developments and Trends,” Journal of Intelligent & Robotic Systems, vol. 87, no. 1, pp. 141–168, 2017.
C.-C. Chiang, M.-C. Ho, H.-S. Liao, A. Pratama, and W.-C. Syu, “Detecting and recognizing traffic lights by genetic approximate ellipse detection and spatial texture layouts,” International Journal of Innovative Computing, Information and Control, vol. 7, no. 12, pp. 6919–6934, 2011.
M. Crocco, M. Cristani, A. Trucco, and V. Murino, “Audio surveillance: A systematic review,” ACM Computing Surveys, vol. 48, no. 4, 2016.
J. Ma, H. Luo, B. Hui, and Z. Chang, “Robust scale adaptive tracking by combining correlation filters with sequential Monte Carlo,” Sensors, vol. 17, no. 3, 2017.
J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, “Exploiting the circulant structure of tracking-by-detection with kernels,” Lecture Notes in Computer Science, vol. 7575, no. 4, pp. 702–715, 2012.
Y. Li and J. Zhu, “A scale adaptive kernel correlation filter tracker with feature integration,” Lecture Notes in Computer Science, vol. 8926, pp. 254–265, 2015.
S. Jeong, G. Kim, and S. Lee, “Effective visual tracking using multi-block and scale space based on kernelized correlation filters,” Sensors, vol. 17, no. 3, 2017.
Q. Yu, J. Lyu, L. Jiang, and L. Li, “Traffic anomaly detection algorithm for wireless sensor networks based on improved exploitation of the GM(1,1) model,” International Journal of Distributed Sensor Networks, vol. 12, no. 7, 2016.
Y. El merabet, Y. Ruichek, S. Ghaffarian et al., “Maximal similarity based region classification method through local image region descriptors and Bhattacharyya coefficient-based distance: Application to horizon line detection using wide-angle camera,” Neurocomputing, vol. 265, pp. 28–41, 2017.
C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, “Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking,” International Journal of Computer Vision, pp. 1–26, 2018.