Abstract

To avoid inaccurate location or tracking failure caused by occlusion or pose variation, a novel tracking method based on the CamShift algorithm is proposed, in which the target is decomposed into multiple subtargets that are located separately. Distance correlation matrices are constructed from the subtarget sets in the template image and the scene image to evaluate the correctness of the location results. The error locations of the subtargets are corrected by solving an optimization function constructed from the relative positions among the subtargets. The directions and sizes of the subtargets correctly located by the CamShift algorithm are updated to reduce the disturbance of the background during the tracking process. Simulation results show that the method can locate and track the target and adapts better to scaling, translation, rotation, and occlusion. Furthermore, the computational cost of the method increases only slightly, and its average computational time per frame is less than 25 ms, which meets the real-time requirement of the TV tracking system.

1. Introduction

Target tracking is a key technology in the field of computer vision. It is widely applied in industry, surveillance, robotics, human-machine interfaces, and so forth [1–4]. Tracking a target precisely in real-world scenarios is challenging because of the variability of the target’s movements and shapes, clutter, and occlusions. Many factors affect the tracking performance [5–8]. Firstly, the motion of the target is random and uncertain. The target may change its posture during tracking, which can lead to scaling, translation, or rotation of the target in the tracking image, so the size and shape of the target in the current scene may differ from those in the previous scene or in the target template. Secondly, the illumination in the scene may be unstable. Uneven illumination or varying brightness can disturb target tracking. Besides, a complex background degrades the tracking performance badly. Note that a complex background here means that the color of the background is close to that of the target, rather than that the content of the background is complex. In other words, it is difficult to distinguish the target from a background with a similar color.

Some tracking algorithms have been proposed to deal with the problems above. For instance, superpixel tracking can handle heavy occlusion and recover from drift [9], and a patch-based tracking algorithm decomposes a target into several patches based on appearance similarity and spatial distribution to recognize the target against a complex background [10]. Generally, improving the accuracy of the target model is an effective way to enhance the location accuracy, but it comes at the cost of computational time. Therefore, a method is needed that models the target accurately and tracks it precisely with less computation. The MeanShift algorithm [11, 12] and the CamShift (Continuously Adaptive Mean-SHIFT) algorithm [13] have attracted considerable attention because of their fast matching. In particular, CamShift can adaptively adjust the size of the target area, so it adapts well to the pose variation of the target [14, 15]. Both methods track well when the contrast between the target and the background is high. However, when the background is complex, the adaptive adjustment of the size of the target area can be disturbed by background information, which may cause lower modeling accuracy, location error, or even tracking failure [16]. Therefore, some improved methods have been proposed to enhance the location accuracy. For instance, one effective way is to fuse other features [17, 18], such as shape, edge, or texture, to improve the modeling precision. However, this requires a larger computational cost and makes the model more complex.

To improve the reliability of target tracking, an improved CamShift tracking strategy is proposed. The target is decomposed into multiple subtargets, and each subtarget is tracked separately by the CamShift algorithm. The error locations of subtargets are corrected according to the relative positions among all the subtargets, and the target is then synthesized from these subtargets. In this tracking strategy, the error correction enhances the reliability of the tracking result and yields high location accuracy. In particular, the strategy adapts well when the target is occluded. Unlike the superpixel tracking algorithm [9] and the patch-based tracking algorithm [10], the method proposed in this paper adopts the CamShift algorithm to track the subtargets, and the correct tracking result can be obtained by the error correction as long as some of the subtargets are located correctly.

2. Subtarget Set and Distance Correlation Matrix

Suppose that the target contains $n$ distinguishing feature areas. Each distinguishing feature area can be regarded as a subtarget. All the subtargets are grouped into a subtarget set $U$, so the target can be described by the subtarget set $U$.

Each element of the subtarget set represents the center coordinates of one subtarget. Let the center coordinates of the $i$th subtarget be $(x_i, y_i)$; the subtarget set can then be described as

$$U = \{(x_i, y_i) \mid i = 1, 2, \ldots, n\}.$$

The distance between the $i$th and the $j$th subtargets is defined as

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2},$$

where $(x_i, y_i)$ and $(x_j, y_j)$ are the center coordinates of the $i$th and the $j$th subtargets in the subtarget set $U$.

The distance correlation matrix of the subtarget set $U$ can be defined as

$$D = [d_{ij}]_{n \times n}, \quad i, j = 1, 2, \ldots, n.$$

The target motion can be regarded as rigid-body motion, so the ratios of the relative distances among all the subtargets are fixed. According to these relative distances, the error locations of subtargets can be corrected.
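The construction above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distances between subtarget centers; the function name `distance_matrix` and the sample coordinates are hypothetical and do not come from the paper.

```python
import math

def distance_matrix(centers):
    """Return the n x n matrix of Euclidean distances between subtarget centers."""
    n = len(centers)
    return [[math.hypot(centers[i][0] - centers[j][0],
                        centers[i][1] - centers[j][1])
             for j in range(n)] for i in range(n)]

# Hypothetical template centers in pixels (e.g. two lights, a plate, a window).
template = [(10.0, 40.0), (70.0, 40.0), (40.0, 50.0), (40.0, 10.0)]
D = distance_matrix(template)

# The same layout scaled by 1.5 and shifted, as a stand-in for a later scene.
scene = [(1.5 * x + 100.0, 1.5 * y + 20.0) for x, y in template]
D_scene = distance_matrix(scene)
```

In this example every off-diagonal entry of `D_scene` is 1.5 times the matching entry of `D`, which is the fixed-ratio property that the correction step relies on.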

3. Target Tracking Based on Target Decomposition

After the target is decomposed, the hue histogram of each subtarget is modeled in the HSV color space, the CamShift algorithm is used to locate every subtarget, and the error locations of subtargets are corrected by an optimization calculation. The position of the target in the scene is then determined from the locations of the subtargets. The target tracking process can be described as follows.

Step 1. Build the hue histogram of each subtarget. Every pixel in the region of a subtarget contributes to the histogram bin of its quantized hue characteristic value. The bin is selected by a Kronecker delta function, which equals 1 when its argument is zero and 0 otherwise, and the contribution is weighted by a Gaussian basis function of the pixel position relative to the center position of the subtarget, with a given bandwidth. The resulting histogram over the hue levels is the probability distribution of the hue of that subtarget.
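A kernel-weighted hue histogram of this kind can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian form `exp(-r^2 / (2 h^2))`, the normalization to unit sum, the hue range of [0, 180), and all names (`weighted_hue_histogram`, `bins`, `bandwidth`) are choices made here, not details confirmed by the paper.

```python
import math

def weighted_hue_histogram(pixels, center, bins=16, bandwidth=10.0, hue_max=180.0):
    """Gaussian-weighted hue histogram of one subtarget region.

    pixels: iterable of (x, y, hue) with 0 <= hue < hue_max (assumed range).
    center: (x, y) center of the subtarget.
    """
    hist = [0.0] * bins
    cx, cy = center
    for x, y, hue in pixels:
        # Quantize the hue; adding to hist[u] plays the role of the Kronecker delta.
        u = min(int(hue * bins / hue_max), bins - 1)
        r2 = (x - cx) ** 2 + (y - cy) ** 2
        # Assumed Gaussian weight: pixels far from the center count less.
        hist[u] += math.exp(-r2 / (2.0 * bandwidth ** 2))
    total = sum(hist)
    # Normalize so the histogram can be read as a probability distribution.
    return [h / total for h in hist] if total > 0 else hist
```

Weighting by distance from the center is a common way to reduce the influence of background pixels near the region boundary.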

Step 2. The probability projection image of the search window of each subtarget is built by back-projecting the hue histogram. For each pixel in the search window of a subtarget, the gray value in the probability projection image is obtained from the histogram value of the hue characteristic value of that pixel, followed by a rounding operation. The larger the gray value of a pixel in the probability projection image, the larger the probability that the pixel belongs to the target area.
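Back-projection can be sketched as a table lookup. The scaling used below (histogram value divided by the histogram peak, multiplied by 255, then rounded) is a common convention and is an assumption here, as are the function name `back_project` and the hue range.

```python
def back_project(hue_window, hist, hue_max=180.0):
    """Map each hue in a search window to a gray value in 0..255.

    hue_window: 2D list of hue values with 0 <= hue < hue_max (assumed range).
    hist: hue histogram of the subtarget (any non-negative values).
    """
    bins = len(hist)
    peak = max(hist)
    image = []
    for row in hue_window:
        out_row = []
        for hue in row:
            u = min(int(hue * bins / hue_max), bins - 1)
            # Assumed scaling: the most frequent hue maps to 255; round half up.
            gray = int(255.0 * hist[u] / peak + 0.5) if peak > 0 else 0
            out_row.append(gray)
        image.append(out_row)
    return image
```

Pixels whose hue is frequent in the subtarget model become bright, and pixels whose hue never occurs in the model map to zero.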

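Step 3. The zero and first order moments of the search window can be calculated as

$$M_{00} = \sum_{x}\sum_{y} I(x, y), \quad M_{10} = \sum_{x}\sum_{y} x\,I(x, y), \quad M_{01} = \sum_{x}\sum_{y} y\,I(x, y),$$

where $I(x, y)$ is the gray value of the pixel $(x, y)$ in the probability projection image and the sums run over the search window.

Step 4. The center of mass $(x_c, y_c)$ of the search window can be calculated as

$$x_c = \frac{M_{10}}{M_{00}}, \quad y_c = \frac{M_{01}}{M_{00}}.$$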

Step 5. Adaptively adjust the size of the search window according to the zero order moment, and move the center of the search window of the subtarget to its center of mass. Repeat Step 2 to Step 5 until the shift distance is less than the threshold.
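Steps 3 to 5 can be sketched as one loop over a fixed probability projection image. The window-size rule `2 * sqrt(M00 / 256)` is the one from Bradski's original CamShift formulation and is an assumption here, since the paper's own size rule is not reproduced in this text; the square window and the names `window_moments` and `camshift_locate` are also hypothetical.

```python
import math

def window_moments(prob, x0, y0, w, h):
    """Zero and first order moments of prob inside [x0, x0+w) x [y0, y0+h)."""
    m00 = m10 = m01 = 0.0
    for y in range(max(y0, 0), min(y0 + h, len(prob))):
        for x in range(max(x0, 0), min(x0 + w, len(prob[0]))):
            v = prob[y][x]
            m00 += v
            m10 += x * v
            m01 += y * v
    return m00, m10, m01

def camshift_locate(prob, window, eps=1.0, max_iter=20):
    """Shift and resize a search window until the center of mass settles."""
    x0, y0, w, h = window
    cx, cy = x0 + (w - 1) / 2.0, y0 + (h - 1) / 2.0
    for _ in range(max_iter):
        m00, m10, m01 = window_moments(prob, x0, y0, w, h)
        if m00 <= 0.0:
            break  # nothing to track inside the window
        new_cx, new_cy = m10 / m00, m01 / m00
        shift = math.hypot(new_cx - cx, new_cy - cy)
        # Assumed size rule (Bradski): side length grows with sqrt(M00).
        side = max(int(math.floor(2.0 * math.sqrt(m00 / 256.0) + 0.5)), 1)
        w = h = side
        # Re-center the resized window on the center of mass.
        x0 = int(math.floor(new_cx - (w - 1) / 2.0 + 0.5))
        y0 = int(math.floor(new_cy - (h - 1) / 2.0 + 0.5))
        cx, cy = new_cx, new_cy
        if shift < eps:
            break
    return (cx, cy), (x0, y0, w, h)
```

For simplicity the sketch reuses one probability image, whereas the procedure above rebuilds the back-projection of the search window on each pass.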

Step 6. Judge the location correctness of the subtargets.
Suppose that each subtarget has been located at some position in the scene. These located positions form the new subtarget set in the scene, and the new distance correlation matrix of the subtarget set in the scene is built from them in the same way as for the template.

The correctness of the located subtargets can be judged according to the difference between the two distance correlation matrices. Firstly, a center is obtained for every pair of distinct subtargets as the ratio of the corresponding elements of the two matrices.

Secondly, a kernel function is defined around each of these centers, and the sum function is defined as the sum of all these kernel functions.

The position corresponding to the maximum of the sum function is taken as the kernel center. Each element of the judgment matrix is determined by comparing the corresponding center with the kernel center against a center threshold, and it is zero when the two differ by more than the threshold. If all the elements in the row of a subtarget are zero, it can be determined that this subtarget is located wrongly.
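The judgment step can be sketched as a vote among distance ratios. Several details below are assumptions rather than the paper's definitions: the ratio is taken as scene distance over template distance, the kernel is Gaussian with width `sigma`, the maximum of the sum function is searched only at the observed ratios, and the names `judge_subtargets`, `threshold`, and `sigma` are hypothetical.

```python
import math

def judge_subtargets(D_template, D_scene, threshold=0.1, sigma=0.05):
    """Return (kernel_center, judgment_matrix, indices_of_wrong_subtargets)."""
    n = len(D_template)
    # One ratio ("center") per ordered pair of distinct subtargets.
    ratios = {(i, j): D_scene[i][j] / D_template[i][j]
              for i in range(n) for j in range(n)
              if i != j and D_template[i][j] > 0}

    def kernel_sum(c):
        # Assumed Gaussian kernels centered on every ratio.
        return sum(math.exp(-((c - r) ** 2) / (2.0 * sigma ** 2))
                   for r in ratios.values())

    # Approximate the maximizer of the sum function by the best observed ratio.
    center = max(ratios.values(), key=kernel_sum)
    J = [[1 if (i, j) in ratios and abs(ratios[(i, j)] - center) < threshold else 0
          for j in range(n)] for i in range(n)]
    # A subtarget whose row is all zero agrees with no other subtarget.
    wrong = [i for i in range(n) if not any(J[i])]
    return center, J, wrong
```

Because the decision uses ratios rather than raw distances, a uniform change of scale moves the kernel center but does not flag any subtarget.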

Step 7. Correct the error locations of subtargets.
If error location subtargets exist, they can be corrected by means of the subtargets located correctly. For each error location subtarget, the distance between its corrected center coordinates and the center coordinates of each correctly located subtarget is defined, and an evaluation function is constructed from these distances. The best corrected center coordinates are those that minimize the evaluation function.
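One way to realize this correction is sketched below. The evaluation function used here, a sum of squared differences between the distances to the correctly located subtargets and the template distances multiplied by the similarity ratio, is an assumed form, and the derivative-free pattern search, the starting point, and the names `correct_location`, `scale`, and `start` are illustrative choices only.

```python
import math

def correct_location(correct_scene, template_dists, scale, start,
                     step=32.0, tol=1e-3):
    """Estimate the center of a wrongly located subtarget.

    correct_scene: scene centers of the correctly located subtargets.
    template_dists: template distances from the wrong subtarget to each of them.
    scale: similarity ratio between scene and template (the kernel center).
    start: initial guess, e.g. the position in the previous frame (assumption).
    """
    def cost(p):
        # Assumed evaluation function: squared mismatch of expected distances.
        return sum((math.hypot(p[0] - cx, p[1] - cy) - scale * d) ** 2
                   for (cx, cy), d in zip(correct_scene, template_dists))

    best, best_cost = start, cost(start)
    while step > tol:
        moved = False
        for dx, dy in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = (best[0] + dx, best[1] + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost, moved = cand, c, True
        if not moved:
            step /= 2.0  # refine the search once no axis move helps
    return best, best_cost
```

With only two correctly located subtargets the minimum is generally not unique (the mirror position fits equally well), so the starting point matters; three or more non-collinear subtargets remove this ambiguity.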

Step 8. Adjust the direction angle and size of the subtargets located correctly.
The second order moments of the search window are calculated first. From the zero, first, and second order moments, the major axis, the minor axis, and the direction angle of the ellipse area of each correctly located subtarget are then calculated.
The direction angles and sizes of the error location subtargets are not adjusted.
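The ellipse parameters can be derived from image moments as sketched below. The formulas follow the standard CamShift treatment (second central moments, then the principal axes of the resulting 2 x 2 covariance); whether the paper uses exactly these expressions or scales the axes differently is an assumption, and `ellipse_from_moments` is a hypothetical name.

```python
import math

def ellipse_from_moments(prob):
    """Return (major, minor, theta) for a probability image given as a 2D list.

    major and minor are the standard deviations along the principal axes;
    theta is the orientation of the major axis in image coordinates (radians).
    """
    m00 = m10 = m01 = m20 = m02 = m11 = 0.0
    for y, row in enumerate(prob):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
            m20 += x * x * v
            m02 += y * y * v
            m11 += x * y * v
    if m00 <= 0.0:
        raise ValueError("empty probability image")
    xc, yc = m10 / m00, m01 / m00
    a = m20 / m00 - xc * xc          # variance along x
    b = 2.0 * (m11 / m00 - xc * yc)  # twice the covariance
    c = m02 / m00 - yc * yc          # variance along y
    root = math.sqrt(b * b + (a - c) ** 2)
    major = math.sqrt(max((a + c + root) / 2.0, 0.0))
    minor = math.sqrt(max((a + c - root) / 2.0, 0.0))
    theta = 0.5 * math.atan2(b, a - c)
    return major, minor, theta
```

For a uniform horizontal bar the angle is zero and the major axis lies along x, while for pixels on the main diagonal the angle is a quarter of pi.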

Step 9. Return to Step 1 to track the target in the next scene.

In (17), each center is the ratio of the corresponding distances in the two distance correlation matrices. When all the subtargets are located correctly, all these ratios are equal by geometric similarity, regardless of whether the scale of the target varies. In other words, the kernel center represents the geometric similarity ratio between the target in the template image and the target in the scene image. When the difference between the ratio of a pair of subtargets and the kernel center is large, at least one subtarget of that pair may not be located correctly. The corrected position of a wrongly located subtarget corresponds to the minimum of the evaluation function. So, in (17)–(21), the ratio of the relative distances between the subtargets in the target template and in the scene is used to judge the correctness of the tracking results, which ensures that the tracking strategy remains effective under scale variation of the target. Therefore, the tracking strategy adapts well to translation, rotation, scaling, and occlusion of the target.
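The invariance argument can be made explicit with a short derivation. The notation is introduced here for illustration and is an assumption: $p_i$ and $p_i'$ denote the center of the $i$th subtarget in the template and in the scene, $d_{ij}$ and $d_{ij}'$ the corresponding pairwise distances, and the scene is assumed to differ from the template by a similarity transform with scale $s$, rotation $R$, and translation $t$.

```latex
\begin{aligned}
p_i' &= s R\, p_i + t, \qquad s > 0,\; R^{\top} R = I, \\
d_{ij}' &= \lVert p_i' - p_j' \rVert
         = \lVert s R\,(p_i - p_j) \rVert
         = s\,\lVert p_i - p_j \rVert
         = s\, d_{ij}, \\
\frac{d_{ij}'}{d_{ij}} &= s \quad \text{for all } i \neq j .
\end{aligned}
```

Under this assumption every pair of correctly located subtargets yields the same ratio $s$, so a pair whose ratio lies far from the common value must involve at least one wrongly located subtarget.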

4. Simulation Results

The tracking strategy proposed in this paper is applied to target tracking under the complex conditions of partial occlusion, pose variation, and scaling of the target. The feasibility and effectiveness of the proposed method are shown by comparison with the basic CamShift algorithm and the improved CamShift algorithm based on multifeature fusion [18].

4.1. Tracking Experiment for the Target Occluded Partially

Figure 1 shows the initial template, marked by the white ellipse, in the first frame. The rear of the car is chosen as the initial template, which contains two red taillights, one blue license plate, and one black rear window. Partial occlusion of the target occurs during the video sequence.

The tracking results of the different methods are compared in Figures 2 to 5.

The tracking results shown in Figure 2 are obtained by the basic CamShift algorithm (Method 1). It is clear that the locations given by the basic CamShift algorithm are inaccurate. In particular, because the size of the target changes with the target motion, a lot of background information is included in the target area, which seriously interferes with the target location.

The tracking results shown in Figure 3 are obtained by the CamShift algorithm based on multifeature fusion (Method 2). Multiple features improve the location accuracy to some extent, but they still cannot restrain the interference of the occlusion, so a large location error is inevitable in these results.

The tracking results shown in Figure 4 are obtained by our method without the correction of the error location subtargets (Method 3). It cannot correctly locate the occluded subtarget, which makes the location of the target inaccurate.

The tracking results shown in Figure 5 are obtained by our method with the correction of the error location subtargets (Method 4). Although some subtargets are occluded during the tracking process, they can be located accurately according to the other subtargets, which ensures accurate target location results. After the target moves out of the occlusion region, all the subtargets return to normal tracking. Thus, the tracking result remains correct, and high location accuracy is obtained throughout the long-term tracking process.

Table 1 shows the center deviation between the located target and the real target, and Table 2 shows the percentage deviations of the major axis and minor axis of the ellipse target area. The center deviation is the distance between the located center and the real center of the target. The percentage deviation of the major (minor) axis is the ratio of the absolute deviation between the major (minor) axis of the ellipse target area containing all the subtargets and that of the target template in the scene to the major (minor) axis of the target template in the scene.
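The two evaluation measures can be written down directly from these definitions. The sketch below assumes that positions and axis lengths are given in pixels; the function names are hypothetical.

```python
import math

def center_deviation(located_center, real_center):
    """Distance between the located center and the real center of the target."""
    return math.hypot(located_center[0] - real_center[0],
                      located_center[1] - real_center[1])

def axis_percentage_deviation(located_axis, template_axis):
    """Absolute axis deviation relative to the template axis, in percent."""
    return abs(located_axis - template_axis) / template_axis * 100.0
```

For example, a located center of (103, 204) against a real center of (100, 200) gives a center deviation of 5 pixels, and a located major axis of 110 pixels against a template axis of 100 pixels gives a percentage deviation of 10%.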

The size of the scene image is . According to both the center deviation and the percentage deviation in the experimental data, the tracking strategy proposed in this paper has better tracking performance: the located target center is closer to the real target center, and the target area is consistent with the target template.

4.2. Tracking Experiment for the Target with Pose Change

Figure 6 shows the initial template, marked by the red ellipse, in the first frame. The rear of the car is again chosen as the initial template, which contains two red taillights, one blue license plate, and one black rear window. The pose of the target changes during the video sequence. The colors of the car body and its side window are the same as those of the target, which seriously disturbs the location results.

The tracking results of the different methods are compared in Figures 7 to 10.

The tracking results shown in Figure 7 are obtained by the basic CamShift algorithm (Method 1). The tracking result is unstable because of the background interference. The target area contains a lot of background, which causes inaccurate location of the target.

The tracking results shown in Figure 8 are obtained by the CamShift algorithm based on multifeature fusion (Method 2). Although the multiple features can restrain the interference of the background, they cannot restrain the interference caused by the car body or its window. The car can be tracked, but the tracked areas differ from the initial target template, so the location accuracy is not satisfactory.

The tracking results shown in Figure 9 are obtained by our method without the correction of the error location subtargets (Method 3). Some subtargets disappear from the scene because of the pose change of the car, and the disappearing subtargets cannot be located correctly. For instance, because the left light disappears in Figure 9(d), it is wrongly located at the right light, which causes a location error of the target.

The tracking results shown in Figure 10 are obtained by our method with the correction of the error location subtargets (Method 4). The disappearing subtargets can be located correctly according to the relative distances among all the subtargets, for instance, in Figure 10(d), which ensures the accuracy of the target tracking. When the disappearing subtargets reappear in later frames, for instance, the left light in Figures 10(e) and 10(f), they are tracked normally again. The target can always be tracked accurately.

Tables 3 and 4 show the center deviation and the percentage deviations of the major (minor) axis, respectively.

The size of the scene image is 640 × 480. The experimental results lead to the same conclusion as above: the tracking strategy proposed in this paper has better tracking performance. Each subtarget has a different distinguishing feature, which prevents a single color from disturbing the tracking results of all the subtargets. Although occlusion, pose change, and the background can cause error locations of some subtargets, the relative distances among the subtargets can be used to correct them, which ensures the good adaptability of the tracking strategy. Therefore, the tracking strategy can be applied in some complex cases.

4.3. Tracking Experiment for the Target with the Scale Variation

Figure 11 shows the initial template, marked by the red ellipse, in the first frame. The head is chosen as the initial template, which contains the eyes, mouth, and hair. The scale of the target varies and partial occlusion occurs during the tracking process. Besides, skin-colored regions outside the template, for instance the neck, can disturb the location results.

The tracking results of the different methods are compared in Figures 12 to 15.

The tracking results shown in Figure 12 are obtained by the basic CamShift algorithm (Method 1), and those shown in Figure 13 are obtained by the CamShift algorithm based on multifeature fusion (Method 2). Because skin color is the main color of the target, the tracking results obtained by these two methods are about the same. Both the neck and the hand disturb the location results.

The tracking results shown in Figure 14 are obtained by our method without the correction of the error location subtargets (Method 3). Correct tracking results are obtained when only the scale of the target increases and no occlusion occurs. When the hand covers the mouth in Figures 14(e) and 14(f), location errors occur at the later stage of the target tracking.

The tracking results shown in Figure 15 are obtained by our method with the correction of the error location subtargets (Method 4). Although the scale of the target varies during the tracking process, correct tracking results are still obtained because the geometric similarity ratio of the subtargets is used to correct the location results. This also shows that the method adapts to the scaling of the target.

Tables 5 and 6 show the center deviation and the percentage deviations of the major (minor) axis, respectively.

The size of the scene image is 640 × 480. In the experimental results, the location accuracies of Method 1 and Method 2 are similar because the CamShift algorithm is suitable for tracking a target with large skin-colored regions. However, these two methods cannot tell whether a skin-colored region belongs to the background or to the target, so the location error is large. Our method corrects the target position using the geometric similarity ratio of the subtargets, so the location accuracy is improved.

4.4. Tracking Experiment for the Target from PETS 2001 Dataset

A target in Camera1.mov from the PETS 2001 dataset is tracked by the methods above. Figure 16 shows the initial template, marked by the red ellipse, in the first frame. The rear of the car is again chosen as the initial template, which contains two taillights, one license plate, and one rear window. The target is tracked while it overtakes.

The tracking results of the different methods are compared in Figures 17 to 20.

In Figure 17, most of the colors in the target are similar to those of the background, so the basic CamShift algorithm can hardly track the target effectively. Likewise, in Figure 18, although the CamShift algorithm based on multifeature fusion improves the tracking performance, it still fails in the end. Figure 19 shows that the two taillights and the license plate, which differ significantly from the background, can be tracked effectively. However, it is difficult to track the rear window, whose color is similar to that of the road in the background. Therefore, the whole target cannot be located effectively. After the error correction, as shown in Figure 20, the target is tracked correctly.

Table 7 shows the center deviation between the located target and the real target, and Table 8 shows the percentage deviations of the major axis and minor axis of the ellipse target area.

The size of the scene image is 640 × 480. The experimental data show that when the target can be decomposed into several subtargets with different salient colors, the target can be tracked easily by means of the subtargets located correctly. Since different subtargets can be salient against different backgrounds, the method proposed in this paper has better adaptability than the others.

4.5. Comparisons on Computational Costs

VS2010 is used as the software development environment for the experiments above. The computer is configured with an AMD A4-6219 APU with AMD Radeon R3 Graphics and 8 GB of RAM. The computational costs, measured as the average tracking computational time per frame, are compared in Table 9.

Although the number of tracked subtargets is larger in our method, each subtarget is small. Therefore, the experimental results show that the computational cost of our method increases only slightly, and the cost of the error correction is small. By contrast, the computational cost of the CamShift algorithm based on multifeature fusion is large because of the extraction of the fused features. Furthermore, the average computational time per frame of every method above is less than 40 ms, and that of our method is less than 25 ms. Therefore, all the methods above can meet the real-time requirement of the TV tracking system.
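The real-time claim can be read as a per-frame time budget. The sketch below assumes a video rate of 25 frames per second, which gives a budget of 40 ms per frame; this rate is an assumption, since the frame rate of the TV tracking system is not stated here, and `track_frame` is a hypothetical callable standing in for any of the tracking methods.

```python
import time

def frame_budget_ms(fps=25.0):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps

def meets_frame_budget(avg_ms, fps=25.0):
    """True if the average per-frame time fits within the frame budget."""
    return avg_ms <= frame_budget_ms(fps)

def average_frame_time_ms(track_frame, frames):
    """Average wall-clock time per frame of track_frame over a frame sequence."""
    frames = list(frames)
    start = time.perf_counter()
    for frame in frames:
        track_frame(frame)
    return (time.perf_counter() - start) * 1000.0 / len(frames)
```

Under this assumption a method averaging 25 ms per frame leaves 15 ms of headroom in each frame, and one averaging just under 40 ms leaves almost none.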

5. Conclusions

To meet the requirement of precise target location and tracking in complex cases, a tracking strategy based on target decomposition is proposed. The target is decomposed into subtargets with distinguishing features, which are grouped into a subtarget set. Every subtarget is located independently, and the relative distances among the subtargets are used to correct the error location results, which enhances the adaptability of the tracking method in complex tracking cases. When the target is partially occluded or undergoes complex motion, such as scaling, translation, rotation, and pose variation, the tracking strategy still achieves better tracking performance.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This study is supported by the National Natural Science Foundation of China (Project nos. 61203302 and 61403277) and the Natural Science Foundation of Tianjin (Project no. 14JCYBJC18900).