Abstract

Object tracking using Mean Shift (MS) has attracted considerable attention recently. In this paper, we address one of its shortcomings. Mean shift is designed to find local maxima for tracking objects. Therefore, when the target moves a large distance between two consecutive frames, the local and global modes no longer coincide as they did in previous frames, so the MS tracker may fail to track the desired object because it does not localize the global mode. To overcome this problem, a multi-bandwidth procedure is proposed that helps the conventional MS tracker reach the global mode of the density function from any starting point. This gradual smoothing procedure, called Multi-Bandwidth Mean Shift (MBMS), automatically smooths the kernel function through a multiple kernel-based sampling procedure. Since low computational complexity is important for real-time applications, we also try to decrease the number of iterations needed to reach the global mode. Based on our results, the proposed version of MS tracks an object from the same initial point much faster than the conventional MS tracker.

1. Introduction

Methods that use a kernel function as a density estimator have drawn much attention in image processing. Object tracking [17] with MS is a nonparametric technique, introduced in [8–10]. MS is a mode-detection algorithm in the density distribution space that assigns weights to the pixels within a window. It has been proposed that the MS iteration step size can be adapted [9], and it has been shown that MS and the Newton algorithm are connected [11]. MS is used in object tracking as an alternative to particle filter tracking [2, 12, 13]. The kernel scale can also be updated [3]. To avoid changes in the target representation, multiple kernel functions can be used to maintain the pixel location values [5, 7, 14]. Another way of using a kernel function is through a similarity function [15].

MS has three main problems. The first is failure caused by rapid movement of an object between two consecutive frames. The second is that MS is a local optimization technique, so we should not expect it to localize the global, optimal mode. The third is slow tracking. The MS search procedure is initialized by the location of the tracked object in the last frame. Flattening the optimization surface (created by windowing around the initial point) has been used in sampling methods to find the global maximum [16–18]. A multi-bandwidth kernel function can flatten the cost function (i.e., the likelihood surface) so that the search avoids local modes and reaches the global one. As the bandwidth of the kernel increases, all modes are unified into one main mode whose peak is the point nearest to the global optimal mode. The convergence point on this unified likelihood surface is then refined through MS iterations. The bandwidth determines the degree of flatness of the surface, so the number and positions of the modes evolve slowly as it changes.

Lower computational cost is also desirable in real-time tracking. Using the nearest neighbors of a sample in the MS iteration, together with a kd-tree to reduce the number of neighbors examined, decreases the cost [1, 19]. Describing the density distribution by clustering the feature space also lessens the cost [20]. A quasi-Newton method has been put forward to improve the rate of convergence [11, 21]. Many attempts have been made to enhance the efficiency of bound optimization algorithms [22–24]. Our proposed method can detect an object globally. It can outperform recent methods and extends earlier technical efforts [25].

The rest of this paper is organized as follows. Section 2 gives an overview and analysis of MS. Section 3 elaborates the proposed method and how it enhances MS iterations in detecting the global mode of the cost function. Section 4 presents the implementation of the proposed method and shows how much faster it runs. Section 5 covers the use of color as a feature, automatic initialization, the handling of background problems, and self-localization, together with experimental results. Section 6 concludes the paper.

2. Mean Shift

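The conventional, original version of MS is based on a nonparametric kernel density estimator, written here in its standard form:

$$\hat{f}(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^{n} K_{\mathbf{H}}(\mathbf{x}-\mathbf{x}_i) \qquad (1)$$

In (1), $K$ is a kernel function (here, a Gaussian) with a 2D bandwidth matrix $\mathbf{H}$, which is set to $\mathbf{H} = h^{2}\mathbf{I}$, where $\mathbf{I}$ is the identity matrix [9, 10]. The kernel estimator can then be formulated as

$$\hat{f}(\mathbf{x}) = \frac{c}{n h^{2}}\sum_{i=1}^{n} k\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^{2}\right) \qquad (2)$$

where $k$ is the profile of the kernel function, $c$ is a normalization constant, and the gradient of the kernel enters through $g = -k'$. Here $\mathbf{m}(\mathbf{x})$ is the vector created by the MS procedure. These equations result in

$$\mathbf{m}(\mathbf{x}) = \frac{\sum_{i=1}^{n} \mathbf{x}_i\, g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^{2}\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{\mathbf{x}-\mathbf{x}_i}{h}\right\|^{2}\right)} - \mathbf{x} \qquad (3)$$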
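As an illustration only, the fixed-point iteration driven by the MS vector can be sketched in Python with NumPy for a Gaussian kernel of bandwidth h. The function name `mean_shift_mode`, the tolerance, and the iteration cap are assumptions made for this sketch; they are not taken from the MATLAB implementation used in this paper.

```python
import numpy as np

def mean_shift_mode(data, start, h, tol=1e-6, max_iter=500):
    """Fixed-point mean shift with a Gaussian kernel of bandwidth h.

    Returns the converged location (as an array) and the iterations used.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:
        data = data[:, None]              # n scalar samples -> shape (n, 1)
    x = np.atleast_1d(np.asarray(start, dtype=float))
    for it in range(1, max_iter + 1):
        # Gaussian weight of every sample relative to the current location
        sq_dist = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-0.5 * sq_dist / h ** 2)
        # Weighted sample mean, i.e. the current location plus the MS vector
        x_new = w @ data / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new, it
        x = x_new
    return x, max_iter
```

Calling `mean_shift_mode(samples, start, h)` moves `start` uphill to the nearest mode of the density estimate, which is why the result depends on the starting point when the bandwidth is small.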

In this paper, we propose using multiple bandwidths in a conventional mean shift tracker. A broad bandwidth plays the central role because it helps track a larger motion: owing to the smoothness introduced by the large bandwidth, the fixed-point iteration converges faster and can follow the target. The bandwidths can be obtained automatically. The overall algorithm and the effect of the bandwidth choice are explained in detail below. Since we are dealing with automatic bandwidth selection, the optimal bandwidth is the final goal after smoothing the likelihood surface (i.e., the cost function). The optimal bandwidth is the bandwidth at which the global mode and the other local modes are clear enough to seek (i.e., the initial state of the likelihood surface); at this same bandwidth, the final search for the global mode is carried out by MBMS (Multi-Bandwidth Mean Shift).

There are a variety of ways to select the optimal bandwidth automatically. In this procedure, we first calculate the optimal bandwidth according to [26, 27] so as to minimize the AMISE (Asymptotic Mean Integrated Squared Error). AMISE is an estimate of the distance between two densities and is used to evaluate the performance of a kernel density estimator [28]. Having found the optimal bandwidth, we then look for a sufficiently large bandwidth through the proposed MB (Multi-Bandwidth) procedure. By sufficiently large, we mean a bandwidth that is much larger than the optimal, minimum-AMISE bandwidth.
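For orientation, the textbook one-dimensional AMISE and its minimizer are recalled below. This is the standard form and is given only as a reference point; the selector of [26, 27] used here may differ, and the symbols $R(\cdot)$, $\mu_2(K)$, and $\sigma$ are notation assumed for this note.

$$\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{h^{4}}{4}\,\mu_2(K)^{2}\,R(f''), \qquad R(g) = \int g(u)^{2}\,du, \qquad \mu_2(K) = \int u^{2}K(u)\,du$$

$$h_{\mathrm{AMISE}} = \left[\frac{R(K)}{\mu_2(K)^{2}\,R(f'')\,n}\right]^{1/5}$$

For a Gaussian kernel and a normal reference density with standard deviation $\sigma$, this reduces to $h = (4/(3n))^{1/5}\,\sigma \approx 1.06\,\sigma\,n^{-1/5}$.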

This large bandwidth is meant to create a unimodal likelihood surface, that is, a surface consisting of a single mode. This mode is used by the Mean Shift procedure in the first run; it does not matter whether it is the global mode or not. The final location of this mode, found by Mean Shift, is then used by the MB procedure to reach the global mode at the optimal bandwidth (i.e., the last selected bandwidth) using MS iterations.

From the above equations, we obtain the optimal bandwidth used at the last stage of MBMS, at which all modes, especially the global mode, are clear. If we use three or four times the optimal bandwidth, as shown for the 1D and 2D likelihood surfaces (i.e., cost functions) in Figures 1 and 2, we obtain a unimodal surface. By detecting this single mode, which is the closest point in the best basin of attraction, we can reach the real global mode in a more stable way without encountering any other local mode.
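The effect of enlarging the bandwidth can be checked numerically. The sketch below counts the local maxima of a 1D Gaussian kernel density estimate on a grid at a small bandwidth and at four times that bandwidth. The toy samples, the grid, the value `h0 = 0.4`, and the helper `count_modes` are all assumptions chosen for illustration, not data from the experiments in this paper.

```python
import numpy as np

def count_modes(data, h, grid):
    """Count local maxima of a 1D Gaussian kernel density estimate on a grid."""
    data = np.asarray(data, dtype=float)
    # Unnormalized estimate: sum of Gaussians of width h centered on the samples
    f = np.exp(-0.5 * ((grid[:, None] - data[None, :]) / h) ** 2).sum(axis=1)
    inner = f[1:-1]
    # A grid point is a mode if the curve rises into it and does not rise after it
    return int(np.sum((inner > f[:-2]) & (inner >= f[2:])))

# Toy samples (assumed): three clusters, the middle one holding the global mode
samples = [-1.5, -1.4, -1.3, -0.1, -0.05, 0.0, 0.05, 0.1, 1.3, 1.4, 1.5]
grid = np.linspace(-5.0, 5.0, 2001)
h0 = 0.4                                  # stand-in for the optimal bandwidth
modes_at_h0 = count_modes(samples, h0, grid)        # three separate modes
modes_at_4h0 = count_modes(samples, 4 * h0, grid)   # a single merged mode
```

How large the bandwidth must be before the surface becomes unimodal depends on the spread of the data, so the factor of four used here should be read as an example rather than a rule.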

3. The Proposed Method

The bandwidth decreases monotonically and ends at h0, the experimentally optimal bandwidth. Figure 1 shows a one-dimensional Gaussian mixture with several modes, which are unified into a unimodal surface as the bandwidth increases [29]. The multi-bandwidth process leads to the evolution of modes illustrated in Figure 1. In this way, the search for the global mode is not trapped in local modes. The algorithm and the procedure are very simple, as Figure 1 shows. The great advantage of this method is that the location of the starting point is not important at all. This paper is based specifically on the Gaussian kernel because of its monotonicity [30]. The Gaussian kernel can also be transformed to reduce the cost [29, 31, 32].

There is also a method for finding minima instead of maxima [33]. To find an important mode among others, variable bandwidth selection has been used [34]. In our work, tracking can be guaranteed with a proper, manually selected bandwidth, but the method can also be extended by automatically estimating the bandwidth from features such as the target region variance [4]. At the largest bandwidth, the surface is unimodal, so MS iterations converge easily, as illustrated in Figure 2. The proposed method relies on multiple bandwidths; the way they work together during the bandwidth selection procedure in MBMS (Multi-Bandwidth Mean Shift) is given below.

MBMS algorithm:
(1) Select the sequence of bandwidths $h_M > h_{M-1} > \cdots > h_0$ (i.e., the multi-bandwidth smoothing procedure).
(2) Choose a starting location for the first MB (multi-bandwidth) procedure and run MS (Mean Shift) at the largest bandwidth $h_M$ until it converges to $\mathbf{x}_M$.
(3) Run MS for each remaining bandwidth $h_m$ of the sequence in (1) to get the convergence position $\mathbf{x}_m$, using $\mathbf{x}_{m+1}$ as the initial position, that is, the convergence position achieved at the previous bandwidth.

Finally, reaching $\mathbf{x}_0$ means that we have located the global mode on the likelihood surface (i.e., the cost function).
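The steps above can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions: the helper names `mean_shift_1d` and `mbms_1d`, the toy samples, and the bandwidth sequence (4, 2, and 1 times a stand-in optimal bandwidth `h0`) are hypothetical and are not the settings used in the experiments.

```python
import numpy as np

def mean_shift_1d(data, start, h, tol=1e-6, max_iter=500):
    """Gaussian mean shift in 1D; returns (converged location, iterations)."""
    x = float(start)
    for it in range(1, max_iter + 1):
        w = np.exp(-0.5 * ((data - x) / h) ** 2)      # Gaussian weights
        x_new = float(np.sum(w * data) / np.sum(w))   # weighted sample mean
        if abs(x_new - x) < tol:
            return x_new, it
        x = x_new
    return x, max_iter

def mbms_1d(data, start, bandwidths):
    """Run MS at each bandwidth, largest first, reusing the previous result."""
    x, total_iterations = float(start), 0
    for h in sorted(bandwidths, reverse=True):
        x, it = mean_shift_1d(data, x, h)
        total_iterations += it
    return x, total_iterations

# Toy samples (assumed): the global mode is near 0, a local mode is near 1.4
data = np.array([-1.5, -1.4, -1.3, -0.1, -0.05, 0.0, 0.05, 0.1, 1.3, 1.4, 1.5])
h0 = 0.4
x_ms, _ = mean_shift_1d(data, start=1.5, h=h0)            # stays in the local basin
x_mbms, _ = mbms_1d(data, start=1.5, bandwidths=[4 * h0, 2 * h0, h0])
```

Started inside the basin of the local mode, plain MS at `h0` stays there, while the multi-bandwidth run first moves to the single mode of the smoothed surface and then refines it at the smaller bandwidths.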

4. The Improved MS

Selecting and using multiple bandwidths costs some delay and computational burden. MS has been proved to be a quadratic bound optimization [6, 9, 11]. It is an ascent algorithm with an adaptable step size, and no default parameter needs to be set in advance. It has been argued that a Gaussian mixture model can be optimized by a fixed-point bound optimization method [35]; this method is also applicable to surfaces created by other kernel functions [24]. Bound optimization algorithms, however, converge slowly [22–24]. Suppose that a similarity function is defined between two frames, so that the adaptable step size can be determined from it.

The step-size expression contains a learning rate. For a more reliable convergence, its value should be chosen as proved and generalized in [22, 24]. In our work, the learning rate is set first, and MS is then initialized and run iteratively. The Gaussian kernel is helpful here because the derivative of its profile is proportional to the profile itself, so its use in weight assignment and in flattening the surface at the same bandwidth is very similar [28]. This analysis is supported by our experimental results.
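To make the role of a learning rate concrete, the sketch below scales the MS vector by a constant factor `eta` before each update. This is a generic over-relaxed update swapped in for illustration; it is not the similarity-based step-size formula of this paper, and the value `eta = 1.5`, the function name, and the toy samples are assumptions.

```python
import numpy as np

def relaxed_mean_shift_1d(data, start, h, eta=1.0, tol=1e-6, max_iter=500):
    """Gaussian mean shift in 1D with the step scaled by a learning rate eta.

    eta = 1 gives the plain MS update; eta > 1 takes a longer step along
    the mean shift vector (assumed over-relaxation, not the paper's formula).
    """
    x = float(start)
    for it in range(1, max_iter + 1):
        w = np.exp(-0.5 * ((data - x) / h) ** 2)
        shift = float(np.sum(w * data) / np.sum(w)) - x   # mean shift vector
        x_new = x + eta * shift
        if abs(x_new - x) < tol:
            return x_new, it
        x = x_new
    return x, max_iter

# Toy samples (assumed) smoothed with a broad bandwidth, so there is one mode at 0
data = np.array([-1.5, -1.4, -1.3, -0.1, -0.05, 0.0, 0.05, 0.1, 1.3, 1.4, 1.5])
x_plain, it_plain = relaxed_mean_shift_1d(data, start=1.5, h=1.6, eta=1.0)
x_fast, it_fast = relaxed_mean_shift_1d(data, start=1.5, h=1.6, eta=1.5)
```

On this toy example the longer step reaches the same mode in fewer iterations, but too large a value of `eta` can overshoot the mode, so the factor has to be chosen with care.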

4.1. Experimental Result

We ran the two algorithms on the same data sets and report the results in Table 1 and Figures 4 and 5. The starting point was initialized at different locations. The improved MS achieves better results than the original MS, and we can claim that it runs even faster than the particle filter (PF) [30, 36]. The proposed algorithm outperforms MS in the number of iterations and in step size. It can also be compared with the quasi-Newton method implemented in [21, 37–39]. The developed MS was implemented in MATLAB and run on an Asus notebook with a 2 GHz Pentium CPU under Windows Vista. The proposed MS differs substantially from conventional MS. Compared with other linear, fixed-point bound optimization algorithms such as quasi-Newton [21], the proposed method is more accurate and robust. A sufficiently large step size can cause overshooting. The method can also be adopted for clustering and segmentation, since it is able to gather patterns into clusters around global and local modes.

5. Tracking Application and Implementation

The algorithm is implemented and tested on an object represented by a normalized rectangular region, the target region. We choose color as the feature for both the target model and the target candidate region. The similarity between the two selected regions in two consecutive frames is measured with a criterion such as the Bhattacharyya, Kullback-Leibler, or Matusita measure [2, 4, 5, 40, 41]. The pixels of the rectangular region are assigned weights according to their distance from the center of the normalized circle inscribed in the rectangle: the farther a pixel is from the center, the smaller the weight it receives from the Gaussian function. The RGB colors of the weighted pixels are then extracted and sorted into an m-bin histogram, and the weights are summed in the corresponding R, G, and B color bins. This process is followed in both the target model and the candidate regions.

The resulting histograms are the normalized density values of the target model and the candidate. Each pixel value is converted into a histogram bin according to its color, accompanied by its weight, and the same process is applied to both the target model and the candidate. Starting from the initial point, the tracking procedure approximates the displacement of the object between the two consecutive frames. The new location of the object is then found, and the object is tracked.
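The histogram construction and the similarity measurement can be sketched as follows. The sketch assumes a joint RGB histogram with 8 bins per channel, a Gaussian weight with `sigma = 0.5` on coordinates normalized to [-1, 1], and the Bhattacharyya coefficient as the similarity measure. These choices, and the function names `kernel_histogram` and `bhattacharyya`, are illustrative assumptions rather than the exact settings of this paper.

```python
import numpy as np

def kernel_histogram(patch, bins_per_channel=8, sigma=0.5):
    """Gaussian-weighted, normalized RGB histogram of an (H, W, 3) uint8 patch."""
    h, w, _ = patch.shape
    # Pixel coordinates normalized to [-1, 1] around the patch center
    ys = np.linspace(-1.0, 1.0, h)[:, None]
    xs = np.linspace(-1.0, 1.0, w)[None, :]
    # Pixels farther from the center receive a smaller weight
    weights = np.exp(-0.5 * (xs ** 2 + ys ** 2) / sigma ** 2)
    # Map each 8-bit channel value to a bin index, then to one joint bin
    idx = (patch.astype(np.int64) * bins_per_channel) // 256
    flat = (idx[..., 0] * bins_per_channel + idx[..., 1]) * bins_per_channel + idx[..., 2]
    hist = np.bincount(flat.ravel(), weights=weights.ravel(),
                       minlength=bins_per_channel ** 3)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))
```

A coefficient of 1 means the target model and the candidate have identical weighted color distributions, and a coefficient of 0 means they share no occupied bin.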

Using the Taylor expansion, a first-order linear approximation of the similarity around the initial point lets us solve the optimization problem efficiently via MS iterations [2, 5].
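As a worked example, assume that the similarity is the Bhattacharyya coefficient, which is one of the measures listed above. The symbols below ($\hat{q}_u$ for the target model, $\hat{p}_u(\mathbf{y})$ for the candidate at location $\mathbf{y}$, $\mathbf{y}_0$ for the initial point, $b(\mathbf{x}_i)$ for the bin index of pixel $\mathbf{x}_i$, and $m$ bins) are notation assumed for this note. The first-order expansion around $\mathbf{y}_0$ is

$$\rho\left[\hat{p}(\mathbf{y}), \hat{q}\right] = \sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y})\,\hat{q}_u} \;\approx\; \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat{p}_u(\mathbf{y}_0)\,\hat{q}_u} + \frac{1}{2}\sum_{u=1}^{m}\hat{p}_u(\mathbf{y})\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\mathbf{y}_0)}}$$

The first term does not depend on $\mathbf{y}$, so maximizing the similarity amounts to maximizing the second term. This gives each pixel the weight

$$w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\mathbf{y}_0)}}\;\delta\left[b(\mathbf{x}_i) - u\right]$$

and the MS update

$$\mathbf{y}_1 = \frac{\sum_{i} \mathbf{x}_i\, w_i\, g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^{2}\right)}{\sum_{i} w_i\, g\!\left(\left\|\frac{\mathbf{y}_0-\mathbf{x}_i}{h}\right\|^{2}\right)}$$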

5.1. Location Approximation

Conventional MS cannot find the global mode in the presence of local modes because of its fixed bandwidth; such local modes are created by rapid motion, illumination changes, clutter, and occlusion, as shown in Figure 3. In our experiments, the initial point does not matter for the improved MS, which locates the target quickly. The number of bandwidths may differ between examples, but it is 3 or 4 for all tracking sequences, as shown in Figure 3. We use color as the feature, but other features such as motion and orientation could also be used. Without the multi-bandwidth procedure, most searches end in local modes unless they start in the basin closest to the global mode in Figure 3. Using the broad-bandwidth kernel function in the first MS run lets us find this basin point close to the global mode.

5.2. Object Location and Track

The object tracking methods proposed so far have addressed some of the weak points in this field, but they share some common problems. They all have difficulty finding the object location when it moves a large distance between two successive frames. For all of them, the starting point is very important for tracking the object correctly. Background problems such as clutter, occlusion, and illumination changes can strongly influence the tracking path and cause failure, as illustrated in Figure 4. These methods are not capable of recovering from failure by themselves because of the local modes present in Figure 4(a), and tracking fails as shown in Figure 4(b).

Conventional MS has all of the problems described above. In our work, we have made the tracker robust to different initial points in an image by exploiting the variation of the kernel bandwidth together with adaptive step-size iteration. In effect, an object detector is incorporated into the localization procedure to recover from any failure that occurs. A detector has also been proposed previously for particle-filter-based tracking [42]. By choosing a broad bandwidth, we try to pass over the other local modes and reach the basin of attraction of the desired global mode. Some implementation problems are as follows.

The multi-bandwidth tracker runs the MS procedure with a sequence of 3 or 4 bandwidths. Because color is used as the feature, some unwanted modes may be created merely by the difference between the color values of two points [43]; such a local mode can trap the MS seeker in its basin of attraction. Through a modification of the color histogram values, we can increase the step size adaptively. Faster mode seeking comes at the cost of some accuracy. It has recently been proposed that some accuracy can be regained with a little more computation time [44]. The computational cost also increases because a larger-bandwidth kernel window is applied in a frame. As explained, this algorithm was applied to object tracking with faster mode-seeking results [36], as shown in Table 1(a) and 1(b).

The most important limitation of the proposed method is that the bandwidth sequence is selected manually. An automatic bandwidth selector based on some features could be developed, but in this paper we use a manually chosen multi-bandwidth sequence to track correctly, as illustrated in Figure 5.

Figure 6. In this circle tracking scenario, MS fails to track the circle in frame 17 because it is distracted by the dotted square (i.e., a local mode), while MBMS (i.e., the proposed method) successfully tracks the circle in the same frame in spite of the square. As can be seen in frame 26, the MS tracker is unable to track successfully as the circle moves in front of the square.

Figure 8. In this bus tracking scenario, MS fails to track the bus because it is distracted by clutter of similar color (i.e., two local modes), while MBMS (i.e., the proposed method) successfully tracks the bus in these frames in spite of the clutter. As can be seen, the MS tracker is unable to track successfully as the bus moves in front of the clutter of the same color.

Figure 10. In this hand tracking scenario, MS (a) fails to track the hand in frame 140 because it is distracted by the face (i.e., a local mode), while MBMS (i.e., the proposed method) (b) successfully tracks the hand in the same frame in spite of the face. As can be seen in (a), frame 260, the MS tracker is unable to track successfully as the hand moves in front of the face. Compared with the ground truth (GT) in (c), MBMS successfully tracks the hand through the entire sequence.

We can observe that MBMS tracks the hand better through the entire sequence, and with fewer iterations than MS.

Table 2. Comparison of the number of iterations to convergence for the 1D and 2D data sets. The initial location for each run is shown in the second column.
(1) Data set no. 1 (1D synthetic data). A total of 1000 data points are drawn with equal probability from four normals: N(3,1), N(1,1), N(0,1), and N(−2,1).
(2) Data set no. 2 (2D synthetic data). A total of 1050 bivariate data points are drawn with equal probability from three normals

6. Conclusion

A new kernel-based object tracking framework is proposed. Its main contribution is the use of a large initial bandwidth for coarse, a priori tracking, followed by tracking at the estimated bandwidth. The framework is robust to noise and clutter, so it can escape from many local maxima. The tracking algorithm (MBMS) converges faster than conventional kernel-based object tracking (MS). However, some problems and weaknesses remain to be clarified in later work, and many of the results can still be analyzed theoretically. This paper can serve as a reference for later extensions of this work. The experimental results above illustrate the performance of the approach. They also show that rapid motion of an object produces a large displacement between two adjacent frames, which leads MS to fail in tracking. With the multi-bandwidth proposal, MS can recover from such failures by incorporating a detector, the multi-bandwidth kernel function, in the localization process. In comparison with conventional MS and with other techniques such as [1] and the fast transformation of the Gaussian mixture [7, 29], the speed of convergence has increased and the number of iterations has decreased, while the step size of each iteration has increased. Object tracking is an important problem in artificial intelligence, with well-known applications in robotics, machine intelligence, computer vision, and human-computer interfaces (HCI). In the future, this method can be extended with a more automatic bandwidth selector and with several other features, to track objects in a wide variety of industrial applications.